CN116679983A - Data flow-based computing unit, vector processor and artificial intelligence chip - Google Patents

Data flow-based computing unit, vector processor and artificial intelligence chip

Info

Publication number
CN116679983A
Authority
CN
China
Prior art keywords
calculation
circuit
computing
data
switch
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210169225.4A
Other languages
Chinese (zh)
Inventor
曾成龙
蔡权雄
牛昕宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Corerain Technologies Co Ltd
Original Assignee
Shenzhen Corerain Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Corerain Technologies Co Ltd filed Critical Shenzhen Corerain Technologies Co Ltd
Priority to CN202210169225.4A
Publication of CN116679983A

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/30007Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/30036Instructions to perform operations on packed data, e.g. vector, tile or matrix operations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline, look ahead
    • G06F9/3867Concurrent instruction execution, e.g. pipeline, look ahead using instruction pipelines
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The disclosure provides a data flow-based computing unit, a vector processor and an artificial intelligence chip, and relates to the technical field of artificial intelligence. The computing unit comprises a plurality of computing circuits, including: a first computing circuit configured to perform a first calculation on a first set of data to obtain a first set of calculation results; a second calculation circuit, connected in parallel with the first computing circuit and configured to perform a second calculation on a second set of data to obtain a second set of calculation results; and a third calculation circuit configured to perform, in the case where a third set of data flowing to the third calculation circuit includes the first set of calculation results and the second set of calculation results, a third calculation on the third set of data to obtain a third set of calculation results, wherein at least one of the duration of the first calculation and the duration of the second calculation partially overlaps the duration of the third calculation.

Description

Data flow-based computing unit, vector processor and artificial intelligence chip
Technical Field
The present disclosure relates to the technical field of artificial intelligence, and in particular to a data flow-based computing unit, a vector processor and an artificial intelligence chip.
Background
With the development of artificial intelligence technology, artificial intelligence algorithms have been widely used across many industries. However, the computations involved in these algorithms are generally complex and varied, which places high demands on the computing units used to perform them.
In the related art, computation in an artificial intelligence algorithm is generally performed using a computation unit based on an instruction set.
Disclosure of Invention
The inventors noted that in the related art, the calculation speed of the calculation unit is low.
The inventors have found through analysis that the problem lies in multi-step calculations with data dependencies, i.e., calculations in which the data required by a subsequent step includes the results of a preceding step. In such an instruction-set-based computing unit, the circuit performing the preceding step outputs all of its calculation results to the circuit performing the subsequent step only after the preceding step has been executed in its entirety, and the subsequent circuit must remain idle until it has received those results and can begin its own calculation. This step-by-step serial manner of calculation results in a low calculation speed of the computing unit.
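The difference between the two execution styles can be illustrated with a short software model. The following sketch is purely illustrative and is not part of the disclosed circuitry: Python generators stand in for hardware circuits, and the function names (convert, add, serial, streaming) and the element-wise forwarding behavior are assumptions made only for the illustration.

```python
# Illustrative software model of the two execution styles (not hardware code).

def convert(x):
    """Stand-in for one element of the preceding-step calculation."""
    return float(x)

def add(a, b):
    """Stand-in for one element of the subsequent-step calculation."""
    return a + b

def serial(xs, ys):
    """Instruction-set style: the next step waits for all previous results."""
    fx = [convert(x) for x in xs]                 # preceding step runs to completion
    fy = [convert(y) for y in ys]
    return [add(a, b) for a, b in zip(fx, fy)]    # only now does the next step start

def streaming(xs, ys):
    """Data-flow style: each result is forwarded as soon as it is produced."""
    fx = (convert(x) for x in xs)                 # lazy generators model forwarding
    fy = (convert(y) for y in ys)
    return [add(a, b) for a, b in zip(fx, fy)]    # next step overlaps the previous one

assert serial(range(10), range(10)) == streaming(range(10), range(10))
```

Both models produce the same results; the difference lies only in when the subsequent step may begin, which is the point addressed by the embodiments below.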
In order to solve the above-described problems, the embodiments of the present disclosure propose the following solutions.
According to an aspect of the embodiments of the present disclosure, there is provided a data stream-based computing unit, including: a first computing circuit configured to perform a first calculation on a first set of data to obtain a first set of calculation results; a second calculation circuit, connected in parallel with the first computing circuit and configured to perform a second calculation on a second set of data to obtain a second set of calculation results; and a third calculation circuit configured to perform, in the case where a third set of data flowing to the third calculation circuit includes the first set of calculation results and the second set of calculation results, a third calculation on the third set of data to obtain a third set of calculation results, wherein at least one of the duration of the first calculation and the duration of the second calculation partially overlaps the duration of the third calculation.
In some embodiments, the first calculation is the same as the second calculation.
In some embodiments, the plurality of computing circuits further comprises: a fourth calculation circuit configured to perform, in the case where a fourth set of data flowing to the fourth calculation circuit includes the third set of calculation results, a fourth calculation on the fourth set of data to obtain a fourth set of calculation results, wherein the fourth calculation is different from both the first calculation and the third calculation, and the duration of the fourth calculation partially overlaps the duration of the third calculation.
In some embodiments, the fourth calculation is an inverse of the first calculation.
In some embodiments, the first calculation is a fixed-point-to-floating-point calculation and the fourth calculation is a floating-point-to-fixed-point calculation.
In some embodiments, the plurality of computing circuits further comprises: a fifth calculation circuit configured to perform, in the case where a fifth set of data flowing to the fifth calculation circuit includes the second set of calculation results, a fifth calculation on the fifth set of data to obtain a fifth set of calculation results, wherein the fifth calculation is different from the second calculation and the duration of the fifth calculation partially overlaps the duration of the second calculation; and the fourth calculation circuit is further configured to perform the fourth calculation on the fourth set of data in the case where the fourth set of data includes the fifth set of calculation results.
In some embodiments, the fifth calculation is an exponential function calculation.
In some embodiments, the computing unit further comprises at least one set of switches in one-to-one correspondence with at least one of the plurality of computing circuits, each set of switches comprising a first switch in parallel with a corresponding computing circuit, and a second switch in series with the corresponding computing circuit and in parallel with the first switch.
In some embodiments, the at least one set of switches includes two sets of switches in one-to-one correspondence with at least two computing circuits.
In some embodiments, the fourth and fifth computing circuits are in one-to-one correspondence with two sets of switches, each set of switches including a first switch in parallel with a corresponding computing circuit, and a second switch in series with the corresponding computing circuit and in parallel with the first switch.
According to another aspect of the disclosed embodiments, there is provided a vector processor including the data stream based computing unit according to any one of the above embodiments.
In some embodiments, the vector processor further comprises: a data buffer configured to buffer the first set of data and the second set of data, transmit the first set of data to the first computing circuit and transmit the second set of data to the second computing circuit in response to a first driving signal corresponding to a computing task; a direct memory access module configured to read the first set of data and the second set of data from an off-chip memory in response to a second drive signal corresponding to the computing task and to transmit the first set of data and the second set of data to the data buffer; and a control register configured to transmit the first driving signal and the second driving signal in response to a control signal.
According to yet another aspect of the disclosed embodiments, an artificial intelligence chip is provided, including the vector processor according to any one of the above embodiments.
According to still another aspect of the embodiments of the present disclosure, there is provided a data flow-based computing method, including: a first computing circuit performs a first calculation on a first set of data to obtain a first set of calculation results; a second calculation circuit connected in parallel with the first computing circuit performs a second calculation on a second set of data to obtain a second set of calculation results; and in the case where a third set of data flowing to a third calculation circuit includes the first set of calculation results and the second set of calculation results, the third calculation circuit performs a third calculation on the third set of data to obtain a third set of calculation results, wherein at least one of the duration of the first calculation and the duration of the second calculation partially overlaps the duration of the third calculation.
In some embodiments, the first calculation is the same as the second calculation.
In some embodiments, the computing method further comprises: in the case where the fourth set of data flowing to the fourth calculation circuit includes the third set of calculation results, the fourth calculation circuit performs a fourth calculation on the fourth set of data to obtain a fourth set of calculation results, wherein the fourth calculation is different from both the first calculation and the third calculation, and a duration of the fourth calculation and a duration of the third calculation partially overlap.
In some embodiments, the fourth calculation is an inverse of the first calculation.
In some embodiments, the first calculation is a fixed-point-to-floating-point calculation and the fourth calculation is a floating-point-to-fixed-point calculation.
In some embodiments, the computing method further comprises: in the case that a fifth set of data flowing to a fifth calculation circuit includes the second set of calculation results, the fifth calculation circuit performs a fifth calculation on the fifth set of data to obtain a fifth set of calculation results, wherein the fifth calculation is different from the second calculation, and a duration of the fifth calculation and a duration of the second calculation partially overlap; in the case where the fourth set of data includes the fifth set of calculation results, the fourth calculation circuit also performs the fourth calculation on the fourth set of data.
In some embodiments, the fifth calculation is an exponential function calculation.
In some embodiments, the first computing circuit and the second computing circuit are in one-to-one correspondence with two sets of switches, each set of switches including a first switch and a second switch; a first switch corresponding to the first computing circuit is connected in parallel with the first computing circuit, and a second switch corresponding to the first computing circuit is connected in series with the first computing circuit and is connected in parallel with the first switch corresponding to the first computing circuit; the first switch corresponding to the second computing circuit is connected in parallel with the second computing circuit, and the second switch corresponding to the second computing circuit is connected in series with the second computing circuit and is connected in parallel with the first switch corresponding to the second computing circuit; under a first condition, the third set of data comprises the first set of data and the second set of calculation results, the first condition comprises that a first switch corresponding to the first calculation circuit is turned on, a second switch corresponding to the first calculation circuit is turned off, and the first switch corresponding to the second calculation circuit is turned off, and the second switch corresponding to the second calculation circuit is turned on; under a second condition, the third set of data comprises the second set of data and the first set of calculation results, the second condition comprises that a first switch corresponding to the first calculation circuit is turned off, a second switch corresponding to the first calculation circuit is turned on, and the first switch corresponding to the second calculation circuit is turned on, and the second switch corresponding to the second calculation circuit is turned off; in a third condition, the third set of data includes the first set of data and the second set of data, the third condition includes that both a first switch corresponding to the first computing circuit and a first switch corresponding to the second computing circuit are on, and both a second switch corresponding to the first computing circuit and a second switch corresponding to the second computing circuit are off.
In some embodiments, the first computing circuit, the second computing circuit, and the third computing circuit are in one-to-one correspondence with three sets of switches, each set of switches including a first switch and a second switch; a first switch corresponding to the first computing circuit is connected in parallel with the first computing circuit, and a second switch corresponding to the first computing circuit is connected in series with the first computing circuit and is connected in parallel with the first switch corresponding to the first computing circuit; the first switch corresponding to the second computing circuit is connected in parallel with the second computing circuit, and the second switch corresponding to the second computing circuit is connected in series with the second computing circuit and is connected in parallel with the first switch corresponding to the second computing circuit; the first switch corresponding to the third computing circuit is connected in parallel with the third computing circuit, and the second switch corresponding to the third computing circuit is connected in series with the third computing circuit and is connected in parallel with the first switch corresponding to the third computing circuit; under a fourth condition, the fourth set of data includes the first set of data, the fourth condition includes that a first switch corresponding to the first computing circuit and a first switch corresponding to the third computing circuit are both on, and a second switch corresponding to the first computing circuit, a second switch corresponding to the third computing circuit and a set of switches corresponding to the second computing circuit are all off; under a fifth condition, the fourth set of data includes the first set of calculation results, the fifth condition includes that a first switch corresponding to the first calculation circuit is turned off, a corresponding second switch is turned on, and a first switch corresponding to the third calculation circuit is turned on, a second switch corresponding to the third calculation circuit is turned off, and a set of switches corresponding to the second calculation circuit is turned off; under a sixth condition, the fourth set of data includes the second set of data, the sixth condition includes that a first switch corresponding to the second computing circuit is turned on, a corresponding second switch is turned off, and a group of switches corresponding to the first computing circuit and a group of switches corresponding to the third computing circuit are turned off; under a seventh condition, the fourth set of data includes the second set of calculation results, the seventh condition includes that the first switch corresponding to the second calculation circuit is turned off, the corresponding second switch is turned on, and the group of switches corresponding to the first calculation circuit and the group of switches corresponding to the third calculation circuit are turned off.
In an embodiment of the present disclosure, a data stream based computing unit includes a first computing circuit, a second computing circuit, and a third computing circuit. Since at least one of the duration of the first calculation performed by the first calculation circuit and the duration of the second calculation performed by the second calculation circuit partially coincides with the duration of the third calculation performed by the third calculation circuit, the third calculation performed in the subsequent step can be started without waiting for the completion of the first calculation and the second calculation performed in the previous step. In this way, the third calculation can be performed in parallel with the first calculation or the second calculation for a period of time, thereby increasing the calculation speed of the calculation unit.
In addition, in the process of executing the first calculation and the second calculation, the third calculation circuit is not required to be in an idle state all the time, so that the idle time of calculation resources in the calculation unit is shortened, and the resource utilization rate of the calculation unit is improved.
The technical scheme of the present disclosure is described in further detail below through the accompanying drawings and examples.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. It is apparent that the drawings described below illustrate only some embodiments of the present disclosure, and that a person of ordinary skill in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of a data flow based computing unit according to some embodiments of the present disclosure;
FIG. 2 is a schematic diagram of a data flow based computing unit according to further embodiments of the present disclosure;
FIG. 3 is a schematic diagram of a data flow based computing unit according to further embodiments of the present disclosure;
FIG. 4 is a schematic diagram of a data flow based computing unit according to further embodiments of the present disclosure;
FIG. 5 is a schematic diagram of a data flow based vector processor according to some embodiments of the present disclosure;
FIG. 6 illustrates a data flow based computing method according to some embodiments of the present disclosure;
FIG. 7 illustrates a data flow based computing method according to further embodiments of the present disclosure.
Detailed Description
The following description of the technical solutions in the embodiments of the present disclosure will be made clearly and completely with reference to the accompanying drawings in the embodiments of the present disclosure, and it is apparent that the described embodiments are only some embodiments of the present disclosure, not all embodiments. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments in this disclosure without inventive faculty, are intended to fall within the scope of this disclosure.
The relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless it is specifically stated otherwise.
Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but should be considered part of the specification where appropriate.
In all examples shown and discussed herein, any specific values should be construed as merely illustrative, and not a limitation. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Fig. 1 is a schematic diagram of a data flow based computing unit according to some embodiments of the present disclosure.
As shown in fig. 1, the data stream based computing unit 100 includes a first computing circuit 11, a second computing circuit 12, and a third computing circuit 13. Wherein the second calculation circuit 12 is connected in parallel with the first calculation circuit 11.
The first calculation circuit 11 may be configured to perform a first calculation on the first set of data to obtain a first set of calculation results.
The second calculation circuit 12 may be configured to perform a second calculation on the second set of data to obtain a second set of calculation results.
For example, the first calculation circuit 11 may include an input and an output (not shown in fig. 1) such that the first calculation circuit 11 may receive the first set of data through the input and output the first set of calculation results through the output. Similarly, the second computing circuit 12 may also include an input and an output (not shown in FIG. 1) such that the second computing circuit 12 may receive the second set of data via the input and output the second set of computing results via the output.
The third calculation circuit 13 may be connected to the outputs of the first calculation circuit 11 and the second calculation circuit 12 via two data paths, respectively, such that a first set of calculation results flows to the third calculation circuit 13 via one of the two data paths and a second set of calculation results flows to the third calculation circuit 13 via the other of the two data paths.
The third calculation circuit 13 may be configured to perform a third calculation on the third set of data to obtain a third set of calculation results, in the case where the third set of data flowing to the third calculation circuit includes the first set of calculation results and the second set of calculation results. In other embodiments, the third set of data flowing to the third computing circuit 13 may include other data, as will be further described below.
Here, at least one of the duration of the first calculation and the duration of the second calculation may partially coincide with the duration of the third calculation.
It should be appreciated that the first, second, and third sets of computing results may each include one or more computing results.
In some embodiments, the first calculation may be the same as the second calculation.
In some embodiments, the duration of the first calculation and the duration of the second calculation may both partially coincide with the duration of the third calculation.
For example, the first calculation and the second calculation are fixed-point number to floating-point number calculations, the third calculation is a floating-point number addition calculation, the first set of data received by the first calculation circuit 11 is 10 fixed-point numbers, and the second set of data received by the second calculation circuit 12 is another 10 fixed-point numbers. The first calculation circuit 11 and the second calculation circuit 12 may perform fixed-point number to floating-point number calculations in parallel. In the case where the third set of data flowing to the third calculation circuit 13 includes one floating point number output by the first calculation circuit 11 and one floating point number output by the second calculation circuit 12, the third calculation circuit 13 may perform floating point number addition calculation on the two floating point numbers to obtain one floating point number addition calculation result.
In this case, the duration of the first calculation and the duration of the second calculation both partially coincide with the duration of the third calculation; that is, the third calculation circuit 13 does not need to wait until the first calculation circuit 11 and the second calculation circuit 12 have each converted all 10 of their fixed-point numbers into floating-point numbers before it starts to perform the addition.
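A rough timing model makes the overlap in this example concrete. The cycle counts below are assumptions chosen only for illustration (one element per circuit per cycle, with results forwarded on the following cycle); the disclosure does not specify circuit latencies.

```python
N = 10  # ten fixed-point inputs per stream, as in the example above

# Assumed latency model: one element per cycle per circuit, with each result
# forwarded to the next circuit on the following cycle.
serial_cycles = N + N          # convert all 10 elements, then add all 10 pairs
overlapped_cycles = N + 1      # additions start one cycle after the first pair
                               # is converted and then keep pace with the converters

print(serial_cycles, overlapped_cycles)   # 20 versus 11 cycles
```

Under this assumed model, the serial scheme needs 20 cycles while the overlapped scheme needs 11, which illustrates the speed-up referred to below.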
In the above embodiment, the data stream based computing unit includes the first computing circuit, the second computing circuit, and the third computing circuit. Since at least one of the duration of the first calculation performed by the first calculation circuit and the duration of the second calculation performed by the second calculation circuit partially coincides with the duration of the third calculation performed by the third calculation circuit, the third calculation performed in the subsequent step can be started without waiting for the completion of the first calculation and the second calculation performed in the previous step. In this way, at least one of the first calculation and the second calculation may be performed in parallel with the third calculation for a period of time, thereby increasing the calculation speed of the calculation unit.
In addition, in the process of executing the first calculation and the second calculation, the third calculation circuit is not required to be in an idle state all the time, so that the idle time of calculation resources in the calculation unit is shortened, and the resource utilization rate of the calculation unit is improved.
Fig. 2 is a schematic diagram of a data flow based computing unit according to further embodiments of the present disclosure.
As shown in fig. 2, the plurality of computation circuits in the data stream based computation unit 100 may further include a fourth computation circuit 14. For example, the fourth calculation circuit 14 may be connected to the third calculation circuit 13 through a data path so that the third set of calculation results outputted from the third calculation circuit 13 may flow to the fourth calculation circuit 14 through the data path.
The fourth computing circuit 14 may be configured to perform a fourth computation on the fourth set of data to obtain a fourth set of computation results, in the case where the fourth set of data flowing to the fourth computing circuit includes the third set of computation results.
Here, the duration of the fourth calculation may partially coincide with the duration of the third calculation.
In some embodiments, the fourth calculation may be different from both the first calculation and the third calculation. At this time, the first calculation may be the same as the second calculation.
In some embodiments, the fourth calculation may be an inverse of the first calculation. For example, the first calculation and the second calculation may both be fixed-point-to-floating-point calculations, the third calculation being a floating-point addition calculation, and the fourth calculation being a floating-point-to-fixed-point calculation.
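The inverse relationship between the first and fourth calculations can be sketched in software. The Q16.16 fixed-point format below is an assumption made for the sketch; the disclosure does not prescribe a particular number format, and the helper names are illustrative only.

```python
FRAC_BITS = 16  # assumed Q16.16 fixed-point format (not specified by the disclosure)

def fixed_to_float(q):
    """First/second calculation: interpret a fixed-point integer as a float."""
    return q / (1 << FRAC_BITS)

def float_to_fixed(x):
    """Fourth calculation: round a float back onto the fixed-point grid."""
    return int(round(x * (1 << FRAC_BITS)))

q = 0x00018000                                   # represents 1.5 in Q16.16
assert fixed_to_float(q) == 1.5
assert float_to_fixed(fixed_to_float(q)) == q    # the fourth calculation undoes the first
```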
In the above embodiment, the data stream based computing unit may further comprise a fourth computing circuit to support more complex multi-step computations. In this case, since the duration of the fourth calculation performed by the fourth calculation circuit can partially coincide with the duration of the third calculation performed by the third calculation circuit, the fourth calculation performed in the subsequent step can be started without waiting for the completion of the whole execution of the third calculation performed in the previous step, thereby further improving the calculation speed of the calculation unit.
Fig. 3 is a schematic diagram of a data flow based computing unit according to further embodiments of the present disclosure.
As shown in fig. 3, the plurality of calculation circuits in the data stream based calculation unit 100 may further include a fifth calculation circuit 15.
The fifth calculation circuit 15 may be configured to perform a fifth calculation on the fifth set of data to obtain a fifth set of calculation results in the case where the fifth set of data flowing to the fifth calculation circuit includes the second set of calculation results output by the second calculation circuit 12. Here, the duration of the fifth calculation and the duration of the second calculation partially coincide.
In some embodiments, the fifth calculation may be different from the second calculation. For example, the second calculation may be a fixed point number to floating point number calculation and the fifth calculation may be an exponential function calculation. In this way, the data stream based computing unit may also optionally support a wider variety of multi-step computations.
In some embodiments, the second calculation circuit 12 may be connected to the third calculation circuit 13 and the fifth calculation circuit 15 through a first control switch 31 (e.g., a single-pole double-throw switch). By controlling the first control switch 31, the second set of calculation results output by the second calculation circuit 12 can be selectively output to the third calculation circuit 13 or to the fifth calculation circuit 15.
In some embodiments, the fourth calculation circuit 14 may be connected to the third calculation circuit 13 and the fifth calculation circuit 15 through a second control switch 32 (e.g., a single-pole double-throw switch). By controlling the second control switch 32, either the third set of calculation results output by the third calculation circuit 13 or the fifth set of calculation results output by the fifth calculation circuit 15 can be selected to flow to the fourth calculation circuit 14.
In the case where the fourth set of data flowing to the fourth calculation circuit 14 includes the fifth set of calculation results output by the fifth calculation circuit 15, the fourth calculation circuit 14 may be configured to perform the fourth calculation on the fifth set of calculation results. For example, the second calculation circuit 12 may perform a fixed-point number to floating-point number calculation to convert 10 fixed-point numbers into 10 floating-point numbers, the fifth calculation circuit 15 may perform an exponential function calculation to obtain exponential function calculation results in the form of 10 floating-point numbers, and the fourth calculation circuit 14 may perform a floating-point number to fixed-point number calculation to convert these results into exponential function calculation results in the form of 10 fixed-point numbers.
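The three-stage chain in this example can be sketched as follows. The sketch is illustrative only: the Q16.16 format, the generator-based forwarding and the function names are assumptions, and the circuit numbers in the comments merely indicate which described circuit each stage stands in for.

```python
import math

FRAC_BITS = 16  # assumed Q16.16 fixed-point format, as in the earlier sketch

def fixed_to_float(q):                 # second calculation (circuit 12)
    return q / (1 << FRAC_BITS)

def float_to_fixed(x):                 # fourth calculation (circuit 14)
    return int(round(x * (1 << FRAC_BITS)))

def exp_stream(values):                # fifth calculation (circuit 15)
    for v in values:
        yield math.exp(v)

def fix_exp_fix(fixed_inputs):
    """Fixed-point in, exponential of each value, fixed-point out."""
    floats = (fixed_to_float(q) for q in fixed_inputs)    # element-wise forwarding
    return [float_to_fixed(y) for y in exp_stream(floats)]

# Ten fixed-point inputs, as in the running example.
results = fix_exp_fix([float_to_fixed(i / 10) for i in range(10)])
```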
It should be appreciated that the duration of the fourth calculation may partially coincide with the duration of the fifth calculation.
It should also be appreciated that single pole double throw switches 31 and 32 are merely examples. For example, the second computing circuit 12 may also be connected to a demultiplexer, which may receive control signals corresponding to the computing tasks to select to which computing circuit to output the second set of computing results; the fourth calculation circuit 14 may also be connected to a multiplexer that receives control signals corresponding to the calculation tasks to select which set of calculation results to flow to the fourth calculation circuit 14.
In some embodiments, the data stream based computing unit 100 may further include at least one set of switches, which may be in one-to-one correspondence with at least one of the plurality of computing circuits. Each set of switches may include a first switch in parallel with a corresponding computing circuit, and a second switch in series with the corresponding computing circuit and in parallel with the first switch.
For example, when a first switch corresponding to a certain computing circuit is turned on and a second switch corresponding to the certain computing circuit is turned off, data will flow to a next computing circuit via a path where the first switch corresponding to the computing circuit is located, and will not flow to the computing circuit. In this case, the calculation circuit does not receive data nor does it perform calculations.
For another example, when a first switch corresponding to a certain calculation circuit is turned off and a corresponding second switch is turned on, data will flow into the calculation circuit, but will not flow to a path where the first switch corresponding to the calculation circuit is located. In this case, the calculation circuit will perform calculations based on the incoming data.
Therefore, different data paths in the data flow-based computing unit can be selectively configured according to actual computing requirements, so that various types of computation can be supported, and the flexibility of the computing unit is improved.
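One pair of switches per computing circuit can be modeled as a small selection function. The sketch below is an illustration of the described switching behavior rather than a circuit description; the function name, the boolean flags and the handling of the case in which both switches are on (not discussed in the disclosure) are assumptions.

```python
def stage(data, compute, first_on, second_on):
    """Model one computing circuit together with its pair of switches.

    first_on  -- first switch, in parallel with the circuit (bypass path)
    second_on -- second switch, in series with the circuit (compute path)
    """
    if first_on and not second_on:
        return data                           # bypass: the circuit receives no data
    if second_on and not first_on:
        return [compute(x) for x in data]     # data flows through the circuit
    if not first_on and not second_on:
        return None                           # branch disabled: nothing flows onward
    raise ValueError("both switches on is not a configuration described here")

# Bypass a conversion circuit: the raw data is forwarded downstream unchanged.
assert stage([1, 2, 3], float, first_on=True, second_on=False) == [1, 2, 3]
# Route the data through the conversion circuit instead.
assert stage([1, 2, 3], float, first_on=False, second_on=True) == [1.0, 2.0, 3.0]
```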
Fig. 4 is a schematic diagram of a data flow based computing unit according to further embodiments of the present disclosure.
As shown in fig. 4, at least one set of switches in the data stream based computing unit 100 may include two sets of switches in one-to-one correspondence with at least two computing circuits. Fig. 4 schematically shows 5 calculation circuits and corresponding 5 sets of switches 41. Each set of switches 41 may include a first switch 42 in parallel with a corresponding computing circuit and a second switch 43 in series with the corresponding computing circuit and in parallel with the first switch 42.
In some embodiments, the first computing circuit 11 and the second computing circuit 12 may be in one-to-one correspondence with two sets of switches 41. By controlling these two sets of switches 41, the third set of data flowing to the third calculation circuit 13 can be made different.
For example, if the corresponding first switch 42 of the first computing circuit 11 is on, the corresponding second switch 43 is off, and the corresponding first switch 42 of the second computing circuit 12 is off, the corresponding second switch 43 is on, the third set of data flowing to the third computing circuit 13 may include the first set of data and the second set of computing results. The third calculation circuit 13 may also be configured to perform a third calculation on the first set of data and the second set of calculation results.
For another example, if the corresponding first switch 42 of the first computing circuit 11 is off, the corresponding second switch 43 is on, and the corresponding first switch 42 of the second computing circuit 12 is on, the corresponding second switch 43 is off, the third set of data flowing to the third computing circuit 13 may include the second set of data and the first set of computing results. The third calculation circuit 13 may also be configured to perform a third calculation on the second set of data and the first set of calculation results.
For another example, if the first switch 42 corresponding to the first calculation circuit 11 and the first switch 42 corresponding to the second calculation circuit 12 are both on, and the second switch 43 corresponding to the first calculation circuit 11 and the second switch 43 corresponding to the second calculation circuit 12 are both off, the third set of data flowing to the third calculation circuit 13 may include the first set of data and the second set of data. The third calculation circuit 13 may also be configured to perform a third calculation on the first set of data and the second set of data.
In this way, the third calculation circuit 13 may perform the third calculation on different third sets of data.
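Reusing the hypothetical stage helper from the preceding sketch, the three switch conditions just described correspond to three ways of assembling the input of the third calculation circuit 13; the helper name and argument layout are again assumptions made for illustration.

```python
def third_input(first_data, second_data, convert, sw11, sw12):
    """Assemble the third set of data from the two branches feeding circuit 13.

    sw11, sw12 -- (first_on, second_on) switch states for circuits 11 and 12.
    """
    branch1 = stage(first_data, convert, *sw11)    # branch through circuit 11
    branch2 = stage(second_data, convert, *sw12)   # branch through circuit 12
    return branch1, branch2

# Condition 1: raw first set of data plus the second set of calculation results.
third_input([1, 2], [3, 4], float, sw11=(True, False), sw12=(False, True))
# Condition 2: raw second set of data plus the first set of calculation results.
third_input([1, 2], [3, 4], float, sw11=(False, True), sw12=(True, False))
# Condition 3: both raw sets of data, with neither branch performing a calculation.
third_input([1, 2], [3, 4], float, sw11=(True, False), sw12=(True, False))
```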
In other embodiments, the first computing circuit 11 and the third computing circuit 13 may be in one-to-one correspondence with two sets of switches 41. Similarly, the second calculation circuit 12 and the third calculation circuit 13 may be in one-to-one correspondence with two sets of switches 41.
For example, if the set of switches 41 corresponding to the first calculation circuit 11 is turned off, and the first switch 42 corresponding to the third calculation circuit 13 is turned on, and the corresponding second switch 43 is turned off, the second set of calculation results of the second calculation circuit 12 may flow out through the path in which the first switch 42 corresponding to the third calculation circuit 13 is located.
In some embodiments, the first computing circuit 11, the second computing circuit 12, and the third computing circuit 13 may be in one-to-one correspondence with three sets of switches 41.
For example, if the corresponding first switch 42 of the third calculation circuit 13 is on and the corresponding second switch 43 is off, the third calculation circuit 13 does not receive data nor does it perform calculation. In this case, if the corresponding first switch 42 of the first computing circuit 11 is turned off, the corresponding second switch 43 is turned on, and both the corresponding first switch 42 and second switch 43 of the second computing circuit 12 are turned off, the first set of computing results of the first computing circuit 11 may flow out through the path in which the corresponding first switch 42 of the third computing circuit 13 is located.
In this way, by controlling the group of switches 41 corresponding to the third calculation circuit 13, it is possible to realize direct output of the calculation result of the first calculation circuit 11.
In some embodiments, the fourth calculation circuit 14 may also correspond to a set of switches 41. Similarly, for example, in a case where the corresponding first switch 42 of the fourth calculation circuit 14 is on but the corresponding second switch 43 is off, the fourth calculation circuit 14 does not receive data nor perform calculation. In this case, the third set of calculation results of the third calculation circuit 13 may flow out through the path in which the first switch 42 corresponding to the fourth calculation circuit 14 is located.
In this way, by controlling the group of switches 41 corresponding to the fourth calculation circuit 14, it is possible to realize direct output of the calculation result of the third calculation circuit 13.
In some embodiments, the fifth calculation circuit 15 may also correspond to a set of switches 41. Similarly, for example, in a case where the corresponding first switch 42 of the fifth calculation circuit 15 is on but the corresponding second switch 43 is off, the fifth calculation circuit 15 does not receive data nor perform calculation. In this case, the second set of calculation results of the second calculation circuit 12 may flow out through the path in which the first switch 42 corresponding to the fifth calculation circuit 15 is located.
In this way, by controlling the set of switches 41 corresponding to the fifth calculation circuit 15, it is possible to realize direct output of the calculation result of the second calculation circuit 12.
In some embodiments, in the case where the second computing circuit 12 corresponds to the set of switches 41, the fifth set of data flowing to the fifth computing circuit 15 may be made different by controlling the set of switches 41 corresponding to the second computing circuit 12.
For example, if the corresponding first switch 42 of the second calculation circuit 12 is on and the corresponding second switch 43 is off, the fifth set of data flowing to the fifth calculation circuit 15 may include the second set of data. In this case, the fifth calculation circuit 15 may be configured to perform a fifth calculation on the second set of data.
In this way, the fifth calculation circuit 15 may perform the fifth calculation on different fifth sets of data.
In some embodiments, in the case where the first calculation circuit 11, the second calculation circuit 12, and the third calculation circuit 13 are in one-to-one correspondence with three sets of switches 41, the fourth set of data flowing to the fourth calculation circuit 14 can be made different by controlling the three sets of switches 41.
For example, if the first switch 42 corresponding to the first calculation circuit 11 and the first switch 42 corresponding to the third calculation circuit 13 are both turned on, and the second switch 43 corresponding to the first calculation circuit 11, the second switch 43 corresponding to the third calculation circuit 13, and the set of switches 41 corresponding to the second calculation circuit 12 are all turned off, the fourth set of data flowing to the fourth calculation circuit 14 may include the first set of data. The fourth calculation circuit 14 may be configured to perform a fourth calculation on the first set of data.
For another example, if the first switch 42 corresponding to the first computing circuit 11 is turned off, the corresponding second switch 43 is turned on, and the first switch 42 corresponding to the third computing circuit 13 is turned on, the second switch 43 corresponding to the third computing circuit is turned off, and the set of switches 41 corresponding to the second computing circuit is turned off, the fourth set of data flowing to the fourth computing circuit 14 may include the first set of computing results. The fourth calculation circuit 14 may be configured to perform a fourth calculation on the first set of calculation results.
For another example, if the corresponding first switch 42 of the second computing circuit 12 is on, the corresponding second switch 43 is off, and both the corresponding set of switches 41 of the first computing circuit 11 and the corresponding set of switches 41 of the third computing circuit 13 are off, the fourth set of data flowing to the fourth computing circuit 14 may include the second set of data. The fourth calculation circuit 14 may be configured to perform a fourth calculation on the second set of data.
Also for example, if the corresponding first switch 42 of the second computing circuit 12 is off, the corresponding second switch 43 is on, and both the corresponding set of switches 41 of the first computing circuit 11 and the corresponding set of switches 41 of the third computing circuit are off, the fourth set of data flowing to the fourth computing circuit 14 may include the second set of computing results. The fourth computing circuit may be configured to perform a fourth computation on the second set of computation results.
In this way, the fourth calculation circuit 14 may perform the fourth calculation on different fourth sets of data.
Fig. 5 is a schematic diagram of a data flow based vector processor according to some embodiments of the present disclosure.
As shown in fig. 5, the data stream based vector processor 500 includes the data stream based computing unit of any of the above embodiments (e.g., the data stream based computing unit 100), a data buffer (DB) 51, a direct memory access (DMA) module 52, and a control register (CSR) 53.
The control register 53 may be configured to transmit the first driving signal and the second driving signal corresponding to the calculation task in response to the control signal. For example, the control register 53 may send the first driving signal to the data buffer 51 and the second driving signal to the direct memory access module 52.
The data buffer 51 may be configured to buffer a first set of data and a second set of data, to transmit the first set of data to the first computing circuit 11 in the data stream based computing unit 100 and to transmit the second set of data to the second computing circuit 12 in the data stream based computing unit 100 in response to a first driving signal corresponding to a computing task.
For example, the data buffer 51 may be connected to the first calculation circuit 11 and the second calculation circuit 12 through two data paths, respectively, so that the first set of data is transferred to the first calculation circuit 11 through one of the two data paths and the second set of data is transferred to the second calculation circuit 12 through the other of the two data paths. The first calculation circuit 11 and the second calculation circuit 12 may then perform their corresponding calculations based on the data each receives.
The direct memory access module 52 may be configured to read the first set of data and the second set of data from an off-chip memory (not shown in fig. 5) and transfer the first set of data and the second set of data to the data buffer 51 in response to a second drive signal corresponding to the computing task. For example, the direct memory access module 52 may transfer the first set of data and the second set of data to the data buffer 51 via a data path coupled to the data buffer 51.
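The cooperation of the control register 53, the direct memory access module 52 and the data buffer 51 described above can be summarized with a short illustrative sequence. All class, method and field names below are assumptions made for the sketch and do not correspond to a published interface of the vector processor.

```python
class VectorProcessorModel:
    """Toy software model of the control and data flow around the computing unit."""

    def __init__(self, off_chip_memory):
        self.off_chip_memory = off_chip_memory   # stands in for the off-chip memory
        self.data_buffer = {}                    # stands in for data buffer 51

    def control_register(self, control_signal):
        # Control register 53: one control signal produces the two drive signals.
        task = control_signal["task"]
        self.dma_read(task)                      # second drive signal -> DMA module 52
        return self.feed_circuits()              # first drive signal -> data buffer 51

    def dma_read(self, task):
        # DMA module 52: read both data sets from off-chip memory into the buffer.
        self.data_buffer["first"] = self.off_chip_memory[task]["first"]
        self.data_buffer["second"] = self.off_chip_memory[task]["second"]

    def feed_circuits(self):
        # Data buffer 51: first set goes to circuit 11, second set to circuit 12.
        return self.data_buffer["first"], self.data_buffer["second"]

memory = {"task0": {"first": [1, 2, 3], "second": [4, 5, 6]}}
processor = VectorProcessorModel(memory)
first_set, second_set = processor.control_register({"task": "task0"})
```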
Embodiments of the present disclosure also provide an artificial intelligence chip including a data stream based vector processor (e.g., data stream based vector processor 500) of any of the embodiments described above.
The embodiments of the present disclosure also provide an accelerator including the artificial intelligence chip of any one of the above embodiments.
Data flow based computing methods according to some embodiments of the present disclosure are described below with reference to Figs. 6 and 7.
Fig. 6 illustrates a data flow based computing method according to some embodiments of the present disclosure.
In step 602, a first calculation circuit performs a first calculation on a first set of data to obtain a first set of calculation results.
In step 604, a second calculation circuit, in parallel with the first calculation circuit, performs a second calculation on the second set of data to obtain a second set of calculation results.
In step 606, where the third set of data flowing to the third computing circuit includes the first set of computing results and the second set of computing results, the third computing circuit performs a third computation on the third set of data to obtain a third set of computing results.
Here, at least one of the duration of the first calculation and the duration of the second calculation may partially coincide with the duration of the third calculation.
In some embodiments, the first calculation may be the same as the second calculation.
In some embodiments, the data flow based computing method may further include the steps of:
in the case where the fourth set of data flowing to the fourth calculation circuit includes the third set of calculation results, the fourth calculation circuit performs the fourth calculation on the fourth set of data to obtain the fourth set of calculation results.
Here, the fourth calculation is different from both the first calculation and the third calculation, and the duration of the fourth calculation may partially coincide with the duration of the third calculation.
In some embodiments, the fourth calculation may be an inverse of the first calculation. For example, the first calculation may be a fixed point number to floating point number calculation and the fourth calculation may be a floating point number to fixed point number calculation.
Fig. 7 illustrates a data flow based computing method according to further embodiments of the present disclosure.
In step 702, a second calculation circuit in parallel with the first calculation circuit performs a second calculation on a second set of data to obtain a second set of calculation results.
In step 704, in the case where the fifth set of data flowing to the fifth calculation circuit includes the second set of calculation results, the fifth calculation circuit performs a fifth calculation on the fifth set of data to obtain the fifth set of calculation results.
Here, the fifth calculation may be different from the second calculation, and the duration of the fifth calculation may partially coincide with the duration of the second calculation.
In some embodiments, the fifth calculation may be an exponential function calculation.
In some embodiments, the data flow based computing method may further include the steps of:
in the case where the fourth set of data flowing to the fourth calculation circuit includes the fifth set of calculation results, the fourth calculation circuit further performs a fourth calculation on the fourth set of data.
In some embodiments, the first computing circuit and the second computing circuit may correspond one-to-one to two sets of switches, each set of switches including the first switch and the second switch. The first switch corresponding to the first computing circuit can be connected in parallel with the first computing circuit, and the second switch corresponding to the first computing circuit can be connected in series with the first computing circuit and connected in parallel with the first switch corresponding to the first computing circuit; the first switch corresponding to the second computing circuit may be connected in parallel with the second computing circuit, and the second switch corresponding to the second computing circuit may be connected in series with the second computing circuit and connected in parallel with the first switch corresponding to the second computing circuit.
In this case, the third set of data flowing to the third calculation circuit may be different depending on the different conduction conditions of the two sets of switches.
In the first condition, the third set of data flowing to the third computing circuit may include the first set of data and the second set of calculation results. Here, the first condition may include that the first switch corresponding to the first calculation circuit is turned on, the second switch corresponding to the first calculation circuit is turned off, the first switch corresponding to the second calculation circuit is turned off, and the second switch corresponding to the second calculation circuit is turned on.
In the second condition, the third set of data flowing to the third computing circuit may include the second set of data and the first set of computing results. Here, the second condition may include that the first switch corresponding to the first calculation circuit is turned off, the corresponding second switch is turned on, and the first switch corresponding to the second calculation circuit is turned on, the corresponding second switch is turned off.
In a third condition, the third set of data flowing to the third computing circuit may include the first set of data and the second set of data. Here, the third condition may include that both the first switch corresponding to the first calculation circuit and the first switch corresponding to the second calculation circuit are on, and both the second switch corresponding to the first calculation circuit and the second switch corresponding to the second calculation circuit are off.
In some embodiments, the first, second, and third computing circuits may correspond one-to-one to three sets of switches, each set of switches including the first switch and the second switch. The first switch corresponding to the first computing circuit can be connected in parallel with the first computing circuit, and the second switch corresponding to the first computing circuit can be connected in series with the first computing circuit and connected in parallel with the first switch corresponding to the first computing circuit; the first switch corresponding to the second calculation circuit can be connected in parallel with the second calculation circuit, and the second switch corresponding to the second calculation circuit can be connected in series with the second calculation circuit and connected in parallel with the first switch corresponding to the second calculation circuit; the first switch corresponding to the third computing circuit may be connected in parallel with the third computing circuit, and the second switch corresponding to the third computing circuit may be connected in series with the third computing circuit and connected in parallel with the first switch corresponding to the third computing circuit.
In this case, the fourth set of data flowing to the fourth calculation circuit may be different according to the different conduction conditions of the three sets of switches.
In a fourth condition, the fourth set of data flowing to the fourth computing circuit may include the first set of data. Here, the fourth condition may include that the first switch corresponding to the first calculation circuit and the first switch corresponding to the third calculation circuit are both on, and the second switch corresponding to the first calculation circuit, the second switch corresponding to the third calculation circuit, and the group of switches corresponding to the second calculation circuit are both off.
In a fifth condition, the fourth set of data flowing to the fourth computing circuit may include the first set of computing results. Here, the fifth condition may include that the first switch corresponding to the first calculation circuit is turned off, the corresponding second switch is turned on, and the first switch corresponding to the third calculation circuit is turned on, the second switch corresponding to the third calculation circuit is turned off, and the group of switches corresponding to the second calculation circuit is turned off.
In a sixth condition, the fourth set of data flowing to the fourth computing circuit may include the second set of data. Here, the sixth condition may include that the first switch corresponding to the second calculation circuit is turned on, the corresponding second switch is turned off, and the group of switches corresponding to the first calculation circuit and the group of switches corresponding to the third calculation circuit are both turned off.
In a seventh condition, the fourth set of data flowing to the fourth computing circuit may include the second set of computing results. Here, the seventh condition may include that the first switch corresponding to the second calculation circuit is turned off, the corresponding second switch is turned on, and the group of switches corresponding to the first calculation circuit and the group of switches corresponding to the third calculation circuit are both turned off.
In some embodiments, where the second computing circuit corresponds to a set of switches, that set of switches may include a first switch connected in parallel with the second computing circuit and a second switch connected in series with the second computing circuit and in parallel with the first switch corresponding to the second computing circuit.
The fifth set of data flowing to the fifth computing circuit may differ depending on the conduction state of the set of switches corresponding to the second computing circuit.
In an eighth condition, the fifth set of data flowing to the fifth computing circuit may include the second set of data. Here, the eighth condition may include that the first switch corresponding to the second computing circuit is on and the second switch corresponding to the second computing circuit is off.
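A minimal Python sketch of the eighth condition, with hypothetical names: when the first switch corresponding to the second computing circuit is on and its second switch is off, the raw second set of data bypasses that circuit and flows to the fifth computing circuit; with the opposite switch states, the second set of computation results would flow there instead, matching the earlier embodiments in which the fifth set of data includes the second set of computation results.

def fifth_set_of_data(second_set, second_comp, first_switch_on, second_switch_on):
    if first_switch_on and not second_switch_on:
        return second_set               # eighth condition: the raw second set of data
    if second_switch_on and not first_switch_on:
        return second_comp(second_set)  # the second set of computation results instead
    return None                         # path disabled or undefined switch pattern

assert fifth_set_of_data([3, 4], lambda d: d, True, False) == [3, 4]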
For a detailed description of the above data stream based computing method, reference may be made to the related embodiments of the foregoing data stream based computing unit, which are not repeated here.
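By way of an illustrative software model only, the following Python sketch strings the computing circuits of the foregoing embodiments into a single pass over two sets of data. It assumes the example computations given in this disclosure (a fixed-point-to-floating-point conversion as the first and second computations, an exponential function as the fifth computation, and the inverse floating-point-to-fixed-point conversion as the fourth computation) and a hypothetical elementwise addition as the third computation; the function names are not part of the disclosure. In the circuit the durations of these computations partially overlap, whereas the stages are shown sequentially here only to make the data dependencies explicit.

import math

def fixed_to_float(data, frac_bits=8):
    # first/second computation: interpret fixed-point integers as floating-point values
    return [x / (1 << frac_bits) for x in data]

def float_to_fixed(data, frac_bits=8):
    # fourth computation: the inverse of the first computation
    return [round(x * (1 << frac_bits)) for x in data]

def exp_comp(data):
    # fifth computation: an exponential function computation
    return [math.exp(x) for x in data]

def third_comp(a, b):
    # hypothetical third computation: elementwise addition of the two sets of results
    return [x + y for x, y in zip(a, b)]

first_set, second_set = [256, 512], [128, 64]
first_results = fixed_to_float(first_set)                   # first computing circuit
second_results = fixed_to_float(second_set)                 # second computing circuit, in parallel
third_results = third_comp(first_results, second_results)   # third computing circuit
fourth_results = float_to_fixed(third_results)              # fourth computing circuit
fifth_results = exp_comp(second_results)                    # fifth computing circuit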
Thus, various embodiments of the present disclosure have been described in detail. In order to avoid obscuring the concepts of the present disclosure, some details known in the art are not described. How to implement the solutions disclosed herein will be fully apparent to those skilled in the art from the above description.
Although some specific embodiments of the present disclosure have been described in detail by way of example, it should be understood by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the present disclosure. It will be understood by those skilled in the art that the foregoing embodiments may be modified and equivalents substituted for elements thereof without departing from the scope and spirit of the disclosure. The scope of the present disclosure is defined by the appended claims.

Claims (22)

1. A data stream based computing unit comprising a plurality of computing circuits, the plurality of computing circuits comprising:
a first computing circuit configured to perform a first computation on a first set of data to obtain a first set of computation results;
a second computing circuit, in parallel with the first computing circuit, configured to perform a second computation on a second set of data to obtain a second set of computation results; and
a third computing circuit configured to perform a third computation on a third set of data to obtain a third set of computation results in a case where the third set of data flowing to the third computing circuit includes the first set of computation results and the second set of computation results, wherein at least one of a duration of the first computation and a duration of the second computation partially coincides with a duration of the third computation.
2. The computing unit of claim 1, wherein the first computation is the same as the second computation.
3. The computing unit of claim 2, wherein the plurality of computing circuits further comprises:
a fourth computing circuit configured to perform a fourth computation on a fourth set of data to obtain a fourth set of computation results in a case where the fourth set of data flowing to the fourth computing circuit includes the third set of computation results, wherein the fourth computation is different from both the first computation and the third computation, and a duration of the fourth computation partially overlaps a duration of the third computation.
4. The computing unit of claim 3, wherein the fourth computation is an inverse of the first computation.
5. The computing unit of claim 4, wherein the first computation is a fixed-point-to-floating-point computation and the fourth computation is a floating-point-to-fixed-point computation.
6. The computing unit of any of claims 3-5, wherein the plurality of computing circuits further comprises:
a fifth computing circuit configured to perform a fifth computation on a fifth set of data to obtain a fifth set of computation results in a case where the fifth set of data flowing to the fifth computing circuit includes the second set of computation results, wherein the fifth computation is different from the second computation, and a duration of the fifth computation partially coincides with a duration of the second computation;
wherein the fourth computing circuit is further configured to perform the fourth computation on the fourth set of data in a case where the fourth set of data includes the fifth set of computation results.
7. The computing unit of claim 6, wherein the fifth computation is an exponential function computation.
8. The computing unit of claim 1, further comprising at least one set of switches in one-to-one correspondence with at least one of the plurality of computing circuits, each set of switches comprising a first switch in parallel with a corresponding computing circuit, and a second switch in series with the corresponding computing circuit and in parallel with the first switch.
9. The computing unit of claim 8, wherein the at least one set of switches comprises two sets of switches in one-to-one correspondence with at least two computing circuits.
10. The computing unit of claim 6, wherein the fourth computing circuit and the fifth computing circuit correspond one-to-one to two sets of switches, each set of switches comprising a first switch in parallel with a corresponding computing circuit, and a second switch in series with the corresponding computing circuit and in parallel with the first switch.
11. A data stream based vector processor comprising:
the data stream based computing unit of any one of claims 1-10.
12. The vector processor of claim 11, further comprising:
a data buffer configured to buffer the first set of data and the second set of data, and to transmit, in response to a first driving signal corresponding to a computing task, the first set of data to the first computing circuit and the second set of data to the second computing circuit;
a direct memory access module configured to read the first set of data and the second set of data from an off-chip memory in response to a second driving signal corresponding to the computing task, and to transmit the first set of data and the second set of data to the data buffer; and
a control register configured to transmit the first driving signal and the second driving signal in response to a control signal.
13. An artificial intelligence chip comprising:
the vector processor of claim 11 or 12.
14. A data stream based computing method, comprising:
a first computing circuit performs a first computation on a first set of data to obtain a first set of computation results;
a second computing circuit connected in parallel with the first computing circuit performs a second computation on a second set of data to obtain a second set of computation results; and
in a case where a third set of data flowing to a third computing circuit includes the first set of computation results and the second set of computation results, the third computing circuit performs a third computation on the third set of data to obtain a third set of computation results, wherein at least one of a duration of the first computation and a duration of the second computation partially coincides with a duration of the third computation.
15. The computing method of claim 14, wherein the first computation is the same as the second computation.
16. The computing method of claim 15, further comprising:
in a case where a fourth set of data flowing to a fourth computing circuit includes the third set of computation results, the fourth computing circuit performs a fourth computation on the fourth set of data to obtain a fourth set of computation results, wherein the fourth computation is different from both the first computation and the third computation, and a duration of the fourth computation partially overlaps a duration of the third computation.
17. The computing method of claim 16, wherein the fourth computation is an inverse of the first computation.
18. The computing method of claim 17, wherein the first computation is a fixed-point-to-floating-point computation and the fourth computation is a floating-point-to-fixed-point computation.
19. The computing method of any of claims 16-18, further comprising:
in a case where a fifth set of data flowing to a fifth computing circuit includes the second set of computation results, the fifth computing circuit performs a fifth computation on the fifth set of data to obtain a fifth set of computation results, wherein the fifth computation is different from the second computation, and a duration of the fifth computation partially overlaps a duration of the second computation; and
in a case where the fourth set of data includes the fifth set of computation results, the fourth computing circuit further performs the fourth computation on the fourth set of data.
20. The computing method of claim 19, wherein the fifth computation is an exponential function computation.
21. The computing method of claim 14, wherein the first computing circuit and the second computing circuit are in one-to-one correspondence with two sets of switches, each set of switches comprising a first switch and a second switch;
a first switch corresponding to the first computing circuit is connected in parallel with the first computing circuit, and a second switch corresponding to the first computing circuit is connected in series with the first computing circuit and is connected in parallel with the first switch corresponding to the first computing circuit;
the first switch corresponding to the second computing circuit is connected in parallel with the second computing circuit, and the second switch corresponding to the second computing circuit is connected in series with the second computing circuit and is connected in parallel with the first switch corresponding to the second computing circuit;
under a first condition, the third set of data comprises the first set of data and the second set of computation results, the first condition comprises that the first switch corresponding to the first computing circuit is turned on, the second switch corresponding to the first computing circuit is turned off, the first switch corresponding to the second computing circuit is turned off, and the second switch corresponding to the second computing circuit is turned on;
under a second condition, the third set of data comprises the second set of data and the first set of computation results, the second condition comprises that the first switch corresponding to the first computing circuit is turned off, the second switch corresponding to the first computing circuit is turned on, the first switch corresponding to the second computing circuit is turned on, and the second switch corresponding to the second computing circuit is turned off; and
under a third condition, the third set of data comprises the first set of data and the second set of data, the third condition comprises that the first switch corresponding to the first computing circuit and the first switch corresponding to the second computing circuit are both turned on, and the second switch corresponding to the first computing circuit and the second switch corresponding to the second computing circuit are both turned off.
22. The computing method of any of claims 16-18, wherein the first, second, and third computing circuits are in one-to-one correspondence with three sets of switches, each set of switches comprising a first switch and a second switch;
a first switch corresponding to the first computing circuit is connected in parallel with the first computing circuit, and a second switch corresponding to the first computing circuit is connected in series with the first computing circuit and is connected in parallel with the first switch corresponding to the first computing circuit;
the first switch corresponding to the second computing circuit is connected in parallel with the second computing circuit, and the second switch corresponding to the second computing circuit is connected in series with the second computing circuit and is connected in parallel with the first switch corresponding to the second computing circuit;
the first switch corresponding to the third computing circuit is connected in parallel with the third computing circuit, and the second switch corresponding to the third computing circuit is connected in series with the third computing circuit and is connected in parallel with the first switch corresponding to the third computing circuit;
under a fourth condition, the fourth set of data comprises the first set of data, the fourth condition comprises that the first switch corresponding to the first computing circuit and the first switch corresponding to the third computing circuit are both turned on, and the second switch corresponding to the first computing circuit, the second switch corresponding to the third computing circuit, and the set of switches corresponding to the second computing circuit are all turned off;
under a fifth condition, the fourth set of data comprises the first set of computation results, the fifth condition comprises that the first switch corresponding to the first computing circuit is turned off and the second switch corresponding to the first computing circuit is turned on, the first switch corresponding to the third computing circuit is turned on and the second switch corresponding to the third computing circuit is turned off, and the set of switches corresponding to the second computing circuit is turned off;
under a sixth condition, the fourth set of data comprises the second set of data, the sixth condition comprises that the first switch corresponding to the second computing circuit is turned on and the second switch corresponding to the second computing circuit is turned off, and the set of switches corresponding to the first computing circuit and the set of switches corresponding to the third computing circuit are both turned off; and
under a seventh condition, the fourth set of data comprises the second set of computation results, the seventh condition comprises that the first switch corresponding to the second computing circuit is turned off and the second switch corresponding to the second computing circuit is turned on, and the set of switches corresponding to the first computing circuit and the set of switches corresponding to the third computing circuit are both turned off.