CN115328436A - Calculation method and device of multiple accumulators, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115328436A
Authority
CN
China
Prior art keywords
data
accumulator
group
input
inter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211237771.3A
Other languages
Chinese (zh)
Inventor
梁监天
蔡权雄
牛昕宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Corerain Technologies Co Ltd
Original Assignee
Shenzhen Corerain Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Corerain Technologies Co Ltd
Priority to CN202211237771.3A
Publication of CN115328436A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/491 Computations with decimal numbers radix 12 or 20
    • G06F7/498 Computations with decimal numbers radix 12 or 20 using counter-type accumulators

Landscapes

  • Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application provides a method and a device for calculating a plurality of accumulators, an electronic device and a storage medium. The method comprises: for each category of data to be accumulated, dividing the data into n arrays and performing the following target operations for each category of data: extracting a data item from an array and inputting it to input end A of an intra-group accumulator, and inputting the output result obtained from the output end of the intra-group accumulator to input end B of the intra-group accumulator, wherein data are input once every clock cycle or once every several clock cycles; after all the data have been input into the intra-group accumulator, inputting the data of the next category into the intra-group accumulator according to the target operations; determining the n initial accumulation results output by the intra-group accumulator; and summing the n initial accumulation results through at least one inter-group accumulator to obtain the accumulation result of all the data. The method and the device improve the calculation efficiency of the accumulator.

Description

Calculation method and device of multiple accumulators, electronic equipment and storage medium
Technical Field
The present application relates to the field of accumulator technologies, and in particular, to a method and an apparatus for calculating multiple accumulators, an electronic device, and a storage medium.
Background
With the rapid development of deep learning, convolutional neural networks have been widely applied in machine vision tasks such as image recognition and image classification. Accumulation is one of the most important operations in convolutional neural network computation, and improving the efficiency of accumulation is a key technology in the architecture of data stream chips.
The working principle of a conventional accumulator is as follows:
a conventional accumulation unit generally comprises an accumulator and the corresponding data control logic. In FIG. 1, T0 to Tn are time scales in units of the chip's working clock cycle, and D0 to Dn are the data to be accumulated. With the time scale set by the clock cycle T, data D0 reaches the accumulator at time T0, D1 reaches it at time T1, …, and Dn reaches it at time Tn.
At time T0, D0 and 0 are sent to the two input interfaces A and B of the accumulator, respectively;
at time T1, the accumulator has calculated the first sum D0 + 0 = D0; at the same time D1 arrives, so the first result D0 is sent back to interface B of the accumulator and D1 is sent to interface A;
at time T2, D0 + D1 has been calculated and D2 arrives, so the second result D0 + D1 is sent back to interface B of the accumulator and D2 is sent to interface A;
and so on, until the accumulated result D0 + D1 + D2 + D3 + … + Dn is finally obtained.
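The conventional single-cycle scheme of FIG. 1 can be sketched as a small Python model, assuming an adder whose result is ready exactly one clock cycle after its operands are applied. The function name `conventional_accumulate` and the trace format are illustrative, not part of the patent.

```python
from typing import List, Sequence, Tuple


def conventional_accumulate(data: Sequence[int]) -> Tuple[int, List[Tuple[int, int]]]:
    """Single-cycle accumulator: Dt arrives at time Tt, and the sum of
    inputs A and B is ready one cycle later and fed back to input B."""
    b = 0                                  # at T0, input B receives 0
    trace: List[Tuple[int, int]] = []
    for t, d in enumerate(data):
        b = d + b                          # A = Dt, B = previous sum; ready at T(t+1)
        trace.append((t + 1, b))           # (time the partial sum is available, value)
    return b, trace


total, trace = conventional_accumulate([1, 2, 3, 4])
print(total)   # 10
print(trace)   # [(1, 1), (2, 3), (3, 6), (4, 10)]
```

Each partial sum is available exactly when the next item arrives, so one item can be accepted per cycle.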
However, once the chip's operating frequency is increased, this conventional structure no longer works. A higher operating frequency shortens the clock cycle, so the accumulator needs longer than one clock cycle to compute, i.e. it cannot finish a calculation within one clock cycle T. For example, if D0 + 0 is computed starting at T0, the result is not available until T2, but by then two new data items (D1 and D2) have already reached the accumulator, so the pipelined accumulation described above cannot be carried out with this structure. Only one data item can then be input every two clock cycles, so the input data rate is low, accumulation is slow, and the efficiency of the accumulator is poor.
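A minimal timing sketch of this problem, assuming a single feedback loop and an adder whose result takes `latency` clock cycles (2 in the example above). The helper `naive_schedule` is hypothetical; it only counts cycles and does not model any particular chip.

```python
from typing import List, Tuple


def naive_schedule(num_items: int, latency: int) -> Tuple[List[int], int]:
    """Cycle at which each item can enter input A when one feedback loop
    must wait for every previous sum, plus the cycle the final sum is ready."""
    input_times: List[int] = []
    t = 0
    for _ in range(num_items):
        input_times.append(t)   # item enters A together with the fed-back sum on B
        t += latency            # the next sum is not ready until `latency` cycles later
    return input_times, t


times, done = naive_schedule(4, latency=2)
print(times, done)   # [0, 2, 4, 6] 8
```

With a latency of 2 the loop accepts one item every two cycles, half the rate of the single-cycle case.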
Disclosure of Invention
An object of the embodiments of the present application is to provide a method and an apparatus for calculating a plurality of accumulators, an electronic device, and a storage medium, so as to solve the problem of low efficiency of the accumulators. The specific technical scheme is as follows:
in a first aspect, a method for calculating a plurality of accumulators is provided, the method comprising:
for each category of data to be accumulated, dividing the data into n arrays, wherein n is determined according to the number of clock cycles contained in one operation cycle of the accumulator, and n is a positive integer greater than 1;
performing the following target operations for each category of data:
extracting a data item from an array and inputting it to input end A of an intra-group accumulator, and inputting the output result obtained from the output end of the intra-group accumulator to input end B of the intra-group accumulator, wherein data are input once every clock cycle or once every several clock cycles;
after all the data have been input into the intra-group accumulator, inputting the data of the next category into the intra-group accumulator according to the target operations;
determining the n initial accumulation results output by the intra-group accumulator;
and summing the n initial accumulation results through at least one inter-group accumulator to obtain a data accumulation result of all data.
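The overall method can be illustrated at the value level (ignoring clock timing) with the following Python sketch. It assumes the interval-based division mode described in the detailed description (array k holds items k, k+n, …), models each intra-group pipeline slot as a running sum fed back to input B, and chains the inter-group additions; all function names are hypothetical.

```python
from typing import Dict, List, Sequence


def split_strided(data: Sequence[int], n: int) -> List[List[int]]:
    # Assumed division mode: array k holds items k, k+n, k+2n, ...
    return [list(data[k::n]) for k in range(n)]


def accumulate_category(data: Sequence[int], n: int) -> int:
    if n < 2:
        raise ValueError("n must be a positive integer greater than 1")
    arrays = split_strided(data, n)
    # Intra-group stage: one running sum per array -> n initial accumulation results
    initial_results = []
    for arr in arrays:
        running = 0                      # input B starts at 0 (empty data)
        for item in arr:
            running = item + running     # A = item, B = fed-back output
        initial_results.append(running)
    # Inter-group stage: the first accumulator adds the two earliest results,
    # and each following accumulator adds one more result to the running total
    total = initial_results[0] + initial_results[1]
    for r in initial_results[2:]:
        total = total + r
    return total


def accumulate_all(categories: Dict[str, Sequence[int]], n: int) -> Dict[str, int]:
    # Categories are processed one after another, in insertion order
    return {name: accumulate_category(values, n) for name, values in categories.items()}


print(accumulate_all({"D": [1, 2, 3, 4, 5, 6], "H": [10, 20, 30]}, n=3))
# {'D': 21, 'H': 60}
```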
Optionally, determining the n initial accumulation results output by the intra-group accumulator comprises:
outputting, through the intra-group accumulator, the first two initial accumulation results to a first inter-group accumulator, wherein there is one first inter-group accumulator;
and outputting, through the intra-group accumulator, each of the remaining initial accumulation results to a respective second inter-group accumulator, wherein there are n-2 second inter-group accumulators.
Optionally, summing the n initial accumulation results through at least one inter-group accumulator to obtain the accumulation result of all the data comprises:
summing, through the first inter-group accumulator, the two initial accumulation results output earliest by the intra-group accumulator, and outputting an intermediate accumulation result to a second inter-group accumulator;
and adding, through the current second inter-group accumulator, the received intermediate accumulation result and one initial accumulation result, and outputting the new intermediate accumulation result to the next second inter-group accumulator, until all the initial accumulation results have been input to the inter-group accumulators, so that the last second inter-group accumulator outputs the accumulation result of all the data.
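The inter-group chain can be sketched as follows, assuming each accumulator simply adds its two inputs. `inter_group_chain` is a hypothetical helper; it also reports how many accumulators the chain uses, which works out to 1 + (n - 2) = n - 1.

```python
from typing import List, Sequence, Tuple


def inter_group_chain(initial_results: Sequence[int]) -> Tuple[int, List[int], int]:
    """Sum n initial results through a chain of inter-group accumulators.

    Returns (final result, intermediate results in order, accumulators used).
    """
    n = len(initial_results)
    if n < 2:
        raise ValueError("need at least two initial accumulation results")
    # First inter-group accumulator (exactly one): the two earliest results
    intermediate = initial_results[0] + initial_results[1]
    intermediates = [intermediate]
    # Second inter-group accumulators (n - 2 of them): each adds one more result
    for r in initial_results[2:]:
        intermediate = intermediate + r
        intermediates.append(intermediate)
    return intermediate, intermediates, 1 + (n - 2)


total, steps, count = inter_group_chain([5, 7, 9, 11])
print(total, steps, count)   # 32 [12, 21, 32] 3
```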
Optionally, the data are floating-point numbers or fixed-point numbers;
dividing the data into n arrays comprises: according to a preset division mode, starting from the first data item, grouping adjacent data items into the same array until all the data have been divided into n arrays.
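One way to realize the adjacent-data division is sketched below. The text does not say how many items each array receives, so the as-even-as-possible split (earlier arrays take any extra items) is an assumption, and `split_contiguous` is a hypothetical name.

```python
from typing import List, Sequence


def split_contiguous(data: Sequence[str], n: int) -> List[List[str]]:
    """Divide data into n arrays of adjacent items, starting from the first item.

    Assumption: array sizes differ by at most one, with earlier arrays
    taking the extra items.
    """
    if n < 2:
        raise ValueError("n must be greater than 1")
    base, extra = divmod(len(data), n)
    arrays: List[List[str]] = []
    start = 0
    for k in range(n):
        size = base + (1 if k < extra else 0)
        arrays.append(list(data[start:start + size]))   # adjacent items only
        start += size
    return arrays


print(split_contiguous(["D0", "D1", "D2", "D3", "D4", "D5", "D6"], 3))
# [['D0', 'D1', 'D2'], ['D3', 'D4'], ['D5', 'D6']]
```

Concatenating the arrays gives back the original sequence, which is what preserves the data order.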
Optionally, extracting a data item from an array and inputting it to input end A of the intra-group accumulator, and inputting the output result obtained from the output end of the intra-group accumulator to input end B of the intra-group accumulator, comprises:
according to a preset array cycle order, at each input instant, extracting a data item from one array and inputting it to input end A of the intra-group accumulator, and inputting the output result for that array obtained from the output end of the intra-group accumulator to input end B of the intra-group accumulator, wherein the array cycle order is the order of the arrays determined when the data were divided.
Optionally, extracting a data item from an array and inputting it to input end A of the intra-group accumulator comprises:
extracting, from the data in the array that have not yet been extracted, the data item arranged first according to the order of the data in the array;
and inputting that data item to input end A of the intra-group accumulator.
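The array-cycle extraction of the frontmost unextracted item can be sketched as a round-robin over per-array queues. How the hardware handles arrays of unequal length is not specified; this sketch simply skips an exhausted array, which is an assumption. `array_cycle_order` is a hypothetical helper.

```python
from collections import deque
from typing import List, Sequence


def array_cycle_order(arrays: Sequence[Sequence[str]]) -> List[str]:
    """Visit arrays in the fixed order 0, 1, ..., n-1, 0, 1, ... and take the
    frontmost not-yet-extracted item of the current array for input A."""
    queues = [deque(arr) for arr in arrays]    # remaining (unextracted) items per array
    order: List[str] = []
    k = 0
    while any(queues):
        if queues[k]:
            order.append(queues[k].popleft())  # frontmost unextracted item -> input A
        k = (k + 1) % len(queues)              # next array in the cycle order
    return order


print(array_cycle_order([["D0", "D3", "D6"], ["D1", "D4"], ["D2", "D5"]]))
# ['D0', 'D1', 'D2', 'D3', 'D4', 'D5', 'D6']
```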
In a second aspect, there is provided a computing device for a plurality of accumulators, the device comprising:
a dividing module, configured to divide, for each category of data to be accumulated, the data into n arrays, wherein n is determined according to the number of clock cycles contained in one operation cycle of the accumulator, and n is a positive integer greater than 1;
the following target operations are performed for each category of data:
an extraction module, configured to extract a data item from an array and input it to input end A of an intra-group accumulator, and to input the output result obtained from the output end of the intra-group accumulator to input end B of the intra-group accumulator;
an input module, configured to input, after all the data have been input into the intra-group accumulator, the data of the next category into the intra-group accumulator according to the target operations;
a determining module, configured to determine the n initial accumulation results output by the intra-group accumulator;
and an obtaining module, configured to sum the n initial accumulation results through at least one inter-group accumulator to obtain the accumulation result of all the data.
Optionally, the determining module is configured to:
output, through the intra-group accumulator, the first two initial accumulation results to a first inter-group accumulator, wherein there is one first inter-group accumulator;
and output, through the intra-group accumulator, each of the remaining initial accumulation results to a respective second inter-group accumulator, wherein there are n-2 second inter-group accumulators.
In a third aspect, an electronic device is provided, comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
the memory is configured to store a computer program;
and the processor is configured to implement the steps of any of the above methods for calculating a plurality of accumulators when executing the program stored in the memory.
In a fourth aspect, a computer-readable storage medium is provided, storing a computer program which, when executed by a processor, implements the steps of any of the above methods for calculating a plurality of accumulators.
The embodiments of the present application have the following beneficial effects:
the embodiments provide a method for calculating a plurality of accumulators in which data of the same category are divided into n arrays. For at least part of the data input, at each clock instant one data item is extracted from one array and input into the intra-group accumulator, while an output result is fed back into the intra-group accumulator. This prevents the input data rate from dropping: while one series of data is being input and output, the intra-group accumulator simultaneously accumulates other data, so it stays fully loaded, chip utilization is not reduced, and the calculation efficiency of the intra-group accumulator is improved.
After the intra-group accumulator obtains the initial accumulation results, the inter-group accumulators sum them, so the intra-group accumulator can receive the next category of data without any gap: the next category can be input without waiting for the intermediate accumulation results to be calculated, which further improves the calculation efficiency of the accumulators.
Of course, it is not necessary for any product or method of the present application to achieve all of the above advantages at the same time.
Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, those of ordinary skill in the art can derive other drawings from these drawings without creative effort.
FIG. 1 is a flow chart of accumulator calculation in the prior art;
FIG. 2 is a flow chart of a method for calculating a plurality of accumulators according to an embodiment of the present application;
FIG. 3 is a schematic diagram of two accumulators according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a computing apparatus with a plurality of accumulators according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without making creative efforts shall fall within the protection scope of the present application.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for convenience in describing the present application and have no specific meaning in themselves. Thus, "module" and "component" may be used interchangeably.
To address the problems noted in the background, according to an aspect of embodiments of the present application, an embodiment of a method for calculating a plurality of accumulators is provided.
The embodiments of the present application provide a method for calculating a plurality of accumulators, which can be applied to digital chip design, field-programmable gate array (FPGA) design, software design, servers and the like, and is used to improve the calculation efficiency of the accumulators.
A method for calculating a plurality of accumulators provided in an embodiment of the present application is described in detail below with reference to specific implementations. As shown in FIG. 2, the specific steps are as follows:
Step 201: For each category of data to be accumulated, divide the data into n arrays.
Wherein n is determined according to the number of clock cycles contained in the operation cycle of the accumulator, and n is a positive integer greater than 1.
In this embodiment, the data to be accumulated are divided into a plurality of categories. The control logic first inputs the data of one category into the accumulator for accumulation, and inputs the data of another category into the accumulator once all the data of that category have been input.
After the chip's operating frequency is increased, the accumulator can no longer complete a calculation within one clock cycle T; that is, its calculation cycle is long relative to the clock cycle, and one calculation cycle comprises n clock cycles.
The data of each category are divided into n arrays, and the arrays may contain the same or different numbers of data items.
For example, when the eight data items D0, D1, D2, D3, …, D7 are divided into 2 arrays, each array may contain 4 data items, or one array may contain 3 data items and the other 5.
The division mode is not limited in the present application: several consecutive data items may be grouped into one array, several data items may be selected at random to form one array, or data items may be assigned to an array at a preset interval of n-1, i.e. one item is taken into the array and the next n-1 items are skipped, and so on.
For example, when data items are assigned at a preset interval of n-1 with n = 2, every other item of D0, D1, D2, D3, …, Dn is placed in the same array, giving the two arrays D0, D2, D4, …, Dn-1 and D1, D3, D5, …, Dn.
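The interval-based division can be written as a stride slice in Python. `split_strided` is a hypothetical helper; with n = 2 and eight items it reproduces the two interleaved arrays listed above.

```python
from typing import List, Sequence


def split_strided(data: Sequence[str], n: int) -> List[List[str]]:
    """Assign every n-th item to the same array, i.e. n-1 items are
    skipped between consecutive members of one array."""
    if n < 2:
        raise ValueError("n must be greater than 1")
    return [list(data[k::n]) for k in range(n)]


items = [f"D{i}" for i in range(8)]
print(split_strided(items, 2))
# [['D0', 'D2', 'D4', 'D6'], ['D1', 'D3', 'D5', 'D7']]
```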
The control logic performs steps 202 through 205 for each category of data.
Step 202: Extract a data item from an array and input it to input A of the intra-group accumulator, and input the output result obtained from the output of the intra-group accumulator to input B of the intra-group accumulator.
Data are input once every clock cycle or once every several clock cycles.
Before the intra-group accumulator produces any output result, at each input instant the control logic takes a data item from an array, inputs it to input A of the intra-group accumulator, and inputs empty data (0) to input B.
Once the intra-group accumulator is able to produce an output result, a data item is extracted from the current array and input to input A of the intra-group accumulator, and the output result obtained from its output is input to input B. Data may be extracted by selecting an array at random, or sequentially according to a fixed array cycle order.
For example, suppose an array is selected at random for each extraction and n is 3:
at time T0, data are extracted from the first array and input into the intra-group accumulator;
at time T1, data are extracted from the second array and input into the intra-group accumulator;
at time T2, data are extracted from the third array and input into the intra-group accumulator;
at time T3, data are extracted from the first array and input into the intra-group accumulator (here, data within the first array are accumulated);
at time T4, data are extracted from the third array and input into the intra-group accumulator (here, data of the second array and the third array are accumulated together).
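The random-selection case can be illustrated with a small pipeline model. It assumes a fully pipelined intra-group accumulator with a latency of exactly n cycles, so the partial sum fed back to input B at cycle t is whatever entered at cycle t - n, regardless of which array that item came from. The helpers `pipelined_accumulate` and `random_order` are hypothetical. The model shows why initial results can mix arrays while the total stays the same.

```python
import random
from collections import deque
from typing import Dict, List, Sequence


def pipelined_accumulate(stream: Sequence[int], latency: int) -> List[int]:
    """A sum entered at cycle t leaves at cycle t + latency and is fed back
    to input B together with the item arriving at that cycle."""
    in_flight: Dict[int, int] = {}            # exit cycle -> partial sum
    for t, a in enumerate(stream):
        b = in_flight.pop(t, 0)               # output emerging now, or 0 (empty data)
        in_flight[t + latency] = a + b
    # Sums still in flight are the initial accumulation results, by exit cycle
    return [in_flight[t] for t in sorted(in_flight)]


def random_order(arrays: Sequence[Sequence[int]], seed: int) -> List[int]:
    rng = random.Random(seed)
    queues = [deque(arr) for arr in arrays]
    order: List[int] = []
    while any(queues):
        chosen = rng.choice([q for q in queues if q])   # random non-empty array
        order.append(chosen.popleft())                  # its frontmost item
    return order


arrays = [[1, 2, 3], [10, 20, 30], [100, 200, 300]]
stream = random_order(arrays, seed=7)
initial = pipelined_accumulate(stream, latency=3)
print(initial, sum(initial))   # the individual results vary with the seed; the sum is 666
```

With the fixed cycle order 1, 10, 100, 2, 20, 200, 3, 30, 300 the same model returns one result per array, [6, 60, 600].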
For example, suppose extraction follows a fixed array cycle order and n is 3. For the first, second and third arrays, data are extracted in turn from array one to array three, one data item from one array at each input instant, and input into the intra-group accumulator.
Normally, data are input once per clock cycle, which speeds up the input data flow. In some special cases, however, data are input once per clock cycle in an initial stage and once every several clock cycles in a later stage; these cases arise with floating-point and fixed-point numbers and are described in detail later.
Step 203: After all the data have been input into the intra-group accumulator, input the data of the next category into the intra-group accumulator according to the array cycle order.
After all the data of the current category have been input into the intra-group accumulator, the control logic continues by inputting the data of the next category into the intra-group accumulator according to the array cycle order.
For example, after the control logic sends Dn to the input interface of the intra-group accumulator at time Tn, the input of this category is finished, and at the next instant the control logic inputs H0 and 0 to the two input interfaces A and B of the intra-group accumulator, where the D-type data and the H-type data belong to different categories.
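The gap-free switch between categories can be sketched by tagging every partial sum with its category. The tag is a modeling device (an assumption), not something the text says the hardware stores: when the partial sum emerging at cycle t belongs to a different category than the item arriving at input A, it is handed off as an initial accumulation result and input B receives 0, as with H0 above. `run_categories` is a hypothetical helper, and a two-cycle latency is assumed.

```python
from typing import Dict, List, Sequence, Tuple


def run_categories(stream: Sequence[Tuple[str, int]], latency: int) -> List[Tuple[int, str, int]]:
    """Feed (category, value) pairs one per cycle into a pipelined
    intra-group accumulator and return (cycle, category, initial result)."""
    in_flight: Dict[int, Tuple[str, int]] = {}
    results: List[Tuple[int, str, int]] = []
    for t, (cat, a) in enumerate(stream):
        b = 0
        if t in in_flight:
            prev_cat, partial = in_flight.pop(t)
            if prev_cat == cat:
                b = partial                             # same category: feed back to B
            else:
                results.append((t, prev_cat, partial))  # boundary: hand off, B gets 0
        in_flight[t + latency] = (cat, a + b)
    for t in sorted(in_flight):                         # drain after the last input
        prev_cat, partial = in_flight[t]
        results.append((t, prev_cat, partial))
    return results


stream = [("D", v) for v in [1, 2, 3, 4]] + [("H", v) for v in [10, 20, 30, 40]]
print(run_categories(stream, latency=2))
# [(4, 'D', 4), (5, 'D', 6), (8, 'H', 40), (9, 'H', 60)]
```

Here Tn = T3, so the two D results leave at Tn+1 and Tn+2 while H0 enters at Tn+1 with 0 on input B; no input cycle is lost between the categories.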
Step 204: Determine the n initial accumulation results output by the intra-group accumulator.
After the control logic has input all the data into the intra-group accumulator, n initial accumulation results are obtained, because there are n arrays and data are extracted from one array at a time for accumulation. An initial accumulation result may be the accumulation of the data within one array, as in the array-cycle-order example above, or an accumulation of data from different arrays, as in the random-selection example above.
Step 205: Sum the n initial accumulation results through at least one inter-group accumulator to obtain the accumulation result of all the data.
The intra-group accumulator outputs the initial accumulation results to the inter-group accumulators, which add the n initial accumulation results to obtain the accumulation result of all the data. There may be one or more inter-group accumulators, and at most n-1.
In the present application, data of the same category are divided into n arrays. For at least part of the data input, at each clock instant one data item is extracted from one array and input into the intra-group accumulator, and an output result is fed back into the intra-group accumulator. This prevents the input data rate from dropping: while one series of data is being input and output, the intra-group accumulator simultaneously accumulates other data, so it stays fully loaded, chip utilization is not reduced, and the calculation efficiency of the intra-group accumulator is improved.
After the intra-group accumulator obtains the initial accumulation results, the inter-group accumulators sum them, so the intra-group accumulator can receive the next category of data with no gap and does not have to compute the intermediate accumulation results itself, which further improves the calculation efficiency of the accumulators.
For example, FIG. 3 is a schematic diagram of accumulation with two accumulators, where one addition takes two clock cycles. The data D0, D1, D2, D3, …, Dn to be accumulated reach the intra-group accumulator at times T0, T1, T2, T3, …, Tn, respectively.
At time T0, D0 and 0 are sent to the two input interfaces A and B of the intra-group accumulator, respectively;
at time T1, D1 and 0 are sent to the two input interfaces A and B of the intra-group accumulator, respectively;
at time T2, the intra-group accumulator has calculated the first sum D0 + 0 = D0; at the same time D2 arrives, so D0 is sent back to interface B of the intra-group accumulator and D2 is sent to interface A;
at time T3, D1 + 0 = D1 has been calculated and D3 arrives at the same time, so the second result D1 is sent back to interface B of the intra-group accumulator and D3 is sent to interface A;
...
at time Tn, D1 + D3 + D5 + … + Dn-2 has been calculated and Dn arrives at the same time, so D1 + D3 + D5 + … + Dn-2 is sent back to interface B of the intra-group accumulator and Dn is sent to interface A (data of the next category start to be input at the next instant after the input of this category is complete);
at time Tn+1, the initial accumulation result D0 + D2 + D4 + … + Dn-1 is obtained and output to input port A of the inter-group accumulator;
at time Tn+2, the initial accumulation result D1 + D3 + D5 + … + Dn is obtained and output to input port B of the inter-group accumulator;
...
at time Tn+4, the inter-group accumulator obtains the accumulation result (D0 + D2 + D4 + … + Dn-1) + (D1 + D3 + D5 + … + Dn).
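The FIG. 3 timeline can be reproduced with a cycle-level sketch. It assumes that both the intra-group and the inter-group accumulators are fully pipelined and take n clock cycles per addition, and that an inter-group addition starts as soon as both of its operands are available; `timeline` is a hypothetical helper and the data values are arbitrary.

```python
from typing import Dict, List, Sequence, Tuple


def timeline(data: Sequence[int], n: int) -> Tuple[List[Tuple[int, int]], List[int], int]:
    """Cycle-level sketch of intra-group plus chained inter-group accumulation.

    Item Dt arrives at cycle t; every addition takes exactly n cycles.
    Returns (initial results as (exit cycle, value), inter-group finish cycles, sum).
    """
    if n < 2 or len(data) < n:
        raise ValueError("need n >= 2 and at least n data items")
    in_flight: Dict[int, int] = {}               # exit cycle -> partial sum
    for t, a in enumerate(data):
        b = in_flight.pop(t, 0)                  # fed-back sum from cycle t - n, else 0
        in_flight[t + n] = a + b
    initial = sorted(in_flight.items())          # the n initial accumulation results
    # First inter-group accumulator starts when the second initial result is out
    (_, v0), (t1, v1) = initial[0], initial[1]
    done, acc = t1 + n, v0 + v1
    finish = [done]
    # Each second inter-group accumulator waits for both operands, then takes n cycles
    for t_k, v_k in initial[2:]:
        done = max(done, t_k) + n
        acc += v_k
        finish.append(done)
    return initial, finish, acc


data = list(range(1, 9))                   # D0..D7, so Tn = T7
initial, finish, total = timeline(data, n=2)
tn = len(data) - 1
print([(t - tn, v) for t, v in initial])   # [(1, 16), (2, 20)]
print([t - tn for t in finish], total)     # [4] 36
```

Under the same assumptions, nine items with n = 3 give initial results at Tn+1, Tn+2 and Tn+3 and inter-group completions at Tn+5 and Tn+8, matching the three-cycle example below.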
As an optional implementation, determining the n initial accumulation results output by the intra-group accumulator comprises: outputting, through the intra-group accumulator, the first two initial accumulation results to a first inter-group accumulator, wherein there is one first inter-group accumulator; and outputting, through the intra-group accumulator, each of the remaining initial accumulation results to a respective second inter-group accumulator, wherein there are n-2 second inter-group accumulators.
The first two initial accumulation results are output to the single first inter-group accumulator, which outputs an intermediate accumulation result to a second inter-group accumulator; each subsequent inter-group accumulator, once it completes its calculation, outputs its intermediate accumulation result to the next inter-group accumulator. There are n initial accumulation results: the first inter-group accumulator handles two of them, and each of the remaining n-2 results is sent to its own second inter-group accumulator. There are therefore n-2 second inter-group accumulators and n-1 inter-group accumulators in total.
As an optional implementation, summing the n initial accumulation results through at least one inter-group accumulator to obtain the accumulation result of all the data comprises: summing, through the first inter-group accumulator, the two initial accumulation results output earliest by the intra-group accumulator, and outputting an intermediate accumulation result to a second inter-group accumulator; and adding, through the current second inter-group accumulator, the received intermediate accumulation result and one initial accumulation result, and outputting the new intermediate accumulation result to the next second inter-group accumulator, until all the initial accumulation results have been input to the inter-group accumulators, so that the last second inter-group accumulator outputs the accumulation result of all the data.
The intra-group accumulator outputs the first two initial accumulation results to the first inter-group accumulator, which sums them and outputs an intermediate accumulation result to a second inter-group accumulator. After this second inter-group accumulator receives one initial accumulation result from the intra-group accumulator, it sums that result and the received intermediate accumulation result to obtain a new intermediate accumulation result, which it outputs to the next second inter-group accumulator. The next second inter-group accumulator sums its received initial and intermediate accumulation results in the same way, and so on until no initial accumulation result remains to be output to a second inter-group accumulator; the last second inter-group accumulator then outputs the accumulation result of all the data.
For example, suppose the operation cycle comprises 3 clock cycles. The control logic inputs three arrays of data into the intra-group accumulator in interleaved order: D0, D3, …, Dn-2; D1, D4, …, Dn-1; and D2, D5, …, Dn. The calculation proceeds as follows:
at time T0, D0 and 0 are sent to the two input interfaces A and B of the intra-group accumulator, respectively;
at time T1, D1 and 0 are sent to the two input interfaces A and B of the intra-group accumulator, respectively;
at time T2, D2 and 0 are sent to the two input interfaces A and B of the intra-group accumulator, respectively;
at time T3, the intra-group accumulator has calculated the output result D0 + 0 = D0; at the same time D3 arrives, so D0 is sent back to interface B of the intra-group accumulator and D3 is sent to interface A;
at time T4, the intra-group accumulator has calculated the output result D1 + 0 = D1; at the same time D4 arrives, so D1 is sent back to interface B of the intra-group accumulator and D4 is sent to interface A;
at time T5, the intra-group accumulator has calculated the output result D2 + 0 = D2; at the same time D5 arrives, so D2 is sent back to interface B of the intra-group accumulator and D5 is sent to interface A;
...
at time Tn, the intra-group accumulator has calculated the output result D2 + D5 + … + Dn-3 and Dn arrives at the same time, so D2 + D5 + … + Dn-3 is sent back to interface B of the intra-group accumulator and Dn is sent to interface A (at this point all the data of this category have been input to the accumulator);
at time Tn+1, the intra-group accumulator has calculated the initial accumulation result D0 + D3 + … + Dn-2;
at time Tn+2, the intra-group accumulator has calculated the initial accumulation result D1 + D4 + … + Dn-1; D0 + D3 + … + Dn-2 is output to input port A of inter-group accumulator r, and D1 + D4 + … + Dn-1 is output to input port B of inter-group accumulator r;
at time Tn+3, the intra-group accumulator has calculated the initial accumulation result D2 + D5 + … + Dn;
at time Tn+5, inter-group accumulator r obtains the intermediate accumulation result (D0 + D3 + … + Dn-2) + (D1 + D4 + … + Dn-1) and sends it to input port A of inter-group accumulator w, while the intra-group accumulator sends the initial accumulation result D2 + D5 + … + Dn to input port B of inter-group accumulator w;
at time Tn+8, inter-group accumulator w outputs the accumulation result ((D0 + D3 + … + Dn-2) + (D1 + D4 + … + Dn-1)) + (D2 + D5 + … + Dn).
As the above shows, adding inter-group accumulators means the intra-group accumulator is no longer occupied with summing the final initial accumulation results. The next type of data can therefore be input into the intra-group accumulator immediately, with no gap between the operations on different types of data, which raises the operation rate. Intra-group and inter-group accumulation use different accumulators, so computation never stalls, the accumulators work as a pipeline, and efficiency is maximized.
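The interleaved timeline above can be reproduced with a small cycle-level model. This is a sketch under stated assumptions, not the patent's circuit: the function name `intra_group_accumulate` is hypothetical, the adder is assumed to be fully pipelined with a latency of 3 clock cycles (matching D0 entering at T0 and D0 + 0 leaving at T3), and element Di is assumed to enter input A at cycle Ti.

```python
from collections import deque


def intra_group_accumulate(data, latency=3):
    """Cycle-level sketch of the intra-group accumulator (hypothetical model).

    Assumptions: the adder accepts one (A, B) pair per clock cycle and emits
    A + B exactly `latency` cycles later; element i enters input A at cycle i.
    Input B gets 0 during the first `latency` cycles and afterwards the sum
    emerging in that same cycle, so element i joins the running sum of
    elements i - latency, i - 2*latency, ...
    Returns the in-flight sums after the last input as (ready_cycle, value);
    with at least `latency` elements these are the initial accumulation results.
    """
    pipeline = deque()  # in-flight sums, oldest first: (ready_cycle, value)
    for cycle, value in enumerate(data):
        b = 0
        if pipeline and pipeline[0][0] == cycle:
            b = pipeline.popleft()[1]  # the sum emerging now goes back to input B
        pipeline.append((cycle + latency, value + b))  # new element on input A
    # After the last element (cycle Tn) the sums still in flight are ready at
    # Tn+1, ..., Tn+latency: these are the initial accumulation results.
    return list(pipeline)


data = list(range(1, 10))  # D0..D8 = 1..9, i.e. n = 8 in the text's notation
initial = intra_group_accumulate(data)
print(initial)  # [(9, 12), (10, 15), (11, 18)]
# 12 = D0+D3+D6, 15 = D1+D4+D7, 18 = D2+D5+D8; together they give the plain total.
print(sum(v for _, v in initial))  # 45
```

Because the latency equals the number of arrays, the sum that emerges in each cycle always belongs to the same array as the element arriving in that cycle, which is why this sketch needs no explicit array bookkeeping.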
The control logic determines the initial accumulation result obtained in the current round, and an inter-group accumulator adds it to the intermediate accumulation result output in the previous round to obtain a new intermediate accumulation result. A new intermediate accumulation result is obtained xn + 2 clock cycles after Tn: for example, the intermediate accumulation result (D0 + D3 + … + Dn-2) + (D1 + D4 + … + Dn-1) + (D2 + D5 + … + Dn) is obtained at Tn+8, which is xn + 2 = 8 clock cycles after Tn, with x = 2 and n = 3.
Here x is a positive integer greater than 1. As x increases, the control logic repeats the above inter-group accumulation until all initial accumulation results have taken part in the addition, which completes the data accumulation result of all data.
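One way to read the xn + 2 rule is as the finishing time of the x-th inter-group stage. The sketch below checks that reading under two assumptions that are ours, not stated in the text: every inter-group adder has the same n-cycle latency as the intra-group one (consistent with r taking Tn+2 to Tn+5 and w taking Tn+5 to Tn+8), and the n initial accumulation results become ready at Tn+1, …, Tn+n. `inter_group_chain` is a hypothetical name.

```python
def inter_group_chain(initial, latency):
    """Chain of inter-group accumulators (hypothetical timing model).

    `initial` lists the n initial accumulation results as (ready_offset, value),
    with offsets in clock cycles after Tn. Each adder is assumed to have the given
    `latency` and to start as soon as both of its operands are available.
    Returns (output_offset, intermediate_result) per stage: stage 1 is the first
    inter-group accumulator (r in the example), the remaining n-2 stages are the
    second inter-group accumulators (w, ...).
    """
    (t0, v0), (t1, v1) = initial[0], initial[1]
    ready, acc = max(t0, t1) + latency, v0 + v1  # first inter-group accumulator
    stages = [(ready, acc)]
    for t, v in initial[2:]:
        # Next second inter-group accumulator: previous intermediate result + one initial result.
        ready, acc = max(ready, t) + latency, acc + v
        stages.append((ready, acc))
    return stages


# Example from the text: n = 3, initial results ready at Tn+1, Tn+2, Tn+3.
print(inter_group_chain([(1, 12), (2, 15), (3, 18)], latency=3))
# [(5, 27), (8, 45)]  i.e. r finishes at Tn+5 and w at Tn+8

# With latency n, stage x finishes at Tn + x*n + 2.
for n in range(2, 7):
    stages = inter_group_chain([(k, 0) for k in range(1, n + 1)], latency=n)
    assert [t for t, _ in stages] == [x * n + 2 for x in range(1, n)]
```

Under these assumptions each second inter-group accumulator waits for its predecessor rather than for the intra-group accumulator, because Tn + xn + 2 is never earlier than Tn + x + 2, the time at which the next initial result is ready; this is why the finishing times grow by n per stage.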
As an alternative embodiment, if the data are not divided so that adjacent data fall into the same array, the order in which the accumulator accumulates the data is disturbed. For example, if the input sequence D0, D1, D2, D3, …, Dn is divided into two arrays by placing every other element into the same array, the data are finally accumulated in the order D0, D2, …, Dn-1, D1, D3, …, Dn.
For floating-point and fixed-point numbers, accumulation must follow the data order, i.e. D0 + D1 + D2 + … + Dn. Such data must therefore be divided starting from the first element, with adjacent data placed in the same array, until all data have been divided into n arrays.
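The effect of the division order on floating-point results can be seen with IEEE 754 doubles (Python floats). The values below are chosen only to make rounding visible; `ordered_sum` and `sum_of_arrays` are hypothetical helpers, and each array's partial sum is assumed to be combined in array order, as in the final step of the method.

```python
def ordered_sum(values):
    """Strict left-to-right double addition.

    An explicit loop is used because recent CPython versions compensate
    float rounding inside the built-in sum().
    """
    total = 0.0
    for v in values:
        total += v
    return total


def sum_of_arrays(arrays):
    """Accumulate each array in order, then add the per-array results in array order."""
    return ordered_sum([ordered_sum(a) for a in arrays])


data = [1e17, -1e17, 1.0, 1.0]  # D0..D3
interleaved = [data[0::2], data[1::2]]  # [D0, D2], [D1, D3]: order D0, D2, D1, D3
contiguous = [data[:2], data[2:]]  # [D0, D1], [D2, D3]: original order kept

print(ordered_sum(data))  # 2.0
print(sum_of_arrays(interleaved))  # 0.0 (1e17 + 1.0 rounds back to 1e17)
print(sum_of_arrays(contiguous))  # 2.0
```

For this particular input the contiguous grouping reproduces the left-to-right result exactly, while the interleaved grouping loses both 1.0 terms to rounding. It illustrates the ordering issue; it is not a proof that contiguous grouping is bit-identical to a strictly sequential sum for every input.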
During data input, at each input time one element is extracted from an array, following the array cycle order determined when the data were divided, and fed to input A of the accumulator, while that array's output result obtained from the accumulator output is fed to input B. Within each array, elements are extracted in their order of arrangement.
For example, D0, D1, D2, …, D7 are divided into 3 arrays: D0 and D1 form array 1, D2, D3 and D4 form array 2, and D5, D6 and D7 form array 3.
At time T0, D0 is extracted from array 1 and input into the accumulator together with 0;
at time T1, D2 is extracted from array 2 and input into the accumulator together with 0;
at time T2, D5 is extracted from array 3 and input into the accumulator together with 0;
at time T3, the output result D0 + 0 of array 1 is obtained, and D1 is extracted and input into the accumulator;
at time T4, the output result D2 + 0 of array 2 is obtained, and D3 is extracted and input into the accumulator;
at time T5, the output result D5 + 0 of array 3 is obtained, and D6 is extracted and input into the accumulator;
at time T6, the output result D0 + D1 of array 1 is obtained (no data is input at this time);
at time T7, the output result D2 + D3 of array 2 is obtained, and D4 is extracted and input into the accumulator;
at time T8, the output result D5 + D6 of array 3 is obtained, and D7 is extracted and input into the accumulator;
...
at time T10, the output result D2 + D3 + D4 of array 2 is obtained;
at time T11, the output result D5 + D6 + D7 of array 3 is obtained;
finally, the initial accumulation results are added: (D0 + D1) + (D2 + D3 + D4) + (D5 + D6 + D7).
Therefore, the data are accumulated in the order D0, D1, D2, …, D7, and the precision of the accumulation result of floating-point or fixed-point numbers is not affected. Note that no data is input at time T6: to keep the data in order, the two inputs of the accumulator at any one time must belong to the same array, so the remaining data of arrays 2 and 3 are accumulated at T7 and T8, and one idle clock cycle separates the inputs at T5 and T7. The data input period may therefore span multiple clock cycles.
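The T0 to T11 schedule above can be replayed with the following sketch. Its assumptions are ours: the accumulator latency equals the number of arrays n (3 here), so the sum emerging at cycle t belongs to the array whose element entered at cycle t - n; when an array's data are used up, its emerging sum is taken as that array's initial accumulation result and the slot stays idle. `contiguous_split` and `ordered_schedule` are hypothetical names, and the split gives any extra elements to the later arrays, matching the 2/3/3 division of D0..D7.

```python
from collections import deque


def contiguous_split(data, n):
    """Divide data into n arrays of adjacent elements, starting from the first element.

    Later arrays take the extra elements, which reproduces the 2/3/3 split of D0..D7.
    """
    q, r = divmod(len(data), n)
    arrays, start = [], 0
    for k in range(n):
        size = q + (1 if k >= n - r else 0)
        arrays.append(data[start:start + size])
        start += size
    return arrays


def ordered_schedule(data, n):
    """Replay the input schedule for an accumulator whose latency is n cycles.

    Returns (log, initial_results); log holds (cycle, array_index, element or None).
    """
    arrays = [deque(a) for a in contiguous_split(data, n)]
    # Sum emerging at each cycle -> (array index, partial sum). The 0 paired with the
    # first element of each array is modelled as a sum "emerging" at T0..T(n-1).
    emerging = {k: (k, 0) for k in range(n)}
    log, results = [], {}
    cycle = 0
    while emerging:
        if cycle in emerging:
            k, partial = emerging.pop(cycle)  # goes back to input B
            if arrays[k]:
                x = arrays[k].popleft()  # front-most element not yet extracted -> input A
                emerging[cycle + n] = (k, partial + x)
                log.append((cycle, k, x))
            else:
                results[k] = partial  # array exhausted: initial accumulation result
                log.append((cycle, k, None))
        cycle += 1
    return log, [results[k] for k in range(n)]


log, initial = ordered_schedule(list(range(8)), 3)  # D_i = i, so x is also the index
for cycle, k, x in log:
    action = "no input, initial result ready" if x is None else f"input D{x}"
    print(f"T{cycle}: array {k + 1}: {action}")
print(initial)  # [1, 9, 18]
```

The idle slot at T6 and the empty cycle at T9 both follow from the same-array rule: array 1 has no element left to pair with D0 + D1, and no other array's data may be added to it.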
Based on the same technical concept, an embodiment of the present application further provides a calculation apparatus for multiple accumulators, as shown in fig. 4. The apparatus includes:
a dividing module 401, configured to divide, for each type of data to be accumulated, the data into n arrays, where n is determined by the number of clock cycles contained in one operation cycle of an accumulator and n is a positive integer greater than 1;
the following operations are performed for each type of data:
an extracting module 402, configured to extract one element from an array and input it to input A of the intra-group accumulator, and to input the output result obtained from the output of the intra-group accumulator to input B of the intra-group accumulator, where the data input period is one clock cycle or multiple clock cycles;
an input module 403, configured to input the next type of data into the intra-group accumulator according to the target operation after all data have been input into the intra-group accumulator;
a determining module 404, configured to determine the n initial accumulation results output by the intra-group accumulator;
and an obtaining module 405, configured to sum the n initial accumulation results through at least one inter-group accumulator to obtain a data accumulation result of all data.
Optionally, the determining module 404 is configured to:
output, through the intra-group accumulator, the first two initial accumulation results to a first inter-group accumulator, where there is 1 first inter-group accumulator;
and output, through the intra-group accumulator, each of the remaining initial accumulation results to a second inter-group accumulator, where there are n-2 second inter-group accumulators.
Optionally, the obtaining module 405 is configured to:
sum, through the first inter-group accumulator, the two initial accumulation results output earliest by the intra-group accumulator, and output the intermediate accumulation result to a second inter-group accumulator;
and add, through the current second inter-group accumulator, the received intermediate accumulation result and an initial accumulation result, and output the new intermediate accumulation result to the next second inter-group accumulator, until all initial accumulation results have been input to the inter-group accumulators; the data accumulation result of all data is output by the last second inter-group accumulator.
Optionally, the data are floating-point or fixed-point numbers;
the dividing module 401 is configured to: according to a preset division mode, divide adjacent data into the same array, starting from the first data, until all data are divided into n arrays.
Optionally, the extracting module 402 is configured to:
according to a preset array cycle order, at each input time, extract one element from an array and input it to input A of the accumulator, and input that array's output result obtained from the output of the accumulator to input B of the accumulator, where the array cycle order is the order of the arrays determined when the data were divided.
Optionally, the extraction module 402 is configured to:
extract, from the data in the array that have not yet been extracted, the element arranged first according to the arrangement order of the data in the array;
and input the element arranged first to input A of the accumulator.
According to another aspect of the embodiments of the present application, an electronic device is provided, as shown in fig. 5, including a memory 503, a processor 501, a communication interface 502 and a communication bus 504. The memory 503 stores a computer program executable on the processor 501, the memory 503 and the processor 501 communicate through the communication interface 502 and the communication bus 504, and the processor 501 implements the steps of the above method when executing the computer program.
The memory and the processor in the electronic device communicate with the communication interface through the communication bus. The communication bus may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like, and may be divided into an address bus, a data bus, a control bus, and so on.
The memory may include random access memory (RAM) and may also include non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
There is also provided, in accordance with yet another aspect of an embodiment of the present application, a computer-readable medium having non-volatile program code executable by a processor.
Optionally, in an embodiment of the present application, a computer readable medium is configured to store program code for the processor to execute the above method.
Optionally, for a specific example in this embodiment, reference may be made to the example described in the foregoing embodiment, and this embodiment is not described herein again.
When the embodiments of the present application are specifically implemented, reference may be made to the above embodiments, and corresponding technical effects are achieved.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units designed to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented by means of units performing the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; the division into modules is merely a logical division, and an actual implementation may divide them differently: multiple modules or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.
It is noted that, in this document, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
The above description is merely exemplary of the present application and is presented to enable those skilled in the art to understand and practice the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method of calculating a plurality of accumulators, the method comprising:
for each type of data to be accumulated, dividing the data into n arrays, wherein n is determined according to the number of clock cycles contained in an operation cycle of an accumulator, and n is a positive integer greater than 1;
the following target operations are performed for each type of data:
extracting one piece of data from an array and inputting it to an input end A of an intra-group accumulator, and inputting an output result obtained from an output end of the intra-group accumulator to an input end B of the intra-group accumulator, wherein the period of data input is one clock cycle or a plurality of clock cycles;
after all data are input into the intra-group accumulator, inputting the data of the next type into the intra-group accumulator according to the target operation;
determining n initial accumulation results output by the intra-group accumulator;
and summing the n initial accumulation results through at least one inter-group accumulator to obtain a data accumulation result of all data.
2. The method of claim 1, wherein determining the n initial accumulation results output by the intra-group accumulator comprises:
outputting the first two initial accumulation results to a first inter-group accumulator through the intra-group accumulator, wherein the number of the first inter-group accumulators is 1;
and for each of the remaining initial accumulation results, outputting it through the intra-group accumulator to a second inter-group accumulator, wherein the number of the second inter-group accumulators is n-2.
3. The method of claim 2, wherein summing the n initial accumulation results by at least one inter-group accumulator to obtain a data accumulation result for all data comprises:
through the first inter-group accumulator, summing two initial accumulation results output earliest by the intra-group accumulator, and outputting an intermediate accumulation result to a second inter-group accumulator;
and adding the obtained intermediate accumulation result and the initial accumulation result through the current second inter-group accumulator, and outputting the intermediate accumulation result to the next second inter-group accumulator until all the initial accumulation results are input to the inter-group accumulators to obtain the data accumulation result of all the data output by the last second inter-group accumulator.
4. The method of claim 1, wherein the data is a floating point number or a fixed point number;
dividing the data into n arrays comprises: according to a preset division mode, dividing adjacent data into the same array, starting from the first data, until all data are divided into n arrays.
5. The method of claim 4, wherein extracting one piece of data from an array and inputting it to input end A of the intra-group accumulator, and inputting an output result obtained from an output end of the intra-group accumulator to input end B of the intra-group accumulator, comprises:
according to a preset array cycle order, at each input time, extracting one piece of data from an array and inputting it to input end A of the accumulator, and inputting the output result of that array obtained from the output end of the accumulator to input end B of the accumulator, wherein the array cycle order is the order of the arrays determined when the data are divided.
6. The method of claim 4, wherein extracting one piece of data from an array and inputting it to input end A of the intra-group accumulator comprises:
extracting the data arranged most front in the array according to the arrangement sequence of the data in the array from the data not extracted currently in the array;
and inputting the data arranged most ahead into an input end A of the accumulator.
7. An apparatus for calculating a plurality of accumulators, the apparatus comprising:
the dividing module is used for dividing, for each type of data to be accumulated, the data into n arrays, wherein n is determined according to the number of clock cycles contained in the operation cycle of the accumulator, and n is a positive integer greater than 1;
the following target operations are performed for each type of data:
the extraction module is used for extracting data from an array and inputting the data into an input end A of the in-group accumulator, and inputting an output result obtained from an output end of the in-group accumulator into an input end B of the in-group accumulator;
the input module is used for inputting the data of the next category into the in-group accumulator according to the target operation after all the data are input into the in-group accumulator;
the determining module is used for determining n initial accumulation results output by the accumulators in the group;
and the obtaining module is used for summing the n initial accumulation results through at least one inter-group accumulator to obtain a data accumulation result of all data.
8. The apparatus of claim 7, wherein the determination module is configured to:
outputting the first two initial accumulation results to a first inter-group accumulator through the intra-group accumulator, wherein the number of the first inter-group accumulators is 1;
and for each of the remaining initial accumulation results, outputting it through the intra-group accumulator to a second inter-group accumulator, wherein the number of the second inter-group accumulators is n-2.
9. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 6 when executing a program stored in the memory.
10. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 6.
CN202211237771.3A 2022-10-11 2022-10-11 Calculation method and device of multiple accumulators, electronic equipment and storage medium Pending CN115328436A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211237771.3A CN115328436A (en) 2022-10-11 2022-10-11 Calculation method and device of multiple accumulators, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211237771.3A CN115328436A (en) 2022-10-11 2022-10-11 Calculation method and device of multiple accumulators, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115328436A true CN115328436A (en) 2022-11-11

Family

ID=83913426

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211237771.3A Pending CN115328436A (en) 2022-10-11 2022-10-11 Calculation method and device of multiple accumulators, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115328436A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20010037427A1 (en) * 2000-04-27 2001-11-01 Yoshihiro Ogawa Information processing apparatus with parallel accumulation capability
CN102033732A (en) * 2010-12-17 2011-04-27 浙江大学 Field programmable gate array (FPGA)-based high-speed and low-delay floating point accumulator and realizing method thereof
CN113168308A (en) * 2020-04-20 2021-07-23 深圳市大疆创新科技有限公司 Floating point accumulation apparatus, method and computer storage medium

Similar Documents

Publication Publication Date Title
CN108199993B (en) Synchronous head detection method and device, electronic equipment and readable storage medium
CN110377267B (en) Signed number adder/subtracter based on probability calculation concentrated sequence
CN110221808A (en) Preprocess method, adder and multiplier and the computer-readable medium of vector multiply-add operation
CN116028013B (en) Optimization method, device, divider and medium for FPGA division operation
CN100375011C (en) Arithmetic unit of arbitrary precision, operation method for processing data of arbitrary precision and electronic equipment
CN105956921A (en) Method and device for selecting bankcard number by user himself/herself
CN111709777A (en) Payment mode recommendation method, system, terminal device and storage medium
RU2439667C1 (en) Processor of higher functioning reliability
CN114662960A (en) Business process generation method, terminal device and computer readable storage medium
CN115328436A (en) Calculation method and device of multiple accumulators, electronic equipment and storage medium
JP3811354B2 (en) Arithmetic processing semiconductor circuit and arithmetic processing method
CN113724814A (en) Diagnosis method, diagnosis device, calculation equipment and storage medium
CN115309363A (en) Accumulator calculation method and device, electronic equipment and storage medium
CN111210345B (en) Signature method and device based on block chain network transaction and terminal equipment
CN108170403B (en) Data screening method and device
CN110515591A (en) Random digit generation method and device based on block chain
CN115328437A (en) Accumulator calculation method and device, electronic equipment and storage medium
CN111858862B (en) Reply recommendation method, reply recommendation device and electronic equipment
Solomko et al. Study of carry optimization while adding binary numbers in the rademacher number-theoretic basis
CN111353675B (en) Job scheduling method and device
RU102407U1 (en) CPU COMPUTER
CN112765594A (en) Weak password detection method and device
CN109460533B (en) Method and device for improving GEMM calculation performance
CN111309285A (en) Multimedia data sorting method and device, terminal equipment and storage medium
CN113434113B (en) Floating-point number multiply-accumulate control method and system based on static configuration digital circuit

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination