CN108647007B - Computing system and chip - Google Patents

Publication number
CN108647007B
Authority
CN
China
Prior art keywords
arithmetic
computation
formula
data
read
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810400084.6A
Other languages
Chinese (zh)
Other versions
CN108647007A (en)
Inventor
王封
刘勤让
朱珂
沈剑良
宋克
吕平
张波
杨镇西
陶常勇
汪欣
李沛杰
刘长江
付豪
张楠
陈艇
黄雅静
张帆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Information Technology Innovation Center Of Tianjin Binhai New Area
Tianjin Xinhaichuang Technology Co ltd
Original Assignee
Information Technology Innovation Center Of Tianjin Binhai New Area
Tianjin Xinhaichuang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Information Technology Innovation Center Of Tianjin Binhai New Area, Tianjin Xinhaichuang Technology Co ltd
Priority to CN201810400084.6A
Publication of CN108647007A
Application granted
Publication of CN108647007B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/57 Arithmetic logic units [ALU], i.e. arrangements or devices for performing two or more of the operations covered by groups G06F7/483 – G06F7/556 or for performing logical operations

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Mathematics (AREA)
  • Computing Systems (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Advance Control (AREA)

Abstract

The invention provides a computing system and a chip. The system comprises a read-write control module, a formula rule controller, a sequencer, a formula generator and a scheduling computation module. The formula rule controller acquires the storage address and operation symbol of the operation data from pre-loaded configuration information; the formula generator reads the operation data from the read-write control module according to the storage address; the scheduling computation module calls the arithmetic unit corresponding to the operation symbol, operates on the operation data and stores the operation result in the read-write control module; the sequencer sorts and counts the storage addresses of the operation results to obtain a counting result; and the formula rule controller determines the storage address of the next operation data according to the counting result. Through the configuration information, the algorithm function can be reconstructed in real time without changing the system architecture, which improves the flexibility and resource reuse rate of the computing system; scheduling the arithmetic units allows operation instructions to execute in parallel, which improves the computing capability of the system.

Description

Computing system and chip
Technical Field
The present invention relates to the field of system architecture, and in particular, to a computing system and a chip.
Background
In a system built on a hardware platform, different algorithms often have to run in a time-shared manner. To meet this need, all of the required algorithms are usually implemented redundantly in advance and then invoked when needed; otherwise the system can realize only a single algorithm function and has no adaptability. However, when a large number of algorithms are needed, this approach consumes a large amount of system resources, and the power consumption of the system is also very high.
Disclosure of Invention
In view of the above, the present invention provides a computing system and a chip, so as to improve the flexibility and resource reuse rate of the computing system and the computing capability of the system.
In a first aspect, an embodiment of the present invention provides a computing system, where the system includes a read-write control module, a formula rule controller, a sequencer, a formula generator, and a scheduling computation module: the read-write control module is used for storing operation data; the formula rule controller is used for acquiring a storage address and an operation symbol of the operation data from pre-loaded configuration information; the configuration information corresponds to the operation data; the formula generator is used for reading the operation data from the read-write control module according to the storage address; the scheduling computation module is used for calling the arithmetic unit corresponding to the operation symbol, operating on the operation data to obtain an operation result, and storing the operation result in the read-write control module; the sequencer is used for sorting and counting the storage addresses of the operation results to obtain a counting result; the formula rule controller is also used for determining the storage address of the next operation data according to the counting result.
With reference to the first aspect, an embodiment of the present invention provides a first possible implementation manner of the first aspect, where the computing system further includes a division operation module; the read-write control module is also used for judging, according to the operation symbol, whether the operation data needs a division operation, and if so, starting the division operation module to complete the division operation; the division operation module is used for transmitting the operation result of the division operation to the formula generator.
With reference to the first aspect, an embodiment of the present invention provides a second possible implementation manner of the first aspect, where the formula rule controller is further configured to reconstruct, according to the configuration information, the operation data according to a form of the following formula:
Figure BDA0001646052470000021
where A, B, C and D are the numerical values corresponding to the operation data, K is the initial value of the accumulation operation, and N is the total number of accumulation operations; the formula rule controller then sends the parameters corresponding to the reconstruction result to the formula generator.
With reference to the second possible implementation manner of the first aspect, an embodiment of the present invention provides a third possible implementation manner of the first aspect, where the formula generator is further configured to generate a corresponding formula group according to the parameters, expand the formula group from its minimum address to its maximum address to obtain the addresses of A, B, C and D, and read the corresponding operation data from the read-write control module according to the addresses of A, B, C and D.
With reference to the third possible implementation manner of the first aspect, an embodiment of the present invention provides a fourth possible implementation manner of the first aspect, where the formula generator is further configured to store the read operation data in the address corresponding to the parameter, and output the formula group to the scheduling computation module.
With reference to the third possible implementation manner of the first aspect, an embodiment of the present invention provides a fifth possible implementation manner of the first aspect, where the scheduling computation module is further configured to schedule the corresponding arithmetic units to operate in parallel according to the order of each formula in the formula group, the last data bit of each formula, and the operation data and operation symbol corresponding to each formula, so as to obtain an operation result, and to store the operation result in the read-write control module; the arithmetic units comprise an adder, a multiplier and an accumulator.
With reference to the fifth possible implementation manner of the first aspect, an embodiment of the present invention provides a sixth possible implementation manner of the first aspect, where a counter group is disposed in the sequencer, and the counter group includes a row counter group and a column counter group; the sequencer is also used for sorting the operation results output by the scheduling computation module, and for transmitting the addresses of the sorted operation results to the counter group, so that the counter group counts the addresses of the operation results to obtain a counting result, from which an indication signal of the progress of the operation on the operation data is obtained.
With reference to the sixth possible implementation manner of the first aspect, an embodiment of the present invention provides a seventh possible implementation manner of the first aspect, where the read-write control module is further configured to: receiving an operation result and an address of the operation result; and storing the operation result into the read-write control module according to the address of the operation result.
With reference to the sixth possible implementation manner of the first aspect, an embodiment of the present invention provides an eighth possible implementation manner of the first aspect, where the formula rule controller is further configured to obtain the storage address of the next operation data according to the configuration information and the counting result output by the sequencer.
In a second aspect, an embodiment of the invention provides a chip, where the above-mentioned computing system is disposed in the chip.
The embodiment of the invention has the following beneficial effects:
The embodiment of the invention provides a computing system and a chip, in which a read-write control module stores operation data; the formula rule controller acquires the storage address and operation symbol of the operation data from pre-loaded configuration information; the formula generator reads the operation data from the read-write control module according to the storage address; the scheduling computation module calls the arithmetic unit corresponding to the operation symbol, operates on the operation data to obtain an operation result, and stores the operation result in the read-write control module; the sequencer sorts and counts the storage addresses of the operation results to obtain a counting result; and the formula rule controller determines the storage address of the next operation data according to the counting result. Through the configuration information, the algorithm function can be reconstructed in real time without changing the system architecture, which improves the flexibility and resource reuse rate of the computing system; scheduling the arithmetic units allows operation instructions to execute in parallel, which improves the computing capability of the system.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the invention as set forth above.
In order to make the aforementioned and other objects, features and advantages of the present invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
FIG. 1 is a schematic structural diagram of a computing system according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of another computing system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a signal flow in the computing system according to an embodiment of the present invention;
FIG. 4 is a timing diagram of interface signals from the formula generator to the scheduling computation module according to an embodiment of the present invention;
FIG. 5 is a timing diagram of the interface signals output from the scheduling computation module according to an embodiment of the present invention;
FIG. 6 is a timing diagram of interface signals from the formula rule controller to the formula generator according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the existing hardware platform, the resource utilization rate of the algorithm implementation mode is low, and the computing capability is poor.
For the convenience of understanding the embodiment, a detailed description will be given to a computing system disclosed in the embodiment of the present invention.
Referring to FIG. 1, a schematic structural diagram of a computing system is shown, which includes a read-write control module 100, a formula rule controller 101, a sequencer 102, a formula generator 103, and a scheduling computation module 104.
The read-write control module 100 is used for storing operation data; specifically, the initial operation data is sent to the read-write control module by the upper computer; in the subsequent operation process, the read-write control module receives the operation result transmitted by the scheduling computation module and stores the operation result in a corresponding address.
The formula rule controller 101 is used for acquiring the storage address and operation symbol of the operation data from the pre-loaded configuration information; the configuration information corresponds to the operation data. The configuration information is generally sent by an upper computer and comprises the storage address of the initial operation data, the storage address allocated to the operation data to be calculated, and the operation symbol of the corresponding formula; a storage address is the address of the corresponding data in the read-write control module.
The formula generator 103 is used for reading the operation data from the read-write control module according to the storage address; specifically, after receiving the storage address and operation symbol of the operation data sent by the formula rule controller, the formula generator sends the storage address of the operation data to the read-write control module and receives the corresponding operation data returned by the read-write control module.
The scheduling computation module 104 is configured to call the arithmetic unit corresponding to the operation symbol, operate on the operation data to obtain an operation result, and store the operation result in the read-write control module. Specifically, the scheduling computation module receives the operation data, the operation symbol and the storage address of the operation data to be calculated from the formula generator, and calls the corresponding arithmetic unit according to the operation symbol to complete the operation. The scheduling computation module generally comprises adders, multipliers and accumulators. Because several kinds of operations are generally required during a computation, the module can work either in a sequential mode or in an out-of-order parallel mode; compared with sequential calculation, the out-of-order parallel mode uses resources more fully and saves time.
The sequencer 102 is configured to sequence and count storage addresses of the operation result to obtain a count result; specifically, the sequencer firstly receives a storage address of operation data in the configuration information; and in the subsequent operation process, receiving the storage address of the operation result sent by the scheduling computation module, sequencing and counting the storage address, and sending the counting result to the formula rule controller.
The formula rule controller 101 is further configured to determine the storage address of the next operation data according to the counting result; specifically, it determines the storage address of the next operation data to be calculated according to the counting result, reads the corresponding operation data and operation symbols, and transmits them to the formula generator for the next operation.
The embodiment of the invention thus provides a computing system in which the read-write control module stores operation data; the formula rule controller acquires the storage address and operation symbol of the operation data from pre-loaded configuration information; the formula generator reads the operation data from the read-write control module according to the storage address; the scheduling computation module calls the arithmetic unit corresponding to the operation symbol, operates on the operation data to obtain an operation result, and stores the operation result in the read-write control module; the sequencer sorts and counts the storage addresses of the operation results to obtain a counting result; and the formula rule controller determines the storage address of the next operation data according to the counting result. Through the configuration information, the system can reconstruct the algorithm function in real time without changing the system architecture, which improves the flexibility and resource reuse rate of the computing system; scheduling the arithmetic units allows operation instructions to execute in parallel, which improves the computing capability of the system.
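The cooperation of the five modules can be sketched as a small software loop. This is only an illustrative model, not the hardware design: the dictionary standing in for the read-write control module, the tuple layout of the configuration entries, and the names `OPERATORS` and `run` are all assumptions made for the example.

```python
import operator

# Hypothetical table mapping an operation symbol to an arithmetic unit.
OPERATORS = {"+": operator.add, "*": operator.mul}

def run(ram, config):
    """Execute every configured step; return result addresses in completion order.

    ram    : dict address -> value (stands in for the read-write control module)
    config : list of (result_addr, symbol, addr_a, addr_b) tuples (assumed layout)
    """
    done = []   # result addresses recorded by the sequencer
    step = 0    # counting result: number of completed operations
    while step < len(config):
        # Formula rule controller: select the next entry from the counting result.
        result_addr, symbol, addr_a, addr_b = config[step]
        # Formula generator: read the operands by storage address.
        a, b = ram[addr_a], ram[addr_b]
        # Scheduling computation module: call the unit matching the symbol.
        ram[result_addr] = OPERATORS[symbol](a, b)
        # Sequencer: record the result address and update the count.
        done.append(result_addr)
        step = len(done)
    return done

ram = {0: 2, 1: 3, 2: 4}
config = [(10, "*", 0, 1), (11, "+", 10, 2)]   # (2 * 3) + 4
order = run(ram, config)
```

Changing only `config` changes the computed function while `run` stays the same, which is the sense in which the configuration information reconstructs the algorithm.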
Referring to FIG. 2, a schematic structural diagram of another computing system is shown, which is implemented on the basis of the computing system shown in FIG. 1. The computing system comprises a read-write control module 200, a formula rule controller 201, a sequencer 202, a formula generator 203, a scheduling computation module 204 and a division operation module 205.
The read-write control module is mainly used for storing operation data, including the initial operation data and the operation results produced while the algorithm runs. In addition, the read-write control module judges from the operation symbol whether the operation data requires a division operation and, if so, starts the division operation module to complete it. Specifically, the read-write control module receives the operation symbol sent by the formula generator and uses it to decide whether to start a division; if so, it reads data from its RAM (Random-Access Memory) into the division operation module. The division operation module transmits the operation result of the division to the formula generator and, after the division is complete, sends a division indication to the formula rule controller.
The formula rule controller is used for reconstructing the operation data, according to the configuration information, in the form of the following formula:
Figure BDA0001646052470000071
where A, B, C and D are the numerical values corresponding to the operation data, K is the initial value of the accumulation operation, and N is the total number of accumulation operations. The formula rule controller then sends the parameters corresponding to the reconstruction result to the formula generator.
In general, any algorithm can be decomposed into the following basic formula:
Figure BDA0001646052470000072
where A, B, C and D can be any numbers. Subtraction is computed as addition after negation, and division as multiplication after taking the reciprocal.
From equation 1, the following can be derived:
(1) Y = C (constant)
(2) Y = A + C (addition)
(3) Y = A + F, where F = -C (subtraction)
(4) Y = A_K * B_K (multiplication)
(5) Y = C/E, where E = 1/D (division)
(6) Figure BDA0001646052470000073 (accumulation)
(7) Figure BDA0001646052470000074 (cumulative subtraction, J = A)
(8) Figure BDA0001646052470000081 (multiply-accumulate)
(9) Figure BDA0001646052470000082 (multiply-subtract, I = B)
(10) Figure BDA0001646052470000083 (divide with accumulation/subtraction, H = 1/B)
(11) Figure BDA0001646052470000084 (multiply with accumulation/subtraction, then add/subtract a constant)
(12) Figure BDA0001646052470000085 (division by a constant, H = 1/B)
(13) Figure BDA0001646052470000086 (multiply with accumulation/subtraction, add/subtract a constant, and multiply/divide by a constant)
(14) Figure BDA0001646052470000087 (division by a constant, accumulation/subtraction, and multiplication/division by a constant, H = 1/B)
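A minimal sketch of how such special cases can fall out of one general form. The general form used here, Y = (sum over K of A_K * B_K + C) * D, is an assumption inferred from the roles the description gives to A and B (multipliers), C (addend) and D (multiplier applied after accumulation); it is not copied from the formula figures, and `base_form` is a hypothetical helper.

```python
from fractions import Fraction

def base_form(a, b, c=0, d=1):
    """Evaluate (sum_k a[k] * b[k] + c) * d. The shape of this form is assumed."""
    return (sum(x * y for x, y in zip(a, b)) + c) * d

constant = base_form([], [], c=5)                      # Y = C
addition = base_form([3], [1], c=5)                    # Y = A + C
subtraction = base_form([3], [1], c=-5)                # Y = A + F with F = -C
product = base_form([3], [4])                          # Y = A_K * B_K
division = base_form([], [], c=6, d=Fraction(1, 4))    # 6 / 4 via the reciprocal
mac = base_form([1, 2, 3], [4, 5, 6])                  # multiply-accumulate
```

Subtraction and division need no dedicated path in this sketch: the first negates an operand before the add, the second multiplies by a reciprocal, as the text describes.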
The formula rule controller receives and stores the configuration information. According to the address of the operation data (Y) to be calculated, and following the form of the formula in Figure BDA0001646052470000088, it obtains the storage addresses and operation symbols of A, B, C and D, reconstructs the formula, and sends the operation data required by the formula to the formula generator.
The formula generator is used for generating a corresponding formula group according to the parameters, and expanding the minimum address of the formula group to the maximum address to obtain A, B, C and D addresses; reading corresponding operation data from the read-write control module according to the addresses A, B, C and D; specifically, the formula generator generates a group of formulas from the parameters sent by the formula rule controller module, expands the minimum address to the maximum address of the formula one by one, and sends the address of the multiplier A, the address of the multiplier B, the address of the addend C and the address of the accumulated multiplier D in the formula as indication signals to the read-write control module, so that the read-write control module reads and writes the data of the corresponding address.
Furthermore, the formula generator is also used for storing the read operation data in the address corresponding to the parameter and outputting the formula group to the scheduling computation module; specifically, the formula generator replaces each operation data address with the data read from the RAM of the read-write control module, and then outputs the formulas to the scheduling computation module through the interface between the two modules for further processing.
The scheduling computation module is used for scheduling the corresponding arithmetic units to operate in parallel according to the order of each formula in the formula group, the last data bit of each formula, and the operation data and operation symbol corresponding to each formula, so as to obtain an operation result, and for storing the operation result in the read-write control module; the arithmetic units comprise an adder, a multiplier and an accumulator. Specifically, based on the pairs of data sent by the formula generator and the operation type between each pair, the scheduling computation module arbitrates among and schedules the basic multiplication, addition and accumulation units so that the data are processed in parallel and out of order; it thereby completes the calculation of several formulas and returns the operation results to the read-write control module.
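The two formula-generator steps, expanding a parameter set into addresses and then replacing addresses with data, can be sketched as follows. The parameter layout (`a_base`, `b_base`, `k_min`, `k_max`, `c_addr`, `d_addr`) and the assumption that the A and B operands sit at consecutive addresses are hypothetical choices made for illustration.

```python
def expand_formula_group(params):
    """Expand one parameter set into operand addresses, from k_min to k_max."""
    terms = [(params["a_base"] + k, params["b_base"] + k)
             for k in range(params["k_min"], params["k_max"] + 1)]
    return {"terms": terms, "c": params["c_addr"], "d": params["d_addr"]}

def fetch_operands(formula, ram):
    """Replace every address in the formula with the data stored at it."""
    return {
        "terms": [(ram[a], ram[b]) for a, b in formula["terms"]],
        "c": ram[formula["c"]],
        "d": ram[formula["d"]],
    }

# RAM contents: A at 0..2, B at 8..10, C at 16, D at 17.
ram = {0: 1, 1: 2, 2: 3, 8: 4, 9: 5, 10: 6, 16: 7, 17: 2}
params = {"a_base": 0, "b_base": 8, "k_min": 0, "k_max": 2,
          "c_addr": 16, "d_addr": 17}
formula = expand_formula_group(params)
operands = fetch_operands(formula, ram)
```

After `fetch_operands`, the formula carries values only, which is the form handed on to the scheduling stage.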
Out-of-order execution is a technique in which multiple instructions are issued to their corresponding circuit units in an order other than the one specified by the program. The state of each circuit unit and whether each instruction can be executed ahead of time are analyzed, and any instruction that can be executed ahead of time is sent immediately to the corresponding circuit. The scheduling computation module may include several adders, multipliers or accumulators; it queries the state of the arithmetic units that match an operation symbol and sends the operation data to a unit that is idle or that has the shortest waiting time.
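The selection rule just described (prefer an idle unit, otherwise the one with the shortest wait) can be sketched with a simple cycle-count model. The latencies, unit names and the `dispatch` function are assumptions for illustration, and the model ignores data dependencies between operations.

```python
def dispatch(ops, units):
    """Assign each operation to the matching unit that becomes free earliest.

    ops   : list of (name, symbol, latency) in program order
    units : dict unit name -> operation symbol it implements
    Returns (name, unit, start, finish) tuples sorted by finish time.
    """
    free_at = {u: 0 for u in units}   # cycle at which each unit is idle again
    schedule = []
    for name, symbol, latency in ops:
        candidates = [u for u, s in units.items() if s == symbol]
        # An idle unit has the smallest free_at; otherwise take the shortest wait.
        unit = min(candidates, key=lambda u: free_at[u])
        start = free_at[unit]
        free_at[unit] = start + latency
        schedule.append((name, unit, start, start + latency))
    return sorted(schedule, key=lambda item: item[3])

units = {"mul0": "*", "mul1": "*", "add0": "+"}
ops = [("Y1", "*", 3), ("Y2", "*", 3), ("Y3", "+", 1)]
schedule = dispatch(ops, units)
```

In this run `Y3` is issued last but finishes first, so results return in a different order from the program order; this is why a later reordering stage is needed.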
A counter group is arranged in the sequencer and comprises a row counter group 202a and a column counter group 202b. The sequencer is also used for sorting the operation results output by the scheduling computation module and transmitting the addresses of the sorted operation results to the counter group, so that the counter group counts the addresses to obtain a counting result, from which an indication signal of the progress of the operation on the operation data is obtained. Because the scheduling computation module computes in parallel and out of order, the result addresses returned to the sequencer are out of order; the sequencer arranges these addresses in sequence and passes them to the row and column counter groups, which give the row and column order from the row and column counts and thereby produce the counting result.
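A minimal sketch of the reordering and counting step. It assumes result addresses are consecutive integers and that an address maps to a (row, column) pair in row-major order; both are illustrative assumptions, as are the names `reorder` and `row_col`.

```python
def reorder(addresses, first):
    """Release addresses in ascending order as soon as each is in sequence.

    addresses : result addresses in arrival (out-of-order) sequence
    first     : lowest expected address
    """
    pending = set()
    expected = first
    released = []
    for addr in addresses:
        pending.add(addr)
        while expected in pending:    # release every address now in sequence
            pending.remove(expected)
            released.append(expected)
            expected += 1
    return released

def row_col(addr, cols):
    """Map a linear address to (row, column) counts for `cols` columns per row."""
    return divmod(addr, cols)

arrivals = [2, 0, 1, 5, 3, 4]
ordered = reorder(arrivals, first=0)
positions = [row_col(a, cols=3) for a in ordered]
```

The last released position tells a controller how far the computation has progressed through the matrix.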
The read-write control module is also used for receiving the operation result and the address of the operation result; storing the operation result into the read-write control module according to the address of the operation result; specifically, the read-write control module receives the operation result and the address of the operation result sent by the scheduling computation module, and stores the operation result to the corresponding position in the RAM through address mapping.
The formula rule controller is also used for obtaining the storage address of the next operation data according to the configuration information and the counting result output by the sequencer. Specifically, based on the division indication and the sorted row and column counter values, and following the rules in the configuration information, the formula rule controller determines which element of the operation data matrix (i.e. which operation data to be calculated) should be computed next, obtains the storage addresses of the parameters (i.e. known operation data) required to calculate that element, and transmits those addresses to the formula generator.
FIG. 3 is a schematic diagram of signal flow in the computing system. The computing system can also be called an instruction-level parallel control architecture for real-time reconfigurable computing; the architecture comprises an algorithm control subsystem and a scheduling computation subsystem. Driven by the input configuration information, the algorithm control subsystem can be reconstructed in real time without changing the overall architecture, and can realize any algorithm that can be decomposed into basic computation units such as addition, subtraction, multiplication and division. When different algorithms must be supported in a time-shared manner, such as matrix calculation, Fast Fourier Transform (FFT) or filters, the algorithm can be reconstructed to realize a different function simply by loading configuration information, either manually or fully automatically, without changing code or hardware. The scheduling computation subsystem provides instruction-level parallelism: sequential instructions are executed out of order, and any number of basic computation units can be scheduled to compute in parallel, which greatly improves parallel computing capability. The two subsystems can be stacked multiple times to realize arbitrarily complex algorithms. The whole system operates in real time, and an algorithm can be reconstructed without powering down; several fixed algorithms can be run through fully automatic, sequential, time-shared reconstruction without human-computer interaction.
The embodiment of the invention also provides a chip, and the computing system is arranged in the chip.
The chip provided by the embodiment of the invention has the same technical characteristics as the computing system provided by the embodiment, so that the same technical problems can be solved, and the same technical effects can be achieved.
The embodiment of the invention also provides an FPGA chip, and the operation system shown in FIG. 2 is arranged in the FPGA. Each module is realized by a circuit, and different modules communicate with each other through a bus; the sequencer module obtains the storage address of the operation data from the configuration information module, and the meanings of the interface signals of the configuration information module and the sequencer module are shown in table 1:
TABLE 1
Signal name    Bit width    Meaning
ROW_START      7            First address of row ordering
ROW_Stop       7            Termination address of row ordering
COL_START      7            First address of column ordering
COL_Stop       7            Termination address of column ordering
Valid          1            Indicates data validity
Ready          1            Bus idle indication
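As a software-side illustration of Table 1, the four address signals can be modelled as a small record that rejects values wider than 7 bits. The class name `SequencerConfig`, the lower-case field names and the assumption that the start and termination addresses are both inclusive are hypothetical; the Valid and Ready handshake lines are not modelled.

```python
from dataclasses import dataclass

ADDR_BITS = 7  # bit width listed for the four address signals

@dataclass(frozen=True)
class SequencerConfig:
    row_start: int
    row_stop: int
    col_start: int
    col_stop: int

    def __post_init__(self):
        # Reject any value that cannot be carried on a 7-bit signal.
        for name in ("row_start", "row_stop", "col_start", "col_stop"):
            value = getattr(self, name)
            if not 0 <= value < (1 << ADDR_BITS):
                raise ValueError(f"{name}={value} does not fit in {ADDR_BITS} bits")

    def element_count(self):
        """Number of elements covered, assuming both ranges are inclusive."""
        rows = self.row_stop - self.row_start + 1
        cols = self.col_stop - self.col_start + 1
        return rows * cols

cfg = SequencerConfig(row_start=0, row_stop=3, col_start=0, col_stop=3)
```

With 7-bit addresses, each dimension can index at most 128 positions.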
The formula rule controller obtains the storage address and the operation symbol of the operation data from the configuration information. The meaning of each interface signal between the configuration information module and the formula rule controller module is shown in table 2:
TABLE 2
[Table 2 appears as images in the original publication: Figures BDA0001646052470000111, BDA0001646052470000121, BDA0001646052470000131 and BDA0001646052470000141]
The formula generator sends the operation data and the operation symbols to the scheduling computation module. Taking the three formulas Y1 = A1*B1 + C1, Y2 = (A2*B2 + C2*D2)*E2 and Y3 = (A3*B3 + C3)*D3 as an example, FIG. 4 shows the timing diagram of the interface signals from the formula generator to the scheduling computation module. The meaning of each interface signal between the formula generator module and the scheduling computation module is shown in Table 3:
TABLE 3
[Table 3 appears as an image in the original publication: Figure BDA0001646052470000142]
The scheduling computation module sends each calculation result and its storage address to the read-write control module over the bus, and also sends the storage address of the result to the sequencer. FIG. 5 is a timing diagram of the interface signals output by the scheduling computation module; it depicts the timing with which the calculation results of Y1, Y2 and Y3 are returned.
The meaning of each signal is shown in Table 4. Ready is sent to the formula generator and indicates that the scheduling computation module can receive operation data:
TABLE 4
Signal name    Bit width    Meaning
Ready          1            Bus ready indication; forms a handshake pair with Valid
Valid          1            Valid indication for Result
ID             3            Which formula rule controller the returned calculation result belongs to
Seq_num        14           Operation serial number within a formula rule controller
Result         128          Calculation result
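As a rough illustration of the result bus in Table 4, the Python sketch below packs the ID (3 bits), Seq_num (14 bits) and Result (128 bits) fields into one integer word and unpacks them again. Only the bit widths come from the table; the field order inside the word, the names `ResultBeat`, `pack_result` and `unpack_result`, and the treatment of Result as an unsigned integer are assumptions made for the example.

```python
from typing import NamedTuple

# Bit widths taken from Table 4; the field order inside the word is an assumption.
ID_BITS = 3
SEQ_BITS = 14
RESULT_BITS = 128


class ResultBeat(NamedTuple):
    """One transfer on the result bus (hypothetical software model)."""
    ctrl_id: int   # which controller the returned result belongs to
    seq_num: int   # operation serial number within that controller
    result: int    # raw 128-bit calculation result, treated as unsigned


def _check(value: int, bits: int, name: str) -> None:
    """Reject values that do not fit in the given number of bits."""
    if not 0 <= value < (1 << bits):
        raise ValueError(f"{name} does not fit in {bits} bits")


def pack_result(beat: ResultBeat) -> int:
    """Pack a beat as ID | Seq_num | Result, with ID in the top bits."""
    _check(beat.ctrl_id, ID_BITS, "ID")
    _check(beat.seq_num, SEQ_BITS, "Seq_num")
    _check(beat.result, RESULT_BITS, "Result")
    return (
        (beat.ctrl_id << (SEQ_BITS + RESULT_BITS))
        | (beat.seq_num << RESULT_BITS)
        | beat.result
    )


def unpack_result(word: int) -> ResultBeat:
    """Inverse of pack_result: split a packed word back into its fields."""
    result = word & ((1 << RESULT_BITS) - 1)
    seq_num = (word >> RESULT_BITS) & ((1 << SEQ_BITS) - 1)
    ctrl_id = (word >> (SEQ_BITS + RESULT_BITS)) & ((1 << ID_BITS) - 1)
    return ResultBeat(ctrl_id, seq_num, result)
```

A consumer of such a bus could use the (ID, Seq_num) pair to match an out-of-order result back to the operation that produced it, which is the role the table assigns to those two fields.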
The formula rule controller sends the storage addresses and operation symbols of the operation data required by each formula to the formula generator. It is assumed that every formula takes one of the following three forms:
Y1 = A1*B1 + C1; Y2 = (A2*B2 + C2*D2)*E2; Y3 = (A3*B3 + C3)*D3
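To make the three formula forms concrete, the following Python sketch evaluates each of them using only two primitive operators, in the spirit of decomposing a formula into bottom-layer computation particles. The names `PARTICLES`, `FORMS` and `evaluate`, and the numbering of the forms as 1, 2 and 3, are hypothetical and chosen for illustration; the patent describes hardware modules, not this software interface.

```python
import operator
from typing import Callable, Dict, Sequence

# Hypothetical "computation particles": one primitive two-input operator each.
PARTICLES: Dict[str, Callable[[float, float], float]] = {
    "+": operator.add,
    "*": operator.mul,
}


def add(x: float, y: float) -> float:
    return PARTICLES["+"](x, y)


def mul(x: float, y: float) -> float:
    return PARTICLES["*"](x, y)


def form1(a: float, b: float, c: float) -> float:
    """Y1 = A1*B1 + C1"""
    return add(mul(a, b), c)


def form2(a: float, b: float, c: float, d: float, e: float) -> float:
    """Y2 = (A2*B2 + C2*D2)*E2"""
    return mul(add(mul(a, b), mul(c, d)), e)


def form3(a: float, b: float, c: float, d: float) -> float:
    """Y3 = (A3*B3 + C3)*D3"""
    return mul(add(mul(a, b), c), d)


# Form number -> evaluator; the numbering is an assumption for this sketch.
FORMS = {1: form1, 2: form2, 3: form3}


def evaluate(form: int, operands: Sequence[float]) -> float:
    """Evaluate one formula of the given form on its operands."""
    return FORMS[form](*operands)
```

For example, `evaluate(1, [2, 3, 4])` computes 2*3 + 4. In form 2 the two products A2*B2 and C2*D2 do not depend on each other, which is the kind of independence a scheduler can exploit to run multipliers in parallel.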
The interface signals resemble an AXI (Advanced eXtensible Interface) Stream interface. FIG. 6 shows the timing diagram of the interface signals from the formula rule controller to the formula generator; the meaning of each signal is shown in Table 5:
TABLE 5
[Table 5 appears as images in the original publication: Figures BDA0001646052470000151 and BDA0001646052470000161]
The formula generator sends the storage addresses of the operation data to the read-write control module in order to read the data at those addresses. The meaning of each interface signal from the formula generator module to the read-write control module is shown in Table 6:
TABLE 6
[Table 6 appears as images in the original publication: Figures BDA0001646052470000162 and BDA0001646052470000171]
The read-write control module returns corresponding operation data to the formula generator, and the meaning of the interface signal is shown in table 7:
TABLE 7
[Table 7 appears as images in the original publication: Figures BDA0001646052470000172 and BDA0001646052470000181]
The following further describes the operation process of the above-mentioned operation system by taking four operations of matrix LU decomposition, matrix U inversion, matrix L inversion, and multiplication of matrix U inversion and matrix L inversion as examples.
The LU decomposition formulas are as follows:
[Formula images in the original publication: Figures BDA0001646052470000182 and BDA0001646052470000183]
The formulas for the inversion of U are as follows:
[Formula images in the original publication: Figures BDA0001646052470000184 and BDA0001646052470000185]
The formulas for the inversion of L are as follows:
[Formula images in the original publication: Figures BDA0001646052470000186 and BDA0001646052470000187]
The formula for multiplying the inverse of U by the inverse of L is as follows:
[Formula image in the original publication: Figure BDA0001646052470000188]
Algorithmic analysis shows that formulas (5), (8), (11) and (13) above can each be derived from formula (1) above. In the implementation, the formula rule controller reconstructs the LU decomposition formula according to the parameters of the configuration information. If the configuration information satisfies the conditions for the LU decomposition operation, the corresponding formula generator is started: the first element a11 is first sent to the division computation particle to start the reciprocal operation; once the reciprocal operation is finished, the operation on the first row of elements starts; once the calculation result of a12 is detected to have been returned (monitored by the row counter group and the column counter group), the calculation of the second-row elements starts; and so on.
Whether the operation on a given element aij can start depends on three conditions:
when i > j, the row counter of the i-th row must be greater than j-1 and the column counter of the j-th column must be greater than j-1;
when i ≤ j, the row counter of the i-th row must be greater than i-1 and the column counter of the j-th column must be greater than i-1;
In the formula rule controller, the L and U operations are executed separately, and the judgment results of the row and column counter groups are fed back to the started and not-yet-started counters to determine which elements can begin calculation. This function is implemented by dynamically changing the logic in the formula rule control module according to the parameters input from the configuration information module. As shown in Table 8, applying the above rule determines that the L operation at row 2, column 2 can be performed.
TABLE 8
[Table 8 appears as an image in the original publication: Figure BDA0001646052470000191]
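The start rule for an element aij described above can be sketched in a few lines of Python. This is a minimal software model under stated assumptions: indices are 1-based, the second case is read as the complement of the first (i ≤ j), and the counters are passed in as plain mappings from row or column index to count. What exactly each counter counts in hardware is not modelled, and the function name `can_start` is hypothetical.

```python
from typing import Mapping


def can_start(
    i: int,
    j: int,
    row_counter: Mapping[int, int],
    col_counter: Mapping[int, int],
) -> bool:
    """Return True if element a_ij may start, per the counter rule.

    i > j : row counter of row i and column counter of column j must exceed j-1.
    i <= j: row counter of row i and column counter of column j must exceed i-1.
    Indices are 1-based, as in the text.
    """
    threshold = (j - 1) if i > j else (i - 1)
    return row_counter[i] > threshold and col_counter[j] > threshold
```

Both cases reduce to comparing the two counters with min(i, j) - 1, so a single comparator threshold per element would be enough in such a model.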
According to the LU inversion formulas, the inversions of L and U can be carried out independently without interfering with each other. Taking U inversion as an example, Table 9 shows the calculation order of the U inversion, which is implemented by dynamically changing the logic in the formula rule control module according to the parameters input from the configuration information module.
TABLE 9
[Table 9 appears as images in the original publication: Figures BDA0001646052470000192 and BDA0001646052470000201]
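Because the U inversion formulas and the calculation order of Table 9 are only available as images, the sketch below does not reproduce them. It shows the textbook back-substitution method for inverting an upper triangular matrix, offered as an assumption about the general shape of the computation rather than as the patent's exact scheme. The function name `invert_upper` is hypothetical, and indices are 0-based as is usual in Python.

```python
from typing import List

Matrix = List[List[float]]


def invert_upper(u: Matrix) -> Matrix:
    """Invert an upper triangular matrix by back substitution.

    For V = U^-1:  v[j][j] = 1 / u[j][j]
                   v[i][j] = -(sum_{k=i+1..j} u[i][k] * v[k][j]) / u[i][i], i < j
    Each column j depends only on entries of the same column below row i,
    so different columns can be computed independently of one another.
    """
    n = len(u)
    v = [[0.0] * n for _ in range(n)]
    for j in range(n):
        v[j][j] = 1.0 / u[j][j]              # reciprocal of the diagonal entry
        for i in range(j - 1, -1, -1):       # move upwards within column j
            acc = sum(u[i][k] * v[k][j] for k in range(i + 1, j + 1))
            v[i][j] = -acc / u[i][i]
    return v
```

Every off-diagonal entry is an accumulation of products followed by a multiplication with a reciprocal, which is the multiply, accumulate and scale pattern the formula forms in this document are built from.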
According to the formula for multiplying the inverse of U by the inverse of L, all elements of the resulting matrix can be calculated in parallel without interfering with each other; this function is implemented by dynamically changing the logic in the formula rule control module according to the parameters input from the configuration information module.
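The independence of the output elements can be illustrated with a short Python sketch in which each element of the product is submitted as a separate task. The thread pool here only models the fact that no element depends on another; it is not a claim about how the hardware schedules its computation particles, and in CPython it will not give a real speed-up for this arithmetic. The names `element` and `multiply_parallel` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Tuple

Matrix = List[List[float]]


def element(u_inv: Matrix, l_inv: Matrix, i: int, j: int) -> float:
    """Element (i, j) of U^-1 * L^-1: one independent multiply-accumulate chain."""
    return sum(u_inv[i][k] * l_inv[k][j] for k in range(len(l_inv)))


def multiply_parallel(u_inv: Matrix, l_inv: Matrix, workers: int = 4) -> Matrix:
    """Compute U^-1 * L^-1 with one task per output element."""
    n, m = len(u_inv), len(l_inv[0])
    cells: List[Tuple[int, int]] = [(i, j) for i in range(n) for j in range(m)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so values line up with cells.
        values = list(pool.map(lambda ij: element(u_inv, l_inv, *ij), cells))
    return [values[r * m:(r + 1) * m] for r in range(n)]
```

Since A = LU implies that the inverse of A equals the inverse of U multiplied by the inverse of L, the result of this product is the inverse of the original matrix.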
The embodiment of the invention provides an FPGA chip that can be dynamically reconfigured without powering down, can execute multiple instructions in parallel, and offers high performance, high flexibility and a very high resource reuse rate. The computing system can also be applied to brain-like computing, making the hardware platform more intelligent and general-purpose, so that, like the human brain, it can carry out the computation of any algorithm at will.
The computer program product of the computing system and the chip provided by the embodiment of the present invention includes a computer readable storage medium storing a program code, and instructions included in the program code may be used to execute the method described in the foregoing method embodiment, and specific implementation may refer to the method embodiment, which is not described herein again.
In addition, in the description of the embodiments of the present invention, unless otherwise explicitly specified or limited, the terms "mounted," "coupled," and "connected" are to be construed broadly: a connection may, for example, be fixed, removable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intervening medium, or an internal communication between two elements. Those skilled in the art can understand the specific meanings of the above terms in the present invention according to the specific circumstances.
If the functions are implemented in the form of software functional units and sold or used as a stand-alone product, they may be stored in a computer readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only specific embodiments of the present invention, used to illustrate its technical solutions rather than to limit them, and the protection scope of the present invention is not limited thereto. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the technical field may, within the technical scope disclosed herein, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the embodiments of the present invention, and they shall all be covered by its protection scope. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (9)

1. A computing system, characterized by comprising a read-write control module, an arithmetic rule controller, a sequencer, an arithmetic generator and a scheduling arithmetic module:
the read-write control module is used for storing operation data;
the arithmetic rule controller is used for acquiring a storage address and an arithmetic symbol of the arithmetic data from the pre-loaded configuration information; the configuration information corresponds to the operational data;
the arithmetic formula generator is used for reading the arithmetic data from the read-write control module according to the storage address;
the scheduling computation module is used for calling the arithmetic unit corresponding to the computation symbol, computing the computation data to obtain a computation result, and storing the computation result to the read-write control module;
the sequencer is used for sequencing and counting the storage addresses of the operation result to obtain a counting result;
the arithmetic rule controller is also used for determining the storage address of the next operational data according to the counting result;
the formula rule controller is further configured to reconstruct the operation data according to the configuration information in the form of the following formula:
[Formula image in the original publication: Figure FDA0002575041710000011]
wherein A, B, C and D are the numerical values corresponding to the operation data; K is the initial value of the accumulation operation; N is the total number of accumulation operations;
and sending the parameters corresponding to the reconstruction result to the formula generator.
2. The computing system of claim 1, further comprising a division operator module;
the read-write control module is also used for judging whether the operation data needs division operation according to the operation symbol, and if so, starting the division operation particle module to complete the division operation; the division operation module is used for transmitting an operation result corresponding to the division operation to the arithmetic formula generator.
3. The computing system of claim 1, wherein the formula generator is further configured to generate a corresponding formula group according to the parameters, and expand a minimum address of the formula group to a maximum address to obtain the addresses of A, B, C and D; and reading corresponding operation data from the read-write control module according to the A, B, C and D addresses.
4. The computing system of claim 3, wherein the formula generator is further configured to store the read operation data in an address corresponding to the parameter, and output the formula group to the scheduling and computation module.
5. The computing system of claim 3, wherein the scheduling computation element module is further configured to schedule corresponding computing units to perform parallel computation according to the order of each formula in the formula group, the last data bit of each formula, the computation data corresponding to each formula, and the computation symbol, so as to obtain a computation result, and store the computation result in the read-write control module; the arithmetic unit comprises an adder, a multiplier and an accumulator.
6. The computing system of claim 5, wherein the sequencer is configured with a counter group, the counter group comprising a row counter group and a column counter group;
the sorter is also used for sorting the operation results output by the scheduling computation module; and transmitting the sorted addresses of the operation results to the counter group so that the counter group counts the addresses of the operation results to obtain counting results, and obtaining an indication signal of the operation degree of the operation data according to the counting results.
7. The computing system of claim 6, wherein the read-write control module is further configured to:
receiving the operation result and the address of the operation result;
and storing the operation result into the read-write control module according to the address of the operation result.
8. The computing system of claim 6, wherein the formula rule controller is further configured to obtain a storage address of next operation data according to the configuration information and the counting result output by the sequencer.
9. A chip, wherein the computing system of any one of claims 1-8 is disposed in the chip.
CN201810400084.6A 2018-04-28 2018-04-28 Computing system and chip Active CN108647007B (en)


Publications (2)

Publication Number Publication Date
CN108647007A CN108647007A (en) 2018-10-12
CN108647007B true CN108647007B (en) 2020-10-16







Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant