CN113407235A - Operation circuit system and chip based on hardware acceleration - Google Patents

Publication number
CN113407235A
Authority
CN
China
Prior art keywords
reciprocal
iteration
output
square
data register
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110719972.6A
Other languages
Chinese (zh)
Other versions
CN113407235B (en)
Inventor
詹植铜
何再生
肖刚军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuhai Amicro Semiconductor Co Ltd
Original Assignee
Zhuhai Amicro Semiconductor Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuhai Amicro Semiconductor Co Ltd
Priority to CN202110719972.6A (patent CN113407235B)
Publication of CN113407235A
Application granted
Publication of CN113407235B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30: Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30098: Register arrangements
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00: Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48: Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52: Multiplying; Dividing
    • G06F7/523: Multiplying only
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a hardware-accelerated arithmetic circuit system and a chip. The arithmetic circuit system comprises a square root iteration module and a reciprocal iteration module. When the end flag of the reciprocal iteration operation is valid and the end flag of the square root iteration operation is invalid, the square root iteration module multiplies the preset number to be square-rooted by the reciprocal result that the reciprocal iteration module has output and that satisfies a preset convergence condition, thereby performing the square root iteration operation at a preset precision. When the enable flag of the reciprocal iteration operation is valid, the reciprocal iteration module operates, in the current reciprocal iteration, on the reciprocal result it output in the previous reciprocal iteration and on the square root result output by the square root iteration module in the previous square root iteration, until it obtains a reciprocal result that satisfies the preset convergence condition. Both the reciprocal iteration formula and the square root iteration formula are thus implemented in hardware, and a divider is avoided.

Description

Operation circuit system and chip based on hardware acceleration
Technical Field
The present invention relates to the field of integrated circuit technology, and in particular, to an arithmetic circuit system and chip based on hardware acceleration.
Background
Square root operations are widely used in digital circuits. Methods commonly used in the prior art include real-function approximation and the Newton iteration method. Real-function approximation has low precision, the Newton iteration method involves division, and both require a large amount of hardware resources. Conventional floating-point division in particular is complex and is usually carried out by repeated subtraction, so its hardware overhead is high and it takes many clock cycles to complete. It is therefore unsuitable for cost-sensitive sensors that process data in batches and require low calculation latency, which restricts the range of applications of the square root operation.
Disclosure of Invention
To address the drawback of computing a square root by the Newton iteration method, the invention provides a hardware-accelerated arithmetic circuit system and a chip that build on the Newton iteration method while removing its need for a divider, thereby greatly saving hardware resources. The specific technical scheme is as follows:
A hardware-accelerated arithmetic circuit system comprises a square root iteration module and a reciprocal iteration module. When the end flag of the reciprocal iteration operation is valid and the end flag of the square root iteration operation is invalid, the square root iteration module multiplies the preset number to be square-rooted by the reciprocal result that is output by the reciprocal iteration module and satisfies a preset convergence condition, so as to perform the square root iteration operation at a preset precision. When the enable flag of the reciprocal iteration operation is valid, the reciprocal iteration module operates, in the current reciprocal iteration, on the reciprocal result it output in the previous reciprocal iteration and on the square root result output by the square root iteration module in the previous square root iteration, and obtains a reciprocal result that satisfies the preset convergence condition. The end flag of the reciprocal iteration operation is an iteration-end control signal set by the reciprocal iteration module to the square root iteration module; the end flag of the square root iteration operation is an iteration-end control signal set by the square root iteration module; the enable flag of the reciprocal iteration operation is an iteration-start control signal set by the square root iteration module to the reciprocal iteration module. The square root iteration operation is an iterative operation that solves for the square root of the number to be square-rooted using a Newton iteration formula, and a first circuit unit architecture corresponding to its iteration logic is connected inside the square root iteration module. The reciprocal iteration operation is an iterative operation that solves for the reciprocal of the current square root result, also using a Newton iteration formula, and a second circuit unit architecture corresponding to its iteration logic is connected inside the reciprocal iteration module.
Compared with the prior art, before obtaining the square root of the number to be square-rooted, the arithmetic circuit system first computes iteratively, in the reciprocal iteration module, a reciprocal that satisfies the preset convergence condition, and then computes the square root iteratively in the square root iteration module. This avoids using divider resources to obtain the reciprocal of the square root that the calculation requires. Because the square root iteration module and the reciprocal iteration module iterate at the same precision, calculation speed is improved and hardware resources are saved.
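The division-free scheme described above can be summarised by two coupled Newton recurrences. The derivation below is a sketch in real arithmetic only; the shifts and fixed-point scaling are left out, and the names $f$, $g$, $x_n$, $y_i$ and $b$ are notation introduced here for illustration rather than taken from the patent text.

```latex
% Newton's method applied to f(x) = x^2 - a gives the classical square root
% update, which contains a division by x_n:
\[
  x_{n+1} = x_n - \frac{x_n^2 - a}{2x_n}
          = \frac{1}{2}\left(x_n + \frac{a}{x_n}\right).
\]
% Newton's method applied to g(y) = 1/y - b, with b = x_n, gives a reciprocal
% update that needs only multiplication and subtraction:
\[
  y_{i+1} = y_i - \frac{1/y_i - b}{-1/y_i^{2}}
          = y_i\,(2 - b\,y_i), \qquad y_i \to \frac{1}{b}.
\]
% Substituting the converged reciprocal y \approx 1/x_n removes the division:
\[
  x_{n+1} = \frac{1}{2}\left(x_n + a\,y\right).
\]
```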
Further, while the square root iteration module performs the square root iteration operation at the preset precision: when the absolute value of the difference between the square root result output by the previous stage of the square root iteration and the square root result output by the current stage satisfies the square root iteration termination condition, the module terminates the square root iteration, takes the square root result of the current stage as the final square root result, and sets the end flag of the square root iteration operation to valid. When that absolute difference does not satisfy the square root iteration termination condition, the module outputs the square root result of the current stage to the reciprocal iteration module and at the same time sets a valid enable flag of the reciprocal iteration operation to the reciprocal iteration module. In this way, whenever the square root of the current stage does not satisfy the termination condition, the reciprocal iteration module is started to obtain a more suitable reciprocal result, so that the next stage of the square root iteration yields a square root closer to satisfying the termination condition.
While the reciprocal iteration module performs the reciprocal iteration operation at the preset precision: when the difference between the reciprocal result output by the previous reciprocal iteration and the reciprocal result output by the current reciprocal iteration satisfies the reciprocal iteration termination condition, the module sets a valid end flag of the reciprocal iteration operation to the square root iteration module, determines that the reciprocal result output by the previous reciprocal iteration is the reciprocal result satisfying the preset convergence condition, and transmits it to the square root iteration module. When that difference does not satisfy the reciprocal iteration termination condition, the module updates the reciprocal result output by the current reciprocal iteration to be the input operand of the next reciprocal iteration and keeps the enable flag of the reciprocal iteration operation valid. By giving the reciprocal iteration module its own termination condition and calculation precision, this scheme keeps the reciprocal calculation from running too long, speeds up obtaining a reciprocal result that satisfies the preset convergence condition, stops the reciprocal iteration in time, and helps the square root iteration module compute, in the next stage of the square root iteration, a square root that satisfies the square root iteration termination condition, so that calculation precision is guaranteed.
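The two nested termination rules described above can be sketched in software as follows. This is a minimal model under stated assumptions, not the patented circuit: the function names `reciprocal_q` and `sqrt_q`, the power-of-two starting guess for the reciprocal, the threshold values, and the `max_iter` safety cap are all hypothetical choices made for illustration. The number `a` is assumed to be a positive plain integer, and all other values are integers carrying `m` fractional bits.

```python
def reciprocal_q(b, m, eps, max_iter=32):
    """Approximate 1/b for a positive fixed-point b using multiply, subtract, shift."""
    # Assumed preset operand: a power of two y with b*y < 1, so the iteration converges.
    y = (1 << m) >> (b >> m).bit_length()
    for _ in range(max_iter):
        y_new = (((2 << m) - ((b * y) >> m)) * y) >> m
        if abs(y_new - y) < eps:      # reciprocal iteration termination condition
            return y                  # the previous result is forwarded, as described above
        y = y_new                     # current result becomes the next input operand
    return y                          # safety cap reached (not part of the source)


def sqrt_q(a, m=16, eps_sqrt=2, eps_recip=4, max_iter=32):
    """Approximate sqrt(a) with m fractional bits for a positive integer a, without dividing."""
    x = a << (m - 1)                  # initial operand: a/2 with m fractional bits
    for _ in range(max_iter):
        y = reciprocal_q(x, m, eps_recip)   # reciprocal iteration is enabled
        x_new = (x + a * y) >> 1            # one stage of the square root iteration
        if abs(x_new - x) < eps_sqrt:       # square root iteration termination condition
            return x_new              # the current result is the final square root
        x = x_new
    return x                          # safety cap reached (not part of the source)


root = sqrt_q(16)
print(root / (1 << 16))               # approximately 4.0
```

The cap on the number of passes guards against the case where truncation keeps successive results from ever agreeing to within the chosen threshold; a hardware design would need its own answer to that case.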
Further, the arithmetic circuit system also comprises a preprocessing module, and the square root iteration module comprises a first data register and a second data register. The preprocessing module sends the pre-input number to be square-rooted to a built-in shift register, sets the shifted number as the initial operand, at the preset precision, of the square root iteration operation, and sets an operable flag to the square root iteration module. When the operable flag is valid, the square root iteration module receives the initial operand and caches it in the first data register, and then, under the control of an external driving clock, caches it from the first data register into the second data register so that the initial operand becomes an input operand of the reciprocal iteration operation. When the operable flag is invalid, if the first data register caches the square root result output by the current stage of the square root iteration, the second data register caches the square root result output by the previous stage, and the absolute value of the difference between these two results does not satisfy the square root iteration termination condition, the square root iteration module caches the current-stage square root result from the first data register into the second data register, so that it overwrites the data cached in the second data register. In this scheme the preprocessing module starts the reciprocal iteration module to execute the reciprocal iteration operation, and also starts the square root iteration module to cache the initial operand and update it to be the input operand of the current stage of the square root iteration, that is, the Newton iteration initial value, so that the initial operand can be fed into a new stage of the square root iteration.
Further, the number of fractional bits of the initial operand is one greater than the number of bits shifted by the shift register built into the preprocessing module, so that the initial operand is half of the number to be square-rooted. This shortens the distance between the initial operand and the square root result and speeds up the iterative operation.
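The relation between the shift count and the halving can be checked with a short sketch. It assumes the number to be square-rooted arrives as a plain integer and that the result is read as a fixed-point value with `m` fractional bits; the name `initial_operand` is hypothetical.

```python
def initial_operand(a, m=16):
    """Return a/2 encoded as a fixed-point integer with m fractional bits."""
    # Shifting an integer left by m bits would encode a itself;
    # shifting by m - 1 bits (one fewer) therefore encodes a / 2.
    return a << (m - 1)


x0 = initial_operand(10)
print(x0, x0 / (1 << 16))   # 327680 5.0
```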
Further, the reciprocal iteration module includes a third data register and a fourth data register. When the third data register caches the reciprocal result output by the current stage of the reciprocal iteration and the fourth data register caches the reciprocal result output by the previous stage, and the difference between these two results does not satisfy the reciprocal iteration termination condition, the reciprocal iteration module caches the current-stage reciprocal result from the third data register into the fourth data register, so that it overwrites the data cached in the fourth data register. When the enable flag of the reciprocal iteration operation is invalid, the fourth data register caches a preset operand for the reciprocal iteration operation. In this scheme the two most recent reciprocal results are cached at the same time during the reciprocal iteration, so that the reciprocal iteration termination condition is evaluated from their difference, which reduces calculation complexity and hardware cost.
Further, the reciprocal iteration module includes a second multiplier, a second right shift register, a second left shift register, a second subtractor, a third multiplier, a third right shift register, a second selector, a third data register, a fourth data register, a third subtractor, and a second comparator, which are connected to form the second circuit unit architecture that performs the reciprocal iteration operation.
The first input of the second multiplier is connected to the first output of the second data register built into the square root iteration module and receives the data output by the second data register. The second input of the second multiplier is connected to the first output of the fourth data register and receives the data output by the fourth data register. The output of the second multiplier is connected to the input of the second right shift register, which right-shifts the product output by the second multiplier. Both the square root result and the reciprocal result have the number of fractional bits corresponding to the preset precision. The input of the second left shift register receives a first preset binary number and shifts it into a binary number that retains the full set of fractional bits; the first preset binary number is a constant that arises when the Newton iteration formula for the square root of the number to be square-rooted is transformed. The output of the second right shift register is connected to the first input of the second subtractor, the output of the second left shift register is connected to the second input of the second subtractor, and the output of the second subtractor is connected to the first input of the third multiplier. When the enable flag of the reciprocal iteration operation is valid, the second subtractor subtracts the shift result output by the second right shift register from the shift result output by the second left shift register and sends the difference to the third multiplier.
The second input of the third multiplier is connected to the first output of the fourth data register and receives the data output by it, so that in the current reciprocal iteration the third multiplier processes either the reciprocal result output by the reciprocal iteration module in the previous reciprocal iteration or the preset operand output by the fourth data register. The output of the third multiplier is connected to the input of the third right shift register. When the enable flag of the reciprocal iteration operation is valid, the third multiplier multiplies the difference output by the second subtractor by the data output by the fourth data register. The third right shift register right-shifts the product output by the third multiplier so that, after shifting, the product represents its fractional bits in binary form without loss.
The output of the third right shift register is connected to the first input of the second selector; the second input of the second selector receives the preset operand; the select input of the second selector receives the enable flag of the reciprocal iteration operation; and the output of the second selector is connected to the input of the third data register. When the enable flag is valid, the second selector sends the data output by the third right shift register to the third data register; when the enable flag is invalid, it sends the preset operand received at its second input to the third data register.
The first output of the third data register is connected to the first input of the third subtractor, the second output of the third data register is connected to the input of the fourth data register, and the second output of the fourth data register is connected to the second input of the third subtractor, so that the reciprocal result output by the current stage of the reciprocal iteration is cached in the third data register and the reciprocal result output by the previous stage is cached in the fourth data register. The third subtractor computes and outputs the difference between the data at the first output of the third data register and the data at the second output of the fourth data register. The output of the third subtractor is connected to the input of the second comparator. When the absolute value of that difference is smaller than the reciprocal iteration convergence difference, the output of the second comparator sets a valid end flag of the reciprocal iteration operation to the square root iteration module; this establishes that the reciprocal iteration termination condition is satisfied and that the data output by the first output of the fourth data register to the square root iteration module is the reciprocal result satisfying the preset convergence condition. When the absolute value of that difference is greater than or equal to the reciprocal iteration convergence difference, the second comparator sets an invalid end flag of the reciprocal iteration operation to the square root iteration module, and the data most recently cached in the fourth data register is then configured as the input operand of the next reciprocal iteration. The number of bits shifted by the second left shift register, the second right shift register, and the third right shift register is determined by the preset precision.
This scheme uses multipliers and right shift registers as the basic calculation units, combined in sequence with a subtractor and a selector, to lay out in hardware the reciprocal iteration formula derived from the Newton iteration formula, $Y_{i+1} = \bigl(((2 \ll m) - ((b \cdot Y_i) \gg m)) \cdot Y_i\bigr) \gg m$, where $Y_{i+1}$ is the reciprocal, with a precision of $m$ fractional bits, output by the $(i+1)$-th stage of the reciprocal iteration; 2 is the constant that arises when the Newton iteration formula for the square root of the number to be square-rooted is transformed; $b$ is the data output by the second data register built into the square root iteration module, that is, the square root result output by the square root iteration module in the previous stage of the square root iteration; $Y_i$ is the reciprocal, with a precision of $m$ fractional bits, output by the $i$-th stage of the reciprocal iteration; and $m$ and $i$ are integers. The reciprocal iteration is therefore performed without using divider hardware.
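A single pass of the reciprocal iteration formula can be traced in integer arithmetic as below. This is an illustrative sketch: the function name `reciprocal_step`, the input value 4.0, and the starting value 0.125 (standing in for the preset operand, whose actual value the text does not give) are assumptions. The comments map each line to the circuit units named in the description.

```python
def reciprocal_step(b, y, m):
    """One pass of Y_next = (((2 << m) - ((b * Y) >> m)) * Y) >> m."""
    t = (b * y) >> m        # second multiplier, then second right shift: b*Y rescaled
    d = (2 << m) - t        # second left shift of the constant 2, then second subtractor
    return (d * y) >> m     # third multiplier, then third right shift


m = 16
b = 4 << m                  # 4.0 with 16 fractional bits
y = 1 << (m - 3)            # 0.125, an assumed preset operand
trace = []
for _ in range(5):
    y = reciprocal_step(b, y, m)
    trace.append(y)
print(trace)                # [12288, 15360, 16320, 16383, 16383]
```

The sequence settles at 16383, one unit below the exact encoding 16384 of 0.25, which shows the effect of truncating right shifts on the last bit.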
Furthermore, the square root iteration module comprises a first multiplier, an AND gate, a first adder, a first right shift register, a first selector, a first data register, a second data register, a first subtractor, and a first comparator, which are connected to form the first circuit unit architecture that performs the square root iteration operation.
The first input of the first multiplier receives the number to be square-rooted. The second input of the first multiplier is connected to the first output of the fourth data register and receives the data output by the fourth data register. The enable input of the first multiplier is connected to the output of the AND gate; the first input of the AND gate receives the end flag of the reciprocal iteration operation, and the second input of the AND gate receives the end flag of the square root iteration operation. The output of the first multiplier is connected to the first input of the first adder. When the end flag of the reciprocal iteration operation is valid and the end flag of the square root iteration operation is invalid, the first multiplier multiplies the number to be square-rooted by the data output by the fourth data register and sends the product to the first adder. The output of the AND gate is also connected to the enable input of the first adder, whose first input is connected to the output of the first multiplier and whose second input is connected to the first output of the second data register. Under the same flag condition, the first adder adds the data output by the second data register to the product output by the first multiplier.
The output of the first adder is connected to the input of the first right shift register, which receives the sum output by the first adder and right-shifts it. The output of the first right shift register is connected to the first input of the first selector; the second input of the first selector receives the initial operand; the select input of the first selector receives the operable flag; and the output of the first selector is connected to the input of the first data register. When the operable flag is invalid, the first selector sends the data output by the first right shift register to the first data register, so that the first data register caches the square root result output by the current stage of the square root iteration. When the operable flag is valid, the first selector sends the initial operand received at its second input to the first data register.
The first output of the first data register is connected to the first input of the first subtractor, the second output of the first data register is connected to the input of the second data register, and the second output of the second data register is connected to the second input of the first subtractor, so that the square root result output by the current stage of the square root iteration is cached in the first data register and the square root result output by the previous stage is cached in the second data register. The first subtractor computes and outputs the difference between the data at the first output of the first data register and the data at the second output of the second data register. The output of the first subtractor is connected to the input of the first comparator, whose output carries the signal corresponding to the end flag of the square root iteration operation. When the absolute value of that difference is smaller than the square root iteration convergence difference, the square root iteration module sets the end flag of the square root iteration operation to valid; this establishes that the square root iteration termination condition is satisfied and that the data output externally by the third output of the second data register is the square root result finally calculated by the arithmetic circuit system. When the absolute value of that difference is greater than or equal to the square root iteration convergence difference, the square root iteration module sets the enable flag of the reciprocal iteration operation to valid and outputs it to the reciprocal iteration module, and the data most recently cached in the second data register is then configured as the input operand of the reciprocal iteration operation.
This scheme uses a multiplier and a right shift register as the basic calculation units, combined in sequence with an adder and a selector, to lay out in hardware the square root iteration formula derived from the Newton iteration formula, $X_{n+1} = (X_n + a \cdot (1/X_n)) \gg 1$, where $X_{n+1}$ is the square root, at the preset precision, output by the $(n+1)$-th stage of the square root iteration; $a$ is the original number to be square-rooted; $X_n$ is the square root, at the preset precision, output by the $n$-th stage of the square root iteration; and $n$ is an integer. The factor $1/X_n$ is the reciprocal result supplied by the reciprocal iteration module, so the square root iteration is performed without using divider hardware.
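One stage of the square root iteration formula reduces to a multiply, an add and a one-bit right shift once the reciprocal is available. The sketch below assumes the number `a` is a plain integer while `x` and `y` carry 16 fractional bits; the name `sqrt_step` and the sample values are hypothetical.

```python
def sqrt_step(x, a, y):
    """One stage of X_next = (X + a * (1/X)) >> 1, with y standing in for 1/X."""
    product = a * y             # first multiplier: a times the reciprocal result
    return (x + product) >> 1   # first adder, then first right shift (halving)


m = 16
a = 16                          # number to be square-rooted, as a plain integer
x = 5 << m                      # current estimate 5.0
y = 13107                       # 1/5 with 16 fractional bits (0.2 * 65536, truncated)
x_next = sqrt_step(x, a, y)
print(x_next, x_next / (1 << m))
```

In real arithmetic the same step gives (5 + 16/5) / 2 = 4.1, and the fixed-point result 268696 decodes to a value just below that.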
In the hardware schemes of the square root iteration module and the reciprocal iteration module described above, each iteration exits as soon as its preset convergence difference is reached and yields suitable result data. This reduces redundant iterative computation without degrading the accuracy of the estimate.
Furthermore, the preprocessing module comprises a preset data register and a first left shift register. The preset data register caches the number to be square-rooted. The output of the preset data register is connected to the input of the first left shift register, which left-shifts the number to be square-rooted into the initial operand so that the initial operand is half of that number; the number to be square-rooted has the number of fractional bits corresponding to the preset precision. The output of the first left shift register is connected to the second input of the first selector and transmits the initial operand to the first selector. The output of the preset data register is also connected to the first input of the first multiplier and transmits the number to be square-rooted to the first multiplier. In this scheme the left shift register turns the input into an initial operand, used as the initial value of the Newton iteration, that is half of the number to be square-rooted.
Further, the reciprocal iteration convergence difference is greater than the square-root iteration convergence difference. The smaller the reciprocal iteration convergence difference, the higher the precision of the reciprocal iterative operation and the longer it takes; likewise, the smaller the square-root iteration convergence difference, the higher the precision of the square-root iterative operation and the longer it takes. This suits application scenarios that place strict requirements on computing latency and are cost-sensitive.
Further, the predetermined precision specifically means that the fractional part of the corresponding binary number is set to 16 bits, so that the 16-bit-wide fractional part supports the reciprocal iteration module in calculating the reciprocal of decimal numbers ranging from 0 to 65565, where the radicand is a binary number whose lower 16 bits are configured as the fractional part. This meets the calculation requirements for batches of binary numbers.
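To make the 16-bit fractional format concrete, the following Python sketch converts between real values and integers whose lower 16 bits hold the fraction. The names `FRAC_BITS`, `to_fixed` and `to_real` are illustrative assumptions that do not appear in the original disclosure, and rounding by truncation is likewise an assumption.

```python
FRAC_BITS = 16  # lower 16 bits hold the fractional part (the predetermined precision)


def to_fixed(value: float) -> int:
    """Encode a non-negative real value as an integer with 16 fractional bits (truncating)."""
    return int(value * (1 << FRAC_BITS))


def to_real(fixed: int) -> float:
    """Decode an integer with 16 fractional bits back to a real value."""
    return fixed / (1 << FRAC_BITS)


print(to_fixed(1.5))    # 98304, i.e. 1.5 * 65536
print(to_real(98304))   # 1.5
```

In this representation a multiplication of two such integers doubles the number of fractional bits, which is why a 16-bit right shift follows each product in the iterative formulas.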
A chip incorporates the above hardware-acceleration-based operation circuit system. This avoids the drawback of the prior art that a divider is needed to compute a square root by the Newton iteration method, and greatly saves hardware resources.
Drawings
Fig. 1 is a schematic diagram of an operational circuit system according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are further described below with reference to the drawings. Each unit block in the following embodiments is a logic circuit. A logic circuit may be one physical unit, a state machine formed by combining a plurality of logic devices according to a certain read/write timing and signal logic changes, a part of one physical unit, or a combination of a plurality of physical units. In addition, to highlight the innovative part of the invention, units that are not closely related to solving the technical problem addressed by the invention are not introduced in the embodiments, but this does not mean that no other units are present in the embodiments.
As those skilled in the relevant art will understand, the Newton iteration method is one of the important methods for finding the roots of an equation. Its greatest advantage is quadratic convergence near a simple root of the equation $f(x) = 0$. The method can also be used to find multiple roots and complex roots of an equation, in which case it converges linearly, although superlinear convergence can be obtained by certain techniques. The method is widely used in computer programming, but in the prior art it requires a multiplexed divider to calculate a more accurate approximation of the arithmetic square root of the radicand, which increases the hardware resource overhead.
Specifically, as those skilled in the art know from the blog post at https://blog.csdn.net/qq_43227036/article/details/104439102, to obtain the square root of the radicand $a$, substituting $f(X_n) = X_n^2 - a$ into the Newton iteration formula $X_{n+1} = X_n - f(X_n)/f'(X_n)$ ($n = 0, 1, 2, \ldots$) gives the formula $X_{n+1} = \frac{1}{2}(X_n + a/X_n)$. From the derivation in that blog post, $X_{n+1}$ always approaches the point $X$ at which $f(X) = 0$, so once $X_0$ is selected as the Newton iteration initial value, continued iteration yields an $X_{n+1}$ that eventually reaches the point $f(X) = 0$; this $X_{n+1}$ is the square root of $a$. Because the square-root iterative formula contains $1/X_n$, the reciprocal of $X_n$ must be taken in every iteration, where $n$ is the iteration count, which can also be understood as the number of iteration stages. Under the prior art this requires a multiplexed divider, which increases the hardware resource overhead.
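The substitution can be written out step by step. With $f(X) = X^2 - a$ and $f'(X) = 2X$:

$$X_{n+1} = X_n - \frac{X_n^2 - a}{2X_n} = \frac{2X_n^2 - X_n^2 + a}{2X_n} = \frac{1}{2}\left(X_n + \frac{a}{X_n}\right).$$

As an illustrative check (the values are chosen here and are not taken from the original), $a = 16$ and $X_0 = 8$ give $X_1 = \frac{1}{2}(8 + 2) = 5$ and $X_2 = \frac{1}{2}(5 + 3.2) = 4.1$, which approaches $\sqrt{16} = 4$.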
To address this drawback of computing the square root by the Newton iteration method, the embodiment of the invention provides an operation circuit system and a chip based on a twofold Newton iteration method, built on the Newton-iteration square root calculation, which avoids the divider required by the original method and greatly saves hardware resources. In particular, in the hardware architecture of the reciprocal arithmetic circuit of this embodiment, following the derivation approach of the blog post at https://blog.csdn.net/qq_43227036/article/details/104439102, to obtain the reciprocal of an operand $b$ (in this embodiment $b$ is the aforementioned $X_n$, whose reciprocal is obtained in advance), substituting $f(Y_i) = 1/Y_i - b$ into the Newton iteration formula $Y_{i+1} = Y_i - f(Y_i)/f'(Y_i)$ ($i = 0, 1, 2, \ldots$) gives the reciprocal iterative formula $Y_{i+1} = Y_i(2 - b \cdot Y_i)$. Here $Y_{i+1}$ always approaches the point at which $f(Y) = 0$, so once $Y_0$ is selected as the Newton iteration initial value, $Y_{i+1}$ is solved iteratively and, under a given preset convergence condition, is considered to have reached the point $f(Y) = 0$; this $Y_{i+1}$ is the reciprocal of $b$. Here $i$ is the iteration count, which can also be understood as the number of iteration stages in the hardware implementation.
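The reciprocal formula follows from the same substitution. With $f(Y) = 1/Y - b$ and $f'(Y) = -1/Y^2$:

$$Y_{i+1} = Y_i - \frac{1/Y_i - b}{-1/Y_i^2} = Y_i + Y_i^2\left(\frac{1}{Y_i} - b\right) = Y_i\,(2 - b \cdot Y_i).$$

The update uses only multiplication and subtraction, so no divider is involved. Since $1 - b \cdot Y_{i+1} = (1 - b \cdot Y_i)^2$, the iteration converges in real arithmetic when $0 < Y_0 < 2/b$. As an illustrative check (the values are chosen here and are not taken from the original), $b = 4$ and $Y_0 = 0.1$ give $Y_1 = 0.16$ and $Y_2 = 0.2176$, which approaches $1/4$.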
It should be noted that, first, hardware has no native representation of fractional values and, second, integers are stored in binary form in the CPU and its associated digital logic circuits. Therefore, to represent the fractional part corresponding to the predetermined precision, this embodiment uses shift registers to convert externally input operands (including the radicand), iterative operation results, and/or intermediate operation data into integer data that carries the fractional part corresponding to the predetermined precision. Specifically, a shift register adjusts the fixed-point position of the relevant number by a left or right shift, so that the bit width of the converted result comprises the bit width of the integer part and the bit width of the fractional part.
In some implementation scenarios, computing the reciprocal of a wider range of integers requires increasing the bit width of the fractional part to ensure the precision of the square-root result, and the computed result must be adjusted according to the fixed-point position of the intermediate values involved so that the precision of the result stays consistent with that of the operands, as will be clear to those skilled in the art.
As shown in fig. 1, an embodiment of the invention discloses a hardware-acceleration-based operation circuit system comprising a square-root iteration module and a reciprocal iteration module. The operation circuit system described here covers only the core operation units, and may include, without limitation, a preprocessing unit. In this embodiment, the square-root iteration module is configured so that, when the end flag flag3 of the reciprocal iterative operation is valid and the end flag flag4 of the square-root iterative operation is invalid (flag4 is 0), the current stage of the square-root iterative operation multiplies the preset radicand by the reciprocal result that is output by the reciprocal iteration module and satisfies the preset convergence condition, and thereby performs the square-root iterative operation at the predetermined precision. Once the radicand has entered the square-root iteration module, the module performs the Newton-iteration square-root operation at the predetermined precision according to the formula $X_{n+1} = \frac{1}{2}(X_n + a/X_n)$, or equivalently $X_{n+1} = (X_n + a/X_n) \gg 1$, starting from an iteration initial value $X_0$. When the square-root iteration termination condition is not met, the square-root iterative operation is suspended inside the square-root iteration module and the next stage is not yet started; $X_{n+1}$ (equal to $b$ in the figure, i.e. the operand whose reciprocal is to be taken) is sent to the reciprocal iteration module to obtain the reciprocal result $1/X_{n+1}$ (that is, $1/b$) satisfying the preset convergence condition, after which the next stage of the square-root iterative operation proceeds. This cycle repeats until the end flag flag4 of the square-root iterative operation becomes valid (is set to 1), at which point the square-root iterative operation ends. The end flag flag3 of the reciprocal iterative operation is an iteration-end control signal output (or set) by the reciprocal iteration module to the square-root iteration module; the end flag flag4 of the square-root iterative operation is an iteration-end control signal output (or set) by the square-root iteration module.
The reciprocal iteration module is configured so that, when the enable flag flag2 of the reciprocal iterative operation is valid, the current stage of the reciprocal iterative operation multiplies the square root result output by the square-root iteration module in the previous stage of the square-root iterative operation by the reciprocal result output by the reciprocal iteration module in the previous stage of the reciprocal iterative operation, and thereby performs the reciprocal iterative operation at the predetermined precision until a reciprocal result satisfying the preset convergence condition is obtained. In some implementation scenarios, the preset operand output by the reciprocal iteration module is instead multiplied by the initial operand output by the square-root iteration module to perform the reciprocal iterative operation at the predetermined precision and obtain a reciprocal result satisfying the preset convergence condition. The initial operand cached by the square-root iteration module is the externally input radicand after shift processing to the predetermined precision; the preset operand is the Newton iteration initial value cached in advance by the reciprocal iteration module for the reciprocal iterative operation.
When the enable flag flag2 of the reciprocal iterative operation is valid, the reciprocal iteration module receives data $b$ from the square-root iteration module (the operand whose reciprocal is required, output to the reciprocal iteration module when the square-root iterative operation has not met its termination condition; it can be understood as the square root result of the most recently executed stage of the square-root iterative operation). It then iterates at the predetermined precision according to $Y_{i+1} = (Y_i \cdot ((2 \ll 16) - ((b \cdot Y_i) \gg 16))) \gg 16$ until it outputs a reciprocal result satisfying the preset convergence condition. That reciprocal result is transmitted to the square-root iteration module, which continues with a new stage of the square-root iterative operation (either the next stage or a fresh first stage after clearing), and the end flag flag3 of the reciprocal iterative operation is set valid (set to 1), indicating that the reciprocal iteration module is currently outputting a valid reciprocal result. The enable flag flag2 is an iteration-start control signal set by the square-root iteration module to the reciprocal iteration module, indicating that the reciprocal iterative operation is to start. The lower 16 bits of $Y_{i+1}$ and $Y_i$ are configured to represent their fractional parts, so that the predetermined precision covers a 16-bit fraction.
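The fixed-point reciprocal iteration can be simulated in a few lines of Python. This is a minimal sketch under stated assumptions rather than the circuit itself: the function name `reciprocal_fixed`, the convergence difference `eps`, and the stage limit `max_stages` are hypothetical, and the default initial value 4'b1001 is the preset operand mentioned later in the description. Returning the newer of the two compared values is also a simplification made here.

```python
FRAC_BITS = 16  # fractional bits of the predetermined precision


def reciprocal_fixed(b: int, y0: int = 0b1001, eps: int = 1, max_stages: int = 64) -> int:
    """Approximate 1/b, where b and the result both carry 16 fractional bits.

    y0, eps and max_stages are illustrative assumptions.
    """
    y = y0
    for _ in range(max_stages):
        # One stage: Y_{i+1} = (Y_i * ((2 << 16) - ((b * Y_i) >> 16))) >> 16
        y_next = (y * ((2 << FRAC_BITS) - ((b * y) >> FRAC_BITS))) >> FRAC_BITS
        if abs(y_next - y) <= eps:   # difference of two successive results
            return y_next
        y = y_next                   # the newer result replaces the older one
    return y


b = 4 << FRAC_BITS                   # 4.0 in the 16-fractional-bit format
print(reciprocal_fixed(b) / (1 << FRAC_BITS))   # close to 0.25
```

The explicit parentheses around `2 << FRAC_BITS` matter in Python, where subtraction binds more tightly than the shift operators.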
In this embodiment, the square-root iterative operation is an iterative operation that solves for the square root of the radicand using the Newton iteration formula, and a first circuit unit architecture corresponding to its iterative logic is connected within the square-root iteration module. The reciprocal iterative operation is an iterative operation that solves for the required reciprocal using the Newton iteration formula, and a second circuit unit architecture corresponding to its iterative logic is connected within the reciprocal iteration module. Compared with the prior art, before the square root of the radicand is obtained, the operation circuit system first iteratively computes, in the reciprocal iteration module, a reciprocal that satisfies the preset convergence condition, and then iteratively computes the square root in the square-root iteration module. This avoids using divider resources in the operation circuit system to obtain the required reciprocal, and because the square-root iteration module and the reciprocal iteration module iterate at the same precision, the calculation speed is improved and hardware resources are saved.
As an embodiment, referring to fig. 1, while the square-root iteration module performs the square-root iterative operation at the predetermined precision, the square root result output by the previous stage ($X_n$, cached in the second data register shown) is compared with the square root result output by the current stage ($X_{n+1}$, cached in the first data register shown). When the absolute value of their difference satisfies the square-root iteration termination condition, the square-root iterative operation ends: the square root result output by the current stage is taken as the final square root result satisfying the termination condition (the second data register is updated on a clock beat and the result is then output from terminal S of the second data register), and the end flag flag4 of the square-root iterative operation is set valid. When the absolute value of the difference does not satisfy the termination condition, the square root result output by the current stage is passed to the reciprocal iteration module: on a clock beat, $X_{n+1}$ cached in the first data register is transferred to the second data register, overwriting $X_n$; $X_{n+1}$ is then output from the second data register to the reciprocal iteration module as the input operand of a new stage of the reciprocal iterative operation, the enable flag of the reciprocal iterative operation is set valid (flag2 is 1) toward the reciprocal iteration module, and the current stage of the square-root iterative operation is determined to be complete. Thus, whenever the square root of the current stage does not satisfy the termination condition, the reciprocal iteration module is started to obtain a more suitable reciprocal result, which helps the next stage of the square-root iterative operation obtain a square root closer to the termination condition.
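The interplay of the two modules described above can be sketched as a small software model. It is a simulation under stated assumptions, not the circuit: the function names, the two convergence differences, and the stage limits are hypothetical placeholders, and the radicand is assumed to be supplied as a plain integer so that the product of the radicand and the reciprocal already carries 16 fractional bits.

```python
FRAC_BITS = 16  # fractional bits of the predetermined precision


def reciprocal_fixed(b: int, y0: int = 0b1001, eps: int = 2, max_stages: int = 64) -> int:
    """Reciprocal module: Newton iteration for 1/b in 16-fractional-bit integers."""
    y = y0
    for _ in range(max_stages):
        y_next = (y * ((2 << FRAC_BITS) - ((b * y) >> FRAC_BITS))) >> FRAC_BITS
        if abs(y_next - y) <= eps:
            return y_next
        y = y_next
    return y


def sqrt_fixed(a: int, eps: int = 1, max_stages: int = 32) -> int:
    """Square-root module: returns sqrt(a) with 16 fractional bits, a being a positive integer."""
    x_prev = a << (FRAC_BITS - 1)          # initial operand X0 = a/2 (second data register)
    for _ in range(max_stages):
        inv = reciprocal_fixed(x_prev)     # reciprocal module supplies 1/X_n
        x_curr = (x_prev + a * inv) >> 1   # X_{n+1} = (X_n + a*(1/X_n)) >> 1 (first data register)
        if abs(x_curr - x_prev) <= eps:    # termination condition met: flag4 becomes valid
            return x_curr
        x_prev = x_curr                    # clock beat: first register overwrites the second
    # Truncation can leave successive results a few least-significant bits apart,
    # so the number of stages is bounded in this sketch.
    return x_prev


print(sqrt_fixed(16) / (1 << FRAC_BITS))   # close to 4.0
```

No division appears in either loop; the only arithmetic is multiplication, addition, subtraction, and shifting.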
It should be noted that, as shown in fig. 1, when the operable flag flag1 is valid (flag1 is set high), the square-root iteration module receives the initial operand and caches it in its built-in first data register. Preferably, flag1 is held high for only one pulse period, which is shorter than the period consumed by one stage of the square-root iterative operation or one stage of the reciprocal iterative operation. Under the control of an external driving clock, the initial operand X0 is then transferred from the first data register into the built-in second data register of the square-root iteration module, so that the initial operand becomes the input operand of the next stage of the reciprocal iterative operation. When the operable flag flag1 is invalid (flag1 is 0), the square-root iteration module caches the square root result output by the current stage of the square-root iterative operation in the first data register and the square root result output by the previous stage in the second data register. When the absolute value of the difference between these two results does not satisfy the square-root iteration termination condition, the result of the current stage is transferred, under the control of the external driving clock, from the first data register into the second data register, overwriting the data previously cached there. At that moment the current stage of the square-root iterative operation ends and a new stage of the reciprocal iterative operation starts; the result of the current stage becomes the result of the previous stage, the next stage about to be executed becomes the current stage, and the result it outputs becomes the result of the current stage. It should also be noted that, before the end flag flag4 of the square-root iterative operation becomes valid, whenever the second data register caches the square root result of the previous stage, the first data register caches the square root result of the current stage; these square root results are to be understood as intermediate results obtained before flag4 becomes valid.
In this embodiment, the preprocessing module starts the reciprocal iteration module to perform the reciprocal iterative operation and also starts the square-root iteration module to cache the initial operand, which becomes the input operand of the current stage of the square-root iterative operation, that is, the Newton iteration initial value, for use in a new stage of the square-root iterative operation.
As an embodiment, referring to fig. 1, while the reciprocal iteration module performs the reciprocal iterative operation at the predetermined precision, the reciprocal result output by the previous stage ($Y_i$, cached in the fourth data register shown) is compared with the reciprocal result output by the current stage ($Y_{i+1}$, cached in the third data register shown). When the absolute value of their difference satisfies the reciprocal iteration termination condition, the reciprocal iterative operation is determined to end, that is, the current stage is complete; the end flag of the reciprocal iterative operation is set valid (flag3 = 1) toward the square-root iteration module; and the reciprocal result output by the previous stage (the data cached in the fourth data register) is determined to be the reciprocal result satisfying the preset convergence condition (that is, the value $Y_i$ cached in the fourth data register is the reciprocal of the operand $b$ at the predetermined precision). At the same time this reciprocal result is transmitted to the square-root iteration module, where it serves as $1/X_n$ in the square-root iterative formula. When the absolute value of the difference does not satisfy the reciprocal iteration termination condition, the reciprocal result output by the current stage becomes the input operand of the next stage: on a clock beat, $Y_{i+1}$ cached in the third data register is transferred to the fourth data register, overwriting $Y_i$; the enable flag flag2 of the reciprocal iterative operation remains valid and the end flag flag3 is set invalid. By giving the reciprocal iteration module a termination condition and a calculation precision, this embodiment prevents the reciprocal calculation from running too long, speeds up obtaining a reciprocal result that satisfies the preset convergence condition, stops the reciprocal iterative operation in time, helps the square-root iteration module calculate a square root satisfying the square-root termination condition in a new stage of the square-root iterative operation, and ensures the calculation precision.
Specifically, as shown in fig. 1, the reciprocal iteration module includes a third data register and a fourth data register. While the enable flag (flag2) of the reciprocal iterative operation is valid, the third data register caches the reciprocal result output by the current stage of the reciprocal iterative operation and the fourth data register caches the reciprocal result output by the previous stage. When the absolute value of the difference between the data cached in the fourth data register and the data cached in the third data register does not satisfy the reciprocal iteration termination condition, the result of the current stage is transferred, under the control of an external driving clock, from the third data register into the fourth data register, overwriting the result of the previous stage; the fourth data register thus also caches the input operand of the next stage. At that moment the current stage of the reciprocal iterative operation ends and a new stage starts; the result of the current stage becomes the result of the previous stage, the next stage about to be executed becomes the current stage, and the result it outputs becomes the result of the current stage. When the result of the current stage, configured as the input operand of the next stage, is output by the fourth data register for the next stage of the reciprocal iterative operation, the result of that next stage is cached in the third data register. It should be noted that, before the end flag flag3 of the reciprocal iterative operation becomes valid, whenever the fourth data register caches the reciprocal result of the previous stage, the third data register caches the reciprocal result of the current stage; these reciprocal results are to be understood as intermediate results obtained before flag3 becomes valid. When the enable flag flag2 is invalid (flag2 is 0), the fourth data register caches the preset operand for the reciprocal iterative operation, preferably 4'b1001, as the initial operand of the reciprocal iterative operation, which is equivalent to the Newton iteration initial value for the reciprocal. Because the two most recent reciprocal results are cached simultaneously during the reciprocal iterative operation, the reciprocal iteration termination condition is determined simply from their difference, which reduces the computational complexity and the hardware cost.
It should be noted that the data register mentioned in the foregoing embodiment is used for storing operands, operation results, and intermediate results of operations, so as to reduce the number of times of accessing the memory.
As an embodiment, as can be seen in fig. 1, the operation circuit system further includes a preprocessing module. The preprocessing module sends the pre-input radicand to a built-in shift register, sets the shifted radicand as the initial operand X0 of predetermined precision for the square-root iterative operation, and sets the operable flag toward the square-root iteration module, so that when the operable flag is valid the initial operand X0 is input to the square-root iteration module and cached in its first data register. Specifically, the preprocessing module comprises a preset data register and a first left shift register. The preset data register caches the radicand a. The output of the preset data register is connected to the input of the first left shift register, which left-shifts the radicand a into the initial operand for the square-root iterative operation so that the initial operand X0 is half of the radicand a, where the radicand a has the number of fractional bits corresponding to the predetermined precision. In other words, the number of fractional bits of the initial operand is one greater than the shift amount configured in the shift register of the preprocessing module, which makes X0 half of a. This shortens the distance between the initial operand and the square root result and speeds up the iterative operation. As shown in fig. 1, when the lower 16 bits of the radicand are the fractional part, the predetermined precision corresponds to a 16-bit fractional precision, and the first left shift register (the rectangular box labeled "<< 15") is configured to left-shift its input data by 15 bits, one bit at a time under the action of shift pulses.
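The effect of the 15-bit left shift can be checked with a short Python sketch. It assumes, for illustration only, that the radicand is supplied as a plain integer; the function name `initial_operand` is hypothetical.

```python
FRAC_BITS = 16  # fractional bits of the predetermined precision


def initial_operand(a: int) -> int:
    """Left-shift the integer radicand by 15 bits.

    Read as a number with 16 fractional bits, the result equals a / 2,
    because the shift amount is one less than the number of fractional bits.
    """
    return a << (FRAC_BITS - 1)


x0 = initial_operand(9)
print(x0)                      # 294912
print(x0 / (1 << FRAC_BITS))   # 4.5, i.e. half of 9
```

A 16-bit shift would instead produce the radicand itself in the same format, so the single missing bit of shift is what halves the value.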
As an embodiment, as shown in fig. 1, the reciprocal iteration module includes a second multiplier, a second right shift register, a second left shift register, a second subtractor, a third multiplier, a third right shift register, a second selector, a third data register, a fourth data register, a third subtractor, and a second comparator, which are connected to form the second circuit unit architecture for performing the reciprocal iterative operation. A first input of the second multiplier is connected to a first output of the second data register built into the square-root iteration module and receives the data output by the second data register, so that the reciprocal iteration module begins processing the square root result output by the square-root iteration module in the previous stage of the square-root iterative operation. Specifically, when the absolute value of the difference between the data cached in the second data register and the data cached in the first data register does not satisfy the square-root iteration termination condition, the reciprocal iteration module begins processing the square root result of the most recently executed stage of the square-root iterative operation (output by the second data register after its cached data has been updated). Alternatively, each time the reciprocal iteration module begins a reciprocal iterative operation (including the case where the enable flag flag2 is set valid (equal to 1) and the module has not previously performed any reciprocal iterative operation), the initial operand X0 is configured as the input operand of the reciprocal iterative operation, and the second multiplier begins processing the initial operand output by the second data register in the current stage of the reciprocal iterative operation.
A second input of the second multiplier is connected to a first output of the fourth data register and receives the data output by the fourth data register, so that in the current stage of the reciprocal iterative operation the second multiplier begins processing the reciprocal result output by the reciprocal iteration module in the previous stage. Specifically, when the absolute value of the difference between the data cached in the third data register and the data cached in the fourth data register does not satisfy the reciprocal iteration termination condition, the second multiplier begins processing the reciprocal result of the most recently executed stage of the reciprocal iterative operation, as output by the fourth data register. Alternatively, each time the reciprocal iteration module begins a reciprocal iterative operation (including the case where the module has not previously performed any stage of the reciprocal iterative operation and the enable flag flag2 is set valid (equal to 1)), the preset operand is configured as the other input operand of the reciprocal iterative operation, and the second multiplier begins processing the preset operand output by the fourth data register in the current stage.
The output of the second multiplier is connected to the input of the second right shift register, which right-shifts the product output by the second multiplier so that, after the shift, the product still represents its fractional bits in binary form without loss; both the square root result and the reciprocal result carry the number of fractional bits corresponding to the predetermined precision. In the embodiment shown in fig. 1, the second right shift register (the rectangular box labeled ">> 16") right-shifts its input data by 16 bits to cover a 16-bit-wide fractional part, which can also be described as covering the lower 16 fractional bits of the approximations of the square root result and the reciprocal result.
The input end of the second left shift register is used for receiving a first preset binary number 2'b10 (equal to decimal 2) and shifting it left to form a binary number that reserves the complete fractional digits; when the predetermined precision requires covering a 16-bit fraction, the second left shift register (the rectangular box labeled "<<16") shifts the first preset binary number 2'b10 left by 16 bits. The first preset binary number is the constant that arises when the Newton iteration formula is rearranged in the process of solving the square root of the number a, that is, the constant 2 of the reciprocal iteration formula in the foregoing embodiment.
The output end of the second right shift register is connected with the first input end of the second subtractor, the output end of the second left shift register is connected with the second input end of the second subtractor, and the output end of the second subtractor is connected with the first input end of the third multiplier; the second subtractor is used for, when the enable flag2 of the reciprocal iteration operation is valid, subtracting the shift result output by the second right shift register from the shift result output by the second left shift register, consistent with the term (2 << 16) - ((b * Y_i) >> 16) of the reciprocal iteration formula, and then sending the difference to the third multiplier. A second input end of the third multiplier is connected with a first output end of the fourth data register and is used for receiving data output by the fourth data register, so that, when the absolute value of the difference between the data cached in the third data register and the data cached in the fourth data register does not satisfy the reciprocal iteration termination condition, the third multiplier multiplies, in the current stage of reciprocal iteration operation, the difference output by the second subtractor by the reciprocal result output by the reciprocal iteration module in the previous (most recently executed) stage of reciprocal iteration operation; or, when the enable flag2 of the reciprocal iteration operation is set to be valid (equal to 1) and the reciprocal iteration operation is just starting (including the case that the reciprocal iteration module has not performed any reciprocal iteration operation before), the third multiplier multiplies the difference output by the second subtractor by the preset operand output by the fourth data register.
The output end of the third multiplier is connected with the input end of the third right shift register, and the third multiplier is used for multiplying the difference output by the second subtractor by the data output by the fourth data register when the enable flag2 of the reciprocal iteration operation is valid; the third right shift register is used for right-shifting the product output by the third multiplier, so that after the shift the product still represents its fractional digits in binary form without loss at the predetermined precision. When the predetermined precision requires covering a 16-bit fraction, the third right shift register is the rectangular box labeled ">>16" in fig. 1; that is, the intermediate result processed in sequence by the second multiplier and the third multiplier is shifted right by 16 bits, so that the precision of the finally obtained reciprocal result covers a 16-bit fraction. In this embodiment, the result output by the third right shift register is the reciprocal result (intermediate result) of one round (one stage) of iteration in the reciprocal iteration formula; in other words, the hardware circuit reproduces one stage of the reciprocal iteration formula, which for a specific iteration index i corresponds to Y_{i+1} = (Y_i * ((2 << 16) - ((b * Y_i) >> 16))) >> 16.
The output end of the third right shift register is connected to a first input end of a second selector, and a second input end of the second selector is used for receiving the preset operand of the reciprocal iteration operation, corresponding to 4'b1001 in fig. 1. The selection end of the second selector is used for receiving the enable flag2 of the reciprocal iteration operation, and the output end of the second selector is connected with the input end of the third data register. When the enable flag2 of the reciprocal iteration operation is valid (flag2 = 1), the second selector selects the data output by the third right shift register and sends it to the third data register, so that the reciprocal result output in the current stage of reciprocal iteration operation is cached in the third data register; when the enable flag2 of the reciprocal iteration operation is invalid (flag2 = 0), the second selector selects the preset operand received at its second input end and sends it to the third data register, where it is stored as the initial value of the Newton iteration required each time the reciprocal iteration operation starts.
The first output end of the third data register is connected with the first input end of the third subtracter, the second output end of the third data register is connected with the input end of the fourth data register to form two cascaded data registers, the second output end of the fourth data register is connected with the second input end of the third subtracter, so that the reciprocal result output by the current stage of reciprocal iteration operation is cached in the third data register, the reciprocal result output by the last stage of reciprocal iteration operation is cached in the fourth data register, and the cached data of the fourth data register supports data coverage updating transmitted by the third data register.
The output end of the third subtractor is connected with the input end of the second comparator, and the second comparator is configured to determine, when the absolute value of the difference between the data output from the first output end of the third data register and the data output from the second output end of the fourth data register is smaller than the reciprocal iteration convergence difference, that the reciprocal iteration termination condition is met and that the reciprocal iteration operation ends; in that case the enable flag2 of the reciprocal iteration operation is set to be invalid (flag2 = 0) and a valid end flag3 of the reciprocal iteration operation is output to the squaring iteration module, so that the data output from the first output end of the fourth data register to the squaring iteration module is a reciprocal result meeting the preset convergence condition.
When the absolute value of the difference value between the data output by the first output end of the third data register and the data output by the second output end of the fourth data register is greater than or equal to the reciprocal iteration convergence difference value, setting an invalid reciprocal iteration operation ending flag3 to the squaring iteration module, controlling the data cached by the third data register to be transmitted to the fourth data register so as to cover the data originally cached by the fourth data register, and configuring the latest cached data of the fourth data register as the input operand of the next reciprocal iteration operation, thereby controlling the second circuit unit architecture to execute the next reciprocal iteration operation; and after the reciprocal result output by the previous stage of reciprocal iterative operation is configured as the input operand of the next stage of reciprocal iterative operation, the reciprocal result is output to the second input end of the second multiplier by the fourth data register so as to carry out the next stage of reciprocal iterative operation, and then the reciprocal result output by the next stage of reciprocal iterative operation is cached in the third data register. The difference value of the data output by the first output end of the third data register and the data output by the second output end of the fourth data register is processed and output by a third subtracter; wherein the number of bits of the shift supported by the second left shift register, the second right shift register and the third right shift register is determined by a predetermined precision.
The embodiment corresponding to the reciprocal iteration module takes multipliers and right shift registers as basic calculation units, matched in sequence with a subtractor and a selector, to complete the hardware architecture layout of the reciprocal iteration formula Y_{i+1} = (((2 << m) - ((b * Y_i) >> m)) * Y_i) >> m derived from the Newton iteration formula, wherein Y_{i+1} is the reciprocal, with a precision of m fractional bits, output by the (i+1)-th stage of reciprocal iteration operation; m is preferably 16; 2 is the constant that arises when the Newton iteration formula is rearranged in the process of solving the square root of the number a; b is the data output by the second data register built into the squaring iteration module, namely the square root result output by the squaring iteration module in the previous stage of square root iteration operation; Y_i is the reciprocal, with a precision of m fractional bits, output by the i-th stage of reciprocal iteration operation; and m and i are integers. The reciprocal iteration operation is thus performed without using the hardware resources of a divider.
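For illustration only, one stage of the reciprocal iteration formula above may be modelled in software as follows. The sketch assumes that b and Y_i are both integers scaled by 2^m, which the formula suggests but the text does not state in these words; the function name reciprocal_step and the intermediate variable names are hypothetical.

```python
def reciprocal_step(b: int, y: int, m: int = 16) -> int:
    """One stage of Y_{i+1} = (((2 << m) - ((b * Y_i) >> m)) * Y_i) >> m.

    b and y are assumed to be integers scaled by 2**m.
    """
    scaled_product = (b * y) >> m   # second multiplier, then second right shift
    two = 2 << m                    # constant 2 after the second left shift
    difference = two - scaled_product  # second subtractor
    return (difference * y) >> m    # third multiplier, then third right shift


# b = 2.0 as a scaled integer; starting from the preset operand 4'b1001
first_stage = reciprocal_step(2 << 16, 0b1001)
# At the exact reciprocal 0.5 (scaled: 1 << 15) the stage leaves the value unchanged
fixed_point = reciprocal_step(2 << 16, 1 << 15)
```

Only multiplications, a subtraction and shifts are used, which mirrors the statement that no divider is required.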
As an embodiment, as shown in fig. 1, the reciprocal iteration module includes a second multiplier, a second right shift register, a second left shift register, a second subtractor, a third multiplier, a third right shift register, a second selector, a third data register, a fourth data register, a third subtractor, and a second comparator, which are connected to form the second circuit unit architecture for performing the reciprocal iteration operation. A first input end of the second multiplier is connected with a first output end of the second data register built into the squaring iteration module and is used for receiving data output by the second data register, so that the reciprocal iteration module processes the square root result output by the squaring iteration module in the previous stage of square root iteration operation; specifically, when the absolute value of the difference between the data cached in the second data register and the data cached in the first data register does not satisfy the square root iteration termination condition, the reciprocal iteration module starts to process the square root result output by the squaring iteration module in the previous stage of square root iteration operation (specifically, as output by the second data register); or, each time the reciprocal iteration module starts to execute the reciprocal iteration operation (including the case that the enable flag2 of the reciprocal iteration operation is set to be valid (equal to 1) and the reciprocal iteration module has not performed any reciprocal iteration operation before), the initial operand X0 is configured as the input operand of the reciprocal iteration operation, and the second multiplier then starts to process the initial operand output by the second data register in the current stage of reciprocal iteration operation.
A second input end of the second multiplier is connected to a first output end of the fourth data register and is used for receiving data output by the fourth data register, so that in the current stage of reciprocal iteration operation the second multiplier processes the reciprocal result output by the reciprocal iteration module in the previous stage of reciprocal iteration operation. Specifically, when the absolute value of the difference between the data cached in the third data register and the data cached in the fourth data register does not satisfy the reciprocal iteration termination condition, the second multiplier starts to process the reciprocal result Y_i that the reciprocal iteration module output in the previous stage of reciprocal iteration operation and that is output by the fourth data register, and the reciprocal iteration module then outputs the reciprocal result Y_{i+1} in the current stage of reciprocal iteration operation; or, each time the reciprocal iteration module starts to execute the reciprocal iteration operation (including the case that the reciprocal iteration module has not performed any stage of reciprocal iteration operation before and the enable flag2 of the reciprocal iteration operation is set to be valid (equal to 1)), the preset operand is configured as the other input operand of the reciprocal iteration operation, and the second multiplier starts to process the preset operand output by the fourth data register in the current stage of reciprocal iteration operation.
The output end of the second multiplier is connected with the input end of a second right shift register, and the second right shift register is used for right-shifting the product result output by the second multiplier, so that after the shift the product still represents its fractional digits in binary form without loss at the predetermined precision; the square root result and the reciprocal result both have the corresponding number of fractional digits at the predetermined precision. In the embodiment shown in fig. 1, the second right shift register (the rectangular box labeled ">>16") shifts its input data right by 16 bits so as to cover a fractional part 16 bits wide, which can also be expressed as covering an approximation of the lower 16-bit fraction of the square root result and of the reciprocal result.
The input end of the second left shift register is used for receiving a first preset binary number 2'b10 (equal to decimal 2) and shifting it left to form a binary number that reserves the complete fractional digits; when the predetermined precision requires covering a 16-bit fraction, the second left shift register (the rectangular box labeled "<<16") shifts the first preset binary number 2'b10 left by 16 bits. The first preset binary number is the constant that arises when the Newton iteration formula is rearranged in the process of solving the square root of the number a, that is, the constant 2 of the reciprocal iteration formula in the foregoing embodiment.
The output end of the second right shift register is connected with the first input end of the second subtractor, the output end of the second left shift register is connected with the second input end of the second subtractor, and the output end of the second subtractor is connected with the first input end of the third multiplier; the second subtractor is used for, when the enable flag2 of the reciprocal iteration operation is valid, subtracting the shift result output by the second right shift register from the shift result output by the second left shift register, consistent with the term (2 << 16) - ((b * Y_i) >> 16) of the reciprocal iteration formula, and then sending the difference to the third multiplier. A second input end of the third multiplier is connected with a first output end of the fourth data register and is used for receiving data output by the fourth data register, so that, when the absolute value of the difference between the data cached in the third data register and the data cached in the fourth data register does not satisfy the reciprocal iteration termination condition, the third multiplier multiplies, in the current stage of reciprocal iteration operation, the difference output by the second subtractor by the reciprocal result output by the reciprocal iteration module in the previous stage of reciprocal iteration operation; or, when the enable flag2 of the reciprocal iteration operation is set to be valid (equal to 1) and the reciprocal iteration operation is just starting (including the case that the reciprocal iteration module has not performed any reciprocal iteration operation before), the third multiplier multiplies the difference output by the second subtractor by the preset operand output by the fourth data register.
The output end of the third multiplier is connected with the input end of the third right shift register, and the third multiplier is used for multiplying the difference output by the second subtractor by the data output by the fourth data register when the enable flag2 of the reciprocal iteration operation is valid; the third right shift register is used for right-shifting the product output by the third multiplier, so that after the shift the product still represents its fractional digits in binary form without loss at the predetermined precision. When the predetermined precision requires covering a 16-bit fraction, the third right shift register is the rectangular box labeled ">>16" in fig. 1; that is, the intermediate result processed in sequence by the second multiplier and the third multiplier is shifted right by 16 bits, so that the precision of the finally obtained reciprocal result covers a 16-bit fraction. At this time, the result output by the third right shift register is the reciprocal result (intermediate result) of one round (one stage) of iteration in the reciprocal iteration formula; in other words, the hardware circuit reproduces one stage of the reciprocal iteration formula, which for a specific iteration index i corresponds to Y_{i+1} = (Y_i * ((2 << 16) - ((b * Y_i) >> 16))) >> 16.
Wherein the number of bits of the shift supported by the second left shift register, the second right shift register and the third right shift register is determined by a predetermined precision.
The output end of the third right shift register is connected to a first input end of a second selector, and a second input end of the second selector is used for receiving the preset operand of the reciprocal iteration operation, corresponding to 4'b1001 in fig. 1. The selection end of the second selector is used for receiving the enable flag2 of the reciprocal iteration operation, and the output end of the second selector is connected with the input end of the third data register. When the enable flag2 of the reciprocal iteration operation is valid (flag2 = 1), the second selector selects the data output by the third right shift register and sends it to the third data register, so that the reciprocal result output in the current stage of reciprocal iteration operation is cached in the third data register; when the enable flag2 of the reciprocal iteration operation is invalid (flag2 = 0), the second selector selects the preset operand received at its second input end and sends it to the third data register, where it is stored as the initial value of the Newton iteration required each time the reciprocal iteration operation starts.
The first output end of the third data register is connected with the first input end of the third subtracter, the second output end of the third data register is connected with the input end of the fourth data register to form two cascaded data registers, the second output end of the fourth data register is connected with the second input end of the third subtracter, so that the reciprocal result output by the current stage of reciprocal iteration operation is cached in the third data register, the reciprocal result output by the last stage of reciprocal iteration operation is cached in the fourth data register, and the cached data of the fourth data register supports data coverage updating transmitted by the third data register.
The output end of the third subtractor is connected with the input end of the second comparator, and the output end of the second comparator is used for setting an effective reciprocal iteration operation ending flag3 to the square-open iteration module when the absolute value of the difference value between the data output by the first output end of the third data register and the data output by the second output end of the fourth data register is smaller than the reciprocal iteration convergence difference value, determining that the reciprocal iteration ending condition is met, stopping continuously executing the reciprocal iteration operation, resetting the related iteration times, and outputting the data, which is the reciprocal result meeting the preset convergence condition, to the square-open iteration module by the first output end of the fourth data register, corresponding to the reciprocal result output in the last stage of reciprocal iteration operation. Wherein the reciprocal iteration convergence difference is preferably set to 4. 
When the absolute value of the difference value between the data output by the first output end of the third data register and the data output by the second output end of the fourth data register is greater than or equal to the reciprocal iteration convergence difference value, setting an invalid reciprocal iteration operation ending flag3 to the squaring iteration module, controlling the data cached by the third data register to be transmitted to the fourth data register so as to cover the data originally cached by the fourth data register, and configuring the latest cached data of the fourth data register as the input operand of the next reciprocal iteration operation, thereby controlling the second circuit unit architecture to execute the next reciprocal iteration operation; and after the reciprocal result output by the previous stage of reciprocal iterative operation is configured as the input operand of the next stage of reciprocal iterative operation, the reciprocal result is output to the second input end of the second multiplier by the fourth data register so as to carry out the next stage of reciprocal iterative operation, and then the reciprocal result output by the next stage of reciprocal iterative operation is cached in the third data register. And the difference value of the data output by the first output end of the third data register and the data output by the second output end of the fourth data register is processed and output by the third subtracter.
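The interplay of the two cascaded data registers and the second comparator described above can be sketched in software as follows. This is a behavioural model under stated assumptions, not the circuit itself: b is assumed to be scaled by 2^16, the preset operand 4'b1001 and the convergence difference 4 are taken from the text, and the function name reciprocal_iterate and the safety bound max_stages are hypothetical additions for the illustration.

```python
def reciprocal_iterate(b: int, preset: int = 0b1001, threshold: int = 4,
                       m: int = 16, max_stages: int = 64) -> int:
    """Iterate the reciprocal formula until successive results differ by < threshold.

    reg4 models the fourth data register (previous-stage result) and reg3 the
    third data register (current-stage result). max_stages is a hypothetical
    safety bound that the described circuit does not mention.
    """
    reg4 = preset
    for _ in range(max_stages):
        reg3 = (((2 << m) - ((b * reg4) >> m)) * reg4) >> m
        if abs(reg3 - reg4) < threshold:
            # Termination condition met: the fourth data register supplies
            # the result, as in the text.
            return reg4
        reg4 = reg3  # third register overwrites the fourth register
    return reg4


# Reciprocal of 2.0 (scaled: 2 << 16); the exact answer 0.5 scales to 1 << 15
approx_half = reciprocal_iterate(2 << 16)
```

The returned value lies within a few least-significant bits of the exact scaled reciprocal, which is the accuracy implied by the convergence difference.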
The embodiment corresponding to the reciprocal iteration module takes multipliers and right shift registers as basic calculation units, matched in sequence with a subtractor and a selector, to complete the hardware architecture layout of the reciprocal iteration formula Y_{i+1} = (((2 << m) - ((b * Y_i) >> m)) * Y_i) >> m derived from the Newton iteration formula, wherein Y_{i+1} is the reciprocal, with a precision of m fractional bits, output by the (i+1)-th stage of reciprocal iteration operation; m is preferably 16; 2 is the constant that arises when the Newton iteration formula is rearranged in the process of solving the square root of the number a; b is the data output by the second data register built into the squaring iteration module, namely the square root result output by the squaring iteration module in the previous stage of square root iteration operation; Y_i is the reciprocal, with a precision of m fractional bits, output by the i-th stage of reciprocal iteration operation; and m and i are integers. The reciprocal iteration operation is thus performed without using the hardware resources of a divider.
As an embodiment, as shown in fig. 1, the square-open iteration module includes a first multiplier, an AND gate, a first adder, a first right shift register, a first selector, a first data register, a second data register, a first subtractor, and a first comparator, which are connected to form the first circuit unit architecture for performing the square-open iteration operation. The first input end of the first multiplier is used for receiving the number a whose square root is to be taken; the second input end of the first multiplier is connected to the first output end of the fourth data register and is used for receiving data output by the fourth data register. Preferably, the reciprocal result meeting the preset convergence condition includes the preset operand; specifically, when the absolute value of the difference between the data cached in the third data register and the data cached in the fourth data register already meets the reciprocal iteration termination condition after the reciprocal iteration module has executed only one stage of reciprocal iteration operation, the data output by the fourth data register to the second input end of the first multiplier is the preset operand, so that the preset operand becomes the reciprocal result meeting the preset convergence condition.
As shown in fig. 1, an enable end of the first multiplier is connected to an output end of the AND gate; a first input end of the AND gate is used for receiving the signal corresponding to the end flag3 of the reciprocal iteration operation, and a second input end of the AND gate is used for receiving the inverted signal ~flag4 corresponding to the end flag4 of the square-open iteration operation; the output end of the first multiplier is connected to the first input end of the first adder. The first multiplier is used for, when the end flag3 of the reciprocal iteration operation is valid (flag3 = 1) and the end flag4 of the square-open iteration operation is invalid (flag4 = 0, so that ~flag4 = 1), multiplying the number a whose square root is to be taken by the data output by the fourth data register and sending the product to the first adder. The output end of the AND gate is also connected to the enable end of the first adder; the first input end of the first adder is connected to the output end of the first multiplier, and the second input end of the first adder is connected to the first output end of the second data register. The second input end of the first adder is used for receiving the data output by the second data register, so that in the current stage of square-open iteration operation the square-open iteration module processes the square root result X_n that it output in the previous stage of square-open iteration operation, together with the reciprocal result output by the reciprocal iteration module in the reciprocal iteration operation that has just finished. The first adder is used for adding the product output by the first multiplier to the data output by the second data register when the end flag3 of the reciprocal iteration operation is valid and the end flag4 of the square-open iteration operation is invalid.
The output end of the first adder is connected with the input end of the first right shift register, the output end of the first right shift register is connected with the first input end of the first selector, and the first right shift register is used for receiving the sum obtained by the addition performed by the first adder and then right-shifting the sum. The first right shift register corresponds to the rectangular box labeled ">>1" in fig. 1 and shifts its input data right by 1 bit, which is equivalent to dividing the sum output by the first adder by 2; the first right shift register thereby implements the division by 2 in the square-open iteration formula.
An output end of the first right shift register is connected with a first input end of a first selector, a second input end of the first selector is used for receiving the initial operand, a selection end of the first selector is used for receiving the operable flag1, and an output end of the first selector is connected with an input end of the first data register. When the operable flag1 is invalid, the first selector selects the data output by the first right shift register and sends it to the first data register, so that the square root result output in the current stage of square-open iteration operation is cached in the first data register; when the operable flag1 is valid, the first selector selects the initial operand X0 received at its second input end and sends it to the first data register, where it is stored as the initial value of the Newton iteration required each time the square-open iteration operation is enabled. The result output by the first right shift register is the square root result (intermediate result) of one stage of iteration in the square-open iteration formula; in other words, the hardware circuit reproduces one stage of the square-open iteration formula, which for a specific iteration index n corresponds to X_{n+1} = (X_n + a / X_n) >> 1, with the quotient a / X_n obtained as the product of a and the reciprocal of X_n.
The first output end of the first data register is connected with the first input end of the first subtractor, the second output end of the first data register is connected with the input end of the second data register, and the second output end of the second data register is connected with the second input end of the first subtractor, so that the square root result output by the current stage of square-open iteration operation is cached in the first data register and the square root result output by the previous stage of square-open iteration operation is cached in the second data register; the data cached in the second data register can be overwritten and updated by the data transmitted from the first data register.
The output end of the first subtractor is connected with the input end of the first comparator, and the output end of the first comparator is used for outputting the end flag4 of the square-open iteration operation. When the absolute value of the difference between the data output from the first output end of the first data register and the data output from the second output end of the second data register is smaller than the square-open iteration convergence difference, the square-open iteration module determines that the square-open iteration termination condition is satisfied and sets the output end flag4 of the square-open iteration operation to be valid. In this embodiment, driven by the beat of an external clock, the data cached in the first data register (the square root result output by the current stage of square-open iteration operation) is then transmitted to the second data register to overwrite the data originally cached there, so that the data most recently output by the third output end S of the second data register is the square root result finally calculated by the operation circuit system; this result meets the square-open iteration termination condition and is taken as an approximation of the square root of the number a. The second circuit unit architecture is thereby controlled to stop executing the reciprocal iteration operation, and the first circuit unit architecture to stop executing the square-open iteration operation. The square-open iteration convergence difference is preferably set to 2.
The square-open iteration module is further used for, when the absolute value of the difference between the data output by the first output end of the first data register and the data output by the second output end of the second data register is greater than or equal to the square-open iteration convergence difference, setting the enable flag2 of the reciprocal iteration operation to be valid and outputting it to the reciprocal iteration module, thereby controlling the iteration of the reciprocal iteration operation; under the control of an external clock, the data cached in the first data register is transmitted to the second data register to overwrite the data originally cached there, and the latest data cached in the second data register is configured as the input operand of the reciprocal iteration operation, so that the reciprocal iteration operation is restarted, or is started for the first time if it has not been executed before. In this case the end flag4 of the square-open iteration operation set by the first comparator is invalid. The difference between the data output by the first output end of the first data register and the data output by the second output end of the second data register is computed and output by the first subtractor.
In this embodiment, a multiplier and a right shift register serve as the basic calculation units, with an adder and a selector following in sequence, to implement the square-open iteration formula derived from the Newton iteration formula, $X_{n+1} = (X_n + a \cdot (1/X_n)) \gg 1$, where $X_{n+1}$ is the square root with preset precision output by the (n+1)-th stage of the square-open iterative operation, $a$ is the original number to be square-rooted, $X_n$ is the square root with preset precision output by the n-th stage of the square-open iterative operation, and n is an integer. The square-open iteration is thus performed without the hardware resources of a divider.
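A minimal sketch of one stage of this formula in fixed-point arithmetic follows. It assumes 16 fractional bits (the preset precision used elsewhere in this document) and assumes the reciprocal 1/X_n is supplied as an already-computed operand; the name `sqrt_step` and the extra right shift that rescales the product are illustrative choices, not taken from the source.

```python
FRAC_BITS = 16  # assumed preset precision: 16 fractional bits


def sqrt_step(x_n: int, a: int, recip_x_n: int) -> int:
    """One square-open iteration stage: X_{n+1} = (X_n + a * (1/X_n)) >> 1.

    All operands are fixed-point integers with FRAC_BITS fractional bits.
    recip_x_n is the reciprocal of x_n delivered by the reciprocal module,
    so no divider is used here.
    """
    product = (a * recip_x_n) >> FRAC_BITS  # multiplier, rescaled to 16 fractional bits
    total = x_n + product                   # adder
    return total >> 1                       # right shift by 1 replaces the division by 2
```

For example, with a = 16.0, X_n = 8.0 and 1/X_n = 0.125 (fixed-point values `16 << 16`, `8 << 16` and `8192`), the step returns `5 << 16`, i.e. 5.0, which matches (8 + 16/8) / 2.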
The square-open iteration module and the reciprocal iteration module interact with each other through their respective iteration control signals. For example, the end flag flag3 of the reciprocal iterative operation, set by the reciprocal iteration module, controls iteration in the square-open iteration module, while the square-open iteration module can set the enable flag flag2 of the reciprocal iterative operation to valid and output it to the reciprocal iteration module to control the reciprocal iteration. Each module exits its iteration once its preset convergence difference is reached, yielding result data of suitable accuracy. This reduces redundant iterative computation without degrading the estimation accuracy.
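The interplay of the two modules can be sketched in software as two nested loops, each leaving as soon as its own convergence difference is reached. This is a behavioural approximation under several assumptions that are not in the source: the seed of the reciprocal iteration (the "preset operand") is chosen here as a power of two so that the first product lies in [0.5, 1); the reciprocal iteration restarts from that seed for every new square root estimate; the reciprocal convergence difference is set to 4 LSB; and `max_iter` is a software safety bound. Inputs are assumed to be positive 16.16 fixed-point values below 2^32.

```python
FRAC = 16  # fractional bits (preset precision)


def reciprocal_q16(x: int, eps: int = 4, max_iter: int = 64) -> int:
    """Reciprocal module: Newton step y <- y * (2 - x * y) until flag3 is valid."""
    y_prev = 1 << (2 * FRAC - x.bit_length())  # assumed seed, not taken from the source
    for _ in range(max_iter):
        y_cur = (y_prev * ((2 << FRAC) - ((x * y_prev) >> FRAC))) >> FRAC
        flag3 = abs(y_cur - y_prev) < eps      # end flag of the reciprocal iteration
        y_prev = y_cur
        if flag3:
            break
    return y_prev


def sqrt_q16(a: int, eps: int = 2, max_iter: int = 64) -> int:
    """Square-open module: uses the reciprocal result once flag3 is valid."""
    x_prev = a >> 1                            # initial operand: half of the input
    for _ in range(max_iter):
        recip = reciprocal_q16(x_prev)         # flag2 valid: reciprocal module runs
        x_cur = (x_prev + ((a * recip) >> FRAC)) >> 1
        flag4 = abs(x_cur - x_prev) < eps      # end flag of the square-open iteration
        x_prev = x_cur
        if flag4:
            break
    return x_prev
```

Calling `sqrt_q16(16 << 16) / 65536` gives a value close to 4.0; the small residual error comes from truncation in the right shifts.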
On the basis of the foregoing embodiment, the reciprocal iteration convergence difference is greater than the square-open iteration convergence difference. The smaller the reciprocal iteration convergence difference, the higher the precision of the reciprocal iterative operation and the longer it takes; likewise, the smaller the square-open iteration convergence difference, the higher the precision of the square-open iterative operation and the longer it takes. This suits application scenarios that are sensitive to both computing delay and cost.
On the basis of the foregoing embodiment, the preset precision specifically means that the fractional part of the corresponding binary number is set to 16 bits, so that the 16-bit fractional part supports the reciprocal iteration module in calculating the reciprocal of decimal numbers ranging from 0 to 65565; the number to be square-rooted is a binary number whose lower 16 bits are configured as the fractional part. This meets the calculation requirements for batches of binary numbers.
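To make the 16-bit fractional format concrete, the following sketch converts between ordinary numbers and binary words whose lower 16 bits hold the fractional part. The helper names `to_q16` and `from_q16` are hypothetical and only illustrate the encoding; they are not part of the described circuit.

```python
FRAC_BITS = 16
SCALE = 1 << FRAC_BITS  # 65536: weight of the integer part's least significant bit


def to_q16(value: float) -> int:
    """Encode a non-negative number as a word with 16 fractional bits."""
    return int(round(value * SCALE))


def from_q16(word: int) -> float:
    """Decode a word with 16 fractional bits back to an ordinary number."""
    return word / SCALE
```

In this encoding 1.5 is stored as 98304 and 0.125 as 8192, and the smallest representable step is 1/65536.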
A chip has the hardware acceleration-based operation circuit system built in. It avoids the drawback of the prior art that solving the square root by the Newton iteration method requires a divider, which greatly saves hardware resources. The computational complexity at the system software level and the hardware cost are greatly reduced, and the iterative calculation, in cooperation with the iteration control flag signals (flag1 to flag4 in the foregoing embodiments), yields the accuracy corresponding to a specific convergence setting. This provides flexibility without additional hardware cost, and compatibility is good.
It should be noted that, in the above embodiments, each multiplier, adder, subtractor, shift register, data register, selector, and comparator in the square-open iteration module and the reciprocal iteration module may be a digital circuit module written by a designer in the hardware description language Verilog HDL, or a digital circuit module created by a designer in software with circuit drawing or compilation functions. In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, may each exist physically on their own, or two or more units may be integrated into one module.

Claims (11)

1. An arithmetic circuit system based on hardware acceleration is characterized by comprising a square-opening iteration module and a reciprocal iteration module;
the square-open iteration module is configured, when the end flag of the reciprocal iterative operation is valid and the end flag of the square-open iterative operation is invalid, to multiply the preset number to be square-rooted by the reciprocal result that is output by the reciprocal iteration module and satisfies a preset convergence condition, so as to perform the square-open iterative operation with preset precision;
the reciprocal iteration module is configured, when the enable flag of the reciprocal iterative operation is valid, to operate in the current reciprocal iterative operation on the reciprocal result output by the reciprocal iteration module in the previous reciprocal iterative operation and the square root result output by the square-open iteration module in the most recent square-open iterative operation, so as to obtain the reciprocal result that is output by the reciprocal iteration module and satisfies the preset convergence condition;
the end mark of reciprocal iteration operation is an iteration end control signal set from a reciprocal iteration module to an open square iteration module; the end mark of the square-opening iteration operation is an iteration end control signal set by the square-opening iteration module; the enabling mark of reciprocal iteration operation is an iteration starting control signal set from the square-opening iteration module to the reciprocal iteration module;
the square-open iteration operation is an iteration operation of solving the square root of a number to be squared by utilizing a Newton iteration method formula, and a first circuit unit framework corresponding to the iteration logic relationship of the square-open iteration operation is connected in a square-open iteration module; the reciprocal iteration operation is an iteration operation of solving the reciprocal of the number to be extracted by using a Newton iteration method formula, and a second circuit unit framework corresponding to the iteration logic relationship of the reciprocal iteration operation is connected in a reciprocal iteration module.
2. The operational circuitry as claimed in claim 1, wherein, in the process of performing the square-root iterative operation with a predetermined precision by the square-root iterative operation module, when an absolute value of a difference between a square-root result output in the previous-stage square-root iterative operation and a square-root result output in the current-stage square-root iterative operation satisfies a square-root iterative termination condition, it is determined to terminate the square-root iterative operation, and the square-root result output in the current-stage square-root iterative operation is taken as a square-root result finally obtained by the square-root iterative operation, and an end flag of the square-root iterative operation is set to be valid; when the absolute value of the difference value between the square root result output in the last-stage squaring iterative operation and the square root result output in the current-stage squaring iterative operation does not meet the squaring iterative termination condition, outputting the square root result output in the current-stage squaring iterative operation to a reciprocal iteration module, and simultaneously setting an effective enabling mark of reciprocal iteration operation to the reciprocal iteration module;
in the process of the reciprocal iteration module performing the reciprocal iterative operation with preset precision, when the difference between the reciprocal result output in the previous reciprocal iterative operation and the reciprocal result output in the current reciprocal iterative operation satisfies a reciprocal iteration termination condition, a valid end flag of the reciprocal iterative operation is set to the square-open iteration module, the reciprocal result output by the reciprocal iteration module in the last reciprocal iterative operation is determined to be the reciprocal result satisfying the preset convergence condition, and that reciprocal result is transmitted to the square-open iteration module; when the difference between the reciprocal result output in the previous reciprocal iterative operation and the reciprocal result output in the current reciprocal iterative operation does not satisfy the reciprocal iteration termination condition, the reciprocal result output in the current reciprocal iterative operation is updated to be the input operand of the next reciprocal iterative operation, and the enable flag of the reciprocal iterative operation is kept valid.
3. The operational circuitry of claim 1 or 2, wherein the operational circuitry further comprises a pre-processing module; the square-on iteration module comprises a first data register and a second data register;
the preprocessing module is used for sending a pre-input number to be squared to a built-in shift register, setting the number to be squared after shift processing of the shift register as an initial operand with preset precision for square-off iterative operation, and setting an operable mark to the square-off iterative module;
the square-open iteration module is used for receiving and caching the initial operand into a first data register when the operable flag is valid, and caching the initial operand into a second data register from the first data register under the control of an external driving clock so as to update the initial operand into an input operand of reciprocal iteration operation;
and the squaring iteration module is used for caching the square root result output by the current-level squaring iteration operation into the second data register from the first data register on the premise that the operable flag is invalid, if the first data register caches the square root result output by the current-level squaring iteration operation and the second data register caches the square root result output by the previous-level squaring iteration operation, and the absolute value of the difference value between the square root result output by the previous-level squaring iteration operation and the square root result output by the current-level squaring iteration operation does not meet the squaring iteration termination condition, so that the square root result output by the current-level squaring iteration operation covers the data cached by the second data register.
4. The operational circuitry of claim 3, wherein the number of fractional bits of the initial operand is one greater than the number of shift bits configured for the shift register built into the pre-processing module, such that the initial operand is half of the number to be square-rooted.
5. The operational circuitry as claimed in claim 3, wherein the reciprocal iteration module includes a third data register and a fourth data register, and is configured to, under the condition that the third data register buffers a reciprocal result output by a current stage of reciprocal iteration operation and the fourth data register buffers a reciprocal result output by a previous stage of reciprocal iteration operation, if a difference between the reciprocal result output by the previous stage of reciprocal iteration operation and the reciprocal result output by the current stage of reciprocal iteration operation does not satisfy a reciprocal iteration termination condition, buffer-store the reciprocal result output by the current stage of reciprocal iteration operation from the third data register into the fourth data register, so that the reciprocal result output by the current stage of reciprocal iteration operation covers data buffered by the fourth data register;
and the fourth data register is used for caching a preset operand for reciprocal iteration operation when the enable flag of the reciprocal iteration operation is invalid.
6. The operational circuitry of claim 5, wherein the reciprocal iteration module comprises a second multiplier, a second right shift register, a second left shift register, a second subtractor, a third multiplier, a third right shift register, a second selector, a third data register, a fourth data register, a third subtractor, and a second comparator, and is configured to be coupled to form the second circuit unit architecture to perform a reciprocal iteration operation;
a first input end of the second multiplier is connected with a first output end of a second data register built in the square-open iteration module, and the first input end of the second multiplier is used for receiving data output by the second data register; a second input end of the second multiplier is connected with a first output end of the fourth data register, and a second input end of the second multiplier is used for receiving data output by the fourth data register;
the output end of the second multiplier is connected with the input end of a second right shift register, and the second right shift register is used for performing right shift processing on the product result output by the second multiplier; wherein the square root result and the reciprocal result both have a corresponding fractional number of digits at the predetermined precision;
the input end of the second left shift register is used for receiving a first preset binary number and shifting the first preset binary number to form a binary number that retains the complete fractional bits; the first preset binary number is a constant obtained by rearranging the Newton iteration formula in the process of solving the square root of the number to be square-rooted;
the output end of the second right shift register is connected with the first input end of the second subtracter, the output end of the second left shift register is connected with the second input end of the second subtracter, the output end of the second subtracter is connected with the first input end of the third multiplier, and the second subtracter is used for, when the enable flag of the reciprocal iterative operation is valid, subtracting the shift result output by the second right shift register from the shift result output by the second left shift register, and then sending the resulting difference to the third multiplier;
a second input end of the third multiplier is connected with a first output end of the fourth data register, and a second input end of the third multiplier is used for receiving data output by the fourth data register, so that the third multiplier processes a reciprocal result output by the reciprocal iteration module in a last reciprocal iteration operation or the preset operand output by the fourth data register in a current reciprocal iteration operation;
the output end of the third multiplier is connected with the input end of the third right shift register, and the third multiplier is used for controlling the difference value output by the second subtracter to be multiplied by the data output by the fourth data register when the enable flag of the reciprocal iterative operation is effective; the third right shift register is used for performing right shift processing on the product output by the third multiplier so that the product result output by the third multiplier can represent the decimal digit number in a binary number form without loss after the shift processing;
the output end of the third right shift register is connected with the first input end of the second selector, the second input end of the second selector is used for receiving the preset operand, the selection end of the second selector is used for receiving the enable mark of the reciprocal iteration operation, and the output end of the second selector is connected with the input end of the third data register; the second selector is used for selecting and sending the data output by the third right shift register to the third data register when the enable flag of the reciprocal iterative operation is effective; the second selector is used for selecting the preset operand received by the second input end of the second selector to be sent to the third data register when the enable flag of the reciprocal iterative operation is invalid;
a first output end of the third data register is connected with a first input end of a third subtracter, a second output end of the third data register is connected with an input end of a fourth data register, and a second output end of the fourth data register is connected with a second input end of the third subtracter, so that a reciprocal result output by the current stage of reciprocal iterative operation is buffered in the third data register, and a reciprocal result output by the previous stage of reciprocal iterative operation is buffered in the fourth data register; the difference value of the data output by the first output end of the third data register and the data output by the second output end of the fourth data register is processed and output by a third subtracter;
the output end of the third subtractor is connected with the input end of the second comparator, and the output end of the second comparator is used for setting an effective reciprocal iteration operation ending mark to the square-opening iteration module when the absolute value of the difference value between the data output by the first output end of the third data register and the data output by the second output end of the fourth data register is smaller than a reciprocal iteration convergence difference value, so that the condition that the reciprocal iteration ending condition is met is determined, and the data output by the first output end of the fourth data register to the square-opening iteration module is a reciprocal result meeting a preset convergence condition; when the absolute value of the difference value between the data output by the first output end of the third data register and the data output by the second output end of the fourth data register is greater than or equal to the reciprocal iteration convergence difference value, setting an invalid reciprocal iteration operation ending mark to the square-open iteration module, and then configuring the latest cached data of the fourth data register as the input operand of the next reciprocal iteration operation;
wherein the number of bits of the shift supported by the second left shift register, the second right shift register and the third right shift register is determined by a predetermined precision.
7. The operational circuitry of claim 6, wherein the open-square iteration module comprises a first multiplier, an AND gate, a first adder, a first right shift register, a first selector, a first data register, a second data register, a first subtractor, and a first comparator, and is configured to be connected to form the first circuit unit architecture to perform a square iteration operation;
the first input end of the first multiplier is used for receiving the number to be extracted; the second input end of the first multiplier is connected with the first output end of the fourth data register and used for receiving the data output by the fourth data register;
the enable end of the first multiplier is connected with the output end of the AND gate, the first input end of the AND gate is used for receiving the ending mark of the reciprocal iterative operation, the second input end of the AND gate is used for receiving the ending mark of the square-open iterative operation, and the output end of the first multiplier is connected with the first input end of the first adder; the first multiplier is used for multiplying the number to be squared and the data output by the fourth data register when the ending mark of the reciprocal iterative operation is valid and the ending mark of the square-off iterative operation is invalid, and then sending a product result to the first adder;
the output end of the AND gate is connected with the enable end of a first adder, the first input end of the first adder is connected with the output end of a first multiplier, the second input end of the first adder is connected with the first output end of a second data register, and the first adder is used for adding the data output by the second data register and the product result output by the first multiplier when the end mark of the reciprocal iteration operation is valid and the end mark of the square-open iteration operation is invalid;
the output end of the first adder is connected with the input end of the first right shift register, the output end of the first right shift register is connected with the second input end of the first selector, and the first right shift register is used for receiving the sum value output by the first adder and then performing right shift processing on the sum value;
the output end of the first right shift register is connected with the first input end of a first selector, the second input end of the first selector is used for receiving the initial operand, the selection end of the first selector is used for receiving the operable mark, and the output end of the first selector is connected with the input end of the first data register; a first selector, configured to select to send data output by the first right shift register to the first data register when the operable flag is invalid, so as to cache, in the first data register, a square root result output in a current stage of the square-root iterative operation; a first selector for selecting to pass said initial operand received at a second input of the first selector to the first data register when said operable flag is active;
the first output end of the first data register is connected with the first input end of the first subtracter, the second output end of the first data register is connected with the input end of the second data register, and the second output end of the second data register is connected with the second input end of the first subtracter, so that the square root result output by the current-stage square-open iterative operation is cached in the first data register, and the square root result output by the previous-stage square-open iterative operation is cached in the second data register; the difference value of the data output by the first output end of the first data register and the data output by the second output end of the second data register is processed and output by the first subtracter;
the output end of the first subtractor is connected with the input end of the first comparator, the output end of the first comparator is used for outputting a signal corresponding to the end mark of the squaring iteration operation, the squaring iteration module is used for setting the output end mark of the squaring iteration operation to be valid when the absolute value of the difference value between the data output by the first output end of the first data register and the data output by the second output end of the second data register is smaller than the convergence difference value of the squaring iteration operation, the requirement for the end condition of the squaring iteration operation is determined to be met, and the data output to the outside by the third output end of the second data register is a square root result finally calculated by the operation circuit system; the squaring iteration module is further configured to set an enable flag of reciprocal iteration operation to be valid and output to the reciprocal iteration module when an absolute value of a difference between data output from a first output end of the first data register and data output from a second output end of the second data register is greater than or equal to a convergence difference of the squaring iteration, and then configure latest cached data of the second data register as an input operand of the reciprocal iteration operation.
8. The operational circuitry of claim 7, wherein the pre-processing module comprises a preset data register and a first left shift register, the preset data register being used for buffering the number to be square-rooted; the output end of the preset data register is connected with the input end of the first left shift register, and the first left shift register is used for left-shifting the number to be square-rooted to form the initial operand, so that the initial operand is half of the number to be square-rooted, wherein the number to be square-rooted has the corresponding number of fractional bits under the preset precision;
the output end of the first left shift register is connected with the second input end of the first selector, and the first left shift register is used for transmitting the initial operand to the first selector;
the output end of the preset data register is connected with the first input end of the first multiplier, and the output end of the preset data register is used for transmitting the to-be-opened square number to the first multiplier.
9. The operational circuitry of claim 7, wherein the reciprocal iteration convergence difference is greater than the squared iteration convergence difference;
the smaller the reciprocal iteration convergence difference value is, the higher the precision of the corresponding reciprocal iteration operation is, and the longer the time of the corresponding reciprocal iteration operation is; the smaller the square-opening iteration convergence difference value is, the higher the precision of the corresponding square-opening iteration operation is, and the longer the time of the corresponding square-opening iteration operation is.
10. The operational circuitry of claim 9, wherein the preset precision specifically means that the fractional part of the corresponding binary number is set to 16 bits, such that the 16-bit-wide fractional part supports the reciprocal iteration module in computing the reciprocal of decimal numbers in the range of 0 to 65565; wherein the number to be square-rooted is a binary number whose lower 16 bits are configured as the fractional part;
the number of shift bits configured for the first left shift register is 15, and the number of shift bits configured for the first right shift register is 1;
wherein the number of shift bits configured for the second right shift register, the second left shift register, and the third right shift register is 16 in each case.
11. A chip having built-in hardware acceleration-based arithmetic circuitry as claimed in any one of claims 1 to 10.
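As an illustration of the reciprocal datapath recited in claims 6 and 10, the sketch below walks one reciprocal iteration through the same sequence of units. Several points are assumptions rather than statements of the claims: the first preset binary number is taken to be the constant 2 of Newton's reciprocal update y(2 - x*y); the three shifts are each taken as 16 bits; the subtraction is taken as constant minus product, which is what that update requires; and the function name `reciprocal_step` is hypothetical.

```python
SECOND_RIGHT_SHIFT = 16  # rescales x * y back to 16 fractional bits
SECOND_LEFT_SHIFT = 16   # turns the integer constant into a 16-fractional-bit value
THIRD_RIGHT_SHIFT = 16   # rescales the final product back to 16 fractional bits


def reciprocal_step(x: int, y_prev: int, constant: int = 2) -> int:
    """One reciprocal iteration stage on fixed-point operands (16 fractional bits).

    x      : data from the second data register (current square root estimate)
    y_prev : data from the fourth data register (previous reciprocal estimate)
    """
    product = x * y_prev                       # second multiplier
    scaled = product >> SECOND_RIGHT_SHIFT     # second right shift register
    const_fixed = constant << SECOND_LEFT_SHIFT  # second left shift register
    diff = const_fixed - scaled                # second subtractor
    return (diff * y_prev) >> THIRD_RIGHT_SHIFT  # third multiplier, third right shift register
```

With x = 8.0 (`8 << 16`) and a previous estimate of 0.0625 (`4096`), one step yields `6144`, i.e. 0.09375, and a second step yields `7680`, i.e. 0.1171875, approaching 1/8 = 0.125 from below.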
CN202110719972.6A 2021-06-28 2021-06-28 Operation circuit system and chip based on hardware acceleration Active CN113407235B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110719972.6A CN113407235B (en) 2021-06-28 2021-06-28 Operation circuit system and chip based on hardware acceleration

Publications (2)

Publication Number Publication Date
CN113407235A true CN113407235A (en) 2021-09-17
CN113407235B CN113407235B (en) 2022-07-08

Family

ID=77679846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110719972.6A Active CN113407235B (en) 2021-06-28 2021-06-28 Operation circuit system and chip based on hardware acceleration

Country Status (1)

Country Link
CN (1) CN113407235B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117075841A (en) * 2023-08-03 2023-11-17 上海合芯数字科技有限公司 SRT operation circuit
CN117075841B (en) * 2023-08-03 2024-05-14 上海合芯数字科技有限公司 SRT operation circuit

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 519000 2706, No. 3000, Huandao East Road, Hengqin new area, Zhuhai, Guangdong

Applicant after: Zhuhai Yiwei Semiconductor Co.,Ltd.

Address before: 519000 room 105-514, No. 6, Baohua Road, Hengqin new area, Zhuhai City, Guangdong Province (centralized office area)

Applicant before: AMICRO SEMICONDUCTOR Co.,Ltd.

GR01 Patent grant