WO2017037729A1

WO2017037729A1 - Concurrent architecture of vedic multiplier-an accelerator scheme for high speed computing

Info

Publication number: WO2017037729A1
Application number: PCT/IN2016/000117
Authority: WO
Inventors: Jitendra S. Edle; Prashant R. Deshmukh
Original assignee: Edle Jitendra S; Deshmukh Prashant R
Priority date: 2015-08-30
Filing date: 2016-05-04
Publication date: 2017-03-09
Also published as: IN2015MU03315A

Abstract

The present invention provides a Concurrent Architecture of Vedic Multiplier-An Accelerator Scheme for High Speed Computing. The methodology of applying Vedic fundamentals greatly optimizes constraints such as power, time, area and hardware resource utilization, and it can be used to develop efficient and secure templates. Vedic mathematics removes intermediate steps and gives a direct output for complex procedures such as multiplication. The procedure has wide applications in encryption, decryption, image processing, signal processing, secure wireless sensor networks, cloud computing, error correction and detection modules, etc. All these applications share one common block, the multiplier, which is a complex procedure at the hardware level. Hence, the need for high-speed multiplication can be fulfilled by implementing a novel Vedic multiplier on concurrently executable hardware such as an FPGA. The invention is described in detail with the help of Figure 1 of sheet 1, which shows the block diagram of the architecture of the Vedic Multiplier.

Description

TITLE OF INVENTION

Concurrent Architecture of Vedic Multiplier-An Accelerator Scheme for High Speed Computing

Technical field of invention:

The present invention relates in general to a system and methodology for a concurrent architecture of a Vedic multiplier, an accelerator scheme for high-speed computing, and in particular to implementing such a system using high-performance Field Programmable Gate Arrays (FPGAs).

Prior art:-

Very Large Scale Integration (VLSI) allows a wide range of circuits to be integrated in miniaturized form. Such circuits lend themselves to countless applications, such as encryption, decryption, digital image processing, signal processing, wireless sensor networks, cloud computing, error detection and correction schemes, and almost every other field. These applications pose numerous challenges owing to their characteristics, such as power utilization, speed, area, hardware resources, memory utilization and memory access speed. The distinctive traits of each application directly affect these characteristics, generating overheads that make the design targets difficult to attain.

The main issue with all the above applications revolves around the design of the multiplier. When implemented at the hardware level, it is the component that uses the most hardware, and it therefore affects these characteristics of the applications. The Multiplication Component (MC) is implemented using different algorithms: Shift-Add, Booth's Algorithm, Modified Booth's Algorithm, Carry Save Adders, Tree Multipliers, Array Multipliers, etc., all of which are hardware intensive and degrade these characteristics. Hence the need for a high-speed multiplier arises.

US 9063870 B1 discloses a large multiplier for a programmable logic device. A plurality of specialized processing blocks in a programmable logic device, including multipliers and circuitry for adding the results of those multipliers, can be configured as a larger multiplier by adding to the specialized processing blocks selectable circuitry for shifting multiplier results before adding. In one embodiment, this allows all but the final addition to take place in the specialized processing blocks, with the final addition occurring in programmable logic. In another embodiment, additional compression and adding circuitry allows even the final addition to occur in the specialized processing blocks. Circuitry that controls whether an input is signed or unsigned facilitates complex arithmetic.

US 8271570 B2 discloses a unified integer/Galois field GF(2^m) multiplier architecture for elliptic-curve cryptography. A unified integer/Galois field GF(2^m) multiplier performs multiply operations for public-key systems such as Rivest, Shamir, Adleman (RSA), Diffie-Hellman key exchange (DH) and Elliptic Curve Cryptosystem (ECC). The multiply operations may be performed on prime fields and different composite binary fields in independent multipliers in an interleaved fashion.
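The idea behind a unified integer/binary-field datapath can be sketched in Python as below. This is an illustrative sketch under stated assumptions, not the circuit of US 8271570 B2: one shift-and-add loop accumulates partial products either with ordinary addition (integer mode) or with XOR (binary-field mode, where carries are suppressed). The binary-field result is the unreduced polynomial product over GF(2) and would still have to be reduced modulo the field polynomial to give an element of GF(2^m). The function name `unified_multiply` is hypothetical.

```python
def unified_multiply(a: int, b: int, binary_field: bool) -> int:
    """Multiply two non-negative integers with a single shift-and-add loop.

    binary_field=False: partial products are summed with carries (integer product).
    binary_field=True:  partial products are XORed, i.e. carries are suppressed,
                        giving the unreduced polynomial product over GF(2).
    """
    if a < 0 or b < 0:
        raise ValueError("operands must be non-negative")
    acc = 0
    shift = 0
    while b >> shift:
        if (b >> shift) & 1:
            partial = a << shift  # partial product for this multiplier bit
            acc = (acc ^ partial) if binary_field else (acc + partial)
        shift += 1
    return acc


print(unified_multiply(0b11, 0b11, binary_field=False))  # 9
print(unified_multiply(0b11, 0b11, binary_field=True))   # 5 (0b101)
```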

US 6760742 B1 describes a multi-dimensional Galois field multiplier. An implementation of a multi-dimensional Galois field multiplier and a method of Galois field multi-dimensional multiplication are disclosed that can support, in a cost-efficient manner, many communication standards having various symbol sizes, different GFs and different primitive polynomials. The key to allowing a single implementation to perform for all the different GF sizes is to align the input data such that the Galois field symbols of the operands are aligned to the leftmost, most significant bit (MSB) position of the input data field. Similarly, the primitive polynomial used to create a selected Galois field is aligned to the leftmost MSB position. A polynomial multiply is performed. The product polynomial is then conditionally divided by the primitive polynomial starting with the most significant bit, the condition being that the leftmost bit of the product is 1. In other words, if the product polynomial has an MSB of 1, the product is divided by the primitive polynomial, and this step is repeated until the MSB is 0. In addition, for fields smaller than the maximum-size Galois field, the sequence of conditional divisions is further conditioned by a predetermined mask that depends on the size of the GF. The resultant product is aligned to the leftmost MSB.
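A software sketch of the MSB-first conditional division may help. For simplicity it assumes that each operand is an integer whose bit i holds the coefficient of x^i, with no left alignment or size mask as in US 6760742 B1; the function name `gf_multiply` is hypothetical. The check uses the AES field GF(2^8) with polynomial x^8 + x^4 + x^3 + x + 1 (0x11B), for which FIPS-197 gives {57} • {83} = {c1}.

```python
def gf_multiply(a: int, b: int, poly: int) -> int:
    """Multiply a and b in GF(2^m); poly is the degree-m field polynomial as a bit mask."""
    m = poly.bit_length() - 1
    # Polynomial multiply over GF(2): XOR together shifted copies of a.
    product = 0
    for i in range(b.bit_length()):
        if (b >> i) & 1:
            product ^= a << i
    # Conditional division from the MSB downward: whenever the current leading
    # bit is 1, subtract (XOR) the polynomial aligned under that bit.
    for bit in range(product.bit_length() - 1, m - 1, -1):
        if (product >> bit) & 1:
            product ^= poly << (bit - m)
    return product


print(hex(gf_multiply(0x57, 0x83, 0x11B)))  # 0xc1
```

The reduction loop scans the product from its MSB downward and XORs in the shifted polynomial only when the current bit is 1, which is the conditional division described above.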

In U.S. Pat. No. 7,024,444 B1, entitled "Split Multiplier Array and Method of Operation" and issued to Green, a multiplier circuit for use in a data processor is disclosed. The multiplier circuit comprises a partial products generating circuit that receives a multiplicand value and a multiplier value and generates a group of partial products. The multiplier circuit also comprises a split array for adding the partial products: a first summation array comprises a first group of adders that sum the even partial products to produce an even summation value, and a second summation array comprises a second group of adders that sum the odd partial products to produce an odd summation value. The even and odd summation values are then summed to produce the output of the multiplier.
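The even/odd split can be checked in a few lines of Python. This is a behavioral sketch assuming unsigned operands of `width` bits; the two `sum` calls stand in for the two adder arrays, and the name `split_array_multiply` is hypothetical.

```python
def split_array_multiply(multiplicand: int, multiplier: int, width: int) -> int:
    """Sum even- and odd-indexed partial products separately, then combine them."""
    # One partial product per multiplier bit (zero when that bit is 0).
    partials = [
        (multiplicand << i) if (multiplier >> i) & 1 else 0
        for i in range(width)
    ]
    even_sum = sum(partials[0::2])  # first summation array
    odd_sum = sum(partials[1::2])   # second summation array
    return even_sum + odd_sum       # final adder


print(split_array_multiply(13, 11, 4))  # 143
```

Splitting the partial products into two independent halves lets the two summation arrays work in parallel, each handling half of the rows.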

In U.S. Pat. No. 7,506,017 B1, entitled "Verifiable Multimode Multipliers" and issued to Dupenloup, a verifiable duplex multiplier circuit is disclosed. In one mode, the circuitry of the duplex multiplier functions as an N-bit × N-bit multiplier. In another mode, it operates as dual (N/2)-bit × (N/2)-bit multipliers. Because the same circuitry can serve both as an N × N multiplier and as dual (N/2) × (N/2) multipliers, integrated circuit resources are conserved. The duplex multiplier circuitry uses an architecture that can be automatically synthesized using a logic synthesis tool. Verification operations can be performed using logic-equivalency error checking tools. Exhaustive verification is possible using this approach, even when relatively large duplex multipliers (e.g., duplex multipliers with N values of 16 or more) are used.

Further, US 8645448 B2 discloses a carry-less multiplication unit. An apparatus having a carry-less preformat unit, a Booth encoder, a compressor, a left shifter, and exclusive-OR logic is described. The carry-less preformat unit receives a multiplier operand and partitions it into parts. The Booth encoder receives the parts and directs the selection of first partial products of a multiplicand that do not reflect implicit carry operations. The compressor sums the first partial products via a configuration of carry save adders that generate sum bits and carry bits, where generation of the carry bits is disabled during execution of the carry-less multiplication. The left shifter shifts bits of one or more outputs of the compressor. The exclusive-OR logic is coupled to the compressor and the left shifter, and is configured to execute an exclusive-OR function on the outputs to yield a carry-less multiplication result.

A Galois field multiply accumulator is described in US 7003715 B1. An OC-192 front-end application-specific integrated circuit (ASIC) de-interleaves an OC-192 signal to create four OC-48 signals and decodes the error-correction codes embedded in each of the four OC-48 signals. The decoder generates a Bose-Chaudhuri-Hocquenghem (BCH) error polynomial in no more than 12 clock cycles. The decoder includes several Galois field multiply accumulators and a state machine that controls the Galois field units. In the specific embodiment wherein the error-correction code is a BCH triple-error-correcting code, four Galois field units are used to carry out only six equations to solve the error polynomial. The Galois field units are advantageously designed to complete a Galois field multiply/accumulate operation in a single clock cycle. The Galois field units may operate in multiply or addition pass-through modes.

US 6286024 B1 discloses a high-efficiency multiplier and multiplying method. To execute four sets of (m/2)-bit × (n/2)-bit multiplication, four multiplicand selectors select (m/2)-bit multiplicands and four multiplicator selectors select the corresponding (n/2)-bit multiplicators; the selected operands are input into four multipliers, and the four (m/2)-bit × (n/2)-bit multiplications are executed in parallel. To execute an m-bit × n-bit multiplication, the four multiplicand selectors select the upper or lower (m/2)-bit halves of the multiplicand and the four multiplicator selectors select the upper or lower (n/2)-bit halves of the multiplicator, and these are input into the four multipliers. Of the four multiplication results, (lower m/2 bits of m bits) × (lower n/2 bits of n bits) and (upper m/2 bits of m bits) × (upper n/2 bits of n bits) are joined by a connector, and the connected result and the other two multiplication results are then added by an adder, each aligned at its predetermined bit position.
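The arithmetic behind the m-bit × n-bit mode can be sketched as follows, assuming unsigned operands; Python's `*` stands in for each of the four half-size multipliers, and `split_multiply` is a hypothetical name. Because the (lower × lower) product fits in m/2 + n/2 bits, it can be concatenated with the (upper × upper) product without any addition, which is why a simple connector suffices for that pair.

```python
def split_multiply(a: int, b: int, m: int, n: int) -> int:
    """Compute an m-bit x n-bit product from four (m/2) x (n/2) products."""
    hm, hn = m // 2, n // 2
    a_lo, a_hi = a & ((1 << hm) - 1), a >> hm
    b_lo, b_hi = b & ((1 << hn) - 1), b >> hn
    # Four half-size multiplications; they are independent, so they can run in parallel.
    ll = a_lo * b_lo
    hh = a_hi * b_hi
    lh = a_lo * b_hi
    hl = a_hi * b_lo
    # ll occupies at most hm + hn bits, so hh and ll can simply be concatenated.
    connected = (hh << (hm + hn)) | ll
    # Add the two cross products at their bit positions.
    return connected + (lh << hn) + (hl << hm)


print(split_multiply(171, 205, 8, 8))  # 35055
```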

Split-radix multiplication is described in US 20050102344 A1. A first number is multiplied by a second number by representing the first number as a first set of one or more W-bit wide numbers and representing the second number as a second set of one or more W-bit wide numbers. Each of the W-bit wide numbers from the first set is paired with each of the W-bit wide numbers from the second set. For each pair of W-bit wide numbers, a set of sub-partial products is generated. Combinations of the sub-partial products are formed such that each combination is representable by a W-bit wide lower partial product and a carry-out term that has fewer than W bits. The W-bit wide lower partial products and the carry-out terms are combined to form the product of the first number and the second number. The carry-out term is advantageously representable by (W/2)+1 bits.
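A simplified Python sketch of the pairing step is given below, assuming non-negative operands. It shows only how every W-bit digit of one operand is paired with every W-bit digit of the other and how the pair products are accumulated at their bit positions; it does not reproduce the sub-partial-product combination that keeps the carry-out term to (W/2)+1 bits. The names `to_limbs` and `split_radix_multiply` are hypothetical.

```python
def to_limbs(x: int, w: int) -> list[int]:
    """Split a non-negative integer into W-bit limbs, least significant first."""
    mask = (1 << w) - 1
    limbs = [x & mask]
    x >>= w
    while x:
        limbs.append(x & mask)
        x >>= w
    return limbs


def split_radix_multiply(a: int, b: int, w: int) -> int:
    """Pair every W-bit limb of a with every W-bit limb of b and accumulate."""
    result = 0
    for i, a_limb in enumerate(to_limbs(a, w)):
        for j, b_limb in enumerate(to_limbs(b, w)):
            # Each pair product is at most 2W bits wide and weighs 2^((i + j) * W).
            result += (a_limb * b_limb) << ((i + j) * w)
    return result


print(split_radix_multiply(123456789, 987654321, 8) == 123456789 * 987654321)  # True
```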

A new VLSI architecture of a parallel multiplier-accumulator based on the radix-2 modified Booth algorithm has also been described. Its hybrid architecture, which combines multiplication, addition and a carry save adder (CSA), generates overheads when designed with Booth's algorithm. The proposed method generates several intermediate terms, which increases the computational overhead. Multiplication is already a cumbersome process, and embedding it in a hybrid architecture makes the actual implementation a tedious job.

A VLSI implementation of Vedic mathematics and its application in RSA cryptosystems has been disclosed. Of the various cryptosystems available, RSA is one of the first public-key algorithms; with Vedic computation it becomes faster, but it lacks at the security level. A cryptosystem based on Vedic mathematics can be compared with other cryptosystems on several parameters. Matrix (two-dimensional) data can be processed directly by treating it as row-wise serial data. A Built-In Self-Test (BIST) block can be added as a separate module to test the system during boot-up.

Efficient FPGA-based matrix multiplication using a multiplexer (Mux) and a Vedic multiplier has also been disclosed. An image is represented in matrix form, and a matrix can be treated as a collection of one-dimensional arrays, which can then be processed with the Vedic multiplier. The alternative is the conventional row-by-column method, but even then better results are obtained only when the multiplications are performed with the Vedic multiplier. Since FPGAs are designed for parallel computing, implementing multiple fast carry adders is not a concern as long as the FPGA provides them.

Thus there is a need to develop and design an efficient, high-speed multiplier that avoids the limitations of conventional systems and methods. Hence the present invention implements a concurrent architecture of a Vedic multiplier, an accelerator scheme for high-speed computing, and focuses on designing a high-speed multiplier using the fundamentals of Vedic Mathematics.

Object:

1. The primary object of the present invention is to provide a system and methodology for a concurrent architecture of a Vedic multiplier, an accelerator scheme for high-speed computing.

2. Another object of the present invention is to provide a system implemented using high-performance Field Programmable Gate Arrays, which are well known for implementing concurrent architectures.

3. Yet another object of the present invention is to design a high-speed multiplier using the fundamentals of Vedic Mathematics.

4. Yet another object of the present invention is to use Vedic mathematics, a cosmic scheme, to implement a multiplier that is genuine in the sense that it can be applied to different applications such as data security, error correction and detection, digital image processing (DIP), digital signal processing (DSP), wireless sensor networks (WSN), etc. without using any supporting algorithms.

5. Yet another object of the present invention is to provide a concurrent and generic architecture that overcomes the overheads generated by multi-core and configurable architectures.

6. Yet another object of the present invention is to provide a flexible architecture so that data can be accepted in any format.

7. Yet another object of the present invention is to provide a system that diminishes the use of virtual memory caused by the generation of several intermediate terms.

8. Yet another object of the present invention is to optimize hardware generation by minimizing the generation of intermediate terms, which helps optimize parameters such as hardware resource utilization, power utilization, timing constraints, area utilization, memory element utilization, etc.

9. Yet another object of the present invention is to provide higher speed, lower cost and smaller VLSI area.

10. Yet another object of the present invention is to convert all dedicated blocks into a unique, genuine Vedic Multiplier block.

11. Yet another object of the present invention is to develop N pipes of N bits each, which dramatically increase the computing speed.

12. Yet another object of the present invention is to include a Built In Self Test (BIST) for self-testing during boot-up.

13. Yet another object of the present invention is to implement the system using Virtex/Kintex family FPGAs, which are fabricated in the lowest-nanometer process technology.

14. Yet another object of the present invention is to provide a novel inclusion that makes it possible to avoid repeated execution.

15. Yet another object of the present invention is to provide an Intelligent Prediction unit that bypasses the multiplication process for a data set once it has already been executed.

16. Yet another object of the present invention is to allow modification so that the multiplier can be used as an individual block in a PSoC (Programmable System on Chip).

Other objects, features and advantages will become apparent to those skilled in the art from the detailed description and appended claims.

STATEMENT:

Accordingly, the following invention provides a concurrent architecture of a Vedic multiplier, an accelerator scheme for high-speed computing. The system is implemented using FPGAs, which are well known for implementing concurrent architectures, and it is suitable for different applications such as data security, error correction and detection, DIP, DSP, WSN, etc. without using any supporting algorithms. The methodology of applying Vedic fundamentals greatly optimizes constraints such as power, time, area and hardware resource utilization. Vedic mathematics removes intermediate steps and gives a direct output for complex procedures such as multiplication.

In this system, the user first selects the input data pattern. The user can input data in any format, such as binary, octal, decimal or hexadecimal; as a matrix in the case of image processing or a matrix multiplication unit; as a data packet in the case of secure data communication; etc. Upon selection of the input data pattern, the unit executes the first step, data conversion, which depends on the operation to be executed or the unit designed. Next, the input data pattern is compared with the data already stored in the internal memory unit. If a particular input data combination has already been executed, its output is routed directly to the output lines, without computing the multiplication again. If the input data sample is not available in the history unit, its Vedic multiplication is executed, and the input data sample and its multiplication result are stored in the history unit. This novel inclusion makes it possible to avoid repeated execution. The preferred embodiment mainly comprises an input data pattern selector, a data conversion unit, an internal history unit, a programmable logic device, a display unit, a power supply unit, a centralized clock unit, etc.

BRIEF DESCRIPTION OF DRAWING:

This invention is described by way of example with reference to the following drawing where,

Figure 1 of sheet 1 shows the block diagram of the architecture of the Vedic Multiplier.

Where,

1 denotes Input data selector unit

2 denotes Data conversion unit

3 denotes Vedic Multiplier

4 denotes Internal data storage for creating a history of input and output combinations

5 denotes Display device

6 denotes Power supply to all units

7 denotes Centralized clock to all synchronous devices.

In order that the manner in which the above-cited and other advantages and objects of the invention are obtained may be understood, a more particular description of the invention briefly described above is given with reference to the embodiment illustrated in the appended drawing. Understanding that this drawing depicts only a typical embodiment of the invention and is therefore not to be considered limiting of its scope, the invention will be described with additional specificity and detail through the use of the accompanying drawing.

Detailed description:

Vedic Mathematics is an ancient system of calculation; "Vedic" means derived from the Vedas. The basic Sutras and Upa-Sutras (methods) of Vedic mathematics help solve almost all numeric computations easily and in less time. The sutra used in the proposed invention is Urdhva-Tiryagbhyam (Vertical and Crosswise). It is a general formula applicable to all cases of multiplication.
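As a worked illustration of the Vertical and Crosswise sutra, consider 12 × 13. The rightmost column is the vertical product 2 × 3 = 6, the middle column is the crosswise sum 1 × 3 + 2 × 1 = 5, and the leftmost column is the vertical product 1 × 1 = 1, giving 156. The Python sketch below generalizes this to any number of digits: column k collects every digit product a_i × b_j with i + j = k, and the carries are resolved from right to left at the end. The function names are hypothetical, and the code is a behavioral model, not the FPGA design.

```python
def to_digits(x: int, base: int) -> list[int]:
    """Digits of a non-negative integer, least significant first."""
    digits = [x % base]
    x //= base
    while x:
        digits.append(x % base)
        x //= base
    return digits


def urdhva_columns(a: int, b: int, base: int = 10) -> list[int]:
    """Column k = sum of all vertical/crosswise digit products a_i * b_j with i + j == k."""
    da, db = to_digits(a, base), to_digits(b, base)
    cols = [0] * (len(da) + len(db) - 1)
    for i, x in enumerate(da):
        for j, y in enumerate(db):
            cols[i + j] += x * y
    return cols


def urdhva_multiply(a: int, b: int, base: int = 10) -> int:
    """Resolve the column sums from right to left, carrying into the next column."""
    cols = urdhva_columns(a, b, base)
    result, carry = 0, 0
    for k, col in enumerate(cols):
        total = col + carry
        result += (total % base) * base**k  # digit k of the product
        carry = total // base               # carried into column k + 1
    return result + carry * base**len(cols)


print(urdhva_columns(12, 13))   # [6, 5, 1]
print(urdhva_multiply(12, 13))  # 156
```

For 99 × 99 the column sums are [81, 162, 81], and carry resolution turns them into 9801.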

In the proposed invention, a concurrent architecture of the Vedic multiplication block is suggested. The system is implemented using high-performance Field Programmable Gate Arrays, which are renowned for implementing concurrent architectures and are configured using a Hardware Description Language (HDL). The difference between an HDL and other languages is that an HDL is a concurrent language: its statements and constructs describe hardware that operates simultaneously, which is a requisite when describing any hardware at the physical level.

Vedic Multiplier:-

Multiplication is a heavily used arithmetic operation that figures prominently in cryptography, image processing, signal processing, WSN, cloud computing, error detection and correction (EDC) systems, etc. The predominant approaches to designing classic multiplication components are the shift-add technique, array multipliers, and Booth's and modified Booth's algorithms. All these techniques are hardware intensive, whereas at the physical level the main criteria of interest are higher speed, lower cost and smaller VLSI area.

Based on these observations, the proposed work investigates a Vedic multiplier component. Vedic mathematics has the property that it applies directly to any N-bit word size. This makes it well suited to designing fast architectures, because it reduces the overheads caused by the generation of intermediate terms in classic multiplication blocks.
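To illustrate how the same formula applies directly to an N-bit word, the Python sketch below (a behavioral model with a hypothetical function name, not the patented circuit) applies vertical and crosswise to binary operands. In base 2 each digit product is a single AND gate, and column k of the result is the count of the AND terms a_i & b_j with i + j = k. No column depends on another, which is what allows hardware to form all 2N - 1 column sums at once; only the final carry resolution passes values between columns.

```python
def vedic_binary_multiply(a: int, b: int, n: int) -> int:
    """Unsigned N-bit x N-bit multiply using vertical-and-crosswise column sums."""
    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]
    columns = []
    for k in range(2 * n - 1):
        # Column k: AND of every bit pair (a_i, b_j) with i + j == k.
        lo, hi = max(0, k - n + 1), min(k, n - 1)
        columns.append(sum(a_bits[i] & b_bits[k - i] for i in range(lo, hi + 1)))
    # No column sum depends on another column, so hardware can form them all
    # at once; only this carry resolution passes values between columns.
    product, carry = 0, 0
    for k, col in enumerate(columns):
        total = col + carry
        product |= (total & 1) << k
        carry = total >> 1
    return product | (carry << (2 * n - 1))


print(vedic_binary_multiply(0b1101, 0b1011, 4))  # 143
```

The same function works unchanged for any width `n`, which mirrors the property that the formula applies directly to an N-bit word.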

The Vedic multiplier is implemented using high-performance field programmable gate arrays, which are well known for implementing parallel executable architectures. The FPGA is configured using a Hardware Description Language (HDL) platform.

Best method of performance of the invention:-

The present invention discloses the "Concurrent Architecture of Vedic Multiplier-An Accelerator Scheme for High Speed Computing". It includes the following modules: input data pattern selector, data conversion unit, internal history unit, programmable logic device, display unit, power supply unit and centralized clock unit.

In this invention, the user first selects the input data pattern. The user can input data in any format, such as binary, octal, decimal or hexadecimal; as a matrix in the case of image processing or a matrix multiplication unit; as a data packet in the case of secure data communication; etc.

Upon selection of the input data pattern, the unit executes the first step, data conversion, which depends on the operation to be executed or the unit designed. Next, the input data pattern is compared with the data already stored in the internal memory unit.
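The description does not specify how the data conversion unit is built. As one possible sketch, assuming the conversion simply turns the selected textual format into integer operands, Python's built-in `int(value, base)` covers the binary, octal, decimal and hexadecimal cases, and matrix input can be converted element by element. The format names, `convert_input` and `convert_matrix` are hypothetical.

```python
# Hypothetical format names accepted by the input data pattern selector.
RADIX = {"binary": 2, "octal": 8, "decimal": 10, "hexadecimal": 16}


def convert_input(value: str, fmt: str) -> int:
    """Convert one operand, given as text in the selected format, to an integer."""
    if fmt not in RADIX:
        raise ValueError(f"unsupported input format: {fmt!r}")
    return int(value, RADIX[fmt])


def convert_matrix(rows: list[list[str]], fmt: str) -> list[list[int]]:
    """Convert matrix input (e.g. an image block) element by element."""
    return [[convert_input(v, fmt) for v in row] for row in rows]


print(convert_input("ff", "hexadecimal"))                    # 255
print(convert_matrix([["1", "10"], ["11", "0"]], "binary"))  # [[1, 2], [3, 0]]
```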

This is the intelligent step: if a particular input data combination has already been executed, its output is routed directly to the output lines, without computing the multiplication again. If the input data sample is not available in the history unit, its Vedic multiplication is executed, and the input data sample and its multiplication result are stored in the history unit. This novel inclusion makes it possible to avoid repeated execution.
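The history unit behaves like a lookup table (memoization) placed in front of the multiplier. A minimal Python sketch, assuming exact-match lookup on the operand pair and unlimited storage, is shown below; the class `HistoryUnit` and its counters are hypothetical, and the lambda stands in for the Vedic multiplier.

```python
from typing import Callable


class HistoryUnit:
    """Stores (input combination -> product) pairs so repeated inputs skip multiplication."""

    def __init__(self, multiplier: Callable[[int, int], int]) -> None:
        self._multiplier = multiplier
        self._history: dict[tuple[int, int], int] = {}
        self.hits = 0    # combinations answered from the history
        self.misses = 0  # combinations that had to be multiplied

    def multiply(self, a: int, b: int) -> int:
        key = (a, b)
        if key in self._history:          # combination already executed
            self.hits += 1
            return self._history[key]     # route stored result to the output
        self.misses += 1
        result = self._multiplier(a, b)   # execute the multiplication
        self._history[key] = result       # record input and result
        return result


unit = HistoryUnit(lambda a, b: a * b)
print(unit.multiply(12, 13))   # 156, computed
print(unit.multiply(12, 13))   # 156, taken from the history
print(unit.hits, unit.misses)  # 1 1
```

A hardware history unit has finite memory, so a practical design would also need a size limit and a replacement policy, which the description does not specify. Since multiplication is commutative, storing the key as (min(a, b), max(a, b)) would also let (a, b) and (b, a) share one entry.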

The concurrent architecture of the Vedic multiplier is implemented using the vertical and crosswise (Urdhva-Tiryagbhyam) multiplication algorithm and realized using programmable devices. Its key feature is that it is well designed for realizing a concurrent architecture.
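The description does not detail how the concurrent architecture is partitioned. A structure often used for Vedic multipliers in the literature, assumed here for illustration only, is a 2 × 2 block of four AND gates and two half adders, with larger multipliers built by applying vertical and crosswise to half-width blocks: the four sub-products are independent and can be instantiated side by side. The Python sketch below models that structure behaviorally; `vedic_2x2` and `vedic_nxn` are hypothetical names, and N is assumed to be a power of two.

```python
def vedic_2x2(a: int, b: int) -> int:
    """2-bit x 2-bit block built from four AND gates and two half adders."""
    a0, a1 = a & 1, (a >> 1) & 1
    b0, b1 = b & 1, (b >> 1) & 1
    q0 = a0 & b0                   # vertical (right column)
    s1 = (a1 & b0) ^ (a0 & b1)     # crosswise: half-adder sum
    c1 = (a1 & b0) & (a0 & b1)     # crosswise: half-adder carry
    q2 = (a1 & b1) ^ c1            # vertical (left column) plus carry: sum
    q3 = (a1 & b1) & c1            # final carry
    return q0 | (s1 << 1) | (q2 << 2) | (q3 << 3)


def vedic_nxn(a: int, b: int, n: int) -> int:
    """N-bit x N-bit product (N a power of two, N >= 2) from four N/2-bit blocks."""
    if n == 2:
        return vedic_2x2(a, b)
    h = n // 2
    mask = (1 << h) - 1
    a_lo, a_hi = a & mask, (a >> h) & mask
    b_lo, b_hi = b & mask, (b >> h) & mask
    # The four sub-products are independent, so in hardware they are
    # instantiated side by side and evaluated concurrently.
    ll = vedic_nxn(a_lo, b_lo, h)  # vertical (right)
    lh = vedic_nxn(a_lo, b_hi, h)  # crosswise
    hl = vedic_nxn(a_hi, b_lo, h)  # crosswise
    hh = vedic_nxn(a_hi, b_hi, h)  # vertical (left)
    # Combine with adders: crosswise terms weigh 2^h, the left vertical term 2^n.
    return ll + ((lh + hl) << h) + (hh << n)


print(vedic_nxn(0b1101, 0b1011, 4))  # 143
```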

Finally, the computed data pattern is displayed on the display unit, which may be a simple 16x2 LCD character display, a touch screen display, etc. In addition, separate modules provide the power supply to all units, and a centralized clock distribution unit synchronizes the different modules.

Additional advantages and modifications will readily occur to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details and representative embodiments shown and described herein. Accordingly, various modifications may be made without departing from the spirit or scope of the general inventive concept as defined by the appended claims and their equivalents.

Claims

We claim:-

1. A concurrent architecture of a Vedic Multiplier, an accelerator scheme for high-speed computing, implemented using FPGAs, suitable for different applications such as data security, error correction and detection, DIP, DSP, WSN, etc. without using any supporting algorithms, and greatly optimizing constraints such as power, time, area and hardware resource utilization, wherein the system mainly comprises an input data pattern selector, a data conversion unit, an internal history unit, a programmable logic device, a display unit, a power supply unit and a centralized clock unit.

2. In the system as claimed in claim 1:

a) the user will have to first select the input data pattern, where the user can input data in any format such as binary, octal, decimal or hexadecimal, as a matrix in the case of image processing or a matrix multiplication unit, or as a data packet in the case of secured data communication, etc.;

b) Upon selection of the input data pattern, the unit will execute the first step of data conversion; this data conversion depends on the operation to be executed or the unit designed;

c) Next, the input data pattern will be compared with the data already stored in the internal memory unit;

d) If a particular input data combination has already been executed, its output will be drawn to the output lines directly, without computing the data samples again for multiplication;

e) If the input data sample is not available in the history unit, then its Vedic multiplication is executed and the input data sample and its multiplication result will be stored inside the history unit;

f) With such a novel inclusion it is possible to avoid repeated execution.

3. The concurrent architecture of the Vedic multiplier as claimed in claim 1, which is implemented using the vertical and crosswise multiplication algorithm and realized using programmable devices, its feature being that it is well designed for realizing a concurrent architecture.

4. In the system as claimed in claim 2, the finally computed data pattern is displayed on a display unit, which may be a simple 16x2 LCD character display, a touch screen display, etc.

5. In the system as claimed in claims 2 and 4, separate modules provide the power supply to all units, and a centralized clock distribution unit synchronizes the different modules.