WO2017037729A1 - Concurrent architecture of vedic multiplier-an accelerator scheme for high speed computing - Google Patents

Concurrent architecture of vedic multiplier-an accelerator scheme for high speed computing

Info

Publication number
WO2017037729A1
Authority
WO
WIPO (PCT)
Prior art keywords
vedic
multiplier
unit
multiplication
input data
Prior art date
Application number
PCT/IN2016/000117
Other languages
French (fr)
Inventor
Jitendra S. Edle
Prashant R. Deshmukh
Original Assignee
Edle Jitendra S
Deshmukh Prashant R
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Edle Jitendra S, Deshmukh Prashant R
Publication of WO2017037729A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/52 Multiplying; Dividing
    • G06F7/523 Multiplying only
    • G06F7/53 Multiplying only in parallel-parallel fashion, i.e. both operands being entered in parallel

Definitions

  • The present invention relates in general to a system and methodology for a concurrent architecture of a Vedic Multiplier, an accelerator scheme for high speed computing, and in particular to implementing such a system using high-performance Field Programmable Gate Arrays.
  • VLSI Very Large Scale Integration
  • These applications pose numerous challenges because of constraints such as power utilization, speed, area, hardware resources, memory utilization and memory access speed.
  • the distinguishing traits of the applications have a direct impact on these constraints, which generates overheads and makes the constraints difficult to meet.
  • The Multiplication Component is implemented using different algorithms: Shift-Add, Booth's Algorithm, Modified Booth's Algorithm, Carry Save Adders, Tree Multipliers, Array Multipliers, etc., all of which are hardware intensive and degrade these constraints. Hence the need for a high speed multiplier arises.
  • US 9063870 B1 discloses a large multiplier for a programmable logic device.
  • a plurality of specialized processing blocks in a programmable logic device can be configured as a larger multiplier by adding to the specialized processing blocks selectable circuitry for shifting multiplier results before adding. In one embodiment, this allows all but the final addition to take place in specialized processing blocks, with the final addition occurring in programmable logic. In another embodiment, additional compression and adding circuitry allows even the final addition to occur in the specialized processing blocks. Circuitry that controls when an input is signed or unsigned facilitates complex arithmetic.
  • US 8271570 B2 discloses a unified integer/Galois field (2^m) multiplier architecture for elliptic-curve cryptography.
  • a unified integer/Galois field 2^m multiplier performs multiply operations for public-key systems such as Rivest, Shamir, Adleman (RSA), Diffie-Hellman key exchange (DH) and Elliptic Curve Cryptosystem (ECC).
  • the multiply operations may be performed on prime fields and different composite binary fields in independent multipliers in an interleaved fashion.
  • US 6760742 B1 describes a multi-dimensional Galois field multiplier.
  • An implementation of a multi-dimensional Galois field multiplier and a method of Galois field multi-dimensional multiplication which are able to support many communication standards having various symbol sizes, different GFs, and different primitive polynomials, in a cost-efficient manner is disclosed.
  • the key to allow a single implementation to perform for all different GF sizes is to align the input data such that the Galois field symbols of the operands are aligned to the left most significant bit (MSB) position of the input data field.
  • MSB most significant bit
  • the primitive polynomial used to create a selected Galois field is aligned to the left MSB position. A polynomial multiply is performed.
  • the product polynomial is then conditionally divided by the primitive polynomial starting with the most significant bit, the condition being if the left most bit of the product is a 1. In other words, if the product polynomial has an MSB of 1, then divide the product with the primitive polynomial. Perform this step until the MSB is 0. In addition, for fields smaller than a maximum size Galois field, the sequence of conditional divisions is further conditioned with a predetermined mask in dependence upon the size of the GF. The resultant product is aligned to the left MSB.
  • the multiplier circuit comprises a partial products generating circuit that receives a multiplicand value and a multiplier value and generates a group of partial products.
  • the multiplier circuit also comprises a split array for adding the partial products.
  • a first summation array comprises a first group of adders that sum the even partial products to produce an even summation value.
  • a second summation array comprises a second group of adders that sum the odd partial products to produce an odd summation value. The even and odd summation values are then summed to produce the output of the multiplier.
  • a verifiable duplex multiplier circuit is disclosed. In one mode, the circuitry of the duplex multiplier functions as an N-bit × N-bit multiplier. In another mode, the circuitry of the duplex multiplier operates as dual (N/2)-bit × (N/2)-bit multipliers. Because the same circuitry can be used to serve as both an N*N multiplier and as dual N/2*N/2 multipliers, integrated circuit resources are conserved.
  • the duplex multiplier circuitry uses an architecture that can be automatically synthesized using a logic synthesis tool. Verification operations can be performed using logic-equivalency error checking tools.
  • Another US 8645448 B2 discloses Carry less multiplication unit.
  • An apparatus having a carry less preformat unit, a Booth encoder, a compressor, a left shifter, and exclusive-OR logic is described.
  • the carry less preformat unit receives a multiplier operand and partitions the multiplier operand into parts.
  • the Booth encoder receives the parts and directs selection of first partial products of a multiplicand that do not reflect implicit carry operations.
  • the compressor sums the first partial products via a configuration of carry save adders that generate sum bits and carry bits, where generation of the carry bits is disabled during execution of the carry less multiplication.
  • the left shifter shifts bits of one or more outputs of the compressor.
  • the exclusive-OR logic is coupled to the compressor and the left shifter, and is configured to execute an exclusive-OR function on the outputs to yield a carry less multiplication result.
  • A Galois field multiply accumulator is described in US 7003715 B1.
  • An OC-192 front-end application-specific integrated circuit (ASIC) de-interleaves an OC-192 signal to create four OC-48 signals, and decodes error-correction codes embedded in each of the four OC-48 signals.
  • the decoder generates a Bose-Chaudhuri-Hocquenghem (BCH) error polynomial in no more than 12 clock cycles.
  • BCH Bose-Chaudhuri-Hocquenghem
  • the decoder includes several Galois field multiply accumulators, and a state machine which controls the Galois field units.
  • the error-correction code is a BCH triple error-correcting code
  • four Galois field units are used to carry out only six equations to solve the error polynomial.
  • the Galois field units are advantageously designed to complete a Galois field multiply/accumulate operation in a single clock cycle.
  • the Galois field units may operate in multiply or addition pass-through modes.
  • Another document, US 6286024 B1, discloses a high-efficiency multiplier and multiplying method. Upon execution of four sets of m/2-bit × n/2-bit multiplication, four multiplicand selectors select m/2-bit multiplicands respectively and four multiplicator selectors select corresponding n/2-bit multiplicators respectively, then the selected m/2-bit multiplicands and n/2-bit multiplicators are input into four multipliers, and then four sets of m/2-bit × n/2-bit multiplication are executed in parallel. Upon execution of m-bit × n-bit multiplication, the four multiplicand selectors select upper or lower m/2-bit multiplicands respectively and the four multiplicator selectors select upper or lower n/2-bit multiplicators respectively, then the selected m/2-bit multiplicands and n/2-bit multiplicators are input into the four multipliers respectively, then the multiplication results of (lower m/2 bits of m bits) × (lower n/2 bits of n bits) and (upper m/2 bits of m bits) × (upper n/2 bits of n bits) out of the four multiplication results of the four multipliers are connected by a connector, and then the connected multiplication results and the other two multiplication results are added by an adder, each arranged at a predetermined bit location.
  • a first number is multiplied by a second number, by representing the first number as a first set of one or more W-bit wide numbers, and representing the second number as a second set of one or more W-bit wide numbers.
  • Each of the W-bit wide numbers from the first set is paired with each of the W-bit wide numbers from the second set.
  • a set of sub-partial products is generated for each pair of W-bit wide numbers. Combinations of the sub-partial products are formed such that each combination is representable by a W-bit wide lower partial product and a carry out term that has fewer than W bits.
  • the W-bit wide lower partial products and the carry out terms are combined to form the product of the first number and the second number.
  • the carry out term is advantageously representable by (W/2)+1 bits.
  • Hybrid architecture of multiplication, addition and carry save adders (CSA) generates overheads when designed with Booth's algorithm.
  • The proposed method generates several intermediate terms, which increases computational overheads.
  • Multiplication is itself a cumbersome process, and with the proposed hybrid architecture it becomes tedious to implement in practice.
  • VLSI implementation of Vedic Mathematics and its applications in RSA cryptosystems is disclosed. Several cryptosystems are available, of which RSA is the first public key algorithm; with Vedic computations it becomes faster but is lacking at the security level. A Vedic-math-based cryptosystem can be compared with any other cryptosystem on several parameters. Matrix data or two-dimensional data can be processed directly by treating it as row-wise serial data. A Built In Self-Test block can be added as a separate module for testing the system during boot up.
  • the present invention implements a concurrent architecture of Vedic Multiplier- an accelerator scheme for high speed computing.
  • the proposed invention focuses on designing high speed multiplier using fundamentals of the Vedic Mathematics.
  • Primary object of the present invention is to provide a system and methodology for concurrent architecture of Vedic Multiplier- an accelerator scheme for high speed computing. Another object of the present invention is to provide a system which is implemented using high computing Field Programmable Gate Arrays famous for implementing concurrent architecture. Yet another object of the present invention is to design a high speed multiplier using fundamentals of the Vedic Mathematics. Yet another object of the present invention is to use Vedic- a Cosmic scheme for implementation of multiplier which is genuine, in sense, suitable to apply to different applications like Data Security, error correction and detection, DIP, DSP, WSN and etc. without using any supporting algorithms. Yet another object of the present invention is to provide concurrent and generic architecture that overcomes the overheads generated due to multi core and configurable architecture.
  • Yet another object of the present invention is to provide higher speed, lower cost and less VLSI Area.
  • Yet another object of the present invention is to convert all dedicated blocks into unique/ genuine Vedic Multiplier Block.
  • Yet another object of the present invention is to develop N-Pipes of N-bit which dramatically change the computing speed.
  • Yet another object of the present invention is to include Built In Self Test (BIST) for self-testing purpose while boot up.
  • BIST Built In Self Test
  • Yet another object of the present invention is to implement a system using Virtex/ Kintex family FPGAs, which are designed at lowest nm technology.
  • Yet another object of the present invention is to provide a novel inclusion in which it is possible to save the repeated execution.
  • Yet another object of the present invention is to provide Intelligent Prediction unit which bypasses the multiplication process, once executed for one data set.
  • Yet another object of the present invention is to provide modification to use as individual block used in PSOC.
  • Vedic Multiplier- an accelerator scheme for high speed computing System is implemented using FPGAs, famous for implementing concurrent architecture.
  • the system is suitable to apply to different applications like Data Security, error correction and detection, DIP, DSP, WSN and etc. without using any supporting algorithms.
  • the methodology of applying Vedic fundamentals greatly optimizes the constraints like Power, Time, Area and Hardware Resource Utilization.
  • Vedic mathematics removes the intermediate steps and gives direct output for complex procedures like multiplication. In this, the user first has to select the input data pattern.
  • user can input data in any format such as binary, octal, decimal or hexadecimal, as a matrix in the case of an image processing or matrix multiplication unit, as a data packet in the case of secured data communication, etc.
  • Upon selection of the input data pattern, the unit executes the first step, data conversion; this conversion depends on the operation to be executed or the unit designed. Immediately afterwards, the input data pattern is compared with the data already stored in the internal memory unit. If a particular input data combination has already been executed, its output is driven to the output lines directly, without computing the multiplication again. If the input data sample is not available in the history unit, its Vedic multiplication is executed, and the input data sample and its multiplication result are stored inside the history unit. With this novel inclusion it is possible to avoid repeated execution.
  • the preferred embodiment mainly comprises an input data pattern selector, data conversion unit, internal history unit, programmable logic device, display unit, power supply unit, centralized clock unit, etc.
  • Figure 1 of sheet 1 shows the block diagram of the architecture of Vedic
  • Vedic Mathematics is an ancient procedure of calculation. Vedic means derived from the Vedas. The basic Sutras and Upa-Sutras (methods) of Vedic mathematics help in solving almost all numeric computations easily and in less time.
  • the sutra used in the proposed invention is the Urdhva Tiryagbhyam (Vertically and Crosswise). It is a general formula which is applicable to all cases of multiplication.
  • concurrent architecture of Vedic multiplication block is suggested.
  • System is implemented using high computing Field Programmable Gate Arrays.
  • FPGAs are renowned for implementing concurrent architecture and are configured using a Hardware Description Language (HDL). The difference between an HDL and other languages is that an HDL is a concurrent language, meaning that all written statements and constructs are executed simultaneously, which is a requisite when describing any hardware at physical level.
  • HDL Hardware Description Language
  • Multiplication is a widely used arithmetic operation that figures prominently in Cryptography, Image Processing, Signal Processing, WSN, Cloud, EDC Systems, etc.
  • the predominant approaches in the design of classic multiplication components are the Shift-Add technique, Array Multiplication, Booth's and Modified Booth's algorithms, etc. All these techniques are hardware intensive. At physical level, the main criteria of interest are higher speed, lower cost and less VLSI Area.
  • Vedic multiplier component: Vedic mathematics has the property that it is applicable to an N-bit word size directly. This feature makes it more suitable for designing a fast architecture because it reduces the overheads caused by the generation of intermediate terms in classic multiplication blocks.
  • Vedic multiplier is implemented using high performance field programmable gate arrays.
  • FPGAs are famous for implementing parallel executable architecture.
  • FPGA is to be configured using Hardware Description Language (HDL) Platform. Best method of performance of the invention:-
  • the present invention discloses the "Concurrent Architecture of Vedic Multiplier-An Accelerator Scheme for High Speed Computing". It includes modules such as: Input data pattern selector, Data conversion unit, internal history unit, programmable logic device, Display unit, Power Supply Unit, Centralized Clock Unit.
  • user will have to first select the input data pattern.
  • user can input data in any format such as binary, octal, decimal or hexadecimal, as a matrix in the case of an image processing or matrix multiplication unit, as a data packet in the case of secured data communication, etc.
  • the unit will execute the first step of data conversion; this data conversion depends on the operation to be executed or the unit designed. Immediately afterwards, the input data pattern will be compared with the data already stored in the internal memory unit.
  • the concurrent architecture of the Vedic multiplier is implemented using the vertical and crosswise multiplication algorithm and realized using Programmable Devices. Its distinguishing feature is that it is well suited to realization as a concurrent architecture.
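The vertical and crosswise (Urdhva Tiryagbhyam) scheme named above can be illustrated with a minimal Python sketch for unsigned binary operands. It is an illustration under stated assumptions, not the patented circuit: the function name `urdhva_multiply` and its parameters are hypothetical, and the column sums that hardware could form concurrently are computed here one after another.

```python
def urdhva_multiply(a: int, b: int, n: int) -> int:
    """Multiply two unsigned n-bit integers column by column.

    Hypothetical helper for illustration only. Column k collects every
    bit pair (i, j) with i + j == k (the "crosswise" terms); the two end
    columns have a single pair each (the "vertical" terms).
    """
    if a < 0 or b < 0 or a >> n or b >> n:
        raise ValueError("operands must be unsigned n-bit values")

    a_bits = [(a >> i) & 1 for i in range(n)]
    b_bits = [(b >> i) & 1 for i in range(n)]

    result = 0
    carry = 0
    for col in range(2 * n - 1):
        lo = max(0, col - n + 1)   # smallest valid index into a_bits
        hi = min(col, n - 1)       # largest valid index into a_bits
        column_sum = carry + sum(
            a_bits[i] & b_bits[col - i] for i in range(lo, hi + 1)
        )
        result |= (column_sum & 1) << col   # low bit is the product bit
        carry = column_sum >> 1             # the rest moves to the next column
    result |= carry << (2 * n - 1)          # final carry is the top product bit
    return result


print(urdhva_multiply(13, 11, 4))
```

Each column depends on its neighbour only through the carry, which is why the column sums lend themselves to parallel evaluation in an HDL description.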

Abstract

The present invention provides a Concurrent Architecture of Vedic Multiplier-An Accelerator Scheme for High Speed Computing. The methodology of applying Vedic fundamentals greatly optimizes constraints such as Power, Time, Area and Hardware Resource Utilization, and can prove useful for the development of Efficient and Secured Templates. Vedic mathematics removes the intermediate steps and gives direct output for complex procedures like multiplication. The said procedure has wide applications in Encryption, Decryption, Image Processing, Signal Processing, Secured Wireless Sensor Networks, Cloud Computing, Error Correction and Detection Modules, etc. All these applications have one common block, the multiplier, which is a complex procedure at hardware level. Hence, the need for high speed multiplication can be fulfilled by implementing the novel Vedic Multiplier using blocks of concurrently executable hardware architecture such as FPGAs. The invention is described in detail with the help of Figure 1 of sheet 1, which shows the block diagram of the architecture of the Vedic Multiplier.

Description

TITLE OF INVENTION
Concurrent Architecture of Vedic Multiplier-An Accelerator Scheme for High Speed Computing
Technical field of invention:
The present invention relates in general to a system and methodology for a concurrent architecture of a Vedic Multiplier, an accelerator scheme for high speed computing, and in particular to implementing such a system using high-performance Field Programmable Gate Arrays.
Prior art:-
Very Large Scale Integration (VLSI) allows a wide range of circuits to be integrated in miniaturized form. Such circuits lend themselves to countless applications such as Encryption, Decryption, Digital Image Processing, Signal Processing, Wireless Sensor Networks, Cloud Computing, and Error Detection and Correction Schemes, in almost all fields. These applications pose numerous challenges because of constraints such as power utilization, speed, area, hardware resources, memory utilization and memory access speed. The distinguishing traits of the applications have a direct impact on these constraints, which generates overheads and makes the constraints difficult to meet.
The main issue in all of the above applications revolves around designing the multiplier. It is the component which, when implemented at hardware level, uses the most hardware and therefore affects the constraints of the applications. The Multiplication Component (MC) is implemented using different algorithms: Shift-Add, Booth's Algorithm, Modified Booth's Algorithm, Carry Save Adders, Tree Multipliers, Array Multipliers, etc., all of which are hardware intensive and degrade these constraints. Hence the need for a high speed multiplier arises. US 9063870 B1 discloses a large multiplier for a programmable logic device. A plurality of specialized processing blocks in a programmable logic device, including multipliers and circuitry for adding results of those multipliers, can be configured as a larger multiplier by adding to the specialized processing blocks selectable circuitry for shifting multiplier results before adding. In one embodiment, this allows all but the final addition to take place in specialized processing blocks, with the final addition occurring in programmable logic. In another embodiment, additional compression and adding circuitry allows even the final addition to occur in the specialized processing blocks. Circuitry that controls when an input is signed or unsigned facilitates complex arithmetic.
US 8271570 B2 discloses a unified integer/Galois field (2^m) multiplier architecture for elliptic-curve cryptography. A unified integer/Galois field 2^m multiplier performs multiply operations for public-key systems such as Rivest, Shamir, Adleman (RSA), Diffie-Hellman key exchange (DH) and Elliptic Curve Cryptosystem (ECC). The multiply operations may be performed on prime fields and different composite binary fields in independent multipliers in an interleaved fashion.
US 6760742 B1 describes a multi-dimensional Galois field multiplier. An implementation of a multi-dimensional Galois field multiplier and a method of Galois field multi-dimensional multiplication which are able to support many communication standards having various symbol sizes, different GFs, and different primitive polynomials, in a cost-efficient manner is disclosed. The key to allow a single implementation to perform for all different GF sizes is to align the input data such that the Galois field symbols of the operands are aligned to the left most significant bit (MSB) position of the input data field. Similarly, the primitive polynomial used to create a selected Galois field is aligned to the left MSB position. A polynomial multiply is performed. The product polynomial is then conditionally divided by the primitive polynomial starting with the most significant bit, the condition being if the left most bit of the product is a 1. In other words, if the product polynomial has an MSB of 1, then divide the product with the primitive polynomial. Perform this step until the MSB is 0. In addition, for fields smaller than a maximum size Galois field, the sequence of conditional divisions is further conditioned with a predetermined mask in dependence upon the size of the GF. The resultant product is aligned to the left MSB.
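The polynomial multiply followed by conditional division can be illustrated with a minimal Python sketch. It is a simplification and not the cited circuit: it omits the left-alignment of operands and the size-dependent mask, and the function name `gf_multiply`, its parameters, and the choice of GF(2^8) with polynomial 0x11B in the usage line are assumptions made for illustration.

```python
def gf_multiply(a: int, b: int, poly: int, m: int) -> int:
    """Multiply two elements of GF(2^m); hypothetical illustrative helper.

    `poly` is the field polynomial including its x^m term,
    e.g. 0x11B for x^8 + x^4 + x^3 + x + 1.
    """
    # Polynomial (carry-less) multiply: XOR the shifted copies of `a`.
    product = 0
    for i in range(m):
        if (b >> i) & 1:
            product ^= a << i

    # Conditional division: walk from the top bit down to bit m and
    # subtract (XOR) the aligned polynomial only when that bit is 1.
    for bit in range(2 * m - 2, m - 1, -1):
        if (product >> bit) & 1:
            product ^= poly << (bit - m)
    return product


print(hex(gf_multiply(0x57, 0x83, 0x11B, 8)))  # 0xc1
```

Because the polynomial is passed in with its degree, the same routine serves fields of different sizes, which is the property the cited document achieves in hardware through alignment and masking.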
In U.S. Pat. No. 7,024,444 B1, entitled "Split Multiplier Array and Method of Operation" and issued to Green, a multiplier circuit for use in a data processor is disclosed. The multiplier circuit comprises a partial products generating circuit that receives a multiplicand value and a multiplier value and generates a group of partial products. The multiplier circuit also comprises a split array for adding the partial products. A first summation array comprises a first group of adders that sum the even partial products to produce an even summation value. A second summation array comprises a second group of adders that sum the odd partial products to produce an odd summation value. The even and odd summation values are then summed to produce the output of the multiplier.
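The even/odd split of partial products can be sketched in a few lines of Python. This is only an arithmetic illustration of the idea, assuming unsigned operands; the name `split_array_multiply` and its parameters are hypothetical and do not come from the cited patent.

```python
def split_array_multiply(multiplicand: int, multiplier: int, n: int) -> int:
    """Sum even- and odd-indexed partial products separately, then combine.

    Hypothetical helper for illustration; `n` is the multiplier width in bits.
    """
    # One partial product per multiplier bit, already shifted into position.
    partials = [
        (multiplicand << i) if (multiplier >> i) & 1 else 0
        for i in range(n)
    ]
    even_sum = sum(partials[0::2])  # first summation array
    odd_sum = sum(partials[1::2])   # second summation array
    return even_sum + odd_sum       # final adder


print(split_array_multiply(13, 11, 4))
```

The two partial sums are independent of each other, so in hardware the two summation arrays can operate at the same time before the final addition.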
In U.S. Pat. No. 7,506,017 B1, entitled "Verifiable Multimode Multipliers" and issued to Dupenloup, a verifiable duplex multiplier circuit is disclosed. In one mode, the circuitry of the duplex multiplier functions as an N-bit × N-bit multiplier. In another mode, the circuitry of the duplex multiplier operates as dual (N/2)-bit × (N/2)-bit multipliers. Because the same circuitry can be used to serve as both an N*N multiplier and as dual N/2*N/2 multipliers, integrated circuit resources are conserved. The duplex multiplier circuitry uses an architecture that can be automatically synthesized using a logic synthesis tool. Verification operations can be performed using logic-equivalency error checking tools. Exhaustive verification is possible using this approach, even when relatively large duplex multipliers (e.g., duplex multipliers with N values of 16 or more) are used. Another document, US 8645448 B2, discloses a carry less multiplication unit. An apparatus having a carry less preformat unit, a Booth encoder, a compressor, a left shifter, and exclusive-OR logic is described. The carry less preformat unit receives a multiplier operand and partitions the multiplier operand into parts. The Booth encoder receives the parts and directs selection of first partial products of a multiplicand that do not reflect implicit carry operations. The compressor sums the first partial products via a configuration of carry save adders that generate sum bits and carry bits, where generation of the carry bits is disabled during execution of the carry less multiplication. The left shifter shifts bits of one or more outputs of the compressor. The exclusive-OR logic is coupled to the compressor and the left shifter, and is configured to execute an exclusive-OR function on the outputs to yield a carry less multiplication result.
A Galois field multiply accumulator is described in US 7003715 B1. An OC-192 front-end application-specific integrated circuit (ASIC) de-interleaves an OC-192 signal to create four OC-48 signals, and decodes error-correction codes embedded in each of the four OC-48 signals. The decoder generates a Bose-Chaudhuri-Hocquenghem (BCH) error polynomial in no more than 12 clock cycles. The decoder includes several Galois field multiply accumulators, and a state machine which controls the Galois field units. In the specific embodiment wherein the error-correction code is a BCH triple error-correcting code, four Galois field units are used to carry out only six equations to solve the error polynomial. The Galois field units are advantageously designed to complete a Galois field multiply/accumulate operation in a single clock cycle. The Galois field units may operate in multiply or addition pass-through modes.
Another document, US 6286024 B1, discloses a high-efficiency multiplier and multiplying method. Upon execution of four sets of m/2-bit × n/2-bit multiplication, four multiplicand selectors select m/2-bit multiplicands respectively and four multiplicator selectors select corresponding n/2-bit multiplicators respectively, then the selected m/2-bit multiplicands and n/2-bit multiplicators are input into four multipliers, and then four sets of m/2-bit × n/2-bit multiplication are executed in parallel. Upon execution of m-bit × n-bit multiplication, the four multiplicand selectors select upper or lower m/2-bit multiplicands respectively and the four multiplicator selectors select upper or lower n/2-bit multiplicators respectively, then the selected m/2-bit multiplicands and n/2-bit multiplicators are input into the four multipliers respectively, then the multiplication results of (lower m/2 bits of m bits) × (lower n/2 bits of n bits) and (upper m/2 bits of m bits) × (upper n/2 bits of n bits) out of the four multiplication results of the four multipliers are connected by a connector, and then the connected multiplication results and the other two multiplication results are added by an adder, each arranged at a predetermined bit location.
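The way four half-width products combine into one full-width product can be checked with a short Python sketch. It assumes unsigned operands and even widths m and n; the function name `split_multiply` and its variable names are hypothetical, and Python's built-in `*` stands in for each of the four hardware multipliers.

```python
def split_multiply(a: int, b: int, m: int, n: int) -> int:
    """Form an m-bit x n-bit product from four (m/2)-bit x (n/2)-bit products.

    Hypothetical helper for illustration; m and n are assumed to be even.
    """
    hm, hn = m // 2, n // 2
    a_lo, a_hi = a & ((1 << hm) - 1), a >> hm
    b_lo, b_hi = b & ((1 << hn) - 1), b >> hn

    # The four half-width multiplications (independent of each other).
    ll = a_lo * b_lo
    hh = a_hi * b_hi
    hl = a_hi * b_lo
    lh = a_lo * b_hi

    # ll fits in hm + hn bits, so hh can be joined above it without
    # overlap; this plays the role of the "connector".
    connected = (hh << (hm + hn)) | ll

    # The two cross products are added at their own bit positions.
    return connected + (hl << hm) + (lh << hn)


print(hex(split_multiply(0xAB, 0xCD, 8, 8)))
```

Since the low and high products never overlap, joining them costs no adder, and only the two cross products need a true addition.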
Split radix multiplication is mentioned in US 20050102344 A1. A first number is multiplied by a second number, by representing the first number as a first set of one or more W-bit wide numbers, and representing the second number as a second set of one or more W-bit wide numbers. Each of the W-bit wide numbers from the first set is paired with each of the W-bit wide numbers from the second set. For each pair of W-bit wide numbers, a set of sub-partial products is generated. Combinations of the sub-partial products are formed such that each combination is representable by a W-bit wide lower partial product and a carry out term that has fewer than W bits. The W-bit wide lower partial products and the carry out terms are combined to form the product of the first number and the second number. The carry out term is advantageously representable by (W/2)+1 bits.
A New VLSI Architecture of Parallel Multiplier-Accumulator Based on Radix-2 Modified Booth Algorithm is described. A hybrid architecture of multiplication, addition and CSA generates overheads when designed with Booth's algorithm. The proposed method generates several intermediate terms, which increases computational overheads. Multiplication is itself a cumbersome process, and with the proposed hybrid architecture it becomes tedious to implement in practice.
VLSI implementation of Vedic Mathematics and its applications in RSA cryptosystems is disclosed. Several cryptosystems are available, of which RSA is the first public key algorithm; with Vedic computations it becomes faster but is lacking at the security level. A Vedic-math-based cryptosystem can be compared with any other cryptosystem on several parameters. Matrix data or two-dimensional data can be processed directly by treating it as row-wise serial data. A Built In Self-Test block can be added as a separate module for testing the system during boot up.
Efficient FPGA Based Matrix Multiplication using Mux and Vedic Multiplier is disclosed. An image is always represented in matrix form, but a matrix can be treated as a collection of one-dimensional arrays. These arrays can later be processed with a Vedic multiplier. The alternative to this technique is the conventional row-and-column method, but multiplication with the Vedic multiplier gives a better result. FPGAs are designed for parallel computing, hence designing multiple fast carry adders is not a concern as long as the FPGA is equipped with them.
Thus there is a need to develop and design an efficient, high speed multiplier that avoids the limitations of the conventional systems and methods. Hence the present invention implements a concurrent architecture of Vedic Multiplier, an accelerator scheme for high speed computing. The proposed invention focuses on designing a high speed multiplier using the fundamentals of Vedic Mathematics.
Object:
1. Primary object of the present invention is to provide a system and methodology for concurrent architecture of Vedic Multiplier, an accelerator scheme for high speed computing.
2. Another object of the present invention is to provide a system which is implemented using high-performance Field Programmable Gate Arrays, known for implementing concurrent architecture.
3. Yet another object of the present invention is to design a high speed multiplier using fundamentals of the Vedic Mathematics.
4. Yet another object of the present invention is to use Vedic, a Cosmic scheme, for implementation of a multiplier which is genuine in the sense that it is suitable for different applications like Data Security, error correction and detection, DIP, DSP, WSN, etc. without using any supporting algorithms.
5. Yet another object of the present invention is to provide concurrent and generic architecture that overcomes the overheads generated due to multi core and configurable architecture.
6. Yet another object of the present invention is to provide flexible architecture so that data can be accepted in any format.
7. Yet another object of the present invention is to provide a system which diminishes the use of virtual memories caused by the generation of several intermediate terms.
8. Yet another object of the present invention is to optimize hardware generation through minimal generation of intermediate terms, which helps to optimize parameters like hardware resource utilization, power utilization, timing constraints, area utilization, memory element utilization, etc.
9. Yet another object of the present invention is to provide higher speed, lower cost and less VLSI Area.
10. Yet another object of the present invention is to convert all dedicated blocks into unique/ genuine Vedic Multiplier Block.
11. Yet another object of the present invention is to develop N-Pipes of N-bit which dramatically change the computing speed.
12. Yet another object of the present invention is to include a Built-In Self-Test (BIST) for self-testing at boot-up.
13. Yet another object of the present invention is to implement a system using Virtex/ Kintex family FPGAs, which are designed at lowest nm technology.
14. Yet another object of the present invention is to provide a novel inclusion in which it is possible to save the repeated execution.
15. Yet another object of the present invention is to provide Intelligent Prediction unit which bypasses the multiplication process, once executed for one data set.
16. Yet another object of the present invention is to provide a modification for use as an individual block in a PSoC.
Other objects, features and advantages will become apparent to those skilled in the art from the detailed description and appended claims.
STATEMENT:
Accordingly, the invention provides a concurrent architecture of Vedic Multiplier - an accelerator scheme for high speed computing. The system is implemented using FPGAs, which are well known for implementing concurrent architectures. The system is suitable for different applications such as data security, error correction and detection, DIP, DSP and WSN, without using any supporting algorithms. Applying Vedic fundamentals greatly optimizes constraints such as power, time, area and hardware resource utilization. Vedic mathematics removes intermediate steps and gives a direct output for complex procedures like multiplication. The user first selects the input data pattern. The user can input data in any format, such as binary, octal, decimal or hexadecimal, as a matrix in the case of an image processing or matrix multiplication unit, or as a data packet in the case of secured data communication. Upon selection of the input data pattern, the unit executes the first step, data conversion, which depends on the operation to be executed or the unit designed. Immediately afterwards, the input data pattern is compared with the data already stored in the internal memory unit. If a particular input data combination has already been executed, its output is driven to the output lines directly, without computing the multiplication again. If the input data sample is not available in the history unit, its Vedic multiplication is executed, and the input data sample and its multiplication result are stored in the history unit. With this novel inclusion it is possible to avoid repeated execution. The preferred embodiment mainly comprises an input data pattern selector, data conversion unit, internal history unit, programmable logic device, display unit, power supply unit, centralized clock unit, etc.
BRIEF DESCRIPTION OF DRAWING:
This invention is described by way of example with reference to the following drawing where,
Figure 1 of sheet 1 shows the block diagram of the architecture of the Vedic Multiplier.
Where,
1 denotes Input data selector unit
2 denotes Data conversion unit
3 denotes Vedic Multiplier
4 denotes Internal data storage for creating history of input and output combination
5 denotes Display device
6 denotes Power supply to all units
7 denotes Centralized clock to all synchronous devices.
In order that the manner in which the above-cited and other advantages and objects of the invention are obtained may be understood, a more particular description of the invention briefly described above is given with reference to the appended drawing. Understanding that this drawing depicts only a typical embodiment of the invention and is therefore not to be considered limiting of its scope, the invention will be described with additional specificity and detail through the use of the accompanying drawing.
Detailed description:
Vedic Mathematics is an ancient system of calculation. Vedic means derived from the Vedas. The basic Sutras and Upa-Sutras (methods) of Vedic mathematics help in solving almost all numeric computations easily and in less time. The sutra used in the proposed invention is Urdhva-Tiryagbhyam (Vertical and Crosswise). It is a general formula applicable to all cases of multiplication.
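As a worked illustration of the vertical and crosswise pattern, a two-digit decimal product can be written out as below. The operands 23 and 14 are chosen here purely for illustration and do not appear in the specification.

```latex
\begin{aligned}
(10a + b)(10c + d) &= 100\,ac + 10\,(ad + bc) + bd \\
23 \times 14 &= 100\,(2 \cdot 1) + 10\,(2 \cdot 4 + 3 \cdot 1) + (3 \cdot 4) \\
             &= 200 + 110 + 12 = 322
\end{aligned}
```

The two outer terms are the vertical products and the middle term is the crosswise sum; the three terms do not depend on one another, so they can be formed at the same time before the carries are resolved.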
In the proposed invention, a concurrent architecture of the Vedic multiplication block is suggested. The system is implemented using high-performance Field Programmable Gate Arrays (FPGAs), which are renowned for implementing concurrent architectures and are configured using a Hardware Description Language (HDL). The difference between an HDL and other languages is that an HDL is a concurrent language: all written statements and constructs are executed simultaneously, which is a requisite when describing hardware at the physical level.
Vedic Multiplier:-
Multiplication is a widely used arithmetic operation that figures prominently in cryptography, image processing, signal processing, WSN, cloud and EDC systems. The predominant approaches in the design of classic multiplication components are the shift-add technique, array multiplication, and Booth's and modified Booth's algorithms. All these techniques are hardware intensive. At the physical level, the main criteria of interest are higher speed, lower cost and less VLSI area.
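For reference, the classic shift-add technique mentioned above can be sketched in a few lines of Python. This is a software model for illustration only; the function name `shift_add_multiply` is an assumption of this sketch, and a hardware implementation would process one multiplier bit per clock cycle rather than in a loop.

```python
def shift_add_multiply(a: int, b: int) -> int:
    """Multiply two non-negative integers by the classic shift-and-add method."""
    if a < 0 or b < 0:
        raise ValueError("this sketch handles non-negative operands only")
    product = 0
    while b:
        if b & 1:
            # Current multiplier bit is 1: accumulate the shifted multiplicand.
            product += a
        a <<= 1  # shift the multiplicand left for the next bit position
        b >>= 1  # move on to the next multiplier bit
    return product


if __name__ == "__main__":
    print(shift_add_multiply(13, 11))
```

Each iteration depends on the accumulated result of the previous one, which is the sequential dependency that the concurrent approach aims to avoid.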
Based on these observations, the proposed work investigates the Vedic multiplier component. Vedic mathematics has the property that it is applicable to an N-bit word size directly. This feature makes it more suitable for designing a fast architecture because it diminishes the overheads caused by the generation of intermediate terms in classic multiplication blocks.
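The direct applicability to an N-digit word can be illustrated with a small Python model of the vertical and crosswise scheme. It is a sketch under stated assumptions, not the circuit of the invention: the names `urdhva_multiply`, `to_digits` and `from_digits` are hypothetical, digits are stored least significant first, and the column sums that hardware could form concurrently are computed here one after another.

```python
def to_digits(value, width, base=2):
    """Split a non-negative integer into `width` digits, least significant first."""
    return [(value // base**i) % base for i in range(width)]


def from_digits(digits, base=2):
    """Recombine least-significant-first digits into an integer."""
    return sum(d * base**i for i, d in enumerate(digits))


def urdhva_multiply(x_digits, y_digits, base=2):
    """Vertical-and-crosswise product of two equal-length digit lists."""
    n = len(x_digits)
    if len(y_digits) != n:
        raise ValueError("operands must have the same number of digits")
    # Column k collects every cross product x[i] * y[j] with i + j == k.
    # The columns are mutually independent.
    columns = [
        sum(x_digits[i] * y_digits[k - i]
            for i in range(max(0, k - n + 1), min(k, n - 1) + 1))
        for k in range(2 * n - 1)
    ]
    # Resolve carries from the least significant column upwards.
    result, carry = [], 0
    for column in columns:
        total = column + carry
        result.append(total % base)
        carry = total // base
    while carry:
        result.append(carry % base)
        carry //= base
    return result


if __name__ == "__main__":
    product = urdhva_multiply(to_digits(13, 4), to_digits(11, 4))
    print(from_digits(product))
```

The same function works for decimal digits by passing `base=10`, which mirrors the claim that the method is not tied to one word size or number format.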
The Vedic multiplier is implemented using high-performance field programmable gate arrays. FPGAs are well known for implementing parallel architectures. The FPGA is configured using a Hardware Description Language (HDL) platform.
Best method of performance of the invention:-
The present invention discloses the "Concurrent Architecture of Vedic Multiplier-An Accelerator Scheme for High Speed Computing". It includes modules such as: input data pattern selector, data conversion unit, internal history unit, programmable logic device, display unit, power supply unit and centralized clock unit.
In this invention, the user first selects the input data pattern. The user can input data in any format, such as binary, octal, decimal or hexadecimal, as a matrix in the case of an image processing or matrix multiplication unit, or as a data packet in the case of secured data communication.
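One way to picture how the numeric formats listed above could be brought to a common representation is the following Python sketch. The pattern names in `BASES` and the function `convert_input` are assumptions introduced for illustration; the specification does not define how the selector encodes its choices, and matrix and packet inputs are not covered here.

```python
# Hypothetical mapping from a selected input pattern to its number base.
BASES = {"binary": 2, "octal": 8, "decimal": 10, "hexadecimal": 16}


def convert_input(text: str, pattern: str) -> int:
    """Convert a digit string in the selected pattern to an integer operand."""
    try:
        base = BASES[pattern]
    except KeyError:
        raise ValueError(f"unsupported input pattern: {pattern!r}") from None
    # int() raises ValueError if the text contains digits invalid for the base.
    return int(text, base)


if __name__ == "__main__":
    for text, pattern in [("1101", "binary"), ("17", "octal"), ("ff", "hexadecimal")]:
        print(pattern, text, convert_input(text, pattern))
```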
Upon selection of the input data pattern, the unit executes the first step, data conversion, which depends on the operation to be executed or the unit designed. Immediately afterwards, the input data pattern is compared with the data already stored in the internal memory unit.
This is the intelligent step: if a particular input data combination has already been executed, its output is driven to the output lines directly, without computing the multiplication again. If the input data sample is not available in the history unit, its Vedic multiplication is executed, and the input data sample and its multiplication result are stored in the history unit. With this novel inclusion it is possible to avoid repeated execution.
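The behaviour of the history unit described above resembles a lookup table placed in front of the multiplier. A minimal Python sketch follows, assuming an unbounded dictionary as the internal memory and `operator.mul` as a stand-in for the Vedic multiplier; the class name `HistoryMultiplier` and its `hits` and `misses` counters are hypothetical and added only to make the bypass visible.

```python
import operator


class HistoryMultiplier:
    """Multiplier wrapper that bypasses computation for operand pairs seen before."""

    def __init__(self, multiply=operator.mul):
        self._multiply = multiply  # stand-in for the Vedic multiplier block
        self._history = {}         # (a, b) -> previously computed product
        self.hits = 0              # results served from the history unit
        self.misses = 0            # results that had to be computed

    def compute(self, a, b):
        key = (a, b)
        if key in self._history:
            # Operand pair already executed: return the stored output directly.
            self.hits += 1
            return self._history[key]
        # Not in history: multiply, then store operands and result for next time.
        self.misses += 1
        result = self._multiply(a, b)
        self._history[key] = result
        return result


if __name__ == "__main__":
    unit = HistoryMultiplier()
    print(unit.compute(12, 13), unit.compute(12, 13), unit.hits, unit.misses)
```

A real device would have a bounded memory, so some replacement policy would be needed; the specification does not describe one, and none is modelled here.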
The concurrent architecture of the Vedic multiplier is implemented using the vertical and horizontal multiplication algorithm and realized using programmable devices, which are well suited to realizing a concurrent architecture.
Finally, the computed data pattern is displayed on a display unit, which may be as simple as a 16x2 character LCD or a touch screen display. In addition, separate modules provide the power supply to all units, and a centralized clock distribution unit synchronizes the different modules.
Additional advantages and modifications will readily occur to those skilled in the art. Therefore, the invention in its broader aspects is not limited to the specific details and representative embodiments shown and described herein. Accordingly, various modifications may be made without departing from the spirit or scope of the general inventive concept as defined by the appended claims and their equivalents.

Claims

CLAIMS claim:-
1. A concurrent architecture of Vedic Multiplier- an accelerator scheme for high speed computing implemented using FPGAs and suitable to apply to different applications like Data Security, error correction and detection, DIP, DSP, WSN and etc. without using any supporting algorithms and greatly optimizes the constraints like Power, Time, Area and Hardware Resource Utilization wherein the system mainly comprises of input data pattern selector, data conversion unit, internal history unit, programmable logic device, display unit, power supply unit, centralized clock unit.
2. In the system as claimed in claim 1:
a) the user will have to first select the input data pattern where, user can input data in any format like binary, octal, decimal, hexa-decimal, or in matrix, in case of image processing or matrix multiplication unit, data packet in case of secured data communication and etc.
b) Upon selection of the input data pattern, the unit will execute the first step of data conversion, this data conversion depends on the operation to be executed or the unit designed;
c) Next immediate the input data pattern will be compared with the already stored data in the internal memory unit;
d) If one particular input data combination is already executed then its output will be drawn to the output lines directly, without computing the data samples again for multiplication;
e) If the input data sample is not available in the history unit, then its Vedic multiplication is executed and the input data sample and its multiplication result will be stored inside the history unit;
f) With such novel inclusion it is possible to save the repeated execution.
3. The concurrent architecture of the vedic multiplier as claimed in claim 1 is implemented using vertical and horizontal multiplication algorithm and realized using Programmable Devices and its feature is, it is well designed for realization concurrent architecture.
4. In the system as claimed in claim 2 finally computed data pattern is displayed on display unit which may be simple like 16x2 LCD character display or touch screen display or etc.
5. In the system as claimed in claim 2 and 4 separate modules will be working for providing the power supply to all units and centralized clock distribution unit for synchronizing the different modules.
PCT/IN2016/000117 2015-08-30 2016-05-04 Concurrent architecture of vedic multiplier-an accelerator scheme for high speed computing WO2017037729A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN3315/MUM/2015 2015-08-30
IN3315MU2015 IN2015MU03315A (en) 2015-08-30 2016-05-04

Publications (1)

Publication Number Publication Date
WO2017037729A1 true WO2017037729A1 (en) 2017-03-09

Family

ID=54397353

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2016/000117 WO2017037729A1 (en) 2015-08-30 2016-05-04 Concurrent architecture of vedic multiplier-an accelerator scheme for high speed computing

Country Status (2)

Country Link
IN (1) IN2015MU03315A (en)
WO (1) WO2017037729A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109683523A (en) * 2018-12-25 2019-04-26 中国人民解放军96630部队 Accelerator control method and system based on programmable gate array FPGA
WO2022225996A3 (en) * 2021-04-19 2022-12-29 Kevin May Applied usage count reader

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KUMAR, VINAY: "ANALYSIS, VERIFICATION AND FPGA IMPLEMENTATION OF VEDIC MULTIPLIER WITH BIST CAPABILITY", THESIS REPORT FOR DEPARTMENT OF ELECTRONICS AND COMMUNICATION ENGINEERING, June 2009 (2009-06-01), Patiala, India, XP055366370 *

Also Published As

Publication number Publication date
IN2015MU03315A (en) 2015-09-11

Similar Documents

Publication Publication Date Title
Kuang et al. Low-cost high-performance VLSI architecture for Montgomery modular multiplication
JP2722413B2 (en) Implementation method of modular multiplication by Montgomery method
US7206410B2 (en) Circuit for the inner or scalar product computation in Galois fields
US7046800B1 (en) Scalable methods and apparatus for Montgomery multiplication
JP4180024B2 (en) Multiplication remainder calculator and information processing apparatus
US6687725B1 (en) Arithmetic circuit for finite field GF (2m)
Niasar et al. Optimized architectures for elliptic curve cryptography over Curve448
WO2017037729A1 (en) Concurrent architecture of vedic multiplier-an accelerator scheme for high speed computing
Zeghid et al. Speed/area-efficient ECC processor implementation over GF (2 m) on FPGA via novel algorithm-architecture co-design
US8527570B1 (en) Low cost and high speed architecture of montgomery multiplier
JP2004258141A (en) Arithmetic unit for multiple length arithmetic of montgomery multiplication residues
Ozcan et al. A high performance full-word Barrett multiplier designed for FPGAs with DSP resources
US6662201B1 (en) Modular arithmetic apparatus and method having high-speed base conversion function
CN109284085B (en) High-speed modular multiplication and modular exponentiation operation method and device based on FPGA
Elango et al. Hardware implementation of residue multipliers based signed RNS processor for cryptosystems
WO2000038047A1 (en) Circuit and method of cryptographic multiplication
Saldamli et al. Spectral modular exponentiation
Wang et al. A novel fast modular multiplier architecture for 8,192-bit RSA cryposystem
Tawalbeh Radix-4 asic design of a scalable montgomery modular multiplier using encoding techniques
EP1504338A1 (en) "emod" a fast modulus calculation for computer systems
Miyamoto et al. Systematic design of high-radix Montgomery multipliers for RSA processors
Wang et al. TCPM: A reconfigurable and efficient Toom-Cook-based polynomial multiplier over rings using a novel compressed postprocessing algorithm
Nedjah et al. Four hardware implementations for the m-ary modular exponentiation
Antao et al. Compact and flexible microcoded elliptic curve processor for reconfigurable devices
Wang et al. High radix montgomery modular multiplier on modern fpga

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16840995

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 16840995

Country of ref document: EP

Kind code of ref document: A1