DE102022201693A1

DE102022201693A1 - EFFICIENT MONTGOMERY MULTIPLIER

Info

Publication number: DE102022201693A1
Application number: DE102022201693.9A
Authority: DE
Inventors: Adir Zevulun; Uria Basher; Nir Shmuel; Ben Witulski
Original assignee: Mellanox Technologies Ltd
Current assignee: Mellanox Technologies Ltd
Priority date: 2021-02-22
Filing date: 2022-02-18
Publication date: 2022-08-25
Also published as: DE102022201745A1

Abstract

An integrated Montgomery calculation engine (IMCE) for multiplying two multiplicands modulo a predefined number includes a carry-save adder (CSA) circuit and a control circuit. The CSA circuit has multiple inputs, and outputs that include a sum output and a carry output. The control circuit is coupled to the inputs and to the outputs of the CSA circuit and is configured to operate the CSA circuit in at least (i) a first setting that calculates a Montgomery precompute value and (ii) a second setting that calculates a Montgomery multiplication of the two multiplicands.

Description

CROSS REFERENCE TO RELATED APPLICATIONS

This application is related to U.S. Patent Application No. 17/180,999, entitled "Fast Precomputation for Montgomery Multiplier," filed February 22, 2021, the disclosure of which is incorporated herein by reference.

FIELD OF THE INVENTION

The present invention relates generally to Montgomery arithmetic, and particularly to the computation of Montgomery precompute values and to the implementation of Montgomery multipliers and associated circuitry.

BACKGROUND OF THE INVENTION

In cryptography, operations such as modular multiplication and exponentiation of large integers are widely used. Several methods for fast implementation of such multiplications and exponentiations have been proposed. One widely used method was proposed by Peter Lawrence Montgomery in 1985 and is described, for example, by Koc et al. in "Analyzing and Comparing Montgomery Multiplication Algorithms," IEEE Micro 16(3), June 1996, pages 26-33, in which the authors discuss several Montgomery multiplication algorithms and analyze in detail the space and time requirements of the described methods.

In "Modified Montgomery modular multiplication and RSA exponentiation techniques," IEE Proceedings on Computation Digital Techniques, Vol. 151, No. 6, November 2004, McIvor et al. present a modified Montgomery multiplication and associated Rivest-Shamir-Adleman (RSA) modular exponentiation algorithms and circuit architectures that use carry-save adders (CSAs) to perform large word-length additions. The presented approach is based on a reformulation of the solution to modular multiplication in the context of RSA exponentiation, and presents two algorithmic variants, one based on a five-to-two CSA and the other on a four-to-two CSA plus multiplexer.

SUMMARY OF THE INVENTION

The invention is defined by the claims. In order to illustrate the invention, aspects and embodiments which may or may not fall within the scope of the claims are described herein.

An embodiment of the present invention that is described herein provides a Montgomery multiplication apparatus (MMA) for multiplying two multiplicands modulo a predefined number. The MMA includes a precomputation circuit and a Montgomery multiplication circuit. The precomputation circuit is configured to compute a Montgomery precompute value by performing a series of iterations. In a given iteration, the precomputation circuit is configured to modify one or more intermediate values by performing bitwise operations on the intermediate values calculated in a preceding iteration. The Montgomery multiplication circuit is configured to multiply the two multiplicands, modulo the predefined number, by performing a plurality of Montgomery reduction operations using the Montgomery precompute value computed by the precomputation circuit.

In some embodiments, the Montgomery precompute value is at least two to the power of twice the number of bits of the Montgomery multiplicands.

In some embodiments, the precomputation circuit is configured to modify, in the given iteration, a bitwise sum and a bitwise carry by performing bitwise-sum and bitwise-carry operations on (i) the bitwise sum calculated in the preceding iteration, (ii) twice the bitwise carry calculated in the preceding iteration, and (iii) a modulo-correction number. In an example embodiment, the precomputation circuit is configured to compute the Montgomery precompute value based on the sum of the bitwise sum and twice the bitwise carry after a last iteration of the series of iterations. In another embodiment, the precomputation circuit is configured to calculate the modulo-correction number based on the sum of the bitwise sum and twice the bitwise carry calculated in a last iteration.
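The bitwise-sum and bitwise-carry operations mentioned above can be illustrated with a generic three-input carry-save step. The following is a minimal Python sketch on plain integers, not the circuit of the embodiments; the function name `carry_save_add` is hypothetical. It shows why the text refers to "the bitwise sum and twice the bitwise carry": the three inputs always add up to `sum + 2 * carry`, with no carry propagation between bit positions.

```python
def carry_save_add(a: int, b: int, c: int) -> tuple[int, int]:
    """Generic 3:2 carry-save step on non-negative integers (illustrative only).

    Each bit position is handled independently:
    - the bitwise sum is the XOR of the three input bits,
    - the bitwise carry is their majority, and has weight 2 in the next position.
    """
    bitwise_sum = a ^ b ^ c
    bitwise_carry = (a & b) | (a & c) | (b & c)
    return bitwise_sum, bitwise_carry


# Example: 13 + 7 + 9 = 29 is represented as sum = 3 and carry = 13.
s, c = carry_save_add(13, 7, 9)
assert (s, c) == (3, 13)
assert s + 2 * c == 13 + 7 + 9
```

A full (carry-propagating) addition is needed only once, when the redundant pair is finally collapsed into a single number.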

In yet another embodiment, the precomputation circuit is configured to calculate the modulo-correction number in the given iteration based on a difference between (a) the sum of the bitwise sum and the bitwise carry calculated in the preceding iteration and (b) the predefined number. In still another embodiment, the precomputation circuit is configured to calculate the modulo-correction number in the given iteration based on a subset of the most significant bits of the sum of the bitwise carry and the bitwise sum calculated in the preceding iteration, and on a subset of the most significant bits of the predefined number.

In a disclosed embodiment, the precomputation circuit is configured to calculate the modulo-correction number in the given iteration based on a subset of the bits of the sum of the bitwise carry and the bitwise sum calculated in the preceding iteration, and on a subset of the bits of the predefined number. In an embodiment, the precomputation circuit is configured to set the modulo-correction number to the predefined number multiplied by -1, -2 or 0. In an embodiment, the precomputation circuit includes a carry-save adder (CSA) configured to calculate, in the given iteration, a bitwise sum and a bitwise carry of (i) twice the bitwise sum calculated in the preceding iteration, (ii) twice the bitwise carry calculated in the preceding iteration, and (iii) a modulo-correction number set to the predefined number multiplied by -1, -2 or 0.

In some embodiments, the precomputation circuit includes a three-input carry-save adder (CSA) configured to calculate, in the given iteration, a bitwise sum and a bitwise carry of (i) twice the bitwise sum calculated in the preceding iteration, (ii) twice the bitwise carry calculated in the preceding iteration, and (iii) a modulo-correction number set to the predefined number multiplied by -1, -2 or 0. In other embodiments, the precomputation circuit includes a four-input carry-save adder (CSA) configured to calculate, in the given iteration, a bitwise sum and a bitwise carry of (i) twice the bitwise sum calculated in the preceding iteration, (ii) twice the bitwise carry calculated in the preceding iteration, (iii) a first modulo-correction number set to the predefined number multiplied by -1 or 0, and (iv) a second modulo-correction number set to the predefined number multiplied by -2 or 0.

In some embodiments, the precomputation circuit and the Montgomery multiplication circuit are included in a network device and are configured to perform a cryptographic operation of the network device.

There is additionally provided, in accordance with an embodiment of the present invention, a method for multiplying two multiplicands modulo a predefined number. The method includes, using a precomputation circuit, computing a Montgomery precompute value by performing a series of iterations, wherein in a given iteration one or more intermediate values are modified by performing bitwise operations on the intermediate values calculated in a preceding iteration. Using a Montgomery multiplication circuit, the two multiplicands are multiplied, modulo the predefined number, by performing a plurality of Montgomery reduction operations using the Montgomery precompute value computed by the precomputation circuit.

There is further provided, in accordance with an embodiment of the present invention, an integrated Montgomery calculation engine (IMCE) for multiplying two multiplicands modulo a predefined number. The IMCE includes a carry-save adder (CSA) circuit and a control circuit. The CSA circuit has multiple inputs, and outputs that include a sum output and a carry output. The control circuit is coupled to the inputs and to the outputs of the CSA circuit and is configured to operate the CSA circuit in at least (i) a first setting that calculates a Montgomery precompute value and (ii) a second setting that calculates a Montgomery multiplication of the two multiplicands.

In some embodiments, the control circuit is configured to logically shift the sum output and the carry output of the CSA circuit, and to couple the shifted sum output and the shifted carry output to respective inputs of the CSA circuit. In an example embodiment, the control circuit is configured to logically shift the sum output and the carry output of the CSA circuit to the left in the first setting, and to logically shift the sum output and the carry output of the CSA circuit to the right in the second setting.

In an embodiment, in the first setting the control circuit is configured to set two of the inputs of the CSA circuit to a constant value that depends on the predefined number. In another embodiment, in the first setting the control circuit is configured to set an input of the CSA circuit to the predefined number or to zero, depending on the most significant bits of the sum output and the carry output of the CSA circuit and on the two multiplicands. In yet another embodiment, in the second setting the control circuit is configured to set an input of the CSA circuit to zero or to one of the multiplicands, depending on the other of the multiplicands. In a disclosed embodiment, in the second setting the control circuit is configured to set an input of the CSA circuit to zero or to the predefined number, depending on the least significant bits of the sum output, the carry output and the two multiplicands.
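The input selections described for the second setting resemble the textbook bit-serial (radix-2) Montgomery multiplication, sketched below in Python under several assumptions: the accumulator is an ordinary integer rather than a sum/carry pair, the modulus is odd, and the name `mont_mul_bitserial` is hypothetical. The sketch also decides whether to add the modulus after adding the multiplicand, whereas the text describes a decision derived from the least significant bits of the sum output, the carry output and the multiplicands; the two are arithmetically equivalent in this simplified model.

```python
def mont_mul_bitserial(a: int, b: int, modulus: int, n: int) -> int:
    """Textbook radix-2 Montgomery product a * b * 2**(-n) mod modulus.

    Assumes an odd modulus < 2**n and 0 <= a, b < modulus.
    Illustrative model only; a hardware engine would keep `acc` in carry-save form.
    """
    acc = 0
    for i in range(n):
        # Add b or zero, depending on the current bit of the other multiplicand.
        if (a >> i) & 1:
            acc += b
        # Add the modulus or zero, depending on the least significant bit,
        # so that the following right shift is an exact division by two.
        if acc & 1:
            acc += modulus
        acc >>= 1  # logical right shift
    # acc stays below 2 * modulus, so one conditional subtraction suffices.
    return acc - modulus if acc >= modulus else acc


# Usage: compare against a direct computation with Python's built-in pow().
modulus, n = 239, 8
inverse_of_2n = pow(pow(2, n, modulus), -1, modulus)
assert mont_mul_bitserial(100, 200, modulus, n) == (100 * 200 * inverse_of_2n) % modulus
```

Only additions and one-bit right shifts appear in the loop, which is what makes the operation a natural fit for an adder with a shifted feedback path.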

In some embodiments, the control circuit is configured to operate the CSA circuit in a third setting that calculates an exponentiation of a predefined base by a predefined exponent, modulo the predefined number. In an embodiment, the control circuit is configured to operate the CSA circuit in the third setting by applying the first setting and the second setting in a sequence that is defined according to the exponent.
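As a rough illustration of how an exponent can dictate a sequence of Montgomery operations, the Python sketch below uses the classic left-to-right square-and-multiply method. This is an assumption made for illustration, not necessarily the sequence used by the embodiments. The names `mont_mul` and `mont_pow` are hypothetical, the modulus is assumed odd, and Python's built-in `pow` stands in for the precomputation step.

```python
def mont_mul(a: int, b: int, modulus: int, n: int) -> int:
    """Textbook radix-2 Montgomery product a * b * 2**(-n) mod modulus (odd modulus)."""
    acc = 0
    for i in range(n):
        if (a >> i) & 1:
            acc += b
        if acc & 1:
            acc += modulus
        acc >>= 1
    return acc - modulus if acc >= modulus else acc


def mont_pow(base: int, exponent: int, modulus: int, n: int) -> int:
    """Compute base**exponent mod modulus using only Montgomery products."""
    # Stand-in for the precomputation: 2**(2n) mod modulus.
    precompute = pow(2, 2 * n, modulus)
    # Convert the base and the constant 1 to Montgomery form (value * 2**n mod modulus).
    base_m = mont_mul(base % modulus, precompute, modulus, n)
    acc_m = mont_mul(1, precompute, modulus, n)
    # Scan the exponent bits from most to least significant.
    for bit in bin(exponent)[2:]:
        acc_m = mont_mul(acc_m, acc_m, modulus, n)       # square on every bit
        if bit == "1":
            acc_m = mont_mul(acc_m, base_m, modulus, n)  # multiply when the bit is set
    # Multiplying by 1 converts the result back from Montgomery form.
    return mont_mul(acc_m, 1, modulus, n)


assert mont_pow(7, 65537, 239, 8) == pow(7, 65537, 239)
```

The precompute value is used once per exponentiation to enter Montgomery form; every subsequent step is a Montgomery multiplication.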

In some embodiments, the CSA circuit and the control circuit are included in a network device and are configured to perform a cryptographic operation of the network device.

There is also provided, in accordance with an embodiment of the present invention, a method for multiplying two multiplicands modulo a predefined number. The method includes operating a carry-save adder (CSA) circuit having multiple inputs, and having outputs that include a sum output and a carry output. Using a control circuit that is coupled to the inputs and to the outputs of the CSA circuit, the CSA circuit is controlled to operate in at least (i) a first setting that calculates a Montgomery precompute value and (ii) a second setting that calculates a Montgomery multiplication of the two multiplicands.

In some embodiments, controlling the CSA circuit includes logically shifting the sum output and the carry output of the CSA circuit, and coupling the shifted sum output and the shifted carry output to respective inputs of the CSA circuit. In an example embodiment, controlling the CSA circuit includes logically shifting the sum output and the carry output of the CSA circuit to the left in the first setting, and logically shifting the sum output and the carry output of the CSA circuit to the right in the second setting.

In an embodiment, controlling the CSA circuit in the first setting includes setting two of the inputs of the CSA circuit to a constant value that depends on the predefined number. In another embodiment, controlling the CSA circuit in the first setting includes setting an input of the CSA circuit to the predefined number or to zero, depending on the most significant bits of the sum output and the carry output of the CSA circuit and on the two multiplicands. In yet another embodiment, controlling the CSA circuit in the second setting includes setting an input of the CSA circuit to zero or to one of the multiplicands, depending on the other of the multiplicands. In a disclosed embodiment, controlling the CSA circuit in the second setting includes setting an input of the CSA circuit to zero or to the predefined number, depending on the least significant bits of the sum output, the carry output and the two multiplicands.

In some embodiments, controlling the CSA circuit further includes operating the CSA circuit in a third setting that calculates an exponentiation of a predefined base by a predefined exponent, modulo the predefined number. In an embodiment, operating the CSA circuit in the third setting includes applying the first setting and the second setting in a sequence that is defined according to the exponent.

In some embodiments, operating and controlling the CSA circuit are performed in a network device, for performing a cryptographic operation of the network device.

Any feature of one aspect or embodiment may be applied to other aspects or embodiments, in any appropriate combination. In particular, any feature of a method aspect or embodiment may be applied to an apparatus aspect or embodiment, and vice versa.

The present invention will be more fully understood from the following detailed description of the embodiments thereof, taken together with the drawings in which:

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram that schematically illustrates a Montgomery multiplication apparatus (MMA), according to an embodiment of the present invention;
FIG. 2 is a block diagram that schematically illustrates a Montgomery Precomputation Circuit (MPC) in the MMA of FIG. 1, according to an embodiment of the present invention;
FIG. 3 is a flow chart that schematically illustrates a method for Montgomery precomputation, according to an embodiment of the present invention;
FIG. 4 is a block diagram that schematically illustrates an MMA with a precomputation circuit integrated into the Montgomery calculation engine, according to an embodiment of the present invention;
FIG. 5 is a block diagram that schematically illustrates an integrated Montgomery calculation engine (IMCE), according to an embodiment of the present invention;
FIG. 6 is a flow chart that schematically illustrates a method for Montgomery 4096-bit by 4096-bit multiplication, according to an embodiment of the present invention; and
FIG. 7 is a flow chart that schematically illustrates a method for modulo exponentiation, according to an embodiment of the present invention.

DETAILED DESCRIPTION OF EMBODIMENTS

OVERVIEW

Public-key cryptosystems may be used to provide data confidentiality, author authentication and data integrity. Some public-key cryptosystems (e.g., Rivest-Shamir-Adleman (RSA)) rely on modular exponentiation of large numbers, which requires repeated modular multiplications. To increase security, the operands are typically well over 1000 bits long, which increases the computational load of the exponentiation operation.

A typical algorithm used to reduce the computational cost of modular multiplications is the Montgomery algorithm (described, for example, in the article by Koc et al. cited above). The Montgomery multiplication algorithm replaces the trial division by the modulus with a series of additions and divisions by a power of two, and is today the algorithm most commonly used in RSA cryptosystems.
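The replacement of trial division by additions and a division by a power of two can be seen in the standard word-level Montgomery reduction, sketched below in Python. This is the textbook formulation, not the circuit of the embodiments; the function name `montgomery_reduce` is hypothetical, the modulus is assumed odd, and the modular inverse is obtained with Python's built-in `pow` (Python 3.8 or later).

```python
def montgomery_reduce(t: int, modulus: int, n: int) -> int:
    """Return t * 2**(-n) mod modulus without dividing by the modulus.

    Assumes an odd modulus < 2**n and 0 <= t < modulus * 2**n.
    """
    mask = (1 << n) - 1
    # -modulus**(-1) mod 2**n; depends only on the modulus, so it can be cached.
    neg_inverse = (-pow(modulus, -1, 1 << n)) & mask
    # Choose a multiple of the modulus that clears the low n bits of t.
    m = ((t & mask) * neg_inverse) & mask
    # The low n bits of (t + m * modulus) are zero, so the shift is exact.
    u = (t + m * modulus) >> n
    # u is below 2 * modulus, so one conditional subtraction completes the reduction.
    return u - modulus if u >= modulus else u


# Usage: reduce the product 100 * 200 with modulus 239 and n = 8.
modulus, n = 239, 8
t = 100 * 200
assert montgomery_reduce(t, modulus, n) == (t * pow(2, -n, modulus)) % modulus
```

The only operations involving the modulus are a multiplication, an addition and a comparison; the division is a right shift by n bits.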

The Montgomery algorithm can be implemented in hardware or software. Typically, hardware implementations are based on repetitive operations that are preceded by a precalculation of one or more values and may be followed by a carry-propagate operation and a final modulo correction. For example, the precalculated value may be (2²ⁿ)%R, where n is the number of bits of the Montgomery operands, "%" denotes a modulo operation, and R, the divisor, is a preselected number (R < 2ⁿ).
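As a hedged illustration of how such a precalculated value is typically used, the following Python sketch implements a textbook Montgomery reduction and uses (2²ⁿ) % modulus to bring an operand into Montgomery form. It is not taken from this disclosure: the function names `montgomery_reduce`, `mont_mul`, and `modmul` are hypothetical, the odd divisor (called R in the paragraph above) is named `modulus`, and the built-in `pow` (Python 3.8 or later) stands in for the hardware precalculation.

```python
def montgomery_reduce(t, modulus, n, neg_inv):
    """Return t * 2^(-n) mod modulus, assuming 0 <= t < modulus * 2^n."""
    mask = (1 << n) - 1
    m = ((t & mask) * neg_inv) & mask      # m = t * (-modulus^-1) mod 2^n
    u = (t + m * modulus) >> n             # exact division by the power of two
    return u - modulus if u >= modulus else u   # final modulo correction

def mont_mul(a, b, modulus, n, neg_inv):
    """Montgomery product a * b * 2^(-n) mod modulus."""
    return montgomery_reduce(a * b, modulus, n, neg_inv)

def modmul(a, b, modulus, n):
    """Plain (a * b) % modulus, computed with two Montgomery products."""
    neg_inv = (-pow(modulus, -1, 1 << n)) % (1 << n)   # requires odd modulus
    pre = pow(2, 2 * n, modulus)                  # precalculated (2^(2n)) % modulus
    a_mont = mont_mul(a, pre, modulus, n, neg_inv)    # a * 2^n mod modulus
    return mont_mul(a_mont, b, modulus, n, neg_inv)   # a * b mod modulus
```

For instance, `modmul(5, 7, 13, 4)` evaluates to 9, which equals 35 % 13. Note that only additions, multiplications, masks, and shifts are needed once the two constants are available.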

The embodiments of the present invention described herein provide efficient methods and apparatus for computing Montgomery precalculation values. In some disclosed embodiments, a Montgomery multiplication apparatus (MMA) is configured to multiply two multiplicands modulo a predefined number. In some embodiments, the MMA includes a precalculation circuit and a Montgomery multiplication circuit. The precalculation circuit is configured to calculate a Montgomery precalculation value by performing a series of iterations. In a given iteration, the precalculation circuit modifies one or more intermediate values by performing bitwise operations on the intermediate values calculated in a previous iteration. In one embodiment, in a given iteration, the precalculation circuit modifies a bitwise sum and a bitwise carry by performing bitwise sum and bitwise carry operations on (i) the bitwise sum calculated in the previous iteration, (ii) double the bitwise carry calculated in the previous iteration, and (iii) a modulo correction number. The Montgomery multiplication circuit is configured to multiply the two multiplicands modulo the divisor by performing a plurality of Montgomery reduction operations using the Montgomery precalculation value calculated by the precalculation circuit.
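The bitwise sum and bitwise carry operations mentioned above rest on the standard carry-save identity, restated here as a brief sketch. The symbols S_i, C_i (sum and carry after iteration i) and M_i (the modulo correction number applied in iteration i) are names assumed for illustration and do not appear in the original text.

```latex
% Carry-save identity for any three bit vectors a, b, c:
\[
a + b + c \;=\; (a \oplus b \oplus c) \;+\; 2\,\bigl((a \wedge b) \vee (a \wedge c) \vee (b \wedge c)\bigr)
\]
% One iteration applied to the previous sum, the doubled previous carry,
% and the modulo correction number:
\[
S_{i+1} = S_i \oplus 2C_i \oplus M_i, \qquad
C_{i+1} = (S_i \wedge 2C_i) \vee (S_i \wedge M_i) \vee (2C_i \wedge M_i)
\]
% so that the represented value is preserved without propagating carries:
\[
S_{i+1} + 2C_{i+1} \;=\; S_i + 2C_i + M_i
\]
```

The carry term has weight two, which is why the carry of one iteration enters the next iteration doubled; no carry ripples across bit positions within an iteration.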

In some embodiments, two more bits are added to the operands of the precalculation and/or the Montgomery multiplication to avoid a final modulo correction step; thus, for 4096-bit arithmetic, 4098-bit operands are used. Adding the two bits also prevents intermediate values from overflowing.

Other embodiments according to the present invention presented herein provide an integrated Montgomery calculation engine (IMCE) in which the precalculation circuit is embedded in the Montgomery multiplication circuit; in one embodiment, the same bitwise-sum and bitwise-carry circuits are used in both the precalculation and the Montgomery multiplication.

In some embodiments, the IMCE includes a carry-save adder (CSA) and a control circuit. The control circuit is configured to control operation of the CSA in a plurality of settings: in a first setting, the control circuit controls the CSA to perform a Montgomery precalculation; in a second setting, the control circuit controls the CSA to perform a Montgomery multiplication; and in a third setting, the control circuit controls the CSA to calculate a modulo exponentiation using a sequence of Montgomery multiplications. In embodiments, the control circuit includes a first circuit configured to control loopback inputs of the CSA and a second circuit that can configure the CSA (via the first circuit) to calculate a modulo exponentiation.
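To make the third setting more concrete, the sketch below shows one common way to build a modulo exponentiation from a sequence of Montgomery multiplications, namely left-to-right square-and-multiply. This is an assumption for illustration only: the text above does not state which exponentiation schedule the control circuit uses, and the names `mont_mul` and `mont_pow` are hypothetical. The built-in `pow` (Python 3.8 or later) again stands in for the precalculation.

```python
def mont_mul(a, b, N, n, neg_inv):
    """Montgomery product a * b * 2^(-n) mod N (N odd, N < 2^n)."""
    mask = (1 << n) - 1
    t = a * b
    m = ((t & mask) * neg_inv) & mask
    u = (t + m * N) >> n
    return u - N if u >= N else u

def mont_pow(base, exponent, N, n):
    """base ** exponent mod N using only Montgomery multiplications."""
    neg_inv = (-pow(N, -1, 1 << n)) & ((1 << n) - 1)
    pre = pow(2, 2 * n, N)                       # precalculation value
    x = mont_mul(base % N, pre, N, n, neg_inv)   # base in Montgomery form
    acc = mont_mul(1, pre, N, n, neg_inv)        # 1 in Montgomery form
    for bit in bin(exponent)[2:]:                # most significant bit first
        acc = mont_mul(acc, acc, N, n, neg_inv)  # square
        if bit == "1":
            acc = mont_mul(acc, x, N, n, neg_inv)  # multiply
    return mont_mul(acc, 1, N, n, neg_inv)       # convert out of Montgomery form
```

Every step inside the loop is the same Montgomery multiplication, which is consistent with reusing a single multiplier datapath for the whole exponentiation.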

In the examples described below, the number of bits of the Montgomery operands is 4098; however, the technique described is not limited to 4098 bits, and in alternative embodiments any other suitable number of bits may be used.

The disclosed MMAs and IMCEs can be embedded in a variety of host systems and used in a variety of use cases. In general, any system that uses Montgomery multiplication can benefit from the techniques described here. Host systems include, for example, various network devices such as network adapters (e.g., Ethernet Network Interface Controllers (NICs), Infiniband Host Channel Adapters (HCAs), Data Processing Units (DPUs) or "Smart-NICs", and network-enabled Graphics Processing Units (GPUs)), network switches and routers, and accelerators.

In an exemplary use case, a disclosed MMA and/or IMCE is embedded in a network device and used in a secure boot process of the network device, e.g., to authenticate signatures. In another use case, a disclosed MMA and/or IMCE is embedded in a network adapter and used to accelerate cryptographic operations such as public-key operations.

DESCRIPTION OF THE SYSTEM

FIG. 1 is a block diagram that schematically shows a Montgomery multiplication apparatus (MMA) 100 according to an embodiment of the present invention. MMA 100 calculates the product of pairs of numbers modulo a large prime number N and includes a Montgomery Calculation Engine (MCE) 102, a Montgomery Precompute Unit (MPC) 104, and a processor 106. The MCE 102 is also referred to herein as a Montgomery multiplication circuit. Depending on the applicable host system and use case, processor 106 may comprise, or be embedded in, for example, a CPU, a GPU, a system-on-chip (SoC), a controller, a digital signal processor (DSP), or any other suitable type of processor.

MCE 102 is configured to receive the multiplication arguments A, B and the divisor N from processor 106 and a precalculation value 2^(2R)%N from MPC 104, and to output the product (A*B)%N to processor 106. MCE 102 may be a processor running a suitable software program, or a hardware Montgomery multiplier (see, e.g., "Montgomery Multiplier for Faster Cryptosystems", by Thampi and Jose, Procedia Technology 25 (2016), pages 392-398). In some embodiments, MCE 102 includes additional circuitry that computes exponents based on Montgomery multiplication (see, e.g., the McIvor et al. article cited above).

The MPC 104 is configured to receive N and -N from processor 106. N and -N are typically represented in n+2 bits, where n is the number of bits used in the Montgomery multiplication (-N can be represented in two's complement: -N = ~N + 1, i.e., the bitwise inverse of N plus 1).
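The two's-complement representation of -N in a fixed number of bits can be illustrated with a few lines of Python. This is a minimal sketch; the function name `neg_twos_complement` and the 8-bit example values are assumptions chosen only for illustration.

```python
def neg_twos_complement(N, width):
    """Return the width-bit two's-complement encoding of -N, i.e. (~N + 1)."""
    mask = (1 << width) - 1
    return ((~N) + 1) & mask

n = 8                       # illustrative operand width (not 4096)
N = 0xC5                    # illustrative odd divisor
neg_N = neg_twos_complement(N, n + 2)
# Adding N and its encoding of -N wraps around to zero in n+2 bits.
wraps_to_zero = (N + neg_N) % (1 << (n + 2)) == 0
```

Because the encoding is modular, adding `neg_N` in an (n+2)-bit adder has the same effect as subtracting N.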

MPC 104 then calculates the precalculation value (2²ⁿ)%N and sends the result to MCE 102. In one embodiment, the MPC includes a carry-save adder (CSA) with three or four inputs and completes the calculation in a number of cycles that is close to n, the number of bits.

Processor 106 is configured to send operands (multiplicands) to MCE 102 and MPC 104 and to receive the multiplication result from MCE 102. In some embodiments, processor 106 may not be required, e.g., when MPC 104 includes a processor.

The configuration of MMA 100 is an example configuration presented for conceptual clarity only. Other suitable configurations may be used in alternative embodiments of the present invention. For example, in some embodiments, a single MPC is configured to precalculate values for a plurality of MCEs. In another example, MPC 104 is configured to compute -N by two's-complementing N; in that case, processor 106 does not send -N to MPC 104.

In some embodiments, processor 106 and/or MPC 104 comprises a general-purpose processor that is programmed in software to perform the functions described herein. The software may be downloaded to the processor in electronic form, e.g., over a network or from a host, or it may, alternatively or additionally, be provided and/or stored on non-transitory tangible media, such as magnetic, optical, or electronic memory.

FIG. 2 is a block diagram that schematically shows the Montgomery precalculation circuit (MPC) 104 according to an embodiment of the present invention. The MPC includes a four-input carry-save adder (CSA) 200 configured to sum four inputs (designated in[0] through in[3]). The value of -N (N is the modulo divisor) is input to the MPC (e.g., from processor 106, FIG. 1) and applied to an R_0 input of an AND gate 202 and an R_1 input of an AND gate 204. AND gates 202 and 204 are configured to pass the -N input to inputs in[0] and in[1] of the CSA, respectively, when enabled, and to pass a value of 0 otherwise. (The enable inputs of AND gates 202 and 204 are designated en_0 and en_1, respectively.)

Note that when both en_0 and en_1 are off (e.g., at logic 0), CSA 200 receives a combined value of 0 at inputs in[0] and in[1]; when exactly one of en_0, en_1 is on, the CSA receives a combined value of -N; and when both en_0 and en_1 are on, the CSA receives a combined value of -2N.

Two registers, an R_C register 206 and an R_S register 208, are configured to store the carry output and the sum output of CSA 200, respectively. The data stored in R_C 206 may be fed back to the in[3] input of CSA 200 via a shifter 210, while the data stored in R_S 208 may be fed back to the in[2] input via a shifter 212. Shifters 210 and 212 are configured to multiply by two by shifting the data one position to the left (the rightmost output bit is set to logic 0).

MPC 104 further includes a controller 214 configured to control inputs en_0 and en_1 of AND gates 202 and 204. As will be described below (with reference to FIG. 3), in embodiments only some of the most significant bits (e.g., the five most significant bits) of N and R_SC are input to controller 214.

In one embodiment, the precalculation process performed by MPC 104 includes a carry-save phase, in which CSA 200 generates a sum-and-carry representation of the precalculation value, and a carry-propagate phase, in which the sum and the carry (stored in R_S 208 and R_C 206, respectively) are added to produce the precalculation value P = 2²ⁿ%N. According to the example embodiment illustrated in FIG. 2, MPC 104 includes a full adder 216 configured to add the values stored in R_S 208 and R_C 206 to generate the precalculation value P. In one embodiment, full adder 216 comprises 64 bits and can perform a 4096-bit addition in 64 cycles (as described below, two more bits may be needed in the CSA, so that full adder 216 takes 65 cycles to perform the 4098-bit addition).

In summary, MPC 104 computes P = 2²ⁿ%N in an iterative carry-save phase followed by an iterative carry-propagate phase. In the carry-save phase, a four-input CSA iteratively computes P by carry-save addition of a value of 0, -N, or -2N and the left-shifted carry and sum results of the previous iteration. In the carry-propagate phase, a full adder iteratively adds the carry and the sum produced by the carry-save phase to generate P.
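The two phases summarized above can be modelled in software. The following Python sketch is a simplified behavioural model under several assumptions that are not part of the description: the four-input CSA is modelled as two cascaded three-input carry-save stages, the choice among 0, -N, and -2N is made with an exact comparison on the full value rather than on a few most significant bits, N is odd with its top bit set (2ⁿ⁻¹ < N < 2ⁿ), and the carry-propagate phase is collapsed into one addition. The names `csa3` and `precompute_model` are hypothetical.

```python
def csa3(a, b, c, mask):
    """3-input carry-save stage: returns (sum, carry) with sum+carry == a+b+c mod 2^w."""
    s = a ^ b ^ c
    carry = (((a & b) | (a & c) | (b & c)) << 1) & mask
    return s, carry

def precompute_model(N, n, iterations):
    """Return 2^(n + iterations) % N, keeping the running value as a (sum, carry) pair."""
    w = n + 3                          # working width, wide enough for 4N
    mask = (1 << w) - 1
    neg_N = (-N) & mask                # two's-complement encoding of -N
    r_s, r_c = 1 << n, 0               # start from 2^n
    for _ in range(iterations):
        doubled = 2 * ((r_s + r_c) & mask)          # exact value (model-only shortcut)
        k = 2 if doubled >= 2 * N else (1 if doubled >= N else 0)
        in0 = neg_N if k >= 1 else 0                # gated -N (first AND gate)
        in1 = neg_N if k == 2 else 0                # gated -N (second AND gate)
        in2 = (r_s << 1) & mask                     # left-shifted sum
        in3 = (r_c << 1) & mask                     # left-shifted carry
        s1, c1 = csa3(in0, in1, in2, mask)
        r_s, r_c = csa3(s1, c1, in3, mask)          # value stays below 2N
    p = (r_s + r_c) & mask                          # carry-propagate phase
    return p - N if p >= N else p
```

With `iterations` equal to n, the model returns 2²ⁿ % N; each additional iteration doubles the result once more modulo N. The exact comparison is what a few-bit comparison in hardware approximates, which is why the hardware needs the option of subtracting 2N rather than only N.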

As can be appreciated, the configuration of MPC 104 is an example configuration, shown for conceptual clarity only. Other suitable configurations may be used in alternative embodiments of the present invention. For example, a CSA with three inputs instead of four may be used, with AND gates 202, 204 replaced by a multiplexer configured to output 0, -N, or -2N to a single CSA input that replaces in[0] and in[1]. In an embodiment, shifters 210 and/or 212 may not be required; instead, R_S and R_C can be wired to in[2] and in[3] in a shifted manner (e.g., R_S[0] to in[2][1], R_S[1] to in[2][2], etc.).

SAVING A LAST STAGE OF SUBTRACTION

According to the original Montgomery paper and its early implementations, a Montgomery multiplication is followed by a final step in which a modulo correction of the result C is made:

if (C > N) C = C - N.

This process is relatively expensive since it requires full carry propagation. Also, a hacker trying to find the key can deduce, by externally measuring the number of Montgomery multiplication cycles, whether a modulo correction was required, which narrows the range of possible key values. However, in an article entitled "Montgomery exponentiation needs no final subtractions", Electronics Letters, 35(21), 1999, Walter shows how the final modulo correction can be avoided if the number of bits in the Montgomery multiplication is increased by 2. The following table describes the differences between the original Montgomery algorithm and Walter's proposal:

Parameter | Value | Width | Description
n | 4,096 | | Residue width = 4,096
A | | [n-1:0] |
B | | [n-1:0] |
N | | [n-1:0] | Residue
R | R = 2ⁿ⁺² | [n:0] (Montgomery); [n+2:0] (Walter) | Bound
R' | R' = (R²) mod (N) | [n-1:0] | Precalculation
Loop | | n (Montgomery); n+2 (Walter) |
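The effect of the two extra bits can be checked numerically. The sketch below is an illustration under stated assumptions, not the circuit described here: it uses a textbook Montgomery product with the final conditional subtraction removed, and the name `mont_mul_no_subtract` and the parameter `r_bits` (the exponent of the power of two that is divided out) are hypothetical. If both inputs are below 2N and 2^r_bits exceeds 4N, the output is again below 2N, because (a*b + m*N) / 2^r_bits < 4N²/2^r_bits + N < 2N.

```python
def mont_mul_no_subtract(a, b, N, r_bits):
    """Montgomery product a * b * 2^(-r_bits) mod N, left unreduced in [0, 2N)."""
    r_mask = (1 << r_bits) - 1
    neg_inv = (-pow(N, -1, 1 << r_bits)) & r_mask   # requires odd N
    t = a * b
    m = ((t & r_mask) * neg_inv) & r_mask
    return (t + m * N) >> r_bits                    # no conditional subtraction

# Exhaustive check for a small odd divisor with n = 4 bits and r_bits = n + 2.
N, n = 13, 4
bounded = all(
    mont_mul_no_subtract(a, b, N, n + 2) < 2 * N
    for a in range(2 * N)
    for b in range(2 * N)
)
```

Since outputs stay in the same range as inputs, results can be fed straight into the next multiplication, and the data-dependent correction step, together with its timing signature, disappears from the loop.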

Thus, in some embodiments, MPC 104 calculates a precalculation value in which the exponent is greater than 2n, e.g., it calculates R = 2²⁽ⁿ⁺²⁾.

FIG. 3 is a flowchart 300 that schematically shows a method for Montgomery precalculation according to an embodiment of the present invention. The flow is executed by MPC 104 (FIG. 1). The flowchart begins with an Initialize Carry-Save-Addition step 302, in which the MPC sets initial values for parameters stored in registers, including R_0, R_1, R_S, and R_C (all described above with reference to FIG. 2), and for a counter configured to count iterations. Step 302 includes: initializing R_0 and R_1 to an n+3-bit representation of -N; initializing R_S to an n+1-bit representation of 2ⁿ; initializing R_C to an n+1-bit representation of 0; and initializing the counter to 4096+4.

The MPC then enters a Carry-Save Addition step 304, in which the MPC: i) sets en_0 to 1 if the number represented by the five most significant bits of S_N is greater than the number represented by the five most significant bits of N (en_0=1 outputs -N to in[0], whereas en_0=0 outputs 0); ii) sets en_1 to 1 if the number represented by the six most significant bits of S_N is greater than the number represented by the five most significant bits of N (en_1=1 outputs -N to in[1], whereas en_1=0 outputs 0); iii) asserts the value of R_S, shifted left by 1, on in[2]; iv) asserts the value of R_C, shifted left by 1, on in[3]; v) sets R_S equal to the sum (without carry) of in[0], in[1], in[2], and in[3]; vi) sets R_C equal to the carry of in[0], in[1], in[2], and in[3]; and vii) decrements the counter.

(Carry-Save Addition step 304 is defined mathematically by the following equations:

SUM_SC[5:0] = R_S[n:n-4] + R_C[n:n-4];

en_0 = (N[4095:4095-3] < SUM_SC[5:0]);

in[0] = (en_0) ? -4096 : 0;

en_1 = (N[4095:4095-3] < SUM_SC[5:1]);

in[1] = (en_1) ? -4096 : 0;

in[2] = R_S << 1;

in[3] = R_C << 1;

R_C, R_S = CSA(in[0], in[1], in[2], in[3]);

counter = counter - 1.)

After step 304, the MPC proceeds to a Check-CSA-Done step 306 and checks whether the counter has reached zero. If so, the carry-save addition phase is complete; the sum and the carry of the precalculated value P = 2²ⁿ%N are stored in R_S and R_C, respectively, and the MPC proceeds to an Initialize Carry-Propagate-Addition step 308. If, in step 306, the carry-save addition is not yet done, the MPC returns to step 304 to perform the next CSA iteration.

In step 308, the MPC initializes the counter to 65. According to the example embodiment illustrated in FIG. 3, full adder 216 (FIG. 2) comprises 64 bits; hence, the carry-propagate addition takes 64+1 iterations (64*64 = 4096; one additional iteration is required because n is slightly larger than 4096).

After step 308, the MPC enters a Carry-Propagate Addition step 310, in which the output P is calculated (by adding the carry from the previous iteration, a 64-bit group from R_S, and a 64-bit group from R_C) and the counter is decremented. The selected bit groups from R_S and R_C move to the left in successive iterations (e.g., bits 63:0 are selected in the first iteration, bits 127:64 in the next iteration, and so on).
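The word-serial addition of step 310 can be sketched as follows. This is an illustrative Python model only; the function name `carry_propagate_add` and its parameter names are assumptions, while the 64-bit word size and the 4098-bit operand width follow the example in the text.

```python
def carry_propagate_add(r_s, r_c, total_bits=4098, word_bits=64):
    """Add r_s and r_c one word at a time; return (P, number of iterations)."""
    word_mask = (1 << word_bits) - 1
    iterations = -(-total_bits // word_bits)     # ceiling division: 65 for 4098/64
    carry, p = 0, 0
    for i in range(iterations):
        shift = i * word_bits                    # bits 63:0 first, then 127:64, ...
        s_word = (r_s >> shift) & word_mask
        c_word = (r_c >> shift) & word_mask
        total = s_word + c_word + carry          # 64-bit add with carry-in
        p |= (total & word_mask) << shift        # store this word of P
        carry = total >> word_bits               # carry-out feeds the next iteration
    return p & ((1 << total_bits) - 1), iterations
```

The iteration count is the operand width divided by the adder width, rounded up, which is where the 64+1 figure comes from.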

Next, the MPC enters a check-carry-propagate-addition-done (CPA-done) step 312 and checks whether the counter has reached zero. If so, the precomputation flow is complete and the precomputed value is stored in P. If, in step 312, the carry-propagate addition is not yet complete, the MPC returns to step 310 for the next CPA iteration.
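The word-by-word carry-propagate addition of steps 308 to 312 can be sketched in Python. This is a minimal model, assuming R_S and R_C are held as plain integers; the function name `cpa_by_words` and the constants `WORD` and `MASK` are hypothetical and do not appear in the description.

```python
WORD = 64                  # width of the full adder (64 bits in the example embodiment)
MASK = (1 << WORD) - 1

def cpa_by_words(r_s, r_c, iterations=65):
    """Add R_S and R_C one 64-bit group at a time, propagating the carry."""
    result = 0
    carry = 0
    counter = iterations               # step 308: initialize the counter
    while counter > 0:                 # step 312: check whether the CPA is done
        i = iterations - counter       # index of the group selected in this iteration
        s_word = (r_s >> (WORD * i)) & MASK
        c_word = (r_c >> (WORD * i)) & MASK
        total = s_word + c_word + carry          # step 310: 64-bit add with carry-in
        result |= (total & MASK) << (WORD * i)
        carry = total >> WORD                    # carry into the next group
        counter -= 1
    return result

# Two 4098-bit operands: 65 groups of 64 bits cover 4160 bits, enough for the sum.
r_s = (1 << 4097) + 12345
r_c = (1 << 4097) - 1
p = cpa_by_words(r_s, r_c)
```

With 65 iterations the model covers 4160 bits, which is why one iteration beyond the 64 needed for 4096 bits suffices for the slightly wider operands.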

Flowchart 300, illustrated in Fig. 3, is an example shown purely for the sake of conceptual clarity. Other suitable flowcharts may be used in alternative embodiments of the present invention. For example, the counter may count up rather than down (with the completion checks modified accordingly). In some embodiments, the counter may be incremented (or decremented) after the "check-done" steps.

PRECOMPUTATION OF SMALL NUMBERS

In some embodiments, the number of bits for the precomputation operation may be smaller than the width of the MPC (e.g., N < 4096). Since the methods and circuits described above execute each next cycle depending on the high-order bits of the operand, two preliminary steps are added:

a. The operand is shifted left (by the MPC, by the MCE, or by a processor) until the MSB is 1;
b. The number of algorithm cycles is reduced by the number of shifts performed in a).

After the precomputation algorithm completes, the result is shifted right (by the MPC, by the MCE, or by a processor) to restore the original bit size.
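One way to see why the final right shift recovers a valid result is the identity 2^m mod (N·2^s) = 2^s · (2^(m-s) mod N) for m ≥ s. The sketch below checks it numerically in Python. It is an illustration under stated assumptions, not the circuit's procedure: it assumes each removed algorithm cycle corresponds to one fewer doubling of the exponent, it uses Python's built-in `pow` as a stand-in for the precomputation circuit, and the name `precompute_small` is hypothetical.

```python
def precompute_small(n, width):
    """Return 2^(2*bits) mod n, where bits = n.bit_length() <= width."""
    shifts = width - n.bit_length()          # step a: left shifts until the MSB is 1
    n_aligned = n << shifts                  # modulus aligned to the full register width
    exponent = 2 * width - shifts            # step b (assumed reading): fewer doublings
    p_aligned = pow(2, exponent, n_aligned)  # stand-in for the precomputation circuit
    return p_aligned >> shifts               # final right shift restores the bit size

# An 8-bit modulus processed in a 16-bit wide model.
p_small = precompute_small(239, 16)
```

Under these assumptions the aligned result is always a multiple of 2^s, so the right shift loses no information.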

INTEGRATED MONTGOMERY MULTIPLIER WITH PRECOMPUTATION CIRCUIT

The precomputation circuit described above is similar to the Montgomery multiplication circuit. In some embodiments, the precomputation is integrated into the Montgomery multiplication circuit, which adds only a small amount of logic.

Fig. 4 is a block diagram that schematically illustrates an MMA 400 with a precomputation circuit integrated into the Montgomery calculation engine, in accordance with an embodiment of the present invention. Like MMA 100 (Fig. 1), MMA 400 computes the product of pairs of numbers modulo a large prime N. Unlike MMA 100, however, MMA 400 comprises an integrated Montgomery calculation engine (IMCE) 402, which is configured to receive arguments A and B and the divisor N from a processor 404 and to output the product (A*B)%N to processor 404. Processor 404 is configured to send operands (multiplicands) to IMCE 402 and to receive the multiplication result from the IMCE. In some embodiments, processor 404 may not be needed, for example when IMCE 402 includes a processor.

As with MMA 100, in some embodiments processor 404 and/or IMCE 402 comprise a general-purpose processor, which is programmed in software to carry out the functions described herein. The software may be downloaded to the processor in electronic form, e.g., over a network or from a host, or it may, alternatively or additionally, be provided and/or stored on non-transitory tangible media, such as magnetic, optical, or electronic memory.

Fig. 5 is a block diagram that schematically illustrates an integrated Montgomery calculation engine (IMCE) 402, in accordance with an embodiment of the present invention. In the example embodiment, the multiplication is 4096 bits x 4096 bits (however, as explained in the Walter reference cited above, we use 4096+2 = 4098 bits to save a final modulo operation). As can be seen, IMCE 402 is a superset of MPC 104 (Fig. 2): some of the subunits of IMCE 402 are identical to their counterparts in MPC 104 (and retain the same subunit numbers), while other subunits are supersets of the corresponding subunits of MPC 104. In addition, IMCE 402 comprises three new subunits: a controller 518 (which differs from control unit 214, Fig. 2) and two registers, a GPR0 register 514 and a GPR1 register 516.

A 4-input carry-save adder (CSA) 200 adds inputs IN[0] through IN[3]. Its sum and carry outputs are connected to an R_S register 208 and an R_C register 206, respectively. Inputs IN[0] and IN[1] are connected to AND gates 202 and 204, respectively. AND gate 202 is configured to output to IN[0] the value of an R_0 register 502 when a signal en_0 is at logic-1, and zero otherwise; AND gate 204 is configured to output to IN[1] the value of an R_1 register 504 when a signal en_1 is at logic-1, and zero otherwise.
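The sum/carry behaviour of a 4-input carry-save adder with gated inputs can be illustrated with a small Python model. This is a minimal sketch, assuming the adder is built from two layers of 3:2 compressors; the description does not specify the internal structure of CSA 200, and the names `csa3`, `csa4` and `gate` are hypothetical.

```python
def csa3(a, b, c):
    """3:2 compressor: returns (s, carry) with s + carry == a + b + c."""
    s = a ^ b ^ c                                  # bitwise sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1     # majority bits, weighted by 2
    return s, carry

def csa4(in0, in1, in2, in3):
    """4-input carry-save adder: returns (s, carry) with s + carry == sum of inputs."""
    s1, c1 = csa3(in0, in1, in2)
    return csa3(s1, c1, in3)

def gate(value, enable):
    """Model of an AND gate on a register output: pass the value or force zero."""
    return value if enable else 0

# IN[0] is enabled, IN[1] is forced to zero by its enable signal.
r_s, r_c = csa4(gate(13, 1), gate(7, 0), 22, 9)
```

No carry ripples across bit positions inside `csa4`; the full-width addition is deferred until the sum and carry words are finally combined.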

A left/right shifter 512 is configured to shift the output of R_S 208 left or right and to send the shifted value to IN[2] of CSA 200; similarly, a left/right shifter 510 is configured to shift the output of R_C 206 left or right and to send the shifted value to IN[3] of CSA 200. As can be seen, left/right shifters 512 and 510 are supersets of shifters 212 and 210 (Fig. 2), which are configured for left shift only. In some embodiments, CPA 216 adds groups of bits (e.g., 64-bit groups) of R_S 208 and R_C 206 to reduce the 4098-bit sum-and-carry representation to a 4098-bit binary representation; in an embodiment, GPR0 514 and/or GPR1 516 sequentially load the output of CPA 216, e.g., in groups of 64 bits.

Controller 518 is configured to control the operation of IMCE 402 by sending a sequential pattern of control signals to the subunits, including: en_0 and en_1; shift-direction control for left/right shifters 512 and 510; load control for registers R_0 502, R_1 504, GPR0 514 and GPR1 516; and initialization control for registers R_S 208 and R_C 206. The controller can be configured (e.g., by processor 404, Fig. 4) to one of at least two settings: a first setting, in which the controller controls AND gates 202, 204 and shifters 510, 512 such that the CSA calculates a Montgomery precomputation value, and a second setting, in which the controller controls the AND gates and the shifters such that the CSA calculates a Montgomery multiplication. In some embodiments, the controller can be configured to a third setting, in which the CSA calculates an exponentiation (e.g., an RSA exponentiation) by cascading a Montgomery precomputation setting and multiple occurrences of the Montgomery multiplication setting.

In the description below, controller 518, AND gate 202, AND gate 204, shifter 510 and shifter 512 are collectively referred to as the control circuit.

Thus, according to the example embodiment illustrated in Fig. 5 and described above, IMCE 402 is configured to perform both Montgomery precomputation and Montgomery multiplication (and, in particular, a Montgomery precomputation followed by a Montgomery multiplication).

The configuration of IMCE 402 illustrated in Fig. 5 and described herein is an example configuration, shown purely for the sake of conceptual clarity. Other suitable configurations may be used in alternative embodiments of the present invention. For example, in some embodiments there is no CPA, and all operations are performed in sum-and-carry notation (except for the final exponentiation result, which may be converted to binary format by a CPA or, for example, by software).

Fig. 6 is a flowchart 600 that schematically illustrates a method for Montgomery 4096-bit x 4096-bit multiplication, in accordance with an embodiment of the present invention. In the example embodiment illustrated in Fig. 6, the two multiplicands are extended to 4098 bits to save a final modulo stage (as explained above). The flow is executed by control circuit 518, which controls the various subunits of IMCE 402 (Fig. 5). The flowchart starts with an initialize-CSA-registers step 602, in which the control circuit loads the value of N (the modulus) into R_0 502, loads the value of A (the first multiplicand) into R_1 504, loads B (the second multiplicand) into GPR1 516, and loads zero into registers R_S 208 and R_C 206. In an embodiment, the controller loads the 4098-bit values in groups of 64 bits over 65 cycles. In some embodiments, the controller receives some or all of the values from a processor (e.g., processor 404, Fig. 4), either directly or over a bus.

Next, the control circuit enters an initialize-counter step 604 and loads an internal counter (not shown) with the value 4098, the number of Montgomery reduction iterations to be performed. The control circuit then enters a Montgomery iteration step 606, in which the control circuit:

i) sets input en_0 of AND gate 202 (Fig. 5) to S[0] + C[0] * GPR1[0] * R1[0] (bit operations);
ii) sets input en_1 of AND gate 204 to GPR1[0];
iii) if en_0 is at logic-1, copies R_0 to the 4098-bit in[0]; otherwise, sets in[0]=0;
iv) if en_1 is at logic-1, copies R_1 to the 4098-bit in[1]; otherwise, sets in[1]=0;
v) sets the 4098-bit value of in[2] to R_S shifted right by 1;
vi) sets the 4098-bit value of in[3] to R_C shifted right by 1;
vii) adds in[0], in[1], in[2] and in[3] bitwise (storing the bitwise sum in R_S and the bitwise carry in R_C); and
viii) decrements the counter.
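The iteration above can be modelled in Python as a bit-serial Montgomery multiplication that keeps its running value in sum-and-carry form. This is a sketch under stated assumptions rather than a transcription of steps i to viii: it uses the textbook radix-2 ordering (add, then halve), it takes en_0 as the XOR of the low bits of R_S, R_C and the gated A operand, and it indexes the bits of B directly instead of shifting a register. The names `csa4` and `montgomery_csa` are hypothetical, and N is assumed odd.

```python
def csa4(in0, in1, in2, in3):
    """4-input carry-save adder model: s + carry equals the sum of the inputs."""
    s1 = in0 ^ in1 ^ in2
    c1 = ((in0 & in1) | (in0 & in2) | (in1 & in2)) << 1
    s = s1 ^ c1 ^ in3
    carry = ((s1 & c1) | (s1 & in3) | (c1 & in3)) << 1
    return s, carry

def montgomery_csa(a, b, n, k):
    """Return a value congruent to a*b*2^(-k) mod n, for odd n and b < 2^k."""
    r_s, r_c = 0, 0
    counter = k
    while counter > 0:
        b_i = (b >> (k - counter)) & 1        # current multiplier bit (LSB first)
        en_1 = b_i                            # add A only when the bit is set
        en_0 = (r_s ^ r_c ^ (b_i & a)) & 1    # add N when the total would be odd
        in0 = n if en_0 else 0
        in1 = a if en_1 else 0
        s, carry = csa4(in0, in1, r_s, r_c)
        # The total is even and the carry word is even, so the sum word is even
        # too; halving both words therefore halves the total exactly.
        r_s, r_c = s >> 1, carry >> 1
        counter -= 1
    return r_s + r_c                          # final carry-propagate addition

# Small example: 10 iterations with an 8-bit odd modulus.
m = montgomery_csa(123, 200, 239, 10)
```

The returned value is only guaranteed to be congruent to the Montgomery product; a final reduction modulo N may still be needed depending on the operand ranges.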

The control circuit then enters a check-counter-greater-than-zero step 608 and checks whether the counter value is still greater than zero. If so, the Montgomery multiplication loop is not yet complete, and the control circuit returns to step 606 to execute the next Montgomery iteration. If, in step 608, the counter is not greater than zero, the control circuit enters an init-carry-propagate-addition step 610, in which it sets the counter to 65, and then enters a carry-propagate-addition (CPA) step 612.

CPA step 612 (like step 310 in Fig. 3) is a 64-bit addition, which adds a group of 64 R_S bits to the corresponding group of 64 R_C bits and decrements the counter. In a check-CPA-done step 614, the control circuit checks whether the counter has reached zero and returns to step 612 if the counter is still greater than zero. The control circuit loops through steps 612 and 614 65 times to accumulate all 4098 carry-save bit pairs. When, in step 614, the counter has reached zero, the flowchart ends.

Flowchart 600, illustrated in Fig. 6 and described above, is an example shown purely for the sake of conceptual clarity. Other suitable flowcharts may be used in alternative embodiments of the present invention. For example, in some embodiments the counter may count up and then be compared to the number of iterations. In some embodiments, the counter is modified after it has been checked for completion.

CALCULATION OF THE RSA EXPONENT

The RSA algorithm consists of modular exponentiations of large numbers. In the article by McIvor et al. cited above, the authors describe the use of a Montgomery multiplier for exponentiation. The exponentiation is formally defined as M = C^D mod n. The exponent D may be stored in control circuit 518 or read from a processor (e.g., processor 404, Fig. 4).

Fig. 7 is a flowchart 700 that schematically illustrates a method for modular exponentiation, in accordance with an embodiment of the present invention. The flowchart is executed by control circuit 518 (Fig. 5). The exponentiation flowchart comprises an execution of precomputation flowchart 300 (Fig. 3) and multiple executions of Montgomery multiplication flowchart 600 (Fig. 6). In the description below, we refer to the Montgomery precomputation that computes K = 2^(2k) % n as PRECOMPUTATION(k,n), and to a Montgomery multiplication M = (a*b) % n as MONTGOMERY(a,b,n).

Flowchart 700 starts with a precomputation step 702, in which the control circuit calculates a precomputation value K = PRECOMPUTATION(k,n) by executing a precomputation flow, e.g., flowchart 300 (Fig. 3). Next, in a calculate-initial-GPR0 step 704, the control circuit executes a Montgomery multiplication flow (e.g., flow 600, Fig. 6) to calculate MONTGOMERY(K,C,n) and stores the result in GPR0. Then, in a calculate-initial-GPR1 step 706, the control circuit executes another Montgomery multiplication flow to calculate MONTGOMERY(K,1,n) and stores the result in GPR1. The control circuit then, in a set-counter-4098 step 708, sets the counter to 4098, the number of iterations in the exponentiation.

After step 708, the control circuit starts the sequence of 4098 exponentiation iterations. After the i-th iteration, GPR0 stores the value of C^(2^i), while GPR1 stores the accumulated exponentiation result for C^(D[i-1:0]). In a calculate-next-GPR0 step 710, the control circuit calculates MONTGOMERY(GPR0,GPR0,n), squaring the previous value of GPR0. Next, in a check-Di step 712, the control circuit checks whether the i-th bit of the exponent, D[i], is at logic-1. If so, the control circuit proceeds to an update-GPR1 step 714, in which it executes a Montgomery multiplication (e.g., flowchart 600) to calculate MONTGOMERY(GPR0,GPR1,n), stores the result in GPR1, and proceeds to a decrement-counter step 716 (if, in step 712, D[i] is not at logic-1, the control circuit bypasses step 714).

In step 716, the control circuit decrements the counter and then, in a check-counter-0 step 718, checks whether the counter has reached zero. If so, the exponentiation flow ends, and GPR1 stores M, the exponentiation result. If, in step 718, the counter has not reached zero, the control circuit returns to step 710 for the next exponentiation iteration.
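The exponentiation flow can be sketched in Python with a plain integer model of the Montgomery product. Several points are assumptions of this sketch and not statements of the description: `mont` returns a*b*2^(-k) mod n (the usual Montgomery product, with the scaling factor made explicit); the conditional multiply is performed before the squaring, so that bit i of the exponent is paired with C^(2^i) as in the invariant stated for GPR0 and GPR1; and a final multiplication by 1 is added to leave the Montgomery domain. The names `mont` and `modexp` are hypothetical.

```python
def mont(a, b, n, k):
    """Reference Montgomery product a*b*2^(-k) mod n, for odd n and b < 2^k."""
    s = 0
    for i in range(k):
        s += ((b >> i) & 1) * a     # add A when bit i of B is set
        if s & 1:                   # make the running value even by adding N
            s += n
        s >>= 1                     # exact division by 2
    return s % n

def modexp(c, d, n, k):
    """Compute c^d mod n with Montgomery products (odd n < 2^k, d < 2^k)."""
    big_k = pow(2, 2 * k, n)        # precomputation value K = 2^(2k) mod n
    gpr0 = mont(big_k, c, n, k)     # C in Montgomery form
    gpr1 = mont(big_k, 1, n, k)     # 1 in Montgomery form
    counter = k
    i = 0
    while counter > 0:
        if (d >> i) & 1:            # exponent bit i selects the multiply
            gpr1 = mont(gpr0, gpr1, n, k)
        gpr0 = mont(gpr0, gpr0, n, k)   # square for the next bit position
        i += 1
        counter -= 1
    return mont(gpr1, 1, n, k)      # assumed final step: leave Montgomery form

result = modexp(7, 0b1011011, 239, 10)
```

Because every exponent bit triggers one squaring and at most one multiply, the loop count depends only on the register width, not on the value of the exponent.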

Flowchart 700, illustrated in Fig. 7 and described above, is an example shown purely for the sake of conceptual clarity. Other suitable flowcharts may be used in alternative embodiments of the present invention. For example, to protect against security attacks that measure the exponentiation time in order to estimate the number of logic-1 bits in the exponent, in some embodiments the Montgomery multiplication of step 714 is always executed, and the value of exponent bit D[i] (checked in step 712) determines whether GPR1 is updated with the multiplication result. In some embodiments, the counter is cleared in step 708, incremented in step 716, and compared to 4098 in step 718. In an embodiment, the counter is incremented after the comparison to the final value.

MONTGOMERY CALCULATION OF SMALL NUMBERS

In the Montgomery multiplication methods and circuits described above, each next cycle is executed depending on the low-order bit of the operand. The algorithm therefore works well even when the number of bits of the numbers to be multiplied is smaller than the width of the IMCE (e.g., N < 4096). The operands should be loaded into the least-significant parts of the registers, and logic-0 bits should be loaded into the unused most-significant part.

The configurations of Montgomery multiplication apparatuses (MMA) 100 and 400, including Montgomery precomputation circuit (MPC) 104 and integrated Montgomery calculation engine (IMCE) 402, and the methods of flowcharts 300, 600 and 700 described herein, are example configurations and methods, shown purely for the sake of conceptual clarity. Any other suitable configurations and flowcharts may be used in alternative embodiments. The various elements of MMA 100 and 400, including MPC 104 and IMCE 402, may be implemented using suitable hardware, such as one or more application-specific integrated circuits (ASICs) or field-programmable gate arrays (FPGAs).

Although the embodiments described herein mainly address Montgomery multiplication, Montgomery precomputation and Montgomery-based exponentiation, the methods and systems described herein can also be used in other applications, such as fast division.

It will thus be appreciated that the embodiments described above are cited by way of example, and that the present invention is not limited to what has been particularly shown and described herein. Rather, the scope of the present invention includes both combinations and sub-combinations of the various features described herein, as well as variations and modifications thereof which would occur to persons skilled in the art upon reading the foregoing description and which are not disclosed in the prior art. Documents incorporated by reference in the present patent application are to be considered an integral part of the application, except that, to the extent any terms are defined in these incorporated documents in a manner that conflicts with the definitions made explicitly or implicitly in the present description, only the definitions in the present description are to be considered.

It is understood that the aspects and embodiments described above are exemplary only and that modifications of detail may be made within the scope of the claims.

Each apparatus, method and feature disclosed in the description and (where appropriate) in the claims and drawings may be provided independently or in any suitable combination.

The reference numerals appearing in the claims are for illustration only and have no limiting effect on the scope of the claims.

REFERENCES CITED IN THE DESCRIPTION

This list of documents cited by the applicant was generated automatically and is included solely for the reader's information. The list is not part of the German patent or utility model application. The DPMA assumes no liability for any errors or omissions.

Patent Literature Cited

US 17/180999 [0001]

Claims

An integrated Montgomery calculation engine (IMCE) for multiplying two multiplicands modulo a predefined number, the IMCE comprising: a carry-save adder (CSA) circuit having multiple inputs, and having outputs including a sum output and a carry output; and a control circuit coupled to the inputs and the outputs of the CSA circuit and configured to operate the CSA circuit in at least (i) a first setting that calculates a Montgomery precalculation value and (ii) a second setting that calculates a Montgomery multiplication of the two multiplicands.

The IMCE of claim 1, wherein the control circuit is configured to logically shift the sum output and the carry output of the CSA circuit and to couple the shifted sum output and the shifted carry output to respective inputs of the CSA circuit.

The IMCE of claim 2, wherein the control circuit is configured to logically shift the sum output and the carry output of the CSA circuit left in the first setting, and to logically shift the sum output and the carry output of the CSA circuit right in the second setting.

The IMCE of any preceding claim, wherein in the first setting the control circuit is configured to set two of the inputs of the CSA circuit to a constant value dependent on the predefined number.

The IMCE of any preceding claim, wherein in the first setting the control circuit is configured to set an input of the CSA circuit to the predefined number or to zero, depending on the most significant bits of the sum output and the carry output of the CSA circuit and on the two multiplicands.

The IMCE of any preceding claim, wherein in the second setting the control circuit is configured to set an input of the CSA circuit to zero or to one of the multiplicands, depending on the other of the multiplicands.

The IMCE of any preceding claim, wherein in the second setting the control circuit is configured to set an input of the CSA circuit to zero or to the predefined number, depending on the least significant bits of the sum output, the carry output and the two multiplicands.

The IMCE of any preceding claim, wherein the control circuit is configured to operate the CSA circuit in a third setting that calculates a predefined base raised to the power of a predefined exponent, modulo the predefined number.

The IMCE of claim 8, wherein the control circuit is configured to operate the CSA circuit in the third setting by applying the first setting and the second setting in an order defined according to the exponent.

The IMCE of any preceding claim, wherein the CSA circuit and the control circuit are contained within a network device and are configured to perform a cryptographic operation of the network device.

A method of multiplying two multiplicands modulo a predefined number, the method comprising: operating a carry-save adder (CSA) circuit having multiple inputs, and having outputs including a sum output and a carry output; and, using a control circuit coupled to the inputs and the outputs of the CSA circuit, controlling the CSA circuit to operate in at least (i) a first setting that calculates a Montgomery precalculation value and (ii) a second setting that calculates a Montgomery multiplication of the two multiplicands.

The method of claim 11, wherein controlling the CSA circuit comprises logically shifting the sum output and the carry output of the CSA circuit and coupling the shifted sum output and the shifted carry output to respective inputs of the CSA circuit.

The method of claim 12, wherein controlling the CSA circuit comprises logically shifting the sum output and the carry output of the CSA circuit left in the first setting, and logically shifting the sum output and the carry output of the CSA circuit right in the second setting.

The method of any one of claims 11, 12 or 13, wherein controlling the CSA circuit in the first setting comprises setting two of the inputs of the CSA circuit to a constant value dependent on the predefined number.

The method of any one of claims 11, 12 or 13, wherein controlling the CSA circuit in the first setting comprises setting an input of the CSA circuit to the predefined number or to zero, depending on the most significant bits of the sum output and the carry output of the CSA circuit and on the two multiplicands.

The method of any one of claims 11 to 15, wherein controlling the CSA circuit in the second setting comprises setting an input of the CSA circuit to zero or to one of the multiplicands, depending on the other of the multiplicands.

The method of any one of claims 11 to 16, wherein controlling the CSA circuit in the second setting comprises setting an input of the CSA circuit to zero or to the predefined number, depending on the least significant bits of the sum output, the carry output and the two multiplicands.

The method of any one of claims 11 to 17, wherein controlling the CSA circuit further comprises operating the CSA circuit in a third setting that calculates a predefined base raised to the power of a predefined exponent, modulo the predefined number.

The method of claim 18, wherein operating the CSA circuit in the third setting comprises applying the first setting and the second setting in an order defined according to the exponent.

The method of any one of claims 11 to 19, wherein operating and controlling the CSA circuit are performed in a network device to perform a cryptographic operation of the network device.
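For orientation only, the three settings recited in the claims can be modeled in software. The following Python sketch is an illustrative assumption and not the claimed circuit: the function names (`csa`, `mont_precompute`, `mont_mul`, `mont_pow`) and the parameter `k` (bit width, so that R = 2^k) are hypothetical. The precalculation is modeled with plain integers and a conditional subtraction rather than in carry-save form, and the multiplication uses two three-input carry-save steps per bit instead of a single multi-input CSA. The predefined number `n` is assumed to be odd and greater than 1.

```python
def csa(x, y, z):
    """Three-input carry-save adder model: returns (s, c) with s + c == x + y + z."""
    s = x ^ y ^ z                                # bitwise sum without carries
    c = ((x & y) | (x & z) | (y & z)) << 1       # carries, moved one position left
    return s, c


def mont_precompute(n, k):
    """Model of the first setting: R^2 mod n with R = 2**k, using left shifts."""
    r = 1
    for _ in range(2 * k):
        r <<= 1                                  # logical shift left (doubling)
        if r >= n:                               # r < 2n here, one subtraction is enough
            r -= n
    return r


def mont_mul(a, b, n, k):
    """Model of the second setting: a * b * 2**(-k) mod n, for odd n < 2**k and a, b < n."""
    s, c = 0, 0
    for i in range(k):
        a_i = (a >> i) & 1
        s, c = csa(s, c, b if a_i else 0)        # add b or zero, depending on bit i of a
        q = (s ^ c) & 1                          # least significant bit of s + c
        s, c = csa(s, c, n if q else 0)          # add n or zero so that s + c becomes even
        s, c = s >> 1, c >> 1                    # exact: c is even, hence s is even too
    t = s + c                                    # resolve the redundant form once at the end
    return t - n if t >= n else t


def mont_pow(base, exponent, n, k):
    """Model of the third setting: base**exponent mod n via square-and-multiply."""
    r2 = mont_precompute(n, k)
    x = mont_mul(base % n, r2, n, k)             # base in Montgomery form (base * R mod n)
    acc = mont_mul(1, r2, n, k)                  # 1 in Montgomery form (R mod n)
    for bit in bin(exponent)[2:]:                # exponent bits, most significant first
        acc = mont_mul(acc, acc, n, k)
        if bit == "1":
            acc = mont_mul(acc, x, n, k)
    return mont_mul(acc, 1, n, k)                # leave Montgomery form
```

In this model the sum and carry words stay in redundant form through all `k` iterations of `mont_mul` and are added only once at the end, which is the usual motivation for building such a datapath around a carry-save adder. A caller would typically choose `k = n.bit_length()`.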