CN113505383A - ECDSA algorithm execution system and method - Google Patents


Info

Publication number
CN113505383A
Authority
CN
China
Prior art keywords
forwarding
branch
chip
data
branch instruction
Prior art date
Legal status
Pending
Application number
CN202110747114.2A
Other languages
Chinese (zh)
Inventor
范志华
秦宏
吴欣欣
李文明
安学军
叶笑春
范东睿
Current Assignee
Institute of Computing Technology of CAS
Original Assignee
Institute of Computing Technology of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS
Priority to CN202110747114.2A
Publication of CN113505383A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/64 Protecting data integrity, e.g. using checksums, certificates or signatures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Storage Device Security (AREA)

Abstract

The invention provides an ECDSA algorithm execution system and method. A host performs the cryptographic operations through a dataflow chip and transmits the resulting information over a network. Compared with the prior art, the invention ports the ECDSA algorithm to a dataflow-architecture chip and exploits that chip's low memory-access demand to accelerate both the ECDSA computation and key generation, while also broadening the range of applications the dataflow chip can serve.

Description

ECDSA algorithm execution system and method
Technical Field
The invention relates to the technical field of computer architecture, and in particular to an ECDSA algorithm execution system and method.
Background
An asymmetric encryption algorithm uses a key pair consisting of a public key and a private key: encrypting the plaintext with the public key yields the ciphertext, and decrypting the ciphertext with the private key recovers the plaintext. In encrypted-communication settings such as digital-currency transactions, information must not only be encrypted with public and private keys but also carry a digital signature that guarantees its authenticity. The digital signature is one application of asymmetric cryptography; commonly used algorithms for encrypted communication and digital signatures include RSA, Diffie-Hellman (DH) and elliptic curve cryptography (ECC).
As computer performance improves, 1024-bit RSA, whose security rests on the difficulty of factoring large integers, can be broken, and 2048-bit RSA may gradually become breakable as technology advances; adopting a more secure algorithm is therefore a necessary direction for encrypted communication. Unlike RSA, ECC bases its security on the elliptic curve discrete logarithm problem: the forward computation is simple, whereas no sub-exponential-time algorithm is known for the reverse computation. As a result, more and more digital signature applications now adopt ECC.
The Elliptic Curve Digital Signature Algorithm (ECDSA) is a representative complex algorithm; it is the elliptic-curve analogue of the Digital Signature Algorithm (DSA), built on elliptic curve cryptography (ECC). ECDSA became an ANSI standard in 1999 and an IEEE and NIST standard in 2000. Unlike the conventional discrete logarithm problem (DLP) and the integer factorization problem (IFP), the elliptic curve discrete logarithm problem (ECDLP) has no known sub-exponential-time solution. Elliptic curve cryptography therefore provides more security per key bit than other public-key systems.
The ECDSA algorithm has a long computation flow and repeatedly needs the parameter values that define the elliptic curve. On a conventional von Neumann architecture, the arithmetic unit must therefore fetch data from memory again and again, which adds memory-access instructions and memory-access time and increases the latency of generating the public and private keys.
Disclosure of Invention
To solve the above problems in the prior art, an ECDSA algorithm execution system is provided, which includes a sending end and a receiving end, wherein:
the sending end comprises a first device and a first dataflow chip, the first device being used to generate a private key, and the first dataflow chip being used to generate a public key from the private key and to sign a message with the private key; the receiving end comprises a second device and a second dataflow chip, the second dataflow chip being used to obtain a verification result from the received public key and message signature, and the second device being used to receive the verification result.
Preferably, each PE of the first dataflow chip and/or the second dataflow chip forwards its operation results based on a branch transfer history table of instructions.
Preferably, the branch transfer history table includes an instruction ID, a forwarding flag and a forwarding destination, wherein the instruction ID records the sequence number of a branch instruction, the forwarding flag marks whether the forwarding record is valid, and the forwarding destination records where the result data of the branch instruction are forwarded.
Preferably, the forwarding destination is a data forwarding direction, which is one of up, down, left and right.
Preferably, in the instruction decode stage, the ID of a branch instruction is written into the branch transfer history table.
Preferably, when a branch instruction is executed for the first time, its forwarding destination is written into the branch transfer history table under the ID of that branch instruction, and the corresponding forwarding flag is set to valid.
Preferably, each time a branch instruction is executed, it is determined whether the result forwarding destination of the branch instruction matches the corresponding forwarding destination in the branch transfer history table; if they differ, the forwarding flag is set to invalid, and once the branch instruction has finished executing, the recorded forwarding destination is updated.
The invention provides an ECDSA algorithm execution method based on the above system, which comprises the following steps:
the first dataflow chip of the sending end generates a public key from a private key generated by the first device of the sending end;
the first dataflow chip obtains a message signature from the private key and the message;
the sending end sends the public key and the message signature to the receiving end;
the second dataflow chip of the receiving end obtains a verification result from the received public key and message signature;
and the second device of the receiving end receives the verification result.
Preferably, the execution method includes: mapping the Montgomery large-number multiplication in the ECDSA algorithm onto the first dataflow chip and the second dataflow chip based on a branch prediction mechanism.
Preferably, the branch prediction mechanism predicts the data transmission direction of each PE through a branch transfer history table.
The invention has the following characteristics and beneficial effects: compared with the prior art, the invention ports the ECDSA algorithm to a dataflow-architecture chip and, by exploiting that chip's low memory-access demand, accelerates both the ECDSA computation and key generation. Because ECDSA is widely used, porting it to a dataflow-architecture chip also increases the generality of the dataflow chip.
Drawings
FIG. 1 shows a system architecture diagram in accordance with one embodiment of the present invention.
Fig. 2 is a diagram illustrating an operation process of a transmitting end generating a private key and a public key according to an embodiment of the present invention.
Fig. 3 shows a diagram of the operation of generating a signature by means of a public key and a private key according to one embodiment of the invention.
FIG. 4 shows a diagram of a signature verification operation according to one embodiment of the invention.
FIG. 5 is a schematic diagram of a prior art Montgomery large number multiplication loop process.
FIG. 6 shows a dataflow chip architecture diagram, according to one embodiment of the invention.
FIG. 7 illustrates a branch history table corresponding to a branch prediction mechanism according to one embodiment of the invention.
FIG. 8 shows a loop program code for a prior art Montgomery large number multiplication.
FIG. 9 illustrates a data flow diagram for a Montgomery large number multiplication according to one embodiment of the present invention.
FIG. 10 illustrates a mapping of a dataflow graph for a Montgomery large number multiplication to a PE array, according to one embodiment of the invention.
Detailed Description
The invention is described below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
A traditional von Neumann computer consists of a memory, an arithmetic unit, a controller and input/output devices. It follows the stored-program concept: the arithmetic unit fetches data from memory, performs the operation and writes the result back, and instructions execute sequentially in the order in which they are stored. For application domains such as neural networks and scientific computing, however, this sequential execution model accesses memory frequently when running dedicated tasks and fails to exploit the parallelism inherent in large tasks. As a result, von Neumann computers perform poorly on such dedicated workloads, and their sequential execution hinders optimization; computers with architectures other than von Neumann therefore need to be considered for these specialized tasks.
A dataflow architecture differs from the von Neumann architecture: it increases data parallelism through a coarse-grained representation, and its compiler can schedule several sequential loops and functions simultaneously, achieving higher throughput and lower latency. For applications with high data parallelism, such as neural networks and scientific computing, the dataflow architecture therefore requires fewer memory accesses and incurs low synchronization overhead. Because of its specialized nature, however, the dataflow architecture does not yet adapt to a wide range of application scenarios, no widely accepted software stack exists for it, and migrating a complex program to a dataflow computer requires specialists for design and optimization.
The invention aims to solve the problems of the prior-art ECDSA algorithm, such as high memory-access demand, long key generation time and low utilization of the processing element (PE) array.
While studying the ECDSA algorithm, the inventors recognized that these problems can be solved through technical improvements in the following aspects.
First, the ECDSA algorithm has a long computation flow and repeatedly requires the parameter values that define the elliptic curve. On a conventional von Neumann architecture, the arithmetic unit must therefore fetch data from memory again and again, which adds memory-access instructions and memory-access time and increases the latency of generating the public and private keys. From this finding, the inventors concluded that reducing the memory-access time involved in generating the keys can effectively shorten key generation.
Second, conventional ECDSA uses Montgomery large-number multiplication during key generation. On a sequential von Neumann architecture this multiplication runs as a loop and likewise needs repeated memory accesses, which increases key generation latency; the inventors therefore reasoned that the low memory-access demand of the dataflow architecture can be used to accelerate key generation.
Third, because Montgomery large-number multiplication contains a loop, predicting the loop's behavior can effectively eliminate redundant instructions that evaluate the loop's jump conditions and shorten the computation time of the large-number multiplication. The inventors therefore propose a branch prediction mechanism for the dataflow architecture that predicts the instruction the loop executes next and the direction in which its data are sent, accelerating the execution of the program loop.
Fourth, ECDSA is widely used as a cryptographic algorithm, and running it on a general-purpose chip adds to that chip's processing burden. To shorten key generation and reduce the memory-access demand during generation, the data required for key generation are placed on a dedicated coarse-grained dataflow architecture. This takes full advantage of the dataflow chip's low memory-access demand and effectively accelerates the execution of the ECDSA algorithm.
Based on the above research and analysis, the invention provides an ECDSA algorithm execution system based on a dataflow architecture, whose architecture is shown in FIG. 1. The system comprises a sending end and a receiving end. The sending end generates a private key and a public key, generates a message signature from the private key and the message, and sends the signature to the receiving end; the receiving end obtains a verification result from the received public key and message signature. The sending end comprises a first device and a first dataflow chip: the first device generates the private key, and the first dataflow chip generates the public key from the private key and signs the message with the private key. The receiving end comprises a second device and a second dataflow chip: the second dataflow chip obtains a verification result from the received public key and message signature, and the second device receives the verification result.
FIG. 2 illustrates the operation of the function ecc_make_key, with which the sending end generates a private key and a public key, according to an embodiment of the invention. The function ecc_make_key is annotated as follows:
Function name: ecc_make_key
Function: generate a private key and a public key
Parameters: p_publicKey[]: 48 + 1 bytes (48 bytes hold the x-coordinate of the public key and 1 byte is a check byte)
p_privateKey[]: 48 bytes (384 bits)
Return value: 1 if generation succeeds, 0 otherwise
The function ecc_make_key generates the private key and the public key as follows:
(1) select a random integer l_private as the private key;
(2) if l_private is 0, return to step (1);
(3) if l_private is not less than n, set l_private = l_private - n, then go to step (4);
(4) call the EccPoint_mult function to compute the public key; if the resulting point is zero (the point at infinity), return to step (1). EccPoint_mult can be implemented with existing code, for example nano-ecc or micro-ecc on GitHub;
(5) output the public key Q and the private key l_private; the program ends.
Once these steps have been executed, the public key and private key required for the signature computation have been generated.
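The key generation steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the embodiment's code: it uses a hypothetical textbook toy curve y^2 = x^3 + 2x + 2 over F_17 with base point G = (5, 1) of order n = 19 in place of the 384-bit curve, Python integers in place of the chip's multi-word arithmetic, and a plain double-and-add in place of EccPoint_mult. The names point_add, scalar_mult and ecc_make_key_sketch are illustrative. Drawing the random value from [0, 2n) is an assumption made so that the single subtraction in step (3) always brings the value below n.

```python
import secrets
from typing import Optional, Tuple

# Hypothetical toy curve y^2 = x^3 + 2x + 2 over F_17 (textbook example; NOT secure).
P_MOD, A, B = 17, 2, 2
G = (5, 1)   # base point
N = 19       # order of G
INF = None   # point at infinity (the "zero" point)

Point = Optional[Tuple[int, int]]


def point_add(p1: Point, p2: Point) -> Point:
    """Affine point addition and doubling on the toy curve."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    x1, y1 = p1
    x2, y2 = p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return INF                          # P + (-P) = O
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return x3, (lam * (x1 - x3) - y1) % P_MOD


def scalar_mult(k: int, point: Point) -> Point:
    """Left-to-right double-and-add; stands in for EccPoint_mult."""
    result = INF
    for bit in bin(k)[2:]:
        result = point_add(result, result)
        if bit == "1":
            result = point_add(result, point)
    return result


def ecc_make_key_sketch(max_tries: int = 16):
    """Steps (1)-(5): return (l_private, Q), or None if every try fails."""
    for _ in range(max_tries):
        l_private = secrets.randbelow(2 * N)   # (1) random integer (toy range [0, 2n))
        if l_private == 0:                     # (2) reject zero
            continue
        if l_private >= N:                     # (3) one subtraction brings it below n
            l_private -= N
        q = scalar_mult(l_private, G)          # (4) Q = l_private * G
        if q is INF:                           # (4) reject the zero point
            continue
        return l_private, q                    # (5) key pair
    return None
```

The retry bound mirrors the fact that ecc_make_key can return 0 on failure; a production version would use a standardized curve and constant-time arithmetic rather than this toy.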
Once the private key and public key have been generated, the message can be signed with the private key. ECDSA itself is an existing technique: it is the elliptic-curve analogue of the Digital Signature Algorithm. A signature consists of a pair of integers (r, s), which is computed with the private key and checked with the public key.
Let the domain parameters be D = (p, a, b, G, n, h), where:
p is a prime that determines the finite field;
a and b are the coefficients of the elliptic curve equation y^2 = x^3 + ax + b;
G is the base point that generates the subgroup;
n is the order of the subgroup;
h is the cofactor of the subgroup.
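To make the tuple D = (p, a, b, G, n, h) concrete, the sketch below instantiates it with a hypothetical toy curve (p = 17, a = 2, b = 2, G = (5, 1), n = 19, h = 1), not the curve used in the embodiment, and checks the defining relations by brute force: G lies on the curve, n is the order of G, and h = #E(F_p)/n. Brute-force counting is only feasible at toy sizes; the helper names are illustrative.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]
INF: Point = None   # point at infinity

# Hypothetical toy domain parameters D = (p, a, b, G, n, h); NOT secure.
p, a, b = 17, 2, 2
G = (5, 1)
n, h = 19, 1


def on_curve(pt: Point) -> bool:
    """True if pt is O or satisfies y^2 = x^3 + ax + b (mod p)."""
    if pt is INF:
        return True
    x, y = pt
    return (y * y - (x ** 3 + a * x + b)) % p == 0


def point_add(p1: Point, p2: Point) -> Point:
    """Affine addition and doubling on the toy curve."""
    if p1 is INF:
        return p2
    if p2 is INF:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % p == 0:
        return INF
    if p1 == p2:
        lam = (3 * x1 * x1 + a) * pow(2 * y1 % p, -1, p) % p
    else:
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    return x3, (lam * (x1 - x3) - y1) % p


def order_of(pt: Point) -> int:
    """Smallest m >= 1 with m * pt = O, found by repeated addition."""
    acc, m = pt, 1
    while acc is not INF:
        acc = point_add(acc, pt)
        m += 1
    return m


def count_points() -> int:
    """#E(F_p) by brute force, including the point at infinity."""
    total = 1
    for x in range(p):
        rhs = (x ** 3 + a * x + b) % p
        total += sum(1 for y in range(p) if (y * y - rhs) % p == 0)
    return total


D = (p, a, b, G, n, h)
```

For this toy curve the whole group has 19 points, so G generates it and the cofactor is 1; real curves such as the 384-bit one in the embodiment are checked against published parameters instead.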
The corresponding key pair is (l_private, Q), where l_private is the private key and Q is the public key. Signing proceeds as follows: select a random integer k and compute the point P = k·G with the EccPoint_mult function; take the x-coordinate of P, reduced modulo n, as r; obtain d by converting the format of the private key l_private; and compute s = (e + r·d)/k mod n, where e is the hash value of the data to be signed. Neither r nor s may be 0; if either is 0, a new k must be selected. The EccPoint_mult function used in the signature algorithm can be implemented with existing code, such as nano-ecc or micro-ecc on GitHub.
FIG. 3 shows the operation of generating a signature with the public key and private key according to an embodiment of the invention. The signature of a message x is computed as follows:
(1) select a random integer k with 0 < k < n;
(2) call the EccPoint_mult function to compute the point P = k·G, whose x-coordinate x1 is used to compute r;
(3) r = x1 (mod n); that is, if x1 is less than n, then r = x1, otherwise x1 is reduced modulo n so that the final result satisfies 0 <= r < n;
(4) if r = 0, return to step (1);
(5) obtain d from the private key l_private through the format conversion function ecc_native2bytes, and compute s = (e + r·d)/k mod n, where e is the hash value of the data to be signed; ecc_native2bytes converts the byte order and can be implemented with existing code, such as nano-ecc or micro-ecc on GitHub;
(6) (r, s) is the signature; the program ends.
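Steps (1) to (6) can be sketched as follows on a hypothetical toy curve (y^2 = x^3 + 2x + 2 over F_17, G = (5, 1), n = 19) rather than the 384-bit curve, with a plain double-and-add standing in for EccPoint_mult. The byte-order conversion done by ecc_native2bytes is unnecessary here because Python integers are used directly, and e is assumed to be the message hash already converted to an integer. The optional k argument is a hypothetical hook for reproducible tests; a real signer must draw a fresh, secret k for every signature.

```python
import secrets
from typing import Optional, Tuple

P_MOD, A = 17, 2   # hypothetical toy curve y^2 = x^3 + 2x + 2 over F_17 (NOT secure)
G = (5, 1)         # base point
N = 19             # order of G
Point = Optional[Tuple[int, int]]


def point_add(p1: Point, p2: Point) -> Point:
    """Affine addition and doubling; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return x3, (lam * (x1 - x3) - y1) % P_MOD


def scalar_mult(k: int, point: Point) -> Point:
    """Double-and-add; stands in for EccPoint_mult."""
    result = None
    for bit in bin(k)[2:]:
        result = point_add(result, result)
        if bit == "1":
            result = point_add(result, point)
    return result


def ecdsa_sign_sketch(d: int, e: int, k: Optional[int] = None) -> Tuple[int, int]:
    """Return the signature (r, s) of hash value e under private key d."""
    while True:
        kk = k if k is not None else secrets.randbelow(N - 1) + 1   # (1) 0 < k < n
        x1, _ = scalar_mult(kk, G)             # (2) P = k*G (never O since 0 < k < n)
        r = x1 % N                             # (3) r = x1 mod n
        if r != 0:                             # (4) retry when r = 0
            s = (e + r * d) * pow(kk, -1, N) % N   # (5) s = (e + r*d)/k mod n
            if s != 0:                         # s must be nonzero as well
                return r, s                    # (6) signature
        if k is not None:
            raise ValueError("fixed k gives r = 0 or s = 0; choose another k")
```

For example, ecdsa_sign_sketch(7, 10, k=10) signs hash value 10 with private key 7 and the fixed nonce 10 and returns (7, 4), because 10·G = (7, 11) on this toy curve.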
After this process completes, the signature of message x has been generated. The sending end then transmits the signature of message x to the receiving end over the network; the receiving end performs the verification part of the ECDSA algorithm on its dataflow chip and sends the verification result to its Host, thereby obtaining the securely transmitted message content.
The receiver verifies the signature using the provided public key Q, the hash value h(x) of the data to be verified and the signature (r, s). Verification computes the auxiliary values u1 and u2 from the signature (r, s) and the message hash h(x), then computes the point X from the public key Q, the base point G and the auxiliary values; the signature is valid if the x-coordinate x1 of X satisfies
x1 mod n = r.
FIG. 4 shows the operation of the signature verification function ecdsa_verify according to an embodiment of the invention. The function is annotated as follows:
Function name: ecdsa_verify
Function: determine from the public key, the data hash and the signature whether the data have been modified
Parameters: p_signature[]: 48 × 2 bytes (the signature)
p_publicKey[]: 48 bytes (384 bits)
p_hash[]: 48 bytes (384 bits), the hash of the signed data
Return value: 1 if verification succeeds, 0 otherwise
The function ecdsa_verify performs the following verification steps:
(1) check that both r and s of the signature (r, s) satisfy 0 < r, s < n; if not, verification fails and the program ends;
(2) compute the auxiliary value u1 = h(x)/s mod n;
(3) compute the auxiliary value u2 = r/s mod n;
(4) compute the point X = u1·G + u2·Q and let rx be its x-coordinate;
(5) compare rx with r from the signature (r, s): if rx mod n = r, the signature is valid; otherwise it is invalid.
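The verification steps can be sketched on a hypothetical toy curve (y^2 = x^3 + 2x + 2 over F_17, G = (5, 1), n = 19). Here u1 and u2 share a single inverse w = s^(-1) mod n, a common arrangement assumed for the sketch rather than taken from FIG. 4; the function names are illustrative and e stands for h(x) already converted to an integer.

```python
from typing import Optional, Tuple

P_MOD, A = 17, 2   # hypothetical toy curve y^2 = x^3 + 2x + 2 over F_17 (NOT secure)
G = (5, 1)         # base point
N = 19             # order of G
Point = Optional[Tuple[int, int]]


def point_add(p1: Point, p2: Point) -> Point:
    """Affine addition and doubling; None is the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P_MOD == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1 % P_MOD, -1, P_MOD) % P_MOD
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P_MOD, -1, P_MOD) % P_MOD
    x3 = (lam * lam - x1 - x2) % P_MOD
    return x3, (lam * (x1 - x3) - y1) % P_MOD


def scalar_mult(k: int, point: Point) -> Point:
    """Double-and-add scalar multiplication."""
    result = None
    for bit in bin(k)[2:]:
        result = point_add(result, result)
        if bit == "1":
            result = point_add(result, point)
    return result


def ecdsa_verify_sketch(q: Point, e: int, sig: Tuple[int, int]) -> bool:
    """True if sig = (r, s) is a valid signature of hash value e under public key q."""
    r, s = sig
    if not (0 < r < N and 0 < s < N):        # (1) range check
        return False
    w = pow(s, -1, N)                         # s^(-1) mod n, shared by u1 and u2
    u1 = e * w % N                            # (2) u1 = h(x)/s mod n
    u2 = r * w % N                            # (3) u2 = r/s mod n
    x = point_add(scalar_mult(u1, G), scalar_mult(u2, q))   # (4) X = u1*G + u2*Q
    if x is None:                             # X = O cannot match any r
        return False
    return x[0] % N == r                      # (5) rx mod n == r
```

With the toy public key Q = 7·G = (0, 6), the pair (7, 4) is a valid signature of hash value 10, so ecdsa_verify_sketch((0, 6), 10, (7, 4)) returns True, while changing the hash value to 11 makes it return False.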
At this point, the data have been signed at the sending end and transmitted over the network to the receiving end, which obtains the message content after verifying its correctness and authenticity; the ECDSA algorithm is thus implemented on a dataflow chip.
During their research, the inventors found that the Montgomery large-number multiplication used in the key generation phase is implemented as a loop with multiple nested branches. Montgomery large-number multiplication is described in detail below.
In cryptographic algorithms, public-key operations usually require modular multiplication, e.g. A·B (mod N), where A, B and N are large integers thousands of bits long, and modular reduction involves division. On a computer, division requires an iterative loop, is difficult to implement and takes a long time, so algorithms that speed up modular multiplication are desirable. Montgomery large-number multiplication, proposed by the mathematician Peter Montgomery, accelerates modular multiplication; it requires no division by N, which greatly simplifies the operation.
The principle of Montgomery large number multiplication is as follows:
To compute A·B (mod N), suppose N is written in radix R and has k digits in that representation. Writing A in radix R as well gives
$$A = \sum_{i=0}^{k-1} a_i R^{i}$$
Now consider the expression $A \cdot B \cdot R^{-k} \pmod N$:
$$A \cdot B \cdot R^{-k} = \sum_{i=0}^{k-1} a_i \cdot B \cdot R^{i-k} = \sum_{i=0}^{k-1} \frac{a_i \cdot B}{R^{k-i}}$$
As the equation shows, the program can accumulate the terms for each i and then reduce modulo N to obtain C. Each accumulated term is $a_i \cdot B / R^{k-i}$, which gives the loop
$$C \leftarrow 0;\quad \text{for } i = 0, 1, \dots, k-1:\quad C \leftarrow C + a_i \cdot B,\quad C \leftarrow C / R$$
In this loop, however, the step C /= R discards the remainder. To keep the division exact, a value must be added to C so that the sum is a multiple of R; so that the final result is unchanged modulo N, the added value must itself be a multiple of N, namely $q \cdot N$. The loop becomes
$$C \leftarrow 0;\quad \text{for } i = 0, 1, \dots, k-1:\quad C \leftarrow C + a_i \cdot B + q \cdot N,\quad C \leftarrow C / R$$
Because only the lowest radix-R digit of the sum decides divisibility by R, q must satisfy $(C + a_i \cdot B + q \cdot N) \bmod R = 0$, i.e. $(C_0 + a_i \cdot B_0 + q \cdot N_0) \bmod R = 0$, where $C_0$, $B_0$ and $N_0$ are the lowest (units) digits of C, B and N in radix R, equivalent to reducing each modulo R. One solution is $q = (C_0 + a_i \cdot B_0) \cdot (-N_0^{-1}) \bmod R$, which exists when $N_0$ and R are coprime (e.g. odd N and R a power of 2). Substituting this q gives the loop
$$C \leftarrow 0;\quad \text{for } i = 0, 1, \dots, k-1:\quad q \leftarrow (C_0 + a_i \cdot B_0)(-N_0^{-1}) \bmod R,\quad C \leftarrow (C + a_i \cdot B + q \cdot N) / R$$
The loop shows that the original operation, a large multiplication followed by a modular reduction, is reduced to a sequence of single-digit multiplications, additions and shifts. In particular, R is usually chosen as a power of 2, so that division by R becomes a right shift and reduction modulo R becomes a bit mask, further reducing the complexity of the operation.
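The final loop can be written directly in Python as a sketch. It assumes R = 2^w (w = 8 by default, a hypothetical digit size), an odd modulus N so that N_0 is invertible modulo R, and operands A, B < N. Under these bounds the loop result stays below 2N, so one conditional subtraction of N at the end yields a fully reduced value; that final step is standard practice but is not spelled out in the text above. The function also returns the digit count k so that callers know which R^k was used.

```python
def mont_prod(a: int, b: int, n: int, w: int = 8):
    """Digit-serial Montgomery product a * b * R^(-k) mod n with R = 2**w.

    Returns (result, k), where k is the number of radix-R digits of n.
    Assumes n is odd and 0 <= a, b < n.
    """
    if n % 2 == 0 or n < 3:
        raise ValueError("n must be an odd integer >= 3")
    R = 1 << w
    mask = R - 1                                 # x & mask == x mod R
    k = (n.bit_length() + w - 1) // w            # number of radix-R digits of n
    n_prime = (-pow(n & mask, -1, R)) % R        # -N0^(-1) mod R (precomputable)
    b0 = b & mask                                # B0 (precomputable)
    c = 0
    for i in range(k):
        a_i = (a >> (w * i)) & mask              # i-th radix-R digit of A
        q = (((c & mask) + a_i * b0) * n_prime) & mask   # q = (C0 + a_i*B0)(-N0^-1) mod R
        c = (c + a_i * b + q * n) >> w           # sum is a multiple of R: exact shift
    if c >= n:                                   # the loop result is below 2n
        c -= n
    return c, k
```

Only single-digit products, additions, masks and shifts appear inside the loop; no division by N is performed.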
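The above explains how the computation of $A \cdot B \cdot R^{-k} \pmod N$ is simplified; denote it by mont_prod(A, B). To compute $A \cdot B \pmod N$, the following four expressions can be evaluated:
$$\bar{A} = \mathrm{mont\_prod}(A,\ R^{2k} \bmod N) = A \cdot R^{k} \bmod N$$
$$\bar{B} = \mathrm{mont\_prod}(B,\ R^{2k} \bmod N) = B \cdot R^{k} \bmod N$$
$$\bar{C} = \mathrm{mont\_prod}(\bar{A}, \bar{B}) = A \cdot B \cdot R^{k} \bmod N$$
$$C = \mathrm{mont\_prod}(\bar{C}, 1) = A \cdot B \bmod N$$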
The first two steps are called entering the Montgomery domain and the last step is called leaving the Montgomery domain; the final result C equals (A·B) mod N.
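The four steps can be chained as in the sketch below, which reuses a compact version of the digit-serial loop with R = 2^w (w = 8 assumed) and precomputes R^(2k) mod N once per modulus. The name mod_mul_via_montgomery is hypothetical. In practice, entering and leaving the Montgomery domain pays off only when many multiplications share the same modulus, as in elliptic-curve point arithmetic; for a single product the four calls cost more than they save.

```python
def mont_prod(a: int, b: int, n: int, w: int, k: int, n_prime: int) -> int:
    """a * b * R^(-k) mod n with R = 2**w; assumes n odd, a < R**k and b < n."""
    mask = (1 << w) - 1
    c = 0
    for i in range(k):
        a_i = (a >> (w * i)) & mask
        q = (((c & mask) + a_i * (b & mask)) * n_prime) & mask
        c = (c + a_i * b + q * n) >> w
    return c - n if c >= n else c


def mod_mul_via_montgomery(a: int, b: int, n: int, w: int = 8) -> int:
    """A * B mod N through the four mont_prod steps."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd integer >= 3")
    R = 1 << w
    k = (n.bit_length() + w - 1) // w                  # radix-R digits of n
    n_prime = (-pow(n & (R - 1), -1, R)) % R            # -N0^(-1) mod R
    r2 = pow(R, 2 * k, n)                               # R^(2k) mod N, once per modulus
    a_bar = mont_prod(a % n, r2, n, w, k, n_prime)      # A * R^k mod N (enter)
    b_bar = mont_prod(b % n, r2, n, w, k, n_prime)      # B * R^k mod N (enter)
    c_bar = mont_prod(a_bar, b_bar, n, w, k, n_prime)   # A * B * R^k mod N
    return mont_prod(c_bar, 1, n, w, k, n_prime)        # A * B mod N (leave)
```

For example, mod_mul_via_montgomery(7, 8, 97, w=4) returns 56, the same as 7 * 8 % 97.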
The Montgomery algorithm has the advantages that:
In terms of running time: the Montgomery algorithm removes the modular reduction from modular multiplication, saving the time a computer would otherwise spend on division; the number of loop iterations can be reduced by enlarging R, which adjusts the time spent in the loop; and $B_0$ and $N_0$ can be computed in advance, so only $C_0$ needs to be computed inside the loop, adding no extra overhead to the running time.
In terms of memory: conventional modular multiplication first computes the full product, which for multi-thousand-bit operands needs twice the storage of the original numbers, and reducing that product modulo N takes still more memory; a modular multiplication implemented by shifting moves only one bit per step, which also hurts efficiency. The Montgomery algorithm sets the number of loop iterations through the choice of R and saves the memory needed during loop execution.
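The constant -N_0^(-1) mod R that appears in q depends only on the modulus, so it can be computed once before the loop, in line with the precomputation point above. One common way to do this when R = 2^w is Newton (Hensel) iteration, shown below as a swapped-in technique that the text does not specify: each step doubles the number of correct low-order bits. The function name neg_inv_mod_2w is hypothetical, and Python's built-in pow(n0, -1, 2**w) yields the same inverse.

```python
def neg_inv_mod_2w(n0: int, w: int) -> int:
    """Return -n0^(-1) mod 2**w for odd n0 using Newton (Hensel) lifting."""
    if n0 % 2 == 0:
        raise ValueError("n0 must be odd to be invertible modulo a power of 2")
    mask = (1 << w) - 1
    inv, correct_bits = 1, 1          # n0 * 1 == 1 (mod 2) because n0 is odd
    while correct_bits < w:
        # If n0*inv == 1 mod 2^j, then inv*(2 - n0*inv) is correct mod 2^(2j);
        # the mask keeps only the w bits that matter.
        inv = (inv * (2 - n0 * inv)) & mask
        correct_bits *= 2
    return (-inv) & mask
```

Since n0 only matters modulo 2^w, passing the full modulus N gives the same result as passing its lowest radix-R digit N_0.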
In an actual implementation of Montgomery large-number multiplication, the accumulative nature of the result C can be exploited to optimize the loop and parallelize the computation, further accelerating modular multiplication.
An example code implementation of the Montgomery algorithm is as follows:
[Code listing (image): example implementation of the Montgomery algorithm]
The code between tagA and tagB is the main computation; its structure is a while loop containing 4 nested branches, which corresponds to the prior-art Montgomery large-number multiplication loop shown schematically in FIG. 5, where the nested multi-way branches of the loop are shown.
FIG. 6 shows a dataflow chip architecture diagram according to an embodiment of the invention. Considering the transfer relationships and dependencies among the multi-way branches, the loop program shown in FIG. 5 maps well onto the common dataflow chip architecture of FIG. 6. Therefore, in the dataflow architecture, branch prediction is performed from history records, which supply the downstream PE number or the data sending direction; this can improve the utilization of the functional units in the dataflow chip and shorten key generation time.
According to an embodiment of the present invention, the present invention provides a branch prediction mechanism based on a data flow structure, in which the method decomposes a montgomery number multiplication step in an ECDSA algorithm into a loop program, and proposes a branch prediction mechanism in a data flow architecture according to an execution characteristic of the loop program. The branch prediction mechanism predicts the downstream data sending direction through historical records, reduces the jump instruction time of a loop program, reduces the PE waiting time caused by data dependence, and improves the component utilization rate of the PE array.
According to one embodiment of the invention, the branch prediction mechanism comprises: each PE internally maintains a branch transfer history table of an instruction; FIG. 7 illustrates a branch history table corresponding to a branch prediction mechanism, including an instruction ID, a forwarding flag, and a forwarding destination, according to an embodiment of the invention. The instruction ID records the serial number of the branch instruction, the instruction passes through a decoding stage, and if the instruction is the branch instruction, the ID of the instruction is added into a branch history table. When the PE transmits the instruction, if the transmitted instruction ID is stored in the branch history table and the forwarding mark g of the instruction is forwarding confirmation, the data forwarding destination information in the branch history table is sent to a route linking the PE, and the operation result is forwarded to the corresponding PE according to the forwarding destination.
The purpose of the forwarding flag is to mark whether the forwarding record is valid, and when a branch instruction is executed for the first time, the forwarding destination is filled in the table according to the forwarding destination of the instruction, and the forwarding flag is marked as true. If the forwarding purpose of the actual program execution result is different from the forwarding purpose in the table for a certain execution, the forwarding flag is set false, which indicates the last prediction error, and the purpose needs to be updated according to the actual forwarding purpose after the instruction execution is finished.
Unlike a conventional branch recording mechanism, which records the address of the next instruction, the forwarding destination field records where the result data of the instruction is forwarded. Because a dataflow program is represented as a dataflow graph and executes through the flow of data between its nodes, the destination (forwarding direction) of that data flow is what gets predicted. According to one embodiment of the invention, four data forwarding directions are used: up, down, left, and right.
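As a hedged model of the table fields and update rules described above, the Python sketch below keeps one entry per branch instruction ID, holding the forwarding flag and one of the four forwarding directions. The class and method names are hypothetical. One point the text leaves open is filled by assumption: after a misprediction clears the flag, the flag is set again only when a later execution confirms the updated destination.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class Direction(Enum):
    """The four data forwarding directions named in the text."""
    UP = "up"
    DOWN = "down"
    LEFT = "left"
    RIGHT = "right"


@dataclass
class BranchEntry:
    forwarding_flag: bool               # True: recorded destination is trusted
    destination: Optional[Direction]    # where the result data was last forwarded


class BranchHistoryTable:
    """Hypothetical per-PE branch history table keyed by instruction ID."""

    def __init__(self) -> None:
        self.entries: Dict[int, BranchEntry] = {}

    def on_decode(self, instr_id: int, is_branch: bool) -> None:
        # Decode stage: add the ID of a branch instruction to the table.
        if is_branch and instr_id not in self.entries:
            self.entries[instr_id] = BranchEntry(False, None)

    def predict(self, instr_id: int) -> Optional[Direction]:
        # Issue stage: forward early only if the entry exists and its flag is set.
        entry = self.entries.get(instr_id)
        if entry is not None and entry.forwarding_flag:
            return entry.destination
        return None

    def on_complete(self, instr_id: int, actual: Direction) -> bool:
        """Record the actual forwarding direction; return True on a correct prediction."""
        entry = self.entries[instr_id]
        hit = entry.forwarding_flag and entry.destination == actual
        if entry.destination is None:
            # First execution: fill in the destination and mark it valid.
            entry.destination, entry.forwarding_flag = actual, True
        elif entry.destination != actual:
            # Misprediction: clear the flag and update the destination.
            entry.forwarding_flag, entry.destination = False, actual
        else:
            # Assumption: a repeat of the recorded destination re-validates the entry.
            entry.forwarding_flag = True
        return hit
```

For example, a branch that forwards right twice and then down is predicted correctly on its second execution, mispredicted on its third, and offers no prediction on its fourth, because the flag was cleared.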
Specifically, FIG. 8 shows prior-art loop code for Montgomery multiplication with its code segments numbered: reference numeral 1 denotes the main while loop, reference numerals 2, 3, 4, and 5 denote the condition statements of the four branches inside the while loop, and reference numerals 6, 7, 8, and 9 denote the code segments of those four branches. The code of FIG. 8 corresponds to the dataflow graph for large-number Montgomery multiplication shown in FIG. 9, whose node numbers match the code segment numbers in FIG. 8. FIG. 10 shows an example of mapping the dataflow graph of FIG. 9 onto a PE array: nodes 1, 2, 3, 4, and 5 are mapped to PE101, and the four PEs adjacent to PE101 hold nodes 6, 7, 8, and 9, respectively. The branch transfer history table in PE101 records the ID, forwarding destination, and forwarding flag of each branch instruction. When PE101 issues a branch instruction, it uses the corresponding entry to forward the operation result to one of its four adjacent PEs.
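Because FIG. 8's code is not reproduced in the text, the following Python sketch is only a stand-in under stated assumptions. It uses the four-branch while loop of phase 1 of Kaliski's Montgomery inverse as an example loop, places PE101 at grid position (1, 1), and assigns branch bodies 6 to 9 to PE101's up, down, left, and right neighbours (an assumed assignment). It then replays the branch trace through a single forwarding-table entry, a simplification that treats the four-way dispatch as one branch instruction, and counts how often the recorded direction was right. All names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

# Assumed layout in the spirit of FIG. 10: PE101 at grid position (1, 1)
# holds nodes 1-5; branch bodies 6-9 sit on its four neighbours.
PE101 = (1, 1)
BODY_TO_DIRECTION: Dict[int, str] = {6: "up", 7: "down", 8: "left", 9: "right"}
OFFSETS: Dict[str, Tuple[int, int]] = {
    "up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def target_pe(direction: str) -> Tuple[int, int]:
    """Grid position of the neighbour of PE101 in the given direction."""
    dx, dy = OFFSETS[direction]
    return PE101[0] + dx, PE101[1] + dy


def branch_trace(p: int, a: int) -> List[int]:
    """Branch bodies taken by a four-branch while loop (stand-in for FIG. 8).

    Loop body: phase 1 of Kaliski's Montgomery inverse (p odd, 0 < a < p).
    r and s are carried along as in that algorithm; only the body is recorded.
    """
    u, v, r, s = p, a, 0, 1
    trace: List[int] = []
    while v > 0:
        if u % 2 == 0:
            u, s, body = u // 2, 2 * s, 6
        elif v % 2 == 0:
            v, r, body = v // 2, 2 * r, 7
        elif u > v:
            u, r, s, body = (u - v) // 2, r + s, 2 * s, 8
        else:
            v, s, r, body = (v - u) // 2, s + r, 2 * r, 9
        trace.append(body)
    return trace


def simulate(trace: List[int]) -> Tuple[int, int]:
    """Replay a trace through one forwarding-table entry; return (hits, branches)."""
    destination: Optional[str] = None  # forwarding destination field
    flag = False                       # forwarding flag field
    hits = 0
    for body in trace:
        actual = BODY_TO_DIRECTION[body]
        if flag and destination == actual:
            hits += 1  # the recorded direction matched the actual one
        # First execution or a repeat validates the entry; a change clears the flag.
        flag = destination is None or destination == actual
        destination = actual
    return hits, len(trace)
```

In this model, runs of the same branch (for example consecutive halvings while `u` is even) are predicted correctly after their first iteration, while every change of branch is a miss, so the hit rate depends on the operands.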
According to an embodiment of the present invention, an ECDSA execution method based on the above system is provided, comprising the following steps. The sending-end device and the receiving-end device each comprise a Host and a dataflow chip. The Host at the sending end generates a private key that meets the requirements and sends it to the dataflow chip of the sending-end device; the dataflow chip derives the public key from the private key and returns the public key to the Host at the sending end. The dataflow chip then computes a signature over the data to be sent using the private key, after which the signature and the public key can be transmitted to the receiving end over a network. The dataflow chip at the receiving end verifies the signature and sends the verification result to the Host at the receiving end, completing the ECDSA process.
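The division of work described above (Host generates the private key; the sending-end chip derives the public key and signs; the receiving-end chip verifies) can be sketched in Python. The patent names no curve, hash function, or API, so this sketch assumes the secp256k1 curve and SHA-256, and every function name is hypothetical. It uses plain affine double-and-add arithmetic that is not constant-time, so it only illustrates the data flow between Host and chip, not a production implementation.

```python
import hashlib
import secrets
from typing import Optional, Tuple

# secp256k1 domain parameters (curve y^2 = x^3 + 7); the curve choice is an assumption.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFF_FFFFFFFF_FFFFFFFF_FFFFFFFE_BAAEDCE6_AF48A03B_BFD25E8C_D0364141
G = (0x79BE667E_F9DCBBAC_55A06295_CE870B07_029BFCDB_2DCE28D9_59F2815B_16F81798,
     0x483ADA77_26A3C465_5DA4FBFC_0E1108A8_FD17B448_A6855419_9C47D08F_FB10D4B8)
Point = Optional[Tuple[int, int]]  # None stands for the point at infinity


def point_add(p1: Point, p2: Point) -> Point:
    """Affine point addition/doubling on y^2 = x^3 + 7 over GF(P)."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None  # p2 == -p1
    if p1 == p2:
        lam = 3 * x1 * x1 * pow(2 * y1, -1, P) % P  # tangent slope (curve a = 0)
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P  # chord slope
    x3 = (lam * lam - x1 - x2) % P
    return x3, (lam * (x1 - x3) - y1) % P


def scalar_mult(k: int, pt: Point) -> Point:
    """Right-to-left double-and-add (not constant-time)."""
    result: Point = None
    while k:
        if k & 1:
            result = point_add(result, pt)
        pt = point_add(pt, pt)
        k >>= 1
    return result


def hash_to_int(msg: bytes) -> int:
    return int.from_bytes(hashlib.sha256(msg).digest(), "big") % N


def host_generate_private_key() -> int:
    # Sending-end Host: private key d chosen uniformly from [1, N - 1].
    return secrets.randbelow(N - 1) + 1


def chip_public_key(d: int) -> Point:
    # Sending-end dataflow chip: public key Q = d * G.
    return scalar_mult(d, G)


def chip_sign(d: int, msg: bytes) -> Tuple[int, int]:
    # Sending-end dataflow chip: signature (r, s) over msg with private key d.
    e = hash_to_int(msg)
    while True:
        k = secrets.randbelow(N - 1) + 1  # fresh per-signature nonce
        R = scalar_mult(k, G)
        assert R is not None
        r = R[0] % N
        s = pow(k, -1, N) * (e + r * d) % N
        if r != 0 and s != 0:
            return r, s


def chip_verify(Q: Point, msg: bytes, sig: Tuple[int, int]) -> bool:
    # Receiving-end dataflow chip: check (r, s) against public key Q.
    r, s = sig
    if not (1 <= r < N and 1 <= s < N):
        return False
    w = pow(s, -1, N)
    u1, u2 = hash_to_int(msg) * w % N, r * w % N
    X = point_add(scalar_mult(u1, G), scalar_mult(u2, Q))
    return X is not None and X[0] % N == r
```

In this sketch the field arithmetic relies on Python's built-in `%` and `pow`; on the dataflow chip these modular multiplications are presumably where the Montgomery multiplication loop mapped onto the PE array would be used.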
It should be understood that various modifications and improvements may be made to the invention described in detail above without departing from the spirit and scope of the invention as defined by the appended claims. Accordingly, the scope of the claimed subject matter is not limited by any of the specific exemplary teachings provided.

Claims (10)

1. An ECDSA algorithm execution system, comprising a sending end and a receiving end,
wherein the sending end comprises a first device and a first dataflow chip, the first device being configured to generate a private key, and the first dataflow chip being configured to generate a public key from the private key and to sign a message using the private key; and the receiving end comprises a second device and a second dataflow chip, the second dataflow chip being configured to obtain a verification result from the received public key and message signature, and the second device being configured to receive the verification result.
2. The system of claim 1, wherein each PE of the first dataflow chip and/or the second dataflow chip forwards operation results based on a branch transfer history table of instructions.
3. The system of claim 2, wherein the branch transfer history table comprises an instruction ID recording the sequence number of a branch instruction, a forwarding flag marking the validity of a forwarding record, and a forwarding destination recording where the result data of the branch instruction is forwarded.
4. The system of claim 3, wherein the forwarding destination is a data forwarding direction, the data forwarding direction being one of up, down, left, and right.
5. The system of claim 4, wherein, in the instruction decode stage, the ID of a branch instruction is written to the branch transfer history table.
6. The system of claim 5, wherein, upon the first execution of a branch instruction, the forwarding destination of the branch instruction is written to the branch transfer history table entry identified by the ID of the branch instruction, and the corresponding forwarding flag is asserted.
7. The system of claim 5, wherein, upon execution of a branch instruction, it is determined whether the result forwarding destination of the branch instruction matches the corresponding forwarding destination in the branch transfer history table; if not, the forwarding flag is invalidated, and upon completion of execution of the branch instruction, the forwarding destination of the branch instruction is updated.
8. An ECDSA algorithm execution method based on the system of any one of claims 1 to 7, comprising:
generating, by a first dataflow chip of a sending end, a public key from a private key generated by a first device of the sending end;
obtaining, by the first dataflow chip, a message signature based on the private key and a message;
sending, by the sending end, the public key and the message signature to a receiving end;
obtaining, by a second dataflow chip of the receiving end, a verification result from the received public key and message signature; and
receiving, by a second device of the receiving end, the verification result.
9. The execution method of claim 8, comprising: mapping the Montgomery multiplication of the ECDSA algorithm onto the first dataflow chip and the second dataflow chip based on a branch prediction mechanism.
10. The execution method of claim 9, wherein the branch prediction mechanism predicts the data forwarding direction of a PE by means of a branch transfer history table.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110747114.2A CN113505383A (en) 2021-07-02 2021-07-02 ECDSA algorithm execution system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110747114.2A CN113505383A (en) 2021-07-02 2021-07-02 ECDSA algorithm execution system and method

Publications (1)

Publication Number Publication Date
CN113505383A (en) 2021-10-15

Family

ID=78009796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110747114.2A Pending CN113505383A (en) 2021-07-02 2021-07-02 ECDSA algorithm execution system and method

Country Status (1)

Country Link
CN (1) CN113505383A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114429051A * 2022-04-01 2022-05-03 Shenzhen Kunyun Information Technology Co., Ltd. Modeling method, device, equipment and medium of dataflow chip

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090327746A1 * 2007-04-10 2009-12-31 International Business Machines Corporation Key encryption and decryption
CN104065648A * 2014-06-05 2014-09-24 Tiandirong Technology Co., Ltd. Data processing method of voice communication
CN106155979A * 2016-05-19 2016-11-23 Southeast University - Wuxi Institute of Integrated Circuit Technology DES algorithm key expansion system and expansion method based on a coarse-grained reconfigurable architecture
CN111738703A * 2020-05-29 2020-10-02 Institute of Computing Technology, Chinese Academy of Sciences Accelerator for accelerating secure hash algorithm
CN112215349A * 2020-09-16 2021-01-12 Institute of Computing Technology, Chinese Academy of Sciences Sparse convolutional neural network acceleration method and device based on dataflow architecture
CN113032011A * 2021-03-12 2021-06-25 Beijing Ruixin Dataflow Technology Co., Ltd. Method and system for executing loop program in dataflow architecture
CN113032845A * 2021-03-31 2021-06-25 Zhengzhou Xinda Jiean Information Technology Co., Ltd. EdDSA signature implementation method and device for resource-constrained chip

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
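尹玲 (Yin Ling): "Design of a high-performance scalar multiplier for elliptic curve cryptography over binary fields", China Master's Theses Full-text Database, Information Science and Technology, no. 12, page 2 *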

Similar Documents

Publication Publication Date Title
Van Oorschot et al. Parallel collision search with cryptanalytic applications
Albrecht et al. Implementing RLWE-based schemes using an RSA co-processor
CN107040385B (en) Method and system for realizing signature verification algorithm based on SM2 elliptic curve
US8340280B2 (en) Using a single instruction multiple data (SIMD) instruction to speed up galois counter mode (GCM) computations
JP5064408B2 (en) Digital signature for network encoding
US7904498B2 (en) Modular multiplication processing apparatus
CN113628094B (en) High-throughput SM2 digital signature computing system and method based on GPU
CN109039640B (en) Encryption and decryption hardware system and method based on RSA cryptographic algorithm
CN115622684B (en) Privacy computation heterogeneous acceleration method and device based on fully homomorphic encryption
US7835517B2 (en) Encryption processing apparatus, encryption processing method, and computer program
KR20070008012A (en) Cryptographic apparatus and method for fast computation of blinding-exponent dpa countermeasure
US20090136025A1 (en) Method for scalarly multiplying points on an elliptic curve
WO2010048719A1 (en) Method and apparatus for modulus reduction
Lee et al. TensorCrypto: High throughput acceleration of lattice-based cryptography using tensor core on GPU
Koppermann et al. 18 seconds to key exchange: Limitations of supersingular isogeny Diffie-Hellman on embedded devices
CN109144472B (en) Scalar multiplication of binary extended field elliptic curve and implementation circuit thereof
Dong et al. Utilizing the Double‐Precision Floating‐Point Computing Power of GPUs for RSA Acceleration
CN113505383A (en) ECDSA algorithm execution system and method
CN111314080B (en) SM9 algorithm-based collaborative signature method, device and medium
CN115952526B (en) Ciphertext ordering method, equipment and storage medium
CN113114462A Small-area scalar multiplication circuit applied to ECC (elliptic curve cryptography) security hardware circuit
CN117155572A Method for realizing large integer multiplication in cryptographic technology based on GPU (graphics processing unit) parallelism
Trujillo-Olaya et al. Hardware implementation of elliptic curve digital signature algorithm over GF(2^409) using SHA-3
KR102587719B1 (en) Circuits, computing chips, data processing devices and methods for performing hash algorithms
Elango et al. High-Performance Multi-RNS-Assisted Concurrent RSA Cryptosystem Architectures

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination