CN110795748A - Method, system and medium for realizing stream cipher algorithm based on reconfigurable computing array - Google Patents

Publication number
CN110795748A
Authority
CN
China
Prior art keywords
configuration
operator
reconfigurable computing
value
computing array
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911020613.0A
Other languages
Chinese (zh)
Other versions
CN110795748B (en)
Inventor
刘雷波
朱敏
魏少军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Research Institute of Applied Technologies of Tsinghua University
Original Assignee
Wuxi Research Institute of Applied Technologies of Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Research Institute of Applied Technologies of Tsinghua University
Priority to CN201911020613.0A
Publication of CN110795748A
Application granted
Publication of CN110795748B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/602 Providing cryptographic facilities or services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867 Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • G06F15/7871 Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Microcomputers (AREA)
  • Logic Circuits (AREA)

Abstract

Embodiments of the invention provide a method, a system and a computer-readable storage medium for implementing a stream cipher algorithm based on a reconfigurable computing array. The method comprises: acquiring first configuration information and performing a first configuration on the reconfigurable computing array according to the first configuration information; acquiring second configuration information and performing a second configuration on the reconfigurable computing array according to the second configuration information; and acquiring third configuration information and performing a third configuration on the reconfigurable computing array according to the third configuration information. After the first, second and third configurations, the reconfigurable computing array can perform, in sequence, initialization, a 32-iteration loop operation and an N-iteration loop computation on an initialization variable, a first value of a constant register and a fixed key to obtain N keys, where N is a preset number of loop iterations. By implementing the stream cipher algorithm on the reconfigurable computing array, the invention improves the computing efficiency and the flexibility of the stream cipher algorithm.

Description

Method, system and medium for realizing stream cipher algorithm based on reconfigurable computing array
Technical Field
The present invention relates to the field of reconfigurable computing, and in particular, to a method, a system, and a computer-readable storage medium for implementing a stream cipher algorithm based on a reconfigurable computing array.
Background
A reconfigurable computing array combines the high performance and high speed of an Application Specific Integrated Circuit (ASIC) with the generality and programmability of a microprocessor, and so makes up for the respective shortcomings of the microprocessor and the ASIC. A stream cipher algorithm (e.g., the Snow3G algorithm) can provide data security for every link in the 3G communication industry chain, such as chip manufacturers, device manufacturers, system integrators and network operators, and is widely used in the encryption field.
However, in implementing the inventive concept, the inventors found that the conventional Snow3G algorithm is generally implemented in one of two ways. The first is on a general-purpose microprocessor: this is flexible to use and convenient to update, but the limited computing performance of the microprocessor leaves Snow3G performance far short of requirements. The second is on a dedicated ASIC, which is the exact opposite: performance is very high, but flexibility is very poor, and when the algorithm needs to be replaced the chip has to be replaced, so the cost of use is very high.
Disclosure of Invention
One aspect of the present invention provides a method for implementing a stream cipher algorithm based on a reconfigurable computing array. The method comprises: acquiring first configuration information and performing a first configuration on the reconfigurable computing array according to the first configuration information, so that the reconfigurable computing array after the first configuration processes an initialization variable, a first value of a constant register and a fixed key to obtain an initial value of a linear feedback shift register, an initial value of a finite state machine register and a first value output by a finite state machine; acquiring second configuration information and performing a second configuration on the reconfigurable computing array according to the second configuration information, so that the reconfigurable computing array after the second configuration performs a 32-iteration loop operation on the initial value of the linear feedback shift register, the initial value of the finite state machine register, the first value output by the finite state machine and a second value of the constant register to obtain a second value of the linear feedback shift register, a second value of the finite state machine register and a second value output by the finite state machine; and acquiring third configuration information and performing a third configuration on the reconfigurable computing array according to the third configuration information, so that the reconfigurable computing array after the third configuration performs an N-iteration loop computation on the second value of the linear feedback shift register, the second value of the finite state machine register, the second value output by the finite state machine and the second value of the constant register to obtain N keys, where N is a preset number of loop iterations.
Optionally, the performing the first configuration on the reconfigurable computing array includes: configuring at least one operator of a plurality of operators of the reconfigurable compute array as a first operator for implementing a logic function. The second configuring the reconfigurable computing array comprises: configuring at least three operators of the plurality of operators of the reconfigurable computing array as a second operator for implementing a logic function, a third operator for implementing a look-up table function, and a fourth operator for implementing a bit permutation function. The third configuring the reconfigurable computing array comprises: configuring at least three operators of the plurality of operators of the reconfigurable computing array as a fifth operator for implementing a logic function, a sixth operator for implementing a look-up table function, and a seventh operator for implementing a bit permutation function.
Optionally, the method further includes: when performing the first configuration, the second configuration or the third configuration, determining the data processing period of each configured operator, and configuring operators that have the same data processing period to operate in parallel.
Optionally, when the first configuration, the second configuration or the third configuration is performed, the configured operator includes a plurality of input ports, and the method further includes: determining the operation period corresponding to the input data of each of the plurality of input ports of the configured operator. When the operation period corresponding to the input data of at least one input port differs from the operation period corresponding to the input data of at least one other input port, the input data with the smaller operation period is buffered for M periods until its operation period equals that of the input data with the larger operation period, where M is a positive integer.
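The period-alignment rule above can be illustrated with a minimal sketch. The function name `align_inputs` and the modelling of each input port by the cycle in which its data becomes ready are assumptions made for illustration; the source only states that the earlier input is buffered for M periods until it matches the later one.

```python
def align_inputs(ready_cycles):
    """Return the number of buffer cycles M to add on each input port.

    ready_cycles maps a (hypothetical) port name to the cycle in which its
    input data becomes available. Data that arrives early is delayed so
    that every port presents its data in the same cycle as the latest one.
    """
    latest = max(ready_cycles.values())
    return {port: latest - cycle for port, cycle in ready_cycles.items()}


# Hypothetical operator with two input ports: "a" is ready in cycle 2 and
# "b" in cycle 5, so "a" is buffered for M = 3 cycles and "b" for none.
delays = align_inputs({"a": 2, "b": 5})
print(delays)
```

In this model a port that is already aligned gets a delay of zero, which corresponds to adding no buffer on that port.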
Another aspect of the present invention provides a system for implementing a stream cipher algorithm based on a reconfigurable computing array. The system comprises a first configuration module, a second configuration module and a third configuration module. The first configuration module is configured to acquire first configuration information and perform a first configuration on the reconfigurable computing array according to the first configuration information, so that the reconfigurable computing array after the first configuration processes an initialization variable, a first value of a constant register and a fixed key to obtain an initial value of a linear feedback shift register, an initial value of a finite state machine register and a first value output by a finite state machine. The second configuration module is configured to acquire second configuration information and perform a second configuration on the reconfigurable computing array according to the second configuration information, so that the reconfigurable computing array after the second configuration performs a 32-iteration loop operation on the initial value of the linear feedback shift register, the initial value of the finite state machine register, the first value output by the finite state machine and a second value of the constant register to obtain a second value of the linear feedback shift register, a second value of the finite state machine register and a second value output by the finite state machine.
The third configuration module is configured to acquire third configuration information and perform a third configuration on the reconfigurable computing array according to the third configuration information, so that the reconfigurable computing array after the third configuration performs an N-iteration loop computation on the second value of the linear feedback shift register, the second value of the finite state machine register, the second value output by the finite state machine and the second value of the constant register to obtain N keys, where N is a preset number of loop iterations.
Optionally, when the first configuration module performs the first configuration on the reconfigurable computing array, the first configuration module is specifically configured to configure at least one operator of the multiple operators of the reconfigurable computing array as a first operator for implementing a logic function. When the second configuration module performs a second configuration on the reconfigurable computing array, the second configuration module is specifically configured to configure at least three operators of the multiple operators of the reconfigurable computing array as a second operator for implementing a logic function, a third operator for implementing a look-up table function, and a fourth operator for implementing a bit permutation function. When the third configuration module performs a third configuration on the reconfigurable computing array, the third configuration module is specifically configured to configure at least three operators of the multiple operators of the reconfigurable computing array as a fifth operator for implementing a logic function, a sixth operator for implementing a look-up table function, and a seventh operator for implementing a bit permutation function.
Optionally, the system further includes: the first determining module is configured to determine a data processing period corresponding to each configured operator when performing the first configuration, the second configuration, or the third configuration, and configure the operators having the same data processing period as a parallel operation.
Optionally, when the first configuration, the second configuration or the third configuration is performed, the configured operator includes a plurality of input ports, and the system further includes a second determining module and a caching module. The second determining module is configured to determine the operation period corresponding to the input data of each of the plurality of input ports of the configured operator. The caching module is configured to, when the operation period corresponding to the input data of at least one input port differs from the operation period corresponding to the input data of at least one other input port, buffer the input data with the smaller operation period for M periods until its operation period equals that of the input data with the larger operation period, where M is a positive integer.
Another aspect of the invention provides a computing device comprising: at least one memory storing executable instructions, and at least one processor executing the executable instructions to implement the method as described above.
Another aspect of the invention provides a computer-readable storage medium storing computer-executable instructions for implementing the method as described above when executed.
Another aspect of the invention provides a computer program comprising computer executable instructions for implementing a method as described above when executed.
Therefore, in the technical scheme of the embodiment of the invention, the process of the stream cipher algorithm is realized through the reconfigurable computing array, the computing efficiency and the flexibility of the stream cipher algorithm are improved, and the use cost is reduced.
Drawings
For a more complete understanding of the present invention, and the advantages thereof, reference is now made to the following descriptions taken in conjunction with the accompanying drawings, in which:
FIG. 1 schematically illustrates the architecture of a reconfigurable computing array provided for embodiments of the present invention;
FIG. 2 schematically illustrates a flow diagram of a method for implementing the Snow3G algorithm based on a reconfigurable compute array, according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating a reconfigurable computing array according to an embodiment of the present invention after a second configuration, executing 32 times in a loop;
FIG. 4 is a schematic diagram that schematically illustrates a cascaded computation by a plurality of operators, in accordance with an embodiment of the present invention;
FIG. 5 schematically illustrates a diagram of optimizing a computation cycle, according to an embodiment of the invention;
FIGS. 6A-6B schematically illustrate the calculation cycle before and after optimization according to an embodiment of the invention;
FIG. 7 schematically illustrates a block diagram of a system for implementing a stream cipher algorithm based on a reconfigurable compute array, according to an embodiment of the present invention; and
FIG. 8 schematically illustrates a block diagram of a computer system for implementing a stream cipher algorithm based on a reconfigurable compute array, according to an embodiment of the present invention.
Detailed Description
Hereinafter, embodiments of the present invention will be described with reference to the accompanying drawings. It is to be understood that such description is merely illustrative and not intended to limit the scope of the present invention. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present invention.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to limit the invention. The words "a", "an" and "the" and the like as used herein are intended to include the plural forms as well, unless the context clearly dictates otherwise. Furthermore, the terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Some block diagrams and/or flow diagrams are shown in the figures. It will be understood that some blocks of the block diagrams and/or flowchart illustrations, or combinations thereof, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the instructions, which execute via the processor, create means for implementing the functions/acts specified in the block diagrams and/or flowchart block or blocks.
Accordingly, the techniques of the present invention may be implemented in hardware and/or in software (including firmware, microcode, etc.). Furthermore, the techniques of this disclosure may take the form of a computer program product on a computer-readable medium having instructions stored thereon for use by or in connection with an instruction execution system. In the context of the present invention, a computer-readable medium may be any medium that can contain, store, communicate, propagate, or transport the instructions. For example, the computer readable medium can include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of the computer readable medium include: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and/or wired/wireless communication links.
Embodiments of the invention provide a method, a system and a computer-readable storage medium for implementing a stream cipher algorithm based on a reconfigurable computing array. The method comprises three steps. First, first configuration information is acquired and a first configuration is performed on the reconfigurable computing array according to it; the reconfigurable computing array after the first configuration processes the initialization variable, a first value of a constant register and a fixed key to obtain an initial value of a linear feedback shift register, an initial value of a finite state machine register and a first value output by a finite state machine. Second, second configuration information is acquired and a second configuration is performed on the reconfigurable computing array according to it; the reconfigurable computing array after the second configuration performs a 32-iteration loop operation on the initial value of the linear feedback shift register, the initial value of the finite state machine register, the first value output by the finite state machine and a second value of the constant register to obtain a second value of the linear feedback shift register, a second value of the finite state machine register and a second value output by the finite state machine. Third, third configuration information is acquired and a third configuration is performed on the reconfigurable computing array according to it; the reconfigurable computing array after the third configuration performs an N-iteration loop computation on the second value of the linear feedback shift register, the second value of the finite state machine register, the second value output by the finite state machine and the second value of the constant register to obtain N keys, where N is a preset number of loop iterations.
Specifically, the stream cipher algorithm of the embodiment of the present invention is implemented based on a reconfigurable computing array. Before describing how a reconfigurable computing array implements a stream cipher algorithm, a reconfigurable computing array for implementing a stream cipher algorithm is first described, as shown in fig. 1.
Fig. 1 schematically illustrates the architecture of a reconfigurable computing array in an implementation scenario of an embodiment of the invention. As shown in fig. 1, the reconfigurable computing array includes a reconfigurable cryptographic algorithm module 100.
The structure and function of each component in reconfigurable cryptographic algorithm module 100 shown in fig. 1 according to the embodiment of the present invention are explained in detail below, wherein the operator is the basic operation unit of the reconfigurable computing array.
Input first-in first-out register (Input First In First Out, IFIFO);
Output first-in first-out register (Output First In First Out, OFIFO);
Internal memory (General Purpose Register File, GPRF);
Reconfigurable Cell Array (RCA): includes four array sub-blocks, the IFIFO, the OFIFO and the GPRF; the four array sub-blocks can be configured to perform different functions;
Reconfigurable Context Manager (RCM): configures the circuit connection structure of the RCA;
Reconfigurable Scheduling Manager (RSM): controls the flow, stalling, etc. of data in the array;
Array interface module (Reconfigurable Controller Interface, RCI): the interface between the array and the external bus;
Reconfigurable circulator (RLM): compresses the configuration commands from the CPU so that commands execute more efficiently;
Basic Functional Unit (BFU): an operator for implementing a logic function;
Nonlinear substitution box (SBOX): an operator for implementing a look-up table function;
BENES: an operator for implementing a bit permutation function;
The main body of the reconfigurable cryptographic algorithm module 100 is the reconfigurable computing array RCA, which comprises $M_1$ computing array sub-blocks; each sub-block has $M_2$ rows and each row has $M_3$ operators. The $M_3$ operators include $M_4$ BFUs, $M_5$ SBOXes and $M_6$ BENES operators, and the operators can be dynamically adjusted according to the configuration information to carry out different algorithm operations. $M_1$ to $M_6$ are all integers.
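As a rough illustration of the array dimensions just described, the following sketch models the parameters $M_1$ to $M_6$ as a small data class. The class name, field names and sample values are hypothetical, and treating $M_3$ as exactly $M_4 + M_5 + M_6$ is an assumption: the text says only that the $M_3$ operators "include" these three kinds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayShape:
    sub_blocks: int     # M1: number of computing array sub-blocks
    rows: int           # M2: rows per sub-block
    bfu_per_row: int    # M4: BFU operators per row
    sbox_per_row: int   # M5: SBOX operators per row
    benes_per_row: int  # M6: BENES operators per row

    @property
    def operators_per_row(self) -> int:
        # M3, under the assumption that a row holds only these three kinds.
        return self.bfu_per_row + self.sbox_per_row + self.benes_per_row

    def total(self, per_row: int) -> int:
        # Total count across the array for a given per-row count.
        return self.sub_blocks * self.rows * per_row


# Sample values are made up for illustration only.
shape = ArrayShape(sub_blocks=4, rows=4, bfu_per_row=8, sbox_per_row=4, benes_per_row=1)
print(shape.operators_per_row, shape.total(shape.bfu_per_row))
```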
A method for implementing a stream cipher algorithm based on a reconfigurable computing array will be described in detail below with reference to the structure of the reconfigurable computing array of fig. 1. The stream cipher algorithm is exemplified by the Snow3G algorithm. It should be understood that configuring the reconfigurable compute array is part of mapping the Snow3G algorithm to the reconfigurable compute array. Specifically, when the Snow3G algorithm is mapped to the reconfigurable computing array in the invention, the method comprises the following steps:
(1) analyzing a Snow3G algorithm, and mapping the Snow3G algorithm into 3 data flow diagrams by combining the characteristics of a reconfigurable computing array;
(2) generating corresponding configuration information according to the 3 data flow graphs;
(3) storing the configuration information and the operation data into corresponding memories;
(4) the reconfigurable computing array analyzes the configuration information, the microprocessor instructs the reconfigurable computing array to perform computing, and after the computing is finished, the reconfigurable computing array sends an interrupt signal to inform the microprocessor to read data.
FIG. 2 is a flow chart schematically illustrating a method for implementing the Snow3G algorithm based on a reconfigurable computing array according to an embodiment of the present invention, where the method includes operations S210-S230.
According to an embodiment of the invention, the main body of the Snow3G algorithm is divided into two phases: an initialization phase and a key stream generation phase. By analyzing the characteristics of the Snow3G algorithm together with those of the reconfigurable computing array, the configuration process for the Snow3G algorithm can be divided into three steps, namely operations S210-S230.
The Snow3G algorithm features may include similarity between components of the algorithm and/or order of execution. For example, the following (1) to (2) can be summarized:
(1) The initialization stage and the key stream generation stage involve similar computations
According to an embodiment of the invention, the only structural difference between the 32-iteration loop of the initialization stage and the key stream generation stage is whether the F value takes part in the XOR operation that computes the v value: F takes part in that XOR during the 32-iteration loop and does not take part during the key stream generation stage. The computation of the v value is described with reference to FIG. 3. The two stages are therefore very similar in algorithm logic. The advantage is that only the logic of the 32-iteration loop needs to be worked into a data flow graph; that graph is then copied, and the part involving the F value is modified to obtain the data flow graph of the key stream generation stage. This greatly simplifies mapping the algorithm logic into three data flow graphs. Specifically, in the data flow graph of the 32-iteration loop, one operator takes the F value as an input; in the data flow graph of the key stream generation stage, the F input to that operator must be deleted.
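The copy-and-modify step described above can be sketched as follows. The dictionary-based graph representation, the node name `v_xor` and the input name `lfsr_feedback` are hypothetical stand-ins; the source does not specify how its data flow graphs are encoded, only that the key stream graph is a copy of the loop graph with the F input removed.

```python
import copy

# Hypothetical fragment of the 32-iteration loop graph: the operator that
# computes v takes the LFSR feedback term and the FSM output F as inputs.
loop_graph = {
    "v_xor": {"op": "BFU", "inputs": ["lfsr_feedback", "F"]},
}


def derive_keystream_graph(graph, node="v_xor", dropped_input="F"):
    """Copy the loop graph and delete the F input of the v operator."""
    derived = copy.deepcopy(graph)  # leave the loop graph untouched
    derived[node]["inputs"] = [
        name for name in derived[node]["inputs"] if name != dropped_input
    ]
    return derived


keystream_graph = derive_keystream_graph(loop_graph)
print(keystream_graph["v_xor"]["inputs"])
```

The deep copy matters in this sketch because both graphs are still needed: the original drives the second configuration and the derived one drives the third.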
(2) The computation order cannot be changed
As can be seen from the Snow3G algorithm, the computation must proceed in a fixed order: first the operation that initializes the linear feedback shift register (LFSR) and the finite state machine (FSM), then the 32-iteration loop operation, and finally the key stream generation operation. This order cannot be changed, so the algorithm logic must also be mapped into data flow graphs in the same order. In essence, the 32-iteration loop of the initialization stage updates the LFSR values $S_0$ to $S_{15}$ 32 times, and each iteration must first read them from MEM.
Specifically, the LFSR values $S_0$ to $S_{15}$ are input into the array for computation; when the computation finishes, the new values $S_0$ to $S_{15}$ are stored back to their original addresses in MEM so that they can be fetched in the next iteration, and this is repeated in a loop. In the stage that initializes the LFSR and the FSM, the fixed key ($k_0$, $k_1$, $k_2$, $k_3$) and the initialization variable ($IV_0$, $IV_1$, $IV_2$, $IV_3$) come from the IFIFO, so this stage cannot share a data flow graph with the 32-iteration loop stage. The two stages must therefore be handled as two data flow graphs (a first data flow graph and a second data flow graph): if the logic for initializing the LFSR and the FSM and the logic for the 32-iteration loop were mapped into one data flow graph, the fixed key and the initialization variable would also go through the 32 iterations and data would be read from the IFIFO 32 times, whereas the algorithm requires that this data be read and processed only once.
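The split between the first and second data flow graphs can be sketched as a small simulation. The function names, the `load` and `step` callables, and the flat list used for MEM are illustrative assumptions; `load` and `step` are placeholders and do not implement the actual Snow3G arithmetic.

```python
from collections import deque


def run_init_graph(ififo, mem, load):
    """First graph: consume the key and IV words from the IFIFO exactly once."""
    words = [ififo.popleft() for _ in range(8)]  # k0..k3 and IV0..IV3
    mem[:16] = load(words)                       # initial S0..S15


def run_loop_graph(mem, step, rounds=32):
    """Second graph: each round reads S0..S15 from MEM and writes them back."""
    for _ in range(rounds):
        state = mem[:16]         # read S0..S15
        mem[:16] = step(state)   # store the new values at the same addresses


ififo = deque(range(8))          # placeholder key and IV words
mem = [0] * 16
run_init_graph(ififo, mem, load=lambda w: w + w)    # placeholder loading rule
run_loop_graph(mem, step=lambda s: s[1:] + s[:1])   # placeholder update rule
print(len(ififo), mem)
```

Because the IFIFO is read only inside `run_init_graph`, the 32 rounds never touch it. Folding both functions into one loop body would pop from the IFIFO on every round, which is the problem the text describes.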
The features of the reconfigurable compute array may include the number of hardware compute resources and/or memory fabric features. For example, the following (1) to (2) can be summarized:
(1) Limited computational resources
One data flow graph corresponds to one configuration scheme of the reconfigurable computing array, and the array has the same resources whichever data flow graph it is configured for: $N_1$ configurable BFUs, $N_2$ configurable SBOXes, $N_3$ configurable BENES operators, $N_4$ IFIFO read ports, $N_5$ OFIFO write ports and $N_6$ GPRF read-write ports, where $N_1$ to $N_6$ are all integers. The operator resources consist of the $N_1$ configurable BFUs, $N_2$ configurable SBOXes and $N_3$ configurable BENES operators. When the stream cipher algorithm needs more of any of these resources than is available, the configuration of the RCA must be switched.
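The resource rule above amounts to a per-resource comparison, which the following sketch makes concrete. The class name, field names and all numeric values are hypothetical; the source gives no concrete values for $N_1$ to $N_6$.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Resources:
    bfu: int          # N1 configurable BFUs
    sbox: int         # N2 configurable SBOXes
    benes: int        # N3 configurable BENES operators
    ififo_read: int   # N4 IFIFO read ports
    ofifo_write: int  # N5 OFIFO write ports
    gprf_rw: int      # N6 GPRF read-write ports


def exceeded(required, available):
    """List the resources a data flow graph needs beyond what one configuration offers."""
    return [
        f.name
        for f in fields(Resources)
        if getattr(required, f.name) > getattr(available, f.name)
    ]


# Made-up numbers: a graph needing 10 BFUs does not fit an array offering 8,
# so under the rule in the text the RCA configuration would have to be switched.
available = Resources(bfu=8, sbox=4, benes=2, ififo_read=2, ofifo_write=2, gprf_rw=2)
too_big = Resources(bfu=10, sbox=4, benes=1, ififo_read=1, ofifo_write=1, gprf_rw=1)
print(exceeded(too_big, available))
```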
(2) The memory (MEM) storage structure does not force a switch of the RCA configuration
The MEM storage structure of the reconfigurable computing array comprises $N_7$ private sub-MEMs and 1 public sub-MEM. Each private sub-MEM can be accessed only by its corresponding computing array sub-block and has a storage depth of $M_7$; the public sub-MEM can be accessed by all computing array sub-blocks and has a storage depth of $M_8$. $N_7$, $M_7$ and $M_8$ are all integers. To reduce hardware logic, $M_7$ may be set large and $M_8$ small. As a result, when the intermediate data to be cached does not all fit in the public sub-MEM, the private sub-MEMs must take part in caching. Data held in different private sub-MEMs, however, can only be exchanged through several different computing array sub-blocks, which can leave too few sub-blocks for computation, and the configuration of the RCA must then be switched to compensate. In the embodiment of the invention, when the Snow3G algorithm is mapped onto the reconfigurable computing array, the MEM storage structure does not force any switching of the RCA configuration, because the Snow3G algorithm needs to cache very little data and the public sub-MEM is sufficient to hold it. The MEM storage structure may, however, affect the implementation of other algorithms.
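The capacity argument above can be reduced to a simple check, sketched below. The function name, the returned keys and the depth value 64 are hypothetical. The figure of 19 words assumes that the cached data is the 16 LFSR words plus the three FSM registers defined by Snow3G; the source says only that very little data needs to be cached.

```python
def plan_cache(words_needed, public_depth):
    """Split cached words between the public sub-MEM and the private sub-MEMs."""
    in_public = min(words_needed, public_depth)
    spill = words_needed - in_public
    # Any spill into private sub-MEMs is the case that may force an RCA switch.
    return {"public": in_public, "private": spill, "switch_risk": spill > 0}


# Assumed Snow3G working set: 16 LFSR words + 3 FSM registers = 19 words.
print(plan_cache(words_needed=19, public_depth=64))
```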
For example, in one embodiment, the logical mapping of the Snow3G algorithm is processed into three dataflow graphs, and the process of configuring the reconfigurable computing array is accordingly divided into three parts. The first dataflow graph is processed to obtain first configuration information, which is used to configure the reconfigurable computing array to implement the initialization of the Snow3G algorithm, corresponding to operation S210. The second dataflow graph is processed to obtain second configuration information, which is used to configure the reconfigurable computing array to perform the 32 loop computations of the Snow3G algorithm, corresponding to operation S220. The third dataflow graph is processed to obtain third configuration information, which is used to configure the reconfigurable computing array to perform the computations that generate the keystream in the Snow3G algorithm, corresponding to operation S230. In each of operations S210, S220, and S230, the reconfigurable computing array is configured according to the corresponding (first, second, or third) configuration information, and the related computation is then performed on the array so configured.
After the configuration of the reconfigurable computing array is completed, the microprocessor writes a specific value into the corresponding register to start the reconfigurable computing array, and once an operation result is obtained the microprocessor is notified to read the key stream. Operations S210 to S230 are described in detail below with reference to specific examples.
In operation S210, first configuration information is obtained and a first configuration is performed on the reconfigurable computing array according to the first configuration information, so that the reconfigurable computing array after the first configuration processes the initialization variable, the first value of the constant register, and the fixed key, and obtains an initial value of the linear feedback shift register, an initial value of the finite state machine register, and a first value output by the finite state machine.
For example, first configuration information is generated according to the first dataflow graph, and the reconfigurable computing array is configured according to the first configuration information to obtain the reconfigurable computing array after the first configuration. The reconfigurable computing array after the first configuration can read a 128-bit fixed key k0, k1, k2, k3 and a 128-bit initialization variable IV0, IV1, IV2, IV3 from the first IFIFO, and can read the constant register first value (the constant 0xffffffff) from the MEM. It then processes the fixed key, the initialization variable, and the constant register first value to set initial values for a plurality of parameters. Once the initial values have been set, the linear feedback shift register (LFSR) initial values S0 to S15, the finite state machine (FSM) register initial values R1 to R3, and the first value F output by the FSM are obtained.
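The text does not spell out how the key and IV words are combined. The following minimal sketch assumes the word assignment of the published SNOW 3G specification, in which the constant 0xffffffff is XORed into selected words and the FSM registers start at zero. The function names are hypothetical, and the sketch models only the arithmetic, not the RCA configuration.

```python
ONES = 0xFFFFFFFF  # constant register first value from the text


def init_lfsr(k, iv):
    """Return [S0, ..., S15] from four 32-bit key words and four IV words.

    Assumption: the assignment follows the published SNOW 3G
    specification; the patent text itself does not list it.
    """
    k0, k1, k2, k3 = k
    iv0, iv1, iv2, iv3 = iv
    return [
        k0 ^ ONES, k1 ^ ONES, k2 ^ ONES, k3 ^ ONES,              # S0..S3
        k0, k1, k2, k3,                                          # S4..S7
        k0 ^ ONES, k1 ^ ONES ^ iv3, k2 ^ ONES ^ iv2, k3 ^ ONES,  # S8..S11
        k0 ^ iv1, k1, k2, k3 ^ iv0,                              # S12..S15
    ]


def init_fsm():
    """Return the initial FSM registers (R1, R2, R3), all zero."""
    return (0, 0, 0)


lfsr = init_lfsr((1, 2, 3, 4), (0x10, 0x20, 0x30, 0x40))
print([hex(word) for word in lfsr])
```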
In operation S220, second configuration information is obtained and a second configuration is performed on the reconfigurable computing array according to the second configuration information, so that the reconfigurable computing array after the second configuration performs 32 loop operations on the linear feedback shift register initial value, the finite-state machine register initial value, the first value output by the finite-state machine, and the constant register second value, to obtain a linear feedback shift register second value, a finite-state machine register second value, and a second value output by the finite-state machine.
For example, second configuration information is generated according to the second dataflow graph, and the reconfigurable computing array is configured according to the second configuration information to obtain the reconfigurable computing array after the second configuration. The reconfigurable computing array after the second configuration performs 32 loop calculations on the LFSR initial values S0 to S15, the FSM register initial values R1 to R3, the first value F output by the FSM, and the constant register second value (D0 to D5). Each of the 32 loop calculations includes two steps. First, the reconfigurable computing array after the second configuration computes, from the current LFSR initial values S0 to S15, the current FSM register initial values R1 to R3, the current first value F output by the FSM, and the constant register second value (D0 to D5), the updated LFSR initial values S0 to S15, the updated FSM register initial values R1 to R3, and the updated first value F output by the FSM. Second, the updated values are taken as the current LFSR initial values S0 to S15, the current FSM register initial values R1 to R3, and the current first value F output by the FSM for the next loop calculation.
After the 32 loop calculations are finished, the LFSR initial values S0 to S15, the FSM register initial values R1 to R3, and the first value F output by the FSM obtained in the final calculation are taken as the LFSR second values S0 to S15, the FSM register second values R1 to R3, and the second value F output by the FSM.
In operation S230, third configuration information is obtained and third configuration is performed on the reconfigurable computing array according to the third configuration information, so that the reconfigurable computing array after the third configuration performs N-time loop computation on the second value of the linear feedback shift register, the second value of the finite-state machine register, the second value output by the finite-state machine, and the second value of the constant register to obtain N keys, where N is a preset loop number.
For example, third configuration information is generated according to the third dataflow graph, and the reconfigurable computing array is configured according to the third configuration information to obtain the reconfigurable computing array after the third configuration. The reconfigurable computing array after the third configuration performs N loop calculations on the LFSR second values S0 to S15, the FSM register second values R1 to R3, the second value F output by the FSM, and the constant register second value (D0 to D5) to obtain N keys. Each of the N loop calculations includes two steps. First, the reconfigurable computing array after the third configuration computes, from the current LFSR second values S0 to S15, the current FSM register second values R1 to R3, the current second value F output by the FSM, and the constant register second value (D0 to D5), the updated LFSR second values S0 to S15, the updated FSM register second values R1 to R3, the updated second value F output by the FSM, and a 32-bit key stream. Second, the updated values are taken as the current LFSR second values S0 to S15, the current FSM register second values R1 to R3, and the current second value F output by the FSM for the next loop calculation. A 32-bit key stream is generated after each calculation, and the amount of key stream can be increased by increasing the loop count N of the third dataflow graph.
According to an embodiment of the present invention, the performing the first configuration on the reconfigurable computing array in operation S210 includes: at least one operator of a plurality of operators of the reconfigurable compute array is configured as a first operator for implementing a logic function. For example, the first operator for implementing the logical function is the BFU operator.
According to an embodiment of the present invention, the performing the second configuration on the reconfigurable computing array in operation S220 includes: at least three operators in the plurality of operators of the reconfigurable computing array are configured as a second operator for implementing a logic function, a third operator for implementing a look-up table function, and a fourth operator for implementing a bit permutation function. For example, the second operator for implementing the logic function is the BFU operator, the third operator for implementing the look-up table function is the SBOX operator, and the fourth operator for implementing the bit permutation function is the BENES operator.
According to an embodiment of the present invention, the third configuration of the reconfigurable computing array in operation S230 includes: at least three operators in the plurality of operators of the reconfigurable computing array are configured as a fifth operator for implementing a logic function, a sixth operator for implementing a look-up table function, and a seventh operator for implementing a bit permutation function. For example, the fifth operator for implementing the logic function is the BFU operator, the sixth operator for implementing the look-up table function is the SBOX operator, and the seventh operator for implementing the bit permutation function is the BENES operator.
FIG. 3 schematically shows the 32 loop executions of the reconfigurable computing array after the second configuration according to an embodiment of the invention. As shown in FIG. 3, steps C1 to C7 below are performed in sequence, and the sequence is executed 32 times in a loop.
In step C1, the calculation of the v value is decomposed. Step C1 may include the following sub-steps C1.1 to C1.8, where the constant register second value comprises D0 to D5.
In sub-step C1.1, S0 and D0 (0xffffff00) are read from the MEM and loaded into the operator together with F. The function of the BFU operator is configured as (((S0 << 8) & D0) ^ F), and the result r1 is output to the buffer unit.
In sub-step C1.2, S0 and D1 (0xff) are read from the MEM and loaded into the operator together. The function of the BFU operator is configured as ((S0 >> 24) & D1), and the result r2 is output to the buffer unit.
In sub-step C1.3, the result r2 is loaded from the buffer unit into the SBOX operator, whose lookup table contents are the MULalpha table, and the result r3 is output to the buffer unit.
In sub-step C1.4, S11 and D2 (0x00ffffff) are read from the MEM and loaded into the operator together. The function of the BFU operator is configured as ((S11 >> 8) & D2), and the result r4 is output to the buffer unit.
In sub-step C1.5, the results r1, r3, and r4 are loaded from the buffer unit into the operator. The function of the BFU operator is configured as (r1 ^ r3 ^ r4), and the result r5 is output to the buffer unit.
In sub-step C1.6, S11 and D3 (0xff) are read from the MEM and loaded into the operator together. The function of the BFU operator is configured as (S11 & D3), and the result r6 is output to the buffer unit.
In sub-step C1.7, the result r6 is loaded from the buffer unit into the SBOX operator, whose lookup table contents are the DIValpha table, and the result r7 is output to the buffer unit.
In sub-step C1.8, S2, the result r6, and F are loaded into the operator. The function of the BFU operator is configured as (S2 ^ r6 ^ F), and the result is the v value, which is output to the buffer unit.
In step C2, a left shift operation is performed on S0 to S15.
The left shift operation comprises, for example, reading S15 to S1 from the MEM and assigning them to S14 to S0 respectively, then reading v from the buffer unit and assigning it to S15.
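Step C2 can be modeled as follows. This is a minimal software sketch with a hypothetical function name; it returns a new list rather than overwriting the registers in place, which sidesteps the read-before-write ordering that the hardware must respect.

```python
def shift_lfsr(s, v):
    """Step C2: S1..S15 move down to S0..S14 and v becomes the new S15."""
    if len(s) != 16:
        raise ValueError("the LFSR holds exactly 16 words, S0 to S15")
    # Truncate v to 32 bits, the word width used throughout the text.
    return list(s[1:]) + [v & 0xFFFFFFFF]


state = list(range(16))        # placeholder contents for S0..S15
state = shift_lfsr(state, 99)  # 99 stands in for the v value of step C1
print(state)
```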
In step C3, the calculation of the F value is decomposed. Step C3 may include the following sub-steps C3.1 and C3.2.
In sub-step C3.1, S15 is read from the MEM and loaded into the operator together with the buffered value T1 (T1 is the R1 value of the FSM). The function of the BFU operator is configured as (S15 + T1), and the result r8 is output to the buffer unit.
In sub-step C3.2, r8 and T2 (T2 is the R2 value of the FSM) are read from the buffer unit, D4 (0xfffffff) is read from the MEM, and all three are loaded into the operator. The function of the BFU operator is configured as (r8 ^ T2 ^ D4), and the result is the F value, which is output to the buffer unit.
In step C4, T2 and T3 (T3 is the R3 value of the FSM) are read from the buffer unit, D5 (0xffffffff) and S5 are read from the MEM, and all are loaded into the operator. The function of the BFU operator is configured as ((T2 + (T3 ^ S5)) & D5), and the result is the r value, which is output to the buffer unit.
In step C5, T2 is read from the buffer unit and loaded into the SBOX operator, whose lookup table contents are T2; the result is R3, which is output to the buffer unit.
In step C6, T1 is read from the buffer unit and loaded into the SBOX operator, whose lookup table contents are T1; the result is R2, which is output to the buffer unit.
In step C7, the r value is loaded from the buffer unit into the operator, and the result is output directly to the buffer unit as R1.
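As a rough software model, steps C3 to C7 amount to one clocking of the FSM. The sketch below rests on stated assumptions: it follows the published SNOW 3G definition F = (S15 + R1) xor R2 with addition modulo 2^32, it does not model the D4 operand of sub-step C3.2, and the two lookup functions passed in are hypothetical stand-ins for the SBOX tables, whose contents the text does not give.

```python
MASK32 = 0xFFFFFFFF


def clock_fsm(s15, s5, r1, r2, r3, lookup_c6, lookup_c5):
    """One pass of steps C3 to C7; returns (F, new R1, new R2, new R3).

    lookup_c6 and lookup_c5 are placeholders for the SBOX lookups of
    steps C6 and C5. All register reads use the values from before
    the update, as in the buffered values T1 to T3 of the text.
    """
    r8 = (s15 + r1) & MASK32       # C3.1: addition modulo 2**32
    f = r8 ^ r2                    # C3.2 (D4 not modeled, see lead-in)
    r = (r2 + (r3 ^ s5)) & MASK32  # C4: masking with D5 = 0xffffffff
    new_r3 = lookup_c5(r2)         # C5: SBOX lookup on T2 yields R3
    new_r2 = lookup_c6(r1)         # C6: SBOX lookup on T1 yields R2
    new_r1 = r                     # C7: r is passed through as R1
    return f, new_r1, new_r2, new_r3


# Toy lookups for demonstration only; they are not the real S-boxes.
result = clock_fsm(0xFFFFFFFF, 5, 2, 3, 6,
                   lambda x: x ^ 1, lambda x: x ^ 2)
print(result)
```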
The key stream generation phase corresponds to operation S230 in fig. 2. In other words, the keystream generation stage is to perform third configuration on the reconfigurable computing array according to third configuration information, and perform computation based on the reconfigurable computing array after the third configuration to generate the keystream.
For example, the array is configured according to the calculation procedure of steps C1 to C7. Unlike the 32 loop calculations executed by the reconfigurable computing array after the second configuration, the F value does not participate in the calculation of sub-step C1.8 in the key stream generation phase: the function of the BFU operator in sub-step C1.8 is changed to (S2 ^ r6), and the result obtained through the calculation procedure of C1 to C7 is the key stream. The loop count is set to N, where N is an integer. The output value F of the FSM produced by the first round of the N loop calculations must be discarded; from the second round onward, each round outputs a 32-bit key stream. The amount of key stream can be increased by increasing the loop count N of the third data flow diagram.
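The loop structure of the key stream generation phase, including the discarded first round, can be sketched as below. The `clock` callable is a hypothetical stand-in for one pass through steps C1 to C7 that returns the updated state and one 32-bit output word; the sketch fixes only the control flow described in this paragraph, not the cipher arithmetic.

```python
def generate_keystream(state, n, clock):
    """Run n rounds of clock(state) and drop the first round's output.

    clock must return a pair (new_state, word). Following the text,
    the first round's output is discarded, so n rounds yield n - 1
    key stream words.
    """
    words = []
    for round_index in range(n):
        state, word = clock(state)
        if round_index > 0:
            words.append(word & 0xFFFFFFFF)
    return state, words


def toy_clock(counter):
    """Toy round function, for illustration only."""
    return counter + 1, counter * 10


final_state, stream = generate_keystream(0, 4, toy_clock)
print(final_state, stream)
```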
FIG. 4 is a schematic diagram illustrating a cascaded computation by a plurality of operators according to an embodiment of the present invention.
According to an embodiment of the invention, when the first configuration, the second configuration, or the third configuration is performed, the configured operator comprises a plurality of input ports. The method of the embodiment of the invention further includes: determining the operation cycle corresponding to the input data of each of the plurality of input ports of the configured operator; and, when the operation cycle corresponding to the input data of at least one input port differs from that of at least one other input port, buffering the input data with the smaller operation cycle for M cycles, until its operation cycle equals that of the input data with the larger operation cycle, where M is a positive integer.
According to the embodiment of the invention, in the process of realizing the Snow3G algorithm based on the reconfigurable array, certain operation processes are very long. Because the BFU operator in the reconfigurable computing array has three inputs and two outputs and the SBOX operator has four inputs and four outputs, the computing of the whole formula cannot be completed by using 1 operator, and a plurality of operators are needed for cascade computing. When the operators are cascaded, it must be ensured that the data of each operator input port participating in the calculation is fed in the same period, otherwise, the calculation will be in error.
In the process of realizing the Snow3G algorithm based on the reconfigurable computing array, the whole computing process is very complex, the operation process is required to be split, and a plurality of operators are cascaded to jointly complete the operation process. As shown in fig. 4, taking the example of calculating the v value, the specific calculation division is as follows:
(1) BFU0 calculates ((S0 << 8) & 0xffffff00) to obtain the result r1.
(2) BFU1 calculates ((S0 >> 24) & 0xff) to obtain the result r2.
(3) SBOX0 calculates MULalpha(r2) to obtain the result r3.
(4) BFU2 calculates ((S11 >> 8) & 0x00ffffff) to obtain the result r4.
(5) BFU3 calculates (S11 & 0x00ff) to obtain the result r5.
(6) SBOX1 calculates DIValpha(r5) to obtain the result r6.
(7) BFU4 calculates (r1 ^ r3 ^ S2) to obtain the result r7.
(8) BFU5 calculates (r4 ^ r6 ^ F) to obtain the result r8.
(9) BFU6 calculates (r7 ^ r8) to obtain the final result, the v value.
As can be seen from the above calculation flow, to obtain the result r7, the values r1, r3, and S2 must be input into the operator BFU4 at the same time. However, r1 is the output of BFU0 and is available in the 2nd cycle; r3 is the output of SBOX0, whose input r2 is in turn the output of BFU1, so r3 is available only in the 3rd cycle; and the value of S2 is stored in the GPRF and is available in the 1st cycle. To obtain the result r7, r1 and S2 must therefore, like r3, reach the input ports of BFU4 only in the 3rd cycle. The method used by the embodiment of the invention exploits the two output ports of the BFU operator: in the first cycle, S2 is also input to a port of BFU0 and is output unchanged through one of the output ports of BFU0, so that S2 is buffered for one cycle. Similarly, in the second cycle, S2 and r1 are input to BFU1 and BFU2 respectively and output by them again, so that r1, r3, and S2 are all available in the third cycle.
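The number of cycles M by which each input must be buffered follows directly from the arrival cycles. A minimal sketch, with a hypothetical function name and the arrival cycles taken from the paragraph above:

```python
def alignment_delays(arrival_cycle):
    """Map each input name to the number of cycles it must be buffered.

    arrival_cycle maps an input name to the cycle in which its value
    first becomes available. Every input is delayed until the latest
    one arrives, so the latest input gets a delay of 0.
    """
    latest = max(arrival_cycle.values())
    return {name: latest - cycle for name, cycle in arrival_cycle.items()}


# Inputs of BFU4: r1 is ready in cycle 2, r3 in cycle 3, S2 in cycle 1.
bfu4_inputs = {"r1": 2, "r3": 3, "S2": 1}
delays = alignment_delays(bfu4_inputs)
print(delays)
```

In this example S2 needs two buffering cycles, which matches its pass through BFU0 and then BFU1, and r1 needs one, which matches its pass through BFU2.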
According to an embodiment of the invention, when the first configuration, the second configuration, or the third configuration is performed on the reconfigurable computing array, the data processing cycle corresponding to each configured operator is determined, and the operators with the same data processing cycle are configured to operate in parallel.
According to the embodiment of the invention, in order to maximize the performance of the Snow3G algorithm, the minimum number of cycles can be used in the algorithm mapping process to realize the operation, which needs to fully utilize the characteristics and owned resources of the operator.
For example, each configuration scheme is a dataflow graph. When a configuration scheme is designed, as much of the operation as possible is placed on a single operator, exploiting the three inputs and two outputs of the BFU and its rich set of combined operations, so as to improve operation efficiency. For example, as shown in sub-step C1.1 of FIG. 3, S0, D0, and F can be taken as the three inputs of one BFU, and the result is obtained in 1 cycle. If the original formula of the Snow3G algorithm were followed, S0 and D0 would be operated on first and the result then operated on with F, which requires 2 BFUs and takes 2 cycles.
Therefore, when a configuration scheme is designed, the number of cycles required for each intermediate variable in the operation formula to obtain its result is determined according to the ability of the operators to work in parallel, and expressions that obtain their results after the same number of cycles are grouped together for parallel operation as far as possible. For example, in sub-steps C1.1 to C1.8, expressions that obtain a result in one cycle, such as ((S0 >> 24) & D1) and ((S11 >> 8) & D2), can be computed in parallel in the same cycle, which greatly shortens the overall operation time.
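Grouping expressions by the cycle in which their results become available is essentially an as-soon-as-possible levelization of the dataflow graph. The sketch below assumes that every operator takes exactly one cycle and that register reads are available in cycle 1; the function name is hypothetical, and the dependency table uses the result names of the optimized v calculation shown in FIG. 5.

```python
from collections import defaultdict


def asap_levels(deps):
    """Group operations by the earliest cycle in which they can run.

    deps maps each operation to the operations whose results it
    consumes; operations that read only registers have an empty list.
    Assumption: every operation takes exactly one cycle.
    """
    level = {}

    def visit(op):
        if op not in level:
            level[op] = 1 + max((visit(d) for d in deps[op]), default=0)
        return level[op]

    for op in deps:
        visit(op)
    by_cycle = defaultdict(list)
    for op, cycle in level.items():
        by_cycle[cycle].append(op)
    return {cycle: sorted(ops) for cycle, ops in sorted(by_cycle.items())}


# Dependencies of the v calculation (r5 and r6 are the SBOX lookups).
v_deps = {
    "r0": [], "r1": [], "r2": [], "r3": [],
    "r4": ["r0", "r1"], "r5": ["r2"], "r6": ["r3"],
    "v": ["r4", "r5", "r6"],
}
schedule = asap_levels(v_deps)
print(schedule)
```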
FIG. 5 schematically shows a diagram of an optimized computation cycle according to an embodiment of the invention. As shown in fig. 5, taking the example of calculating v value, the following describes a specific implementation scheme after optimization:
(1) The operators BFU0, BFU1, BFU2, and BFU3 can compute in parallel in the same cycle (e.g., cycle 1), as follows:
BFU0: calculates ((S0 << 8) & 0xffffff00) ^ S2 to obtain the result r0.
BFU1: calculates ((S11 >> 8) & 0xffffffff) ^ F to obtain the result r1.
BFU2: calculates ((S0 >> 24) & 0xff) to obtain the result r2.
BFU3: calculates (S11 & 0xff) to obtain the result r3.
(2) BFU4, SBOX0, and SBOX1 can compute in parallel in the same cycle (e.g., cycle 2), as follows:
BFU4: calculates (r0 ^ r1) to obtain the result r4.
SBOX0: calculates MULalpha(r2) to obtain the result r5.
SBOX1: calculates DIValpha(r3) to obtain the result r6.
(3) BFU5 computes in the following cycle (e.g., cycle 3) to obtain the final v value:
BFU5: calculates (r4 ^ r5 ^ r6) to obtain the result v.
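That the three-cycle schedule computes the same v value as the nine-operator cascade of FIG. 4 can be checked in software. In the sketch below the function names are hypothetical and the two table functions are toy stand-ins, since the MULalpha and DIValpha table contents are not reproduced in the text; the equivalence holds for any tables because only the grouping of XOR terms changes.

```python
MASK32 = 0xFFFFFFFF


def v_cascaded(s0, s2, s11, f, mul_alpha, div_alpha):
    """v value computed with the nine cascaded operators of FIG. 4."""
    r1 = (s0 << 8) & 0xFFFFFF00
    r2 = (s0 >> 24) & 0xFF
    r3 = mul_alpha(r2)
    r4 = (s11 >> 8) & 0x00FFFFFF
    r5 = s11 & 0x00FF
    r6 = div_alpha(r5)
    r7 = r1 ^ r3 ^ s2
    r8 = r4 ^ r6 ^ f
    return r7 ^ r8


def v_three_cycle(s0, s2, s11, f, mul_alpha, div_alpha):
    """v value computed with the three-cycle schedule of FIG. 5."""
    # Cycle 1: four BFUs in parallel.
    r0 = ((s0 << 8) & 0xFFFFFF00) ^ s2
    r1 = ((s11 >> 8) & 0xFFFFFFFF) ^ f
    r2 = (s0 >> 24) & 0xFF
    r3 = s11 & 0xFF
    # Cycle 2: one BFU and two SBOX lookups in parallel.
    r4 = r0 ^ r1
    r5 = mul_alpha(r2)
    r6 = div_alpha(r3)
    # Cycle 3: final XOR.
    return r4 ^ r5 ^ r6


def toy_mul(byte):
    """Toy stand-in for the MULalpha table lookup."""
    return (byte * 0x01010101) & MASK32


def toy_div(byte):
    """Toy stand-in for the DIValpha table lookup."""
    return (byte * 0x00010203 + 7) & MASK32


mismatches = 0
for i in range(1, 257):
    s0 = (i * 0x9E3779B1) & MASK32
    s2 = (i * 0x85EBCA6B) & MASK32
    s11 = (i * 0xC2B2AE35) & MASK32
    f = (i * 0x27D4EB2F) & MASK32
    if v_cascaded(s0, s2, s11, f, toy_mul, toy_div) != \
            v_three_cycle(s0, s2, s11, f, toy_mul, toy_div):
        mismatches += 1
print(mismatches)
```

The inputs are assumed to be 32-bit words; for such inputs the masks 0x00ffffff and 0xffffffff applied after the right shift by 8 give the same result.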
Fig. 6A to 6B schematically show diagrams of calculation cycles before and after optimization according to an embodiment of the present invention.
As shown in FIG. 6A, the unoptimized implementation of the algorithm given in sub-steps C1.1-C1.8 shown in FIG. 3 takes 9 cycles to complete the calculation process. As shown in fig. 6B, the optimized algorithm implementation scheme only needs 3 cycles to complete the calculation process, and the performance is improved by 3 times.
Fig. 7 schematically shows a block diagram of a system for implementing a stream cipher algorithm based on a reconfigurable compute array according to an embodiment of the present invention.
As shown in fig. 7, a system 700 for implementing a stream cipher algorithm based on a reconfigurable computing array includes a first configuration module 710, a second configuration module 720, and a third configuration module 730. The system 700 may perform the method described above with reference to fig. 2.
Specifically, the first configuration module 710 may be configured to obtain first configuration information and perform first configuration on the reconfigurable computing array according to the first configuration information, so that the reconfigurable computing array after the first configuration processes the initialization variable, the first value of the constant register, and the fixed key to obtain the initial value of the linear feedback shift register, the initial value of the register of the finite state machine, and the first value output by the finite state machine. According to the embodiment of the present invention, the first configuration module 710 may, for example, perform the operation S210 described above with reference to fig. 2, which is not described herein again.
The second configuration module 720 may be configured to obtain second configuration information and perform second configuration on the reconfigurable computing array according to the second configuration information, so that the reconfigurable computing array after the second configuration performs 32-time loop operations on the linear feedback shift register initial value, the finite-state machine register initial value, the first value output by the finite-state machine, and the constant register second value, to obtain a linear feedback shift register second value, a finite-state machine register second value, and a second value output by the finite-state machine. According to the embodiment of the present invention, the second configuration module 720 may, for example, perform the operation S220 described above with reference to fig. 2, which is not described herein again.
The third configuration module 730 may be configured to obtain third configuration information and perform third configuration on the reconfigurable computing array according to the third configuration information, so that the reconfigurable computing array after the third configuration performs N-time loop calculation on the second value of the linear feedback shift register, the second value of the finite-state machine register, the second value output by the finite-state machine, and the second value of the constant register to obtain N keys, where N is a preset loop number. According to the embodiment of the present invention, the third configuration module 730 may, for example, perform the operation S230 described above with reference to fig. 2, which is not described herein again.
According to the embodiment of the present invention, when the first configuration module performs the first configuration on the reconfigurable computing array, the first configuration module is specifically configured to configure at least one operator of the plurality of operators of the reconfigurable computing array as the first operator for implementing the logic function. When the second configuration module performs a second configuration on the reconfigurable computing array, the second configuration module is specifically configured to configure at least three operators of the multiple operators of the reconfigurable computing array as a second operator for implementing a logic function, a third operator for implementing a look-up table function, and a fourth operator for implementing a bit permutation function. When the third configuration module performs a third configuration on the reconfigurable computing array, the third configuration module is specifically configured to configure at least three operators of the multiple operators of the reconfigurable computing array as a fifth operator for implementing a logic function, a sixth operator for implementing a look-up table function, and a seventh operator for implementing a bit permutation function.
According to an embodiment of the invention, the system 700 further comprises: the first determining module is used for determining a data processing period corresponding to each configured operator when the first configuration, the second configuration or the third configuration is carried out, and configuring the operators with the same data processing period into parallel operation.
According to an embodiment of the present invention, when the first configuration, the second configuration, or the third configuration is performed, the configured operator includes a plurality of input ports, and the system 700 further includes a second determining module and a caching module. The second determining module is configured to determine the operation cycle corresponding to the input data of each of the plurality of input ports of the configured operator. The caching module is configured to, when the operation cycle corresponding to the input data of at least one input port differs from that of at least one other input port, buffer the input data with the smaller operation cycle for M cycles, until its operation cycle equals that of the input data with the larger operation cycle, where M is a positive integer.
Any number of modules, sub-modules, units, sub-units, or at least part of the functionality of any number thereof according to embodiments of the invention may be implemented in one module. Any one or more of the modules, sub-modules, units, and sub-units according to the embodiments of the present invention may be implemented by being divided into a plurality of modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present invention may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in any other reasonable manner of hardware or firmware by integrating or packaging a circuit, or in any one of or a suitable combination of software, hardware, and firmware implementations. Alternatively, one or more of the modules, sub-modules, units, sub-units according to embodiments of the present invention may be at least partially implemented as computer program modules, which, when executed, may perform the corresponding functions.
For example, any plurality of the first configuration module 710, the second configuration module 720, and the third configuration module 730 may be combined and implemented in one module, or any one of the modules may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present invention, at least one of the first configuration module 710, the second configuration module 720 and the third configuration module 730 may be implemented at least partially as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of three implementations of software, hardware and firmware, or in any suitable combination of any of them. Alternatively, at least one of the first configuration module 710, the second configuration module 720 and the third configuration module 730 may be at least partially implemented as a computer program module, which when executed may perform a corresponding function.
FIG. 8 schematically illustrates a block diagram of a computer system for implementing a stream cipher algorithm based on a reconfigurable compute array, according to an embodiment of the present invention. The computer system illustrated in FIG. 8 is only one example and should not impose any limitations on the scope of use or functionality of embodiments of the invention.
As shown in FIG. 8, computer system 800 includes a processor 801 and a computer-readable storage medium 802. The system 800 may perform a method according to an embodiment of the invention.
In particular, the processor 801 may include, for example, a general purpose microprocessor, an instruction set processor and/or related chip set and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), and/or the like. The processor 801 may also include onboard memory for caching purposes. The processor 801 may be a single processing unit or a plurality of processing units for performing the different actions of the method flows according to embodiments of the present invention.
Computer-readable storage medium 802 may be, for example, any medium that can contain, store, communicate, propagate, or transport the instructions. For example, a readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, device, or propagation medium. Specific examples of the readable storage medium include: magnetic storage devices, such as magnetic tape or Hard Disk Drives (HDDs); optical storage devices, such as compact disks (CD-ROMs); a memory, such as a Random Access Memory (RAM) or a flash memory; and/or wired/wireless communication links.
The computer-readable storage medium 802 may comprise a computer program 803, which computer program 803 may comprise code/computer-executable instructions that, when executed by the processor 801, cause the processor 801 to perform a method according to an embodiment of the invention or any variant thereof.
The computer program 803 may be configured with, for example, computer program code comprising computer program modules. For example, in an example embodiment, the code in computer program 803 may include one or more program modules, for example module 803A, module 803B, and so on. It should be noted that the division and number of the modules are not fixed, and those skilled in the art may use suitable program modules or combinations of program modules according to the actual situation, so that when the program modules are executed by the processor 801, the processor 801 performs the method according to the embodiment of the present invention or any variation thereof.
According to an embodiment of the present invention, at least one of the first configuration module 710, the second configuration module 720 and the third configuration module 730 may be implemented as a computer program module described with reference to FIG. 8, which, when executed by the processor 801, may implement the respective operations described above.
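The mapping from the three configuration modules to program modules can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: `ReconfigurableArray`, `configure`, `run`, `generate_keys` and the three `phase_*` functions are hypothetical names, and the arithmetic inside each phase is a placeholder mix rather than the actual stream cipher. Only the control flow (first configuration, second configuration with 32 iterations, third configuration with N iterations) follows the text.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

MASK32 = 0xFFFFFFFF


def rotl32(x: int, n: int) -> int:
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


@dataclass
class ReconfigurableArray:
    """Hypothetical stand-in for the array: holds one active configuration."""
    kernel: Optional[Callable[[Dict], Dict]] = None
    history: List[str] = field(default_factory=list)

    def configure(self, name: str, kernel: Callable[[Dict], Dict]) -> None:
        # Loading configuration information replaces the active kernel.
        self.history.append(name)
        self.kernel = kernel

    def run(self, state: Dict, times: int = 1) -> Dict:
        for _ in range(times):
            state = self.kernel(state)
        return state


def phase_init(state: Dict) -> Dict:
    # Placeholder: derive initial register values from key, IV and constant.
    lfsr = (state["key"] ^ state["iv"] ^ state["const"]) & MASK32
    return {"lfsr": lfsr, "fsm_reg": 0, "fsm_out": 0,
            "const": state["const"], "keys": []}


def phase_warmup(state: Dict) -> Dict:
    # Placeholder round: update the FSM, then mix its output into the LFSR.
    fsm_out = (state["lfsr"] + state["fsm_reg"]) & MASK32
    lfsr = rotl32(state["lfsr"], 1) ^ fsm_out ^ state["const"]
    return {**state, "lfsr": lfsr & MASK32, "fsm_reg": fsm_out, "fsm_out": fsm_out}


def phase_keystream(state: Dict) -> Dict:
    # Placeholder round that also emits one 32-bit key word.
    nxt = phase_warmup(state)
    return {**nxt, "keys": state["keys"] + [nxt["fsm_out"] ^ nxt["lfsr"]]}


def generate_keys(key: int, iv: int, const: int, n: int):
    array = ReconfigurableArray()
    array.configure("first", phase_init)        # role of module 710
    state = array.run({"key": key, "iv": iv, "const": const})
    array.configure("second", phase_warmup)     # role of module 720
    state = array.run(state, times=32)
    array.configure("third", phase_keystream)   # role of module 730
    state = array.run(state, times=n)
    return state["keys"], array.history
```

Calling `generate_keys(0x1234, 0x5678, 0x9ABC, 4)` returns four 32-bit words together with the order in which the configurations were loaded. The same array object is reused across all three phases, which is the point of reconfiguring it rather than instantiating three fixed circuits.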
The present invention also provides a computer-readable medium, which may be embodied in the apparatus/device/system described in the above embodiments; or may exist separately and not be assembled into the device/apparatus/system. The computer readable medium carries one or more programs which, when executed, implement the method.
According to embodiments of the present invention, a computer readable medium may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. A computer readable signal medium, by contrast, may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wired, optical fiber cable, radio frequency signals, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
It will be appreciated by a person skilled in the art that the features recited in the various embodiments and/or claims of the present invention may be combined and/or incorporated in various ways, even if such combinations or incorporations are not explicitly recited in the present invention. In particular, the features recited in the various embodiments and/or claims of the present invention may be combined and/or incorporated in various ways without departing from the spirit or teaching of the invention. All such combinations and/or incorporations fall within the scope of the present invention.
While the invention has been shown and described with reference to certain exemplary embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims and their equivalents. Accordingly, the scope of the present invention should not be limited to the above-described embodiments, but should be defined not only by the appended claims, but also by equivalents thereof.

Claims (10)

1. A method for implementing a stream cipher algorithm based on a reconfigurable computing array, the method comprising:
acquiring first configuration information and performing first configuration on the reconfigurable computing array according to the first configuration information so that the reconfigurable computing array after the first configuration processes an initialization variable, a constant register first value and a fixed key to obtain a linear feedback shift register initial value, a finite state machine register initial value and a finite state machine output first value;
acquiring second configuration information and performing second configuration on the reconfigurable computing array according to the second configuration information so that the reconfigurable computing array after the second configuration performs 32 iterations of a loop operation on the linear feedback shift register initial value, the finite-state machine register initial value, the first value output by the finite-state machine and a second value of the constant register to obtain a second value of the linear feedback shift register, a second value of the finite-state machine register and a second value output by the finite-state machine; and
acquiring third configuration information and performing third configuration on the reconfigurable computing array according to the third configuration information so that the reconfigurable computing array after the third configuration performs N iterations of a loop computation on the second value of the linear feedback shift register, the second value of the finite-state machine register, the second value output by the finite-state machine and the second value of the constant register to obtain N keys, wherein N is a preset number of iterations.
2. The method of claim 1, wherein
performing the first configuration on the reconfigurable computing array comprises:
configuring at least one operator of a plurality of operators of the reconfigurable computing array as a first operator for implementing a logic function;
performing the second configuration on the reconfigurable computing array comprises:
configuring at least three operators of the plurality of operators of the reconfigurable computing array as a second operator for implementing a logic function, a third operator for implementing a look-up table function, and a fourth operator for implementing a bit permutation function; and
performing the third configuration on the reconfigurable computing array comprises:
configuring at least three operators of the plurality of operators of the reconfigurable computing array as a fifth operator for implementing a logic function, a sixth operator for implementing a look-up table function, and a seventh operator for implementing a bit permutation function.
3. The method of claim 2, further comprising:
when performing the first configuration, the second configuration or the third configuration, determining a data processing period corresponding to each configured operator, and configuring operators having the same data processing period to operate in parallel.
4. The method of claim 2, wherein in performing the first configuration, the second configuration, or the third configuration, the configured operator comprises a plurality of input ports, the method further comprising:
determining an operation cycle corresponding to input data of each input port of a plurality of input ports of the configured operator;
when the operation cycle corresponding to the input data of at least one input port differs from the operation cycle corresponding to the input data of at least one other input port, buffering the input data having the smaller operation cycle for M cycles, so that its operation cycle becomes the same as that of the input data of the input port having the larger operation cycle, wherein M is a positive integer.
5. A system for implementing a stream cipher algorithm based on a reconfigurable computational array, the system comprising:
a first configuration module configured to acquire first configuration information and perform first configuration on the reconfigurable computing array according to the first configuration information so that the reconfigurable computing array after the first configuration processes an initialization variable, a constant register first value and a fixed key to obtain a linear feedback shift register initial value, a finite state machine register initial value and a finite state machine output first value;
a second configuration module configured to acquire second configuration information and perform second configuration on the reconfigurable computing array according to the second configuration information so that the reconfigurable computing array after the second configuration performs 32 iterations of a loop operation on the linear feedback shift register initial value, the finite-state machine register initial value, the first value output by the finite-state machine and a second value of the constant register to obtain a second value of the linear feedback shift register, a second value of the finite-state machine register and a second value output by the finite-state machine; and
a third configuration module configured to acquire third configuration information and perform third configuration on the reconfigurable computing array according to the third configuration information so that the reconfigurable computing array after the third configuration performs N iterations of a loop computation on the second value of the linear feedback shift register, the second value of the finite-state machine register, the second value output by the finite-state machine and the second value of the constant register to obtain N keys, wherein N is a preset number of iterations.
6. The system of claim 5,
when the first configuration module performs first configuration on the reconfigurable computing array, the first configuration module is specifically configured to configure at least one operator in a plurality of operators of the reconfigurable computing array as a first operator for implementing a logic function;
when the second configuration module performs a second configuration on the reconfigurable computing array, the second configuration module is specifically configured to configure at least three operators of the multiple operators of the reconfigurable computing array as a second operator for implementing a logic function, a third operator for implementing a look-up table function, and a fourth operator for implementing a bit permutation function;
when the third configuration module performs a third configuration on the reconfigurable computing array, the third configuration module is specifically configured to configure at least three operators of the multiple operators of the reconfigurable computing array as a fifth operator for implementing a logic function, a sixth operator for implementing a look-up table function, and a seventh operator for implementing a bit permutation function.
7. The system of claim 6, further comprising:
a first determining module configured to determine a data processing period corresponding to each configured operator when the first configuration, the second configuration, or the third configuration is performed, and to configure operators having the same data processing period to operate in parallel.
8. The system of claim 6, wherein in performing the first configuration, the second configuration, or the third configuration, the configured operator comprises a plurality of input ports, the system further comprising:
a second determining module configured to determine an operation cycle corresponding to the input data of each input port of the plurality of input ports of the configured operator; and
a buffer module configured to, when the operation cycle corresponding to the input data of at least one input port differs from the operation cycle corresponding to the input data of at least one other input port, buffer the input data having the smaller operation cycle for M cycles, so that its operation cycle becomes the same as that of the input data of the input port having the larger operation cycle, wherein M is a positive integer.
9. A computing device, comprising:
one or more processors;
a memory for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-4.
10. A computer-readable storage medium storing computer-executable instructions for implementing the method of any one of claims 1 to 4 when executed.
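The two scheduling rules recited in claims 3, 4, 7 and 8 can be illustrated with a short sketch. It assumes that each configured operator, and each input port of an operator, is described by the integer cycle in which its data is processed or becomes available; the function names `group_parallel` and `buffer_delays` and the example operator and port names are hypothetical and do not come from the patent.

```python
from collections import defaultdict
from typing import Dict, List


def group_parallel(op_cycle: Dict[str, int]) -> Dict[int, List[str]]:
    """Group operators by data processing cycle.

    Operators that share a cycle are candidates for parallel operation.
    """
    groups = defaultdict(list)
    for op, cycle in op_cycle.items():
        groups[cycle].append(op)
    return {cycle: sorted(ops) for cycle, ops in sorted(groups.items())}


def buffer_delays(port_cycle: Dict[str, int]) -> Dict[str, int]:
    """Return, per input port, how many cycles its data must be buffered.

    Data arriving in an earlier (smaller) cycle waits until the latest
    input is ready; a delay of 0 means the port needs no buffering.
    """
    latest = max(port_cycle.values())
    return {port: latest - cycle for port, cycle in port_cycle.items()}


# Example: two look-up-table operators share cycle 2 and can run in parallel.
ops = {"xor_a": 1, "lut_s0": 2, "lut_s1": 2, "perm_l1": 3}
parallel_groups = group_parallel(ops)

# Example: in0 is ready in cycle 1, in1 in cycle 3, so in0 is buffered M = 2 cycles.
delays = buffer_delays({"in0": 1, "in1": 3})
```

In this sketch M is simply the difference between the latest input cycle and the cycle of the port being delayed, which matches the claim's requirement that M be a positive integer for every port that actually needs buffering.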
CN201911020613.0A 2019-10-24 2019-10-24 Method, system and medium for realizing stream cipher algorithm based on reconfigurable computing array Active CN110795748B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911020613.0A CN110795748B (en) 2019-10-24 2019-10-24 Method, system and medium for realizing stream cipher algorithm based on reconfigurable computing array

Publications (2)

Publication Number Publication Date
CN110795748A true CN110795748A (en) 2020-02-14
CN110795748B CN110795748B (en) 2021-12-14

Family

ID=69441186

Country Status (1)

Country Link
CN (1) CN110795748B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103260156A (en) * 2012-02-15 2013-08-21 中国移动通信集团公司 Key stream generating device and method and confidentiality protective device and method
US20140108782A1 (en) * 2010-12-07 2014-04-17 Comcast Cable Communications, Llc Reconfigurable Access Network Encryption Architecture
CN109274497A (en) * 2018-08-30 2019-01-25 无锡凯特微电子有限公司 A kind of mapping method of the SM3 algorithm based on reconfigurable arrays
CN110011798A (en) * 2019-04-08 2019-07-12 中国科学院软件研究所 The initial method and device and communication means of a kind of ZUC-256 stream cipher arithmetic
CN110059493A (en) * 2019-04-10 2019-07-26 无锡沐创集成电路设计有限公司 SKINNY-128-128 Encryption Algorithm realization method and system based on coarseness Reconfigurable Computation unit

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114510450A (en) * 2021-05-25 2022-05-17 无锡沐创集成电路设计有限公司 Accelerated calculation method and device of encryption algorithm and array unit operator system
CN114064560A (en) * 2021-11-17 2022-02-18 上海交通大学 Configurable scratch pad cache design method for coarse-grained reconfigurable array
CN114064560B (en) * 2021-11-17 2024-06-04 上海交通大学 Configurable scratch pad design method for coarse-grained reconfigurable array
CN114760057A (en) * 2022-04-13 2022-07-15 中金金融认证中心有限公司 Method for cryptographic chip, cryptographic card and storage medium
CN114546933A (en) * 2022-04-25 2022-05-27 广州万协通信息技术有限公司 Reconfigurable computing system, method, terminal device and storage medium

Also Published As

Publication number Publication date
CN110795748B (en) 2021-12-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant