CN109672524B - SM3 algorithm round iteration system and iteration method based on coarse-grained reconfigurable architecture - Google Patents


Info

Publication number
CN109672524B
Authority
CN
China
Prior art keywords
data
row
configuration
unit
reconfigurable
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811514910.6A
Other languages
Chinese (zh)
Other versions
CN109672524A (en)
Inventor
杨锦江
陆启乐
赵利锋
葛伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University Wuxi Institute Of Integrated Circuit Technology
Southeast University
Original Assignee
Southeast University Wuxi Institute Of Integrated Circuit Technology
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University Wuxi Institute Of Integrated Circuit Technology and Southeast University
Priority to CN201811514910.6A
Publication of CN109672524A
Application granted
Publication of CN109672524B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/08 Key distribution or management, e.g. generation, sharing or updating, of cryptographic keys or passwords
    • H04L9/0861 Generation of secret information including derivation or calculation of cryptographic keys or passwords
    • H04L9/0863 Generation of secret information including derivation or calculation of cryptographic keys or passwords involving passwords or one-time passwords
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/0618 Block ciphers, i.e. encrypting groups of characters of a plain text message using fixed encryption transformation
    • H04L9/0625 Block ciphers, i.e. encrypting groups of characters of a plain text message using fixed encryption transformation with splitting of the data block into left and right halves, e.g. Feistel based algorithms, DES, FEAL, IDEA or KASUMI
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/06 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols the encryption apparatus using shift registers or memories for block-wise or stream coding, e.g. DES systems or RC4; Hash functions; Pseudorandom sequence generators
    • H04L9/0643 Hash functions, e.g. MD5, SHA, HMAC or f9 MAC

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Power Engineering (AREA)
  • Logic Circuits (AREA)
  • Advance Control (AREA)

Abstract

The invention discloses an SM3 algorithm round iteration system and iteration method based on a coarse-grained reconfigurable architecture. The iteration system comprises a system bus, a reconfigurable processor and a microprocessor. The reconfigurable processor comprises a configuration unit, an input first-in first-out (FIFO) register set, an output FIFO register set, a general register file and 4 reconfigurable array blocks; the inlet of the configuration unit is connected with the microprocessor through the system bus, and its outlet is connected with each reconfigurable array block; the input FIFO register set is connected with the microprocessor through the system bus; the 4 reconfigurable array blocks are each connected with the input and output FIFO register sets and with the general register file; the 4 reconfigurable array blocks store, read and transfer data among themselves through the general register file; the output FIFO register set is connected with the microprocessor through the system bus. The technical scheme retains a degree of flexibility while achieving efficient operation of the SM3 algorithm by improving the parallelism of the SM3 algorithm, optimizing the pipeline and the like.

Description

SM3 algorithm round iteration system and iteration method based on coarse-grained reconfigurable architecture
Technical Field
The invention belongs to the field of embedded reconfigurable systems, and particularly relates to a large-scale coarse-grained embedded reconfigurable system and a processing method thereof, which are applied to the fields of communication, encryption and the like.
Background
General-purpose processors and application-specific integrated circuits (ASICs) are the two dominant approaches in conventional computer system architecture. However, as application fields demand ever more of a system in terms of performance, energy consumption and time to market, the disadvantages of these two traditional computing modes have become apparent.
A general-purpose processor has a wide application range but low computational efficiency; an ASIC improves computational speed and efficiency and can meet the performance requirements, but its flexibility is poor.
Reconfigurable computing technology arose in order to achieve a good tradeoff between flexibility and computational efficiency. Reconfigurable computing is one of the current trends in computer system architecture; it sits between general-purpose processors and ASICs and combines the strengths of both. By configuring the reconfigurable device, a general computing platform can be converted into a special-purpose hardware system that completes a specific computing task, which is equivalent to unfolding the computing task in time and space simultaneously, so that both application flexibility and high computing performance are obtained. In addition, reconfigurable computing technology offers low system energy consumption, high reliability and short time to market. These advantages give reconfigurable computing technology wide application prospects in many fields, particularly embedded applications. Many mainstream embedded applications, such as multimedia, encryption/decryption and communication applications, are well suited to implementation with reconfigurable computing. At present reconfigurable computing is mainly used for computing platforms in advanced technical fields, but as the cost of reconfigurable logic devices gradually falls and run-time reconfigurable computing technology continues to mature, it is reasonable to believe that its advantages will make it significant in more fields.
At present, a number of reconfigurable systems such as ReMAP, AsAP and DRP have been studied at home and abroad. However, the interconnection of these arrays is simple, whereas the round iteration of the SM3 algorithm requires a large number of bit shifts and a large number of rounds, so the efficiency and speed of the operation are low. Traditional reconfigurable computing systems therefore have significant problems with the operating efficiency and cycle count of SM3.
Disclosure of Invention
The invention aims to provide an SM3 algorithm round iteration system and iteration method based on a coarse-grained reconfigurable architecture, which exploit the advantages of reconfigurable technology, such as parallel processing and independently configurable operation modules, to retain a degree of flexibility while achieving efficient operation of the SM3 algorithm by improving the parallelism of the SM3 algorithm, optimizing the pipeline and the like.
In order to achieve the above purpose, the solution of the invention is:
an SM3 algorithm round iteration system based on a coarse-grained reconfigurable architecture comprises a system bus, a reconfigurable processor and a microprocessor, wherein the reconfigurable processor comprises a configuration unit, an input first-in first-out register set, an output first-in first-out register set, a general register file and 4 reconfigurable array blocks, an inlet of the configuration unit is connected with the microprocessor through the system bus, and outlets of the configuration unit are respectively connected with the reconfigurable array blocks; the input first-in first-out register group is connected with the microprocessor through a system bus; the 4 reconfigurable array blocks are respectively connected with an input first-in first-out register set and an output first-in first-out register set, and the 4 reconfigurable array blocks are all connected with a general register file; the 4 reconfigurable array blocks mutually store, read and transmit data through a general register file; the output first-in first-out register group is connected with the microprocessor through a system bus;
the SM3 algorithm round iteration system comprises 5M+1 configuration flow charts; the microprocessor determines the operation flow of the round iteration by analyzing the characteristics of SM3, and expands the configuration flow charts of the multi-round iteration operation into a data flow graph, which is mapped onto the reconfigurable processor to form configuration information that is sent to the configuration unit; the microprocessor sends plaintext data to the reconfigurable processor through the system bus, where it is stored in the input first-in first-out register set, while initial data, generated keys and calculated intermediate data are stored in the general register file for the next round iteration of the flow chart; the configuration unit stores the configuration information and sends it to each reconfigurable array block.
An iteration method of the SM3 algorithm iteration system based on a coarse-grained reconfigurable architecture comprises the following steps:
step 1, deriving a data flow graph of the SM3 algorithm iteration;
step 2, formulating a data input mode for SM3;
step 3, configuring the reconfigurable processor according to the data input mode determined in step 2 and the data flow graph determined in step 1, and generating configuration information;
step 4, storing the configuration information and the initial data of the reconfigurable processor into the corresponding memory through the microprocessor;
step 5, the microprocessor starts the reconfigurable processor and sends the configuration information and the data to be processed to the reconfigurable processor;
step 6, the reconfigurable processor processes the data to be processed according to the configuration information, sends an interrupt signal after completing the current task, and sends the processed data to the microprocessor through the system bus.
With this scheme, for SM3 algorithm iteration, the 4 reconfigurable array blocks each contain a plurality of operation units, the general register file is used to improve the operation parallelism of the SM3 algorithm, and the multiple rounds of iteration are optimized and accelerated inside the reconfigurable processor by means of parallel shift and permutation operations. The operation efficiency of the SM3 algorithm is thereby improved while a degree of flexibility is retained, and the number of operation cycles is reduced as far as possible.
Drawings
FIG. 1 is a block diagram of the large-scale coarse-grained embedded reconfigurable system processor on which the invention is based;
FIGS. 2 to 7 are configuration flow charts of the SM3 algorithm iteration according to the invention;
FIG. 8 is a schematic diagram of the message expansion rule;
FIG. 9 is a schematic diagram of the compression function rule;
FIG. 10 is an overall flow diagram of the SM3 algorithm round iteration.
Detailed Description
The technical solution and the advantages of the present invention will be described in detail with reference to the accompanying drawings.
As shown in FIG. 1, the present invention provides an SM3 algorithm round iteration system based on a coarse-grained reconfigurable architecture, which includes a system bus, a reconfigurable processor and a microprocessor, each described below.
The reconfigurable processor comprises a configuration unit, an input first-in first-out register set, an output first-in first-out register set, a general register file, 4 reconfigurable array blocks and a lookup table, wherein the input port of the configuration unit is connected with the microprocessor through the system bus, and the output ports of the configuration unit are respectively connected with each reconfigurable array block; the input first-in first-out register set is connected with the microprocessor through the system bus; the 4 reconfigurable array blocks are respectively the 1st, 2nd, 3rd and 4th reconfigurable array blocks; each reconfigurable array block is connected with the input first-in first-out register set and the output first-in first-out register set, and the 4 reconfigurable array blocks are all connected with the general register file; the 4 reconfigurable array blocks store, read and transfer data among themselves through the general register file; the output first-in first-out register set is connected with the microprocessor through the system bus; the SM3 algorithm round iteration system comprises 5M+1 configuration flow charts, wherein:
the microprocessor determines the operation flow of the round iteration by analyzing the characteristics of SM3, expands the configuration flow charts of the multi-round iteration operation into a data flow graph, which is mapped onto the reconfigurable processor to form configuration information, and sends the configuration information to the configuration unit. The microprocessor sends the plaintext data to the reconfigurable processor through the system bus, and the plaintext data is stored in the input first-in first-out register set. The microprocessor also stores the initial data, the generated keys and the calculated intermediate data in the general register file for the next round iteration of the flow chart.
The configuration unit stores the configuration information and sends it to each reconfigurable array block.
For the (5p-4)-th configuration flow chart, 1 ≤ p ≤ M: the (5p-4)-th configuration flow chart acquires the message data from the input first-in first-out register set and reads the corresponding configuration information from the configuration unit; according to the configuration information, it stores the initial message data it has read into the general register file and loads the initial message data into the next configuration flow chart for that flow chart's operation;
for the (5p-3)-th configuration flow chart, 1 ≤ p ≤ M: the (5p-3)-th configuration flow chart acquires the initial message data stored in the general register file by the (5p-4)-th configuration flow chart, reads the corresponding configuration information from the configuration unit, and performs the message expansion iteration; the message expansion word data in the general register file is read through the 1st read port operation row selector; according to the configuration information, the (5p-3)-th configuration flow chart completes the first stage of the message expansion iteration in the SM3 algorithm;
for the (5p-2)-th configuration flow chart, 1 ≤ p ≤ M: the (5p-2)-th configuration flow chart acquires the data stored in the general register file by the (5p-3)-th configuration flow chart, reads the corresponding configuration information from the configuration unit, and performs the message expansion iteration; the message expansion word data in the general register file is read through the 1st read port operation row selector; according to the configuration information, the (5p-2)-th configuration flow chart completes the second stage of the message expansion iteration in the SM3 algorithm;
for the (5p-1)-th configuration flow chart, 1 ≤ p ≤ M: the (5p-1)-th configuration flow chart acquires the message expansion words generated by the (5p-2)-th configuration flow chart, and reads the hash value stored in the general register file and the corresponding configuration information from the configuration unit; the hash value stored in the general register file is read through the 1st read port operation row selector; according to the configuration information, the (5p-1)-th configuration flow chart performs rounds 0 to 15 of the compression function on the message expansion words of the (5p-2)-th configuration flow chart and the initial hash value to obtain the intermediate information of the current block iteration;
for the 5p-th configuration flow chart, 1 ≤ p ≤ M: the 5p-th configuration flow chart acquires the message expansion words and the intermediate information passed on by the (5p-1)-th configuration flow chart, and reads the hash value stored in the general register file and the corresponding configuration information from the configuration unit; the hash value stored in the general register file is read through the 1st read port operation row selector; according to the configuration information, the 5p-th configuration flow chart performs rounds 16 to 63 of the compression function on the message expansion words of the (5p-2)-th configuration flow chart to obtain the intermediate information of the current block iteration;
the (5M+1)-th configuration flow chart acquires the intermediate information of the 5M-th configuration flow chart and reads the corresponding configuration information from the configuration unit; according to the configuration information, the (5M+1)-th configuration flow chart obtains the hash value by XORing the intermediate information of the 5M-th configuration flow chart with the initial hash value.
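The numbering of the 5M+1 configuration flow charts described above can be summarized in a few lines. The following is only an illustrative sketch: the function name `stage_of` and the stage labels are hypothetical names introduced here, not identifiers from the patent.

```python
def stage_of(m: int, M: int) -> tuple[int, str]:
    """Map configuration flow chart index m (1-based) to (message block p, stage label).

    Stage labels are hypothetical names for the roles described in the text:
    5p-4 loads the block, 5p-3 and 5p-2 are the two message expansion stages,
    5p-1 and 5p are compression rounds 0-15 and 16-63, and 5M+1 is the final XOR.
    """
    if M < 1 or not 1 <= m <= 5 * M + 1:
        raise ValueError("m must lie in 1..5M+1 with M >= 1")
    if m == 5 * M + 1:
        return M, "final_xor"
    stages = (
        "load_message",           # chart 5p-4
        "expand_stage_1",         # chart 5p-3
        "expand_stage_2",         # chart 5p-2
        "compress_rounds_0_15",   # chart 5p-1
        "compress_rounds_16_63",  # chart 5p
    )
    p = (m - 1) // 5 + 1
    return p, stages[(m - 1) % 5]


if __name__ == "__main__":
    # List the schedule for a two-block message (M = 2, so 11 flow charts).
    for m in range(1, 12):
        print(m, stage_of(m, 2))
```

For a two-block message this lists charts 1 to 5 for block 1, charts 6 to 10 for block 2, and chart 11 as the final XOR.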
Preferably, the configuration unit comprises a configuration and control interface, a configuration memory and a configuration analysis module which are connected together in sequence, and the configuration and control interface is connected with the system bus; the microprocessor sends the required configuration information to the configuration memory sequentially through the system bus and the configuration and control interface, the configuration memory stores the sent configuration information, the configuration analysis module is used for analyzing the configuration information of the configuration memory and sending the analyzed configuration information to the reconfigurable array block, and configuration, starting and switching operation of the reconfigurable array block are achieved.
Preferably, the reconfigurable array block comprises a read port operation row selector, a write port operation row selector and N reconfigurable array operation rows, and the N reconfigurable array operation rows share the read port operation row selector and the write port operation row selector; the read port operation row selector in the m-th configuration flow chart is denoted the m-th read port operation row selector, the write port operation row selector in the m-th configuration flow chart is denoted the m-th write port operation row selector, and the n-th reconfigurable array operation row in the m-th configuration flow chart is denoted the [Figure BDA0001901704570000054]-th reconfigurable array operation row, where m = 1, …, 5M+1 and n = 1, …, N; 5M+1 is the number of configuration flow charts, N is the number of reconfigurable array operation rows in a reconfigurable array block, and m, n, M and N are all integers; the configuration flow charts are connected in sequence, and the reconfigurable array operation rows in each reconfigurable array block are connected in sequence;
intermediate data produced by a configuration flow chart during round iteration is stored in the general register file through the write port operation row selector, and intermediate data required by a configuration flow chart during round iteration is read from the general register file through the read port operation row selector;
the [Figure BDA0001901704570000051]-th reconfigurable array operation row is connected with the input first-in first-out register set, and the [Figure BDA0001901704570000052]-th reconfigurable array operation row is connected with the output first-in first-out register set; the reconfigurable array operation rows can read the various buffered data and temporary message digests through the general register file, and can also write the initial hash value into the general register file for use in the subsequent compression function calculation;
in the (5p-4)-th configuration flow chart, 1 ≤ p ≤ M, the [Figure BDA0001901704570000053]-th reconfigurable array operation row takes in the message data from the input first-in first-out register set and reads the configuration information of the configuration unit through the 1st read port operation row selector; the [Figure BDA0001901704570000061]-th reconfigurable array operation row passes the message data straight through according to the configuration information to obtain the intermediate data for the next reconfigurable array block round iteration; the intermediate data is written into the general register file through the 1st write port operation row selector;
for the (5p-3)-th configuration flow chart, 1 ≤ p ≤ M, the [Figure BDA0001901704570000062]-th reconfigurable array operation row of the (5p-3)-th configuration flow chart loads the intermediate data of the (5p-4)-th configuration flow chart from the general register file; at the same time the initial hash value is written into the general register file through the 1st write port operation row selector; the configuration information of the configuration unit is read through the read port operation row selector; according to the configuration information, the (5p-3)-th configuration flow chart performs the message expansion iterative operation on the intermediate data of the (5p-4)-th configuration flow chart to obtain the intermediate data of the (5p-3)-th configuration flow chart, completing the first stage of message expansion, and the intermediate data is written into the general register file through the 4th write port operation row selector for the next reconfigurable array block round iteration calculation;
for the (5p-2)-th configuration flow chart, 1 ≤ p ≤ M, the [Figure BDA0001901704570000063]-th reconfigurable array operation row of the (5p-2)-th configuration flow chart loads the intermediate data of the (5p-3)-th configuration flow chart from the general register file; at the same time the initial hash value is written into the general register file through the 1st write port operation row selector; the configuration information of the configuration unit is read through the read port operation row selector; according to the configuration information, the (5p-2)-th configuration flow chart performs the message expansion iterative operation on the intermediate data of the (5p-3)-th configuration flow chart to obtain the intermediate data of the (5p-2)-th configuration flow chart, completing the second stage of message expansion, and the intermediate data is written into the general register file through the 2nd write port operation row selector for the next reconfigurable array block round iteration calculation;
for the (5p-1)-th configuration flow chart, 1 ≤ p ≤ M, the [Figure BDA0001901704570000064]-th reconfigurable array operation row of the (5p-1)-th configuration flow chart loads the intermediate data of the (5p-2)-th configuration flow chart from the general register file; the configuration information of the configuration unit is read through the read port operation row selector; according to the configuration information, the (5p-1)-th configuration flow chart performs the compression function iterative operation on the intermediate data of the (5p-2)-th configuration flow chart to obtain the intermediate data of the (5p-1)-th configuration flow chart, completing rounds 0 to 15 of the compression function, and the intermediate data is written into the general register file through the 3rd write port operation row selector for the next reconfigurable array block round iteration;
for the 5p-th configuration flow chart, 1 ≤ p ≤ M, the [Figure BDA0001901704570000065]-th reconfigurable array operation row of the 5p-th configuration flow chart loads the intermediate data of the (5p-1)-th configuration flow chart from the general register file; the configuration information of the configuration unit is read through the read port operation row selector; according to the configuration information, the 5p-th configuration flow chart performs the compression function iterative operation on the intermediate data of the (5p-1)-th configuration flow chart to obtain the intermediate data of the 5p-th configuration flow chart, completing rounds 16 to 63 of the compression function, and the intermediate data is written into the general register file through the 3rd write port operation row selector for the calculation of the next configuration flow chart;
for the (5M+1)-th configuration flow chart, the [Figure BDA0001901704570000071]-th reconfigurable array operation row of the (5M+1)-th configuration flow chart loads the intermediate data of the 5M-th configuration flow chart from the general register file; according to the configuration information, the (5M+1)-th configuration flow chart obtains the hash value by XORing the intermediate information of the 5M-th configuration flow chart with the initial hash value.
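As a software point of comparison for the staged flow above, the sketch below computes SM3 with one function per stage. It follows the published SM3 standard (GB/T 32905) rather than the patent's hardware mapping, and it rests on two assumptions: that "first stage" and "second stage" of message expansion correspond to generating W16..W67 and W'0..W'63 respectively, and that the chaining XOR is applied after every block as the standard specifies (the text above attributes the XOR to the final flow chart). All function names are hypothetical.

```python
import struct

MASK = 0xFFFFFFFF
IV = (0x7380166F, 0x4914B2B9, 0x172442D7, 0xDA8A0600,
      0xA96F30BC, 0x163138AA, 0xE38DEE4D, 0xB0FB0E4E)

def rotl(x, n):
    """32-bit left rotation; the rotation amount is taken modulo 32."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK

def p0(x):
    return x ^ rotl(x, 9) ^ rotl(x, 17)

def p1(x):
    return x ^ rotl(x, 15) ^ rotl(x, 23)

def pad(msg: bytes) -> bytes:
    """Append 0x80, zero bytes, and the 64-bit big-endian bit length."""
    bit_len = 8 * len(msg)
    msg += b"\x80"
    msg += b"\x00" * (-(len(msg) + 8) % 64)
    return msg + struct.pack(">Q", bit_len)

def load_block(block: bytes):
    """Role of chart 5p-4: read one 512-bit block as W0..W15."""
    return list(struct.unpack(">16I", block))

def expand_stage_1(w):
    """Assumed role of chart 5p-3: extend W0..W15 to W0..W67."""
    w = list(w)
    for j in range(16, 68):
        w.append(p1(w[j - 16] ^ w[j - 9] ^ rotl(w[j - 3], 15))
                 ^ rotl(w[j - 13], 7) ^ w[j - 6])
    return w

def expand_stage_2(w):
    """Assumed role of chart 5p-2: W'j = Wj xor W(j+4), j = 0..63."""
    return [w[j] ^ w[j + 4] for j in range(64)]

def compress_rounds(state, w, w1, first, last):
    """Roles of charts 5p-1 (rounds 0-15) and 5p (rounds 16-63)."""
    a, b, c, d, e, f, g, h = state
    for j in range(first, last + 1):
        t = 0x79CC4519 if j < 16 else 0x7A879D8A
        ss1 = rotl((rotl(a, 12) + e + rotl(t, j)) & MASK, 7)
        ss2 = ss1 ^ rotl(a, 12)
        if j < 16:
            ff, gg = a ^ b ^ c, e ^ f ^ g
        else:
            ff = (a & b) | (a & c) | (b & c)
            gg = (e & f) | (~e & g & MASK)
        tt1 = (ff + d + ss2 + w1[j]) & MASK
        tt2 = (gg + h + ss1 + w[j]) & MASK
        d, c, b, a = c, rotl(b, 9), a, tt1
        h, g, f, e = g, rotl(f, 19), e, p0(tt2)
    return (a, b, c, d, e, f, g, h)

def sm3(msg: bytes) -> str:
    v = IV
    data = pad(msg)
    for off in range(0, len(data), 64):
        w = expand_stage_1(load_block(data[off:off + 64]))
        w1 = expand_stage_2(w)
        s = compress_rounds(v, w, w1, 0, 15)
        s = compress_rounds(s, w, w1, 16, 63)
        v = tuple(x ^ y for x, y in zip(s, v))  # chaining XOR per the standard
    return "".join(f"{x:08x}" for x in v)

if __name__ == "__main__":
    print(sm3(b"abc"))
```

Running the script on the three-byte message "abc" should reproduce the digest given for that example in the SM3 standard, 66c7f0f4 62eeedd9 d1f2d46b dc10e4e2 4167c487 5cf2f7a2 297da02b 8f4ba8e0.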
Preferably, each reconfigurable array operation row comprises X1 data loading units, X2 data output units and X3 32-bit arithmetic units; each arithmetic unit uses the corresponding read port operation row selector to select any three outputs of the arithmetic units in the previous row or the current row as its inputs; the k1-th data loading unit of the n-th reconfigurable array operation row of the m-th configuration flow chart is denoted the [Figure BDA0001901704570000072]-th data loading unit, the k2-th data output unit of the n-th reconfigurable array operation row of the m-th configuration flow chart is denoted the [Figure BDA0001901704570000073]-th data output unit, the k3-th arithmetic unit of the n-th reconfigurable array operation row of the m-th configuration flow chart is denoted the [Figure BDA0001901704570000074]-th arithmetic unit, and the output of the [Figure BDA0001901704570000075]-th arithmetic unit is denoted [Figure BDA0001901704570000076], where k1 = 1, …, X1, k2 = 1, …, X2, k3 = 1, …, X3, k4 = 1, …, X4, and X1, X2, X3 and X4 are integers; the arithmetic unit receives the intermediate data selected by the m-th read port operation row selector and the configuration information parsed by the configuration analysis module;
the [Figure BDA0001901704570000077]-th and [Figure BDA0001901704570000078]-th data loading units load data from the input first-in first-out register set and receive the configuration information parsed by the configuration analysis module; the information stored in the general register file is read through the 1st read port operation row selector, and the permutation network into which the data flows is selected according to the parsed configuration information, the permutation network being the [Figure BDA0001901704570000079]-th and [Figure BDA00019017045700000710]-th arithmetic units; the [Figure BDA00019017045700000711]-th and [Figure BDA00019017045700000712]-th data output units temporarily store the results of the corresponding [Figure BDA00019017045700000713]-th, [Figure BDA00019017045700000714]-th and [Figure BDA00019017045700000715]-th arithmetic logic units, and read the configuration information to determine whether to output the data to the output first-in first-out register set, the next reconfigurable array operation row, or the general register file;
the [Figure BDA00019017045700000716]-th data loading unit receives the configuration information parsed by the configuration analysis module, reads the run data of the 5m-th configuration flow chart stored in the general register file through the 2nd and 3rd read port operation row selectors, and selects the arithmetic unit into which the data flows according to the parsed configuration information; the [Figure BDA00019017045700000717]-th arithmetic unit performs the operation and temporarily stores the output data in the corresponding [Figure BDA00019017045700000718]-th output unit, which outputs the data to the output first-in first-out register set, the next reconfigurable array operation row, or the general register file.
Preferably, the arithmetic unit supports modular addition, exclusive-OR, AND, NAND, shift, and pass-through output operations; each arithmetic unit has at most 3 inputs and at most 2 outputs, and while performing one of the above operations it can also forward any one of its inputs as an output.
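As an illustration only, the behaviour of one such 32-bit arithmetic unit can be modelled in software. The sketch below is an assumption-laden model, not the patented circuit: the opcode names, the `alu` function, and the `bypass` parameter are hypothetical, and the "shift operation" is modelled as the left rotation that SM3 uses.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit word left by n bits (n taken modulo 32)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

# Hypothetical opcode table; the names are illustrative, not from the patent.
OPS = {
    "ADD":  lambda a, b, c: (a + b) & MASK32,     # addition modulo 2**32
    "XOR":  lambda a, b, c: a ^ b,
    "AND":  lambda a, b, c: a & b,
    "NAND": lambda a, b, c: ~(a & b) & MASK32,
    "ROTL": lambda a, b, c: rotl32(a, b),         # rotation amount in input b
    "PASS": lambda a, b, c: a,                    # pass-through output
}

def alu(op, a, b=0, c=0, bypass=None):
    """Model one unit: up to 3 inputs, up to 2 outputs.

    Returns (result, forwarded_input); forwarded_input is None unless
    `bypass` names the index (0, 1 or 2) of an input to forward unchanged.
    """
    inputs = (a, b, c)
    result = OPS[op](a, b, c)
    forwarded = None if bypass is None else inputs[bypass]
    return result, forwarded
```

Forwarding an input next to the computed result is what lets a later row reuse an operand without reloading it from the general register file.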
Preferably, the number of the reconfigurable array blocks is 4, the reconfigurable array blocks are sequentially connected together end to end, the number of the general register files is 1, the number of the input first-in first-out register groups is 4, and the number of the output first-in first-out register groups is 4.
Preferably, each reconfigurable array block comprises 4 rows of reconfigurable array operation rows, 4 read port operation row selectors and 4 write port operation row selectors; each row of reconfigurable array operation row comprises 4 data loading units, 4 data input units and 8 32-bit arithmetic operation units.
Preferably, M is the number of 512-bit blocks into which the message data is partitioned.
The invention also provides an iteration method for the SM3 algorithm iteration system based on the coarse-grained reconfigurable architecture, comprising the following steps:
step 1, analyzing the iterative computation characteristics of the SM3 algorithm and deriving a data flow graph;
step 2, formulating the data input mode of SM3 according to the operation flow in the data flow graph;
step 3, configuring the reconfigurable processor, in view of its characteristics, according to the data input mode determined in step 2 and the data flow graph determined in step 1, and generating configuration information;
step 4, storing the configuration information and the initial data of the reconfigurable processor into the corresponding memory through the microprocessor;
step 5, the microprocessor starts the reconfigurable processor and sends the configuration information and the data to be processed to the reconfigurable processor;
step 6, the reconfigurable processor processes the data according to the configuration information and the data to be processed, sends an interrupt signal after completing the current task, and sends the processed data to the microprocessor through the system bus.
Preferably, the specific process of the reconfigurable processor performing data processing according to the configuration information and the data to be processed in step 6 is as follows:
Step 61: The
Figure BDA0001901704570000081
data loading unit loads the initial 512-bit message data from the input FIFO register group in sequence, 128 bits at a time; the configuration information of the configuration unit is read through the 1st read port operation row selector; according to the configuration information, the
Figure BDA0001901704570000082
arithmetic logic unit selects pass-through mode and passes the 512-bit message data through the
Figure BDA0001901704570000083
data output unit into the general register file, where the words are denoted W0, W1, …, W15.
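A minimal software sketch of this loading step follows. It assumes the big-endian word order and the padding rule of the public SM3 standard, neither of which is spelled out in this passage; the function name `load_block` is hypothetical. A 512-bit block holds sixteen 32-bit words, delivered here as four 128-bit transfers.

```python
def load_block(block: bytes):
    """Split one 64-byte block into sixteen 32-bit words, 128 bits per transfer."""
    assert len(block) == 64
    words = []
    for beat in range(4):                       # four 128-bit transfers
        chunk = block[16 * beat:16 * (beat + 1)]
        for i in range(0, 16, 4):               # four 32-bit words per transfer
            words.append(int.from_bytes(chunk[i:i + 4], "big"))
    return words

# "abc" padded as in standard SM3 (assumption): 0x80, zero bytes,
# then the 64-bit big-endian message length in bits.
padded_abc = b"abc" + b"\x80" + bytes(52) + (24).to_bytes(8, "big")
W = load_block(padded_abc)
```

In this sketch the pass-through mode corresponds to the words reaching the register file unchanged.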
Step 62: For the (5p-3)-th configuration flow chart, 1 ≤ p ≤ M: the configuration information of the configuration unit is read through the (5p-3)-th read port operation row selector, and the (5p-3)-th configuration flow chart reads the message words Wj-3, Wj-5, Wj-6, Wj-8, Wj-9, Wj-11, Wj-12, Wj-13 and Wj-16 from the general register file, where 0 ≤ j < 68 and j is even. In the
Figure BDA0001901704570000091
Row of row reconfigurable array operation
Figure BDA0001901704570000092
A data load unit loads Wj-3, Wj-9, Wj-12, Wj-13, th
Figure BDA0001901704570000093
The individual arithmetic unit inputs Wj-3, Wj-9 perform shift and XOR operations,
Figure BDA0001901704570000094
obtain an output
Figure BDA0001901704570000095
First, the
Figure BDA0001901704570000096
A unit of operation input
Figure BDA0001901704570000097
The Wj-16 performs an exclusive or operation,
Figure BDA0001901704570000098
obtain an output
Figure BDA0001901704570000099
First, the
Figure BDA00019017045700000910
A unit of operation input
Figure BDA00019017045700000911
The permutation function P1 is completed and,
Figure BDA00019017045700000912
obtain an output
Figure BDA00019017045700000913
First, the
Figure BDA00019017045700000914
arithmetic unit takes Wj-13 as input and performs a shift operation,
Figure BDA00019017045700000915
to obtain the output (Wj-13 <<< 7). The
Figure BDA00019017045700000916
A unit of operation input
Figure BDA00019017045700000917
And
Figure BDA00019017045700000918
the exclusive or operation is completed and the operation is performed,
Figure BDA00019017045700000919
obtain an output
Figure BDA00019017045700000920
Figure BDA00019017045700000921
First, the
Figure BDA00019017045700000922
A unit of operation input
Figure BDA00019017045700000923
And
Figure BDA00019017045700000924
the exclusive or operation is completed and the operation is performed,
Figure BDA00019017045700000925
obtain an output
Figure BDA00019017045700000926
First, the
Figure BDA00019017045700000927
Row of row reconfigurable array operation
Figure BDA00019017045700000928
A data load unit loads Wj-6, the first
Figure BDA00019017045700000929
A unit of operation input
Figure BDA00019017045700000930
And
Figure BDA00019017045700000931
the exclusive or operation is completed and the operation is performed,
Figure BDA00019017045700000932
obtain an output
Figure BDA00019017045700000933
Figure BDA00019017045700000934
And pass through
Figure BDA00019017045700000935
The data output unit is stored in the general register file.
First, the
Figure BDA00019017045700000936
Row of row reconfigurable array operation
Figure BDA00019017045700000937
A data load unit loads Wj-2, Wj-8, Wj-11, Wj-12, th
Figure BDA00019017045700000938
The individual arithmetic unit inputs Wj-2, Wj-8 perform shift and XOR operations,
Figure BDA00019017045700000939
obtain an output
Figure BDA00019017045700000940
First, the
Figure BDA00019017045700000941
A unit of operation input
Figure BDA00019017045700000942
The Wj-15 performs an exclusive or operation,
Figure BDA00019017045700000943
obtain an output
Figure BDA00019017045700000944
First, the
Figure BDA00019017045700000945
A unit of operation input
Figure BDA00019017045700000946
The permutation function P1 is completed and,
Figure BDA00019017045700000947
obtain an output
Figure BDA00019017045700000948
First, the
Figure BDA00019017045700000949
The individual arithmetic unit inputs Wj-12 perform a shift operation,
Figure BDA00019017045700000950
to obtain the output (Wj-12 <<< 7). The
Figure BDA00019017045700000951
A unit of operation input
Figure BDA00019017045700000952
And
Figure BDA00019017045700000953
the exclusive or operation is completed and the operation is performed,
Figure BDA00019017045700000954
to obtain an output P1
Figure BDA00019017045700000955
First, the
Figure BDA00019017045700000956
Row of row reconfigurable array operation
Figure BDA00019017045700000957
A unit of operation input
Figure BDA00019017045700000958
And
Figure BDA00019017045700000959
the exclusive or operation is completed and the operation is performed,
Figure BDA00019017045700000960
obtain an output
Figure BDA00019017045700000961
Figure BDA00019017045700000962
First, the
Figure BDA00019017045700000963
A data load unit loads Wj-5, the first
Figure BDA00019017045700000964
A unit of operation input
Figure BDA00019017045700000965
And
Figure BDA00019017045700000966
the exclusive or operation is completed and the operation is performed,
Figure BDA00019017045700000967
obtain an output
Figure BDA0001901704570000101
And pass through
Figure BDA0001901704570000102
data output unit into the general register file. Step 62 is repeated until j = 67.
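The message expansion carried out in step 62 can be sketched in software as follows. This is an illustrative model under stated assumptions, not the array mapping itself: it uses the recurrence of the public SM3 standard, Wj = P1(Wj-16 ⊕ Wj-9 ⊕ (Wj-3 <<< 15)) ⊕ (Wj-13 <<< 7) ⊕ Wj-6, assumes the expansion starts at j = 16, and produces two words (Wj and Wj+1) per pass to mirror the even-j stepping described above. The function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit word left by n bits (n taken modulo 32)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def p1(x):
    """Permutation P1 used in the message expansion."""
    return x ^ rotl32(x, 15) ^ rotl32(x, 23)

def expand(w16):
    """Extend W0..W15 to W0..W67, two words per pass (j even)."""
    w = list(w16)
    for j in range(16, 68, 2):
        for t in (j, j + 1):                    # W_j, then W_{j+1}
            x = w[t - 16] ^ w[t - 9] ^ rotl32(w[t - 3], 15)
            w.append(p1(x) ^ rotl32(w[t - 13], 7) ^ w[t - 6])
    return w

# First block of the padded message "abc" (standard SM3 padding assumed).
W = expand([0x61626380] + [0] * 14 + [0x00000018])
```

The second word of each pair uses the operands shifted by one index (Wj-15, Wj-8, Wj-2, Wj-12, Wj-5), which matches the second group of loads in the step.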
Step 63: For the (5p-2)-th configuration flow chart, 1 ≤ p ≤ M: the configuration information of the configuration unit is read through the (5p-2)-th read port operation row selector, and the message words Wk-12, Wk-11, Wk-15 and Wk-16 of the (5p-3)-th reconfigurable array block are read from the general register file, where 0 ≤ k < 64 and k is even. The
Figure BDA0001901704570000103
Row of row reconfigurable array operation
Figure BDA0001901704570000104
A data load unit loads Wk-16 and Wk-12
Figure BDA0001901704570000105
The input Wk-16 and Wk-12 of each operation unit complete the XOR operation,
Figure BDA0001901704570000106
obtain an output
Figure BDA0001901704570000107
First, the
Figure BDA0001901704570000108
And
Figure BDA0001901704570000109
the operation units are straight-through units,
Figure BDA00019017045700001010
output of
Figure BDA00019017045700001011
And pass through
Figure BDA00019017045700001012
The data output unit is stored in the general register file.
First, the
Figure BDA00019017045700001013
Row of row reconfigurable array operation
Figure BDA00019017045700001014
A data load unit loads Wk-15 and Wk-11, the first
Figure BDA00019017045700001015
The inputs Wk-15 and Wk-11 of the operation units complete the XOR operation,
Figure BDA00019017045700001016
obtain an output
Figure BDA00019017045700001017
First, the
Figure BDA00019017045700001018
The operation units are straight-through units,
Figure BDA00019017045700001019
output of
Figure BDA00019017045700001020
And pass through
Figure BDA00019017045700001021
data output unit into the general register file. Step 63 is repeated until k = 63.
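As a hedged sketch, step 63 corresponds to computing the second set of extension words, each the exclusive-OR of two expanded words four positions apart (the pairs Wk-16 ⊕ Wk-12 and Wk-15 ⊕ Wk-11 above). The zero-based indexing and the name `expand_prime` are assumptions made for illustration.

```python
def expand_prime(w):
    """Return W'_0..W'_63, where W'_i = W_i XOR W_{i+4}, two words per pass."""
    assert len(w) >= 68
    wp = []
    for k in range(0, 64, 2):
        wp.append(w[k] ^ w[k + 4])              # first row of the step
        wp.append(w[k + 1] ^ w[k + 5])          # second row of the step
    return wp
```

Because each output depends only on already expanded words, the two rows are independent and can run side by side.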
Step 64: For the (5p-1)-th configuration flow chart, 1 ≤ p ≤ M: the message extension words written to the general register file by the (5p-3)-th and (5p-2)-th reconfigurable array blocks are read. In the (5p-1)-th reconfigurable array block, in the
Figure BDA00019017045700001022
reconfigurable array operation row, the
Figure BDA00019017045700001023
data loading unit loads the hash values A, D and E and the constant Tj from the general register file, where 0 ≤ j < 16; the
Figure BDA00019017045700001024
operation units complete SS1 ← ((A <<< 12) + E + (Tj <<< j)) <<< 7,
Figure BDA00019017045700001025
and the SS2 + D operation.
In the
Figure BDA00019017045700001026
reconfigurable array operation row, the
Figure BDA00019017045700001027
data loading unit loads the hash values B and C and the message extension words Wj and Wj′ from the general register file, where 0 ≤ j < 16; the
Figure BDA00019017045700001028
arithmetic units complete the Boolean function FFj(A, B, C), TT1 ← FFj(A, B, C) + D + SS2 + Wj′, B <<< 9, and the SS1 + Wj operation; the
Figure BDA00019017045700001029
output unit sequentially writes the outputs D′ = C, C′ = B <<< 9, B′ = A and A′ = TT1 into the general register file as the hash value of the block.
In the
Figure BDA00019017045700001030
reconfigurable array operation row, the
Figure BDA00019017045700001031
data loading unit loads the hash values E, F, G and H from the general register file; the
Figure BDA00019017045700001032
arithmetic units complete the Boolean function GGj(E, F, G) and TT2 ← GGj(E, F, G) + H + SS1 + Wj, and pass E, F and G through in turn.
In the
Figure BDA00019017045700001033
reconfigurable array operation row, the
Figure BDA00019017045700001034
arithmetic units complete the permutation function E ← P0(TT2) and the F <<< 19 operation; the
Figure BDA0001901704570000111
output unit sequentially writes the outputs E′ = P0(TT2), F′ = E, G′ = F <<< 19 and H′ = G into the general register file as the hash value of the block. Step 64 is repeated until j = 15.
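One round of this step can be sketched in software as below. The sketch relies on the public SM3 standard for details that sit in formula images here, and these are assumptions: SS2 = SS1 ⊕ (A <<< 12), Tj = 0x79CC4519 for 0 ≤ j < 16, and FFj and GGj both equal to the three-way exclusive-OR in this range. The function name `round_low` is hypothetical.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit word left by n bits (n taken modulo 32)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def p0(x):
    """Permutation P0 used in the compression rounds."""
    return x ^ rotl32(x, 9) ^ rotl32(x, 17)

def round_low(state, j, wj, wpj, t=0x79CC4519):
    """One round for 0 <= j < 16; returns the new (A, B, C, D, E, F, G, H)."""
    a, b, c, d, e, f, g, h = state
    a12 = rotl32(a, 12)
    ss1 = rotl32((a12 + e + rotl32(t, j)) & MASK32, 7)
    ss2 = ss1 ^ a12
    tt1 = ((a ^ b ^ c) + d + ss2 + wpj) & MASK32    # FFj is XOR for j < 16
    tt2 = ((e ^ f ^ g) + h + ss1 + wj) & MASK32     # GGj is XOR for j < 16
    return (tt1, a, rotl32(b, 9), c, p0(tt2), e, rotl32(f, 19), g)

# Initial value of standard SM3 and round 0 for the padded message "abc".
IV = (0x7380166F, 0x4914B2B9, 0x172442D7, 0xDA8A0600,
      0xA96F30BC, 0x163138AA, 0xE38DEE4D, 0xB0FB0E4E)
after_round_0 = round_low(IV, 0, 0x61626380, 0x61626380)
```

The four groups of operations in the function body correspond loosely to the four operation rows of the step: SS1 and SS2, then TT1, then TT2, then P0 and the rotations.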
Step 65: For the 5p-th configuration flow chart, 1 ≤ p ≤ M: the message extension words written to the general register file by the (5p-3)-th and (5p-2)-th reconfigurable array blocks are read. In the 5p-th reconfigurable array block, in the
Figure BDA0001901704570000112
reconfigurable array operation row, the
Figure BDA0001901704570000113
data loading unit loads the hash values A, D and E and the constant Tj from the general register file, where 16 ≤ j < 64; the
Figure BDA0001901704570000114
operation units complete SS1 ← ((A <<< 12) + E + (Tj <<< j)) <<< 7,
Figure BDA0001901704570000115
and the SS2 + D operation.
In the
Figure BDA0001901704570000116
reconfigurable array operation row, the
Figure BDA0001901704570000117
data loading unit loads the hash values B and C and the message extension words Wj and Wj′ from the general register file, where 16 ≤ j < 64; the
Figure BDA0001901704570000118
arithmetic units complete the Boolean function FFj(A, B, C), TT1 ← FFj(A, B, C) + D + SS2 + Wj′, B <<< 9, and the SS1 + Wj operation; the
Figure BDA0001901704570000119
output unit sequentially writes the outputs D′ = C, C′ = B <<< 9, B′ = A and A′ = TT1 into the general register file as the hash value of the block.
In the
Figure BDA00019017045700001110
reconfigurable array operation row, the
Figure BDA00019017045700001111
data loading unit loads the hash values E, F, G and H from the general register file; the
Figure BDA00019017045700001112
arithmetic units complete the Boolean function GGj(E, F, G) and TT2 ← GGj(E, F, G) + H + SS1 + Wj, and pass E, F and G through.
In the
Figure BDA00019017045700001113
reconfigurable array operation row, the
Figure BDA00019017045700001114
arithmetic units complete the permutation function E ← P0(TT2) and the F <<< 19 operation; the
Figure BDA00019017045700001115
output unit sequentially writes the outputs E′ = P0(TT2), F′ = E, G′ = F <<< 19 and H′ = G into the general register file as the hash value of the block. Step 65 is repeated until j = 63.
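Rounds 16 to 63 differ from the earlier rounds only in the constant and the Boolean functions. The sketch below takes these from the public SM3 standard, as an assumption, since the formula images are not reproduced here: Tj = 0x7A879D8A, FFj(X, Y, Z) = (X ∧ Y) ∨ (X ∧ Z) ∨ (Y ∧ Z), and GGj(X, Y, Z) = (X ∧ Y) ∨ (¬X ∧ Z). The rotation of Tj is taken modulo 32, and the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def rotl32(x, n):
    """Rotate a 32-bit word left by n bits (n taken modulo 32)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def p0(x):
    return x ^ rotl32(x, 9) ^ rotl32(x, 17)

def ff_high(x, y, z):
    """Bitwise majority, used as FFj for 16 <= j < 64."""
    return (x & y) | (x & z) | (y & z)

def gg_high(x, y, z):
    """Bitwise select (x chooses y or z), used as GGj for 16 <= j < 64."""
    return (x & y) | (~x & z & MASK32)

def round_high(state, j, wj, wpj, t=0x7A879D8A):
    """One round for 16 <= j < 64; returns the new (A, B, C, D, E, F, G, H)."""
    a, b, c, d, e, f, g, h = state
    a12 = rotl32(a, 12)
    ss1 = rotl32((a12 + e + rotl32(t, j)) & MASK32, 7)
    ss2 = ss1 ^ a12
    tt1 = (ff_high(a, b, c) + d + ss2 + wpj) & MASK32
    tt2 = (gg_high(e, f, g) + h + ss1 + wj) & MASK32
    return (tt1, a, rotl32(b, 9), c, p0(tt2), e, rotl32(f, 19), g)
```

Only the two Boolean functions and the constant change between steps 64 and 65, so the same row layout can serve both ranges of j with different configuration information.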
Step 66: For the (5M+1)-th configuration flow chart: the (5M+1)-th configuration flow chart reads the hash values written to the general register file by the 5M-th and (5M-5)-th configuration flow charts. In the (5M+1)-th reconfigurable array block, in the
Figure BDA00019017045700001116
reconfigurable array operation row, the
Figure BDA00019017045700001117
data loading unit loads from the general register file the hash value ABCD written by the 5M-th configuration flow chart, and the
Figure BDA00019017045700001118
arithmetic units are configured in pass-through mode;
in the
Figure BDA00019017045700001119
reconfigurable array operation row, the
Figure BDA00019017045700001120
data loading unit reads from the general register file the hash values A′, B′, C′ and D′ of the (5M-5)-th reconfigurable array block, and the
Figure BDA00019017045700001121
operation units complete in sequence
Figure BDA00019017045700001122
The
Figure BDA00019017045700001123
data output unit sends the hash value into the output first-in first-out register group and at the same time writes it into the general register file. Step 66 is repeated for E, F, G and H to complete the same operation.
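The operation in the formula image of this step is not reproduced here. Assuming it is the feed-forward of the public SM3 standard, in which the new chaining value is the exclusive-OR of the round output with the previous chaining value, a minimal sketch is the following. The name `feed_forward` and the two-halves split are illustrative.

```python
def feed_forward(prev_hash, round_out):
    """XOR the eight round-output words with the previous hash value.

    The first four words (A..D) and the last four (E..H) are handled in
    two passes, mirroring the repetition of the step for EFGH.
    """
    assert len(prev_hash) == 8 and len(round_out) == 8
    result = []
    for half in (slice(0, 4), slice(4, 8)):     # ABCD first, then EFGH
        result.extend(x ^ y for x, y in zip(round_out[half], prev_hash[half]))
    return tuple(result)
```

For the last block of a message, the result is the final 256-bit digest that is sent to the output first-in first-out register group.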
Preferably, the permutation functions P0 and P1 used in steps 62, 63, 64 and 65 are defined as follows:
P0(X)=X⊕(X<<<9)⊕(X<<<17)
P1(X)=X⊕(X<<<15)⊕(X<<<23), where X is a word.
The Boolean functions FFj and GGj in steps 62, 63, 64 and 65 are as follows:
Figure BDA0001901704570000121
Figure BDA0001901704570000122
where X, Y and Z are words.
The constants in step 64 are as follows:
Figure BDA0001901704570000123
Figure BDA0001901704570000124
represents exclusive-OR, ∧ represents AND, ¬ represents NOT, and ∨ represents OR.
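To show how the permutations, Boolean functions and constants fit together, the following is a compact software reference of the whole hash. It is a sketch based on the public SM3 standard rather than on the reconfigurable mapping described here; the piecewise definitions of FFj, GGj and Tj and the padding rule come from that standard and are assumptions with respect to this text. All function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF
IV = (0x7380166F, 0x4914B2B9, 0x172442D7, 0xDA8A0600,
      0xA96F30BC, 0x163138AA, 0xE38DEE4D, 0xB0FB0E4E)

def rotl32(x, n):
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32

def p0(x):
    return x ^ rotl32(x, 9) ^ rotl32(x, 17)

def p1(x):
    return x ^ rotl32(x, 15) ^ rotl32(x, 23)

def ff(j, x, y, z):
    return x ^ y ^ z if j < 16 else (x & y) | (x & z) | (y & z)

def gg(j, x, y, z):
    return x ^ y ^ z if j < 16 else (x & y) | (~x & z & MASK32)

def t(j):
    return 0x79CC4519 if j < 16 else 0x7A879D8A

def compress(v, block):
    """Process one 64-byte block and return the next chaining value."""
    w = [int.from_bytes(block[i:i + 4], "big") for i in range(0, 64, 4)]
    for j in range(16, 68):                     # message expansion
        w.append(p1(w[j - 16] ^ w[j - 9] ^ rotl32(w[j - 3], 15))
                 ^ rotl32(w[j - 13], 7) ^ w[j - 6])
    a, b, c, d, e, f, g, h = v
    for j in range(64):                         # 64 compression rounds
        a12 = rotl32(a, 12)
        ss1 = rotl32((a12 + e + rotl32(t(j), j)) & MASK32, 7)
        ss2 = ss1 ^ a12
        tt1 = (ff(j, a, b, c) + d + ss2 + (w[j] ^ w[j + 4])) & MASK32
        tt2 = (gg(j, e, f, g) + h + ss1 + w[j]) & MASK32
        a, b, c, d = tt1, a, rotl32(b, 9), c
        e, f, g, h = p0(tt2), e, rotl32(f, 19), g
    return tuple(x ^ y for x, y in zip((a, b, c, d, e, f, g, h), v))

def sm3(message: bytes) -> str:
    """Return the SM3 digest of `message` as a hexadecimal string."""
    padded = message + b"\x80"
    padded += bytes((56 - len(padded)) % 64)    # zero fill to 56 mod 64
    padded += (8 * len(message)).to_bytes(8, "big")
    v = IV
    for i in range(0, len(padded), 64):
        v = compress(v, padded[i:i + 64])
    return "".join(f"{x:08x}" for x in v)
```

A software model of this kind is mainly useful as a golden reference: the words written to the general register file after each configuration flow chart can be compared against the intermediate values it produces.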
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereby, and any modifications made on the basis of the technical scheme according to the technical idea of the present invention fall within the protection scope of the present invention.

Claims (7)

1. An SM3 algorithm round iteration system based on a coarse-grained reconfigurable architecture, characterized in that: the system comprises a system bus, a reconfigurable processor and a microprocessor, wherein the reconfigurable processor comprises a configuration unit, an input first-in first-out register group, an output first-in first-out register group, a general register file and 4 reconfigurable array blocks; an input port of the configuration unit is connected with the microprocessor through the system bus, and an output port of the configuration unit is connected with each reconfigurable array block; the input first-in first-out register group is connected with the microprocessor through the system bus; the 4 reconfigurable array blocks are each connected with the input first-in first-out register group and the output first-in first-out register group, and the 4 reconfigurable array blocks are all connected with the general register file; the 4 reconfigurable array blocks store, read and exchange data with one another through the general register file; the output first-in first-out register group is connected with the microprocessor through the system bus;
the SM3 algorithm round iteration system comprises 5M +1 configuration flow charts, the microprocessor determines the operation flow of round iteration by analyzing the characteristics of SM3, and expands the configuration flow charts of multi-round iteration operation into a data flow chart which is mapped to the reconfigurable processor to form configuration information which is sent to the configuration unit; the microprocessor sends plaintext data to the reconfigurable processor through a system bus, the plaintext data are stored in an input first-in first-out register set, and initial data, generated keys and calculated intermediate data are stored in a general register file for next round iteration of a graph; the configuration unit is used for storing configuration information and sending the configuration information to each reconfigurable array block;
the reconfigurable array block comprises a read port operation row selector, a write port operation row selector and N reconfigurable array operation rows, wherein the N reconfigurable array operation rows share the read port operation row selector and the write port operation row selector; the read port operation row selector in the m-th configuration flow chart is denoted the m-th read port operation row selector, the write port operation row selector in the m-th configuration flow chart is denoted the m-th write port operation row selector, and the n-th reconfigurable array operation row in the m-th configuration flow chart is denoted the
Figure FDA0002983758120000011
reconfigurable array operation row, where m = 1, …, 5M+1, n = 1, …, N, 5M+1 is the number of configuration flow charts, N is the number of reconfigurable array operation rows included in the reconfigurable array block, and M and N are integers; the configuration flow charts are connected in sequence, and the reconfigurable array operation rows in each reconfigurable array block are connected in sequence; intermediate data produced by a configuration flow chart during round iteration is stored in the general register file through the write port operation row selector, and intermediate data required by a configuration flow chart during round iteration is read from the general register file through the read port operation row selector;
each reconfigurable array operation row comprises X1 data loading units, X2 data output units and X3 32-bit operation units; through the corresponding read port operation row selector, each operation unit selects as its inputs the outputs of any three operation units in the previous row or in its own row; the k1-th data loading unit of the n-th reconfigurable array operation row of the m-th configuration flow chart is denoted the
Figure FDA0002983758120000021
data loading unit, the k2-th data output unit of the n-th reconfigurable array operation row of the m-th configuration flow chart is denoted the
Figure FDA0002983758120000022
data output unit, the k3-th operation unit of the n-th reconfigurable array operation row of the m-th configuration flow chart is denoted the
Figure FDA0002983758120000023
operation unit, and the output of the
Figure FDA0002983758120000024
operation unit is denoted
Figure FDA0002983758120000025
where k1 = 1...X1, k2 = 1...X2, k3 = 1...X3, k4 = 1...X4, and X1, X2, X3 and X4 are integers; the operation unit receives the intermediate data selected by the m-th read port operation row selector and the configuration information parsed by the configuration analysis module;
first, the
Figure FDA0002983758120000026
And
Figure FDA0002983758120000027
the data loading unit loads data input into the FIFO register set and analyzes the configuration information of the configuration analysis module; reading the information stored in the general register file by the 1 st read port operation row selector and selecting a corresponding replacement network into which data flows according to the analyzed configuration information, wherein the replacement network is the 1 st read port operation row selector
Figure FDA0002983758120000028
And
Figure FDA0002983758120000029
an arithmetic unit; first, the
Figure FDA00029837581200000210
And
Figure FDA00029837581200000211
data output units each temporarily store the results of the corresponding
Figure FDA00029837581200000212
Figure FDA00029837581200000213
and
Figure FDA00029837581200000214
arithmetic logic units, and read the configuration information to determine whether the data is output to an output first-in first-out register group, the next reconfigurable array operation row, or the general register file;
first, the
Figure FDA00029837581200000215
data loading unit receives the configuration information parsed by the configuration analysis module, reads the run data of the 5m configuration flow chart stored in the general register file through the 2nd and 3rd read port operation row selectors, and, according to the parsed configuration information, routes the data into the corresponding
Figure FDA00029837581200000216
operation unit, which performs the operation and temporarily stores the output data in the corresponding
Figure FDA00029837581200000217
output unit; the output unit outputs the data to an output first-in first-out register group, the next reconfigurable array operation row, or the general register file.
2. The coarse-grained reconfigurable architecture-based SM3 algorithm round iterative system of claim 1, wherein: the configuration unit comprises a configuration and control interface, a configuration memory and a configuration analysis module which are sequentially connected together, and the configuration and control interface is connected with a system bus; the microprocessor sends the required configuration information to the configuration memory sequentially through the system bus and the configuration and control interface, the configuration memory stores the sent configuration information, and the configuration analysis module is used for analyzing the configuration information of the configuration memory and sending the analyzed configuration information to the reconfigurable array block.
3. The coarse-grained reconfigurable architecture-based SM3 algorithm round iterative system of claim 1, wherein: each reconfigurable array block comprises 4 rows of reconfigurable array operation rows, 4 read port operation row selectors and 4 write port operation row selectors, and each row of reconfigurable array operation rows comprises 4 data loading units, 4 data input units and 8 32-bit operation units.
4. An iteration method for an SM3 algorithm iteration system based on a coarse-grained reconfigurable architecture, characterized by comprising the following steps:
step 1, deriving a data flow graph of the SM3 algorithm iteration;
step 2, formulating a data input mode of SM 3;
step 3, configuring the reconfigurable processor according to the data input mode determined in the step 2 and the data flow graph determined in the step 1, and generating configuration information;
step 4, storing the configuration information and the initial data of the reconfigurable processor into a corresponding memory through the microprocessor;
step 5, the microprocessor starts the reconfigurable processor and sends the configuration information and the data to be processed to the reconfigurable processor;
step 6, the reconfigurable processor processes data according to the configuration information and the data to be processed, and sends an interrupt signal after the reconfigurable processor completes the current task; and sending the processed data to a microprocessor through a system bus;
in step 6, the specific process of the reconfigurable processor performing data processing according to the configuration information and the data to be processed is as follows:
step 61: the
Figure FDA0002983758120000031
data loading unit loads the initial 512-bit message data from the input FIFO register group in sequence, 128 bits at a time; the configuration information of the configuration unit is read through the 1st read port operation row selector; according to the configuration information, the
Figure FDA0002983758120000032
arithmetic logic unit selects pass-through mode and passes the 512-bit message data through the
Figure FDA0002983758120000033
data output unit into the general register file, where the words are denoted W0, W1, …, W15;
step 62: for the (5p-3)-th configuration flow chart, 1 ≤ p ≤ M: the configuration information of the configuration unit is read through the (5p-3)-th read port operation row selector, and the (5p-3)-th configuration flow chart reads the message words Wj-3, Wj-5, Wj-6, Wj-8, Wj-9, Wj-11, Wj-12, Wj-13 and Wj-16 from the general register file, where 0 ≤ j < 68 and j is even; in the
Figure FDA0002983758120000034
Row of row reconfigurable array operation
Figure FDA0002983758120000035
A data load unit loads Wj-3, Wj-9, Wj-12, Wj-13, th
Figure FDA0002983758120000036
The individual arithmetic unit inputs Wj-3, Wj-9 perform shift and XOR operations,
Figure FDA0002983758120000037
obtain an output
Figure FDA0002983758120000038
First, the
Figure FDA0002983758120000039
A unit of operation input
Figure FDA00029837581200000310
The Wj-16 performs an exclusive or operation,
Figure FDA00029837581200000311
obtain an output
Figure FDA0002983758120000041
First, the
Figure FDA0002983758120000042
A unit of operation input
Figure FDA0002983758120000043
The permutation function P1 is completed and,
Figure FDA0002983758120000044
obtain an output
Figure FDA0002983758120000045
First, the
Figure FDA0002983758120000046
arithmetic unit takes Wj-13 as input and performs a shift operation,
Figure FDA0002983758120000047
to obtain the output (Wj-13 <<< 7); the
Figure FDA0002983758120000048
A unit of operation input
Figure FDA0002983758120000049
And
Figure FDA00029837581200000410
the exclusive or operation is completed and the operation is performed,
Figure FDA00029837581200000411
obtain an output
Figure FDA00029837581200000412
Figure FDA00029837581200000413
First, the
Figure FDA00029837581200000414
A unit of operation input
Figure FDA00029837581200000415
And
Figure FDA00029837581200000416
the exclusive or operation is completed and the operation is performed,
Figure FDA00029837581200000417
obtain an output
Figure FDA00029837581200000418
First, the
Figure FDA00029837581200000419
Row of row reconfigurable array operation
Figure FDA00029837581200000420
A data load unit loads Wj-6, the first
Figure FDA00029837581200000421
A unit of operation input
Figure FDA00029837581200000422
And
Figure FDA00029837581200000423
the exclusive or operation is completed and the operation is performed,
Figure FDA00029837581200000424
obtain an output
Figure FDA00029837581200000425
Figure FDA00029837581200000426
And pass through
Figure FDA00029837581200000427
The data output unit is stored in the general register file;
first, the
Figure FDA00029837581200000428
Row of row reconfigurable array operation
Figure FDA00029837581200000429
A data load unit loads Wj-2, Wj-8, Wj-11, Wj-12, th
Figure FDA00029837581200000430
The individual arithmetic unit inputs Wj-2, Wj-8 perform shift and XOR operations,
Figure FDA00029837581200000431
obtain an output
Figure FDA00029837581200000432
First, the
Figure FDA00029837581200000433
A unit of operation input
Figure FDA00029837581200000434
The Wj-15 performs an exclusive or operation,
Figure FDA00029837581200000435
obtain an output
Figure FDA00029837581200000436
First, the
Figure FDA00029837581200000437
A unit of operation input
Figure FDA00029837581200000438
The permutation function P1 is completed and,
Figure FDA00029837581200000439
obtain an output
Figure FDA00029837581200000440
First, the
Figure FDA00029837581200000441
The individual arithmetic unit inputs Wj-12 perform a shift operation,
Figure FDA00029837581200000442
to obtain the output (Wj-12 <<< 7); the
Figure FDA00029837581200000443
A unit of operation input
Figure FDA00029837581200000444
And
Figure FDA00029837581200000445
the exclusive or operation is completed and the operation is performed,
Figure FDA00029837581200000446
obtain an output
Figure FDA00029837581200000447
Figure FDA00029837581200000448
First, the
Figure FDA00029837581200000449
reconfigurable array operation row, the
Figure FDA00029837581200000450
A unit of operation input
Figure FDA00029837581200000451
And
Figure FDA00029837581200000452
the exclusive or operation is completed and the operation is performed,
Figure FDA00029837581200000453
obtain an output
Figure FDA00029837581200000454
Figure FDA00029837581200000455
First, the
Figure FDA00029837581200000456
A data load unit loads Wj-5, the first
Figure FDA00029837581200000457
A unit of operation input
Figure FDA00029837581200000458
And
Figure FDA00029837581200000459
the exclusive or operation is completed and the operation is performed,
Figure FDA00029837581200000460
obtain an output
Figure FDA00029837581200000461
And pass through
Figure FDA00029837581200000462
data output unit into the general register file; step 62 is repeated until j = 67;
step 63: for the (5p-2)-th configuration flow chart, 1 ≤ p ≤ M: the configuration information of the configuration unit is read through the (5p-2)-th read port operation row selector, and the message words Wk-12, Wk-11, Wk-15 and Wk-16 of the (5p-3)-th reconfigurable array block are read from the general register file, where 0 ≤ k < 64 and k is even; the
Figure FDA00029837581200000463
Row of row reconfigurable array operation
Figure FDA00029837581200000464
A data load unit loads Wk-16 and Wk-12
Figure FDA00029837581200000465
The input Wk-16 and Wk-12 of each operation unit complete the XOR operation,
Figure FDA00029837581200000466
obtain an output
Figure FDA0002983758120000051
First, the
Figure FDA0002983758120000052
And
Figure FDA0002983758120000053
the operation units are straight-through units,
Figure FDA0002983758120000054
output of
Figure FDA0002983758120000055
And pass through
Figure FDA0002983758120000056
Data ofThe output unit is stored in the general register file;
in the [formula]-th reconfigurable array operation row, the [formula]-th data load unit loads Wk-15 and Wk-11; the [formula]-th operation unit takes Wk-15 and Wk-11 as inputs and completes the XOR operation, and [formula] obtains the output [formula]; the [formula]-th operation units are straight-through units, [formula] outputs [formula], which is stored in the general register file through the [formula]-th data output unit; step 63 is repeated until k = 63;
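Step 63 XORs pairs of extended words that lie four positions apart, two pairs per pass. A minimal software sketch of that pairing follows, assuming the SM3 standard definition W'j = Wj XOR Wj+4 for 0 ≤ j < 64; the claim's own index offsets (Wk-16, Wk-12 and so on) are not reproduced, and the function name `w_prime` is hypothetical.

```python
def w_prime(w):
    """Compute the 64 words W'0..W'63 from the 68 extended words W0..W67.

    Two results are produced per loop pass, mirroring the two operation
    rows that each handle one XOR in the step above.
    """
    if len(w) != 68:
        raise ValueError("expected 68 extended words")
    wp = [0] * 64
    for j in range(0, 64, 2):
        wp[j] = w[j] ^ w[j + 4]          # first row of the pass
        wp[j + 1] = w[j + 1] ^ w[j + 5]  # second row of the pass
    return wp
```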
step 64: for the (5p-1)-th configuration flow chart, 1 ≤ p ≤ M; the message extension word blocks written to the general register file by the (5p-3)-th and (5p-2)-th reconfigurable array blocks are read; in the [formula]-th reconfigurable array operation row of the (5p-1)-th reconfigurable array block, the [formula]-th data load unit loads the hash values A, D and E and the constant Tj from the general register file, 0 ≤ j < 16; the [formula]-th operation units complete the SS1 ← ((A <<< 12) + E + (Tj <<< j)) <<< 7, [formula] and SS2 + D operations;
in the [formula]-th reconfigurable array operation row, the [formula]-th data load unit loads the hash values B and C and the message extension words Wj and W'j from the general register file, 0 ≤ j < 16; the [formula]-th operation units complete the Boolean function FFj(A, B, C), TT1 ← FFj(A, B, C) + D + SS2 + W'j, B <<< 9 and SS1 + Wj operations, and the [formula]-th output unit writes D' = C, C' = B <<< 9, B' = A and A' = TT1 in turn to the hash value of the block in the general register file;
in the [formula]-th reconfigurable array operation row, the [formula]-th data load unit loads the hash values E, F, G and H from the general register file; the [formula]-th operation units complete, in turn, the Boolean function GGj(E, F, G), TT2 ← GGj(E, F, G) + H + SS1 + Wj, and the straight-through of E, F and G;
in the [formula]-th reconfigurable array operation row, the [formula]-th operation units complete the permutation function E ← P0(TT2) and the F <<< 19 operation, and the [formula]-th output unit writes E' = P0(TT2), F' = E, G' = F <<< 19 and H' = G in turn to the hash value of the block in the general register file; step 64 is repeated until j = 15;
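The register updates listed in step 64 correspond to one round of the SM3 compression function. The sketch below is a software illustration assuming the SM3 standard definitions of Tj, FFj, GGj, P0 and SS2 (the claim's own formula for SS2 is an image that did not survive extraction); `sm3_round` and its argument names are hypothetical, and the same function also covers the 16 ≤ j < 64 case.

```python
MASK = 0xFFFFFFFF


def rotl(x, n):
    """Rotate a 32-bit word left by n bits (the <<< operator)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK


def p0(x):
    """Permutation function P0 from the SM3 standard."""
    return x ^ rotl(x, 9) ^ rotl(x, 17)


def sm3_round(state, j, wj, wpj):
    """Apply round j to state (A..H) with words Wj and W'j."""
    a, b, c, d, e, f, g, h = state
    t = 0x79CC4519 if j < 16 else 0x7A879D8A
    ss1 = rotl((rotl(a, 12) + e + rotl(t, j)) & MASK, 7)
    ss2 = ss1 ^ rotl(a, 12)  # standard definition, assumed here
    if j < 16:
        ff, gg = a ^ b ^ c, e ^ f ^ g
    else:
        ff = (a & b) | (a & c) | (b & c)
        gg = (e & f) | (~e & g)
    tt1 = (ff + d + ss2 + wpj) & MASK
    tt2 = (gg + h + ss1 + wj) & MASK
    # New registers: A'=TT1, B'=A, C'=B<<<9, D'=C, E'=P0(TT2), F'=E, G'=F<<<19, H'=G
    return (tt1, a, rotl(b, 9), c, p0(tt2), e, rotl(f, 19), g)
```

Calling `sm3_round` with the standard initial value and the first word of the padded message "abc" reproduces the first intermediate row of the standard's worked example.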
step 65: for the 5p-th configuration flow chart, 1 ≤ p ≤ M; the message extension word blocks written to the general register file by the (5p-3)-th and (5p-2)-th reconfigurable array blocks are read; in the [formula]-th reconfigurable array operation row of the 5p-th reconfigurable array block, the [formula]-th data load unit loads the hash values A, D and E and the constant Tj from the general register file, 16 ≤ j < 64; the [formula]-th operation units complete the SS1 ← ((A <<< 12) + E + (Tj <<< j)) <<< 7, [formula] and SS2 + D operations;
in the [formula]-th reconfigurable array operation row, the [formula]-th data load unit loads the hash values B and C and the message extension words Wj and W'j from the general register file, 16 ≤ j < 64; the [formula]-th operation units complete the Boolean function FFj(A, B, C), TT1 ← FFj(A, B, C) + D + SS2 + W'j, B <<< 9 and SS1 + Wj operations, and the [formula]-th output unit writes D' = C, C' = B <<< 9, B' = A and A' = TT1 in turn to the hash value of the block in the general register file;
in the [formula]-th reconfigurable array operation row, the [formula]-th data load unit loads the hash values E, F, G and H from the general register file; the [formula]-th operation units complete, in turn, the Boolean function GGj(E, F, G), TT2 ← GGj(E, F, G) + H + SS1 + Wj, and the straight-through of E, F and G;
in the [formula]-th reconfigurable array operation row, the [formula]-th operation units complete the permutation function E ← P0(TT2) and the F <<< 19 operation, and the [formula]-th output unit writes E' = P0(TT2), F' = E, G' = F <<< 19 and H' = G in turn to the hash value of the block in the general register file; step 65 is repeated until j = 63;
step 66: for the (5M+1)-th configuration flow chart, the (5M+1)-th configuration flow chart reads the hash values written to the general register file by the 5M-th and (5M-5)-th configuration flow charts; in the [formula]-th reconfigurable array operation row of the (5M+1)-th reconfigurable array block, the [formula]-th data load unit loads the hash values A, B, C and D written to the general register file by the 5M-th configuration flow chart, and the [formula]-th operation units are configured in straight-through mode;
in the [formula]-th reconfigurable array operation row, the [formula]-th data load unit reads the hash values A', B', C' and D' of the (5M-5)-th reconfigurable array block from the general register file; the [formula]-th operation units complete [formula] in sequence; the [formula]-th data output unit sends the hash value to the output first-in first-out register array and simultaneously writes it to the general register file; step 66 is repeated for the hash values E, F, G and H to complete the same operation.
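Taken together, steps 62 to 66 amount to one SM3 compression of a 512-bit block: expansion, 64 rounds, and a final XOR of the round output with the previous hash value. The sketch below is a compact software reference assuming the SM3 standard definitions throughout; it does not model the row and unit mapping of the claimed array, and `compress` and `IV` are illustrative names.

```python
MASK = 0xFFFFFFFF
IV = [0x7380166F, 0x4914B2B9, 0x172442D7, 0xDA8A0600,
      0xA96F30BC, 0x163138AA, 0xE38DEE4D, 0xB0FB0E4E]


def rotl(x, n):
    """Rotate a 32-bit word left by n bits (the <<< operator)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK


def compress(v, block):
    """Compress one block of 16 words into the 8-word hash value v."""
    # Message expansion (step 62 analogue).
    w = list(block)
    for j in range(16, 68):
        t = w[j - 16] ^ w[j - 9] ^ rotl(w[j - 3], 15)
        w.append(t ^ rotl(t, 15) ^ rotl(t, 23) ^ rotl(w[j - 13], 7) ^ w[j - 6])
    a, b, c, d, e, f, g, h = v
    # 64 rounds (steps 64 and 65 analogue); W'j is computed inline.
    for j in range(64):
        low = j < 16
        t_j = 0x79CC4519 if low else 0x7A879D8A
        ss1 = rotl((rotl(a, 12) + e + rotl(t_j, j)) & MASK, 7)
        ss2 = ss1 ^ rotl(a, 12)
        ff = a ^ b ^ c if low else (a & b) | (a & c) | (b & c)
        gg = e ^ f ^ g if low else (e & f) | (~e & g)
        tt1 = (ff + d + ss2 + (w[j] ^ w[j + 4])) & MASK
        tt2 = (gg + h + ss1 + w[j]) & MASK
        a, b, c, d = tt1, a, rotl(b, 9), c
        e, f, g, h = tt2 ^ rotl(tt2, 9) ^ rotl(tt2, 17), e, rotl(f, 19), g
    # Feed-forward XOR with the previous hash value (step 66 analogue).
    return [x ^ y for x, y in zip((a, b, c, d, e, f, g, h), v)]
```

For the single padded block of the message "abc", `compress(IV, block)` yields the digest published in the SM3 standard's example, which gives a convenient check for any hardware mapping of the same steps.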
5. The iterative method of the coarse-grained reconfigurable architecture-based SM3 algorithm round iterative system according to claim 4, wherein: the rules for P1 in step 62 and P0 in step 64 are as follows:
Figure FDA00029837581200000620
Figure FDA00029837581200000621
wherein X is a word.
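The two formula images of this claim did not survive extraction. As a reference point only, the SM3 standard defines the permutation functions as P0(X) = X XOR (X <<< 9) XOR (X <<< 17) and P1(X) = X XOR (X <<< 15) XOR (X <<< 23); the sketch below assumes those standard definitions, and the helper names are illustrative.

```python
MASK = 0xFFFFFFFF


def rotl(x, n):
    """Rotate a 32-bit word left by n bits (the <<< operator)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK


def p0(x):
    """P0 per the SM3 standard, used in the compression rounds."""
    return x ^ rotl(x, 9) ^ rotl(x, 17)


def p1(x):
    """P1 per the SM3 standard, used in message expansion."""
    return x ^ rotl(x, 15) ^ rotl(x, 23)
```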
6. The iterative method of the coarse-grained reconfigurable architecture-based SM3 algorithm round iterative system according to claim 4, wherein: the Boolean functions FFj and GGj in steps 64 and 65 are as follows:
Figure FDA0002983758120000071
Figure FDA0002983758120000072
wherein X, Y and Z are words.
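The formula images for the Boolean functions are likewise missing from this text. In the SM3 standard, both functions are a three-way XOR for 0 ≤ j ≤ 15; for 16 ≤ j ≤ 63, FFj is the bitwise majority of its inputs and GGj selects between Y and Z under control of X. A minimal sketch assuming those standard definitions, with illustrative function names:

```python
MASK = 0xFFFFFFFF


def ff(j, x, y, z):
    """Boolean function FFj per the SM3 standard."""
    if j < 16:
        return x ^ y ^ z
    return (x & y) | (x & z) | (y & z)  # bitwise majority


def gg(j, x, y, z):
    """Boolean function GGj per the SM3 standard."""
    if j < 16:
        return x ^ y ^ z
    return ((x & y) | (~x & z)) & MASK  # bitwise choose: y where x is 1, else z
```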
7. The iterative method of the coarse-grained reconfigurable architecture-based SM3 algorithm round iterative system according to claim 4, wherein: the constants in step 64 are as follows:
Figure FDA0002983758120000073
wherein ⊕ represents exclusive-OR, ∧ represents AND, ¬ represents NOT, and ∨ represents OR.
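The constant formula is also an image that was lost. The SM3 standard uses Tj = 79cc4519 for 0 ≤ j ≤ 15 and Tj = 7a879d8a for 16 ≤ j ≤ 63, and each round uses the constant rotated left by j bits (modulo the 32-bit word size). The sketch below assumes those standard values; `t_const` and `t_rotated` are hypothetical names.

```python
MASK = 0xFFFFFFFF


def t_const(j):
    """Round constant Tj per the SM3 standard."""
    if not 0 <= j < 64:
        raise ValueError("round index must satisfy 0 <= j < 64")
    return 0x79CC4519 if j < 16 else 0x7A879D8A


def t_rotated(j):
    """Tj <<< j, with the rotation amount reduced modulo 32."""
    t, n = t_const(j), j % 32
    return ((t << n) | (t >> (32 - n))) & MASK
```

Because the rotated constants depend only on j, a hardware mapping could precompute all 64 values and load them from the register file rather than rotating at run time; whether the claimed system does so is not stated in this text.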
CN201811514910.6A 2018-12-12 2018-12-12 SM3 algorithm round iteration system and iteration method based on coarse-grained reconfigurable architecture Active CN109672524B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811514910.6A CN109672524B (en) 2018-12-12 2018-12-12 SM3 algorithm round iteration system and iteration method based on coarse-grained reconfigurable architecture

Publications (2)

Publication Number Publication Date
CN109672524A (en) 2019-04-23
CN109672524B (en) 2021-08-20

Family

ID=66143706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811514910.6A Active CN109672524B (en) 2018-12-12 2018-12-12 SM3 algorithm round iteration system and iteration method based on coarse-grained reconfigurable architecture

Country Status (1)

Country Link
CN (1) CN109672524B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110059493B (en) * 2019-04-10 2023-04-07 无锡沐创集成电路设计有限公司 SKINNY-128-128 encryption algorithm implementation method and system based on coarse-grained reconfigurable computing unit
CN111008133B (en) * 2019-11-29 2021-04-27 中国科学院计算技术研究所 Debugging method and device for coarse-grained data flow architecture execution array

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102508816A (en) * 2011-11-15 2012-06-20 东南大学 Configuration method applied to coarse-grained reconfigurable array
CN103984560A (en) * 2014-05-30 2014-08-13 东南大学 Embedded reconfigurable system based on large-scale coarseness and processing method thereof
CN105487838A (en) * 2015-11-23 2016-04-13 上海交通大学 Task-level parallel scheduling method and system for dynamically reconfigurable processor
CN105867994A (en) * 2016-04-20 2016-08-17 上海交通大学 Instruction scheduling optimization method for coarse-grained reconfigurable architecture complier
CN106155979A (en) * 2016-05-19 2016-11-23 东南大学—无锡集成电路技术研究所 A kind of DES algorithm secret key based on coarseness reconstruction structure extension system and extended method
CN108616348A (en) * 2018-04-19 2018-10-02 清华大学无锡应用技术研究院 The method and system of security algorithm, decipherment algorithm are realized using reconfigurable processor

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008005020A (en) * 2006-06-20 2008-01-10 Matsushita Electric Ind Co Ltd Programmable logic circuit
CN102073481B (en) * 2011-01-14 2013-07-03 上海交通大学 Multi-kernel DSP reconfigurable special integrated circuit system
CN102156666B (en) * 2011-04-20 2012-11-28 上海交通大学 Temperature optimizing method for resource scheduling of coarse reconfigurable array processor
CN102567279B (en) * 2011-12-22 2015-03-04 清华大学 Generation method of time sequence configuration information of dynamically reconfigurable array
JP6587188B2 (en) * 2015-06-18 2019-10-09 パナソニックIpマネジメント株式会社 Random number processing apparatus, integrated circuit card, and random number processing method
CN105975251B (en) * 2016-05-19 2018-10-02 东南大学—无锡集成电路技术研究所 A kind of DES algorithm wheel iteration systems and alternative manner based on coarseness reconstruction structure

Also Published As

Publication number Publication date
CN109672524A (en) 2019-04-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant