CN112256633A - Instruction-driven special-purpose processor system for commercial cryptography

Publication number
CN112256633A
Authority
CN
China
Prior art keywords
instruction
module
bits
data
write
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011096724.2A
Other languages
Chinese (zh)
Inventor
赵昀昊
陈志坚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Application filed by Zhejiang University ZJU
Priority to CN202011096724.2A
Publication of CN112256633A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/76 Architectures of general purpose stored program computers
    • G06F15/82 Architectures of general purpose stored program computers data or demand driven
    • G06F15/78 Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7839 Architectures of general purpose stored program computers comprising a single central processing unit with memory
    • G06F15/7867 Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • G06F15/7871 Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS
    • G06F15/7878 Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS for pipeline reconfiguration
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003 Arrangements for executing specific machine instructions
    • G06F9/30007 Arrangements for executing specific machine instructions to perform operations on data operands
    • G06F9/3004 Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/3005 Arrangements for executing specific machine instructions to perform operations for flow control

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Storage Device Security (AREA)

Abstract

An instruction-driven special-purpose processor system for commercial cryptography. After receiving the cipher configuration information, the processor enters its main pipeline. The instruction fetch module fetches instructions from the internal instruction memory and sends them to the decoding module. The decoding module decodes them according to the encoding format of the customized instruction set, maintains a register file with variable bit width, and sends selection signals, operation codes and operands to the special-purpose execution modules for encryption and decryption. The data access module reads data from and stores data to off-chip memory according to the block length, and writes the output result there. The cipher operation module processes data in sequence according to the instruction decoding result, and feeds forward and writes back intermediate results. The branch jump module decides, according to a condition, whether to enter, continue or end an iteration loop. By driving special-purpose execution modules through instructions and targeting encryption and decryption algorithms, the invention provides a reasonable scheme for intermediate data; execution units can be added so that more encryption and decryption algorithms are supported flexibly.

Description

Instruction-driven special-purpose processor system for commercial cryptography
Technical Field
The invention belongs to the technical field of integrated circuit design and electronic information security, and particularly relates to a customized instruction set based on a Chinese national standard commercial cryptographic algorithm and a processor implementation thereof.
Background
Since the beginning of the 21st century, information technology has developed rapidly, and people now live in an era of big data and the Internet of Things. New modes of information storage and interaction such as mobile communication, electronic commerce, smart healthcare and portable devices have emerged and become widespread. They bring great convenience, and at the same time people pay more and more attention to information security and the risks of information propagation. Cryptography is undoubtedly the foundation and core of reducing information security risk.
The rise of modern cryptography dates back to the publication of "New Directions in Cryptography" in the 1970s and to the DES block cipher algorithm, developed by IBM in the United States in 1972 and adopted as a standard by the United States.
China's own modern cryptosystems started late. The ZUC algorithm, also called the Zu Chongzhi algorithm, is the first cryptographic algorithm independently designed in China to become an international cryptographic standard. The project was formally established in 2009, and the algorithm was later approved at the 3GPP SA plenary meeting, becoming the core algorithm of the third set of encryption standards of 3GPP LTE.
Thereafter, the State Cryptography Administration published the SM2 elliptic curve public key cryptographic algorithm and the SM3 cryptographic hash algorithm in December 2010, the SM4 block cipher algorithm in March 2012, and the SM9 identity-based cryptographic algorithm in March 2016. China has gradually formed a complete national and commercial cryptography system and gained international recognition: on November 3, 2017, the SM2 and SM9 algorithms became ISO/IEC international standards; in October 2018, the SM3 algorithm formally became an ISO/IEC international standard; on April 24, 2020, the ZUC stream cipher algorithm was unanimously approved and became an ISO/IEC international standard.
Generally, an encryption algorithm can be implemented in software on a general-purpose processor or microprocessor; this approach is easy to control, simple to design and highly flexible. However, many security scenarios now demand low cost, low power consumption, real-time response and high computational efficiency, which requires offloading operations to hardware. Depending on how tightly the hardware implementation is coupled to the processor, there are generally the following approaches:
(1) ASIC (application-specific integrated circuit) implementation. The design is usually customized for a single algorithm or a family of similar algorithms. It uses few hardware resources and is fast and energy-efficient, but it has poor flexibility, a narrow range of application and limited security; once the algorithm is upgraded, the circuit loses its function.
(2) Extended instruction set implementation. A coprocessor can be integrated, or custom units of the processor can be modified directly. On the basis of an existing processor architecture, additional special instructions for encryption and decryption are designed and the related hardware modules are introduced, so that some operations of a cryptographic algorithm are accelerated in a targeted way without significantly affecting the performance of the general-purpose processor. However, this approach suffers from low throughput and from power consumption limitations.
(3) ASIP (application-specific instruction set processor) implementation. A dedicated processor architecture is designed for the application, introducing techniques such as parallel processing, reconfigurable computing and customized storage structures, which achieves a degree of flexibility while pursuing performance and energy efficiency.
Disclosure of Invention
To overcome the shortcomings of the prior art, the invention aims to provide a customized instruction set for the Chinese national standard cryptographic algorithms, together with a special-purpose processor implementing it, balancing high performance against low resource consumption.
The invention adopts the following technical solution:
An instruction-driven special-purpose processor system for commercial cryptography, the system comprising:
a cipher configuration module, which receives the cryptographic algorithm configuration information written over the bus while the processor pipeline is idle, and starts the processor pipeline once the configuration information is complete;
an internal storage unit, which stores preset data or intermediate data of the processor pipeline and comprises an internal instruction memory and a register file; the internal instruction memory stores the instruction sequence for each configured encryption and decryption algorithm, and its contents can be changed; the register file consists of 32 registers of 128 bits each, has 2 read ports and 1 write port, can operate on a variable-length run of consecutive registers, and resolves dependencies and priorities between reads and writes;
a processor pipeline module, which executes the cryptographic algorithm and comprises:
an instruction fetch module, which receives the cryptographic algorithm and mode information from the cipher configuration module, starts fetching instructions from the corresponding address in the internal instruction memory, either increments the instruction address sequentially or takes the valid target address supplied by the branch jump module, and passes each instruction to the decode stage;
a decoding module, which receives the valid instruction and the valid signal from the instruction fetch module, decodes according to the encoding of the customized instructions, reads data from the register file, and passes the target execution module, execution function, execution data and so on to the execute stage; it also receives the valid write-back data from each execution module in the write-back stage and performs the write-back to the register file;
an operation integration module, which receives the valid execution signal and the execution data from the decoding module, performs the encryption or decryption operation selected by the instruction, and passes the result to the write-back stage;
a branch jump module, which decides whether a jump occurs according to the judgment mode selection signal from the decoding module and passes the calculated jump target PC value to the instruction fetch module; it also decides whether a write-back occurs according to the execution mode selection signal and passes the write-back data to the write-back stage;
a data access module, which reads data from or writes data to the off-chip data memory according to the read/write selection signal from the decoding module, and sends the incremented address to the write-back stage according to the address auto-increment signal.
Further, the processor pipeline module performs instruction fetch, decode and execute operations according to a customized instruction set designed for the Chinese national standard commercial cryptographic algorithms.
Still further, the customized instruction set includes cryptographic operation instructions, data access instructions and branch jump instructions.
Furthermore, each instruction is 32 bits long. Bits 31 to 30 encode the instruction type: 01 denotes a cryptographic operation instruction, 10 a branch jump instruction, and 11 a data access instruction; wherein,
for a cryptographic operation instruction, bits 24 to 21 select the operation, bits 20 to 16 are the destination register, bits 15 to 11 are source register 0, bits 10 to 6 are source register 1, and the lowest 6 bits are an immediate;
for a branch jump instruction, bits 29 to 27 are the branch condition, bits 26 to 25 are the comparison mode and operation, bits 20 to 16 together with the lowest 6 bits form the absolute or relative jump address, bits 15 to 11 are source register 0, and bits 10 to 6 are either source register 1 or, together with bits 23 to 21, an immediate;
for a data access instruction, bit 29 selects read or write, bit 24 selects address auto-increment, bits 23 to 21 give the data length, and bits 15 to 11 are source register 0; for a read, bits 20 to 16 are the destination register and the lowest 11 bits are the base address offset; for a write, bits 10 to 6 are source register 1, and bits 20 to 16 together with the lowest 6 bits form the base address offset.
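The type field in bits 31 to 30 can be illustrated with a minimal Python sketch. The names `InstrType`, `bits` and `instr_type` are hypothetical and chosen only for this illustration; the text does not define type code 00, so the sketch rejects it.

```python
from enum import Enum


class InstrType(Enum):
    CRYPTO = 0b01   # cryptographic operation instruction
    BRANCH = 0b10   # branch jump instruction
    ACCESS = 0b11   # data access instruction


def bits(word: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of a 32-bit instruction word."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def instr_type(word: int) -> InstrType:
    """Classify an instruction by bits 31..30."""
    code = bits(word, 31, 30)
    if code == 0b00:
        # Assumption: 00 is treated as invalid because the text does not define it.
        raise ValueError("type code 00 is not defined in the text")
    return InstrType(code)
```

A decoder would typically call `instr_type` first and then extract the type-specific fields with `bits`.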
The processor pipeline has four stages: fetch, decode, execute and write-back.
The operation integration module includes a data feed-forward path, which the decoding module uses to detect register dependencies and to select register values.
The invention interacts with the central processing unit as follows. While no encryption or decryption operation is in progress, the cipher configuration module receives the cipher configuration information that the central processing unit writes into the control registers. Once enough configuration information has been received, the encryption or decryption flow starts. No new configuration information or encryption and decryption requests are accepted until the operation has finished and the output result has been written. After the output result has been written, an operation-finished signal is fed back to the central processing unit.
Generally, the cipher configuration information includes the cryptographic algorithm and mode, the initial vector address, the initial key address, the input information bit length and the output information address.
Furthermore, the encryption and decryption flow uses a pipeline structure similar to that of a conventional processor, specifically a 4-stage pipeline of fetch, decode, execute and write-back.
Wherein,
the instruction fetch module receives the cryptographic algorithm and mode information from the cipher configuration module, starts fetching instructions from the corresponding address in the internal instruction memory, and passes each instruction to the decode stage.
The decoding module receives the instruction and the valid signal from the instruction fetch module, decodes according to the encoding of the customized encryption and decryption instruction set, and reads data from the register file. The decoding module passes the target execution module, execution function, execution data and so on to the execute stage.
In addition, the decoding module receives the write-back valid signal, write-back register, write-back data and related information returned by each execution module in the write-back stage, and writes them back to the register file.
Further, to keep the data path efficient and the pipeline flowing smoothly, a data feed-forward path is added on top of data write-back. When a read-after-write dependency exists between consecutive instructions, the decoding module obtains the result of the execute stage through the data feed-forward path.
Further, the register file of the decoding module contains 32 registers, each with a basic bit width of 128 bits, and has 2 read ports and 1 write port. This choice reflects the data sizes of the Chinese national standard cryptographic algorithms: the SM3 message digest algorithm has a message block length of 512 bits, a digest length of 256 bits and a compression function state length of 256 bits; the SM4 block cipher algorithm has a block length of 128 bits and a key length of 128 bits; the ZUC (Zu Chongzhi) stream cipher algorithm has a key length of 128 bits, an initial vector length of 128 bits, an intermediate variable length of 32 bits and an output of 32 bits per round, and the total length of its linear feedback shift register can be regarded as 512 bits. A register bit width of 128 bits therefore suits the large volume, wide operands and strong dependencies of the intermediate data in cryptographic operations, and allows data to be stored and processed in larger complete units. For the same reason, the processor supports a variable data width: a single read or write names one register number and can operate on 1 to 4 registers, that is, 128, 256, 384 or at most 512 bits at once, which broadens the range of cryptographic algorithms and scenarios that can be served.
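The variable-width access described above can be sketched as a small behavioral model in Python. This is only an illustration under stated assumptions: the class name `RegisterFile` and its methods are hypothetical, the lowest-numbered register is assumed to hold the least significant 128 bits, and accesses that would run past register 31 are rejected, since the text specifies neither the ordering nor the wrap-around behavior.

```python
class RegisterFile:
    """Behavioral model: 32 registers of 128 bits, accessed 1 to 4 at a time."""

    NUM_REGS = 32
    REG_BITS = 128

    def __init__(self) -> None:
        self.regs = [0] * self.NUM_REGS

    def _check(self, index: int, count: int) -> None:
        if not 1 <= count <= 4:
            raise ValueError("an access covers 1 to 4 consecutive registers")
        if index < 0 or index + count > self.NUM_REGS:
            # Assumption: no wrap-around past the last register.
            raise ValueError("register range out of bounds")

    def read(self, index: int, count: int = 1) -> int:
        """Read count consecutive registers as one 128*count-bit value."""
        self._check(index, count)
        value = 0
        for i in range(count):
            # Assumption: register index+i supplies bits 128*i .. 128*i+127.
            value |= self.regs[index + i] << (self.REG_BITS * i)
        return value

    def write(self, index: int, value: int, count: int = 1) -> None:
        """Split a 128*count-bit value across count consecutive registers."""
        self._check(index, count)
        mask = (1 << self.REG_BITS) - 1
        for i in range(count):
            self.regs[index + i] = (value >> (self.REG_BITS * i)) & mask
```

With this model, a 512-bit SM3 message block fits in one four-register write, and a 128-bit SM4 block in a single register.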
The operation integration module receives control signals from the decoding module, such as the enable signal and the operation selection signal, together with operation data, which includes data read from the register file and the immediate in the instruction. It performs the encryption or decryption operation selected by the instruction and, when the operation is complete, passes the result and the control signals to the write-back stage and on to the decoding module.
The branch jump module receives control signals from the decoding module, such as the enable signal, the judgment mode selection and the execution mode selection, together with operation data, which may include the register operands to be compared, the immediate to be compared and the PC offset of the jump. It then decides whether a jump occurs according to the judgment mode and passes the jump enable signal and the jump target PC value to the instruction fetch module. It also decides whether to write back according to the execution mode and passes the write-back data and control signals to the write-back stage and on to the decoding module.
The data access module receives control signals from the decoding module, such as the enable signal, read/write selection, data length and address auto-increment selection, together with operation data, including the access base address, address selection signal, address offset and the data to be stored. It then issues read or write requests to the external memory. Generally, the data access module reads data such as the initial key, the initial vector and the input information, and writes the operation results or output information. When all the data of one read instruction has been read, the transferred data and control signals are sent to the write-back stage and on to the decoding module. If address auto-increment is selected, then after all the data of one instruction has been transferred, the address following the last transfer and the control signals are sent to the write-back stage and on to the decoding module.
In addition, the bus interface of the invention uses the AMBA 2 AHB bus protocol, which meets the performance and clock requirements of the special-purpose encryption and decryption processor without requiring excessive bandwidth.
The beneficial effects of the invention are as follows: it targets encryption and decryption algorithms, drives special-purpose execution modules through instructions, and provides a reasonable scheme for intermediate data; execution units can be added, so more encryption and decryption algorithms can be supported flexibly.
Drawings
Fig. 1 is a schematic diagram of a processor according to an embodiment of the invention.
FIG. 2 is a diagram of a programming model according to the present invention.
Detailed Description
To illustrate the embodiments of the invention more clearly, they are described below with reference to the drawings.
Referring to Figs. 1 and 2, the instruction-driven special-purpose processor system for commercial cryptography provides instructions for the SM3 cryptographic hash algorithm, the SM4 block cipher algorithm and the ZUC (Zu Chongzhi) stream cipher algorithm published by the State Cryptography Administration of China, and reserves encoding space for cryptographic algorithms added later, which gives it a degree of extensibility.
As can be seen from Fig. 1, when the special-purpose encryption and decryption processor needs to be woken up, the central processing unit writes the cryptographic algorithm configuration information into the control registers of this embodiment. The control registers of this embodiment are: an 8-bit cryptographic algorithm configuration register crypt_cfg, a 32-bit initial vector address register iv_addr, a 32-bit initial key address register key_addr, a 32-bit input information address register txt_in_addr, a 32-bit result output address register txt_out_addr, and a 64-bit input information length register txt_len. Once the CPU has written all the input and output information and configured the cryptographic algorithm, the main pipeline of the special-purpose processor is entered: the cipher configuration module sends a valid signal to the fetch stage of the main pipeline.
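The configuration handshake can be sketched as a small Python model. The register names come from the text; everything else is a hypothetical illustration. In particular, the sketch assumes that the configuration counts as complete once all six registers have been written, and that writes arriving while an operation is running are simply ignored; the text only says that "enough" information must be received and that new requests are not accepted.

```python
class CipherConfig:
    """Behavioral model of the cipher configuration module."""

    REGISTERS = ("crypt_cfg", "iv_addr", "key_addr",
                 "txt_in_addr", "txt_out_addr", "txt_len")

    def __init__(self) -> None:
        self.values = {}
        self.busy = False   # True while the main pipeline is running

    def write(self, name: str, value: int) -> bool:
        """Handle a bus write; return True if the write was accepted."""
        if name not in self.REGISTERS:
            raise KeyError(name)
        if self.busy:
            return False    # assumption: writes are dropped while busy
        self.values[name] = value
        # Assumption: "complete" means every control register was written.
        if len(self.values) == len(self.REGISTERS):
            self.busy = True   # valid signal to the fetch stage
        return True

    def finish(self) -> None:
        """Called after the output is written; the module becomes idle again."""
        self.busy = False
        self.values.clear()
```

A real design might require only a subset of the registers for some algorithms, for example no initial vector for a hash; the sketch does not model that.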
The cryptographic algorithm configuration register crypt_cfg is 8 bits wide. The upper 2 bits indicate the mode: 00 indicates information decryption, 01 information encryption, 10 digital signature verification, and 11 digital signature generation. The lower 6 bits indicate the cryptographic algorithm. In the current design, 0x0 corresponds to the SM2 elliptic curve public key cryptographic algorithm, 0x1 corresponds to the SM3 cryptographic hash algorithm and the SM4 block cipher algorithm, 0x2 corresponds to the SM9 identity-based cryptographic algorithm, and 0x3 corresponds to the ZUC (Zu Chongzhi) stream cipher algorithm. A considerable part of the algorithm configuration space remains reserved.
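As a minimal Python sketch of the field layout just described, the register can be split into its 2-bit mode and 6-bit algorithm code. The function name `split_crypt_cfg` and the mode labels are hypothetical; the algorithm code is returned as a plain number rather than mapped to a name.

```python
MODES = {
    0b00: "decrypt",   # information decryption
    0b01: "encrypt",   # information encryption
    0b10: "verify",    # digital signature verification
    0b11: "sign",      # digital signature generation
}


def split_crypt_cfg(cfg: int) -> tuple[str, int]:
    """Split the 8-bit crypt_cfg value into (mode label, algorithm code)."""
    if not 0 <= cfg <= 0xFF:
        raise ValueError("crypt_cfg is an 8-bit register")
    mode = MODES[cfg >> 6]      # upper 2 bits
    algorithm = cfg & 0x3F      # lower 6 bits
    return mode, algorithm
```

For example, the value 0x43 has mode bits 01 and algorithm code 0x3, which the text assigns to encryption with ZUC.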
Since the input message length of the SM3 cryptographic hash algorithm does not exceed 2^64 bits, the bit width of the input information length register txt_len is set to 64 bits.
After the main pipeline of the processor is entered, the instruction fetch module first receives the valid signal from the cipher configuration module and reads the cryptographic algorithm configuration register crypt_cfg to obtain the initial PC of the corresponding algorithm, which is where the main pipeline starts.
The instruction fetch module corresponds to the fetch stage of the pipeline and is mainly responsible for maintaining the fetch PC and handling branch jumps. Normally, the instruction fetch module increments the PC sequentially and fetches one 32-bit instruction from the internal instruction memory per PC value. When the instruction fetch module receives the branch jump valid signal from the branch jump module, it updates the PC to the branch target PC.
In addition, the instruction fetch module receives a pipeline stall signal from the decoding module in order to wait for a branch jump instruction or a data access instruction to complete. On the one hand, branch jump and data access instructions are generally executed before a group of data is processed or after it has been completely processed, so stalling does not affect the operations on that group. On the other hand, cryptographic algorithms have strong data dependencies and tightly packed processing, so no other operation instructions could be executed in the pipeline in the meantime.
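The PC update rule of the fetch stage can be sketched in a few lines of Python. The function `next_pc` is hypothetical. Two points are assumptions rather than statements from the text: the PC is counted in instructions, so sequential fetch adds 1, and a valid branch target takes priority over a stall.

```python
def next_pc(pc: int, stall: bool, branch_valid: bool, branch_target: int) -> int:
    """Return the PC for the next cycle of the fetch stage."""
    if branch_valid:
        return branch_target   # assumption: a taken branch overrides a stall
    if stall:
        return pc              # hold while a branch or data access completes
    return pc + 1              # assumption: PC counts 32-bit instructions
```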
The decoding module corresponds to the decode stage and the write-back stage of the pipeline and is mainly responsible for decoding instructions, scheduling and writing back registers, and preparing operands. In the decode stage, the decoding module receives the instruction and its valid signal from the instruction fetch module, decodes according to the encoding format of the customized encryption and decryption instruction set, prepares the operands according to the decoding result, and pushes the operands and operation information into the execute stage, distributing them to the execution units.
In the write-back stage, the decoding module receives the write-back valid signal, write-back register number, write-back data and write-back register length returned by each execution unit, and sends valid write-backs to the register file. The decoding module also receives the feed-forward valid signal, feed-forward register number, feed-forward data and feed-forward register length returned by the operation execution unit; when the instruction being decoded depends on an operand of the instruction being executed, the decoding module selects the corresponding feed-forward data as the operand.
The decoding module maintains a register file containing 32 general-purpose registers of 128 bits each, with 2 read ports and 1 write port, all of which have variable bit width. By specifying a register number and a register length, a port can read or write 1 to 4 consecutive registers starting from the specified register number; that is, a single access can cover the 128 bits of one register, or 256 or 384 bits across several consecutive registers, up to a maximum of 512 bits. If a register being read is also the target of the write-back port or the feed-forward path, the data of each register is taken from the feed-forward path first and from the write-back port second. The register file as a whole supports temporary storage of inputs, outputs and intermediate results without read/write conflicts.
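The operand selection order described above (feed-forward path first, write-back port second, stored register value last) can be sketched per register in Python. The function `select_operands` and the dictionary representation of the two paths are hypothetical simplifications; real hardware would compare register numbers and lengths rather than look up dictionaries.

```python
def select_operands(base, count, regfile, forward, writeback):
    """Pick the freshest value for each of count consecutive registers.

    regfile:   list of stored register values
    forward:   {register number: value} currently on the feed-forward path
    writeback: {register number: value} currently on the write-back port
    """
    selected = []
    for reg in range(base, base + count):
        if reg in forward:            # newest result, straight from execute
            selected.append(forward[reg])
        elif reg in writeback:        # result on its way into the register file
            selected.append(writeback[reg])
        else:                         # no dependency: use the stored value
            selected.append(regfile[reg])
    return selected
```

In the example below the stored values of registers 3 and 4 are both stale: register 3 is taken from the feed-forward path and register 4 from the write-back port.

```python
regfile = [0] * 32
regfile[3], regfile[4] = 30, 40
print(select_operands(3, 2, regfile, {3: 31}, {3: 32, 4: 42}))
```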
The execution unit corresponds to the execute stage and is divided into 3 parts, as shown in Fig. 1: the operation integration module, the branch jump module and the data access module. It is mainly responsible for executing the instruction operations passed from the decode stage and sending the results to the write-back stage.
[TABLE 1: encoding of the cryptographic operation instruction; table images not reproduced]
The operation integration module integrates the implementations of the operations of each encryption and decryption algorithm and is the hardware mapping of the core of the customized instruction set. The encoding of the cryptographic operation instruction is shown in Table 1. Bits 31 to 30 are fixed to 01, denoting an encryption or decryption operation instruction; bits 29 to 25 are reserved; bits 24 to 21 select the operation; bits 20 to 16 denote the destination register; bits 15 to 11 denote source register 0; bits 10 to 6 denote source register 1; the lowest 6 bits give the round number or stage of the algorithm operation, the largest case being the 64 rounds of iterative compression of the SM3 cryptographic hash algorithm. Given the complexity of cryptographic operations and the need for flexible debugging, the module of each cryptographic algorithm is independent and internally contains a sub-module for each operation. A cryptographic operation instruction may omit source operand 1, but it must have source operand 0 and must write back result data. After a cryptographic operation instruction finishes executing, the feed-forward signal and the write-back signal are sent to the decoding module.
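The field layout of the cryptographic operation instruction can be illustrated with an encode and decode pair in Python. All names (`CryptoInstr`, `encode_crypto`, `decode_crypto`) are hypothetical. The sketch assumes that source register 0 is a 5-bit index in bits 15 to 11, so that it does not overlap source register 1 in bits 10 to 6, and that the reserved bits 29 to 25 are written as zero. How 64 rounds would be represented in the 6-bit round field is not stated in the text and is not modeled.

```python
from typing import NamedTuple


class CryptoInstr(NamedTuple):
    op: int       # bits 24..21, operation selection
    rd: int       # bits 20..16, destination register
    rs0: int      # bits 15..11, source register 0 (assumed 5 bits wide)
    rs1: int      # bits 10..6,  source register 1
    rounds: int   # bits 5..0,   round number or stage


def encode_crypto(i: CryptoInstr) -> int:
    """Pack the fields into a 32-bit word with type bits 01."""
    return ((0b01 << 30)
            | ((i.op & 0xF) << 21)
            | ((i.rd & 0x1F) << 16)
            | ((i.rs0 & 0x1F) << 11)
            | ((i.rs1 & 0x1F) << 6)
            | (i.rounds & 0x3F))


def decode_crypto(word: int) -> CryptoInstr:
    """Unpack a 32-bit word; reject words whose type bits are not 01."""
    if (word >> 30) & 0b11 != 0b01:
        raise ValueError("not a cryptographic operation instruction")
    return CryptoInstr(op=(word >> 21) & 0xF,
                       rd=(word >> 16) & 0x1F,
                       rs0=(word >> 11) & 0x1F,
                       rs1=(word >> 6) & 0x1F,
                       rounds=word & 0x3F)
```

Encoding and then decoding returns the original fields, which makes the pair convenient for checking hand-assembled instruction sequences.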
[TABLE 2: encoding of the branch jump instruction; table image not reproduced]
The branch jump instruction is mainly used for: branching between the different modes of an algorithm configuration, length-based iteration, length-based message padding, branching on whether an operation result is valid, jumping directly back to the start of an iteration loop, and jumping directly back to the pipeline idle state. These uses correspond to the division, start or end of the operation phases of a cryptographic algorithm. The encoding format of the branch jump instruction is shown in Table 2. Bits 31 to 30 are fixed to 10, denoting a branch jump instruction. Bits 29 to 27 encode the branch condition: 000 is an absolute-address jump, 001 a relative-address jump, 010 branch if equal, 011 branch if not equal, 100 branch if greater than or equal, 101 branch if greater than, 110 branch if less than or equal, and 111 branch if less than. Bits 26 to 25 encode the comparison mode and operation and take effect for conditional branches: 00 compares a register value with a register value; 01 compares a register value with an immediate; 10 compares a register value with an immediate and subtracts the immediate from the register value when the register value is greater than or equal to the immediate; 11 subtracts the immediate from the register value and compares the result with 0. Bits 20 to 16 together with the lowest 6 bits form an 11-bit absolute or relative jump address. Bits 15 to 11 denote source register 0. Bits 10 to 6 either denote source register 1 or serve as the low bits of an 8-bit immediate whose high bits are bits 23 to 21.
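A hedged Python sketch of the branch fields and the condition codes follows. The names `decode_branch` and `branch_taken` are hypothetical. The sketch assumes that source register 0 occupies bits 15 to 11, that bits 20 to 16 are the high part of the 11-bit address, and that comparisons are unsigned; the text does not state the bit order of the address or the signedness.

```python
import operator

# Condition codes for bits 29..27 (010..111 are the conditional branches).
CONDITIONS = {
    0b010: operator.eq,
    0b011: operator.ne,
    0b100: operator.ge,
    0b101: operator.gt,
    0b110: operator.le,
    0b111: operator.lt,
}


def decode_branch(word: int) -> dict:
    """Extract the fields of a branch jump instruction (type bits 10)."""
    if (word >> 30) & 0b11 != 0b10:
        raise ValueError("not a branch jump instruction")
    rs1 = (word >> 6) & 0x1F
    return {
        "cond": (word >> 27) & 0b111,
        "mode": (word >> 25) & 0b11,
        # Assumption: bits 20..16 are the upper 5 bits of the address.
        "target": (((word >> 16) & 0x1F) << 6) | (word & 0x3F),
        "rs0": (word >> 11) & 0x1F,
        "rs1": rs1,
        # 8-bit immediate: bits 23..21 high, bits 10..6 low.
        "imm": (((word >> 21) & 0b111) << 5) | rs1,
    }


def branch_taken(cond: int, lhs: int, rhs: int) -> bool:
    """Evaluate a branch condition; codes 000 and 001 always jump."""
    if cond in (0b000, 0b001):
        return True
    return CONDITIONS[cond](lhs, rhs)
```

Whether `rs1` or `imm` supplies the right-hand operand depends on the comparison mode in bits 26 to 25; the sketch returns both and leaves that choice to the caller.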
In particular, subtracting the immediate from the register value is used when deciding whether to continue or exit an iteration loop, where source operand 0 holds the message length. In this case, the branch jump module sends the branch-valid signal and the target PC and, at the same time, sends the write-back information and write-back data to the decoding module; the destination register number must be the same as source register 0.
In addition, a branch jump instruction may stall the pipeline for one cycle, to preserve the continuity of the execution context and of the data.
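The compare-and-subtract behaviour can be modelled in a few lines. The sketch below is a software analogy under stated assumptions, not the hardware design: `loop_branch` and `count_iterations` are hypothetical names, the branch condition is assumed to be "greater than or equal", and the immediate is assumed to equal the block length in bytes.

```python
def loop_branch(length: int, block_bytes: int):
    """Model one branch with compare mode 10.

    Returns (taken, writeback). When the register value is at least the
    immediate, the branch is taken and the reduced length is written back
    to the same register that supplied it (destination == source register 0).
    """
    taken = length >= block_bytes
    writeback = length - block_bytes if taken else None
    return taken, writeback


def count_iterations(length: int, block_bytes: int):
    """Run the loop until the branch falls through.

    Returns the number of full blocks processed and the leftover length,
    which is the part that would be handled by message padding.
    """
    iterations = 0
    while True:
        taken, writeback = loop_branch(length, block_bytes)
        if not taken:
            return iterations, length
        length = writeback  # write-back updates the length register
        iterations += 1
```

For example, a 40-byte message with 16-byte blocks gives two taken branches and leaves 8 bytes, so a single instruction both tests the loop condition and updates the remaining length.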
TABLE 3 (image not reproduced)
TABLE 4 (image not reproduced)
The data access module is responsible for reading information from and writing information to the off-chip data memory. This information includes the initial vector, the initial key, the input information, and the output information; the input information is read and processed one block length at a time, and in encryption and decryption modes the output information is written one block length at a time. The encoding format of the data access instruction is shown in Table 3 and Table 4. Bits 31 to 30 are fixed to 11, indicating a data access instruction. Bit 29 gives the data direction: 0 is a read and 1 is a write. Bit 24 indicates whether the address is updated after the data access completes. Bits 23 to 21 give the number of bytes accessed: 000 accesses 1 byte (8 bits wide), and 110 accesses 64 bytes (512 bits wide). Bits 15 to 10 represent source register 0, and source operand 0 is the data access base address. For a read instruction, bits 20 to 16 are the destination register and the lower 11 bits are the base address offset. For a write instruction, bits 10 to 6 represent source register 1, source operand 1 is the data to be written, and bits 20 to 16 together with the lowest 6 bits form the 11-bit base address offset.
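A minimal decoder for this format might look as follows. Several points are assumptions rather than statements from the description: the helper names are hypothetical, the size code is treated as a power of two (only 000 = 1 byte and 110 = 64 bytes are given), bit 24 = 1 is taken to mean "update the address", the base register index is taken from bits 15 to 11 to avoid the overlap with bits 10 to 6, and bits 20 to 16 are taken as the high part of the write offset.

```python
def bits(word: int, hi: int, lo: int) -> int:
    """Return bits hi..lo (inclusive) of a 32-bit word as an unsigned int."""
    return (word >> lo) & ((1 << (hi - lo + 1)) - 1)


def decode_data_access(word: int) -> dict:
    """Split a 32-bit data access instruction into named fields."""
    if bits(word, 31, 30) != 0b11:
        raise ValueError("not a data access instruction")
    is_write = bits(word, 29, 29) == 1
    fields = {
        "write": is_write,
        # Assumption: 1 means the base address is updated after the access.
        "auto_increment": bits(word, 24, 24) == 1,
        # Assumption: size = 2**code bytes, matching 000 -> 1 and 110 -> 64.
        "size_bytes": 1 << bits(word, 23, 21),
        # Assumption: 5-bit base register index in bits 15..11.
        "base_reg": bits(word, 15, 11),
    }
    if is_write:
        fields["data_reg"] = bits(word, 10, 6)
        # Offset is split around the register fields: bits 20..16 and 5..0.
        fields["offset"] = (bits(word, 20, 16) << 6) | bits(word, 5, 0)
    else:
        fields["dest_reg"] = bits(word, 20, 16)
        fields["offset"] = bits(word, 10, 0)
    return fields
```

The split offset in the write form follows from the description: a write needs bits 10 to 6 for the data register, so the offset borrows bits 20 to 16, which a read uses for its destination register.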
Address auto-increment is used when reading input information and writing output information. One run of the cryptographic algorithm processes one fixed-length block of data, and runs continue until the last block, or an additional block produced by message padding, has been processed. Because the addresses of these blocks are contiguous, address auto-increment reduces the instruction count and simplifies the instruction sequence.
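The effect of address auto-increment can be illustrated with a short software model. The function name `read_blocks` is hypothetical, and the step size is assumed to equal the access width, which the description does not state explicitly.

```python
def read_blocks(memory: bytes, base: int, block_bytes: int, count: int):
    """Read `count` consecutive blocks starting at `base`.

    After each access the base address advances by the access width,
    standing in for the incremented address being written back to the
    base register, so no separate add instruction is needed per block.
    """
    blocks = []
    for _ in range(count):
        blocks.append(memory[base:base + block_bytes])
        base += block_bytes  # assumed step: one access width
    return blocks, base
```

Reading three 16-byte blocks from address 0 leaves the base at 48, ready for the next access without any explicit address arithmetic in the instruction stream.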
Generally, the input data width is 128 bits or 256 bits, and the output data width is 32 bits, 128 bits or 256 bits.
Embodiments of the present application may be provided as a method, a system, or a computer program product. The above description is only a preferred embodiment of the present invention, and the scope of protection is not limited to it; all technical solutions, modifications, and variations within the spirit of the present invention fall within its scope of protection.

Claims (6)

1. An instruction-driven commercial cipher special-purpose processor system, the system comprising:
a cipher configuration module, configured to receive cryptographic algorithm configuration information written over the bus while the processor pipeline is idle, and to start the processor pipeline once the configuration information is complete;
an internal storage unit, configured to store preset data or intermediate data of the processor pipeline, and comprising an internal instruction memory and a register file; the internal instruction memory stores the instruction sequences for the configured encryption and decryption algorithms, and the configuration can be changed; the register file consists of 32 128-bit registers, has 2 read ports and 1 write port, can operate on a variable-length run of consecutive registers, and takes dependencies and priority into account when reading and writing;
a processor pipeline module, configured to execute the cryptographic algorithm, the processor pipeline module comprising:
an instruction fetch module, which receives the cryptographic algorithm and mode information transmitted by the cipher configuration module, starts fetching instructions from the corresponding address in the internal instruction memory, advances sequentially through the instruction sequence or accepts a valid address from the branch jump module, and passes each instruction to the decode stage;
a decoding module, which receives the valid instruction and valid signal from the instruction fetch module, decodes the instruction according to the encoding of the custom instruction set, reads data from the register file, and passes the selected execution module, execution function, execution data, and the like to the execute stage, or receives valid write-back data from each execution module in the write-back stage and writes it to the register file;
an operation integration module, which receives the execute-valid signal and execution data from the decoding module, performs the corresponding encryption or decryption operation according to the instruction, and passes the result to the write-back stage;
a branch jump module, which determines whether a jump occurs according to the judgment mode selection signal from the decoding module, computes the jump target PC value and sends it to the instruction fetch module, determines whether a write-back occurs according to the execution mode selection signal, and sends write-back data to the write-back stage;
a data access module, which reads data from or writes data to the off-chip data memory according to the read/write selection signal from the decoding module, and sends the incremented address to the write-back stage according to the address auto-increment signal.
2. The instruction-driven commercial cipher special-purpose processor system according to claim 1, wherein the processor pipeline module performs fetch, decode, and execute operations according to a custom instruction set oriented to the Chinese national standard commercial cipher algorithms.
3. The instruction-driven commercial cipher special-purpose processor system according to claim 2, wherein the custom instruction set comprises a cryptographic operation instruction, a data access instruction, and a branch jump instruction.
4. The instruction-driven commercial cipher special-purpose processor system according to claim 3, wherein each instruction of the instruction set is 32 bits long, and bits 31 to 30 represent the instruction type: 01 represents a cryptographic operation instruction, 10 represents a branch jump instruction, and 11 represents a data access instruction; wherein,
for a cryptographic operation instruction, bits 24 to 21 are the operation selection, bits 20 to 16 are the destination register, bits 15 to 10 are source register 0, bits 10 to 6 are source register 1, and the lowest 6 bits are an immediate;
for a branch jump instruction, bits 29 to 27 are the branch jump condition, bits 26 to 25 are the comparison mode and operation, bits 20 to 16 together with the lowest 6 bits form the absolute or relative branch jump address, bits 15 to 10 are source register 0, and bits 10 to 6 are either source register 1 or form an immediate together with bits 23 to 21;
for a data access instruction, bit 29 is the read/write selection, bit 24 is the address auto-increment selection, bits 23 to 21 are the data length, and bits 15 to 10 are source register 0; when the instruction reads data, bits 20 to 16 are the destination register and the lower 11 bits are the base address offset; when the instruction writes data, bits 10 to 6 are source register 1, and bits 20 to 16 together with the lowest 6 bits form the base address offset.
5. The instruction-driven commercial cipher special-purpose processor system according to claim 4, wherein the processor pipeline comprises: fetch, decode, execute, and write back.
6. The instruction-driven commercial cipher special-purpose processor system according to any one of claims 1 to 5, wherein the operation integration module comprises a data forwarding path used for data selection and for resolving register-value dependencies in the decoding module.
CN202011096724.2A 2020-10-14 2020-10-14 Command-driven commercial password special processor system Pending CN112256633A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011096724.2A CN112256633A (en) 2020-10-14 2020-10-14 Command-driven commercial password special processor system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011096724.2A CN112256633A (en) 2020-10-14 2020-10-14 Command-driven commercial password special processor system

Publications (1)

Publication Number Publication Date
CN112256633A true CN112256633A (en) 2021-01-22

Family

ID=74243416

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011096724.2A Pending CN112256633A (en) 2020-10-14 2020-10-14 Command-driven commercial password special processor system

Country Status (1)

Country Link
CN (1) CN112256633A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113254082A (en) * 2021-06-23 2021-08-13 北京智芯微电子科技有限公司 Conditional branch instruction processing method and system, CPU and chip
CN114629665A (en) * 2022-05-16 2022-06-14 百信信息技术有限公司 Hardware platform for trusted computing

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113254082A (en) * 2021-06-23 2021-08-13 北京智芯微电子科技有限公司 Conditional branch instruction processing method and system, CPU and chip
CN113254082B (en) * 2021-06-23 2021-10-08 北京智芯微电子科技有限公司 Conditional branch instruction processing method and system, CPU and chip
CN114629665A (en) * 2022-05-16 2022-06-14 百信信息技术有限公司 Hardware platform for trusted computing

Similar Documents

Publication Publication Date Title
US10928847B2 (en) Apparatuses and methods for frequency scaling a message scheduler data path of a hashing accelerator
US20220138329A1 (en) Microprocessor pipeline circuitry to support cryptographic computing
CN107667499B (en) Keyed hash message authentication code processor, method, system, and instructions
KR100942668B1 (en) Multithreaded processor with efficient processing for convergence device application
US11121856B2 (en) Unified AES-SMS4—Camellia symmetric key block cipher acceleration
JP3789454B2 (en) Stream processor with cryptographic coprocessor
CN106575215B (en) System, device, method, processor, medium, and electronic device for processing instructions
US20220198027A1 (en) Storage encryption using converged cryptographic engine
US9544133B2 (en) On-the-fly key generation for encryption and decryption
US6920562B1 (en) Tightly coupled software protocol decode with hardware data encryption
CN108228960B (en) Simon-based hashing for fuse verification
CN112256633A (en) Command-driven commercial password special processor system
TW201812637A (en) Low cost cryptographic accelerator
CN111563281A (en) Processor supporting multiple encryption and decryption algorithms and implementation method thereof
WO2022143536A1 (en) Apsoc-based state cipher calculation method, system, device, and medium
CN112417522A (en) Data processing method, security chip device and embedded system
CN114154640A (en) Processor for realizing post-quantum cryptography Saber algorithm
CN114465820B (en) Data encryption method, data encryption device, electronic device, program, and medium
EP3671438B1 (en) Systems and methods to transpose vectors on-the-fly while loading from memory
US9438414B2 (en) Virtualized SHA computational engine
CN110659505A (en) Accelerator for encrypting or decrypting confidential data and additional authentication data
TW201712534A (en) Decoding information about a group of instructions including a size of the group of instructions
KR101126596B1 (en) Dual mode aes implementation to support single and multiple aes operations
Pu et al. Computing privacy-preserving edit distance and Smith-Waterman problems on the GPU architecture
CN113672946B (en) Data encryption and decryption assembly, related device and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination