CN112417522A - Data processing method, security chip device and embedded system - Google Patents

Data processing method, security chip device and embedded system

Publication number
CN112417522A
Authority
CN
China
Prior art keywords
data stream
ram
algorithm
processed
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011387866.4A
Other languages
Chinese (zh)
Inventor
胡灿辉
陈保儒
帅兰兰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huada Zhibao Electronic System Co Ltd
Original Assignee
Beijing Huada Zhibao Electronic System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/70Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer
    • G06F21/71Protecting specific internal or peripheral components, in which the protection of a component leads to protection of the entire computer to assure secure computing or processing of information
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60Protecting data
    • G06F21/602Providing cryptographic facilities or services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2221/00Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2107File encryption

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Mathematical Physics (AREA)
  • Storage Device Security (AREA)

Abstract

The invention relates to a data processing method, a security chip device and an embedded system, belongs to the technical field of computers, and solves the problem that existing data processing is slow and cannot keep up with encryption and decryption algorithms that involve a large amount of computation. The method comprises the following steps: step S1: storing a first data stream in a first RAM; step S2: storing a second data stream in a second RAM while processing the first data stream by an algorithm unit and storing the processed first data stream in the first RAM; step S3: outputting the processed first data stream to an upper computer while processing the second data stream by the algorithm unit and storing the processed second data stream in the second RAM; and step S4: outputting the processed second data stream to the upper computer. Within the same data processing period T, more data is processed and the utilization rate of the algorithm unit is improved, so that working efficiency is improved.

Description

Data processing method, security chip device and embedded system
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method, a security chip device, and an embedded system.
Background
A Random Access Memory (RAM), also called a main Memory, is an internal Memory that directly exchanges data with the CPU. It can be read and written at any time (except for refreshing), and has high speed, and is usually used as a temporary data storage medium of an operating system or other programs in operation. The RAM can write (store) or read (take out) information from any one of designated addresses at any time when it is operated. RAM is used in computers and digital systems to temporarily store programs, data, and intermediate results.
In the prior art, processing a set of data requires three steps: storing the data into the RAM, processing the data, and reading the data out of the RAM. These three steps require three processing cycles, and thus six cycles are required to process two sets of data. This working mode processes data slowly and cannot keep up with encryption and decryption algorithms that involve a large amount of computation.
Disclosure of Invention
In view of the foregoing analysis, embodiments of the present invention provide a data processing method, a secure chip apparatus, and an embedded system, so as to solve the problem that the existing data processing method is low in data processing rate and cannot adapt to an encryption/decryption algorithm with a large calculation amount.
In one aspect, an embodiment of the present invention provides a data processing method, including: step S1: storing a first data stream in a first RAM; step S2: storing a second data stream in a second RAM while processing the first data stream by an algorithm unit and storing the processed first data stream in the first RAM; step S3: outputting the processed first data stream to an upper computer while processing the second data stream by the algorithm unit and storing the processed second data stream in the second RAM; and step S4: outputting the processed second data stream to the upper computer.
The beneficial effects of the above technical scheme are as follows: the first data stream is processed by the algorithm unit and stored back in the first RAM while the second data stream is being stored in the second RAM, and the second data stream is processed by the algorithm unit and stored back in the second RAM while the processed first data stream is being output to the upper computer. More data can therefore be processed within the same data processing period T, and the utilization rate of the algorithm unit is improved, which improves working efficiency.
Based on the further improvement of the method, the algorithm unit comprises one or more of an SM1 cryptographic algorithm module, an SM4 cryptographic algorithm module, an AES cryptographic algorithm module, a DES cryptographic algorithm module and a Hash algorithm module.
Based on a further improvement of the above method, storing the first data stream in the first RAM further comprises: inputting the first data stream via a four-wire SPI interface; and storing the first data stream in the first RAM through a RAM switch matrix.
Based on a further improvement of the above method, storing the second data stream in the second RAM while processing the first data stream by the algorithm unit and storing the processed first data stream in the first RAM further comprises: inputting the second data stream via a four-wire SPI interface and storing the second data stream in the second RAM through a RAM switch matrix; and simultaneously, the algorithm unit acquires the first data stream from the first RAM through the RAM switch matrix, performs algorithm processing on the first data stream, and stores the processed first data stream in the first RAM through the RAM switch matrix.
Based on the further improvement of the method, outputting the processed first data stream to an upper computer, processing the second data stream through the algorithm unit, and storing the processed second data stream in the second RAM further comprises: reading the processed first data stream from the first RAM through an RAM switching matrix, and outputting the processed first data stream to the upper computer through a four-wire SPI (serial peripheral interface); and simultaneously, the algorithm unit acquires the second data stream from the second RAM through the RAM switching matrix, performs the algorithm processing on the second data stream, and stores the processed second data stream in the second RAM through the RAM switching matrix.
Based on the further improvement of the method, outputting the processed second data stream to the upper computer further comprises: reading the processed second data stream from the second RAM through a RAM switching matrix; and outputting the processed second data stream to the upper computer through a four-wire SPI interface.
Based on a further improvement of the above method, processing the first data stream by the algorithm unit and storing the processed first data stream in the first RAM further comprises: the CPU sends a key, a working mode and an algorithm processing start instruction to the SFR interface module through the SFR bus; an algorithm core of the algorithm unit performs algorithm processing on the first data stream based on the key, the working mode and the algorithm processing start instruction; after the algorithm processing is finished, an algorithm completion instruction is transmitted from the SFR interface module to the CPU through the SFR bus; in response to the algorithm completion instruction, the CPU generates a first RAM write instruction and configures the RAM switch matrix; and the processed first data stream is stored in the first RAM through the RAM switch matrix according to the first RAM write instruction.
Based on a further improvement of the above method, processing the second data stream by the algorithm unit and storing the processed second data stream in the second RAM further comprises: the CPU sends a key, a working mode and an algorithm processing start instruction to the SFR interface module through the SFR bus; an algorithm core of the algorithm unit performs algorithm processing on the second data stream based on the key, the working mode and the algorithm processing start instruction; after the algorithm processing is finished, an algorithm completion instruction is transmitted from the SFR interface module to the CPU through the SFR bus; in response to the algorithm completion instruction, the CPU generates a second RAM write instruction and configures the RAM switch matrix; and the processed second data stream is stored in the second RAM through the RAM switch matrix according to the second RAM write instruction.
Based on a further improvement of the above method, the data stream to be processed is divided into a plurality of cycles, each cycle comprising the first data stream and the second data stream, and outputting the processed second data stream to the upper computer further comprises: acquiring the first data stream of the next cycle and executing step S1.
Based on a further improvement of the above method, the data stream to be processed is divided into a plurality of cycles, each cycle comprising N data streams, the N data streams comprising the first data stream and the second data stream; and the N data streams in each cycle are processed by N RAMs, the N RAMs comprising the first RAM and the second RAM.
In another aspect, an embodiment of the present invention provides a secure chip apparatus, including a first RAM, a second RAM, and an algorithm unit, where: the first RAM is used for storing a first data stream and outputting the first data stream processed by the algorithm unit to an upper computer; the second RAM is used for storing a second data stream and outputting the second data stream processed by the algorithm unit to the upper computer; and the algorithm unit is used for processing the first data stream and storing the processed first data stream in the first RAM, and processing the second data stream and storing the processed second data stream in the second RAM, and the upper computer is positioned outside the security chip device.
Based on the further improvement of the device, the algorithm unit comprises one or more of an SM1 cryptographic algorithm module, an SM4 cryptographic algorithm module, an AES cryptographic algorithm module, a DES cryptographic algorithm module and a Hash algorithm module, which are respectively used for encryption and decryption of an SM1 cryptographic algorithm, an SM4 cryptographic algorithm, an AES cryptographic algorithm, a DES cryptographic algorithm and a Hash algorithm.
Based on the further improvement of the above device, the SM1 cryptographic algorithm module includes: the first SFR interface module is used for being connected with the CPU through an SFR bus; a first special function register for control configuration and data caching for the SM1 core; and the SM1 core, used for performing SM1 operations; and the SM4 cryptographic algorithm module comprises: the second SFR interface module is used for being connected with the CPU through an SFR bus; a second special function register for control configuration and data caching for the SM4 core; and the SM4 core is used for carrying out SM4 operation.
Based on the further improvement of the above device, the security chip device further comprises: the RAM switching matrix is connected with an AHB bus, and the first RAM and the second RAM are connected with the arithmetic unit through the RAM switching matrix and are used for: storing the first data stream and the second data stream in the first RAM and the second RAM, respectively; the algorithm unit respectively acquires the first data stream and the second data stream through the RAM switching matrix; and storing the processed first data stream and the second data stream in the first RAM and the second RAM respectively.
Based on a further improvement of the above apparatus, the security chip apparatus further includes a four-wire SPI interface, wherein the four-wire SPI interface connects the first RAM and the second RAM via the RAM switch matrix, and the four-wire SPI interface is configured to input the first data stream and the second data stream, and output the processed first data stream and the processed second data stream.
Based on the further improvement of the above device, the security chip device further comprises: the CPU is used for connecting the Hash algorithm module and the algorithm unit through the AHB bus; the memory management module EMMU is used for being connected with the CPU through the AHB bus, and is directly connected with the RAM switching matrix; and the storage management module EMMU is directly connected with the ROM, the FLASH and the SRAM.
Based on the further improvement of the above device, the security chip device further comprises: a Hash algorithm module for Hash algorithm, connected to the CPU through the AHB bus, connected to the first RAM and the second RAM via the RAM switch matrix, wherein the Hash algorithm module further comprises: the SFR interface module is used for being connected with the CPU through an SFR bus; the special function register is used for controlling configuration and data caching of the HASH core; and the HASH core is used for carrying out HASH operation.
Based on the further improvement of the device, the transmission clock of the four-wire SPI interface reaches 100MHz at most, and the data transmission speed reaches 50 MB/s.
Based on the further improvement of the above device, the security chip device further comprises: a true random number generator TRNG for generating a sequence of random numbers meeting security requirements for key generation, digital signature, said TRNG further comprising: TRNG0 is a switching voltage structure; TRNG1, TRNG2, and TRNG3 are switched current structures; a first exclusive-OR gate that exclusive-OR's the outputs of the TRNG0, the TRNG1, the TRNG2, and the TRNG3 to generate a parallel random number noise source seed; a supplementary random number generator for generating a supplementary random number from a supplementary seed or a feedback random number, wherein the supplementary seed is used for the first time and the feedback random number is used subsequently; a random number seed generator for generating a random number seed from the output of the first exclusive or gate and the supplemental random number; a linear feedback shift register LFSR for receiving the random number seed and performing a shift process on the random number seed to generate a plaintext and a secret key; and an RNG2 module to perform an SM4 operation on the plaintext and the key to generate a random number.
In another aspect, an embodiment of the present invention provides an embedded system, including: the security chip device described above.
Based on the further improvement of the system, the embedded system further comprises: the SD card is used for storing data streams, wherein the data streams comprise the first data stream and the second data stream; and an SD card controller for transmitting a data stream to be encrypted to the secure chip apparatus through the four-wire SPI interface to encrypt the data stream therein, or storing a decrypted data stream from the secure chip apparatus to the SD card through the four-wire SPI interface; the secure chip device and the SD card controller are arranged in the SD card.
Based on the further improvement of the system, the embedded system further comprises: the Nandflash memory is arranged in the SD card and comprises a safety area, a hidden area and a safety area, wherein the SD card controller is further used for storing an encrypted data stream to the Nandflash memory through an NF I/F interface or transmitting the encrypted data stream to be decrypted to the safety chip device through the four-wire SPI interface so as to decrypt the encrypted data stream to be decrypted.
Compared with the prior art, the invention can realize at least one of the following beneficial effects:
1. The first data stream is processed by the algorithm unit and stored back in the first RAM while the second data stream is being stored in the second RAM, and the second data stream is processed by the algorithm unit and stored back in the second RAM while the processed first data stream is being output to the upper computer. More data can therefore be processed within the same data processing period T, and the utilization rate of the algorithm unit is improved, which improves working efficiency.
2. Data streams are input to or output from the security chip device through the four-wire SPI interface, which raises the data transmission rate of the interface to as much as 50 MB/s.
3. Instead of transferring data over the bus, the algorithm unit obtains data directly from the first RAM or the second RAM through the RAM switch matrix, and stores the processed data in the first RAM or the second RAM in the same way, which raises the processing speed for encrypted and decrypted data to as much as 40 MB/s.
4. The security chip device has 4 built-in noise source generators that use two different circuit architectures, a switched current structure and a switched voltage structure, so the sources are highly independent of one another. Combining physical random sources with digital post-processing produces high-quality random numbers that are uniformly distributed, sequence-independent and long-period.
5. The security chip device is packaged in QFN32 with a chip area of 5 × 5 mm², which is small and saves PCB space.
In the invention, the technical schemes can be combined with each other to realize more preferable combination schemes. Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
Drawings
The drawings are only for purposes of illustrating particular embodiments and are not to be construed as limiting the invention, wherein like reference numerals are used to designate like parts throughout.
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention.
Fig. 2 is a diagram of a four-wire SPI interface architecture according to an embodiment of the present invention.
Fig. 3 is a structural diagram of Pipeline mode data processing according to an embodiment of the present invention.
Fig. 4 is a general configuration diagram of a security chip device according to an embodiment of the present invention.
Fig. 5 is a diagram showing the state of the I2C interface in the secure chip device.
Fig. 6 is a structure diagram of a GPIO interface in the secure chip device.
Fig. 7 is a block diagram of an SM1 cryptographic algorithm module according to an embodiment of the present invention.
Fig. 8 is a block diagram of a PKU in a secure chip device.
Fig. 9 is a block diagram of an SM4 cryptographic algorithm module according to an embodiment of the present invention.
Fig. 10 is a block diagram of a HASH cryptographic algorithm module according to an embodiment of the present invention.
Fig. 11 is a block diagram of a random number generator according to an embodiment of the present invention.
Fig. 12 is a schematic diagram of a Linear Feedback Shift Register (LFSR) according to an embodiment of the present invention.
Fig. 13 is a diagram of an embedded system according to an embodiment of the invention.
Fig. 14 is a block diagram of a secure chip device according to an embodiment of the present invention.
Fig. 15 is a flowchart of a data processing method according to an embodiment of the present invention.
Detailed Description
The accompanying drawings, which are incorporated in and constitute a part of this application, illustrate preferred embodiments of the invention and together with the description, serve to explain the principles of the invention and not to limit the scope of the invention.
The invention discloses a data processing method. Referring to fig. 1, the data processing method includes: step S1: storing a first data stream in a first RAM; step S2: storing a second data stream in a second RAM while processing the first data stream by an algorithm unit and storing the processed first data stream in the first RAM; step S3: outputting the processed first data stream to an upper computer while processing the second data stream by the algorithm unit and storing the processed second data stream in the second RAM; and step S4: outputting the processed second data stream to the upper computer.
Compared with the prior art, the data processing method provided by this embodiment processes the first data stream by the algorithm unit and stores the processed first data stream in the first RAM while the second data stream is being stored in the second RAM, and processes the second data stream by the algorithm unit and stores the processed second data stream in the second RAM while the processed first data stream is being output to the upper computer. More data can therefore be processed within the same data processing period T, and the utilization rate of the algorithm unit is improved, which improves working efficiency.
Hereinafter, steps S1, S2, S3 and S4 of the data processing method are described in detail with reference to fig. 1 to 4.
Referring to fig. 1 and 3, step S1: the first data stream is stored in the first RAM. Specifically, step S1 further includes: inputting the first data stream via a four-wire SPI interface; and storing the first data stream in the first RAM through the RAM switch matrix.
Referring to fig. 1 and 3, step S2: the second data stream is stored in the second RAM while the first data stream is processed by the algorithm unit and the processed first data stream is stored in the first RAM. The algorithm unit may include one or more of an SM1 cryptographic algorithm module, an SM4 cryptographic algorithm module, an AES cryptographic algorithm module, a DES cryptographic algorithm module, and a Hash algorithm module. Specifically, step S2 further includes: inputting the second data stream via the four-wire SPI interface and storing the second data stream in the second RAM through the RAM switch matrix; and simultaneously, the algorithm unit acquires the first data stream from the first RAM through the RAM switch matrix, performs algorithm processing on the first data stream, and stores the processed first data stream in the first RAM through the RAM switch matrix, wherein the algorithm processing is one of SM1, SM4, AES and DES. In an embodiment, processing the first data stream by the algorithm unit and storing the processed first data stream in the first RAM further comprises: the CPU sends a key, a working mode and an algorithm processing start instruction to the SFR interface module through the SFR bus; an algorithm core (SM1, SM4, AES or DES core) of the algorithm unit performs algorithm processing on the first data stream based on the key, the working mode and the algorithm processing start instruction; after the algorithm processing is finished, an algorithm completion instruction is transmitted from the SFR interface module to the CPU through the SFR bus; in response to the algorithm completion instruction, the CPU generates a first RAM write instruction and configures the RAM switch matrix; and the processed first data stream is stored in the first RAM through the RAM switch matrix according to the first RAM write instruction.
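The control sequence described for step S2 (key, working mode and start instruction sent over the SFR bus, completion reported back, then a RAM write-back through the switch matrix) can be sketched as a small simulation. This is a minimal sketch under stated assumptions, not the patented implementation: the class names `RamSwitchMatrixModel` and `SfrInterfaceModel`, their methods, and the repeating-key XOR that stands in for the SM1/SM4/AES/DES core are all hypothetical and chosen only for illustration.

```python
class RamSwitchMatrixModel:
    """Hypothetical model: routes reads and writes to named RAM buffers."""

    def __init__(self, names):
        self.rams = {name: b"" for name in names}
        self.write_target = None  # RAM currently selected for write-back

    def configure_write(self, name):
        self.write_target = name

    def read(self, name):
        return self.rams[name]

    def write(self, data):
        self.rams[self.write_target] = data


class SfrInterfaceModel:
    """Hypothetical SFR interface holding key, working mode and status."""

    def __init__(self):
        self.key = None
        self.mode = None
        self.done = False
        self.result = b""

    def start(self, key, mode, data):
        # Placeholder "algorithm core": repeating-key XOR, NOT a real cipher.
        self.key, self.mode = key, mode
        self.result = bytes(b ^ key[i % len(key)] for i, b in enumerate(data))
        self.done = True  # models the algorithm completion instruction


def process_stream(matrix, sfr, ram_name, key, mode):
    """CPU-side sequence: start the core, check completion, write back."""
    data = matrix.read(ram_name)      # core fetches the stream via the matrix
    sfr.start(key, mode, data)        # key, working mode, start instruction
    if not sfr.done:                  # completion is reported over the SFR bus
        raise RuntimeError("algorithm core did not complete")
    matrix.configure_write(ram_name)  # CPU issues the RAM write instruction
    matrix.write(sfr.result)          # processed stream returns to the same RAM


matrix = RamSwitchMatrixModel(["RAM_A", "RAM_B"])
matrix.rams["RAM_A"] = b"first data stream"
sfr = SfrInterfaceModel()

process_stream(matrix, sfr, "RAM_A", key=b"\x5a\xa5", mode="encrypt")
encrypted = matrix.read("RAM_A")
process_stream(matrix, sfr, "RAM_A", key=b"\x5a\xa5", mode="decrypt")
decrypted = matrix.read("RAM_A")
```

Because the placeholder core is an XOR, running it twice with the same key restores the original data, which makes the round trip easy to check. A real core would treat the working mode as significant.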
Referring to fig. 1 and 3, step S3: the processed first data stream is output to the upper computer while the second data stream is processed by the algorithm unit and the processed second data stream is stored in the second RAM. Specifically, step S3 further includes: reading the processed first data stream from the first RAM through the RAM switch matrix, and outputting the processed first data stream to the upper computer through the four-wire SPI interface; and simultaneously, the algorithm unit acquires the second data stream from the second RAM through the RAM switch matrix, performs algorithm processing on the second data stream, and stores the processed second data stream in the second RAM through the RAM switch matrix. In an embodiment, processing the second data stream by the algorithm unit and storing the processed second data stream in the second RAM further comprises: the CPU sends a key, a working mode and an algorithm processing start instruction to the SFR interface module through the SFR bus; an algorithm core (SM1, SM4, AES or DES core) of the algorithm unit performs algorithm processing on the second data stream based on the key, the working mode and the algorithm processing start instruction; after the algorithm processing is finished, an algorithm completion instruction is transmitted from the SFR interface module to the CPU through the SFR bus; in response to the algorithm completion instruction, the CPU generates a second RAM write instruction and configures the RAM switch matrix; and the processed second data stream is stored in the second RAM through the RAM switch matrix according to the second RAM write instruction.
Referring to fig. 1 and 3, step S4: the processed second data stream is output to the upper computer. Specifically, step S4 further includes: reading the processed second data stream from the second RAM through the RAM switch matrix; and outputting the processed second data stream to the upper computer through the four-wire SPI interface.
The data stream to be processed is divided into a plurality of cycles, each cycle including a first data stream and a second data stream. Outputting the processed second data stream to the upper computer further includes: acquiring the first data stream of the next cycle and executing step S1 (refer to fig. 15). For example, the division into the first data stream and the second data stream is generally performed by the upper computer. The first data stream and the second data stream in each cycle are generally equal in length, although in individual cases they may differ; likewise, the data length of each cycle is generally equal, although in individual cases it may differ. In addition, the data stream to be processed may be divided into a plurality of cycles in which each cycle includes N data streams, the N data streams including the first data stream and the second data stream; the N data streams in each cycle are processed by N RAMs, including the first RAM and the second RAM. For example, 3 RAMs may process 3 data streams per cycle, or 4 RAMs may process 4 data streams per cycle. To further improve data processing efficiency, a plurality of algorithm units can be added so that the data streams in a plurality of RAMs are processed simultaneously.
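The saving from chaining cycles and from using N RAMs can be estimated by counting time slots. The following is a minimal sketch under stated assumptions: every store, process and output step takes one slot, there is a single algorithm unit, stream i is stored in slot i, processed in slot i+1 and output in slot i+2, and the first store of the next cycle overlaps the last output of the current cycle as described above. The function names are hypothetical.

```python
def sequential_slots(total_streams: int) -> int:
    """Prior-art mode: store, process and output each stream one after another."""
    return 3 * total_streams


def pipelined_slots(streams_per_cycle: int, cycles: int = 1) -> int:
    """Slots needed when each stream has its own RAM and steps overlap.

    One cycle of N streams occupies N + 2 slots. The first store of the
    next cycle shares a slot with the last output of the current cycle,
    so every cycle after the first adds only N + 1 slots.
    """
    if streams_per_cycle < 2:
        # With a single RAM the next store would collide with the output.
        raise ValueError("at least two streams (and RAMs) per cycle are required")
    if cycles < 1:
        raise ValueError("cycles must be at least 1")
    return cycles * (streams_per_cycle + 1) + 1


# Two streams in one cycle: 4 slots instead of the 6 needed sequentially.
two_stream_pipelined = pipelined_slots(2)
two_stream_sequential = sequential_slots(2)

# Three chained cycles of two streams each: 10 slots instead of 18.
chained = pipelined_slots(2, cycles=3)
```

For two streams this reproduces the four time points of steps S1 to S4 against the six cycles of the prior art. The figures for larger N follow only from the assumptions listed and are not stated in the source.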
Data stream processing in one cycle is taken as an example. Assuming that processing of the data stream in one cycle starts at time T, and that the data stream in one cycle includes the first data stream and the second data stream, the processing in one cycle can be divided into four time points, T+0 to T+3:
at time T+0, the first data stream is stored in the first RAM;
at time T+1, the second data stream is stored in the second RAM while the first data stream is processed by the algorithm unit and the processed first data stream is stored in the first RAM;
at time T+2, the processed first data stream is output to the upper computer while the second data stream is processed by the algorithm unit and the processed second data stream is stored in the second RAM; and
at time T+3, the processed second data stream is output to the upper computer.
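The four time points can be written down as a table and checked for resource conflicts. This is a minimal sketch: the labels `SPI`, `ALG`, `RAM_A` and `RAM_B` and the helper names are assumptions used for illustration, and the table simply restates steps S1 to S4.

```python
# Each time point lists (resource, operation, RAM) entries taken from steps S1 to S4.
SCHEDULE = {
    "T+0": [("SPI", "store first stream", "RAM_A")],
    "T+1": [("SPI", "store second stream", "RAM_B"),
            ("ALG", "process first stream", "RAM_A")],
    "T+2": [("SPI", "output first stream", "RAM_A"),
            ("ALG", "process second stream", "RAM_B")],
    "T+3": [("SPI", "output second stream", "RAM_B")],
}


def conflicts(schedule):
    """Return the time points where a RAM or a resource is claimed twice."""
    bad = []
    for slot, ops in schedule.items():
        rams = [ram for _, _, ram in ops]
        units = [unit for unit, _, _ in ops]
        if len(set(rams)) != len(rams) or len(set(units)) != len(units):
            bad.append(slot)
    return bad


def algorithm_unit_busy_slots(schedule):
    """Count the time points in which the algorithm unit is working."""
    return sum(1 for ops in schedule.values()
               for unit, _, _ in ops if unit == "ALG")


for slot, ops in SCHEDULE.items():
    print(slot, "; ".join(f"{unit}: {op} ({ram})" for unit, op, ram in ops))
```

In this table the algorithm unit is busy in 2 of 4 time points, compared with 2 of 6 cycles when the two streams are handled one after the other, and no RAM is accessed by two resources at the same time point.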
The invention also discloses a security chip device. Referring to fig. 14, the security chip device includes: a first RAM (RAM_A), a second RAM (RAM_B), an algorithm unit DES/AES/SM1/SM2, a RAM switch matrix (RAM_Matrix), a four-wire SPI interface (Quad SPI), a Hash algorithm module (HASH), a CPU, a storage management module (EMMU), a ROM, a FLASH, an SRAM, and a true random number generator (TRNG). Fig. 14 also shows the upper computer, which is provided outside the security chip device. In an alternative embodiment, the upper computer may also be disposed inside the security chip device.
Referring to fig. 2, the first RAM (RAM_A) is used to store the first data stream and to output the first data stream processed by the algorithm unit to the upper computer. The second RAM (RAM_B) is used to store the second data stream and to output the second data stream processed by the algorithm unit to the upper computer. The algorithm unit DES/AES/SM1/SM2 is used for processing the first data stream and storing the processed first data stream in the first RAM, and for processing the second data stream and storing the processed second data stream in the second RAM; the upper computer is located outside the security chip device. Referring to fig. 4, the algorithm unit further includes one or more of an SM1 cryptographic algorithm module, an SM4 cryptographic algorithm module, an AES cryptographic algorithm module, a DES cryptographic algorithm module, and a Hash algorithm module, which are respectively used for encryption and decryption with the SM1 cryptographic algorithm, the SM4 cryptographic algorithm, the AES cryptographic algorithm and the DES cryptographic algorithm, and for the Hash algorithm. Referring to fig. 7, the SM1 cryptographic algorithm module includes: a first SFR interface module for connecting to the CPU through an SFR bus; a first special function register for control configuration and data caching for the SM1 core; and an SM1 core for performing SM1 operations. Referring to fig. 9, the SM4 cryptographic algorithm module includes: a second SFR interface module for connecting to the CPU through an SFR bus; a second special function register for control configuration and data caching for the SM4 core; and an SM4 core for performing SM4 operations.
Referring to fig. 4, the RAM switch matrix may be connected with an AHB bus. The first RAM and the second RAM are directly connected with the algorithm unit through the RAM exchange matrix. The RAM switching matrix is used for storing the first data stream and the second data stream in the first RAM and the second RAM respectively through the RAM switching matrix; the algorithm unit respectively acquires a first data stream and a second data stream through the RAM switching matrix; and storing the processed first data stream and the second data stream in a first RAM and a second RAM respectively.
Referring to fig. 4, the four-wire SPI interface inputs the first data stream and the second data stream and outputs the processed first data stream and second data stream. It may be connected to the first RAM and the second RAM via the RAM switch matrix. The transfer clock of the four-wire SPI interface is up to 100 MHz, and the data transfer rate reaches 50 MB/s.
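The two figures above are consistent with each other under a simple assumption. The sketch below assumes that Quad mode moves four data bits per SPI_CLK cycle and ignores command and framing overhead; the function name is illustrative and not taken from the text.

```python
def quad_spi_peak_rate_mb_s(clock_hz: int, data_lines: int = 4) -> float:
    """Peak payload rate in MB/s (1 MB = 10**6 bytes), ignoring protocol overhead.

    Assumption: one bit per data line per clock cycle.
    """
    bits_per_second = clock_hz * data_lines
    return bits_per_second / 8 / 1_000_000


if __name__ == "__main__":
    print(quad_spi_peak_rate_mb_s(100_000_000))  # 100 MHz clock, 4 data lines
```

With a 100 MHz clock and four data lines this gives the 50 MB/s stated above; a single data line at the same clock would give a quarter of that.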
The Hash algorithm module performs Hash operations. It is connected to the CPU through the AHB bus and to the first RAM and the second RAM through the RAM switch matrix. In an embodiment, referring to fig. 10, the Hash algorithm module further comprises: an SFR interface module for connecting to the CPU through an SFR bus; a special function register for control configuration and data caching for the HASH core; and a HASH core for performing HASH operations.
Referring to fig. 4, the CPU is connected to the Hash algorithm module and the algorithm unit via the AHB bus. The storage management module EMMU is connected to the CPU through the AHB bus and directly connected to the RAM switch matrix; the EMMU is also directly connected to the ROM, the FLASH, and the SRAM.
The true random number generator TRNG generates random number sequences that meet security requirements, for use in key generation and digital signatures. Specifically, referring to fig. 11, the TRNG further includes: TRNG0, which has a switched-voltage structure; TRNG1, TRNG2, and TRNG3, which have switched-current structures; a first exclusive-OR gate (XOR) that XORs the outputs of TRNG0, TRNG1, TRNG2, and TRNG3 to generate a parallel random number noise source seed; a supplementary random number generator that generates a supplementary random number from a supplementary seed or a feedback random number, the supplementary seed being used the first time and the feedback random number thereafter; a random number seed generator that generates a random number seed from the output of the first exclusive-OR gate and the supplementary random number; a linear feedback shift register (LFSR) that receives the random number seed and shifts it to generate a plaintext and a key; and an RNG2 module that performs an SM4 operation on the plaintext and the key to generate a random number.
Hereinafter, the secure chip apparatus and the data processing method are described in detail by way of specific examples with reference to fig. 2 to 12.
I. Chip architecture
The security chip uses an ARM SC000 CPU core for central control. On-chip program storage consists of a 36 KB ROM and a 256 KB FLASH. Program variables are held in the 32 KB system SRAM. The CPU accesses the control registers of each IP block through the system bus and thereby schedules the IP blocks to complete specific tasks. The system provides interrupt and DMA services, enabling fast response to IP requests and fast data transfer.
Referring to fig. 1, the functional blocks are illustrated as follows:
CPU: an ARM 32-bit secure CPU core, the SC000;
AHB bus: transmits data and instructions;
APB bus: the slow bus, transmits data and instructions;
EMMU: storage management module; implements memory mapping and permission management for the chip management program/data, factory code, cryptographic operation unit programs, and user programs;
DMA: direct memory access control unit; transfers data directly between the memory units and each cryptographic operation unit;
ROM: read-only memory for storing programs; it is non-rewritable, that is, the program is fixed in the chip during manufacturing and cannot be rewritten while the chip is in use;
RAM and register file: memory that holds the intermediate data of the running CPU;
FLASH: non-volatile memory for storing user programs and data;
RAM_Matrix: connected to the AHB bus; through it, the CPU and the cryptographic operation units can directly access data in the SPI_Buffer;
DES cryptographic algorithm module: implements DES encryption and decryption;
AES cryptographic algorithm module: implements AES encryption and decryption;
PKU coprocessor: implements large-number modular addition, modular subtraction, modular multiplication, and modular exponentiation, as well as elliptic curve point operations;
SM1 cryptographic algorithm module: implements SM1 encryption and decryption;
SM4 cryptographic algorithm module: implements SM4 encryption and decryption;
HASH algorithm module: implements the core functions of the HASH algorithm;
CKMU: clock control unit; sets the operating frequency of the system and the cryptographic algorithm modules;
RST: power-up/power-down reset; generates the reset signals for all circuit modules in the chip;
PWMU: power management unit; handles power-on, power-off, and power consumption control;
FD/VD/TD: anomaly detection circuits; detect abnormal frequency, external supply voltage, and temperature to guard against attacks;
CRC module: implements CRC16 data checking;
TIMER: counter/timer, usable as either; a common microcontroller component;
WDT: watchdog module; implements the watchdog reset function;
TRNG: true random number generator with four channels; generates random number sequences that meet security requirements, used for key generation, digital signatures, and other important functions;
MSEQ module: implements digital post-processing of random numbers;
OSC: internal oscillator, the clock source of the chip;
SPI interface: implements high-speed data transfer with a microcontroller or other device;
I2C interface: implements the I2C protocol interface and can connect to external I2C devices;
GPIO interface: implements common microcontroller communication functions such as external interrupts;
II. Working principle
The security chip implements security functions such as identity authentication and symmetric data encryption based on national cryptographic algorithms. It meets the needs of Internet of Things terminals such as embedded devices and can provide a cross-platform network identity authentication solution.
The chip is built on an AHB and APB dual-bus system architecture and uses an ARM secure processor with ample FLASH and RAM. It supports cryptographic operation units such as SM1, SM2, SM3, SM4, SHA-256, and RSA, and communication interfaces such as high-speed SPI, I2C, and GPIO. It can serve as the security module of Internet of Things devices and offers a high level of security protection.
1. CPU processor
The chip uses an ARM SC000 processor core. The SC000 is a 32-bit ARM SecurCore microprocessor based on the Cortex-M0 processor, with a Von Neumann architecture and a 3-stage pipeline. A simple yet powerful instruction set and a fully optimized design (including high-end processing hardware such as a single-cycle multiplier) give it very high code efficiency.
The SC000 processor uses the 16-bit Thumb2 instruction set of the ARMv6-M architecture, which gives this 32-bit processor higher code density than 8-bit and 16-bit microcontrollers.
The SC000 processor tightly integrates a configurable Nested Vectored Interrupt Controller (NVIC), which provides: a non-maskable interrupt (NMI); 32 maskable interrupt sources (IRQs); and 4 configurable interrupt priority levels.
The tight coupling of the processor core and the NVIC allows interrupt service routines (ISRs) to execute quickly, greatly reducing interrupt latency. Interrupt handlers need no assembly wrapper code, so the ISRs carry no such code overhead. Tail-chaining optimization also greatly reduces the overhead of switching from one ISR to another. To support low-power designs, the NVIC also works with the sleep modes through the wake-up interrupt controller (WIC), so that an interrupt can quickly wake the processor from sleep.
Combined with the chip system, the SC000 implements security features such as register verification, system polarity control, jump instruction normalization, and register bank reset after an interrupt.
2. Memory device
(1)ROM
A 36 KB ROM is embedded in the chip, with data and address encryption. It stores the chip boot firmware, which is responsible for hardware testing, self-checking, user COS download, and so on. This area is visible only in CMS mode and is invisible in all other modes.
(2)RAM
The on-chip RAM directly available to the CPU is a 32 KB SRAM.
The 32 KB SRAM supports 8-, 16-, and 32-bit access. To strengthen the security of on-chip RAM data, SRAM addresses and data are scrambled; the scrambling key is regenerated at every power-on and is unpredictable. RAM data also supports 1:1 data verification, with a configurable start address and size for the verification area.
(3)Flash
The on-chip FLASH is NOR Flash with a capacity of 256 KB, used for storing code and data. Its characteristics are: 1) a maximum read frequency of 40 MHz; 2) 8/16/32-bit read operations and 8-bit write operations; 3) read, erase, page erase, and byte write operations; 4) Flash address scrambling; 5) Flash data scrambling.
To improve data security, Flash addresses and data are scrambled. Flash address scrambling is performed within a 512-byte range.
Operations such as erasing and programming of Flash are realized through a system control register. During the Flash erasing and programming operation, the CPU clock is stopped and the program cannot be executed until the erasing and programming operation is finished.
3. Memory management unit MMU
In order to enhance the security performance of the system, an MMU module is added for implementing memory mapping and performing functions such as authority management on chip management programs/data, factory code, cryptographic operation unit programs, and user programs.
The security algorithm program is fixed at a set address in FLASH during chip production and holds security algorithm permission; it cannot be changed once the user's application mode takes effect. In application mode, it can only be invoked through the designated interface provided by the cryptographic operation unit program, and any read, write, or erase operation on it causes a CPU exception.
After an external reset of the chip, the program in the chip management program area runs first and completes system configuration, security management, self-checking, user COS download, and other necessary functions. The chip management program then exits through a soft reset and control passes to the user program.
The chip provides three working modes for users to use, namely a security algorithm mode, a super user mode and a common user mode. In different modes, the rights to access system resources are different. As shown in the following table:
TABLE 1 memory management Authority
Figure BDA0002810246680000141
4. OSC clock
Internal oscillator: generates a 58 MHz high-frequency clock for normal system operation and a 400 kHz low-frequency clock for low-power control;
5. RST reset
Resets fall into three types: power-on reset, hardware reset, and software reset. Hardware reset includes: power-on reset; pin reset in test mode; sensor anomaly reset (abnormal voltage, frequency, temperature, or voltage glitch detected); watchdog reset; and security function detection reset. Software reset: the CPU can generate a software reset by setting a register.
6. TIMER
(1) Description of the function: four 16/32-bit timers; three configurable operating modes; three prescaler options for the timer clock; interrupt output; each timer can count the chip CPU clock; counting of the divided (divide-by-16 or divide-by-256) chip CPU bus clock or of an external asynchronous pulse signal is supported.
(2) The design principle is as follows: the TIMER's internal working registers are the timer control register TimerControl, the timer initial value register TimerLoad, the timer current count value register TimerValue, and the timer interrupt status register TimerIS. The three working modes are selected by configuring TimerControl. Working mode 0 is the periodic (cyclic) mode: when the count overflows, TimerIS is set and the value of TimerLoad is reloaded into the counter, which continues counting. Working mode 1 is the one-shot mode: when the count overflows, TimerIS is set and counting stops.
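The difference between the two modes described above can be illustrated with a small behavioural model. This is only a sketch: it assumes a down-counter whose "overflow" event is reaching zero, which the text does not specify, and the class and field names other than the register names are hypothetical.

```python
from dataclasses import dataclass

MODE_PERIODIC = 0  # working mode 0: reload and keep counting
MODE_ONE_SHOT = 1  # working mode 1: stop after the first overflow


@dataclass
class TimerModel:
    timer_load: int          # TimerLoad: initial value
    mode: int                # mode field of TimerControl
    timer_value: int = 0     # TimerValue: current count
    timer_is: bool = False   # TimerIS: interrupt status flag
    running: bool = True

    def __post_init__(self):
        self.timer_value = self.timer_load

    def tick(self):
        """Advance the model by one counted clock."""
        if not self.running:
            return
        if self.timer_value > 0:
            self.timer_value -= 1
        if self.timer_value == 0:          # assumed overflow condition
            self.timer_is = True
            if self.mode == MODE_PERIODIC:
                self.timer_value = self.timer_load
            else:
                self.running = False


if __name__ == "__main__":
    periodic = TimerModel(timer_load=3, mode=MODE_PERIODIC)
    one_shot = TimerModel(timer_load=3, mode=MODE_ONE_SHOT)
    for _ in range(5):
        periodic.tick()
        one_shot.tick()
    print(periodic.timer_value, periodic.running)
    print(one_shot.timer_value, one_shot.running)
```

In the periodic model the counter keeps cycling after the flag is set, while the one-shot model halts at zero until software restarts it.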
7. Interrupts
Interrupts are one way for the CPU to exchange information with the peripheral. The introduction of the interrupt solves the speed matching problem between the CPU and the peripheral equipment, improves the execution efficiency of the CPU, and enables the embedded software to process the peripheral equipment data in real time.
The chip provides an interrupt controller unit that centrally manages all interrupts in the system, including external event interrupts. It provides 23 maskable interrupt sources (IRQ) and 1 non-maskable interrupt (NMI), with 4 levels of interrupt nesting at configurable priority to optimize interrupt handling performance.
Up to 23 maskable event interrupt sources are connected to the IRQ input of the CPU, and each can be masked or enabled through registers. The system exception interrupt source is connected to NMI and cannot be masked by the system; it has the highest priority and can preempt any other interrupt.
8. Communication interface
(1) SPI interface
Description of the function: full-duplex synchronous serial interface (SPI); supports the Slave operating mode; supports Mode 0 (CPOL = 0, CPHA = 0); the bit order within a byte is configurable as MSB-first or LSB-first; supports a configurable length of 1 to 2112 bytes per transfer; supports three connection modes: Normal, Dual, and Quad. The SPI_CLK transfer clock supports up to 100 MHz.
The design principle is as follows: referring to fig. 2, the spi_sett_if module is the communication logic between the chip's master CPU and the IP, implementing address decoding, register definition, register read/write, and so on; the Spi_reflector module is the SPI communication transfer module and contains two 4-byte buffers. The RAM stores the data to be transmitted and received.
(2)I2C interface
Description of the function: Master/Slave mode; software-programmable slave address; software-programmable clock frequency; can generate Start/Stop/Repeated Start/Acknowledge; can detect Start/Stop/Repeated Start; supports 7/10-bit addresses; timeout function; software-programmable maximum SCL low time; supports S/F mode; maximum supported frequency of 400 kHz; supports clock wait state generation and clock skewing; supports clock synchronization; HS mode is not supported; after arbitration is lost in Master mode, the interface automatically switches to Slave mode;
The design principle is as follows: referring to fig. 5, the processor_interface module is the communication logic between the chip's master CPU and the IP, implementing timing conversion, address decoding, register definition, register read/write, and so on; the i2c_core module implements the physical layer of the I2C protocol, including detecting the I2C bus state and generating the I2C bus timing.
(3) GPIO interface
Description of the function: 9 bidirectional IOs are supported for data communication with the outside; every GPIO port can be configured as an input or an output; when used as an input, a GPIO supports interrupts triggered by high level, low level, rising edge, falling edge, or both edges;
The design principle is as follows: referring to fig. 6, the interface module implements the control/configuration interface and the status feedback interface between the IP and the CPU; the synchronization logic (sync) synchronizes external asynchronous interface signals into chip-internal signals; and the Process module implements the functions selected by the CPU's configuration of the IP.
9. Algorithm module
(1) SM1 Algorithm Module
Description of the function: the ECB, CBC, CFB, and OFB working modes are supported; 8, 10, 12, or 14 rounds of operation can be selected; the system parameter SK can be an internal fixed parameter, a combination of internal and external parameters, or an externally input parameter; the initial vector (IV) for the CBC, CFB, and OFB working modes is configurable; protection against high-order DPA is supported.
The design principle is as follows: referring to fig. 7, the first SFR interface module 702 implements the conversion of interface timing; the first special function register module 704 implements control configuration and data caching for the SM1 core; the SM1 core 706 implements the SM1 protocol controlled by its internal state machine.
(2) PKU algorithm module
Description of the function: the PKU coprocessor implements RSA modular addition, modular subtraction, modular multiplication, and modular exponentiation, and ECC/SM2 modular addition, modular subtraction, modular multiplication, point addition, and point multiplication; software accesses it through the PKURAM, PKUCMD, PKUMC0, PKUMC1, PKUSEGN, PKUEXP, and PKUINT registers; operands are written into the PKU coprocessor's internal data cache through the PKURAM register, and results are read back from that cache through the same register; the PKUCMD register selects the internal data cache region and reports the operation status; the 64-bit Montgomery constant is defined by two writes to PKUMC; the PKUSEGN register defines the operand length in units of 64 bits, with RSA supported up to 2048 bits and ECC up to 256 bits; the PKUEXP register defines a 32-bit modular exponent; the internal data cache is inaccessible while a PKU operation is running;
The design principle is as follows: referring to fig. 8, over the AHB bus the dual-port RAM can be read and written, the mc, SegN, e_reg, and cmd registers can be written, and the ready_busy flag can be read. The mc register holds the Montgomery constant, and e_reg is the most significant data register of the exponent register. The cmd register is the command control word register (ECC) written by the master CPU, and the RAM is a 128 × 64-bit dual-port SRAM. The multiMod module is the core arithmetic module; ModOpFSM controls multiMod to carry out Montgomery operations on large integers, and pkuFSM controls ModOpFSM to complete modular addition, modular subtraction, modular multiplication, modular exponentiation, and the point addition and point multiplication of ECC. When an operation finishes, the PKU sends an interrupt signal to the CPU; alternatively, the CPU repeatedly reads the status word in the algorithm chip, polling the SFR register flag until the operation completes. The master CPU then reads the result from the chip.
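As an illustration of the values software would prepare for such a coprocessor, the sketch below computes the segment count for a given operand size and a 64-bit Montgomery constant. It assumes the common word-level definition mc = -n^(-1) mod 2^64; the text does not state which convention the PKU uses, and the helper names and the order of the two 32-bit halves are hypothetical.

```python
def pku_segments(operand_bits: int, word_bits: int = 64) -> int:
    """Number of 64-bit units needed for an operand (the quantity PKUSEGN describes)."""
    return (operand_bits + word_bits - 1) // word_bits


def montgomery_constant(modulus: int, word_bits: int = 64) -> int:
    """Assumed convention: mc = -modulus^(-1) mod 2^word_bits (modulus must be odd)."""
    if modulus % 2 == 0:
        raise ValueError("Montgomery arithmetic needs an odd modulus")
    r = 1 << word_bits
    return (-pow(modulus, -1, r)) % r  # modular inverse via pow() needs Python 3.8+


def split_for_two_writes(mc: int):
    """Split the 64-bit constant into two 32-bit halves (low, high); order is an assumption."""
    return mc & 0xFFFFFFFF, (mc >> 32) & 0xFFFFFFFF


if __name__ == "__main__":
    n = (1 << 255) - 19          # an arbitrary odd 255-bit modulus, for illustration only
    mc = montgomery_constant(n)
    print(pku_segments(2048), pku_segments(256))
    print(hex(mc), [hex(h) for h in split_for_two_writes(mc)])
```

The defining property of the constant under this convention is that n * mc + 1 is divisible by 2^64, which is what lets a Montgomery reduction clear one 64-bit word per step.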
(3) SM4 Algorithm Module
Description of the function: the ECB, CBC, CFB, OFB, and other working modes are supported; query (polling) mode is supported; hardware DPA protection is supported; the module conforms to the GM/T 0002 SM4 block cipher algorithm standard.
The design principle is as follows: the algorithm module resists timing attacks. The hardware engine performs the number of rounds defined by the SM4 standard, and each round takes a fixed number of clock cycles regardless of externally supplied data (plaintext, ciphertext, or key).
The algorithm module is specifically protected against side-channel attacks. The entire SM4 operation chain is protected: the input data is first randomized, the S-box is transformed correspondingly whenever an S-box substitution is performed, and all data on the chain therefore remains randomized until the final result is output. SM4 encryption requires 32 rounds; before each round starts, two 8-bit random numbers are used to mask the S-boxes for that round, so the S-boxes used differ from round to round within a single SM4 encryption or decryption. A single SM4 round needs 4 S-boxes to process 32 bits of data; the chip uses four identical masked S-boxes and processes the 32 bits in parallel.
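A minimal sketch of per-round S-box masking follows. It assumes that one of the two 8-bit random numbers acts as an input mask and the other as an output mask, which the text does not spell out. The S-box used here is a stand-in pseudo-random permutation, not the SM4 S-box, and all function names are hypothetical.

```python
import random
import secrets

# Stand-in 8-bit S-box: a fixed pseudo-random permutation, NOT the real SM4 S-box.
_rng = random.Random(2020)
SBOX = list(range(256))
_rng.shuffle(SBOX)


def sub_word(word32: int, table) -> int:
    """Apply an 8-bit lookup table to each of the four bytes of a 32-bit word."""
    out = 0
    for shift in (24, 16, 8, 0):
        out |= table[(word32 >> shift) & 0xFF] << shift
    return out


def make_masked_sbox(sbox, mask_in: int, mask_out: int):
    """Build T such that T[x ^ mask_in] == sbox[x] ^ mask_out for every byte x."""
    return [sbox[i ^ mask_in] ^ mask_out for i in range(256)]


def masked_round_substitution(word32: int) -> int:
    """One round's S-box layer on masked data, with fresh masks for this round.

    The result is unmasked at the end only so it can be compared with the
    unprotected computation; a protected datapath would keep it masked.
    """
    mask_in, mask_out = secrets.randbits(8), secrets.randbits(8)
    table = make_masked_sbox(SBOX, mask_in, mask_out)  # same table for all 4 bytes
    masked_in = word32 ^ (mask_in * 0x01010101)        # replicate mask over 4 bytes
    masked_out = sub_word(masked_in, table)
    return masked_out ^ (mask_out * 0x01010101)


if __name__ == "__main__":
    x = 0x12345678
    print(masked_round_substitution(x) == sub_word(x, SBOX))
```

Because the table is rebuilt from fresh masks each round, the intermediate values that pass through the lookup are decorrelated from the real data while the final unmasked result is unchanged.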
Referring to fig. 9, the second SFR interface module 902 implements the conversion of interface timing; the second special function register module 904 implements control configuration and data caching for the SM4 core; the SM4 core 906 is controlled by its internal state machine to implement the SM4 protocol.
(4) HASH algorithm module
Description of the function: the hardware HASH operation function is realized, and two HASH algorithms of SM3 and SHA-256 are supported; a 32-bit double-port SRAM interface is supported; support interrupt and query modes; meets the SM3 cryptographic hash algorithm standard and the FIPS PUB 180-2 SHA-256 standard.
The design principle is as follows: a secure HASH operation module is integrated in the chip. It provides hardware HASH operations and supports two HASH algorithms, SM3 and SHA-256.
Referring to fig. 10, the IP consists mainly of a HASH core, an SFR interface module, an SRAM interface module, and special function registers, with the following functions. HASH core: implements the HASH operations (SM3 and SHA-256). SFR interface module: the communication interface with the external controller, supporting 8/16/32-bit operations. SRAM interface module: the interface to an external dual-port SRAM with a 32-bit data bus; it supports a 32 × 32-bit dual-port SRAM that stores operation data and results. Special function registers: store the control information.
10. Random number generator
Random numbers are an essential element of physical security mechanisms and are also an indispensable part in cryptographic applications.
The random number generator consists of a true random number generator (TRNG) and a digital post-processing circuit (MSEQ). The true random number generator produces random seeds that are fed to the post-processing circuit, and software reads the post-processed result to obtain the random number. Software can also supply a seed for random number generation by writing directly to the RNGIN register.
The chip contains 4 noise source generators. Based on chaos theory, they use two different circuit architectures, a switched-current structure and a switched-voltage structure, implemented independently of each other, so the two types of random source can be regarded as highly independent. Combining the physical random sources with digital post-processing yields high-quality random numbers that are uniformly distributed, sequence-independent, and long-period.
FIG. 11 shows an overall design framework diagram for chip random number generation. The hardware automatically collects the serial random number generated by the random number noise source, after serial-to-parallel conversion, the serial random number enters a digital post-processing circuit for processing, and the software reads the processing result.
The digital post-processing circuit consists of two parts, namely a Linear Feedback Shift Register (LFSR) and an SM4 algorithm.
Referring to fig. 11, the 64-bit to 48-bit extraction of the feedback random number removes bit 3 of every 4 bits and keeps bits 0 to 2. In the physical noise source part of fig. 11, TRNG0 uses the switched-voltage structure, and TRNG1, TRNG2, and TRNG3 use the switched-current structure. The four independent physical noise sources each produce a serial random bit stream. The digital circuit that follows converts each stream from serial to parallel, giving four independent 48-bit parallel random numbers, which are XORed bitwise into a single 48-bit random number and sent to the LFSR circuit for subsequent processing. The physical noise sources output random bits at about 1 MHz, while the digital post-processing circuit runs at between 2 MHz and 58 MHz. Although the post-processing circuit is faster, the design ensures that it starts using a 48-bit random number seed only after that seed has been completely generated, so the frequency difference cannot corrupt the output of the physical noise sources.
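The extraction and XOR combination described above can be sketched as follows. The sketch assumes that 4-bit groups are counted from the least significant bit and that the retained 3-bit groups are packed in the same order; neither detail is given in the text, and the function names are hypothetical.

```python
MASK48 = (1 << 48) - 1


def extract_48_from_64(value64: int) -> int:
    """Drop bit 3 of every 4-bit group and keep bits 0..2, giving 16 * 3 = 48 bits."""
    out = 0
    for group in range(16):
        three_bits = (value64 >> (4 * group)) & 0b0111
        out |= three_bits << (3 * group)
    return out


def combine_noise_sources(trng0: int, trng1: int, trng2: int, trng3: int) -> int:
    """Bitwise XOR of four 48-bit parallel words into one 48-bit noise source seed."""
    return (trng0 ^ trng1 ^ trng2 ^ trng3) & MASK48


if __name__ == "__main__":
    print(hex(extract_48_from_64(0xFFFF_FFFF_FFFF_FFFF)))
    print(hex(combine_noise_sources(0xA, 0x5, 0xF0, 0x0F)))
```

XORing the four words means the combined seed stays unpredictable as long as at least one of the four independent sources is behaving well.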
In addition, to ensure that the output of the physical noise sources is valid, the output is self-checked in two ways: first, each of the TRNG0/1/2/3 modules is checked for abnormal all-0 or all-1 output; second, 128 bits or 1024 bits of output are collected from each TRNG0/1/2/3 module and the proportion of 1s is checked against a selectable threshold (1/16, 1/8, 1/4, or 3/8); if the proportion is below the threshold, the output is judged abnormal.
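The two self-checks can be expressed compactly. The sketch below is a software model only; the function name and the representation of a sample as a list of 0/1 values are assumptions, while the sample sizes and thresholds come from the text.

```python
from fractions import Fraction
from typing import Sequence

ALLOWED_THRESHOLDS = (Fraction(1, 16), Fraction(1, 8), Fraction(1, 4), Fraction(3, 8))


def noise_source_ok(bits: Sequence[int], threshold: Fraction = Fraction(1, 8)) -> bool:
    """Return False if the sample is all 0s, all 1s, or has too few 1s."""
    if len(bits) not in (128, 1024):
        raise ValueError("sample must be 128 or 1024 bits")
    if threshold not in ALLOWED_THRESHOLDS:
        raise ValueError("threshold must be one of 1/16, 1/8, 1/4, 3/8")
    ones = sum(bits)
    if ones == 0 or ones == len(bits):        # check 1: stuck-at output
        return False
    return Fraction(ones, len(bits)) >= threshold  # check 2: proportion of 1s


if __name__ == "__main__":
    healthy = [1, 0] * 64
    stuck = [0] * 128
    print(noise_source_ok(healthy), noise_source_ok(stuck))
```

Exact fractions are used so that a sample sitting exactly on the threshold is classified the same way every time, without floating-point rounding.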
The seed of the digital post-processing circuit combines a true random number with a supplementary random number. The true random number comes from the on-chip physical noise sources, whose four outputs are XORed before entering the subsequent circuit. The supplementary random number is generated from the supplementary seed and the feedback random number: the supplementary seed is derived from the factory code when the BootLoader initializes at power-on and is written into the random number module only once, while the feedback random number is derived from the data of the previous random number operation and is updated each time a random number is generated.
After the seed is generated, it enters a linear feedback shift register (LFSR). This structure has the following advantages: the LFSR is simple and well suited to hardware implementation; it runs fast; it can generate a sequence with a long period, up to about $2^{48}$ bits (at most $2^{48}-1$ for a 48-stage register); and it can produce sequences with good statistical properties, that is, each bit in the sequence is close to uniformly distributed.
Fig. 12 is a circuit configuration diagram of the linear feedback shift register (LFSR). The LFSR uses a 48-stage linear feedback shift register whose primitive feedback polynomial is $f(x) = x^{48} + x^{7} + x^{5} + x^{4} + x^{2} + x + 1$. The initial seed value of the LFSR may be set by software, with the seed using random data generated by the random number noise source. In the circuit implementation, x0 to x47 are implemented with registers, and the exclusive-OR circuit is implemented with combinational logic.
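A software model of a 48-stage LFSR with this feedback polynomial might look as follows. It is a sketch in the Fibonacci form: it assumes state bit i corresponds to stage x_i, that the output is taken from x0, and that the feedback bit enters at x47. The actual tap and shift arrangement is defined by fig. 12 and may differ.

```python
MASK48 = (1 << 48) - 1
# Stage indices tapped for feedback, from x^48 + x^7 + x^5 + x^4 + x^2 + x + 1.
TAPS = (0, 1, 2, 4, 5, 7)


def lfsr48_step(state: int):
    """Advance one clock; return (new_state, output_bit)."""
    feedback = 0
    for tap in TAPS:
        feedback ^= (state >> tap) & 1
    output_bit = state & 1
    new_state = (state >> 1) | (feedback << 47)
    return new_state, output_bit


def lfsr48_bits(seed: int, count: int):
    """Run the LFSR for `count` clocks from a 48-bit seed; return (state, bits)."""
    state = seed & MASK48
    if state == 0:
        raise ValueError("an all-zero seed locks up an XOR-feedback LFSR")
    bits = []
    for _ in range(count):
        state, bit = lfsr48_step(state)
        bits.append(bit)
    return state, bits


if __name__ == "__main__":
    final_state, stream = lfsr48_bits(seed=0x1234_5678_9ABC, count=256)
    print(hex(final_state), sum(stream))
```

The all-zero check matters in practice: an XOR-feedback register seeded with zero never leaves the zero state, so a seed path must exclude it.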
The RNG2 module further processes the LFSR output data using SM4 algorithm to make the random nature of the output random number more excellent.
Random number generation flow
The generation flow of the random number required by the cryptographic algorithm is as follows:
1) The four random bit streams generated by the four independent physical noise sources undergo serial-to-parallel conversion, XOR combination, and related processing, yielding a 48-bit parallel random number noise source seed.
2) The hardware XORs the collected 48-bit random number noise source seed with the 48-bit random number seed input by software (the software seed is used only the first time; afterwards the intermediate result of the SM4 operation is XORed with the noise source seed instead), feeds the result to the LFSR, and within 256 system clock cycles generates the 128-bit plaintext and 128-bit KEY required for the SM4 operation;
3) After receiving the plaintext and KEY, the RNG2 module starts the operation automatically and produces a 128-bit result after 32 system clock cycles. It then XORs the upper 64 bits with the lower 64 bits, outputs a 64-bit vector, and asserts a DONE signal to tell software to read the random number;
4) After software reads the 64-bit random number, the hardware automatically starts generating the next 64-bit random number;
5) Software applies a folding XOR to the 64-bit random number generated by the hardware and finally outputs a 32-bit random number for the algorithm to use;
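The two folding steps in 3) and 5) reduce a 128-bit result to 64 bits and then to 32 bits. A minimal sketch, using an arbitrary 128-bit value in place of the SM4 output (the function name is hypothetical):

```python
def fold_xor(value: int, width: int) -> int:
    """XOR the upper half of a `width`-bit value with its lower half."""
    half = width // 2
    return (value >> half) ^ (value & ((1 << half) - 1))


if __name__ == "__main__":
    sm4_result = 0x0123456789ABCDEF_FEDCBA9876543210  # placeholder for a 128-bit SM4 output
    hw_random64 = fold_xor(sm4_result, 128)   # step 3: done by hardware
    sw_random32 = fold_xor(hw_random64, 64)   # step 5: done by software
    print(hex(hw_random64), hex(sw_random32))
```

Each fold halves the width, so only a quarter of the SM4 output width is exposed to the consumer of the random number.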
III. Data processing flow
The data processing flow supports the following normal mode and Pipeline mode.
Normal mode:
step 1, the Quad SPI interface inputs a data stream and stores it in RAM_A;
step 2, the CPU parses the data in RAM_A;
step 3, if the data is parsed as COS data, the CPU configures the RAM_Matrix path, configures the DMA, and writes the data in RAM_A into Flash through the AHB bus;
step 4, if the data is parsed as symmetric algorithm data or hash data, the CPU configures the RAM_Matrix path, configures the corresponding algorithm unit and starts the algorithm; after the algorithm module finishes processing, the data is stored in RAM_A;
step 5, if the data is parsed as asymmetric algorithm data, the CPU configures the RAM_Matrix path, configures the DMA, and writes the data in RAM_A into the RAM corresponding to the PKU through the AHB bus; once the data is ready, the CPU configures and starts the PKU algorithm module; after the algorithm module finishes processing, the CPU configures the DMA and stores the data in RAM_A through the AHB bus;
step 6, the CPU frames the data in RAM_A and outputs it through the Quad SPI interface;
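The branching in steps 3 to 5 can be pictured as a dispatch on the parsed data type. The sketch below only lists the actions for each branch as strings; the enum, the function name and the step labels are hypothetical and do not model real registers or firmware.

```python
from enum import Enum
from typing import List


class DataKind(Enum):
    COS = "cos"
    SYMMETRIC = "symmetric"
    HASH = "hash"
    ASYMMETRIC = "asymmetric"


def normal_mode_steps(kind: DataKind) -> List[str]:
    """Return the ordered actions of the normal mode for one parsed data type."""
    steps = [
        "Quad SPI: store input in RAM_A",  # step 1
        "CPU: parse data in RAM_A",        # step 2
    ]
    if kind is DataKind.COS:  # step 3
        steps += [
            "CPU: configure RAM_Matrix path",
            "CPU: configure DMA",
            "DMA: RAM_A -> Flash over AHB",
        ]
    elif kind in (DataKind.SYMMETRIC, DataKind.HASH):  # step 4
        steps += [
            "CPU: configure RAM_Matrix path",
            "CPU: configure and start algorithm unit",
            "Algorithm unit: store result in RAM_A",
        ]
    elif kind is DataKind.ASYMMETRIC:  # step 5
        steps += [
            "CPU: configure RAM_Matrix path",
            "CPU: configure DMA",
            "DMA: RAM_A -> PKU RAM over AHB",
            "CPU: configure and start PKU",
            "DMA: PKU result -> RAM_A over AHB",
        ]
    else:
        raise ValueError(f"unknown data kind: {kind!r}")
    steps.append("CPU: frame RAM_A data and output over Quad SPI")  # step 6
    return steps
```

Note that only the asymmetric branch moves data over the AHB bus in both directions, which is the extra copying the Pipeline mode avoids for symmetric data.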
Pipeline mode:
Pipeline mode is mainly intended for symmetrically encrypted data: the security algorithm processing is carried out while the interface transmits data, which greatly reduces waiting time and improves the data processing capability of the chip. Referring to fig. 3, the flow is as follows:
step 1, at time T+0, the Quad SPI interface inputs a data stream X1 and stores it in RAM_A;
step 2, at time T+1, the Quad SPI interface inputs a data stream X2, while the algorithm unit (SM1 or SM4) processes the data stream X1 and stores the processed data in RAM_A;
step 3, at time T+2, the Quad SPI interface reads the data stream X1 from RAM_A and outputs it to the upper computer, while the algorithm unit (SM1 or SM4) processes the data stream X2 and stores the processed data in RAM_B;
step 4, at time T+3, the Quad SPI interface reads the data stream X2 from RAM_B and outputs it to the upper computer;
Table 2: Pipeline mode data processing [Figure BDA0002810246680000211]
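One Pipeline cycle (T+0 to T+3) can be simulated with two buffers, as sketched below. This is an illustration under stated assumptions: the function names are hypothetical, `toy_transform` is a byte-wise inversion standing in for SM1 or SM4 (it is not a cipher), and X2 is assumed to be written to RAM_B at T+1, consistent with claim 1, where the second data stream is stored in the second RAM.

```python
from typing import Callable, List, Tuple


def run_pipeline_cycle(
    x1: bytes, x2: bytes, process: Callable[[bytes], bytes]
) -> Tuple[List[bytes], List[str]]:
    """Simulate one four-slot Pipeline cycle; return (data sent to upper computer, activity log)."""
    output: List[bytes] = []
    log: List[str] = []

    # T+0: the interface fills RAM_A; the algorithm unit is idle.
    ram_a = x1
    log.append("T+0: interface writes X1 to RAM_A")

    # T+1: the interface fills RAM_B while the algorithm unit works in place on RAM_A.
    ram_b = x2
    ram_a = process(ram_a)
    log.append("T+1: interface writes X2 to RAM_B | algorithm unit processes X1 in RAM_A")

    # T+2: the interface drains RAM_A while the algorithm unit works in place on RAM_B.
    output.append(ram_a)
    ram_b = process(ram_b)
    log.append("T+2: interface outputs X1 from RAM_A | algorithm unit processes X2 in RAM_B")

    # T+3: the interface drains RAM_B.
    output.append(ram_b)
    log.append("T+3: interface outputs X2 from RAM_B")
    return output, log


def toy_transform(block: bytes) -> bytes:
    """Placeholder for the SM1/SM4 operation: inverts every byte."""
    return bytes(b ^ 0xFF for b in block)
```

In slots T+1 and T+2 the interface and the algorithm unit each touch a different buffer, which is why the two activities can overlap.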
Fourth, innovative technology
The innovations of the invention are as follows:
(1) It is the first security chip in China to adopt a high-speed four-wire SPI interface; the transmission clock reaches up to 100 MHz and the interface data transmission rate reaches 50 MB/s;
(2) High-speed data stream encryption: optimized SM1 and SM4 symmetric encryption algorithm units, a high-speed four-wire SPI interface and a double-BUFFER (RAM_A and RAM_B) cache are adopted, which facilitates Pipeline data transmission; the algorithm units access data directly from the BUFFERs without going through a bus, so the processing rate for encrypted and decrypted data can reach 40 MB/s;
(3) SM2 signing runs at 307 times/second @58 MHz and verification at 177 times/second @58 MHz (measured speed), which is higher than comparable domestic products;
(4) Four noise source generators are built into the chip; based on chaos theory, they use two different circuit architectures, a switch current structure and a switch voltage structure, and have excellent independence. A physical random source is combined with digital post-processing to generate high-quality random numbers that are uniformly distributed, sequence-independent and long-period, and that pass the cryptographic detection tests for security chips;
(5) QFN32 package with a small 5x5 mm^2 footprint, saving PCB space;
(6) Low cost and low power consumption.
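The 50 MB/s interface figure is consistent with a simple peak-rate calculation, sketched below. The assumptions are not stated in the text: one bit is transferred per data line per clock edge pair (single data rate), command and address overhead is ignored, and 1 MB is taken as 10^6 bytes. The function name is hypothetical.

```python
def quad_spi_peak_rate_mb_per_s(clock_hz: int, data_lines: int = 4) -> float:
    """Peak payload rate in MB/s for an SPI link moving one bit per data line per clock."""
    bits_per_second = clock_hz * data_lines
    return bits_per_second / 8 / 1_000_000


# 100 MHz x 4 lines = 400 Mbit/s = 50 MB/s
peak = quad_spi_peak_rate_mb_per_s(100_000_000)
```

Against this peak, the stated 40 MB/s encryption and decryption rate would correspond to 80% of the raw interface bandwidth.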
In yet another embodiment of the present invention, an embedded system is disclosed. Referring to fig. 13, the embedded system includes the secure chip device described above, an SD card, an SD card controller (SDC for short), and a Nandflash memory. The SD card is used for storing data streams, wherein the data streams comprise a first data stream and a second data stream; an SD interface (SD I/F) is used for communication between the SD card and external equipment. The SD card controller is used for transmitting the data stream to be encrypted to the security chip device through the four-wire SPI interface so as to encrypt it, or for storing the decrypted data stream from the security chip device to the SD card through the four-wire SPI interface. The secure chip device and the SD card controller are arranged in the SD card. The Nandflash memory is arranged in the SD card and comprises a secure area, a hidden area and a secure area, wherein the SD card controller is further configured to store the encrypted data stream to the Nandflash memory through an NF I/F interface (a Nandflash interface), or to transmit the encrypted data stream to be decrypted to the secure chip device through the four-wire SPI interface so as to decrypt it. The security chip device is typically applied in an embedded system, where it can be used directly as a security module to perform functions such as identity authentication, encryption and decryption.
Compared with the prior art, the invention can realize at least one of the following beneficial effects:
1. The first data stream is processed by the algorithm unit and stored in the first RAM while the second data stream is being stored in the second RAM, and the second data stream is processed by the algorithm unit and stored in the second RAM while the processed first data stream is being output to the upper computer. More data can therefore be processed within the same time T and the utilization of the algorithm unit is improved, which improves working efficiency.
2. Data streams are input to or output from the security chip device through the four-wire SPI interface, so that the interface data transmission rate reaches 50 MB/s.
3. Instead of going through a bus, the algorithm unit obtains data directly from the first RAM or the second RAM through the RAM switching matrix, and stores the processed data in the first RAM or the second RAM in the same way, so that the processing rate for encrypted and decrypted data can reach 40 MB/s.
4. Four noise source generators are built into the security chip device; they use two different circuit architectures, a switch current structure and a switch voltage structure, and have excellent independence. A physical random source is combined with digital post-processing to generate high-quality random numbers that are uniformly distributed, sequence-independent and long-period.
5. The security chip device uses a QFN32 package with a chip area of 5x5 mm^2, which is small and saves PCB space.
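The efficiency gain in effect 1 can be estimated by counting time slots. The sketch below assumes, purely for illustration, that input, processing and output of one data stream each take one equal-length slot, and that a cycle consists of two data streams as in the Pipeline flow. The function names are hypothetical.

```python
from fractions import Fraction


def slots_sequential(cycles: int) -> int:
    """Input, process and output run one after another for each of the two streams."""
    return 6 * cycles


def slots_pipelined(cycles: int) -> int:
    """Slots T+0..T+3 per cycle: processing overlaps with interface transfers."""
    return 4 * cycles


def algorithm_unit_utilization(total_slots: int, cycles: int) -> Fraction:
    """Fraction of slots in which the algorithm unit is busy (two busy slots per cycle)."""
    return Fraction(2 * cycles, total_slots)


speedup = Fraction(slots_sequential(1), slots_pipelined(1))
```

Under these assumptions a cycle shrinks from six slots to four, and the algorithm unit is busy half of the time instead of a third of the time.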
Those skilled in the art will appreciate that all or part of the flow of the methods in the above embodiments may be implemented by a computer program instructing related hardware, the program being stored in a computer readable storage medium. The computer readable storage medium may be a magnetic disk, an optical disk, a read-only memory or a random access memory.
The above description covers only preferred embodiments of the present invention, but the scope of protection of the present invention is not limited thereto; any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention.

Claims (12)

1. A data processing method, comprising:
step S1: storing a first data stream in a first RAM;
step S2: storing a second data stream in a second RAM while processing the first data stream by an algorithm unit and storing the processed first data stream in the first RAM;
step S3: outputting the processed first data stream to an upper computer while processing the second data stream by the algorithm unit and storing the processed second data stream in the second RAM; and
step S4: outputting the processed second data stream to the upper computer.
2. The data processing method of claim 1, wherein storing the second data stream in the second RAM while processing the first data stream by the algorithm unit and storing the processed first data stream in the first RAM further comprises:
inputting the second data stream via a four-wire SPI interface and storing the second data stream in the second RAM through a RAM switching matrix; and
meanwhile, the algorithm unit acquiring the first data stream from the first RAM through the RAM switching matrix, performing algorithm processing on the first data stream, and storing the processed first data stream in the first RAM through the RAM switching matrix.
3. The data processing method of claim 1, wherein outputting the processed first data stream to the upper computer while processing the second data stream by the algorithm unit and storing the processed second data stream in the second RAM further comprises:
reading the processed first data stream from the first RAM through a RAM switching matrix, and outputting the processed first data stream to the upper computer through a four-wire SPI interface; and
meanwhile, the algorithm unit acquiring the second data stream from the second RAM through the RAM switching matrix, performing algorithm processing on the second data stream, and storing the processed second data stream in the second RAM through the RAM switching matrix.
4. The data processing method of claim 2, wherein processing the first data stream by the algorithm unit and storing the processed first data stream in the first RAM further comprises:
a CPU sending a key, a working mode and an algorithm processing start instruction to an SFR interface module through an SFR bus;
an algorithm core of the algorithm unit performing algorithm processing on the first data stream based on the key, the working mode and the algorithm processing start instruction;
after the algorithm processing is finished, transmitting an algorithm completion instruction from the SFR interface module to the CPU through the SFR bus;
the CPU generating a first RAM write instruction and configuring the RAM switching matrix according to the algorithm completion instruction; and
storing the processed first data stream in the first RAM through the RAM switching matrix according to the first RAM write instruction.
5. The data processing method of claim 3, wherein processing the second data stream by the algorithm unit and storing the processed second data stream in the second RAM further comprises:
a CPU sending a key, a working mode and an algorithm processing start instruction to an SFR interface module through an SFR bus;
an algorithm core of the algorithm unit performing algorithm processing on the second data stream based on the key, the working mode and the algorithm processing start instruction;
after the algorithm processing is finished, transmitting an algorithm completion instruction from the SFR interface module to the CPU through the SFR bus;
the CPU generating a second RAM write instruction and configuring the RAM switching matrix according to the algorithm completion instruction; and
storing the processed second data stream in the second RAM through the RAM switching matrix according to the second RAM write instruction.
6. The data processing method of claim 1, wherein
the data stream to be processed is divided into a plurality of cycles, each cycle comprising the first data stream and the second data stream, and after outputting the processed second data stream to the upper computer, the method further comprises:
acquiring the first data stream of the next cycle and executing step S1.
7. A secure chip device, comprising a first RAM, a second RAM, and an algorithm unit, wherein:
the first RAM is used for storing a first data stream and outputting the first data stream processed by the algorithm unit to an upper computer;
the second RAM is used for storing a second data stream and outputting the second data stream processed by the algorithm unit to the upper computer;
the algorithm unit is used for processing the first data stream and storing the processed first data stream in the first RAM, and for processing the second data stream and storing the processed second data stream in the second RAM, wherein the upper computer is located outside the security chip device.
8. The secure chip apparatus according to claim 7, further comprising a RAM switching matrix connected to an AHB bus, wherein the first RAM and the second RAM are connected to the algorithm unit through the RAM switching matrix, and the RAM switching matrix is used for:
storing the first data stream and the second data stream in the first RAM and the second RAM, respectively;
enabling the algorithm unit to acquire the first data stream and the second data stream, respectively, through the RAM switching matrix; and
storing the processed first data stream and the processed second data stream in the first RAM and the second RAM, respectively.
9. The secure chip apparatus according to claim 8, further comprising:
a CPU, connected to a hash algorithm module and the algorithm unit through the AHB bus;
a memory management module EMMU, connected to the CPU through the AHB bus and directly connected to the RAM switching matrix; and
a ROM, a FLASH and an SRAM, to which the memory management module EMMU is directly connected.
10. An embedded system, comprising: the secure chip apparatus of any one of claims 7 to 9.
11. The embedded system of claim 10, further comprising:
an SD card for storing data streams, wherein the data streams comprise the first data stream and the second data stream; and
an SD card controller for transmitting the data stream to be encrypted to the secure chip device through the four-wire SPI interface so as to encrypt it, or for storing the decrypted data stream from the secure chip device to the SD card through the four-wire SPI interface,
wherein the secure chip device and the SD card controller are arranged in the SD card.
12. The embedded system of claim 11, further comprising: a Nandflash memory disposed in the SD card, the Nandflash memory including a secure area, a hidden area, and a secure area, wherein,
the SD card controller is further used for storing the encrypted data stream to the Nandflash memory through an NF I/F interface, or for transmitting the encrypted data stream to be decrypted to the security chip device through the four-wire SPI interface so as to decrypt it.
CN202011387866.4A 2020-12-01 2020-12-01 Data processing method, security chip device and embedded system Pending CN112417522A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011387866.4A CN112417522A (en) 2020-12-01 2020-12-01 Data processing method, security chip device and embedded system

Publications (1)

Publication Number Publication Date
CN112417522A true CN112417522A (en) 2021-02-26

Family

ID=74829519

Country Status (1)

Country Link
CN (1) CN112417522A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination