CN113672946A - Data encryption and decryption component, related device and method

Info

Publication number: CN113672946A
Application number: CN202110798608.3A
Authority: CN (China)
Prior art keywords: execution unit, stage, data, unit, key stream
Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Inventor: 韦健 (Wei Jian)
Current assignee: Hangzhou C Sky Microsystems Co Ltd (the listed assignees may be inaccurate)
Original assignee: Pingtouge Shanghai Semiconductor Co Ltd
Application filed by Pingtouge Shanghai Semiconductor Co Ltd
Priority to CN202110798608.3A
Publication of CN113672946A

Classifications

    • G06F 21/602: Providing cryptographic facilities or services (G Physics; G06 Computing, calculating or counting; G06F Electric digital data processing; G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity; G06F 21/60 Protecting data)
    • G06F 21/64: Protecting data integrity, e.g. using checksums, certificates or signatures (G Physics; G06 Computing, calculating or counting; G06F Electric digital data processing; G06F 21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity; G06F 21/60 Protecting data)

Abstract

The present disclosure provides a data encryption and decryption component, together with a related device and method. The data encryption and decryption component comprises: a key stream generation unit for generating a key stream based on a physical address, the key stream generation unit comprising multiple stages of sequentially connected pipeline execution units, wherein the first-stage pipeline execution unit generates an output to the next-stage pipeline execution unit based on the physical address and the stage key corresponding to the first-stage pipeline execution unit; each subsequent pipeline execution unit generates an output to the next-stage pipeline execution unit based on the output of the previous-stage pipeline execution unit and the stage key corresponding to that pipeline execution unit, and the last-stage pipeline execution unit outputs the generated key stream; and an encryption and decryption unit for encrypting plaintext data to be encrypted, or decrypting ciphertext data to be decrypted, using the key stream. Beyond the initial latency, the disclosed embodiments add no extra encryption and decryption latency between a parallel-interface external memory and the computing device.

Description

Data encryption and decryption component, related device and method
Technical Field
The present disclosure relates to the field of chips, and in particular, to a data encryption and decryption component, a related apparatus, and a related method.
Background
To increase storage space and reduce product cost, more and more systems-on-chip (SoC) use an external memory to store data such as program code. To keep that data secure, the data stored in the external memory must be encrypted: plaintext data is encrypted with an encryption key before target data is written from the internal memory to the external memory, and ciphertext data is decrypted with a decryption key before target data is read from the external memory into the internal memory. The external memory may have a serial or a parallel interface. A parallel-interface external memory imposes stricter requirements on encryption and decryption latency and data throughput than a serial-interface one. How to read and write a parallel-interface external memory without adding extra encryption and decryption latency, while guaranteeing packet data throughput, has therefore become a difficult problem.
Disclosure of Invention
In view of the above, an object of the present disclosure is to reduce the extra encryption and decryption latency of reading and writing a parallel-interface external memory while guaranteeing packet data throughput.
According to an aspect of the present disclosure, there is provided a data encryption and decryption component for performing an encryption and decryption operation on data written to an external memory, including:
a key stream generation unit that generates a key stream based on a physical address, the physical address being the address at which target data is stored in the external memory, the key stream generation unit comprising multiple stages of sequentially connected pipeline execution units, wherein the first-stage pipeline execution unit generates an output to the next-stage pipeline execution unit based on the physical address and the stage key corresponding to the first-stage pipeline execution unit; each subsequent pipeline execution unit generates an output to the next-stage pipeline execution unit based on the output of the previous-stage pipeline execution unit and the stage key corresponding to that pipeline execution unit; the last-stage pipeline execution unit outputs the generated key stream; and the multiple stages of pipeline execution units each occupy one clock cycle, executing in sequence;
and the encryption and decryption unit is used for encrypting plaintext data to be encrypted or decrypting ciphertext data to be decrypted by using the key stream.
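The multi-stage pipelined keystream generation described above can be sketched in software. The following model is illustrative only: the stage mixing function, the 32-bit width, and the key values are hypothetical stand-ins, since the disclosure does not fix the stages to any particular cipher round.

```python
# Software sketch of multi-stage pipelined keystream generation.
# stage_function, the 32-bit width, and STAGE_KEYS are hypothetical.

def stage_function(value: int, stage_key: int) -> int:
    """One pipeline stage: mix the incoming value with this stage's key.
    A toy round (XOR plus a 32-bit rotate); real hardware would use a
    cryptographic round function."""
    mixed = (value ^ stage_key) & 0xFFFFFFFF
    return ((mixed << 5) | (mixed >> 27)) & 0xFFFFFFFF

def generate_keystream(physical_address: int, stage_keys: list) -> int:
    """The first stage consumes the physical address; each subsequent stage
    consumes the previous stage's output; the last stage emits the keystream."""
    value = physical_address & 0xFFFFFFFF
    for key in stage_keys:
        value = stage_function(value, key)
    return value

STAGE_KEYS = [0xA5A5A5A5, 0x3C3C3C3C, 0x0F0F0F0F]  # hypothetical per-stage keys

# The same physical address always yields the same keystream, so data
# encrypted on write can be decrypted on read without storing the keystream.
ks = generate_keystream(0x80001000, STAGE_KEYS)
plaintext = 0xDEADBEEF
ciphertext = plaintext ^ ks
assert ciphertext ^ generate_keystream(0x80001000, STAGE_KEYS) == plaintext
```

Because the keystream depends only on the physical address and the fixed stage keys, a later read of the same address reproduces the keystream used by the earlier write, which is what keeps encryption and decryption synchronized without storing keystreams.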
Optionally, the key stream generation unit includes:
a plurality of pipeline execution unit sequences, each sequence consisting of one or more of the multiple stages of sequentially connected pipeline execution units;
a parallel conversion unit that divides the physical address into a plurality of physical address fragments, each physical address fragment corresponding to one pipeline execution unit sequence,
wherein, in each pipeline execution unit sequence,
the first-stage pipeline execution unit has a first input end for receiving the corresponding physical address fragment, a second input end for receiving the stage key corresponding to the first-stage pipeline execution unit, and an output end for generating an output to the next-stage pipeline execution unit;
each subsequent pipeline execution unit has a first input end receiving the output of the previous-stage pipeline execution unit, a second input end receiving the stage key corresponding to that pipeline execution unit, and an output end producing an output to the next-stage pipeline execution unit; and
the output end of the last-stage pipeline execution unit outputs the key stream fragment generated by that sequence, and the key stream fragments generated by the individual sequences are concatenated to obtain the generated key stream.
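The fragment-parallel arrangement can be sketched as follows. The 8-bit fragment width, the per-fragment round, and the stage keys are hypothetical choices for illustration; the point is only that each fragment drives its own sequence and the fragment keystreams are concatenated.

```python
# Sketch of the fragment-parallel keystream arrangement. FRAG_BITS,
# SEQ_KEYS, and the per-fragment round are hypothetical.

FRAG_BITS = 8                 # hypothetical fragment width
SEQ_KEYS = [0x5A, 0xC3]       # hypothetical stage keys, shared by all sequences

def split_address(addr: int, n_fragments: int) -> list:
    """Parallel conversion step: divide the address into fragments, low bits first."""
    mask = (1 << FRAG_BITS) - 1
    return [(addr >> (i * FRAG_BITS)) & mask for i in range(n_fragments)]

def fragment_pipeline(fragment: int, stage_keys: list) -> int:
    """One pipeline execution unit sequence operating on a single address fragment."""
    mask = (1 << FRAG_BITS) - 1
    value = fragment
    for key in stage_keys:
        value = (value ^ key) & mask
        value = ((value << 3) | (value >> (FRAG_BITS - 3))) & mask
    return value

def concatenated_keystream(addr: int, n_fragments: int) -> int:
    """Concatenate each sequence's keystream fragment into the full keystream."""
    ks = 0
    for i, frag in enumerate(split_address(addr, n_fragments)):
        ks |= fragment_pipeline(frag, SEQ_KEYS) << (i * FRAG_BITS)
    return ks
```

Because the sequences operate on independent fragments, they can run side by side in the same clock cycles, which is how the component widens the keystream without lengthening the pipeline.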
Optionally, each pipeline execution unit in a pipeline execution unit sequence further has a third input end for receiving a random number assigned to that pipeline execution unit, and the pipeline execution unit generates the output at its output end from the inputs at the first, second, and third input ends.
Optionally, the parallel conversion unit has a strobe signal receiving end for receiving a strobe signal. The strobe signal includes a plurality of strobe bits, each indicating the operating state of a corresponding one of the plurality of pipeline execution unit sequences:
when a strobe bit is valid, the corresponding pipeline execution unit sequence operates, and the corresponding physical address fragment is fed into that sequence;
when a strobe bit is invalid, the corresponding pipeline execution unit sequence stops operating.
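Strobe gating can be modeled minimally as below. The boolean strobe list and the use of None for an idle sequence's output are illustrative conventions, not the hardware encoding.

```python
# Sketch of strobe gating: one strobe bit per pipeline execution unit
# sequence. Booleans and None are illustrative conventions only.

def gated_dispatch(fragments, strobe):
    """Forward each address fragment to its sequence only when the matching
    strobe bit is valid; an invalid bit idles the sequence (modeled as None),
    e.g. when a narrower keystream suffices for the current access."""
    return [frag if enabled else None
            for frag, enabled in zip(fragments, strobe)]

# Two fragments, but only the first sequence is strobed on:
assert gated_dispatch([0x34, 0x12], [True, False]) == [0x34, None]
```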
Optionally, the encryption and decryption unit includes:
a first delay unit for delaying the plaintext data to be encrypted by N clock cycles, where N is the number of pipeline execution units contained in a pipeline execution unit sequence;
and a first XOR unit for XORing the delayed plaintext data to be encrypted with the key stream to obtain ciphertext data.
Optionally, the encryption and decryption unit further includes: a second delay unit for delaying the physical address by N clock cycles before issuing it;
and the data encryption and decryption component further includes: an off-chip memory controller for writing the ciphertext data obtained by the first XOR unit to the physical address issued by the second delay unit after the delay.
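The alignment performed by the delay units can be illustrated with a cycle-level software model. Everything here is a sketch: DelayLine models an N-entry shift register, toy_keystream stands in for the pipeline's actual logic (only its N-cycle latency is modeled), and N = 3 is an assumed depth.

```python
# Cycle-level sketch of the write path: plaintext and physical address are
# delayed N cycles so they meet the keystream leaving the N-stage pipeline,
# and the first XOR unit forms the ciphertext. N and toy_keystream are
# hypothetical stand-ins.

from collections import deque

N = 3  # hypothetical pipeline depth

class DelayLine:
    """An N-entry shift register: each step pushes one value and pops the
    value inserted N cycles earlier (None while still filling)."""
    def __init__(self, depth: int):
        self.slots = deque([None] * depth)

    def step(self, value):
        self.slots.append(value)
        return self.slots.popleft()

def toy_keystream(addr: int) -> int:
    return (addr * 2654435761) & 0xFFFFFFFF  # stand-in for the real pipeline

def encrypt_stream(writes):
    """writes: list of (addr, plaintext). Returns (addr, ciphertext) pairs in
    the order they leave the component, after the initial N-cycle latency."""
    addr_delay, data_delay = DelayLine(N), DelayLine(N)
    pipeline = DelayLine(N)  # models the keystream latency, not its logic
    out = []
    for addr, pt in writes + [(None, None)] * N:  # N idle cycles to flush
        ks = pipeline.step(None if addr is None else toy_keystream(addr))
        d_addr = addr_delay.step(addr)
        d_pt = data_delay.step(pt)
        if d_addr is not None:
            out.append((d_addr, d_pt ^ ks))
    return out
```

After the initial N-cycle fill, one (address, ciphertext) pair leaves the model per cycle, matching the claim that only the initial latency is added to the write path.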
Optionally, the encryption and decryption unit includes:
a third delay unit for delaying the ciphertext data to be decrypted by N clock cycles, where N is the number of pipeline execution units contained in a pipeline execution unit sequence;
and a third XOR unit for XORing the delayed ciphertext data to be decrypted with the key stream to obtain plaintext data.
Optionally, the data encryption and decryption component further includes:
the comparator is used for comparing the width of the plaintext data to be encrypted or the ciphertext data to be decrypted with the width of the key stream;
and the width adjuster is used for adjusting the width of the key stream according to the width comparison result.
Optionally, the width adjuster comprises:
a truncator for truncating the key stream according to a first rule when the width of the key stream is greater than the width of the plaintext data to be encrypted or the ciphertext data to be decrypted, so that the width of the key stream equals the width of that data;
and a padder for padding the key stream according to a second rule when the width of the key stream is smaller than the width of the plaintext data to be encrypted or the ciphertext data to be decrypted, so that the width of the key stream equals the width of that data.
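The width adjuster can be sketched as below. The disclosure leaves the "first rule" and "second rule" unspecified; here truncation keeps the low-order bits and padding repeats the keystream, purely as illustrative choices.

```python
# Sketch of the width adjuster. The truncation rule (keep low bits) and the
# padding rule (repeat the keystream) are hypothetical, since the patent
# text does not specify the first and second rules.

def adjust_width(keystream: int, ks_width: int, data_width: int) -> int:
    if ks_width > data_width:
        # Truncator: cut the keystream down to the data width
        # (first rule, hypothetically: keep the least-significant bits).
        return keystream & ((1 << data_width) - 1)
    if ks_width < data_width:
        # Padder: extend the keystream up to the data width
        # (second rule, hypothetically: repeat the keystream).
        padded, filled = 0, 0
        while filled < data_width:
            padded |= keystream << filled
            filled += ks_width
        return padded & ((1 << data_width) - 1)
    return keystream  # widths already match
```

Either way, the adjusted keystream matches the data width exactly, so the XOR units always operate on operands of equal width.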
Optionally, the data encryption and decryption component further includes: a stage key synthesis unit corresponding to each pipeline execution unit, which synthesizes the stage key based on an initial key and the identity of that pipeline execution unit.
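Stage key synthesis can be sketched as below. The derivation function (SHA-256 here) and the 32-bit key width are hypothetical; the text only requires that the synthesis depend on both the initial key and the unit's identity.

```python
# Sketch of stage key synthesis from an initial key and a pipeline execution
# unit's identity. SHA-256 and the 32-bit width are hypothetical choices.

import hashlib

def synthesize_stage_key(initial_key: bytes, unit_id: int) -> int:
    """Derive a per-stage key by hashing the initial key with the unit id."""
    digest = hashlib.sha256(initial_key + unit_id.to_bytes(4, "little")).digest()
    return int.from_bytes(digest[:4], "little")

# One distinct stage key per pipeline execution unit, all from one initial key:
stage_keys = [synthesize_stage_key(b"initial-key", i) for i in range(3)]
assert len(set(stage_keys)) == 3  # distinct with overwhelming probability
```

This way only the single initial key needs to be provisioned; each stage's key is reproduced on demand from the unit's identity.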
According to an aspect of the present disclosure, there is provided a system on a chip, including: a data encryption and decryption component as described above; a processing unit; an on-chip memory.
According to an aspect of the present disclosure, there is provided a system on a chip, including: a data encryption and decryption component as described above; a processing unit having an on-chip memory.
According to an aspect of the present disclosure, there is provided a computing device comprising the data encryption and decryption component as described above.
According to an aspect of the present disclosure, there is provided a data encryption and decryption method for performing an encryption and decryption operation on data written to an external memory, including:
generating, by multiple stages of sequentially connected pipeline execution units, a key stream based on a physical address, the physical address being the address at which target data is stored in the external memory, wherein the first-stage pipeline execution unit generates an output to the next-stage pipeline execution unit based on the physical address and the stage key corresponding to the first-stage pipeline execution unit; each subsequent pipeline execution unit generates an output to the next-stage pipeline execution unit based on the output of the previous-stage pipeline execution unit and the stage key corresponding to that pipeline execution unit; the last-stage pipeline execution unit outputs the generated key stream; and the multiple stages of pipeline execution units each occupy one clock cycle, executing in sequence;
and encrypting plaintext data to be encrypted or decrypting ciphertext data to be decrypted by using the key stream.
In the disclosed embodiments, the key stream does not need to be stored in advance at a specific storage location. When encryption or decryption is needed, the key stream generation unit generates the key stream on the fly from the physical address to be read or written. Because that physical address is invariant, the key stream used for reading matches the one used for writing: encryption and decryption stay synchronized, storage space is saved, and the latency of searching a large store of pre-generated key streams is avoided. In addition, key stream generation is divided among multiple stages of pipeline execution units: the first-stage pipeline execution unit generates its output to the next stage from the physical address and its corresponding stage key, each later pipeline execution unit generates its output from the previous stage's output and its own stage key, and each pipeline execution unit occupies one clock cycle. The generation of one key stream is thus spread across successive clock cycles, while the generation of other key streams occupies the same clock cycles at different stages. In other words, one clock cycle can simultaneously execute one stage of several different key stream generations, so the same clock cycles are multiplexed across key streams. Compared with generating one key stream fully before starting the next, this greatly reduces the extra encryption and decryption latency in off-chip memory data access scenarios and guarantees packet data throughput.
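The latency argument above reduces to simple cycle arithmetic: an N-stage pipeline overlaps keystream generation for successive requests, so M keystreams cost N + M - 1 cycles instead of N * M, leaving only the initial N-cycle latency.

```python
# Back-of-envelope model of the pipelining benefit described above.

def sequential_cycles(n_stages: int, n_requests: int) -> int:
    """Generate one keystream fully before starting the next."""
    return n_stages * n_requests

def pipelined_cycles(n_stages: int, n_requests: int) -> int:
    """Overlap stages: after the pipeline fills, one keystream
    completes per cycle."""
    return n_stages + n_requests - 1

# With a 4-stage pipeline and a 64-beat burst, pipelining sustains one
# keystream per cycle after the first 4 cycles:
assert sequential_cycles(4, 64) == 256
assert pipelined_cycles(4, 64) == 67
```

The 4-stage depth and 64-beat burst are arbitrary illustration values; the formulas hold for any depth and request count.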
Drawings
The foregoing and other objects, features, and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which refers to the accompanying drawings in which:
FIG. 1 is a block diagram of a data center to which one embodiment of the present disclosure is applied;
FIG. 2 is an internal block diagram of a computing device according to one embodiment of the present disclosure;
FIG. 3 is an internal block diagram of a computing device according to another embodiment of the present disclosure;
FIG. 4 is an internal block diagram of a data encryption and decryption component according to one embodiment of the present disclosure;
FIG. 5 is an internal block diagram of a data encryption and decryption component according to another embodiment of the present disclosure;
FIG. 6 is an internal structural diagram of a key stream generation unit according to one embodiment of the present disclosure;
FIG. 7 is a diagram of a pipeline execution unit and its round operation according to one embodiment of the present disclosure;
FIG. 8 is a timing diagram of signals according to one embodiment of the present disclosure;
FIG. 9 is a flowchart of a data encryption and decryption method according to one embodiment of the present disclosure.
Detailed Description
The present disclosure is described below based on examples, but it is not limited to these examples. In the following detailed description, some specific details are set forth; it will be apparent to those skilled in the art that the present disclosure may be practiced without them. Well-known methods, procedures, and components have not been described in detail so as not to obscure the present disclosure. The figures are not necessarily drawn to scale.
The following terms are used herein.
A computing device: the device with computing or processing capability may be embodied in the form of a terminal, such as an internet of things device, a mobile terminal, a desktop computer, a laptop computer, etc., or may be embodied as a server or a cluster of servers. In the context of a data center, the computing devices are servers in the data center. In the context of the internet of things, the computing device is an internet of things terminal in the internet of things.
The system on chip: a single chip that packages processing-capable units and peripheral circuitry into a complete integrated system, and that may be installed in or removed from a computing device.
A processor: a unit within a computing device or system on a chip for executing instructions of a conventional computing process (non-neural network processing, non-graphics processing, etc.).
A scheduling unit: a unit in a computing device or system on chip that, in addition to performing conventional processing (processing other than complex operations such as image processing and the operations of various deep learning models), also assumes the function of scheduling the acceleration unit: it allocates to the acceleration unit the tasks that the acceleration unit needs to undertake. The scheduling unit may take various forms, such as a central processing unit (CPU), an application-specific integrated circuit (ASIC), or a field-programmable gate array (FPGA).
An acceleration unit: a processing unit designed to increase data processing speed in certain special-purpose fields, in response to the fact that conventional processing units are inefficient in those fields (e.g., image processing, or the various operations of a deep learning model). The acceleration unit, also known as an artificial intelligence (AI) processing unit, includes a central processing unit (CPU), a graphics processing unit (GPU), a general-purpose graphics processing unit (GPGPU), a field-programmable gate array (FPGA), an application-specific integrated circuit (ASIC), and special-purpose intelligent acceleration hardware (e.g., a neural network processor, NPU). The embodiments of the present disclosure are applicable to NPU scenarios; however, because they adopt a general compilation interface, the acceleration unit hardware may also be implemented with a CPU, GPU, GPGPU, and the like.
An internal memory: a physical structure within a computing device or system on a chip for storing information. Depending on the application, the memory available to a computing device or system on a chip may be divided into a main memory (i.e., an internal memory) and a secondary memory (i.e., an external memory of embodiments of the present disclosure). The internal memory is used for storing instruction information and/or data information represented by data signals, for example, for storing data provided by the processor, and also for realizing information exchange between the processor and the external memory. Information provided by the external memory needs to be brought into the internal memory to be accessed by the processor.
An external memory: i.e., the secondary memory, is located off the computing device or system-on-chip. The external memory stores the ciphertext data for security. Thus, encryption is required when data is stored in the external memory, and decryption is required when ciphertext data is recalled to the internal memory.
Data encryption and decryption component: located within the computing device or system-on-chip, for decrypting the ciphertext data when the information provided by the external memory is read into the computing device or system-on-chip, and encrypting the plaintext data when the plaintext data provided by the computing device or system-on-chip is written into the external memory, as described above.
Application environment of the present disclosure
The embodiments of the present disclosure provide a data encryption and decryption scheme for an external memory. The scheme is relatively universal and can be used in various hardware devices that access an external memory, such as a data center, an AI (artificial intelligence) acceleration unit, a GPU (graphics processing unit), an IoT (Internet of Things) device capable of executing a deep learning model, an embedded device, and the like. The scheme is independent of the hardware on which the data encryption and decryption component is finally deployed. For exemplary purposes, however, the following description mainly takes a data center as the application scenario. Those skilled in the art will appreciate that the disclosed embodiments are also applicable to other application scenarios.
Data center
A data center is a globally collaborative network of devices used to communicate, accelerate, present, compute, and store data information over an internet network infrastructure. In future development, data centers will become an asset that enterprises compete over. With the popularization of data center applications, the data security of data centers receives more and more attention.
In a conventional large data center, the network structure is generally as shown in fig. 1, i.e., a hierarchical internetworking model. This model contains the following parts:
the server 140: each server 140 is a processing and storage entity of a data center in which the processing and storage of large amounts of data is performed by the servers 140.
The access switch 130: the access switch 130 is a switch used to connect servers 140 to the data center; one access switch 130 connects multiple servers 140. The access switches 130 are typically located at the top of the rack, so they are also called Top-of-Rack (ToR) switches; they physically connect the servers.
Aggregation switch 120: each aggregation switch 120 connects multiple access switches 130 while providing other services such as firewalls, intrusion detection, network analysis, and the like.
The core switch 110: core switches 110 provide high-speed forwarding of packets to and from the data center and connectivity for aggregation switches 120. The entire data center network is divided into an L3 layer routing network and an L2 layer routing network, and the core switch 110 provides a flexible L3 layer routing network for the entire data center network.
Typically, the aggregation switch 120 is the demarcation point between the L2 and L3 routing networks, with L2 below the aggregation switch 120 and L3 above it. Each group of aggregation switches manages a point of delivery (POD), and each POD is a separate VLAN network. Server migration within a POD does not require modifying IP addresses or default gateways, because one POD corresponds to one L2 broadcast domain.
A Spanning Tree Protocol (STP) is typically used between aggregation switch 120 and access switch 130. Under STP, only one aggregation switch 120 is available for a VLAN network; the other aggregation switches 120 are used only in the event of a failure (dashed lines in FIG. 1). That is, at the aggregation switch level there is no horizontal scaling, since only one switch is working even if multiple aggregation switches 120 are added.
Computing device
Since the server 140 is the real processing device of the data center, fig. 2 shows a block diagram of the server 140 (computing device 141). Computing device 141 includes memory 24, processor 22, and data encryption and decryption component 300. The external memory 350 is a memory outside the computing device 141, and the external memory 350 stores ciphertext data for the purpose of improving security. In some embodiments, each processor 22 may include one or more processor cores 220 for processing instructions. As an example, processor cores 1 to m are shown in fig. 2, m being a natural number other than 0. In some embodiments, as shown in FIG. 2, the processor 22 may include a cache memory 28 (a triple-level cache of L1, L2, and L3 is shown in FIG. 2) for storing some instructions or data common to the processor cores 220.
In some embodiments, processor 22 includes a register file 226, which may include a plurality of registers for storing different types of data and/or instructions. For example, register file 226 may include: integer registers, floating-point registers, status registers, instruction registers, pointer registers, and the like.
Processor 22 may include a Memory Management Unit (MMU) 222 for implementing virtual to physical address translations.
The processor 22 is used to execute instruction sequences (i.e., programs). The process by which processor 22 executes each instruction includes fetching the instruction from the memory that stores instructions, decoding the fetched instruction, executing the decoded instruction, and saving the execution result, repeating until all instructions in the sequence have been executed or a halt instruction is encountered. To implement this process, processor 22 may include an instruction fetch unit 224, an instruction decode unit 225, an instruction execution unit 221, and so on. Instruction fetch unit 224 fetches instructions from memory 24 and/or external memory 350 and receives or computes the next fetch address according to a fetch algorithm. Instruction decode unit 225 decodes the fetched instruction according to a predetermined instruction format to obtain the operand fetch information required by the instruction. Instruction execution unit 221 executes the instruction using the operands; it may be an arithmetic unit (e.g., including an arithmetic logic unit, a vector arithmetic unit, etc.), a memory execution unit (e.g., for accessing memory according to an instruction to read data from the memory or write specified data to it), a coprocessor, and so on.
In some embodiments, data encryption and decryption component 300 shown in FIG. 2 includes bus interconnect 310, protocol parsing unit 320, keystream generation unit 360, encryption and decryption unit 370, bus interconnect 330, and off-chip memory controller 340. Note that only some of the major components of the data encryption and decryption component 300 are shown in fig. 2. The data encryption/decryption component 300 also includes other components such as a comparator 380, a width adjuster 390, a key synthesis unit 354, etc., which will be described in detail later in conjunction with fig. 4-6. In some embodiments, during the process of accessing the external memory 350 by the computing device 141, the data encryption and decryption component 300 is utilized to encrypt plaintext data to be encrypted, thereby writing encrypted ciphertext data to the external memory 350. In some embodiments, during the process of accessing the external memory 350 by the computing device 141, the ciphertext data to be decrypted is decrypted by the data encryption and decryption component 300, so that the decrypted plaintext data is written to the internal memory. Since the details of encrypting plaintext data to be encrypted by the data encryption/decryption module 300 and decrypting ciphertext data to be decrypted by the data encryption/decryption module 300 will be described below, further description is omitted here.
Fig. 2 shows the structure of the computing device in a scenario where the processor 22, during execution, needs to encrypt plaintext data into the external memory 350, or to decrypt ciphertext data stored in the external memory 350 into plaintext data and store it in the internal memory. By contrast, fig. 3 shows the computing device structure in a scenario where the acceleration unit 430, during execution, needs to encrypt plaintext data into the external memory 350, or to decrypt ciphertext data stored in the external memory 350 into plaintext data to be stored in the internal memory of the acceleration unit 430.
The computing device 141 of FIG. 3 includes a memory 410, a dispatch unit cluster 470 and an acceleration unit cluster 480 connected by a bus, and a data encryption and decryption component 300. The scheduling unit cluster 470 includes a plurality of scheduling units 420. The acceleration unit cluster 480 includes a plurality of acceleration units 430. The acceleration unit is a special processing unit designed to accelerate the operation processing speed of the neural network model in the embodiment of the present disclosure, and may be embodied as a processing unit specially designed for the neural network operation processing, a Graphics Processing Unit (GPU), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA), and the like. The scheduling unit 420 is a processing unit that schedules the acceleration units 430 and allocates to each acceleration unit 430 a sequence of instructions to be executed, and may take various forms such as a Central Processing Unit (CPU), an Application Specific Integrated Circuit (ASIC), and a Field Programmable Gate Array (FPGA).
The data encryption and decryption component 300 shown in fig. 3 corresponds to the data encryption and decryption component 300 shown in fig. 2. Since the details of encrypting plaintext data to be encrypted by the data encryption/decryption module 300 and decrypting ciphertext data to be decrypted by the data encryption/decryption module 300 will be described below, further description is omitted here.
It is understood that, in the embodiments of the present disclosure, the computing device is illustrated by taking a Central Processing Unit (CPU) and an acceleration unit (e.g., a neural network processor NPU) as an example, the embodiments of the present disclosure are not limited thereto, and the computing device of the embodiments of the present disclosure may further include other processing units, for example, a Graphics Processing Unit (GPU), a General Purpose Graphics Processing Unit (GPGPU), a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), and the like.
Data encryption and decryption component
Fig. 4 is an internal structural diagram of a data encryption and decryption component according to one embodiment of the present disclosure. The data encryption and decryption component shown in fig. 4 is configured to encrypt target data (plaintext data to be encrypted) and write the resulting ciphertext data into the external memory 350 (not shown). Referring to fig. 4, the data encryption and decryption component includes a bus interconnect 310, a protocol parsing unit 320, a key stream generation unit 360, an encryption and decryption unit 370, a bus interconnect 330, an off-chip memory controller 340, a comparator 380, and a width adjuster 390. The width adjuster 390 in turn comprises a truncator 391 and a padder 392.
In some embodiments, as shown in FIG. 4, bus interconnect 310 is a line used to interconnect processor 22 or acceleration unit 430 with data encryption and decryption component 300. In the embodiments of the present disclosure, the component receives through it a write access request for writing target data stored in the computing device (not shown in the figure) to the external memory 350. The target data may be stored in the memory 24 or cache 28 of the processor 22, in the memory 410, or in the internal memories of the scheduling unit 420 and acceleration unit 430. Bus interconnect 310 is, for example, an AXI bus. Its bit width may be 32 bits, 64 bits, 128 bits, 256 bits, 512 bits, or 1024 bits.
In some embodiments, the protocol parsing unit 320 is connected to the bus interconnect 310 and may be any processing unit capable of parsing different network communication protocols. Protocol parsing refers to the process of analyzing the content of a network packet in detail: starting from the specific regularity of the network communication protocol, it analyzes the data and structure of the packet, extracts deeper-level data, and provides accurate and detailed information for subsequent packet processing. In some embodiments, the protocol parsing unit 320 parses a write access request that writes target data to the external memory 350, thereby obtaining the physical address at which the target data is to be stored in the external memory 350.
The off-chip memory controller 340 is a part that controls writing of ciphertext data to the external memory 350, and reading of ciphertext data from the external memory 350. The bus interconnect 330 is a component that interconnects the data encryption/decryption component 300 with the external memory 350. As with bus interconnect 310 described above, it may also be an AXI bus.
The key stream generation unit 360, encryption/decryption unit 370, comparator 380, and width adjuster 390 are unique components of the disclosed embodiments. Implementation of the embodiments of the present disclosure is described below in conjunction with these components.
The key stream generation unit 360 is a means for generating a key stream based on the physical address. The physical address is the address at which target data is stored in the external memory 350. The target data is data to be written to the external memory 350 or data to be read from the external memory 350; accordingly, the physical address is the address at which target data is written into, or read from, the external memory 350. The encryption and decryption unit 370 is a means for encrypting plaintext data to be encrypted, or decrypting ciphertext data to be decrypted, using the key stream. In one embodiment, the plaintext data to be encrypted is encrypted by XORing it with the key stream to obtain ciphertext data to be stored at the physical address in the external memory 350; the ciphertext data to be decrypted is decrypted by XORing the ciphertext data read from the physical address of the external memory 350 with the key stream to obtain plaintext data to be stored in the internal memory.
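The XOR-based scheme described above can be sketched in a few lines. This is a minimal illustration, not the disclosure's hardware: the function name and the sample values are assumptions, and the key stream is shown as a given constant rather than being derived from the address. Because XOR is its own inverse, applying the same keystream twice recovers the plaintext.

```python
def xor_bytes(data: bytes, keystream: bytes) -> bytes:
    # XOR each data byte with the corresponding keystream byte
    assert len(data) == len(keystream)
    return bytes(d ^ k for d, k in zip(data, keystream))

plaintext = b"\x12\x34\x56\x78"
keystream = b"\xa5\x5a\xa5\x5a"   # in the disclosure, derived from the physical address

ciphertext = xor_bytes(plaintext, keystream)   # written to external memory
recovered = xor_bytes(ciphertext, keystream)   # read back and decrypted
assert recovered == plaintext                  # (p ^ k) ^ k == p
```

Note that encryption and decryption are the same operation, which is why a single encryption and decryption unit 370 can serve both directions.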
In the prior art, the key stream is not generated based on physical addresses; it is stored in advance at a specific memory location, such as a certain area of the internal memory. Since a large amount of data is to be encrypted and written to the external memory 350, a large amount of internal memory space may be required to store the corresponding key streams. In practice, one key stream may be reused for much of the data: for example, the key stream is the same for all data written to or read from a particular memory location in the external memory 350. Storing these key streams consumes considerable storage space, and searching a large storage space for the required key stream takes a long time. The embodiments of the present disclosure exploit the fact that the key stream required for a given physical address is always the same, and generate the key stream from the physical address itself. Thus, when the same physical address is read and written, the key streams used for encryption and decryption are consistent. The key stream is generated on the fly from the physical address each time, instead of being stored in advance at the cost of a large amount of storage space; this saves storage space while keeping encryption and decryption synchronized, and eliminates the latency of searching for the required key stream in a store holding a large number of key streams.
Generating the key stream from the physical address is not a single-step process: multiple rounds of operations are performed internally, where each round receives the result of the previous round as input and produces a result that serves as input to the next round. In the example of fig. 7, a physical address undergoes 10 rounds of operations to generate the final key stream. In the embodiments of the present disclosure, the round operations may be distributed over a plurality of clock cycles according to a principle of execution-time balance, with each clock cycle executed by one pipeline execution unit 363. The execution-time-balance principle requires that the total execution time of the round operations assigned to each clock cycle be approximately equal. For example, suppose the execution times of round operations 1 to 10 are 0.1s, 0.02s, 0.07s, 0.06s, 0.05s, 0.08s, 0.02s, 0.04s, 0.06s, and 0.11s, respectively, and there are 6 clock cycles corresponding to the 1st- to 6th-stage pipeline execution units. Round operation 1 is distributed to the 1st-stage pipeline execution unit, taking 0.1s; round operations 2-3 are distributed to the 2nd-stage pipeline execution unit, taking 0.02s + 0.07s = 0.09s; round operations 4-5 are distributed to the 3rd-stage pipeline execution unit, taking 0.06s + 0.05s = 0.11s; round operations 6-7 are distributed to the 4th-stage pipeline execution unit, taking 0.08s + 0.02s = 0.1s; round operations 8-9 are distributed to the 5th-stage pipeline execution unit, taking 0.04s + 0.06s = 0.1s; and round operation 10 is distributed to the 6th-stage pipeline execution unit, taking 0.11s. The execution time of each pipeline execution unit is then between 0.09s and 0.11s, satisfying the execution-time-balance principle. Each pipeline execution unit 363 occupies one clock cycle.
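The distribution above can be reproduced with a simple greedy packing rule: consecutive round operations are packed into a stage until adding the next round would exceed a time budget. This is a sketch under assumptions, not the disclosure's method: the greedy rule and the function names are illustrative, and times are in integer milliseconds to avoid floating-point comparison artifacts. The round times and the 6-stage result match the example in the text.

```python
# Round operation times from the example above, in milliseconds.
ROUND_TIMES_MS = [100, 20, 70, 60, 50, 80, 20, 40, 60, 110]

def pack_stages(times_ms, budget_ms):
    """Greedily pack consecutive rounds into stages under a time budget."""
    stages, current, total = [], [], 0
    for round_no, t in enumerate(times_ms, start=1):
        if current and total + t > budget_ms:
            stages.append(current)       # close the current pipeline stage
            current, total = [], 0
        current.append(round_no)
        total += t
    stages.append(current)
    return stages

stages = pack_stages(ROUND_TIMES_MS, budget_ms=110)
# Reproduces the distribution in the text: rounds 1 / 2-3 / 4-5 / 6-7 / 8-9 / 10
assert stages == [[1], [2, 3], [4, 5], [6, 7], [8, 9], [10]]
```

The budget of 110 ms corresponds to the slowest single round, which bounds the clock period: no balancing can make a stage faster than its longest indivisible round.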
The significance of dividing the process of generating a key stream from a physical address among multiple stages of pipeline execution units is that it increases the parallelism with which different key streams are generated from different physical addresses. If the whole process were performed by a single pipeline execution unit, one key stream would have to complete its multiple rounds of operations before the next key stream could begin, and encryption and decryption efficiency would be low. For a parallel-port external memory 350 in particular, such efficiency is unacceptable. As described above, one task of generating a key stream from a physical address may be executed by a plurality of pipeline execution units 363, each occupying one clock cycle, and another such task may likewise be executed by a plurality of pipeline execution units 363. In the same clock cycle, different key-stream-generation tasks can therefore be executed simultaneously by different pipeline execution units 363, greatly improving overall execution efficiency. For example, in clock cycle 4, the 2nd-stage pipeline execution unit of a first task that generates a first key stream from a first physical address and the 3rd-stage pipeline execution unit of a second task that generates a second key stream from a second physical address may execute concurrently.
Therefore, compared with generating one key stream only after another has completed, in a data access scenario of an off-chip memory, the extra encryption and decryption latency is greatly reduced, and the throughput of packet data is guaranteed.
The key stream generation unit 360 may include multiple stages of sequentially connected pipeline execution units 363. The first-stage pipeline execution unit 363 generates an output to the next-stage pipeline execution unit 363 based on the physical address and the stage key corresponding to the first-stage pipeline execution unit 363; each subsequent pipeline execution unit 363 generates an output to the next-stage pipeline execution unit based on the output of the previous-stage pipeline execution unit and the stage key corresponding to it; and the last-stage pipeline execution unit 363 outputs the generated key stream. In one specific example, the key stream generation unit 360 has n stages of sequentially connected pipeline execution units 363. The first-stage pipeline execution unit 363 XORs the physical address with the stage key 1 corresponding to it and outputs the XOR result to the second-stage pipeline execution unit 363; the second-stage pipeline execution unit 363 XORs the output of the first-stage pipeline execution unit 363 with the stage key 2 corresponding to it and outputs the XOR result to the third-stage pipeline execution unit 363; and so on, until the nth-stage pipeline execution unit 363 XORs the output of the (n-1)th-stage pipeline execution unit 363 with the stage key n corresponding to it, and that XOR result serves as the generated key stream.
The stage key is a key unique to the pipeline execution unit 363 of the corresponding stage. It may be synthesized from a preset initial key and the identifier of that stage's pipeline execution unit 363; in one embodiment, it is obtained by XORing the initial key with the identifier. The initial key is the same for all pipeline execution units 363, but pipeline execution units 363 of different stages have different identifiers. For example, the first-stage pipeline execution unit 363 has identifier 1, the second-stage pipeline execution unit 363 has identifier 2, and so on.
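The n-stage chain and the stage-key synthesis described above can be sketched as follows. This is an illustrative model only: the 32-bit width, the initial key value, and the function names are assumptions, not values from the disclosure, and the hardware would compute each stage in its own clock cycle rather than in a loop.

```python
N_STAGES = 6
WIDTH_MASK = (1 << 32) - 1      # assume 32-bit values throughout
INITIAL_KEY = 0x5A5A5A5A        # assumed constant, shared by all stages

def stage_key(stage_id: int) -> int:
    # Stage key = initial key XOR the stage's identifier (1, 2, ..., n)
    return (INITIAL_KEY ^ stage_id) & WIDTH_MASK

def generate_keystream(physical_address: int) -> int:
    # Each stage XORs its input with its stage key; the output of
    # stage i is the input of stage i+1, and stage n yields the keystream.
    value = physical_address & WIDTH_MASK
    for stage_id in range(1, N_STAGES + 1):
        value ^= stage_key(stage_id)
    return value

# The same address always yields the same keystream, so data encrypted
# on a write can be decrypted on a later read of the same address.
assert generate_keystream(0x1000) == generate_keystream(0x1000)
```

The determinism checked by the final assertion is exactly the property the disclosure relies on: no keystream needs to be stored, because it can always be regenerated from the address.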
In the above manner, the operations of the multi-stage pipeline execution units 363 are applied to the physical address, with the stage keys of the pipeline execution units 363 of all stages participating in the process; this increases the complexity of deriving the key stream and makes the encryption and decryption process highly secure.
In another embodiment, an array of pipeline execution units may be used, as shown in FIG. 6. The key stream generation unit 360 includes a plurality of pipeline execution unit sequences, each comprising the multiple stages of sequentially connected pipeline execution units 363. Each row of n-stage pipeline execution units 363 in fig. 6 is one pipeline execution unit sequence, comprising a 1st-stage pipeline execution unit 363, a 2nd-stage pipeline execution unit 363, and so on up to an nth-stage pipeline execution unit 363. The pipeline execution unit array of FIG. 6 has N rows, so there are N pipeline execution unit sequences. The key stream generation unit 360 further comprises a parallel port conversion component 361 for dividing the physical address into a plurality of physical address fragments (e.g., physical address fragment 1, physical address fragment 2, ..., physical address fragment N of fig. 6), each physical address fragment corresponding to one pipeline execution unit sequence. In each pipeline execution unit sequence, the first-stage pipeline execution unit 363 has a first input 365 receiving the corresponding physical address fragment, a second input 366 receiving the stage key corresponding to the first-stage pipeline execution unit 363, and an output 368 producing an output to the next-stage pipeline execution unit 363; each subsequent pipeline execution unit 363 has a first input 365 that receives the output of the preceding-stage pipeline execution unit 363, a second input 366 that receives the stage key corresponding to it, and an output 368 that produces an output to the next-stage pipeline execution unit 363.
The output terminal 368 of the last-stage pipeline execution unit 363 outputs the key stream fragment generated by that pipeline execution unit sequence, and the key stream fragments generated by the sequences are concatenated, in the order in which the physical address was divided into the corresponding physical address fragments, to obtain the generated key stream.
For example, for one row of FIG. 6, i.e., one pipeline execution unit sequence, the first-stage pipeline execution unit 363 has a first input 365 that receives the physical address fragment of the corresponding row and a second input 366 that receives the stage key 1 corresponding to the first-stage pipeline execution unit 363. The first-stage pipeline execution unit 363 XORs the received physical address fragment with stage key 1 to obtain its output, which is supplied from the output terminal 368 to the second-stage pipeline execution unit 363 as the input at its first input 365. The second input 366 of the second-stage pipeline execution unit 363 receives the stage key 2 corresponding to it; the second-stage pipeline execution unit 363 XORs the received output of the first-stage pipeline execution unit 363 with stage key 2 to obtain its output, which is supplied from the output terminal 368 as the input at the first input 365 of the third-stage pipeline execution unit 363, and so on. The first input 365 of the nth-stage pipeline execution unit 363 receives the output of the (n-1)th-stage pipeline execution unit 363, and its second input 366 receives the stage key n corresponding to it. The nth-stage pipeline execution unit 363 XORs the received output of the (n-1)th-stage pipeline execution unit 363 with stage key n and outputs the result from the output terminal 368 as the key stream fragment generated by this pipeline execution unit sequence. The pipeline execution unit sequence of the first row of FIG. 6 corresponds to physical address fragment 1 and produces key stream fragment 1; the sequence of the second row corresponds to physical address fragment 2 and produces key stream fragment 2; and so on, until the sequence of row N corresponds to physical address fragment N and produces key stream fragment N. The key stream fragments 1 to N are then concatenated in order to obtain the generated key stream.
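The fragment-split, per-row pipeline, and concatenation steps above can be modeled as follows. All concrete values are assumptions for illustration: the fragment width, row count, stage count, and initial key are invented, and the hardware would run the rows concurrently rather than in a loop.

```python
N_ROWS = 4          # N pipeline execution unit sequences (rows)
FRAG_BITS = 8       # assumed width of each physical address fragment
N_STAGES = 6
INITIAL_KEY = 0xA5  # assumed 8-bit initial key

def stage_key(stage_id):
    return (INITIAL_KEY ^ stage_id) & ((1 << FRAG_BITS) - 1)

def row_keystream_fragment(addr_fragment):
    # One pipeline execution unit sequence: chain of per-stage XORs
    value = addr_fragment
    for stage_id in range(1, N_STAGES + 1):
        value ^= stage_key(stage_id)
    return value

def generate_keystream(physical_address):
    mask = (1 << FRAG_BITS) - 1
    # Split the address into N fragments, fragment 1 = most significant bits
    fragments = [(physical_address >> (FRAG_BITS * (N_ROWS - 1 - i))) & mask
                 for i in range(N_ROWS)]
    # Concatenate the per-row keystream fragments in the original order
    out = 0
    for frag in fragments:
        out = (out << FRAG_BITS) | row_keystream_fragment(frag)
    return out
```

Because each row works on a narrow fragment, the per-stage logic is smaller than a full-width chain, which is the reduced-granularity benefit the text attributes to the array arrangement.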
Compared with an embodiment having only one row of pipeline execution units 363 (i.e., a single pipeline execution unit sequence), the pipeline execution unit array of fig. 6 makes the key stream generation process more complex and thus improves the security of encryption and decryption. At the same time, because each pipeline execution unit sequence generates a key stream fragment from a physical address fragment rather than from the whole physical address, the granularity of the operation executed by each pipeline execution unit is reduced, and the operations of different pipeline execution units 363 can be multiplexed more flexibly within the same clock cycle.
As shown in fig. 6, the data encryption and decryption component 300 may further include a stage key synthesis unit 364, corresponding to the pipeline execution unit 363, configured to synthesize the stage key from the initial key and the identifier of the pipeline execution unit. The pipeline execution units 363 of the same stage number in the different pipeline execution unit sequences may share the same key synthesis unit 364; that is, each stage corresponds to one key synthesis unit 364, which synthesizes one stage key. For example, in FIG. 6, the initial key may be XORed with the identifier of the first-stage pipeline execution unit 363 (pipeline execution units 363 of the same stage may share the same identifier) to generate stage key 1, and so on, up to XORing the initial key with the identifier of the nth-stage pipeline execution unit 363 to generate stage key n.
To further increase the complexity of the encryption and decryption process, and thus its security, a random number may be introduced into the generation of the key stream fragments. As shown in fig. 6, each pipeline execution unit 363 actually has a third input 367 in addition to the first input 365 and second input 366 described above; in the embodiments described so far, the input at the third input 367 was not used in deriving the key stream fragment. In this embodiment, the third input 367 receives a random number assigned to the pipeline execution unit 363, and the random number may differ from one pipeline execution unit 363 to another. The pipeline execution unit 363 generates the output at the output terminal 368 from the inputs at the first input 365, the second input 366, and the third input 367. In one embodiment, the pipeline execution unit 363 may XOR the three inputs together to obtain the output at the output terminal 368. Because a random number is introduced into the generation of the key stream fragments, the complexity of key stream generation is further increased, improving the security of encryption and decryption.
After the key stream generation unit 360 generates the key stream, in the case of encryption, the encryption and decryption unit 370 may encrypt the plaintext data to be encrypted, which is transferred from the protocol parsing unit 320, using the generated key stream. In one embodiment, the plaintext data to be encrypted may be XORed with the key stream. One prerequisite of the XOR operation, however, is that the width, i.e., the number of bits, of the plaintext data to be encrypted and of the key stream must be the same. To achieve this, the width of the key stream generated by the key stream generation unit 360 may be compared with the width of the plaintext data to be encrypted by a comparator 380, as shown in fig. 4. If the two are not equal, the width adjuster 390 adjusts the width of the key stream so that the adjusted width equals the width of the plaintext data to be encrypted. In one embodiment, the width adjuster 390 may include a truncator 391 and a padder 392. Assume that the width of the plaintext data to be encrypted is L1 and the width of the key stream is L2. If the width L2 of the key stream is greater than the width L1 of the plaintext data to be encrypted, the key stream is truncated by the truncator 391 according to a first rule so that the truncated key stream has width L1; the first rule may be, for example, to keep the first L1 bits or the last L1 bits of the L2 bits of the key stream. If the width L2 of the key stream is less than the width L1 of the plaintext data to be encrypted, the padder 392 pads the key stream according to a second rule so that its width equals L1; the second rule may be, for example, to prepend L1-L2 zero bits or L1-L2 one bits to the key stream. It should be understood that the comparator 380 and the width adjuster 390 are not required: for example, in some cases the key stream generation unit 360 ensures that the width of the generated key stream equals the width of the plaintext data to be encrypted.
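The comparator, truncator, and padder behavior above can be sketched as one function. The function name is an assumption; the particular choices shown (keep the first L1 bits, prepend zeros) are just the example rules named in the text, each of which also admits the alternative (last L1 bits, prepend ones).

```python
def adjust_keystream_width(keystream_bits: str, data_width: int) -> str:
    """Make the keystream width match the data width (bit-string model)."""
    l2 = len(keystream_bits)
    if l2 > data_width:
        # Truncator: keep the first L1 bits of the L2-bit keystream
        return keystream_bits[:data_width]
    if l2 < data_width:
        # Padder: prepend L1-L2 zero bits in front of the keystream
        return "0" * (data_width - l2) + keystream_bits
    return keystream_bits      # comparator found the widths already equal

assert adjust_keystream_width("10110110", 4) == "1011"      # truncated
assert adjust_keystream_width("1011", 8) == "00001011"      # padded
assert adjust_keystream_width("1011", 4) == "1011"          # unchanged
```

After adjustment the keystream and the data have equal widths, so the bitwise XOR in the encryption and decryption unit 370 is well defined.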
As shown in fig. 4, in one embodiment, the encryption and decryption unit 370 may include a first delay unit 371, a second delay unit 372, and a first exclusive-or unit 373. The first delay unit 371 delays the plaintext data to be encrypted by N clock cycles, where N is the number of pipeline execution units included in a pipeline execution unit sequence. The reason is that, since the sequence contains N pipeline execution units 363 and each occupies one clock cycle, N clock cycles elapse from the moment the key stream generation unit 360 receives the physical address until it outputs the key stream. When encrypting plaintext data with the generated key stream, the plaintext data must therefore be delayed by the same number of clock cycles so that the two are aligned. As shown in FIG. 8, the key stream generation unit 360 receives physical addresses A0-A5 during clock cycles 0-5, respectively. Because the 6-stage pipeline execution units 363 require 6 clock cycles to translate a physical address into its key stream, the 6 physical addresses become the corresponding key streams B0-B5 after 6 clock cycles. As can be seen from FIG. 8, the 6 plaintext data words corresponding to key streams B0-B5 still sit at clock cycle positions 0-5; they must be delayed by 6 clock cycles to undergo the corresponding operation with key streams B0-B5, such as the XOR operation of the first exclusive-or unit 373.
The first exclusive-or unit 373 is configured to XOR the delayed plaintext data to be encrypted with the key stream to obtain ciphertext data, which the off-chip memory controller 340 must write to the corresponding physical address. However, when the first exclusive-or unit 373 performs the XOR, both the plaintext data and the key stream have been delayed by N clock cycles, so the resulting ciphertext data is also delayed by N clock cycles, whereas the physical address output from the protocol parsing unit 320 is not delayed. To write to the correct address of the off-chip memory 350, the physical address is delayed by N clock cycles by the second delay unit 372 before being passed to the off-chip memory controller 340 over the bus interconnect 330. The off-chip memory controller 340 then writes the ciphertext data produced by the first exclusive-or unit 373 to the delayed physical address sent by the second delay unit 372.
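The delay alignment above can be modeled with two N-deep delay lines, one for the plaintext (first delay unit 371) and one for the address (second delay unit 372). This is a toy cycle model under assumptions: all names and values are invented, and the keystream is supplied as a ready value each cycle rather than computed by a real 6-stage pipeline.

```python
from collections import deque

N = 6  # pipeline depth = number of stages in a pipeline execution unit sequence

plain_delay = deque([None] * N, maxlen=N)   # first delay unit 371
addr_delay = deque([None] * N, maxlen=N)    # second delay unit 372

def clock_tick(plaintext, address, keystream_out):
    """One clock cycle: shift both delay lines; XOR once aligned data emerges."""
    delayed_plain = plain_delay[0]          # oldest entry, about to emerge
    delayed_addr = addr_delay[0]
    plain_delay.append(plaintext)           # maxlen deque drops the oldest
    addr_delay.append(address)
    if delayed_plain is None:
        return None                         # pipeline still filling
    return delayed_addr, delayed_plain ^ keystream_out

# Feed plaintext i at address 0x100+i each cycle; pretend the keystream is 0xAB.
outputs = [clock_tick(cyc, 0x100 + cyc, 0xAB) for cyc in range(8)]
assert outputs[6] == (0x100, 0x00 ^ 0xAB)   # cycle-0 word emerges at cycle 6
```

The first 6 cycles yield nothing (the initial wait noted later in the text); thereafter one aligned (address, ciphertext) pair emerges per cycle with no extra latency.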
The above discussion covers the case of encryption. In the case of decryption, after the key stream generation unit 360 generates the key stream, the encryption and decryption unit 370 may decrypt the ciphertext data to be decrypted, which is transferred from the protocol parsing unit 320, using the generated key stream. In one embodiment, the ciphertext data to be decrypted may be XORed with the key stream. As with encryption, the width, i.e., the number of bits, of the ciphertext data to be decrypted and of the key stream must be the same. To achieve this, the width of the key stream generated by the key stream generation unit 360 may be compared with the width of the ciphertext data to be decrypted by a comparator 380, as shown in fig. 5. If the two are not equal, the width adjuster 390 adjusts the width of the key stream so that the adjusted width equals the width of the ciphertext data to be decrypted. In one embodiment, the width adjuster 390 may include a truncator 391 and a padder 392. Assume that the width of the ciphertext data to be decrypted is L1 and the width of the key stream is L2. If the width L2 of the key stream is greater than the width L1 of the ciphertext data to be decrypted, the key stream is truncated by the truncator 391 according to a first rule so that the truncated key stream has width L1; the first rule may be, for example, to keep the first L1 bits or the last L1 bits of the L2 bits of the key stream. If the width L2 of the key stream is less than the width L1 of the ciphertext data to be decrypted, the padder 392 pads the key stream according to a second rule so that its width equals L1; the second rule may be, for example, to prepend L1-L2 zero bits or L1-L2 one bits to the key stream. It should be understood that the comparator 380 and the width adjuster 390 are not required: for example, in some cases the key stream generation unit 360 ensures that the width of the generated key stream equals the width of the ciphertext data to be decrypted.
As shown in fig. 5, in one embodiment, the encryption and decryption unit 370 may include a third delay unit 374 and a third exclusive-or unit 375. The third delay unit 374 delays the ciphertext data to be decrypted by N clock cycles, where N is the number of pipeline execution units included in a pipeline execution unit sequence. The reason for delaying the ciphertext data to be decrypted by N clock cycles is the same as the reason for delaying the plaintext data to be encrypted by N clock cycles, and is therefore not repeated. The third exclusive-or unit 375 XORs the delayed ciphertext data to be decrypted with the key stream to obtain plaintext data, which is written back to the internal memory through the protocol parsing unit 320, the bus interconnect 310, and so on.
As shown in fig. 4 to 5, the protocol parsing unit 320 may output a reset signal, a bypass signal, a strobe signal, a pause signal, etc., in addition to outputting the physical address and the plaintext data to the key stream generation unit 360.
The reset signal is a signal that triggers the operation of the key stream generation unit 360. When the reset signal is inactive, for example 0, as shown in fig. 8, the key stream generation unit 360 does not operate and generates no key stream, so no subsequent encryption or decryption is performed. When the key stream generation unit 360 is reset, such as at the rising edge of the reset signal in fig. 8, it generates a key stream for subsequent encryption and decryption.
The pause signal is a signal indicating that the next clock cycle is to be deferred, stretching the current clock cycle. At the falling edge of the pause signal in fig. 8, for example, key stream B0 has been generated upon entering the 6th clock cycle; that clock cycle is then extended, deferring the 7th clock cycle. The 7th clock cycle does not arrive until a rising edge of the pause signal occurs.
The strobe signal is a signal indicating whether each pipeline execution unit sequence operates. The parallel port conversion component 361 has a strobe signal receiving terminal for receiving the strobe signal. The strobe signal contains strobe bits, one per pipeline execution unit sequence. When a strobe bit is valid, the corresponding pipeline execution unit sequence operates, and the corresponding physical address fragment into which the physical address is divided is fed into it; otherwise, the corresponding pipeline execution unit sequence stops working, and the corresponding physical address fragment is no longer fed into it. For example, assume the pipeline execution unit array of FIG. 6 has 6 rows, i.e., pipeline execution unit sequences 1-6, and that the strobe signal is 011110: the 2nd to 5th pipeline execution unit sequences are all gated and generate key stream fragments, while the 1st and 6th sequences are not gated and generate none. When the key stream fragments are concatenated into the key stream, either only the fragments generated by the 2nd to 5th sequences are concatenated, or the fragments of the 1st and 6th sequences are represented by all-0 or all-1 values and concatenated with the fragments generated by the 2nd to 5th sequences. By setting the strobe bits, this embodiment allows the effective part of the physical address to be designated flexibly.
The bypass signal is a signal indicating whether the pipeline execution unit 363 of a particular stage in each pipeline execution unit sequence operates. The parallel port conversion component 361 has a bypass signal receiving terminal for receiving the bypass signal. The bypass signal contains stage valid bits indicating whether the pipeline execution unit 363 of each stage in the respective sequences operates. When a stage valid bit is valid, the pipeline execution unit 363 of the corresponding stage in each sequence operates; otherwise, that stage's pipeline execution unit 363 does not operate, and the output of the previous stage's pipeline execution unit 363 is used directly as the input of the next stage's pipeline execution unit 363. For example, assume each pipeline execution unit sequence in FIG. 6 has 8 stages, i.e., the 1st-stage to 8th-stage pipeline execution units 363, and that the stage valid signal is 11011111, indicating that all stages except the 3rd-stage pipeline execution unit 363 are valid. The output of the 2nd-stage pipeline execution unit 363 may then be connected directly to the input of the 4th-stage pipeline execution unit 363, enabling flexible control of the number of sub-processes involved in generating the key stream from the physical address.
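The strobe and bypass controls described above can be combined in one sketch of a single row. The bit ordering, key values, and per-stage XOR operation are assumptions for illustration: a cleared strobe bit disables the whole row, and a cleared stage valid bit skips that stage, feeding the previous stage's output straight through.

```python
N_STAGES = 8
# Assumed 8-bit stage keys: initial key XOR stage identifier, as earlier
STAGE_KEYS = [(0xA5 ^ s) & 0xFF for s in range(1, N_STAGES + 1)]

def row_output(addr_fragment, strobe_bit, bypass_bits):
    """One pipeline execution unit sequence under strobe/bypass control.

    bypass_bits is a string such as "11011111"; bit i controls stage i+1.
    """
    if strobe_bit == "0":
        return None                 # row not gated: produces no fragment
    value = addr_fragment
    for stage, key in enumerate(STAGE_KEYS):
        if bypass_bits[stage] == "1":
            value ^= key            # stage valid: the stage operates
        # else: stage bypassed, previous output passes through unchanged
    return value

assert row_output(0x12, "0", "11111111") is None        # whole row disabled
assert row_output(0x12, "1", "11011111") != row_output(0x12, "1", "11111111")
```

Skipping a stage changes the resulting fragment, which is how the bypass signal trades security margin (fewer sub-processes) for latency and power.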
Data encryption and decryption method
Fig. 9 is a flowchart of a data encryption and decryption method provided by an embodiment of the present disclosure. As shown in the figure, the method comprises the following steps.
In step 801, a key stream is generated based on a physical address by multiple stages of sequentially connected pipeline execution units 363, wherein the physical address is the address at which target data is stored in the external memory 350. The first-stage pipeline execution unit 363 generates an output to the next-stage pipeline execution unit 363 based on the physical address and the stage key corresponding to the first-stage pipeline execution unit 363; each subsequent pipeline execution unit 363 generates an output to the next-stage pipeline execution unit 363 based on the output of the previous-stage pipeline execution unit 363 and the stage key corresponding to it; the last-stage pipeline execution unit 363 outputs the generated key stream; and the multiple stages of pipeline execution units 363 each occupy one clock cycle in sequence.
In step 802, the key stream is utilized to encrypt plaintext data to be encrypted or decrypt ciphertext data to be decrypted.
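Step 802 relies on the stream-cipher property that the same XOR operation performs both encryption and decryption. A minimal Python sketch of this symmetry (hypothetical names; the keystream here is a fixed illustrative value, not one generated from a physical address as in step 801):

```python
def xor_with_keystream(data: bytes, keystream: bytes) -> bytes:
    """Step 802 as software: XOR-ing with the keystream encrypts plaintext,
    and XOR-ing the resulting ciphertext with the same keystream decrypts it."""
    assert len(keystream) >= len(data)
    return bytes(d ^ k for d, k in zip(data, keystream))

plaintext = b"secret data"
keystream = bytes([0x5A] * len(plaintext))  # illustrative fixed keystream
ciphertext = xor_with_keystream(plaintext, keystream)   # encryption
recovered = xor_with_keystream(ciphertext, keystream)   # decryption
```

Because XOR is its own inverse, the same encryption and decryption unit hardware can serve both the write path (plaintext to ciphertext) and the read path (ciphertext to plaintext).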
Since the implementation details of the above process have been introduced in the description of the foregoing device embodiments, they are not repeated here.
Commercial value of the disclosed embodiments
The parallel port external memory data encryption and decryption component provided by the embodiments of the present disclosure allocates the process of generating a key stream based on a target address to a multi-stage pipelined execution unit, in which each stage occupies one clock cycle and executes in sequence. After an initial waiting time equal to the number of pipeline stages in clock cycles, the encryption and decryption unit uses the key stream to encrypt plaintext data to be encrypted or decrypt ciphertext data to be decrypted, obtaining encrypted ciphertext data to be written to the external memory or decrypted plaintext data to be written to the internal memory. Apart from this initial waiting time, no additional encryption and decryption delay is introduced, so the power consumption of the data encryption and decryption component can be effectively reduced while the data throughput of the parallel port external memory is maintained, and data security is improved. In this scenario, reducing the power consumption of the data encryption and decryption component reduces the power consumption of the computing device accessing the external memory, which in turn reduces the operating cost of the entire data center. The embodiments of the present disclosure reduce encryption and decryption delay by 80% and improve packet data throughput by 30%, and thus have good commercial and economic value.
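The latency argument above can be checked with simple cycle arithmetic. Under the assumption that a full pipeline accepts one new data word per clock cycle, the following sketch (hypothetical function names, not from the disclosure) compares pipelined and unpipelined cycle counts:

```python
def pipelined_cycles(num_stages: int, num_words: int) -> int:
    """Cycles for a pipeline that accepts one new word per clock: the first
    result appears after num_stages cycles, then one result per cycle."""
    return num_stages + num_words - 1

def unpipelined_cycles(num_stages: int, num_words: int) -> int:
    """Cycles without pipelining: each word occupies the whole unit for
    num_stages cycles before the next word can enter."""
    return num_stages * num_words
```

For example, with 8 pipeline stages and 1000 data words, the pipelined design needs 1007 cycles versus 8000 without pipelining: only the 8-cycle initial wait is added, after which encryption and decryption keep pace with the memory data stream.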
As will be appreciated by one skilled in the art, the present disclosure may be embodied as systems, methods and computer program products. Accordingly, the present disclosure may be embodied in the form of entirely hardware, entirely software (including firmware, resident software, micro-code), or in the form of a combination of software and hardware. Furthermore, in some embodiments, the present disclosure may also be embodied in the form of a computer program product in one or more computer-readable media having computer-readable program code embodied therein.
Any combination of one or more computer-readable media may be employed. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium is, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium include: an electrical connection for the particular wire or wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical memory, a magnetic memory, or any suitable combination of the foregoing. In this context, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with a processing unit, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., and any suitable combination of the foregoing.
Computer program code for carrying out embodiments of the present disclosure may be written in any combination of one or more programming languages. The programming languages include object-oriented programming languages such as Java and C++, and may also include conventional procedural programming languages such as C. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (14)

1. A data encryption and decryption component for performing encryption and decryption operations on data written to an external memory, comprising:
a key stream generation unit that generates a key stream based on a physical address that is an address at which target data is stored in the external memory, the key stream generation unit including pipeline execution units that are sequentially connected in multiple stages, wherein a first-stage pipeline execution unit generates an output to a next-stage pipeline execution unit based on the physical address and a stage key corresponding to the first-stage pipeline execution unit; the subsequent pipeline execution unit generates output to the next-stage pipeline execution unit based on the output of the previous-stage pipeline execution unit and a stage key corresponding to the pipeline execution unit, the last-stage pipeline execution unit outputs the generated key stream, and the multistage pipeline execution units respectively occupy one clock cycle to execute in sequence;
and the encryption and decryption unit is used for encrypting plaintext data to be encrypted or decrypting ciphertext data to be decrypted by using the key stream.
2. The data encryption and decryption component of claim 1, wherein the keystream generation unit comprises:
a plurality of sequences of pipelined execution units, each of the sequences of pipelined execution units being comprised of one or more of the multiple stages of sequentially connected pipelined execution units;
a parallel port translation component to divide the physical address into a plurality of physical address fragments, each physical address fragment corresponding to one of the sequence of pipelined execution units,
wherein, in each of the sequences of pipelined execution units,
the first stage pipelined execution unit having a first input to receive the corresponding physical address fragment, a second input to receive a stage key corresponding to the first stage pipelined execution unit, and an output to generate an output to a next stage pipelined execution unit;
the subsequent pipelined execution unit having a first input receiving an output of the previous stage pipelined execution unit, a second input receiving a stage key corresponding to the pipelined execution unit, and an output producing an output to the next stage pipelined execution unit,
the output end of the last stage of the pipeline execution unit outputs the key stream segment generated by the sequence of the pipeline execution unit,
and after the key stream segments generated by each pipeline execution unit sequence are cascaded, the generated key stream is obtained.
3. The data encryption and decryption component of claim 2, wherein each pipelined execution unit of the sequence of pipelined execution units further has a third input to receive a random number assigned to that pipelined execution unit, the pipelined execution units producing the output of the output based on inputs to the first, second and third inputs.
4. The data encryption and decryption component of claim 2, wherein the parallel port conversion component has a strobe signal receiving end for receiving a strobe signal, the strobe signal including a plurality of strobe bits, each of the plurality of strobe bits for indicating an operating state of a respective one of the plurality of sequences of pipelined execution units,
wherein, when the strobe bit is valid, indicating that the corresponding pipelined execution unit sequence is operating, the corresponding physical address fragment is fed into that pipelined execution unit sequence;
otherwise, when the strobe bit is invalid, the corresponding pipelined execution unit sequence stops operating.
5. The data encryption and decryption component of claim 2, wherein the encryption and decryption unit comprises:
the first delay unit is used for delaying plaintext data to be encrypted for N clock cycles, wherein N is the number of the pipeline execution units contained in the pipeline execution unit sequence;
and the first XOR unit is used for XOR-ing the delayed plaintext data to be encrypted and the key stream to obtain ciphertext data.
6. The data encryption and decryption component of claim 5, wherein the encryption and decryption unit further comprises: the second delay unit is used for delaying the physical address by N clock cycles to send out the physical address;
the data encryption and decryption section further includes: and the off-chip storage controller is used for writing the ciphertext data obtained by the first exclusive-or unit into the physical address sent out after the second delay unit delays the time.
7. The data encryption and decryption component of claim 2, wherein the encryption and decryption unit comprises:
the third delay unit is used for delaying the ciphertext data to be decrypted by N clock cycles, wherein N is the number of the pipeline execution units contained in the pipeline execution unit sequence;
and the third XOR unit is used for XOR-ing the ciphertext data to be decrypted after the time delay and the key stream to obtain plaintext data.
8. The data encryption and decryption component of claim 1, further comprising:
the comparator is used for comparing the width of the plaintext data to be encrypted or the ciphertext data to be decrypted with the width of the key stream;
and the width adjuster is used for adjusting the width of the key stream according to the width comparison result.
9. The data encryption and decryption component of claim 8, wherein the width adjuster comprises:
an interceptor, configured to intercept the key stream according to a first rule if the width of the key stream is greater than the width of the plaintext data to be encrypted or the ciphertext data to be decrypted, so that the width of the key stream is equal to the width of the plaintext data to be encrypted or the ciphertext data to be decrypted;
and the bit complementing device is used for complementing bits for the key stream according to a second rule if the width of the key stream is smaller than that of the plaintext data to be encrypted or the ciphertext data to be decrypted, so that the width of the key stream is equal to that of the plaintext data to be encrypted or the ciphertext data to be decrypted.
10. The data encryption and decryption component of claim 1, further comprising:
a stage key synthesis unit corresponding to the pipelined execution unit to synthesize the stage key based on the initial key and the identity of the pipelined execution unit.
11. A system on a chip, comprising:
a data encryption and decryption component according to any one of claims 1 to 10;
a processing unit;
an on-chip memory.
12. A system on a chip, comprising:
a data encryption and decryption component according to any one of claims 1 to 10;
a processing unit having an on-chip memory.
13. A computing device comprising a data encryption and decryption component according to any one of claims 1 to 10.
14. A data encryption and decryption method for performing encryption and decryption operations on data written to an external memory, comprising:
generating, by a plurality of stages of sequentially connected pipelined execution units, a key stream based on a physical address, wherein the physical address is an address at which target data is stored in the external memory, wherein a first-stage pipelined execution unit generates an output to a next-stage pipelined execution unit based on the physical address and a stage key corresponding to the first-stage pipelined execution unit; the subsequent pipeline execution unit generates output to the next-stage pipeline execution unit based on the output of the previous-stage pipeline execution unit and a stage key corresponding to the pipeline execution unit, the last-stage pipeline execution unit outputs the generated key stream, and the multistage pipeline execution units respectively occupy one clock cycle to execute in sequence;
and encrypting plaintext data to be encrypted or decrypting ciphertext data to be decrypted by using the key stream.
CN202110798608.3A 2021-07-15 2021-07-15 Data encryption and decryption component, related device and method Pending CN113672946A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110798608.3A CN113672946A (en) 2021-07-15 2021-07-15 Data encryption and decryption component, related device and method


Publications (1)

Publication Number Publication Date
CN113672946A true CN113672946A (en) 2021-11-19

Family

ID=78539344

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110798608.3A Pending CN113672946A (en) 2021-07-15 2021-07-15 Data encryption and decryption component, related device and method

Country Status (1)

Country Link
CN (1) CN113672946A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1993922A (en) * 2004-07-30 2007-07-04 英特尔公司 Stream cipher combining system and method
CN101764685A (en) * 2009-10-26 2010-06-30 广州杰赛科技股份有限公司 Encrypting and deciphering system for realizing SMS4 algorithm
JP2010268149A (en) * 2009-05-13 2010-11-25 Mitsubishi Electric Corp Decoder, decoding method, and program
CN103427981A (en) * 2012-05-15 2013-12-04 北京华虹集成电路设计有限责任公司 Encryption and decryption achieving method and device
CN109656840A (en) * 2018-12-21 2019-04-19 成都海光集成电路设计有限公司 A kind of device of data encrypting and deciphering, method, storage medium and data-storage system
CN112149151A (en) * 2019-06-29 2020-12-29 英特尔公司 Cryptographic compute engine for memory load and store units of a microarchitectural pipeline


Similar Documents

Publication Publication Date Title
EP3757854B1 (en) Microprocessor pipeline circuitry to support cryptographic computing
CN107851170B (en) Supporting configurable security levels for memory address ranges
EP3958160A1 (en) Encoded inline capabilities
US20190116023A1 (en) Power side-channel attack resistant advanced encryption standard accelerator processor
KR101370223B1 (en) Low latency block cipher
US8001374B2 (en) Memory encryption for digital video
US9544133B2 (en) On-the-fly key generation for encryption and decryption
CN110245498B (en) Decryption method and circuit and corresponding device
JP2019207393A (en) Hardware accelerators and methods for high-performance authenticated encryption
WO2014055136A1 (en) Parallelized counter tree walk for low overhead memory replay protection
US11429751B2 (en) Method and apparatus for encrypting and decrypting data on an integrated circuit
US20190044699A1 (en) Reconfigurable galois field sbox unit for camellia, aes, and sm4 hardware accelerator
Nabil et al. Design and implementation of pipelined and parallel AES encryption systems using FPGA
Fang et al. SIFO: Secure computational infrastructure using FPGA overlays
US11516013B2 (en) Accelerator for encrypting or decrypting confidential data with additional authentication data
US20030044007A1 (en) Methods and apparatus for accelerating ARC4 processing
CN112948840A (en) Access control device and processor comprising same
US20210318966A1 (en) Cryptographic protection of memory attached over interconnects
US20210006391A1 (en) Data processing method, circuit, terminal device and storage medium
KR101923210B1 (en) Apparatus for cryptographic computation on heterogeneous multicore processors and method thereof
US9384368B2 (en) Instruction and logic for mid-level caching of random numbers distributed to multiple units
Azad et al. RISE: RISC-V SoC for En/Decryption Acceleration on the Edge for Homomorphic Encryption
CN113672946A (en) Data encryption and decryption component, related device and method
US20130061292A1 (en) Methods and systems for providing network security in a parallel processing environment
Fakhry et al. An HBM3 Processing-in-Memory Architecture for Security and Data Integrity: Case study

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20240223

Address after: 310052 Room 201, floor 2, building 5, No. 699, Wangshang Road, Changhe street, Binjiang District, Hangzhou City, Zhejiang Province

Applicant after: C-SKY MICROSYSTEMS Co.,Ltd.

Country or region after: China

Address before: 200120 floor 5, No. 366, Shangke road and No. 2, Lane 55, Chuanhe Road, China (Shanghai) pilot Free Trade Zone, Pudong New Area, Shanghai

Applicant before: Pingtouge (Shanghai) semiconductor technology Co.,Ltd.

Country or region before: China
