CN117997514A - Flexible cryptographic architecture in a network device - Google Patents

Flexible cryptographic architecture in a network device

Info

Publication number
CN117997514A
Authority
CN
China
Prior art keywords
hardware
inputs
network
network packet
block cipher
Prior art date
Legal status
Pending
Application number
CN202311444394.5A
Other languages
Chinese (zh)
Inventor
Y·希克特
M·美尼斯
A·沙哈尔
U·巴舍
B·皮斯曼尼
Current Assignee
Mellanox Technologies Ltd
Original Assignee
Mellanox Technologies Ltd
Priority date
Filing date
Publication date
Priority claimed from US18/195,615 external-priority patent/US20240146703A1/en
Application filed by Mellanox Technologies Ltd filed Critical Mellanox Technologies Ltd
Publication of CN117997514A publication Critical patent/CN117997514A/en


Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

Disclosed is a flexible cryptographic architecture in a network device, in particular a network device comprising a hardware pipeline for processing network packets to be encrypted. A portion of the hardware pipeline retrieves information from a network packet and generates a command based on the information. A block cipher circuit is coupled inline within the hardware pipeline, and the hardware pipeline includes a hardware engine coupled between that portion of the pipeline and the block cipher circuit. The hardware engine parses and executes the command to determine a set of inputs, and inputs the set of inputs and portions of the network packet to the block cipher circuit. The block cipher circuit encrypts payload data of the network packet based on the set of inputs.

Description

Flexible cryptographic architecture in a network device
Technical Field
At least one embodiment relates to processing resources for performing and facilitating network communications. For example, at least one embodiment relates to techniques for a flexible cryptographic architecture in a network interface device.
Background
The ability to transmit protected and authenticated data is becoming a fundamental requirement of networks in use today and will be foundational in the near future. Moreover, the growth of cloud computing has increased the need to transfer data securely, as different users access and share the same resources (e.g., cloud-based services). Many algorithms today define such secure network protocols for various applications, such as secure tunneling, data streaming, internet browsing, and the like. These protocols typically include a control phase (connection setup and cryptographic handshaking) and a data protection phase. The control phase differs substantially between protocols, whereas data protection shares common components, in particular a cipher suite. Data protection algorithms are very demanding in terms of computational resources, and when executed and controlled by software, repeatedly running them consumes significant central processing unit (CPU) resources, reducing system performance and efficiency.
Drawings
Various embodiments according to the present disclosure will be described with reference to the accompanying drawings, in which:
FIG. 1 is a block diagram of a network device including a flexible cryptographic architecture that enables cryptographic operations to be performed inline in a hardware pipeline, in accordance with some embodiments;
FIG. 2 is a simplified diagram of the hardware pipeline of FIG. 1 employing flexible cryptographic operations, according to some embodiments;
FIG. 3 is a flow diagram of a method for flexibly processing network packets and generating inputs for a block cipher positioned inline within a hardware pipeline, in accordance with some embodiments;
FIG. 4 is a modified flow and architecture diagram of a packet transmit stream illustrating a flexible cryptographic architecture in accordance with some embodiments;
FIG. 5 is a simplified diagram of a network packet undergoing encryption in accordance with some embodiments;
FIG. 6 is a modified flow and architecture diagram of a packet receive flow illustrating a flexible cryptographic architecture in accordance with some embodiments; and
Fig. 7 is a simplified diagram of a network packet undergoing decryption, according to some embodiments.
Detailed Description
As described above, when relying on a programmable core or other source of software processing to perform cryptographic algorithms and related functions of a cryptographic suite, there are drawbacks in the speed and throughput of data (e.g., network packet flows) through a network device, including performance degradation. These drawbacks apply to secure network protocols, especially as the speed and throughput of network devices increases.
Various aspects and embodiments of the present disclosure address the drawback of over-reliance on software for cryptographic operations by offloading algorithmic cryptographic processing and related computations to external resources, such as a block cipher circuit coupled inline within a hardware pipeline, which can process network packets at up to line rate. Such block cipher circuitry may be located, for example, within a configurable stage of a networking hardware pipeline, as will be explained in detail. In at least some embodiments, the hardware pipeline includes a plurality of hardware engines, implemented either in hardware alone or in combination with a programmable processor, such as an application-specific integrated circuit (ASIC), field-programmable gate array (FPGA), microcontroller, or other programmable circuit or chip. A hardware engine located within the hardware pipeline of an intelligent network device is, for example, much faster than software.
Thus, in at least some embodiments, the network device includes a hardware pipeline for processing network packets to be encrypted. A portion of the hardware pipeline retrieves information from the network packet and, based on the information, generates a command associated with a cryptographic operation to be performed on the network packet. In an embodiment, the block cipher circuit is coupled inline within the hardware pipeline, and the hardware pipeline includes a set of hardware engines coupled between that portion of the hardware pipeline and the block cipher circuit. In at least some embodiments, the set of hardware engines parses and executes the command to determine a set of inputs associated with the cryptographic operation, and inputs the set of inputs and a portion of the network packet to the block cipher circuit. The set of inputs may be specific to a cryptographic protocol selected from a set of cryptographic protocols. In these embodiments, the block cipher circuitry encrypts (or decrypts) payload data of the network packet based on the set of inputs. In this way, the command directs the hardware engine to provide specific inputs to the block cipher circuitry to enable the cryptographic operation to be performed. In some embodiments, a programmable core executing instructions may participate in this offload flow by providing definitions or parameters that help direct how the block cipher circuitry operates.
In various embodiments, a network security protocol such as Media Access Control Security (MACsec) runs at the link layer, Internet Protocol Security (IPsec) runs at the network layer, and any number of transport-layer (or higher-layer) protocols run over MACsec and IPsec. In various embodiments, MACsec, IPsec, and these other transport- and higher-layer protocols use the Advanced Encryption Standard in Galois/Counter Mode (AES-GCM) for authenticated encryption. Although the AES-GCM suite may be cited herein as an example, the scope of the present disclosure extends to other cipher suites used within various network security protocols.
Advantages of the present disclosure include, but are not limited to, improving the speed and throughput of network packets through a network device by inserting such block cipher circuitry inline into a hardware engine of a hardware pipeline. For example, the block cipher circuitry may be part of a configurable, inline offload (inline offload) that supports cryptographic operations of any cryptographic protocol. By avoiding excessive interaction with software that would otherwise perform cryptographic operations, the overall performance and efficiency of the network device is also improved. Other advantages will be apparent to those skilled in the art of intelligent network devices discussed below.
Fig. 1 is a block diagram of a network device 100 including a flexible cryptographic architecture of a network interface device 102 that enables cryptographic operations to be performed inline within a hardware pipeline, according to some embodiments. In at least some embodiments, network device 100 includes an interconnection memory (ICM) 140 coupled to one or more programmable cores 150 and to network interface device 102. ICM 140 may be understood as a main memory of network device 100, such as dynamic random access memory (DRAM) or the like. In these embodiments, ICM 140 may store handler code 144 and handler data 148 for running an operating system (OS) and applications of the one or more programmable cores 150. In some embodiments, network device 100 is a data processing unit (DPU), alone or in combination with a switch, router, hub, or the like.
In various embodiments, one or more programmable cores 150 include cacheable IO 170, cache 180, and one or more processors 190 integrated with one or more programmable cores 150 (e.g., integrated on the same die as one or more programmable cores 150). The cacheable IO 170 may be a region or area of the cache 180 dedicated to IO transactions, or may be a separate dedicated cache memory for IO transactions, or a combination thereof. Cache 180 may be an L1, L2, L3, other higher level cache associated with programmable processing of one or more programmable cores 150, or a combination thereof. In some embodiments, cache 180 and cacheable IO 170 or similar areas of cache may be memory mapped to ICM 140.
In at least some embodiments, the cache 180 is a fast-access memory that may include or store, for example, a handler stack memory 182 and control registers 188. For example, cache 180 may be static random access memory (SRAM), tightly coupled memory (TCM), or other fast-access volatile memory mapped to ICM 140. In some embodiments, the handler stack memory 182 stores stateful contexts associated with applications executed by hardware threads of the one or more programmable cores 150 to facilitate processing network packets.
In some embodiments, the network interface device 102 is a network interface card (NIC). In these embodiments, network interface device 102 includes, but is not limited to, a set of network ports 104 coupled to a physical medium of a network or the Internet, a set of port buffers 106 for receiving network packets from network ports 104, a device control register space 108 (e.g., within a cache or other local memory) coupled to control registers 188 on cache 180, and a hardware pipeline 105. In at least some embodiments, the hardware pipeline 105 includes a cache 110, a steering engine 120, and flexible cryptographic circuitry 160. In these embodiments, at least one of the one or more programmable cores 150 is located directly within the steering engine 120; e.g., a dedicated core may be configured to provide supporting processing and parameters that help direct or affect hardware processing within the steering engine 120 and/or within the flexible cryptographic circuit 160, as will be explained.
In various embodiments, cache 110 is configured to cache or store hardware data structures 112, including, for example, packet header buffers 114, parsed header structures 116, steering metadata 118, and control registers 119, which store various parameters for processing network packets. These hardware data structures 112 may be directly memory mapped to the cacheable IOs 170 and thus may be shared with one or more programmable cores 150 executing application threads that may also provide data for network packet processing performed by the HW pipeline 105.
In these embodiments, steering engine 120 is the part of hardware pipeline 105 that retrieves information from network packets and, based on that information, generates commands associated with cryptographic operations to be performed on the network packets. More specifically, steering engine 120 may include packet parsers 122, each of which parses a network packet to retrieve a header, determine the location of the data payload or another portion of the network packet, and retrieve that data payload or other portion. The steering engine 120 may then populate the packet header buffer 114 with the packet header, and populate the parsed header structure 116 with any particular structures parsed from the packet header and any steering metadata 118 associated with the processing of each respective network packet handled by the HW pipeline 105. In these embodiments, flexible cryptographic circuit 160 includes (or is coupled to) a set of hardware engines 125, and also includes block cipher circuit 166, as will be discussed.
In some embodiments, steering engine 120 further includes a command generator 124 configured to determine certain steering actions to take to process and forward any given network packet based on information parsed from the network packet. In various embodiments, the command generator 124 comprises hardware (e.g., processing device, circuitry, dedicated logic, programmable logic, microcode, hardware of the device, integrated circuit, etc.), software (e.g., instructions run or executed on a processing device), or a combination thereof. In these embodiments, command generator 124 may execute the instruction stream based on an opcode retrieved from a header of the network packet.
In various embodiments, command generator 124 may access a match-action pipeline within steering engine 120, or a match-action pipeline located elsewhere in HW pipeline 105 and/or received from programmable core 150. The match-action pipeline may be adapted to match information from network packets, such as the steering metadata 118, against specific actions to be taken (e.g., via match-action criteria that may be stored in the control registers 119), including encryption/decryption and encapsulating some packets for further transmission (destination ports are not shown for simplicity). The command generator 124 may then generate a particular command based on the determined action, which in some embodiments includes generating a command intended to trigger the set of hardware engines 125 to facilitate a cryptographic operation (such as authenticated encryption or decryption). In this way, the steering engine 120 is designed with flexibility in command generation, so that commands can be adapted to different cipher suites and different flow-specific variables to properly trigger the correct cipher actions and any cipher post-processing in the block cipher circuitry 166.
More specifically, the set of hardware engines 125 may parse the command to determine a set of inputs associated with the cryptographic operation, and input the set of inputs and portions of the network packet to the block cipher circuitry 166. The block cipher circuitry 166 may then encrypt or decrypt (and optionally authenticate) the payload of the network packet based on the set of inputs. Because the location of the payload data is specific to the network packet and may vary, the set of hardware engines 125 may further determine an encryption offset to the first byte of the payload data within the network packet, where the set of inputs includes the encryption offset. The set of hardware engines 125 and/or the block cipher circuitry 166 can then properly access the payload data of each network packet to be encrypted or decrypted by the block cipher circuitry 166.
In various embodiments, one or more command generators 124 may act as an interface to hardware and data structures stored in any one or combination of the device control register space 108 of network interface device 102, the control registers 188 of the one or more programmable cores 150, or ICM 140. In some embodiments, these hardware and data structures may also be accessed within cache 110 (e.g., in hardware data structures 112) or other memory accessible by hardware pipeline 105. In these embodiments, hardware engines 125 access these hardware and data structures to retrieve parameters that facilitate generating the inputs, required to perform cryptographic operations, that are to be provided to block cipher circuitry 166. Thus, in some embodiments, these hardware and data structures supply the outputs of the set of hardware engines 125 that correspond to the inputs recognized by the block cipher circuitry 166.
As discussed, in at least some embodiments, the hardware pipeline 105 includes a plurality of hardware engines (including the steering engine 120 and the set of hardware engines 125) that are hardware alone or in combination with a programmable processor (such as an ASIC, FPGA, microcontroller, or other programmable circuit or chip). At least some of the plurality of hardware engines of hardware pipeline 105 may execute firmware as part of the hardware execution that processes the network packets. Thus, the use of the term "hardware" should not be construed to mean, for example, only discrete gates and logic, and may include other hardware computing and programming modules, processors, or circuits.
FIG. 2 is a diagram of the hardware pipeline 105 of FIG. 1 employing flexible cryptographic operations, according to some embodiments. In some embodiments, hardware pipeline 105 generates command 202 via steering action 206, and optionally also generates command 202 via programmable core action 216. In an embodiment, steering engine 120 performs steering action 206 and programmable core 150 performs programmable core action 216. In this manner, network interface device 102 may optionally combine operations with programmable core 150 to generate commands discussed with reference to fig. 1.
In various embodiments, flexible cryptographic circuit 160 may then dynamically determine what inputs are to be sent to block cryptographic circuit 166 based on the command. In these embodiments, the set of hardware engines 125 may be generally categorized into a command parsing engine 262, an execution command engine 264, and an optional post-processing engine 268. Command parsing engine 262 (optional operations are shown in phantom in this figure) may be particularly adapted to parse commands received from the combination of steering engine 120 and programmable core 150, for example, to determine actions to be performed by execution command engine 264 and optional post-processing engine 268, including whether and how headers are protected and/or encrypted or decrypted, and other actions to be discussed.
In these embodiments, the execution command engine 264 may then perform such actions to generate the set of inputs associated with the cryptographic operation and intended to trigger the cryptographic operation. The execution command engine 264 may be adapted to optionally interact with control registers storing the previously referenced hardware and data structures, retrieve pointers to encryption or decryption keys that may be used by the block cipher circuitry 166, generate initialization vectors (IVs), nonces, or other special cryptographic strings (e.g., for various security protocols) for the block cipher circuitry 166, retrieve additional authentication data (AAD) that may be used by the block cipher circuitry 166, and the like. In some embodiments, the optional post-processing engine 268 may perform additional security-related operations after the payload of the network packet is encrypted or decrypted, such as encrypting the header, overwriting the tail (trailer) of the network packet to include an integrity check value, or removing the tail. These various hardware engines will be discussed in more detail with reference to figs. 4-7. In this manner, by arranging flexible cryptographic circuit 160 to include block cipher circuit 166 coupled inline with the other hardware engines of hardware pipeline 105, cryptographic operations performed on network packets may be performed at up to line rate, much faster than software execution of such cryptographic operations as part of packet processing.
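As a concrete illustration of this staging, the following Python sketch models how a parsed command could drive the execution-command engine to assemble the input set handed to the block cipher. The names `Command`, `CipherInputs`, and `KEY_TABLE`, and the simplistic IV derivation, are illustrative assumptions, not taken from the patent:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Command:
    encrypt_payload: bool
    encrypt_header: bool
    key_pointer: int      # stands in for a pointer into a key register/table
    packet_number: int

@dataclass
class CipherInputs:
    payload_key: bytes
    header_key: Optional[bytes]
    iv: bytes
    aad: bytes

# Hypothetical key table resolved through the key pointer.
KEY_TABLE = {7: (b"P" * 16, b"H" * 16)}  # pointer -> (payload key, header key)

def execute_command(cmd: Command, header: bytes) -> CipherInputs:
    """Model of the execution-command engine: resolve the key pointer,
    derive a simplified per-packet IV, and use the header as AAD."""
    payload_key, header_key = KEY_TABLE[cmd.key_pointer]
    iv = cmd.packet_number.to_bytes(12, "big")  # simplified IV derivation
    return CipherInputs(
        payload_key=payload_key,
        header_key=header_key if cmd.encrypt_header else None,
        iv=iv,
        aad=header,
    )
```

In a real pipeline these fields would be filled by separate hardware engines in parallel; the single function is only meant to show the data flow from command to cipher inputs.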
Fig. 3 is a flow diagram of a method 300 for flexibly processing network packets and generating inputs for a block cipher positioned inline (e.g., coupled inline) within a hardware pipeline, in accordance with some embodiments. In various embodiments, the method 300 is performed by the network interface device 102, and in particular by the hardware pipeline 105 of the network interface device 102.
At operation 310, the hardware pipeline 105 processes the network packet to be encrypted.
At operation 320 (which may be a subset of operation 310), the hardware pipeline 105 retrieves information from the network packet. This information may, for example, inform about what type of cipher suite or cryptographic operation (e.g., authenticated encryption) is to be performed on the network packet.
At operation 330 (which may be a subset of operation 310), the hardware pipeline generates a command based on the information. In some embodiments, the command is associated with a cryptographic operation to be performed on the network packet.
At operation 340, the hardware pipeline 105 (e.g., the set of hardware engines 125) parses and executes the command to determine a set of inputs. In some embodiments, the set of inputs is associated with a cryptographic operation.
At operation 350, the hardware pipeline 105 inputs the set of inputs and portions of the network packet to a block cipher circuit 166 positioned inline within the hardware pipeline 105.
At operation 360, the hardware pipeline 105 (e.g., block cipher circuitry 166) encrypts the payload data of the network packet based on the set of inputs. In some embodiments, the encryption performed includes authentication. As will be apparent with reference to fig. 6-7, a similar set of operations may be performed to decrypt a network packet that includes a payload (and optionally an encrypted header) of encrypted data.
Fig. 4 is a modified flow and architecture diagram of a packet transmit stream 400 illustrating a flexible cryptographic architecture according to some embodiments. Fig. 5 is a simplified diagram of a network packet undergoing encryption in accordance with some embodiments. In these embodiments, the transmit stream 400 optionally includes, at operation 402, computing an invariant cyclic redundancy check (iCRC), an error-detection code used in digital networks and storage devices to detect unintended changes to digital data. A block of packet data entering the transmit stream 400 may have a short check value appended, based on the remainder of a polynomial division of its contents.
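The append-and-verify mechanism can be illustrated with a standard CRC-32. The actual iCRC is a specific CRC variant computed over the invariant fields of the packet; plain `zlib.crc32` is used here only as a stand-in for the polynomial division:

```python
import zlib

def append_check_value(block: bytes) -> bytes:
    """Append a 4-byte CRC-32 check value to a block of packet data."""
    return block + zlib.crc32(block).to_bytes(4, "little")

def verify_check_value(block: bytes) -> bool:
    """Recompute the CRC over the data and compare it to the stored value."""
    data, stored = block[:-4], block[-4:]
    return zlib.crc32(data).to_bytes(4, "little") == stored
```

Any bit flip in the protected bytes changes the computed remainder, so verification fails on corrupted data.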
In various embodiments, the hardware pipeline 105 includes the set of hardware engines 125 coupled between the portion of the hardware pipeline 105 (e.g., the steering engine 120) and the block cipher circuitry 166 (see fig. 1). In an embodiment, the portion of the hardware pipeline 105 may include the command parsing engine 262 and the execute command engine 264 discussed with reference to fig. 2. In particular, command parsing engine 262 and execution command engine 264 may parse and execute the commands received from generate command operation 202 of FIG. 2 to determine a set of inputs associated with the cryptographic operation and input portions of the set of inputs and network packet to block cipher circuitry 166 and any associated optional post-processing engine 268. In these embodiments, the block cipher circuitry 166 encrypts the payload data of the network packet based on or using the set of inputs. How encryption is performed may be specific to the cipher suite employed by the block cipher circuitry 166.
At operation 404, the portion of hardware pipeline 105 parses the command and performs a match on the context of the network packet to identify that the parsed command within the network packet indicates certain information that will be used to determine the set of inputs. Performing the context matching by one or more hardware engines may involve instantiating an interface with generation capabilities, such as copy, paste, and store, to pass information and data to a further hardware engine that will execute commands on that information and data. Part of matching the context may include determining, from parsing the command, whether the flexible cryptographic circuit 160 will encrypt the payload and/or the header of the network packet.
As an example, at operation 406, the HW engine may access a key pointer to an algorithm or packing logic dedicated to a protection or cryptographic protocol. As mentioned, while the cipher suite protocol widely used by way of example is AES-GCM (in combination with other protocols), other protocols may also be employed, alone or in different combinations. The HW engine may also access some per-packet context and/or protocol anchoring information. The protocol anchoring information may include a start anchor in the packet at which an Additional Authentication Data (AAD) island should start. The key pointer and start anchor information may be passed forward for use by additional operation blocks.
For example, at operation 462, the HW engine may use the key pointer to retrieve and decrypt a Data Encryption Key (DEK) to be used by the flexible cryptographic circuit 160. For example, the HW engine may retrieve the payload encryption key and the header encryption key from the register (or other location) pointed to by the key pointer. The HW engine may further send the payload encryption key to the block cipher circuitry 166 for encryption of the payload data and the header encryption key to the post-processing engine 268, which post-processing engine 268 performs header encryption at operation 464.
In these embodiments, the header encryption performed at operation 464 may be optional, but if performed, it provides an additional level of protection for the packet header and thus integrity for the entire network packet. For example, encrypting the packet header may help prevent intermediate device (middlebox) attacks and vulnerabilities from interfering with the delivery of a particular packet to an intended destination. Thus, in at least one embodiment, when block cipher circuitry 166 encrypts payload data of a network packet, a separate cipher block (e.g., post-processing engine 268) may perform header encryption as shown at operation 464, such that both the payload data and the header may be encrypted in parallel or sequentially and by different keys.
At operation 408, the HW engine determines an encryption offset to the first byte of payload data within the network packet, where the set of inputs includes the encryption offset. Fig. 5 shows an example unencrypted network packet 502 including a header and a plaintext data payload. Thus, the HW engine may determine the offset of the first byte of the payload, which is used for encryption. Typically, software does not append the header until after the data is encrypted, so the flexible cryptographic circuit 160 needs the encryption offset so the hardware knows where to find the data on which it will perform encryption. The encryption offset may be determined using a combination of values from the packet (which may be a linear combination in one embodiment), typically length fields.
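For instance, for an Ethernet/IPv4/UDP layout the offset is a linear combination of header-length fields. The layout and field names below are illustrative assumptions, not drawn from the patent:

```python
def encryption_offset(eth_len: int = 14, ihl_words: int = 5, udp_len: int = 8) -> int:
    """Linear combination of length fields locating the first payload byte:
    Ethernet header + IPv4 header (IHL in 32-bit words) + UDP header."""
    return eth_len + ihl_words * 4 + udp_len

# A toy packet: 14-byte Ethernet, 20-byte IPv4 (first byte 0x45), 8-byte UDP, then payload.
packet = b"\x00" * 14 + b"\x45" + b"\x00" * 19 + b"\x00" * 8 + b"payload"
```

With the defaults the offset is 14 + 20 + 8 = 42, the first byte past the UDP header.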
At operation 410, the HW engine optionally parses a packet number or identifier (which is included in the set of inputs) that may be used in AES-GCM to construct the initialization vector (IV) and nonce at operation 420. The nonce may be any number, or random number, used only once per session with the payload encryption key. The IV and nonce may be constructed from a packet number or sequence number, a salt value (random number), an XOR operation, and/or a value from the packet header.
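One common construction, shown here only as one possibility (it is, for example, the one QUIC uses), XORs a per-session 12-byte IV/salt with the left-padded packet number, so every packet in the session gets a unique nonce:

```python
def build_nonce(session_iv: bytes, packet_number: int) -> bytes:
    """XOR a 12-byte per-session IV/salt with the left-padded packet number."""
    assert len(session_iv) == 12
    pn = packet_number.to_bytes(12, "big")
    return bytes(a ^ b for a, b in zip(session_iv, pn))
```

Because the packet number is unique per session, the XOR result is too, which is the property AES-GCM requires of its nonce.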
At operation 412, the HW engine may determine Additional Authentication Data (AAD) included in the set of inputs as a concatenated stream of bytes selected from at least one of a header of the network packet, a security context, and a set of most significant bits of a sequence number of the network packet. If the set of most significant bits is used, the start anchor value may inform the HW engine where those MSBs start. The AAD used and whether the AAD is used may vary according to different cipher suite protocols. In some embodiments, the packet header is used as an AAD to ensure that no one has tampered with the packet header. AAD can be constructed using several slices of byte streams from different sources (packets, contexts, etc.), each slice having an offset and a length. These streams are concatenated into a single byte stream, which is one of the inputs provided to block cipher circuitry 166.
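The slice-and-concatenate step described above can be sketched as follows. The `sources` dictionary and the slice tuples are illustrative stand-ins for the packet header, security context, and sequence-number MSBs:

```python
def build_aad(sources: dict, slices: list) -> bytes:
    """Concatenate (source name, offset, length) slices taken from different
    byte streams into the single AAD stream fed to the block cipher."""
    return b"".join(sources[name][off:off + length] for name, off, length in slices)
```

Each slice carries its own offset and length, matching the description of AAD as several byte-stream slices from different sources joined into one stream.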
At operation 414, the HW engine may determine, based on the length of the payload data, a tail offset giving the position in the network packet tail where the integrity check value will be located. Thus, the set of inputs to the block cipher circuitry 166 may include the tail offset. As shown in fig. 5, the encrypted network packet 512 may include a header, ciphertext, and an integrity check value, which in AES-GCM is referred to as an authentication tag. In these embodiments, the block cipher circuitry 166 further generates the integrity check value to authenticate the payload data and appends the integrity check value to the network packet according to the tail offset.
With software removed from the cryptographic operation, operation 414 helps ensure that the block cipher circuitry 166 places the integrity check value at the proper location within the encrypted network packet. In some embodiments, at operation 414, the HW engine may further add or remove additional data bytes at the end of the payload data to provide space for the integrity check value at the correct location. In these embodiments, hardware pipeline 105 further includes a post-processing hardware engine 268, which optionally overwrites the tail of the network packet with the integrity check value at operation 466. In either case, the integrity check value may be provided by the block cipher circuitry 166 to the post-processing hardware engine 268.
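A minimal sketch of the tail-offset computation and tag placement, assuming the 16-byte AES-GCM tag length (the function name is illustrative):

```python
TAG_LEN = 16  # AES-GCM integrity check value ("authentication tag") length

def place_tag(packet: bytearray, enc_offset: int, payload_len: int, tag: bytes) -> int:
    """Compute the tail offset from the payload length, then write the
    integrity check value there (overwriting or extending the tail)."""
    tail_offset = enc_offset + payload_len
    packet[tail_offset:tail_offset + TAG_LEN] = tag
    return tail_offset
```

Slice assignment past the end of the `bytearray` extends it, which mirrors the hardware's choice between overwriting placeholder tail bytes and appending new ones.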
In some embodiments, the cryptographic operation is performed in an "unaware mode," meaning the operation is performed on data that the user has not defined as to be encrypted or decrypted. In these embodiments, one or more HW engines push a header at operation 416 and insert a tail at operation 418. Pushing the header at operation 416 includes inserting a header that did not previously exist (e.g., one generated by the steering engine 120), and inserting the tail at operation 418 includes inserting a tail that did not previously exist (e.g., one generated by the steering engine 120). As for the tail, the bytes inserted at operation 418 may be placeholder bytes that serve other purposes and that ensure the integrity check value is properly positioned at a particular offset from the added bytes.
At operation 470, the post-processing engine 268 may optionally compute a User Datagram Protocol (UDP) checksum, an error detection mechanism for determining the integrity of data transmitted over a network. Communication protocols such as TCP/IP and UDP implement this scheme to determine whether received data was corrupted in transit.
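The UDP checksum verified here is the standard RFC 768 ones'-complement sum over an IPv4 pseudo-header followed by the UDP header and payload. A minimal software model (the function name is illustrative) of what the post-processing engine computes in hardware:

```python
import struct

def udp_checksum(src_ip: bytes, dst_ip: bytes, udp_segment: bytes) -> int:
    """RFC 768 UDP checksum: ones'-complement sum over an IPv4
    pseudo-header (source IP, destination IP, zero, protocol 17, UDP
    length) followed by the UDP header and payload."""
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 17, len(udp_segment))
    data = pseudo + udp_segment
    if len(data) % 2:            # pad to a 16-bit boundary
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:           # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    checksum = ~total & 0xFFFF
    return checksum or 0xFFFF    # RFC 768: a transmitted 0 means "no checksum"
```

On receive, recomputing the folded sum over a segment that already carries a valid checksum yields 0xFFFF, which is how verification at operation 601 below can be performed.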
Fig. 6 is a simplified flow and architecture diagram of a packet receive flow 600 illustrating a flexible cryptographic architecture, according to some embodiments. Fig. 7 is a simplified diagram of a network packet undergoing decryption, according to some embodiments. In an embodiment, the receive flow 600 generally performs, in order to decrypt an encrypted network packet, the reverse of the set of operations performed in the transmit flow 400 discussed with reference to fig. 4. In various embodiments, hardware pipeline 105 includes a set of hardware engines 125 for processing the encrypted network packet. For example, the receive flow 600 optionally includes a HW engine that verifies the UDP checksum at operation 601, similar to the checksum discussed with reference to operation 470 of fig. 4.
In these embodiments, the receive flow 600 further includes a HW engine that, at operation 602, performs at least an initial context match against at least particular bytes within a header of the network packet. For example, at operation 602, the HW engine may retrieve a header decryption key, decrypt the particular bytes, and perform a match-action flow on the decrypted bytes to determine whether the rest of the header is encrypted and, if so, with which cipher suite. This informs the direction of the decryption path through the receive flow 600 of fig. 6, including whether to decrypt the header and which parameters to retrieve in order to do so.
In some embodiments, the hardware pipeline 105 may include a first portion to optionally decrypt the header of the network packet at operation 603, such as by one or more of the plurality of command execution engines 264 (if the header is encrypted, as determined at operation 602). In an embodiment, operation 603 is performed as two sub-operations: a header decryption 695 operation that decrypts the packet header with the header decryption key, and an XOR operation 697 that operates on header data to be used later in the receive flow 600. In some embodiments, header decryption 695 is performed by running a cryptographic operation and then performing an XOR between the header and the output of the cryptographic operation; the result is the decrypted header. In some embodiments, header decryption 695 is performed in programmable core 150.
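The two sub-operations of operation 603 can be modeled in software as keystream generation followed by an XOR. In this sketch, SHA-256 over the key and a packet sample merely stands in for the cryptographic operation a real device would run with the header decryption key; the actual cipher is not specified here, and all names are illustrative:

```python
import hashlib

def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR operation 697: combine the header with the keystream."""
    return bytes(x ^ y for x, y in zip(a, b))

def decrypt_header(encrypted_header: bytes, header_key: bytes, sample: bytes) -> bytes:
    """Model of operation 603: run a cryptographic operation to produce a
    keystream (header decryption 695), then XOR it with the header (697).
    SHA-256 is only a stand-in keystream generator; because the masking
    is a pure XOR, the same function also performs header encryption."""
    keystream = hashlib.sha256(header_key + sample).digest()
    return xor_bytes(encrypted_header, keystream[:len(encrypted_header)])
```

The symmetry of the XOR mask is what allows the same datapath to serve both the transmit-side header encryption and this receive-side decryption.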
Hardware pipeline 105 may further include a second portion to retrieve information from the decrypted header and generate commands associated with cryptographic operations to be performed on the network packet based on the information. For example, the second portion of the hardware pipeline 105 may include a steering engine 120, the steering engine 120 further including a packet parser 122 and a command generator 124 (fig. 2).
In these embodiments, hardware pipeline 105 further includes a set of hardware engines 125 (e.g., command parsing engines 262 and command execution engines 264) coupled between the second portion and block cipher circuitry 166, the set of hardware engines 125 parsing and executing the commands to determine a set of inputs associated with the cryptographic operation, and inputting the set of inputs and portions of the network packet to the block cipher circuitry 166 and any associated post-processing engines 268. In these embodiments, the block cipher circuitry 166 then decrypts the payload data of the network packet based on the set of inputs. How decryption is performed may be specific to the cipher suite employed by the block cipher circuitry 166.
At operation 604, one or more command parsing engines 262 of hardware pipeline 105 parse the commands and perform a context match on the encrypted network packet in order to identify certain information within the network packet that the parsed commands indicate is to be used to determine the set of inputs to block cipher circuitry 166. Thus, while some matching and parsing is performed at operation 602 before decrypting the header, additional matching may be performed after decrypting the header. The one or more hardware engines performing the context matching may include an instantiation of an interface with general capabilities such as copy, paste, and store, to transfer information and data to a further hardware engine that will execute commands on that information and data. Part of the context matching may include determining, from the parsed command, whether the flexible cryptographic circuit 160 is to decrypt the payload and/or the header of the network packet.
As an example, at operation 606, the HW engine may access a key pointer to an algorithm or packing logic specific to a protection or cryptographic protocol. As mentioned, while AES-GCM (itself a combination of other protocols) is the cipher suite protocol used widely here by way of example, other protocols may also be employed, alone or in different combinations. The HW engine may also access some per-packet context and/or protocol anchoring information. The protocol anchoring information may include a start anchor in the packet at which an Additional Authentication Data (AAD) island should start. The key pointer and start-anchor information may be passed forward for use by additional operation blocks.
For example, at operation 662, the HW engine may use the key pointer to obtain and decrypt a Data Encryption Key (DEK) to be used by the flexible cryptographic circuit 160. In an embodiment, the HW engine generates a payload decryption key that is sent to the block cipher circuitry 166 for decryption of the payload data.
At operation 608, the HW engine determines a decryption offset for a first byte of payload data within the network packet, wherein the set of inputs includes the decryption offset. Fig. 7 illustrates an example encrypted network packet 712 including a header, a ciphertext payload, and an integrity check value; in AES-GCM, the integrity check value is called an authentication tag. Thus, the HW engine may determine the offset of the first byte of the payload, which is used for decryption. In an embodiment, the flexible cryptographic circuit 160 needs the decryption offset so the hardware knows where to find the data on which it will perform decryption. The decryption offset may be determined using a combination of values from the packet (which may be a linear combination in one embodiment), typically length fields.
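As one illustration of a linear combination of length fields, the offset of the first payload byte for a hypothetical Ethernet/IPv4/UDP packet carrying a cipher-suite header can be computed as below. The specific fields and layout are illustrative assumptions, not taken from the hardware design:

```python
def payload_offset(ip_ihl: int, crypto_hdr_len: int) -> int:
    """Model of operation 608: locate the first payload byte as a linear
    combination of length fields from the packet headers."""
    eth_len = 14               # fixed Ethernet header length
    ip_len = ip_ihl * 4        # IPv4 IHL field counts 32-bit words
    udp_len = 8                # fixed UDP header length
    return eth_len + ip_len + udp_len + crypto_hdr_len
```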
At operation 610, the HW engine optionally parses a packet number or identifier (which is included in the set of inputs) that may be used in AES-GCM to construct an Initialization Vector (IV) and nonce at operation 620. The IV and nonce may be any number, or random number, used once per session with the payload key. The IV and nonce may be constructed from a packet number or sequence number, a salt (random value), an XOR operation, and/or values from the packet header.
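One possible nonce construction consistent with the above (and the one TLS 1.3 happens to use) XORs a per-session salt with the big-endian sequence number; the text allows other combinations, so this is a sketch of a single choice, not the mandated design:

```python
def build_nonce(salt: bytes, seq_num: int) -> bytes:
    """Model of operation 620: XOR a 12-byte per-session salt with the
    zero-padded, big-endian packet sequence number to form the AES-GCM
    nonce, guaranteeing uniqueness per packet within the session."""
    assert len(salt) == 12, "AES-GCM commonly uses a 96-bit nonce"
    padded = seq_num.to_bytes(12, "big")
    return bytes(s ^ p for s, p in zip(salt, padded))
```

Because the sequence number increments per packet, each packet in the session obtains a distinct nonce under the same payload key, which is what AES-GCM security requires.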
At operation 612, the HW engine may determine Additional Authentication Data (AAD) as a concatenated stream of bytes selected from at least one of a header of the network packet, a security context, and a set of most significant bits of a sequence number of the network packet, the AAD being included in the set of inputs. If the set of most significant bits is used, the start anchor value may inform the HW engine where those MSBs start. Whether AAD is used, and which AAD, may vary across cipher suite protocols. In some embodiments, the packet header is used as AAD to ensure that no one has tampered with it. The AAD can be constructed from several slices of byte streams from different sources (packet, context, etc.), each slice having an offset and a length. These slices are concatenated into a single byte stream, which is one of the inputs provided to block cipher circuitry 166.
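The slice-and-concatenate step can be sketched as below; source names and the (source, offset, length) descriptor shape are illustrative assumptions about how such a slice list might be represented:

```python
def build_aad(sources: dict[str, bytes],
              slices: list[tuple[str, int, int]]) -> bytes:
    """Model of operation 612: assemble the AAD from slices of byte
    streams taken from different sources (packet header, security
    context, ...), each slice described by (source, offset, length),
    concatenated into the single byte stream fed to the block cipher."""
    return b"".join(sources[src][off:off + length] for src, off, length in slices)
```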
At operation 614, the HW engine may determine, based on the length of the payload data, a tail offset for the tail position of the network packet where the integrity check value is located. Thus, the set of inputs to block cipher circuitry 166 may include the tail offset, which may also be forwarded to the post-processing engine 268 that performs operation 666.
Once the block cipher circuitry 166 has decrypted the payload data, the ciphertext in the encrypted network packet 722 is replaced by the plaintext generated by the decryption, producing an unencrypted network packet 732, as shown in fig. 7. Additional post-processing may also be performed on the encrypted network packet 722 to further ensure the integrity of the payload data, with reference to the integrity check value in the trailer.
In various embodiments, the block cipher circuitry 166 further retrieves the integrity check value using the tail offset determined at operation 614 and authenticates the payload data based on the integrity check value. At operation 664, post-processing engine 268 may optionally execute additional protection mechanisms, including replay protection. Replay protection, for example, ensures that network packets are not further processed if replayed by a third party, e.g., to avoid man-in-the-middle or other security risks during transmission.
At operation 666, the post-processing engine 268 may further remove the tail, such as the integrity check value, which is no longer needed. The result is a decrypted network packet 732 (fig. 7). For example, post-processing engine 268 may be a tail removal engine coupled to block cipher circuitry 166 and configured to remove the tail of the decrypted network packet containing the integrity check value, optionally including any additional data bytes added to align the integrity check value with a particular location at the end of the payload data. In some embodiments, the tail removal of operation 666 may also remove the trailer added at operation 418 of the transmit flow 400 (fig. 4).
At operation 668, if a packet header was added at operation 416 of the transmit flow 400 (fig. 4), the post-processing engine 268 may pop (or remove) the packet header. At operation 670, an invariant cyclic redundancy check (iCRC) is performed; a CRC is an error detection code used in digital networks and storage devices to detect unexpected changes in digital data (e.g., payload data). The block of packet data leaving the receive flow 600 may be CRC-verified against a short check value appended at operation 402 of the transmit flow 400, computed as the remainder of a polynomial division of its contents.
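The verification at operation 670 can be modeled as recomputing the polynomial-division remainder and comparing it with the appended check value. CRC-32 from Python's zlib stands in here for the specific iCRC polynomial the hardware uses; the function name is illustrative:

```python
import zlib

def crc_verify(payload: bytes, appended_crc: int) -> bool:
    """Model of operation 670: recompute the CRC (the remainder of a
    polynomial division of the content) and compare it with the short
    check value appended on transmit at operation 402."""
    return zlib.crc32(payload) == appended_crc
```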
Other variations are within the spirit of the present disclosure. Thus, while the disclosed technology is susceptible to various modifications and alternative constructions, certain illustrated embodiments thereof are shown in the drawings and have been described above in detail. It should be understood, however, that there is no intention to limit the disclosure to the specific form or forms disclosed, but on the contrary, the intention is to cover all modifications, alternative constructions, and equivalents falling within the spirit and scope of the disclosure as defined in the appended claims.
The use of the terms "a" and "an" and "the" and similar referents in the context of describing the disclosed embodiments (especially in the context of the following claims) is to be construed to cover both the singular and the plural, unless otherwise indicated herein or clearly contradicted by context. Unless otherwise indicated, the terms "comprising," "having," "including," and "containing" are to be construed as open-ended terms (meaning "including, but not limited to"). The term "connected" (which refers to a physical connection, when unmodified) should be interpreted as partially or wholly contained within, attached to, or joined together, even if there is something intervening. Recitation of ranges of values herein is merely intended to serve as a shorthand method of referring individually to each separate value falling within the range, unless otherwise indicated herein, and each separate value is incorporated into the specification as if it were individually recited herein. In at least one embodiment, unless indicated otherwise or contradicted by context, the use of the term "set" (e.g., "a set of items") or "subset" should be interpreted as a non-empty collection comprising one or more members. Furthermore, unless indicated otherwise or contradicted by context, the term "subset" of a corresponding set does not necessarily denote a proper subset of the corresponding set; rather, the subset and the corresponding set may be equal.
Unless otherwise explicitly indicated or clearly contradicted by context, conjunctive language such as a phrase in the form of "at least one of A, B, and C" or "at least one of A, B, or C" is understood in context as generally used to denote an item, term, etc., which may be A or B or C, or any non-empty subset of the set of A and B and C. For example, in the illustrative example of a set having three members, the conjunctive phrases "at least one of A, B, and C" and "at least one of A, B, or C" refer to any of the following sets: {A}, {B}, {C}, {A, B}, {A, C}, {B, C}, {A, B, C}. Thus, such conjunctive language is not generally intended to imply that certain embodiments require the presence of at least one of A, at least one of B, and at least one of C. In addition, unless otherwise indicated herein or otherwise clearly contradicted by context, the term "plurality" indicates a state of being plural (e.g., "a plurality of items" indicates multiple items). In at least one embodiment, the number of items in a plurality is at least two, but may be more when so indicated either explicitly or by context. Furthermore, unless otherwise indicated or clear from context, the phrase "based on" means "based at least in part on" rather than "based solely on."
The operations of the processes described herein may be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. In at least one embodiment, processes such as those described herein (or variations and/or combinations thereof) are performed under control of one or more computer systems configured with executable instructions and are implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or combinations thereof. In at least one embodiment, the code is stored on a computer-readable storage medium, for example in the form of a computer program comprising a plurality of instructions executable by one or more processors. In at least one embodiment, the computer-readable storage medium is a non-transitory computer-readable storage medium that excludes transitory signals (e.g., a propagating transient electric or electromagnetic transmission) but includes non-transitory data storage circuitry (e.g., buffers, caches, and queues) within transceivers of transitory signals. In at least one embodiment, code (e.g., executable code or source code) is stored on a set of one or more non-transitory computer-readable storage media (or other memory for storing executable instructions) that, when executed (i.e., as a result of being executed) by one or more processors of a computer system, cause the computer system to perform operations described herein. In at least one embodiment, the set of non-transitory computer-readable storage media comprises multiple non-transitory computer-readable storage media, and one or more of the individual non-transitory storage media lacks all of the code, while the multiple non-transitory computer-readable storage media collectively store all of the code.
In at least one embodiment, executable instructions are executed such that different instructions are executed by different processors.
Thus, in at least one embodiment, a computer system is configured to implement one or more services that individually or collectively perform the operations of the processes described herein, and such computer system is configured with suitable hardware and/or software that enables the operations to be performed. Further, a computer system implementing at least one embodiment of the present disclosure is a single device, and in another embodiment is a distributed computer system, comprising a plurality of devices that operate differently, such that the distributed computer system performs the operations described herein, and such that a single device does not perform all of the operations.
The use of any and all examples, or exemplary language (e.g., "such as") provided herein, is intended merely to better illuminate embodiments of the disclosure and does not pose a limitation on the scope of the disclosure unless otherwise claimed. No language in the specification should be construed as indicating any non-claimed element as essential to the practice of the disclosure.
All references, including publications, patent applications, and patents, cited herein are hereby incorporated by reference to the same extent as if each reference were individually and specifically indicated to be incorporated by reference and were set forth in its entirety herein.
In the description and claims, the terms "coupled" and "connected," along with their derivatives, may be used. It should be understood that these terms may not be intended as synonyms for each other. Rather, in particular examples, "connected" or "coupled" may be used to indicate that two or more elements are in direct or indirect physical or electrical contact with each other. "coupled" may also mean that two or more elements are not in direct contact with each other, but yet still co-operate or interact with each other.
Unless specifically stated otherwise, it is appreciated that throughout the description, terms such as "processing," "computing," "calculating," "determining," or the like, refer to the action and/or processes of a computer or computing system, or similar electronic computing device, that manipulates and/or transforms data represented as physical quantities (e.g., electronic) within the computing system's registers and/or memories into other data similarly represented as physical quantities within the computing system's memories, registers or other such information storage, transmission or display devices.
In a similar manner, the term "processor" may refer to any device or portion of a device that processes electronic data from registers and/or memory and converts the electronic data into other electronic data that may be stored in registers and/or memory. As non-limiting examples, a "processor" may be a network device, NIC, or accelerator. A "computing platform" may include one or more processors. As used herein, a "software" process may include, for example, software and/or hardware entities that perform work over time, such as tasks, threads, and intelligent agents. Also, each process may refer to multiple processes to execute instructions sequentially or in parallel, continuously or intermittently. In at least one embodiment, the terms "system" and "method" are used interchangeably herein as long as the system can embody one or more methods, and the methods can be considered as systems.
In this document, reference may be made to obtaining, acquiring, receiving, or inputting analog or digital data into a subsystem, computer system, or computer-implemented machine. In at least one embodiment, the process of obtaining, acquiring, receiving, or inputting analog and digital data can be accomplished in a variety of ways, such as by receiving data as a parameter of a function call or a call to an application programming interface. In at least one embodiment, the process of obtaining, acquiring, receiving, or inputting analog or digital data can be accomplished by transferring the data via a serial or parallel interface. In at least one embodiment, the process of obtaining, acquiring, receiving, or inputting analog or digital data can be accomplished by transferring the data via a computer network from the providing entity to the acquiring entity. In at least one embodiment, reference may also be made to providing, outputting, transmitting, sending, or presenting analog or digital data. In various examples, the process of providing, outputting, transmitting, sending, or presenting analog or digital data can be accomplished by transferring the data as an input or output parameter of a function call, a parameter of an application programming interface, or an interprocess communication mechanism.
While the description herein sets forth example embodiments of the described technology, other architectures may be used to implement the described functionality and are intended to fall within the scope of the present disclosure. Furthermore, while specific assignments of responsibilities are defined above for purposes of description, various functions and responsibilities may be assigned and divided in different ways, as the case may be.
Furthermore, although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter claimed in the appended claims is not necessarily limited to the specific features or acts described. Rather, the specific features and acts are disclosed as example forms of implementing the claims.

Claims (23)

1. A network device, comprising:
a hardware pipeline for processing network packets to be encrypted, wherein a portion of the hardware pipeline is for retrieving information from the network packets and generating commands based on the information;
a block cipher circuit, the block cipher circuit being coupled inline within the hardware pipeline; and
wherein the hardware pipeline includes a set of hardware engines coupled between the portion of the hardware pipeline and the block cipher circuit, the set of hardware engines to:
parse and execute the command to determine a set of inputs; and
input the set of inputs and a portion of the network packet to the block cipher circuit; and
wherein the block cipher circuitry is to encrypt payload data of the network packet based on the set of inputs.
2. The network device of claim 1, wherein the set of hardware engines is further to determine an encryption offset for a first byte of the payload data within the network packet, and wherein the set of inputs includes the encryption offset.
3. The network device of claim 2, wherein the set of hardware engines is to determine the encryption offset from a combination of length fields from the network packet.
4. The network device of claim 1, further comprising an interface coupled to the hardware pipeline and the set of hardware engines, wherein to determine the set of inputs, the set of hardware engines accesses the interface based on a string of the command.
5. The network device of claim 1, wherein the set of hardware engines is further to:
determine, based on a length of the payload data, a tail offset for a tail position of the network packet where an integrity check value is located, wherein the set of inputs includes the tail offset; and
wherein the block cipher circuit is further to:
generate the integrity check value to authenticate the payload data; and
append the integrity check value to the network packet according to the tail offset.
6. The network device of claim 5, wherein the set of hardware engines is further to one of: add additional data bytes to, or remove additional data bytes from, the end of the payload data to provide space for the integrity check value, wherein the hardware pipeline further comprises a post-processing hardware engine that overwrites the end of the network packet with the integrity check value.
7. The network device of claim 1, wherein the set of hardware engines is further to access a pointer to a register from which a payload encryption key is retrieved, the set of inputs including the pointer, and wherein the block cipher circuitry is to encrypt the payload data using the payload encryption key.
8. The network device of claim 1, wherein the set of hardware engines is further to determine a pointer to a register from which to retrieve a header encryption key, wherein the hardware pipeline further comprises a post-processing hardware engine to encrypt a header of the network packet using the header encryption key, wherein the encrypted header provides integrity to the network packet as a whole.
9. The network device of claim 1, wherein the set of hardware engines is further to:
construct an initialization vector from a combination of a sequence number of the packet, a salt value, and an input from a header of the network packet, the initialization vector being included in the set of inputs; and
determine additional authenticated data, included in the set of inputs, as a concatenated stream of bytes selected from at least one of a header of the network packet, a security context, and a set of most significant bits of the sequence number of the network packet.
10. The network device of claim 1, wherein the set of inputs is specific to a cryptographic protocol selected from a set of cryptographic protocols.
11. The network device of claim 1, further comprising a programmable core integrated with the hardware pipeline, wherein a portion of the commands are generated based on processing performed by the programmable core.
12. A network device, comprising:
a hardware pipeline for processing encrypted network packets, wherein the hardware pipeline comprises:
a first portion for decrypting a header of the network packet; and
a second portion for retrieving information from the decrypted header and generating a command based on the information;
a block cipher circuit, the block cipher circuit being coupled inline within the hardware pipeline; and
wherein the hardware pipeline includes a set of hardware engines coupled between the second portion and the block cipher circuit, the set of hardware engines to:
parse and execute the command to determine a set of inputs; and
input the set of inputs and a portion of the network packet to the block cipher circuit; and
wherein the block cipher circuit is to decrypt payload data of the network packet based on the set of inputs.
13. The network device of claim 12, wherein the set of hardware engines is further to determine a decryption offset for a first byte of the payload data within the network packet, and wherein the set of inputs includes the decryption offset.
14. The network device of claim 13, wherein the set of hardware engines is to determine the decryption offset from a combination of length fields from the network packet.
15. The network device of claim 12, further comprising an interface coupled to the hardware pipeline and the set of hardware engines, wherein to determine the set of inputs, the set of hardware engines accesses the interface based on a string of the command.
16. The network device of claim 12, wherein the set of hardware engines is further to:
determine, based on a length of the payload data, a tail offset for a tail position of the network packet where an integrity check value is located, wherein the set of inputs includes the tail offset; and
wherein the block cipher circuit is further to:
retrieve the integrity check value using the tail offset; and
authenticate the payload data based on the integrity check value.
17. The network device of claim 16, further comprising a tail removal engine coupled to the block cipher circuit, the tail removal engine to remove a tail of the network packet that includes the integrity check value.
18. The network device of claim 12, wherein the set of hardware engines is further to determine a pointer to a register from which to retrieve a payload decryption key, the set of inputs including the pointer, and wherein the block cipher circuit is to decrypt the payload data using the payload decryption key.
19. The network device of claim 12, wherein the set of hardware engines is further to:
construct an initialization vector from a combination of a sequence number of the packet, a salt value, and an input from a header of the network packet, the initialization vector being included in the set of inputs; and
determine additional authenticated data, included in the set of inputs, as a concatenated stream of bytes selected from at least one of a header of the network packet, a security context, and a set of most significant bits of the sequence number of the network packet.
20. The network device of claim 12, further comprising a programmable core integrated with the hardware pipeline, wherein a portion of the commands are generated based on processing performed by the programmable core.
21. The network device of claim 12, wherein the set of inputs is specific to a cryptographic protocol selected from a set of cryptographic protocols.
22. A method, comprising:
processing, by a hardware pipeline, a network packet to be encrypted, wherein the processing comprises:
retrieving information from the network packet; and
generating a command based on the information;
parsing, by a set of hardware engines of the hardware pipeline, the command to determine a set of inputs;
inputting the set of inputs and a portion of the network packet to a block cipher circuit positioned inline within the hardware pipeline; and
encrypting, by the block cipher circuit, payload data of the network packet based on the set of inputs.
23. The method of claim 22, further comprising: determining an encryption offset of a first byte of the payload data within the network packet, wherein the set of inputs includes the encryption offset.
CN202311444394.5A 2022-11-02 2023-11-01 Flexible cryptographic architecture in a network device Pending CN117997514A (en)

Applications Claiming Priority (3)

Application Number Priority Date Filing Date Title
IL297,897 2022-11-02
US18/195,615 US20240146703A1 (en) 2022-11-02 2023-05-10 Flexible cryptographic architecture in a network device
US18/195,615 2023-05-10

Publications (1)

Publication Number Publication Date
CN117997514A true CN117997514A (en) 2024-05-07

Family

ID=90893469

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311444394.5A Pending CN117997514A (en) 2022-11-02 2023-11-01 Flexible cryptographic architecture in a network device

Country Status (1)

Country Link
CN (1) CN117997514A (en)

Similar Documents

Publication Publication Date Title
US10785020B2 (en) Hardware offload for QUIC connections
US7392399B2 (en) Methods and systems for efficiently integrating a cryptographic co-processor
US11658803B2 (en) Method and apparatus for decrypting and authenticating a data record
US11757973B2 (en) Technologies for accelerated HTTP processing with hardware acceleration
WO2017045484A1 (en) Xts-sm4-based storage encryption and decryption method and apparatus
Pismenny et al. Autonomous NIC offloads
US11750403B2 (en) Robust state synchronization for stateful hash-based signatures
CN112152783A (en) Low-latency post-quantum signature verification for fast secure boot
US20230096233A1 (en) Chosen-plaintext secure cryptosystem and authentication
Roth et al. Classic McEliece implementation with low memory footprint
US10630760B2 (en) Adaptive encryption in checkpoint recovery of file transfers
US11082411B2 (en) RDMA-based data transmission method, network interface card, server and medium
CN114978676B (en) Data packet encryption and decryption method and system based on FPGA and eBPF cooperation
Niederhagen et al. Streaming SPHINCS+ for embedded devices using the example of TPMs
CN103649935A (en) Method and system for cryptographic processing core
US20240146703A1 (en) Flexible cryptographic architecture in a network device
CN117997514A (en) Flexible cryptographic architecture in a network device
Kiningham et al. CESEL: Securing a Mote for 20 Years.
JP5149863B2 (en) Communication device and communication processing method
US20240031127A1 (en) Lightweight side-channel protection for polynomial multiplication in post-quantum signatures
US20230283452A1 (en) Method and apparatus supporting tunable alignment for cipher/authentication implementations
CN115516454B (en) Hardware security module and system
US11943367B1 (en) Generic cryptography wrapper
US20240031140A1 (en) Efficient low-overhead side-channel protection for polynomial multiplication in post-quantum encryption
US20230412376A1 (en) Middlebox visibility for post quantum kem

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination