CN114296929A - High-speed encryption and decryption method based on replaceable bounded lock-free queue and cipher machine - Google Patents


Info

Publication number: CN114296929A
Application number: CN202111641481.0A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: bounded, data, lock-free queue, encryption
Inventors: 杨旸, 王明华, 胡冲
Current and original assignee: Chengdu 30rtom Mobile Communication Co ltd (assignee listing has not been legally verified)
Application filed by: Chengdu 30rtom Mobile Communication Co ltd
Priority and filing date: 2021-12-29 (priority to CN202111641481.0A)
Publication date: 2022-04-08 (publication of CN114296929A)
Legal status: Pending (the legal status is an assumption and not a legal conclusion; no legal analysis has been performed)

Landscapes

  • Storage Device Security (AREA)

Abstract

The invention provides a high-speed encryption and decryption method based on a replaceable bounded lock-free queue. A cipher machine receives data packets through a data receiving thread and stores them in a first bounded circular lock-free queue; multiple encryption/decryption threads concurrently take data out of the first bounded circular lock-free queue, obtain keys from a keystore, and perform the encryption or decryption computation, storing the processed data in a second bounded circular lock-free queue; the cipher machine then takes the data out of the second bounded circular lock-free queue through a data sending thread and sends it outward. The scheme greatly improves the execution efficiency of encryption and decryption, increases the overall processing capacity of the cipher machine system, solves the problem of multi-channel parallel encryption, and meets application requirements for high-speed, low-latency encryption and decryption.

Description

High-speed encryption and decryption method based on replaceable bounded lock-free queue and cipher machine
Technical Field
The invention relates to the field of encryption and decryption, in particular to a high-speed encryption and decryption method and a cipher machine based on replaceable bounded lock-free queues.
Background
In server cipher machine solutions in the information security field, data security and system efficiency must coexist; under the same security mechanism, high encryption/decryption throughput and low latency become the key indicators of product performance. To build a high-performance, low-latency cipher machine program, engineers design a multi-task, multi-threaded architecture in which the task threads performing cryptographic operations execute independently and complete the same cryptographic processing task concurrently. When multiple threads of a concurrent program write to the same resource, complex and expensive coordination is required, usually implemented through some kind of lock. Locks are in fact expensive because they require arbitration when contended. This arbitration involves a context switch by the operating system, which suspends all threads waiting on the lock until the lock holder releases it. During a context switch the running thread can be preempted and lose the data and instructions it had cached in its execution context, which causes a significant performance loss on modern processors.
Besides locks, another approach is CAS (compare-and-swap). CAS relies on processor support, which most modern processors provide. Compared with a lock, CAS is very efficient because it does not require arbitration through a kernel context switch. But CAS is not free: the processor needs to lock its instruction pipeline to guarantee atomicity, and a memory fence is needed to guarantee visibility to other threads.
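For illustration only (this sketch is not part of the patent text), the following Java fragment shows the CAS retry pattern described above using java.util.concurrent.atomic.AtomicLong; the class and method names are chosen for the example:

```java
import java.util.concurrent.atomic.AtomicLong;

// Illustrative CAS retry loop: the hardware compare-and-swap replaces a
// kernel-arbitrated lock, so no context switch is needed on contention.
public class CasCounter {
    private final AtomicLong value = new AtomicLong(0);

    public long increment() {
        long current;
        long next;
        do {
            current = value.get();   // read the current value
            next = current + 1;      // compute the new value
            // compareAndSet succeeds only if no other thread changed the
            // value in between; otherwise the loop simply retries.
        } while (!value.compareAndSet(current, next));
        return next;
    }
}
```

Even this lock-free loop pays the costs noted above: the compareAndSet call implies an atomic instruction and the associated memory fences.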
Modern processors may reorder instructions for higher performance, and instruction execution, data loading and storing take place between memory and the execution units. A memory fence, also called a memory barrier, is a CPU instruction that: a) enforces the order in which certain operations are performed; and b) affects the visibility of certain data. The compiler and the CPU may reorder instructions as long as the output stays the same, in order to optimize performance. Inserting a memory barrier tells the CPU and the compiler that the operations before the barrier must complete before those after it. Another role of the memory barrier is to force the caches of different CPUs to be updated: for example, a write barrier flushes the data written before the barrier out to the cache, so that any thread trying to read that data will get the latest value, regardless of which CPU core executes it. A memory barrier is only an instruction visible to the other CPUs and does not carry as much overhead as a lock, because the kernel does not have to intervene and schedule among the threads. However, a memory barrier still has a cost: the compiler and CPU cannot reorder instructions across it, which reduces CPU efficiency, and flushing the cache is itself an overhead.
Disclosure of Invention
To address the problems in the prior art, a high-speed encryption and decryption method and a cipher machine based on a replaceable bounded lock-free queue are provided. They solve the problem of multi-thread parallel encryption in a server cipher machine system program and the problem of poor performance of traditional queues in multi-threaded programs, improve the overall encryption/decryption processing capacity, and meet the application requirements of high-speed, low-latency encryption and decryption for a server cipher machine.
The technical scheme adopted by the invention is as follows: in a high-speed encryption and decryption method based on a replaceable bounded lock-free queue, a cipher machine receives data packets through a data receiving thread and stores them in a first bounded circular lock-free queue; multiple encryption/decryption threads concurrently take data out of the first bounded circular lock-free queue, obtain keys from a keystore, and perform encryption or decryption computations, storing the processed data in a second bounded circular lock-free queue; and the cipher machine takes the data out of the second bounded circular lock-free queue through a data sending thread and sends it outward.
Further, all encryption and decryption threads have the same cryptographic processing logic.
Further, the bounded circular lock-free queue adopts a barrier plus sequence number mechanism to coordinate the receiving thread and the sending thread.
Further, the bounded circular lock-free queue is implemented as a pre-allocated bounded data structure in the form of a ring buffer; one or more receiving threads write data into the bounded circular lock-free queue, and one or more sending threads read data from it. After each read or write, the corresponding pointer is advanced as p = (p + 1) % n, where n is the length of the bounded circular lock-free queue.
The invention also provides a high-speed cipher machine based on a replaceable bounded lock-free queue, comprising a data receiving module, a data sending module, N encryption/decryption modules, a keystore, a first bounded circular lock-free queue and a second bounded circular lock-free queue;
the data receiving module is used for receiving data packets and storing them in the first bounded circular lock-free queue;
the N encryption/decryption modules are used for taking data out of the first bounded circular lock-free queue, performing encryption or decryption operations according to a key, and storing the processed data in the second bounded circular lock-free queue;
the data sending module is used for taking the encrypted or decrypted data out of the second bounded circular lock-free queue and sending it outside the cipher machine;
and the keystore is used for storing the keys provided to the encryption/decryption modules.
Further, the cipher machine also comprises a key negotiation module for generating keys and storing them in the keystore.
Furthermore, the N encryption and decryption modules have the same cryptographic operation processing logic.
Further, the first bounded circular lock-free queue and the second bounded circular lock-free queue adopt a barrier plus sequence number mechanism to coordinate the receiving and sending threads.
Further, the bounded circular lock-free queue is implemented as a pre-allocated bounded data structure in the form of a ring buffer; one or more receiving threads write data into the bounded circular lock-free queue, and one or more sending threads read data from it. After each read or write, the corresponding pointer is advanced as p = (p + 1) % n, where n is the length of the bounded circular lock-free queue.
Compared with the prior art, the beneficial effects of this technical scheme are as follows: the scheme provided by the invention makes full use of the multi-core nature of the CPU to process the cryptographic operations on each data packet concurrently, greatly improves the execution efficiency of the application program, increases the overall processing capacity of the cipher machine system, solves the problem of multi-channel parallel encryption, and meets the application requirements of high-speed, low-latency encryption and decryption.
Drawings
FIG. 1 is a diagram of high-speed encryption and decryption based on a replaceable bounded lock-free queue according to the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
Example 1
As shown in fig. 1, this embodiment provides a high-speed encryption and decryption method based on a replaceable bounded lock-free queue. A cipher machine receives data packets through a data receiving thread and stores them in a first bounded circular lock-free queue; multiple encryption/decryption threads concurrently take data out of the first bounded circular lock-free queue, obtain keys from the keystore, and perform the encryption or decryption computation, storing the processed data in a second bounded circular lock-free queue; the cipher machine then takes the data out of the second bounded circular lock-free queue through a data sending thread and sends it outward. The number of encryption/decryption threads equals the number of CPU cores.
All encryption/decryption threads have the same cryptographic processing logic, so the cryptographic operation on any packet can be handled by any thread; the workload therefore stays in dynamic balance across the threads, realizing parallel cryptographic operation in the cipher machine.
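The following Java sketch illustrates the thread and queue layout of this embodiment. It is a simplified, assumption-laden example: ArrayBlockingQueue stands in for the bounded circular lock-free queue so the sketch stays short, and receive(), send() and encryptOrDecrypt() are hypothetical placeholders rather than the patent's actual interfaces.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative pipeline only: ArrayBlockingQueue stands in for the bounded
// circular lock-free queue; capacities and hook methods are assumptions.
public class CryptoPipeline {
    private final BlockingQueue<byte[]> inbound = new ArrayBlockingQueue<>(1024);
    private final BlockingQueue<byte[]> outbound = new ArrayBlockingQueue<>(1024);

    public void start() {
        // One data receiving thread stores incoming packets in the first queue.
        startLoop(() -> inbound.put(receive()));

        // One encryption/decryption worker per CPU core; every worker runs the
        // same logic, so any packet may be processed by any thread.
        int cores = Runtime.getRuntime().availableProcessors();
        for (int i = 0; i < cores; i++) {
            startLoop(() -> outbound.put(encryptOrDecrypt(inbound.take())));
        }

        // One data sending thread drains the second queue and sends packets out.
        startLoop(() -> send(outbound.take()));
    }

    @FunctionalInterface
    private interface Step {
        void run() throws InterruptedException;
    }

    private static void startLoop(Step step) {
        Thread t = new Thread(() -> {
            try {
                while (true) {
                    step.run();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        t.setDaemon(true);
        t.start();
    }

    // Hypothetical hooks, not defined by the patent text.
    private byte[] receive() { return new byte[0]; }
    private void send(byte[] packet) { }
    private byte[] encryptOrDecrypt(byte[] packet) { return packet; }
}
```

In the actual scheme the two queues would be the replaceable bounded circular lock-free queues described below rather than lock-based blocking queues.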
This embodiment uses the replaceable bounded circular lock-free queue technique, an efficient inter-thread data exchange component. It uses a barrier plus sequence number mechanism to coordinate the receiving and sending threads, avoiding both locks and CAS, and combines pre-allocated memory, cache-line awareness and batching to achieve high throughput and low latency. A traditional unbounded locking queue depends on the underlying operating system, which makes it slow, raises message processing delay and easily causes severe delay jitter; moreover, because such a memory queue is unbounded, it can grow out of control under heavy load until a large amount of memory is consumed and serious errors occur. In contrast, the bounded circular lock-free queue has less write contention, lower concurrency overhead, cache-friendly access and lower delay jitter.
Specifically, the bounded circular lock-free queue is implemented as a pre-allocated bounded data structure in the form of a ring buffer; one or more receiving threads write data into it and one or more sending threads read data from it. The receiving thread and the sending thread each hold a pointer: the sending thread's pointer points to the next slot to be read, and the receiving thread's pointer points to the next slot to be filled. After each read or write, the corresponding pointer is advanced as p = (p + 1) % n, where n is the length of the bounded circular lock-free queue, so the pointers repeatedly walk around the ring.
Because the receiving thread and the sending thread each operate on their own pointer, no lock is needed, and the absence of locks and contention makes the lock-free queue very fast. Unlike a conventional queue, the bounded lock-free queue is backed by an array, which is faster than a linked list and has an easily predictable access pattern. This is CPU-cache friendly: at the hardware level, array elements are preloaded, so the CPU does not have to fetch the next element of the ring buffer from main memory each time. Second, the cipher machine program can pre-allocate the memory for the array so that the array objects live for the whole run, which means little time is spent on garbage collection. By contrast, a linked list must create a node object for every element added to it, and a corresponding memory clean-up is needed when a node is deleted. The replaceable bounded lock-free queue never deletes data from the ring buffer; data simply stays in the ring buffer until new data replaces and overwrites it.
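As a minimal sketch of such a queue, the following Java class pre-allocates the slot array and coordinates one receiving thread and one sending thread purely through volatile sequence numbers (the barrier plus sequence number idea). It is an illustrative single-producer, single-consumer version, not the patent's implementation; supporting several threads on either side would require additional sequencing.

```java
// Minimal single-producer, single-consumer sketch of a bounded circular
// lock-free queue; names and layout are illustrative, not the patent's code.
public class BoundedRingQueue<E> {
    private final Object[] slots;        // pre-allocated ring buffer, never resized
    private final int n;                 // queue length
    private volatile long writeSeq = 0;  // next slot the receiving thread will fill
    private volatile long readSeq = 0;   // next slot the sending thread will read

    public BoundedRingQueue(int capacity) {
        this.n = capacity;
        this.slots = new Object[capacity];  // all memory is allocated once, up front
    }

    /** Called only by the single receiving (writer) thread. */
    public boolean offer(E value) {
        long w = writeSeq;
        if (w - readSeq == n) {
            return false;                   // queue full: never overwrite unread data
        }
        slots[(int) (w % n)] = value;       // wrap-around index, p = (p + 1) % n
        writeSeq = w + 1;                   // volatile write publishes the slot (write barrier)
        return true;
    }

    /** Called only by the single sending (reader) thread. */
    @SuppressWarnings("unchecked")
    public E poll() {
        long r = readSeq;
        if (r == writeSeq) {
            return null;                    // queue empty
        }
        E value = (E) slots[(int) (r % n)]; // data is not deleted, only overwritten later
        readSeq = r + 1;                    // volatile write lets the writer reuse the slot
        return value;
    }
}
```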
The bounded circular lock-free queue relies on a single-writer mode together with memory barriers. The cipher machine application is written in Java, where declaring a field volatile makes the Java memory model insert a write barrier instruction after each write to it and a read barrier instruction before each read of it. The pointer (cursor) of the circular lock-free queue is exactly such a volatile variable, which is what allows the queue to operate without a lock. However, a memory barrier still has a cost, so the replaceable bounded circular lock-free queue implementation minimizes how often the volatile variable is read and written.
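A sketch of this idea, with illustrative names only: the producer fills several slots with plain writes and publishes them all with one volatile write of the cursor, and the consumer reads the cursor once per drain, so the barrier cost is paid per batch rather than per element (overrun checks against the consumer's sequence are omitted here).

```java
import java.util.function.Consumer;

// Illustrative batching around a volatile cursor: plain writes stage the
// slots, one volatile write publishes the whole batch.
public class BatchedCursor {
    private final Object[] slots = new Object[1024];
    private volatile long cursor = -1;   // highest published sequence; volatile = memory barrier

    /** Producer side: stage a batch with plain writes, then publish once. */
    public void publishBatch(Object[] batch, long firstSeq) {
        for (int i = 0; i < batch.length; i++) {
            slots[(int) ((firstSeq + i) % slots.length)] = batch[i]; // plain writes, no barrier
        }
        cursor = firstSeq + batch.length - 1;   // single volatile write pays the barrier once
    }

    /** Consumer side: read the cursor once, then drain everything up to it. */
    public long drainUpTo(long nextSeq, Consumer<Object> handler) {
        long available = cursor;                 // single volatile read (read barrier)
        while (nextSeq <= available) {
            handler.accept(slots[(int) (nextSeq % slots.length)]);
            nextSeq++;
        }
        return nextSeq;                          // next sequence still to be consumed
    }
}
```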
The replaceable bounded circular lock-free queue also eliminates false sharing by padding out cache lines. When multiple threads modify variables that are logically independent of each other, they can still unintentionally hurt each other's performance if those variables share the same cache line; this is false sharing. Write contention on a cache line is the most important factor limiting the scalability of parallel threads running on an SMP system, and false sharing is a silent performance killer. The replaceable bounded circular lock-free queue adds padding to ensure that the ring buffer's sequence number never shares a cache line with anything else, so there are no unnecessary cache misses.
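A commonly used Java padding sketch for such a sequence number is shown below. The field layout assumes a 64-byte cache line, and the padding is split across a small class hierarchy because the JVM may otherwise reorder or strip unused fields within a single class; this is an illustration of the padding idea, not the patent's code.

```java
// Padding before the hot field: 7 longs = 56 bytes.
class LhsPadding {
    protected long p1, p2, p3, p4, p5, p6, p7;
}

// The hot field itself: the ring buffer's sequence number.
class Value extends LhsPadding {
    protected volatile long value;
}

// Padding after the hot field, so later-allocated neighbours cannot share its line.
class RhsPadding extends Value {
    protected long p9, p10, p11, p12, p13, p14, p15;
}

// Each thread updates its own PaddedSequence without falsely sharing a cache
// line with another thread's sequence.
public class PaddedSequence extends RhsPadding {
    public long get() {
        return value;
    }

    public void set(long v) {
        value = v;
    }
}
```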
Example 2
This embodiment provides a high-speed cipher machine based on a replaceable bounded lock-free queue, comprising a data receiving module, a data sending module, N encryption/decryption modules, a keystore, a first bounded circular lock-free queue and a second bounded circular lock-free queue;
the data receiving module is used for receiving data packets and storing them in the first bounded circular lock-free queue;
the N encryption/decryption modules are used for taking data out of the first bounded circular lock-free queue, performing encryption or decryption operations according to a key, and storing the processed data in the second bounded circular lock-free queue;
the data sending module is used for taking the encrypted or decrypted data out of the second bounded circular lock-free queue and sending it outside the cipher machine;
and the keystore is used for storing the keys provided to the encryption/decryption modules.
In this embodiment, the cipher machine further includes a key negotiation module for generating keys and storing them in the keystore.
Furthermore, the N encryption and decryption modules have the same cryptographic operation processing logic.
The bounded circular lock-free queue technique is an efficient inter-thread data exchange component: it uses a barrier plus sequence number mechanism to coordinate the receiving and sending threads, avoiding both locks and CAS, and combines pre-allocated memory, cache-line awareness and batching to achieve high throughput and low latency. A traditional unbounded locking queue depends on the underlying operating system, which makes it slow, raises message processing delay and easily causes severe delay jitter; moreover, because such a memory queue is unbounded, it can grow out of control under heavy load until a large amount of memory is consumed and serious errors occur. In contrast, the bounded circular lock-free queue has less write contention, lower concurrency overhead, cache-friendly access and lower delay jitter.
Further, the bounded circular lock-free queue is implemented as a pre-allocated bounded data structure in the form of a ring buffer; one or more receiving threads write data into it and one or more sending threads read data from it. After each read or write, the corresponding pointer is advanced as p = (p + 1) % n, where n is the length of the bounded circular lock-free queue. The pointers of the receiving and sending threads repeatedly walk around the ring, and the bounded lock-free queue never deletes data from the ring buffer; data stays in the ring buffer until new data replaces and overwrites it.
The invention is not limited to the foregoing embodiments. The invention extends to any novel feature or any novel combination of features disclosed in this specification and any novel method or process steps or any novel combination of features disclosed. Those skilled in the art to which the invention pertains will appreciate that insubstantial changes or modifications can be made without departing from the spirit of the invention as defined by the appended claims.
All of the features disclosed in this specification, or all of the steps in any method or process so disclosed, may be combined in any combination, except combinations of features and/or steps that are mutually exclusive.
Any feature disclosed in this specification may be replaced by alternative features serving equivalent or similar purposes, unless expressly stated otherwise. That is, unless expressly stated otherwise, each feature is only an example of a generic series of equivalent or similar features.

Claims (9)

1. A high-speed encryption and decryption method based on a replaceable bounded lock-free queue, applied to a cipher machine, characterized in that the cipher machine receives data packets through a data receiving thread and stores them in a first bounded circular lock-free queue; multiple encryption/decryption threads concurrently take data out of the first bounded circular lock-free queue, obtain keys from a keystore, and perform encryption or decryption computations, storing the processed data in a second bounded circular lock-free queue; and the cipher machine takes the data out of the second bounded circular lock-free queue through a data sending thread and sends it outward.
2. The method of claim 1, wherein all encryption and decryption threads have the same cryptographic processing logic.
3. The high-speed encryption and decryption method based on a replaceable bounded lock-free queue according to claim 1 or 2, wherein the bounded circular lock-free queue coordinates the reading and writing threads by a barrier plus sequence number mechanism.
4. The high-speed encryption and decryption method based on a replaceable bounded lock-free queue according to claim 1, wherein the bounded circular lock-free queue is implemented as a pre-allocated bounded data structure in the form of a ring buffer, one or more receiving threads write data into the bounded circular lock-free queue, and one or more sending threads read data from it; the data in the queue is placed in slots, the receiving thread and the sending thread each hold a pointer, the sending thread's pointer points to the next slot to be read and the receiving thread's pointer points to the next slot to be filled, and after each read or write the corresponding pointer is advanced as p = (p + 1) % n, where n is the length of the bounded circular lock-free queue.
5. A high-speed cipher machine based on a replaceable bounded lock-free queue, characterized by comprising a data receiving module, a data sending module, N encryption/decryption modules, a keystore, a first bounded circular lock-free queue and a second bounded circular lock-free queue;
the data receiving module is used for receiving data packets and storing them in the first bounded circular lock-free queue;
the N encryption/decryption modules are used for taking data out of the first bounded circular lock-free queue, performing encryption or decryption operations according to a key, and storing the processed data in the second bounded circular lock-free queue;
the data sending module is used for taking the encrypted or decrypted data out of the second bounded circular lock-free queue and sending it outside the cipher machine;
and the keystore is used for storing the keys provided to the encryption/decryption modules.
6. The high-speed cipher machine based on a replaceable bounded lock-free queue according to claim 5, further comprising a key negotiation module for generating keys and storing them in the keystore.
7. The high-speed cipher machine based on a replaceable bounded lock-free queue according to claim 5 or 6, wherein the N encryption/decryption modules have the same cryptographic processing logic.
8. The high-speed cipher machine based on a replaceable bounded lock-free queue according to claim 5, wherein the first and second bounded circular lock-free queues use a barrier plus sequence number mechanism to coordinate the receiving and sending threads.
9. The high-speed cipher machine based on a replaceable bounded lock-free queue according to claim 5, wherein the bounded circular lock-free queue is implemented as a pre-allocated bounded data structure in the form of a ring buffer, one or more receiving threads write data into the bounded circular lock-free queue, and one or more sending threads read data from it; after each read or write, the corresponding pointer is advanced as p = (p + 1) % n, where n is the length of the bounded circular lock-free queue.
Application CN202111641481.0A, filed 2021-12-29 (priority date 2021-12-29): High-speed encryption and decryption method based on replaceable bounded lock-free queue and cipher machine. Status: Pending. Publication: CN114296929A (en).

Priority Applications (1)

Application Number: CN202111641481.0A
Priority Date: 2021-12-29
Filing Date: 2021-12-29
Title: High-speed encryption and decryption method based on replaceable bounded lock-free queue and cipher machine


Publications (1)

Publication Number: CN114296929A
Publication Date: 2022-04-08

Family

ID=80970637

Family Applications (1)

Application Number: CN202111641481.0A
Title: High-speed encryption and decryption method based on replaceable bounded lock-free queue and cipher machine
Priority Date: 2021-12-29
Filing Date: 2021-12-29
Status: Pending (published as CN114296929A)

Country Status (1)

Country: CN
Publication: CN114296929A (en)


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination