CN117667783A - High-concurrency high-throughput non-waiting annular queue data storage method - Google Patents


Info

Publication number
CN117667783A
CN117667783A (application CN202311676752.5A)
Authority
CN
China
Prior art keywords: data, index, throughput, concurrency, waiting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311676752.5A
Other languages
Chinese (zh)
Inventor
李珂
匡乃亮
杨启迪
罗迒哉
钟升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Microelectronics Technology Institute
Original Assignee
Xian Microelectronics Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Microelectronics Technology Institute filed Critical Xian Microelectronics Technology Institute
Priority: CN202311676752.5A
Publication: CN117667783A
Legal status: Pending

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a high-concurrency, high-throughput wait-free ring-queue data storage method, belonging to the field of computer data structures, which comprises the following steps: step 1, initialize the ring queue structure; step 2, obtain the current position index through an atomic operation and a bit operation; step 3, judge whether the storage content pointed to by the index satisfies the read/write condition, and if yes, go to step 4, otherwise go to step 5; step 4, write or read the data; step 5, the operation fails and returns immediately, leaving the content pointed to by the index unmodified. Through this simple design, the throughput of access to the shared storage area is greatly improved, and synchronized multithreaded operation is supported stably. Because the invention adopts a ring queue, no additional storage space needs to be allocated at run time; its space complexity is low, so it can be applied to embedded devices with severely limited resources. The code is simple and highly readable, which facilitates later maintenance by users.

Description

High-concurrency high-throughput non-waiting annular queue data storage method
Technical Field
The invention belongs to the field of computer data structures, and in particular relates to a high-concurrency, high-throughput wait-free ring-queue data storage method.
Background
With the development of multi-core technology, designing efficient concurrent data structures has become urgent, especially for many-core microsystem applications. Traditional concurrency schemes rely on mutual-exclusion locks or spin locks, which carry large system overhead and readily lead to problems such as deadlock and priority inversion. To address these issues, non-blocking data structures have attracted wide attention; among them, wait-freedom provides the strongest non-blocking guarantee, ensuring that every thread completes any operation within a bounded number of steps.
The biggest problems with current wait-free techniques are design difficulty and excessive overhead: in most scenarios, performance keeps dropping as the level of "guarantee" rises to satisfy redundant design. The first cause is the key mechanism adopted by most wait-free algorithms, the "help mechanism". This mechanism governs how threads help one another complete operations; it typically produces complex algorithms and requires additional atomic operations, creating high contention and increasing overhead. Moreover, helping is performed in sequence — all concurrently running threads help other operations in exactly the same order — and this high redundancy yields a large number of inefficient operations. The second cause is the memory reclamation mechanism, which releases storage whose operations have all completed so as to provide theoretically unbounded storage; however, frequent allocation and release can, in extreme cases, cause out-of-bounds accesses and wild pointers, and as the number of writer threads grows the storage space eventually becomes insufficient.
In summary, prior-art wait-free queues carry a complex help mechanism and memory reclamation mechanism, which significantly reduce thread efficiency and markedly increase the chance of errors in extreme situations.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a high-concurrency, high-throughput wait-free ring-queue data storage method that addresses the shortcomings described in the background, ultimately meeting the real need for fast data synchronization among the cores of a microsystem.
In order to achieve the above purpose, the present invention provides the following technical solutions:
a high-concurrency, high-throughput wait-free ring-queue data storage method comprises the following steps:
step 1, initialize the ring queue structure;
step 2, obtain the current position index through an atomic operation and a bit operation;
step 3, judge whether the storage content pointed to by the index satisfies the read/write condition; if yes, go to step 4, otherwise go to step 5;
step 4, write or read the data;
step 5, the operation fails: return immediately without modifying the content pointed to by the index.
Preferably, in step 1, variables are initialized: bytes are aligned to the machine instruction length, and after alignment the read index variable and the write index variable are initialized to zero.
Further, on a 32-bit machine the alignment is 32; on a 64-bit machine the alignment is 64.
Preferably, in step 1, the storage space is initialized, and the user defines the size of the ring buffer.
Preferably, in step 2, obtaining the current position index specifically comprises: moving the write index with a fetch_and_add atomic operation, which atomically adds 1 to the value stored in the atomic object and returns the previously stored value; meanwhile, because a ring buffer is used, the obtained index is taken modulo the buffer size to obtain the real index position quickly.
Preferably, in step 4, writing data specifically comprises: a special identifier is set at each position of the buffer, indicating that the position is empty and data can be stored there;
when writing data, the position pointed to by the index is compared with the special identifier; if it matches, storing the data is attempted with a compare_and_swap atomic operation, and if the atomic operation succeeds, true is returned. All other cases are treated as failure — another producer thread stored data at that position first — and false is returned immediately.
Preferably, in step 4, reading data specifically comprises: obtaining the position index by moving the read index with a fetch_and_add atomic operation, which atomically adds 1 to the index value and returns the current read index;
then checking whether the position holds the special identifier. If not, normal data is present: the data is fetched first, the position is reset to the special identifier, and finally true is returned. All other cases are treated as execution failure — there is no data — and false is returned.
Preferably, in step 4, the data types of the written or read data are the same.
Preferably, in step 4, one or more data writing threads and one data reading thread are provided.
Compared with the prior art, the invention has the following beneficial technical effects:
the invention provides a high-concurrency high-throughput non-waiting annular queue data storage method, which greatly improves the throughput of accessing a shared storage area through simple design and stably supports multi-thread synchronous operation. The invention completely avoids a help mechanism, improves the data throughput by 50% -100%, and can stably adapt to the application scene of high-speed data throughput; the invention adopts the annular queue, does not need to apply for the storage space additionally in operation, has low space complexity, and can be applied to embedded equipment with severely limited performance; the invention has low code complexity and high code readability, and is convenient for the later maintenance of users. The method overcomes the defects, and is particularly suitable for applications lacking supporting software environments in the embedded field of microsystems and the like. The invention has stable operation, and the performance index does not drop along with the rapid increase of the thread number.
Drawings
FIG. 1 is an application model diagram of the high-concurrency, high-throughput wait-free ring-queue data storage method of the present invention;
FIG. 2 is a performance comparison graph of the high-concurrency, high-throughput wait-free ring-queue data storage method of the present invention;
FIG. 3 is a detailed schematic diagram of the operation flow of the high-concurrency, high-throughput wait-free ring-queue data storage method of the present invention.
Detailed Description
Hereinafter, only certain exemplary embodiments are briefly described. As will be recognized by those of skill in the pertinent art, the described embodiments may be modified in various different ways without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive.
Examples
The invention was tested on a Linux system with an Intel(R) Core(TM) i5-6300HQ CPU @ 2.30GHz, with excellent experimental results.
The invention places no restriction on the type of the data written and read, but in actual operation all data must share one type: if the data contains integer (int) values, all data must be integers; if it contains character (char) values, all data must be characters.
When the program runs, multiple writer (data-storing) threads may be used, as determined by actual requirements; there is one and only one reader (data-fetching) thread. The data read/write process comprises the following steps:
s1: initializing variables, functions and spaces;
s2: acquiring a current position index through atomic operation and bit operation;
s3: judge whether the storage content pointed to by the index satisfies the read/write condition; if yes, go to S4, otherwise go to S5;
s4: writing or reading data;
s5: the operation fails; the content pointed to by the index is not modified, and the call returns immediately.
While the program runs, threads cannot interfere with one another, and every code path completes within a bounded number of steps and returns a result. The performance degradation caused by help mechanisms and memory reclamation mechanisms is therefore avoided, the application scenarios and range of the wait-free queue are broadened, and the queue's performance is a stable 50%–100% improvement over existing algorithmic queues.
The invention provides a new ring queue that supports a large number of concurrent threads in a multi-producer, single-consumer environment, meeting high-throughput requirements and fundamentally solving the problems introduced by the help mechanism and the memory reclamation mechanism, ultimately meeting the real need for fast data synchronization among the cores of a microsystem.
To achieve the above object, the design details of the ring queue are as follows, and the flowchart is shown in fig. 3:
Step 1, initialize the structure:
Step 1.1, initialize variables: bytes are aligned to the machine instruction length to increase the cache hit rate — 32 on a 32-bit machine and 64 on a 64-bit machine. After alignment, the read index variable (readIdx) and the write index variable (writeIdx) are initialized to zero.
Step 1.2, initialize the storage space: the user can define the size of the ring buffer (queue size), making it easy to adapt to various devices and widening the applicable range.
Step 2, enqueue operation (write):
step 2.1, obtaining a position index: first, the write index position is moved through the fetch_and_add atomic operation, the fetch_and_add atomic operation automatically increases the write index by 1, the operation returns to the current write index, and the data corresponding to the current write index position is obtained according to the current write index. Meanwhile, due to the adoption of the annular buffer area, the obtained current writing index and the size of the buffer area are subjected to residual taking operation, so that the real index position is obtained rapidly.
Step 2.2, data enqueuing: for faster execution, a special flag Fu is set at each location in the buffer, indicating that the location is empty and data can be stored. The first step is therefore to compare if the index location points to a special identifier Fu (resolving contention between producer threads and consumer threads), if so, try to store data via a compare_and_swap atomic operation (resolving contention among multiple producer threads), if the atomic operation is successful, return true; the other cases are regarded as failures, which means that other producer threads at the position store data first and return to false directly.
Step 3, dequeue operation (reading):
step 3.1, obtaining a position index: first, the read index position is moved through the fetch_and_add atomic operation, the index value is atomically and automatically incremented by 1, and the value of the current read index is returned. Here again only the most relaxed memory order related is needed. And performing remainder taking operation on the size queue of the buffer area to quickly acquire the real position.
Step 3.2, data dequeuing: checking whether the position points to a special identifier (T), if not, indicating that normal data need to be fetched, and because the consumer thread is only single, only needs to fetch the data first, resetting the position as a special identifier Fu, and finally returning true; the rest of the cases are regarded as execution failure, which means that there is no data, and return false.
Through this simple design, the invention greatly improves the throughput of access to the shared storage area and stably supports synchronized multithreaded operation (see the effect comparison in fig. 2):
the invention completely avoids a help mechanism, improves the data throughput by 50% -100%, and can stably adapt to the application scene of high-speed data throughput;
the invention adopts the annular queue, does not need to apply for the storage space additionally in operation, has low space complexity, and can be applied to embedded equipment with severely limited performance;
the invention has low code complexity and high code readability, and is convenient for the later maintenance of users. The method overcomes the defects, and is particularly suitable for applications lacking supporting software environments in the embedded field of microsystems and the like. The invention has stable operation, and the performance index does not drop along with the rapid increase of the thread number.
As shown in FIG. 1, each thread runs a task, and the data generated at the start and end of each task run must be aggregated and centrally analyzed in the host process.
In fig. 2, the abscissa is the number of threads and the ordinate is the data throughput (writes plus reads). Different algorithms were tested on the same hardware platform, each run for a fixed time, and their data throughput over that time was measured. As fig. 2 shows, the performance of the invention is 50% higher than the newest queue and 100% higher than the classical queue.
As shown in fig. 3, the figure details what may happen at run time and the corresponding returned results: false indicates the operation failed, and true indicates it succeeded.
Table 1 throughput data table
Table 2 code complexity analysis table
While the fundamental and principal features of the invention and advantages of the invention have been shown and described, it will be apparent to those skilled in the art that the invention is not limited to the details of the foregoing exemplary embodiments, but may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted merely for clarity, and the specification should be treated as a whole — the technical solutions in the embodiments may be combined appropriately to form other implementations understandable to those skilled in the art. The above merely illustrates the technical idea of the present invention and does not limit its scope of protection; any modification made on the basis of the technical solution according to the technical idea of the invention falls within the protection scope of the claims of the present invention.

Claims (9)

1. A high-concurrency, high-throughput wait-free ring-queue data storage method, characterized by comprising the following steps:
step 1, initialize the ring queue structure;
step 2, obtain the current position index through an atomic operation and a bit operation;
step 3, judge whether the storage content pointed to by the index satisfies the read/write condition; if yes, go to step 4, otherwise go to step 5;
step 4, write or read the data;
step 5, the operation fails: return immediately without modifying the content pointed to by the index.
2. The high-concurrency, high-throughput wait-free ring-queue data storage method of claim 1, wherein in step 1 variables are initialized: bytes are aligned to the machine instruction length, and after alignment the read index variable and the write index variable are initialized to zero.
3. The high-concurrency, high-throughput wait-free ring-queue data storage method of claim 2, wherein on a 32-bit machine the alignment is 32, and on a 64-bit machine the alignment is 64.
4. The high-concurrency, high-throughput wait-free ring-queue data storage method of claim 1, wherein in step 1 the storage space is initialized and the ring buffer size is customized by the user.
5. The high-concurrency, high-throughput wait-free ring-queue data storage method of claim 1, wherein in step 2 obtaining the current position index comprises: moving the write index with a fetch_and_add atomic operation, which atomically adds 1 to the value stored in the atomic object and returns the previously stored value; meanwhile, because a ring buffer is used, the obtained index is taken modulo the buffer size to obtain the real index position quickly.
6. The high-concurrency, high-throughput wait-free ring-queue data storage method of claim 1, wherein in step 4 writing data comprises: setting a special identifier at each position of the buffer, indicating that the position is empty and data can be stored there;
when writing data, comparing whether the indexed position holds the special identifier; if so, attempting to store the data with a compare_and_swap atomic operation and, if the atomic operation succeeds, returning true; treating all other cases as failure — another producer thread stored data at that position first — and returning false immediately.
7. The high-concurrency, high-throughput wait-free ring-queue data storage method of claim 1, wherein in step 4 reading data comprises: obtaining the position index by moving the read index with a fetch_and_add atomic operation, which atomically adds 1 to the index value and returns the current read index;
checking whether the position holds the special identifier; if not, normal data is present: fetching the data first, resetting the position to the special identifier, and finally returning true; treating all other cases as execution failure — there is no data — and returning false.
8. The high-concurrency, high-throughput wait-free ring-queue data storage method of claim 1, wherein in step 4 the written and read data are of the same type.
9. The high-concurrency, high-throughput wait-free ring-queue data storage method of claim 1, wherein in step 4 there are one or more data-writing threads and exactly one data-reading thread.
CN202311676752.5A 2023-12-07 2023-12-07 High-concurrency high-throughput non-waiting annular queue data storage method Pending CN117667783A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311676752.5A CN117667783A (en) 2023-12-07 2023-12-07 High-concurrency high-throughput non-waiting annular queue data storage method


Publications (1)

Publication Number Publication Date
CN117667783A 2024-03-08

Family

ID=90074799

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311676752.5A Pending CN117667783A (en) 2023-12-07 2023-12-07 High-concurrency high-throughput non-waiting annular queue data storage method

Country Status (1)

Country Link
CN (1) CN117667783A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination