CN117667783A - High-concurrency high-throughput non-waiting annular queue data storage method - Google Patents


Info

Publication number
CN117667783A
CN117667783A (application CN202311676752.5A)
Authority
CN
China
Prior art keywords: data, index, throughput, concurrency, waiting
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311676752.5A
Other languages
Chinese (zh)
Inventor
李珂
匡乃亮
杨启迪
罗迒哉
钟升
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xian Microelectronics Technology Institute
Original Assignee
Xian Microelectronics Technology Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xian Microelectronics Technology Institute filed Critical Xian Microelectronics Technology Institute
Priority: CN202311676752.5A
Publication: CN117667783A
Legal status: Pending

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a high-concurrency, high-throughput wait-free ring-queue data storage method, belonging to the field of computer data structures, which comprises the following steps: step 1, initialize the ring queue structure; step 2, obtain the current position index through an atomic operation and a bit operation; step 3, judge whether the storage content pointed to by the index satisfies the read/write condition, and if yes, go to step 4, otherwise go to step 5; step 4, write or read the data; step 5, the operation fails and returns immediately, leaving the content pointed to by the index unmodified. Through this simple design, the throughput of access to the shared storage area is greatly improved, and synchronized multithreaded operation is supported stably. Because the invention adopts a ring queue, no additional storage space needs to be allocated at run time; its space complexity is low, so it can be applied to embedded devices with severely limited resources. The code is simple and highly readable, which facilitates later maintenance by users.

Description

High-concurrency high-throughput non-waiting annular queue data storage method
Technical Field
The invention belongs to the field of computer data structures, and in particular relates to a high-concurrency, high-throughput wait-free ring-queue data storage method.
Background
With the development of multi-core technology, designing efficient concurrent data structures has become urgent, especially for many-core microsystem applications. Traditional concurrency schemes rely on mutual-exclusion locks or spin locks, which carry large system overhead and readily lead to problems such as deadlock and priority inversion. To address these issues, non-blocking data structures have attracted wide attention; among them, wait-freedom provides the strongest non-blocking guarantee, ensuring that every thread completes any operation within a bounded number of steps.
The biggest problems with current wait-free techniques are design difficulty and excessive overhead: in most scenarios, performance keeps dropping as the level of "guarantee" rises to satisfy redundant design. The first cause is the key mechanism adopted by most wait-free algorithms, the "help mechanism". This mechanism governs how threads help one another complete operations; it typically produces complex algorithms and requires additional atomic operations, creating high contention and increasing overhead. Moreover, helping is performed in sequence — all concurrently running threads help other operations in exactly the same order — and this high redundancy yields a large number of inefficient operations. The second cause is the memory reclamation mechanism, which releases storage whose operations have all completed so as to provide theoretically unbounded storage; however, frequent allocation and release can, in extreme cases, cause out-of-bounds accesses and wild pointers, and as the number of writer threads grows the storage space eventually becomes insufficient.
In summary, prior-art wait-free queues carry a complex help mechanism and memory reclamation mechanism, which significantly reduce thread efficiency and markedly increase the chance of errors in extreme situations.
Disclosure of Invention
In order to solve the problems in the prior art, the invention provides a high-concurrency, high-throughput wait-free ring-queue data storage method that addresses the shortcomings described in the background, ultimately meeting the real need for fast data synchronization among the cores of a microsystem.
In order to achieve the above purpose, the present invention provides the following technical solutions:
a high-concurrency, high-throughput wait-free ring-queue data storage method comprises the following steps:
step 1, initialize the ring queue structure;
step 2, obtain the current position index through an atomic operation and a bit operation;
step 3, judge whether the storage content pointed to by the index satisfies the read/write condition; if yes, go to step 4, otherwise go to step 5;
step 4, write or read the data;
step 5, the operation fails: return immediately without modifying the content pointed to by the index.
Preferably, in step 1, variables are initialized: bytes are aligned to the machine instruction length, and after alignment the read index variable and the write index variable are initialized to zero.
Further, on a 32-bit machine the alignment is 32; on a 64-bit machine the alignment is 64.
Preferably, in step 1, the storage space is initialized, and the user defines the size of the ring buffer.
Preferably, in step 2, obtaining the current position index specifically comprises: moving the write index with a fetch_and_add atomic operation, which atomically adds 1 to the value stored in the atomic object and returns the previously stored value; meanwhile, because a ring buffer is used, the obtained index is taken modulo the buffer size to obtain the real index position quickly.
Preferably, in step 4, writing data specifically comprises: a special identifier is set at each position of the buffer, indicating that the position is empty and data can be stored there;
when writing data, the position pointed to by the index is compared with the special identifier; if it matches, storing the data is attempted with a compare_and_swap atomic operation, and if the atomic operation succeeds, true is returned. All other cases are treated as failure — another producer thread stored data at that position first — and false is returned immediately.
Preferably, in step 4, reading data specifically comprises: obtaining the position index by moving the read index with a fetch_and_add atomic operation, which atomically adds 1 to the index value and returns the current read index;
then checking whether the position holds the special identifier. If not, normal data is present: the data is fetched first, the position is reset to the special identifier, and finally true is returned. All other cases are treated as execution failure — there is no data — and false is returned.
Preferably, in step 4, the data types of the written or read data are the same.
Preferably, in step 4, one or more data writing threads and one data reading thread are provided.
Compared with the prior art, the invention has the following beneficial technical effects:
the invention provides a high-concurrency high-throughput non-waiting annular queue data storage method, which greatly improves the throughput of accessing a shared storage area through simple design and stably supports multi-thread synchronous operation. The invention completely avoids a help mechanism, improves the data throughput by 50% -100%, and can stably adapt to the application scene of high-speed data throughput; the invention adopts the annular queue, does not need to apply for the storage space additionally in operation, has low space complexity, and can be applied to embedded equipment with severely limited performance; the invention has low code complexity and high code readability, and is convenient for the later maintenance of users. The method overcomes the defects, and is particularly suitable for applications lacking supporting software environments in the embedded field of microsystems and the like. The invention has stable operation, and the performance index does not drop along with the rapid increase of the thread number.
Drawings
FIG. 1 is an application model diagram of the high-concurrency, high-throughput wait-free ring-queue data storage method of the present invention;
FIG. 2 is a performance comparison graph of the high-concurrency, high-throughput wait-free ring-queue data storage method of the present invention;
FIG. 3 is a detailed schematic diagram of the operation flow of the high-concurrency, high-throughput wait-free ring-queue data storage method of the present invention.
Detailed Description
Hereinafter, only certain exemplary embodiments are briefly described. As will be recognized by those of skill in the pertinent art, the described embodiments may be modified in various different ways without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature and not as restrictive.
Examples
The invention was tested on a Linux system with an Intel(R) Core(TM) i5-6300HQ CPU @ 2.30GHz, with excellent experimental results.
The invention places no restriction on the type of the data written and read, but in actual operation all data must share one type: if the data contains integer (int) values, all data must be integers; if it contains character (char) values, all data must be characters.
When the program runs, multiple writer (data-storing) threads may be used, as determined by actual requirements; there is one and only one reader (data-fetching) thread. The data read/write process comprises the following steps:
s1: initializing variables, functions and spaces;
s2: acquiring a current position index through atomic operation and bit operation;
s3: judge whether the storage content pointed to by the index satisfies the read/write condition; if yes, go to S4, otherwise go to S5;
s4: writing or reading data;
s5: the operation fails; the content pointed to by the index is not modified, and the call returns immediately.
While the program runs, threads cannot interfere with one another, and every code path completes within a bounded number of steps and returns a result. The performance degradation caused by help mechanisms and memory reclamation mechanisms is therefore avoided, the application scenarios and range of the wait-free queue are broadened, and the queue's performance is a stable 50%–100% improvement over existing algorithmic queues.
The invention provides a new ring queue that supports a large number of concurrent threads in a multi-producer, single-consumer environment, meeting high-throughput requirements and fundamentally solving the problems introduced by the help mechanism and the memory reclamation mechanism, ultimately meeting the real need for fast data synchronization among the cores of a microsystem.
To achieve the above object, the design details of the ring queue are as follows, and the flowchart is shown in fig. 3:
Step 1, initialize the structure:
Step 1.1, initialize variables: bytes are aligned to the machine instruction length to increase the cache hit rate — 32 on a 32-bit machine and 64 on a 64-bit machine. After alignment, the read index variable (readIdx) and the write index variable (writeIdx) are initialized to zero.
Step 1.2, initialize the storage space: the user can define the size of the ring buffer (queue size), making it easy to adapt to various devices and widening the applicable range.
Step 2, enqueue operation (write):
step 2.1, obtaining a position index: first, the write index position is moved through the fetch_and_add atomic operation, the fetch_and_add atomic operation automatically increases the write index by 1, the operation returns to the current write index, and the data corresponding to the current write index position is obtained according to the current write index. Meanwhile, due to the adoption of the annular buffer area, the obtained current writing index and the size of the buffer area are subjected to residual taking operation, so that the real index position is obtained rapidly.
Step 2.2, data enqueuing: for faster execution, a special flag Fu is set at each location in the buffer, indicating that the location is empty and data can be stored. The first step is therefore to compare if the index location points to a special identifier Fu (resolving contention between producer threads and consumer threads), if so, try to store data via a compare_and_swap atomic operation (resolving contention among multiple producer threads), if the atomic operation is successful, return true; the other cases are regarded as failures, which means that other producer threads at the position store data first and return to false directly.
Step 3, dequeue operation (reading):
step 3.1, obtaining a position index: first, the read index position is moved through the fetch_and_add atomic operation, the index value is atomically and automatically incremented by 1, and the value of the current read index is returned. Here again only the most relaxed memory order related is needed. And performing remainder taking operation on the size queue of the buffer area to quickly acquire the real position.
Step 3.2, data dequeuing: checking whether the position points to a special identifier (T), if not, indicating that normal data need to be fetched, and because the consumer thread is only single, only needs to fetch the data first, resetting the position as a special identifier Fu, and finally returning true; the rest of the cases are regarded as execution failure, which means that there is no data, and return false.
Through this simple design, the invention greatly improves the throughput of access to the shared storage area and stably supports synchronized multithreaded operation (see the effect comparison in fig. 2):
the invention completely avoids a help mechanism, improves the data throughput by 50% -100%, and can stably adapt to the application scene of high-speed data throughput;
the invention adopts the annular queue, does not need to apply for the storage space additionally in operation, has low space complexity, and can be applied to embedded equipment with severely limited performance;
the invention has low code complexity and high code readability, and is convenient for the later maintenance of users. The method overcomes the defects, and is particularly suitable for applications lacking supporting software environments in the embedded field of microsystems and the like. The invention has stable operation, and the performance index does not drop along with the rapid increase of the thread number.
As shown in FIG. 1, each thread runs a task, and the data generated at the start and end of each task run must be aggregated and centrally analyzed in the host process.
In fig. 2, the abscissa is the number of threads and the ordinate is the data throughput (writes plus reads). Different algorithms were tested on the same hardware platform, each run for a fixed time, and their data throughput over that time was measured. As fig. 2 shows, the performance of the invention is 50% higher than the newest queue and 100% higher than the classical queue.
As shown in fig. 3, the figure details what may happen at run time and the corresponding returned results: false indicates the operation failed, and true indicates it succeeded.
Table 1 throughput data table
Table 2 code complexity analysis table
While the fundamental and principal features of the invention and advantages of the invention have been shown and described, it will be apparent to those skilled in the art that the invention is not limited to the details of the foregoing exemplary embodiments, but may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although this specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted merely for clarity, and the specification should be treated as a whole — the technical solutions in the embodiments may be combined appropriately to form other implementations understandable to those skilled in the art. The above merely illustrates the technical idea of the present invention and does not limit its scope of protection; any modification made on the basis of the technical solution according to the technical idea of the invention falls within the protection scope of the claims of the present invention.

Claims (9)

1. A high-concurrency, high-throughput wait-free ring-queue data storage method, characterized by comprising the following steps:
step 1, initialize the ring queue structure;
step 2, obtain the current position index through an atomic operation and a bit operation;
step 3, judge whether the storage content pointed to by the index satisfies the read/write condition; if yes, go to step 4, otherwise go to step 5;
step 4, write or read the data;
step 5, the operation fails: return immediately without modifying the content pointed to by the index.
2. The high-concurrency, high-throughput wait-free ring-queue data storage method of claim 1, wherein in step 1 variables are initialized: bytes are aligned to the machine instruction length, and after alignment the read index variable and the write index variable are initialized to zero.
3. The high-concurrency, high-throughput wait-free ring-queue data storage method of claim 2, wherein on a 32-bit machine the alignment is 32, and on a 64-bit machine the alignment is 64.
4. The high-concurrency, high-throughput wait-free ring-queue data storage method of claim 1, wherein in step 1 the storage space is initialized and the ring buffer size is customized by the user.
5. The high-concurrency, high-throughput wait-free ring-queue data storage method of claim 1, wherein in step 2 obtaining the current position index comprises: moving the write index with a fetch_and_add atomic operation, which atomically adds 1 to the value stored in the atomic object and returns the previously stored value; meanwhile, because a ring buffer is used, the obtained index is taken modulo the buffer size to obtain the real index position quickly.
6. The high-concurrency, high-throughput wait-free ring-queue data storage method of claim 1, wherein in step 4 writing data comprises: setting a special identifier at each position of the buffer, indicating that the position is empty and data can be stored there;
when writing data, comparing whether the indexed position holds the special identifier; if so, attempting to store the data with a compare_and_swap atomic operation and, if the atomic operation succeeds, returning true; treating all other cases as failure — another producer thread stored data at that position first — and returning false immediately.
7. The high-concurrency, high-throughput wait-free ring-queue data storage method of claim 1, wherein in step 4 reading data comprises: obtaining the position index by moving the read index with a fetch_and_add atomic operation, which atomically adds 1 to the index value and returns the current read index;
checking whether the position holds the special identifier; if not, normal data is present: fetching the data first, resetting the position to the special identifier, and finally returning true; treating all other cases as execution failure — there is no data — and returning false.
8. The high-concurrency, high-throughput wait-free ring-queue data storage method of claim 1, wherein in step 4 the written and read data are of the same type.
9. The high-concurrency, high-throughput wait-free ring-queue data storage method of claim 1, wherein in step 4 there are one or more data-writing threads and exactly one data-reading thread.
CN202311676752.5A 2023-12-07 2023-12-07 High-concurrency high-throughput non-waiting annular queue data storage method Pending CN117667783A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311676752.5A CN117667783A (en) 2023-12-07 2023-12-07 High-concurrency high-throughput non-waiting annular queue data storage method


Publications (1)

Publication Number Publication Date
CN117667783A 2024-03-08

Family

ID=90074799

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311676752.5A Pending CN117667783A (en) 2023-12-07 2023-12-07 High-concurrency high-throughput non-waiting annular queue data storage method

Country Status (1)

Country Link
CN (1) CN117667783A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination