CN116991609B - Queue fairness processing method, apparatus, and readable storage medium
- Publication number: CN116991609B (application CN202311250123.6A)
- Authority: CN (China)
- Legal status: Active
Classifications
- G06F9/546 — Message passing systems or structures, e.g. queues
- G06F9/5038 — Allocation of resources to service a request, the resource being a machine, considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
- G06F9/544 — Buffers; Shared memory; Pipes
Abstract
The application provides a queue fairness processing method, a device, and a readable storage medium. The method comprises the following steps: slicing first queue data, or data to be processed corresponding to the first queue data, to obtain a first slice; processing the first slice and re-queuing the first remaining data; slicing second queue data, or data to be processed corresponding to the second queue data, to obtain a second slice; processing the second slice and re-queuing the second remaining data of the second queue data or of its corresponding data to be processed; slicing the first remaining data to obtain a third slice; and processing the third slice and re-queuing the third remaining data of the first queue data or of its corresponding data to be processed.
Description
Technical Field
The present invention relates to computer technology, and in particular, to a queue fairness processing method, apparatus, and readable storage medium.
Background
A Queue (Queue) is a common data structure that follows the First-In-First-Out (FIFO) principle. A queue can be seen as a special linear table that allows insert operations only at one end (the tail) and delete operations only at the other end (the head). In a multi-queue scenario, multiple queues are scheduled and managed: the queue that dequeues first is processed to completion before the next queue is processed. However, when different queues carry different amounts of traffic, they require different processing times, and this serial processing mode lets a queue with a large data volume block queues with smaller data volumes, so the user experience of the smaller queues is poor.
Disclosure of Invention
The embodiments of the invention provide a queue fairness processing method, a device, and a readable storage medium, which can fairly distribute processing opportunities among different queues.
In a first aspect, a queue fairness processing method is provided, applied to a process in which a processor processes first queue data of a first queue and second queue data of a second queue, and the method includes:
slicing the first queue data, or data to be processed corresponding to the first queue data, to obtain a first slice, wherein the first queue data is data in the first queue;
processing the first slice, and re-queuing first remaining data of the first queue data or of the data to be processed corresponding to the first queue data, wherein the first remaining data is the data remaining after the first slice is cut off;
slicing the second queue data, or data to be processed corresponding to the second queue data, to obtain a second slice, wherein the second queue data is data in the second queue;
processing the second slice, and re-queuing second remaining data of the second queue data or of the data to be processed corresponding to the second queue data, wherein the second remaining data is the data remaining after the second slice is cut off;
slicing the first remaining data to obtain a third slice;
and processing the third slice, and re-queuing third remaining data of the first queue data or of the data to be processed corresponding to the first queue data, wherein the third remaining data is the data remaining after the first slice and the third slice are cut off.
In some possible designs, the first queue is a queue of a virtual input/output device, and the second queue is a queue of a virtual input/output device, where the virtual input/output device is used to implement data transmission between the first device and the second device;
slicing the data to be processed corresponding to the first queue data to obtain a first slice, including:
Slicing data in a first buffer area corresponding to a first descriptor in a first queue to obtain a first slice, wherein the first descriptor belongs to the first queue data;
processing the first slice, and queuing the first queue data or the first remaining data of the data to be processed corresponding to the first queue data again, including:
processing a first slice, determining a first offset, and queuing the first residual data again, wherein the first offset is calculated according to the data of the first slice and is used for generating a starting address of the third slice;
slicing the data to be processed corresponding to the second queue data to obtain a second slice, including:
slicing data in a second buffer area corresponding to a second descriptor in a second queue to obtain a second slice, wherein the second descriptor belongs to the second queue data;
processing the second slice, and queuing the second queue data or second residual data of the data to be processed corresponding to the second queue data again, including:
processing a second slice, determining a second offset, and queuing the second residual data again, wherein the second offset is calculated according to the data of the second slice and is used for generating the start address of the slice cut at the next slicing;
Slicing the first remaining data to obtain a third slice, comprising:
slicing the first residual data in the first buffer area corresponding to the first descriptor in the first queue to obtain a third slice;
processing the third slice, and queuing the first queue data or third remaining data of the data to be processed corresponding to the first queue data again, including:
and processing a third slice, determining a third offset, and queuing the third residual data again, wherein the third offset is calculated according to the data of the first slice and the third slice and is used for generating the start address of the slice cut at the next slicing.
In some possible designs, slicing the data in the first buffer corresponding to the first descriptor in the first queue to obtain a first slice includes:
under the condition that a first flag bit in a first available ring is set, slicing data in a first buffer area corresponding to a first descriptor in a first queue to obtain a first slice, wherein the first available ring corresponds to the first queue;
slicing the data in the second buffer area corresponding to the second descriptor in the second queue to obtain a second slice, including:
And under the condition that a second flag bit in a second available ring is set, slicing data in a second buffer area corresponding to a second descriptor in a second queue to obtain a second slice, wherein the second available ring corresponds to the second queue.
In some possible designs, processing the first slice, determining the first offset, and re-queuing the first remaining data includes:
processing the first slice, determining a first offset, inserting the first offset into the first descriptor, and queuing the first descriptor inserted with the first offset again.
In some possible designs, processing the first slice, determining the first offset, and re-queuing the first remaining data includes:
and processing the first slice, determining a first offset, storing the first offset into a memory, and queuing the first residual data again.
In some possible designs, where the first slice is obtained by slicing the data to be processed corresponding to the first queue data, processing the first slice includes:
generating a first corresponding descriptor, and subtracting the data size of a first slice from the data size in the first descriptor, wherein the data size in the first corresponding descriptor is equal to the data size of the first slice;
The first slice and/or the first corresponding descriptor is transferred between the first device and the second device via direct memory access.
In some possible designs, processing the third slice includes:
subtracting the data size of the third slice from the data size of the first descriptor, and adding the data size of the third slice to the data size of the first corresponding descriptor;
and transmitting a third slice and/or the modified first corresponding descriptor between the first device and the second device through direct memory access.
In some possible designs, where the first slice is a slice of the first queue data, processing the first slice includes:
a first calculation process is performed on the first slice, the first calculation process including any one of an arithmetic operation, a bit operation, a logical operation, a floating point operation, and a string operation.
In some possible designs, after the computing process is performed on the first slice, the method further includes:
and storing the context which is generated by processing the first slice and does not need to participate in queuing into a memory.
In some possible designs, processing the third slice includes:
And restoring the context which does not need to participate in queuing from the memory, and processing a third slice in the environment of the restored context.
In some possible designs, before slicing the second queue data or the data to be processed corresponding to the second queue data to obtain a second slice, the method further includes:
comparing second queue data or the data to be processed corresponding to the second queue data with the priority level of the first residual data;
slicing the second queue data or the data to be processed corresponding to the second queue data to obtain a second slice, including:
and slicing the second queue data or the data to be processed corresponding to the second queue data to obtain a second slice under the condition that the priority level of the second queue data or the data to be processed corresponding to the second queue data is higher than the priority level of the first residual data.
In a second aspect, a computing device is provided, comprising: a processor, a virtual input/output device, a first queue, and a second queue, wherein the first queue is a queue of the virtual input/output device, the second queue is a queue of the virtual input/output device, and the virtual input/output device is used for realizing data transmission between a first device and a second device;
The processor is used for slicing first queue data or data to be processed corresponding to the first queue data to obtain a first slice, wherein the first queue data is data in a first queue;
the processor is used for processing the first slice and queuing the first queue data or first residual data of the data to be processed corresponding to the first queue data again, wherein the first residual data is the first queue data or the data to be processed corresponding to the first queue data which remains after the first slice is cut off;
the processor is configured to slice the second queue data or data to be processed corresponding to the second queue data to obtain a second slice, where the second queue data is data in a second queue;
the processor is configured to process a second slice, and re-queue the second queue data or second remaining data of the data to be processed corresponding to the second queue data, where the second remaining data is the second queue data or remaining data of the data to be processed corresponding to the second queue data after the second slice is cut off;
The processor is used for slicing the first residual data to obtain a third slice;
the processor is configured to process a third slice, and re-queue the first queue data or third remaining data of the data to be processed corresponding to the first queue data, where the third remaining data is the first queue data or remaining data after the first slice and the third slice are cut off by the data to be processed corresponding to the first queue data.
In a third aspect, there is provided a computer readable storage medium storing computer instructions that, when run on a computer device, cause the computer device to perform the method according to any one of the first aspects.
In the above scheme, each queue's data is sliced by the slicing technique, and after a slice is processed the remainder returns to the queue to be queued again. This avoids the situation in which the next queue cannot be processed until the previous queue is finished, for example, a queue with a small data volume having to wait for a queue with a large data volume to be fully processed before obtaining a processing opportunity. Different queues thus all obtain processing opportunities within a certain time, which greatly improves processing fairness.
Drawings
In order to more clearly describe the embodiments of the present invention or the technical solutions in the background art, the drawings required by the embodiments of the present invention or the background art are briefly described below.
FIG. 1 is a schematic diagram of a multi-queue scenario provided herein;
fig. 2 is a flow chart of a queue fairness processing method provided in the present application;
fig. 3 is a schematic structural diagram of a computing device provided in the present application.
Detailed Description
A virtual input/output (Virtio) device is a kind of virtual device used in a virtualized environment. A Virtio device may emulate various hardware devices, including a network adapter, a disk controller, a serial controller, and so on, to provide input/output functionality between a first device and a second device. In a specific embodiment, the Virtio device is provided with queues. The queues may or may not be divided into receive queues and transmit queues, where a receive queue is used to receive data from the Virtio device and a transmit queue is used to send data to the Virtio device. The composition of a queue differs between protocol versions: for example, a split virtual queue (split virtqueue) includes a descriptor table, an available ring, a used ring, and so on, while a packed virtual queue (packed virtqueue) includes a descriptor ring. Each queue is described below as being composed of a set of descriptors and one or more available rings. A descriptor is a data structure used to describe a data buffer; each descriptor contains a pointer to the buffer and associated control information, and can be regarded as the metadata of the buffer. An available ring is a ring buffer that contains a set of indices pointing to descriptors. When the first device needs to send data, it writes the index of a descriptor into the available ring. Communication between the first device and the second device takes place via the queues. Specifically, when the first device needs to send data, it writes the index of the descriptor into the available ring of the transmit queue; after the second device detects the new index in the available ring, it acquires the descriptor and copies the data from the first device to the Virtio device; when the Virtio device finishes processing the data, it writes the index of a descriptor into the available ring of the receive queue to inform the first device that new data is available; and after the first device detects the new index in that available ring, it acquires the descriptor and reads the data returned by the Virtio device. In a specific embodiment, the first device may be a host machine or a virtual machine, and the second device may correspondingly be a virtual machine or a host machine.
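As a concrete illustration of the structures just described, the following C sketch models a split-virtqueue-style descriptor and available ring. The field names, types, and the fixed queue size are simplifications for illustration, not the exact layout mandated by the Virtio specification.

```c
#include <stdint.h>

#define QUEUE_SIZE 256

/* One descriptor: metadata for one data buffer. */
struct vq_desc {
    uint64_t addr;   /* pointer (address) of the data buffer      */
    uint32_t len;    /* length of the data in the buffer          */
    uint16_t flags;  /* associated control information            */
    uint16_t next;   /* index of the next descriptor in a chain   */
};

/* Available ring: a ring buffer of indices pointing to descriptors.
 * The sending side writes a descriptor index into ring[] and then
 * advances idx to publish it to the other side. */
struct vq_avail {
    uint16_t flags;
    uint16_t idx;               /* next free slot (mod QUEUE_SIZE)   */
    uint16_t ring[QUEUE_SIZE];  /* indices into the descriptor table */
};
```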
As shown in fig. 1, when there are multiple Virtio devices there are multiple queues, and even a single Virtio device may have multiple queues; both are multi-queue scenarios. Taking the example shown in fig. 1, the multi-queue scenario includes a first queue, a second queue, a third queue, and a fourth queue. The first queue corresponds to a first available ring, the second queue to a second available ring, the third queue to a third available ring, and the fourth queue to a fourth available ring. The first queue has a first descriptor. The second queue has a second descriptor and a third descriptor. The third queue has a fourth descriptor, a fifth descriptor, and a sixth descriptor. The fourth queue has a seventh descriptor and an eighth descriptor. Taking the first descriptor as an example, a descriptor may include a data type, the address of a data buffer, the length of the data, data constraints, data access rights, data association relationships, data default values, data formatting rules, and the like. Because there are multiple queues, they naturally need to be scheduled so that conflicts are avoided. In the prior art, however, the queue that dequeues first is usually processed to completion before the next queue is processed. When different queues carry different amounts of traffic, they require different processing times, and this serial processing mode lets a queue with a large data volume block queues with smaller data volumes, so the user experience of the smaller queues is poor. For example, assume that a first descriptor of the first queue dequeues and then a second descriptor of the second queue dequeues, but the data buffer corresponding to the first descriptor stores 1 gigabit (Gbit) of data while the data buffer corresponding to the second descriptor stores 1 megabit (Mbit) of data. Before the 1 Gbit of data corresponding to the first descriptor has been processed, the 1 Mbit of data corresponding to the second descriptor will not be transmitted; only after the 1 Gbit of data corresponding to the first descriptor has been transmitted will the 1 Mbit of data corresponding to the second descriptor be transmitted, which is obviously unfair.
In order to solve the above problems, the present application provides a queue fairness processing method, apparatus, and readable storage medium, which can fairly provide processing opportunities among different queues.
Referring to fig. 2, fig. 2 is a flow chart of a queue fairness processing method provided in the present application. As shown in fig. 2, the queue fairness processing method of the present embodiment includes:
s101: and the processor slices the first queue data or the data to be processed corresponding to the first queue data to obtain a first slice.
In a possible embodiment, the first queue data is data in a first queue, and the data in the first queue may be data for calculation or metadata. The data used for the computation may be integers, characters, floating point numbers, etc. The metadata may be descriptors or the like.
In one possible implementation, when the first queue data is data for calculation, the first queue data itself may be sliced. Taking first queue data that is a picture as an example, the picture may be a 1024*1024 array, and the picture may then be sliced, for example by cutting a 1024*2 block from the 1024*1024 array as the first slice. It should be understood that the size of the first slice may be set as required; 1024*2 is merely an example, and in practical application the size of the first slice may also be 1024*1, 1024*3, 1*1024, 2*1024, 100*100, 50*50, and so on, which is not specifically limited here.
In a possible implementation manner, when the first queue data is metadata, the data in the first buffer area corresponding to the first descriptor in the first queue is sliced to obtain a first slice. Specifically, when it is detected that the first flag bit in the first available ring is set, the data in the first buffer area corresponding to the first descriptor in the first queue may be sliced to obtain a first slice. The first available ring is the available ring corresponding to the first queue. Taking the example that the data in the first queue includes a first descriptor and the data buffer corresponding to the first descriptor stores 1 gigabit (Gbit) of data to be processed, 100 Mbit can be cut out of that 1 Gbit of data to be processed as a first slice. It should be understood that the size of the first slice may be set as required; 100 Mbit is merely an example, and in practical application the size of the first slice may also be 150 Mbit, 200 Mbit, 250 Mbit, and so on, which is not specifically limited here.
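A minimal sketch of this slicing step in C, assuming a byte-addressed buffer; the helper name cut_slice and the 100 Mbit granularity are illustrative, not taken from the patent text. The same helper also covers later slices of the same buffer, because the offset argument marks how much has already been cut off.

```c
#include <stdint.h>

#define SLICE_BYTES (100u * 1000u * 1000u / 8u)  /* 100 Mbit, as in the example */

struct slice {
    uint64_t addr;  /* start address of this slice */
    uint32_t len;   /* size of this slice in bytes */
};

/* Cut the next slice out of a buffer of total_len bytes starting at
 * base; offset is how many bytes earlier slices have already consumed
 * (0 for the first slice, the first offset for the third slice, ...). */
static struct slice cut_slice(uint64_t base, uint32_t total_len,
                              uint32_t offset)
{
    uint32_t remaining = total_len - offset;
    struct slice s;

    s.addr = base + offset;
    s.len  = remaining < SLICE_BYTES ? remaining : SLICE_BYTES;
    return s;
}
```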
S102: and the processor processes the first slice and re-queues the first queue data or first residual data of the data to be processed corresponding to the first queue data.
In one possible implementation, when the first queue data is data for calculation, the first slice is processed and the first remaining data of the first queue data is re-queued. The first remaining data is the data of the first queue data remaining after the first slice is cut off. Specifically, the processor performs a first calculation process on the first slice, the first calculation process including any one of an arithmetic operation, a bit operation, a logical operation, a floating point operation, and a string operation. Taking the first slice as 1024*2 data as an example, an inverse operation may be performed on the 1024*2 data, that is, the gray value of each pixel is subtracted from the maximum gray value. The calculated 1024*2 data is then saved to memory, the first context that is generated by the calculation and does not need to participate in queuing is saved to memory, and the remaining 1024*1022 data is returned to the tail of the first queue to be queued again.
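For the inverse operation just described, a short C sketch, assuming an 8-bit grayscale picture so that the maximum gray value is 255; the names and dimensions are illustrative.

```c
#include <stdint.h>

#define ROWS       1024
#define SLICE_COLS 2     /* a 1024*2 slice of the 1024*1024 picture */
#define MAX_GRAY   255   /* assuming 8-bit grayscale pixels         */

/* Invert one slice: subtract each pixel's gray value from the maximum. */
static void invert_slice(uint8_t slice[ROWS][SLICE_COLS])
{
    for (int r = 0; r < ROWS; r++)
        for (int c = 0; c < SLICE_COLS; c++)
            slice[r][c] = MAX_GRAY - slice[r][c];
}
```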
In a possible implementation manner, when the first queue data is metadata, the first slice is processed and the first remaining data of the data to be processed corresponding to the first queue data is re-queued, where the first remaining data is the data of the data to be processed corresponding to the first queue data remaining after the first slice is cut off. Specifically, take as an example the case in which the first device accesses the second device through Direct Memory Access (DMA); the first buffer corresponding to the first descriptor is then a memory area of the first device. The processor cuts out the first slice, subtracts the data size of the first slice from the data length in the first descriptor, generates a first corresponding descriptor for the first slice, determines a first offset, re-queues the first remaining data, transmits the first slice and/or the first corresponding descriptor from the first device to the second device through direct memory access, and saves the first context, generated by processing the first slice and not needing to participate in queuing, into memory. Here, the first offset may be calculated from the data amount of the first slice; for example, the first offset may be equal to the data amount of the first slice, or the first offset may be an address calculated from the base address of the first buffer and the data of the first slice. The first offset may be handled in either of two ways: (1) the first offset is inserted into the first descriptor, and the first descriptor with the inserted first offset is re-queued; (2) the first offset is saved as context to memory. The data length in the first corresponding descriptor is equal to the data amount of the first slice, and the buffer corresponding to the first corresponding descriptor is a memory area of the second device. The first context that does not need to participate in queuing may be information other than the queue number and the first offset, such as the base address, data type, data constraints, data access rights, data association relationships, and so on. The above takes the first device accessing the second device through DMA as an example; in practical applications the second device may likewise access the first device through DMA, which is not specifically limited here.
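The descriptor bookkeeping described above might look like the following sketch. The struct repeats the earlier illustration so the block is self-contained, and the spare offset field used for path (1) is an assumption: a real descriptor format may have no room for it.

```c
#include <stdint.h>

struct vq_desc {          /* as in the earlier sketch */
    uint64_t addr;
    uint32_t len;
    uint16_t flags;
    uint16_t next;
};

/* After cutting a slice: shrink the original descriptor by the slice
 * size and size the newly generated "corresponding" descriptor to
 * exactly the slice. */
static void account_slice(struct vq_desc *orig, struct vq_desc *corr,
                          uint32_t slice_len)
{
    orig->len -= slice_len;  /* remaining to-be-processed data shrinks   */
    corr->len  = slice_len;  /* corresponding descriptor covers the slice */
}

/* For a later slice of the same buffer (e.g. the third slice): the
 * original descriptor keeps shrinking while the corresponding
 * descriptor grows by the new slice. */
static void account_followup_slice(struct vq_desc *orig, struct vq_desc *corr,
                                   uint32_t slice_len)
{
    orig->len -= slice_len;
    corr->len += slice_len;
}

/* Path (2) for the offset: save it as context in memory, keyed by the
 * queue it belongs to, so it does not have to ride through the queue. */
struct slice_ctx {
    uint16_t queue_id;
    uint32_t offset;  /* next slice starts at buffer base + offset */
};
```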
S103: the processor compares the first residual data with the second queue data in the second queue or the priority level of the data to be processed corresponding to the second queue number according to the priority policy. If the priority level in the second queue is higher, go to step S104; if the priority level of the first remaining data is relatively high, the process proceeds to step S106.
In one possible embodiment, the priority policy may give unprocessed queues priority: for example, if the first queue has already been processed but the second queue has not, the second queue may be processed first and the already-processed first queue later. The priority level may also be determined by data volume: for example, if the first queue has a large data volume and the second queue a small one, the scheme may be designed so that the large first queue obtains more processing opportunities and the small second queue fewer. The priority level may also be determined by the importance of the data: for example, if the data in the second queue is more important, the second queue is processed first.
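The three policies can be sketched as a single comparison function; the struct fields and policy names below are assumptions for illustration, not terms from the patent.

```c
#include <stdint.h>

enum prio_policy { UNPROCESSED_FIRST, LARGER_FIRST, IMPORTANCE_FIRST };

struct queue_state {
    int      processed_once;  /* queue has already had a slice processed */
    uint64_t bytes_left;      /* remaining data volume                   */
    int      importance;      /* higher value = more important data      */
};

/* Returns nonzero if a should get the next processing opportunity over b. */
static int higher_priority(enum prio_policy p,
                           const struct queue_state *a,
                           const struct queue_state *b)
{
    switch (p) {
    case UNPROCESSED_FIRST: return !a->processed_once && b->processed_once;
    case LARGER_FIRST:      return a->bytes_left > b->bytes_left;
    case IMPORTANCE_FIRST:  return a->importance > b->importance;
    }
    return 0;
}
```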
S104: and the processor slices the second queue data or the data to be processed corresponding to the second queue data to obtain a second slice.
In a possible embodiment, the second queue data is data in a second queue, and the data in the second queue may be data for calculation or metadata.
In one possible implementation, when the second queue data is data for calculation, the second queue data itself may be sliced. Please refer to the related content in the first queue, which will not be described herein.
In a possible implementation manner, when the second queue data is metadata, the data in the second buffer area corresponding to the second descriptor in the second queue is sliced to obtain a second slice. Specifically, when it is detected that the second flag bit in the second available ring is set, the data in the second buffer area corresponding to the second descriptor in the second queue may be sliced to obtain a second slice. The second available ring is the available ring corresponding to the second queue. Because this case is substantially similar to that of the first queue, please refer to the related content of the first queue, which will not be described again here.
In one possible embodiment, when the queue data in the first queue is data for calculation, the queue data in the second queue may be metadata; alternatively, when the queue data in the first queue is metadata, the queue data in the second queue may be data for calculation; alternatively, when the queue data in the first queue is data for calculation, the queue data in the second queue may be data for calculation; alternatively, when the queue data in the first queue is metadata, the queue data in the second queue may be metadata.
S105: the processor processes the second slice, and re-queues the second queue data or second remaining data of the data to be processed corresponding to the second queue data.
In one possible embodiment, when the second queue data is data for calculation, the second slice is processed and the second remaining data of the second queue data is re-queued. Wherein the second remaining data is data remaining after the second slice is cut out for the second queue data. Specifically, the processor performs a second calculation process on the second slice, wherein the second calculation process includes any one of an arithmetic operation, a bit operation, a logical operation, a floating point operation, and a string operation. Please refer to the related content in the first queue, which will not be described herein.
In a possible implementation manner, when the second queue data is metadata, the second slice is processed, and second remaining data of the data to be processed corresponding to the second queue data is re-queued, where the second remaining data is data remaining after the second slice is cut out by the data to be processed corresponding to the second queue data. Specifically, taking the first device accessing the second device through Direct Memory Access (DMA) as an example, at this time, the second buffer corresponding to the second descriptor is the memory area corresponding to the first device. The processor cuts out the second slice, subtracts the data size of the second slice from the data length in the second descriptor, generates a second corresponding descriptor for the second slice, determines a second offset, re-queues the second remaining data, transmits the second slice and/or the second corresponding descriptor from the first device to the second device through direct memory access, and stores a second context which is generated by processing the second slice and does not need to participate in queuing into a memory. Here, the second offset may be calculated from the data amount of the second slice, and used to generate the start address of the slice after the next slice, for example, the second offset may be equal to the data amount of the second slice, or the second offset may be an address calculated from the base address of the second buffer and the data of the second slice. The second offset may be processed in two ways: (1) Inserting the second offset into the second descriptor, and re-queuing the second descriptor inserted with the second offset. (2) And saving the second offset as a context to a memory, reading the second offset from the memory when the next slice is required to be sliced, and determining the starting address of the next slice according to the second offset. The data length in the second corresponding descriptor is equal to the data quantity of the second slice, and the buffer area corresponding to the second corresponding descriptor is the memory area corresponding to the second device. The second context that does not need to participate in queuing may be other information than the queue number and the second offset, such as a base address, a data type, a data constraint, a data access right, a data association, and so on. In the above examples, the first device accesses the second device through Direct Memory Access (DMA) is described as an example, and in practical applications, the second device may access the first device through Direct Memory Access (DMA), and the like, which is not particularly limited herein.
S106: the processor slices the first remaining data to obtain a third slice.
In one possible implementation, when the first queue data is data for calculation, the first remaining data itself in the first queue may be sliced. Taking the first queue data in the first queue as a picture as an example, the first remaining data may be an array of 1024×1022, then slicing the picture may be continued, for example, cutting 1024×3 data from the array of 1024×1022 as a third slice. It should be understood that the size of the third slice may be set as desired, and the sizes of the first slice and the third slice may be the same or different.
In a possible implementation manner, when the first queue data is metadata, the start address of the first remaining data is determined according to the first offset, and the first remaining data in the first buffer area corresponding to the first descriptor in the first queue is sliced to obtain a third slice. Here, there are two ways to obtain the first offset: (1) when the first offset was inserted into the first descriptor, it may be taken directly from the re-queued first descriptor once it dequeues again; (2) when the first offset was stored in memory, it may be read from memory. Taking the example that the first buffer corresponding to the first descriptor stores 900 Mbit (1 Gbit - 100 Mbit) of first remaining data, 200 Mbit may be cut out of the 900 Mbit of first remaining data as the third slice. It should be understood that the size of the third slice may be set as required, and the sizes of the first slice and the third slice may be the same or different.
S107: and the processor processes the third slice and re-queues the first queue data or third residual data of the data to be processed corresponding to the first queue data.
In one possible embodiment, when the first queue data is data for calculation, the first context is restored from memory, the third slice is processed in the environment of the first context, and the third remaining data of the first queue data is re-queued. The third remaining data is the data of the first queue data remaining after the first slice and the third slice are cut off. Specifically, the processor performs a third calculation process on the third slice, the third calculation process including any one of an arithmetic operation, a bit operation, a logical operation, a floating point operation, and a string operation. Taking the third slice as 1024*3 data as an example, an inverse operation may be performed on the 1024*3 data, that is, the gray value of each pixel is subtracted from the maximum gray value. The calculated 1024*3 data and the previously calculated 1024*2 data are then saved to memory, the third context that is generated by the calculation and does not need to participate in queuing is saved to memory, and the remaining 1024*1019 data is returned to the tail of the first queue to be queued again.
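Saving and restoring the context that does not participate in queuing might look like this sketch; the fixed-size table standing in for "the memory" and all field names are assumptions for illustration.

```c
#include <stdint.h>

#define MAX_QUEUES 16

/* Per-queue context that stays in memory instead of riding through the
 * queue: base address, data type, access rights, and so on (simplified). */
struct queue_ctx {
    uint64_t base_addr;
    uint32_t data_type;
    uint32_t access_rights;
};

static struct queue_ctx ctx_table[MAX_QUEUES];  /* stands in for memory */

/* Save the context when the remainder is re-queued... */
static void save_ctx(uint16_t queue_id, const struct queue_ctx *ctx)
{
    ctx_table[queue_id] = *ctx;
}

/* ...and restore it when the queue dequeues again, so the next slice
 * is processed in the environment of the restored context. */
static void restore_ctx(uint16_t queue_id, struct queue_ctx *out)
{
    *out = ctx_table[queue_id];
}
```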
In a possible implementation manner, when the first queue data is metadata, the third slice is processed and the third remaining data of the data to be processed corresponding to the first queue data is re-queued, where the third remaining data is the data of the data to be processed corresponding to the first queue data remaining after the first slice and the third slice are cut off. Specifically, taking the first device accessing the second device through Direct Memory Access (DMA) as an example, the processor cuts out the third slice, subtracts the data size of the third slice from the data size in the first descriptor, adds the data size of the third slice to the data size in the first corresponding descriptor, determines a third offset, re-queues the third remaining data, restores the first context from memory, transmits the third slice and/or the modified first corresponding descriptor from the first device to the second device through direct memory access in the environment of the first context, and saves the third context, generated by processing the third slice and not needing to participate in queuing, into memory. Here, the third offset may be calculated from the data amounts of the first slice and the third slice; for example, the third offset may be equal to the data amount of the first slice plus that of the third slice, or the third offset may be an address calculated from the base address of the first buffer and the data amounts of the first slice and the third slice. The third offset may be handled in either of two ways: (1) the third offset is inserted into the first descriptor, and the first descriptor with the inserted third offset is re-queued; (2) the third offset is saved as context to memory. After the modification, the data length in the first corresponding descriptor is equal to the data amount of the first slice plus that of the third slice. The third context that does not need to participate in queuing may be information other than the queue number and the third offset, such as the base address, data type, data constraints, data access rights, data association relationships, and so on. The third offset may overwrite the first offset, or may be appended after it. The above takes the first device accessing the second device through DMA as an example; in practical applications the second device may likewise access the first device through DMA, which is not specifically limited here.
In the above examples, only the first queue and the second queue are taken as examples, and in practical applications, the third queue, the fourth queue, or even more queues may exist, which is not limited herein specifically.
In the above scheme, each queue's data is sliced by the slicing technique, and after a slice is processed the remainder returns to the queue to be queued again. This avoids the situation in which the next queue cannot be processed until the previous queue is finished, for example, a queue with a small data volume having to wait for a queue with a large data volume to be fully processed before obtaining a processing opportunity. Different queues thus all obtain processing opportunities within a certain time, which greatly improves processing fairness.
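Putting the pieces together, the following self-contained C toy illustrates the overall scheme: each queue gives up the processor after one slice and re-queues its remainder, so a small queue finishes long before a large one is fully processed. All sizes and names are illustrative, not from the patent.

```c
#include <stdint.h>
#include <stdio.h>

#define NQUEUES 2
#define SLICE   100   /* slice granularity, arbitrary units */

int main(void)
{
    /* pending[i]: unprocessed data volume of queue i
     * (a large first queue and a small second queue). */
    uint32_t pending[NQUEUES] = { 1000, 3 };
    uint32_t offset[NQUEUES]  = { 0, 0 };   /* start of the next slice */
    int active = NQUEUES;

    for (int q = 0; active > 0; q = (q + 1) % NQUEUES) {
        if (pending[q] == 0)
            continue;                       /* nothing left in this queue */
        uint32_t n = pending[q] < SLICE ? pending[q] : SLICE;
        printf("queue %d: process slice [%u, %u)\n",
               q, offset[q], offset[q] + n);
        offset[q]  += n;                    /* the "offset" of the scheme */
        pending[q] -= n;                    /* remainder is re-queued     */
        if (pending[q] == 0)
            active--;
    }
    return 0;
}
```

Running this prints one slice of queue 0, then all of queue 1 (which is done after a single round), then the rest of queue 0 slice by slice, which is exactly the fairness property the scheme claims.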
Referring to fig. 3, fig. 3 is a schematic structural diagram of a computing device provided in the present application. As shown in fig. 3, the computing device of the present embodiment includes: one or more processors 410, a communication interface 420, and a memory 430.
The processor 410, communication interface 420, and memory 430 are interconnected by a bus 440. Optionally, the computing device 400 may further include an input/output interface 450, where the input/output interface 450 is connected to an input/output device for receiving parameters set by a user, etc. The computing device 400 can be used to implement some or all of the functionality of the device embodiments or system embodiments described above in the embodiments of the present application; the processor 410 can also be used to implement some or all of the operational steps of the method embodiments described above in the embodiments of the present application. For example, specific implementations of the computing device 400 performing various operations may refer to specific details in the above-described embodiments, such as the processor 410 being configured to perform some or all of the steps of the above-described method embodiments or some or all of the operations of the above-described method embodiments. For another example, in the present embodiment, the computing device 400 may be configured to implement some or all of the functions of one or more components of the apparatus embodiments described above, and the communication interface 420 may be configured to implement communication functions and the like necessary for the functions of the apparatuses, components, and the processor 410 may be configured to implement processing functions and the like necessary for the functions of the apparatuses, components.
It should be appreciated that the computing device 400 of fig. 3 may include one or more processors 410, and that the processors 410 may cooperatively provide processing power in a parallelized connection, a serialized connection, a serial-parallel connection, or any connection, or the processors 410 may constitute a processor sequence or processor array, or the processors 410 may be separated into primary and secondary processors, or the processors 410 may have different architectures such as heterogeneous computing architectures. In addition, the computing device 400 shown in FIG. 3, the associated structural and functional descriptions are exemplary and not limiting. In some example embodiments, computing device 400 may include more or fewer components than shown in fig. 3, or combine certain components, or split certain components, or have a different arrangement of components.
The processor 410 may have various specific implementations; for example, the processor 410 may include one or more of a central processing unit (CPU), a graphics processor (graphics processing unit, GPU), a neural network processor (neural-network processing unit, NPU), a tensor processor (tensor processing unit, TPU), or a data processor (data processing unit, DPU), which is not limited in this embodiment. Processor 410 may be a single-core processor or a multi-core processor. Processor 410 may be composed of a combination of a CPU and a hardware chip. The hardware chip may be an application-specific integrated circuit (ASIC), a programmable logic device (PLD), or a combination thereof. The PLD may be a complex programmable logic device (CPLD), a field-programmable gate array (FPGA), generic array logic (GAL), or any combination thereof. The processor 410 may also be implemented solely with logic devices incorporating processing logic, such as an FPGA or a digital signal processor (DSP). The communication interface 420 may be a wired interface, such as an Ethernet interface or a local interconnect network (LIN) interface, or a wireless interface, such as a cellular network interface or a wireless local area network interface, for communicating with other modules or devices.
The memory 430 may be a nonvolatile memory, such as a read-only memory (ROM), a programmable ROM (PROM), an erasable PROM (EPROM), an electrically erasable programmable ROM (EEPROM), or a flash memory. The memory 430 may also be a volatile memory, which may be a random access memory (RAM) used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct rambus RAM (DR RAM). The memory 430 may also be used to store program code and data, so that the processor 410 invokes the program code stored in the memory 430 to perform some or all of the operational steps of the method embodiments described above, or to perform corresponding functions in the apparatus embodiments described above. Moreover, the computing device 400 may contain more or fewer components than shown in FIG. 3, or may have a different arrangement of components.
The bus 440 may be a peripheral component interconnect express (PCIe) bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX), or the like. The bus 440 may be divided into an address bus, a data bus, a control bus, and so on. In addition to a data bus, the bus 440 may include a power bus, a control bus, a status signal bus, and the like. For clarity of illustration, only one bold line is shown in fig. 3, but this does not mean that there is only one bus or one type of bus.
Embodiments of the present application also provide a system that includes a plurality of computing devices, where each computing device may have a structure that refers to the structure of the computing device described above. The functions or operations that may be implemented by the system may refer to specific implementation steps in the above method embodiments and/or specific functions described in the above apparatus embodiments, which are not described herein. Embodiments of the present application also provide a computer-readable storage medium having stored therein computer instructions which, when executed on a computer device (e.g., one or more processors), may implement the method steps in the above-described method embodiments. The specific implementation of the processor of the computer readable storage medium in executing the above method steps may refer to specific operations described in the above method embodiments and/or specific functions described in the above apparatus embodiments, which are not described herein again. Embodiments of the present application also provide a computer program product comprising instructions stored on a computer-readable storage medium, which when run on a computer device, cause the computer device to perform the method steps in the method embodiments described above.
In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product comprises one or more computer instructions which, when loaded and executed on a computer, produce, in whole or in part, a process or function in accordance with embodiments of the present invention. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one network site, computer, server, or data center to another network site, computer, server, or data center via wired (e.g., coaxial cable, optical fiber, digital subscriber line) or wireless (e.g., infrared, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer and may also be a data storage device, such as a server, data center, etc., that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape, etc.), an optical medium (e.g., DVD, etc.), or a semiconductor medium (e.g., solid state disk), etc.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
Claims (11)
1. A queue fairness processing method, applied to a process in which a processor processes first queue data of a first queue and second queue data of a second queue, the method comprising:
slicing the first queue data or the data to be processed corresponding to the first queue data to obtain a first slice, wherein the first queue data is data in a first queue;
processing a first slice, and queuing the first queue data or first residual data of the data to be processed corresponding to the first queue data again, wherein the first residual data is the first queue data or the data to be processed corresponding to the first queue data which is residual after the first slice is cut off;
slicing the second queue data or the data to be processed corresponding to the second queue data to obtain a second slice, wherein the second queue data is data in a second queue;
processing a second slice, and queuing the second queue data or second residual data of the data to be processed corresponding to the second queue data again, wherein the second residual data is the second queue data or the data to be processed corresponding to the second queue data which is residual after the second slice is cut off;
Slicing the first residual data to obtain a third slice;
processing a third slice, and queuing the first queue data or third residual data of the data to be processed corresponding to the first queue data again, wherein the third residual data is the first queue data or the data to be processed corresponding to the first queue data, which is remained after the first slice and the third slice are cut off;
the first queue is a queue of the virtual input/output device, the second queue is a queue of the virtual input/output device, and the virtual input/output device is used for realizing data transmission between the first device and the second device;
slicing the data to be processed corresponding to the first queue data to obtain a first slice, including:
slicing data in a first buffer area corresponding to a first descriptor in a first queue to obtain a first slice, wherein the first descriptor belongs to the first queue data;
processing the first slice, and queuing the first queue data or the first remaining data of the data to be processed corresponding to the first queue data again, including:
processing a first slice, determining a first offset, and queuing the first residual data again, wherein the first offset is calculated according to the data of the first slice and is used for generating a starting address of the third slice;
Slicing the data to be processed corresponding to the second queue data to obtain a second slice, including:
slicing data in a second buffer area corresponding to a second descriptor in a second queue to obtain a second slice, wherein the second descriptor belongs to the second queue data;
processing the second slice, and queuing the second queue data or second residual data of the data to be processed corresponding to the second queue data again, including:
processing a second slice, determining a second offset, and queuing second residual data again, wherein the second offset is calculated according to the data of the second slice and is used for generating the start address of the slice cut at the next slicing;
slicing the first remaining data to obtain a third slice, comprising:
slicing the first residual data in the first buffer area corresponding to the first descriptor in the first queue to obtain a third slice;
processing the third slice, and queuing the first queue data or third remaining data of the data to be processed corresponding to the first queue data again, including:
and processing a third slice, determining a third offset, and queuing third residual data again, wherein the third offset is calculated according to the data of the first slice and the third slice and is used for generating the start address of the slice cut at the next slicing.
2. The method of claim 1, wherein processing the first slice, determining the first offset, and re-queuing the first remaining data comprises:
processing the first slice, determining a first offset, inserting the first offset into the first descriptor, and queuing the first descriptor inserted with the first offset again.
3. The method of claim 1, wherein processing the first slice, determining the first offset, and re-queuing the first remaining data comprises:
and processing the first slice, determining a first offset, storing the first offset into a memory, and queuing the first residual data again.
4. A method according to any one of claims 1 to 3, wherein, in the case where the first slice is obtained by slicing the data to be processed corresponding to the first queue data, processing the first slice comprises:
generating a first corresponding descriptor, and subtracting the data size of a first slice from the data size in the first descriptor, wherein the data size in the first corresponding descriptor is equal to the data size of the first slice;
The first slice and/or the first corresponding descriptor is transferred between the first device and the second device via direct memory access.
5. The method of claim 4, wherein processing the third slice comprises:
subtracting the data size of the third slice from the data size of the first descriptor, and adding the data size of the third slice to the data size of the first corresponding descriptor;
and transmitting a third slice and/or the modified first corresponding descriptor between the first device and the second device through direct memory access.
6. A method according to any one of claims 1 to 3, wherein, in the case where the first slice is obtained by slicing the first queue data, processing the first slice comprises:
a first calculation process is performed on the first slice, the first calculation process including any one of an arithmetic operation, a bit operation, a logical operation, a floating point operation, and a string operation.
7. A method according to any one of claims 1 to 3, wherein after the first slice is processed, the method further comprises:
and storing the context which is generated by processing the first slice and does not need to participate in queuing into a memory.
8. The method of claim 7, wherein processing the third slice comprises:
and restoring the context which does not need to participate in queuing from the memory, and processing a third slice in the environment of the restored context.
9. The method according to any one of claims 1 to 3, wherein, before slicing the second queue data or the data to be processed corresponding to the second queue data to obtain the second slice, the method further comprises:
comparing the priority level of the second queue data or of the data to be processed corresponding to the second queue data with the priority level of the first remaining data;
wherein slicing the second queue data or the data to be processed corresponding to the second queue data to obtain the second slice comprises:
slicing the second queue data or the data to be processed corresponding to the second queue data to obtain the second slice in the case where its priority level is higher than the priority level of the first remaining data.
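Claim 9 gates the preemption: the second queue only takes the processor ahead of the first request's remainder when it carries strictly higher priority. A sketch of that test, with invented names and a larger-is-more-urgent priority convention:

```c
#include <stdbool.h>

struct pending {
    int priority;   /* larger value = more urgent */
};

/* True only when the second queue's data outranks the first request's
 * remainder; otherwise the remainder keeps its turn (claim 9). */
static bool should_preempt(const struct pending *second_data,
                           const struct pending *first_remainder)
{
    return second_data->priority > first_remainder->priority;
}
```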
10. A computing device, comprising: a processor, a virtual input/output device, a first queue, and a second queue, wherein the first queue and the second queue are queues of the virtual input/output device, and the virtual input/output device is used for implementing data transmission between a first device and a second device;
the processor is configured to slice first queue data or data to be processed corresponding to the first queue data to obtain a first slice, wherein the first queue data is data in the first queue;
the processor is configured to process the first slice and re-queue the first queue data or first remaining data of the data to be processed corresponding to the first queue data, wherein the first remaining data is what remains of the first queue data or of the data to be processed corresponding to the first queue data after the first slice is cut off;
the processor is configured to slice second queue data or data to be processed corresponding to the second queue data to obtain a second slice, wherein the second queue data is data in the second queue;
the processor is configured to process the second slice and re-queue the second queue data or second remaining data of the data to be processed corresponding to the second queue data, wherein the second remaining data is what remains of the second queue data or of the data to be processed corresponding to the second queue data after the second slice is cut off;
the processor is configured to slice the first remaining data to obtain a third slice;
the processor is configured to process the third slice and re-queue the first queue data or third remaining data of the data to be processed corresponding to the first queue data, wherein the third remaining data is what remains of the first queue data or of the data to be processed corresponding to the first queue data after the first slice and the third slice are cut off;
the processor is specifically configured to slice data in a first buffer area corresponding to a first descriptor in the first queue to obtain the first slice, wherein the first descriptor belongs to the first queue data;
the processor is specifically configured to process the first slice, determine a first offset, and re-queue the first remaining data, wherein the first offset is calculated according to the data of the first slice and is used to generate the start address of the third slice;
the processor is specifically configured to slice data in a second buffer area corresponding to a second descriptor in the second queue to obtain the second slice, wherein the second descriptor belongs to the second queue data;
the processor is specifically configured to process the second slice, determine a second offset, and re-queue the second remaining data, wherein the second offset is calculated according to the data of the second slice and is used to generate the start address of the slice produced by the next slicing;
the processor is specifically configured to slice the first remaining data in the first buffer area corresponding to the first descriptor in the first queue to obtain the third slice;
the processor is specifically configured to process the third slice, determine a third offset, and re-queue the third remaining data, wherein the third offset is calculated according to the data of the first slice and the third slice and is used to generate the start address of the slice produced by the next slicing.
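Read as a data layout, claim 10 pairs one processor with the two queues of a single virtual input/output device. A sketch of that composition, loosely shaped like a virtio-style descriptor ring; vio_desc, vio_queue, and computing_device are invented names, not the patent's terminology:

```c
#include <stdint.h>

/* Descriptor as a virtio-style ring entry (illustrative layout). */
struct vio_desc {
    uint64_t addr;    /* buffer address */
    uint32_t len;     /* buffer length  */
    uint16_t flags;
    uint16_t next;
};

struct vio_queue {
    struct vio_desc *ring;   /* descriptor ring           */
    unsigned num;            /* ring size                 */
    unsigned head, tail;     /* producer/consumer cursors */
};

/* The apparatus of claim 10: one processor alternating between the two
 * queues of a virtual I/O device that moves data between a first and a
 * second device (e.g. a frontend and a backend endpoint). */
struct computing_device {
    struct vio_queue first_queue;
    struct vio_queue second_queue;
};
```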
11. A computer-readable storage medium, characterized in that the computer-readable storage medium stores computer instructions which, when run on a computer device, cause the computer device to perform the method according to any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311250123.6A (granted as CN116991609B) | 2023-09-26 | 2023-09-26 | Queue fairness processing method, apparatus, and readable storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116991609A (en) | 2023-11-03 |
CN116991609B (en) | 2024-01-16 |
Family
ID=88530528
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311250123.6A (CN116991609B, active) | Queue fairness processing method, apparatus, and readable storage medium | 2023-09-26 | 2023-09-26 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116991609B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117724874B (en) * | 2024-02-06 | 2024-04-26 | 珠海星云智联科技有限公司 | Method, computer device and medium for managing shared receive queues |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
ITMI20011140A1 (en) * | 2001-05-30 | 2002-11-30 | Cit Alcatel | METHOD TO TRANSFER PACKAGES OF INFORMATION AND SYSTEM THAT USES IT |
US7496699B2 (en) * | 2005-06-17 | 2009-02-24 | Level 5 Networks, Inc. | DMA descriptor queue read and cache write pointer arrangement |
CN105162724B (en) * | 2015-07-30 | 2018-06-26 | 华为技术有限公司 | Data enqueuing and dequeuing method and queue management unit |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1798092A (en) * | 2004-12-29 | 2006-07-05 | 中兴通讯股份有限公司 | Fast weighted polling dispatching method, and fast weighted polling despatcher and device |
CN111708812A (en) * | 2020-05-29 | 2020-09-25 | 北京赛博云睿智能科技有限公司 | Distributed data processing method |
CN115858116A (en) * | 2022-11-28 | 2023-03-28 | 招联消费金融有限公司 | Case scheduling method, computer equipment and computer-readable storage medium |
Non-Patent Citations (1)
Title |
---|
Parallel processing method for massive terrain data based on double-buffered queues; Chen Xiaopan et al.; Journal of Zhengzhou University (Engineering Science), Vol. 37, No. 3, pp. 6-10 *
Also Published As
Publication number | Publication date |
---|---|
CN116991609A (en) | 2023-11-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN116991609B (en) | Queue fairness processing method, apparatus, and readable storage medium | |
CN115934625B (en) | Doorbell knocking method, equipment and medium for remote direct memory access | |
CN112698959A (en) | Multi-core communication method and device | |
CN115964319A (en) | Data processing method for remote direct memory access and related product | |
CN117573602B (en) | Method and computer device for remote direct memory access message transmission | |
US11671382B2 (en) | Technologies for coordinating access to data packets in a memory | |
CN115827506A (en) | Data writing method, data reading method, device, processing core and processor | |
CN113986969A (en) | Data processing method and device, electronic equipment and storage medium | |
CN109905331B (en) | Queue scheduling method and device, communication equipment and storage medium | |
CN117909031A (en) | Message processing method, computer equipment and medium for data processing unit | |
CN116340246B (en) | Data pre-reading method and medium for direct memory access read operation | |
CN112306827A (en) | Log collection device, method and computer readable storage medium | |
CN115576661A (en) | Data processing system, method and controller | |
US11188394B2 (en) | Technologies for synchronizing triggered operations | |
CN113326151A (en) | Inter-process communication method, device, equipment, system and storage medium | |
CN112559404A (en) | Data scheduling device and method and accelerated processing chip | |
US10007485B2 (en) | Zero-delay compression FIFO buffer | |
CN117687795B (en) | Hardware offloading method, device and medium for remote direct memory access | |
CN118368293B (en) | Data transmission method, computer equipment and medium | |
CN116166605B (en) | Data hybrid transmission method, device, DMA controller, medium and system | |
CN117573603B (en) | Data processing method and computer equipment for remote direct memory access | |
CN118138558B (en) | Message packet sending method based on direct memory access, computer equipment and medium | |
US20230418697A1 (en) | Data transmission system and related device | |
CN113626216A (en) | Method and system for optimizing network application performance based on remote direct data access | |
CN117749729A (en) | Data transmission method and device and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |