CN117370018A - Accelerator, ordering method and heterogeneous computing system - Google Patents

Accelerator, ordering method and heterogeneous computing system

Info

Publication number
CN117370018A
Authority
CN
China
Prior art keywords
data
sorting
sequence
container
ordering
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311348784.2A
Other languages
Chinese (zh)
Inventor
马荣杰
才华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yusur Technology Co ltd
Original Assignee
Yusur Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yusur Technology Co ltd filed Critical Yusur Technology Co ltd
Priority to CN202311348784.2A priority Critical patent/CN117370018A/en
Publication of CN117370018A publication Critical patent/CN117370018A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/5038Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the invention provide an accelerator, a sorting method, and a heterogeneous computing system. The accelerator includes: a reading unit configured to read data to be sorted from a first storage space according to a sorting instruction from a processor; a sorting unit configured to read part of the data to be sorted into a sorting container for initial sorting, obtaining a sorted sequence that fills the preset capacity of the sorting container, and to sort the remaining data among the data to be sorted against the data in the sorted sequence to update the sorted sequence, where the preset capacity of the sorting container corresponds to a preset sort truncation length; and a writing unit configured to write the updated sorted sequence into a second storage space as the sorting result of the sorting instruction. The scheme of the embodiments improves the computing efficiency of the accelerator and thereby the computing efficiency of the heterogeneous computing system.

Description

Accelerator, ordering method and heterogeneous computing system
Technical Field
Embodiments of the invention relate to the technical field of computers, and in particular to an accelerator, a sorting method, and a heterogeneous computing system.
Background
A data processing unit (DPU) acts as an accelerator and may be implemented as a dedicated data-processing chip. Compared with a processor such as a CPU, it can achieve a very large performance improvement when processing complex data computations.
When performing data computing tasks, a heterogeneous computing system including a CPU and an accelerator such as a DPU may be employed, with the computation of individual subtasks offloaded from the CPU to the accelerator, thereby improving computing performance. Such a heterogeneous computing architecture lets the CPU focus on scheduling distributed computing tasks while the accelerator performs the data computation in subtasks (e.g., Spark SQL), which is particularly valuable when distributed computing frameworks (e.g., Apache Spark) need additional computing power, as in big-data scenarios.
In computing a query subtask such as Spark SQL, a sorting operator such as the TopN operator must be evaluated. A traditional accelerator usually employs several internal computing units in cooperation when computing the sorting operator, and intermediate computation results must be moved between the different computing units during computation, so the computing efficiency of the accelerator is low, and with it the computing efficiency of the heterogeneous computing system.
Disclosure of Invention
Accordingly, embodiments of the present invention provide an accelerator, a sorting method, and a heterogeneous computing system to at least partially solve the above-mentioned problems.
According to a first aspect of an embodiment of the present invention, there is provided an accelerator including: a reading unit configured to read data to be sorted from a first storage space according to a sorting instruction from a processor; a sorting unit configured to read part of the data to be sorted into a sorting container for initial sorting, obtaining a sorted sequence that fills the preset capacity of the sorting container, and to sort the remaining data among the data to be sorted against the data in the sorted sequence to update the sorted sequence, where the preset capacity of the sorting container corresponds to a preset sort truncation length; and a writing unit configured to write the updated sorted sequence into a second storage space as the sorting result of the sorting instruction.
In another implementation of the present invention, the sorting unit is specifically configured to: and if the sorting priority of the current data in the rest data in the data to be sorted is higher than the sorting priority of the first data in the sorting sequence, sorting the current data before the first data, and deleting the last data in the sorting sequence at the same time so as to update the sorting sequence.
In another implementation of the present invention, the reading unit is specifically configured to: if remaining data still exist among the data to be sorted, read the next data after the current data as the new current data. The writing unit is specifically configured to: if no remaining data exist among the data to be sorted, write the updated sorted sequence into a second storage space as the sorting result of the sorting instruction.
In another implementation of the present invention, the sorting unit is further configured to: and discarding the current data if the sorting priority of the current data is behind the last data in the sorting sequence.
In another implementation of the present invention, the sorting unit is specifically configured to: the current data is ranked between the first data and the second data if the current data is prioritized after the first data and before the second data.
In another implementation of the present invention, the sorting unit determines that the first data and the second data are adjacent in the sorted sequence before the sorting unit sorts the current data to the first data and the second data.
In another implementation of the present invention, the sorting unit is specifically configured to: if the sorting priority of the current data among the data to be sorted is before that of the third data in the sorting container, sort the current data before the third data in the sorting container; if the sorting priority of the current data is after the third data and before the fourth data, sort the current data between the third data and the fourth data in the sorting container, where the third data and the fourth data are adjacent; and if the sorting priority of the current data is after the fourth data, sort the current data after the fourth data in the sorting container.
In another implementation of the present invention, the sorting unit is specifically configured to: the sorting sequence is determined if the sorting container is filled after the current data is sorted in the sorting container.
In another implementation of the present invention, the sorting unit is further configured to: create the sorting container of the preset capacity according to the sorting instruction, and delete the sorting container after the writing unit writes the updated sorted sequence, where the sorting instruction includes the preset capacity.
In another implementation of the present invention, the accelerator further includes a communication unit configured to: fetch the sorting instruction, and send a response to the sorting instruction to the processor after the writing unit writes the updated sorted sequence.
According to a second aspect of an embodiment of the present invention, a sorting method is provided. The sorting method comprises the following steps: reading data to be sorted from a first storage space according to a sorting instruction from a processor; reading part of the data to be sorted into a sorting container for initial sorting to obtain a sorted sequence that fills the preset capacity of the sorting container, where the preset capacity of the sorting container corresponds to a preset sort truncation length; sorting the remaining data among the data to be sorted against the data in the sorted sequence to update the sorted sequence; and writing the updated sorted sequence into a second storage space as the sorting result of the sorting instruction.
According to a third aspect of embodiments of the present invention, a heterogeneous computing system is provided. The heterogeneous computing system includes a processor and an accelerator according to the first aspect.
According to the embodiment of the invention, the sorting sequence filling the sorting container is obtained by carrying out initial sorting in the sorting container, and the sorting sequence is updated by the local sorting in the sorting container, so that the global sorting of the data to be sorted is avoided, and the calculated amount is reduced. In addition, the initial sorting and the further sorting are performed in the sorting containers of the sorting units, so that the data are prevented from being carried in different computing units, the computing efficiency of the accelerator is improved, and the computing efficiency of the heterogeneous computing system is further improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings required in the embodiments or in the description of the prior art are briefly described below. It is obvious that the drawings in the following description cover only some embodiments of the present invention, and other drawings may be derived from these drawings by a person of ordinary skill in the art.
FIG. 1 is a schematic diagram of a distributed computing system according to some examples.
Fig. 2A and 2B are schematic diagrams of a sorting process of TopN operators according to some examples.
Fig. 3 is a schematic block diagram of an accelerator according to some embodiments of the invention.
Fig. 4A and 4B are schematic diagrams of an exemplary ordering process of the embodiment of fig. 3.
Fig. 5 is a flow chart of steps of a sorting process according to further embodiments of the present invention.
FIG. 6 is a schematic diagram of a heterogeneous computing system according to further embodiments of the present invention.
Detailed Description
In order to better understand the technical solutions in the embodiments of the present invention, the following description will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which are derived by a person skilled in the art based on the embodiments of the present invention, shall fall within the scope of protection of the embodiments of the present invention.
The implementation of the embodiments of the present invention will be further described below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of a distributed computing system according to some examples. The distributed computing system of FIG. 1 includes a plurality of servers (also referred to as hosts) 10, each of which may be configured with a virtualization container. The distributed computing framework 120 may be deployed in the virtualization container of each server. Data transmission between the servers may be performed through the internet 20, and the internet 20 may be a network composed of intelligent network cards connected to the servers, a network composed of switches with hardware resource pools, or a combination of the two.
Further, the server 10 is also provided with a computing power layer 110, where the computing power layer 110 includes at least a processor 200 such as a CPU and an accelerator 300 such as a DPU; the computing power layer 110 may further include a general-purpose accelerator (not shown) such as a GPU. The computing power layer 110 in each server 10 constitutes a significant portion of the hardware resources of the distributed computing system.
Further, the structured processing interface 130 may be configured based on the distributed computing framework 120, thereby providing the capability to process structured data.
In some examples, the distributed computing framework 120 may be a fast general-purpose computing engine designed for large-scale data computation, such as Apache Spark, which can optimize iterative workloads in addition to providing interactive queries. The structured processing interface 130 may be a module such as Spark SQL that processes structured data. Spark SQL is a computing module built on Apache Spark; it allows users to perform SQL query operations using standard SQL statements.
In one example employing the distributed computing system shown in FIG. 1, the processor 200 schedules computing tasks based on a distributed computing framework 120 such as Apache Spark to generate query subtasks, and then distributes the query subtasks to the accelerator 300 for computation. When performing the computation of a query subtask, the accelerator 300 reads the data to be computed from memory, and then writes the computation result of the query subtask back to memory.
For a structured processing interface 130 such as Spark SQL, one important query subtask is the sorting subtask of the TopN operator. The function of the TopN operator is to sort according to a sorting rule given by the SQL statement (e.g., based on sorting priority) and, after sorting is complete, truncate the first N (largest or smallest) data. Referring to Fig. 2A, the implementation of the TopN operator is mainly decomposed into the respective processes of a sortOrder calculation unit, a gather calculation unit, and a slice calculation unit in the accelerator 300.
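As a minimal illustration of the TopN semantics just described (sort by priority with a NULL rule, then truncate to the first N), the following hedged Python sketch may help; it is an assumption for illustration, not the patent's implementation, with integers standing in for sorting priorities and None for a NULL attribute value:

```python
def top_n(values, n):
    """Sort in descending order with NULL (None) treated as smallest,
    then keep only the first n results (the TopN truncation)."""
    ordered = sorted(
        values,
        # Non-null values sort first (False < True); among non-null
        # values, negate to get descending order.
        key=lambda v: (v is None, -v if v is not None else 0),
    )
    return ordered[:n]
```

For example, `top_n([14, None, 16, 10], 2)` returns `[16, 14]`.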
Specifically, the sortOrder computation unit is configured to obtain a globally ordered row index, and exemplary development pseudocode for its function is as follows:
Definitions:
* sortKeys: indexes of the columns to be sorted
* isDescending: whether to sort in descending order
* areNullsSmallest: whether NULL is treated as the smallest value
* gatherMap: column index corresponding to the sorted sequence
* return: return code
Function:
public static native int sortOrder(long[] sortKeys,
boolean[] isDescending,
boolean[] areNullsSmallest,
long gatherMap) throws RaceException;
Furthermore, the gather calculation unit is configured to obtain the globally ordered data table according to the index; exemplary development pseudocode for its function is as follows:
Definitions:
* columnIds: indexes of the non-join columns
* joinedKeyId: index of the key column in the joined table
* joinedColumnIds: indexes of the joined columns
* return: return code
Function:
public static native int gather(long[] columnIds,
long joinedKeyId,
long[] joinedColumnIds);
Furthermore, the slice calculation unit is configured to extract the first N data from the data table; exemplary development pseudocode for its function is as follows:
Definitions:
* columnIds: index of the column to be truncated
* start: start position
* end: end position
* results: column index corresponding to the truncated result
* return: return code
Function:
public static native int slice(long columnIds,
long start,
long end,
long results) throws RaceException;
The sorting process of Fig. 2A is exemplarily described below. The sortOrder calculation unit sorts the data to be sorted (i.e., the original sequence) to obtain a global ordering sequence. In the example of Fig. 2A, c has the highest sorting priority, b the second highest, and a the third, while null is treated as smallest. In the resulting global ordering sequence, the first entry is therefore the index of c in the original sequence (index 2), the second is the index of b (index 3), the third is the index of a (index 1), and the fourth is the index of null (index 4). That is, the global ordering sequence is: 2, 3, 1, 4. It should be understood that a, b, and c are objects whose priorities are obtained by comparing non-null attribute values, and null is an object whose attribute value is null. In this example, objects with non-null attribute values are defined to rank higher than objects whose attribute value is null; alternatively, other sorting priorities may be defined for objects whose attribute values are null.
Further, the gather calculation unit fetches data according to the global ordering sequence 2, 3, 1, 4. The resulting ordered sequence is therefore: c, b, a, null.
Further, the slice calculation unit truncates the first N data from the ordered sequence; taking Fig. 2A as an example, the first two data of the ordered sequence, c and b, are taken as the final sorting result. Fig. 2B intuitively shows the correspondence between the data to be sorted and the sorting result.
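The three-unit pipeline walked through above can be sketched as follows. This is a hedged Python illustration of the sortOrder → gather → slice flow (all function names here are assumptions, and numeric priorities stand in for the a/b/c objects):

```python
def sort_order(values):
    """sortOrder stage: return row indices in globally sorted order.
    None (NULL) is treated as smallest, so it sorts last when descending."""
    return sorted(
        range(len(values)),
        key=lambda i: (values[i] is None,
                       -values[i] if values[i] is not None else 0),
    )

def gather(values, index_map):
    """gather stage: materialize the rows in index order."""
    return [values[i] for i in index_map]

def slice_rows(values, start, end):
    """slice stage: truncate rows start..end (the first N results)."""
    return values[start:end]
```

With `data = [14, 16, 15, None]` standing for a, c, b, null, `sort_order(data)` gives `[1, 2, 0, 3]` (the 0-based analogue of the document's 1-based sequence 2, 3, 1, 4), and slicing the gathered result to the first two rows yields `[16, 15]`, i.e. c and b.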
As can be seen from the above example, in the calculation process of the accelerator 300, intermediate calculation results must be moved between different calculation units, and a global sorting process is involved, resulting in low overall calculation efficiency of the accelerator 300.
Further, fig. 3 illustrates an accelerator according to some embodiments of the invention. The accelerator includes:
the reading unit 310 is configured to read the data to be sorted from the first storage space according to the sorting instruction of the processor.
It should be appreciated that the first storage space may be a memory (or referred to as a main memory) or may be a local cache of the accelerator. And under the condition that the first storage space is a memory, the accelerator reads data to be ordered from the memory. In the case that the first storage space is a local cache, the accelerator reads data to be ordered from the memory to the local cache, and further reads the data to be ordered from the local cache.
It should also be understood that the accelerator may further include a communication unit, and the communication unit may acquire and parse the ordering instruction sent by the processor, and then send the parsed ordering instruction to the reading unit.
The sorting unit 320 is configured to read a portion of data in the data to be sorted into a sorting container for initial sorting, obtain a sorting sequence filled with a preset capacity of the sorting container, and sort the remaining data in the data to be sorted and the data in the sorting sequence to update the sorting sequence, where the preset capacity of the sorting container corresponds to a preset sorting interception length.
It should be appreciated that the sorting unit may create a sorting container of the preset capacity according to the sorting instruction. The sorting instruction may include information indicating the preset capacity.
It should also be appreciated that after the sort sequence fills the sort container, the length of the sort sequence does not increase with each sort. The length of the sorting sequence increases with each sorting before the sorting sequence fills the sorting container.
The writing unit 330 is configured to write the updated ordering sequence into the second storage space as an ordering result of the ordering instruction.
It should be appreciated that the second storage space may be a memory or may be a non-volatile storage medium such as an SSD.
It should also be appreciated that the sorting unit may delete the sorting container after the writing unit writes the updated sorted sequence. After the writing unit writes the updated sorted sequence, the communication unit may send a response to the sorting instruction to the processor, so that the processor can continue to schedule query subtasks based on the sorting result.
In the scheme of the embodiment of the invention, the sorting sequence filling the sorting container is obtained by carrying out initial sorting in the sorting container, and the sorting sequence is updated by the local sorting in the sorting container, so that the global sorting of the data to be sorted is avoided, and the calculated amount is reduced. In addition, the initial sorting and the further sorting are performed in the sorting containers of the sorting units, so that the data are prevented from being carried in different computing units, the computing efficiency of the accelerator is improved, and the computing efficiency of the heterogeneous computing system is further improved.
The following describes the initial sorting process of the sorting container and the update process of the sorted sequence, respectively. It should be appreciated that, both during the initial sorting process and during the update process, the reading unit reads the current data from the first storage space. After the sorting of the current data is completed, the next data is read as the new current data, until there is no next data. Further, local sorting is performed in the sorting container, whether during the initial sorting process or the update process.
In addition, during the update of the sorted sequence, since the sorted sequence already has the preset sequence length, the sorting container is in a full state, and the current data is either sorted into the sorted sequence or discarded. Data already sorted into the sequence may likewise be retained or removed; for example, if at a given sorting step the current data has a higher sorting priority than the current last data of the sorted sequence, the current data is sorted into the sequence and the current last data is removed from it.
In the initial sorting process, the sorting container is in an unfilled state, and the current data is not discarded but is sorted into the sorting container, that is, the data amount in the sorting container is smaller than the preset capacity of the sorting container.
In addition, the reading unit of the embodiment obtains the current data index in the data to be sorted from the first storage space, and reads the current data corresponding to the current data index. In the present embodiment, it is defined that the sorting priority of the object whose attribute value is non-null is higher than the sorting priority of the object whose attribute value is null. Alternatively, objects whose attribute values are null may be defined to have other sorting priorities.
Further, fig. 4A shows an initial ordering process of one example of the ordering process of the TopN operator. The memory (or the local cache of the accelerator) stores data to be ordered, which are a (14), b (16), c (10), d (6), e (1), f (3), g (4) and h (8) in sequence from left to right. The preset capacity of the created sorting container is 5. It should be appreciated that the numerical values following a, b, c, d, e, f, g, h represent their ranking priorities, which may indicate a ranking rule, for example, ranking the numerical values from large to small, or prioritizing the numerical values from small to large, or other ranking rule. It should also be understood that the various data described above are merely exemplary and should not be construed as specific characters.
The data (current data) is sequentially read into the sorting container in the reading order from left to right (an example of the reading order in the first storage space). When a is read as the current data, no data is in the sorting container, and the sorting container is in an unfilled state (< 5), and a is directly read into the sorting container.
When b is read as current data, the sorting container is now in an unfilled state (< 5), b is read directly into the sorting container, and b is sorted before a because the sorting priority 16 of b is before the sorting priority 14 of a. Without loss of generality, the sorting unit ranks the current data before the third data in the sorting container if the sorting priority of the current data precedes the sorting priority of the third data in the sorting container.
When c is read as the current data, the sorting container is in an unfilled state (< 5), and c is directly read into the sorting container. Since the sorting priority 10 of c is after the sorting priority 14 of a, c is sorted after a.
When d is read as the current data, the sorting container is in an unfilled state (< 5), and d is directly read into the sorting container. Since the sorting priority 6 of d is after the sorting priority 10 of c, d is sorted after c. Without loss of generality, the sorting unit sorts the current data after the fourth data in the sorting container if the sorting priority of the current data is after the fourth data.
When e is read as the current data, the sorting container is in an unfilled state (< 5), and e is directly read into the sorting container. Since e's sorting priority 1 is after d's sorting priority 6, e is sorted after d. Without loss of generality, the sorting unit sorts the current data between the third data and the fourth data, which are neighboring data, in the sorting container if the sorting priority of the current data follows the third data and precedes the fourth data.
Through the sorting priority comparison process, the previous data in the sorting container is sorted before the current data is sorted into the sorting container, so that the initial sorting in the sorting container is efficiently realized without traversing all the data in the sorting container.
Further, the sorting unit is specifically configured to: if the data in the sorting container reaches the preset capacity after the current data is sorted in the sorting container, the sorting sequence is determined. For example, after e is ranked into the ranking container, the data in the ranking container reaches a preset capacity of 5.
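The initial-fill behavior walked through above (a through e, capacity 5) can be sketched in Python as follows. This is an assumed illustration, not the hardware logic, with sorting priorities represented as plain integers:

```python
def insert_descending(container, value):
    """Place value before the first element it outranks (the neighbor
    comparison described above), keeping the container sorted in
    descending priority order without traversing already-sorted data."""
    for i, existing in enumerate(container):
        if value > existing:
            container.insert(i, value)
            return
    container.append(value)

def initial_fill(values, capacity):
    """Read data into the sorting container until it reaches the preset
    capacity. Returns the filled container (the initial sorted sequence)
    and the remaining, not-yet-consumed data."""
    container = []
    consumed = 0
    for value in values:
        insert_descending(container, value)
        consumed += 1
        if len(container) == capacity:
            break
    return container, values[consumed:]
```

With the document's example a(14), b(16), c(10), d(6), e(1), f(3), g(4), h(8) and capacity 5, `initial_fill([14, 16, 10, 6, 1, 3, 4, 8], 5)` yields the sequence `[16, 14, 10, 6, 1]` and leaves `[3, 4, 8]` to be processed.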
Further, fig. 4B shows an update procedure of the sorted sequence of one example of the sorting procedure of the TopN operator. After the initial sorting process of fig. 4A, the remaining data in the data to be sorted in the memory further includes f (3), g (4), and h (8).
When f is read as the current data, the sorting container is in the full state (=5), and since the sorting priority 3 of f is greater than the sorting priority 1 of the last data e, the data e is removed from the sorting sequence, and f is read into the sorting container. Since the sorting priority 3 of f is smaller than the sorting priority 6 of the data d, f is sorted to the end of the sorting sequence.
When reading g as current data, the sorting container is in the full state (=5) at this time, since the sorting priority 4 of g is greater than the sorting priority 3 of the last data f, the data f is removed from the sorting sequence, and g is read into the sorting container. Since the sorting priority 4 of g is smaller than the sorting priority 6 of data d, g is sorted to the end of the sorting sequence.
When h is read as the current data, the sorting container is in the full state (=5) at this time, since the sorting priority 8 of h is greater than the sorting priority 4 of the last data g, the data g is removed from the sorting sequence, and h is read into the sorting container. Since the sorting priority 8 of h is smaller than the sorting priority 10 of data c and is larger than the sorting priority 6 of d, h is sorted between data c and data d.
Without loss of generality, if the sorting priority of the current data precedes the sorting priority of the first data in the sorted sequence, the sorting unit sorts the current data before the first data while deleting the last data in the sorted sequence to update the sorted sequence. By comparing with the sorting priority of the first data and deleting the end data in the sorting sequence, the updating efficiency of the sorting sequence is improved.
Alternatively, the sorting unit is further configured to: and discarding the current data if the sorting priority of the current data is behind the last data in the sorting sequence. The updating efficiency of the sequencing sequence is further improved by comparing with the end data.
Optionally, the sorting unit is further configured to: if the sorting priority of the current data is after the first data and before the second data, sort the current data between the first data and the second data. Comparing against the sorting priorities of the first data and the second data improves the update efficiency of the sorted sequence.
Optionally, the first data and the second data are adjacent in the sorted sequence before the sorting unit sorts the current data between the first data and the second data. Through this sorting-priority comparison against adjacent data, the earlier data in the sorted sequence were already sorted before the current data is sorted into the sequence, so all data in the sorted sequence need not be traversed, and the update of the sorted sequence is realized efficiently.
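The update rule described above (compare with the last element, evict it, then place the newcomer by neighbor comparisons) can be sketched as follows; this is an assumed Python illustration continuing the earlier integer-priority convention, not the hardware logic:

```python
def update_sequence(container, value):
    """Update a full, descending-sorted container with one new value.

    If the value does not outrank the current last element, it is
    discarded; otherwise the last element is evicted and the value is
    inserted at its position by neighbor comparisons from the front.
    """
    if value <= container[-1]:
        return  # discard: priority is behind the last data in the sequence
    container.pop()  # delete the last data in the sorted sequence
    for i, existing in enumerate(container):
        if value > existing:
            container.insert(i, value)
            return
    container.append(value)
```

Replaying the f/g/h steps of Fig. 4B: starting from `[16, 14, 10, 6, 1]`, feeding 3, then 4, then 8 produces `[16, 14, 10, 6, 3]`, `[16, 14, 10, 6, 4]`, and finally `[16, 14, 10, 8, 6]`.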
In addition, when reading the current data from the data to be sorted, if there is still remaining data in the data to be sorted, the reading unit reads the next data of the current data as the current data. And if the data to be sequenced does not have residual data, the writing unit writes the updated sequencing sequence.
Further, the ranking container is configured to get a state that meets the ranking rule (which defines the ranking priority) after each ranking.
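For illustration, the update rule just described (plain ordered insertion while the container fills; once full, discard a datum that sorts after the tail, otherwise delete the tail and insert the datum between the adjacent pair it belongs to) can be sketched in Java. The class and method names are hypothetical and integer values stand in for sorting priorities; this is a sketch, not the accelerator's actual implementation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the sorting unit's fixed-capacity container.
// Integer values stand in for sorting priorities; larger sorts earlier.
public class SortContainer {
    private final int capacity; // preset capacity = preset sort truncation length
    private final List<Integer> sequence = new ArrayList<>(); // descending priority

    public SortContainer(int capacity) {
        this.capacity = capacity;
    }

    // Offer one datum: plain ordered insertion while filling; once full,
    // discard data that sort after the tail, otherwise delete the tail and
    // insert the datum between the adjacent pair it belongs to.
    public void offer(int current) {
        if (sequence.size() < capacity) { // initial sorting phase
            insertInOrder(current);
            return;
        }
        int last = sequence.get(sequence.size() - 1);
        if (current <= last) { // sorts after the last data: discard
            return;
        }
        sequence.remove(sequence.size() - 1); // delete the last data
        insertInOrder(current);               // keeps the preset length
    }

    private void insertInOrder(int current) {
        int i = 0;
        while (i < sequence.size() && sequence.get(i) >= current) {
            i++;
        }
        sequence.add(i, current);
    }

    public List<Integer> snapshot() {
        return List.copyOf(sequence);
    }
}
```

For example, with capacity 5 and priorities 3, 10, 6, 1, 4 followed by 8, the final sequence is 10, 8, 6, 4, 3: the late datum 8 evicts the then-tail 1 and is inserted between 10 and 6, mirroring how h is placed between c and d above.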
The processing of the accelerator according to some embodiments of the invention is described above. An expanded pseudocode example of the sorting unit's interface is shown below:
Definitions:
* src_handles: data columns to be sorted
* src_order_index_handles: data columns used as sort keys
* res_handles: sorted data columns
* isDescending: whether to sort in descending order
* areNullsSmallest: whether NULL is treated as the minimum value
* use_limit: sorting rule: true returns only the first N results
* limit_line_n: the number N of results to return
* returns: a status code
Function declaration:
public static native int slice(long[] src_handles,
                               long[] src_order_index_handles,
                               long[] res_handles,
                               boolean[] isDescending,
                               boolean[] areNullsSmallest,
                               boolean use_limit,
                               long limit_line_n) throws RaceException;
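As an illustration of how flags like those in the declaration above are conventionally interpreted, the following Java sketch realizes the descending-order, NULL-ordering, and limit semantics with an ordinary Comparator. It is a hypothetical stand-in for the native slice() routine, not the actual accelerator code; the class name and the use of boxed Integer keys are assumptions.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical interpretation of slice()-style flags using ordinary Java
// sorting; not the native implementation, only the conventional semantics.
public class SliceSemantics {
    public static List<Integer> slice(List<Integer> src, boolean isDescending,
                                      boolean areNullsSmallest, boolean useLimit,
                                      int limitLineN) {
        Comparator<Integer> nonNull = isDescending
                ? Comparator.<Integer>reverseOrder()
                : Comparator.<Integer>naturalOrder();
        // If NULL is the minimum, it comes first in ascending order and last
        // in descending order (and vice versa when NULL is the maximum).
        Comparator<Integer> cmp = (areNullsSmallest != isDescending)
                ? Comparator.nullsFirst(nonNull)
                : Comparator.nullsLast(nonNull);
        List<Integer> out = new ArrayList<>(src);
        out.sort(cmp);
        if (useLimit && out.size() > limitLineN) { // return only the first N results
            out = out.subList(0, limitLineN);
        }
        return new ArrayList<>(out);
    }
}
```

For instance, sorting 3, NULL, 1, 2 in descending order with NULL treated as the minimum places NULL last, and a limit of 3 then keeps only 3, 2, 1.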
The sorting process according to further embodiments of the present invention will be described below in connection with fig. 5. The sorting method of fig. 5 includes:
S510: reading the data to be sorted from the first storage space according to a sorting instruction of the processor.
S520: reading part of the data to be sorted into a sorting container for initial sorting, to obtain a sorting sequence filling the preset capacity of the sorting container, where the preset capacity of the sorting container corresponds to a preset sort truncation length.
S530: sorting the remaining data to be sorted with the data in the sorting sequence to update the sorting sequence.
S540: writing the updated sorting sequence into the second storage space as the sorting result of the sorting instruction.
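The four steps can be sketched end to end with a bounded min-heap, a standard software realization of a fixed-capacity top-N container. Everything here is illustrative: the two storage spaces are modeled as in-memory lists, the sort key as an int, and the class name is invented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Illustrative end-to-end sketch of S510-S540 using a bounded min-heap.
// The storage spaces are modeled as in-memory lists; names are invented.
public class HeapTopN {
    public static List<Integer> sort(List<Integer> firstStorage, int presetCapacity) {
        PriorityQueue<Integer> container = new PriorityQueue<>(); // min at head
        for (int current : firstStorage) {              // S510: read each datum
            if (container.size() < presetCapacity) {    // S520: initial sorting
                container.offer(current);
            } else if (current > container.peek()) {    // S530: update
                container.poll();                       // evict the lowest priority
                container.offer(current);
            }                                           // else: discard current
        }
        List<Integer> result = new ArrayList<>(container); // S540: write result,
        result.sort(Collections.reverseOrder());           // highest priority first
        return result;
    }
}
```

With capacity 3 over the keys 5, 1, 9, 3, 7, 2, 8, the method returns 9, 8, 7: each datum that outranks the current minimum evicts it, and the rest are discarded without ever sorting all the data globally.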
It should be appreciated that the first storage space may be a memory (also called main memory) or a local cache of the accelerator. When the first storage space is the memory, the accelerator reads the data to be sorted from the memory. When the first storage space is the local cache, the accelerator reads the data to be sorted from the memory into the local cache and then reads them from the cache. The accelerator may further include a communication unit that acquires and parses the sorting instruction sent by the processor and then transmits the parsed instruction to the reading unit.
It should be appreciated that the sorting unit may create a sorting container of the preset capacity according to the sorting instruction, which may include information indicating that capacity. Before the sorting sequence fills the sorting container, the length of the sequence increases with each sorting step; after the container is filled, the length no longer increases.
It should be appreciated that the second storage space may be a memory or a non-volatile storage medium such as an SSD. After the writing unit writes the updated sorting sequence, the sorting unit may delete the sorting container, and the communication unit may send a response to the sorting instruction to the processor, so that the processor can continue to schedule query subtasks based on the sorting result.
In the scheme of the embodiment of the invention, the sorting sequence filling the sorting container is obtained through initial sorting inside the container, and the sequence is then updated by local sorting inside the same container; global sorting of all the data to be sorted is thus avoided, reducing the amount of computation. In addition, because both the initial sorting and the subsequent updates take place in the sorting container of the sorting unit, data need not be moved between different computing units, which improves the computing efficiency of the accelerator and, in turn, of the heterogeneous computing system.
In other examples, sorting the remaining data to be sorted with the data in the sorting sequence to update the sorting sequence includes: if the sorting priority of the current data among the remaining data precedes that of the first data in the sorting sequence, sorting the current data before the first data and simultaneously deleting the last data in the sequence, so as to update it.
In other examples, reading the data to be sorted from the first storage space includes: if remaining data still exist, reading the next data after the current data as the new current data. Writing the updated sorting sequence into the second storage space as the sorting result of the sorting instruction includes: if no data remain, writing the updated sorting sequence into the second storage space.
In other examples, updating the sorting sequence further includes: discarding the current data if its sorting priority falls after that of the last data in the sequence.
In other examples, updating the sorting sequence further includes: sorting the current data between the first data and the second data if its priority falls after the first data and before the second data.
In other examples, the method further comprises: before sorting the current data relative to the first data and the second data, determining that the first data and the second data are adjacent in the sorting sequence.
In other examples, reading part of the data to be sorted into the sorting container for initial sorting to obtain a sorting sequence filling the preset capacity includes: if the sorting priority of the current data precedes that of the third data in the sorting container, sorting the current data before the third data.
In other examples, the initial sorting further includes: if the priority of the current data falls after the third data and before the fourth data, sorting the current data between the third data and the fourth data in the container.
In other examples, the initial sorting further includes: if the priority of the current data falls after the fourth data, sorting the current data after the fourth data in the container. In one example, the third data and the fourth data are adjacent.
In other examples, the initial sorting further includes: determining the sorting sequence once the sorting container is filled after the current data is sorted into it.
In other examples, the sorting method further comprises: creating the sorting container of the preset capacity according to the sorting instruction, which includes the preset capacity, and deleting the container after the writing unit writes the updated sorting sequence.
In other examples, the sorting method further comprises: acquiring the sorting instruction, and sending a response to the sorting instruction to the processor after the writing unit writes the updated sorting sequence.
For convenience and brevity of description, the specific implementation of each step of the sorting method of the embodiment of the present invention may refer to the corresponding steps and units in the foregoing accelerator embodiments, with the corresponding beneficial effects, and is not repeated here.
FIG. 6 is a schematic diagram of a heterogeneous computing system according to further embodiments of the present invention. Heterogeneous computing system 110 includes processor 200 and accelerator 300 and can be implemented as a system on a chip. Instructions or data can be transferred between processor 200 and accelerator 300 over a bus such as PCIe, and between accelerator 300 and an SSD over a bus such as NVMe. The accelerator 300 may be the accelerator of any of the above examples of the present invention, and its corresponding operations are not repeated here.
The embodiment of the invention also provides a computer storage medium on which a computer program is stored; when executed by a processor, the program implements the sorting method described in any of the preceding embodiments. The computer storage media include, but are not limited to: a compact disc read-only memory (CD-ROM), a random access memory (RAM), a floppy disk, a hard disk, a magneto-optical disk, or the like.
The embodiment of the invention also provides a computer program product comprising computer instructions that instruct a computing device to perform the operations corresponding to any of the above sorting methods.
In addition, it should be noted that any user-related information (including, but not limited to, user equipment information and user personal information) and any data involved in the embodiments of the present invention (including, but not limited to, sample data for training models and data for analysis, storage, or presentation) are information and data authorized by the user or fully authorized by all parties. The collection, use, and processing of such data must comply with the relevant regulations and standards, and corresponding operation entries are provided for users to choose to authorize or refuse.
It should be noted that, according to implementation requirements, each component/step described in the embodiments of the present invention may be split into more components/steps, or two or more components/steps or part of operations of the components/steps may be combined into new components/steps, so as to achieve the objects of the embodiments of the present invention.
The methods according to the embodiments of the present invention described above may be implemented in hardware or firmware, or as software or computer code that can be stored in a recording medium such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk, or as computer code originally stored in a remote recording medium or a non-transitory machine-readable medium and downloaded through a network to be stored in a local recording medium. The methods described herein may thus be processed by such software on a recording medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA). It is understood that a computer, processor, microprocessor controller, or programmable hardware includes a memory component (e.g., RAM, ROM, or flash memory) that can store or receive software or computer code that, when accessed and executed by the computer, processor, or hardware, performs the methods described herein. Furthermore, when a general-purpose computer accesses code for implementing the methods illustrated herein, execution of the code converts the general-purpose computer into a special-purpose computer for performing those methods.
Those of ordinary skill in the art will appreciate that the elements and method steps of the examples described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or as a combination of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present invention.
The above embodiments are only for illustrating the embodiments of the present invention, but not for limiting the embodiments of the present invention, and various changes and modifications may be made by one skilled in the relevant art without departing from the spirit and scope of the embodiments of the present invention, so that all equivalent technical solutions also fall within the scope of the embodiments of the present invention, and the scope of the embodiments of the present invention should be defined by the claims.

Claims (12)

1. An accelerator, comprising:
a reading unit configured to read data to be sorted from a first storage space according to a sorting instruction of a processor;
a sorting unit configured to read part of the data to be sorted into a sorting container for initial sorting to obtain a sorting sequence filling a preset capacity of the sorting container, and to sort the remaining data to be sorted with the data in the sorting sequence to update the sorting sequence, wherein the preset capacity of the sorting container corresponds to a preset sort truncation length;
and a writing unit configured to write the updated sorting sequence into a second storage space as a sorting result of the sorting instruction.
2. The accelerator according to claim 1, wherein the sorting unit is specifically configured to:
if the sorting priority of the current data among the remaining data to be sorted precedes the sorting priority of the first data in the sorting sequence, sort the current data before the first data and simultaneously delete the last data in the sorting sequence, so as to update the sorting sequence.
3. The accelerator according to claim 2, wherein the reading unit is specifically configured to: if remaining data still exist in the data to be sorted, read the next data after the current data as the new current data;
and the writing unit is specifically configured to: if no data remain in the data to be sorted, write the updated sorting sequence into the second storage space as the sorting result of the sorting instruction.
4. The accelerator according to claim 2, wherein the sorting unit is further configured to:
discard the current data if its sorting priority falls after the last data in the sorting sequence.
5. The accelerator according to claim 2, wherein the sorting unit is specifically configured to: sort the current data between the first data and the second data if the priority of the current data falls after the first data and before the second data.
6. The accelerator according to claim 5, wherein the sorting unit determines that the first data is adjacent to the second data in the sorting sequence before sorting the current data relative to the first data and the second data.
7. The accelerator according to claim 1, wherein the sorting unit is specifically configured to:
sort the current data before the third data in the sorting container if the sorting priority of the current data in the data to be sorted precedes the sorting priority of the third data;
sort the current data between the third data and the fourth data, the third data and the fourth data being adjacent, if the priority of the current data falls after the third data and before the fourth data;
and sort the current data after the fourth data in the sorting container if the priority of the current data falls after the fourth data.
8. The accelerator according to claim 7, wherein the sorting unit is specifically configured to: determine the sorting sequence if the sorting container is filled after the current data is sorted in it.
9. The accelerator according to claim 1, wherein the sorting unit is further configured to: create the sorting container of the preset capacity according to the sorting instruction, and delete the sorting container after the writing unit writes the updated sorting sequence, wherein the sorting instruction includes the preset capacity.
10. The accelerator according to claim 1, further comprising a communication unit configured to: acquire the sorting instruction, and send a response to the sorting instruction to the processor after the writing unit writes the updated sorting sequence.
11. A sorting method, comprising:
reading data to be sorted from a first storage space according to a sorting instruction of a processor;
reading part of the data to be sorted into a sorting container for initial sorting to obtain a sorting sequence filling a preset capacity of the sorting container, wherein the preset capacity of the sorting container corresponds to a preset sort truncation length;
sorting the remaining data to be sorted with the data in the sorting sequence to update the sorting sequence;
and writing the updated sorting sequence into a second storage space as a sorting result of the sorting instruction.
12. A heterogeneous computing system, comprising:
a processor;
the accelerator of any one of claims 1-10.
CN202311348784.2A 2023-10-18 2023-10-18 Accelerator, ordering method and heterogeneous computing system Pending CN117370018A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311348784.2A CN117370018A (en) 2023-10-18 2023-10-18 Accelerator, ordering method and heterogeneous computing system


Publications (1)

Publication Number Publication Date
CN117370018A true CN117370018A (en) 2024-01-09

Family

ID=89401873



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination