WO2023071566A1 - Data processing method, apparatus, computer device, computer-readable storage medium, and computer program product - Google Patents

Data processing method, apparatus, computer device, computer-readable storage medium, and computer program product

Info

Publication number
WO2023071566A1
WO2023071566A1 PCT/CN2022/118483 CN2022118483W WO2023071566A1 WO 2023071566 A1 WO2023071566 A1 WO 2023071566A1 CN 2022118483 W CN2022118483 W CN 2022118483W WO 2023071566 A1 WO2023071566 A1 WO 2023071566A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
sorting
tone
sorted
sub
Prior art date
Application number
PCT/CN2022/118483
Other languages
English (en)
French (fr)
Inventor
于潇宇
陈德炜
韩峰
李嘉昕
Original Assignee
腾讯科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司
Priority to EP22885453.5A priority Critical patent/EP4328748A1/en
Publication of WO2023071566A1 publication Critical patent/WO2023071566A1/zh
Priority to US18/335,491 priority patent/US20230325149A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00: Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/06: Arrangements for sorting, selecting, merging, or comparing data on individual record carriers
    • G06F7/08: Sorting, i.e. grouping record carriers in numerical or other ordered sequence according to the classification of at least some of the information they carry
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90: Details of database functions independent of the retrieved data types
    • G06F16/901: Indexing; Data structures therefor; Storage structures
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44: Arrangements for executing specific programs
    • G06F9/445: Program loading or initiating
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00: Arrangements for program control, e.g. control units
    • G06F9/06: Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46: Multiprogramming arrangements
    • G06F9/48: Program initiating; Program switching, e.g. by interrupt

Definitions

  • the present application relates to the technical field of the Internet, and in particular to a data processing method, device, computer equipment, computer-readable storage medium, and computer program product.
  • Sorting operations are widely used in scenarios such as databases, object recognition, and text analysis, and are common operations in artificial intelligence models and object detection models.
  • Representative sorting algorithms include Bitonic sort, Odd-even sort, Radix Sort, etc.
  • Because of its inherent parallelism and data-index independence, bitonic sorting has important application value in parallel computing and other fields.
  • Bitonic sorting (also rendered in this text as double-tone sorting) has a deterministic execution time and is well suited to hardware implementation. However, executing a sorting task requires multiple rounds of traversal over the data to be sorted, which increases the execution time and reduces the sorting efficiency.
  • Embodiments of the present application provide a data processing method, device, computer equipment, computer-readable storage medium, and computer program product, which can not only reduce the execution time of sorting tasks, but also improve sorting efficiency.
  • An embodiment of the present application provides a data processing method, executed by a computer device, including: obtaining a data sorting request for a data sequence to be sorted, and calling C data double-tone sorting components according to the data sorting request, where C is a positive integer greater than 1; starting B data double-tone sorting tasks according to the data sequence to be sorted and the C data double-tone sorting components, where B is a positive integer greater than 1, the B data double-tone sorting tasks are respectively associated with different data subsequences to be sorted, and the B data subsequences to be sorted are all generated based on the data sequence to be sorted; running the C data double-tone sorting components in parallel according to the B data double-tone sorting tasks to obtain B data sorting sub-results output by the C data double-tone sorting components; a data sorting sub-result is used to represent the following result: the sorting result of the data subsequence to be sorted associated with
  • An embodiment of the present application provides a data processing device, including: a component calling module configured to obtain a data sorting request for a data sequence to be sorted, and call C data double-tone sorting components according to the data sorting request, where C is a positive integer greater than 1; a task starting module configured to start B data double-tone sorting tasks according to the data sequence to be sorted and the C data double-tone sorting components, where B is a positive integer greater than 1, the B data double-tone sorting tasks are respectively associated with different data subsequences to be sorted, and the B data subsequences to be sorted are all generated based on the data sequence to be sorted; a parallel running module configured to run the C data double-tone sorting components in parallel according to the B data double-tone sorting tasks and obtain B data sorting sub-results output by the C data double-tone sorting components; a data sorting sub-result is used to represent the following result: a sorting result of the data subsequence to be sorted associated with
  • An embodiment of the present application provides a computer device, including a processor, a memory, and a network interface. The processor is connected to the memory and the network interface; the network interface is used to provide data communication functions, the memory is used to store a computer program, and the processor is used to call the computer program so that the computer device executes the data processing method in the embodiment of the present application.
  • An embodiment of the present application provides a computer-readable storage medium, where a computer program is stored in the computer-readable storage medium, and the computer program is adapted to be loaded by a processor and execute the data processing method in the embodiment of the present application.
  • An embodiment of the present application provides a computer program product. The computer program product includes a computer program stored in a computer-readable storage medium; the processor of the computer device reads the computer program from the computer-readable storage medium and executes it, so that the computer device executes the data processing method in the embodiment of the present application.
  • The computer device can generate multiple data double-tone sorting tasks for the data sequence to be sorted and, by running the C data double-tone sorting components in parallel, execute multiple data double-tone sorting tasks simultaneously.
  • Executing multiple data double-tone sorting tasks at the same time reduces the sorting time of the data sequence to be sorted, that is, the task execution time; in addition, based on the C data double-tone sorting components, the computer device can merge the B data sorting sub-results to obtain the data sorting result for the data sequence to be sorted.
  • FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a data processing scenario provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of another data processing scenario provided by the embodiment of the present application.
  • FIG. 5 is a schematic diagram of another data processing scenario provided by the embodiment of the present application.
  • FIG. 6 is a schematic flowchart of another data processing method provided by the embodiment of the present application.
  • FIG. 7 is a schematic structural diagram of a sorting accelerator provided in an embodiment of the present application.
  • FIG. 8 is a schematic flowchart of another data processing method provided by the embodiment of the present application.
  • FIG. 9 is a schematic diagram of another data processing scenario provided by the embodiment of the present application.
  • FIG. 10 is a schematic structural diagram of a data processing device provided in an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • Artificial Intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technique of computer science that attempts to understand the nature of intelligence and produce a new kind of intelligent machine that can respond in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive subject that involves a wide range of fields, including both hardware-level technology and software-level technology.
  • Artificial intelligence basic technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes several major directions such as computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning, automatic driving, and intelligent transportation.
  • FIG. 1 is a schematic diagram of a system architecture provided by an embodiment of the present application.
  • The system may include a service server 100 and a terminal cluster, and the terminal cluster may include a terminal device 200a, a terminal device 200b, a terminal device 200c, ..., and a terminal device 200n.
  • The terminal cluster may include one or more terminal devices, and this application does not limit the number of terminal devices.
  • a communication connection may exist between terminal clusters, for example, a communication connection exists between the terminal device 200a and the terminal device 200b, and a communication connection exists between the terminal device 200a and the terminal device 200c.
  • Any terminal device in the terminal cluster can have a communication connection with the service server 100; for example, there is a communication connection between the terminal device 200a and the service server 100. The connection method is not limited: the connection can be direct or indirect through wired communication, direct or indirect through wireless communication, or established in other ways, which is not limited in this application.
  • Each terminal device in the terminal cluster shown in FIG. 1 can be installed with an application client; when the application client runs on a terminal device, it can exchange data with the service server 100 shown in FIG. 1 through the above-mentioned communication connection.
  • the application client can be a short video application, video application, live broadcast application, social networking application, instant messaging application, game application, music application, shopping application, novel application, browser and other application clients with data sorting function.
  • the application client can be an independent client, or an embedded sub-client integrated in a certain client (for example, a social client, an educational client, and a multimedia client, etc.), which is not limited here .
  • a terminal device may be selected as a target terminal device in the terminal cluster shown in FIG. 1 , for example, terminal device 200a is used as the target terminal device.
  • the terminal device 200 a may send a data sorting request for the data sequence to be sorted to the service server 100 .
  • After the service server 100 receives the data sorting request sent by the terminal device 200a, it can call C data double-tone sorting components according to the data sorting request, where the data sorting request carries the first total quantity of data to be sorted in the data sequence to be sorted and C is a positive integer greater than 1, that is, at least two data double-tone sorting components are called. The service server 100 can start B data double-tone sorting tasks according to the first total quantity and the data storage capacities respectively corresponding to the C data double-tone sorting components, where B is a positive integer greater than 1, that is, at least two data double-tone sorting tasks are started; the B data double-tone sorting tasks are respectively associated with different data subsequences to be sorted, and the B data subsequences to be sorted are all generated based on the data sequence to be sorted. The service server 100 can run the C data double-tone sorting components in parallel according to the B data double-tone sorting tasks, and obtain B data sorting sub-results output by
  • The terminal device 200a can use C local data double-tone sorting components to perform double-tone sorting on the data to be sorted in the data sequence to be sorted. This process is consistent with the process described above, in which the service server performs double-tone sorting on the data to be sorted through C data double-tone sorting components, so it is not repeated here.
  • The system architecture can include multiple service servers. One terminal device can be connected to one service server, and each service server can obtain the data sorting request sent by the terminal device connected to it, perform double-tone sorting on the data sequence to be sorted targeted by that request, and return the data sorting result to the terminal device connected to it.
  • The method provided by the embodiment of the present application can be applied in circuits such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA). The circuit can be applied in a chip, and the chip can be integrated in a central processing unit (CPU), in an AI chip, or in other types of chips.
  • The AI chip is integrated in the service server through the Peripheral Component Interconnect Express (PCIe) standard.
  • the method provided in the embodiment of the present application can be executed by a computer device, and the computer device includes but is not limited to a terminal device or a service server.
  • The service server can be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server that provides basic cloud computing services such as cloud databases, cloud services, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, and big data and artificial intelligence platforms.
  • Terminal devices include but are not limited to mobile phones, computers, intelligent voice interaction devices, smart home appliances, vehicle terminals, etc.
  • the terminal device and the service server may be connected directly or indirectly through wired or wireless means, which is not limited in this embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a data processing method provided by an embodiment of the present application.
  • The data processing method may be executed by a service server (for example, the service server 100 shown in FIG. 1 above), by a terminal device (for example, the terminal device 200a shown in FIG. 1 above), or interactively by the service server and the terminal device.
  • the embodiment of the present application takes the method executed by a service server as an example for illustration.
  • the embodiment of the present application can be applied to various scenarios, including but not limited to cloud technology, artificial intelligence, intelligent transportation, assisted driving, etc.
  • the data processing method may at least include the following steps S101 to S104.
  • Step S101 obtaining a data sorting request for the data sequence to be sorted, and calling C data double tone sorting components according to the data sorting request;
  • C is a positive integer greater than 1.
  • the data sorting request is obtained, and the data sorting request is sent to the task control component; the data sorting request carries the first total amount of data to be sorted in the data sequence to be sorted.
  • The data sequence to be sorted can be an unordered data sequence, such as the sequence (0, 5, 2, 6, 8, 7, 6), or a bitonic sequence, such as the sequence (0, 2, 6, 9, 8, 7, 4, 1); the embodiment of the present application does not limit the sequence type of the data sequence to be sorted. It can be understood that the first total quantity of data to be sorted in the data sequence to be sorted is at least three.
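The distinction between an unordered sequence and a bitonic sequence can be checked mechanically. The following is a minimal sketch, not part of the application: the function name `is_bitonic` is hypothetical, and it assumes the common textbook definition in which a sequence is bitonic if it rises and then falls, or is a cyclic rotation of such a sequence.

```python
def is_bitonic(seq):
    """Return True if seq is bitonic, allowing cyclic rotations (assumed definition)."""
    n = len(seq)
    # Signs of the non-zero differences between cyclically adjacent elements.
    signs = []
    for i in range(n):
        diff = seq[(i + 1) % n] - seq[i]
        if diff != 0:
            signs.append(1 if diff > 0 else -1)
    # Count sign changes around the circle; a bitonic sequence has at most two.
    changes = sum(
        1 for i in range(len(signs)) if signs[i] != signs[(i + 1) % len(signs)]
    )
    return changes <= 2


# The two example sequences from the text.
print(is_bitonic((0, 2, 6, 9, 8, 7, 4, 1)))  # rises to 9, then falls
print(is_bitonic((0, 5, 2, 6, 8, 7, 6)))     # changes direction several times
```

Under this definition the second example sequence changes direction too often to be bitonic, which is why it is described as unordered.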
  • the data sorting request can be sent from the terminal device to the business server, as described in Figure 1 above, or it can be generated locally by the business server.
  • the source of the data sorting request is not limited in this embodiment of the application .
  • The data sorting request may carry the data to be sorted in the data sequence to be sorted, in which case the service server may determine the first total quantity from that data. Alternatively, the data sorting request may carry the read address of the data to be sorted and the first total quantity, in which case the service server can read the data to be sorted in the data sequence to be sorted from the read address.
  • The embodiment of the present application does not limit the data carried in the data sorting request, which can be set according to the actual application.
  • the data bi-tone sorting component refers to a component that can perform bi-tone sorting on the data sequence to be sorted.
  • FIG. 3 is a schematic diagram of a data processing scenario provided by an embodiment of the present application.
  • The data sorting request 30a includes a first total quantity, a read address, a write address, a write type, and so on. The write address indicates the address to which the service server 30b writes the data sorting result.
  • The write type indicates whether the result written to the write address is of the data result type or the index result type; see the description of step S104 below, which is not repeated here.
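The fields of the data sorting request can be pictured as a small record. This is only an illustrative sketch: the class names `WriteType` and `DataSortRequest`, the field names, and the example addresses are all hypothetical, and the meaning attached to the two write types in the comments is an assumption, since step S104 is not reproduced here.

```python
from dataclasses import dataclass
from enum import Enum


class WriteType(Enum):
    DATA = "data"    # assumed meaning: write the sorted data values
    INDEX = "index"  # assumed meaning: write the index identifiers of the sorted data


@dataclass(frozen=True)
class DataSortRequest:
    first_total: int       # first total quantity of data to be sorted
    read_address: int      # where the data to be sorted is read from
    write_address: int     # where the data sorting result is written
    write_type: WriteType  # data result type or index result type


# Hypothetical request for a 31-element sequence.
request = DataSortRequest(
    first_total=31,
    read_address=0x1000,
    write_address=0x2000,
    write_type=WriteType.INDEX,
)
print(request)
```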
  • After the service server 30b obtains the data sorting request 30a, it can call C (at least two) data double-tone sorting components according to the data sorting request 30a; for the internal structure of the data double-tone sorting component, see the description in the corresponding embodiment, which is not repeated here. It can be understood that, in practical applications, the first total quantity may be equal to 2, or may be a positive integer greater than 2. The embodiment of the present application can be applied in a scenario where a single data double-tone sorting component sorts the data sequence to be sorted.
  • Step S102 according to the data sequence to be sorted and C data double-tone sorting components, start B data double-tone sorting tasks; B is a positive integer greater than 1; B data double-tone sorting tasks are respectively associated with different data subsequences to be sorted , the B data subsequences to be sorted are all generated based on the data sequence to be sorted.
  • In the task control component, determine the quantity to be sorted associated with the first total quantity, where the quantity to be sorted is equal to or greater than the first total quantity; obtain the data storage capacities respectively corresponding to the C data double-tone sorting components, and compare the quantity to be sorted with the total data storage capacity to obtain a first comparison result, where the total data storage capacity is equal to the sum of the C data storage capacities; and start the B data double-tone sorting tasks according to the first comparison result.
  • The process of starting the B data double-tone sorting tasks according to the first comparison result may include: if the first comparison result is that the quantity to be sorted is greater than the total data storage capacity, obtaining a first quantity ratio between the quantity to be sorted and the total data storage capacity, and starting the B data double-tone sorting tasks according to the first quantity ratio, where the product of C and the first quantity ratio is equal to B, the second total quantity carried by one data double-tone sorting task is equal to the data storage capacity, and the second total quantity represents the total quantity of data to be sorted in the data subsequence to be sorted associated with one data double-tone sorting task; if the first comparison result is that the quantity to be sorted is less than or equal to the total data storage capacity, starting the B data double-tone sorting tasks according to the C data double-tone sorting components, where B is equal to C.
  • The service server 30b sends the data sorting request 30a to the top-level control component, and the top-level control component forwards it to the task control component 30c. This can also be understood as follows: after the top-level control component has configured the task control component 30c with the data carried in the data sorting request, it sends a data sorting instruction (Sort request) for the data sequence to be sorted to the task control component 30c.
  • The service server 30b determines the quantity to be sorted associated with the first total quantity. For example, if the first total quantity corresponding to the data sequence to be sorted is 31, the nearest power of 2 that is not less than 31 is 2^5, so the quantity to be sorted is 2^5 = 32. In the embodiment of the present application, it is sufficient that the first total quantity is a positive integer greater than 2 and that the quantity to be sorted is a power of 2 equal to or greater than the first total quantity. The service server 30b acquires the data storage capacities respectively corresponding to the C data double-tone sorting components. Usually, in order not to increase the complexity of parameter configuration, the C data storage capacities are the same, for example, all 32.
  • In that case the total data storage capacity is 32*C; the quantity to be sorted is then compared with the total data storage capacity to obtain the first comparison result. It should be noted that the above examples are only for convenience of description and understanding.
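The rounding of the first total quantity up to a power of 2, and the computation of the total data storage capacity, can be sketched as follows. The function names `quantity_to_sort` and `total_capacity` are hypothetical, and the sketch assumes the quantity to be sorted is the smallest power of 2 that is not less than the first total quantity, which matches the 31 to 32 example but is not stated as a general rule in the text.

```python
def quantity_to_sort(first_total: int) -> int:
    """Smallest power of two >= first_total (assumed rounding rule)."""
    if first_total < 1:
        raise ValueError("first_total must be a positive integer")
    return 1 << (first_total - 1).bit_length()


def total_capacity(capacity_per_component: int, c: int) -> int:
    """Sum of C equal data storage capacities."""
    return capacity_per_component * c


n = quantity_to_sort(31)          # example from the text
total = total_capacity(32, 4)     # C = 4 is an illustrative choice
print(n, total, n > total)        # the last value is the first comparison result
```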
  • If the first comparison result is that the quantity to be sorted is greater than the total data storage capacity, the service server 30b obtains the first quantity ratio between the quantity to be sorted and the total data storage capacity, such as the ratio K shown in FIG. 3. For example, if the data storage capacity corresponding to one data double-tone sorting component is 8 and C is equal to 4, the total data storage capacity is equal to 32; if the quantity to be sorted is 64, then K = 64/32 = 2.
  • In this case, the second total quantity carried by one data double-tone sorting task can be set to the data storage capacity, that is, the total quantity of data to be sorted in the data subsequence to be sorted associated with one data double-tone sorting task is equal to the data storage capacity. The service server 30b therefore needs to start K*C = 8 data double-tone sorting tasks through the task control component 30c.
  • If the first comparison result is that the quantity to be sorted is less than or equal to the total data storage capacity, for example, the data storage capacity corresponding to one data double-tone sorting component is 64 and C is equal to 4, then the total data storage capacity is equal to 256; if the quantity to be sorted is 64, starting 4 data double-tone sorting tasks is enough to sort the data to be sorted in the data sequence to be sorted in one round.
  • In this case, the second total quantity associated with one data double-tone sorting task is equal to the ratio of the quantity to be sorted to C, that is, 64/4 = 16.
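The two branches of the first comparison result can be summarized in a small planning function. This is a sketch under stated assumptions: the names `TaskPlan` and `plan_tasks` are hypothetical, all C data storage capacities are taken to be equal, and the quantity to be sorted and the capacities are assumed to be powers of 2 so that the divisions are exact.

```python
from typing import NamedTuple


class TaskPlan(NamedTuple):
    tasks: int     # B, number of sorting tasks to start
    per_task: int  # second total quantity carried by each task
    rounds: int    # K, number of running rounds


def plan_tasks(quantity: int, capacity: int, c: int) -> TaskPlan:
    """Derive B, the per-task quantity, and K from the first comparison result."""
    total = capacity * c  # total data storage capacity
    if quantity > total:
        k = quantity // total  # first quantity ratio
        return TaskPlan(tasks=k * c, per_task=capacity, rounds=k)
    # Everything fits in one round: one task per component, B equals C.
    return TaskPlan(tasks=c, per_task=quantity // c, rounds=1)


# The two worked examples from the text.
print(plan_tasks(quantity=64, capacity=8, c=4))
print(plan_tasks(quantity=64, capacity=64, c=4))
```

With a capacity of 8 per component the plan has 8 tasks in 2 rounds; with a capacity of 64 it has 4 tasks of 16 elements in a single round.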
  • The task control component 30c parses the parameters (including the first total quantity, the data storage capacity, and so on), splits the sorting task for the data sequence to be sorted into as many sorting subtasks (that is, data double-tone sorting tasks) as there are data double-tone sorting components, and sends one data double-tone sorting task to each Sort unit (that is, each data double-tone sorting component).
  • The embodiment of the present application also takes into account the storage area allocated to each Sort unit (that is, the data storage capacity): the sorting task can also be split into K*C subtasks (that is, data double-tone sorting tasks) that are completed in K rounds, with C subtasks sent in each round, that is, one data double-tone sorting task sent to each Sort unit.
  • After the task control component 30c sends out the data double-tone sorting tasks, it waits for the task completion signal.
  • Step S103 according to the B data double-tone sorting tasks, run C data double-tone sorting components in parallel to obtain B data sorting sub-results output by the C data double-tone sorting components; one data sorting sub-result is used to represent the following Result: the sorting result of the data subsequence to be sorted associated with a data bitone sorting task.
  • If B is equal to C, the B data double-tone sorting tasks are distributed to the C data double-tone sorting components, one data double-tone sorting task per component; the C data double-tone sorting components run the B data double-tone sorting tasks in parallel, and the data sorting sub-results respectively output by the C data double-tone sorting components are obtained.
  • The B data double-tone sorting tasks include a data double-tone sorting task D_e, where e is a positive integer less than or equal to B; the B data subsequences to be sorted include the data subsequence A_e to be sorted corresponding to the data double-tone sorting task D_e; the C data double-tone sorting components include a data double-tone sorting component F_e for performing the data double-tone sorting task D_e.
  • If B is greater than C, the B data double-tone sorting tasks are split into K*C data double-tone sorting tasks, where K is the maximum number of running rounds corresponding to the C data double-tone sorting components. The C data double-tone sorting tasks of the i-th running round are distributed to the C data double-tone sorting components, one task per component, where i is a positive integer less than or equal to K. The C data double-tone sorting components run the C tasks of the i-th round in parallel and output their data sorting sub-results; this repeats until i is equal to K, at which point the data sorting sub-results respectively corresponding to the B data double-tone sorting tasks are obtained.
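The round-by-round distribution of tasks can be sketched in software as follows. This is an illustration only: the function name `run_in_rounds` is hypothetical, a thread pool stands in for the C hardware sorting components, and Python's built-in `sorted` stands in for the sorting performed by each component.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_rounds(subsequences, c):
    """Process the tasks C at a time; each round hands one task to each worker."""
    results = []
    with ThreadPoolExecutor(max_workers=c) as pool:
        for start in range(0, len(subsequences), c):
            round_tasks = subsequences[start:start + c]  # tasks of one running round
            # pool.map returns results in task order once the round completes.
            results.extend(pool.map(sorted, round_tasks))
    return results


# B = 8 tasks on C = 4 workers, so K = 2 rounds (illustrative data).
tasks = [[3, 1], [2, 0], [9, 8], [5, 4], [7, 6], [1, 0], [4, 2], [6, 3]]
print(run_in_rounds(tasks, c=4))
```

Each round finishes before the next one starts, which mirrors the requirement that the sub-results of one round are stored before the next round of C tasks is executed.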
  • As described in step S102, there are two situations in which data double-tone sorting tasks are started in the embodiment of the present application.
  • When the quantity to be sorted is greater than the total data storage capacity, the running process of the data double-tone sorting components in a single round is the same as their parallel running process when the quantity to be sorted is less than or equal to the total data storage capacity. The only difference is that, after each parallel run of the C data double-tone sorting components, the former needs to store the C data sorting sub-results generated in that round in external memory before executing the next round of C data double-tone sorting tasks, whereas the latter does not need to store the C data sorting sub-results in external memory and can directly merge them to obtain the data sorting result corresponding to the data to be sorted. The embodiment of the present application therefore takes B equal to C as an example; for the scenario where B is greater than C, refer to the description below.
  • FIG. 4 is a schematic diagram of another data processing scenario provided by the embodiment of the present application.
  • C is set to 4. The task control component 40a starts four data double-tone sorting tasks, such as the data double-tone sorting tasks 401a, 402a, 403a, and 404a shown in FIG. 4, and the service server invokes four data double-tone sorting components, such as the data double-tone sorting components S1, S2, S3, and S4 shown in FIG. 4.
  • The data double-tone sorting component S1 executes the data double-tone sorting task 401a, whose data subsequence 401b to be sorted is (10, 20, 5, 9); the data double-tone sorting component S2 executes the data double-tone sorting task 402a, whose data subsequence 402b to be sorted is (3, 8, 12, 14); the data double-tone sorting component S3 executes the data double-tone sorting task 403a, whose data subsequence 403b to be sorted is (90, 0, 60, 40); and the data double-tone sorting component S4 executes the data double-tone sorting task 404a, whose data subsequence 404b to be sorted is (23, 35, 95, 18).
  • the business server can firstly generate a unique index identifier corresponding to each data to be sorted in order for each data to be sorted in the sequence of data to be sorted.
  • the embodiment of this application does not limit the form of the index identifier. It may be any kind of information that can be used to identify each piece of data to be sorted in the data sequence to be sorted.
  • the present application takes a 4-bit binary identifier as an example. As shown in Figure 4, the index identifier of 10 is 0000, of 20 is 0001, of 5 is 0010, of 9 is 0011, of 3 is 0100, of 8 is 0101, of 12 is 0110, of 14 is 0111, of 90 is 1000, of 0 is 1001, of 60 is 1010, of 40 is 1011, of 23 is 1100, of 35 is 1101, of 95 is 1110, and of 18 is 1111.
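  • The identifier assignment described above can be sketched as follows. This is only an illustration of the 4-bit binary example; the function name `assign_index_identifiers` and its `bits` parameter are hypothetical, and the text itself does not limit the form of the index identifier.

```python
def assign_index_identifiers(sequence, bits=4):
    """Pair each element with a fixed-width binary index identifier, in order."""
    if len(sequence) > 2 ** bits:
        raise ValueError("not enough bits for a unique identifier per element")
    # format(position, "04b") renders the position as a zero-padded binary string
    return [(format(position, f"0{bits}b"), value)
            for position, value in enumerate(sequence)]


# The 16 values of Figure 4, in the order of the four subsequences.
data_sequence = [10, 20, 5, 9, 3, 8, 12, 14, 90, 0, 60, 40, 23, 35, 95, 18]
for identifier, value in assign_index_identifiers(data_sequence):
    print(identifier, value)
```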
  • the business server runs the B (4 in Figure 4) data double tone sorting tasks in parallel through the C (4 in Figure 4) data double tone sorting components to obtain the data sorting sub-results output respectively by the C data double tone sorting components.
  • each data double tone sorting component executes one data double tone sorting task, and the sorting logic of every data double tone sorting component is the same, so the execution of the data double tone sorting task 401a by the data double tone sorting component S1 is described here as an example; the execution process of the remaining data double tone sorting components can refer to the description below and is not repeated.
  • the embodiment of the present application compares the data to be sorted (original data to be sorted) respectively corresponding to the two index identifiers, and updates the data to be sorted respectively corresponding to the two index identifiers (the data to be sorted after comparison) according to the comparison result.
  • the data subsequence 401b to be sorted is (10, 20, 5, 9), so the second total number is equal to 4 and there are two double tone sorting stages, shown in Figure 4 as the first double tone sorting stage 401h and the second double tone sorting stage 402h; the second double tone sorting stage 402h includes two sorting rounds.
  • the sorting round logic parameters of the first double tone sorting stage 401h, which has a single sorting round, include an ascending comparison between index identifier 0000 and index identifier 0001 and a descending comparison between index identifier 0010 and index identifier 0011, so the output result of the first double tone sorting stage 401h is that index identifier 0000 is updated to the data to be sorted 10, index identifier 0001 to the data to be sorted 20, index identifier 0010 to the data to be sorted 9, and index identifier 0011 to the data to be sorted 5.
  • the business server writes the above-mentioned output result into the storage component 401d corresponding to the data double-tone sorting component S1 , and after the writing is successful, starts the second double-tone sorting stage 402h.
  • the sorting round logic parameters of the first sorting round 401i include an ascending comparison between index identifier 0000 and index identifier 0010 and an ascending comparison between index identifier 0001 and index identifier 0011, so the output result of the first sorting round 401i is that index identifier 0000 is updated to the data to be sorted 9, index identifier 0010 to the data to be sorted 10, index identifier 0001 to the data to be sorted 5, and index identifier 0011 to the data to be sorted 20.
  • the service server caches the output result of the first sorting round 401i in the data acquisition subcomponent of the data double tone sorting component S1, without writing it into the storage component 401d. Therefore, when the second sorting round 402i is started, the data double tone sorting component S1 does not need to obtain the output result of the previous sorting round from the storage component 401d, but can obtain it directly from the local data acquisition subcomponent.
  • the sorting round logic parameters of the second sorting round 402i include an ascending comparison between index identifier 0000 and index identifier 0001 and an ascending comparison between index identifier 0010 and index identifier 0011, so the output result of the second sorting round 402i is that index identifier 0000 is updated to the data to be sorted 5, index identifier 0001 to the data to be sorted 9, index identifier 0010 to the data to be sorted 10, and index identifier 0011 to the data to be sorted 20. The second double tone sorting stage 402h then ends, and the service server writes the output result of the second sorting round 402i (that is, the data sorting sub-result 401c in Figure 4) into the storage component 401d.
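  • The stage and round walk-through for the subsequence (10, 20, 5, 9) can be reproduced with a textbook bitonic sorting network (rendered as "double tone sorting" in this text). The sketch below is a software illustration, not the hardware component itself; the function name `bitonic_sort_trace` and the assumption that the input length is a power of two are introduced here for illustration.

```python
def bitonic_sort_trace(values, ascending=True):
    """Sort a power-of-two-length list with a bitonic network.

    Returns the sorted list and a trace holding the output of every round.
    """
    data = list(values)
    n = len(data)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    trace = []
    span = 2                       # block size handled by the current stage
    while span <= n:
        stride = span // 2         # distance between the two compared positions
        while stride >= 1:
            for i in range(n):
                partner = i ^ stride
                if partner > i:
                    # Direction alternates between blocks so that each stage
                    # leaves bitonic runs for the next stage to merge.
                    up = ((i & span) == 0) == ascending
                    if (data[i] > data[partner]) == up:
                        data[i], data[partner] = data[partner], data[i]
            trace.append(list(data))
            stride //= 2
        span *= 2
    return data, trace


result, rounds = bitonic_sort_trace([10, 20, 5, 9])
print(rounds)  # [[10, 20, 9, 5], [9, 5, 10, 20], [5, 9, 10, 20]]
print(result)  # [5, 9, 10, 20]
```

  • The three trace entries correspond to the output of stage 401h, of round 401i, and of round 402i respectively.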
  • the storage component 401d, the storage component 402d, the storage component 403d, and the storage component 404d shown in FIG. 4 are all local storages of the data dual tone sorting component.
  • the output result corresponding to the data double-tone sorting component S1 is the data sorting sub-result 401c
  • the corresponding output result of the data double-tone sorting component S2 is the data sorting sub-result 402c
  • the corresponding output result of the data double-tone sorting component S3 is the data sorting sub-result 403c
  • the corresponding output result of the data double tone sorting component S4 is the data sorting sub-result 404c.
  • after the sorting is completed, the Sort unit writes the output result back to the memory and, at the same time, sends a sorting completion signal to the task control component 40a, indicating that the data double tone sorting task is completed.
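  • The parallel execution of the four tasks of Figure 4 can be mimicked in software with a worker pool. In this sketch, Python's built-in `sorted` stands in for a data double tone sorting component, and `run_sorting_tasks` is a hypothetical name. The alternating sort direction is an assumption inferred from the pair (5, 9, 10, 20, 14, 12, 8, 3) shown in Figure 5, where the second sub-result appears in descending order.

```python
from concurrent.futures import ThreadPoolExecutor


def run_sorting_tasks(subsequences, component_count):
    """Run one sorting task per subsequence on `component_count` workers."""
    def task(indexed_subsequence):
        position, subsequence = indexed_subsequence
        # Assumption: tasks at even positions sort ascending and tasks at odd
        # positions sort descending, so neighbouring sub-results concatenate
        # into a bitonic sequence ready for merging.
        return sorted(subsequence, reverse=(position % 2 == 1))

    # pool.map returns results in task order, like collecting one
    # sub-result per component after all completion signals arrive.
    with ThreadPoolExecutor(max_workers=component_count) as pool:
        return list(pool.map(task, enumerate(subsequences)))


sub_results = run_sorting_tasks(
    [[10, 20, 5, 9], [3, 8, 12, 14], [90, 0, 60, 40], [23, 35, 95, 18]],
    component_count=4,
)
print(sub_results)
```

  • When there are more tasks than workers (B greater than C), the pool simply queues the remaining tasks, which loosely mirrors running the tasks in several rounds.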
  • Step S104 based on the C data double tone sorting components, merge the B data sorting sub-results to obtain a data sorting result for the data sequence to be sorted.
  • U data sorting sub-result pairs are obtained in the L-th merge, where U and L are positive integers; if L is 1, the U data sorting sub-result pairs are generated based on the B data sorting sub-results; if L is not 1, the U data sorting sub-result pairs are generated based on historical merged data sorting sub-results, which are the sub-results obtained in the (L-1)-th merge; based on the C data double tone sorting components, the U data sorting sub-result pairs are merged to obtain U target merged data sorting sub-results; if U is equal to 1, the U target merged data sorting sub-results are determined as the data sorting result corresponding to the data sequence to be sorted; if U is greater than 1, then in the (L+1)-th merge, based on the C data
  • after the task control component receives the completion signals of all subtasks (data double tone sorting tasks), it starts the merge command for the C data sorting sub-results. Similarly, the business server calls the Sort unit to complete the merge, which is consistent with the sorting process in step S103; the difference between the two is that the data subsequence to be sorted may be an unordered sequence, while the sequence corresponding to a data sorting sub-result is an ordered sequence.
  • when a data read or write operation is performed, the data processing method provided by the embodiment of the present application can obtain a data sorting request for the data sequence to be sorted corresponding to that operation, and then execute the above steps S101 to S104 in response to the data sorting request, so as to sort the data sequence to be sorted.
  • the data read and write operation may be a read operation or a write operation performed in any target application installed on the terminal device, and the data sequence to be sorted is the read data corresponding to the read operation or the write data corresponding to the write operation.
  • the data processing method of the embodiment of the present application can be applied at least in the following scenario: when stored data needs to be read from the cloud storage through the service server of the target application, a read operation can be performed on the client of the target application, and the service server, in response to the read operation, acquires the stored data corresponding to the read operation from the cloud storage, thereby obtaining the above-mentioned data sequence to be sorted.
  • for the data sequence to be sorted, the method of the embodiment of the present application can be used as follows: the client of the target application on the terminal device sends a data sorting request to the service server; the business server, in response to the data sorting request, calls the C data double tone sorting components, starts B data double tone sorting tasks according to the data sequence to be sorted and the C data double tone sorting components, and runs the C data double tone sorting components in parallel to obtain the B data sorting sub-results output by the C data double tone sorting components; then, based on the C data double tone sorting components, the B data sorting sub-results are merged to obtain the data sorting result for the data sequence to be sorted, thereby completing the data sorting processing of the data sequence to be sorted acquired from the cloud storage.
  • a read operation requesting data from the cloud storage may request multiple data items, and there may be multiple read operations for different data, so the stored data that is read comprises multiple data items and the method of the embodiment of the present application can be used to sort it; moreover, because the embodiment of the present application runs multiple data double tone sorting components in parallel when performing data sorting processing, it both reduces the execution time of the sorting task for the data sequence to be sorted (that is, the stored data that is read) and improves the sorting efficiency of the sorting task.
  • the data processing method of the embodiment of the present application can also be applied to the following scenario: a shopping application is installed on the terminal device, the user performs a pull-down refresh operation on the client of the shopping application, and the business server of the shopping application, in response to the pull-down refresh operation, recalls a preset quantity of commodity information as information to be recommended, where the preset quantity of commodity information constitutes the aforementioned data sequence to be sorted.
  • the sorting method provided by the embodiment of the present application may be used to sort the data sequences to be sorted corresponding to the preset quantity of commodity information.
  • the client of the shopping application can send a data sorting request to the business server for the data sequence to be sorted; the business server, in response to the data sorting request, calls the C data double tone sorting components, starts B data double tone sorting tasks according to the data sequence to be sorted and the C data double tone sorting components, and runs the C data double tone sorting components in parallel to obtain the B data sorting sub-results output by the C data double tone sorting components; then, based on the C data double tone sorting components, the B data sorting sub-results are merged to obtain the data sorting result for the data sequence to be sorted, thereby completing the data sorting process for the data sequence to be sorted corresponding to the preset quantity of commodity information.
  • the data processing method of the embodiment of the present application can also be applied to the following scenarios: in the field of smart transportation, the terminal device is a vehicle terminal, and a navigation application is installed on the vehicle terminal, and the service server of the navigation application responds to the navigation request, based on the input starting point and Destination information, generating navigation information, the navigation information includes but is not limited to at least one of the following: a plurality of location information, the estimated time of arrival for each location information, and the current road condition corresponding to each location information. All the data in the navigation information constitute the above-mentioned data sequence to be sorted. Before performing real-time navigation, the sorting method provided by the embodiment of the present application may be used to sort the navigation information corresponding to the data sequence to be sorted.
  • the client of the navigation application can send a data sorting request to the business server for the data sequence to be sorted; the business server, in response to the data sorting request, calls the C data double tone sorting components, starts B data double tone sorting tasks according to the data sequence to be sorted and the C data double tone sorting components, and runs the C data double tone sorting components in parallel to obtain the B data sorting sub-results output by the C data double tone sorting components; then, based on the C data double tone sorting components, the B data sorting sub-results are merged to obtain the data sorting result for the data sequence to be sorted, thereby completing the data sorting process for the data sequence to be sorted corresponding to the navigation information.
  • Data sorting processing can be performed on the data sequence to be sorted, and then real-time navigation can be performed based on the sorting result after the data sorting processing, so as to provide users with a navigation experience that is accurate and meets the user's navigation needs.
  • it can also reduce the execution time of the sorting task for the data sequence to be sorted, and improve the sorting efficiency of the sorting task.
  • FIG. 5 is a schematic diagram of another data processing scenario provided by the embodiment of the present application.
  • for the four data sorting sub-results shown in Figure 4, the business server can obtain two data sorting sub-result pairs, as shown in Figure 5: the data sorting sub-result pair 401f is generated from the data sorting sub-result 401c and the data sorting sub-result 402c, and the data sorting sub-result pair 402f is generated from the data sorting sub-result 403c and the data sorting sub-result 404c.
  • the business server starts two data merging tasks, such as the data merging task 401e and the data merging task 402e in Figure 5; the data merging task 401e refers to the task of merging the data sorting sub-result 401c and the data sorting sub-result 402c, and the data merging task 402e refers to the task of merging the data sorting sub-result 403c and the data sorting sub-result 404c.
  • the task control component 40a may distribute the data merging task 401e to one or both of the two data double tone sorting components associated with the data sorting sub-result pair 401f, that is, the data double tone sorting component S1 and the data double tone sorting component S2 shown in Figure 4.
  • the task control component 40a distributes the data merging task 401e to one of the two data bidirectional sorting components associated with the data sorting sub-result pair 401f (the bidirectional sorting component shown in FIG.
  • the data sorting sub-result pair 401f is a bitonic sequence, (5, 9, 10, 20, 14, 12, 8, 3), so the fourth total number is equal to 8 and there are three sorting rounds; the sorting round logic parameters of each sorting round are shown in Figure 5.
  • the logic parameters of the first sorting round include ascending comparisons between index identifiers 0000 and 0100, between index identifiers 0001 and 0101, between index identifiers 0010 and 0110, and between index identifiers 0011 and 0111. The output result of the first sorting round is that index identifier 0000 is updated to the data to be sorted 5, index identifier 0100 to 14, index identifier 0001 to 9, index identifier 0101 to 12, index identifier 0010 to 8, index identifier 0110 to 10, index identifier 0011 to 3, and index identifier 0111 to 20.
  • the business server caches the output result of the first sorting round (that is, 5, 9, 8, 3, 14, 12, 10, 20) in the data acquisition subcomponent of the data double tone sorting component S1, without writing it to the storage component 401d, so when the second sorting round is started, the data double tone sorting component S1 does not need to obtain the output result of the previous sorting round from the storage component 401d and can obtain it directly from the local data acquisition subcomponent.
  • the logic parameters of the second sorting round include ascending comparisons between index identifiers 0000 and 0010, between index identifiers 0001 and 0011, between index identifiers 0100 and 0110, and between index identifiers 0101 and 0111. The output result of the second sorting round is that index identifier 0000 is updated to the data to be sorted 5, index identifier 0010 to 8, index identifier 0001 to 3, index identifier 0011 to 9, index identifier 0100 to 10, index identifier 0110 to 14, index identifier 0101 to 12, and index identifier 0111 to 20.
  • the business server caches the output result of the second sorting round (that is, 5, 3, 8, 9, 10, 12, 14, 20) in the data acquisition subcomponent of the data double tone sorting component S1, without writing it to the storage component 401d, so when the third sorting round is started, the data double tone sorting component S1 does not need to obtain the output result of the previous sorting round from the storage component 401d and can obtain it directly from the local data acquisition subcomponent.
  • the logic parameters of the third sorting round include ascending comparisons between index identifiers 0000 and 0001, between index identifiers 0010 and 0011, between index identifiers 0100 and 0101, and between index identifiers 0110 and 0111. The output result of the third sorting round is that index identifier 0000 is updated to the data to be sorted 3, index identifier 0001 to 5, index identifier 0010 to 8, index identifier 0011 to 9, index identifier 0100 to 10, index identifier 0101 to 12, index identifier 0110 to 14, and index identifier 0111 to 20.
  • the service server determines the output result of the third sorting round (that is, 3, 5, 8, 9, 10, 12, 14, 20) as the target merged data sorting sub-result 401g, and writes the target merged data sorting sub-result 401g to the storage component 401d.
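  • The three merge rounds described for the pair 401f match a standard bitonic merge, in which the comparison distance halves each round (4, then 2, then 1 for eight elements). The sketch below reproduces the intermediate outputs; `bitonic_merge_trace` is a hypothetical name, and the input is assumed to be a bitonic sequence of power-of-two length.

```python
def bitonic_merge_trace(bitonic_values, ascending=True):
    """Merge a bitonic sequence, returning the result and each round's output."""
    data = list(bitonic_values)
    n = len(data)
    if n & (n - 1):
        raise ValueError("length must be a power of two")
    trace = []
    stride = n // 2
    while stride >= 1:
        for i in range(n):
            partner = i ^ stride   # index identifier differing in one bit
            if partner > i and (data[i] > data[partner]) == ascending:
                data[i], data[partner] = data[partner], data[i]
        trace.append(list(data))
        stride //= 2
    return data, trace


merged, rounds = bitonic_merge_trace([5, 9, 10, 20, 14, 12, 8, 3])
for number, output in enumerate(rounds, start=1):
    print(number, output)
# 1 [5, 9, 8, 3, 14, 12, 10, 20]
# 2 [5, 3, 8, 9, 10, 12, 14, 20]
# 3 [3, 5, 8, 9, 10, 12, 14, 20]
```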
  • the process of performing the data merging task 402e by the data double-tone sorting component S3 is consistent with the process described above, the difference is that the data double-tone sorting component S1 performs ascending sorting, and the data double-tone sorting component S3 performs descending sorting , so without going into details, the target merged data sorting sub-result 402g can be obtained.
  • the task control component 40a then distributes a data merging task for the two target merged data sorting sub-results to a data double tone sorting component, and so on until all data sorting sub-results have been merged into a single data sorting result.
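  • The repeated pairing and merging until a single result remains can be sketched as a loop over merge levels. This is an illustrative software model only: `merge_all` and `bitonic_merge` are hypothetical names, the number of sub-results is assumed to be a power of two, and the sub-results are assumed to alternate between ascending and descending order so that every pair forms a bitonic sequence.

```python
def bitonic_merge(values, ascending):
    """Merge one bitonic sequence of power-of-two length."""
    data = list(values)
    stride = len(data) // 2
    while stride >= 1:
        for i in range(len(data)):
            partner = i ^ stride
            if partner > i and (data[i] > data[partner]) == ascending:
                data[i], data[partner] = data[partner], data[i]
        stride //= 2
    return data


def merge_all(sub_results):
    """Merge pairs level by level; returns the final result and merge count."""
    level = [list(result) for result in sub_results]
    merge_count = 0
    while len(level) > 1:
        merge_count += 1
        pairs = [level[k] + level[k + 1] for k in range(0, len(level), 2)]
        # Even-numbered pairs merge ascending and odd-numbered pairs descending,
        # so the next level again consists of bitonic pairs.
        level = [bitonic_merge(pair, ascending=(k % 2 == 0))
                 for k, pair in enumerate(pairs)]
    return level[0], merge_count


final, merges = merge_all(
    [[5, 9, 10, 20], [14, 12, 8, 3], [0, 40, 60, 90], [95, 35, 23, 18]]
)
print(merges, final)
```

  • With the four sub-results of Figure 4, the first merge yields two 8-element results and the second merge yields the final 16-element result.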
  • the C data double tone sorting components all include an index generation subcomponent, which is used to assign an index value to each data to be sorted in the data sequence to be sorted; the data sorting request carries the write address; the index value corresponding to each data to be sorted in the data sorting result is obtained in order, and an index value sequence with the same sequence length as the data sequence to be sorted is generated from the obtained index values, where the sequence length has a mapping relationship with, and is used to represent, the number of data to be sorted in the data sequence to be sorted; a data write request for the index value sequence is sent to the data read-write component, and the index value sequence is written to the write address through the data read-write component.
  • the data sorting result can be returned by rearranging the data to be sorted in the original sequence from small to large or from large to small and returning the rearranged sequence.
  • alternatively, the original sequence data can be kept unchanged, and the index value (index) of each data to be sorted, taken in the order of the sorted sequence (that is, the data sorting result), is returned, so as to constitute an index value sequence of the same length as the data sequence to be sorted.
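  • The second return form, an index value sequence rather than rearranged data, can be sketched as follows. The function name `sorted_index_sequence` is hypothetical, Python's built-in `sorted` stands in for the sorting hardware, and the sketch assumes the index values are the original positions read off in sorted order.

```python
def sorted_index_sequence(sequence, ascending=True):
    """Return original positions in sorted order, leaving `sequence` unchanged."""
    indexed = sorted(enumerate(sequence),
                     key=lambda pair: pair[1],
                     reverse=not ascending)
    return [index for index, _ in indexed]


original = [10, 20, 5, 9]
order = sorted_index_sequence(original)
print(order)                          # [2, 3, 0, 1]
print([original[i] for i in order])   # [5, 9, 10, 20]
```

  • The index sequence has the same length as the input, and the caller can recover the sorted data by indexing into the untouched original sequence.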
  • the embodiment of the present application proposes a hardware-accelerated sorting computing architecture, which implements multiple sorting operations per clock cycle (clock), and improves computing efficiency compared with a central processing unit.
  • architectural parallelism refers to increasing the parallelism of the sorting unit (that is, the data double tone sorting component) by using multiple sorting units.
  • the local memory of the sorting accelerator can provide multiple read and write ports that can be accessed simultaneously; process optimization means that, by modifying the execution order of the algorithm, separate calculation processes are merged and combined with the calculation pipeline, so that the number of memory read, compare, and memory write rounds is reduced to achieve acceleration.
  • process optimization and architecture parallelism can improve the sorting speed and reduce the sorting time T in formula (1), where:
  • T is the execution time;
  • n is the number of data to be sorted;
  • N = ceil(log2(n));
  • C and PS are the speedup ratios achieved by architecture parallelism and process optimization, respectively;
  • ceil() is the round-up function.
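  • As a rough illustration of how these quantities interact, the sketch below uses the textbook operation count of a bitonic network: N(N+1)/2 rounds of 2^N / 2 compare-exchange operations each. This is not formula (1) of the application; it is a generic estimate under the assumption of ideal scaling, and the names `bitonic_cost_estimate`, `component_count` (standing in for C), and `process_speedup` (standing in for PS) are hypothetical.

```python
def bitonic_cost_estimate(n, component_count=1, process_speedup=1.0):
    """Return (N, rounds, relative_time) for sorting n items.

    relative_time is a unitless count of compare-exchange operations
    divided by the assumed speedup factors; it is not a measured time.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    N = (n - 1).bit_length()              # equals ceil(log2(n)) for n >= 1
    rounds = N * (N + 1) // 2             # rounds of a full bitonic network
    compare_exchanges = rounds * (2 ** N) // 2
    relative_time = compare_exchanges / (component_count * process_speedup)
    return N, rounds, relative_time


print(bitonic_cost_estimate(16))                                   # (4, 10, 80.0)
print(bitonic_cost_estimate(16, component_count=4, process_speedup=2.0))  # (4, 10, 10.0)
```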
  • the commonly used number of Sort units can be 2, 4, or 8, etc.
  • the computer device can generate multiple data double tone sorting tasks for the data sequence to be sorted and, by running the C data double tone sorting components in parallel, execute multiple data double tone sorting tasks simultaneously. Executing multiple data double tone sorting tasks at the same time reduces the sorting time of the data sequence to be sorted, that is, the task execution time for the data sequence to be sorted; and, based on the C data double tone sorting components, the computer device merges the B data sorting sub-results to obtain the data sorting result for the data sequence to be sorted.
  • FIG. 6 is a schematic flowchart of another data processing method provided by an embodiment of the present application.
  • the method can be executed by a service server (for example, the service server 100 shown in FIG. 1 above), by a terminal device (for example, the terminal device 200a shown in FIG. 1 above), or jointly by the service server and the terminal device.
  • the embodiment of the present application takes the method executed by the service server as an example for description.
  • the process of the data processing method includes the following steps S201 to S203 , and steps S201 to S203 are another embodiment of step S103 in the embodiment corresponding to FIG. 2 .
  • Step S201: in the data double tone sorting component F e , obtain the data subsequence A e to be sorted corresponding to the data double tone sorting task D e .
  • the data sorting request carries the read addresses corresponding to the data to be sorted in the data sequence to be sorted, and the sorting order for the data sequence to be sorted; in the data double tone sorting component F e , obtain the data double tone sorting task D e
  • FIG. 7 is a schematic structural diagram of a sorting accelerator provided in an embodiment of the present application.
  • the sorting accelerator includes a task control component, C data double-tone sorting components, storage components corresponding to the C data double-tone sorting components, a data selection component, and a data read-write component.
  • the data read-write component is used to obtain the data transfer request, and implement data transfer according to the data transfer request.
  • the data transfer request can be a data acquisition request; according to the data acquisition request, the data read-write component obtains the initial data subsequence to be sorted.
  • the data transfer request may also be a data write request; according to the data write request, the data read-write component writes the index value sequence to the write address.
  • the data selection component is used to allocate the read and write channels of the local memory to the data read and write components or multiple data dual tone sorting components as required.
  • Each storage component is used to write data and read data.
  • the written data may include the data subsequence to be sorted, written from the external storage component through the data read-write component, and the intermediate sorting sub-results and data sorting sub-results returned by the data double tone sorting component; the read data may include data subsequences to be sorted, intermediate sorting sub-results, data sorting sub-results, and so on.
  • Each data double tone sorting component is used to perform double tone sorting on the data subsequences to be sorted to obtain the data sorting subresults; it is also used to merge the data sorting subresults to obtain the data sorting results for the data sequence to be sorted.
  • the data dual tone sorting component may also be called a sorting (Sort) unit.
  • the task control component is used to obtain the data sorting request, start B data double tone sorting tasks according to the data sorting request, and send one data double tone sorting task to one data double tone sorting component; it is also used to obtain the sorting completion signal returned by each data double tone sorting component and, based on the B sorting completion signals, start multiple data merging tasks and distribute each data merging task to a data double tone sorting component associated with the corresponding data sorting sub-results; it is also used to obtain the merge completion signal and send it to the top-level control component, where the merge completion signal indicates that the data to be sorted in the data sequence to be sorted has been sorted.
  • the task control component may also be referred to as a control unit.
  • the data read and write component can be direct memory access (DMA, Direct Memory Access), and the DMA can load the data to be sorted from the external memory (that is, the external storage component) outside the sorting (Sort) accelerator to the internal memory of the sorting accelerator (that is, the storage component), and write (store) the designated data in the internal storage of the sorting accelerator to the storage outside the sorting accelerator (that is, the external storage component).
  • DMA can be executed in an instruction queue mode, or in a polling mode, which is not limited in this embodiment of the present application.
  • the data selection component (MUX, Multiplexer) is a selector used inside the sorting accelerator.
  • the business server allocates the read and write channels of the memory to DMA or multiple Sort units as needed.
  • the local memory supports C read channels and C write channels. It can be composed of multiple physical storage units spliced to realize simultaneous reading and writing of multiple channels, which is not limited here.
  • the storage component corresponding to the data dual tone sorting component refers to the local storage of the sorting accelerator.
  • the data to be sorted in the data sequence to be sorted is stored either in the local storage of the sorting accelerator or in external storage. If it is stored in the external storage, the business server reads the data to be sorted from the external storage into the local storage through the data read-write component, so as to perform double tone sorting on it.
  • if the sorting order is from small to large, the supplementary data to be sorted is preferably positive infinity, and it is added on the high address side of the data to be sorted in the initial data subsequence to be sorted, that is, positive infinity is sorted after the data to be sorted; if the sorting order is from large to small, the supplementary data to be sorted is preferably negative infinity, and it is added on the high address side of the data to be sorted in the initial data subsequence to be sorted, that is, negative infinity is sorted after the data to be sorted.
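  • The padding rule can be sketched as below. The function name `pad_subsequence` is hypothetical, and the target length (the next power of two) is an assumption made here because a bitonic network needs a power-of-two input; the passage itself only specifies which value is appended and on which side.

```python
import math


def pad_subsequence(subsequence, ascending=True):
    """Append +inf (ascending) or -inf (descending) up to a power-of-two length."""
    target = 1
    while target < len(subsequence):
        target *= 2
    filler = math.inf if ascending else -math.inf
    # Fillers go on the high-address side, i.e. after the real data.
    return list(subsequence) + [filler] * (target - len(subsequence))


padded = pad_subsequence([7, 3, 9])
print(padded)               # [7, 3, 9, inf]
print(sorted(padded)[:3])   # [3, 7, 9]
```

  • Because the filler always ends up at the tail of the sorted output, the real data occupies the leading positions and the fillers can be dropped by truncating to the original length.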
  • Step S202 perform bi-tonal sorting on the data subsequence A e to be sorted, and obtain a data sorting sub-result corresponding to the data subsequence A e to be sorted.
  • the data double tone sorting component F e includes a sorting control subcomponent, a round control subcomponent, a data acquisition subcomponent, and a double tone sorting subcomponent; when the sorting control subcomponent obtains a data loading completion notification, the round control subcomponent is started; the data loading completion notification is used to indicate that the storage component corresponding to the data double tone sorting component F e has loaded the data subsequence A e to be sorted; in the round control subcomponent, the double tone sorting stage corresponding to the data subsequence A e to be sorted and the double tone sorting logic parameters corresponding to that stage are determined according to the second total quantity, and are sent to the data acquisition subcomponent; the second total quantity is used to represent the total amount of data to be sorted in the data subsequence A e to be sorted; in the data acquisition subcomponent,
  • the process of determining the data sorting sub-result corresponding to the data sub-sequence A e to be sorted may include: the number of phases in the dual tone sorting stage is at least two; in the data acquisition subcomponent, according to at least two dual tone In the i-th bi-tone sorting phase in the sorting phase, the first data sorting transition sub-result is read from the storage component; i is a positive integer greater than 1, and the first data sorting transition sub-result is the i-1th bi-tone sorting
  • the bitone sorting logic parameters corresponding to the i-th bitone sorting stage include n rounds of sorting round logic parameters; n is determined based on i, and n is a positive integer greater than 1; according to the jth round Sorting round logical parameter, obtain the intermediate data to be sorted from the intermediate data sorting transition sub-result; j is a positive integer and j is less than
  • each data bitonic sorting component can form a sorting accelerator. The basic structure of every data bitonic sorting component is the same; in Figure 7, this basic structure is illustrated using data bitonic sorting component 1.
  • data bitonic sorting component 1 can include a sorting control subcomponent, a round control subcomponent, an index generation subcomponent, a data acquisition subcomponent, and a bitonic sorting subcomponent; for the functions these subcomponents implement in the bitonic sorting scenario, refer to step S202 above.
  • the sorting control subcomponent is used to obtain the data bitonic sorting task distributed by the task control component and, based on that task, to start the round control subcomponent; it is also used to obtain the completion signal sent by the round control subcomponent (including the sorting completion signal and the merging completion signal) and to send that completion signal to the task control component.
  • the sorting control subcomponent implements the signal interaction between the data bitonic sorting component and the task control component, and starts the round control subcomponent after receiving the enable signal.
  • the sort control subcomponent may be called a sort (Sort) control module.
  • the round control subcomponent is used to determine, according to the data subsequence to be sorted, the bitonic sorting stage and the bitonic sorting logic parameters corresponding to that stage, and to issue the corresponding enable signals to the index generation subcomponent, the data acquisition subcomponent, and the bitonic sorting subcomponent.
  • the round control subcomponent may be called a round control module.
  • the index generation subcomponent is used when the data to be sorted is read for the first time: for each group of data, a unique index identifier is generated and written into the data buffer.
  • the index generation subcomponent may be referred to as an index generation module.
  • the data acquisition subcomponent is used to implement a round of traversal of the data to be sorted in the subsequence of the data to be sorted.
  • the data acquisition subcomponent can be called a single-round data control and memory access address calculation module.
  • the bitonic sorting subcomponent is the computation subcomponent of the bitonic sorting process and is used to compare data quickly.
  • the bitonic sorting subcomponent may be called a comparison module.
  • process optimization can be adopted: the operations performed on the same data in two or more rounds are combined, so that a local group of data is read once and written once instead of being read and written multiple times.
  • this reduces the overall number of read and write rounds over the data and thereby accelerates sorting.
  • step S103 please refer to the description of step S103 in the embodiment corresponding to FIG. 3 above.
  • Step S203: while the data bitonic sorting component F e is running, run B-1 data bitonic sorting tasks in parallel through C-1 data bitonic sorting components, and obtain the output values of the C-1 data bitonic sorting components respectively
  • the C-1 data bitonic sorting components refer to: the data bitonic sorting components other than the data bitonic sorting component F e among the C data bitonic sorting components;
  • the B-1 data bitonic sorting tasks refer to: among the B data bitonic sorting tasks, the data bitonic sorting tasks other than the data bitonic sorting task D e .
  • the embodiment of the present application can reduce the execution time of bitonic sorting through architectural parallelism and process optimization. Combining with the embodiment corresponding to FIG.
  • the data is sent to the local storage, multiple data bitonic sorting components are called to complete the sorting, the data sorting results are written back from the local storage to the external storage, and an instruction-completion response is sent to the top-level control unit.
  • the computer device can generate multiple data bitonic sorting tasks for the data sequence to be sorted and, by running C data bitonic sorting components in parallel, execute these tasks simultaneously.
  • executing multiple data bitonic sorting tasks at the same time reduces the sorting time of the data sequence to be sorted, that is, the task execution time; subsequently, based on the C data bitonic sorting components, the computer device merges the B data sorting sub-results to obtain the data sorting result for the data sequence to be sorted.
  • FIG. 8 is a schematic flowchart of another data processing method provided by an embodiment of the present application.
  • the method can be executed by a service server (for example, the service server 100 shown in FIG. 1 above), by a terminal device (for example, the terminal device 200a shown in FIG. 1 above), or jointly by the service server and the terminal device.
  • the embodiment of the present application takes the method executed by the service server as an example for description.
  • the data processing method includes steps S301 to S304 as follows, and steps S301 to S304 are another embodiment of step S104 in the embodiment corresponding to FIG. 2 .
  • Step S301: determine the fourth total quantity of data to be sorted associated with the data sorting sub-result pair M p , and the data storage capacity of the data bitonic sorting component associated with the data sorting sub-result pair M p ;
  • the C data bitonic sorting components include the data bitonic sorting component associated with the data sorting sub-result pair M p .
  • Step S302: compare the fourth total quantity with the data storage capacity to obtain a second comparison result.
  • step S301 to step S302 please refer to the description of step S104 in the embodiment corresponding to FIG. 2 above, and details are not repeated here.
  • Step S303: according to the second comparison result and the data bitonic sorting component associated with the data sorting sub-result pair M p , perform bitonic sorting on the data sorting sub-result pair M p to obtain the target merged data sorting sub-result corresponding to the data sorting sub-result pair M p .
  • the data sorting sub-result pair M p includes a first data sorting sub-result and a second data sorting sub-result; if the second comparison result is that the fourth total quantity is greater than the data storage capacity, the first data sorting sub-result is equally divided into a first ordered data fragment and a second ordered data fragment, and the second data sorting sub-result is equally divided into a third ordered data fragment and a fourth ordered data fragment; if the data in the first ordered data fragment is less than or equal to the data in the second ordered data fragment, and the data in the third ordered data fragment is less than or equal to the data in the fourth ordered data fragment, then, based on the first data bitonic sorting component, bitonic sorting is performed on the first ordered data fragment and the third ordered data fragment to obtain a third data sorting sub-result; the data bitonic sorting components associated with the data sorting sub-result pair M p include the first data bitonic sorting component; the bitonic sorting component performs bitonic sorting on
  • based on the fifth ordered data fragment, the fifth data sorting sub-result, and the eighth ordered data fragment, the target merged data sorting sub-result corresponding to the data sorting sub-result pair M p is obtained.
  • FIG. 9 is a schematic diagram of another data processing scenario provided by the embodiment of the present application.
  • the service server splits each of the two ordered sequences in half, obtaining four ordered sequences: the ordered sequence S0-1 (corresponding to the first data sorting sub-result) is equally divided into the first ordered data fragment and the second ordered data fragment,
  • and the ordered sequence S0-2 (corresponding to the second data sorting sub-result) is equally divided into the third ordered data fragment and the fourth ordered data fragment;
  • based on the first data bitonic sorting component, bitonic sorting is performed on the first ordered data fragment and the third ordered data fragment to obtain the third data sorting sub-result (corresponding to the ordered sequence S1-1 in Figure 9);
  • based on the second data bitonic sorting component, bitonic sorting is performed on the second ordered data fragment and the fourth ordered data fragment to obtain the fourth data sorting sub-result (corresponding to the ordered sequence S1-2 in Figure 9); the third
  • Step S304: determine the target merged data sorting sub-results respectively corresponding to the U data sorting sub-result pairs as the U target merged data sorting sub-results.
  • step S304 please refer to step S104 in the above embodiment corresponding to FIG. 2 .
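The over-capacity merge of steps S301 to S303 can be sketched in Python as follows. This is only an illustrative sketch: `component_sort` and `merge_pair` are hypothetical names, Python's built-in `sorted` stands in for a data bitonic sorting component, and the middle step (sorting the two inner fragments to obtain the fifth data sorting sub-result) is an assumption, because the text above does not show that step in full.

```python
def component_sort(fragment_a, fragment_b):
    """Stand-in for one data bitonic sorting component (hypothetical helper).

    It sorts two fragments that together fit into the component's storage.
    """
    return sorted(fragment_a + fragment_b)


def merge_pair(first, second):
    """Merge two sorted sub-results of equal, even length, half by half."""
    if len(first) != len(second) or len(first) % 2:
        raise ValueError("sub-results must have the same even length")
    h = len(first) // 2
    f1, f2 = first[:h], first[h:]      # first and second ordered fragments
    f3, f4 = second[:h], second[h:]    # third and fourth ordered fragments
    third = component_sort(f1, f3)     # third sub-result (S1-1 in Figure 9)
    fourth = component_sort(f2, f4)    # fourth sub-result (S1-2 in Figure 9)
    f5, f6 = third[:h], third[h:]      # fifth and sixth ordered fragments
    f7, f8 = fourth[:h], fourth[h:]    # seventh and eighth ordered fragments
    fifth = component_sort(f6, f7)     # assumed middle step
    # f5 holds the h smallest items and f8 the h largest, so concatenation is sorted.
    return f5 + fifth + f8
```

Each call to `component_sort` handles only half of the pair's data, which is why the pair can be merged even when its total size exceeds one component's capacity.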
  • the computer device can generate multiple data bitonic sorting tasks for the data sequence to be sorted and, by running C data bitonic sorting components in parallel, execute these tasks simultaneously.
  • executing multiple data bitonic sorting tasks at the same time reduces the sorting time of the data sequence to be sorted, that is, the task execution time; subsequently, based on the C data bitonic sorting components, the computer device merges the B data sorting sub-results to obtain the data sorting result for the data sequence to be sorted.
  • FIG. 10 is a schematic structural diagram of a data processing device provided by an embodiment of the present application.
  • the data processing device may be a computer program (including program code) running on a computer device, for example, the data processing device is an application software; the device may be configured to execute the corresponding steps in the method provided by the embodiment of the present application.
  • the data processing apparatus 1 may include: a call component module 11 , a task start module 12 , a parallel component module 13 and a merge result module 14 .
  • the call component module 11 is configured to obtain a data sorting request for the data sequence to be sorted, and to call C data bitonic sorting components according to the data sorting request; C is a positive integer greater than 1; the task start module 12 is configured to start B data bitonic sorting tasks based on the data sequence to be sorted and the C data bitonic sorting components; B is a positive integer greater than 1; the B data bitonic sorting tasks are respectively associated with different data subsequences to be sorted, and the B data subsequences to be sorted are all generated based on the data sequence to be sorted; the parallel component module 13 is configured to run the C data bitonic sorting components in parallel according to the B data bitonic sorting tasks, and to obtain the B data sorting sub-results output by the C data bitonic sorting components; one data sorting sub-result represents the sorting result of the data subsequence to be sorted associated with a data bitonic sorting
  • the call component module 11 is also configured to obtain a data sorting request and send the data sorting request to the task control component; the data sorting request carries the first total quantity of data to be sorted in the data sequence to be sorted;
  • in that case, the task start module 12 may include: a first determination unit 121 , a first acquisition unit 122 and a start task unit 123 .
  • the first determination unit 121 is configured to determine, in the task control component, the quantity to be sorted associated with the first total quantity; the quantity to be sorted is equal to or greater than the first total quantity; the first acquisition unit 122 is configured to acquire the data storage capacity corresponding to each of the C data bitonic sorting components, and to compare the quantity to be sorted with the total data storage capacity to obtain a first comparison result; the total data storage capacity is equal to the sum of the C data storage capacities; the start task unit 123 is configured to start B data bitonic sorting tasks according to the first comparison result.
  • the start task unit 123 may include: a first start subunit 1231 and a second start subunit 1232 .
  • the first start subunit 1231 is configured to obtain the first quantity ratio between the quantity to be sorted and the total data storage capacity if the first comparison result is that the quantity to be sorted is greater than the total data storage capacity; the first start subunit 1231 is also configured to start B data bitonic sorting tasks according to the first quantity ratio; the product of C and the first quantity ratio is equal to B; the second total quantity carried by a data bitonic sorting task is equal to the data storage capacity; the second total quantity represents the total quantity of data to be sorted in the data subsequence to be sorted associated with a data bitonic sorting task; the second start subunit 1232 is configured to start B data bitonic sorting tasks according to the C data bitonic sorting components if the first comparison result is that the quantity to be sorted is less than or equal to the total data storage capacity; in this case B is equal to C.
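The task-count rule described above can be illustrated with a short Python sketch. The names `next_power_of_two` and `plan_tasks` are hypothetical. Padding the quantity to be sorted up to the next power of two is an assumption (the text only states that it is equal to or greater than the first total quantity), as is the even split of data among tasks when everything fits in storage.

```python
def next_power_of_two(n):
    """Smallest power of two that is >= n (assumed padding rule)."""
    return 1 if n <= 1 else 1 << (n - 1).bit_length()


def plan_tasks(first_total, c_components, capacity):
    """Return (quantity_to_sort, b_tasks, second_total_per_task).

    capacity is the data storage capacity of one component; all C components
    are assumed to have the same capacity.
    """
    quantity = next_power_of_two(first_total)
    total_capacity = c_components * capacity
    if quantity > total_capacity:
        # First quantity ratio, rounded up; B is the product of C and the ratio.
        ratio = -(-quantity // total_capacity)
        return quantity, c_components * ratio, capacity
    # Everything fits at once: one task per component, so B equals C.
    b_tasks = c_components
    return quantity, b_tasks, -(-quantity // b_tasks)
```

For example, with 4 components of capacity 64, a request for 1000 items is padded to 1024 and yields 16 tasks of 64 items each, while a request for 100 items yields 4 tasks.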
  • the parallel component module 13 may include: a first distribution unit 131 and a first parallel unit 132.
  • the first distribution unit 131 is configured to, if B is equal to C, distribute the B data bitonic sorting tasks to the C data bitonic sorting components, where one data bitonic sorting task is sent to one data bitonic sorting component;
  • the first parallel unit 132 is configured to run the B data bitonic sorting tasks in parallel through the C data bitonic sorting components, and to obtain the data sorting sub-results respectively output by the C data bitonic sorting components.
  • the B data bitonic sorting tasks include a data bitonic sorting task D e , where e is a positive integer and e is less than or equal to B; the B data subsequences to be sorted include the data subsequence A e to be sorted corresponding to the data bitonic sorting task D e ; the C data bitonic sorting components include the data bitonic sorting component F e for performing the data bitonic sorting task D e ; the first parallel unit 132 may include: a first generating subunit 1321 and a second generating subunit 1322 .
  • the first generation subunit 1321 is configured to obtain the data subsequence Ae to be sorted corresponding to the data double-tone sorting task De in the data double-tone sorting component Fe, perform double-tone sorting on the data sub-sequence Ae to be sorted, and obtain the data sub-sequence to be sorted
  • the second generation subunit 1322 is configured to, while the data bitonic sorting component F e is running, run B-1 data bitonic sorting tasks in parallel through C-1 data bitonic sorting components, and obtain the data sorting sub-results respectively output by the C-1 data bitonic sorting components;
  • the C-1 data bitonic sorting components refer to: the data bitonic sorting components other than the data bitonic sorting component F e among the C data bitonic sorting components;
  • the B-1 data bitonic sorting tasks refer to: among the B data bitonic sorting tasks, the data bitonic sorting tasks other than the data bitonic sorting task D e .
  • the data sorting request carries the read addresses corresponding to the data to be sorted in the data sequence to be sorted, and the sort order for the data sequence to be sorted;
  • the first generation subunit 1321 may include: a first acquisition subunit 13211 , a second acquisition subunit 13212 and a first determining subunit 13213 .
  • the first acquisition subunit 13211 is configured to acquire, in the data bitonic sorting component F e , the target read address and the second total quantity carried by the data bitonic sorting task D e ; the read addresses respectively corresponding to the data to be sorted in the data sequence to be sorted include the target read address; the second total quantity represents the total quantity of data to be sorted in the data subsequence A e to be sorted; the second acquisition subunit 13212 is configured to send a data acquisition request carrying the target read address to the data read/write component, and to obtain the initial data subsequence to be sorted from the target read address through the data read/write component; the initial data subsequence to be sorted belongs to the data sequence to be sorted; the first determining subunit 13213 is configured to determine the quantity difference between the second total quantity and the third total quantity; the third total quantity represents the total quantity of data to be sorted in the initial data subsequence to be sorted; the first determining subunit 13213
  • the data bitonic sorting component F e includes a sorting control subcomponent, a round control subcomponent, a data acquisition subcomponent, and a bitonic sorting subcomponent;
  • the first generating subunit 1321 may include: a second determining subunit 13214 and a third determining subunit 13215.
  • the second determining subunit 13214 is configured to start the round control subcomponent when the sorting control subcomponent obtains the data loading completion notification; the data loading completion notification indicates that the storage component corresponding to the data bitonic sorting component F e has loaded the data subsequence A e to be sorted.
  • the second determining subunit 13214 is also configured to determine, in the round control subcomponent and according to the second total quantity, the bitonic sorting stage corresponding to the data subsequence A e to be sorted and the bitonic sorting logic parameters corresponding to that stage, and to send the bitonic sorting stage and its bitonic sorting logic parameters to the data acquisition subcomponent; the second total quantity represents the total quantity of data to be sorted in the data subsequence A e to be sorted.
  • the third determining subunit 13215 is configured to obtain, in the data acquisition subcomponent, the intermediate data to be sorted according to the bitonic sorting stage and the bitonic sorting logic parameters corresponding to that stage;
  • the intermediate data to be sorted is sent to the bitonic sorting subcomponent and sorted there to obtain the intermediate sorting sub-result; according to the intermediate sorting sub-result, the data sorting sub-result corresponding to the data subsequence A e to be sorted is determined;
  • the data subsequence A e to be sorted includes the intermediate data to be sorted.
  • there are at least two bitonic sorting stages; the third determining subunit 13215 is configured to read, in the data acquisition subcomponent and according to the i-th bitonic sorting stage of the at least two bitonic sorting stages, the first data sorting transition sub-result from the storage component; i is a positive integer greater than 1, and the first data sorting transition sub-result is the output result of the i-1th bitonic sorting stage; the bitonic sorting logic parameters corresponding to the i-th bitonic sorting stage include sorting round logic parameters for n rounds; n is determined based on i, and n is a positive integer greater than 1; the third determining subunit 13215 is also configured to obtain, according to the sorting round logic parameter of the j-th round, the intermediate data to be sorted from the intermediate data sorting transition sub-result; j is a positive integer and j is less than or equal to n; if j is 1, the intermediate data sorting transition sub-re
  • the parallel component module 13 may include: a split task unit 133 , a second distribution unit 134 and a second parallel unit 135 .
  • the split task unit 133 is configured to, if B is greater than C, split the B data bitonic sorting tasks into K*C data bitonic sorting tasks; K is the maximum number of running rounds corresponding to each of the C data bitonic sorting components;
  • the second distribution unit 134 is configured to distribute the C data bitonic sorting tasks of the i-th running round to the C data bitonic sorting components, where one data bitonic sorting task is sent to one data bitonic sorting component; i is a positive integer less than or equal to K;
  • the second parallel unit 135 is configured to run, in parallel through the C data bitonic sorting components, the C data bitonic sorting tasks of the i-th running round, and to obtain the data sorting sub-results respectively output by the C data bitonic sorting components, until i equals K.
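The round-based dispatch for the case where B is greater than C can be sketched as below. The function name `schedule_rounds` is hypothetical, and deriving K as B divided by C rounded up is an assumption consistent with the description of splitting the tasks into K*C tasks.

```python
def schedule_rounds(tasks, c_components):
    """Group B tasks into K running rounds of at most C tasks each."""
    if c_components < 1:
        raise ValueError("at least one sorting component is required")
    k_rounds = -(-len(tasks) // c_components)  # K = ceil(B / C)
    return [
        tasks[r * c_components:(r + 1) * c_components]
        for r in range(k_rounds)
    ]
```

In round i, the i-th group is handed to the C components, one task per component, and the next round starts once all components have output their sub-results.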
  • the combined result module 14 may include: a second acquiring unit 141 , a first combining unit 142 , a second determining unit 143 and a second combining unit 144 .
  • the U data sorting sub-result pairs include data sorting sub-result pairs M p , p is a positive integer, and p is less than or equal to U;
  • the first merging unit 142 may include: a first quantity subunit 1421, a second quantity subunit 1422 , a third generating subunit 1423 and a fourth generating subunit 1424 .
  • the first quantity subunit 1421 is configured to determine the fourth total quantity of data to be sorted associated with the data sorting sub-result pair M p , and the data storage capacity of the data bitonic sorting component associated with the data sorting sub-result pair M p ;
  • the C data bitonic sorting components include the data bitonic sorting component associated with the data sorting sub-result pair M p ;
  • the second quantity subunit 1422 is configured to compare the fourth total quantity with the data storage capacity to obtain a second comparison result;
  • the third generating subunit 1423 is configured to perform bitonic sorting on the data sorting sub-result pair M p according to the second comparison result and the data bitonic sorting component associated with the data sorting sub-result pair M p , to obtain the target merged data sorting sub-result corresponding to the data sorting sub-result pair M p ;
  • the data sorting sub-result pair M p includes a first data sorting sub-result and a second data sorting sub-result;
  • the third generating subunit 1423 may include: a first averaging subunit 14231, a first sorting subunit 14232 , the second sorting subunit 14233 , the second averaging subunit 14234 , the third sorting subunit 14235 and the fourth sorting subunit 14236 .
  • the first averaging subunit 14231 is configured to, if the second comparison result is that the fourth total quantity is greater than the data storage capacity, equally divide the first data sorting sub-result into a first ordered data fragment and a second ordered data fragment, and equally divide the second data sorting sub-result into a third ordered data fragment and a fourth ordered data fragment; the first sorting subunit 14232 is configured to, if the data in the first ordered data fragment is less than or equal to the data in the second ordered data fragment, and the data in the third ordered data fragment is less than or equal to the data in the fourth ordered data fragment, perform bitonic sorting on the first ordered data fragment and the third ordered data fragment based on the first data bitonic sorting component,
  • obtaining the third data sorting sub-result; the data bitonic sorting components associated with the data sorting sub-result pair M p include the first data bitonic sorting component; the second sorting subunit 14233 is configured to, based on the second data bitonic sort
  • the C data bitonic sorting components all include an index generation subcomponent, and the index generation subcomponent is used to assign an index value to each data item to be sorted in the data sequence to be sorted; the data sorting request carries a write address
  • the data processing device 1 may also include: a generating sequence module 15 and a writing sequence module 16 . The generating sequence module 15 is configured to sequentially obtain the index value corresponding to each data item in the data sorting result, and to generate, from the obtained index values, an index value sequence with the same length as the data sequence to be sorted; the writing sequence module 16 is configured to send a data write request for the index value sequence to the data read/write component, and to write the index value sequence to the write address through the data read/write component.
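The index value sequence can be illustrated with a minimal Python sketch. The function name `sort_with_indices` is hypothetical, positions in the input are assumed to serve as index values, and Python's built-in sort stands in for the bitonic sorting hardware.

```python
def sort_with_indices(values):
    """Return the index value sequence of the sorted data.

    Each item is tagged with its original position (its index value), the
    tagged items are sorted by value, and the indices are read out in
    sorted order.
    """
    tagged = [(value, index) for index, value in enumerate(values)]
    tagged.sort()  # stand-in for the bitonic sorting components
    return [index for _, index in tagged]
```

The returned sequence has the same length as the input, so a consumer can reorder the original data, or any data stored alongside it, without moving the values themselves.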
  • the computer device can generate multiple data bitonic sorting tasks for the data sequence to be sorted and, by running C data bitonic sorting components in parallel, execute these tasks simultaneously.
  • executing multiple data bitonic sorting tasks at the same time reduces the sorting time of the data sequence to be sorted, that is, the task execution time for that sequence; in addition, based on the C data bitonic sorting components, the computer device merges the B data sorting sub-results to obtain the data sorting result for the data sequence to be sorted.
  • FIG. 11 is a schematic structural diagram of a computer device provided by an embodiment of the present application.
  • the computer device 1000 may include: at least one processor 1001, such as a CPU, at least one network interface 1004, a user interface 1003, and a memory 1005.
  • the communication bus 1002 is used to realize connection and communication between these components.
  • the user interface 1003 may include a display screen (Display) and a keyboard (Keyboard), and the network interface 1004 may optionally include a standard wired interface or a wireless interface (such as a WI-FI interface).
  • the memory 1005 can be a high-speed RAM memory, or a non-volatile memory (Non-volatile Memory), such as at least one disk memory.
  • the memory 1005 may also be at least one storage device located away from the aforementioned processor 1001 .
  • the memory 1005 as a computer-readable storage medium may include an operating system, a network communication module, a user interface module, and a device control application program.
  • the network interface 1004 can provide a network communication function; the user interface 1003 is mainly used to provide an input interface for the user; and the processor 1001 can be used to call the device control application stored in the memory 1005
  • the computer device 1000 described in the embodiment of the present application can execute the description of the data processing method in the embodiment corresponding to FIG. 2 , FIG. 6 , and FIG.
  • the description of the data processing device 1 is omitted here.
  • the description of the beneficial effect of adopting the same method will not be repeated here.
  • the embodiment of the present application also provides a computer-readable storage medium.
  • the computer-readable storage medium stores a computer program.
  • the computer program includes program instructions.
  • For the data processing method provided by each step in FIG. 2 please refer to the implementation methods provided by each step in FIG. 2 , FIG. 6 , and FIG. 8 , and details will not be repeated here. In addition, the description of the beneficial effect of adopting the same method will not be repeated here.
  • the above-mentioned computer-readable storage medium may be the data processing apparatus provided in any one of the foregoing embodiments or an internal storage unit of the above-mentioned computer equipment, such as a hard disk or memory of the computer equipment.
  • the computer-readable storage medium can also be an external storage device of the computer device, such as a plug-in hard disk equipped on the computer device, a smart memory card (SMC, Smart Media Card), a secure digital (SD, Secure Digital) card, Flash card (flash card), etc.
  • the computer-readable storage medium may also include both an internal storage unit of the computer device and an external storage device.
  • the computer-readable storage medium is used to store the computer program and other programs and data required by the computer device.
  • the computer-readable storage medium can also be used to temporarily store data that has been output or will be output.
  • An embodiment of the present application also provides a computer program product, where the computer program product includes a computer program or a computer instruction, and the computer program or computer instruction is stored in a computer-readable storage medium.
  • the processor of the computer device reads the computer program or computer instruction from the computer-readable storage medium and executes it, so that the computer device can perform the data processing method described in the embodiments corresponding to FIG. 2 , FIG. 6 , and FIG. 8 above;
  • details are not repeated here.
  • the description of the beneficial effect of adopting the same method will not be repeated here.
  • the methods and related devices provided in the embodiments of the present application are described with reference to at least one of the method flow charts and structural diagrams provided in the embodiments of the present application.
  • each flow and block in the method flowcharts and structural diagrams, and each combination of flows and blocks in them, can be implemented by computer program instructions.
  • These computer program instructions may be provided to a general purpose computer, special purpose computer, embedded processor, or processor of other programmable data processing equipment to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing equipment produce an apparatus for realizing the functions specified in one or more flows of the flowchart and one or more blocks of the structural diagram.
  • These computer program instructions may also be stored in a computer-readable storage medium capable of directing a computer or other programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable storage medium produce an article of manufacture comprising an instruction apparatus; the instruction apparatus implements the functions specified in one or more flows of the flowchart and one or more blocks of the structural diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process,
  • so that the instructions executed on the computer or other programmable device provide steps for realizing the functions specified in one or more flows of the flowchart and one or more blocks of the structural diagram.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

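The embodiments of the present application disclose a data processing method and apparatus, a computer device, a computer-readable storage medium, and a computer program product, applicable to scenarios such as cloud technology, artificial intelligence, intelligent transportation, and assisted driving. The method includes: when a data read/write operation is performed, obtaining a data sorting request for the data sequence to be sorted that corresponds to the data read/write operation; in response to the data sorting request, invoking C data bitonic sorting components; starting B data bitonic sorting tasks according to the data sequence to be sorted and the C data bitonic sorting components, where the B tasks are associated with different to-be-sorted data subsequences and the B subsequences are all generated from the data sequence to be sorted; running the C data bitonic sorting components in parallel according to the B tasks to obtain B data sorting sub-results; and merging the B data sorting sub-results based on the C data bitonic sorting components to obtain a data sorting result for the data sequence to be sorted. The present application can both reduce the execution time of the sorting task for the data sequence to be sorted and improve the sorting efficiency of that task.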

Description

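Data processing method and apparatus, computer device, computer-readable storage medium, and computer program product
Cross-reference to related applications
The present application is based on, and claims priority to, Chinese patent application No. 202111242449.5 filed on October 25, 2021, the entire contents of which are incorporated herein by reference.
Technical field
The present application relates to the field of Internet technologies, and in particular to a data processing method and apparatus, a computer device, a computer-readable storage medium, and a computer program product.
Background
Sorting operations are widely used in scenarios such as databases, object recognition, and text analysis, and are common operations in artificial intelligence models and object detection models. Representative sorting algorithms include bitonic sort, odd-even sort, and radix sort. Among them, bitonic sort has important application value in fields such as parallel computing because of its inherent parallelism and its data-index independence.
Bitonic sort has a deterministic execution time and is well suited to hardware implementation. However, executing a sorting task requires multiple passes over the data to be sorted, which increases execution time and reduces sorting efficiency.
Summary
The embodiments of the present application provide a data processing method and apparatus, a computer device, a computer-readable storage medium, and a computer program product that can both reduce the execution time of a sorting task and improve sorting efficiency.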
本申请实施例提供一种数据处理方法,该方法由计算机设备执行,包括:获取针对待排序数据序列的数据排序请求,根据数据排序请求调用C个数据双调排序组件;C为大于1的正整数;根据待排序数据序列以及C个数据双调排序组件,开启B个数据双调排序任务;B为大于1的正整数;B个数据双调排序任务分别关联不同的待排序数据子序列,B个待排序数据子序列均是基于待排序数据序列所生成的;根据B个数据双调排序任务,并行运行C个数据双调排序组件,得到由C个数据双调排序组件所输出的B个数据排序子结果;一个数据排序子结果用于表征以下结果:一个数据双调排序任务所关联的待排序数据子序列的排序结果;基于C个数据双调排序组件,对B个数据排序子结果进行合并,得到针对待排序数据序列的数据排序结果。
本申请实施例提供一种数据处理装置,包括:调用组件模块,配置为获取针对待排序数据序列的数据排序请求,根据数据排序请求调用C个数据双调排序组件;C为大于1的正整数;开启任务模块,配置为根据待排序数据序列以及C个数据双调排序组件,开启B个数据双调排序任务;B为大于1的正整数;B个数据双调排序任务分别关联不同的待排序数据子序列,B个待排序数据子序列均是基于待排序数据序列所生成的;并行组件模块,配置为根据B个数据双调排序任务,并行运行C个数据双调排序组件,得到由C个数据双调排序组件所输出的B个数据排序子结果;一个数据排 序子结果用于表征以下结果:一个数据双调排序任务所关联的待排序数据子序列的排序结果;合并结果模块,配置为基于C个数据双调排序组件,对B个数据排序子结果进行合并,得到针对待排序数据序列的数据排序结果。
本申请实施例提供一种计算机设备,包括:处理器、存储器、网络接口;上述处理器与上述存储器、上述网络接口相连,其中,上述网络接口用于提供数据通信功能,上述存储器用于存储计算机程序,上述处理器用于调用上述计算机程序,以使得计算机设备执行本申请实施例中的数据处理方法。
本申请实施例提供一种计算机可读存储介质,上述计算机可读存储介质中存储有计算机程序,上述计算机程序适于由处理器加载并执行本申请实施例中的数据处理方法。
本申请实施例提供一种计算机程序产品,该计算机程序产品包括计算机程序,该计算机程序存储在计算机可读存储介质中;计算机设备的处理器从计算机可读存储介质读取该计算机程序,处理器执行该计算机程序,使得该计算机设备执行本申请实施例中的数据处理方法。
在本申请实施例中,基于C个数据双调排序组件,计算机设备可以生成针对待排序数据序列的多个数据双调排序任务,通过并行运行C个数据双调排序组件,可以同时执行多个数据双调排序任务,通过同时执行多个数据双调排序任务,可以减少待排序数据序列的排序时间,即减少任务执行时间;并且,基于C个数据双调排序组件,计算机设备对B个数据排序子结果进行合并,可以得到针对待排序数据序列的数据排序结果。由此可见,本申请实施例通过并行运行C个数据双调排序组件,不仅可以降低针对待排序数据序列的排序任务的执行耗时,还可以提高排序任务的排序效率。
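Brief description of the drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic diagram of a system architecture provided in the embodiments of the present application;
Fig. 2 is a schematic flowchart of a data processing method provided in the embodiments of the present application;
Fig. 3 is a schematic diagram of a data processing scenario provided in the embodiments of the present application;
Fig. 4 is a schematic diagram of another data processing scenario provided in the embodiments of the present application;
Fig. 5 is a schematic diagram of yet another data processing scenario provided in the embodiments of the present application;
Fig. 6 is a schematic flowchart of another data processing method provided in the embodiments of the present application;
Fig. 7 is a schematic structural diagram of a sorting accelerator provided in the embodiments of the present application;
Fig. 8 is a schematic flowchart of yet another data processing method provided in the embodiments of the present application;
Fig. 9 is a schematic diagram of a further data processing scenario provided in the embodiments of the present application;
Fig. 10 is a schematic structural diagram of a data processing apparatus provided in the embodiments of the present application;
Fig. 11 is a schematic structural diagram of a computer device provided in the embodiments of the present application.
Detailed description
The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in those embodiments. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.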
为了便于理解,首先对部分名词进行以下简单解释:
人工智能(AI,Artificial Intelligence)是利用数字计算机或者数字计算机控制的机器模拟、延伸和扩展人的智能,感知环境、获取知识并使用知识获得最佳结果的理论、方法、技术及应用系统。换句话说,人工智能是计算机科学的一个综合技术,它企图了解智能的实质,并生产出一种新的能以人类智能相似的方式做出反应的智能机器。人工智能也就是研究各种智能机器的设计原理与实现方法,使机器具有感知、推理与决策的功能。人工智能技术是一门综合学科,涉及领域广泛,既有硬件层面的技术也有软件层面的技术。人工智能基础技术一般包括如传感器、专用人工智能芯片、云计算、分布式存储、大数据处理技术、操作/交互系统、机电一体化等技术。人工智能软件技术主要包括计算机视觉技术、语音处理技术、自然语言处理技术以及机器学习/深度学习、自动驾驶、智慧交通等几大方向。
图1是本申请实施例提供的一种系统架构示意图,如图1所示,该系统可以包括业务服务器100以及终端集群,终端集群可以包括:终端设备200a、终端设备200b、终端设备200c、…、终端设备200n,可以理解的是,上述系统可以包括一个或者多个终端设备,本申请不对终端设备的数量进行限制。
终端集群之间可以存在通信连接,例如终端设备200a与终端设备200b之间存在通信连接,终端设备200a与终端设备200c之间存在通信连接。同时,终端集群中的任一终端设备可以与业务服务器100存在通信连接,例如终端设备200a与业务服务器100之间存在通信连接,其中,上述通信连接不限定连接方式,可以通过有线通信方式进行直接或间接地连接,也可以通过无线通信方式进行直接或间接地连接,还可以通过其它方式,本申请在此不做限制。
应当理解,如图1所示的终端集群中的每个终端设备均可以安装有应用客户端,当该应用客户端运行于各终端设备中时,可以分别与上述图1所示的业务服务器100之间进行数据交互,即上述的通信连接。其中,该应用客户端可以为短视频应用、视频应用、直播应用、社交应用、即时通信应用、游戏应用、音乐应用、购物应用、小说应用、浏览器等具有数据排序功能的应用客户端。其中,该应用客户端可以为独立的客户端,也可以为集成在某客户端(例如,社交客户端、教育客户端以及多媒体客户端等)中的嵌入式子客户端,在此不做限定。
为便于后续理解和说明,本申请实施例可以在图1所示的终端集群中选择一个终端设备作为目标终端设备,例如以终端设备200a作为目标终端设备。当获取到待排序数据序列,并接收到针对待排序数据序列的数据排序指令时,终端设备200a可以将针对待排序数据序列的数据排序请求发送至业务服务器100。业务服务器100接收到终端设备200a发送的数据排序请求后,可以通过数据排序请求调用C个数据双调排序组件,其中,数据排序请求携带有待排序数据序列中的待排序数据的第一总数量,C为大于1的正整数,即调用至少两个数据双调排序组件;业务服务器100根据第一总数量以及C个数据双调排序组件分别对应的数据存储容量,可以开启B个数据双调排序任务,其中,B为大于1的正整数,即开启至少两个数据双调排序任务,B个数据双调排序任务分别关联不同的待排序数据子序列,B个待排序数据子序列均是基于待排序数据序列所生成的;业务服务器100根据B个数据双调排序任务,可以并行运行C个数据双调排序组件,得到由C个数据双调排序组件所输出的B个数据排序子结果,其中,一个数据排序子结果用于表征一个数据双调排序任务所关联的待排序数据子序列的排序结果;基于C个数据双调排序组件,业务服务器100可以对B个数据排序子结果进行合并,得到针 对待排序数据序列的数据排序结果,最后,可以将数据排序结果写入至数据排序请求所携带的写入地址。
在一些实施例中,若终端设备200a的本地存储了上述C个数据双调排序组件,则终端设备200a可以通过本地的C个数据双调排序组件,对待排序数据序列中的待排序数据进行双调排序,可以理解的是,该过程与上述业务服务器通过C个数据双调排序组件对待排序数据序列中的待排序数据进行双调排序的过程一致,故此处不进行赘述,请参见上文的描述。
可以理解的是,系统架构中可以包括多个业务服务器,一个终端设备可以与一个业务服务器相连接,每个业务服务器可以获取到与之相连接的终端设备所发送的数据排序请求,从而可以对所获取到的数据排序请求所针对的待排序数据序列进行双调排序,并将数据排序结果返回给与之相连接的终端设备。
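The method provided in the embodiments of the present application may be applied in circuits such as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA). Such a circuit may be used in a chip, and the chip may be integrated in a central processing unit (CPU), in an AI chip, or in another type of chip. The AI chip is integrated into the service server in the form of a Peripheral Component Interconnect Express (PCIe) card.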
可以理解的是,本申请实施例提供的方法可以由计算机设备执行,计算机设备包括但不限于终端设备或业务服务器。其中,业务服务器可以是独立的物理服务器,也可以是多个物理服务器构成的服务器集群或者分布式系统,还可以是提供云数据库、云服务、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、CDN、以及大数据和人工智能平台等基础云计算服务的云服务器。终端设备包括但不限于手机、电脑、智能语音交互设备、智能家电、车载终端等。其中,终端设备和业务服务器可以通过有线或无线方式进行直接或间接地连接,本申请实施例在此不做限制。
可以理解的是,上述系统架构可适用于数据库、目标识别、文本分析等数据排序场景,这里将不对具体的业务场景进行一一列举。
图2是本申请实施例提供的一种数据处理方法的流程示意图,该数据处理方法可以由业务服务器(例如,上述图1所示的业务服务器100)执行,也可以由终端设备(例如,上述图1所示的终端设备200a)执行,还可以由业务服务器和终端设备交互执行。为便于理解,本申请实施例以该方法由业务服务器执行为例进行说明,此外,本申请实施例可应用于各种场景,包括但不限于云技术、人工智能、智慧交通、辅助驾驶等。如图2所示,该数据处理方法至少可以包括以下步骤S101至步骤S104。
步骤S101,获取针对待排序数据序列的数据排序请求,根据数据排序请求调用C个数据双调排序组件;C为大于1的正整数。
本申请实施例中,获取数据排序请求,将数据排序请求发送至任务控制组件;数据排序请求携带待排序数据序列中的待排序数据的第一总数量。
在本申请实施例中,待排序数据序列可以是无序数据序列,例如序列(0、5、2、6、8、7、6);待排序数据序列可以是双调序列,例如序列(0、2、6、9、8、7、4、1),本申请实施例不对待排序数据序列的序列类型进行限定。可以理解的是,待排序数据序列中的待排序数据的第一总数量为至少三个。
在本申请实施例中,数据排序请求可以是终端设备发送至业务服务器的,如上文图1中所描述的,也可以是业务服务器本地生成的,本申请实施例不对数据排序请求的来源进行限定。数据排序请求可以携带待排序数据序列中的待排序数据,此时,业务服务器可以根据待排序数据序列中的待排序数据确定第一总数量,或者,数据排序请求携带 待排序数据序列中的待排序数据分别对应的读取地址以及第一总数量,则业务服务器可以从读取地址读取待排序数据序列中的待排序数据,本申请实施例不对数据获取请求所携带的数据进行限定,可以根据实际应用对数据获取请求所携带的数据进行设定。在本申请实施例中,数据双调排序组件是指可以对待排序数据序列进行双调排序的组件。
请一并参见图3,图3是本申请实施例提供的一种数据处理的场景示意图。如图3所示,数据排序请求30a包括第一总数量、读取地址、写入地址以及写入类型等,其中,写入地址用于指示业务服务器30b将数据排序结果写入至该地址,写入类型是指写入至写入地址的结果类型是数据结果类型,还是索引结果类型,可以参见下文步骤S104中的描述,此处暂不展开描述。
业务服务器30b获取到数据排序请求30a后,可以根据该数据排序请求30a调用C个(至少两个)数据双调排序组件,请参见下文图6所对应的实施例中关于数据双调排序组件的内部结构描述,此处暂不展开叙述。可以理解的是,实际应用时,第一总数量可以等于2,也可以为大于2的正整数。本申请实施例可以应用在一个数据双调排序组件对待排序数据序列进行排序处理的场景。
步骤S102,根据待排序数据序列以及C个数据双调排序组件,开启B个数据双调排序任务;B为大于1的正整数;B个数据双调排序任务分别关联不同的待排序数据子序列,B个待排序数据子序列均是基于待排序数据序列所生成的。
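In the task control component, a quantity to be sorted that is associated with the first total quantity is determined; the quantity to be sorted is equal to or greater than the first total quantity. The data storage capacity corresponding to each of the C data bitonic sorting components is obtained, and the quantity to be sorted is compared with the total data storage capacity to obtain a first comparison result; the total data storage capacity equals the sum of the C data storage capacities. B data bitonic sorting tasks are started according to the first comparison result.
Here, starting B data bitonic sorting tasks according to the first comparison result may include the following. If the first comparison result is that the quantity to be sorted is greater than the total data storage capacity, a first quantity ratio between the quantity to be sorted and the total data storage capacity is obtained, and B data bitonic sorting tasks are started according to the first quantity ratio; the product of C and the first quantity ratio equals B, the second total quantity carried by one data bitonic sorting task equals the data storage capacity, and the second total quantity represents the total number of data items in the to-be-sorted data subsequence associated with one data bitonic sorting task. If the first comparison result is that the quantity to be sorted is less than or equal to the total data storage capacity, B data bitonic sorting tasks are started according to the C data bitonic sorting components, and B equals C.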
请再参见图3,业务服务器30b将数据排序请求30a发送至顶层控制组件,通过顶层控制组件将数据排序请求30a发送至任务控制组件30c,也可以理解为顶层控制组件将数据排序请求所携带的数据与任务控制组件30c配置好后,发送针对待排序数据序列的数据排序指令(Sort request)至任务控制组件30c。
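In the task control component 30c, the service server 30b determines the quantity to be sorted that is associated with the first total quantity. For example, if the first total quantity of the data sequence to be sorted is 31, the nearest power of two is 2 to the 5th power, so the quantity to be sorted is 2^5, that is, 32. In the embodiments of the present application, the first total quantity only needs to be a positive integer greater than 2, and the quantity to be sorted is equal to or greater than the first total quantity and is a power of 2. The service server 30b obtains the data storage capacity of each of the C data bitonic sorting components. Usually, to avoid complicating parameter configuration, the C data storage capacities are identical, for example all 32. Assuming the C data storage capacities are all 32, the total data storage capacity is 32*C. The quantity to be sorted is then compared with the total data storage capacity to obtain the first comparison result. It should be noted that the above example is only for ease of description and understanding.
Referring again to Fig. 3, if the first comparison result is that the quantity to be sorted is greater than the total data storage capacity, the service server 30b obtains the first quantity ratio between the quantity to be sorted and the total data storage capacity, shown as the ratio K in Fig. 3. For example, if the data storage capacity of one data bitonic sorting component is 8 and C equals 4, the total data storage capacity equals 32; if the quantity to be sorted is 64, then K = 2. In this case, the second total quantity carried by one data bitonic sorting task can be set to the data storage capacity; that is, the total number of data items in the to-be-sorted data subsequence associated with one task equals the data storage capacity. The service server 30b therefore needs to start 8 (K*C = 2*4) data bitonic sorting tasks through the task control component 30c.
If the first comparison result is that the quantity to be sorted is less than or equal to the total data storage capacity, for example the data storage capacity of one data bitonic sorting component is 64, C equals 4, the total data storage capacity equals 256, and the quantity to be sorted is 64, then starting 4 data bitonic sorting tasks is enough to sort all the data in the data sequence to be sorted in a single run round. The second total quantity associated with one data bitonic sorting task equals the ratio of the quantity to be sorted to C, that is, 64/4 = 16. It should be noted that the above example is only for ease of description and understanding.
In summary, the task control component 30c parses the parameters (including the first total quantity and the data storage capacity), splits the sorting task for the data sequence to be sorted into as many sorting subtasks (data bitonic sorting tasks) as there are data bitonic sorting components, and sends one data bitonic sorting task to each Sort unit (data bitonic sorting component). In addition, the embodiments of the present application take into account the storage area allocated to each Sort unit (the data storage capacity): when the total data storage capacity is smaller than the quantity to be sorted, the sorting task can be split into K*C subtasks (data bitonic sorting tasks) and completed in K run rounds, where each round likewise sends C subtasks, one data bitonic sorting task per Sort unit. After issuing the data bitonic sorting tasks, the task control component 30c waits for their completion signals.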
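The task-splitting arithmetic described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `next_power_of_two` and `plan_tasks` are hypothetical, and the sketch assumes that every component has the same capacity and that C and the capacity are powers of two, so all divisions are exact.

```python
def next_power_of_two(count: int) -> int:
    """Smallest power of two that is >= count (the 'quantity to be sorted')."""
    if count < 1:
        raise ValueError("count must be positive")
    return 1 << (count - 1).bit_length()


def plan_tasks(first_total: int, capacity: int, components: int) -> dict:
    """Derive the task layout from the first total quantity.

    capacity   -- data storage capacity of ONE component (assumed equal for all)
    components -- C, the number of data bitonic sorting components
    """
    to_sort = next_power_of_two(first_total)
    total_capacity = capacity * components
    if to_sort > total_capacity:
        rounds = to_sort // total_capacity   # first quantity ratio K
        per_task = capacity                  # each task fills one component
    else:
        rounds = 1                           # a single run round, B == C
        per_task = to_sort // components     # e.g. 64 / 4 = 16
    return {
        "to_sort": to_sort,
        "rounds": rounds,
        "tasks": rounds * components,        # B = K * C
        "per_task": per_task,                # second total quantity
    }
```

Calling `plan_tasks(64, 8, 4)` reproduces the first example in the text (K = 2, 8 tasks of 8 items), and `plan_tasks(64, 64, 4)` reproduces the second (4 tasks of 16 items).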
步骤S103,根据B个数据双调排序任务,并行运行C个数据双调排序组件,得到由C个数据双调排序组件所输出的B个数据排序子结果;一个数据排序子结果用于表征以下结果:一个数据双调排序任务所关联的待排序数据子序列的排序结果。
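In the embodiments of the present application, if B equals C, the B data bitonic sorting tasks are distributed to the C data bitonic sorting components, one task per component; the C components run the B tasks in parallel to obtain the data sorting sub-result output by each of the C components. The B data bitonic sorting tasks include a data bitonic sorting task D_e, where e is a positive integer less than or equal to B; the B to-be-sorted data subsequences include the to-be-sorted data subsequence A_e corresponding to task D_e; and the C data bitonic sorting components include the data bitonic sorting component F_e used to execute task D_e.
If B is greater than C, the B data bitonic sorting tasks are divided into K*C data bitonic sorting tasks, where K is the maximum number of run rounds of each of the C data bitonic sorting components. The C tasks of the i-th run round are distributed to the C components, one task per component, where i is a positive integer less than or equal to K. The C components run the C tasks of the i-th run round in parallel to obtain the data sorting sub-result output by each component, until, when i equals K, the data sorting sub-results corresponding to all B tasks have been obtained.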
由上文步骤S102可知,本申请实施例有两种开启数据双调排序任务的情况,一种是待排序数量小于或等于总数据存储容量,即B=C,此时,开启与数据双调排序组件数量等同的数据双调排序任务,即C个数据双调排序任务;另一种是待排序数量大于总数据存储容量,此时开启K*C个数据双调排序任务,并通过并行运行C个数据双调排序组件K轮,执行完K*C个数据双调排序任务。可以理解的是,在待排序数量大于总数据存储容量的情况下,单轮数据双调排序组件的运行过程,与待排序数量小于或等于总数据存储容量的情况下的数据双调排序组件的并行运行过程是一致的,两者区别仅在于,前者每并行运行完C个数据双调排序组件后,需要将单轮生成的C个数据排序子结果存储至外部存储器,以执行下一轮的C个数据双调排序任务,而后者不需要将C个数据排序子结果存储至外部存储器,可以直接对C个数据排序子结果进行合并,得到待排序数据对应的数据排序结果,故在本申请实施例中,仅描述B等于C的场景,B大于C的场景可以参见下文的描述。
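The K-round dispatch for the case B > C can be sketched in software as below. This is only an analogy under stated assumptions: `run_in_rounds` is a hypothetical name, a thread pool stands in for the C hardware Sort units, Python's built-in `sorted` stands in for one component's bitonic sort, and a plain list stands in for the external memory that receives each round's sub-results.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_rounds(subsequences, components, sort_one=sorted):
    """Run B = K * C sorting tasks on C workers, C tasks per run round."""
    if components < 1 or len(subsequences) % components != 0:
        raise ValueError("B must be a positive multiple of C")
    rounds = len(subsequences) // components        # K
    external_store = []                             # stand-in for external memory
    with ThreadPoolExecutor(max_workers=components) as pool:
        for i in range(rounds):
            # The C tasks of round i, one task per component.
            batch = subsequences[i * components:(i + 1) * components]
            # Executor.map returns results in input order; consuming it here
            # means round i finishes before round i + 1 is dispatched.
            external_store.extend(pool.map(sort_one, batch))
    return external_store
```

With B equal to C the loop runs once, which corresponds to the single-round case in which the sub-results can be merged directly without a trip through external memory.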
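Refer also to Fig. 4, which is a schematic diagram of another data processing scenario provided in the embodiments of the present application. As shown in Fig. 4, C is set to 4 in this example. The task control component 40a starts 4 data bitonic sorting tasks, shown in Fig. 4 as data bitonic sorting tasks 401a, 402a, 403a, and 404a, and the service server invokes 4 data bitonic sorting components, shown in Fig. 4 as data bitonic sorting components S_1, S_2, S_3, and S_4. For how a data bitonic sorting component obtains its to-be-sorted data subsequence from the task it receives, see the embodiment corresponding to Fig. 6 below; it is not described here. Referring again to Fig. 4, component S_1 executes task 401a, whose to-be-sorted data subsequence 401b is (10, 20, 5, 9); component S_2 executes task 402a, whose subsequence 402b is (3, 8, 12, 14); component S_3 executes task 403a, whose subsequence 403b is (90, 0, 60, 40); and component S_4 executes task 404a, whose subsequence 404b is (23, 35, 95, 18). In the embodiments of the present application, the service server may first generate, in order, a unique index identifier for each data item in the data sequence to be sorted. The form of the index identifier is not limited; it may be any information that can identify each data item in the data sequence to be sorted. For ease of description and understanding, 4-bit binary identifiers are used as an example. As shown in Fig. 4, the index identifier of 10 is 0000, of 20 is 0001, of 5 is 0010, of 9 is 0011, of 3 is 0100, of 8 is 0101, of 12 is 0110, of 14 is 0111, of 90 is 1000, of 0 is 1001, of 60 is 1010, of 40 is 1011, of 23 is 1100, of 35 is 1101, of 95 is 1110, and of 18 is 1111.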
请再参见图4,业务服务器通过C个(如图4中的4个)数据双调排序组件,并行运行B个(如图4中的4个)数据双调排序任务,得到C个数据双调排序组件分别输出的数据排序子结果。可以理解的是,一个双调排序组件执行一个数据双调排序任务,且每个数据双调排序组件的排序逻辑是一致的,故此处将以数据双调排序组件S 1执行数据双调排序任务401a为例叙述,剩余的数据双调排序组件的执行过程可以参见下文的描述,不进行赘述。本申请实施例对两个索引标识分别对应的待排序数据(原始待排序数据)进行比较,并根据比较结果,更新两个索引标识分别对应的待排序数据(比较后的待排序数据)。
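In Fig. 4, one symbol (rendered as an inline image in the original, PCTCN2022118483-appb-000001) denotes ascending order: two of these symbols joined together indicate that the data items at the two connected index identifiers are sorted in ascending order. The symbol "Θ" denotes descending order: two connected "Θ" symbols indicate that the data items at the two connected index identifiers are sorted in descending order. As shown in Fig. 4, in data bitonic sorting component S_1 the to-be-sorted data subsequence 401b is (10, 20, 5, 9), so the second total quantity equals 4 and there are two bitonic sorting stages, shown in Fig. 4 as the first bitonic sorting stage 401h and the second bitonic sorting stage 402h. The second stage 402h contains two sorting rounds, shown in Fig. 4 as the first sorting round 401i and the second sorting round 402i. In the first stage 401h, the logic parameters of its single sorting round specify an ascending comparison of index identifiers 0000 and 0001 and a descending comparison of index identifiers 0010 and 0011. The output of the first stage 401h is therefore: index identifier 0000 holds 10, 0001 holds 20, 0010 holds 9, and 0011 holds 5. The service server writes this output into the storage component 401d corresponding to S_1 and, once the write succeeds, starts the second bitonic sorting stage 402h.
Referring again to Fig. 4, in the second bitonic sorting stage 402h, the logic parameters of the first sorting round 401i specify an ascending comparison of index identifiers 0000 and 0010 and an ascending comparison of index identifiers 0001 and 0011. The output of the first sorting round 401i is therefore: index identifier 0000 holds 9, 0010 holds 10, 0001 holds 5, and 0011 holds 20. The service server caches this output in the data acquisition subcomponent of S_1 without writing it to the storage component 401d, so when the second sorting round 402i starts, S_1 does not need to fetch the previous round's output from the storage component 401d and can read it directly from its local data acquisition subcomponent.
Referring again to Fig. 4, the logic parameters of the second sorting round 402i specify an ascending comparison of index identifiers 0000 and 0001 and an ascending comparison of index identifiers 0010 and 0011. The output of the second sorting round 402i is therefore: index identifier 0000 holds 5, 0001 holds 9, 0010 holds 10, and 0011 holds 20, and the second bitonic sorting stage 402h ends. The service server then writes the output of the second sorting round 402i (the data sorting sub-result 401c in Fig. 4) to the storage component 401d. It should be noted that the storage components 401d, 402d, 403d, and 404d shown in Fig. 4 are all local memories of the data bitonic sorting components.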
按照上述排序过程,数据双调排序组件S 1对应的输出结果为数据排序子结果401c,数据双调排序组件S 2对应的输出结果为数据排序子结果402c,数据双调排序组件S 3对应的输出结果为数据排序子结果403c,数据双调排序组件S 4对应的输出结果为数据排序子结果404c。
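The stage-and-round walk-through for subsequence 401b can be reproduced with a short software sketch of the standard iterative bitonic sort. The function name `bitonic_sort_trace` is hypothetical, and the sketch models only the compare-exchange pattern, not the hardware caching or write-back behaviour described in the text. It assumes the input length is a power of two.

```python
def bitonic_sort_trace(values, ascending=True):
    """Sort a power-of-two-length list; return (result, snapshot after each round)."""
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    trace = []
    block = 2                        # stage: size of the sorted blocks being built
    while block <= n:
        stride = block // 2          # round: distance between compared positions
        while stride >= 1:
            for i in range(n):
                partner = i ^ stride
                if partner > i:
                    # Neighbouring blocks alternate direction so that each
                    # pair of blocks forms a bitonic sequence for the next stage.
                    up = ((i & block) == 0) == ascending
                    if (data[i] > data[partner]) == up:
                        data[i], data[partner] = data[partner], data[i]
            trace.append(list(data))
            stride //= 2
        block *= 2
    return data, trace
```

For the input (10, 20, 5, 9), the three snapshots in `trace` are the outputs of stage 401h, round 401i, and round 402i from the text: (10, 20, 9, 5), then (9, 5, 10, 20), then (5, 9, 10, 20).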
排序完成后,Sort单元将输出结果写回存储器,同时向任务控制组件40a发出排序完成信号,表示数据双调排序任务完成。
步骤S104,基于C个数据双调排序组件,对B个数据排序子结果进行合并,得到针对待排序数据序列的数据排序结果。
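In the embodiments of the present application, U data sorting sub-result pairs are obtained in the L-th merge, where U and L are positive integers. If L is 1, the U pairs are generated from the B data sorting sub-results; if L is not 1, the U pairs are generated from historical merged data sorting sub-results, which are the sub-results obtained in the (L-1)-th merge. Based on the C data bitonic sorting components, the U pairs are merged to obtain U target merged data sorting sub-results. If U equals 1, the U target merged data sorting sub-results are determined as the data sorting result for the data sequence to be sorted; if U is greater than 1, then in the (L+1)-th merge the U target merged data sorting sub-results are merged, based on the C data bitonic sorting components, to obtain the data sorting result for the data sequence to be sorted. The U data sorting sub-result pairs include a data sorting sub-result pair M_p, where p is a positive integer less than or equal to U.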
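The repeated pairing described above (form U pairs, merge them, repeat until one result remains) can be sketched as a loop. This is an illustrative sketch only: `merge_rounds` is a hypothetical name, all sub-results are assumed to be ascending, and `heapq.merge` from the Python standard library stands in for the bitonic merge that a data bitonic sorting component would perform.

```python
import heapq


def merge_rounds(sub_results):
    """Pair up sorted sub-results and merge until a single result remains.

    Returns (final result, number of merge levels performed).
    """
    if not sub_results:
        raise ValueError("at least one sub-result is required")
    current = [list(r) for r in sub_results]
    level = 0                                    # L in the text, counted from 1
    while len(current) > 1:
        level += 1
        # Form U pairs; with an odd count the last sub-result passes through alone.
        pairs = [current[i:i + 2] for i in range(0, len(current), 2)]
        # Stand-in for the merge done by a data bitonic sorting component.
        current = [list(heapq.merge(*pair)) for pair in pairs]
    return current[0], level
```

With the four sub-results of Fig. 4 the loop runs twice: the first level produces two merged sub-results (U = 2) and the second level produces the final data sorting result (U = 1).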
任务控制组件收到所有的子任务(数据双调排序任务)的完成信号后,开启针对C个数据排序子结果的合并指令,同样,业务服务器调用Sort单元完成,与步骤S103中的排序过程一致,两者的区别在于待排序数据子序列可能是无序序列,而数据排序子结果所对应的序列为有序序列。
本申请实施例提供的数据处理方法,可以在进行数据读写操作时,获取针对数据读写操作对应的待排序数据序列的数据排序请求,进而响应于数据排序请求,执行上述S101至S104中的步骤,实现对待排序数据序列进行数据排序。这里,数据读写操作可以是在终端设备上安装的任一目标应用中进行的读操作或写操作,待排序数据序列是读操作对应的读数据或者写操作对应的写数据。
本申请实施例的数据处理方法至少可以应用于以下场景:当需要通过目标应用的业务服务器从云端存储器中读取已存储数据时,可以在目标应用的客户端上执行读操作,业务服务器响应于读操作,从云端存储器中获取读操作对应的已存储数据,即得到上述待排序数据序列。在得到上述待排序数据序列之后,可以采用本申请实施例的方法,针 对待排序数据序列,通过终端设备上的目标应用的客户端向业务服务器发送数据排序请求;业务服务器响应于数据排序请求,调用C个数据双调排序组件,并根据待排序数据序列以及C个数据双调排序组件,开启B个数据双调排序任务,并行运行C个数据双调排序组件,得到由C个数据双调排序组件所输出的B个数据排序子结果;然后,基于C个数据双调排序组件,对B个数据排序子结果进行合并,得到针对待排序数据序列的数据排序结果,从而完成对云端存储器中获取的待排序数据序列的数据排序处理。
在该应用场景下,由于请求从云端存储器中读取数据的读操作可以请求读取多个数据,且可以具有针对不同数据的多个读操作,因此,读取的已存储数据为多个无序数据,为了保证读取的已存储数据能够按照预设的顺序排列,形成有序数据,则可以采用本申请实施例的方法进行数据排序处理;并且,由于本申请实施例在进行数据排序处理时,通过并行运行多个数据双调排序组件,不仅可以降低针对待排序数据序列(即读取的已存储数据)的排序任务的执行耗时,还可以提高排序任务的排序效率。
本申请实施例的数据处理方法还可以应用于以下场景:终端设备上安装有购物应用,用户在购物应用的客户端进行下拉刷新操作,购物应用的业务服务器响应于下拉刷新操作,从商品数据库中召回预设数量的商品信息作为待推荐信息,其中,预设数量的商品信息构成上述待排序数据序列。在进行商品信息推荐之前,可以先采用本申请实施例提供的排序方法,对预设数量的商品信息对应的待排序数据序列进行数据排序。在实现的过程中,可以针对待排序数据序列,通过购物应用的客户端向业务服务器发送数据排序请求;业务服务器响应于数据排序请求,调用C个数据双调排序组件,并根据待排序数据序列以及C个数据双调排序组件,开启B个数据双调排序任务,并行运行C个数据双调排序组件,得到由C个数据双调排序组件所输出的B个数据排序子结果;然后,基于C个数据双调排序组件,对B个数据排序子结果进行合并,得到针对待排序数据序列的数据排序结果,从而完成对预设数量的商品信息对应的待排序数据序列的数据排序处理。
在该应用场景下,由于从商品数据库中召回的预设数量的商品信息之间存在一定的差异性,且多个商品信息对应的数据也为无序数据,因此,为了能够对用户进行准确的信息推荐,可以对待排序数据序列进行数据排序处理,进而基于数据排序处理后的排序结果进行信息推荐,从而能够对用户进行准确和符合用户体验的信息推荐。并且,在进行数据排序处理时,还可以降低针对待排序数据序列的排序任务的执行耗时,提高排序任务的排序效率。
本申请实施例的数据处理方法还可以应用于以下场景:在智慧交通领域,终端设备为车载终端,在车载终端上安装有导航应用,导航应用的业务服务器响应于导航请求,基于输入的起点和终点信息,生成导航信息,该导航信息中包括但不限于以下至少之一:多个位置信息、针对每一位置信息的预计到达时刻和每一位置信息对应的当前路况等。该导航信息中的所有数据构成上述待排序数据序列。在进行实时导航之前,可以先采用本申请实施例提供的排序方法,对导航信息对应待排序数据序列进行数据排序。在实现的过程中,可以针对待排序数据序列,通过导航应用的客户端向业务服务器发送数据排序请求;业务服务器响应于数据排序请求,调用C个数据双调排序组件,并根据待排序数据序列以及C个数据双调排序组件,开启B个数据双调排序任务,并行运行C个数据双调排序组件,得到由C个数据双调排序组件所输出的B个数据排序子结果;然后,基于C个数据双调排序组件,对B个数据排序子结果进行合并,得到针对待排序数据序列的数据排序结果,从而完成对导航信息对应的待排序数据序列的数据排序处理。
在该应用场景下,由于生成的导航信息中的多个数据之间具有一定的差异性,且导航信息中的多个数据也为无序数据,因此,为了能够对用户进行准确的实时导航,可以 对待排序数据序列进行数据排序处理,进而基于数据排序处理后的排序结果进行实时导航,从而能够为用户提供准确和符合用户导航需求的导航体验。并且,在进行数据排序处理时,还可以降低针对待排序数据序列的排序任务的执行耗时,提高排序任务的排序效率。
下面,请一并参见图5以及图4,图5是本申请实施例提供的再一种数据处理的场景示意图。如图5所示,业务服务器针对图4所示例的4个数据排序子结果,可以获取2个数据排序子结果对,如图5所示,由数据排序子结果401c以及数据排序子结果402c生成数据排序子结果对401f;由数据排序子结果403c以及数据排序子结果404c生成数据排序子结果对402f。
通过任务控制组件40a,业务服务器开启两个数据合并任务,如图5中的数据合并任务401e以及数据合并任务402e,数据合并任务401e是指对数据排序子结果401c以及数据排序子结果402c进行合并的任务,数据合并任务402e是指对数据排序子结果403c以及数据排序子结果404c进行合并的任务。
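If the data storage capacity is greater than or equal to the fourth total quantity of data associated with a data sorting sub-result pair, the task control component 40a may distribute the data merging task 401e to one or both of the two data bitonic sorting components associated with the data sorting sub-result pair 401f, namely the components S_1 and S_2 shown in Fig. 4. In this example, the task control component 40a distributes the data merging task 401e to one of the two components associated with pair 401f (component S_1, as shown in Fig. 5). If the data storage capacity is smaller than the fourth total quantity of data associated with the pair, see the description in the embodiment corresponding to Fig. 8 below; it is not described here.
Referring to Fig. 4 and Fig. 5 together, in data bitonic sorting component S_1 the pair 401f forms a bitonic sequence (5, 9, 10, 20, 14, 12, 8, 3), so the fourth total quantity equals 8 and there are three sorting rounds, whose logic parameters are shown in Fig. 5. The first-round logic parameters specify ascending comparisons of index identifiers 0000 and 0100, 0001 and 0101, 0010 and 0110, and 0011 and 0111. The output of the first round is: index identifier 0000 holds 5, 0100 holds 14, 0001 holds 9, 0101 holds 12, 0010 holds 8, 0110 holds 10, 0011 holds 3, and 0111 holds 20. The service server caches this output (5, 9, 8, 3, 14, 12, 10, 20) in the data acquisition subcomponent of S_1 without writing it to the storage component 401d, so when the second round starts, S_1 does not need to fetch the previous round's output from the storage component 401d and can read it directly from its local data acquisition subcomponent.
Referring again to Fig. 5, the second-round logic parameters specify ascending comparisons of index identifiers 0000 and 0010, 0001 and 0011, 0100 and 0110, and 0101 and 0111. The output of the second round is: index identifier 0000 holds 5, 0010 holds 8, 0001 holds 3, 0011 holds 9, 0100 holds 10, 0110 holds 14, 0101 holds 12, and 0111 holds 20. The service server caches this output (5, 3, 8, 9, 10, 12, 14, 20) in the data acquisition subcomponent of S_1 without writing it to the storage component 401d, so when the third round starts, S_1 can again read the previous round's output directly from its local data acquisition subcomponent.
Referring again to Fig. 5, the third-round logic parameters specify ascending comparisons of index identifiers 0000 and 0001, 0010 and 0011, 0100 and 0101, and 0110 and 0111. The output of the third round is: index identifier 0000 holds 3, 0001 holds 5, 0010 holds 8, 0011 holds 9, 0100 holds 10, 0101 holds 12, 0110 holds 14, and 0111 holds 20. The service server determines this output (3, 5, 8, 9, 10, 12, 14, 20) as the target merged data sorting sub-result 401g and writes it to the storage component 401d.
It can be understood that data bitonic sorting component S_3 executes the data merging task 402e in the same way as described above, except that component S_1 sorts in ascending order whereas component S_3 sorts in descending order. The process is therefore not repeated here, and it yields the target merged data sorting sub-result 402g.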
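The three merge rounds for pair 401f follow the standard bitonic merge pattern, which can be sketched as follows. The name `bitonic_merge_trace` is hypothetical; the sketch assumes the input is already a bitonic sequence whose length is a power of two, and it models only the comparisons, not the caching in the data acquisition subcomponent.

```python
def bitonic_merge_trace(bitonic, ascending=True):
    """Merge a bitonic sequence; return (result, snapshot after each round)."""
    data = list(bitonic)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    trace = []
    stride = n // 2                  # first round compares positions n/2 apart
    while stride >= 1:
        for i in range(n):
            partner = i ^ stride
            # Compare-exchange: put the smaller item first when ascending,
            # the larger item first when descending.
            if partner > i and (data[i] > data[partner]) == ascending:
                data[i], data[partner] = data[partner], data[i]
        trace.append(list(data))
        stride //= 2
    return data, trace
```

Applied to (5, 9, 10, 20, 14, 12, 8, 3), the snapshots match the three round outputs given in the text. Passing `ascending=False` corresponds to the descending merge performed by component S_3.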
经过一次合并后,得到两个目标合并数据排序子结果,故任务控制组件40a根据两个目标合并数据排序子结果,再分发一次数据合并任务至数据双调排序组件,直至完成所有数据排序子结果的合并,获得一组数据排序结果。
C个数据双调排序组件均包括索引生成子组件,索引生成子组件用于为待排序数据序列中的每个待排序数据分别分配索引值;数据排序请求携带写入地址;按序获取数据排序结果中的每个待排序数据分别对应的索引值,根据获取到的索引值,生成与待排序数据序列具有相同序列长度的索引值序列,其中,序列长度与待排序数据序列中的待排序数据的数量具有映射关系,序列长度用于表征待排序数据序列中的待排序数据的数量;发送针对索引值序列的数据写入请求至数据读写组件,通过数据读写组件,将索引值序列写入至写入地址。
数据排序结果的返回形式,可将原序列的待排序数据,按照从小到大或从大到小的顺序重新排列后返回。在一些实施例中,可以保持原序列数据不变,返回排序后序列(即数据排序结果)中每个待排序数据在待排序数据序列中的索引值(index),从而构成的与待排序数据序列等长的索引值序列。
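The two return forms (sorted data, or an index value sequence that leaves the original data untouched) can be illustrated with a small sketch. The name `sort_with_indices` is hypothetical, and Python's built-in `sorted` is used in place of the bitonic network; in the described hardware the index would instead travel with its data item through the comparisons.

```python
def sort_with_indices(values, ascending=True):
    """Return (sorted values, index value sequence) without modifying the input.

    The index sequence has the same length as the input; entry k is the
    position in the original sequence of the k-th item of the sorted result.
    """
    order = sorted(range(len(values)), key=values.__getitem__, reverse=not ascending)
    return [values[i] for i in order], order
```

For the subsequence (10, 20, 5, 9) in ascending order, the data result is (5, 9, 10, 20) and the index result is (2, 3, 0, 1), that is, the positions whose 4-bit identifiers in Fig. 4 are 0010, 0011, 0000, and 0001.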
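Through architectural parallelism and flow optimization, the embodiments of the present application propose a computing architecture for hardware-accelerated sorting that performs multiple sorting operations per clock cycle and improves computational efficiency compared with a central processing unit. Architectural parallelism means increasing the parallelism of the sorting units (data bitonic sorting components) by using several of them; in this case, the local memory of the sorting accelerator can provide multiple read/write ports that can be accessed simultaneously. Flow optimization means modifying the execution order of the algorithm so that separate computation steps are merged and, combined with a computation pipeline, the number of memory-read, compare, memory-write rounds is reduced, which yields the acceleration. With flow optimization and architectural parallelism, the sorting speed can be improved, reducing the sorting time T from formula (1),
[Formula (1): image PCTCN2022118483-appb-000003 in the original; not reproduced in this text]
to formula (2),
[Formula (2): image PCTCN2022118483-appb-000004 in the original; not reproduced in this text]
In formulas (1) and (2), T is the execution time, n is the quantity to be sorted, N = ceil(log2(n)), C and P_S are the speedup ratios achieved by architectural parallelism and flow optimization respectively, and ceil() is the round-up function. The embodiments of the present application provide an acceleration method, and an architecture for executing it, that raises C and P_S at the same time, with P_S >= 2 and C >= 2; when P_S = 2 and C = 2, performance can improve nearly 4-fold. In implementations, common numbers of Sort units are 2, 4, or 8.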
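Formulas (1) and (2) are images in the original and are not reconstructed here. As background only, the sketch below counts the work in a standard bitonic sorting network using well-known properties of that network: N stages, N(N+1)/2 rounds, and half as many compare-exchanges per round as there are padded items. The function name `bitonic_cost` is hypothetical, and these counts are not claimed to be the patent's formulas.

```python
def bitonic_cost(count):
    """Return (stages, rounds, compare_exchanges) for a bitonic network.

    count is padded up to the next power of two, as the text does when it
    derives the quantity to be sorted from the first total quantity.
    """
    if count < 1:
        raise ValueError("count must be positive")
    stages = (count - 1).bit_length()           # N = ceil(log2(n))
    padded = 1 << stages                        # quantity to be sorted
    rounds = stages * (stages + 1) // 2         # stage i contributes i rounds
    compare_exchanges = rounds * (padded // 2)  # padded / 2 comparisons per round
    return stages, rounds, compare_exchanges
```

For 4 items this gives 2 stages and 3 rounds, matching the Fig. 4 walk-through (one round in the first stage, two in the second). If every round of a stage can be kept in the local buffer, as in that walk-through, storage is written once per stage rather than once per round, which is the kind of saving the flow optimization targets.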
在本申请实施例中,基于C个数据双调排序组件,计算机设备可以生成针对待排序数据序列的多个数据双调排序任务,通过并行运行C个数据双调排序组件,可以同时执行多个数据双调排序任务,通过同时执行多个数据双调排序任务,可以减少待排序数据序列的排序时间,即减少待排序数据序列的任务执行时间;并且,基于C个数据双调排序组件,计算机设备对B个数据排序子结果进行合并,可以得到针对待排序数据序列的数据排序结果。由此可见,本申请实施例通过并行运行C个数据双调排序组件,不仅可以降低针对待排序数据序列的排序任务的执行耗时,还可以提高排序任务的排序效率。
请参见图6,图6是本申请实施例提供的另一种数据处理方法的流程示意图。该方法可以由业务服务器(例如,上述图1所示的业务服务器100)执行,也可以由终端设备(例如,上述图1所示的终端设备200a)执行,还可以由业务服务器和终端设备交互执行。为便于理解,本申请实施例以该方法由业务服务器执行为例进行说明。如图6所示,该数据处理方法的过程包括如下步骤S201至步骤S203,且步骤S201至步骤S203为图2所对应实施例中步骤S103的另一个实施例。
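Step S201: in data bitonic sorting component F_e, obtain the to-be-sorted data subsequence A_e corresponding to data bitonic sorting task D_e.
Here, the data sorting request carries the read addresses of the data items in the data sequence to be sorted and the sorting order for that sequence. In component F_e, the target read address and the second total quantity carried by task D_e are obtained; the target read address is among the read addresses of the data items in the data sequence to be sorted, and the second total quantity represents the total number of data items in subsequence A_e. A data acquisition request carrying the target read address is sent to the data read/write component, which fetches an initial to-be-sorted data subsequence from the target read address; the initial subsequence belongs to the data sequence to be sorted. The quantity difference between the second total quantity and a third total quantity is determined, where the third total quantity represents the total number of data items in the initial subsequence. Supplementary data to be sorted is obtained according to the quantity difference and the sorting order and is added to the initial subsequence to obtain subsequence A_e, which is then loaded into the storage component corresponding to component F_e.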
请一并参见图7,图7是本申请实施例提供的一种排序加速器的结构示意图。如图7所示,该排序加速器包括任务控制组件、C个数据双调排序组件、C个数据双调排序组件分别对应的存储组件、数据选择组件以及数据读写组件。
这里,数据读写组件用于获取数据搬运请求,根据数据搬运请求实现数据搬运,该数据搬运请求可以为数据获取请求,根据数据获取请求中的目标读取地址,数据读写组件获取初始待排序数据子序列;该数据搬运请求可以为数据写入请求,根据数据写入请求,数据读写组件将索引值序列写入至写入地址。
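The data selection component is configured to allocate the read/write channels of the local memory, as needed, to the data read/write component or to the multiple data bitonic sorting components. Each storage component is used for writing and reading data. Written data may include to-be-sorted data subsequences written from the external storage component through the data read/write component, as well as intermediate sorting sub-results and data sorting sub-results returned by the data bitonic sorting components; read data may include to-be-sorted data subsequences, intermediate sorting sub-results, and data sorting sub-results. Each data bitonic sorting component is configured to bitonically sort a to-be-sorted data subsequence to obtain a data sorting sub-result, and also to merge data sorting sub-results to obtain the data sorting result for the data sequence to be sorted. A data bitonic sorting component may also be called a Sort unit. The task control component is configured to obtain the data sorting request, start B data bitonic sorting tasks according to the request, and send one task to each data bitonic sorting component; to obtain the sort-complete signal returned by each data bitonic sorting component and, based on the B sort-complete signals, start multiple data merging tasks and distribute each data merging task to a data bitonic sorting component associated with the corresponding data sorting sub-results; and to obtain the merge-complete signal and send it to the top-level control component, where the merge-complete signal indicates that all data in the data sequence to be sorted has been sorted. The task control component may also be called the control unit.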
数据读写组件可以是直接存储器访问(DMA,Direct Memory Access),DMA可以将待排序数据从排序(Sort)加速器外的外部存储器(即外部存储组件),加载(load)到排序加速器的内部存储器(即存储组件),以及将排序加速器内部存储器中的指定数据,写入(store)到排序加速器外的存储器(即外部存储组件)。DMA可以以指令队列的方式执行,也可以使用轮询的方式,本申请实施例不做限定。
数据选择组件(MUX,Multiplexer),是排序加速器的内部使用选择器,业务服务器将存储器的读写通道按需分配给DMA或多个Sort单元,本地存储器支持C个读通道和C个写通道,可以由多个物理存储单元拼接构成,来实现多个通道的同时读写,这里不做限定。
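The storage component corresponding to a data bitonic sorting component is the local memory of the sorting accelerator. The data in the data sequence to be sorted is stored either in the local memory of the sorting accelerator or in external memory; if it is stored in external memory, the service server reads it from external memory into local memory through the data read/write component so that it can be bitonically sorted.
It can be understood that if the sorting order is ascending, positive infinity is preferred as the supplementary data and is appended on the high-address side of the data in the initial to-be-sorted data subsequence, that is, positive infinity is placed after the data to be sorted. If the sorting order is descending, negative infinity is preferred as the supplementary data and is likewise appended on the high-address side, that is, negative infinity is placed after the data to be sorted.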
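The padding rule can be sketched as follows. The name `pad_subsequence` is hypothetical, and floating-point infinity is used as the sentinel, which assumes the data items are numbers comparable with floats; a fixed-width hardware implementation would presumably use the largest or smallest representable value instead.

```python
import math


def pad_subsequence(initial, second_total, ascending=True):
    """Append sentinel items so the subsequence reaches second_total items.

    Ascending sorts pad with +infinity and descending sorts with -infinity,
    so the padding always ends up after the real data once sorted.
    """
    deficit = second_total - len(initial)      # the quantity difference
    if deficit < 0:
        raise ValueError("subsequence is longer than the second total quantity")
    filler = math.inf if ascending else -math.inf
    # Appending at the end corresponds to the high-address side.
    return list(initial) + [filler] * deficit
```

Because the sentinels sort to the tail in either direction, the first `len(initial)` items of the sorted output are exactly the real data, and the padding can be dropped by truncation.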
步骤S202,对待排序数据子序列A e进行双调排序,得到待排序数据子序列A e对应的数据排序子结果。
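Here, data bitonic sorting component F_e includes a sorting control subcomponent, a round control subcomponent, a data acquisition subcomponent, and a bitonic sorting subcomponent. When the sorting control subcomponent receives a data-load-complete notification, it starts the round control subcomponent; the notification indicates that the storage component corresponding to F_e has loaded subsequence A_e. In the round control subcomponent, the bitonic sorting stages for A_e and the bitonic sorting logic parameters of each stage are determined from the second total quantity and sent to the data acquisition subcomponent; the second total quantity represents the total number of data items in A_e. In the data acquisition subcomponent, intermediate data to be sorted is obtained according to the stages and their logic parameters and sent to the bitonic sorting subcomponent, which sorts it to obtain an intermediate sorting sub-result; the data sorting sub-result for A_e is determined from the intermediate sorting sub-results. Subsequence A_e includes the intermediate data to be sorted.
In the embodiments of the present application, determining the data sorting sub-result for A_e may proceed as follows, where there are at least two bitonic sorting stages. In the data acquisition subcomponent, for the i-th of the at least two stages, a first data sorting transition sub-result is read from the storage component; i is a positive integer greater than 1, and the first transition sub-result is the output of the (i-1)-th stage. The bitonic sorting logic parameters of the i-th stage include n rounds of sorting round logic parameters, where n is determined from i and is a positive integer greater than 1. According to the j-th round's logic parameters, intermediate data to be sorted is obtained from an intermediate data sorting transition sub-result, where j is a positive integer less than or equal to n. If j is 1, the intermediate transition sub-result is the first transition sub-result; if j is not 1, it is the second data sorting transition sub-result cached in the data acquisition subcomponent, which was determined from the (j-1)-th round's logic parameters. The intermediate data is sent to the bitonic sorting subcomponent, which sorts it to obtain its intermediate sorting sub-result, and a third data sorting transition sub-result for A_e is determined from the intermediate sorting sub-result. If j is less than n, the third transition sub-result is cached in the data acquisition subcomponent; if j = n and the i-th stage is the last of the at least two stages, the third transition sub-result is written to the storage component and determined as the data sorting sub-result for A_e.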
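One way to picture the "sorting round logic parameters" is as a list of index pairs with a direction for each. The sketch below generates such a list for the standard bitonic network with an ascending final order. The function name `round_parameters` and the tuple layout are assumptions made for illustration; the source does not specify how the round control subcomponent encodes these parameters.

```python
def round_parameters(total, stage, rnd):
    """Compare pairs for round `rnd` of stage `stage` (both 1-based).

    Returns a list of (low_index, high_index, ascending) tuples for a
    subsequence of `total` items (a power of two), final order ascending.
    Stage i has i rounds.
    """
    block = 1 << stage               # stage i builds sorted blocks of 2**i items
    stride = block >> rnd            # round j compares positions 2**(i-j) apart
    if rnd < 1 or stride < 1 or block > total:
        raise ValueError("stage or round out of range")
    pairs = []
    for low in range(total):
        high = low ^ stride
        if high > low:
            # Alternate blocks sort in opposite directions, except in the
            # last stage, where the block spans the whole subsequence.
            pairs.append((low, high, (low & block) == 0))
    return pairs
```

For a 4-item subsequence this reproduces the comparisons of Fig. 4: stage 1 compares (0000, 0001) ascending and (0010, 0011) descending, and stage 2 compares (0000, 0010) and (0001, 0011) in its first round, then (0000, 0001) and (0010, 0011) in its second, all ascending.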
请再参见图7,多个数据双调排序组件、任务控制组件以及数据读写组件可以组成一个排序加速器,可以理解的是,每个数据双调排序组件的基本结构是相同的,在图7中,以数据双调排序组件1示例每个数据双调排序组件的基本结构,数据双调排序组件1可以包括排序控制子组件、轮次控制子组件、索引生成子组件、数据获取子组件、双调排序子组件,在双调排序场景中,这些子组件的实现功能可以参见上述步骤S202。
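In the embodiments of the present application, the sorting control subcomponent is configured to obtain the data bitonic sorting task distributed by the task control component and, based on that task, start the round control subcomponent; it is also configured to obtain the completion signal (a sort-complete signal or a merge-complete signal) sent by the round control subcomponent and forward it to the task control component. The sorting control subcomponent thus handles the signal exchange between the data bitonic sorting component and the task control component, and starts the round control subcomponent after receiving an enable signal. The sorting control subcomponent may be called the Sort control module.
The round control subcomponent is configured to determine, from the to-be-sorted data subsequence, the bitonic sorting stages and the bitonic sorting logic parameters of each stage, and to issue the corresponding enable signals to the index generation subcomponent, the data acquisition subcomponent, and the bitonic sorting subcomponent. It may be called the round control module. The index generation subcomponent is used when the data to be sorted is read for the first time: for each group of data it generates a unique index identifier and writes it to the data buffer. It may be called the index generation module. The data acquisition subcomponent implements one traversal pass over the data in the to-be-sorted data subsequence. It may be called the single-round data control and memory-access address calculation module. The bitonic sorting subcomponent is the computing subcomponent of the bitonic sorting process and performs fast data comparison. It may be called the comparison module.
The embodiments of the present application may apply flow optimization: across two or more rounds, operations on the same data are merged, so that a local group of data is read once and written back once instead of being read and written several times. This reduces the overall number of read/write rounds and accelerates sorting. For the implementation, see the description of step S103 in the embodiment corresponding to Fig. 2 above.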
步骤S203,在运行数据双调排序组件F e的同时,通过C-1个数据双调排序组件并行运行B-1个数据双调排序任务,得到C-1个数据双调排序组件分别输出的数据排序子结果;C-1个数据双调排序组件是指:C个数据双调排序组件中除了数据双调排序组件F e之外的数据双调排序组件;B-1个数据双调排序任务是指:B个数据双调排序任务中除了数据双调排序任务D e之外的数据双调排序任务。
本申请实施例通过架构并行以及流程优化可以减少双调排序的执行耗时,结合图3对应的实施例,可知顶层控制单元发出排序指令,排序加速器接受指令后,从外部存储器中读入待排序数据到本地存储器,调用多个数据双调排序组件完成排序,将数据排序结果从本地存储器中写回到外部存储器,向顶层控制单元发出指令完成的响应。
在本申请实施例中,基于C个数据双调排序组件,计算机设备可以生成针对待排序数据序列的多个数据双调排序任务,通过并行运行C个数据双调排序组件,可以同时执行多个数据双调排序任务,通过同时执行多个数据双调排序任务,可以减少待排序数据序列的排序时间,即任务执行时间;后续,基于C个数据双调排序组件,计算机设备对B个数据排序子结果进行合并,可以得到针对待排序数据序列的数据排序结果。由此可见,本申请实施例通过并行运行C个数据双调排序组件,不仅可以降低针对待排序数据序列的排序任务的执行耗时,还可以提高排序任务的排序效率。
请参见图8,图8是本申请实施例提供的再一种数据处理方法的流程示意图。该方法可以由业务服务器(例如,上述图1所示的业务服务器100)执行,也可以由终端设备(例如,上述图1所示的终端设备200a)执行,还可以由业务服务器和终端设备交互执行。为便于理解,本申请实施例以该方法由业务服务器执行为例进行说明。如图8所示,该数据处理方法的过程包括如下步骤S301至步骤S304,且步骤S301至步骤S304为图2所对应实施例中步骤S104的另一个实施例。
步骤S301,确定数据排序子结果对M p所关联的待排序数据的第四总数量、以及数据排序子结果对M p所关联的数据双调排序组件对应的数据存储容量;C个数据双调排序组件包括数据排序子结果对M p所关联的数据双调排序组件。
步骤S302,将第四总数量与数据存储容量进行对比,得到第二对比结果。
步骤S301至步骤S302的实现过程请参见上文图2所对应的实施例中步骤S104的描述,此处不进行赘述。
步骤S303,根据第二对比结果以及数据排序子结果对M p所关联的数据双调排序组件,对数据排序子结果对M p进行双调排序,得到数据排序子结果对M p对应的目标合并数据排序子结果。
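Here, data sorting sub-result pair M_p includes a first data sorting sub-result and a second data sorting sub-result. If the second comparison result is that the fourth total quantity is greater than the data storage capacity, the first data sorting sub-result is divided equally into a first ordered data segment and a second ordered data segment, and the second data sorting sub-result is divided equally into a third ordered data segment and a fourth ordered data segment. If the data in the first segment is less than or equal to the data in the second segment, and the data in the third segment is less than or equal to the data in the fourth segment, then the first and third segments are bitonically sorted by a first data bitonic sorting component to obtain a third data sorting sub-result, and the second and fourth segments are bitonically sorted by a second data bitonic sorting component to obtain a fourth data sorting sub-result; the data bitonic sorting components associated with pair M_p include both the first and the second data bitonic sorting component. The third data sorting sub-result is divided equally into a fifth ordered data segment and a sixth ordered data segment, and the fourth data sorting sub-result is divided equally into a seventh ordered data segment and an eighth ordered data segment. If the data in the fifth segment is less than or equal to the data in the sixth segment, and the data in the seventh segment is less than or equal to the data in the eighth segment, then the sixth and seventh segments are bitonically sorted by a target data bitonic sorting component, which is either the first or the second data bitonic sorting component, to obtain a fifth data sorting sub-result. The target merged data sorting sub-result for pair M_p is obtained from the fifth ordered data segment, the fifth data sorting sub-result, and the eighth ordered data segment.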
请一并参见图9,图9是本申请实施例提供的又一种数据处理的场景示意图。如图9所示,业务服务器将两个有序序列各自均分为两份,得到4个有序序列,即将有序序列S0-1(等同于第一数据排序子结果)均分为第一有序数据片段以及第二有序数据片段,将有序序列S0-2(等同于第二数据排序子结果)均分为第三有序数据片段以及第四有序数据片段;基于第一数据双调排序组件,对第一有序数据片段以及第三有序数据片段进行双调排序,得到第三数据排序子结果(等同于图9中的有序序列S1-1);基于第二数据双调排序组件,对第二有序数据片段以及第四有序数据片段进行双调排序,得到第四数据排序子结果(等同于图9中的有序序列S1-2);将第三数据排序子结果均分为第五有序数据片段以及第六有序数据片段,将第四数据排序子结果均分为第七有序数据片段以及第八有序数据片段,基于目标数据双调排序组件,对第六有序数据片段以及第七有序数据片段进行双调排序,得到第五数据排序子结果(等同于图9中的有序序列S2-2),将第五有序数据片段作为有序序列S2-1,将第八有序数据片段作为有序序列S2-3,将三 个由小到大的有序序列S2-1、有序序列S2-2、有序序列S2-3通过拼接(Concat)操作拼接起来,获得最终输出。
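The three partial sorts of Fig. 9 can be sketched as below. This is an illustrative sketch under stated assumptions: `split_merge` is a hypothetical name, both inputs are ascending lists of the same even length, and Python's `sorted` stands in for the bitonic sort that a single component performs on at most `capacity` items.

```python
def split_merge(first, second, capacity, sort_component=sorted):
    """Merge two ascending lists using only sorts of at most `capacity` items."""
    first, second = list(first), list(second)
    if len(first) != len(second) or len(first) % 2:
        raise ValueError("inputs must have the same even length")
    half = len(first) // 2
    if 2 * half > capacity:
        raise ValueError("each partial sort must fit in one component")
    seg1, seg2 = first[:half], first[half:]      # halves of S0-1
    seg3, seg4 = second[:half], second[half:]    # halves of S0-2
    third = sort_component(seg1 + seg3)          # S1-1: the two low halves
    fourth = sort_component(seg2 + seg4)         # S1-2: the two high halves
    seg5, seg6 = third[:half], third[half:]
    seg7, seg8 = fourth[:half], fourth[half:]
    fifth = sort_component(seg6 + seg7)          # S2-2: the middle items
    # seg5 holds the smallest items overall and seg8 the largest, so a
    # plain concatenation (Concat) yields the final ascending order.
    return seg5 + fifth + seg8
```

The concatenation is valid because the smallest `half` items of the whole input can only come from the low halves of the two inputs, so they are exactly the low half of S1-1; the symmetric argument applies to the largest items and the high half of S1-2.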
步骤S304,将U个数据排序子结果对对应的目标合并数据排序子结果,确定为U个目标合并数据排序子结果。
步骤S304的实现过程请参见上述图2对应的实施例中的步骤S104。
在本申请实施例中,基于C个数据双调排序组件,计算机设备可以生成针对待排序数据序列的多个数据双调排序任务,通过并行运行C个数据双调排序组件,可以同时执行多个数据双调排序任务,通过同时执行多个数据双调排序任务,可以减少待排序数据序列的排序时间,即任务执行时间;后续,基于C个数据双调排序组件,计算机设备对B个数据排序子结果进行合并,可以得到针对待排序数据序列的数据排序结果。由此可见,本申请实施例通过并行运行C个数据双调排序组件,不仅可以降低针对待排序数据序列的排序任务的执行耗时,还可以提高排序任务的排序效率。
请参见图10,图10是本申请实施例提供的一种数据处理装置的结构示意图。该数据处理装置可以是运行于计算机设备中的一个计算机程序(包括程序代码),例如该数据处理装置为一个应用软件;该装置可以配置为执行本申请实施例提供的方法中的相应步骤。如图10所示,该数据处理装置1可以包括:调用组件模块11、开启任务模块12、并行组件模块13以及合并结果模块14。
调用组件模块11,配置为获取针对待排序数据序列的数据排序请求,根据数据排序请求调用C个数据双调排序组件;C为大于1的正整数;开启任务模块12,配置为根据待排序数据序列以及C个数据双调排序组件,开启B个数据双调排序任务;B为大于1的正整数;B个数据双调排序任务分别关联不同的待排序数据子序列,B个待排序数据子序列均是基于待排序数据序列所生成的;并行组件模块13,配置为根据B个数据双调排序任务,并行运行C个数据双调排序组件,得到由C个数据双调排序组件所输出的B个数据排序子结果;一个数据排序子结果用于表征,一个数据双调排序任务所关联的待排序数据子序列的排序结果;合并结果模块14,配置为基于C个数据双调排序组件,对B个数据排序子结果进行合并,得到针对待排序数据序列的数据排序结果。
在一些实施例中,调用组件模块11,还配置为获取数据排序请求,将数据排序请求发送至任务控制组件;数据排序请求携带待排序数据序列中的待排序数据的第一总数量;则开启任务模块12可以包括:第一确定单元121、第一获取单元122以及开启任务单元123。第一确定单元121,配置为在任务控制组件中,确定与第一总数量相关联的待排序数量;待排序数量等于或大于第一总数量;第一获取单元122,配置为获取C个数据双调排序组件中的每一数据双调排序组件对应的数据存储容量,将待排序数量与总数据存储容量进行对比,得到第一对比结果;总数据存储容量等于C个数据存储容量的总和;开启任务单元123,配置为根据第一对比结果开启B个数据双调排序任务。
在一些实施例中,开启任务单元123可以包括:第一开启子单元1231以及第二开启子单元1232。第一开启子单元1231,配置为若第一对比结果为待排序数量大于总数据存储容量,则获取待排序数量以及总数据存储容量之间的第一数量比值;第一开启子单元1231,还配置为根据第一数量比值开启B个数据双调排序任务;C与第一数量比值的乘积等于B;一个数据双调排序任务所携带的第二总数量等于数据存储容量;第二总数量用于表征以下信息:一个数据双调排序任务所关联的待排序数据子序列中的待排序数据的总数量;第二开启子单元1232,配置为若第一对比结果为待排序数量小于或等于总数据存储容量,则根据C个数据双调排序组件开启B个数据双调排序任务;B等于C。
在一些实施例中,并行组件模块13可以包括:第一分发单元131以及第一并行单 元132。第一分发单元131,配置为若B等于C,则将B个数据双调排序任务分发至C个数据双调排序组件;其中,一个数据双调排序任务被发送至一个数据双调排序组件;第一并行单元132,配置为通过C个数据双调排序组件,并行运行B个数据双调排序任务,得到C个数据双调排序组件分别输出的数据排序子结果。
在一些实施例中,B个数据双调排序任务包括数据双调排序任务D e,e为正整数且e小于或等于B;B个待排序数据子序列包括数据双调排序任务D e对应的待排序数据子序列A e;C个数据双调排序组件包括用于执行数据双调排序任务D e的数据双调排序组件F e;第一并行单元132可以包括:第一生成子单元1321以及第二生成子单元1322。第一生成子单元1321,配置为在数据双调排序组件Fe中,获取数据双调排序任务De对应的待排序数据子序列Ae,对待排序数据子序列Ae进行双调排序,得到待排序数据子序列Ae对应的数据排序子结果;第二生成子单元1322,配置为在运行数据双调排序组件F e的同时,通过C-1个数据双调排序组件并行运行B-1个数据双调排序任务,得到C-1个数据双调排序组件分别输出的数据排序子结果;C-1个数据双调排序组件是指:C个数据双调排序组件中除了数据双调排序组件F e之外的数据双调排序组件;B-1个数据双调排序任务是指:B个数据双调排序任务中除了数据双调排序任务D e之外的数据双调排序任务。
在一些实施例中,数据排序请求携带待排序数据序列中的待排序数据分别对应的读取地址,以及针对待排序数据序列的排序顺序;第一生成子单元1321可以包括:第一获取子单元13211、第二获取子单元13212以及第一确定子单元13213。第一获取子单元13211,配置为在数据双调排序组件F e中,获取数据双调排序任务D e所携带的目标读取地址以及第二总数量;待排序数据序列中的待排序数据分别对应的读取地址中包括目标读取地址;第二总数量用于表征待排序数据子序列A e中的待排序数据的总数量;第二获取子单元13212,配置为发送携带有目标读取地址的数据获取请求至数据读写组件,通过数据读写组件,从目标读取地址获取初始待排序数据子序列;初始待排序子序列属于待排序数据序列;第一确定子单元13213,配置为确定第二总数量以及第三总数量之间的数量差值;第三总数量用于表征初始待排序数据子序列中的待排序数据的总数量;第一确定子单元13213,还配置为根据数量差值以及排序顺序获取待排序补充数据,将待排序补充数据添加至初始待排序数据子序列中,得到待排序数据子序列A e;第一确定子单元13213,还配置为将待排序数据子序列A e加载至数据双调排序组件Fe对应的存储组件中。
在一些实施例中,数据双调排序组件F e包括排序控制子组件、轮次控制子组件、数据获取子组件以及双调排序子组件;第一生成子单元1321可以包括:第二确定子单元13214以及第三确定子单元13215。第二确定子单元13214,配置为当排序控制子组件获取到数据加载完成通知时,启动轮次控制子组件;数据加载完成通知用于表征数据双调排序组件F e对应的存储组件已加载待排序数据子序列A e;第二确定子单元13214,还配置为在轮次控制子组件中,根据第二总数量,确定待排序数据子序列A e对应的双调排序阶段以及双调排序阶段对应的双调排序逻辑参数;将双调排序阶段以及双调排序阶段对应的双调排序逻辑参数,发送至数据获取子组件;第二总数量用于表征待排序数据子序列A e中的待排序数据的总数量;第三确定子单元13215,配置为在数据获取子组件中,根据双调排序阶段以及双调排序阶段对应的双调排序逻辑参数,获取中间待排序数据;将中间待排序数据发送至双调排序子组件,通过双调排序子组件,对中间待排序数据进行排序,得到中间排序子结果;根据中间排序子结果确定待排序数据子序列A e对应的数据排序子结果;待排序数据子序列A e包括中间待排序数据。
在一些实施例中,双调排序阶段的阶段数为至少两个;第三确定子单元13215,配 置为在数据获取子组件中,根据至少两个双调排序阶段中的第i个双调排序阶段,从存储组件中读取第一数据排序过渡子结果;i是大于1的正整数,且第一数据排序过渡子结果为第i-1个双调排序阶段中的输出结果;第i个双调排序阶段对应的双调排序逻辑参数包括n轮排序轮次逻辑参数;n是基于i所确定的,且n为大于1的正整数;第三确定子单元13215,还配置为根据第j轮排序轮次逻辑参数,从中间数据排序过渡子结果中获取中间待排序数据;j为正整数且j小于或等于n;若j为1,则中间数据排序过渡子结果为第一数据排序过渡子结果;若j不为1,则中间数据排序过渡子结果为缓存在数据获取子组件中的第二数据排序过渡子结果,第二数据排序过渡子结果是基于第j-1轮排序轮次逻辑参数所确定的;第三确定子单元13215,还配置为将中间待排序数据发送至双调排序子组件,通过双调排序子组件,对中间待排序数据进行排序,得到中间待排序数据的中间排序子结果,根据中间排序子结果确定待排序数据子序列A e对应的第三数据排序过渡子结果;第三确定子单元13215,还配置为若j小于n,则将第三数据排序过渡子结果缓存在数据获取子组件中;第三确定子单元13215,还配置为若j=n,且第i个双调排序阶段为至少两个双调排序阶段的最后一个双调排序阶段,则将第三数据排序过渡子结果写入至存储组件,将第三数据排序过渡子结果确定为待排序数据子序列A e对应的数据排序子结果。
在一些实施例中,并行组件模块13可以包括:拆分任务单元133、第二分发单元134以及第二并行单元135。拆分任务单元133,配置为若B大于C,则将B个数据双调排序任务拆分为K*C个数据双调排序任务;K为C个数据双调排序组件分别对应的最大运行轮次;第二分发单元134,配置为将第i轮运行轮次中的C个数据双调排序任务分发至C个数据双调排序组件;其中,一个数据双调排序任务被发送至一个数据双调排序组件;i为小于或等于K的正整数;第二并行单元135,配置为通过C个数据双调排序组件,并行运行第i轮运行轮次中的C个数据双调排序任务,得到C个数据双调排序组件分别输出的数据排序子结果,直至i等于K时得到与B个数据双调排序任务分别对应的数据排序子结果。
在一些实施例中,合并结果模块14可以包括:第二获取单元141、第一合并单元142、第二确定单元143以及第二合并单元144。第二获取单元141,配置为在第L次合并中获取U个数据排序子结果对;U为正整数;L为正整数;若L为1,则U个数据排序子结果对是基于B个数据排序子结果所生成的;若L不为1,则U个数据排序子结果对是基于历史合并数据排序子结果所生成的,历史合并数据排序子结果是在第L-1次合并中得到的子结果;第一合并单元142,配置为基于C个数据双调排序组件,对U个数据排序子结果对进行合并,得到U个目标合并数据排序子结果;第二确定单元143,配置为若U等于1,则将U个目标合并数据排序子结果,确定为待排序数据序列对应的数据排序结果;第二合并单元144,配置为若U大于1,则在第L+1次合并中,基于C个数据双调排序组件,对U个目标合并数据排序子结果进行合并,得到针对待排序数据序列的数据排序结果。
在一些实施例中,U个数据排序子结果对包括数据排序子结果对M p,p为正整数,且p小于或等于U;第一合并单元142可以包括:第一数量子单元1421、第二数量子单元1422、第三生成子单元1423以及第四生成子单元1424。第一数量子单元1421,配置为确定数据排序子结果对M p所关联的待排序数据的第四总数量、以及数据排序子结果对M p所关联的数据双调排序组件对应的数据存储容量;C个数据双调排序组件包括数据排序子结果对M p所关联的数据双调排序组件;第二数量子单元1422,配置为将第四总数量与数据存储容量进行对比,得到第二对比结果;第三生成子单元1423,配置为根据第二对比结果以及数据排序子结果对M p所关联的数据双调排序组件,对数据排序子 结果对M p进行双调排序,得到数据排序子结果对M p对应的目标合并数据排序子结果;第四生成子单元1424,配置为将U个数据排序子结果对对应的目标合并数据排序子结果,确定为U个目标合并数据排序子结果。
在一些实施例中,数据排序子结果对M p包括第一数据排序子结果以及第二数据排序子结果;第三生成子单元1423可以包括:第一均分子单元14231、第一排序子单元14232、第二排序子单元14233、第二均分子单元14234、第三排序子单元14235以及第四排序子单元14236。第一均分子单元14231,配置为若第二对比结果为第四总数量大于数据存储容量,则将第一数据排序子结果均分为第一有序数据片段以及第二有序数据片段,将第二数据排序子结果均分为第三有序数据片段以及第四有序数据片段;第一排序子单元14232,配置为若第一有序数据片段中的数据小于或等于第二有序数据片段中的数据,且第三有序数据片段中的数据小于或等于第四有序数据片段中的数据,则基于第一数据双调排序组件,对第一有序数据片段以及第三有序数据片段进行双调排序,得到第三数据排序子结果;数据排序子结果对Mp所关联的数据双调排序组件包括第一数据双调排序组件;第二排序子单元14233,配置为基于第二数据双调排序组件,对第二有序数据片段以及第四有序数据片段进行双调排序,得到第四数据排序子结果;数据排序子结果对M p所关联的数据双调排序组件包括第二数据双调排序组件;第二均分子单元14234,配置为将第三数据排序子结果均分为第五有序数据片段以及第六有序数据片段,将第四数据排序子结果均分为第七有序数据片段以及第八有序数据片段;第三排序子单元14235,配置为若第五有序数据片段中的数据小于或等于第六有序数据片段中的数据,第七有序数据片段中的数据小于或等于第八有序数据片段中的数据,则基于目标数据双调排序组件,对第六有序数据片段以及第七有序数据片段进行双调排序,得到第五数据排序子结果;目标数据双调排序组件为第一数据双调排序组件或第二数据双调排序组件;第四排序子单元14236,配置为基于第五有序数据片段、第五数据排序子结果以及第八有序数据片段,确定数据排序子结果对M p对应的目标合并数据排序子结果。
在一些实施例中,C个数据双调排序组件均包括索引生成子组件,索引生成子组件用于为待排序数据序列中的每个待排序数据分别分配索引值;数据排序请求携带写入地址;数据处理装置1还可以包括:生成序列模块15以及写入序列模块16。生成序列模块15,配置为按序获取数据排序结果中的每个待排序数据分别对应的索引值,根据获取到的索引值,生成与待排序数据序列具有相同序列长度的索引值序列;写入序列模块16,配置为发送针对索引值序列的数据写入请求至数据读写组件,通过数据读写组件,将索引值序列写入至写入地址。
在本申请实施例中,基于C个数据双调排序组件,计算机设备可以生成针对待排序数据序列的多个数据双调排序任务,通过并行运行C个数据双调排序组件,可以同时执行多个数据双调排序任务,通过同时执行多个数据双调排序任务,可以减少待排序数据序列的排序时间,即减少待排序数据序列的任务执行时间;并且,基于C个数据双调排序组件,计算机设备对B个数据排序子结果进行合并,可以得到针对待排序数据序列的数据排序结果。由此可见,本申请实施例通过并行运行C个数据双调排序组件,不仅可以降低针对待排序数据序列的排序任务的执行耗时,还可以提高排序任务的排序效率。
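Fig. 11 is a schematic structural diagram of a computer device provided in the embodiments of the present application. As shown in Fig. 11, the computer device 1000 may include at least one processor 1001 (for example, a CPU), at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002. The communication bus 1002 implements connection and communication among these components. The user interface 1003 may include a display and a keyboard, and the network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface). The memory 1005 may be a high-speed RAM or a non-volatile memory, for example at least one disk memory. Optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in Fig. 11, the memory 1005, as a computer-readable storage medium, may include an operating system, a network communication module, a user interface module, and a device control application.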
在图11所示的计算机设备1000中,网络接口1004可提供网络通讯功能;而用户接口1003主要用于为用户提供输入的接口;而处理器1001可以用于调用存储器1005中存储的设备控制应用程序,以实现:获取针对待排序数据序列的数据排序请求,根据数据排序请求调用C个数据双调排序组件;C为大于1的正整数;根据待排序数据序列以及C个数据双调排序组件,开启B个数据双调排序任务;B为大于1的正整数;B个数据双调排序任务分别关联不同的待排序数据子序列,B个待排序数据子序列均是基于待排序数据序列所生成的;根据B个数据双调排序任务,并行运行C个数据双调排序组件,得到由C个数据双调排序组件所输出的B个数据排序子结果;一个数据排序子结果用于表征,一个数据双调排序任务所关联的待排序数据子序列的排序结果;基于C个数据双调排序组件,对B个数据排序子结果进行合并,得到针对待排序数据序列的数据排序结果。
应当理解,本申请实施例中所描述的计算机设备1000可执行前文图2、图6、以及图8所对应实施例中对数据处理方法的描述,也可执行前文图10所对应实施例中对数据处理装置1的描述,在此不再赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。
本申请实施例还提供一种计算机可读存储介质,该计算机可读存储介质存储有计算机程序,该计算机程序包括程序指令,该程序指令被处理器执行时实现图2、图6、以及图8中各个步骤所提供的数据处理方法,具体可参见上述图2、图6、以及图8各个步骤所提供的实现方式,在此不再赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。
上述计算机可读存储介质可以是前述任一实施例提供的数据处理装置或者上述计算机设备的内部存储单元,例如计算机设备的硬盘或内存。该计算机可读存储介质也可以是该计算机设备的外部存储设备,例如该计算机设备上配备的插接式硬盘,智能存储卡(SMC,Smart Media Card),安全数字(SD,Secure Digital)卡,闪存卡(flash card)等。进一步地,该计算机可读存储介质还可以既包括该计算机设备的内部存储单元也包括外部存储设备。该计算机可读存储介质用于存储该计算机程序以及该计算机设备所需的其他程序和数据。该计算机可读存储介质还可以用于暂时地存储已经输出或者将要输出的数据。
本申请实施例还提供了一种计算机程序产品,该计算机程序产品包括计算机程序或计算机指令,该计算机程序或计算机指令存储在计算机可读存储介质中。计算机设备的处理器从计算机可读存储介质读取该计算机程序或计算机指令,处理器执行该计算机程序或计算机指令,使得该计算机设备可执行前文图2、图6、以及图8所对应实施例中对数据处理方法的描述,在此不再赘述。另外,对采用相同方法的有益效果描述,也不再进行赘述。
本申请实施例的说明书和权利要求书及附图中的术语“第一”、“第二”等是用于区别不同对象,而非用于描述特定顺序。此外,术语“包括”以及它们任何变形,意图在于覆盖不排他的包含。例如包含了一系列步骤或单元的过程、方法、装置、产品或设备没有限定于已列出的步骤或模块,而是可选地还包括没有列出的步骤或模块,或可选地还包括对于这些过程、方法、装置、产品或设备固有的其他步骤单元。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、计算机软件或者二者的结合来实现,为了清楚地说明硬 件和软件的可互换性,在上述说明中已经按照功能一般性地描述了各示例的组成及步骤。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。
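The method and related apparatus provided in the embodiments of this application are described with reference to at least one of the method flowcharts and the schematic structural diagrams provided in the embodiments of this application. Specifically, computer program instructions may implement at least one of the following in the method flowcharts, the schematic structural diagrams, or both: each flow and block, and a combination of flows and blocks in the flowcharts and block diagrams. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and in one or more blocks of the schematic structural diagrams. These computer program instructions may also be stored in a computer-readable storage medium that can direct a computer or another programmable data processing device to operate in a particular manner, so that the computer program stored in the computer-readable storage medium produces an article of manufacture that includes an instruction apparatus, the instruction apparatus implementing the functions specified in one or more flows of the flowcharts and in one or more blocks of the schematic structural diagrams. These computer programs and instructions may also be loaded onto a computer or another programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, and the instructions executed on the computer or other programmable device thereby provide steps for implementing the functions specified in one or more flows of the flowcharts and in one or more blocks of the schematic structural diagrams.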
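What is disclosed above is merely preferred embodiments of this application and is not intended to limit the scope of the claims of this application. Therefore, equivalent variations made in accordance with the claims of this application still fall within the scope of this application.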

Claims (17)

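  1. A data processing method, performed by a computer device, comprising:
    obtaining a data sorting request for a to-be-sorted data sequence, and invoking C data bitonic sorting components according to the data sorting request, C being a positive integer greater than 1;
    starting B data bitonic sorting tasks according to the to-be-sorted data sequence and the C data bitonic sorting components, B being a positive integer greater than 1, the B data bitonic sorting tasks being respectively associated with different to-be-sorted data subsequences, and the B to-be-sorted data subsequences all being generated based on the to-be-sorted data sequence;
    running the C data bitonic sorting components in parallel according to the B data bitonic sorting tasks, to obtain B data sorting sub-results outputted by the C data bitonic sorting components, one data sorting sub-result representing the sorting result of the to-be-sorted data subsequence associated with one data bitonic sorting task; and
    merging the B data sorting sub-results based on the C data bitonic sorting components, to obtain a data sorting result for the to-be-sorted data sequence.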
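  2. The method according to claim 1, wherein after the obtaining a data sorting request for a to-be-sorted data sequence, the method further comprises:
    sending the data sorting request to a task control component, the data sorting request carrying a first total quantity of to-be-sorted data in the to-be-sorted data sequence; and
    the starting B data bitonic sorting tasks according to the to-be-sorted data sequence and the C data bitonic sorting components comprises:
    determining, in the task control component, a to-be-sorted quantity associated with the first total quantity, the to-be-sorted quantity being equal to or greater than the first total quantity;
    obtaining a data storage capacity corresponding to each of the C data bitonic sorting components, and comparing the to-be-sorted quantity with a total data storage capacity to obtain a first comparison result, the total data storage capacity being equal to the sum of the C data storage capacities; and
    starting the B data bitonic sorting tasks according to the first comparison result.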
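  3. The method according to claim 2, wherein the starting the B data bitonic sorting tasks according to the first comparison result comprises:
    when the first comparison result is that the to-be-sorted quantity is greater than the total data storage capacity, obtaining a first quantity ratio between the to-be-sorted quantity and the total data storage capacity;
    starting the B data bitonic sorting tasks according to the first quantity ratio, the product of C and the first quantity ratio being equal to B, a second total quantity carried by one data bitonic sorting task being equal to the data storage capacity, and the second total quantity representing the total quantity of to-be-sorted data in the to-be-sorted data subsequence associated with the one data bitonic sorting task; and
    when the first comparison result is that the to-be-sorted quantity is less than or equal to the total data storage capacity, starting the B data bitonic sorting tasks according to the C data bitonic sorting components, B being equal to C.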
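  4. The method according to claim 1, wherein the running the C data bitonic sorting components in parallel according to the B data bitonic sorting tasks, to obtain B data sorting sub-results outputted by the C data bitonic sorting components comprises:
    when B is equal to C, distributing the B data bitonic sorting tasks to the C data bitonic sorting components, one data bitonic sorting task being sent to one data bitonic sorting component; and
    running the B data bitonic sorting tasks in parallel through the C data bitonic sorting components, to obtain the data sorting sub-results respectively outputted by the C data bitonic sorting components.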
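  5. The method according to claim 4, wherein the B data bitonic sorting tasks comprise a data bitonic sorting task D_e, e being a positive integer less than or equal to B; the B to-be-sorted data subsequences comprise a to-be-sorted data subsequence A_e corresponding to the data bitonic sorting task D_e; and the C data bitonic sorting components comprise a data bitonic sorting component F_e configured to execute the data bitonic sorting task D_e;
    the running the B data bitonic sorting tasks in parallel through the C data bitonic sorting components, to obtain the data sorting sub-results respectively outputted by the C data bitonic sorting components comprises:
    obtaining, in the data bitonic sorting component F_e, the to-be-sorted data subsequence A_e corresponding to the data bitonic sorting task D_e, and performing bitonic sorting on the to-be-sorted data subsequence A_e to obtain a data sorting sub-result corresponding to the to-be-sorted data subsequence A_e; and
    while running the data bitonic sorting component F_e, running B-1 data bitonic sorting tasks in parallel through C-1 data bitonic sorting components, to obtain the data sorting sub-results respectively outputted by the C-1 data bitonic sorting components, the C-1 data bitonic sorting components being the data bitonic sorting components other than the data bitonic sorting component F_e among the C data bitonic sorting components, and the B-1 data bitonic sorting tasks being the data bitonic sorting tasks other than the data bitonic sorting task D_e among the B data bitonic sorting tasks.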
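  6. The method according to claim 5, wherein the data sorting request carries read addresses respectively corresponding to the to-be-sorted data in the to-be-sorted data sequence, and a sorting order for the to-be-sorted data sequence;
    the obtaining, in the data bitonic sorting component F_e, the to-be-sorted data subsequence A_e corresponding to the data bitonic sorting task D_e comprises:
    obtaining, in the data bitonic sorting component F_e, a target read address and a second total quantity carried by the data bitonic sorting task D_e, the read addresses respectively corresponding to the to-be-sorted data in the to-be-sorted data sequence comprising the target read address, and the second total quantity representing the total quantity of to-be-sorted data in the to-be-sorted data subsequence A_e;
    sending a data obtaining request carrying the target read address to a data read/write component, and obtaining an initial to-be-sorted data subsequence from the target read address through the data read/write component, the initial to-be-sorted data subsequence belonging to the to-be-sorted data sequence;
    determining a quantity difference between the second total quantity and a third total quantity, the third total quantity representing the total quantity of to-be-sorted data in the initial to-be-sorted data subsequence;
    obtaining to-be-sorted supplementary data according to the quantity difference and the sorting order, and adding the to-be-sorted supplementary data to the initial to-be-sorted data subsequence to obtain the to-be-sorted data subsequence A_e; and
    loading the to-be-sorted data subsequence A_e into a storage component corresponding to the data bitonic sorting component F_e.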
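  7. The method according to claim 5, wherein the data bitonic sorting component F_e comprises a sorting control subcomponent, a round control subcomponent, a data obtaining subcomponent, and a bitonic sorting subcomponent;
    the performing bitonic sorting on the to-be-sorted data subsequence A_e to obtain a data sorting sub-result corresponding to the to-be-sorted data subsequence A_e comprises:
    starting the round control subcomponent when the sorting control subcomponent obtains a data loading completion notification, the data loading completion notification indicating that the storage component corresponding to the data bitonic sorting component F_e has loaded the to-be-sorted data subsequence A_e;
    determining, in the round control subcomponent and according to a second total quantity, a bitonic sorting stage corresponding to the to-be-sorted data subsequence A_e and a bitonic sorting logic parameter corresponding to the bitonic sorting stage; and sending the bitonic sorting stage and the bitonic sorting logic parameter corresponding to the bitonic sorting stage to the data obtaining subcomponent, the second total quantity representing the total quantity of to-be-sorted data in the to-be-sorted data subsequence A_e; and
    obtaining, in the data obtaining subcomponent, intermediate to-be-sorted data according to the bitonic sorting stage and the bitonic sorting logic parameter corresponding to the bitonic sorting stage; sending the intermediate to-be-sorted data to the bitonic sorting subcomponent, and sorting the intermediate to-be-sorted data through the bitonic sorting subcomponent to obtain an intermediate sorting sub-result; and determining, according to the intermediate sorting sub-result, the data sorting sub-result corresponding to the to-be-sorted data subsequence A_e, the to-be-sorted data subsequence A_e comprising the intermediate to-be-sorted data.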
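  8. The method according to claim 7, wherein there are at least two bitonic sorting stages;
    the obtaining, in the data obtaining subcomponent, intermediate to-be-sorted data according to the bitonic sorting stage and the bitonic sorting logic parameter corresponding to the bitonic sorting stage; sending the intermediate to-be-sorted data to the bitonic sorting subcomponent, and sorting the intermediate to-be-sorted data through the bitonic sorting subcomponent to obtain an intermediate sorting sub-result; and determining, according to the intermediate sorting sub-result, the data sorting sub-result corresponding to the to-be-sorted data subsequence A_e comprises:
    reading, in the data obtaining subcomponent, a first data sorting transition sub-result from the storage component according to an i-th bitonic sorting stage of the at least two bitonic sorting stages, i being a positive integer greater than 1, the first data sorting transition sub-result being the output result of the (i-1)-th bitonic sorting stage, the bitonic sorting logic parameter corresponding to the i-th bitonic sorting stage comprising n rounds of sorting round logic parameters, and n being determined based on i and being a positive integer greater than 1;
    obtaining intermediate to-be-sorted data from an intermediate data sorting transition sub-result according to a j-th round sorting round logic parameter, j being a positive integer less than or equal to n, wherein when j is 1, the intermediate data sorting transition sub-result is the first data sorting transition sub-result, and when j is not 1, the intermediate data sorting transition sub-result is a second data sorting transition sub-result cached in the data obtaining subcomponent, the second data sorting transition sub-result being determined based on a (j-1)-th round sorting round logic parameter;
    sending the intermediate to-be-sorted data to the bitonic sorting subcomponent, sorting the intermediate to-be-sorted data through the bitonic sorting subcomponent to obtain an intermediate sorting sub-result of the intermediate to-be-sorted data, and determining, according to the intermediate sorting sub-result, a third data sorting transition sub-result corresponding to the to-be-sorted data subsequence A_e;
    when j is less than n, caching the third data sorting transition sub-result in the data obtaining subcomponent; and
    when j is equal to n and the i-th bitonic sorting stage is the last of the at least two bitonic sorting stages, writing the third data sorting transition sub-result to the storage component, and determining the third data sorting transition sub-result as the data sorting sub-result corresponding to the to-be-sorted data subsequence A_e.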
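  9. The method according to claim 1, wherein the running the C data bitonic sorting components in parallel according to the B data bitonic sorting tasks, to obtain B data sorting sub-results outputted by the C data bitonic sorting components comprises:
    when B is greater than C, splitting the B data bitonic sorting tasks into K*C data bitonic sorting tasks, K being the maximum number of running rounds corresponding to each of the C data bitonic sorting components;
    distributing the C data bitonic sorting tasks of an i-th running round to the C data bitonic sorting components, one data bitonic sorting task being sent to one data bitonic sorting component, and i being a positive integer less than or equal to K; and
    running the C data bitonic sorting tasks of the i-th running round in parallel through the C data bitonic sorting components, to obtain the data sorting sub-results respectively outputted by the C data bitonic sorting components, until the data sorting sub-results respectively corresponding to the B data bitonic sorting tasks are obtained when i is equal to K.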
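  10. The method according to claim 1, wherein the merging the B data sorting sub-results based on the C data bitonic sorting components, to obtain a data sorting result for the to-be-sorted data sequence comprises:
    obtaining U data sorting sub-result pairs in an L-th merge, U being a positive integer and L being a positive integer, wherein when L is 1, the U data sorting sub-result pairs are generated based on the B data sorting sub-results, and when L is not 1, the U data sorting sub-result pairs are generated based on historical merged data sorting sub-results, the historical merged data sorting sub-results being the sub-results obtained in the (L-1)-th merge;
    merging the U data sorting sub-result pairs based on the C data bitonic sorting components, to obtain U target merged data sorting sub-results;
    when U is equal to 1, determining the U target merged data sorting sub-results as the data sorting result corresponding to the to-be-sorted data sequence; and
    when U is greater than 1, merging, in an (L+1)-th merge, the U target merged data sorting sub-results based on the C data bitonic sorting components, to obtain the data sorting result for the to-be-sorted data sequence.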
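  11. The method according to claim 10, wherein the U data sorting sub-result pairs comprise a data sorting sub-result pair M_p, p being a positive integer less than or equal to U;
    the merging the U data sorting sub-result pairs based on the C data bitonic sorting components, to obtain U target merged data sorting sub-results comprises:
    determining a fourth total quantity of to-be-sorted data associated with the data sorting sub-result pair M_p, and a data storage capacity corresponding to the data bitonic sorting components associated with the data sorting sub-result pair M_p, the C data bitonic sorting components comprising the data bitonic sorting components associated with the data sorting sub-result pair M_p;
    comparing the fourth total quantity with the data storage capacity to obtain a second comparison result;
    performing bitonic sorting on the data sorting sub-result pair M_p according to the second comparison result and the data bitonic sorting components associated with the data sorting sub-result pair M_p, to obtain a target merged data sorting sub-result corresponding to the data sorting sub-result pair M_p; and
    determining the target merged data sorting sub-results corresponding to the U data sorting sub-result pairs as the U target merged data sorting sub-results.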
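  12. The method according to claim 11, wherein the data sorting sub-result pair M_p comprises a first data sorting sub-result and a second data sorting sub-result;
    the performing bitonic sorting on the data sorting sub-result pair M_p according to the second comparison result and the data bitonic sorting components associated with the data sorting sub-result pair M_p, to obtain a target merged data sorting sub-result corresponding to the data sorting sub-result pair M_p comprises:
    when the second comparison result is that the fourth total quantity is greater than the data storage capacity, dividing the first data sorting sub-result equally into a first ordered data segment and a second ordered data segment, and dividing the second data sorting sub-result equally into a third ordered data segment and a fourth ordered data segment;
    when the data in the first ordered data segment is less than or equal to the data in the second ordered data segment, and the data in the third ordered data segment is less than or equal to the data in the fourth ordered data segment, performing bitonic sorting on the first ordered data segment and the third ordered data segment based on a first data bitonic sorting component, to obtain a third data sorting sub-result, the data bitonic sorting components associated with the data sorting sub-result pair M_p comprising the first data bitonic sorting component;
    performing bitonic sorting on the second ordered data segment and the fourth ordered data segment based on a second data bitonic sorting component, to obtain a fourth data sorting sub-result, the data bitonic sorting components associated with the data sorting sub-result pair M_p comprising the second data bitonic sorting component;
    dividing the third data sorting sub-result equally into a fifth ordered data segment and a sixth ordered data segment, and dividing the fourth data sorting sub-result equally into a seventh ordered data segment and an eighth ordered data segment;
    when the data in the fifth ordered data segment is less than or equal to the data in the sixth ordered data segment, and the data in the seventh ordered data segment is less than or equal to the data in the eighth ordered data segment, performing bitonic sorting on the sixth ordered data segment and the seventh ordered data segment based on a target data bitonic sorting component, to obtain a fifth data sorting sub-result, the target data bitonic sorting component being the first data bitonic sorting component or the second data bitonic sorting component; and
    determining, based on the fifth ordered data segment, the fifth data sorting sub-result, and the eighth ordered data segment, the target merged data sorting sub-result corresponding to the data sorting sub-result pair M_p.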
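  13. The method according to any one of claims 1 to 12, wherein each of the C data bitonic sorting components comprises an index generation subcomponent, the index generation subcomponent being configured to assign an index value to each piece of to-be-sorted data in the to-be-sorted data sequence; and the data sorting request carries a write address;
    the method further comprises:
    obtaining, in order, the index value corresponding to each piece of to-be-sorted data in the data sorting result, and generating, according to the obtained index values, an index value sequence having the same sequence length as the to-be-sorted data sequence; and
    sending a data write request for the index value sequence to a data read/write component, and writing the index value sequence to the write address through the data read/write component.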
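  14. A data processing apparatus, comprising:
    a component invoking module, configured to obtain a data sorting request for a to-be-sorted data sequence, and invoke C data bitonic sorting components according to the data sorting request, C being a positive integer greater than 1;
    a task starting module, configured to start B data bitonic sorting tasks according to the to-be-sorted data sequence and the C data bitonic sorting components, B being a positive integer greater than 1, the B data bitonic sorting tasks being respectively associated with different to-be-sorted data subsequences, and the B to-be-sorted data subsequences all being generated based on the to-be-sorted data sequence;
    a component parallel-running module, configured to run the C data bitonic sorting components in parallel according to the B data bitonic sorting tasks, to obtain B data sorting sub-results outputted by the C data bitonic sorting components, one data sorting sub-result representing the sorting result of the to-be-sorted data subsequence associated with one data bitonic sorting task; and
    a result merging module, configured to merge the B data sorting sub-results based on the C data bitonic sorting components, to obtain a data sorting result for the to-be-sorted data sequence.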
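  15. A computer device, comprising a processor, a memory, and a network interface, the processor being connected to the memory and the network interface, wherein the network interface is configured to provide a data communication function, the memory is configured to store a computer program, and the processor is configured to invoke the computer program, so that the computer device performs the data processing method according to any one of claims 1 to 13.
  16. A computer-readable storage medium, storing a computer program, the computer program being adapted to be loaded and executed by a processor, so that a computer device having the processor performs the data processing method according to any one of claims 1 to 13.
  17. A computer program product, comprising a computer program stored in a computer-readable storage medium, the computer program being adapted to be read and executed by a processor, so that a computer device having the processor performs the data processing method according to any one of claims 1 to 13.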
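The merge recited in claim 12, for a pair of sorted sub-results that together exceed one component's capacity, can be checked with a small software model. The Python sketch below is only an illustration under assumptions: the names `bitonic_merge`, `merge_sorted`, and `merge_over_capacity` are hypothetical, both inputs are assumed to be sorted in ascending order and to have the same power-of-two length, and plain function calls stand in for the first and second data bitonic sorting components.

```python
def bitonic_merge(seq):
    """Sort (ascending) a bitonic sequence whose length is a power of two."""
    n = len(seq)
    if n <= 1:
        return list(seq)
    half = n // 2
    lo, hi = list(seq[:half]), list(seq[half:])
    for k in range(half):
        if lo[k] > hi[k]:
            lo[k], hi[k] = hi[k], lo[k]
    return bitonic_merge(lo) + bitonic_merge(hi)


def merge_sorted(a, b):
    """Merge two ascending segments of equal length: an ascending run
    followed by a descending run is a bitonic sequence."""
    return bitonic_merge(list(a) + list(b)[::-1])


def merge_over_capacity(x, y):
    """Merge sorted sub-results x and y using only half-size merges."""
    m = len(x) // 2
    x1, x2 = x[:m], x[m:]    # first and second ordered segments
    y1, y2 = y[:m], y[m:]    # third and fourth ordered segments
    r3 = merge_sorted(x1, y1)  # third sub-result (first component)
    r4 = merge_sorted(x2, y2)  # fourth sub-result (second component)
    r5, r6 = r3[:m], r3[m:]  # fifth and sixth ordered segments
    r7, r8 = r4[:m], r4[m:]  # seventh and eighth ordered segments
    s5 = merge_sorted(r6, r7)  # fifth sub-result (either component)
    return r5 + s5 + r8
```

The reason this works: the m smallest items overall all lie in the lower halves of x and y, so they end up in the fifth segment; symmetrically, the m largest items end up in the eighth segment; only the sixth and seventh segments still need to be merged with each other.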
PCT/CN2022/118483 2021-10-25 2022-09-13 Data processing method and apparatus, computer device, computer-readable storage medium, and computer program product WO2023071566A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP22885453.5A EP4328748A1 (en) 2021-10-25 2022-09-13 Data processing method and apparatus, computer device, computer-readable storage medium, and computer program product
US18/335,491 US20230325149A1 (en) 2021-10-25 2023-06-15 Data processing method and apparatus, computer device, and computer-readable storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111242449.5 2021-10-25
CN202111242449.5A CN114356512A (zh) 2021-10-25 2021-10-25 Data processing method, device, and computer-readable storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/335,491 Continuation US20230325149A1 (en) 2021-10-25 2023-06-15 Data processing method and apparatus, computer device, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2023071566A1 (zh) 2023-05-04

Family

ID=81095709

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/118483 WO2023071566A1 (zh) 2021-10-25 2022-09-13 Data processing method and apparatus, computer device, computer-readable storage medium, and computer program product

Country Status (4)

Country Link
US (1) US20230325149A1 (zh)
EP (1) EP4328748A1 (zh)
CN (1) CN114356512A (zh)
WO (1) WO2023071566A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114356512A (zh) * 2021-10-25 2022-04-15 腾讯科技(深圳)有限公司 Data processing method, device, and computer-readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110295862A1 (en) * 2010-05-28 2011-12-01 Taylor Derek A Early return of partial sort results in a database system
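CN111913955A (zh) * 2020-06-22 2020-11-10 中科驭数(北京)科技有限公司 Data sorting processing apparatus, method, and storage medium
CN112015366A (zh) * 2020-07-06 2020-12-01 中科驭数(北京)科技有限公司 Data sorting method, data sorting apparatus, and database system
CN114356512A (zh) * 2021-10-25 2022-04-15 腾讯科技(深圳)有限公司 Data processing method, device, and computer-readable storage medium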

Also Published As

Publication number Publication date
CN114356512A (zh) 2022-04-15
US20230325149A1 (en) 2023-10-12
EP4328748A1 (en) 2024-02-28

Similar Documents

Publication Publication Date Title
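WO2022037337A1 (zh) Distributed training method and apparatus for machine learning model, and computer device
CN110520853B (zh) Queue management for direct memory access
CN107844837B (zh) Method and system for tuning algorithm parameters of a machine learning algorithm
CN110321958B (zh) Neural network model training method and video similarity determination method
US20190279088A1 (en) Training method, apparatus, chip, and system for neural network model
WO2017166449A1 (zh) Method and apparatus for generating machine learning model
KR102501327B1 (ko) Method and apparatus for parallel processing of information
CN113469355B (zh) Multi-model training pipeline in a distributed system
CN108933695B (zh) Method and apparatus for processing information
CN105700956A (zh) Method and system for processing distributed jobs
CN111985831A (zh) Cloud computing resource scheduling method and apparatus, computer device, and storage medium
WO2021258512A1 (zh) Data aggregation processing apparatus, method, and storage medium
US11023825B2 (en) Platform as a service cloud server and machine learning data processing method thereof
EP3848815A1 (en) Efficient shared bulk loading into optimized storage
CN110673959A (zh) System, method, and apparatus for processing tasks
WO2023226947A1 (zh) Device-cloud collaborative recommendation system and method, and electronic device
WO2023071566A1 (zh) Data processing method and apparatus, computer device, computer-readable storage medium, and computer program product
WO2021189845A1 (zh) Method, apparatus, and device for detecting anomalies in time series, and readable storage medium
CN112182111A (zh) Blockchain-based layered processing method for distributed system, and electronic device
CN110909085A (zh) Data processing method, apparatus, device, and storage medium
CN115328891A (zh) Data migration method and apparatus, storage medium, and electronic device
US9172729B2 (en) Managing message distribution in a networked environment
CN114266937A (zh) Model training and image processing methods, apparatus, device, and storage medium
CN113313196B (zh) Annotation data processing method, related apparatus, and computer program product
CN111949500B (zh) Resource matching method and apparatus, electronic device, and readable storage medium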

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 22885453; Country of ref document: EP; Kind code of ref document: A1)
WWE Wipo information: entry into national phase (Ref document number: 2022885453; Country of ref document: EP)
ENP Entry into the national phase (Ref document number: 2022885453; Country of ref document: EP; Effective date: 20231120)
Effective date: 20231120