CN111913955A - Data sorting processing device, method and storage medium


Info

Publication number
CN111913955A
Authority
CN
China
Prior art keywords
subsequences
sequence
module
sorting
target data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010573219.6A
Other languages
Chinese (zh)
Inventor
鄢贵海
卢文岩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yusur Technology Co ltd
Original Assignee
Yusur Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yusur Technology Co ltd filed Critical Yusur Technology Co ltd
Priority to CN202010573219.6A
Publication of CN111913955A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval of structured data, e.g. relational data
    • G06F 16/22: Indexing; Data structures therefor; Storage structures
    • G06F 16/221: Column-oriented storage; Management thereof
    • G06F 16/28: Databases characterised by their database models, e.g. relational or object models
    • G06F 16/284: Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The specification provides a data sorting processing device, a data sorting processing method and a storage medium. The device first splits a relatively complex target data sequence into a plurality of subsequences through a sequence grouping module, then sorts the plurality of subsequences simultaneously, in a parallel processing manner, through the plurality of parallel sorting modules of a sorting module group to obtain a plurality of corresponding sorted subsequences, and finally merges the plurality of sorted subsequences through a merging module to obtain a sorted target data sequence, thereby completing the sorting processing for the target data sequence. This solves the technical problems of low sorting efficiency and long processing time in existing methods, and achieves the technical effects of improving sorting efficiency and reducing the user's waiting time.

Description

Data sorting processing device, method and storage medium
Technical Field
The present disclosure relates to database processing technologies, and in particular, to a data sorting apparatus, a data sorting method, and a storage medium.
Background
With the development of technology, the amount of data contained in databases has become larger and larger. Accordingly, the time a server needs to sort (or otherwise operate on) data in a database in response to a user's query request has grown as well, so users often have to wait a long time before obtaining the sorted data sequence fed back by the server.
Therefore, a method for efficiently performing sorting processing on data in a database is needed.
Disclosure of Invention
The present specification provides a data sorting processing apparatus, a data sorting processing method, and a storage medium, so as to improve the processing efficiency of sorting processing for data of a database, and solve the technical problems of low sorting processing efficiency and long processing time consumption in the existing method.
The data sorting processing device, method and storage medium provided by the specification are realized by:
an apparatus for sorting data, comprising: a sequence grouping module, a sorting module group and a merging module, wherein the sequence grouping module is connected with the sorting module group, and the merging module is connected with the sorting module group;
the sequence grouping module is used for dividing a target data sequence in the accessed database into a plurality of subsequence groups to obtain a plurality of subsequences; wherein the target data sequence comprises a plurality of data elements;
the sequencing module group comprises a plurality of parallel sequencing modules, and is used for acquiring a plurality of subsequences and sequencing the subsequences in parallel through the parallel sequencing modules to obtain a plurality of corresponding sequenced subsequences;
and the merging module is used for merging the plurality of sequenced subsequences according to a preset rule to obtain a sequenced target data sequence.
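For illustration only, the following minimal Python sketch mirrors the three-stage flow described above (grouping, parallel sorting, merging). The function names, the thread pool, and the k-way heapq merge are assumptions made for this sketch and do not represent the claimed hardware architecture; the pairwise merge-group scheme of the claims is sketched separately later.

```python
from concurrent.futures import ThreadPoolExecutor
from heapq import merge

def group_sequence(target_sequence, divide_parameter):
    """Sequence grouping: split the target data sequence into subsequence groups."""
    return [target_sequence[i:i + divide_parameter]
            for i in range(0, len(target_sequence), divide_parameter)]

def sort_subsequences_in_parallel(subsequences, num_sorting_modules=4):
    """Sorting module group: each worker stands in for one parallel sorting module."""
    with ThreadPoolExecutor(max_workers=num_sorting_modules) as pool:
        return list(pool.map(sorted, subsequences))

def merge_sorted_subsequences(sorted_subsequences):
    """Merging module: combine the sorted subsequences into one sorted sequence."""
    return list(merge(*sorted_subsequences))

if __name__ == "__main__":
    target = [9, 3, 7, 1, 8, 2, 6, 4]              # hypothetical target data sequence
    subs = group_sequence(target, divide_parameter=2)
    sorted_subs = sort_subsequences_in_parallel(subs)
    print(merge_sorted_subsequences(sorted_subs))   # [1, 2, 3, 4, 6, 7, 8, 9]
```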
In one embodiment, the sorting modules are respectively configured with a preset sorting algorithm, and the sorting modules perform sorting processing on the subsequences according to the preset sorting algorithm.
In one embodiment, the preset sorting algorithm comprises at least one of: a bitonic sorting algorithm, a quicksort algorithm, a bubble sort algorithm, a tournament sort, and the like.
In one embodiment, the merge module includes a plurality of parallel merge sub-modules.
In one embodiment, the merging module is specifically configured to divide the plurality of sorted subsequences into a plurality of merging groups, and allocate the plurality of merging groups to the plurality of parallel merging sub-modules; wherein each merging group comprises two sorted subsequences;
the merging module compares data elements contained in two sequenced subsequences in the same merging group through a plurality of merging sub-modules respectively so as to merge the two sequenced subsequences in the same merging group into a sequenced group sequence; and the merging module is used for merging the sorted group sequences to obtain a sorted target data sequence.
In one embodiment, a subsequence cache module is further connected between the sorting module group and the merging module, and the subsequence cache module is configured to cache the plurality of sorted subsequences.
In one embodiment, the sub-sequence caching module comprises an on-chip memory, and the sub-sequence caching module is configured to cache the plurality of ordered sub-sequences in the on-chip memory.
In one embodiment, the sub-sequence caching module further includes a DDR memory and/or an SSD memory and/or an SRAM, and the sub-sequence caching module caches the plurality of ordered sub-sequences in the DDR memory and/or the SSD memory if the storage space of the on-chip memory does not satisfy the storage requirement.
In one embodiment, the sequence grouping module is specifically configured to determine, according to the processing performance (the maximum number of data elements that can be processed) of each sorting module, the number of data elements that match a single sorting process as a partitioning parameter; equally dividing the target data sequence into a plurality of subsequence groups according to the input time of the data elements, wherein the number of the data elements contained in the subsequence groups is equal to or less than the dividing parameter; and obtaining a corresponding subsequence according to the data elements in the subsequence group.
In one embodiment, the subsequence length supported by the sorting module matches the partitioning parameter.
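As a non-authoritative illustration of how the dividing parameter and the input-time-ordered grouping described above might behave in software, the following sketch is provided. Representing each data element as an (input_time, value) tuple and the helper names are assumptions of the sketch, not the patent's implementation.

```python
def determine_divide_parameter(max_elements_per_module):
    # Assumption: the dividing parameter simply matches the largest
    # subsequence a single sorting module can handle in one pass.
    return max_elements_per_module

def split_by_input_time(data_elements, divide_parameter):
    """data_elements: list of (input_time, value) tuples."""
    ordered = sorted(data_elements, key=lambda e: e[0])   # ensure input-time order
    groups = [ordered[i:i + divide_parameter]
              for i in range(0, len(ordered), divide_parameter)]
    # Every group holds at most `divide_parameter` elements; only the last may hold fewer.
    return groups
```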
The embodiment of the application further provides a data sorting processing method, which includes:
acquiring a target data sequence in a database, wherein the target data sequence comprises a plurality of data elements;
dividing a plurality of data elements in the target data sequence into a plurality of subsequence groups to obtain a plurality of subsequences;
sorting the plurality of subsequences in parallel to obtain a plurality of sorted subsequences;
and merging the plurality of sorted subsequences according to a preset rule to obtain a sorted target data sequence.
In one embodiment, dividing a plurality of data elements in the target data sequence into a plurality of subsequence groups to obtain a plurality of subsequences includes:
determining the number of data elements matched with the single sorting processing as a dividing parameter according to the processing efficiency of the sorting module;
equally dividing the target data sequence into a plurality of subsequence groups according to the input time of the data elements, wherein the number of the data elements contained in the subsequence groups is equal to or less than the dividing parameter;
and obtaining a corresponding subsequence according to the data elements in the subsequence group.
In one embodiment, the parallel sorting of the plurality of subsequences comprises:
according to a preset sorting algorithm, performing parallel sorting processing on the plurality of subsequences, wherein the preset sorting algorithm comprises at least one of the following: a bitonic sorting algorithm, a quicksort algorithm, a bubble sort algorithm.
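For reference, a minimal software sketch of one of the named algorithms, bitonic sorting, is given below. It assumes a power-of-two sequence length and is an illustration only, not the patent's hardware sorting module.

```python
def bitonic_sort(values):
    """Bitonic sorting network; len(values) must be a power of two."""
    a = list(values)
    n = len(a)
    assert n and n & (n - 1) == 0, "this simple network needs a power-of-two length"
    k = 2
    while k <= n:                      # size of the bitonic sequences being built
        j = k // 2
        while j >= 1:                  # compare-exchange distance
            for i in range(n):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (a[i] > a[partner]) == ascending:
                        a[i], a[partner] = a[partner], a[i]
            j //= 2
        k *= 2
    return a

# Example: bitonic_sort([7, 3, 8, 1, 5, 2, 6, 4]) returns [1, 2, 3, 4, 5, 6, 7, 8]
```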
In one embodiment, according to a preset rule, merging the plurality of ordered subsequences to obtain an ordered target data sequence, including:
dividing the plurality of ordered subsequences into a plurality of combining groups, wherein each combining group comprises two ordered subsequences;
merging the two sorted subsequences in the same merge group into a sorted group sequence by comparing the data elements contained in the two sorted subsequences in the same merge group;
and combining the sorted group sequences to obtain a sorted target data sequence.
In one embodiment, the method further comprises:
storing the ordered subsequence on an on-chip memory.
In one embodiment, the ordered subsequence is stored on a DDR memory, or SSD memory, in the event that the on-chip memory storage space does not meet the storage requirement.
In one embodiment, obtaining a target data sequence in a database comprises:
receiving a data query request of a user for a database;
and acquiring a related target data sequence from a database according to the data query request.
In an embodiment, after merging the plurality of ordered subsequences according to a preset rule to obtain an ordered target data sequence, the method further includes:
and displaying the sorted target data sequence to a user.
Embodiments of the present application also provide a computer-readable storage medium having stored thereon computer instructions that, when executed, implement: obtaining a target data sequence in a database, wherein the target data sequence includes a plurality of data elements; dividing the plurality of data elements in the target data sequence into a plurality of subsequence groups to obtain a plurality of subsequences; sorting the plurality of subsequences in parallel to obtain a plurality of sorted subsequences; and merging the plurality of sorted subsequences according to a preset rule to obtain a sorted target data sequence.
The data sorting processing device, method and storage medium provided by the present specification split a relatively complex target data sequence into a plurality of subsequences, then sort the plurality of subsequences simultaneously in a parallel processing manner to obtain a plurality of corresponding sorted subsequences, and then merge the plurality of sorted subsequences according to a preset rule to obtain a sorted target data sequence, thereby completing the sorting processing for the target data sequence and solving the technical problems of low efficiency and long processing time when sorting database data with existing methods. The technical effects of improving sorting efficiency, reducing the waiting time when a user queries and accesses the database, and improving the user experience are achieved.
Drawings
In order to more clearly illustrate the embodiments of the present specification, the drawings needed to be used in the embodiments will be briefly described below, and the drawings in the following description are only some of the embodiments described in the present specification, and it is obvious to those skilled in the art that other drawings can be obtained according to the drawings without any creative effort.
Fig. 1 is a schematic structural composition diagram of a data sorting processing apparatus provided in an embodiment of the present specification;
fig. 2 is a schematic structural composition diagram of a data sorting processing apparatus according to another embodiment of the present specification;
fig. 3 is a schematic structural diagram of a data sorting processing apparatus according to yet another embodiment of the present specification;
FIG. 4 is a flow chart illustrating a method for sorting data according to an embodiment of the present disclosure;
FIG. 5 is a diagram illustrating an embodiment of a method for ordering data according to an embodiment of the present disclosure;
fig. 6 is a schematic diagram of an embodiment of a method for sorting data, to which the embodiments of the present specification are applied, in an example scenario.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is obvious that the described embodiments are only a part of the embodiments of the present specification, and not all of the embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments in the present specification without any inventive step should fall within the scope of protection of the present specification.
When a general-purpose CPU performs a sorting operation on a data sequence in a database, the data elements are often processed one by one in a serial manner; even with multithreading, the degree of parallelism is low. With the development of technologies such as big data, the amount of data held in databases keeps growing, and the data sequences a server has to sort become more and more complex. For example, a server may have to sort data sequences containing millions or even billions of data elements. For such complex data sequences, sorting based on the serial processing of a CPU consumes a long processing time, latency increases noticeably, and the user has to wait a long time before finally obtaining the sorted data sequence. Therefore, when the existing CPU-based scheme is used to sort a complex data sequence, there are often the technical problems of low sorting efficiency, long processing time, long user waiting time, and poor user experience.
In view of the root cause of the above problems, the present specification proposes a new processing chip architecture for the sequence sorting operation, which overcomes the limitation of the existing general-purpose CPU that processes data serially. The core idea is to achieve parallel, efficient sequence sorting through a plurality of sorting modules. In implementation, a relatively complex target data sequence is split by grouping into a plurality of relatively simple subsequences, so that the processing performance of the sorting modules (such as the maximum number of data elements each sorting module can process) can be fully utilized: the plurality of subsequences are sorted simultaneously in a parallel processing manner, the sorting of the plurality of subsequences is completed quickly, and the corresponding plurality of sorted subsequences are obtained. The sorted subsequences are then merged to finally obtain the sorted target data sequence, completing the sorting processing for the target data sequence. In this way, the technical problems of low sorting efficiency, long processing time, long user waiting time and poor user experience in the conventional method can be effectively solved, and the technical effects of improving sorting efficiency, reducing the waiting time for querying and accessing the database, and improving the user experience are achieved.
Based on this idea, an embodiment of the present application provides a data sorting processing device. Specifically, please refer to fig. 1, which is a schematic structural diagram of a data sorting processing apparatus according to an embodiment of the present disclosure. The apparatus may specifically include: a sequence grouping module 101, a sorting module group 103, and a merging module 105. The sequence grouping module 101 is connected to the sorting module group 103, and the merging module 105 is connected to the sorting module group 103.
Specifically, the sorting module group 103 may further include a plurality of sorting modules, for example, sorting module 0, sorting module 1, ..., sorting module N, and the like. The sorting modules are connected in parallel to form the sorting module group 103, and the data processing of each sorting module is independent of the others.
The sequence grouping module 101 may be configured to divide a target data sequence in an accessed database into a plurality of subsequence groups to obtain a plurality of subsequences; wherein the target data sequence comprises a plurality of data elements.
The target data sequence may be a data sequence that a user wants to query in a database, or a data sequence that is input from the outside or input through an on-chip cache.
The sorting module group 103 specifically includes a plurality of parallel sorting modules, and is configured to obtain the plurality of subsequences from the sequence grouping module 101 and perform parallel sorting processing on the plurality of subsequences through the plurality of parallel sorting modules, respectively, to obtain a plurality of corresponding sorted subsequences.
After obtaining the plurality of subsequences output by the sequence grouping module 101, the sorting module group 103 allocates the plurality of subsequences to the plurality of sorting modules for processing. During allocation, if the number of subsequences to be sorted is greater than the number of sorting modules, the subsequences may be allocated to the sorting modules in batches for sorting.
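A hedged sketch of such batch allocation might look as follows; the helper name and the simple round-by-round slicing policy are assumptions made purely for illustration.

```python
def allocate_in_batches(subsequences, num_sorting_modules):
    """Hand subsequences to the sorting modules in batches when there are
    more subsequences than modules (hypothetical helper, not from the patent)."""
    batches = []
    for start in range(0, len(subsequences), num_sorting_modules):
        batch = subsequences[start:start + num_sorting_modules]
        # Each entry pairs a sorting-module index with the subsequence it receives.
        batches.append(list(enumerate(batch)))
    return batches
```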
The merging module 105, when implemented specifically, may be configured to merge the plurality of sorted subsequences according to a preset rule to obtain a sorted target data sequence.
Specifically, the merging module 105 may merge the plurality of sorted subsequences two at a time, each merged result forming a sorted group sequence. The sorted group sequences are then merged pairwise in the same way, and the process is repeated until only one sorted sequence remains, which is the sorted target data sequence.
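The rounds of pairwise merging described above can be sketched, purely for illustration, as follows; Python's heapq.merge stands in here for the comparison-based merge of two sorted sequences.

```python
from heapq import merge

def pairwise_merge_rounds(sorted_subsequences):
    """Repeatedly merge the sorted sequences two at a time until one remains."""
    sequences = [list(s) for s in sorted_subsequences]
    while len(sequences) > 1:
        next_round = []
        for i in range(0, len(sequences), 2):
            if i + 1 < len(sequences):
                next_round.append(list(merge(sequences[i], sequences[i + 1])))
            else:
                next_round.append(sequences[i])   # odd one out waits for the next round
        sequences = next_round
    return sequences[0] if sequences else []
```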
In this embodiment, when sorting a relatively complex target data sequence that contains a large number of data elements, the above apparatus may first divide the target data sequence into a plurality of relatively simple subsequences, each containing a relatively small number of data elements, through the sequence grouping module 101. The plurality of subsequences output by the sequence grouping module 101 are then allocated by the sorting module group 103 to the parallel sorting modules it contains. The plurality of parallel sorting modules can then simultaneously sort, in a parallel processing manner, the subsequences respectively allocated to them to obtain a plurality of corresponding sorted subsequences. The merging module 105 then merges the plurality of sorted subsequences to obtain the sorted target data sequence.
The device can thus first split a relatively complex target data sequence into a plurality of subsequences, then sort the plurality of subsequences simultaneously in a parallel processing manner to obtain the corresponding sorted subsequences, and then merge the sorted subsequences to obtain the sorted target data sequence, thereby completing the sorting processing of the target data sequence and solving the technical problems of low sorting efficiency and long processing time in the prior art. The technical effects of improving sorting efficiency, reducing the waiting time when a user queries and accesses the database, and improving the user experience are achieved.
In an embodiment, the sorting modules may be specifically configured with preset sorting algorithms, and when the sorting modules are specifically implemented, the corresponding sorting processing may be performed on the subsequences according to the configured preset sorting algorithms.
In an embodiment, the preset sorting algorithm may specifically include at least one of: a bitonic sorting algorithm, a quicksort algorithm, a bubble sort algorithm, a tournament sort, and the like. Of course, it should be noted that the sorting algorithms listed above are only illustrative. In specific implementations, other types of sorting algorithms can be introduced as the preset sorting algorithm according to the specific application scenario and processing requirements. The present specification is not limited in this respect.
In one embodiment, specifically referring to fig. 2, the merge module 105 may specifically include a plurality of merge sub-modules connected in parallel.
For example, the merging module 105 may further include, in parallel, merging sub-module 0, merging sub-module 1, ..., merging sub-module M. In this way, when the merging module 105 merges the plurality of sorted subsequences, the merging sub-modules it contains may each be called to take two sorted subsequences and merge them pairwise. The merging sub-modules are then called again to merge the merged results pairwise. After multiple rounds of merging, only one overall merged result remains, which is finally output as the sorted target data sequence. In this way, the sorted sequences can also be merged in a parallel processing manner, further improving the sorting processing efficiency.
In an embodiment, when the merging module 105 is implemented, the merging module may be configured to divide the plurality of sorted sub-sequences into a plurality of merging groups, and allocate the plurality of merging groups to a plurality of parallel sub-modules. Wherein the merged group comprises two ordered subsequences. The merging module compares data elements contained in two sequenced subsequences in the same merging group through a plurality of merging sub-modules respectively so as to merge the two sequenced subsequences in the same merging group into a sequenced group sequence; and the merging module is used for merging the sorted group sequences to obtain a sorted target data sequence.
In this embodiment, each merging sub-module may merge the two sorted subsequences in the merging group allocated to it, in the pairwise merging manner, to obtain the sorted group sequence corresponding to that merging group.
When a merging sub-module merges the two sorted subsequences in a merging group, an ascending order (sorting data elements from small to large by data value) or a descending order (sorting from large to small) may be used as the preset rule. Taking an ascending rule as an example, the data elements located at the first ordinal position may be extracted from sorted subsequence 1 and sorted subsequence 2 of the allocated merging group, and their data values compared; the element with the smaller value becomes the data element at the first ordinal position of the sorted group sequence, while the element with the larger value is retained to participate in the next round of comparison. For example, suppose the data value of the element at the first ordinal position of sorted subsequence 1 is greater than that of the element at the first ordinal position of sorted subsequence 2. Then the element at the first ordinal position of sorted subsequence 2 is placed at the first ordinal position of the sorted group sequence, i.e., it is ranked first in the sorted group sequence, while the element at the first ordinal position of sorted subsequence 1 is retained for the next round of comparison. In the next round, the element now at the front of the remaining elements of sorted subsequence 2 is extracted and compared with the previously retained element of sorted subsequence 1. Following the same rule, the smaller element determines the data element at the second ordinal position of the sorted group sequence, and the larger element is retained for the next round of comparison. Comparing each data element of the two sorted subsequences in the merging group in this way yields the corresponding sorted group sequence, completing the pairwise merging of the two subsequences in the merging group.
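The comparison procedure described above corresponds to the classic two-way merge. A minimal software sketch, given only as an illustration of the rule (ascending order, smaller front element taken first, larger one retained), follows; the function name is an assumption of the sketch.

```python
def merge_two_sorted_subsequences(sub1, sub2):
    """Ascending merge of two already-sorted subsequences."""
    merged, i, j = [], 0, 0
    while i < len(sub1) and j < len(sub2):
        # Compare the front elements; the smaller one takes the next ordinal
        # position in the group sequence, the larger one stays for the next round.
        if sub1[i] <= sub2[j]:
            merged.append(sub1[i]); i += 1
        else:
            merged.append(sub2[j]); j += 1
    merged.extend(sub1[i:])   # whichever subsequence still has elements left
    merged.extend(sub2[j:])
    return merged
```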
The merging of the sorted group sequences may be carried out by analogy with the merging of the sorted subsequences and is not described again here. The merging module 105 may merge the plurality of sorted subsequences pairwise to obtain a plurality of sorted group sequences, then merge the sorted group sequences pairwise, and so on, until only one sorted sequence remains as the sorted target data sequence.
In an embodiment, referring to fig. 3, a sub-sequence caching module 104 is further connected between the merging module 105 and the sorting module group 103, and when the sub-sequence caching module 104 is implemented, the sub-sequence caching module may be configured to cache a plurality of sorted sub-sequences.
In specific implementation, a plurality of sequenced sub-sequences obtained by the sequencing module group 103 through parallel sequencing processing may be buffered by the sub-sequence buffering module 104. When the sorted sub-sequences are to be merged, the merging module 105 may read a plurality of sorted sub-sequences from the buffer for merging.
In an embodiment, the sub-sequence caching module 104 may specifically include an on-chip memory, and when the sub-sequence caching module 104 is specifically implemented, the plurality of ordered sub-sequences may be cached in the on-chip memory.
In specific implementations, when the storage space of the on-chip memory meets the storage requirement, the subsequence cache module 104 may preferentially cache the plurality of sorted subsequences in the on-chip memory, which effectively improves the efficiency with which the merging module 105 subsequently reads the sorted subsequences and thus the overall processing efficiency.
In an embodiment, the sub-sequence caching module 104 may further specifically include a DDR memory and/or an SSD memory and/or an SRAM, and the sub-sequence caching module 104 may cache the plurality of ordered sub-sequences in the DDR memory and/or the SSD memory when the storage space of the on-chip memory does not satisfy the storage requirement. In this way, when the storage space of the on-chip memory is insufficient, the sub-sequence caching module 104 may cache the plurality of sorted sub-sequences in an off-chip memory such as a DDR memory and/or an SSD memory, and the merging module 105 may read the plurality of sorted sub-sequences from the DDR memory and/or the SSD memory.
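A simple software analogy of this cache policy (prefer on-chip memory, fall back to DDR/SSD when capacity is exceeded) is sketched below; the class name, the capacity accounting, and the element size are assumptions of the sketch rather than details of the patent.

```python
class SubsequenceCache:
    """Illustrative cache policy: prefer on-chip storage, fall back to
    off-chip storage (standing in for DDR and/or SSD memory)."""
    def __init__(self, on_chip_capacity_bytes):
        self.on_chip_capacity = on_chip_capacity_bytes
        self.on_chip = {}
        self.off_chip = {}

    def store(self, key, sorted_subsequence, element_size_bytes=8):
        needed = len(sorted_subsequence) * element_size_bytes
        used = sum(len(v) * element_size_bytes for v in self.on_chip.values())
        if used + needed <= self.on_chip_capacity:
            self.on_chip[key] = sorted_subsequence        # fits on chip
        else:
            self.off_chip[key] = sorted_subsequence       # spill to DDR/SSD

    def load(self, key):
        return self.on_chip.get(key, self.off_chip.get(key))
```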
In one embodiment, in order to fully utilize the efficiency of the sorting module (e.g., the maximum number of data elements that can be processed by each sorting module) and perform sorting more efficiently, when the sequence grouping module 101 is implemented, the number of data elements that match a single sorting process may be determined as a partition parameter according to the processing efficiency of the sorting module; equally dividing the target data sequence into a plurality of subsequence groups according to the input time of the data elements, wherein the number of the data elements contained in the subsequence groups is equal to or less than the dividing parameter; and obtaining a corresponding subsequence according to the data elements in the subsequence group.
In one embodiment, the length of the sub-sequence supported by the sorting module in the sorting module group 103 matches the partition parameter. Thus, the processing efficiency of the sorting module can be fully exerted to carry out sorting processing.
In an embodiment, a subsequence grouping module group may be further connected between the sorting module group 103 and the sequence grouping module 101, where the subsequence grouping module group may include a plurality of subsequence grouping modules. The plurality of sub-sequence grouping modules may be specifically configured to further group data elements included in each of the plurality of sub-sequences, and further split each of the plurality of sub-sequences into a plurality of unit sub-sequences.
In this embodiment, it is considered that the subsequences obtained by the sequence grouping module 101 may still contain many data elements, so that sorting them with the sorting modules of the sorting module group 103 would still be relatively complex. In this case, the plurality of subsequences may be further divided by the plurality of subsequence grouping modules in the subsequence grouping module group, so as to obtain a plurality of unit subsequences that contain fewer data elements and are relatively simpler. Correspondingly, the plurality of sorting modules in the sorting module group 103 may perform parallel sorting processing on the plurality of unit subsequences, respectively, to obtain sorted unit subsequences, and the sorted unit subsequences corresponding to the same subsequence are then merged to obtain the corresponding sorted subsequence. In this way, the complexity of the sorting processing performed by the sorting module group 103 can be reduced, further improving the processing efficiency.
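The unit-subsequence splitting and recombination described above can be sketched as follows. Sorting the units sequentially and merging them with heapq.merge are simplifications made for the sketch; in the described device the unit subsequences are sorted on parallel sorting modules.

```python
from heapq import merge

def split_into_unit_subsequences(subsequence, unit_length):
    """Further split one subsequence into unit subsequences."""
    return [subsequence[i:i + unit_length]
            for i in range(0, len(subsequence), unit_length)]

def sort_subsequence_via_units(subsequence, unit_length):
    """Sort each unit subsequence, then merge the sorted units back into
    one sorted subsequence."""
    units = split_into_unit_subsequences(subsequence, unit_length)
    sorted_units = [sorted(u) for u in units]
    return list(merge(*sorted_units))
```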
It should be noted that, the units, devices, modules, etc. illustrated in the above embodiments may be implemented by a computer chip or an entity, or implemented by a product with certain functions. For convenience of description, the above devices are described as being divided into various modules by functions, and are described separately. It is to be understood that, in implementing the present specification, functions of each module may be implemented in one or more pieces of software and/or hardware, or a module that implements the same function may be implemented by a combination of a plurality of sub-modules or sub-units, or the like. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
As can be seen from the above, in the data sorting device provided in this specification, a relatively complex target data sequence may first be split into multiple subsequences by the sequence grouping module; the multiple subsequences may then be sorted simultaneously, in a parallel processing manner, by the multiple parallel sorting modules contained in the sorting module group to obtain multiple corresponding sorted subsequences; and the multiple sorted subsequences are then merged by the merging module to obtain the sorted target data sequence, completing the sorting processing for the target data sequence. This solves the technical problems of low sorting efficiency and long processing time in the existing methods, improves sorting efficiency, reduces the waiting time of a user querying and accessing the database, and improves the user experience.
The embodiment of the application also provides a data sorting processing method. Specifically, please refer to a flowchart of a data sorting method according to an embodiment of the present disclosure shown in fig. 4. The data sorting processing method provided in the embodiment of the present application may include the following steps in specific implementation.
S401: a target data sequence in a database is obtained, wherein the target data sequence comprises a plurality of data elements.
In this embodiment, the above data sorting method may be applied to a server responsible for sorting data in a database. The server may be a server in charge of data processing, deployed on the service platform side, that can implement functions such as data transmission and data processing. Specifically, the server may be, for example, an electronic device having data computation, storage and network interaction functions, or a software program running on an electronic device that provides support for data processing, storage and network interaction. It should be noted that the number of servers is not particularly limited in this embodiment: the server may be a single server, several servers, or a server cluster formed by several servers.
In an embodiment, the target data sequence may specifically be a data sequence stored in the database that contains the information content the user wants to query and access. The target data sequence may include a plurality of data elements, as shown in fig. 5. In the target data sequence Table0, each data element may specifically include the data value of the element (which may be denoted as Col) and the number of the element in the data sequence (which may be denoted as Order).
In an embodiment, the target data sequence may be specifically used to represent different service contents according to different service scenarios. Accordingly, the specific data value of each data element in the target data sequence may be a data value for characterizing the corresponding service content.
For example, in a business scenario of querying the salaries of a company's employees, the target data sequence may be data stored in a database that characterizes the salary of each employee in the company. Each data element in the target data sequence may correspond to the salary of a specific employee and represent the amount of salary that employee received in the month. For example, the data element numbered 1 in the target data sequence may correspond to employee A's salary for the month, and its data value 5000 may characterize the amount of salary employee A received that month as 5000. Of course, the target data sequence listed above is only illustrative. In specific implementations, the target data sequence may also be a data sequence characterizing other service contents according to the specific application scenario and processing requirements. The present specification is not limited in this respect.
In an embodiment, the obtaining of the target data sequence in the database may include the following steps: receiving a data query request of a user for a database; and acquiring a target data sequence associated with the data query request from a database according to the data query request.
In this embodiment, if a user wants to query or access a certain database to obtain corresponding information content, the user may first generate and send a corresponding data query request (or access request) for the database to a server through a client device used by the user, for example, a device such as a mobile phone and a computer used by the user. The query request may specifically carry indication information of information content that the user wants to query for.
After receiving the query request, the server may first analyze the query request to obtain indication information of information content carried in the query request, where the information content is desired to be queried and obtained by the user, and search a database to which the query request is directed according to the indication information to find a data sequence, which is associated with the query request in a matching manner and contains the information content desired to be obtained by the user, from the database as a target data sequence.
S402: and dividing a plurality of data elements in the target data sequence into a plurality of subsequence groups to obtain a plurality of subsequences.
In this embodiment, considering that the target data sequence itself is complex and includes a large number of data elements, if the sorting process is performed one by one in a serial processing manner, a large processing time is inevitably consumed. In order to perform sorting processing in a parallel processing manner, an originally complex target data sequence may be split into a plurality of relatively simple subsequences containing relatively fewer data elements.
In an embodiment, the specific number of data elements contained in each subsequence may be determined according to the processing performance of the server's processor when sorting a single subsequence, and this number may be used as the corresponding dividing parameter. At the same time, the data elements in a target data sequence stored in a database are usually numbered in the order in which they were entered, and the input time of the data elements is usually stored along with the target data sequence. Therefore, in order to avoid introducing unnecessary data processing during grouping, reduce the consumption of computing resources and speed up grouping, the target data sequence may in specific implementations be cut into a new subsequence group after every dividing-parameter's worth of data elements, following the order of input time in the target data sequence. A subsequence corresponding to each subsequence group is then established from the data elements contained in that group. Subsequence groups obtained this way make full use of the server's processing capacity and give a relatively good processing result.
In an embodiment, the dividing the plurality of data elements in the target data sequence into a plurality of subsequence groups to obtain a plurality of subsequences may include the following steps: determining the number of data elements matched with the single sorting processing as a dividing parameter according to the processing efficiency of the sorting module; equally dividing the target data sequence into a plurality of subsequence groups according to the input time of the data elements, wherein the number of the data elements contained in the subsequence groups is equal to or less than the dividing parameter; and obtaining a corresponding subsequence according to the data elements in the subsequence group.
Of course, the above-listed splitting manner for splitting the target data sequence into multiple subsequences is only an exemplary illustration. In specific implementation, according to specific situations, other suitable splitting manners may also be adopted to split the target data sequence into a plurality of sub-sequences. The present specification is not limited to these.
In an embodiment, when the sub-sequence obtained by splitting in the manner described above may still be complex and include a large number of data elements, or the single sub-sequence cannot be efficiently processed based on the processing performance of the server, the server may further split and group the sub-sequence obtained by splitting described above, so as to obtain a relatively simpler processing structure.
Specifically, after dividing a plurality of data elements in the target data sequence into a plurality of subsequence groups and obtaining a plurality of subsequences, the method may further include the following steps: determining the number of data elements contained in each of the plurality of subsequences; and determining whether each subsequence in the plurality of subsequences needs to be subjected to grouping processing respectively according to the number of data elements contained in each subsequence in the plurality of subsequences.
If the number of data elements included in the subsequence is relatively large and is greater than the preset threshold number, the subsequence may be grouped to split the subsequence into unit subsequences which are relatively simpler and include relatively fewer data elements.
For example, each of the plurality of subsequences may be divided into a plurality of unit subsequences by grouping data elements included in each of the plurality of subsequences. In the subsequent processing, specific parallel sorting processing may be performed with the unit subsequence as a unit of parallel processing.
In an embodiment, after the plurality of data elements in the target data sequence are divided into the plurality of subsequence groups to obtain the plurality of subsequences, the server may further store the plurality of subsequences on an on-chip memory in a cache manner. Therefore, the subsequence can be conveniently read subsequently, the reading efficiency is improved, and the overall processing efficiency is further improved.
In one embodiment, the server may first check the on-chip memory space when storing the sub-sequence to determine whether the on-chip memory space meets the storage requirement, considering that the on-chip memory space may be relatively limited.
In case it is determined that the on-chip storage space meets the storage requirement, the plurality of sub-sequences may be preferentially stored on the on-chip memory. Storing the plurality of sub-sequences on a DDR memory, or SSD memory, if the on-chip memory storage space does not meet the storage requirement.
S403: and sequencing the plurality of subsequences in parallel to obtain a plurality of sequenced subsequences.
In this embodiment, the sorting process, which may also be referred to as a sorting operation, may be understood as the operation of arranging the data elements of the target data sequence in a certain order, according to their data values and a corresponding sorting rule or sorting algorithm.
In an embodiment, in a specific implementation, the plurality of sub-sequences may be simultaneously subjected to a sorting process according to a preset sorting algorithm.
In one embodiment, different suitable sorting algorithms may be selected as the preset sorting algorithm for different service scenarios and processing requirements. Specifically, the preset sorting algorithm may include at least one of the following: a bitonic sorting algorithm, a quicksort algorithm, a bubble sort algorithm, etc. Of course, the sorting algorithms listed above are only illustrative. In specific implementations, other suitable sorting algorithms may be adopted as the preset sorting algorithm according to the specific service scenario and processing requirements. The present specification is not limited in this respect.
In an embodiment, in a specific implementation, before performing parallel sorting processing on the multiple subsequences, the server may further receive, through a configuration port, configuration parameters of a sorting algorithm set by a user; and determining and configuring a corresponding sorting algorithm as a preset sorting algorithm to perform specific sorting processing according to the configuration parameters of the sorting algorithm. The configuration parameters of the ranking algorithm may be specifically used to characterize a type of a preset ranking algorithm selected by a user.
In one embodiment, in specific implementation, the configuration parameters of the sorting algorithm set by the user may not be acquired, but the corresponding matching sorting algorithm is selected as the default preset sorting algorithm according to the data characteristics of the target data sequence to be processed, so that the plurality of sub-sequences may be directly sorted according to the default preset sorting algorithm.
In one embodiment, the server may perform parallel sorting processing on the plurality of sub-sequences simultaneously in a parallel processing manner. Therefore, the processing time for sequencing the target data sequence can be effectively shortened.
In an embodiment, in a specific implementation, the server may respectively allocate the plurality of sub-sequences to different processing units (e.g., different sorting modules), so that the different processing units may be controlled to respectively read the allocated sub-sequences from the memory, and then the processing unit is controlled to simultaneously perform sorting processing on the read sub-sequences according to a preset sorting algorithm to obtain corresponding sorted sub-sequences, thereby implementing parallel sorting processing on the plurality of sub-sequences.
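As an illustrative software analogy of assigning subsequences to processing units and collecting the sorted results, the following sketch uses a process pool. The worker function, the pool size, and the use of ProcessPoolExecutor are assumptions of the sketch, not the server implementation described here.

```python
from concurrent.futures import ProcessPoolExecutor, as_completed

def sort_one_subsequence(indexed_subsequence):
    index, subsequence = indexed_subsequence
    # Any preset sorting algorithm could be plugged in here; sorted() stands in.
    return index, sorted(subsequence)

def parallel_sort(subsequences, num_processing_units=4):
    """Assign each subsequence to a processing unit and collect the sorted
    results in their original subsequence order (call under a main guard
    on platforms that spawn worker processes)."""
    results = [None] * len(subsequences)
    with ProcessPoolExecutor(max_workers=num_processing_units) as pool:
        futures = [pool.submit(sort_one_subsequence, (i, s))
                   for i, s in enumerate(subsequences)]
        for future in as_completed(futures):
            index, sorted_subsequence = future.result()
            results[index] = sorted_subsequence
    return results
```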
The ordered subsequence can be specifically understood as a subsequence obtained by ordering a single subsequence, in which each data element in the subsequence is arranged according to an order rule. For example, the data elements included in the subsequence are arranged in descending order of the data values of the data elements.
In an embodiment, if the number of the sub-sequences to be sorted is relatively large and exceeds the allocable processing units, the server may further allocate the plurality of sub-sequences to the plurality of processing units in batches, and perform the parallel sorting processing on all the sub-sequences by controlling the processing units to perform the parallel sorting processing for a plurality of times.
In one embodiment, if the data elements contained in each of the plurality of subsequences have also been grouped, each subsequence is split into a plurality of unit subsequences. Correspondingly, the parallel sorting processing of the plurality of subsequences to obtain a plurality of sorted subsequences may specifically include: performing parallel sorting processing on the plurality of unit subsequences respectively to obtain sorted unit subsequences, and then combining, according to the preset sorting algorithm, the sorted unit subsequences corresponding to the same subsequence to obtain the sorted subsequence corresponding to that subsequence.
In an embodiment, after obtaining the plurality of ordered sub-sequences, the server may further store the plurality of ordered sub-sequences on the on-chip memory in a cache manner. Therefore, the sequenced subsequence can be conveniently used for subsequent reading, the reading efficiency is improved, and the overall processing efficiency is further improved.
In one embodiment, the server may first check the on-chip memory space when storing the sub-sequence to determine whether the on-chip memory space meets the storage requirement, considering that the on-chip memory space may be relatively limited.
In the event that it is determined that the on-chip storage space meets the storage requirement, the plurality of ordered sub-sequences may be preferentially stored on the on-chip memory. And under the condition that the storage space of the on-chip memory does not meet the storage requirement, storing the plurality of ordered subsequences on a DDR memory or an SSD memory.
Specifically, as shown in fig. 5, the target data sequence may be divided into 4 subsequences, and the 4 subsequences may be allocated to 4 different processing units. The 4 processing units can then be controlled to perform parallel sorting processing on the 4 subsequences respectively, and the 4 sorted subsequences, each arranged by data value from small to large, are respectively recorded as: Table0_0, Table0_1, Table0_2, and Table0_3.
S404: and combining the plurality of sequenced subsequences according to a preset rule to obtain a sequenced target data sequence.
In one embodiment, the plurality of ordered subsequences may be combined pairwise according to a preset rule in a pairwise combination manner, so as to finally obtain a total ordered data sequence as an ordered target data sequence, thereby completing the ordering process for the complex target data sequence.
In an embodiment, the merging the plurality of ordered subsequences according to a preset rule to obtain an ordered target data sequence, which may include the following contents in specific implementation: dividing the plurality of ordered subsequences into a plurality of combining groups, wherein each combining group comprises two ordered subsequences; merging the two sorted subsequences in the same merge group into a sorted group sequence by comparing the data elements contained in the two sorted subsequences in the same merge group; and combining the sorted group sequences to obtain a sorted target data sequence.
In this embodiment, specifically, referring to fig. 5, taking the merging process of two sorted sub-sequences (e.g., Table0_0 and Table0_1) included in one merge group as an example, the merging of the plurality of sorted sub-sequences according to the preset rule is specifically described. The two ordered subsequences are arranged in the order from small to large according to the data values of the data elements.
The server may extract the data elements located at the first ordinal position from the sorted subsequences Table0_0 and Table0_1 respectively, compare their data values, take the element with the smaller value as the data element at the first ordinal position of the sorted group sequence, and retain the element with the larger value for the next round of comparison. For example, the data value of the element numbered 2 with data value 5 in Table0_0 is greater than that of the element numbered 5 with data value 4 in Table0_1. The element numbered 5 with data value 4 in Table0_1 is therefore placed at the first ordinal position of the sorted group sequence, i.e., it is ranked first, while the element numbered 2 with data value 5 in Table0_0 is retained for the next round of comparison. In the next round, the element numbered 7 with data value 6, now at the front of the remaining elements of Table0_1, is extracted and compared with the previously retained element numbered 2 with data value 5 in Table0_0. Since the data value 5 of the element numbered 2 in Table0_0 is smaller than the data value 6 of the element numbered 7 in Table0_1, the element with data value 5 is placed at the second ordinal position of the sorted group sequence, and the element numbered 7 with data value 6 in Table0_1 is retained for the next round of comparison. Comparing the data elements of the two sorted subsequences in the merging group in this way yields the corresponding sorted group sequence, recorded as Result0_0, completing the pairwise merging of the two subsequences in this merging group. In the same way, the sorted subsequences Table0_2 and Table0_3 contained in the other merging group can be merged pairwise to obtain another sorted group sequence, recorded as Result0_1.
Further, the two sorted group sequences Result0_0 and Result0_1 obtained by pairwise merging may be taken as a new merging group, and the two sorted group sequences it contains may be merged pairwise in a similar way to obtain one sorted sequence, denoted Result0, which is the finally obtained sorted target data sequence. The sorting processing of the originally complex target data sequence is thus completed.
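To make the walkthrough above concrete, the following sketch replays the first merging group with hypothetical (Order, Col) pairs. Only the comparisons spelled out in the text (5 against 4, then 5 against 6) come from the description; the remaining values are invented solely to complete the example.

```python
# Hypothetical (order, value) pairs, each list already sorted ascending by value.
table0_0 = [(2, 5), (1, 9), (3, 12)]
table0_1 = [(5, 4), (7, 6), (6, 11)]

def merge_by_value(left, right):
    """Ascending two-way merge keyed on the data value (second tuple field)."""
    merged, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if left[i][1] <= right[j][1]:
            merged.append(left[i]); i += 1
        else:
            merged.append(right[j]); j += 1
    return merged + left[i:] + right[j:]

result0_0 = merge_by_value(table0_0, table0_1)
print(result0_0)   # [(5, 4), (2, 5), (7, 6), (1, 9), (6, 11), (3, 12)]
```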
In an embodiment, after the plurality of ordered subsequences are combined according to a preset rule to obtain an ordered target data sequence, when the method is implemented, the method may further include the following steps: and displaying the sorted target data sequence to a user.
In this embodiment, in specific implementation, the server may send the sorted target data sequence to a client device used by the user, and then display the sorted target data sequence to the user through the client device. For example, the user is shown the wage situation of each employee of a certain company whose wage amount is ranked from high to low.
In this embodiment, a relatively complex target data sequence is first split into a plurality of subsequences; the plurality of subsequences can then be sorted simultaneously in a parallel processing manner to obtain the corresponding sorted subsequences, which are then merged pairwise to obtain the sorted target data sequence. The sorting processing for the target data sequence is thus completed while making full use of the server's processing capacity, solving the technical problems of low sorting efficiency and long processing time in existing methods. The technical effects of improving sorting efficiency, reducing the waiting time of a user querying and accessing the database, and improving the user experience are achieved.
In an embodiment, the dividing the plurality of data elements in the target data sequence into a plurality of subsequence groups to obtain a plurality of subsequences may include the following steps: determining the number of data elements matched with the single sorting processing as a dividing parameter according to the processing efficiency of the sorting module; equally dividing the target data sequence into a plurality of subsequence groups according to the input time of the data elements, wherein the number of the data elements contained in the subsequence groups is equal to or less than the dividing parameter; and obtaining a corresponding subsequence according to the data elements in the subsequence group.
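As a non-limiting illustration, the grouping step of this embodiment might be sketched in Python as follows; the names group_sequence and partition_param are assumptions for the sketch, and the data elements are assumed to already be arranged by input time in a plain list.

    # Minimal sketch of dividing the target data sequence into subsequence groups whose size
    # does not exceed the dividing parameter; names are illustrative, not from the patent.
    def group_sequence(target_sequence, partition_param):
        """Split the target data sequence (ordered by input time) into subsequences of at
        most partition_param data elements each."""
        return [
            target_sequence[i:i + partition_param]
            for i in range(0, len(target_sequence), partition_param)
        ]

    # e.g. sixteen data elements with a dividing parameter of 4 yield four subsequences
    subsequences = group_sequence([9, 3, 12, 7, 1, 15, 6, 10, 4, 14, 2, 8, 16, 5, 11, 13], 4)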
In an embodiment, the above-mentioned parallel sorting processing of the multiple subsequences may include the following: performing parallel sorting processing on the plurality of subsequences according to a preset sorting algorithm, wherein the preset sorting algorithm specifically includes at least one of the following: a bitonic sorting algorithm, a quicksort algorithm, a bubble sort algorithm, and the like.
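Purely as a software analogue of the parallel sorting modules, the parallel sorting step might look like the sketch below; Python's built-in sort stands in for whichever preset algorithm (bitonic sort, quicksort, bubble sort) a module is actually configured with, and the thread pool is an assumption of the sketch rather than the hardware described in this specification.

    # Minimal sketch of sorting the subsequences in parallel; the built-in sort is only a
    # stand-in for the configured preset sorting algorithm of each sorting module.
    from concurrent.futures import ThreadPoolExecutor

    def sort_subsequences_in_parallel(subsequences, key=lambda element: element):
        """Sort every subsequence concurrently and return the sorted subsequences in order."""
        if not subsequences:
            return []
        with ThreadPoolExecutor(max_workers=len(subsequences)) as pool:
            return list(pool.map(lambda s: sorted(s, key=key), subsequences))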
In an embodiment, merging the plurality of sorted subsequences according to a preset rule to obtain the sorted target data sequence may, in specific implementation, include the following: dividing the plurality of sorted subsequences into a plurality of merge groups, wherein each merge group contains two sorted subsequences; merging the two sorted subsequences in the same merge group into a sorted group sequence by comparing the data elements contained in the two sorted subsequences of that merge group; and merging the sorted group sequences to obtain the sorted target data sequence.
In one embodiment, when implemented, the method may further include the following: storing the ordered subsequence on an on-chip memory.
In one embodiment, in implementation, when the on-chip memory storage space does not meet the storage requirement, the ordered sub-sequence may be stored in the DDR memory or the SSD memory.
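For illustration only, the on-chip caching with DDR/SSD fallback can be modeled at a purely functional level as below; the class name SubsequenceBuffer, the capacity accounting, and the use of plain dictionaries are assumptions of the sketch and do not describe the actual memory hardware.

    # Minimal sketch of the caching policy: keep sorted subsequences on chip while space
    # allows, otherwise fall back to larger off-chip storage (DDR/SSD). Purely illustrative.
    class SubsequenceBuffer:
        def __init__(self, on_chip_capacity):
            self.on_chip_capacity = on_chip_capacity  # max number of data elements held on chip
            self.on_chip = {}                         # stands in for the on-chip memory
            self.off_chip = {}                        # stands in for DDR or SSD memory

        def store(self, name, subsequence):
            used = sum(len(s) for s in self.on_chip.values())
            if used + len(subsequence) <= self.on_chip_capacity:
                self.on_chip[name] = subsequence
            else:
                self.off_chip[name] = subsequence     # on-chip space exhausted: spill off chip

        def load(self, name):
            if name in self.on_chip:
                return self.on_chip[name]
            return self.off_chip[name]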
In an embodiment, the obtaining of the target data sequence in the database may include the following steps: receiving a data query request of a user for a database; and acquiring a related target data sequence from a database according to the data query request.
In an embodiment, after the plurality of ordered subsequences are combined according to a preset rule to obtain an ordered target data sequence, when the method is implemented, the method may further include the following steps: and displaying the sorted target data sequence to a user.
As can be seen from the above, in the data sorting processing method provided in this specification, a complex target data sequence is first split into multiple subsequences, the subsequences are then sorted simultaneously in a parallel processing manner to obtain the corresponding sorted subsequences, and the sorted subsequences are merged pairwise to obtain the required sorted target data sequence, thereby completing the sorting processing of the complex target data sequence and solving the technical problems of low sorting efficiency and long processing time in existing methods. Sorting efficiency is improved, the time a user waits when querying and accessing the database is reduced, and the user experience is improved. In addition, the relatively complex target data sequence is divided into relatively simple subsequences; the subsequences are preferentially cached in on-chip memory to facilitate subsequent reading and sorting, and the sorted subsequences are likewise preferentially cached in on-chip memory to facilitate subsequent reading and merging, which improves the efficiency of reading and processing the subsequences and further improves the sorting efficiency for the whole target data sequence.
Based on the above data sorting processing method, the embodiments of the present specification further provide a computer storage medium storing computer program instructions that, when executed, implement: acquiring a target data sequence from a database, wherein the target data sequence comprises a plurality of data elements; dividing the plurality of data elements in the target data sequence into a plurality of subsequence groups to obtain a plurality of subsequences; sorting the plurality of subsequences in parallel to obtain a plurality of sorted subsequences; and merging the plurality of sorted subsequences according to a preset rule to obtain a sorted target data sequence.
In this embodiment, the storage medium includes, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Cache (Cache), a Hard Disk Drive (HDD), or a Memory Card (Memory Card). The memory may be used to store computer program instructions. The network communication unit may be an interface for performing network connection communication, which is set in accordance with a standard prescribed by a communication protocol.
In this embodiment, the functions and effects specifically realized by the program instructions stored in the computer storage medium can be explained by comparing with other embodiments, and are not described herein again.
In a specific implementation scenario example, the data sequence in the database may be sorted by using the data sorting processing method provided in the embodiment of the present application. The following can be referred to as a specific implementation process.
In a first step, sequence grouping: the input data sequence to be sorted (e.g., a target data sequence in the database) is divided equally, according to the input order (e.g., the input time of the data elements), at a fixed interval N (e.g., the partition parameter), resulting in multiple subsequences. The value of N depends on the size of the sequence that a single subsequent sorting module can process.
In a second step, subsequence sorting: the subsequences divided in the first step are sorted separately. When there are multiple sorting modules, multiple subsequences can be sorted in parallel at the same time. The sorting algorithm (i.e., the preset sorting algorithm) adopted by each sorting module can be chosen relatively freely, and may be bitonic sort, quicksort, bubble sort, or the like.
In a third step, subsequence merging: each subsequence has become an ordered sequence (i.e., a sorted subsequence) through the operation of the second step, and the sorted subsequences are then merged pairwise. If multiple sequences still exist after a round of merging, pairwise merging is repeated.
In this scenario example, the merging may specifically proceed as follows: one data element is taken from each of the two sequences and their values are compared. If the data elements are sorted in ascending order, the element with the smaller value is output, another data element is then taken from the sequence it came from, and the comparison is repeated. If the data elements are sorted in descending order, the element with the larger value is output, another data element is then taken from the corresponding sequence, and the comparison is repeated, until all data elements of both sequences have been taken out. The sorting of the input data sequence is thereby completed.
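A minimal Python sketch of this repeated pairwise merging, assuming plain lists of comparable data values; the descending flag corresponds to the descending-order case described above, and all names are illustrative rather than taken from this specification.

    # Minimal sketch of merging sorted sequences two by two, round after round, until a single
    # sorted sequence remains; the descending flag selects ascending or descending output order.
    def merge_two(a, b, descending=False):
        take_a = (lambda x, y: x >= y) if descending else (lambda x, y: x <= y)
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if take_a(a[i], b[j]):
                out.append(a[i])
                i += 1
            else:
                out.append(b[j])
                j += 1
        out.extend(a[i:])
        out.extend(b[j:])
        return out

    def merge_all(sorted_subsequences, descending=False):
        """Repeat pairwise merging until only one sorted sequence is left."""
        seqs = list(sorted_subsequences)
        while len(seqs) > 1:
            next_round = []
            for k in range(0, len(seqs), 2):
                if k + 1 < len(seqs):
                    next_round.append(merge_two(seqs[k], seqs[k + 1], descending))
                else:
                    next_round.append(seqs[k])  # an unpaired sequence is carried to the next round
            seqs = next_round
        return seqs[0] if seqs else []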
In the present scenario example, in order to implement the above method, the following program implementation modules may be constructed based on the processing logic of the server, as shown in fig. 6: a sequence grouping module, a sorting module group, a sequence cache module, and a sequence merging module group.
The sequence grouping module is specifically used for receiving sequence data input from the outside or from an on-chip buffer, grouping the input sequence according to a pre-configured grouping interval parameter, and distributing the grouped subsequences in turn to sorting module 0, sorting module 1, ..., sorting module N for sorting. If grouping of the input sequence is not yet finished after a subsequence has been distributed to sorting module N, distribution wraps around and continues from sorting module 0 once that module has finished sorting its previous subsequence.
The sorting module group may specifically include a plurality of sorting modules, each of which sorts an input sequence of a specific length; typically the supported length is the same as the grouping interval of the sequence grouping module. The specific implementation of each sorting module may vary and is not particularly limited; it may, for example, use bitonic sort, quicksort, bubble sort, or the like. Multiple sorting modules working in parallel can greatly increase the sorting speed.
The sequence cache module stores the sorted data sequences output by the sorting module group before sequence merging is performed. Likewise, when multiple subsequences still exist after the output of the subsequent sequence merging modules and further merging is required, the output intermediate results can also be temporarily stored in the on-chip sequence cache module. The on-chip sequence cache module improves data access efficiency and helps improve the overall performance of the system; when the space of the on-chip sequence cache module is full, part of the data can be stored on an off-chip memory device with larger capacity, such as DDR or SSD.
The sequence merging module group may specifically include a plurality of sequence merging modules, each of which can merge two independent ordered sequences and output one merged ordered sequence (which, after the final round, is the sorted target data sequence), ensuring in a pipelined manner that one data element is output per clock cycle. The plurality of sequence merging modules work in parallel and can efficiently merge multiple subsequences.
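Putting the four modules together at a purely functional level, and assuming the helper functions sketched earlier in this document (group_sequence, sort_subsequences_in_parallel, merge_all) are in scope, the overall flow might be approximated as follows; this models only the data flow between the modules, not the clocked, one-element-per-cycle behavior of the hardware pipeline.

    # Minimal functional sketch of the module pipeline: grouping -> parallel sorting -> merging.
    # It reuses the illustrative helpers sketched earlier and, for brevity, ignores the
    # capacity management performed by the sequence cache module.
    def sort_target_sequence(target_sequence, partition_param):
        subsequences = group_sequence(target_sequence, partition_param)   # sequence grouping module
        ordered = sort_subsequences_in_parallel(subsequences)             # sorting module group
        # intermediate results would be held in the sequence cache module between merge rounds
        return merge_all(ordered)                                         # sequence merging module group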
In this scenario example, when implemented specifically, the server may perform a specific sorting operation on the sequence shown in fig. 5 in the database based on the foregoing method.
Specifically, fig. 5 shows that the sorting process is performed on a sequence Table0 containing 16 data elements.
According to the sequence grouping manner described above, the original input data sequence can be equally divided by the sequence grouping module into four subsequences: Table0_0, Table0_1, Table0_2, and Table0_3.
The four divided subsequences are then distributed to four different sorting modules for parallel sorting processing, and each sorted subsequence can be temporarily stored in the sequence cache module.
Then, the sorted subsequences Table0_0 and Table0_1 can be distributed to the same sequence merging module for merging, yielding an intermediate result Result0_0. Meanwhile, the sorted subsequences Table0_2 and Table0_3 can be distributed to another sequence merging module for merging, yielding another intermediate result Result0_1. Because two intermediate results still exist after this round of merging and need to be merged further in the same manner, the intermediate results Result0_0 and Result0_1 can be cached in the on-chip sequence cache module to facilitate subsequent reading.
Finally, a sequence merging module is used to merge the intermediate results Result0_0 and Result0_1 once more to obtain the final Result0 (i.e., the sorted sequence Table0).
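Since the concrete values of Table0 are given only in fig. 5 and are not reproduced in the text, the scenario can only be mirrored here with made-up values; under that assumption, the end-to-end sketch above could be exercised as follows.

    # Illustrative run mirroring the fig. 5 scenario with 16 made-up data values:
    # four subsequences of length 4 are sorted in parallel, merged pairwise into
    # Result0_0 and Result0_1, and merged once more into the final Result0.
    table0 = [9, 3, 12, 7, 1, 15, 6, 10, 4, 14, 2, 8, 16, 5, 11, 13]
    result0 = sort_target_sequence(table0, partition_param=4)
    # result0 -> [1, 2, 3, ..., 16]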
The above scenario example verifies that the data sorting processing method provided in the embodiments of the present application splits a relatively complex target data sequence into multiple subsequences, sorts the multiple subsequences simultaneously in a parallel processing manner to obtain the corresponding sorted subsequences, and then merges the sorted subsequences pairwise to obtain the sorted target data sequence, thereby completing the sorting processing of the relatively complex target data sequence and effectively solving the technical problems of low sorting efficiency and long processing time in existing methods. Sorting efficiency is improved, the time a user waits when querying and accessing the database is reduced, and the user experience is improved.
Although the present specification provides method steps as described in the examples or flowcharts, additional or fewer steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one manner of performing the steps in a multitude of orders and does not represent the only order of execution. When an apparatus or client product in practice executes, it may execute sequentially or in parallel (e.g., in a parallel processor or multithreaded processing environment, or even in a distributed data processing environment) according to the embodiments or methods shown in the figures. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the recited elements is not excluded. The terms first, second, etc. are used to denote names, but not any particular order.
Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer readable program code, the same functionality can be implemented by logically programming method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be considered as a hardware component, and the means included therein for performing the various functions may also be considered as a structure within the hardware component. Or even means for performing the functions may be regarded as being both a software module for performing the method and a structure within a hardware component.
This description may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
From the above description of the embodiments, it is clear to those skilled in the art that the present specification can be implemented by software plus necessary general hardware platform. With this understanding, the technical solutions in the present specification may be essentially embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a mobile terminal, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments in the present specification.
The embodiments in the present specification are described in a progressive manner, and the same or similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. The description is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.
While the specification has been described with examples, those skilled in the art will appreciate that there are numerous variations and permutations of the specification that do not depart from the spirit of the specification, and it is intended that the appended claims include such variations and modifications that do not depart from the spirit of the specification.

Claims (19)

1. A data sorting processing apparatus, comprising: a sequence grouping module, a sorting module group and a merging module, wherein the sequence grouping module is connected with the sorting module group, and the merging module is connected with the sorting module group;
the sequence grouping module is used for dividing a target data sequence in the accessed database into a plurality of subsequence groups to obtain a plurality of subsequences; wherein the target data sequence comprises a plurality of data elements;
the sorting module group comprises a plurality of parallel sorting modules, and is used for acquiring the plurality of subsequences and sorting the subsequences in parallel through the plurality of parallel sorting modules to obtain a plurality of corresponding sorted subsequences;
and the merging module is used for merging the plurality of sorted subsequences according to a preset rule to obtain a sorted target data sequence.
2. The apparatus according to claim 1, wherein the sorting modules are respectively configured with a preset sorting algorithm, and the sorting modules perform sorting processing on the subsequences according to the preset sorting algorithm.
3. The apparatus of claim 2, wherein the preset sorting algorithm comprises at least one of: a bitonic sorting algorithm, a quicksort algorithm, a bubble sort algorithm.
4. The apparatus of claim 1, wherein the merge module comprises a plurality of parallel merge sub-modules.
5. The apparatus according to claim 4, wherein the merging module is specifically configured to divide the plurality of sorted subsequences into a plurality of merging groups, and allocate the plurality of merging groups to the plurality of parallel merging sub-modules; wherein each merging group comprises two sorted subsequences;
the merging module compares, through the plurality of merging sub-modules respectively, the data elements contained in the two sorted subsequences in the same merging group, so as to merge the two sorted subsequences in the same merging group into a sorted group sequence; and the merging module merges the sorted group sequences to obtain a sorted target data sequence.
6. The apparatus according to claim 1, wherein a sub-sequence buffer module is further connected between the sorting module group and the merging module, and the sub-sequence buffer module is configured to buffer a plurality of sorted sub-sequences.
7. The apparatus of claim 6, wherein the sub-sequence caching module comprises an on-chip memory, and wherein the sub-sequence caching module is configured to cache the plurality of ordered sub-sequences in the on-chip memory.
8. The apparatus of claim 7, wherein the sub-sequence caching module further comprises a DDR memory and/or an SSD memory, and the sub-sequence caching module caches the plurality of ordered sub-sequences in the DDR memory and/or the SSD memory if a storage space of the on-chip memory does not meet a storage requirement.
9. The apparatus according to claim 1, wherein the sequence grouping module is specifically configured to determine, according to the processing performance of the sorting module, the number of data elements that match a single sorting process as the partitioning parameter; equally dividing the target data sequence into a plurality of subsequence groups according to the input time of the data elements, wherein the number of the data elements contained in the subsequence groups is equal to or less than the dividing parameter; and obtaining a corresponding subsequence according to the data elements in the subsequence group.
10. The apparatus of claim 9, wherein the subsequence length supported by the sorting module matches the partition parameter.
11. A data sorting processing method is characterized by comprising the following steps:
acquiring a target data sequence in a database, wherein the target data sequence comprises a plurality of data elements;
dividing a plurality of data elements in the target data sequence into a plurality of subsequence groups to obtain a plurality of subsequences;
sequencing the plurality of subsequences in parallel to obtain a plurality of sequenced subsequences;
and combining the plurality of sequenced subsequences according to a preset rule to obtain a sequenced target data sequence.
12. The method of claim 11, wherein dividing the plurality of data elements in the target data sequence into a plurality of subsequence groups, resulting in a plurality of subsequences, comprises:
determining the number of data elements matched with the single sorting processing as a dividing parameter according to the processing efficiency of the sorting module;
equally dividing the target data sequence into a plurality of subsequence groups according to the input time of the data elements, wherein the number of the data elements contained in the subsequence groups is equal to or less than the dividing parameter;
and obtaining a corresponding subsequence according to the data elements in the subsequence group.
13. The method of claim 11, wherein performing the ordering process on the plurality of subsequences in parallel comprises:
performing parallel sorting processing on the plurality of subsequences according to a preset sorting algorithm, wherein the preset sorting algorithm comprises at least one of the following: a bitonic sorting algorithm, a quicksort algorithm, a bubble sort algorithm.
14. The method of claim 11, wherein merging the plurality of ordered subsequences according to a preset rule to obtain an ordered target data sequence comprises:
dividing the plurality of ordered subsequences into a plurality of combining groups, wherein each combining group comprises two ordered subsequences;
merging the two sorted subsequences in the same merge group into a sorted group sequence by comparing the data elements contained in the two sorted subsequences in the same merge group;
and combining the sorted group sequences to obtain a sorted target data sequence.
15. The method of claim 11, further comprising:
storing the ordered subsequence on an on-chip memory.
16. The method of claim 15, wherein the ordered subsequence is stored on a DDR memory, or an SSD memory, if the on-chip memory storage space does not meet storage requirements.
17. The method of claim 11, wherein obtaining a target data sequence in a database comprises:
receiving a data query request of a user for a database;
and acquiring a related target data sequence from a database according to the data query request.
18. The method of claim 17, wherein after combining the plurality of ordered subsequences according to a preset rule to obtain an ordered target data sequence, the method further comprises:
and displaying the sorted target data sequence to a user.
19. A computer-readable storage medium having stored thereon computer instructions, wherein the instructions, when executed, implement the steps of the method of any one of claims 11 to 18.
CN202010573219.6A 2020-06-22 2020-06-22 Data sorting processing device, method and storage medium Pending CN111913955A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010573219.6A CN111913955A (en) 2020-06-22 2020-06-22 Data sorting processing device, method and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010573219.6A CN111913955A (en) 2020-06-22 2020-06-22 Data sorting processing device, method and storage medium

Publications (1)

Publication Number Publication Date
CN111913955A true CN111913955A (en) 2020-11-10

Family

ID=73226173

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010573219.6A Pending CN111913955A (en) 2020-06-22 2020-06-22 Data sorting processing device, method and storage medium

Country Status (1)

Country Link
CN (1) CN111913955A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6260036B1 (en) * 1998-05-07 2001-07-10 Ibm Scalable parallel algorithm for self-organizing maps with applications to sparse data mining problems
CN102750131A (en) * 2012-06-07 2012-10-24 中国科学院计算机网络信息中心 Graphics processing unit (GPU) oriented bitonic merge sort method
CN104123304A (en) * 2013-04-28 2014-10-29 国际商业机器公司 Data-driven parallel sorting system and method
US20140324890A1 (en) * 2013-04-28 2014-10-30 International Business Machines Corporation Data Driven Parallel Sorting System and Method
CN103530084A (en) * 2013-09-26 2014-01-22 北京奇虎科技有限公司 Data parallel sequencing method and system
CN107077488A (en) * 2014-10-07 2017-08-18 甲骨文国际公司 It is parallel to merge
CN109145051A (en) * 2018-07-03 2019-01-04 阿里巴巴集团控股有限公司 The data summarization method and device and electronic equipment of distributed data base

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112861145A (en) * 2021-01-06 2021-05-28 华控清交信息科技(北京)有限公司 Data processing method and device and data processing device
CN112861145B (en) * 2021-01-06 2023-12-12 华控清交信息科技(北京)有限公司 Data processing method and device for data processing
CN112807697A (en) * 2021-01-28 2021-05-18 北京达佳互联信息技术有限公司 List generation method and device, electronic equipment and storage medium
WO2022160702A1 (en) * 2021-01-28 2022-08-04 北京达佳互联信息技术有限公司 List generation method and apparatus
CN115599541A (en) * 2021-02-25 2023-01-13 华为技术有限公司(Cn) Sorting device and method
CN112947890A (en) * 2021-03-09 2021-06-11 中科驭数(北京)科技有限公司 Merging and sorting method and device
CN112947890B (en) * 2021-03-09 2021-11-02 中科驭数(北京)科技有限公司 Merging and sorting method and device
CN113259265A (en) * 2021-04-30 2021-08-13 阿里巴巴新加坡控股有限公司 Message processing method and device, electronic equipment and storage medium
CN113259265B (en) * 2021-04-30 2023-07-25 阿里云计算有限公司 Message processing method and device, electronic equipment and storage medium
CN113672530B (en) * 2021-10-21 2022-02-18 苏州浪潮智能科技有限公司 Server and sequencing equipment thereof
CN113672530A (en) * 2021-10-21 2021-11-19 苏州浪潮智能科技有限公司 Server and sequencing equipment thereof
WO2023071566A1 (en) * 2021-10-25 2023-05-04 腾讯科技(深圳)有限公司 Data processing method and apparatus, computer device, computer-readable storage medium, and computer program product
CN114077581A (en) * 2021-11-24 2022-02-22 北京白板科技有限公司 Database based on data aggregation storage mode
CN114546943A (en) * 2022-02-21 2022-05-27 重庆科创职业学院 Database file sorting optimization method and device based on multi-process call
CN114282255A (en) * 2022-03-04 2022-04-05 支付宝(杭州)信息技术有限公司 Sorting sequence merging method and system based on secret sharing
CN114817274A (en) * 2022-07-01 2022-07-29 长沙广立微电子有限公司 Wafer data processing method and device, electronic device and storage medium
CN114817274B (en) * 2022-07-01 2022-09-16 长沙广立微电子有限公司 Wafer data processing method and device, electronic device and storage medium
WO2024088231A1 (en) * 2022-10-28 2024-05-02 华为技术有限公司 Signal processing method and apparatus, and device, medium and chip
CN116361319A (en) * 2023-05-17 2023-06-30 山东浪潮科学研究院有限公司 Database query method, device, equipment and storage medium
CN116361319B (en) * 2023-05-17 2023-08-29 山东浪潮科学研究院有限公司 Database query method, device, equipment and storage medium
CN117112238A (en) * 2023-10-23 2023-11-24 天津南大通用数据技术股份有限公司 High-performance merging method in OLAP database sorting operator
CN117112238B (en) * 2023-10-23 2024-01-30 天津南大通用数据技术股份有限公司 High-performance merging method in OLAP database sorting operator

Similar Documents

Publication Publication Date Title
CN111913955A (en) Data sorting processing device, method and storage medium
US8381230B2 (en) Message passing with queues and channels
US20160132541A1 (en) Efficient implementations for mapreduce systems
US20130263117A1 (en) Allocating resources to virtual machines via a weighted cost ratio
CN107153643B (en) Data table connection method and device
US10114866B2 (en) Memory-constrained aggregation using intra-operator pipelining
CN111324427B (en) Task scheduling method and device based on DSP
CN107969153B (en) Resource allocation method and device and NUMA system
CN111949681A (en) Data aggregation processing device and method and storage medium
CN110069557B (en) Data transmission method, device, equipment and storage medium
CN112882663B (en) Random writing method, electronic equipment and storage medium
Ma et al. Dependency-aware data locality for MapReduce
US7890705B2 (en) Shared-memory multiprocessor system and information processing method
CN111813517A (en) Task queue allocation method and device, computer equipment and medium
US20200117505A1 (en) Memory processor-based multiprocessing architecture and operation method thereof
US20110246582A1 (en) Message Passing with Queues and Channels
CN110597627A (en) Database operation acceleration device and method based on virtual FPGA
CA3094727A1 (en) Transaction processing method and system, and server
JP5043166B2 (en) Computer system, data search method, and database management computer
Wang et al. Improved intermediate data management for mapreduce frameworks
CN107103095A (en) Method for computing data based on high performance network framework
CN107291483A (en) Intelligence deletes the method and electronic equipment of application program
CN111104527A (en) Rich media file parsing method
US20140237149A1 (en) Sending a next request to a resource before a completion interrupt for a previous request
CN110489222A (en) Method for scheduling task, system, cluster server and readable storage medium storing program for executing

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 100089 room 801, 8 / F, building 3, yard 1, 81 Beiqing Road, Haidian District, Beijing

Applicant after: YUSUR TECHNOLOGY Co.,Ltd.

Address before: 100190 scientific research complex building, Institute of computing technology, Chinese Academy of Sciences, no.6, Academy of Sciences South Road, Haidian District, Beijing

Applicant before: YUSUR TECHNOLOGY Co.,Ltd.

RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20201110