WO2022017167A1 - Information processing method and system, electronic device, and storage medium - Google Patents

Information processing method and system, electronic device, and storage medium

Info

Publication number
WO2022017167A1
WO2022017167A1 PCT/CN2021/104640 CN2021104640W WO2022017167A1 WO 2022017167 A1 WO2022017167 A1 WO 2022017167A1 CN 2021104640 W CN2021104640 W CN 2021104640W WO 2022017167 A1 WO2022017167 A1 WO 2022017167A1
Authority
WO
WIPO (PCT)
Prior art keywords
value
keyword
process group
information
ordered
Prior art date
Application number
PCT/CN2021/104640
Other languages
French (fr)
Chinese (zh)
Inventor
赵彤
李锐喆
Original Assignee
北京卡普拉科技有限公司
Priority date
Filing date
Publication date
Application filed by 北京卡普拉科技有限公司
Publication of WO2022017167A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5018Thread allocation

Definitions

  • the present disclosure relates to the technical field of information processing, and in particular, to an information processing method, system, electronic device and storage medium.
  • Keywords can be integers or floating-point numbers of any number of digits, strings of any length, and so on.
  • a set of information is organized according to some order of keywords. Sorting is a method that can effectively organize information. It can adjust a group of information into a group of information with ordered keywords; then, the required information can be quickly found based on the keywords.
  • the present disclosure provides an information processing method, system, electronic device and storage medium, which can complete information processing of order adjustment of massive information among multiple computing nodes, thereby improving processing efficiency.
  • the present disclosure provides an information processing method, comprising: acquiring information to be processed of a current process group, wherein the current process group includes at least two different processes, each piece of the information to be processed includes a keyword for marking or searching for the information, and all the keywords of the information to be processed have been scattered and stored in the processes of the current process group, forming a keyword sequence in each process; if there is a process whose keyword sequence is an unordered keyword sequence, processing the keyword sequence of that process into an ordered keyword sequence; when load adjustment processing is required, performing inter-process load adjustment processing on the current process group to balance the load among the processes; calculating the common demarcation value of the current process group; performing keyword merge sort processing between every two processes in the current process group according to the common demarcation value; and storing the results of the keyword merge sort processing in a scattered manner in the processes of the current process group, so that each process stores an ordered keyword sequence and any keyword on the i-th process is less than any keyword on the (i+1)-th process.
  • the present disclosure provides an information processing system, comprising: an information acquisition module configured to acquire information to be processed of a current process group, wherein the current process group includes at least two different processes, each piece of the information to be processed includes a keyword for marking or searching for the information, and all the keywords of the information to be processed have been scattered and stored in the processes of the current process group, forming a keyword sequence in each process; the first processing module is configured to process the keyword sequence of a process into an ordered keyword sequence if the keyword sequence of that process is an unordered keyword sequence; the second processing module is configured to perform inter-process load adjustment processing on the current process group when load adjustment processing is required, so as to balance the load among the processes;
  • the calculation module is configured to calculate the common demarcation value of the current process group;
  • the third processing module is configured to perform keyword merge sort processing between every two processes of the current process group according to the common demarcation value;
  • the storage module is configured to store the results of the keyword merge sort processing in a scattered manner in the processes of the current process group, so that each process stores an ordered keyword sequence and any keyword on the i-th process is less than any keyword on the (i+1)-th process;
  • the control module is configured to control the invocation of the information acquisition module, the first processing module, the second processing module, the computing module, the third processing module, and the storage module.
  • the present disclosure provides an electronic device, including a memory and a processor, where a computer program is stored in the memory, and when the computer program is executed by the processor, the information processing method according to the first aspect is implemented.
  • the present disclosure provides a storage medium on which a computer program is stored, and when the computer program is executed by one or more processors, implements the information processing method according to the first aspect.
  • FIG. 1 is a flowchart of an information processing method provided by Embodiment 1 of the present disclosure
  • FIG. 2 is an example of a process group provided by Embodiment 1 of the present disclosure
  • FIG. 3(a) is an example of load adjustment of a process group provided by Embodiment 1 of the present disclosure, where is a process pair of the process group;
  • FIG. 3(b) is a result of load adjustment processing of a process group provided by Embodiment 1 of the present disclosure
  • FIG. 4 is a calculation example of a candidate common demarcation value provided by Embodiment 1 of the present disclosure
  • FIG. 5 is a block diagram of an information processing system provided by Embodiment 2 of the present disclosure.
  • quick sort divides a set of data (or keywords) into two independent parts in one pass, in which all data (or keywords) in one part are less than the cutoff value and all data (or keywords) in the other part are greater than or equal to the cutoff value, and then recursively performs quick sort on these two parts in the same way. The choice of cutoff value therefore has an important impact on the complexity of quick sort: the average complexity is O(N'*log(N')), but in the worst case the complexity can reach O(N'*N').
  • merge sort: the basic idea of merge sort is to combine several ordered subsequences of data (or keywords) to obtain a completely ordered sequence of data (or keywords); that is, each subsequence is first made ordered, and then the subsequence segments are made ordered with respect to one another.
  • the complexity of merge sort is stable at O(N'*log(N')), but it requires an extra storage space compared to quicksort.
  • the present disclosure provides a method that can complete the sequential adjustment of massive information among multiple computing nodes.
  • the information processing scheme can improve the processing efficiency.
  • FIG. 1 shows a flowchart of an information processing method provided by this embodiment. It should be noted that the information processing method provided by this embodiment of the present disclosure is not limited to the specific sequence shown in FIG. 1 and described below. It should be understood that in other embodiments, the order of some steps in the information processing method may be exchanged according to actual needs, some of the steps may be omitted or deleted, or some of the steps may be performed simultaneously.
  • the method includes the following steps S1 to S6 .
  • Step S1 Acquire the information to be processed of the current process group, wherein the current process group includes at least two different processes, each piece of the information to be processed includes a keyword for marking or searching for the information, and all the keywords of the information to be processed have been scattered and stored in the processes of the current process group, forming a keyword sequence in each process. It can be understood that a keyword can be an integer or floating-point number of any number of digits, a string of any length, or take other forms, which are not limited here.
  • Step S2 If there is a process in which the keyword sequence is a non-ordered keyword sequence, the keyword sequence of the process is processed as an ordered keyword sequence.
  • the non-ordered key sequence can be processed into an ordered key sequence by using an existing sorting method, such as quick sort, merge sort or heap sort.
  • the sorting of the local keyword sequence stored in a process can be completed by an existing thread-level parallel sorting program, or by an existing sorting program that uses processor vector instructions, to form an ordered keyword sequence on the process.
  • Step S3 When the load adjustment process needs to be performed, the inter-process load adjustment process is performed on the current process group, so as to balance the load among the processes.
  • Step S4 Calculate the common demarcation value of the current process group.
  • Step S5 Perform keyword merge sorting processing between every two processes in the current process group according to the common demarcation value.
  • Step S6 Store the results of the keyword merge sort processing in a scattered manner in the processes of the current process group, so that each process stores an ordered keyword sequence and any keyword on the i-th process is less than any keyword on the (i+1)-th process.
  • in this embodiment, the keyword sequence in each process is first processed from unordered to ordered; when load adjustment processing is required between processes, load adjustment processing is performed to balance the load among the processes; then, based on the calculated common demarcation value, fast merge sort processing is performed between two processes in the current process group, combining the respective ordered keyword sequences on the two processes.
  • obtaining the ordered keywords in this way completes the information processing of one process group.
  • the method further includes the following steps S7-S8.
  • Step S7 Obtain the first sub-process group and the second sub-process group according to the common demarcation value, wherein all the keywords in the first sub-process group are less than the common demarcation value, and all the keywords in the second sub-process group are greater than the common demarcation value.
  • the current process group is further divided according to the common demarcation value into two sub-process groups, and the next round of information processing then continues. That is, each sub-process group that includes at least two processes is taken as the current process group, and steps S1 to S6 are executed on it.
  • Step S8 When the first sub-process group or the second sub-process group includes at least two processes, execute the acquisition of pending information of the current process group, so as to perform information processing on the sub-process group.
  • step S3 may include the following sub-steps S301-S304.
  • Step S301 Obtain the load information of all processes in the current process group, and collect them in the main process of the current process group; wherein, the load information includes: the load of each process (that is, the number of keywords in the keyword sequence of each process) , the maximum load of all processes, and the average load of all processes.
  • Step S302 Sort the loads of all processes in the current process group in the main process, obtain an ordered array of loads of the current process group, and send the ordered array of loads to all processes in the current process group.
  • the sorting processing is performed on the loads of all processes in the current process group, which may be sorted according to the load from small to large, or may be sorted according to the load from large to small, which is not limited here.
  • Step S303 according to the load information of the current process group, determine whether the load adjustment process needs to be performed.
  • in step S303, the load imbalance ratio is calculated according to the load information of the current process group.
  • the load imbalance ratio is the maximum load of all processes in the process group divided by the average load of all processes in the process group. The larger the load imbalance ratio, the more unbalanced the load; when the load imbalance ratio is close to 1, the load is well balanced. Accordingly, when the load imbalance ratio is greater than the preset threshold, load adjustment processing is required.
  • the preset threshold may be set according to the actual situation, which is not limited here.
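  • The check in step S303 can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function names, the threshold value 1.2, and the example loads are assumptions chosen only to reproduce a maximum load of 12 and a total load of 53 over 7 processes.

```python
from typing import Sequence


def load_imbalance_ratio(loads: Sequence[int]) -> float:
    """Maximum load divided by average load; 1.0 means perfectly balanced."""
    if not loads or sum(loads) == 0:
        raise ValueError("need at least one process holding keywords")
    average = sum(loads) / len(loads)
    return max(loads) / average


def needs_load_adjustment(loads: Sequence[int], threshold: float) -> bool:
    """True when the imbalance ratio exceeds the preset threshold."""
    return load_imbalance_ratio(loads) > threshold


# Hypothetical loads of 7 processes: maximum 12, total 53.
example_loads = [7, 5, 9, 8, 12, 6, 6]
# The hypothetical threshold 1.2 is an assumption; the source leaves it open.
print(load_imbalance_ratio(example_loads), needs_load_adjustment(example_loads, 1.2))
```

  • Because the maximum can never be smaller than the average, the ratio is always at least 1, which is why a value close to 1 indicates a balanced group.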
  • Step S304 When load adjustment processing is required, all processes of the current process group are divided into several process pairs according to the ordered array of loads, and load adjustment processing is performed between the two processes of each process pair, wherein the process with the i-th largest load and the process with the (N-i)-th largest load form a process pair, N being the number of processes in the current process group.
  • the current process group can also be divided into a small-value sub-process group and a large-value sub-process group, wherein the process number of each process in the small-value sub-process group is smaller than that of each process in the large-value sub-process group.
  • the numbers of processes in the small-value and large-value sub-process groups can be as close as possible, or can be determined according to the number of data less than the common demarcation value and the number of data greater than or equal to the common demarcation value obtained by the calculation module 4. For example, when there are 100 processes in the process group, and the numbers of data less than and greater than or equal to the common demarcation value are 450,000 and 550,000 respectively, the small-value sub-process group and the large-value sub-process group should have 45 and 55 processes respectively. If the number of processes in the small-value sub-process group is less than the number of processes in the large-value sub-process group, then when the third processing module 5 is called, a small-value process appears in multiple process pairs.
  • load adjustment processing between the two processes of each process pair includes: the process with the larger load transfers some of its keywords to the process with the smaller load, and the process with the smaller load merges in the received keywords based on merge sort, so that the loads of the two processes in the pair become the same or close.
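  • The pairing and the per-pair adjustment described above can be sketched as below. The pairing rule used here (heaviest with lightest, second heaviest with second lightest, and so on) and the choice of every `step`-th keyword as the periodic selection are assumptions made for illustration; `form_pairs` and `balance_pair` are hypothetical names.

```python
import heapq
from typing import Dict, List, Tuple


def form_pairs(loads: Dict[int, int]) -> List[Tuple[int, int]]:
    """Pair process numbers by load rank: heaviest with lightest, and so on.

    With an odd number of processes the median-load process stays unpaired.
    """
    ranked = sorted(loads, key=lambda pid: loads[pid], reverse=True)
    n = len(ranked)
    return [(ranked[i], ranked[n - 1 - i]) for i in range(n // 2)]


def balance_pair(heavy: List[int], light: List[int]) -> Tuple[List[int], List[int]]:
    """Move keywords from the heavier sorted sequence to the lighter one.

    Keywords are picked at a periodic interval, and the lighter process
    merges them into its own sorted sequence.
    """
    surplus = (len(heavy) - len(light)) // 2
    if surplus <= 0:
        return list(heavy), list(light)
    step = len(heavy) // surplus  # periodic interval between moved keywords
    moved_idx = set(range(step - 1, step * surplus, step))
    moved = [k for i, k in enumerate(heavy) if i in moved_idx]
    kept = [k for i, k in enumerate(heavy) if i not in moved_idx]
    return kept, list(heapq.merge(light, moved))
```

  • Both returned sequences stay sorted, so no further local sorting is needed before the common demarcation value is computed.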
  • the process group includes 7 processes, namely: process 1 to process 7 .
  • Each process first builds its own load information, a two-tuple consisting of the number of local data (that is, the load of the process) and the process number. Then, in a bottom-up pass along the binary tree, the load information of the processes is merge sorted (with the load as the sort key). After the pass reaches the root node at the top, i.e. process 1, process 1 finds the maximum load from the sorted load information array (process 5 has 12 data) and the total load of all processes (53 data). The average load of all processes is therefore 53/7, and the load imbalance ratio is 12/(53/7), that is, the maximum load of all processes divided by the average load of all processes. When the load imbalance ratio exceeds a preset threshold, load adjustment processing is performed.
  • Figure 3 gives an example of the corresponding load adjustment. There are 3 process pairs in total, as shown in Figure 3(a) (since the total number of processes, 7, is odd, process 7, which has the 4th-largest load, is not paired). The data to be moved to the other process of a pair can be selected at periodic intervals, for example, in Figure 3(a), 77, 232, 973, 1404 of process 5; 33, 99, 887 of process 6; and 134, 311, 832 of process 4.
  • the result after the load adjustment process is shown in Figure 3(b).
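  • The bottom-up gathering of load information in this example can be simulated in a single program as follows. This is only a sketch: it assumes a heap-style binary tree in which process p has children 2p and 2p+1 (the actual tree layout of FIG. 2 is not reproduced here), and the loads are hypothetical values chosen so that the maximum is 12 and the total is 53.

```python
import heapq
from typing import Dict, List, Tuple

LoadInfo = Tuple[int, int]  # (load, process number)


def gather_sorted_loads(loads: Dict[int, int], node: int = 1) -> List[LoadInfo]:
    """Merge the sorted load arrays of a node's subtree, bottom-up.

    Assumes processes are numbered 1..N and node p has children 2p and 2p+1.
    """
    if node not in loads:
        return []
    own = [(loads[node], node)]
    left = gather_sorted_loads(loads, 2 * node)
    right = gather_sorted_loads(loads, 2 * node + 1)
    return list(heapq.merge(own, left, right))


def summarize(sorted_loads: List[LoadInfo]) -> Tuple[int, float]:
    """Return (maximum load, average load) from the array held by the root."""
    max_load = sorted_loads[-1][0]
    average = sum(load for load, _ in sorted_loads) / len(sorted_loads)
    return max_load, average


# Hypothetical loads keyed by process number.
example = {1: 7, 2: 5, 3: 9, 4: 8, 5: 12, 6: 6, 7: 6}
root_view = gather_sorted_loads(example)
print(root_view[-1], summarize(root_view))
```

  • In a real deployment each merge would happen on a different process after receiving its children's arrays; the recursion here only mirrors that order of operations.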
  • step S4 may include the following sub-steps S401-S402.
  • Step S401 Obtain the information of the ordered keyword sequence of each process of the current process group, and collect it in the main process of the current process group, wherein the information of the ordered keyword sequence includes: the number of keywords, and the keywords at specific positions.
  • Step S402 According to the information of the ordered key sequence of each process of the current process group, the main process calculates the candidate common demarcation value.
  • the keywords at specific positions are the keywords at the K-1 positions that divide the ordered keyword sequence into K segments; the keyword at the m-th such position is called the m/K quantile keyword, where m ≥ 1 and m < K. According to the information of the ordered keyword sequence of each process in the current process group, the main process calculates the common demarcation value.
  • This sub-step may further include the following sub-steps S402-1 to S402-2.
  • Step S402-1 Calculate the weight of each m/K quantile keyword of the ordered keyword sequence of each process.
  • the weight of the m/K quantile keyword is: the product of the value of the m/K quantile keyword and the number of keywords in the ordered keyword sequence.
  • Step S402-2 Calculate the m/K candidate common demarcation value corresponding to each m/K quantile keyword. The m/K candidate common demarcation value is: the sum of the weights of the m/K quantile keywords of the ordered keyword sequences of all processes in the current process group, divided by the total number of keywords in the ordered keyword sequences of all processes in the current process group.
  • when the keyword at a specific position is the median, the median weight is: the product of the value of the median and the number of keywords in the ordered keyword sequence.
  • the median weight is calculated as follows:
  • median weight = median value * number of keywords in the ordered keyword sequence.
  • the median candidate common demarcation value corresponding to the median is: the sum of the median weights of the ordered keyword sequences of all processes in the current process group, divided by the total number of keywords in the ordered keyword sequences of all processes in the current process group.
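  • The weighted calculation of steps S402-1 and S402-2 can be sketched as below. The exact index used for the m/K quantile keyword (`(m * n) // k`) is an assumption, as are the function names; with K = 2 the single candidate is the median-based value described above.

```python
from typing import List, Sequence


def quantile_keywords(seq: Sequence[float], k: int) -> List[float]:
    """Keywords at the K-1 positions that cut a sorted sequence into K segments."""
    n = len(seq)
    return [seq[min(n - 1, (m * n) // k)] for m in range(1, k)]


def candidate_demarcation_values(sequences: Sequence[Sequence[float]], k: int) -> List[float]:
    """For each m, sum (quantile keyword * sequence length) over all processes
    and divide by the total number of keywords in the process group."""
    total = sum(len(s) for s in sequences)
    per_process = [(quantile_keywords(s, k), len(s)) for s in sequences if s]
    return [
        sum(quantiles[m] * count for quantiles, count in per_process) / total
        for m in range(k - 1)
    ]


# Two hypothetical processes holding 4 and 6 sorted keywords.
group = [[1, 2, 3, 4], [10, 20, 30, 40, 50, 60]]
print(candidate_demarcation_values(group, 2))  # median-based candidate
print(candidate_demarcation_values(group, 4))  # three quartile-based candidates
```

  • Weighting by sequence length means a process holding more keywords pulls the candidate toward its own quantile, which is the stated purpose of the weight.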
  • the above step S402 may further include the following sub-step S402-3.
  • Step S402-3 the main process sends K-1 m/K candidate common demarcation values to all processes.
  • the current m/K candidate common demarcation value can be directly determined as the common demarcation value of the current process group.
  • alternatively, a threshold can be set: when the calculated m/K candidate common demarcation value reaches the threshold, the current m/K candidate common demarcation value is determined as the common demarcation value of the current process group.
  • step S402 may further include the following sub-steps S402-4 to S402-6.
  • Step S402-4 Count the first quantity value and the second quantity value of each m/K candidate common demarcation value over all processes of the current process group, wherein the first quantity value is the number of keywords that are less than the m/K candidate common demarcation value, and the second quantity value is the number of keywords that are greater than or equal to the m/K candidate common demarcation value.
  • Step S402-5 If there is an m/K candidate common demarcation value for which the ratio between the first quantity value and the second quantity value does not exceed the preset value, determine that m/K candidate common demarcation value as the common demarcation value of the current process group.
  • Step S402-6 When the ratio between the first quantity value and the second quantity value of each m/K candidate common demarcation value exceeds the preset value, save the K-1 m/K candidate common demarcation values as existing candidate common demarcation values.
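  • Steps S402-4 to S402-6 can be sketched as follows. The source does not say how the ratio is formed, so this sketch assumes it is the larger quantity value divided by the smaller one; `count_split`, `is_balanced`, and `pick_demarcation` are hypothetical names.

```python
from bisect import bisect_left
from typing import Optional, Sequence, Tuple


def count_split(sequences: Sequence[Sequence[float]], value: float) -> Tuple[int, int]:
    """Return (keywords < value, keywords >= value) over all sorted sequences."""
    below = sum(bisect_left(s, value) for s in sequences)
    total = sum(len(s) for s in sequences)
    return below, total - below


def is_balanced(below: int, at_or_above: int, max_ratio: float) -> bool:
    """Assumed criterion: larger count / smaller count must not exceed max_ratio."""
    small, large = sorted((below, at_or_above))
    return small > 0 and large / small <= max_ratio


def pick_demarcation(sequences: Sequence[Sequence[float]],
                     candidates: Sequence[float],
                     max_ratio: float) -> Optional[float]:
    """Return the first acceptable candidate, or None if every candidate fails."""
    for candidate in candidates:
        if is_balanced(*count_split(sequences, candidate), max_ratio):
            return candidate
    return None
```

  • A return value of `None` corresponds to the case of step S402-6, in which all candidates are kept as existing candidates for later optimization.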
  • step S402 may further include the following sub-steps S402-7 to S402-9.
  • Step S402-7 Optimize the existing candidate common demarcation values: select an adjacent first reference candidate common demarcation value and second reference candidate common demarcation value from the existing candidate common demarcation values, wherein the first quantity value of the first reference candidate common demarcation value is smaller than its second quantity value, and the first quantity value of the second reference candidate common demarcation value is greater than its second quantity value; calculate a value between the first reference candidate common demarcation value and the second reference candidate common demarcation value as the optimized candidate common demarcation value, and count the first quantity value and the second quantity value of the optimized candidate common demarcation value.
  • Step S402-8 When the ratio between the first quantity value and the second quantity value of the optimized candidate common demarcation value does not exceed the preset value, determine the optimized candidate common demarcation value as the common demarcation value of the current process group;
  • Step S402-9 When the ratio between the first quantity value and the second quantity value of the optimized candidate common demarcation value exceeds the preset value, add the optimized candidate common demarcation value to the existing candidate common demarcation values and execute step S402-7 again.
  • when the difference between the first quantity value and the second quantity value does not exceed the preset value, the load is relatively balanced under the current candidate common demarcation value; when the difference between the first quantity value and the second quantity value exceeds the preset value, the load is unbalanced under the current candidate common demarcation value.
  • in that case, the existing candidate common demarcation values need to be optimized. Since every candidate common demarcation value is recorded when it is calculated, an adjacent first reference candidate common demarcation value and second reference candidate common demarcation value can be selected from the existing candidate common demarcation values for the optimization.
  • for example, the adjacent first reference candidate common demarcation value 10000 and second reference candidate common demarcation value 20000 are selected from the existing candidate common demarcation values. With the existing candidate 10000, the numbers of keywords less than and greater than 10000 are 1000 (the first quantity value) and 2000 (the second quantity value) respectively; with the existing candidate 20000, the numbers of keywords less than and greater than 20000 are 2000 (the first quantity value) and 1000 (the second quantity value) respectively. The average of 10000 and 20000, namely 15000 (a value between the first reference candidate common demarcation value and the second reference candidate common demarcation value), can then be taken as the optimized candidate common demarcation value.
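  • The optimization loop of steps S402-7 to S402-9 can be sketched as a bisection, using the average of the two reference candidates as in the 10000/20000 example. The acceptance criterion (larger count divided by smaller count) and the `max_rounds` safety bound are assumptions that the source does not specify; `refine_demarcation` is a hypothetical name.

```python
from bisect import bisect_left
from typing import Sequence, Tuple


def count_split(sequences: Sequence[Sequence[float]], value: float) -> Tuple[int, int]:
    """Return (keywords < value, keywords >= value) over all sorted sequences."""
    below = sum(bisect_left(s, value) for s in sequences)
    total = sum(len(s) for s in sequences)
    return below, total - below


def refine_demarcation(sequences: Sequence[Sequence[float]],
                       low: float, high: float,
                       max_ratio: float, max_rounds: int = 64) -> float:
    """Bisect between two adjacent reference candidates.

    `low` has fewer keywords below it than at or above it; `high` has more.
    The midpoint replaces whichever reference lies on the same side of the
    balance point, so the two references remain adjacent and straddle it.
    """
    for _ in range(max_rounds):  # max_rounds is an assumed safety bound
        mid = (low + high) / 2
        below, at_or_above = count_split(sequences, mid)
        small, large = sorted((below, at_or_above))
        if small > 0 and large / small <= max_ratio:
            return mid
        if below < at_or_above:
            low = mid
        else:
            high = mid
    return (low + high) / 2
```

  • Each round halves the interval between the two reference candidates, so the number of extra counting passes grows only logarithmically with the initial gap.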
  • the keyword in a specific position is the median of the ordered keyword sequence.
  • the inter-process transfer of the process group is calculated by the bottom-up transfer and reduction along the binary tree.
  • the root node at the top of the binary tree, that is, process 1 finally obtains (68, 17653), where 68 is the keyword of 6 processes
  • step S5 further includes the following sub-steps S501-S503.
  • Step S501 According to the common demarcation value of the current process group, divide the ordered keyword sequence of each process of the current process group into a first ordered keyword subsequence that is less than the common demarcation value and a second ordered keyword subsequence that is greater than or equal to the common demarcation value.
  • Step S502 Perform an exchange of ordered keyword subsequences between two processes in the current process group, wherein the exchange includes: the process with the smaller number transfers its second ordered keyword subsequence to the process with the larger number, and the process with the larger number transfers its first ordered keyword subsequence to the process with the smaller number.
  • Step S503 Each process of the current process group merges the two ordered keyword subsequences it now holds into one ordered keyword sequence, so that each process stores an ordered keyword sequence in a scattered manner, and any keyword on the i-th process is less than any keyword on the (i+1)-th process.
  • when performing keyword merge sort processing between every two processes in the current process group, each process first divides its ordered keyword sequence, according to the common demarcation value, into two ordered keyword subsequences: one less than the common demarcation value and one greater than or equal to it. The two processes then exchange ordered keyword subsequences: the process with the smaller number transfers the ordered keyword subsequence whose values are greater than or equal to the common demarcation value to the process with the larger number, and the process with the larger number transfers the ordered keyword subsequence whose values are less than the common demarcation value to the process with the smaller number. Finally, each process merges the retained and the newly received ordered keyword subsequences into a new ordered keyword sequence.
  • the common demarcation value is 400, and the ordered data sequence on the two processes (the smaller number is marked as process 1, and the larger number is marked as process 2) is given.
  • Table 2 shows two ordered key subsequences determined by each process according to the common demarcation value.
  • Table 3 shows the situation after the two processes exchange the ordered key subsequence with each other.
  • Table 4 shows that each process obtains a new ordered keyword sequence after merging the two ordered keyword subsequences.
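  • The split, exchange, and merge of steps S501 to S503 can be sketched for one process pair as below. The keyword values are hypothetical and are not the contents of Tables 1 to 4; only the common demarcation value 400 is taken from the example. In a real system the exchange would be inter-process communication rather than a function call.

```python
import heapq
from bisect import bisect_left
from typing import List, Sequence, Tuple


def split_at(seq: Sequence[int], demarcation: int) -> Tuple[List[int], List[int]]:
    """Split a sorted sequence into (< demarcation, >= demarcation)."""
    cut = bisect_left(seq, demarcation)
    return list(seq[:cut]), list(seq[cut:])


def exchange_and_merge(low_proc: Sequence[int], high_proc: Sequence[int],
                       demarcation: int) -> Tuple[List[int], List[int]]:
    """Simulate one pairwise merge sort step.

    The lower-numbered process keeps every keyword below the demarcation
    value, and the higher-numbered process keeps the rest.
    """
    low_small, low_large = split_at(low_proc, demarcation)
    high_small, high_large = split_at(high_proc, demarcation)
    new_low = list(heapq.merge(low_small, high_small))
    new_high = list(heapq.merge(low_large, high_large))
    return new_low, new_high


# Hypothetical sorted keyword sequences on process 1 and process 2.
process_1 = [12, 150, 399, 400, 870]
process_2 = [3, 260, 401, 650, 990]
new_1, new_2 = exchange_and_merge(process_1, process_2, 400)
print(new_1, new_2)
```

  • Because both inputs to each merge are already sorted, the merge is linear in the number of keywords held by the pair.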
  • with the above processing, the loads of the processes remain balanced, and the overall complexity of the information processing is at minimum O(N'*log(N')).
  • the complexity on each process is O(N'*log(N')/N), which already accounts for the fast merge sort and the load exchange between processes.
  • log(N) rounds of sorting are performed between processes, and in each round, inter-process communication is carried out within the process group to transmit and broadcast information related to the common demarcation value and load balancing.
  • the complexity of the related communication on each process is O(log(N)*log(N)).
  • this embodiment provides an information processing system.
  • the system includes: an information acquisition module 1 , a first processing module 2 , a second processing module 3 , a computing module 4 , a third processing module 5 , a storage module 6 and a control module 7 .
  • the information acquisition module 1 is configured to acquire the information to be processed of the current process group, wherein the current process group includes at least two different processes, each piece of the information to be processed includes a keyword for marking or searching for the information, and all the keywords of the information to be processed have been scattered and stored in the processes of the current process group, forming a keyword sequence in each process.
  • the first processing module 2 is configured to process the keyword sequence of the process into an ordered keyword sequence if there is a process whose keyword sequence is a non-ordered keyword sequence.
  • the second processing module 3 is configured to perform inter-process load adjustment processing on the current process group when the load adjustment processing needs to be performed, so as to balance the load among the processes.
  • the calculation module 4 is configured to calculate the common demarcation value of the current process group.
  • the third processing module 5 is configured to perform keyword merge sorting processing between every two processes in the current process group according to the common demarcation value.
  • the storage module 6 is configured to store the results of the keyword merge sort processing in a scattered manner in the processes of the current process group, so that each process stores an ordered keyword sequence and any keyword on the i-th process is less than any keyword on the (i+1)-th process.
  • the control module 7 is configured to control the calling of the information acquisition module 1 , the first processing module 2 , the second processing module 3 , the calculation module 4 , the third processing module 5 , and the storage module 6 .
  • For the implementation of the information acquisition module 1, refer to step S1 in the first embodiment; for the first processing module 2, refer to step S2; for the second processing module 3, refer to step S3; for the calculation module 4, refer to step S4; for the third processing module 5, refer to step S5; and for the storage module 6, refer to step S6. These are not repeated in this embodiment.
  • The system provided by this embodiment is a process-level parallel distributed information processing system.
  • When performing information processing, the control module 7 first calls the first processing module 2 so that the keyword sequence in each process becomes an ordered keyword sequence; then, based on the idea of quick sort, the cooperative sorting between processes is completed by calling the other three modules.
  • The control module 7 calls the second processing module 3 to adjust the distribution of all keywords of the process group among the processes, then calls the calculation module 4 to determine the common demarcation value of the process group, and then calls the third processing module 5 to complete the data exchange and fast merge sorting between the two processes of each pair. This completes one sorting pass and yields the information processing result of the current process group, that is, its ordered keywords. The storage module 6 stores the results of the keyword merge sorting processing in a distributed manner in the processes of the current process group, so that each process stores an ordered keyword sequence and any keyword on the i-th process is smaller than any keyword on the (i+1)-th process.
  • When dividing the process pairs, the current process group can be divided into a small-value sub-process group and a large-value sub-process group, wherein the number of each process in the small-value sub-process group is smaller than the number of each process in the large-value sub-process group. Therefore, in practical applications, after completing one round of sorting, the control module 7 can perform the next round of sorting separately in the small-value sub-process group and the large-value sub-process group.
  • The third processing module 5 is further configured to obtain the first sub-process group and the second sub-process group according to the common demarcation value, wherein all keywords in the first sub-process group are smaller than the common demarcation value and all keywords in the second sub-process group are greater than the common demarcation value.
  • The control module 7 is further configured to call the information acquisition module, the first processing module, the second processing module, the calculation module, the third processing module and the storage module when the first sub-process group or the second sub-process group includes at least two processes, so as to perform information processing on that sub-process group.
  • Each of the above modules can use multiple threads within a process to accelerate computation.
  • This embodiment provides an electronic device, including a memory and a processor, the memory stores a computer program, and when the computer program is executed by the processor, the information processing method according to the first embodiment is implemented.
  • the electronic device may also include a communication component.
  • the processor is configured to execute all or part of the steps in the information processing method in the first embodiment.
  • the memory is used to store various types of data, which may include, for example, instructions for methods in the electronic device, as well as data related to the electronic device.
  • The processor may be implemented as an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor or another electronic component, and is used to perform the information processing method in the first embodiment.
  • The memory can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
  • Communication components are used for wired or wireless communication between electronic devices and other devices.
  • Wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G or 4G, or a combination of one or more of them, so the corresponding communication component may include a Wi-Fi module, a Bluetooth module and an NFC module.
  • This embodiment provides a storage medium, where a computer program is stored on the storage medium, and when the computer program is executed by one or more processors, the information processing method according to the first embodiment is implemented.
  • The storage medium may be, for example, a flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, server or application store, on which a computer program is stored; when the computer program is executed by a processor, all or part of the steps of the information processing method in the first embodiment can be implemented. For the specific implementation of these steps, refer to the first embodiment; they are not repeated in this embodiment.
  • Each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which contains one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures.
  • Each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in dedicated hardware-based systems that perform the specified functions or actions, or can be implemented in a combination of dedicated hardware and computer instructions.
  • With the present disclosure, information processing that adjusts the order of massive information can be completed among multiple computing nodes, thereby improving processing efficiency.
  • The keyword sequence in a process is first processed from disorder to order; when load adjustment processing is required between processes, it is performed to balance the load among the processes; based on the calculated common demarcation value, fast merge sorting is performed between the two processes of each pair in the current process group, combining the ordered keyword sequences on the two processes to complete one fast merge sorting pass. With the present disclosure, the load balance of each process is good and the computational complexity is low, so the efficiency of distributed parallel processing can be effectively improved and the ordered keywords of the process group can be obtained quickly.

Abstract

An information processing method and system, an electronic device, and a storage medium. The method comprises: obtaining information to be processed of the current process group; if there is a process in which the keyword sequence is a non-ordered keyword sequence, processing the keyword sequence of the process into an ordered keyword sequence; when load regulation processing is required, performing inter-process load regulation processing on the current process group so as to balance a load between respective processes; calculating a common boundary value of the current process group; according to the common boundary value, performing keyword merging and sorting processing between every two processes of the current process group; and dispersedly storing the results of keyword merging and sorting processing in each process of the current process group, so that each process dispersedly stores the ordered keyword sequence, and any keyword on the i-th process is smaller than any keyword on the (i+1)-th process.

Description

An information processing method, system, electronic device and storage medium
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese patent application CN202010720545.5, filed on July 24, 2020 and entitled "An Information Processing Method, System, Electronic Device and Storage Medium", the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to the technical field of information processing, and in particular to an information processing method, system, electronic device and storage medium.
BACKGROUND
In technical fields such as databases and the Internet, it is often necessary to find required information in a set of information. To find it quickly, an effective approach is to build a data structure for the information that contains a keyword (which may be an integer or floating-point number of any number of digits, a string of any length, and so on) and to organize the set of information according to some order of the keywords. Sorting is one such method of organizing information effectively: it adjusts a set of information into one whose keywords are ordered, after which the required information can be found quickly based on the keywords.
In the past, because fields such as databases and the Internet had little information to process, a single processor core was usually enough to adjust a set of information from keyword-unordered to keyword-ordered. As the amount of information grew, parallel computing on multiple processor cores within the same computing node was needed to speed up the adjustment. At present, the amount of information to be processed has become massive, such as the T level or even the P level. A single computing node can no longer meet the demand in either computing power or storage capacity, and parallel computing across multiple computing nodes must be used efficiently to complete the adjustment in time.
Adjusting massive information from keyword-unordered to keyword-ordered across multiple computing nodes is relatively complex and inefficient. Therefore, an information processing scheme that can adjust the order of massive information across multiple computing nodes is urgently needed in order to improve processing efficiency.
SUMMARY
In view of the above technical problems, the present disclosure provides an information processing method, system, electronic device and storage medium, which can complete information processing that adjusts the order of massive information among multiple computing nodes, thereby improving processing efficiency.
In a first aspect, the present disclosure provides an information processing method, comprising: acquiring the information to be processed of a current process group, wherein the current process group includes at least two different processes, each piece of the information to be processed contains a keyword for marking or searching the information, and all the keywords of the information to be processed have been distributed across the processes of the current process group, forming a keyword sequence in each process; if there is a process whose keyword sequence is a non-ordered keyword sequence, processing the keyword sequence of that process into an ordered keyword sequence; when load adjustment processing is required, performing inter-process load adjustment processing on the current process group to balance the load among the processes; calculating a common demarcation value of the current process group; performing, according to the common demarcation value, keyword merge sorting processing between every two processes of the current process group; and storing the results of the keyword merge sorting processing in a distributed manner in the processes of the current process group, so that each process stores an ordered keyword sequence and any keyword on the i-th process is smaller than any keyword on the (i+1)-th process.
In a second aspect, the present disclosure provides an information processing system, comprising: an information acquisition module configured to acquire the information to be processed of a current process group, wherein the current process group includes at least two different processes, each piece of the information to be processed contains a keyword for marking or searching the information, and all the keywords of the information to be processed have been distributed across the processes of the current process group, forming a keyword sequence in each process; a first processing module configured to, if there is a process whose keyword sequence is a non-ordered keyword sequence, process the keyword sequence of that process into an ordered keyword sequence; a second processing module configured to perform inter-process load adjustment processing on the current process group when load adjustment processing is required, so as to balance the load among the processes; a calculation module configured to calculate a common demarcation value of the current process group; a third processing module configured to perform, according to the common demarcation value, keyword merge sorting processing between every two processes of the current process group; a storage module configured to store the results of the keyword merge sorting processing in a distributed manner in the processes of the current process group, so that each process stores an ordered keyword sequence and any keyword on the i-th process is smaller than any keyword on the (i+1)-th process; and a control module configured to control the calling of the information acquisition module, the first processing module, the second processing module, the calculation module, the third processing module and the storage module.
In a third aspect, the present disclosure provides an electronic device, including a memory and a processor, wherein a computer program is stored in the memory, and when the computer program is executed by the processor, the information processing method according to the first aspect is implemented.
In a fourth aspect, the present disclosure provides a storage medium on which a computer program is stored, and when the computer program is executed by one or more processors, the information processing method according to the first aspect is implemented.
BRIEF DESCRIPTION OF THE DRAWINGS
To illustrate the technical solutions of the embodiments of the present disclosure more clearly, the drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present disclosure and should therefore not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from these drawings without creative effort.
FIG. 1 is a flowchart of an information processing method provided by Embodiment 1 of the present disclosure;
FIG. 2 is an example of a process group provided by Embodiment 1 of the present disclosure;
FIG. 3(a) is an example of load adjustment of a process group provided by Embodiment 1 of the present disclosure, in which the process pairs of the process group are marked;
FIG. 3(b) shows the result of the load adjustment processing of a process group provided by Embodiment 1 of the present disclosure;
FIG. 4 is an example of calculating a candidate common demarcation value provided by Embodiment 1 of the present disclosure;
FIG. 5 is a block diagram of an information processing system provided by Embodiment 2 of the present disclosure.
DETAILED DESCRIPTION
The technical solutions in the embodiments of the present disclosure are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present disclosure. The components of the embodiments generally described and illustrated in the drawings may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments provided in the drawings is not intended to limit the scope of the claimed disclosure, but merely represents selected embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
Given N' keywords, the lowest complexity of adjusting the keywords from unordered to ordered is O(N'*log(N')). When massive information is adjusted from keyword-unordered to keyword-ordered across multiple computing nodes, the sorting methods that can reach this lowest complexity include quick sort and merge sort.
The basic idea of quick sort is as follows: one sorting pass divides a set of data (or keywords) into two independent parts, where all data (or keywords) in one part are smaller than a demarcation value and all data (or keywords) in the other part are greater than or equal to it; the two parts are then quick-sorted recursively in the same way. The choice of the demarcation value therefore has an important effect on the complexity of quick sort: although the average complexity is O(N'*log(N')), the worst-case complexity can reach O(N'*N').
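As a minimal single-machine sketch of the partition-and-recurse idea just described (not the distributed method of this disclosure), the following Python function splits the keys around a demarcation value and recurses. The function name and the choice of the middle element as the demarcation value are illustrative assumptions; the separate `equal` bucket is an addition that guarantees termination when keys repeat.

```python
def quick_sort(keys):
    """Return a new ascending list using a simple partition-and-recurse scheme."""
    if len(keys) <= 1:
        return list(keys)
    # Assumption: take the middle element as the demarcation value.
    pivot = keys[len(keys) // 2]
    low = [k for k in keys if k < pivot]     # part smaller than the demarcation value
    equal = [k for k in keys if k == pivot]  # kept apart so recursion always shrinks
    high = [k for k in keys if k > pivot]    # part greater than the demarcation value
    return quick_sort(low) + equal + quick_sort(high)
```

A poorly chosen demarcation value (for example, always the smallest key) leaves one part nearly as large as the input, which is where the O(N'*N') worst case comes from.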
The basic idea of merge sort is as follows: one sorting pass merges several already ordered subsequences of data (or keywords) into a completely ordered sequence; that is, each subsequence is first made ordered, and then the subsequences are made ordered with respect to one another. The complexity of merge sort is stable at O(N'*log(N')), but compared with quick sort it requires one extra copy of storage space.
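The merge step can be illustrated with a short Python sketch that combines two already ordered subsequences. The function name is hypothetical; the `merged` list is the extra copy of storage mentioned above.

```python
def merge_sorted(left, right):
    """Merge two ascending lists into one ascending list."""
    merged = []  # extra storage of size len(left) + len(right)
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of the two tails is non-empty here.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```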
Therefore, adjusting massive information from keyword-unordered to keyword-ordered across multiple computing nodes is relatively complex and inefficient. The present disclosure provides an information processing scheme that can adjust the order of massive information across multiple computing nodes and can improve processing efficiency.
Embodiment 1
Given a process group that includes N processes (each process has a unique number between 1 and N), a total of N' keywords are distributed across the processes of the group, and each process stores a keyword sequence composed of some of the keywords. To obtain the ordered keywords of the current process group, so that the information required by a user can be found quickly based on them, the information processing method provided by this embodiment orders the keyword sequences distributed across the processes of the group, so that each process stores an ordered keyword sequence and any keyword on the i-th process is smaller than any keyword on the (i+1)-th process, where i is any number between 1 and N-1. FIG. 1 shows a flowchart of the information processing method provided by this embodiment. It should be noted that the method is not limited to FIG. 1 or to the specific order described below; in other embodiments, the order of some steps may be exchanged according to actual needs, some steps may be omitted or deleted, or some steps may be performed simultaneously.
The method flow of an embodiment of the present disclosure is described below with reference to FIG. 1. As shown in FIG. 1, the method includes the following steps S1 to S6.
Step S1: Acquire the information to be processed of the current process group, wherein the current process group includes at least two different processes, each piece of the information to be processed contains a keyword for marking or searching the information, and all the keywords of the information to be processed have been distributed across the processes of the current process group, forming a keyword sequence in each process. It can be understood that a keyword may be an integer or floating-point number of any number of digits, a string of any length, or another form, which is not limited here.
Step S2: If there is a process whose keyword sequence is a non-ordered keyword sequence, process the keyword sequence of that process into an ordered keyword sequence. Preferably, this step can use an existing sorting method, such as quick sort, merge sort or heap sort. To this end, an existing thread-level parallel sorting program, or an existing sorting program that uses processor vector instructions, can sort the local keyword sequence stored on the process to form the ordered keyword sequence on that process.
Step S3: When load adjustment processing is required, perform inter-process load adjustment processing on the current process group to balance the load among the processes.
Step S4: Calculate the common demarcation value of the current process group.
Step S5: According to the common demarcation value, perform keyword merge sorting processing between every two processes in the current process group.
Step S6: Store the results of the keyword merge sorting processing in a distributed manner in the processes of the current process group, so that each process stores an ordered keyword sequence and any keyword on the i-th process is smaller than any keyword on the (i+1)-th process.
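The end state required by step S6 can be written as a small check. The following Python sketch is illustrative only: it models each process as a plain list (an assumption, since real processes would hold their keys in separate memory), and the function name is hypothetical.

```python
def is_globally_ordered(processes):
    """processes[i] is the key list held by process i+1.

    Returns True when every list is ascending and every key on one process
    is strictly smaller than every key on the next non-empty process.
    """
    previous_max = None
    for keys in processes:
        # Each process must hold an ordered keyword sequence.
        if any(a > b for a, b in zip(keys, keys[1:])):
            return False
        if keys:
            # Smallest key here must exceed the largest key seen so far.
            if previous_max is not None and not previous_max < keys[0]:
                return False
            previous_max = keys[-1]
    return True
```

For instance, `[[1, 2], [3, 5], [8]]` satisfies the condition, while `[[1, 4], [3, 5]]` does not because 4 on the first process is not smaller than 3 on the second.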
In this embodiment, if there is a process whose keyword sequence is a non-ordered keyword sequence, the keyword sequence in that process is first processed from disorder to order. When load adjustment is required between processes, it is performed to balance the load among the processes. Based on the calculated common demarcation value, fast merge sorting is performed between two processes of the current process group, combining the ordered keyword sequences on the two processes to complete one fast merge sorting pass. With this method, the load balance of each process is good and the computational complexity is low, so the efficiency of distributed parallel processing can be effectively improved, the ordered keywords of the process group can be obtained quickly, and the information processing of a group of processes is completed.
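One plausible reading of the pairwise fast merge sorting is sketched below in Python, under stated assumptions: each process of a pair already holds an ascending list, the common demarcation value `pivot` has already been computed, and the process on the small-value side ends up with all keys below `pivot` while its partner gets the rest. The function name and the list-based model are hypothetical; this section does not specify the exchange protocol.

```python
from bisect import bisect_left
from heapq import merge


def exchange_and_merge(low_proc_keys, high_proc_keys, pivot):
    """Return (new_low, new_high) for one process pair.

    Both inputs must be ascending. new_low receives every key < pivot,
    new_high receives every key >= pivot, and both outputs are ascending.
    """
    # bisect_left finds the first position whose key is >= pivot.
    cut_low = bisect_left(low_proc_keys, pivot)
    cut_high = bisect_left(high_proc_keys, pivot)
    # Each side merges its own retained part with the part received from its partner.
    new_low = list(merge(low_proc_keys[:cut_low], high_proc_keys[:cut_high]))
    new_high = list(merge(low_proc_keys[cut_low:], high_proc_keys[cut_high:]))
    return new_low, new_high
```

Because both halves being combined are already ordered, each side only needs a linear merge rather than a full re-sort, which is where the combination of the quick-sort split and the merge-sort merge pays off.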
After the information processing of a group of processes is completed, the method further includes the following steps S7 and S8.
Step S7: According to the common demarcation value, obtain a first sub-process group and a second sub-process group, wherein all keywords in the first sub-process group are smaller than the common demarcation value and all keywords in the second sub-process group are greater than the common demarcation value.
Here, the current process group is further divided according to the common demarcation value into two sub-process groups, and the next round of information processing is then carried out; that is, each sub-process group that includes at least two processes is taken as the current process group, and steps S1 to S6 are executed on it.
Step S8: When the first sub-process group or the second sub-process group includes at least two processes, return to acquiring the information to be processed of the current process group, so as to perform information processing on that sub-process group.
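The recursion over sub-process groups in steps S7 and S8 can be simulated on a single machine as follows. This is a rough sketch under several assumptions that are not taken from this section: processes are modelled as ascending Python lists, the demarcation value is approximated by the median of the per-process medians (the actual calculation of step S4 is not described here), keys are routed to sub-groups by a simple modulo rule instead of pairwise exchange, and load adjustment (step S3) is omitted. All names are hypothetical.

```python
from bisect import bisect_left
from heapq import merge


def sort_process_group(processes):
    """processes: list of ascending key lists. Returns a list of the same
    length in which the keys are ordered within and across processes."""
    n = len(processes)
    if n < 2:  # a group with a single process is already finished
        return [list(p) for p in processes]
    medians = sorted(p[len(p) // 2] for p in processes if p)
    if not medians:  # no keys at all in this group
        return [[] for _ in processes]
    pivot = medians[len(medians) // 2]  # stand-in for the common demarcation value
    half = n // 2
    low_group = [[] for _ in range(half)]       # will hold keys < pivot
    high_group = [[] for _ in range(n - half)]  # will hold keys >= pivot
    for idx, keys in enumerate(processes):
        cut = bisect_left(keys, pivot)
        j, k = idx % half, idx % (n - half)
        low_group[j] = list(merge(low_group[j], keys[:cut]))
        high_group[k] = list(merge(high_group[k], keys[cut:]))
    # Recurse on each sub-group, as in step S8.
    return sort_process_group(low_group) + sort_process_group(high_group)
```

Each recursion level strictly reduces the number of processes per group, so the recursion ends once every group contains a single process holding an ordered sequence.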
进一步地,为了实现负载均衡,需要根据当前进程组所有进程的负载相关信息,分析是否需要进行负载调整处理,例如,步骤S3可以包括如下子步骤S301~S304。Further, in order to achieve load balancing, it is necessary to analyze whether load adjustment processing is required according to the load-related information of all processes in the current process group. For example, step S3 may include the following sub-steps S301-S304.
步骤S301、获取当前进程组所有进程的负载信息,并汇集于当前进程组的主进程;其中,负载信息包括:各进程的负载(也就是,每个进程的关键字序列中的关键字数量)、所有进程的最大负载和所有进程的平均负载。Step S301: Obtain the load information of all processes in the current process group, and collect them in the main process of the current process group; wherein, the load information includes: the load of each process (that is, the number of keywords in the keyword sequence of each process) , the maximum load of all processes, and the average load of all processes.
步骤S302、在主进程中对当前进程组的所有进程的负载进行排序处理,得到当前进程组的负载有序数组,并将负载有序数组发送给当前进程组的所有进程。Step S302: Sort the loads of all processes in the current process group in the main process, obtain an ordered array of loads of the current process group, and send the ordered array of loads to all processes in the current process group.
这里,对当前进程组的所有进程的负载进行排序处理,可以是按照负载从小到大排序,也可以是按照负载从大到小排序,此处不做限定。Here, the sorting processing is performed on the loads of all processes in the current process group, which may be sorted according to the load from small to large, or may be sorted according to the load from large to small, which is not limited here.
Step S303: Determine, according to the load information of the current process group, whether load adjustment processing is needed.
Here, to determine accurately whether the load is balanced among processes, that is, whether load adjustment processing is needed, the main process may calculate a load imbalance ratio.
To this end, in step S303, the load imbalance ratio is first calculated from the load information of the current process group. The load imbalance ratio is the maximum load of all processes in the process group divided by the average load of all processes in the process group. A larger ratio indicates a more unbalanced load, and a ratio close to 1 indicates a well-balanced load. Then, when the load imbalance ratio is greater than a preset threshold, load adjustment processing is needed.
It can be understood that when the load imbalance ratio is not greater than the preset threshold, load adjustment processing is not needed. The preset threshold may be set according to the actual situation and is not limited here.
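The check in step S303 can be illustrated with a minimal Python sketch. The function names `load_imbalance_ratio` and `needs_adjustment`, the treatment of an all-empty group as balanced, and the sample loads and threshold are assumptions made for illustration; they do not come from this disclosure.

```python
def load_imbalance_ratio(loads):
    """Maximum load divided by average load; a value near 1.0 means balanced."""
    if not loads:
        raise ValueError("loads must be non-empty")
    total = sum(loads)
    if total == 0:
        return 1.0  # assumption: a group with no keywords counts as balanced
    average = total / len(loads)
    return max(loads) / average


def needs_adjustment(loads, threshold):
    """True when the ratio is strictly greater than the preset threshold."""
    return load_imbalance_ratio(loads) > threshold


# Hypothetical loads: one process holds 12 keywords, three hold 4 each.
ratio = load_imbalance_ratio([12, 4, 4, 4])  # 12 / (24 / 4) = 2.0
```

With a hypothetical threshold of 1.5, this group would be adjusted, while a group with loads `[10, 10, 10, 10]` (ratio 1.0) would not.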
Step S304: When load adjustment processing is needed, divide all processes of the current process group into several process pairs according to the load-ordered array, and perform load adjustment processing between the two processes of each pair, where the process with the i-th largest load and the process with the (N-i)-th largest load form a process pair, i<N, and N is the number of processes in the current process group.
It can be understood that, because the loads of the processes have already been sorted in the load-ordered array, pairing the process with the i-th largest load with the process with the (N-i)-th largest load improves the efficiency of load adjustment processing: all processes are divided into several process pairs with roughly balanced combined loads, and adjusting the load within each pair achieves load balance quickly.
In practical applications, when dividing process pairs, the current process group may also be divided into a small-value sub-process group and a large-value sub-process group, where the number of every process in the small-value sub-process group is smaller than the number of every process in the large-value sub-process group. The processes in the two sub-process groups are then paired, with one process of each pair taken from the small-value sub-process group and the other from the large-value sub-process group. The numbers of processes in the two sub-process groups may be made as close as possible, or may be determined from the number of data items less than the common demarcation value and the number of data items greater than or equal to the common demarcation value, as obtained by the calculation module 4. For example, when the process group has 100 processes and the numbers of data items less than, and greater than or equal to, the common demarcation value are 450000 and 550000 respectively, the small-value and large-value sub-process groups should have 45 and 55 processes respectively. If the small-value sub-process group has fewer processes than the large-value sub-process group, then when the third processing module 5 is called, one small-value process will appear in multiple process pairs.
Here, performing load adjustment processing between the two processes of each process pair includes: in each pair, the process with the larger load transfers some of its keywords to the process with the smaller load, and the process with the smaller load merges the received keywords into its own sequence by merge sort, so that the loads of the two processes in the pair become equal or close.
Taking the process group shown in FIG. 2 as an example, the process group includes 7 processes, namely process 1 to process 7. Each process first builds its own load information, a two-tuple consisting of the number of its local data items (that is, the load of the process) and its process number. Then, in a bottom-up pass along the binary tree, the loads of the processes are merge-sorted, with the load as the sort key. When the pass reaches the root node at the top, namely process 1, process 1 uses the sorted load information array to find the maximum load (process 5, with 12 data items) and computes the total load of all processes, 53 data items. The average load of all processes is therefore 53/7, and the load imbalance ratio, the maximum load divided by the average load, is 12/(53/7), approximately 1.58. When the load imbalance ratio exceeds the preset threshold, load adjustment processing is performed. FIG. 3 gives an example of the corresponding load adjustment, with the 3 process pairs shown in FIG. 3(a) (because the total number of processes, 7, is odd, process 7, whose load ranks 4th, is not paired). The data to be moved to the other process can be selected at periodic intervals, for example, in FIG. 3(a), 77, 232, 973 and 1404 of process 5; 33, 99 and 887 of process 6; and 134, 311 and 832 of process 4. The result after load adjustment processing is shown in FIG. 3(b).
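The pairing and in-pair adjustment of step S304 can be sketched in Python as follows. This is an illustration under stated assumptions, not the disclosed implementation: the names `make_pairs` and `balance_pair` are hypothetical; ranks are 0-based, so rank i is paired with rank N-1-i and the middle process of an odd-sized group stays unpaired, as in the FIG. 3 example; the number of keywords moved is taken to be half the load difference; and "periodic intervals" is read as evenly spaced positions in the heavier sequence. The loads in the usage lines are invented and only share the maximum (12) and total (53) with the text.

```python
import heapq


def make_pairs(loads):
    """loads maps process number -> load; returns (heavier, lighter) pairs."""
    order = sorted(loads, key=lambda pid: loads[pid], reverse=True)
    n = len(order)
    # Rank i (0-based) is paired with rank n-1-i; the middle rank of an
    # odd-sized group is left unpaired.
    return [(order[i], order[n - 1 - i]) for i in range(n // 2)]


def balance_pair(heavy, light):
    """heavy and light are sorted lists; returns the two adjusted sorted lists."""
    surplus = (len(heavy) - len(light)) // 2  # assumption: move half the gap
    if surplus <= 0:
        return list(heavy), list(light)
    step = len(heavy) / surplus
    # Evenly spaced positions in the heavier sequence (one reading of
    # "periodic intervals"); step >= 2, so the positions are distinct.
    picked = {int(k * step + step / 2) for k in range(surplus)}
    moved = [key for idx, key in enumerate(heavy) if idx in picked]
    kept = [key for idx, key in enumerate(heavy) if idx not in picked]
    # The lighter process merges the received keywords into its own sequence.
    merged = list(heapq.merge(light, moved))
    return kept, merged


# Hypothetical loads for 7 processes (maximum 12, total 53).
pairs = make_pairs({1: 4, 2: 5, 3: 6, 4: 9, 5: 12, 6: 10, 7: 7})
# pairs == [(5, 1), (6, 2), (4, 3)]; process 7 (rank 4 of 7) is unpaired.
```

Because both inputs are already sorted, `heapq.merge` performs the merge step of merge sort in linear time, which matches the description of the lighter process merging in the transferred keywords.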
Further, to calculate the common demarcation value, the information about the ordered keyword sequences of all processes in the current process group needs to be considered together. Step S4 may therefore include the following sub-steps S401 to S402.
Step S401: Obtain the information of the ordered keyword sequence of each process in the current process group and gather it at the main process of the current process group, where the information of an ordered keyword sequence includes the number of keywords in the sequence and the keywords at specific positions.
Step S402: The main process calculates candidate common demarcation values according to the information of the ordered keyword sequences of the processes in the current process group.
The keywords at specific positions are the keywords at the K-1 positions that divide the ordered keyword sequence into K segments; the keyword at the m-th position is called the m/K quantile keyword, where m≥1 and m≤K. The main process calculates the common demarcation value according to the information of the ordered keyword sequences of the processes in the current process group. This sub-step may further include the following sub-steps S402-1 to S402-2.
Step S402-1: Calculate the weight of each m/K quantile keyword of each process's ordered keyword sequence. The weight of an m/K quantile keyword is the product of the value of the m/K quantile keyword and the number of keywords in the ordered keyword sequence.
Step S402-2: Calculate the m/K candidate common demarcation value corresponding to each m/K quantile keyword. The m/K candidate common demarcation value is the sum of the weights of the m/K quantile keywords of the ordered keyword sequences of all processes in the current process group, divided by the total number of keywords in the ordered keyword sequences of all processes in the current process group.
Taking the case where the keyword at the specific position is the median of the ordered keyword sequence as an example, the median weight of each process's ordered keyword sequence is calculated as the product of the value of the median and the number of keywords in the ordered keyword sequence.
The median weight is calculated as follows:
median weight = value of the median × number of keywords in the ordered keyword sequence.
The median candidate common demarcation value corresponding to the median is the sum of the median weights of the ordered keyword sequences of all processes in the current process group, divided by the total number of keywords in the ordered keyword sequences of all processes in the current process group.
The median candidate common demarcation value is calculated as follows:
median candidate common demarcation value = sum of median weights / total number of keywords.
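The weighted calculation of steps S402-1 and S402-2 can be sketched in Python as follows. The names `quantile_keys` and `candidate_demarcations`, the index formula `(m * n) // k` used to locate the m/K quantile keyword (which picks the upper median when a sequence has an even length), and the sample data are assumptions for illustration only.

```python
def quantile_keys(sorted_keys, k):
    """Return the K-1 keywords that split sorted_keys into K segments."""
    n = len(sorted_keys)
    # Assumption: the m/K quantile keyword sits at index (m * n) // k.
    return [sorted_keys[(m * n) // k] for m in range(1, k)]


def candidate_demarcations(sequences, k=2):
    """Weighted mean of each m/K quantile keyword over all processes.

    sequences holds one sorted keyword list per process. With k=2 the
    single result is the median candidate common demarcation value.
    """
    total = sum(len(seq) for seq in sequences)
    if total == 0:
        raise ValueError("the process group holds no keywords")
    weight_sums = [0] * (k - 1)
    for seq in sequences:
        if not seq:
            continue  # an empty process contributes no weight
        for j, key in enumerate(quantile_keys(seq, k)):
            weight_sums[j] += key * len(seq)  # weight = value * keyword count
    return [w / total for w in weight_sums]


# Two hypothetical processes: medians 2 and 30, sizes 3 and 5.
# (2 * 3 + 30 * 5) / (3 + 5) = 156 / 8 = 19.5
candidates = candidate_demarcations([[1, 2, 3], [10, 20, 30, 40, 50]])
```

Only the pair (keyword count, quantile keywords) of each process is needed at the main process, which is why the text can gather this information with a single reduction along the binary tree.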
In some embodiments, the above step S402 may further include the following sub-step S402-3.
Step S402-3: The main process sends the K-1 m/K candidate common demarcation values to all processes.
In practical applications, to reduce computation, the current m/K candidate common demarcation value may be taken directly as the common demarcation value of the current process group as soon as it is calculated. Alternatively, a threshold may be preset, and the current m/K candidate common demarcation value is taken as the common demarcation value of the current process group when it reaches the threshold. It can be understood that these two cases are only examples of practical applications, and this embodiment is not limited to them.
In other embodiments, to obtain a more accurate common demarcation value, step S402 may further include the following sub-steps S402-4 to S402-6 after the above step S402-3.
Step S402-4: For each m/K candidate common demarcation value, count a first quantity value and a second quantity value over all processes in the current process group, where the first quantity value is the number of keywords less than the m/K candidate common demarcation value and the second quantity value is the number of keywords greater than or equal to the m/K candidate common demarcation value.
Step S402-5: If there is an m/K candidate common demarcation value for which the ratio between the first quantity value and the second quantity value does not exceed a preset value, determine that m/K candidate common demarcation value as the common demarcation value of the current process group.
Step S402-6: When the ratio between the first quantity value and the second quantity value exceeds the preset value for every m/K candidate common demarcation value, save the K-1 m/K candidate common demarcation values as existing candidate common demarcation values.
To further determine the common demarcation value from the existing candidate common demarcation values, step S402 may further include the following sub-steps S402-7 to S402-9.
Step S402-7: Optimize the existing candidate common demarcation values. Select from the existing candidate common demarcation values an adjacent first reference candidate common demarcation value and second reference candidate common demarcation value, where the first quantity value of the first reference candidate is smaller than its second quantity value and the first quantity value of the second reference candidate is greater than its second quantity value. Calculate a value between the first and second reference candidate common demarcation values as the optimized candidate common demarcation value, and count the first quantity value and the second quantity value of the optimized candidate common demarcation value.
Step S402-8: When the ratio between the first quantity value and the second quantity value of the optimized candidate common demarcation value does not exceed the preset value, determine the optimized candidate common demarcation value as the common demarcation value of the current process group.
Step S402-9: When the ratio between the first quantity value and the second quantity value of the optimized candidate common demarcation value exceeds the preset value, add the optimized candidate common demarcation value to the existing candidate common demarcation values and return to step S402-7.
Here, when the difference between the first quantity value and the second quantity value does not exceed the preset value, the load under the current candidate common demarcation value is relatively balanced; when the difference exceeds the preset value, the load under the current candidate common demarcation value is unbalanced, and the existing candidate common demarcation values need to be optimized. Because every calculated candidate common demarcation value is recorded, the optimization can select an adjacent first reference candidate and second reference candidate from the existing candidates. This optimization yields a better candidate common demarcation value and further improves load balance, which in turn improves the efficiency of distributed parallel data processing and avoids both the low processing efficiency caused by unbalanced load and the low resource utilization caused by uneven processing time across computing nodes.
For example, suppose the adjacent first reference candidate common demarcation value 10000 and second reference candidate common demarcation value 20000 are selected from the existing candidate common demarcation values. With the existing candidate 10000, the numbers of keywords less than and greater than 10000 are 1000 (first quantity value) and 2000 (second quantity value) respectively; with the existing candidate 20000, the numbers of keywords less than and greater than 20000 are 2000 (first quantity value) and 1000 (second quantity value) respectively. The average of 10000 and 20000, namely 15000 (a value between the first and second reference candidate common demarcation values), can then be calculated as the optimized current candidate common demarcation value.
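The refinement loop of steps S402-4 to S402-9 can be sketched in Python as follows. Several points are assumptions rather than statements of this disclosure: the names `count_split`, `is_balanced` and `refine`; the balance test, which compares the larger count to the smaller count against a hypothetical `max_ratio`; the use of the midpoint as the "value between" two reference candidates (the text's own example uses the average); and the `max_rounds` bound, added so the sketch always terminates.

```python
from bisect import bisect_left


def count_split(sequences, value):
    """Return (first quantity, second quantity): keywords < value, >= value."""
    below = sum(bisect_left(seq, value) for seq in sequences)
    total = sum(len(seq) for seq in sequences)
    return below, total - below


def is_balanced(below, at_or_above, max_ratio):
    """Assumed test: the larger count over the smaller count is <= max_ratio."""
    small, large = sorted((below, at_or_above))
    return small > 0 and large / small <= max_ratio


def refine(sequences, candidates, max_ratio, max_rounds=64):
    """Return a balanced demarcation value, or None if none is found."""
    known = sorted(set(candidates))
    for _ in range(max_rounds):
        counts = [count_split(sequences, c) for c in known]
        for c, (below, above) in zip(known, counts):
            if is_balanced(below, above, max_ratio):
                return c
        # Look for adjacent candidates that bracket the balance point.
        for left, right, cl, cr in zip(known, known[1:], counts, counts[1:]):
            if cl[0] < cl[1] and cr[0] > cr[1]:
                known = sorted(set(known + [(left + right) / 2]))
                break
        else:
            return None  # no bracketing pair among the existing candidates
    return None


# Two hypothetical processes holding 0..99 and 50..149.
procs = [list(range(0, 100)), list(range(50, 150))]
value = refine(procs, [10, 140], max_ratio=1.1)  # midpoint 75.0 splits 100/100
```

Because each process's sequence is already sorted, the two quantity values for a candidate can be obtained with one binary search per process, so each refinement round is cheap compared with the sort itself.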
Taking the process group shown in FIG. 4 as an example, the keyword at the specific position is the median of the ordered keyword sequence. The process group has 6 processes, which form a binary tree structure for communication to speed up the transfer of information among the processes of the group. Through bottom-up transfer and reduction along the binary tree, the root node at the top of the binary tree, namely process 1, finally obtains (68, 17653), where 68 is the total number of keywords of the 6 processes and 17653 is the sum of the median weights of the 6 processes. The candidate common demarcation value is then calculated as 17653/68, whose integer part is 259.
After the non-ordered keyword sequences have been processed into ordered keyword sequences and the load adjustment within process pairs has balanced the load among processes, a fast merge sort still needs to be performed between pairs of processes to obtain the ordered keywords of the current process group. To this end, step S5 further includes the following sub-steps S501 to S503.
Step S501: According to the common demarcation value of the current process group, divide the ordered keyword sequence of each process in the current process group into a first ordered keyword subsequence, whose keywords are less than the common demarcation value, and a second ordered keyword subsequence, whose keywords are greater than or equal to the common demarcation value.
Step S502: Exchange ordered keyword subsequences between two processes of the current process group. The exchange includes: the lower-numbered process transfers its second ordered keyword subsequence to the higher-numbered process, and the higher-numbered process transfers its first ordered keyword subsequence to the lower-numbered process.
Step S503: Each process of the current process group merges its two ordered keyword subsequences into one ordered keyword sequence, so that the processes store the ordered keyword sequences in a distributed manner and every keyword on the i-th process is smaller than every keyword on the (i+1)-th process.
Here, when keyword merge sort processing is performed between every two processes of the current process group, each process first divides its ordered keyword sequence, according to the common demarcation value, into two ordered keyword subsequences: one less than the common demarcation value and one greater than or equal to it. The two processes then exchange ordered keyword subsequences: the lower-numbered process transfers its subsequence that is greater than or equal to the common demarcation value to the higher-numbered process, and the higher-numbered process transfers its subsequence that is less than the common demarcation value to the lower-numbered process. Finally, each process merges the subsequence it kept and the subsequence it received into a new ordered keyword sequence.
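Steps S501 to S503 for a single pair of processes can be sketched in Python as follows. The function names `split_at` and `exchange_and_merge` and the sample sequences are hypothetical; the two "processes" are plain lists in one program, so the exchange is simulated rather than sent over a network.

```python
import heapq
from bisect import bisect_left


def split_at(sorted_keys, demarcation):
    """S501: split into (keys < demarcation, keys >= demarcation)."""
    cut = bisect_left(sorted_keys, demarcation)
    return sorted_keys[:cut], sorted_keys[cut:]


def exchange_and_merge(low_proc, high_proc, demarcation):
    """S502 and S503 for one pair; both inputs are sorted lists."""
    low_small, low_large = split_at(low_proc, demarcation)
    high_small, high_large = split_at(high_proc, demarcation)
    # The lower-numbered process keeps its small part and receives the
    # partner's small part; the higher-numbered process does the opposite.
    new_low = list(heapq.merge(low_small, high_small))
    new_high = list(heapq.merge(low_large, high_large))
    return new_low, new_high


# Hypothetical data with a common demarcation value of 400.
new_low, new_high = exchange_and_merge(
    [100, 250, 420, 800], [50, 390, 400, 950], 400
)
# new_low  == [50, 100, 250, 390]
# new_high == [400, 420, 800, 950]
```

After the exchange, every keyword on the lower-numbered process is smaller than every keyword on the higher-numbered process, which is the invariant stated in step S503.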
Taking the two processes shown in Table 1 as an example, the common demarcation value is 400. Table 1 gives the ordered data sequences on the two processes (the lower-numbered one is labeled process 1 and the higher-numbered one process 2). Table 2 shows the two ordered keyword subsequences that each process determines according to the common demarcation value. Table 3 shows the situation after the two processes have exchanged ordered keyword subsequences. Table 4 shows the new ordered keyword sequence that each process obtains after merging its two ordered keyword subsequences.
Table 1
Figure PCTCN2021104640-appb-000001
Table 2
Figure PCTCN2021104640-appb-000002
Table 3
Figure PCTCN2021104640-appb-000003
Table 4
Figure PCTCN2021104640-appb-000004
Given a total number of keywords N' and a number of processes N, with the method provided in this embodiment the processes remain balanced throughout, and the information processing as a whole attains the minimum complexity, O(N'*log(N')), while the complexity on each process is O(N'*log(N')/N); this complexity already accounts for the fast merge sort and load exchange between processes. log(N) sorting passes are performed across the processes, and in each pass the processes in a process group communicate with one another to transfer the information used to calculate and broadcast the common demarcation value, the load balance and related quantities. When the communication within a process group is organized as a binary tree, the complexity of this communication on each process is O(log(N)*log(N)).
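The per-process bound quoted above can be checked with a short derivation. It is a sketch under the assumption that loads stay perfectly balanced, so that each process holds n = N'/N keywords in every pass; the symbols n, T_proc and T_comm are introduced here for illustration and do not appear in the disclosure.

```latex
% Assumption: every process holds n = N'/N keywords throughout.
\begin{align*}
T_{\text{proc}}
  &= \underbrace{O(n \log n)}_{\text{local sort}}
   + \underbrace{\log_2 N}_{\text{passes}} \cdot
     \underbrace{O(n)}_{\text{split, exchange, merge}} \\
  &= O\bigl(n\,(\log n + \log N)\bigr)
   = O\!\left(\frac{N'}{N}\Bigl(\log\frac{N'}{N} + \log N\Bigr)\right)
   = O\!\left(\frac{N' \log N'}{N}\right), \\
T_{\text{comm}}
  &= \underbrace{\log_2 N}_{\text{passes}} \cdot
     \underbrace{O(\log N)}_{\text{binary-tree depth}}
   = O\bigl(\log N \cdot \log N\bigr).
\end{align*}
```

Summing the per-process cost over the N processes gives O(N' log N'), which agrees with the overall complexity stated in the text.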
Embodiment 2
Corresponding to Embodiment 1, this embodiment provides an information processing system. As shown in FIG. 5, the system includes an information acquisition module 1, a first processing module 2, a second processing module 3, a calculation module 4, a third processing module 5, a storage module 6 and a control module 7.
The information acquisition module 1 is configured to acquire the information to be processed of the current process group, where the current process group includes at least two different processes, each piece of the information to be processed contains a keyword used to mark or search for that information, and all keywords of the information to be processed have been stored in a distributed manner across the processes of the current process group, forming a keyword sequence in each process.
The first processing module 2 is configured to, if there is a process whose keyword sequence is a non-ordered keyword sequence, process the keyword sequence of that process into an ordered keyword sequence.
The second processing module 3 is configured to, when load adjustment processing is needed, perform inter-process load adjustment processing on the current process group so that the load is balanced among the processes.
The calculation module 4 is configured to calculate the common demarcation value of the current process group.
The third processing module 5 is configured to perform keyword merge sort processing between every two processes of the current process group according to the common demarcation value.
The storage module 6 is configured to store the result of the keyword merge sort processing in a distributed manner across the processes of the current process group, so that the processes store the ordered keyword sequences in a distributed manner and every keyword on the i-th process is smaller than every keyword on the (i+1)-th process.
The control module 7 is configured to control the calling of the information acquisition module 1, the first processing module 2, the second processing module 3, the calculation module 4, the third processing module 5 and the storage module 6.
In this embodiment, for the implementation of the information acquisition module 1, see step S1 in Embodiment 1; for the first processing module 2, see step S2; for the second processing module 3, see step S3; for the calculation module 4, see step S4; for the third processing module 5, see step S5; and for the storage module 6, see step S6. These are not repeated in this embodiment.
The system provided in this embodiment is a distributed information processing system with process-level parallelism. During information processing, the control module 7 first calls the first processing module 2 so that the keyword sequence within each process becomes an ordered keyword sequence; then, following the idea of quicksort, it calls the other three modules to complete the cooperative sorting among processes. In one sorting pass over a process group of N>1 processes, the control module 7 calls the second processing module 3 to adjust the distribution of all keywords of the process group among the processes, then calls the calculation module 4 to determine the common demarcation value of the process group, and then calls the third processing module 5 to perform the data exchange and fast merge sort between the two processes of each pair. This completes one sorting pass and yields the information processing result of the current process group, namely its ordered keywords. The storage module 6 stores the result of the keyword merge sort processing in a distributed manner across the processes of the current process group, so that the processes store the ordered keyword sequences in a distributed manner and every keyword on the i-th process is smaller than every keyword on the (i+1)-th process.
It can be understood that, because the current process group can be divided into a small-value sub-process group and a large-value sub-process group when dividing process pairs (where the number of every process in the small-value sub-process group is smaller than the number of every process in the large-value sub-process group), in practical applications, after one sorting pass is completed, the control module 7 can perform the next sorting pass separately in the small-value sub-process group and in the large-value sub-process group.
In addition, the third processing module 5 is further configured to obtain a first sub-process group and a second sub-process group according to the common demarcation value, where all keywords in the first sub-process group are less than the common demarcation value and all keywords in the second sub-process group are greater than the demarcation value. The control module 7 is further configured to, when the first sub-process group or the second sub-process group includes at least two processes, call the information acquisition module, the first processing module, the second processing module, the calculation module, the third processing module and the storage module to perform information processing on the first or second sub-process group that includes at least two processes.
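The recursive control flow described above can be illustrated with a single-program simulation in Python. It is a simplified sketch under stated assumptions, not the disclosed system: processes are modeled as sorted lists in one list `procs`; the function name `sort_group` is hypothetical; the load adjustment of the second processing module and the refinement of the demarcation value are skipped, so the final loads may be uneven; the median candidate is used directly as the common demarcation value; and the sub-process groups are formed by halving the index range, with a small-value process serving several pairs when the large-value group is bigger.

```python
import heapq
from bisect import bisect_left


def sort_group(procs, lo, hi):
    """One sorting pass over procs[lo:hi], then recurse into both halves."""
    n = hi - lo
    total = sum(len(procs[i]) for i in range(lo, hi))
    if n < 2 or total == 0:
        return
    # Median candidate used directly as the common demarcation value.
    d = sum(p[len(p) // 2] * len(p) for p in procs[lo:hi] if p) / total
    mid = lo + n // 2  # small-value group: lo..mid-1, large-value: mid..hi-1
    n_small = mid - lo
    small_parts = [[] for _ in range(n_small)]
    large_parts = [[] for _ in range(hi - mid)]
    for i in range(lo, hi):
        cut = bisect_left(procs[i], d)
        below, at_or_above = procs[i][:cut], procs[i][cut:]
        if i < mid:
            # Small-value process i-lo is paired with large-value process i-lo.
            small_parts[i - lo].append(below)
            large_parts[i - lo].append(at_or_above)
        else:
            # Large-value process j sends its small part to small-value
            # process j % n_small, so a small-value process may serve
            # several pairs.
            j = i - mid
            large_parts[j].append(at_or_above)
            small_parts[j % n_small].append(below)
    for i in range(lo, mid):
        procs[i] = list(heapq.merge(*small_parts[i - lo]))
    for i in range(mid, hi):
        procs[i] = list(heapq.merge(*large_parts[i - mid]))
    sort_group(procs, lo, mid)   # next pass in the small-value group
    sort_group(procs, mid, hi)   # next pass in the large-value group


# Hypothetical data: three processes, each sorted locally first.
procs = [sorted(p) for p in ([9, 1, 5], [3, 8, 2, 7], [6, 4, 0])]
sort_group(procs, 0, len(procs))
```

After the call, concatenating the lists in process order yields the globally sorted keywords. The invariant holds for any demarcation value; the choice of value and the skipped load adjustment only affect how evenly the keywords end up spread across processes.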
It is worth noting that, when a process contains multiple threads, each of the above modules can use the multiple threads within a process to accelerate computation.
Embodiment 3
This embodiment provides an electronic device including a memory and a processor. The memory stores a computer program that, when executed by the processor, implements the information processing method of Embodiment 1.
It can be understood that the electronic device may also include a communication component.
The processor is configured to execute all or some of the steps of the information processing method of Embodiment 1. The memory is configured to store various types of data, which may include, for example, instructions for the method in the electronic device as well as data related to the electronic device.
The processor may be implemented by an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor or another electronic component, and is used to execute the information processing method of Embodiment 1.
The memory may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, a magnetic disk or an optical disk.
The communication component is used for wired or wireless communication between the electronic device and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G or 4G, or a combination of one or more of them; accordingly, the communication component may include a Wi-Fi module, a Bluetooth module and an NFC module.
Embodiment 4
This embodiment provides a storage medium on which a computer program is stored. When the computer program is executed by one or more processors, it implements the information processing method of Embodiment 1.
The storage medium may be, for example, a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), Random Access Memory (RAM), Static Random Access Memory (SRAM), Read-Only Memory (ROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Programmable Read-Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server, an app store, and so on, on which a computer program is stored. When executed by a processor, the computer program can implement all or some of the steps of the information processing method of Embodiment 1. For the specific implementation of those steps, see Embodiment 1; the details are not repeated here.
In the embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may also be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the accompanying drawings show the architecture, functionality, and operation of possible implementations of apparatuses, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two consecutive blocks may in fact be executed substantially in parallel, or they may sometimes be executed in the reverse order, depending on the functionality involved. It should further be noted that each block of the block diagrams and/or flowcharts, and any combination of such blocks, may be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
With the information processing method, system, electronic device, and storage medium provided by the present disclosure, the reordering of massive amounts of information can be carried out across multiple computing nodes, which improves processing efficiency. If there is a process whose keyword sequence is unordered, the keyword sequence within that process is first sorted. When load adjustment between processes is required, load adjustment processing is performed so that the load of the processes is balanced. Based on the calculated common demarcation value, a fast merge sort is performed between two processes of the current process group, combining the ordered keyword sequences held by the two processes to complete one pass of the fast merge sort. With the present disclosure, the load across processes is well balanced and the computational complexity is low, so the efficiency of distributed parallel processing can be effectively improved and the ordered keywords of the process group can be obtained quickly.
It should be noted that, in the present disclosure, the terms "include", "comprise", and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element qualified by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
Although the embodiments of the present disclosure are as described above, that description covers only embodiments adopted to facilitate understanding of the present disclosure and is not intended to limit it. Any person skilled in the art to which the present disclosure belongs may make modifications and changes in the form and details of implementation without departing from the spirit and scope of the present disclosure, but the scope of patent protection of the present disclosure shall still be as defined by the appended claims.

Claims (17)

  1. An information processing method, comprising:
    acquiring to-be-processed information of a current process group, wherein the current process group comprises at least two different processes, each piece of the to-be-processed information contains a keyword used to mark or search for that piece of information, and all keywords of the to-be-processed information have been distributed across the processes of the current process group, forming a keyword sequence in each process group;
    if there is a process whose keyword sequence is an unordered keyword sequence, processing the keyword sequence of that process into an ordered keyword sequence;
    when load adjustment processing is required, performing inter-process load adjustment processing on the current process group so as to balance the load among the processes;
    calculating a common demarcation value of the current process group;
    performing, according to the common demarcation value, keyword merge sort processing between every two processes in the current process group; and
    storing the results of the keyword merge sort processing in a distributed manner across the processes of the current process group, so that each process stores an ordered keyword sequence and any keyword on the i-th process is smaller than any keyword on the (i+1)-th process.
  2. The information processing method according to claim 1, further comprising:
    obtaining a first sub-process group and a second sub-process group according to the common demarcation value, wherein all keywords in the first sub-process group are smaller than the common demarcation value and all keywords in the second sub-process group are greater than the common demarcation value.
  3. The information processing method according to claim 2, further comprising:
    when the first sub-process group or the second sub-process group comprises at least two processes, performing the step of acquiring the to-be-processed information of the current process group, so as to carry out the information processing on that sub-process group.
  4. The information processing method according to claim 1, wherein performing, when load adjustment processing is required, inter-process load adjustment processing on the current process group so as to balance the load among the processes comprises:
    acquiring load information of all processes in the current process group and gathering it at the main process of the current process group;
    sorting, in the main process, the loads of all processes in the current process group to obtain a load-ordered array of the current process group, and sending the load-ordered array to all processes in the current process group;
    determining, according to the load information of the current process group, whether load adjustment processing is required; and
    when load adjustment processing is required, dividing all processes of the current process group into a number of process pairs according to the load-ordered array, and performing load adjustment processing between the two processes of each process pair, wherein the process with the i-th largest load and the process with the (N-i)-th largest load form a process pair, i<N, and N is the number of processes in the current process group.
  5. The information processing method according to claim 4, wherein performing load adjustment processing between the two processes of each process pair comprises:
    in each process pair, the process with the larger load transferring some of its keywords to the process with the smaller load, and the process with the smaller load merging in, by merge sort, the keywords received from the process with the larger load, so as to balance the load of the two processes in the pair.
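The pairing and transfer described in claims 4 and 5 can be sketched in a single address space as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the load of a process is taken to be the number of keywords it holds, the heaviest process is paired with the lightest (one reading of the "i-th and (N-i)-th largest" rule), and the heavier process gives up the tail of its ordered sequence. The names `pair_by_load` and `balance_pair` are hypothetical.

```python
from heapq import merge


def pair_by_load(loads):
    """Pair the i-th most loaded process with the i-th least loaded one.

    `loads[p]` is the load of process p (assumed: its keyword count).
    Returns a list of (heavy_process, light_process) index pairs.
    """
    order = sorted(range(len(loads)), key=lambda p: loads[p], reverse=True)
    return [(order[i], order[-1 - i]) for i in range(len(order) // 2)]


def balance_pair(heavy, light):
    """Move part of the heavier ordered list into the lighter one.

    Both inputs are sorted lists. The moved slice is merged into the
    lighter list, so both results stay sorted.
    """
    surplus = (len(heavy) - len(light)) // 2
    if surplus <= 0:
        return heavy, light
    kept, moved = heavy[:-surplus], heavy[-surplus:]
    return kept, list(merge(light, moved))
```

With loads `[8, 2, 4, 3]`, the sketch pairs process 0 with process 1 and process 2 with process 3. In a real deployment the transfer would be a message between processes; here it is a list slice.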
  6. The information processing method according to claim 4, wherein the load information comprises: the load of each process, the maximum load over all processes, and the average load over all processes.
  7. The information processing method according to claim 4, wherein determining, according to the load information of the current process group, whether load adjustment processing is required comprises:
    calculating a load imbalance rate according to the load information of the current process group; and
    when the load imbalance rate is greater than a preset threshold, determining that load adjustment processing is required.
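The claims do not give a formula for the load imbalance rate. One common choice, shown here purely as an assumption, derives it from the maximum and average load listed in claim 6: the rate is the maximum load divided by the average load, minus one. The function names and the default threshold of 0.2 are hypothetical.

```python
def load_imbalance_rate(loads):
    """Assumed definition: max load / average load - 1 (0.0 means balanced)."""
    average = sum(loads) / len(loads)
    if average == 0:
        return 0.0
    return max(loads) / average - 1.0


def needs_adjustment(loads, threshold=0.2):
    """Load adjustment is required when the rate exceeds the preset threshold."""
    return load_imbalance_rate(loads) > threshold
```

Under this definition a perfectly balanced group has a rate of 0, and a group in which one process holds twice the average load has a rate of 1.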
  8. The information processing method according to claim 1, wherein calculating the common demarcation value of the current process group comprises:
    acquiring information on the ordered keyword sequence of each process in the current process group and gathering it at the main process of the current process group, the information on the ordered keyword sequence comprising: the number of keywords in the ordered keyword sequence and the keywords at specific positions; and
    calculating, by the main process, the common demarcation value according to the information on the ordered keyword sequence of each process in the current process group.
  9. The information processing method according to claim 8, wherein the keywords at specific positions are the keywords at the K-1 positions that divide the ordered keyword sequence into K segments, the keyword at the m-th position in the ordered keyword sequence being referred to as the m/K quantile keyword, where m≥1 and m≤K; and wherein the main process calculating the common demarcation value according to the information on the ordered keyword sequence of each process in the current process group comprises:
    calculating the weight of each m/K quantile keyword of the ordered keyword sequence of each process, the weight of an m/K quantile keyword being the product of the value of that m/K quantile keyword and the number of keywords in the ordered keyword sequence; and
    calculating the m/K candidate common demarcation value corresponding to each m/K quantile keyword, the m/K candidate common demarcation value being the sum of the weights of the m/K quantile keywords of the ordered keyword sequences of all processes in the current process group, divided by the total number of keywords in the ordered keyword sequences of all processes in the current process group.
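The calculation in claim 9 amounts to a size-weighted average of each process's m/K quantile keyword. The sketch below assumes numeric keywords, non-empty sequences, and that the m-th cut position of a sequence of length n is index `(m * n) // K`; the exact position rule and the function names are assumptions made for illustration.

```python
def quantile_keys(seq, K):
    """Keywords at the K-1 cut positions of a sorted, non-empty sequence."""
    n = len(seq)
    return [seq[(m * n) // K] for m in range(1, K)]


def candidate_demarcation_values(sequences, K):
    """Size-weighted average of the m/K quantile keywords, for m = 1..K-1.

    weight = quantile keyword value * length of that process's sequence;
    candidate = sum of weights / total number of keywords in the group.
    """
    total = sum(len(seq) for seq in sequences)
    per_process = [(quantile_keys(seq, K), len(seq)) for seq in sequences]
    return [
        sum(keys[m] * size for keys, size in per_process) / total
        for m in range(K - 1)
    ]
```

For two processes holding `[1, 2, 3, 4]` and `[10, 20, ..., 80]` with K = 4, the third candidate is (4×4 + 70×8) / 12 = 48. Only the K-1 quantile keywords and the sequence length need to travel to the main process, not the sequences themselves.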
  10. The information processing method according to claim 9, wherein the main process calculating the common demarcation value according to the information on the ordered keyword sequence of each process in the current process group further comprises:
    sending, by the main process, the K-1 m/K candidate common demarcation values to all processes.
  11. The information processing method according to claim 10, wherein the main process calculating the common demarcation value according to the information on the ordered keyword sequence of each process in the current process group further comprises:
    counting, over all processes in the current process group, a first quantity value and a second quantity value for each m/K candidate common demarcation value, wherein the first quantity value is the number of keywords smaller than the m/K candidate common demarcation value and the second quantity value is the number of keywords greater than or equal to the m/K candidate common demarcation value;
    if there is an m/K candidate common demarcation value for which the ratio between the first quantity value and the second quantity value does not exceed a preset value, determining that m/K candidate common demarcation value as the common demarcation value of the current process group; and
    when the ratio between the first quantity value and the second quantity value exceeds the preset value for every m/K candidate common demarcation value, saving the K-1 m/K candidate common demarcation values as existing candidate common demarcation values.
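The counting and selection step of claim 11 can be sketched with a binary search per ordered sequence. The claim does not say how the "ratio between the first quantity value and the second quantity value" is formed; the sketch assumes it is the larger count divided by the smaller, compared against a hypothetical `max_ratio`. In a distributed setting each process would count locally and the counts would be summed; here everything runs in one place.

```python
from bisect import bisect_left


def split_counts(sequences, value):
    """Return (keywords < value, keywords >= value) over all sorted sequences."""
    below = sum(bisect_left(seq, value) for seq in sequences)
    total = sum(len(seq) for seq in sequences)
    return below, total - below


def is_balanced(below, at_or_above, max_ratio):
    """Assumed test: larger count / smaller count must not exceed max_ratio."""
    small, large = sorted((below, at_or_above))
    return small > 0 and large / small <= max_ratio


def choose_demarcation_value(sequences, candidates, max_ratio=1.5):
    """Return the first candidate that splits the keywords evenly enough.

    Returns None when no candidate qualifies, in which case the candidates
    would be kept for further refinement.
    """
    for value in candidates:
        if is_balanced(*split_counts(sequences, value), max_ratio):
            return value
    return None
```

Because every sequence is already ordered, each count costs one binary search per process rather than a full scan.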
  12. The information processing method according to claim 11, wherein the main process calculating the common demarcation value according to the information on the ordered keyword sequence of each process in the current process group further comprises:
    optimizing the existing candidate common demarcation values: selecting, from the existing candidate common demarcation values, an adjacent first reference candidate common demarcation value and second reference candidate common demarcation value, wherein the first quantity value of the first reference candidate common demarcation value is smaller than its second quantity value and the first quantity value of the second reference candidate common demarcation value is greater than its second quantity value; calculating a value lying between the first reference candidate common demarcation value and the second reference candidate common demarcation value as an optimized candidate common demarcation value; and counting the first quantity value and the second quantity value of the optimized candidate common demarcation value;
    when the ratio between the first quantity value and the second quantity value of the optimized candidate common demarcation value does not exceed the preset value, determining the optimized candidate common demarcation value as the common demarcation value of the current process group; and
    when the ratio between the first quantity value and the second quantity value of the optimized candidate common demarcation value exceeds the preset value, adding the optimized candidate common demarcation value to the existing candidate common demarcation values and performing again the step of optimizing the existing candidate common demarcation values by selecting, from the existing candidate common demarcation values, an adjacent first reference candidate common demarcation value and second reference candidate common demarcation value.
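The refinement loop of claim 12 behaves like a bisection between two neighbouring candidates that bracket an even split. The sketch below makes several assumptions that the claim leaves open: keywords are numeric, the "value lying between" the two reference candidates is their midpoint, the ratio test is larger count over smaller count, and the loop is capped by a hypothetical `max_rounds`.

```python
from bisect import bisect_left


def split_counts(sequences, value):
    """Return (keywords < value, keywords >= value) over all sorted sequences."""
    below = sum(bisect_left(seq, value) for seq in sequences)
    total = sum(len(seq) for seq in sequences)
    return below, total - below


def refine_demarcation_value(sequences, candidates, max_ratio=1.1, max_rounds=64):
    """Bisect between adjacent candidates until the split is balanced."""
    known = sorted(candidates)
    for _ in range(max_rounds):
        counts = [split_counts(sequences, value) for value in known]
        # Find neighbours where the first splits too low and the second too high.
        j = next(
            (k for k in range(len(known) - 1)
             if counts[k][0] < counts[k][1] and counts[k + 1][0] > counts[k + 1][1]),
            None,
        )
        if j is None:
            return None  # no bracketing pair among the known candidates
        middle = (known[j] + known[j + 1]) / 2  # assumed: midpoint
        below, rest = split_counts(sequences, middle)
        small, large = sorted((below, rest))
        if small > 0 and large / small <= max_ratio:
            return middle
        known.insert(j + 1, middle)  # keep it as an existing candidate
    return None
```

Starting from candidates 5.0 and 75.0 over `[1, 2, 3, 4]` and `[10, 20, ..., 80]`, the first midpoint 40.0 splits 7 to 5 and is rejected at `max_ratio=1.1`; the next midpoint 22.5 splits 6 to 6 and is accepted.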
  13. The information processing method according to claim 1, wherein performing, according to the common demarcation value, keyword merge sort processing between every two processes in the current process group comprises:
    dividing, according to the common demarcation value of the current process group, the ordered keyword sequence of each process in the current process group into a first ordered keyword subsequence whose keywords are smaller than the common demarcation value and a second ordered keyword subsequence whose keywords are greater than or equal to the common demarcation value;
    performing exchange processing of ordered keyword subsequences between two processes of the current process group, the exchange processing comprising: the lower-numbered process transferring its second ordered keyword subsequence to the higher-numbered process, and the higher-numbered process transferring its first ordered keyword subsequence to the lower-numbered process; and
    each process of the current process group merging the two ordered keyword subsequences it holds into one ordered keyword sequence.
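The split, exchange, and merge of claim 13 can be sketched for one pair of processes as follows. The two processes are modelled as plain sorted lists in one address space, so the "transfer" is just passing slices; in practice it would be a message exchange. The function names are hypothetical.

```python
from bisect import bisect_left
from heapq import merge


def split_at(seq, value):
    """Split a sorted list into (keywords < value, keywords >= value)."""
    cut = bisect_left(seq, value)
    return seq[:cut], seq[cut:]


def exchange_and_merge(low_rank, high_rank, value):
    """One merge pass between a lower-numbered and a higher-numbered process.

    The lower-numbered process ends up with every keyword < value and the
    higher-numbered process with every keyword >= value, both sorted.
    """
    low_small, low_large = split_at(low_rank, value)
    high_small, high_large = split_at(high_rank, value)
    new_low = list(merge(low_small, high_small))    # receives high's small part
    new_high = list(merge(low_large, high_large))   # receives low's large part
    return new_low, new_high
```

For example, `exchange_and_merge([1, 5, 9, 13], [2, 6, 10, 14], 8)` leaves the lower-numbered process with `[1, 2, 5, 6]` and the higher-numbered one with `[9, 10, 13, 14]`. Each side merges two already ordered subsequences, which takes time linear in their combined length.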
  14. An information processing system, comprising:
    an information acquisition module configured to acquire to-be-processed information of a current process group, wherein the current process group comprises at least two different processes, each piece of the to-be-processed information contains a keyword used to mark or search for that piece of information, and all keywords of the to-be-processed information have been distributed across the processes of the current process group, forming a keyword sequence in each process group;
    a first processing module configured to, if there is a process whose keyword sequence is an unordered keyword sequence, process the keyword sequence of that process into an ordered keyword sequence;
    a second processing module configured to, when load adjustment processing is required, perform inter-process load adjustment processing on the current process group so as to balance the load among the processes;
    a calculation module configured to calculate a common demarcation value of the current process group;
    a third processing module configured to perform, according to the common demarcation value, keyword merge sort processing between every two processes in the current process group;
    a storage module configured to store the results of the keyword merge sort processing in a distributed manner across the processes of the current process group, so that each process stores an ordered keyword sequence and any keyword on the i-th process is smaller than any keyword on the (i+1)-th process; and
    a control module configured to control the invocation of the information acquisition module, the first processing module, the second processing module, the calculation module, the third processing module, and the storage module.
  15. The information processing system according to claim 14, wherein the third processing module is further configured to obtain a first sub-process group and a second sub-process group according to the common demarcation value, wherein all keywords in the first sub-process group are smaller than the common demarcation value and all keywords in the second sub-process group are greater than the demarcation value; and
    the control module is further configured to, when the first sub-process group or the second sub-process group comprises at least two processes, invoke the information acquisition module, the first processing module, the second processing module, the calculation module, the third processing module, and the storage module so as to carry out the information processing on the first sub-process group or second sub-process group that comprises at least two processes.
  16. An electronic device, comprising a memory and a processor, wherein a computer program is stored in the memory and, when executed by the processor, implements the information processing method according to any one of claims 1 to 13.
  17. A storage medium on which a computer program is stored, wherein the computer program, when executed by one or more processors, implements the information processing method according to any one of claims 1 to 13.
PCT/CN2021/104640 2020-07-24 2021-07-06 Information processing method and system, electronic device, and storage medium WO2022017167A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010720545.5A CN111597054B (en) 2020-07-24 2020-07-24 Information processing method, system, electronic equipment and storage medium
CN202010720545.5 2020-07-24

Publications (1)

Publication Number Publication Date
WO2022017167A1 true WO2022017167A1 (en) 2022-01-27

Family

ID=72191828

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/104640 WO2022017167A1 (en) 2020-07-24 2021-07-06 Information processing method and system, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN111597054B (en)
WO (1) WO2022017167A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597054B (en) * 2020-07-24 2020-12-04 北京卡普拉科技有限公司 Information processing method, system, electronic equipment and storage medium
CN112988907B (en) * 2021-04-28 2022-01-21 北京卡普拉科技有限公司 Information adjusting method, system, electronic equipment and storage medium
CN113259482B (en) * 2021-06-21 2021-12-07 北京卡普拉科技有限公司 Many-to-many communication mode optimization method and device, storage medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102834807A (en) * 2011-04-18 2012-12-19 华为技术有限公司 Method and device for balancing load of multiprocessor system
CN107102839A (en) * 2017-04-13 2017-08-29 青岛蓝云信息技术有限公司 A kind of data processing method for the MapReduce that sorted based on hardware
CN110187969A (en) * 2019-05-30 2019-08-30 北京理工大学 A kind of distributed big data parallel calculating method based on GPU
US20200073633A1 (en) * 2018-08-31 2020-03-05 International Business Machines Corporation Parallel sort accelerator sharing first level processor cache
CN111597054A (en) * 2020-07-24 2020-08-28 北京卡普拉科技有限公司 Information processing method, system, electronic equipment and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8321476B2 (en) * 2010-03-29 2012-11-27 Sybase, Inc. Method and system for determining boundary values dynamically defining key value bounds of two or more disjoint subsets of sort run-based parallel processing of data from databases
CN103530084A (en) * 2013-09-26 2014-01-22 北京奇虎科技有限公司 Data parallel sequencing method and system
CN108874798B (en) * 2017-05-09 2022-08-12 北京京东尚科信息技术有限公司 Big data sorting method and system
CN111176843A (en) * 2019-12-23 2020-05-19 中国平安财产保险股份有限公司 Multi-dimension-based load balancing method and device and related equipment

Also Published As

Publication number Publication date
CN111597054B (en) 2020-12-04
CN111597054A (en) 2020-08-28

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21846699

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 300623)

122 Ep: pct application non-entry in european phase

Ref document number: 21846699

Country of ref document: EP

Kind code of ref document: A1