WO2022017167A1

WO2022017167A1 - Information processing method and system, electronic device, and storage medium

Info

Publication number: WO2022017167A1
Application number: PCT/CN2021/104640
Authority: WO
Inventors: 赵彤; 李锐喆
Original assignee: 北京卡普拉科技有限公司
Priority date: 2020-07-24
Filing date: 2021-07-06
Publication date: 2022-01-27
Also published as: CN111597054B; CN111597054A

Abstract

An information processing method and system, an electronic device, and a storage medium. The method comprises: obtaining information to be processed of the current process group; if there is a process in which the keyword sequence is a non-ordered keyword sequence, processing the keyword sequence of the process into an ordered keyword sequence; when load regulation processing is required, performing inter-process load regulation processing on the current process group so as to balance a load between respective processes; calculating a common boundary value of the current process group; according to the common boundary value, performing keyword merging and sorting processing between every two processes of the current process group; and dispersedly storing the results of keyword merging and sorting processing in each process of the current process group, so that each process dispersedly stores the ordered keyword sequence, and any keyword on the i-th process is smaller than any keyword on the (i+1)-th process.

Description

An information processing method, system, electronic device and storage medium

CROSS-REFERENCE TO RELATED APPLICATIONS

This application claims priority to Chinese patent application CN202010720545.5, filed on July 24, 2020 and entitled "An Information Processing Method, System, Electronic Device and Storage Medium", the entire contents of which are incorporated into this application by reference.

Technical field

The present disclosure relates to the technical field of information processing, and in particular, to an information processing method, system, electronic device and storage medium.

Background technique

In technical fields such as databases and the Internet, it is often necessary to find desired information in a set of information. To find the required information quickly, an effective approach is to establish a data structure in which the set of information is organized according to some order of its keywords (which can be integers or floating-point numbers of any number of digits, strings of any length, and so on). Sorting is a method that organizes information effectively: it rearranges a set of information so that its keywords are ordered, after which the required information can be found quickly based on the keywords.

In the past, because the amount of information to be processed in fields such as databases and the Internet was small, one processor core was usually enough to adjust a set of information from disordered to ordered. As the amount of information grew, parallel computing on multiple processor cores within the same computing node was needed to speed up the adjustment. At present, the amount of information to be processed has become massive, reaching the terabyte or even petabyte level, and a single computing node can no longer meet the demand in terms of computing power or storage capacity. Parallel computing among multiple computing nodes must be used efficiently to complete the adjustment in time.

Adjusting massive information from disordered to ordered keywords among multiple computing nodes has relatively high complexity and low efficiency. Therefore, there is an urgent need for an information processing scheme that can adjust the order of massive information among multiple computing nodes and improve processing efficiency.

SUMMARY OF THE INVENTION

In view of the above technical problems, the present disclosure provides an information processing method, system, electronic device and storage medium, which can complete information processing of order adjustment of massive information among multiple computing nodes, thereby improving processing efficiency.

In a first aspect, the present disclosure provides an information processing method, comprising: acquiring information to be processed of a current process group, wherein the current process group includes at least two different processes, each piece of the information to be processed includes a keyword for marking or searching for that piece of information, and all keywords of the information to be processed have been distributed among and stored in the processes of the current process group, forming a keyword sequence in each process; if there is a process whose keyword sequence is a non-ordered keyword sequence, processing the keyword sequence of that process into an ordered keyword sequence; when load adjustment processing is required, performing inter-process load adjustment processing on the current process group to balance the load among the processes; calculating a common demarcation value of the current process group; performing, according to the common demarcation value, keyword merge sort processing between every two processes in the current process group; and storing the results of the keyword merge sort processing in a distributed manner in the processes of the current process group, so that each process stores an ordered keyword sequence and any keyword on the i-th process is smaller than any keyword on the (i+1)-th process.

In a second aspect, the present disclosure provides an information processing system, comprising: an information acquisition module configured to acquire information to be processed of a current process group, wherein the current process group includes at least two different processes, each piece of the information to be processed includes a keyword for marking or searching for that piece of information, and all keywords of the information to be processed have been distributed among and stored in the processes of the current process group, forming a keyword sequence in each process; a first processing module configured to, for a process whose keyword sequence is a non-ordered keyword sequence, process the keyword sequence of that process into an ordered keyword sequence; a second processing module configured to perform inter-process load adjustment processing on the current process group when load adjustment processing is required, so as to balance the load among the processes; a calculation module configured to calculate a common demarcation value of the current process group; a third processing module configured to perform keyword merge sort processing between every two processes of the current process group according to the common demarcation value; a storage module configured to store the results of the keyword merge sort processing in a distributed manner in the processes of the current process group, so that each process stores an ordered keyword sequence and any keyword on the i-th process is smaller than any keyword on the (i+1)-th process; and a control module configured to control the invocation of the information acquisition module, the first processing module, the second processing module, the calculation module, the third processing module, and the storage module.

In a third aspect, the present disclosure provides an electronic device, including a memory and a processor, where a computer program is stored in the memory, and when the computer program is executed by the processor, the information processing method according to the first aspect is implemented.

In a fourth aspect, the present disclosure provides a storage medium on which a computer program is stored, and when the computer program is executed by one or more processors, implements the information processing method according to the first aspect.

Description of drawings

In order to illustrate the technical solutions of the embodiments of the present disclosure more clearly, the accompanying drawings used in the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present disclosure and therefore should not be regarded as limiting its scope; those of ordinary skill in the art can obtain other related drawings from these drawings without any creative effort.

FIG. 1 is a flowchart of an information processing method provided in Embodiment 1 of the present disclosure;

FIG. 2 is an example of a process group provided by Embodiment 1 of the present disclosure;

FIG. 3(a) is an example of load adjustment of a process group provided by Embodiment 1 of the present disclosure, showing the process pairs of the process group;

FIG. 3(b) is a result of load adjustment processing of a process group provided by Embodiment 1 of the present disclosure;

FIG. 4 is a calculation example of a candidate common demarcation value provided by Embodiment 1 of the present disclosure;

FIG. 5 is a block diagram of an information processing system provided by Embodiment 2 of the present disclosure.

Detailed description

The technical solutions in the embodiments of the present disclosure will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only a part of the embodiments of the present disclosure, but not all of the embodiments. The components of the disclosed embodiments generally described and illustrated in the drawings herein may be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the disclosure provided in the accompanying drawings is not intended to limit the scope of the disclosure as claimed, but is merely representative of selected embodiments of the disclosure. Based on the embodiments of the present disclosure, all other embodiments obtained by those skilled in the art without creative work fall within the protection scope of the present disclosure.

Given N' keywords, the minimum complexity of adjusting the keywords from unordered to ordered is O(N'*log(N')). When a large amount of information is adjusted from keyword disorder to keyword order among multiple computing nodes, sorting methods that can achieve this lowest complexity include quick sort and merge sort.

The basic idea of quick sort is to divide a set of data (or keywords) into two independent parts in one pass, such that all data (or keywords) in one part are less than a cutoff value and all data (or keywords) in the other part are greater than or equal to the cutoff value, and then to apply quick sort recursively to each of the two parts. The choice of cutoff value therefore has an important impact on the complexity of quick sort: the average complexity is O(N'*log(N')), but in the worst case the complexity can reach O(N'*N').

The basic idea of merge sort is to combine several ordered subsequences of data (or keywords) in one pass to obtain a completely ordered sequence of data (or keywords); that is, each subsequence is first made ordered, and the ordered subsequences are then merged. The complexity of merge sort is stable at O(N'*log(N')), but it requires extra storage space compared with quick sort.

Therefore, adjusting massive information from disordered to ordered keywords among multiple computing nodes has relatively high complexity and low efficiency. The present disclosure provides an information processing scheme that can complete the order adjustment of massive information among multiple computing nodes and can improve processing efficiency.
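The single-pass merge that merge sort relies on can be sketched as follows. This is a minimal illustration only; the function name `merge_sorted` is hypothetical and does not come from the disclosure.

```python
def merge_sorted(left, right):
    """Merge two ascending lists into one ascending list in a single pass."""
    merged = []
    i = j = 0
    # Repeatedly take the smaller head element of the two inputs.
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # At most one of the inputs still has elements left; append them.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged
```

The `merged` list is the extra storage space mentioned above: its size equals the combined size of the two inputs.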

Example 1

Given a process group that includes N processes (each process has a unique number between 1 and N), a total of N' keywords are distributed among the processes of the process group, and each process stores a keyword sequence composed of some of the keywords. To obtain ordered keywords for the current process group, so that the information required by a user can be found quickly based on the ordered keywords, the information processing method provided by this embodiment orders the keyword sequences stored in a distributed manner in the processes of the process group, so that each process stores an ordered keyword sequence and any keyword on the i-th process is smaller than any keyword on the (i+1)-th process, where i is any number between 1 and N-1. FIG. 1 shows a flowchart of the information processing method provided by this embodiment. It should be noted that the information processing method provided by this embodiment of the present disclosure is not limited to the specific sequence shown in FIG. 1 and described below. It should be understood that, in other embodiments, the order of some steps of the information processing method may be exchanged according to actual needs, some of the steps may be omitted or deleted, or some of the steps may be performed simultaneously.

The following describes a method flow of an embodiment of the present disclosure with reference to FIG. 1. As shown in FIG. 1, the method includes the following steps S1 to S6.

Step S1: Acquire information to be processed of the current process group, wherein the current process group includes at least two different processes, each piece of the information to be processed includes a keyword for marking or searching for that piece of information, and all keywords of the information to be processed have been distributed among and stored in the processes of the current process group, forming a keyword sequence in each process. It can be understood that a keyword can be an integer or floating-point number of any number of digits, a string of any length, or take another form, which is not limited here.

Step S2: If there is a process whose keyword sequence is a non-ordered keyword sequence, the keyword sequence of that process is processed into an ordered keyword sequence. Preferably, in this step, the non-ordered keyword sequence can be processed into an ordered keyword sequence by an existing sorting method, such as quick sort, merge sort or heap sort. To this end, the local keyword sequence stored in the process can be sorted by an existing thread-level parallel sorting program, or by an existing sorting program that uses processor vector instructions, to form an ordered keyword sequence on the process.

Step S3: When the load adjustment process needs to be performed, the inter-process load adjustment process is performed on the current process group, so as to balance the load among the processes.

Step S4: Calculate the common demarcation value of the current process group.

Step S5: Perform keyword merge sorting processing between every two processes in the current process group according to the common demarcation value.

Step S6: The results of the keyword merge sort processing are stored in a distributed manner in the processes of the current process group, so that each process stores an ordered keyword sequence and any keyword on the i-th process is smaller than any keyword on the (i+1)-th process.

In this embodiment, if there is a process whose keyword sequence is a non-ordered keyword sequence, the keyword sequence in that process is first processed from disordered to ordered. When load adjustment processing is required between processes, it is performed to balance the load among the processes. Based on the calculated common demarcation value, fast merge sort processing is then performed between every two processes in the current process group, combining the respective ordered keyword sequences on the two processes. With this method, the load balance among the processes is good and the computational complexity is low, which effectively improves the efficiency of distributed parallel processing, quickly yields the ordered keywords of the process group, and completes the information processing of one process group.
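The target state of steps S1 to S6 can be expressed as a small check. The sketch below assumes, purely for illustration, that the process group is modelled as a Python list in which entry i holds the keyword sequence of process i+1; the function name `is_globally_ordered` is hypothetical.

```python
def is_globally_ordered(sequences):
    """Return True if every per-process sequence is ascending and every
    keyword on process i is smaller than every keyword on process i+1."""
    previous_max = None
    for keys in sequences:
        # Each local sequence must itself be ordered.
        if any(a > b for a, b in zip(keys, keys[1:])):
            return False
        if keys:
            # The smallest local keyword must exceed everything stored earlier.
            if previous_max is not None and keys[0] <= previous_max:
                return False
            previous_max = keys[-1]
    return True
```

Processes that hold no keywords are skipped in this sketch, which is an assumption; the text does not say how empty processes are treated.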

After completing the information processing of a group of processes, the method further includes the following steps S7-S8.

Step S7: Obtain a first sub-process group and a second sub-process group according to the common demarcation value, wherein all keywords in the first sub-process group are less than the common demarcation value, and all keywords in the second sub-process group are greater than the common demarcation value.

Here, the current process group is further divided according to the common demarcation value into two sub-process groups, and the next round of information processing then continues. That is, each sub-process group that includes at least two processes is taken as the current process group, and steps S1 to S6 are executed on it.

Step S8: When the first sub-process group or the second sub-process group includes at least two processes, the step of acquiring information to be processed of the current process group is executed with that sub-process group as the current process group, so as to perform information processing on the sub-process group.
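The recursion over sub-process groups can be illustrated with a single-machine simulation. This is a heavily simplified sketch, not the disclosed implementation: it assumes numeric keywords, uses the median-weighted candidate as the common demarcation value without further refinement, pools the keywords in memory instead of merging between process pairs, and splits the processes into two halves. The names `distributed_sort` and `deal` are hypothetical.

```python
def deal(keys, parts):
    """Spread keys round-robin over `parts` simulated processes."""
    return [keys[i::parts] for i in range(parts)]


def distributed_sort(group):
    """group: list of per-process key lists. Returns a list of the same
    length whose concatenation is ascending."""
    n = len(group)
    if n == 1:
        return [sorted(group[0])]
    local = [sorted(keys) for keys in group]            # step S2
    total = sum(len(keys) for keys in local)
    if total == 0:
        return local
    # Step S4 (simplified): median weight sum divided by keyword total.
    weight_sum = sum(keys[len(keys) // 2] * len(keys) for keys in local if keys)
    boundary = weight_sum / total
    # Steps S5 and S7 (simplified): separate keys around the boundary.
    small = [k for keys in local for k in keys if k < boundary]
    large = [k for keys in local for k in keys if k >= boundary]
    half = n // 2
    first_group = deal(small, half)
    second_group = deal(large, n - half)
    # Step S8: recurse into each sub-process group.
    return distributed_sort(first_group) + distributed_sort(second_group)
```

Each recursive call works on strictly fewer processes, so the simulation terminates once every sub-process group has a single process.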

Further, in order to achieve load balancing, it is necessary to analyze whether load adjustment processing is required according to the load-related information of all processes in the current process group. For example, step S3 may include the following sub-steps S301-S304.

Step S301: Obtain the load information of all processes in the current process group and collect it in the main process of the current process group, wherein the load information includes the load of each process (that is, the number of keywords in the keyword sequence of each process), the maximum load of all processes, and the average load of all processes.

Step S302: Sort the loads of all processes in the current process group in the main process, obtain an ordered array of loads of the current process group, and send the ordered array of loads to all processes in the current process group.

Here, the sorting processing is performed on the loads of all processes in the current process group, which may be sorted according to the load from small to large, or may be sorted according to the load from large to small, which is not limited here.

Step S303, according to the load information of the current process group, determine whether the load adjustment process needs to be performed.

Here, to determine accurately whether the load among the processes is balanced, that is, whether load adjustment processing is required, the load imbalance ratio can be calculated in the main process.

Therefore, in step S303, the load imbalance ratio is first calculated according to the load information of the current process group. The load imbalance ratio is the maximum load of all processes in the process group divided by the average load of all processes in the process group. The larger the load imbalance ratio, the more unbalanced the load; when the load imbalance ratio is close to 1, the load is well balanced. Then, when the load imbalance ratio is greater than a preset threshold, load adjustment processing is required.

It can be understood that, when the load imbalance ratio is not greater than the preset threshold, load adjustment processing does not need to be performed. The preset threshold may be set according to the actual situation, which is not limited here.

Step S304: When load adjustment processing is required, all processes of the current process group are divided into several process pairs according to the ordered array of loads, and load adjustment processing is performed between the two processes of each process pair, wherein the process with the i-th largest load and the process with the (N-i)-th largest load form a process pair, i<N, and N is the number of processes in the current process group.

It can be understood that, since the loads of the processes have already been sorted in the ordered array of loads, pairing the process with the i-th largest load with the process with the (N-i)-th largest load improves the efficiency of load adjustment processing: all processes are divided into several process pairs whose combined loads are roughly equal, and load adjustment is performed within each process pair, so that load balance can be achieved quickly.

In practical applications, when dividing process pairs, the current process group can also be divided into a small-value sub-process group and a large-value sub-process group, wherein the number of each process in the small-value sub-process group is smaller than the number of each process in the large-value sub-process group. The processes in the two sub-process groups are then paired, with one process of each pair taken from the small-value sub-process group and the other from the large-value sub-process group. The numbers of processes in the small-value and large-value sub-process groups can be made as close as possible, or can be determined according to the number of data less than the common demarcation value and the number of data greater than or equal to the common demarcation value obtained by the calculation module 4. For example, when there are 100 processes in the process group, and the numbers of data less than the common demarcation value and greater than or equal to it are 450,000 and 550,000 respectively, the small-value sub-process group and the large-value sub-process group should have 45 and 55 processes respectively. If the number of processes in the small-value sub-process group is less than the number of processes in the large-value sub-process group, a small-value process appears in multiple process pairs when the third processing module 5 is called.

Here, performing load adjustment processing between the two processes of each process pair includes: in each process pair, the process with the larger load transfers some of its keywords to the process with the smaller load, and the process with the smaller load merges the received keywords into its own keyword sequence based on merge sort, so that the loads of the two processes in the pair become the same or close.

Taking the process group shown in FIG. 2 as an example, the process group includes 7 processes, namely process 1 to process 7. Each process first establishes its own load information, a two-tuple composed of the number of local data (that is, the load of the process) and the process number. Then, in a bottom-up pass along the binary tree, the load information of the processes is merge-sorted, with the load as the sorting key. After the top root node, that is, process 1, is reached, process 1 finds the maximum load from the sorted load information array (process 5, with 12 data) and the total load of all processes (53 data). The average load of all processes is therefore 53/7, and the load imbalance ratio is 12/(53/7), that is, the maximum load of all processes divided by the average load of all processes. When the load imbalance ratio exceeds the preset threshold, load adjustment processing is performed. FIG. 3 gives an example of the corresponding load adjustment, with a total of 3 process pairs as shown in FIG. 3(a) (since the total number of processes, 7, is odd, process 7, which has the 4th largest load, is not paired). The data to be moved to the other process of a pair can be selected at periodic intervals, for example 77, 232, 973 and 1404 of process 5, 33, 99 and 887 of process 6, and 134, 311 and 832 of process 4 in FIG. 3(a). The result after the load adjustment processing is shown in FIG. 3(b).
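The pairing of heavily and lightly loaded processes can be sketched as follows. The sketch assumes a 1-based ranking in which rank i is paired with rank N+1-i, which is the reading consistent with the seven-process example of FIG. 3(a), where the process with the 4th largest load stays unpaired; the indexing in the text may follow a different convention. The function name `pair_by_load` and the dictionary layout are hypothetical.

```python
def pair_by_load(loads):
    """loads: dict mapping process number -> load (number of keywords).

    Returns (pairs, unpaired): a list of (heavy, light) process-number pairs
    and the process left over when the process count is odd, else None."""
    # Rank process numbers from the largest load to the smallest.
    ranked = sorted(loads, key=lambda pid: loads[pid], reverse=True)
    n = len(ranked)
    # Pair the heaviest with the lightest, the second heaviest with the
    # second lightest, and so on (assumed indexing, see the lead-in).
    pairs = [(ranked[i], ranked[n - 1 - i]) for i in range(n // 2)]
    unpaired = ranked[n // 2] if n % 2 else None
    return pairs, unpaired
```

For loads `{1: 5, 2: 9, 3: 7, 4: 11, 5: 3}` the sketch pairs process 4 with process 5 and process 2 with process 1, and leaves process 3 (the median load) unpaired.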
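The proportional sizing of the two sub-process groups in the 100-process example can be sketched as below. The rounding rule and the clamping that keeps at least one process in each group are assumptions made for illustration, and the name `split_process_counts` is hypothetical.

```python
def split_process_counts(num_processes, count_below, count_at_or_above):
    """Size the small-value and large-value sub-process groups in proportion
    to the number of data below / at-or-above the common demarcation value.

    Assumes num_processes >= 2 and at least one data item in total."""
    total = count_below + count_at_or_above
    small = round(num_processes * count_below / total)
    # Assumption: keep at least one process in each sub-process group.
    small = min(max(small, 1), num_processes - 1)
    return small, num_processes - small


print(split_process_counts(100, 450_000, 550_000))  # (45, 55)
```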
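A minimal sketch of load adjustment inside one process pair follows. It assumes that half of the load difference is moved, that the moved keywords are picked at a fixed stride from the heavier sequence, and that both sequences are already ascending; the text only says that some keywords are transferred and merged. The name `balance_pair` is hypothetical; `heapq.merge` is the standard-library merge of sorted inputs.

```python
import heapq


def balance_pair(heavy, light):
    """heavy, light: ascending key lists with len(heavy) >= len(light).

    Returns new (heavy, light) lists with equal or near-equal lengths."""
    surplus = (len(heavy) - len(light)) // 2   # assumed amount to move
    if surplus == 0:
        return list(heavy), list(light)
    # Pick `surplus` keys at periodic intervals from the heavier sequence.
    step = len(heavy) // surplus
    moved_positions = set(range(step - 1, step * surplus, step))
    moved = [k for i, k in enumerate(heavy) if i in moved_positions]
    kept = [k for i, k in enumerate(heavy) if i not in moved_positions]
    # The lighter process merges the received keys into its own sequence.
    return kept, list(heapq.merge(light, moved))
```

Because `moved` is taken in order from an ascending sequence, it is itself ascending, so a single merge pass keeps the lighter process's sequence ordered.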
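The load imbalance ratio of this example can be reproduced with a few lines. The function names and the threshold value 1.2 are hypothetical; the text leaves the preset threshold open.

```python
def load_imbalance_ratio(max_load, total_load, num_processes):
    """Maximum load divided by the average load of the process group."""
    average_load = total_load / num_processes
    return max_load / average_load


def needs_load_adjustment(ratio, threshold):
    """Load adjustment is required only when the ratio exceeds the threshold."""
    return ratio > threshold


# Values from the FIG. 2 example: maximum load 12, total load 53, 7 processes.
ratio = load_imbalance_ratio(12, 53, 7)
print(round(ratio, 3))                    # 1.585
print(needs_load_adjustment(ratio, 1.2))  # True for this assumed threshold
```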

Further, in order to complete the calculation of the common boundary value, it is necessary to comprehensively consider the relevant information of the ordered keyword sequences of all processes in the current process group. Therefore, step S4 may include the following sub-steps S401-S402.

Step S401: Obtain the information of the ordered keyword sequence of each process of the current process group and collect it in the main process of the current process group, wherein the information of the ordered keyword sequence includes the number of keywords and the keywords at specific positions.

Step S402: According to the information of the ordered key sequence of each process of the current process group, the main process calculates the candidate common demarcation value.

The keywords at specific positions are the keywords at the K-1 dividing positions when the ordered keyword sequence is divided into K segments; the keyword at the m-th such position in the ordered keyword sequence is called the m/K quantile keyword, where m≥1 and m≤K. For the main process to calculate the candidate common demarcation values according to the information of the ordered keyword sequence of each process in the current process group, step S402 may further include the following sub-steps S402-1 to S402-2.

Step S402-1: Calculate the weight of each m/K quantile keyword of the ordered keyword sequence of each process. The weight of an m/K quantile keyword is the product of the value of the m/K quantile keyword and the number of keywords in the ordered keyword sequence.

Step S402-2: Calculate the m/K candidate common demarcation value corresponding to each m/K quantile keyword. The m/K candidate common demarcation value is the sum of the weights of the m/K quantile keywords of the ordered keyword sequences of all processes in the current process group, divided by the total number of keywords in the ordered keyword sequences of all processes in the current process group.

Taking the case where the keyword at the specific position is the median of the ordered keyword sequence as an example, the median weight of the ordered keyword sequence of each process is calculated. The median weight is the product of the value of the median and the number of keywords in the ordered keyword sequence.

The median weight is calculated as follows:

Median weight = median value * number of keywords in the ordered sequence of keywords.

The median candidate common demarcation value corresponding to the median is the sum of the median weights of the ordered keyword sequences of all processes in the current process group divided by the total number of keywords in the ordered keyword sequences of all processes in the current process group.

The formula for calculating the median candidate common cutoff value is as follows:

Median Candidate Common Cutoff = Median Weight Sum/Keyword Total.
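The weighted calculation of sub-steps S402-1 and S402-2 can be sketched as follows for numeric keywords. The index used for the m/K quantile position (`(m * n) // K`, zero-based) is an assumption, since the text does not fix the exact position, and the function names are hypothetical.

```python
def quantile_keys(sorted_keys, k):
    """Keywords at the K-1 dividing positions of an ascending sequence."""
    n = len(sorted_keys)
    # Assumed convention: the m/K quantile sits at zero-based index (m*n)//k.
    return [sorted_keys[min(n - 1, (m * n) // k)] for m in range(1, k)]


def candidate_boundaries(sequences, k):
    """Return the K-1 m/K candidate common demarcation values.

    sequences: one ascending list of numeric keywords per process."""
    total = sum(len(seq) for seq in sequences)
    weight_sums = [0.0] * (k - 1)
    for seq in sequences:
        if not seq:
            continue
        for m, key in enumerate(quantile_keys(seq, k)):
            # Weight = quantile keyword value * number of local keywords.
            weight_sums[m] += key * len(seq)
    return [w / total for w in weight_sums]


# Median case (K = 2): (3 * 3 + 30 * 4) / 7 = 129 / 7
print(candidate_boundaries([[1, 3, 5], [10, 20, 30, 40]], 2))
```

With K = 2 the single candidate is exactly the median weight sum divided by the keyword total given above.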

In some embodiments, the above step S402 may further include the following sub-step S402-3.

Step S402-3, the main process sends K-1 m/K candidate common demarcation values to all processes.

In practical applications, to reduce the amount of computation, an m/K candidate common demarcation value can be determined directly as the common demarcation value of the current process group once it has been calculated. Alternatively, a threshold can be set, and when a calculated m/K candidate common demarcation value reaches the threshold, that m/K candidate common demarcation value is determined as the common demarcation value of the current process group. It can be understood that these two situations are only examples of practical applications, and this embodiment does not limit them.

In other embodiments, in order to obtain a more accurate common demarcation value, after the above step S402-3, the step S402 may further include the following sub-steps S402-4 to S402-6.

Step S402-4: For each m/K candidate common demarcation value, count a first quantity value and a second quantity value over all processes of the current process group, wherein the first quantity value is the number of keywords that are less than the m/K candidate common demarcation value, and the second quantity value is the number of keywords that are greater than or equal to the m/K candidate common demarcation value.

Step S402-5: If there is an m/K candidate common demarcation value for which the ratio between the first quantity value and the second quantity value does not exceed a preset value, determine that m/K candidate common demarcation value as the common demarcation value of the current process group.

Step S402-6: When the ratio between the first quantity value and the second quantity value of every m/K candidate common demarcation value exceeds the preset value, save the K-1 m/K candidate common demarcation values as existing candidate common demarcation values.
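Sub-steps S402-4 to S402-6 can be sketched as a counting pass followed by an acceptance test. The sketch assumes that "ratio" means the larger quantity value divided by the smaller one, which the text does not spell out; the function names and the preset value used in the example are hypothetical. `bisect_left` is the standard-library binary search.

```python
from bisect import bisect_left


def count_split(sequences, boundary):
    """First quantity value (keywords < boundary) and second quantity value
    (keywords >= boundary) over all ascending per-process sequences."""
    first = sum(bisect_left(seq, boundary) for seq in sequences)
    second = sum(len(seq) for seq in sequences) - first
    return first, second


def is_balanced(first, second, preset):
    """Assumed test: larger count / smaller count must not exceed preset."""
    if min(first, second) == 0:
        return False
    return max(first, second) / min(first, second) <= preset


def choose_boundary(sequences, candidates, preset):
    """Return the first acceptable candidate, or None if every candidate
    fails and the candidates must be kept for later refinement."""
    for candidate in candidates:
        if is_balanced(*count_split(sequences, candidate), preset):
            return candidate
    return None
```

For the sequences `[[1, 3, 5], [10, 20, 30, 40]]` and a preset value of 1.5, the candidate 2 is rejected (1 keyword against 6) and the candidate 18 is accepted (4 keywords against 3).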

In order to further determine the common cutoff value according to the existing candidate common cutoff values, step S402 may further include the following sub-steps S402-7 to S402-9.

Step S402-7, optimize the existing candidate common demarcation values, and select the adjacent first reference candidate public demarcation value and the second reference candidate public demarcation value from the existing candidate public demarcation values, wherein the first reference candidate public demarcation value is selected. The first quantity value of the candidate common demarcation value is smaller than the second quantity value, the first quantity value of the second reference candidate common demarcation value is greater than the second quantity value, and the calculation is calculated to be between the first reference candidate common demarcation value and the second reference candidate common demarcation value. The values between the boundary values are used as the optimized candidate common boundary values, and the first and second quantities of the optimized candidate public boundary values are counted.

Step S402-8, when the ratio between the first quantity value and the second quantity value of the optimized candidate common demarcation value does not exceed the preset value, determine the optimized candidate public demarcation value as the public demarcation value of the current process group ;

Step S402-9, when the ratio between the first quantitative value and the second quantitative value of the optimized candidate common demarcation value exceeds a preset value, adding the optimized candidate public demarcation value as the existing candidate public demarcation value, Step S402-7 is executed.

Here, when the difference between the first quantity value and the second quantity value does not exceed the preset value, it means that the load is relatively balanced under the current candidate common demarcation value, and when the difference between the first quantity value and the second quantity value exceeds the preset value When the value is set, it indicates that the load is unbalanced under the current candidate public boundary value. In this case, the existing candidate public boundary value needs to be optimized. Since the candidate common demarcation value is recorded every time it is calculated, during optimization, the adjacent first reference candidate public demarcation value and the second reference candidate public demarcation value can be selected from the existing candidate public demarcation values for optimization processing. Through such optimization, a better candidate common demarcation value can be obtained, which can further improve the load balance, thereby improving the efficiency of distributed parallel processing of data, avoiding low processing efficiency caused by unbalanced load, and the need for different computing nodes. The processing time cost is uneven, resulting in low resource utilization.

For example, the adjacent first reference candidate common demarcation value 10000 and second reference candidate common demarcation value 20000 are selected from the existing candidate common demarcation values. With the candidate 10000, the numbers of keywords less than 10000 and not less than 10000 are 1000 (the first quantity value) and 2000 (the second quantity value), respectively; with the candidate 20000, the numbers of keywords less than 20000 and not less than 20000 are 2000 (the first quantity value) and 1000 (the second quantity value), respectively. The average of 10000 and 20000, namely 15000 (a value between the first reference candidate common demarcation value and the second reference candidate common demarcation value), can then be calculated as the optimized candidate common demarcation value.
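The optimization loop of steps S402-7 to S402-9 can be sketched in Python as a bisection between the two reference candidates. This is a minimal sketch under stated assumptions, not the patented implementation: the function names, the `max_ratio` threshold of 1.2, the definition of the ratio as larger count divided by smaller count, and the choice of the midpoint as the "value between" the two references are all illustrative. Keeping only the two nearest references is a simplification of recording every candidate and re-selecting an adjacent pair.

```python
from bisect import bisect_left


def count_split(sequences, candidate):
    """Return (first, second): the number of keywords below the candidate
    and the number at or above it, summed over all sorted sequences."""
    first = sum(bisect_left(seq, candidate) for seq in sequences)
    total = sum(len(seq) for seq in sequences)
    return first, total - first


def is_balanced(first, second, max_ratio):
    # Assumption: the ratio is measured as larger count / smaller count.
    small, large = min(first, second), max(first, second)
    return small > 0 and large / small <= max_ratio


def refine_demarcation(sequences, low, high, max_ratio=1.2, max_rounds=64):
    """low is a candidate whose first quantity is below its second quantity;
    high is a candidate whose first quantity is above its second quantity."""
    for _ in range(max_rounds):
        mid = (low + high) / 2          # a value between the two references
        first, second = count_split(sequences, mid)
        if is_balanced(first, second, max_ratio):
            return mid                  # accepted as common demarcation value
        if first < second:
            low = mid                   # mid replaces the lower reference
        else:
            high = mid                  # mid replaces the upper reference
    return (low + high) / 2             # fallback after max_rounds attempts
```

With two hypothetical sorted sequences `list(range(0, 30000, 1000))` and `list(range(500, 30000, 1000))`, the references 10000 and 20000 split the keywords 20/40 and 40/20, and the first midpoint 15000 splits them 30/30, so it is accepted immediately.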

Taking the process group shown in Figure 4 as an example, the keyword at the specific position is the median of the ordered keyword sequence. The process group contains 6 processes, which are organized into a binary tree for communication to speed up information transfer. The inter-process information of the process group is passed and reduced bottom-up along the binary tree. The root node at the top of the binary tree, namely process 1, finally obtains (68, 17653), where 68 is the total number of keywords in the 6 processes and 17653 is the sum of the median weights of the 6 processes, and the candidate common demarcation value 259 ≈ 17653/68 is calculated.
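The reduction in this example can be sketched in Python as follows. The sketch is illustrative only: the data of Figure 4 is not reproduced here, so the sequences in the usage note are hypothetical, and the function names, the use of the upper median for even-length sequences, and the integer division are assumptions. Because the pairwise combination is associative, reducing along a binary tree gives the same totals as the sequential reduction used below.

```python
from functools import reduce


def local_summary(seq):
    """Per-process pair: (number of keywords, median * number of keywords)."""
    if not seq:
        return (0, 0)
    median = seq[len(seq) // 2]  # assumption: upper median for even lengths
    return (len(seq), median * len(seq))


def combine(a, b):
    """Reduction operator applied at each node of the communication tree."""
    return (a[0] + b[0], a[1] + b[1])


def candidate_demarcation(sequences):
    """Return (total keywords, sum of median weights, candidate value)."""
    total, weighted = reduce(combine, map(local_summary, sequences), (0, 0))
    if total == 0:
        raise ValueError("the process group holds no keywords")
    # Assumption: integer division, consistent with 17653 // 68 == 259.
    return total, weighted, weighted // total
```

For the hypothetical sequences `[1, 2, 3]` and `[10, 20, 30, 40, 50]`, the root would obtain (8, 156) and the candidate value 19.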

After the non-ordered keyword sequences have been processed into ordered keyword sequences and the inter-process load adjustment has been performed, the load among the processes is balanced. To obtain the ordered keywords of the current process group, a fast merge sort must then be performed. To this end, step S5 further includes the following sub-steps S501 to S503.

Step S501, according to the common demarcation value of the current process group, divide the ordered keyword sequence of each process of the current process group into a first ordered keyword subsequence whose keywords are smaller than the common demarcation value and a second ordered keyword subsequence whose keywords are greater than or equal to the common demarcation value.

Step S502, perform an exchange of ordered keyword subsequences between every two paired processes in the current process group, wherein the process with the smaller number transfers its second ordered keyword subsequence to the process with the larger number, and the process with the larger number transfers its first ordered keyword subsequence to the process with the smaller number.

Step S503, each process of the current process group merges its two ordered keyword subsequences into one ordered keyword sequence, so that each process stores an ordered keyword sequence in a distributed manner and any keyword on the i-th process is smaller than any keyword on the (i+1)-th process.

Here, when keyword merge sort processing is performed between every two processes in the current process group, each process first divides its ordered keyword sequence, according to the common demarcation value, into two ordered keyword subsequences: one whose keywords are smaller than the common demarcation value and one whose keywords are greater than or equal to it. The two processes then exchange ordered keyword subsequences: the process with the smaller number transfers the subsequence whose keywords are greater than or equal to the common demarcation value to the process with the larger number, and the process with the larger number transfers the subsequence whose keywords are smaller than the common demarcation value to the process with the smaller number. Finally, each process merges the subsequence it retained and the subsequence it received into a new ordered keyword sequence.

Taking the two processes shown in Table 1 as an example, the common demarcation value is 400, and Table 1 gives the ordered keyword sequences on the two processes (the process with the smaller number is marked as process 1, and the process with the larger number is marked as process 2). Table 2 shows the two ordered keyword subsequences determined by each process according to the common demarcation value. Table 3 shows the situation after the two processes have exchanged ordered keyword subsequences with each other. Table 4 shows the new ordered keyword sequence that each process obtains after merging its two ordered keyword subsequences.

Table 1

Table 2

Table 3

Table 4
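The split, exchange and merge of steps S501 to S503 can be sketched in Python for one pair of processes. This is a minimal single-machine sketch in which the two processes are modelled as two sorted lists and the exchange is modelled by passing slices. The function names are hypothetical, and the keyword values in the usage note are illustrative; they are not the contents of Tables 1 to 4.

```python
from bisect import bisect_left
from heapq import merge


def split_at(seq, demarcation):
    """S501: split a sorted sequence into keywords < demarcation
    and keywords >= demarcation."""
    cut = bisect_left(seq, demarcation)
    return seq[:cut], seq[cut:]


def exchange_and_merge(low_proc, high_proc, demarcation):
    """S502 and S503 for one process pair.

    low_proc is the sorted sequence on the process with the smaller number,
    high_proc the sorted sequence on the process with the larger number."""
    low_small, low_large = split_at(low_proc, demarcation)
    high_small, high_large = split_at(high_proc, demarcation)
    # S502: the lower-numbered process sends low_large and receives
    # high_small; the higher-numbered process does the opposite.
    # S503: each process merges what it kept with what it received.
    new_low = list(merge(low_small, high_small))
    new_high = list(merge(low_large, high_large))
    return new_low, new_high
```

With the common demarcation value 400 and the hypothetical sequences `[100, 250, 400, 700]` and `[50, 300, 450, 900]`, the lower-numbered process ends with `[50, 100, 250, 300]` and the higher-numbered process with `[400, 450, 700, 900]`. Because both inputs to each merge are already sorted, the merge is linear in the number of keywords involved.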

Given the total number of keywords N' and the number of processes N, with the method provided in this embodiment the processes remain load-balanced, the overall minimum complexity of the information processing is O(N'*log(N')), and the complexity on each process is O(N'*log(N')/N), which already accounts for the fast merge sort and the load exchange between processes. log(N) rounds of sorting are performed between processes, and in each round, inter-process communication takes place within the process group to transmit and broadcast information related to the common demarcation value and to load balancing. When a binary tree is used to organize the communication within the process group, the complexity of this communication on each process is O(log(N)*log(N)).

Embodiment 2

Corresponding to the first embodiment, this embodiment provides an information processing system. As shown in FIG. 5, the system includes: an information acquisition module 1, a first processing module 2, a second processing module 3, a calculation module 4, a third processing module 5, a storage module 6 and a control module 7.

The information acquisition module 1 is configured to acquire the pending information of the current process group, wherein the current process group includes at least two different processes, each piece of the pending information contains a keyword for marking or searching for that information, and all keywords of the pending information have been stored in a distributed manner across the processes of the current process group, forming a keyword sequence in each process.

The first processing module 2 is configured to process the keyword sequence of the process into an ordered keyword sequence if there is a process whose keyword sequence is a non-ordered keyword sequence.

The second processing module 3 is configured to perform inter-process load adjustment processing on the current process group when the load adjustment processing needs to be performed, so as to balance the load among the processes.

The calculation module 4 is configured to calculate the common demarcation value of the current process group.

The third processing module 5 is configured to perform keyword merge sorting processing between every two processes in the current process group according to the common demarcation value.

The storage module 6 is configured to store the results of the keyword merge sort processing in a distributed manner across the processes of the current process group, so that each process stores an ordered keyword sequence and any keyword on the i-th process is smaller than any keyword on the (i+1)-th process.

The control module 7 is configured to control the calling of the information acquisition module 1, the first processing module 2, the second processing module 3, the calculation module 4, the third processing module 5 and the storage module 6.

In this embodiment, for the implementation of the information acquisition module 1, refer to step S1 in the first embodiment; for the implementation of the first processing module 2, refer to step S2 in the first embodiment; for the implementation of the second processing module 3, refer to step S3 in the first embodiment; for the implementation of the calculation module 4, refer to step S4 in the first embodiment; for the implementation of the third processing module 5, refer to step S5 in the first embodiment; and for the implementation of the storage module 6, refer to step S6 in the first embodiment. These are not repeated in this embodiment.

The system provided by this embodiment is a process-level parallel distributed information processing system. When performing information processing, the control module 7 first calls the first processing module 2 so that the keyword sequence in each process becomes an ordered keyword sequence; then, based on the idea of quicksort, the collaborative sorting between processes is completed by calling the other three modules. In one round of sorting a process group consisting of N>1 processes, the control module 7 calls the second processing module 3 to adjust the distribution of all keywords of the process group among the processes, then calls the calculation module 4 to determine the common demarcation value of the process group, and then calls the third processing module 5 to complete the data exchange and fast merge sort processing between the two processes of each pair. This completes one round of sorting and yields the information processing result of the current process group, that is, its ordered keywords. The storage module 6 stores the results of the keyword merge sort processing across the processes of the current process group, so that each process stores an ordered keyword sequence and any keyword on the i-th process is smaller than any keyword on the (i+1)-th process.

It can be understood that, when the process pairs are formed, the current process group can be divided into a small-value sub-process group and a large-value sub-process group (the number of every process in the small-value sub-process group is smaller than the number of every process in the large-value sub-process group). Therefore, in practical applications, after one round of sorting is completed, the control module 7 can carry out the next round of sorting separately within the small-value sub-process group and the large-value sub-process group.

In addition, the third processing module 5 is further configured to obtain the first sub-process group and the second sub-process group according to the common demarcation value, wherein all keywords in the first sub-process group are less than the common demarcation value, and all keywords in the second sub-process group are greater than the common demarcation value. The control module 7 is further configured to call the information acquisition module 1, the first processing module 2, the second processing module 3, the calculation module 4, the third processing module 5 and the storage module 6 when the first sub-process group or the second sub-process group includes at least two processes, so as to perform information processing on each sub-process group that includes at least two processes.
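The round-by-round control flow described for the control module can be sketched as a single-machine Python simulation in which each process is a list. Several choices here are assumptions made only for illustration and are not stated in the disclosure: process i is paired with process i + N/2, N is a power of two, the common demarcation value is taken directly from the weighted medians without the ratio check or the optimization steps, and the load adjustment of the second processing module is omitted. All function names are hypothetical.

```python
from bisect import bisect_left
from heapq import merge


def sort_group(procs):
    """One round of sorting on a group of sorted lists, then recursion
    into the small-value and large-value sub-process groups."""
    n = len(procs)
    total = sum(len(p) for p in procs)
    if n < 2 or total == 0:
        return procs
    # Candidate demarcation value: medians weighted by sequence length.
    pivot = sum(p[len(p) // 2] * len(p) for p in procs if p) / total
    half = n // 2
    result = list(procs)
    for i in range(half):  # assumed pairing: process i with process i + half
        lo, hi = result[i], result[i + half]
        lo_cut, hi_cut = bisect_left(lo, pivot), bisect_left(hi, pivot)
        # Exchange subsequences, then merge on each side of the pair.
        result[i] = list(merge(lo[:lo_cut], hi[:hi_cut]))
        result[i + half] = list(merge(lo[lo_cut:], hi[hi_cut:]))
    return sort_group(result[:half]) + sort_group(result[half:])


def distributed_sort(procs):
    """Sort locally on every process, then sort the group cooperatively."""
    n = len(procs)
    if n == 0 or n & (n - 1):
        raise ValueError("this sketch assumes a power-of-two process count")
    return sort_group([sorted(p) for p in procs])
```

Calling `distributed_sort([[9, 3, 7], [1, 8], [6, 2, 5], [4, 0]])` leaves every list sorted and every keyword on list i smaller than every keyword on list i+1. Without the load adjustment step, the lists may end up with unequal lengths, which is the imbalance the second processing module is meant to limit.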

It is worth noting that when a process includes multiple threads, each of the above modules can use multiple threads within a process to perform accelerated computation.

Embodiment 3

This embodiment provides an electronic device, including a memory and a processor, the memory stores a computer program, and when the computer program is executed by the processor, the information processing method according to the first embodiment is implemented.

It will be appreciated that the electronic device may also include a communication component.

The processor is configured to execute all or part of the steps in the information processing method in the first embodiment. The memory is used to store various types of data, which may include, for example, instructions for methods in the electronic device, as well as data related to the electronic device.

The processor may be implemented as an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a controller, a microcontroller, a microprocessor or another electronic component, and is used to perform the information processing method in the first embodiment above.

The memory can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.

Communication components are used for wired or wireless communication between the electronic device and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G or 4G, or a combination of one or more of them; accordingly, the communication component may include a Wi-Fi module, a Bluetooth module and an NFC module.

Embodiment 4

This embodiment provides a storage medium, where a computer program is stored on the storage medium, and when the computer program is executed by one or more processors, the information processing method according to the first embodiment is implemented.

The storage medium may be, for example, a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, a server or an app store, on which a computer program is stored. When the computer program is executed by a processor, all or part of the steps of the information processing method in the first embodiment can be implemented. For the specific implementation of those steps, refer to the first embodiment, which is not repeated in this embodiment.

In the embodiments provided in the present disclosure, it should be understood that the disclosed apparatus and method may also be implemented in other manners. The apparatus embodiments described above are merely illustrative; for example, the flowcharts and block diagrams in the accompanying drawings illustrate the architecture, functionality and operation of possible implementations of apparatuses, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, segment, or portion of code, which contains one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It is also noted that each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented in dedicated hardware-based systems that perform the specified functions or actions, or can be implemented in a combination of dedicated hardware and computer instructions.

With the information processing method, system, electronic device and storage medium provided by the present disclosure, the order adjustment of massive amounts of information can be completed among multiple computing nodes, thereby improving processing efficiency. If there is a process whose keyword sequence is a non-ordered keyword sequence, the keyword sequence in that process is first processed into an ordered keyword sequence; when load adjustment is required between processes, load adjustment processing is performed to balance the load of the processes; and, based on the calculated common demarcation value, fast merge sort processing is performed between every two processes in the current process group, combining the ordered keyword sequences on the two processes to complete one fast merge sort. With the present disclosure, the load of the processes is well balanced and the computational complexity is low, so the efficiency of distributed parallel processing can be effectively improved and the ordered keywords of the process group can be obtained quickly.

It should be noted that, in this disclosure, the terms "comprising", "including" or any other variation thereof are intended to encompass non-exclusive inclusion, such that a process, method, article or device comprising a series of elements includes not only those elements, but also other elements not expressly listed or inherent to such a process, method, article or device. Without further limitation, an element qualified by the phrase "comprising a..." does not preclude the presence of additional identical elements in the process, method, article, or device that includes the element.

Although the embodiments of the present disclosure are as described above, the above contents are only embodiments adopted to facilitate understanding of the present disclosure and are not intended to limit it. Any person skilled in the art to which this disclosure belongs can make modifications and changes in the form and details of implementation without departing from the spirit and scope of this disclosure, but the scope of patent protection of this disclosure shall still be as defined by the appended claims.

Claims

An information processing method, comprising:

Obtain the pending information of the current process group, wherein the current process group includes at least two different processes, each piece of the pending information contains a keyword for marking or searching for that information, and all keywords of the pending information have been stored in a distributed manner across the processes of the current process group, forming a keyword sequence in each process;

If there is a process whose keyword sequence is a non-ordered keyword sequence, the keyword sequence of the process is processed as an ordered keyword sequence;

When load adjustment processing is required, inter-process load adjustment processing is performed on the current process group to balance the load among the processes;

Calculate the common demarcation value of the current process group;

Perform keyword merge sort processing between every two processes in the current process group according to the common demarcation value; and

The results of the keyword merge sort processing are stored in a distributed manner across the processes of the current process group, so that each process stores an ordered keyword sequence and any keyword on the i-th process is smaller than any keyword on the (i+1)-th process.
The information processing method according to claim 1, further comprising:

According to the common demarcation value, the first sub-process group and the second sub-process group are obtained, wherein all the keywords in the first sub-process group are less than the common demarcation value, and all the keywords in the second sub-process group are greater than the common demarcation value.
The information processing method according to claim 2, further comprising:

When the first sub-process group or the second sub-process group includes at least two processes, the step of obtaining the pending information of the current process group is performed on that sub-process group, so as to perform information processing on the sub-process group.
The information processing method according to claim 1, wherein when load adjustment processing is required, performing inter-process load adjustment processing on the current process group to balance the load among the processes, comprising:

Get the load information of all processes in the current process group and collect them in the main process of the current process group;

In the main process, the loads of all processes in the current process group are sorted to obtain an ordered array of loads of the current process group, and the ordered array of loads is sent to all processes in the current process group;

According to the load information of the current process group, determine whether load adjustment processing is required;

When load adjustment processing is required, all processes in the current process group are divided into several process pairs according to the ordered array of loads, and load adjustment processing is performed between the two processes of each process pair, wherein the process with the i-th largest load and the process with the (N-i)-th largest load form a process pair, i<N, where N is the number of processes in the current process group.
The information processing method according to claim 4, wherein the performing load adjustment processing between two processes of each process pair comprises:

In each process pair, the process with the larger load transfers some of its keywords to the process with the smaller load, and the process with the smaller load merges the keywords received from the process with the larger load into its own sequence by merge sort, so that the load of the two processes in the pair is balanced.
The information processing method according to claim 4, wherein the load information includes: the load of each process, the maximum load of all the processes, and the average load of all the processes.
The information processing method according to claim 4, wherein the determining whether to perform load adjustment processing according to the load information of the current process group comprises:

Calculate the load imbalance rate according to the load information of the current process group;

When the load imbalance ratio is greater than the preset threshold, load adjustment processing is required.
The information processing method according to claim 1, wherein the calculating the common demarcation value of the current process group comprises:

Obtain information about the ordered keyword sequence of each process of the current process group and collect it in the main process of the current process group, wherein the information about the ordered keyword sequence includes: the number of keywords in the ordered keyword sequence and the keywords at specific positions;

According to the information about the ordered keyword sequence of each process of the current process group, the main process calculates the common demarcation value.
The information processing method according to claim 8, wherein the keywords at the specific positions are the keywords at the K-1 positions that divide the ordered keyword sequence into K segments, and the keyword at the m-th such position in the ordered keyword sequence is called the m/K quantile keyword, m≥1, and m≤K; the main process calculating the common demarcation value according to the information about the ordered keyword sequence of each process in the current process group includes:

Calculate the weight of each m/K quantile keyword of the ordered keyword sequence of each process, wherein the m/K quantile keyword weight is the product of the value of the m/K quantile keyword and the number of keywords in the ordered keyword sequence;

Calculate the m/K candidate common demarcation value corresponding to each m/K quantile keyword, wherein the m/K candidate common demarcation value is the sum of the m/K quantile keyword weights of the ordered keyword sequences of all processes in the current process group, divided by the total number of keywords in the ordered keyword sequences of all processes in the current process group.
The information processing method according to claim 9, wherein, according to the information of the ordered keyword sequence of each process of the current process group, the main process calculates the common demarcation value, further comprising:

The main process sends the K-1 m/K candidate common demarcation values to all processes.
The information processing method according to claim 10, wherein, according to the information of the ordered keyword sequence of each process of the current process group, the main process calculates the common demarcation value, further comprising:

Count the first quantity value and the second quantity value of each m/K candidate common demarcation value over all processes of the current process group, wherein the first quantity value is the number of keywords less than the m/K candidate common demarcation value, and the second quantity value is the number of keywords greater than or equal to the m/K candidate common demarcation value;

If there is an m/K candidate common demarcation value for which the ratio between the first quantity value and the second quantity value does not exceed a preset value, that m/K candidate common demarcation value is determined as the common demarcation value of the current process group;

When the ratio between the first quantity value and the second quantity value exceeds the preset value for every m/K candidate common demarcation value, save the K-1 m/K candidate common demarcation values as the existing candidate common demarcation values.
The information processing method according to claim 11, wherein, according to the information of the ordered keyword sequence of each process of the current process group, the main process calculates the common demarcation value, further comprising:

The existing candidate common demarcation values are optimized: an adjacent first reference candidate common demarcation value and second reference candidate common demarcation value are selected from the existing candidate common demarcation values, wherein the first quantity value of the first reference candidate common demarcation value is smaller than its second quantity value, and the first quantity value of the second reference candidate common demarcation value is greater than its second quantity value; a value between the first reference candidate common demarcation value and the second reference candidate common demarcation value is calculated as the optimized candidate common demarcation value; and the first quantity value and the second quantity value of the optimized candidate common demarcation value are counted;

When the ratio between the first quantity value and the second quantity value of the optimized candidate common demarcation value does not exceed a preset value, determine the optimized candidate common demarcation value as the common demarcation value of the current process group;

When the ratio between the first quantity value and the second quantity value of the optimized candidate common demarcation value exceeds the preset value, add the optimized candidate common demarcation value to the existing candidate common demarcation values, and execute again the step of optimizing the existing candidate common demarcation values and selecting the adjacent first reference candidate common demarcation value and second reference candidate common demarcation value from the existing candidate common demarcation values.
The information processing method according to claim 1, wherein, according to the common demarcation value, performing keyword merge sorting processing between every two processes in the current process group, comprising:

According to the common demarcation value of the current process group, the ordered keyword sequence of each process in the current process group is divided into a first ordered keyword subsequence whose keywords are smaller than the common demarcation value and a second ordered keyword subsequence whose keywords are greater than or equal to the common demarcation value;

An exchange of ordered keyword subsequences is performed between every two paired processes in the current process group, wherein the process with the smaller number transfers its second ordered keyword subsequence to the process with the larger number, and the process with the larger number transfers its first ordered keyword subsequence to the process with the smaller number;

Each process of the current process group merges its two ordered keyword subsequences into one ordered keyword sequence.
An information processing system, comprising:

An information acquisition module, configured to acquire the pending information of the current process group, wherein the current process group includes at least two different processes, each piece of the pending information contains a keyword for marking or searching for that information, and all keywords of the pending information have been stored in a distributed manner across the processes of the current process group, forming a keyword sequence in each process;

The first processing module is configured to process the keyword sequence of the process into an ordered keyword sequence if there is a process whose keyword sequence is a non-ordered keyword sequence;

The second processing module is configured to perform inter-process load adjustment processing on the current process group when load adjustment processing is required, so as to balance the load among the processes;

A calculation module, configured to calculate the common demarcation value of the current process group;

a third processing module, configured to perform keyword merge sorting processing between every two processes in the current process group according to the common demarcation value;

The storage module is configured to store the results of the keyword merge sort processing in a distributed manner across the processes of the current process group, so that each process stores an ordered keyword sequence and any keyword on the i-th process is smaller than any keyword on the (i+1)-th process; and

The control module is configured to control the calling of the information acquisition module, the first processing module, the second processing module, the calculation module, the third processing module and the storage module.
The information processing system according to claim 14, wherein the third processing module is further configured to obtain a first sub-process group and a second sub-process group according to the common demarcation value, wherein all the keywords in the first sub-process group are less than the common demarcation value, and all the keywords in the second sub-process group are greater than the common demarcation value;

The control module is further configured to call the information acquisition module, the first processing module, the second processing module, the calculation module, the third processing module and the storage module when the first sub-process group or the second sub-process group includes at least two processes, so as to perform information processing on each sub-process group that includes at least two processes.
An electronic device includes a memory and a processor, the memory stores a computer program, and when the computer program is executed by the processor, the information processing method according to any one of claims 1 to 13 is implemented.
A storage medium, on which a computer program is stored, and when the computer program is executed by one or more processors, implements the information processing method according to any one of claims 1 to 13.