CN113722071A - Data processing method, data processing apparatus, electronic device, storage medium, and program product - Google Patents


Info

Publication number
CN113722071A
Authority
CN
China
Prior art keywords
reduction
data
key value
nodes
target key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111062546.6A
Other languages
Chinese (zh)
Inventor
关振宇
朱家强
郑为锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lakala Payment Co ltd
Original Assignee
Lakala Payment Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Lakala Payment Co ltd
Priority to CN202111062546.6A
Publication of CN113722071A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load

Abstract

Embodiments of the present disclosure disclose a data processing method, a data processing apparatus, an electronic device, a storage medium, and a program product. The method comprises: determining a target field and a target key value; determining a reduction number according to the target field and the target key value; and executing a grouping calculation task on grouping calculation input data, with the target field, the target key value, and the reduction number as grouping parameters. This technical solution allows the reduction number to be adjusted flexibly, which helps maintain the utilization rate of the reduce nodes, reduces the resource overhead of the system, and facilitates load balancing.

Description

Data processing method, data processing apparatus, electronic device, storage medium, and program product
Technical Field
The disclosed embodiments relate to the technical field of data processing, and in particular, to a data processing method, an apparatus, an electronic device, a storage medium, and a program product.
Background
Reduce refers to a mode of computation in which, at runtime, data is copied from the relevant map (mapping) end to a Reduce node for reduction processing. In the prior art, the reduce number is generally fixed, except that it can be set to 0 when some data is considered not to require reduction. However, the data to be processed varies in size. If the same reduce number is allocated to data of every size, the utilization rate of the reduce nodes may fall, the resource overhead of the system may rise, and load balancing may be hindered.
Disclosure of Invention
The disclosed embodiment provides a data processing method, a data processing device, an electronic device, a storage medium and a program product.
In a first aspect, an embodiment of the present disclosure provides a data processing method.
Specifically, the data processing method includes:
determining a target field and a target key value;
determining the reduction number according to the target field and the target key value;
and executing a grouping calculation task on the grouping calculation input data by taking the target field, the target key value and the reduction number as grouping parameters.
With reference to the first aspect, in a first implementation manner of the first aspect, the determining, according to the target field and the target key value, a reduction number includes:
retrieving the grouped input data by taking the target field and the target key value as index values to obtain the number of target key value data;
and determining the reduction number according to the number of the target key value data.
With reference to the first aspect and the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the determining a reduction number according to the number of target key value data includes:
acquiring the number of reduction nodes;
and comparing the number of the target key value data with the number of the reduction nodes to determine the reduction number.
With reference to the first aspect, the first implementation manner of the first aspect, and the second implementation manner of the first aspect, in a third implementation manner of the first aspect, the comparing the number of target key value data with the number of reduction nodes to determine a reduction number includes:
when the number of the target key value data is less than or equal to the number of the reduction nodes, setting the reduction number to the number of the target key value data;
when the number of the target key value data is greater than the number of the reduction nodes and is exactly divisible by the number of the reduction nodes, setting the reduction number to the number of the reduction nodes;
when the number of the target key value data is greater than the number of the reduction nodes, is not exactly divisible by the number of the reduction nodes, and the remainder is greater than half of the number of the reduction nodes, setting the reduction number to the number of the reduction nodes or to the remainder;
and when the number of the target key value data is greater than the number of the reduction nodes, is not exactly divisible by the number of the reduction nodes, and the remainder is less than or equal to half of the number of the reduction nodes, setting the reduction number to the number of the reduction nodes or to the remainder.
With reference to the first aspect, the first implementation manner of the first aspect, the second implementation manner of the first aspect, and the third implementation manner of the first aspect, in a fourth implementation manner of the first aspect, the present disclosure further includes:
performing preset processing on data obtained after the grouping calculation task is executed to obtain preset processing data;
and sending the preset processing data and the reduction number to a reduction component.
In a second aspect, a data processing method is provided in an embodiment of the present disclosure.
Specifically, the data processing method includes:
receiving preset processing data and a reduction number sent by a grouping calculation component;
determining a reduction node according to the reduction number;
and executing reduction processing on the preset processing data by using the determined reduction node.
In a third aspect, an embodiment of the present disclosure provides a data processing method.
Specifically, the data processing method includes:
the grouping calculation component determines a target field and a target key value, determines a reduction number according to the target field and the target key value, executes a grouping calculation task on grouping calculation input data by taking the target field, the target key value and the reduction number as grouping parameters, performs preset processing on data obtained after the grouping calculation task is executed to obtain preset processing data, and sends the preset processing data and the reduction number to the reduction component;
the reduction component receives the preset processing data and the reduction number sent by the grouping calculation component, determines a reduction node according to the reduction number, and executes reduction processing on the preset processing data by using the determined reduction node.
In a fourth aspect, a data processing apparatus is provided in an embodiment of the present disclosure.
Specifically, the data processing apparatus includes:
a first determination module configured to determine a target field and a target key value;
a second determination module configured to determine a reduction number from the target field and a target key value;
an execution module configured to execute a group computation task on the group computation input data with the target field, the target key value, and the reduction number as group parameters.
With reference to the fourth aspect, in a first implementation manner of the fourth aspect, the second determining module is configured to:
retrieving the grouped input data by taking the target field and the target key value as index values to obtain the number of target key value data;
and determining the reduction number according to the number of the target key value data.
With reference to the fourth aspect and the first implementation manner of the fourth aspect, in a second implementation manner of the fourth aspect, the part that determines the reduction number according to the number of target key value data is configured to:
acquire the number of reduction nodes;
and compare the number of the target key value data with the number of the reduction nodes to determine the reduction number.
With reference to the fourth aspect, the first implementation manner of the fourth aspect, and the second implementation manner of the fourth aspect, in a third implementation manner of the fourth aspect, the part that compares the number of target key value data with the number of reduction nodes to determine the reduction number is configured to:
when the number of the target key value data is less than or equal to the number of the reduction nodes, set the reduction number to the number of the target key value data;
when the number of the target key value data is greater than the number of the reduction nodes and is exactly divisible by the number of the reduction nodes, set the reduction number to the number of the reduction nodes;
when the number of the target key value data is greater than the number of the reduction nodes, is not exactly divisible by the number of the reduction nodes, and the remainder is greater than half of the number of the reduction nodes, set the reduction number to the number of the reduction nodes or to the remainder;
and when the number of the target key value data is greater than the number of the reduction nodes, is not exactly divisible by the number of the reduction nodes, and the remainder is less than or equal to half of the number of the reduction nodes, set the reduction number to the number of the reduction nodes or to the remainder.
With reference to the fourth aspect, the first implementation manner of the fourth aspect, the second implementation manner of the fourth aspect, and the third implementation manner of the fourth aspect, in a fourth implementation manner of the fourth aspect, the present disclosure further includes:
a sending module configured to perform preset processing on the data obtained after the grouping calculation task is executed to obtain preset processing data, and to send the preset processing data and the reduction number to the reduction component.
In a fifth aspect, a data processing apparatus is provided in an embodiment of the present disclosure.
Specifically, the data processing apparatus includes:
a receiving module configured to receive the preset processing data and the reduction number sent by the grouping calculation component;
a third determining module configured to determine a reduction node according to the reduction number;
a processing module configured to perform reduction processing on the preset processing data using the determined reduction node.
In a sixth aspect, a data processing apparatus is provided in an embodiment of the present disclosure.
Specifically, the data processing apparatus includes:
a grouping calculation component configured to determine a target field and a target key value, determine a reduction number according to the target field and the target key value, execute a grouping calculation task on grouping calculation input data by taking the target field, the target key value and the reduction number as grouping parameters, perform preset processing on data obtained after the grouping calculation task is executed to obtain preset processing data, and send the preset processing data and the reduction number to the reduction component;
a reduction component configured to receive the preset processing data and the reduction number sent by the grouping calculation component, determine a reduction node according to the reduction number, and perform reduction processing on the preset processing data by using the determined reduction node.
In a seventh aspect, an embodiment of the present disclosure provides an electronic device, including a memory and a processor, where the memory is used to store one or more computer instructions for supporting a data processing apparatus to execute the data processing method, and the processor is configured to execute the computer instructions stored in the memory. The data processing apparatus may further comprise a communication interface for the data processing apparatus to communicate with other devices or a communication network.
In an eighth aspect, the disclosed embodiments provide a computer-readable storage medium for storing computer instructions used by a data processing apparatus, including the computer instructions involved in executing the data processing method described above.
In a ninth aspect, the disclosed embodiments provide a computer program product comprising computer programs/instructions which, when executed by a processor, implement the steps of the above-described data processing method.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
the above solution adjusts the reduction number by setting a reduction number parameter for the grouping calculation. This technical solution allows the reduction number to be adjusted flexibly, which helps maintain the utilization rate of the reduce nodes, reduces the resource overhead of the system, and facilitates load balancing.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of embodiments of the disclosure.
Drawings
Other features, objects, and advantages of embodiments of the disclosure will become more apparent from the following detailed description of non-limiting embodiments when taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 shows a flow diagram of a data processing method according to an embodiment of the present disclosure;
FIG. 2 shows a flow diagram of a data processing method according to another embodiment of the present disclosure;
FIG. 3 shows a flow diagram of a data processing method according to yet another embodiment of the present disclosure;
FIG. 4 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 5 shows a block diagram of a data processing apparatus according to another embodiment of the present disclosure;
FIG. 6 shows a block diagram of a data processing apparatus according to yet another embodiment of the present disclosure;
FIG. 7 shows a block diagram of an electronic device according to an embodiment of the present disclosure;
FIG. 8 shows a schematic block diagram of a computer system suitable for implementing a data processing method according to an embodiment of the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the disclosed embodiments will be described in detail with reference to the accompanying drawings so that they can be easily implemented by those skilled in the art. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the disclosed embodiments, it is to be understood that terms such as "including" or "having," etc., are intended to indicate the presence of the disclosed features, numbers, steps, behaviors, components, parts, or combinations thereof, and are not intended to preclude the possibility that one or more other features, numbers, steps, behaviors, components, parts, or combinations thereof may be present or added.
It should be further noted that the embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict. The embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
The technical scheme provided by the embodiment of the disclosure realizes the adjustment of the reduction number by setting the reduction number parameter for the grouping calculation. The technical scheme can flexibly adjust the reduction number, thereby ensuring the utilization rate of the reduce node, reducing the resource overhead of the system and being beneficial to load balancing.
Fig. 1 shows a flowchart of a data processing method according to an embodiment of the present disclosure, which includes the following steps S101 to S103, as shown in fig. 1:
in step S101, a target field and a target key value are determined;
in step S102, determining a reduction number according to the target field and the target key value;
in step S103, a grouping calculation task is performed on the grouping calculation input data with the target field, the target key value, and the reduction number as grouping parameters.
As mentioned above, Reduce refers to a mode of computation in which, at runtime, data is copied from the relevant map (mapping) end to a Reduce node for reduction processing. In the prior art, the reduce number is generally fixed, except that it can be set to 0 when some data is considered not to require reduction. However, the data to be processed varies in size. If the same reduce number is allocated to data of every size, the utilization rate of the reduce nodes may fall, the resource overhead of the system may rise, and load balancing may be hindered.
In view of the above, this embodiment proposes a data processing method that adjusts the reduction number by setting a reduction number parameter for the grouping calculation. This technical solution allows the reduction number to be adjusted flexibly, which helps maintain the utilization rate of the reduce nodes, reduces the resource overhead of the system, and facilitates load balancing.
In an embodiment of the present disclosure, the data processing method may be applied to a grouping calculation component that performs data processing.
In an embodiment of the present disclosure, the target field refers to the field information on which the grouping calculation is based, and the target key value refers to the key value information on which the grouping calculation is based. For example, if the target field is A and the target key value is B, then a grouping calculation that uses the target field and the target key value as grouping calculation parameters is a grouping calculation performed on all data whose value in field A is B.
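As a minimal sketch of the A/B example above, the following Python snippet selects the records whose target field holds the target key value and then runs one possible grouping calculation (a sum) over them. The record layout, the `amount` field, and the function name `select_target_records` are illustrative assumptions and do not come from the disclosure.

```python
from typing import Any


def select_target_records(
    records: list[dict[str, Any]], target_field: str, target_key: Any
) -> list[dict[str, Any]]:
    """Return the records whose target_field equals target_key."""
    return [
        r for r in records
        if target_field in r and r[target_field] == target_key
    ]


# Hypothetical input: "A" plays the role of the target field,
# "B" the role of the target key value.
records = [
    {"A": "B", "amount": 10},
    {"A": "C", "amount": 7},
    {"A": "B", "amount": 3},
]

selected = select_target_records(records, "A", "B")
# One possible grouping calculation over the selected data (assumed: a sum).
total = sum(r["amount"] for r in selected)
```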
In an embodiment of the present disclosure, the reduction number refers to the number of reduction nodes to be used when reduction processing is subsequently performed. The reduction number is determined according to the target field and the target key value, and is less than or equal to the total number of reduction nodes.
In an embodiment of the present disclosure, the grouping calculation input data refers to data awaiting grouping calculation, and may be, for example, mapping output data.
In the above embodiment, the target field and the target key value are first determined, then the reduction number required for the subsequent reduction processing is determined according to the target field and the target key value, and then the target field, the target key value and the reduction number are used as common grouping parameters to perform the grouping calculation task on the grouping calculation input data.
In an embodiment of the present disclosure, the step S102, namely, the step of determining the reduction number according to the target field and the target key value, may include the following steps:
retrieving the grouped input data by taking the target field and the target key value as index values to obtain the number of target key value data;
and determining the reduction number according to the number of the target key value data.
In this embodiment, when determining the reduction number, the target field and the target key value may be used as index values to search the grouped input data to obtain the number of data corresponding to the target key value, where the data corresponding to the target key value is data that needs to be subjected to reduction processing subsequently, and therefore the reduction number may be determined according to the number of the target key value data.
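The retrieval step described above can be sketched as a simple count. The snippet below is one way to obtain the number of target key value data, assuming the input is a list of dictionaries; `count_target_key_data` and the sample data are hypothetical names chosen for illustration.

```python
from collections import Counter
from typing import Any


def count_target_key_data(
    records: list[dict[str, Any]], target_field: str, target_key: Any
) -> int:
    """Count the records whose target_field holds target_key."""
    # Tally every value seen in the target field, then look up the target key.
    counts = Counter(r[target_field] for r in records if target_field in r)
    return counts[target_key]  # Counter returns 0 for a key it never saw


# Hypothetical grouping calculation input data.
data = [{"A": "B"}, {"A": "C"}, {"A": "B"}, {"A": "B"}, {"other": 1}]
n_target = count_target_key_data(data, "A", "B")
```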
In an embodiment of the present disclosure, the step of determining the reduction number according to the number of the target key value data may include the following steps:
acquiring the number of reduction nodes;
and comparing the number of the target key value data with the number of the reduction nodes to determine the reduction number.
In this embodiment, when determining the reduction number according to the number of the target key value data, the total number of reduction nodes is first obtained, and the number of the target key value data is then compared with the number of reduction nodes. This determines the most suitable reduction number, one that fully utilizes the reduction nodes, reduces the resource overhead of the system, and facilitates load balancing.
Specifically, if the number of the target key value data is less than or equal to the number of the reduction nodes, then in order to fully utilize the reduction nodes without wasting their resources, the reduction number may be set to the number of the target key value data, which is the smaller value. For example, if the number of the target key value data is 15 and the number of the reduction nodes is 18, the reduction number may be set to 15, so that all data can be processed in one round of reduction processing.
If the number of the target key value data is greater than the number of the reduction nodes and is exactly divisible by the number of the reduction nodes, the reduction number can be set to the number of the reduction nodes. For example, if the number of the target key value data is 15 and the number of the reduction nodes is 5, then 15 is exactly divisible by 5 and the reduction number can be set to 5, so that all data can be processed in three rounds of reduction processing.
If the number of the target key value data is greater than the number of the reduction nodes, is not exactly divisible by the number of the reduction nodes, and the remainder is greater than half of the number of the reduction nodes, the reduction number can be set to the number of the reduction nodes or to the remainder. For example, if the number of the target key value data is 19 and the number of the reduction nodes is 5, the remainder of dividing 19 by 5 is 4, which is greater than half of 5. The reduction number may then be set to 5, so that all data can be processed in four rounds of reduction processing, or to 4, so that all data can be processed in four rounds of reduction processing without the last round wasting many reduction node resources.
If the number of the target key value data is greater than the number of the reduction nodes, is not exactly divisible by the number of the reduction nodes, and the remainder is less than or equal to half of the number of the reduction nodes, the reduction number may be set to the number of the reduction nodes or to the remainder. For example, if the number of the target key value data is 17 and the number of the reduction nodes is 5, the remainder of dividing 17 by 5 is 2, which is less than half of 5. If the reduction number is set to 5, fewer than half of the reduction nodes are used in the final round of reduction processing, which wastes reduction node resources. If the reduction number is instead set to the remainder 2, the last round does not waste many reduction node resources, but more rounds are required, which lengthens the reduction processing flow and increases the reduction processing time. Therefore, in this case, whether the reduction number is set to the number of the reduction nodes or to the remainder can be chosen according to the requirements of the practical application.
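The trade-off in the examples above can be made concrete with a small sketch. It assumes that each reduction node in use handles one piece of target key value data per round, which is the reading suggested by the 15-and-5 example; the function names `rounds_needed` and `idle_nodes_in_last_round` are hypothetical.

```python
def rounds_needed(data_count: int, reduction_number: int) -> int:
    """Rounds required if each node in use handles one item per round (assumed model)."""
    # Integer ceiling division.
    return (data_count + reduction_number - 1) // reduction_number


def idle_nodes_in_last_round(data_count: int, reduction_number: int) -> int:
    """Nodes, out of reduction_number, left without work in the final round."""
    used_in_last_round = data_count % reduction_number or reduction_number
    return reduction_number - used_in_last_round


# 17 pieces of target key value data, 5 reduction nodes available.
print(rounds_needed(17, 5), idle_nodes_in_last_round(17, 5))  # prints "4 3"
print(rounds_needed(17, 2), idle_nodes_in_last_round(17, 2))  # prints "9 1"
```

Under this assumed model, choosing 5 keeps the number of rounds low but leaves three nodes idle at the end, while choosing the remainder 2 leaves only one node idle at the cost of more rounds.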
That is, in an embodiment of the present disclosure, the step of comparing the number of target key value data with the number of reduction nodes to determine the number of reductions may include the following steps:
when the number of the target key value data is less than or equal to the number of the reduction nodes, setting the reduction number to the number of the target key value data;
when the number of the target key value data is greater than the number of the reduction nodes and is exactly divisible by the number of the reduction nodes, setting the reduction number to the number of the reduction nodes;
when the number of the target key value data is greater than the number of the reduction nodes, is not exactly divisible by the number of the reduction nodes, and the remainder is greater than half of the number of the reduction nodes, setting the reduction number to the number of the reduction nodes or to the remainder;
and when the number of the target key value data is greater than the number of the reduction nodes, is not exactly divisible by the number of the reduction nodes, and the remainder is less than or equal to half of the number of the reduction nodes, setting the reduction number to the number of the reduction nodes or to the remainder.
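The four cases above can be sketched as a single function. Because the text leaves the choice between the number of reduction nodes and the remainder to the application in both non-divisible cases, this sketch returns the permitted candidates instead of picking one. The name `candidate_reduction_numbers` and the rejection of non-positive inputs are assumptions made for illustration.

```python
def candidate_reduction_numbers(data_count: int, node_count: int) -> list[int]:
    """Return the reduction numbers permitted by the four comparison cases."""
    if data_count <= 0 or node_count <= 0:
        # Assumption: the text does not cover empty input or zero nodes.
        raise ValueError("data_count and node_count must be positive")

    # Case 1: no more data than nodes, so use one node per piece of data.
    if data_count <= node_count:
        return [data_count]

    remainder = data_count % node_count
    # Case 2: exactly divisible, so use every reduction node.
    if remainder == 0:
        return [node_count]

    # Cases 3 and 4: not exactly divisible. Whether the remainder is above or
    # below half of node_count, the text allows either value; the caller
    # chooses according to the requirements of the application.
    return [node_count, remainder]


print(candidate_reduction_numbers(15, 18))  # prints "[15]"
print(candidate_reduction_numbers(15, 5))   # prints "[5]"
print(candidate_reduction_numbers(19, 5))   # prints "[5, 4]"
print(candidate_reduction_numbers(17, 5))   # prints "[5, 2]"
```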
In an embodiment of the present disclosure, the method may further include the steps of:
performing preset processing on data obtained after the grouping calculation task is executed to obtain preset processing data;
and sending the preset processing data and the reduction number to a reduction component.
After the grouping calculation task is performed on the grouping calculation input data, and before the resulting data is sent to the reduction component, preset processing such as merging, partitioning, and cleaning may also be performed on that data.
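The disclosure names merging, partitioning, and cleaning without defining them, so the following is only one plausible interpretation: cleaning drops empty values, merging collects values by key, and partitioning spreads the keys across as many partitions as the reduction number. The function `preset_process`, the message layout, and the round-robin partitioning are all assumptions.

```python
from collections import defaultdict
from typing import Any, Optional


def preset_process(
    pairs: list[tuple[str, Optional[Any]]], reduction_number: int
) -> dict[str, Any]:
    """Clean, merge, and partition grouped output, then bundle it for sending."""
    # Cleaning (assumed meaning): drop pairs that carry no value.
    cleaned = [(k, v) for k, v in pairs if v is not None]

    # Merging (assumed meaning): collect all values that share a key.
    merged: dict[str, list[Any]] = defaultdict(list)
    for key, value in cleaned:
        merged[key].append(value)

    # Partitioning (assumed meaning): assign keys round-robin to
    # reduction_number partitions, one partition per reduction node in use.
    partitions: list[dict[str, list[Any]]] = [{} for _ in range(reduction_number)]
    for index, key in enumerate(sorted(merged)):
        partitions[index % reduction_number][key] = merged[key]

    # The reduction number travels with the data to the reduction component.
    return {"partitions": partitions, "reduction_number": reduction_number}


message = preset_process([("a", 1), ("b", None), ("a", 2), ("c", 5)], 2)
```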
Fig. 2 shows a flowchart of a data processing method according to another embodiment of the present disclosure, which includes the following steps S201 to S203, as shown in fig. 2:
in step S201, receiving preset processing data and a reduction number sent by the grouping calculation component;
in step S202, a reduction node is determined according to the reduction number;
in step S203, reduction processing is performed on the preset processing data using the determined reduction node.
As mentioned above, Reduce refers to a mode of computation in which, at runtime, data is copied from the relevant map (mapping) end to a Reduce node for reduction processing. In the prior art, the reduce number is generally fixed, except that it can be set to 0 when some data is considered not to require reduction. However, the data to be processed varies in size. If the same reduce number is allocated to data of every size, the utilization rate of the reduce nodes may fall, the resource overhead of the system may rise, and load balancing may be hindered.
In view of the above, this embodiment proposes a data processing method that adjusts the reduction number by setting a reduction number parameter for the grouping calculation. This technical solution allows the reduction number to be adjusted flexibly, which helps maintain the utilization rate of the reduce nodes, reduces the resource overhead of the system, and facilitates load balancing.
In an embodiment of the present disclosure, the data processing method may be applied to a reduction component that performs reduction processing.
In an embodiment of the present disclosure, when determining the reduction nodes according to the reduction number, if the reduction number is smaller than the number of reduction nodes, or if the number of reduction nodes required in a given round is smaller than the number of available reduction nodes, the required reduction nodes may be selected at random from the available reduction nodes, or suitable reduction nodes may be selected from the available reduction nodes according to the requirements of the practical application.
In the above embodiment, the preset processing data and the reduction number sent by the grouping calculation component are first received, and the reduction node is then determined according to the reduction number. More specifically, when the reduction number is smaller than the number of reduction nodes, the reduction nodes participating in the reduction processing can be determined according to the reduction number; when the reduction number is larger than the number of reduction nodes, the reduction processing rounds can first be determined according to the reduction number, and the reduction nodes participating in the last reduction processing round are then determined. Finally, reduction processing is performed on the preset processing data using the determined reduction nodes.
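The node selection described above can be sketched as follows, using the random-selection option mentioned in the text. The node identifiers, the per-round batching, and the function names `select_reduction_nodes` and `reduce_in_rounds` are hypothetical; the sketch also assumes the reduction number does not exceed the number of available nodes.

```python
import random
from typing import Any, Callable, Optional


def select_reduction_nodes(
    available_nodes: list[str],
    reduction_number: int,
    rng: Optional[random.Random] = None,
) -> list[str]:
    """Randomly pick reduction_number distinct nodes from the available ones."""
    if reduction_number > len(available_nodes):
        # Assumption: this sketch only covers reduction numbers within capacity.
        raise ValueError("reduction_number exceeds the available nodes")
    if rng is None:
        rng = random.Random()
    return rng.sample(available_nodes, reduction_number)


def reduce_in_rounds(
    items: list[Any], nodes: list[str], reduce_fn: Callable[[Any], Any]
) -> list[tuple[str, Any]]:
    """Process items in rounds, one item per selected node per round."""
    results = []
    for start in range(0, len(items), len(nodes)):
        batch = items[start:start + len(nodes)]
        # A short final batch uses only as many nodes as it has items.
        for node, item in zip(nodes, batch):
            results.append((node, reduce_fn(item)))
    return results


nodes = ["node-1", "node-2", "node-3", "node-4", "node-5"]
chosen = select_reduction_nodes(nodes, 3, random.Random(0))
results = reduce_in_rounds(list(range(7)), chosen, lambda x: x * 2)
```

Passing a seeded `random.Random` makes the selection repeatable for testing; a selection policy driven by application requirements could replace `rng.sample` without changing `reduce_in_rounds`.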
The technical terms and technical features involved in fig. 2 and its related embodiments are the same as or similar to those involved in fig. 1 and its related embodiments. For their explanation and description, reference may be made to the above description of fig. 1 and its related embodiments, which is not repeated here.
Fig. 3 illustrates a flowchart of a data processing method according to still another embodiment of the present disclosure, which includes the following steps S301 to S302, as illustrated in fig. 3:
in step S301, the grouping calculation component determines a target field and a target key value, determines a reduction number according to the target field and the target key value, performs a grouping calculation task on grouping calculation input data with the target field, the target key value, and the reduction number as grouping parameters, performs preset processing on data obtained after the grouping calculation task is performed to obtain preset processing data, and sends the preset processing data and the reduction number to the reduction component;
in step S302, the reduction component receives the preset processing data and the reduction number sent by the grouping calculation component, determines reduction nodes according to the reduction number, and performs reduction processing on the preset processing data using the determined reduction nodes.
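Steps S301 and S302 can be pictured end to end with the sketch below. It rests on several assumptions that are not part of the disclosure: records are plain dictionaries, the reduction number is simply the smaller of the group size and the node count, the preset processing is reduced to dropping records without an `amount` value, and the reduction itself is a per-node sum. The names `grouping_component`, `reduction_component`, and `amount` are hypothetical.

```python
from typing import Any, Dict, List, Sequence, Tuple

Record = Dict[str, Any]


def grouping_component(
    records: Sequence[Record], target_field: str, target_key: Any, total_nodes: int
) -> Tuple[List[Record], int]:
    """Step S301 (sketch): group, choose a reduction number, preprocess."""
    # Grouping calculation: keep the records whose target field holds the target key value.
    group = [r for r in records if r.get(target_field) == target_key]
    # Simplified rule (assumption): never use more nodes than there are records.
    reduction_number = min(len(group), total_nodes)
    # Stand-in for the preset processing: drop records with no amount.
    preset_data = [r for r in group if r.get("amount") is not None]
    return preset_data, reduction_number


def reduction_component(
    preset_data: Sequence[Record], reduction_number: int, nodes: Sequence[str]
) -> Dict[str, float]:
    """Step S302 (sketch): pick nodes and let each one sum its share."""
    chosen = list(nodes)[:reduction_number]
    if not chosen:
        return {}
    totals = {node: 0.0 for node in chosen}
    # Records are dealt to the chosen nodes in round-robin order.
    for i, record in enumerate(preset_data):
        totals[chosen[i % len(chosen)]] += float(record["amount"])
    return totals
```

The point of the sketch is the hand-off: the reduction number travels together with the preprocessed data, so the reduction component does not have to recompute it.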
As mentioned above, Reduce refers to a calculation mode of copying data from a relevant map (mapping) end to a Reduce node at runtime for reduction processing. In the prior art, the reduce number is generally not variable except that the reduce number can be set to 0, i.e. some data is considered not to be required to be reduced. However, the sizes of the data to be processed are different, and if the same reduce number is allocated to the data to be processed with various sizes, the utilization rate of the reduce nodes may be reduced, the resource overhead of the system may be increased, and load balancing may not be facilitated.
In view of the above, this embodiment proposes a data processing method that adjusts the reduction number by setting a reduction number parameter for the grouping calculation. This technical solution allows the reduction number to be adjusted flexibly, which helps keep the reduce nodes well utilized, reduces the resource overhead of the system, and facilitates load balancing.
In an embodiment of the present disclosure, the data processing method may be applied to a data processing system that performs data processing and includes a grouping calculation component and a reduction component.
The technical terms and technical features involved in fig. 3 and its related embodiments are the same as or similar to those involved in fig. 1-2 and their related embodiments. For their explanation and description, reference may be made to the above description of fig. 1-2 and their related embodiments, which is not repeated here.
The following are embodiments of the disclosed apparatus that may be used to perform embodiments of the disclosed methods.
Fig. 4 shows a block diagram of a data processing apparatus according to an embodiment of the present disclosure, which may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 4, the data processing apparatus includes:
a first determining module 401 configured to determine a target field and a target key value;
a second determining module 402 configured to determine a reduction number according to the target field and a target key value;
an execution module 403 configured to execute a grouping calculation task on the grouping calculation input data with the target field, the target key value, and the reduction number as grouping parameters.
As mentioned above, Reduce refers to a calculation mode of copying data from a relevant map (mapping) end to a Reduce node at runtime for reduction processing. In the prior art, the reduce number is generally not variable except that the reduce number can be set to 0, i.e. some data is considered not to be required to be reduced. However, the sizes of the data to be processed are different, and if the same reduce number is allocated to the data to be processed with various sizes, the utilization rate of the reduce nodes may be reduced, the resource overhead of the system may be increased, and load balancing may not be facilitated.
In view of the above, this embodiment proposes a data processing apparatus that adjusts the reduction number by setting a reduction number parameter for the grouping calculation. This technical solution allows the reduction number to be adjusted flexibly, which helps keep the reduce nodes well utilized, reduces the resource overhead of the system, and facilitates load balancing.
In an embodiment of the present disclosure, the data processing apparatus may be implemented as a grouping calculation component that performs data processing.
In an embodiment of the present disclosure, the target field refers to field information according to which the grouping calculation is performed, and the target key value refers to key value information according to which the grouping calculation is performed. For example, if the target field is a and the target key value is B, the grouping calculation performed by using the target field and the target key value as grouping calculation parameters refers to grouping calculation performed on all data with key values of B in the field a.
In an embodiment of the present disclosure, the reduction number refers to the number of reduction nodes that need to be used when reduction processing is subsequently performed. The reduction number is determined according to the target field and the target key value, and is less than or equal to the total number of reduction nodes.
In an embodiment of the present disclosure, the grouping calculation input data refers to data waiting for grouping calculation, and may be, for example, mapping output data or the like.
In the above embodiment, the target field and the target key value are first determined, then the reduction number required for the subsequent reduction processing is determined according to the target field and the target key value, and then the target field, the target key value and the reduction number are used as common grouping parameters to perform the grouping calculation task on the grouping calculation input data.
In an embodiment of the present disclosure, the second determining module 402 may be configured to:
retrieving the grouped input data by taking the target field and the target key value as index values to obtain the number of target key value data;
and determining the reduction number according to the number of the target key value data.
In this embodiment, when determining the reduction number, the target field and the target key value may be used as index values to search the grouped input data to obtain the number of data corresponding to the target key value, where the data corresponding to the target key value is data that needs to be subjected to reduction processing subsequently, and therefore the reduction number may be determined according to the number of the target key value data.
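One way to picture this lookup is an index keyed by (field, value) pairs, as in the sketch below. The index layout, the helper names `build_index` and `count_target_key_value_data`, and the dictionary-shaped records are assumptions for illustration; the disclosure does not specify how the retrieval is implemented. Field values must be hashable for this particular layout to work.

```python
from collections import defaultdict
from typing import Any, Dict, List, Mapping, Sequence, Tuple

Index = Dict[Tuple[str, Any], List[int]]


def build_index(records: Sequence[Mapping[str, Any]]) -> Index:
    """Map every (field, value) pair to the positions of the records holding it."""
    index: Dict[Tuple[str, Any], List[int]] = defaultdict(list)
    for position, record in enumerate(records):
        for field, value in record.items():
            index[(field, value)].append(position)
    return dict(index)


def count_target_key_value_data(index: Index, target_field: str, target_key: Any) -> int:
    """Number of records whose target field holds the target key value."""
    return len(index.get((target_field, target_key), []))
```

With field `A` and key value `B`, the count returned is the number of items that later need reduction processing, which is the quantity the reduction number is derived from.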
In an embodiment of the present disclosure, the part that determines the reduction number according to the number of the target key value data may be configured to:
acquiring the number of reduction nodes;
and comparing the number of the target key value data with the number of the reduction nodes to determine the reduction number.
In this embodiment, when determining the reduction number according to the number of the target key value data, the total number of reduction nodes is obtained first, and the number of the target key value data is then compared with the number of reduction nodes to determine a suitable reduction number that makes full use of the reduction nodes, reduces the resource overhead of the system, and facilitates load balancing.
Specifically, if the number of the target key value data is less than or equal to the number of the reduction nodes, then in order to make full use of the reduction nodes without wasting reduction node resources, the reduction number may be set to the number of the target key value data, which is the smaller of the two values. For example, if the number of the target key value data is 15 and the number of the reduction nodes is 18, the reduction number may be set to 15, so that all data can be processed in one round of reduction processing.
If the number of the target key value data is larger than the number of the reduction nodes and is exactly divisible by the number of the reduction nodes, the reduction number can be set to the number of the reduction nodes. For example, if the number of the target key value data is 15 and the number of the reduction nodes is 5, 15 is exactly divisible by 5, so the reduction number can be set to 5 and all data can be processed in three rounds of reduction processing.
If the number of the target key value data is greater than the number of the reduction nodes, is not exactly divisible by the number of the reduction nodes, and the remainder is greater than half of the number of the reduction nodes, the reduction number can be set to either the number of the reduction nodes or the remainder. For example, if the number of the target key value data is 19 and the number of the reduction nodes is 5, dividing 19 by 5 leaves a remainder of 4, which is greater than half of 5. The reduction number may then be set to 5, so that all data can be processed in four rounds of reduction processing, or to 4, so that all data can likewise be processed in four rounds of reduction processing and the last round does not waste many reduction node resources.
If the number of the target key value data is greater than the number of the reduction nodes, is not exactly divisible by the number of the reduction nodes, and the remainder is less than or equal to half of the number of the reduction nodes, the reduction number may also be set to either the number of the reduction nodes or the remainder. For example, if the number of the target key value data is 17 and the number of the reduction nodes is 5, dividing 17 by 5 leaves a remainder of 2, which is less than half of 5. If the reduction number is set to 5, less than half of the reduction node resources are used in the final round of reduction processing, which wastes reduction node resources. If the reduction number is instead set to the remainder 2, the last round does not waste many reduction node resources, but more rounds are required, which lengthens the reduction processing flow and increases the reduction processing time. Therefore, in this case, whether the reduction number is set to the number of the reduction nodes or to the remainder can be chosen according to the requirements of the practical application.
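The trade-off in the 17-item example can be made concrete with a small calculation. The sketch assumes, as the 15-item examples above suggest, that each reduction node handles one item of target key value data per round; the function name `round_plan` is hypothetical.

```python
from typing import Tuple


def round_plan(data_count: int, nodes_per_round: int) -> Tuple[int, int]:
    """Return (number of rounds, items handled in the last round).

    Assumption: every node processes exactly one item per round.
    """
    if data_count < 0 or nodes_per_round <= 0:
        raise ValueError("data_count must be >= 0 and nodes_per_round must be > 0")
    # Integer ceiling division avoids floating-point rounding.
    rounds = (data_count + nodes_per_round - 1) // nodes_per_round
    if rounds == 0:
        return 0, 0
    last_round_items = data_count - (rounds - 1) * nodes_per_round
    return rounds, last_round_items
```

Under this assumption, 17 items on 5 nodes per round take 4 rounds with only 2 of the 5 nodes busy in the last round, while 17 items on 2 nodes per round take 9 rounds with 1 node busy in the last round. That is the choice between idle nodes and a longer reduction flow described above.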
That is, in an embodiment of the present disclosure, the part that compares the number of target key value data with the number of reduction nodes to determine the reduction number may be configured to:
when the number of the target key value data is less than or equal to the number of the reduction nodes, setting the reduction number as the number of the target key value data;
when the number of the target key value data is larger than the number of the reduction nodes and is exactly divisible by the number of the reduction nodes, setting the reduction number as the number of the reduction nodes;
when the number of the target key value data is larger than the number of the reduction nodes, is not exactly divisible by the number of the reduction nodes, and the remainder is larger than half of the number of the reduction nodes, setting the reduction number as the number of the reduction nodes or the remainder;
and when the number of the target key value data is larger than the number of the reduction nodes, is not exactly divisible by the number of the reduction nodes, and the remainder is smaller than or equal to half of the number of the reduction nodes, setting the reduction number as the number of the reduction nodes or the remainder.
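The four cases above can be written as a single function. This is a sketch under stated assumptions: the function name `determine_reduction_number` and the `prefer_remainder` flag are hypothetical, and the flag stands in for the "according to the requirements of the practical application" choice that the text leaves open in the two non-divisible cases.

```python
def determine_reduction_number(
    data_count: int, node_count: int, prefer_remainder: bool = False
) -> int:
    """Choose a reduction number from the data count and the node count.

    prefer_remainder is an illustrative policy switch (an assumption):
    when the counts do not divide evenly, it selects the remainder
    instead of the full node count.
    """
    if data_count < 0 or node_count <= 0:
        raise ValueError("data_count must be >= 0 and node_count must be > 0")
    # Case 1: no more data than nodes, so use one node per item.
    if data_count <= node_count:
        return data_count
    remainder = data_count % node_count
    # Case 2: exactly divisible, so every round uses all nodes.
    if remainder == 0:
        return node_count
    # Cases 3 and 4: either value is allowed; the caller's policy decides.
    return remainder if prefer_remainder else node_count
```

For instance, 15 items on 18 nodes gives 15, 15 items on 5 nodes gives 5, and 17 items on 5 nodes gives 5 by default or 2 when `prefer_remainder` is set. A fuller policy could treat the "remainder greater than half" and "remainder at most half" cases differently, which is why the text lists them separately.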
In an embodiment of the present disclosure, the apparatus may further include:
a sending module configured to perform preset processing on the data obtained after the grouping calculation task is executed to obtain preset processing data, and to send the preset processing data and the reduction number to the reduction component.
After the grouping calculation task is performed on the grouping calculation input data, and before the resulting data is sent to the reduction component, processing such as merging, partitioning, and cleaning may also be performed on the data obtained after the grouping calculation task is performed.
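As an illustration of what such preset processing might look like, the sketch below cleans, merges, and partitions a list of key-value pairs. The concrete steps are assumptions: cleaning drops pairs whose value is missing, merging sums values per key, and partitioning assigns keys to as many buckets as the reduction number using a CRC32 checksum (chosen here because Python's built-in string hash is not stable across processes). The function name `preset_processing` is hypothetical.

```python
import zlib
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Tuple


def preset_processing(
    pairs: Iterable[Tuple[str, Optional[int]]], reduction_number: int
) -> List[Dict[str, int]]:
    """Clean, merge, and partition key-value pairs (illustrative only)."""
    if reduction_number <= 0:
        raise ValueError("reduction_number must be positive")
    # Cleaning: discard pairs that carry no value.
    cleaned = [(key, value) for key, value in pairs if value is not None]
    # Merging: combine the values that share a key.
    merged: Dict[str, int] = defaultdict(int)
    for key, value in cleaned:
        merged[key] += value
    # Partitioning: one bucket per reduction node, chosen by a stable checksum.
    partitions: List[Dict[str, int]] = [{} for _ in range(reduction_number)]
    for key, total in merged.items():
        bucket = zlib.crc32(key.encode("utf-8")) % reduction_number
        partitions[bucket][key] = total
    return partitions
```

Each returned bucket would then be handed to one reduction node, so the number of buckets matches the reduction number sent alongside the data.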
Fig. 5 shows a block diagram of a data processing apparatus according to another embodiment of the present disclosure, which may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 5, the data processing apparatus includes:
a receiving module 501 configured to receive the preset processing data and the reduction number sent by the grouping calculation component;
a third determining module 502 configured to determine a reduction node according to the reduction number;
a processing module 503 configured to perform reduction processing on the preset processing data using the determined reduction node.
As mentioned above, Reduce refers to a calculation mode of copying data from a relevant map (mapping) end to a Reduce node at runtime for reduction processing. In the prior art, the reduce number is generally not variable except that the reduce number can be set to 0, i.e. some data is considered not to be required to be reduced. However, the sizes of the data to be processed are different, and if the same reduce number is allocated to the data to be processed with various sizes, the utilization rate of the reduce nodes may be reduced, the resource overhead of the system may be increased, and load balancing may not be facilitated.
In view of the above, this embodiment proposes a data processing apparatus that adjusts the reduction number by setting a reduction number parameter for the grouping calculation. This technical solution allows the reduction number to be adjusted flexibly, which helps keep the reduce nodes well utilized, reduces the resource overhead of the system, and facilitates load balancing.
In an embodiment of the present disclosure, the data processing apparatus may be implemented as a reduction component that performs reduction processing.
In an embodiment of the present disclosure, when determining the reduction nodes according to the reduction number, if the reduction number is smaller than the number of reduction nodes, or if the number of reduction nodes required in a certain round is smaller than the number of available reduction nodes, the required reduction nodes may be randomly selected from the available reduction nodes, or appropriate reduction nodes may be selected from the available reduction nodes according to the requirements of the actual application.
In the above embodiment, the preset processing data and the reduction number sent by the grouping calculation component are received first, and the reduction nodes are then determined according to the reduction number. More specifically, when the reduction number is smaller than the number of reduction nodes, the reduction nodes participating in the reduction processing can be determined directly according to the reduction number; when the reduction number is larger than the number of reduction nodes, the number of reduction processing rounds can be determined first according to the reduction number, and the reduction nodes participating in the last reduction processing round are then determined. Finally, reduction processing is performed on the preset processing data using the determined reduction nodes.
The technical terms and technical features involved in fig. 5 and its related embodiments are the same as or similar to those involved in fig. 4 and its related embodiments. For their explanation and description, reference may be made to the above description of fig. 4 and its related embodiments, which is not repeated here.
Fig. 6 shows a block diagram of a data processing apparatus according to still another embodiment of the present disclosure, which may be implemented as part or all of an electronic device by software, hardware, or a combination of both. As shown in fig. 6, the data processing apparatus includes:
a grouping calculation component 601 configured to determine a target field and a target key value, determine a reduction number according to the target field and the target key value, execute a grouping calculation task on grouping calculation input data by taking the target field, the target key value and the reduction number as grouping parameters, perform preset processing on data obtained after the grouping calculation task is executed to obtain preset processing data, and send the preset processing data and the reduction number to the reduction component;
a reduction component 602 configured to receive the preset processing data and the reduction number sent by the grouping calculation component, determine reduction nodes according to the reduction number, and perform reduction processing on the preset processing data by using the determined reduction nodes.
As mentioned above, Reduce refers to a calculation mode of copying data from a relevant map (mapping) end to a Reduce node at runtime for reduction processing. In the prior art, the reduce number is generally not variable except that the reduce number can be set to 0, i.e. some data is considered not to be required to be reduced. However, the sizes of the data to be processed are different, and if the same reduce number is allocated to the data to be processed with various sizes, the utilization rate of the reduce nodes may be reduced, the resource overhead of the system may be increased, and load balancing may not be facilitated.
In view of the above, this embodiment proposes a data processing apparatus that adjusts the reduction number by setting a reduction number parameter for the grouping calculation. This technical solution allows the reduction number to be adjusted flexibly, which helps keep the reduce nodes well utilized, reduces the resource overhead of the system, and facilitates load balancing.
In an embodiment of the present disclosure, the data processing apparatus may be implemented as a data processing system that performs data processing and includes a grouping calculation component and a reduction component.
The technical terms and technical features involved in fig. 6 and its related embodiments are the same as or similar to those involved in fig. 4-5 and their related embodiments. For their explanation and description, reference may be made to the above description of fig. 4-5 and their related embodiments, which is not repeated here.
The present disclosure also discloses an electronic device. Fig. 7 shows a block diagram of an electronic device according to an embodiment of the present disclosure. As shown in fig. 7, the electronic device 700 includes a memory 701 and a processor 702, wherein
the memory 701 is used to store one or more computer instructions, which are executed by the processor 702 to implement the above-described method steps.
Fig. 8 is a schematic block diagram of a computer system suitable for implementing a data processing method according to an embodiment of the present disclosure.
As shown in fig. 8, the computer system 800 includes a processing unit 801 which can execute various processes in the above-described embodiments according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. The RAM 803 also stores various programs and data necessary for the operation of the system 800. The processing unit 801, the ROM 802, and the RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as necessary, so that a computer program read out therefrom is installed into the storage section 808 as necessary. The processing unit 801 may be implemented as a CPU, a GPU, a TPU, an FPGA, an NPU, or another processing unit.
In particular, according to embodiments of the present disclosure, the methods described above may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the above-described data processing method. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809 and/or installed from the removable medium 811.
A computer program product is also disclosed in embodiments of the present disclosure, the computer program product comprising computer programs/instructions which, when executed by a processor, implement any of the above method steps.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software or hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, the disclosed embodiment also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus in the foregoing embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer readable storage medium stores one or more programs for use by one or more processors in performing the methods described in the embodiments of the present disclosure.
The foregoing description covers only the preferred embodiments of the disclosure and illustrates the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments formed by any combination of the above-mentioned features or their equivalents without departing from the inventive concept, for example, a technical solution formed by replacing the above features with (but not limited to) features with similar functions disclosed in the embodiments of the present disclosure.

Claims (10)

1. A method of data processing, comprising:
determining a target field and a target key value;
determining the reduction number according to the target field and the target key value;
and executing a grouping calculation task on the grouping calculation input data by taking the target field, the target key value and the reduction number as grouping parameters.
2. The method of claim 1, wherein the determining the reduction number according to the target field and the target key value comprises:
retrieving the grouped input data by taking the target field and the target key value as index values to obtain the number of target key value data;
and determining the reduction number according to the number of the target key value data.
3. The method of claim 2, wherein the determining the reduction number according to the number of the target key value data comprises:
acquiring the number of reduction nodes;
and comparing the number of the target key value data with the number of the reduction nodes to determine the reduction number.
4. The method of claim 3, wherein the comparing the number of the target key value data with the number of the reduction nodes to determine the reduction number comprises:
when the number of the target key value data is less than or equal to the number of the reduction nodes, setting the reduction number as the number of the target key value data;
when the number of the target key value data is larger than the number of the reduction nodes and is exactly divisible by the number of the reduction nodes, setting the reduction number as the number of the reduction nodes;
when the number of the target key value data is larger than the number of the reduction nodes, is not exactly divisible by the number of the reduction nodes, and the remainder is larger than half of the number of the reduction nodes, setting the reduction number as the number of the reduction nodes or the remainder;
and when the number of the target key value data is larger than the number of the reduction nodes, is not exactly divisible by the number of the reduction nodes, and the remainder is smaller than or equal to half of the number of the reduction nodes, setting the reduction number as the number of the reduction nodes or the remainder.
5. The method of any of claims 1-4, further comprising:
performing preset processing on data obtained after the grouping calculation task is executed to obtain preset processing data;
and sending the preset processing data and the reduction number to a reduction component.
6. A method of data processing, comprising:
receiving preset processing data and a reduction number sent by a grouping calculation component;
determining a reduction node according to the reduction number;
and executing reduction processing on the preset processing data by using the determined reduction node.
7. A method of data processing, comprising:
the grouping calculation component determines a target field and a target key value, determines a reduction number according to the target field and the target key value, executes a grouping calculation task on grouping calculation input data by taking the target field, the target key value and the reduction number as grouping parameters, performs preset processing on data obtained after the grouping calculation task is executed to obtain preset processing data, and sends the preset processing data and the reduction number to the reduction component;
the reduction component receives the preset processing data and the reduction number sent by the grouping calculation component, determines a reduction node according to the reduction number, and executes reduction processing on the preset processing data by using the determined reduction node.
8. An electronic device comprising a memory and a processor, wherein
the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the steps of the method of any one of claims 1-7.
9. A computer readable storage medium having computer instructions stored thereon, wherein the computer instructions, when executed by a processor, implement the steps of the method of any one of claims 1-7.
10. A computer program product comprising computer programs/instructions which, when executed by a processor, carry out the steps of the method of any one of claims 1 to 7.
CN202111062546.6A 2021-09-10 2021-09-10 Data processing method, data processing apparatus, electronic device, storage medium, and program product Pending CN113722071A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111062546.6A CN113722071A (en) 2021-09-10 2021-09-10 Data processing method, data processing apparatus, electronic device, storage medium, and program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111062546.6A CN113722071A (en) 2021-09-10 2021-09-10 Data processing method, data processing apparatus, electronic device, storage medium, and program product

Publications (1)

Publication Number Publication Date
CN113722071A true CN113722071A (en) 2021-11-30

Family

ID=78683212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111062546.6A Pending CN113722071A (en) 2021-09-10 2021-09-10 Data processing method, data processing apparatus, electronic device, storage medium, and program product

Country Status (1)

Country Link
CN (1) CN113722071A (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101236511A (en) * 2007-01-31 2008-08-06 国际商业机器公司 Method and system for optimizing global reduction treatment
EP2746941A1 (en) * 2012-12-20 2014-06-25 Thomson Licensing Device and method for optimization of data processing in a MapReduce framework
US20150127649A1 (en) * 2013-11-01 2015-05-07 Cognitive Electronics, Inc. Efficient implementations for mapreduce systems
US20150149437A1 (en) * 2013-11-26 2015-05-28 InMobi Pte Ltd. Method and System for Optimizing Reduce-Side Join Operation in a Map-Reduce Framework
US20150227393A1 (en) * 2014-02-10 2015-08-13 International Business Machines Corporation Dynamic Resource Allocation in Mapreduce
US20180067764A1 (en) * 2016-09-08 2018-03-08 International Business Machines Corporation Smart reduce task scheduler
CN109901931A (en) * 2019-03-07 2019-06-18 北京奇艺世纪科技有限公司 A kind of reduction function numbers determine method, apparatus and system
CN109992372A (en) * 2017-12-29 2019-07-09 中国移动通信集团陕西有限公司 A kind of data processing method and device based on mapping reduction
CN110555070A (en) * 2018-06-01 2019-12-10 百度在线网络技术(北京)有限公司 Method and apparatus for outputting information

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
HAEJOON LEE, ET AL.: "An experimental comparison of iterative mapreduce frameworks", 《PROCEEDINGS OF THE 25TH ACM INTERNATIONAL ON CONFERENCE ON INFORMATION AND KNOWLEDGE MANAGEMENT》, pages 2089 - 2094 *
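刘子骜: "Research on Verifiable Computing Technology Based on MapReduce in Cloud Environments", 《China Excellent Master's Theses Full-text Database (Information Science and Technology)》, no. 4, pages 138 - 204 *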

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination