CN109901931B - Reduction function quantity determination method, device and system - Google Patents


Info

Publication number
CN109901931B
Authority
CN
China
Legal status
Active
Application number
CN201910171361.5A
Other languages
Chinese (zh)
Other versions
CN109901931A
Inventor
梁建煌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201910171361.5A
Publication of CN109901931A
Application granted
Publication of CN109901931B

Abstract

The application provides a method, a device and a system for determining the number of reduction functions (Reducers). The method runs on a MapReduce server that includes a mapping function (Mapper), which processes input file blocks and outputs key-value pairs, and a reduction function (Reducer), which reduces those key-value pairs. The method first obtains the number of key-value pairs output by the mapping function within a first preset time period. It then determines the target time for the reduction function to process one key-value pair and, based on that target time, the key-value pair processing number of the reduction function within a second preset time period. Finally, it determines the target number of reduction functions from the number of key-value pairs output by the mapping function and the key-value pair processing number of the reduction function. Because the number of reduction functions is derived from the actual output of the mapping function, resources are allocated reasonably, task-processing efficiency is improved, and the resource waste or congestion caused by too many or too few reduction functions is avoided.

Description

Reduction function quantity determination method, device and system
Technical Field
The application relates to the technical field of data processing, and in particular to a method, a device and a system for determining the number of reduction functions.
Background
MapReduce is a programming model for parallel computation over large-scale datasets (larger than 1 TB). It consists of two phases, Map and Reduce. Specifically, the input file is divided into a plurality of splits (file blocks); each split is processed by a mapping function (Mapper), which outputs a new set of key-value pairs, such as <key, value> pairs, and the key-value pairs are then sent to the reduction function (Reducer), which performs the reduction calculation.
At present, the number of Reducers must be fixed before the key-value pairs are sent to them. The Mapper sorts its key-value pairs according to the Reducer to which each pair belongs and writes them in that order to contiguous disk space, for example on a SATA disk. When a Reducer reads its key-value pairs from the Mapper output, it can therefore read them as one whole block, which improves read performance.
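The sorting described above only works once the Reducer count is fixed, because each key-value pair must be tagged with the Reducer it belongs to. The Python sketch below illustrates this under an assumption the text does not make: that a key is assigned to a Reducer by a stable hash modulo the Reducer count (similar in spirit to Hadoop's default hash partitioner). All function names are hypothetical.

```python
import zlib
from typing import Iterable, List, Tuple


def partition_for(key: str, num_reducers: int) -> int:
    """Assign a key to a Reducer index in [0, num_reducers).

    Hypothetical scheme: a stable CRC32 hash modulo the Reducer count.
    Python's built-in hash() of str is salted per process, so it is avoided.
    """
    if num_reducers <= 0:
        raise ValueError("num_reducers must be fixed to a positive value first")
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def sort_map_output(pairs: Iterable[Tuple[str, int]], num_reducers: int) -> List[Tuple[int, str, int]]:
    """Tag each <key, value> pair with its Reducer and sort so each Reducer's pairs are contiguous."""
    tagged = [(partition_for(k, num_reducers), k, v) for k, v in pairs]
    tagged.sort(key=lambda item: (item[0], item[1]))  # by Reducer index, then by key
    return tagged


def reducer_ranges(sorted_pairs: List[Tuple[int, str, int]], num_reducers: int) -> List[Tuple[int, int]]:
    """Return the [start, end) offsets of each Reducer's contiguous run, so it can read one block."""
    ranges = []
    start = 0
    for reducer in range(num_reducers):
        end = start
        while end < len(sorted_pairs) and sorted_pairs[end][0] == reducer:
            end += 1
        ranges.append((start, end))
        start = end
    return ranges
```

Changing `num_reducers` changes the assignment of every key, which is one reason the count has to be settled before the Mappers write their sorted output.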
However, the inventors have found that in the above approach the number of Reducers is set by the user, and too many or too few Reducers leads to wasted resources or congested tasks. How to provide a method, an apparatus and a system for determining the number of reduction functions that improves resource utilization and task-processing efficiency is therefore an important technical problem for those skilled in the art.
Disclosure of Invention
In view of this, embodiments of the present application provide a method, an apparatus, and a system for determining the number of reduction functions, which can improve resource utilization and task processing efficiency.
In order to achieve the above purpose, the embodiments of the present application provide the following technical solutions:
A method for determining the number of reduction functions, applied to a MapReduce server, where the MapReduce server includes a mapping function and a reduction function, the mapping function is configured to process an input file block and output at least one key-value pair, and the reduction function is configured to perform a reduction calculation on the key-value pairs, the method including:
acquiring the number of key-value pairs output by the mapping function within a first preset time period;
determining a target time for the reduction function to process one key-value pair;
determining, based on the target time, the key-value pair processing number of the reduction function within a second preset time period, that is, the number of key-value pairs one reduction function can process in that period;
and determining the target number of reduction functions based on the number of key-value pairs output by the mapping function and the key-value pair processing number of the reduction function.
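Taken together, the four steps reduce to two divisions. In the notation below, which is introduced here for illustration and does not appear in the original, $K_1$ is the number of key-value pairs output by the mapping function in the first preset time period, $t$ the target time per key-value pair, $T_2$ the second preset time period, $P$ the key-value pair processing number of one reduction function, and $N$ the target number of reduction functions:

```latex
\begin{aligned}
P &= \frac{T_2}{t} \\
N &= \frac{K_1}{P} = \frac{K_1 \, t}{T_2}
\end{aligned}
```

The last form reads as the estimated total reduction work $K_1 t$ divided by the time budget $T_2$. Since $N$ is generally not an integer, some rounding rule (for example rounding up) would be needed in practice; the method as stated specifies only the quotient.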
Optionally, acquiring the number of key-value pairs output by the mapping function within the first preset time period includes:
acquiring, within a third preset time period, the number of file blocks input into the mapping function and the number of key-value pairs output by the mapping function;
determining an input-output ratio of the mapping function, that is, the number of key-value pairs output per file block input, based on the number of file blocks input into the mapping function and the number of key-value pairs output by the mapping function;
acquiring the number of file blocks input into the mapping function within the first preset time period;
and determining the product of the number of file blocks input into the mapping function within the first preset time period and the input-output ratio of the mapping function as the number of key-value pairs output by the mapping function within the first preset time period.
Optionally, determining the target time for the reduction function to process one key-value pair includes:
acquiring an attribute identifier of the key-value pair;
when the attribute identifier is a first-type attribute identifier, acquiring the historical processing times of the key-value pair within a fourth preset time period, and determining the average of the historical processing times as the target time of the key-value pair;
when the attribute identifier is a second-type attribute identifier, splitting the key-value pair into a plurality of pieces of sub-data, acquiring the time taken by the reduction functions to process each piece of sub-data, and determining the sum of these processing times as the target time for the reduction function to process one key-value pair.
Optionally, determining, based on the target time, the key-value pair processing number of the reduction function within the second preset time period includes:
determining the quotient of the second preset time period divided by the target time as the key-value pair processing number of the reduction function within the second preset time period.
Optionally, determining the target number of reduction functions based on the number of key-value pairs output by the mapping function and the key-value pair processing number of the reduction function includes:
determining the quotient of the number of key-value pairs output by the mapping function divided by the key-value pair processing number of the reduction function as the target number of reduction functions.
An apparatus for determining the number of reduction functions, applied to a MapReduce server, the MapReduce server including a mapping function and a reduction function, the mapping function being configured to process an input file block and output at least one key-value pair, and the reduction function being configured to perform a reduction calculation on the key-value pairs, the apparatus comprising:
an obtaining module, configured to obtain the number of key-value pairs output by the mapping function within a first preset time period;
a first determining module, configured to determine a target time for the reduction function to process one key-value pair;
a second determining module, configured to determine, based on the target time, the key-value pair processing number of the reduction function within a second preset time period;
and a third determining module, configured to determine the target number of reduction functions based on the number of key-value pairs output by the mapping function and the key-value pair processing number of the reduction function.
Optionally, the obtaining module includes:
a first obtaining unit, configured to obtain, within a third preset time period, the number of file blocks input into the mapping function and the number of key-value pairs output by the mapping function;
a first determining unit, configured to determine an input-output ratio of the mapping function based on the number of file blocks input into the mapping function and the number of key-value pairs output by the mapping function;
a second obtaining unit, configured to obtain the number of file blocks input into the mapping function within the first preset time period;
and a second determining unit, configured to determine the product of the number of file blocks input into the mapping function within the first preset time period and the input-output ratio of the mapping function as the number of key-value pairs output by the mapping function within the first preset time period.
Optionally, the first determining module includes:
a third obtaining unit, configured to obtain an attribute identifier of the key-value pair;
a third determining unit, configured to, when the attribute identifier is a first-type attribute identifier, obtain the historical processing times of the key-value pair within a fourth preset time period, and determine the average of the historical processing times as the target time of the key-value pair;
and a fourth determining unit, configured to, when the attribute identifier is a second-type attribute identifier, split the key-value pair into a plurality of pieces of sub-data, obtain the time taken by the reduction functions to process each piece of sub-data, and determine the sum of these processing times as the target time for the reduction function to process one key-value pair.
Optionally, the second determining module includes a fifth determining unit, and the third determining module includes a sixth determining unit;
the fifth determining unit is configured to determine the quotient of the second preset time period divided by the target time as the key-value pair processing number of the reduction function within the second preset time period;
the sixth determining unit is configured to determine the quotient of the number of key-value pairs output by the mapping function divided by the key-value pair processing number of the reduction function as the target number of reduction functions.
A system for determining the number of reduction functions, comprising any one of the above apparatuses for determining the number of reduction functions.
Based on the above technical solution, the application provides a method for determining the number of reduction functions based on a MapReduce server. The MapReduce server includes a mapping function, which processes input file blocks and outputs at least one key-value pair, and a reduction function, which performs reduction on the key-value pairs. The method first obtains the number of key-value pairs output by the mapping function within a first preset time period. It then determines the target time for the reduction function to process one key-value pair and, based on that target time, the key-value pair processing number of the reduction function within a second preset time period. Finally, it determines the target number of reduction functions based on the number of key-value pairs output by the mapping function and the key-value pair processing number of the reduction function. Because the number of reduction functions is derived from the output of the mapping function, resources are allocated reasonably, task-processing efficiency is improved, and the resource waste or congestion caused by too many or too few reduction functions is avoided.
Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below are only embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a block diagram illustrating a structure of a reduction function quantity determination system according to an embodiment of the present application;
fig. 2 is a flowchart of a method for determining the number of reduction functions according to an embodiment of the present application;
fig. 3 is a flowchart of a method for determining the number of reduction functions according to an embodiment of the present application;
fig. 4 is a flowchart of a method for determining the number of reduction functions according to an embodiment of the present application;
fig. 5 is a flowchart of a method for determining the number of reduction functions according to an embodiment of the present application;
fig. 6 is a flowchart of a method for determining the number of reduction functions according to an embodiment of the present application;
fig. 7 is a schematic structural diagram of a reduction function quantity determining apparatus according to an embodiment of the present application;
fig. 8 is a schematic hardware structure diagram of a server according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
Fig. 1 is a block diagram of a structure of a reduction function quantity determining system according to an embodiment of the present application, where the reduction function quantity determining system shown in the figure can be used to implement the reduction function quantity determining method according to the embodiment of the present application. Referring to fig. 1, the reduction function number determination system includes a MapReduce server including a mapping function 101 and a reduction function 102.
The input file is divided into a plurality of file blocks; each file block is processed by a mapping function (Mapper), which outputs a new set of key-value pairs, such as <key, value> pairs, and then sends the key-value pairs to a reduction function (Reducer), which performs the reduction calculation.
Based on the reduction function number determination system shown in Fig. 1, the method for determining the number of reduction functions provided by the present application is described below from the perspective of the MapReduce server. Fig. 2 is a flowchart of a method for determining the number of reduction functions according to an embodiment of the present application; the method is applied to a MapReduce server and may include:
S21, acquiring the number of key-value pairs output by the mapping function within a first preset time period.
Specifically, as shown in Fig. 3, this embodiment provides a specific implementation for obtaining the number of key-value pairs output by the mapping function within the first preset time period, including:
S31, acquiring, within a third preset time period, the number of file blocks input into the mapping function and the number of key-value pairs output by the mapping function;
S32, determining the input-output ratio of the mapping function based on the number of file blocks input into the mapping function and the number of key-value pairs output by the mapping function;
S33, acquiring the number of file blocks input into the mapping function within the first preset time period;
S34, determining the product of the number of file blocks input into the mapping function within the first preset time period and the input-output ratio of the mapping function as the number of key-value pairs output by the mapping function within the first preset time period.
The third preset time period may be shorter than the first preset time period. For example, if the third preset time period is 1 hour and the first preset time period is 12 hours, this embodiment first obtains the number of file blocks input into the mapping function and the number of key-value pairs output by it within the 1 hour. It then determines the input-output ratio of the mapping function as the number of key-value pairs output divided by the number of file blocks input (that is, key-value pairs per file block), and finally determines the product of the number of file blocks input into the mapping function over the 12 hours and this ratio as the number of key-value pairs output by the mapping function over the 12 hours.
In addition, the third preset time period may lie within the first preset time period. For example, if the first preset time period is 1:00-12:00, the third preset time period may be 1:00-2:00. The number of key-value pairs output by the mapping function within the whole first preset time period is then calculated from the data observed in that sub-period (the number of file blocks input into the mapping function and the number of key-value pairs output by it).
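Steps S31 to S34 amount to sampling a ratio and scaling it up. Below is a minimal Python sketch, assuming the counts for the sample window (the third preset time period) are already available; `WindowStats`, `input_output_ratio` and `estimate_pairs_out` are hypothetical names, and the ratio is taken as key-value pairs output per file block input so that the product in S34 yields a count of key-value pairs.

```python
from dataclasses import dataclass


@dataclass
class WindowStats:
    """Counts observed for the mapping function in one time window (hypothetical record)."""
    file_blocks_in: int
    key_value_pairs_out: int


def input_output_ratio(sample: WindowStats) -> float:
    """S32: key-value pairs output per file block input, measured in the sample window."""
    if sample.file_blocks_in <= 0:
        raise ValueError("the sample window must contain at least one input file block")
    return sample.key_value_pairs_out / sample.file_blocks_in


def estimate_pairs_out(blocks_in_first_period: int, sample: WindowStats) -> float:
    """S33-S34: file blocks input in the first period times the sampled ratio."""
    return blocks_in_first_period * input_output_ratio(sample)
```

With illustrative figures of 50 file blocks and 200,000 key-value pairs in the 1:00-2:00 sample and 600 file blocks over 1:00-12:00, the ratio is 4,000 pairs per block and the estimate is 2,400,000 key-value pairs. The estimate assumes the sample window is representative of the whole first period.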
S22, determining the target time for the reduction function to process one key-value pair.
Specifically, as shown in Fig. 4, this embodiment provides a specific way of determining the target time for the reduction function to process one key-value pair, including:
S41, acquiring the attribute identifier of the key-value pair;
S42, when the attribute identifier is a first-type attribute identifier, acquiring the historical processing times of the key-value pair within a fourth preset time period, and determining the average of the historical processing times as the target time of the key-value pair;
S43, when the attribute identifier is a second-type attribute identifier, splitting the key-value pair into a plurality of pieces of sub-data, acquiring the time taken by the reduction functions to process each piece of sub-data, and determining the sum of these processing times as the target time for the reduction function to process one key-value pair.
In this embodiment, the first-type attribute identifier marks a key-value pair that is not being processed for the first time, for example one that is processed repeatedly on a schedule. Its historical processing times are therefore available, and the average of these times is taken as the target time for one such key-value pair.
The second-type attribute identifier marks a key-value pair that is being processed for the first time, so no history exists. To calculate its target time, the key-value pair can be split into several groups of sub-data, the groups are processed by several reduction functions, and the sum of the processing times of all the sub-data is taken as the target time of the key-value pair.
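The two branches of S42 and S43 can be sketched as a single function. This is a hedged illustration only: the attribute identifier values, the `time_one` callback (assumed to run a reduction function on one piece of sub-data and return the elapsed seconds), and the way the key-value pair is split are all assumptions that the method does not specify.

```python
from statistics import mean
from typing import Callable, Optional, Sequence

# Hypothetical labels for the two attribute identifiers; the source does not define their values.
FIRST_TYPE = "first-type"    # pair processed before, e.g. by a recurring scheduled job
SECOND_TYPE = "second-type"  # pair processed for the first time, so no history exists


def target_time(
    attribute_id: str,
    history_seconds: Sequence[float] = (),
    sub_data: Sequence[object] = (),
    time_one: Optional[Callable[[object], float]] = None,
) -> float:
    """Estimate the time (seconds) for a reduction function to process one key-value pair.

    history_seconds: historical processing times within the fourth preset period (S42).
    sub_data / time_one: the pieces the pair is split into, and a hypothetical callback
    that processes one piece with a reduction function and returns its elapsed time (S43).
    """
    if attribute_id == FIRST_TYPE:
        if not history_seconds:
            raise ValueError("a first-type pair needs at least one historical processing time")
        return float(mean(history_seconds))  # S42: average of the history
    if attribute_id == SECOND_TYPE:
        if time_one is None or not sub_data:
            raise ValueError("a second-type pair needs sub-data and a timing callback")
        return float(sum(time_one(piece) for piece in sub_data))  # S43: sum over sub-data
    raise ValueError(f"unknown attribute identifier: {attribute_id!r}")
```

Passing the timing as a callback keeps the estimate independent of how a particular cluster actually runs the sub-data.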
S23, determining, based on the target time, the key-value pair processing number of the reduction function within a second preset time period.
Specifically, as shown in Fig. 5, this embodiment provides a specific implementation for determining, based on the target time, the key-value pair processing number of the reduction function within the second preset time period, including:
S51, determining the quotient of the second preset time period divided by the target time as the key-value pair processing number of the reduction function within the second preset time period.
The second preset time period is a configured value. It may be an expected computation time set by the user: for example, if the user wants the whole computation to finish within 1 minute, the second preset time period can be set to 1 minute. Alternatively, a preferred computation time can be derived automatically from history and used as the second preset time period: for example, if the last three computations over the same number of key-value pairs took t1, t2 and t3, the average of t1, t2 and t3 is used as the second preset time period.
The key-value pair processing number characterizes the capacity of one reduction function to process key-value pairs. In this step, a preferred computation time, namely the second preset time period, may be determined empirically, and the key-value pair processing number of the reduction function is then the second preset time period divided by the target time for the reduction function to process one key-value pair.
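A minimal sketch of S51, together with the optional history-based choice of the second preset time period described above. The function names are hypothetical and all times are assumed to be in seconds.

```python
from statistics import mean
from typing import Sequence


def second_period_from_history(past_run_seconds: Sequence[float]) -> float:
    """Optional way to set the second preset time period: average past runs
    (e.g. t1, t2, t3) that processed the same number of key-value pairs."""
    if not past_run_seconds:
        raise ValueError("need at least one historical run time")
    return float(mean(past_run_seconds))


def pairs_per_reducer(second_period_seconds: float, target_time_seconds: float) -> float:
    """S51: key-value pairs one reduction function can process within the second preset period."""
    if target_time_seconds <= 0:
        raise ValueError("target time must be positive")
    return second_period_seconds / target_time_seconds
```

For instance, with past runs of 50, 60 and 70 seconds the second preset time period would be 60 seconds, and with an illustrative target time of 0.25 seconds per pair one reduction function could process 240 key-value pairs in that period.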
S24, determining the target number of reduction functions based on the number of key-value pairs output by the mapping function and the key-value pair processing number of the reduction function.
Specifically, as shown in Fig. 6, this embodiment provides a specific implementation for determining the target number of reduction functions based on the number of key-value pairs output by the mapping function and the key-value pair processing number of the reduction function, including:
S61, determining the quotient of the number of key-value pairs output by the mapping function divided by the key-value pair processing number of the reduction function as the target number of reduction functions.
Once step S23 has determined the key-value pair processing number of each reduction function, the required target number of reduction functions is the number of key-value pairs output by the mapping function divided by the key-value pair processing number of the reduction function.
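Step S61 as a small Python function. The method specifies only the quotient; rounding up to a whole number of reduction functions, and never returning fewer than one, are assumptions added here so that the result can be used directly as a Reducer count.

```python
import math


def target_reducer_count(pairs_out: float, pairs_per_reducer: float) -> int:
    """S61: key-value pairs output by the mapping function divided by one
    reduction function's processing number.

    Rounding up (and using at least one) is an assumption, not part of the method.
    """
    if pairs_per_reducer <= 0:
        raise ValueError("pairs_per_reducer must be positive")
    return max(1, math.ceil(pairs_out / pairs_per_reducer))
```

For example, 2,400,000 key-value pairs with a processing number of 30,000 per reduction function gives 80 reduction functions, while 250 pairs at 120 per reduction function rounds 2.08 up to 3.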
In this way, the target number of reduction functions is determined from the number of key-value pairs output by the mapping function, so that resources are allocated reasonably, task-processing efficiency is improved, and the resource waste or congestion caused by too many or too few reduction functions is avoided.
The MapReduce server provided by the embodiments of the present application is introduced below; the MapReduce server described below and the method for determining the number of reduction functions described above correspond to each other and may be cross-referenced. Fig. 7 is a structural block diagram of a MapReduce server provided in an embodiment of the present application. Referring to Fig. 7, the MapReduce server may include:
an obtaining module 71, configured to obtain the number of key-value pairs output by the mapping function within a first preset time period;
a first determining module 72, configured to determine a target time for the reduction function to process one key-value pair;
a second determining module 73, configured to determine, based on the target time, the key-value pair processing number of the reduction function within a second preset time period;
a third determining module 74, configured to determine the target number of reduction functions based on the number of key-value pairs output by the mapping function and the key-value pair processing number of the reduction function.
On the basis of the foregoing embodiment, the obtaining module provided in this embodiment includes:
a first obtaining unit, configured to obtain, within a third preset time period, the number of file blocks input into the mapping function and the number of key-value pairs output by the mapping function, where the third preset time period is included in the first preset time period;
a first determining unit, configured to determine an input-output ratio of the mapping function based on the number of file blocks input into the mapping function and the number of key-value pairs output by the mapping function;
a second obtaining unit, configured to obtain the number of file blocks input into the mapping function within the first preset time period;
and a second determining unit, configured to determine the product of the number of file blocks input into the mapping function within the first preset time period and the input-output ratio of the mapping function as the number of key-value pairs output by the mapping function within the first preset time period.
In addition, in the device for determining the number of reduction functions provided in this embodiment, the first determining module includes:
a third obtaining unit, configured to obtain an attribute identifier of the key-value pair;
a third determining unit, configured to, when the attribute identifier is a first-type attribute identifier, obtain the historical processing times of the key-value pair within a fourth preset time period, and determine the average of the historical processing times as the target time of the key-value pair;
and a fourth determining unit, configured to, when the attribute identifier is a second-type attribute identifier, split the key-value pair into a plurality of pieces of sub-data, obtain the time taken by the reduction functions to process each piece of sub-data, and determine the sum of these processing times as the target time for the reduction function to process one key-value pair.
On the basis of the above embodiment, the second determining module includes a fifth determining unit, and the third determining module includes a sixth determining unit;
the fifth determining unit is configured to determine the quotient of the second preset time period divided by the target time as the key-value pair processing number of the reduction function within the second preset time period;
the sixth determining unit is configured to determine the quotient of the number of key-value pairs output by the mapping function divided by the key-value pair processing number of the reduction function as the target number of reduction functions.
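The module structure of Fig. 7 can be mirrored as one class with a method per module. The sketch below is hypothetical: the class and method names are invented, the attribute identifier is simplified to a boolean `has_history` (first-type versus second-type), and the rounding in the last step is an assumption rather than part of the described apparatus.

```python
import math
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Optional, Sequence


@dataclass
class ReducerCountDeterminer:
    """Hypothetical sketch: one method per module of Fig. 7 (modules 71-74)."""

    sample_blocks_in: int        # file blocks input during the third preset time period
    sample_pairs_out: int        # key-value pairs output during the third preset time period
    first_period_blocks_in: int  # file blocks input during the first preset time period
    second_period_s: float       # second preset time period, in seconds

    def obtain_pairs_out(self) -> float:
        """Obtaining module 71: first-period blocks times the sampled input-output ratio."""
        ratio = self.sample_pairs_out / self.sample_blocks_in
        return self.first_period_blocks_in * ratio

    def determine_target_time(
        self,
        has_history: bool,
        history_s: Sequence[float] = (),
        sub_data: Sequence[object] = (),
        time_one: Optional[Callable[[object], float]] = None,
    ) -> float:
        """First determining module 72: history average, or sum over split sub-data."""
        if has_history:
            return float(mean(history_s))
        if time_one is None:
            raise ValueError("a first-run pair needs a timing callback for its sub-data")
        return float(sum(time_one(piece) for piece in sub_data))

    def determine_pairs_per_reducer(self, target_time_s: float) -> float:
        """Second determining module 73 (fifth determining unit)."""
        return self.second_period_s / target_time_s

    def determine_reducer_count(self, pairs_out: float, per_reducer: float) -> int:
        """Third determining module 74 (sixth determining unit); ceil is an assumption."""
        return max(1, math.ceil(pairs_out / per_reducer))

    def run(self, has_history: bool, **target_time_kwargs) -> int:
        """Chain the four modules in the order of steps S21-S24."""
        pairs_out = self.obtain_pairs_out()
        target_time_s = self.determine_target_time(has_history, **target_time_kwargs)
        per_reducer = self.determine_pairs_per_reducer(target_time_s)
        return self.determine_reducer_count(pairs_out, per_reducer)
```

For example, `ReducerCountDeterminer(50, 200_000, 600, 60.0).run(True, history_s=[0.25, 0.25])` chains all four modules and, with these illustrative inputs, yields 10,000.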
For the working principle of the MapReduce server, refer to the method embodiment.
The above describes the software functional module architecture of the MapReduce server. In terms of hardware structure, the server may implement the resource allocation scheme as follows:
Fig. 8 is a block diagram of a hardware structure of a server according to an embodiment of the present application. Referring to Fig. 8, the server may include: a processor 111, a communication interface 112, a memory 113, and a communication bus 114;
the processor 111, the communication interface 112 and the memory 113 communicate with one another through the communication bus 114;
optionally, the communication interface 112 may be an interface of a communication module, such as an interface of a GSM module;
the processor 111 is configured to execute a program;
the memory 113 is configured to store the program;
the program may include program code, and the program code includes computer operating instructions.
The processor 111 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present application.
The memory 113 may include high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
The program may specifically be used for:
acquiring the number of key-value pairs output by the mapping function within a first preset time period;
determining a target time for the reduction function to process one key-value pair;
determining, based on the target time, the key-value pair processing number of the reduction function within a second preset time period;
and determining the target number of reduction functions based on the number of key-value pairs output by the mapping function and the key-value pair processing number of the reduction function.
Optionally, acquiring the number of key-value pairs output by the mapping function within the first preset time period includes:
acquiring, within a third preset time period, the number of file blocks input into the mapping function and the number of key-value pairs output by the mapping function, where the third preset time period is included in the first preset time period;
determining the input-output ratio of the mapping function based on the number of file blocks input into the mapping function and the number of key-value pairs output by the mapping function;
acquiring the number of file blocks input into the mapping function within the first preset time period;
and determining the product of the number of file blocks input into the mapping function within the first preset time period and the input-output ratio of the mapping function as the number of key-value pairs output by the mapping function within the first preset time period.
Optionally, determining the target time for the reduction function to process one key-value pair includes:
acquiring an attribute identifier of the key-value pair;
when the attribute identifier is a first-type attribute identifier, acquiring the historical processing times of the key-value pair within a fourth preset time period, and determining the average of the historical processing times as the target time of the key-value pair;
when the attribute identifier is a second-type attribute identifier, splitting the key-value pair into a plurality of pieces of sub-data, acquiring the time taken by the reduction functions to process each piece of sub-data, and determining the sum of these processing times as the target time for the reduction function to process one key-value pair.
Optionally, determining, based on the target time, the key-value pair processing number of the reduction function in the second preset time period includes:
determining the length of the second preset time period divided by the target time as the key-value pair processing number of the reduction function in the second preset time period.
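As a sketch, this quotient is a single division, provided the period length and the target time use the same unit (seconds are assumed here; the text names no unit). The function name and the guard against a non-positive target time are illustrative additions.

```python
def processing_number(period_s: float, target_time_s: float) -> float:
    """Key-value pair processing number of one reduction function in the
    second preset time period: period length divided by the target time."""
    if target_time_s <= 0:
        raise ValueError("target time must be positive")
    return period_s / target_time_s


if __name__ == "__main__":
    # Hypothetical values: a 60 s window and 0.5 s per key-value pair.
    print(processing_number(60.0, 0.5))  # 120.0
```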
Optionally, determining the target number of reduction functions based on the number of key-value pairs output by the mapping function and the key-value pair processing number of the reduction function includes:
determining the number of key-value pairs output by the mapping function divided by the key-value pair processing number of the reduction function as the target number of reduction functions.
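The quotient can be non-integral, while a job needs a whole number of reduction functions. The sketch below rounds up and keeps at least one reduction function. Both choices are assumptions, since the text specifies only the quotient. `target_reduce_count` is a hypothetical name.

```python
import math


def target_reduce_count(kv_pairs: float, processing_number: float) -> int:
    """Target number of reduction functions: key-value pairs output by the
    mapping function divided by the per-reducer processing number.

    Rounding up and the lower bound of one are assumptions added here so
    that the result is a usable integer count.
    """
    if processing_number <= 0:
        raise ValueError("processing number must be positive")
    return max(1, math.ceil(kv_pairs / processing_number))


if __name__ == "__main__":
    print(target_reduce_count(24000, 120.0))  # 200
    print(target_reduce_count(24001, 120.0))  # 201
```

Rounding down instead would leave some key-value pairs unprocessed within the second preset time period, which is the blocking case the text aims to avoid.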
For the working principle of the server, refer to the method embodiment; it is not repeated here. Like the method, the server determines the target number of reduction functions according to the number of key-value pairs output by the mapping function. As a result, resources are allocated reasonably, task processing efficiency is improved, and problems such as resource waste or blocking caused by too many or too few reduction functions are avoided.
To sum up, the application provides a method, a device and a system for determining the number of reduction functions based on a MapReduce server, wherein the MapReduce server includes a mapping function and a reduction function. The method first obtains the number of key-value pairs output by the mapping function in a first preset time period. It then determines the target time for the reduction function to process one key-value pair and, based on the target time, determines the key-value pair processing number of the reduction function in a second preset time period. Finally, it determines the target number of reduction functions based on the number of key-value pairs output by the mapping function and the key-value pair processing number of the reduction function. Because the target number of reduction functions is determined according to the number of key-value pairs output by the mapping function, resources are allocated reasonably, task processing efficiency is improved, and problems such as resource waste or blocking caused by too many or too few reduction functions are avoided.
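When the optional steps are used, they combine into a single expression for the target number. The symbols below are introduced only for illustration. $B_3$ and $K_3$ are the file blocks input and key-value pairs output in the third preset time period. $B_1$ is the number of file blocks input in the first preset time period. $t$ is the target time and $T_2$ is the length of the second preset time period. $r$ is the input-output ratio, $K_1$ the estimated key-value pairs in the first period, $C$ the key-value pair processing number, and $N$ the target number.

```latex
r = \frac{K_3}{B_3}, \qquad
K_1 = B_1\, r, \qquad
C = \frac{T_2}{t}, \qquad
N = \frac{K_1}{C} = \frac{B_1\, K_3\, t}{B_3\, T_2}
```

With hypothetical values $B_3 = 50$, $K_3 = 4000$, $B_1 = 300$, $t = 0.5\,\mathrm{s}$ and $T_2 = 60\,\mathrm{s}$, this gives $r = 80$, $K_1 = 24000$, $C = 120$ and $N = 200$. The closed form shows that the target number grows with the sampled output ratio and with the per-pair processing time, and shrinks as the second preset time period is lengthened.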
The embodiments in the present description are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and for parts that are the same or similar, the embodiments may be referred to each other. Since the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, it is described briefly; for relevant points, refer to the description of the method.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A reduction function quantity determination method, applied to a MapReduce server, wherein the MapReduce server comprises a mapping function and a reduction function, the mapping function is used for calculating an input file block and outputting at least one key-value pair, and the reduction function is used for performing reduction calculation on the key-value pairs, the reduction function quantity determination method comprising:
acquiring the number of key-value pairs output by the mapping function in a first preset time period;
determining a target time for the reduction function to process one of the key-value pairs;
determining, based on the target time, the key-value pair processing number of the reduction function in a second preset time period;
and determining the target number of reduction functions based on the number of key-value pairs output by the mapping function and the key-value pair processing number of the reduction function.
2. The reduction function quantity determination method according to claim 1, wherein obtaining the number of key-value pairs output by the mapping function within the first preset time period comprises:
acquiring the number of file blocks input into the mapping function and the number of key-value pairs output by the mapping function in a third preset time period;
determining an input-output ratio of the mapping function based on the number of file blocks input into the mapping function and the number of key-value pairs output by the mapping function;
acquiring the number of file blocks input into the mapping function in the first preset time period;
and determining the product of the number of file blocks input into the mapping function in the first preset time period and the input-output ratio of the mapping function as the number of key-value pairs output by the mapping function in the first preset time period.
3. The method of claim 1, wherein determining a target time for the reduction function to process one of the key-value pairs comprises:
acquiring an attribute identifier of the key-value pair;
when the attribute identifier is a first-type attribute identifier, acquiring the historical processing times of the key-value pair in a fourth preset time period, and determining the average of these historical processing times as the target time for the key-value pair;
when the attribute identifier is a second-type attribute identifier, splitting the key-value pair into a plurality of pieces of sub-data, acquiring the time taken by each reduction function to process each piece of sub-data, and determining the sum of these processing times as the target time for the reduction function to process one key-value pair.
4. The reduction function quantity determination method according to any one of claims 1 to 3, wherein determining, based on the target time, the key-value pair processing number of the reduction function within the second preset time period comprises:
determining the quotient of the length of the second preset time period and the target time as the key-value pair processing number of the reduction function in the second preset time period.
5. The reduction function quantity determination method according to any one of claims 1 to 3, wherein determining the target number of reduction functions based on the number of key-value pairs output by the mapping function and the key-value pair processing number of the reduction function comprises:
determining the quotient of the number of key-value pairs output by the mapping function and the key-value pair processing number of the reduction function as the target number of reduction functions.
6. A reduction function quantity determination device applied to a MapReduce server, the MapReduce server including a mapping function and a reduction function, the mapping function being configured to calculate an input file block and output at least one key-value pair, and the reduction function being configured to perform a reduction calculation on the key-value pair, the reduction function quantity determination device comprising:
an obtaining module, configured to obtain the number of key-value pairs output by the mapping function within a first preset time period;
a first determining module, configured to determine a target time for the reduction function to process one of the key-value pairs;
a second determining module, configured to determine, based on the target time, the key-value pair processing number of the reduction function in a second preset time period;
and a third determining module, configured to determine the target number of reduction functions based on the number of key-value pairs output by the mapping function and the key-value pair processing number of the reduction function.
7. The reduction function quantity determination apparatus according to claim 6, wherein the obtaining module includes:
a first obtaining unit, configured to obtain, in a third preset time period, the number of file blocks input to the mapping function and the number of key-value pairs output by the mapping function;
a first determining unit, configured to determine an input-output ratio of the mapping function based on the number of file blocks input to the mapping function and the number of key-value pairs output by the mapping function;
a second obtaining unit, configured to obtain the number of file blocks input to the mapping function within the first preset time period;
and a second determining unit, configured to determine the product of the number of file blocks input to the mapping function in the first preset time period and the input-output ratio of the mapping function as the number of key-value pairs output by the mapping function in the first preset time period.
8. The reduction function quantity determination apparatus according to claim 6, wherein the first determination module includes:
a third obtaining unit, configured to obtain an attribute identifier of the key-value pair;
a third determining unit, configured to, when the attribute identifier is a first-type attribute identifier, obtain the historical processing times of the key-value pair within a fourth preset time period and determine the average of these historical processing times as the target time for the key-value pair;
and a fourth determining unit, configured to, when the attribute identifier is a second-type attribute identifier, split the key-value pair into a plurality of pieces of sub-data, obtain the time taken by each reduction function to process each piece of sub-data, and determine the sum of these processing times as the target time for the reduction function to process one key-value pair.
9. The reduction function quantity determination apparatus according to any one of claims 6 to 8, wherein the second determining module includes a fifth determining unit and the third determining module includes a sixth determining unit;
the fifth determining unit is configured to determine the quotient of the length of the second preset time period and the target time as the key-value pair processing number of the reduction function in the second preset time period;
the sixth determining unit is configured to determine the quotient of the number of key-value pairs output by the mapping function and the key-value pair processing number of the reduction function as the target number of reduction functions.
10. A reduction function quantity determination system comprising a reduction function quantity determination apparatus according to any one of claims 6 to 9.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910171361.5A CN109901931B (en) 2019-03-07 2019-03-07 Reduction function quantity determination method, device and system

Publications (2)

Publication Number Publication Date
CN109901931A (en) 2019-06-18
CN109901931B (en) 2021-06-15

Family

ID=66946617

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910171361.5A Active CN109901931B (en) 2019-03-07 2019-03-07 Reduction function quantity determination method, device and system

Country Status (1)

Country Link
CN (1) CN109901931B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113722071A (en) * 2021-09-10 2021-11-30 拉卡拉支付股份有限公司 Data processing method, data processing apparatus, electronic device, storage medium, and program product

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2015191428A (en) * 2014-03-28 2015-11-02 日本電信電話株式会社 Distributed data processing apparatus, distributed data processing method, and distributed data processing program
CN107038072A (en) * 2016-02-03 2017-08-11 博雅网络游戏开发(深圳)有限公司 Method for scheduling task and device based on Hadoop system
CN108595268A (en) * 2018-04-24 2018-09-28 咪咕文化科技有限公司 A data distribution method, device and computer-readable storage medium based on MapReduce
CN109324898A (en) * 2018-08-27 2019-02-12 北京奇虎科技有限公司 A business processing method and system

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102799486B (en) * 2012-06-18 2014-11-26 北京大学 Data sampling and partitioning method for MapReduce system
US20150039667A1 (en) * 2013-08-02 2015-02-05 Linkedin Corporation Incremental processing on data intensive distributed applications
CN104298550B (en) * 2014-10-09 2017-11-14 南通大学 A Hadoop-oriented dynamic scheduling method
CN105577438B (en) * 2015-12-22 2018-09-28 桂林电子科技大学 A kind of network flow body constructing method based on MapReduce

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
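Hadoop-MapReduce; _1990; https://www.cnblogs.com/wxd0108/p/7156223.html; 2017-07-12; pp. 1-6 *
Improvement of rainbow table technique based on trimming the number of reduction functions (基于归约函数数量裁减的彩虹表技术改进); 王小鉴 (Wang Xiaojian) et al.; Computer Engineering (计算机工程); 2013-07-15; Vol. 39, No. 7; pp. 156-160, 164 *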

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant