CN109324898A

CN109324898A - Business processing method and system

Info

Publication number: CN109324898A
Application number: CN201810983481.0A
Authority: CN
Inventors: 董涛; 卜云涛
Original assignee: Beijing Qihoo Technology Co Ltd
Current assignee: Beijing Qihoo Technology Co Ltd
Priority date: 2018-08-27
Filing date: 2018-08-27
Publication date: 2019-02-12
Anticipated expiration: 2038-08-27
Also published as: CN109324898B

Abstract

The invention discloses a business processing method and system. The business to be processed is first classified to determine the one or more classes of product version it corresponds to. The data volume of each product version of the business to be processed is then estimated. Because the data volume of each product version is exactly the data volume that the subsequent Reduce tasks will process, a corresponding number of Reduce tasks can be applied for based on these data volumes. This avoids the uneven resource allocation caused by applying for too many or too few Reduce tasks, so that Reduce tasks are allocated reasonably to the business to be processed. Finally, distributed processing is performed on the business to be processed based on the corresponding number of Reduce tasks.

Description

Business processing method and system

Technical field

This application relates to the field of distributed technology, and in particular to a business processing method and system.

Background technique

In a distributed system infrastructure, the core design consists of HDFS and MapReduce. HDFS provides storage for massive amounts of data, and MapReduce (map and reduce) provides computation over that data.

Although a distributed system infrastructure has massive computing resources, the tasks of a large number of map stages are eventually funneled into a small number of reduce stages.

If the number of tasks in the map stage and the number of tasks in the reduce stage are mismatched, resources are wasted. For example, if there are too few reduce tasks and too many map-stage tasks, the reduce stage will run for a very long time, and excessive resource usage may even cause memory problems that ultimately make the job fail. Conversely, if there are too many reduce tasks, resources are wasted.

How to allocate MapReduce resources reasonably is therefore a problem that urgently needs to be solved.

Summary of the invention

The present invention provides a business processing method and system to solve, or partially solve, the technical problem of resource allocation in the MapReduce stages.

To solve the above technical problem, the present invention provides a business processing method, the method comprising:

classifying the business to be processed to determine the one or more classes of product version corresponding to the business to be processed;

estimating the data volume of each product version of the business to be processed;

applying for a corresponding number of Reduce tasks based on the data volume of each product version of the business to be processed;

performing distributed processing on the business to be processed based on the corresponding number of Reduce tasks.

Preferably, the classification parameters include: log category, log service ID, and product version;

classifying the business to be processed to determine the one or more classes of product version corresponding to the business to be processed specifically comprises:

classifying the business to be processed by log category to obtain a first classification result within each log category;

classifying the first classification result within each log category by log service ID to obtain a second classification result within each log service ID;

classifying the second classification result within each log service ID by product version to determine the one or more classes of product version corresponding to the business to be processed.

Preferably, applying for a corresponding number of Reduce tasks based on the data volume of each product version of the business to be processed specifically comprises:

determining the number of Reduce tasks to be applied for, based on the data volume of each product version of the business to be processed;

applying for the corresponding number of Reduce tasks, based on the number of Reduce tasks to be applied for.

Preferably, determining the number of Reduce tasks to be applied for, based on the data volume of each product version of the business to be processed, specifically comprises:

judging whether the data volume of each product version of the business to be processed is greater than a preset data amount threshold;

if so, allocating a corresponding Reduce task to the first product version, that is, the product version whose data volume is greater than the preset data amount threshold;

if not, combining two or more classes of second product version, each smaller than the preset data amount threshold, into a third product version, and allocating a corresponding Reduce task to the third product version, wherein the difference between the data volume of the third product version and the preset data amount threshold lies within a preset range;

obtaining the mapping relations between each product version of the business to be processed and the Reduce tasks, based on the Reduce task corresponding to the first product version and the Reduce task corresponding to the third product version;

counting the number of Reduce tasks corresponding to the first product version and the number of Reduce tasks corresponding to the third product version to determine the number of Reduce tasks to be applied for.

Preferably, the preset data amount threshold is obtained as follows: the data volume that a single Reduce task can handle is determined from the resource threshold of a Reduce task, and that data volume is taken as the preset data amount threshold.

Preferably, performing distributed processing on the business to be processed based on the corresponding number of Reduce tasks specifically comprises:

dividing the business to be processed into multiple subtasks and inputting them to the Map framework for mapping computation, obtaining intermediate data sets corresponding in number to the multiple subtasks;

classifying the intermediate data set to obtain the one or more classes of product version of each item of intermediate data in the intermediate data set, wherein the product versions in the intermediate data set correspond one-to-one with the product versions of the business to be processed;

inputting each product version of the intermediate data set to its allocated Reduce task for reduction computation, based on the mapping relations between each product version of the business to be processed and the Reduce tasks.

Preferably, classifying the intermediate data set specifically comprises:

classifying each item of intermediate data in the intermediate data set by log category to obtain a third classification result within each log category;

classifying the third classification result within each log category by log service ID to obtain a fourth classification result within each log service ID;

classifying the fourth classification result within each log service ID by product version to determine the one or more classes of product version corresponding to each item of intermediate data in the intermediate data set.

Preferably, the data volume of each product version of the business to be processed is the access volume of that product version.

Another aspect of the present invention discloses a business processing system, the system comprising:

a first classification module, configured to classify the business to be processed and determine the one or more classes of product version corresponding to the business to be processed;

an estimation module, configured to estimate the data volume of each product version of the business to be processed;

an application module, configured to apply for a corresponding number of Reduce tasks based on the data volume of each product version of the business to be processed;

a processing module, configured to perform distributed processing on the business to be processed based on the corresponding number of Reduce tasks.

The first classification module specifically comprises:

a first classification submodule, configured to classify the business to be processed by log category to obtain a first classification result within each log category;

a second classification submodule, configured to classify the first classification result within each log category by log service ID to obtain a second classification result within each log service ID;

a third classification submodule, configured to classify the second classification result within each log service ID by product version to determine the one or more classes of product version corresponding to the business to be processed.

Preferably, the application module specifically comprises:

a determining module, configured to determine the number of Reduce tasks to be applied for, based on the data volume of each product version of the business to be processed;

an application submodule, configured to apply for the corresponding number of Reduce tasks, based on the number of Reduce tasks to be applied for.

Preferably, the determining module specifically comprises:

a judgment module, configured to judge whether the data volume of each product version of the business to be processed is greater than the preset data amount threshold;

a first allocation module, configured to, if so, allocate a corresponding Reduce task to the first product version, that is, the product version whose data volume is greater than the preset data amount threshold;

a second allocation module, configured to, if not, combine two or more classes of second product version, each smaller than the preset data amount threshold, into a third product version, and allocate a corresponding Reduce task to the third product version, wherein the difference between the data volume of the third product version and the preset data amount threshold lies within a preset range;

an obtaining module, configured to obtain the mapping relations between each product version of the business to be processed and the Reduce tasks, based on the Reduce task corresponding to the first product version and the Reduce task corresponding to the third product version;

a statistics module, configured to count the number of Reduce tasks corresponding to the first product version and the number of Reduce tasks corresponding to the third product version to determine the number of Reduce tasks to be applied for.

Preferably, the processing module specifically comprises:

a mapping module, configured to divide the business to be processed into multiple subtasks and input them to the Map framework for mapping computation, obtaining intermediate data sets corresponding in number to the multiple subtasks;

a second classification module, configured to classify the intermediate data set to obtain the one or more classes of product version of each item of intermediate data in the intermediate data set, wherein the product versions in the intermediate data set correspond one-to-one with the product versions of the business to be processed;

a reduction module, configured to input each product version of the intermediate data set to its allocated Reduce task for reduction computation, based on the mapping relations between each product version of the business to be processed and the Reduce tasks.

Preferably, the second classification module specifically comprises:

a fourth classification submodule, configured to classify each item of intermediate data in the intermediate data set by log category to obtain a third classification result within each log category;

a fifth classification submodule, configured to classify the third classification result within each log category by log service ID to obtain a fourth classification result within each log service ID;

a sixth classification submodule, configured to classify the fourth classification result within each log service ID by product version to determine the one or more classes of product version corresponding to each item of intermediate data in the intermediate data set.

Preferably, the data volume of each product version of the business to be processed is the access volume of that product version.

The invention also discloses a computer-readable storage medium on which a computer program is stored, wherein the program implements the steps of the above method when executed by a processor.

The invention also discloses a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the above method when executing the program.

Through one or more technical solutions of the present invention, the invention has the following advantages:

The invention discloses a business processing method and system. The business to be processed is first classified to determine the one or more classes of product version it corresponds to. The data volume of each product version of the business to be processed is then estimated. Because the data volume of each product version is exactly the data volume that the subsequent Reduce tasks will process, a corresponding number of Reduce tasks can be applied for based on these data volumes. This avoids the uneven resource allocation caused by applying for too many or too few Reduce tasks, so that Reduce tasks are allocated reasonably to the business to be processed. Finally, distributed processing is performed on the business to be processed based on the corresponding number of Reduce tasks.

The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented in accordance with the specification, and so that the above and other objects, features, and advantages of the invention are easier to understand, specific embodiments of the invention are set out below.

Brief description of the drawings

Various other advantages and benefits will become clear to those of ordinary skill in the art from reading the following detailed description of the preferred embodiments. The drawings serve only to illustrate the preferred embodiments and are not to be regarded as limiting the invention. Throughout the drawings, the same reference numerals refer to the same parts. In the drawings:

Fig. 1 shows a flow chart of a business processing method according to an embodiment of the invention;

Fig. 2 shows a schematic diagram of the cross tree formed after the business to be processed is classified, according to an embodiment of the invention;

Fig. 3 shows a schematic diagram of a business processing system according to an embodiment of the invention.

Detailed description of the embodiments

Exemplary embodiments of the present disclosure are described in more detail below with reference to the accompanying drawings. Although the drawings show exemplary embodiments of the disclosure, it should be understood that the disclosure may be implemented in various forms and should not be limited by the embodiments set forth here. Rather, these embodiments are provided so that the disclosure can be understood more thoroughly and its scope conveyed fully to those skilled in the art.

The embodiments of the invention provide a business processing method and system to solve the technical problem of wasted resources in the prior art.

Referring to Fig. 1, an embodiment of the invention discloses a business processing method, the method comprising:

Step 11: classify the business to be processed to determine the one or more classes of product version corresponding to the business to be processed.

The classification parameters of this embodiment generally include: log category, log service ID, and product version. Log category is the level-1 classification; log service ID sits below log category and is the level-2 classification; product version sits below log service ID and is the level-3 classification. When classifying the business to be processed, the business is therefore first classified by log category to obtain the first classification result within each log category. In other words, a business to be processed may contain multiple log categories; for example, one business to be processed may include log category 1 and log category 2, so it can be classified by log category to obtain the first classification result within each log category. Each log category may in turn contain multiple log service IDs; for example, log category 1 may include ID1 and ID2. So, within each log category, the first classification result is classified by log service ID to obtain the second classification result within each log service ID. Further, each log service ID contains one or more classes of product version; for example, ID1 may include product version 1 and product version 2. So, within each log service ID, the second classification result is classified by product version to determine the one or more classes of product version corresponding to the business to be processed.

It can be seen that the business to be processed is actually composed of one or more classes of product version, so after classification it is ultimately divided into one or more classes of product version. During classification, the business is classified successively by log category, log service ID, and product version. The reason is that different log categories are never placed on the same reduce task (each reduce task processes a different category of log), whereas different product versions may be placed on the same reduce task. The division must therefore proceed in the order log category, log service ID, product version, and the levels cannot be combined arbitrarily.

This embodiment includes 18 log categories, such as: activity, activityTimes, net, netReceive, netLinkTime, netErrcode, netTimes, memory, monitor, fileinfo, func, io, Processinfo, anr, block, cpu, fps, netMidSend, netMidReceive, etc.

Each log category in turn has one or more log service IDs, such as mobilesafe, clean_droid, Ganme_union, etc., and their number can grow as more products are connected.

Each log service ID in turn has one or more product versions. As each product iterates, its version number is incremented.

Please refer to Table 1 below. To better illustrate the result of the classification, this embodiment gives a specific example classified by the above log category, log service ID, and product version.

As can be seen from Table 1, a business to be processed in this embodiment ultimately contains 4 classes of product version: product version 1, product version 2, product version 3, and product version 4. Each product version has its own quantity.

Table 1

As Table 1 shows, "log category" and "log service ID" serve mainly as classification levels. No data volume is calculated after classifying by "log category" or "log service ID"; the task volume can be calculated only once the data has been divided down to product version. "Log category" and "log service ID" are thus used only for distinguishing, as in a multiway tree in which only the leaf nodes (product versions) are effective.

Fig. 2 is a schematic diagram of the cross tree formed after the business to be processed is divided. Note that the cross tree in Fig. 2 is intended only to illustrate the classification of the business to be processed in this embodiment more intuitively and imposes no other limitation. Besides a cross tree, the classification result can take many other forms, such as a list or a set.

After the one or more classes of product version of the business to be processed have been determined, the following steps can be executed.
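The three-level classification described above can be sketched as a nested grouping. This is only a minimal Python illustration, not the patented implementation: the record layout (dictionaries with the keys `log_category`, `log_service_id`, and `product_version`) and the function names `classify` and `leaves` are assumptions made for the example.

```python
from collections import defaultdict


def classify(records):
    """Group records by log category, then log service ID, then product version."""
    # Three nested levels; only the innermost lists (the leaves) hold records.
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for rec in records:
        category = rec["log_category"]        # level-1 classification
        service_id = rec["log_service_id"]    # level-2 classification
        version = rec["product_version"]      # level-3 classification (leaf)
        tree[category][service_id][version].append(rec)
    return tree


def leaves(tree):
    """Yield ((log_category, log_service_id, product_version), records) per leaf."""
    for category, services in tree.items():
        for service_id, versions in services.items():
            for version, recs in versions.items():
                yield (category, service_id, version), recs
```

Only the leaves carry records, which mirrors the statement that data volume is calculated at the product version level and not at the two upper levels.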

Step 12: estimate the data volume of each product version of the business to be processed.

After the one or more classes of product version of the business to be processed have been determined, each class of product version has its own data volume. For example, product version 1 appears twice in Table 1 and its total data volume is 5M, while product version 2 has 10M. The data volume of a product version here is calculated per class rather than per occurrence.

The data volume of each product version of the business to be processed is the access volume of that product version.
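The per-class summation described above can be sketched as follows. The field name `size_mb` and the function name `estimate_volumes` are hypothetical; how the access volume of an individual record is measured is not specified in the description, so the sketch simply assumes it is already available in megabytes.

```python
from collections import Counter


def estimate_volumes(records):
    """Sum the data volume per product version (per class, not per occurrence)."""
    volumes = Counter()
    for rec in records:
        # `size_mb` is an assumed field holding the record's estimated volume.
        volumes[rec["product_version"]] += rec["size_mb"]
    return dict(volumes)
```

With two occurrences of product version 1 sized 2M and 3M, the function reports 5M for that version, in line with the example in the text.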

Step 13: apply for a corresponding number of Reduce tasks based on the data volume of each product version of the business to be processed.

When applying for Reduce tasks in this embodiment, the number of Reduce tasks to be applied for is first determined from the data volume of each product version of the business to be processed; the corresponding number of Reduce tasks is then applied for based on that number.

Specifically, the data volume of each product version of the business to be processed corresponds to the number of Reduce tasks to be applied for. This correspondence can be determined as follows:

Judge whether the data volume of each product version of the business to be processed is greater than the preset data amount threshold. The preset data amount threshold is obtained as follows: the data volume that a single Reduce task can handle is determined from the resource threshold of a Reduce task, and that data volume is taken as the preset data amount threshold. Once this value is known, the data volume of each product version of the business to be processed is compared against it one by one to judge whether it is greater than the preset data amount threshold.

If so, a corresponding Reduce task is allocated to the first product version, that is, the product version whose data volume is greater than the preset data amount threshold. Specifically, if the data volume of a certain class of product version of the business to be processed is greater than the preset data amount threshold, that class is named a first product version. A first product version is thus a single class of product version whose data volume is greater than the preset data amount threshold.

If not, two or more classes of second product version, each smaller than the preset data amount threshold, are combined into a third product version, and a corresponding Reduce task is allocated to the third product version. Specifically, a second product version is a single class of product version whose data volume is less than the preset data amount threshold. Because its data volume is small, allocating a separate Reduce task to each second product version would waste Reduce task resources, so two or more classes of second product version are combined into a third product version. A third product version is thus a set of two or more classes of second product version, and its data volume is the sum of the data volumes of the second product versions it contains. Once a third product version has been obtained, a corresponding Reduce task can be allocated to it. During this allocation, it is first judged whether the difference between the data volume of the third product version and the preset data amount threshold lies within a preset range; if so, the corresponding Reduce task is allocated to the third product version. Restricting this difference to a preset range ensures that the data volume of the third product version differs only slightly from the preset data amount threshold, so that Reduce tasks can be applied for reasonably and the processing capacity of the applied Reduce task does not end up mismatched with the third product version. If the data volume of the third product version is much larger than the preset data amount threshold, the Reduce task will take a long time to run; if it is much smaller, Reduce task resources are wasted. The difference between the data volume of the third product version and the preset data amount threshold must therefore lie within a preset range, for example [-3M, 3M].
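One way to realize the threshold comparison and the combination of small product versions is a greedy packing, sketched below. The description does not say how second product versions are chosen for combination, so the first-fit-decreasing strategy, the function name `assign_reduce_tasks`, and the parameters `threshold` and `tolerance` are all assumptions. The sketch only enforces the upper end of the preset range (threshold plus tolerance); a leftover group may fall below the lower end, a case the description does not address. It is meant to be applied to the product versions of a single log category, since different log categories are placed on different reduce tasks.

```python
def assign_reduce_tasks(volumes, threshold, tolerance):
    """Return {task_id: [product versions]} for one log category.

    Versions whose volume exceeds `threshold` ("first" product versions)
    each get their own task. Smaller ("second") versions are packed into
    combined groups ("third" product versions) whose total volume does not
    exceed threshold + tolerance.
    """
    tasks = {}
    next_id = 1
    small = []
    for version, volume in volumes.items():
        if volume > threshold:
            tasks[next_id] = [version]   # first product version: own task
            next_id += 1
        else:
            small.append((version, volume))

    # First-fit decreasing (assumed strategy): place each small version
    # into the first group that still has room.
    groups = []  # each entry: [total_volume, [versions]]
    for version, volume in sorted(small, key=lambda item: item[1], reverse=True):
        for group in groups:
            if group[0] + volume <= threshold + tolerance:
                group[0] += volume
                group[1].append(version)
                break
        else:
            groups.append([volume, [version]])

    for _total, versions in groups:
        tasks[next_id] = versions        # third product version: shared task
        next_id += 1
    return tasks
```

For instance, with a threshold of 10M and a tolerance of 3M, two versions of 12M and 15M each receive their own task, while two versions of 4M and 5M share a third one.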

After a corresponding Reduce task has been allocated to the first product version and to the third product version, the mapping relations between each product version of the business to be processed and the Reduce tasks can be obtained from the Reduce task corresponding to the first product version and the Reduce task corresponding to the third product version. Specifically, once every class of product version has been allocated a Reduce task (two or more classes may share the same Reduce task), each class of product version corresponds to its own Reduce task, which yields the mapping relations between each product version of the business to be processed and the Reduce tasks.

Referring to Table 2, suppose the business to be processed has 4 classes of product version: product version 1, product version 2, product version 3, and product version 4. Product version 1 and product version 2 each correspond to their own Reduce task, namely Reduce task 1 and Reduce task 2. Product version 3 and product version 4 correspond to the same Reduce task, assumed here to be numbered Reduce task 3.

Table 2

Version type	Product version	Reduce task
First product version	Product version 1	Reduce task 1
First product version	Product version 2	Reduce task 2
Third product version	Product version 3, product version 4	Reduce task 3

As an optional embodiment, after a corresponding Reduce task has been allocated to the first product version and to the third product version, the number of Reduce tasks to be applied for can be determined by counting the number of Reduce tasks corresponding to the first product version and the number of Reduce tasks corresponding to the third product version. Referring to Table 2, counting the Reduce tasks corresponding to the product versions gives 3 as the number of Reduce tasks to be applied for.
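The mapping relations and the count from Table 2 can be written out as a small sketch. The dictionary `TABLE_2_TASKS` restates Table 2; the function names `version_to_task` and `tasks_to_apply` are hypothetical.

```python
# Table 2 restated as {Reduce task number: [product versions]}.
TABLE_2_TASKS = {
    1: ["product version 1"],                        # first product version
    2: ["product version 2"],                        # first product version
    3: ["product version 3", "product version 4"],   # third product version
}


def version_to_task(tasks):
    """Invert the allocation into the mapping product version -> Reduce task."""
    return {
        version: task_id
        for task_id, versions in tasks.items()
        for version in versions
    }


def tasks_to_apply(tasks):
    """The number of Reduce tasks to apply for is the number of distinct tasks."""
    return len(tasks)
```

Applied to `TABLE_2_TASKS`, `tasks_to_apply` returns 3, and `version_to_task` maps product version 3 and product version 4 to the same task.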

Step 14: perform distributed processing on the business to be processed based on the corresponding number of Reduce tasks.

Distributed processing generally comprises the steps of classification, Map processing, and Reduce processing. The distributed processing of this embodiment is generally as follows:

The business to be processed is divided into multiple subtasks, which are input to the Map framework for mapping computation, yielding intermediate data sets corresponding in number to the multiple subtasks. The business to be processed can be divided into subtasks arbitrarily; for example, MapReduce first splits the data into multiple first key/value pairs, which are then input to the Map framework to obtain second (new) key/value pairs. These second key/value pairs are the intermediate data of this embodiment. The intermediate data set obtained after the Map processing contains one or more items of intermediate data: each Map framework yields one item of intermediate data after its mapping processing, and the items obtained from all Map frameworks together form the intermediate data set.
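The Map stage described above can be illustrated with a toy mapper that turns raw log lines into new key/value pairs. The input format (four tab-separated fields: log category, log service ID, product version, payload) and the function names `map_record` and `run_map` are assumptions for illustration only; the description does not specify the record format.

```python
def map_record(line):
    """Turn one raw log line into a (key, value) pair.

    Assumed input: "category<TAB>service_id<TAB>version<TAB>payload".
    The key carries the three classification parameters.
    """
    category, service_id, version, payload = line.rstrip("\n").split("\t", 3)
    return (category, service_id, version), payload


def run_map(subtask_lines):
    """Process one subtask and return its intermediate data as a list of pairs."""
    return [map_record(line) for line in subtask_lines]
```

Keeping the three classification parameters in the key is one way to make the later classification of the intermediate data straightforward.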

Further, the intermediate data set can be carried out to classification processing, and then obtained in the intermediate data set Each intermediate data is respective a kind of or multiclass product version.

It is also all according to " log category, diary service ID are produced for each intermediate data during classification Product version " is classified.

Specifically, respectively classification is carried out according to log category to each intermediate data in the intermediate data set to obtain Obtain the third classification results in each log category.That is, each intermediate data includes one or more log class Not, so for each intermediate data, all can respectively classify according to log category, that is to say, that each intermediate data Classification it is all not related with the classification of other intermediate data, respectively handle.Such as intermediate data A includes that there are two logs Classification, respectively log category A1 and log category A2.After then by intermediate data A according to log category, then each day can be obtained Third classification results in will classification, that is, it is categorized into log category A1 and log category A2.There are three intermediate data B includes Log category, respectively log category B1, log category B2, log category B3.Then by intermediate data B according to log category it Afterwards, then the third classification results in each log category can be obtained, that is, are categorized into log category B1, log category B2, log Classification B3.

Because the processing of each piece of intermediate data is independent of the others, the processing of each log category is also independent. The third classification results in each log category are classified according to diary service ID, obtaining the fourth classification results in each diary service ID.

The fourth classification results in each diary service ID are then classified according to product version, to determine the one or more classes of product versions corresponding to each piece of intermediate data in the intermediate data set.

As can be seen from the above description, taking intermediate data A as an example: intermediate data A is classified according to log category, obtaining the third classification results in each log category; the third classification results in each log category are classified according to diary service ID, obtaining the fourth classification results in each diary service ID; and the fourth classification results in each diary service ID are classified according to product version, to determine the one or more classes of product versions corresponding to intermediate data A.

Because the processing of each piece of intermediate data is independent of the processing of the others, every piece of intermediate data can be handled according to the method described above.
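The three-level classification described above (log category, then diary service ID, then product version) can be sketched as a nested grouping. This is a hedged illustration only: the record layout `((log_category, service_id, version), payload)`, the sample values, and the function names `classify` and `product_versions` are assumptions made for the example, and the single pass below is equivalent to applying the three classification steps one after another.

```python
from collections import defaultdict


def classify(intermediate):
    """Nest records as log category -> diary service ID -> product version."""
    tree = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))
    for (log_category, service_id, version), payload in intermediate:
        # Level 1 corresponds to the third classification results,
        # level 2 to the fourth, and level 3 to the product versions.
        tree[log_category][service_id][version].append(payload)
    return tree


def product_versions(tree):
    """Collect the (log category, service ID, version) leaves of the tree."""
    return sorted(
        (category, service_id, version)
        for category, services in tree.items()
        for service_id, versions in services.items()
        for version in versions
    )


# Intermediate data A with two log categories, A1 and A2 (sample values).
intermediate_a = [
    (("A1", "svc1", "v1.0"), "x"),
    (("A1", "svc1", "v2.0"), "y"),
    (("A2", "svc9", "v1.0"), "z"),
]
tree_a = classify(intermediate_a)
versions_a = product_versions(tree_a)
```

Each piece of intermediate data gets its own tree, so pieces can be classified independently and in parallel.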

However, the intermediate data is in fact obtained by applying Map mapping processing to the business to be processed, and the classification method is the same. Therefore, after classification, the product versions in the intermediate data set correspond one-to-one with the product versions of the business to be processed. In other words, because the intermediate data is merely the data obtained after the business to be processed passes through the Map framework, classifying it in the same way yields the same product versions as classifying the business to be processed directly. The mapping relations between each product version and the Reduce reduction tasks are likewise the same. Therefore, based on the mapping relations between each product version of the business to be processed and the Reduce reduction tasks, each product version of the intermediate data set can be input into its correspondingly allocated Reduce reduction task for reduction calculation processing.
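The routing step above can be illustrated as follows. The sketch assumes that the mapping relations are available as a plain dictionary from product version to Reduce task index, and it uses a record count as a stand-in reduction; `route_to_reduce_tasks`, `reduce_count`, and the sample data are hypothetical names and values, not part of the patent.

```python
from collections import defaultdict


def route_to_reduce_tasks(intermediate_data_set, version_to_task):
    """Group intermediate values by the Reduce task assigned to their product version."""
    inputs = defaultdict(list)
    for intermediate in intermediate_data_set:
        for version, value in intermediate:
            inputs[version_to_task[version]].append(value)
    return dict(inputs)


def reduce_count(values):
    """Stand-in reduction: count the records a task received."""
    return len(values)


# Mapping relations computed beforehand from the business to be processed.
version_to_task = {"v1.0": 0, "v2.0": 1, "v3.0": 1}
intermediate_data_set = [
    [("v1.0", "a"), ("v2.0", "b")],
    [("v3.0", "c"), ("v1.0", "d")],
]
task_inputs = route_to_reduce_tasks(intermediate_data_set, version_to_task)
results = {task: reduce_count(values) for task, values in task_inputs.items()}
```

Because the product versions of the intermediate data match those of the business to be processed, the mapping built before the Map phase can be reused unchanged at this point.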

In this embodiment, the data volume of each product version of the business to be processed is calculated in advance, and the number of Reduce reduction tasks to apply for is determined from it. Applying for Reduce reduction tasks in this way avoids the uneven resource allocation caused by applying for too many or too few Reduce reduction tasks, so that Reduce reduction tasks are allocated reasonably to handle the business to be processed.

In addition, during distributed processing the business to be processed is handled by the Reduce reduction tasks that were applied for, which can improve processing efficiency.
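How the number of Reduce reduction tasks might be derived from the per-version data volumes can be sketched as below. Several points are assumptions rather than statements of the patent: a product version above the threshold receives exactly one dedicated task, smaller versions are combined with a greedy first-fit-decreasing strategy so that each combined group stays at or below the threshold, and the function name `plan_reduce_tasks` and the sample volumes are invented for illustration.

```python
def plan_reduce_tasks(volumes, threshold):
    """Return (product version -> task index, number of tasks to apply for)."""
    mapping = {}
    next_task = 0
    small = []
    for version, volume in volumes.items():
        if volume > threshold:
            # "First product version": assumed to get one dedicated Reduce task.
            mapping[version] = next_task
            next_task += 1
        else:
            small.append((version, volume))

    # Combine the smaller versions, largest first, into groups near the threshold.
    groups = []  # each entry: [remaining capacity, task index]
    for version, volume in sorted(small, key=lambda item: -item[1]):
        for group in groups:
            if group[0] >= volume:
                group[0] -= volume
                mapping[version] = group[1]
                break
        else:
            # No existing group has room, so open a new one.
            groups.append([threshold - volume, next_task])
            mapping[version] = next_task
            next_task += 1
    return mapping, next_task


volumes = {"v1": 120, "v2": 60, "v3": 40, "v4": 30, "v5": 70}
mapping, num_tasks = plan_reduce_tasks(volumes, threshold=100)
```

With these sample volumes, `v1` exceeds the threshold and is handled alone, while the four smaller versions are paired into two groups of volume 100 each. A greedy strategy can leave a single small version in a group by itself; a production planner would need a rule for that case.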

Based on the same inventive concept, and referring to Fig. 3, this embodiment also discloses a transaction processing system, the system comprising:

First categorization module 31, configured to classify the business to be processed, thereby determining the one or more classes of product versions corresponding to the business to be processed;

Estimation module 32, configured to estimate the data volume of each product version of the business to be processed;

Application module 33, configured to apply for a corresponding number of Reduce reduction tasks based on the data volume of each product version of the business to be processed;

Processing module 34, configured to perform distributed processing on the business to be processed based on the corresponding number of Reduce reduction tasks.

As an optional embodiment, the sorting parameters include: log category, diary service ID, product version;

First categorization module 31, specifically includes:

As a kind of optional embodiment, the application module 33 is specifically included:

As a kind of optional embodiment, the determining module is specifically included:

As an optional embodiment, the preset data volume threshold is obtained as follows: the data volume that a single Reduce reduction task is capable of handling is determined according to the resource threshold of a Reduce reduction task, and that data volume is taken as the preset data volume threshold.

As a kind of optional embodiment, the processing module 34 is specifically included:

As a kind of optional embodiment, second categorization module is specifically included:

As an optional embodiment, the data volume of each product version of the business to be processed is the access volume of each product version of the business to be processed.

Based on the same inventive concept as the foregoing embodiments, an embodiment of the present invention also provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the steps of any of the methods described above are implemented.

Based on the same inventive concept as the foregoing embodiments, an embodiment of the present invention also provides a computer device, including a memory, a processor, and a computer program stored on the memory and runnable on the processor; when the processor executes the program, the steps of any of the methods described above are implemented.

Through one or more embodiments of the present invention, the present invention has the following advantages:

The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. From the above description, the structure required to construct such a system is apparent. Furthermore, the present invention is not directed to any particular programming language. It should be understood that the content of the invention described herein may be implemented in various programming languages, and that the above description of a specific language is given to disclose the best mode of the invention.

In the description provided here, numerous specific details are set forth. It is to be understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures, and techniques have not been shown in detail, so as not to obscure the understanding of this description.

Similarly, it should be understood that, in order to streamline the disclosure and aid understanding of one or more of the various inventive aspects, the features of the invention are sometimes grouped together into a single embodiment, figure, or description thereof in the above description of exemplary embodiments. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed invention requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in fewer than all the features of a single embodiment disclosed above. Therefore, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the invention.

Those skilled in the art will understand that the modules in the device of an embodiment can be adaptively changed and arranged in one or more devices different from that embodiment. The modules or units or components in the embodiments may be combined into one module or unit or component, and they may furthermore be divided into multiple sub-modules or sub-units or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, equivalent, or similar purpose.

In addition, those skilled in the art will appreciate that although some embodiments described herein include certain features that are included in other embodiments and not others, combinations of features of different embodiments are meant to be within the scope of the invention and to form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.

The various component embodiments of the invention may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will understand that a microprocessor or digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the gateway, proxy server, and system according to embodiments of the invention. The invention may also be implemented as a device or device program (for example, a computer program and a computer program product) for performing part or all of the method described herein. Such a program implementing the invention may be stored on a computer-readable medium, or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.

It should be noted that the above embodiments illustrate rather than limit the invention, and that those skilled in the art may design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several devices, several of these devices may be embodied by one and the same item of hardware. The use of the words first, second, and third does not indicate any order; these words may be interpreted as names.

The invention discloses A1, a business processing method, characterized in that the method comprises:

A2, the method as described in A1, characterized in that the sorting parameters include: log category, diary service ID, product version;

A3, the method as described in A1, characterized in that the applying for a corresponding number of Reduce reduction tasks based on the data volume of each product version of the business to be processed specifically includes:

A4, the method as described in A3, characterized in that the determining of the number of Reduce reduction tasks to apply for, based on the data volume of each product version of the business to be processed, specifically includes:

A5, the method as described in A4, characterized in that the preset data volume threshold is obtained as follows: the data volume that a single Reduce reduction task is capable of handling is determined according to the resource threshold of a Reduce reduction task, and that data volume is taken as the preset data volume threshold.

A6, the method as described in A4, characterized in that the performing of distributed processing on the business to be processed, based on the corresponding number of Reduce reduction tasks, specifically includes:

A7, the method as described in A6, characterized in that the classifying of the intermediate data set specifically includes:

A8, the method as described in A1, characterized in that the data volume of each product version of the business to be processed is the access volume of each product version of the business to be processed.

B9, a transaction processing system, characterized in that the system comprises:

B10, the system as described in B8, characterized in that the sorting parameters include: log category, diary service ID, product version;

The first categorization module specifically includes:

B11, the system as described in B8, characterized in that the application module specifically includes:

B12, the system as described in B11, characterized in that the determining module specifically includes:

B13, the system as described in B12, characterized in that the preset data volume threshold is obtained as follows: the data volume that a single Reduce reduction task is capable of handling is determined according to the resource threshold of a Reduce reduction task, and that data volume is taken as the preset data volume threshold.

B14, the system as described in B12, characterized in that the processing module specifically includes:

B15, the system as described in B14, characterized in that the second categorization module specifically includes:

B16, the system as described in B8, characterized in that the data volume of each product version of the business to be processed is the access volume of each product version of the business to be processed.

C17, a computer-readable storage medium on which a computer program is stored, characterized in that the steps of the method of any one of A1-A8 are implemented when the program is executed by a processor.

C18, a computer device, including a memory, a processor, and a computer program stored on the memory and runnable on the processor, characterized in that the processor implements the steps of the method of any one of A1-A8 when executing the program.

Claims

1. A business processing method, characterized in that the method comprises:

2. The method as claimed in claim 1, characterized in that the sorting parameters include: log category, diary service ID, product version;

The classifying of the business to be processed, thereby determining the one or more classes of product versions corresponding to the business to be processed, specifically includes:

Classifying the first classification results in each log category according to diary service ID, obtaining the second classification results in each diary service ID;

Classifying the second classification results in each diary service ID according to product version, to determine the one or more classes of product versions corresponding to the business to be processed.

3. The method as claimed in claim 1, characterized in that the applying for a corresponding number of Reduce reduction tasks based on the data volume of each product version of the business to be processed specifically includes:

Determining the number of Reduce reduction tasks to apply for, based on the data volume of each product version of the business to be processed;

4. The method as claimed in claim 3, characterized in that the determining of the number of Reduce reduction tasks to apply for, based on the data volume of each product version of the business to be processed, specifically includes:

If so, allocating a corresponding Reduce reduction task to the first product version, whose data volume is greater than the preset data volume threshold;

If not, combining two or more classes of second product versions, whose data volumes are less than the preset data volume threshold, into a third product version, and allocating a corresponding Reduce reduction task to the third product version; wherein the difference between the data volume of the third product version and the preset data volume threshold is within a preset range;

Obtaining the mapping relations between each product version of the business to be processed and the Reduce reduction tasks, based on the Reduce reduction task corresponding to the first product version and the Reduce reduction task corresponding to the third product version;

Counting the number of Reduce reduction tasks corresponding to the first product version and the number of Reduce reduction tasks corresponding to the third product version, to determine the number of Reduce reduction tasks to apply for.

5. The method as claimed in claim 4, characterized in that the preset data volume threshold is obtained as follows: the data volume that a single Reduce reduction task is capable of handling is determined according to the resource threshold of a Reduce reduction task, and that data volume is taken as the preset data volume threshold.

6. The method as claimed in claim 4, characterized in that the performing of distributed processing on the business to be processed, based on the corresponding number of Reduce reduction tasks, specifically includes:

Dividing the business to be processed into multiple subtasks, inputting each into the Map framework for mapping calculation processing, and obtaining an intermediate data set whose size corresponds to the number of subtasks;

Classifying the intermediate data set, thereby obtaining the one or more classes of product versions of each piece of intermediate data in the intermediate data set; wherein the product versions in the intermediate data set correspond one-to-one with the product versions of the business to be processed;

Inputting each product version of the intermediate data set into its correspondingly allocated Reduce reduction task for reduction calculation processing, based on the mapping relations between each product version of the business to be processed and the Reduce reduction tasks.

7. The method as claimed in claim 6, characterized in that the classifying of the intermediate data set specifically includes:

Classifying each piece of intermediate data in the intermediate data set separately according to log category, obtaining the third classification results in each log category;

Classifying the third classification results in each log category according to diary service ID, obtaining the fourth classification results in each diary service ID;

Classifying the fourth classification results in each diary service ID according to product version, to determine the one or more classes of product versions corresponding to each piece of intermediate data in the intermediate data set.

8. A transaction processing system, characterized in that the system comprises:

A first categorization module, configured to classify the business to be processed and determine the one or more classes of product versions corresponding to the business to be processed;

An application module, configured to apply for a corresponding number of Reduce reduction tasks based on the data volume of each product version of the business to be processed;

A processing module, configured to perform distributed processing on the business to be processed based on the corresponding number of Reduce reduction tasks.

9. A computer-readable storage medium on which a computer program is stored, characterized in that the steps of the method of any one of claims 1-7 are implemented when the program is executed by a processor.

10. A computer device, including a memory, a processor, and a computer program stored on the memory and runnable on the processor, characterized in that the processor implements the steps of the method of any one of claims 1-7 when executing the program.