CN113312381A

CN113312381A - Data processing method and device

Info

Publication number: CN113312381A
Application number: CN202010760370.0A
Authority: CN
Inventors: 马云雷
Original assignee: Alibaba Group Holding Ltd
Current assignee: Alibaba Group Holding Ltd
Priority date: 2020-07-31
Filing date: 2020-07-31
Publication date: 2021-08-27

Abstract

The invention discloses a data processing method and a data processing device. The method comprises the following steps: acquiring a query request; judging whether the data specified by the query condition in the query request matches pre-stored intermediate data; if the judgment result is yes, acquiring the corresponding data from the intermediate data and determining it as the query result; and if the judgment result is negative, acquiring the data again, obtaining intermediate data from the number of the acquired data and the sum of their values, and calculating the mean value of the data from the intermediate data to obtain the query result corresponding to the query request. The invention solves the technical problems of high time delay and low efficiency caused by the fact that calculation needs to be carried out on a large amount of data in a distributed algorithm.

Description

Data processing method and device

Technical Field

The invention relates to the technical field of internet, in particular to a data processing method and device.

Background

Big data calculation usually adopts the MapReduce algorithm. Every calculation has to read data from a disk and operate on a large amount of data, so the delay is high. Moreover, different requests may carry the same query conditions, and the data ranges they cover may overlap.

In the related art, one approach caches the original data in the memory, which saves the time of reading the data from the disk on the next read; its defect is that the data that is read still needs to be recalculated. Another approach caches the summary result: on the next read, if the parameters are completely the same, so that the same section of data is hit and the calculation is the same, the cached result is read directly. Its drawback is that if the parameters change, for example so that one more portion of data has to be computed, the cache fails.

Aiming at the problems of high time delay and low efficiency caused by the fact that calculation needs to be carried out on a large amount of data in the distributed algorithm, an effective solution is not provided at present.

Disclosure of Invention

The embodiment of the invention provides a data processing method and a data processing device, which are used for at least solving the technical problems of high time delay and low efficiency caused by the fact that calculation needs to be carried out on a large amount of data in a distributed algorithm.

According to an aspect of an embodiment of the present invention, there is provided a data processing method including: acquiring a query request; judging whether the data in the query condition in the query request is matched with the pre-stored intermediate data or not; if the judgment result is yes, acquiring corresponding data from the intermediate data, and determining the data as the query result; and under the condition that the judgment result is negative, acquiring the data again, obtaining intermediate data according to the number and the sum of the numerical values of the acquired data, and calculating the mean value of the data according to the intermediate data to obtain the query result corresponding to the query request.

Optionally, the method further includes: before acquiring the query request, performing fragment storage on each group of data; respectively calculating the intermediate data of the data in each fragment; and acquiring the intermediate data of the data in each fragment, and performing mean value calculation on the intermediate data of the data in each fragment to obtain a mean value.

Further, optionally, the performing fragmented storage on each group of data includes: and acquiring each group of data from the first storage medium, and loading each group of data to the second storage medium.

Optionally, the respectively calculating the intermediate data of the data in each segment includes: under the condition that the third storage medium comprises each fragmented memory, acquiring corresponding data from each fragmented memory to obtain data in each fragment; acquiring each group of data from the data in each fragment; and obtaining the intermediate data of the data in each fragment according to the number of each group of data and the sum of the values of each data in each group of data.

Further, optionally, performing mean calculation on the intermediate data of the data in each segment, and obtaining the mean includes: storing the intermediate data of the data in each fragment to a fourth storage medium, wherein the fourth storage medium comprises a memory of the summary node; and calculating the numerical value in each group of data and the number of data in each group of data according to the intermediate data stored in the memory of the summary node to obtain the average value.

According to an aspect of an embodiment of the present invention, there is provided a data processing apparatus including: the request acquisition module is used for acquiring a query request; the judging module is used for judging whether the data in the query condition in the query request is matched with the pre-stored intermediate data or not; the first acquisition module is used for acquiring corresponding data from the intermediate data and determining the data as the query result under the condition that the judgment result is yes; and the second acquisition module is used for acquiring the data again under the condition that the judgment result is negative, acquiring intermediate data according to the number and the sum of the acquired data, and calculating the mean value of the data according to the intermediate data to acquire the query result corresponding to the query request.

Optionally, the apparatus further comprises: the storage module is used for carrying out fragment storage on each group of data before acquiring the query request; the first calculation module is used for calculating the intermediate data of the data in each fragment; and the second calculation module is used for acquiring the intermediate data of the data in each fragment and performing mean value calculation on the intermediate data of the data in each fragment to obtain a mean value.

Further, optionally, the first calculating module includes: the extracting unit is used for acquiring corresponding data from each fragmented memory to obtain data in each fragment under the condition that the third storage medium comprises each fragmented memory; the data acquisition unit is used for acquiring each group of data from the data in each fragment; and the computing unit is used for obtaining the intermediate data of the data in each fragment according to the number of each group of data and the numerical sum of each data in each group of data.

According to an aspect of the embodiments of the present invention, there is provided a storage medium, wherein the storage medium includes a stored program, and wherein, when the program runs, the device on which the storage medium is located is controlled to execute the above method.

According to an aspect of the embodiments of the present invention, there is provided a processor, wherein the processor is configured to run a program, and the program, when running, executes the above method.

In the embodiment of the invention, the query request is obtained; judging whether the data in the query condition in the query request is matched with the pre-stored intermediate data or not; if the judgment result is yes, acquiring corresponding data from the intermediate data, and determining the data as the query result; and under the condition that the judgment result is negative, acquiring the data again, obtaining intermediate data according to the number and the sum of the values of the acquired data, calculating the mean value of the data according to the intermediate data, and obtaining the query result corresponding to the query request, thereby achieving the purpose of effectively increasing the utilization rate of the cache, further achieving the technical effect of improving the calculation efficiency of the distributed algorithm, and further solving the technical problems of high time delay and low efficiency caused by the fact that calculation needs to be carried out on a large amount of data in the distributed algorithm.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a block diagram of a hardware configuration of a computer terminal of a data processing method according to an embodiment of the present invention;

FIG. 2 is a flow chart of a data processing method according to a first embodiment of the invention;

FIG. 2a is a schematic diagram of a query scenario applied in a data processing method according to a first embodiment of the present invention;

FIG. 2b is a diagram illustrating data reading and mean value calculation in a data processing method according to a first embodiment of the present invention;

FIG. 2c is a schematic diagram illustrating a time ratio of each step in the data processing method according to the first embodiment of the present invention;

FIG. 2d is a diagram illustrating a mean value calculation in a data processing method according to a first embodiment of the present invention;

FIG. 2e is a schematic diagram illustrating a time ratio of each step in the data processing method according to the first embodiment of the present invention;

FIG. 3 is a schematic diagram of a data processing apparatus according to a second embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

The technical terms related to the present application are:

MapReduce is a distributed computing algorithm, which performs primary computation (Map) on a plurality of machines and transmits the primary computation result to one machine or transmits the primary computation result to a plurality of machines according to hash to perform summary computation (Reduce).

Aggregation calculation: a summary calculation performed on data, whose input is a plurality of pieces of data and whose output is a single piece of data. Its counterpart is a conversion calculation, which computes on each piece of data and outputs a result for it.

Raw data: map input data.

Intermediate results: the calculation result of the Map phase is called an intermediate result, and the intermediate result is simultaneously the input of Reduce.

Summary result: the calculation result of the Reduce phase is called the summary result.

Caching: the intermediate result is cached so that the next calculation that uses the same data can reuse it.

Example 1

There is also provided, in accordance with an embodiment of the present invention, a method embodiment of a data processing method, it being noted that the steps illustrated in the flowchart of the figure may be performed in a computer system such as a set of computer-executable instructions and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in an order different than that presented herein.

The method provided by the first embodiment of the present application may be executed in a mobile terminal, a computer terminal, or a similar computing device. Taking the example of being operated on a computer terminal, fig. 1 is a hardware structure block diagram of a computer terminal of a data processing method according to an embodiment of the present invention. As shown in fig. 1, the computer terminal 10 may include one or more (only one shown) processors 102 (the processor 102 may include, but is not limited to, a processing device such as a microprocessor MCU or a programmable logic device FPGA), a memory 104 for storing data, and a transmission module 106 for communication functions. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and is not intended to limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.

The memory 104 may be used to store software programs and modules of application software, such as program instructions/modules corresponding to the data processing method in the embodiment of the present invention, and the processor 102 executes various functional applications and data processing by executing the software programs and modules stored in the memory 104, that is, implementing the data processing method of the application program. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.

The transmission module 106 is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission module 106 includes a network interface controller (NIC) that can be connected to other network devices through a base station to communicate with the internet. In one example, the transmission module 106 can be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.

Under the operating environment, the application provides a data processing method as shown in fig. 2. Fig. 2 is a flowchart of a data processing method according to a first embodiment of the present invention. The data processing method provided by the embodiment of the application comprises the following steps:

step S202, acquiring a query request;

step S204, judging whether the data in the query condition in the query request is matched with the pre-stored intermediate data;

step S206, acquiring corresponding data from the intermediate data and determining the data as the query result under the condition that the judgment result is yes;

and step S208, under the condition that the judgment result is negative, acquiring data again, obtaining intermediate data according to the number and the sum of the values of the acquired data, and calculating the mean value of the data according to the intermediate data to obtain the query result corresponding to the query request.
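Steps S202 to S208 can be sketched as follows. This is a minimal illustration only, not the patented implementation: the names `Intermediate`, `handle_query` and `load_data`, the use of a plain dictionary as the cache, and the choice to store the newly computed intermediate data back into the cache on a miss are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, List


@dataclass(frozen=True)
class Intermediate:
    """Intermediate data for a mean: sum of values and number of values."""
    total: float
    count: int

    def mean(self) -> float:
        return self.total / self.count


def handle_query(
    condition: Hashable,
    cache: Dict[Hashable, Intermediate],
    load_data: Callable[[Hashable], List[float]],
) -> float:
    """Hypothetical handler for one query request (S202)."""
    cached = cache.get(condition)          # S204: match against stored intermediate data
    if cached is not None:                 # S206: judgment result is yes
        return cached.mean()
    values = load_data(condition)          # S208: judgment result is negative, acquire data again
    fresh = Intermediate(total=sum(values), count=len(values))
    cache[condition] = fresh               # assumption: keep the result for later queries
    return fresh.mean()


if __name__ == "__main__":
    def loader(cond):
        # Stand-in for reading raw data from storage.
        return [4, 5, 6]

    store: Dict[Hashable, Intermediate] = {}
    print(handle_query("second-fragment", store, loader))  # computed from raw data
    print(handle_query("second-fragment", store, loader))  # served from the cache
```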

Specifically, with reference to step S202 to step S208, the data processing method provided in the embodiment of the present application may be applied to a distributed storage system, and is particularly applied to a scenario where mass data is processed.

When the cache is stored, calculation and storage are carried out in units of fragments. In a subsequent query, it is first necessary to determine which fragments the query hits completely: for completely matched fragments, the cached intermediate result is read directly from the cache system; for fragments that are not completely matched, the original data needs to be read again for calculation.

In the case of FIG. 2a, only two numbers of the first fragment need to be calculated, so data 2 and 3 are read again and recalculated: sum = 2 + 3 = 5 and count = 2, which gives (sum = 5, count = 2). The second and third fragments hit the cache completely, so their results (sum = 15, count = 3) and (sum = 24, count = 3) are read directly from the cache system and sent to the final node for aggregation. The average value in this case is (5 + 15 + 24)/(2 + 3 + 3) = 5.5. In this calculation, only part of the data of the first fragment needs to be read and calculated again, which greatly saves I/O and computing resources.
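The partial-hit case of FIG. 2a can be reproduced with a short sketch. It assumes, purely for illustration, that a query is a half-open range of global positions over the concatenated fragments and that the cache maps a fragment index to a (sum, count) pair; the function names are hypothetical.

```python
FRAGMENTS = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # stand-in for raw data on disk


def build_cache(fragments):
    """Cache one (sum, count) pair per complete fragment."""
    return {i: (sum(f), len(f)) for i, f in enumerate(fragments)}


def query_mean(start, stop, fragments, cache):
    """Mean of the values at global positions [start, stop).

    Returns the mean and the indices of the fragments whose raw data
    had to be read again.
    """
    total, count, offset = 0, 0, 0
    recomputed = []
    for i, frag in enumerate(fragments):
        lo = max(start, offset)
        hi = min(stop, offset + len(frag))
        if lo < hi:  # the query touches this fragment
            if hi - lo == len(frag) and i in cache:
                s, c = cache[i]                       # complete hit: reuse cached result
            else:
                part = frag[lo - offset:hi - offset]  # partial hit: reread raw data
                s, c = sum(part), len(part)
                recomputed.append(i)
            total += s
            count += c
        offset += len(frag)
    if count == 0:
        raise ValueError("query range selects no data")
    return total / count, recomputed


if __name__ == "__main__":
    cache = build_cache(FRAGMENTS)
    # Positions 1..8 select the values 2..9, as in the FIG. 2a scenario.
    print(query_mean(1, 9, FRAGMENTS, cache))  # (5.5, [0])
```

Only fragment 0 is recomputed, and only from the two values the query covers; the other two fragments contribute their cached (sum, count) pairs unchanged.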

Optionally, the data processing method provided in the embodiment of the present application further includes:

step S196, before the query request is acquired, performing fragment storage on each group of data;

The acquired data is divided into a plurality of fragments (that is, the acquired data is grouped, in this embodiment of the application), so that the data of each fragment is obtained.

Specifically, taking 9 data as an example, the obtained data are "1, 2, 3, 4, 5, 6, 7, 8, 9", and the 9 data are grouped to obtain: first group: "1, 2, 3"; second group: "4, 5, 6"; third group: "7, 8, 9".

Step S198, respectively calculating the intermediate data of the data in each fragment;

Based on each group of data obtained in step S196, the intermediate data is obtained from the number of data in each group and the sum of their values; that is, in the embodiment of the present application, the data structure of the intermediate data of each group may be {sum, count}. For example, for the first group of data "1, 2, 3", the intermediate data is {sum = 6, count = 3}, which indicates that the first group contains 3 data whose values sum to 6.

And obtaining intermediate data of the second group of data and the third group of data in the same way.

Step S200, acquiring the intermediate data of the data in each fragment, and performing mean value calculation on the intermediate data of the data in each fragment to obtain a mean value.

Based on the obtained intermediate data of each group of data, mean value calculation is performed according to the intermediate data, so that a mean value corresponding to 9 data in step S202 can be obtained.

For example, the first, second and third sets of data correspond to intermediate data as follows:

First group:

the first group of data: "1, 2, 3";

intermediate data of the first group of data: {sum = 6, count = 3};

Second group:

the second group of data: "4, 5, 6";

intermediate data of the second group of data: {sum = 15, count = 3};

Third group:

the third group of data: "7, 8, 9";

intermediate data of the third group of data: {sum = 24, count = 3};

The average value of the "9 data" in step S202 can be obtained by performing the mean value calculation on the intermediate data of each group of data: (6 + 15 + 24)/(3 + 3 + 3) = 5.
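Steps S196 to S200 applied to the 9 data above can be sketched as follows. The helper names `shard`, `intermediate` and `mean_from_intermediates` are hypothetical, and a fixed group size of 3 is assumed only to match the example.

```python
def shard(values, size):
    """S196: split the data into groups (fragments) of at most `size` items."""
    return [values[i:i + size] for i in range(0, len(values), size)]


def intermediate(group):
    """S198: intermediate data of one group, in the {sum, count} form."""
    return {"sum": sum(group), "count": len(group)}


def mean_from_intermediates(parts):
    """S200: combine the intermediate data of all groups into one mean."""
    return sum(p["sum"] for p in parts) / sum(p["count"] for p in parts)


if __name__ == "__main__":
    groups = shard([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
    parts = [intermediate(g) for g in groups]
    for g, p in zip(groups, parts):
        print(g, p)
    print(mean_from_intermediates(parts))  # 5.0
```

Carrying the sum and the count, rather than a per-group mean, keeps the combined result correct even when groups contribute different numbers of values.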

In summary, in the data processing method provided by the embodiment of the present application, the original data is divided into a plurality of fragments for execution; after each fragment finishes executing, it yields an intermediate result, which is then sent to the final node for summary calculation. Take calculating the average of 9 numbers as an example. The 9 numbers are divided into 3 fragments, and each fragment calculates its own intermediate result in parallel. FIG. 2b is a schematic diagram of data reading and mean value calculation in the data processing method according to the first embodiment of the present invention. As shown in FIG. 2b, the calculation within a fragment is divided into two steps: the first step reads data from a disk and loads it into a memory; the second step calculates the intermediate result of the average value, which in a mean value calculation takes the form of the number of data and the sum of their values.

Further, optionally, the performing, in step S196, the fragmented storage on each group of data includes: and acquiring each group of data from the first storage medium, and loading each group of data to the second storage medium.

In this embodiment of the present application, the first storage medium is, for example, a magnetic disk and the second storage medium is, for example, a memory; acquiring data from the first storage medium and loading the data into the second storage medium then means reading the data from the disk and loading it into the memory.

Optionally, the step S198 of calculating the intermediate data of the data in each slice respectively includes: under the condition that the third storage medium comprises each fragmented memory, acquiring corresponding data from each fragmented memory to obtain data in each fragment; acquiring each group of data from the data in each fragment; and obtaining the intermediate data of the data in each fragment according to the number of each group of data and the sum of the values of each data in each group of data.

Specifically, the third storage medium may include the memory of each fragment; in the process of grouping the acquired data, each group of data is stored in the memory of its own fragment.

Specifically, the fourth storage medium may include a memory of the summary node, and sending the intermediate data of each group of data stored in the third storage medium to the fourth storage medium includes: sending each group of intermediate data from the memory of each fragment to the memory of the summary node.

According to the intermediate data stored in the memory of the summary node, the average value of the data acquired in step S202 is obtained by calculating the sum of the numerical values in each group of data and the number of data in each group of data.

For example, taking 9 data as an example, the obtained data are "1, 2, 3, 4, 5, 6, 7, 8, 9", and the 9 data are grouped to obtain: first group: "1, 2, 3"; second group: "4, 5, 6"; third group: "7, 8, 9"; the intermediate data for each group of data is as follows:

First group:

the first group of data: "1, 2, 3";

intermediate data of the first group of data: {sum = 6, count = 3};

Second group:

the second group of data: "4, 5, 6";

intermediate data of the second group of data: {sum = 15, count = 3};

Third group:

the third group of data: "7, 8, 9";

intermediate data of the third group of data: {sum = 24, count = 3};

To sum up, the difference between the data processing method provided in the embodiment of the present application and the prior art is that the embodiment of the present application needs to execute the following four steps in the data calculation of distributed storage:

First step: the original data is read from a disk and loaded into a memory;

Second step: the intermediate data of the average value is calculated; in a mean value calculation, the intermediate data takes the form of the number of data and their sum. The original data is large, but the intermediate result generated from it occupies little space;

Third step: after the intermediate data of the three fragments has been calculated, it is sent to a summary node for the final average value calculation, the data being sent over a network from the memory of each fragment to the memory of the summary node;

Fourth step: the final average value is calculated.

In the first step, a large amount of time is required for reading the disk because a large amount of data is read from the disk.

The second step involves computation of a large amount of data, which consumes a large amount of CPU resources, and for some complex computation types, the delay is relatively high.

The third step involves cross-network transmission of only a small amount of data, so its delay is not very high.

The fourth step involves the computation of a small amount of data, the delay not being very high.
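The four steps can be sketched end to end as below. This is only an illustration under stated assumptions: each fragment is a JSON file in a temporary directory standing in for the disk, a thread pool stands in for the machines that process fragments in parallel, and collecting the pool's results stands in for the network transfer to the summary node. The function names are hypothetical.

```python
import json
import tempfile
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def process_fragment(path):
    """Steps one and two for a single fragment."""
    values = json.loads(Path(path).read_text())  # step one: read from disk into memory
    return sum(values), len(values)              # step two: intermediate data (sum, count)


def summarize(intermediates):
    """Step four: final average on the summary node."""
    total = sum(s for s, _ in intermediates)
    count = sum(c for _, c in intermediates)
    return total / count


def run(fragments):
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for i, values in enumerate(fragments):
            path = Path(tmp) / f"fragment_{i}.json"
            path.write_text(json.dumps(values))
            paths.append(path)
        with ThreadPoolExecutor(max_workers=len(paths)) as pool:
            # Step three: gathering the small (sum, count) pairs stands in
            # for sending them over the network to the summary node.
            intermediates = list(pool.map(process_fragment, paths))
    return intermediates, summarize(intermediates)


if __name__ == "__main__":
    print(run([[1, 2, 3], [4, 5, 6], [7, 8, 9]]))
```

Steps one and two touch every raw value, whereas steps three and four only handle one small pair per fragment, which is why the first two steps dominate the cost.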

It can be seen that, because the third step and the fourth step involve only a small amount of data, their computation cost and delay are small. FIG. 2c is a schematic diagram of the proportion of time occupied by each step in the data processing method according to the first embodiment of the present invention. As shown in FIG. 2c, which gives the time distribution over the whole calculation process, the first step and the second step together account for 90% of the time consumed by the whole calculation. To reduce this time and save resource consumption, the data processing method provided in the embodiment of the present application introduces an intermediate data cache. FIG. 2d is a schematic diagram of mean value calculation in the data processing method according to the first embodiment of the present invention. As shown in FIG. 2d, 9 numbers are divided into 3 fragments and each fragment calculates its own intermediate result; for a mean value, the intermediate data is the number of data and the sum of their values. The intermediate data of the three fragments is stored in a cache system and, at the same time, sent to the final node for summary calculation to obtain the average value 5.

As can be seen from FIG. 2d, the intermediate data of each fragment contains only two numbers, whereas each fragment here holds 3 pieces of original data and, in actual use, may hold tens of millions of pieces. Reading and calculating the original data is costly, but after the initial calculation the result occupies very little space.

In addition, FIG. 2a is a schematic diagram of a query scenario applied in the data processing method according to the first embodiment of the present invention. As shown in FIG. 2a, when the cache is stored, calculation and storage are performed in units of fragments. In a subsequent query, it is first necessary to determine which fragments the query hits completely: for completely matched fragments, the cached intermediate result is read directly from the cache system; for fragments that are not completely matched, the original data needs to be read again for calculation.


FIG. 2e is a schematic diagram of the proportion of time occupied by each step in the data processing method according to the first embodiment of the present invention, and shows the time distribution of the calculation when the cache is used. Only one fragment needs to be recalculated, and only 2/3 of that fragment's data has to be recalculated, so its delay is 2/3 of the original; the other two fragments reuse the cache completely, so their time can be ignored. By using the cache, the computing resources of 2 fragments are saved and can be used for more calculations, and the overall latency is reduced from 10 seconds to 7 seconds.

The data processing method provided by the embodiment of the application is described by taking only 3 pieces of data as an example; in an actual case there may be tens of millions of pieces of data, so the optimization effect is more obvious. The maximum optimization effect is reached when the caches of all fragments are completely reused, which reduces the latency from 10 seconds to 1 second.
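The 10-second, 7-second and 1-second figures are consistent with the following back-of-the-envelope model. The model is an interpretation, not something the description states explicitly: it assumes that the first and second steps account for 90% of a 10-second baseline, that fragments are processed in parallel so this heavy part scales with the largest fraction of any fragment that must be recomputed, and that the remaining 10% is unaffected by the cache. The function name is hypothetical.

```python
from fractions import Fraction


def estimated_latency(baseline, heavy_share, recompute_fraction):
    """Estimated latency under the assumed model.

    baseline:           latency without any cache
    heavy_share:        share of the baseline spent reading and computing raw data
    recompute_fraction: largest fraction of a fragment that must be recomputed
    """
    heavy = baseline * heavy_share   # steps one and two
    light = baseline - heavy         # steps three and four
    return heavy * recompute_fraction + light


if __name__ == "__main__":
    baseline = Fraction(10)          # seconds
    heavy_share = Fraction(9, 10)    # 90% of the time, per FIG. 2c
    print(estimated_latency(baseline, heavy_share, Fraction(1)))     # 10: no cache reuse
    print(estimated_latency(baseline, heavy_share, Fraction(2, 3)))  # 7: FIG. 2a scenario
    print(estimated_latency(baseline, heavy_share, Fraction(0)))     # 1: every fragment cached
```

Exact fractions are used so that the results match the quoted whole-second figures without floating-point rounding.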

It should be noted that the above example is given only for illustration; any implementation that realizes the data processing method provided in the embodiment of the present application is acceptable, and no specific limitation is imposed.

It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.

Through the above description of the embodiments, those skilled in the art can clearly understand that the data processing method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.

Example 2

According to an aspect of the embodiments of the present invention, there is provided a data processing apparatus. FIG. 3 is a schematic diagram of a data processing apparatus according to a second embodiment of the present invention. As shown in FIG. 3, the apparatus includes: a request obtaining module 32, configured to obtain a query request; a judging module 34, configured to judge whether data in the query condition in the query request is matched with pre-stored intermediate data; a first obtaining module 36, configured to, if the determination result is yes, obtain corresponding data from the intermediate data, and determine the data as a result of the query; and a second obtaining module 38, configured to, if the determination result is negative, obtain data again, obtain intermediate data according to the number of the obtained data and the sum of their values, and calculate an average value of the data according to the intermediate data to obtain a query result corresponding to the query request.

Optionally, the data processing apparatus provided in the embodiment of the present application further includes: the storage module is used for carrying out fragment storage on each group of data before acquiring the query request; the first calculation module is used for calculating the intermediate data of the data in each fragment; and the second calculation module is used for acquiring the intermediate data of the data in each fragment and performing mean value calculation on the intermediate data of the data in each fragment to obtain a mean value.

Example 3

According to an aspect of the embodiments of the present invention, a storage medium is provided, wherein the storage medium includes a stored program, and wherein, when the program runs, the device in which the storage medium is located is controlled to execute the method in embodiment 1 above.

Example 4

According to an aspect of the embodiments of the present invention, a processor is provided, wherein the processor is configured to run a program, and the program, when run, performs the method in embodiment 1 above.

Example 5

The embodiment of the invention also provides a storage medium. Optionally, in this embodiment, the storage medium may be configured to store the program code for executing the data processing method provided in the first embodiment.

Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.

Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring a query request; judging whether the data in the query condition in the query request is matched with the pre-stored intermediate data or not; if the judgment result is yes, acquiring corresponding data from the intermediate data, and determining the data as the query result; and under the condition that the judgment result is negative, acquiring the data again, obtaining intermediate data according to the number and the sum of the numerical values of the acquired data, and calculating the mean value of the data according to the intermediate data to obtain the query result corresponding to the query request.

Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: before acquiring the query request, performing fragment storage on each group of data; respectively calculating the intermediate data of the data in each fragment; and acquiring the intermediate data of the data in each fragment, and performing mean value calculation on the intermediate data of the data in each fragment to obtain a mean value.

Further, optionally, in the present embodiment, the storage medium is configured to store program code for performing the following steps: performing fragment storage on each group of data comprises: acquiring each group of data from the first storage medium, and loading each group of data to the second storage medium.

Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: respectively calculating the intermediate data of the data in each fragment comprises the following steps: under the condition that the third storage medium comprises each fragmented memory, acquiring corresponding data from each fragmented memory to obtain data in each fragment; acquiring each group of data from the data in each fragment; and obtaining the intermediate data of the data in each fragment according to the number of each group of data and the sum of the values of each data in each group of data.
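The per-fragment step can be sketched as follows, assuming that the data held in one fragment's memory is a sequence of (group key, value) rows. The row layout and the function name `fragment_intermediates` are hypothetical and are not specified in the embodiment.

```python
from typing import Dict, Hashable, Iterable, Tuple


def fragment_intermediates(
    rows: Iterable[Tuple[Hashable, float]],
) -> Dict[Hashable, Tuple[int, float]]:
    """Return, for each group in one fragment, its (count, sum) intermediate data."""
    result: Dict[Hashable, Tuple[int, float]] = {}
    for group, value in rows:
        # Accumulate the number of values and their sum per group.
        count, total = result.get(group, (0, 0.0))
        result[group] = (count + 1, total + value)
    return result


# Example rows read from one fragment's memory (made-up values).
rows = [("a", 1.0), ("b", 4.0), ("a", 3.0)]
per_group = fragment_intermediates(rows)
```

Here `per_group` maps group "a" to (2, 4.0) and group "b" to (1, 4.0).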

Further, optionally, in the present embodiment, the storage medium is configured to store program code for performing the following steps: performing mean value calculation on the intermediate data of the data in each fragment to obtain the mean value comprises: storing the intermediate data of the data in each fragment to a fourth storage medium, wherein the fourth storage medium comprises a memory of the summary node; and calculating, according to the intermediate data stored in the memory of the summary node, the average value from the sum of the values in each group of data and the number of data in each group of data.
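A possible shape for the summary node is sketched below in Python. It assumes that each fragment reports a mapping from group key to a (count, sum) pair and that the summary node keeps these mappings in memory keyed by a fragment identifier. The class `SummaryNode` and its methods are illustrative names, not part of the embodiment.

```python
from collections import defaultdict
from typing import Dict, Hashable, Mapping, Tuple

Intermediate = Tuple[int, float]  # assumed (count, sum) pair


class SummaryNode:
    """Hypothetical summary node holding per-fragment intermediate data in memory."""

    def __init__(self) -> None:
        self._by_fragment: Dict[Hashable, Dict[Hashable, Intermediate]] = {}

    def store(self, fragment_id: Hashable, per_group: Mapping[Hashable, Intermediate]) -> None:
        # Storing again under the same fragment id replaces that fragment's data.
        self._by_fragment[fragment_id] = dict(per_group)

    def group_means(self) -> Dict[Hashable, float]:
        counts: Dict[Hashable, int] = defaultdict(int)
        sums: Dict[Hashable, float] = defaultdict(float)
        for per_group in self._by_fragment.values():
            for group, (count, total) in per_group.items():
                counts[group] += count
                sums[group] += total
        # Divide once per group, skipping groups that have no values.
        return {g: sums[g] / counts[g] for g in counts if counts[g]}


node = SummaryNode()
node.store("fragment-1", {"a": (2, 4.0), "b": (1, 4.0)})
node.store("fragment-2", {"a": (2, 8.0)})
means = node.group_means()
```

Because only counts and sums are kept, updating one fragment requires recomputing that fragment's pair only; the other fragments' intermediate data can be reused as stored.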

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, and various other media capable of storing program code.

The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims

1. A data processing method, comprising:

acquiring a query request;

judging whether the data in the query condition in the query request is matched with pre-stored intermediate data or not;

if the judgment result is yes, acquiring corresponding data from the intermediate data, and determining the data as a query result;

and under the condition that the judgment result is negative, acquiring data again, obtaining intermediate data according to the number and the sum of the values of the acquired data, and calculating the mean value of the data according to the intermediate data to obtain the query result corresponding to the query request.

2. The method of claim 1, wherein the method further comprises:

before acquiring the query request, performing fragment storage on each group of data;

respectively calculating the intermediate data of the data in each fragment;

and acquiring the intermediate data of the data in each fragment, and performing mean value calculation on the intermediate data of the data in each fragment to obtain a mean value.

3. The method of claim 2, wherein performing fragment storage on each group of data comprises:

acquiring each group of data from a first storage medium, and loading each group of data to a second storage medium.

4. The method of claim 2, wherein respectively calculating the intermediate data of the data in each fragment comprises:

under the condition that a third storage medium comprises each fragmented memory, acquiring corresponding data from each fragmented memory to obtain data in each fragment;

acquiring each group of data from the data in each fragment;

and obtaining the intermediate data of the data in each fragment according to the number of each group of data and the numerical sum of each data in each group of data.

5. The method of claim 4, wherein performing mean value calculation on the intermediate data of the data in each fragment comprises:

storing the intermediate data of the data in each fragment to a fourth storage medium, wherein the fourth storage medium comprises a memory of a summary node;

and calculating, according to the intermediate data stored in the memory of the summary node, the average value from the sum of the values in each group of data and the number of the data in each group of data.

6. A data processing apparatus, comprising:

the request acquisition module is used for acquiring a query request;

the judging module is used for judging whether the data in the query condition in the query request is matched with the pre-stored intermediate data or not;

the first acquisition module is used for acquiring corresponding data from the intermediate data under the condition that the judgment result is yes and determining the data as the query result;

and the second acquisition module is used for acquiring the data again under the condition that the judgment result is negative, obtaining intermediate data according to the number of the acquired data and the sum of their values, and calculating the mean value of the data according to the intermediate data to obtain the query result corresponding to the query request.

7. The apparatus of claim 6, wherein the apparatus further comprises:

the storage module is used for carrying out fragment storage on each group of data before acquiring the query request;

the first calculation module is used for calculating the intermediate data of the data in each fragment;

and the second calculation module is used for acquiring the intermediate data of the data in each fragment and performing mean value calculation on the intermediate data of the data in each fragment to obtain a mean value.

8. The apparatus of claim 7, wherein the first computing module comprises:

the extracting unit is used for acquiring corresponding data from each fragmented memory to obtain data in each fragment under the condition that a third storage medium comprises each fragmented memory;

the data acquisition unit is used for acquiring each group of data from the data in each fragment;

and the computing unit is used for obtaining the intermediate data of the data in each fragment according to the number of each group of data and the numerical sum of each data in each group of data.

9. A storage medium, wherein the storage medium comprises a stored program, and wherein the program, when executed, controls an apparatus in which the storage medium is located to perform the method of any one of claims 1 to 5.

10. A processor, wherein the processor is configured to run a program, wherein the program when running performs the method of any one of claims 1 to 5.