CN118093669A

CN118093669A - Big data calculation method and device

Info

Publication number: CN118093669A
Application number: CN202410083330.5A
Authority: CN
Inventors: 朱海鹏; 陈梦翔
Original assignee: Jingdong Technology Holding Co Ltd
Current assignee: Jingdong Technology Holding Co Ltd
Priority date: 2024-01-19
Filing date: 2024-01-19
Publication date: 2024-05-28

Abstract

The invention discloses a big data calculation method and device, and relates to the field of computer technology. One embodiment of the method comprises: in response to receiving a trigger instruction of a big data task, acquiring real-time data and historical data of a task index of the big data task; separately performing operations on first time window data, which comprises the real-time data, and third time window data, which comprises the historical data at the cycle start time of a task cycle, wherein the current time of the real-time data is the end time of the task cycle; acquiring an offline operation result of second time window data, wherein the second time window data is the historical data of a predetermined time period within the task cycle; and combining the operation result of the first time window data, the operation result of the third time window data and the offline operation result of the second time window data to output a target operation result of the big data task. This embodiment enables efficient calculation of big data while improving its calculation accuracy.

Description

Big data calculation method and device

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a big data computing method and apparatus.

Background

With the development of big data technology, big data is applied in more and more scenarios. When big data with long-period features is calculated, the calculation is limited by the computing power of the computer, so an offline scheduled calculation mode is generally adopted to reduce the consumption of computing resources and the computing time.

In the process of implementing the present invention, the inventors found that at least the following problem exists in the prior art: offline calculation of big data reduces calculation time and offers high calculation performance, but it struggles to meet accuracy requirements, so the calculation results can be inaccurate.

Disclosure of Invention

In view of this, the embodiments of the present invention provide a big data calculation method and apparatus, which can satisfy efficient calculation of big data and improve calculation accuracy of big data.

To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a big data calculation method including:

in response to receiving a trigger instruction of a big data task, acquiring real-time data and historical data of a task index of the big data task;

separately performing operations on first time window data comprising the real-time data and third time window data comprising the historical data at the start time of a task period of the task index, wherein the current time of the real-time data is the end time of the task period;

acquiring an offline operation result of second time window data of the task index, wherein the second time window data is the historical data of a predetermined time period within the task period; and

combining the operation result of the first time window data, the operation result of the third time window data and the offline operation result of the second time window data of the task index to output a target operation result of the big data task.

Preferably, the predetermined period of time is a remaining period of time outside of the first time window and the third time window within the task cycle.

Preferably, after acquiring the real-time data and the history data of the task index, the method further includes:

preprocessing the real-time data and the historical data according to the task period and the operation precision of the big data task to obtain the first time window data, the second time window data and the third time window data of the task index, wherein the time window length of each of the first time window data and the third time window data is less than or equal to the operation precision.

Preferably, the time window length of the first time window data of the task index is equal to the time window length of the third time window data.

Preferably, after obtaining the first time window data, the second time window data, and the third time window data of the task index, the big data calculation method further includes: creating a dictionary table for storage, and mapping the first time window data, the second time window data, and the third time window data of the task index to dictionary values of the dictionary table; and

outputting the target operation result of the big data task includes: outputting the dictionary value corresponding to the data of the target operation result.

Preferably, before the operations are performed on the first time window data and the third time window data of the task index, the big data calculation method further includes: performing de-duplication processing on the first time window data and the third time window data of the task index respectively; and

separately performing operations on the first time window data and the third time window data includes: separately performing operations on the de-duplicated first time window data and the de-duplicated third time window data.

Preferably, combining the operation results of the respective time window data includes:

performing de-duplication processing on the operation result of the first time window data, the operation result of the third time window data and the offline operation result of the second time window data of the task index, and adding the de-duplicated operation results.

According to another aspect of an embodiment of the present invention, there is provided a big data computing device including:

the data acquisition unit is used for responding to the received trigger instruction of the big data task and acquiring real-time data and historical data of the task index of the big data task;

an operation unit that performs an operation on first time window data including the real-time data and third time window data including the history data of a start time of a task cycle of the task index, respectively, wherein a current time of the real-time data is a termination time of the task cycle;

an offline operation result obtaining unit, configured to obtain an offline operation result of second time window data of the task index, where the second time window data is the history data of a predetermined time period in the task cycle; and

a combining unit, configured to combine the operation result of the first time window data, the operation result of the third time window data and the offline operation result of the second time window data of the task index to output the target operation result of the big data task.

According to another aspect of an embodiment of the present invention, there is provided an electronic device for big data calculation, including:

One or more processors; and

Storage means for storing one or more programs,

The one or more programs, when executed by the one or more processors, cause the one or more processors to implement the big data calculation method according to the above aspects of the embodiments of the present invention.

According to another aspect of an embodiment of the present invention, there is provided a computer-readable medium having stored thereon a computer program, characterized in that the program, when executed by a processor, implements the big data calculation method according to the above aspect of an embodiment of the present invention.

One embodiment of the above invention has the following advantages or benefits: in scenarios with long periods and large data volumes, calculation results can be obtained with both high performance and high accuracy; the calculation period and granularity of different tasks can be flexibly configured; calculation performance and accuracy are balanced; and multiple business scenarios can be served.

Further effects of the above non-conventional optional implementations are described below in connection with the specific embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is a schematic diagram of the main flow of a big data calculation method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a data import flow according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a three-segment data structure according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a specific example of a three-segment data structure in accordance with the present invention;

FIG. 5 is a schematic diagram of a query index flow in accordance with an embodiment of the present invention;

FIG. 6 is a schematic diagram of the major modules of a big data computing device according to an embodiment of the present invention;

FIG. 7 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;

FIG. 8 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.

Detailed Description

Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It should be noted that, in the technical solution of the present disclosure, the collection, updating, analysis, processing, use, transmission, storage and other handling of users' personal information all comply with the relevant laws and regulations, serve lawful purposes, and do not violate public order and good morals. Necessary measures are taken to protect users' personal information, to prevent illegal access to users' personal information data, and to safeguard users' personal information security, network security and national security.

To compile usage statistics for our products and services, we aggregate, analyze and use technically processed user data and share the processed statistics with third parties. Through secure encryption and other technical processing, we ensure that the recipient of the information cannot re-identify a specific individual.

Fig. 1 is a schematic diagram of a main flow of a big data calculation method according to an embodiment of the present invention. As shown in fig. 1, the big data calculation method includes steps S101 to S104.

Step S101: in response to receiving a trigger instruction of a big data task, acquire real-time data and historical data of a task index of the big data task.

After the real-time data and the historical data of the task index are obtained, the real-time data and the historical data can be preprocessed according to the task period and the operation precision of the big data task, so that the first time window data, the second time window data and the third time window data of the task index are obtained, wherein the time window length of each of the first time window data and the third time window data is smaller than or equal to the operation precision.

Further, the time window length of the first time window data is equal to the time window length of the third time window data.

Further, after the first time window data, the second time window data, and the third time window data are obtained, a dictionary table for storage may also be created such that the first time window data, the second time window data, and the third time window data are mapped to dictionary values of the dictionary table.

Step S102: separately perform operations on first time window data comprising the real-time data and third time window data comprising the historical data at the start time of the task period of the task index, wherein the current time of the real-time data is the end time of the task period.

In addition, before the operations are performed on the first time window data and the third time window data of the task index, de-duplication processing may be performed on each of them. In that case, performing the operations on the first time window data and the third time window data may include performing the operations on the de-duplicated first time window data and the de-duplicated third time window data.

Step S103: acquire an offline operation result of second time window data of the task index, wherein the second time window data is the historical data of a predetermined time period within the task period.

The predetermined time period is, for example, a remaining time period outside the first time window and the third time window in the task cycle.

Step S104: combine the operation result of the first time window data, the operation result of the third time window data and the offline operation result of the second time window data of the task index to output a target operation result of the big data task.

In addition, the operation results of the first time window data, the operation results of the third time window data and the offline operation results of the second time window data of the task index may be subjected to duplication removal processing, and the duplication removed operation results may be added.

In addition, the dictionary value corresponding to the data of the target calculation result may be output.

According to the big data computing method provided by the embodiment of the invention, in scenarios with long periods and large data volumes, calculation results can be obtained with both high performance and high accuracy; the calculation period and granularity of different tasks can be flexibly configured; calculation performance and accuracy are balanced; and multiple business scenarios can be served.

Specific examples of the big data calculation method according to the embodiment of the present invention are further described below with reference to fig. 2 to 5.

First, as shown in fig. 2, long-period data for a big data service, including real-time data and historical data, is imported from a data source. A long period is, for example, a calculation period of more than 30 days. The imported historical data is used for offline calculation and real-time calculation. The data may be imported using, for example, a message queue (MQ). Offline data refers to data within time T-n: for example, if today's date is T = 2023-05-22, the business data reflected in the data result extends only through the previous day (yesterday's data). Real-time data refers to data at or near the current time T.

Then, as shown in fig. 2, data processing (stream processing) is performed on the imported data, and the processed data is stored through dictionary table mapping. The dictionary table uses, for example, the String structure of Redis, in which key = index field value and value = dictionary mapping value. Redis (Remote Dictionary Server) is an open-source, log-type, key-value database written in ANSI C that supports networking, can run in memory and can also persist data. Take an address dictionary table as an example: key = No. 999 Dongdaming Road, Hongkou District, Shanghai, value = 1. The mapped value 1 is then used in place of the address when the address is queried, calculated on or stored, which addresses the problem of large data volumes and long strings occupying storage space and thus reduces the storage footprint.
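The dictionary table mapping described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: a plain Python dict stands in for the Redis String structure, the class name `DictionaryTable` and its methods are hypothetical, and the policy of assigning consecutive integers starting at 1 is an assumption chosen to match the value = 1 example.

```python
class DictionaryTable:
    """In-memory stand-in for the mapping key = field value, value = dictionary value."""

    def __init__(self):
        self._forward = {}  # field value -> dictionary value
        self._reverse = []  # index (dictionary value - 1) -> field value

    def encode(self, field_value: str) -> int:
        """Return the dictionary value for a field value, assigning the next integer if it is new."""
        if field_value not in self._forward:
            self._reverse.append(field_value)
            self._forward[field_value] = len(self._reverse)  # assumed: values start at 1
        return self._forward[field_value]

    def decode(self, dictionary_value: int) -> str:
        """Map a dictionary value back to the original field value."""
        return self._reverse[dictionary_value - 1]


table = DictionaryTable()
addr_id = table.encode("a long address string A")
same_id = table.encode("a long address string A")  # same string, same dictionary value
other_id = table.encode("a long address string B")
```

Downstream storage and calculation can then work on the small integers `addr_id` and `other_id` instead of the long strings, which is the space saving the paragraph describes.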

The imported data may be processed (stream processing) based on a predetermined calculation cycle to obtain three-segment data. As shown in fig. 3, the three-segment data includes: far real-time data (i.e., historical data that is about to expire), fixed time data (historical data of a fixed period), and near real-time data (new data generated in real time). "Near" means closest to the current statistical moment T, and "far" means farthest from the current statistical moment T.

According to different service requirements, different calculation time windows can be configured for the near real-time data, the fixed time data and the far real-time data. The start and end times of each segment's window are calculated from the configured granularity and period. The granularity is the minimum time length used in the calculation (the calculation accuracy) and can serve as a calculation time window. The granularity may be, for example, one month, one week, one day, one hour or one minute.

For example, if the current time is t0, the time windows may be configured as follows:

- The first calculation period window runs from t1 to t0. The data set in the corresponding period is D1 (near real-time data).

- The second calculation period window runs from t2 to t1. The data set in the corresponding period is D2 (fixed time data).

- The third calculation period window runs from t3 to t2. The data set in the corresponding period is D3 (far real-time data).

If the granularity is one day, t1 is 00:00 of the current day, so the time period from t1 to t0 is less than one day; t2 is 00:00 of the day following the furthest day of the whole cycle; and t3 is the moment on the furthest day of the cycle that has the same time of day as t0.
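The boundary rule for one-day granularity can be sketched in a few lines. This is a hedged illustration that applies the rule exactly as stated for t1, t2 and t3; the function name `three_segment_windows`, the `period_days` parameter and the choice of half-open intervals are assumptions, not part of the embodiment.

```python
from datetime import datetime, timedelta


def three_segment_windows(t0: datetime, period_days: int):
    """Return (t3, t2, t1, t0) for one-day granularity.

    near real-time data D1: from t1 to t0
    fixed time data     D2: from t2 to t1
    far real-time data  D3: from t3 to t2
    """
    midnight = dict(hour=0, minute=0, second=0, microsecond=0)
    t1 = t0.replace(**midnight)                        # 00:00 of the current day
    t3 = t0 - timedelta(days=period_days)              # same time of day on the furthest day
    t2 = t3.replace(**midnight) + timedelta(days=1)    # 00:00 of the day after the furthest day
    return t3, t2, t1, t0


t3, t2, t1, t0 = three_segment_windows(datetime(2023, 5, 22, 17, 15), period_days=365)
```

With this rule only D1 and D3 move as t0 advances within a day, while D2 always covers whole days, which is what allows its result to be precomputed offline.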

For example, suppose the query period is index data within one year and the current time is 17:15 on May 22, 2023. Statistics are then computed over the index data from 17:15 on May 22, 2022 to 17:15 on May 22, 2023. As shown in fig. 4, the corresponding three-segment data is obtained: far real-time data from 2022-05-22 17:15:00 to 2022-05-23 00:00:00, fixed time data from 2022-05-23 00:00:00 to 2023-05-21 12:00:00, and near real-time data from 2023-05-21 12:00:00 to 2023-05-22 17:15:00.

Then, based on the three-segment data obtained above, the following steps (1) to (3) are performed.

(1) First stage real-time calculation step

The obtained near real-time data is calculated in real time. The first-segment data is, for example, the real-time data set D1 from 00:00 of the current day to the current time t. A sliding window algorithm is used to perform real-time streaming calculation on the set D1.

In addition, the data to be calculated in the set D1 may be deduplicated. Deduplication relies on the inherent deduplication property of the Redis sorted set (zset), and the calculation results are stored in the zset. Deduplicating the set D1 reduces the storage space required for the data and saves computing power in the later step that deduplicates and adds the calculation results of the three segments.

The first-segment real-time calculation has a short period, fine granularity and fast computation. It can be performed in real time each time a data calculation service (such as a data query) is executed, achieving both efficient calculation and high precision.
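The combination of sliding-window streaming and sorted-set deduplication can be sketched as follows. This is an illustrative stand-in only: a Python dict from member to score imitates the way a sorted set keeps one entry per member, and the class name `NearRealtimeWindow`, its methods and the numeric timestamps are all hypothetical.

```python
class NearRealtimeWindow:
    """Keeps the deduplicated members seen since the window start (for example 00:00 today)."""

    def __init__(self, window_start: float):
        self.window_start = window_start
        self._scores = {}  # member -> latest timestamp, one entry per member

    def add(self, member: int, timestamp: float) -> None:
        """Record an event; events older than the window start are ignored."""
        if timestamp >= self.window_start:
            # Re-adding a member only refreshes its score, so duplicates never accumulate.
            self._scores[member] = max(timestamp, self._scores.get(member, timestamp))

    def slide(self, new_start: float) -> None:
        """Advance the window start and drop members whose score fell out of the window."""
        self.window_start = new_start
        self._scores = {m: s for m, s in self._scores.items() if s >= new_start}

    def members(self) -> set:
        return set(self._scores)


window = NearRealtimeWindow(window_start=100.0)
for member, ts in [(1, 101.0), (2, 105.0), (1, 110.0), (3, 90.0)]:
    window.add(member, ts)
near_result = window.members()  # member 1 appears once, member 3 predates the window

window.slide(108.0)
after_slide = window.members()  # only members seen at or after 108.0 remain
```

Because each member is stored once, the later cross-segment merge receives an already deduplicated set from this segment.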

(2) Second stage offline computing step

The second-segment offline calculation step calculates the historical data offline. For example, the second segment calculates the historical data using scheduled offline batch runs and stores the calculation results.

The second-segment data is, for example, all data D2 from the day after the farthest day of the calculation period through the most recent day. The calculation result is stored in a zset of Redis. The dictionary values obtained through the dictionary table are saved to Redis, for example in BitMap format (a bitmap data structure that can be used to save space), with key = fixed time data index name + grouping field dictionary value and value = BitMap. For example, for the number of applicants with the same company address within one year: key = middle_company_addr_cnt_360#{company address dictionary value}, value = ['#{applicant ID card number dictionary value 1}', '#{applicant ID card number dictionary value 2}', …].

The second-segment calculation has a long period, coarser granularity and a longer calculation time, so its offline calculation result is stored. Each time a data calculation service is executed, for example when the result of this segment is queried, no real-time calculation is performed; the stored offline calculation result is used directly, which improves calculation performance.
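Storing the precomputed second-segment result as a bitmap of dictionary values can be sketched as follows. This is a minimal sketch under assumptions: a Python integer serves as the bitmap instead of a Redis structure, the `Bitmap` class and the sample batch are hypothetical, and only the key prefix `middle_company_addr_cnt_360` comes from the example above.

```python
class Bitmap:
    """Bitmap of dictionary values backed by a Python int (bit i set means value i is present)."""

    def __init__(self):
        self._bits = 0

    def add(self, dictionary_value: int) -> None:
        self._bits |= 1 << dictionary_value

    def values(self) -> list:
        """Expand the bitmap back into a sorted list of dictionary values."""
        return [i for i in range(self._bits.bit_length()) if (self._bits >> i) & 1]

    def count(self) -> int:
        return bin(self._bits).count("1")


# Hypothetical offline batch run: precompute once, then serve queries from the stored result.
offline_store = {}
batch = [(7, 3), (7, 6), (7, 3), (9, 1)]  # (company address dictionary value, applicant dictionary value)
for company, applicant in batch:
    key = f"middle_company_addr_cnt_360#{company}"
    offline_store.setdefault(key, Bitmap()).add(applicant)

stored = offline_store["middle_company_addr_cnt_360#7"]
```

A query for company 7 then reads `stored.values()` without touching the raw records, which mirrors the point that the second segment is never recomputed at query time.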

(3) Third stage far real-time calculation step

The third segment uses real-time streaming calculation. The data calculated is all data D3 from the furthest moment of the period window to 00:00 at the end of the furthest day of the period window.

For example, the offline data is traversed to obtain the data that will expire within 24 hours; this data is processed, mapped to dictionary values through the dictionary table, and stored in a Redis zset structure, where key = far real-time data index name + grouping field value, score = timestamp at which the data was generated, and value = dictionary value of the field being deduplicated and counted. For example, when querying the number of applicants who filled in a given company within 365 days, the query result for the far real-time data is: key = far_company_addr_cnt_365#{company address dictionary value}, score = timestamp, value = ['#{applicant ID card number dictionary value 1}', '#{applicant ID card number dictionary value 2}', …].

Historical data is calculated in real time based on a sliding window algorithm, and the calculation has the characteristics of short period, small granularity and high efficiency. Similar to the first-stage real-time calculation step, the real-time calculation is performed every time the data calculation service (e.g., data query) is performed, thereby achieving the purpose of efficient calculation and high accuracy.

According to the three-section calculation, the calculation of the first section and the third section ensures high accuracy of calculation, and the calculation of the second section ensures high performance of calculation.

After the three calculation steps are executed, the calculation results are post-processed. The post-processing includes deduplicating and adding the calculation results of the segments and outputting the final result. For example, if the calculation results of the three segments are R0, R1 and R2, the final statistical result is R = sum{distinct{R0, R1, R2}}.

After the final result is obtained, other post-processing such as freeing up memory space, saving intermediate process results, etc. may also be performed.

The above-described calculation process is described below by way of a more specific example.

For example, consider querying the number of applicants for a company within 365 days. Fig. 5 shows the flow of querying an index. As shown in fig. 5, the index query is triggered first, and the real-time data and historical data of the index are obtained. The near real-time data, the fixed time data and the far real-time data are then obtained and stored through aggregation calculation on the real-time data and the historical data.

The query process is as follows: the data for each segment is obtained from Redis, where the near real-time data and the fixed time data are converted from BitMap format into a set of deduplicated details. For the far real-time data, the Redis zrangebyscore command is used with the current timestamp to query the detail set within the specified time interval. For example, when querying an index within one year, the expiration time is the current date and time minus 365 days: if the current time is 2023-05-22 17:15:28, the expiration time is 2022-05-22 17:15:28. zrangebyscore (which returns the list of members of a sorted set whose scores fall within a specified interval) is used to return the set of deduplicated details within the expiration time range.
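The score-range query on the far real-time data can be illustrated without a Redis server. The sketch below is a pure-Python stand-in for the behaviour of ZRANGEBYSCORE (members whose score lies between a minimum and a maximum, ordered by score); the helper `zrangebyscore`, the sample members and their timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def zrangebyscore(zset: dict, min_score: float, max_score: float) -> list:
    """Stand-in for ZRANGEBYSCORE: members with min_score <= score <= max_score, ordered by score."""
    hits = [(score, member) for member, score in zset.items() if min_score <= score <= max_score]
    return [member for score, member in sorted(hits)]


now = datetime(2023, 5, 22, 17, 15, 28, tzinfo=timezone.utc)
expiry = now - timedelta(days=365)  # start of the one-year query period


def ts(hour: int, minute: int) -> float:
    """Timestamp on the furthest day of the period (hypothetical sample data)."""
    return datetime(2022, 5, 22, hour, minute, tzinfo=timezone.utc).timestamp()


# member (applicant dictionary value) -> score (timestamp at which the record was generated)
far_zset = {
    1: ts(18, 0),   # still inside the query period
    3: ts(23, 59),  # still inside the query period
    5: ts(9, 30),   # generated before the expiration time, so excluded
}
far_members = zrangebyscore(far_zset, expiry.timestamp(), now.timestamp())
```

Filtering by score at query time is what lets the far segment expire records continuously instead of waiting for the next offline run.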

Query results: the first-segment calculation (query) result is ['#{applicant ID card number dictionary value 1}', '#{applicant ID card number dictionary value 2}']; the second-segment calculation result is ['#{applicant ID card number dictionary value 3}', '#{applicant ID card number dictionary value 6}']; the third-segment calculation result is ['#{applicant ID card number dictionary value 1}', '#{applicant ID card number dictionary value 3}']. The final statistic is 4, namely ID card number dictionary values 1, 2, 3 and 6.
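The deduplicate-then-add step applied to these query results can be sketched as a set union followed by a count. The function name `combine_segments` is hypothetical, and the integers stand for the ID card number dictionary values in the example.

```python
def combine_segments(*segment_results) -> int:
    """Deduplicate across all segment results and count the distinct dictionary values."""
    distinct = set()
    for result in segment_results:
        distinct.update(result)
    return len(distinct)


near_result = [1, 2]   # first segment (near real-time data)
fixed_result = [3, 6]  # second segment (stored offline result)
far_result = [1, 3]    # third segment (far real-time data)

total = combine_segments(near_result, fixed_result, far_result)
```

Simply adding the three segment sizes would give 6; deduplicating across segments first is what yields the correct count of 4, because values 1 and 3 each appear in two segments.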

Further, a stress test was performed on the big data calculation method of the above specific example. The stress test environment was a Linux machine with 4 cores and 8 GB of memory, and the stress test results are shown in Table 1 below.

TABLE 1

Storage database | Data volume | Call volume | Calculation cycle | Response time | tps | tp99
redis | 8677567 | 176532 | 365 days | 13 ms | 726 | 35 ms

Here tps is the number of transactions per second, and tp99 is a percentile performance metric: 99% of the response times in a data set are less than or equal to this value. As shown in Table 1, the three-segment calculation method was used to perform a 365-day long-period deduplication calculation on 8677567 data records; the average response time was within 13 ms and the calculation results were accurate. This fully satisfies high-performance, high-precision scenarios.
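For readers unfamiliar with the tp99 metric, the definition above can be made concrete with a nearest-rank percentile. This is only one common way to compute a percentile and is an assumption here; the source does not say how its test tool computed tp99, and the sample response times below are invented for illustration.

```python
import math


def tp(samples, percentile: int) -> float:
    """Nearest-rank percentile: the smallest sample such that at least percentile% of samples are <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    return ordered[rank - 1]


response_times_ms = list(range(1, 101))  # hypothetical samples: 1 ms, 2 ms, ..., 100 ms
tp99 = tp(response_times_ms, 99)
tp50 = tp(response_times_ms, 50)
```

In this sample tp99 differs clearly from the median tp50, which is why tp99 is read as a tail-latency figure rather than a typical response time.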

According to the specific example of the big data calculation method, in scenarios with long periods and large data volumes, calculation results can be obtained with both high performance and high accuracy; the calculation period and granularity of different tasks can be flexibly configured; calculation performance and accuracy are balanced; and multiple business scenarios can be served.

Fig. 6 is a schematic diagram of main modules of a big data computing device according to an embodiment of the present invention. As shown in fig. 6, the big data calculation device 200 includes: a data acquisition unit 201, an operation unit 202, an offline operation result acquisition unit 203, and a combination unit 204.

The data acquisition unit 201 acquires real-time data and history data of a task index of a big data task in response to receiving a trigger instruction of the big data task.

After acquiring the real-time data and the history data of the task index, the data acquisition unit 201 may further perform preprocessing on the real-time data and the history data according to the task period and the operation precision of the big data task to obtain the first time window data, the second time window data, and the third time window data of the task index, wherein a time window length of each of the first time window data and the third time window data is less than or equal to the operation precision.

Further, after obtaining the first time window data, the second time window data, and the third time window data, the data obtaining unit 201 may further create a dictionary table for storage such that the first time window data, the second time window data, and the third time window data are mapped to dictionary values of the dictionary table.

The operation unit 202 performs an operation on the first time window data including the real-time data and the third time window data including the history data of the start time of the task period, where the current time of the real-time data is the end time of the task period.

Furthermore, before the first time window data and the third time window data of the task index are respectively operated, the operation unit 202 may further perform a deduplication process on the first time window data and the third time window data of the task index, respectively. The operation unit 202 performs an operation on the first time window data and the third time window data after the deduplication process, respectively.

The offline operation result obtaining unit 203 obtains an offline operation result of second time window data of the task index, where the second time window data is the historical data of a predetermined time period in the task cycle.

The combining unit 204 combines the operation result of the first time window data, the operation result of the third time window data, and the offline operation result of the second time window data of the task index to output the target operation result of the big data task.

In addition, the combining unit 204 may further perform a deduplication process on the calculation result of the first time window data, the calculation result of the third time window data, and the offline calculation result of the second time window data of the task index, and add the calculation results after deduplication.

In addition, the combining unit 204 may output the dictionary value corresponding to the data of the target calculation result.

According to the big data computing device provided by the embodiment of the invention, in scenarios with long periods and large data volumes, calculation results can be obtained with both high performance and high accuracy; the calculation period and granularity of different tasks can be flexibly configured; calculation performance and accuracy are balanced; and multiple business scenarios can be served.

Fig. 7 illustrates an exemplary system architecture 700 to which the big data computing method or big data computing apparatus of embodiments of the present invention may be applied.

As shown in fig. 7, a system architecture 700 may include terminal devices 701, 702, 703, a network 704, and a server 705. The network 704 is the medium used to provide communication links between the terminal devices 701, 702, 703 and the server 705. The network 704 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

A user may interact with the server 705 via the network 704 using the terminal devices 701, 702, 703 to receive or send messages or the like. Various communication client applications such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 701, 702, 703.

The terminal devices 701, 702, 703 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.

The server 705 may be a server providing various services, such as a background management server (by way of example only) providing support for shopping-type websites browsed by users using the terminal devices 701, 702, 703. The background management server may analyze and process received data such as a product information query request, and feed back the processing result (e.g., target push information or product information, by way of example only) to the terminal device.

It should be noted that, the big data computing method provided by the embodiment of the present invention is generally executed by the server 705, and accordingly, the big data computing device is generally disposed in the server 705.

It should be understood that the number of terminal devices, networks and servers in fig. 7 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to FIG. 8, there is illustrated a schematic diagram of a computer system 800 suitable for use in implementing an embodiment of the present invention. The terminal device shown in fig. 8 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.

As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU) 801 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data required for the operation of the system 800 are also stored. The CPU 801, ROM 802, and RAM 803 are connected to each other by a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.

The following components are connected to the I/O interface 805: an input section 806 including a keyboard, a mouse, etc.; an output section 807 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 808 including a hard disk or the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as needed. A removable medium 811 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 810 as needed so that a computer program read therefrom is installed into the storage section 808 as needed.

In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 809, and/or installed from the removable medium 811. The above-described functions defined in the system of the present invention are performed when the computer program is executed by the Central Processing Unit (CPU) 801.

The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units involved in the embodiments of the present invention may be implemented in software or in hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including a data acquisition unit, an operation unit, an offline operation result acquisition unit, and a combination unit. In some cases the names of these units do not limit the units themselves; for example, the offline operation result acquisition unit may also be described as "a unit that acquires an offline operation result of the second time window data of the task index".

As another aspect, the present invention also provides a computer-readable medium, which may be contained in the device described in the above embodiments, or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform the following: in response to receiving a trigger instruction of a big data task, acquiring real-time data and historical data of a task index of the big data task; respectively performing operations on first time window data of the task index, which comprises the real-time data, and third time window data of the task index, which comprises the historical data at the period start time of a task period, wherein the current time of the real-time data is the end time of the task period; acquiring an offline operation result of second time window data of the task index, wherein the second time window data is the historical data of a preset time period within the task period; and combining the operation result of the first time window data, the operation result of the third time window data, and the offline operation result of the second time window data of the task index to output a target operation result of the big data task.
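By way of illustration only, the steps carried by the program may be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: it assumes that the task index is an additive quantity (a sum), that the offline operation results are stored as per-day totals, and that the second time window is aligned to whole calendar days. The names `split_period`, `online_sum`, `offline_sum`, `compute`, `events`, and `daily_totals` are hypothetical and do not appear in the embodiments.

```python
from datetime import date, datetime, timedelta

DAY = timedelta(days=1)

def split_period(start, end):
    """Split the task period [start, end) into (third, second, first) windows.

    third:  from the period start to the next midnight (historical data)
    second: the whole days in the middle (covered by offline results)
    first:  from the last midnight to the period end (includes real-time data)
    """
    start_floor = datetime(start.year, start.month, start.day)
    mid_start = start if start == start_floor else start_floor + DAY
    mid_end = datetime(end.year, end.month, end.day)
    if mid_start >= mid_end:
        # No whole day inside the period: operate on everything online.
        return (start, start), (start, start), (start, end)
    return (start, mid_start), (mid_start, mid_end), (mid_end, end)

def online_sum(events, window):
    """Operate directly on the raw (timestamp, value) records of one window."""
    lo, hi = window
    return sum(value for ts, value in events if lo <= ts < hi)

def offline_sum(daily_totals, window):
    """Look up precomputed per-day totals instead of rescanning raw data."""
    lo, hi = window
    total = 0
    day = lo
    while day < hi:
        total += daily_totals[day.date()]
        day += DAY
    return total

def compute(events, daily_totals, start, end):
    """Combine the three partial results into the target result."""
    third, second, first = split_period(start, end)
    return (online_sum(events, third)
            + offline_sum(daily_totals, second)
            + online_sum(events, first))

# Hypothetical sample data: (timestamp, value) records of one task index.
events = [
    (datetime(2024, 1, 1, 12), 100),  # before the period start, ignored
    (datetime(2024, 1, 1, 20), 1),    # third time window
    (datetime(2024, 1, 2, 9), 2),     # second time window
    (datetime(2024, 1, 3, 23), 3),    # second time window
    (datetime(2024, 1, 4, 5), 4),     # first time window
    (datetime(2024, 1, 4, 7), 50),    # after the period end, ignored
]
# Offline results assumed to have been produced earlier by a scheduled job.
daily_totals = {date(2024, 1, 2): 2, date(2024, 1, 3): 3}

start = datetime(2024, 1, 1, 18)
end = datetime(2024, 1, 4, 6)
result = compute(events, daily_totals, start, end)
print(result)
```

In this sketch only the records at the two ends of the task period are scanned at trigger time, while the bulk of the period is served from the precomputed totals. A non-additive task index (for example, a distinct count) would require a different combination step.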

According to the technical solution provided by the embodiment of the present invention, high performance and high accuracy of calculation results can be achieved in scenarios with long periods and large data volumes. The calculation period and granularity of different tasks can be flexibly configured, so that calculation performance and accuracy are balanced and multiple service scenarios are supported.

The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims

1. A big data calculation method, characterized by comprising:

2. The big data calculation method according to claim 1, wherein,

The preset time period is the remaining time period of the task period outside the first time window and the third time window.

3. The big data calculation method according to claim 1 or 2, wherein,

After acquiring the real-time data and the history data of the task index, further comprising:

4. The big data calculation method of claim 3, wherein,

The time window length of the first time window data of the task index is equal to the time window length of the third time window data.

5. The big data calculation method of claim 3, wherein,

After obtaining the first time window data, the second time window data, and the third time window data of the task index, further comprising: creating a dictionary table for storage, and mapping the first time window data, the second time window data, and the third time window data of the task index to dictionary values of the dictionary table,

6. The big data calculation method according to claim 1, wherein,

Before the first time window data and the third time window data of the task index are respectively operated, the method further comprises: performing de-duplication processing on the first time window data and the third time window data of the task index respectively,

7. The big data calculation method according to claim 1, wherein,

Combining the operation results of the time window data comprises:

8. A big data computing device, comprising:

9. An electronic device for big data calculation, comprising:

One or more processors; and

Storage means for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the big data calculation method as recited in any one of claims 1-7.

10. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the big data calculation method according to any of claims 1-7.