CN111723114B

CN111723114B - Stream statistics method and device and electronic equipment

Info

Publication number: CN111723114B
Application number: CN202010585418.9A
Authority: CN
Inventors: 赵文越; 徐端丰; 章孜谦; 朱敏
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2020-06-24
Filing date: 2020-06-24
Publication date: 2023-07-25
Anticipated expiration: 2040-06-24
Also published as: CN111723114A

Abstract

The disclosure provides a stream statistics method, a stream statistics device, and electronic equipment. The method comprises: setting a variable scale; determining a statistical starting point and a statistical ending point for a specified variable on the variable scale; determining statistical points, which include at least the statistical starting point and the statistical ending point; and determining a statistical result of stream data for the specified variable based on the statistical results at the statistical points; wherein the statistical starting point and the statistical ending point are determined based on the processing speed of the data processing stages of the stream data.

Description

Stream statistics method and device and electronic equipment

Technical Field

The present disclosure relates to the field of data processing technologies, and in particular, to a method, an apparatus, and an electronic device for stream statistics.

Background

For shopping festivals or everyday promotional campaigns, it is often desirable to track changes in commodity transactions over a specified period of time as efficiently and accurately as possible. Stream data processing in the related art only covers collecting and aggregating stream data based on time windows; stream statistics and their presentation remain largely unaddressed. For example, when computing a daily running total, the related art temporarily stores the stream data of each time window, accumulates it in a loop, time window by time window, and displays the result with a delay of several minutes.

In the process of implementing the disclosed concept, the inventors found that the related art has at least the following problem: computing in a loop, time window by time window, results in high latency.

Disclosure of Invention

In view of this, the present disclosure provides a stream statistics method, apparatus, and electronic device that help to mitigate the problem of high latency.

One aspect of the present disclosure provides a stream statistics method, in which stream data passes through data processing stages to complete a data processing procedure, the method comprising: setting a variable scale; determining a statistical starting point and a statistical ending point for a specified variable on the variable scale; determining statistical points, which include at least the statistical starting point and the statistical ending point; and determining a statistical result of the stream data for the specified variable based on the statistical results at the statistical points; wherein the statistical starting point and the statistical ending point are determined based on the processing speed of the data processing stages of the stream data.

The stream statistics method provided by the embodiments of the disclosure introduces the concept of a variable scale. Multiple statistical points are determined on the variable scale, and the stream statistics for the whole series of statistical points are computed in a single cut, without looping over variable windows one by one. This effectively improves statistical efficiency and reduces latency.

One aspect of the present disclosure provides a stream statistics apparatus comprising a scale setting module, a start-stop point determining module, a statistical point determining module, and a statistics module. The scale setting module is used for setting a variable scale. The start-stop point determining module is used for determining a statistical starting point and a statistical ending point for a specified variable on the variable scale, wherein the statistical starting point and the statistical ending point are determined based on the processing speed of the data processing stages of the stream data. The statistical point determining module is used for determining statistical points, which include at least the statistical starting point and the statistical ending point. The statistics module is used for determining the statistical result of the stream data for the specified variable based on the statistical results at the statistical points.

In the stream statistics apparatus provided by the embodiments of the disclosure, the variable scale is set by the scale setting module, so that stream statistics for a series of statistical points that satisfy the shortest aging are computed in a single cut based on the variable scale. The data precision can be specified, and both the committed aging and the precision are explicit and controllable.

Another aspect of the present disclosure provides an electronic device comprising one or more processors and a storage device for storing executable instructions that, when executed by the processors, implement the method as described above.

Another aspect of the present disclosure provides a computer-readable storage medium storing computer-executable instructions that, when executed, are configured to implement a method as described above.

Another aspect of the present disclosure provides a computer program comprising computer executable instructions which, when executed, are adapted to carry out the method as described above.

Drawings

The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments thereof with reference to the accompanying drawings in which:

FIG. 1 schematically illustrates an application scenario of a stream statistics method, apparatus, and electronic device according to an embodiment of the present disclosure;

FIG. 2 schematically illustrates an exemplary system architecture to which the stream statistics method, apparatus, and electronic device may be applied, according to an embodiment of the present disclosure;

FIG. 3 schematically illustrates a flow chart of a stream statistics method according to an embodiment of the present disclosure;

FIG. 4 schematically illustrates a flow chart for determining the processing aging according to an embodiment of the disclosure;

FIG. 5 schematically illustrates a schematic diagram of data processing stages according to an embodiment of the present disclosure;

FIG. 6 schematically illustrates a schematic diagram of processing durations of data processing stages according to an embodiment of the disclosure;

FIG. 7 schematically illustrates a logic diagram of a stream statistics method according to an embodiment of the present disclosure;

FIG. 8 schematically illustrates a schematic diagram of a stream statistics method according to an embodiment of the present disclosure;

FIG. 9 schematically illustrates a schematic diagram of a stream statistics method according to another embodiment of the present disclosure;

FIG. 10 schematically illustrates a structural schematic of a stream statistics apparatus according to an embodiment of the present disclosure; and

FIG. 11 schematically illustrates a block diagram of an electronic device according to an embodiment of the disclosure.

Detailed Description

Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. One or more embodiments may be practiced without these specific details. In the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.

The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and/or the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.

All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art, and the terms used herein should be interpreted as having a meaning consistent with the context of this specification and not in an idealized or overly formal sense.

Where an expression like "at least one of A, B, and C, etc." is used, it should generally be interpreted in accordance with the meaning commonly understood by those skilled in the art (e.g., "a system having at least one of A, B, and C" shall include, but not be limited to, a system having A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together). The terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include one or more such features.

The embodiments of the disclosure provide a stream statistics method, a stream statistics device, and electronic equipment. The stream statistics method comprises a start-stop point determining process and a statistical process. In the start-stop point determining process, a variable scale is first set; then a statistical starting point and a statistical ending point for a specified variable on the variable scale are determined; then statistical points are determined, which include at least the statistical starting point and the statistical ending point. The statistical starting point and the statistical ending point are determined based on the processing speed of the data processing stages of the stream data. After the start-stop point determining process is completed, the statistical process begins, in which the statistical result of the stream data for the specified variable is determined based on the statistical results at the statistical points. The embodiments of the disclosure determine multiple statistical points on the variable scale and compute the stream statistics for the whole series of statistical points in a single cut, without looping over variable windows one by one, thereby effectively improving statistical efficiency and reducing latency.

FIG. 1 schematically illustrates an application scenario of a stream statistics method, apparatus, and electronic device according to an embodiment of the present disclosure.

As shown in FIG. 1, for shopping festivals or everyday promotional campaigns, such as the Double 11 shopping festival, the 618 shopping festival, home appliance subsidy campaigns, and merchant promotional campaigns, users want to determine changes in merchandise transactions efficiently and accurately. For example, a platform launches the XX shopping festival, and platform operators want to know in real time the trading status (such as the number of deals and the deal amount) of various commodities, such as electronic products, clothes, and travel products. Platform operators may also wish to learn about the trading of more finely categorized goods; electronic products, for example, may include cell phones, computers, and appliances. As another example, a platform operator may wish to know the trading status during the period from aa:bb to cc:dd, where the values aa:bb and cc:dd may be adjusted by the user in real time. The values of X, A, B, and C in FIG. 1 can change dynamically, and a visual chart can further be generated to make the results more intuitive. Based on the real-time statistical results, platform operators can allocate resources, assess the running state of the platform, and so on, which helps to improve the platform's operating performance.

FIG. 2 schematically illustrates an exemplary system architecture to which the stream statistics method, apparatus, and electronic device may be applied, according to an embodiment of the present disclosure. It should be noted that FIG. 2 is only an example of a system architecture to which embodiments of the present disclosure may be applied, intended to help those skilled in the art understand the technical content of the present disclosure; it does not mean that embodiments of the present disclosure cannot be used in other devices, systems, environments, or scenarios.

As shown in fig. 2, the system architecture 200 according to this embodiment may include terminal devices 201, 202, 203, a network 204, and a server 205. The network 204 may include a number of gateways, hubs, network cables, etc. to provide a medium for communication links between the terminal devices 201, 202, 203 and the server 205. The network 204 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

The user can use the terminal devices 201, 202, 203 to interact with other terminal devices and the server 205 through the network 204 in order to receive or transmit information, for example to transmit an association relation request, transmit an information request, or receive a processing result. Various communication client applications may be installed on the terminal devices 201, 202, 203, such as banking applications, web browser applications, search applications, office applications, instant messaging tools, mailbox clients, social platform software, and the like (these are merely examples).

Terminal devices 201, 202, 203 include, but are not limited to, self-service terminals, smartphones, virtual reality devices, augmented reality devices, tablet computers, laptop portable computers, and the like.

The server 205 may receive requests, such as commodity information requests, real-time statistics requests, and shopping requests, from the terminal devices 201, 202, 203. The server 205 may also obtain stream data from the terminals or from other servers (e.g., information platforms, database servers, cloud databases) and compute statistics on the stream data. For example, the server 205 may be a background management server, a server cluster, or the like. The background management server can analyze and process the received service requests, information requests, and the like, and feed the processing results (such as the requested statistical results) back to the terminal devices.

It should be noted that the stream statistics method provided by the embodiments of the present disclosure may generally be performed by the server 205. The method may also be performed by a server or a server cluster that is different from the server 205 and is capable of communicating with the terminal devices 201, 202, 203 and/or the server 205. It should be understood that the numbers of terminal devices, networks, and servers are merely illustrative. There may be any number of terminal devices, networks, and servers, as required by the implementation.

FIG. 3 schematically illustrates a flow chart of a stream statistics method according to an embodiment of the present disclosure.

As shown in FIG. 3, the stream statistics method includes operations S301 to S307.

In operation S301, a variable scale is set.

In the present embodiment, the parameter of the variable scale may be any physical parameter, such as a time parameter, length parameter, volume parameter, weight parameter, flow parameter, or electric quantity parameter. For example, the variable scale may be a time scale, a length scale, a volume scale, a flow scale, or the like. The choice of variable scale depends on the variable targeted by the particular application scenario. The method can thus be generalized from the actual scenario to other metrics: it applies to any scenario in which a variable scale is set and the statistics corresponding to a series of variable points are computed in a single mapping.

The time scale will be described below as an example. The time scale may include parameters such as scale length and scale accuracy. For example, the total length of the time scale is in hours, the accuracy is in seconds, etc. When using the time scale, the time scale may be initialized first.

In one embodiment, the variable scale may be initialized based on a preset accuracy and the total length of the variable scale. Taking the time scale as an example, both the accuracy and the total length can be set flexibly for different application scenarios, so that the requirements of various scenarios can be met. In scenarios with high accuracy requirements, the reduction in latency is even more pronounced. Note that the total length of the time scale may be determined by the processing duration of the data processing stages; for example, the total length of the time scale is an integer multiple of the processing duration.

In one embodiment, the total length of the time scale (e.g., 1 minute, 5 minutes, 10 minutes, half an hour, 1 hour, 5 hours, 1 day, 1 week, etc.), the accuracy (e.g., 1, 3, 5, 10, 30, 50, or 100, etc.), and the accuracy unit (e.g., seconds) may be determined first. An initialization parameter table (for example, Table 1 below) is then designed, containing fields such as the time scale accuracy a, the time accuracy unit, and the time scale value. The graduations of the time scale may be equally spaced or variable; for example, and without limitation, the spacing in one segment may differ from that in another segment.

An exemplary description of the initialization algorithm follows.

When Table 1, the time scale coefficient table, contains no time scale value, a record is inserted. The record may include the following parameters: the zero time scale value, the time scale accuracy, and the time accuracy unit.

When the maximum time scale value in Table 1 is smaller than the total length of the time scale, the current maximum is increased by one scale unit to obtain a new time scale value, which is then inserted into Table 1. This step is repeated until the maximum time scale value in Table 1 is greater than or equal to the total length of the time scale.

For example, when the total length of the time scale is 24 hours, the input time scale accuracy a is 10, and the time accuracy unit is seconds (s), so that the time scale accuracy is 10 seconds, the time scale coefficient table shown in Table 1 is obtained.

TABLE 1

Time scale value	Time scale accuracy	Time accuracy unit
00:00:00	10	s
00:00:10	10	s
00:00:20	10	s
...	...	...
23:59:40	10	s
23:59:50	10	s

It should be noted that, according to actual needs, variable values with equal spacing or unequal spacing may be used as the initialized variable scale parameter table.
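The initialization steps described above can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function names `init_time_scale` and `fmt` are hypothetical, scale values are held as integer seconds, and the loop is assumed to stop before the total length is reached so that the result matches the rows of Table 1 (last value 23:59:50).

```python
def init_time_scale(total_seconds, accuracy, unit="s"):
    """Build an equally spaced time scale table as (value, accuracy, unit) rows."""
    table = []
    # Step 1: if the table holds no scale value yet, insert the zero-scale record.
    if not table:
        table.append((0, accuracy, unit))
    # Step 2: keep adding one scale unit to the current maximum.
    # Assumption: stop before reaching the total length, as in Table 1.
    while table[-1][0] + accuracy < total_seconds:
        table.append((table[-1][0] + accuracy, accuracy, unit))
    return table


def fmt(seconds):
    """Render a scale value in seconds as HH:MM:SS."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


# 24-hour scale with 10-second accuracy, as in the example for Table 1.
scale = init_time_scale(24 * 3600, 10)
```

Here `fmt(scale[1][0])` gives the second row of Table 1 and `fmt(scale[-1][0])` gives the last row. A scale with unequal spacing would need a different stepping rule in place of the fixed `accuracy` increment.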

In operation S303, a statistical start point and a statistical end point for a specified variable on the variable scale are determined.

The stream data passes through data processing stages to complete the data processing procedure. The statistical starting point and the statistical ending point are determined based on processing characteristics of these stages, such as processing duration and processing speed. The processing speed of a data processing stage may be determined by comparing its processing duration with a processing duration threshold: if the processing duration is greater than the threshold, the stage is considered slow; if it is less than the threshold, the stage is considered fast.

In one embodiment, the specified variable is a time variable. Determining the statistical starting point and the statistical ending point for the specified variable on the variable scale may include the following operation: a statistical starting point and a statistical ending point for the current point in time are determined based on the processing aging and/or the duration information of the processing units, each processing unit comprising at least one data processing stage.

FIG. 4 schematically illustrates a flow chart for determining the processing aging according to an embodiment of the disclosure.

As shown in FIG. 4, the duration information of the processing units may be determined first, and then the processing aging may be determined.

Specifically, the duration information of the processing units may be determined by identifying how many data processing stages there are (including which stage the stream statistics stage is) and estimating the time consumption of each stage. For example, the duration information is determined as follows. First, the specified number of data processing stages through which the stream data must pass is determined. Then, the processing duration of each data processing stage is determined. Finally, at least some of these data processing stages are merged, based on a preset rule and the processing duration of each stage, to obtain at least one processing unit and its duration information.

Fig. 5 schematically shows a schematic diagram of a data processing stage according to an embodiment of the present disclosure.

The stream data processing procedure is examined and divided into several processing stages that are relatively independent in technology and function. As shown in FIG. 5, after examining each function of a system's stream data processing, the processing may be roughly divided into a real-time data acquisition stage A, a per-minute collection processing stage B, a Spark SQL platform processing stage C, a file transmission stage D, an application-side stream statistics stage E, and a foreground server processing and presentation request-response stage F.

Fig. 6 schematically illustrates a schematic diagram of processing durations of data processing stages according to an embodiment of the disclosure.

As shown in FIG. 6, the processing durations may be initially determined through investigation, statistics, and the like: the real-time data acquisition stage A takes about several seconds; the per-minute collection processing stage B takes about 1 minute; the Spark SQL platform processing stage C takes about 2 minutes 45 seconds (2 min 45 s); the file transmission stage D takes less than 2 minutes; the application-side stream statistics stage E takes less than 1 minute; and the foreground server processing and presentation request-response stage F takes on the order of seconds.

Determining the processing aging may include the following. First, time slice duration information is determined based on the at least one processing unit. Then, the processing aging is determined based on the time slice duration information and the number of processing units.

For example, the data processing stage with the largest estimated time consumption is identified, and its time consumption is denoted Tmax. The processing time of the i-th stage is denoted Ti, where i is a positive integer greater than or equal to 1. The amount of data that each stage processes in one round is measured as a duration and is denoted as a single time slice L. In other words, the stream data includes at least one round of data, and each round contains the amount of data processed in one pass by the data processing stage with the longest processing duration.

The throughput of the i-th stage must remain complete within a single time slice, and the processing of the i-th stage depends on the processing of the (i-1)-th stage. Therefore, if the i-th stage takes longer than the (i-1)-th stage, the time slice must satisfy L mod T(i-1) = 0; if the i-th stage takes less time than the (i-1)-th stage, the time slice must satisfy L >= T(i-1). Here mod is the remainder function. For example, the time consumption Tb of the per-minute collection processing stage B is greater than the time consumption Ta of the real-time data acquisition stage A, so L mod Ta = 0 must be satisfied. The time consumption Tc of the Spark SQL platform processing stage C is greater than Tb, so L mod Tb = 0 must be satisfied. The time consumption Td of the file transmission stage D is less than Tc, so L >= Tc must be satisfied; similarly, L >= Td and L >= Te. Since Tb is about 1 minute and Tc is about 2 minutes 45 seconds, L must be a whole number of minutes no smaller than Tc. In summary, L takes values in {3, 4, 5, ...} minutes, that is, natural numbers greater than or equal to 3, and the minimum value is min(L) = 3 minutes.
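The time slice constraints can be checked with a short Python sketch. The stage durations below are illustrative assumptions: the source gives only approximate values, so Ta = 5 s and Tf = 2 s are invented for the example, and the upper bounds of 2 minutes and 1 minute are used for stages D and E. The function name `min_time_slice` and the 1-second search step are likewise hypothetical.

```python
def min_time_slice(durations, limit=24 * 3600):
    """Return the smallest time slice L (seconds) satisfying the stage constraints.

    For each pair of adjacent stages (previous, current):
      - current slower than previous  -> L mod T_previous == 0
      - current faster than previous  -> L >= T_previous
    """
    for candidate in range(1, limit + 1):
        ok = True
        for prev, cur in zip(durations, durations[1:]):
            if cur > prev and candidate % prev != 0:
                ok = False
                break
            if cur < prev and candidate < prev:
                ok = False
                break
        if ok:
            return candidate
    return None  # no feasible slice within the search limit


# Assumed durations in seconds for stages A..F (A and F are invented values).
stage_seconds = [5, 60, 165, 120, 60, 2]
best_slice = min_time_slice(stage_seconds)
```

With these assumed inputs the search settles on a slice of 180 seconds, which agrees with min(L) = 3 minutes in the text. A brute-force scan is used only for clarity; the divisibility constraints could equally be solved with a least common multiple.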

If the accuracy of the time scale is a, a single time slice L is subdivided into b = L / a smaller accuracy slices. Clearly, when the time slice L takes its minimum value, the aging of the whole processing is the fastest.

The preset rule includes at least the following: for each data processing stage, if the total processing duration after merging the stage with an adjacent data processing stage is less than a set duration threshold, the stage is merged with the adjacent stage; this operation is repeated until the total processing duration of the merged stages is greater than or equal to the set duration threshold. Merging may be skipped for data processing stages with weak functional dependency.

For example, if a stage is assessed to take less than 1 second in practice, its influence on the processing pipeline is ignored; it is treated as part of continuous, uninterrupted operation and is not split off as a separate processing unit. If the total time consumption of m consecutive stages is less than the time slice length L, those stages may be combined into the same processing unit. If two adjacent stages each take less than the time slice L but their combined time consumption exceeds L, they should be assigned to different processing units. In this way the minimum value min(N) of the number N of processing units can be calculated. The ideal time consumption of the whole processing is Tx = L × N, and the fastest aging is min(Tx) = min(L) × min(N). Based on the time consumption of each stage shown in FIG. 6 and the preset rule, stage A and stage B may be combined into one processing unit, stage B and stage C belong to different processing units, stage C and stage D belong to different processing units, and stage D and stage E are combined into the same processing unit. The minimum number of processing units is therefore min(N) = 3, and the fastest aging is min(Tx) = min(L) × min(N) = 9 minutes, as shown in Table 2.

TABLE 2

Stage	A	B	C	D and E	F
0-3min	<1 second	First round	Waiting	Waiting	Waiting
3-6min	<1 second	Second round	First round	Waiting	Waiting
6-9min	<1 second	Third round	Second round	First round	Waiting
9-12min	......	......	Third round	Second round	First round
12-15min	......	......	......	Third round	Second round
15-18min	......	......	......	......	Third round
18-21min	......	......	......	......	......
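The merging rule and the resulting fastest aging can be sketched in Python as follows. This is an illustration under stated assumptions, not the patented procedure: the function name `group_stages` is hypothetical, the merge is a simple left-to-right greedy pass, and the durations for stages A, D, and E (5 s, 110 s, 50 s) are invented values consistent with "several seconds", "less than 2 minutes", and "less than 1 minute". Stage F is left out because the example in the text does not assign it to a processing unit.

```python
def group_stages(stages, slice_seconds, negligible_seconds=1):
    """Greedily merge consecutive stages into processing units.

    stages: list of (name, duration_seconds) in pipeline order.
    Stages shorter than negligible_seconds are not counted as units.
    Consecutive stages share a unit while their total stays below the slice.
    """
    units = []
    current, total = [], 0
    for name, seconds in stages:
        if seconds < negligible_seconds:
            continue  # treated as part of continuous operation
        if current and total + seconds >= slice_seconds:
            units.append(current)  # close the unit; this stage starts a new one
            current, total = [], 0
        current.append(name)
        total += seconds
    if current:
        units.append(current)
    return units


# Assumed durations (seconds); only B = 60 and C = 165 come from the text.
pipeline = [("A", 5), ("B", 60), ("C", 165), ("D", 110), ("E", 50)]
slice_seconds = 180                      # min(L) = 3 minutes
units = group_stages(pipeline, slice_seconds)
fastest_aging_minutes = len(units) * slice_seconds // 60
```

With these inputs the grouping is A+B, C, and D+E, so N = 3 and the fastest aging is 3 × 3 = 9 minutes, matching the pipeline depth shown in Table 2.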

In another embodiment, the above method may further include the following operations. After the processing aging is determined, the fluctuation range of the processing aging is determined. The committed aging is then determined based on the processing aging and the fluctuation range.

The externally committed aging may be (9 + Δ) minutes, where Δ is an error determined based on the fluctuation range and the like. For example, if the fluctuation range of stages A to F is 1 second to 60 seconds, the externally committed aging may be 10 minutes.

Because a stream data system, compared with a batch system, processes quickly but is less stable and more prone to lapses, for each new round of stream statistics, determining the statistical starting point and the statistical ending point for the current time point based on the processing aging and/or the duration information of the processing units may include: determining the statistical starting point and the statistical ending point for the current time point based on the committed aging and/or the duration information of the processing units.

In another embodiment, the above method may further include the following operations.

After the committed aging is determined based on the processing aging and the fluctuation range, if the deviation between the committed aging and the actual aging exceeds a set deviation threshold, the committed aging is updated.

For example, if after a period of operation the actual average duration of each processing stage no longer matches the original design estimates, whether it is longer or shorter, the parameters should be readjusted by following the previous steps and the committed aging should be reissued, so that the data is as accurate as possible and reflects the actual situation as close to real time as possible.
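A small Python sketch of the committed aging and its update check follows. The function names and the rule of rounding the fluctuation up to whole minutes are assumptions made for illustration; the text only states that a 9-minute processing aging with a fluctuation of up to 60 seconds may be committed as 10 minutes.

```python
import math


def committed_aging_minutes(processing_minutes, max_fluctuation_seconds):
    """Processing aging plus an error margin derived from the fluctuation range.

    Assumption: the margin is the worst-case fluctuation rounded up to minutes.
    """
    margin = math.ceil(max_fluctuation_seconds / 60)
    return processing_minutes + margin


def needs_update(committed_minutes, actual_minutes, deviation_threshold_minutes):
    """True when the committed aging has drifted too far from the observed aging,
    regardless of whether the actual value is longer or shorter."""
    return abs(committed_minutes - actual_minutes) > deviation_threshold_minutes


committed = committed_aging_minutes(9, 60)
```

For instance, `needs_update(committed, 13, 2)` flags that the parameters should be re-estimated, while `needs_update(committed, 11, 2)` does not; the threshold of 2 minutes is an arbitrary example value.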

After determining the committed aging, a statistical starting point and a statistical ending point may be determined based on the following.

In one embodiment, for the j-th round of data and the (j-1)-th round of data, where j is a positive integer greater than or equal to 1, determining the statistical starting point and the statistical ending point for the current time point based on the committed aging and/or the duration information of the processing units may include the following operations.

When the difference between the system time for the j-th round of data and the statistical ending point for the (j-1)-th round of data is greater than or equal to the committed aging, the statistical starting point for the j-th round of data is determined based on the system time for the j-th round of data and the committed aging.

When the difference between the system time for the j-th round of data and the statistical ending point for the (j-1)-th round of data is less than the committed aging, the statistical ending point for the j-th round of data is determined based on the system time for the j-th round of data, the time slice duration information, and the committed aging.

When the difference between the system time for the j-th round of data and the statistical ending point for the (j-1)-th round of data is greater than or equal to zero and less than the committed aging, the statistical starting point for the j-th round of data is determined based on the statistical ending point for the (j-1)-th round of data.

When the difference between the system time for the j-th round of data and the statistical ending point for the (j-1)-th round of data is greater than or equal to zero and less than the committed aging, the statistical ending point for the j-th round of data is determined based on the latest input time for the j-th round of data.

For example, to meet the externally committed aging Tx, given a single time slice length L and a precision a, the start timestamp and the end timestamp of the data to be processed need to be calculated. Define the maximum timestamp of the i-th round of input data of the stream statistics stage as In(i); the i-th round needs to output data Dat whose minimum timestamp is P(i) and whose maximum timestamp is Q(i). In(i) is the timestamp of the last data item in the i-th round of input data; it is affected by the network or by the processing speed of other devices, so part of the data may arrive late.

For the j-th round of statistics to be processed, the current system time is recorded as Sys(j), and the maximum timestamp of the data already processed is obtained, namely the maximum timestamp Q(j-1) of the output data of the (j-1)-th round. If Q(j-1) does not exist, as in the first round, Q(j-1) is initialized to a specified starting value, for example 00:00:00.

The time difference between Sys(j) and Q(j-1) is denoted Δt1(j). From the relation between Δt1(j) and the committed aging Tx, the starting timestamp P(j) that satisfies the maximum data processing amount within the committed aging Tx and the time slice L can be determined for the current system time. Since the (j-1)-th round of statistics generally does not process output data with future timestamps, Q(j-1) <= Sys(j), that is, Δt1(j) >= 0.

When Δt1(j) < Tx, the stream statistics of the (j-1)-th round ran faster than the preset aging, so the j-th round only needs to continue from the timestamp at which the (j-1)-th round ended; therefore, when Δt1(j) < Tx, P(j) = Q(j-1). When Δt1(j) >= Tx, the processing is slower than the preset aging or only just meets it, so the j-th round processes output data with timestamps starting from Sys(j) - Tx. At this point input data is still arriving at the latest maximum data volume, and the timestamps from Q(j-1) to Sys(j) - Tx have already exceeded the aging, so the output timestamps from Q(j-1) to Sys(j) - Tx are ignored; therefore, when Δt1(j) >= Tx, P(j) = Sys(j) - Tx.

The time difference between the current system time Sys(j) and In(j) is denoted Δt2(j). From the relation between Δt2(j) and the committed aging Tx, the ending timestamp Q(j) that satisfies the maximum data processing amount within the committed aging Tx and the time slice L can be determined. Since the processing at the stages preceding the j-th round of statistics takes time, In(j) <= Sys(j), that is, Δt2(j) >= 0.

When Δt2(j) < Tx, the previous stage (e.g., stage C), which supplies the input data of the stream statistics stage, is processing faster than the preset aging, so the stream statistics of the j-th round processes up to the maximum timestamp of the previous stage for the j-th round, namely In(j); therefore, when Δt2(j) < Tx, Q(j) = In(j). When Δt2(j) >= Tx, the previous stage is processing slower than the preset aging, so the stream statistics of the j-th round needs to process up to the timestamp obtained by subtracting Tx from the future timestamp Sys(j) + L, which reserves the running time of the present round; this keeps the error in the correspondence between data and time as small as possible while still satisfying the aging. Therefore, when Δt2(j) >= Tx, Q(j) = Sys(j) + L - Tx. The formulas for the statistical starting point and the statistical ending point are shown in Table 3.

Table 3
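In compact form, the two case distinctions described in the preceding paragraphs can be written as below. This is a restatement derived from the text (symbols Sys, In, P, Q, Tx and L as defined there), offered as a reading aid rather than as a reproduction of the table's original layout.

```latex
\[
\Delta t_1(j) = \mathrm{Sys}(j) - Q(j-1), \qquad
P(j) =
\begin{cases}
Q(j-1), & 0 \le \Delta t_1(j) < T_x \\
\mathrm{Sys}(j) - T_x, & \Delta t_1(j) \ge T_x
\end{cases}
\]
\[
\Delta t_2(j) = \mathrm{Sys}(j) - \mathrm{In}(j), \qquad
Q(j) =
\begin{cases}
\mathrm{In}(j), & 0 \le \Delta t_2(j) < T_x \\
\mathrm{Sys}(j) + L - T_x, & \Delta t_2(j) \ge T_x
\end{cases}
\]
```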

For example, when the maximum guaranteed aging min(Tx) = 10 min and the minimum time slice length min(L) = 3 min, the calculated statistical starting point and statistical ending point may be as shown in Table 4.

Table 4
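As an illustration of how such values can be computed, the sketch below applies the rules P(j) = Q(j-1) or Sys(j) - Tx and Q(j) = In(j) or Sys(j) + L - Tx with Tx = 10 min and L = 3 min. The function name, the helper `at`, and all clock times are hypothetical; they are chosen for the example and are not taken from Table 4.

```python
from datetime import datetime, timedelta
from typing import Tuple


def statistical_window(
    sys_j: datetime,
    q_prev: datetime,
    in_j: datetime,
    tx: timedelta,
    slice_len: timedelta,
) -> Tuple[datetime, datetime]:
    """Return (P(j), Q(j)) for round j from the two case distinctions."""
    dt1 = sys_j - q_prev  # Δt1(j) = Sys(j) - Q(j-1)
    dt2 = sys_j - in_j    # Δt2(j) = Sys(j) - In(j)
    if dt1 < timedelta(0) or dt2 < timedelta(0):
        raise ValueError("Q(j-1) and In(j) must not lie after Sys(j)")
    p_j = q_prev if dt1 < tx else sys_j - tx
    q_j = in_j if dt2 < tx else sys_j + slice_len - tx
    return p_j, q_j


TX = timedelta(minutes=10)        # committed aging used in the example
SLICE_LEN = timedelta(minutes=3)  # time slice length used in the example


def at(hour: int, minute: int) -> datetime:
    """Hypothetical helper: a clock time on an arbitrary day."""
    return datetime(2020, 1, 1, hour, minute)


# Round that keeps up: Δt1 = 3 min and Δt2 = 1 min, both below Tx.
on_time = statistical_window(at(9, 3), at(9, 0), at(9, 2), TX, SLICE_LEN)

# Round that has fallen behind: Δt1 = 28 min and Δt2 = 15 min, both >= Tx.
behind = statistical_window(at(9, 30), at(9, 2), at(9, 15), TX, SLICE_LEN)
```

In the first call the window is simply [Q(j-1), In(j)] = [09:00, 09:02]; in the second it is shifted to [09:20, 09:23], so timestamps that can no longer meet the committed aging are skipped.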

After the calculated committed aging is confirmed manually, it is set as the specified value of the committed aging and stored as a fixed parameter, and subsequent algorithms and programs run according to this parameter value.

In operation S305, a statistical point is determined, the statistical point including at least a statistical start point and a statistical end point.

Specifically, determining the statistical points may include the following operations.

First, the variable scale is divided based on a preset precision to obtain at least one statistical slice. A statistical slice obtained by the division may correspond to one scale mark of the variable scale or to a plurality of scale marks. Furthermore, the lengths of the statistical slices may be the same or different. Referring to Table 1, one or more of the time points may be selected as statistical points.

Then, a statistical slice is determined from the at least one statistical slice that is located between the statistical starting point and the statistical ending point.
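The two steps above can be sketched as follows, assuming an integer scale (for example seconds) whose marks are equally spaced by the preset precision. The function names are illustrative, and equal spacing is only one of the options the text allows.

```python
from typing import List


def build_scale(total_length: int, precision: int) -> List[int]:
    """Scale marks 0, a, 2a, ... that do not exceed total_length."""
    if precision <= 0:
        raise ValueError("precision must be positive")
    return list(range(0, total_length + 1, precision))


def statistical_points(scale: List[int], start: int, end: int) -> List[int]:
    """Statistical points for one round.

    The result always contains the starting and ending points themselves,
    plus every scale mark that lies strictly between them.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    inner = [mark for mark in scale if start < mark < end]
    return [start, *inner] if start == end else [start, *inner, end]
```

For a 600-second scale with 60-second precision, a starting point of 90 and an ending point of 300 give the points 90, 120, 180, 240 and 300.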

Specifically, after manual confirmation, the specified or optimal values of the time slice and the statistical slice are set and stored as fixed parameters, and subsequent algorithms and programs run according to these parameter values.

In operation S307, the statistical result of the stream data for the specified variable is determined based on the statistical result of the statistical point.

In one embodiment, determining the statistical result of the stream data for the specified variable based on the statistical results of the statistical points may include the following operation: establishing a mapping relationship between each statistical point and an operator, so that the statistical result of each statistical point is determined based on the operator. The operator may be a statistical operator over a data set, including but not limited to: SUM (cumulative value), AVERAGE (mean value), MAX (maximum value), variance/mean square error, and so on.
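One possible way to express the point-to-operator mapping is a dictionary of callables, as in the sketch below. The names `OPERATORS` and `evaluate_points`, the use of population variance for the variance operator, and the default initial value of 0.0 for points without data are assumptions made for the illustration.

```python
import statistics
from typing import Callable, Dict, List, Sequence

# Operator name -> aggregation function over the values of one point.
OPERATORS: Dict[str, Callable[[Sequence[float]], float]] = {
    "SUM": sum,
    "AVERAGE": statistics.fmean,
    "MAX": max,
    "VARIANCE": statistics.pvariance,  # population variance (assumption)
}


def evaluate_points(
    point_operator: Dict[int, str],
    point_values: Dict[int, List[float]],
    initial_value: float = 0.0,
) -> Dict[int, float]:
    """Apply the operator mapped to each statistical point to its values."""
    results: Dict[int, float] = {}
    for point, op_name in point_operator.items():
        values = point_values.get(point, [])
        # A point with no data falls back to the initial value.
        results[point] = OPERATORS[op_name](values) if values else initial_value
    return results
```

Each statistical point may carry a different operator, for example SUM at one point and MAX at another.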

Fig. 7 schematically illustrates a logic diagram of a streaming statistics method according to an embodiment of the present disclosure.

As shown in Fig. 7, after the time scale is initialized, the committed aging and the time slice length are determined based on the calculated optimal values of the aging and the time slice. The current statistical starting point and the current statistical ending point can then be calculated based on the committed aging, the time slice length, and so on. Stream statistics are then performed, through the mapping algorithm, on the stream data output by other processing stages. The results of the stream statistics may be sent to other processing stages.

In another embodiment, the method further comprises: if the statistical range of the stream data spans at least two variable scales, sub-statistical results of the stream data are respectively determined based on the at least two variable scales.

Then, statistics of the stream data are determined based on sub-statistics of the stream data.

Fig. 8 schematically illustrates a schematic diagram of a flow statistical method according to an embodiment of the present disclosure. Fig. 9 schematically illustrates a schematic diagram of a flow statistical method according to another embodiment of the present disclosure.

Figs. 8 and 9 are schematic diagrams of the stream statistics method of the embodiments of the disclosure, which performs slice mapping on data based on a time scale. Stream data generated by a distributed server cluster is processed through a plurality of stages and collected into a table, recorded as the stream data table. The stream data of the stream data table and the time scale of Table 1, on which the statistical starting point P and the statistical ending point Q have been determined, are subjected to cut mapping at equal precision (for example, precision a). For example, (statistical starting point P) + (i × precision a) is the statistical point Pa(i), where i is a positive integer greater than 0. Specifically, the stream data of the stream data table is cut in sequence, from the stream data corresponding to a certain scale value of the time scale (the zero scale value in this embodiment) up to the statistical starting point P, the statistical point Pa(1), the statistical point Pa(2), the statistical point Pa(3), ..., the statistical point Pa(n), ..., and the statistical ending point Q; the cutting is completed in a single pass, yielding the stream statistics result. In another embodiment, the length from a certain scale value of the time scale (this scale value may also lie at a fixed distance from the statistical point Pa(i)) to the statistical point Pa(k) is denoted as the statistical slice S(k), k = 0, 1, 2, 3, ..., n. In another embodiment, S(k) is a fixed distance, so the statistical slice S(k) may be an arithmetic-series interval or a constant-value interval. Thus, the stream data is mapped once over the time scale segment from P to Q through a series of statistical slices S(k) to obtain the stream statistics result Dat.

It should be noted that Fig. 8 is a schematic diagram of the statistical principle for the x-th round of data. Here data002 and data005 are data that have not yet arrived at the current statistics time, so the statistical result for data002 may be missing from statistical result Dat01, and the statistical result for data005 may be missing from statistical result Dat03. As stream data continues to arrive, once data002 and data005 have arrived, the subsequent statistical results include the results for data002 and data005. Fig. 9 is a schematic diagram of the statistical principle for the z-th round of data. The statistical principle shown in Figs. 8 and 9 is an embodiment in which the zero scale of the time scale is used as the statistical starting point. In other embodiments, the statistical starting point may be a non-zero scale point on the time scale.
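The single-pass cut mapping and the effect of late-arriving data can be illustrated with the sketch below. It assumes integer timestamps, the SUM operator, and a cut origin at the zero scale value; the function name and the sample records are hypothetical. Sorting once and using prefix sums with binary search is an implementation choice of this sketch, not something the disclosure prescribes.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import Dict, List, Tuple


def cumulative_sums(
    records: List[Tuple[int, float]],
    start: int,
    end: int,
    precision: int,
    origin: int = 0,
) -> Dict[int, float]:
    """SUM of values with origin < timestamp <= Pa(i), for every point
    Pa(i) = start + i * precision up to and including the ending point."""
    ordered = sorted(r for r in records if r[0] > origin)
    stamps = [ts for ts, _ in ordered]
    # prefix[k] is the sum of the k earliest values.
    prefix = [0.0, *accumulate(value for _, value in ordered)]
    points = list(range(start, end, precision)) + [end]
    return {pa: prefix[bisect_right(stamps, pa)] for pa in points}


# Hypothetical round: the record stamped 50 has not arrived yet.
arrived = [(10, 5.0), (70, 2.0), (130, 4.0)]
first_pass = cumulative_sums(arrived, start=60, end=180, precision=60)

# A later round sees the late record, so its results include it.
second_pass = cumulative_sums(arrived + [(50, 3.0)], start=60, end=180, precision=60)
```

The first pass undercounts the point at 60 because the late record is missing; the second pass picks it up without any special handling, which matches the behaviour described for data002 and data005.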

The algorithm is described below.

Specific pseudo-code references are as follows:

    INSERT INTO stream_statistics (Dat, Pa(i))
    SELECT operator(NVL(stream_data_table.data, initial_value)), Pa(i)
    FROM stream_data_table
    RIGHT JOIN (SELECT time_scale_value FROM table_1
                WHERE time_scale_value BETWEEN P AND Q) time_scale_segment
    ON (stream_data_table.timestamp <= time_scale_segment.time_scale_value   -- i.e., Pa(i)
        AND stream_data_table.timestamp > some_fixed_scale_value             -- or Pa(i) - S(k)
    )
    GROUP BY time_scale_segment.time_scale_value
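For readers who want to run the mapping, the following sketch reproduces the same join with Python's built-in `sqlite3` module. The table and column names, the sample rows, and the choice of SUM as the operator are assumptions. Two substitutions are made for portability: `COALESCE` stands in for `NVL`, and the join is written as a `LEFT JOIN` from the time scale segment, which is equivalent to the `RIGHT JOIN` form and also works on SQLite versions that lack `RIGHT JOIN`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE time_scale (scale_value INTEGER PRIMARY KEY);
    CREATE TABLE stream_data (ts INTEGER, amount REAL);
    CREATE TABLE stream_statistics (dat REAL, pa INTEGER);
    """
)
# Time scale with marks every 60 units, and three hypothetical records.
conn.executemany(
    "INSERT INTO time_scale VALUES (?)", [(v,) for v in range(0, 301, 60)]
)
conn.executemany(
    "INSERT INTO stream_data VALUES (?, ?)", [(10, 5.0), (70, 2.0), (130, 4.0)]
)

P, Q, ORIGIN = 60, 240, 0  # starting point, ending point, cut origin

# One statement computes the cumulative value at every point in [P, Q].
conn.execute(
    """
    INSERT INTO stream_statistics (dat, pa)
    SELECT SUM(COALESCE(d.amount, 0.0)), s.scale_value
    FROM (SELECT scale_value FROM time_scale
          WHERE scale_value BETWEEN ? AND ?) AS s
    LEFT JOIN stream_data AS d
      ON d.ts <= s.scale_value AND d.ts > ?
    GROUP BY s.scale_value
    """,
    (P, Q, ORIGIN),
)
rows = conn.execute(
    "SELECT pa, dat FROM stream_statistics ORDER BY pa"
).fetchall()
```

Every scale value between P and Q receives a row, including values that match no data, because the outer join keeps unmatched scale rows and the fallback value replaces the missing amount.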

The operator may be a statistical operator over a data set, such as SUM (cumulative value), AVERAGE (mean value), MAX (maximum value), variance/mean square error, and the like. When the mapping operation is performed on a scale with periods (such as hour, day, month, quarter, or year): if the timestamps of the statistical starting point P and the statistical ending point Q do not cross periods, the mapping operation is performed directly according to the pseudo-code above; if they cross periods, the operation is performed in two segments: first the stream statistics result from the statistical starting point P to the zero scale value of the next period is calculated, and then the stream statistics result from the zero scale value of the next period to the statistical ending point Q. For example,

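    INSERT INTO stream_statistics (Dat, Pa(i))
    SELECT operator(NVL(stream_data_table.data, initial_value)), Pa(i)
    FROM stream_data_table
    RIGHT JOIN (SELECT time_scale_value FROM table_1
                WHERE time_scale_value BETWEEN P AND preceding_period_end_point) time_scale_segment
    ON (stream_data_table.timestamp <= time_scale_segment.time_scale_value   -- i.e., Pa(i)
        AND stream_data_table.timestamp > some_fixed_scale_value             -- or Pa(i) - S(k)
    )
    GROUP BY time_scale_segment.time_scale_value

Then continue with:

    INSERT INTO stream_statistics (Dat, Pa(i))
    SELECT operator(NVL(stream_data_table.data, initial_value)), Pa(i)
    FROM stream_data_table
    RIGHT JOIN (SELECT time_scale_value FROM table_1
                WHERE time_scale_value BETWEEN following_period_start_point AND Q) time_scale_segment
    ON (stream_data_table.timestamp <= time_scale_segment.time_scale_value   -- i.e., Pa(i)
        AND stream_data_table.timestamp > some_fixed_scale_value             -- or Pa(i) - S(k)
    )
    GROUP BY time_scale_segment.time_scale_value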

For both non-cross-period and cross-period stream statistics, the pseudo-code above illustrates only the key logic; embodiments are not limited to it and may involve logic that joins additional tables.
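The decision of whether and where to split a range can be sketched as below, assuming integer timestamps on an absolute axis on which each period begins at a multiple of the period length. The function name is illustrative, and handling a range that spans more than two periods is a generalization added here; the text itself describes only the two-segment case.

```python
from typing import List, Tuple


def split_at_period_boundaries(
    start: int, end: int, period: int
) -> List[Tuple[int, int]]:
    """Split [start, end] into segments that each stay within one period."""
    if period <= 0:
        raise ValueError("period must be positive")
    if start > end:
        raise ValueError("start must not exceed end")
    segments: List[Tuple[int, int]] = []
    seg_start = start
    while True:
        # Zero scale value of the period that follows seg_start.
        next_boundary = (seg_start // period + 1) * period
        if end <= next_boundary:
            segments.append((seg_start, end))
            return segments
        segments.append((seg_start, next_boundary))
        seg_start = next_boundary
```

With a one-day period of 86400 seconds, a range from 86100 to 86520 is split at 86400 into two segments, each of which can then be mapped with the pseudo-code on its own.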

The stream statistics method provided by the embodiments of the disclosure provides a stream data processing method that is as efficient and accurate as possible and consistent with current practice; by setting a time scale, it allows the committed aging, the data precision, the time slices, and the statistical slices/statistical intervals to be adjusted flexibly.

The stream statistics method provided by the embodiments of the disclosure gives the relationship between the aging and the time slice, and the achievable optimal values of the aging and the time slice can be continuously adjusted and recalculated from production records, so the aging is determined on a clear and well-founded basis.

According to the streaming statistics method provided by the embodiment of the disclosure, the precision and the scale of the time scale can be flexibly set along with different production scenes, so that the requirements of various application scenes can be met, and the scenes with high precision requirements can be met.

According to the stream statistics method provided by the embodiments of the disclosure: when an application scenario generates data continuously, statistics are often computed by loop iteration at some stage, but loop iteration is inefficient and consumes considerable resources. For example, if processing stage E (cumulative display on the application side) used the traditional loop-iteration approach, the processing could not be completed within the expected time of less than 1 minute, whereas the variable-scale-based algorithm can shorten the processing time several-fold.

According to the streaming statistics method provided by the embodiment of the disclosure, the streaming statistics algorithm based on the time scale can automatically solve the optimal statistics starting point and the optimal statistics ending point which are required to be processed in real time under the condition of appointed promise aging, so that the statistics result is obtained as accurately and timely as possible under the condition of meeting the appointed condition, and the scene requirement of streaming data application is met.

The solution of the stream statistics method provided by the embodiments of the disclosure is not limited to a time scale and can be generalized to other metrics according to the actual scenario: by setting a variable scale, the statistical values corresponding to a series of variable points can be calculated through a single mapping on the variable scale, so the method is highly generalizable.

Another aspect of the present disclosure provides a flow statistical device.

Fig. 10 schematically illustrates a structural diagram of a flow statistics apparatus according to an embodiment of the present disclosure.

As shown in fig. 10, the flow statistics apparatus 1000 includes: the scale setting module 1010, the start-stop point determining module 1020, the statistics point determining module 1030, and the statistics module 1040.

Wherein the scale setting module 1010 is configured to set a variable scale. The total length, accuracy, scale initialization and other processes of the variable scale may refer to the content of the relevant parts of the method, and will not be described herein.

The start-stop determination module 1020 is configured to determine a statistical start point and a statistical end point for a specified variable on the variable scale, where the statistical start point and the statistical end point are determined based on a processing speed of a data processing stage of the stream data.

The statistical point determining module 1030 is configured to determine statistical points, where the statistical points include at least a statistical start point and a statistical end point.

The statistics module 1040 is configured to determine statistics of stream data for a specified variable based on statistics of the statistics points.

The flow statistics apparatus 1000 involves a time slice design algorithm, a shortest-aging calculation method, adjustable time slices, adjustable statistical intervals, a stream statistics algorithm with committed aging, and the like; for details, refer to the relevant content of the method embodiments.

It should be noted that, in the embodiment of the apparatus portion, the implementation manner, the solved technical problem, the implemented function, and the achieved technical effect of each module and the like are the same as or similar to the implementation manner, the solved technical problem, the implemented function, and the achieved technical effect of each corresponding step in the embodiment of the method portion, and are not described in detail herein.

Any number of the modules, or at least some of the functionality of any number, according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules according to embodiments of the present disclosure may be implemented as split into multiple modules. Any one or more of the modules according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system-on-chip, a system-on-a-substrate, a system-on-a-package, an Application Specific Integrated Circuit (ASIC), or in hardware or firmware in any other reasonable manner of integrating or packaging the circuits, or in any one of or in any suitable combination of three of software, hardware, and firmware. Alternatively, one or more of the modules according to embodiments of the present disclosure may be at least partially implemented as computer program modules, which when executed, may perform the corresponding functions.

For example, any of the scale setting module 1010, the start-stop point determining module 1020, the statistical point determining module 1030, and the statistical module 1040 may be combined in one module to be implemented, or any of the modules may be split into a plurality of modules. Alternatively, at least some of the functionality of one or more of the modules may be combined with at least some of the functionality of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the scale setting module 1010, the start-stop point determination module 1020, the statistics point determination module 1030, and the statistics module 1040 may be implemented at least in part as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system-on-chip, a system-on-substrate, a system-on-package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging the circuitry, or in any one of or a suitable combination of three of software, hardware, and firmware. Alternatively, at least one of the scale setting module 1010, the start-stop point determining module 1020, the statistics point determining module 1030, and the statistics module 1040 may be at least partially implemented as computer program modules that, when executed, perform the corresponding functions.

Another aspect of the present disclosure provides an electronic device.

Fig. 11 schematically illustrates a block diagram of an electronic device according to an embodiment of the disclosure. The electronic device shown in fig. 11 is merely an example, and should not impose any limitations on the functionality and scope of use of embodiments of the present disclosure.

As shown in fig. 11, an electronic device 1100 according to an embodiment of the present disclosure includes a processor 1101 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1102 or a program loaded from a storage section 1108 into a Random Access Memory (RAM) 1103. The processor 1101 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. The processor 1101 may also include on-board memory for caching purposes. The processor 1101 may comprise a single processing unit or a plurality of processing units for performing the different actions of the method flow according to embodiments of the present disclosure.

In the RAM 1103, various programs and data necessary for the operation of the electronic device 1100 are stored. The processor 1101, ROM 1102, and RAM 1103 are communicatively connected to each other by a bus 1104. The processor 1101 performs various operations of the method flow according to the embodiments of the present disclosure by executing programs in the ROM 1102 and/or the RAM 1103. Note that the program can also be stored in one or more memories other than the ROM 1102 and the RAM 1103. The processor 1101 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in one or more memories.

According to an embodiment of the disclosure, the electronic device 1100 may also include an input/output (I/O) interface 1105, the input/output (I/O) interface 1105 also being connected to the bus 1104. The electronic device 1100 may also include one or more of the following components connected to the I/O interface 1105: an input section 1106 including a keyboard, a mouse, and the like; an output portion 1107 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage section 1108 including a hard disk or the like; and a communication section 1109 including a network interface card such as a LAN card, a modem, and the like. The communication section 1109 performs communication processing via a network such as the internet. The drive 1110 is also connected to the I/O interface 1105 as needed. Removable media 1111, such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like, is installed as needed in drive 1110, so that a computer program read therefrom is installed as needed in storage section 1108.

According to embodiments of the present disclosure, the method flow according to embodiments of the present disclosure may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program comprising program code for performing the method shown in the flowcharts. In such an embodiment, the computer program can be downloaded and installed from a network via the communication portion 1109, and/or installed from the removable media 1111. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 1101. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.

The present disclosure also provides a computer-readable storage medium that may be embodied in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the apparatus/device/system. The computer-readable storage medium carries one or more programs which, when executed, implement methods in accordance with embodiments of the present disclosure.

According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include ROM 1102 and/or RAM 1103 described above and/or one or more memories other than ROM 1102 and RAM 1103.

Those skilled in the art will appreciate that the features recited in the various embodiments of the disclosure and/or in the claims may be combined and/or integrated in various ways, even if such combinations or integrations are not explicitly recited in the disclosure. These examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described above separately, this does not mean that the measures in the embodiments cannot be used advantageously in combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.

Claims

1. A streaming statistics method applied to statistics of streaming data, the streaming data being processed by a data processing stage to complete a data processing procedure, the method comprising:

setting a variable scale;

determining a statistical starting point and a statistical ending point for a specified variable on the variable scale, wherein the specified variable is a time variable;

determining a statistical point, wherein the statistical point at least comprises the statistical starting point and the statistical ending point; and

determining a statistical result of the stream data for the specified variable based on the statistical result of the statistical point;

wherein the statistical start point and the statistical end point are determined based on a processing speed of a data processing stage of the stream data;

wherein the determining the statistical starting point and the statistical ending point for the specified variable on the variable scale comprises:

determining a statistical starting point and a statistical ending point for the current point in time based on the processing time and/or the duration information of the processing units, each processing unit comprising at least one data processing stage;

wherein the duration information of the processing unit is determined by:

determining a specified number of data processing stages through which the stream data needs to pass,

the processing duration of each data processing stage is determined,

merging at least part of the data processing stages in the designated number of data processing stages based on a preset rule and the processing time length of each data processing stage to obtain at least one processing unit and respective time length information;

the processing aging is determined by:

determining time slice duration information based on the at least one processing unit,

determining the processing aging based on the time slice duration information and the number of the at least one processing unit;

wherein the preset rule includes at least one of: for each of the data processing stages,

if the total processing time length of the data processing stages after being combined into the adjacent data processing stages is smaller than the set time length threshold value, combining into the adjacent data processing stages, and repeating the above operations until the total processing time length of the combined data processing stages is greater than or equal to the set time length threshold value;

wherein, after the processing aging is determined,

determining the fluctuation range of processing aging;

determining a committed aging based on the process aging and the fluctuation range;

the determining a statistical starting point and a statistical ending point for the current time point based on the processing aging and/or the time length information of the processing unit comprises: determining a statistical starting point and a statistical ending point for the current time point based on the promised aging and/or the duration information of the processing unit;

the obtaining method of the promise aging comprises the following steps: a committed aging is determined based on the process aging and the fluctuation range.

2. The method of claim 1, wherein the determining a statistical starting point and a statistical ending point for a current point in time based on the committed aging and/or the duration information of the processing unit comprises: for the j-th round of data and the (j-1)-th round of data, wherein j is a positive integer greater than or equal to 1,

determining a statistical starting point for the j-th round of data based on the system time for the j-th round of data and the committed aging when a difference between the system time for the j-th round of data and the statistical ending point for the (j-1)-th round of data is greater than or equal to the committed aging;

when the difference between the system time for the j-th round of data and the statistical ending point for the (j-1)-th round of data is smaller than the committed aging, determining the statistical ending point for the j-th round of data based on the system time for the j-th round of data, the time slice duration information and the committed aging;

when the difference between the system time for the j-th round of data and the statistical ending point for the (j-1)-th round of data is greater than or equal to zero and less than the committed aging, determining a statistical starting point for the j-th round of data based on the statistical ending point for the (j-1)-th round of data; and

when the difference between the system time for the j-th round of data and the statistical ending point for the (j-1)-th round of data is greater than or equal to zero and less than the committed aging, determining the statistical ending point for the j-th round of data based on the latest input time for the j-th round of data.

3. The method of claim 1, further comprising: after said determining a committed aging based on said processing aging and said fluctuation range,

if the deviation between the committed aging and the actual aging exceeds a set deviation threshold, the committed aging is updated.

4. The method of claim 1, wherein the stream data comprises at least one round of data, each round of data comprising an amount of data processed by a data processing stage having a longest processing duration.

5. The method of claim 1, further comprising: before the variable scale is set up,

initializing the variable scale based on the total length of the variable scale and a preset precision.

6. The method of claim 1, wherein the determining a statistical point comprises:

dividing the variable scale based on preset precision to obtain at least one statistical slice;

a statistical slice is determined from the at least one statistical slice that is located between the statistical starting point and the statistical ending point.

7. The method of claim 1, further comprising:

if the statistical range of the stream data spans at least two variable scales, respectively determining sub-statistical results of the stream data based on the at least two variable scales; and

determining the statistical result of the stream data based on the sub-statistical results of the stream data.

8. The method of claim 1, wherein the determining the statistical result of the stream data for the specified variable based on the statistical result of the statistical point comprises:

establishing a mapping relation between each statistical point and an operator, so as to determine a statistical result of each statistical point based on the operator.

9. A statistics apparatus for stream data, comprising:

a scale setting module, configured to set a variable scale;

a start-stop determination module, configured to determine a statistical starting point and a statistical ending point for a specified variable on the variable scale, wherein the statistical starting point and the statistical ending point are determined based on a processing speed of a data processing stage of the stream data, and the specified variable is a time variable;

a statistical point determining module, configured to determine statistical points, the statistical points at least comprising the statistical starting point and the statistical ending point; and

a statistics module, configured to determine a statistical result of the stream data for the specified variable based on a statistical result of the statistical points;

wherein the duration information of the processing unit is determined by:

the processing duration of each data processing stage is determined,

the processing aging is determined by:

wherein, after the processing aging is determined,

determining the fluctuation range of processing aging;

10. An electronic device, comprising:

one or more processors; and

a storage device for storing executable instructions which, when executed by the one or more processors, implement the method according to any one of claims 1 to 8.

11. A computer readable storage medium having stored thereon instructions which, when executed, implement the method according to any of claims 1 to 8.
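
As a non-limiting illustration of the scale initialization, slice selection and operator mapping recited in claims 5, 6 and 8, the following Python sketch shows one possible arrangement. It is not the claimed implementation. The names `VariableScale`, `slice_index`, `slices_between` and `statistic` are hypothetical, and so are three design choices: the precision is assumed to divide the total length evenly, summation is assumed as the default operator, and the per-point results are assumed to be combined by addition.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# An operator turns the values collected in one statistical slice into one number.
Operator = Callable[[List[float]], float]


@dataclass
class VariableScale:
    """Hypothetical scale over a time variable, cut into equal statistical slices."""
    total_length: int  # total span of the scale, e.g. in seconds
    precision: int     # width of one statistical slice, in the same unit
    slices: List[List[float]] = field(init=False)

    def __post_init__(self) -> None:
        # Initialize the scale from its total length and the preset precision.
        if self.precision <= 0 or self.total_length % self.precision != 0:
            raise ValueError("precision must be positive and divide total_length")
        self.slices = [[] for _ in range(self.total_length // self.precision)]

    def slice_index(self, t: int) -> int:
        """Index of the statistical slice that contains position t."""
        if not 0 <= t < self.total_length:
            raise ValueError("t lies outside the scale")
        return t // self.precision

    def add(self, t: int, value: float) -> None:
        """Record one stream value at position t on the scale."""
        self.slices[self.slice_index(t)].append(value)

    def slices_between(self, start: int, end: int) -> List[int]:
        """Slices from the one holding the starting point through the one holding the ending point."""
        return list(range(self.slice_index(start), self.slice_index(end) + 1))


def statistic(scale: VariableScale, start: int, end: int,
              operators: Optional[Dict[int, Operator]] = None) -> float:
    """Apply each slice's mapped operator (default: sum) and add up the per-slice results."""
    operators = operators or {}
    return sum(operators.get(i, sum)(scale.slices[i])
               for i in scale.slices_between(start, end))


scale = VariableScale(total_length=60, precision=10)
for t, v in [(3, 1.0), (12, 2.0), (15, 3.0), (27, 4.0), (55, 5.0)]:
    scale.add(t, v)
print(statistic(scale, start=10, end=29))  # prints 9.0
```

Only the slices between the starting point and the ending point are visited, so the cost of a query depends on the number of slices in the range rather than on the number of stream records. Under the same assumptions, a range spanning two scales as in claim 7 could be handled by calling `statistic` once per scale and adding the sub-results.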