CN111125109A - Real-time statistical report system based on time grouping accumulation algorithm - Google Patents


Info

Publication number
CN111125109A
Authority
CN
China
Prior art keywords
time
data
instruction
statistical
source data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911351721.6A
Other languages
Chinese (zh)
Inventor
包海全
罗志坚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Dejiu Information Technology Co Ltd
Original Assignee
Guangzhou Dejiu Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Dejiu Information Technology Co Ltd filed Critical Guangzhou Dejiu Information Technology Co Ltd
Priority to CN201911351721.6A priority Critical patent/CN111125109A/en
Publication of CN111125109A publication Critical patent/CN111125109A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a real-time statistical reporting system based on a time grouping accumulation algorithm, comprising a source data acquisition module, a source data processing module, a data accumulation and combination module, a data storage module and a statistical query module. The invention realizes a low-cost, high-output real-time statistical system in which data growth directly affects the required storage space but does not directly affect the computing capacity required of the statistical system, and the time a user waits for queried report data is significantly reduced.

Description

Real-time statistical report system based on time grouping accumulation algorithm
Technical Field
The invention relates to the technical field of computer software, in particular to a real-time statistical reporting system based on a time grouping accumulation algorithm.
Background
When report data and decision data must be produced from massive data by statistical algorithms, large amounts of hardware and software resources are consumed, and long computation times delay decisions. Hardware cost and computation time are inversely related. The output report data or decision data is also not flexible enough in its statistical dimensions: the more dimensions the underlying objects have, the more hardware and computation time are required, so the economic value obtained is out of proportion to the resources invested.
Disclosure of Invention
The technical problem to be solved by the embodiments of the present invention is to provide a real-time statistical reporting system based on a time grouping accumulation algorithm, which can effectively improve the statistical efficiency of the real-time statistical reporting system.
In order to solve the above technical problem, an embodiment of the present invention provides a real-time statistical reporting system based on a time grouping accumulation algorithm, including:
the source data acquisition module is used for acquiring and storing the instruction data and the source data input by the external service module;
the source data processing module is used for reading the instruction data and the source data which are transmitted by the source data acquisition module, analyzing the source data according to the read source data format, and generating and storing report data according to the read source data statistical information;
the data accumulation and combination module is used for checking each level of the time dimension of the current time in real time once source data have been obtained and, when the value of a lower-level time dimension of the current time reaches its preset trigger value, accumulating and combining the statistical data of the preceding period of the next higher-level time dimension and storing the result; wherein the time dimensions are checked in the order of second, minute, hour, day, month and year;
the data storage module is used for responding to data storage and reading requests of all the modules and storing and reading data;
and the statistical query module is used for performing time division according to the input instruction of the external service module and time period information in the input instruction, constructing a report query instruction according to the result of the time division and the data field filtering condition and the field grouping condition in the input instruction, performing statistical report query according to the report query instruction and returning a query result.
Further, the source data obtaining module specifically includes:
the data acquisition unit is used for receiving the instruction data and the source data input by the external service module through a Restful interface;
the instruction checking unit is used for performing instruction analysis and format checking on the instruction data input by the external service module;
and the data storage unit is used for storing the instruction data and the source data.
Further, the source data processing module specifically includes:
the data reading unit is used for reading the instruction data and the source data which are transmitted by the source data acquisition module;
the source data analysis unit is used for analyzing the source data according to the read source data format;
the report generation unit is used for generating report data according to the read source data statistical information;
and the statistical data storage unit is used for constructing a data storage table name according to the file information obtained by analyzing the source data and storing the report data by using the data storage table name.
Further, the real-time statistical reporting system based on the time grouping accumulation algorithm further comprises a data clearing module, which is used for deleting the corresponding reporting data according to the data clearing instruction of the external service module.
Further, the real-time statistical reporting system based on the time grouping accumulation algorithm further comprises a data resetting module, which is used for resetting the corresponding reporting data according to the data resetting instruction of the external service module.
Further, the statistical query module specifically includes:
an input instruction acquisition unit, configured to acquire an input instruction of the external service module;
the time grouping unit is used for performing time division according to the time period information and the time grouping field in the input instruction; wherein the result of the time division comprises a start time segment group, a middle time segment group and an end time segment group; the start time segment group is the starting time period, expressed in the time dimension one level below that of the time grouping field, and its end time point is the end time specified by the time grouping field; the middle time segment group is the middle time period in the time dimension of the time grouping field; the end time segment group is the ending time period, expressed in the time dimension one level below that of the time grouping field, and its start time point is the start time specified by the time grouping field;
the query instruction construction unit is used for constructing a report query instruction according to the time division result and the data field filtering condition and the field grouping condition in the input instruction; wherein the filtering condition comprises a statistical condition, a condition group, a data statistic and a statistical identifier;
and the report query unit is used for performing statistical report query according to the report query instruction, merging the queried report data and returning the merged report data.
Compared with the prior art, the invention has the following beneficial effects:
An embodiment of the invention provides a real-time statistical reporting system based on a time grouping accumulation algorithm, comprising a source data acquisition module, a source data processing module, a data accumulation and combination module, a data storage module and a statistical query module. The invention realizes a low-cost, high-output real-time statistical system in which data growth directly affects the required storage space but does not directly affect the computing capacity required of the statistical system, and the time a user waits for queried report data is significantly reduced.
Drawings
FIG. 1 is a schematic structural diagram of a real-time statistical reporting system based on a time grouping accumulation algorithm according to an embodiment of the present invention;
FIG. 2 is another schematic diagram of a real-time statistical reporting system based on a time grouping accumulation algorithm according to an embodiment of the present invention;
FIG. 3 is a general flowchart of a real-time statistical reporting system according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the system of the invention can use the time characteristics year, month, day, hour and minute as grouping conditions and filtering conditions. Data are also accumulated and merged over time: minutes are merged into hours, hours into days, days into months, and months into years. Data for any time period can be queried and the result presented immediately. It will be appreciated that a reporting system is a system that outputs any data of interest in tabular form. The invention achieves compressed data storage through accumulative merged statistics, real-time querying across multiple time dimensions through pre-computed statistics at the minute, hour, day, month and year levels, and fast querying of data in any time period through the time grouping method.
Referring to fig. 1, an embodiment of the present invention provides a real-time statistical reporting system based on a time grouping accumulation algorithm, including:
the source data acquisition module is used for acquiring and storing the instruction data and the source data input by the external service module;
in the embodiment of the present invention, further, the source data obtaining module specifically includes:
the data acquisition unit is used for receiving the instruction data and the source data input by the external service module through a Restful interface;
the instruction checking unit is used for performing instruction analysis and format checking on the instruction data input by the external service module;
and the data storage unit is used for storing the instruction data and the source data.
In the embodiment of the present invention, the working process of the source data acquisition module is as follows: an instruction from the external service module is received through a RESTful interface; the instruction is parsed and its format is checked, and if the check passes the process continues, otherwise an error is returned; the source data format information is obtained from the instruction and stored through the data storage submodule; a processing thread of the source data acquisition submodule is started, with the source data address information and related parameters as input; the source data acquisition module listens on the network address and prepares to receive source data; the source data are read from the listening network socket; the source data are parsed and their format is checked against the format information previously stored in the data storage submodule, and if the check passes the data are stored through the data storage submodule, otherwise an error is returned; finally, the data accumulation and combination module is started.
The source data processing module is used for reading the instruction data and the source data which are transmitted by the source data acquisition module, analyzing the source data according to the read source data format, and generating and storing report data according to the read source data statistical information;
in this embodiment of the present invention, further, the source data processing module specifically includes:
the data reading unit is used for reading the instruction data and the source data which are transmitted by the source data acquisition module;
the source data analysis unit is used for analyzing the source data according to the read source data format;
the report generation unit is used for generating report data according to the read source data statistical information;
and the statistical data storage unit is used for constructing a data storage table name according to the file information obtained by analyzing the source data and storing the report data by using the data storage table name.
In the embodiment of the present invention, the working process of the source data processing module is as follows: a source data processing thread is started; the module is woken when the "minute" trigger of the data accumulation and combination module fires. The source data format information and the source data are read. The source data are parsed and their format is checked against the format information just read; if the check passes, the process continues to reading the statistical information, otherwise an error log is written. The statistical information, which was stored earlier, is only read here, from the data storage module. The statistical fields are combined according to the statistical information, and the statistical target data are extracted. The statistical data and their key values (the combination of statistical conditions) are written to the Mongo database through the data storage submodule. The thread is destroyed when source data processing finishes.
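The per-minute processing described above can be sketched as follows. This is a minimal illustration only: it assumes comma-separated source records, a single numeric field that is merged by addition, and an in-memory dictionary in place of the Mongo database. The function name `accumulate_minute` and its parameters are hypothetical and do not come from the patent.

```python
def accumulate_minute(lines, fmt, condition_fields, value_field):
    """Aggregate one minute of source records into {condition key: total}.

    fmt              -- ordered field names of a source record (hypothetical)
    condition_fields -- fields whose combined values form the statistics key
    value_field      -- numeric field that is accumulated
    """
    totals = {}
    for line in lines:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != len(fmt):
            # Format check failed; the patent writes an error log at this point.
            continue
        record = dict(zip(fmt, parts))
        # The key is the combination of the statistical condition values.
        key = tuple(record[name] for name in condition_fields)
        totals[key] = totals.get(key, 0) + float(record[value_field])
    return totals


minute_totals = accumulate_minute(
    ["A, x, 3", "A, x, 4", "B, y, 1", "malformed"],
    fmt=["region", "kind", "amount"],
    condition_fields=["region", "kind"],
    value_field="amount",
)
print(minute_totals)  # {('A', 'x'): 7.0, ('B', 'y'): 1.0}
```

Each resulting key/value pair corresponds to one record that would be written for that minute.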
The data accumulation and combination module is used for checking each level of the time dimension of the current time in real time once source data have been obtained and, when the value of a lower-level time dimension of the current time reaches its preset trigger value, accumulating and combining the statistical data of the preceding period of the next higher-level time dimension and storing the result; wherein the time dimensions are checked in the order of second, minute, hour, day, month and year;
in the embodiment of the present invention, the working process of the data accumulation and combination module is as follows: the accumulation and merging thread is started; the source data acquisition module can trigger this module.
1. Obtain the current time and the statistical information, and check whether the seconds value of the current time is 0; this check is executed once every second.
2. If the seconds value is 0, enter the source data processing submodule and then check whether the minutes value of the current time is 0; otherwise return to step 1.
3. If the minutes value is 0, read all minute-level statistical data of the previous hour, merge them into the statistical data of the previous hour, and write the result to the Mongo database through the data storage submodule; then check whether the hours value of the current time is 0. Otherwise return to step 1.
4. If the hours value is 0, read all hour-level statistical data of the previous day, merge them into the statistical data of the previous day, and write the result to the Mongo database through the data storage submodule; then check whether the day of the month is 1. Otherwise return to step 1.
5. If the day of the month is 1, read all day-level statistical data of the previous month, merge them into the statistical data of the previous month, and write the result to the Mongo database through the data storage submodule; then check whether the month is 1. Otherwise return to step 1.
6. If the month is 1, read all month-level statistical data of the previous year and merge them into the statistical data of the previous year; otherwise return to step 1.
Remark: hour-level statistics consist of 60 minute-level statistics, day-level statistics of 24 hour-level statistics, and year-level statistics of 12 month-level statistics; the number of day-level statistics that make up a month varies from month to month.
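The cascade of checks above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes each statistic is a single number merged by addition and uses a dictionary keyed by time tuples in place of the Mongo database. The names `rollups_due`, `previous_period` and `roll_up` are hypothetical.

```python
from datetime import datetime, timedelta


def rollups_due(now):
    """Return the levels whose previous period has just closed at `now`."""
    due = []
    # Each check is reached only when the finer one has hit its trigger value.
    if now.second == 0 and now.minute == 0:
        due.append("hour")
        if now.hour == 0:
            due.append("day")
            if now.day == 1:
                due.append("month")
                if now.month == 1:
                    due.append("year")
    return due


def previous_period(now, level):
    """Key (a tuple of time parts) of the period that ended at `now`."""
    if level == "hour":
        t = now - timedelta(hours=1)
        return (t.year, t.month, t.day, t.hour)
    if level == "day":
        t = now - timedelta(days=1)
        return (t.year, t.month, t.day)
    if level == "month":
        t = now - timedelta(days=1)  # last day of the previous month
        return (t.year, t.month)
    if level == "year":
        return (now.year - 1,)
    raise ValueError(f"unknown level: {level}")


def roll_up(store, now):
    """Merge finer records into coarser ones; `store` maps time tuples to totals."""
    for level in rollups_due(now):
        parent = previous_period(now, level)
        # Children are the keys exactly one level finer than the parent key.
        children = [v for k, v in store.items()
                    if len(k) == len(parent) + 1 and k[:len(parent)] == parent]
        store[parent] = sum(children)


store = {(2019, 9, 20, 12, m): 1 for m in range(60)}
roll_up(store, datetime(2019, 9, 20, 13, 0, 0))
print(store[(2019, 9, 20, 12)])  # 60
```

Because the levels are processed from fine to coarse, a day roll-up at midnight already sees the hour record that was produced a moment earlier in the same pass.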
The data storage module is used for responding to data storage and reading requests of all the modules and storing and reading data;
in the embodiment of the present invention, the data storage module specifically provides:
1. An interface invoked by the other modules.
2. File operations.
2.1: Write file: obtains the interface parameters target file directory and file name.
2.2: Read file: obtains the interface parameters target file directory, file name and callback.
2.3: Delete file: obtains the interface parameters target file directory and file name.
3: Database operations.
3.1: Store data: obtains the interface parameters table name, condition and data.
3.2: Query data: obtains the interface parameters table name and condition.
3.3: Delete data: obtains the interface parameters table name and condition.
Remark: the operation target, the condition and the data to be written must be specified when the storage module interface is called.
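The interface listed above might look roughly like the following sketch. It keeps everything in memory instead of using a hard disk and a Mongo database, and it assumes that "store data" updates the first record matching the condition or inserts a new one; that upsert behaviour, the class name `Storage` and all method names are assumptions made for illustration.

```python
class Storage:
    """In-memory stand-in for the data storage module (illustrative only)."""

    def __init__(self):
        self.files = {}   # (directory, file_name) -> bytes
        self.tables = {}  # table name -> list of dict records

    @staticmethod
    def _matches(record, condition):
        return all(record.get(k) == v for k, v in condition.items())

    # File operations (2.1 to 2.3)
    def write_file(self, directory, file_name, data):
        self.files[(directory, file_name)] = data

    def read_file(self, directory, file_name, callback):
        # The content is handed to the caller through the callback.
        callback(self.files[(directory, file_name)])

    def delete_file(self, directory, file_name):
        self.files.pop((directory, file_name), None)

    # Database operations (3.1 to 3.3)
    def save(self, table, condition, data):
        rows = self.tables.setdefault(table, [])
        for row in rows:
            if self._matches(row, condition):
                row.update(data)  # assumed upsert: update the matching record
                return
        rows.append({**condition, **data})

    def query(self, table, condition):
        return [r for r in self.tables.get(table, []) if self._matches(r, condition)]

    def delete(self, table, condition):
        rows = self.tables.get(table, [])
        self.tables[table] = [r for r in rows if not self._matches(r, condition)]


storage = Storage()
storage.save("hour_stats", {"hour": 12}, {"total": 60})
print(storage.query("hour_stats", {"hour": 12}))  # [{'hour': 12, 'total': 60}]
```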
And the statistical query module is used for performing time division according to the input instruction of the external service module and time period information in the input instruction, constructing a report query instruction according to the result of the time division and the data field filtering condition and the field grouping condition in the input instruction, performing statistical report query according to the report query instruction and returning a query result.
In the embodiment of the present invention, further, the statistical query module specifically includes:
an input instruction acquisition unit, configured to acquire an input instruction of the external service module;
the time grouping unit is used for carrying out time division according to the time period information and the time grouping field in the input instruction; wherein the result of the time slicing comprises a start time segment group, a middle time segment group and an end time segment group;
the time starting segment group is a starting time segment of a time dimension which is lower than the time dimension of the time grouping field, the starting time segment can comprise a plurality of time segments specified by incomplete time grouping fields, and the ending time point of the time starting segment group is the ending time specified by the time grouping field;
the time middle section grouping is the middle time section of the time dimension where the time grouping field is located, and comprises time sections appointed by a plurality of complete time grouping fields;
the time end period grouping is the end time period of the time dimension which is next to the time dimension of the time grouping field, the end time period can comprise a plurality of time periods specified by incomplete time grouping fields, and the starting time point of the time end period grouping is the starting time specified by the time grouping field;
the query instruction construction unit is used for constructing a report query instruction according to the time division result and the data field filtering condition and the field grouping condition in the input instruction; wherein the filtering condition comprises a statistical condition, a condition group, a data statistic and a statistical identifier;
and the report query unit is used for performing statistical report query according to the report query instruction, merging the queried report data and returning the merged report data.
In a specific embodiment, the statistical query module works as follows:
1: Parse the instruction and check its format.
2: Time grouping: divide the time according to the time period and the time grouping condition in the input instruction.
2.1: Traverse by year: starting from the start year of the instruction's input time, increment the year until it equals the end year of the instruction's input time.
2.2: When the current time interval covers a full year and the grouping is by year, enter the query data storage submodule.
2.3: When the current time interval does not cover a full year, or the grouping is not by year, traverse by month.
2.4: Starting from the start month of the input time within the year being traversed, increment the month until it equals the end month.
2.5: When the current time interval covers a full month and the grouping is by year or month, enter the query data storage submodule.
2.6: When the current time interval does not cover a full month, or the grouping is neither by year nor by month, traverse by day.
2.7: Starting from the start day of the input time within the month being traversed, increment the day until it equals the end day.
2.8: When the current time interval covers a full day and the grouping is by year, month or day, enter the query data storage submodule.
2.9: When the current time interval does not cover a full day, or the grouping is not by year, month or day, traverse by hour.
2.10: Starting from the start hour of the input time within the day being traversed, increment the hour until it equals the end hour.
2.11: When the current time interval covers a full hour and the grouping is by year, month, day or hour, enter the query data storage submodule.
2.12: When the current time interval does not cover a full hour, or the grouping is not by year, month, day or hour, traverse by minute.
2.13: Starting from the start minute of the input time within the hour being traversed, increment the minute until it equals the end minute.
2.14: When the grouping is by year, month, day, hour or minute, enter the query data storage submodule.
Remark 1: the grouping condition (year, month, day, hour or minute) is supplied in the statistical query instruction.
Remark 2: the smallest grouping unit is the minute, so step 2.14 enters the data storage submodule in all cases.
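The traversal in step 2 can be sketched as a greedy split of the query range into the coarsest pre-aggregated periods that fit. This is one possible reading of the steps above, under the assumptions that the range is the half-open interval from `start` to `end`, that both fall on whole minutes, and that a period may be used only if it is no coarser than the grouping unit. The names `split_range`, `floor_to` and `next_boundary` are hypothetical.

```python
from datetime import datetime, timedelta

UNITS = ["year", "month", "day", "hour", "minute"]  # coarse to fine


def floor_to(t, unit):
    """Truncate `t` to the start of the unit that contains it."""
    reset = {"year": {"month": 1, "day": 1, "hour": 0, "minute": 0},
             "month": {"day": 1, "hour": 0, "minute": 0},
             "day": {"hour": 0, "minute": 0},
             "hour": {"minute": 0},
             "minute": {}}[unit]
    return t.replace(second=0, microsecond=0, **reset)


def next_boundary(t, unit):
    """Start of the following unit; `t` must already be aligned to `unit`."""
    if unit == "year":
        return t.replace(year=t.year + 1)
    if unit == "month":
        return t.replace(year=t.year + t.month // 12, month=t.month % 12 + 1)
    return t + {"day": timedelta(days=1), "hour": timedelta(hours=1),
                "minute": timedelta(minutes=1)}[unit]


def split_range(start, end, group_by):
    """Split [start, end) into (unit, period_start) pieces, coarsest first fit."""
    allowed = UNITS[UNITS.index(group_by):]  # units no coarser than the grouping
    pieces, t = [], start
    while t < end:
        for unit in allowed:
            # Use a unit only if `t` starts one and the whole unit fits.
            if floor_to(t, unit) == t and next_boundary(t, unit) <= end:
                pieces.append((unit, t))
                t = next_boundary(t, unit)
                break
        else:
            raise ValueError("start and end must fall on whole minutes")
    return pieces


pieces = split_range(datetime(2019, 9, 20, 12, 58),
                     datetime(2019, 9, 22, 1, 0), "year")
print(len(pieces))  # 15
```

In this example the 15 pieces are two minute records, eleven hour records, one day record and one final hour record, which correspond to the start segment, middle segment and end segment of the time division.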
3: Conditional query.
3.1: Construct the condition filtering part of the data storage query instruction from the condition filter fields in the input instruction.
3.2: Construct the condition grouping part of the data storage query instruction from the condition grouping fields in the input instruction; the condition grouping itself is carried out by the internal mechanism of the Mongo database.
3.3: Construct the statistics part of the data storage query instruction from the statistics fields in the input instruction.
4: Merge data.
4.1: If the grouping condition in the input instruction equals the data query range, save the result as a separate node.
4.2: If the grouping condition in the input instruction is greater than the data query range, accumulate the result into one node.
4.3: If neither of the above two conditions is satisfied, merge the data by month group.
4.4: Construct the in-memory storage key: concatenate the grouping time, the filtering condition and the condition grouping, in that order, to form a hash key.
4.5: Execute the groupings in turn, through to the year grouping, until all data have been processed, then go to step 5.
Remark: when the grouping condition is year and the start and end times in the input instruction do not cover complete years, the monthly data are merged into the data of one year, so the results queried from the Mongo database need to be merged a second time.
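The second merge and the hash key of step 4.4 can be sketched as follows. The sketch assumes that each queried row carries a time tuple, a filter condition string, a condition group string and one numeric value merged by addition; the separator characters and the names `merge_key` and `merge_rows` are hypothetical.

```python
DEPTH = {"year": 1, "month": 2, "day": 3, "hour": 4, "minute": 5}


def merge_key(period, group_by, filter_cond, cond_group):
    """Concatenate grouping time, filter condition and condition group."""
    # Truncate the period to the grouping unit, e.g. (2019, 11) -> "2019".
    label = "-".join(str(part) for part in period[:DEPTH[group_by]])
    return "|".join([label, filter_cond, cond_group])


def merge_rows(rows, group_by):
    """Accumulate rows that share a key into one in-memory node."""
    merged = {}
    for period, filter_cond, cond_group, value in rows:
        key = merge_key(period, group_by, filter_cond, cond_group)
        merged[key] = merged.get(key, 0) + value
    return merged


rows = [
    ((2019, 11), "status=ok", "region=A", 5),  # month record of a partial year
    ((2019, 12), "status=ok", "region=A", 7),  # month record of a partial year
    ((2020,), "status=ok", "region=A", 30),    # record of a complete year
]
print(merge_rows(rows, "year"))
# {'2019|status=ok|region=A': 12, '2020|status=ok|region=A': 30}
```

The two month records of the incomplete year 2019 collapse into one node, while the complete year 2020 is kept as its own node.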
5: Return data.
5.1: Traverse the in-memory storage nodes.
5.2: Construct the JSON structure data.
5.3: Return the data through the RESTful interface.
5.4: Wait for the asynchronous data fetch.
5.5: Clear the statistical results.
In the embodiment of the present invention, further, the real-time statistical reporting system based on the time grouping accumulation algorithm further includes a data clearing module, configured to delete the corresponding report data according to a data clearing instruction of the external service module. The specific steps are as follows:
1: The RESTful interface receives an instruction from the external service module.
2: Parse the instruction and check its format; if the check passes, proceed to the next step, otherwise return an error.
3: Construct a source data deletion instruction according to the information of the source data acquisition module and the instruction data parameters.
4: Construct a statistical data deletion instruction according to the information of the source data acquisition module and the instruction data parameters.
5: Enter the data storage submodule to perform the data operation.
In the embodiment of the present invention, the real-time statistical reporting system based on the time grouping accumulation algorithm further includes a data resetting module, configured to reset the corresponding report data according to a data resetting instruction of the external service module. The specific steps are as follows:
1: The RESTful interface receives an instruction from the external service module.
2: Parse the instruction and check its format; if the check passes, proceed to the next step, otherwise return an error.
3: Construct a statistical data deletion instruction according to the information of the source data processing module and the instruction data parameters.
4: Enter the data storage submodule to perform the data operation.
5: Return to the source data processing module and regenerate the data.
Referring to fig. 2-3, for better illustrating the working principle of the present invention, the following examples are given:
It should be noted that the embodiment of the present invention implements data source processing and data query instructions: the system computes over the source data to generate the specified report data, which external callers can query in real time in multiple ways. Report data are generated through four steps (source data acquisition, source data processing, data accumulation and combination, and data storage) and are queried through the statistical query step. The final purpose is to obtain statistical report data for a given time period, to filter and combine the report data across multiple dimensions under specified conditions, and to return the query result as a JSON tree structure.
Step 1: obtaining source data
The source data acquisition step receives data instructions from outside and stores the source data; it is implemented by the following steps.
Step 1.1: the Restful interface receives an external service module input instruction, and the instruction format is as follows:
[Figures BDA0002334330450000101 and BDA0002334330450000111: instruction format; images not reproduced]
Data format: the format of the source data to be counted, i.e. the table name and the fields it contains.
Statistical conditions: the fields in the source data that are used as statistical conditions.
Data statistics: the fields in the source data from which statistics are computed; a field can be renamed, its action is either merging or counting the total number, and a condition can specify which data are included in the statistics.
Time field: the time at which the source data were generated.
Step 1.2: and analyzing the instruction and storing the instruction data for subsequent source data processing, accumulating the merged data and performing statistical query.
Step 1.2.1: and generating the statistical report owner identity and the unique identifier 'report identifier' for subsequent inquiry, deletion and reset.
Step 1.2.2: the instruction information and the report identifier are stored in a correlated mode, the hard disk of the embodiment stores the instruction information, and the storage condition of the hard disk is as follows.
…/report id/format
Step 1.3: and acquiring source data, and monitoring and reading the transmitted source data in real time by using a network.
Step 1.3.1: network acceptance preparation, thread starting and network socket monitoring.
Step 1.3.2: and returning to the step 1.1 instruction.
Figure BDA0002334330450000121
Step 1.4: Store the source data on low-cost server hard disks.
Step 1.4.1: Directory and file name structure. The time field in this example is "2019-09-20 12:32:55". One file is stored per minute: the year, month, day, and hour of the time field form the directory path and the minute forms the file name, as follows:
…/report ID/2019/09/20/12/32.data
Step 1.4.2: Write the data to hard disk through the data storage submodule.
Step 1.5: Start the data accumulation and merging module (step 3).
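The directory layout of step 1.4.1 can be illustrated with a minimal Python sketch. The root directory `/data` and the identifier `report-1` are hypothetical, since the text elides the leading part of the path; the function name is likewise only illustrative.

```python
from datetime import datetime
from pathlib import PurePosixPath

def source_file_path(root, report_id, time_field):
    """Return the per-minute source file for a record's time field."""
    t = datetime.strptime(time_field, "%Y-%m-%d %H:%M:%S")
    # Year, month, day and hour become directories; the minute is the file name.
    return PurePosixPath(root, report_id, t.strftime("%Y/%m/%d/%H"),
                         t.strftime("%M") + ".data")

# "/data" and "report-1" are placeholders, not values from the text.
print(source_file_path("/data", "report-1", "2019-09-20 12:32:55"))
```

All records whose time field falls within the same minute map to the same file, which is what lets step 2 process one file per minute.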
Step 2: source data processing
Source data format: "field 1, field 2, field 3, field 4"
Step 2.1: Read the source data format and the statistical information from the file stored in step 1.2.2; the report identifier is passed in when the module starts.
Step 2.2: Read the source data from the file stored in step 1.4.1. In this example the source data processing module starts executing at "2019-09-20 12:32:55"; the source data of minute 31 is processed during minute 32, the source data of minute 32 during minute 33, and so on.
Step 2.3: Parse the source data according to the source data format obtained in step 2.1. To generate the report data for minute 32, read all records from the previous minute's file and process them one by one.
Step 2.4: Generate the report data on the basis of the statistical information obtained in step 2.1. The report data generated in this example is as follows:
Figure BDA0002334330450000122
Figure BDA0002334330450000131
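Steps 2.3 and 2.4 amount to parsing each comma-separated record and accumulating a statistic per condition value. The sketch below is only an illustration under assumptions: the field names, the choice of `field2` as the grouping condition, and `field3` as the summed field are hypothetical, and the actual report layout is given in the figures above.

```python
from collections import defaultdict

def aggregate_minute(lines, fields, group_by, sum_field):
    """Sum `sum_field` per distinct value of `group_by` over one minute's records."""
    totals = defaultdict(float)
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines in the minute file
        values = [v.strip() for v in line.split(",")]
        record = dict(zip(fields, values))
        totals[record[group_by]] += float(record[sum_field])
    return dict(totals)

# Hypothetical records in the "field 1, field 2, field 3, field 4" layout.
fields = ["field1", "field2", "field3", "field4"]
lines = [
    "a, x, 2, 2019-09-20 12:32:55",
    "b, x, 3, 2019-09-20 12:32:58",
    "c, y, 5, 2019-09-20 12:32:59",
]
print(aggregate_minute(lines, fields, "field2", "field3"))
```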
Step 2.5: Store the statistical data in a Mongo database.
Step 2.5.1: Construct the Mongo storage table name from the file information read in step 2.2. The table in this step is a minute table:
Table_201909201232
Step 2.5.2: Enter the data storage submodule to write to the Mongo database.
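The table naming of step 2.5.1, together with the hour, day, month, and year tables written later in step 3, follows one pattern: the timestamp truncated to the table's granularity. A minimal sketch, with the function and dictionary names chosen here only for illustration:

```python
from datetime import datetime

# strftime pattern per granularity, matching the table names shown in the text.
TABLE_FORMATS = {
    "minute": "%Y%m%d%H%M",
    "hour": "%Y%m%d%H",
    "day": "%Y%m%d",
    "month": "%Y%m",
    "year": "%Y",
}

def table_name(t, granularity):
    """Return the statistics table name for time `t` at the given granularity."""
    return "Table_" + t.strftime(TABLE_FORMATS[granularity])

t = datetime(2019, 9, 20, 12, 32, 55)
for g in TABLE_FORMATS:
    print(g, table_name(t, g))
```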
Step 3: Accumulating and merging data
Step 3.1: Start the data accumulation and merging thread; this module is triggered by step 1.5.
Step 3.2: Start a time trigger: the thread started in step 3.1 polls in a loop to determine whether the current time is the start of a minute. In this example step 3.1 starts at "2019-09-20 12:32:55".
Step 3.2.1: When the current time is "2019-09-20 12:33:00", trigger step 2 and start the source data processing submodule.
Step 3.2.2: When the current time is "2019-09-20 13:00:00", compute the statistics for the hour from 2019-09-20 12:00:00 to 2019-09-20 13:00:00: read the 60 minute tables of that period, constructing the table names by the method of step 2.5.1.
Step 3.2.3: Parse the minute statistics by the method of step 2.4.
Step 3.2.4: Store the statistical data by the method of step 2.5. In this step the hour table is written, as follows:
Table_2019092012
Step 3.2.5: When the current time is "2019-09-21 00:00:00", compute the statistics for the day from 2019-09-20 00:00:00 to 2019-09-20 23:59:59: read the 24 hour tables of that period, constructing the table names by the method of step 2.5.1.
Step 3.2.6: Parse the hour statistics by the method of step 2.4.
Step 3.2.7: Store the statistical data by the method of step 2.5. In this step the day table is written, as follows:
Table_20190920
Step 3.2.8: When the current time is "2019-10-01 00:00:00", compute the statistics for the month from 2019-09-01 00:00:00 to 2019-09-30 23:59:59: read the 30 day tables of that period, constructing the table names by the method of step 2.5.1.
Step 3.2.9: Parse the day statistics by the method of step 2.4.
Step 3.2.10: Store the statistical data by the method of step 2.5. In this step the month table is written, as follows:
Table_201909
Step 3.2.11: When the current time is "2020-01-01 00:00:00", compute the statistics for the year from 2019-01-01 00:00:00 to 2019-12-31 23:59:59: read the 12 month tables of that period, constructing the table names by the method of step 2.5.1.
Step 3.2.12: Parse the month statistics by the method of step 2.4.
Step 3.2.13: Store the statistical data by the method of step 2.5. In this step the year table is written, as follows:
Table_2019
Remark: steps 3.2.1 to 3.2.13 are executed continuously by polling, with a sleep interval of 1 second between loop iterations.
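The trigger conditions of steps 3.2.1 to 3.2.13 can be summarised as a cascade: a coarser merge is due only when every finer time component has just rolled over. The following sketch is an assumption about how such a check might look, not the patented implementation; the action labels are illustrative.

```python
from datetime import datetime

def due_actions(now):
    """Return the actions triggered at `now`, finest level first."""
    actions = []
    if now.second != 0:
        return actions                                   # not at a minute boundary
    actions.append("process previous minute")            # step 3.2.1
    if now.minute == 0:
        actions.append("merge previous hour")            # steps 3.2.2 to 3.2.4
        if now.hour == 0:
            actions.append("merge previous day")         # steps 3.2.5 to 3.2.7
            if now.day == 1:
                actions.append("merge previous month")   # steps 3.2.8 to 3.2.10
                if now.month == 1:
                    actions.append("merge previous year")  # steps 3.2.11 to 3.2.13
    return actions

print(due_actions(datetime(2019, 9, 20, 13, 0, 0)))
```

A loop that sleeps for 1 second can drift past second 0, so a more defensive variant might remember the last minute it handled; the text does not describe such a refinement.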
Step 4: Data storage
This step encapsulates basic file reading and writing and Mongo data reading and writing; these functions are independent of the data content.
Step 5: Statistical query
The statistical query receives instructions from external callers and queries the statistical data. It is implemented by the following steps.
Step 5.1: The RESTful interface receives an input instruction from an external service module. The instruction format is as follows:
Figure BDA0002334330450000141
Figure BDA0002334330450000151
In this example, data for the period from "start time" to "end time" is retrieved: data satisfying the statistical conditions is read within the time range, grouped according to the "condition grouping", and the content specified by "data statistics" is returned.
In this example the statistical query instruction is constructed from the information in step 1.1.
The statistical query in this example must carry the statistical identifier field returned in step 1.3.2.
In this example, the statistical query instruction is issued at "2019-09-20 15:15:50".
Step 5.2: Read the report identifier from the instruction and read the report information according to step 1.2.1.
Step 5.3: Parse and validate the instruction; it must conform to the preset report information read in step 5.2.
Step 5.4: Group the span from "start time" to "end time" in the instruction according to the "time grouping" field of the instruction.
Remark: the time grouping in the instruction has five cases: year, month, day, hour, and minute.
Step 5.4.1: Start segment grouping. According to the "time grouping" field of the step 5.1 instruction, the start segment is as follows:
2019-09-20 09:31:00 to 2019-09-20 09:59:59
Step 5.4.2: Middle segment grouping. According to the "time grouping" field of the step 5.1 instruction, the middle segments are as follows:
2019-09-20 10:00:00 to 2019-09-20 10:59:59
2019-09-20 11:00:00 to 2019-09-20 11:59:59
2019-09-20 12:00:00 to 2019-09-20 12:59:59
These three hours are complete time periods that match the "time grouping".
Step 5.4.3: End segment grouping. According to the "time grouping" field of the step 5.1 instruction, the end segment is as follows:
2019-09-20 13:00:00 to 2019-09-20 13:30:59
Remark: the input times are usually arbitrary, so the first and last periods are incomplete when grouped by the "time grouping"; they must be subdivided into the next smaller "time grouping" and the results merged.
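The start, middle, and end segments of step 5.4 can be derived mechanically from the two input times. The sketch below handles only the "hour" grouping and assumes second-resolution timestamps with an inclusive end time, as in the example; the function name and return shape are illustrative.

```python
from datetime import datetime, timedelta

def split_by_hour(start, end):
    """Split the inclusive range [start, end] into (head, middle, tail).

    head and tail are partial hours (or None); middle lists the whole hours.
    Every piece is an inclusive (first_second, last_second) pair.
    """
    hour, second = timedelta(hours=1), timedelta(seconds=1)
    end_excl = end + second
    floor_start = start.replace(minute=0, second=0, microsecond=0)
    # First hour boundary at or after start, last hour boundary at or before the end.
    first_full = floor_start if floor_start == start else floor_start + hour
    last_full = end_excl.replace(minute=0, second=0, microsecond=0)
    if first_full > last_full:
        return (start, end), [], None      # the range lies inside a single hour
    head = (start, first_full - second) if start < first_full else None
    middle = []
    t = first_full
    while t < last_full:
        middle.append((t, t + hour - second))
        t += hour
    tail = (last_full, end) if last_full < end_excl else None
    return head, middle, tail

head, middle, tail = split_by_hour(datetime(2019, 9, 20, 9, 31, 0),
                                   datetime(2019, 9, 20, 13, 30, 59))
print(head, len(middle), tail)
```

For the example times this yields the 09:31:00 to 09:59:59 start segment, the three whole hours 10, 11, and 12, and the 13:00:00 to 13:30:59 end segment.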
Step 5.5: Construct the query instruction from the times divided in steps 5.4.1 to 5.4.3 and the "statistical condition", "condition grouping", "data statistics", and "statistical identifier" supplied in the instruction; the Mongo table names to query are determined by the divided times.
Start segment: query the 29 minute tables from Table_201909200931 to Table_201909200959 and merge them into hour data.
Middle segment: query the 3 hour tables Table_2019092010 to Table_2019092012; no data needs to be merged.
End segment: query the 31 minute tables from Table_201909201300 to Table_201909201330 and merge them into hour data.
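The tables to query for each segment can be enumerated from the segment boundaries. In this sketch the helper name is illustrative, the `Table_` prefix follows step 2.5.1, and both endpoints are counted, which gives 29 minute tables for 09:31 to 09:59 and 31 for 13:00 to 13:30.

```python
from datetime import datetime, timedelta

def tables_between(first, last, step, fmt):
    """List table names from `first` to `last` inclusive.

    `first` is assumed to be aligned to a `step` boundary.
    """
    names = []
    t = first
    while t <= last:
        names.append("Table_" + t.strftime(fmt))
        t += step
    return names

minute, hour = timedelta(minutes=1), timedelta(hours=1)
head = tables_between(datetime(2019, 9, 20, 9, 31), datetime(2019, 9, 20, 9, 59, 59),
                      minute, "%Y%m%d%H%M")
middle = tables_between(datetime(2019, 9, 20, 10, 0), datetime(2019, 9, 20, 12, 59, 59),
                        hour, "%Y%m%d%H")
tail = tables_between(datetime(2019, 9, 20, 13, 0), datetime(2019, 9, 20, 13, 30, 59),
                      minute, "%Y%m%d%H%M")
print(len(head), middle, len(tail))
```

Reading 3 hour tables instead of 180 minute tables for the middle segment is where the pre-aggregation of step 3 saves query work.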
The span from the start time to the end time covers 5 hours. The Mongo data query instruction is as follows:
{
  "filter condition": {
    "field 1": "field 1 value",
    "report identifier": "identifier 1"
  },
  "$group": {
    "_id": "$field 2",
    "field C": { "$sum": "$field C" }
  }
}
"$group" is a built-in instruction of the Mongo database that performs the condition grouping; "_id" carries "field 2", the "condition grouping" variable of the instruction; "field C" is the statistical target, and its data is summed by the "$sum" instruction of the Mongo database.
The source data in this example contains only one record, at "2019-09-20 12:32:55", so the Mongo database returns only one record, as follows:
Figure BDA0002334330450000171
Because the instruction is executed at "2019-09-20 15:15:50", by which time step 3 has already performed the hourly merge, only Table_2019092012 returns data; the other tables return none.
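The query instruction of step 5.5 has the shape of a MongoDB aggregation pipeline. The sketch below builds such a pipeline as plain Python data, under the assumption that the "filter condition" corresponds to a `$match` stage; the field names `field1`, `report_id`, `field2`, and `fieldC` are hypothetical stand-ins for the placeholders in the text.

```python
def build_pipeline(filters, group_field, sum_field):
    """Build an aggregation pipeline: filter, then group and sum."""
    return [
        {"$match": filters},                      # assumed form of the filter condition
        {"$group": {
            "_id": "$" + group_field,             # the condition grouping field
            sum_field: {"$sum": "$" + sum_field}, # the statistical target
        }},
    ]

pipeline = build_pipeline({"field1": "field 1 value", "report_id": "identifier 1"},
                          "field2", "fieldC")
print(pipeline)
```

With a driver such as PyMongo, a pipeline of this shape would typically be passed to `Collection.aggregate` once per table selected in the previous step.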
Step 5.5: merging data
In this example there is only one record, it comes from the hour table, and the "time grouping" is "hour", so no merging is needed.
If source data had been input in the range 2019-09-20 09:31:00 to 2019-09-20 09:59:59, the minute-level data of that period would need to be merged into the hour beginning at 2019-09-20 09:00:00.
The merge data operation is performed in memory.
In this example, the hour and the condition grouping together form the key, and the statistical data is accumulated into the hash table entry for that key.
Figure BDA0002334330450000172
Figure BDA0002334330450000181
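The in-memory merge described above can be sketched with a dictionary keyed by the hour and the condition grouping value. The row layout `(timestamp, group value, statistic)` is an assumption made for illustration; the actual structures are shown in the figures.

```python
from collections import defaultdict
from datetime import datetime

def merge_rows(rows):
    """Accumulate statistics into a hash table keyed by (hour, group value)."""
    merged = defaultdict(int)
    for ts, group, value in rows:
        hour = ts.replace(minute=0, second=0, microsecond=0)  # truncate to the hour
        merged[(hour, group)] += value
    return dict(merged)

# Hypothetical minute-level results plus one hour-level result.
rows = [
    (datetime(2019, 9, 20, 9, 31), "x", 2),
    (datetime(2019, 9, 20, 9, 45), "x", 3),
    (datetime(2019, 9, 20, 9, 50), "y", 1),
    (datetime(2019, 9, 20, 12, 0), "x", 7),
]
for key, total in merge_rows(rows).items():
    print(key, total)
```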
Step 5.6: returning data
In this example, the query result of the step 5.1 instruction is assembled into JSON-format data and returned along the return path of the RESTful instruction, in the following format:
Figure BDA0002334330450000182
Figure BDA0002334330450000191
Step 6: Clearing data
A clearing operation may be performed when statistics or source data are no longer used or when storage space is insufficient.
Step 6.1: The RESTful interface receives an input instruction from an external service module. The instruction format is as follows:
Figure BDA0002334330450000192
Step 6.2: Parse the instruction and obtain its fields.
Step 6.3: Divide the time into year, month, day, hour, and minute periods by the method of step 5.4.
Step 6.4: Construct the names of the Mongo data tables to delete from the result of step 6.3, distinguishing year, month, day, hour, and minute tables for multi-table deletion.
Step 6.5: Execute the Mongo data deletion instruction to delete the data. The instruction format is as follows:
Figure BDA0002334330450000193
Data whose "report identifier" field is "identifier 1" is deleted; in this example only Table_2019092012 contains data.
Step 6.7: Clear the source data: construct the directory structure and file name by the method of step 1.4.1. Source data is stored only in minute-level files; in this example the following file is deleted:
…/report ID/2019/09/20/12/32.data
Step 6.8: Execute the data storage submodule to delete the data.
Step 6.9: Return the deletion result, as follows:
Figure BDA0002334330450000201
Step 7: Resetting data
When the source data changes, report data can be regenerated according to the latest source data.
Step 7.1: The RESTful interface receives an input instruction from an external service module. The instruction format is as follows:
Figure BDA0002334330450000202
Step 7.2: Parse the instruction and obtain its fields.
Step 7.3: Perform steps 6.2, 6.3, and 6.4 to delete the original report data while keeping the source data.
Step 7.4: Set the current time to the "start time" supplied in the step 7.1 instruction.
Step 7.5: Start step 3 and re-execute the source data processing and data accumulation and merging modules.
Step 7.6: Return the result of the data reset:
Figure BDA0002334330450000203
In this example, the data returned by every instruction carries the "instruction result" and the "report identifier"; the remaining fields vary by instruction.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention.

Claims (6)

1. A real-time statistical reporting system based on a time grouping accumulation algorithm is characterized by comprising the following components:
the source data acquisition module is used for acquiring and storing the instruction data and the source data input by the external service module;
the source data processing module is used for reading the instruction data and the source data which are transmitted by the source data acquisition module, analyzing the source data according to the read source data format, and generating and storing report data according to the read source data statistical information;
the data accumulation and combination module is used for detecting each level of time dimension of the current time in real time when the source data are obtained, and accumulating and combining the statistical data of the previous moment of the previous level of time dimension and storing the statistical data when the time value is judged to reach the preset trigger value according to the next level of time dimension of the current time; wherein, the judging order of the time dimension is seconds, minutes, hours, days, months and years in turn;
the data storage module is used for responding to data storage and reading requests of all the modules and storing and reading data;
and the statistical query module is used for performing time division according to the input instruction of the external service module and time period information in the input instruction, constructing a report query instruction according to the result of the time division and the data field filtering condition and the field grouping condition in the input instruction, performing statistical report query according to the report query instruction and returning a query result.
2. The real-time statistical reporting system based on the time-grouping accumulation algorithm of claim 1, wherein the source data acquisition module specifically comprises:
the data acquisition unit is used for receiving the instruction data and the source data input by the external service module through a Restful interface;
the instruction checking unit is used for performing instruction analysis and format checking on the instruction data input by the external service module;
and the data storage unit is used for storing the instruction data and the source data.
3. The real-time statistical reporting system based on the time-grouping accumulation algorithm of claim 1, wherein the source data processing module specifically comprises:
the data reading unit is used for reading the instruction data and the source data which are transmitted by the source data acquisition module;
the source data analysis unit is used for analyzing the source data according to the read source data format;
the report generation unit is used for generating report data according to the read source data statistical information;
and the statistical data storage unit is used for constructing a data storage table name according to the file information obtained by analyzing the source data and storing the report data by using the data storage table name.
4. The real-time statistical reporting system based on the time-grouped accumulation algorithm of claim 1, further comprising a data removal module for deleting corresponding reporting data according to a data removal instruction of the external service module.
5. The real-time statistical reporting system based on the time-grouping accumulation algorithm of claim 1, further comprising a data resetting module for resetting the corresponding reporting data according to a data resetting instruction of the external service module.
6. The real-time statistical reporting system based on the time-grouping accumulation algorithm as claimed in claim 1, wherein said statistics query module specifically comprises:
an input instruction acquisition unit, configured to acquire an input instruction of the external service module;
the time grouping unit is used for carrying out time division according to the time period information and the time grouping field in the input instruction; wherein the result of the time slicing comprises a start time segment group, a middle time segment group and an end time segment group; the time starting segment group is a starting time segment of a time dimension which is lower than the time dimension of the time grouping field, and the ending time point of the time starting segment group is the ending time specified by the time grouping field; the time middle section is grouped into a middle time section of the time dimension where the time grouping field is located; the time end section group is the end time section of the time dimension which is next to the time dimension of the time grouping field, and the starting time point of the time end section group is the starting time specified by the time grouping field;
the query instruction construction unit is used for constructing a report query instruction according to the time division result and the data field filtering condition and the field grouping condition in the input instruction; wherein the filtering condition comprises a statistical condition, a condition group, a data statistic and a statistical identifier;
and the report query unit is used for performing statistical report query according to the report query instruction, merging the queried report data and returning the merged report data.
CN201911351721.6A 2019-12-24 2019-12-24 Real-time statistical report system based on time grouping accumulation algorithm Pending CN111125109A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911351721.6A CN111125109A (en) 2019-12-24 2019-12-24 Real-time statistical report system based on time grouping accumulation algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911351721.6A CN111125109A (en) 2019-12-24 2019-12-24 Real-time statistical report system based on time grouping accumulation algorithm

Publications (1)

Publication Number Publication Date
CN111125109A true CN111125109A (en) 2020-05-08

Family

ID=70502203

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911351721.6A Pending CN111125109A (en) 2019-12-24 2019-12-24 Real-time statistical report system based on time grouping accumulation algorithm

Country Status (1)

Country Link
CN (1) CN111125109A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112395372A (en) * 2020-12-10 2021-02-23 四川长虹电器股份有限公司 Quick statistical method based on two-dimensional table of relational database system
CN114138868A (en) * 2021-12-03 2022-03-04 中科三清科技有限公司 Method and device for drawing air quality statistical distribution map
CN114416817A (en) * 2021-12-21 2022-04-29 北京镁伽科技有限公司 Method, apparatus, device, system and storage medium for processing data

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104424229A (en) * 2013-08-26 2015-03-18 腾讯科技(深圳)有限公司 Calculating method and system for multi-dimensional division
CN104657446A (en) * 2015-02-04 2015-05-27 深圳市汇朗科技有限公司 Combined statistical query method, combined statistical query device and combined statistical query system for secondary tables

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200508