CN106919566A

CN106919566A - Query and statistics method and system for massive data

Info

Publication number: CN106919566A
Application number: CN201510983031.8A
Authority: CN
Inventors: 杨梦帆; 张学军
Original assignee: Aisino Corp
Current assignee: Aisino Corp
Priority date: 2015-12-24
Filing date: 2015-12-24
Publication date: 2017-07-04

Abstract

The present invention relates to a query and statistics method and system for massive data. Business data is partitioned by enterprise ID and by the minimum statistical time unit, and the statistics process is split into multiple processes that run concurrently. Combined with measures such as dispersing business operations, distributed statistics, and the use of idle time, the object of the user's statistical operations is changed from the original table to an intermediate table, thereby improving query and statistics efficiency.

Description

A query and statistics method and system for massive data

Technical field

The present invention relates to the technical field of data processing, and in particular to a query and statistics method and system for massive data.

Background

Massive data refers to data of very large volume. At present, most applications are connected to a database and obtain the expected data through operations such as queries. When the data volume reaches a certain level, when many query conditions must be satisfied, or when many users query online simultaneously, query statistics against the database generally take a long time, which incurs a high time cost. In addition, the performance of the database system and its ability to organize and manage data are greatly weakened, and memory overflow or a system crash may even result.

Several methods are commonly used at present to address query statistics over massive data: (1) Building indexes extensively. For a large table, corresponding indexes are created on the fields used for grouping, sorting, and the like; composite indexes are generally also created, taking into account the fill factor of the index and the choice between clustered and nonclustered indexes. (2) Optimizing the query SQL statements. When massive data is queried, the performance of the SQL statement strongly affects query efficiency, so it is necessary to write efficient SQL scripts, for example by reducing joins, using cursors sparingly or not at all, and designing efficient database table structures. (3) Increasing virtual memory. If system resources are limited and the system reports insufficient memory, the problem can be addressed by increasing virtual memory. These common methods can reduce the time spent on massive data queries to some extent, but for data of tens of millions or even hundreds of millions of records, queries from the front-end interface still run slowly, and a good user experience is difficult to achieve.

To overcome the above problems, Chinese invention patent publication No. CN103761251A discloses a method for storing and looking up large volumes of customer information, comprising: (1) data storage: when the acquired customer profile data is stored in a multi-dimensional table structure, unique identification information is created for each data item; (2) index file creation: an index file is generated with the identification information as the key and the location parameter corresponding to the identification information as the value; (3) data lookup: according to the identification information of the data to be found, the identification information is searched for in the customer information index file to obtain the corresponding location parameter, and the data element is then obtained from the multi-dimensional table structure according to the location parameter. The above method stores and queries customer profile data through index files rather than through a database. Although it improves lookup speed to some degree, the execution efficiency of statistical queries is still seriously affected when many users query online simultaneously.

Summary of the invention

Therefore, the technical problem to be solved by the present invention is to overcome the problem in the prior art that simultaneous online queries by many users reduce execution efficiency, and thereby to provide a query and statistics method and system for massive data that can substantially improve execution efficiency.

To solve the above technical problem, the present invention provides a query and statistics method for massive data, comprising the following steps: Step S1: store the original data uploaded by multiple users in an original table; Step S2: according to predicted query conditions of the user terminal, store the relevant data among all the original data in the original table in an incremental log table; Step S3: perform summary statistics on the relevant data in the incremental log table and write the summary statistics into a summary table, thereby completing the summary statistics of the data; Step S4: according to the query conditions of the user terminal, output the query result from the summary table, thereby completing the output and display of the data.

In one embodiment of the present invention, before step S1, the user terminal uploads a file by means of an upload tool, the file containing the original data; the server decrypts the original data from the received file and then writes the original data into the original table.

In one embodiment of the present invention, after step S2, if new data is entered into the original table, a trigger on the original table stores the new data in the incremental log table.

In one embodiment of the present invention, after step S2, the relevant data in the original table that has already been stored in the incremental log table is marked or deleted.

In one embodiment of the present invention, the original data is commodity data, which includes commodity codes and commodity details; in step S3, the summary statistics on the relevant data in the incremental log table are performed as follows: the commodity data in the incremental log table is aggregated over the commodity details by commodity code, and the summarized statistical information for each commodity is written into an intermediate table for data summarization.

In one embodiment of the present invention, the data in the intermediate table is summarized by day and by month, the insertion or update of the results is completed, and the processed data is stored in the summary table.

The present invention also provides a query and statistics system for massive data, comprising: a receiving module, configured to receive the original data uploaded by multiple user terminals and to transmit the original data to a definition module; the definition module, configured to extract relevant data from the original data transmitted by the receiving module according to predicted query conditions of the user terminal, and to send the extracted relevant data to a summary module; the summary module, configured to perform summary statistics on the relevant data sent by the definition module and to send the summarized data to a query optimization module; and the query optimization module, configured to extract the corresponding data from the summarized data sent by the summary module according to the query conditions of the user terminal, and to output the extracted data to the user terminal.

In one embodiment of the present invention, the predicted query conditions are provided by the user terminal or set by the system server.

In one embodiment of the present invention, the summary statistics performed by the summary module include: analyzing and aggregating the relevant data sent by the definition module according to data type and date conditions, and completing the insertion or update of the data.

In one embodiment of the present invention, the analysis and aggregation by date conditions that the summary module performs on the relevant data sent by the definition module means that the statistics are completed recursively by day and by month.

Compared with the prior art, the above technical solution of the present invention has the following advantages:

The present invention decomposes the statistical query of massive data into two processes: statistical computation and summarization, and real-time query. The statistical computation and summarization process runs in the background and summarizes only incremental data; the query conditions of the user terminal are then compiled and the query is executed to obtain the result, which greatly improves the execution efficiency of statistical queries over massive data.

Brief description of the drawings

To make the content of the present invention easier to understand, the present invention is described in further detail below according to specific embodiments and with reference to the accompanying drawings, in which:

Fig. 1 is a flow chart of the massive data query method of the present invention;

Fig. 2 is an architecture diagram of the massive data query system of the present invention.

Detailed description of the embodiments

Embodiment one:

As shown in Fig. 1, this embodiment provides a query and statistics method for massive data, comprising the following steps: Step S1: store the original data uploaded by multiple users in an original table; Step S2: according to predicted query conditions of the user terminal, store the relevant data among all the original data in the original table in an incremental log table; Step S3: perform summary statistics on the relevant data in the incremental log table and write the summary statistics into a summary table, thereby completing the summary statistics of the data; Step S4: according to the query conditions of the user terminal, output the query result from the summary table, thereby completing the output and display of the data.

In the query and statistics method for massive data of this embodiment, in step S1 the original data uploaded by multiple users is stored in the original table, which facilitates subsequent processing of the data. In step S2, the relevant data among all the original data in the original table is stored in the incremental log table according to the predicted query conditions of the user terminal. The data in the incremental log table comes from the original table and has been screened according to the user terminal's query conditions; the incremental log table stores the screened data temporarily. In this step the data can be processed during background idle time after it has been received, so the operating efficiency of the system is not affected. In step S3, summary statistics are performed on the relevant data in the incremental log table and written into the summary table, completing the summary statistics of the data. In this step the data continues to be processed in the background after the user's data has been received, and the statistics are continuously updated through summarization. This makes full use of the idle time of the database and the server and greatly improves the processing efficiency for massive data, so that users of the system can obtain the desired query results without long waits when querying statistical data, which improves working efficiency. In addition, even after new original data is stored in the original table, if the earlier data has already been processed, the incremental log table handles only the new data, which also improves processing efficiency. In step S4, the query result is output from the summary table according to the query conditions of the user terminal, completing the output and display of the data. Because the data has already been classified and aggregated according to the users' query conditions, the execution efficiency of statistical queries is not affected even when many users query online simultaneously. Furthermore, when a query is issued from the front-end interface, the query result is obtained directly from the summary table, which reduces the time spent on full-table queries and improves operating efficiency.
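Steps S1 to S4 can be sketched roughly as follows. This is a minimal illustration rather than the patented implementation: it uses Python's built-in sqlite3 module in place of the unnamed production database, and the table names, column names, the `logged` flag, and the use of a list of enterprise IDs as the predicted query condition are all assumptions made for the example.

```python
import sqlite3


def create_tables():
    """Create hypothetical original, incremental log, and summary tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE original_table (
            id INTEGER PRIMARY KEY, enterprise_id TEXT,
            commodity_code TEXT, amount REAL, logged INTEGER DEFAULT 0);
        CREATE TABLE incremental_log (
            id INTEGER PRIMARY KEY, enterprise_id TEXT,
            commodity_code TEXT, amount REAL);
        CREATE TABLE summary_table (
            enterprise_id TEXT, commodity_code TEXT, total REAL,
            PRIMARY KEY (enterprise_id, commodity_code));
    """)
    return conn


def step_s1(conn, rows):
    """S1: store uploaded (enterprise_id, commodity_code, amount) rows."""
    conn.executemany(
        "INSERT INTO original_table (enterprise_id, commodity_code, amount) "
        "VALUES (?, ?, ?)", rows)


def step_s2(conn, predicted_enterprise_ids):
    """S2: copy not-yet-logged rows matching the predicted condition."""
    marks = ",".join("?" for _ in predicted_enterprise_ids)
    where = f"logged = 0 AND enterprise_id IN ({marks})"
    conn.execute(
        "INSERT INTO incremental_log "
        "SELECT id, enterprise_id, commodity_code, amount "
        f"FROM original_table WHERE {where}", predicted_enterprise_ids)
    # Mark the copied rows so that they are not captured a second time.
    conn.execute(f"UPDATE original_table SET logged = 1 WHERE {where}",
                 predicted_enterprise_ids)


def step_s3(conn):
    """S3: aggregate the incremental log into the summary table."""
    rows = conn.execute(
        "SELECT enterprise_id, commodity_code, SUM(amount) "
        "FROM incremental_log GROUP BY enterprise_id, commodity_code").fetchall()
    for enterprise_id, code, total in rows:
        cur = conn.execute(
            "UPDATE summary_table SET total = total + ? "
            "WHERE enterprise_id = ? AND commodity_code = ?",
            (total, enterprise_id, code))
        if cur.rowcount == 0:  # no existing row: insert instead of update
            conn.execute("INSERT INTO summary_table VALUES (?, ?, ?)",
                         (enterprise_id, code, total))
    conn.execute("DELETE FROM incremental_log")  # the increment is consumed


def step_s4(conn, enterprise_id):
    """S4: answer a user query from the summary table only."""
    return conn.execute(
        "SELECT commodity_code, total FROM summary_table "
        "WHERE enterprise_id = ? ORDER BY commodity_code",
        (enterprise_id,)).fetchall()
```

In this sketch a second upload only adds its own rows to the incremental log, so `step_s3` touches the increment rather than the whole original table, and `step_s4` never reads the original table at all.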

Before step S1, the user terminal uploads a file by means of an upload tool, the file containing the original data; the server decrypts the original data from the received file and then writes it into the original table. In step S1, the original table is, for example, PTFPXX, which stores an enterprise's merchandise sales records, or the source table PTFPSPMX, which stores the enterprise's sales details. In step S2, the relevant data among all the original data in the original table is stored in the incremental log table according to the predicted query conditions of the user terminal, where the predicted query conditions are provided by the user terminal or set by the system server. For example, the customer's query conditions are analyzed, different SQL (Structured Query Language) statements are executed according to the query conditions, and the execution plan is optimized, so that the data among all the original data in the original table that satisfies the query conditions is stored in the incremental log table. This data is processed during background idle time, which improves the processing capability of the system. After step S2, if new data is entered into the original table, the trigger on the original table automatically stores the new data in the incremental log table, and the relevant data in the original table that has already been stored in the incremental log table is promptly marked or deleted. Only the new data then needs to be processed, and the new data can be captured and processed in batches by time, for example hourly, which improves the processing efficiency for massive data. Specifically, three triggers, for insertion, modification, and deletion, are created on the original table from which data is to be extracted. Whenever the data in the original table changes, the corresponding trigger writes the changed data (the key value of the updated row and the type of update operation) into an incremental log table, and the new data can be captured and processed in batches by time, for example hourly. The relevant data in the original table that has been stored in the incremental log table is then promptly marked or deleted, which facilitates the processing of the new data and improves the processing efficiency of the system.
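The three triggers described above might look like the following sketch. It assumes SQLite (through Python's sqlite3 module) rather than whatever database the applicant used, and the table names, column names, trigger names, and the one-letter operation codes are hypothetical. Each trigger records only the key value of the changed row and the type of operation, as the text describes.

```python
import sqlite3


def create_tracked_db():
    """Create an original table whose changes are logged by three triggers."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE original_table (
            id INTEGER PRIMARY KEY, commodity_code TEXT, amount REAL);

        -- Hypothetical incremental log: changed key plus operation type.
        CREATE TABLE incremental_log (
            log_id INTEGER PRIMARY KEY AUTOINCREMENT,
            row_key INTEGER, op TEXT);

        CREATE TRIGGER trg_original_insert AFTER INSERT ON original_table
        BEGIN
            INSERT INTO incremental_log (row_key, op) VALUES (NEW.id, 'I');
        END;

        CREATE TRIGGER trg_original_update AFTER UPDATE ON original_table
        BEGIN
            INSERT INTO incremental_log (row_key, op) VALUES (NEW.id, 'U');
        END;

        CREATE TRIGGER trg_original_delete AFTER DELETE ON original_table
        BEGIN
            INSERT INTO incremental_log (row_key, op) VALUES (OLD.id, 'D');
        END;
    """)
    return conn


def captured_changes(conn):
    """Return the logged (row_key, op) pairs in the order they occurred."""
    return conn.execute(
        "SELECT row_key, op FROM incremental_log ORDER BY log_id").fetchall()


if __name__ == "__main__":
    db = create_tracked_db()
    db.execute("INSERT INTO original_table VALUES (1, 'C1', 10.0)")
    db.execute("UPDATE original_table SET amount = 12.0 WHERE id = 1")
    db.execute("DELETE FROM original_table WHERE id = 1")
    print(captured_changes(db))
```

A background job could then read this log in hourly batches and fetch the changed rows by key, so the original table is never rescanned in full.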

In this embodiment the original data is commodity data, which includes commodity codes and commodity details. In step S3, the summary statistics on the relevant data in the incremental log table are performed as follows: the commodity data in the incremental log table is aggregated over the commodity details by commodity code, and the summarized statistical information for each commodity is written into an intermediate table for data summarization. The data in the intermediate table is then summarized by day and by month, the insertion or update of the results is completed, and the processed data is stored in the summary table, completing the summary statistics of the data. For example, the data in the incremental log table is analyzed and aggregated according to data type and date conditions and written into the intermediate table by calling a data-processing stored procedure. The incremental log table is aggregated over the commodity details by commodity code, and the query statistics for each commodity are written into the intermediate table, with operations including data accumulation, deduplication, and verification. The data in the intermediate table is then summarized in chronological order, and the insertion or update of the data is completed to form the summary table. The relevant data in the intermediate table and in the summary table is compared on fields such as enterprise ID and commodity code to complete the insertion or update of the data, and the statistics by day and by month are completed recursively. Because all processing is invoked automatically as background stored procedures, the idle time of the database and the server can be fully used, which greatly improves the processing efficiency for massive data, so that users of the system can obtain the desired query results without long waits when querying statistical data, which improves working efficiency. In addition, business data is partitioned by enterprise ID and by the minimum statistical time unit, and the statistics process is split into multiple processes that run concurrently; combined with measures such as dispersing business operations, distributed statistics, and the use of idle time, the object of the user's statistical operations is changed from the original table to the intermediate table, which improves query and statistics efficiency.
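The flow from incremental log to intermediate table to summary table, with insert-or-update by day and by month, can be illustrated as below. This is a sketch under stated assumptions: the source performs this work in stored procedures of an unnamed database, whereas the example uses SQLite from Python; the schema, the `period` column that holds either a `YYYY-MM-DD` day or a `YYYY-MM` month, and the function names are invented for illustration.

```python
import sqlite3


def create_summary_db():
    """Create hypothetical incremental log, intermediate, and summary tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE incremental_log (
            enterprise_id TEXT, commodity_code TEXT,
            sale_date TEXT, amount REAL);
        CREATE TABLE intermediate_table (
            enterprise_id TEXT, commodity_code TEXT,
            sale_date TEXT, amount REAL);
        CREATE TABLE summary_table (
            enterprise_id TEXT, commodity_code TEXT,
            period TEXT, total REAL,
            PRIMARY KEY (enterprise_id, commodity_code, period));
    """)
    return conn


def insert_or_update(conn, enterprise_id, code, period, amount):
    """Compare on enterprise ID, commodity code, and period; update or insert."""
    cur = conn.execute(
        "UPDATE summary_table SET total = total + ? "
        "WHERE enterprise_id = ? AND commodity_code = ? AND period = ?",
        (amount, enterprise_id, code, period))
    if cur.rowcount == 0:
        conn.execute("INSERT INTO summary_table VALUES (?, ?, ?, ?)",
                     (enterprise_id, code, period, amount))


def summarize_increment(conn):
    """Aggregate the log by commodity code, then roll up by day and month."""
    conn.execute("DELETE FROM intermediate_table")
    conn.execute(
        "INSERT INTO intermediate_table "
        "SELECT enterprise_id, commodity_code, sale_date, SUM(amount) "
        "FROM incremental_log "
        "GROUP BY enterprise_id, commodity_code, sale_date")
    conn.execute("DELETE FROM incremental_log")
    rows = conn.execute(
        "SELECT enterprise_id, commodity_code, sale_date, amount "
        "FROM intermediate_table").fetchall()
    for enterprise_id, code, day, amount in rows:
        insert_or_update(conn, enterprise_id, code, day, amount)      # daily
        insert_or_update(conn, enterprise_id, code, day[:7], amount)  # monthly


def totals(conn, enterprise_id, code):
    """Return {period: total} for one commodity of one enterprise."""
    return dict(conn.execute(
        "SELECT period, total FROM summary_table "
        "WHERE enterprise_id = ? AND commodity_code = ?",
        (enterprise_id, code)).fetchall())
```

Calling `summarize_increment` again after more rows arrive in the log adds only the new amounts to the existing daily and monthly rows, which is the incremental behaviour the paragraph above describes.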

In step S4, query statistics are performed on the summary table according to the query conditions provided by the user terminal, and the output and display of the data is completed. When massive data is processed with the above method, the statistical computation and summarization process runs in the background. The background processing of incremental data is completed efficiently through steps such as capturing the incremental data, optimizing the scheduling and execution plan for incremental capture, and periodically summarizing the incremental data. When a query is issued from the front-end interface, the system optimizes the real-time query conditions and obtains the query result directly from the summary table. This not only reduces the time spent on full-table queries and improves operating efficiency, but also offers good flexibility, strong extensibility, ease of use, and high efficiency, and makes the most of the computer's software and hardware resources.

Embodiment two:

As shown in Fig. 2, this embodiment provides a query and statistics system for massive data, comprising: a receiving module, configured to receive the original data uploaded by multiple user terminals and to transmit the original data to a definition module; the definition module, configured to extract relevant data from the original data transmitted by the receiving module according to predicted query conditions of the user terminal, and to send the extracted relevant data to a summary module; the summary module, configured to perform summary statistics on the relevant data sent by the definition module and to send the summarized data to a query optimization module; and the query optimization module, configured to extract the corresponding data from the summarized data sent by the summary module according to the query conditions of the user terminal, and to output the extracted data to the user terminal.

The query and statistics system for massive data of this embodiment comprises a receiving module, a definition module, a summary module, and a query optimization module. The receiving module receives data from multiple user terminals. Specifically, user terminal 1, user terminal 2, user terminal 3, ... user terminal n each send data to the receiving module, which receives the original data uploaded by each user terminal and transmits it to the definition module. The definition module extracts relevant data from the original data transmitted by the receiving module according to the predicted query conditions of the user terminal and sends the extracted relevant data to the summary module. The summary module performs summary statistics on the relevant data sent by the definition module and sends the summarized data to the query optimization module. The query optimization module performs query statistics on the summarized data according to the query conditions of the user terminal, where the query conditions are provided by the user terminal, and returns the query results to the respective user terminals, such as user terminal 1, user terminal 2, user terminal 3, ... user terminal n. Because the system processes data in the background after the user's data has been received and continuously updates the statistics through summarization, it makes full use of the idle time of the database and the server and greatly improves the processing efficiency for massive data, so that users of the system can obtain the desired query results without long waits when querying statistical data, which improves working efficiency. In addition, the system optimizes the original data according to the predicted query conditions of the user terminal and performs summary statistics, so the execution efficiency of statistical queries is not affected even when many users query online simultaneously.
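The data flow between the four modules can be pictured with the following in-memory sketch. The class and method names, the record layout, and the representation of the predicted query condition as a Python predicate are assumptions for illustration only; the patent does not specify how the modules are implemented.

```python
from collections import defaultdict


class SummaryModule:
    """Keeps running totals per (enterprise_id, commodity_code)."""

    def __init__(self):
        self.totals = defaultdict(float)

    def summarize(self, records):
        for rec in records:
            key = (rec["enterprise_id"], rec["commodity_code"])
            self.totals[key] += rec["amount"]


class DefinitionModule:
    """Extracts the records that match the predicted query condition."""

    def __init__(self, predicted_condition, summary_module):
        self.predicted_condition = predicted_condition
        self.summary_module = summary_module

    def extract(self, records):
        relevant = [rec for rec in records if self.predicted_condition(rec)]
        self.summary_module.summarize(relevant)


class ReceivingModule:
    """Accepts uploads from any user terminal and forwards them."""

    def __init__(self, definition_module):
        self.definition_module = definition_module

    def receive(self, records):
        self.definition_module.extract(records)


class QueryOptimizationModule:
    """Answers user queries from the summarized data only."""

    def __init__(self, summary_module):
        self.summary_module = summary_module

    def query(self, enterprise_id):
        return {code: total
                for (ent, code), total in self.summary_module.totals.items()
                if ent == enterprise_id}


def build_system(predicted_condition):
    """Wire the four modules together and return the two entry points."""
    summary = SummaryModule()
    definition = DefinitionModule(predicted_condition, summary)
    return ReceivingModule(definition), QueryOptimizationModule(summary)
```

For example, `receiver, queries = build_system(lambda rec: rec["amount"] > 0)` creates a system that summarizes only positive amounts; each terminal then calls `receiver.receive(...)` to upload and `queries.query(...)` to read precomputed totals.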

In this embodiment, the definition module generates the corresponding SQL (Structured Query Language) statements according to the predicted query conditions of the user terminal, analyzes the scheduling and execution plan of each query SQL statement, optimizes the SQL through means such as indexes and ROWID (a string with a regular format), and analyzes the SQL execution plan. The predicted query conditions are provided by the user terminal or set by the system server. Optimizing SQL through indexes, ROWID, and similar means makes the analysis more effective; for SQL statements that the CBO (cost-based optimizer) does not optimize well, hints are used to assist optimization. This helps in analyzing the customer's query conditions and executing different SQL statements according to those conditions.
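As a rough illustration of generating SQL from predicted query conditions and then inspecting the execution plan, the sketch below builds a parameterized query and asks SQLite for its plan with `EXPLAIN QUERY PLAN`. SQLite stands in for the database here; the CBO and optimizer hints mentioned above belong to other database products and are not shown. The table name, column whitelist, and function names are hypothetical.

```python
import sqlite3

# Columns that a predicted query condition may refer to (an assumption).
ALLOWED_COLUMNS = {"enterprise_id", "commodity_code", "sale_date"}


def build_query(conditions):
    """Turn {column: value} equality conditions into (sql, params)."""
    clauses, params = [], []
    for column, value in sorted(conditions.items()):
        if column not in ALLOWED_COLUMNS:
            raise ValueError(f"unsupported query column: {column}")
        clauses.append(f"{column} = ?")
        params.append(value)
    where = " AND ".join(clauses) if clauses else "1 = 1"
    sql = f"SELECT commodity_code, amount FROM original_table WHERE {where}"
    return sql, params


def execution_plan(conn, sql, params):
    """Return the plan detail strings that SQLite reports for the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]  # the last column holds the detail text


def create_indexed_table():
    """Create a hypothetical original table with an index on enterprise_id."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE original_table (
            enterprise_id TEXT, commodity_code TEXT,
            sale_date TEXT, amount REAL);
        CREATE INDEX idx_original_enterprise
            ON original_table (enterprise_id);
    """)
    return conn


if __name__ == "__main__":
    db = create_indexed_table()
    query, args = build_query({"enterprise_id": "E1"})
    print(query)
    print(execution_plan(db, query, args))
```

Checking the plan before scheduling a statement is one way to confirm that a predicted condition is served by an index rather than by a full-table scan.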

The summary statistics performed by the summary module include: analyzing and aggregating the relevant data sent by the definition module according to data type and date conditions, and completing the insertion or update of the data. Specifically, the summary statistics on the relevant data in the original data include statistics by category and statistics by time. The relevant data in the original data is analyzed and aggregated according to data type and date conditions by calling a data-processing stored procedure, with operations including data accumulation, deduplication, and verification, and the insertion or update of the data is completed by comparing the data. The analysis and aggregation by date conditions that the summary module performs on the relevant data sent by the definition module means that the statistics are completed recursively by day and by month. This makes full use of the idle time of the database and the server and greatly improves the processing efficiency for massive data, so that users of the system can obtain the desired query results without long waits when querying statistical data, which improves working efficiency.
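The accumulation, deduplication, and verification operations, followed by the roll-up from days to months, can be sketched in plain Python as follows. The source does not say how deduplication or verification is performed, so the `record_id` field used to detect duplicates and the consistency check between daily and monthly totals are assumptions chosen for this example.

```python
from collections import defaultdict


def summarize_records(records):
    """Deduplicate, accumulate by day, roll up by month, and verify.

    Each record is a dict with the hypothetical keys 'record_id',
    'commodity_code', 'date' (YYYY-MM-DD), and 'amount'.
    """
    seen_ids = set()
    daily = defaultdict(float)
    for rec in records:
        if rec["record_id"] in seen_ids:  # deduplication
            continue
        seen_ids.add(rec["record_id"])
        daily[(rec["commodity_code"], rec["date"])] += rec["amount"]  # accumulation

    # Monthly totals are built from the daily totals, not from raw records.
    monthly = defaultdict(float)
    for (code, date), total in daily.items():
        monthly[(code, date[:7])] += total

    # Verification: both levels must account for the same overall amount.
    if abs(sum(daily.values()) - sum(monthly.values())) > 1e-9:
        raise ValueError("daily and monthly totals disagree")
    return dict(daily), dict(monthly)
```

Building each coarser level from the level below it is one reading of "recursively by day and by month": a yearly level, if needed, could be derived from the monthly totals in the same way.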

In summary, the technical solution of the present invention has the following advantages:

1. In the query and statistics method for massive data of the present invention, when massive data is processed, the statistical computation and summarization process runs in the background. The background processing of incremental data is completed efficiently through steps such as capturing the incremental data, optimizing the scheduling and execution plan for incremental capture, and periodically summarizing the incremental data. When a query is issued from the front-end interface, the system optimizes the real-time query conditions and obtains the query result directly from the summary table. This not only reduces the time spent on full-table queries and improves operating efficiency, but also offers good flexibility, strong extensibility, ease of use, and high efficiency, and makes the most of the computer's software and hardware resources.

2. The query and statistics system for massive data of the present invention comprises a receiving module, a definition module, a summary module, and a query optimization module. The definition module sends the data to the summary module after background processing. The summary module executes the statistics and summarization process according to the scheduling plan, and the summarization process summarizes only incremental data, thereby completing the summary statistics of the data. The summarized data is then sent to the query optimization module, which outputs the data to each user terminal. Because the definition module and the summary module process data in the background, the idle time of the database and the server is fully used, which greatly improves the processing efficiency for massive data, so that users of the system can obtain the desired query results without long waits when querying statistical data, which improves working efficiency.

Obviously, the above embodiments are merely examples given for clarity of description and are not a limitation on the implementations. Those of ordinary skill in the art can make other changes or variations in different forms on the basis of the above description. It is neither necessary nor possible to enumerate all implementations here, and obvious changes or variations derived therefrom remain within the protection scope of the present invention.

Claims

1. A query and statistics method for massive data, characterized by comprising the following steps:

Step S1: storing the original data uploaded by multiple users in an original table;

Step S2: according to predicted query conditions of the user terminal, storing the relevant data among all the original data in the original table in an incremental log table;

Step S3: performing summary statistics on the relevant data in the incremental log table and writing the summary statistics into a summary table, thereby completing the summary statistics of the data;

Step S4: according to the query conditions of the user terminal, outputting the query result from the summary table, thereby completing the output and display of the data.

2. The query and statistics method for massive data according to claim 1, characterized in that: before step S1, the user terminal uploads a file by means of an upload tool, the file containing the original data, and the server decrypts the original data from the received file and then writes the original data into the original table.

3. The query and statistics method for massive data according to claim 1, characterized in that: after step S2, if new data is entered into the original table, a trigger on the original table stores the new data in the incremental log table.

4. The query and statistics method for massive data according to claim 3, characterized in that: after step S2, the relevant data in the original table that has already been stored in the incremental log table is marked or deleted.

5. The query and statistics method for massive data according to claim 1, characterized in that: the original data is commodity data, which includes commodity codes and commodity details; in step S3, the summary statistics on the relevant data in the incremental log table are performed as follows: the commodity data in the incremental log table is aggregated over the commodity details by commodity code, and the summarized statistical information for each commodity is written into an intermediate table for data summarization.

6. The query and statistics method for massive data according to claim 5, characterized in that: the data in the intermediate table is summarized by day and by month, the insertion or update of the results is completed, and the processed data is stored in the summary table.

7. A query and statistics system for massive data, characterized by comprising:

a receiving module, configured to receive the original data uploaded by multiple user terminals and to transmit the original data to a definition module;

the definition module, configured to extract relevant data from the original data transmitted by the receiving module according to predicted query conditions of the user terminal, and to send the extracted relevant data to a summary module;

the summary module, configured to perform summary statistics on the relevant data sent by the definition module and to send the summarized data to a query optimization module;

the query optimization module, configured to extract the corresponding data from the summarized data sent by the summary module according to the query conditions of the user terminal, and to output the extracted data to the user terminal.

8. The query and statistics system for massive data according to claim 7, characterized in that: the predicted query conditions are provided by the user terminal or set by the system server.

9. The query and statistics system for massive data according to claim 7, characterized in that: the summary statistics performed by the summary module include: analyzing and aggregating the relevant data sent by the definition module according to data type and date conditions, and completing the insertion or update of the data.

10. The query and statistics system for massive data according to claim 9, characterized in that: the analysis and aggregation by date conditions that the summary module performs on the relevant data sent by the definition module means that the statistics are completed recursively by day and by month.