CN111625557B - Method for quickly estimating result of multi-condition billion-level data volume

Info

Publication number: CN111625557B
Application number: CN202010268561.5A
Authority: CN
Inventors: 董一聪
Original assignee: Shanghai Sailing Information Technology Co ltd
Current assignee: Shanghai Sailing Information Technology Co ltd
Priority date: 2020-04-07
Filing date: 2020-04-07
Publication date: 2023-04-14
Anticipated expiration: 2040-04-07
Also published as: CN111625557A

Abstract

The invention discloses a method for rapid multi-condition result estimation over billion-level data volumes, relating to the field of data processing. The method comprises the following steps: data storage; creation of pre-statistical information, in which, in light of the usage scenario, a data pre-statistics technique built in at system construction time performs real-time, multi-dimensional pre-statistics on the large volume of data entering the database each day; data volume estimation; and condition segmentation. The invention provides a complete and efficient method for rapid multi-condition result estimation on data, with low implementation difficulty, low hardware requirements, and stable operation.

Description

Method for rapidly estimating results of billion-level data volume multi-condition

Technical Field

The invention relates to the field of data processing, in particular to a method for rapid multi-condition result estimation over billion-level data volumes.

Background

As the state places increasing importance on security, public security and traffic departments at all levels build more checkpoints every year. The volume of received data therefore keeps growing, user departments depend on the data more and more, and the storage and querying of massive data has become a problem that must be solved.

The following two schemes are mainly used at present:

Scheme 1: a relational database (mainly Oracle or SQL Server) stores and queries the full data.

Scheme 2: a relational database (mainly Oracle or SQL Server) stores and queries recent data, while a big data platform (mainly Hadoop) stores the full data for offline computation, and the results are returned to the relational database for querying.

Both schemes have the following problems. Construction cost for the user is high: storing a large amount of data consumes substantial resources and is expensive to maintain, mainly in terms of hard disk usage, the difficulty of operating the database software, and the cost of maintaining the data. User experience is poor: performance during data ingestion and querying is poor and cannot meet the practical operational needs of public security and traffic user departments. Operation and maintenance cost is high: installing and maintaining Oracle and SQL Server databases is complex and places certain demands on field operation and maintenance personnel, so this cost cannot be brought down. Finally, tuning the data for this industry's workload is difficult.

Therefore, those skilled in the art are devoted to developing a complete and efficient method for rapid multi-condition result estimation on data.

Disclosure of Invention

In view of the above defects in the prior art, the technical problems to be solved by the present invention are to reduce storage cost, improve data storage performance, improve data query performance, and reduce on-site operation and maintenance cost.

To achieve the above object, the present invention provides a method for rapid multi-condition result estimation over billion-level data volumes, characterized by comprising the following steps:

step A, data storage;

step B, creating pre-statistical information: in light of the usage scenario, a data pre-statistics technique built in at system construction time performs real-time, multi-dimensional pre-statistics on the large volume of data entering the database each day;

step C, estimating the data volume;

step D, segmenting the conditions.

Further, the storage engine of step A is TokuDB.

Further, step A uses a MySQL database based on Mycat data middleware.

Further, the row compression mode of step A is zstd.

Further, step B specifically includes selecting the conditions of business queries as dimensions, having a program create these dimensions automatically, computing the statistics in the database, and writing the statistical results into a corresponding result table, so that step C and step D can be carried out from the statistical results when a business scenario is queried.

Further, step C specifically includes, each time the system performs a business query, querying the pre-statistical information of step B according to the user's query conditions and combining the query results.

Further, step D specifically includes, when data is queried, the system truncating the queried time range according to the time condition in the query conditions, in combination with the pre-statistical information and the system page display requirement.

Further, the condition query of step B comprises a vehicle passing volume query.

Further, the system page display requirement of step D includes a limit on the amount of information displayed per page when results are paginated.

Further, the index of the TokuDB is created online.

The technical effects of the method for rapid multi-condition result estimation over billion-level data volumes disclosed by the invention are as follows:

1. Implementation difficulty is low. The prior art attacks the problem by purely technical means, which is technically difficult and costly to operate and maintain. The invention takes the actual business scenario into account and solves the problem as a whole through engineering means, so implementation difficulty and operation and maintenance cost are low, while both functional and performance requirements are met.

2. The invention has lower hardware requirements; in particular, it uses fewer servers for the same data volume.

3. The invention has stable operation.

The conception, the specific structure and the technical effects of the present invention will be further described below to fully understand the objects, the features and the effects of the present invention.

Detailed Description

The following describes several preferred embodiments of the present invention to make the technical contents thereof clearer and easier to understand. The present invention may be embodied in many different forms of embodiments and the scope of the invention is not limited to the embodiments set forth herein.

The following describes an embodiment of the present invention by taking a certain practical application as an example. The multi-condition data filtering mode based on the message middleware can be realized by the technical personnel in the field through the description.

Step 1, deploying a MySQL database with Mycat data middleware, installing the TokuDB engine, and installing the zstd row compression mode.

Step 2, creating pre-statistical information. In light of the usage scenario, a data pre-statistics technique built in at system construction time performs real-time, multi-dimensional pre-statistics on the large volume of data entering the database each day. Specifically, the conditions of business queries are selected as dimensions; for a vehicle passing volume query, for example, the query uses passing time, passing location, and vehicle attributes according to the scenario. A program creates these dimensions automatically, computes the statistics in the database in real time, and writes the results into the corresponding result table. When a business scenario is later queried, data volume estimation and condition segmentation are carried out from these statistical results.
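The pre-statistics step can be illustrated with a minimal sketch. It is not the patented implementation: it uses Python's built-in sqlite3 module as a stand-in for the MySQL/TokuDB deployment, and the table name pass_stats, the hourly bucket granularity, and the three dimensions (hour, location, vehicle type) are assumptions chosen for illustration.

```python
import sqlite3
from datetime import datetime

# Hypothetical result table: one row per (hour, location, vehicle type) combination.
DDL = """
CREATE TABLE pass_stats (
    hour_bucket  TEXT NOT NULL,      -- e.g. '2020-04-07 08'
    location     TEXT NOT NULL,
    vehicle_type TEXT NOT NULL,
    cnt          INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (hour_bucket, location, vehicle_type)
)
"""


def open_stats_db():
    """Create an in-memory database holding the result table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(DDL)
    return conn


def record_pass(conn, pass_time, location, vehicle_type):
    """Update the pre-statistics counter for one incoming passing record."""
    key = (pass_time.strftime("%Y-%m-%d %H"), location, vehicle_type)
    # Make sure the counter row exists, then increment it.
    conn.execute(
        "INSERT OR IGNORE INTO pass_stats "
        "(hour_bucket, location, vehicle_type, cnt) VALUES (?, ?, ?, 0)",
        key,
    )
    conn.execute(
        "UPDATE pass_stats SET cnt = cnt + 1 "
        "WHERE hour_bucket = ? AND location = ? AND vehicle_type = ?",
        key,
    )


def build_demo_stats():
    """Feed a few made-up records through the counter update."""
    conn = open_stats_db()
    demo_records = [
        (datetime(2020, 4, 7, 8, 5), "K1", "car"),
        (datetime(2020, 4, 7, 8, 40), "K1", "car"),
        (datetime(2020, 4, 7, 8, 41), "K1", "truck"),
        (datetime(2020, 4, 7, 9, 2), "K2", "car"),
    ]
    for record in demo_records:
        record_pass(conn, *record)
    conn.commit()
    return conn
```

In this sketch the counters are updated as each record arrives, so the result table stays small regardless of how many detail rows are stored.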

Step 3, estimation. For data volume statistics, once the pre-statistical information has been created, each time the system performs a business query it queries the pre-statistical information according to the user's query conditions and then combines the results. This satisfies both the query speed requirement and the business requirement.
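As a rough sketch of the estimation step, the function below answers a count query from pre-computed counters alone, without touching the detail rows. The dictionary layout, the hourly bucket strings, and the name estimate_count are assumptions made for this example, not details given in the patent.

```python
def estimate_count(stats, start_hour, end_hour, location=None, vehicle_type=None):
    """Sum the pre-computed counters that match the user's query conditions.

    stats maps (hour_bucket, location, vehicle_type) to a row count.
    Hour buckets are 'YYYY-MM-DD HH' strings, so string comparison
    follows chronological order. A condition left as None is not applied.
    """
    total = 0
    for (bucket, loc, vtype), cnt in stats.items():
        if not (start_hour <= bucket <= end_hour):
            continue
        if location is not None and loc != location:
            continue
        if vehicle_type is not None and vtype != vehicle_type:
            continue
        total += cnt
    return total


# Made-up counters standing in for the result table.
DEMO_STATS = {
    ("2020-04-07 08", "K1", "car"): 120,
    ("2020-04-07 08", "K1", "truck"): 30,
    ("2020-04-07 09", "K1", "car"): 200,
    ("2020-04-07 09", "K2", "car"): 50,
}
```

For example, estimate_count(DEMO_STATS, "2020-04-07 08", "2020-04-07 09", location="K1") combines three counters (120 + 30 + 200) into a single estimate.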

Step 4, condition segmentation. For data queries, the system uses the time condition in the query conditions, together with the pre-statistical information and the page display requirement (for example, a limit on the number of records shown per page), to truncate the queried time range, which improves query speed.
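One possible reading of the time truncation is sketched below. It assumes that results are displayed newest first and that hourly counts for the non-time conditions have already been looked up in the pre-statistical information; the function name truncate_time_range and the hourly granularity are hypothetical.

```python
def truncate_time_range(hourly_counts, start, end, page_size, page_no=1):
    """Shrink [start, end] to the newest window that can fill the requested page.

    hourly_counts maps 'YYYY-MM-DD HH' buckets to the number of rows that
    match the query's non-time conditions. Buckets are walked from newest
    to oldest until enough rows have been seen for pages 1..page_no.
    """
    needed = page_size * page_no
    seen = 0
    in_range = sorted((b for b in hourly_counts if start <= b <= end), reverse=True)
    for bucket in in_range:
        seen += hourly_counts[bucket]
        if seen >= needed:
            # Detail rows older than this bucket are not needed for this page.
            return bucket, end
    # Fewer matching rows than requested: keep the full range.
    return start, end


# Made-up hourly counts for one combination of non-time conditions.
DEMO_HOURLY = {
    "2020-04-07 06": 40,
    "2020-04-07 07": 10,
    "2020-04-07 08": 25,
    "2020-04-07 09": 30,
}
```

With a page size of 50, the first page only needs the two newest buckets, so the detail query can be restricted to 08:00 through 09:59 instead of scanning the whole requested range.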

The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions that can be obtained by those skilled in the art through logic analysis, reasoning or limited experiments based on the prior art according to the concepts of the present invention should be within the scope of protection determined by the claims.

Claims

1. A method for multi-condition fast result estimation over billion-level data volumes, comprising the steps of:

step A, deploying a MySQL database with Mycat data middleware, installing a TokuDB engine, and installing a zstd row compression mode;

step B, creating pre-statistical information: in light of the usage scenario, performing real-time, multi-dimensional pre-statistics on the large volume of data entering the database each day, using a data pre-statistics technique adopted when the system is constructed;

step C, after the pre-statistical information is created, each time the system performs a business query, querying the pre-statistical information of step B according to the user's query conditions and combining the query results;

step D, when data is queried, the system truncating the queried time range according to the time condition in the query conditions, in combination with the pre-statistical information and the system page display requirement.

2. The method according to claim 1, wherein said step B specifically comprises selecting conditions of business queries as dimensions, automatically creating these dimensions by a program, performing statistics in the database in real time and writing the statistical results into a corresponding result table, so as to carry out said step C and said step D according to the statistical results when a business scenario is queried.

3. The method for billion-level data volume multi-condition fast result estimation according to claim 2, wherein said condition query of step B comprises a vehicle passing query.

4. The method for billion-level data volume multi-condition fast result estimation according to claim 1, wherein said system page display requirement of step D comprises a limit on the amount of information displayed per page during paginated display.

5. The method for billion-level data volume multi-condition fast result estimation according to claim 2, wherein the index of the TokuDB is created online.