CN111625557B - Method for quickly estimating result of multi-condition billion-level data volume - Google Patents
- Publication number
- CN111625557B (application number CN202010268561.5A)
- Authority
- CN
- China
- Prior art keywords
- data
- condition
- query
- statistics
- billion
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2453—Query optimisation
- G06F16/24534—Query rewriting; Transformation
- G06F16/24539—Query rewriting; Transformation using cached or materialised query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2462—Approximate or statistical queries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/248—Presentation of query results
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Mathematical Physics (AREA)
- Fuzzy Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method for fast multi-condition result estimation over billion-level data volumes. It relates to the field of data processing and comprises the following steps: storing the data; creating pre-statistical information, in which, based on the usage scenario, a data pre-statistics technique applied when the system is built performs multi-dimensional pre-statistics in real time on the large volume of data written to the database each day; estimating the data volume; and segmenting the query conditions. The invention provides a complete, efficient method for fast multi-condition result estimation that is easy to implement, has low hardware requirements, and runs stably.
Description
Technical Field
The invention relates to the field of data processing, and in particular to a method for fast multi-condition result estimation over billion-level data volumes.
Background
As the state places growing importance on security, public security and traffic departments at all levels build more checkpoints every year. The volume of collected data keeps growing, user departments depend on it more and more, and storing and querying such massive data has become a pressing problem.
Two schemes are mainly used at present:
Scheme 1: a relational database (mainly Oracle or SQL Server) stores and queries the full data set.
Scheme 2: a relational database (mainly Oracle or SQL Server) stores and queries recent data, while a big data platform (mainly Hadoop) stores the full data set for offline computation and returns the results to the relational database for querying.
Both schemes have the following problems. Construction cost is high: storing large amounts of data consumes substantial resources, chiefly hard disk space, and both the database software and the data itself are difficult and expensive to maintain. User experience is poor: data ingestion and query performance cannot meet the operational needs of public security and traffic departments. Operation and maintenance cost is high: installing and maintaining Oracle and SQL Server databases is complex and places demands on field operations staff, so the cost cannot be brought down. Finally, tuning the data for this industry's workloads is difficult.
Therefore, those skilled in the art have worked to develop a complete, efficient method for fast multi-condition result estimation.
Disclosure of Invention
In view of the above defects in the prior art, the technical problems to be solved by the present invention are to reduce storage cost, improve data ingestion performance, improve data query performance, and reduce on-site operation and maintenance cost.
To achieve the above object, the present invention provides a method for fast multi-condition result estimation over billion-level data volumes, comprising the following steps:
Step A, storing the data;
Step B, creating pre-statistical information: based on the usage scenario, a data pre-statistics technique applied when the system is built performs multi-dimensional pre-statistics in real time on the large volume of data written to the database each day;
Step C, estimating the data volume;
Step D, segmenting the conditions.
Further, the storage engine in step A is TokuDB.
Further, step A uses a MySQL database behind the Mycat data middleware.
Further, the row compression mode in step A is zstd.
Further, step B specifically comprises selecting the conditions used in service queries as dimensions, having a program create these dimensions automatically, computing the statistics in the database, and writing the statistical results to the corresponding result table, so that step C and step D can be carried out from the statistical results when a service query is issued.
Further, step C specifically comprises, each time the system performs a service query, looking up the pre-statistical information of step B using the user's query conditions and merging the results.
Further, step D specifically comprises, when data is queried, truncating the queried time range according to the time condition in the query conditions, together with the pre-statistical information and the system page display requirement.
Further, the service query conditions of step B include vehicle-passing volume queries.
Further, the system page display requirement of step D includes a limit on the amount of information shown per page in paginated display.
Further, the TokuDB indexes are created online.
The technical effects of the disclosed method for fast multi-condition result estimation over billion-level data volumes are as follows:
1. It is easier to implement. The prior art addresses the problem by purely technical means, which is technically difficult and costly to operate and maintain. The invention takes the actual business scenario into account and solves the problem as a whole by engineering means, so it is easy to implement and cheap to operate and maintain, while meeting both functional and performance requirements.
2. It has lower hardware requirements; in particular, it uses fewer servers for the same data volume.
3. It runs stably.
The concept, specific structure, and technical effects of the present invention are further described below so that its objects, features, and effects can be fully understood.
Detailed Description
Several preferred embodiments of the present invention are described below to make its technical content clearer and easier to understand. The present invention may be embodied in many different forms, and its scope is not limited to the embodiments set forth herein.
An embodiment of the present invention is described below using a practical application as an example. From this description, those skilled in the art can implement the multi-condition data filtering approach based on the message middleware.
Step 1, deploy a MySQL database behind the Mycat data middleware, install the TokuDB engine, and enable the zstd row compression mode.
Step 2, create pre-statistical information. Based on the usage scenario, a data pre-statistics technique applied when the system is built performs multi-dimensional pre-statistics in real time on the large volume of data written to the database each day. Specifically, the conditions used in service queries are selected as dimensions; for a vehicle-passing volume query, for example, the scenario calls for querying by passing time, passing location, and vehicle attributes. A program creates these dimensions automatically, computes the statistics in the real-time database, and writes the results to the corresponding result table. When a service query is later issued, data volume estimation and condition segmentation are performed from these statistical results.
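The pre-statistics step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the record fields (`passed_at`, `location`, `plate_color`), the hourly bucket granularity, and the in-memory `Counter` standing in for the result table are all hypothetical.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw vehicle-passing records; the field names are assumptions.
RECORDS = [
    {"passed_at": datetime(2020, 4, 7, 8, 5), "location": "K001", "plate_color": "blue"},
    {"passed_at": datetime(2020, 4, 7, 8, 40), "location": "K001", "plate_color": "blue"},
    {"passed_at": datetime(2020, 4, 7, 8, 59), "location": "K002", "plate_color": "yellow"},
    {"passed_at": datetime(2020, 4, 7, 9, 10), "location": "K001", "plate_color": "yellow"},
]


def bucket(ts):
    """Truncate a timestamp to the hour (assumed pre-statistics granularity)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def build_pre_stats(records):
    """Count records per (hour, location, plate_color) dimension combination."""
    stats = Counter()
    for r in records:
        stats[(bucket(r["passed_at"]), r["location"], r["plate_color"])] += 1
    return stats


# Stand-in for the result table that later queries read instead of raw data.
PRE_STATS = build_pre_stats(RECORDS)
```

In a real deployment the counts would presumably be maintained incrementally as records arrive and persisted to a result table; the batch loop here only shows which keys and values such a table would hold.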
Step 3, estimate the data volume. This step serves data volume statistics. Once the pre-statistical information has been created, each service query looks up the pre-statistical information using the user's query conditions and then merges the matching results. This satisfies both the query speed requirement and the service requirement.
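The estimation step can be illustrated by summing pre-computed counts rather than scanning raw rows. The sketch below assumes a hypothetical result table keyed by `(hour, location, plate_color)`; the key layout, the sample counts, and the function name `estimate_count` are assumptions made for illustration only.

```python
from datetime import datetime

# Hypothetical pre-statistics result table: (hour, location, plate_color) -> count.
PRE_STATS = {
    (datetime(2020, 4, 7, 8), "K001", "blue"): 1200,
    (datetime(2020, 4, 7, 8), "K002", "blue"): 300,
    (datetime(2020, 4, 7, 9), "K001", "blue"): 900,
    (datetime(2020, 4, 7, 9), "K001", "yellow"): 100,
}


def estimate_count(stats, start, end, location=None, plate_color=None):
    """Sum pre-computed counts for hourly buckets in [start, end) that match the filters.

    A filter left as None matches every value of that dimension.
    """
    total = 0
    for (hour, loc, color), count in stats.items():
        if not (start <= hour < end):
            continue
        if location is not None and loc != location:
            continue
        if plate_color is not None and color != plate_color:
            continue
        total += count
    return total


# Example: estimated passes at location K001 between 08:00 and 10:00.
k001_total = estimate_count(
    PRE_STATS, datetime(2020, 4, 7, 8), datetime(2020, 4, 7, 10), location="K001"
)
```

With hourly buckets, a query whose time range does not fall on hour boundaries yields only an approximate total, which is one plausible reading of why the method is described as an estimate.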
Step 4, segment the conditions. This step serves data queries. When the system runs a query, it uses the time condition in the query, together with the pre-statistical information and the system page display requirement (for example, the limit on the number of rows shown per page), to truncate the queried time range, which improves query speed.
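One way to read the time truncation described above is: use the pre-computed counts to find the smallest slice of the requested time range that already holds enough rows to fill the requested page, then run the raw query over that slice only. The sketch below follows that reading. The hourly granularity, newest-first ordering, hour-aligned range bounds, and the names `HOURLY_COUNTS` and `truncate_time_range` are all assumptions, not details from the patent.

```python
from datetime import datetime, timedelta

# Hypothetical hourly counts, already restricted to the user's non-time conditions.
HOURLY_COUNTS = {
    datetime(2020, 4, 7, 6): 40,
    datetime(2020, 4, 7, 7): 70,
    datetime(2020, 4, 7, 8): 30,
    datetime(2020, 4, 7, 9): 25,
}


def truncate_time_range(hourly_counts, start, end, page_size, page=1):
    """Shrink [start, end) to the newest hours that hold enough rows for the page.

    Walks hourly buckets from newest to oldest and stops as soon as the
    accumulated count covers page * page_size rows. Both bounds are assumed
    to be aligned to the hour.
    """
    needed = page * page_size
    covered = 0
    cut = end
    hour = end - timedelta(hours=1)
    while hour >= start:
        covered += hourly_counts.get(hour, 0)
        cut = hour
        if covered >= needed:
            break
        hour -= timedelta(hours=1)
    return cut, end


# Example: first page of 50 rows out of a four-hour request.
narrow_start, narrow_end = truncate_time_range(
    HOURLY_COUNTS, datetime(2020, 4, 7, 6), datetime(2020, 4, 7, 10), page_size=50
)
```

The raw-data query would then be issued with the narrowed range plus the page limit, so the database reads two hours of data instead of four in this example. If the whole range holds fewer rows than requested, the function returns the original range unchanged.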
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions that can be obtained by those skilled in the art through logic analysis, reasoning or limited experiments based on the prior art according to the concepts of the present invention should be within the scope of protection determined by the claims.
Claims (5)
1. A method for fast multi-condition result estimation over billion-level data volumes, comprising the steps of:
Step A, deploying a MySQL database behind the Mycat data middleware, installing the TokuDB engine, and enabling the zstd row compression mode;
Step B, creating pre-statistical information: based on the usage scenario, a data pre-statistics technique applied when the system is built performs multi-dimensional pre-statistics in real time on the large volume of data written to the database each day;
Step C, after the pre-statistical information has been created, each time the system performs a service query, looking up the pre-statistical information of step B using the user's query conditions and merging the query results;
Step D, when data is queried, truncating the queried time range according to the time condition in the query conditions, together with the pre-statistical information and the system page display requirement.
2. The method according to claim 1, wherein step B specifically comprises selecting the conditions used in service queries as dimensions, having a program create these dimensions automatically, computing the statistics in a real-time database, and writing the statistical results to a corresponding result table, so that step C and step D can be carried out from the statistical results when a service query is issued.
3. The method for fast multi-condition result estimation over billion-level data volumes according to claim 2, wherein the service query conditions of step B include a vehicle-passing volume query.
4. The method for fast multi-condition result estimation over billion-level data volumes according to claim 1, wherein the system page display requirement of step D includes a limit on the amount of information shown per page in paginated display.
5. The method for fast multi-condition result estimation over billion-level data volumes according to claim 2, wherein the TokuDB indexes are created online.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010268561.5A CN111625557B (en) | 2020-04-07 | 2020-04-07 | Method for quickly estimating result of multi-condition billion-level data volume |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010268561.5A CN111625557B (en) | 2020-04-07 | 2020-04-07 | Method for quickly estimating result of multi-condition billion-level data volume |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111625557A (en) | 2020-09-04 |
CN111625557B (en) | 2023-04-14 |
Family
ID=72273060
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010268561.5A Active CN111625557B (en) | 2020-04-07 | 2020-04-07 | Method for quickly estimating result of multi-condition billion-level data volume |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111625557B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103678634A (en) * | 2013-12-19 | 2014-03-26 | 北京锐安科技有限公司 | Method for improving data query speed in J-Hi open-source development platform |
CN103927346A (en) * | 2014-03-28 | 2014-07-16 | 浙江大学 | Query connection method on basis of data volumes |
CN108984797A (en) * | 2018-08-20 | 2018-12-11 | 新疆工程学院 | A kind of method for quick estimating of magnanimity monitor-type long-term sequence data related coefficient |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10970280B2 (en) * | 2015-10-07 | 2021-04-06 | International Business Machines Corporation | Query plan based on a data storage relationship |
US10902022B2 (en) * | 2017-03-28 | 2021-01-26 | Shanghai Kyligence Information Technology Co., Ltd | OLAP pre-calculation model, automatic modeling method, and automatic modeling system |
Non-Patent Citations (1)
Title |
---|
栾开宁; 郑海雁; 丁陈; 李昆明. Dynamic indexing technology for fast combined queries on electric power big data. 电气技术, 2015, (01), full text. * |
Also Published As
Publication number | Publication date |
---|---|
CN111625557A (en) | 2020-09-04 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Mouratidis et al. | Continuous monitoring of top-k queries over sliding windows | |
CN103577440B (en) | A kind of data processing method and device in non-relational database | |
US20220164345A1 (en) | Managed query execution platform, and methods thereof | |
EP2263180B1 (en) | Indexing large-scale gps tracks | |
CN109656958B (en) | Data query method and system | |
Ma et al. | KSQ: Top-k similarity query on uncertain trajectories | |
CN105159845A (en) | Memory reading method | |
US7860822B1 (en) | Distributed aggregation mapping | |
CN107766445B (en) | Efficient and rapid data retrieval method supporting multi-dimensional retrieval | |
CN102521269A (en) | Index-based computer continuous data protection method | |
EP3862888A1 (en) | Hybrid data distribution in a massively parallel processing architecture | |
Magdy et al. | GeoTrend: spatial trending queries on real-time microblogs | |
CN110888861A (en) | Novel big data storage method | |
CN109165096B (en) | Cache utilization system and method for web cluster | |
CN111258798A (en) | Fault positioning method and device for monitoring data, computer equipment and storage medium | |
US20220261391A1 (en) | Auto unload | |
CN112100510A (en) | Mass data query method and device based on Internet of vehicles platform | |
CN111625557B (en) | Method for quickly estimating result of multi-condition billion-level data volume | |
CN112445833A (en) | Data paging query method, device and system for distributed database | |
CN114185885A (en) | Streaming data processing method and system based on column storage database | |
CN110765221A (en) | Management method and device of space-time trajectory data | |
CN109739883A (en) | Promote the method, apparatus and electronic equipment of data query performance | |
US9465846B2 (en) | Storing events from a datastream | |
CN113282608A (en) | Intelligent traffic data analysis and storage method based on column database | |
Feng et al. | Indexing techniques of distributed ordered tables: A survey and analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||