CN111625557B - Method for quickly estimating result of multi-condition billion-level data volume - Google Patents

Method for quickly estimating result of multi-condition billion-level data volume

Info

Publication number
CN111625557B
Authority
CN
China
Prior art keywords
data
condition
query
statistics
billion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010268561.5A
Other languages
Chinese (zh)
Other versions
CN111625557A (en)
Inventor
董一聪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sailing Information Technology Co ltd
Original Assignee
Shanghai Sailing Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sailing Information Technology Co ltd
Priority to CN202010268561.5A
Publication of CN111625557A
Application granted
Publication of CN111625557B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24534 Query rewriting; Transformation
    • G06F16/24539 Query rewriting; Transformation using cached or materialised query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462 Approximate or statistical queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/248 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for rapid multi-condition result estimation over billion-level data volumes. It relates to the field of data processing and comprises the following steps: storing the data; creating pre-statistical information, in which, according to the usage scenario, a data pre-statistics technique built in when the system is constructed performs real-time, multi-dimensional pre-statistics on the large volume of data stored in the database every day; estimating the data quantity; and segmenting the query conditions. The invention provides a complete, efficient, and fast method for multi-condition result estimation that is easy to implement, has low hardware requirements, and runs stably.

Description

Method for rapid multi-condition result estimation over billion-level data volumes
Technical Field
The invention relates to the field of data processing, and in particular to a method for rapid multi-condition result estimation over billion-level data volumes.
Background
As the state attaches increasing importance to security, public security and traffic departments at all levels build more checkpoints every year. The volume of received data keeps growing, user departments depend on the data more and more, and storing and querying massive data has become a pressing problem.
Two schemes are mainly used at present:
Scheme 1: a relational database (mainly Oracle or SQL Server) stores and queries the full data set.
Scheme 2: a relational database (mainly Oracle or SQL Server) stores and queries recent data, while a big data platform (mainly Hadoop) stores the full data set for offline computation and returns the results to the relational database for querying.
Both schemes have the following problems. Construction cost is high: storing large amounts of data consumes substantial resources, chiefly hard disk space, the database software is difficult to operate, and the data is expensive to maintain. The user experience is poor: data ingestion and query performance is poor and cannot meet the practical operational needs of public security and traffic departments. Operation and maintenance cost is high: installing and maintaining Oracle and SQL Server databases is complex and places demands on on-site operations staff, so the cost cannot be brought down. Finally, tuning the data for this industry setting is difficult.
Therefore, those skilled in the art have sought to develop a complete, efficient, and fast method for multi-condition result estimation over such data.
Disclosure of Invention
In view of the above defects in the prior art, the technical problems to be solved by the present invention are to reduce storage cost, improve data ingestion performance, improve data query performance, and reduce on-site operation and maintenance cost.
To achieve the above object, the present invention provides a method for rapid multi-condition result estimation over billion-level data volumes, characterized by comprising the following steps:
Step A, storing the data;
Step B, creating pre-statistical information: according to the usage scenario, a data pre-statistics technique built in when the system is constructed performs real-time, multi-dimensional pre-statistics on the large volume of data stored in the database every day;
Step C, estimating the data quantity;
Step D, segmenting the query conditions.
Further, the storage engine in step A is TokuDB.
Further, step A uses a MySQL database based on the Mycat data middleware.
Further, the row compression mode in step A is zstd.
Further, step B specifically comprises selecting the conditions of business queries as dimensions, having a program establish these dimensions automatically, computing the statistics in the database, and writing the statistical results into a corresponding result table, so that step C and step D can be carried out from the statistical results when a business scenario is queried.
Further, step C specifically comprises, each time the system performs a business query, querying the pre-statistical information from step B according to the user's query conditions and merging the query results.
Further, step D specifically comprises, when data is queried, the system truncating the queried time range according to the time condition in the query conditions, in combination with the pre-statistical information and the system page display requirement.
Further, the conditional queries of step B include a vehicle passing volume query.
Further, the system page display requirement of step D includes a limit on the amount of information displayed per page in paginated display.
Further, the TokuDB index is created online.
The technical effects of the disclosed method for rapid multi-condition result estimation over billion-level data volumes are as follows:
1. It is easier to implement. The prior art tackles the problem by purely technical means, which is technically difficult and costly to operate and maintain. The invention takes the actual business scenario into account and solves the problem as a whole through engineering means, so it is easy to implement and cheap to operate and maintain, while meeting both the functional and the performance requirements.
2. It has lower hardware requirements; in particular, fewer servers are needed for the same data volume.
3. It runs stably.
The concept, specific structure, and technical effects of the present invention are further described below so that its objects, features, and effects can be fully understood.
Detailed Description
The following describes several preferred embodiments of the present invention to make the technical contents thereof clearer and easier to understand. The present invention may be embodied in many different forms of embodiments and the scope of the invention is not limited to the embodiments set forth herein.
The following describes an embodiment of the present invention, taking a practical application as an example. From this description, those skilled in the art can implement the multi-condition data filtering approach based on the message middleware.
Step 1, deploying a MySQL database based on the Mycat data middleware, installing the TokuDB engine, and setting up the zstd row compression mode.
Step 2, creating pre-statistical information. According to the usage scenario, a data pre-statistics technique built in when the system is constructed performs real-time, multi-dimensional pre-statistics on the large volume of data written to the database every day. Specifically, the conditions of business queries are selected as dimensions; for example, a vehicle passing volume query uses the passing time, the passing place, and the vehicle attributes, depending on the scenario. A program establishes these dimensions automatically, computes the statistics in the real-time database, and writes the results into a corresponding result table. When a business scenario is later queried, data quantity estimation and condition segmentation are carried out from these statistical results.
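The pre-statistics step can be illustrated with a minimal Python sketch. It is not the patented implementation: the record layout, the one-hour bucket size, and the names `RECORDS`, `hour_bucket`, and `build_pre_statistics` are assumptions made for illustration, and an in-memory counter stands in for the result table that the description keeps in the database.

```python
from collections import Counter
from datetime import datetime

# Hypothetical vehicle-passing records: (passing time, passing place, vehicle type).
RECORDS = [
    (datetime(2020, 4, 7, 8, 5), "checkpoint_A", "car"),
    (datetime(2020, 4, 7, 8, 40), "checkpoint_A", "truck"),
    (datetime(2020, 4, 7, 8, 55), "checkpoint_B", "car"),
    (datetime(2020, 4, 7, 9, 10), "checkpoint_A", "car"),
    (datetime(2020, 4, 7, 9, 30), "checkpoint_A", "car"),
]


def hour_bucket(ts):
    """Truncate a timestamp to the start of its hour (assumed granularity)."""
    return ts.replace(minute=0, second=0, microsecond=0)


def build_pre_statistics(records):
    """Count records per (hour, place, vehicle type) dimension combination."""
    stats = Counter()
    for ts, place, vehicle_type in records:
        stats[(hour_bucket(ts), place, vehicle_type)] += 1
    return stats


# Stand-in for the "result table" written by the pre-statistics program.
PRE_STATS = build_pre_statistics(RECORDS)
```

Each key is one dimension combination and each value is its record count, so later queries can read a handful of counters instead of scanning the raw rows.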
Step 3, estimation. For data volume statistics: once the pre-statistical information has been created, each time the system performs a business query it queries the pre-statistical information according to the user's query conditions and then merges the query results. This satisfies both the query speed and the business requirements.
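A minimal Python sketch of this estimation step follows. The table contents, the hourly granularity, and the function name `estimate_count` are hypothetical; the description only states that the pre-statistical information is queried by the user's conditions and the results are merged, which is modeled here as a filtered sum.

```python
from datetime import datetime

# Hypothetical pre-statistics result table:
# (hour bucket, passing place, vehicle type) -> pre-counted number of records.
PRE_STATS = {
    (datetime(2020, 4, 7, 8), "checkpoint_A", "car"): 1200,
    (datetime(2020, 4, 7, 8), "checkpoint_A", "truck"): 300,
    (datetime(2020, 4, 7, 8), "checkpoint_B", "car"): 900,
    (datetime(2020, 4, 7, 9), "checkpoint_A", "car"): 1500,
    (datetime(2020, 4, 7, 9), "checkpoint_B", "truck"): 250,
}


def estimate_count(stats, start, end, places=None, vehicle_types=None):
    """Sum the pre-counted rows whose hour bucket lies in [start, end).

    A filter set to None means the user did not constrain that dimension.
    """
    total = 0
    for (hour, place, vehicle_type), count in stats.items():
        if not (start <= hour < end):
            continue
        if places is not None and place not in places:
            continue
        if vehicle_types is not None and vehicle_type not in vehicle_types:
            continue
        total += count
    return total


# Cars passing checkpoint_A between 08:00 and 10:00.
cars_at_a = estimate_count(
    PRE_STATS,
    datetime(2020, 4, 7, 8),
    datetime(2020, 4, 7, 10),
    places={"checkpoint_A"},
    vehicle_types={"car"},
)
```

Under these assumptions the sum is exact when the queried time range aligns with bucket boundaries and is only an estimate otherwise, since a partially covered hour is counted in full or not at all.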
Step 4, condition segmentation. For data queries: when the system runs a query, it truncates the queried time range according to the time condition in the query conditions, in combination with the pre-statistical information and the system page display requirement (for example, the number of records shown per page in paginated display is limited), thereby improving query speed.
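One possible reading of this truncation is sketched below in Python. The description does not specify the ordering or the exact rule, so the sketch assumes newest-first display, hour-aligned range bounds, and hourly counts that have already been filtered by the user's non-time conditions; `HOURLY_COUNTS` and `truncate_time_range` are hypothetical names.

```python
from datetime import datetime, timedelta

# Hypothetical hourly counts from the pre-statistics table, already restricted
# to the user's non-time conditions (place, vehicle attributes).
HOURLY_COUNTS = {
    datetime(2020, 4, 7, 6): 40,
    datetime(2020, 4, 7, 7): 70,
    datetime(2020, 4, 7, 8): 30,
    datetime(2020, 4, 7, 9): 90,
}


def truncate_time_range(hourly_counts, start, end, page, page_size):
    """Shrink [start, end) to the shortest newest-first window that covers a page.

    Walks backwards from `end` one hour at a time, adding up pre-counted rows,
    and stops once enough rows exist to fill pages 1..page (page is 1-based).
    Returns the truncated (new_start, end) range for the detail query.
    """
    needed = page * page_size
    covered = 0
    cursor = end
    step = timedelta(hours=1)
    while cursor > start and covered < needed:
        cursor -= step
        covered += hourly_counts.get(cursor, 0)
    return max(cursor, start), end


# The first page of 50 rows only needs the most recent hour.
first_page_range = truncate_time_range(
    HOURLY_COUNTS,
    datetime(2020, 4, 7, 6),
    datetime(2020, 4, 7, 10),
    page=1,
    page_size=50,
)
```

The detail query then scans only the truncated window rather than the full requested range, which is where the speed-up in this sketch comes from.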
The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions that can be obtained by those skilled in the art through logic analysis, reasoning or limited experiments based on the prior art according to the concepts of the present invention should be within the scope of protection determined by the claims.

Claims (5)

1. A method for rapid multi-condition result estimation over billion-level data volumes, comprising the following steps:
step A, deploying a MySQL database based on the Mycat data middleware, installing the TokuDB engine, and setting up the zstd row compression mode;
step B, creating pre-statistical information: according to the usage scenario, a data pre-statistics technique built in when the system is constructed performs real-time, multi-dimensional pre-statistics on the large volume of data stored in the database every day;
step C, after the pre-statistical information is created, each time the system performs a business query, querying the pre-statistical information from step B according to the user's query conditions and merging the query results;
step D, when data is queried, the system truncating the queried time range according to the time condition in the query conditions, in combination with the pre-statistical information and the system page display requirement.
2. The method according to claim 1, wherein step B specifically comprises selecting the conditions of business queries as dimensions, having a program establish these dimensions automatically, computing the statistics in a real-time database, and writing the statistical results into a corresponding result table, so that step C and step D can be carried out from the statistical results when a business scenario is queried.
3. The method for rapid multi-condition result estimation over billion-level data volumes according to claim 2, wherein the conditional queries of step B include a vehicle passing volume query.
4. The method for rapid multi-condition result estimation over billion-level data volumes according to claim 1, wherein the system page display requirement of step D includes a limit on the amount of information displayed per page in paginated display.
5. The method for rapid multi-condition result estimation over billion-level data volumes according to claim 2, wherein the TokuDB index is created online.
CN202010268561.5A 2020-04-07 2020-04-07 Method for quickly estimating result of multi-condition billion-level data volume Active CN111625557B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010268561.5A CN111625557B (en) 2020-04-07 2020-04-07 Method for quickly estimating result of multi-condition billion-level data volume

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010268561.5A CN111625557B (en) 2020-04-07 2020-04-07 Method for quickly estimating result of multi-condition billion-level data volume

Publications (2)

Publication Number Publication Date
CN111625557A 2020-09-04
CN111625557B 2023-04-14

Family

ID=72273060

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010268561.5A Active CN111625557B (en) 2020-04-07 2020-04-07 Method for quickly estimating result of multi-condition billion-level data volume

Country Status (1)

Country Link
CN (1) CN111625557B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678634A (en) * 2013-12-19 2014-03-26 北京锐安科技有限公司 Method for improving data query speed in J-Hi open-source development platform
CN103927346A (en) * 2014-03-28 2014-07-16 浙江大学 Query connection method on basis of data volumes
CN108984797A (en) * 2018-08-20 2018-12-11 新疆工程学院 A kind of method for quick estimating of magnanimity monitor-type long-term sequence data related coefficient

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10970280B2 (en) * 2015-10-07 2021-04-06 International Business Machines Corporation Query plan based on a data storage relationship
US10902022B2 (en) * 2017-03-28 2021-01-26 Shanghai Kyligence Information Technology Co., Ltd OLAP pre-calculation model, automatic modeling method, and automatic modeling system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
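Luan Kaining; Zheng Haiyan; Ding Chen; Li Kunming. Dynamic indexing technology for fast combined queries on electric power big data. 电气技术 (Electrical Technology). 2015, (01), full text. *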

Also Published As

Publication number Publication date
CN111625557A (en) 2020-09-04

Similar Documents

Publication Publication Date Title
Mouratidis et al. Continuous monitoring of top-k queries over sliding windows
CN103577440B (en) A kind of data processing method and device in non-relational database
US20220164345A1 (en) Managed query execution platform, and methods thereof
EP2263180B1 (en) Indexing large-scale gps tracks
CN109656958B (en) Data query method and system
Ma et al. KSQ: Top-k similarity query on uncertain trajectories
CN105159845A (en) Memory reading method
US7860822B1 (en) Distributed aggregation mapping
CN107766445B (en) Efficient and rapid data retrieval method supporting multi-dimensional retrieval
CN102521269A (en) Index-based computer continuous data protection method
EP3862888A1 (en) Hybrid data distribution in a massively parallel processing architecture
Magdy et al. GeoTrend: spatial trending queries on real-time microblogs
CN110888861A (en) Novel big data storage method
CN109165096B (en) Cache utilization system and method for web cluster
CN111258798A (en) Fault positioning method and device for monitoring data, computer equipment and storage medium
US20220261391A1 (en) Auto unload
CN112100510A (en) Mass data query method and device based on Internet of vehicles platform
CN111625557B (en) Method for quickly estimating result of multi-condition billion-level data volume
CN112445833A (en) Data paging query method, device and system for distributed database
CN114185885A (en) Streaming data processing method and system based on column storage database
CN110765221A (en) Management method and device of space-time trajectory data
CN109739883A (en) Promote the method, apparatus and electronic equipment of data query performance
US9465846B2 (en) Storing events from a datastream
CN113282608A (en) Intelligent traffic data analysis and storage method based on column database
Feng et al. Indexing techniques of distributed ordered tables: A survey and analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant