CN110659294B - Space-time data ad hoc query method, system, electronic device and storage medium - Google Patents


Info

Publication number
CN110659294B
Authority
CN
China
Prior art keywords
data
spatio
query
component
database
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910909588.5A
Other languages
Chinese (zh)
Other versions
CN110659294A (en)
Inventor
李蔚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN201910909588.5A
Publication of CN110659294A
Application granted
Publication of CN110659294B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24564 Applying rules; Deductive queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to a spatio-temporal data ad hoc query method, a spatio-temporal data ad hoc query system, an electronic device and a storage medium. The query system generates a query request from a query condition according to a preset business rule; the Jetty service component submits the query request to the Spark distributed cluster engine; and the Spark distributed cluster engine parses the query request according to the same preset business rule and, based on the type or field to be queried, calls the corresponding component to retrieve data from the database. Because a consistent business rule is agreed upon in advance, developers do not need to customize the query system around the data storage structure of the database; they only need to generate query requests from query conditions according to the business rule, and the Spark distributed cluster engine parses each request and calls the corresponding component to query the corresponding data. The diverse query requirements of users are thereby met.

Description

Space-time data ad hoc query method, system, electronic device and storage medium
Technical Field
The application relates to the technical field of big data processing, in particular to a spatio-temporal data ad hoc query method, a spatio-temporal data ad hoc query system, electronic equipment and a storage medium.
Background
Statistical analysis of spatio-temporal data is currently a relatively difficult problem in the field of big data analysis and processing. Spatio-temporal data refers to multi-dimensional data that includes both time and space, such as trajectory data and the point locations of events. The volume of spatio-temporal data is large; behavioral trajectory data in particular can generate a large amount of data every second.
At present, spatio-temporal big data is processed by having a front-end query system submit SQL statements directly. Under this approach, developers carry out customized development in advance according to users' query requirements and produce reports with fixed formats and fixed conditions. Users cannot redefine the query conditions during use, so diverse user query requirements cannot be met.
Disclosure of Invention
In view of the above, the present application aims to provide a spatio-temporal data ad hoc query method, system, electronic device and storage medium that meet diverse query requirements.
In a first aspect, an embodiment of the present application provides a spatio-temporal data ad hoc query method, which is applied to a spatio-temporal data ad hoc query system, where the spatio-temporal data ad hoc query system includes a database, a query system, a Jetty service component, and a Spark distributed cluster engine, where the database stores spatio-temporal data and the Geomesa indexes corresponding to the spatio-temporal data, as well as non-spatio-temporal data and the Phoenix indexes corresponding to the non-spatio-temporal data, and the method includes:
the query system generates a corresponding query request according to a preset business rule based on a query condition selected by a user, and sends the query request to a Jetty service component, wherein the preset business rule comprises query requests corresponding to data types respectively contained in spatio-temporal data or non-spatio-temporal data;
the Jetty service component submits the query request to the Spark distributed cluster engine in the form of a Spark task;
the Spark distributed cluster engine parses the Spark task according to the preset business rule to obtain the parsed query service logic, wherein the query service logic comprises a type or a field to be queried;
the Spark distributed cluster engine determines, according to the type or field to be queried included in the query service logic, whether the data to be queried is spatio-temporal data or non-spatio-temporal data;
if the data is spatio-temporal data, the corresponding spatio-temporal data is acquired from the database through the Geomesa index;
and if the data is non-spatio-temporal data, the corresponding non-spatio-temporal data is acquired from the database through the Phoenix index.
In an optional embodiment, the spatiotemporal data ad hoc query system further includes a Flink-based real-time ETL system, a third-party database, a Geomesa component, and a Phoenix component, and the method includes a step of importing the non-spatiotemporal data and the spatiotemporal data into the database, the step including:
the Flink-based real-time ETL system extracts data from the third-party database in real time and determines whether the extracted data is spatio-temporal data or non-spatio-temporal data;
if the data is spatio-temporal data, it is imported into the database through the Geomesa component, and a corresponding Geomesa index is established;
and if the data is non-spatio-temporal data, importing the non-spatio-temporal data into the database through a Phoenix component, and establishing a corresponding Phoenix index.
In an optional embodiment, the generating, by the query system, a corresponding query request according to a query condition selected by a user and a preset business rule, and sending the query request to the Jetty service component includes:
the query system encapsulates the query condition into a JSON-form service request according to the preset business rule and sends the JSON-form service request to the Jetty service component.
In an optional embodiment, the submitting the query request to the Spark distributed cluster engine by the Jetty service component in the form of a Spark task includes:
the Jetty service component parses the JSON-form service request into a business object and, through its SparkSession object, submits the business object to the Spark distributed cluster engine for execution in the form of a Spark task.
In an alternative embodiment, the method further comprises:
after the Flink-based real-time ETL system establishes the Geomesa index or the Phoenix index, the spatio-temporal data and its corresponding Geomesa index, and the non-spatio-temporal data and its corresponding Phoenix index, are compressed and stored in the database.
In a second aspect, an embodiment of the present application provides a spatio-temporal data ad hoc query system, where the spatio-temporal data ad hoc query system includes a database for storing data, a query system, a Jetty service component, and a Spark distributed cluster engine, where the data includes spatio-temporal data and Geomesa indexes corresponding thereto, and non-spatio-temporal data and Phoenix indexes corresponding thereto:
the query system is used for generating a corresponding query request according to a preset business rule based on a query condition selected by a user and sending the query request to the Jetty service component, wherein the preset business rule comprises query requests corresponding to data types respectively contained in spatiotemporal data or non-spatiotemporal data;
the Jetty service component is used for submitting the query request to a Spark distributed cluster engine in a Spark task mode;
the Spark distributed cluster engine is used for parsing the Spark task according to the preset business rule to obtain the parsed query service logic, wherein the query service logic comprises a type or a field to be queried;
the Spark distributed cluster engine is further configured to determine whether the data to be queried is spatio-temporal data or non-spatio-temporal data according to the query service logic including a type or a field to be queried;
if the data is spatio-temporal data, the Geomesa index corresponding to the spatio-temporal data is retrieved in the database to obtain the corresponding spatio-temporal data;
and if the data is non-spatiotemporal data, searching a Phoenix index corresponding to the non-spatiotemporal data in the database to obtain corresponding non-spatiotemporal data.
In an optional embodiment, the spatiotemporal data ad hoc query system further includes a Flink-based real-time ETL system, a third-party database, a Geomesa component, and a Phoenix component:
the Flink-based real-time ETL system is used for extracting data from the third-party database in real time and determining whether the extracted data is spatio-temporal data or non-spatio-temporal data;
if the data is spatio-temporal data, the Flink-based real-time ETL system is further used for importing the spatio-temporal data into the database through a Geomesa component and establishing a corresponding Geomesa index;
and if the data is non-spatio-temporal data, the Flink-based real-time ETL system is further used for importing the non-spatio-temporal data into the database through a Phoenix component and establishing a corresponding Phoenix index.
In an optional embodiment, after the Flink-based real-time ETL system establishes the Geomesa index or the Phoenix index, it is further configured to compress the spatio-temporal data and its corresponding Geomesa index, and the non-spatio-temporal data and its corresponding Phoenix index, and to store the compressed data in the database.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor and a non-volatile memory storing computer instructions, where when the computer instructions are executed by the processor, the electronic device executes the spatiotemporal data ad hoc query method according to any one of the foregoing embodiments.
In a fourth aspect, an embodiment of the present application provides a storage medium, where a computer program is stored in the storage medium, and the computer program, when executed, implements the spatio-temporal data ad-hoc query method according to any one of the foregoing embodiments.
The spatio-temporal data ad hoc query method, system, electronic device and storage medium provided by the embodiments of the application involve a database, a query system, a Jetty service component and a Spark distributed cluster engine. The database stores the spatio-temporal data and the corresponding Geomesa index, and the non-spatio-temporal data and the corresponding Phoenix index. The query system generates a query request from the query condition selected by the user according to a preset business rule; the Jetty service component submits the query request to the Spark distributed cluster engine for execution; and during execution the Spark distributed cluster engine parses the request according to the preset business rule to obtain the query service logic, which comprises the type or field to be queried. According to that type or field, the Spark distributed cluster engine calls the Geomesa or Phoenix component to retrieve the corresponding data from the database through the corresponding index. Because consistent business rules are agreed upon in advance, developers do not need to custom-develop the query system around the storage structure of the data in the database; they only need to generate query requests from query conditions according to the business rules, and the Spark distributed cluster engine parses the requests and calls the corresponding components to query the corresponding data. The diverse query requirements of users are thereby met.
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are required to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained from the drawings without inventive effort.
FIG. 1 is a schematic diagram of a spatiotemporal data processing method of the prior art;
FIG. 2 is an architecture diagram of the spatiotemporal data ad hoc query system according to the present embodiment;
FIG. 3 is a flowchart of a spatio-temporal data ad hoc query method according to an embodiment of the present application;
FIG. 4 is a flowchart illustrating sub-steps of step S310 in FIG. 3;
FIG. 5 is a second flowchart of a spatio-temporal data ad hoc query method according to an embodiment of the present application.
Description of the main element symbols: 10 - spatio-temporal data ad hoc query system; 11 - database; 12 - query system; 13 - Jetty service component; 14 - Spark distributed cluster engine; 15 - Flink-based real-time ETL system; 16 - third-party database; 17 - Geomesa component; 18 - Phoenix component; 141 - routing component.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, presented in the accompanying drawings, is not intended to limit the scope of the claimed application, but is merely representative of selected embodiments of the application. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present application without making any creative effort, shall fall within the protection scope of the present application.
The processing of spatio-temporal data belongs to the field of data engineering, in which many technologies are currently available, such as MapReduce, Hive, Pig and HBase, as well as popular big data processing technologies such as Spark, Storm, Flink, Presto, Phoenix, Elasticsearch and Redis. Each of these components has its own strengths: some target data computation, some data querying, and some data storage. How to combine these technologies appropriately to meet user requirements is currently a major problem.
At present, the query and processing of spatiotemporal data generally takes the same approach as other non-spatiotemporal big data. As shown in fig. 1, fig. 1 is a schematic diagram of a spatiotemporal data processing method in the prior art provided by an embodiment of the present application.
In the prior art, spatio-temporal data processing is generally divided into two parts: first, the data is extracted, transformed and loaded (ETL); second, the data is queried. Spatio-temporal data is generally generated by a third-party business system (such as positioning software on a mobile phone) and imported into a third-party raw data storage system. The ETL system collects data from that storage system on a scheduled, timer-triggered basis, cleans and transforms it, and then either calls a MapReduce program of the Hadoop system or uses components such as Pig and Hive to load the data into the Hadoop system so that users can query it. At query time, the front-end query system usually converts the query conditions directly into SQL statements; Hive dynamically generates MapReduce code from the SQL statement to be executed, packages it and submits it to the Hadoop system; the Hadoop system executes the MapReduce program and returns the result to the front-end query system, which displays it to the user.
However, in the prior art Hive executes SQL statements by submitting MapReduce tasks. This mode of execution is generally intended for batch processing of mass data and has relatively high latency, which degrades the user experience. In addition, because the ETL system acquires data from the third-party raw data storage system on a timer-triggered schedule before storing it in the Hadoop system, the data users query is not timely. Furthermore, because the front-end query system must acquire data through SQL statements, developers need to know the storage structure and business meaning of the data in the database; otherwise it is difficult to write SQL statements that meet the requirements. This hinders parallel development of the front-end query system and the back-end data storage system and cannot meet diverse user query requirements.
In order to solve the above problem, an embodiment of the present application provides a spatio-temporal data ad hoc query method, which can search spatio-temporal data or non-spatio-temporal data in a database according to a user requirement.
Referring to fig. 2, fig. 2 is an architecture diagram of the spatio-temporal data ad hoc query system 10 according to the present embodiment. In this embodiment, the spatio-temporal data ad hoc query system 10 runs on a computer and is used for data analysis and processing. It includes a database 11, a query system 12, a Jetty service component 13, a Spark distributed cluster engine 14, a Flink-based real-time ETL system 15, a third-party database 16, a Geomesa component 17, and a Phoenix component 18.
The database 11 stores spatio-temporal data and a Geomesa index corresponding thereto, non-spatio-temporal data and a Phoenix index corresponding thereto. The non-spatiotemporal data refers to data such as name, age, sex, etc. which are not related to time and space, and the spatiotemporal data refers to data related to time and space, such as data of time (including start time and end time), place, etc. of occurrence of an event.
The Jetty service component 13, Spark distributed cluster engine 14, Flink-based real-time ETL system 15, Geomesa component 17, and Phoenix component 18 are components that are composed of computer program codes and can run on a computer, and are used for data processing.
The third-party database 16 is a database for storing third-party data, which includes spatio-temporal data and non-spatio-temporal data. The third-party data originates from a third-party business system, for example positioning software on an electronic device or other software capable of acquiring location information. The third-party business system stores the data in the third-party database 16 for subsequent querying or processing.
Referring to fig. 3, fig. 3 is a flowchart of a spatio-temporal data ad hoc query method according to an embodiment of the present disclosure. In this embodiment, the method includes:
In step S320, the query system 12 generates a corresponding query request from the query condition selected by the user according to a preset business rule, and sends the query request to the Jetty service component 13.
In this embodiment, the preset business rule defines the query request corresponding to each data type contained in the spatio-temporal data or the non-spatio-temporal data. For example, it defines the query requests for non-spatio-temporal fields such as name and age, as well as for spatio-temporal fields.
Specifically, in this step, the query system 12 encapsulates the query condition selected by the user into a JSON-form service request according to the preset business rule, and sends the JSON-form service request to the Jetty service component 13.
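The business-rule step can be illustrated with a minimal Python sketch. The field names, the `BUSINESS_RULE` table and the JSON layout (`{"conditions": {...}}`) are hypothetical; the source does not specify the rule format or the request schema, only that the front end encapsulates conditions as JSON according to a rule shared with the back end.

```python
import json

# Hypothetical preset business rule shared by the query system (front end)
# and the Spark engine (back end): each queryable field is mapped to the
# data category it belongs to. Field names are illustrative only.
BUSINESS_RULE = {
    "name": "non_spatiotemporal",
    "age": "non_spatiotemporal",
    "sex": "non_spatiotemporal",
    "start_time": "spatiotemporal",
    "end_time": "spatiotemporal",
    "location": "spatiotemporal",
}


def build_query_request(conditions: dict) -> str:
    """Encapsulate user-selected conditions into a JSON service request."""
    if not conditions:
        raise ValueError("at least one query condition is required")
    unknown = sorted(f for f in conditions if f not in BUSINESS_RULE)
    if unknown:
        raise ValueError(f"fields not defined by the business rule: {unknown}")
    # The front end only needs the rule, not the back-end storage layout.
    return json.dumps({"conditions": conditions}, sort_keys=True)


if __name__ == "__main__":
    print(build_query_request({"name": "Zhang San", "age": 30}))
```

Rejecting fields the rule does not define keeps the front end and back end in agreement without the front end knowing anything about tables or indexes.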
In step S330, the Jetty service component 13 submits the query request to the Spark distributed cluster engine 14 in the form of a Spark task.
Specifically, in this step, after the Jetty service component 13 receives the query request, its interaction layer parses the JSON string of the query request into a business object, and the business object is then submitted to the Spark distributed cluster engine 14 for execution as a Spark task through the SparkSession object embedded in the Jetty service component 13.
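A corresponding sketch of the interaction layer's parsing step follows. In the described system this would be Java or Scala code holding an embedded `SparkSession`; here the business object (`QueryBusinessObject`) and the submission stub (`submit_as_spark_task`) are hypothetical placeholders, the request schema is the same assumed `{"conditions": {...}}` layout, and no real Spark call is made.

```python
import json
from dataclasses import dataclass, field


@dataclass
class QueryBusinessObject:
    """Hypothetical business object built by the Jetty interaction layer."""
    conditions: dict = field(default_factory=dict)

    @property
    def fields(self) -> list:
        # Fields to be queried, in a stable order.
        return sorted(self.conditions)


def parse_service_request(body: str) -> QueryBusinessObject:
    """Parse the JSON string of a service request into a business object."""
    payload = json.loads(body)  # raises a ValueError subclass on bad JSON
    conditions = payload.get("conditions") if isinstance(payload, dict) else None
    if not isinstance(conditions, dict) or not conditions:
        raise ValueError("request must contain a non-empty 'conditions' object")
    return QueryBusinessObject(conditions=conditions)


def submit_as_spark_task(obj: QueryBusinessObject) -> dict:
    """Stub standing in for submission through an embedded SparkSession."""
    # A real service would build and run a Spark job here; this only echoes
    # what would be submitted.
    return {"status": "submitted", "fields": obj.fields}
```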
In step S340, the Spark distributed cluster engine 14 parses the Spark task according to the preset service rule, and obtains the parsed query service logic.
In this step, after receiving the Spark task, the Spark distributed cluster engine 14 parses it according to the preset business rule to obtain the parsed query service logic, which includes the type or field to be queried.
The Spark distributed cluster engine 14 determines from the type or field to be queried whether the query targets spatio-temporal data or non-spatio-temporal data, so as to generate the corresponding SQL statement and call the corresponding component to query the corresponding data in the database 11. Specifically, referring to fig. 3 again, this includes:
In step S350, the Spark distributed cluster engine 14 determines whether the data to be queried is spatio-temporal data according to the type or field to be queried included in the query service logic.
In step S360, if the data is spatio-temporal data, the corresponding spatio-temporal data is obtained from the database 11 through the Geomesa index.
In step S370, if the data is non-spatio-temporal data, the corresponding non-spatio-temporal data is obtained from the database 11 through the Phoenix index.
In steps S350 to S370, the Spark distributed cluster engine 14 parses the business object according to the preset business rule, reorganizes it into SQL, and determines from the type or field to be queried whether the target is spatio-temporal data. If it is, the engine calls the Geomesa component 17 to obtain the corresponding spatio-temporal data through the Geomesa index; if it is non-spatio-temporal data, the engine calls the Phoenix component 18 to obtain the corresponding non-spatio-temporal data through the Phoenix index.
Specifically, the Spark distributed cluster engine 14 includes a routing component 141, where the routing component 141 is configured to determine whether data to be queried is spatio-temporal data according to a type or a field of the data to be queried, and if the data is spatio-temporal data, the routing component 141 generates an SQL statement that the Geomesa component 17 can recognize and submits the SQL statement to the database 11, and invokes the Geomesa component 17 to retrieve corresponding spatio-temporal data in the database 11 according to a Geomesa index. If the data is non-spatio-temporal data, the routing component 141 generates an SQL statement which can be identified by the Phoenix component 18, and calls the Phoenix component 18 to search the corresponding non-spatio-temporal data in the database 11 according to the Phoenix index.
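The decision made by the routing component 141 can be sketched as follows, assuming a hypothetical field classification and hypothetical table names (`st_events`, `person_info`). The generated SQL is generic, uses equality predicates and `?` placeholders, and is not the exact dialect of the Geomesa or Phoenix components, which the source does not specify; a real spatio-temporal query would typically use spatial and time-range predicates. Queries that mix both kinds of fields are rejected because the description only covers the two separate cases.

```python
# Hypothetical field classification mirroring the preset business rule.
SPATIOTEMPORAL_FIELDS = {"start_time", "end_time", "location"}
NON_SPATIOTEMPORAL_FIELDS = {"name", "age", "sex"}

# Hypothetical target tables behind each component.
TABLES = {"geomesa": "st_events", "phoenix": "person_info"}


def route(conditions: dict):
    """Pick the component and build a parameterized SQL statement."""
    fields = set(conditions)
    if not fields:
        raise ValueError("no fields to query")
    unknown = fields - SPATIOTEMPORAL_FIELDS - NON_SPATIOTEMPORAL_FIELDS
    if unknown:
        # Only whitelisted column names ever reach the SQL text.
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    if fields <= SPATIOTEMPORAL_FIELDS:
        component = "geomesa"
    elif fields <= NON_SPATIOTEMPORAL_FIELDS:
        component = "phoenix"
    else:
        raise ValueError("mixed spatio-temporal and non-spatio-temporal fields")
    columns = sorted(fields)
    where = " AND ".join(f"{col} = ?" for col in columns)
    sql = f"SELECT * FROM {TABLES[component]} WHERE {where}"
    params = [conditions[col] for col in columns]
    return component, sql, params
```

Keeping values out of the SQL text and binding them as parameters avoids injection through user-supplied query conditions.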
In this embodiment, the Phoenix component 18 and the Geomesa component 17 create the corresponding indexes, and the database 11 (e.g., the key-value storage engine HBase) stores both the non-spatio-temporal data and the spatio-temporal data. Both kinds of data can therefore be queried at the same time, and because the indexes speed up queries, results can be returned to the user quickly.
Meanwhile, in this embodiment, the query system 12 (front end) and the database 11 side (back end) agree in advance on the same set of preset business rules. When developing the front-end system, developers therefore do not need to know the storage structure of the back-end data; they only need to generate query requests from the query conditions according to the business rules. The Spark distributed cluster engine 14 parses each request according to the same rules to determine information such as the type or field of the data to be queried, and then generates the corresponding SQL and calls the corresponding component to retrieve the data. This reduces the coupling between the front-end system and the back-end data, facilitates their parallel development, and improves the development efficiency of the system.
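To illustrate why an index shortens query time, the sketch below builds a toy space-filling-curve key that interleaves the bits of normalized longitude, latitude and time, so that records close in space and time tend to receive nearby keys in a sorted key-value store such as HBase. GeoMesa's documentation describes Z2/Z3 indexes built on this idea, but the bit width, the single time period and the encoding here are simplified assumptions, not GeoMesa's actual key format.

```python
def normalize(value: float, lo: float, hi: float, bits: int) -> int:
    """Map value in [lo, hi] onto the integer grid [0, 2**bits - 1]."""
    if not lo <= value <= hi:
        raise ValueError("value outside the indexed range")
    cells = 2 ** bits - 1
    return min(cells, int((value - lo) / (hi - lo) * cells))


def interleave3(x: int, y: int, t: int, bits: int) -> int:
    """Interleave the low `bits` bits of x, y and t (x in the lowest position)."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (3 * i)
        key |= ((y >> i) & 1) << (3 * i + 1)
        key |= ((t >> i) & 1) << (3 * i + 2)
    return key


def z3_key(lon, lat, ts, period_start, period_end, bits=10):
    """Toy Z3-style key for a point observed at time ts within one period."""
    x = normalize(lon, -180.0, 180.0, bits)
    y = normalize(lat, -90.0, 90.0, bits)
    t = normalize(ts, period_start, period_end, bits)
    return interleave3(x, y, t, bits)
```

With rows sorted by such keys, a bounding-box and time-window query can be answered by scanning a limited set of key ranges instead of the whole table.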
Referring to fig. 3 again, in the present embodiment, the method for querying spatiotemporal data in an ad hoc manner further includes:
In step S310, non-spatio-temporal data and spatio-temporal data are imported into the database 11.
Specifically, referring to fig. 4, fig. 4 is a flowchart illustrating sub-steps of step S310 in fig. 3. Step S310 includes the following substeps:
In sub-step S311, the Flink-based real-time ETL system 15 extracts data from the third-party database 16 in real time and determines whether the extracted data is spatio-temporal data.
In sub-step S312, if the data is spatio-temporal data, it is imported into the database 11 through the Geomesa component 17, and a corresponding Geomesa index is established.
In sub-step S313, if the data is non-spatio-temporal data, it is imported into the database 11 through the Phoenix component 18, and a corresponding Phoenix index is established.
In sub-steps S311 to S313, the data (both spatio-temporal and non-spatio-temporal) is generated by the third-party business system and stored in the third-party database 16 (e.g., a relational database). The Flink processing engine offers high throughput and low latency when processing streaming data, giving it strong real-time performance. Therefore, the Flink-based real-time ETL system 15, formed by introducing the Flink processing engine into an ETL (Extract-Transform-Load) system, can acquire spatio-temporal or non-spatio-temporal data from the third-party database 16 in real time, so that the data it acquires is always the latest and its timeliness is ensured.
To enable the query system 12 to query spatio-temporal or non-spatio-temporal data rapidly, after the Flink-based real-time ETL system 15 acquires data from the third-party database 16, a corresponding index is created for the data and the data is then imported into the database 11, which significantly improves query speed.
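Sub-steps S311 to S313 can be sketched as a per-record routine like the one below. In the described system this logic would run inside a Flink streaming job and write through the Geomesa and Phoenix components; here an `InMemoryStore` stands in for the database 11, and the classification test (presence of `event_time`, `lon` and `lat`) and the index keys (a coarse grid cell plus hour bucket, or the record `id`) are hypothetical.

```python
from dataclasses import dataclass, field

ST = "spatiotemporal"
NON_ST = "non_spatiotemporal"


@dataclass
class InMemoryStore:
    """Stand-in for database 11: rows and an index per data category."""
    rows: dict = field(default_factory=lambda: {ST: [], NON_ST: []})
    index: dict = field(default_factory=lambda: {ST: {}, NON_ST: {}})


def is_spatiotemporal(record: dict) -> bool:
    # Hypothetical rule: a record with a timestamp and coordinates is spatio-temporal.
    return all(k in record for k in ("event_time", "lon", "lat"))


def index_key(record: dict, category: str):
    if category == ST:
        # Coarse 0.01-degree grid cell plus hour bucket, illustrative only.
        return (round(record["lon"], 2), round(record["lat"], 2), record["event_time"] // 3600)
    return record["id"]  # Hypothetical primary key for non-spatio-temporal rows.


def ingest(record: dict, store: InMemoryStore) -> str:
    """Classify one extracted record, derive its index key, then store it."""
    category = ST if is_spatiotemporal(record) else NON_ST
    key = index_key(record, category)
    rows = store.rows[category]
    rows.append(record)
    store.index[category].setdefault(key, []).append(len(rows) - 1)
    return category
```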
Referring to fig. 5, fig. 5 is a second flowchart of a spatio-temporal data ad hoc query method according to an embodiment of the present disclosure. Optionally, because the Geomesa component 17 creates a large volume of index data, which would occupy considerable space if imported into the database 11 directly, the spatio-temporal data ad hoc query method of this embodiment further includes:
In step S380, after the Flink-based real-time ETL system 15 establishes the Geomesa index or the Phoenix index, the spatio-temporal data and its corresponding Geomesa index, and the non-spatio-temporal data and its corresponding Phoenix index, are compressed and stored in the database 11.
Specifically, in this step, the Snappy compression algorithm may be used to compress the spatio-temporal data and its corresponding Geomesa index, and the non-spatio-temporal data and its corresponding Phoenix index. Snappy offers very fast compression and decompression with a reasonable compression ratio.
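Python's standard library has no Snappy codec, so the sketch below uses `zlib` at its fastest level as a stand-in to show the compress-before-store and decompress-on-read round trip; it is not the Snappy algorithm, and the JSON row serialization is an assumption.

```python
import json
import zlib


def compress_rows(rows: list) -> bytes:
    """Serialize rows to JSON and compress them before storage."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    # Level 1 favors speed over ratio, loosely mirroring Snappy's design goal;
    # zlib is a stand-in here, not Snappy itself.
    return zlib.compress(payload, 1)


def decompress_rows(blob: bytes) -> list:
    """Reverse compress_rows when the data is read back."""
    return json.loads(zlib.decompress(blob).decode("utf-8"))
```

In an HBase-backed deployment, compression is more commonly enabled per column family (HBase supports a SNAPPY codec) than applied in application code.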
Referring to fig. 2 again, an embodiment of the present application further provides a spatio-temporal data ad hoc query system 10, which includes a database 11 for storing data, a query system 12, a Jetty service component 13, and a Spark distributed cluster engine 14, where the data includes spatio-temporal data and its corresponding Geomesa indexes, and non-spatio-temporal data and its corresponding Phoenix indexes.
The query system 12 is configured to generate a query request from the query conditions selected by a user according to a preset business rule, and to send the query request to the Jetty service component 13. The preset business rule defines, for each data type contained in the spatio-temporal or non-spatio-temporal data, the corresponding query request.
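Claim 3 states that the query system packages the selected query conditions into a JSON service request according to the preset business rules. A minimal sketch, assuming the business rules are a lookup table from data type to a category and a field list; the table contents, the data type names and the JSON keys are all hypothetical.

```python
import json

# Hypothetical preset business rules: data type -> category and fields to query.
BUSINESS_RULES = {
    "trajectory": {"category": "spatiotemporal", "fields": ["geom", "event_time"]},
    "person_profile": {"category": "non_spatiotemporal", "fields": ["name", "id_number"]},
}


def build_query_request(data_type, conditions):
    """Package user-selected conditions into a JSON service request."""
    rule = BUSINESS_RULES.get(data_type)
    if rule is None:
        raise ValueError(f"no business rule for data type {data_type!r}")
    request = {
        "dataType": data_type,
        "category": rule["category"],
        "fields": rule["fields"],
        "conditions": conditions,
    }
    return json.dumps(request, sort_keys=True)
```

For example, `build_query_request("trajectory", {"bbox": [116.0, 39.0, 117.0, 40.0]})` yields a request whose `category` tells downstream components which index family to use.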
The Jetty service component 13 is configured to submit the query request to the Spark distributed cluster engine 14 in the form of a Spark task.
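Claim 4 adds that the Jetty service component parses the JSON request into a service object and submits it as a Spark task through its Spark Session object. The sketch below shows only that hand-off; the `ServiceObject` fields and JSON keys (`dataType`, `category`, `fields`, `conditions`) are hypothetical, and the task runner is an injected callable so the flow can be exercised without a Spark cluster. In the described system the runner would wrap the Jetty service's Spark Session.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class ServiceObject:
    """Hypothetical service object decoded from the JSON service request."""
    data_type: str
    category: str
    fields: List[str]
    conditions: Dict[str, Any] = field(default_factory=dict)


def parse_request(body: str) -> ServiceObject:
    """Decode the JSON service request into a service object."""
    raw = json.loads(body)
    return ServiceObject(
        data_type=raw["dataType"],
        category=raw["category"],
        fields=list(raw["fields"]),
        conditions=dict(raw.get("conditions", {})),
    )


def submit(service: ServiceObject, run_task: Callable[[ServiceObject], Any]) -> Any:
    """Hand the service object to a task runner (a stand-in for the Spark Session)."""
    return run_task(service)
```

Keeping one long-lived Spark Session inside the Jetty process presumably avoids paying Spark start-up cost on every ad hoc query, which matters when queries are short and interactive.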
The Spark distributed cluster engine 14 is configured to parse the Spark task according to the preset business rule to obtain the parsed query business logic, which includes the type or field to be queried.
The Spark distributed cluster engine 14 is further configured to determine, from the type or field to be queried contained in the query business logic, whether the data to be queried is spatio-temporal data or non-spatio-temporal data.
If the data is spatio-temporal data, the Geomesa index corresponding to the spatio-temporal data is searched in the database 11 to retrieve the corresponding spatio-temporal data.
If the data is non-spatio-temporal data, the Phoenix index corresponding to the non-spatio-temporal data is searched in the database 11 to retrieve the corresponding non-spatio-temporal data.
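The routing decision and the SQL each branch might produce can be sketched as follows. The type and field sets, table names and column names (`geom`, `event_time`) are hypothetical. The spatial predicate is written in the style of GeoMesa's Spark SQL functions (`st_contains`, `st_makeBBOX`), which should be checked against the GeoMesa version in use; the non-spatio-temporal branch emits plain SQL of the kind Phoenix accepts.

```python
# Hypothetical routing tables: which types/fields mark a query as spatio-temporal.
SPATIOTEMPORAL_TYPES = {"trajectory", "checkin"}
SPATIOTEMPORAL_FIELDS = {"geom", "event_time"}


def is_spatiotemporal(data_type, fields):
    """Decide the route from the type or the fields to be queried."""
    return data_type in SPATIOTEMPORAL_TYPES or bool(SPATIOTEMPORAL_FIELDS & set(fields))


def route(data_type, fields, conditions):
    """Return (target component, SQL text) for a parsed query."""
    cols = ", ".join(fields)
    if is_spatiotemporal(data_type, fields):
        min_x, min_y, max_x, max_y = conditions["bbox"]
        start, end = conditions["time_range"]
        sql = (f"SELECT {cols} FROM {data_type} "
               f"WHERE st_contains(st_makeBBOX({min_x}, {min_y}, {max_x}, {max_y}), geom) "
               f"AND event_time BETWEEN '{start}' AND '{end}'")
        return "geomesa", sql
    # Non-spatio-temporal branch: equality filters answered via Phoenix indexes.
    where = " AND ".join(f"{k} = '{v}'" for k, v in sorted(conditions.items()))
    sql = f"SELECT {cols} FROM {data_type.upper()}"
    if where:
        sql += f" WHERE {where}"
    return "phoenix", sql
```

Values are interpolated into the SQL text only to keep the sketch short; a production router would use bind parameters (for Phoenix, a JDBC `PreparedStatement`) to avoid SQL injection.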
Optionally, in this embodiment, the spatio-temporal data ad hoc query system 10 further includes a Flink-based real-time ETL system 15, a third-party database 16, a Geomesa component 17, and a Phoenix component 18.
The Flink-based real-time ETL system 15 is used to extract data from the third-party database 16 in real time and to determine whether the extracted data is spatio-temporal or non-spatio-temporal data.
If the data is spatio-temporal data, the Flink-based real-time ETL system 15 is further configured to import it into the database 11 through the Geomesa component 17 and establish a corresponding Geomesa index.
If the data is non-spatio-temporal data, the Flink-based real-time ETL system 15 is further configured to import it into the database 11 through the Phoenix component 18 and establish a corresponding Phoenix index.
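The classification performed by the Flink-based real-time ETL system 15 can be sketched as a per-record predicate. The patent does not say how a record is recognized as spatio-temporal; the rule below (coordinates plus a timestamp) and the field names `lon`, `lat` and `event_time` are assumptions. In an actual Flink job this split would plausibly be written with side outputs (an `OutputTag` emitted from a `ProcessFunction`) feeding a GeoMesa sink and a Phoenix sink; here two Python lists stand in for the two streams.

```python
# Hypothetical rule: a record is spatio-temporal if it has coordinates and a timestamp.
SPATIAL_KEYS = ("lon", "lat")
TIME_KEY = "event_time"


def is_spatiotemporal_record(record):
    """True if every spatial key and the time key are present and non-null."""
    has_space = all(record.get(k) is not None for k in SPATIAL_KEYS)
    return has_space and record.get(TIME_KEY) is not None


def split_records(records):
    """Partition records into (Geomesa-bound, Phoenix-bound) lists, preserving order."""
    to_geomesa, to_phoenix = [], []
    for rec in records:
        (to_geomesa if is_spatiotemporal_record(rec) else to_phoenix).append(rec)
    return to_geomesa, to_phoenix
```

Records missing any coordinate or the timestamp fall through to the Phoenix path, so nothing extracted from the third-party database is silently dropped.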
Optionally, in this embodiment, after establishing the Geomesa index or the Phoenix index, the Flink-based real-time ETL system 15 is further configured to compress the spatio-temporal data with its corresponding Geomesa index, and the non-spatio-temporal data with its corresponding Phoenix index, and to store the compressed data in the database 11.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, for the specific working process of the spatio-temporal data ad hoc query system described above, reference may be made to the corresponding process in the foregoing method embodiments; it is not repeated here.
An embodiment of the present application further provides an electronic device comprising a processor and a non-volatile memory storing computer instructions; when the computer instructions are executed by the processor, the electronic device performs the spatio-temporal data ad hoc query method.
An embodiment of the present application further provides a storage medium storing a computer program that, when executed, implements the spatio-temporal data ad hoc query method.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present application, or the part of it that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
The above-mentioned embodiments are only specific embodiments of the present application, used to illustrate rather than limit its technical solutions, and the protection scope of the present application is not limited thereto. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that anyone familiar with the art may, within the technical scope disclosed in the present application, still modify the technical solutions described in the foregoing embodiments, readily conceive of changes to them, or make equivalent substitutions for some of their technical features. Such modifications, changes or substitutions do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A spatio-temporal data ad hoc query method, applied to a spatio-temporal data ad hoc query system, wherein the spatio-temporal data ad hoc query system comprises a database, a query system, a Jetty service component, a Geomesa component, a Phoenix component and a Spark distributed cluster engine, the Spark distributed cluster engine comprises a routing component, and spatio-temporal data and its corresponding Geomesa indexes, and non-spatio-temporal data and its corresponding Phoenix indexes, are stored in the database, the method comprising the following steps:
the query system generates a corresponding query request from a query condition selected by a user according to a preset business rule, and sends the query request to the Jetty service component, wherein the preset business rule comprises the query requests corresponding to the data types respectively contained in the spatio-temporal data or the non-spatio-temporal data;
the Jetty service component submits the query request to the Spark distributed cluster engine as a Spark task;
the Spark distributed cluster engine parses the Spark task according to the preset business rule to obtain parsed query business logic, wherein the query business logic comprises a type or a field to be queried;
the routing component determines, according to the type or the field to be queried, whether the data to be queried is spatio-temporal data;
if the data to be queried is spatio-temporal data, the routing component generates an SQL statement recognizable by the Geomesa component, submits the SQL statement to the database, and calls the Geomesa component to retrieve the corresponding spatio-temporal data from the database according to the Geomesa index;
if the data to be queried is non-spatio-temporal data, the routing component generates an SQL statement recognizable by the Phoenix component, and calls the Phoenix component to retrieve the corresponding non-spatio-temporal data from the database according to the Phoenix index.
2. The method according to claim 1, wherein the spatio-temporal data ad hoc query system further comprises a Flink-based real-time ETL system and a third-party database, and the method comprises a step of importing the non-spatio-temporal data and the spatio-temporal data into the database, the step comprising:
the Flink-based real-time ETL system extracts data from the third-party database in real time and determines whether the extracted data is spatio-temporal data or non-spatio-temporal data;
if the data is spatio-temporal data, importing the spatio-temporal data into the database through the Geomesa component and establishing a corresponding Geomesa index;
if the data is non-spatio-temporal data, importing the non-spatio-temporal data into the database through the Phoenix component and establishing a corresponding Phoenix index.
3. The method of claim 2, wherein the step in which the query system generates a corresponding query request from a query condition selected by a user according to a preset business rule and sends the query request to the Jetty service component comprises:
the query system packages the query condition into a JSON-form service request according to the preset business rule and sends the JSON-form service request to the Jetty service component.
4. The method of claim 3, wherein the step in which the Jetty service component submits the query request to the Spark distributed cluster engine as a Spark task comprises:
the Jetty service component parses the JSON-form service request into a service object, and submits the service object, through its Spark Session object, to the Spark distributed cluster engine for execution as a Spark task.
5. The method of claim 2, further comprising:
after the Flink-based real-time ETL system establishes the Geomesa index or the Phoenix index, compressing the spatio-temporal data and its corresponding Geomesa index, and the non-spatio-temporal data and its corresponding Phoenix index, and storing the compressed data in the database.
6. A spatio-temporal data ad hoc query system, characterized by comprising a database for storing data, a query system, a Jetty service component, a Geomesa component, a Phoenix component and a Spark distributed cluster engine, wherein the Spark distributed cluster engine comprises a routing component, and the data comprises spatio-temporal data and its corresponding Geomesa indexes, and non-spatio-temporal data and its corresponding Phoenix indexes, wherein:
the query system is used for generating a corresponding query request from a query condition selected by a user according to a preset business rule and sending the query request to the Jetty service component, wherein the preset business rule comprises the query requests corresponding to the data types respectively contained in the spatio-temporal data or the non-spatio-temporal data;
the Jetty service component is used for submitting the query request to the Spark distributed cluster engine as a Spark task;
the Spark distributed cluster engine is used for parsing the Spark task according to the preset business rule to obtain parsed query business logic, wherein the query business logic comprises a type or a field to be queried;
the routing component is used for determining, according to the type or the field to be queried, whether the data to be queried is spatio-temporal data;
if the data to be queried is spatio-temporal data, the routing component is further used for generating an SQL statement recognizable by the Geomesa component, submitting the SQL statement to the database, and calling the Geomesa component to retrieve the corresponding spatio-temporal data from the database according to the Geomesa index;
if the data to be queried is non-spatio-temporal data, the routing component is further used for generating an SQL statement recognizable by the Phoenix component and calling the Phoenix component to retrieve the corresponding non-spatio-temporal data from the database according to the Phoenix index.
7. The system of claim 6, wherein the spatio-temporal data ad hoc query system further comprises a Flink-based real-time ETL system and a third-party database, wherein:
the Flink-based real-time ETL system is used for extracting data from the third-party database in real time and determining whether the extracted data is spatio-temporal data or non-spatio-temporal data;
if the data is spatio-temporal data, the Flink-based real-time ETL system is further used for importing the spatio-temporal data into the database through the Geomesa component and establishing a corresponding Geomesa index;
if the data is non-spatio-temporal data, the Flink-based real-time ETL system is further used for importing the non-spatio-temporal data into the database through the Phoenix component and establishing a corresponding Phoenix index.
8. The system of claim 7, wherein, after establishing the Geomesa index or the Phoenix index, the Flink-based real-time ETL system is further configured to compress the spatio-temporal data and its corresponding Geomesa index, and the non-spatio-temporal data and its corresponding Phoenix index, and to store the compressed data in the database.
9. An electronic device comprising a processor and a non-volatile memory having computer instructions stored thereon, wherein when the computer instructions are executed by the processor, the electronic device performs the spatio-temporal data ad hoc query method according to any one of claims 1 to 5.
10. A storage medium having stored thereon a computer program that, when executed, implements the spatiotemporal data ad hoc query method of any one of claims 1-5.
CN201910909588.5A 2019-09-25 2019-09-25 Space-time data ad hoc query method, system, electronic device and storage medium Active CN110659294B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910909588.5A CN110659294B (en) 2019-09-25 2019-09-25 Space-time data ad hoc query method, system, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910909588.5A CN110659294B (en) 2019-09-25 2019-09-25 Space-time data ad hoc query method, system, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN110659294A CN110659294A (en) 2020-01-07
CN110659294B true CN110659294B (en) 2022-05-17

Family

ID=69039012

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910909588.5A Active CN110659294B (en) 2019-09-25 2019-09-25 Space-time data ad hoc query method, system, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN110659294B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111291047A (en) * 2020-01-16 2020-06-16 北京明略软件系统有限公司 Space-time data storage method and device, storage medium and electronic equipment
CN113434580A (en) * 2020-03-23 2021-09-24 北京国双科技有限公司 Phoenix database access method, device, equipment and medium
CN112214571B (en) * 2020-10-10 2024-02-20 中国平安人寿保险股份有限公司 Index generation method, device, equipment and medium based on KV storage
CN113835929B (en) * 2021-09-24 2024-04-30 深圳追一科技有限公司 Data acquisition method and device, electronic equipment and storage medium

Citations (5)

Publication number Priority date Publication date Assignee Title
CN107590250A (en) * 2017-09-18 2018-01-16 广州汇智通信技术有限公司 A kind of space-time orbit generation method and device
CN107766381A (en) * 2016-08-22 2018-03-06 北京京东尚科信息技术有限公司 Data query method, system and electronic equipment
CN108280082A (en) * 2017-01-06 2018-07-13 北京京东尚科信息技术有限公司 A kind of extemporaneous querying method and system of statistical data
CN109271437A (en) * 2018-09-27 2019-01-25 智庭(北京)智能科技有限公司 A kind of Query method in real time of magnanimity rent information
KR20190069229A (en) * 2017-12-11 2019-06-19 한국교통대학교산학협력단 Method and system for managing moving objects in distributed memory

Also Published As

Publication number Publication date
CN110659294A (en) 2020-01-07

Similar Documents

Publication Publication Date Title
CN110659294B (en) Space-time data ad hoc query method, system, electronic device and storage medium
US11151206B2 (en) Method and apparatus for pushing information
JP7343568B2 (en) Identifying and applying hyperparameters for machine learning
US11775501B2 (en) Trace and span sampling and analysis for instrumented software
US9544355B2 (en) Methods and apparatus for realizing short URL service
US8661027B2 (en) Vertical search-based query method, system and apparatus
US20150213042A1 (en) Search term obtaining method and server, and search term recommendation system
US20130138636A1 (en) Image Searching
WO2015169056A1 (en) Information presentation method and device
US20120278354A1 (en) User analysis through user log feature extraction
KR101947299B1 (en) Systems and methods for search query rewrites
WO2020044096A1 (en) Information searching method and apparatus, and device/terminal/server
US10353966B2 (en) Dynamic attributes for searching
US11423096B2 (en) Method and apparatus for outputting information
CN104035938A (en) Performance continuous integration data processing method and device
CN112256772A (en) Data service method, device and readable storage medium
CN114116827B (en) Query system and method for user portrait data
CN107943846B (en) Data processing method and device and electronic equipment
CN108959294B (en) Method and device for accessing search engine
CN110955712A (en) Development API processing method and device based on multiple data sources
CN107977381B (en) Data configuration method, index management method, related device and computing equipment
CN112765200A (en) Data query method and device based on Elasticissearch
Nouvellet et al. A Quantitative analysis of digital library user behaviour based on access logs
CN113268487B (en) Data statistical method, device and computer readable storage medium
CN110704436B (en) Hbase-based index generation method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant