CN112434189A

CN112434189A - Data query method, device and equipment

Info

Publication number: CN112434189A
Application number: CN202011404960.6A
Authority: CN
Inventors: 张振
Original assignee: New H3C Big Data Technologies Co Ltd
Current assignee: New H3C Big Data Technologies Co Ltd
Priority date: 2020-12-02
Filing date: 2020-12-02
Publication date: 2021-03-02

Abstract

The application provides a data query method, a data query device and data query equipment, wherein the method comprises the following steps: receiving a query request, wherein the query request comprises a service classification, a data table name and a data filtering condition; searching a data source name corresponding to the service classification and the data table name from a data directory, and searching connection information corresponding to the data source name from metadata; selecting a target data source corresponding to the data source name from a plurality of data sources, establishing connection with the target data source based on the connection information, and sending the data filtering condition to the target data source through the connection so that the target data source inquires target data corresponding to the data filtering condition; and receiving the target data through the connection, and sending the target data to a client. According to the technical scheme, the data access efficiency is improved, and the data access difficulty is reduced.

Description

Data query method, device and equipment

Technical Field

The present application relates to the field of communications technologies, and in particular, to a data query method, apparatus, and device.

Background

GeoMesa is an open-source toolkit for spatio-temporal data processing. It supports the query and analysis of large-scale geospatial data, provides a management interface for data storage and spatio-temporal indexing, and provides an API (Application Programming Interface) for data query. In practical development, GeoMesa can be used to define data tables on distributed data storage, so that data management and index management are realized and data query and analysis are provided for upper-layer services. GeoMesa enables the management and query of spatio-temporal data in big data scenarios, greatly lowers the threshold for developing with spatio-temporal data, and improves development efficiency.

GeoMesa can adapt to and interface with various data sources, and can manage and index spatio-temporal data on data sources such as HBase (a column-oriented distributed open-source database), Accumulo (a reliable, scalable, high-performance sorted distributed database), Cassandra (an open-source distributed NoSQL database), Kafka (a high-throughput distributed publish-subscribe messaging system), and Redis (Remote Dictionary Server), so as to improve data query efficiency.

However, a user accessing GeoMesa must obtain the information of every data source in order to create data access connections to the various types of data sources, which reduces data access efficiency and increases data access difficulty. For example, to access an HBase data source, the user needs to know the information of the HBase data source; to access an Accumulo data source, the user needs to know the information of the Accumulo data source; and so on. The user therefore needs to know the information of all data sources, which makes data access relatively difficult.

Disclosure of Invention

The application provides a data query method, which is applied to data management equipment in a data management system, wherein the data management system also comprises a plurality of data sources, and the method comprises the following steps:

receiving a query request, wherein the query request comprises a service classification, a data table name and a data filtering condition;

searching a data source name corresponding to the service classification and the data table name from a pre-configured data directory, and searching connection information corresponding to the data source name from pre-configured metadata;

selecting a target data source corresponding to the data source name from a plurality of data sources, establishing connection with the target data source based on the connection information, and sending the data filtering condition to the target data source through the connection, so that the target data source queries target data corresponding to the data filtering condition;

and receiving the target data through the connection and sending the target data to a client.

In a possible embodiment, the searching for the connection information corresponding to the data source name from the preconfigured metadata includes: determining a data source type corresponding to the data source name;

and searching the connection information corresponding to the data source type from the metadata.

Illustratively, if the data source type is an HBase type, the connection information includes a zookeeper connection address, a zookeeper connection port, and authentication information; if the data source type is the HDFS type, the connection information comprises a file system address, a port and authentication information; if the data source type is Kafka type, the connection information comprises a bootstrap service address and port, and a zookeeper connection address and port; if the data source type is a Redis type, the connection information comprises a Redis cluster address, a port and an access password.

In a possible implementation manner, before the searching for the data source name corresponding to the service classification and the data table name from the pre-configured data directory, the method further includes:

acquiring a pre-configured data directory, and storing the data directory in a database of the data management equipment, wherein the data directory comprises a mapping relation among a service classification, a data table name and a data source name;

acquiring preconfigured metadata and storing the metadata in the database of the data management equipment, wherein the metadata comprises a mapping relation between a data source type and connection information.

In one possible embodiment, the method further comprises:

after receiving data to be stored, determining the type of the data to be stored;

if the type is real-time data, storing the data to be stored to a Kafka type data source;

if the type is historical data, storing the data to be stored in a data source of an HBase type, or storing the data to be stored in a data source of an HDFS type;

and if the type is transient data, storing the data to be stored to a data source of the Redis type.
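The storage rule in the steps above can be sketched as a small routing table. This is a minimal illustration rather than the implementation described in the application: the names `DataKind`, `STORAGE_TARGETS` and `choose_source_type` are hypothetical, and the `prefer` argument is an assumed way of choosing between HBase and HDFS for historical data, since the text does not say how that choice is made.

```python
from enum import Enum
from typing import Optional


class DataKind(Enum):
    REAL_TIME = "real-time"
    HISTORICAL = "historical"
    TRANSIENT = "transient"


# Candidate data source types for each kind of data, as listed in the text.
# Historical data may be stored in either an HBase or an HDFS data source.
STORAGE_TARGETS = {
    DataKind.REAL_TIME: ("Kafka",),
    DataKind.HISTORICAL: ("HBase", "HDFS"),
    DataKind.TRANSIENT: ("Redis",),
}


def choose_source_type(kind: DataKind, prefer: Optional[str] = None) -> str:
    """Return the data source type that should store data of the given kind.

    `prefer` is a hypothetical knob; without it the first candidate is used.
    """
    candidates = STORAGE_TARGETS[kind]
    if prefer is None:
        return candidates[0]
    if prefer not in candidates:
        raise ValueError(f"{prefer} cannot store {kind.value} data")
    return prefer
```

For instance, `choose_source_type(DataKind.HISTORICAL, prefer="HDFS")` selects the HDFS type, while asking for Kafka as the target of transient data raises an error.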

In one possible embodiment, the method further comprises:

acquiring a data migration task; the data migration task comprises source service classification, a source data table name, target service classification, a target data table name and attribute information of data to be migrated;

searching a first data source name corresponding to the source service classification and the source data table name and a second data source name corresponding to the target service classification and the target data table name from the data directory;

searching first connection information corresponding to the first data source name and second connection information corresponding to the second data source name from the metadata; establishing a first connection with a first data source based on the first connection information, and establishing a second connection with a second data source based on the second connection information;

and migrating the data to be migrated in the first data source to the second data source based on the attribute information.

In a possible implementation, the attribute information includes a field mapping relationship between a first field in a source data table and a second field in a target data table, and migrating the data to be migrated in the first data source to the second data source based on the attribute information includes:

and migrating the data to be migrated of the first field in the source data table in the first data source to a second field in the target data table in the second data source based on the attribute information.
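The field mapping can be pictured as a rename applied to every row read from the source data table before it is written to the target data table. The sketch below works on in-memory dictionaries only; the function name `migrate_rows` and the sample field names are hypothetical, and dropping source fields that have no mapping entry is an assumption, not something the text states.

```python
from typing import Dict, Iterable, List


def migrate_rows(source_rows: Iterable[dict], field_mapping: Dict[str, str]) -> List[dict]:
    """Rename the fields of each source row according to field_mapping.

    field_mapping maps a field in the source data table (first field) to a
    field in the target data table (second field). Source fields without a
    mapping entry are not migrated (an assumption of this sketch).
    """
    migrated = []
    for row in source_rows:
        migrated.append(
            {target: row[source]
             for source, target in field_mapping.items()
             if source in row}
        )
    return migrated


# Hypothetical rows and mapping, for illustration only.
rows = [{"lon": 116.4, "lat": 39.9, "ts": "2020-12-02T00:00:00Z", "speed": 42}]
mapping = {"lon": "longitude", "lat": "latitude", "ts": "event_time"}
result = migrate_rows(rows, mapping)
```

In a real migration the rows would be read through the first connection and written through the second connection; that I/O is omitted here.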

In one possible embodiment, the plurality of data sources are each for storing spatiotemporal data; wherein the spatiotemporal data comprises a temporal dimension, a spatial dimension, and an attribute dimension.

The application provides a data query device, which is applied to a data management device in a data management system, the data management system further including a plurality of data sources, and the data query device includes: a receiving module, configured to receive a query request, wherein the query request comprises a service classification, a data table name and a data filtering condition;

the acquisition module is used for searching a data source name corresponding to the service classification and the data table name from a pre-configured data directory and searching connection information corresponding to the data source name from pre-configured metadata; selecting a target data source corresponding to the data source name from a plurality of data sources;

the establishing module is used for establishing connection with a target data source based on the connection information;

a sending module, configured to send the data filtering condition to the target data source through the connection, so that the target data source queries target data corresponding to the data filtering condition;

the receiving module is further configured to receive the target data through the connection;

the sending module is further configured to send the target data to a client.

The application provides a data management device, including: a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor;

the processor is configured to execute machine executable instructions to perform the steps of:

Based on the above technical solution, in the embodiments of the present application, the query request only needs to carry the service classification and the data table name and does not need to carry the data source name. The data management device can look up the data source name corresponding to the service classification and the data table name and thereby obtain the information of the data source. The information of the data source is thus shielded from the user: the user does not need to know it, cannot perceive which data source the data is queried from, and cannot perceive the connection establishment process. The user can therefore access various types of data sources and quickly access the data in them through a uniform access interface, which improves the efficiency of data access, reduces the difficulty of data access, and improves the development efficiency of upper-layer applications.

Drawings

In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings in the following description show only some embodiments of the present application, and those skilled in the art can obtain other drawings from them.

FIG. 1 is a functional schematic of GeoMesa in one embodiment of the present application;

FIG. 2 is a block diagram of a data management system according to an embodiment of the present application;

FIG. 3 is a block diagram of a data directory in one embodiment of the present application;

FIG. 4 is a flow diagram of a data query method in one embodiment of the present application;

FIG. 5 is a schematic representation of spatiotemporal data in one embodiment of the present application;

FIG. 6 is a block diagram of a data query device according to an embodiment of the present application;

FIG. 7 is a hardware configuration diagram of a data management device according to an embodiment of the present application.

Detailed Description

The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein is meant to encompass any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, the information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Moreover, depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when", or "in response to a determination".

GeoMesa is an open-source toolkit for spatio-temporal data processing. As shown in FIG. 1, GeoMesa supports a data management API, supports a query analysis API for spatio-temporal data, provides spatio-temporal indexes for distributed data, and supports a query engine for distributed data. GeoMesa can store spatio-temporal data in a spatio-temporal data storage system, thereby realizing data management and index management of the spatio-temporal data and providing the spatio-temporal data to upper-layer services for data query and analysis. The spatio-temporal data storage system is a database for storing spatio-temporal data and is referred to herein as a data source, i.e., a data source is a database for storing spatio-temporal data.

Referring to FIG. 2, a schematic structural diagram of a data management system according to an embodiment of the present application is shown. The data management system may be a system that implements data query by using GeoMesa, or may be another type of system, which is not limited herein. The data management system may include a data management device, a plurality of clients (e.g., browsers or apps on terminal devices) and a plurality of data sources. The data sources may be of the same type or of different types; this embodiment takes data sources of different types as an example.

Referring to FIG. 2, these data sources may include an HBase type data source (referred to as an HBase data source), an HDFS (Hadoop Distributed File System) type data source (HDFS data source), a Kafka type data source (Kafka data source), and a Redis type data source (Redis data source). Of course, these are just a few examples; the data source types are not limited and may also include, for example, Accumulo data sources and Cassandra data sources. The plurality of data sources store spatio-temporal data and respond to query requests in different scenarios, and together they form a spatio-temporal data warehouse.

In the related art, for a user accessing a data source, it is necessary to acquire information of all data sources to create a data access connection to access various types of data sources, so that data access efficiency is reduced, and data access difficulty is increased. For example, if the HBase data source needs to be accessed, the user needs to know the information of the HBase data source to access the HBase data source, and if the HDFS data source needs to be accessed, the user needs to know the information of the HDFS data source to access the HDFS data source, and so on.

In view of the above problem, in the embodiment of the present application, as shown in FIG. 2, a data management device (i.e., a spatio-temporal data management system) is additionally deployed in the data management system. The data management device may be an independent device, or may be a functional module integrated into an existing device of the data management system, which is not limited herein. With the data management device deployed, the query request only needs to carry the service classification and the data table name and does not need to carry the data source name. The data management device can look up the data source name corresponding to the service classification and the data table name and thereby obtain the information of the data source. The information of the data source is thus shielded from the user: the user does not need to know it, cannot perceive which data source the data is queried from, and cannot perceive the connection establishment process.

The technical solutions of the embodiments of the present application are described below with reference to specific embodiments.

In one possible implementation, the data management device may obtain preconfigured metadata and store the metadata in a database of the data management device, where the metadata may include a mapping relationship between a data source type and connection information. The connection information is what the data management device uses to establish a connection with a data source of that data source type, and the data source is then accessed through the connection.

For example, the metadata may be preconfigured, and the data management device may include a database (e.g., a PostgreSQL database, the type of the database is not limited), and the data management device may obtain the preconfigured metadata and store the metadata in the database of the data management device.

For example, if the data source type of the data source is HBase type, the connection information may include, but is not limited to, a zookeeper connection address and port, and authentication information. If the data source type of the data source is the HDFS type, the connection information may include, but is not limited to, a file system address and port, and authentication information. If the data source type of the data source is Kafka type, the connection information may include, but is not limited to, bootstrap service address and port, zookeeper connection address and port. If the data source type of the data source is a Redis type, the connection information may include, but is not limited to, a Redis cluster address and port, and an access password.

For example, each type of data source may be managed uniformly, metadata of each data source may be defined uniformly, connection information of different types of data sources may have differences, and connection information of data sources may be distinguished according to the type of the data source. Referring to table 1, metadata for each data source may be recorded in a database.

TABLE 1

Attribute name	Meaning	Rule	Required	Default value
name	Data source name	Letters and digits; globally unique	Required	None
type	Data source type	Enumeration: Kafka, HBase, HDFS, Redis	Required	None
description	Description information	Unrestricted	Optional	None
params	Connection information	Data source connection parameters; the constraints differ for each data source	Required	None

For the HBase data source, in the metadata shown in Table 1, name is the name of the HBase data source, and the data source name is globally unique. type is the type of the HBase data source, i.e., HBase. description is description information of the HBase data source, and the description information is not limited. params is connection information of the HBase data source, such as the zookeeper connection address and port, Kerberos authentication information, and the like. Of course, the connection information may also include other types of connection information, which is not limited herein.

For the HDFS data source, in the metadata shown in table 1, name is the name of the HDFS data source, type is the type of the HDFS data source, that is, HDFS, and description is description information of the HDFS data source. params is connection information of the HDFS data source, such as a file system address and port, Kerberos authentication information, and the like, and the connection information may also include other types of connection information, and the connection information is not limited.

For the Kafka data source, in the metadata shown in table 1, name is the name of the Kafka data source, type is the type of the Kafka data source, i.e., Kafka, and description is description information of the Kafka data source. params is connection information of Kafka data source, such as bootstrap service address and port, zookeeper connection address and port, and the connection information may also include other types of connection information, which is not limited herein.

For a Redis data source, in the metadata shown in Table 1, name is the name of the Redis data source, type is the type of the Redis data source, i.e. Redis, and description is the description information of the Redis data source. params is connection information of the Redis data source, such as Redis cluster address and port, access password, etc., and the connection information may also include other types of connection information, which is not limited.

In summary, referring to table 1, the data management device may store the metadata shown in table 1 in the database, where the metadata may include a mapping relationship between a data source name, a data source type, and connection information.
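A metadata record of the kind described in Table 1 can be sketched as a small validated structure. This is only an illustration under stated assumptions: the class name `DataSourceMeta`, the table `REQUIRED_PARAMS` and every parameter key in it (for example `zookeeper_address`) are hypothetical, because the text names the connection parameters only in prose; the host name and port value are placeholders.

```python
from dataclasses import dataclass

# Connection parameters that the text lists for each data source type.
# The key names are hypothetical.
REQUIRED_PARAMS = {
    "HBase": {"zookeeper_address", "zookeeper_port", "auth"},
    "HDFS": {"fs_address", "fs_port", "auth"},
    "Kafka": {"bootstrap_address", "bootstrap_port",
              "zookeeper_address", "zookeeper_port"},
    "Redis": {"cluster_address", "cluster_port", "password"},
}


@dataclass
class DataSourceMeta:
    name: str              # letters and digits, globally unique
    type: str              # one of the keys of REQUIRED_PARAMS
    params: dict           # connection information
    description: str = ""  # optional free text

    def __post_init__(self):
        if not self.name.isalnum():
            raise ValueError("name must consist of letters and digits")
        if self.type not in REQUIRED_PARAMS:
            raise ValueError(f"unknown data source type: {self.type}")
        missing = REQUIRED_PARAMS[self.type] - set(self.params)
        if missing:
            raise ValueError(f"missing connection parameters: {sorted(missing)}")


# Placeholder values, for illustration only.
hbase_meta = DataSourceMeta(
    name="hbase01",
    type="HBase",
    params={"zookeeper_address": "zk.example.com",
            "zookeeper_port": 2181,
            "auth": "kerberos"},
)
```

Global uniqueness of `name` is not checked here; in a deployment that stores the metadata in a relational database, a unique constraint on the name column would be one way to enforce it.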

In a possible implementation manner, the data management device may further obtain a preconfigured data directory, and store the data directory in a database of the data management device, where the data directory may include a mapping relationship between the service class, the data table name, and the data source name. For example, the data directory may be preconfigured, and the data management device may include a database (e.g., PostgreSQL database), and the data management device may obtain the preconfigured data directory and store the preconfigured data directory in the database of the data management device.

For example, the data management device may record the mapping relationship between the data source name, the service class, and the data table name in a three-layer data directory manner, as shown in fig. 3, which is a schematic structural diagram of the data directory.

The first layer of the data directory is the data source name, which is the unique identifier of a data source, such as the name of an HBase data source or the name of an HDFS data source. The first layer is used to determine the specific data source: the data source can be found through the first layer, and a connection with the data source is then established when data is accessed.

The second layer of the data directory is a service classification (also called data classification), which is a classification for classifying data, and the service classification is independent of the type of the data source, and indicates which service classification the data in the data table belongs to, so that the definition of the data table at the third layer can be logically classified. For example, the service classification may be a traffic type, a point of interest type, a location positioning type, and the like, and the service classification is not limited. The traffic type indicates that the data in all the data tables under the service classification are the data of the traffic type, the interest point type indicates that the data in all the data tables under the service classification are the data of the interest point type, and the like.

The third layer of the data directory is a data table name, which may be a unique identifier of the data table, such as a name of data table a1 in the HBase data source, a name of data table a2 in the HBase data source, a name of data table b1 in the HDFS data source, and the like, and the third layer is used to determine a specific data table, that is, a certain data table in the data source may be found through the third layer, which indicates that data in the data table needs to be accessed.

For example, in the data directory, the data source name can be denoted by datastore, the service classification by catalog, and the data table name by schema, and the three-layer structure of the data directory can be represented as datastore.

In summary, referring to FIG. 3, the data directory may be a tree structure, in which the data source name in the first layer is globally unique, the service classification name in the second layer is globally unique, and the data table name in the third layer is unique within its service classification. The data source name is therefore uniquely determined by the combination of the service classification and the data table name, so that differences between data sources no longer need to be perceived, and the complete three-layer structure can be determined from the service classification and the data table name alone.
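The three-layer directory and its uniqueness rule can be sketched as a nested mapping that is flattened into a lookup index. This is a minimal in-memory illustration: the function name `build_directory_index`, the data source names `hbase01` and `hdfs01`, and the service classification names `traffic` and `poi` are hypothetical, while the table names a1, a2 and b1 follow the examples in the text.

```python
def build_directory_index(directory: dict) -> dict:
    """Flatten datastore -> catalog -> [schema] into (catalog, schema) -> datastore.

    Raises ValueError if the same (catalog, schema) pair appears under more
    than one data source, because the pair must identify one data source.
    """
    index = {}
    for datastore, catalogs in directory.items():
        for catalog, schemas in catalogs.items():
            for schema in schemas:
                key = (catalog, schema)
                if key in index:
                    raise ValueError(
                        f"{catalog}/{schema} is defined in more than one data source"
                    )
                index[key] = datastore
    return index


# Layer 1: data source name; layer 2: service classification; layer 3: table names.
directory = {
    "hbase01": {"traffic": ["a1", "a2"]},
    "hdfs01": {"poi": ["b1"]},
}
index = build_directory_index(directory)
```

With this index, a caller that knows only the service classification and the data table name can recover the data source name, for example `index[("poi", "b1")]`.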

For example, the definition of the service class catalog in the data directory can be shown in table 2, and the definition of the data table name schema in the data directory can be shown in table 3.

TABLE 2

Attribute name	Meaning	Rule	Required	Default value
name	Service classification name	Letters and digits; globally unique	Required	None
description	Description information	Unrestricted	Optional	None

TABLE 3

Attribute name	Meaning	Rule	Required	Default value
name	Data table name	Letters and digits; unique within the service classification	Required	None
comment	Description information	Unrestricted	Optional	None

In a possible implementation manner, the data management device may further obtain a preconfigured data table structure, and store the data table structure in a database of the data management device, where the data table structure defines information of a certain data field in the data table. For example, the data table structure may be preconfigured, and the data management device may include a database (e.g., PostgreSQL database), and the data management device may obtain the preconfigured data table structure and store the preconfigured data table structure in the database of the data management device.

Table 4 shows an example of the data table structure; the content of the data table structure is not limited. A data source may refer to the data table structure when creating a data table, that is, create the data table based on the data table structure. For example, for a given data field (the database of the data management device may record a data table structure for each of a plurality of data fields), if the data field is a primary key, it needs to be used as the primary key when the data table is created; if the data field is an index field, it needs to be used as an index field when the data table is created; if the data field is an object identifier, it needs to be used as the object identifier when the data table is created. In addition, the data management device may also provide the data table structure to the user, so that the user knows which data field is the primary key, which data field is the index field, and which data field is the object identifier.

TABLE 4

Attribute name	Meaning	Rule	Required	Default value
name	Data field name	Letters and digits; unique within the table	Required	None
type	Data type	Spatio-temporal data types	Required	None
pk	Whether the field is a primary key	Boolean	Required	false
index	Whether the field is an index field	Boolean	Required	false
uid	Whether the field is the object identifier	Boolean	Required	false
comment	Description information	Unrestricted	Optional	None
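The field attributes of Table 4 can be sketched as a record per data field, from which the primary key, index fields and object identifier of a table are derived. The names `FieldDef` and `summarize_fields` are hypothetical, and the sample field names and type strings (`String`, `Date`, `Point`) are assumptions used only for illustration, since the text does not enumerate the spatio-temporal data types.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FieldDef:
    name: str            # letters and digits, unique within the table
    type: str            # a spatio-temporal data type
    pk: bool = False     # whether the field is a primary key
    index: bool = False  # whether the field is an index field
    uid: bool = False    # whether the field is the object identifier
    comment: str = ""    # optional description


def summarize_fields(fields: List[FieldDef]) -> Dict[str, List[str]]:
    """Report which fields act as primary key, index fields and object identifier."""
    names = [f.name for f in fields]
    if len(set(names)) != len(names):
        raise ValueError("field names must be unique within the table")
    return {
        "primary_key": [f.name for f in fields if f.pk],
        "indexed": [f.name for f in fields if f.index],
        "object_id": [f.name for f in fields if f.uid],
    }


# Hypothetical table structure, for illustration only.
fields = [
    FieldDef(name="id", type="String", pk=True),
    FieldDef(name="vehicle", type="String", uid=True),
    FieldDef(name="dtg", type="Date", index=True),
    FieldDef(name="geom", type="Point", index=True),
]
summary = summarize_fields(fields)
```

A data source creating the table, or a user inspecting it, could read `summary` to learn the role of each field.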

In the foregoing application scenario, an embodiment of the present application provides a data query method, which may be applied to a data management device in a data management system, where the data management system may further include a client and a plurality of data sources. FIG. 4 is a flowchart of the method, and the method may include:

Step 401, receiving a query request, where the query request includes a service classification, a data table name and a data filtering condition. For example, the data management device receives a query request sent by a client.

For example, the data management device implements query analysis of data on the basis of data directory management and metadata management. In the query analysis process, a data access API is provided for the different data sources, and data access can be encapsulated behind a REST (Representational State Transfer) interface that shields the differences between data sources, that is, shields the data source information. Data access operations therefore do not involve data source information, and the interface involves only a service classification (catalog), a data table name (schema) and a data filtering condition. Therefore, when the client sends a query request to the data management device, the query request includes the service classification, the data table name and the data filtering condition, but does not include the data source information.

In summary, after receiving the query request, the data management device may parse the service classification, the data table name and the data filtering condition from the query request, but the query request does not include the data source information.

In one possible implementation, one example of the query request may be:

URL: /spatiotemporal/catalogs/${catalog}/schemas/${schema}/features

The meaning of the parameters of the query request is shown in Table 5: spatiotemporal indicates that the type of the data to be queried is spatio-temporal data, catalog indicates a service classification, ${catalog} indicates a specific service classification value, schema indicates a data table name, and ${schema} indicates a specific data table name value. In addition, features denotes the data filtering condition, which may include, but is not limited to, a HEADERS field, whose meaning is shown in Table 6, and a REQUEST BODY field, whose meaning is shown in Table 7.

TABLE 5

Parameter name	Meaning	Rule	Required	Default value
catalog	Service classification name	Letters and digits	Required	None
schema	Data table name	Letters and digits	Required	None

TABLE 6

Parameter name	Meaning	Rule	Required	Default value
Content-Type	Content type	The interface accepts only application/json	Required	application/json

TABLE 7

Parameter name	Means of	Data type	Description of the rules	Whether or not to fill	Default value
						cql	General query statement	String	General query statement	Must fill in	INCLUDE
limit	Number limit of return values	Int	Query data volume restriction	Optionally	1000
						attributes	Deriving attribute names	JsonArray<String>	Query attribute collection	Optionally	Is free of

In summary, the query request includes the service classification, the data table name and the data filtering condition, but does not include the data source information. The user can therefore access data by using only the service classification and the data table name, without being aware of the data source type and without configuring the connection information of the data source, and a uniform data query statement is used in the query request.

The data query statement may be the data filtering condition in the query request. The data filtering condition may use a CQL (Common Query Language) query statement to filter spatio-temporal data, or may use another type of query statement, which is not limited herein.
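Illustratively, the assembly of such a query request on the client side can be sketched as follows. This is a minimal sketch under assumptions, not the implementation of the application: the function name build_query_request, the catalog value "traffic", the schema value "vehicle" and the attribute name "geom" are hypothetical, and the HTTP method is deliberately left out because it is not specified above. The path layout, the Content-Type header and the defaults for cql and limit follow Tables 5 to 7.

```python
import json


def build_query_request(catalog, schema, cql="INCLUDE", limit=1000, attributes=None):
    """Assemble the URL path, headers and JSON body of a query request."""
    # Table 5: catalog and schema consist of letters and digits only.
    for name in (catalog, schema):
        if not (name.isascii() and name.isalnum()):
            raise ValueError(f"invalid name: {name!r}")
    path = f"/spatiotemporal/catalogs/{catalog}/schemas/{schema}/features"
    headers = {"Content-Type": "application/json"}  # Table 6
    body = {"cql": cql, "limit": limit}             # Table 7 defaults
    if attributes is not None:
        body["attributes"] = list(attributes)       # optional, no default
    return path, headers, json.dumps(body)


# Hypothetical values: a bounding-box filter on an attribute named "geom".
path, headers, body = build_query_request(
    "traffic", "vehicle",
    cql="BBOX(geom, 116.0, 39.0, 117.0, 40.0)",
    limit=10,
)
```

Note that nothing in the request names a data source or carries connection information; only the service classification, the data table name and the filter appear.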

Step 402, searching a data source name corresponding to the service classification and the data table name from a pre-configured data directory, and searching connection information corresponding to the data source name from pre-configured metadata.

For example, the data management device may obtain a preconfigured data directory from the database, where the data directory records a mapping relationship between a data source name, a service class, and a data table name, and therefore, after the data management device parses the service class and the data table name from the query request, the data management device may query the data directory through the service class and the data table name to obtain the data source name corresponding to the service class and the data table name.

The data management device can acquire preconfigured metadata from a database, and the metadata records the mapping relationship among the data source name, the data source type and the connection information, so that after the data source name is obtained, the metadata can be inquired through the data source name to obtain the connection information corresponding to the data source name.

In one possible implementation, searching for the connection information corresponding to the data source name from the metadata may include: first determining the data source type corresponding to the data source name, and then searching the metadata for the connection information corresponding to the data source type. For example, if the data source type is the HBase type, the connection information includes a zookeeper connection address and port, and authentication information; if the data source type is the HDFS type, the connection information includes a file system address and port, and authentication information; if the data source type is the Kafka type, the connection information includes a bootstrap service address and port, and a zookeeper connection address and port; if the data source type is the Redis type, the connection information includes a Redis cluster address and port, and an access password.

For example, as shown in table 1, a data source type corresponding to a data source name may be determined, where data sources of the same data source type have the same connection information, and data sources of different data source types have different connection information, so that after the data source type is obtained, connection information corresponding to the data source type may be determined, for example, connection information of the HBase type includes a zookeeper connection address and port, authentication information, and the like.
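Illustratively, the two lookups of step 402 can be sketched as follows. This is a minimal sketch under assumptions: the data directory and the metadata are modelled as in-memory dictionaries rather than database tables, and all names, addresses and key names (DATA_DIRECTORY, METADATA, REQUIRED_KEYS, "hbase_main", "zk1.example" and so on) are hypothetical. Only the kinds of connection information per data source type are taken from the description above.

```python
# Data directory: (service classification, data table name) -> data source name.
DATA_DIRECTORY = {
    ("traffic", "vehicle"): "hbase_main",
    ("traffic", "vehiclelatest"): "redis_cache",
}

# Metadata: data source name -> data source type and connection information.
METADATA = {
    "hbase_main": {
        "type": "HBase",
        "connection": {"zookeeper_address": "zk1.example",
                       "zookeeper_port": 2181, "auth": "keytab"},
    },
    "redis_cache": {
        "type": "Redis",
        "connection": {"cluster_address": "redis1.example",
                       "port": 6379, "password": "secret"},
    },
}

# Connection fields expected for each data source type (key names are assumed).
REQUIRED_KEYS = {
    "HBase": {"zookeeper_address", "zookeeper_port", "auth"},
    "HDFS": {"fs_address", "port", "auth"},
    "Kafka": {"bootstrap_address", "bootstrap_port",
              "zookeeper_address", "zookeeper_port"},
    "Redis": {"cluster_address", "port", "password"},
}


def resolve_connection(catalog, schema):
    """Return (data source name, data source type, connection information)."""
    source_name = DATA_DIRECTORY[(catalog, schema)]   # data directory lookup
    entry = METADATA[source_name]                     # metadata lookup
    missing = REQUIRED_KEYS[entry["type"]] - set(entry["connection"])
    if missing:
        raise ValueError(f"{source_name}: missing connection fields {sorted(missing)}")
    return source_name, entry["type"], entry["connection"]


name, source_type, connection = resolve_connection("traffic", "vehicle")
```

The caller supplies only the service classification and the data table name; the data source name, type and connection information are all resolved on the data management device side.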

Step 403, selecting a target data source corresponding to the data source name from the multiple data sources, establishing a connection with the target data source based on the connection information, and sending the data filtering condition to the target data source through the connection, so that the target data source queries target data corresponding to the data filtering condition.

For example, after obtaining the data source name, the data management device may select, from all the data sources, the target data source corresponding to the data source name, such as an HBase data source, an HDFS data source, a Kafka data source or a Redis data source. For convenience of description, the target data source is taken to be the HBase data source in the following.

Then, the data management device establishes connection with the HBase data source based on the connection information of the HBase data source, and the connection establishment process is not limited, for example, the data management device may establish connection with the HBase data source based on the zookeeper connection address and port of the HBase data source, the authentication information, and the like.

After the data management device establishes a connection with the HBase data source, the data management device may send the data filtering condition (see Tables 6 and 7) to the HBase data source through the connection.

After receiving the data filtering condition, the HBase data source may query the target data corresponding to the data filtering condition, without limitation on the query process of the target data. After obtaining the target data corresponding to the data filtering condition, the HBase data source may return the target data to the data management device.

Step 404, receive the target data through the connection and send the target data to the client.

Specifically, the data management device may receive the target data (i.e., the query result) returned by the HBase data source through the connection, and send the target data to the client. For example, the target data may be sent to the client in the standard GeoJSON data format; the sending method is not limited.
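Illustratively, steps 401 to 404 can be sketched end to end as follows. This is a minimal sketch under assumptions: InMemorySource is a hypothetical stand-in for a connected data source that only understands the filter INCLUDE (a real data source would evaluate the CQL statement itself), connection establishment is omitted, and the row fields id, uid, time, lon and lat are invented for illustration. The result is wrapped as a GeoJSON FeatureCollection.

```python
import json
import re

# Path layout taken from the example query request; names are letters and digits.
PATH_RE = re.compile(
    r"^/spatiotemporal/catalogs/([A-Za-z0-9]+)/schemas/([A-Za-z0-9]+)/features$"
)


class InMemorySource:
    """Hypothetical stand-in for a connected data source."""

    def __init__(self, rows):
        self.rows = rows

    def query(self, cql, limit):
        # This stub does not parse CQL; it only knows the match-all filter.
        if cql != "INCLUDE":
            raise NotImplementedError("stub evaluates only the INCLUDE filter")
        return self.rows[:limit]


def to_feature(row):
    """Wrap one row as a GeoJSON Feature with a Point geometry."""
    return {
        "type": "Feature",
        "id": row["id"],
        "geometry": {"type": "Point", "coordinates": [row["lon"], row["lat"]]},
        "properties": {"uid": row["uid"], "time": row["time"]},
    }


def handle_query(path, body, directory, sources):
    # Step 401: parse service classification, table name and filter.
    match = PATH_RE.match(path)
    if match is None:
        raise ValueError(f"unrecognized path: {path}")
    catalog, schema = match.groups()
    params = json.loads(body)
    # Step 402: data directory lookup (metadata lookup omitted in this sketch).
    source_name = directory[(catalog, schema)]
    # Step 403: select the target data source and forward the filter.
    rows = sources[source_name].query(
        params.get("cql", "INCLUDE"), params.get("limit", 1000)
    )
    # Step 404: return the target data to the client as GeoJSON.
    return {"type": "FeatureCollection", "features": [to_feature(r) for r in rows]}


directory = {("traffic", "vehicle"): "hbase_main"}
sources = {"hbase_main": InMemorySource([
    {"id": "d1", "uid": "car7", "time": "2020-12-02T08:00:00Z", "lon": 116.4, "lat": 39.9},
    {"id": "d2", "uid": "car7", "time": "2020-12-02T08:05:00Z", "lon": 116.5, "lat": 39.8},
])}
result = handle_query(
    "/spatiotemporal/catalogs/traffic/schemas/vehicle/features",
    '{"cql": "INCLUDE", "limit": 1}',
    directory, sources,
)
```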

In the above embodiments, multiple data sources (e.g., HBase data source, HDFS data source, Kafka data source, Redis data source, etc.) are each used to store spatio-temporal data, which may include, but is not limited to, a temporal dimension, a spatial dimension, and an attribute dimension. In summary, the data filtering condition may be a data filtering condition of spatio-temporal data, and the target data may be spatio-temporal data, but the spatio-temporal data is merely an example.

Referring to fig. 5, which is a schematic diagram of spatio-temporal data, the spatio-temporal data may include three dimensions: time, space and attributes. In terms of data representation, an ID may be used as the unique identifier of a piece of data, and a UID (user identifier) may be used as the unique identifier of an object; the data itself is generated by a specific object or expresses the spatio-temporal state of a specific object. In terms of data dimensions, the attribute data includes the data identifier and the object identifier.

Illustratively, the above execution sequence is only an example given for convenience of description. In practical applications, the execution sequence of the steps may be changed, and is not limited herein. Moreover, in other embodiments, the steps of the respective methods need not be performed in the order shown and described herein, and the methods may include more or fewer steps than those described herein. A single step described in this specification may be broken down into multiple steps in other embodiments, and multiple steps described in this specification may be combined into a single step in other embodiments.

Based on the above technical solution, in the embodiments of the present application, the query request only needs to carry the service classification and the data table name, and does not need to carry the data source name. The data management device searches for the data source name corresponding to the service classification and the data table name, and thereby learns the data source information. The data source information is thus shielded from the user: the user does not need to know it, cannot perceive which data source the data is queried from, and cannot perceive the connection establishment process. The user can therefore access various types of data sources, quickly access the data in them, and use a uniform access interface for data access and query, which improves data access efficiency, reduces the difficulty of data access, and improves the development efficiency of upper-layer applications. The query request is compatible with CQL query syntax, so spatio-temporal data can be managed and queried in a big data scenario, which lowers the development threshold for spatio-temporal data and improves development efficiency. In addition, a unified lightweight data access interface is provided, unified data access rules are used for data access, and docking from different development languages is supported.

In a possible embodiment, the storage of the data to be stored may also be implemented as follows:

after receiving the data to be stored, determining the type of the data to be stored; if the type is real-time data, storing the data to be stored to a Kafka type data source; if the type is historical data, storing the data to be stored in a data source of the HBase type, or storing the data to be stored in a data source of the HDFS type; and if the type is transient data, storing the data to be stored into a data source of the Redis type.

For example, according to the performance characteristics of the different data sources, the storage requirements of all data can be divided into the storage requirements of real-time data, of historical data (i.e., massive historical data), and of transient data. On this basis, a Kafka type data source is used as the database for real-time data, an HBase type data source and an HDFS type data source are used as the databases for historical data, and a Redis type data source is used as the database for transient data, so as to form a complete data storage mode.

On this basis, after the data management device obtains the data to be stored, if the data to be stored is real-time data, the data to be stored is stored in a Kafka type data source; if the data to be stored is historical data, the data to be stored is stored in an HBase type data source or an HDFS type data source; and if the data to be stored is transient data, the data to be stored is stored in a Redis type data source.
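Illustratively, the routing of data to be stored by type can be sketched as follows. This is a minimal sketch under assumptions: the names DataKind, CANDIDATE_SOURCE_TYPES and choose_source_type are hypothetical, and how the type of the data to be stored is determined is not shown because it is not specified above. For historical data, where either HBase or HDFS is allowed, the sketch defaults to the first candidate unless the caller states a preference.

```python
from enum import Enum


class DataKind(Enum):
    REAL_TIME = "real-time"
    HISTORICAL = "historical"
    TRANSIENT = "transient"


# Data source types that may store each kind of data, per the description.
CANDIDATE_SOURCE_TYPES = {
    DataKind.REAL_TIME: ("Kafka",),
    DataKind.HISTORICAL: ("HBase", "HDFS"),
    DataKind.TRANSIENT: ("Redis",),
}


def choose_source_type(kind, prefer=None):
    """Pick the data source type that stores data of the given kind."""
    candidates = CANDIDATE_SOURCE_TYPES[kind]
    if prefer is None:
        return candidates[0]
    if prefer not in candidates:
        raise ValueError(f"{prefer} cannot store {kind.value} data")
    return prefer


chosen = choose_source_type(DataKind.HISTORICAL, prefer="HDFS")
```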

In one possible implementation, the data migration/data synchronization may also be implemented as follows:

Step a1, acquiring a data migration task, where the data migration task includes a source service classification, a source data table name, a target service classification, a target data table name, and attribute information of data to be migrated, where the attribute information includes a field mapping relationship between a first field in the source data table and a second field in the target data table.

Illustratively, the source service classification and the source data table name uniquely correspond to one data source, which is the data source before data migration and is denoted as the first data source; that is, data in the first data source needs to be migrated. The target service classification and the target data table name uniquely correspond to one data source, which is the data source after data migration and is denoted as the second data source; that is, the data needs to be migrated to the second data source. In summary, data in the first data source needs to be migrated to the second data source.

Illustratively, the attribute information of the data to be migrated includes a field mapping relationship between a first field in a source data table and a second field in a target data table, where the source data table is a data table located in the first data source, and the target data table is a data table located in the second data source, and it is known that the data of the first field in the source data table needs to be migrated to the second field in the target data table based on the attribute information of the data to be migrated.

For example, an example of the data migration task is shown in Table 8. The data migration task may include a source service classification, a source data table name, a target service classification, a target data table name, attribute information of the data to be migrated, and a parallelism, where the parallelism indicates the number of tasks that run in a distributed manner.

TABLE 8

Step a2, searching a first data source name corresponding to the source service classification and the source data table name and a second data source name corresponding to the target service classification and the target data table name from the data directory.

For example, since the data directory includes a mapping relationship between a data source name, a service class, and a data table name, the data management device may search the data directory for a first data source name corresponding to the source service class and the source data table name, and search the data directory for a second data source name corresponding to the target service class and the target data table name, which is not described in detail herein.

Step a3, searching the metadata for the first connection information corresponding to the first data source name and the second connection information corresponding to the second data source name; and establishing a first connection with a first data source based on the first connection information, and establishing a second connection with a second data source based on the second connection information.

For example, since the metadata includes a mapping relationship between a data source name, a data source type, and connection information, the data management device may search the metadata for first connection information corresponding to the first data source name and second connection information corresponding to the second data source name, which is not described again.

After obtaining the first connection information, the data management device may establish a first connection with a first data source (a data source before migration) based on the first connection information. After obtaining the second connection information, the data management device may establish a second connection with a second data source (the migrated data source) based on the second connection information.

Step a4, migrating the data to be migrated in the first data source to the second data source based on the attribute information.

For example, since the first connection is established between the data management device and the first data source, the data management device and the first data source can communicate with each other, and the data management device may notify the first data source to migrate the data to be migrated to the second data source, so that the first data source migrates the data to be migrated to the second data source. In addition, since the second connection is established between the data management device and the second data source, communication can be performed between the data management device and the second data source, and the data management device can notify the second data source to receive the data to be migrated from the first data source, so that the second data source receives the data to be migrated.

In a possible implementation manner, the attribute information may include, but is not limited to, a field mapping relationship between a first field in the source data table and a second field in the target data table, based on which, in step a4, the data management device may migrate the data to be migrated in the first field in the source data table in the first data source to the second field in the target data table in the second data source based on the attribute information.

For example, the data management device may notify the first data source of the information of the first field in the source data table and the information of the second data source. Based on the information of the first field in the source data table, the first data source may obtain the data to be migrated from the first field of the source data table. Based on the information of the second data source, the first data source may send the data to be migrated in the first field to the second data source.

The data management device may notify the second data source of the information of the second field in the target data table, and the information of the first data source. Based on the information of the first data source, the second data source can receive the data to be migrated sent by the first data source. Based on the information of the second field in the target data table, the second data source may write the data to be migrated to the second field in the target data table, thereby completing the data migration.

In the above embodiment, the first field may be any field in the source data table, and there may be multiple first fields. The second field may be any field in the target data table, and there may be multiple second fields. The number of first fields and the number of second fields may be the same, i.e., the fields are in one-to-one correspondence.
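Illustratively, the effect of the field mapping relationship can be sketched as follows. This is a minimal sketch under assumptions: tables are modelled as lists of dictionaries in memory, whereas in the description the data moves between the two data sources themselves, and the function name migrate_rows and the field names are hypothetical. Each first field of a source row is written to the second field it is mapped to; source fields outside the mapping are not migrated.

```python
def migrate_rows(source_rows, field_mapping):
    """Project each source row onto the target fields named by field_mapping.

    field_mapping maps a first field (source table) to a second field
    (target table), one-to-one.
    """
    if len(set(field_mapping.values())) != len(field_mapping):
        raise ValueError("field mapping must be one-to-one")
    return [
        {dst: row[src] for src, dst in field_mapping.items()}
        for row in source_rows
    ]


source_table = [
    {"id": "d1", "uid": "car7", "ts": "2020-12-02T08:00:00Z", "debug": "x"},
]
mapping = {"id": "data_id", "uid": "object_id", "ts": "time"}
target_rows = migrate_rows(source_table, mapping)
```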

In summary, the data to be migrated in the first data source may be migrated to the second data source.

For example, after the data to be migrated in the first data source is migrated to the second data source, if the data to be migrated is deleted from the first data source, migration of the data is implemented, and the data to be migrated is retained only in the second data source. If the data to be migrated is not deleted from the first data source, synchronization of the data is implemented, and the data to be migrated is retained in both the first data source and the second data source, that is, two copies of the data are stored.
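Illustratively, the difference between migration and synchronization can be sketched as follows. This is a minimal sketch under assumptions: the two tables are in-memory lists, and the function name transfer and the flag delete_from_source are hypothetical. The only difference between the two modes is whether the source copy is removed after the data has been written to the target.

```python
def transfer(source_table, target_table, delete_from_source):
    """Copy every row to the target; optionally remove the rows from the source."""
    target_table.extend(dict(row) for row in source_table)
    if delete_from_source:
        source_table.clear()  # migration: data remains only in the target


# Synchronization: both tables keep a copy.
sync_src, sync_dst = [{"id": "d1"}], []
transfer(sync_src, sync_dst, delete_from_source=False)

# Migration: only the target keeps the data.
mig_src, mig_dst = [{"id": "d1"}], []
transfer(mig_src, mig_dst, delete_from_source=True)
```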

For example, Flink (a distributed real-time computing framework) may be used to implement the data migration. The data management device sends the data migration task to Flink, and Flink performs the migration between the source data table in the first data source and the target data table in the second data source, migrating the data to be migrated in the first field of the source data table to the second field of the target data table. Flink supports distributed operation when performing data migration, that is, multiple pieces of data are migrated at the same time. The number migrated simultaneously is determined by the parallelism in the data migration task. The parallelism must be greater than or equal to 1 and represents the number of processes of the same data migration task; the higher the parallelism, the higher the data synchronization rate.
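Illustratively, the role of the parallelism can be sketched as follows. This sketch does not use Flink or its API; it swaps in a thread pool from the Python standard library as a stand-in for distributed tasks, and the names migrate_in_parallel and write are hypothetical. The rows are split into as many partitions as the parallelism, and each partition is migrated by its own worker.

```python
from concurrent.futures import ThreadPoolExecutor


def migrate_in_parallel(rows, parallelism, write):
    """Write all rows using `parallelism` workers; return the number written."""
    if parallelism < 1:
        raise ValueError("parallelism must be greater than or equal to 1")
    # Round-robin partitioning: worker i takes rows i, i + p, i + 2p, ...
    chunks = [rows[i::parallelism] for i in range(parallelism)]

    def worker(chunk):
        for row in chunk:
            write(row)
        return len(chunk)

    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        return sum(pool.map(worker, chunks))


target = []
moved = migrate_in_parallel(list(range(10)), parallelism=3, write=target.append)
```

Because the workers run concurrently, the order in which rows reach the target is not guaranteed; only the set of rows is.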

Illustratively, the HBase data source and the HDFS data source may store historical data, are mainly used for query and offline analysis, and may perform data migration (i.e., data synchronization) according to a data ID as a primary key, thereby ensuring that all data can be persistently stored. The Redis data source adopts a memory storage mode, has very good query performance, can perform data migration according to the object identifier UID as a primary key, and ensures that the data stored in the Redis data source is the latest spatiotemporal state data of the object all the time without storing historical spatiotemporal data.
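Illustratively, the effect of the choice of primary key can be sketched as follows. This is a minimal sketch under assumptions: a dictionary stands in for the target store, the function name upsert_all and the record fields are hypothetical, and the records are assumed to arrive in time order. Keying by the data ID keeps every record, which matches the historical stores; keying by the object identifier UID overwrites earlier records of the same object, so only its latest state remains, which matches the Redis case.

```python
def upsert_all(records, key_field):
    """Write records into a keyed store; a later record replaces an earlier one
    that has the same key."""
    store = {}
    for record in records:
        store[record[key_field]] = record
    return store


# Records in time order: two states of object car7, one state of object car9.
records = [
    {"id": "d1", "uid": "car7", "time": "08:00", "lon": 116.4, "lat": 39.9},
    {"id": "d2", "uid": "car9", "time": "08:01", "lon": 116.1, "lat": 39.7},
    {"id": "d3", "uid": "car7", "time": "08:05", "lon": 116.5, "lat": 39.8},
]

history = upsert_all(records, "id")   # keyed by data ID: all records kept
latest = upsert_all(records, "uid")   # keyed by UID: latest state per object
```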

Based on the technical scheme, a complete storage scheme is provided for the space-time data, and the storage requirements of real-time data storage, historical data storage, transient data storage and the like are met. The method can realize data migration (or data synchronization) among different data sources, realize unified management of data, realize a distributed space-time big data storage scheme, greatly reduce the cost of maintaining massive space-time data and improve the development efficiency of upper-layer application of the space-time data.

Based on the same concept as the method, an embodiment of the present application provides a data query apparatus, which is applied to a data management device in a data management system, where the data management system further includes a plurality of data sources, as shown in fig. 6, which is a schematic structural diagram of the apparatus, and the apparatus includes: a receiving module 61, configured to receive a query request, where the query request includes a service classification, a data table name, and a data filtering condition;

an obtaining module 62, configured to search, from a preconfigured data directory, a data source name corresponding to the service classification and the data table name, and search, from preconfigured metadata, connection information corresponding to the data source name; selecting a target data source corresponding to the data source name from a plurality of data sources;

an establishing module 63, configured to establish a connection with a target data source based on the connection information;

a sending module 64, configured to send the data filtering condition to the target data source through the connection, so that the target data source queries target data corresponding to the data filtering condition;

the receiving module 61 is further configured to receive the target data through the connection;

the sending module 64 is further configured to send the target data to the client.

In a possible implementation manner, when the obtaining module 62 searches for the connection information corresponding to the data source name from the preconfigured metadata, the obtaining module is specifically configured to: determining a data source type corresponding to the data source name; and searching the connection information corresponding to the data source type from the metadata.

If the data source type is an HBase type, the connection information comprises a zookeeper connection address, a zookeeper connection port and authentication information; if the data source type is the HDFS type, the connection information comprises a file system address, a port and authentication information; if the data source type is Kafka type, the connection information comprises a bootstrap service address and port, and a zookeeper connection address and port; if the data source type is a Redis type, the connection information comprises a Redis cluster address, a port and an access password.

In a possible implementation, the obtaining module 62 is further configured to:

In a possible embodiment, the device may further comprise (not shown in the figures):

the storage module is used for determining the type of the data to be stored after receiving the data to be stored;

the migration module is used for acquiring a data migration task; the data migration task comprises source service classification, a source data table name, target service classification, a target data table name and attribute information of data to be migrated; searching a first data source name corresponding to the source service classification and the source data table name and a second data source name corresponding to the target service classification and the target data table name from the data directory;

In a possible implementation manner, the attribute information includes a field mapping relationship between a first field in the source data table and a second field in the target data table, and the migration module is specifically configured to, when migrating the data to be migrated in the first data source to the second data source based on the attribute information:

Based on the same application concept as the method, the embodiment of the present application provides a data management device, as shown in fig. 7, the data management device includes: a processor 71 and a machine-readable storage medium 72, the machine-readable storage medium 72 storing machine-executable instructions executable by the processor 71; the processor 71 is configured to execute machine executable instructions to perform the following steps:

Based on the same application concept as the method, embodiments of the present application further provide a machine-readable storage medium, where a plurality of computer instructions are stored on the machine-readable storage medium, and when the computer instructions are executed by a processor, the data query method disclosed in the above example of the present application can be implemented.

The machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions and data. For example, the machine-readable storage medium may be: a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a DVD, etc.), or a similar storage medium, or a combination thereof.

The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.

For convenience of description, the above devices are described as being divided into various units by function, and the units are described separately. Of course, when implementing the present application, the functionality of the units may be implemented in one or more pieces of software and/or hardware.

As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

Furthermore, these computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims

1. A data query method, applied to a data management device in a data management system, the data management system further including a plurality of data sources, the method comprising:

2. The method of claim 1,

the searching for the connection information corresponding to the data source name from the preconfigured metadata includes:

determining a data source type corresponding to the data source name;

3. The method of claim 2,

4. The method of claim 1, wherein before searching for the data source name corresponding to the service classification and the data table name from the pre-configured data directory, the method further comprises:

5. The method according to any one of claims 1-4, further comprising:

6. The method according to any one of claims 1-4, further comprising:

7. The method of claim 6, wherein the attribute information comprises a field mapping relationship between a first field in a source data table and a second field in a target data table, and wherein migrating the data to be migrated in the first data source to the second data source based on the attribute information comprises:

8. The method of any of claims 1-4, wherein the plurality of data sources are each configured to store spatiotemporal data; wherein the spatiotemporal data comprises a temporal dimension, a spatial dimension, and an attribute dimension.

9. A data query apparatus, applied to a data management device in a data management system, the data management system further including a plurality of data sources, the apparatus comprising:

the receiving module is used for receiving a query request, wherein the query request comprises a service classification, a data table name and a data filtering condition;

the sending module is further configured to send the target data to a client.

10. A data management apparatus, characterized by comprising: a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor;