CN108874873B - Data query method, device, storage medium and processor

Info

Publication number: CN108874873B
Application number: CN201810387774.2A
Authority: CN
Inventors: 周一鸣; 刘悦
Original assignee: Beijing Institute Of Space Science And Technology Information
Current assignee: Beijing Institute Of Space Science And Technology Information
Priority date: 2018-04-26
Filing date: 2018-04-26
Publication date: 2022-04-12
Anticipated expiration: 2038-04-26
Also published as: CN108874873A

Abstract

The invention discloses a data query method, a data query device, a storage medium and a processor. The method comprises the following steps: acquiring a query instruction, wherein the query instruction is used for querying first data in the first class of data and second data corresponding to the first data in at least one second class of data; determining a first corresponding relation and at least one second corresponding relation according to the query instruction, and querying the first data in the first class of data, wherein the first corresponding relation is used for representing the corresponding relation between the data in the first class of data and a first identifier, the second corresponding relation is used for representing the corresponding relation between the data in the second class of data and a second identifier, and the second identifier is determined according to the first identifier; querying, among the first identifiers, a specified identifier corresponding to the first data according to the first corresponding relation; and querying second data corresponding to the specified identifier in the at least one second class of data according to the specified identifier and the at least one second corresponding relation. The invention solves the technical problem of low efficiency when querying skewed data.

Description

Data query method, device, storage medium and processor

Technical Field

The invention relates to the field of data processing, in particular to a data query method, a data query device, a storage medium and a processor.

Background

With the continuous development of hardware and software technologies, data warehousing has become a key research field for database-based information management systems, and data query is the most frequent operation in such systems. In a relational database, data query is the process and technology of storing selected and processed data in relational data tables, generating query statements from that data according to user requirements, and retrieving the required data from the relational database. In essence, a data query extracts business-analysis-oriented data from the relational data tables and summarizes and analyzes that business data.

Space information data is highly skewed. For example, owing to differences in comprehensive national strength, scientific and technological level, application value and the like, the indexes for the manufacture, production, sale and operation of the various satellites and spacecraft of the United States are far higher than those of other countries. Queries over space information data are therefore consistently skewed toward the United States, and skewed data must be queried in order to obtain query results that reflect this skew.

In the conventional approach to querying skewed data, the data satisfying the skew condition is first retrieved from the stored aerospace intelligence database, and the specific conditions are then queried within that result. For example, for the query "satellite name, launch site, and operating institution of the communication satellites of a certain country", the data covering the communication satellites of that country must first be screened out of the pre-stored database, and the screened data is then queried for the satellite name, launch site, and operating institution.

However, when this query method is applied in a big-data query environment, the mass of data in the database seriously reduces query efficiency.

No effective solution has yet been proposed for the problem of low efficiency when querying skewed data.

Disclosure of Invention

The embodiment of the invention provides a data query method, a data query device, a storage medium and a processor, which are used for at least solving the technical problem of low efficiency when querying skewed data.

According to an aspect of an embodiment of the present invention, there is provided a data query method, including: acquiring a query instruction, wherein the query instruction is used for querying first data in first-class data and second data corresponding to the first data in at least one second-class data, and the first-class data and the second-class data are different types of data sets in a predetermined database; determining a first corresponding relation and at least one second corresponding relation according to the query instruction, and querying the first data in the first class of data, wherein the first corresponding relation is used for representing the corresponding relation between the data in the first class of data and a first identifier, the second corresponding relation is used for representing the corresponding relation between the data in the second class of data and a second identifier, and the second identifier is determined according to the first identifier; inquiring a designated identifier corresponding to the first data in the first identifier according to the first corresponding relation; and querying the second data corresponding to the specified identification in the at least one second type of data according to the specified identification and the at least one second corresponding relation.

Further, determining the first corresponding relationship and the at least one second corresponding relationship according to the query instruction includes: respectively selecting the first type of data, the second type of data and a data corresponding relation in the preset database according to the query instruction, wherein the preset database comprises the data corresponding relation, and the data corresponding relation is the corresponding relation between the data in the second type of data and the data in the first type of data; setting the corresponding first identifier for each data in the first type of data to obtain the first corresponding relationship; setting a corresponding second identifier for each data in the at least one second type of data to obtain a second corresponding relationship; and determining the corresponding relation between the first identifier and the second identifier according to the data corresponding relation.

Further, setting the corresponding first identifier for each data in the first class of data includes: determining a Key Value pair (Key, Value) corresponding to each piece of data in the first type of data, wherein the Key of the Key Value pair (Key, Value) is a hash Value of the data in the first type of data obtained by a predetermined hash function, and the Value of the Key Value pair (Key, Value) is a stored Value of the data in the first type of data; setting the corresponding first identifier for the Key Value pair (Key, Value).

Further, querying the first data in the first class of data comprises: determining a hash value for querying the first data through the predetermined hash function; determining the Key-Value pair (Key, Value) whose Key is the same as a hash Value of the first data; determining a Value in the Key-Value pair (Key, Value) as the first data.

Further, the predetermined hash function is $f_m(\mathrm{Key}) = \mathrm{Key} \bmod 2^{\mathrm{HashValue}}$, where HashValue is a predetermined positive integer.

Further, in a case that the at least one second correspondence is a plurality of second correspondences, querying the at least one second class of data for the second data corresponding to the specified identifier includes: querying, in parallel, the plurality of second-class data for the second data corresponding to the specified identifier.

Further, after querying the second data corresponding to the specified identifier in the at least one second class of data, the method further includes: and synthesizing the first data and the second data corresponding to the specified identification to generate a query result.
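The parallel query and the synthesis of the result described above can be illustrated with a minimal Python sketch. It is not taken from the patent: the data set names (`launch_site`, `operator`), the placeholder values, and the helper functions are hypothetical, the second identifier is assumed to equal the first identifier, and a thread pool is only one possible way to run the per-data-set lookups in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical first correspondence: first identifier -> first-class data.
FIRST_CORR = {1: "Sat-1", 2: "Sat-2", 3: "Sat-3"}

# Hypothetical second correspondences, one per second-class data set.
# Assumption: the second identifier is the same as the first identifier.
SECOND_SETS = {
    "launch_site": {1: "Site-X", 2: "Site-Y", 3: "Site-X"},
    "operator": {1: "Org-P", 2: "Org-Q", 3: "Org-P"},
}


def query_second_set(name, table, specified_ids):
    """Look up the specified identifiers in one second-class data set."""
    return name, {i: table[i] for i in specified_ids if i in table}


def query_and_combine(specified_ids):
    """Query every second-class data set in parallel, then merge the results."""
    with ThreadPoolExecutor() as pool:
        futures = [
            pool.submit(query_second_set, name, table, specified_ids)
            for name, table in SECOND_SETS.items()
        ]
        per_set = dict(f.result() for f in futures)

    # Synthesize the first data and the second data for each identifier.
    rows = []
    for i in specified_ids:
        row = {"first_data": FIRST_CORR[i]}
        for name, found in per_set.items():
            row[name] = found.get(i)
        rows.append(row)
    return rows


rows = query_and_combine([1, 3])
```

Each lookup touches only one small second-class data set, so the lookups do not depend on one another and can be submitted independently.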

According to another aspect of the embodiments of the present invention, there is also provided a data query apparatus, including: an acquisition unit, configured to acquire a query instruction, where the query instruction is used for querying first data in first-class data and second data corresponding to the first data in at least one second-class data, and the first-class data and the second-class data are different types of data sets in a predetermined database; a first determining unit, configured to determine a first corresponding relationship and at least one second corresponding relationship according to the query instruction, and query the first data in the first class of data, where the first corresponding relationship is used to indicate a corresponding relationship between data in the first class of data and a first identifier, the second corresponding relationship is used to indicate a corresponding relationship between data in the second class of data and a second identifier, and the second identifier is determined according to the first identifier; a first query unit, configured to query, in the first identifier, a specified identifier corresponding to the first data according to the first corresponding relationship; and a second query unit, configured to query the second data corresponding to the specified identifier in the at least one second class of data according to the specified identifier and the at least one second corresponding relationship.

According to still another embodiment of the present invention, there is also provided a storage medium including a stored program, wherein the program, when run, performs the data query method of any one of the above.

According to another embodiment of the present invention, there is also provided a processor configured to run a program, wherein the program, when run, performs the data query method of any one of the above.

In the embodiment of the present invention, when first data in first-class data and second data corresponding to the first data in second-class data need to be queried, a query instruction for querying the first data and the second data is obtained. According to the query instruction, the corresponding relationship between data in the first-class data and a first identifier (the first corresponding relationship), the corresponding relationship between data in the second-class data and a second identifier (the second corresponding relationship), and the corresponding relationship between the first identifier and the second identifier are determined, and the first data is queried in the first-class data. The specified identifier corresponding to the first data is then determined according to the first corresponding relationship, and the second data is determined in the second-class data according to the specified identifier and the second corresponding relationship, so that the first data and the second data corresponding to the first data are obtained. In this way, each part of the query instruction is carried out separately in the data set it concerns, so that each query item is completed within a small data set and a query over skewed data, such as "satellite name, launch site, and operating institution of the communication satellites of a certain country", is completed quickly. This achieves the technical effect of improving the efficiency of querying skewed data and thereby solves the technical problem of low efficiency when querying skewed data.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:

FIG. 1 is a flow chart of a data query method according to an embodiment of the invention;

FIG. 2 is a schematic diagram of a method for optimizing a space intelligence data query based on a column store hash partition according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of generating a query plan graph from a query plan, according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a data query device according to an embodiment of the present invention.

Detailed Description

In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

According to an embodiment of the present invention, a data query method embodiment is provided. It should be noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases the steps illustrated or described may be performed in a different order.

Fig. 1 is a flowchart of a data query method according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:

step S102, obtaining a query instruction, wherein the query instruction is used for querying first data in first-class data and second data corresponding to the first data in at least one second-class data, and the first-class data and the second-class data are different types of data sets in a predetermined database;

step S104, determining a first corresponding relation and at least one second corresponding relation according to the query instruction, and querying first data in the first class of data, wherein the first corresponding relation is used for representing the corresponding relation between the data in the first class of data and a first identifier, the second corresponding relation is used for representing the corresponding relation between the data in the second class of data and a second identifier, and the second identifier is determined according to the first identifier;

step S106, querying, among the first identifiers, a specified identifier corresponding to the first data according to the first corresponding relation;

step S108, querying second data corresponding to the specified identifier in the at least one second-class data according to the specified identifier and the at least one second corresponding relation.

Through the above steps, when first data in first-class data and second data corresponding to the first data in second-class data need to be queried, a query instruction for querying the first data and the second data is obtained. According to the query instruction, the corresponding relation between data in the first-class data and a first identifier (the first corresponding relation), the corresponding relation between data in the second-class data and a second identifier (the second corresponding relation), and the corresponding relation between the first identifier and the second identifier are determined, and the first data is queried in the first-class data. The specified identifier corresponding to the first data is then determined according to the first corresponding relation, and the second data is determined in the second-class data according to the specified identifier and the second corresponding relation, so that the first data and the second data corresponding to the first data are obtained. In this way, each part of the query instruction is carried out separately in the data set it concerns, so that each query item is completed within a small data set and a query over skewed data, such as "satellite name, launch site, and operating institution of the communication satellites of a certain country", is completed quickly. This achieves the technical effect of improving the efficiency of querying skewed data and thereby solves the technical problem of low efficiency when querying skewed data.

Optionally, according to the technical solution provided by the foregoing embodiment, the query of skewed data may be completed within the data stored in the predetermined database.

In the scheme provided in step S102, the query instruction may be an instruction for querying skewed data. For example, the instruction may be an instruction to query "satellite name, launch site, and operating institution of the communication satellites of a certain country".

Optionally, the first type of data and the second type of data may be different types of data sets in a predetermined database, the first data may be data that needs to be queried in the first type of data, and the second data is data corresponding to the first data in the second type of data.

It should be noted that, there is a corresponding relationship between the data in the first type of data and the data in the second type of data, that is, a data corresponding relationship, and the data relationship in the predetermined database can be visually represented in the form of table 1.

Table 1 is a data table for representing data correspondence in a predetermined database according to an embodiment of the present invention, as shown in table 1.

A	B
A1	B1
A2	B2
A3	B3
A4	B4
A5	B5
A6	B6

TABLE 1

In table 1, the first type of data is represented by A, and the second type of data is represented by B, so A1 in the class A data corresponds to B1 in the class B data; A2 in the class A data corresponds to B2 in the class B data; A3 in the class A data corresponds to B3 in the class B data; A4 in the class A data corresponds to B4 in the class B data; A5 in the class A data corresponds to B5 in the class B data; A6 in the class A data corresponds to B6 in the class B data.

In the scheme provided in step S104, the first corresponding relationship and the at least one second corresponding relationship may be determined according to the query instruction.

Optionally, after the query instruction is obtained, the first type of data and the at least one second type of data may be extracted from the predetermined database, a first identifier may be set for each data in the first type of data, and a second identifier may be set for the at least one second type of data, so that the first corresponding relationship may be a data table of the first type of data and the first identifier, or may be a mapping relationship between a storage location of the first type of data and the first identifier; similarly, the second corresponding relationship may be a data table of the second type data and the second identifier, or may be a mapping relationship between a storage location of the second type data and the second identifier.

Optionally, the first identifier is used for identifying data in the first type of data; the second identifier is used for identifying data in the second type of data.

Taking table 1 as an example, if the query instruction is "query for the class B data corresponding to An" (An denoting the queried items of class A data), a first identifier may be set for each data in class A to obtain a first corresponding relationship, then a second identifier is determined according to the first identifier, and a second identifier is set for each data in class B to obtain a second corresponding relationship, where the first corresponding relationship may be visually represented in the form shown in table 2, and the second corresponding relationship may be visually represented in the form shown in table 3.

Table 2 is a data table of a first correspondence relationship according to an embodiment of the present invention, as shown in table 2.

First identifier	A
1	A1
2	A2
3	A3
4	A4
5	A5
6	A6

TABLE 2

In table 2, the first type of data is represented by A, and the first corresponding relationship is: A1 in the class A data corresponds to the first identifier 1; A2 in the class A data corresponds to the first identifier 2; A3 in the class A data corresponds to the first identifier 3; A4 in the class A data corresponds to the first identifier 4; A5 in the class A data corresponds to the first identifier 5; A6 in the class A data corresponds to the first identifier 6.

Optionally, the second identifier may be determined according to the first identifier. For example, the second identifier a is determined according to the first identifier 1, the second identifier b according to the first identifier 2, the second identifier c according to the first identifier 3, the second identifier d according to the first identifier 4, the second identifier e according to the first identifier 5, and the second identifier f according to the first identifier 6; that is, the first identifier 1 corresponds to the second identifier a, the first identifier 2 corresponds to the second identifier b, the first identifier 3 corresponds to the second identifier c, the first identifier 4 corresponds to the second identifier d, the first identifier 5 corresponds to the second identifier e, and the first identifier 6 corresponds to the second identifier f.

Table 3 is a data table of a second corresponding relationship according to the embodiment of the present invention, as shown in table 3.

Second identifier	B
a	B1
b	B2
c	B3
d	B4
e	B5
f	B6

TABLE 3

In table 3, the second type of data is represented by B, and the second corresponding relationship is: B1 in the class B data corresponds to the second identifier a; B2 in the class B data corresponds to the second identifier b; B3 in the class B data corresponds to the second identifier c; B4 in the class B data corresponds to the second identifier d; B5 in the class B data corresponds to the second identifier e; B6 in the class B data corresponds to the second identifier f.

In table 2, if An is A1, A2, and A6, then A1, A2, and A6 are the first data, and A1, A2, and A6 can be looked up in the first type of data A.

Optionally, besides being determined according to the first identifier, the second identifier may also be the same as the first identifier, in which case the second corresponding relationship may be visually represented in the form shown in table 4.

Table 4 is a data table of another second correspondence relationship according to the embodiment of the present invention, as shown in table 4.

First identifier	B
1	B1
2	B2
3	B3
4	B4
5	B5
6	B6

TABLE 4

In table 4, the second type of data is represented by B, and the second corresponding relationship is: B1 in the class B data corresponds to the first identifier 1; B2 in the class B data corresponds to the first identifier 2; B3 in the class B data corresponds to the first identifier 3; B4 in the class B data corresponds to the first identifier 4; B5 in the class B data corresponds to the first identifier 5; B6 in the class B data corresponds to the first identifier 6.

In the scheme provided in step S106, after the first data is queried in the first type of data, the specified identifier corresponding to the first data may be determined among the first identifiers.

Based on tables 1 to 4 above, when the first data found in the first type of data A is A1, A2, and A6, the first identifiers corresponding to A1, A2, and A6 are determined to be 1, 2, and 6; that is, the specified identifiers are 1, 2, and 6.

In the scheme provided in step S108, after the specified identifier is determined, second data corresponding to the specified identifier may be queried in the second type of data according to the specified identifier and the second corresponding relationship.

Based on tables 1 to 4 above, when the specified identifiers are determined to be 1, 2, and 6, it may be determined that, in the second type of data B, the specified identifier 1 corresponds to B1, the specified identifier 2 corresponds to B2, and the specified identifier 6 corresponds to B6, so the second data is B1, B2, and B6.

According to the above embodiment of the present invention, based on the predetermined database shown in table 1, when the query instruction is "query for the class B data corresponding to An", the first data A1, A2, and A6 can be obtained, together with the second data B1 corresponding to the first data A1, the second data B2 corresponding to the first data A2, and the second data B6 corresponding to the first data A6.
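The worked example above (first identifiers 1 to 6, second identifiers a to f) can be traced with a short Python sketch. It is an illustration only: the dictionary names and the `query` function are hypothetical, and plain in-memory dictionaries stand in for whatever storage the predetermined database actually uses.

```python
# First correspondence: first identifier -> class A data.
first_corr = {1: "A1", 2: "A2", 3: "A3", 4: "A4", 5: "A5", 6: "A6"}

# Second correspondence: second identifier -> class B data.
second_corr = {"a": "B1", "b": "B2", "c": "B3", "d": "B4", "e": "B5", "f": "B6"}

# Correspondence between first and second identifiers (1 -> a, ..., 6 -> f).
id_link = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e", 6: "f"}


def query(wanted):
    """Return (first data, second data) pairs for the requested class A items."""
    # S104/S106: find the first data and its specified identifiers.
    specified = [ident for ident, a in first_corr.items() if a in wanted]
    # S108: follow each specified identifier into the class B data.
    return [(first_corr[i], second_corr[id_link[i]]) for i in specified]


result = query({"A1", "A2", "A6"})
print(result)  # [('A1', 'B1'), ('A2', 'B2'), ('A6', 'B6')]
```

The class B data is reached only through identifiers, so the lookup in step S108 never scans the class A data again.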

As an alternative embodiment, the determining the first corresponding relationship and the at least one second corresponding relationship according to the query instruction includes: respectively selecting a first type of data, a second type of data and a data corresponding relation in a preset database according to the query instruction, wherein the preset database comprises the data corresponding relation, and the data corresponding relation is the corresponding relation between the data in the second type of data and the data in the first type of data; setting a corresponding first identifier for each data in the first type of data to obtain a first corresponding relation; setting a corresponding second identifier for each data in at least one second type of data to obtain a second corresponding relation; and determining the corresponding relation between the first identifier and the second identifier according to the data corresponding relation.

By adopting the embodiment of the invention, the predetermined database stores the first type of data, the second type of data and the data corresponding relation for representing the corresponding relation between the data in the second type of data and the data in the first type of data, after the query instruction is obtained, the first type of data can be extracted from the predetermined database according to the query instruction, and a corresponding first identifier is set for each data in the first type of data to obtain the first corresponding relation; the method can also extract at least one second type of data from a preset database according to the query instruction, set a corresponding second identifier for each data in the second type of data to obtain a second corresponding relationship, and then determine the corresponding relationship between the first identifier and the second identifier according to the data corresponding relationship, so that the relation between the first corresponding relationship and the second corresponding relationship can be established according to the first identifier and the second identifier, and after the first data is queried in the first type of data, the second data can be queried in the second type of data according to the first identifier corresponding to the first data.
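As a rough sketch of this construction step, the Python fragment below derives the first correspondence, the second correspondence, and the identifier link from a list of (class A, class B) pairs. The numbering scheme (integers for first identifiers, lowercase letters for second identifiers) mirrors the example identifiers 1 to 6 and a to f; the variable names are hypothetical, and the letter scheme only covers up to 26 rows.

```python
from string import ascii_lowercase

# Data correspondence selected from the predetermined database (A item, B item).
data_corr = [("A1", "B1"), ("A2", "B2"), ("A3", "B3"),
             ("A4", "B4"), ("A5", "B5"), ("A6", "B6")]

first_corr = {}   # first identifier -> class A data
second_corr = {}  # second identifier -> class B data
id_link = {}      # first identifier -> second identifier

for n, (a_item, b_item) in enumerate(data_corr, start=1):
    first_id = n                         # assumed scheme: 1, 2, 3, ...
    second_id = ascii_lowercase[n - 1]   # assumed scheme: a, b, c, ...
    first_corr[first_id] = a_item
    second_corr[second_id] = b_item
    # The link is derived from the data correspondence: the identifiers of
    # two items are linked exactly when the items correspond to each other.
    id_link[first_id] = second_id
```

Using the same value for both identifiers would collapse `id_link` into an identity mapping, which is the variant shown in table 4.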

As an optional embodiment, the setting, for each data in the first class of data, a corresponding first identifier includes: determining a Key Value pair (Key, Value) corresponding to each piece of data in the first type of data, wherein the Key of the Key Value pair (Key, Value) is a hash Value of the data in the first type of data obtained by a predetermined hash function, and the Value of the Key Value pair (Key, Value) is a stored Value of the data in the first type of data; and setting a corresponding first identification for the Key Value pair (Key, Value).

By adopting the above embodiment of the present invention, each data in the first type of data is provided with the corresponding first identifier, the first type of data may be mapped to obtain the Key Value pair (Key, Value) corresponding to the first type of data, and then the corresponding first identifier is set for the Key Value pair (Key, Value).

Optionally, in a Key Value pair (Key, Value) corresponding to the first type of data, the Key is a hash Value of data in the first type of data obtained through a predetermined hash function, and the Value is a stored Value of the data in the first type of data, such as a storage location of the data in a predetermined database.

Optionally, the first identifier may be an id, and a corresponding first identifier is set for a Key Value pair (Key, Value), so that a corresponding relationship (id, (Key, Value)) between the first identifier and the Key Value pair can be obtained, and thus, the first corresponding relationship is represented in a Key Value pair form, so that the data amount occupied by the first corresponding relationship is minimized, the query speed is increased in the query process, and the query efficiency is improved.

Optionally, in a case where the hash value of the data in the first type of data is determined according to the predetermined hash function, that is, in a case where the Key is determined by the predetermined hash function, Key Value pairs (Key, Value) having the same Key may be merged.

As an alternative embodiment, querying the first data in the first type of data comprises: determining a hash value for querying the first data through a predetermined hash function; determining a Key Value pair (Key, Value) of which the Key is the same as the hash value of the first data; determining the Value in the Key Value pair (Key, Value) as the first data.

By adopting the above embodiment of the present invention, after the query instruction is obtained, the hash Value for querying the first data may be obtained through a predetermined hash function, then the Key Value pair (Key, Value) having the Key that is the same as the hash Value of the first data is selected from the Key Value pairs (Key, Value) corresponding to the first type of data, and the Value in the selected Key Value pair (Key, Value) is determined as the first data.
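A minimal Python sketch of this key-value lookup follows. Several details are assumptions rather than part of the described method: items are strings, `zlib.crc32` is used only to turn a string into an integer before the modulo step, `HASH_VALUE = 4` is an arbitrary choice, pairs sharing a Key are merged into one bucket, and a collision inside a bucket is resolved by comparing the stored value.

```python
import zlib
from collections import defaultdict

HASH_VALUE = 4  # hypothetical positive integer; gives 2**4 = 16 possible keys


def hash_key(item: str) -> int:
    """Map an item to a Key: integer digest, then modulo 2**HASH_VALUE."""
    raw = zlib.crc32(item.encode("utf-8"))  # assumption: any stable digest works
    return raw % (2 ** HASH_VALUE)


def build_first_correspondence(items):
    """Build (id, (Key, Value)) entries, merging entries that share a Key."""
    buckets = defaultdict(list)  # Key -> list of (id, Value)
    for ident, item in enumerate(items, start=1):
        buckets[hash_key(item)].append((ident, item))
    return buckets


def lookup(buckets, wanted: str):
    """Return the (id, Value) entries whose Key matches the hash of `wanted`."""
    candidates = buckets.get(hash_key(wanted), [])
    # Assumption: compare values to separate items that collide on one Key.
    return [(ident, value) for ident, value in candidates if value == wanted]


first_corr = build_first_correspondence(["A1", "A2", "A3", "A4", "A5", "A6"])
hits = lookup(first_corr, "A2")
```

Only the bucket selected by the hash is inspected, so the cost of a lookup depends on the bucket size rather than on the size of the whole first-class data set.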

As an alternative embodiment, the predetermined hash function is $f_m(\mathrm{Key}) = \mathrm{Key} \bmod 2^{\mathrm{HashValue}}$, wherein HashValue is a predetermined positive integer.
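The lookup of the first data through the predetermined hash function may be illustrated by the following minimal sketch. It assumes that the Keys are non-negative integers and that the Key Value pairs are held in a plain list; the function name `query_first_data` and the sample numbers are hypothetical.

```python
def f_m(key: int, hash_value: int) -> int:
    """Key mod 2**HashValue, as given for the predetermined hash function."""
    return key % (2 ** hash_value)


def query_first_data(pairs, query_key: int, hash_value: int):
    """Return the Values of all (Key, Value) pairs whose Key equals the
    hash value computed for the query."""
    target = f_m(query_key, hash_value)
    return [value for key, value in pairs if key == target]


# Hypothetical data: raw keys 29, 14 and 5 hashed with HashValue = 3.
pairs = [(f_m(29, 3), "A1"), (f_m(14, 3), "A2"), (f_m(5, 3), "A6")]
matches = query_first_data(pairs, 13, 3)  # 13 mod 8 == 5
```

Because the modulus is a power of two, the hash of a non-negative integer equals its lowest HashValue bits, so it can also be computed with a bit mask (`key & (2 ** hash_value - 1)`).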

As an optional embodiment, after determining the first corresponding relationship and the at least one second corresponding relationship according to the query instruction, the method further includes: determining a third corresponding relation according to the first corresponding relation and at least one second corresponding relation, wherein the third corresponding relation comprises: the corresponding relation of the data in the first class of data, the data in the second class of data and the first identifier; the corresponding relation of the data in the first class of data, the data in the second class of data and the second identifier; and the corresponding relation of the data in the first class of data, the data in the second class of data, the first identification and the second identification.

Optionally, the third corresponding relationship may include a corresponding relationship between data in the first type of data, data in the second type of data, and the first identifier. In the embodiment based on tables 1 to 4, the third correspondence may be visually expressed in the form as shown in table 5.

Table 5 is a data table of a third correspondence relationship according to the embodiment of the present invention, as shown in table 5.

First identifier	A	B
1	A1	B1
2	A2	B2
3	A3	B3
4	A4	B4
5	A5	B4
6	A6	B6

TABLE 5

Optionally, the third corresponding relationship may include a corresponding relationship between data in the first type of data, data in the second type of data, and the second identifier. In the embodiment based on tables 1 to 4, the third correspondence may also be visually expressed in the form as shown in table 6.

Table 6 is a data table of another third correspondence relationship according to the embodiment of the present invention, as shown in table 6.

Second identifier	A	B
a	A1	B1
b	A2	B2
c	A3	B3
d	A4	B4
e	A5	B4
f	A6	B6

TABLE 6

Optionally, the third corresponding relationship may include a corresponding relationship between data in the first type of data, data in the second type of data, the first identifier, and the second identifier. In the embodiment based on tables 1 to 4, the third correspondence may also be visually expressed in the form as shown in table 7.

Table 7 is a data table of a third correspondence relationship according to the embodiment of the present invention, as shown in table 7.

TABLE 7
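How a third corresponding relationship carrying both identifiers could be derived from a first and a second corresponding relationship is sketched below. The values are taken from tables 5 and 6; the pairing of first identifier 1 with second identifier a, 2 with b, and so on is an assumption read off the row order of those tables, and the function name is hypothetical.

```python
# First corresponding relationship: first identifier -> data in class A (table 5).
first = {1: "A1", 2: "A2", 3: "A3", 4: "A4", 5: "A5", 6: "A6"}

# Second corresponding relationship: second identifier -> data in class B (table 6).
second = {"a": "B1", "b": "B2", "c": "B3", "d": "B4", "e": "B4", "f": "B6"}

# Assumed link between first and second identifiers (row order of tables 5 and 6).
id_link = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e", 6: "f"}


def third_correspondence(first, second, id_link):
    """Return rows of (first identifier, second identifier, A data, B data)."""
    return [
        (fid, sid, first[fid], second[sid])
        for fid, sid in sorted(id_link.items())
    ]


rows = third_correspondence(first, second, id_link)
```

Dropping the first or the second column of each row yields the two reduced forms of the third corresponding relationship shown in tables 5 and 6.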

As an alternative embodiment, in the case that the at least one second corresponding relationship is a plurality of second corresponding relationships, querying the at least one second type of data for the second data corresponding to the specified identifier includes: querying, in parallel, the plurality of second types of data for the second data corresponding to the specified identifier.

By adopting the above embodiment of the present invention, when the at least one second corresponding relationship is a plurality of second corresponding relationships, the second data corresponding to the specified identifier can be queried in parallel in the plurality of second types of data.

For example, after the specified identifier is determined, if the at least one second corresponding relationship includes a second corresponding relationship B and a second corresponding relationship C, the second data corresponding to the specified identifier may be queried in parallel in the second corresponding relationship B and the second corresponding relationship C.

Optionally, in different threads, second data corresponding to the specified identifier in the second corresponding relationship B and second data corresponding to the specified identifier in the second corresponding relationship C may be respectively queried.
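Querying several second corresponding relationships in separate threads might look like the sketch below. It assumes each second corresponding relationship is an in-memory dictionary from identifier to data, and it uses Python's standard `ThreadPoolExecutor`; the names `lookup` and `query_parallel` and the sample values are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def lookup(correspondence, identifier):
    """Query one second corresponding relationship for the specified identifier."""
    return correspondence.get(identifier)


def query_parallel(correspondences, identifier):
    """Query every second corresponding relationship in its own thread.

    `correspondences` maps a name (for example "B" or "C") to a dictionary
    and is assumed to be non-empty.
    """
    with ThreadPoolExecutor(max_workers=len(correspondences)) as pool:
        futures = {
            name: pool.submit(lookup, corr, identifier)
            for name, corr in correspondences.items()
        }
        return {name: future.result() for name, future in futures.items()}


# Hypothetical second corresponding relationships B and C.
second_b = {3: "B3", 4: "B4"}
second_c = {3: "C3", 4: "C4"}
result = query_parallel({"B": second_b, "C": second_c}, 3)
```

Each lookup is independent of the others, which is what makes it safe to run them concurrently and combine the results afterwards.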

As an optional embodiment, after querying the at least one second type of data for the second data corresponding to the specified identifier, the method further includes: synthesizing the first data and the second data corresponding to the specified identifier to generate a query result.

By adopting the embodiment of the invention, after the first data and the second data are obtained, the first data and the second data can be integrated to generate the query result.

For example, in the above embodiment based on tables 1 to 4, in the case that the query instruction is "query for class B data corresponding to An", the first data A1, A2, and A6 may be obtained, together with the second data B1 corresponding to the first data A1, the second data B2 corresponding to the first data A2, and the second data B6 corresponding to the first data A6. By integrating the first data and the second data, the generated query result may be visually represented in the form shown in table 8.

Table 8 is a data table of results from a query according to an embodiment of the present invention, as shown in table 8.

TABLE 8

According to the embodiment of the invention, a query on data with a skew characteristic, such as a query for the satellite name, the launch site and the operating agency in the communication satellite type of a certain country, can be completed quickly. The first corresponding relation and the at least one second corresponding relation accelerate the query and improve the query efficiency. In the case that the at least one second corresponding relation is a plurality of second corresponding relations, the plurality of second corresponding relations can be queried in parallel, which further improves the query efficiency.

The invention also provides a preferred embodiment, which provides a method for optimizing the aerospace intelligence data query based on the column storage Hash partition.

Data query optimization methods can be divided into two categories: a distributed parallel data query optimization method and a column storage data query optimization method.

Distributed parallel data query refers to a computer system in which a plurality of computers that are located in different places, have different functions or hold different data are connected, and are uniformly managed and coordinated by a control system to complete a data query. Distributed parallel data query optimization methods are divided into one-stage parallel query optimization methods and two-stage parallel query optimization methods. A one-stage method generates the parallel query plan directly; a two-stage method, on the basis of the one-stage method, computes in parallel the query execution plan generated in the first stage.

Column storage data query optimization achieves higher query efficiency by changing the physical storage of the database. A traditional row storage database relies on materialized views to improve query efficiency, whereas a query on a column storage database extracts only the data columns related to the query statement, so that the extraction of irrelevant columns is avoided, input/output (I/O) is reduced, and query efficiency is improved. Meanwhile, the data stored in the same column has the same data structure and is more likely to contain repeated values, so it can be compressed with a compression algorithm. The compressed data reduces the storage cost and further improves query efficiency.
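The two effects described above, reading only the relevant columns and compressing a column with many repeated values, can be illustrated with a small sketch. The column values are those of table 15; run-length encoding is used here purely as one example of a compression algorithm and is not named in the embodiment, and the function names are hypothetical.

```python
from itertools import groupby

# A tiny column store: one list per column (values as in table 15).
TABLE = {
    "country": ["America", "America", "America", "America"],
    "type": ["communication"] * 4,
    "name": ["comm1", "comm2", "comm4", "comm5"],
    "organization": ["NASA", "NOAA", "NASA", "NOAA"],
}


def read_columns(table, wanted):
    """Fetch only the columns the query statement refers to."""
    return {column: table[column] for column in wanted}


def run_length_encode(column):
    """Compress consecutive repeated values into (value, count) runs."""
    return [(value, sum(1 for _ in run)) for value, run in groupby(column)]


projected = read_columns(TABLE, ["name", "organization"])
compressed = run_length_encode(TABLE["country"])
```

A column with a single repeated value collapses to one run, while a column of distinct values such as `name` gains nothing from this particular encoding.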

The column storage data query optimization technology stores data in a different physical storage layout and exploits the uniform data structure within a column to improve data query efficiency. This effect, however, is premised on operations such as precomputation, table joins and aggregation. Although these time-consuming input/output (I/O) operations are avoided when a query is executed, a large number of materialized-view intermediate results are generated during execution, so the approach lacks scalability and cannot improve query performance when analyzing large data sets. Meanwhile, in the query optimization stage, the column storage data query optimization technology does not consider parallel computation of the query tasks, so computing resources are wasted.

The invention aims to overcome the defects of the existing storage methods by combining Hash partitioning with a column storage database and applying the combination to the field of aerospace information data. Queries are executed with a multi-core parallel computing technology based on the mapping-simplification (map-reduce) model: the related data columns are loaded into memory, and the query operations are dynamically distributed to a plurality of cores for parallel computation by using the mapping-simplification model. For skewed aerospace information data, this shortens the data query return time and improves the data query efficiency.

It should be noted that Hash partitioning is a partitioning method that distributes data uniformly over a specified number of partitions. By hash partitioning across input/output devices, the partitions become approximately equal in size once the data reaches a certain scale, which improves the efficiency of the whole query processing. Hash partitioning is used primarily to ensure that data is evenly distributed across a predetermined number of partitions. The partition may be selected based on the return value of a user-defined expression that is computed using the column values of the rows to be inserted into the table. The use of hash partitioning can achieve a high level of parallelism in processing data and reduce response time. Hash partitioning has the advantages of strong availability, convenient maintenance, balanced I/O and improved query performance. By combining it with the column storage database, the I/O of the different partitions mapped to the disk can be further balanced, so that the performance of data query is improved.
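As a rough illustration of how hash partitioning spreads data over a fixed number of partitions, the sketch below assigns integer keys to one of 2^HashValue partitions using the hash function of this embodiment and counts the partition sizes. The function names and the sample keys are assumptions made for the illustration.

```python
from collections import Counter


def partition_of(key: int, hash_value: int) -> int:
    """Partition number of a key: Key mod 2**HashValue."""
    return key % (2 ** hash_value)


def partition_sizes(keys, hash_value: int) -> Counter:
    """Count how many keys fall into each of the 2**HashValue partitions."""
    return Counter(partition_of(key, hash_value) for key in keys)


# 1000 consecutive hypothetical keys spread over 2**3 = 8 partitions.
sizes = partition_sizes(range(1000), 3)
```

For consecutive keys the partitions come out exactly equal in size; real key distributions only approach this balance, which is why the text speaks of partitions being approximately consistent in size.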

The technical idea for realizing the invention is as follows: a query plan is generated, and a corresponding mapping-simplification flow is constructed.

In the process of executing the mapping-simplification process, the aerospace intelligence data is divided into a plurality of modules by using a Hash partition, wherein one mapping-simplification process may comprise a plurality of sub-mapping-simplification processes, each sub-mapping-simplification process only comprises one mapping stage and one simplification stage, all the sub-mapping-simplification processes are executed, and the query plan is finally realized.

Specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

It should be noted that the following section adopts an 8-core processor, and the invention is described in conjunction with the query of the space information (space Info) of a part of space intelligence data table.

Fig. 2 is a schematic diagram of a method for optimizing a space intelligence data query based on a column storage hash partition according to an embodiment of the present invention, as shown in fig. 2, the method includes:

Step S201, reading a query statement.

Step S202, generating a query plan.

Step S203, carrying out Hash partitioning according to the query plan.

Step S204, constructing a mapping-simplification flow according to the query plan.

Step S205, executing the mapping-simplification flow according to the query plan.

Step S206, outputting a query result.

In the scheme provided in step S201, the query statement may be an SQL statement.

It should be noted that SQL (Structured Query Language) is a database query and programming language used for storing data and for querying, updating and managing relational database systems.

In this embodiment, a simple query on the space information (space Info) is preset as follows: find the satellite name, launch site and operating agency of the satellites in the U.S. communication satellite type.

In the scheme provided in step S202, a query plan for the query statement may be generated by using a delayed materialization strategy.

It should be noted that the delayed materialization strategy is defined in contrast to the early (advance) materialization strategy, and both strategies are forms of tuple materialization. Tuple materialization merges the tuples (one or several rows of data) that need to be logically merged to generate materialized tuples, and stores the materialized tuples in memory. The two strategies differ in the point in time at which the merge takes place: the early materialization strategy materializes the tuples before the query is submitted, whereas the delayed materialization strategy postpones materialization as long as possible and materializes the tuples during the query. In order to reduce space and time overhead and to avoid unnecessary materialization operations, a column storage database generally adopts the delayed materialization strategy.
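The difference between the two materialization strategies can be made concrete with a short sketch. It is only an illustration under assumptions: the column store is a dictionary of lists, rows 1 and 2 follow table 15, row 3 is invented for contrast, and both function names are hypothetical.

```python
# Hypothetical column store; rows 1 and 2 follow table 15, row 3 is invented.
COLUMNS = {
    "id": [1, 2, 3],
    "country": ["America", "America", "China"],
    "type": ["communication", "communication", "navigation"],
    "name": ["comm1", "comm2", "nav1"],
    "location": ["Kennedy Space Center", "Kennedy Space Center", "Xichang"],
}


def early_materialization(cols, country, sat_type, wanted):
    """Build every full tuple first, then filter the tuples."""
    names = list(cols)
    rows = [dict(zip(names, values)) for values in zip(*cols.values())]
    return [
        tuple(row[c] for c in wanted)
        for row in rows
        if row["country"] == country and row["type"] == sat_type
    ]


def late_materialization(cols, country, sat_type, wanted):
    """Scan only the predicate columns to find matching positions, then
    fetch just the requested columns at those positions."""
    positions = [
        i
        for i, (c, t) in enumerate(zip(cols["country"], cols["type"]))
        if c == country and t == sat_type
    ]
    return [tuple(cols[c][i] for c in wanted) for i in positions]


wanted = ("id", "name", "location")
early = early_materialization(COLUMNS, "America", "communication", wanted)
late = late_materialization(COLUMNS, "America", "communication", wanted)
```

Both functions return the same rows, but the delayed variant never assembles tuples for rows that fail the predicate and never touches columns the query does not ask for.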

Fig. 3 is a schematic diagram of a query plan graph generated according to a query plan according to an embodiment of the present invention. As shown in fig. 3, the PR operator represents a mapping operation: a mapping relationship of the data stored in the column storage database is established through the mapping operation so as to obtain new column data, and the mapping operation returns (id, (Key, Value)), where id is the row number of the row in which the data in the new column data is located, Key is the hash key value of the mapping operation, and Value is the stored value of the column data. The PC operator represents a parallel computing operator and is used for realizing parallel computation over the Hash partitions. The PA operator contains column data and an id serial number, is used for taking the data corresponding to the id serial number out of the column storage database, and outputs the data as (id, value). The Merge operator merges the data at the same position and outputs it as (id, value₁, value₂, ..., value_n).

For example, for "finding out the satellite name, the launch site and the operating agency in the American communication satellite type", the PR operator maps the nationality and the satellite type stored in the column storage database, wherein the nationality and the satellite type are used as the stored Value, the hash key value of the nationality and the satellite type is used as the Key, and the row number of each nationality and satellite type is set as the id; the PC operator performs Hash calculation on the nationality and the satellite type to obtain the hash key values of the nationality and the satellite type; the PA operator takes the satellite name, the launch site and the operating agency corresponding to each id out of the column storage database, with the satellite name as value₁, the launch site as value₂ and the operating agency as value₃; and the Merge operator merges id, value₁, value₂ and value₃ and outputs (id, value₁, value₂, value₃).
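One possible reading of the four operators is sketched below. It is not the implementation of the embodiment: the column store layout, the use of `zlib.crc32` to turn a string into an integer before applying Key mod 2^HashValue, the exponent `HASH_VALUE`, the extra equality check in `pc` (added to guard against hash collisions) and the row with id 3 are all assumptions. Rows 1 and 2 follow table 15.

```python
import zlib

HASH_VALUE = 4  # assumed exponent for Key mod 2**HashValue

# Hypothetical column store keyed by id; ids 1 and 2 follow table 15.
STORE = {
    "country": {1: "America", 2: "America", 3: "China"},
    "type": {1: "communication", 2: "communication", 3: "navigation"},
    "name": {1: "comm1", 2: "comm2", 3: "nav1"},
    "location": {1: "Kennedy Space Center", 2: "Kennedy Space Center", 3: "Xichang"},
    "organization": {1: "NASA", 2: "NOAA", 3: "CNSA"},
}


def hash_key(value: str) -> int:
    """crc32 stands in for the unspecified integer encoding of a value."""
    return zlib.crc32(value.encode("utf-8")) % (2 ** HASH_VALUE)


def pr(store, key_cols):
    """PR: map the predicate columns to (id, (Key, Value))."""
    out = []
    for ident in sorted(store[key_cols[0]]):
        value = "-".join(store[col][ident] for col in key_cols)
        out.append((ident, (hash_key(value), value)))
    return out


def pc(mapped, wanted):
    """PC: hash the query term and keep the ids whose Key and Value match."""
    target = hash_key(wanted)
    return [i for i, (key, value) in mapped if key == target and value == wanted]


def pa(store, col, ids):
    """PA: fetch (id, value) for one column and the given ids."""
    return [(ident, store[col][ident]) for ident in ids]


def merge(*parts):
    """Merge: combine the entries at the same position into (id, value1, ...)."""
    return [(rows[0][0],) + tuple(v for _, v in rows) for rows in zip(*parts)]


ids = pc(pr(STORE, ("country", "type")), "America-communication")
result = merge(*(pa(STORE, c, ids) for c in ("name", "location", "organization")))
```

The three `pa` calls are independent of one another, which matches the observation in the text that the PA stage can be computed in parallel before the Merge step combines its outputs.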

Table 9 is a data table of part of the data stored in the column storage database according to an embodiment of the present invention, as shown in table 9.

TABLE 9

Table 9 shows the nationality (country), the type (type), the name (name), the launch site (location) and the operating agency (organization) of each satellite stored in the column storage database, together with the id number of the satellite.

Alternatively, in the PR process, a data table of query statements may be derived, as shown in Table 10.

Table 10 is a data table of a query statement according to an embodiment of the present invention, as shown in table 10.

TABLE 10

Table 10 is a data table indicating the correspondence between the nationality (country) of the satellite, the type (type) of the satellite, and the id, wherein the nationality (country) and the type (type) of the satellite are used as the Value in (id, (Key, Value)).

Optionally, in the PA process, a name data table, a launch site data table, and an operating agency data table may be obtained, as shown in tables 11, 12, and 13, respectively.

Table 11 is a name data table for representing (id, value₁) according to an embodiment of the present invention, as shown in table 11.

TABLE 11

Table 11 is a data table indicating the correspondence between the satellite name (name) and the id, wherein the satellite name (name) is used as value₁ in (id, value₁).

Table 12 is a launch site data table for representing (id, value₂) according to an embodiment of the present invention, as shown in table 12.

TABLE 12

Table 12 is a data table indicating the correspondence between the launch site (location) of the satellite and the id, wherein the launch site (location) of the satellite is used as value₂ in (id, value₂).

Table 13 is an operating agency data table for representing (id, value₃) according to an embodiment of the present invention, as shown in table 13.

TABLE 13

Table 13 is a data table indicating the correspondence between the operating agency (organization) of the satellite and the id, wherein the operating agency (organization) of the satellite is used as value₃ in (id, value₃).

Optionally, in the Merge process, a logical structure data table of the space intelligence data table may be obtained, as shown in table 14.

Table 14 is a logical structure data table of an aerospace intelligence data table according to an embodiment of the present invention, as shown in table 14.

TABLE 14

In the scheme provided in step S203, a Key Value pair (Key, Value) may be generated according to the query plan, and hash partitioning may be performed, including the following steps:

a) the column store structure is built according to the query plan.

b) The stored entries are mapped to generate the corresponding Key Value pairs (Key, Value). In this example, for "finding out the American communication satellite type", the America-Communication entry in the column storage database may be mapped to (1), as shown in table 1.

c) Hash partitioning is carried out with the Hash function set as $f_m(\mathrm{Key}) = \mathrm{Key} \bmod 2^{\mathrm{HashValue}}$, where HashValue is a predefined positive integer hash parameter whose value range is [1, +∞). Hash calculation is performed on the Key in each Key Value pair (Key, Value) of the aerospace intelligence data; the Keys with the same calculation result are grouped together and distributed to multiple cores in an idle state for parallel operation, and finally the items required by the query plan are obtained.
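Step c) can be sketched as follows: entries are grouped by the result of the hash function, and each group is then scanned by its own worker. The sketch uses a thread pool from the Python standard library as a stand-in for distributing work to idle cores (a process pool would be the closer analogue of true multi-core execution); the integer Keys, the default of 8 workers and all function names are assumptions.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def f_m(key: int, hash_value: int) -> int:
    """Hash function of step c): Key mod 2**HashValue."""
    return key % (2 ** hash_value)


def group_by_hash(entries, hash_value):
    """Bucket (id, (Key, Value)) entries by f_m(Key)."""
    buckets = defaultdict(list)
    for ident, (key, value) in entries:
        buckets[f_m(key, hash_value)].append((ident, value))
    return buckets


def scan_bucket(bucket, wanted):
    """Return the ids in one bucket whose Value matches the query."""
    return [ident for ident, value in bucket if value == wanted]


def parallel_select(entries, wanted, hash_value, cores=8):
    """Scan all buckets concurrently and collect the matching ids."""
    buckets = group_by_hash(entries, hash_value)
    with ThreadPoolExecutor(max_workers=cores) as pool:
        parts = pool.map(scan_bucket, buckets.values(), [wanted] * len(buckets))
        return sorted(ident for part in parts for ident in part)


# Hypothetical entries with made-up integer Keys.
entries = [
    (1, (10, "America-Communication")),
    (2, (10, "America-Communication")),
    (3, (7, "China-Navigation")),
    (6, (10, "America-Communication")),
]
selected = parallel_select(entries, "America-Communication", hash_value=3)
```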

In the solution provided in step S204, a map-reduce execution flow may be constructed according to the query plan, where one map-reduce execution flow may include a plurality of sub-map-reduce execution flows, and each sub-map-reduce execution flow includes only one map phase and one reduce phase, including the following steps:

a) splitting the query plan into serially executed sub-query plans, represented as the following sequence: (Plan₁, Plan₂, Plan₃, ..., Plan_n).

b) traversing the sub-query plan sequence to construct the query plan mapping-simplification flow. The serialized sub-query plans are as follows:

Plan₁：

Plan₂：

Plan₃：

Plan₄：

wherein Plan₁ and Plan₂ compose one map-reduce flow, Plan₃ and Plan₄ compose another map-reduce flow, and the sub-query plans corresponding to Plan₃ may be computed in parallel.

In the map-reduce flow composed of Plan₁ and Plan₂, the PR operator in Plan₁ is the mapping process, and the PC operator in Plan₂ is the simplification process.

In the map-reduce flow composed of Plan₃ and Plan₄, the PA operator in Plan₃ is the mapping process, and the Merge operator in Plan₄ is the simplification process.

In the solution provided in step S205, executing the query plan according to the constructed map-reduce execution flow specifically includes the following steps:

a) counting the number of sub-mapping-simplifying execution flows, and marking the number as Nu;

b) traversing all the sub-mapping-simplifying execution flows;

c) initializing k to 1;

d) when k is not greater than Nu, reading the kth sub-mapping-simplification execution flow, and determining a specific data structure through query plan analysis;

e) the following mapping stage and simplification stage are performed in sequence:

e1) reading a mapping sub-query plan in a sub-process, and distributing a plurality of parallel operations to the core computation in an idle state by using a multi-core parallel technology;

e2) reading a simplified sub-query plan in the sub-process, reading data from the intermediate result set, and completing the sub-query plan;

e3) incrementing k by 1 (k++), and returning to step d).
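The execution loop of steps a) to e) can be sketched as below. The sketch assumes that each sub-flow is given as a pair of functions (a mapping function applied to every item and a simplification function applied to the intermediate result set), that the output of one sub-flow is the input of the next, and that a thread pool stands in for the idle cores; the loop runs once for every one of the Nu sub-flows. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_flows(flows, data, cores=8):
    """Execute the sub map-reduce flows one after another.

    flows: list of (map_fn, reduce_fn) pairs, one pair per sub-flow.
    data:  iterable of input items for the first sub-flow.
    """
    nu = len(flows)          # a) number of sub-flows
    result = data
    k = 1                    # c) initialise k
    while k <= nu:           # d) process the k-th sub-flow
        map_fn, reduce_fn = flows[k - 1]
        # e1) mapping stage: items are processed in parallel
        with ThreadPoolExecutor(max_workers=cores) as pool:
            intermediate = list(pool.map(map_fn, result))
        # e2) simplification stage: consume the intermediate result set
        result = reduce_fn(intermediate)
        k += 1               # e3) next sub-flow
    return result


# Two toy sub-flows: double and filter, then increment and sum.
flows = [
    (lambda x: x * 2, lambda xs: [x for x in xs if x > 2]),
    (lambda x: x + 1, sum),
]
total = run_flows(flows, [1, 2, 3])
```

In the embodiment the first pair would correspond to the PR and PC operators and the second pair to the PA and Merge operators.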

In the scheme provided in step S206, the query results are output, as shown in table 15.

Table 15 is a query result data table according to an embodiment of the present invention, as shown in table 15.

id	country	type	name	location	organization
1	America	communication	comm1	Kennedy Space Center	NASA
2	America	communication	comm2	Kennedy Space Center	NOAA
6	America	communication	comm4	Vandenberg Air Force Base	NASA
7	America	communication	comm5	Cape Canaveral	NOAA

TABLE 15

Table 15 shows the satellite name, the launch site, and the operating agency of each satellite in the American communication satellite type.

According to the technical scheme provided by the invention, a new column storage structure is designed by combining a column storage database according to the special characteristics of the space information data, the space information data is quickly partitioned by Hash partitioning, multi-core parallel computation is realized by using a mapping-simplification model, the return time of a query result is shortened, and the cache efficiency and the data query efficiency are improved.

According to another aspect of the present invention, an embodiment of the present invention further provides a storage medium, where the storage medium includes a stored program, and when the program runs, the apparatus on which the storage medium is located is controlled to execute the data query method described above.

According to another aspect of the present invention, an embodiment of the present invention further provides a processor, where the processor is configured to execute a program, where the program executes the data query method described above.

According to an embodiment of the present invention, there is also provided an embodiment of a data query apparatus, and it should be noted that the data query apparatus may be configured to execute a data query method in the embodiment of the present invention, and the data query method in the embodiment of the present invention may be executed in the data query apparatus.

Fig. 4 is a schematic diagram of a data query apparatus according to an embodiment of the present invention. As shown in fig. 4, the apparatus may include: an obtaining unit 41, configured to obtain a query instruction, where the query instruction is used to query first data in first class data and second data corresponding to the first data in at least one second class data, and the first class data and the second class data are different types of data sets in a predetermined database; a first determining unit 43, configured to determine a first corresponding relationship and at least one second corresponding relationship according to the query instruction, and to query the first data in the first class of data, where the first corresponding relationship is used to represent a corresponding relationship between data in the first class of data and a first identifier, the second corresponding relationship is used to represent a corresponding relationship between data in the second class of data and a second identifier, and the second identifier is determined according to the first identifier; a first querying unit 45, configured to query the first identifier for a specified identifier corresponding to the first data according to the first corresponding relationship; and a second querying unit 47, configured to query the at least one second type of data for the second data corresponding to the specified identifier according to the specified identifier and the at least one second corresponding relationship.

It should be noted that the obtaining unit 41 in this embodiment may be configured to execute step S102 in this embodiment, the first determining unit 43 in this embodiment may be configured to execute step S104 in this embodiment, the first querying unit 45 in this embodiment may be configured to execute step S106 in this embodiment, and the second querying unit 47 in this embodiment may be configured to execute step S108 in this embodiment. The examples and application scenarios realized by the above units are the same as those of the corresponding steps, but are not limited to the disclosure of the above embodiments.

As an alternative embodiment, the first determination unit includes: the selecting module is used for respectively selecting first type data in a preset database according to the query instruction, wherein the preset database comprises a data corresponding relation, and the data corresponding relation is a corresponding relation between data in the second type data and data in the first type data; the first setting module is used for setting a corresponding first identifier for each data in the first type of data to obtain a first corresponding relation; the second setting module is used for setting a corresponding second identifier for each data in at least one second type of data to obtain a second corresponding relation; and the first determining module is used for determining the corresponding relation between the first identifier and the second identifier according to the data corresponding relation.

As an alternative embodiment, the first setting module includes: a second determining module, configured to determine a Key-Value pair (Key, Value) corresponding to each piece of data in the first type of data, where the Key of the Key-Value pair (Key, Value) is a hash Value obtained by the data in the first type of data through a predetermined hash function, and the Value of the Key-Value pair (Key, Value) is a stored Value of the data in the first type of data; and the first setting submodule is used for setting a corresponding first identifier for a Key Value pair (Key, Value).

As an alternative embodiment, the first determination unit includes: a third determining module, configured to determine, by using a predetermined hash function, a hash value for querying the first data; a fourth determining module, configured to determine the Key Value pair (Key, Value) whose Key is the same as the hash value of the first data; and a fifth determining module, configured to determine the Value in that Key Value pair (Key, Value) as the first data.

As an alternative embodiment, the embodiment may further include: the second determining unit is used for determining a third corresponding relation according to the first corresponding relation and the at least one second corresponding relation after the first corresponding relation and the at least one second corresponding relation are determined according to the query instruction, wherein the third corresponding relation comprises the corresponding relation of the data in the first type of data, the data in the second type of data and the first identifier; and the corresponding relation of the data in the first class of data, the data in the second class of data and the second identifier.

As an alternative embodiment, in the case that the at least one second correspondence is a plurality of second correspondences, the second querying unit includes: and the parallel query module is used for querying second data corresponding to the specified identification in the plurality of second-class data in parallel.

As an alternative embodiment, the embodiment may further include: and the query result integration unit is used for integrating the first data and the second data corresponding to the specified identification after querying the second data corresponding to the specified identification in at least one second type of data to generate a query result.

The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.

In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units may be a logical division, and in actual implementation, there may be another division, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic or optical disk, and other various media capable of storing program codes.

The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims

1. A method for querying data, comprising:

acquiring a query instruction, wherein the query instruction is used for querying first data in first-class data and second data corresponding to the first data in at least one second-class data, and the first-class data and the second-class data are different types of data sets in a predetermined database;

determining a first corresponding relation and at least one second corresponding relation according to the query instruction, and querying the first data in the first class of data, wherein the first corresponding relation is used for representing the corresponding relation between the data in the first class of data and a first identifier, the second corresponding relation is used for representing the corresponding relation between the data in the second class of data and a second identifier, and the second identifier is determined according to the first identifier;

querying, in the first identifier, a specified identifier corresponding to the first data according to the first corresponding relation;

querying, according to the specified identifier and the at least one second corresponding relation, second data corresponding to the specified identifier in the at least one second class of data;

wherein after querying the second data corresponding to the specified identifier in the at least one second class of data, the method further comprises:

integrating the first data and the second data corresponding to the specified identifier to generate a query result.
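
By way of non-limiting illustration, the steps of claim 1 may be sketched in Python as follows. The function name `run_query`, the use of in-memory dictionaries for the corresponding relations, the sample data, and the simplification that the second identifier equals the first identifier are all assumptions made for this sketch and are not taken from the embodiments.

```python
# Hypothetical sketch of the claimed query flow; all names and data are assumptions.

def run_query(wanted, first_corr, second_corrs):
    """Query first data, then the second data sharing its identifier.

    first_corr:   dict mapping each first-class data item to its first identifier.
    second_corrs: list of dicts, one per second class of data, each mapping a
                  second identifier to the list of second-class data items.
    """
    # Query the first data in the first class of data.
    if wanted not in first_corr:
        return None
    # Look up the specified identifier through the first corresponding relation.
    specified_id = first_corr[wanted]
    # Look up the second data in every second class through the identifier.
    second_data = [corr.get(specified_id, []) for corr in second_corrs]
    # Integrate the first data and the second data into one query result.
    return {"first": wanted, "second": second_data}


# Illustrative data only.
first_corr = {"paper-001": 1, "paper-002": 2}
authors = {1: ["author-a"], 2: ["author-b"]}
keywords = {1: ["hash", "query"], 2: ["index"]}

result = run_query("paper-001", first_corr, [authors, keywords])
print(result)
```

Because each second class of data is reached directly through the specified identifier, the sketch does not scan or join the second-class data as a whole.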

2. The method of claim 1, wherein determining the first corresponding relation and the at least one second corresponding relation according to the query instruction comprises:

selecting, respectively, the first class of data, the second class of data and a data corresponding relation in the predetermined database according to the query instruction, wherein the predetermined database comprises the data corresponding relation, and the data corresponding relation is the corresponding relation between the data in the second class of data and the data in the first class of data;

setting the corresponding first identifier for each piece of data in the first class of data to obtain the first corresponding relation;

setting the corresponding second identifier for each piece of data in the at least one second class of data to obtain the second corresponding relation;

and determining the corresponding relation between the first identifier and the second identifier according to the data corresponding relation.
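
One possible way to derive the two corresponding relations of claim 2 from a data corresponding relation is sketched below. The function name `build_correspondences`, the sequential numbering of first identifiers, and the choice to reuse the first identifier as the second identifier are assumptions of this sketch only.

```python
# Hypothetical sketch: derive identifier relations from a data corresponding relation.

def build_correspondences(first_class, second_class, data_corr):
    """Return (first_corr, second_corr).

    first_class:  list of first-class data items.
    second_class: list of second-class data items.
    data_corr:    dict mapping each second-class item to its first-class item.
    """
    # Set a first identifier for each piece of first-class data
    # (sequential integers are an assumption).
    first_corr = {item: ident for ident, item in enumerate(first_class)}

    # Set a second identifier for each piece of second-class data. Here it is
    # determined from the first identifier of the related first-class item.
    second_corr = {}
    for item in second_class:
        second_id = first_corr[data_corr[item]]
        second_corr.setdefault(second_id, []).append(item)
    return first_corr, second_corr


first_corr, second_corr = build_correspondences(
    ["p1", "p2"],
    ["a", "b", "c"],
    {"a": "p1", "b": "p2", "c": "p1"},
)
print(first_corr, second_corr)
```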

3. The method of claim 2, wherein setting the corresponding first identifier for each piece of data in the first class of data comprises:

determining a key-value pair (Key, Value) corresponding to each piece of data in the first class of data, wherein the Key of the key-value pair (Key, Value) is a hash value obtained from the data in the first class of data through a predetermined hash function, and the Value of the key-value pair (Key, Value) is a stored value of the data in the first class of data;

setting the corresponding first identifier for the key-value pair (Key, Value).

4. The method of claim 3, wherein querying the first data in the first class of data comprises:

determining a hash value for querying the first data through the predetermined hash function;

determining the key-value pair (Key, Value) whose Key is the same as the hash value determined for the first data;

determining the Value in the key-value pair (Key, Value) as the first data.
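
The key-value storage and lookup of claims 3 and 4 may be illustrated by the following sketch. The integer raw keys, the constant `HASH_BITS`, the function names, and the use of buckets to keep apart entries whose hash values coincide are assumptions of this sketch; the claims do not specify how such collisions are handled.

```python
# Hypothetical sketch of hash-keyed storage and lookup; names and data are assumptions.

HASH_BITS = 4  # assumed value of the predetermined positive integer


def predetermined_hash(raw_key: int) -> int:
    """Hash of the form Key mod 2**HASH_BITS."""
    return raw_key % (2 ** HASH_BITS)


def build_table(records):
    """records: iterable of (raw_key, stored_value) pairs.

    Returns {hash: [(raw_key, stored_value, first_id), ...]}; the bucket list
    is an assumption that keeps colliding entries distinguishable.
    """
    table = {}
    for first_id, (raw_key, stored_value) in enumerate(records):
        bucket = table.setdefault(predetermined_hash(raw_key), [])
        bucket.append((raw_key, stored_value, first_id))
    return table


def lookup(table, raw_key):
    """Return (stored_value, first_id) for raw_key, or None if it is absent."""
    for candidate_key, stored_value, first_id in table.get(predetermined_hash(raw_key), []):
        if candidate_key == raw_key:
            return stored_value, first_id
    return None


table = build_table([(3, "alpha"), (19, "beta"), (8, "gamma")])
print(lookup(table, 19))
```

In the sample data, the raw keys 3 and 19 hash to the same value, so the lookup compares raw keys inside the bucket before returning the stored value and its first identifier.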

5. The method of claim 3, wherein the predetermined hash function is $f_m(\mathrm{Key}) = \mathrm{Key} \bmod 2^{\mathrm{HashValue}}$, wherein HashValue is a predetermined positive integer.
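
As a small worked example of a hash function of this form, the sketch below computes Key mod 2^HashValue and checks that, for non-negative integer keys, the result equals the lowest HashValue bits of the key. The function names and the restriction to integer keys are assumptions of this sketch.

```python
# Hypothetical sketch of the hash function Key mod 2**HashValue.

def f_m(key: int, hash_value: int) -> int:
    """Return key mod 2**hash_value; hash_value must be a positive integer."""
    if hash_value <= 0:
        raise ValueError("hash_value must be a positive integer")
    return key % (2 ** hash_value)


def f_m_bitmask(key: int, hash_value: int) -> int:
    """Same result for non-negative keys, obtained by keeping the low bits."""
    return key & ((1 << hash_value) - 1)


print(f_m(37, 4), f_m_bitmask(37, 4))
```

With HashValue equal to 4, every key is mapped to one of 2^4 = 16 possible hash values.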

6. The method according to claim 1, wherein, in the case that the at least one second corresponding relation is a plurality of second corresponding relations, querying the second data corresponding to the specified identifier in the at least one second class of data comprises:

querying, in parallel, the second data corresponding to the specified identifier in a plurality of second classes of data.
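
The parallel query of claim 6 may be sketched with a thread pool from the Python standard library, as below. The function name, the dictionary representation of each second corresponding relation, and the choice of threads as the means of parallelism are assumptions of this sketch.

```python
# Hypothetical sketch: query several second classes of data in parallel.
from concurrent.futures import ThreadPoolExecutor


def query_second_classes_in_parallel(specified_id, second_corrs):
    """Look up specified_id in every second corresponding relation concurrently.

    second_corrs: list of dicts mapping a second identifier to a list of
                  second-class data items. Results keep the input order.
    """
    def lookup(corr):
        return corr.get(specified_id, [])

    # One worker per second class of data; at least one worker is required.
    workers = max(1, len(second_corrs))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lookup, second_corrs))


authors = {1: ["author-a"], 2: ["author-b"]}
keywords = {1: ["hash", "query"], 2: ["index"]}
print(query_second_classes_in_parallel(1, [authors, keywords]))
```

Each lookup depends only on the specified identifier and its own second corresponding relation, so the lookups do not need to wait for one another.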

7. A data query apparatus, comprising:

an acquisition unit, configured to acquire a query instruction, wherein the query instruction is used for querying first data in a first class of data and second data corresponding to the first data in at least one second class of data, and the first class of data and the second class of data are different types of data sets in a predetermined database;

a first determining unit, configured to determine a first corresponding relation and at least one second corresponding relation according to the query instruction, and query the first data in the first class of data, wherein the first corresponding relation is used for representing the corresponding relation between the data in the first class of data and a first identifier, the second corresponding relation is used for representing the corresponding relation between the data in the second class of data and a second identifier, and the second identifier is determined according to the first identifier;

a first query unit, configured to query, in the first identifier, a specified identifier corresponding to the first data according to the first corresponding relation;

a second query unit, configured to query, according to the specified identifier and the at least one second corresponding relation, the second data corresponding to the specified identifier in the at least one second class of data;

wherein the apparatus further comprises:

a query result integration unit, configured to integrate the first data and the second data corresponding to the specified identifier to generate a query result after the second data corresponding to the specified identifier is queried in the at least one second class of data.

8. A storage medium, comprising a stored program, wherein the program, when run, performs the data query method of any one of claims 1 to 6.

9. A processor, configured to run a program, wherein the program, when run, performs the data query method of any one of claims 1 to 6.