CN110678854B

CN110678854B - Data query method and device

Info

Publication number: CN110678854B
Application number: CN201780091399.0A
Authority: CN
Inventors: 高紫娟; 王铁英
Original assignee: Huawei Technologies Co Ltd
Current assignee: Huawei Technologies Co Ltd
Priority date: 2017-05-31
Filing date: 2017-05-31
Publication date: 2021-10-15
Anticipated expiration: 2037-05-31
Also published as: CN110678854A; WO2018218504A1

Abstract

A data query method and device. The method comprises: obtaining a query statement, wherein the query statement is used to query N layers of data in a data hierarchy of a data set to obtain target data, the data hierarchy is the hierarchical structure in which the data of the data set is stored, the data set is stored in K files in order from the highest layer to the lowest layer of the data hierarchy, the K files comprise a first file and a second file, the first file is the file with the earliest creation time among the K files, the second file is the file with the latest creation time among the K files, and N and K are positive integers greater than 1 (310); and, if the query statement meets a preset condition, querying the K files in a query order from the second file to the first file, wherein the preset condition comprises that the filter condition for querying the data set in a first sub-statement is null and the filter condition for querying the data set in a second sub-statement is not null, the first sub-statement is used to query the top-layer data of the N layers of data, and the second sub-statement is used to query the bottom-layer data of the N layers of data (320). The method helps improve data query efficiency.

Description

Data query method and device

Technical Field

The present application relates to the field of computers, and more particularly, to a method and apparatus for data querying.

Background

A distributed database is a logically unified database formed by connecting a plurality of physically distributed data storage units over a high-speed computer network. The basic idea of a distributed database is to scatter the data of an originally centralized database across a plurality of data storage nodes connected through a network, so as to obtain a larger storage capacity and a higher level of concurrent access. In recent years, with the rapid increase in data volume, distributed database technology has developed rapidly. Traditional relational databases have begun to evolve from centralized models to distributed architectures, and relational distributed databases are moving from centralized storage to distributed storage and from centralized computing to distributed computing while retaining the data models and basic features of traditional databases.

On the other hand, as data volumes keep growing, relational databases have exposed some defects that are difficult to overcome. Non-relational databases, represented by NoSQL, have developed rapidly thanks to advantages such as high scalability and high concurrency, and a large number of NoSQL database products, such as key-value storage systems and document-type databases, have appeared on the market. A document-type database uses document storage as its storage model, supports the storage of both structured and unstructured data, and places no mandatory restriction on the structure of the data stored in a document.

Currently, in a data query process of a document-type database, after receiving a query statement input by a user, a management system in the database queries files according to a fixed query sequence, for example, iterating files of a data set in which data is stored in a query sequence from an old file to a new file, so as to query data in the files.

However, querying files in a fixed order as described above may make data queries inefficient. For example, suppose the only filter condition in the query statement concerns an attribute of bottom-layer data in the data hierarchy that stores the data of the target application, and the files are queried in order from the old file to the new file. Most of the data stored in an old file is close to the top of the data hierarchy of the data set, and the query statement has no filter condition for top-layer data. Searching the old file therefore produces a large number of intermediate results that do not satisfy all the filter conditions of the query statement, which reduces query efficiency.

Disclosure of Invention

The application provides a data query method and device, which are beneficial to improving the efficiency of querying data.

In a first aspect, a method for querying data is provided, including: acquiring a query statement, wherein the query statement is used for querying N layers of data in a data hierarchy structure of a data set, the data hierarchy structure is a hierarchy structure for storing data in the data set, the data set is stored in K files according to a sequence from high to low of hierarchies in the data hierarchy structure, the K files comprise a first file and a second file, the first file is a file with the earliest creation time in the K files, the second file is a file with the latest creation time in the K files, and N and K are positive integers greater than 1; when the query statement meets a preset condition, querying the K files according to a query sequence from the second file to the first file to obtain target data to be queried, wherein the preset condition includes that a filter condition for querying the data set in a first sub-statement of the query statement is null, a filter condition for querying the data set in a second sub-statement of the query statement is not null, the first sub-statement is used for querying top-layer data in the N-layer data, and the second sub-statement is used for querying bottom-layer data in the N-layer data.

According to the data query method of the embodiments of the application, when the first sub-statement and the second sub-statement of the query statement meet the preset condition, the K files can be queried in order from the second file to the first file, so that the query starts from the second file (that is, the new file) using the filter condition in the second sub-statement. In the prior art, a query statement meeting the preset condition is still executed in order from the first file to the second file and therefore produces a large number of intermediate results that do not satisfy the filter conditions of the query statement. The method reduces such intermediate results to some extent and thereby improves query efficiency.

With reference to the first aspect, in some implementations, the method further includes: dividing the query statement into a plurality of sub-statements according to the data hierarchy of the data set, based on the attributes of the data to be queried and the filter conditions for querying the data set that are contained in the query statement.

With reference to the first aspect, in some implementations, the query statement includes at least 3 filter conditions, and different filter conditions in the at least 3 filter conditions are used to filter data located in different layers in the data hierarchy, and when the query statement satisfies a preset condition, querying the K files according to a query sequence from the second file to the first file to obtain target data to be queried includes: and when the query statement meets the preset condition and the at least 3 filtering conditions are the filtering conditions which need to be met by the target data at the same time, querying the K files according to a query sequence from the second file to the first file to obtain the target data to be queried.

In the embodiment of the present application, if the "cross-layer" filtering condition for filtering data located at different layers in the data hierarchy satisfies the preset condition, the query may be performed in the K files according to the query sequence from the second file to the first file, which is beneficial to reducing the number of intermediate results that do not conform to all filtering conditions in the query statement.

With reference to the first aspect, in some implementations, the querying the K files according to the query sequence from the second file to the first file to obtain target data to be queried includes: when the query statement meets a preset condition, querying the K files according to a query sequence from the second file to the first file to obtain target data to be queried, and querying each file in the K files according to a first sequence to obtain the target data, wherein the first sequence is a sequence from the bottom layer to the top layer according to a data hierarchy of the files, and the hierarchy sequence in the data hierarchy of the files is the same as the hierarchy sequence of the data hierarchy of the data set.

Querying the data in each file in the first order is advantageous for further reducing the number of intermediate results that do not meet all filter criteria of the query statement.

With reference to the first aspect, in some implementations, the method further includes: and when the query statement does not meet the preset condition, querying the K files according to the query sequence from the first file to the second file to obtain the target data to be queried.

For a query statement that does not meet the preset condition, querying in order from the first file to the second file helps speed up the query. For example, such a query statement may be one in which the filter condition in the first sub-statement is not null. For this type of query statement, the data hierarchy can be exploited when querying in order from the first file to the second file: while querying the first file, if a piece of data does not satisfy the filter condition in the first sub-statement, the lower-layer data beneath it in the data hierarchy no longer needs to be queried.
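The pruning described above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not the implementation of the application: records are assumed to be held in a dictionary keyed by their primary-key path, and the function name top_down_scan and the record layout are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[int, ...]  # primary-key path, e.g. (user_id, topic_id)


def top_down_scan(records: Dict[Key, dict],
                  top_filter: Callable[[dict], bool]) -> List[Key]:
    """Visit records in key order (parents before children) and skip every
    descendant of a top-layer record that fails `top_filter`."""
    visited: List[Key] = []
    rejected_tops = set()
    for key in sorted(records):
        if len(key) == 1:
            # Top-layer record: apply the filter of the first sub-statement.
            if not top_filter(records[key]):
                rejected_tops.add(key[0])
                continue
        elif key[0] in rejected_tops:
            # Lower-layer data under a rejected top-layer record is not queried.
            continue
        visited.append(key)
    return visited


# Hypothetical sample data: two users, each with one topic.
records = {
    (1,): {"Birthday": "1998"},
    (1, 1): {"Title": "a"},
    (2,): {"Birthday": "2000"},
    (2, 1): {"Title": "b"},
}
result = top_down_scan(records, lambda r: r["Birthday"] == "1998")
```

In this sketch the topic under user 2 is never examined, because user 2 already fails the top-layer filter.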

With reference to the first aspect, in some implementation manners, when the query statement does not satisfy the preset condition, querying the K files according to a query sequence from the first file to the second file to obtain target data to be queried includes: when the query statement does not meet the preset condition, querying the K files according to a query sequence from the first file to the second file to obtain the target data to be queried, and querying each file in the K files according to a second sequence to obtain the target data, wherein the second sequence is a sequence from the top layer to the bottom layer according to a data hierarchy of the files, and the hierarchical sequence in the data hierarchy of the files is the same as the hierarchical sequence of the data hierarchy of the data set.

Querying the data in each file in the second order helps further reduce the number of intermediate results that do not satisfy all the filter conditions of the query statement.

In a second aspect, an apparatus for data query is provided, and the apparatus includes modules for performing the method in the first aspect.

In a third aspect, an apparatus for data query is provided, the apparatus comprising: a memory, a processor, an input/output interface, and a communication interface. Wherein, there is communication connection between the memory, the processor, the input/output interface and the communication interface, the memory is used for storing instructions, the processor is used for executing the instructions stored by the memory, when the instructions are executed, the processor executes the method of the first aspect through the communication interface, and controls the input/output interface to receive input data and information and output data such as operation results.

In a fourth aspect, a computer readable medium is provided, which stores program code for execution by a terminal device, the program code comprising instructions for performing the method in the first aspect.

In a fifth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of the above aspects.

The technical scheme provided by the application is beneficial to reducing intermediate results which do not accord with the query statement filtering condition in the data query process, so that the data query efficiency is improved.

Drawings

Fig. 1 is a schematic flow chart of a method of file generation of an embodiment of the present application.

Fig. 2 is a schematic block diagram of a file generated by the file generation method according to the embodiment of the present application.

Fig. 3 is a schematic flow chart of a method of data query according to an embodiment of the present application.

Fig. 4 is a schematic flow chart of a method of data query according to another embodiment of the present application.

Fig. 5 is a schematic block diagram of an apparatus for data query according to an embodiment of the present application.

Fig. 6 is a schematic block diagram of an apparatus for data query according to an embodiment of the present application.

Detailed Description

The technical solution in the present application will be described below with reference to the accompanying drawings.

The embodiments of the application can be applied to databases, in particular to document-type databases. A document-type database uses document storage as its storage model and can support the storage of structured data. A user can send a query statement to an application module, the application module forwards the query statement to the management system in the database, the management system queries the target data stored in the document-type database according to the query statement, and finally the target data (that is, the query result) can be presented to the user on a screen.

The structured data stored in the document-type database may be data stored in the document-type database in a data hierarchy structure, and the data hierarchy structure of the stored data according to the embodiment of the present application is briefly described below.

All files corresponding to a given application may be stored in one schema (Schema) in the distributed database. The data hierarchy used to store data in the files of each Schema may comprise a plurality of layers. The first field of each layer may be used to uniquely identify the data stored in that layer, that is, the first field of each layer serves as the primary key, and each layer may also store data with other data attributes.

Taking WeChat as an example, the following describes in detail the data hierarchy used for the data stored in the Schema that holds WeChat data. The data hierarchy for storing WeChat data may specifically be as follows.

The data hierarchy may be divided into 4 layers. From the top of the data hierarchy to the bottom, these are the first layer, the second layer, the third layer and the fourth layer (see Table 1). The first layer (also called the "top layer") may use the user identifier (User ID) as its primary key and may store data with the data attributes User ID, name (Name), sex (Sex) and birthday (Birthday). The second layer may use the topic identifier (Topic ID) as its primary key and may store data with the data attributes Topic ID, title (Title) and release date of the topic (T_Date). The third layer may use the comment identifier (Comment ID) as its primary key and may store data with the data attributes Comment ID, comment content (C_Content), comment date (C_Date) and the user identifier of the user who made the comment (C_User ID). The fourth layer (also called the "bottom layer") may use the reply identifier (Feed ID) as its primary key and may store data with the data attributes Feed ID, reply content (F_Content) and release date of the reply (F_Date). Table 1 shows this data hierarchy for storing WeChat data in tabular form.

TABLE 1

Level sequence number	Primary key	Data attributes
1	User ID	Name, Sex, Birthday
2	Topic ID	Title, T_Date
3	Comment ID	C_Content, C_Date, C_User ID
4	Feed ID	F_Content, F_Date
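As a rough illustration, the data hierarchy of Table 1 could be represented in memory as follows. This is a minimal Python sketch; the Layer class, the HIERARCHY constant and the layer_of helper are hypothetical names introduced here and are not part of the application.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Layer:
    level: int                   # 1 = top layer, 4 = bottom layer
    primary_key: str             # first field of the layer
    attributes: Tuple[str, ...]  # other data attributes stored at this layer


# Hypothetical in-memory form of Table 1.
HIERARCHY = (
    Layer(1, "User ID", ("Name", "Sex", "Birthday")),
    Layer(2, "Topic ID", ("Title", "T_Date")),
    Layer(3, "Comment ID", ("C_Content", "C_Date", "C_User ID")),
    Layer(4, "Feed ID", ("F_Content", "F_Date")),
)


def layer_of(attribute: str) -> int:
    """Return the level whose primary key or data attributes include `attribute`."""
    for layer in HIERARCHY:
        if attribute == layer.primary_key or attribute in layer.attributes:
            return layer.level
    raise KeyError(attribute)
```

A lookup of this kind is what allows a filter condition such as one on Birthday to be attributed to the first layer.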

The data storage process based on the above data hierarchy is briefly described below. During the process of storing the data into the database, the data can be stored sequentially from the top layer of the data hierarchy to the bottom layer of the data hierarchy according to the data hierarchy described above. In the process of storing data into the database, the database management system can output and store the data blocks formed in the cache region into a disk to form a file through a Flush operation. The file generation process of the embodiment of the present application is described in detail below with reference to fig. 1.

Fig. 1 is a schematic flow chart of a method of file generation of an embodiment of the present application. The method shown in fig. 1 comprises:

110, User (1, …) is stored in the database.

111, Topic (1, 1, …) is stored in the database.

112, Topic (1, 2, …) is stored in the database.

113, Comment (1, 1, 1, …) is stored in the database.

114, Comment (1, 1, 2, …) is stored in the database.

115, file 1 is generated.

Specifically, the data stored in the buffer area of the database in steps 110 to 114 is stored in the disk of the database through Flush operation, and a file 1 is formed.

116, Feed (1, 1, 1, 1, …) is stored in the database.

117, Feed (1, 1, 2, 2, …) is stored in the database.

118, Comment (1, 2, 3, …) is stored in the database.

Specifically, when Comment (1, 2, 3, …) is stored in the database, it needs to carry the primary key of the third layer (Comment ID) together with the primary key of the first layer (User ID) and the primary key of the second layer (Topic ID).

119, Feed (1, 2, 3, 4, …) is stored in the database.

120, file 2 is generated.

Specifically, the data stored in the buffer area of the database in steps 116 to 119 is stored in the disk of the database by Flush operation, and file 2 is formed.
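The file generation flow of steps 110 to 120 can be sketched as follows. This is a simplified Python illustration that assumes a single in-memory buffer and models each file as a list of records; the Store class and its put and flush methods are hypothetical and do not correspond to any real database interface.

```python
from typing import List, Tuple

Record = Tuple[str, Tuple[int, ...]]  # (layer name, primary-key path)


class Store:
    """Toy store: records accumulate in a buffer; flush() writes them out as a new file."""

    def __init__(self) -> None:
        self.buffer: List[Record] = []
        self.files: List[List[Record]] = []  # files[0] is file 1 (the oldest)

    def put(self, layer: str, *keys: int) -> None:
        self.buffer.append((layer, keys))

    def flush(self) -> int:
        """Move the buffered records into a new file and return its 1-based number."""
        self.files.append(self.buffer)
        self.buffer = []
        return len(self.files)


store = Store()
store.put("User", 1)               # step 110
store.put("Topic", 1, 1)           # step 111
store.put("Topic", 1, 2)           # step 112
store.put("Comment", 1, 1, 1)      # step 113
store.put("Comment", 1, 1, 2)      # step 114
file1 = store.flush()              # step 115
store.put("Feed", 1, 1, 1, 1)      # step 116
store.put("Feed", 1, 1, 2, 2)      # step 117
store.put("Comment", 1, 2, 3)      # step 118
store.put("Feed", 1, 2, 3, 4)      # step 119
file2 = store.flush()              # step 120
```

In this sketch file 1 holds only first-layer to third-layer records, while file 2 holds mostly fourth-layer records plus one late third-layer record.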

It should be understood that data of the same layer may be stored in the same file, while different lower-layer data belonging to the same upper-layer data may be stored in different files.

It should be noted that, in the process of generating a file by the file generation method shown in fig. 1, a Delete Tag Bitmap (Delete Tag Bitmap) may also be generated at the same time, and the Delete Tag Bitmap may be stored in the disk together with the corresponding file, where the Delete Tag Bitmap is used to indicate whether data stored in the file is valid. In the deletion tag bitmap, if the value corresponding to a bit is 0 (initial value), it may indicate that the data storage record corresponding to the bit is not deleted, and if the value corresponding to a bit is 1, it may indicate that the data storage record corresponding to the bit is deleted. Referring to fig. 2, if 5 data storage records, which are stored in the database in steps 110 to 114, are stored in file 1, 5 bits may be used in the deletion tag bitmap corresponding to file 1 to indicate whether the data stored in the database corresponding to the 5 data records in file 1 is deleted, and the values of the 5 bits corresponding to the 5 data storage records stored in file 1 are all 0, that is, none of the 5 data storage records in file 1 is deleted. Similarly, in the deletion tag bitmap, the value of 4 bits for indicating whether 4 data storage records in the file 2 are deleted is 0, which indicates that the data stored in the file 2 through the steps 116 to 119 are not deleted.
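A deletion tag bitmap of the kind described above might be sketched as follows. This is a minimal Python illustration that assumes one bit per data storage record and stores the bits in a single integer; the DeleteTagBitmap class and its method names are hypothetical.

```python
class DeleteTagBitmap:
    """One bit per record in a file: 0 = not deleted (initial value), 1 = deleted."""

    def __init__(self, record_count: int) -> None:
        self.record_count = record_count
        self.bits = 0  # all bits start at 0, so no record is deleted

    def _check(self, index: int) -> None:
        if not 0 <= index < self.record_count:
            raise IndexError(index)

    def mark_deleted(self, index: int) -> None:
        self._check(index)
        self.bits |= 1 << index

    def is_deleted(self, index: int) -> bool:
        self._check(index)
        return bool((self.bits >> index) & 1)


# File 1 holds 5 records (steps 110 to 114), none of them deleted initially.
bitmap_file1 = DeleteTagBitmap(5)
```

A query would consult such a bitmap to skip records whose bit is 1.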

As can be seen from the method for generating a file shown in fig. 1, the numbers of the files may be numbered from small to large according to the generation sequence of the files, that is, the sequence of the numbers of the files from small to large is the same as the sequence from the old file to the new file, and a larger number of the file may indicate that the time for generating the file is closer to the current time, that is, the file is "new"; a smaller number of a file may indicate that the file was generated farther from the current time, i.e., the file is "older". In the process of storing data into a database according to a data hierarchy structure and generating a new file, if the number of the file is smaller, the generation time of the file is farther from the current time, and the file is more "old", the probability that the data stored in the file is located at the upper layer of the data hierarchy structure (for example, the first layer data and the second layer data shown in table 1) is higher; if the number of the file is larger, the generation time of the file is closer to the current time, and the file is more "new", the probability that the data stored in the file is located at a lower level in the data hierarchy (for example, the third layer data and the fourth layer data shown in table 1) is higher.

The embodiments of the present application propose a data query method that exploits the relationship between the age of a file (new or old) and the layer, in the data hierarchy, of the data stored in that file. The data query method of the embodiments of the present application is described in detail below with reference to fig. 3.

Fig. 3 is a schematic flow chart of a method of data query according to an embodiment of the present application. It should be understood that the method illustrated in FIG. 3 may be performed by a management system in a database, such as a distributed data management system. The method shown in fig. 3 comprises:

310, obtaining a query statement, where the query statement is used to query N-layer data in a data hierarchy, the data hierarchy is a hierarchy for storing data in a data set, and the data set is stored in K files according to a sequence from high to low in a hierarchy in the data hierarchy, where the K files include a first file and a second file, the first file is a file with the earliest creation time among the K files, the second file is a file with the latest creation time among the K files, and N and K are positive integers greater than 1.

Specifically, the statement that the data set is stored in the K files in order from the highest layer to the lowest layer of the data hierarchy means that the data of the data set is generally written to the K files in that order. It does not exclude the case in which, while data is being stored in the K files according to this rule, newly inserted data close to the top of the data hierarchy is stored in a new file. For example, file 2 shown in fig. 2, which was generated later (that is, the new file), stores the data Comment (1, 2, 3, …).

The first file is the oldest created file of the K files, that is, the first file is the old file mentioned above.

The second file is a file whose creation time is the latest among the K files, that is, the second file is the new file mentioned above.

It should be understood that the data set may be a set of all data in a certain application stored in a database, for example, a data set of all data in a WeChat.

It should also be understood that the query statement may be a range query statement, which is used to find at least one piece of data meeting a filter condition, for example, a Scan query statement or a range query statement that carries a key.

For example, query statement Q1 is: Select Topic ID, Title from WeiXin where Birthday = "1998" and Title like "***". The attributes of the data contained in query statement Q1 are Topic ID and Title, and the filter conditions in query statement Q1 are Birthday = "1998" and Title like "***". Table 2 shows the data attributes and filter conditions of each sub-statement in query statement Q1. Query statement Q1 may contain two sub-statements, sub-statement 1 and sub-statement 2. The filter condition in sub-statement 1 is Birthday = "1998" and the data attribute in sub-statement 1 is null; referring to the data hierarchy shown in Table 1, this filter condition filters data with the attribute "Birthday" in the first layer of the data hierarchy, that is, sub-statement 1 is used to query data in the first layer of the data hierarchy shown in Table 1. The filter condition in sub-statement 2 is Title like "***" and its data attributes are Topic ID and Title; referring to the data hierarchy shown in Table 1, sub-statement 2 is used to query data in the second layer of the data hierarchy shown in Table 1.

TABLE 2

Sub-statement	Data attributes	Filter condition
Sub-statement 1	(null)	Birthday = "1998"
Sub-statement 2	Topic ID, Title	Title like "***"

It should be noted that, the fact that the attribute of the data of the sub-statement in the query statement is null may mean that the sub-statement only filters the attribute of the data in a certain layer in the data hierarchy, and may not reflect the attribute of the data in the layer in the query result, for example, the sub-statement 1 in the query statement Q1 only filters the data with the attribute "Birthday", and the query result corresponding to the query statement Q1 may not reflect the attribute of the data in the first layer queried by the sub-statement 1.
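The division of a query statement into per-layer sub-statements can be sketched as follows. This Python illustration assumes the query has already been parsed into a list of selected attributes and a list of filter conditions; the LAYER_OF mapping, the SubStatement class and the split_query function are hypothetical names, and the sketch does not parse query text.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Attribute -> layer, following Table 1 (only the attributes used in Q1 and Q2).
LAYER_OF = {"Name": 1, "Birthday": 1, "Topic ID": 2, "Title": 2}

Condition = Tuple[str, str, str]  # (attribute, operator, value)


@dataclass
class SubStatement:
    layer: int
    attributes: List[str] = field(default_factory=list)   # empty list = null
    filters: List[Condition] = field(default_factory=list)  # empty list = null


def split_query(select: List[str], where: List[Condition]) -> List[SubStatement]:
    """Group selected attributes and filter conditions by the layer they belong to."""
    subs: Dict[int, SubStatement] = {}
    for attr in select:
        layer = LAYER_OF[attr]
        subs.setdefault(layer, SubStatement(layer)).attributes.append(attr)
    for cond in where:
        layer = LAYER_OF[cond[0]]
        subs.setdefault(layer, SubStatement(layer)).filters.append(cond)
    return [subs[level] for level in sorted(subs)]


# Query statement Q1, already parsed.
q1 = split_query(["Topic ID", "Title"],
                 [("Birthday", "=", "1998"), ("Title", "like", "***")])
```

For Q1 this yields a first-layer sub-statement with a null data attribute and one filter condition, and a second-layer sub-statement with two data attributes and one filter condition, matching Table 2.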

320, when the query statement meets a preset condition, querying the K files in a query order from the second file to the first file to obtain the target data to be queried, where the preset condition includes that the filter condition for querying the data set in a first sub-statement of the query statement is null and the filter condition for querying the data set in a second sub-statement of the query statement is not null, the first sub-statement is used to query the top-layer data of the N layers of data, and the second sub-statement is used to query the bottom-layer data of the N layers of data.

Specifically, that the filter condition for querying the data set in the first sub-statement is null means, in other words, that no filter condition exists in the first sub-statement. That the filter condition for querying the data set in the second sub-statement is not null means, in other words, that a filter condition exists in the second sub-statement.

It should be understood that each of the above sub-statements may contain a filter condition for querying the data set and attributes of the data to be queried, where either one of the filter condition and the data attributes may be null, or neither of them may be null.

It should be further understood that querying the K files in the query order from the second file to the first file may mean querying the K files one after another in that order, with one of the K files queried each time. It may also mean dividing the K files into groups in the query order from the second file to the first file and querying the plurality of files in one group at a time, in that same order.
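Both readings, one file at a time or one group of files at a time, can be covered by a single helper. The following Python sketch assumes the files are given in creation order (oldest first); the function name newest_first_batches and the group_size parameter are hypothetical.

```python
from typing import Iterator, List, Sequence, TypeVar

T = TypeVar("T")


def newest_first_batches(files: Sequence[T], group_size: int = 1) -> Iterator[List[T]]:
    """Yield files from the newest (last) to the oldest (first), `group_size` at a time."""
    if group_size < 1:
        raise ValueError("group_size must be at least 1")
    ordered = list(reversed(files))  # second file first, first file last
    for start in range(0, len(ordered), group_size):
        yield ordered[start:start + group_size]


one_by_one = list(newest_first_batches(["file1", "file2", "file3"]))
in_groups = list(newest_first_batches(["file1", "file2", "file3"], group_size=2))
```

With group_size left at 1 the helper visits a single file per step; a larger group_size visits several files per step while preserving the newest-to-oldest order.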

It should also be understood that the embodiment of the present application is not particularly limited to whether the filter condition for querying the data set in the sub-statement other than the first sub-statement and the second sub-statement in the query statement is null.

For example, query statement Q1 includes 2 sub-statements, which are used to query data in the first layer and data in the second layer of the data hierarchy shown in Table 1. Sub-statement 1 can be regarded as the "first sub-statement" of query statement Q1 and is used to query data in the first (top) layer of the 2 layers of data; sub-statement 2 can be regarded as the "second sub-statement" of query statement Q1 and is used to query data in the second (bottom) layer of the 2 layers of data. However, since the filter condition in sub-statement 1 is not null, query statement Q1 does not meet the above preset condition.

For another example, query statement Q2 is: Select Name, Topic ID, Title from Table Name where Title like "***". The attributes of the data contained in query statement Q2 are Name, Topic ID and Title, and the filter condition that the data needs to satisfy in query statement Q2 is Title like "***". Table 3 shows the data attributes and filter conditions of each sub-statement in query statement Q2. Query statement Q2 may contain two sub-statements, sub-statement 3 and sub-statement 4. The data attribute in sub-statement 3 is Name and the filter condition in sub-statement 3 is null; referring to the data hierarchy shown in Table 1, this data attribute is located in the first layer of the data hierarchy. The filter condition in sub-statement 4 is Title like "***" and its data attributes are Topic ID and Title; referring to the data hierarchy shown in Table 1, sub-statement 4 is used to query data in the second layer of the data hierarchy shown in Table 1.

TABLE 3

Sub-statement	Data attributes	Filter condition
Sub-statement 3	Name	(null)
Sub-statement 4	Topic ID, Title	Title like "***"

Query statement Q2 includes 2 sub-statements, which are used to query data in the first layer and data in the second layer of the data hierarchy shown in Table 1. Sub-statement 3 can be regarded as the "first sub-statement" of query statement Q2 and is used to query data in the first (top) layer of the 2 layers of data; sub-statement 4 can be regarded as the "second sub-statement" of query statement Q2 and is used to query data in the second (bottom) layer of the 2 layers of data. Since the filter condition in sub-statement 3 is null and the filter condition in sub-statement 4 is not null, query statement Q2 meets the above preset condition.
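The check of the preset condition and the resulting choice of file order can be sketched as follows. This Python illustration reduces each sub-statement to a pair of its layer number and a flag saying whether it has a filter condition, which is a simplification; the names meets_preset_condition and file_order are hypothetical.

```python
from typing import List, Sequence, Tuple

# Each sub-statement is reduced to (layer, has_filter): for the preset
# condition only the existence of a filter condition matters.
SubStatement = Tuple[int, bool]


def meets_preset_condition(subs: Sequence[SubStatement]) -> bool:
    """True when the top-layer sub-statement has no filter condition and the
    bottom-layer sub-statement has one (N must be greater than 1)."""
    if len(subs) < 2:
        return False
    top = min(subs, key=lambda s: s[0])
    bottom = max(subs, key=lambda s: s[0])
    return (not top[1]) and bottom[1]


def file_order(file_numbers: Sequence[int], subs: Sequence[SubStatement]) -> List[int]:
    """Return file numbers in query order; a smaller number means an older file."""
    ordered = sorted(file_numbers)
    return ordered[::-1] if meets_preset_condition(subs) else ordered


q1 = [(1, True), (2, True)]    # Q1: Birthday filter at layer 1, Title filter at layer 2
q2 = [(1, False), (2, True)]   # Q2: no filter at layer 1, Title filter at layer 2
```

Under these assumptions Q1 is queried from file 1 to file 2, whereas Q2 is queried from file 2 to file 1.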

According to the data query method of the embodiments of the application, when the first sub-statement and the second sub-statement of the query statement meet the preset condition, the K files can be queried in order from the second file to the first file, so that the query starts from the second file (that is, the new file) using the filter condition in the second sub-statement. In the prior art, a query statement meeting the preset condition is still executed in order from the first file to the second file and therefore produces a large number of intermediate results that do not satisfy the filter conditions of the query statement. The method reduces such intermediate results to some extent and thereby improves query efficiency.

Optionally, the query statement includes at least 3 filter conditions, and different filter conditions in the at least 3 filter conditions are used for filtering data located at different levels in the data hierarchy, step 320, further includes: and if the query statement meets a preset condition and the at least 3 filtering conditions are filtering conditions which need to be met by data at the same time, querying the K files according to a query sequence from the second file to the first file.

Specifically, different filter conditions of the at least 3 filter conditions are used for filtering data at different layers in the data hierarchy, and may be referred to as "cross-layer" filter conditions. The "cross-layer" filter conditions may be connected by the AND logical operator, that is, the data needs to satisfy the "cross-layer" filter conditions simultaneously, or in other words, the data needs to satisfy each of the at least 3 filter conditions simultaneously.

For example, the filter condition on "Birthday" (involving the value 1998) and the filter condition "Title like" contained in the query statement Q1 are, respectively, a filter condition for data located in the first layer in the data hierarchy and a filter condition for data located in the second layer in the data hierarchy, that is, the "cross-layer" filter conditions described above. Moreover, it can be seen from the query statement Q1 that the two filter conditions are connected by "and", that is, the data needs to satisfy the two "cross-layer" filter conditions at the same time.

Optionally, as an embodiment, step 320 further includes: when the query statement meets a preset condition, querying the K files according to a query sequence from the second file to the first file, and querying each file in the K files according to a first sequence to obtain the target data, wherein the first sequence is a sequence from a bottom layer to a top layer according to a data hierarchy of the files, and a hierarchy sequence in the data hierarchy of the files is the same as a hierarchy sequence of the data hierarchy of the data set.

Specifically, the data hierarchy of a file may refer to the data hierarchy of the data stored in that file, and may be regarded as a sub-hierarchy of the data hierarchy of the data set. That is, the data hierarchy of a file contains some of the levels in the data hierarchy of the data set, and is used to represent the hierarchical relationship or hierarchical order between the data stored in the file.

For example, the data structure of the file in file 1 shown in fig. 2 includes the data User ID of the first hierarchy, the data Topic ID of the second hierarchy, and the data Comment ID of the third hierarchy in the data hierarchy shown in table 1, and it can be seen that the data hierarchy of the file in file 1 may be a sub data hierarchy of the data hierarchy (4 hierarchies) shown in table 1.

It should be noted that, each file in the K files may be queried according to the first order under the condition that the query statement satisfies the preset condition; or under the condition that the query statement meets the preset condition and the filtering condition of the cross-layer in the query statement is the filtering condition which needs to be met by the data at the same time, querying each file in the K files according to the first sequence.
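
Querying a single file in the first order (from the bottom layer to the top layer) can be illustrated with the sketch below. The record layout is an assumption made only for this example: a file is modeled as a mapping from layer number to records, and each record carries a hypothetical `id` and a `parent` id that links it to the layer above. The function name `query_bottom_up` is likewise hypothetical.

```python
from typing import Callable, Dict, List

# A file is modeled as {layer_number: [record, ...]}; each record is a dict with
# an "id", a "parent" id (None on the top layer), and its attribute values.
Record = Dict[str, object]
FileData = Dict[int, List[Record]]


def query_bottom_up(
    file_data: FileData,
    bottom_layer: int,
    bottom_filter: Callable[[Record], bool],
) -> List[List[Record]]:
    """Filter the bottom layer first, then walk up to the top layer via parent ids."""
    # Index every layer by record id so that parents can be looked up directly.
    index = {
        layer: {rec["id"]: rec for rec in records}
        for layer, records in file_data.items()
    }
    results = []
    for rec in file_data.get(bottom_layer, []):
        if not bottom_filter(rec):
            continue  # rejected before any upper-layer data is touched
        chain = [rec]
        layer, current = bottom_layer, rec
        while layer > 1:
            layer -= 1
            parent = index.get(layer, {}).get(current["parent"])
            if parent is None:
                break  # parent record is not present in this file
            chain.append(parent)
            current = parent
        results.append(list(reversed(chain)))  # top layer first
    return results
```

Because the non-null filter condition sits on the bottom layer, only records that already pass it cause any upper-layer lookup in this sketch.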

Optionally, as an embodiment, the method further includes: 330, dividing the query statement into a plurality of sub-statements according to the data hierarchy, based on the attributes of the data to be queried in the data set and the filter conditions for querying the data in the data set that are contained in the query statement.

Specifically, dividing the query statement into a plurality of sub-statements according to the data hierarchy may refer to determining, for each attribute of the data to be queried in the query statement, the layer of that attribute in the data hierarchy, and determining, for each filter condition, the layer in the data hierarchy of the data that the filter condition filters. In this way, the attributes of the data to be queried and the filter conditions contained in each sub-statement of the query statement are determined. When an attribute of the data to be queried and the data filtered by a filter condition are located at the same layer, the attribute and the filter condition may be merged into one sub-statement. Different sub-statements in the query statement are used for querying data located at different layers in the data hierarchy.
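
The division into sub-statements can be sketched as a grouping of attributes and filter conditions by layer. The attribute-to-layer mapping `ATTRIBUTE_LAYER` below is a hypothetical stand-in for the data hierarchy of the data set, and the function name and the tuple representation of a filter condition are assumptions made for illustration only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical attribute-to-layer mapping; in the embodiment the layers come
# from the data hierarchy of the data set.
ATTRIBUTE_LAYER: Dict[str, int] = {
    "UserID": 1, "Birthday": 1,
    "TopicID": 2, "Title": 2,
}


def split_into_sub_statements(
    selected: List[str],
    filters: List[Tuple[str, str]],
) -> Dict[int, Dict[str, list]]:
    """Group selected attributes and (attribute, condition) filters by layer.

    Attributes and filters on the same layer are merged into one sub-statement.
    """
    subs: Dict[int, Dict[str, list]] = defaultdict(
        lambda: {"attributes": [], "filters": []}
    )
    for attr in selected:
        subs[ATTRIBUTE_LAYER[attr]]["attributes"].append(attr)
    for attr, condition in filters:
        subs[ATTRIBUTE_LAYER[attr]]["filters"].append((attr, condition))
    # Return the sub-statements ordered from the top layer to the bottom layer.
    return dict(sorted(subs.items()))
```

A sub-statement whose `filters` list is empty corresponds to a sub-statement with a null filter condition.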

Optionally, as an embodiment, the method further includes: 340, when the query statement does not satisfy the preset condition, querying the K files according to the query sequence from the first file to the second file to obtain the target data to be queried.

Specifically, if the query statement does not satisfy the preset condition, the K files may be queried according to a query sequence from an old file to a new file to obtain the target data to be queried.

Optionally, when a query statement includes at least 3 filter conditions, and different filter conditions in the at least 3 filter conditions are used to filter data located in different layers in the data hierarchy, if the query statement satisfies the preset condition, but the at least 3 filter conditions are not filter conditions that the data need to satisfy simultaneously, the K files are queried according to a query sequence from the first file to the second file to obtain the target data to be queried.

Specifically, that the at least 3 filter conditions are not filter conditions that the data needs to satisfy simultaneously may mean that the data satisfies any one of the at least 3 filter conditions, or that the data satisfies any two of the at least 3 filter conditions; in other words, the at least 3 filter conditions are connected by the logical operator OR.
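
The combined rule (preset condition plus the connective between the "cross-layer" filter conditions) can be expressed as a small predicate. This is a hedged sketch: the function name and the representation of the connectives as a list of strings are assumptions, not part of the embodiment.

```python
from typing import List, Optional


def use_new_to_old_order(
    top_filter: Optional[str],
    bottom_filter: Optional[str],
    cross_layer_connectives: List[str],
) -> bool:
    """Return True if the K files should be queried from the newest to the oldest."""
    # Preset condition: top-layer filter is null, bottom-layer filter is not.
    preset_ok = top_filter is None and bottom_filter is not None
    # Every cross-layer filter condition must be joined by AND; any OR
    # falls back to the old-to-new query order.
    all_and = all(op.upper() == "AND" for op in cross_layer_connectives)
    return preset_ok and all_and
```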

Optionally, as an embodiment, step 340 further includes: when the query statement does not meet the preset condition, querying the K files according to a query sequence from the first file to the second file to obtain the target data to be queried, and querying each file in the K files according to a second sequence to obtain the target data, wherein the second sequence is a sequence from the top layer to the bottom layer according to a data hierarchy of the files, and the hierarchical sequence in the data hierarchy of the files is the same as the hierarchical sequence of the data hierarchy of the data set.

In the above-mentioned query of each file in the K files in the second order, the second order is from the top to the bottom in the data hierarchy of the file, that is, in the process of querying each file, the query is performed from the top of the data hierarchy of each file to the bottom of the data hierarchy of the file.
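
For comparison, querying a file in the second order (from the top layer to the bottom layer) can be sketched as below. As before, the record layout with hypothetical `id` and `parent` fields, the mapping from layer number to records, and the function name are assumptions made only for this illustration.

```python
from typing import Callable, Dict, List

Record = Dict[str, object]
FileData = Dict[int, List[Record]]


def _always(record: Record) -> bool:
    """Stand-in for a null filter condition: every record passes."""
    return True


def query_top_down(
    file_data: FileData,
    layer_filters: Dict[int, Callable[[Record], bool]],
    depth: int,
) -> List[List[Record]]:
    """Expand records from layer 1 down to `depth`, filtering at each layer."""
    # Start with every top-layer record that passes the layer-1 filter, if any.
    top_filter = layer_filters.get(1, _always)
    paths = [[rec] for rec in file_data.get(1, []) if top_filter(rec)]
    for layer in range(2, depth + 1):
        keep = layer_filters.get(layer, _always)
        # Attach each child on this layer to the path that ends at its parent.
        paths = [
            path + [child]
            for path in paths
            for child in file_data.get(layer, [])
            if child["parent"] == path[-1]["id"] and keep(child)
        ]
    return paths
```

When the layer-1 filter is null, every top-layer record becomes an intermediate path in this sketch before the lower-layer filter is applied, which is the kind of intermediate result the second-file-to-first-file order is intended to reduce.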

It should be noted that each file in the K files may be queried according to the second order under the condition that the query statement does not satisfy the preset condition; or each file in the K files may be queried according to the second order under the condition that the query statement satisfies the preset condition but the "cross-layer" filter conditions in the query statement are not filter conditions that the data needs to satisfy simultaneously.

The method for querying data according to the embodiment of the present application is described in more detail below with reference to fig. 4. It should be understood that fig. 4 is only for assisting a person skilled in the art in understanding the embodiments of the present application, and is not intended to limit the embodiments of the present application to the specific scenarios illustrated. It will be apparent to those skilled in the art that various equivalent changes or modifications may be made from the example shown in fig. 4, and such changes or modifications also fall within the scope of the embodiments of the present application. It should be noted that, for ease of understanding, in the following description of the method for querying data, the second file is referred to as the new file, and the first file is referred to as the old file.

Fig. 4 is a schematic flow chart of a method of data query according to an embodiment of the present application. The method shown in fig. 4 includes:

410, receiving a query statement, where the query statement is used for querying target data in the WeChat.

Specifically, the query statement contains attributes of the target data and filter conditions that the target data needs to satisfy.

420, dividing the query statement into N sub-statements according to the data hierarchy of the target data, based on the attributes of the target data contained in the query statement and the filter conditions that the target data needs to satisfy.

Specifically, after the query statement Q1 is divided, 2 sub-statements can be obtained, see table 2; and after the query statement Q2 is divided, 2 sub-statements can be obtained, see table 3.

430, it is determined whether the query statement satisfies a preset rule.

Specifically, the preset rule may be that the filter condition in the first sub-statement of the query statement is null, the filter condition in the second sub-statement of the query statement is not null, and the "cross-layer" filter conditions in the query statement are connected by the AND operation.

440, if the query statement meets the preset rule, querying the target data from the files for storing the target data in the WeChat according to the query sequence from the new file to the old file.

450, if the query statement does not meet the preset rule, querying the target data from the files for storing the target data in the WeChat according to the query sequence from the old file to the new file.

460, determining whether there is next target data in the file for storing the target data in the WeChat.

Specifically, if there is next target data in the file for storing the target data in the WeChat, step 470 is executed; and if the file for storing the target data in the WeChat does not have the next target data, finishing the target data query process.

470, according to the filter condition contained in the query statement, selecting the target data meeting the filter condition from the files for storing the target data in the WeChat.

480, it is determined whether the target data meeting the filtering condition is deleted.

Specifically, whether the target data meeting the filter condition has been deleted may be determined from the delete tag of that target data in the delete tag bitmap. If the delete tag value of the target data in the delete tag bitmap is 0, the target data has not been deleted, and step 490 is performed; if the delete tag value of the target data in the delete tag bitmap is 1, the target data has been deleted, the target data is discarded, and step 460 is performed.

490, store the target data meeting the filter criteria into the intermediate result set.

Specifically, if the delete tag value of the target data meeting the filter condition in the delete tag bitmap is 0, it indicates that the target data meeting the filter condition is not deleted and meets the filter condition in the query statement, and step 460 may be executed after the target data meeting the filter condition is placed in the intermediate result set.
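
Steps 460 to 490 for the records of one file can be sketched as a single loop. The sketch assumes that the delete tag bitmap is a list of 0/1 values indexed by record position; the description does not specify the bitmap layout, so this indexing, the function name `collect_results`, and the record representation are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]


def collect_results(
    records: Iterable[Record],
    matches_filter: Callable[[Record], bool],
    delete_bitmap: List[int],
) -> List[Record]:
    """Sketch of steps 460 to 490 for the records of one file.

    delete_bitmap[i] is the delete tag of the record at position i:
    0 means not deleted, 1 means deleted.
    """
    intermediate: List[Record] = []
    for position, record in enumerate(records):  # step 460: next target data?
        if not matches_filter(record):           # step 470: apply the filter
            continue
        if delete_bitmap[position] == 1:         # step 480: deleted, discard
            continue
        intermediate.append(record)              # step 490: store the record
    return intermediate
```

The loop ends when no further target data remains, which corresponds to the end of the query process for that file in step 460.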

Fig. 5 is a schematic block diagram of an apparatus for data query according to an embodiment of the present application. The apparatus 500 shown in fig. 5 comprises: an acquisition unit 510 and a query unit 520.

An obtaining unit 510, configured to obtain a query statement, where the query statement is used to query N layers of data in a data hierarchy, the data hierarchy is a hierarchy for storing data in a data set, and the data set is stored in K files according to a sequence from high to low in a hierarchy in the data hierarchy, where the K files include a first file and a second file, the first file is a file with the earliest creation time among the K files, the second file is a file with the latest creation time among the K files, and N and K are positive integers greater than 1;

a query unit 520, configured to query the K files according to a query sequence from the second file to the first file to obtain target data to be queried when the query statement obtained by the obtaining unit meets a preset condition,

the preset condition includes that a filter condition for querying the data set in a first sub-statement of the query statement is null, a filter condition for querying the data set in a second sub-statement of the query statement is not null, the first sub-statement is used for querying top-level data in the N-level data, and the second sub-statement is used for querying bottom-level data in the N-level data.

In the embodiment of the application, when the first sub-statement and the second sub-statement in the query statement meet the preset condition, the K files can be queried according to the sequence from the second file to the first file, and data is queried from the second file (namely, the new file) by using the filtering condition in the second sub-statement.

Optionally, as an embodiment, the apparatus further includes: a determining unit, configured to divide the query statement into a plurality of sub-statements according to the data hierarchy, based on the attributes of the data to be queried in the data set and the filter conditions for querying the data in the data set that are contained in the query statement.

Optionally, as an embodiment, the query statement includes at least 3 filter conditions, and different filter conditions in the at least 3 filter conditions are used to filter data located at different layers in the data hierarchy, and the query unit is further specifically configured to: and when the query statement meets the preset condition and the at least 3 filtering conditions are the filtering conditions which need to be met by the target data at the same time, querying the K files according to a query sequence from the second file to the first file.

Optionally, as an embodiment, the querying unit is further configured to: when the query statement meets a preset condition, querying the K files according to a query sequence from the second file to the first file, and querying each file in the K files according to a first sequence to obtain the target data, wherein the first sequence is a sequence from a bottom layer to a top layer according to a data hierarchy of the files, and a hierarchy sequence in the data hierarchy of the files is the same as a hierarchy sequence of the data hierarchy of the data set.

Optionally, as an embodiment, the querying unit is further configured to: and if the query statement does not meet the preset condition, querying the K files according to the query sequence from the first file to the second file to obtain the target data to be queried.

Optionally, as an embodiment, the querying unit is further configured to: if the query statement does not meet the preset condition, querying the K files according to a query sequence from the first file to the second file to obtain the target data to be queried, and querying each file of the K files according to a second sequence to obtain the target data, wherein the second sequence is a sequence from the top layer to the bottom layer according to a data hierarchy of the files, and the hierarchy sequence in the data hierarchy of the files is the same as the hierarchy sequence of the data hierarchy of the data set.

Alternatively, the acquiring unit 510 and the querying unit 520 may be processors.

Specifically, fig. 6 is a schematic block diagram of an apparatus for data query according to an embodiment of the present application. The apparatus 600 shown in fig. 6 includes a memory 610, a processor 620, an input/output interface 630, and a communication interface 640. The memory 610, the processor 620, the input/output interface 630 and the communication interface 640 are connected through a bus system 650. The memory 610 is used for storing instructions, and the processor 620 is used for executing the instructions stored in the memory 610, so as to control the input/output interface 630 to receive input data and information and to output data such as operation results, and to control the communication interface 640 to send signals.

The processor 620 is configured to obtain a query statement, where the query statement is used to query N-layer data in a data hierarchy, the data hierarchy is a hierarchy for storing data in a data set, the data set is stored in K files according to a sequence from high to low in a hierarchy in the data hierarchy, the K files include a first file and a second file, the first file is a file with the earliest creation time among the K files, the second file is a file with the latest creation time among the K files, and N and K are positive integers greater than 1. The processor 620 is further configured to query the K files according to a query sequence from the second file to the first file when the obtained query statement meets a preset condition, so as to obtain target data to be queried, where the preset condition includes that a filter condition for querying the data set in a first sub-statement of the query statement is null, a filter condition for querying the data set in a second sub-statement of the query statement is not null, the first sub-statement is used to query top-level data in the N-level data, and the second sub-statement is used to query bottom-level data in the N-level data.

It should be understood that, in the embodiment of the present invention, the processor 620 may adopt a general-purpose Central Processing Unit (CPU), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, for executing related programs to implement the technical solutions provided by the embodiments of the present invention.

It should also be appreciated that the communication interface 640 uses a transceiver apparatus, such as, but not limited to, a transceiver, to enable communication between the apparatus 600 and other devices or communication networks.

The memory 610 may include a read-only memory and a random access memory, and provides instructions and data to the processor 620. A portion of the memory 610 may also include non-volatile random access memory. For example, the memory 610 may also store information of the device type.

The bus system 650 may include a power bus, a control bus, a status signal bus, and the like, in addition to a data bus. For clarity of illustration, the various buses are designated in the figure as the bus system 650.

In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 620 or by instructions in the form of software. The data query method disclosed in the embodiments of the present invention may be directly implemented by a hardware processor, or implemented by a combination of hardware and software modules in the processor. The software module may be located in a storage medium well known in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 610, and the processor 620 reads the information in the memory 610 and performs the steps of the above method in combination with its hardware. To avoid repetition, details are not described here.

It should be understood that in the embodiment of the present application, "B corresponding to A" means that B is associated with A, and that B can be determined from A. It should also be understood that determining B from A does not mean determining B from A alone; B may also be determined from A and/or other information.

It should be understood that the term "and/or" herein merely describes an association relationship between associated objects, and means that three relationships may exist, e.g., A and/or B may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the former and latter associated objects are in an "or" relationship.

It should be understood that, in the various embodiments of the present application, the sequence numbers of the above-mentioned processes do not mean the execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.

In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one logical division, and other divisions may be realized in practice, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.

In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, the embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in accordance with the embodiments of the application are generated in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website, computer, server, or data center to another website, computer, server, or data center in a wired manner (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or a wireless manner (e.g., infrared, radio, microwave). The computer readable storage medium may be any available medium that can be read by a computer, or a data storage device, such as a server or a data center, that integrates one or more available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., Digital Versatile Disc (DVD)), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.

The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can easily conceive of within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method of querying data, comprising:

acquiring a query statement, wherein the query statement is used for querying N layers of data in a data hierarchy structure of a data set, the data hierarchy structure is a hierarchy structure for storing data in the data set, the data set is stored in K files according to a sequence from high to low of hierarchies in the data hierarchy structure, the K files comprise a first file and a second file, the first file is a file with the earliest creation time in the K files, the second file is a file with the latest creation time in the K files, and N and K are positive integers greater than 1;

when the query statement meets a preset condition, querying the K files according to a query sequence from the second file to the first file to obtain target data to be queried, wherein the preset condition includes that a filter condition for querying the data set in a first sub-statement of the query statement is null, a filter condition for querying the data set in a second sub-statement of the query statement is not null, the first sub-statement is used for querying top-layer data in the N-layer data, and the second sub-statement is used for querying bottom-layer data in the N-layer data.

2. The method of claim 1, wherein the method further comprises:

and dividing the query statement into a plurality of sub-statements according to the data hierarchy structure of the data set according to the attribute of querying the data in the data set and the filtering condition for querying the data in the data set, which are contained in the query statement.

3. The method of claim 1 or 2, wherein the query statement contains at least 3 filter conditions, and different filter conditions of the at least 3 filter conditions are used to filter data at different levels in the data hierarchy,

when the query statement meets a preset condition, querying the K files according to a query sequence from the second file to the first file to obtain target data to be queried, including:

and when the query statement meets the preset condition and the at least 3 filtering conditions are the filtering conditions which need to be met by the target data at the same time, querying the K files according to a query sequence from the second file to the first file to obtain the target data to be queried.

4. The method according to any one of claims 1-3, wherein the querying the K files to obtain target data to be queried in the query order from the second file to the first file comprises:

when the query statement meets a preset condition, querying the K files according to a query sequence from the second file to the first file to obtain target data to be queried, and querying each file in the K files according to a first sequence to obtain the target data, wherein the first sequence is a sequence from the bottom layer to the top layer according to a data hierarchy of the files, and the hierarchy sequence in the data hierarchy of the files is the same as the hierarchy sequence of the data hierarchy of the data set.

5. The method of any one of claims 1-4, further comprising:

and when the query statement does not meet the preset condition, querying the K files according to the query sequence from the first file to the second file to obtain the target data to be queried.

6. The method of claim 5, wherein when the query statement does not satisfy the preset condition, querying the K files in a query order from the first file to the second file to obtain target data to be queried, comprises:

when the query statement does not meet the preset condition, querying the K files according to a query sequence from the first file to the second file to obtain the target data to be queried, and querying each file in the K files according to a second sequence to obtain the target data, wherein the second sequence is a sequence from the top layer to the bottom layer according to a data hierarchy of the files, and the hierarchical sequence in the data hierarchy of the files is the same as the hierarchical sequence of the data hierarchy of the data set.

7. An apparatus for querying data, comprising:

an obtaining unit, configured to obtain an inquiry statement, where the inquiry statement is used to inquire data in N layers in a data hierarchy of a data set, the data hierarchy is a hierarchy for storing data in the data set, the data set is stored in K files according to a sequence from high to low in a hierarchy in the data hierarchy, the K files include a first file and a second file, the first file is a file with the earliest creation time among the K files, the second file is a file with the latest creation time among the K files, and N and K are positive integers greater than 1;

and a query unit, configured to query the K files according to a query sequence from the second file to the first file when the query statement acquired by the acquisition unit satisfies a preset condition, to obtain target data to be queried, where the preset condition includes that a filter condition for querying the data set in a first sub-statement of the query statement is null, a filter condition for querying the data set in a second sub-statement of the query statement is not null, the first sub-statement is used to query top-level data in the N-level data, and the second sub-statement is used to query bottom-level data in the N-level data.

8. The apparatus of claim 7, wherein the apparatus further comprises:

and the determining unit is used for dividing the query statement into a plurality of sub-statements according to the data hierarchy structure of the data set according to the attribute of the data in the data set, which is queried in the query statement, and the filtering condition for querying the data in the data set.

9. The apparatus according to claim 7 or 8, wherein the query statement comprises at least 3 filter conditions, and wherein different filter conditions of the at least 3 filter conditions are used for filtering data at different levels in the data hierarchy, the query unit being further configured to:

and when the query statement meets the preset condition and the at least 3 filtering conditions are the filtering conditions which need to be met by the target data at the same time, querying the K files according to a query sequence from the second file to the first file.

10. The apparatus of any of claims 7-9, wherein the query unit is further configured to:

when the query statement meets a preset condition, querying the K files according to a query sequence from the second file to the first file, and querying each file in the K files according to a first sequence to obtain the target data, wherein the first sequence is a sequence from a bottom layer to a top layer according to a data hierarchy of the files, and a hierarchy sequence in the data hierarchy of the files is the same as a hierarchy sequence of the data hierarchy of the data set.

11. The apparatus of any of claims 7-10, wherein the query unit is further configured to:

and if the query statement does not meet the preset condition, querying the K files according to the query sequence from the first file to the second file to obtain the target data to be queried.

12. The apparatus of claim 11, wherein the query unit is further configured to:

if the query statement does not meet the preset condition, querying the K files according to a query sequence from the first file to the second file to obtain the target data to be queried, and querying each file of the K files according to a second sequence to obtain the target data, wherein the second sequence is a sequence from the top layer to the bottom layer according to a data hierarchy of the files, and the hierarchy sequence in the data hierarchy of the files is the same as the hierarchy sequence of the data hierarchy of the data set.