CN114003616A - Data query method, device, equipment and readable storage medium - Google Patents

Data query method, device, equipment and readable storage medium

Info

Publication number
CN114003616A
Authority
CN
China
Prior art keywords
query
data
condition
tree
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010742022.0A
Other languages
Chinese (zh)
Inventor
张正武
李永光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihu Technology Service Co ltd
Original Assignee
Beijing Qihu Technology Service Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihu Technology Service Co ltd filed Critical Beijing Qihu Technology Service Co ltd
Priority to CN202010742022.0A priority Critical patent/CN114003616A/en
Publication of CN114003616A publication Critical patent/CN114003616A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/242 Query formulation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2453 Query optimisation
    • G06F16/24532 Query optimisation of parallel queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24552 Database cache management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data query method, apparatus, device and readable storage medium. The method comprises the following steps: constructing a query condition and determining the number of concurrent nodes; querying, according to the query condition and the number of concurrent nodes, the data layer corresponding to the query condition in a tree data structure to obtain a query result; and updating the query condition and the number of concurrent nodes according to the query result, then repeating the querying step with the updated query condition and number of concurrent nodes until all data layers in the tree data structure have been queried and a final query result is obtained. By updating the query condition and the number of concurrent nodes, the invention queries the data layers of the tree data structure layer by layer, and within each data layer queries a plurality of data nodes in parallel, which improves data query efficiency and facilitates rapid query and analysis of data.

Description

Data query method, device, equipment and readable storage medium
Technical Field
The present invention relates to the field of data analysis, and in particular, to a data query method, apparatus, device, and readable storage medium.
Background
With the development of information technology, industries pay increasing attention to processing information as data and indicators: information of various types is queried and analyzed in data form to obtain indicators across various dimensions, which together reflect how each industry is developing.
The volume of data underlying query and analysis is usually huge, so query efficiency inevitably affects the analysis: the higher the query efficiency, the more it benefits the analysis, and the lower the efficiency, the more it hinders the analysis. How to improve data query efficiency is therefore a technical problem that urgently needs to be solved.
Disclosure of Invention
The main object of the invention is to provide a data query method, apparatus, device and readable storage medium that solve the technical problem of how to improve data query efficiency in the prior art.
In order to achieve the above object, the present invention provides a data query method, including the steps of:
constructing a query condition and determining the number of concurrent nodes;
querying a data layer corresponding to the query condition in the tree-shaped data structure according to the query condition and the number of the concurrent nodes to obtain a query result;
and updating the query condition and the number of the concurrent nodes according to the query result, and, based on the updated query condition and number of concurrent nodes, executing again the step of querying the data layer corresponding to the query condition in the tree-shaped data structure, until all data layers in the tree-shaped data structure have been queried and a final query result is obtained.
Optionally, the step of constructing a query condition and determining the number of concurrent nodes includes:
judging whether a cache data layer exists, and if so, constructing a query condition according to the cache data layer and determining the number of concurrent nodes;
and if the cache data layer does not exist, constructing a query condition according to the received dimension key words and determining the number of concurrent nodes.
Optionally, the step of constructing a query condition according to the cache data layer and determining the number of concurrent nodes includes:
acquiring a target data layer positioned at the next level of the cache data layer in the tree-shaped data structure and the data volume of a cache result corresponding to the cache data layer;
and constructing the query condition according to the target data layer, and determining the number of the concurrent nodes according to the total number to be queried and the data volume.
Optionally, before the step of obtaining a target data layer located at the next level below the cache data layer in the tree data structure, the method further includes:
judging whether every data layer in the tree-shaped data structure carries a query identifier, and if so, taking the cache data corresponding to the cache data layer as the final query result and ending the query;
and if not every data layer carries a query identifier, executing the step of obtaining a target data layer located at the next level below the cache data layer in the tree-shaped data structure.
Optionally, the step of constructing a query condition according to the received dimension key words and determining the number of concurrent nodes includes:
determining a heuristic level according to the number of the received dimension key words, and constructing a query condition according to the dimension key words and the heuristic level;
initiating a heuristic query to the tree-shaped data structure according to the query condition and the heuristic hierarchy to obtain a heuristic query result;
and determining the number of the concurrent nodes according to the total number to be queried and the result number of the heuristic query result.
Optionally, after the step of querying a data layer corresponding to the query condition in the tree data structure to obtain a query result, the method further includes:
caching, as a key-value pair, the data layer corresponding to the query condition in the tree-shaped data structure and the query result.
Optionally, before the step of updating the query condition and the number of concurrent nodes according to the query result, the method further includes:
judging whether the amount of cached data corresponding to the cached query result is greater than or equal to the total number to be queried, and if so, taking the cached query result as the final query result and ending the query;
and if the amount of cached data is smaller than the total number to be queried, executing the step of updating the query condition and the number of the concurrent nodes according to the query result.
Optionally, after the step of querying a data layer corresponding to the query condition in the tree data structure to obtain a query result, the method further includes:
adding a query identifier to the data layer corresponding to the query condition in the tree-like data structure.
Optionally, after the step of querying a data layer corresponding to the query condition in the tree data structure to obtain a query result, the method further includes:
adding a result identifier to the query result according to the query condition and the dimension keyword and query keyword corresponding to the query condition.
Optionally, after the step of querying a data layer corresponding to the query condition in the tree data structure to obtain a query result, the method further includes:
adding a count identifier to the data layer corresponding to the query condition in the tree-like data structure.
Optionally, before the step of constructing a query condition and determining the number of concurrent nodes, the method further includes:
acquiring a plurality of items of data that support querying, and cleaning the plurality of items of data to generate valid data;
and constructing the valid data into the tree data structure.
Further, to achieve the above object, the present invention also provides a data query apparatus, including:
the construction module is used for constructing query conditions and determining the number of concurrent nodes;
the query module is used for querying a data layer corresponding to the query condition in the tree-shaped data structure according to the query condition and the number of the concurrent nodes to obtain a query result;
and the updating module is used for updating the query condition and the number of the concurrent nodes according to the query result, and, based on the updated query condition and number of concurrent nodes, executing again the step of querying the data layer corresponding to the query condition in the tree-shaped data structure, until all the data layers in the tree-shaped data structure have been queried and the final query result is obtained.
Optionally, the building module further comprises:
the judging unit is used for judging whether a cache data layer exists, and if so, constructing a query condition according to the cache data layer and determining the number of concurrent nodes;
and the constructing unit is used for constructing a query condition and determining the number of concurrent nodes according to the received dimension key words if the cache data layer does not exist.
Optionally, the determining unit is further configured to:
acquiring a target data layer positioned at the next level of the cache data layer in the tree-shaped data structure and the data volume of a cache result corresponding to the cache data layer;
and constructing the query condition according to the target data layer, and determining the number of the concurrent nodes according to the total number to be queried and the data volume.
Optionally, the determining unit is further configured to:
judging whether each data layer in the tree-shaped data structure carries a query identifier, if so, taking the cache data corresponding to the cache data layer as a final query result, and finishing the query;
and if the query identifiers are not carried, executing the step of acquiring a target data layer positioned at the next level of the cache data layer in the tree-shaped data structure.
Optionally, the construction unit is further configured to:
determining a heuristic level according to the number of the received dimension key words, and constructing a query condition according to the dimension key words and the heuristic level;
initiating a heuristic query to the tree-shaped data structure according to the query condition and the heuristic hierarchy to obtain a heuristic query result;
and determining the number of the concurrent nodes according to the total number to be queried and the result number of the heuristic query result.
Optionally, the data query apparatus further includes:
and the cache module is used for caching the data layer corresponding to the query condition in the tree-shaped data structure and the query result in a key-value pair mode.
Optionally, the data query apparatus further includes:
the judging module is used for judging whether the amount of cached data corresponding to the cached query result is greater than or equal to the total number to be queried, and if so, taking the cached query result as the final query result and ending the query;
and the execution module is used for updating the query condition and the number of the concurrent nodes according to the query result if the cache data volume is less than the total number to be queried.
Further, to achieve the above object, the present invention also provides a data query device, which includes a memory, a processor, and a data query program stored in the memory and operable on the processor, wherein the data query program, when executed by the processor, implements the steps of the data query method as described above.
Further, to achieve the above object, the present invention also provides a readable storage medium, on which a data query program is stored, and the data query program, when executed by a processor, implements the steps of the data query method as described above.
According to the data query method, apparatus, device and readable storage medium of the invention, a query condition is first constructed and the number of concurrent nodes is determined; the data layer corresponding to the query condition in the tree-shaped data structure is then queried according to the query condition and the number of concurrent nodes to obtain a query result; the query condition and the number of concurrent nodes are then updated according to the query result, and the data layer corresponding to the updated query condition is queried again. This update-and-query cycle repeats until all data layers in the tree-shaped data structure have been queried and the final query result is obtained. By updating the query condition and the number of concurrent nodes, the data layers of the tree-shaped data structure are queried layer by layer, and within each data layer a plurality of data nodes are queried in parallel, which improves data query efficiency and facilitates rapid query and analysis of data.
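The cycle summarized above can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation claimed here: the tree is stored as a hypothetical list of layers (`LAYERS`), `fetch` stands in for a real backend lookup, and `pick_concurrency` is an assumed rule, since the text only says that the number of concurrent nodes depends on the total number to be queried and the amount of data already obtained.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical tree stored layer by layer: node name -> indicator value.
LAYERS = [
    {"x1": 100, "x2": 200},
    {"y1": 50, "y2": 70, "y3": 80},
]

def fetch(layer_index, name):
    """Stand-in for querying one node of one data layer."""
    return name, LAYERS[layer_index][name]

def pick_concurrency(total, cached, layer_size):
    # Assumed rule: no more workers than the remaining demand or the layer size.
    return max(1, min(total - cached, layer_size))

def run_query(total):
    """Query layer by layer until `total` results are cached or no layer is left."""
    cache = {}  # data-layer index -> query result of that layer
    for layer_index, layer in enumerate(LAYERS):
        cached = sum(len(result) for result in cache.values())
        if cached >= total:
            break  # the cached results already satisfy the request
        workers = pick_concurrency(total, cached, len(layer))
        with ThreadPoolExecutor(max_workers=workers) as pool:
            pairs = pool.map(lambda name: fetch(layer_index, name), layer)
            cache[layer_index] = dict(pairs)
    return cache
```

For example, `run_query(2)` stops after the first layer, while `run_query(5)` also queries the second layer.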
Drawings
FIG. 1 is a schematic structural diagram of a hardware operating environment according to an embodiment of the data query apparatus of the present invention;
FIG. 2 is a flowchart illustrating a first embodiment of a data query method according to the present invention;
FIG. 3 is a functional block diagram of a data query device according to a preferred embodiment of the present invention;
FIG. 4 is a schematic diagram of a tree data structure in the data query method of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides a data query device, and referring to fig. 1, fig. 1 is a schematic structural diagram of a hardware operating environment according to an embodiment of the data query device of the invention.
As shown in fig. 1, the data query device may include: a processor 1001, such as a CPU, a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connection and communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may optionally also be a storage device separate from the processor 1001 described above.
Those skilled in the art will appreciate that the hardware configuration of the data query device shown in FIG. 1 does not constitute a limitation of the data query device, and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a readable storage medium, may include therein an operating system, a network communication module, a user interface module, and a data query program. The operating system is a program for managing and controlling hardware and software resources of the data query device and supports the operation of a network communication module, a user interface module, a data query program and other programs or software; the network communication module is used to manage and control the network interface 1004; the user interface module is used to manage and control the user interface 1003.
In the hardware structure of the data query device shown in fig. 1, the network interface 1004 is mainly used for connecting to a background server and performing data communication with the background server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; the processor 1001 may call a data query program stored in the memory 1005 and perform the following operations:
constructing a query condition and determining the number of concurrent nodes;
querying a data layer corresponding to the query condition in the tree-shaped data structure according to the query condition and the number of the concurrent nodes to obtain a query result;
and updating the query condition and the number of the concurrent nodes according to the query result, and, based on the updated query condition and number of concurrent nodes, executing again the step of querying the data layer corresponding to the query condition in the tree-shaped data structure, until all data layers in the tree-shaped data structure have been queried and a final query result is obtained.
Further, the step of constructing a query condition and determining the number of concurrent nodes comprises:
judging whether a cache data layer exists or not, if so, establishing a query condition according to the cache data layer, and determining the number of concurrent nodes;
and if the cache data layer does not exist, constructing a query condition according to the received dimension key words and determining the number of concurrent nodes.
Further, the steps of constructing a query condition according to the cache data layer and determining the number of concurrent nodes include:
acquiring a target data layer positioned at the next level of the cache data layer in the tree-shaped data structure and the data volume of a cache result corresponding to the cache data layer;
and constructing the query condition according to the target data layer, and determining the number of the concurrent nodes according to the total number to be queried and the data volume.
Further, before the step of obtaining the target data layer located at the next level of the cache data layer in the tree data structure, the processor 1001 may call a data query program stored in the memory 1005, and perform the following operations:
judging whether each data layer in the tree-shaped data structure carries a query identifier, if so, taking the cache data corresponding to the cache data layer as a final query result, and finishing the query;
and if the query identifiers are not carried, executing the step of acquiring a target data layer positioned at the next level of the cache data layer in the tree-shaped data structure.
Further, the steps of constructing a query condition according to the received dimension key words and determining the number of concurrent nodes include:
determining a heuristic level according to the number of the received dimension key words, and constructing a query condition according to the dimension key words and the heuristic level;
initiating a heuristic query to the tree-shaped data structure according to the query condition and the heuristic hierarchy to obtain a heuristic query result;
and determining the number of the concurrent nodes according to the total number to be queried and the result number of the heuristic query result.
Further, after the step of querying the data layer corresponding to the query condition in the tree data structure to obtain the query result, the processor 1001 may call the data query program stored in the memory 1005, and perform the following operations:
and caching the data layer corresponding to the query condition in the tree-shaped data structure and the query result in a key-value pair mode.
Further, before the step of updating the query condition and the number of concurrent nodes according to the query result, the processor 1001 may call a data query program stored in the memory 1005, and perform the following operations:
judging whether the amount of cached data corresponding to the cached query result is greater than or equal to the total number to be queried, and if so, taking the cached query result as the final query result and ending the query;
and if the cache data volume is smaller than the total number to be queried, executing the step of updating the query condition and the number of the concurrent nodes according to the query result.
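The key-value caching and the early-termination check described above can be illustrated with a minimal sketch. It assumes an in-memory dictionary keyed by data-layer index, and it assumes that the amount of cached data is simply the number of cached result entries; the source does not define either, and `cache_layer`, `cached_amount` and `is_complete` are hypothetical names.

```python
# Hypothetical in-memory cache: data-layer index -> query result of that layer.
cache = {}

def cache_layer(layer_index, result):
    """Store one layer's query result as a key-value pair."""
    cache[layer_index] = result

def cached_amount():
    """Assumed measure of the cached data amount: total number of cached entries."""
    return sum(len(result) for result in cache.values())

def is_complete(total_to_query):
    """True when the cached results already cover the total number to be queried."""
    return cached_amount() >= total_to_query

# Cache the first layer of fig. 4 with made-up indicator values.
cache_layer(1, {"x1": 100, "x2": 200})
```

With two entries cached, `is_complete(2)` holds and the query can end, whereas `is_complete(3)` does not and the next layer would be queried.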
Further, after the step of querying the data layer corresponding to the query condition in the tree data structure to obtain the query result, the processor 1001 may call the data query program stored in the memory 1005, and perform the following operations:
and adding query identification to the data layer corresponding to the query condition in the tree-like data structure.
Further, after the step of querying the data layer corresponding to the query condition in the tree data structure to obtain the query result, the processor 1001 may call the data query program stored in the memory 1005, and perform the following operations:
and adding result identification to the query result according to the query condition, and the dimension key words and the query key words corresponding to the query condition.
Further, after the step of querying the data layer corresponding to the query condition in the tree data structure to obtain the query result, the processor 1001 may call the data query program stored in the memory 1005, and perform the following operations:
and adding a counting identifier to a data layer corresponding to the query condition in the tree-like data structure.
Further, before the step of constructing the query condition and determining the number of concurrent nodes, the processor 1001 may call the data query program stored in the memory 1005 and perform the following operations:
acquiring a plurality of items of data that support querying, and cleaning the plurality of items of data to generate valid data;
and constructing the valid data into the tree data structure.
The specific implementation of the data query device of the present invention is basically the same as the following embodiments of the data query method, and is not described herein again.
The invention also provides a data query method.
Referring to fig. 2, fig. 2 is a flowchart illustrating a data query method according to a first embodiment of the present invention.
While a logical order is shown in the flow chart, in some cases, the steps shown or described may be performed in an order different than presented herein. Specifically, in the data query method of this embodiment, the data query method includes:
Step S10, constructing a query condition and determining the number of concurrent nodes;
The data query method of this embodiment is applied to a data analysis system: query conditions are constructed through the data analysis system, and the data in the system is queried according to them. The analysis system may contain data indicators for a single aspect only, such as the sales data and growth data of one enterprise, or the gender data and employment data of the population of one region; or it may contain data indicators for several aspects, such as the sales data and growth data of several enterprises, or the gender data and employment data of the populations of several regions. Whether there is one aspect or several, the data of each aspect is organized as one tree data structure. The tree data structure comprises nodes at a plurality of levels, and each level of nodes forms one dimension of the aspect that the tree represents, such as the level formed by the departments of an enterprise, or the level formed by the cities and counties of a region. This embodiment constructs query conditions from these dimensions and determines the number of concurrent nodes, so that the tree data structure is queried layer by layer. The number of concurrent nodes is the number of nodes queried simultaneously on the tree data structure.
Further, before the step of constructing the query condition and determining the number of concurrent nodes, the method further comprises:
Step a1, acquiring a plurality of items of data that support querying, and cleaning the plurality of items of data to generate valid data;
Step a2, constructing the valid data into the tree data structure.
Understandably, each industry holds a large amount of data for each indicator, and this data includes both data that supports querying and data that does not. The data that supports querying is built into a tree data structure so that it can be queried; data that does not support querying may or may not be added to the tree data structure. For non-queryable data that is added to the tree data structure, different query permissions can be set, so that queries against it are restricted by those permissions.
Further, the data that supports querying may contain noisy data, such as duplicate data and invalid data that is difficult to identify. After the plurality of items of query-supporting data are acquired, the noisy data is identified and removed from them; the data is thereby cleaned into valid data, and the valid data is built into a tree data structure. The tree data structure contains all levels: during construction, the level to which each item of valid data belongs is identified first, and the item is then added to that level. For a specific tree data structure, refer to fig. 4, in which root is the root node of the tree data structure, x1 and x2 are nodes of the first level, and y1, y2 and y3 are nodes of the second level. Each level of nodes represents a dimension, and each dimension has its own data indicators; for example, x1 and x2 represent province-level regions, each with its own employed-population indicator, while y1, y2 and y3 represent city-level regions within province x2, each likewise with its own employed-population indicator.
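The cleaning and construction steps can be sketched as follows, using the node names of fig. 4. This is an illustrative sketch only: the record format (a path tuple plus a value), the cleaning rule (drop duplicates and records with a missing or negative value) and the indicator values are assumptions, and `Node`, `clean`, `build_tree` and `layer` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

@dataclass
class Node:
    name: str
    value: Optional[int] = None  # data indicator, e.g. employed population
    children: Dict[str, "Node"] = field(default_factory=dict)

def clean(records):
    """Assumed cleaning rule: drop duplicate paths and missing or negative values."""
    seen = set()
    valid = []
    for path, value in records:
        if value is None or value < 0 or path in seen:
            continue
        seen.add(path)
        valid.append((path, value))
    return valid

def build_tree(records):
    """Place each valid record at the level given by the length of its path."""
    root = Node("root")
    for path, value in sorted(clean(records), key=lambda r: len(r[0])):
        node = root
        for part in path:
            node = node.children.setdefault(part, Node(part))
        node.value = value
    return root

def layer(root, depth):
    """Return the nodes at the given depth (the root is depth 0)."""
    nodes = [root]
    for _ in range(depth):
        nodes = [child for n in nodes for child in n.children.values()]
    return nodes

records = [
    (("x1",), 100),
    (("x2",), 200),
    (("x2",), 200),        # duplicate, removed by cleaning
    (("x2", "y1"), 50),
    (("x2", "y2"), 70),
    (("x2", "y3"), None),  # invalid, removed by cleaning
    (("x2", "y3"), 80),
]
tree = build_tree(records)
```

Here `layer(tree, 1)` yields the province-level nodes x1 and x2, and `layer(tree, 2)` yields the city-level nodes y1, y2 and y3 under x2.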
Step S20, querying, according to the query condition and the number of concurrent nodes, the data layer corresponding to the query condition in the tree-like data structure to obtain a query result;
Further, after the query condition is constructed and the number of concurrent nodes is determined, the data layer corresponding to the query condition in the tree-shaped data structure is queried according to the query condition and the number of concurrent nodes to obtain a query result. The query condition varies with the dimension to be queried, so that the tree-shaped data structure is queried level by level. If the first level in fig. 4 is to be queried, the query condition contains keywords of the first-level dimension, such as province-level keywords. The keywords in a query condition can be divided into basic keywords and screening keywords. If the query does not distinguish between specific provinces, the province-level keyword serves as the basic keyword; if the query targets specific provinces, a screening keyword for those provinces is formed in addition to the basic keyword, and the data of the specific provinces is screened out of the data returned for the basic keyword to serve as the query result. Meanwhile, within the data layer corresponding to the query condition, a plurality of nodes on that layer are queried simultaneously according to the number of concurrent nodes. If the data layer is the first layer and the number of concurrent nodes is 2, x1 and x2 are queried simultaneously to obtain the query result.
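A minimal sketch of querying one data layer with a given number of concurrent nodes, including the basic and screening keywords, is shown here. It assumes a thread pool as the concurrency mechanism, which the source does not specify; `TREE_LAYERS`, `query_node` and `query_layer` are hypothetical names, and the indicator values are made up.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical layers keyed by dimension keyword: node name -> indicator value.
TREE_LAYERS = {
    "province": {"x1": 100, "x2": 200},
    "city": {"y1": 50, "y2": 70, "y3": 80},
}

def query_node(basic, name):
    """Stand-in for a backend lookup of one node on the layer named by `basic`."""
    return name, TREE_LAYERS[basic][name]

def query_layer(basic, concurrency, screen=None):
    """Query all nodes of one layer, at most `concurrency` at a time.

    `basic` is the basic keyword selecting the layer; `screen` is an optional
    set of node names playing the role of the screening keyword.
    """
    names = list(TREE_LAYERS[basic])
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = dict(pool.map(lambda n: query_node(basic, n), names))
    if screen is not None:
        # Screen the specific nodes out of the data returned for the basic keyword.
        results = {k: v for k, v in results.items() if k in screen}
    return results
```

Calling `query_layer("province", 2)` queries x1 and x2 together, and `query_layer("province", 2, screen={"x2"})` keeps only the data of x2.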
Understandably, under a layer-by-layer query mechanism, after the first query some nodes in the tree-shaped data structure have been queried and others have not; moreover, the results of different queries differ. To tell these apart, this embodiment adds identifiers after each layer is queried. Specifically, after the step of querying the data layer corresponding to the query condition in the tree-like data structure to obtain the query result, the method further includes:
Step b1, adding a query identifier to the data layer corresponding to the query condition in the tree data structure.
Step b2, adding result identification to the query result according to the query condition, and the dimension key words and the query key words corresponding to the query condition.
Step b3, adding counting identification to the data layer corresponding to the query condition in the tree data structure.
Specifically, after the data layer corresponding to the query condition in the tree-shaped data structure has been queried and a query result obtained, a query identifier is added to that data layer to mark it as queried and to distinguish it from data layers that have not yet been queried. Meanwhile, a result identifier is added to the query result according to the query condition of the current query and the dimension keywords and query keywords corresponding to that condition, so that query results can be told apart. The dimension keywords corresponding to the query condition characterize the data layer targeted by the current query, such as the y1 layer in FIG. 4; the query keywords characterize the data indicator targeted by the current query, such as sales volume.
Further, to ensure that every node of every data layer is queried, this embodiment adds a count identifier in addition to the query identifier. The count identifier reflects the query status of the node corresponding to the data layer and of its child nodes. When only the node itself has been queried, a count identifier with the value 1 is added to it; when all child nodes under the node have also been queried, a count identifier whose value reflects the number of child nodes is added to the node. Taking node x2 in FIG. 4 as an example, if only x2 has been queried, its count is 1; if all of its child nodes y1, y2 and y3 have also been queried, the count of x2 is 4 and the counts of y1, y2 and y3 are each 1.
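The count identifier in the x2 example can be reproduced with a short sketch. The function name `count_identifier` and the dictionaries are hypothetical. The text only states the two cases "node alone" and "all child nodes queried"; treating a partially queried set of children as 1 plus the number of queried children, and an unqueried node as 0, are assumptions made here for completeness.

```python
def count_identifier(node, children, queried):
    """Count identifier of `node`.

    Assumption: 0 if the node itself has not been queried, otherwise
    1 plus the number of its direct child nodes that have been queried.
    """
    if node not in queried:
        return 0
    return 1 + sum(1 for child in children.get(node, ()) if child in queried)


# Hypothetical fragment of the tree in FIG. 4: x2 has children y1, y2, y3.
children = {"x2": ["y1", "y2", "y3"]}

only_x2 = count_identifier("x2", children, {"x2"})
x2_full = count_identifier("x2", children, {"x2", "y1", "y2", "y3"})
y1_full = count_identifier("y1", children, {"x2", "y1", "y2", "y3"})
```

With these inputs `only_x2` is 1, `x2_full` is 4 and `y1_full` is 1, matching the values given for FIG. 4.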
Step S30, updating the query condition and the number of concurrent nodes according to the query result, and, based on the updated query condition and number of concurrent nodes, executing again the step of querying the data layer corresponding to the query condition in the tree-shaped data structure, until every data layer in the tree-shaped data structure has been queried and the final query result is obtained.
Furthermore, after a query result is obtained, the query condition and the number of concurrent nodes are updated according to it. First, it is judged whether the data volume of the query result meets the data volume required by the query. If it does, the query stops. If it does not, an unqueried data layer in the tree-shaped data structure is determined, the dimension corresponding to that layer is added to the query condition, and the unqueried data layer is then queried with the updated query condition. Meanwhile, the data volume of subsequent queries is estimated from the data volume of the query result, and the difference between the data volume of the query result and the required data volume is determined; the number of concurrent nodes is updated according to this difference and the estimated data volume, so that several unqueried nodes in the tree-shaped data structure are queried simultaneously with the updated number of concurrent nodes.
Further, after the query condition and the number of concurrent nodes are updated, the data layer corresponding to the updated query condition in the tree-shaped data structure is queried; that is, the unqueried nodes characterized by the updated query condition are queried simultaneously, as many at a time as the number of concurrent nodes allows. The query condition and the number of concurrent nodes are then updated again according to the new query result, the query continues on that basis, and the cycle repeats until all nodes have been queried. Once a query identifier has been added to the nodes of every data layer in the tree-shaped data structure, that is, once every node has been queried, the query stops. The query results obtained so far are then taken as the final query result, completing the layer-by-layer parallel query of the tree-shaped data structure.
In the data query method of the invention, a query condition is first constructed and the number of concurrent nodes is determined; the data layer corresponding to the query condition in the tree-shaped data structure is then queried according to the query condition and the number of concurrent nodes to obtain a query result; the query condition and the number of concurrent nodes are then updated according to the query result, and the corresponding data layer is queried again on that basis, the update and query being repeated until all data layers in the tree-shaped data structure have been queried and the final query result is obtained. By updating the query condition and the number of concurrent nodes, the data layers are queried layer by layer, and within each data layer several data nodes are queried in parallel, which improves query efficiency and facilitates rapid query and analysis of the data.
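The overall loop of the first embodiment can be sketched as below, under several assumptions. The tree is represented as a hypothetical list of layers, each a list of node names; `fetch` is a stand-in for the per-node query; a set of layer indices plays the role of the query identifiers. The first embodiment does not give a formula for updating the number of concurrent nodes, so the update used here (remaining amount divided by the average rows per node, rounded up) is borrowed from the second embodiment and is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from math import ceil


def layered_query(layers, fetch, total_needed, concurrent_nodes=1):
    """Query a tree layer by layer until enough data is found or all layers are done."""
    results = []
    queried_layers = set()  # stands in for the query identifiers
    for depth, nodes in enumerate(layers):
        # Query the nodes of this layer, at most `concurrent_nodes` at a time.
        with ThreadPoolExecutor(max_workers=max(1, concurrent_nodes)) as pool:
            layer_rows = [row for rows in pool.map(fetch, nodes) for row in rows]
        results.extend(layer_rows)
        queried_layers.add(depth)
        if len(results) >= total_needed:
            break  # the required data volume is met, so the query stops
        if nodes and layer_rows:
            # Assumed update rule: remaining amount / average rows per node.
            per_node = len(layer_rows) / len(nodes)
            concurrent_nodes = ceil((total_needed - len(results)) / per_node)
    return results, queried_layers


# Hypothetical two-layer tree; every node returns two rows.
TREE = [["x1", "x2"], ["y1", "y2", "y3"]]


def fetch(node):
    return [f"{node}-row1", f"{node}-row2"]


rows_all, flags_all = layered_query(TREE, fetch, total_needed=100)
rows_few, flags_few = layered_query(TREE, fetch, total_needed=3)
```

In the first call the requirement is never met, so both layers are queried; in the second call the first layer already returns enough rows and the loop stops early.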
Further, based on the first embodiment of the data query method of the present invention, a second embodiment of the data query method of the present invention is provided.
The second embodiment of the data query method is different from the first embodiment of the data query method in that the step of constructing query conditions and determining the number of concurrent nodes comprises the following steps:
Step S11, judging whether a cache data layer exists, and if so, constructing a query condition according to the cache data layer and determining the number of concurrent nodes;
This embodiment constructs the query condition differently depending on whether cached data from an earlier query exists, and determines the number of concurrent nodes from the query results already obtained. Specifically, when a query is required, dimension keywords representing the query requirement are entered into the query box of the analysis system; the analysis system constructs a query condition from the dimension keywords, performs the query to obtain a query result, and caches the query result together with information on the data layer targeted by the query. Thereafter, the query condition is updated on the basis of the cache to continue querying. The query condition of each query therefore depends on whether a cache exists. Accordingly, when the query condition is constructed, it is first judged whether a cache data layer exists; the cache data layer includes the query results obtained so far, the data layer targeted by the last query, and the like. If a cache data layer exists in the analysis system, the query condition is updated and the number of concurrent nodes is determined according to the cache data layer, the updated query condition being the one used for the subsequent query. Specifically, the step of constructing a query condition according to the cache data layer and determining the number of concurrent nodes includes:
Step S111, obtaining a target data layer located at the next level of the cache data layer in the tree-shaped data structure, and the data volume of the cache result corresponding to the cache data layer;
Step S112, constructing the query condition according to the target data layer, and determining the number of concurrent nodes according to the total number to be queried and the data volume.
Furthermore, the cache data layer includes the data layer targeted by the last query; accordingly, the data layer located at the next level below it in the tree-shaped data structure is obtained and used as the target data layer. Meanwhile, the query result contained in the cache data layer is taken as the cache result corresponding to the cache data layer, and its data volume is counted to give the data volume of the cache result. A query condition is then constructed according to the target data layer: the keywords characterizing the dimension of the target data layer are added to the query condition of the last query to obtain a new query condition.
In addition, the number of concurrent nodes is determined according to the total number to be queried and the data volume. The total number to be queried is the amount of data the query requires, and the data volume is the amount obtained by the last query. The data volume obtained by the last query is added to the cached data volume obtained by earlier queries to give the queried data volume. The difference between the total number to be queried and the queried data volume is the amount of data still to be queried. The number of nodes to query next, that is, the number of concurrent nodes, is then determined from the ratio between the amount still to be queried and the amount of data obtainable per node, the latter being derived from the queried data volume. In one embodiment, if the number of queried nodes is p, the queried data volume is R, and the total number to be queried is D, then the number m of concurrent nodes for the next simultaneous query is m = (D - R)/(R/p). If the calculated m is not an integer, it is rounded up to the next integer; for example, if m is calculated to be 2.2, m is taken as 3.
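The formula m = (D - R)/(R/p) with rounding up can be written as a small function. The name `next_concurrent_nodes` and its parameter names are hypothetical; raising an error when nothing has been queried yet and clamping the result at zero are assumptions added so that the sketch is well defined for all inputs.

```python
from math import ceil


def next_concurrent_nodes(total_needed, queried_amount, queried_nodes):
    """Number of nodes to query next: m = ceil((D - R) / (R / p)).

    total_needed   -- D, the total number to be queried
    queried_amount -- R, the data volume obtained so far
    queried_nodes  -- p, the number of nodes queried so far
    """
    if queried_amount <= 0 or queried_nodes <= 0:
        # Assumption: without earlier results there is no per-node average.
        raise ValueError("no earlier query results to estimate from")
    per_node = queried_amount / queried_nodes   # R / p
    remaining = total_needed - queried_amount   # D - R
    return max(0, ceil(remaining / per_node))


# D = 160, R = 50, p = 1 gives (160 - 50) / 50 = 2.2, rounded up to 3.
m_example = next_concurrent_nodes(160, 50, 1)
```

The values D = 160, R = 50 and p = 1 are chosen only to reproduce the 2.2 to 3 rounding mentioned in the text.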
Understandably, as the query proceeds layer by layer, the nodes in the tree-shaped data structure are queried in turn until every node has been queried. Therefore, before the target data layer located at the next level of the cache data layer is acquired, it is judged whether every node in the tree-shaped data structure has been queried; if so, no target data layer exists, and otherwise a target data layer exists. Specifically, the step of obtaining a target data layer located at the next level of the cache data layer in the tree-shaped data structure is preceded by:
Step c1, judging whether every data layer in the tree-shaped data structure carries a query identifier, and if all data layers carry query identifiers, taking the cache data corresponding to the cache data layer as the final query result and ending the query;
Step c2, if not all data layers carry query identifiers, executing the step of obtaining the target data layer located at the next level of the cache data layer in the tree-shaped data structure.
Furthermore, because a query identifier is set on each queried node in the tree-shaped data structure, whether every node has been queried can be determined by checking whether every node carries a query identifier. If all data layers carry query identifiers, all nodes in the tree-shaped data structure have been queried; the cache data corresponding to the cache data layer is then taken as the final query result, and the query operation ends. The cache data corresponding to the cache data layer contains not only the cache result of the last query but also the results of all earlier queries; it is the accumulated cache of every query result.
Further, if not all data layers carry query identifiers, an unqueried data layer remains. The data layer located at the next level of the cache data layer in the tree-shaped data structure is then obtained and used as the target data layer to construct a query condition, and querying continues until every data layer in the tree-shaped data structure has been queried and the final query result is obtained.
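Steps c1 and c2 amount to a check over the query identifiers followed by a move to the next level. A minimal sketch, assuming the identifiers are kept as a hypothetical list of booleans indexed by layer depth and that the cache data layer is referred to by its depth:

```python
def next_target_layer(layer_has_identifier, cached_layer):
    """Return the depth of the target data layer, or None when the query ends.

    layer_has_identifier -- list of bool, True if that layer carries a query identifier
    cached_layer         -- depth of the data layer targeted by the last query
    """
    if all(layer_has_identifier):
        # Step c1: every layer has been queried; the cached data is final.
        return None
    # Step c2: continue with the layer one level below the cache data layer.
    target = cached_layer + 1
    # Assumption: if no deeper layer exists, there is nothing left to target.
    return target if target < len(layer_has_identifier) else None


continue_at = next_target_layer([True, True, False], cached_layer=1)
finished = next_target_layer([True, True, True], cached_layer=2)
```

`continue_at` is 2 because the third layer is still unqueried, while `finished` is `None` because every layer already carries a query identifier.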
Step S12, if the cache data layer does not exist, constructing query conditions according to the received dimension key words and determining the number of concurrent nodes.
Further, when it is determined that no cache data layer exists in the analysis system, a heuristic query is initiated with the received dimension keywords, and the query condition is constructed and the number of concurrent nodes determined from the result of the heuristic query. In a heuristic query, a heuristic data layer is determined according to the number of input dimension keywords, and a query is initiated on that layer to probe how much data a query on it returns. This data volume is used to predict the data layer targeted by subsequent queries, from which the query condition is constructed and the number of concurrent nodes is determined. Specifically, the step of constructing a query condition according to the received dimension keywords and determining the number of concurrent nodes includes:
Step S121, determining a heuristic level according to the number of received dimension keywords, and constructing a query condition according to the dimension keywords and the heuristic level;
Step S122, initiating a heuristic query on the tree-shaped data structure according to the query condition and the heuristic level to obtain a heuristic query result;
Step S123, determining the number of concurrent nodes according to the total number to be queried and the number of results in the heuristic query result.
Furthermore, different numbers of dimension keywords imply different dimensions to be queried, different data layers targeted, and different data volumes in the query result. Therefore, to determine how much data each node can return, the heuristic level is determined by the number of dimension keywords: the more dimension keywords there are, the deeper the data layer targeted by the query and the deeper the corresponding heuristic level. For example, if 2 to 5 dimensions are to be queried and the number of dimension keywords is accordingly 2 to 5, the first layer of the tree-shaped data structure can serve as the heuristic level; as the number of dimensions increases, a deeper heuristic level is chosen, such as layer 2 or layer 3.
Further, a query condition is constructed according to the dimension keywords and the heuristic level: the dimension keywords that correspond to the heuristic level are assembled into the query condition. A heuristic query is initiated on the heuristic level with this query condition to obtain a heuristic query result. The number of concurrent nodes is then determined from the ratio between the number of results in the heuristic query result and the total number to be queried. In one embodiment, if the number of results obtained by the heuristic query is R and the total number to be queried is D, the number of concurrent nodes queried simultaneously is (D - R)/R.
It should be noted that the heuristic query may target a single node or several nodes. For a single node, the number of results of the heuristic query represents the data volume obtained by querying that node, and the number of concurrent nodes is determined from the ratio between the total number to be queried and this number of results. For several nodes, the number of results represents the data volume obtained by querying all of them; the average data volume per node is determined by dividing the number of results by the number of nodes, and the number of concurrent nodes is determined from the ratio between the total number to be queried and this average.
This embodiment constructs the query condition and determines the number of concurrent nodes in different ways depending on whether a cache data layer exists. If a cache data layer exists, both are derived from the cache data layer, so they are tied to the data layer most recently queried and cached and are therefore more accurate. If no cache data layer exists, a query condition is constructed from the dimension keywords to initiate a heuristic query, and the number of concurrent nodes is determined from the per-node data volume revealed by the heuristic query; this improves the accuracy of the number of concurrent nodes and helps the query run accurately with that number.
Further, based on the first or second embodiment of the data query method of the present invention, a third embodiment of the data query method of the present invention is proposed.
The difference between the third embodiment of the data query method and the first or second embodiment of the data query method is that the step of querying the data layer corresponding to the query condition in the tree-like data structure to obtain the query result comprises the following steps:
Step d1, caching the data layer corresponding to the query condition in the tree-shaped data structure and the query result in the form of a key-value pair.
In this embodiment, whether to end the query is decided by judging whether the data volume obtained so far satisfies the total number to be queried. Specifically, after each query returns a query result, the data layer corresponding to the query condition in the tree-shaped data structure, that is, the data layer targeted by the query, and the query result obtained are cached as a key-value pair. The key-value pairs make it possible to quickly find the query result of each query, and whether to end the query is decided by comparing the data volume of the query results with the total number to be queried.
Further, the step of updating the query condition and the number of concurrent nodes according to the query result is preceded by:
Step e1, judging whether the cached data volume corresponding to the cached query results is greater than or equal to the total number to be queried, and if so, taking the cached query results as the final query result and ending the query;
Step e2, if the cached data volume is smaller than the total number to be queried, executing the step of updating the query condition and the number of concurrent nodes according to the query result.
Furthermore, after the current query returns its query result and before the next query starts, it is judged whether the cached data volume after the current query meets the total number to be queried. Specifically, the query result of every query is cached, so the result of the current query is cached as well and, together with the previously cached results, forms the cached query results. The data volume of the cached query results is counted to give the cached data volume. The cached data volume is compared with the total number to be queried to judge whether it is greater than or equal to that total. If it is, the data obtained by the queries so far meets the query requirement; the cached query results are therefore taken as the final query result, and the query ends.
Further, if the cached data volume is smaller than the total number to be queried, the data obtained so far does not meet the query requirement. The query condition and the number of concurrent nodes are then updated according to the query result, and querying continues with the updated values until the total data volume obtained meets the total number to be queried, at which point the query operation ends.
In this embodiment, the queried data layer and the query result are cached as a key-value pair, so the data layer from which each query result originates can be determined quickly. Meanwhile, whether to continue querying is decided by comparing the data volume obtained with the total number to be queried; the query requirement is met while excessive querying is avoided, which saves query resources and improves query efficiency.
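Steps d1, e1 and e2 can be illustrated with a small in-memory cache keyed by data layer. The class `QueryCache` and its method names are hypothetical, and a plain dictionary is assumed as the key-value store; the text does not specify the storage used.

```python
class QueryCache:
    """Caches query results as key-value pairs: data layer -> rows."""

    def __init__(self):
        self._by_layer = {}

    def put(self, layer, rows):
        # Step d1: cache the queried data layer together with its result.
        self._by_layer.setdefault(layer, []).extend(rows)

    def cached_amount(self):
        # Data volume of all cached query results.
        return sum(len(rows) for rows in self._by_layer.values())

    def final_result_if_done(self, total_needed):
        """Step e1: return all cached rows if enough data is cached, else None.

        A None result corresponds to step e2: the caller should update the
        query condition and the number of concurrent nodes and query again.
        """
        if self.cached_amount() >= total_needed:
            return [row for layer in sorted(self._by_layer)
                    for row in self._by_layer[layer]]
        return None


cache = QueryCache()
cache.put(1, ["a", "b"])
not_yet = cache.final_result_if_done(3)   # only 2 rows cached so far
cache.put(2, ["c"])
final = cache.final_result_if_done(3)     # 3 rows cached, requirement met
```

Keying the cache by layer keeps the origin of each result recoverable, which is the benefit the embodiment attributes to the key-value form.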
The invention also provides a data query device. The data query device comprises:
a construction module, configured to construct a query condition and determine the number of concurrent nodes;
a query module, configured to query the data layer corresponding to the query condition in the tree-shaped data structure according to the query condition and the number of concurrent nodes to obtain a query result;
and an updating module, configured to update the query condition and the number of concurrent nodes according to the query result, and, based on the updated query condition and number of concurrent nodes, to execute again the step of querying the data layer corresponding to the query condition in the tree-shaped data structure, until all data layers in the tree-shaped data structure have been queried and the final query result is obtained.
Further, the construction module further comprises:
a judging unit, configured to judge whether a cache data layer exists and, if so, to construct a query condition according to the cache data layer and determine the number of concurrent nodes;
and a constructing unit, configured to construct a query condition according to the received dimension keywords and determine the number of concurrent nodes if no cache data layer exists.
Further, the judging unit is further configured to:
acquire a target data layer located at the next level of the cache data layer in the tree-shaped data structure, and the data volume of the cache result corresponding to the cache data layer;
and construct the query condition according to the target data layer, and determine the number of concurrent nodes according to the total number to be queried and the data volume.
Further, the judging unit is further configured to:
judge whether every data layer in the tree-shaped data structure carries a query identifier, and if all data layers carry query identifiers, take the cache data corresponding to the cache data layer as the final query result and end the query;
and if not all data layers carry query identifiers, execute the step of acquiring a target data layer located at the next level of the cache data layer in the tree-shaped data structure.
Further, the constructing unit is further configured to:
determine a heuristic level according to the number of received dimension keywords, and construct a query condition according to the dimension keywords and the heuristic level;
initiate a heuristic query on the tree-shaped data structure according to the query condition and the heuristic level to obtain a heuristic query result;
and determine the number of concurrent nodes according to the total number to be queried and the number of results in the heuristic query result.
Further, the data query device further comprises:
a cache module, configured to cache the data layer corresponding to the query condition in the tree-shaped data structure and the query result in the form of a key-value pair.
Further, the data query device further comprises:
a judging module, configured to judge whether the cached data volume corresponding to the cached query results is greater than or equal to the total number to be queried and, if so, to take the cached query results as the final query result and end the query;
and an execution module, configured to update the query condition and the number of concurrent nodes according to the query result if the cached data volume is smaller than the total number to be queried.
The specific implementation of the data query device of the present invention is basically the same as that of the above data query method, and is not described herein again.
In addition, the embodiment of the invention also provides a readable storage medium.
The readable storage medium has stored thereon a data query program, which when executed by the processor implements the steps of the data query method as described above.
The readable storage medium of the present invention may be a computer readable storage medium, and the specific implementation manner of the readable storage medium is substantially the same as that of each embodiment of the data query method, which is not described herein again.
The present invention has been described with reference to the accompanying drawings, but it is not limited to the above embodiments, which are illustrative rather than restrictive. Those skilled in the art can make various changes without departing from the spirit and scope of the invention as defined by the appended claims, and all equivalent changes derived from the specification and drawings fall within the scope of protection of the invention.

Claims (10)

1. A data query method, comprising the steps of:
constructing a query condition and determining the number of concurrent nodes;
querying a data layer corresponding to the query condition in the tree-shaped data structure according to the query condition and the number of concurrent nodes to obtain a query result;
and updating the query condition and the number of concurrent nodes according to the query result, and, based on the updated query condition and number of concurrent nodes, executing the step of querying the data layer corresponding to the query condition in the tree-shaped data structure according to the query condition and the number of concurrent nodes, until all data layers in the tree-shaped data structure are queried to obtain a final query result.
2. The data query method of claim 1, wherein the step of constructing a query condition and determining the number of concurrent nodes comprises:
judging whether a cache data layer exists or not, if so, establishing a query condition according to the cache data layer, and determining the number of concurrent nodes;
and if the cache data layer does not exist, constructing a query condition according to the received dimension key words and determining the number of concurrent nodes.
3. The data query method of claim 2, wherein the step of constructing query conditions according to the cached data layer and determining the number of concurrent nodes comprises:
acquiring a target data layer positioned at the next level of the cache data layer in the tree-shaped data structure and the data volume of a cache result corresponding to the cache data layer;
and constructing the query condition according to the target data layer, and determining the number of the concurrent nodes according to the total number to be queried and the data volume.
4. The data query method of claim 3, wherein the step of acquiring a target data layer located at the next level of the cache data layer in the tree-shaped data structure is preceded by:
judging whether every data layer in the tree-shaped data structure carries a query identifier, and if all data layers carry query identifiers, taking the cache data corresponding to the cache data layer as a final query result and ending the query;
and if not all data layers carry query identifiers, executing the step of acquiring a target data layer located at the next level of the cache data layer in the tree-shaped data structure.
5. The data query method of claim 2, wherein the steps of constructing query conditions based on the received dimension key words and determining the number of concurrent nodes comprise:
determining a heuristic level according to the number of the received dimension key words, and constructing a query condition according to the dimension key words and the heuristic level;
initiating a heuristic query to the tree-shaped data structure according to the query condition and the heuristic hierarchy to obtain a heuristic query result;
and determining the number of the concurrent nodes according to the total number to be queried and the result number of the heuristic query result.
6. The data query method according to claim 1, wherein the step of querying the data layer corresponding to the query condition in the tree-like data structure to obtain the query result comprises:
and caching the data layer corresponding to the query condition in the tree-shaped data structure and the query result in a key-value pair mode.
7. The data query method of claim 6, wherein the step of updating the query condition and the number of concurrent nodes according to the query result is preceded by:
judging whether the cached data volume corresponding to the cached query results is greater than or equal to the total number to be queried, and if so, taking the cached query results as a final query result and ending the query;
and if the cache data volume is smaller than the total number to be queried, executing the step of updating the query condition and the number of the concurrent nodes according to the query result.
8. A data query device, comprising:
the construction module is used for constructing query conditions and determining the number of concurrent nodes;
the query module is used for querying a data layer corresponding to the query condition in the tree-shaped data structure according to the query condition and the number of the concurrent nodes to obtain a query result;
and the updating module is used for updating the query condition and the number of the concurrent nodes according to the query result, executing the step of querying the data layer corresponding to the query condition in the tree-shaped data structure according to the query condition and the number of the concurrent nodes on the basis of the updated query condition and the number of the concurrent nodes, and obtaining the final query result until all the data layers in the tree-shaped data structure are queried.
9. A data query device, characterized in that the data query device comprises a memory, a processor and a data query program stored on the memory and executable on the processor, which when executed by the processor implements the steps of the data query method as claimed in any one of claims 1 to 7.
10. A readable storage medium, characterized in that the readable storage medium has stored thereon a data query program, which when executed by a processor implements the steps of the data query method according to any one of claims 1-7.
CN202010742022.0A 2020-07-28 2020-07-28 Data query method, device, equipment and readable storage medium Pending CN114003616A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010742022.0A CN114003616A (en) 2020-07-28 2020-07-28 Data query method, device, equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN114003616A true CN114003616A (en) 2022-02-01

Family

ID=79920781

Country Status (1)

Country Link
CN (1) CN114003616A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination