CN118296032A - Data query method and device

Info

Publication number: CN118296032A
Application number: CN202410487603.2A
Authority: CN
Inventors: 曹鑫; 邵先凯; 李刚勇; 尹迎昭
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Priority date: 2024-04-22
Filing date: 2024-04-22
Publication date: 2024-07-05

Abstract

The invention discloses a data query method and device, and relates to the field of computer technology. One embodiment of the method comprises: receiving a data query request, the data query request indicating a first combination identifier of an index and a dimension to be queried; determining whether the first combination identifier exists in a metadata center, the metadata center recording the combination identifiers for which pre-query results exist; and if so, obtaining the data query result corresponding to the first combination identifier from the pre-query results and feeding it back to the user, the pre-query results being obtained by querying a second combination identifier determined from history log information. Because the pre-query is performed according to the history log information before any instant query, a received data query request can be answered directly from the pre-query results, a query need not be executed for every request, and query efficiency is greatly improved.

Description

Data query method and device

Technical Field

The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for querying data.

Background

In existing data query processes, an instant query is executed for each data query request, and existing optimization techniques focus on improving query performance by making that instant query more efficient. However, with large data volumes and high QPS (queries per second), even an improved instant query struggles to meet the timeliness requirements of different query needs.

Disclosure of Invention

In view of this, embodiments of the present invention provide a data query method and apparatus that, before any instant query, pre-query frequently queried combinations of indexes and dimensions according to history log information. After a data query request is received, the result can be fed back directly from the pre-query results, so a query need not be executed for every request, which greatly improves query efficiency.

To achieve the above object, according to one aspect of an embodiment of the present invention, there is provided a method of data query.

The data query method of the embodiment of the invention comprises the following steps: receiving a data query request, the data query request indicating a first combination identifier of an index and a dimension to be queried; determining whether the first combination identifier exists in a metadata center, the metadata center recording the combination identifiers for which pre-query results exist; and if so, obtaining the data query result corresponding to the first combination identifier from the pre-query results according to the first combination identifier, and feeding the data query result back to a user; wherein the pre-query results are obtained by querying a second combination identifier determined from history log information.

Optionally, before receiving the data query request, the method further includes: collecting history log information; determining, according to the history log information, a second combination identifier of the index and the dimension to be pre-queried; and pre-querying the second combination identifier, and storing the pre-query result and the second combination identifier in correspondence in the metadata center.

Optionally, the collecting history log information includes: respectively acquiring user behavior logs and resource consumption logs; the user behavior log indicates operation information of the historical query request, and the resource consumption log indicates resource consumption information of the historical query request; and carrying out association integration on the user behavior log and the resource consumption log, and storing an association integration result in a distributed file system so as to acquire the history log information from the distributed file system.

Optionally, the determining, according to the log information, the index of the pre-query and the second combination identifier of the dimension includes: acquiring a plurality of preset combinations corresponding to all indexes and dimensions; sequentially judging whether each preset combination meets a preset decision rule; and generating the second combination identifier according to a preset combination meeting the preset decision rule.

Optionally, the preset decision rule includes one or more of the following: the average response time of network requests in a preset period is larger than a preset first threshold, the access frequency is larger than a preset first proportion, and the access resource consumption is larger than a preset second proportion.

Optionally, the generating the second combination identifier according to a preset combination that satisfies the preset decision rule includes: acquiring a first source data table from the metadata center; the first source data table indicates field information configured by a user; acquiring a second source data table from the pre-query system; the second source data table indicates field information obtained by the pre-query system according to the historical log information; determining whether field information in the second source data table is consistent with field information in the first source data table; and if so, generating the second combination identifier according to the preset combination meeting the preset decision rule.

Optionally, the generating the second combination identifier according to a preset combination that satisfies the preset decision rule further includes: for each of the preset combinations: determining pre-query resource consumption information consumed by pre-querying the preset combination; and determining whether the preset combination can generate the second combination identifier according to the pre-query resource consumption information and the historical resource consumption information corresponding to the historical query request.

Optionally, the determining pre-query resource consumption information consumed by pre-querying the preset combination includes: acquiring a query method corresponding to the preset combination, and generating a query statement according to the query method; and executing the query statement by using a columnar database management system to generate target resource consumption information consumed by pre-querying the preset combination.

Optionally, the determining whether the preset combination can generate the second combination identifier according to the pre-query resource consumption information and the historical resource consumption information corresponding to the historical query request includes: in the case that the target resource consumption information is not greater than the resource consumption information of the historical query request, determining whether the preset combination can generate the second combination identifier according to the storage usage rate corresponding to the preset combination and the overall usage rate of the pre-query system.

Optionally, the determining whether the preset combination can generate the second combination identifier according to the storage usage rate corresponding to the preset combination and the overall usage rate of the pre-query system includes: and generating the second combination identifier according to the preset combination under the condition that the daily increase rate of the storage utilization rate is smaller than a preset second threshold value and the overall utilization rate is larger than a preset third threshold value.

Optionally, the determining whether the first combined identifier exists in a metadata center includes: matching the first combined identifier with the second combined identifier; and determining whether the first combined identifier exists in the metadata center according to a matching result.

Optionally, the method further comprises: and directly executing the data query request to generate a query result and feeding back the query result to a user in the case that the first combination identifier does not exist in the metadata center.

To achieve the above object, according to still another aspect of the embodiments of the present invention, there is provided an apparatus for querying data.

The data query apparatus of the embodiment of the invention comprises: a receiving module configured to receive a data query request, the data query request indicating a first combination identifier of an index and a dimension to be queried; a determining module configured to determine whether the first combination identifier exists in a metadata center, the metadata center recording the combination identifiers for which pre-query results exist; and a data query module configured to, when the first combination identifier exists in the metadata center, obtain the data query result corresponding to the first combination identifier from the pre-query results according to the first combination identifier, and feed the data query result back to a user; wherein the pre-query results are obtained by querying a second combination identifier determined from history log information.

To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided an electronic device for data query.

The electronic equipment for data query in the embodiment of the invention comprises: one or more processors; and the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors are enabled to realize the data query method according to the embodiment of the invention.

To achieve the above object, according to still another aspect of the embodiments of the present invention, there is provided a computer-readable storage medium.

A computer readable storage medium of an embodiment of the present invention has stored thereon a computer program which, when executed by a processor, implements a method of data querying of an embodiment of the present invention.

One embodiment of the above invention has the following advantages or benefits: before any instant query, frequently queried combinations of indexes and dimensions are pre-queried according to the history log information, so that after a data query request is received, the result can be fed back directly from the pre-query results, a query need not be executed for every request, and query efficiency is greatly improved.

Further effects of the optional implementations described above are explained below in connection with the specific embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is a schematic flow diagram of a method of data querying according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a main flow of determining a second combined identity according to an embodiment of the present invention;

FIG. 3 is a schematic diagram of a main flow of collecting history log information according to an embodiment of the present invention;

FIG. 4 is another flow diagram of collecting history log information according to an embodiment of the invention;

FIG. 5 is a schematic diagram of a primary flow for determining a second combination according to an embodiment of the invention;

FIG. 6 is a schematic diagram of a main flow of performing feasibility verification on a second combination according to an embodiment of the invention;

FIG. 7 is a schematic diagram of another primary flow for determining a second combination according to an embodiment of the invention;

FIG. 8 is a schematic diagram of a main flow for determining pre-query resource consumption information according to an embodiment of the present invention;

FIG. 9 is a schematic diagram of the major modules of an apparatus for data querying according to an embodiment of the present invention;

FIG. 10 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;

FIG. 11 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.

Detailed Description

Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

It should be noted that the embodiments of the present invention and the technical features in the embodiments may be combined with each other provided they do not conflict.

It should be noted that, in the technical solution of the present disclosure, the collection, updating, analysis, processing, use, transmission, storage, and other handling of users' personal information all comply with the relevant laws and regulations, are used for lawful purposes, and do not violate public order and good customs. Necessary measures are taken to protect users' personal information, to prevent illegal access to users' personal information data, and to safeguard users' personal information security, network security, and national security.

FIG. 1 is a schematic diagram of the main steps of a method of data querying according to an embodiment of the present invention.

As shown in fig. 1, the method for querying data according to the embodiment of the present invention mainly includes the following steps:

Step S101: receiving a data query request, the data query request indicating a first combination identifier of an index and a dimension to be queried;

Step S102: determining whether the first combination identifier exists in the metadata center, the metadata center recording the combination identifiers for which pre-query results exist;

Step S103: if so, obtaining the data query result corresponding to the first combination identifier from the pre-query results according to the first combination identifier, and feeding the data query result back to the user; wherein the pre-query results are obtained by querying a second combination identifier determined from the history log information.

The dimensions and indexes in step S101 refer to different things. For example, dimensions may include a store dimension, a brand dimension, and the like, along which data must be aggregated, while indexes may include directly obtained values such as the transaction amount. A data query generally specifies dimensions and indexes together (that is, a combination of dimensions and indexes), for example querying the transaction amount of store A.
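The text does not specify how a combination identifier is encoded. As a minimal sketch, assuming an identifier is simply a canonical string built from the sorted index and dimension names (the function name `combination_id` and the string layout are hypothetical, not taken from the source), one possibility is:

```python
from typing import Iterable


def combination_id(indexes: Iterable[str], dimensions: Iterable[str]) -> str:
    """Build an order-independent identifier for an index/dimension combination.

    Assumption: names are sorted and de-duplicated so that the same
    combination always maps to the same identifier string.
    """
    idx = ",".join(sorted(set(indexes)))
    dim = ",".join(sorted(set(dimensions)))
    return f"idx[{idx}]|dim[{dim}]"


# Example: "transaction amount of each store".
first_id = combination_id(["transaction_amount"], ["store"])
```

Making the identifier independent of argument order means two requests for the same indexes and dimensions are matched even if the fields arrive in a different order.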

It should be noted that, in the prior art, ClickHouse (a columnar database management system) is generally used to perform data queries. To ensure that the pre-query behaves the same as the actual online query, the pre-query system used in the embodiment of the present invention is also a ClickHouse-based query system, that is, the pre-query process is also implemented on ClickHouse. The metadata center, however, is a data storage center separate from the pre-query system; it only stores and records data and cannot provide a query function. The present invention therefore implements the data query process through interaction between the pre-query system and the metadata center.

The metadata center stores, in advance, the combination identifiers for which pre-query results exist. The following describes in detail how those combination identifiers and the pre-query results themselves are obtained. The pre-query results are stored in the pre-query system, and the combination identifiers of the pre-query results are stored in the metadata center. In an alternative embodiment, before step S101, as shown in fig. 2, the method includes:

Step S201: collecting history log information;

Step S202: determining, according to the history log information, a second combination identifier of the index and the dimension to be pre-queried;

Step S203: pre-querying the second combination identifier, and storing the pre-query result and the second combination identifier in correspondence in the metadata center.

It will be appreciated that, when the second combination identifier is stored in the metadata center, the process of determining whether the first combination identifier exists in the metadata center in step S102 may include: matching the first combined identifier with the second combined identifier; and determining whether the first combination identifier exists in the metadata center according to the matching result.
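Steps S201 to S203 and the matching in step S102 can be sketched as follows. This is an illustrative in-memory model only: the class names `MetadataCenter` and `PreQuerySystem`, the `prepare` helper, and the `run_query` callable are assumptions made for the example, and the sketch follows the description above in keeping results in the pre-query system and identifiers in the metadata center.

```python
from typing import Any, Callable, Dict, Iterable, Set


class MetadataCenter:
    """Records which combination identifiers have pre-query results.

    It only stores identifiers; it does not execute queries.
    """

    def __init__(self) -> None:
        self._ids: Set[str] = set()

    def register(self, combo_id: str) -> None:
        self._ids.add(combo_id)

    def contains(self, combo_id: str) -> bool:
        return combo_id in self._ids


class PreQuerySystem:
    """Runs pre-queries and keeps their results (stand-in for a ClickHouse-based system)."""

    def __init__(self, run_query: Callable[[str], Any]) -> None:
        self._run_query = run_query
        self._results: Dict[str, Any] = {}

    def pre_query(self, combo_id: str) -> None:
        self._results[combo_id] = self._run_query(combo_id)

    def result(self, combo_id: str) -> Any:
        return self._results[combo_id]


def prepare(second_ids: Iterable[str], pqs: PreQuerySystem, mc: MetadataCenter) -> None:
    """Step S203: pre-query each second combination identifier and register it."""
    for combo_id in second_ids:
        pqs.pre_query(combo_id)
        mc.register(combo_id)
```

With this model, the check in step S102 reduces to `mc.contains(first_id)`: the first combination identifier exists in the metadata center exactly when it matches a registered second combination identifier.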

In the embodiment of the invention, it is necessary not only to determine which combinations of indexes and dimensions to pre-query, but also to consider whether the cost of the pre-query meets actual requirements. The history log information therefore needs to include a user behavior log and a resource consumption log. Since the user behavior log and the resource consumption log are not monitored by the same system, in an alternative embodiment, the process of collecting the history log information in step S201 may include, as shown in fig. 3:

Step S301: respectively acquiring user behavior logs and resource consumption logs; the user behavior log indicates operation information of the historical query request, and the resource consumption log indicates resource consumption information of the historical query request;

Step S302: associating and integrating the user behavior log and the resource consumption log, and storing the integrated result in a distributed file system, so that the history log information can be obtained from the distributed file system.

Specifically, as shown in fig. 4, to obtain the user behavior log, the embodiment of the invention uses the lightweight log collector Filebeat together with Logstash to filter all logs and extract the user behavior log related to users' historical query operations. The user behavior log is stored in the Elasticsearch analysis engine, which converts its format, and the converted user behavior log is finally synchronized to a distributed file system. The distributed file system is the Hadoop Distributed File System (HDFS), a highly fault-tolerant distributed system that can be deployed on inexpensive machines and provides high-throughput data access. It is well suited to large-scale data sets and therefore to query access processes with huge data volumes.

As for the resource consumption log, also as shown in fig. 4, when ClickHouse is used as the query system, the resource consumption of each query is recorded in ClickHouse itself, so the resource consumption log must be obtained from the ClickHouse system. In an alternative embodiment, the query_log among the log files may be read through the Spark computing framework and synchronized to HDFS.

Because the user behavior log and the resource consumption log are associated, integrated, and then stored in the distributed file system, the different kinds of log information can be retrieved using only the unique code ID in the data query request. There is no need to call different systems to obtain the user behavior log and the resource consumption log separately, which greatly improves the efficiency of obtaining the history log information.
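The association and integration of the two logs can be illustrated with a small join keyed on the unique query ID. The record types and field names below (`BehaviorRecord`, `ResourceRecord`, `duration_ms`, `read_bytes`) are hypothetical; the source only states that the logs are linked through the unique code ID, and writing the result to HDFS is outside this sketch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class BehaviorRecord:
    query_id: str   # unique code ID of the historical query request
    user: str
    combo_id: str   # index/dimension combination that was queried


@dataclass(frozen=True)
class ResourceRecord:
    query_id: str
    duration_ms: int
    read_bytes: int


def integrate(
    behavior: Iterable[BehaviorRecord],
    resources: Iterable[ResourceRecord],
) -> Dict[str, dict]:
    """Join the two logs on query_id; behavior records without a match are skipped."""
    by_id = {r.query_id: r for r in resources}
    merged: Dict[str, dict] = {}
    for b in behavior:
        r = by_id.get(b.query_id)
        if r is None:
            continue  # no resource consumption recorded for this request
        merged[b.query_id] = {
            "user": b.user,
            "combo_id": b.combo_id,
            "duration_ms": r.duration_ms,
            "read_bytes": r.read_bytes,
        }
    return merged
```

After integration, a single lookup by query ID returns both the operation information and the resource consumption information of a historical request.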

After the history log information is obtained, it can be used to determine which combinations of indexes and dimensions need to be pre-queried, that is, the process of step S202. In an alternative embodiment, as shown in fig. 5, this process specifically includes:

Step S501: acquiring a plurality of preset combinations corresponding to all indexes and dimensions;

Step S502: sequentially judging whether each preset combination meets a preset decision rule;

Step S503: generating a second combination identifier according to the preset combination meeting the preset decision rule.

That is, the second combination is selected from the preset combinations of all indexes and dimensions. In an alternative embodiment, the preset decision rule used for this selection includes one or more of the following: the average response time of network requests in a preset period is larger than a preset first threshold, the access frequency is larger than a preset first proportion, and the access resource consumption is larger than a preset second proportion. Users' data queries are generally periodic: for example, transaction volume may be queried frequently during one period (for example, during certain sales promotion activities), while the return rate may be queried frequently during another. Screening the combinations of indexes and dimensions by the average response time within the preset period therefore effectively yields the second combination identifiers that users are likely to query frequently in the near future. The average response time of network requests in the preset period may be measured by several parameters, for example TP99 (the response time within which 99% of network requests are satisfied), TP90 (the response time within which 90% of network requests are satisfied), TP50 (the response time within which 50% of network requests are satisfied), and the like. The preset first threshold generally differs between parameters, so it may be set according to the parameter actually selected, which is not limited in the present invention. In addition, the access frequency and the proportion of access resource consumption directly reflect how often a combination of index and dimension is used, so they also serve as preset decision rules in the embodiment of the invention.
Through these preset decision rules, the second combinations most likely to be queried by users can be obtained, which improves the accuracy of the pre-query and avoids the resource waste of pre-querying every preset combination.
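The screening in steps S501 to S503 can be sketched as below. The percentile uses the nearest-rank method, which is one common way to compute TP values and is an assumption here, as are the type names `ComboStats` and `DecisionRule` and the choice that every configured criterion must hold (the source only says the rule includes "one or more" of the criteria).

```python
from dataclasses import dataclass
from typing import List, Optional


def tp(latencies_ms: List[int], percent: int) -> int:
    """Nearest-rank percentile: the latency within which `percent`% of requests finish."""
    if not latencies_ms:
        raise ValueError("no latency samples")
    ordered = sorted(latencies_ms)
    rank = (percent * len(ordered) + 99) // 100  # integer ceiling of percent% of n
    return ordered[max(rank, 1) - 1]


@dataclass
class ComboStats:
    latencies_ms: List[int]  # response times observed in the preset period
    access_share: float      # share of all historical queries hitting this combination
    resource_share: float    # share of total resource consumption


@dataclass
class DecisionRule:
    tp_percent: int = 99
    tp_threshold_ms: Optional[int] = None        # "preset first threshold"
    min_access_share: Optional[float] = None     # "preset first proportion"
    min_resource_share: Optional[float] = None   # "preset second proportion"


def satisfies(stats: ComboStats, rule: DecisionRule) -> bool:
    """Return True when every configured criterion is exceeded (assumed semantics)."""
    checks = []
    if rule.tp_threshold_ms is not None:
        checks.append(tp(stats.latencies_ms, rule.tp_percent) > rule.tp_threshold_ms)
    if rule.min_access_share is not None:
        checks.append(stats.access_share > rule.min_access_share)
    if rule.min_resource_share is not None:
        checks.append(stats.resource_share > rule.min_resource_share)
    return bool(checks) and all(checks)
```

Leaving a threshold as `None` disables that criterion, so the same function covers rules built from one, two, or all three conditions.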

In a further alternative embodiment, not only the second combination needs to be obtained by screening from all preset combinations according to a preset decision rule, but also the availability of the second combination needs to be ensured, that is, the second combination needs to be subjected to feasibility verification, as shown in fig. 6, including:

Step S601: acquiring a first source data table from a metadata center; the first source data table indicates field information configured by a user;

Step S602: acquiring a second source data table from the pre-query system; the second source data table indicates field information obtained by the pre-query system according to the historical log information;

Step S603: determining whether the field information in the second source data table is consistent with the field information in the first source data table;

If yes, step S604 is performed: generating a second combination identifier according to a preset combination meeting a preset decision rule;

If not, step S605 is performed: ending the flow.

It should be noted that, in practice, the first source data table may change. For example, if a field is named "user name" from January to March and is changed to "user ID" in April, then the field "user name" obtained solely by summarizing the history log information cannot be used in subsequent queries and has no practical value. Therefore, only preset combinations for which the first source data table and the second source data table are consistent are used as the second combination. Through this process, the embodiment of the invention reduces useless second combinations and improves the accuracy of the pre-query.
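The consistency check of steps S601 to S605 can be sketched as a set comparison. The source does not define "consistent" precisely; this sketch assumes that every field derived from the history logs must still exist among the user-configured fields, and it applies the check per candidate combination. The function names and the `candidates` mapping are hypothetical.

```python
from typing import Dict, List, Set


def fields_consistent(configured_fields: Set[str], derived_fields: Set[str]) -> bool:
    """True when every log-derived field still exists in the configured table."""
    return derived_fields <= configured_fields


def feasible_combinations(
    candidates: Dict[str, Set[str]],  # combination identifier -> fields it relies on
    configured_fields: Set[str],      # fields of the first source data table
) -> List[str]:
    """Keep only combinations whose fields all exist in the first source data table."""
    return [
        combo_id
        for combo_id, fields in candidates.items()
        if fields_consistent(configured_fields, fields)
    ]


# The renamed-field case from the text: "user_name" became "user_id".
configured = {"user_id", "store", "amount"}
kept = feasible_combinations(
    {"by_user_name": {"user_name", "amount"}, "by_store": {"store", "amount"}},
    configured,
)
```

Here only `by_store` is kept, because the combination that relies on the outdated `user_name` field can no longer be queried.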

In addition to screening the preset combinations according to the plurality of preset decision rules, the embodiment of the invention also considers the resource consumption condition of the pre-query process, and if the resource consumption of the pre-query is far more than the resource consumption of executing the actual query request, the pre-query process is considered unnecessary. Thus in an alternative embodiment, for each preset combination, step S503, as shown in fig. 7, further comprises:

Step S701: determining the pre-query resource consumption information consumed by pre-querying the preset combination;

Step S702: determining whether the preset combination can generate a second combination identifier according to the pre-query resource consumption information and the historical resource consumption information corresponding to the historical query request.

To determine the pre-query resource consumption information in step S701, in an alternative embodiment, the pre-query of the preset combination is actually executed. That is, step S701, as shown in fig. 8, includes:

Step S801: acquiring a query method corresponding to the preset combination, and generating a query statement according to the query method;

Step S802: executing the query statement by using the columnar database management system to generate the target resource consumption information consumed by pre-querying the preset combination.

The query method corresponding to the preset combination can be understood as the calculation method of the dimensions and indexes in the preset combination, and it can be obtained by querying the metadata center. In step S802, the query statement may be executed with the columnar database management system by directly accessing the ClickHouse system. After the query statement has been executed, the resource consumption log in ClickHouse may be queried to obtain the target resource consumption information consumed by pre-querying the current preset combination.
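Step S801 can be illustrated by turning a preset combination into an aggregate SQL statement. This is only a sketch: the table name, column names, and the `build_query` helper are hypothetical, the "query method" is assumed to be a mapping from index name to an aggregate expression, and the statement is merely constructed here rather than sent to ClickHouse. Inputs are assumed to come from trusted metadata, not from end users.

```python
from typing import Dict, List


def build_query(table: str, dimensions: List[str], index_exprs: Dict[str, str]) -> str:
    """Generate a GROUP BY statement for one index/dimension combination.

    index_exprs maps an index name to its calculation method,
    for example {"transaction_amount": "sum(amount)"}.
    """
    if not dimensions or not index_exprs:
        raise ValueError("a combination needs at least one dimension and one index")
    select_items = list(dimensions) + [
        f"{expr} AS {alias}" for alias, expr in index_exprs.items()
    ]
    return (
        f"SELECT {', '.join(select_items)} "
        f"FROM {table} "
        f"GROUP BY {', '.join(dimensions)}"
    )


statement = build_query("orders", ["store"], {"transaction_amount": "sum(amount)"})
```

In a deployment along the lines described above, the generated statement would be executed on ClickHouse and its resource consumption read back from the ClickHouse query log.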

In particular, the resource consumption information may include a storage cost, a time cost, and a calculation cost. The storage cost is calculated as S = R × (D × 10 + I × 8) / (1024 × 1024), where S is the required storage space in MB, R is the number of result rows, D is the number of dimensions, and I is the number of indexes. For example, suppose a materialized table contains 100 million rows, 10 dimensions, and 5 indexes; the dimensions are expected to be stored as String, averaging 10 bytes each, and the indexes as int64, 8 bytes each. By the above formula, the table requires about 13,351 MB of storage. Through the pre-query process, the resource cost of the pre-query can be effectively evaluated before the decision goes online, and the resource configuration of the intelligent data materialization apparatus can be planned and managed well, balancing user experience against storage cost. In the embodiment of the present invention, the calculation cost is measured in core-seconds, that is, the number of cores in use multiplied by the usage duration. ClickHouse is a vectorized engine and takes as much CPU as it can when processing queries. The core-seconds available to a cluster per day can therefore be expressed as the total number of cores in the cluster multiplied by 86400 seconds. In practice, however, the present invention reduces this value to 28800 seconds: ClickHouse is typically used for online ad hoc queries whose peak falls between 8:00 and 24:00 each day, and running materialization tasks in that window could affect query stability, so materialization tasks are scheduled to run between 0:00 and 8:00.
Assume that a materialization job runs for 120 seconds. According to the calculation cost formula, calculation cost = core-second cost × usage duration, and substituting the two values gives a calculation cost of 196608 yuan for the job. By comparing this calculated cost with the daily query cost described above, it can also be decided whether to pre-query the preset combination.
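The storage formula and the core-second budget can be checked with a few lines of code. The function names are illustrative, and the 64-core cluster in the usage lines is a hypothetical figure chosen for the example, not a value from the source. With the inputs of the storage example (10^8 rows, 10 dimensions, 5 indexes), the formula evaluates to roughly 13,351 MB.

```python
def storage_mb(rows: int, dims: int, indexes: int,
               dim_bytes: int = 10, index_bytes: int = 8) -> float:
    """S = R * (D * 10 + I * 8) / (1024 * 1024), in MB."""
    return rows * (dims * dim_bytes + indexes * index_bytes) / (1024 * 1024)


def daily_core_seconds(total_cores: int, window_seconds: int = 28800) -> int:
    """Core-seconds available in the 0:00 to 8:00 materialization window."""
    return total_cores * window_seconds


def job_core_seconds(cores_in_use: int, duration_seconds: int) -> int:
    """Core-seconds consumed by one materialization job."""
    return cores_in_use * duration_seconds


table_mb = storage_mb(100_000_000, 10, 5)   # storage example from the text
budget = daily_core_seconds(64)             # hypothetical 64-core cluster
used = job_core_seconds(64, 120)            # a 120-second job using all 64 cores
share_of_window = used / budget             # fraction of the nightly budget consumed
```

Expressing a job as a fraction of the nightly core-second budget gives one simple way to compare its calculation cost against the cost of serving the same combination through repeated instant queries.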

In a further optional embodiment, other factors may also be considered in step S702 when determining whether the preset combination can generate the second combination identifier. That is, when the target resource consumption information is not greater than the resource consumption information of the historical query request, whether the preset combination can generate the second combination identifier is determined according to the storage usage rate corresponding to the preset combination and the overall usage rate of the pre-query system. Specifically, when the daily growth rate of the storage usage rate is less than a preset second threshold and the overall usage rate is greater than a preset third threshold, the preset combination is determined to be the second combination. Illustratively, the second threshold may be 0.3% and the third threshold may be 80%, that is, the daily growth of the storage usage rate is less than 0.3% while the overall usage rate is greater than 80%. This ensures that the pre-query of the second combination executes normally while keeping the pre-query system stable.
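The acceptance check described in this embodiment can be sketched as a single predicate. The function and parameter names are hypothetical; the default thresholds of 0.3% and 80% are the illustrative values given in the text, and the resource costs are assumed to be expressed in the same unit.

```python
def may_generate_second_id(
    target_cost: float,             # resource consumption of the pre-query
    historical_cost: float,         # resource consumption of the historical requests
    storage_daily_growth: float,    # daily growth of storage usage, as a fraction
    overall_usage: float,           # overall usage rate of the pre-query system
    growth_threshold: float = 0.003,  # "preset second threshold" (0.3%)
    usage_threshold: float = 0.80,    # "preset third threshold" (80%)
) -> bool:
    """Return True when the preset combination may become a second combination."""
    if target_cost > historical_cost:
        return False  # pre-querying costs more than answering the requests directly
    return storage_daily_growth < growth_threshold and overall_usage > usage_threshold
```

For instance, `may_generate_second_id(100, 500, 0.001, 0.85)` accepts the combination, while raising the daily storage growth to 0.5% or the pre-query cost above the historical cost rejects it.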

Figs. 1 to 8 above describe how, when the first combination identifier exists in the metadata center, the pre-query result is fed back to the user, which greatly improves query efficiency. The first combination identifier may, however, not exist in the metadata center. In that case, in the embodiment of the invention, the data query request can be executed directly to generate a query result, and the query result is fed back to the user.
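The two paths, answering from the pre-query results or falling back to an instant query, can be sketched as follows. The `answer` function, its parameters, and the returned source label are assumptions made for illustration; the metadata center is modeled as a plain set of identifiers and the pre-query results as a dictionary.

```python
from typing import Any, Callable, Dict, Set, Tuple


def answer(
    first_id: str,
    metadata_ids: Set[str],                    # identifiers recorded in the metadata center
    pre_query_results: Dict[str, Any],         # results held by the pre-query system
    run_instant_query: Callable[[str], Any],   # executes the request directly
) -> Tuple[Any, str]:
    """Serve from the pre-query result when available, otherwise query instantly."""
    if first_id in metadata_ids:
        return pre_query_results[first_id], "pre-query"
    return run_instant_query(first_id), "instant"


ids = {"idx[amount]|dim[store]"}
results = {"idx[amount]|dim[store]": [("store A", 1200)]}
hit = answer("idx[amount]|dim[store]", ids, results, lambda cid: [])
miss = answer("idx[amount]|dim[brand]", ids, results, lambda cid: [("brand X", 300)])
```

The instant-query path is only reached for combinations that were never pre-queried, so frequently requested combinations avoid query execution entirely.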

According to the data query method provided by the embodiment of the invention, frequently queried combinations of indexes and dimensions are pre-queried according to the history log information before any instant query, so that after a data query request is received, the result can be fed back directly from the pre-query results, a query need not be executed for every request, and query efficiency is greatly improved.

Fig. 9 is a schematic diagram of main modules of an apparatus for data query according to an embodiment of the present invention.

As shown in fig. 9, an apparatus 900 according to an embodiment of the present invention includes:

A receiving module 901, configured to receive a data query request; the data query request indicates a first combination identifier of an index to be queried and a dimension;

A determining module 902, configured to determine whether the first combination identifier exists in a metadata center; wherein the metadata center indicates the combination identifiers for which pre-query results exist;

A result obtaining module 903, configured to obtain, when the first combination identifier exists in the metadata center, a data query result corresponding to the first combination identifier from a pre-query result according to the first combination identifier, and feed back the data query result to a user; wherein the pre-query result is obtained by querying a second combination identifier determined from the history log information.

In an alternative embodiment of the present invention, the apparatus further includes a configuration module configured to: collect history log information prior to receiving the data query request; determine a second combination identifier of the indexes and dimensions to be pre-queried according to the history log information; and pre-query the second combination identifier, and store the pre-query result in the metadata center in correspondence with the second combination identifier.

In an optional embodiment of the present invention, the configuration module is further configured to obtain a user behavior log and a resource consumption log respectively; the user behavior log indicates operation information of the historical query request, and the resource consumption log indicates resource consumption information of the historical query request; and carrying out association integration on the user behavior log and the resource consumption log, and storing an association integration result in a distributed file system so as to acquire the history log information from the distributed file system.
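The association step can be sketched as a join of the two logs. The source does not say which key links them, so the shared `request_id` field below is an assumption, as are the function name and the record fields shown in the usage note. Writing the result to a distributed file system is omitted.

```python
from typing import Dict, List


def associate_logs(
    behavior_logs: List[Dict],
    resource_logs: List[Dict],
) -> List[Dict]:
    """Join user behavior records with resource consumption records.

    Assumes (hypothetically) that both logs carry a shared "request_id".
    Behavior records without a matching resource record are skipped.
    """
    cost_by_request = {record["request_id"]: record for record in resource_logs}
    merged = []
    for behavior in behavior_logs:
        cost = cost_by_request.get(behavior["request_id"])
        if cost is None:
            continue
        # One integrated record per historical query request.
        merged.append({**behavior, **cost})
    return merged
```

A behavior record such as `{"request_id": "r1", "metrics": ["gmv"], "dimensions": ["region"]}` joined with `{"request_id": "r1", "cpu_ms": 120}` yields a single record carrying both the queried combination and its cost.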

In an optional embodiment of the present invention, the configuration module is further configured to obtain a plurality of preset combinations corresponding to all indexes and dimensions; sequentially judging whether each preset combination meets a preset decision rule; and generating the second combination identifier according to a preset combination meeting the preset decision rule.

In an alternative embodiment of the present invention, the preset decision rule includes one or more of the following: the average response time of network requests in a preset period is larger than a preset first threshold, the access frequency is larger than a preset first proportion, and the access resource consumption is larger than a preset second proportion.
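A preset decision rule of this kind can be sketched as below. The class and field names are hypothetical, and the sketch makes one assumption the text leaves open: when several conditions are configured, all of them must hold. A condition left as `None` is treated as not part of the rule.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ComboStats:
    # Hypothetical per-combination statistics derived from the history log.
    avg_response_ms: float  # average response time in the preset period
    access_share: float     # share of all accesses, as a fraction
    resource_share: float   # share of access resource consumption, as a fraction


@dataclass
class DecisionRule:
    min_avg_response_ms: Optional[float] = None  # preset first threshold
    min_access_share: Optional[float] = None     # preset first proportion
    min_resource_share: Optional[float] = None   # preset second proportion

    def is_satisfied(self, stats: ComboStats) -> bool:
        checks = [
            (self.min_avg_response_ms, stats.avg_response_ms),
            (self.min_access_share, stats.access_share),
            (self.min_resource_share, stats.resource_share),
        ]
        # Only the configured conditions take part in the decision.
        active = [(limit, value) for limit, value in checks if limit is not None]
        # An empty rule selects nothing; otherwise every active condition must hold.
        return bool(active) and all(value > limit for limit, value in active)
```

Requiring any one condition instead of all would only change `all` to `any`; which of the two applies is a design choice the text does not fix.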

In an alternative embodiment of the present invention, the configuration module is further configured to obtain a first source data table from the metadata center; the first source data table indicates field information configured by a user; acquiring a second source data table from the pre-query system; the second source data table indicates field information obtained by the pre-query system according to the historical log information; determining whether field information in the second source data table is consistent with field information in the first source data table; and if so, generating the second combination identifier according to the preset combination meeting the preset decision rule.
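The consistency check between the two source data tables can be sketched as a field-by-field comparison. The representation of field information as a mapping from field name to type, and both function names, are assumptions made for illustration.

```python
from typing import Dict, Optional, Tuple


def field_mismatches(
    user_fields: Dict[str, str],
    derived_fields: Dict[str, str],
) -> Dict[str, Tuple[Optional[str], Optional[str]]]:
    """Map each inconsistent field to (user-configured type, log-derived type).

    A field missing on one side is reported with None for that side.
    """
    mismatches = {}
    for name in sorted(set(user_fields) | set(derived_fields)):
        expected = user_fields.get(name)
        actual = derived_fields.get(name)
        if expected != actual:
            mismatches[name] = (expected, actual)
    return mismatches


def fields_consistent(
    user_fields: Dict[str, str],
    derived_fields: Dict[str, str],
) -> bool:
    # The second combination identifier is generated only when nothing differs.
    return not field_mismatches(user_fields, derived_fields)
```

Returning the mismatches rather than a bare boolean makes it easier to see why a preset combination was rejected.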

In an alternative embodiment of the present invention, the configuration module is further configured to, for each of the preset combinations: determining pre-query resource consumption information consumed by pre-querying the preset combination; and determining whether the preset combination can generate the second combination identifier according to the pre-query resource consumption information and the historical resource consumption information corresponding to the historical query request.

In an optional embodiment of the present invention, the configuration module is further configured to obtain a query method corresponding to the preset combination, and generate a query statement according to the query method; and execute the query statement by using a columnar database management system to generate target resource consumption information consumed by pre-querying the preset combination.
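Generating a query statement for a preset combination and measuring what it costs can be sketched as follows. Several things here are swapped in for illustration and are not from the source: SQLite (from the Python standard library) replaces the columnar database management system, the query method is assumed to be a grouped `SUM`, and elapsed wall-clock time is used as a crude proxy for the target resource consumption information.

```python
import sqlite3
import time
from typing import List, Tuple


def build_query(table: str, metrics: List[str], dimensions: List[str]) -> str:
    """Build a grouped aggregation statement for one index/dimension combination.

    Identifiers are assumed to come from trusted configuration, not user input.
    """
    dims = ", ".join(f'"{d}"' for d in dimensions)
    aggs = ", ".join(f'SUM("{m}") AS "sum_{m}"' for m in metrics)
    return f'SELECT {dims}, {aggs} FROM "{table}" GROUP BY {dims} ORDER BY {dims}'


def measure_pre_query(
    conn: sqlite3.Connection, statement: str
) -> Tuple[List[tuple], float]:
    """Execute the statement and return (rows, elapsed seconds)."""
    start = time.perf_counter()
    rows = conn.execute(statement).fetchall()
    elapsed = time.perf_counter() - start
    return rows, elapsed
```

The measured value can then be compared with the resource consumption recorded for the historical query requests. A production system would more likely read CPU, memory, or bytes-scanned figures from the database's own query statistics than rely on timing alone.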

In an optional embodiment of the present invention, the configuration module is further configured to determine, when the target resource consumption information is not greater than the resource consumption information of the historical query request, whether the preset combination can generate the second combination identifier according to a storage usage rate corresponding to the preset combination and an overall usage rate of the pre-query system.

In an optional embodiment of the present invention, the configuration module is further configured to generate the second combination identifier according to the preset combination when the daily increase rate of the storage usage rate is less than a preset second threshold value and the overall usage rate is greater than a preset third threshold value.

In an alternative embodiment of the present invention, the determining module 902 is further configured to match the first combination identifier with the second combination identifier; and determine whether the first combination identifier exists in the metadata center according to the matching result.

In an alternative embodiment of the present invention, the result obtaining module 903 is further configured to, in the case where the first combination identifier does not exist in the metadata center, directly execute the data query request to generate a query result, and feed back the query result to the user.

According to the data query device provided by the embodiment of the invention, combinations of indexes and dimensions with higher query frequency are pre-queried according to the history log information before the instant query, so that after a data query request is received, feedback can be given directly from the pre-query result. The query process need not be executed anew for each data query request, which greatly improves query efficiency.

Fig. 10 illustrates an exemplary system architecture 1000 of a data querying method or device to which embodiments of the present invention may be applied.

As shown in fig. 10, a system architecture 1000 may include terminal devices 1001, 1002, 1003, a network 1004, and a server 1005. The network 1004 serves as a medium for providing communication links between the terminal devices 1001, 1002, 1003 and the server 1005. The network 1004 may include various connection types, such as wired or wireless communication links, or fiber optic cables.

A user can use the terminal devices 1001, 1002, 1003 to interact with the server 1005 via the network 1004 to receive or send data. Various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, and social platform software, may be installed on the terminal devices 1001, 1002, 1003.

The terminal devices 1001, 1002, 1003 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.

The server 1005 may be a server providing various services, such as a background management server providing support for data query requests transmitted by users using the terminal devices 1001, 1002, 1003. The background management server may analyze and process the received data, such as the data query request, and feed back the processing result (for example, the pre-query result) to the terminal device.

It should be noted that, the method for querying data provided in the embodiment of the present invention is generally executed by the server 1005, and accordingly, the device for querying data is generally disposed in the server 1005.

It should be understood that the number of terminal devices, networks and servers in fig. 10 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

Referring now to FIG. 11, there is illustrated a schematic diagram of a computer system 1100 suitable for use in implementing the terminal device of an embodiment of the present invention. The terminal device shown in fig. 11 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.

As shown in fig. 11, the computer system 1100 includes a Central Processing Unit (CPU) 1101, which can execute various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1102 or a program loaded from a storage section 1108 into a Random Access Memory (RAM) 1103. The RAM 1103 also stores various programs and data required for the operation of the system 1100. The CPU 1101, ROM 1102, and RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to the bus 1104.

The following components are connected to the I/O interface 1105: an input section 1106 including a keyboard, a mouse, and the like; an output section 1107 including a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), a speaker, and the like; a storage section 1108 including a hard disk or the like; and a communication section 1109 including a network interface card such as a LAN card or a modem. The communication section 1109 performs communication processing via a network such as the Internet. A drive 1110 is also connected to the I/O interface 1105 as needed. A removable medium 1111, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1110 as needed, so that a computer program read therefrom is installed into the storage section 1108 as needed.

In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network via the communication section 1109, and/or installed from the removable medium 1111. The above-described functions defined in the system of the present invention are performed when the computer program is executed by the Central Processing Unit (CPU) 1101.

The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, which may, for example, be described as: a processor including a receiving module, a determining module, and a result obtaining module. The names of these modules do not, in some cases, constitute a limitation of the modules themselves; for example, the receiving module may also be described as "a module that receives a data query request".

As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist alone without being assembled into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform: receiving a data query request, wherein the data query request indicates a first combination identifier of an index and a dimension to be queried; determining whether the first combination identifier exists in a metadata center, wherein the metadata center indicates the combination identifiers for which pre-query results exist; and if yes, acquiring a data query result corresponding to the first combination identifier from a pre-query result according to the first combination identifier, and feeding back the data query result to a user, wherein the pre-query result is obtained by querying a second combination identifier determined from the history log information.

According to the technical scheme provided by the embodiment of the invention, combinations of indexes and dimensions with higher query frequency are pre-queried according to the history log information before the instant query, so that after a data query request is received, feedback can be given directly from the pre-query result. The query process need not be executed anew for each data query request, which greatly improves query efficiency.

The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims

1. A method for data querying, applied to a pre-query system, comprising:

Receiving a data query request; the data query request indicates a first combination identifier of an index to be queried and a dimension;

Determining whether the first combination identifier exists in a metadata center; wherein the metadata center indicates the combination identifiers for which pre-query results exist;

If yes, acquiring a data query result corresponding to the first combination identifier from a pre-query result according to the first combination identifier, and feeding back the data query result to a user; wherein the pre-query result is obtained by querying a second combination identifier determined from the history log information.

2. The method of claim 1, further comprising, prior to said receiving a data query request:

Collecting history log information;

Determining a second combination identifier of the indexes and dimensions to be pre-queried according to the history log information;

And pre-querying the second combination identifier, and storing the pre-query result in the metadata center in correspondence with the second combination identifier.

3. The method of claim 2, wherein the collecting history log information comprises:

Respectively acquiring user behavior logs and resource consumption logs; the user behavior log indicates operation information of the historical query request, and the resource consumption log indicates resource consumption information of the historical query request;

and carrying out association integration on the user behavior log and the resource consumption log, and storing an association integration result in a distributed file system so as to acquire the history log information from the distributed file system.

4. The method of claim 2, wherein the determining a second combination identifier of the indexes and dimensions to be pre-queried according to the history log information comprises:

Acquiring a plurality of preset combinations corresponding to all indexes and dimensions;

sequentially judging whether each preset combination meets a preset decision rule;

And generating the second combination identifier according to a preset combination meeting the preset decision rule.

5. The method of claim 4, wherein the preset decision rule comprises one or more of:

The average response time of network requests in a preset period is larger than a preset first threshold, the access frequency is larger than a preset first proportion, and the access resource consumption is larger than a preset second proportion.

6. The method of claim 4, wherein the generating the second combination identifier according to the preset combination satisfying the preset decision rule comprises:

Acquiring a first source data table from the metadata center; the first source data table indicates field information configured by a user;

Acquiring a second source data table from the pre-query system; the second source data table indicates field information obtained by the pre-query system according to the historical log information;

Determining whether field information in the second source data table is consistent with field information in the first source data table; and if so, generating the second combination identifier according to the preset combination meeting the preset decision rule.

7. The method of claim 4, wherein the generating the second combination identifier according to a preset combination that satisfies the preset decision rule further comprises:

for each of the preset combinations:

determining pre-query resource consumption information consumed by pre-querying the preset combination;

and determining whether the preset combination can generate the second combination identifier according to the pre-query resource consumption information and the historical resource consumption information corresponding to the historical query request.

8. The method of claim 7, wherein the determining pre-query resource consumption information consumed by pre-querying the preset combination comprises:

acquiring a query method corresponding to the preset combination, and generating a query statement according to the query method;

and executing the query statement by using a columnar database management system to generate target resource consumption information consumed by pre-querying the preset combination.

9. The method of claim 7, wherein determining whether the second combination identifier can be generated by the preset combination according to the pre-query resource consumption information and the historical resource consumption information corresponding to the historical query request comprises:

in the case that the target resource consumption information is not greater than the resource consumption information of the historical query request, determining whether the preset combination can generate the second combination identifier according to the storage usage rate corresponding to the preset combination and the overall usage rate of the pre-query system.

10. The method of claim 9, wherein determining whether the preset combination can generate the second combination identifier according to the storage usage rate corresponding to the preset combination and the overall usage rate of the pre-query system comprises:

And generating the second combination identifier according to the preset combination under the condition that the daily increase rate of the storage utilization rate is smaller than a preset second threshold value and the overall utilization rate is larger than a preset third threshold value.

11. The method of claim 2, wherein the determining whether the first combination identifier exists in a metadata center comprises:

Matching the first combination identifier with the second combination identifier;

and determining whether the first combination identifier exists in the metadata center according to the matching result.

12. The method as recited in claim 1, further comprising:

and directly executing the data query request to generate a query result and feeding back the query result to a user in the case that the first combination identifier does not exist in the metadata center.

13. An apparatus for querying data, comprising:

the receiving module is used for receiving the data query request; the data query request indicates a first combination identifier of an index to be queried and a dimension;

a determining module configured to determine whether the first combination identifier exists in a metadata center; wherein the metadata center indicates the combination identifiers for which pre-query results exist;

the data query module is used for obtaining a data query result corresponding to the first combination identifier from a pre-query result according to the first combination identifier when the first combination identifier exists in the metadata center, and feeding the data query result back to a user; wherein the pre-query result is obtained by querying a second combination identifier determined from the history log information.

14. An electronic device for data querying, comprising:

one or more processors;

storage means for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-12.

15. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-12.