CN117591744A - Query method and related equipment - Google Patents

Query method and related equipment Download PDF

Info

Publication number
CN117591744A
CN117591744A CN202311747367.5A CN202311747367A CN117591744A CN 117591744 A CN117591744 A CN 117591744A CN 202311747367 A CN202311747367 A CN 202311747367A CN 117591744 A CN117591744 A CN 117591744A
Authority
CN
China
Prior art keywords
query
cost
sub
field
data source
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311747367.5A
Other languages
Chinese (zh)
Inventor
曾丹
吴昊
邓桥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zitiao Network Technology Co Ltd
Original Assignee
Beijing Zitiao Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zitiao Network Technology Co Ltd filed Critical Beijing Zitiao Network Technology Co Ltd
Priority to CN202311747367.5A priority Critical patent/CN117591744A/en
Publication of CN117591744A publication Critical patent/CN117591744A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a query method and related equipment. The method comprises the following steps: receiving a query request; analyzing and obtaining query parameters based on the query request, wherein the query parameters comprise filtering conditions and data source types; predicting a total query cost for the query request based on the filtering conditions and the data source type; wherein the data source type comprises a single object, a sub-query, or a plurality of associated objects; and determining a query strategy based on the query total cost to execute the query request.

Description

Query method and related equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a query method and related devices.
Background
In the data query process, the query efficiency is different when the query strategies such as the query object data quantity is different or the query parameters are different. The existing query method often does not adopt a proper query strategy, so that the consumption of system resources in the query process is large, and the query efficiency is low.
Disclosure of Invention
The disclosure provides a query method, a query device, a query medium and a query program product, so as to solve the technical problems of high system resource consumption, low query efficiency and the like in the query process to a certain extent.
In a first aspect of the present disclosure, a query method is provided, including:
receiving a query request;
analyzing and obtaining query parameters based on the query request, wherein the query parameters comprise filtering conditions and data source types;
predicting a total query cost for the query request based on the filtering conditions and the data source type; wherein the data source type comprises a single object, a sub-query, or a plurality of associated objects;
and determining a query strategy based on the query total cost to execute the query request.
In a second aspect of the present disclosure, there is provided a query device, including:
the receiving module is used for receiving the query request;
the analysis module is used for analyzing and obtaining query parameters based on the query request, wherein the query parameters comprise filtering conditions and data source types;
a cost module for predicting a total cost of the query request based on the filtering condition and the data source type; wherein the data source type comprises a single object, a sub-query, or a plurality of associated objects;
and the strategy module is used for determining a query strategy based on the query total cost so as to execute the query request.
In a third aspect of the disclosure, an electronic device is provided that includes one or more processors, a memory; and one or more programs, wherein the one or more programs are stored in the memory and executed by the one or more processors, the programs comprising instructions for performing the method of the first or second aspect.
In a fourth aspect of the present disclosure, there is provided a non-transitory computer readable storage medium containing a computer program which, when executed by one or more processors, causes the processors to perform the method of the first or second aspect.
In a fifth aspect of the present disclosure, there is provided a computer program product comprising computer program instructions which, when run on a computer, cause the computer to perform the method of the first aspect.
From the above, it can be seen that, according to the query method and the related device provided by the present disclosure, the filtering condition of the query request and the data source type are used to predict the total cost of the query required by the query, so that an appropriate query strategy can be determined for different query requests, and the consumption of system resources can be reduced while the query efficiency is improved.
Drawings
In order to more clearly illustrate the technical solutions of the present disclosure or related art, the drawings required for the embodiments or related art description will be briefly described below, and it is apparent that the drawings in the following description are only embodiments of the present disclosure, and other drawings may be obtained according to these drawings without inventive effort to those of ordinary skill in the art.
Fig. 1 is a schematic diagram of a query architecture of an embodiment of the present disclosure.
Fig. 2 is a schematic hardware architecture diagram of an exemplary electronic device according to an embodiment of the disclosure.
Fig. 3 is a schematic flow chart of a query method of an embodiment of the present disclosure.
Fig. 4 is a schematic diagram of query total cost calculation according to an embodiment of the present disclosure.
Fig. 5 is a schematic diagram of a query device according to an embodiment of the disclosure.
Detailed Description
For the purposes of promoting an understanding of the principles and advantages of the disclosure, reference will now be made to the embodiments illustrated in the drawings and specific language will be used to describe the same.
It should be noted that unless otherwise defined, technical or scientific terms used in the embodiments of the present disclosure should be given the ordinary meaning as understood by one of ordinary skill in the art to which the present disclosure pertains. The terms "first," "second," and the like, as used in embodiments of the present disclosure, do not denote any order, quantity, or importance, but rather are used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that elements or items preceding the word are included in the element or item listed after the word and equivalents thereof, but does not exclude other elements or items. The terms "connected" or "connected," and the like, are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "upper", "lower", "left", "right", etc. are used merely to indicate relative positional relationships, which may also be changed when the absolute position of the object to be described is changed.
It will be appreciated that prior to using the technical solutions disclosed in the embodiments of the present disclosure, the user should be informed and authorized of the type, usage range, usage scenario, etc. of the personal information related to the present disclosure in an appropriate manner according to the relevant legal regulations.
For example, in response to receiving an active request from a user, a prompt is sent to the user to explicitly prompt the user that the operation it is requesting to perform will require personal information to be obtained and used with the user. Thus, the user can autonomously select whether to provide personal information to software or hardware such as an electronic device, an application program, a server or a storage medium for executing the operation of the technical scheme of the present disclosure according to the prompt information.
As an alternative but non-limiting implementation, in response to receiving an active request from a user, the manner in which the prompt information is sent to the user may be, for example, a popup, in which the prompt information may be presented in a text manner. In addition, a selection control for the user to select to provide personal information to the electronic device in a 'consent' or 'disagreement' manner can be carried in the popup window.
It will be appreciated that the above-described notification and user authorization process is merely illustrative and not limiting of the implementations of the present disclosure, and that other ways of satisfying relevant legal regulations may be applied to the implementations of the present disclosure.
FIG. 1 illustrates a schematic diagram of a query architecture of an embodiment of the present disclosure. Referring to fig. 1, the query architecture 100 may include a server 110, a terminal 120, and a network 130 providing a communication link. The server 110 and the terminal 120 may be connected through a wired or wireless network 130. The server 110 may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, security services, CDNs, and the like.
The terminal 120 may be a hardware or software implementation. For example, when the terminal 120 is a hardware implementation, it may be a variety of electronic devices having a display screen and supporting page display, including but not limited to smartphones, tablets, e-book readers, laptop and desktop computers, and the like. When the terminal 120 is implemented in software, it may be installed in the above-listed electronic device; it may be implemented as a plurality of software or software modules (e.g., software or software modules for providing distributed services) or as a single software or software module, without limitation.
It should be noted that, the query method provided in the embodiment of the present application may be executed by the terminal 120 or may be executed by the server 110. It should be understood that the number of terminals, networks, and servers in fig. 1 are illustrative only and are not intended to be limiting. There may be any number of terminals, networks, and servers, as desired for implementation.
Fig. 2 shows a schematic hardware structure of an exemplary electronic device 200 provided by an embodiment of the disclosure. As shown in fig. 2, the electronic device 200 may include: processor 202, memory 204, network module 206, peripheral interface 208, and bus 210. Wherein the processor 202, the memory 204, the network module 206, and the peripheral interface 208 are communicatively coupled to each other within the electronic device 200 via a bus 210.
The processor 202 may be a central processing unit (Central Processing Unit, CPU), querier, neural Network Processor (NPU), microcontroller (MCU), programmable logic device, digital Signal Processor (DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), or one or more integrated circuits. The processor 202 may be used to perform functions related to the techniques described in this disclosure. In some embodiments, processor 202 may also include multiple processors integrated as a single logic component. For example, as shown in fig. 2, the processor 202 may include a plurality of processors 202a, 202b, and 202c.
The memory 204 may be configured to store data (e.g., instructions, computer code, etc.). As shown in fig. 2, the data stored by the memory 204 may include program instructions (e.g., program instructions for implementing the query methods of embodiments of the present disclosure) as well as data to be processed (e.g., the memory may store configuration files of other modules, etc.). The processor 202 may also access program instructions and data stored in the memory 204 and execute the program instructions to perform operations on the data to be processed. The memory 204 may include volatile storage or nonvolatile storage. In some embodiments, memory 204 may include Random Access Memory (RAM), read Only Memory (ROM), optical disks, magnetic disks, hard disks, solid State Disks (SSD), flash memory, memory sticks, and the like.
The network module 206 may be configured to provide communications with other external devices to the electronic device 200 via a network. The network may be any wired or wireless network capable of transmitting and receiving data. For example, the network may be a wired network, a local wireless network (e.g., bluetooth, wiFi, near Field Communication (NFC), etc.), a cellular network, the internet, or a combination of the foregoing. It will be appreciated that the type of network is not limited to the specific examples described above. In some embodiments, network module 306 may include any combination of any number of Network Interface Controllers (NICs), radio frequency modules, receivers, modems, routers, gateways, adapters, cellular network chips, etc.
Peripheral interface 208 may be configured to connect electronic device 200 with one or more peripheral devices to enable information input and output. For example, the peripheral devices may include input devices such as keyboards, mice, touchpads, touch screens, microphones, various types of sensors, and output devices such as displays, speakers, vibrators, and indicators.
Bus 210 may be configured to transfer information between the various components of electronic device 200 (e.g., processor 202, memory 204, network module 206, and peripheral interface 208), such as an internal bus (e.g., processor-memory bus), an external bus (USB port, PCI-E bus), etc.
It should be noted that, although the architecture of the electronic device 200 described above only shows the processor 202, the memory 204, the network module 206, the peripheral interface 208, and the bus 210, in a specific implementation, the architecture of the electronic device 200 may also include other components necessary to achieve normal operation. Furthermore, those skilled in the art will appreciate that the architecture of the electronic device 200 may also include only the components necessary to implement the embodiments of the present disclosure, and not all of the components shown in the figures.
The aPaaS (Application Platform as a Service, application platform, i.e. service) is an application development platform, and a user can quickly construct an application without codes or by using a small amount of codes. In the aPaaS platform, objects are abstractions of entities, and can be stored in a structured manner, each object can be understood as a table in a database, a field of an object can be understood as a table field in the database, and data of the object can be stored in the database finally. The user can perform data query on one object or a plurality of objects, when query is performed, query parameters such as output fields, screening conditions, sorting conditions and the like need to be specified, and the aPaaS can translate the query based on the objects into query sentences (such as SQL) of the database and submit the query sentences to the database for execution. When the amount of object data is different or the query parameters are different, the efficiency of execution of the query statement may vary widely. The query statement execution cost model of the database can also evaluate the consumption cost of the query on the system resource based on rules and the complexity of the query statement, for example, the sub-query cost weight is set to be 2, the associated query cost weight is set to be 5, and when the query statement contains 1 sub-query and 1 associated query, the total cost is 2×1+5×1=7. However, the resource consumption of data query is related to the complexity of query statement, and also has strong relevance to the size of data volume and data distribution. It is difficult to accurately evaluate the consumption of system resources by a query using the above method, so that an appropriate query policy cannot be determined to reduce the consumption of system resources by the query and improve the query efficiency. Therefore, how to improve the query efficiency and reduce the cost of computing resources is a technical problem to be solved.
In view of this, embodiments of the present disclosure provide a query method and related devices. The filtering condition of the query request and the type of the data source are used for predicting the total cost of the query required by the query, so that a proper query strategy can be determined for different query requests, the query efficiency can be improved, and the consumption of system resources can be reduced.
Referring to fig. 3, fig. 3 shows a schematic flow chart of a query method according to an embodiment of the disclosure. The query method according to the embodiment of the disclosure can be deployed at a server side. In fig. 3, the query method 300 may further include the following steps.
In step S310, a query request is received.
The query request may refer to a request for a query sent by a client to a server. The structured query language (Structured Query Language, SQL) of the query request in the aPaaS platform may include at least one of: output field lists, e.g., SELECT col1, col2, function (col 3). Wherein, the function is a calculation function, col1, col2, and function (col 3) are output fields; the query data source can be an object or sub-query, for example, FROM obj, which is a query data source in the form of an object; the association query, the supported association types may include inner, left, right, e.g., JOIN obj_i [ on JOIN CONDITIONS ] represents connecting object obj_i to CONDITIONS; filtering conditions, right values may be sub-queries, such as WHERE filter, which may be filtering conditions or sub-queries; grouping means, for example, GROUP BY col1, col 2. Ordering conditions, such as ORDER BY col1[ desc ], col2[ asc ],. Express ordering in the ORDER col1[ desc ], col2[ asc ],. A result display condition such as [ LIMIT n1] [ OFFSET n2], wherein LIMIT n1 represents the number of records displayed in the query result as n1; offset n2 indicates that the query results are displayed starting from n2 records.
In step S320, query parameters are obtained based on the query request parsing, where the query parameters include filtering conditions and data source types.
The query parameters may refer to query information obtained by parsing the query request, such as output fields, filtering conditions, data sources, types thereof, and the like. The filtering condition may refer to a condition that needs to be satisfied by an output field in the query request, for example, a filter included in a query statement "WHERE filter" is the filtering condition. A data source may refer to a collection of data that is to be searched or scanned by a query request, and a data source type refers to a category of data source, including a single object, sub-query, or multiple associated objects. For example, the type of the data source obj in the SQL statement "FROM obj" is a single object, and the query statement "FROM" can be followed by the data source representing the sub-query or multiple related objects.
Predicting a total query cost for the query request based on the filtering conditions and the data source type at step S330; wherein the data source type comprises a single object, a sub-query, or a plurality of associated objects.
Wherein, determining an appropriate query strategy requires evaluating consumption of system resources, thereby improving query performance. Besides being related to the complexity of query sentences, the resource consumption of the data query has strong relevance to the size and the data distribution of the data volume, and is difficult to accurately evaluate by using a calculation method of regularity. According to the method and the device for predicting the total cost of the query for different query requests based on the filtering conditions and the data source types, the consumption of resources can be accurately predicted, and reliable data basis is provided for accurately determining a proper query strategy.
In some embodiments, predicting a total cost of the query for the query request based on the filtering conditions and the data source type comprises:
predicting a filtered query cost of the query request based on the filtering condition;
predicting a data query cost of the query request based on the data source type;
and carrying out weighted summation based on the filtered query cost and the data query cost to obtain the query total cost.
The filtering query cost may refer to a cost generated when filtering based on filtering conditions. The data query costs may refer to costs incurred when scanning for different data source types. The total query cost may be obtained based on weighted summation of the filtered query cost and the data query cost, for example, the total query cost_total=the filtered query cost cost_filter×q1+the data query cost cost_from×q2, and q1 and q2 are weights corresponding to the filtered query cost_filter and the data query cost cost_from, respectively. For example, q1 and q2 are both within [0,1 ]. It will be appreciated that q1 and q2 may be the same or different, e.g., q1 may be 1 and q2 may not be 1.
In some embodiments, predicting a filtered query cost for the query request based on the filtering condition includes:
Judging whether the filtering condition comprises filtering sub-queries or not;
predicting a first filtering cost of the filtering sub-query in response to the filtering condition including the filtering sub-query;
predicting a second filtering cost of the filtering condition based on a result length of the filtering sub-query;
and carrying out weighted summation based on the first filtering cost and the second filtering cost to obtain the filtering query cost.
Specifically, it is determined whether the filtering condition includes a filtering sub-query, for example, whether there is a filtering sub-query (e.g., a right-value sub-query) in the filtering condition of the query term "WHERE filter 1". When the filtering condition of the query sentence "WHERE filter1" includes filtering sub-query filter1, a first filtering cost cost_filter1 of filtering sub-query filter1 can be calculated first; then, replacing the right value in the query statement, namely the filtered sub-query filter1, with a sub-query result length (result (filter 1)), carrying out cost calculation on the rewritten query statement to obtain a second filtering cost cost_length (result (filter 1)), and finally, obtaining the corresponding weights of the first filtering cost cost_filter1 and the second filtering cost cost_length (result (filter 1)) by filtering the query cost cost_filter=the first filtering cost_filter1×q3+the second filtering cost cost_length (result (filter 1))×q4, wherein q3 and q4 are respectively. It should be appreciated that q3 and q4 may be the same or different.
In some embodiments, predicting a filtered query cost of the query request based on the filtering condition further comprises:
and predicting a data query cost of the query request based on the data source type in response to the filtering condition not including the filtering sub-query.
Wherein, referring to fig. 4, fig. 4 shows a schematic diagram of query total cost calculation according to an embodiment of the present disclosure. In fig. 4, a server receives a query request, and may perform parameter analysis on the query request to obtain a query parameter. Judging whether the query request comprises a filter sub-query according to the query parameters, if the query request comprises the filter sub-query, extracting the sub-query in the filter, and calculating the filtering query cost sub_filter_cost of the filter sub-query. The from source of the query request may then be determined. If the query request does not include a filter sub-query, the from source may be further determined directly.
In some embodiments, the data source type comprises a single object;
predicting a data query cost for the query request based on the data source type, including:
predicting a first screening cost, a first ordering cost and a first calculated field cost of the query request based on first object statistics and first field statistics of the single object;
And obtaining the data query cost based on the sum of the first screening cost, the first sorting cost and the first calculation field cost.
Wherein the data source type, when comprising a single object, illustrates that the query request is a single object query that does not contain both the associated query and the sub-query. As shown in fig. 4, for a single object from a source, a data query cost may be predicted based on first statistical information of the single object. The first statistics of the single object may include first object statistics and first field statistics. The data query cost base_query_cost of the query request may be derived based on a sum of the first filtering cost, the first ordering cost, and the first calculated field cost. The first filtering cost, the first sorting cost and each part of cost of the first calculation field cost can be estimated based on statistical information when calculating the cost.
In some embodiments, the first object statistics include at least one of: the total number of records or the number of undeleted records of the single object, and the first field statistical information comprises at least one of the following: the number of the multi-value type average values of the null rate output field of the output field, the number of different values in the output field, the high-frequency word distribution of the output field and the histogram distribution of the output field; the query parameters further include: screening conditions, sorting conditions and calculation fields;
Predicting a first screening cost, a first ordering cost, and a first computed field cost of the query request based on the first object statistics and the first field statistics of the single object, comprising:
determining a selectivity of the screening condition based on the first object statistics and the first field statistics; determining the number of scanning records to be screened based on the selectivity to obtain the first screening cost;
obtaining a corresponding screening result record number based on the screening condition, and determining the first sorting cost based on the screening result record number and the sorting condition;
and determining the operation record number of the associated index field based on the associated index field of the calculated field and the corresponding operation, and obtaining the cost of the first calculated field.
Wherein the first object statistics may include at least one of: sum_record, number of undeleted records num_not_delete; the first field statistics may include at least one of: null rate of output field, number of multi-value type average values num_avg of output field, number of different values num_difference of output field, high frequency word distribution of output field (100 field values with highest frequency of occurrence and frequency of occurrence) num_high-frequency, histogram distribution of output field. Base group Query parameters including screening conditions, sorting conditions, calculation fields and the like can be obtained from the analysis of the query request. The filtering condition may be any field in a table of the database, and filtering costs are different when different fields are used for filtering, for example, filtering costs are different when indexes are included in the filtering condition and filtering costs are different when indexes are not included in the filtering condition; the sorting condition may refer to a condition for sorting the number of the query pre-estimated returned records; the computation field may refer to a field that redefines data source fields (including dimensions, metrics, parameters, etc.) using functions and operators to construct formulas. Specifically, the screening cost may be the number of records in which the number of scans is evaluated by calculating the selectivity of the screening conditions through statistical information. The sorting cost may refer to the number of operation records that sort the screening result based on the sorting condition; for example, if the record number recordcount returned by the query estimate is greater than a first preset value (e.g., 1 ten thousand), the ranking cost getordercount=ranking weight If the number recordcount of the filtering results is less than or equal to the first preset value (e.g. 1 ten thousand), the ranking cost getordercount=ranking weight Calculating the field cost can refer to determining the operation record number of the associated index field according to the associated index field of the calculated field and the corresponding logic operation; for example, calculate field cost getcomputefildscost = sum of various types of calculate field costs x recordcount, where each type of calculate field cost is: the weight corresponding to each type of calculation field is multiplied by the number of calculation fields; the weights corresponding to the various types of computation fields may be determined based on the different computation fields, e.g., the weights corresponding to the computation fields may be 5. Summing the first filtering cost, the first sorting cost and the first calculated field cost of the single object to obtain a single pair Data query cost base_query_cost like data source type. Further, the total query cost_total=data query cost base_query_cost+the filtered query cost sub_filter_cost×q5 of the query request can be obtained by combining the filtered query cost, and q5 is the weight of the filtered query cost sub_filter_cost.
In some embodiments, the data source type includes a data source sub-query;
predicting a data query cost for the query request based on the data source type, including:
predicting the data source sub-query cost of the data source sub-query;
determining sub-query virtual objects and corresponding second statistical information based on data source sub-query results of the data source sub-queries; wherein the second statistical information comprises second object statistical information and second field statistical information of the data source sub-query result;
predicting sub-query virtual object costs of the sub-query virtual object based on the second statistical information;
and carrying out weighted summation based on the sub-query cost of the data source and the virtual object cost of the sub-query to obtain the data query cost.
The data source type comprises sub-query results which indicate that the data source scanned by the query request is the sub-query when the sub-query is performed. As shown in fig. 4, for the sub-query with the from source, the sub-query part may be disassembled to perform cost calculation sub-query_cost, specifically, the sum of the filtering cost, the sorting cost and the calculated field cost of the sub-query may be calculated based on the object statistics information and the field statistics information of the sub-query, to obtain the data source sub-query cost sub-query_cost. And replacing the sub-query with the sub-query virtual object v_obj, constructing second statistical information of the sub-query virtual object v_obj, and calculating the sum of the screening cost, the sorting cost and the calculation field cost of the virtual object v_obj based on the second statistical information to obtain the sub-query virtual object cost vquery_cost. The sub-query virtual object may refer to a result set of the disassembled sub-query part, that is, an intermediate result of the query request, where each record in the sub-query result set has the same data and different structure, similar to the real object, and may represent the structure by a field. Accordingly, the data query cost is obtained based on weighted summation of the sub-query cost sub-query_cost and the sub-query virtual object cost vquery_cost. Further, the total query cost_total=the data query cost vquery_cost+sub_query_cost×q6+the filtered query cost sub_filter_cost×q7 of the query request can be obtained by combining the filtered query cost, q6 is the weight of the sub query cost of the data source, and q7 is the weight of the filtered query cost sub_filter_cost. It will be appreciated that q6 and q7 may be the same or different.
In some embodiments, the second object statistics include a record number of sub-query results for the data source; the second field statistical information comprises statistical information of an output field of the data source sub-query;
predicting a sub-query virtual object cost for the sub-query virtual object based on the second statistical information, comprising:
predicting a second screening cost, a second ordering cost and a second calculated field cost of the query request based on the second object statistics and the second field statistics;
and obtaining the sub-query virtual object cost based on the sum of the second screening cost, the second sorting cost and the second calculation field cost.
The second field statistics may further include statistics of calculation fields of the data source sub-query, and may be obtained according to a calculation mode corresponding to the calculation fields and statistics of output fields of the data source sub-query on which the calculation fields depend. And combining the screening condition, the sorting condition, the calculated field and other query parameters to obtain a second screening cost, a second sorting cost and a second calculated field cost of the query request.
In some embodiments, the data source type includes a plurality of associated objects;
Predicting a data query cost for the query request based on the data source type, including:
predicting the associated operation cost of the plurality of associated objects;
determining an associated virtual object and corresponding third statistical information based on association operation results of the plurality of associated objects; wherein the third statistical information comprises third object statistical information and third field statistical information of the association operation result;
predicting an associated virtual object query cost of the associated virtual object based on the third statistical information;
and carrying out weighted summation based on the association operation cost and the association virtual query cost to obtain the data query cost.
The data source type comprises a query result which indicates that the data source scanned by the query request is a plurality of association objects in association with the query. As shown in fig. 4, for a plurality of associated objects from a source, a part of the plurality of associated objects may be disassembled to perform cost calculation join_cost, specifically, a sum of filtering cost, sorting cost and field cost of the plurality of associated objects may be calculated based on object statistics information and field statistics information of the plurality of associated objects, to obtain an associated operation cost join_cost. For example, the associated operation cost join_cost may be the sum of the number of scans per table x the cost per scan. And replacing the plurality of associated objects with the associated virtual object v_obj, constructing third statistical information of the associated virtual object v_obj, and calculating the sum of the screening cost, the sorting cost and the calculated field cost of the virtual object v_obj based on the third statistical information to obtain the associated virtual object query cost vquery_cost.
Accordingly, the data query cost is obtained by carrying out weighted summation based on the associated operation cost join_cost and the associated virtual object query cost vquery_cost. Further, the total query cost_total=data query cost vquery_cost+join_cost×q8+filter query cost sub_filter_cost×q9 of the query request can be obtained by combining the filter query cost, q8 is the weight of the associated operation cost join_cost, and q9 is the weight of the filter query cost sub_filter_cost. It should be appreciated that q8 and q9 may be the same or different.
In some embodiments, the third object statistics include a number of outcome records of association operations of the plurality of associated objects; the third field statistical information comprises field statistical information of each association object and calculated field statistical information of corresponding association operation;
predicting an associated virtual object query cost for the associated virtual object based on the third statistical information, comprising:
predicting a third screening cost, a third ordering cost and a third calculated field cost of the query request based on the third object statistics and the third field statistics;
and obtaining the data query cost based on the sum of the third screening cost, the third sorting cost and the third calculation field cost.
The third field statistics may further include statistics of calculation fields of a plurality of associated objects, and may be obtained according to a calculation mode corresponding to the calculation field and statistics of output fields of the data source sub-query on which the calculation field depends. And combining the screening condition, the sorting condition, the calculated field and other query parameters to obtain a third screening cost, a third sorting cost and a third calculated field cost of the query request.
It can be seen that, according to the query method of the embodiment of the present disclosure, for a query request including a sub-query or a related query of multiple objects, a sub-query portion or multiple object portions may be disassembled to perform data query cost calculation; and after the calculation is completed, replacing the sub-query part or a plurality of object parts with virtual objects, and constructing the statistical information of the virtual objects. The data source of the virtual object may include a result set of the disassembled part, and the statistical information of the virtual object may include the estimated result record number of the disassembled part, i.e. the object statistical information; the statistics of the virtual object may also include output fields and calculation fields of the disassembled part, i.e. field statistics; such as the output fields of a sub-query or all object fields at the time of the association operation. Wherein, the calculation field can be constructed according to the calculation mode and the statistical information of the field on which the calculation field depends. For example, the calculation field COALESCE (f 1, f2, f 3) represents the first non-null value in the return field f1, f2, f 3; the statistics of this calculation field coaesece (f 1, f2, f 3) may comprise at least one of the following: null rate, i.e., the product of the null rates of fields f1, f2, f 3; the average number of the multi-value types can be 0 (the output result of the calculation field is a single-value type); the number of different values can be the average value of the number of different values of f1, f2 and f 3; the high frequency word distribution/histogram distribution may be none. For another example, the calculation field CAST (f as int) indicates that the field f is type-converted, and the statistics of this calculation field CAST (f as int) may include at least one of: null rate, which may be the null rate of field f; the average number of the multi-value types can be 0 (the output result of the calculation field is a single-value type); the number of different values may be the number of different values of field f; the high-frequency word distribution can be that the value of each high-frequency word in the high-frequency word distribution of the field f is converted into an int type; the histogram distribution may be a conversion of the value of each histogram in the histogram distribution of the field f into an int type. And then calculating the sum of the screening cost, the sorting cost and the calculated field cost of the disassembled part based on the statistical information of the virtual object to obtain the query cost of the virtual object. And carrying out weighted summation on the query cost of the sub-query part or the plurality of object parts and the virtual object query cost, and obtaining the data query cost of the query request.
In step S340, a query policy is determined based on the query total cost to execute the query request.
In some embodiments, determining a query policy based on the query total cost may include:
determining the difference between the corresponding query total cost under the preset index information and the query total cost under the no index information to obtain the query benefits of the preset index information;
determining preset index information corresponding to the maximum value in the query benefits as recommended index information;
recommending the recommendation index information to a user for executing the query.
Specifically, index information recommendation can be performed based on query benefits after the index is created on the fields, and fields with higher query benefits can be recommended to the user. The calculation mode of the query benefits can be based on the query total cost, as described above, index information is considered when the filtering cost is calculated, the preset field index information can be used for replacing real index information to calculate the query total cost after_cost, and the query total cost before_cost without indexes, so that the benefits of the preset field index information are before_cost-after_cost. The preset index information corresponding to max (before_cost-after_cost) can be recommended to the user.
In some embodiments, determining a query policy based on the query total cost may include:
determining a concurrent execution threshold of the query request based on the query total cost;
and refusing to execute the query request in response to the concurrency number of the hash value of the query request being higher than the concurrency execution threshold.
In some embodiments, determining a query policy based on the query total cost may include:
and allowing the query request to be executed in response to the concurrency number of hash values of the query request being less than the concurrency execution threshold.
Specifically, a hash value may be calculated for each query request according to the query parameter, where the query requests with the same hash value may be limited by the concurrency amount, where the limit threshold for the concurrency amount may be determined according to the total cost of the query, for example, the concurrency execution threshold may be 20 when the total cost of the query is between 100 and 500, and the concurrency execution threshold may be 10 when the total cost of the query is between 500 and 1000. And counting the concurrency quantity of the corresponding hash values when one query request is received each time, and rejecting the current query request when the concurrency quantity exceeds a concurrency execution threshold. Thus, different concurrent current limiting thresholds are set according to the resource consumption of the query request, and the thresholds with more resource consumption are lower, so that excessive system resources are prevented from being occupied.
In some embodiments, determining a query policy based on the query total cost may include:
calculating the sum of the query total costs of the query requests in a preset time window based on a preset dimension;
and in response to the load rate of the master database being above a load rate threshold, routing the query requests associated with the preset dimension and the sum of the query total costs being above a preset value to a slave database.
The preset dimension may include an application package or a traffic source, among others. The resources of the database are limited and when the database is overloaded, submitting a request for computational cost to the database can further burden the database. In order to avoid downtime of the database caused by too high load, the query submitted by the user needs to be limited, and the limiting basis can also be derived from the consumption of the query on system resources. Specifically, when a query request is received, the total cost of the query can be calculated, and the sum of the total cost of the query in a time window is counted according to multiple dimensions (application package and flow source), when the database load is high (the CPU or the memory usage rate exceeds a certain threshold), the dimension with the higher sum of the total cost of the current query is selected as a service degradation target, for example, if the sum of the total cost of the query of the application package A is highest, all requests from the application package A are routed to the slave library, and when the database load is recovered to be normal, the request of the application package A is recovered to the master library.
Therefore, according to the query method of the embodiment of the disclosure, the aPaaS platform developer is assisted to establish a proper index through index recommendation, so that the query performance is improved; the query is limited through current limiting and database overload protection, so that the resource consumption of platform users is balanced, and the stability of the system is improved while the fairness is ensured.
It should be noted that the method of the embodiments of the present disclosure may be performed by a single device, such as a computer or a server. The method of the embodiment can also be applied to a distributed scene, and is completed by mutually matching a plurality of devices. In the case of such a distributed scenario, one of the devices may perform only one or more steps of the methods of embodiments of the present disclosure, the devices interacting with each other to accomplish the methods.
It should be noted that the foregoing describes some embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments described above and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing are also possible or may be advantageous.
Based on the same technical concept, corresponding to the method of any embodiment, the disclosure further provides a query device, referring to fig. 5, where the query device includes:
the receiving module is used for receiving the query request;
the analysis module is used for analyzing and obtaining query parameters based on the query request, wherein the query parameters comprise filtering conditions and data source types;
a cost module for predicting a total cost of the query request based on the filtering condition and the data source type; wherein the data source type comprises a single object, a sub-query, or a plurality of associated objects;
and the strategy module is used for determining a query strategy based on the query total cost so as to execute the query request.
For convenience of description, the above devices are described as being functionally divided into various modules, respectively. Of course, the functions of the various modules may be implemented in the same one or more pieces of software and/or hardware when implementing the present disclosure.
The device of the foregoing embodiment is configured to implement the corresponding query method in any of the foregoing embodiments, and has the beneficial effects of the corresponding method embodiment, which is not described herein.
Based on the same technical concept, corresponding to any of the above embodiments of the method, the present disclosure further provides a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the query method as described in any of the above embodiments.
The computer readable media of the present embodiments, including both permanent and non-permanent, removable and non-removable media, may be used to implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static Random Access Memory (SRAM), dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), read Only Memory (ROM), electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device.
The storage medium of the foregoing embodiments stores computer instructions for causing the computer to execute the query method described in any of the foregoing embodiments, and has the beneficial effects of corresponding method embodiments, which are not described herein.
Those of ordinary skill in the art will appreciate that: the discussion of any of the embodiments above is merely exemplary and is not intended to suggest that the scope of the disclosure, including the claims, is limited to these examples; the technical features of the above embodiments or in the different embodiments may also be combined under the idea of the present disclosure, the steps may be implemented in any order, and there are many other variations of the different aspects of the embodiments of the present disclosure as described above, which are not provided in details for the sake of brevity.
Additionally, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown within the provided figures, in order to simplify the illustration and discussion, and so as not to obscure the embodiments of the present disclosure. Furthermore, the devices may be shown in block diagram form in order to avoid obscuring the embodiments of the present disclosure, and this also accounts for the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform on which the embodiments of the present disclosure are to be implemented (i.e., such specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative in nature and not as restrictive.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of those embodiments will be apparent to those skilled in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the embodiments discussed.
The disclosed embodiments are intended to embrace all such alternatives, modifications and variances which fall within the broad scope of the appended claims. Accordingly, any omissions, modifications, equivalents, improvements, and the like, which are within the spirit and principles of the embodiments of the disclosure, are intended to be included within the scope of the disclosure.

Claims (13)

1. A method of querying, the method comprising:
receiving a query request;
analyzing and obtaining query parameters based on the query request, wherein the query parameters comprise filtering conditions and data source types;
predicting a total query cost for the query request based on the filtering conditions and the data source type; wherein the data source type comprises a single object, a sub-query, or a plurality of associated objects;
and determining a query strategy based on the query total cost to execute the query request.
2. The method of claim 1, wherein predicting a total query cost for the query request based on the filtering conditions and the data source type comprises:
predicting a filtered query cost of the query request based on the filtering condition;
predicting a data query cost of the query request based on the data source type;
and carrying out weighted summation based on the filtered query cost and the data query cost to obtain the query total cost.
3. The method of claim 2, wherein predicting a filtered query cost for the query request based on the filtering condition comprises:
judging whether the filtering condition comprises filtering sub-queries or not;
predicting a first filtering cost of the filtering sub-query in response to the filtering condition including the filtering sub-query;
predicting a second filtering cost of the filtering condition based on a result length of the filtering sub-query;
and carrying out weighted summation based on the first filtering cost and the second filtering cost to obtain the filtering query cost.
4. The method of claim 2, wherein the data source type comprises a single object;
predicting a data query cost for the query request based on the data source type, including:
predicting a first screening cost, a first ordering cost and a first calculated field cost of the query request based on first object statistics and first field statistics of the single object;
and obtaining the data query cost based on the sum of the first screening cost, the first sorting cost and the first calculation field cost.
5. The method of claim 4, wherein the first object statistics include at least one of: the total number of records or the number of undeleted records of the single object, and the first field statistical information comprises at least one of the following: the null rate of the output field, the average value number of the multi-value types of the output field, the number of different values in the output field, the high-frequency word distribution of the output field and the histogram distribution of the output field; the query parameters further include: screening conditions, sorting conditions and calculation fields;
Predicting a first screening cost, a first ordering cost, and a first computed field cost of the query request based on the first object statistics and the first field statistics of the single object, comprising:
determining a selectivity of the screening condition based on the first object statistics and the first field statistics; determining the number of scanning records to be screened based on the selectivity to obtain the first screening cost;
obtaining a corresponding screening result record number based on the screening condition, and determining the first sorting cost based on the screening result record number and the sorting condition;
and determining the operation record number of the associated index field based on the associated index field of the calculated field and the corresponding operation, and obtaining the cost of the first calculated field.
6. The method of claim 2, wherein the data source type comprises a data source sub-query;
predicting a data query cost for the query request based on the data source type, including:
predicting the data source sub-query cost of the data source sub-query;
determining sub-query virtual objects and corresponding second statistical information based on data source sub-query results of the data source sub-queries; wherein the second statistical information comprises second object statistical information and second field statistical information of the data source sub-query result;
Predicting sub-query virtual object costs of the sub-query virtual object based on the second statistical information;
and carrying out weighted summation based on the sub-query cost of the data source and the virtual object cost of the sub-query to obtain the data query cost.
7. The method of claim 6, wherein the second object statistics include a record count of the data source sub-query results; the second field statistical information comprises statistical information of an output field of the data source sub-query;
predicting a sub-query virtual object cost for the sub-query virtual object based on the second statistical information, comprising:
predicting a second screening cost, a second ordering cost and a second calculated field cost of the query request based on the second object statistics and the second field statistics;
and obtaining the sub-query virtual object cost based on the sum of the second screening cost, the second sorting cost and the second calculation field cost.
8. The method of claim 2, wherein the data source type comprises a plurality of associated objects;
predicting a data query cost for the query request based on the data source type, including:
Predicting the associated operation cost of the plurality of associated objects;
determining an associated virtual object and corresponding third statistical information based on association operation results of the plurality of associated objects; wherein the third statistical information comprises third object statistical information and third field statistical information of the association operation result;
predicting an associated virtual object query cost of the associated virtual object based on the third statistical information;
and carrying out weighted summation based on the association operation cost and the association virtual query cost to obtain the data query cost.
9. The method of claim 8, wherein the third object statistics include a result record number of association operations of the plurality of associated objects; the third field statistical information comprises field statistical information of each association object and calculated field statistical information of corresponding association operation;
predicting an associated virtual object query cost for the associated virtual object based on the third statistical information, comprising:
predicting a third screening cost, a third ordering cost and a third calculated field cost of the query request based on the third object statistics and the third field statistics;
And obtaining the associated virtual object query cost based on the sum of the third screening cost, the third sorting cost and the third calculation field cost.
10. The method of claim 1, wherein determining a query policy based on the query total cost comprises:
determining the difference between the corresponding query total cost under the preset index information and the query total cost under the no index information to obtain the query benefits of the preset index information;
determining preset index information corresponding to the maximum value in the query benefits as recommended index information;
recommending the recommendation index information to a user for executing the query;
or,
determining a concurrent execution threshold of the query request based on the query total cost;
refusing to execute the query request in response to the concurrency number of hash values of the query request being higher than the concurrency execution threshold;
or,
calculating the sum of the query total costs of the query requests in a preset time window based on a preset dimension;
and in response to the load rate of the master database being above a load rate threshold, routing the query requests associated with the preset dimension and the sum of the query total costs being above a preset value to a slave database.
11. A query device, comprising:
the receiving module is used for receiving the query request;
the analysis module is used for analyzing and obtaining query parameters based on the query request, wherein the query parameters comprise filtering conditions and data source types;
a cost module for predicting a total cost of the query request based on the filtering condition and the data source type; wherein the data source type comprises a single object, a sub-query, or a plurality of associated objects;
and the strategy module is used for determining a query strategy based on the query total cost so as to execute the query request.
12. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 10 when the program is executed.
13. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 10.
CN202311747367.5A 2023-12-18 2023-12-18 Query method and related equipment Pending CN117591744A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311747367.5A CN117591744A (en) 2023-12-18 2023-12-18 Query method and related equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311747367.5A CN117591744A (en) 2023-12-18 2023-12-18 Query method and related equipment

Publications (1)

Publication Number Publication Date
CN117591744A true CN117591744A (en) 2024-02-23

Family

ID=89911589

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311747367.5A Pending CN117591744A (en) 2023-12-18 2023-12-18 Query method and related equipment

Country Status (1)

Country Link
CN (1) CN117591744A (en)

Similar Documents

Publication Publication Date Title
KR101939554B1 (en) Determining a temporary transaction limit
US10936765B2 (en) Graph centrality calculation method and apparatus, and storage medium
CN109101658B (en) Information searching method and device, and equipment/terminal/server
US11250250B2 (en) Pedestrian retrieval method and apparatus
US20160328445A1 (en) Data Query Method and Apparatus
US20240126817A1 (en) Graph data query
CN111625561B (en) Data query method and device
US10810458B2 (en) Incremental automatic update of ranked neighbor lists based on k-th nearest neighbors
CN113609409B (en) Method and system for recommending browsing information, electronic equipment and storage medium
KR101568800B1 (en) Real-time issue search word sorting method and system
CN110442616B (en) Page access path analysis method and system for large data volume
CN117591744A (en) Query method and related equipment
CN114637790A (en) List data query method and device, electronic equipment and storage medium
CN114491253A (en) Observation information processing method, device, electronic device and storage medium
CN110362603B (en) Feature redundancy analysis method, feature selection method and related device
CN106484747A (en) A kind of webpage item recommendation method based on alternative events and device
CN112434056A (en) Method and device for inquiring detailed data
CN110750555A (en) Method, apparatus, computing device, and medium for generating index
CN112395510A (en) Method and device for determining target user based on activity
CN105447020A (en) Method and apparatus for determining business object keywords
CN117056663B (en) Data processing method and device, electronic equipment and storage medium
CN116561735B (en) Mutual trust authentication method and system based on multiple authentication sources and electronic equipment
CN117389514B (en) Method, device and system for classifying micro-application development requirements of fusion terminal
CN114021009A (en) Searching method and device, electronic equipment and readable storage medium
CN114356292A (en) Interactive information processing method and device and computer equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination