WO2019165762A1 - Method and apparatus for sampling query - Google Patents

Method and apparatus for sampling query

Info

Publication number
WO2019165762A1
WO2019165762A1 PCT/CN2018/100561 CN2018100561W WO2019165762A1 WO 2019165762 A1 WO2019165762 A1 WO 2019165762A1 CN 2018100561 W CN2018100561 W CN 2018100561W WO 2019165762 A1 WO2019165762 A1 WO 2019165762A1
Authority
WO
WIPO (PCT)
Prior art keywords
query
data
index partitions
target
index
Prior art date
Application number
PCT/CN2018/100561
Other languages
English (en)
French (fr)
Inventor
毕杰山
钟超强
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Publication of WO2019165762A1 publication Critical patent/WO2019165762A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load

Definitions

  • the present application relates to the field of storage and, more particularly, to a method and apparatus for sampling queries in the field of storage.
  • index information for querying data can be generated based on data.
  • the index information may be stored separately in multiple index partitions, so that after receiving a query request sent by a client, the server can query the multiple index partitions simultaneously for data that meets the query condition, and finally feed back to the client all the data in the multiple index partitions that satisfies the query condition.
  • the present application provides a method and apparatus for sampling query, which can effectively reduce the response delay of the feedback query result and improve the user experience.
  • a method for sampling query is provided, comprising:
  • determining N target index partitions from the M index partitions according to the sampling query ratio, so that the ratio of N to M corresponds to the sampling query ratio, wherein the M index partitions are determined according to the query condition, M is an integer greater than 1, N is an integer greater than or equal to 1, and N is less than M;
  • querying data in the N target index partitions according to the query condition and generating a query result according to the queried data, wherein the query result includes at least one of the following: the number of data items in the M index partitions that satisfy the query condition, and the data in the N target index partitions that satisfies the query condition;
  • the method for sampling query provided in the present application uses the sampling query ratio to determine a subset of index partitions (that is, the N target index partitions) from the M index partitions determined based on the query condition, so that the data query needs to be performed only in the N target index partitions. The generated query result includes at least one of the number of data items in the M index partitions that satisfy the query condition and the data in the N target index partitions that satisfies the query condition. This avoids the long response delay that arises in the prior art when the query results of all M index partitions must be fed back, which effectively reduces the response delay of the query result and improves the user experience.
  • in addition, compared with the prior art, in which the data query is performed in all M index partitions, the embodiment of the present application occupies fewer computing resources, so that the device can perform data queries for other query conditions in the remaining index partitions of the M index partitions, thereby increasing query concurrency.
  • the generating the query result according to the queried data includes:
  • the method further includes:
  • the determining N target index partitions from the M index partitions according to the sampling query ratio includes:
  • the method for sampling query provided in the present application uses the sampling query ratio and the load information of the computing nodes corresponding to the query condition to determine the index partitions of the N target computing nodes with the smallest load as the N target index partitions.
  • in this way, the load of the computing nodes, including the N target computing nodes, can be kept relatively balanced, and querying data through the N target index partitions can further improve the query speed.
  • the method before the generating the query result according to the queried data, the method further includes:
  • before generating the query result according to the queried data, the device generates updated load information of the N target computing nodes by adding the resource consumption value of the query condition to the load of the N target computing nodes.
  • the updated load information gives the device accurate, up-to-date data when it selects target index partitions for subsequent query conditions, so that the load of the computing nodes is kept as uniform as possible, which helps to reduce the response delay.
  • the calculating, according to the query condition, the resource consumption value of the computing resource consumed in the data query process including:
  • the resource consumption values of all the query nodes corresponding to the at least one word segment are summed to obtain the resource consumption value of the query condition.
  • the method further includes:
  • the resource consumption value is subtracted from the load indicated by the updated load information of the N target computing nodes.
  • the device deducts the resource consumption value of the query condition from the load of the N target computing nodes to release the computing resources.
  • this gives the device accurate, up-to-date data when it selects target index partitions for subsequent query conditions, so that the load of the computing nodes is kept as uniform as possible, which helps to reduce the response delay.
  • the query condition includes the sampling query ratio.
  • an apparatus for sampling query is provided, which is configured to perform the method of the first aspect or any possible implementation of the first aspect, and which comprises means for performing that method.
  • an apparatus for sampling query comprising a processor and a memory; the memory is configured to store computer-executable instructions, and the processor and the memory communicate with each other through an internal connection path.
  • when the apparatus is running, the processor executes the computer-executable instructions stored in the memory, causing the apparatus to perform the processes of the first aspect or any possible implementation of the first aspect.
  • a computer storage medium comprising computer-executable instructions that, when executed by a processor of a computer, cause the computer to perform the processes of the first aspect or any possible implementation of the first aspect.
  • a chip comprising a processor and a memory, wherein the processor is configured to execute instructions stored in the memory, and when the instructions are executed, the processor can implement the first aspect or any possible implementation of the first aspect.
  • FIG. 1 is a schematic diagram of a data storage system suitable for use in embodiments of the present application.
  • FIG. 2 is a schematic diagram of querying data in multiple index partitions based on query conditions in the prior art.
  • FIG. 3 is a schematic flowchart of a method of sampling a query according to an embodiment of the present application.
  • FIG. 4 is a schematic block diagram of an apparatus for sampling a query in accordance with an embodiment of the present application.
  • FIG. 5 is a schematic structural diagram of an apparatus for sampling a query according to an embodiment of the present application.
  • FIG. 6 is a schematic structural diagram of a chip according to an embodiment of the present application.
  • FIG. 1 is a schematic diagram of a data storage system suitable for use in an embodiment of the present application.
  • the data storage system 100 includes a terminal device 110 and a sampling query device 120; the terminal device 110 can be connected to the device 120 via a wired or wireless network.
  • the terminal device 110 has a function of requesting data queries.
  • a client capable of requesting data queries can be installed on the terminal device 110; for example, the client can be a browser.
  • the terminal device 110 may be a mobile phone, a tablet computer, an e-reader, a personal computer, an in-vehicle device, a wearable device, or the like.
  • the terminal device 110 also has a function of requesting data storage.
  • the sampling query device 120 has a data query function and can perform data queries based on a query request sent from the terminal device 110.
  • the sampling query device 120 may be any device for querying data, such as a computing device, a storage device, or a server.
  • the sampling query device 120 also has a data storage function, and a database set up in the sampling query device 120 is used to store data.
  • the database may be a distributed database such as HBase, MongoDB, Distributed Relational Database Service (DRDS), VoltDB, or Cassandra.
  • the sampling query device 120 can store data in the database based on a data storage request sent by a user through a client of the terminal device 110.
  • FIG. 1 is for illustrative purposes only and should not be construed as limiting the embodiments of the present application.
  • for example, the data storage system may include only the sampling query device 120, which then has not only the query function but also the function of requesting data queries.
  • in that case, the sampling query device 120 can receive the query condition that the user enters through a client in the sampling query device 120.
  • the following description takes the case in which the sampling query device 120 is a storage device as an example.
  • index information generated based on the data in one data table may be stored separately in multiple index partitions. The index information in each index partition corresponds to a part of the data in the data table, the index information in any two index partitions is different, and the index information stored in the multiple index partitions together forms the complete index information of the data table.
  • the index information includes a correspondence between a plurality of word segments and a plurality of data identifiers, and one word segment corresponds to at least one data identifier.
  • a word segment is a tag value of the object identified by the corresponding data identifier.
  • the object identified by the data identifier may be any natural object.
  • the object identified by the data identifier may be a person, a car, a phone number, a virtual user account, or the like.
  • a tag is a way of organizing Internet content; it characterizes a certain feature of an object to help people describe and classify content. For example, common tags are gender, education, occupation, color, and so on. Tags are usually defined by people.
  • the tag value is the specific value of a tag. For example, if the tag is gender and the gender is male, the tag value is male; if the tag is education and the education is undergraduate, the tag value is undergraduate.
  • Table 1 shows a data table for representing user data in a cell.
  • the first column in the data table is the data identifier, and the second to fourth columns are the different tag values of the object identified by the data identifier.
  • the object identified by a data identifier is a user in the cell, and the user's ID serves as the data identifier.
  • take the data identifier A01 as an example: the characteristics of the user identified by A01 are that the gender is male, the education is undergraduate, and the occupation is student.
  • after the storage device stores the data in Table 1, index information for Table 1 can be established to make queries easier, and the index information is stored in multiple index partitions to speed up queries.
  • for example, assume that the index information of the data in Table 1 is stored in three index partitions, namely index partition #1, index partition #2, and index partition #3, and each index partition stores the index information corresponding to a part of the data, where:
  • Index partition #1 stores the index information of the data identified by A01, A02, B01, and B02;
  • Index partition #2 stores the index information of the data identified by C01, C02, D01, and D02;
  • Index partition #3 stores the index information of the data identified by E01, E02, F01, and F02.
  • Table 2 shows the index information in each index partition of the three index partitions.
  • take the case in which the storage device receives a query request sent by the terminal device through the client as an example.
  • the client of the terminal device can send the query request directly to the storage device.
  • in the prior art, the storage device searches the index information of every index partition for data satisfying the query condition and, only after the query has completed in all of the index partitions, feeds back to the terminal device the data in those index partitions that satisfies the query condition.
  • the response delay of this query mode is long, which affects the user experience.
  • moreover, thousands of data items or more may satisfy the query condition, while the user may only want to know how many data items satisfy it.
  • the present application provides a method for sampling queries, which helps to reduce the response delay of query results and improve user experience.
  • FIG. 3 is a schematic flowchart of a method for sampling query according to an embodiment of the present application.
  • the method 200 may be executed by a storage device serving as the sampling query device, and specifically by a processor in the storage device.
  • the steps shown in FIG. 3 are described in detail below.
  • in step S210, a query condition is acquired.
  • the storage device may receive the query condition sent by the client of the terminal device, or the storage device may directly receive the query request sent by the client of the storage device to the processor of the storage device.
  • the query condition includes at least one word segment.
  • when the query condition includes a plurality of word segments, the query condition further includes a logical operator connecting each two adjacent word segments, where the logical operators include "and", "not", and "or".
  • "and" can be expressed as "&&", "not" as "!", and "or" as "||".
  • for example, the query condition may be: gender: male && education: undergraduate && occupation: student; that is, an object to be queried must satisfy all three tag values (the three word segments) in the query condition at the same time.
  • N target index partitions are determined from the M index partitions according to the sampling query ratio, such that the ratio of N to M corresponds to the sampling query ratio, wherein the M index partitions are determined according to the query condition, M is an integer greater than 1, N is an integer greater than or equal to 1, and N is less than M;
  • the sampling query ratio is a value associated with the query condition; it represents a preset ratio of the number of index partitions in which data is actually queried to the number of index partitions to be queried.
  • the index partitions to be queried are the M index partitions.
  • the storage device determines, from the M index partitions and based on the sampling query ratio, the N target index partitions in which data is queried.
  • the query condition further includes indication information for indicating metadata of the target data table that is queried based on the query condition.
  • the metadata of the target data table includes attribute information for indicating data in the target data table.
  • the metadata of the target data table includes information indicating an index partition storing index information of the target data table.
  • the storage device may determine the metadata of the target data table based on the indication information in the query condition, and further determine the M index partitions storing the index information of the target data table based on the metadata of the target data table.
  • two ways of determining the N target index partitions from the M index partitions based on the sampling query ratio, mode A and mode B, are described in detail below.
  • in mode A, N index partitions among the M index partitions are used as the N target index partitions, where the N target index partitions may be any N of the M index partitions.
  • that is, N index partitions may be selected arbitrarily from the M index partitions, and those N index partitions are the N target index partitions.
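  • the arbitrary selection described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `pick_target_partitions` is hypothetical, and the rule for turning the sampling query ratio into N (nearest integer, clamped so that 1 <= N < M) is an assumption, since the text only requires that the ratio of N to M correspond to the sampling query ratio.

```python
import random


def pick_target_partitions(partitions, sampling_ratio, rng=None):
    """Sketch of arbitrary selection: pick N of the M candidate index partitions."""
    m = len(partitions)
    if m < 2:
        raise ValueError("the text requires M to be greater than 1")
    if not 0 < sampling_ratio < 1:
        raise ValueError("sampling ratio is assumed to lie strictly between 0 and 1")
    # Assumed rounding rule: nearest integer, clamped so that 1 <= N < M.
    n = min(m - 1, max(1, round(m * sampling_ratio)))
    rng = rng or random.Random()
    # Any N of the M partitions may serve as the target index partitions.
    return rng.sample(partitions, n)


if __name__ == "__main__":
    candidates = ["partition#1", "partition#2", "partition#3", "partition#4"]
    print(pick_target_partitions(candidates, 0.5))
```

  • passing a seeded `random.Random` instance makes the selection reproducible, which can be convenient when testing.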
  • each index partition has a corresponding computing node, so the M index partitions correspond to M computing nodes; one computing node may correspond to one index partition, and the computing node is used to query data in its corresponding index partition.
  • the present application also provides an optional implementation.
  • in mode B, the N target computing nodes with the smallest load are selected from the M computing nodes, and the index partitions corresponding to the N target computing nodes are the N target index partitions.
  • the load of a computing node represents the computing resources consumed by the computing node during data queries, and is a real-time load.
  • specifically, the storage device may obtain, from the metadata of the target data table, information indicating the computing nodes to which the M index partitions belong, determine from that information which computing nodes the M index partitions belong to, and then obtain the load information of those computing nodes. The storage device then determines the number N of index partitions in which data is queried according to the sampling query ratio, and selects the N target computing nodes with the smallest load from the computing nodes to which the M index partitions belong, so that the index partitions corresponding to the N target computing nodes are the N target index partitions.
  • for how the number of index partitions in which data is queried is determined from the sampling query ratio in mode B, refer to the corresponding process in mode A above; details are not repeated here.
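  • the load-based selection can be sketched as below. The dictionaries `node_load` and `partition_of` and the function name `pick_least_loaded` are hypothetical, and the sketch assumes the one-node-per-partition case mentioned in the text.

```python
import heapq


def pick_least_loaded(node_load, partition_of, n):
    """Return the index partitions of the n computing nodes with the smallest load.

    node_load:    computing node -> current (real-time) load
    partition_of: computing node -> the index partition it serves (assumed 1:1)
    """
    # nsmallest iterates over the node names and orders them by their load.
    targets = heapq.nsmallest(n, node_load, key=node_load.get)
    return [partition_of[node] for node in targets]


if __name__ == "__main__":
    load = {"node1": 0.7, "node2": 0.2, "node3": 0.5}
    owner = {"node1": "partition#1", "node2": "partition#2", "node3": "partition#3"}
    print(pick_least_loaded(load, owner, 2))  # ['partition#2', 'partition#3']
```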
  • alternatively, rather than selecting the index partitions of the N target computing nodes with the smallest load among the computing nodes to which the M index partitions belong, the storage device may use as the target index partitions N index partitions from the index partitions of the computing nodes whose load is less than or equal to a first load threshold, where the number of index partitions in the computing nodes whose load is less than or equal to the first load threshold is greater than or equal to N.
  • the storage device may determine the first load threshold based on the sampling query ratio and the load of the computing nodes to which the M index partitions belong. That is, the storage device determines the number N of index partitions in which data is queried based on the sampling query ratio, and determines the first load threshold based on N and the load of the computing nodes to which the M index partitions belong, such that the number of index partitions in the computing nodes whose load is less than or equal to the first load threshold is greater than or equal to N. This ensures that the subsequently selected target computing nodes contain N index partitions.
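  • one possible way to derive such a threshold is sketched below. The text does not say how the first load threshold is computed, so taking the smallest load value that still covers N index partitions is an assumption, as are the names `first_load_threshold` and `pick_under_threshold`.

```python
def first_load_threshold(partitions_per_node, node_load, n):
    """Smallest load value such that nodes at or below it hold at least n partitions."""
    covered = 0
    for node in sorted(node_load, key=node_load.get):
        covered += len(partitions_per_node[node])
        if covered >= n:
            return node_load[node]
    raise ValueError("fewer than n index partitions are available")


def pick_under_threshold(partitions_per_node, node_load, n):
    """Pick n index partitions from nodes whose load is <= the first load threshold."""
    threshold = first_load_threshold(partitions_per_node, node_load, n)
    eligible = [
        partition
        for node in sorted(node_load, key=node_load.get)
        if node_load[node] <= threshold
        for partition in partitions_per_node[node]
    ]
    # Any n eligible partitions would do; this sketch takes the first n.
    return eligible[:n]


if __name__ == "__main__":
    partitions = {"a": ["p1", "p2"], "b": ["p3"], "c": ["p4", "p5"]}
    load = {"a": 0.6, "b": 0.1, "c": 0.3}
    print(first_load_threshold(partitions, load, 3))  # 0.3
    print(pick_under_threshold(partitions, load, 3))  # ['p3', 'p4', 'p5']
```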
  • the target index partitions are determined from the sampling query ratio and the load information of the computing nodes corresponding to the query condition, and the load of each target computing node to which a target index partition belongs is less than or equal to the first load threshold. As a result, the load of the computing nodes, including the target computing nodes, can be kept relatively balanced, and querying data through the target index partitions can further improve the query speed.
  • the sampling query ratio may be a value preset in the storage device, or a value that the storage device obtains from information, sent by the terminal device, that indicates the sampling query ratio. In the latter case, that information may be carried in the query request together with the query condition.
  • in step S240, data is queried in the N target index partitions according to the query condition, and a query result is generated according to the queried data, where the query result includes at least one of the following: the number of data items in the M index partitions that satisfy the query condition, and the data in the N target index partitions that satisfies the query condition.
  • index information is stored in each index partition, and index information in an index partition includes a correspondence between a plurality of word segments and a plurality of data identifiers, and one word segment corresponds to at least one data identifier.
  • the storage device may query data that meets the query condition according to the index information in the target index partition to generate a query result.
  • the following takes the case in which the target index partition is one index partition as an example, and describes the specific process by which the storage device queries data in the target index partition according to the query condition.
  • the query condition is parsed into a query syntax tree that the data storage system can understand, where the query syntax tree includes a plurality of query nodes, and each query node includes a word segment and a logical operator indicating the logical relationship between the word segment of the current query node and the word segment of the next query node. For the specific process of parsing the query condition into the query syntax tree, reference may be made to the prior art; details are not described here.
  • the data satisfying each word segment is queried through the index information in the target index partition;
  • the content of the data that satisfies the query condition can be searched by the corresponding data identifier. Therefore, the data that satisfies the query condition can also be understood as the data identifier that satisfies the query condition.
  • the data satisfying the query condition may also be the content of the data satisfying the query condition.
  • the query syntax tree obtained from the query condition includes three query nodes: query node #1 is {gender: male, and}, whose word segment is "gender: male" and whose logical operator is "and"; query node #2 is {education: undergraduate, and}, whose word segment is "education: undergraduate" and whose logical operator is "and"; query node #3 is {occupation: student, and}, whose word segment is "occupation: student" and whose logical operator is "and".
  • the data satisfying each word segment is searched for through the index information in index partition #1: the data identifiers satisfying the word segment "gender: male" include A01 and B01, the data identifiers satisfying the word segment "education: undergraduate" include A01 and B02, and the data identifiers satisfying the word segment "occupation: student" include A01 and B02.
  • the data that satisfies the query condition in the target index partition may be used as the query result in the above manner.
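  • the lookup in a single target index partition can be sketched as below, using the identifier lists given above for index partition #1. The function names and the dictionary layout are hypothetical, and the sketch handles only conditions joined by "&&"; it is not a full query syntax tree.

```python
# Word segment -> data identifiers, as listed in the text for index partition #1.
INDEX_PARTITION_1 = {
    "gender: male": {"A01", "B01"},
    "education: undergraduate": {"A01", "B02"},
    "occupation: student": {"A01", "B02"},
}


def parse_condition(condition):
    """Split an AND-only query condition into its word segments."""
    return [segment.strip() for segment in condition.split("&&")]


def query_partition(index, condition):
    """Return the data identifiers that satisfy every word segment."""
    segments = parse_condition(condition)
    # Look up the identifiers for each word segment, then intersect them,
    # because "and" requires all word segments to hold at the same time.
    postings = [index.get(segment, set()) for segment in segments]
    return set.intersection(*postings)


if __name__ == "__main__":
    condition = "gender: male && education: undergraduate && occupation: student"
    print(query_partition(INDEX_PARTITION_1, condition))  # {'A01'}
```

  • with the identifier lists above, only A01 appears under all three word segments, so it is the only data identifier in index partition #1 that satisfies the example condition.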
  • the number of data in the M index partitions that satisfy the query condition may be determined by the following manner.
  • the query result is generated according to the queried data, including:
  • Querying data in the target index partition according to the query condition to generate query results including:
  • the storage device queries data in the target index partitions according to the query condition as described above and determines the number of data items in the target index partitions that satisfy the query condition; dividing that number by L yields the number of data items in the M index partitions that satisfy the query condition (denoted as Q for ease of distinction).
  • the storage device may directly divide the number of data items in the target index partitions that satisfy the query condition by the sampling query ratio, thereby estimating the number of data items in the M index partitions that satisfy the query condition.
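  • the estimate amounts to a single division, sketched below. The function name `estimate_total_matches` and the input validation are illustrative assumptions; the text itself only states that the sampled count is divided by the sampling query ratio.

```python
def estimate_total_matches(sample_matches, sampling_ratio):
    """Estimate Q, the number of matches across all M index partitions.

    sample_matches: number of matching data items found in the N target partitions
    sampling_ratio: the sampling query ratio (assumed here to satisfy 0 < ratio <= 1)
    """
    if not 0 < sampling_ratio <= 1:
        raise ValueError("sampling ratio is assumed to lie in (0, 1]")
    return sample_matches / sampling_ratio


if __name__ == "__main__":
    # 120 matches found while querying a quarter of the index partitions.
    print(estimate_total_matches(120, 0.25))  # 480.0
```

  • the result is an estimate: it is accurate to the extent that matching data is spread evenly across the index partitions.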
  • the query result is fed back to the terminal device.
  • the query result includes at least one of data in the target index partition that satisfies the query condition and the number of data in the M index partitions that satisfy the query condition.
  • the method for sampling query in the embodiment of the present application uses the sampling query ratio to determine a subset of index partitions (that is, the target index partitions) from the M index partitions corresponding to the query condition, so that the data query needs to be performed only in the target index partitions. The generated query result includes at least one of the number of data items in the M index partitions that satisfy the query condition and the data in the target index partitions that satisfies the query condition. This avoids the long response delay that arises in the prior art when the query results of all M index partitions must be fed back, which effectively reduces the response delay of the query result and improves the user experience;
  • in addition, because the data query is performed in only some of the M index partitions, whereas in the prior art it is performed in all M index partitions, the embodiment of the present application occupies fewer computing resources. In this way, the storage device can perform data queries for other query conditions in the remaining index partitions of the M index partitions, which increases query concurrency.
  • the sequence numbers of the foregoing processes do not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
  • making the load of the computing nodes to which the index partitions belong as uniform as possible helps to reduce the response delay.
  • the embodiment of the present application also provides an optional implementation manner.
  • the method further includes:
  • the resource consumption value is added to the load of the N target computing nodes to obtain updated load information of the N target computing nodes.
  • that is, the resource consumption value of the computing resources that the query condition is expected to consume is estimated, the resource consumption value is added to the load of each target computing node, and the load of the target computing node is thereby updated.
  • because the load of the target computing nodes is updated promptly, before the data is queried in the target index partitions, computing resources are effectively reserved for the storage device to query data in the target index partitions.
  • the method further includes:
  • the resource consumption value is subtracted from the load indicated by the updated load information of the N target computing nodes.
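  • the add-then-subtract bookkeeping can be sketched with a context manager, as below. The name `reserved_load` and the in-memory `node_load` dictionary are hypothetical, and releasing the reservation when the query finishes is an assumption about timing that the surviving text does not spell out.

```python
from contextlib import contextmanager


@contextmanager
def reserved_load(node_load, target_nodes, cost):
    """Add the query's resource consumption value to each target node's load,
    then subtract it again when the enclosed block finishes."""
    for node in target_nodes:
        node_load[node] += cost
    try:
        # While the block runs, node_load holds the updated load information.
        yield node_load
    finally:
        # Release the reserved computing resources, even if the query fails.
        for node in target_nodes:
            node_load[node] -= cost


if __name__ == "__main__":
    load = {"node1": 10, "node2": 4}
    with reserved_load(load, ["node2"], cost=3):
        print(load)  # {'node1': 10, 'node2': 7}
    print(load)      # {'node1': 10, 'node2': 4}
```

  • other queries that consult `node_load` while the block is running see the updated load, which is what lets later selections steer away from busy computing nodes.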
  • the following describes the process of calculating the resource consumption value of the computing resource consumed in the data query process based on the query condition in the embodiment of the present application.
  • the calculating, according to the query condition, the resource consumption value of the computing resource consumed in the data query process including:
  • the resource consumption values of all the query nodes corresponding to the at least one word segment are summed to obtain the resource consumption value of the query condition.
  • the query condition is parsed into a query syntax tree that includes multiple query nodes, each of which includes a word segment and a logical operator indicating the logical relationship between the word segment of the current query node and the word segment of the next query node. The storage device may calculate the resource consumption value of each query node by using a first formula and sum the resource consumption values of the at least one query node, thereby obtaining the resource consumption value of the query condition.
  • the first formula can be: $T = \sum_{K=1}^{n} \mathrm{Cost}(\mathrm{Tag}_K)$, where $\mathrm{Tag}_K$ denotes a word segment, for example "education: undergraduate", "gender: male", or "occupation: student"; $\mathrm{Cost}(\mathrm{Tag}_K)$ denotes the resource consumption value for the word segment $\mathrm{Tag}_K$, and its value is related to the size of the data identifiers corresponding to the word segment and to the logical operator corresponding to the word segment; $\sum$ indicates that the n values of $\mathrm{Cost}(\mathrm{Tag}_K)$ are summed; and $T$ denotes the resource consumption value for the query condition.
  • for example, the query nodes obtained by decomposing the query condition are: query node #1 is {gender: male, and}, whose word segment is "gender: male" and whose logical operator is "and"; query node #2 is {education: undergraduate, and}, whose word segment is "education: undergraduate" and whose logical operator is "and"; query node #3 is {occupation: student, and}, whose word segment is "occupation: student" and whose logical operator is "and".
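  • the summation over query nodes can be sketched as below. The text says only that the per-segment cost depends on the size of the matching data identifiers and on the logical operator, so the concrete cost function (number of identifiers times an operator weight) and the weights in `OPERATOR_WEIGHT` are purely hypothetical placeholders.

```python
# Hypothetical operator weights; the source does not give concrete values.
OPERATOR_WEIGHT = {"and": 1.0, "or": 1.5, "not": 2.0}


def segment_cost(identifier_count, operator):
    """Assumed per-segment cost: grows with the number of matching identifiers
    and is scaled by a weight for the segment's logical operator."""
    return identifier_count * OPERATOR_WEIGHT[operator]


def condition_cost(query_nodes, identifier_counts):
    """Sum the cost of every query node to get the cost of the query condition."""
    return sum(
        segment_cost(identifier_counts[segment], operator)
        for segment, operator in query_nodes
    )


if __name__ == "__main__":
    nodes = [
        ("gender: male", "and"),
        ("education: undergraduate", "and"),
        ("occupation: student", "and"),
    ]
    # Identifier counts for each word segment in index partition #1.
    counts = {
        "gender: male": 2,
        "education: undergraduate": 2,
        "occupation: student": 2,
    }
    print(condition_cost(nodes, counts))  # 6.0
```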
  • the method for sampling query determines a partial index partition (ie, a target index partition) from the M index partitions corresponding to the query condition by sampling the query scale, and thus may only be in the target index.
  • a partial index partition ie, a target index partition
  • Performing a data query in the partition, and generating a query result including at least one of the number of data satisfying the query condition in the M index partitions and the data satisfying the query condition in the target index partition avoiding the prior art
  • the problem that the response delay is long after the query result of the M index partitions is responded to, and the response delay of the feedback query result is effectively reduced, thereby improving the user experience;
  • the data query is performed in the partial index partitions of the M index partitions, and the data query of the M index partitions must be performed in the prior art.
  • the embodiment of the present application occupies less computing resources. In this way, the storage device can perform data query for other query conditions in other index partitions in the M index partitions, thereby increasing query concurrency;
  • in addition, the target index partitions are determined jointly by the sampling query ratio and the load information of the computing nodes corresponding to the query condition, and the load of each target computing node to which a target index partition belongs is less than or equal to the first load threshold. This keeps the load of the computing nodes, including the target computing nodes, relatively balanced, and querying through the target index partitions further improves query speed;
  • before querying data in the target index partitions, the storage device adds the resource consumption value of the query condition to the load of the target computing nodes to which the target index partitions belong, generating updated load information for the target computing nodes. The storage device then has accurate, up-to-date load data when selecting target index partitions for subsequent query conditions, so that the load of the computing nodes stays as even as possible, which helps reduce the response delay;
  • after querying data in the target index partitions, the storage device subtracts the resource consumption value of the query condition from the load of the target computing nodes to which the target index partitions belong, releasing the computing resources. The storage device then again has accurate load data when selecting target index partitions for subsequent query conditions, so that the load of the computing nodes stays as even as possible, which helps reduce the response delay.
  • the apparatus includes a processing unit 310 configured to:
  • acquire a query condition;
  • determine N target index partitions from M index partitions according to the sampling query ratio, so that the ratio of N to M corresponds to the sampling query ratio, where the M index partitions are the index partitions to be queried determined according to the query condition, M is an integer greater than 1, N is an integer greater than or equal to 1, and N is less than M;
  • query data in the N target index partitions according to the query condition, and generate a query result from the data found, where the query result includes at least one of the following: the number of data items in the M index partitions that satisfy the query condition, and the data in the N target index partitions that satisfy the query condition;
  • feed back the query result.
  • on the one hand, the sampling query apparatus determines, according to the sampling query ratio, a subset of index partitions (that is, the target index partitions) from the M index partitions corresponding to the query condition, so that the data query can be performed only in the target index partitions. The query result generated in this way includes at least one of the number of data items in the M index partitions that satisfy the query condition and the data in the target index partitions that satisfy the query condition. This avoids the long response delay of the prior art, in which the query result is fed back only after all M index partitions have responded, and thereby reduces the response delay and improves the user experience;
  • on the other hand, the data query is performed in only some of the M index partitions, whereas the prior art must query all M index partitions, so the embodiment of the present application occupies fewer computing resources. The apparatus can therefore serve other query conditions in the remaining index partitions of the M index partitions, which increases query concurrency.
  • processing unit 310 is specifically configured to:
  • processing unit 310 is further configured to:
  • the processing unit 310 is specifically configured to: select, according to load information of the M computing nodes, N target computing nodes with the smallest load from the M computing nodes, where the index partitions corresponding to the N target computing nodes are the N targets Index partition.
  • the sampling query apparatus determines the target index partitions jointly from the sampling query ratio and the load information of the computing nodes corresponding to the query condition, and the load of each target computing node to which a target index partition belongs is less than or equal to the first load threshold. This keeps the load of the computing nodes, including the target computing nodes, relatively balanced, and querying through the target index partitions further improves query speed.
  • processing unit 310 is further configured to:
  • calculate the resource consumption value of the computing resources consumed by the data query based on the query condition;
  • add the resource consumption value to the load of the N target computing nodes to obtain updated load information of the N target computing nodes.
  • before querying data in the target index partitions, the sampling query apparatus adds the resource consumption value of the query condition to the load of the target computing nodes to which the target index partitions belong, generating updated load information for the target computing nodes. The apparatus then has accurate, up-to-date load data when selecting target index partitions for subsequent query conditions, so that the load of the computing nodes stays as even as possible, which helps reduce the response delay.
  • processing unit 310 is specifically configured to:
  • sum the resource consumption values of all the query nodes corresponding to the at least one token to obtain the resource consumption value of the query condition.
  • processing unit 310 is further configured to:
  • subtract the resource consumption value from the load indicated by the updated load information of the N target computing nodes.
  • after querying data in the target index partitions, the sampling query apparatus subtracts the resource consumption value of the query condition from the load of the target computing nodes to which the target index partitions belong, releasing the computing resources. The apparatus then has accurate load data when selecting target index partitions for subsequent query conditions, so that the load of the computing nodes stays as even as possible, which helps reduce the response delay.
  • the apparatus 300 may correspond to (for example, may be configured in, or may itself be) the sampling query device (for example, a storage device) described in the method 200 above, and each module or unit in the apparatus 300 is configured to perform the corresponding action or process performed by the sampling query device in the method 200; to avoid repetition, a detailed description is omitted here.
  • the apparatus 300 may be a sampling query device (for example, a storage device), and FIG. 5 shows a schematic structural diagram of the sampling query device 400 according to an embodiment of the present application.
  • the sampling query device 400 may include a processor 410 and a memory 420, which are communicatively connected.
  • the memory 420 can be used to store instructions, and the processor 410 is configured to execute the instructions stored in the memory 420.
  • the processing unit 310 in the apparatus 300 shown in FIG. 4 may correspond to the processor 410 in the sampling query device 400 shown in FIG. 5.
  • the apparatus 300 may be a chip (or a chip system) installed in a sampling query device (for example, a storage device), and FIG. 6 shows a schematic structural diagram of a chip according to an embodiment of the present application.
  • the chip 500 can include a processor 510 and a memory 520 that are interconnected by internal connection paths.
  • the processor 510 is configured to execute code in the memory 520. When the code is executed, the processor 510 can implement the method 200 performed by the sampling query device in the method embodiment. For brevity, details are not repeated here.
  • the processing unit 310 in the device 300 shown in FIG. 4 may correspond to the processor 510 in the chip 500 shown in FIG. 6.
  • the processor may be an integrated circuit chip with signal processing capabilities.
  • each step of the foregoing method embodiment may be completed by an integrated logic circuit of hardware in a processor or an instruction in a form of software.
  • the processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the processor can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor.
  • the software module can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
  • the storage medium is located in the memory, and the processor reads the information in the memory and combines the hardware to complete the steps of the above method.
  • the memory in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory.
  • the volatile memory can be a random access memory (RAM) that acts as an external cache.
  • many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchronous link dynamic random access memory (Synchlink DRAM, SLDRAM), and direct Rambus random access memory (Direct Rambus RAM, DR RAM).
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division into units is only a division by logical function; in actual implementation there may be other divisions, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as standalone products, may be stored in a computer-readable storage medium.
  • the technical solution of the present application, in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present application.
  • the foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

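A sampling query method and apparatus. The method includes: acquiring a query condition; determining, according to a sampling query ratio, N target index partitions from M index partitions, so that the ratio of N to M corresponds to the sampling query ratio, where the M index partitions are the index partitions to be queried determined according to the query condition, M is an integer greater than 1, N is an integer greater than or equal to 1, and N is less than M; querying data in the N target index partitions according to the query condition, and generating a query result from the data found, where the query result includes at least one of the following: the number of data items in the M index partitions that satisfy the query condition, and the data in the N target index partitions that satisfy the query condition; and feeding back the query result. In this way, the response delay of feeding back the query result can be effectively reduced and the user experience improved.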

Description

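Sampling query method and apparatus
Technical Field
The present application relates to the field of storage and, more specifically, to a sampling query method and apparatus in the field of storage.
Background
In a storage system, an index makes it possible to find data quickly. In a distributed system, index information used for querying data can be generated from the data. To improve the read and write performance of the system, the index information can be stored across multiple index partitions. After receiving a query request sent by a client, the server can then search the multiple index partitions simultaneously for data that satisfies the query condition, and finally feed back to the client all data in the multiple index partitions that satisfies the query condition.
However, when the query result contains a large amount of content and the user does not need all of the data that satisfies the query condition, the response delay of this query approach is long, which degrades the user experience.
Therefore, a technique that improves the user experience is urgently needed.
Summary
The present application provides a sampling query method and apparatus that can effectively reduce the response delay of feeding back a query result and improve the user experience.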
第一方面,提供了一种抽样查询的方法,该方法包括:
获取查询条件;
根据抽样查询比例,从M个索引分区中确定N个目标索引分区,以使得N与M的比值与所述抽样查询比例对应,其中,所述M个索引分区为根据所述查询条件确定的待查询索引分区,M为大于1的整数,N为大于或等于1的整数,并且,N小于M;
根据所述查询条件在所述N个目标索引分区中查询数据,并根据查询到的数据生成查询结果,所述查询结果包括以下至少一项:所述M个索引分区中满足所述查询条件的数据的数量、所述N个目标索引分区中满足所述查询条件的数据;
反馈所述查询结果。
因此,本申请实施例提供的抽样查询的方法,一方面,通过抽样查询比例从基于查询条件确定的M个索引分区中确定部分索引分区(即,N个目标索引分区),进而可以仅在该N个目标索引分区中进行数据查询,生成包括该M个索引分区中满足该查询条件的数据的数量和该目标索引分区中满足所述查询条件的数据中的至少一项的查询结果,避免了现有技术中由于必须在该M个索引分区的查询结果响应后才反馈查询结果而导致的响应时延较长的问题,有效的减少了反馈查询结果的响应时延,提高了用户体验;
另一方面,通过在该M个索引分区中的N个目标索引分区中进行数据查询,相比于现有技术中必须在该M个索引分区进行数据查询,本申请实施例占用了较少的计算资源,这样,可以使得设备在该M个索引分区中的其他索引分区中针对其他查询条件进行数据查询,增加了查询并发度。
在一种可选的设计中,所述根据查询到的数据生成查询结果,包括:
确定从所述N个目标索引分区中查询到的数据的数量;
根据确定的数据的数量和所述抽样查询比例,估算所述M个索引分区中满足所述查询条件的数据的数量,所述查询结果包括所述M个索引分区中满足所述查询条件的数据的数量。
在一种可选的设计中,所述方法还包括:
获取所述M个索引分区所属的M个计算节点的负载信息,所述负载信息记录计算节点的负载;以及,
所述根据抽样查询比例,从M个索引分区中确定N个目标索引分区,包括:
根据所述M个计算节点的负载信息,从所述M个计算节点中选择负载最小的N个目标计算节点,所述N个目标计算节点对应的索引分区为所述N个目标索引分区。
因此,本申请实施例提供的抽样查询的方法,通过抽样查询比例和对应于查询条件的计算节点的负载信息,将负载最小的N个目标计算节点中的索引分区确定为N个目标索引分区,可以使得包括该N个目标计算节点在内的计算节点的负载比较均衡,并且,通过该N个目标索引分区进行数据查询,也能更进一步提高查询速度。
在一种可选的设计中,在所述根据查询到的数据生成查询结果之前,所述方法还包括:
计算基于所述查询条件进行数据查询过程中消耗的计算资源的资源消耗值;
在所述N个目标计算节点的负载中加上所述资源消耗值,得到所述N个目标计算节点的更新的负载信息。
因此,本申请实施例提供的抽样查询的方法,设备在根据查询到的数据生成查询结果之前,通过在N个目标计算节点的负载中加上查询条件的资源消耗值,生成N个目标计算节点的更新的负载信息,可以使得设备为后续的其他查询条件选择目标索引分区时提供真实有效的数据,从而尽可能使得计算节点的负载趋于均匀化,有利于减少响应时延。
在一种可选的设计中,所述计算基于所述查询条件进行数据查询过程中消耗的计算资源的资源消耗值,包括:
将所述查询条件分解为至少一个分词,所述至少一个分词中的每个分词对应一个查询节点;
计算每个所述查询节点的资源消耗值;
对所述至少一个分词对应的所有查询节点的资源消耗值进行求和,得到所述查询条件的资源消耗值。
在一种可选的设计中,在所述根据查询到的数据生成查询结果之后,所述方法还包括:
在所述N个目标计算节点的更新的负载信息所指示的负载中减去所述资源消耗值。
因此,本申请实施例提供的抽样查询的方法,设备在根据查询到的数据生成查询结果之后,通过在N个目标计算节点的负载中减去查询条件的资源消耗值,以释放计算资源,可以使得设备为后续的其他查询条件选择目标索引分区时提供真实有效的数据,从而尽可能使得计算节点的负载趋于均匀化,有利于减少响应时延。
在一种可选的设计中,所述查询条件包括所述抽样查询比例。
第二方面,提供了一种抽样查询的装置,用于执行第一方面或第一方面的任意可能的实现方式中的方法。具体地,该装置包括用于执行第一方面或第一方面的任意可能的实现方式中的方法的单元。
第三方面,提供了一种抽样查询的设备,所述设备包括处理器和存储器;所述存储器用于存储计算机执行指令,所述处理器和所述存储器之间通过内部连接通路互相通信。当所述设备运行时,所述处理器执行所述存储器存储的所述计算机执行指令,以使所述设备执行第一方面或第一方面的任意可能的实现方式中的各个过程。
第四方面,提供了一种计算机存储介质,所述计算机存储介质包括计算机执行指令,当计算机的处理器执行所述计算机执行指令时,所述计算机执行上述第一方面或第一方面的任意可能的实现方式中的各个过程。
第五方面,提供了一种芯片,所述芯片包括处理器和存储器,所述处理器用于执行所述存储器存储的指令,当所述指令被执行时,所述处理器可以实现第一方面或第一方面的任意可能的实现方式中的各个过程。
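Brief Description of the Drawings
FIG. 1 is a schematic diagram of a data storage system applicable to an embodiment of the present application.
FIG. 2 is a schematic diagram of querying data in multiple index partitions based on a query condition in the prior art.
FIG. 3 is a schematic flowchart of a sampling query method according to an embodiment of the present application.
FIG. 4 is a schematic block diagram of a sampling query apparatus according to an embodiment of the present application.
FIG. 5 is a schematic structural diagram of a sampling query device according to an embodiment of the present application.
FIG. 6 is a schematic structural diagram of a chip according to an embodiment of the present application.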
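Detailed Description
The technical solutions in the present application are described below with reference to the accompanying drawings.
FIG. 1 is a schematic diagram of a data storage system applicable to an embodiment of the present application. The data storage system 100 includes a terminal device 110 and a sampling query device 120; the terminal device 110 can be connected to the device 120 through a wired or wireless network.
The terminal device 110 can request data queries. Specifically, a client capable of requesting data queries, for example a browser, can be installed on the terminal device 110. The terminal device 110 may be a mobile phone, a tablet computer, an e-reader, a personal computer, an in-vehicle device, a wearable device, or the like. Optionally, the terminal device 110 can also request data storage.
The sampling query device 120 has a data query function and can perform data queries based on query requests sent by the terminal device 110. The sampling query device 120 may be a computing device, a storage device, a server, or another device used for querying data. Optionally, the sampling query device 120 has a data storage function, and a database provided in the sampling query device 120 is used to store data. Optionally, the database may be a distributed database such as HBase, Mongo Database (Mongo DB), Distribute Relational Database Service (DRDS), Volt Database (VoltDB), or Cassandra. For example, the sampling query device 120 can store data in the database based on a data storage request sent by the user through the client of the terminal device 110.
It should be understood that the data storage system shown in FIG. 1 is merely illustrative and does not limit the embodiments of the present application.
For example, the data storage system may include only the sampling query device 120, which then both performs queries and requests data queries. In that case, the sampling query device 120 can receive a query condition entered by the user through a client in the sampling query device 120.
For ease of description, the embodiments of the present application are described using the example in which the sampling query device 120 is a storage device.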
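The index partitions in the embodiments of the present application are first briefly introduced with reference to Table 1 and Table 2.
As described in the Background, to improve the read and write performance of the system, the index information generated from the data in one data table can be stored across multiple index partitions. The index information in each index partition corresponds to a portion of the data in the data table, the index information of any two index partitions is different, and the index information stored in all the index partitions together constitutes the complete index information of the data table.
The index information includes correspondences between multiple tokens and multiple data identifiers, and one token corresponds to at least one data identifier. A token is a tag value possessed by the object identified by the corresponding data identifier.
Data identifiers, tags, and tag values are briefly introduced below.
The object identified by a data identifier can be any object, for example a person, a vehicle, a telephone number, or a virtual user account.
A tag is a way of organizing Internet content; it characterizes some feature of an object and thereby helps people describe and classify content. Common tags include gender, education, occupation, and color. Tags are usually defined manually.
Correspondingly, a tag value is the specific value of a tag. For example, if the tag is gender and the gender is male, the tag value is male; if the tag is education and the education is undergraduate, the tag value of the education tag is undergraduate.
Table 1 is a data table representing user data in a residential community. Column 1 of the data table is the data identifier, and columns 2 to 4 are the different tag values of the object identified by the data identifier. Here, the objects identified by the data identifiers are the users in the community, and each user's ID serves as the data identifier.
Taking the data identifier A01 as an example, the user identified by A01 has the following characteristics: the gender is male, the education is undergraduate, and the occupation is student.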
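Table 1
Data identifier | Gender | Education | Occupation
A01 | male | undergraduate | student
A02 |  | junior college | self-employed, avid online shopper
B01 | male | junior college | company employee
B02 |  | undergraduate | student
C01 |  | postgraduate | student
C02 |  | junior college | company employee
D01 |  | postgraduate | company employee, avid online shopper
D02 |  | junior college | company employee
E01 |  | postgraduate | self-employed
E02 |  | junior college | self-employed
F01 |  | postgraduate | teacher
F02 |  | postgraduate | company employee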
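After the storage device has stored the data in Table 1, index information can be built for that data to make querying convenient, and, to speed up queries, the index information is stored across multiple index partitions.
For example, assume the index information for the data in Table 1 is stored in three index partitions, namely index partition #1, index partition #2, and index partition #3, each of which stores the index information for a portion of the data:
Index partition #1 holds the index information for the data identified by A01, A02, B01, and B02;
Index partition #2 holds the index information for the data identified by C01, C02, D01, and D02;
Index partition #3 holds the index information for the data identified by E01, E02, F01, and F02.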
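The token-to-identifier index held by one partition can be sketched as a small inverted index. The snippet below is only an illustration under stated assumptions: the field names, the English tag values, and the `build_index` helper are hypothetical, and only the education and occupation columns of the four rows assigned to index partition #1 are used.

```python
from collections import defaultdict

# Rows assigned to index partition #1 (education and occupation columns only).
# Field names and English tag values are illustrative assumptions.
PARTITION_1_ROWS = {
    "A01": {"education": ["undergraduate"], "occupation": ["student"]},
    "A02": {"education": ["junior college"],
            "occupation": ["self-employed", "avid online shopper"]},
    "B01": {"education": ["junior college"], "occupation": ["company employee"]},
    "B02": {"education": ["undergraduate"], "occupation": ["student"]},
}


def build_index(rows):
    """Map each token "tag:value" to the sorted data identifiers that carry it."""
    index = defaultdict(set)
    for data_id, tags in rows.items():
        for tag, values in tags.items():
            for value in values:
                index[f"{tag}:{value}"].add(data_id)
    return {token: sorted(ids) for token, ids in index.items()}


index_partition_1 = build_index(PARTITION_1_ROWS)
print(index_partition_1["education:undergraduate"])  # ['A01', 'B02']
```

Each partition would hold such a mapping for its own rows only, so the mappings of all partitions together cover the whole data table.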
Table 2 shows the index information in each of the three index partitions.
Table 2
[Table 2 appears in the original as images PCTCN2018100561-appb-000001 and PCTCN2018100561-appb-000002]
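In the prior art, taking the case in which the storage device receives a query request sent by the terminal device through a client as an example, in one implementation, as shown in FIG. 2, the client of the terminal device can send the query request directly to the multiple index partitions in the storage device that correspond to the query request. Data is queried through the index information of each index partition to find the data that satisfies the query condition, and once the query has completed in all of these index partitions, the data in them that satisfies the query condition is fed back to the terminal device.
With this approach, the final query result is fed back to the terminal device only after every index partition has responded with its query result for the query request. In practice, however, the processing times of the index partitions differ, so the processing time of the slowest index partition determines the response delay of the final query result.
When the query result contains a large amount of content and the user does not need all of the data that satisfies the query condition, the response delay of this approach is long, which degrades the user experience.
For example, when the query condition is "address: Shenzhen", there may be thousands upon thousands of data items, or even more, that satisfy the query condition, while the user may only want to know how many data items satisfy it.
Therefore, the present application provides a sampling query method that helps reduce the response delay of the query result and improve the user experience.
FIG. 3 is a schematic flowchart of the sampling query method according to an embodiment of the present application. The method 200 can be executed by a storage device serving as the sampling query device, and specifically by a processor in the storage device. The steps in FIG. 3 are described in detail below.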
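In step S210, a query condition is acquired.
Specifically, the storage device can receive a query condition sent by the client of the terminal device, or the processor of the storage device can directly receive a query request sent by a client on the storage device itself.
The query condition includes at least one token. When the query condition includes multiple tokens, it also includes logical operators that connect adjacent tokens, where the logical operators include AND, NOT, and OR. For example, AND can be written as "&&", NOT as "!", and OR as "||".
For example, the query condition can be: gender:male && education:undergraduate && occupation:student, which means that the queried object must satisfy all three tag values (tokens) in the query condition simultaneously.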
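In step S220, N target index partitions are determined from M index partitions according to a sampling query ratio, so that the ratio of N to M corresponds to the sampling query ratio, where the M index partitions are the index partitions to be queried determined according to the query condition, M is an integer greater than 1, N is an integer greater than or equal to 1, and N is less than M.
Specifically, the sampling query ratio is a value associated with the query condition that represents the preset ratio of the number of index partitions in which the data query is performed to the number of the multiple index partitions. In the embodiments of the present application, the multiple index partitions are the M index partitions. The storage device thus determines, based on the sampling query ratio, the N target index partitions among the M index partitions in which the data query is performed.
Optionally, the query condition further includes indication information that indicates the metadata of the target data table to be queried. The metadata of the target data table includes attribute information about the data in the target data table; for example, it includes information indicating the index partitions that store the index information of the target data table.
The storage device can thus determine the metadata of the target data table from the indication information in the query condition, and then determine from that metadata the M index partitions that store the index information of the target data table.
Two ways of determining the N target index partitions from the M index partitions based on the sampling query ratio (Mode A and Mode B) are described in detail below.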
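Mode A
The number N of index partitions in which the data query is performed is determined from the M index partitions according to the sampling query ratio;
N of the M index partitions are used as the N target index partitions, where the N target index partitions can be any N of the M index partitions.
That is, after determining, based on the sampling query ratio, the number N of index partitions in which the data query is performed, the storage device can select any N index partitions from the M index partitions, and these are the N target index partitions.
There are several ways to determine the number of index partitions to query from the sampling query ratio; each is briefly described below. For ease of description, the sampling query ratio is denoted P.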
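Mode 1
$N = \lceil M \cdot P \rceil$
where $\lceil M \cdot P \rceil$ denotes rounding the result of (M*P) up.
Assume the sampling query ratio P = 0.3 and M = 4. Then 0.3*4 = 1.2 and $N = \lceil 1.2 \rceil = 2$, that is, the number of target index partitions is 2.
Mode 2
$N = \lfloor M \cdot P \rfloor$
where $\lfloor M \cdot P \rfloor$ denotes rounding the result of (M*P) down.
Assume the sampling query ratio P = 0.3 and M = 4. Then 0.3*4 = 1.2 and $N = \lfloor 1.2 \rfloor = 1$, that is, the number of target index partitions is 1.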
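Mode 3
The result of (M*P) is rounded to the nearest integer.
Assume the sampling query ratio P = 0.3 and M = 4. Then 0.3*4 = 1.2, which rounds to 1, so the number of target index partitions is 1.
Note that when the result of (M*P) is less than 1, Mode 1 is used from an implementation standpoint; when the result of (M*P) is greater than or equal to 1, any of the three modes above can be used, and the embodiments of the present application impose no limitation in this respect.
It should be understood that the above ways of determining the number of index partitions to query based on the sampling query ratio are merely illustrative and do not limit the embodiments of the present application; any way of determining the number of index partitions to query based on the sampling query ratio falls within the protection scope of the present application.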
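The three rounding modes, together with the rule that Mode 1 applies whenever M*P is less than 1, can be sketched as follows. This is a minimal sketch rather than the application's implementation: the function name `target_partition_count`, the `mode` parameter, the use of exact fractions to avoid floating-point surprises, and the final clamp that keeps N below M are all assumptions added for illustration.

```python
import math
from fractions import Fraction


def target_partition_count(m, p, mode="ceil"):
    """Return N, the number of index partitions to query out of m, for ratio p."""
    if mode not in ("ceil", "floor", "round"):
        raise ValueError(f"unknown mode: {mode}")
    if m <= 1:
        raise ValueError("M must be an integer greater than 1")
    if not 0 < p < 1:
        raise ValueError("the ratio is assumed to lie strictly between 0 and 1")
    product = m * Fraction(str(p))  # exact arithmetic, e.g. 4 * 3/10 = 6/5
    if product < 1 or mode == "ceil":
        n = math.ceil(product)      # Mode 1; also forced when M*P < 1
    elif mode == "floor":
        n = math.floor(product)     # Mode 2
    else:
        n = math.floor(product + Fraction(1, 2))  # Mode 3, rounding half up
    return min(n, m - 1)            # assumption: clamp so that N < M


print(target_partition_count(4, 0.3, "ceil"))   # 2
print(target_partition_count(4, 0.3, "floor"))  # 1
print(target_partition_count(4, 0.3, "round"))  # 1
```

With P = 0.1 and M = 4 the product is 0.4, so every mode falls back to rounding up and at least one partition is always queried.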
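Mode B
In the embodiments of the present application, each index partition has a corresponding computing node, so the M index partitions correspond to M computing nodes. One computing node can correspond to one index partition, and a computing node performs data queries in its corresponding index partition.
To keep the load of the computing nodes relatively balanced and to speed up queries, the present application further provides an optional implementation:
Load information of the M computing nodes to which the M index partitions belong is acquired, where the load information records the load of a computing node;
According to the load information of the M computing nodes, the N target computing nodes with the smallest load are selected from the M computing nodes, and the index partitions corresponding to the N target computing nodes are the N target index partitions.
Here, the load of a computing node represents the computing resources the node is consuming in performing data queries; it is a real-time load.
Specifically, the storage device can obtain, from the metadata of the target data table described above, information indicating the computing nodes to which the M index partitions belong, use that information to determine which computing nodes the M index partitions belong to, and then acquire the load information of those computing nodes. Next, after determining the number N of index partitions to query according to the sampling query ratio, the storage device selects the N target computing nodes with the smallest load from the computing nodes to which the M index partitions belong; the index partitions corresponding to these N target computing nodes are the N target index partitions. For how the number of index partitions to query is determined from the sampling query ratio in Mode B, refer to the corresponding procedure in Mode A; details are not repeated here.
By way of example and not limitation, instead of selecting the index partitions corresponding to the N least-loaded computing nodes, the storage device can also use, as the target index partitions, N index partitions from among those corresponding to computing nodes whose load is less than or equal to the first load threshold.
Optionally, the number of index partitions on computing nodes whose load is less than or equal to the first load threshold is greater than or equal to N.
Specifically, the storage device can determine the first load threshold based on the sampling query ratio and the loads of the computing nodes to which the M index partitions belong. That is, the storage device determines the number N of index partitions to query from the sampling query ratio, and determines the first load threshold from N and the loads of the computing nodes to which the M index partitions belong, such that the number of index partitions on computing nodes whose load is less than or equal to the first load threshold is greater than or equal to N. This guarantees that the subsequently selected target computing nodes contain N index partitions.
In this way, the target index partitions are determined jointly by the sampling query ratio and the load information of the computing nodes corresponding to the query condition, and the load of each target computing node to which a target index partition belongs is less than or equal to the first load threshold. This keeps the load of the computing nodes, including the target computing nodes, relatively balanced, and querying through the target index partitions also further improves query speed.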
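Both selection variants of Mode B can be illustrated with a short sketch. The node names, load figures, and the helper names `least_loaded_partitions` and `partitions_under_threshold` are hypothetical; the sketch also assumes one index partition per computing node, as in the description.

```python
import heapq


def least_loaded_partitions(node_load, partition_of_node, n):
    """Pick the partitions of the n computing nodes with the smallest load."""
    targets = heapq.nsmallest(n, node_load, key=node_load.get)
    return [partition_of_node[node] for node in targets]


def partitions_under_threshold(node_load, partition_of_node, n, threshold):
    """Pick n partitions from nodes whose load is at or below the threshold."""
    eligible = [node for node, load in node_load.items() if load <= threshold]
    if len(eligible) < n:
        raise ValueError("fewer than N nodes are at or below the threshold")
    return [partition_of_node[node] for node in eligible[:n]]


# Illustrative real-time loads for M = 4 computing nodes (made-up numbers).
node_load = {"node1": 500, "node2": 120, "node3": 300, "node4": 80}
partition_of_node = {
    "node1": "partition#1",
    "node2": "partition#2",
    "node3": "partition#3",
    "node4": "partition#4",
}

selected = least_loaded_partitions(node_load, partition_of_node, 2)
print(selected)  # ['partition#4', 'partition#2']
```

The threshold variant does not need a full ordering of the nodes; it only requires that at least N nodes fall at or below the threshold, which is why the description ties the threshold to N.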
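In the embodiments of the present application, the sampling query ratio can be a value preset in the storage device, or a value that the storage device obtains from information, received via the terminal device, that indicates the sampling query ratio. In the latter case, the information indicating the sampling query ratio can be carried in the query request together with the query condition.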
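In step S240, data is queried in the N target index partitions according to the query condition, and a query result is generated from the data found, where the query result includes at least one of the following: the number of data items in the M index partitions that satisfy the query condition, and the data in the N target index partitions that satisfy the query condition.
Specifically, as described above, each index partition stores index information; the index information in an index partition includes correspondences between multiple tokens and multiple data identifiers, and one token corresponds to at least one data identifier. After determining the query condition and the target index partitions, the storage device can therefore use the index information in the target index partitions to find the data that satisfies the query condition and generate the query result.
The following describes, for the case of a single target index partition, how the storage device queries data in the target index partition according to the query condition.
First, the query condition is parsed into a query syntax tree that the data storage system can understand. The query syntax tree includes multiple query nodes, and each query node includes one token and a logical operator that expresses the logical relationship between the token of the current query node and the token of the next query node. For how a query condition is parsed into a query syntax tree, refer to the prior art; details are not repeated here.
Second, for the tokens in the query nodes, the data that satisfies each token is looked up through the index information in the target index partition;
Third, logical operations are applied to the data that satisfies the individual tokens, which finally determines the data that satisfies the query condition.
Note that, in practice, the content of the data that satisfies the query condition can be looked up through the corresponding data identifiers, so the data that satisfies the query condition can also be understood as the data identifiers that satisfy the query condition. Of course, it can also be the content of that data.
Continuing with Table 2, assume the target index partition is index partition #1 and the query condition is {gender:male && education:undergraduate && occupation:student}; the query process in the target index partition is further illustrated as follows.
The query syntax tree computed from the query condition contains three query nodes: query node #1 is {gender:male, AND}, with token "gender:male" and logical operator AND; query node #2 is {education:undergraduate, AND}, with token "education:undergraduate" and logical operator AND; query node #3 is {occupation:student, AND}, with token "occupation:student" and logical operator AND.
The index information in index partition #1 is used to find the data that satisfies each token: the data identifiers that satisfy the token "gender:male" include A01 and B01; the data identifiers that satisfy the token "education:undergraduate" include A01 and B02; and the data identifiers that satisfy the token "occupation:student" include A01 and B02.
An AND operation is performed on the data identifiers corresponding to the three tokens, which finally determines that the data of the object with data identifier A01 satisfies the query condition.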
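The lookup-then-intersect procedure for this example can be sketched in a few lines. The sketch handles only conditions whose tokens are all joined by "&&"; the English token spellings and the helper names `split_and_condition` and `query_partition` are assumptions for illustration, and a real system would build a full query syntax tree that also supports "!" and "||".

```python
# Posting lists of index partition #1 for the three tokens in the example.
INDEX_PARTITION_1 = {
    "gender:male": {"A01", "B01"},
    "education:undergraduate": {"A01", "B02"},
    "occupation:student": {"A01", "B02"},
}


def split_and_condition(condition):
    """Split an AND-only query condition into its tokens."""
    return [token.strip() for token in condition.split("&&")]


def query_partition(index, condition):
    """Return the data identifiers that satisfy every token of the condition."""
    tokens = split_and_condition(condition)
    postings = [index.get(token, set()) for token in tokens]
    return set.intersection(*postings)


condition = "gender:male && education:undergraduate && occupation:student"
matches = query_partition(INDEX_PARTITION_1, condition)
print(sorted(matches))  # ['A01']
```

A token that is absent from the partition yields an empty posting set, so an AND-only condition containing it matches nothing in that partition.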
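If the query result includes the data that satisfies the query condition, the data found in the target index partitions in the above manner can be used as the query result.
If the query result includes the number of data items in the M index partitions that satisfy the query condition, that number can be determined as follows.
Optionally, generating the query result from the data found includes:
determining the number of data items found in the N target index partitions;
estimating, from the determined number and the sampling query ratio, the number of data items in the M index partitions that satisfy the query condition, where the query result includes the number of data items in the M index partitions that satisfy the query condition.
Querying data in the target index partitions according to the query condition to generate the query result includes:
querying, according to the query condition, the number of data items in the target index partitions that satisfy the query condition;
calculating, from the number of data items in the target index partitions that satisfy the query condition and a first value L, the number of data items in the M index partitions that satisfy the query condition, where L = N/M.
Specifically, the storage device determines the number of data items that satisfy the query condition using the procedure described above for querying data in the target index partitions, and divides that number by L to obtain the number of data items in the M index partitions that satisfy the query condition (denoted Q for ease of distinction).
For example, if the sampling query ratio is 0.3 and M = 6, Mode 1 described above gives N = 2, so L = 2/6 = 1/3. If 20 data items in the target index partitions satisfy the query condition, then the number of data items in the M index partitions that satisfy the query condition is Q = 20/(1/3) = 60.
By way of example and not limitation, the above way of determining the number of data items in the M index partitions that satisfy the query condition is merely illustrative and does not limit the present application. For example, the storage device can also directly divide the number of data items in the target index partitions that satisfy the query condition by the sampling query ratio to estimate the number of data items in the M index partitions that satisfy the query condition.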
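The two estimates can be compared with a short sketch. The function names are hypothetical, and rounding the estimate to an integer is an assumption added here because a count of data items is a whole number.

```python
def estimate_total(sample_count, n, m):
    """Scale the count found in n target partitions up to all m partitions: Q = count / (N/M)."""
    if not 1 <= n < m:
        raise ValueError("expected 1 <= N < M")
    return round(sample_count * m / n)


def estimate_total_by_ratio(sample_count, p):
    """Alternative estimate that divides by the sampling query ratio P directly."""
    return round(sample_count / p)


print(estimate_total(20, 2, 6))          # 60
print(estimate_total_by_ratio(20, 0.3))  # 67
```

The two estimates differ whenever N/M is not exactly P, as here, where rounding up turned P = 0.3 into an effective ratio of 1/3; scaling by N/M reflects the fraction of partitions actually queried.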
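Then, in S250, the query result is fed back to the terminal device.
It should be understood that the query result includes at least one of the data in the target index partitions that satisfies the query condition and the number of data items in the M index partitions that satisfy the query condition.
Therefore, in the sampling query method provided by the embodiments of the present application, on the one hand, a subset of index partitions (that is, the target index partitions) is determined, according to the sampling query ratio, from the M index partitions corresponding to the query condition, so that the data query can be performed only in the target index partitions, generating a query result that includes at least one of the number of data items in the M index partitions that satisfy the query condition and the data in the target index partitions that satisfy the query condition. This avoids the long response delay of the prior art, in which the query result is fed back only after all M index partitions have responded, and so effectively reduces the response delay of feeding back the query result and improves the user experience;
On the other hand, the data query is performed in only some of the M index partitions, whereas the prior art must query all M index partitions, so the embodiments of the present application occupy fewer computing resources. The storage device can therefore serve other query conditions in the remaining index partitions of the M index partitions, which increases query concurrency.
It should be understood that, in the various embodiments of the present application, the sequence numbers of the above processes do not imply an order of execution; the order of execution should be determined by the functions and internal logic of the processes and does not limit the implementation of the embodiments of the present application in any way.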
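So that the storage device has accurate, up-to-date load data when selecting target index partitions for subsequent query conditions, which keeps the load of the computing nodes to which the index partitions belong as even as possible and helps reduce the response delay, the embodiments of the present application further provide an optional implementation that is carried out before the storage device queries data in the target index partitions.
Before the query result is generated from the data found, the method further includes:
calculating the resource consumption value of the computing resources consumed by the data query based on the query condition;
adding the resource consumption value to the load of the N target computing nodes to obtain updated load information of the N target computing nodes.
That is, the resource consumption value of the computing resources that the query condition will consume is estimated, and that value is added to the load of the target computing nodes to update their load.
Put another way, the load of the target computing nodes is updated promptly before data is queried in the target index partitions, reserving the necessary computing resources for the storage device's query in those partitions.
Similarly, once the query in the target index partitions has finished, the previously reserved computing resources must be released so that the storage device has accurate load data when selecting target index partitions for subsequent query conditions. The embodiments of the present application therefore provide a further optional implementation.
After data is queried in the target index partitions according to the query condition to generate the query result, the method further includes:
subtracting the resource consumption value from the load indicated by the updated load information of the N target computing nodes.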
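The reserve-then-release bookkeeping can be sketched as a context manager, which guarantees that the reserved load is subtracted again even if the query fails. The `reserved_load` helper, the node names, and the load figures are hypothetical; a real system would also need to synchronize concurrent updates to the load table, which this sketch ignores.

```python
from contextlib import contextmanager


@contextmanager
def reserved_load(node_load, target_nodes, cost):
    """Add cost to each target node's load for the duration of the query."""
    for node in target_nodes:
        node_load[node] += cost      # reserve before querying
    try:
        yield node_load
    finally:
        for node in target_nodes:
            node_load[node] -= cost  # release once the query has finished


# Illustrative loads for the two target nodes (made-up numbers).
loads = {"node2": 120, "node4": 80}
with reserved_load(loads, ["node2", "node4"], 1350) as current:
    during = dict(current)           # snapshot of the loads while the query runs

print(during)  # {'node2': 1470, 'node4': 1430}
print(loads)   # {'node2': 120, 'node4': 80}
```

While the query runs, any other query that consults the load table sees the reserved cost and is steered toward less busy nodes.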
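The following describes how the resource consumption value of the computing resources consumed by the data query based on the query condition is calculated in the embodiments of the present application.
Optionally, calculating the resource consumption value of the computing resources consumed by the data query based on the query condition includes:
decomposing the query condition into at least one token, where each of the at least one token corresponds to one query node;
calculating the resource consumption value of each query node;
summing the resource consumption values of all the query nodes corresponding to the at least one token to obtain the resource consumption value of the query condition.
Specifically, the query condition is parsed into a query syntax tree that includes multiple query nodes, each of which includes one token and a logical operator that expresses the logical relationship between the token of the current query node and the token of the next query node. A first formula can be used to calculate the resource consumption value of each query node and to sum the resource consumption values of the at least one query node, giving the resource consumption value of the query condition.
The first formula can be:
$T = \sum_{K=1}^{n} \mathrm{Cost}(\mathrm{Tag}K)$
where TagK denotes a token, for example "education:undergraduate", "gender:male", or "occupation:student" as described above; Cost(TagK) denotes the resource consumption value for the token TagK, and its value depends on the size of the data identifiers corresponding to the token TagK and on the logical operator corresponding to the token; $\sum_{K=1}^{n}$ denotes summing the n values Cost(TagK); and T denotes the resource consumption value of the query condition.
For example, continuing with the query condition {gender:male && education:undergraduate && occupation:student}, the query nodes obtained by decomposing the query condition are: query node #1 is {gender:male, AND}, with token "gender:male" and logical operator AND; query node #2 is {education:undergraduate, AND}, with token "education:undergraduate" and logical operator AND; query node #3 is {occupation:student, AND}, with token "occupation:student" and logical operator AND.
Based on Cost(TagK), the resource consumption value calculated for each query node is as follows:
"education:undergraduate" under the AND condition: 200;
"gender:male" under the AND condition: 1000;
"occupation:student" under the AND condition: 150.
The resource consumption value of the query condition is therefore T = 200 + 1000 + 150 = 1350.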
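The summation can be reproduced with a small sketch. The per-node costs are the three values from the example; how Cost(TagK) itself is derived is not specified beyond its dependence on the data identifiers and the logical operator, so the lookup table and the `query_cost` helper are hypothetical.

```python
# Per-query-node cost, keyed by (token, logical operator); values from the example.
QUERY_NODE_COST = {
    ("education:undergraduate", "AND"): 200,
    ("gender:male", "AND"): 1000,
    ("occupation:student", "AND"): 150,
}


def query_cost(query_nodes, cost_table):
    """Sum Cost(TagK) over all query nodes to get T for the query condition."""
    return sum(cost_table[node] for node in query_nodes)


nodes = [
    ("gender:male", "AND"),
    ("education:undergraduate", "AND"),
    ("occupation:student", "AND"),
]
total = query_cost(nodes, QUERY_NODE_COST)
print(total)  # 1350
```

The resulting T is the value that would be added to, and later subtracted from, the load of each target computing node.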
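Therefore, in the sampling query method provided by the embodiments of the present application, on the one hand, a subset of index partitions (that is, the target index partitions) is determined, according to the sampling query ratio, from the M index partitions corresponding to the query condition, so that the data query can be performed only in the target index partitions, generating a query result that includes at least one of the number of data items in the M index partitions that satisfy the query condition and the data in the target index partitions that satisfy the query condition. This avoids the long response delay of the prior art, in which the query result is fed back only after all M index partitions have responded, and so effectively reduces the response delay of feeding back the query result and improves the user experience;
On the other hand, the data query is performed in only some of the M index partitions, whereas the prior art must query all M index partitions, so the embodiments of the present application occupy fewer computing resources. The storage device can therefore serve other query conditions in the remaining index partitions of the M index partitions, which increases query concurrency;
In addition, the target index partitions are determined jointly by the sampling query ratio and the load information of the computing nodes corresponding to the query condition, and the load of each target computing node to which a target index partition belongs is less than or equal to the first load threshold. This keeps the load of the computing nodes, including the target computing nodes, relatively balanced, and querying through the target index partitions also further improves query speed;
Furthermore, before querying data in the target index partitions, the storage device adds the resource consumption value of the query condition to the load of the target computing nodes to which the target index partitions belong, generating updated load information for the target computing nodes. The storage device then has accurate, up-to-date load data when selecting target index partitions for subsequent query conditions, so that the load of the computing nodes stays as even as possible, which helps reduce the response delay.
Finally, after querying data in the target index partitions, the storage device subtracts the resource consumption value of the query condition from the load of the target computing nodes to which the target index partitions belong, releasing the computing resources. The storage device then has accurate load data when selecting target index partitions for subsequent query conditions, so that the load of the computing nodes stays as even as possible, which helps reduce the response delay.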
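The sampling query method according to the embodiments of the present application has been described in detail above with reference to FIG. 1 to FIG. 3. The sampling query apparatus according to the embodiments of the present application is described below with reference to FIG. 4 and FIG. 5; the technical features described in the method embodiments also apply to the following apparatus embodiments.
FIG. 4 is a schematic block diagram of a sampling query apparatus according to an embodiment of the present application. As shown in FIG. 4, the apparatus includes a processing unit 310 configured to:
acquire a query condition;
determine N target index partitions from M index partitions according to a sampling query ratio, so that the ratio of N to M corresponds to the sampling query ratio, where the M index partitions are the index partitions to be queried determined according to the query condition, M is an integer greater than 1, N is an integer greater than or equal to 1, and N is less than M;
query data in the N target index partitions according to the query condition, and generate a query result from the data found, where the query result includes at least one of the following: the number of data items in the M index partitions that satisfy the query condition, and the data in the N target index partitions that satisfy the query condition;
feed back the query result.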
因此,本申请实施例提供的抽样查询的装置,一方面,通过抽样查询比例从对应于查询条件的M个索引分区中确定部分索引分区(即,目标索引分区),进而可以仅在该目标索引分区中进行数据查询,生成包括该M个索引分区中满足该查询条件的数据的数量和该目标索引分区中满足该查询条件的数据中的至少一项的查询结果,避免了现有技术中由于必须在该M个索引分区的查询结果响应后才反馈查询结果而导致的响应时延较长的问题,有效的减少了反馈查询结果的响应时延,提高了用户体验;
另一方面,通过在该M个索引分区中的部分索引分区中进行数据查询,相比于现有技术中必须在该M个索引分区进行数据查询,本申请实施例占用了较少的计算资源,这样,可以使得该装置在该M个索引分区中的其他索引分区中针对其他查询条件进行数据查询,增加了查询并发度。
可选地,该处理单元310具体用于:
确定从该N个目标索引分区中查询到的数据的数量;
根据确定的数据的数量和该抽样查询比例,估算该M个索引分区中满足该查询条件的数据的数量,该查询结果包括该M个索引分区中满足该查询条件的数据的数量。
可选地,该处理单元310还用于:
获取该M个索引分区所属的M个计算节点的负载信息,该负载信息记录计算节点的负载;以及,
该处理单元310具体用于:根据该M个计算节点的负载信息,从该M个计算节点中选择负载最小的N个目标计算节点,该N个目标计算节点对应的索引分区为该N个目标索引分区。
因此,本申请实施例提供的抽样查询的装置,通过抽样查询比例和对应于查询条件的计算节点的负载信息共同确定目标索引分区,并且目标索引分区所属的目标计算节点的负载小于或等于第一负载阈值,可以使得包括目标计算节点在内的计算节点的负载比较均衡,并且,通过目标索引分区进行数据查询,也能更进一步提高查询速度。
Optionally, the processing unit 310 is further configured to:
计算基于该查询条件进行数据查询过程中消耗的计算资源的资源消耗值;
在该N个目标计算节点的负载中加上该资源消耗值,得到该N个目标计算节点的更新的负载信息。
因此,本申请实施例提供的抽样查询的装置,该装置在目标索引分区中查询数据之前,通过在该目标索引分区所属的目标计算节点的负载中加上查询条件的资源消耗值,生成更新的目标计算节点的负载信息,可以使得该装置为后续的其他查询条件选择目标索引分区时提供真实有效的数据,从而尽可能使得计算节点的负载趋于均匀化,有利于减少响应时延。
可选地,该处理单元310具体用于:
将该查询条件分解为至少一个分词,该至少一个分词中的每个分词对应一个查询节点;
计算每个该查询节点的资源消耗值;
对该至少一个分词对应的所有查询节点的资源消耗值进行求和,得到该查询条件的资源消耗值。
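In an optional design, after data is queried in the target index partitions according to the query condition to generate the query result, the processing unit 310 is further configured to: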
在该N个目标计算节点的更新的负载信息所指示的负载中减去该资源消耗值。
因此,本申请实施例提供的抽样查询的装置,该装置在目标索引分区中查询完数据之后,通过在该目标索引分区所属的目标计算节点的负载中减去查询条件的资源消耗值,以释放计算资源,可以使得该装置为后续的其他查询条件选择目标索引分区时提供真实有效的数据,从而尽可能使得计算节点的负载趋于均匀化,有利于减少响应时延。
该装置300可以对应(例如,可以配置于或本身即为)上述方法200中描述的抽样查询的设备(例如,存储设备),并且,该装置300中各模块或单元分别用于执行上述方法200中抽样查询的设备所执行的各动作或处理过程,这里,为了避免赘述,省略其详细说明。
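In the embodiments of the present application, the apparatus 300 may be a sampling query device (for example, a storage device), and FIG. 5 shows a schematic structural diagram of the sampling query device 400 according to an embodiment of the present application. As shown in FIG. 5, the sampling query device 400 may include a processor 410 and a memory 420, which are communicatively connected. The memory 420 can be used to store instructions, and the processor 410 is configured to execute the instructions stored in the memory 420.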
此种情况下,图4所示的装置300中的处理单元310可以对应图5所示的抽样查询的设备400中的处理器410。
在本申请实施例中,该装置300可以为安装在抽样查询的设备(例如,存储设备)中的芯片(或者说,芯片系统),图6示出了根据本申请实施例的芯片的示意性结构图。该芯片500可以包括:处理器510和存储器520,该处理器510以及存储器520之间通过内部连接通路互相连接。该处理器510用于执行该存储器520中的代码。当该代码被执行时,该处理器510可以实现方法实施例中由抽样查询的设备执行的方法200。为了简洁,这里不再赘述。
此情况下,图4所示的装置300中的处理单元310可以对应图6所示的芯片500中的处理器510。
应注意,本申请实施例上述方法实施例可以应用于处理器中,或者由处理器实现。处理器可能是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法实施例的各步骤可以通过处理器中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器可以是通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现成可编程门阵列(Field Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器,处理器读取存储器中的信息,结合其硬件完成上述方法的步骤。
可以理解,本申请实施例中的存储器可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(Synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DR RAM)。应注意,本文描述的系统和方法的存储器旨在包括但不限于这些和任意其它适合类型的存储器。
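A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented in electronic hardware or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. A skilled person can use different methods to implement the described functions for each specific application, but such an implementation should not be considered to go beyond the scope of the present application.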
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。
另外,在本申请各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本申请的具体实施方式,但本申请的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本申请揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本申请的保护范围之内。因此,本申请的保护范围应以所述权利要求的保护范围为准。

Claims (16)

  1. 一种抽样查询的方法,其特征在于,所述方法包括:
    获取查询条件;
    根据抽样查询比例,从M个索引分区中确定N个目标索引分区,以使得N与M的比值与所述抽样查询比例对应,其中,所述M个索引分区为根据所述查询条件确定的待查询索引分区,M为大于1的整数,N为大于或等于1的整数,并且,N小于M;
    根据所述查询条件在所述N个目标索引分区中查询数据,并根据查询到的数据生成查询结果,所述查询结果包括以下至少一项:所述M个索引分区中满足所述查询条件的数据的数量、所述N个目标索引分区中满足所述查询条件的数据;
    反馈所述查询结果。
  2. 根据权利要求1所述的方法,其特征在于,所述根据查询到的数据生成查询结果,包括:
    确定从所述N个目标索引分区中查询到的数据的数量;
    根据确定的数据的数量和所述抽样查询比例,估算所述M个索引分区中满足所述查询条件的数据的数量,所述查询结果包括所述M个索引分区中满足所述查询条件的数据的数量。
  3. 根据权利要求1或2所述的方法,其特征在于,所述方法还包括:
    获取所述M个索引分区所属的M个计算节点的负载信息,所述负载信息记录计算节点的负载;以及,
    所述根据抽样查询比例,从M个索引分区中确定N个目标索引分区,包括:
    根据所述M个计算节点的负载信息,从所述M个计算节点中选择负载最小的N个目标计算节点,所述N个目标计算节点对应的索引分区为所述N个目标索引分区。
  4. 根据权利要求3所述的方法,其特征在于,在所述根据查询到的数据生成查询结果之前,所述方法还包括:
    计算基于所述查询条件进行数据查询过程中消耗的计算资源的资源消耗值;
    在所述N个目标计算节点的负载中加上所述资源消耗值,得到所述N个目标计算节点的更新的负载信息。
  5. 根据权利要求4所述的方法,其特征在于,所述计算基于所述查询条件进行数据查询过程中消耗的计算资源的资源消耗值,包括:
    将所述查询条件分解为至少一个分词,所述至少一个分词中的每个分词对应一个查询节点;
    计算每个所述查询节点的资源消耗值;
    对所述至少一个分词对应的所有查询节点的资源消耗值进行求和,得到所述查询条件的资源消耗值。
  6. 根据权利要求4或5所述的方法,其特征在于,在所述根据查询到的数据生成查询结果之后,所述方法还包括:
    在所述N个目标计算节点的更新的负载信息所指示的负载中减去所述资源消耗值。
  7. 根据权利要求1至6中任一项所述的方法,其特征在于,所述查询条件包括所述抽样查询比例。
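  8. An apparatus for sampling queries, characterized in that the apparatus comprises a processing unit, and the processing unit is configured to:
    obtain a query condition;
    determine N target index partitions from M index partitions according to a sampling query ratio, such that the ratio of N to M corresponds to the sampling query ratio, wherein the M index partitions are the index partitions to be queried that are determined according to the query condition, M is an integer greater than 1, N is an integer greater than or equal to 1, and N is less than M;
    query data in the N target index partitions according to the query condition, and generate a query result according to the queried data, wherein the query result comprises at least one of the following: the quantity of data in the M index partitions that satisfies the query condition, and the data in the N target index partitions that satisfies the query condition;
    return the query result.
  9. The apparatus according to claim 8, characterized in that the processing unit is specifically configured to:
    determine the quantity of data found in the N target index partitions;
    estimate, according to the determined quantity of data and the sampling query ratio, the quantity of data in the M index partitions that satisfies the query condition, wherein the sampling query ratio is the ratio of N to M, and the query result comprises the quantity of data in the M index partitions that satisfies the query condition.
  10. The apparatus according to claim 8 or 9, characterized in that the processing unit is further configured to:
    obtain load information of the M compute nodes to which the M index partitions belong, wherein the load information records the load of a compute node; and
    the processing unit is specifically configured to:
    select, according to the load information of the M compute nodes, the N target compute nodes with the lowest load from the M compute nodes, wherein the index partitions corresponding to the N target compute nodes are the N target index partitions.
  11. The apparatus according to claim 10, characterized in that the processing unit is further configured to:
    calculate a resource consumption value of the computing resources consumed in the data query performed based on the query condition;
    add the resource consumption value to the load of the N target compute nodes to obtain updated load information of the N target compute nodes.
  12. The apparatus according to claim 11, characterized in that the processing unit is specifically configured to:
    decompose the query condition into at least one term, wherein each of the at least one term corresponds to one query node;
    calculate the resource consumption value of each query node;
    sum the resource consumption values of all query nodes corresponding to the at least one term to obtain the resource consumption value of the query condition.
  13. The apparatus according to claim 11 or 12, characterized in that the processing unit is further configured to:
    subtract the resource consumption value from the load indicated by the updated load information of the N target compute nodes.
  14. The apparatus according to any one of claims 8 to 13, characterized in that the query condition comprises the sampling query ratio.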
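  15. A device for sampling queries, characterized in that the device comprises:
    a memory, configured to store instructions;
    a processor, configured to execute the instructions stored in the memory, wherein, when the processor executes the instructions stored in the memory, the device is caused to perform the method according to any one of claims 1 to 7.
  16. A computer storage medium, characterized by comprising computer-executable instructions, wherein, when a processor of a computer executes the computer-executable instructions, the computer performs the method according to any one of claims 1 to 7.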
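
As a rough illustration of the flow recited in claims 1 to 6, the following Python sketch picks the N least-loaded of M index partitions, adds the estimated query cost to their load, counts hits in the sampled partitions, scales the count by M/N, and then subtracts the cost again. It is only a sketch under stated assumptions: the names `Partition`, `query_cost` and `sampled_count`, the fixed cost of 1.0 per term, the whitespace tokenization, the substring matching, and the rounding of N to `round(M * ratio)` clamped to the range 1 to M-1 are all hypothetical choices made for this example and are not taken from the application.

```python
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical index partition hosted on one compute node."""
    node: str
    docs: list = field(default_factory=list)
    load: float = 0.0  # load currently recorded for the hosting node


def query_cost(condition: str, cost_per_term: float = 1.0) -> float:
    # Assumption: split on whitespace; each term is one query node
    # with a fixed cost, and the costs are summed (claim 5).
    return sum(cost_per_term for _ in condition.split())


def sampled_count(partitions, condition, ratio):
    m = len(partitions)
    if m < 2:
        raise ValueError("need M > 1 index partitions")
    # Assumption: N is M * ratio rounded, clamped so that 1 <= N < M.
    n = min(m - 1, max(1, round(m * ratio)))
    # Choose the N partitions whose nodes have the lowest load (claim 3).
    targets = sorted(partitions, key=lambda p: p.load)[:n]

    cost = query_cost(condition)
    for p in targets:
        p.load += cost  # reserve the cost before querying (claim 4)
    try:
        terms = condition.split()
        # Toy matching: a document is a hit if it contains every term.
        hits = sum(
            1 for p in targets for d in p.docs if all(t in d for t in terms)
        )
    finally:
        for p in targets:
            p.load -= cost  # release the cost afterwards (claim 6)

    # Scale the sampled count by M/N, the inverse of the actual ratio (claim 2).
    return hits * m / n, [p.node for p in targets]
```

The estimate divides by the ratio N/M actually used rather than the requested ratio, so rounding N does not bias the scaling; it still assumes that matching data is spread roughly evenly across the partitions.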
PCT/CN2018/100561 2018-02-28 2018-08-15 Method and apparatus for sampling queries WO2019165762A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810168880.1A CN108491262A (zh) 2018-02-28 2018-02-28 Method and apparatus for sampling queries
CN201810168880.1 2018-02-28

Publications (1)

Publication Number Publication Date
WO2019165762A1 (zh) 2019-09-06

Family

ID=63341166

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/100561 WO2019165762A1 (zh) 2018-02-28 2018-08-15 Method and apparatus for sampling queries

Country Status (2)

Country Link
CN (1) CN108491262A (zh)
WO (1) WO2019165762A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117785930A (zh) * 2022-09-21 2024-03-29 华为云计算技术有限公司 Data query method and cloud service system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070239747A1 (en) * 2006-03-29 2007-10-11 International Business Machines Corporation Methods, systems, and computer program products for providing read ahead and caching in an information lifecycle management system
CN101477542A (zh) * 2009-01-22 2009-07-08 阿里巴巴集团控股有限公司 Sampling analysis method, system and device
CN101739410A (zh) * 2008-11-24 2010-06-16 华为技术有限公司 Method, apparatus and system for presenting operation results
US20110066606A1 (en) * 2009-09-15 2011-03-17 International Business Machines Corporation Search engine with privacy protection
CN104391913A (zh) * 2014-11-18 2015-03-04 北京锐安科技有限公司 Database management method and apparatus
CN106383860A (zh) * 2016-08-31 2017-02-08 无锡雅座在线科技发展有限公司 Data query method and apparatus

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107329801B (zh) * 2017-06-29 2020-12-15 深信服科技股份有限公司 Node management method and apparatus, and multi-node server (多子星服务器)


Also Published As

Publication number Publication date
CN108491262A (zh) 2018-09-04


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18907528

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18907528

Country of ref document: EP

Kind code of ref document: A1