WO2019141134A1 - Data query method, apparatus and device - Google Patents

Data query method, apparatus and device

Info

Publication number
WO2019141134A1
Authority
WO
WIPO (PCT)
Prior art keywords
data set
query request
context information
data
query
Prior art date
Application number
PCT/CN2019/071357
Other languages
English (en)
French (fr)
Inventor
杜敬兵
周祥
马文博
吉剑南
占超群
Original Assignee
阿里巴巴集团控股有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司
Priority to JP2020539715A (published as JP2021511588A)
Priority to EP19741895.7A (published as EP3742306A4)
Publication of WO2019141134A1
Priority to US16/932,596 (published as US11734271B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 Querying
    • G06F 16/245 Query processing
    • G06F 16/2453 Query optimisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 Querying
    • G06F 16/245 Query processing
    • G06F 16/2453 Query optimisation
    • G06F 16/24534 Query rewriting; Transformation
    • G06F 16/24539 Query rewriting; Transformation using cached or materialised query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/22 Indexing; Data structures therefor; Storage structures
    • G06F 16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 Querying
    • G06F 16/245 Query processing
    • G06F 16/2455 Query execution
    • G06F 16/24552 Database cache management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F 16/284 Relational databases
    • G06F 16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues

Definitions

  • the present application relates to the field of Internet technologies, and in particular, to a data query method, apparatus, and device.
  • the analytical database is a real-time computing engine that can analyze and count arbitrary data in any dimension, and supports high concurrency, low latency (millisecond response), real-time online analysis, and massive data query.
  • The analytical database can store a large amount of data; after receiving a query request sent by the client, it can query the data corresponding to the query request and return the queried data to the client.
  • In some scenarios, multiple query requests may be received in a short period of time (i.e., concurrency is high), and the amount of data corresponding to each query request may be large. In this case, multiple query requests must be processed in a short period of time and a large amount of data returned for each, which can cause abnormalities in CPU (Central Processing Unit) resources, memory resources, network bandwidth, etc., and result in client queries timing out or failing.
  • the application provides a data query method, which is applied to a front-end server, and the method includes:
  • the application provides a data query device, which is applied to a front-end server, and the device includes:
  • a receiving module configured to receive a query request sent by the client
  • a determining module configured to determine resource occupation information of a data set corresponding to the query request
  • an obtaining module configured to acquire the data set corresponding to the query request when it is determined, according to the resource occupation information, that the query request is to be cached;
  • a storage module configured to store the data set in an external memory
  • a sending module configured to read a data set from the external storage and send the data set to the client.
  • the application provides a front-end server, and the front-end server includes:
  • a receiver configured to receive a query request sent by the client
  • a processor configured to determine resource occupation information of a data set corresponding to the query request; if it is determined, according to the resource occupation information, to cache the query request, acquire the data set corresponding to the query request, store the data set in an external memory, and read the data set from the external memory;
  • a transmitter configured to transmit the data set to the client.
  • After the query request is received, the data set is not sent directly to the client; instead, the query request and the data set are first stored in the external memory, so that the data corresponding to the query request is cached locally. In this way, multiple query requests can be processed over a longer window, avoiding the return of multiple data sets to the client in a short time, thereby reducing CPU resource, memory resource, and network bandwidth usage, avoiding client query timeouts or failures, and improving the user experience.
  • FIG. 1 is a schematic structural diagram of a system in an embodiment of the present application.
  • FIG. 2A is a schematic diagram of query request processing in an embodiment of the present application.
  • FIG. 2B is a schematic diagram of query request processing in another embodiment of the present application.
  • FIG. 3 is a flowchart of a data query method in an embodiment of the present application.
  • FIG. 4A is a flowchart of a data query method in another embodiment of the present application.
  • FIG. 4B is a flowchart of a data query method in still another embodiment of the present application.
  • FIG. 4C is a flowchart of a data query method in still another embodiment of the present application.
  • FIG. 5A is a schematic diagram of an effect in an embodiment of the present application.
  • FIG. 5B is a schematic diagram of an effect in another embodiment of the present application.
  • FIG. 5C is a schematic diagram of an effect in still another embodiment of the present application.
  • FIG. 5D is a schematic diagram of an effect in still another embodiment of the present application.
  • FIG. 5E is a schematic diagram of an effect in still another embodiment of the present application.
  • FIG. 6 is a structural diagram of a data query device in an embodiment of the present application.
  • Although the terms first, second, third, etc. may be used to describe various information in this application, such information should not be limited to these terms. These terms are only used to distinguish information of the same type from each other.
  • For example, the first information may also be referred to as the second information without departing from the scope of the present application; similarly, the second information may also be referred to as the first information.
  • Depending on the context, the word "if" may be interpreted as "when", "upon", or "in response to determining".
  • The embodiment of the present application provides a data query method, which can be applied to a system including a client, a front-end server (also referred to as a front node), a computing server (also referred to as a compute node), and a database, e.g., a system that implements an analytical database (such as Analytic DB, or ADS for short).
  • Other servers, such as a resource scheduling server, may also be included in the system, and no limitation is imposed on this.
  • In the application scenario of the embodiment of the present application, the number of clients, front-end servers, computing servers, and databases may each be one or more; other values are also possible, and no limitation is imposed on them.
  • the analytical database is a real-time computing engine that can analyze and count arbitrary data in any dimension, and supports high concurrency, low latency, real-time online analysis, and massive data query.
  • The client is used to query data from the database, and may be an application or a browser on a terminal device (such as a PC (Personal Computer), a notebook computer, a mobile terminal, etc.); no specific restriction is imposed on the type of the client.
  • the database is used to store various types of data, and can provide data stored in the database to the client.
  • the type of the data stored in the database is not limited in the embodiment of the present application, and may be user data, product data, map data, video data, image data, audio data, and the like.
  • the front-end server is configured to receive a query request sent by the client, perform SQL (Structured Query Language) parsing on the query request, generate a scheduling request by using the SQL parsing result, and send the scheduling request to the computing server.
  • The scheduling request is used to request the data corresponding to the query request. The front-end server then receives the data returned by the computing server and sends the data to the client.
  • the calculation server is configured to receive a scheduling request sent by the front-end server, and use the scheduling request to read data corresponding to the scheduling request from the database, and send the data to the front-end server.
  • the front-end server can initiate multiple threads, each thread for processing a query request. For example, after receiving the query request 1, the front-end server starts the thread 1 for the query request 1, and the thread 1 processes the query request 1, that is, performs the operation of sending the scheduling request to the computing server, and sending the data returned by the computing server to the client. After the data is sent to the client, thread 1 is released.
  • While thread 1 is working, if the front-end server receives query request 2, thread 2 is started for query request 2, and query request 2 is processed by thread 2.
  • While threads 1 and 2 are working, if the front-end server receives query request 3, thread 3 is started for query request 3, query request 3 is processed by thread 3, and so on.
  • the front-end server can start multiple threads, and multiple threads respectively process multiple query requests.
  • If the amount of data corresponding to each query request is large, a large amount of data must be returned for each query request, causing abnormalities in CPU resources, memory resources, network bandwidth, etc., degrading the processing performance of the front-end server and causing client queries to time out or fail, which affects the user experience. For example, when a large amount of data is returned for each query request and that data consumes a large amount of memory, memory overflow or frequent memory reclamation can occur, causing queries to time out or fail.
  • the front-end server may start multiple query threads, and each query thread is used to process one query request.
  • the front-end server can initiate a scheduling thread that is used to process query requests in the external memory in turn. For example, after receiving the query request 1, the front-end server starts the query thread 1 for the query request 1, and the query thread 1 processes the query request 1.
  • While query thread 1 is working, if the front-end server receives query request 2, query thread 2 is started for query request 2, and query request 2 is processed by query thread 2.
  • While query thread 1 and query thread 2 are working, if the front-end server receives query request 3, query request 4, query request 5, etc., no new query threads are started for these query requests; instead, the query requests are stored in the external memory.
  • The front-end server can then start the scheduling thread, which first processes query request 3 in the external memory; after query request 3 is completed, the scheduling thread processes query request 4; after query request 4 is completed, it processes query request 5, and so on.
  • Even if the front-end server receives multiple query requests in a short period of time (that is, concurrency is high), the number of query threads started can be controlled while all query requests are still processed; that is, returning a large amount of data to the client in a short time is avoided, thereby reducing CPU resource, memory resource, and network bandwidth usage, improving processing performance, avoiding client query timeouts or failures, and improving the user experience.
  • the method may be applied to a front-end server, and the method may include the following steps:
  • Step 301 Receive a query request sent by a client.
  • Step 302 Determine resource occupation information of a data set corresponding to the query request.
  • Step 303 If it is determined, according to the resource occupation information, to cache the query request, obtain the data set corresponding to the query request, and store the data set in an external memory.
  • Step 304 Read a data set from the external memory, and send the data set to the client.
  • the above-described execution order is only an example given for convenience of description. In an actual application, the execution order between the steps may also be changed, and the order of execution is not limited. Moreover, in other embodiments, the steps of the respective methods are not necessarily performed in the order shown and described herein, and the methods may include more or less steps than those described in this specification. In addition, the individual steps described in this specification may be decomposed into a plurality of steps for description in other embodiments; the various steps described in the present specification may be combined into a single step for description in other embodiments.
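Steps 301 to 304 can be sketched as a single handler. This is a hedged illustration only: `should_cache`, `fetch_data`, and `external_store` are stand-in names I introduce here, not names from the patent, and a dict stands in for the external memory.

```python
def handle_query(request, should_cache, fetch_data, external_store):
    """Sketch of steps 301-304 for one received query request."""
    # Step 302: decide from the resource occupation info whether to cache.
    if should_cache(request):
        data_set = fetch_data(request)           # step 303: obtain the data set
        external_store[request] = data_set       # step 303: store it externally
        data_set = external_store.pop(request)   # step 304: read it back
        return ("cached", data_set)              # step 304: send to the client
    # Non-cached path: acquire and return the data set directly.
    return ("direct", fetch_data(request))

store = {}
cached = handle_query("q1", lambda r: True, lambda r: [1, 2, 3], store)
direct = handle_query("q2", lambda r: False, lambda r: [4], store)
```

Either way the client eventually receives the data set; the cached path simply routes it through external storage first.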
  • When the client needs to query data, it may send a query request, and the front-end server receives the query request sent by the client; the query request is used to request data in the database.
  • If it is determined, according to the resource occupation information, not to cache the query request, the data set corresponding to the query request may be acquired and sent to the client directly, without caching the data set.
  • the process of obtaining the data set corresponding to the query request and storing the data set in the external memory may include, but is not limited to, storing the context information corresponding to the query request in the external memory. Then, the context information is read from the external memory, and the data set corresponding to the query request is obtained using the context information, and the data set is stored in the external memory.
  • In one implementation, the front-end server may directly store the context information corresponding to the query request in the external storage; alternatively, the front-end server may store the context information corresponding to the query request in the external storage only when certain specific conditions are met.
  • The storage medium for the context information and the data set corresponding to the query request (for convenience of description, the data finally returned to the client is referred to as a data set; the data set is introduced in a subsequent process) is an external storage rather than an internal storage (i.e., memory), so as to avoid occupying resources of the internal memory.
  • The external memory refers to a memory other than the internal memory and the CPU cache, and may include, but is not limited to, a hard disk, a floppy disk, an optical disc, a USB flash drive, etc.; the type of the external memory is not limited as long as it is different from the internal memory.
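The two-phase use of external memory described above (first the context information is stored, later the data set obtained from it is stored as well) can be sketched like this. Function and key names are illustrative assumptions; an in-process dict stands in for the on-disk store.

```python
external_memory = {"contexts": {}, "data_sets": {}}  # stands in for disk storage

def cache_context(query_id, context):
    # Phase 1: only the context information is written to external memory.
    external_memory["contexts"][query_id] = context

def materialize(query_id, fetch_data):
    # Phase 2 (later, e.g. by the scheduling thread): read the context back,
    # obtain the data set using it, and store the data set externally too.
    ctx = external_memory["contexts"].pop(query_id)
    data_set = fetch_data(ctx)
    external_memory["data_sets"][query_id] = data_set
    return data_set

cache_context("query-1", {"sql": "select c from t"})
data = materialize("query-1", lambda ctx: ["row1", "row2"])
```

Keeping both artifacts out of internal memory is what lets the front node defer work without holding RAM for every pending request.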
  • The context information corresponding to the query request may include, but is not limited to, one or any combination of the following: the query identifier corresponding to the query request, the receiving time of the query request, the user information corresponding to the query request (such as the IP address of the client), the predicted data amount corresponding to the query request, and the query request itself. Table 1 shows an example of the context information corresponding to the query request; no limitation is imposed on this.
  • the query request may correspond to a query identifier, and the query identifier may be unique, that is, the query identifiers of different query requests are different.
  • the front-end server can obtain the receiving time, the user information, the predicted data volume, the query request, and the like.
  • the receiving time refers to the time when the front-end server receives the query request, that is, the front-end server can record the receiving time of the query request when receiving the query request, such as 2017.11.6.14.16.10.
  • the user information may include, but is not limited to, the IP address of the client (the source IP address of the query request) and the user identity information (such as the user name, password, etc., which can be known from the query request), and no limitation is imposed thereon.
  • The predicted data amount refers to the determined amount of data of the data set corresponding to the query request (i.e., the data size of the data set). For example, after receiving query request 1, the front-end server may not yet have actually obtained the data set corresponding to query request 1, so the data amount of that data set is determined first; this determined value is the predicted data amount. The determination process is explained later.
  • the query request is a query request received by the front-end server, including all content carried by the query request.
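The context fields listed above can be grouped into a small record. This is only a sketch of the shape of Table 1: the field names and example values are mine, not the patent's.

```python
from dataclasses import dataclass, asdict

@dataclass
class QueryContext:
    """One external-memory context entry (illustrative field names)."""
    query_id: str          # unique query identifier
    receive_time: float    # when the front node received the request
    user_info: dict        # e.g. client IP address, user identity
    predicted_bytes: int   # predicted data amount of the result set
    query_request: str     # the query request itself, with all its content

ctx = QueryContext(
    query_id="query-1",
    receive_time=1509948970.0,
    user_info={"ip": "10.0.0.1", "user": "alice"},
    predicted_bytes=5 * 2**20,   # a predicted 5 MB result
    query_request="select c from t",
)
```

A unique `query_id` is what lets the scheduling thread later find and remove exactly this entry.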
  • the process of "the front-end server can store the context information corresponding to the query request in the external storage when the certain conditions are met" may include, but is not limited to, the following cases:
  • Case 1. Another flowchart of the data query method is as follows, and the method includes:
  • Step 411 Determine resource occupation information of the data set corresponding to the query request.
  • Step 412 Determine, according to the resource occupation information, whether to cache the query request.
  • If yes, step 413 can be performed; if not, step 414 can be performed.
  • Step 413 Store context information corresponding to the query request in an external memory.
  • the step of reading the context information from the external memory, acquiring the data set corresponding to the query request by using the context information, and storing the data set in the external memory is performed.
  • Step 414 Obtain a data set corresponding to the query request, and send the data set to the client.
  • In one implementation, the process of determining the resource occupation information of the data set corresponding to the query request may include, but is not limited to: querying a mapping table with the data identifier carried by the query request to obtain the resource occupation information of the data set corresponding to that data identifier, where the mapping table is used to record the correspondence between data identifiers and resource occupation information.
  • the front-end server may record the correspondence between the data identifier carried by the query request and the resource occupation information of the data set in the mapping table after acquiring the data set corresponding to the query request.
  • For example, after the front-end server receives a query request for requesting data set A for the first time, the mapping table is queried with the data identifier carried by the query request (such as data identifier 1 corresponding to data set A). Because the mapping table does not yet contain resource occupation information corresponding to data identifier 1, the front-end server determines the resource occupation information to be the default resource occupation information 1 (resource occupation information 1 is pre-configured according to experience and is used whenever no resource occupation information corresponding to the data identifier exists in the mapping table).
  • After obtaining the data set corresponding to the query request, the front-end server may calculate the resource occupation information of the data set; assuming the result is resource occupation information A, the correspondence between data identifier 1 and resource occupation information A can be recorded in the mapping table. Table 2 shows an example of the mapping table.
  • After the front-end server receives a query request for requesting data set A again (for example, the second time, the third time, and so on), the mapping table can be queried with data identifier 1 carried in the query request to obtain resource occupation information A corresponding to data identifier 1; that is, the determination result is resource occupation information A.
  • Further, after obtaining the data set corresponding to the query request, the front-end server may again collect statistics on the resource occupation information of the data set; if the current statistical result is the same as resource occupation information A, resource occupation information A is kept unchanged; if the current statistical result is different from resource occupation information A, the current statistical result is used to replace resource occupation information A in the mapping table.
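The mapping-table behavior above (default on first lookup, measured value recorded afterwards and refreshed when it changes) can be sketched as follows. The function names, the default values, and the usage keys are illustrative assumptions.

```python
# Pre-configured fallback used when no entry exists yet (assumed values).
DEFAULT_USAGE = {"memory": 0.05, "cpu": 0.05, "bytes": 10 * 2**20}

mapping_table = {}  # data identifier -> resource occupation info of its data set

def lookup_usage(data_id):
    # First request for a data set: no entry yet, so fall back to the
    # pre-configured default resource occupation information.
    return mapping_table.get(data_id, DEFAULT_USAGE)

def record_usage(data_id, measured):
    # After the data set has actually been obtained and measured, record
    # (or refresh) its resource occupation information for later queries.
    if mapping_table.get(data_id) != measured:
        mapping_table[data_id] = measured

first = lookup_usage("id1")                                  # default is used
record_usage("id1", {"memory": 0.08, "cpu": 0.03, "bytes": 2**20})
second = lookup_usage("id1")                                 # measured value is used
```

Repeated queries for the same data set thus get an increasingly accurate cache decision without re-measuring up front.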
  • the foregoing resource occupation information may include, but is not limited to, one or any combination of the following: memory usage, CPU usage, and data volume of the data set (ie, the size of the data set).
  • The process of collecting statistics on the resource occupation information of the data set may include: after the data set corresponding to the query request is obtained, counting the memory usage corresponding to the data set, that is, how much memory the data set occupies; for example, 5% indicates that the data set occupies 5% of the total memory.
  • the CPU usage of the data set can be counted, that is, how many CPUs are occupied by the data set, such as 5%, indicating that the data set occupies 5% of the total CPU.
  • The amount of data corresponding to the data set, that is, the size of the data set, can also be counted.
  • In one implementation, the process of determining, according to the resource occupation information, whether to cache the query request may include, but is not limited to: if the sum of the resource occupation information and the currently occupied resources of the front-end server is greater than or equal to the resource threshold, it may be determined that the query request is to be cached; alternatively, if the sum of the resource occupation information and the currently occupied resources of the front-end server is less than the resource threshold, it may be determined that the query request is not to be cached.
  • The resource threshold may include a memory threshold (such as 85%), a CPU threshold (such as 90%), and a data volume threshold (such as 200 MB).
  • In addition, the front-end server can count the current memory usage (such as 60%, indicating that 60% of the memory has been used), the current CPU usage (such as 60%, indicating that 60% of the CPU has been used), and the current bandwidth usage (such as 100 MB, indicating that 100 MB of bandwidth is currently occupied).
  • If one or more of the following three conditions are satisfied, it is determined that the query request is to be cached; if none is satisfied, it is determined that the query request is not to be cached: the sum of the memory usage and the current memory usage is greater than or equal to the memory threshold; the sum of the CPU usage and the current CPU usage is greater than or equal to the CPU threshold; the sum of the data volume of the data set and the current bandwidth usage is greater than or equal to the data volume threshold.
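The three-condition check above can be written directly. A minimal sketch, assuming memory and CPU are expressed as fractions and data volumes as bytes; the dictionary keys and the example numbers (taken from the 85%/90%/200 MB and 60%/60%/100 MB figures in the text) are illustrative.

```python
def should_cache(usage, current, thresholds):
    """Cache the query request iff ANY of the three sums reaches its threshold."""
    return (usage["memory"] + current["memory"] >= thresholds["memory"]
            or usage["cpu"] + current["cpu"] >= thresholds["cpu"]
            or usage["bytes"] + current["bandwidth"] >= thresholds["bytes"])

THRESHOLDS = {"memory": 0.85, "cpu": 0.90, "bytes": 200 * 2**20}
CURRENT = {"memory": 0.60, "cpu": 0.60, "bandwidth": 100 * 2**20}

# Small data set: every sum stays below its threshold, so no caching.
light = should_cache({"memory": 0.05, "cpu": 0.05, "bytes": 2**20},
                     CURRENT, THRESHOLDS)
# 150 MB data set: 150 MB + 100 MB >= 200 MB, so the request is cached.
heavy = should_cache({"memory": 0.05, "cpu": 0.05, "bytes": 150 * 2**20},
                     CURRENT, THRESHOLDS)
```

Note the direction of the decision: caching kicks in precisely when serving the request immediately would push the front node over a resource limit.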
  • the resource threshold (eg, memory threshold, CPU threshold, data volume threshold, etc.) may be a fixed value configured empirically, ie, the resource threshold does not change.
  • the resource threshold may be dynamically adjusted, that is, after the initial value of the resource threshold is configured according to experience, the resource threshold may be dynamically adjusted. The dynamic adjustment process of the resource threshold is described below in conjunction with the specific embodiment.
  • In another example, the resource threshold may be dynamically adjusted; that is, each query request cached in the external memory triggers an adjustment of the resource threshold.
  • For example, resource threshold 1 is used to determine whether to cache query request 1; if query request 1 is cached, the context information corresponding to query request 1 is stored in the external memory, and after the data set corresponding to query request 1 is sent to the client, resource threshold 1 is adjusted to resource threshold 2.
  • Resource threshold 2 is then used to determine whether to cache query request 2; if query request 2 is not cached, its context information is not stored in the external memory, and after the data set corresponding to query request 2 is sent to the client, there is no need to adjust resource threshold 2.
  • In one implementation, the process of dynamically adjusting the resource threshold may include: acquiring the processing time corresponding to the query request, where the processing time is the time from when the query request is received until the data set is sent to the client, and then adjusting the resource threshold according to that processing time. Further, the adjustment may include, but is not limited to: if the processing time corresponding to the query request is greater than a preset first time threshold, the value of the resource threshold may be increased, where the increased resource threshold is not greater than the resource threshold upper limit; if the processing time corresponding to the query request is less than a preset second time threshold, the value of the resource threshold may be decreased, where the decreased resource threshold is not less than the resource threshold lower limit.
  • Specifically, the receiving time of the query request can be learned from the context information, and the time when the data set is sent to the client is the current time at which the resource threshold is adjusted.
  • The front-end server can therefore compare the current time with the receiving time, and the difference between the two is determined as the processing time corresponding to the query request.
  • the preset first time threshold and the preset second time threshold may be configured according to experience, and the preset first time threshold may be greater than the preset second time threshold.
  • The resource threshold upper limit and the resource threshold lower limit can each be configured according to experience, and neither is limited here; the resource threshold upper limit can be greater than the resource threshold lower limit.
  • Increasing the value of the resource threshold may include: calculating the sum of the current resource threshold and the first resource adjustment value; if the sum is not greater than the resource threshold upper limit, the increased resource threshold is that sum; if the sum is greater than the resource threshold upper limit, the increased resource threshold is the resource threshold upper limit.
  • Decreasing the value of the resource threshold may include: calculating the difference between the current resource threshold and the second resource adjustment value; if the difference is not less than the resource threshold lower limit, the decreased resource threshold is that difference; if the difference is less than the resource threshold lower limit, the decreased resource threshold is the resource threshold lower limit.
  • the first resource adjustment value and the second resource adjustment value may be configured according to experience, and the first resource adjustment value and the second resource adjustment value may be the same or different.
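The clamped adjustment rule above fits in a few lines. All parameter names and the example numbers are illustrative; the only logic taken from the text is raise-and-clamp on slow requests, lower-and-clamp on fast ones.

```python
def adjust_threshold(current, processing_time, *,
                     t_high, t_low, step_up, step_down, upper, lower):
    """Raise the threshold (clamped to `upper`) when processing took too
    long, lower it (clamped to `lower`) when processing was fast,
    otherwise leave it unchanged."""
    if processing_time > t_high:
        return min(current + step_up, upper)    # never exceed the upper limit
    if processing_time < t_low:
        return max(current - step_down, lower)  # never fall below the lower limit
    return current

# Slow request (12 s > 10 s): 0.85 + 0.05 would be 0.90, clamped to 0.88.
raised = adjust_threshold(0.85, 12.0, t_high=10.0, t_low=2.0,
                          step_up=0.05, step_down=0.05, upper=0.88, lower=0.70)
# Fast request (1 s < 2 s): lowered by one step toward the lower limit.
lowered = adjust_threshold(0.85, 1.0, t_high=10.0, t_low=2.0,
                           step_up=0.05, step_down=0.05, upper=0.88, lower=0.70)
```

The clamping is what keeps the feedback loop from drifting the threshold out of its configured band.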
  • If the processing time corresponding to the query request is greater than the preset first time threshold, it indicates that processing takes a long time when query requests are stored in a cached manner, so the number of query requests that need to be cached should be minimized. Based on this, the value of the resource threshold can be increased, so that the sum of the resource occupation information and the currently occupied resources of the front-end server is less likely to reach the threshold; query requests are then not cached, and overly long processing times are avoided.
  • If the processing time corresponding to the query request is less than the preset second time threshold, processing remains fast even when query requests are stored in a cached manner, so the number of cached query requests may be increased. Based on this, the value of the resource threshold can be decreased, so that the sum of the resource occupation information and the currently occupied resources of the front-end server more easily reaches the threshold; query requests can then be cached, saving the processing resources of the front-end server.
  • Case 2 is another flowchart of the data query method, and the method includes:
  • Step 421 Parse the parameter information carried in the query request.
  • Step 422 Determine whether the parameter information exists in the parameter table.
  • the parameter table is used to record parameter information that needs to be cached. If yes, go to step 423; if no, go to step 424.
  • Step 423 storing context information corresponding to the query request in an external memory.
  • the step of reading the context information from the external memory, acquiring the data set corresponding to the query request by using the context information, and storing the data set in the external memory is performed.
  • Step 424 Obtain a data set corresponding to the query request, and send the data set to the client.
  • The front-end server can obtain the parameter information that needs to be cached and store it in a parameter table. Based on this, after receiving a query request, the front-end server may parse the parameter information carried in the query request and determine whether that parameter information exists in the parameter table. If yes, the query request needs to be cached; if not, it does not need to be cached.
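The parse-and-check logic can be sketched as a simple set lookup. This is a minimal illustration assuming parameter information is represented as strings; the table entries and names are hypothetical:

```python
# Parameter table: parameter information that marks a query request for caching.
parameter_table = {"need_cache=true", "scene=map"}  # illustrative entries

def should_cache(query_params):
    """A query request is cached if any carried parameter is in the table."""
    return bool(set(query_params) & parameter_table)

# Steps 422-424: branch on the lookup result.
print(should_cache({"need_cache=true", "limit=100"}))  # True  -> step 423
print(should_cache({"limit=100"}))                     # False -> step 424
```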
  • the process of “the front-end server acquires the parameter information that needs to be cached and stores the parameter information in the parameter table” may include, but is not limited to, the following manners:
  • Manner one: the front-end server receives a database creation request carrying the parameter information that needs to be cached, and records that parameter information into the parameter table. That is, a database creation request carrying the parameter information may be sent to the front-end server to notify it of the parameter information that needs to be cached, and the front-end server records the carried parameter information into the parameter table. In this manner, the parameter information is valid for all tables in the current database: for any table in the current database, if a query request for the table is received and carries the parameter information that needs to be cached, the query request needs to be cached.
  • Manner two: the front-end server receives a table group creation request carrying the parameter information that needs to be cached, and records that parameter information into the parameter table. That is, a table group creation request carrying the parameter information may be sent to the front-end server to notify it of the parameter information that needs to be cached, and the front-end server records the carried parameter information into the parameter table. In this manner, the parameter information is valid for all tables in the current table group: for any table in the current table group, if a query request for the table is received and carries the parameter information that needs to be cached, the query request needs to be cached.
  • Manner three: the front-end server receives a table creation request carrying the parameter information that needs to be cached, and records that parameter information into the parameter table. That is, a table creation request carrying the parameter information may be sent to the front-end server to notify it of the parameter information that needs to be cached, and the front-end server records the carried parameter information into the parameter table. In this manner, the parameter information is valid only for the current table: if a query request for the current table is received and carries the parameter information, the query request needs to be cached.
  • the tables all refer to data tables in the database, and no limitation is imposed thereon.
  • The above embodiment gives only a few examples of deciding whether to cache a query request; the manner in which the server determines to cache a query request is not limited thereto.
  • The front-end server can also force caching through global, database, table group, and table switch states. If the global switch is turned on, all query requests are cached; if the global switch is turned off and the database switch is turned on, all query requests for that database are cached; if the global switch and the database switch are turned off and the table group switch is turned on, all query requests for that table group are cached; if the global switch, the database switch, and the table group switch are all turned off and the table switch is turned on, all query requests for that table are cached.
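The switch cascade above can be sketched as follows; the scope names and the dictionary representation are illustrative assumptions:

```python
def force_cache(switches):
    """switches: on/off states of the global switch and of the database /
    table group / table switches that enclose the queried table.
    The query request is force-cached if any enclosing scope's switch is on,
    checked from the widest scope (global) down to the narrowest (table)."""
    return any(switches.get(scope, False)
               for scope in ("global", "database", "table_group", "table"))

print(force_cache({"global": True}))                       # True: cache all
print(force_cache({"global": False, "database": False,
                   "table_group": True}))                  # True: group scope
print(force_cache({"global": False, "database": False,
                   "table_group": False, "table": False})) # False: no caching
```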
  • Case 3 is another flow chart of the data query method, the method includes:
  • Step 431 Parse the parameter information carried in the query request.
  • Step 432 Determine whether the parameter information exists in the parameter table.
  • If yes, step 433 can be performed; if not, step 436 can be performed.
  • Step 433 Determine resource occupation information of the data set corresponding to the query request.
  • Step 434 Determine, according to the resource occupation information, whether to cache the query request.
  • If yes, step 435 can be performed; if not, step 436 can be performed.
  • Step 435 Store the context information corresponding to the query request in the external memory.
  • Step 436 Obtain a data set corresponding to the query request, and send the data set to the client.
  • In step 436, the front-end server may perform SQL parsing on the query request, generate a scheduling request according to the parsing result, and send the scheduling request to the computing server; the scheduling request is used to request the data corresponding to the query request. The front-end server can then receive the data set returned by the computing server and send the data set to the client.
  • The process of reading context information from the external memory may include, but is not limited to: determining the priority corresponding to each piece of context information by using the context information in the external memory; sorting the context information in the external memory by that priority; and reading the context information with the highest priority from the external memory according to the sorting result.
  • the external memory may store multiple context information.
  • The front-end server may determine the priority corresponding to each piece of context information and sort all context information in the external memory in order of priority from highest to lowest, then read the context information with the highest priority, i.e., the first piece; or sort all context information in order of priority from lowest to highest and read the context information with the highest priority, i.e., the last piece.
  • the context information may include, but is not limited to, a reception time of the query request and a predicted data amount corresponding to the query request, as shown in Table 1.
  • The process of determining the priority corresponding to the context information by using the context information in the external memory may include, but is not limited to: the front-end server determines the waiting time corresponding to the context information by using the difference between the current time and the receiving time; then, according to the waiting time and the predicted data amount, it determines the priority corresponding to the context information.
  • The process of determining the priority corresponding to the context information according to the waiting time and the predicted data amount may include, but is not limited to: normalizing the waiting time to obtain a first sub-weight; normalizing the predicted data amount to obtain a second sub-weight; obtaining a first weight value according to the first sub-weight and the second sub-weight; and obtaining the priority corresponding to the context information according to the first weight value, where a larger first weight value means a higher priority and a smaller first weight value means a lower priority.
  • the front-end server may determine the priority A corresponding to the query identifier 101, the priority B corresponding to the query identifier 102, and the priority C corresponding to the query identifier 103.
  • the current time is 2017.11.6.14.16.18 (ie, 14:16:18 on November 6, 2017)
  • the waiting time corresponding to the query identifier 101 may be 8 seconds
  • the waiting time corresponding to the query identifier 102 may be 3 seconds
  • the waiting time corresponding to the query identifier 103 is 1 second.
  • The waiting time is normalized, i.e., converted into a value between 0 and 1; the longer the waiting time, the larger the normalized first sub-weight. For example, the waiting time corresponding to a query identifier is divided by the sum of the waiting times of all query identifiers. Therefore, the first sub-weight corresponding to query identifier 101 is 8/12, i.e., 0.667; the first sub-weight corresponding to query identifier 102 is 3/12, i.e., 0.25; and the first sub-weight corresponding to query identifier 103 is 1/12, i.e., 0.083.
  • the above method is only an example of normalizing the waiting time, and the processing manner is not limited.
  • The predicted data amount is normalized, i.e., converted into a value between 0 and 1; in this conversion, the larger the predicted data amount, the smaller the normalized second sub-weight.
  • For example, the predicted data amount corresponding to a query identifier is first subtracted from the sum of the predicted data amounts of all query identifiers, and the calculation result is then divided by that sum.
  • Therefore, the second sub-weight corresponding to query identifier 101 is (37.6-25.6)/37.6, i.e., 0.319; the second sub-weight corresponding to query identifier 102 is (37.6-10)/37.6, i.e., 0.734; and the second sub-weight corresponding to query identifier 103 is (37.6-2)/37.6, i.e., 0.947.
  • the above is just an example of normalization, and there is no limitation on this processing method.
  • the product of the first sub-weight and the second sub-weight is taken as the first weight value.
  • the first weight value corresponding to the query identifier 101 is 0.667*0.319, that is, 0.213; the first weight value corresponding to the query identifier 102 is 0.25*0.734, that is, 0.183; and the first weight value corresponding to the query identifier 103 is 0.083*0.947. That is 0.079.
  • The priority corresponding to each query identifier is obtained from its first weight value; there is no limitation on the mapping, as long as a larger first weight value yields a higher priority and a smaller first weight value yields a lower priority.
  • the first weight value 0.213 corresponds to a priority of 213 (or 21)
  • the first weight value 0.183 corresponds to a priority of 183 (or 18)
  • the first weight value of 0.079 corresponds to a priority of 79 (or 8).
  • the priority of the query identifier 101 may be higher than the priority corresponding to the query identifier 102, and the priority corresponding to the query identifier 102 may be higher than the priority corresponding to the query identifier 103. That is to say, the ranking result of all the context information in the external memory is: the context information corresponding to the query identifier 101, the context information corresponding to the query identifier 102, and the context information corresponding to the query identifier 103. Therefore, the front-end server can read the context information corresponding to the query identifier 101 from the external memory.
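The worked example above can be reproduced numerically. A sketch assuming the normalizations just described (waiting time divided by the total waiting time; predicted amount subtracted from the total and divided by it), with the first weight value taken as the product of the two sub-weights:

```python
# Context info per query identifier: (waiting time in s, predicted data in MB).
contexts = {101: (8, 25.6), 102: (3, 10.0), 103: (1, 2.0)}

total_wait = sum(w for w, _ in contexts.values())   # 12
total_pred = sum(p for _, p in contexts.values())   # 37.6

weights = {}
for qid, (wait, pred) in contexts.items():
    first_sub = wait / total_wait                   # e.g. 8/12 = 0.667 for 101
    second_sub = (total_pred - pred) / total_pred   # e.g. 12/37.6 = 0.319
    weights[qid] = first_sub * second_sub           # first weight value

# Larger weight -> higher priority: query identifier 101 is read first.
order = sorted(weights, key=weights.get, reverse=True)
print(order)                   # [101, 102, 103]
print(round(weights[101], 3))  # ~0.213, matching the example
```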
  • the front-end server can parse the query request 1 from the context information corresponding to the query identifier 101, and obtain the data set corresponding to the query request 1, assuming that the acquired data set is the data set A. Then, the data set A is stored in the external memory, as shown in Table 4, as an example after storing the data set A.
  • the process of “acquiring the data set corresponding to the query request by using the context information” may include, but is not limited to, the following method: the front-end server may parse the query request from the context information, and generate and The query requests a corresponding scheduling request and can send the scheduling request to the computing server. Then, the front-end server can receive the data set corresponding to the scheduling request returned by the computing server, and the data set is also the data set corresponding to the query request.
  • the data returned by the computing server to the front-end server may be referred to as a data set.
  • the query request may be an SQL query request. Therefore, the front-end server performs SQL parsing on the query request, and learns that the query request is used for the requested data set, such as request data set 1 and data set 2. Then, the front-end server analyzes the computing server corresponding to the data set 1, and sends a scheduling request for requesting the data set 1 to the computing server; after receiving the scheduling request, the computing server sends the data set 1 to the front-end server. In addition, the front-end server analyzes the computing server corresponding to the data set 2, and sends a scheduling request for requesting the data set 2 to the computing server; after receiving the scheduling request, the computing server sends the data set 2 to the front-end server. After receiving the data set 1 and the data set 2, the front-end server can form data set 1 and data set 2 into a data set, that is, a data set corresponding to the query request.
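The split-and-merge flow described here can be sketched as follows; the server mapping and the stand-in fetch function are illustrative assumptions, not the application's actual scheduling protocol:

```python
# Illustrative mapping from each requested part to the computing server that
# holds it (in practice the front-end server derives this by SQL parsing).
part_to_server = {"data_set_1": "compute-a", "data_set_2": "compute-b"}

def fetch_from(server, part):
    """Stand-in for sending a scheduling request and awaiting the reply."""
    return [f"{part}@{server}"]  # the computing server returns its part

def answer_query(requested_parts):
    result = []
    for part in requested_parts:
        server = part_to_server[part]            # which server holds the part
        result.extend(fetch_from(server, part))  # one scheduling request each
    return result  # combined data set corresponding to the query request

print(answer_query(["data_set_1", "data_set_2"]))
```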
  • In step 303, when the front-end server reads context information from the external memory, even if multiple pieces of context information exist there, the front-end server reads only one piece in one reading process. After using that context information to obtain the corresponding data set and storing the data set in the external memory, the front-end server reads another piece of context information from the external memory (in that reading process, the operations of determining priority, sorting, and so on are performed again and are not repeated here), and so on.
  • the front-end server may start a scheduling thread, and the scheduling thread first processes the query request 3 in the external memory, acquires the data set corresponding to the query request 3, and stores the data set in the external memory, and then schedules The thread processes the query request 4 in the external memory, and so on.
  • The process of reading the data set from the external memory may include, but is not limited to: using the context information in the external memory to determine the priority of the data set corresponding to each piece of context information; sorting the data sets in the external memory by those priorities; and reading the data set from the external memory according to the sorting result of the data sets.
  • a plurality of data sets may be stored in the external memory, and each data set has corresponding context information.
  • The front-end server uses the context information corresponding to each data set to determine the priority of that data set, and sorts all data sets in the external memory in order of priority from highest to lowest, or from lowest to highest.
  • The front-end server may also calculate the data volume of the data set (i.e., the true data volume, no longer the predicted data volume) and store that data volume in the context information. Table 5 shows an example of such context information.
  • the context information may include, but is not limited to, the receiving time of the query request, and the data amount of the data set corresponding to the query request.
  • The process of determining the priority of the data set corresponding to the context information by using the context information in the external memory may include, but is not limited to: the front-end server determines the waiting time corresponding to the context information by using the difference between the current time and the receiving time; then, according to the waiting time and the data amount of the data set, it determines the priority of the data set corresponding to the context information.
  • the process of determining the priority of the data set corresponding to the context information according to the waiting time and the data amount of the data set may include, but is not limited to, normalizing the waiting time to obtain a third Sub-weighting; normalizing the data amount to obtain a fourth sub-weight; obtaining a second weight value according to the third sub-weight and the fourth sub-weight; obtaining a priority of the data set according to the second weight value; Wherein, when the second weight value is larger, the priority of the data set corresponding to the context information is higher, and when the second weight value is smaller, the priority of the data set corresponding to the context information is lower.
  • The waiting time can be converted into a value between 0 and 1: the longer the waiting time, the larger the normalized third sub-weight. The data amount can likewise be converted into a value between 0 and 1: the larger the data amount, the smaller the normalized fourth sub-weight.
  • The process of determining the priority of the data set according to the waiting time and the data amount of the data set is similar to the above process of determining the priority corresponding to the context information according to the waiting time and the predicted data amount; the difference is that the data amount of the data set is used instead of the predicted data amount. Details are not repeated here.
  • The sorting result is: data set A, data set B, data set C.
  • The process of reading a data set from the external memory according to the sorting result may include, but is not limited to, the following: if the data sets are sorted in descending order of priority, the front-end server can read the top N data sets from the external memory; if they are sorted in ascending order of priority, the front-end server can read the last N data sets from the external memory. N is a positive integer greater than or equal to 1, and the sum of the resource occupation information corresponding to the N data sets and the currently occupied resources of the front-end server may be smaller than the current resource threshold.
  • the resource occupation information may include a memory usage rate, a CPU usage rate, and a data volume of the data set.
  • the resource threshold may include a memory threshold, a CPU threshold, and a data volume threshold.
  • the front-end server can also count current memory usage, current CPU usage, and current bandwidth usage.
  • The front-end server determines the memory usage corresponding to the data set with the highest priority (see step 411), its CPU usage (see step 411), and the data volume of the data set (known from Table 5). If the sum of that memory usage and the current memory usage is less than the memory threshold, the sum of that CPU usage and the current CPU usage is less than the CPU threshold, and the sum of that data volume and the current bandwidth usage is less than the data volume threshold, the data set with the highest priority is read from the external memory. If any one or more of these three conditions is not satisfied, the data set is not read from the external memory; after waiting for a preset time, the front-end server again determines whether the data set with the highest priority can be read from the external memory.
  • After reading the data set with the highest priority from the external memory, the front-end server determines the memory usage, CPU usage, and data volume corresponding to the data set with the second highest priority. If the sum of the memory usage of the highest-priority data set, the memory usage of the second-highest-priority data set, and the current memory usage is less than the memory threshold; the sum of the CPU usage of the highest-priority data set, the CPU usage of the second-highest-priority data set, and the current CPU usage is less than the CPU threshold; and the sum of the data volume of the highest-priority data set, the data volume of the second-highest-priority data set, and the current bandwidth usage is less than the data volume threshold, then the data set with the second highest priority is read from the external memory, and so on. If any one or more of these three conditions is not satisfied, the data set with the second highest priority is not read from the external memory, and the reading process ends.
  • the front-end server can read N data sets from the external memory, and the sum of the resource occupation information corresponding to the N data sets and the current occupied resources of the front-end server is smaller than the resource threshold.
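The reading rule above amounts to a greedy scan over the priority-sorted data sets, admitting each one only while the memory, CPU, and data-volume budgets all hold. A sketch with illustrative numbers (the wait-and-retry behavior for the highest-priority data set is omitted for brevity):

```python
def select_datasets(sorted_sets, current, thresholds):
    """sorted_sets: list of (name, {"mem":..., "cpu":..., "data":...}) in
    descending priority order. current: resources already occupied by the
    front-end server. Returns the top-N data sets whose cumulative cost
    stays below every threshold; stops at the first set that would exceed
    any threshold, as in the scheme's reading process."""
    chosen, used = [], dict(current)
    for name, cost in sorted_sets:
        if all(used[k] + cost[k] < thresholds[k] for k in thresholds):
            chosen.append(name)
            for k in cost:
                used[k] += cost[k]
        else:
            break  # a condition failed: end the reading process
    return chosen

# Illustrative numbers: percentages for mem/cpu, MB for data.
sets = [("A", {"mem": 10, "cpu": 5, "data": 20}),
        ("B", {"mem": 10, "cpu": 5, "data": 30}),
        ("C", {"mem": 10, "cpu": 5, "data": 30})]
current = {"mem": 50, "cpu": 40, "data": 0}
thresholds = {"mem": 80, "cpu": 60, "data": 60}
print(select_datasets(sets, current, thresholds))  # ['A', 'B']
```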
  • After reading the N data sets, the front-end server can send them to the client; for example, using the user information in the context information, each data set is sent to the corresponding client (this process is not repeated here). The front-end server can then also delete the context information corresponding to each data set from the external memory, completing the sending process of the data set.
  • In step 304, when the front-end server reads data sets from the external memory, even if there are more data sets in the external memory, the front-end server reads only N data sets in one reading process. After the N data sets are sent to the client, data sets are read from the external memory again, and so on.
  • In summary, after a query request is received, the data set is not directly sent to the client; instead, the query request and the data set are first stored in the external memory, so that the data set corresponding to the query request is cached locally. This avoids processing many query requests within a short period, i.e., avoids returning multiple data sets to the client in a short time, thereby reducing CPU resource, memory resource, and network bandwidth usage, avoiding client query timeouts or failures, and improving the user experience.
  • Moreover, query requests with large data amounts can be cached in local storage while query requests with small data amounts are not, so that small data sets can be sent directly to the client without being affected by the large ones, reducing the frequency of full GC as much as possible.
  • FIG. 5A is a schematic diagram of the test results.
  • A query of the form "id xxx limit xxx" was tested with limit 10, limit 100, limit 500, limit 1000, limit 2000, limit 5000, limit 8000, and limit 10000 to limit the number of rows in the data set; the larger the limit, the larger the data volume of the data set. FIG. 5A shows the relationship between the data volume of the data set and the average query time.
  • During the test, the memory usage rate was kept at around 75% by executing other programs beforehand, and the above query operations were performed asynchronously. Testing shows that with the method of this scheme, query time can be significantly reduced and the frequency of full GC can be reduced.
  • FIG. 5B is a schematic diagram of the relationship between memory usage and query time. An SQL query request was tested while other programs were run to control the memory usage. Testing shows that when the memory usage exceeds a certain value, the method of this scheme significantly reduces query time.
  • FIG. 5D shows the relationship between query concurrency and query timeout rate, which is similar to FIG. 5C: when the concurrency exceeds a certain critical value, the query timeout rate begins to increase. With the method of this scheme, the concurrency threshold at which the query timeout rate suddenly increases can be postponed, so in the same environment the number of concurrent users can be increased.
  • FIG. 5E is a schematic diagram for a mixed scenario (i.e., query operations with larger data sets and query operations with smaller data sets), showing the impact of the larger-data-set query operations on the timeout rate of the smaller-data-set query operations. As can be seen from FIG. 5E, when the concurrency of the larger-data-set query operations exceeds a certain critical value, the query timeout rate of the smaller-data-set query operations begins to increase. After using the method of this scheme, the rise of that query timeout rate can be slowed.
  • the embodiment of the present application further provides a data query device, which can be applied to a front-end server, as shown in FIG. 6 , which is a structural diagram of the device, and the device includes:
  • a receiving module 601 configured to receive a query request sent by the client;
  • a determining module 602 configured to determine resource occupation information of a data set corresponding to the query request;
  • an obtaining module 603 configured to obtain the data set corresponding to the query request when it is determined, according to the resource occupation information, to cache the query request;
  • a storage module 604 configured to store the data set in an external memory;
  • a sending module 605 configured to read the data set from the external memory and send the data set to the client.
  • the obtaining module 603 is specifically configured to: store, in an external memory, context information corresponding to the query request in a process of acquiring a data set corresponding to the query request; and read the context information from the external memory, The data set corresponding to the query request is obtained by using the context information.
  • The obtaining module 603 is specifically configured to: in the process of reading the context information from the external memory, determine the priority corresponding to each piece of context information by using the context information in the external memory; sort the context information in the external memory by that priority; and read the context information with the highest priority from the external memory according to the sorting result.
  • The sending module 605 is specifically configured to: in the process of reading the data set from the external memory, determine the priority of the data set corresponding to each piece of context information by using the context information in the external memory; sort the data sets in the external memory by those priorities; and read the data set from the external memory according to the sorting result of the data sets.
  • The embodiment of the present application further provides a front-end server, which may include a receiver, a processor, and a transmitter. The receiver is configured to receive a query request sent by the client; the processor is configured to determine resource occupation information of the data set corresponding to the query request and, if it is determined according to the resource occupation information to cache the query request, acquire the data set corresponding to the query request, store the data set in an external memory, and read the data set from the external memory; the transmitter is configured to send the data set to the client.
  • the embodiment of the present application further provides a machine readable storage medium, which can be applied to a front-end server.
  • The machine-readable storage medium stores a plurality of computer instructions which, when executed, perform the following processing: receiving a query request sent by the client; determining resource occupation information of the data set corresponding to the query request; if it is determined according to the resource occupation information to cache the query request, acquiring the data set corresponding to the query request and storing the data set in an external memory; reading the data set from the external memory; and sending the data set to the client.
  • the system, device, module or unit illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product having a certain function.
  • A typical implementation device is a computer, and the specific form of the computer may be a personal computer, a laptop computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email transceiver, a game console, etc.
  • embodiments of the present application can be provided as a method, system, or computer program product.
  • the present application can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment in combination of software and hardware.
  • embodiments of the present application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • these computer program instructions can also be stored in a computer readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer readable memory produce an article of manufacture comprising the instruction device.
  • The instruction means implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing; the instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.


Abstract

一种数据查询方法、装置及设备,该方法包括:接收客户端发送的查询请求(301);确定与所述查询请求对应的数据集的资源占用信息(302);若根据所述资源占用信息确定对所述查询请求进行缓存,则获取与所述查询请求对应的数据集,并在外存储器中存储所述数据集(303);从所述外存储器中读取所述数据集,并将所述数据集发送给客户端(304)。通过该方法,可以避免短时间内向客户端返回多个数据集,从而减轻CPU资源、内存资源、网络带宽的占用,避免客户端查询超时或者失败,提高用户使用感受。

Description

一种数据查询方法、装置及设备
本申请要求2018年01月19日递交的申请号为201810053977.8、发明名称为“一种数据查询方法、装置及设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及互联网技术领域,尤其涉及一种数据查询方法、装置及设备。
背景技术
分析型数据库是一种实时计算引擎,能够对海量数据进行任意维度的分析和统计,支持高并发、低延时(毫秒级响应)、实时在线分析、海量数据查询等功能。其中,分析型数据库可以存储大量数据,在接收到客户端发送的查询请求后,可以查询与该查询请求对应的数据,并将查询到的数据返回给客户端。
但是,在某些场景下(如地图数据的查询场景、画像数据的查询场景等),可能在短时间接收到多个查询请求(即并发数很高),每个查询请求对应的数据量较大,这样,需要在短时间内处理多个查询请求,针对每个查询请求返回大量数据,从而导致CPU(Central Processing Unit,中央处理器)资源、内存资源、网络带宽等出现异常,导致客户端查询超时或者失败,影响用户使用感受。
发明内容
本申请提供一种数据查询方法,应用于前端服务器,所述方法包括:
接收客户端发送的查询请求;
确定与所述查询请求对应的数据集的资源占用信息;
若根据所述资源占用信息确定对所述查询请求进行缓存,则获取与所述查询请求对应的数据集,并在外存储器中存储所述数据集;
从所述外存储器中读取所述数据集,并将所述数据集发送给客户端。
本申请提供一种数据查询装置,应用于前端服务器,所述装置包括:
接收模块,用于接收客户端发送的查询请求;
确定模块,用于确定与所述查询请求对应的数据集的资源占用信息;
获取模块,用于在根据所述资源占用信息确定对所述查询请求进行缓存时,则获取与所述查询请求对应的数据集;
存储模块,用于在外存储器中存储所述数据集;
发送模块,用于从外存储器中读取数据集,并将所述数据集发送给客户端。
本申请提供一种前端服务器,所述前端服务器包括:
接收器,用于接收客户端发送的查询请求;
处理器,用于确定与所述查询请求对应的数据集的资源占用信息;若根据所述资源占用信息确定对所述查询请求进行缓存,则获取与所述查询请求对应的数据集,在外存储器中存储所述数据集;从外存储器中读取所述数据集;
发射器,用于将所述数据集发送给客户端。
基于上述技术方案,本申请实施例中,在接收到查询请求后,不是直接将数据集发送给客户端,而是先在外存储器中存储查询请求、数据集,这样,可以将查询请求对应的数据集缓存在本地,从而可以避免短时间内处理多个查询请求,即避免短时间内向客户端返回多个数据集,从而减轻CPU资源、内存资源、网络带宽的占用,避免客户端查询超时或者失败,提高用户使用感受。
附图说明
为了更加清楚地说明本申请实施例或者现有技术中的技术方案,下面将对本申请实施例或者现有技术描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请中记载的一些实施例,对于本领域普通技术人员来讲,还可以根据本申请实施例的这些附图获得其它的附图。
图1是本申请一种实施方式中的系统结构示意图;
图2A是本申请一种实施方式中的查询请求处理示意图;
图2B是本申请另一种实施方式中的查询请求处理示意图;
图3是本申请一种实施方式中的数据查询方法的流程图;
图4A是本申请另一种实施方式中的数据查询方法的流程图;
图4B是本申请又一种实施方式中的数据查询方法的流程图;
图4C是本申请又一种实施方式中的数据查询方法的流程图;
图5A是本申请一种实施方式中的效果示意图;
图5B是本申请另一种实施方式中的效果示意图;
图5C是本申请又一种实施方式中的效果示意图;
图5D是本申请又一种实施方式中的效果示意图;
图5E是本申请又一种实施方式中的效果示意图;
图6是本申请一种实施方式中的数据查询装置的结构图。
具体实施方式
在本申请使用的术语仅仅是出于描述特定实施例的目的,而非限制本申请。本申请和权利要求书中所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式,除非上下文清楚地表示其它含义。还应当理解,本文中使用的术语“和/或”是指包含一个或多个相关联的列出项目的任何或所有可能组合。
应当理解,尽管在本申请可能采用术语第一、第二、第三等来描述各种信息,但这些信息不应限于这些术语。这些术语仅用来将同一类型的信息彼此区分开。例如,在不脱离本申请范围的情况下,第一信息也可以被称为第二信息,类似地,第二信息也可以被称为第一信息。此外,取决于语境,所使用的词语“如果”可以被解释成为“在……时”或“当……时”或“响应于确定”。
本申请实施例提出一种数据查询方法,可以应用于包括客户端、前端服务器(也可以称为前端节点front node)、计算服务器(也可以称为计算节点compute node)、数据库的系统,如用于实现分析型数据库(如Analytic DB,简称为ADS)的系统。当然,还可以包括其它服务器,如资源调度服务器,对此不做限制。
参见图1所示,为本申请实施例的应用场景示意图,客户端、前端服务器、计算服务器、数据库的数量均可以为一个或者多个。在图1中,以3个客户端、1个前端服务器、3个计算服务器、3个数据库为例进行说明。当然,客户端、前端服务器、计算服务器、数据库的数量还可以为其它数值,对此不做限制。
其中,分析型数据库是一种实时计算引擎,能够对海量数据进行任意维度的分析和统计,支持高并发、低延时、实时在线分析、海量数据查询等功能。
其中,客户端用于从数据库中查询数据,如可以是终端设备(如PC(Personal Computer,个人计算机)、笔记本电脑、移动终端等)上的APP(Application,应用),也可以是终端设备上的浏览器,对此客户端的类型不做具体限制。
其中,数据库用于存储各种类型的数据,且能够将数据库中存储的数据提供给客户端。对于数据库中存储的数据的类型,本申请实施例中不做限制,如可以是用户数据、商品数据、地图数据、视频数据、图像数据、音频数据等。
其中,前端服务器用于接收客户端发送的查询请求,并对该查询请求进行SQL(Structured Query Language,结构化查询语言)解析,利用SQL解析结果生成调度请求,并将该调度请求发送给计算服务器,该调度请求用于请求与该查询请求对应的数据。接收计算服务器返回的数据,并将该数据发送给客户端。
其中,计算服务器用于接收前端服务器发送的调度请求,并利用该调度请求从数据库中读取与该调度请求对应的数据,并将该数据发送给前端服务器。
参见图2A所示,前端服务器可以启动多个线程,每个线程用于处理一个查询请求。例如,前端服务器在接收到查询请求1后,为查询请求1启动线程1,由线程1处理查询请求1,即执行将调度请求发送给计算服务器、将计算服务器返回的数据发送给客户端等操作,在将数据发送给客户端后,则释放线程1。在线程1工作时,若前端服务器接收到查询请求2,则为查询请求2启动线程2,由线程2处理查询请求2。在线程1、线程2工作时,若前端服务器接收到查询请求3,则为查询请求3启动线程3,由线程3处理查询请求3,以此类推。
综上所述,若前端服务器在短时间内接收到多个查询请求(即并发数很高),则前端服务器可以启动多个线程,并由多个线程分别对多个查询请求进行处理。进一步的,当每个查询请求对应的数据量较大时,则需要针对每个查询请求返回大量数据,从而导致CPU资源、内存资源、网络带宽等出现异常,导致前端服务器的处理性能下降,并导致客户端查询超时或者失败,影响用户使用感受。例如,针对每个查询请求返回大量数据时,这些数据会占用大量的内存资源,从而存在内存溢出或者内存频繁回收等问题,导致查询超时或者失败。
针对上述发现,本申请实施例中,参见图2B所示,前端服务器可以启动多个查询线程,每个查询线程用于处理一个查询请求。此外,前端服务器可以启动一个调度线程,该调度线程用于依次处理外存储器中的查询请求。例如,前端服务器在接收到查询请求1后,为查询请求1启动查询线程1,由查询线程1处理查询请求1。在查询线程1工作时,若前端服务器接收到查询请求2,则为查询请求2启动查询线程2,由查询线程2处理查询请求2。在查询线程1、查询线程2工作时,若前端服务器接收到查询请求3、查询请求4、查询请求5等,不再为这些查询请求启动查询线程,而是将这些查询请求存储到外存储器中。
前端服务器可以启动调度线程,调度线程先处理外存储器中的查询请求3,在查询请求3处理完成之后,调度线程才处理外存储器中的查询请求4,在查询请求4处理完成之后,调度线程才处理外存储器中的查询请求5,以此类推。
综上所述,若前端服务器在短时间内接收到多个查询请求(即并发数很高),则可以控制查询线程的启动数量,可以避免短时间内处理多个查询请求,即避免短时间内向客户端返回大量数据,从而减轻CPU资源、内存资源、网络带宽的占用,提高处理性能,避免客户端查询超时或者失败,提高用户使用感受。
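上述查询线程与调度线程的配合方式,可以用如下极简的Python示意代码表达(其中线程上限MAX_ACTIVE、类名FrontNode等均为说明性假设,并非专利方案的权威实现,也未模拟真实的线程释放):

```python
import queue

MAX_ACTIVE = 2  # 假设:前端服务器同时最多启动2个查询线程

class FrontNode:
    def __init__(self):
        self.active = 0                 # 当前已启动的查询线程数(示意,未模拟释放)
        self.pending = queue.Queue()    # 模拟"外存储器"中缓存的查询请求
        self.sent = []                  # 已发送给客户端的数据集

    def handle(self, req):
        # 示意:生成调度请求、获取数据集并发送给客户端
        self.sent.append(f"dataset({req})")

    def on_request(self, req):
        if self.active < MAX_ACTIVE:    # 仍有空闲查询线程:直接处理
            self.active += 1
            self.handle(req)
        else:                           # 否则先缓存,由调度线程依次处理
            self.pending.put(req)

    def schedule(self):
        # 调度线程:依次处理外存储器中缓存的查询请求
        while not self.pending.empty():
            self.handle(self.pending.get())
```

例如依次到达4个查询请求时,前2个被查询线程直接处理,后2个进入队列,由调度线程按先后顺序逐个处理。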
在上述应用场景下,参见图3所示,为本申请实施例中提出的数据查询方法的流程图,该方法可以应用于前端服务器,该方法可以包括以下步骤:
步骤301,接收客户端发送的查询请求。
步骤302,确定与该查询请求对应的数据集的资源占用信息。
步骤303,若根据该资源占用信息确定对该查询请求进行缓存,则获取与该查询请求对应的数据集,并在外存储器中存储该数据集。
步骤304,从该外存储器中读取数据集,并将该数据集发送给客户端。
在一个例子中,上述执行顺序只是为了方便描述给出的一个示例,在实际应用中,还可以改变步骤之间的执行顺序,对此执行顺序不做限制。而且,在其它实施例中,并不一定按照本说明书示出和描述的顺序来执行相应方法的步骤,其方法所包括的步骤可以比本说明书所描述的更多或更少。此外,本说明书中所描述的单个步骤,在其它实施例中可能被分解为多个步骤进行描述;本说明书中所描述的多个步骤,在其它实施例也可能被合并为单个步骤进行描述。
针对步骤301,在一个例子中,当客户端请求数据时,可以发送查询请求,前端服务器接收客户端发送的查询请求,该查询请求用于请求数据库中的数据。
针对步骤302,在确定与该查询请求对应的数据集的资源占用信息之后,若根据该资源占用信息确定不对该查询请求进行缓存,则可以获取与该查询请求对应的数据集,并将该数据集发送给客户端,而不缓存该数据集。
针对步骤303,获取与该查询请求对应的数据集,并在外存储器中存储该数据集的过程,可以包括但不限于:在外存储器中存储该查询请求对应的上下文信息。然后,从该外存储器中读取上下文信息,并利用所述上下文信息获取与该查询请求对应的数据集,并在该外存储器中存储该数据集。
针对步骤303,在一个例子中,前端服务器在接收到查询请求之后,可以直接在外存储器中存储该查询请求对应的上下文信息;或者,在满足某些特定条件时,前端服务器才可以在外存储器中存储该查询请求对应的上下文信息。
其中,用于存储查询请求对应的上下文信息、数据集(为了方便描述,将最终返回给客户端的数据称为数据集,后续过程中会介绍数据集)的存储介质是外存储器,而不是内存储器(即内存),从而避免占用内存储器的资源。外存储器是指除内存及CPU缓存以外的存储器,可以包括但不限于硬盘、软盘、光盘、U盘等,对此外存储器的类型不做限制,只要与内存储器不同即可。
其中,查询请求对应的上下文信息可以包括但不限于以下之一或者任意组合:查询请求对应的查询标识、查询请求的接收时间、查询请求对应的用户信息(如客户端的IP地址等)、查询请求对应的预测数据量、所述查询请求。参见表1所示,为查询请求对应的上下文信息的一个示例,对此不做限制。
表1
查询标识 接收时间 用户信息 预测数据量 查询请求
101 2017.11.6.14.16.10 IP地址A 25.6M 查询请求1
...
查询请求可以对应一个查询标识,该查询标识可以具有唯一性,即不同的查询请求的查询标识不同。通过这个查询标识(如表1中的查询标识101),前端服务器可以获取到接收时间、用户信息、预测数据量、查询请求等内容。
接收时间是指前端服务器接收到该查询请求的时间,即前端服务器在接收到查询请求时,可以记录该查询请求的接收时间,如2017.11.6.14.16.10等。
用户信息可以包括但不限于客户端的IP地址(查询请求的源IP地址)、用户身份信息(如用户名、密码等,可以从查询请求中获知),对此不做限制。
预测数据量是指确定的与查询请求对应的数据集的数据量(即数据集的数据大小)。例如,前端服务器在接收到查询请求1后,由于当前还没有真正获取查询请求1对应的数据集,因此,可以先确定与该查询请求1对应的数据集的数据量,这个数据量就是预测数据量,对此确定过程,将在后续过程中说明。
查询请求是前端服务器接收到的查询请求,包括查询请求携带的所有内容。
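表1中的上下文信息可以用一个简单的结构体表示,以下为一个示意性草图(字段名均为假设,仅用于说明上下文信息包含哪些内容):

```python
import time
from dataclasses import dataclass, field

@dataclass
class QueryContext:
    query_id: int           # 查询标识,具有唯一性
    user_info: str          # 用户信息,如客户端的IP地址
    predicted_bytes: float  # 预测数据量(单位为示意)
    request: str            # 原始查询请求,包括其携带的所有内容
    recv_time: float = field(default_factory=time.time)  # 接收时间

# 对应表1第一行的示例
ctx = QueryContext(101, "IP地址A", 25.6, "查询请求1")
```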
在一个例子中,针对“满足某些特定条件时,前端服务器才可以在外存储器中存储该查询请求对应的上下文信息”的过程,可以包括但不限于如下情况:
情况一、参见图4A所示,为数据查询方法的另一个流程图,该方法包括:
步骤411,确定与查询请求对应的数据集的资源占用信息。
步骤412,根据该资源占用信息确定是否对该查询请求进行缓存。
如果是,则可以执行步骤413;如果否,则可以执行步骤414。
步骤413,在外存储器中存储该查询请求对应的上下文信息。
进一步的,执行从外存储器中读取上下文信息,并利用所述上下文信息获取与该查询请求对应的数据集,并在外存储器中存储所述数据集等步骤。
步骤414,获取该查询请求对应的数据集,并将该数据集发送给客户端。
针对步骤302或者步骤411,在一个例子中,针对“确定与查询请求对应的数据集的资源占用信息”的过程,可以包括但不限于:通过该查询请求携带的数据标识查询映射表,得到该数据标识对应的数据集的资源占用信息;其中,该映射表用于记录数据标识与资源占用信息的对应关系。进一步的,为了生成映射表,则前端服务器在获取到与查询请求对应的数据集后,可以在所述映射表中记录该查询请求携带的数据标识与该数据集的资源占用信息的对应关系。
例如,前端服务器第一次接收到用于请求数据集A的查询请求后,通过该查询请求携带的数据标识(如数据集A对应的数据标识1)查询映射表,由于映射表中不存在该数据标识1对应的资源占用信息,因此,前端服务器确定资源占用信息为默认的资源占用信息1(资源占用信息1是根据经验预先配置的,当映射表中不存在数据标识对应的资源占用信息时,就确定出资源占用信息1)。
进一步的,前端服务器在获取到与该查询请求对应的数据集(在后续过程中,会介绍如何获取到与该查询请求对应的数据集)后,可以统计出该数据集的资源占用信息,假设资源占用信息为资源占用信息A,则可以在映射表中记录数据标识1与资源占用信息A的对应关系,如表2所示,为映射表的示例。
表2
数据标识 资源占用信息
数据标识1 资源占用信息A
前端服务器再次(如第二次、第三次等)接收到用于请求数据集A的查询请求后,通过该查询请求携带的数据标识1查询映射表,可以得到与数据标识1对应的资源占用信息A,即确定结果为资源占用信息A。前端服务器在获取到与该查询请求对应的数据集后,可以统计出该数据集的资源占用信息;若当前的统计结果与资源占用信息A相同,则保持资源占用信息A不变;若当前的统计结果与资源占用信息A不同,则使用当前的统计结果替换资源占用信息A。
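以上映射表的查询与更新逻辑可以概括为如下示意代码(DEFAULT_USAGE对应默认的“资源占用信息1”,具体数值与字段名均为假设):

```python
# 默认的资源占用信息1(根据经验预先配置,数值为假设)
DEFAULT_USAGE = {"mem": 0.05, "cpu": 0.05, "bytes": 0}

mapping = {}  # 映射表:数据标识 -> 资源占用信息

def lookup_usage(data_id):
    # 映射表中不存在该数据标识时,返回默认的资源占用信息
    return mapping.get(data_id, DEFAULT_USAGE)

def record_usage(data_id, measured):
    # 获取到数据集后统计实际资源占用;与已记录值不同时用统计结果替换
    if mapping.get(data_id) != measured:
        mapping[data_id] = measured
```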
在一个例子中,上述的资源占用信息可以包括但不限于以下之一或者任意组合:内存占用率、CPU占用率、数据集的数据量(即数据集的大小)。
针对“统计数据集的资源占用信息”的过程,可以包括:在得到与查询请求对应的数据集后,可以统计该数据集对应的内存占用率,即这个数据集占用了多少内存,如5%等,表示这个数据集占用了总内存的5%。在得到与查询请求对应的数据集后,可以统计该数据集对应的CPU占用率,即这个数据集占用了多少CPU,如5%等,表示这个数据集占用了总CPU的5%。在得到与查询请求对应的数据集后,可以统计该数据集对应的数据量,即这个数据集的大小。
针对步骤412,在一个例子中,针对“根据该资源占用信息确定是否对该查询请求进行缓存”的过程,可以包括但不限于如下方式:若该资源占用信息与前端服务器的当前占用资源之和,大于等于资源阈值,则可以确定对该查询请求进行缓存;或者,若该资源占用信息与前端服务器的当前占用资源之和,小于所述资源阈值,则可以确定不对该查询请求进行缓存。
假设资源占用信息包括内存占用率、CPU占用率、数据集的数据量,则资源阈值可以包括内存阈值(如85%)、CPU阈值(如90%)、数据量阈值(如200M)。前端服务器可以统计当前内存使用情况(如60%,表示当前已经被使用了60%的内存),当前CPU使用情况(如60%,表示当前已经被使用了60%的CPU),当前带宽使用情况(如100M,表示当前被占用了100M的带宽)。
然后,若内存占用率与当前内存使用情况之和小于内存阈值,且CPU占用率与当前CPU使用情况之和小于CPU阈值,且数据集的数据量与当前带宽使用情况之和小于数据量阈值,则确定不对查询请求进行缓存。若下述三个条件满足一个或者多个,则确定对查询请求进行缓存:内存占用率与当前内存使用情况之和大于等于内存阈值,CPU占用率与当前CPU使用情况之和大于等于CPU阈值,数据集的数据量与当前带宽使用情况之和大于等于数据量阈值。
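上述“是否缓存”的判断可以写成如下示意函数(阈值数值取自上文示例,字典字段名为假设):

```python
# 资源阈值:内存85%、CPU90%、数据量200M(示例值)
THRESHOLDS = {"mem": 0.85, "cpu": 0.90, "bytes": 200}

def should_cache(usage, current):
    # 任一维度上,资源占用信息与当前占用资源之和大于等于阈值,即对查询请求进行缓存;
    # 三个维度全部小于阈值时,不进行缓存
    return any(usage[k] + current[k] >= THRESHOLDS[k] for k in THRESHOLDS)
```

例如当前内存使用60%、CPU使用60%、带宽占用100M时,数据量为50M的数据集三个条件均不触发,不缓存;数据量为120M的数据集因数据量之和达到220M而触发缓存。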
在一个例子中,资源阈值(如内存阈值、CPU阈值、数据量阈值等)可以是根据经验配置的固定值,即资源阈值不会发生变化。在另一个例子中,资源阈值还可以动态调整,即根据经验配置资源阈值的初始值后,还可以动态调整所述资源阈值,以下结合具体实施例,对资源阈值的动态调整过程进行说明。
其中,针对外存储器中的查询请求,在将查询请求的数据集发送给客户端后,可以动态调整资源阈值,即外存储器中的每个查询请求触发调整资源阈值。
例如,在接收到查询请求1后,利用资源阈值1确定是否对查询请求1进行缓存;若进行缓存,则在外存储器中存储查询请求1对应的上下文信息;在将查询请求1对应的数据集发送给客户端后,将资源阈值1调整为资源阈值2。在接收到查询请求2后,利用资源阈值2确定是否对查询请求2进行缓存;若不进行缓存,则不会在外存储器中存储查询请求2对应的上下文信息;在将查询请求2对应的数据集发送给客户端后,不用调整资源阈值2。
在一个例子中,针对动态调整资源阈值的过程,可以包括:获取查询请求对应的处理时间;其中,该处理时间具体为:从接收到该查询请求开始,一直到将数据集发送给客户端之间的时间。然后,可以根据该查询请求对应的处理时间,调整资源阈值。进一步的,针对“根据该查询请求对应的处理时间,调整资源阈值”的过程,可以包括但不限于:若该查询请求对应的处理时间大于预设第一时间阈值,则可以提高资源阈值的取值;其中,提高后的资源阈值不大于资源阈值上限;若该查询请求对应的处理时间小于预设第二时间阈值,则可以降低资源阈值的取值;其中,降低后的资源阈值不小于资源阈值下限。
其中,查询请求的接收时间可以从上下文信息中获知,而数据集发送给客户端的时间,也就是资源阈值调整时的当前时间,综上所述,前端服务器可以将该当前时间与该接收时间之间的差值,确定为该查询请求对应的处理时间。
其中,上述预设第一时间阈值、上述预设第二时间阈值均可以根据经验进行配置,对此不做限制,且预设第一时间阈值可以大于预设第二时间阈值。
其中,资源阈值上限可以根据经验进行配置,对此资源阈值上限不做限制;资源阈值下限可以根据经验进行配置,对此资源阈值下限不做限制;当然,在配置资源阈值上限和资源阈值下限时,资源阈值上限可以大于资源阈值下限。
其中,提高资源阈值的取值可以包括:计算当前的资源阈值与第一资源调整值的和,若二者的和不大于资源阈值上限,则提高后的资源阈值就是二者的和;若二者的和大于资源阈值上限,则提高后的资源阈值就是资源阈值上限。
其中,降低资源阈值的取值可以包括:计算当前的资源阈值与第二资源调整值的差,若二者的差不小于资源阈值下限,则降低后的资源阈值就是二者的差;若二者的差小于资源阈值下限,则降低后的资源阈值就是资源阈值下限。
其中,上述第一资源调整值和上述第二资源调整值均可以根据经验进行配置,而且,第一资源调整值和第二资源调整值可以相同,也可以不同。
在上述实施例中,若查询请求对应的处理时间大于预设第一时间阈值,则说明采用缓存的方式存储查询请求时,查询请求的处理时间较长,因此,可以尽量减少需要缓存的查询请求数量。基于此,可以提高资源阈值的取值,这样,使得资源占用信息与前端服务器的当前占用资源之和尽量小于资源阈值,从而不需要对查询请求进行缓存,避免查询请求的处理时间太长。
在上述实施例中,若查询请求对应的处理时间小于预设第二时间阈值,则说明采用缓存的方式存储查询请求时,查询请求的处理时间较短,因此,可以增加需要缓存的查询请求数量。基于此,可以降低资源阈值的取值,这样,使得资源占用信息与前端服务器的当前占用资源之和尽量大于等于资源阈值,从而可以对查询请求进行缓存,并可以节省前端服务器的处理资源。
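资源阈值的动态调整过程可以概括为如下示意函数(时间阈值、调整步长、阈值上下限的数值均为假设):

```python
def adjust_threshold(threshold, elapsed,
                     t_high=5.0, t_low=1.0,       # 预设第一/第二时间阈值(假设值,秒)
                     step_up=0.05, step_down=0.05,  # 第一/第二资源调整值(假设值)
                     upper=0.95, lower=0.50):       # 资源阈值上限/下限(假设值)
    if elapsed > t_high:                 # 处理时间过长:提高阈值,减少需要缓存的请求
        return min(threshold + step_up, upper)
    if elapsed < t_low:                  # 处理时间过短:降低阈值,增加需要缓存的请求
        return max(threshold - step_down, lower)
    return threshold
```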
情况二、参见图4B所示,为数据查询方法的另一个流程图,该方法包括:
步骤421,解析查询请求中携带的参数信息。
步骤422,判断参数表中是否存在该参数信息。其中,该参数表用于记录需要进行缓存的参数信息。如果是,执行步骤423;如果否,执行步骤424。
步骤423,在外存储器中存储该查询请求对应的上下文信息。
进一步的,执行从外存储器中读取上下文信息,并利用所述上下文信息获取与该查询请求对应的数据集,并在外存储器中存储所述数据集等步骤。
步骤424,获取该查询请求对应的数据集,并将该数据集发送给客户端。
在一个例子中,前端服务器可以获取需要进行缓存的参数信息,并将参数信息存储到参数表中。基于此,在接收到查询请求后,可以解析查询请求中携带的参数信息,并判断参数表中是否存在该参数信息。如果是,说明需要对查询请求进行缓存;如果否,说明不需要对查询请求进行缓存。
在一个例子中,针对“前端服务器获取需要进行缓存的参数信息,并将参数信息存储到参数表中”的过程,可以包括但不限于如下方式:
方式一、前端服务器接收数据库创建请求,该数据库创建请求携带需要进行缓存的参数信息,将该数据库创建请求中携带的参数信息记录到参数表。
其中,若需要对某些参数信息进行缓存,则可以在创建数据库时,指定需要进行缓存的参数信息。具体的,可以向前端服务器发送携带参数信息的数据库创建请求,从而将需要进行缓存的参数信息通知给前端服务器,由前端服务器将该数据库创建请求中携带的参数信息记录到参数表中。
例如,数据库创建请求可以为:create database test options(resource_type='ecu' ecu_type='c1' ecu_count=2 with_result_cache='true'),create database test options表示当前的消息是数据库创建请求,resource_type='ecu'是目标数据库的资源模型类型、ecu_type='c1'是目标数据库的资源模型、ecu_count=2是目标数据库的资源模型的数量信息、with_result_cache='true'是需要进行缓存的参数信息。
此外,上述参数信息对当前数据库下的所有表都生效,也就是说,针对当前数据库下的所有表,若接收到针对这些表的查询请求,且查询请求中携带上述需要进行缓存的参数信息,就说明需要对该查询请求进行缓存。
方式二、前端服务器接收表组创建请求,该表组创建请求携带需要进行缓存的参数信息,并将该表组创建请求中携带的参数信息记录到参数表。
其中,若需要对某些参数信息进行缓存,则可以在创建数据库的表组时,指定需要进行缓存的参数信息。具体的,可以向前端服务器发送携带参数信息的表组创建请求,从而将需要进行缓存的参数信息通知给前端服务器,由前端服务器将该表组创建请求中携带的参数信息记录到参数表中。
例如,所述表组创建请求的一个示例可以包括:create tablegroup test options(with_result_cache='true'),create tablegroup test options表示当前的消息是表组创建请求,而with_result_cache='true'就是需要进行缓存的参数信息。
此外,上述参数信息对当前表组下所有的表都生效,也就是说,针对当前表组下所有的表,若接收到针对这些表的查询请求,且查询请求中携带上述需要进行缓存的参数信息,就说明需要对该查询请求进行缓存。
方式三、前端服务器接收表创建请求,且该表创建请求携带需要进行缓存的参数信息,然后,将该表创建请求中携带的参数信息记录到参数表。
其中,若需要对某些参数信息进行缓存,则可以在创建数据库的表时,指定需要进行缓存的参数信息。具体的,可以向前端服务器发送携带参数信息的表创建请求,从而可以将需要进行缓存的参数信息通知给前端服务器,由前端服务器将该表创建请求中携带的参数信息记录到参数表中。
例如,表创建请求的示例可以为:CREATE TABLE test(col1 varchar,col2 varchar,PRIMARY KEY(col1)) PARTITION BY HASH KEY(col1) PARTITION NUM 10 SUBPARTITION BY LIST KEY(col2) TABLEGROUP test_group OPTIONS(UPDATETYPE='realtime' with_result_cache='true'),CREATE TABLE test表示当前的消息是表创建请求,(col1 varchar,col2 varchar,PRIMARY KEY(col1)) PARTITION BY HASH KEY(col1) PARTITION NUM 10 SUBPARTITION BY LIST KEY(col2) TABLEGROUP test_group OPTIONS与表有关,UPDATETYPE='realtime'是表的类型信息,with_result_cache='true'是需要进行缓存的参数信息。
上述参数信息只对当前表生效,也就是说,若接收到针对当前表的查询请求,且查询请求中携带上述参数信息,就说明需要对该查询请求进行缓存。
在上述实施例中,所述表均是指数据库中的数据表,对此不做限制。
当然,上述实施例只是给出了“是否对查询请求进行缓存”的几个示例,对此不做限制。例如,若需要对查询请求进行缓存,则可以在查询请求中携带hint参数,如查询请求为/*+withResultCache=true*/select id,name from test limit 100;由于查询请求携带hint参数,因此前端服务器在接收到该查询请求后,确定对查询请求进行缓存。又例如,前端服务器还可以通过全局、数据库、表组、表的开关状态,强制使用缓存。具体的,若全局开关开启,则对所有查询请求进行缓存;若全局开关关闭,数据库开关开启,则对该数据库的所有查询请求进行缓存;若全局开关、数据库开关关闭,表组开关开启,则对该表组的所有查询请求进行缓存;若全局开关、数据库开关、表组开关均关闭,而表开关开启,则对该表的所有查询请求进行缓存。
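参数表的登记与命中判断可以用如下示意代码概括(这里只处理with_result_cache一种参数,正则表达式与函数名均为说明性假设):

```python
import re

param_table = set()  # 参数表:记录需要进行缓存的参数信息

def register_cache_param(ddl):
    # 从建库/建表组/建表语句中提取需要缓存的参数信息并记录到参数表
    m = re.search(r"with_result_cache='(\w+)'", ddl)
    if m:
        param_table.add(m.group(0))

def needs_cache(query):
    # 查询请求中携带的参数信息存在于参数表中,则说明需要对该查询请求进行缓存
    return any(p in query for p in param_table)
```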
情况三、参见图4C所示,为数据查询方法的另一个流程图,该方法包括:
步骤431,解析查询请求中携带的参数信息。
步骤432,判断参数表中是否存在该参数信息。
如果是,则可以执行步骤433;如果否,则可以执行步骤436。
步骤433,确定与该查询请求对应的数据集的资源占用信息。
步骤434,根据该资源占用信息确定是否对该查询请求进行缓存。
如果是,则可以执行步骤435;如果否,则可以执行步骤436。
步骤435,在外存储器中存储该查询请求对应的上下文信息。
步骤436,获取该查询请求对应的数据集,并将该数据集发送给客户端。
在上述情况一、情况二、情况三中,针对“获取该查询请求对应的数据集,并将该数据集发送给客户端”的过程,前端服务器可以对该查询请求进行SQL解析,利用SQL解析结果生成调度请求,并将该调度请求发送给计算服务器,该调度请求用于请求与该查询请求对应的数据;然后,前端服务器可以接收计算服务器返回的数据集,并将该数据集发送给客户端。
针对步骤303,在一个例子中,针对“从外存储器中读取上下文信息”的过程,可以包括但不限于:利用外存储器中的上下文信息,确定该上下文信息对应的优先级;然后,利用该上下文信息对应的优先级,对外存储器中的上下文信息进行排序,并根据排序结果,从外存储器中读取优先级最高的上下文信息。
其中,外存储器可能存储多个上下文信息,针对所有的上下文信息,前端服务器可以确定每个上下文信息对应的优先级,并按照优先级从高到低的顺序,对外存储器中的所有上下文信息进行排序,读取优先级最高的上下文信息,即第一个上下文信息;或者,按照优先级从低到高的顺序,对外存储器中的所有上下文信息进行排序,读取优先级最高的上下文信息,即最后一个上下文信息。
在一个例子中,上下文信息可以包括但不限于查询请求的接收时间、查询请求对应的预测数据量,参见表1所示。基于此,针对“利用外存储器中的上下文信息,确定该上下文信息对应的优先级”的过程,可以包括但不限于:前端服务器利用当前时间与该接收时间的差,确定该上下文信息对应的等待时间;然后,根据该等待时间和该预测数据量,确定该上下文信息对应的优先级。
进一步的,针对“根据该等待时间和该预测数据量,确定该上下文信息对应的优先级”的过程,可以包括但不限于:对该等待时间进行归一化处理,得到第一子权重;对该预测数据量进行归一化处理,得到第二子权重;然后,根据该第一子权重和该第二子权重获得第一权重值;然后,根据该第一权重值获得上下文信息对应的优先级;其中,当第一权重值越大时,则上下文信息对应的优先级越高,当第一权重值越小时,则上下文信息对应的优先级越低。
例如,假设外存储器中的上下文信息如表3所示,前端服务器可以确定查询标识101对应的优先级A、查询标识102对应的优先级B、查询标识103对应的优先级C。具体的,假设当前时间为2017.11.6.14.16.18(即2017年11月6日14时16分18秒),则查询标识101对应的等待时间可以为8秒,查询标识102对应的等待时间可以为3秒,查询标识103对应的等待时间可以为1秒。
表3
查询标识 接收时间 用户信息 预测数据量 查询请求
101 2017.11.6.14.16.10 IP地址A 25.6M 查询请求1
102 2017.11.6.14.16.15 IP地址B 10M 查询请求2
103 2017.11.6.14.16.17 IP地址C 2M 查询请求3
然后,对等待时间进行归一化处理,即可以将等待时间转换为0与1之间的数值,在转换的过程中,当等待时间越大时,则归一化后的第一子权重越大。例如,将查询标识对应的等待时间除以所有查询标识的等待时间之和。因此,查询标识101对应的第一子权重为8/12,即0.667,查询标识102对应的第一子权重为3/12,即0.25,查询标识103对应的第一子权重为1/12,即0.083。当然,上述方式只是对等待时间进行归一化处理的一个示例,对此处理方式不做限制。
对预测数据量进行归一化处理,即可以将预测数据量转换为0与1之间的数值,在转换的过程中,当预测数据量越大时,则归一化后的第二子权重越小。例如,先使用所有查询标识的预测数据量之和减去查询标识对应的预测数据量,然后用计算结果除以所有查询标识的预测数据量之和。因此,查询标识101对应的第二子权重为(37.6-25.6)/37.6,即0.319,查询标识102对应的第二子权重为(37.6-10)/37.6,即0.734,查询标识103对应的第二子权重为(37.6-2)/37.6,即0.947。当然,上述只是归一化处理的示例,对此处理方式不做限制。
将第一子权重和第二子权重的积作为第一权重值。例如,查询标识101对应的第一权重值为0.667*0.319,即0.213;查询标识102对应的第一权重值为0.25*0.734,即0.183;查询标识103对应的第一权重值为0.083*0.947,即0.079。
然后,根据各第一权重值获得查询标识对应的优先级,对此不做限制,只要当第一权重值越大时,则优先级越高,当第一权重值越小时,则优先级越低即可。例如,第一权重值0.213对应的优先级为213(或21),第一权重值0.183对应的优先级为183(或18),第一权重值0.079对应的优先级为79(或8)。
综上所述,查询标识101对应的优先级可以高于查询标识102对应的优先级,且查询标识102对应的优先级可以高于查询标识103对应的优先级。也就是说,外存储器中的所有上下文信息的排序结果依次为:查询标识101对应的上下文信息、查询标识102对应的上下文信息、查询标识103对应的上下文信息。因此,前端服务器可以从外存储器中读取查询标识101对应的上下文信息。
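表3的优先级计算可以写成如下示意函数(函数名与数据结构为假设),它可以复现上文第一权重值0.213、0.183、0.079的计算结果(上文数值为四舍五入后的近似值):

```python
def first_weights(entries, now):
    # entries: [(查询标识, 接收时间, 预测数据量)];now为当前时间
    waits = {qid: now - recv for qid, recv, _ in entries}
    sizes = {qid: size for qid, _, size in entries}
    w_sum, s_sum = sum(waits.values()), sum(sizes.values())
    weights = {}
    for qid in waits:
        w1 = waits[qid] / w_sum             # 第一子权重:等待时间越大,取值越大
        w2 = (s_sum - sizes[qid]) / s_sum   # 第二子权重:预测数据量越大,取值越小
        weights[qid] = w1 * w2              # 第一权重值为两个子权重之积
    return weights
```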
然后,前端服务器可以从查询标识101对应的上下文信息中解析出查询请求1,并获取与该查询请求1对应的数据集,假设获取的数据集为数据集A。然后,在外存储器中存储该数据集A,参见表4所示,为存储数据集A后的示例。
表4
查询标识 接收时间 用户信息 预测数据量 查询请求 数据集
101 2017.11.6.14.16.10 IP地址A 25.6M 查询请求1 数据集A
102 2017.11.6.14.16.15 IP地址B 10M 查询请求2  
103 2017.11.6.14.16.17 IP地址C 2M 查询请求3  
针对步骤303,在一个例子中,针对“利用上下文信息获取与查询请求对应的数据集”的过程,可以包括但不限于如下方式:前端服务器可以从该上下文信息中解析出查询请求,并生成与该查询请求对应的调度请求,并可以向计算服务器发送该调度请求。然后,前端服务器可以接收计算服务器返回的与该调度请求对应的数据集,而这个数据集也就是与上述查询请求对应的数据集。其中,为了方便描述,可以将计算服务器向前端服务器返回的数据称为数据集。
其中,上述查询请求可以为SQL查询请求,因此,前端服务器对该查询请求进行SQL解析,获知该查询请求用于请求的数据集,如请求数据集1和数据集2。然后,前端服务器分析出数据集1对应的计算服务器,向该计算服务器发送用于请求数据集1的调度请求;计算服务器接收到该调度请求后,将数据集1发送给前端服务器。此外,前端服务器分析出数据集2对应的计算服务器,向该计算服务器发送用于请求数据集2的调度请求;计算服务器接收到该调度请求后,将数据集2发送给前端服务器。前端服务器在接收到数据集1和数据集2后,可以将数据集1和数据集2组成一个数据集,即与查询请求对应的数据集。
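“将查询请求拆分为多个调度请求、再把各计算服务器返回的数据组成一个数据集”的流程,可以抽象为如下示意函数(route、send均为假设的回调,分别代表SQL解析后的路由和向计算服务器的请求):

```python
def fetch_dataset(query, route, send):
    # route(query) -> [(计算服务器, 调度请求), ...],由SQL解析结果得到(示意)
    # send(server, sched) -> 该计算服务器返回的子数据集(示意)
    merged = []
    for server, sched in route(query):
        merged.extend(send(server, sched))  # 将各子数据集组成一个数据集
    return merged
```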
在步骤303中,前端服务器从外存储器中读取上下文信息时,即使外存储器中存在多个上下文信息,前端服务器的一次读取过程,可以只读取一个上下文信息,在利用所述上下文信息获取到与查询请求对应的数据集,并在外存储器中存储该数据集后,才会从外存储器中读取另一个上下文信息(在读取过程中,重新执行确定优先级、排序等操作,在此不再重复赘述),以此类推。
参见图2B所示,前端服务器可以启动一个调度线程,该调度线程先处理外存储器中的查询请求3,在获取到与查询请求3对应的数据集,并在外存储器中存储该数据集后,调度线程才处理外存储器中的查询请求4,以此类推。
针对步骤304,在一个例子中,针对“从该外存储器中读取数据集”的过程,可以包括但不限于如下方式:利用外存储器中的上下文信息,确定该上下文信息对应的数据集的优先级;然后,利用所述数据集的优先级,对该外存储器中的数据集进行排序,并根据数据集的排序结果,从该外存储器中读取数据集。
其中,外存储器中可能存储多个数据集,每个数据集有对应的上下文信息。针对所有数据集,前端服务器利用每个数据集对应的上下文信息,确定数据集的优先级;按照优先级从高到低的顺序,对外存储器中的所有数据集进行排序;或者,按照优先级从低到高的顺序,对外存储器中的所有数据集进行排序。
在一个例子中,前端服务器在获取到与查询请求对应的数据集后,还可以统计出该数据集的数据量(即真正的数据量,不再是预测的数据量),并将该数据集的数据量存储到上下文信息中。如表5所示,为上下文信息的一个示例。
表5
查询标识 接收时间 用户信息 预测数据量 查询请求 数据量 数据集
101 2017.11.6.14.16.10 IP地址A 25.6M 查询请求1 26M 数据集A
102 2017.11.6.14.16.15 IP地址B 10M 查询请求2 12M 数据集B
103 2017.11.6.14.16.17 IP地址C 2M 查询请求3 3M 数据集C
综上所述,上下文信息可以包括但不限于查询请求的接收时间、查询请求对应的数据集的数据量。基于此,针对“利用外存储器中的上下文信息,确定该上下文信息对应的数据集的优先级”的过程,可以包括但不限于:前端服务器利用当前时间与该接收时间的差,确定该上下文信息对应的等待时间;然后,根据该等待时间和数据集的数据量,确定该上下文信息对应的数据集的优先级。
进一步的,针对“根据该等待时间和数据集的数据量,确定该上下文信息对应的数据集的优先级”的过程,可以包括但不限于:对该等待时间进行归一化处理,得到第三子权重;对该数据量进行归一化处理,得到第四子权重;根据该第三子权重和该第四子权重获得第二权重值;根据该第二权重值获得数据集的优先级;其中,当第二权重值越大时,则上下文信息对应的数据集的优先级越高,当第二权重值越小时,则上下文信息对应的数据集的优先级越低。
其中,对等待时间进行归一化处理时,可以将等待时间转换为0与1之间的数值,在转换过程中,若等待时间越大,则归一化后的第三子权重越大。对数据量进行归一化处理时,可以将数据量转换为0与1之间的数值,在转换过程中,若数据量越大,则归一化后的第四子权重越小。根据第三子权重和第四子权重获得第二权重值时,将第三子权重和第四子权重的积作为第二权重值。
其中,根据“等待时间和数据集的数据量,确定数据集的优先级”的过程,与上述“根据等待时间和预测数据量,确定上下文信息对应的优先级”的过程类似,不同之处在于数据集的数据量与预测数据量不同,在此不再重复赘述。
综上所述,假设数据集A的优先级高于数据集B的优先级,且数据集B的优先级高于数据集C的优先级,则排序结果为:数据集A、数据集B、数据集C。
在一个例子中,针对“根据数据集的排序结果,从该外存储器中读取数据集”的过程,可以包括但不限于如下方式:若按照优先级从高到低的顺序进行排序,则前端服务器可以根据数据集的排序结果,从该外存储器中读取排序靠前的N个数据集;或者,若按照优先级从低到高的顺序进行排序,则前端服务器可以根据数据集的排序结果,从该外存储器中读取排序靠后的N个数据集。
其中,所述N为大于等于1的正整数;而且,所述N个数据集对应的资源占用信息与前端服务器的当前占用资源之和,可以小于当前的资源阈值。
假设资源占用信息可以包括内存占用率、CPU占用率、数据集的数据量,则资源阈值可以包括内存阈值、CPU阈值、数据量阈值。而且,前端服务器还可以统计当前内存使用情况,当前CPU使用情况,当前带宽使用情况。
在此基础上,前端服务器确定优先级最高的数据集对应的内存占用率(参见步骤411)、CPU占用率(参见步骤411)、数据集的数据量(从表5中获知)。
若内存占用率与当前内存使用情况之和小于内存阈值,且CPU占用率与当前CPU使用情况之和小于CPU阈值,且数据集的数据量与当前带宽使用情况之和小于数据量阈值,则从外存储器中读取优先级最高的数据集。若上述三个条件中,有任意一个或者多个条件不满足,则不从外存储器中读取数据集,在等待预设时间后,继续判断是否能够从外存储器中读取优先级最高的数据集。
在从外存储器中读取优先级最高的数据集后,前端服务器确定优先级第二高的数据集对应的内存占用率、CPU占用率、数据集的数据量。若优先级最高的数据集对应的内存占用率、优先级第二高的数据集对应的内存占用率、当前内存使用情况之和小于内存阈值,且优先级最高的数据集对应的CPU占用率、优先级第二高的数据集对应的CPU占用率、当前CPU使用情况之和小于CPU阈值,且优先级最高的数据集对应的数据量、优先级第二高的数据集对应的数据量、当前带宽使用情况之和小于数据量阈值,则继续从外存储器中读取优先级第二高的数据集,以此类推。若上述三个条件中,有任意一个或者多个条件不满足,则不从外存储器中读取优先级第二高的数据集,结束读取流程。
综上所述,前端服务器可以从外存储器中读取N个数据集,且这N个数据集对应的资源占用信息与前端服务器的当前占用资源之和,小于资源阈值。
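上述“按优先级依次读取N个数据集、任一资源条件不满足即结束”的过程,可以写成如下示意函数(阈值数值与字段名沿用前文示例,属于说明性假设):

```python
def select_datasets(sorted_sets, current, thresholds):
    # sorted_sets: 按优先级从高到低排序的 (数据集名, 资源占用信息) 列表
    chosen, used = [], dict(current)
    for name, usage in sorted_sets:
        # 每一维度上,已读数据集的占用、本数据集的占用与当前占用之和须小于阈值
        if all(used[k] + usage[k] < thresholds[k] for k in thresholds):
            chosen.append(name)
            for k in thresholds:
                used[k] += usage[k]
        else:
            break  # 任一条件不满足,结束本轮读取流程
    return chosen
```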
在一个例子中,前端服务器从外存储器中读取N个数据集后,就可以将读取的N个数据集发送给客户端,如利用上下文信息中的用户信息,将数据集发送给对应的客户端,对此发送过程不再赘述。然后,前端服务器还可以从外存储器中删除该数据集对应的上下文信息,至此完成该数据集的发送过程。
在步骤304中,前端服务器从外存储器中读取数据集时,即使外存储器中存在多个数据集,前端服务器的一次读取过程,可以只读取N个数据集,在将N个数据集发送给客户端后,才会重新从外存储器中读取数据集,以此类推。
基于上述技术方案,本申请实施例中,在接收到查询请求后,不是直接将数据集发送给客户端,而是先在外存储器中存储查询请求、数据集,这样,可以将查询请求对应的数据集缓存在本地,从而可以避免短时间内处理多个查询请求,即避免短时间内向客户端返回多个数据集,从而减轻CPU资源、内存资源、网络带宽的占用,避免客户端查询超时或者失败,提高用户使用感受。而且,上述方式可以将大数据量的查询请求缓存在本地的存储器,而小数据量的查询请求不用缓存在本地的存储器,从而可以直接将小数据量的数据集发送给客户端,不影响小数据量的查询请求,最大可能性的降低了full gc的频率。
参见图5A所示,是测试结果的示意图,通过查询一个列数据量很大的表,使用/*+withResultCache=true*/select id,name,….,region from test where id=xxx limit xxx,分别以limit 10、limit 100、limit 500、limit 1000、limit 2000、limit 5000、limit 8000、limit 10000限制数据集的行数,当数据集的行数越多时,则数据集的数据量越大。图5A中,示出了数据集的数据量大小和平均查询耗时的关系。为了模拟内存使用率,在测试之前通过执行其它程序,使得内存使用率一直保持在75%左右,并异步执行上述查询操作。通过图5A可以看出,在使用本方案的方法后,当数据集很大时,可以明显降低查询耗时,并可以降低full gc的频率。
参见图5B所示,是内存使用率和查询耗时的关系示意图,通过运行其它程序控制内存使用率的方式,对某个SQL查询请求进行测试。通过测试可以发现,当内存使用率超过某个值后,若使用本方案的方法,则可以明显降低查询耗时。
参见图5C所示,是通过用户明细数据查询的示意图,如/*+withResultCache=true*/select id,name,….,region from test where id=xxx limit 5000,共查询100个列数据,其根据不同的并发数和查询超时率进行统计。从图5C中可以看出,查询的并发量越大,则超时率越高。而且,当超过一定的并发数后,查询超时率明显提高。显然,在使用本方案的方法后,可以减缓查询超时率的提高。
参见图5D所示,是单独查询数据集较小的并发测试的一个示意图,如/*+withResultCache=true*/select id,name from test where id=xxx limit 50。图5D中展示了查询并发数和查询超时率之间的关系,与图5C类似,当超过一定的并发数后,查询超时率开始不断增加。显然,在使用本方案的方法后,可以将查询超时率突然增加的并发数临界值向后推迟,在同样的环境下,可以增大用户并发数。
参见图5E所示,是针对混合场景(即存在数据集较大的查询操作、数据集较小的查询操作)的示意图,是数据集较大的查询操作对数据集较小的查询操作的查询超时率的影响。从图5E中可以看出,当数据集较大的查询操作的并发数超过一定的临界值后,数据集较小的查询操作的查询超时率开始不断增加。显然,在使用本方案的方法后,可以减缓数据集较小的查询操作的查询超时率。
基于与上述方法同样的申请构思,本申请实施例还提供一种数据查询装置,可以应用于前端服务器,如图6所示,为该装置的结构图,该装置包括:
接收模块601,用于接收客户端发送的查询请求;
确定模块602,用于确定与所述查询请求对应的数据集的资源占用信息;
获取模块603,用于在根据所述资源占用信息确定对所述查询请求进行缓存时,则获取与所述查询请求对应的数据集;
存储模块604,用于在外存储器中存储所述数据集;
发送模块605,用于从外存储器中读取数据集,将所述数据集发送给客户端。
所述获取模块603,具体用于在获取与所述查询请求对应的数据集的过程中,在外存储器中存储所述查询请求对应的上下文信息;从所述外存储器中读取所述上下文信息,利用所述上下文信息获取与所述查询请求对应的数据集。
所述获取模块603,具体用于在从所述外存储器中读取所述上下文信息的过程中,利用所述外存储器中的上下文信息,确定上下文信息对应的优先级;利用上下文信息对应的优先级,对所述外存储器中的上下文信息进行排序;根据排序结果,从所述外存储器中读取优先级最高的上下文信息;
在利用所述上下文信息获取与所述查询请求对应的数据集的过程中,生成与所述查询请求对应的调度请求,并向计算服务器发送所述调度请求;接收所述计算服务器返回的与所述调度请求对应的数据集。
所述发送模块605,具体用于在从外存储器中读取数据集的过程中,利用外存储器中的上下文信息,确定上下文信息对应的数据集的优先级;利用所述数据集的优先级,对所述外存储器中的数据集进行排序;根据所述数据集的排序结果,从所述外存储器中读取数据集。
基于与上述方法同样的申请构思,本申请实施例还提供了一种前端服务器,所述前端服务器可以包括:接收器、处理器和发射器;其中,接收器,用于接收客户端发送的查询请求;处理器,用于确定与所述查询请求对应的数据集的资源占用信息;若根据所述资源占用信息确定对所述查询请求进行缓存,则获取与所述查询请求对应的数据集,在外存储器中存储所述数据集;从外存储器中读取所述数据集;发射器,用于将所述数据集发送给客户端。
基于与上述方法同样的申请构思,本申请实施例还提供一种机器可读存储介质,可以应用于前端服务器,机器可读存储介质上存储有若干计算机指令,计算机指令被执行时进行如下处理:接收客户端发送的查询请求;确定与所述查询请求对应的数据集的资源占用信息;若根据所述资源占用信息确定对所述查询请求进行缓存,则获取与所述查询请求对应的数据集,在外存储器中存储所述数据集;从外存储器中读取所述数据集;将所述数据集发送给客户端。
上述实施例阐明的系统、装置、模块或单元,具体可以由计算机芯片或实体实现,或者由具有某种功能的产品来实现。一种典型的实现设备为计算机,计算机的具体形式可以是个人计算机、膝上型计算机、蜂窝电话、相机电话、智能电话、个人数字助理、媒体播放器、导航设备、电子邮件收发设备、游戏控制台、平板计算机、可穿戴设备或者这些设备中的任意几种设备的组合。
为了描述的方便,描述以上装置时以功能分为各种单元分别描述。当然,在实施本申请时可以把各单元的功能在同一个或多个软件和/或硬件中实现。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请实施例可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可以由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其它可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其它可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
而且,这些计算机程序指令也可以存储在能引导计算机或其它可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或者多个流程和/或方框图一个方框或者多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其它可编程数据处理设备上,使得在计算机或者其它可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其它可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
以上所述仅为本申请的实施例而已,并不用于限制本申请。对于本领域技术人员来说,本申请可以有各种更改和变化。凡在本申请的精神和原理之内所作的任何修改、等同替换、改进等,均应包含在本申请的权利要求范围之内。

Claims (23)

  1. 一种数据查询方法,其特征在于,应用于前端服务器,所述方法包括:
    接收客户端发送的查询请求;
    确定与所述查询请求对应的数据集的资源占用信息;
    若根据所述资源占用信息确定对所述查询请求进行缓存,则获取与所述查询请求对应的数据集,并在外存储器中存储所述数据集;
    从所述外存储器中读取所述数据集,并将所述数据集发送给客户端。
  2. 根据权利要求1所述的方法,其特征在于,
    所述确定与所述查询请求对应的数据集的资源占用信息之后,还包括:
    若根据所述资源占用信息确定不对所述查询请求进行缓存,则获取与所述查询请求对应的数据集,并将所述数据集发送给客户端。
  3. 根据权利要求1所述的方法,其特征在于,
    所述确定与所述查询请求对应的数据集的资源占用信息的过程,具体包括:
    通过所述查询请求携带的数据标识查询映射表,得到所述数据标识对应的数据集的资源占用信息;映射表用于记录数据标识与资源占用信息的对应关系;
    其中,所述前端服务器获取到与查询请求对应的数据集后,在所述映射表中记录该查询请求携带的数据标识与该数据集的资源占用信息的对应关系。
  4. 根据权利要求1或2所述的方法,其特征在于,
    所述确定与所述查询请求对应的数据集的资源占用信息之后,还包括:
    若所述资源占用信息与所述前端服务器的当前占用资源之和,大于等于资源阈值,则确定对所述查询请求进行缓存;或者,
    若所述资源占用信息与所述前端服务器的当前占用资源之和,小于所述资源阈值,则确定不对所述查询请求进行缓存。
  5. 根据权利要求4所述的方法,其特征在于,
    所述将所述数据集发送给客户端之后,所述方法还包括:
    获取所述查询请求对应的处理时间;其中,所述处理时间具体为:从接收到所述查询请求开始,一直到将所述数据集发送给客户端之间的时间;
    根据所述查询请求对应的处理时间,调整所述资源阈值。
  6. 根据权利要求5所述的方法,其特征在于,所述根据所述查询请求对应的处理时间,调整所述资源阈值的过程,具体包括:
    若所述查询请求对应的处理时间大于预设第一时间阈值,则提高所述资源阈值的取值;其中,提高后的资源阈值不大于资源阈值上限;
    若所述查询请求对应的处理时间小于预设第二时间阈值,则降低所述资源阈值的取值;其中,降低后的资源阈值不小于资源阈值下限。
  7. 根据权利要求1或2所述的方法,其特征在于,所述资源占用信息包括以下之一或者任意组合:内存占用率、CPU占用率、数据集的数据量。
  8. 根据权利要求1所述的方法,其特征在于,所述获取与所述查询请求对应的数据集,并在外存储器中存储所述数据集的过程,包括:
    在外存储器中存储所述查询请求对应的上下文信息;
    从所述外存储器中读取所述上下文信息,利用所述上下文信息获取与所述查询请求对应的数据集,并在所述外存储器中存储所述数据集。
  9. 根据权利要求8所述的方法,其特征在于,
    所述在外存储器中存储所述查询请求对应的上下文信息,包括:
    解析所述查询请求中携带的参数信息,并判断参数表中是否存在所述参数信息;其中,所述参数表用于记录需要进行缓存的参数信息;
    若是,在外存储器中存储所述查询请求对应的上下文信息。
  10. 根据权利要求9所述的方法,其特征在于,所述方法还包括:
    接收数据库创建请求,所述数据库创建请求携带需要进行缓存的参数信息,并将所述数据库创建请求中携带的参数信息记录到所述参数表;和/或,
    接收表组创建请求,所述表组创建请求携带需要进行缓存的参数信息,并将所述表组创建请求中携带的参数信息记录到所述参数表;和/或,
    接收表创建请求,所述表创建请求携带需要进行缓存的参数信息,并将所述表创建请求中携带的参数信息记录到所述参数表。
  11. 根据权利要求8所述的方法,其特征在于,
    所述从所述外存储器中读取所述上下文信息的过程,具体包括:
    利用所述外存储器中的上下文信息,确定上下文信息对应的优先级;
    利用上下文信息对应的优先级,对所述外存储器中的上下文信息进行排序;
    根据排序结果,从所述外存储器中读取优先级最高的上下文信息。
  12. 根据权利要求11所述的方法,其特征在于,所述上下文信息包括查询请求的接收时间、所述查询请求对应的预测数据量;所述利用所述外存储器中的上下文信息,确定上下文信息对应的优先级的过程,具体包括:
    利用当前时间与所述接收时间的差,确定所述上下文信息对应的等待时间;
    根据所述等待时间和所述预测数据量,确定上下文信息对应的优先级。
  13. 根据权利要求12所述的方法,其特征在于,所述根据所述等待时间和所述预测数据量,确定上下文信息对应的优先级的过程,具体包括:
    对所述等待时间进行归一化处理,得到第一子权重;
    对所述预测数据量进行归一化处理,得到第二子权重;
    根据所述第一子权重和所述第二子权重获得第一权重值;
    根据所述第一权重值获得上下文信息对应的优先级;其中,当第一权重值越大时,则所述优先级越高,当第一权重值越小时,则所述优先级越低。
  14. 根据权利要求8所述的方法,其特征在于,所述上下文信息包括所述查询请求,所述利用所述上下文信息获取与所述查询请求对应的数据集,包括:
    生成与所述查询请求对应的调度请求,并向计算服务器发送所述调度请求;
    接收所述计算服务器返回的与所述调度请求对应的数据集。
  15. 根据权利要求1所述的方法,其特征在于,
    从所述外存储器中读取所述数据集的过程,具体包括:
    利用外存储器中的上下文信息,确定上下文信息对应的数据集的优先级;
    利用数据集的优先级,对所述外存储器中的数据集进行排序;
    根据数据集的排序结果,从所述外存储器中读取数据集。
  16. 根据权利要求15所述的方法,其特征在于,所述上下文信息包括查询请求的接收时间、所述查询请求对应的数据集的数据量;所述利用外存储器中的上下文信息,确定上下文信息对应的数据集的优先级的过程,具体包括:
    利用当前时间与所述接收时间的差,确定上下文信息对应的等待时间;
    根据所述等待时间和所述数据量,确定上下文信息对应的数据集的优先级。
  17. 根据权利要求16所述的方法,其特征在于,所述根据所述等待时间和所述数据量,确定上下文信息对应的数据集的优先级的过程,具体包括:
    对所述等待时间进行归一化处理,得到第三子权重;
    对所述数据量进行归一化处理,得到第四子权重;
    根据所述第三子权重和所述第四子权重获得第二权重值;
    根据所述第二权重值获得数据集的优先级;其中,当所述第二权重值越大时,则所述优先级越高,当所述第二权重值越小时,则所述优先级越低。
  18. 根据权利要求16所述的方法,其特征在于,所述根据数据集的排序结果,从所述外存储器中读取数据集的过程,具体包括:
    若按照优先级从高到低的顺序进行排序,则根据数据集的排序结果,从所述外存储器中读取排序靠前的N个数据集;或者,若按照优先级从低到高的顺序进行排序,则根据数据集的排序结果,从所述外存储器中读取排序靠后的N个数据集;其中,N为大于等于1的正整数;所述N个数据集对应的资源占用信息与所述前端服务器的当前占用资源之和,小于资源阈值。
  19. 一种数据查询装置,其特征在于,应用于前端服务器,所述装置包括:
    接收模块,用于接收客户端发送的查询请求;
    确定模块,用于确定与所述查询请求对应的数据集的资源占用信息;
    获取模块,用于在根据所述资源占用信息确定对所述查询请求进行缓存时,则获取与所述查询请求对应的数据集;
    存储模块,用于在外存储器中存储所述数据集;
    发送模块,用于从外存储器中读取数据集,并将所述数据集发送给客户端。
  20. 根据权利要求19所述的装置,其特征在于,
    所述获取模块,具体用于在获取与所述查询请求对应的数据集的过程中,在外存储器中存储所述查询请求对应的上下文信息;从所述外存储器中读取所述上下文信息,利用所述上下文信息获取与所述查询请求对应的数据集。
  21. 根据权利要求20所述的装置,其特征在于,
    所述获取模块,具体用于在从所述外存储器中读取所述上下文信息的过程中,利用所述外存储器中的上下文信息,确定上下文信息对应的优先级;利用上下文信息对应的优先级,对所述外存储器中的上下文信息进行排序;根据排序结果,从所述外存储器中读取优先级最高的上下文信息;
    在利用所述上下文信息获取与所述查询请求对应的数据集的过程中,生成与所述查询请求对应的调度请求,并向计算服务器发送所述调度请求;接收所述计算服务器返回的与所述调度请求对应的数据集。
  22. 根据权利要求19所述的装置,其特征在于,
    所述发送模块,具体用于在从所述外存储器中读取数据集的过程中,利用外存储器中的上下文信息,确定上下文信息对应的数据集的优先级;利用所述数据集的优先级,对所述外存储器中的数据集进行排序;根据所述数据集的排序结果,从所述外存储器中读取数据集。
  23. 一种前端服务器,其特征在于,所述前端服务器包括:
    接收器,用于接收客户端发送的查询请求;
    处理器,用于确定与所述查询请求对应的数据集的资源占用信息;若根据所述资源占用信息确定对所述查询请求进行缓存,则获取与所述查询请求对应的数据集,在外存储器中存储所述数据集;从外存储器中读取所述数据集;
    发射器,用于将所述数据集发送给客户端。
PCT/CN2019/071357 2018-01-19 2019-01-11 一种数据查询方法、装置及设备 WO2019141134A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2020539715A JP2021511588A (ja) 2018-01-19 2019-01-11 データクエリ方法、装置およびデバイス
EP19741895.7A EP3742306A4 (en) 2018-01-19 2019-01-11 DATA INTERROGATION PROCESS, APPARATUS AND DEVICE
US16/932,596 US11734271B2 (en) 2018-01-19 2020-07-17 Data query method, apparatus and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810053977.8A CN110109953B (zh) 2018-01-19 2018-01-19 一种数据查询方法、装置及设备
CN201810053977.8 2018-01-19

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/932,596 Continuation US11734271B2 (en) 2018-01-19 2020-07-17 Data query method, apparatus and device

Publications (1)

Publication Number Publication Date
WO2019141134A1 true WO2019141134A1 (zh) 2019-07-25

Family

ID=67301611

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/071357 WO2019141134A1 (zh) 2018-01-19 2019-01-11 一种数据查询方法、装置及设备

Country Status (5)

Country Link
US (1) US11734271B2 (zh)
EP (1) EP3742306A4 (zh)
JP (1) JP2021511588A (zh)
CN (1) CN110109953B (zh)
WO (1) WO2019141134A1 (zh)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110765157A (zh) * 2019-09-06 2020-02-07 中国平安财产保险股份有限公司 数据查询方法、装置、计算机设备及存储介质
CN111107017A (zh) * 2019-12-06 2020-05-05 苏州浪潮智能科技有限公司 一种交换机报文拥塞的处理方法、设备以及存储介质
CN111260311A (zh) * 2020-01-09 2020-06-09 广东卓维网络有限公司 一种电量数据平台系统及分析方法
CN112947942A (zh) * 2021-04-01 2021-06-11 厦门亿联网络技术股份有限公司 一种数据的解析获取方法、装置、电子设备及存储介质
CN113568570A (zh) * 2021-06-22 2021-10-29 阿里巴巴新加坡控股有限公司 数据处理方法及装置
CN113742383A (zh) * 2021-09-03 2021-12-03 网银在线(北京)科技有限公司 数据存储方法、装置、设备及介质
CN113760968A (zh) * 2020-09-24 2021-12-07 北京沃东天骏信息技术有限公司 数据查询方法、装置、系统、电子设备及存储介质

Families Citing this family (7)

Publication number Priority date Publication date Assignee Title
CN111078735B (zh) * 2019-11-04 2023-06-30 苏宁云计算有限公司 一种查询请求处理方法及装置
CN111581234B (zh) * 2020-05-09 2023-04-28 中国银行股份有限公司 Rac多节点数据库查询方法、装置及系统
CN115150030B (zh) * 2021-03-31 2024-02-06 北京金山云网络技术有限公司 一种数据处理方法、装置、电子设备、存储介质及系统
CN113901008B (zh) * 2021-11-10 2022-09-27 上海意略明数字科技股份有限公司 数据处理方法及装置、存储介质、计算设备
CN114124968B (zh) * 2022-01-27 2022-05-20 深圳华锐金融技术股份有限公司 基于行情数据的负载均衡方法、装置、设备及介质
CN117880075A (zh) * 2022-10-11 2024-04-12 华为技术有限公司 数据管理方法、服务器端、客户端及系统
CN115344620B (zh) * 2022-10-19 2023-01-06 成都中科合迅科技有限公司 自定义数据池实现前后端分离后数据按需同步方法

Citations (5)

Publication number Priority date Publication date Assignee Title
CN101964816A (zh) * 2010-09-26 2011-02-02 用友软件股份有限公司 在b/s架构软件系统中浏览数据的方法和系统
CN102609488A (zh) * 2012-01-20 2012-07-25 北京星网锐捷网络技术有限公司 客户端及其数据查询方法、服务端和数据查询系统
CN104462194A (zh) * 2014-10-28 2015-03-25 北京国双科技有限公司 一种业务数据的处理方法、装置及服务器
CN105354193A (zh) * 2014-08-19 2016-02-24 阿里巴巴集团控股有限公司 数据库数据缓存方法、查询方法及缓存装置、查询装置
CN106372156A (zh) * 2016-08-30 2017-02-01 福建天晴数码有限公司 数据缓存方法及系统

Family Cites Families (20)

Publication number Priority date Publication date Assignee Title
US6571259B1 (en) 2000-09-26 2003-05-27 Emc Corporation Preallocation of file system cache blocks in a data storage system
JP4306152B2 (ja) 2001-06-26 2009-07-29 株式会社日立製作所 クラスタ化したアプリケーションサーバおよびデータベース構造を持つWebシステム
US6931408B2 (en) 2001-08-17 2005-08-16 E.C. Outlook, Inc. Method of storing, maintaining and distributing computer intelligible electronic data
US6823374B2 (en) * 2001-11-16 2004-11-23 Fineground Networks Adjusting the cacheability of web documents according to the responsiveness of its content server
US7290015B1 (en) 2003-10-02 2007-10-30 Progress Software Corporation High availability via data services
US7421562B2 (en) 2004-03-01 2008-09-02 Sybase, Inc. Database system providing methodology for extended memory support
JP4615344B2 (ja) 2005-03-24 2011-01-19 株式会社日立製作所 データ処理システム及びデータベースの管理方法
EP1990738B1 (en) 2007-05-07 2011-03-09 Software AG Method and server for synchronizing a plurality of clients accessing a database
US8185546B2 (en) 2007-08-13 2012-05-22 Oracle International Corporation Enhanced control to users to populate a cache in a database system
CN101207570A (zh) * 2007-11-26 2008-06-25 上海华为技术有限公司 数据传输方法、数据发送速率控制方法及基站
US7962502B2 (en) * 2008-11-18 2011-06-14 Yahoo! Inc. Efficient caching for dynamic webservice queries using cachable fragments
US8266125B2 (en) 2009-10-01 2012-09-11 Starcounter Ab Systems and methods for managing databases
CN101873475B (zh) * 2010-01-07 2012-09-05 杭州海康威视数字技术股份有限公司 控制命令发送方法、数据传输方法、监控系统及设备
US9092482B2 (en) * 2013-03-14 2015-07-28 Palantir Technologies, Inc. Fair scheduling for mixed-query loads
CN103853727B (zh) * 2012-11-29 2018-07-31 深圳中兴力维技术有限公司 提高大数据量查询性能的方法及系统
US9251210B2 (en) * 2013-04-19 2016-02-02 Oracle International Corporation Caching external data sources for SQL processing
US10114874B2 (en) * 2014-02-24 2018-10-30 Red Hat, Inc. Source query caching as fault prevention for federated queries
US11803547B2 (en) * 2017-05-19 2023-10-31 Oracle International Corporation System and method for query resource caching
US10706053B2 (en) * 2017-06-13 2020-07-07 Oracle International Corporation Method and system for defining an object-agnostic offlinable data storage model
US11113413B2 (en) * 2017-08-25 2021-09-07 Immuta, Inc. Calculating differentially private queries using local sensitivity on time variant databases

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN101964816A (zh) * 2010-09-26 2011-02-02 用友软件股份有限公司 在b/s架构软件系统中浏览数据的方法和系统
CN102609488A (zh) * 2012-01-20 2012-07-25 北京星网锐捷网络技术有限公司 客户端及其数据查询方法、服务端和数据查询系统
CN105354193A (zh) * 2014-08-19 2016-02-24 阿里巴巴集团控股有限公司 数据库数据缓存方法、查询方法及缓存装置、查询装置
CN104462194A (zh) * 2014-10-28 2015-03-25 北京国双科技有限公司 一种业务数据的处理方法、装置及服务器
CN106372156A (zh) * 2016-08-30 2017-02-01 福建天晴数码有限公司 数据缓存方法及系统

Cited By (9)

Publication number Priority date Publication date Assignee Title
CN110765157A (zh) * 2019-09-06 2020-02-07 Ping An Property & Casualty Insurance Company of China, Ltd. Data query method and apparatus, computer device, and storage medium
CN110765157B (zh) * 2019-09-06 2024-02-02 Ping An Property & Casualty Insurance Company of China, Ltd. Data query method and apparatus, computer device, and storage medium
CN111107017A (zh) * 2019-12-06 2020-05-05 Suzhou Inspur Intelligent Technology Co., Ltd. Method, device, and storage medium for handling switch packet congestion
CN111260311A (zh) * 2020-01-09 2020-06-09 Guangdong Zhuowei Network Co., Ltd. Electric energy data platform system and analysis method
CN113760968A (zh) * 2020-09-24 2021-12-07 Beijing Wodong Tianjun Information Technology Co., Ltd. Data query method, apparatus, system, electronic device, and storage medium
CN112947942A (zh) * 2021-04-01 2021-06-11 Xiamen Yealink Network Technology Co., Ltd. Data parsing and acquisition method, apparatus, electronic device, and storage medium
CN113568570A (zh) * 2021-06-22 2021-10-29 Alibaba Singapore Holding Pte. Ltd. Data processing method and apparatus
CN113568570B (zh) * 2021-06-22 2024-04-12 Alibaba Innovation Company Data processing method and apparatus
CN113742383A (zh) * 2021-09-03 2021-12-03 Wangyin Online (Beijing) Technology Co., Ltd. Data storage method, apparatus, device, and medium

Also Published As

Publication number Publication date
EP3742306A4 (en) 2021-08-18
JP2021511588A (ja) 2021-05-06
US20200349160A1 (en) 2020-11-05
US11734271B2 (en) 2023-08-22
CN110109953A (zh) 2019-08-09
CN110109953B (zh) 2023-12-19
EP3742306A1 (en) 2020-11-25

Similar Documents

Publication Publication Date Title
WO2019141134A1 (zh) Data query method, apparatus and device
WO2019184739A1 (zh) Data query method, apparatus and device
US20060294049A1 (en) Back-off mechanism for search
WO2015172533A1 (zh) Database query method and server
US10182024B1 (en) Reallocating users in content sharing environments
CN110765138B (zh) Data query method, apparatus, server, and storage medium
CN111464615A (zh) Request processing method, apparatus, server, and storage medium
EP3103032A1 (en) Trend response management
CN107636655B (zh) System and method for providing data as a service (DaaS) in real time
WO2020211717A1 (zh) Data processing method, apparatus and device
WO2019056958A1 (zh) Hotspot keyword acquisition method, apparatus and server
CN110222046B (zh) List data processing method, apparatus, server, and storage medium
CN114553762A (zh) Method and apparatus for processing flow entries in a flow table
JP7048729B2 (ja) Optimization of network utilization
CN107911484B (zh) Message processing method and apparatus
US9280384B2 (en) Method, server and system for processing task data
CN113157777B (zh) Distributed real-time data query method, cluster, system, and storage medium
US7752203B2 (en) System and method for look ahead caching of personalized web content for portals
WO2021143199A1 (zh) Log query method, apparatus, computer device, and storage medium
CN110865845A (zh) Method for improving interface access efficiency, and storage medium
TW202027480A (zh) System and method for automatically scaling serverless programs
CN112102821B (zh) Data processing method, apparatus, system, and medium applied to electronic device
CN115114012B (zh) Task allocation method, apparatus, electronic device, and storage medium
US20240168798A1 (en) Automatic synchronous or asynchronous execution of requests
CN115543411A (zh) Application configuration publishing system, method, apparatus and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 19741895; Country of ref document: EP; Kind code of ref document: A1
ENP Entry into the national phase
    Ref document number: 2020539715; Country of ref document: JP; Kind code of ref document: A
NENP Non-entry into the national phase
    Ref country code: DE
ENP Entry into the national phase
    Ref document number: 2019741895; Country of ref document: EP; Effective date: 20200819