WO2021143199A1 - Log query method and apparatus, computer device and storage medium - Google Patents

Log query method and apparatus, computer device and storage medium

Info

Publication number
WO2021143199A1
WO2021143199A1 (PCT/CN2020/117888)
Authority
WO
WIPO (PCT)
Prior art keywords
query
log
complexity
data
entropy
Prior art date
Application number
PCT/CN2020/117888
Other languages
English (en)
French (fr)
Inventor
孙玉 (Sun Yu)
Original Assignee
平安科技(深圳)有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Application filed by Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Publication of WO2021143199A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G06F16/17 Details of further file system functions
    • G06F16/1734 Details of monitoring file system events, e.g. by the use of hooks, filter drivers, logs

Definitions

  • This application relates to the field of cloud technology for big data, and in particular to a log query method and apparatus, a computer device, and a storage medium.
  • In a first aspect, this application provides a log query method, the method including: obtaining a log query request; analyzing the query time scale, query data scale, and query complexity corresponding to the log query request, and determining the query entropy; obtaining the current query carrying capacity and determining the time slice length according to the current query carrying capacity and the query entropy, where the current query carrying capacity is obtained on the basis of an analysis of the total remaining memory; decomposing the query statement carried in the log query request to obtain a target query statement; and executing the target query statement according to the time slice length to obtain a log query result.
  • In a second aspect, this application provides a log query apparatus, the apparatus including: a query request obtaining module, configured to obtain a log query request; a query entropy determination module, configured to analyze the query time scale, query data scale, and query complexity corresponding to the log query request and determine the query entropy; a time slice determination module, configured to obtain the current query carrying capacity and determine the time slice length according to the current query carrying capacity and the query entropy, where the current query carrying capacity is obtained on the basis of an analysis of the total remaining memory; a statement decomposition module, configured to decompose the query statement carried in the log query request to obtain a target query statement; and a log slice query module, configured to execute the target query statement according to the time slice length to obtain a log query result.
  • In a third aspect, this application further provides a computer device, including a memory and a processor, the memory storing a computer program, and the processor implementing the following steps when executing the computer program: obtaining a log query request; analyzing the query time scale, query data scale, and query complexity corresponding to the log query request, and determining the query entropy; obtaining the current query carrying capacity and determining the time slice length according to the current query carrying capacity and the query entropy, where the current query carrying capacity is obtained on the basis of an analysis of the total remaining memory; decomposing the query statement carried in the log query request to obtain a target query statement; and executing the target query statement according to the time slice length to obtain a log query result.
  • In a fourth aspect, this application further provides a computer-readable storage medium on which a computer program is stored, the computer program implementing the following steps when executed by a processor: obtaining a log query request; analyzing the query time scale, query data scale, and query complexity corresponding to the log query request, and determining the query entropy; obtaining the current query carrying capacity and determining the time slice length according to the current query carrying capacity and the query entropy, where the current query carrying capacity is obtained on the basis of an analysis of the total remaining memory; decomposing the query statement carried in the log query request to obtain a target query statement; and executing the target query statement according to the time slice length to obtain a log query result.
  • The log query method and apparatus, computer device, and storage medium provided in this application analyze the query time scale, query data scale, and query complexity involved in the log query requirement to obtain the user's query requirement, then take the remaining memory, i.e. the actual carrying capacity, into account and perform time slicing according to the query entropy algorithm; at the same time, the query statement is decomposed and executed according to the time slice length. Each query of the user can thus be decomposed into queries that can be carried, which greatly reduces the resource consumption of a single query and improves log query efficiency.
  • FIG. 1 is a diagram of the application environment of the log query method in an embodiment of this application;
  • FIG. 2 is a schematic flowchart of the log query method in an embodiment of this application;
  • FIG. 3 is a schematic flowchart of the log query method in another embodiment of this application;
  • FIG. 4 is a schematic flowchart of the query complexity determination steps in an embodiment of this application;
  • FIG. 5 is a structural block diagram of the log query apparatus in an embodiment of this application;
  • FIG. 6 is a structural block diagram of the log query apparatus in another embodiment of this application;
  • FIG. 7 is an internal structure diagram of a computer device in an embodiment of this application.
  • The log query method provided in this application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 through a network. Specifically, the user enters the corresponding log query fields in the log query operation interface of the log system on the terminal 102 and clicks the "Query" button; the terminal 102 generates a log query request and sends it to the server 104; the server 104 obtains the log query request, analyzes the query time scale, query data scale, and query complexity corresponding to the log query request, determines the query entropy, obtains the current query carrying capacity, and determines the time slice length according to the current query carrying capacity and the query entropy, where the current query carrying capacity is obtained on the basis of an analysis of the total remaining memory; the server then decomposes the query statement carried in the log query request to obtain a target query statement, and executes the target query statement according to the time slice length to obtain a log query result.
  • The terminal 102 may be, but is not limited to, any of various personal computers, notebook computers, smartphones, tablet computers, and portable wearable devices, and the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
  • In one embodiment, as shown in FIG. 2, a log query method is provided, which relates to the field of cloud technology for big data. Taking the method applied to the server 104 in FIG. 1 as an example, the method includes the following steps:
  • Step 202: Obtain a log query request.
  • In practical applications, the user may perform log query operations in a business system developed on the basis of an ES (Elasticsearch) cluster index. An ES cluster is an open-source, distributed, RESTful full-text search engine built on Lucene (hereinafter referred to as the ES cluster).
  • Specifically, the user may log in to the above business system on the terminal, enter the corresponding log query fields in the log query operation interface of the system, such as the time range of the query, the name of the system group to be queried, and the corresponding query (search) statement, and then click the "Query" button to generate the corresponding log query request and send it to the server.
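  • As a concrete illustration only, such a log query request might carry fields like the following; the patent does not specify a request format, so every field name and value here is hypothetical:

```python
# Hypothetical payload assembled by the terminal when the user clicks "Query".
# None of these field names come from the patent; they only illustrate the three
# pieces of information the description mentions: time range, system groups, statement.
log_query_request = {
    "time_range": {"from": "2020-06-29T00:00:00", "to": "2020-06-30T00:00:00"},
    "groups": ["order-system", "payment-system"],
    "statement": "stats avg() by response",
}
```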
  • Step 204: Analyze the query time scale, query data scale, and query complexity corresponding to the log query request, and determine the query entropy.
  • In this embodiment, the query entropy (SearchEntropy) is the entropy corresponding to the current user's query requirement.
  • The query time scale is the number of hours in the query time range, and the query data scale is the total number of GB of the indices involved in the query.
  • As described in the above embodiment, the log query request carries the corresponding user query requirement data. By analyzing the requirement data carried in the log query request, the query time scale, query data scale, and query complexity involved in the user's query requirement can be obtained, and the corresponding query entropy is then calculated from the query time scale, query data scale, and query complexity.
  • Step 206: Obtain the current query carrying capacity, and determine the time slice length according to the current query carrying capacity and the query entropy, where the current query carrying capacity is obtained on the basis of an analysis of the total remaining memory.
  • In this embodiment, the current query carrying capacity is the query carrying capacity currently remaining in the cluster.
  • A time slice (Timeslice) is, microscopically, a segment of CPU (Central Processing Unit) time allocated by a time-sharing operating system to each running process; simply put, a time slice is the time the CPU allocates to each thread, and its essence is to divide a long task into many very short tasks that are then executed one by one.
  • In this embodiment, the time slice length can be understood as the query time length after splitting, and the cluster may refer to an Elasticsearch cluster.
  • For example, the time slice length may mean that a log query request whose user query time range is 1 hour is split into requests with a time slice length of 10 minutes, i.e. 10 minutes of log data are queried at a time.
  • When determining the time slice length, this embodiment takes into account the actual carrying capacity of the ES cluster and the concurrent query situation of the platform: the total remaining memory of the ES cluster is obtained, the current query carrying capacity is determined from the cluster's remaining memory combined with the number of concurrent queries, and the time slice length is then determined from the current query carrying capacity and the query entropy.
  • Step 208: Decompose the query statement carried in the log query request to obtain a target query statement.
  • In practical applications, the query statement used by the user for searching is SPL (Search Processing Language).
  • The query statement submitted by the user can be called the first-level query statement. Since the first-level query statement may involve considerable processing, the system may take a long time to respond to the user's request. To shorten the time the user waits for query results, the first-level query statement can be decomposed into second-level statements that are easy to process, yielding the target statement; the second-level query statements are then executed to shorten the query time.
  • Specifically, the first-level query statements that need to be decomposed include, but are not limited to, statements of the types Stats/count, distinct_count, avg, sum, min, max, Eval/max, min, Transpose, MovingAVG, Rollingstd, and Transaction.
  • For example, taking the calculation of the average response time, the first-level query statement may be: stats avg() by response.
  • The above first-level query statement can be decomposed into the second-level query statement: stats avg(), count() by response.
  • Here, average response time = sum over slices of (average response time of each slice × record count of each slice) / total record count of all slices.
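  • A minimal sketch of this decomposition and of the weighted-average recombination is shown below; it only handles the stats avg() example above, and the function names are my own, not from the patent:

```python
def decompose_avg_statement(first_level: str) -> str:
    """Rewrite the first-level query 'stats avg() by response' into the
    second-level query 'stats avg(), count() by response'.
    Only this one pattern from the example is handled."""
    head, _, group = first_level.partition(" by ")
    head = head.strip()
    if head.startswith("stats avg(") and "count(" not in head:
        head += ", count()"
    return f"{head} by {group}" if group else head


def merge_slice_averages(slice_results) -> float:
    """Recombine per-slice results:
    overall average = sum(slice avg * slice count) / sum(slice count)."""
    total = sum(r["count"] for r in slice_results)
    return sum(r["avg"] * r["count"] for r in slice_results) / total if total else 0.0


print(decompose_avg_statement("stats avg() by response"))
# -> stats avg(), count() by response
print(merge_slice_averages([{"avg": 120.0, "count": 300}, {"avg": 80.0, "count": 100}]))
# -> 110.0
```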
  • Step 210: Execute the target query statement according to the time slice length to obtain the log query result.
  • After the first-level query statement has been decomposed into second-level query statements, the second-level query statements can be executed in sequence according to the time slice length.
  • Specifically, the second-level query statements may be encapsulated as ES API (Application Programming Interface) calls and submitted to the ES cluster for querying, or the query may be performed through aggregate functions, to obtain the log query result; further, the log query results can also be saved in real time.
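  • The patent does not name a concrete client or index layout; the following sketch assumes the official Python Elasticsearch client, an index with an @timestamp field, and a numeric response_time field, purely for illustration:

```python
from datetime import datetime, timedelta
from elasticsearch import Elasticsearch  # assumed client; the description only says "ES API"

es = Elasticsearch("http://localhost:9200")  # hypothetical cluster address

def run_sliced_query(index: str, start: datetime, end: datetime, slice_minutes: int,
                     field: str = "response_time"):
    """Run one aggregation query per time slice and collect per-slice avg/count,
    so the results can be merged and pushed incrementally."""
    results = []
    cursor = start
    while cursor < end:
        slice_end = min(cursor + timedelta(minutes=slice_minutes), end)
        body = {
            "size": 0,
            "query": {"range": {"@timestamp": {"gte": cursor.isoformat(), "lt": slice_end.isoformat()}}},
            "aggs": {
                "avg_rt": {"avg": {"field": field}},
                "cnt": {"value_count": {"field": field}},
            },
        }
        resp = es.search(index=index, body=body)
        results.append({
            "from": cursor, "to": slice_end,
            "avg": resp["aggregations"]["avg_rt"]["value"],
            "count": resp["aggregations"]["cnt"]["value"],
        })
        cursor = slice_end
    return results
```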
  • In the above log query method, the query time scale, query data scale, and query complexity involved in the log query requirement are analyzed to obtain the user's query requirement; the remaining memory, i.e. the actual carrying capacity, is then taken into account and time slicing is performed according to the query entropy algorithm.
  • At the same time, the query statement is decomposed and executed according to the time slice length.
  • Each query of the user can thus be decomposed into queries that can be carried, which greatly reduces the resource consumption of a single query and improves log query efficiency. Further, because the resource consumption of a single query is reduced, the number of Full GC occurrences and circuit-breaker trips in the cluster can be reduced, lowering the risk to the availability of the entire cluster.
  • In one of the embodiments, analyzing the query time scale, query data scale, and query complexity corresponding to the log query request and determining the query entropy includes:
  • Step 224: Extract the query time range, query grouping data, and query statement carried in the log query request; determine the query time scale according to the query time range, the query data scale according to the query grouping data, and the query complexity according to the query statement; and calculate the query entropy based on the query time scale, the query data scale, and the query complexity.
  • In a specific implementation, the log query request submitted by the user carries the user's log query requirement, including the query time range, the query grouping data, and the query statement.
  • The query time scale can be determined according to the extracted query time range, and the query data scale according to the query grouping data; the query complexity is determined according to the query statement, and the corresponding query entropy is then calculated. The time range queried by the user is uniformly accounted in hours, so that the query time scale (Hour) = number of hours in the query time range.
  • In another embodiment, the query grouping data includes the name of the group to be queried and the number of groups to be queried; determining the query data scale according to the query grouping data includes: determining the number of indices according to the number of groups to be queried, looking up the corresponding index size according to the name of the group to be queried, and determining the query data scale according to the number of indices and the index size.
  • Specifically, the query grouping data includes the specific system name queried and the number of systems to be queried, and a corresponding index size is preset for each single system.
  • The corresponding number of indices can be determined according to the number of systems the user queries; for example, if the user wants to query the log data of two systems, the number of indices is 2. On this basis, the query data scale (Size, the total number of GB of the indices involved in the query) = number of indices × index size, and the query entropy is calculated as SearchEntropy = time scale (Hour) × data scale (Size) × query complexity (O).
  • In this embodiment, the query entropy algorithm makes it convenient to decompose the user's log query request into corresponding time slices.
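  • A small sketch of this calculation follows; the preset index sizes are hypothetical, since the description does not list concrete values:

```python
# Preset index size (GB) per system group; the names and numbers are hypothetical.
INDEX_SIZE_GB = {"order-system": 30, "payment-system": 50}

def data_scale_gb(groups) -> float:
    """Query data scale (Size) = number of indices * their preset index sizes, in GB."""
    return sum(INDEX_SIZE_GB[g] for g in groups)

def query_entropy(hours: float, groups, complexity: float) -> float:
    """SearchEntropy = time scale (Hour) * data scale (Size) * query complexity (O)."""
    return hours * data_scale_gb(groups) * complexity

# A 24-hour query over two systems with query complexity 20 (illustrative value):
print(query_entropy(24, ["order-system", "payment-system"], 20))  # 24 * 80 * 20 = 38400
```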
  • As shown in FIG. 4, in one of the embodiments, determining the query complexity according to the query statement includes:
  • Step 240: According to preset query command complexity determination rules, analyze the query command fields in the query statement to obtain the query command complexity;
  • Step 241: Analyze the number of target fields in the query statement, and determine the query bucket complexity according to a preset query bucket complexity calculation method;
  • Step 242: Determine the query complexity according to the query command complexity and the query bucket complexity.
  • In practical applications, developers can preset the query command complexity determination rules based on project experiments and personal experience, adding a corresponding complexity for each type of query command; the query command complexity can take the values 1, 5, and 10.
  • Commands with a complexity of 1 include but are not limited to: Stats/count, distinct_count, avg, sum, min, max, Eval/abs, case, ceil, floor, len, if, low, substring, max, tolong, trim, upper, isnum, issrt, now, Fields, Rename, Limit, Top, Save; commands with a complexity of 5 include but are not limited to Transpose, MovingAVG, Rollingstd, Transaction; commands with a complexity of 10 include but are not limited to parse. The complexities added for all query command fields found in the statement are summed to obtain the final query command complexity.
  • The query bucket complexity = 10 ^ (number of buckets), where the number of buckets is the number of fields immediately following "by" in the query statement; for example, in stats count() by user,url the two fields user and url follow "by", so the number of buckets is 2. Query complexity (O) = query command complexity × query bucket complexity.
  • In this embodiment, the concept of the query complexity of a query statement is defined by this application itself, which facilitates slicing of the log query request.
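  • The following sketch applies these rules to a statement; the tokenization is deliberately naive and the complexity table is abbreviated, so it is an illustration rather than a full SPL parser:

```python
# Abbreviated command-complexity table taken from the lists above.
COMMAND_COMPLEXITY = {
    "stats": 1, "count": 1, "distinct_count": 1, "avg": 1, "sum": 1, "min": 1, "max": 1,
    "eval": 1, "fields": 1, "rename": 1, "limit": 1, "top": 1, "save": 1,
    "transpose": 5, "movingavg": 5, "rollingstd": 5, "transaction": 5,
    "parse": 10,
}

def query_complexity(statement: str) -> int:
    """Query complexity O = query command complexity * query bucket complexity,
    where bucket complexity = 10 ** (number of fields following 'by')."""
    lowered = statement.lower()
    tokens = lowered.replace(",", " ").replace("(", " ").replace(")", " ").split()
    command_complexity = sum(COMMAND_COMPLEXITY.get(t, 0) for t in tokens)
    bucket_count = len(lowered.split(" by ", 1)[1].split(",")) if " by " in lowered else 0
    return command_complexity * (10 ** bucket_count)

print(query_complexity("stats count() by user,url"))  # (1 + 1) * 10**2 = 200
```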
  • In one of the embodiments, obtaining the current query carrying capacity includes: obtaining total remaining memory data and query concurrency data; and obtaining the current query carrying capacity according to the total remaining memory data and the query concurrency data, combined with preset carrying parameters.
  • In practical applications, the current query carrying capacity is calculated on the basis that each 1 GB of memory carries 1000 units of query entropy, and, based on the developers' repeated test results and working experience, the estimated number of concurrent queries is 10, so that the current query carrying capacity = total remaining memory of the cluster (GB) × 1000 / 10. It is understandable that the number of concurrent queries can be set to different values according to different actual situations, which is not limited here.
  • In this embodiment, obtaining the current query carrying capacity takes into account the actual carrying capacity of the cluster memory and the query concurrency of the platform, which can effectively avoid situations where the actual carrying capacity of the cluster is poor while a large number of concurrent queries are generated at the same time, and reduces the number of Full GC occurrences and circuit-breaker trips.
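  • As a direct transcription of that rule (the 1000-entropy-per-GB and concurrency-of-10 figures come from the description; everything else is illustrative):

```python
def current_query_capacity(remaining_memory_gb: float,
                           entropy_per_gb: int = 1000,
                           concurrency: int = 10) -> float:
    """Current query carrying capacity = remaining cluster memory (GB) * 1000 / 10."""
    return remaining_memory_gb * entropy_per_gb / concurrency

print(current_query_capacity(40))  # 40 GB of free memory -> capacity of 4000
```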
  • In one of the embodiments, determining the time slice length according to the current query carrying capacity and the query entropy includes: Step 226: Obtain the number of query slices according to the current query carrying capacity and the query entropy, and determine the time slice length according to the query time scale and the number of query slices.
  • In a specific implementation, after the current query carrying capacity of the cluster is obtained, it can be combined with the calculated query entropy to split the user's query length and determine the number of query slices and the time slice length: number of query slices = query entropy / current query carrying capacity, and time slice length = query time scale / number of query slices. Determining the time slices by combining the query entropy and the current query carrying capacity effectively decomposes a query into queries that the cluster can carry, improving the success rate of a single query.
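  • A short sketch of these two formulas follows; rounding the number of slices up to a whole number is my own assumption, since the description gives only the ratio:

```python
import math

def plan_time_slices(hours: float, entropy: float, capacity: float):
    """Number of query slices = query entropy / carrying capacity (rounded up here);
    time slice length = query time scale / number of query slices."""
    slices = max(1, math.ceil(entropy / capacity))
    return slices, hours / slices

print(plan_time_slices(hours=24, entropy=38400, capacity=4000))  # (10, 2.4)
```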
  • In one of the embodiments, after executing the target query statement according to the time slice length to obtain the log query result, the method further includes: Step 212: Summarize the log query results according to the summary logic corresponding to the preset query statement, and push the summarized log query results.
  • In practical applications, after the log query results are obtained, they can be processed according to the preset data calculation logic, i.e. the summary logic corresponding to the query statement, to improve the visibility of the results.
  • The summary logic corresponding to the preset query statement includes the standard Elasticsearch query statistics syntax; for example, if the query statement is an average-related query, the log query results are counted and summarized according to the average-value processing logic, and the summarized log query results are then pushed.
  • Pushing the summarized log query results may work as follows: if the user's query time scale is 24 hours, the query results can be dynamically displayed in reverse order, starting from the query results of the most recent hour, then displaying the log query results within the second most recent hour, and so on, so that the query results of the 24-hour range are displayed in 24 batches, which avoids a long waiting time for the user and reduces the resource consumption of a single query.
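  • One possible shape of that reverse-order, batch-by-batch push is sketched below; the push callback is a placeholder for whatever channel the log system actually uses:

```python
def push_in_reverse_batches(hourly_results, push):
    """Push per-hour results most-recent-first, one batch per hour,
    so a 24-hour query is delivered in 24 batches instead of one long wait."""
    for batch in sorted(hourly_results, key=lambda r: r["from"], reverse=True):
        push(batch)

push_in_reverse_batches(
    [{"from": "2020-06-29T00:00", "avg": 95.0}, {"from": "2020-06-29T23:00", "avg": 110.0}],
    push=print,  # stand-in for the real push channel
)
```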
  • In one of the embodiments, as shown in FIG. 5, a log query apparatus is provided, including: a query request obtaining module 510, a query entropy determination module 520, a time slice determination module 530, a statement decomposition module 540, and a log slice query module 550, where:
  • the query request obtaining module 510 is configured to obtain a log query request;
  • the query entropy determination module 520 is configured to analyze the query time scale, query data scale, and query complexity corresponding to the log query request, and determine the query entropy;
  • the time slice determination module 530 is configured to obtain the current query carrying capacity and determine the time slice length according to the current query carrying capacity and the query entropy, where the current query carrying capacity is obtained on the basis of an analysis of the total remaining memory;
  • the statement decomposition module 540 is configured to decompose the query statement carried in the log query request to obtain a target query statement;
  • the log slice query module 550 is configured to execute the target query statement according to the time slice length to obtain the log query result.
  • In one of the embodiments, the query entropy determination module 520 is further configured to extract the query time range, query grouping data, and query statement carried in the log query request, determine the query time scale according to the query time range, the query data scale according to the query grouping data, and the query complexity according to the query statement, and calculate the query entropy based on the query time scale, the query data scale, and the query complexity.
  • In one of the embodiments, the query entropy determination module 520 includes a query data scale determination unit, configured to determine the number of indices according to the number of groups to be queried, look up the corresponding index size according to the name of the group to be queried, and determine the query data scale according to the number of indices and the index size.
  • In one of the embodiments, the query entropy determination module 520 further includes a query complexity determination unit, configured to analyze the query command fields in the query statement according to the preset query command complexity determination rules to obtain the query command complexity, analyze the number of target fields in the query statement, determine the query bucket complexity according to the preset query bucket complexity calculation method, and determine the query complexity according to the query command complexity and the query bucket complexity.
  • In one of the embodiments, the apparatus further includes a current query carrying capacity determination module 560, configured to obtain total remaining memory data and query concurrency data and obtain the current query carrying capacity according to the total remaining memory data and the query concurrency data, combined with preset carrying parameters.
  • In one of the embodiments, the time slice determination module 530 is further configured to obtain the number of query slices according to the current query carrying capacity and the query entropy, and determine the time slice length according to the query time scale and the number of query slices.
  • In one of the embodiments, the apparatus further includes a query result processing module 570, configured to summarize the log query results according to the summary logic corresponding to the preset query statement and push the summarized log query results.
  • Each module in the above log query device can be implemented in whole or in part by software, hardware and a combination thereof.
  • The above modules may be embedded, in hardware form, in or independent of the processor of the computer device, or stored in software form in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to the above modules.
  • a computer device is provided.
  • the computer device may be a server, and its internal structure diagram may be as shown in FIG. 7.
  • the computer equipment includes a processor, a memory, and a network interface connected through a system bus. Among them, the processor of the computer device is used to provide calculation and control capabilities.
  • the memory of the computer device includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a computer program, and a database.
  • the internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage medium.
  • the database of the computer equipment is used to store log data, current query load and other data.
  • the network interface of the computer device is used to communicate with an external terminal through a network connection.
  • The computer program, when executed by the processor, implements a log query method.
  • Those skilled in the art can understand that FIG. 7 is only a block diagram of part of the structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution of the present application is applied; the specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • In one of the embodiments, a computer device is provided, including a memory and a processor, with a computer program stored in the memory. When the processor executes the computer program, the following steps are implemented: obtaining a log query request; analyzing the query time scale, query data scale, and query complexity corresponding to the log query request, and determining the query entropy; obtaining the current query carrying capacity and determining the time slice length according to the current query carrying capacity and the query entropy, where the current query carrying capacity is obtained on the basis of an analysis of the total remaining memory; decomposing the query statement carried in the log query request to obtain a target query statement; and executing the target query statement according to the time slice length to obtain a log query result. Details are not repeated here.
  • In one of the embodiments, when executing the computer program, the processor further implements the following steps: extracting the query time range, query grouping data, and query statement carried in the log query request; determining the query time scale according to the query time range, the query data scale according to the query grouping data, and the query complexity according to the query statement; and calculating the query entropy based on the query time scale, the query data scale, and the query complexity.
  • In one of the embodiments, the query grouping data includes the name of the group to be queried and the number of groups to be queried, and when executing the computer program the processor further implements the following steps: determining the number of indices according to the number of groups to be queried; looking up the corresponding index size according to the name of the group to be queried; and determining the query data scale according to the number of indices and the index size.
  • In one of the embodiments, when executing the computer program, the processor further implements the following steps: analyzing the query command fields in the query statement according to the preset query command complexity determination rules to obtain the query command complexity; analyzing the number of target fields in the query statement and determining the query bucket complexity according to the preset query bucket complexity calculation method; and determining the query complexity according to the query command complexity and the query bucket complexity.
  • In one of the embodiments, when executing the computer program, the processor further implements the following steps: obtaining total remaining memory data and query concurrency data; and obtaining the current query carrying capacity according to the total remaining memory data and the query concurrency data, combined with preset carrying parameters.
  • In one of the embodiments, when executing the computer program, the processor further implements the following steps: obtaining the number of query slices according to the current query carrying capacity and the query entropy; and determining the time slice length according to the query time scale and the number of query slices.
  • the processor further implements the following steps when executing the computer program: according to the summary logic corresponding to the preset query statement, summarize the log query results; push the summarized log query results.
  • In one embodiment, a computer-readable storage medium is provided; the storage medium is a volatile or non-volatile storage medium, and a computer program is stored thereon.
  • When the computer program is executed by a processor, the following steps are implemented: obtaining a log query request; analyzing the query time scale, query data scale, and query complexity corresponding to the log query request, and determining the query entropy; obtaining the current query carrying capacity and determining the time slice length according to the current query carrying capacity and the query entropy, where the current query carrying capacity is obtained on the basis of an analysis of the total remaining memory; decomposing the query statement carried in the log query request to obtain a target query statement; and executing the target query statement according to the time slice length to obtain a log query result.
  • In one of the embodiments, when the computer program is executed by the processor, the following steps are further implemented: extracting the query time range, query grouping data, and query statement carried in the log query request; determining the query time scale according to the query time range, the query data scale according to the query grouping data, and the query complexity according to the query statement; and calculating the query entropy based on the query time scale, the query data scale, and the query complexity.
  • In one of the embodiments, the query grouping data includes the name of the group to be queried and the number of groups to be queried, and when the computer program is executed by the processor the following steps are further implemented: determining the number of indices according to the number of groups to be queried; looking up the corresponding index size according to the name of the group to be queried; and determining the query data scale according to the number of indices and the index size.
  • In one of the embodiments, when the computer program is executed by the processor, the following steps are further implemented: analyzing the query command fields in the query statement according to the preset query command complexity determination rules to obtain the query command complexity; analyzing the number of target fields in the query statement and determining the query bucket complexity according to the preset query bucket complexity calculation method; and determining the query complexity according to the query command complexity and the query bucket complexity.
  • In one of the embodiments, when the computer program is executed by the processor, the following steps are further implemented: obtaining total remaining memory data and query concurrency data; and obtaining the current query carrying capacity according to the total remaining memory data and the query concurrency data, combined with preset carrying parameters.
  • In one of the embodiments, when the computer program is executed by the processor, the following steps are further implemented: obtaining the number of query slices according to the current query carrying capacity and the query entropy; and determining the time slice length according to the query time scale and the number of query slices.
  • In one of the embodiments, when the computer program is executed by the processor, the following steps are further implemented: summarizing the log query results according to the summary logic corresponding to the preset query statement; and pushing the summarized log query results.
  • Non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, or optical memory.
  • Volatile memory may include random access memory (Random Access Memory, RAM) or external cache memory.
  • By way of illustration and not limitation, RAM can take many forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM).


Abstract

A log query method and apparatus, a computer device, and a storage medium, relating to the field of cloud technology for big data. The method includes: obtaining a log query request (202); analyzing the query time scale, query data scale, and query complexity corresponding to the log query request, and determining the query entropy (204); obtaining the current query carrying capacity and determining the time slice length according to the current query carrying capacity and the query entropy (206), where the current query carrying capacity is obtained on the basis of an analysis of the total remaining memory; decomposing the query statement carried in the log query request to obtain a target query statement (208); and executing the target query statement according to the time slice length to obtain a log query result (210). With this method, each query of a user can be decomposed into queries that can be carried, which greatly reduces the resource consumption of a single query and improves log query efficiency.

Description

Log query method and apparatus, computer device and storage medium
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on June 30, 2020 with application number 202010613920.6 and entitled "Log query method and apparatus, computer device and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of cloud technology for big data, and in particular to a log query method and apparatus, a computer device, and a storage medium.
Background
With the development of Internet technology and business, application systems and environments are becoming more and more complex. In the process of managing and maintaining large, complex application systems, logs play an increasingly significant role: problems and faults can be located quickly through logs, and business can be analyzed in depth through logs. As a result, log systems have developed rapidly, and a number of commercial and open-source software products supporting log query have emerged.
At present, most enterprises choose an open-source solution, using an open-source index engine as the underlying solution for log processing and storage and developing a business analysis system suited to the enterprise on top of it. However, the inventor realized that large enterprises, with their many products and large business volumes, generate correspondingly large volumes of logs and have complex analysis scenarios. In query analysis, especially query analysis over very long time ranges, handing the query directly to the underlying index engine can easily cause garbage data redundancy or trigger the relevant circuit-breaker limits, affecting the log query efficiency of the entire log platform.
Technical Problem
Large enterprises, with their many products and large business volumes, generate correspondingly large volumes of logs and have complex analysis scenarios. In query analysis, especially query analysis over very long time ranges, handing the query directly to the underlying index engine can easily cause garbage data redundancy or trigger the relevant circuit-breaker limits, affecting the log query efficiency of the entire log platform.
Technical Solution
On this basis, in view of the above technical problem, it is necessary to provide a log query method and apparatus, a computer device, and a storage medium that can improve log query efficiency.
In a first aspect, this application provides a log query method, the method including: obtaining a log query request; analyzing the query time scale, query data scale, and query complexity corresponding to the log query request, and determining the query entropy; obtaining the current query carrying capacity and determining the time slice length according to the current query carrying capacity and the query entropy, where the current query carrying capacity is obtained on the basis of an analysis of the total remaining memory; decomposing the query statement carried in the log query request to obtain a target query statement; and executing the target query statement according to the time slice length to obtain a log query result.
In a second aspect, this application provides a log query apparatus, the apparatus including: a query request obtaining module, configured to obtain a log query request; a query entropy determination module, configured to analyze the query time scale, query data scale, and query complexity corresponding to the log query request and determine the query entropy; a time slice determination module, configured to obtain the current query carrying capacity and determine the time slice length according to the current query carrying capacity and the query entropy, where the current query carrying capacity is obtained on the basis of an analysis of the total remaining memory; a statement decomposition module, configured to decompose the query statement carried in the log query request to obtain a target query statement; and a log slice query module, configured to execute the target query statement according to the time slice length to obtain a log query result.
In a third aspect, this application further provides a computer device, including a memory and a processor, the memory storing a computer program, and the processor implementing the following steps when executing the computer program: obtaining a log query request; analyzing the query time scale, query data scale, and query complexity corresponding to the log query request, and determining the query entropy; obtaining the current query carrying capacity and determining the time slice length according to the current query carrying capacity and the query entropy, where the current query carrying capacity is obtained on the basis of an analysis of the total remaining memory; decomposing the query statement carried in the log query request to obtain a target query statement; and executing the target query statement according to the time slice length to obtain a log query result.
In a fourth aspect, this application further provides a computer-readable storage medium on which a computer program is stored, the computer program implementing the following steps when executed by a processor: obtaining a log query request; analyzing the query time scale, query data scale, and query complexity corresponding to the log query request, and determining the query entropy; obtaining the current query carrying capacity and determining the time slice length according to the current query carrying capacity and the query entropy, where the current query carrying capacity is obtained on the basis of an analysis of the total remaining memory; decomposing the query statement carried in the log query request to obtain a target query statement; and executing the target query statement according to the time slice length to obtain a log query result.
Beneficial Effects
The log query method and apparatus, computer device, and storage medium provided in this application analyze the query time scale, query data scale, and query complexity involved in the log query requirement to obtain the user's query requirement, then take the remaining memory, i.e. the actual carrying capacity, into account and perform time slicing according to the query entropy algorithm; at the same time, the query statement is decomposed and executed according to the time slice length. Each query of the user can thus be decomposed into queries that can be carried, which greatly reduces the resource consumption of a single query and improves log query efficiency.
Brief Description of the Drawings
FIG. 1 is a diagram of the application environment of the log query method in an embodiment of this application;
FIG. 2 is a schematic flowchart of the log query method in an embodiment of this application;
FIG. 3 is a schematic flowchart of the log query method in another embodiment of this application;
FIG. 4 is a schematic flowchart of the query complexity determination steps in an embodiment of this application;
FIG. 5 is a structural block diagram of the log query apparatus in an embodiment of this application;
FIG. 6 is a structural block diagram of the log query apparatus in another embodiment of this application;
FIG. 7 is an internal structure diagram of a computer device in an embodiment of this application.
Best Mode for Carrying Out the Invention
The log query method provided in this application can be applied to the application environment shown in FIG. 1, in which the terminal 102 communicates with the server 104 through a network. Specifically, the user enters the corresponding log query fields in the log query operation interface of the log system on the terminal 102 and clicks the "Query" button; the terminal 102 generates a log query request and sends it to the server 104; the server 104 obtains the log query request, analyzes the query time scale, query data scale, and query complexity corresponding to the log query request, determines the query entropy, obtains the current query carrying capacity, and determines the time slice length according to the current query carrying capacity and the query entropy, where the current query carrying capacity is obtained on the basis of an analysis of the total remaining memory; the server then decomposes the query statement carried in the log query request to obtain a target query statement, and executes the target query statement according to the time slice length to obtain a log query result. The terminal 102 may be, but is not limited to, any of various personal computers, notebook computers, smartphones, tablet computers, and portable wearable devices, and the server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.
In one embodiment, as shown in FIG. 2, a log query method is provided, which relates to the field of cloud technology for big data. Taking the method applied to the server 104 in FIG. 1 as an example, the method includes the following steps:
Step 202: Obtain a log query request.
In practical applications, the user may perform log query operations in a business system developed on the basis of an ES (Elasticsearch) cluster index. An ES cluster is an open-source, distributed, RESTful full-text search engine built on Lucene (hereinafter referred to as the ES cluster). Specifically, the user may log in to the above business system on the terminal, enter the corresponding log query fields in the log query operation interface of the system, such as the time range of the query, the name of the system group to be queried, and the corresponding query (search) statement, and click the "Query" button to generate the corresponding log query request and send it to the server.
Step 204: Analyze the query time scale, query data scale, and query complexity corresponding to the log query request, and determine the query entropy.
In this embodiment, the query entropy (SearchEntropy) is the entropy corresponding to the current user's query requirement. The query time scale is the number of hours in the query time range, and the query data scale is the total number of GB of the indices involved in the query. As described in the above embodiment, the log query request carries the corresponding user query requirement data; the query time scale, query data scale, and query complexity involved in the user's query requirement can be obtained by analyzing the requirement data carried in the log query request, and the corresponding query entropy is then calculated from the query time scale, query data scale, and query complexity.
Step 206: Obtain the current query carrying capacity, and determine the time slice length according to the current query carrying capacity and the query entropy, where the current query carrying capacity is obtained on the basis of an analysis of the total remaining memory.
In this embodiment, the current query carrying capacity is the query carrying capacity currently remaining in the cluster. A time slice (Timeslice) is, microscopically, a segment of CPU (Central Processing Unit) time allocated by a time-sharing operating system to each running process; simply put, a time slice is the time the CPU allocates to each thread, and its essence is to divide a long task into many very short tasks that are then executed one by one. In this embodiment, the time slice length can be understood as the query time length after splitting, and the cluster may refer to an Elasticsearch cluster. In a specific implementation, the time slice length may mean that a log query request whose user query time range is 1 hour is split into requests with a time slice length of 10 minutes, i.e. 10 minutes of log data are queried at a time. When determining the time slice length, this embodiment takes into account the actual carrying capacity of the ES cluster and the concurrent query situation of the platform: the total remaining memory of the ES cluster is obtained, the current query carrying capacity is determined from the cluster's remaining memory combined with the number of concurrent queries, and the time slice length is then determined from the current query carrying capacity and the query entropy.
Step 208: Decompose the query statement carried in the log query request to obtain a target query statement.
In practical applications, the query statement used by the user for searching is SPL (Search Processing Language), and the query statement submitted by the user can be called the first-level query statement. Since the first-level query statement may involve considerable processing, the system may take a long time to respond to the user's request. To shorten the time the user waits for query results, the first-level query statement can be decomposed into second-level statements that are easy to process, yielding the target statement; the second-level query statements are then executed to shorten the query time. Specifically, the first-level query statements that need to be decomposed include, but are not limited to, statements of the types Stats/count, distinct_count, avg, sum, min, max, Eval/max, min, Transpose, MovingAVG, Rollingstd, and Transaction. For example, taking the calculation of the average response time, the first-level query statement may be: stats avg() by response, which can be decomposed into the second-level query statement: stats avg(), count() by response, where average response time = sum over slices of (average response time of each slice × record count of each slice) / total record count of all slices.
Step 210: Execute the target query statement according to the time slice length to obtain the log query result.
As shown in the above embodiment, after the first-level query statement has been decomposed into second-level query statements, the second-level query statements can be executed in sequence according to the time slice length. Specifically, the second-level query statements may be encapsulated as ES API (Application Programming Interface) calls and submitted to the ES cluster for querying, or the query may be performed through aggregate functions, to obtain the log query result; further, the log query results can also be saved in real time.
In the above log query method, the query time scale, query data scale, and query complexity involved in the log query requirement are analyzed to obtain the user's query requirement; the remaining memory, i.e. the actual carrying capacity, is then taken into account and time slicing is performed according to the query entropy algorithm; at the same time, the query statement is decomposed and executed according to the time slice length. Each query of the user can thus be decomposed into queries that can be carried, which greatly reduces the resource consumption of a single query and improves log query efficiency. Furthermore, because the resource consumption of a single query is reduced, the number of Full GC occurrences and circuit-breaker trips in the cluster can be reduced, lowering the risk to the availability of the entire cluster and solving the stability problems that have long troubled large-scale log platforms.
In one of the embodiments, analyzing the query time scale, query data scale, and query complexity corresponding to the log query request and determining the query entropy includes:
Step 224: Extract the query time range, query grouping data, and query statement carried in the log query request; determine the query time scale according to the query time range, the query data scale according to the query grouping data, and the query complexity according to the query statement; and calculate the query entropy based on the query time scale, the query data scale, and the query complexity.
In a specific implementation, the log query request submitted by the user carries the user's log query requirement, which specifically includes the query time range, the query grouping data, and the query statement. The query time scale may be determined according to the extracted query time range, the query data scale according to the query grouping data, and the query complexity according to the query statement, and the corresponding query entropy is then calculated.
Specifically, the query time range involved in a log query request usually covers the log data of a recent day (i.e. 24 hours), the data of a specific period of several hours, or log data covering a longer time range. The time range queried by the user can be uniformly accounted in hours, giving the corresponding query time scale (Hour) = number of hours in the query time range. In another embodiment, the query grouping data includes the name of the group to be queried and the number of groups to be queried; determining the query data scale according to the query grouping data includes: determining the number of indices according to the number of groups to be queried, looking up the corresponding index size according to the name of the group to be queried, and determining the query data scale according to the number of indices and the index size. Specifically, the query grouping data includes the specific system name queried and the number of systems queried, and a corresponding index size is preset for each single system; the corresponding number of indices can be determined according to the number of systems the user queries. For example, if the user wants to query the log data of two systems, the number of indices is 2. On this basis, the query data scale (Size, the total number of GB of the indices involved in the query) = number of indices × index size. The query complexity may be determined as: query complexity (O) = query command complexity × query bucket complexity. After the query time scale (Hour), query data scale (Size), and query complexity (O) are obtained, the query entropy corresponding to the user's current log query request can be calculated as: query entropy (SearchEntropy) = time scale (Hour) × data scale (Size) × query complexity (O). In this embodiment, the query entropy algorithm makes it convenient to decompose the user's log query request into corresponding time slices.
As shown in FIG. 4, in one of the embodiments, determining the query complexity according to the query statement includes:
Step 240: According to preset query command complexity determination rules, analyze the query command fields in the query statement to obtain the query command complexity;
Step 241: Analyze the number of target fields in the query statement, and determine the query bucket complexity according to a preset query bucket complexity calculation method;
Step 242: Determine the query complexity according to the query command complexity and the query bucket complexity.
In practical applications, developers can preset the query command complexity determination rules according to project experiments and personal experience and add a corresponding query command complexity for each type of query command; the query command complexity may take the values 1, 5, and 10. For example, commands with a query command complexity of 1 include but are not limited to: Stats/count, distinct_count, avg, sum, min, max, Eval/abs, case, ceil, floor, len, if, low, substring, max, tolong, trim, upper, isnum, issrt, now, Fields, Rename, Limit, Top, Save; commands with a query command complexity of 5 include but are not limited to Transpose, MovingAVG, Rollingstd, Transaction; commands with a query command complexity of 10 include but are not limited to parse. In a specific implementation, the query command fields in the query statement are analyzed according to the preset query command complexity determination rules, for example by checking whether the query statement contains query command fields such as distinct_count and avg; if a relevant query command field is present, the corresponding query complexity is added for it, and the query command complexities added for all query statements are then summed to obtain the final query command complexity. In this embodiment, the query bucket complexity may be determined as: query bucket complexity = 10 ^ number of buckets, where the number of buckets is obtained by analyzing the number of fields immediately following "by" in the query statement, i.e. the number of target fields; for example, in stats count() by user,url, the two fields user and url follow "by", so the corresponding number of buckets is 2. After the query command complexity and the number of query buckets are obtained in the above manner, the query complexity can be calculated as query complexity (O) = query command complexity × query bucket complexity. In this embodiment, the concept of the query complexity of a query statement is defined by this application itself, which facilitates slicing of the log query request.
In one of the embodiments, obtaining the current query carrying capacity includes: obtaining total remaining memory data and query concurrency data; and obtaining the current query carrying capacity according to the total remaining memory data and the query concurrency data, combined with preset carrying parameters.
In a specific implementation, developers need to preset the current query carrying capacity in advance; when presetting it, not only the actual carrying capacity of the cluster but also the concurrent query situation of the system platform is considered. In practical applications, the carrying parameter of the current query carrying capacity is calculated on the basis that each 1 GB of memory carries 1000 units of query entropy, and, based on the developers' repeated test results and working experience, the estimated number of concurrent queries is 10. It is understandable that the number of concurrent queries can be set to different values according to different actual situations, which is not limited here. In this embodiment, the total remaining memory of the cluster and the number of concurrent queries are obtained and, combined with the carrying parameter of the current query carrying capacity, the current query carrying capacity = total remaining memory of the cluster × 1000 / 10. In this embodiment, obtaining the current query carrying capacity takes into account the actual carrying capacity of the cluster memory and the query concurrency of the platform, which can effectively avoid situations where the actual carrying capacity of the cluster is poor while a large number of concurrent queries are generated at the same time, and reduces the number of Full GC occurrences and circuit-breaker trips in the cluster.
As shown in FIG. 3, in one of the embodiments, determining the time slice length according to the current query carrying capacity and the query entropy includes: Step 226: Obtain the number of query slices according to the current query carrying capacity and the query entropy, and determine the time slice length according to the query time scale and the number of query slices.
In a specific implementation, after the unit current query carrying capacity of the cluster is obtained, the calculated query entropy can be combined with it to split the user's query length and determine the number of query slices and the time slice length. Specifically, the number of query slices = query entropy / current query carrying capacity. After the number of query slices is obtained, the time slice length can be determined as: time slice length = query time scale / number of query slices. In this embodiment, determining the time slices by combining the query entropy and the current query carrying capacity can effectively decompose a query into queries that the cluster can carry, improving the success rate of a single query.
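To make the interplay of these formulas concrete, the following end-to-end planning sketch strings them together; all numeric inputs are hypothetical, and rounding the slice count up to a whole number is an assumption, since this description only gives the ratios.

```python
import math

def plan_query(hours, data_scale_gb, complexity, remaining_memory_gb, concurrency=10):
    """Plan a sliced query using the formulas above:
    entropy = Hour * Size * O; capacity = remaining memory (GB) * 1000 / concurrency;
    slices = entropy / capacity (rounded up); slice length = Hour / slices."""
    entropy = hours * data_scale_gb * complexity
    capacity = remaining_memory_gb * 1000 / concurrency
    slices = max(1, math.ceil(entropy / capacity))
    return {"entropy": entropy, "capacity": capacity,
            "slices": slices, "slice_length_hours": hours / slices}

# A 24-hour query over 80 GB of indices with complexity 20, on a cluster with
# 40 GB of free memory, is split into 10 slices of 2.4 hours each.
print(plan_query(hours=24, data_scale_gb=80, complexity=20, remaining_memory_gb=40))
```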
As shown in FIG. 3, in one of the embodiments, after executing the target query statement according to the time slice length to obtain the log query result, the method further includes: Step 212: Summarize the log query results according to the summary logic corresponding to the preset query statement, and push the summarized log query results.
In practical applications, after the log query results are obtained, in order to improve the visibility of the log query results, they can be processed according to the preset data calculation logic, i.e. the summary logic corresponding to the query statement; the summary logic corresponding to the preset query statement includes the standard Elasticsearch query statistics syntax. For example, if the query statement is an average-related query, the log query results are counted and summarized according to the average-value processing logic, and the summarized log query results are then pushed. Specifically, pushing the summarized log query results may mean that, if the user's query time scale is 24 hours, the query results can be dynamically displayed in reverse order, starting from the query results of the most recent hour, then displaying the log query results within the second most recent hour, and so on, so that the query results of the 24-hour range are displayed in 24 batches, which avoids a long waiting time for the user and reduces the resource consumption of a single query.
It should be understood that, although the steps in the flowcharts of FIGS. 2-4 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in the order indicated by the arrows. Unless explicitly stated herein, there is no strict order restriction on the execution of these steps, and they can be executed in other orders. Moreover, at least some of the steps in FIGS. 2-4 may include multiple sub-steps or multiple stages, which are not necessarily completed at the same time but may be executed at different times, and the execution order of these sub-steps or stages is not necessarily sequential; they may be executed in turn or alternately with at least part of the other steps or of the sub-steps or stages of the other steps.
In one of the embodiments, as shown in FIG. 5, a log query apparatus is provided, including: a query request obtaining module 510, a query entropy determination module 520, a time slice determination module 530, a statement decomposition module 540, and a log slice query module 550, where:
the query request obtaining module 510 is configured to obtain a log query request;
the query entropy determination module 520 is configured to analyze the query time scale, query data scale, and query complexity corresponding to the log query request, and determine the query entropy;
the time slice determination module 530 is configured to obtain the current query carrying capacity and determine the time slice length according to the current query carrying capacity and the query entropy, where the current query carrying capacity is obtained on the basis of an analysis of the total remaining memory;
the statement decomposition module 540 is configured to decompose the query statement carried in the log query request to obtain a target query statement;
the log slice query module 550 is configured to execute the target query statement according to the time slice length to obtain a log query result.
In one of the embodiments, the query entropy determination module 520 is further configured to extract the query time range, query grouping data, and query statement carried in the log query request, determine the query time scale according to the query time range, the query data scale according to the query grouping data, and the query complexity according to the query statement, and calculate the query entropy based on the query time scale, the query data scale, and the query complexity.
In one of the embodiments, the query entropy determination module 520 includes a query data scale determination unit, configured to determine the number of indices according to the number of groups to be queried, look up the corresponding index size according to the name of the group to be queried, and determine the query data scale according to the number of indices and the index size.
In one of the embodiments, the query entropy determination module 520 further includes a query complexity determination unit, configured to analyze the query command fields in the query statement according to the preset query command complexity determination rules to obtain the query command complexity, analyze the number of target fields in the query statement, determine the query bucket complexity according to the preset query bucket complexity calculation method, and determine the query complexity according to the query command complexity and the query bucket complexity.
As shown in FIG. 6, in one of the embodiments, the apparatus further includes a current query carrying capacity determination module 560, configured to obtain total remaining memory data and query concurrency data and obtain the current query carrying capacity according to the total remaining memory data and the query concurrency data, combined with preset carrying parameters.
In one of the embodiments, the time slice determination module 530 is further configured to obtain the number of query slices according to the current query carrying capacity and the query entropy, and determine the time slice length according to the query time scale and the number of query slices.
As shown in FIG. 6, in one of the embodiments, the apparatus further includes a query result processing module 570, configured to summarize the log query results according to the summary logic corresponding to the preset query statement and push the summarized log query results.
For the specific limitations of the log query apparatus, reference may be made to the limitations of the log query method above, which are not repeated here. Each module in the above log query apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded, in hardware form, in or independent of the processor of the computer device, or stored in software form in the memory of the computer device, so that the processor can invoke and execute the operations corresponding to each of the above modules.
In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in FIG. 7. The computer device includes a processor, a memory, and a network interface connected through a system bus, where the processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used to store log data, the current query carrying capacity, and other data. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a log query method.
Those skilled in the art can understand that the structure shown in FIG. 7 is only a block diagram of part of the structure related to the solution of this application and does not constitute a limitation on the computer device to which the solution of this application is applied; the specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one of the embodiments, a computer device is provided, including a memory and a processor, with a computer program stored in the memory. When the processor executes the computer program, the following steps are implemented: obtaining a log query request; analyzing the query time scale, query data scale, and query complexity corresponding to the log query request, and determining the query entropy; obtaining the current query carrying capacity and determining the time slice length according to the current query carrying capacity and the query entropy, where the current query carrying capacity is obtained on the basis of an analysis of the total remaining memory; decomposing the query statement carried in the log query request to obtain a target query statement; and executing the target query statement according to the time slice length to obtain a log query result. Details are not repeated here.
In one of the embodiments, when executing the computer program, the processor further implements the following steps: extracting the query time range, query grouping data, and query statement carried in the log query request; determining the query time scale according to the query time range, the query data scale according to the query grouping data, and the query complexity according to the query statement; and calculating the query entropy based on the query time scale, the query data scale, and the query complexity.
In one of the embodiments, the query grouping data includes the name of the group to be queried and the number of groups to be queried, and when executing the computer program the processor further implements the following steps: determining the number of indices according to the number of groups to be queried; looking up the corresponding index size according to the name of the group to be queried; and determining the query data scale according to the number of indices and the index size.
In one of the embodiments, when executing the computer program, the processor further implements the following steps: analyzing the query command fields in the query statement according to the preset query command complexity determination rules to obtain the query command complexity; analyzing the number of target fields in the query statement and determining the query bucket complexity according to the preset query bucket complexity calculation method; and determining the query complexity according to the query command complexity and the query bucket complexity.
In one of the embodiments, when executing the computer program, the processor further implements the following steps: obtaining total remaining memory data and query concurrency data; and obtaining the current query carrying capacity according to the total remaining memory data and the query concurrency data, combined with preset carrying parameters.
In one of the embodiments, when executing the computer program, the processor further implements the following steps: obtaining the number of query slices according to the current query carrying capacity and the query entropy; and determining the time slice length according to the query time scale and the number of query slices.
In one of the embodiments, when executing the computer program, the processor further implements the following steps: summarizing the log query results according to the summary logic corresponding to the preset query statement; and pushing the summarized log query results.
In one embodiment, a computer-readable storage medium is provided; the storage medium is a volatile storage medium or a non-volatile storage medium, and a computer program is stored thereon. When the computer program is executed by a processor, the following steps are implemented: obtaining a log query request; analyzing the query time scale, query data scale, and query complexity corresponding to the log query request, and determining the query entropy; obtaining the current query carrying capacity and determining the time slice length according to the current query carrying capacity and the query entropy, where the current query carrying capacity is obtained on the basis of an analysis of the total remaining memory; decomposing the query statement carried in the log query request to obtain a target query statement; and executing the target query statement according to the time slice length to obtain a log query result.
In one of the embodiments, when the computer program is executed by the processor, the following steps are further implemented: extracting the query time range, query grouping data, and query statement carried in the log query request; determining the query time scale according to the query time range, the query data scale according to the query grouping data, and the query complexity according to the query statement; and calculating the query entropy based on the query time scale, the query data scale, and the query complexity.
In one of the embodiments, the query grouping data includes the name of the group to be queried and the number of groups to be queried, and when the computer program is executed by the processor the following steps are further implemented: determining the number of indices according to the number of groups to be queried; looking up the corresponding index size according to the name of the group to be queried; and determining the query data scale according to the number of indices and the index size.
In one of the embodiments, when the computer program is executed by the processor, the following steps are further implemented: analyzing the query command fields in the query statement according to the preset query command complexity determination rules to obtain the query command complexity; analyzing the number of target fields in the query statement and determining the query bucket complexity according to the preset query bucket complexity calculation method; and determining the query complexity according to the query command complexity and the query bucket complexity.
In one of the embodiments, when the computer program is executed by the processor, the following steps are further implemented: obtaining total remaining memory data and query concurrency data; and obtaining the current query carrying capacity according to the total remaining memory data and the query concurrency data, combined with preset carrying parameters.
In one of the embodiments, when the computer program is executed by the processor, the following steps are further implemented: obtaining the number of query slices according to the current query carrying capacity and the query entropy; and determining the time slice length according to the query time scale and the number of query slices.
In one of the embodiments, when the computer program is executed by the processor, the following steps are further implemented: summarizing the log query results according to the summary logic corresponding to the preset query statement; and pushing the summarized log query results.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing the relevant hardware through a computer program, which can be stored in a non-volatile computer-readable storage medium; when executed, the computer program may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include at least one of non-volatile and volatile memory. Non-volatile memory may include read-only memory (Read-Only Memory, ROM), magnetic tape, floppy disk, flash memory, or optical memory. Volatile memory may include random access memory (Random Access Memory, RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM).

Claims (20)

  1. A log query method, wherein the method comprises:
    obtaining a log query request;
    analyzing a query time scale, a query data scale, and a query complexity corresponding to the log query request, and determining a query entropy;
    obtaining a current query carrying capacity, and determining a time slice length according to the current query carrying capacity and the query entropy, wherein the current query carrying capacity is obtained on the basis of an analysis of total remaining memory;
    decomposing a query statement carried in the log query request to obtain a target query statement; and
    executing the target query statement according to the time slice length to obtain a log query result.
  2. The method according to claim 1, wherein analyzing the query time scale, the query data scale, and the query complexity corresponding to the log query request and determining the query entropy comprises:
    extracting a query time range, query grouping data, and the query statement carried in the log query request;
    determining the query time scale according to the query time range, determining the query data scale according to the query grouping data, and determining the query complexity according to the query statement; and
    calculating the query entropy based on the query time scale, the query data scale, and the query complexity.
  3. The method according to claim 2, wherein the query grouping data comprises a name of a group to be queried and a number of groups to be queried, and determining the query data scale according to the query grouping data comprises:
    determining a number of indices according to the number of groups to be queried, and looking up a corresponding index size according to the name of the group to be queried; and
    determining the query data scale according to the number of indices and the index size.
  4. The method according to claim 2, wherein determining the query complexity according to the query statement comprises:
    analyzing a query command field in the query statement according to preset query command complexity determination rules to obtain a query command complexity;
    analyzing a number of target fields in the query statement, and determining a query bucket complexity according to a preset query bucket complexity calculation method; and
    determining the query complexity according to the query command complexity and the query bucket complexity.
  5. The method according to any one of claims 1 to 4, wherein obtaining the current query carrying capacity comprises:
    obtaining total remaining memory data and query concurrency data; and
    obtaining the current query carrying capacity according to the total remaining memory data, the query concurrency data, and preset carrying parameters.
  6. The method according to any one of claims 1 to 4, wherein determining the time slice length according to the current query carrying capacity and the query entropy comprises:
    obtaining a number of query slices according to the current query carrying capacity and the query entropy; and
    determining the time slice length according to the query time scale and the number of query slices.
  7. The method according to any one of claims 1 to 4, wherein, after executing the target query statement according to the time slice length to obtain the log query result, the method further comprises:
    summarizing the log query result according to summary logic corresponding to a preset query statement; and
    pushing the summarized log query result.
  8. A log query apparatus, wherein the apparatus comprises:
    a query request obtaining module, configured to obtain a log query request;
    a query entropy determination module, configured to analyze a query time scale, a query data scale, and a query complexity corresponding to the log query request, and determine a query entropy;
    a time slice determination module, configured to obtain a current query carrying capacity, and determine a time slice length according to the current query carrying capacity and the query entropy, wherein the current query carrying capacity is obtained on the basis of an analysis of total remaining memory;
    a statement decomposition module, configured to decompose a query statement carried in the log query request to obtain a target query statement; and
    a log slice query module, configured to execute the target query statement according to the time slice length to obtain a log query result.
  9. A computer device, comprising a memory and a processor, wherein the memory stores a computer program and the processor implements a log query method when executing the computer program;
    wherein the log query method comprises:
    obtaining a log query request;
    analyzing a query time scale, a query data scale, and a query complexity corresponding to the log query request, and determining a query entropy;
    obtaining a current query carrying capacity, and determining a time slice length according to the current query carrying capacity and the query entropy, wherein the current query carrying capacity is obtained on the basis of an analysis of total remaining memory;
    decomposing a query statement carried in the log query request to obtain a target query statement; and
    executing the target query statement according to the time slice length to obtain a log query result.
  10. The computer device according to claim 9, wherein analyzing the query time scale, the query data scale, and the query complexity corresponding to the log query request and determining the query entropy comprises:
    extracting a query time range, query grouping data, and the query statement carried in the log query request;
    determining the query time scale according to the query time range, determining the query data scale according to the query grouping data, and determining the query complexity according to the query statement; and
    calculating the query entropy based on the query time scale, the query data scale, and the query complexity.
  11. The computer device according to claim 10, wherein the query grouping data comprises a name of a group to be queried and a number of groups to be queried, and determining the query data scale according to the query grouping data comprises:
    determining a number of indices according to the number of groups to be queried, and looking up a corresponding index size according to the name of the group to be queried; and
    determining the query data scale according to the number of indices and the index size.
  12. The computer device according to claim 10, wherein determining the query complexity according to the query statement comprises:
    analyzing a query command field in the query statement according to preset query command complexity determination rules to obtain a query command complexity;
    analyzing a number of target fields in the query statement, and determining a query bucket complexity according to a preset query bucket complexity calculation method; and
    determining the query complexity according to the query command complexity and the query bucket complexity.
  13. The computer device according to any one of claims 9 to 12, wherein obtaining the current query carrying capacity comprises:
    obtaining total remaining memory data and query concurrency data; and
    obtaining the current query carrying capacity according to the total remaining memory data, the query concurrency data, and preset carrying parameters.
  14. The computer device according to any one of claims 9 to 12, wherein determining the time slice length according to the current query carrying capacity and the query entropy comprises:
    obtaining a number of query slices according to the current query carrying capacity and the query entropy; and
    determining the time slice length according to the query time scale and the number of query slices.
  15. The computer device according to any one of claims 9 to 12, wherein, after executing the target query statement according to the time slice length to obtain the log query result, the method further comprises:
    summarizing the log query result according to summary logic corresponding to a preset query statement; and
    pushing the summarized log query result.
  16. A computer-readable storage medium, wherein a computer program is stored thereon and a log query method is implemented when the computer program is executed by a processor, wherein the log query method comprises:
    obtaining a log query request;
    analyzing a query time scale, a query data scale, and a query complexity corresponding to the log query request, and determining a query entropy;
    obtaining a current query carrying capacity, and determining a time slice length according to the current query carrying capacity and the query entropy, wherein the current query carrying capacity is obtained on the basis of an analysis of total remaining memory;
    decomposing a query statement carried in the log query request to obtain a target query statement; and
    executing the target query statement according to the time slice length to obtain a log query result.
  17. The computer-readable storage medium according to claim 16, wherein analyzing the query time scale, the query data scale, and the query complexity corresponding to the log query request and determining the query entropy comprises:
    extracting a query time range, query grouping data, and the query statement carried in the log query request;
    determining the query time scale according to the query time range, determining the query data scale according to the query grouping data, and determining the query complexity according to the query statement; and
    calculating the query entropy based on the query time scale, the query data scale, and the query complexity.
  18. The computer-readable storage medium according to claim 17, wherein the query grouping data comprises a name of a group to be queried and a number of groups to be queried, and determining the query data scale according to the query grouping data comprises:
    determining a number of indices according to the number of groups to be queried, and looking up a corresponding index size according to the name of the group to be queried; and
    determining the query data scale according to the number of indices and the index size.
  19. The computer-readable storage medium according to claim 17, wherein determining the query complexity according to the query statement comprises:
    analyzing a query command field in the query statement according to preset query command complexity determination rules to obtain a query command complexity;
    analyzing a number of target fields in the query statement, and determining a query bucket complexity according to a preset query bucket complexity calculation method; and
    determining the query complexity according to the query command complexity and the query bucket complexity.
  20. The computer-readable storage medium according to any one of claims 16 to 19, wherein obtaining the current query carrying capacity comprises:
    obtaining total remaining memory data and query concurrency data; and
    obtaining the current query carrying capacity according to the total remaining memory data, the query concurrency data, and preset carrying parameters.
PCT/CN2020/117888 2020-06-30 2020-09-25 Log query method and apparatus, computer device and storage medium WO2021143199A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010613920.6A 2020-06-30 2020-06-30 Log query method and apparatus, computer device and storage medium
CN202010613920.6 2020-06-30

Publications (1)

Publication Number Publication Date
WO2021143199A1 true WO2021143199A1 (zh) 2021-07-22

Family

ID=72723420

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/117888 WO2021143199A1 (zh) 2020-06-30 2020-09-25 日志查询方法、装置、计算机设备和存储介质

Country Status (2)

Country Link
CN (1) CN111767252A (zh)
WO (1) WO2021143199A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115935090B * 2023-03-10 2023-06-16 北京锐服信科技有限公司 A data query method and system based on time slicing

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090271385A1 (en) * 2008-04-28 2009-10-29 Infosys Technologies Limited System and method for parallel query evaluation
CN102521405A * 2011-12-26 2012-06-27 中国科学院计算技术研究所 Mass structured data storage and query method and system supporting high-speed loading
CN108021618A * 2017-11-13 2018-05-11 北京天元创新科技有限公司 Data query method and system
CN109033123A * 2018-05-31 2018-12-18 康键信息技术(深圳)有限公司 Big-data-based query method and apparatus, computer device and storage medium
CN110427390A * 2019-08-01 2019-11-08 北京明略软件系统有限公司 Data query method and apparatus, storage medium, and electronic apparatus

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6983239B1 (en) * 2000-10-25 2006-01-03 International Business Machines Corporation Method and apparatus for embedding grammars in a natural language understanding (NLU) statistical parser
US6889219B2 (en) * 2002-01-22 2005-05-03 International Business Machines Corporation Method of tuning a decision network and a decision tree model
JP5579140B2 * 2011-09-05 2014-08-27 日本電信電話株式会社 Document search device, method, and program
US9594838B2 (en) * 2013-03-14 2017-03-14 Microsoft Technology Licensing, Llc Query simplification
CN103905456B * 2014-04-08 2017-02-15 上海交通大学 Detection method for DNS reverse-resolution attacks based on an entropy model
US9892125B1 (en) * 2014-05-23 2018-02-13 MapD Technologies, Inc. Method for logging update queries
CN104050297B * 2014-07-03 2017-09-29 中国工商银行股份有限公司 Query transaction allocation method and apparatus
IL243113B (en) * 2015-12-15 2020-08-31 Picscout Israel Ltd Trademark recognition for automatic image search engines
US20190079943A1 (en) * 2017-09-11 2019-03-14 Blackfynn Inc. Real time and retrospective query integration
CN110321214A * 2018-03-29 2019-10-11 阿里巴巴集团控股有限公司 Data query method, apparatus, and device

Also Published As

Publication number Publication date
CN111767252A (zh) 2020-10-13

Legal Events

121 - Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20913342; Country of ref document: EP; Kind code of ref document: A1)
NENP - Non-entry into the national phase (Ref country code: DE)
122 - Ep: PCT application non-entry in European phase (Ref document number: 20913342; Country of ref document: EP; Kind code of ref document: A1)