CN112182032A

CN112182032A - Slow query log analysis method, system, electronic device and computer-readable storage medium

Info

Publication number: CN112182032A
Application number: CN202011215943.8A
Authority: CN
Inventors: 李婉洁; 刘远; 郭颂
Original assignee: Beijing Minglue Zhaohui Technology Co Ltd
Current assignee: Beijing Minglue Zhaohui Technology Co Ltd
Priority date: 2020-11-04
Filing date: 2020-11-04
Publication date: 2021-01-05

Abstract

The invention provides a slow query log analysis method, a system, electronic equipment and a computer-readable storage medium. The technical scheme of the method comprises: a log summarizing step of summarizing the generated slow query logs; a log analysis step of analyzing the summarized slow query logs through a stream data processing framework; and a result obtaining step of outputting the analysis result to a database and displaying it. The invention reduces manual labor, automatically analyzes slow queries and displays the results, is efficient and real-time, and can quickly analyze a large number of slow queries, so that the cause of a slow query can be located as soon as possible. The method is widely applicable to JSON-format slow queries.

Description

Slow query log analysis method, system, electronic device and computer-readable storage medium

Technical Field

The invention belongs to the field of data processing, and particularly relates to a slow query log analysis method, a slow query log analysis system, electronic equipment and a computer-readable storage medium.

Background

Daily observation of the search engine shows that slow queries produce slow query logs on every machine each day, and in some cases a large number of slow queries are generated at once, causing problems in the cluster. These slow queries are typically long, each occupying roughly three pages. Analyzing slow queries in real time and quickly finding the parts of a query that may be unreasonable is therefore an emerging need; it also helps business teams and operations personnel resolve problems sooner, improving efficiency.

At present, slow queries are mostly analyzed manually. Tools such as the Profile API and the Search Profiler in Kibana exist, but when a large query is analyzed, a single slow query takes a long time to analyze, and the timing breakdown finally produced is so fine-grained that it is difficult to interpret, so it does little to solve the practical problem of why the query is slow. Purely manual analysis also consumes considerable manpower and material resources.

Disclosure of Invention

The embodiments of the application provide a slow query log analysis method, a slow query log analysis system, electronic equipment and a storage medium, and aim at least to solve the problems of long processing time and low efficiency in existing slow query log analysis techniques.

In a first aspect, an embodiment of the present application provides a slow query log analysis method, including:

a log summarizing step, namely summarizing the generated slow query logs;

a log analysis step, namely analyzing the summarized slow query logs through a stream data processing framework;

and a result obtaining step, namely outputting the analysis result to a database and displaying the analysis result.

Preferably, the log summarizing step includes pulling the slow query logs with a script, and summarizing them into a publish-subscribe messaging system through a log transmission tool.

Preferably, the log transmission tool comprises a Logstash tool.

Preferably, the publish and subscribe messaging system comprises Kafka.

Preferably, the log analysis step includes programming in a stream data processing framework to count the number of occurrences of each key field in the query body, and sorting the key fields in descending order of occurrence frequency.

Preferably, the log analysis step further includes: if a query_string field appears in the query body, analyzing the internal parameters of the query_string field, the internal parameters comprising the values corresponding to the query, max_determinized_states, fuzzy_prefix_length and fuzzy_max_expansions fields.

Preferably, the stream data processing framework includes Spark Streaming.

In a second aspect, an embodiment of the present application provides a slow query log analysis system, which is suitable for the slow query log analysis method, and includes:

the log summarizing unit is used for summarizing the generated slow query logs;

the log analysis unit is used for analyzing the summarized slow query logs through a stream data processing framework;

and the result acquisition unit is used for outputting the analysis result to the database and displaying the analysis result.

In some embodiments, the log summarizing unit comprises a module for pulling the slow query logs with a script and summarizing them into a publish-subscribe messaging system through a log transmission tool.

In some of these embodiments, the log transmission tool comprises a Logstash tool.

In some of these embodiments, the publish and subscribe messaging system comprises Kafka.

In some embodiments, the log analysis unit comprises a processor programmed in a stream data processing framework to count the number of occurrences of each key field in the query body and to sort the key fields in descending order of occurrence frequency.

In some of these embodiments, the log analysis unit is further configured to: if a query_string field appears in the query body, analyze the internal parameters of the query_string field, the internal parameters comprising the values corresponding to the query, max_determinized_states, fuzzy_prefix_length and fuzzy_max_expansions fields.

In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the slow query log analysis method according to the first aspect.

In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements a slow query log analysis method as described in the first aspect above.

Compared with the related art, the slow query log analysis method provided by the embodiments of the application has the following advantages:

1. manual labor is reduced, and slow queries can be analyzed and the results displayed automatically;

2. the method is efficient and real-time, and can quickly analyze a large number of slow queries, so that the cause of a slow query can be located as soon as possible;

3. the method is general and widely applicable: it can be applied to JSON-format slow queries from any service (DSL queries are generally in JSON format).

The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:

FIG. 1 is a flow diagram of a slow query log analysis method according to an embodiment of the application;

FIG. 2 is a block diagram of a slow query log analysis system in accordance with an embodiment of the present application;

FIG. 3 is a block diagram of an electronic device according to an embodiment of the present application;

in the above figures:

11. a log summarizing unit; 12. a log analysis unit; 13. a result acquisition unit; 20. a bus; 21. a processor; 22. a memory; 23. a communication interface.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.

It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.

Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.

Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. The words "a," "an," "the," and similar words used throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof used in this application are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

At present, slow queries are mostly analyzed manually. Although tools such as the Profile API and the Search Profiler in Kibana exist, when a large query is analyzed, a single slow query takes a long time to analyze, and the timing results finally produced are too fine-grained to interpret, so the practical problem of slow queries is not substantially solved. Purely manual analysis also consumes considerable manpower and material resources.

When slow queries are processed manually, unfamiliarity with the query language makes the work time-consuming and laborious; analysis with profiling tools suffers from long processing times and low efficiency, and such tools do little to solve actual slow query problems. The embodiments of the application therefore provide a slow query log analysis method, a slow query log analysis system, an electronic device and a storage medium, which can be applied to Elasticsearch.

Some of the terms of art to which the invention relates are described below:

Elasticsearch is a Lucene-based search server. It provides a distributed, multitenant-capable full-text search engine with a RESTful web interface. Elasticsearch is developed in Java and released as open source under the terms of the Apache License; it is a popular enterprise-grade search engine. Elasticsearch is used in cloud computing, can achieve real-time search, and is stable, reliable, fast, and easy to install and use. Official clients are available in Java, .NET (C#), PHP, Python, Apache Groovy, Ruby and many other languages. According to the DB-Engines ranking, Elasticsearch is the most popular enterprise search engine, followed by Apache Solr, which is also based on Lucene.

To analyze the performance of database statements, EXPLAIN can be used to output an execution plan, and the database can also be configured to record statements whose execution time exceeds a specified threshold; such queries are called "slow queries".
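To make the threshold idea concrete for Elasticsearch, the sketch below builds an index-settings body using the search slowlog setting names `index.search.slowlog.threshold.query.warn`, `.info`, `.debug` and `.trace`, and classifies a query time against the same tiers. The threshold values are illustrative assumptions, not values from this application, and `slowlog_level` is a simplified model of the tiering rather than a reproduction of Elasticsearch's own logging code.

```python
from typing import Optional

# Hypothetical per-level thresholds in milliseconds, ordered from most to least severe.
# Level names mirror Elasticsearch's search slowlog levels; the values are illustrative only.
THRESHOLDS_MS = [("warn", 10_000), ("info", 5_000), ("debug", 2_000), ("trace", 500)]


def slowlog_level(took_ms: int) -> Optional[str]:
    """Return the most severe level whose threshold the query time reaches, or None."""
    for level, limit in THRESHOLDS_MS:
        if took_ms >= limit:
            return level
    return None


def settings_body() -> dict:
    """Build an index-settings body that sets the query-phase thresholds above."""
    return {
        f"index.search.slowlog.threshold.query.{level}": f"{limit}ms"
        for level, limit in THRESHOLDS_MS
    }
```

A 6-second query would be classified as `info` under these assumed tiers, and only queries that reach at least one tier would appear in the slow query log at all.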

Logstash is a platform for collecting, transmitting, processing, managing and searching application logs and events. It can be used to collect and manage application logs in a unified way and provides a web interface for querying and statistics.

Kafka is an open-source stream processing platform developed by the Apache Software Foundation and written in Scala and Java. It is a high-throughput distributed publish-subscribe messaging system that can handle all the activity stream data generated by users of a website. Such activity (page views, searches and other user actions) is a key ingredient of many social features on the modern web, and because of the throughput required, this data is typically handled through logging and log aggregation. For systems that, like Hadoop, work with log data and offline analysis but also need real-time processing, Kafka is a viable solution. Kafka aims to unify online and offline message processing through Hadoop's parallel loading mechanism, and to provide real-time messaging through clustering.
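The publish-subscribe model that the log summarizing step relies on can be illustrated with a toy in-memory topic: an append-only log in which each consumer group keeps its own read offset, so independent consumers each receive every message. `ToyTopic` and its methods are hypothetical names for illustration; this is not the Kafka client API, and it omits partitions, replication and persistence.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a single-partition topic: an append-only log plus
    one committed offset per consumer group. Not the Kafka client API."""

    def __init__(self) -> None:
        self.log: list[str] = []
        self.offsets: dict[str, int] = defaultdict(int)

    def publish(self, message: str) -> int:
        """Append a message to the log and return its offset."""
        self.log.append(message)
        return len(self.log) - 1

    def poll(self, group: str, max_records: int = 100) -> list[str]:
        """Return unread messages for a group and advance (commit) that group's offset."""
        start = self.offsets[group]
        batch = self.log[start:start + max_records]
        self.offsets[group] = start + len(batch)
        return batch
```

Because offsets are tracked per group, an analysis job and, for example, an archiving job can read the same slow query stream without interfering with each other.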

Spark Streaming is an extension of the core Spark API that enables high-throughput, fault-tolerant processing of real-time streaming data. Spark Streaming can ingest data from many sources, including Kafka, Flume, Twitter, ZeroMQ, Kinesis and TCP sockets. Once ingested, the data can be processed with complex algorithms expressed through high-level functions such as map, reduce, join and window, and the results can finally be pushed to file systems, databases and live dashboards. Within Spark's unified environment, other Spark components, such as machine learning and graph computation, can also be applied to the streaming data.

The slow query log analysis method provided by the embodiments of the application avoids the time, labor and inefficiency of manual analysis. It covers essentially all slow-query-related content and can analyze the contents of most slow query logs in detail, quickly helping business teams and operations personnel locate problems. In addition, combined with Spark Streaming, it can analyze large numbers of slow queries quickly and in real time.

Referring to fig. 1, a flowchart of a slow query log analysis method according to an embodiment of the present application includes the following steps:

s101, summarizing the generated slow query logs;

S102, analyzing the summarized slow query logs through a stream data processing framework;

and S103, outputting the analysis result to a database and displaying.

It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order different than here.

The machine hosting each data node in the Elasticsearch cluster generates slow queries at irregular times every day and stores them in a log file under a specific directory. The log summarizing step comprises pulling the slow query logs with a script and summarizing them into a publish-subscribe messaging system through a log transmission tool. By writing a get_slowlog script, newly generated slow query logs on each machine can be checked and pulled automatically in real time.

The log transmission tool comprises a Logstash tool, and the publish-subscribe messaging system comprises Kafka. In the embodiments of the application, the slow query logs pulled by the script are summarized into Kafka by means of Logstash.
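A minimal sketch of what a pulling script in the role of get_slowlog might do, assuming the slow query log is a plain append-only text file: remember the byte offset already read for each file, return only complete new lines, and restart from the beginning if the file shrinks (for example after log rotation). The function name `pull_new_lines` and the offset-tracking scheme are assumptions; in the embodiment the pulled lines would then be shipped to Kafka by Logstash rather than returned.

```python
import os


def pull_new_lines(path: str, state: dict[str, int]) -> list[str]:
    """Return lines appended to `path` since the last call; byte offsets live in `state`.

    Hypothetical stand-in for the get_slowlog script described above.
    """
    offset = state.get(path, 0)
    if os.path.getsize(path) < offset:  # file was rotated or truncated: start over
        offset = 0
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read()
    # Consume only complete lines; a partially written last line is left for the next call.
    end = data.rfind(b"\n") + 1
    state[path] = offset + end
    return [line.decode("utf-8") for line in data[:end].splitlines()]
```

Calling this periodically (or from a file-watch loop) on each machine gives the "check and pull in real time" behavior without re-reading old entries.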

The log analysis step comprises programming in a stream data processing framework to count the number of occurrences of each key field in the query body, and sorting the key fields in descending order of occurrence frequency.

The log analysis step further comprises: if a query_string field appears in the query body, analyzing the internal parameters of the query_string field; the internal parameters comprise the values corresponding to the query, max_determinized_states, fuzzy_prefix_length and fuzzy_max_expansions fields.

The stream data processing framework comprises Spark Streaming. To process real-time streaming data with high throughput and fault tolerance, Spark Streaming is selected: the slow query analysis functions are programmed in Spark Streaming, and the analysis results are stored in a database and finally displayed on a web page.
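Spark Streaming processes a stream as a sequence of small batches. The plain-Python stand-in below mimics that model without Spark: it splits incoming records into micro-batches, aggregates each batch, and folds the result into running totals, similar in spirit to stateful DStream operations such as `updateStateByKey`. The record format (one JSON query body per record) and the helper names are assumptions made for illustration; a real job would read from Kafka through Spark Streaming's Kafka integration and write each snapshot to the database.

```python
import json
from collections import Counter
from typing import Iterator


def micro_batches(records: list[str], batch_size: int) -> Iterator[list[str]]:
    """Split an incoming record stream into fixed-size micro-batches."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]


def top_level_query_types(record: str) -> list[str]:
    """Hypothetical per-record extraction: the clause names directly under "query"."""
    return list(json.loads(record).get("query", {}).keys())


def run_stream(records: list[str], batch_size: int = 2) -> list[Counter]:
    """Aggregate each micro-batch, fold it into running state, and return the snapshots."""
    state: Counter = Counter()
    snapshots = []
    for batch in micro_batches(records, batch_size):
        batch_counts = Counter(t for r in batch for t in top_level_query_types(r))
        state.update(batch_counts)        # merge this batch into the running totals
        snapshots.append(Counter(state))  # e.g. what would be written to the database
    return snapshots
```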

The core of the embodiments of the application is implementing slow query analysis on Spark Streaming. A slow query, i.e., an Elasticsearch Query DSL query, can combine many search and filtering methods; when the query body is large, it often involves several kinds of queries, some of which are time-consuming, specifically the following:

(1) match_phrase_prefix query: performs a wildcard (prefix) search for the last token of the query in the inverted index. Its important parameter controls the number of fuzzy matches: max_expansions, whose default value is 50 and minimum value is 1. This type of query is generally not recommended, because the wildcard search introduces large uncertainty and is time-consuming;

(2) wildcard query: performs a fuzzy search by placing wildcards at the beginning and end of a keyword; if the length of the input string is not limited, the query is very slow and consumes a lot of CPU;

(3) script query: when a scripting language is used for a query, the script must be evaluated carefully; if it computes dynamic fields or performs similar operations, it consumes a great deal of resources and slows down the whole system;

(4) query_string query: this query is not problematic in normal use, but its internal parameters need to be evaluated carefully before use.
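The four query types listed above can be detected by walking the parsed JSON query body. The sketch below assumes each log record has already been reduced to its query body as a JSON string; it flags match_phrase_prefix (and a max_expansions above a chosen limit), wildcard (noting leading wildcards), script queries and query_string. The limit and the finding labels are hypothetical choices, not rules stated in this application.

```python
import json
from typing import Any, Iterator


def walk(node: Any) -> Iterator[tuple[str, Any]]:
    """Yield every (key, value) pair in a nested JSON structure, depth first."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield key, value
            yield from walk(value)
    elif isinstance(node, list):
        for item in node:
            yield from walk(item)


def risky_clauses(query_body: str, max_expansions_limit: int = 50) -> list[str]:
    """Flag the time-consuming clause types described above (limit is hypothetical)."""
    findings: list[str] = []
    for key, value in walk(json.loads(query_body)):
        if key == "match_phrase_prefix" and isinstance(value, dict):
            findings.append("match_phrase_prefix")
            for opts in value.values():  # field -> query text or options object
                if isinstance(opts, dict) and opts.get("max_expansions", 50) > max_expansions_limit:
                    findings.append("match_phrase_prefix: max_expansions above limit")
        elif key == "wildcard" and isinstance(value, dict):
            for opts in value.values():  # field -> pattern or options object
                if isinstance(opts, dict):
                    pattern = opts.get("value", opts.get("wildcard", ""))
                else:
                    pattern = str(opts)
                leading = pattern.startswith(("*", "?"))
                findings.append("wildcard: leading wildcard" if leading else "wildcard")
        elif key == "script" and isinstance(value, dict) and "script" in value:
            findings.append("script")  # a script query wraps an inner "script" object
        elif key == "query_string":
            findings.append("query_string")
    return findings
```

Leading wildcards are singled out because, with no fixed prefix, the engine has to examine far more terms than for a trailing wildcard.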

In summary, the embodiments of the application focus on occurrences of the above fields, while other fields also deserve attention, for example whether filtering is used sufficiently, i.e., whether filter appears enough times (because filtering before querying can greatly speed up the query).

Specifically, the log analysis step in the embodiments of the application counts the number of occurrences of each key field of the query body, such as query, filter, bool and term, sorts them by count in descending order, and among these fields focuses on the frequency of five fields: filter, match_phrase_prefix, wildcard, script and query_string.
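The key-field counting of this step can be sketched as a recursive traversal that counts every key in the query body, sorts by count in descending order, and pulls out the five focus fields. As a simplification, this sketch counts all keys, including document field names such as `title`, rather than only DSL clause names; the function names are hypothetical.

```python
import json
from collections import Counter
from typing import Any

FOCUS_FIELDS = ("filter", "match_phrase_prefix", "wildcard", "script", "query_string")


def count_keys(node: Any, counts: Counter) -> Counter:
    """Recursively count every key in a parsed JSON query body."""
    if isinstance(node, dict):
        for key, value in node.items():
            counts[key] += 1
            count_keys(value, counts)
    elif isinstance(node, list):
        for item in node:
            count_keys(item, counts)
    return counts


def analyze_keys(query_body: str) -> tuple[list[tuple[str, int]], dict[str, int]]:
    """Return all key counts in descending order plus the counts of the five focus fields."""
    counts = count_keys(json.loads(query_body), Counter())
    ranked = counts.most_common()  # sorted by count, largest first
    focus = {field: counts.get(field, 0) for field in FOCUS_FIELDS}
    return ranked, focus
```

A low `filter` count relative to the number of scoring clauses is one signal that the query could be restructured to filter first.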

When the query_string field appears, its parameters need to be analyzed separately, mainly the values of the four fields query, max_determinized_states, fuzzy_prefix_length and fuzzy_max_expansions. For the query field, the total number of occurrences of OR, AND and NOT is counted, which indirectly reflects how many keywords the query involves. For the max_determinized_states field, which limits overly complex regular expressions, it is judged whether the value is too large. For the fuzzy_prefix_length field, it is judged whether the value is too small; if so, the fuzzy search scope becomes very large and slows down the query. For the fuzzy_max_expansions field, it is judged whether the value is too large; too large a value produces too much fuzzy search content and is therefore time-consuming.
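The query_string checks described above can be sketched as follows. The operator count uses a whole-word match for upper-case OR, AND and NOT (the symbolic forms `||`, `&&` and `!` are not counted), and the limits are hypothetical values chosen for illustration. For reference, Elasticsearch's documented defaults are 10000 for max_determinized_states, 0 for fuzzy_prefix_length and 50 for fuzzy_max_expansions, so with these limits a clause that leaves fuzzy_prefix_length unset is reported as too small.

```python
import re

# Hypothetical limits chosen for illustration, not values from this application.
LIMITS = {
    "max_determinized_states": 10000,
    "fuzzy_prefix_length_min": 1,
    "fuzzy_max_expansions": 50,
}


def analyze_query_string(params: dict) -> dict:
    """Inspect the parameters of one query_string clause and list suspicious values."""
    query = params.get("query", "")
    # Count whole-word upper-case boolean operators only (so "ORANGE" is not counted).
    operators = len(re.findall(r"\b(?:OR|AND|NOT)\b", query))
    warnings = []
    if params.get("max_determinized_states", 10000) > LIMITS["max_determinized_states"]:
        warnings.append("max_determinized_states too large")
    if params.get("fuzzy_prefix_length", 0) < LIMITS["fuzzy_prefix_length_min"]:
        warnings.append("fuzzy_prefix_length too small")
    if params.get("fuzzy_max_expansions", 50) > LIMITS["fuzzy_max_expansions"]:
        warnings.append("fuzzy_max_expansions too large")
    return {"operator_count": operators, "warnings": warnings}
```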

In the embodiments of the application, step S103 marks results that exceed preset thresholds in yellow or red, so that they are displayed more intuitively.
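The yellow/red marking in step S103 amounts to comparing each metric with two preset thresholds. A minimal sketch, assuming higher values are worse and that results are rendered as HTML table rows on the web page; the thresholds, function names and markup are hypothetical. Metrics where a small value is the problem, such as fuzzy_prefix_length, would need the comparison reversed.

```python
import html
from typing import Optional


def highlight(value: float, warn_at: float, alert_at: float) -> Optional[str]:
    """Map a metric to a display color: red at or above alert_at, yellow at or above warn_at."""
    if value >= alert_at:
        return "red"
    if value >= warn_at:
        return "yellow"
    return None


def render_row(name: str, value: float, warn_at: float, alert_at: float) -> str:
    """Render one result as an HTML table row, colored if it reaches a threshold."""
    color = highlight(value, warn_at, alert_at)
    style = f' style="background:{color}"' if color else ""
    return f"<tr{style}><td>{html.escape(name)}</td><td>{value}</td></tr>"
```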

The embodiments of the application provide a slow query log analysis system, which is suitable for the slow query log analysis method. As used hereinafter, the terms "module," "unit," "subunit," and the like may be a combination of software and/or hardware that implements a predetermined function. Although the devices described in the following embodiments are preferably implemented in software, implementation in hardware, or in a combination of software and hardware, is also possible and contemplated.

FIG. 2 is a block diagram of a slow query log analysis system according to an embodiment of the present application; the system includes a log summarizing unit 11, a log analysis unit 12 and a result acquisition unit 13, where:

the log summarizing unit 11 summarizes the generated slow query logs;

the log analysis unit 12 analyzes the summarized slow query logs through a stream data processing framework;

and the result acquisition unit 13 outputs the analysis result to the database and displays the analysis result.
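The division into three units can be sketched as three small classes wired in sequence. The class names follow the unit names and reference numerals in FIG. 2, but their methods, the iterable source and the callback sink are assumptions made for illustration; in the embodiment the source would be Kafka, the analysis would run on Spark Streaming, and the sink would be the database behind the web display.

```python
import json
from collections import Counter
from typing import Callable, Iterable


class LogSummarizingUnit:
    """Unit 11: collects slow query log records from a source (here any iterable)."""

    def __init__(self, source: Iterable[str]) -> None:
        self.source = source

    def collect(self) -> list[str]:
        return list(self.source)


class LogAnalysisUnit:
    """Unit 12: counts top-level clause types across the collected query bodies."""

    def analyze(self, records: list[str]) -> Counter:
        return Counter(key for r in records for key in json.loads(r).get("query", {}))


class ResultAcquisitionUnit:
    """Unit 13: hands results to a sink (a stand-in for the database and display)."""

    def __init__(self, sink: Callable[[Counter], None]) -> None:
        self.sink = sink

    def output(self, result: Counter) -> None:
        self.sink(result)


def run(source: Iterable[str], sink: Callable[[Counter], None]) -> None:
    """Wire the three units together: collect, analyze, output."""
    records = LogSummarizingUnit(source).collect()
    ResultAcquisitionUnit(sink).output(LogAnalysisUnit().analyze(records))
```

Keeping the units behind narrow interfaces like these is what allows them to be placed on the same processor or spread across different processors, as the embodiment permits.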

In some embodiments, the machine hosting each data node in the Elasticsearch cluster generates slow queries at irregular times every day and stores them in a log file under a specific directory. The log summarizing unit pulls the slow query logs with a script and summarizes them into a publish-subscribe messaging system through a log transmission tool. By writing a get_slowlog script, newly generated slow query logs on each machine can be checked and pulled automatically in real time.

In the embodiment of the application, the slow query log pulled by the script is summarized and input into Kafka by means of Logstash.

The above units may be functional units or program units, and may be implemented by software or hardware. For units implemented by hardware, the units may be located in the same processor; or the units may be located in different processors in any combination.

In addition, the method for analyzing the slow query log according to the embodiment of the present application described in conjunction with fig. 1 may be implemented by an electronic device. Fig. 3 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present application.

The electronic device may include a processor 21 and a memory 22 storing computer program instructions.

Specifically, the processor 21 may include a central processing unit (CPU) or an application specific integrated circuit (ASIC), or may be configured as one or more integrated circuits that implement the embodiments of the present application.

The memory 22 may include mass storage for data or instructions. By way of example, and not limitation, the memory 22 may include a hard disk drive (HDD), a floppy disk drive, a solid state drive (SSD), flash memory, an optical disk, a magneto-optical disk, magnetic tape, a Universal Serial Bus (USB) drive, or a combination of two or more of these. Where appropriate, the memory 22 may include removable or non-removable (fixed) media, and may be internal or external to the data processing apparatus. In a particular embodiment, the memory 22 is a non-volatile memory. In particular embodiments, the memory 22 includes read-only memory (ROM) and random access memory (RAM). Where appropriate, the ROM may be mask-programmed ROM, programmable ROM (PROM), erasable PROM (EPROM), electrically erasable PROM (EEPROM), electrically alterable ROM (EAROM), flash memory, or a combination of two or more of these. The RAM may be static random-access memory (SRAM) or dynamic random-access memory (DRAM), where the DRAM may be fast page mode DRAM (FPMDRAM), extended data out DRAM (EDODRAM), synchronous DRAM (SDRAM), and the like.

The memory 22 may be used to store or cache various data files that need to be processed and/or used for communication, as well as possible computer program instructions executed by the processor 21.

The processor 21 implements any of the slow query log analysis methods in the above embodiments by reading and executing computer program instructions stored in the memory 22.

In some of these embodiments, the electronic device may also include a communication interface 23 and a bus 20. As shown in FIG. 3, the processor 21, the memory 22 and the communication interface 23 are connected via the bus 20 and communicate with one another.

The communication interface 23 is used for data communication with other components, such as external devices, image/data acquisition devices, databases, external storage and image/data processing workstations.

The bus 20 includes hardware, software, or both, coupling the components of the electronic device to one another. The bus 20 includes, but is not limited to, at least one of the following: a data bus, an address bus, a control bus, an expansion bus and a local bus. By way of example, and not limitation, the bus 20 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Extended Industry Standard Architecture (EISA) bus, a front-side bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), or another suitable bus, or a combination of two or more of these. Where appropriate, the bus 20 may include one or more buses. Although specific buses are described and shown in the embodiments of the application, any suitable bus or interconnect is contemplated by the application.

The electronic device can execute a slow query log analysis method in the embodiment of the application.

In addition, in combination with the slow query log analysis method in the foregoing embodiments, the embodiments of the present application may be implemented by providing a computer-readable storage medium. Computer program instructions are stored on the computer-readable storage medium; when executed by a processor, the computer program instructions implement any of the slow query log analysis methods of the above embodiments.

The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.

The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.

The above embodiments express only several implementations of the present application; although they are described specifically and in detail, they shall not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within its scope of protection. Therefore, the scope of protection of this patent shall be subject to the appended claims.

Claims

1. A slow query log analysis method, comprising:

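a log summarizing step, namely summarizing the generated slow query logs;

a log analysis step, namely analyzing the summarized slow query logs through a stream data processing framework;

and a result obtaining step, namely outputting the analysis result to a database and displaying the analysis result.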

2. The slow query log analysis method of claim 1, wherein the log summarizing step comprises pulling the slow query logs with a script and summarizing the slow query logs into a publish-subscribe messaging system through a log transmission tool.

3. The slow query log analysis method of claim 2, wherein the log transmission tool comprises a Logstash tool.

4. A slow query log analysis method as claimed in claim 2 or 3, wherein the publish and subscribe message system comprises Kafka.

5. The slow query log analysis method of claim 1, wherein the log analysis step comprises programming in a stream data processing framework to count the occurrence frequency of each key field of the query body, and sorting the key fields in descending order of occurrence frequency.

6. The slow query log analysis method of claim 5, wherein the log analysis step further comprises: if a query_string field appears in the query body, analyzing the internal parameters of the query_string field; the internal parameters comprise the values corresponding to the query, max_determinized_states, fuzzy_prefix_length and fuzzy_max_expansions fields.

7. The slow query log analysis method of any one of claims 1, 5 and 6, wherein the stream data processing framework comprises Spark Streaming.

8. A slow query log analysis system, comprising:

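the log summarizing unit is used for summarizing the generated slow query logs;

the log analysis unit is used for analyzing the summarized slow query logs through a stream data processing framework;

and the result acquisition unit is used for outputting the analysis result to a database and displaying the analysis result.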

9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the slow query log analysis method of any of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a slow query log analysis method according to any one of claims 1 to 7.