CN114625763A

CN114625763A - Information analysis method and device for database, electronic equipment and readable medium

Info

Publication number: CN114625763A
Application number: CN202210142991.1A
Authority: CN
Inventors: 高新刚
Original assignee: Jingdong Technology Information Technology Co Ltd
Current assignee: Jingdong Technology Information Technology Co Ltd
Priority date: 2022-02-16
Filing date: 2022-02-16
Publication date: 2022-06-14

Abstract

The embodiment of the disclosure discloses an information analysis method and device for a database, an electronic device and a readable medium. One embodiment of the method comprises: periodically acquiring a slow query information set from a preset information queue, wherein the preset information queue is used for storing slow query information acquired from a database cluster; aggregating the slow query information in the slow query information set according to a preset dimension; storing the aggregation processing result in a preset database; and carrying out statistical analysis on related data in a preset database according to user operation to generate distribution information of the slow query information. The implementation mode is related to cloud technology and cloud computing, analysis and processing can be carried out on slow query information from multiple dimensions, and a user can conveniently monitor the performance state of the database server, so that problems can be found timely.

Description

Information analysis method and device for database, electronic equipment and readable medium

Technical Field

The embodiment of the disclosure relates to the technical field of computers, in particular to a cloud technology, and specifically relates to an information analysis method and apparatus for a database, an electronic device, and a readable medium.

Background

Slow queries generally refer to SQL statements (a language that operates on a database) in MySQL (relational database management system) in which the length of a query has run through a specified length of time. With the increasing number of databases, the generation of slow queries is also growing exponentially. A large number of slow queries often present a performance hazard. However, in everyday system operations, the relevant slow queries may not be prominent and often not effectively perceived.

When a performance problem occurs, a DBA (Database Administrator) needs to directly log in an online server to view relevant performance indexes. Not only is the operation complicated, but also the analysis process takes a lot of time. Furthermore, historical slow query related indicators and trends cannot be monitored.

Disclosure of Invention

This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter. Some embodiments of the present disclosure propose an information analysis method, apparatus, electronic device, and computer-readable medium for a database to solve one or more of the technical problems mentioned in the background section above.

In a first aspect, some embodiments of the present disclosure provide an information analysis method for a database, the method including: periodically acquiring a slow query information set from a preset information queue, wherein the preset information queue is used for storing slow query information acquired from a database cluster; aggregating the slow query information in the slow query information set according to a preset dimension; storing the aggregation processing result in a preset database; and carrying out statistical analysis on the data in the preset database to generate the distribution information of the slow query information.

In some embodiments, aggregating slow query information in the slow query information set according to a preset dimension includes: for each slow query message in the slow query message set, modifying a target character in the slow query message into a preset character, and generating a hash value of the modified slow query message; and performing aggregation processing on the generated hash values according to a preset dimension.

In some embodiments, generating a hash value of the modified slow query information comprises: and generating a hash value of the modified slow query information by adopting an information summarization algorithm.

In some embodiments, periodically obtaining a slow query information set from a pre-defined information queue includes: and determining slow query information with mutually matched topics from a preset information queue, and generating a slow query information set according to the determined slow query information.

In some embodiments, the performing statistical analysis on the data in the preset database to generate the distribution information of the slow query information includes: generating a total amount distribution graph of slow query information in a first time period in response to receiving a selection operation of a user on the first time period; and in response to detecting a user's selection operation of a first time point in the total amount distribution map, determining the amount of slow query information of each database instance at the first time point; and generating a database instance distribution graph at a first time point according to the quantity of the slow query information of each database instance.

In some embodiments, statistically analyzing the data in the preset database to generate the distribution information of the slow query information, further includes: in response to detecting that a user selects a target database instance in the database instance distribution map, determining a database associated with the target database instance; and generating a database distribution graph in the second time period according to the determined amount of the slow query information of the database in the second time period.

In some embodiments, statistically analyzing the data in the preset database to generate the distribution information of the slow query information, further includes: in response to detecting that a user selects a target database in the database distribution map, determining slow query information of the target database in a second time period; determining the number of slow query information in different query time ranges according to the determined query time of the slow query information, and generating a slow query time distribution graph of the target database in a second time period; and responding to the detected user selection operation on a second time point in the slow query duration distribution map, and generating a quantity ratio distribution map of the slow query information in different query duration ranges at the second time point.

In a second aspect, some embodiments of the present disclosure provide an information analysis apparatus for a database, the apparatus comprising: the acquisition unit is configured to periodically acquire a slow query information set from a preset information queue, wherein the preset information queue is used for storing slow query information acquired from a database cluster; the processing unit is configured to aggregate the slow query information in the slow query information set according to a preset dimension; a result storage unit configured to store the aggregation processing result in a preset database; and the generating unit is configured to perform statistical analysis on the data in the preset database to generate the distribution information of the slow query information.

In some embodiments, the processing unit is further configured to, for each slow query information in the set of slow query information, modify a target character in the slow query information to a preset character, generate a hash value of the modified slow query information; and performing aggregation processing on the generated hash values according to a preset dimension.

In some embodiments, the processing unit is further configured to generate a hash value of the modified slow query information using an information digest algorithm.

In some embodiments, the obtaining unit is further configured to determine slow query information with mutually matched topics from a preset information queue, and generate a slow query information set according to the determined slow query information.

In some embodiments, the generating unit is further configured to generate a total amount distribution map of the slow query information in the first time period in response to receiving a user selection operation of the first time period; and in response to detecting a user's selection operation of a first time point in the total amount distribution map, determining the amount of slow query information of each database instance at the first time point; and generating a database instance distribution graph at a first time point according to the quantity of the slow query information of each database instance.

In some embodiments, the generation unit is further configured to determine a database associated with the target database instance in response to detecting a user selection operation on the target database instance in the database instance profile; and generating a database distribution graph in the second time period according to the determined quantity of the slow query information of the database in the second time period.

In some embodiments, the generating unit is further configured to determine slow query information of the target database in the second time period in response to detecting a user selection operation of the target database in the database profile; determining the quantity of the slow query information in different query time ranges according to the determined query time of the slow query information, and generating a slow query time distribution graph of the target database in a second time period; and responding to the detected user selection operation on a second time point in the slow query duration distribution map, and generating a quantity ratio distribution map of the slow query information in different query duration ranges at the second time point.

In a third aspect, some embodiments of the present disclosure provide an electronic device, comprising: one or more processors; a storage device, on which one or more programs are stored, which when executed by one or more processors cause the one or more processors to implement the method described in any implementation of the first aspect.

In a fourth aspect, some embodiments of the disclosure provide a computer readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method described in any of the implementations of the first aspect.

The above embodiments of the present disclosure have the following advantages: the information analysis method for the database of some embodiments of the present disclosure may help a user to timely discover performance issues related to slow queries. In particular, the reason why the relevant slow-query problem cannot be effectively perceived in the daily system operation is that: in centralized monitoring, historical graphs of related slow queries cannot be viewed, so that a user (such as a manager) cannot predict the generation amount of the slow queries on the line. If there is a performance problem, this is usually known by an alarm generated after the performance problem occurs. And users often need to log in the online server to check the relevant performance indexes for analysis processing. Based on this, the information analysis method of some embodiments of the present disclosure may perform centralized collection processing on slow query information of a database cluster, and may store the processing result in a preset database. Therefore, the distribution information of the slow query information with different dimensions can be generated according to the user requirements. That is to say, the user can directly view the slow query trend of all the servers on the line through the embodiment. The slow query performance state of the database server can be monitored comprehensively without logging in an online server. Therefore, problems can be effectively found in time, user operation can be simplified, and analysis time is shortened.

Drawings

The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and components are not necessarily drawn to scale.

FIG. 1 is an architectural diagram of an exemplary system in which some embodiments of the present disclosure may be applied;

FIG. 2 is a flow diagram of some embodiments of an information analysis method for a database according to the present disclosure;

FIG. 3A is a diagram of a total volume profile of slow query information over a first time period;

FIG. 3B is a schematic diagram of an example distribution plot for a first point in time;

FIG. 4 is a schematic block diagram of some embodiments of an information analysis apparatus for a database according to the present disclosure;

FIG. 5 is a schematic block diagram of further embodiments of an information analysis apparatus for a database according to the present disclosure;

FIG. 6 is a schematic structural diagram of an electronic device suitable for use in implementing some embodiments of the present disclosure.

Detailed Description

Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.

It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings. The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.

It should be noted that the terms "first", "second", and the like in the present disclosure are only used for distinguishing different devices, modules or units, and are not used for limiting the order or interdependence relationship of the functions performed by the devices, modules or units.

It is noted that references to "a", "an", and "the" modifications in this disclosure are intended to be illustrative rather than limiting, and that those skilled in the art will recognize that "one or more" may be used unless the context clearly dictates otherwise.

The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.

The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Fig. 1 illustrates an exemplary system architecture 100 of an information analysis method or information analysis apparatus for a database to which some embodiments of the present disclosure may be applied.

As shown in fig. 1, system architecture 100 may include terminal device 101,

networks

102, 104, server 103, and

database servers

105, 106. Network 102 may be the medium used to provide a communication link between terminal device 101 and server 103. Network 104 may be a medium used to provide communication links between server 103 and

database servers

105, 106. The

networks

102, 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.

A user may use terminal device 101 to interact with server 103 over network 102 to receive or send messages and the like. Various client applications, such as a slow query analysis application, a web browser, a shopping application, an instant messenger, and the like, may be installed on the terminal device 101.

Here, the terminal apparatus 101 may be hardware or software. When the terminal device 101 is hardware, it may be various electronic devices with a display screen, including but not limited to a smart phone, a tablet computer, an e-book reader, a laptop portable computer, a desktop computer, and the like. When the terminal apparatus 101 is software, it can be installed in the electronic apparatuses listed above. It may be implemented, for example, as multiple software or software modules to provide distributed services, or as a single software or software module. And is not particularly limited herein.

The server 103 may be a server providing various services, for example, may be a background server providing support for slow query analysis type applications. The backend server can collect slow-query information from the

database servers

105, 106 and perform multidimensional aggregation processing. And in case it is detected that the user performs an operation on the application through the terminal device 101, the backend server may generate corresponding slow query distribution information to return to the terminal device 101.

The

database servers

105, 106 may be servers that provide data support for various applications. A plurality of database instances may be installed in the

database servers

105, 106. Multiple databases may be created in one database instance.

Here, the server 103 and the

database servers

105 and 106 may be hardware or software. When the server 103 and the

database servers

105 and 106 are hardware, they may be implemented as a distributed server cluster composed of a plurality of servers, or may be implemented as a single server. When server 103 and

database servers

105, 106 are software, they may be implemented as multiple software or software modules, for example, to provide distributed services, or as a single software or software module. And is not particularly limited herein.

It should be noted that the method provided by the embodiment of the present disclosure may be executed by the server 103, and may also be executed by the terminal device 101. Accordingly, the apparatus may be provided in the server 103, or may be provided in the terminal device 101. And is not particularly limited herein.

It should be understood that the number of terminal devices, networks, servers, and database servers in fig. 1 are merely illustrative. There may be any number of terminal devices, networks, servers and database servers, as desired.

With continued reference to fig. 2, a flow 200 of some embodiments of an information analysis method for a database in accordance with the present disclosure is shown. The method comprises the following steps:

in step 201, a slow query information set is periodically obtained from a preset information queue.

In some embodiments, the execution subject of the information analysis method for the database (e.g., the server 103 shown in fig. 1) may periodically obtain the slow query information sets from the preset information queue through a wired connection or a wireless connection. The preset information queue can be used for storing slow query information collected from the database cluster. The slow query (the slow query) generally refers to an SQL statement in which the query duration exceeds a specified duration. The slow queries can be recorded in Mysql to obtain a log of slow queries. As an example, in each obtaining process, the execution main body may sequentially obtain slow query information from a preset information queue according to the arrangement order of the information, so as to obtain a slow query information set.

Here, the execution agent may also collect slow query information of each database in the database cluster (e.g., database servers 105 and 106) through a wired connection or a wireless connection. The executive may collect these slow query logs in real-time or periodically (e.g., every ten minutes) and store them in a pre-set information queue. It should be noted that the preset message queue may be any program or structure for implementing a message storage function in a message transmission process, such as an ActiveMQ (open source message bus), a ZeroMQ (ZMQ for short, a message processing queue library), Kafka (a high-throughput distributed publish-subscribe message system), and the like.

In some embodiments, as shown in FIG. 5, multiple clients (client programming toolkits) may be included in the execution body. Each client may correspond to a database cluster (e.g., database instance) or database server. Each client can execute respective tasks at regular time, run local slow query analysis scripts and analyze the original slow query log of MySQL in the time period. The parsed information is then sent to the Kafka message queue as shown in fig. 5. Therefore, the collection and analysis efficiency of the slow query information is improved. In addition, when the communication mode or the interface is changed, only the corresponding client can be adjusted. Namely, the adoption of the corresponding acquisition mode can avoid the execution of a large amount of modification of the main body, thereby reducing the complexity and the time for modifying and adjusting. Optionally, the execution main body may further convert the analyzed slow query information into a fixed format, and then send the fixed format to the preset information queue. This facilitates subsequent data processing analysis.

It will be appreciated that since the amount of access to different database instances will generally vary, the amount of slow query information that will ultimately be generated by each database instance (or database server) will vary. In addition, the execution principal may adopt a producer-consumer mode. Wherein a producer is generally a process that produces data. And consumers are generally the processes that consume the data. This model can solve the problem of strong coupling between the producer and the consumer by means of a container (i.e. a pre-set information queue). The existence of the preset information queue can ensure that the preset information queue and the preset information queue do not need direct communication, so that the efficiency of processing data respectively can be improved.

In some embodiments, the execution agent may use a general-purpose computing engine (e.g., Spark, a fast general-purpose computing engine designed for large-scale data processing) to obtain the slow-query information set. As shown in fig. 5, the execution body may adopt Spark streaming (a framework for processing real-time computing services), which is also a core of streaming computing, and the slow query information is fetched from the kafka information queue through Spark Context according to a processing manner such as one batch every ten minutes. Where Spark Context is the main entry of the Spark function, which represents the connection to the Spark cluster. The spare Context can access the spare service (in an active state) to run the task.

It should be noted that each piece of information published to the Kafka information queue has a category, which is called Topic (also called Topic). Thus, in some embodiments, the execution agent may determine slow query information from the pre-set information queue whose topics match each other. A set of slow query information may then be generated based on the determined slow query information. That is, Spark Context may go to the Kafka message queue to obtain the slow query message with the subject matching (e.g., same Topic) each time, so as to obtain the slow query message set.

By way of example, the information queue includes three types, i.e., Topic1, Topic2, and Topic 3. At this time, the execution subject can obtain one kind of slow query information of Topic from the execution subject at a time to form a slow query information set. For example, the slow query information of Topic1 may be obtained for the first time, and the slow query information of Topic2 may be obtained for the second time. Or in order to improve the data processing efficiency, the execution subject can acquire the slow query information of various types of Topic at a time.

And 202, aggregating the slow query information in the slow query information set according to a preset dimension.

In some embodiments, the execution subject may aggregate the slow query information in the slow query information set according to a preset dimension. The preset dimension may be set according to actual requirements, and may include, but is not limited to, at least one of the following: time (minutes, hours, days) dimension, item (e.g., business) dimension, database instance dimension, table dimension, department dimension, and the like. The aggregation process usually includes content selection, analysis, and classification of related data, and the final analysis results in the desired result. As an example, the classification analysis may be performed in a time dimension based on the time of slow query information generation. According to the content related in the slow query information, screening and aggregation can be carried out according to the service or service line dimension.

Optionally, for each slow query message in the slow query message set, the first execution subject may modify a target character (e.g., a variable in a sentence) in the slow query message into a preset character; a hash value of the modified slow query information may then be generated. And then, the generated hash values can be aggregated according to a preset dimension.

As an example, for SQL statements "Select col1 from table where id is 1" and "Select col1 from table where id is 2", we can get "Select col1 from table where id is 2" by replacing the variables with fixed characters? ". This can turn two SQL statements into one type of SQL statement. In turn, a unique hash value for this type of SQL statement can be derived. Therefore, the data integrity is ensured, the data volume of the slow query information needing to be processed can be reduced, and the data processing efficiency is improved. By way of example, the executing agent may employ a Hash function such as SHA-1(Secure Hash Algorithm 1) or MD5(Message-Digest Algorithm) to generate the Hash value.

As can be seen from fig. 5, after the Spark Context obtains the data content (i.e., the slow query information set), all data can be scattered by a Cluster manager to be evenly distributed on each computing node (e.g., worker1, worker2, worker 3). The result after calculation can adopt MD5 algorithm to obtain the hash value of the slow inquiry information of the same type.

It will be appreciated that the slow query information processed as described above maintains the underlying statistics for each dimension of the native SQL statement. In this way, the execution subject can subsequently perform (map) merging calculations of data according to different dimensions. As shown in fig. 5, for the obtained hash value, the aggregation calculation may be performed in terms of time (Hour), Table (Table), Instance (Instance), and other (other) dimensions.

And step 203, storing the aggregation processing result in a preset database.

In some embodiments, the execution principal may store the aggregation processing result in step 203 in a preset database. The preset database here may be any database for storing data. The preset database may be set according to actual use requirements, such as an Hbase (distributed, column-oriented open source database) database shown in fig. 5.

It can be understood that, in order to ensure the multi-dimension and future expandability of the data, in the calculation process of each flow, the result of each calculation can be written into a preset database. As shown in fig. 5, MD5 calculations, merging calculations in different dimensions, and the results of the calculations (reduce) can all be stored in the Hbase database. Thus, all data of different time and dimension are saved in Hbase

And 204, performing statistical analysis on the data in the preset database to generate distribution information of the slow query information.

In some embodiments, the execution subject may perform statistical analysis on the data in the preset database, thereby generating distribution information of the slow query information. And the user can operate on the slow query analysis application platform through a terminal device (such as the terminal device 101 shown in fig. 1) to view the condition of the relevant slow query information. As an example, the execution subject may perform statistical analysis on related data (data indicated by an operation) in a preset database according to the user operation, thereby generating distribution information of slow query information corresponding to the user operation. Here, since there is an association relationship between the calculation results stored in the preset database, the execution main body may perform a secondary operation based on the plurality of calculation results stored therein to obtain an analysis result of the slow query information, and may send the analysis result to the terminal device for presentation to the user.

In some embodiments, the execution subject, upon receiving a user selection of a first time period, may generate a total amount profile of slow query information over the first time period. The first time period here may be any time period before the current time. Further, in response to detecting a user selection of a first time point in the total volume profile, the amount of slow query information for each database instance at the first time point may be determined. And further generating a database instance distribution graph at the first time point according to the quantity of the slow query information of each database instance.

As an example, during daily use by a user, a profile of the total amount of slow-query information over a first time period may be viewed first. As shown in FIG. 3A, the user may select overview information for slow queries in the 2018-07-1214: 02 to 2018-07-1215: 01 time period. Where the abscissa may be time (in minutes) and the ordinate may be total number of slow query messages (count, in thousands). In the graph, the user may also view a specific total number of slow query information for different points in time, such as a total number of 2018-07-1214: 16 slow query information of 9499. Through the total distribution graph, the user can predict the load condition of all the servers on the current line according to the total compared with the total distribution graph.

At this time, if the user finds that the total slow query number at a certain time point in the total number distribution graph is abnormal, the user can enter the information display of all database instances at the time point by clicking the coordinate point. If the user finds that the total number of slow queries 2018-07-1215: 00 in fig. 3A is suddenly increased abnormally, the user may select the abscissa or coordinate value corresponding to the time point. The executive may then generate a database instance profile as shown in FIG. 3B. Where the abscissa may be the number of slow query messages and the ordinate may be the identity (e.g., IP address) of each database instance. Here, the user may view a specific total number of slow query information for different database instances, such as database instance 172.24.138.243#3306 with a number of slow query information of 402. The TOP database INSTANCEs may be presented in the database INSTANCE profile (via the TOP _ INSTANCE function). Therefore, in daily operation and maintenance, the user can directly position according to the information, and the working efficiency and the accuracy can be greatly improved.

Optionally, in response to detecting a user's selection operation on a target database instance in the database instance distribution map, the execution subject may determine a database associated with the target database instance; a database profile for a second time period may be generated based on the determined amount of slow query information for the database for the second time period. The second time period may be a time period including the first time point, or may be any time period. Here, the abscissa of the database profile may be time, and the ordinate may be the amount of slow-query information. The database profile may be used to represent the trend of the amount of slow query information for different databases and to distinguish the different databases by color (or shape, line thickness, etc.). The database distribution map can also be used to represent the trend of the amount of slow query information of all databases located in different query duration ranges, and distinguish the different query duration ranges by using colors and the like.

Further, in response to detecting a user selection of a target database in the database profile, the execution subject may determine slow query information for the target database over a second time period. Then, according to the determined query duration of the slow query information, the number of the slow query information in different query duration ranges can be determined, so that a slow query duration distribution graph of the target database in a second time period is generated. The abscissa of the slow query duration distribution graph may be time, and the ordinate may be the amount of slow query information. The query duration range can be divided into 0-0.1 second, 0.1-0.5 second, 0.5-1 second, 1-3 seconds and more than 3 seconds. The number of slow query information in different query duration ranges can be distinguished in the graph by means of colors and the like.

Then, in response to detecting that the user selects a second time point in the slow query duration distribution map, the execution main body may further generate a number-to-number distribution map of slow query information located in different query duration ranges at the second time point.

As an example, in fig. 3B, if the user finds that an abnormal condition occurs in a certain database instance, the slow query information distribution of all databases associated with the database instance may be checked by clicking. The statistics is carried out on all databases, and the slow query information quantity of each database can be clear at a glance. And the quantity of slow query information in different query duration ranges can be compared, and the slow query information can also be directly viewed by a user in a pie chart mode. This provides great convenience to the user in diagnosing and troubleshooting performance problems.

It can be understood that, in addition to monitoring the total number and trend of the global slow query information, the information analysis method for the database disclosed in this embodiment may also monitor the slow query information according to the dimension of the business line or the department. In this way, the user can directly determine the part needing optimization and give related suggestions according to the proportion of slow query information of related business lines or departments in distribution information (such as pie charts). In addition, the selection manner is not limited herein, and may be (but is not limited to) at least one of option selection, clicking, moving the display position of the mouse, and the like. The display mode of the distribution information is also not limited here, and various statistical graphs can be adopted according to the requirements.

In some embodiments, the preset dimension may further include related index information commonly seen in native MySQL, such as Query time (Query time), number of Rows checked (Rows extended), Count (Count), number of Rows returned (Rows sent), and the like. These may all be recorded in the native slow query log file. Therefore, the user can also know the comparison condition of the parameters in all databases on the application platform.

It can be known from the above description that developers can directly query the slow query trend in their own service line through the platform, and can directly locate through the hash value obtained by merging and calculating the similar slow queries. And the DBA can directly check the slow query trend of all the servers on line through statistics after the platform is merged and calculated. Meanwhile, the information of the INSTANCE with the largest number of slow queries in the current time period can be directly located through the TOP _ INSTANCE function. And the information of each dimension can be obtained by drilling down again. Therefore, the user can monitor the slow query performance state of the database server in all aspects without logging in an online server.

Some embodiments of the present disclosure provide an information analysis method for a database that can stream collected slow query SQL statements. And the calculation of different dimensions can be carried out according to the dimensions such as minutes, hours, days, tables, services, instances and service lines, and the data after the aggregation of different dimensions is merged. By combining the idea of database operation and maintenance, the most intuitive slow query chart information can be displayed for DBAs and research and development personnel according to various dimensions. Therefore, the method and the device not only can help users to find problems timely and effectively, but also reduce faults. The time length of user operation and analysis can be reduced, and the optimization efficiency is improved.

With further reference to fig. 4, as an implementation of the method illustrated in fig. 2 described above, the present disclosure provides some embodiments of an information analysis apparatus for a database, which correspond to those of the method embodiments illustrated in fig. 2, and which may be applied in particular to various electronic devices.

As shown in fig. 4, the information analysis apparatus 400 for a database of some embodiments may include: an obtaining unit 401 configured to periodically obtain a slow query information set from a preset information queue, where the preset information queue is used to store slow query information collected from a database cluster; a processing unit 402 configured to aggregate slow query information in the slow query information set according to a preset dimension; a result storage unit 403 configured to store the aggregation processing result in a preset database; the generating unit 404 is configured to perform statistical analysis on data in a preset database to generate distribution information of the slow query information.

In some embodiments, the processing unit 402 may be further configured to, for each slow query information in the slow query information set, modify a target character in the slow query information to a preset character, and generate a hash value of the modified slow query information; and performing aggregation processing on the generated hash values according to a preset dimension.

In some embodiments, processing unit 402 may be further configured to employ a message digest algorithm to generate a hash value of the modified slow query message.

In some embodiments, the obtaining unit 401 may be further configured to determine slow query information with mutually matched topics from a preset information queue, and obtain a slow query information set according to the determined slow query information.

In some embodiments, the generating unit 404 may be further configured to generate a total amount profile of the slow query information in the first time period in response to receiving a user selection operation of the first time period; responding to the detected selection operation of the user to a first time point in the total amount distribution graph, and determining the quantity of slow query information of each database instance at the first time point; and generating a database instance distribution graph at a first time point according to the quantity of the slow query information of each database instance.

In some embodiments, the generating unit 404 may be further configured to determine, in response to detecting a user selection operation on a target database instance in the database instance profile, a database associated with the target database instance; and generating a database distribution graph in the second time period according to the determined quantity of the slow query information of the database in the second time period.

In some embodiments, the generating unit 404 may be further configured to determine slow query information of the target database in the second time period in response to detecting a user selection operation of the target database in the database profile; determining the number of slow query information in different query time ranges according to the determined query time of the slow query information, and generating a slow query time distribution graph of the target database in a second time period; and responding to the detected user selection operation on a second time point in the slow query duration distribution map, and generating a quantity ratio distribution map of the slow query information in different query duration ranges at the second time point.

As an example, as shown in fig. 5, the above-described acquisition unit 401 and processing unit 402 correspond to parts in the Spark calculation engine. And the default message queue may be the Kafka message queue shown in fig. 5. The result storage unit 403 corresponds to the Hbase database. In addition, as can be seen from fig. 5, the information analysis apparatus may further include an acquisition unit clients.

It will be understood that the elements described in the apparatus 400 correspond to various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 400 and the units included therein, and will not be described herein again.

Referring now to FIG. 6, a block diagram of an electronic device (e.g., the server of FIG. 1) 600 suitable for use in implementing some embodiments of the present disclosure is shown. The electronic device shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 6, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic apparatus 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.

Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.

In particular, according to some embodiments of the present disclosure, the processes described above with reference to the flow diagrams may be implemented as computer software programs. For example, some embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In some such embodiments, the computer program may be downloaded and installed from a network through the communication device 609, or installed from the storage device 608, or installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of some embodiments of the present disclosure.

It should be noted that the computer readable medium described in some embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In some embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In some embodiments of the present disclosure, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

In some embodiments, the clients, servers may communicate using any currently known or future developed network Protocol, such as HTTP (HyperText Transfer Protocol), and may interconnect with any form or medium of digital data communication (e.g., a communications network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), the Internet (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: periodically acquiring a slow query information set from a preset information queue, wherein the preset information queue is used for storing slow query information acquired from a database cluster; aggregating the slow query information in the slow query information set according to a preset dimension; storing the aggregation processing result in a preset database; and carrying out statistical analysis on the data in the preset database to generate the distribution information of the slow query information.

Furthermore, computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C + +, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in some embodiments of the present disclosure may be implemented by software, and may also be implemented by hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit, a processing unit, a result storage unit, and a generation unit. Where the names of these units do not in some cases constitute a limitation on the unit itself, for example, the retrieving unit may also be described as a "unit that periodically retrieves a slow query information set from a preset information queue".

The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combinations of the above-mentioned features, and other embodiments in which the above-mentioned features or their equivalents are combined arbitrarily without departing from the spirit of the invention are also encompassed. For example, the above features and (but not limited to) the features with similar functions disclosed in the embodiments of the present disclosure are mutually replaced to form the technical solution.

Claims

1. An information analysis method for a database, wherein the method comprises:

periodically acquiring a slow query information set from a preset information queue, wherein the preset information queue is used for storing slow query information acquired from a database cluster;

aggregating the slow query information in the slow query information set according to a preset dimension;

storing the aggregation processing result in a preset database;

and performing statistical analysis on the data in the preset database to generate the distribution information of the slow query information.

2. The information analysis method according to claim 1, wherein the aggregating slow query information in the slow query information set according to a preset dimension comprises:

for each slow query information in the slow query information set, modifying a target character in the slow query information into a preset character, and generating a hash value of the modified slow query information;

and performing aggregation processing on the generated hash values according to a preset dimension.

3. The information analysis method of claim 2, wherein the generating a hash value of the modified slow query information comprises:

and generating a hash value of the modified slow query information by adopting an information summarization algorithm.

4. The information analysis method of claim 1, wherein the periodically obtaining slow query information sets from a preset information queue comprises:

and determining slow query information with mutually matched subjects from a preset information queue, and generating a slow query information set according to the determined slow query information.

5. The information analysis method according to one of claims 1 to 4, wherein performing statistical analysis on the data in the preset database to generate distribution information of slow query information comprises:

responding to the received selection operation of a user on a first time period, and generating a total amount distribution graph of slow query information in the first time period; and

in response to detecting a user's selection operation of a first time point in the total amount distribution graph, determining the amount of slow query information of each database instance at the first time point;

and generating a database instance distribution graph at the first time point according to the quantity of the slow query information of each database instance.

6. The information analysis method according to claim 5, wherein the statistical analysis is performed on the data in the preset database to generate distribution information of the slow query information, further comprising:

in response to detecting a user's selection operation of a target database instance in the database instance distribution map, determining a database associated with the target database instance;

and generating a database distribution graph in a second time period according to the determined amount of the slow query information of the database in the second time period.

7. The information analysis method according to claim 6, wherein the statistical analysis is performed on the data in the preset database to generate distribution information of the slow query information, further comprising:

in response to detecting a user's selection operation of a target database in the database profile, determining slow query information of the target database in the second time period;

determining the number of slow query information in different query time ranges according to the determined query time of the slow query information, and generating a slow query time distribution graph of the target database in the second time period; and

and generating a quantity ratio distribution graph of slow query information in different query duration ranges at a second time point in response to detecting that the user selects the second time point in the slow query duration distribution graph.

8. An information analysis apparatus for a database, wherein the apparatus comprises:

the acquisition unit is configured to periodically acquire a slow query information set from a preset information queue, wherein the preset information queue is used for storing slow query information acquired from a database cluster;

the processing unit is configured to aggregate the slow query information in the slow query information set according to a preset dimension;

a result storage unit configured to store the aggregation processing result in a preset database;

and the generating unit is configured to perform statistical analysis on the data in the preset database to generate the distribution information of the slow query information.

9. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon,

when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.

10. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-7.