CN113608909B - Data processing method, apparatus, device, system, storage medium and program product - Google Patents

Data processing method, apparatus, device, system, storage medium and program product

Info

Publication number
CN113608909B
Authority
CN
China
Prior art keywords
data
database system
abnormal
flow
suspected
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110864262.2A
Other languages
Chinese (zh)
Other versions
CN113608909A (en)
Inventor
杨科
沈春辉
杨成虎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Application filed by Alibaba China Co Ltd
Priority to CN202110864262.2A
Publication of CN113608909A
Application granted
Publication of CN113608909B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the present application provide a data processing method, apparatus, device, system, storage medium and program product. The method comprises: for at least one predefined abnormal problem, acquiring, according to the data category corresponding to the abnormal problem and from data generated by the operation of a database system, target part data among all the data required for analyzing the abnormal problem, as data to be processed; and processing the data to be processed according to the data analysis logic corresponding to the abnormal problem, so as to determine suspected objects in the database system that have the abnormal problem. The embodiments of the present application can reduce the computing resources consumed when analyzing a database system.

Description

Data processing method, apparatus, device, system, storage medium and program product
Technical Field
The present disclosure relates to the field of database technologies, and in particular, to a data processing method, apparatus, device, system, storage medium, and program product.
Background
A database system is a system composed of a database and its management software. Database systems include centralized database systems and distributed database systems.
In general, problems that arise during the operation of a database system can be analyzed, located, and even automatically repaired. Specifically, the full query request data, full write request data, full log data and the like generated by the operation of the database system can be collected, and the problems of the database system can be determined by cleaning, analyzing and otherwise processing the collected full data. However, this approach consumes a large amount of computing resources.
Disclosure of Invention
Embodiments of the present application provide a data processing method, apparatus, device, system, storage medium and program product, to solve the prior-art problem that analyzing a database system consumes a large amount of computing resources.
In a first aspect, an embodiment of the present application provides a data processing method, including:
for at least one predefined abnormal problem, acquiring, according to the data category corresponding to the abnormal problem and from data generated by the operation of a database system, target part data among all the data required for analyzing the abnormal problem, as data to be processed; wherein the target part data can be used to determine suspected objects that have the abnormal problem and have a large influence on the operation of the database system, and the other part of all the data can be used to determine suspected objects that have the abnormal problem and have a small influence on the operation of the database system;
and processing the data to be processed according to the data analysis logic corresponding to the abnormal problem, so as to determine the suspected objects in the database system that have the abnormal problem.
In a second aspect, an embodiment of the present application provides a data processing apparatus, including:
an acquisition module, configured to, for at least one predefined abnormal problem, acquire, according to the data category corresponding to the abnormal problem and from data generated by the operation of a database system, target part data among all the data required for analyzing the abnormal problem, as data to be processed; wherein the target part data can be used to determine suspected objects that have the abnormal problem and have a large influence on the operation of the database system, and the other part of all the data can be used to determine suspected objects that have the abnormal problem and have a small influence on the operation of the database system;
and a processing module, configured to process the data to be processed according to the data analysis logic corresponding to the abnormal problem, so as to determine the suspected objects in the database system that have the abnormal problem.
In a third aspect, an embodiment of the present application provides a data processing apparatus, including:
an acquisition module, configured to, for at least one predefined abnormal problem, collect, according to the data category corresponding to the abnormal problem and from data generated by the operation of a database system, target part data among all the data required for analyzing the abnormal problem, as data to be processed; wherein the target part data can be used to determine suspected objects that have the abnormal problem and have a large influence on the operation of the database system, and the other part of all the data can be used to determine suspected objects that have the abnormal problem and have a small influence on the operation of the database system;
and a processing module, configured to process the data to be processed according to the data analysis logic corresponding to the abnormal problem, so as to determine the suspected objects in the database system that have the abnormal problem.
In a fourth aspect, embodiments of the present application provide a computer device, comprising: a memory, a processor; wherein the memory is for storing one or more computer instructions which, when executed by the processor, implement the method of any of the first aspects.
In a fifth aspect, embodiments of the present application provide a computer program product comprising computer program instructions for implementing the method according to any of the first aspects when said instructions are executed by a processor.
In a sixth aspect, embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed, implements a method according to any of the first aspects.
In the embodiments of the present application, at least one type of abnormal problem corresponding to client-side abnormality is predefined from the perspective of a client of the database, and each abnormal problem has a corresponding data category and data analysis logic. The computer device acquires, according to the data category and from data generated by the operation of the database system, target part data among all the data required for analyzing the abnormal problem, as data to be processed, and processes the data to be processed according to the data analysis logic to determine suspected objects in the database system that have the abnormal problem. The target part data can be used to determine suspected objects that have the abnormal problem and have a large influence on the operation of the database system, whereas the other part of all the data can be used to determine suspected objects that have the abnormal problem but have a small influence on the operation of the database system. Acquiring only the target part data as the data to be processed therefore still serves the purpose of determining the suspected objects that have a large influence on the operation of the database system. Furthermore, because only part of all the data needs to be collected and processed, data collection and data processing are both lightweight compared with collecting and processing the full data, which reduces the consumption of computing resources. In addition, because the processing result is the suspected objects that have the abnormal problem, an ordinary database client can learn whether there are suspected objects with one or more abnormal problems, which can guide the client in handling the problem further.
Drawings
In order to illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show some embodiments of the present application, and a person skilled in the art can obtain other drawings from them without inventive effort.
Fig. 1 is a schematic diagram of an application scenario according to an embodiment of the present application;
Fig. 2 is a flow chart of a data processing method according to an embodiment of the present application;
Fig. 3 is a block diagram of a solution to the request hotspot problem according to an embodiment of the present application;
Fig. 4 is a block diagram of a solution to the traffic hotspot problem according to an embodiment of the present application;
Fig. 5 is a block diagram of a solution to the large query request problem according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present application;
Fig. 7 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments herein without inventive effort fall within the scope of protection of the present application.
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to limit the application. As used in this application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. "A plurality" generally includes at least two, but does not exclude the case of at least one.
It should be understood that the term "and/or" as used herein merely describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may represent: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", "in response to a determination" or "in response to a detection". Similarly, depending on the context, the phrase "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined", "in response to a determination", "when (the stated condition or event) is detected" or "in response to detection of (the stated condition or event)".
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a product or system that comprises a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a product or system. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the product or system comprising that element.
In addition, the sequence of steps in the method embodiments described below is only an example and is not strictly limited.
In order to facilitate understanding of the technical solutions provided by the embodiments of the present application by those skilled in the art, a technical environment in which the technical solutions are implemented is described below.
In the related art, a data processing method commonly used for analyzing a database system mainly collects the full log data generated by the operation of the database system and determines the problems of the database system by cleaning, analyzing and otherwise processing the collected full log data. Both collecting the full data and processing it consume a large amount of computing resources, so the related art needs a data processing approach that reduces the consumption of computing resources.
Based on actual technical requirements such as those described above, the data processing method provided by the present application can reduce, by technical means, the computing resources consumed in analyzing a database system.
The data processing method provided by each embodiment of the present application is specifically described below through an exemplary application scenario.
Fig. 1 is a schematic diagram of an application scenario of the data processing method according to an embodiment of the present application. As shown in Fig. 1, the application scenario may include a database system 11 and a data processing system 12. The database system 11 may include a control (master) node and a plurality of data nodes, where the control node may be responsible for scheduling and the data nodes may be responsible for serving external data read and write requests. During operation, the database system 11 may expose some of the data generated by its operation, such as log data.
The data processing system 12 sits above the database system 11. It may include an acquisition module 121 for collecting data generated by the operation of the database system 11 and a processing module 122 for processing the data collected by the acquisition module 121. The computer device on which the processing module 122 is deployed (denoted as computer device X) may perform the data processing method provided by the embodiments of the present application.
In general, analyzing a database system requires acquiring from it the full data of every type generated by its operation, such as the full log data, full query request data and full write request data, and cleaning, analyzing and otherwise processing the acquired full data to determine the problems of the database system. Because the full data must be collected and then processed, both data collection and data processing are heavyweight, and the consumption of computing resources is large.
Moreover, because both data collection and data processing require relatively large computing resources, the processing module that performs the data processing and the database system are typically deployed on different computer devices, and the processing module is typically a centralized processing module serving a plurality of database systems. However, a centralized processing module can provide analysis services for a database system deployed on a public cloud but not for a database system deployed on a network-isolated proprietary cloud, so the range of service objects is very limited.
In addition, if the code corresponding to the acquisition module is run by a process inside a database instance, the acquisition module in the conventional technology must perform heavyweight data collection, which consumes a large amount of computing resources. A database instance with a small resource specification has very limited computing resources and cannot support such heavyweight collection, so the acquisition module can only run in database instances with large resource specifications, and the range of applicable instances is also very limited.
Furthermore, in the conventional technology, the result of processing the collected full data consists of internal system metrics of the database system. Such metrics suit an experienced system administrator troubleshooting a problem, but they are of little use to an ordinary database client, because they lack an analysis conclusion that can guide the client in handling the problem further.
In order to solve the technical problem that analyzing a database system consumes a large amount of computing resources, in the application scenario shown in Fig. 1, at least one type of abnormal problem corresponding to client-side abnormality is predefined from the perspective of a client of the database, and each abnormal problem has a corresponding data category and data analysis logic. Computer device X acquires, according to the data category and from data generated by the operation of the database system, target part data among all the data required for analyzing the abnormal problem, as data to be processed, and processes the data to be processed according to the data analysis logic to determine suspected objects in the database system that have the abnormal problem. The target part data can be used to determine suspected objects that have the abnormal problem and have a large influence on the operation of the database system, whereas the other part of all the data can be used to determine suspected objects that have the abnormal problem but have a small influence on the operation of the database system. Acquiring only the target part data therefore still serves the purpose of determining the suspected objects that have a large influence on the operation of the database system, and because only part of all the data is collected and processed, data collection and data processing are both lightweight, which reduces the consumption of computing resources. In addition, because the processing result is the suspected objects that have the abnormal problem, an ordinary database client can learn whether there are suspected objects with one or more abnormal problems, which can guide the client in handling the problem further.
Based on the above, in the application scenario shown in Fig. 1, for at least one predefined abnormal problem, computer device X acquires, according to the data category corresponding to the abnormal problem and from data generated by the operation of the database system, target part data among all the data required for analyzing the abnormal problem, as data to be processed, and processes the data to be processed according to the data analysis logic corresponding to the abnormal problem, so as to determine the suspected objects in the database system that have the abnormal problem.
In the embodiments of the present application, the database system is analyzed through lightweight data collection and lightweight data processing, so the consumption of computing resources is relatively small. Accordingly, in one embodiment, the acquisition module 121 and the processing module 122 of the data processing system 12 may be deployed on the same computer device; that is, acquiring the target part data required for analyzing the abnormal problem may specifically include collecting that target part data. The data processing system 12 and the database system 11 can then be located in the same network environment, so that the data processing system 12 can provide analysis services both for a database system deployed on a public cloud and for a database system deployed on a network-isolated proprietary cloud.
Further, because data collection and data processing consume few computing resources, the resource specification required of a database instance running the code of the data processing system 12 is low. The code can therefore run in database instances with small resource specifications, which widens the range of applicable instances and meets the needs of instances with different resource specifications.
In another embodiment, the acquisition module 121 and the processing module 122 in the data processing system 12 may be deployed on different computer devices, that is, acquiring the target portion data required for analyzing the corresponding abnormal problem may specifically include receiving the target portion data required for analyzing the corresponding abnormal problem acquired and transmitted by other devices.
The following description mainly takes the case where computer device X performs the data collection as an example.
Some embodiments of the present application are described in detail below with reference to the accompanying drawings. The following embodiments and features of the embodiments may be combined with each other without conflict.
Fig. 2 is a flow chart of a data processing method according to an embodiment of the present application. As shown in Fig. 2, the method of this embodiment may include:
Step 21: for at least one predefined abnormal problem, acquiring, according to the data category corresponding to the abnormal problem and from data generated by the operation of a database system, target part data among all the data required for analyzing the abnormal problem, as data to be processed;
Step 22: processing the data to be processed according to the data analysis logic corresponding to the abnormal problem, so as to determine the suspected objects in the database system that have the abnormal problem.
In the embodiments of the present application, the target part data can be used to determine suspected objects that have the abnormal problem and have a large influence on the operation of the database system, and the other part of all the data can be used to determine suspected objects that have the abnormal problem and have a small influence on the operation of the database system.
A client cares about the running state of the database system from the viewpoint of using it. Accordingly, in the embodiments of the present application, the purpose of analyzing the database system may be to determine the suspected objects that have a large influence on its operation, without regard to the suspected objects that have a small influence. Although the other part of the data can also be used to determine suspected objects that have the abnormal problem, those objects have only a small influence on the operation of the database system. Given that purpose, only the target part of all the data needs to be collected, and the other part need not be collected, which achieves lightweight data collection and lightweight data processing.
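As a rough illustration of how steps 21 and 22 fit together, the following Python sketch pairs each predefined abnormal problem with a data category, a selector for the target part data, and an analysis function. The class `AnomalyProblem`, the function `run_analysis`, the record layout, and the toy selector and threshold are all assumptions made for illustration; the source does not prescribe them.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List

# Hypothetical record type: one dict per collected data item.
Record = Dict[str, object]


@dataclass
class AnomalyProblem:
    """A predefined abnormal problem (all field names are illustrative)."""
    name: str
    data_category: str                                      # kind of runtime data to read
    select_target: Callable[[List[Record]], List[Record]]   # keeps only the target part
    analyze: Callable[[List[Record]], List[str]]            # returns suspected objects


def run_analysis(problems: Iterable[AnomalyProblem],
                 runtime_data: Dict[str, List[Record]]) -> Dict[str, List[str]]:
    """Apply step 21 and then step 22 to every predefined problem."""
    suspects: Dict[str, List[str]] = {}
    for problem in problems:
        # Step 21: take only the target part of the data in the problem's category.
        candidates = runtime_data.get(problem.data_category, [])
        to_process = problem.select_target(candidates)
        # Step 22: apply the problem's own analysis logic.
        suspects[problem.name] = problem.analyze(to_process)
    return suspects


# Toy selector and analysis logic, assumed purely for this example.
def top2_by_requests(records: List[Record]) -> List[Record]:
    return sorted(records, key=lambda r: r["requests"], reverse=True)[:2]


def over_1000(records: List[Record]) -> List[str]:
    return [str(r["fragment"]) for r in records if r["requests"] > 1000]


hotspot = AnomalyProblem("request_hotspot", "fragment_traffic",
                         top2_by_requests, over_1000)
data = {"fragment_traffic": [
    {"fragment": "R1,1", "requests": 5000},
    {"fragment": "R1,2", "requests": 40},
    {"fragment": "R1,3", "requests": 1200},
]}
result = run_analysis([hotspot], data)
```

Registering a further abnormal problem in this sketch only requires another `AnomalyProblem` entry, which mirrors the statement that the abnormal problems may be defined flexibly.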
The at least one abnormal problem may be flexibly defined as required. From the perspective of a client using a database system, the inventors of the present application propose the concept of client-side abnormal problems and propose three of them: the request hotspot problem, the abnormal traffic growth problem, and the large query request problem. These three abnormal problems basically cover the scenarios in which a client uses a database system abnormally.
The request hotspot problem means that data reads and writes are severely skewed because of an unreasonable primary key design, for example reads or writes are concentrated on a certain fragment (Region) of a table, so that the distributed capability of the database degrades to that of a single node.
The abnormal traffic growth (Abnormal Flow) problem means that the traffic of a table grows abnormally and exceeds the overall processing capacity of the database system.
The large query request (big call) problem means that, because of an unreasonable table query pattern, a single query scans a large number of underlying storage data blocks, consuming a large amount of computing resources and input/output resources and affecting the request processing throughput of a single machine.
In one embodiment, the at least one exception problem may include one or more of a request hotspot problem, a traffic exception growth problem, or a large query request problem.
The data category corresponding to the request hotspot problem refers to the category of data required for analyzing the request hotspot problem. In one embodiment, this data category may include the fragment traffic data category, that is, the request hotspot problem may be analyzed according to fragment traffic data. The relationship between fragments (Regions) and a table is as follows: a table is composed of a plurality of fragments, each fragment stores part of the records of the table, and the records of all fragments together form the table.
In the embodiments of the present application, the target part data among all the data required for analyzing the request hotspot problem may be acquired by collecting the traffic data of the fragments with higher traffic, that is, by head fragment traffic collection, so as to obtain the data to be processed corresponding to the request hotspot problem.
For example, head fragment traffic collection may be performed periodically. On this basis, step 21 may specifically include: periodically acquiring, from the fragment traffic data generated by the operation of the database system, the fragment traffic data of the top-ranked fragments by traffic in each of a plurality of tables meeting a preset requirement, as the data to be processed corresponding to the request hotspot problem, wherein the plurality of tables meeting the preset requirement include the top-ranked tables by traffic. The traffic of a table may specifically be the request volume of read and write requests for the table, and the traffic of a fragment may specifically be the request volume of read and write requests for the fragment.
It should be noted that, when data collection is performed by computer device X in Fig. 1, step 21 may include, for example: periodically collecting the fragment traffic data of the top-ranked fragments by traffic in each of the plurality of tables meeting the preset requirement, as the data to be processed corresponding to the request hotspot problem.
Optionally, head fragment traffic may be collected by means of snapshots, that is, a head snapshot may be used to collect the part of the fragment traffic data required for analyzing the request hotspot problem, so as to obtain the data to be processed corresponding to the request hotspot problem.
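Periodic snapshot collection could be organized along the following lines. This is a minimal sketch under stated assumptions: the function `collect_snapshots`, the injectable `sleep` and `clock` parameters, and the bounded history (`history`) are hypothetical choices that the source does not specify, and `take_snapshot` stands in for whatever routine produces one head snapshot.

```python
import time
from collections import deque
from typing import Callable, Deque, Dict, Tuple

# Hypothetical snapshot layout: snapshot[table][fragment] = request count.
Snapshot = Dict[str, Dict[str, int]]


def collect_snapshots(take_snapshot: Callable[[], Snapshot],
                      interval_s: float,
                      rounds: int,
                      history: int = 8,
                      sleep: Callable[[float], None] = time.sleep,
                      clock: Callable[[], float] = time.time
                      ) -> Deque[Tuple[float, Snapshot]]:
    """Take `rounds` snapshots, one per interval, keeping only the newest `history`."""
    kept: Deque[Tuple[float, Snapshot]] = deque(maxlen=history)
    for i in range(rounds):
        # Record the snapshot together with the time at which it was taken.
        kept.append((clock(), take_snapshot()))
        if i < rounds - 1:
            sleep(interval_s)  # wait for the next collection period
    return kept
```

Passing `sleep` and `clock` as parameters is only a convenience for testing the loop without waiting; a deployment would more likely rely on an existing scheduler.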
In one embodiment, a single snapshot may record the fragment traffic data of the top $FTN_{region}$ fragments by traffic in each of the top $FTN_{table}$ tables by traffic. In the following, $T_i$ denotes table $i$, and $R_{i,j}$ denotes the $j$-th fragment of table $T_i$.
Example 1: assume that the database system contains tables $T_1, T_2, T_3, T_4, \ldots, T_m$, that table $T_i$ comprises fragments $R_{i,1}, R_{i,2}, R_{i,3}, \ldots, R_{i,n}$, that the table traffic relationship at time $t_k$ is $T_1 > T_2 > T_3 > T_4 > \ldots > T_m$, and that the fragment traffic relationship of table $T_i$ at time $t_k$ is $R_{i,1} > R_{i,2} > R_{i,3} > \ldots > R_{i,n}$. A head snapshot may then be taken at time $t_k$ that records the fragment traffic data of the top $FTN_{region}$ fragments by traffic in each of the top $FTN_{table}$ tables by traffic, where $FTN_{table}$ is less than $m$ and $FTN_{region}$ is less than $n$. The data to be processed corresponding to the request hotspot problem, acquired by taking the head snapshot at time $t_k$, may include:
……
Further, assume that FTN_table equals p and FTN_region equals q, where p is less than m and q is less than n. Then, as shown in FIG. 3, the snapshot at time t_k may record, for the gray-filled tables T_1 to T_p among tables T_1 to T_m, the fragment flow data of the gray-filled fragments R_{1,1} to R_{1,q}, the fragment flow data of the gray-filled fragments R_{2,1} to R_{2,q}, ……, and the fragment flow data of the gray-filled fragments R_{p,1} to R_{p,q}. Accordingly, the snapshot records neither the fragment flow data of the white-filled fragments R_{1,q+1} to R_{1,n}, R_{2,q+1} to R_{2,n}, ……, and R_{p,q+1} to R_{p,n}, nor the fragment flow data of the fragments in the white-filled tables T_{p+1} to T_m. In addition, the t_1 snapshot, t_2 snapshot, t_3 snapshot, …… in FIG. 4 can be understood as snapshots taken before time t_k. The snapshot at time t_k may record data, for example, in the manner shown in Table 1 below.
TABLE 1
| Table | 1st fragment | 2nd fragment | …… | q-th fragment |
| --- | --- | --- | --- | --- |
| T_1 | Fragment flow data of R_{1,1} | Fragment flow data of R_{1,2} | …… | Fragment flow data of R_{1,q} |
| T_2 | Fragment flow data of R_{2,1} | Fragment flow data of R_{2,2} | …… | Fragment flow data of R_{2,q} |
| …… | …… | …… | …… | …… |
| T_p | Fragment flow data of R_{p,1} | Fragment flow data of R_{p,2} | …… | Fragment flow data of R_{p,q} |
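The head snapshot of Example 1 can be sketched as follows. This is only an illustrative sketch: the function name `head_snapshot`, the nested-dictionary input shape, and the choice to compute a table's flow as the sum of its fragment flows are assumptions made here, not details given in the text.

```python
def head_snapshot(fragment_flows, ftn_table, ftn_region):
    """Keep the top `ftn_region` fragments of the top `ftn_table` tables.

    fragment_flows: {table_name: {fragment_name: flow}} (hypothetical shape).
    """
    # Assumption: a table's flow is the sum of the flows of its fragments.
    table_flow = {t: sum(frs.values()) for t, frs in fragment_flows.items()}
    top_tables = sorted(table_flow, key=table_flow.get, reverse=True)[:ftn_table]

    snapshot = {}
    for table in top_tables:
        frs = fragment_flows[table]
        top_fragments = sorted(frs, key=frs.get, reverse=True)[:ftn_region]
        # Only head fragments are stored; the tail is deliberately dropped.
        snapshot[table] = {f: frs[f] for f in top_fragments}
    return snapshot
```

With FTN_table = p and FTN_region = q, the returned structure has the same p-by-q shape as Table 1, so storage per snapshot is bounded by p × q entries regardless of m and n.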
Optionally, the plurality of tables meeting the preset requirements may further include the tables corresponding to the top-ranked fragments by flow in each data node of the database system. The distribution of request traffic may be unreasonable: although the table flow of a certain table does not rank in the top FTN_table, access to a certain fragment of that table may be severely skewed, producing a fragment hotspot that cannot be identified through the top FTN_table tables. By having the tables meeting the preset requirements further include the tables corresponding to the top-ranked fragments by flow in each data node, the case where a table's flow is not in the top FTN_table but its fragment access is severely skewed can be identified, which helps improve accuracy.
It may be understood that a table corresponding to a top-ranked fragment in a data node may or may not duplicate a table that has already been selected; for a duplicated table, the fragment flow data of its top-ranked fragments only needs to be acquired once.
For example, on the basis of Example 1, assume that the r-th storage node H_r in the database system holds fragments R_{a,m}, R_{b,n}, R_{t,p}, ……, R_{u,q}, and that the fragment flow relationship at time t_k is R_{a,m} > R_{b,n} > R_{t,p} > …… > R_{u,q}. The t_k snapshot may then also record, for the tables T_a, T_b, T_t corresponding to, e.g., the 3 fragments with the highest fragment flow in node H_r, the fragment flow data of the top FTN_region fragments by flow in each of these tables. Thus, the data to be processed corresponding to the request hotspot problem, acquired by the head snapshot at time t_k, may include:
……
……
……
It will be appreciated that, in this example, T_a, T_b, T_c do not duplicate the tables from T_1 onward that were already selected by table flow rank.
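The per-node extension can be sketched as follows, as a rough illustration only. The function name `select_tables` and the input shape, which keys each node's fragments by a `(table, fragment)` pair, are hypothetical; the text does not prescribe a data structure.

```python
def select_tables(top_tables_by_flow, node_fragment_flows, per_node_top):
    """Add the tables that own each node's hottest fragments to the table list.

    top_tables_by_flow: tables already selected by table flow rank.
    node_fragment_flows: {node: {(table, fragment): flow}} (hypothetical shape).
    """
    selected = list(top_tables_by_flow)
    for fragments in node_fragment_flows.values():
        hottest = sorted(fragments, key=fragments.get, reverse=True)[:per_node_top]
        for table, _fragment in hottest:
            # A table that is already selected is recorded only once.
            if table not in selected:
                selected.append(table)
    return selected
```

A table with low overall flow but one severely skewed fragment enters the list through the node view, even though it would be missed by table flow rank alone.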
In this embodiment of the present application, after obtaining data to be processed required for analyzing a request hotspot problem from data generated by operation of a database system, the data to be processed corresponding to the request hotspot problem may be processed according to data analysis logic corresponding to the request hotspot problem, so as to determine that a suspected object of the request hotspot problem exists in the database system.
The processing of the data to be processed corresponding to the request hotspot problem may be implemented by performing an equalization analysis on the flow data of the table fragments, that is, the data analysis logic corresponding to the request hotspot problem may be specifically an analysis logic for determining the suspected object by performing an equalization analysis on the flow data of the table fragments. In one embodiment, step 22 may specifically include: according to the acquired fragment flow data of a plurality of fragments with top flow ranks in each table, carrying out balance degree analysis on the fragment flow data in each table, and determining the fragment flow balance degree of each table; and determining a suspected hot spot list with the request hot spot problem and corresponding suspected hot spot fragments according to the fragment flow balance of each list. The specific mode of the equalization analysis is not limited in this application.
For example, for the fragment flow balance of each table, if the fragment flow balance of a certain table is smaller than the balance threshold, the table may be determined to be a suspected hot spot table, and further at least one fragment with the top fragment flow heat rank in the table may be determined to be a corresponding suspected hot spot fragment.
For example, assume that in table T_2 the fragment flow data of fragment R_{2,1} is 1000, that of fragment R_{2,2} is 980, and that of each remaining fragment is within 10. By analyzing the flow balance of the fragments in T_2, it can be determined that the flow is concentrated on R_{2,1} and R_{2,2}, so table T_2 can be determined to be a suspected hotspot table, with fragments R_{2,1} and R_{2,2} of T_2 as the corresponding suspected hotspot fragments.
As another example, assume that in table T_3 the fragment flow data of fragment R_{3,1} is 1000 and that of each remaining fragment is less than 10. By analyzing the flow balance of the fragments in T_3, it can be determined that the flow is concentrated on R_{3,1}, so table T_3 can be determined to be a suspected hotspot table, with fragment R_{3,1} of T_3 as the corresponding suspected hotspot fragment.
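Since the specific balance analysis is left open, the sketch below shows just one possible choice and should not be read as the method of this application. It assumes a balance measure equal to the mean fragment flow divided by the peak fragment flow (1.0 means perfectly even), and treats fragments at or above a fraction of the peak as suspected hotspot fragments. The function name, the measure, and both default thresholds are hypothetical.

```python
def analyze_balance(fragment_flows, balance_threshold=0.5, hot_ratio=0.5):
    """Return (balance, suspected_hot_fragments) for one table.

    fragment_flows: {fragment_name: flow} for the recorded head fragments.
    """
    flows = list(fragment_flows.values())
    if not flows or max(flows) == 0:
        return 1.0, []  # nothing recorded or no traffic: treat as balanced
    peak = max(flows)
    # Assumed measure: mean flow over peak flow, in the range (0, 1].
    balance = (sum(flows) / len(flows)) / peak
    if balance >= balance_threshold:
        return balance, []
    hot = [name for name, flow in fragment_flows.items() if flow >= hot_ratio * peak]
    return balance, hot
```

Applied to the T_2 example, with two fragments near 1000 and the rest within 10, the measure falls well below the threshold and both heavy fragments are returned.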
By the method, the suspected hot spot list and the suspected hot spot fragments in the list can be determined, meanwhile, the complexity of data acquisition and the storage amount of data can be effectively reduced, and the resource cost of hot spot calculation and identification is reduced.
In the embodiment of the present application, the data category corresponding to the abnormal flow growth problem specifically refers to the data category required for analyzing the abnormal flow growth problem. In one embodiment, the data categories corresponding to the abnormal flow growth problem may include a system load category and a table flow data category; that is, the abnormal flow growth problem may be analyzed according to the system load and the table flow. The load may be, for example, the CPU load.
In the embodiment of the present application, the partial table flow data required for analyzing the abnormal flow growth problem may be obtained by a method of obtaining the flow data of the table with higher table flow, that is, a header table flow obtaining method, so as to obtain the data to be processed corresponding to the abnormal flow growth problem.
For example, the header table flow may be periodically acquired, based on which step 21 may specifically include: and periodically acquiring the system load of the database system and the table flow data of each table in a plurality of tables with top flow ranks from the table flow data and the system load generated by the operation of the database system, and taking the table flow data as data to be processed corresponding to the flow abnormal growth problem. The flow of the table may specifically be a request amount of a read-write request for the table.
It should be noted that, in the case of data collection by the computer device X in fig. 1, the step 21 may include, for example: and periodically collecting the table flow data of each table in a plurality of tables with top flow ranks from the table flow data and the system load generated by the operation of the database system, and taking the table flow data as the data to be processed corresponding to the abnormal flow growth problem.
Optionally, the head table traffic can be collected in a snapshot manner, that is, part of table traffic data required for analyzing the traffic abnormal growth problem can be collected in a head snapshot manner, so as to obtain the data to be processed corresponding to the traffic abnormal growth problem.
In one embodiment, a single snapshot may record the system load and the table flow data of a plurality of top-ranked tables by flow. Below, T_i denotes the i-th table and L_k denotes the system load of the database system at time t_k.
Example 2: assume that the database system contains tables T_1, T_2, T_3, ……, T_p, ……, T_q, ……, T_m. If the table flow relationship at time t_1 is T_1 > T_2 > T_3 > …… > T_p > …… > T_q > …… > T_m, then, as shown in FIG. 4, the t_1 snapshot may record the table flow data of T_1, T_2, T_3, ……, T_p, ……, and T_q among tables T_1 to T_m, together with the system load L_1 at time t_1. If the table flow relationships at times t_2 and t_3 are also T_1 > T_2 > T_3 > …… > T_p > …… > T_q > …… > T_m, then, as shown in FIG. 4, the t_2 snapshot may record the table flow data of T_1, T_2, T_3, ……, T_p, ……, and T_q together with the system load L_2 at time t_2, and the t_3 snapshot may likewise record the table flow data of T_1, T_2, T_3, ……, T_p, ……, and T_q together with the system load L_3 at time t_3. If the table flow relationship at time t_k is T_p > T_1 > T_2 > …… > T_k > …… > T_q > …… > T_m, then, as shown in FIG. 4, the t_k snapshot may record the table flow data of T_p, T_1, T_2, ……, T_k, ……, and T_q among tables T_1 to T_m, together with the system load L_k at time t_k.
In this embodiment of the present application, after obtaining data to be processed required for analyzing a traffic abnormal growth problem from data generated by operation of a database system, the data to be processed corresponding to the traffic abnormal growth problem may be processed according to data analysis logic corresponding to the traffic abnormal growth problem, so as to determine a suspected object having the traffic abnormal growth problem in the database system.
The data to be processed corresponding to the abnormal flow growth problem may be processed by using a system load anomaly to trigger further analysis of the table flow; that is, the data analysis logic corresponding to the abnormal flow growth problem may specifically be analysis logic in which a system load anomaly triggers further analysis of the table flow. In one embodiment, a safe load value (Safety Load Value) of the database system may be defined to describe the safe load watermark threshold of the database system, which may serve as the load threshold below. Step 22 may specifically include: determining whether the system load acquired each time is greater than a load threshold, and, when the system load acquired this time is greater than the load threshold, determining a suspected abnormal growth table having the abnormal flow growth problem according to the table flow of each of the plurality of top-ranked tables by flow acquired this time and acquired multiple times before.
For example, as shown in FIG. 4, assume that the system load L_k at time t_k is greater than the load threshold. A suspected abnormal growth table may then be determined based on the table flows in the t_k snapshot and the table flows in the snapshots taken before time t_k (e.g., from time t_1 to time t_{k-1}). Specifically, for each table T_i (i equal to 1, 2, 3, ……, p, ……, q), the table flow of T_i from time t_1 to time t_k may be analyzed to determine whether the table flow of T_i has grown unexpectedly. In FIG. 4, by analyzing (T_{p,1}, T_{p,2}, T_{p,3}, …, T_{p,n}), the gray-filled table T_p in FIG. 4 can be determined to be a suspected abnormal growth table having the abnormal flow growth problem, where T_{p,1} represents the table flow of T_p at time t_1, T_{p,2} represents the table flow of T_p at time t_2, T_{p,3} represents the table flow of T_p at time t_3, and T_{p,n} represents the table flow of T_p at time t_n.
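The load-triggered analysis can be sketched as follows. The text does not define what counts as unexpected growth, so the criterion used here, namely that the latest flow exceeds a multiple of the table's average flow in earlier snapshots, is purely an assumption, as are the function name, the `(load, {table: flow})` snapshot shape, and the `growth_factor` parameter.

```python
def find_abnormal_growth(snapshots, load_threshold, growth_factor=2.0):
    """Return tables whose flow grew unexpectedly, if the latest load is too high.

    snapshots: list of (system_load, {table_name: flow}), oldest first.
    """
    latest_load, latest_flows = snapshots[-1]
    # The table flow analysis is triggered only by an abnormal system load.
    if latest_load <= load_threshold:
        return []

    history = snapshots[:-1]
    suspects = []
    for table, flow in latest_flows.items():
        past = [flows[table] for _, flows in history if table in flows]
        # Assumption: a table with no history has a baseline of 0, so any
        # positive flow from a newcomer to the head is flagged.
        baseline = sum(past) / len(past) if past else 0.0
        if flow > growth_factor * baseline:
            suspects.append(table)
    return suspects
```

Because nothing is computed while the load stays under the threshold, the per-period cost is a single comparison in the normal case.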
By the method, the table with abnormal flow growth can be rapidly determined by triggering further analysis of the table flow through system load feedback of the database system on the basis of less calculation.
In the embodiment of the present application, the data category corresponding to the large query request problem specifically refers to the data category required for analyzing the large query request problem. In one embodiment, the number of underlying data blocks scanned by a single query request may be recorded in a log, and the data category corresponding to the large query request problem may include a log data category; that is, the large query request problem may be analyzed according to log data.
In the embodiment of the application, the problem of the large query request is considered to exist continuously if the problem exists, so that the target part data in all data required by analyzing the problem of the large query request can be obtained by obtaining the query records with the number of scanning blocks larger than the number threshold value in the last period of time.
For example, the acquisition of the query record with the number of scanning blocks greater than the number threshold in the last period of time may be performed periodically, based on which step 21 may specifically include: and periodically acquiring target query records with the number of scanning blocks larger than a number threshold value in the last period of time from log data generated by the operation of the database system, and taking the target query records as data to be processed corresponding to the large query request problem.
It should be noted that, in the case of data collection by the computer device X in fig. 1, the step 21 may include, for example: and periodically collecting target query records with the number of scanning blocks larger than a number threshold value in the last period of time from log data generated by the operation of the database system, and taking the target query records as data to be processed corresponding to the large query request problem.
For example, as shown in FIG. 5, the log may record log entries (log records) from time t_1 to the current time; the gray-filled log entries in FIG. 5 are those in the most recent period (i.e., from time t_m to the current time).
It is understood that a query request whose number of scanned data blocks is greater than the number threshold may be considered a query request that scans a larger number of data blocks. The target query records filtered from the log entries from time t_m to the current time shown in FIG. 5 may be, for example:
t_m big call who has read xxx blocks, table=T_1 dml=select * from T_1 where xxx
……
t_n big call who has read xxx blocks, table=T_1 dml=select * from T_1 where xxx
……
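The selection of target query records in the most recent period can be sketched as below. The sketch assumes that each log entry has already been parsed into a dictionary with hypothetical `time` and `blocks` fields; the actual log format and parsing are not specified here.

```python
def recent_big_queries(log_entries, now, window, block_threshold):
    """Keep entries from the last `window` time units that scanned many blocks.

    log_entries: iterable of dicts with "time" and "blocks" keys (hypothetical).
    """
    start = now - window
    return [
        entry
        for entry in log_entries
        # Only the most recent period is read, and only entries whose number
        # of scanned blocks is strictly greater than the threshold are kept.
        if start <= entry["time"] <= now and entry["blocks"] > block_threshold
    ]
```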
Let Cu be the number, and Su the rate, of query requests on table T_x that scan a larger number of data blocks in the time period [t_1, t_n) in FIG. 5, and let Cv and Sv be the corresponding number and rate in the time period [t_m, t_n). In a scenario with stable flow, Cu > Cv while Su ≈ Sv. This analysis shows that, as time goes by, the cumulative number of query requests that scan a larger number of data blocks keeps increasing, while their rate remains basically unchanged in a stable-flow scenario. Where query requests that scan a larger number of data blocks persist, the target query records in the last period of time in the log data can therefore serve as the data to be processed required for analyzing the large query request problem.
In this embodiment of the present application, after obtaining data to be processed required for analyzing a big query request problem from data generated by operation of a database system, the data to be processed corresponding to the big query request problem may be processed according to data analysis logic corresponding to the big query request problem, so as to determine a suspected object of the big query request problem in the database system. In one embodiment, step 22 may specifically include: and determining a suspected large query request table and a corresponding large query request statement of the large query request problem in the database system according to the target query record in the last period of time.
Optionally, the suspected object with the problem of the large query request can be determined according to the number of times and the rate of the query requests with a large number of scanned data blocks in the target query record. In one embodiment, the determining, according to the target query record in the last period of time, a suspected large query request table and a corresponding large query request statement in the database system, where the large query request problem exists, may specifically include: calculating the query request times and the query request rate of each table related to the target query record in the last period of time; and determining a table with the number of inquiry requests larger than a number threshold and the inquiry request rate larger than a rate threshold as a suspected large inquiry request table with the large inquiry request problem, and taking at least one inquiry statement with the top rank of the number of scanning data blocks in inquiry statements corresponding to the suspected large inquiry request table in the target inquiry record as a large inquiry request statement corresponding to the suspected large inquiry request table.
For example, the number of query requests and the query request rate of T_1 in FIG. 5 may be calculated, and the calculated number of query requests and query request rate of T_1, together with the query statement corresponding to T_1, may be recorded in the row corresponding to T_1 in FIG. 5. Likewise, the number of query requests and the query request rate of T_2 may be calculated and recorded, together with the query statement corresponding to T_2, in the row corresponding to T_2 in FIG. 5; ……; and the number of query requests and the query request rate of T_p may be calculated and recorded, together with the large query statement corresponding to T_p, in the row corresponding to T_p in FIG. 5. Assuming that the numbers of query requests of T_1 and T_2 are both greater than the number threshold and that their query request rates are both greater than the rate threshold, T_1 and T_2 may be determined to be suspected large query request tables, represented by gray fill in FIG. 5.
By the method, lightweight analysis of a small amount of log data suffices to determine the large query request table having the large query request problem and the corresponding large query request statement, without relying on heavyweight log collection, cleaning, and storage services, thereby achieving lightweight processing.
In this embodiment of the present application, after the suspected object having the abnormal problem in the database system is determined, the suspected object may further be labeled, so that a client can intuitively identify it.
According to the data processing method provided by this embodiment, at least one abnormal problem encountered on the client side is predefined from the perspective of the client side of the database, together with the data category and the data analysis logic corresponding to the abnormal problem. Target part data required for analyzing the corresponding abnormal problem is acquired, according to the data category, from the data generated by the operation of the database system as the data to be processed, and the data to be processed is processed according to the data analysis logic to determine the suspected object in the database system that has the abnormal problem. Lightweight data acquisition and lightweight data processing are thereby achieved, which saves computing resources.
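The per-table count and rate check can be sketched as follows. The record fields (`table`, `blocks`, `dml`), the function name, and the definition of rate as count divided by the window length are assumptions made for illustration; the text only requires that both the number and the rate exceed their thresholds.

```python
from collections import defaultdict


def find_big_query_tables(records, window_seconds, count_threshold,
                          rate_threshold, top_statements=1):
    """Map each suspected large query request table to its heaviest statements.

    records: target query records from the last period, as dicts with
    "table", "blocks" and "dml" keys (hypothetical field names).
    """
    by_table = defaultdict(list)
    for record in records:
        by_table[record["table"]].append(record)

    suspects = {}
    for table, recs in by_table.items():
        count = len(recs)
        rate = count / window_seconds  # assumed definition: requests per second
        if count > count_threshold and rate > rate_threshold:
            # Report the statements that scanned the most data blocks.
            ranked = sorted(recs, key=lambda r: r["blocks"], reverse=True)
            suspects[table] = [r["dml"] for r in ranked[:top_statements]]
    return suspects
```

Only the filtered records of the last period are held in memory, which matches the lightweight intent of the approach.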
FIG. 6 is a schematic diagram of a data processing apparatus according to an embodiment of the present disclosure; referring to fig. 6, this embodiment provides a data processing apparatus, which may perform the method described in the foregoing method embodiment, and specifically, the apparatus may include:
The obtaining module 61 is configured to obtain, for at least one predefined abnormal problem, target part data in all data required for analyzing the corresponding abnormal problem from data generated by the database system operation according to a data category corresponding to the abnormal problem, as data to be processed; the target part data can be used for determining suspected objects which have the abnormal problems and have a large influence on the operation of the database system, and other part data in the whole data can be used for determining suspected objects which have the abnormal problems and have a small influence on the operation of the database system;
and the processing module 62 is configured to process the corresponding data to be processed according to the data analysis logic corresponding to the abnormal problem, so as to determine that a suspected object of the abnormal problem exists in the database system.
Optionally, the at least one anomaly issue includes one or more of: request hot spot problems, traffic anomaly growth problems, or large query request problems.
Optionally, the acquiring module 61 is specifically configured to: and periodically acquiring the fragment flow data of a plurality of fragments with top flow ranks in each of a plurality of tables meeting preset requirements from fragment flow data generated by the operation of the database system, wherein the plurality of tables meeting the preset requirements comprise the plurality of tables with top flow ranks as data to be processed corresponding to the request hot spot problem.
Optionally, the processing module 62 is specifically configured to: according to the acquired fragment flow data of a plurality of fragments with top flow ranks in each table, carrying out balance degree analysis on the fragment flow in each table, and determining the fragment flow balance degree of each table; and determining a suspected hot spot list with the request hot spot problem and corresponding suspected hot spot fragments according to the fragment flow balance of each list.
Optionally, the acquiring module 61 is specifically configured to: and periodically acquiring the system load of the database system and the table flow data of each table in a plurality of tables with top flow ranks from the table flow data and the system load generated by the operation of the database system, and taking the table flow data as data to be processed corresponding to the flow abnormal growth problem.
Optionally, the processing module 62 is specifically configured to: determine whether the system load acquired each time is greater than a load threshold, and, when the system load acquired this time is greater than the load threshold, determine a suspected abnormal growth table having the abnormal flow growth problem according to the table flow of each of the plurality of top-ranked tables by flow acquired this time and acquired multiple times before.
Optionally, the acquiring module 61 is specifically configured to: and periodically acquiring target query records with the number of scanning blocks larger than a number threshold value in the last period of time from log data generated by the operation of the database system, and taking the target query records as data to be processed corresponding to the large query request problem.
Optionally, the processing module 62 is specifically configured to: and determining a suspected large query request table and a corresponding suspected large query request statement of the large query request problem in the database system according to the target query record in the last period of time.
Optionally, the processing module 62 is configured to determine, according to the target query record in the last period of time, a suspected large query request table and a corresponding suspected large query request statement in the database system, where the large query request problem exists, specifically including: calculating the query request times and the query request rate of each table related to the target query record in the last period of time; and determining a table with the number of inquiry requests larger than a number threshold and the inquiry request rate larger than a rate threshold as a suspected large inquiry request table with the large inquiry request problem, and determining at least one inquiry statement with the top rank of the number of scanning data blocks in inquiry statements corresponding to the suspected large inquiry request table in the target inquiry record as a large inquiry request statement corresponding to the suspected large inquiry request table.
Optionally, the processing module 62 is further configured to label suspected objects that have the abnormal problem.
The apparatus shown in fig. 6 may perform the method of the embodiment shown in fig. 2, and reference is made to the relevant description of the embodiment shown in fig. 2 for parts of this embodiment not described in detail. The implementation process and the technical effect of this technical solution are described in the embodiment shown in fig. 2, and are not described herein.
In one possible implementation, the arrangement of the apparatus shown in fig. 6 may be implemented as a computer device. As shown in fig. 7, the computer device may include: a processor 71 and a memory 72. Wherein the memory 72 is for storing a program for supporting a computer device to perform the method provided in the embodiment shown in fig. 2 described above, the processor 71 is configured for executing the program stored in the memory 72.
The program comprises one or more computer instructions, wherein the one or more computer instructions, when executed by the processor 71, are capable of performing the steps of:
aiming at least one predefined abnormal problem, according to the data category corresponding to the abnormal problem, acquiring target part data in all data required by analyzing the corresponding abnormal problem from data generated by the operation of a database system as data to be processed; the target part data can be used for determining suspected objects which have the abnormal problems and have a large influence on the operation of the database system, and other part data in the whole data can be used for determining suspected objects which have the abnormal problems and have a small influence on the operation of the database system;
And processing the corresponding data to be processed according to the data analysis logic corresponding to the abnormal problem to determine the suspected object with the abnormal problem in the database system.
Optionally, the processor 71 is further configured to perform all or part of the steps in the embodiment shown in fig. 2 described above.
The computer device may also include a communication interface 73 in its structure for communicating with other devices or communication networks.
The embodiment of the present application further provides a data processing system 12 as shown in fig. 1, where, for at least one predefined abnormal problem, an acquisition module 121 is configured to acquire, from data generated by running a database system according to a data category corresponding to the abnormal problem, target portion data in all data required for analyzing the corresponding abnormal problem, as data to be processed; the target part data can be used for determining suspected objects which have the abnormal problems and have a large influence on the operation of the database system, and other part data in the whole data can be used for determining suspected objects which have the abnormal problems and have a small influence on the operation of the database system; the processing module 122 is configured to process the corresponding data to be processed according to the data analysis logic corresponding to the abnormal problem, so as to determine that a suspected object of the abnormal problem exists in the database system.
Optionally, the collecting module 121 is specifically configured to: and periodically collecting the fragment flow data of a plurality of fragments with top flow ranks in each of a plurality of tables meeting preset requirements from fragment flow data generated by the operation of the database system, wherein the plurality of tables meeting the preset requirements comprise the plurality of tables with top flow ranks as data to be processed corresponding to the request hot spot problem.
Optionally, the processing module 122 is specifically configured to: according to the collected fragment flow data of a plurality of fragments with top flow ranks in each table, carrying out balance degree analysis on the fragment flow in each table, and determining the fragment flow balance degree of each table; and determining a suspected hot spot list with the request hot spot problem and corresponding suspected hot spot fragments according to the fragment flow balance of each list.
Optionally, the collecting module 121 is specifically configured to: and periodically collecting the table flow data of each table in a plurality of tables with top flow ranks from the table flow data and the system load generated by the operation of the database system, and taking the table flow data as the data to be processed corresponding to the abnormal flow growth problem.
Optionally, the processing module 122 is specifically configured to: and determining whether the system load acquired each time is greater than a load threshold value, and determining a suspected abnormal growth table with the abnormal flow growth problem according to the table flow of each table in a plurality of tables with the top ranking of the flow acquired each time and the flow acquired multiple times before when the system load acquired each time is greater than the load threshold value.
Optionally, the collecting module 121 is specifically configured to: periodically collect, from log data generated by the operation of the database system, target query records from the most recent period of time whose number of scanned data blocks is greater than a number threshold, and take the target query records as the data to be processed corresponding to the large query request problem.
Optionally, the processing module 122 is specifically configured to: determine, according to the target query records from the most recent period of time, a suspected large query request table having the large query request problem in the database system and the corresponding suspected large query request statement.
Optionally, when determining, according to the target query records from the most recent period of time, the suspected large query request table having the large query request problem in the database system and the corresponding suspected large query request statement, the processing module 122 is specifically configured to: calculate the number of query requests and the query request rate of each table involved in the target query records from the most recent period of time; determine a table whose number of query requests is greater than a count threshold and whose query request rate is greater than a rate threshold as a suspected large query request table having the large query request problem; and determine, among the query statements in the target query records that correspond to the suspected large query request table, at least one query statement ranked top by number of scanned data blocks as the large query request statement corresponding to the suspected large query request table.
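As a non-limiting illustration of the large query analysis described above, the following Python sketch counts the target query records per table, derives a request rate over the collection window, and returns the statements with the most scanned data blocks for each table that exceeds both thresholds. The record layout `(table, statement, scanned_blocks)`, the definition of the rate as requests per second over the window, and the function name `find_large_query_tables` are assumptions made for illustration.

```python
from collections import defaultdict


def find_large_query_tables(records, window_seconds,
                            count_threshold, rate_threshold, top_n=1):
    """Return {suspected table: [top statements by scanned blocks]}.

    records: iterable of (table, statement, scanned_blocks) tuples that
    were already filtered to queries scanning more blocks than the
    collection threshold (hypothetical layout).
    """
    by_table = defaultdict(list)
    for table, statement, scanned_blocks in records:
        by_table[table].append((scanned_blocks, statement))

    suspects = {}
    for table, entries in by_table.items():
        count = len(entries)
        rate = count / window_seconds  # assumed: requests per second
        if count > count_threshold and rate > rate_threshold:
            # Rank this table's statements by scanned blocks, largest first.
            ranked = sorted(entries, reverse=True)[:top_n]
            suspects[table] = [statement for _, statement in ranked]
    return suspects
```

Requiring both a count and a rate above their thresholds keeps an occasional expensive query from being reported as a sustained large query request problem.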
Optionally, the processing module 122 is further configured to label suspected objects that have the abnormal problem.
In addition, an embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored; when executed, the computer program implements the method described in the method embodiment shown in FIG. 2.
Embodiments of the present application also provide a computer program product comprising computer program instructions which, when executed by a processor, implement a method as described in the method embodiment shown in fig. 2.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of a necessary general-purpose hardware platform, or by a combination of hardware and software. Based on this understanding, the foregoing technical solutions, in essence or in the portions that contribute to the prior art, may be embodied in the form of a computer program product, which may be implemented on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be replaced by equivalents; such modifications and substitutions do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present application.

Claims (13)

1. A data processing method, comprising:
for at least one predefined abnormal problem, acquiring, according to the data category corresponding to the abnormal problem and from data generated by the operation of a database system, target part data among all the data required for analyzing the corresponding abnormal problem, as data to be processed; wherein the abnormal problem is a problem, defined from the perspective of a client using the database system, that the flow of the client accessing the database system is abnormal; the target part data can be used to determine suspected objects that have the abnormal problem and have a large influence on the operation of the database system, and the other part data among all the data can be used to determine suspected objects that have the abnormal problem and have a small influence on the operation of the database system;
and processing the corresponding data to be processed according to the data analysis logic corresponding to the abnormal problem, to determine the suspected object having the abnormal problem in the database system.
2. The method of claim 1, wherein the at least one abnormal problem comprises one or more of: a request hot spot problem, an abnormal flow growth problem, or a large query request problem.
3. The method according to claim 2, wherein the obtaining, from the data generated by the database system operation according to the data category corresponding to the abnormal problem, the target portion data in all the data required for analyzing the corresponding abnormal problem, as the data to be processed, includes:
periodically acquiring, from fragment flow data generated by the operation of the database system, the fragment flow data of a plurality of fragments with top flow ranks in each of a plurality of tables meeting preset requirements, as the data to be processed corresponding to the request hot spot problem, wherein the plurality of tables meeting the preset requirements comprise a plurality of tables with top flow ranks.
4. The method according to claim 3, wherein the processing, according to the data analysis logic corresponding to the abnormal problem, the corresponding data to be processed to determine the suspected object of the abnormal problem in the database system includes:
performing balance degree analysis on the fragment flow in each table according to the acquired fragment flow data of the plurality of fragments with top flow ranks in each table, and determining the fragment flow balance degree of each table;
and determining, according to the fragment flow balance degree of each table, a suspected hot spot table having the request hot spot problem and the corresponding suspected hot spot fragments.
5. The method according to claim 2, wherein the obtaining, from the data generated by the database system operation according to the data category corresponding to the abnormal problem, the target portion data in all the data required for analyzing the corresponding abnormal problem, as the data to be processed, includes:
periodically acquiring, from the table flow data and the system load generated by the operation of the database system, the system load of the database system and the table flow data of each table in a plurality of tables with top flow ranks, as the data to be processed corresponding to the abnormal flow growth problem.
6. The method according to claim 5, wherein the processing, according to the data analysis logic corresponding to the abnormal problem, the corresponding data to be processed to determine the suspected object of the abnormal problem in the database system includes:
determining whether the system load acquired each time is greater than a load threshold, and, when the system load acquired this time is greater than the load threshold, determining a suspected abnormal growth table having the abnormal flow growth problem according to the table flow of each table in the plurality of tables with top flow ranks acquired this time and the table flow acquired multiple times before.
7. The method according to claim 2, wherein the obtaining, from the data generated by the database system operation according to the data category corresponding to the abnormal problem, the target portion data in all the data required for analyzing the corresponding abnormal problem, as the data to be processed, includes:
periodically acquiring, from log data generated by the operation of the database system, target query records from the most recent period of time whose number of scanned data blocks is greater than a number threshold, as the data to be processed corresponding to the large query request problem.
8. The method of claim 7, wherein the processing, according to the data analysis logic corresponding to the abnormal problem, the corresponding data to be processed to determine a suspected object in the database system that has the abnormal problem includes:
determining, according to the target query records from the most recent period of time, a suspected large query request table having the large query request problem in the database system and the corresponding suspected large query request statement.
9. The method of claim 8, wherein the determining, according to the target query records from the most recent period of time, a suspected large query request table having the large query request problem in the database system and the corresponding suspected large query request statement comprises:
calculating the number of query requests and the query request rate of each table involved in the target query records from the most recent period of time;
and determining a table whose number of query requests is greater than a count threshold and whose query request rate is greater than a rate threshold as a suspected large query request table having the large query request problem, and determining, among the query statements in the target query records that correspond to the suspected large query request table, at least one query statement ranked top by number of scanned data blocks as the large query request statement corresponding to the suspected large query request table.
10. A data processing apparatus comprising:
an acquisition module, configured to, for at least one predefined abnormal problem, acquire, according to the data category corresponding to the abnormal problem and from data generated by the operation of a database system, target part data among all the data required for analyzing the corresponding abnormal problem, as data to be processed; wherein the abnormal problem is a problem, defined from the perspective of a client using the database system, that the flow of the client accessing the database system is abnormal; the target part data can be used to determine suspected objects that have the abnormal problem and have a large influence on the operation of the database system, and the other part data among all the data can be used to determine suspected objects that have the abnormal problem and have a small influence on the operation of the database system;
and a processing module, configured to process the corresponding data to be processed according to the data analysis logic corresponding to the abnormal problem, to determine the suspected object having the abnormal problem in the database system.
11. A data processing system, comprising:
a first computer device, configured to, for at least one predefined abnormal problem, collect, according to the data category corresponding to the abnormal problem and from data generated by the operation of a database system, target part data among all the data required for analyzing the corresponding abnormal problem, as data to be processed; wherein the abnormal problem is a problem, defined from the perspective of a client using the database system, that the flow of the client accessing the database system is abnormal; the target part data can be used to determine suspected objects that have the abnormal problem and have a large influence on the operation of the database system, and the other part data among all the data can be used to determine suspected objects that have the abnormal problem and have a small influence on the operation of the database system;
and a second computer device, configured to process the corresponding data to be processed according to the data analysis logic corresponding to the abnormal problem, to determine the suspected object having the abnormal problem in the database system.
12. A computer device, comprising: a memory, a processor; wherein the memory is configured to store one or more computer instructions which, when executed by the processor, implement the method of any one of claims 1 to 9.
13. A computer readable storage medium having stored thereon a computer program which, when executed, implements the method of any of claims 1 to 9.
CN202110864262.2A 2021-07-29 2021-07-29 Data processing method, apparatus, device, system, storage medium and program product Active CN113608909B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110864262.2A CN113608909B (en) 2021-07-29 2021-07-29 Data processing method, apparatus, device, system, storage medium and program product

Publications (2)

Publication Number Publication Date
CN113608909A (en) 2021-11-05
CN113608909B (en) 2024-02-02

Family

ID=78306006

Country Status (1)

Country Link
CN (1) CN113608909B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6224193B1 (en) * 2016-09-26 2017-11-01 みずほ情報総研株式会社 Test process management system, test process management method, and test process management program
CN109542868A (en) * 2018-09-28 2019-03-29 中国平安人寿保险股份有限公司 Position method, apparatus, electronic equipment and the storage medium of abnormal SQL statement
CN110908974A (en) * 2018-09-14 2020-03-24 阿里巴巴集团控股有限公司 Database management method, device, equipment and storage medium
CN111061588A (en) * 2019-12-13 2020-04-24 北京奇艺世纪科技有限公司 Method and device for locating database abnormal source
WO2021023053A1 (en) * 2019-08-05 2021-02-11 阿里巴巴集团控股有限公司 Data processing method and device, and storage medium
US11012452B1 (en) * 2018-01-09 2021-05-18 NortonLifeLock, Inc. Systems and methods for establishing restricted interfaces for database applications

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11003641B2 (en) * 2017-09-22 2021-05-11 Microsoft Technology Licensing, Llc Automatic database troubleshooting
US11093642B2 (en) * 2019-01-03 2021-08-17 International Business Machines Corporation Push down policy enforcement

Also Published As

Publication number Publication date
CN113608909A (en) 2021-11-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant