CN113342502B - Performance diagnosis method and device for data lake, computer equipment and storage medium

Publication number
CN113342502B
CN113342502B CN202110738024.7A CN202110738024A CN113342502B CN 113342502 B CN113342502 B CN 113342502B CN 202110738024 A CN202110738024 A CN 202110738024A CN 113342502 B CN113342502 B CN 113342502B
Authority
CN
China
Prior art keywords
current
execution
preset time
data lake
time period
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110738024.7A
Other languages
Chinese (zh)
Other versions
CN113342502A (en)
Inventor
邓晟
黄亚兵
陈阳
林开贤
敖毅波
沈志勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Merchants Finance Technology Co Ltd
Original Assignee
China Merchants Finance Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Merchants Finance Technology Co Ltd
Priority to CN202110738024.7A
Publication of CN113342502A
Application granted
Publication of CN113342502B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a method and a device for diagnosing the performance of a data lake, computer equipment and a storage medium, wherein the method comprises the following steps: acquiring task scheduling information data of dates meeting a service level agreement from the task scheduling information data of the previous preset period; determining a delay coefficient of each preset time period in each execution date according to the first start execution time and the first end execution time; calculating a first abnormal value detection threshold and a second abnormal value detection threshold according to the Pauta criterion (3σ rule) and the delay coefficient; acquiring task scheduling information data of a current execution date; calculating a current delay coefficient according to the second start execution time, the second end execution time and the current execution date; and comparing the current performance of the data lake with its historical performance according to the current delay coefficient, the first abnormal value detection threshold and the second abnormal value detection threshold to determine whether the current performance of the data lake meets the service level agreement, so that the efficiency of determining the performance of the data lake is improved.

Description

Performance diagnosis method and device for data lake, computer equipment and storage medium
Technical Field
The invention relates to the technical field of data analysis, in particular to a method and a device for diagnosing performance of a data lake, computer equipment and a storage medium.
Background
An enterprise data lake provides big-data batch computing services for the enterprise, and the batch computing tasks are scheduled periodically by a scheduling system. Periodically scheduled tasks are therefore prone to delay. For example, when the scheduling system submits batch computing tasks to the data lake for execution and the submitted tasks exceed the concurrency limit of the scheduling system, task execution time increases. Degraded performance of the data lake's basic services (such as Hive, YARN, and GaussDB 200), caused for example by memory overflow in system processes, frequent Full GC, unbalanced load, data skew, or growth of business table data, also increases task execution time. In addition, batch computing tasks consume computing resources of the data lake such as CPU, memory, and I/O; when these resources are already a bottleneck and newly launched tasks consume a large amount of them, the resources run short and computing tasks are queued and delayed, which again increases task execution time.
At present, technicians must check the possible causes of a delayed periodically scheduled task one by one, which is inefficient and does not allow the performance of the data lake to be determined quickly.
Disclosure of Invention
The embodiment of the invention provides a method and a device for diagnosing the performance of a data lake, computer equipment and a storage medium, which are used for improving the efficiency of determining the performance of the data lake.
In order to solve the above technical problem, an embodiment of the present application provides a method for diagnosing performance of a data lake, including:
acquiring task scheduling information data of dates meeting a service level agreement from task scheduling information data of a previous preset period as a first reference data set, wherein the preset period is M days, each day is composed of N consecutive preset time periods, and the first reference data set comprises K first task numbers, a first start execution time corresponding to each first task number, a first end execution time corresponding to each first task number and an execution date corresponding to each first task number;
determining a delay coefficient of each preset time period in each execution date according to the first starting execution time and the first ending execution time for each execution date;
calculating a first abnormal value detection threshold of each preset time period and a second abnormal value detection threshold of each preset time period according to the Pauta criterion (3σ rule) and the delay coefficient, wherein the first abnormal value detection threshold is smaller than the second abnormal value detection threshold;
acquiring task scheduling information data of a current execution date in a current preset period as a second reference data set, wherein the second reference data set comprises L second task numbers, a second start execution time corresponding to each second task number, a second end execution time corresponding to each second task number and the current execution date corresponding to each second task number;
calculating to obtain a current delay coefficient of a current preset time period on the current execution date according to the second starting execution time, the second ending execution time and the current execution date;
and comparing the current performance of the data lake in the current preset time period with the historical performance of the data lake in the preset time period according to the current delay coefficient, the first abnormal value detection threshold and the second abnormal value detection threshold, and determining whether the current performance of the data lake in the current preset time period meets the service level agreement.
In order to solve the above technical problem, an embodiment of the present application further provides a performance diagnosis apparatus for a data lake, including:
a first obtaining module, configured to obtain task scheduling information data of a date meeting a service level agreement from task scheduling information data in a last preset period, as a first reference data set, where the preset period is M days and each day is composed of N consecutive preset time periods, and the first reference data set includes K first task numbers, a first start execution time corresponding to each first task number, a first end execution time corresponding to each first task number, and an execution date corresponding to each first task number;
a first determining module, configured to determine, for each of the execution dates, a delay coefficient for each of the preset time periods within each of the execution dates according to the first start execution time and the first end execution time;
a first calculating module, configured to calculate a first abnormal value detection threshold for each preset time period and a second abnormal value detection threshold for each preset time period according to the Pauta criterion (3σ rule) and the delay coefficient, where the first abnormal value detection threshold is smaller than the second abnormal value detection threshold;
a second obtaining module, configured to obtain task scheduling information data of a current execution date in a current preset period as a second reference data set, where the second reference data set includes L second task numbers, a second start execution time corresponding to each second task number, a second end execution time corresponding to each second task number, and the current execution date corresponding to each second task number;
a second calculating module, configured to calculate a current delay coefficient of a current preset time period on the current execution date according to the second start execution time, the second end execution time, and the current execution date;
and a comparison module, configured to compare the current performance of the data lake in the current preset time period with the historical performance of the data lake in the preset time period according to the current delay coefficient, the first abnormal value detection threshold, and the second abnormal value detection threshold, and to determine whether the current performance of the data lake in the current preset time period meets the service level agreement.
In order to solve the technical problem, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored, and the computer program, when executed by a processor, implements the steps of the method for diagnosing performance of a data lake.
According to the performance diagnosis method, the performance diagnosis device, the computer equipment and the storage medium of the data lake, task scheduling information data of dates meeting a service level agreement is obtained from task scheduling information data of the previous preset period and is used as a first reference data set, wherein the preset period is M days, each day is composed of N consecutive preset time periods, and the first reference data set comprises K first task numbers, a first start execution time corresponding to each first task number, a first end execution time corresponding to each first task number and an execution date corresponding to each first task number; for each execution date, a delay coefficient of each preset time period in the execution date is determined according to the first start execution time and the first end execution time; a first abnormal value detection threshold of each preset time period and a second abnormal value detection threshold of each preset time period are calculated according to the Pauta criterion (3σ rule) and the delay coefficient, wherein the first abnormal value detection threshold is smaller than the second abnormal value detection threshold; task scheduling information data of a current execution date in a current preset period is acquired as a second reference data set, wherein the second reference data set comprises L second task numbers, a second start execution time corresponding to each second task number, a second end execution time corresponding to each second task number and the current execution date corresponding to each second task number; a current delay coefficient of a current preset time period on the current execution date is calculated according to the second start execution time, the second end execution time and the current execution date; and, according to the current delay coefficient, the first abnormal value detection threshold of each preset time period and the second abnormal value detection threshold of each preset time period, the current performance of the data lake in the current preset time period is compared with the historical performance of the data lake in the preset time period to determine whether the current performance of the data lake in the current preset time period meets the service level agreement, so that the efficiency of determining the performance of the data lake is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments of the present invention will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without inventive labor.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow chart of one embodiment of a method for diagnosing performance of a data lake of the present application;
FIG. 3 is a schematic block diagram of an embodiment of a data lake performance diagnostic apparatus according to the present application;
FIG. 4 is a block diagram of one embodiment of a computer device according to the present application.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
As shown in FIG. 1, a system architecture 100 may include terminal devices 101, 102, 103, a network 104 and a server 105. Network 104 is the medium used to provide communication links between terminal devices 101, 102, 103 and server 105. Network 104 may include various connection types, such as wired communication links, wireless communication links, or fiber optic cables.
A user may use terminal devices 101, 102, 103 to interact with a server 105 over a network 104 to receive or send messages or the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to a smart phone, a tablet computer, an e-book reader, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a laptop computer, a desktop computer, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the performance diagnosis method for the data lake provided by the embodiment of the present application is executed by a server, and accordingly, the performance diagnosis device for the data lake is disposed in the server.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. Any number of terminal devices, networks and servers may be provided according to implementation needs, and the terminal devices 101, 102 and 103 in this embodiment may specifically correspond to an application system in actual production.
Referring to fig. 2, fig. 2 shows a method for diagnosing performance of a data lake according to an embodiment of the present invention, which is described by taking the method applied to the server in fig. 1 as an example, and is detailed as follows:
S201: Acquire task scheduling information data of dates meeting a service level agreement from the task scheduling information data of the previous preset period as a first reference data set, wherein the preset period is M days, each day is composed of N consecutive preset time periods, and the first reference data set comprises K first task numbers, a first start execution time corresponding to each first task number, a first end execution time corresponding to each first task number and an execution date corresponding to each first task number.
Specifically, M, K, and N are positive integers. The service level agreement is a legally binding contract negotiated and signed between the service provider and the user. In the present application, the content specified in the service level agreement includes, but is not limited to, a daily performance requirement for the data lake, an hourly performance requirement for the data lake, and an hourly task execution speed requirement for the data lake.
The first end execution time is obtained from the time point corresponding to the task completion state of each task number. The task completion state comprises task completion or task failure, and the time point of task completion or of task failure is used as the first end execution time.
For better illustration of the preset period and the preset time period, the preset period and the preset time period are exemplified herein:
for example, the preset period is 30 days and each day is divided into 24 preset time periods, so that each preset time period is 1 hour.
Note in particular that if the task scheduling data of all M days in a preset period of M days meets the service level agreement, the task scheduling data of all M days is used as the first reference data set; if the task scheduling data of 2 of the M days does not meet the service level agreement, the task scheduling data of those 2 days is excluded.
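The selection of the first reference data set in S201 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `TaskRecord` type, the `build_reference_set` function, and the way SLA-compliant dates are supplied (a precomputed set) are all assumptions introduced here.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class TaskRecord:
    """Hypothetical record holding the four fields named in S201."""
    task_id: str
    start: datetime       # first start execution time
    end: datetime         # first end execution time (completion or failure)
    execution_date: date


def build_reference_set(records, sla_met_dates):
    """Keep only the records whose execution date met the SLA."""
    return [r for r in records if r.execution_date in sla_met_dates]


records = [
    TaskRecord("t1", datetime(2021, 5, 1, 7, 5), datetime(2021, 5, 1, 7, 15), date(2021, 5, 1)),
    TaskRecord("t2", datetime(2021, 5, 2, 7, 0), datetime(2021, 5, 2, 9, 30), date(2021, 5, 2)),
    TaskRecord("t3", datetime(2021, 5, 3, 7, 10), datetime(2021, 5, 3, 7, 12), date(2021, 5, 3)),
]
# Assumption: May 2 did not meet the SLA, so its data is excluded.
sla_met_dates = {date(2021, 5, 1), date(2021, 5, 3)}
reference_set = build_reference_set(records, sla_met_dates)
```

How a date is judged to meet the service level agreement is not specified in this passage, so the sketch takes that judgment as an input.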
S202: For each execution date, determine a delay coefficient of each preset time period in the execution date according to the first start execution time and the first end execution time.
Specifically, the delay coefficient may be a median or an 80th percentile, the execution date is one day of the preset period, and the preset time period is the minimum time unit into which a day is divided. For each execution date, the execution duration of each task completed in each preset time period is determined from the first start execution time and the first end execution time corresponding to its task number. All execution durations in each preset time period are then sorted in ascending order to obtain an execution duration sequence, and the median or the 80th percentile of that sequence is taken as the delay coefficient.
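A minimal sketch of S202, assuming one-hour preset time periods, the median as the delay coefficient, and tasks assigned to the period in which they end (the text speaks of tasks completed in each period). The function name `delay_coefficients` and the sample data are illustrative only.

```python
from collections import defaultdict
from datetime import datetime
from statistics import median


def delay_coefficients(tasks):
    """Map (execution date, hour) to the median execution duration in minutes.

    tasks: iterable of (start, end) datetime pairs.
    Assumption: a task belongs to the hourly period in which it ends.
    """
    buckets = defaultdict(list)
    for start, end in tasks:
        minutes = (end - start).total_seconds() / 60.0
        buckets[(end.date(), end.hour)].append(minutes)
    return {key: median(durations) for key, durations in buckets.items()}


tasks = [
    (datetime(2021, 6, 3, 7, 0), datetime(2021, 6, 3, 7, 10)),   # 10 min
    (datetime(2021, 6, 3, 7, 20), datetime(2021, 6, 3, 7, 23)),  # 3 min
    (datetime(2021, 6, 3, 7, 30), datetime(2021, 6, 3, 7, 35)),  # 5 min
    (datetime(2021, 6, 3, 8, 1), datetime(2021, 6, 3, 8, 3)),    # 2 min
]
coeffs = delay_coefficients(tasks)
```

Here the 7:00-8:00 period holds durations of 10, 3, and 5 minutes, so its delay coefficient is 5 minutes; the 8:00-9:00 period holds a single 2-minute task.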
S203: Calculate a first abnormal value detection threshold of each preset time period and a second abnormal value detection threshold of each preset time period according to the Pauta criterion (3σ rule) and the delay coefficient, wherein the first abnormal value detection threshold is smaller than the second abnormal value detection threshold.
Specifically, the Pauta criterion may be expressed as the interval (μ - 3σ, μ + 3σ), where μ is the mean of the delay coefficients of the same preset time period on each day in the previous preset period, σ is the standard deviation of those delay coefficients, μ - 3σ is the first abnormal value detection threshold, and μ + 3σ is the second abnormal value detection threshold. The standard deviation σ may be calculated by the following formula (1):
[Formula (1): standard deviation σ of the delay coefficients x_i about their mean x̄ over the M execution dates; the equation image is not reproduced in this text.]
where σ is the standard deviation, M is the number of days in the preset period whose task scheduling data meets the service level agreement, x_i is the delay coefficient of the i-th execution date in the same preset time period, and x̄ is the mean of the delay coefficients of the M execution dates in the same preset time period (that is, μ).
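The threshold calculation in S203 can be sketched as below. Because formula (1) is not reproduced in this text, the sketch assumes the population form of the standard deviation (dividing by M); if the original formula divides by M - 1, `statistics.stdev` would be the matching call. The function name and the sample history are illustrative.

```python
from statistics import fmean, pstdev


def pauta_thresholds(coeffs):
    """Return (mu - 3*sigma, mu + 3*sigma) for one preset time period.

    coeffs: delay coefficients of the same preset time period on each of
    the M SLA-compliant execution dates.
    Assumption: population standard deviation (divide by M).
    """
    mu = fmean(coeffs)
    sigma = pstdev(coeffs, mu)
    return mu - 3 * sigma, mu + 3 * sigma


# Hypothetical delay coefficients (minutes) for 07:00-08:00 on six dates.
history = [4.0, 5.0, 6.0, 5.0, 4.0, 6.0]
low, high = pauta_thresholds(history)
```

With this history the mean is 5.0 and the population standard deviation is about 0.816, so the two thresholds are roughly 2.55 and 7.45 minutes.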
S204: Acquire task scheduling information data of the current execution date in the current preset period as a second reference data set, wherein the second reference data set comprises L second task numbers, a second start execution time corresponding to each second task number, a second end execution time corresponding to each second task number and the current execution date corresponding to each second task number.
Specifically, L is a positive integer, the current preset period has the same length as the previous preset period, and the current execution date in the current preset period corresponds to an execution date in the previous preset period. For example, if the current execution date in the current preset period is May 30, 2021, it corresponds to April 30, 2021 in the previous preset period.
S205: Calculate a current delay coefficient of the current preset time period on the current execution date according to the second start execution time, the second end execution time and the current execution date.
Specifically, the current preset time period has the same length as the preset time period, the current delay coefficient may be a median or an 80th percentile, and the current preset time period is the minimum time unit into which the current execution date is divided. The current execution duration of each task completed in each current preset time period is determined from the second start execution time and the second end execution time corresponding to its task number on the current execution date. All current execution durations in each current preset time period are sorted in ascending order to obtain a current execution duration sequence, and the median or the 80th percentile of that sequence is taken as the current delay coefficient. The second end execution time is determined from the time point corresponding to the task completion state of each task number; the task completion state comprises task completion or task failure, and the time point of task completion or of task failure may be used as the second end execution time.
S206: Compare the current performance of the data lake in the current preset time period with the historical performance of the data lake in the preset time period according to the current delay coefficient, the first abnormal value detection threshold and the second abnormal value detection threshold, and determine whether the current performance of the data lake in the current preset time period meets the service level agreement.
Specifically, step S206 includes the following cases:
If the current delay coefficient is greater than or equal to the first abnormal value detection threshold and smaller than the second abnormal value detection threshold, the current performance of the data lake in the current preset time period is consistent with the historical performance of the data lake in the preset time period, and the current performance of the data lake in the current preset time period is determined to meet the service level agreement.
If the current delay coefficient is smaller than the first abnormal value detection threshold, the current performance of the data lake in the current preset time period is superior to the historical performance of the data lake in the preset time period, and the current performance of the data lake in the current preset time period is determined to meet and exceed the service level agreement.
If the current delay coefficient is greater than or equal to the second abnormal value detection threshold, the current performance of the data lake in the current preset time period is inferior to the historical performance of the data lake in the preset time period, and the current performance of the data lake in the current preset time period is determined not to meet the service level agreement.
For example, assume that the current execution date is June 30, 2021, the current delay coefficient for the current preset time period 00:00-01:00 is y_1, the first abnormal value detection threshold is (μ_1 - 3σ_1), and the second abnormal value detection threshold is (μ_1 + 3σ_1), where μ_1 is the mean of the delay coefficients of each execution date in the previous preset period in the same preset time period (that is, 00:00-01:00 of each day; the previous preset period is assumed to be May 1, 2021 to May 30, 2021, with every day meeting the service level agreement), and σ_1 is the standard deviation of those delay coefficients. Then:
If (μ_1 - 3σ_1) ≤ y_1 < (μ_1 + 3σ_1), the current performance of the data lake in the current preset time period is consistent with the historical performance of the data lake in the preset time period, and the current performance of the data lake in the current preset time period is determined to meet the service level agreement.
If y_1 < (μ_1 - 3σ_1), the current performance of the data lake in the current preset time period is superior to the historical performance of the data lake in the preset time period, and the current performance of the data lake in the current preset time period is determined to meet and exceed the service level agreement.
If (μ_1 + 3σ_1) ≤ y_1, the current performance of the data lake in the current preset time period is inferior to the historical performance of the data lake in the preset time period, and the current performance of the data lake in the current preset time period is determined not to meet the service level agreement.
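The three-way comparison of S206 can be written as a small function. This is a sketch; the function name, the return labels, and the sample thresholds are illustrative choices, while the boundary handling follows the three cases stated above.

```python
def classify_performance(current, low, high):
    """Compare a current delay coefficient with the two thresholds.

    low is the first threshold (mu - 3*sigma), high the second (mu + 3*sigma).
    """
    if current < low:
        return "better_than_history"      # meets and exceeds the SLA
    if current < high:
        return "consistent_with_history"  # meets the SLA
    return "worse_than_history"           # does not meet the SLA


# Hypothetical thresholds of 3 and 7 minutes for one preset time period.
results = [classify_performance(y, 3.0, 7.0) for y in (2.0, 3.0, 6.9, 7.0)]
```

Note that a value equal to the first threshold counts as consistent with history, and a value equal to the second threshold counts as worse than history.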
In this step, when the current delay coefficient is greater than or equal to the second abnormal value detection threshold, the difference between the current delay coefficient and the second abnormal value detection threshold is calculated. According to this difference, the tasks to be completed in the current preset time period are rescheduled to off-peak times, or the capacity of the data lake is expanded in time, so that the current performance of the data lake in the current preset time period improves and the data lake meets the service level agreement.
According to the performance diagnosis method for the data lake provided by the embodiment of the invention, task scheduling information data of dates meeting a service level agreement is obtained from task scheduling information data of the previous preset period and is used as a first reference data set, wherein the preset period is M days, each day consists of N consecutive preset time periods, and the first reference data set comprises K first task numbers, a first start execution time corresponding to each first task number, a first end execution time corresponding to each first task number and an execution date corresponding to each first task number; for each execution date, a delay coefficient of each preset time period in the execution date is determined according to the first start execution time and the first end execution time; a first abnormal value detection threshold of each preset time period and a second abnormal value detection threshold of each preset time period are calculated according to the Pauta criterion (3σ rule) and the delay coefficient, wherein the first abnormal value detection threshold is smaller than the second abnormal value detection threshold; task scheduling information data of a current execution date in a current preset period is acquired as a second reference data set, wherein the second reference data set comprises L second task numbers, a second start execution time corresponding to each second task number, a second end execution time corresponding to each second task number and the current execution date corresponding to each second task number; a current delay coefficient of a current preset time period on the current execution date is calculated according to the second start execution time, the second end execution time and the current execution date; and, according to the current delay coefficient, the first abnormal value detection threshold of each preset time period and the second abnormal value detection threshold of each preset time period, the current performance of the data lake in the current preset time period is compared with the historical performance of the data lake in the preset time period to determine whether the current performance of the data lake in the current preset time period meets the service level agreement, so that the efficiency of determining the performance of the data lake is improved.
In some optional implementations of the present embodiment, in step S202, the step of determining the delay coefficient for each preset time period within each execution date according to the first start execution time and the first end execution time for each execution date includes C1 to C3:
C1: Determine all execution durations of each preset time period within each execution date according to the first start execution time, the first end execution time and the execution date.
C2: Sort all the execution durations in each preset time period according to a preset rule to obtain an execution duration sequence.
C3: Determine the delay coefficient of each preset time period within each execution date according to the execution duration sequence.
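Step C1 can be sketched as follows. This is only an illustrative sketch: the function name `durations_by_period`, the tuple layout of a task record, the year in the sample dates, and the rule that a task belongs to the hourly time period containing its start time are all assumptions that the description does not fix.

```python
from collections import defaultdict
from datetime import datetime


def durations_by_period(tasks):
    """Group execution durations (in minutes) by (execution date, hour).

    Assumption: a task is assigned to the hourly preset time period
    that contains its start execution time.
    """
    grouped = defaultdict(list)
    for _task_number, start, end in tasks:
        minutes = (end - start).total_seconds() / 60
        grouped[(start.date(), start.hour)].append(minutes)
    # Step C2: sort each period's durations in ascending order.
    return {key: sorted(values) for key, values in grouped.items()}


# Hypothetical task records: (task number, start time, end time).
tasks = [
    ("t1", datetime(2021, 6, 3, 7, 5), datetime(2021, 6, 3, 7, 15)),
    ("t2", datetime(2021, 6, 3, 7, 30), datetime(2021, 6, 3, 7, 33)),
    ("t3", datetime(2021, 6, 3, 8, 0), datetime(2021, 6, 3, 8, 8)),
]
grouped = durations_by_period(tasks)
```

Grouping by start time is one possible convention; a task that spans two time periods could equally be attributed to the period in which it ends.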
For a better understanding of steps C1 to C3, an example is described here:
for example, assume that the preset period is June 1 to June 30, that the task scheduling data of the dates satisfying the service level agreement is the task scheduling data of June 1 to June 30, and take the preset time period 7:00-8:00 on the execution date June 3 as an example:
all the execution durations in the preset time period 7:00-8:00 are [10min, 14min, 8min, 3min, 5min, 1min, 2min, 1min, 3min], and sorting them gives the execution duration sequence [1min, 1min, 2min, 3min, 3min, 5min, 8min, 10min, 14min], so that the delay coefficient of the preset time period 7:00-8:00 is 3min if the median is used and 10min if the 80th percentile is used.
In this step, the delay coefficient is the median or the 80th percentile of the execution duration sequence. The median is not affected by the maximum or minimum values of the sequence, so it is representative of the execution duration sequence as a whole, while the 80th percentile is representative of the longer execution durations in the sequence. The choice between the two can be adjusted according to the actual application scenario, which facilitates quantization of the execution durations and further improves the efficiency of diagnosing the performance of the data lake.
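The worked example for the 7:00-8:00 time period can be reproduced with the short sketch below. The helper `nearest_rank_percentile` is a hypothetical name, and the nearest-rank definition of a percentile is an assumption chosen because it yields the 10min value stated in the example; interpolating percentile definitions would give a different figure.

```python
import math
from statistics import median


def nearest_rank_percentile(sorted_values, pct):
    """Return the nearest-rank percentile of an ascending list."""
    rank = math.ceil(pct / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


# Execution durations (minutes) for 7:00-8:00 from the example.
durations_min = [10, 14, 8, 3, 5, 1, 2, 1, 3]
sequence = sorted(durations_min)

median_coeff = median(sequence)                   # delay coefficient as median
p80_coeff = nearest_rank_percentile(sequence, 80)  # delay coefficient as P80
```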
In some optional implementations of the present embodiment, in step S203, the step of calculating a first abnormal value detection threshold for each preset time period and a second abnormal value detection threshold for each preset time period according to the Lauda criterion and the delay coefficient includes steps D1 to D3:
D1: Calculate the average value of the delay coefficients of the same preset time period across the days to obtain N average values.
D2: Calculate N standard deviations according to the N average values and the N delay coefficients.
D3: Calculate the first abnormal value detection threshold of each preset time period and the second abnormal value detection threshold of each preset time period based on the Lauda criterion, the N standard deviations and the N average values.
Steps D1 to D3 have already been explained under step S203 and, to avoid repetition, are not explained again here.
In this step, the first abnormal value detection threshold of each preset time period and the second abnormal value detection threshold of each preset time period are calculated based on the Lauda criterion, the N standard deviations and the N average values, and the current performance of the data lake in the current preset time period is judged against these two thresholds, which improves the efficiency of diagnosing the performance of the data lake in the current preset time period.
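Steps D1 to D3 can be sketched as below, under the assumption that the Lauda criterion is applied as the three-sigma rule, so that the first threshold is the mean minus three standard deviations and the second is the mean plus three standard deviations. The function name `outlier_thresholds`, the sample data and the use of the population standard deviation are illustrative choices and are not taken from the description.

```python
from statistics import mean, pstdev


def outlier_thresholds(delay_by_period):
    """Map each time period to (first, second) detection thresholds.

    delay_by_period: {period label: [delay coefficient per day]}.
    Assumption: thresholds are mu - 3*sigma and mu + 3*sigma.
    """
    thresholds = {}
    for period, coeffs in delay_by_period.items():
        mu = mean(coeffs)            # D1: average per time period
        sigma = pstdev(coeffs, mu)   # D2: standard deviation per time period
        thresholds[period] = (mu - 3 * sigma, mu + 3 * sigma)  # D3
    return thresholds


# Hypothetical delay coefficients (minutes) over four SLA-compliant days.
history = {"07:00-08:00": [3, 5, 4, 4], "08:00-09:00": [6, 6, 6, 6]}
thresholds = outlier_thresholds(history)
```

When every day has the same delay coefficient the standard deviation is zero and both thresholds collapse onto the mean, so a real deployment would probably want a minimum band width.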
In some optional implementations of the present embodiment, a method for diagnosing performance of a data lake further includes the following steps A1 to A2:
A1: Obtain N first delay execution coefficients according to the N current delay coefficients, the N first abnormal value detection thresholds and the N second abnormal value detection thresholds.
A2: Compare the task execution speed of the data lake in the current preset time period with the task execution speed of the data lake in the preset time period according to the first delay execution coefficient and a preset first detection threshold, and determine whether the task execution speed of the data lake in the current preset time period meets the service level agreement.
Specifically, in step A1, the step of obtaining N first delay execution coefficients according to N current delay coefficients, N first outlier detection thresholds, and N second outlier detection thresholds includes:
if the current delay coefficient is greater than the first abnormal value detection threshold and less than the second abnormal value detection threshold, the first delay execution coefficient is 0.
If the current delay coefficient is smaller than the first abnormal value detection threshold or larger than the second abnormal value detection threshold, a difference value between the current delay coefficient and the first abnormal value detection threshold is calculated, and the difference value is taken as the first delay execution coefficient.
For example, assuming that the second abnormal value detection threshold for the time period 00:00-1:00 in the last preset period is (μ1 + 3σ1), the current delay coefficient is T1, and the first delay execution coefficient is T, then T = T1 - (μ1 + 3σ1).
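A possible reading of step A1 is sketched below. The rule text takes the difference against the first threshold, whereas the worked example subtracts the second threshold (μ1 + 3σ1); this sketch assumes that the difference is taken against whichever threshold was crossed, which is an interpretation and not something the description states. The function name is hypothetical.

```python
def first_delay_execution_coefficient(current, first_threshold, second_threshold):
    """Return 0 inside the band, otherwise the signed distance to the
    threshold that was crossed (assumed interpretation)."""
    if first_threshold < current < second_threshold:
        return 0.0
    if current >= second_threshold:
        # Slower than history: positive coefficient, as in T = T1 - (mu1 + 3*sigma1).
        return current - second_threshold
    # Faster than history: negative (or zero) coefficient.
    return current - first_threshold


inside = first_delay_execution_coefficient(4.0, 2.0, 6.0)
slower = first_delay_execution_coefficient(8.0, 2.0, 6.0)
faster = first_delay_execution_coefficient(1.0, 2.0, 6.0)
```

With this reading a positive coefficient always means slower than the historical band and a negative coefficient always means faster, which matches the way step A2 interprets the sign.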
Specifically, the step A2 of comparing the task execution speed of the data lake in the current preset time period with the task execution speed of the data lake in the preset time period according to the first delay execution coefficient and the preset first detection threshold value and determining whether the task execution speed of the data lake in the current preset time period meets the service level agreement or not includes:
If the first delay execution coefficient is equal to the preset first detection threshold, the current task execution speed of the data lake in the current preset time period is consistent with the historical task execution speed of the data lake in the preset time period, and the task execution speed of the data lake in the current preset time period is determined to meet the service level agreement.
If the first delay execution coefficient is smaller than the preset first detection threshold, the current task execution speed of the data lake in the current preset time period is better than the historical task execution speed of the data lake in the preset time period, and the task execution speed of the data lake in the current preset time period is determined to meet and exceed the service level agreement.
If the first delay execution coefficient is larger than the preset first detection threshold, the current task execution speed of the data lake in the current preset time period is lower than the historical task execution speed of the data lake in the preset time period, and the task execution speed of the data lake in the current preset time period is determined not to meet the service level agreement.
For example, assuming that the first delay execution coefficient is T and the preset first detection threshold is 0:
when T = 0, the current performance of the data lake in the current preset time period is consistent with the historical performance of the data lake in the preset time period, and the task execution speed of the data lake in the current preset time period is determined to meet the service level agreement;
when T < 0, the current performance of the data lake in the current preset time period is superior to the historical performance of the data lake in the preset time period, and the task execution speed of the data lake in the current preset time period is determined to meet and exceed the service level agreement;
when T > 0, the current performance of the data lake in the current preset time period is inferior to the historical performance of the data lake in the preset time period, and the task execution speed of the data lake in the current preset time period is determined not to meet the service level agreement.
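The three-way decision of step A2 can be written as a small sketch. The enum `SlaStatus` and the function `diagnose` are hypothetical names introduced for illustration; the default threshold of 0 follows the example above.

```python
from enum import Enum


class SlaStatus(Enum):
    MEETS = "consistent with history, meets the SLA"
    EXCEEDS = "better than history, meets and exceeds the SLA"
    FAILS = "worse than history, does not meet the SLA"


def diagnose(coefficient, detection_threshold=0.0):
    """Compare a delay execution coefficient with a detection threshold."""
    if coefficient > detection_threshold:
        return SlaStatus.FAILS
    if coefficient < detection_threshold:
        return SlaStatus.EXCEEDS
    return SlaStatus.MEETS


status_equal = diagnose(0.0)
status_fast = diagnose(-1.5)
status_slow = diagnose(2.0)
```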
In this embodiment, the task execution speed of the data lake in the current preset time period is compared with the task execution speed of the data lake in the preset time period by means of the first delay execution coefficient and the preset first detection threshold, and whether the task execution speed of the data lake in the current preset time period meets the service level agreement is determined, which improves the efficiency of diagnosing the task execution speed of the data lake in the current preset time period. When the current performance of the data lake in the current preset time period is inferior to the historical performance of the data lake in the preset time period, the capacity of the data lake is expanded or tasks are executed off-peak in the current preset time period, so that the execution efficiency of the data lake is improved and the task execution speed of the data lake meets the service level agreement.
In some optional implementations of this embodiment, a method for diagnosing performance of a data lake further includes the following steps B1 to B2:
B1: Average the N first delay execution coefficients of the day to obtain a second delay execution coefficient corresponding to the current execution date.
B2: Compare the current performance of the data lake in the current execution date with the historical performance of the data lake in the execution date according to the second delay execution coefficient and a preset second detection threshold, and determine whether the current performance of the data lake in the current execution date meets the service level agreement.
Specifically, the step B2 of comparing the current performance of the data lake in the current execution date with the historical performance of the data lake in the execution date according to the second delayed execution coefficient and the preset second detection threshold, and determining whether the current performance of the data lake in the current execution date meets the service level agreement includes:
If the second delay execution coefficient is equal to the preset second detection threshold, the current performance of the data lake in the current execution date is consistent with the historical performance of the data lake in the execution date, and the current performance of the data lake in the current execution date is determined to meet the service level agreement.
If the second delay execution coefficient is smaller than the preset second detection threshold, the current performance of the data lake in the current execution date is superior to the historical performance of the data lake in the execution date, and the current performance of the data lake in the current execution date is determined to meet and exceed the service level agreement.
If the second delayed execution coefficient is greater than the preset second detection threshold, the current performance of the data lake in the current execution date is inferior to the historical performance of the data lake in the execution date, and the current performance of the data lake in the current execution date is determined not to meet the service level agreement.
For example, assuming that the second delay execution coefficient is X and the preset second detection threshold is 0:
when X = 0, the current performance of the data lake in the current execution date is consistent with the historical performance of the data lake in the execution date, and the current performance of the data lake in the current execution date is determined to meet the service level agreement;
when X < 0, the current performance of the data lake in the current execution date is superior to the historical performance of the data lake in the execution date, and the current performance of the data lake in the current execution date is determined to meet and exceed the service level agreement;
when X > 0, the current performance of the data lake in the current execution date is inferior to the historical performance of the data lake in the execution date, and the current performance of the data lake in the current execution date is determined not to meet the service level agreement.
In this embodiment, according to the second delayed execution coefficient and the preset second detection threshold, the current performance of the data lake in the current execution date is compared with the historical performance of the data lake in the execution date, whether the current performance of the data lake in the current execution date meets the service level agreement is determined, and when the current performance of the data lake in the current execution date is inferior to the historical performance of the data lake in the execution date, the data lake on the current execution date is subjected to expansion or task off-peak execution, so that the execution efficiency of the data lake is improved, and the performance of the data lake on the current execution date meets the service level agreement.
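Steps B1 and B2 can be sketched as follows. The function names, the sample coefficients and the choice of N = 24 hourly time periods are assumptions made only for this illustration.

```python
from statistics import fmean


def second_delay_execution_coefficient(first_coefficients):
    """B1: average the N first delay execution coefficients of one day."""
    return fmean(first_coefficients)


def day_meets_sla(second_coefficient, second_detection_threshold=0.0):
    """B2: the day meets the SLA unless the average exceeds the threshold."""
    return second_coefficient <= second_detection_threshold


# Hypothetical day with N = 24 periods: 22 inside the band,
# one 2.0 minutes too slow and one 0.5 minutes faster than history.
first_coefficients = [0.0] * 22 + [2.0, -0.5]
x = second_delay_execution_coefficient(first_coefficients)
ok = day_meets_sla(x)
```

Because the daily value is a plain average, a time period that runs faster than history can offset one that runs slower; whether that is desirable depends on how the service level agreement is worded.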
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.
FIG. 3 is a schematic block diagram of a performance diagnosis apparatus for data lakes, which corresponds one-to-one to the performance diagnosis method for data lakes according to the above-described embodiment. As shown in fig. 3, the performance diagnosis apparatus of the data lake includes a first obtaining module 30, a first determining module 31, a first calculating module 32, a second obtaining module 33, a second calculating module 34, and a comparing module 35. The detailed description of each functional module is as follows:
the first obtaining module 30 is configured to obtain, from the task scheduling information data in the previous preset period, task scheduling information data of a date meeting a service level agreement as a first reference data set, where the preset period is M days and each day is composed of N consecutive preset time periods, and the first reference data set includes K first task numbers, a first start execution time corresponding to each first task number, a first end execution time corresponding to each first task number, and an execution date corresponding to each first task number.
A first determining module 31, configured to determine, for each execution date, a delay coefficient for each preset time period within each execution date according to the first start execution time and the first end execution time.
The first calculating module 32 is configured to calculate a first abnormal value detection threshold for each preset time period and a second abnormal value detection threshold for each preset time period according to the Lauda criterion and the delay coefficient, where the first abnormal value detection threshold is smaller than the second abnormal value detection threshold.
The second obtaining module 33 is configured to obtain task scheduling information data of a current execution date in a current preset period as a second reference data set, where the second reference data set includes L second task numbers, a second start execution time corresponding to each second task number, a second end execution time corresponding to each second task number, and a current execution date corresponding to each second task number.
And the second calculating module 34 is configured to calculate a current delay coefficient of the current preset time period on the current execution date according to the second starting execution time, the second ending execution time, and the current execution date.
The comparison module 35 is configured to compare the current performance of the data lake in the current preset time period with the historical performance of the data lake in the preset time period according to the current delay coefficient, the first abnormal value detection threshold and the second abnormal value detection threshold, and determine whether the current performance of the data lake in the current preset time period meets the service level agreement.
Further, the first determining module 31 includes a second determining module, a sorting module and a third determining module, and the detailed description of each functional module is as follows:
and the second determining module is used for determining all the execution duration of the preset time period in each execution date according to the first starting execution time, the first ending execution time and the execution dates.
And the sequencing module is used for sequencing all the execution durations in each preset time period according to a preset rule to obtain an execution duration sequence.
And the third determining module is used for determining the delay coefficient of each preset time period in each execution date according to the execution time length sequence.
Further, the first calculating module 32 includes a third calculating module, a fourth calculating module and a fifth calculating module, and the detailed description of each functional module is as follows:
and the third calculation module is used for calculating the average value of the delay coefficients in the same preset time period every day to obtain N average values.
And the fourth calculation module is used for calculating and obtaining N standard deviations according to the N average values and the N delay coefficients.
And the fifth calculation module is used for calculating and obtaining the first abnormal value detection threshold value of each preset time period and the second abnormal value detection threshold value of each preset time period based on the Lauda criterion, the N standard deviations and the N average values.
Further, the comparing module 35 includes a fourth determining module, a fifth determining module and a sixth determining module, and the detailed description of each functional module is as follows:
The fourth determining module is configured to, if the current delay coefficient is greater than or equal to the first abnormal value detection threshold and smaller than the second abnormal value detection threshold, determine that the current performance of the data lake in the current preset time period is consistent with the historical performance of the data lake in the preset time period and that the current performance of the data lake in the current preset time period meets the service level agreement.
The fifth determining module is configured to, if the current delay coefficient is smaller than the first abnormal value detection threshold, determine that the current performance of the data lake in the current preset time period is superior to the historical performance of the data lake in the preset time period and that the current performance of the data lake in the current preset time period meets and exceeds the service level agreement.
The sixth determining module is configured to, if the current delay coefficient is greater than or equal to the second abnormal value detection threshold, determine that the current performance of the data lake in the current preset time period is inferior to the historical performance of the data lake in the preset time period and that the current performance of the data lake in the current preset time period does not meet the service level agreement.
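The behaviour of the fourth to sixth determining modules can be summarised in one function, sketched below. The function name and the returned labels are hypothetical; the boundary handling (inclusive at the first threshold for the consistent case, inclusive at the second threshold for the failing case) follows the module descriptions above.

```python
def compare_with_history(current, first_threshold, second_threshold):
    """Classify a current delay coefficient against the two thresholds."""
    if current < first_threshold:
        return "superior: meets and exceeds SLA"     # fifth determining module
    if current >= second_threshold:
        return "inferior: does not meet SLA"         # sixth determining module
    return "consistent: meets SLA"                   # fourth determining module


at_lower_bound = compare_with_history(2.0, 2.0, 6.0)
at_upper_bound = compare_with_history(6.0, 2.0, 6.0)
below_band = compare_with_history(1.0, 2.0, 6.0)
```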
Optionally, the performance diagnosis device for the data lake further includes a first execution coefficient obtaining module and a seventh determining module, and the detailed description of each functional module is as follows:
and the first execution coefficient acquisition module is used for acquiring N first delay execution coefficients according to the N current delay coefficients, the N first abnormal value detection thresholds and the N second abnormal value detection thresholds.
And the seventh determining module is used for comparing the task execution speed of the data lake in the current preset time period with the task execution speed of the data lake in the preset time period according to the first delay execution coefficient and the preset first detection threshold, and determining whether the task execution speed of the data lake in the current preset time period meets the service level agreement.
Further, the seventh determining module includes an eighth determining module, a ninth determining module and a tenth determining module, and the detailed description of each functional module is as follows:
and the eighth determining module is configured to, if the first delayed execution coefficient is equal to the preset first detection threshold, determine that the current task execution speed of the data lake in the current preset time period is consistent with the historical task execution speed of the data lake in the preset time period, and determine that the task execution speed of the data lake in the current preset time period meets the service level agreement.
The ninth determining module is configured to, if the first delay execution coefficient is smaller than the preset first detection threshold, determine that the current task execution speed of the data lake in the current preset time period is higher than the historical task execution speed of the data lake in the preset time period and that the task execution speed of the data lake in the current preset time period meets and exceeds the service level agreement.
The tenth determining module is configured to, if the first delay execution coefficient is greater than the preset first detection threshold, determine that the current task execution speed of the data lake in the current preset time period is lower than the historical task execution speed of the data lake in the preset time period and that the task execution speed of the data lake in the current preset time period does not meet the service level agreement.
Optionally, the device for diagnosing performance of a data lake further includes a second delay coefficient obtaining module and an eleventh determining module, and detailed descriptions of the functional modules are as follows:
and the second delay coefficient acquisition module is used for averaging the N first delay execution coefficients every day to obtain a second delay execution coefficient corresponding to the current execution date.
And the eleventh determining module is used for comparing the current performance of the data lake in the current execution date with the historical performance of the data lake in the execution date according to the second delayed execution coefficient and a preset second detection threshold value, and determining whether the current performance of the data lake in the current execution date meets the service level agreement.
Further, the eleventh determining module includes a twelfth determining module, a thirteenth determining module and a fourteenth determining module, and the detailed description of each functional module is as follows:
The twelfth determining module is configured to, if the second delay execution coefficient is equal to the preset second detection threshold, determine that the current performance of the data lake in the current execution date is consistent with the historical performance of the data lake in the execution date and that the current performance of the data lake in the current execution date meets the service level agreement.
The thirteenth determining module is configured to, if the second delay execution coefficient is smaller than the preset second detection threshold, determine that the current performance of the data lake in the current execution date is better than the historical performance of the data lake in the execution date and that the current performance of the data lake in the current execution date meets and exceeds the service level agreement.
The fourteenth determining module is configured to, if the second delay execution coefficient is greater than the preset second detection threshold, determine that the current performance of the data lake in the current execution date is inferior to the historical performance of the data lake in the execution date and that the current performance of the data lake in the current execution date does not meet the service level agreement.
For specific limitations of the performance diagnostic apparatus of the data lake, reference can be made to the above limitations of the performance diagnostic method of the data lake, which are not described herein again. The various modules in the performance diagnostic apparatus of the data lake described above can be implemented in whole or in part by software, hardware, and combinations thereof. The modules can be embedded in a hardware form or independent of a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 4 in particular, fig. 4 is a block diagram of a basic structure of a computer device according to the embodiment.
The computer device 4 comprises a memory 41, a processor 42, and a network interface 43, which are communicatively connected to each other via a system bus. It is noted that only a computer device 4 having the components memory 41, processor 42 and network interface 43 is shown, but it should be understood that not all of the illustrated components are required to be implemented, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to a preset or stored instruction, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user in a keyboard mode, a mouse mode, a remote controller mode, a touch panel mode or a voice control equipment mode.
The memory 41 includes at least one type of readable storage medium including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. In some embodiments, the memory 41 may be an internal storage unit of the computer device 4, such as a hard disk or a memory of the computer device 4. In other embodiments, the memory 41 may also be an external storage device of the computer device 4, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the computer device 4. Of course, the memory 41 may also include both an internal storage unit of the computer device 4 and an external storage device thereof. In this embodiment, the memory 41 is generally used for storing an operating system installed in the computer device 4 and various types of application software, such as program codes for controlling electronic files. Further, the memory 41 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 42 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 42 is typically used to control the overall operation of the computer device 4. In this embodiment, the processor 42 is configured to run program codes stored in the memory 41 or process data, for example, program codes for controlling electronic files.
The network interface 43 may comprise a wireless network interface or a wired network interface, and the network interface 43 is generally used for establishing communication connection between the computer device 4 and other electronic devices.
The present application further provides another embodiment, which is to provide a computer-readable storage medium storing an interface display program, which is executable by at least one processor to cause the at least one processor to perform the steps of the method for diagnosing performance of a data lake as described above.
Through the description of the foregoing embodiments, it is clear to those skilled in the art that the method of the foregoing embodiments may be implemented by software plus a necessary general hardware platform, and certainly may also be implemented by hardware, but in many cases, the former is a better implementation. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It should be understood that the above-described embodiments are merely some, and not all, embodiments of the present application, and that the drawings illustrate preferred embodiments of the present application without limiting the scope of the claims appended hereto. This application can be embodied in many different forms, and the embodiments are provided so that this disclosure will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements for some of their technical features. All equivalent structures made by using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise fall within the protection scope of the present application.

Claims (9)

1. A method of diagnosing the performance of a data lake, the method comprising:
acquiring, from task scheduling information data of a last preset period, the task scheduling information data of the dates that meet a service level agreement as a first reference data set, wherein the preset period is M days and each day is composed of N consecutive preset time periods, and the first reference data set comprises K first task numbers, a first start execution time corresponding to each first task number, a first end execution time corresponding to each first task number and an execution date corresponding to each first task number;
determining, for each execution date, a delay coefficient of each preset time period in that execution date according to the first start execution time and the first end execution time;
calculating a first outlier detection threshold of each preset time period and a second outlier detection threshold of each preset time period according to the Pauta criterion (3σ rule) and the delay coefficient, wherein the first outlier detection threshold is smaller than the second outlier detection threshold;
acquiring task scheduling information data of a current execution date in a current preset period as a second reference data set, wherein the second reference data set comprises L second task numbers, a second start execution time corresponding to each second task number, a second end execution time corresponding to each second task number and the current execution date corresponding to each second task number;
calculating a current delay coefficient of a current preset time period on the current execution date according to the second start execution time, the second end execution time and the current execution date;
comparing, according to the current delay coefficient, the first outlier detection threshold and the second outlier detection threshold, the current performance of the data lake in the current preset time period with the historical performance of the data lake in the preset time period, and determining whether the current performance of the data lake in the current preset time period meets the service level agreement;
obtaining N first delayed execution coefficients according to the N current delay coefficients, the N first outlier detection thresholds and the N second outlier detection thresholds;
and comparing the task execution speed of the data lake in the current preset time period with the task execution speed of the data lake in the preset time period according to the first delayed execution coefficient and a preset first detection threshold, and determining whether the task execution speed of the data lake in the current preset time period meets the service level agreement.
2. The method of claim 1, wherein the step of determining, for each execution date, a delay coefficient of each preset time period in that execution date according to the first start execution time and the first end execution time comprises:
determining all execution durations of each preset time period in each execution date according to the first start execution time, the first end execution time and the execution date;
sorting all the execution durations in each preset time period according to a preset rule to obtain an execution duration sequence;
and determining the delay coefficient of each preset time period in each execution date according to the execution duration sequence.
3. The method of claim 1, wherein the step of calculating a first outlier detection threshold of each preset time period and a second outlier detection threshold of each preset time period according to the Pauta criterion (3σ rule) and the delay coefficient comprises:
calculating, for each preset time period, the average value of the delay coefficients of that preset time period across the execution dates, to obtain N average values;
calculating N standard deviations according to the N average values and the delay coefficients;
and calculating the first outlier detection threshold of each preset time period and the second outlier detection threshold of each preset time period based on the Pauta criterion, the N standard deviations and the N average values.
4. The method for diagnosing the performance of a data lake of claim 1, wherein the step of comparing the current performance of the data lake in the current preset time period with the historical performance of the data lake in the preset time period according to the current delay coefficient, the first outlier detection threshold and the second outlier detection threshold, and determining whether the current performance of the data lake in the current preset time period meets the service level agreement comprises:
if the current delay coefficient is greater than or equal to the first outlier detection threshold and smaller than the second outlier detection threshold, the current performance of the data lake in the current preset time period is consistent with the historical performance of the data lake in the preset time period, and it is determined that the current performance of the data lake in the current preset time period meets the service level agreement;
if the current delay coefficient is smaller than the first outlier detection threshold, the current performance of the data lake in the current preset time period is superior to the historical performance of the data lake in the preset time period, and it is determined that the current performance of the data lake in the current preset time period meets and exceeds the service level agreement;
and if the current delay coefficient is greater than or equal to the second outlier detection threshold, the current performance of the data lake in the current preset time period is inferior to the historical performance of the data lake in the preset time period, and it is determined that the current performance of the data lake in the current preset time period does not meet the service level agreement.
5. The method for diagnosing the performance of a data lake of claim 1, wherein the step of comparing the current performance of the data lake in the current preset time period with the historical performance of the data lake in the preset time period according to the first delayed execution coefficient and a preset first detection threshold to determine whether the task execution speed of the data lake in the current preset time period meets the service level agreement comprises:
if the first delayed execution coefficient is equal to the preset first detection threshold, the current performance of the data lake in the current preset time period is consistent with the historical performance of the data lake in the preset time period, and it is determined that the task execution speed of the data lake in the current preset time period meets the service level agreement;
if the first delayed execution coefficient is smaller than the preset first detection threshold, the current performance of the data lake in the current preset time period is superior to the historical performance of the data lake in the preset time period, and it is determined that the task execution speed of the data lake in the current preset time period meets and exceeds the service level agreement;
and if the first delayed execution coefficient is greater than the preset first detection threshold, the current performance of the data lake in the current preset time period is inferior to the historical performance of the data lake in the preset time period, and it is determined that the task execution speed of the data lake in the current preset time period does not meet the service level agreement.
6. The method of diagnosing the performance of a data lake of claim 1, further comprising:
averaging the N first delayed execution coefficients of the current execution date to obtain a second delayed execution coefficient corresponding to the current execution date;
and comparing the current performance of the data lake in the current execution date with the historical performance of the data lake in the execution date according to the second delayed execution coefficient and a preset second detection threshold value, and determining whether the current performance of the data lake in the current execution date meets the service level agreement.
7. The method of claim 6, wherein the step of comparing the current performance of the data lake on the current execution date with the historical performance of the data lake on the execution date based on the second delayed execution coefficient and a preset second detection threshold to determine whether the current performance of the data lake on the current execution date satisfies the service level agreement comprises:
if the second delayed execution coefficient is equal to a preset second detection threshold value, the current performance of the data lake in the current execution date is consistent with the historical performance of the data lake in the execution date, and the current performance of the data lake in the current execution date is determined to meet the service level agreement;
if the second delayed execution coefficient is smaller than a preset second detection threshold, the current performance of the data lake in the current execution date is superior to the historical performance of the data lake in the execution date, and the current performance of the data lake in the current execution date is determined to meet and be higher than the service level agreement;
and if the second delayed execution coefficient is larger than a preset second detection threshold, the current performance of the data lake in the current execution date is inferior to the historical performance of the data lake in the execution date, and the current performance of the data lake in the current execution date is determined not to meet the service level agreement.
8. A performance diagnostic apparatus for a data lake, the apparatus comprising:
a first obtaining module, configured to obtain task scheduling information data of a date meeting a service level agreement from task scheduling information data in a last preset period, as a first reference data set, where the preset period is M days and each day is composed of N consecutive preset time periods, and the first reference data set includes K first task numbers, a first start execution time corresponding to each first task number, a first end execution time corresponding to each first task number, and an execution date corresponding to each first task number;
a first determining module, configured to determine, for each of the execution dates, a delay coefficient for each of the preset time periods within each of the execution dates according to the first start execution time and the first end execution time;
a first calculation module, configured to calculate a first outlier detection threshold for each of the preset time periods and a second outlier detection threshold for each of the preset time periods according to the Pauta criterion (3σ rule) and the delay coefficient, where the first outlier detection threshold is smaller than the second outlier detection threshold;
a second obtaining module, configured to obtain task scheduling information data of a current execution date in a current preset period, as a second reference data set, where the second reference data set includes L second task numbers, a second start execution time corresponding to each second task number, a second end execution time corresponding to each second task number, and the current execution date corresponding to each second task number;
a second calculation module, configured to calculate a current delay coefficient of a current preset time period on the current execution date according to the second start execution time, the second end execution time and the current execution date;
a comparison module, configured to compare the current performance of the data lake in the current preset time period with the historical performance of the data lake in the preset time period according to the current delay coefficient, the first outlier detection threshold and the second outlier detection threshold, and to determine whether the current performance of the data lake in the current preset time period meets the service level agreement;
a first execution coefficient obtaining module, configured to obtain N first delayed execution coefficients according to the N current delay coefficients, the N first outlier detection thresholds, and the N second outlier detection thresholds;
and a determining module, configured to compare the task execution speed of the data lake in the current preset time period with the task execution speed of the data lake in the preset time period according to the first delayed execution coefficient and a preset first detection threshold, and to determine whether the task execution speed of the data lake in the current preset time period meets the service level agreement.
9. A computer-readable storage medium storing a computer program which, when executed by a processor, carries out the method for diagnosing the performance of a data lake according to any one of claims 1 to 7.
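
As an illustration of the threshold calculation recited in claim 3 and the three-way comparison recited in claim 4, the following is a minimal Python sketch, not the patented implementation. It assumes that the first and second outlier detection thresholds are the mean minus and plus three standard deviations (the usual reading of the Pauta criterion), that the population standard deviation is used, and that the historical delay coefficients arrive as one list of N values per execution date. The function names `pauta_thresholds` and `classify_period` and the sample numbers are hypothetical.

```python
from statistics import mean, pstdev


def pauta_thresholds(history):
    """Return one (lower, upper) threshold pair per preset time period.

    history: list of M rows (one per execution date), each row holding the
    N delay coefficients of that date. Assumed layout, not from the patent.
    """
    n_periods = len(history[0])
    thresholds = []
    for j in range(n_periods):
        # Delay coefficients of period j across all execution dates.
        column = [day[j] for day in history]
        mu = mean(column)
        sigma = pstdev(column)  # assumption: population standard deviation
        # Assumption: thresholds are mean -/+ 3 sigma (3-sigma rule).
        thresholds.append((mu - 3 * sigma, mu + 3 * sigma))
    return thresholds


def classify_period(current, lower, upper):
    """Three-way comparison following the wording of claim 4."""
    if current < lower:
        return "exceeds SLA"   # better than historical performance
    if current < upper:
        return "meets SLA"     # lower <= current < upper
    return "violates SLA"      # current >= upper


if __name__ == "__main__":
    # Hypothetical history: M = 3 execution dates, N = 2 time periods.
    history = [[1.0, 2.0], [1.2, 2.5], [0.8, 1.5]]
    today = [1.1, 9.0]
    for current, (lower, upper) in zip(today, pauta_thresholds(history)):
        print(current, classify_period(current, lower, upper))
```

Because the upper bound is inclusive on the failing side, a period whose history has zero variance would flag any value equal to the historical mean; a real deployment would need to decide how to treat that edge case.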

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110738024.7A CN113342502B (en) 2021-06-30 2021-06-30 Performance diagnosis method and device for data lake, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113342502A CN113342502A (en) 2021-09-03
CN113342502B true CN113342502B (en) 2023-01-20

Family

ID=77481814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110738024.7A Active CN113342502B (en) 2021-06-30 2021-06-30 Performance diagnosis method and device for data lake, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113342502B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114116842B (en) * 2021-11-25 2023-05-19 上海柯林布瑞信息技术有限公司 Multidimensional medical data real-time acquisition method and device, electronic equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111813513A (en) * 2020-06-24 2020-10-23 中国平安人寿保险股份有限公司 Real-time task scheduling method, device, equipment and medium based on distribution

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7865584B2 (en) * 2006-06-20 2011-01-04 Alcatel Lucent Network service performance monitoring apparatus and methods
US20140229608A1 (en) * 2013-02-14 2014-08-14 Alcatel-Lucent Canada Inc. Parsimonious monitoring of service latency characteristics
US10127255B1 (en) * 2015-10-07 2018-11-13 Applied Predictive Technologies, Inc. Computer system and method of initiative analysis using outlier identification
CN110086649B (en) * 2019-03-19 2023-06-16 深圳壹账通智能科技有限公司 Abnormal flow detection method, device, computer equipment and storage medium
CN110572297B (en) * 2019-08-09 2021-07-06 网宿科技股份有限公司 Network performance evaluation method, server and storage medium
CN111338878A (en) * 2020-02-21 2020-06-26 平安科技(深圳)有限公司 Anomaly detection method and device, terminal device and storage medium
CN111726341B (en) * 2020-06-02 2022-10-14 五八有限公司 Data detection method and device, electronic equipment and storage medium
CN112463530A (en) * 2020-11-23 2021-03-09 平安科技(深圳)有限公司 Anomaly detection method and device for micro-service system, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN113342502A (en) 2021-09-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant