CN117271183A - Method and device for acquiring database abnormal job scheduling retry strategy - Google Patents

Method and device for acquiring database abnormal job scheduling retry strategy

Publication number
CN117271183A
Authority
CN
China
Prior art keywords
abnormal
scheduling
database
retry
job
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311182938.5A
Other languages
Chinese (zh)
Inventor
雷经纬
徐嘉禛
于子烨
罗响
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd ICBC
Priority to CN202311182938.5A
Publication of CN117271183A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/90335 Query processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/907 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Library & Information Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for acquiring a database abnormal job scheduling retry strategy, and relates to the field of big data, the field of financial technology, or other related technical fields. The method comprises the following steps: collecting a database error code in response to a job scheduling request sent by a user side, wherein the database error code is an error code generated by abnormal job scheduling while the database executes a job scheduling task; obtaining abnormal scheduling features based on the database error code, and determining, based on the abnormal scheduling features, the abnormal scheduling type to which the abnormal job scheduling belongs; and querying a retry strategy library based on the abnormal scheduling features and the abnormal scheduling type to obtain a retry strategy for the abnormal job scheduling. The invention solves the technical problem in the related art that, when an abnormal situation occurs while a database executes a job scheduling task, a retry mechanism based on a fixed number of retries struggles to resolve the abnormal situation and consumes additional database system resources.

Description

Method and device for acquiring database abnormal job scheduling retry strategy
Technical Field
The invention relates to the field of big data, the field of financial technology, or other related technical fields, and in particular to a method and a device for acquiring a database abnormal job scheduling retry strategy.
Background
With the rapid development of big data technology, data has become an important basis for business activities such as business decision making, risk control evaluation, and market analysis, and the efficiency and accuracy of data processing have an important influence on business operation. Ultra-large-scale data is often processed with MPP (massively parallel processing) distributed database clusters, which offer efficient parallel processing capability. When an MPP database cluster executes a job scheduling task, the large number of cluster nodes means that the failure of a single node can cause the whole data job to fail; meanwhile, because of the huge data scale, communication and data transmission among nodes may become bottlenecks that affect job efficiency.
In the related art, when abnormal job scheduling occurs while a database executes a job scheduling task, a retry module is generally added to the big data scheduling component, and the captured failing job is retried directly a fixed number of times. Specifically:
(1) Exception capture: an exception capturing mechanism is introduced into the big data scheduling component to monitor errors reported during job execution.
(2) Retry count limit: when an abnormality is captured, the retry module retries the job according to a preset number of retries, that is, every captured abnormality is retried the same fixed number of times.
(3) Retry interval limit: a retry strategy with a fixed time interval is set in the retry module, so that job processing completes within a limited time.
(4) Automatic retry: when the job is abnormal, the retry module automatically retries it according to the preset number of retries and interval; the job is re-executed on each retry, in the hope that repeated attempts resolve the problem encountered during execution.
(5) Retry result judgment: after the preset number of retries is reached, the final state of the job is judged from the retry result. If the job completes successfully within the retry limit, it is judged to have succeeded; if it still fails within the retry limit, it is judged to have failed.
Retrying abnormal job scheduling a fixed number of times has the following drawbacks. First, the existing scheme applies the same retry mechanism to all abnormal job scheduling, but system errors and abnormalities are diverse, and for some types of errors retrying cannot resolve the abnormality. For example, errors caused by temporary network jitter can be resolved by a fixed-count retry mechanism; however, if the abnormality is caused by high database load or by an incorrectly written SQL statement, fixed-count retries may become trapped in a futile retry loop, which fails to resolve the abnormality and increases the operation and maintenance workload. Second, the existing retry mechanism may burden the database. For example, when the underlying MPP distributed database is already under high load, blind retries further increase the pressure on the database and may even cause it to crash; and when the SQL statement itself has syntax errors or performance problems, it cannot succeed no matter how many times it is retried, and the retries only consume more database system resources. Excessive reliance on a simple retry mechanism therefore not only fails to solve the problem but may aggravate it, and in extreme cases may even cause systemic risk.
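The fixed-count retry scheme of the related art can be sketched as follows. This is a minimal illustration only, not code from the patent: the function name `run_with_fixed_retry`, its parameters, and the sample job `bad_sql_job` are hypothetical.

```python
import time
from typing import Callable, Tuple


def run_with_fixed_retry(
    job: Callable[[], None],
    max_retries: int = 3,
    interval_s: float = 0.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[bool, int]:
    """Run a job and retry every failure the same way.

    Returns (succeeded, attempts). Every exception is treated identically,
    which is the limitation the related art suffers from.
    """
    attempts = 0
    for _ in range(max_retries + 1):  # first run plus max_retries retries
        attempts += 1
        try:
            job()
            return True, attempts
        except Exception:
            if attempts <= max_retries:
                sleep(interval_s)  # fixed interval before the next retry
    return False, attempts


def bad_sql_job() -> None:
    """Hypothetical job whose SQL can never succeed."""
    raise RuntimeError("syntax error at or near FORM")
```

Calling `run_with_fixed_retry(bad_sql_job, max_retries=3)` executes the failing statement four times before giving up, although no number of retries could make it succeed.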
In view of the above problems, no effective solution has been proposed at present.
Disclosure of Invention
The embodiment of the invention provides a method and a device for acquiring a database abnormal job scheduling retry strategy, so as to at least solve the technical problem in the related art that, when an abnormal situation occurs while a database executes a job scheduling task, a retry mechanism based on a fixed number of retries struggles to resolve the abnormal situation and consumes additional database system resources.
According to an aspect of the embodiment of the invention, there is provided a method for acquiring a database abnormal job scheduling retry strategy, including: collecting a database error code in response to a job scheduling request sent by a user side, wherein the database error code is an error code generated by abnormal job scheduling while the database executes a job scheduling task, and the abnormal job scheduling is a scheduling process in which, during execution of the job scheduling task, the job becomes abnormal because the interaction between the user side and the database server is interrupted; obtaining abnormal scheduling features based on the database error code, and determining, based on the abnormal scheduling features, the abnormal scheduling type to which the abnormal job scheduling belongs; and querying a retry strategy library based on the abnormal scheduling features and the abnormal scheduling type to obtain a retry strategy for the abnormal job scheduling, wherein the retry strategy at least comprises: a number of retries and a retry interval duration.
Optionally, before responding to the job scheduling request sent by the user side, the method further comprises: collecting a historical database error code generated by executing a job scheduling task in a historical time period; and acquiring abnormal job scheduling information based on the historical database error code, and constructing an abnormal job database based on the historical database error code, the abnormal job scheduling information and the relation between the historical database error code and the abnormal job scheduling information.
Optionally, after constructing an abnormal job database based on the historical database error code, the abnormal job scheduling information, and the relationship between the historical database error code and the abnormal job scheduling information, the method further includes: extracting features of the abnormal job scheduling information in the abnormal job database to obtain the abnormal scheduling features; classifying the abnormal job scheduling based on the abnormal scheduling features to obtain an abnormal scheduling type to which the abnormal job scheduling belongs; and constructing an exception database based on the historical database error code, the abnormal scheduling feature, the abnormal scheduling type and the relation among the three.
Optionally, after constructing an exception database based on the historical database error code, the exception scheduling feature, the exception scheduling type, and a relationship between the three, further comprising: configuring the retry strategy for the abnormal job scheduling based on the abnormal scheduling type; and constructing the retry strategy library based on the abnormal job scheduling information corresponding to the abnormal job scheduling and the retry strategy.
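The three preparatory stores described above (abnormal job database, exception database, retry strategy library) could be built from history roughly as follows. This is a sketch under stated assumptions: the error codes, keyword rules, type names, and strategy values are all hypothetical, and the patent does not specify how features are extracted or classified; simple keyword matching stands in for that step here.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RetryStrategy:
    retry_count: int         # number of retries
    retry_interval_s: float  # retry interval duration, in seconds


# Hypothetical history: (error code, abnormal job scheduling information).
HISTORY: List[Tuple[str, str]] = [
    ("E1001", "connection reset while reading from segment"),
    ("E2001", "resource queue full: too many concurrent sessions"),
    ("E3001", "syntax error at or near FROM"),
]

# Hypothetical keyword rules: (keyword, feature, abnormal scheduling type).
RULES = [
    ("connection", "connection_lost", "network"),
    ("resource queue", "queue_full", "high_load"),
    ("syntax error", "sql_syntax", "user_error"),
]

# Hypothetical strategy configured per abnormal scheduling type.
TYPE_STRATEGIES: Dict[str, RetryStrategy] = {
    "network": RetryStrategy(3, 5.0),
    "high_load": RetryStrategy(2, 300.0),
    "user_error": RetryStrategy(0, 0.0),
}


def classify(info: str) -> Tuple[str, str]:
    """Return (feature, type) for one piece of scheduling information."""
    for keyword, feature, kind in RULES:
        if keyword in info:
            return feature, kind
    return "unknown", "unknown"


def build_stores(history: List[Tuple[str, str]]):
    # Abnormal job database: error code -> scheduling information.
    abnormal_job_db = dict(history)
    # Exception database: error code -> (feature, type).
    exception_db = {code: classify(info) for code, info in abnormal_job_db.items()}
    # Retry strategy library: (feature, type) -> strategy.
    strategy_library = {
        key: TYPE_STRATEGIES[key[1]]
        for key in exception_db.values()
        if key[1] in TYPE_STRATEGIES
    }
    return abnormal_job_db, exception_db, strategy_library
```

Keying the strategy library by the (feature, type) pair mirrors the query described in the method, which looks up a strategy using both values.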
Optionally, the step of collecting the database error code in response to the job scheduling request sent by the user terminal includes: generating a job scheduling task based on the job scheduling request, and calling a database interface to execute the job scheduling task; and generating an error code acquisition instruction based on the job scheduling task, and acquiring the database error code in real time in the execution process of the job scheduling task.
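Error code collection during task execution might look like the sketch below. The `DatabaseError` class, the `execute` callable standing in for the database interface, and the error code `E3001` are hypothetical; a real driver exposes its error codes in its own way.

```python
from typing import Callable, List, Optional


class DatabaseError(Exception):
    """Hypothetical driver error that carries the database error code."""

    def __init__(self, code: str, message: str):
        super().__init__(message)
        self.code = code


def run_job_and_collect(
    execute: Callable[[str], None], statements: List[str]
) -> Optional[str]:
    """Execute a job's statements through the database interface.

    Returns the first database error code raised, or None on success.
    """
    for sql in statements:
        try:
            execute(sql)
        except DatabaseError as err:
            return err.code  # collected as soon as the error occurs
    return None


def fake_execute(sql: str) -> None:
    """Stand-in database interface that rejects a misspelled keyword."""
    if "FORM" in sql:
        raise DatabaseError("E3001", "syntax error at or near FORM")
```

With this stand-in, `run_job_and_collect(fake_execute, ["SELECT 1", "SELECT * FORM t"])` returns the code attached to the failing statement.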
Optionally, the step of obtaining the abnormal scheduling features based on the database error code and determining the abnormal scheduling type to which the abnormal job scheduling belongs based on the abnormal scheduling features includes: querying the abnormal job database based on the database error code to obtain abnormal job scheduling information; and extracting features of the abnormal job scheduling information to obtain the abnormal scheduling features, and querying the exception database based on the abnormal scheduling features to obtain the abnormal scheduling type of the abnormal job scheduling.
Optionally, after querying a retry strategy library based on the abnormal scheduling features and the abnormal scheduling type to obtain a retry strategy of the abnormal job scheduling, the method further includes: executing the job scheduling retry operation based on the retry strategy of the abnormal job scheduling, and recording log data in the process of executing the retry operation; and evaluating the retry strategy based on log data in the process of executing the retry operation to obtain an evaluation result, and updating the retry strategy library according to the evaluation result.
According to another aspect of the embodiment of the present invention, there is also provided a device for acquiring a database abnormal job scheduling retry strategy, including: an acquisition unit, used for collecting a database error code in response to a job scheduling request sent by a user side, wherein the database error code is an error code generated by abnormal job scheduling while the database executes a job scheduling task, and the abnormal job scheduling is a scheduling process in which, during execution of the job scheduling task, the job becomes abnormal because the interaction between the user side and the database server is interrupted; a determining unit, used for obtaining abnormal scheduling features based on the database error code and determining, based on the abnormal scheduling features, the abnormal scheduling type to which the abnormal job scheduling belongs; and a query unit, used for querying a retry strategy library based on the abnormal scheduling features and the abnormal scheduling type to obtain a retry strategy for the abnormal job scheduling, wherein the retry strategy at least comprises: a number of retries and a retry interval duration.
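The evaluate-and-update step could be sketched as below. The patent does not state the evaluation criterion or the update rule, so both are assumptions here: the success rate of logged retries is the evaluation result, and a strategy whose success rate falls below a threshold has its interval doubled. All names and values are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Optional, Tuple

Key = Tuple[str, str]  # (abnormal scheduling feature, abnormal scheduling type)


@dataclass(frozen=True)
class RetryStrategy:
    retry_count: int
    retry_interval_s: float


@dataclass(frozen=True)
class RetryLog:
    key: Key
    succeeded: bool  # whether the job succeeded under this strategy


def success_rate(logs: List[RetryLog], key: Key) -> Optional[float]:
    """Evaluation result: share of logged jobs that succeeded, or None."""
    outcomes = [log.succeeded for log in logs if log.key == key]
    return sum(outcomes) / len(outcomes) if outcomes else None


def update_library(
    library: Dict[Key, RetryStrategy],
    logs: List[RetryLog],
    min_success_rate: float = 0.5,
) -> Dict[Key, RetryStrategy]:
    """Return a new library; poorly performing strategies wait twice as long.

    Doubling the interval is an assumed update rule, chosen for illustration.
    """
    updated = dict(library)
    for key, strategy in library.items():
        rate = success_rate(logs, key)
        if rate is not None and rate < min_success_rate:
            updated[key] = replace(
                strategy, retry_interval_s=strategy.retry_interval_s * 2
            )
    return updated
```

Returning a new dictionary instead of mutating the library in place keeps the previous strategies available for comparison or rollback.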
Optionally, the obtaining device of the database abnormal job scheduling retry strategy further includes: the first acquisition module is used for acquiring a historical database error code generated by executing a job scheduling task in a historical time period; the first construction module is used for acquiring abnormal job scheduling information based on the historical database error code and constructing an abnormal job database based on the historical database error code, the abnormal job scheduling information and the relation between the historical database error code and the abnormal job scheduling information.
Optionally, the obtaining device of the database abnormal job scheduling retry strategy further includes: the first extraction module is used for extracting the features of the abnormal job scheduling information in the abnormal job database to obtain the abnormal scheduling features; the first classification module is used for classifying the abnormal job scheduling based on the abnormal scheduling features to obtain an abnormal scheduling type to which the abnormal job scheduling belongs; and the second construction module is used for constructing an exception database based on the historical database error code, the abnormal scheduling features, the abnormal scheduling type and the relation among the three.
Optionally, the obtaining device of the database abnormal job scheduling retry strategy further includes: the first configuration module is used for configuring the retry strategy for the abnormal job scheduling based on the abnormal scheduling type; and the third construction module is used for constructing the retry strategy library based on the abnormal job scheduling information corresponding to the abnormal job scheduling and the retry strategy.
Optionally, the acquisition unit includes: the first generation module is used for generating a job scheduling task based on the job scheduling request and calling a database interface to execute the job scheduling task; and the second generation module is used for generating an error code acquisition instruction based on the job scheduling task and acquiring the database error code in real time in the execution process of the job scheduling task.
Optionally, the determining unit includes: the first query module is used for querying the abnormal job database based on the database error code to obtain abnormal job scheduling information; and the second extraction module is used for extracting the features of the abnormal job scheduling information to obtain the abnormal scheduling features, and querying the exception database based on the abnormal scheduling features to obtain the abnormal scheduling type of the abnormal job scheduling.
Optionally, the obtaining device of the database abnormal job scheduling retry strategy further includes: the first execution module is used for executing the job scheduling retry operation based on the retry strategy of the abnormal job scheduling and recording log data in the process of executing the retry operation; the first evaluation module is used for evaluating the retry strategy based on log data in the process of executing the retry operation, obtaining an evaluation result and updating the retry strategy library according to the evaluation result.
According to another aspect of the embodiment of the present invention, there is further provided a computer readable storage medium, where the computer readable storage medium includes a stored computer program, and when the computer program runs, the device where the computer readable storage medium is located is controlled to execute any one of the above methods for acquiring a database abnormal job scheduling retry strategy.
According to another aspect of the embodiment of the present invention, there is further provided an electronic device, including one or more processors and a memory, where the memory is configured to store one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement any one of the above methods for acquiring a database abnormal job scheduling retry strategy.
In the disclosure, a database error code is first collected in response to a job scheduling request sent by a user side, wherein the database error code is an error code generated by abnormal job scheduling while the database executes a job scheduling task, and the abnormal job scheduling is a scheduling process in which, during execution of the job scheduling task, the job becomes abnormal because the interaction between the user side and the database server is interrupted. Then, abnormal scheduling features are obtained based on the database error code, and the abnormal scheduling type to which the abnormal job scheduling belongs is determined based on the abnormal scheduling features. Finally, a retry strategy library is queried based on the abnormal scheduling features and the abnormal scheduling type to obtain a retry strategy for the abnormal job scheduling, wherein the retry strategy at least comprises: a number of retries and a retry interval duration.
In the disclosure, the abnormal condition that occurs while the database executes a job scheduling task is identified through the database error code; the abnormal scheduling features are obtained from the database error code, and the abnormal scheduling type to which the abnormal condition belongs is then determined. A pre-built retry strategy library is queried based on the abnormal scheduling features and the abnormal scheduling type, so that different retry strategies are configured for different abnormal job scheduling. This resolves abnormal conditions effectively and improves job timeliness, job success rate, and operation and maintenance efficiency, thereby solving the technical problem in the related art that, when an abnormal situation occurs while a database executes a job scheduling task, a retry mechanism based on a fixed number of retries struggles to resolve the abnormal situation and consumes additional database system resources.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention and together with the description serve to explain the invention and do not constitute a limitation on the invention. In the drawings:
FIG. 1 is a flow chart of an alternative method of acquiring a database abnormal job scheduling retry strategy in accordance with an embodiment of the present invention;
FIG. 2 is a schematic diagram of an alternative MPP distributed database-based abnormal job scheduling retry system in accordance with an embodiment of the present invention;
FIG. 3 is a diagram of an alternative abnormal job scheduling retry system in connection with an external system according to an embodiment of the present invention;
FIG. 4 is a schematic flow diagram illustrating the operation of an alternative abnormal job scheduling retry system according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an alternative database abnormal job scheduling retry strategy acquisition apparatus according to an embodiment of the present invention;
FIG. 6 is a block diagram of a hardware structure of an electronic device (or mobile device) for implementing a method for acquiring a database abnormal job scheduling retry strategy according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that, the method and the device for acquiring the database abnormal job scheduling retry strategy in the present disclosure may be used in the big data field when the retry strategy is configured for different types of database abnormal job scheduling, and may also be used in any field other than the big data field when the retry strategy is configured for different types of database abnormal job scheduling.
It should be noted that, related information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present disclosure are information and data authorized by a user or sufficiently authorized by each party, and the collection, use and processing of related data need to comply with related laws and regulations and standards of related areas, and are provided with corresponding operation entries for the user to select authorization or rejection. For example, an interface is provided between the system and the relevant user or institution, before acquiring the relevant information, the system needs to send an acquisition request to the user or institution through the interface, and acquire the relevant information after receiving the consent information fed back by the user or institution.
The following embodiments of the present invention are applicable to various retry strategy acquisition systems, applications, and devices. Through an automatic exception handling mechanism, the system can quickly identify and classify the error codes generated by the database and take corresponding exception handling measures, such as retry, fault recovery, or failover, which avoids delays to the job caused by abnormal conditions and ensures that the job is completed on time.
According to the invention, different retry strategies are configured for different types of abnormal job scheduling, so that different types of session failure conditions can be processed more accurately, and proper retry times, retry intervals and failure recovery mechanisms are selected, thereby being beneficial to improving the success rate of the job and reducing the job failure rate caused by session failure.
According to the invention, a retry strategy library is pre-constructed based on historical data, and an appropriate retry strategy is determined through an automatic query mechanism, so that abnormal operation scheduling can be rapidly and accurately identified and processed, manual intervention is not needed, the processing efficiency and accuracy are improved, and the occurrence frequency of systematic risks and errors is reduced.
The present invention will be described in detail with reference to the following examples.
Example 1
According to an embodiment of the present invention, there is provided an embodiment of a method for acquiring a database abnormal job scheduling retry strategy, it should be noted that the steps illustrated in the flowchart of the drawings may be performed in a computer system such as a set of computer executable instructions, and that although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different from that herein.
FIG. 1 is a flowchart of an alternative method for acquiring a database abnormal job scheduling retry strategy according to an embodiment of the present invention, as shown in FIG. 1, the method includes the steps of:
Step S101, collecting a database error code in response to a job scheduling request sent by a user side, wherein the database error code is an error code generated by abnormal job scheduling while the database executes a job scheduling task, and the abnormal job scheduling is a scheduling process in which, during execution of the job scheduling task, the job becomes abnormal because the interaction between the user side and the database server is interrupted;
Step S102, obtaining abnormal scheduling features based on the database error code, and determining, based on the abnormal scheduling features, the abnormal scheduling type to which the abnormal job scheduling belongs;
Step S103, querying a retry strategy library based on the abnormal scheduling features and the abnormal scheduling type to obtain a retry strategy for the abnormal job scheduling, wherein the retry strategy at least comprises: a number of retries and a retry interval duration.
Through the above steps, a database error code is first collected in response to a job scheduling request sent by a user side, wherein the database error code is an error code generated by abnormal job scheduling while the database executes a job scheduling task, and the abnormal job scheduling is a scheduling process in which, during execution of the job scheduling task, the job becomes abnormal because the interaction between the user side and the database server is interrupted. Then, abnormal scheduling features are obtained based on the database error code, and the abnormal scheduling type to which the abnormal job scheduling belongs is determined based on the abnormal scheduling features. Finally, a retry strategy library is queried based on the abnormal scheduling features and the abnormal scheduling type to obtain a retry strategy for the abnormal job scheduling, wherein the retry strategy at least comprises: a number of retries and a retry interval duration.
In this embodiment, the abnormal condition that occurs while the database executes a job scheduling task is identified through the database error code; the abnormal scheduling features are obtained from the database error code, and the abnormal scheduling type to which the abnormal condition belongs is then determined. A pre-built retry strategy library is queried based on the abnormal scheduling features and the abnormal scheduling type, so that different retry strategies are configured for different abnormal job scheduling. This resolves abnormal conditions effectively and improves job timeliness, job success rate, and operation and maintenance efficiency, thereby solving the technical problem in the related art that, when an abnormal situation occurs while a database executes a job scheduling task, a retry mechanism based on a fixed number of retries struggles to resolve the abnormal situation and consumes additional database system resources.
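Steps S102 and S103 can be illustrated with a small lookup chain that starts from an error code already collected in step S101. The store contents, keyword rules, and strategy values below are hypothetical placeholders, and keyword matching is only an assumed stand-in for feature extraction.

```python
from typing import Dict, NamedTuple, Optional, Tuple


class RetryStrategy(NamedTuple):
    retry_count: int         # number of retries
    retry_interval_s: float  # retry interval duration, in seconds


# Hypothetical abnormal job database: error code -> scheduling information.
ABNORMAL_JOB_DB: Dict[str, str] = {
    "E1001": "connection reset while reading from segment",
    "E2001": "resource queue full: too many concurrent sessions",
}
# Hypothetical feature extraction rules: keyword -> feature.
FEATURE_KEYWORDS: Dict[str, str] = {
    "connection reset": "connection_lost",
    "resource queue full": "queue_full",
}
# Hypothetical exception database: feature -> abnormal scheduling type.
EXCEPTION_DB: Dict[str, str] = {
    "connection_lost": "network",
    "queue_full": "high_load",
}
# Hypothetical retry strategy library: (feature, type) -> strategy.
STRATEGY_LIBRARY: Dict[Tuple[str, str], RetryStrategy] = {
    ("connection_lost", "network"): RetryStrategy(3, 5.0),
    ("queue_full", "high_load"): RetryStrategy(2, 300.0),
}


def extract_feature(info: str) -> Optional[str]:
    for keyword, feature in FEATURE_KEYWORDS.items():
        if keyword in info:
            return feature
    return None


def acquire_retry_strategy(error_code: str) -> Optional[RetryStrategy]:
    """S102: error code -> feature -> type; S103: (feature, type) -> strategy."""
    info = ABNORMAL_JOB_DB.get(error_code)
    if info is None:
        return None  # unknown error code: no strategy on record
    feature = extract_feature(info)
    kind = EXCEPTION_DB.get(feature)
    return STRATEGY_LIBRARY.get((feature, kind))
```

An error code with no record yields `None`, leaving the caller to decide how to handle abnormalities that the history has not covered.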
Embodiments of the present invention will be described in detail with reference to the following steps.
In data job scheduling, processing massive data often requires a massively parallel processing (MPP) distributed database cluster, and job scheduling tasks are executed on a plurality of nodes in the database cluster so that a given job is completed.
Note that abnormal job scheduling includes, but is not limited to, network connection abnormalities, high database load and user usage errors. The existing retry mechanism configures the same number of retries and the same retry interval for all of these conditions and therefore cannot handle job scheduling abnormalities effectively. For a network connection abnormality, multiple retries can resolve the abnormality; for abnormalities such as high database load or user usage errors, however, repeated retries do not solve the actual problem, and the retries themselves consume database resources and increase the burden on the database.
In view of the above problems, the embodiment of the invention configures different retry strategies for different types of abnormal job scheduling of the database. For a temporary abnormality caused by the network, a multiple-retry strategy can be set. For a high database load, a deferred retry strategy can be set. For an abnormality caused by a user usage error, such as an incorrectly written SQL statement, retrying cannot resolve the abnormality, so the number of retries can be set to 0 and feedback information is sent to the operation and maintenance terminal, which provides a solution.
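As a non-limiting illustration, the three cases described above can be sketched as a mapping from abnormal scheduling type to retry strategy. The type names, the numeric values and the fallback for unknown types are assumptions made for this example and are not taken from the embodiment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    max_retries: int          # number of retries
    interval_seconds: float   # retry interval duration
    notify_ops: bool = False  # send feedback to the operation and maintenance terminal


# Hypothetical type names and values, chosen only to mirror the three cases above.
POLICY_BY_TYPE = {
    "network_transient": RetryPolicy(max_retries=5, interval_seconds=10),
    "database_high_load": RetryPolicy(max_retries=2, interval_seconds=600),
    "user_error": RetryPolicy(max_retries=0, interval_seconds=0, notify_ops=True),
}


def policy_for(anomaly_type: str) -> RetryPolicy:
    """Return the retry policy for a type; unknown types are not retried (an assumption)."""
    return POLICY_BY_TYPE.get(anomaly_type, RetryPolicy(0, 0, notify_ops=True))
```

In this sketch a user error is never retried and is escalated instead, while a transient network fault is retried more often and at shorter intervals than a high-load condition.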
Optionally, before responding to the job scheduling request sent by the user side, the method further comprises: collecting a historical database error code generated by executing a job scheduling task in a historical time period; and acquiring abnormal job scheduling information based on the historical database error code, and constructing an abnormal job database based on the historical database error code, the abnormal job scheduling information and the relation between the historical database error code and the abnormal job scheduling information.
It should be noted that, in the embodiment of the present invention, the database error codes generated in a historical time period are collected and integrated to construct an abnormal job database, in which each database error code corresponds to specific abnormal job scheduling information. The database error code refers to an SQL standard error code, which is a string of five characters composed of a 2-character SQL error class and a 3-character subclass. Each of the five characters is a digit or a capital letter, and the string encodes an error or warning condition. The database error code for a successfully executed job is 00000; any other database error code represents an abnormal condition of the database when executing job scheduling based on SQL statements. The following are examples of some common database error codes: 20000, Integrity Constraint Violation (integrity constraint violation); 23000, Unique Constraint Violation (unique constraint violation); 23503, Foreign Key Violation (foreign key violation); 99001, Timeout (access timeout); 23505, Unique Violation (unique value violation); 22001, String Data Right Truncation (string data truncation); 22003, Numeric Value Out of Range (value out of range); 42000, Syntax Error or Access Rule Violation (syntax error); 40001, Serialization Failure (transaction concurrency conflict); 08006, Connection Failure (database connection failure). In short, different database error codes represent different abnormal scheduling conditions and different abnormal scheduling types, and the retry strategies to be acquired differ accordingly.
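A minimal sketch of how such a five-character code can be validated and split into its class and subclass is given below. The function names are hypothetical; the format check follows only what is stated above (five characters, digits or capital letters, 00000 meaning success).

```python
import re

# Five characters, each a digit or a capital letter.
SQLSTATE_PATTERN = re.compile(r"[0-9A-Z]{5}")


def split_sqlstate(code: str) -> tuple[str, str]:
    """Split an error code into its 2-character class and 3-character subclass."""
    if not SQLSTATE_PATTERN.fullmatch(code):
        raise ValueError(f"not a five-character SQL error code: {code!r}")
    return code[:2], code[2:]


def is_abnormal(code: str) -> bool:
    """Any code other than 00000 indicates an abnormal condition."""
    return code != "00000"
```

For example, `split_sqlstate("08006")` separates the connection-related class `08` from the subclass `006`.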
It should be noted that, the field information recorded in the abnormal job database includes, but is not limited to: anomaly identification, anomaly codes (corresponding to the database error codes), anomaly description information.
Optionally, after constructing the abnormal job database based on the historical database error code, the abnormal job scheduling information, and the relationship of the historical database error code and the abnormal job scheduling information, further comprising: extracting features of abnormal job scheduling information in an abnormal job database to obtain abnormal scheduling features; classifying the abnormal job scheduling based on the abnormal scheduling characteristics to obtain an abnormal scheduling type to which the abnormal job scheduling belongs; and constructing an abnormal database based on the historical database error codes, the abnormal scheduling characteristics, the abnormal scheduling types and the relations among the three.
After the abnormal job database is established, feature extraction is performed on the abnormal job scheduling information recorded in it: the key characters or keywords that characterize the abnormal condition are extracted from the abnormal job scheduling information. The abnormal job scheduling is then classified according to the extracted abnormal scheduling features to obtain the abnormal scheduling type to which it belongs. An abnormal database is constructed from the historical database error codes, the extracted abnormal scheduling features, the abnormal scheduling types and the association relations among the three, and the historical database error codes and abnormal scheduling features are stored in the abnormal database in separate blocks according to abnormal scheduling type.
It should be noted that, the field information recorded in the exception database includes, but is not limited to: an exception identifier, an exception code, an exception key (corresponding to the exception scheduling feature), and a type identifier and a type name corresponding to an exception type (corresponding to the exception scheduling type).
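The feature extraction and classification described above can be sketched as a simple keyword match. The keyword lists, the type names and the "most keyword hits wins" rule are assumptions for illustration; the embodiment does not specify a particular classification method.

```python
import re

# Hypothetical keywords per abnormal scheduling type.
KEYWORDS_BY_TYPE = {
    "network": ("connection", "timeout", "network"),
    "high_load": ("resource", "load", "memory"),
    "user_error": ("syntax", "constraint", "truncation"),
}
ALL_KEYWORDS = {kw for kws in KEYWORDS_BY_TYPE.values() for kw in kws}


def extract_features(description: str) -> set[str]:
    """Keep only the words of the description that are known keywords."""
    return set(re.findall(r"[a-z]+", description.lower())) & ALL_KEYWORDS


def classify(features: set[str]) -> str:
    """Pick the type with the most keyword hits, or 'unknown' if none match."""
    best_type, best_hits = "unknown", 0
    for type_name, keywords in KEYWORDS_BY_TYPE.items():
        hits = len(features & set(keywords))
        if hits > best_hits:
            best_type, best_hits = type_name, hits
    return best_type


def build_abnormal_database(abnormal_jobs: dict[str, str]) -> dict[str, list[dict]]:
    """Group error codes and their keywords into blocks keyed by abnormal type."""
    blocks: dict[str, list[dict]] = {}
    for code, description in abnormal_jobs.items():
        features = extract_features(description)
        blocks.setdefault(classify(features), []).append(
            {"code": code, "keywords": sorted(features)}
        )
    return blocks
```

Grouping the records by type mirrors the block-wise storage of error codes and features described for the abnormal database.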
Optionally, after constructing the exception database based on the historical database error code, the exception scheduling feature, the exception scheduling type and the relationship between the three, the method further comprises: a retry strategy is configured for abnormal job scheduling based on the abnormal scheduling type; and constructing a retry strategy library based on the abnormal job scheduling information corresponding to the abnormal job scheduling and the retry strategy.
It should be noted that, in the embodiment of the present invention, corresponding retry strategies are configured according to the different types of abnormal job scheduling, and a retry strategy may include the number of retries and the retry interval. For example, for an abnormal condition caused by temporary network jitter, multiple retry operations with a short interval may be configured, that is, the job is retried several times within a short period, and the number of retries may be adjusted in real time according to the actual running state of the database, so that the data job is executed as soon as the network recovers and the running efficiency of the database is improved. For an abnormal condition caused by high database load, a longer interval and fewer, deferred retries may be configured, or the retry may be performed after part of the database load has been released, so as to avoid further increasing the load, or even crashing the database, through repeated retries during a period of high load.
It should be noted that, the field information recorded in the retry policy repository includes, but is not limited to: policy identification, policy name, number of retries, retry interval time, cluster name, database name.
Step S101, responding to a job scheduling request sent by a user terminal, and collecting database error codes.
It should be noted that the user side sends a job scheduling request to the database server, and the database server schedules the nodes of the server cluster to execute the data job. While the data job is being executed, database error codes are collected in real time; a database error code other than 00000 indicates that a node has encountered an abnormal condition while executing the data job.
Optionally, the step of collecting the database error code in response to the job scheduling request sent by the user terminal includes: generating a job scheduling task based on the job scheduling request, and calling a database interface to execute the job scheduling task; and generating an error code acquisition instruction based on the job scheduling task, and acquiring database error codes in real time in the execution process of the job scheduling task.
After the job scheduling request is received, a job scheduling task is generated according to the request and a database interface is called to execute the task; an error code acquisition instruction is then generated, and the database error codes produced during job scheduling are collected in real time.
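The collection step can be sketched as a wrapper that runs a task and reports the resulting error code. `DatabaseError` here is a hypothetical stand-in for whatever exception the database interface raises, and the sketch reports one code per task at completion rather than streaming codes in real time.

```python
from typing import Callable


class DatabaseError(Exception):
    """Hypothetical driver exception carrying a five-character SQL error code."""

    def __init__(self, sqlstate: str, message: str = "") -> None:
        super().__init__(message)
        self.sqlstate = sqlstate


def run_and_collect(task: Callable[[], None]) -> str:
    """Execute a job scheduling task and return the database error code it produced."""
    try:
        task()
    except DatabaseError as exc:
        return exc.sqlstate  # abnormal job scheduling: hand the code to the next step
    return "00000"           # successful execution
```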
Step S102, obtaining abnormal scheduling characteristics based on the database error codes, and determining the abnormal scheduling type of the abnormal job scheduling based on the abnormal scheduling characteristics.
Optionally, the step of obtaining the abnormal scheduling feature based on the database error code and determining the abnormal scheduling type to which the abnormal job schedule belongs based on the abnormal scheduling feature includes: inquiring an abnormal job database based on the database error code to obtain abnormal job scheduling information; and extracting features of the abnormal job scheduling information to obtain abnormal scheduling features, and inquiring an abnormal database based on the abnormal scheduling features to obtain an abnormal scheduling type to which the abnormal job scheduling belongs.
After the database error code is collected, the abnormal job database is queried to obtain the abnormal job scheduling information and determine the specific condition of the abnormal job scheduling. Features are then extracted from the abnormal job scheduling information to obtain the abnormal scheduling features, and the abnormal database is queried with the abnormal scheduling features as query conditions to determine the abnormal scheduling type to which the abnormal job scheduling belongs.
Step S103, inquiring a retry strategy library based on the abnormal scheduling characteristics and the abnormal scheduling types to obtain a retry strategy of abnormal job scheduling.
It should be noted that, after the abnormal scheduling type to which the abnormal job scheduling belongs is determined, the corresponding position in the retry strategy library is located and the retry strategy for the abnormal condition is queried. For example, for an abnormal condition caused by an incorrectly written SQL query statement, the retry strategy is 0 retries; for a connection failure caused by the network, the retry strategy is 10 retries with an interval of 1 minute between retries.
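Steps S101 to S103 can be illustrated end to end with three in-memory lookups. The dictionary contents reuse the two examples given above (0 retries for an SQL writing error, 10 retries at 1-minute intervals for a network connection failure); the dictionary layout and the word-by-word feature matching are assumptions made for this sketch.

```python
from typing import Optional

# Abnormal job database: error code -> abnormal job scheduling information.
ABNORMAL_JOB_DB = {
    "42000": "Syntax Error or Access Rule Violation",
    "08006": "Connection Failure",
}
# Abnormal database: abnormal scheduling feature (keyword) -> abnormal scheduling type.
ABNORMAL_DB = {"syntax": "user_error", "connection": "network"}
# Retry strategy library: (feature, type) -> (number of retries, interval in seconds).
RETRY_STRATEGY_LIBRARY = {
    ("syntax", "user_error"): (0, 0),
    ("connection", "network"): (10, 60),
}


def lookup_retry_strategy(error_code: str) -> Optional[tuple[int, int]]:
    """Resolve an error code to a retry strategy, or None if nothing matches."""
    description = ABNORMAL_JOB_DB.get(error_code)
    if description is None:
        return None
    for word in description.lower().split():
        anomaly_type = ABNORMAL_DB.get(word)
        if anomaly_type is not None:
            return RETRY_STRATEGY_LIBRARY.get((word, anomaly_type))
    return None
```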
Optionally, after querying the retry strategy library based on the abnormal scheduling feature and the abnormal scheduling type to obtain the retry strategy of the abnormal job scheduling, the method further comprises: executing the operation scheduling retry operation based on the retry strategy of the abnormal operation scheduling, and recording log data in the process of executing the retry operation; and evaluating the retry strategy based on the log data in the process of executing the retry operation to obtain an evaluation result, and updating the retry strategy library according to the evaluation result.
It should be noted that, when the retry operation is executed, the current retry strategy is sent to the Kafka error center for subsequent error information collection, monitoring and logging. At the same time, log data generated while the retry operation is executed may be obtained, and the retry strategy is evaluated according to the log data. Whether the retry strategy is suitable for this type of abnormal job scheduling is analyzed from the evaluation result, and the retry strategy in the retry strategy library is updated according to the analysis result.
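A minimal sketch of executing retries under a strategy while recording a log, and of a simple evaluation over the collected logs, is shown below. The log format and the success-rate metric are assumptions; the embodiment does not prescribe a specific evaluation formula, and forwarding to the Kafka error center is omitted.

```python
import time
from typing import Callable

RetryLog = list[tuple[int, bool]]  # (attempt number, whether the attempt succeeded)


def run_retries(
    task: Callable[[], bool],
    max_retries: int,
    interval_seconds: float,
    sleep: Callable[[float], None] = time.sleep,
) -> RetryLog:
    """Retry a failed task up to max_retries times, waiting between attempts."""
    log: RetryLog = []
    for attempt in range(1, max_retries + 1):
        sleep(interval_seconds)
        succeeded = task()
        log.append((attempt, succeeded))
        if succeeded:
            break
    return log


def success_rate(logs: list[RetryLog]) -> float:
    """Fraction of non-empty retry runs whose final attempt succeeded."""
    runs = [log for log in logs if log]
    if not runs:
        return 0.0
    return sum(1 for log in runs if log[-1][1]) / len(runs)
```

A persistently low success rate for one abnormal scheduling type would suggest that its strategy in the retry strategy library should be revised, for example by reducing the number of retries.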
The following detailed description is directed to alternative embodiments.
FIG. 2 is a schematic diagram of an alternative MPP distributed database-based abnormal job scheduling retry system according to an embodiment of the present invention. As shown in FIG. 2, the system is composed of three modules connected by a bus 21: a configuration management module 22, a task scheduling module 25 and a retry management module 29. Each module is responsible for one or more specific functions, and together they realize normal operation and exception handling of the system. Specifically, the system comprises:
the bus 21, which is the core component of the abnormal job scheduling retry system, is responsible for connecting in series and coordinating the work of all modules, and provides a mechanism for message passing and event triggering, ensuring the communication and cooperative work between the modules.
The configuration management module 22 is responsible for parameter analysis and environment inspection of the system and consists of a parameter analysis module 23 and an environment inspection module 24.
The parameter analysis module 23 is responsible for analyzing configuration parameters of the system, including database connection parameters, system operation parameters and the like, and transmitting the analyzed parameters to other modules for use.
The environment checking module 24 is responsible for checking the running environment of the system, including database connection status, system resource status, etc., and based on the checking result, the module can discover and process some potential problems in advance, so as to avoid the problems affecting the normal running of the system.
The task scheduling module 25 is responsible for task orchestration, task execution and exception handling of the system, and is composed of a task orchestration module 26, a task execution module 27 and an exception handling module 28.
The task orchestration module 26 is responsible for task orchestration of the system, including allocation of tasks, priority setting, scheduling of policies, and the like.
The task execution module 27 is responsible for task execution of the system, and communicates with the underlying MPP distributed database, where the task execution includes: start, monitor, stop, etc. of the data processing operation.
The exception handling module 28 is responsible for handling exceptions of the system and communicates with the retry strategy module 210 via the bus 21. Its tasks include detecting and recording job exceptions and selecting an appropriate handling strategy according to the type of exception.
The retry management module 29 is responsible for retry strategy acquisition, retry process control and retry condition analysis of the system, and consists of a retry strategy module 210, a retry control module 211 and a retry analysis module 212. Through the cooperation of these three modules, automatic exception handling can be realized, big data job scheduling is optimized, system stability and efficiency are improved, and the likelihood of systemic risk is reduced.
The retry strategy module 210 is responsible for acquiring retry strategies for the system, including designing and selecting retry strategies and updating and maintaining them. The specific maintenance content includes: (1) retry strategy maintenance: responsible for setting and updating retry strategies; based on historical data generated during past operation of the system, different retry strategies, including but not limited to the number of retries, retry intervals and retry conditions, are customized for different types of abnormal job scheduling, and the retry strategies are updated periodically or when necessary to adapt to changes in the system; (2) database error code maintenance: the database error code is an important way for the database to report errors, and by capturing and analyzing database error codes the system can learn the cause of a database job failure and handle it accordingly; (3) association relation maintenance: responsible for maintaining the various association relations in the system, including those between tasks, between tasks and databases, and between databases and error codes, which are an important basis for normal operation of the system; (4) cache update: to improve operating efficiency, the system stores some commonly used data or calculation results in a cache, but as the system runs and the data changes, the cached data may expire or become inaccurate, so it needs to be updated periodically or when necessary.
The retry control module 211 is responsible for retry process control of the system, including start, monitor, and stop of the retry, and adjusts the retry strategy according to the actual situation.
The retry analysis module 212 is responsible for the retry condition analysis of the system, including collecting and analyzing retry logs, extracting retry key indicators, and counting and analyzing retry results.
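The message passing and event triggering role of the bus 21 can be sketched as a small publish/subscribe object through which, for instance, the exception handling module notifies the retry strategy module of a captured error code. The class, the topic name and the handler signature are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable


class Bus:
    """Minimal in-process bus: modules subscribe to topics and publish events."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Callable[[Any], Any]]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Any], Any]) -> None:
        self._handlers[topic].append(handler)

    def publish(self, topic: str, payload: Any) -> list[Any]:
        """Deliver the payload to every subscriber and collect their replies."""
        return [handler(payload) for handler in self._handlers[topic]]


# Hypothetical wiring: the retry strategy module answers "job_failed" events.
bus = Bus()
bus.subscribe("job_failed", lambda code: ("lookup_retry_strategy", code))
replies = bus.publish("job_failed", "08006")
```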
FIG. 3 is a connection diagram of an alternative abnormal job scheduling retry system and an external system according to an embodiment of the present invention. As shown in FIG. 3, the big data platform system is divided, in its deployed logical structure, into a terminal layer, a service layer, a data layer and a database layer. The service layer and the data layer together form the abnormal job scheduling retry system, which is connected to the terminal layer and the database layer and interacts with them to jointly execute job scheduling tasks.
The terminal layer provides the user with the operation interfaces of the big data platform and comprises the following parts: a big data workstation, which provides developers with tools and an environment for analyzing and processing data; an operation and maintenance cockpit, which provides operation and maintenance personnel with tools for monitoring and managing the running state of the big data platform; and an operation warehouse, which provides operators with tools for data analysis and service management.
The service layer comprises an access channel and a retry management module.
The access channel is responsible for flexible scheduling, unified scheduling and operation and maintenance monitoring, and specifically comprises scheduling and monitoring of data processing tasks and automatic submitting of the jobs to the MPP distributed database cluster.
The retry management module mainly comprises three functional modules, namely retry strategy acquisition, retry control and retry log uploading. Through the operation and maintenance monitoring access channel, it supports operation and maintenance personnel in maintaining retry strategies, error codes and association relations and in updating the rule cache.
The data layer stores information related to historical abnormal job scheduling, including error information (corresponding to the above abnormal job scheduling information), retry strategies and strategy/error associations. The data is stored in the form of libraries, namely an abnormal job database, an abnormal database and a retry strategy library. The field information recorded in the abnormal job database includes, but is not limited to: anomaly identification, anomaly code (corresponding to the database error code) and anomaly description information. The field information recorded in the abnormal database includes, but is not limited to: an exception identifier, a type identifier and a type name corresponding to the exception type (corresponding to the abnormal scheduling type), an exception code, and an exception keyword (corresponding to the abnormal scheduling feature). The field information recorded in the retry strategy library includes, but is not limited to: strategy identification, strategy name, number of retries, retry interval time, cluster name and database name.
The database layer consists of a plurality of MPP distributed database clusters (MPP1, MPP2 and MPP3 are shown in fig. 4) that store service data; as the core of the big data platform, it stores and processes the service data.
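As one possible illustration, the three libraries of the data layer can be laid out as relational tables whose columns follow the field lists above. The table names, column names and column types are assumptions, and an in-memory SQLite database is used only to keep the sketch self-contained.

```python
import sqlite3

SCHEMA = """
CREATE TABLE abnormal_job (
    anomaly_id   INTEGER PRIMARY KEY,   -- anomaly identification
    anomaly_code TEXT NOT NULL,         -- database error code
    description  TEXT NOT NULL          -- anomaly description information
);
CREATE TABLE abnormal (
    anomaly_id   INTEGER PRIMARY KEY,   -- exception identifier
    type_id      INTEGER NOT NULL,      -- type identifier of the abnormal scheduling type
    type_name    TEXT NOT NULL,
    anomaly_code TEXT NOT NULL,
    keyword      TEXT NOT NULL          -- abnormal scheduling feature
);
CREATE TABLE retry_policy (
    policy_id        INTEGER PRIMARY KEY,
    policy_name      TEXT NOT NULL,
    retry_count      INTEGER NOT NULL,  -- number of retries
    retry_interval_s INTEGER NOT NULL,  -- retry interval time in seconds
    cluster_name     TEXT,
    database_name    TEXT
);
"""


def create_data_layer() -> sqlite3.Connection:
    """Create the three libraries as tables in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn
```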
FIG. 4 is a schematic operation flow diagram of an alternative abnormal job scheduling retry system according to an embodiment of the present invention, and as shown in FIG. 4, an acquisition flow of an abnormal job scheduling retry policy performed by the abnormal job scheduling retry system includes:
Step one, start: deployment of each module of the system is completed, and the main control program is started;
Step two, the configuration management module 22 is started and a self-checking task is launched;
Step three, the retry management module is initialized: initial configuration of the retry strategies, error codes and association relations is carried out, and a retry strategy cache is generated locally. Abnormal job scheduling information is acquired based on the database error codes of a historical time period and an abnormal job database is constructed; feature extraction and classification are then performed on the abnormal job scheduling information to obtain abnormal scheduling features and abnormal scheduling types; an abnormal database is constructed from the database error codes, the abnormal scheduling features and the abnormal scheduling types; and a retry strategy is configured for each kind of abnormal job scheduling to construct a retry strategy library;
Step four, the task scheduling module is started, and data processing tasks are submitted to the underlying MPP distributed database clusters when triggered by the rules of the task orchestration module 26;
Step five, the task scheduling module handles the abnormal condition: the exception handling module 28 captures the error and reports the error information to the Kafka error center, and the retry management module 29 intervenes;
Step six, the retry management module matches a retry strategy: based on the MPP distributed database standard SQL error code (corresponding to the database error code) captured by the exception handling module 28, the retry strategy module 210 extracts the abnormal scheduling features, determines the abnormal scheduling type to which the abnormal job scheduling belongs, and then queries the pre-built retry strategy library to match a retry strategy immediately;
Step seven, the retry management module executes the retry operation: the job is retried according to the finally selected retry strategy, which includes the retry waiting interval and the number of retries, and the retry information is written to the error center; when the retry is executed, the current retry strategy is sent to the Kafka error center for subsequent error information collection, monitoring and logging;
Step eight, the retry management module analyzes the retry condition: the retry analysis module 212 tracks the running state of the MPP distributed database and the jobs and generates an abnormality analysis report, whose analysis indicators include: (1) error type distribution, which analyzes which error type occurs most often and how the error types are distributed; (2) retry mechanism evaluation, which analyzes whether the current retry mechanism is appropriate for a particular error type; (3) retry success rate trend, which analyzes the number of successful retries per day and how that number changes over time. Based on the results of the log analysis, the retry strategies and the exception handling mechanism are continuously improved so as to raise the retry success rate and job execution efficiency;
Step nine, the retry management module opens maintenance interfaces: according to the running state of the system, the retry management module 29 opens maintenance interfaces for the retry strategies, error codes and association relations so that operation and maintenance personnel can adjust and optimize the system;
Step ten, the big data job scheduling ends, and the system waits for the next job scheduling task.
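Two of the analysis indicators named in step eight, the error type distribution and the daily count of successful retries, can be computed from retry log records as sketched below. The record layout (keys `day`, `type` and `succeeded`) is an assumption for illustration.

```python
from collections import Counter


def error_type_distribution(records: list[dict]) -> dict[str, float]:
    """Share of each error type among all recorded retries."""
    counts = Counter(record["type"] for record in records)
    total = sum(counts.values())
    return {error_type: n / total for error_type, n in counts.items()}


def daily_success_counts(records: list[dict]) -> dict[str, int]:
    """Number of successful retries per day, ordered by day, to show the trend."""
    counts = Counter(record["day"] for record in records if record["succeeded"])
    return dict(sorted(counts.items()))


# Hypothetical retry log records.
records = [
    {"day": "2023-09-01", "type": "network", "succeeded": True},
    {"day": "2023-09-01", "type": "network", "succeeded": False},
    {"day": "2023-09-02", "type": "high_load", "succeeded": True},
    {"day": "2023-09-02", "type": "network", "succeeded": True},
]
distribution = error_type_distribution(records)
trend = daily_success_counts(records)
```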
According to the embodiment of the invention, through an automatic exception handling mechanism, the system can quickly identify and classify the error codes generated by the database and take corresponding exception handling measures, such as retry, fault recovery or failover, so that the loss of job timeliness caused by abnormal conditions is avoided and jobs can be completed on time.
According to the embodiment of the invention, different retry strategies are configured for different types of abnormal job scheduling, so that different types of session failure conditions can be processed more accurately, and proper retry times, retry intervals and failure recovery mechanisms are selected, thereby being beneficial to improving the success rate of the job and reducing the job failure rate caused by session failure.
According to the embodiment of the invention, the retry strategy library is pre-constructed based on historical data, and the appropriate retry strategy is determined through an automatic query mechanism, so that abnormal operation scheduling can be rapidly and accurately identified and processed, manual intervention is not needed, the processing efficiency and accuracy are improved, and the occurrence frequency of systematic risks and errors is reduced.
The following describes in detail another embodiment.
Embodiment Two
The apparatus for acquiring a database abnormal job scheduling retry strategy provided in this embodiment includes a plurality of implementation units, each of which corresponds to each implementation step in the first embodiment.
FIG. 5 is a schematic diagram of an alternative apparatus for acquiring a database abnormal job scheduling retry strategy according to an embodiment of the present invention. As shown in FIG. 5, the apparatus includes: a collecting unit 51, a determining unit 52 and a query unit 53, wherein,
The collecting unit 51 is configured to collect a database error code in response to a job scheduling request sent by the user side, where the database error code is an error code generated based on abnormal job scheduling in a process of executing a job scheduling task by the database, and the abnormal job scheduling is a scheduling process in which an abnormal job occurs due to interaction interruption between the user side and the database server in the process of executing the job scheduling task;
a determining unit 52, configured to obtain an abnormal scheduling feature based on the database error code, and determine an abnormal scheduling type to which the abnormal job scheduling belongs based on the abnormal scheduling feature;
a query unit 53, configured to query a retry policy library based on the abnormal scheduling feature and the abnormal scheduling type, to obtain a retry policy of abnormal job scheduling, where the retry policy at least includes: number of retries, duration of retry interval.
In the above apparatus for acquiring a retry strategy, the collecting unit 51 collects a database error code in response to a job scheduling request sent by the user side, where the database error code is an error code generated by the database due to abnormal job scheduling in the process of executing a job scheduling task, and abnormal job scheduling is a scheduling process in which, during execution of the job scheduling task, an abnormality occurs because the interaction between the user side and the database server is interrupted. The determining unit 52 obtains an abnormal scheduling feature based on the database error code and determines the abnormal scheduling type to which the abnormal job scheduling belongs based on the abnormal scheduling feature. The query unit 53 queries a retry strategy library based on the abnormal scheduling feature and the abnormal scheduling type to obtain a retry strategy for the abnormal job scheduling, where the retry strategy includes at least: the number of retries and the retry interval duration.
In this embodiment, the abnormal condition of the database in the process of executing the job scheduling task is identified through the database error code, the abnormal scheduling feature is obtained from the database error code, and the abnormal scheduling type to which the abnormal condition belongs is then determined. A pre-built retry strategy library is queried based on the abnormal scheduling feature and the abnormal scheduling type, so that different retry strategies are configured for different kinds of abnormal job scheduling. The abnormal condition is thereby handled effectively, and job timeliness, job success rate and operation and maintenance efficiency are improved. This solves the technical problem in the related art that, when an abnormal condition occurs while the database executes a job scheduling task, a retry mechanism with a fixed number of retries can hardly resolve the abnormal condition and consumes considerable database system resources.
Optionally, the obtaining device of the database abnormal job scheduling retry strategy further includes: the first acquisition module is used for acquiring a historical database error code generated by executing a job scheduling task in a historical time period; the first construction module is used for acquiring abnormal job scheduling information based on the historical database error code and constructing an abnormal job database based on the historical database error code, the abnormal job scheduling information and the relation between the historical database error code and the abnormal job scheduling information.
Optionally, the obtaining device of the database abnormal job scheduling retry strategy further includes: the first extraction module is used for extracting characteristics of abnormal job scheduling information in the abnormal job database to obtain abnormal scheduling characteristics; the first classification module is used for classifying the abnormal job scheduling based on the abnormal scheduling characteristics to obtain an abnormal scheduling type to which the abnormal job scheduling belongs; and the second construction module is used for constructing an abnormal database based on the historical database error code, the abnormal scheduling characteristics, the abnormal scheduling type and the relation among the three.
Optionally, the obtaining device of the database abnormal job scheduling retry strategy further includes: the first configuration module is used for configuring a retry strategy for the abnormal job scheduling based on the abnormal scheduling type; and the third construction module is used for constructing a retry strategy library based on the abnormal job scheduling information corresponding to the abnormal job scheduling and the retry strategy.
Optionally, the collecting unit comprises: a first generation module, used for generating a job scheduling task based on the job scheduling request and calling a database interface to execute the job scheduling task; and a second generation module, used for generating an error code acquisition instruction based on the job scheduling task and collecting database error codes in real time during execution of the job scheduling task.
Optionally, the determining unit includes: the first query module is used for querying the abnormal job database based on the database error code to obtain abnormal job scheduling information; the second extraction module is used for extracting features of the abnormal job scheduling information to obtain abnormal scheduling characteristics, and querying the abnormal database based on the abnormal scheduling characteristics to obtain the abnormal scheduling type to which the abnormal job scheduling belongs.
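The two lookups performed by the determining unit can be sketched as a short chain. The tables and the feature extractor are passed in as parameters; their contents in the usage lines are hypothetical sample data.

```python
def determine_abnormal_type(error_code, abnormal_job_db, abnormal_db, extract_features):
    """Resolve an error code to (features, abnormal scheduling type).

    Step 1: look up the scheduling information recorded for the error code.
    Step 2: extract features from it and look up the type for those features.
    Returns (None, None) when the error code is unknown.
    """
    info = abnormal_job_db.get(error_code)
    if info is None:
        return None, None
    features = extract_features(info)
    return features, abnormal_db.get(features)


# Hypothetical sample tables and a trivial keyword-based extractor.
abnormal_job_db = {"E2002": "lock wait timeout exceeded"}
abnormal_db = {frozenset({"lock", "timeout"}): "resource_contention"}


def keyword_features(info):
    return frozenset(word for word in ("lock", "timeout", "connection") if word in info)


features, abnormal_type = determine_abnormal_type(
    "E2002", abnormal_job_db, abnormal_db, keyword_features
)
```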
Optionally, the obtaining device of the database abnormal job scheduling retry strategy further includes: the first execution module is used for executing the job scheduling retry operation based on the retry strategy of the abnormal job scheduling and recording log data while the retry operation is executed; the first evaluation module is used for evaluating the retry strategy based on the log data recorded during the retry operation to obtain an evaluation result, and updating the retry strategy library according to the evaluation result.
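A minimal sketch of retry execution with logging and a feedback step is given below. The evaluation rule (doubling the interval when every retry fails) is an assumption chosen for illustration; the application does not state how the evaluation result updates the library.

```python
import time
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RetryPolicy:
    max_retries: int
    interval_seconds: float


def run_with_retry(task, policy, sleep=time.sleep):
    """Retry `task` up to policy.max_retries times; return (succeeded, log)."""
    log = []
    for attempt in range(1, policy.max_retries + 1):
        try:
            task()
        except Exception as exc:
            log.append({"attempt": attempt, "ok": False, "error": str(exc)})
            if attempt < policy.max_retries:
                sleep(policy.interval_seconds)  # wait before the next retry
        else:
            log.append({"attempt": attempt, "ok": True})
            return True, log
    return False, log


def evaluate_and_update(library, key, policy, succeeded):
    """Hypothetical rule: keep the policy on success, double the interval on failure."""
    if succeeded:
        updated = policy
    else:
        updated = replace(policy, interval_seconds=policy.interval_seconds * 2)
    library[key] = updated
    return updated
```

The `sleep` parameter is injectable so the loop can be exercised without real waiting.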
The device for acquiring the database abnormal job scheduling retry strategy may further include a processor and a memory, wherein the acquisition unit 51, the determination unit 52, the query unit 53, and the like are stored in the memory as program units, and the processor executes the program units stored in the memory to implement corresponding functions.
The processor includes a kernel, and the kernel fetches the corresponding program unit from the memory. The kernel may set one or more retry strategies for different types of abnormal job scheduling by adjusting kernel parameters.
The memory may include forms of computer-readable media such as volatile memory, random access memory (RAM), and/or nonvolatile memory, for example read-only memory (ROM) or flash memory (flash RAM); the memory includes at least one memory chip.
According to another aspect of the embodiment of the present invention, there is further provided a computer readable storage medium, where the computer readable storage medium includes a stored computer program, and when the computer program runs, the device where the computer readable storage medium is located is controlled to execute the method for acquiring any one of the database abnormal job scheduling retry policies described above.
According to another aspect of the embodiment of the present invention, there is further provided an electronic device, including one or more processors and a memory, where the memory is configured to store one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors implement the method for acquiring the database abnormal job scheduling retry policy.
The present application also provides a computer program product which, when executed on a data processing device, is adapted to execute a program that initializes the following method steps: responding to a job scheduling request sent by a user side, and collecting database error codes, wherein the database error codes are error codes generated based on abnormal job scheduling in the process of executing a job scheduling task by a database, and the abnormal job scheduling is a scheduling process in which a job becomes abnormal because interaction between the user side and a database server is interrupted during execution of the job scheduling task; obtaining abnormal scheduling characteristics based on the database error codes, and determining an abnormal scheduling type to which the abnormal job scheduling belongs based on the abnormal scheduling characteristics; querying a retry strategy library based on the abnormal scheduling characteristics and the abnormal scheduling type to obtain a retry strategy of the abnormal job scheduling, wherein the retry strategy at least comprises: number of retries, duration of retry interval.
Fig. 6 is a block diagram of a hardware structure of an electronic device (or mobile device) for implementing a method for acquiring a database abnormal job scheduling retry strategy according to an embodiment of the present invention. As shown in fig. 6, the electronic device may include one or more processors 602 (shown in fig. 6 as 602a, 602b, ..., 602n; the processor 602 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)) and a memory 604 for storing data. In addition, the electronic device may further include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a keyboard, a power supply, and/or a camera. It will be appreciated by those of ordinary skill in the art that the configuration shown in fig. 6 is merely illustrative and is not intended to limit the configuration of the electronic device described above. For example, the electronic device may also include more or fewer components than shown in fig. 6, or have a different configuration than shown in fig. 6.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
In the foregoing embodiments of the present invention, the description of each embodiment has its own emphasis; for a part that is not described in detail in one embodiment, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other manners. The above-described apparatus embodiments are merely exemplary. For example, the division of the units may be a division by logical function, and other divisions are possible in an actual implementation: a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communication connection shown or discussed between parts may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be in electrical or other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in whole or in part, may be embodied in the form of a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other various media capable of storing program code.
The foregoing is merely a preferred embodiment of the present invention. It should be noted that those skilled in the art may make modifications and adaptations without departing from the principles of the present invention, and such modifications and adaptations are intended to fall within the scope of protection of the present invention.

Claims (10)

1. A method for acquiring a database abnormal job scheduling retry strategy, characterized by comprising the following steps:
responding to a job scheduling request sent by a user side, and collecting database error codes, wherein the database error codes are error codes generated based on abnormal job scheduling in the process of executing a job scheduling task by a database, and the abnormal job scheduling is a scheduling process of abnormal job occurrence caused by interaction interruption between the user side and a database server in the process of executing the job scheduling task;
obtaining abnormal scheduling characteristics based on the database error codes, and determining an abnormal scheduling type to which the abnormal job scheduling belongs based on the abnormal scheduling characteristics;
inquiring a retry strategy library based on the abnormal scheduling characteristics and the abnormal scheduling types to obtain a retry strategy of the abnormal job scheduling, wherein the retry strategy at least comprises: number of retries, duration of retry interval.
2. The method according to claim 1, further comprising, before responding to the job scheduling request sent by the user terminal:
collecting a historical database error code generated by executing the job scheduling task in a historical time period;
and acquiring abnormal job scheduling information based on the historical database error code, and constructing an abnormal job database based on the historical database error code, the abnormal job scheduling information and the relation between the historical database error code and the abnormal job scheduling information.
3. The acquisition method according to claim 2, characterized by further comprising, after constructing an abnormal job database based on the history database error code, the abnormal job scheduling information, and a relation of the history database error code and the abnormal job scheduling information:
extracting features of the abnormal job scheduling information in the abnormal job database to obtain the abnormal scheduling features;
classifying the abnormal job scheduling based on the abnormal scheduling characteristics to obtain an abnormal scheduling type to which the abnormal job scheduling belongs;
and constructing an abnormal database based on the historical database error code, the abnormal scheduling characteristic, the abnormal scheduling type and the relation among the three.
4. The acquisition method according to claim 3, characterized by further comprising, after constructing an exception database based on the history database error code, the exception scheduling feature, the exception scheduling type, and a relation between the three:
configuring the retry strategy for the abnormal job scheduling based on the abnormal scheduling type;
and constructing the retry strategy library based on the abnormal job scheduling information corresponding to the abnormal job scheduling and the retry strategy.
5. The method according to claim 1, wherein the step of collecting database error codes in response to a job scheduling request sent by a user terminal comprises:
generating a job scheduling task based on the job scheduling request, and calling a database interface to execute the job scheduling task;
and generating an error code acquisition instruction based on the job scheduling task, and acquiring the database error code in real time in the execution process of the job scheduling task.
6. The method according to claim 3, wherein the step of obtaining an abnormal scheduling feature based on the database error code, and determining an abnormal scheduling type to which an abnormal job schedule belongs based on the abnormal scheduling feature comprises:
querying the abnormal job database based on the database error code to obtain abnormal job scheduling information;
and extracting features of the abnormal job scheduling information to obtain the abnormal scheduling features, and inquiring the abnormal database based on the abnormal scheduling features to obtain the abnormal scheduling type of the abnormal job scheduling.
7. The acquisition method according to claim 1, characterized by further comprising, after obtaining a retry strategy of the abnormal job schedule by querying a retry strategy library based on the abnormal schedule feature and the abnormal schedule type:
executing the job scheduling retry operation based on the retry strategy of the abnormal job scheduling, and recording log data in the process of executing the retry operation;
and evaluating the retry strategy based on log data in the process of executing the retry operation to obtain an evaluation result, and updating the retry strategy library according to the evaluation result.
8. An acquisition device of a database abnormal job scheduling retry strategy, comprising:
an acquisition unit, configured to respond to a job scheduling request sent by a user side and collect a database error code, wherein the database error code is an error code generated based on abnormal job scheduling in the process of executing a job scheduling task by a database, and the abnormal job scheduling is a scheduling process in which a job becomes abnormal because interaction between the user side and a database server is interrupted during execution of the job scheduling task;
a determining unit, configured to obtain abnormal scheduling characteristics based on the database error code and determine an abnormal scheduling type to which the abnormal job scheduling belongs based on the abnormal scheduling characteristics;
a query unit, configured to query a retry strategy library based on the abnormal scheduling characteristics and the abnormal scheduling type to obtain a retry strategy of the abnormal job scheduling, wherein the retry strategy at least includes: number of retries, duration of retry interval.
9. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored computer program, wherein the computer program when run controls a device in which the computer readable storage medium is located to execute the method for acquiring the database abnormal job scheduling retry strategy according to any one of claims 1 to 7.
10. An electronic device comprising one or more processors and a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of obtaining the database abnormal job scheduling retry strategy of any of claims 1 to 7.
CN202311182938.5A 2023-09-13 2023-09-13 Method and device for acquiring database abnormal job scheduling retry strategy Pending CN117271183A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311182938.5A CN117271183A (en) 2023-09-13 2023-09-13 Method and device for acquiring database abnormal job scheduling retry strategy

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311182938.5A CN117271183A (en) 2023-09-13 2023-09-13 Method and device for acquiring database abnormal job scheduling retry strategy

Publications (1)

Publication Number Publication Date
CN117271183A true CN117271183A (en) 2023-12-22

Family

ID=89207318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311182938.5A Pending CN117271183A (en) 2023-09-13 2023-09-13 Method and device for acquiring database abnormal job scheduling retry strategy

Country Status (1)

Country Link
CN (1) CN117271183A (en)

Similar Documents

Publication Publication Date Title
CN110287052B (en) Root cause task determination method and device for abnormal task
US8938421B2 (en) Method and a system for synchronizing data
US7526508B2 (en) Self-managing database architecture
CN114500250B (en) System linkage comprehensive operation and maintenance system and method in cloud mode
US10116534B2 (en) Systems and methods for WebSphere MQ performance metrics analysis
Bansal et al. Decaf: Diagnosing and triaging performance issues in large-scale cloud services
CN111125444A (en) Big data task scheduling management method, device, equipment and storage medium
EP3591485B1 (en) Method and device for monitoring for equipment failure
CN113656245B (en) Data inspection method and device, storage medium and processor
CN111858251A (en) Big data computing technology-based data security audit method and system
CN113760677A (en) Abnormal link analysis method, device, equipment and storage medium
CN117992304A (en) Integrated intelligent operation and maintenance platform
CN106384283A (en) Internet plus based service bus structure and service bus system
EP1428140A2 (en) Systems and methods for collecting, storing, and analyzing database statistics
KR101830936B1 (en) Performance Improving System Based Web for Database and Application
CN113641739A (en) Spark-based intelligent data conversion method
CN117194154A (en) APM full-link monitoring system and method based on micro-service
CN117112656A (en) Integrated information intelligent management system and method for scientific and technological volunteer service management
CN112965793B (en) Identification analysis data-oriented data warehouse task scheduling method and system
CN117271183A (en) Method and device for acquiring database abnormal job scheduling retry strategy
CN115718690A (en) Data accuracy monitoring system and method
CN113472881B (en) Statistical method and device for online terminal equipment
CN115168297A (en) Bypassing log auditing method and device
CN111835566A (en) System fault management method, device and system
CN118368212B (en) All-link monitoring system, method and storage medium based on business index

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination