CN115203260A - Abnormal data determination method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN115203260A
Authority
CN
China
Prior art keywords
data
task
execution
job flow
identifier
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210846103.4A
Other languages
Chinese (zh)
Inventor
孔佑记
彭磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jidou Technology Co ltd
Original Assignee
Shanghai Jidou Technology Co ltd
Application filed by Shanghai Jidou Technology Co ltd
Priority to CN202210846103.4A
Publication of CN115203260A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/219 Managing data history or versioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides an abnormal data determination method, an abnormal data determination device, an electronic device, and a storage medium. The method comprises the following steps: acquiring the job flow identifier of a job flow that failed to execute in a data warehouse; searching, according to the job flow identifier, for a task identifier whose task state is execution failure, wherein the job flow identifier corresponds to a plurality of task identifiers with different execution orders; and searching the data warehouse for the data table set corresponding to the task identifier that failed to execute, and determining the data records of each data table in the data table set as abnormal data. Because the data is marked in advance with the job flow identifier and the task identifier, the abnormal data produced by a failed task can be determined by combining the two identifiers, which effectively improves the efficiency of determining abnormal data.

Description

Abnormal data determination method and device, electronic equipment and storage medium
Technical Field
The present application relates to the technical field of big data and data processing, and in particular, to an abnormal data determination method, an abnormal data determination device, an electronic device, and a storage medium.
Background
A data warehouse, also known as an enterprise data warehouse, is a central repository of integrated data from one or more different sources, and may also serve as a system for reporting and data analysis. The processing tasks in a data warehouse include extract, transform, and load, so data processing tasks in a data warehouse are commonly referred to as ETL tasks.
Currently, multiple ETL tasks with complex dependencies in a data warehouse often produce chaotic and invalid abnormal data (also referred to as dirty data) for system reasons or human reasons. For example, a data extraction ETL task in a workflow may produce data records with missing fields (for example, records with null values in non-null fields). Such abnormal data may flow to downstream ETL tasks of the workflow (for example, a data transformation ETL task) and may only be discovered at the application layer after the data has been loaded. To determine the abnormal data, the dependency relationships among the ETL tasks and the hierarchical relationships among the database tables have to be sorted out manually, and the abnormal data is then traced along these relationships until all of it has been found. In practice, determining abnormal data by manually sorting out the dependency relationships among multiple ETL tasks and the hierarchical relationships among database tables has proven inefficient.
Disclosure of Invention
An object of the embodiments of the present application is to provide an abnormal data determining method, an abnormal data determining device, an electronic device, and a storage medium, which are used to solve the problem of low efficiency in determining abnormal data.
An embodiment of the present application provides an abnormal data determination method, comprising: acquiring the job flow identifier of a job flow that failed to execute in a data warehouse; searching, according to the job flow identifier, for a task identifier whose task state is execution failure, wherein the job flow identifier corresponds to a plurality of task identifiers with different execution orders; and searching the data warehouse for the data table set corresponding to the task identifier that failed to execute, and determining the data records of each data table in the data table set as abnormal data. In this scheme, the failed task identifier is found from the job flow identifier, the data table set corresponding to the failed task identifier is found in the data warehouse, and the data records of each data table in that set are determined to be abnormal data. Because the data is marked in advance with the job flow identifier and the task identifier, the abnormal data can be determined by combining the two identifiers, which effectively improves the efficiency of determining abnormal data.
Optionally, in this embodiment of the present application, searching the data warehouse for the data table set corresponding to the task identifier that failed to execute includes: searching the data warehouse for the task batch number corresponding to the task identifier that failed to execute; and acquiring the data table set corresponding to the task batch number. In this scheme, the task batch number leads directly to the data table set that holds the abnormal data of the failed task, so the abnormal data can be determined by combining the job flow identifier and the task identifier, which effectively improves the efficiency of determining abnormal data.
Optionally, in this embodiment of the present application, before acquiring the job flow identifier of the job flow that failed to execute in the data warehouse, the method further includes: executing the job flow corresponding to the job flow identifier and obtaining the execution state of the job flow, wherein the execution state indicates whether the job flow executed successfully or failed. In this scheme, obtaining the execution state of the job flow makes it possible to determine the data table set corresponding to the abnormal data of a failed task.
Optionally, in this embodiment of the present application, the job flow includes a plurality of tasks with different execution orders, and executing the job flow corresponding to the job flow identifier and obtaining the execution state of the job flow includes: executing the tasks in the execution order corresponding to the job flow identifier to obtain the execution state of each task; and if any one of the tasks fails to execute, determining the execution state of the job flow as execution failure, and otherwise determining the execution state of the job flow as execution success. In this scheme, the job flow is marked as failed as soon as any of its tasks fails, so the abnormal data corresponding to the failed execution state can be found. This avoids the situation in which a task fails but its abnormal data goes undetected, and effectively improves the efficiency of determining abnormal data.
Optionally, in this embodiment of the present application, after determining the data records of each data table in the data table set as abnormal data, the method further includes: deleting the data records corresponding to the abnormal data from each data table in the data table set. In this scheme, deleting the abnormal data prevents it from flowing into the next task and contaminating that task's execution, which effectively guarantees normal execution of the workflow.
Optionally, in this embodiment of the present application, after deleting the data records corresponding to the abnormal data from each data table in the data table set, the method further includes: generating a deletion log corresponding to the abnormal data. In this scheme, the deletion log allows the deletion of abnormal data to be traced quickly and supports auditing, which reduces the risk of deleting non-abnormal data unnoticed.
Optionally, in this embodiment of the present application, acquiring the job flow identifier of the job flow that failed to execute in the data warehouse includes: receiving a data deletion request sent by a target device, wherein the data deletion request includes the job flow identifier of the job flow that failed to execute in the data warehouse. After the deletion log corresponding to the abnormal data is generated, the method further includes: generating a data deletion response corresponding to the data deletion request according to the deletion log; and sending the data deletion response to the target device. In this scheme, returning a data deletion response built from the deletion log provides the target device with an abnormal data deletion service, so that abnormal data can be deleted remotely.
An embodiment of the present application further provides an abnormal data determining apparatus, including: a job identifier acquisition module, configured to acquire the job flow identifier of a job flow that failed to execute in the data warehouse; a task identifier searching module, configured to search, according to the job flow identifier, for a task identifier whose task state is execution failure, wherein the job flow identifier corresponds to a plurality of task identifiers with different execution orders; and an abnormal data determining module, configured to search the data warehouse for the data table set corresponding to the task identifier that failed to execute and to determine the data records of each data table in the data table set as abnormal data.
Optionally, in this embodiment of the present application, the abnormal data determining module includes: a task batch number searching module, configured to search the data warehouse for the task batch number corresponding to the task identifier that failed to execute; and a data table set acquisition module, configured to acquire the data table set corresponding to the task batch number.
Optionally, in this embodiment of the present application, the abnormal data determining apparatus further includes: an execution state obtaining module, configured to execute the job flow corresponding to the job flow identifier and obtain the execution state of the job flow, wherein the execution state indicates whether the job flow executed successfully or failed.
Optionally, in this embodiment of the present application, the job flow includes a plurality of tasks with different execution orders, and the execution state obtaining module includes: a sequential task execution module, configured to execute the tasks in the execution order corresponding to the job flow identifier to obtain the execution state of each task; and an execution state determining module, configured to determine the execution state of the job flow as execution failure if any one of the tasks fails to execute, and otherwise to determine the execution state of the job flow as execution success.
Optionally, in this embodiment of the present application, the abnormal data determining apparatus further includes: a data record deleting module, configured to delete the data records corresponding to the abnormal data from each data table in the data table set.
Optionally, in this embodiment of the present application, the abnormal data determining apparatus further includes: a deletion log module, configured to generate a deletion log corresponding to the abnormal data.
Optionally, in this embodiment of the present application, the abnormal data determining apparatus further includes: a data deletion module, configured to receive a data deletion request sent by a target device, wherein the data deletion request includes the job flow identifier of the job flow that failed to execute in the data warehouse; a deletion response generation module, configured to generate a data deletion response corresponding to the data deletion request according to the deletion log corresponding to the abnormal data; and a deletion response sending module, configured to send the data deletion response to the target device.
An embodiment of the present application further provides an electronic device, including: a processor and a memory, the memory storing processor-executable machine-readable instructions which, when executed by the processor, perform a method as described above.
Embodiments of the present application further provide a computer-readable storage medium, on which a computer program is stored, and the computer program is executed by a processor to perform the method as described above.
Drawings
To illustrate the technical solutions of the embodiments of the present application more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present application and should therefore not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart of an abnormal data determination method provided in an embodiment of the present application;
Fig. 2 is a schematic diagram illustrating an implementation process of a data warehouse provided in an embodiment of the present application;
Fig. 3 is a schematic flowchart illustrating a process of deleting abnormal data according to a request of a target device provided in an embodiment of the present application;
Fig. 4 is a schematic structural diagram of an abnormal data determination apparatus provided in an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. The components of the embodiments of the present application, generally described and illustrated in the figures herein, can be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present application, as presented in the figures, is not intended to limit the scope of the embodiments of the present application, as claimed, but is merely representative of selected embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments of the present application without any creative effort belong to the protection scope of the embodiments of the present application.
It is to be understood that "first" and "second" in the embodiments of the present application are used only to distinguish similar objects. Those skilled in the art will appreciate that the terms "first," "second," etc. do not denote any order, quantity, or importance.
Before describing the abnormal data determining method provided in the embodiment of the present application, some concepts related in the embodiment of the present application are described:
Abnormal data, also called dirty data, refers to data that cannot be used normally, produced either during data processing when a software system behaves abnormally or through human misoperation. Examples of software system abnormalities include: a design defect in the software system, a process being killed because the server has insufficient computing resources, or messages being lost due to insufficient network resources. Examples of human misoperation include: an accidental click that causes data to be imported repeatedly, or non-standard handling of data records that leaves fields empty.
Structured Query Language (SQL) is a special-purpose programming language used to query, update, and manage data in a Relational Database Management System (RDBMS), or to process streams in a Relational Data Stream Management System (RDSMS).
It should be noted that the abnormal data determining method provided in the embodiments of the present application may be executed by an electronic device, where the electronic device refers to a device terminal or a server capable of executing a computer program. Device terminals include, for example: smart phones, personal computers, tablet computers, personal digital assistants, and mobile internet devices. A server refers to a device that provides computing services over a network, for example an x86 server or a non-x86 server, where non-x86 servers include mainframes, minicomputers, and UNIX servers.
Application scenarios to which the abnormal data determining method is applicable include, but are not limited to, Business Intelligence (BI), data analysis, user profiling, and personalized recommendation. All of these scenarios use a data warehouse, so the abnormal data determining method can be used in them to find and delete the abnormal data in the data warehouse, which improves the data quality of the data warehouse and ultimately the results of the applications that rely on it.
Please refer to Fig. 1, which is a schematic flowchart of the abnormal data determination method provided in an embodiment of the present application. The method comprises the following steps:
Step S110: Acquire the job flow identifier of a job flow that failed to execute in the data warehouse.
Please refer to Fig. 2, which is a schematic diagram illustrating an implementation process of a data warehouse provided in an embodiment of the present application. The data warehouse may manage data in layers; data layering essentially means that, when data jobs and tasks are processed, the data of one task is processed according to preset processing logic and passed on to the next task. For example, the data warehouse comprises a first job flow (job1), a second job flow (job2), and a third job flow (job3). The first task (task1) of the first job flow performs data extraction, so that the data meeting preset conditions is extracted and sent to the following second task (task2).
Step S120: Search, according to the job flow identifier, for the task identifier whose task state is execution failure, wherein the job flow identifier corresponds to a plurality of task identifiers with different execution orders.
It will be appreciated that each job includes a plurality of tasks that are executed in different orders, and thus the job flow identification corresponds to a plurality of task identifications that are executed in different orders.
Step S130: Search the data warehouse for the data table set corresponding to the task identifier that failed to execute, and determine the data records of each data table in the data table set as abnormal data.
It can be understood that, because the data warehouse stores the corresponding association relationship between the job flow identifier and the task identifier, the task identifier whose task state is failed to be executed can be searched according to the job flow identifier. Similarly, because the data warehouse stores the corresponding association relationship between each task identifier and the data table set, the data table set corresponding to the task identifier which fails to be executed can be found from the data warehouse.
In the implementation process, the task identifier whose task state is execution failure is found from the job flow identifier, the data table set corresponding to the failed task identifier is found in the data warehouse, and the data records of each data table in the data table set are determined to be abnormal data.
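Steps S110 to S130 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the in-memory structures TASKS and TASK_TABLES, the status strings, and the table names dwd_orders and dwd_order_items are hypothetical stand-ins for the association data that the data warehouse is described as storing.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical metadata: (job_id, task_id, task_order, status).
TASKS: List[Tuple[str, str, int, str]] = [
    ("job1", "task1", 1, "success"),
    ("job1", "task2", 2, "success"),
    ("job2", "task1", 1, "success"),
    ("job2", "task2", 2, "failed"),
]

# Hypothetical mapping: (job_id, task_id) -> data tables written by that task.
TASK_TABLES: Dict[Tuple[str, str], Set[str]] = {
    ("job2", "task2"): {"dwd_orders", "dwd_order_items"},
}


def failed_task_ids(job_id: str) -> List[str]:
    """Step S120: failed task identifiers of one job flow, in execution order."""
    rows = sorted((r for r in TASKS if r[0] == job_id), key=lambda r: r[2])
    return [task_id for _, task_id, _, status in rows if status == "failed"]


def abnormal_tables(job_id: str) -> Set[str]:
    """Step S130: union of the data table sets of all failed tasks."""
    tables: Set[str] = set()
    for task_id in failed_task_ids(job_id):
        tables |= TASK_TABLES.get((job_id, task_id), set())
    return tables
```

Every record in the returned tables that carries the failed job flow and task identifiers would then be treated as abnormal data.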
As an optional embodiment of the abnormal data determination method, the job flow may be executed before the job flow identifier of the failed job flow is acquired from the data warehouse. The process of executing the job flow may include:
Step S111: Execute the job flow corresponding to the job flow identifier and obtain the execution state of the job flow, wherein the execution state indicates whether the job flow executed successfully or failed.
It is to be understood that the above-described job flow includes a plurality of tasks that are executed in different orders.
As an optional implementation of step S111, the execution state of the job flow may be determined from the execution states of its tasks while the job flow is executed. This implementation may include:
Step S111a: Execute the tasks in the execution order corresponding to the job flow identifier to obtain the execution state of each task.
The embodiment of step S111a described above is, for example: taking the data warehouse in fig. 2 as an example, the first job flow (job 1), the second job flow (job 2), and the third job flow (job 3) in the data warehouse are respectively executed, that is, the first task (task 1), the second task (task 2), and the third task (task 3) in the first job flow are sequentially executed, the first task (task 1), the second task (task 2), and the third task (task 3) in the second job flow are executed, the first task (task 1), the second task (task 2), the third task (task 3), and the fourth task (task 4) in the third job flow are executed, and the execution state of each task in the plurality of tasks is obtained.
Step S111b: If any one of the tasks fails to execute, determine the execution state of the job flow as execution failure.
An example of step S111b: taking the data warehouse in Fig. 2 as an example, the data warehouse includes the first job flow (job1), the second job flow (job2), and the third job flow (job3). The second task (task2) in the second job flow (job2) and the second task (task2) in the third job flow (job3) failed to execute, so the execution states of these two job flows are determined as execution failure, and the job flow identifiers of the job flows that failed to execute in the data warehouse are job2 and job3.
Step S111c: If all of the tasks execute successfully, determine the execution state of the job flow as execution success.
An example of step S111c: taking the data warehouse in Fig. 2 as an example, the first task (task1), the second task (task2), and the third task (task3) in the first job flow (job1) all executed successfully, so the execution state of that job flow is determined as execution success.
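The rule in steps S111a to S111c can be illustrated with a short sketch. The function name run_job_flow, the status strings, and the convention that a task signals failure by raising an exception are assumptions made for the example. The description does not state whether later tasks still run after a failure; this sketch keeps running them, which is also an assumption.

```python
from typing import Callable, Dict, List, Tuple

# A task is (task_id, callable); the callable raises an exception on failure.
Task = Tuple[str, Callable[[], None]]


def run_job_flow(tasks: List[Task]) -> Tuple[str, Dict[str, str]]:
    """Run tasks in list order and derive the job flow state from task states."""
    statuses: Dict[str, str] = {}
    for task_id, run in tasks:  # step S111a: execute in the given order
        try:
            run()
            statuses[task_id] = "success"
        except Exception:
            statuses[task_id] = "failed"
    # Steps S111b/S111c: one failed task is enough to fail the whole job flow.
    job_status = "failed" if "failed" in statuses.values() else "success"
    return job_status, statuses
```

A scheduler that should stop at the first failure could break out of the loop in the except branch; the derived job flow state would be the same.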
As an optional embodiment of step S120, a data table such as a job-task association table (job_task_relationship) may be designed in the data warehouse to store the association between job flow identifiers and task identifiers, and the association between task identifiers and data table sets. The fields of the association table may include: job flow identifier (job_id), task identifier (task_id), task name (task_name), task execution order (task_order), and execution state (status), among others. In addition, a flow result record table (task_batch_result) may be added to record the execution states and execution results of job flows and tasks, along with other information. The fields of the flow result record table may include: job flow identifier (job_id), task identifier (task_id), task batch number (task_batch_number), and data table set (table_lists), among others.
In the implementation process, because the flow result record table (task_batch_result) for job task execution has been added, the failed tasks among the tasks of a job flow can be determined quickly from the task execution states, and the failed job flow is thereby determined. Of course, in a specific implementation, all fields of the association table (job_task_relationship) and the flow result record table (task_batch_result) may be placed in the same data table or split across different data tables; the data table to which each field belongs may be chosen according to the specific situation.
It is understood that, taking the second job flow (job2) in Fig. 2 as an example, the second job flow includes a first task (task1), a second task (task2), and a third task (task3) that are executed in sequence, and the task identifier whose task state is execution failure is task2.
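One possible layout of the two tables is sketched below with Python's built-in sqlite3 module. The table and field names come from the description; the column types, the primary key, and storing table_lists as a comma-separated string are assumptions made for illustration, and a real data warehouse would use its own DDL dialect.

```python
import sqlite3

SCHEMA = """
CREATE TABLE job_task_relationship (
    job_id     TEXT    NOT NULL,
    task_id    TEXT    NOT NULL,
    task_name  TEXT,
    task_order INTEGER NOT NULL,
    status     TEXT    NOT NULL,
    PRIMARY KEY (job_id, task_id)
);
CREATE TABLE task_batch_result (
    job_id            TEXT NOT NULL,
    task_id           TEXT NOT NULL,
    task_batch_number TEXT NOT NULL,
    table_lists       TEXT NOT NULL  -- assumed encoding: comma-separated names
);
"""


def create_metadata_tables(conn: sqlite3.Connection) -> None:
    """Create the association table and the flow result record table."""
    conn.executescript(SCHEMA)


conn = sqlite3.connect(":memory:")
create_metadata_tables(conn)
```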
As an optional implementation of step S130, finding the data table set corresponding to the task identifier that failed to execute may include:
Step S131: Search the data warehouse for the task batch number corresponding to the task identifier that failed to execute.
Step S132: Acquire the data table set corresponding to the task batch number.
An example of steps S131 to S132: the task identifier whose execution state (status) is execution failure is looked up in the association table (job_task_relationship) of the data warehouse, and the task batch number (task_batch_number) corresponding to the failed task identifier (task_id) is looked up in the flow result record table of the data warehouse. The task batch number identifies the specific job flow and task that produced the data and the point in time at which it was processed, so the job flow and task with abnormal data can be located more quickly. The data table set (table_lists) corresponding to the task batch number (task_batch_number) is then acquired from the flow result record table of the data warehouse.
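The two lookups in steps S131 and S132 can be combined into one query, as in the following sketch. It uses sqlite3 purely for illustration; the sample rows, the batch numbers b001 and b002, the data table names, and the comma-separated encoding of table_lists are hypothetical.

```python
import sqlite3
from typing import Dict, List


def find_abnormal_tables(conn: sqlite3.Connection, job_id: str) -> Dict[str, List[str]]:
    """Map each task batch number of a failed task to its list of data tables."""
    rows = conn.execute(
        """
        SELECT b.task_batch_number, b.table_lists
        FROM job_task_relationship AS r
        JOIN task_batch_result AS b
          ON b.job_id = r.job_id AND b.task_id = r.task_id
        WHERE r.job_id = ? AND r.status = 'failed'
        ORDER BY r.task_order
        """,
        (job_id,),
    ).fetchall()
    return {batch: tables.split(",") for batch, tables in rows}


# Hypothetical sample data for job2, whose second task failed.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE job_task_relationship (
    job_id TEXT, task_id TEXT, task_name TEXT, task_order INTEGER, status TEXT);
CREATE TABLE task_batch_result (
    job_id TEXT, task_id TEXT, task_batch_number TEXT, table_lists TEXT);
INSERT INTO job_task_relationship VALUES
    ('job2', 'task1', 'extract',   1, 'success'),
    ('job2', 'task2', 'transform', 2, 'failed');
INSERT INTO task_batch_result VALUES
    ('job2', 'task1', 'b001', 'ods_orders'),
    ('job2', 'task2', 'b002', 'dwd_orders,dwd_order_items');
""")
result = find_abnormal_tables(conn, "job2")
```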
As an optional implementation of step S130, after the data records of each data table in the data table set are determined as abnormal data, the abnormal data records may also be deleted. This implementation may include:
Step S133: delete the data records corresponding to the abnormal data from each data table in the data table set.
The above step S133 may be implemented, for example, as follows: in a specific implementation, the data records corresponding to the abnormal data may be deleted directly by a single SQL statement that uses a join query (JOIN) over the job flow identifier (job_id), the task identifier (task_id) and the association table (job_task_relationship).
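A hedged sketch of step S133 follows, again using SQLite as a stand-in. It does not reproduce the single-statement form mentioned in the text; instead it uses one JOIN query to find the failed batches and then issues one DELETE per data table. The table `dw_users`, the per-record task_batch_number column, and the comma-separated table_lists format are assumptions made for the example.

```python
import re
import sqlite3
from typing import Dict

SAFE_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def delete_abnormal_records(conn: sqlite3.Connection, job_id: str) -> Dict[str, int]:
    """Delete records written by failed batches; return rows deleted per table."""
    failed = conn.execute(
        "SELECT r.task_batch_number, r.table_lists "
        "FROM job_task_relationship AS j "
        "JOIN task_batch_result AS r ON r.task_id = j.task_id "
        "WHERE j.job_id = ? AND r.status = 'failed'",
        (job_id,),
    ).fetchall()
    deleted: Dict[str, int] = {}
    with conn:  # commits on success, rolls back if any DELETE raises
        for batch, table_lists in failed:
            for table in table_lists.split(","):
                # Table names cannot be bound as parameters, so validate them.
                if not SAFE_NAME.fullmatch(table):
                    raise ValueError(f"unexpected table name: {table!r}")
                cur = conn.execute(
                    f"DELETE FROM {table} WHERE task_batch_number = ?", (batch,)
                )
                deleted[table] = deleted.get(table, 0) + cur.rowcount
    return deleted


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE job_task_relationship (job_id TEXT, task_id TEXT);
    CREATE TABLE task_batch_result (
        task_id TEXT, task_batch_number TEXT, status TEXT, table_lists TEXT);
    CREATE TABLE dw_users (name TEXT, task_batch_number TEXT);
    INSERT INTO job_task_relationship VALUES ('job2', 'task2');
    INSERT INTO task_batch_result VALUES ('task2', 'b002', 'failed', 'dw_users');
    INSERT INTO dw_users VALUES ('ann', 'b001'), ('bob', 'b002'), ('cid', 'b002');
    """
)
deleted = delete_abnormal_records(conn, "job2")
remaining = conn.execute("SELECT name FROM dw_users").fetchall()
```

Running all deletions inside one transaction keeps the data tables consistent if one of the DELETE statements fails partway through.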
As an optional implementation of step S130, while the data records corresponding to the abnormal data are deleted from each data table in the data table set, a deletion log of the deletion process may also be generated. This implementation may include:
Step S134: in the process of deleting the data records corresponding to the abnormal data, generate a deletion log corresponding to the abnormal data.
The above step S134 may be implemented, for example, as follows: in the process of deleting the data records corresponding to the abnormal data, generate the deletion log corresponding to the abnormal data in a relational database or a non-relational database. Relational databases that may be used include, for example, MySQL, PostgreSQL, Oracle and SQL Server; non-relational databases that may be used include the Grakn database, the Neo4j database, the Hadoop subsystem HBase, MongoDB, CouchDB and the like.
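Writing the deletion log of step S134 to a relational database might look like the sketch below. SQLite stands in for any of the relational databases listed, and the `deletion_log` table and its columns are hypothetical, since the text does not specify what a log entry contains.

```python
import sqlite3
from datetime import datetime, timezone


def write_deletion_log(conn: sqlite3.Connection, job_id: str,
                       task_batch_number: str, table_name: str,
                       rows_deleted: int) -> None:
    """Append one deletion log entry (hypothetical schema) to the log database."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS deletion_log ("
        "deleted_at TEXT, job_id TEXT, task_batch_number TEXT, "
        "table_name TEXT, rows_deleted INTEGER)"
    )
    with conn:  # commit the log entry, or roll back on error
        conn.execute(
            "INSERT INTO deletion_log VALUES (?, ?, ?, ?, ?)",
            (datetime.now(timezone.utc).isoformat(), job_id,
             task_batch_number, table_name, rows_deleted),
        )


log_db = sqlite3.connect(":memory:")
write_deletion_log(log_db, "job2", "b002", "dw_users", 2)
entries = log_db.execute(
    "SELECT job_id, task_batch_number, table_name, rows_deleted FROM deletion_log"
).fetchall()
```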
Please refer to fig. 3, which is a schematic flowchart of deleting abnormal data at the request of a target device according to an embodiment of the present application. As an optional implementation of the abnormal data determination method, the abnormal data may also be determined and deleted at the request of a target device. This implementation may include:
Step S210: the electronic device receives a data deletion request sent by the target device, where the data deletion request includes the job flow identifier that failed to execute in the data warehouse.
The above step S210 may be implemented, for example, as follows: the electronic device receives, through the Transmission Control Protocol (TCP) or the User Datagram Protocol (UDP), a data deletion request sent by the target device, where the data deletion request includes the job flow identifier that failed to execute in the data warehouse.
Step S220: the electronic device searches, according to the job flow identifier, for the task identifier whose task state is execution failure, where the job flow identifier corresponds to a plurality of task identifiers with different execution orders.
Step S230: the electronic device searches the data warehouse for the data table set corresponding to the task identifier that failed to execute, and determines the data records of each data table in the data table set as abnormal data.
The implementation principle and manner of steps S220 to S230 are similar to those of steps S120 to S130 and are therefore not repeated here; for details, refer to the description of steps S120 to S130.
Step S240: the electronic device generates a data deletion response corresponding to the data deletion request according to the deletion log corresponding to the abnormal data.
The above step S240 may be implemented, for example, as follows: after the electronic device generates the deletion log corresponding to the abnormal data in the relational database or the non-relational database, the electronic device may further generate the data deletion response corresponding to the data deletion request according to that deletion log.
Step S250: the electronic device sends the data deletion response corresponding to the data deletion request to the target device.
The above step S250 may be implemented, for example, as follows: the electronic device sends the data deletion response corresponding to the data deletion request to the target device through the TCP protocol or the UDP protocol; specifically, the data deletion response may be sent to the target device through the Hypertext Transfer Protocol (HTTP) or the Hypertext Transfer Protocol Secure (HTTPS).
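The request and response handling of steps S210 to S250 can be sketched independently of the transport. The JSON message format, the field names `job_id`, `ok` and `deleted`, and the `delete_fn` callback (which would perform steps S220 to S230 plus deletion and return deletion log entries) are assumptions made for illustration; the text does not define a message format.

```python
import json
from typing import Callable, Dict, List


def handle_delete_request(raw_request: str,
                          delete_fn: Callable[[str], List[Dict]]) -> str:
    """Parse a data deletion request and build the data deletion response."""
    try:
        request = json.loads(raw_request)
        job_id = request["job_id"]  # failed job flow identifier (S210)
    except (ValueError, KeyError, TypeError):
        return json.dumps({"ok": False, "error": "missing or invalid job_id"})
    deletion_log = delete_fn(job_id)  # stands in for S220-S230 and deletion
    # S240: the response is generated from the deletion log.
    return json.dumps({"ok": True, "job_id": job_id, "deleted": deletion_log})


def fake_delete(job_id: str) -> List[Dict]:
    """Stub that pretends two records were deleted from one table."""
    return [{"table": "dw_users", "task_batch_number": "b002", "rows_deleted": 2}]


response = json.loads(handle_delete_request('{"job_id": "job2"}', fake_delete))
bad_response = json.loads(handle_delete_request("not json", fake_delete))
```

The returned string is what would be sent back to the target device in step S250, for example as the body of an HTTP or HTTPS response.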
Please refer to fig. 4, which is a schematic structural diagram of an abnormal data determining apparatus provided in an embodiment of the present application. The embodiment of the present application provides an abnormal data determining apparatus 300, including:
A job identifier acquisition module 310, configured to acquire the job flow identifier that failed to execute in the data warehouse.
A task identifier searching module 320, configured to search, according to the job flow identifier, for the task identifier whose task state is execution failure, where the job flow identifier corresponds to a plurality of task identifiers with different execution orders.
An abnormal data determining module 330, configured to search the data warehouse for the data table set corresponding to the task identifier that failed to execute, and to determine the data records of each data table in the data table set as abnormal data.
Optionally, in an embodiment of the present application, the abnormal data determining module includes:
A task batch number searching module, configured to search the data warehouse for the task batch number corresponding to the task identifier that failed to execute.
A data table set acquisition module, configured to acquire the data table set corresponding to the task batch number.
Optionally, in an embodiment of the present application, the abnormal data determining apparatus further includes:
An execution state acquisition module, configured to execute the job flow corresponding to the job flow identifier and acquire the execution state of the job flow, where the execution state of the job flow indicates whether the job flow executed successfully or failed.
Optionally, in an embodiment of the present application, the job flow includes a plurality of tasks with different execution orders, and the execution state acquisition module includes:
A sequential task execution module, configured to execute the plurality of tasks in the execution order corresponding to the job flow identifier to obtain the execution states of the plurality of tasks.
An execution state determining module, configured to determine the execution state of the job flow as execution failure if any one of the plurality of tasks fails to execute, and otherwise to determine the execution state of the job flow as execution success.
Optionally, in an embodiment of the present application, the abnormal data determining apparatus further includes:
A data record deleting module, configured to delete the data records corresponding to the abnormal data from each data table in the data table set.
Optionally, in an embodiment of the present application, the abnormal data determining apparatus further includes:
A deletion log generation module, configured to generate a deletion log corresponding to the abnormal data.
Optionally, in an embodiment of the present application, the abnormal data determining apparatus further includes:
A deletion request receiving module, configured to receive a data deletion request sent by the target device, where the data deletion request includes the job flow identifier that failed to execute in the data warehouse.
A deletion response generation module, configured to generate a data deletion response corresponding to the data deletion request according to the deletion log corresponding to the abnormal data.
A deletion response sending module, configured to send the data deletion response corresponding to the data deletion request to the target device.
It should be understood that the apparatus corresponds to the above-mentioned abnormal data determination method embodiment, and can perform the steps related to the above-mentioned method embodiment, and the specific functions of the apparatus can be referred to the above description, and the detailed description is appropriately omitted here to avoid redundancy. The device includes at least one software functional module that can be stored in memory in the form of software or firmware (firmware) or solidified in the Operating System (OS) of the device.
Please refer to fig. 5, which illustrates a schematic structural diagram of an electronic device according to an embodiment of the present application. An electronic device 400 provided in an embodiment of the present application includes: a processor 410 and a memory 420, the memory 420 storing machine-readable instructions executable by the processor 410, the machine-readable instructions when executed by the processor 410 performing the method as above.
Embodiments of the present application further provide a computer-readable storage medium 430, where the computer-readable storage medium 430 stores a computer program, and the computer program is executed by the processor 410 to perform the above method.
The computer-readable storage medium 430 may be implemented by any type of volatile or nonvolatile storage device or combination thereof, such as a Static Random Access Memory (SRAM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), an Erasable Programmable Read-Only Memory (EPROM), a Programmable Read-Only Memory (PROM), a Read-Only Memory (ROM), a magnetic Memory, a flash Memory, a magnetic disk, or an optical disk.
It should be noted that the embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar, the embodiments may be referred to one another. Because the apparatus embodiments are basically similar to the method embodiments, they are described briefly; for relevant details, refer to the corresponding description of the method embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The apparatus embodiments described above are merely illustrative and, for example, the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
In addition, the functional modules in the embodiments of the present application may be integrated together to form an independent part, each module may exist alone, or two or more modules may be integrated to form an independent part. Furthermore, in the description of this specification, reference to "one embodiment," "some embodiments," "an example," "a specific example," "some examples" or the like means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, such terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, those skilled in the art may combine the various embodiments or examples described in this specification, and the features of those embodiments or examples, provided that they do not contradict one another.
In this document, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
The above description is only an alternative embodiment of the embodiments of the present application, but the scope of the embodiments of the present application is not limited thereto, and any person skilled in the art can easily conceive of changes or substitutions within the technical scope of the embodiments of the present application, and all the changes or substitutions should be covered by the scope of the embodiments of the present application.

Claims (10)

1. An abnormal data determination method, comprising:
acquiring a job flow identifier that failed to execute in a data warehouse;
searching, according to the job flow identifier, for a task identifier whose task state is execution failure, wherein the job flow identifier corresponds to a plurality of task identifiers with different execution orders;
and searching the data warehouse for a data table set corresponding to the task identifier that failed to execute, and determining the data records of each data table in the data table set as abnormal data.
2. The method of claim 1, wherein said searching the data warehouse for the data table set corresponding to the task identifier that failed to execute comprises:
searching the data warehouse for a task batch number corresponding to the task identifier that failed to execute;
and acquiring the data table set corresponding to the task batch number.
3. The method of claim 1, further comprising, prior to said acquiring the job flow identifier that failed to execute in the data warehouse:
executing the job flow corresponding to the job flow identifier to obtain an execution state of the job flow, wherein the execution state of the job flow indicates whether the job flow executed successfully or failed.
4. The method of claim 3, wherein the job flow comprises a plurality of tasks with different execution orders, and said executing the job flow corresponding to the job flow identifier to obtain the execution state of the job flow comprises:
executing the plurality of tasks in the execution order corresponding to the job flow identifier to obtain the execution states of the plurality of tasks;
and if any task of the plurality of tasks fails to execute, determining the execution state of the job flow as execution failure; otherwise, determining the execution state of the job flow as execution success.
5. The method according to any one of claims 1-4, further comprising, after said determining the data records of each data table in said set of data tables as anomalous data:
and deleting the data record corresponding to the abnormal data from each data table in the data table set.
6. The method according to claim 5, further comprising, after the deleting the data record corresponding to the abnormal data from each data table in the set of data tables:
and generating a deletion log corresponding to the abnormal data.
7. The method of claim 6, wherein said acquiring the job flow identifier that failed to execute in the data warehouse comprises:
receiving a data deletion request sent by a target device, wherein the data deletion request comprises the job flow identifier that failed to execute in the data warehouse;
after the generating of the deletion log corresponding to the abnormal data, the method further includes:
generating a data deletion response corresponding to the data deletion request according to the deletion log corresponding to the abnormal data;
and sending a data deletion response corresponding to the data deletion request to the target equipment.
8. An abnormal data determination apparatus, comprising:
a job identifier acquisition module, configured to acquire a job flow identifier that failed to execute in a data warehouse;
a task identifier searching module, configured to search, according to the job flow identifier, for a task identifier whose task state is execution failure, wherein the job flow identifier corresponds to a plurality of task identifiers with different execution orders;
and an abnormal data determining module, configured to search the data warehouse for a data table set corresponding to the task identifier that failed to execute, and to determine the data records of each data table in the data table set as abnormal data.
9. An electronic device, comprising: a processor and a memory, the memory storing machine-readable instructions executable by the processor, the machine-readable instructions, when executed by the processor, performing the method of any of claims 1 to 7.
10. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, performs the method of any one of claims 1 to 7.
CN202210846103.4A 2022-07-04 2022-07-04 Abnormal data determination method and device, electronic equipment and storage medium Pending CN115203260A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210846103.4A CN115203260A (en) 2022-07-04 2022-07-04 Abnormal data determination method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210846103.4A CN115203260A (en) 2022-07-04 2022-07-04 Abnormal data determination method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115203260A true CN115203260A (en) 2022-10-18

Family

ID=83582491

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210846103.4A Pending CN115203260A (en) 2022-07-04 2022-07-04 Abnormal data determination method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115203260A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115858310A (en) * 2023-03-01 2023-03-28 美云智数科技有限公司 Abnormal task identification method and device, computer equipment and storage medium
CN115858310B (en) * 2023-03-01 2023-07-21 美云智数科技有限公司 Abnormal task identification method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination