CN109710596B - Data cleaning method, device, equipment and computer readable storage medium - Google Patents

Publication number
CN109710596B
Authority
CN
China
Prior art keywords
information
cleaning
backup
code
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811468867.4A
Other languages
Chinese (zh)
Other versions
CN109710596A (en)
Inventor
文玎玲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd

Abstract

The invention discloses a data cleaning method, device, equipment and computer readable storage medium. The method comprises the following steps: when a cleaning file is received, reading table name information, owner information, field information, column name information and cleaning condition information from the cleaning file; combining the table name information, the owner information and the field information into table information to be cleaned, and generating a backup code according to the table information to be cleaned; combining the table name information, the column name information and the cleaning condition information into cleaning information, and generating a cleaning code according to the cleaning information; and running the cleaning code to clean the target data that matches the cleaning condition information in the to-be-cleaned table corresponding to the table name information, and running the backup code to back up the target data. Because the code for data cleaning is generated from the cleaning file, developers no longer need to write it by hand, the generated code follows a unified specification, the error rate is reduced, and development and testing time is saved for developers and testers.

Description

Data cleaning method, device, equipment and computer readable storage medium
Technical Field
The present invention relates generally to the field of database technologies, and in particular, to a data cleaning method, apparatus, device, and computer readable storage medium.
Background
The data tables in a database store a large amount of data, and this data is updated over time. After data has been updated, the pre-update data needs to be cleaned, so a database accumulates a large amount of data to be cleaned. The data to be cleaned is cleaned by cleaning code and backed up by backup code to guard against erroneous cleaning.
At present, cleaning code and backup code are both written by developers. Because different developers have different writing styles, the cleaning code and backup code used to clean and back up the data in each data table differ considerably, which makes code specification management inconvenient and requires testers to spend more time on testing.
Disclosure of Invention
The invention mainly aims to provide a data cleaning method, a device, equipment and a computer readable storage medium, which aim to solve the problems that in the prior art, data stored in a data table in a database are cleaned and backed up according to cleaning codes and backup codes written by developers, standardized management of the codes is not facilitated, and a tester spends more time testing.
In order to achieve the above object, the present invention provides a data cleaning method, which includes the following steps:
when a cleaning file is received, reading table name information, owner information, field information, column name information and cleaning condition information in the cleaning file;
forming the table name information, the owner information and the field information into table information to be cleaned, and generating a backup code according to the table information to be cleaned;
forming cleaning information by the table name information, the column name information and the cleaning condition information, and generating a cleaning code according to the cleaning information;
and controlling the operation of the cleaning code, cleaning target data matched with the cleaning condition information in the to-be-cleaned table corresponding to the table name information, and controlling the operation of the backup code to backup the target data.
Preferably, the step of generating backup codes according to the table information to be cleaned includes:
adding the owner information and the table name information in the table information to be cleaned into a preset table head statement to generate a table head code;
adding the field information in the table information to be cleaned into a preset content statement to generate a table content code, and adding a preset ending mark after splicing the table header code and the table content code to generate a backup code;
and reading path information in the cleaning file, creating a backup folder in a path guided by the path information, and storing the backup code into the backup folder.
Preferably, the step of generating the cleaning code according to the cleaning information includes:
reading a table name identifier, a column name identifier and a condition identifier corresponding to the table name information, the column name information and the cleaning condition information in the cleaning information;
adding the table name information, the column name information and the cleaning condition information into a preset rollback statement according to the table name identifier, the column name identifier and the condition identifier to generate a rollback code;
and creating a rollback folder in a path pointed by the path information, and storing the rollback code into the rollback folder.
Preferably, the step of controlling the backup code to run and backing up the target data includes:
calling the backup codes from the backup folder according to the path guided by the path information;
controlling the backup code to run, generating a backup table corresponding to the table name information, reading the target data and transmitting the target data to the backup table so as to backup the target data;
and detecting the operation states of cleaning and backup of the target data, and allocating identifiers corresponding to the operation states to the target data.
Preferably, the step of backing up the target data includes:
when a rollback request is received, calling the rollback code from the rollback folder according to the path guided by the path information;
controlling the rollback code to run, searching the backup table, and determining the data to be rolled back in the target data according to identifiers carried by the target data in the backup table;
and updating the identifier after the data to be rolled back is rolled back.
Preferably, the step of storing the rollback code into the rollback folder includes:
respectively reading backup naming information, rollback naming information and cleaning naming information corresponding to the backup folder, the rollback folder and the cleaning code;
and adding the backup naming information, the rollback naming information and the cleaning naming information into a preset document to generate a deployment document.
Preferably, the step of generating backup codes according to the table information to be cleaned includes:
determining an authorization cell in the cleaning file according to a preset authorization identifier, reading character string information in the authorization cell, and adding the character string information into a preset authorization statement to generate an authorization statement;
and adding the authorization statement into the backup code, and updating the backup code.
In addition, in order to achieve the above object, the present invention also provides a data cleaning device, including:
the reading module is used for reading table name information, owner information, field information, column name information and cleaning condition information in the cleaning file when the cleaning file is received;
the first generation module is used for forming the table name information, the owner information and the field information into table information to be cleaned, and generating a backup code according to the table information to be cleaned;
the second generation module is used for forming the table name information, the column name information and the cleaning condition information into cleaning information and generating a cleaning code according to the cleaning information;
and the control module is used for controlling the operation of the cleaning code, cleaning target data matched with the cleaning condition information in the to-be-cleaned table corresponding to the table name information, controlling the operation of the backup code and backing up the target data.
In addition, to achieve the above object, the present invention also proposes a data cleaning apparatus including: a memory, a processor, a communication bus, and a data cleaning program stored on the memory;
the communication bus is used for realizing connection communication between the processor and the memory;
the processor is configured to execute the data cleaning program to implement the following steps:
when a cleaning file is received, reading table name information, owner information, field information, column name information and cleaning condition information in the cleaning file;
forming the table name information, the owner information and the field information into table information to be cleaned, and generating a backup code according to the table information to be cleaned;
forming cleaning information by the table name information, the column name information and the cleaning condition information, and generating a cleaning code according to the cleaning information;
and controlling the operation of the cleaning code, cleaning target data matched with the cleaning condition information in the to-be-cleaned table corresponding to the table name information, and controlling the operation of the backup code to backup the target data.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium storing one or more programs executable by one or more processors for:
when a cleaning file is received, reading table name information, owner information, field information, column name information and cleaning condition information in the cleaning file;
forming the table name information, the owner information and the field information into table information to be cleaned, and generating a backup code according to the table information to be cleaned;
forming cleaning information by the table name information, the column name information and the cleaning condition information, and generating a cleaning code according to the cleaning information;
and controlling the operation of the cleaning code, cleaning target data matched with the cleaning condition information in the to-be-cleaned table corresponding to the table name information, and controlling the operation of the backup code to backup the target data.
According to the data cleaning method, a developer writes and uploads a cleaning file for cleaning data stored in a data table. When the uploaded cleaning file is received, the table name information, owner information, field information, column name information and cleaning condition information in the cleaning file are read. The table name information, owner information and field information are combined into table information to be cleaned, and a backup code is generated according to the table information to be cleaned. The table name information, column name information and cleaning condition information are combined into cleaning information, and a cleaning code is generated according to the cleaning information. The cleaning code is then run to clean the target data that matches the cleaning condition information in the to-be-cleaned table identified by the table name information, and the backup code is run at the same time to back up the cleaned target data, so as to guard against erroneous cleaning of the target data.
In this scheme, the cleaning code and the backup code are generated from the several types of information read from the cleaning file. The cleaning information represents the table name, the column name and the cleaning conditions of the data table where the data to be cleaned is located, so the cleaning code generated from the cleaning information can accurately clean the data in the named table and column according to the cleaning conditions. The table information to be cleaned represents the table name of the data table where the data to be cleaned is located, the owner of that data table and its fields, so the backup code generated from the table information to be cleaned can create a backup table consistent with the data table where the data to be cleaned is located, which in turn allows the cleaned data to be backed up accurately. Developers no longer need to write the cleaning code and backup code by hand, the generated cleaning code and backup code follow a unified specification, the error rate is reduced, and development and testing time is saved for developers and testers.
Drawings
FIG. 1 is a flow chart of a first embodiment of a data cleaning method of the present invention;
FIG. 2 is a schematic diagram of functional modules of a first embodiment of the data cleaning apparatus of the present invention;
FIG. 3 is a schematic diagram of a device architecture of a hardware operating environment involved in a method according to an embodiment of the present invention.
The achievement of the objects, functional features and advantages of the present invention will be further described with reference to the accompanying drawings, in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
The invention provides a data cleaning method.
Referring to FIG. 1, FIG. 1 is a flowchart illustrating a data cleaning method according to a first embodiment of the present invention. In this embodiment, the data cleaning method includes:
step S10, when a cleaning file is received, reading table name information, owner information, field information, column name information and cleaning condition information in the cleaning file;
The data cleaning method is applied to a server and is suitable for cleaning, through the server, the data stored in the data tables of a database. Specifically, when data stored in a data table of the database needs to be cleaned, a developer first writes a cleaning file. The cleaning file is an Excel file that sets out each item of content about the data table where the data to be cleaned is located, such as the table name information of that data table, the column name information of the specific data column involved, the owner information and field information of the data table, and the cleaning condition information for the data to be cleaned. When the cleaning file uploaded by the developer is received, the table name information, owner information, field information, column name information and cleaning condition information are read from the cleaning file according to the preset identifiers of the cells that hold each of these items. The table name information determines the name of the data table where the data to be cleaned is located, the owner information represents the owner user who holds the operation rights on the data table, the field information represents the field names of the data table, the column name information represents the data column where the data to be cleaned is located, and the cleaning condition information identifies which data in the data table is to be cleaned.
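The reading step in S10 can be sketched as follows. This is a minimal illustration, assuming the spreadsheet has already been loaded into a list of (identifier, value) cells; the identifier strings, the dictionary keys and the sample values are all hypothetical, since the text only states that each item sits in a cell marked by a preset identifier.

```python
# Hypothetical cell identifiers; the patent only says that each kind of
# information is located through a preset identifier on its cell.
CELL_IDENTIFIERS = {
    "TABLE_NAME": "table_name",
    "OWNER": "owner",
    "FIELDS": "fields",
    "COLUMN_NAME": "column_name",
    "CONDITION": "condition",
}


def read_cleaning_file(cells):
    """Collect the five kinds of information from (identifier, value) cells."""
    info = {}
    for identifier, value in cells:
        key = CELL_IDENTIFIERS.get(identifier)
        if key is not None:  # cells with unknown identifiers are skipped
            info[key] = value.strip()
    missing = set(CELL_IDENTIFIERS.values()) - set(info)
    if missing:
        raise ValueError(f"cleaning file is missing: {sorted(missing)}")
    return info


# Cells as they might come out of the spreadsheet (illustrative values only).
example_cells = [
    ("TABLE_NAME", "ORDERS"),
    ("OWNER", "SALES"),
    ("FIELDS", "ID INTEGER, STATUS TEXT"),
    ("COLUMN_NAME", "STATUS"),
    ("CONDITION", "= 'EXPIRED'"),
    ("COMMENT", "a cell the reader does not recognize"),
]
cleaning_info = read_cleaning_file(example_cells)
```

Failing fast on a missing item is a design choice of this sketch rather than something the text prescribes; it keeps an incomplete cleaning file from producing partial backup or cleaning code.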
Step S20, forming the table name information, the owner information and the field information into table information to be cleaned, and generating a backup code according to the table information to be cleaned;
Further, the table name information, the owner information and the field information that were read are combined into table information to be cleaned, which represents the information of the data table where the data to be cleaned is located. The data to be cleaned in the data table is referred to as the target data, and the data table where the target data is located is referred to as the table to be cleaned. A statement for generating the backup code is preset, and the table information to be cleaned is filled into the preset statement to generate the statements that form the backup code. The backup code is used to create a backup table and to transfer the cleaned target data to the backup table for backup, so as to guard against erroneous operations on the target data. Specifically, the step of generating the backup code according to the table information to be cleaned includes:
step S21, adding the owner information and the table name information in the table information to be cleaned into a preset table head statement to generate a table head code;
It is to be understood that, because the backup code is used to generate the backup table that backs up the target data in the table to be cleaned, the backup table and the table to be cleaned must have the same table structure, so the backup code for creating the backup table needs to be generated according to the table information of the table to be cleaned. Specifically, the preset statement includes a preset header statement. The owner information and the table name information in the table information to be cleaned are added into the preset header statement to generate a header code, which is used to generate the header of the backup table. Because the header code is generated from the owner information and the table name information of the table to be cleaned, the header of the generated backup table is identical to the header of the table to be cleaned. The preset header statement may take the form "CREATE TABLE " + table owner + "." + table name + " ("; the owner information and the table name information are added at the positions of "table owner" and "table name" in the preset header statement, and the resulting statement is the header code for generating the header.
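Step S21 amounts to string concatenation around the owner and table name. The sketch below follows the header form given in the text; the function name and the sample owner and table name are hypothetical.

```python
def build_header_code(owner, table_name):
    """Fill the owner and table name into the preset header statement."""
    # Mirrors the form: "CREATE TABLE " + table owner + "." + table name + " ("
    return "CREATE TABLE " + owner + "." + table_name + " ("


# Illustrative values; a real run would take them from the cleaning file.
header_code = build_header_code("SALES", "ORDERS")
print(header_code)
```

The header is deliberately left open at the parenthesis, because the table content code and the ending mark are appended in the next step.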
Step S22, adding the field information in the table information to be cleaned into a preset content statement to generate a table content code, and adding a preset ending mark after splicing the table header code and the table content code to generate a backup code;
Further, to back up the target data, the backup table needs not only the header of the table to be cleaned where the target data is located but also its table structure content. The table structure content represents the field names in the table to be cleaned and the data types stored in it. The preset statement also includes a preset content statement, and the field information in the table information to be cleaned is added into the preset content statement to generate a table content code. The table content code is used to generate the table structure content of the backup table; because it is generated from the field information of the table to be cleaned, the table structure content of the generated backup table is identical to that of the table to be cleaned. The generated header code and table content code are then spliced together, and a preset ending mark is added at the end of the spliced code to mark the completion of the backup code for creating the backup table.
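Step S22 can be sketched as splicing three pieces together. The ending mark `");"`, the comma-separated field layout and the sample fields are assumptions made for illustration; the text does not specify the exact form of the preset content statement or of the ending mark.

```python
END_MARK = ");"  # assumed ending mark; the text only calls it "preset"


def build_content_code(fields):
    """Turn (field name, data type) pairs into the table content code."""
    return ", ".join(f"{name} {data_type}" for name, data_type in fields)


def build_backup_code(header_code, fields):
    """Splice header code and content code, then append the ending mark."""
    return header_code + build_content_code(fields) + END_MARK


# Illustrative header and fields for a hypothetical ORDERS table.
backup_code = build_backup_code(
    "CREATE TABLE SALES.ORDERS (",
    [("ID", "INTEGER"), ("STATUS", "VARCHAR(20)")],
)
print(backup_code)
```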
Step S23, reading path information in the cleaning file, creating a backup folder in a path guided by the path information, and storing the backup code into the backup folder.
It will be appreciated that the generated backup code needs to be stored, and that it is stored according to the path information, that is, in the path specified by the path information. A specific cell for holding the path information is preset in the cleaning file, and this cell carries a path identifier. After the backup code is generated, the path information in the cleaning file is read according to the path identifier, and a backup folder is created in the path pointed to by the path information; the backup folder is the folder that stores the generated backup code. The backup folder is named through backup naming information, which can be preset in a preset cell of the cleaning file or uploaded by the developer. The backup naming information is therefore obtained either by reading it from the preset cell of the cleaning file or by receiving it from the developer, and the backup folder is named accordingly. After the backup folder is named, the generated backup code is added into the backup folder for storage, so that the cleaned data can later be backed up according to the stored backup code.
In order to ensure the security of the data in each data table, the operation authority of each user over the data tables needs to be limited; that is, in managing the data tables of the database, different users are allocated different operation authorities over the data tables. The backup table that backs up the cleaned data is a temporary table: if every data table in the database operates normally for a period of time after the data has been cleaned, it can be judged that the cleaning has not affected the normal operation of the database, and the data backed up in the backup table can be deleted. The operation authority of the backup table is established under a temporary user for unified management, whereas the operation authority of the different tables to be cleaned belongs to their respective users. The user who holds the operation authority of the table to be cleaned is taken as the cleaning user. To make it convenient for the cleaning user to operate on the backup table, the operation authority of the backup table needs to be granted to the cleaning user, and this authorization operation is carried out after the backup code is generated. Specifically, the step of generating the backup code according to the table information to be cleaned includes:
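Step S23 can be illustrated with the standard library's `pathlib`. In this sketch the script file name `backup.sql`, the folder name and the use of a temporary directory as the path from the cleaning file are all hypothetical stand-ins.

```python
import tempfile
from pathlib import Path


def store_backup_code(path_info, backup_naming, backup_code):
    """Create the named backup folder under the given path and store the code."""
    backup_folder = Path(path_info) / backup_naming
    backup_folder.mkdir(parents=True, exist_ok=True)
    script = backup_folder / "backup.sql"  # hypothetical file name
    script.write_text(backup_code, encoding="utf-8")
    return script


# A temporary directory stands in for the path read from the cleaning file.
with tempfile.TemporaryDirectory() as root:
    stored = store_backup_code(
        root, "backup_orders", "CREATE TABLE SALES.ORDERS (ID INTEGER);"
    )
    stored_folder = stored.parent.name
    stored_text = stored.read_text(encoding="utf-8")
```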
step q1, determining an authorization cell in the cleaning file according to a preset authorization identifier, reading character string information in the authorization cell, and adding the character string information into a preset authorization statement to generate an authorization statement;
An authorization cell is preset in the cleaning file, and a corresponding preset authorization identifier is configured for it. After the backup code is generated, the identifier of each cell in the cleaning file is read and compared with the preset authorization identifier; when a read identifier is consistent with the preset authorization identifier, the cell it comes from is the authorization cell. The character string information in the authorization cell is then read. It represents the user name to be authorized and the corresponding authorization type; that is, the user represented by the character string information is granted, on the backup table, the operation authority corresponding to the authorization type. Specifically, the authorization operation is implemented by an authorization statement. A preset authorization statement for generating it is set in advance, and adding the read character string information into the preset authorization statement generates the authorization statement, which is used to allocate the operation authority over the backup table to the cleaning user.
Step q2, adding the authorization statement into the backup code, and updating the backup code.
Further, the generated authorization statement is added into the backup code to update it, so that running the backup code both generates the backup table and grants the cleaning user operation authority over it. The cleaning user can thus operate on the backup table while cleaning the data to be cleaned in the table to be cleaned. Operations on the backup table include transferring the data to be cleaned from the table to be cleaned to the backup table for backup, permanently deleting the backed-up data from the backup table, and restoring the backed-up data from the backup table to the table to be cleaned.
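Steps q1 and q2 can be sketched as building a SQL `GRANT` statement and appending it to the backup code. The cell layout `USER:PRIV1,PRIV2`, the template string and the sample names are assumptions for illustration; the text only says that the cell holds a user name and an authorization type.

```python
# Assumed form of the preset authorization statement.
PRESET_GRANT = "GRANT {privileges} ON {table} TO {user};"


def build_authorization_statement(cell_text, backup_table):
    """Parse 'USER:PRIV1,PRIV2' (hypothetical layout) into a GRANT statement."""
    user, _, privileges = cell_text.partition(":")
    if not user.strip() or not privileges.strip():
        raise ValueError("authorization cell must look like 'USER:PRIV1,PRIV2'")
    privilege_list = ", ".join(p.strip().upper() for p in privileges.split(","))
    return PRESET_GRANT.format(
        privileges=privilege_list, table=backup_table, user=user.strip()
    )


def add_authorization(backup_code, grant_statement):
    """Update the backup code by appending the authorization statement."""
    return backup_code + "\n" + grant_statement


grant = build_authorization_statement("CLEAN_USER: select, delete", "TMP.ORDERS")
updated_backup_code = add_authorization(
    "CREATE TABLE TMP.ORDERS (ID INTEGER);", grant
)
```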
Step S30, forming cleaning information from the table name information, the column name information and the cleaning condition information, and generating a cleaning code according to the cleaning information;
Furthermore, the table name information, column name information and cleaning condition information read from the cleaning file are combined into cleaning information, which represents the name of the data table where the data to be cleaned is located and the name of the data column within that table; the data in that column that satisfies the cleaning condition information is the target data to be cleaned. A preset cleaning statement for generating the cleaning code is set in advance, and the cleaning information is filled into the preset cleaning statement to generate the statements that form the cleaning code. The generated cleaning code and backup code are stored in the path pointed to by the same path information, to indicate that they perform the cleaning and backup operations on the data in the same data table. Specifically, a cleaning folder is created in the path pointed to by the path information; the cleaning folder is the folder that stores the generated cleaning code. The cleaning folder is named through cleaning naming information, which can be preset in a preset cell of the cleaning file or uploaded by the developer. The cleaning naming information is therefore obtained either by reading it from the preset cell of the cleaning file or by receiving it from the developer, and the cleaning folder is named accordingly. After the cleaning folder is named, the generated cleaning code is added into the cleaning folder for storage, so that the data to be cleaned can later be cleaned according to the stored cleaning code.
Step S40, controlling the operation of the cleaning code, cleaning target data matched with the cleaning condition information in the to-be-cleaned table corresponding to the table name information, controlling the operation of the backup code, and backing up the target data.
Further, after the cleaning code is generated, it can be called and run to perform the cleaning operation on the data to be cleaned. Specifically, when the cleaning code is run, the table to be cleaned is located through the table name information, and the data column to be cleaned is located within that table according to the column name information. The data in the data column is read and compared with the cleaning condition information to judge whether it matches. If the read data matches the cleaning condition information, it is target data to be cleaned and the cleaning operation is performed on it; if it does not match, it does not need to be cleaned. When all the data in the data column has been read and compared with the cleaning condition information, the cleaning operation on the target data in the table to be cleaned is complete.
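The generation of the cleaning code in step S30 can be sketched as filling a template. The `DELETE ... WHERE` form is an assumption: the text does not give the preset cleaning statement, and the function name and sample values are hypothetical.

```python
# Assumed form of the preset cleaning statement.
PRESET_CLEANING = "DELETE FROM {table} WHERE {column} {condition};"


def build_cleaning_code(table_name, column_name, condition):
    """Fill the cleaning information into the preset cleaning statement."""
    return PRESET_CLEANING.format(
        table=table_name, column=column_name, condition=condition
    )


# Illustrative cleaning information for a hypothetical ORDERS table.
cleaning_code = build_cleaning_code("ORDERS", "STATUS", "= 'EXPIRED'")
print(cleaning_code)
```

Because the condition text is pasted into the statement verbatim, a sketch like this is only appropriate when the cleaning file comes from trusted developers, as the text describes.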
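The effect of running the cleaning code can be illustrated against an in-memory SQLite database from Python's standard library. SQLite, the `ORDERS` table and its rows are stand-ins chosen for this sketch; the patent does not name a database product, and here the row-by-row comparison is delegated to the `WHERE` clause.

```python
import sqlite3

# An in-memory database stands in for the production database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ORDERS (ID INTEGER, STATUS TEXT)")
conn.executemany(
    "INSERT INTO ORDERS VALUES (?, ?)",
    [(1, "ACTIVE"), (2, "EXPIRED"), (3, "EXPIRED")],
)

# Cleaning code as it might be generated from the cleaning information.
cleaning_code = "DELETE FROM ORDERS WHERE STATUS = 'EXPIRED'"

# Rows whose STATUS matches the condition are cleaned; the rest are kept.
cursor = conn.execute(cleaning_code)
cleaned_rows = cursor.rowcount
remaining = conn.execute("SELECT ID, STATUS FROM ORDERS").fetchall()
conn.commit()
conn.close()
```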
In the process of cleaning the target data, erroneous operations may occur in which data that does not need to be cleaned is treated as target data and cleaned, leaving the data in the data table in an abnormal state. To avoid this, a backup mechanism is provided so that, when an erroneous operation occurs, the data that should not have been cleaned can be restored from the backed-up data. Specifically, when the target data is cleaned, the backup code is run to generate a backup table, so that the cleaned target data is transferred to the backup table for backup. The step of controlling the backup code to run and backing up the target data comprises the following steps:
step S41, calling the backup codes from the backup folder according to the path guided by the path information;
Understandably, the generated backup code is stored in the backup folder, the backup folder is named by the backup naming information, and the path pointed to by the path information gives the location of the backup folder. When the backup code needs to be called and run, the backup folder named by the backup naming information is located according to the path pointed to by the path information, and the backup code in that folder is called, so that the cleaned target data is backed up through the backup code.
Step S42, controlling the backup code to run, generating a backup table corresponding to the table name information, reading the target data and transmitting the target data to the backup table so as to backup the target data;
Further, the backup code is run to generate a backup table corresponding to the table name information; that is, the table name information is used as the name of the backup table, so that the name of the generated backup table is consistent with the name of the data table from which the target data comes, which reflects that the backup table is a backup of that data table. At the same time, the cleaned target data is read and transferred to the backup table to perform the backup operation.
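Step S42 can be sketched with SQLite from the standard library. An attached in-memory database named `bak` plays the role of the temporary user's schema, so the backup table can carry the same name as the table to be cleaned; the schema name, table layout and rows are hypothetical, and copying before deleting is an ordering chosen for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "bak" stands in for the temporary user's schema (an assumption of this sketch).
conn.execute("ATTACH DATABASE ':memory:' AS bak")
conn.execute("CREATE TABLE main.ORDERS (ID INTEGER, STATUS TEXT)")
# Backup code: a backup table with the same name and structure.
conn.execute("CREATE TABLE bak.ORDERS (ID INTEGER, STATUS TEXT)")
conn.executemany(
    "INSERT INTO main.ORDERS VALUES (?, ?)",
    [(1, "ACTIVE"), (2, "EXPIRED"), (3, "EXPIRED")],
)

# Transfer the target data to the backup table, then clean it from the source.
conn.execute(
    "INSERT INTO bak.ORDERS SELECT ID, STATUS FROM main.ORDERS "
    "WHERE STATUS = 'EXPIRED'"
)
conn.execute("DELETE FROM main.ORDERS WHERE STATUS = 'EXPIRED'")
conn.commit()

backed_up = conn.execute("SELECT ID, STATUS FROM bak.ORDERS ORDER BY ID").fetchall()
remaining = conn.execute("SELECT ID, STATUS FROM main.ORDERS").fetchall()
conn.close()
```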
Step S43, detecting the operation state of cleaning and backup of the target data, and allocating an identifier corresponding to the operation state to the target data.
During cleaning and backup of the target data, two outcomes are considered: both the backup and the cleaning succeed, or the backup succeeds but the cleaning does not. These two outcomes are taken as the operation states of cleaning and backup. A different identifier is set for each operation state; for example, identifier f1 denotes that both backup and cleaning succeeded, and identifier f2 denotes that the backup succeeded but the cleaning did not. After the target data is backed up, the operation state is detected and the corresponding identifier is assigned to the target data to record how its cleaning and backup went. Once all target data subjected to the cleaning operation has been backed up, the cleaning operation is complete; the backup operation thus allows the target data to be cleaned while guarding against misoperation.
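Steps S42 and S43 can be sketched as follows. This is a minimal illustration and not the generated code described here: it assumes a SQLite database, an attached schema named `bak` that holds the backup table under the same name as the source table, and a hypothetical column `op_state` that stores the identifiers f1 and f2.

```python
import sqlite3


def backup_and_clean(conn, table, condition):
    """Back up the rows matching `condition`, clean them, and tag each backup row.

    Assumptions (not from the source text): the backup table lives in an
    attached schema `bak`, and the identifier is kept in a column `op_state`.
    """
    cur = conn.cursor()
    # Create an empty backup table with the same name and one extra identifier column.
    cur.execute(
        f"CREATE TABLE IF NOT EXISTS bak.{table} AS "
        f"SELECT *, 'f2' AS op_state FROM main.{table} WHERE 0"
    )
    # Copy the target data; 'f2' means "backed up, not (yet) cleaned".
    cur.execute(
        f"INSERT INTO bak.{table} SELECT *, 'f2' FROM main.{table} WHERE {condition}"
    )
    try:
        # Clean the target data from the table to be cleaned.
        cur.execute(f"DELETE FROM main.{table} WHERE {condition}")
        # Both backup and cleaning succeeded: switch the identifier to 'f1'.
        cur.execute(f"UPDATE bak.{table} SET op_state = 'f1' WHERE {condition}")
    except sqlite3.Error:
        # Cleaning failed: the backup rows keep the identifier 'f2'.
        pass
    conn.commit()
```

The table name and condition are interpolated directly into the SQL text because they come from the cleaning file in this sketch; a real deployment would need to validate them first.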
In the data cleaning method of this embodiment, a developer compiles and uploads a cleaning file for cleaning data stored in a data table. When the uploaded cleaning file is received, the table name information, owner information, field information, column name information and cleaning condition information in it are read. The table name information, owner information and field information are formed into table information to be cleaned, from which the backup code is generated; the table name information, column name information and cleaning condition information are formed into cleaning information, from which the cleaning code is generated. The cleaning code is then run to clean the target data that matches the cleaning condition information in the table to be cleaned identified by the table name information, and the backup code is run at the same time to back up the cleaned target data, so as to guard against mistaken cleaning of the target data.
In this scheme, the cleaning code and the backup code are generated from several kinds of information read from the cleaning file. The cleaning information identifies the table name, the column names and the cleaning conditions of the data table holding the data to be cleaned, so the cleaning code generated from it can accurately clean the data in the named table and columns according to the cleaning conditions. The table information to be cleaned identifies the table name of that data table, its owner and its fields, so the backup code generated from it can create a backup table consistent with the data table, which in turn allows the cleaned data to be backed up accurately. Developers no longer need to write the cleaning code and backup code by hand; the generated code follows a uniform specification, the error rate is reduced, and development and testing time is saved for developers and testers.
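The grouping of the five items read from the cleaning file can be pictured with a small data-structure sketch. Everything here is an assumption made for illustration: the text does not specify how a cleaning file record is represented, so the dictionary keys and the class names `TableInfoToClean` and `CleaningInfo` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TableInfoToClean:
    """Input for backup code generation: table name, owner and fields."""
    table_name: str
    owner: str
    fields: tuple


@dataclass(frozen=True)
class CleaningInfo:
    """Input for cleaning code generation: table name, columns and condition."""
    table_name: str
    column_names: tuple
    condition: str


def split_cleaning_record(record):
    """Form the two information groups from one (hypothetical) cleaning file record."""
    table_info = TableInfoToClean(
        record["table_name"], record["owner"], tuple(record["fields"])
    )
    cleaning_info = CleaningInfo(
        record["table_name"], tuple(record["column_names"]), record["condition"]
    )
    return table_info, cleaning_info
```

Both groups share the table name information, which is what ties the backup table to the table being cleaned.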
Further, in another embodiment of the data cleaning method of the present invention, after the step of generating a cleaning code according to the cleaning information, the method includes:
Step a1, reading a table name identifier, a column name identifier and a condition identifier corresponding to the table name information, the column name information and the cleaning condition information in the cleaning information;
Understandably, data that was cleaned through misoperation needs to be restored to the table to be cleaned in which it was originally located, and this restoring operation is carried out by the rollback code. Specifically, the table name information, the column name information and the cleaning condition information that make up the cleaning information carry a table name identifier, a column name identifier and a condition identifier respectively, so that the three can be told apart within the cleaning information. After the cleaning code is generated, these three identifiers are read so that the table name information, the column name information and the cleaning condition information can be distinguished according to them.
Step a2, adding the table name information, the column name information and the cleaning condition information into a preset rollback statement according to the table name identifier, the column name identifier and the condition identifier to generate a rollback code;
Further, a preset rollback statement for generating the rollback code is set in advance; adding the table name information, the column name information and the cleaning condition information from the cleaning information into it produces the rollback statements that form the rollback code. The generated rollback code restores the data that was cleaned according to the cleaning condition information and backed up in the backup table named by the table name information to the data columns of the table to be cleaned that correspond to the table name information and the column name information. Because the cleaning information involves three items, they must be distinguished when they are added. Specifically, the preset rollback statement contains a different code for each kind of information; for example, the codes g1, g2 and g3 represent the table name information, the column name information and the cleaning condition information respectively. Each item can therefore be added according to the identifier that was read: if the table name identifier, the column name identifier and the condition identifier are w1, w2 and w3 respectively, the table name information carrying w1 is added at the position of code g1, the column name information carrying w2 at the position of code g2, and the cleaning condition information carrying w3 at the position of code g3. Once the three items have been added to the preset rollback statement, the rollback code for recovering the data that was cleaned and stored in the backup table is generated.
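The substitution described for step a2 can be sketched as a template fill. The codes g1 to g3 and the identifiers w1 to w3 come from the example above; the SQL wording of the preset rollback statement and the `bak` schema name are assumptions, since the text does not give the actual statement.

```python
# Hypothetical preset rollback statement; {g1}, {g2} and {g3} play the role of
# the codes g1, g2 and g3 for table name, column names and cleaning condition.
PRESET_ROLLBACK = "INSERT INTO {g1} ({g2}) SELECT {g2} FROM bak.{g1} WHERE {g3};"

# Each identifier carried by a piece of cleaning information selects one code.
IDENTIFIER_TO_CODE = {"w1": "g1", "w2": "g2", "w3": "g3"}


def build_rollback_code(cleaning_info):
    """Fill the preset statement from (identifier, value) pairs given in any order."""
    slots = {IDENTIFIER_TO_CODE[ident]: value for ident, value in cleaning_info}
    return PRESET_ROLLBACK.format(**slots)
```

Because each value is routed by its identifier rather than by its position, the three items may arrive in any order without ending up in the wrong place.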
Step a3, creating a rollback folder in the path indicated by the path information and storing the rollback code into the rollback folder.
Further, after the rollback code is generated it needs to be stored. The rollback code and the backup code are stored in the path indicated by the same path information, to indicate that they operate on the same data. Specifically, a rollback folder, that is, the folder that stores the generated rollback code, is created in the path indicated by the path information. The rollback folder is named with rollback naming information, which may be preset in a preset cell of the cleaning file or uploaded by a developer; it is obtained either by reading it from that preset cell or by receiving it from the developer, and the rollback folder is named accordingly. After the rollback folder is named, the generated rollback code is stored in it so that the cleaned data can later be rolled back using the stored rollback code.
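Step a3 amounts to creating a named folder under a given path and writing the generated code into it. A minimal sketch, assuming the code is stored as a text file; the function name `store_generated_code` and the file name argument are hypothetical.

```python
from pathlib import Path


def store_generated_code(base_path, folder_name, file_name, code):
    """Create the named folder under the given path and store the code in it.

    `base_path` stands for the path indicated by the path information and
    `folder_name` for the rollback (or backup) naming information.
    """
    folder = Path(base_path) / folder_name
    # Reuse the folder if it already exists, create missing parents otherwise.
    folder.mkdir(parents=True, exist_ok=True)
    target = folder / file_name
    target.write_text(code, encoding="utf-8")
    return target
```

Calling the same function with the backup naming information would place the backup folder next to the rollback folder, which matches the requirement that both sit under the same path.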
Understandably, the generated cleaning code, backup code and rollback code cannot run on their own; they need to be deployed into a specific project, whose scheduling drives the execution of each program. The deployment is carried out through a deployment document. Specifically, after the step of storing the rollback code into the rollback folder, the method includes:
Step a4, respectively reading backup naming information, rollback naming information and cleaning naming information corresponding to the backup folder, the rollback folder and the cleaning code;
Step a5, adding the backup naming information, the rollback naming information and the cleaning naming information into a preset document to generate a deployment document.
Further, to generate the deployment document, the cleaning naming information, backup naming information and rollback naming information of the cleaning folder, backup folder and rollback folder that hold the cleaning code, backup code and rollback code are read and added into a preset document, which is a file set in advance for generating the deployment document. The cleaning code, backup code and rollback code are deployed into the project through the generated deployment document: the project locates the cleaning folder, backup folder and rollback folder according to the naming information in the deployment document, and then calls the code in each folder to clean and back up the data to be cleaned in the data table and to restore cleaned data. In this way, data in the data table can be cleaned while data cleaned through misoperation can still be recovered.
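Steps a4 and a5, and the way the project later uses the result, can be sketched as a round trip. The key=value layout of the preset document is an assumption; the text only says that the three pieces of naming information are added to a preset document.

```python
# Hypothetical preset document with one line per kind of naming information.
PRESET_DOCUMENT = "backup={backup}\nrollback={rollback}\ncleaning={cleaning}\n"


def build_deployment_document(backup_name, rollback_name, cleaning_name):
    """Add the three pieces of naming information to the preset document."""
    return PRESET_DOCUMENT.format(
        backup=backup_name, rollback=rollback_name, cleaning=cleaning_name
    )


def folders_from_document(document):
    """Read the naming information back, as the project would to locate each folder."""
    return dict(line.split("=", 1) for line in document.splitlines() if line)
```

Keeping the document machine-readable lets the project resolve all three folders from a single file instead of hard-coding their names.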
Further, in another embodiment of the data cleaning method of the present invention, after the step of backing up the target data, the method includes:
Step b1, when a rollback request is received, calling the rollback code from the rollback folder according to the path indicated by the path information;
Understandably, after the target data has been cleaned, the remaining data in the data table is used as basic data for various purposes. If the target data includes data that was cleaned through misoperation, the basic data cannot be used normally, and the cleaned target data needs to be recovered; that is, the rollback code is run to perform a rollback. A developer or an operation and maintenance person triggers a rollback request; receipt of the request indicates that cleaned data needs to be recovered from the backup table. The folder corresponding to the rollback naming information is then located along the path in which the rollback code is stored, namely the path indicated by the path information; this folder is the rollback folder storing the rollback code. The rollback code is called from the rollback folder to recover the cleaned target data.
Step b2, controlling the rollback code to run, searching the backup table, and determining the data to be rolled back in the target data according to the identifiers carried by the target data in the backup table;
Further, the rollback code is run, the backup table generated by the backup code is located according to the table name information, and the data to be rolled back is determined among the target data in the backup table according to the identifier each item carries. During cleaning and backup, different identifiers were set for target data whose backup and cleaning both succeeded and for target data whose backup succeeded but whose cleaning did not. Target data that was backed up but not successfully cleaned still exists in the table to be cleaned, so it does not need to be recovered; only target data whose backup and cleaning both succeeded is recovered. Accordingly, the identifier of the operation state in which both backup and cleaning succeeded is read, and the target data carrying that identifier is determined to be the data to be rolled back.
Step b3, updating the identifier after the data to be rolled back is rolled back.
Further, after the data to be rolled back is determined, the rollback code rolls it back: the data is read and transferred to the data column of the table to be cleaned according to the column name information. That data column is the column in which the data was located before it was cleaned, so the data to be rolled back is restored to its original position in the table to be cleaned. The state of the rollback operation is then detected and the identifier is updated accordingly: when the rollback succeeds, that is, the data is restored to its original position in the table to be cleaned, the identifier is updated to indicate successful recovery; when the rollback fails, the identifier is updated to indicate failed recovery. The updated identifier thus records whether each rollback succeeded, which makes it easier for operation and maintenance personnel to maintain each item of data to be rolled back.
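Steps b2 and b3 can be sketched in the same style as the backup sketch. The setup is again assumed rather than taken from the text: a SQLite database with the backup table in an attached schema `bak`, an identifier column `op_state`, and two hypothetical identifiers r1 (recovery succeeded) and r2 (recovery failed) for the updated state.

```python
import sqlite3


def roll_back(conn, table, columns):
    """Restore rows tagged 'f1' from the backup table and update their identifier.

    Rows tagged 'f2' were never cleaned, so they are left untouched.
    """
    cols = ", ".join(columns)
    cur = conn.cursor()
    try:
        # Copy only the data whose backup and cleaning both succeeded.
        cur.execute(
            f"INSERT INTO main.{table} ({cols}) "
            f"SELECT {cols} FROM bak.{table} WHERE op_state = 'f1'"
        )
        new_state = "r1"  # hypothetical identifier: recovery succeeded
    except sqlite3.Error:
        new_state = "r2"  # hypothetical identifier: recovery failed
    # Record the outcome on the rows that were selected for rollback.
    cur.execute(
        f"UPDATE bak.{table} SET op_state = ? WHERE op_state = 'f1'", (new_state,)
    )
    conn.commit()
    return new_state
```

Filtering on the identifier is what keeps rows that are still present in the table to be cleaned from being inserted a second time.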
In addition, referring to fig. 2, the present invention provides a data cleaning device. In a first embodiment of the present invention, the data cleaning device includes:
the reading module 10 is used for reading the table name information, the owner information, the field information, the column name information and the cleaning condition information in the cleaning file when the cleaning file is received;
the first generating module 20 is configured to form the table name information, the owner information, and the field information into table information to be cleaned, and generate a backup code according to the table information to be cleaned;
a second generating module 30, configured to form the table name information, the column name information, and the cleaning condition information into cleaning information, and generate a cleaning code according to the cleaning information;
and the control module 40 is used for controlling the operation of the cleaning code, cleaning the target data matched with the cleaning condition information in the to-be-cleaned table corresponding to the table name information, and controlling the operation of the backup code to backup the target data.
In the data cleaning device of this embodiment, a developer compiles and uploads a cleaning file for cleaning data stored in a data table. When the uploaded cleaning file is received, the reading module 10 reads the table name information, owner information, field information, column name information and cleaning condition information in it. The first generating module 20 forms the table name information, owner information and field information into table information to be cleaned and generates the backup code from it; meanwhile, the second generating module 30 forms the table name information, column name information and cleaning condition information into cleaning information and generates the cleaning code from it. The control module 40 runs the cleaning code to clean the target data that matches the cleaning condition information in the table to be cleaned identified by the table name information, and at the same time runs the backup code to back up the cleaned target data, so as to guard against mistaken cleaning of the target data.
In this scheme, the cleaning code and the backup code are generated from several kinds of information read from the cleaning file. The cleaning information identifies the table name, the column names and the cleaning conditions of the data table holding the data to be cleaned, so the cleaning code generated from it can accurately clean the data in the named table and columns according to the cleaning conditions. The table information to be cleaned identifies the table name of that data table, its owner and its fields, so the backup code generated from it can create a backup table consistent with the data table, which in turn allows the cleaned data to be backed up accurately. Developers no longer need to write the cleaning code and backup code by hand; the generated code follows a uniform specification, the error rate is reduced, and development and testing time is saved for developers and testers.
Further, in another embodiment of the data cleaning device of the present invention, the first generating module includes:
the adding unit is used for adding the owner information and the table name information in the table information to be cleaned into a preset table header statement to generate a table header code;
the generating unit is used for adding the field information in the table information to be cleaned into a preset content statement to generate a table content code, and adding a preset ending mark after splicing the table header code and the table content code to generate a backup code;
and the storage unit is used for reading the path information in the cleaning file, creating a backup folder in the path guided by the path information and storing the backup code into the backup folder.
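The work of the adding unit and the generating unit can be sketched as three string pieces spliced together. The wording of the preset table header statement, content statement and ending mark below is assumed (a SQL `CREATE TABLE` layout), as is the representation of the field information as (name, type) pairs.

```python
# Hypothetical preset statements; the actual templates are not given in the text.
PRESET_HEADER = "CREATE TABLE {owner}.{table} ("
PRESET_CONTENT = "    {name} {dtype}"
PRESET_END = ");"


def build_backup_code(owner, table, fields):
    """Splice the table header code and table content code, then add the ending mark.

    `fields` is assumed to be a sequence of (column name, column type) pairs.
    """
    header = PRESET_HEADER.format(owner=owner, table=table)
    content = ",\n".join(
        PRESET_CONTENT.format(name=name, dtype=dtype) for name, dtype in fields
    )
    return f"{header}\n{content}\n{PRESET_END}"
```

Generating the header and the content separately means the same content template is reused for every field, whatever the number of fields in the cleaning file.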
Further, in another embodiment of the data cleaning device of the present invention, the data cleaning device further includes:
the acquisition module is used for reading the table name identifier, the column name identifier and the condition identifier corresponding to the table name information, the column name information and the cleaning condition information in the cleaning information;
the adding module is used for adding the table name information, the column name information and the cleaning condition information into a preset rollback statement according to the table name identifier, the column name identifier and the condition identifier to generate a rollback code;
and the storage module is used for creating a rollback folder in the path indicated by the path information and storing the rollback code into the rollback folder.
Further, in another embodiment of the data cleaning device of the present invention, the control module is further configured to:
calling the backup codes from the backup folder according to the path guided by the path information;
controlling the backup code to run, generating a backup table corresponding to the table name information, reading the target data and transmitting the target data to the backup table so as to backup the target data;
and detecting the operation states of cleaning and backup of the target data, and distributing identifiers corresponding to the operation states to the target data.
Further, in another embodiment of the data cleaning device of the present invention, the data cleaning device further includes:
the calling module is used for calling the rollback code from the rollback folder according to the path guided by the path information when the rollback request is received;
the searching module is used for controlling the rollback code to run, searching the backup table and determining the data to be rolled back in the target data according to the identifiers carried by the target data in the backup table;
and the first updating module is used for updating the identifier after the data to be rolled back is rolled back.
Further, in another embodiment of the data cleaning device of the present invention, the data cleaning device further includes:
the grabbing module is used for respectively reading backup naming information, rollback naming information and cleaning naming information corresponding to the backup folder, the rollback folder and the cleaning codes;
and the third generation module is used for adding the backup naming information, the rollback naming information and the cleaning naming information into a preset document to generate a deployment document.
Further, in another embodiment of the data cleaning device of the present invention, the data cleaning device further includes:
the determining module is used for determining the authorization cells in the cleaning file according to the preset authorization identification, reading the character string information in the authorization cells, adding the character string information into a preset authorization statement and generating the authorization statement;
and the second updating module is used for adding the authorization statement into the backup code and updating the backup code.
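The determining module and the second updating module can be sketched as follows. The marker `AUTH:`, the representation of the cleaning file as a flat list of cell strings, and the `GRANT` wording of the preset authorization statement are all assumptions made for illustration; the text only states that authorization cells are found by a preset authorization identifier and that their string information is added into a preset authorization statement.

```python
AUTH_MARK = "AUTH:"  # hypothetical preset authorization identifier
PRESET_AUTH = "GRANT SELECT ON {table} TO {grantee};"  # hypothetical preset statement


def add_authorization(backup_code, table, cells):
    """Append one authorization statement per authorization cell to the backup code.

    `cells` is assumed to be the cleaning file flattened into a list of strings.
    """
    # Determine the authorization cells and read their string information.
    grantees = [c[len(AUTH_MARK):].strip() for c in cells if c.startswith(AUTH_MARK)]
    statements = [PRESET_AUTH.format(table=table, grantee=g) for g in grantees]
    # Update the backup code by adding the generated authorization statements.
    return "\n".join([backup_code, *statements])
```

If the cleaning file contains no authorization cell, the backup code is returned unchanged.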
The virtual function modules of the data cleaning apparatus are stored in the memory 1005 of the data cleaning device shown in fig. 3, and when the processor 1001 executes the data cleaning program, the functions of the modules in the embodiment shown in fig. 2 are implemented.
Referring to fig. 3, fig. 3 is a schematic device structure of a hardware running environment related to a method according to an embodiment of the present invention.
The data cleaning device in the embodiment of the invention may be a PC (personal computer) or a terminal device such as a smartphone, a tablet computer, an e-book reader or a portable computer.
As shown in fig. 3, the data cleaning device may include: a processor 1001, such as a CPU (Central Processing Unit), a memory 1005, and a communication bus 1002. The communication bus 1002 is used to enable connection and communication between the processor 1001 and the memory 1005. The memory 1005 may be a high-speed RAM (random access memory) or a non-volatile memory, such as a disk memory. The memory 1005 may optionally also be a storage device separate from the processor 1001.
Optionally, the data cleaning device may further include a user interface, a network interface, a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a WiFi (Wireless Fidelity) module, and the like. The user interface may comprise a display and an input unit such as a keyboard, and may optionally further comprise a standard wired interface and a wireless interface. The network interface may optionally include a standard wired interface and a wireless interface (e.g., a Wi-Fi interface).
Those skilled in the art will appreciate that the structure shown in fig. 3 does not limit the data cleaning device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in fig. 3, an operating system, a network communication module, and a data cleaning program may be included in a memory 1005, which is a computer-readable storage medium. An operating system is a program that manages and controls the hardware and software resources of a data cleaning device, supporting the execution of data cleaning programs and other software and/or programs. The network communication module is used to enable communication between components within the memory 1005 and other hardware and software in the data cleansing device.
In the data cleaning device shown in fig. 3, a processor 1001 is configured to execute a data cleaning program stored in a memory 1005, and implement the steps in the embodiments of the data cleaning method described above.
The present invention provides a computer-readable storage medium storing one or more programs that are further executable by one or more processors for implementing the steps in the embodiments of the data cleaning method described above.
It should also be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the above-described embodiment method may be implemented by means of software plus a necessary general hardware platform, but of course may also be implemented by means of hardware, but in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product stored in a computer readable storage medium (e.g. ROM/RAM, magnetic disk, optical disk) as described above, comprising instructions for causing a terminal device (which may be a mobile phone, a computer, a server, or a network device, etc.) to perform the method according to the embodiments of the present invention.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention, and all equivalent structural changes made by the specification and drawings of the present invention or direct/indirect application in other related technical fields are included in the scope of the present invention.

Claims (10)

1. The data cleaning method is characterized by comprising the following steps of:
when a cleaning file is received, reading table name information, owner information, field information, column name information and cleaning condition information in the cleaning file;
forming the table name information, the owner information and the field information into table information to be cleaned, and generating a backup code according to the table information to be cleaned;
forming cleaning information by the table name information, the column name information and the cleaning condition information, and generating a cleaning code according to the cleaning information;
controlling the operation of the cleaning code, cleaning target data matched with the cleaning condition information in a table to be cleaned corresponding to the table name information, and simultaneously controlling the operation of the backup code to backup the target data;
the step of generating the backup code according to the table information to be cleaned comprises the following steps:
reading path information in the cleaning file, creating a backup folder in the path indicated by the path information, and storing the backup code into the backup folder;
wherein a preset cleaning statement for generating the cleaning code is set in advance, the formed cleaning information is passed into the preset cleaning statement to generate the cleaning statements that form the cleaning code, and the generated cleaning code and backup code are stored in the path indicated by the same path information.
2. The data cleansing method as defined in claim 1, wherein the step of generating backup codes according to the table information to be cleansed comprises:
adding the owner information and the table name information in the table information to be cleaned into a preset table head statement to generate a table head code;
and adding field information in the table information to be cleaned into a preset content statement to generate a table content code, and adding a preset ending mark after splicing the table header code and the table content code to generate a backup code.
3. The data cleansing method of claim 2, wherein the step of generating a cleansing code according to the cleansing information comprises:
Reading a table name identifier, a column name identifier and a condition identifier corresponding to the table name information, the column name information and the cleaning condition information in the cleaning information;
adding the table name information, the column name information and the cleaning condition information into a preset rollback statement according to the table name identifier, the column name identifier and the condition identifier to generate a rollback code;
and creating a rollback folder in a path pointed by the path information, and storing the rollback code into the rollback folder.
4. The data cleansing method as recited in claim 3, wherein said step of controlling said backup code to run and backing up said target data comprises:
calling the backup codes from the backup folder according to the path guided by the path information;
controlling the backup code to run, generating a backup table corresponding to the table name information, reading the target data and transmitting the target data to the backup table so as to backup the target data;
and detecting the operation states of cleaning and backup of the target data, and distributing identifiers corresponding to the operation states to the target data.
5. The data cleansing method as defined in claim 4, wherein the step of backing up the target data comprises:
When a rollback request is received, calling the rollback code from the rollback folder according to the path guided by the path information;
controlling the rollback code to run, searching the backup table, and determining the data to be rolled back in the target data according to identifiers carried by the target data in the backup table;
and updating the identifier after the data to be rolled back is rolled back.
6. The data cleansing method as recited in claim 3, wherein said step of storing said rollback code into said rollback folder comprises, after:
respectively reading backup naming information, rollback naming information and cleaning naming information corresponding to the backup folder, the rollback folder and the cleaning code;
and adding the backup naming information, the rollback naming information and the cleaning naming information into a preset document to generate a deployment document.
7. The data cleansing method as defined in any one of claims 1-6, wherein said step of generating backup codes from said table information to be cleansed comprises:
determining an authorization cell in the cleaning file according to a preset authorization identifier, reading character string information in the authorization cell, adding the character string information into a preset authorization statement, and generating an authorization statement;
and adding the authorization statement into the backup code, and updating the backup code.
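The authorization step of claim 7 can be illustrated with a small sketch. It assumes the cleaning file has been loaded as a list of rows of cell strings, that the preset authorization identifier is the label `authorize_to`, and that the character string to read sits in the cell immediately to its right. The `GRANT` template is likewise a hypothetical preset statement, not one taken from the patent.

```python
AUTH_IDENTIFIER = "authorize_to"  # hypothetical preset authorization identifier
PRESET_AUTH_STATEMENT = "GRANT SELECT ON {owner}.{backup_table} TO {grantee};"


def find_authorization_string(rows):
    """Return the string next to the identifier cell, or None if absent."""
    for row in rows:
        for i, cell in enumerate(row[:-1]):
            if cell == AUTH_IDENTIFIER:
                return row[i + 1]
    return None


def add_authorization(backup_code, rows, owner, backup_table):
    """Append an authorization statement to the backup code when one is configured."""
    grantee = find_authorization_string(rows)
    if not grantee:
        return backup_code  # nothing to authorize; leave the code unchanged
    statement = PRESET_AUTH_STATEMENT.format(
        owner=owner, backup_table=backup_table, grantee=grantee
    )
    return backup_code.rstrip("\n") + "\n" + statement + "\n"
```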
8. A data cleaning device, the data cleaning device comprising:
the reading module is used for reading table name information, owner information, field information, column name information and cleaning condition information in the cleaning file when the cleaning file is received;
the first generation module is used for forming table name information, owner information and field information into table information to be cleaned, generating backup codes according to the table information to be cleaned, reading path information in a cleaning file, creating a backup folder in a path guided by the path information, storing the backup codes into the backup folder, presetting a preset cleaning statement for generating the cleaning codes, transmitting the formed cleaning information into the preset cleaning statement, generating a cleaning statement for forming the cleaning codes, and storing the generated cleaning codes and the backup codes in the path guided by the same path information;
the second generation module is used for forming the table name information, the column name information and the cleaning condition information into cleaning information and generating a cleaning code according to the cleaning information;
and the control module is used for controlling the operation of the cleaning code, cleaning target data matched with the cleaning condition information in the to-be-cleaned table corresponding to the table name information, and simultaneously controlling the operation of the backup code to back up the target data.
9. A data cleaning apparatus, the data cleaning apparatus comprising: a memory, a processor, a communication bus, and a data cleaning program stored on the memory;
the communication bus is used for realizing connection communication between the processor and the memory;
the processor is configured to execute the data cleaning program to implement the steps of the data cleaning method according to any one of claims 1-7.
10. A computer readable storage medium, characterized in that the computer readable storage medium has stored thereon a data cleaning program, which when executed by a processor, implements the steps of the data cleaning method according to any of claims 1-7.
CN201811468867.4A 2018-11-30 2018-11-30 Data cleaning method, device, equipment and computer readable storage medium Active CN109710596B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811468867.4A CN109710596B (en) 2018-11-30 2018-11-30 Data cleaning method, device, equipment and computer readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811468867.4A CN109710596B (en) 2018-11-30 2018-11-30 Data cleaning method, device, equipment and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN109710596A CN109710596A (en) 2019-05-03
CN109710596B CN109710596B (en) 2023-12-19

Family

ID=66254570

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811468867.4A Active CN109710596B (en) 2018-11-30 2018-11-30 Data cleaning method, device, equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN109710596B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222039B (en) * 2019-05-07 2023-09-29 平安科技(深圳)有限公司 Data storage and garbage data cleaning method, device, equipment and storage medium
CN114071389A (en) * 2020-07-31 2022-02-18 中国移动通信集团新疆有限公司 Test verification method and device, computer equipment and storage medium
CN112905386A (en) * 2021-02-08 2021-06-04 中国工商银行股份有限公司 Table data backup cleaning method and device based on life cycle

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000300797A (en) * 1999-04-20 2000-10-31 Sophia Co Ltd Arithmetic processing device for game and arithmetic processing method for game
CN103593352A (en) * 2012-08-15 2014-02-19 阿里巴巴集团控股有限公司 Method and device for cleaning mass data
CN104036001A (en) * 2014-06-13 2014-09-10 上海新炬网络技术有限公司 Dynamic hotlist priority scheduling based quick data cleaning method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050028046A1 (en) * 2003-07-31 2005-02-03 International Business Machines Corporation Alert flags for data cleaning and data analysis
US7865473B2 (en) * 2007-04-02 2011-01-04 International Business Machines Corporation Generating and indicating incremental backup copies from virtual copies of a data set
US10394667B2 (en) * 2015-02-25 2019-08-27 Hyland Switzerland Sàrl System and methods for backing up and restoring database objects

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Building a Script code generator for data filtering with JavaBean; 林达德; 宁波职业技术学院学报 (05); full text *

Also Published As

Publication number Publication date
CN109710596A (en) 2019-05-03

Similar Documents

Publication Publication Date Title
CN109710596B (en) Data cleaning method, device, equipment and computer readable storage medium
US9146839B2 (en) Method for pre-testing software compatibility and system thereof
US9454439B2 (en) Disaster recovery validation
CN107729041A (en) The hot update method of application program, device, terminal and storage medium
CN102117234B (en) Method for recovering original software by mobile terminal in software upgrading failure
EP1770512A2 (en) Method and system for updating software
CN110673936B (en) Breakpoint continuous operation method and device for arrangement service, storage medium and electronic equipment
CN109558318B (en) Code management method and code warehouse distributed system
US10635575B2 (en) Testing of enterprise resource planning systems
Xu et al. Minimizing the side effect of context inconsistency resolution for ubiquitous computing
CN110502399B (en) Fault detection method and device
CN114780019A (en) Electronic device management method and device, electronic device and storage medium
CN105117242A (en) System resetting method and terminal
CN105574026A (en) Method and device for service supporting by using non-relational database
CN113312205B (en) Data verification method and device, storage medium and computer equipment
CN110083493A (en) A kind of embedded system failure self-recovery method, terminal device and storage medium
CN109634782B (en) Method and device for detecting system robustness, storage medium and terminal
CN110928945B (en) Data processing method and device for database and data processing system
CN115061858B (en) Data persistence method and device, computer equipment and storage medium
CN105159701A (en) System resetting method and terminal
CN115098889A (en) Authority management method, device, equipment and storage medium
CN109117190A (en) System start method and device
CN112965865A (en) Personal computer restart testing method, device and system
CN112596954A (en) Data backup and reconstruction method, device, equipment and storage medium
CN112579358B (en) Backup point detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant