CN108319694B - Automatic historical data cleaning method and device - Google Patents

Automatic historical data cleaning method and device

Info

Publication number
CN108319694B
Authority
CN
China
Prior art keywords
data
cleaned
cleaning
current
master
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810104814.8A
Other languages
Chinese (zh)
Other versions
CN108319694A (en)
Inventor
谢小兵
林楷坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd filed Critical Bank of China Ltd
Priority to CN201810104814.8A priority Critical patent/CN108319694B/en
Publication of CN108319694A publication Critical patent/CN108319694A/en
Application granted granted Critical
Publication of CN108319694B publication Critical patent/CN108319694B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/219 Managing data history or versioning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses an automatic historical data cleaning method and device. A data cleaning master control parameter table, a cleaning rule table, a consistency rule table and a data table to be cleaned are read to obtain the current data master table to be cleaned, the association table corresponding to the current data master table to be cleaned, and each cleaning rule of the current data master table to be cleaned. The related historical data of the current data master table to be cleaned and of its corresponding association table is then cleaned according to the current data master table to be cleaned and each of its cleaning rules. Historical data is thus cleaned automatically according to configurable cleaning rules, and data consistency among the data tables is ensured during cleaning.

Description

Automatic historical data cleaning method and device
Technical Field
The invention relates to the technical field of data cleaning, in particular to an automatic historical data cleaning method and device.
Background
Business data in a computer system typically accumulates over time. If overly large volumes of historical business data are not cleaned in time, they occupy excessive disk space and degrade the response performance of the system. It is therefore necessary to clean historical data older than a certain period in a timely manner.
Existing application systems generally clean historical data in one of two ways:
Method 1: manually issue an SQL TRUNCATE, SQL DELETE, or SQL DROP/CREATE TABLE command for each data table that needs cleaning;
Method 2: implement the cleaning programmatically, defining and using a database cursor in a cleaning program and deleting the records of each associated data table one by one, starting from the main table.
Disadvantages of Method 1: it is difficult to guarantee consistency among the data tables, and the manual process is error-prone, which can leave the data in an inconsistent state.
Disadvantages of Method 2: it is not configurable. Because it depends on a programmed implementation, the cleaning logic must be implemented separately for each application system, so it is neither general nor portable. Whenever a cleaning rule must be adjusted or the range of tables to be cleaned must be extended, the program has to be modified, which makes maintenance complicated and error-prone.
Disclosure of Invention
In view of this, the present invention provides an automatic historical data cleaning method and apparatus, which automatically execute the historical data cleaning work according to a configurable cleaning rule and a consistency rule, and improve the historical data cleaning efficiency.
In order to achieve the above purpose, the invention provides the following specific technical scheme:
a method for automatically cleaning historical data comprises the following steps:
reading a data cleaning master control parameter table, a cleaning rule table, a consistency rule table and a data table to be cleaned to obtain a current data master table to be cleaned, an association table corresponding to the current data master table to be cleaned and each cleaning rule of the current data master table to be cleaned;
and according to the current data master table to be cleaned and each cleaning rule of the current data master table to be cleaned, cleaning the association table corresponding to the current data master table to be cleaned and the related historical data of the current data master table to be cleaned in sequence.
Preferably, the reading of the data cleaning master control parameter table, the cleaning rule table, the consistency rule table, and the data table to be cleaned to obtain the current data master table to be cleaned, the association table corresponding to the current data master table to be cleaned, and each cleaning rule of the current data master table to be cleaned includes:
reading a data cleaning master control parameter table, and acquiring historical data cleaning global control parameters, wherein the historical data cleaning global control parameters comprise a cleaning execution control valid flag, a latest cleaning date and a cleaning frequency;
when the cleaning execution control valid flag is set, judging whether historical data currently needs to be cleaned according to the current date, the latest cleaning date and the cleaning frequency;
if yes, reading the data table to be cleaned, acquiring a current data master table to be cleaned, reading the cleaning rule table, acquiring each cleaning rule of the current data master table to be cleaned, reading the consistency rule table, and acquiring an association table corresponding to the current data master table to be cleaned.
Preferably, the sequentially cleaning the association table corresponding to the current data master table to be cleaned and the relevant historical data of the current data master table to be cleaned according to the current data master table to be cleaned and each cleaning rule of the current data master table to be cleaned includes:
determining a main key of a current data main table to be cleaned according to the data table to be cleaned and the data definition table;
generating an SQL statement for deleting the historical data of the current data main table to be cleaned according to the main key of the current data main table to be cleaned;
respectively generating SQL sentences for deleting the historical data of the association table corresponding to the current data main table to be cleaned according to the association table corresponding to the current data main table to be cleaned;
and sequentially executing the SQL statement for deleting the historical data of the association table corresponding to the current data main table to be cleaned and the SQL statement for deleting the historical data of the current data main table to be cleaned.
Preferably, each association table corresponding to the current data master table to be cleaned has a sequence number, and each sequence number represents the strength of the association between that association table and the current data master table to be cleaned;
the sequentially executing the SQL statement for deleting the historical data of the association table corresponding to the current data master table to be cleaned and the SQL statement for deleting the historical data of the current data master table to be cleaned specifically comprises:
executing the SQL statements for deleting the historical data of the association tables corresponding to the current data master table to be cleaned in descending order of sequence number, and then executing the SQL statement for deleting the historical data of the current data master table to be cleaned.
Preferably, the method further comprises:
and recording a cleaning data log in the cleaning process of the historical data.
Preferably, the method further comprises:
and recording a cleaning process log in the cleaning process of the historical data.
An automatic cleaning device for historical data, comprising:
the reading unit is used for reading the data cleaning master control parameter table, the cleaning rule table, the consistency rule table and the data table to be cleaned to obtain a current data master table to be cleaned, an association table corresponding to the current data master table to be cleaned and each cleaning rule of the current data master table to be cleaned;
and the cleaning unit is used for sequentially cleaning the association table corresponding to the current data master table to be cleaned and the related historical data of the current data master table to be cleaned according to the current data master table to be cleaned and each cleaning rule of the current data master table to be cleaned.
Preferably, the reading unit includes:
the first reading subunit is used for reading the data cleaning master control parameter table and acquiring historical data cleaning global control parameters, wherein the historical data cleaning global control parameters comprise a cleaning execution control valid flag, a latest cleaning date and a cleaning frequency;
a judging subunit, configured to, when the cleaning execution control valid flag is set, judge whether historical data currently needs to be cleaned according to the current date, the latest cleaning date and the cleaning frequency; if yes, triggering a second reading subunit;
the second reading subunit is configured to read the data table to be cleaned, obtain the current data master table to be cleaned, read the cleaning rule table, obtain each cleaning rule of the current data master table to be cleaned, read the consistency rule table, and obtain the association table corresponding to the current data master table to be cleaned.
Preferably, the cleaning unit includes:
the determining subunit is used for determining a main key of the current data main table to be cleaned according to the data table to be cleaned and the data definition table;
the first generation subunit is used for generating an SQL statement for deleting the historical data of the current data main table to be cleaned according to the main key of the current data main table to be cleaned;
the second generation subunit is used for respectively generating SQL statements for deleting the historical data of the association table corresponding to the current main data table to be cleaned according to the association table corresponding to the current main data table to be cleaned;
and the execution subunit is used for sequentially executing the SQL statement for deleting the historical data of the association table corresponding to the current main table of the data to be cleaned and the SQL statement for deleting the historical data of the current main table of the data to be cleaned.
Preferably, the apparatus further comprises:
and the cleaning data log recording unit is used for recording the cleaning data log in the cleaning process of the historical data.
Preferably, the apparatus further comprises:
and the cleaning process log recording unit is used for recording the cleaning process log in the cleaning process of the historical data.
Compared with the prior art, the invention has the following beneficial effects:
The invention discloses an automatic historical data cleaning method and device. A data cleaning master control parameter table, a cleaning rule table, a consistency rule table and a data table to be cleaned are read to obtain the current data master table to be cleaned, the association table corresponding to the current data master table to be cleaned, and each cleaning rule of the current data master table to be cleaned. The related historical data of the current data master table to be cleaned and of its corresponding association table is then cleaned according to the current data master table to be cleaned and each of its cleaning rules. Historical data is thus cleaned automatically according to configurable cleaning rules, and data consistency among the data tables is ensured during cleaning.
Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below show only embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for automatically cleaning historical data according to an embodiment of the present invention;
FIG. 2 is a flowchart of another method for automatically cleaning historical data according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a data cleaning master control parameter table according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a consistency rule table according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a data table to be cleaned according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a cleaning rule table according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a data definition table according to an embodiment of the present invention;
FIG. 8 is a flowchart of another method for automatically cleaning historical data according to an embodiment of the present invention;
FIG. 9 is a schematic diagram of a cleaning data log table according to an embodiment of the present invention;
FIG. 10 is a schematic diagram of a cleaning process log table according to an embodiment of the present invention;
FIG. 11 is a schematic structural diagram of an automatic historical data cleaning device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, this embodiment discloses an automatic historical data cleaning method, which specifically includes the following steps:
S101: reading a data cleaning master control parameter table, a cleaning rule table, a consistency rule table and a data table to be cleaned to obtain a current data master table to be cleaned, an association table corresponding to the current data master table to be cleaned and each cleaning rule of the current data master table to be cleaned;
Referring to FIG. 2, a preferred implementation of S101 includes:
S201: reading a data cleaning master control parameter table, and acquiring historical data cleaning global control parameters, which comprise a cleaning execution control valid flag, a latest cleaning date and a cleaning frequency;
S202: when the cleaning execution control valid flag is set, judging whether historical data currently needs to be cleaned according to the current date, the latest cleaning date and the cleaning frequency;
If yes, S203 is executed. S203: reading a data table to be cleaned, acquiring a current data master table to be cleaned, reading a cleaning rule table, acquiring each cleaning rule of the current data master table to be cleaned, reading a consistency rule table, and acquiring an association table corresponding to the current data master table to be cleaned;
If not, the process returns to S202.
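The decision in S201 and S202 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `ControlParams` class, its field names, and the assumption that the cleaning frequency is a number of days are all hypothetical, since the text does not state the unit.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ControlParams:
    """Hypothetical subset of the master control parameter table."""
    enabled: bool             # cleaning execution control valid flag
    last_cleaning_date: date  # latest cleaning date
    frequency_days: int       # cleaning frequency (assumed unit: days)


def cleaning_due(params: ControlParams, today: date) -> bool:
    """Return True when the flag is set and the cleaning interval has elapsed."""
    if not params.enabled:
        return False
    next_due = params.last_cleaning_date + timedelta(days=params.frequency_days)
    return today >= next_due
```

With a latest cleaning date of 2018-01-01 and an assumed frequency of 30 days, cleaning becomes due on 2018-01-31.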
It should be noted that the data cleaning master control parameter table includes fields such as a system identifier, a cleaning execution control valid flag, whether the consistency check has passed, whether the cleaning rule is recorded, whether the table structure definition of the cleaned data is recorded, whether data is backed up, the latest cleaning date, the cleaning frequency, the number of records per COMMIT, the modification date, the modification time, the accounting date corresponding to the modification, the modifying teller and the authorizing teller. See FIG. 3, which is a schematic diagram of the data cleaning master control parameter table.
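The text lists a "number of records per COMMIT" field without describing how it is used. One plausible reading, shown here purely as an assumption, is that deletions are committed in batches of that size so that a single large transaction is avoided. The function name, table name and key list are hypothetical, and Python's standard `sqlite3` module stands in for the production database.

```python
import sqlite3


def delete_in_batches(conn, table, pk, keys, records_per_commit):
    """Delete rows by primary key, committing after every batch.

    Table and column names are assumed to come from trusted configuration
    tables; only the key values are passed as bound parameters.
    """
    if records_per_commit <= 0:
        raise ValueError("records_per_commit must be positive")
    commits = 0
    for start in range(0, len(keys), records_per_commit):
        batch = keys[start:start + records_per_commit]
        placeholders = ", ".join("?" for _ in batch)
        conn.execute(f"DELETE FROM {table} WHERE {pk} IN ({placeholders})", batch)
        conn.commit()  # one COMMIT per batch
        commits += 1
    return commits
```

Deleting five keys with a batch size of two would issue three commits under this sketch.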
The consistency rule table defines the association between the main table and its association tables, so that consistency among the data tables is preserved when the association tables are cleaned. The consistency rule table is generated as a script during development and released with each version; no transaction-based maintenance function is provided for it.
The main table is the data table on which the data of an association table depends: the foreign key of the association table is the primary key of the main table. Take the simplest data model, consisting of three tables for departments, employees and employee punch-card records. In the relationship between the department table and the employee table, the department table is the main table and the employee table is the association table. When a department record is deleted, it must be guaranteed that no employee records remain under that department.
The basic requirements for data consistency are:
1. records in the main table and the association table are linked by the primary key of the main table and the matching foreign key of the association table; if a main table record is cleaned, the linked association table records must be cleaned at the same time or beforehand;
2. if an association table record still exists, the corresponding main table record must also exist.
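The two requirements above can be illustrated with the department and employee example. The sketch below uses Python's standard `sqlite3` module with foreign key enforcement switched on; the table and column names are hypothetical. Deleting the department first is rejected by the database, while deleting the employee records first succeeds.

```python
import sqlite3


def make_db():
    """Create an in-memory database with one department and one employee."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.execute("CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT)")
    conn.execute(
        "CREATE TABLE employee ("
        "emp_id INTEGER PRIMARY KEY, "
        "dept_id INTEGER NOT NULL REFERENCES department(dept_id))"
    )
    conn.execute("INSERT INTO department VALUES (1, 'Archive')")
    conn.execute("INSERT INTO employee VALUES (10, 1)")
    conn.commit()
    return conn


def delete_department(conn, dept_id, children_first):
    """Try to delete a department; return True if the deletion was committed."""
    try:
        if children_first:
            # Requirement 1: clean the association table beforehand.
            conn.execute("DELETE FROM employee WHERE dept_id = ?", (dept_id,))
        conn.execute("DELETE FROM department WHERE dept_id = ?", (dept_id,))
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # Requirement 2 would be violated: an employee row still references it.
        conn.rollback()
        return False
```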
The consistency rule table includes fields such as the main table name, the association grouping category and the sequence number within the association group; see FIG. 4, which is a schematic diagram of the consistency rule table.
See FIG. 5, which is a schematic diagram of the data table to be cleaned. The data table to be cleaned records field information such as whether to record the cleaning process log and whether to record the cleaned data.
Data cleaning rules can be defined only for data tables listed in the data table to be cleaned. The conditions within a single rule record are combined with AND. Each table may define multiple rule records, and these records are combined with OR.
Cleaning rules are usually defined for the main table, and the association tables are cleaned according to the data cleaned from the main table. See FIG. 6 for the detailed structure of the cleaning rule table.
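The AND/OR combination of rule records can be sketched as a small WHERE clause builder. The layout of a rule record is not given in the text (FIG. 6 is not reproduced), so the representation below, a list of `(column, operator, value)` conditions per record, is an assumption, as are the function name and the operator whitelist.

```python
ALLOWED_OPS = {"=", "<>", "<", "<=", ">", ">="}


def build_where(rule_records):
    """Combine conditions with AND inside a record and OR across records.

    Returns a parameterized WHERE clause and the list of bound values.
    """
    groups, params = [], []
    for record in rule_records:
        parts = []
        for column, op, value in record:
            # Column names and operators are validated; values are bound.
            if op not in ALLOWED_OPS or not column.isidentifier():
                raise ValueError(f"unsupported condition: {column} {op}")
            parts.append(f"{column} {op} ?")
            params.append(value)
        groups.append("(" + " AND ".join(parts) + ")")
    return " OR ".join(groups), params
```

Two rule records, one with two conditions and one with a single condition, would yield a clause of the form `(a = ? AND b < ?) OR (b < ?)`.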
S102: and according to the current data master table to be cleaned and each cleaning rule of the current data master table to be cleaned, cleaning the association table corresponding to the current data master table to be cleaned and the related historical data of the current data master table to be cleaned in sequence.
Referring to FIG. 2, a preferred implementation of S102 is as follows:
S204: determining the primary key of the current data master table to be cleaned according to the data table to be cleaned and the data definition table;
S205: generating an SQL statement for deleting the historical data of the current data master table to be cleaned according to the primary key of the current data master table to be cleaned;
S206: generating, for each association table corresponding to the current data master table to be cleaned, an SQL statement for deleting the historical data of that association table;
S207: executing in sequence the SQL statements for deleting the historical data of the association tables corresponding to the current data master table to be cleaned and then the SQL statement for deleting the historical data of the current data master table to be cleaned.
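Steps S205 to S207 can be sketched as below. This is an illustrative reading rather than the patented implementation: it assumes that each association table stores the main table's primary key under the same column name, that table and column names come from trusted configuration tables, and that `sqlite3` stands in for the production database. The `orders` and `order_items` tables are hypothetical.

```python
import sqlite3


def generate_delete_statements(main_table, primary_key, where_clause, association_tables):
    """Return DELETE statements: association tables first, main table last."""
    # Sub-query selecting the primary keys of the main-table rows to be cleaned.
    key_query = f"SELECT {primary_key} FROM {main_table} WHERE {where_clause}"
    statements = [
        f"DELETE FROM {assoc} WHERE {primary_key} IN ({key_query})"
        for assoc in association_tables
    ]
    statements.append(f"DELETE FROM {main_table} WHERE {where_clause}")
    return statements


def run_demo():
    """Clean orders older than 2016 together with their items."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (order_id INTEGER PRIMARY KEY, order_date TEXT);
        CREATE TABLE order_items (item_id INTEGER PRIMARY KEY, order_id INTEGER);
        INSERT INTO orders VALUES (1, '2015-03-01'), (2, '2018-01-15');
        INSERT INTO order_items VALUES (100, 1), (101, 1), (102, 2);
        """
    )
    statements = generate_delete_statements(
        "orders", "order_id", "order_date < '2016-01-01'", ["order_items"]
    )
    with conn:  # one transaction: commit on success, roll back on error
        for sql in statements:
            conn.execute(sql)
    orders_left = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    items_left = conn.execute("SELECT COUNT(*) FROM order_items").fetchone()[0]
    return orders_left, items_left
```

Because the association-table statements select their keys from the main table, they must run before the main-table statement, which matches the order required by S207.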
It should be noted that each time a version update of the application system is installed, if any data table structure has changed, the definitions of all data tables in the latest version should be written into the data definition table. Referring to FIG. 7, the data definition table records fields such as the corresponding version number, the main table name, the data column name, the type, the length, whether the column is a primary key, and whether it is nullable.
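Looking up the primary key in the data definition table (step S204) might look like the sketch below. The `ColumnDef` structure is a hypothetical, reduced form of the fields listed above, and the assumption that the highest version number identifies the latest definition is mine, not the text's.

```python
from typing import NamedTuple


class ColumnDef(NamedTuple):
    """Hypothetical subset of one data definition table row."""
    version: int
    table_name: str
    column_name: str
    is_primary_key: bool


def primary_key_columns(definitions, table_name):
    """Return the primary key columns of a table in its latest recorded version."""
    rows = [d for d in definitions if d.table_name == table_name]
    if not rows:
        raise KeyError(f"no definition recorded for table {table_name!r}")
    latest = max(d.version for d in rows)  # assumed: larger number = newer version
    return [d.column_name for d in rows if d.version == latest and d.is_primary_key]
```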
Preferably, each association table corresponding to the current data master table to be cleaned has a sequence number, and each sequence number represents the strength of the association between that association table and the current data master table to be cleaned; executing the SQL statement for deleting the historical data of the current data master table to be cleaned and the SQL statements for deleting the historical data of the association tables corresponding to the current data master table to be cleaned then specifically comprises:
executing the SQL statements for deleting the historical data of the association tables corresponding to the current data master table to be cleaned in descending order of sequence number, and then executing the SQL statement for deleting the historical data of the current data master table to be cleaned.
It should be noted that the larger the sequence number, the weaker the association between the corresponding association table and the current data master table to be cleaned.
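The ordering rule can be sketched in a few lines. The function name and the mapping from table name to sequence number are hypothetical; the example reuses the department, employee and punch-card record model described earlier, with assumed sequence numbers.

```python
def deletion_order(main_table, association_numbers):
    """Return table names in cleaning order.

    association_numbers maps each association table to its sequence number.
    Tables with larger numbers (weaker association) are cleaned first,
    and the main table is cleaned last.
    """
    ordered = sorted(association_numbers, key=association_numbers.get, reverse=True)
    return ordered + [main_table]
```

If the employee table has sequence number 1 and the punch-card record table has sequence number 2, the punch-card records are cleaned first, then the employees, then the departments.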
The automatic historical data cleaning method disclosed in this embodiment reads a data cleaning master control parameter table, a cleaning rule table, a consistency rule table and a data table to be cleaned, and obtains the current data master table to be cleaned, the association table corresponding to the current data master table to be cleaned, and each cleaning rule of the current data master table to be cleaned. It then cleans the related historical data of the current data master table to be cleaned and of its corresponding association table according to the current data master table to be cleaned and each of its cleaning rules. Historical data is thus cleaned automatically according to configurable cleaning rules, and data consistency among the data tables is ensured during cleaning.
Referring to FIG. 8, this embodiment discloses another automatic historical data cleaning method, which specifically includes the following steps:
S301: reading a data cleaning master control parameter table, a cleaning rule table, a consistency rule table and a data table to be cleaned to obtain a current data master table to be cleaned, an association table corresponding to the current data master table to be cleaned and each cleaning rule of the current data master table to be cleaned;
S302: according to the current data master table to be cleaned and each cleaning rule of the current data master table to be cleaned, cleaning in sequence the association table corresponding to the current data master table to be cleaned and the related historical data of the current data master table to be cleaned;
S303: recording a cleaning data log during the cleaning of the historical data;
Referring to FIG. 9, which is a schematic diagram of the cleaning data log table, the cleaning data log is recorded in the cleaning data log table, which includes fields such as cleaning batch, data table name, cleaning date, cleaning time, primary key field name and primary key value.
S304: recording a cleaning process log during the cleaning of the historical data.
Referring to FIG. 10, which is a schematic diagram of the cleaning process log table, the cleaning process log is recorded in the cleaning process log table, which includes fields such as cleaning batch, cleaning table name, cleaning date, data backup tape, cleaning operation, cleaning record number, cleaning start time and cleaning end time.
In the automatic historical data cleaning method disclosed in this embodiment, the cleaning data log and the cleaning process log are recorded during the cleaning of the historical data, so that the cleaning can be traced back and the data recovered according to these logs.
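Recording the cleaning data log alongside a deletion could be sketched as follows. The six log columns follow the fields named for FIG. 9, but the column names, the `clean_with_log` function, and the choice to write the log and perform the deletion in one transaction are assumptions. A `cleaning_data_log` table with those six columns is assumed to exist already, and `sqlite3` stands in for the production database.

```python
import sqlite3
from datetime import datetime


def clean_with_log(conn, batch, table, pk, where_clause, now=None):
    """Log the primary key of every row to be cleaned, then delete the rows.

    Table and column names are assumed to come from trusted configuration.
    Returns the number of rows cleaned.
    """
    now = now or datetime.now()
    keys = [row[0] for row in conn.execute(f"SELECT {pk} FROM {table} WHERE {where_clause}")]
    with conn:  # log entries and deletion commit or roll back together
        conn.executemany(
            "INSERT INTO cleaning_data_log VALUES (?, ?, ?, ?, ?, ?)",
            [
                (batch, table, now.date().isoformat(),
                 now.time().isoformat(timespec="seconds"), pk, str(key))
                for key in keys
            ],
        )
        conn.execute(f"DELETE FROM {table} WHERE {where_clause}")
    return len(keys)
```

Keeping one log row per primary key value is what would make later backtracking possible; restoring full records would additionally need the backed-up data, which this sketch does not cover.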
Based on the automatic historical data cleaning method disclosed in the above embodiment, and referring to FIG. 11, this embodiment correspondingly discloses an automatic historical data cleaning device, which includes:
a reading unit 401, configured to read a data cleaning master control parameter table, a cleaning rule table, a consistency rule table, and a data table to be cleaned, to obtain a current data master table to be cleaned, an association table corresponding to the current data master table to be cleaned, and each cleaning rule of the current data master table to be cleaned;
a cleaning unit 402, configured to sequentially clean the association table corresponding to the current data master table to be cleaned and the relevant historical data of the current data master table to be cleaned according to the current data master table to be cleaned and each cleaning rule of the current data master table to be cleaned.
Preferably, the reading unit 401 includes:
the first reading subunit is used for reading a data cleaning master control parameter table and acquiring historical data cleaning global control parameters, wherein the historical data cleaning global control parameters comprise a cleaning execution control valid flag, a latest cleaning date and a cleaning frequency;
a judging subunit, configured to, when the cleaning execution control valid flag is set, judge whether historical data currently needs to be cleaned according to the current date, the latest cleaning date and the cleaning frequency; if yes, triggering a second reading subunit;
the second reading subunit is configured to read the data table to be cleaned, obtain the current data master table to be cleaned, read the cleaning rule table, obtain each cleaning rule of the current data master table to be cleaned, read the consistency rule table, and obtain the association table corresponding to the current data master table to be cleaned.
Preferably, the cleaning unit 402 includes:
the determining subunit is used for determining a main key of the current data main table to be cleaned according to the data table to be cleaned and the data definition table;
the first generation subunit is used for generating an SQL statement for deleting the historical data of the current data main table to be cleaned according to the main key of the current data main table to be cleaned;
the second generation subunit is used for respectively generating SQL statements for deleting the historical data of the association table corresponding to the current main data table to be cleaned according to the association table corresponding to the current main data table to be cleaned;
and the execution subunit is used for sequentially executing the SQL statement for deleting the historical data of the association table corresponding to the current main data table to be cleaned and the SQL statement for deleting the historical data of the current main data table to be cleaned.
Preferably, the apparatus further comprises:
and the cleaning data log recording unit is used for recording the cleaning data log in the cleaning process of the historical data.
Preferably, the apparatus further comprises:
and the cleaning process log recording unit is used for recording the cleaning process log in the cleaning process of the historical data.
The automatic historical data cleaning device disclosed in this embodiment reads a data cleaning master control parameter table, a cleaning rule table, a consistency rule table and a data table to be cleaned, and obtains the current data master table to be cleaned, the association table corresponding to the current data master table to be cleaned, and each cleaning rule of the current data master table to be cleaned. It then cleans the related historical data of the current data master table to be cleaned and of its corresponding association table according to the current data master table to be cleaned and each of its cleaning rules. Historical data is thus cleaned automatically according to configurable cleaning rules, and data consistency among the data tables is ensured during cleaning.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

1. A method for automatically cleaning historical data, characterized by comprising the following steps:
reading a data cleaning master control parameter table and acquiring global control parameters for historical data cleaning, the parameters comprising a cleaning execution control valid flag, a latest cleaning date and a cleaning frequency;
when the cleaning execution control valid flag is valid, judging, according to the current date, the latest cleaning date and the cleaning frequency, whether historical data currently needs to be cleaned;
if so, reading the data table to be cleaned to acquire a current data master table to be cleaned, reading a cleaning rule table to acquire each cleaning rule of the current data master table to be cleaned, and reading a consistency rule table to acquire an association table corresponding to the current data master table to be cleaned; the consistency rule table defines the association between the master table and the association table, and comprises at least the following fields: the master table name, the association grouping category, and the serial number within the association group;
determining the primary key of the current data master table to be cleaned according to the data table to be cleaned and a data definition table; the data definition table is used for recording, when the structure of a data table changes, the field definitions of the latest version of all data tables;
generating, according to the primary key of the current data master table to be cleaned, an SQL statement for deleting the historical data of the current data master table to be cleaned;
generating, for each association table corresponding to the current data master table to be cleaned, an SQL statement for deleting the historical data of that association table;
and sequentially executing the SQL statements for deleting the historical data of the association tables corresponding to the current data master table to be cleaned and the SQL statement for deleting the historical data of the current data master table to be cleaned.
2. The method according to claim 1, wherein each association table corresponding to the current data master table to be cleaned is assigned a serial number, and each serial number represents the strength of the association between the corresponding association table and the current data master table to be cleaned;
the sequentially executing the SQL statements for deleting the historical data of the association tables corresponding to the current data master table to be cleaned and the SQL statement for deleting the historical data of the current data master table to be cleaned specifically comprises:
executing, in descending order of serial number, the SQL statements for deleting the historical data of the association tables corresponding to the current data master table to be cleaned, and then executing the SQL statement for deleting the historical data of the current data master table to be cleaned.
3. The method of claim 1, further comprising:
recording a cleaning data log during the cleaning of the historical data.
4. The method of claim 1, further comprising:
recording a cleaning process log during the cleaning of the historical data.
5. An automatic cleaning device for historical data, characterized by comprising:
a reading unit, used for reading the data cleaning master control parameter table, the cleaning rule table, the consistency rule table and the data table to be cleaned to obtain a current data master table to be cleaned, an association table corresponding to the current data master table to be cleaned, and each cleaning rule of the current data master table to be cleaned;
a cleaning unit, used for sequentially cleaning the relevant historical data of the association table corresponding to the current data master table to be cleaned and of the current data master table to be cleaned, according to the current data master table to be cleaned and each cleaning rule of the current data master table to be cleaned;
the cleaning unit includes:
a determining subunit, used for determining the primary key of the current data master table to be cleaned according to the data table to be cleaned and a data definition table; the data definition table is used for recording, when the structure of a data table changes, the field definitions of the latest version of all data tables;
a first generation subunit, used for generating, according to the primary key of the current data master table to be cleaned, an SQL statement for deleting the historical data of the current data master table to be cleaned;
a second generation subunit, used for generating, for each association table corresponding to the current data master table to be cleaned, an SQL statement for deleting the historical data of that association table;
an execution subunit, used for sequentially executing the SQL statements for deleting the historical data of the association tables corresponding to the current data master table to be cleaned and the SQL statement for deleting the historical data of the current data master table to be cleaned;
the reading unit includes:
a first reading subunit, used for reading the data cleaning master control parameter table and acquiring global control parameters for historical data cleaning, the parameters comprising a cleaning execution control valid flag, a latest cleaning date and a cleaning frequency;
a judging subunit, used for judging, when the cleaning execution control valid flag is valid, whether historical data currently needs to be cleaned according to the current date, the latest cleaning date and the cleaning frequency, and, if so, triggering a second reading subunit;
the second reading subunit, used for reading the data table to be cleaned to acquire the current data master table to be cleaned, reading the cleaning rule table to acquire each cleaning rule of the current data master table to be cleaned, and reading the consistency rule table to acquire the association table corresponding to the current data master table to be cleaned; the consistency rule table defines the association between the master table and the association table, and comprises at least the following fields: the master table name, the association grouping category, and the serial number within the association group.
6. The apparatus of claim 5, further comprising:
a cleaning data log recording unit, used for recording a cleaning data log during the cleaning of the historical data.
7. The apparatus of claim 5, further comprising:
a cleaning process log recording unit, used for recording a cleaning process log during the cleaning of the historical data.
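
As an informal illustration of the scheduling judgment in claim 1 and the execution order in claim 2, the following Python sketch may help. It is not part of the claims and makes several assumptions: the function names `cleaning_due` and `execution_order` are hypothetical, the cleaning frequency is assumed to be a number of days (the claims do not state its unit), and each association table is represented as a (serial number, table name) pair.

```python
from datetime import date, timedelta


def cleaning_due(flag_valid, latest_cleaning_date, frequency_days, today):
    """Return True when the control flag is valid and the interval has elapsed.

    Assumption: the cleaning frequency is an interval measured in days.
    """
    if not flag_valid:
        return False
    return today >= latest_cleaning_date + timedelta(days=frequency_days)


def execution_order(master_table, numbered_associations):
    """List tables in deletion order.

    Association tables come first, sorted by serial number from large to
    small, and the master table comes last.
    """
    ordered = sorted(numbered_associations, key=lambda pair: pair[0], reverse=True)
    return [table for _, table in ordered] + [master_table]
```

For example, with association tables numbered 1, 2 and 3, the sketch deletes from table 3 first, then 2, then 1, and finally from the master table.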

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810104814.8A CN108319694B (en) 2018-02-02 2018-02-02 Automatic historical data cleaning method and device

Publications (2)

Publication Number Publication Date
CN108319694A CN108319694A (en) 2018-07-24
CN108319694B CN108319694B (en) 2023-01-20

Family

ID=62891542

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810104814.8A Active CN108319694B (en) 2018-02-02 2018-02-02 Automatic historical data cleaning method and device

Country Status (1)

Country Link
CN (1) CN108319694B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110222039B (en) * 2019-05-07 2023-09-29 平安科技(深圳)有限公司 Data storage and garbage data cleaning method, device, equipment and storage medium
CN111597180A (en) * 2020-05-19 2020-08-28 山东汇贸电子口岸有限公司 Data cleaning method of OTRS system based on storage process
CN112559511B (en) * 2021-02-25 2021-06-01 江苏苏宁银行股份有限公司 Deposit system historical data cleaning method and system

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103092631A (en) * 2007-04-06 2013-05-08 西安万年科技实业有限公司 Database application system development platform and development method
CN104036001A (en) * 2014-06-13 2014-09-10 上海新炬网络技术有限公司 Dynamic hotlist priority scheduling based quick data cleaning method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8788457B2 (en) * 2007-09-21 2014-07-22 International Business Machines Corporation Ensuring that the archival data deleted in relational source table is already stored in relational target table

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant