CN116775626A - Historical data cleaning method and device, electronic equipment and storage medium - Google Patents

Historical data cleaning method and device, electronic equipment and storage medium Download PDF

Info

Publication number
CN116775626A
CN116775626A CN202310667533.4A CN202310667533A CN116775626A CN 116775626 A CN116775626 A CN 116775626A CN 202310667533 A CN202310667533 A CN 202310667533A CN 116775626 A CN116775626 A CN 116775626A
Authority
CN
China
Prior art keywords
data
processed
historical data
partition table
partition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310667533.4A
Other languages
Chinese (zh)
Inventor
张明鸥
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Bank of China Ltd
Original Assignee
Bank of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bank of China Ltd filed Critical Bank of China Ltd
Priority to CN202310667533.4A priority Critical patent/CN116775626A/en
Publication of CN116775626A publication Critical patent/CN116775626A/en
Pending legal-status Critical Current

Links

Abstract

The application provides a historical data cleaning method, a device, electronic equipment and a storage medium, which can be used in the field of big data. The method comprises the following steps: receiving a deleting instruction statement, and analyzing the deleting instruction statement to obtain historical data to be processed; creating a partition table based on attribute information of the to-be-processed historical data, acquiring a to-be-processed data table based on the name of the table where the to-be-processed historical data is located, and judging whether a primary key exists in the attribute information in the to-be-processed data table; if the partition table exists, redefining the partition table based on the main key, cleaning historical data in the redefined partition table in the processing execution time, and carrying out refreshing operation; and if the data does not exist, acquiring an identification value in the data table to be processed, redefining the partition table based on the identification value, cleaning the historical data in the redefined partition table at the processing execution time, and performing refreshing operation. The method of the application has strong flexibility and can be applied to various service data tables.

Description

Historical data cleaning method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of big data, and in particular, to a method and apparatus for cleaning historical data, an electronic device, and a storage medium.
Background
In application system databases in many fields, data are required to be acquired from various institutions every day for processing, so that stock data of the database is in a continuously growing state, and based on service requirements, massive useless historical data and service processing intermediate data are stored in the database, so that the influence of historical data accumulation on the stability of a production environment is prevented, and the historical data are required to be cleaned regularly.
In the prior art, setting up an intermediate table with the same structure as the original data table to be cleaned and a materialized view with the same name as the intermediate table, putting data which does not meet the deleting condition into the materialized view, and adding materialized view logs for the materialized view; furthermore, when the database is idle, the materialized view is completely refreshed once, then indexes, constraints and the like of a target table are created on an intermediate table, further, the materialized view is locked to an original table after being refreshed in one increment and twice, the last increment is refreshed, then the original data table is renamed to other names, the materialized view is deleted, the intermediate table is modified to the name of the original data table, and the aim of cleaning historical data in the data table is achieved, wherein the data table is required to be provided with a main key.
However, due to the variety of service data tables, some data tables cannot create a primary key, and the flexibility is poor by cleaning the historical data in the data tables by the mode.
Disclosure of Invention
The application provides a historical data cleaning method, a device, electronic equipment and a storage medium, which are used for solving the problem that the flexibility is poor when cleaning the historical data in a data table because the variety of service data tables is various and a primary key cannot be established for some data tables.
In a first aspect, the present application provides a method for cleaning historical data, including:
receiving a deleting instruction statement, and analyzing the deleting instruction statement to obtain historical data to be processed; the history data to be processed includes: the name of a table where the history data to be processed is located, attribute information corresponding to the history data to be processed and processing execution time;
creating a partition table based on attribute information corresponding to the historical data to be processed, acquiring a data table to be processed based on the name of the table where the historical data to be processed is located, and judging whether the attribute information in the data table to be processed has a primary key or not;
if the partition table exists, redefining the partition table based on the main key, cleaning historical data in the redefined partition table in the processing execution time, and refreshing the cleaned partition table;
If the data in the partition table does not exist, acquiring an identification value in the data table to be processed, redefining the partition table based on the identification value, cleaning historical data in the redefined partition table in the processing execution time, and refreshing the cleaned partition table.
In a second aspect, the present application provides a historical data cleaning apparatus, comprising:
the receiving module is used for receiving the deleting instruction statement and analyzing the deleting instruction statement to obtain the history data to be processed; the history data to be processed includes: the name of a table where the history data to be processed is located, attribute information corresponding to the history data to be processed and processing execution time;
the creating module is used for creating a partition table based on the attribute information corresponding to the historical data to be processed, acquiring a data table to be processed based on the name of the table where the historical data to be processed is located, and judging whether the attribute information in the data table to be processed has a primary key or not;
the first cleaning module is used for redefining the partition table based on a main key when the attribute information in the data table to be processed has the main key, cleaning historical data in the redefined partition table in the processing execution time, and carrying out refreshing operation on the cleaned partition table;
And the second cleaning module is used for acquiring the identification value in the data table to be processed when the attribute information in the data table to be processed does not have a primary key, redefining the partition table based on the identification value, cleaning historical data in the redefined partition table in the processing execution time, and refreshing the cleaned partition table.
In a third aspect, the present application provides an electronic device comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored by the memory to implement the method of any one of the first aspects.
In a fourth aspect, the present application provides a computer-readable storage medium having stored therein computer-executable instructions for performing the method of any of the first aspects when executed by a processor.
In a fifth aspect, the application provides a computer program product comprising a computer program which, when executed by a processor, implements the method according to any of the first aspects.
In summary, according to the method, the device, the electronic device and the storage medium for cleaning the historical data provided by the application, the to-be-processed historical data is obtained by receiving the deletion instruction statement and analyzing the deletion instruction statement, further, a partition table is created based on attribute information corresponding to the to-be-processed historical data, the to-be-processed data table is obtained based on the name of the table where the to-be-processed historical data is located, and redefining is performed by judging whether the attribute information in the to-be-processed data table exists in a main key or in a manner of adopting an identification value ROWID, so that the historical data in the redefined partition table can be cleaned in the processing execution time, and the cleaned partition table is refreshed, wherein the to-be-processed historical data comprises: the name of a table where the history data to be processed is located, attribute information corresponding to the history data to be processed and processing execution time; therefore, the application provides more flexibility for cleaning the historical data of the data tables with different service purposes, and the cleaning tasks can be executed in batches during the idle period of the service through the set processing execution time, so that extra system resources can not be occupied during the busy period of the service.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application;
FIG. 2 is a flow chart of a method for cleaning historical data according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a functional module according to an embodiment of the present application;
FIG. 4 is a flowchart of a specific method for cleaning historical data according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a historical data cleaning device according to an embodiment of the present application;
fig. 6 is a schematic block diagram of an electronic device according to an embodiment of the present application.
Specific embodiments of the present application have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with related laws and regulations and standards, and provide corresponding operation entries for the user to select authorization or rejection.
It should be noted that the method and the device for cleaning the historical data provided by the application can be used in the field of big data and can also be used in any field except the financial field, and the application field of the method and the device for cleaning the historical data is not limited.
In the present application, the words "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "for example" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the present application, "at least one" means one or more, and "a plurality" means two or more. "and/or", describes an association relationship of an association object, and indicates that there may be three relationships, for example, a and/or B, and may indicate: a alone, a and B together, and B alone, wherein a, B may be singular or plural. The character "/" generally indicates that the context-dependent object is an "or" relationship. "at least one of" or the like means any combination of these items, including any combination of single item(s) or plural items(s). For example, at least one (one) of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, wherein a, b, c may be single or plural.
The following is a description of the generic terms related to the present application.
Online redefinition (Online Redefinition): one property that may be referred to as Oracle. Note that the online REDEFINITION function is introduced into the corresponding dbms_redefinition package in the Oracle9i and later versions.
Data manipulation language (Data Manipulation Language, DML): it may refer to an instruction set in a structured query language (Structured Query Language, SQL) which is responsible for running data access work on database objects, and uses INSERT, UPDATE, DELETE three instructions as cores, which represent insertion, update and deletion, respectively, and the three instructions are used for developing data-centric applications.
Procedural SQL language (Procedural Language/SQL, PL/SQL): may refer to a programming language that is an extension of the SQL statement by the Oracle database. Specifically, through adding the characteristics of programming language to the use of ordinary SQL sentences, PL/SQL organizes data operations and query sentences in procedural units of PL/SQL codes, and realizes complex functions or calculation through logic judgment, circulation and other operations.
In application system databases in many fields, data needs to be acquired from various institutions for processing every day, for example, a fund valuation system needs to acquire transaction data from various financial institutions for processing every day, and a daily accumulation database stores huge amounts of useless historical transaction data and business processing intermediate data. To prevent the accumulation of historical data from affecting the stability of the production environment, periodic cleaning is required.
In one possible implementation, for an application system that cannot be shutdown for 7×24 hours, the most primitive DML operation can be used to delete data because the historical data cleaning cannot be accelerated by a method such as table rebuilding or table space migration.
However, performing a large number of DML concurrent operations may prevent the execution of DML statements for other sessions or transactions, resulting in abnormally slow operation of the business system and hidden system safety hazards.
In another possible implementation manner, when the service system performs cleaning of the historical data in the large data table under the condition that the service system cannot stop for 7×24 hours, an ON pre materialized view method may be adopted, specifically, by establishing an intermediate table with the same structure as the original data table to be cleaned and a materialized view with the same name as the intermediate table, placing the data which does not meet the deletion condition into the materialized view, and adding materialized view logs for the materialized view; furthermore, when the database is idle, the materialized view is completely refreshed once, then indexes, constraints and the like of a target table are created on an intermediate table, further, the materialized view is locked to an original table after being refreshed in one increment and twice, the last increment is refreshed, then the original data table is renamed to other names, the materialized view is deleted, the intermediate table is modified to the name of the original data table, and the aim of cleaning historical data in the data table is achieved, wherein the data table is required to be provided with a main key.
However, due to the variety of service data tables, some data tables cannot create a primary key, and the flexibility is poor by cleaning the historical data in the data tables by the mode.
In view of the above problems, the present application provides a method for cleaning historical data, which converts a data table to be cleaned into a partition table with a deletion condition as a range in an online redefinition manner, and further deletes the historical data conforming to the deletion condition in the partition table, so that cleaning of the historical data can be accelerated, the problem that the efficiency of deleting the historical data is low in a traditional loop manner of acquiring a ROWID (identification value) through a PL/SQL process cursor is solved, and meanwhile, the service application is always ensured to be online, wherein the online redefinition manner includes using a primary key or using a ROWID manner, so that more flexibility is provided for cleaning the historical data of data tables with different service uses, and in addition, the historical data cleaning logic can be loaded to a timing task module, that is, execution parameters such as service execution time are set, so that cleaning tasks are executed in batches during idle service periods, no extra system resources are occupied during busy service periods are ensured, and resource utilization can be maximized.
Fig. 1 is a schematic view of an application scenario provided in an embodiment of the present application, where, as shown in fig. 1, the application scenario includes a display device 101 and a service processing system 102 that are operable by a user; taking a banking scene as an example, specifically, a user performs a deletion operation on user transaction data of 20XX years in a banking system based on a display device 101, and correspondingly, a service processing system 102 responds to the deletion operation of the user, searches a data table to be processed in a database, converts the data table to be processed into a partition table with a deletion condition corresponding to the deletion operation as a range in an online redefinition mode, deletes historical data conforming to the deletion condition in the partition table, and obtains a data table of the deletion historical data, wherein the deletion condition is the user transaction data of 20XX years.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 2 is a flow chart of a method for cleaning historical data according to an embodiment of the present application, as shown in fig. 2, where the method for cleaning historical data includes the following steps:
s201, receiving a deleting instruction statement, and analyzing the deleting instruction statement to obtain historical data to be processed; the history data to be processed includes: the name of the table where the history data to be processed is located, attribute information corresponding to the history data to be processed, and processing execution time.
In the embodiment of the present application, the attribute information is an alias that can be understood, a set data format and a modified data type that are defined by the system for a basic field in a relational data source, and the attribute information may include: the application embodiment does not limit specific content contained in the attribute information, such as type information, size information, and format information of the history data to be processed, such as column name, table type, and the like.
In this step, loading is performed by setting a timing task, so as to generate a deletion instruction statement containing processing execution time, where the processing execution time enables the historical data cleaning task to be executed in batch during a service idle period, and specifically, by inheriting a Sring timer QuartzJobBean, a historical data cleaning timing task implementation class is created, so that the timing task loading can be implemented.
Further, analyzing the deleting instruction statement to obtain the to-be-processed historical data, namely, reading the data table configuration in the timing task implementation class major function to obtain the range of the to-be-cleaned data table, such as the name of the table where the to-be-processed historical data is located, attribute information corresponding to the to-be-processed historical data and the like.
It should be noted that, the historical data may correspond to different types of data in different application scenarios, and the embodiment of the present application does not specifically limit the type and content of the historical data, for example, the historical data may be transaction data.
S202, creating a partition table based on attribute information corresponding to the historical data to be processed, acquiring a data table to be processed based on the name of the table where the historical data to be processed is located, and judging whether the attribute information in the data table to be processed has a primary key or not.
In the embodiment of the present application, the partition table may refer to dividing the data of the data table to be processed into a plurality of small subsets called partitions, and the types of partition tables are mainly divided into: range, list and hash partition, the division basis is according to the attribute information corresponding to the history data to be processed, and it should be noted that the partition table may create a specific partition index for querying and classifying.
In this step, by determining whether the attribute information in the data table to be processed has a primary key, which is one or more fields in the data table to be processed, whose values are used for a certain record in the unique identification table and may be composed of a plurality of keywords, in which the present embodiment is not particularly limited, the pseudo primary key may refer to a new column that is used to store a pseudo value and is meaningless to the domain model of the table, is used to determine a record in the table, may be used as the primary key of the data table, and the pseudo primary key requires a unique key and all columns are non-null.
ROWID may refer to a pseudo-column that is used to uniquely mark a row in a data table. It is the internal address of the line data in the physical table and may contain two addresses, one of which is the address of the data file stored in the block pointing to the line contained in the data table, and the other is the address in the data block of this line that can be located directly to the data line itself.
Specifically, a temporary partition TABLE of the data TABLE to be cleaned may be created, where the partition of the temporary partition TABLE uses the condition of the historical data to be deleted as a range, and a dbms_redefinition_can_redef_table (USERNAME, TABLENAME, OPTIONS _flag) instruction is called to determine whether a primary key exists in the data TABLE to be processed, where can_redef_table indicates whether the checked data TABLE to be processed CAN be redefined, and if the storage process is executed successfully, it represents that redefinition CAN be performed.
It should be noted that, the present application may also determine what redefinition method is used to clean the history data by determining whether the attribute information in the data table to be processed has a pseudo primary key. In the redefinition process using the primary key or the pseudo primary key, the data tables of the versions before redefinition and after redefinition have the same primary key or pseudo primary key column.
And S203, redefining the partition table based on the primary key if the partition table exists, cleaning historical data in the redefined partition table in the processing execution time, and refreshing the cleaned partition table.
In this step, if it is determined that the primary key exists in the data TABLE to be cleaned, the operation_flag is assigned as the dbms_redefinition_CONS_USE_PK, and the dbms_redefinition_START_REDEF_TABLE (USERNAME, TABLENAME, TEMPTABLENAME, OPTIONS _flag) instruction is called to start online redefinition, where TEMTABLENAME is the name of the partition TABLE, the operation_flag is correspondingly the name of the dbms_redefinition_CONS_USE_PK (representing the primary key), and TABLENAME is the name of the data TABLE to be cleaned, and may be modified according to different application scenarios.
Further, a DBMS_REDEFINITION_COPY_TABLE_ DEPENDENTS instruction is called, triggers, indexes and constraints are established on the partition TABLE, and corresponding authorization is performed; after authorization, invoking a DBMS_REDEFINITION_REDEF_TABLE (USERNAME, TABLENAME, TEMPTABLENAME) instruction to complete redefinition of the data TABLE to be cleaned, wherein the data TABLE to be cleaned and the partition TABLE have the same attribute, index, constraint, authorization and trigger; wherein copy_table_ DEPENDENTS represents objects that synchronize indexes and dependencies, including indexes, constraints, triggers, permissions, etc., finish_redef_table: indicating that online redefinition is complete.
Further, calling ALTER TABLE TABLENAME TRUNCATE PARTITION XX instruction to realize quick cleaning of the historical data in the partition table, and after cleaning the historical data, refreshing the partition table to obtain a processed data table; XX is a partition block name in a partition table corresponding to the historical data, and the refreshing operation is based on a common refreshing mode.
It should be noted that before synchronizing the index and the DEPENDENT OBJECT, the operation of registering the DEPENDENT OBJECT may be performed through a register_ DEPENDENTS _object instruction, such as an index, a constraint, a trigger, or the like, or the operation of not registering the DEPENDENT OBJECT may be performed through a register_DependENT_object instruction, which is not limited in particular, and is determined according to a specific application scenario.
S204, if the identification value does not exist, acquiring the identification value in the data table to be processed, redefining the partition table based on the identification value, cleaning historical data in the redefined partition table in the processing execution time, and refreshing the cleaned partition table.
In this step, if it is determined that the data TABLE to be cleaned does not have the primary key, the operation_flag is assigned as the dbms_redefinition_condition_row, and the dbms_redefinition_start_redef_table (USERNAME, TABLENAME, TEMPTABLENAME, OPTIONS _flag) instruction is called to start online redefinition, where the temp tab is the name of the partition TABLE and the operation_flag corresponds to the dbms_redefinition_con_use_row.
It should be noted that, the following redefinition process, such as setting up a trigger, an index, and a constraint on the partition table, and performing corresponding authorization, is similar to the redefinition process based on the primary key, and the details are described in S203, which is not repeated herein, where the refresh operation is an operation performed in the refresh mode based on the ROWID.
It should be noted that, since Index-Organized Table (IOT) does not support the use of the ROWID redefinition method, when the ROWID is used for redefinition, a hidden field m_row $ is added to the redefined version of the data Table, and the ALTER TABLE TABLE _ NAME DROP UNUSED COLUMNS instruction may be invoked to delete the hidden column m_row $ or mark the hidden column m_row $ as unused.
Therefore, the embodiment of the application provides a method for cleaning historical data, which can obtain the historical data to be processed by receiving a deleting instruction statement and analyzing the deleting instruction statement, further, creates a partition table based on attribute information corresponding to the historical data to be processed, acquires the data table to be processed based on the name of the table where the historical data to be processed is located, redefines the data table by judging whether the attribute information in the data table to be processed has a main key or not and further adopts a main key or an identification value ROWID, so that the historical data in the redefined partition table can be cleaned in the processing execution time, and refreshes the redefined partition table; therefore, the application provides more flexibility for cleaning the historical data of the data tables with different service purposes, and the cleaning tasks can be executed in batches during the idle period of the service through the set processing execution time, so that extra system resources can not be occupied during the busy period of the service.
It should be noted that, the present application is based on the history data cleaning performed by the iBATIS framework calling the history data cleaning SQL execution script, as shown in the embodiment of fig. 2.
The iBATIS framework may include a plurality of functional modules, and is configured to call historical data cleaning performed by an SQL execution script, and fig. 3 is a schematic structural diagram of one functional module provided in an embodiment of the present application, and as shown in fig. 3, the iBATIS framework includes a timed task loading module, a data table configuration module and a historical data cleaning module; the timing task loading module is used for loading the historical data cleaning module into the timing task and setting execution parameters of the task, such as task processing time; the timing task loading module can support dynamic adjustment of the range of the data table to be cleaned, and the range list is read in the cleaning module; the history data cleaning module is used for calling the DBMS_REDEFINITION packet to redefine the data table in the range of the data table configuration module on line, and using a TRUNCATE history partition method to clean the data quickly.
Optionally, creating a partition table based on attribute information corresponding to the to-be-processed historical data includes:
judging whether the data table to be processed supports redefinition or not based on attribute information corresponding to the historical data to be processed;
if the data table to be processed is determined to support redefinition, locking the data table to be processed, and creating a partition table by taking attribute information corresponding to the historical data to be processed as a partition range;
and if the data table to be processed is determined not to support redefinition, generating error reporting information for feedback.
In this step, whether the data TABLE to be processed supports redefinition CAN be checked by calling a can_redef_table instruction, specifically, whether the storage process of the history data to be processed and the corresponding attribute information is executed successfully or not is judged, if the storage process is executed successfully, redefinition CAN be performed, otherwise, redefinition cannot be performed.
After the fact that the data table to be processed supports redefinition is confirmed, the data table to be processed is locked, so that the occurrence of inaccurate data deletion caused by data table change is reduced, further, a partition table taking attribute information corresponding to historical data to be processed as a partition range is created, namely, the partition table taking a deletion condition as a range, if the fact that the data table to be processed does not support redefinition is confirmed, error reporting information is generated for feedback, the error reporting information can be sent to terminal equipment of a user in the form of a short message or a display frame, and can also be directly displayed on display equipment of a service processing system.
Therefore, the embodiment of the application can convert the data table to be cleaned into the partition table with the deletion condition as a range through online redefinition, thereby facilitating the acceleration of cleaning.
Optionally, the refreshing operation is performed on the cleaned partition table, including:
refreshing the cleaned partition table, exchanging the name of the refreshed partition table and the name of the data table to be processed, and unlocking the data table to be processed.
In this step, when refreshing the cleaned partition TABLE, the sync_intermediate_table instruction increment synchronization data may be called, or the absorpt_redef_table instruction clean redefined error and suspension redefinition may be called, and the adopted refreshing mode may be full refreshing or quick refreshing.
Further, the name of the refreshed partition table may be replaced by the name of the data table to be processed, the name of the data table to be processed may be replaced by the name of the partition table, or the name of the data table to be processed may be replaced by another name.
Because the data in the data table does not have real-time property, the data needs to be updated by refreshing operation, and the accuracy of the data is improved.
Optionally, the method further comprises:
and receiving an adjustment request, and adjusting the partition range based on the adjustment request.
In this step, in response to the adjustment operation of the user, the range of the data table to be cleaned may be dynamically adjusted, and different partition ranges may be adjusted according to the adaptability of different application scenarios.
Therefore, the embodiment of the application supports the dynamic adjustment of the range of the data table to be cleaned, and the processing flexibility can be improved by manually adjusting the division of the blocks.
Optionally, the history data to be processed further includes: processing execution frequency; cleaning historical data within the redefined partition table at the process execution time, comprising:
acquiring a dimension relation corresponding to the data table to be processed, and creating at least one thread based on the dimension relation;
and calling a deleting function by the at least one thread, wherein the deleting function is used for cleaning historical data in the redefined partition table based on the processing execution frequency at the processing execution time.
In the embodiment of the application, the threads are utilized to clean the historical data, and the processing modes can be parallel, serial, sequential, concurrent processing and the like, so the application is not limited in particular.
In this step, a plurality of threads may be created by acquiring a range of a data table to be cleaned and circulating the dimensions of the data table to be cleaned, and call () methods are respectively called to execute specific historical data cleaning logic, that is, at least one thread is used to call a deletion function to clean historical data, where the deletion function may be a trunk function, for example, a call ALTER TABLE TABLENAME TRUNCATE PARTITION XX instruction cleans the historical data in a redefined partition table based on a processing execution frequency in a defined processing execution time, and the processing execution frequency is the number of times of cleaning by the call instruction.
The DELETE function may also be other types of DELETE functions, such as DELETE functions, and embodiments of the present application are not limited in this regard.
It should be noted that, the trunk function is faster than the DELETE function, and uses less system and transaction log resources, the trunk function DELETEs data by releasing the data page used for storing table data, and only records the release of the page in the transaction log, the cleaning speed is fast, and each time the DELETE function performs line deletion, one item is recorded for each deleted line in the transaction log.
Therefore, the embodiment of the application adopts multithreading to carry out the historical data cleaning task, has high response speed, can stop running at any time, and reduces the running load.
Optionally, invoking the delete function with the at least one thread includes:
calculating the data amount of the historical data in the redefined partition table by utilizing a predefined algorithm;
judging whether the data quantity exceeds a preset threshold value or not;
and under the condition that the data volume is determined to be larger than the preset threshold value, calling a deleting function by utilizing the at least one thread in a parallel processing mode.
In the embodiment of the application, the preset threshold value can be a threshold value which is set in advance and is used for determining that the data volume is overlarge, and when the data volume of the historical data in the partition table is overlarge, a deleting function can be called in a mode of parallel processing of a plurality of threads; if the data amount of the historical data in the partition table is not large, any form of parallel, serial, sequential and concurrent processing of a single thread or multiple threads can be adopted.
It should be noted that, in the embodiment of the present application, the preset threshold value and the number of invoked threads are not specifically limited.
Therefore, the embodiment of the application can execute the historical data cleaning task in a multithreaded parallel manner, and further improves the efficiency of service processing.
Optionally, the method further comprises:
detecting the partition table after the refreshing operation to obtain a detection result, wherein the detection result is used for indicating whether the historical data is cleaned successfully or not;
and generating prompt information based on the detection result, and sending the prompt information to terminal equipment of a user.
In this step, a specific algorithm may be used to detect the table after deleting the history data to determine whether the history data is cleaned successfully, and the embodiment of the present application is not limited, and may refer to the existing technology; furthermore, when the historical data is determined to be not cleaned successfully, prompt information with abnormal content can be generated and used for reminding a user of what kind of abnormality occurs, and the content corresponding to the abnormality can be conveniently located.
It should be noted that, in the embodiment of the present application, the sending content and the sending form of the prompt message are not limited in particular, and the prompt message may be sent to the terminal device of the user, or may be displayed on the display device of the service processing system in the form of a display frame.
Therefore, the application can generate the promotion information for the condition of unsuccessful cleaning so as to remind the user to search the reason or clean the data again, thereby improving the accuracy of data cleaning.
In summary, fig. 4 is a schematic flow chart of a specific historical data cleaning method according to an embodiment of the present application; as shown in fig. 4, the method for cleaning historical data includes the following steps:
step A: and (3) starting a timing task, namely receiving a deleting instruction statement, further calling an entry function to analyze the deleting instruction statement, namely reading the data table configuration in a timing task implementation type major call function to obtain the range of the data table to be cleaned, judging whether the data table to be cleaned has a major key, if so, executing the step (B), and if not, executing the step (C).
And (B) step (B): and (3) performing online redefinition based on the primary key, calling a plurality of threads such as thread 1 and thread 2 to thread N to delete the historical partition, and performing refreshing operation.
Step C: on-line redefinition is performed based on the ROWID, a plurality of threads such as thread 1 and thread 2 to thread N are called to delete the historical partition, and refreshing operation is performed.
Therefore, the application can ensure the consistency of the data while cleaning the historical data, the data table to be cleaned can normally carry out DML operation in most of the online operation time of the business, and the operation carried out during the data cleaning can be fed back to the original data table, thereby having high availability. In addition, the method provided by the application has strong flexibility and can be applied to various service data tables.
In the foregoing embodiment, the history data cleaning method provided in the embodiment of the present application is described, and in order to implement each function in the method provided in the embodiment of the present application, the electronic device as the execution body may include a hardware structure and/or a software module, and each function may be implemented in the form of a hardware structure, a software module, or a hardware structure plus a software module. Some of the functions described above are performed in a hardware configuration, a software module, or a combination of hardware and software modules, depending on the specific application of the solution and design constraints.
For example, fig. 5 is a schematic structural diagram of a historical data cleaning device according to an embodiment of the present application, where the device includes: the device comprises a receiving module 510, a creating module 520, a first cleaning module 530 and a second cleaning module 540, wherein the receiving module 510 is used for receiving a deleting instruction statement and analyzing the deleting instruction statement to obtain history data to be processed; the history data to be processed includes: the name of a table where the history data to be processed is located, attribute information corresponding to the history data to be processed and processing execution time;
the creating module 520 is configured to create a partition table based on attribute information corresponding to the to-be-processed historical data, obtain a to-be-processed data table based on a name of the table where the to-be-processed historical data is located, and determine whether a primary key exists in the attribute information in the to-be-processed data table;
The first cleaning module 530 is configured to redefine the partition table based on a primary key when the attribute information in the data table to be processed has the primary key, clean historical data in the redefined partition table during the processing execution time, and perform a refresh operation on the cleaned partition table;
the second cleaning module 540 is configured to obtain an identification value in the data table to be processed when the attribute information in the data table to be processed does not have a primary key, redefine the partition table based on the identification value, clean historical data in the redefined partition table at the processing execution time, and perform a refresh operation on the cleaned partition table.
Optionally, the creating module 520 is specifically configured to:
judging whether the data table to be processed supports redefinition or not based on attribute information corresponding to the historical data to be processed;
if the data table to be processed is determined to support redefinition, locking the data table to be processed, and creating a partition table by taking attribute information corresponding to the historical data to be processed as a partition range;
and if the data table to be processed is determined not to support redefinition, generating error reporting information for feedback.
Optionally, the first cleaning module 530 includes a first cleaning unit and a first refreshing unit; the second cleaning module 540 includes a second cleaning unit and a second refreshing unit; the first refreshing unit or the second refreshing unit is used for:
refreshing the cleaned partition table, exchanging the name of the refreshed partition table and the name of the data table to be processed, and unlocking the data table to be processed.
Optionally, the device further includes an adjustment module, where the adjustment module is configured to:
and receiving an adjustment request, and adjusting the partition range based on the adjustment request.
Optionally, the history data to be processed further includes: processing execution frequency; the first cleaning unit or the second cleaning unit is used for:
acquiring a dimension relation corresponding to the data table to be processed, and creating at least one thread based on the dimension relation;
and calling a deleting function by the at least one thread, wherein the deleting function is used for cleaning historical data in the redefined partition table based on the processing execution frequency at the processing execution time.
Optionally, the first cleaning unit or the second cleaning unit is specifically configured to:
Calculating the data amount of the historical data in the redefined partition table by utilizing a predefined algorithm;
judging whether the data quantity exceeds a preset threshold value or not;
and under the condition that the data volume is determined to be larger than the preset threshold value, calling a deleting function by utilizing the at least one thread in a parallel processing mode.
Optionally, the device further includes a detection module, where the detection module is configured to:
detecting the partition table after the refreshing operation to obtain a detection result, wherein the detection result is used for indicating whether the historical data is cleaned successfully or not;
and generating prompt information based on the detection result, and sending the prompt information to terminal equipment of a user.
The specific implementation principle and effect of the historical data cleaning device provided by the embodiment of the application can be referred to the corresponding related description and effect of the above embodiment, and will not be repeated here.
The embodiment of the application also provides a schematic structural diagram of the electronic device, and fig. 6 is a schematic block diagram of the electronic device.
The electronic device 600 may be a mobile phone, computer, digital broadcast terminal, messaging device, game console, tablet device, medical device, exercise device, personal digital assistant, etc., that may be used to implement the methods described in the embodiments shown in fig. 2 or fig. 4 above.
As shown in fig. 6, the electronic device 600 may include at least one processor 610 for implementing the methods described in the embodiments shown in fig. 2 or fig. 4.
Illustratively, the processor 610 may be configured to receive a delete instruction statement and parse the delete instruction statement to obtain historical data to be processed; the history data to be processed includes: the name of a table where the history data to be processed is located, attribute information corresponding to the history data to be processed and processing execution time; creating a partition table based on attribute information corresponding to the historical data to be processed, acquiring a data table to be processed based on the name of the table where the historical data to be processed is located, and judging whether the attribute information in the data table to be processed has a primary key or not; if the partition table exists, redefining the partition table based on the main key, cleaning historical data in the redefined partition table in the processing execution time, and refreshing the cleaned partition table; if the data in the partition table does not exist, acquiring an identification value in the data table to be processed, redefining the partition table based on the identification value, cleaning historical data in the redefined partition table in the processing execution time, and refreshing the cleaned partition table. Reference is made specifically to the detailed description in the method examples, and details are not described here.
The electronic device 600 may also include at least one memory 620 that may be used to store program instructions and/or data. The memory 620 is configured to store various types of data to support operations at the electronic device 600. Examples of such data include instructions for any application or method operating on the electronic device 600, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 620 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as a static random access memory (static random access memory, SRAM), read-only memory (ROM), programmable ROM (PROM), erasable Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), magnetic memory, flash memory, magnetic disk, or optical disk.
Memory 620 may be coupled to processor 610. The coupling in the embodiments of the present application is an indirect coupling or communication connection between devices, units, or modules, which may be in electrical, mechanical, or other forms for information interaction between the devices, units, or modules. The processor 610 may operate in conjunction with the memory 620. The processor 610 may execute program instructions stored in the memory 620. At least one of the at least one memory may be included in the processor.
The electronic device 600 may also include a receiver 630 and a transmitter 640 for communicating with other devices over a transmission medium, such that the electronic device 600 may communicate with other devices. The processor 610 may receive the data and/or information using the receiver 630, transmit the data and/or information using the transmitter 640, and may be used to implement the methods described in the embodiments shown in fig. 2 or fig. 4.
The specific connection medium between the processor 610, the memory 620, the receiver 630, and the transmitter 640 is not limited in the embodiment of the present application.
In an exemplary embodiment, the electronic device 600 can be implemented by one or more application specific integrated circuits (application specific integrated circuit, ASIC), digital signal processor (digital signal processor, DSP), digital signal processing device (DSP device, DSPD), programmable logic device (programmable logic device, PLD), field programmable gate array (field programmable gate array, FPGA), controller, microcontroller, microprocessor, or other electronic element for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium is also provided, such as a memory 620, including instructions executable by the processor 610 of the electronic device 600 to perform the above-described method. For example, the non-transitory computer readable storage medium may be ROM, random access memory (random access memory, RAM), CD-ROM, magnetic tape, floppy disk, optical data storage device, etc.
The present application also provides a computer program product comprising: a computer program (which may also be referred to as code, or instructions) which, when executed, causes a computer to perform the method as described in the embodiments shown in fig. 2 or fig. 4.
The present application also provides a computer-readable storage medium storing a computer program (which may also be referred to as code, or instructions). The computer program, when executed, causes a computer to perform the method as described in the embodiments shown in fig. 2 or fig. 4.
Those of ordinary skill in the art will appreciate that the various illustrative logical blocks (illustrative logical block) and steps (steps) described in connection with the embodiments disclosed herein can be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application. In the several embodiments provided by the present application, it should be understood that the disclosed apparatus, device and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, and for example, the division of the modules is merely a logical function division, and there may be additional divisions when actually implemented, for example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or modules, which may be in electrical, mechanical, or other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical modules, i.e., may be located in one place, or may be distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional module in each embodiment of the present application may be integrated into one processing module, or each module may exist alone physically, or two or more modules may be integrated into one module.
In the above embodiments, the functions of the respective functional modules may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions (programs). When the computer program instructions (program) are loaded and executed on a computer, the processes or functions according to the embodiments of the present application are fully or partially produced. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by a wired (e.g., coaxial cable, fiber optic, digital subscriber line (digital subscriber line, DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains an integration of one or more available media. The usable medium may be a magnetic medium (e.g., a floppy disk, a hard disk, a magnetic tape), an optical medium (e.g., a digital versatile disk (digital video disc, DVD)), or a semiconductor medium (e.g., a Solid State Disk (SSD)), or the like.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the prior art or in a part of the technical solution, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a usb disk, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk, etc.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A method of cleaning historical data, the method comprising:
receiving a deleting instruction statement, and analyzing the deleting instruction statement to obtain historical data to be processed; the history data to be processed includes: the name of a table where the history data to be processed is located, attribute information corresponding to the history data to be processed and processing execution time;
creating a partition table based on attribute information corresponding to the historical data to be processed, acquiring a data table to be processed based on the name of the table where the historical data to be processed is located, and judging whether the attribute information in the data table to be processed has a primary key or not;
if the partition table exists, redefining the partition table based on the main key, cleaning historical data in the redefined partition table in the processing execution time, and refreshing the cleaned partition table;
if the data in the partition table does not exist, acquiring an identification value in the data table to be processed, redefining the partition table based on the identification value, cleaning historical data in the redefined partition table in the processing execution time, and refreshing the cleaned partition table.
2. The method of claim 1, wherein creating a partition table based on attribute information corresponding to the history data to be processed comprises:
judging whether the data table to be processed supports redefinition or not based on attribute information corresponding to the historical data to be processed;
if the data table to be processed is determined to support redefinition, locking the data table to be processed, and creating a partition table by taking attribute information corresponding to the historical data to be processed as a partition range;
and if the data table to be processed is determined not to support redefinition, generating error reporting information for feedback.
3. The method of claim 2, wherein performing a refresh operation on the cleaned partition table comprises:
refreshing the cleaned partition table, exchanging the name of the refreshed partition table and the name of the data table to be processed, and unlocking the data table to be processed.
4. The method according to claim 2, wherein the method further comprises:
and receiving an adjustment request, and adjusting the partition range based on the adjustment request.
5. The method of claim 1, wherein the historical data to be processed further comprises: processing execution frequency; cleaning historical data within the redefined partition table at the process execution time, comprising:
Acquiring a dimension relation corresponding to the data table to be processed, and creating at least one thread based on the dimension relation;
and calling a deleting function by the at least one thread, wherein the deleting function is used for cleaning historical data in the redefined partition table based on the processing execution frequency at the processing execution time.
6. The method of claim 5, wherein invoking the delete function with the at least one thread comprises:
calculating the data amount of the historical data in the redefined partition table by utilizing a predefined algorithm;
judging whether the data quantity exceeds a preset threshold value or not;
and under the condition that the data volume is determined to be larger than the preset threshold value, calling a deleting function by utilizing the at least one thread in a parallel processing mode.
7. The method according to any one of claims 1-6, further comprising:
detecting the partition table after the refreshing operation to obtain a detection result, wherein the detection result is used for indicating whether the historical data is cleaned successfully or not;
and generating prompt information based on the detection result, and sending the prompt information to terminal equipment of a user.
8. A historical data cleaning apparatus, the apparatus comprising:
The receiving module is used for receiving the deleting instruction statement and analyzing the deleting instruction statement to obtain the history data to be processed; the history data to be processed includes: the name of a table where the history data to be processed is located, attribute information corresponding to the history data to be processed and processing execution time;
the creating module is used for creating a partition table based on the attribute information corresponding to the historical data to be processed, acquiring a data table to be processed based on the name of the table where the historical data to be processed is located, and judging whether the attribute information in the data table to be processed has a primary key or not;
the first cleaning module is used for redefining the partition table based on a main key when the attribute information in the data table to be processed has the main key, cleaning historical data in the redefined partition table in the processing execution time, and carrying out refreshing operation on the cleaned partition table;
and the second cleaning module is used for acquiring the identification value in the data table to be processed when the attribute information in the data table to be processed does not have a primary key, redefining the partition table based on the identification value, cleaning historical data in the redefined partition table in the processing execution time, and refreshing the cleaned partition table.
9. An electronic device, comprising: a processor, and a memory communicatively coupled to the processor;
the memory stores computer-executable instructions;
the processor executes computer-executable instructions stored in the memory to implement the method of any one of claims 1 to 7.
10. A computer readable storage medium having stored therein computer executable instructions which when executed by a processor are adapted to carry out the method of any one of claims 1 to 7.
CN202310667533.4A 2023-06-07 2023-06-07 Historical data cleaning method and device, electronic equipment and storage medium Pending CN116775626A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310667533.4A CN116775626A (en) 2023-06-07 2023-06-07 Historical data cleaning method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310667533.4A CN116775626A (en) 2023-06-07 2023-06-07 Historical data cleaning method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116775626A true CN116775626A (en) 2023-09-19

Family

ID=87990612

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310667533.4A Pending CN116775626A (en) 2023-06-07 2023-06-07 Historical data cleaning method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116775626A (en)

Similar Documents

Publication Publication Date Title
US11550769B2 (en) Data processing method, apparatus, and system
US8725730B2 (en) Responding to a query in a data processing system
US8185546B2 (en) Enhanced control to users to populate a cache in a database system
CN111414352B (en) Database information management method and device
EP3401807B1 (en) Synopsis based advanced partition elimination
CN107402821B (en) Access control method, device and equipment for shared resources
CN103810224A (en) Information persistence and query method and device
US10585896B2 (en) Managing data in relational database management system
CN113094387A (en) Data query method and device, electronic equipment and machine-readable storage medium
CN110309122B (en) Method, device, server and storage medium for obtaining incremental data
US9396218B2 (en) Database insert with deferred materialization
CN111723161A (en) Data processing method, device and equipment
US8909681B2 (en) Gap detection in a temporally unique index in a relational database
CN109614411B (en) Data storage method, device and storage medium
US10552406B2 (en) Maintaining object and query result consistency in a triplestore database
CN113568924A (en) Data processing method and device, electronic equipment and storage medium
US10387887B2 (en) Bloom filter driven data synchronization
CN116775626A (en) Historical data cleaning method and device, electronic equipment and storage medium
US9916373B2 (en) Dynamic data partitioning extension
CN113934729A (en) Data management method based on knowledge graph, related equipment and medium
CN113849524A (en) Data processing method and device
CN111159214A (en) API access method and device, electronic equipment and storage medium
US20240037100A1 (en) Detecting chains of functions that violate a constraint
US20230267102A1 (en) On-demand virtual storage access method analytics
CN116821143A (en) Data updating method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination