CN112286922A - Data cleaning method, device, equipment and storage medium - Google Patents

Data cleaning method, device, equipment and storage medium

Info

Publication number
CN112286922A
Authority
CN
China
Prior art keywords
cleaning
data
cleaned
configuration task
judgment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011186576.3A
Other languages
Chinese (zh)
Inventor
王凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Property and Casualty Insurance Company of China Ltd
Original Assignee
Ping An Property and Casualty Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Property and Casualty Insurance Company of China Ltd
Priority to CN202011186576.3A
Publication of CN112286922A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H40/00 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
    • G16H40/40 ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices for the management of medical equipment or devices, e.g. scheduling maintenance or upgrades

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Business, Economics & Management (AREA)
  • Biomedical Technology (AREA)
  • General Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Primary Health Care (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Quality & Reliability (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Epidemiology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An embodiment of the invention provides a data cleaning method, device, equipment and storage medium. The data cleaning method comprises: generating a preset configuration task cleaning table according to configuration tasks; detecting the configuration tasks to be cleaned in the preset configuration task cleaning table; and cleaning the data in each configuration task to be cleaned using multiple threads and a preset cleaning rule flow, wherein the cleaning rule flow comprises, in sequence, a cleaning condition judgment, a cleaning logic judgment, a cleaning data judgment and a data backup judgment. Because configuration tasks in a medical system accumulate a large volume of cached data after use, cleaning reduces that volume and improves the query performance of the tables corresponding to the configuration tasks. The method reduces sluggishness and instability when the medical system is queried and used, prevents invalid data and garbage data from remaining in the medical system, and improves the validity of the data stored there.

Description

Data cleaning method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of digital medical treatment, in particular to a data cleaning method, a data cleaning device, data cleaning equipment and a storage medium.
Background
Data cleaning is part of the process of building a data warehouse from the data of multiple online transaction processing (OLTP) systems. It deals with problems such as misspellings, conflicting spelling rules between two systems, and conflicting data (e.g., two numbers for the same part). Data cleaning aims to keep erroneous or problematic data out of the operation process and is generally done with the help of a computer. It includes cleaning the valid range of the data, cleaning the logical consistency of the data, and spot-checking data quality.
Existing table data cleaning is done manually: data is deleted and backed up by hand, batch by batch. If a table holds a relatively large amount of historical data and supports functions with a large impact on the business, the service must first be stopped. If the amount of data to be cleaned is too large, the only option is to export the valid data, clear the whole table, and then import the valid data again. The process is cumbersome and labor-intensive.
The existing defects are as follows: the modules of the medical system together generate too much data (for example, data generated by operations such as message pushing, call records, insurance policy details, price inquiries and transactions), and much of the historical data (for example, from three months, half a year or one year ago) is no longer used. This data occupies a large amount of database storage space, the excessive data volume in the tables degrades query performance, and the overall availability of the service drops. At present, the medical system cannot automatically back up and clean data that no longer needs to be processed.
Disclosure of Invention
In view of the above, embodiments of the present invention are proposed in order to provide a data cleaning method, apparatus, device and storage medium that overcome or at least partially solve the above problems.
In order to solve the above problem, an embodiment of the present invention discloses a data cleaning method for cleaning data of a medical system, including:
generating a preset configuration task cleaning table according to the configuration task;
detecting a configuration task to be cleaned in a preset configuration task cleaning table;
data cleaning is carried out on the data in the configuration task to be cleaned in a multithreading mode and a preset cleaning rule flow; the cleaning rule process comprises cleaning condition judgment, cleaning logic judgment, cleaning data judgment and data backup judgment in sequence;
further, the generating a preset configuration task cleaning table according to the configuration task includes:
acquiring a configuration task and corresponding data information in the configuration task; the data information at least comprises a table name, duration, cleaning time, cleaning logic, backup requirement, backup table name and validity;
generating a preset configuration task cleaning table according to the configuration task and the data information;
further, the detecting the configuration tasks to be cleaned in the preset configuration task cleaning table includes:
presetting scanning time;
scanning the configuration task cleaning table according to the preset scanning time to obtain a scanning result;
obtaining a configuration task to be cleaned according to the scanning result;
further, the performing data cleaning on the data in the configuration task to be cleaned in a multithreading mode and a preset cleaning rule flow, wherein the cleaning rule flow comprises cleaning condition judgment, cleaning logic judgment, cleaning data judgment and data backup judgment in sequence, includes:
determining the number of the configuration tasks to be cleaned;
when the number of the configuration tasks to be cleaned is N, establishing N corresponding independent threads, and synchronously cleaning data of the N configuration tasks to be cleaned according to a preset cleaning rule flow, wherein N is more than or equal to 1;
further, when the number of the configuration tasks to be cleaned is N, N corresponding independent threads are established, and data cleaning is performed on the N configuration tasks to be cleaned synchronously according to a preset cleaning rule flow, where N is 1, the method includes:
carrying out condition judgment on the to-be-cleaned configuration task according to a preset cleaning rule flow;
when the configuration task to be cleaned can meet the condition judgment in the preset cleaning rule flow, performing data cleaning on the configuration task to be cleaned;
further, when the to-be-cleaned configuration task can meet the condition judgment in the preset cleaning rule flow, performing data cleaning on the to-be-cleaned configuration task, where the cleaning rule flow sequentially includes cleaning condition judgment, cleaning logic judgment, cleaning data judgment, and data backup judgment, and includes:
judging whether the cleaning time in the configuration task to be cleaned meets the cleaning condition, if so, finishing the judgment of the cleaning condition;
detecting whether a cleaning logic is configured in the configuration task to be cleaned, and if the cleaning logic is configured, screening data according to the cleaning logic to finish the judgment of the cleaning logic;
performing a paging query according to the cleaning duration in the configuration task to be cleaned to judge whether there is data that needs to be cleaned, and if so, finishing the judgment of the cleaning data;
judging whether data backup is needed in the configuration task to be cleaned, if so, backing up the data needed to be backed up and deleting the data from the configuration task cleaning table to finish data backup judgment;
and performing data cleaning on the configuration task to be cleaned according to the completion of the cleaning rule flow.
The embodiment of the invention also discloses a data cleaning device, which comprises:
the generating module is used for generating a preset configuration task cleaning table according to the configuration tasks;
the detection module is used for detecting the configuration tasks to be cleaned in the preset configuration task cleaning table;
the data cleaning module is used for cleaning data in the configuration task to be cleaned in a multithreading mode and a preset cleaning rule flow; the cleaning rule process comprises cleaning condition judgment, cleaning logic judgment, cleaning data judgment and data backup judgment in sequence.
The embodiment of the present invention also discloses a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the program to implement the method described in any one of the above embodiments.
The embodiment of the invention also discloses a computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and when the program is executed by a processor, the steps of the method are realized.
The embodiment of the invention has the following advantages. A preset configuration task cleaning table is generated according to the configuration tasks; the configuration tasks to be cleaned are detected in the preset configuration task cleaning table; and the data in each configuration task to be cleaned is cleaned using multiple threads and a preset cleaning rule flow, which comprises, in sequence, a cleaning condition judgment, a cleaning logic judgment, a cleaning data judgment and a data backup judgment. When the medical system is used over a network, its configuration tasks cache a large amount of data after use; cleaning reduces this data volume and improves the query performance of the tables corresponding to the configuration tasks. The method avoids the sluggishness and instability that a large data volume causes when the medical system is queried and used, prevents invalid data and garbage data from remaining in the medical system, and improves the validity of the data stored there.
Drawings
FIG. 1 is a flowchart of the steps of a first embodiment of the data cleaning method of the present invention;
FIG. 2 is a flowchart of the steps of a second embodiment of the data cleaning method of the present invention;
FIG. 3 is a flowchart of the steps of a third embodiment of the data cleaning method of the present invention;
FIG. 4 is a flowchart of the steps of a fourth embodiment of the data cleaning method of the present invention;
FIG. 5 is a flowchart of the steps of a fifth embodiment of the data cleaning method of the present invention;
FIG. 6 is a flowchart of the steps of a sixth embodiment of the data cleaning method of the present invention;
FIG. 7 is a block diagram of an embodiment of the data cleaning device of the present invention;
FIG. 8 illustrates a computer device for the data cleaning method of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
One of the core concepts of the embodiment of the invention is as follows. A preset configuration task cleaning table is generated according to the configuration tasks; the configuration tasks to be cleaned are detected in the preset configuration task cleaning table; and the data in each configuration task to be cleaned is cleaned using multiple threads and a preset cleaning rule flow, which comprises, in sequence, a cleaning condition judgment, a cleaning logic judgment, a cleaning data judgment and a data backup judgment. When the medical system is used over a network, its configuration tasks cache a large amount of data after use; cleaning reduces this data volume and improves the query performance of the tables corresponding to the configuration tasks. The method avoids the sluggishness and instability that a large data volume causes when the medical system is queried and used, prevents invalid data and garbage data from remaining in the medical system, and improves the validity of the data stored there.
Referring to fig. 1, a flowchart illustrating steps of a first embodiment of a data cleansing method according to the present invention is shown, for cleansing data of a medical system, and specifically may include the following steps:
step S10, generating a preset configuration task cleaning table according to the configuration task;
step S20, detecting the configuration tasks to be cleaned in the preset configuration task cleaning table;
step S30, data in the configuration task to be cleaned is cleaned by adopting a multithreading mode and a preset cleaning rule flow; the cleaning rule process comprises cleaning condition judgment, cleaning logic judgment, cleaning data judgment and data backup judgment in sequence.
In the embodiment of the present invention, a preset configuration task cleaning table is generated according to the configuration tasks. A configuration task is a basic task associated with the medical system, such as message pushing, call records, policy details, price inquiries or transactions. Every configuration task in the preset configuration task cleaning table has corresponding data information, and when a configuration task needs to be cleaned, the task and its data information can be added to the table. The table is then checked to obtain the configuration tasks to be cleaned; configuration tasks that do not need cleaning are not processed. The data in each configuration task to be cleaned is cleaned using multiple threads together with the preset cleaning rule flow, which comprises, in sequence, a cleaning condition judgment, a cleaning logic judgment, a cleaning data judgment and a data backup judgment. The data is cleaned when the preset cleaning rule flow is satisfied; when it is not satisfied, data cleaning for that configuration task ends.
In this embodiment, when the medical system is used over a network, its configuration tasks cache a large amount of data after use. Cleaning reduces this data volume and improves the query performance of the tables corresponding to the configuration tasks. The method avoids the sluggishness and instability that a large data volume causes when the medical system is queried and used, prevents invalid data and garbage data from remaining in the medical system, and improves the validity of the data stored there.
When a new configuration task needs data cleaning, it is enough to add the configuration task and its corresponding data information to the configuration task cleaning table. No manual cleaning or renewed development is needed, which avoids wasted labor and development cost.
Referring to fig. 2, a flowchart illustrating steps of a second embodiment of a data cleaning method according to the present invention is shown, where the generating a preset configuration task cleaning table according to a configuration task specifically includes the following steps:
step S101, acquiring a configuration task and corresponding data information in the configuration task; the data information at least comprises a table name, duration, cleaning time, cleaning logic, backup requirement, backup table name and validity;
and step S102, generating a preset configuration task cleaning table according to the configuration task and the data information.
In the embodiment of the invention, the configuration task and its corresponding data information are acquired. The data information comprises at least the following fields: table name, duration, cleaning time, cleaning logic, whether backup is needed, backup table name, whether the entry is valid, and other related field information. The configuration tasks and their corresponding data information form the preset configuration task cleaning table, and data cleaning is carried out according to that table.
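As an illustration only, one row of the configuration task cleaning table could be modeled as below. The patent lists the fields but not their types, so the field names, the types, the reading of "duration" as a retention period in days, and the `build_cleaning_table` helper are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CleaningTask:
    """One row of the configuration task cleaning table (field names are assumed)."""
    table_name: str                # business table whose data is cleaned
    retention_days: int            # "duration", read here as a retention period
    cleaning_time: str             # time of day at which cleaning may run, "HH:MM"
    cleaning_logic: Optional[str]  # optional screening rule; None if not configured
    needs_backup: bool             # whether rows are backed up before deletion
    backup_table: Optional[str]    # backup table name, needed when needs_backup is set
    is_valid: bool = True          # rows marked invalid are ignored

    def __post_init__(self) -> None:
        if self.needs_backup and not self.backup_table:
            raise ValueError("backup_table is required when needs_backup is True")
        if self.retention_days < 0:
            raise ValueError("retention_days must be non-negative")


def build_cleaning_table(tasks):
    """Index the configured tasks by table name to form the cleaning table."""
    return {task.table_name: task for task in tasks}
```

Registering a new task then amounts to adding one more `CleaningTask` to the collection, which mirrors the point that no new development is needed per task.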
Referring to fig. 3, a flowchart illustrating a third step of an embodiment of a data cleaning method according to the present invention is shown, where the detecting a configuration task to be cleaned in a preset configuration task cleaning table specifically includes the following steps:
step S201, presetting scanning time;
step S202, scanning the configuration task cleaning table according to the preset scanning time to obtain a scanning result;
and step S203, obtaining a configuration task to be cleaned according to the scanning result.
In the embodiment of the invention, once the preset configuration task cleaning table exists, a preset scanning time is added. The scanning time is used to scan and query the configuration task cleaning table at regular intervals and so detect whether any configuration task is waiting to be cleaned. Timed scanning allows the data cached by a configuration task to be cleaned promptly. This addresses the growth of garbage data and invalid data, and the reduced storage capacity of the medical system, that result from redundant and stale data; it improves the timeliness of the data in the medical system and reduces the load on its storage module.
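The scanning step can be sketched as a function that a scheduler calls at each preset scanning time. This is a minimal sketch under assumptions: rows are plain dictionaries, `cleaning_time` is an "HH:MM" string, and a task is due once that time of day has been reached. The patent does not specify any of these details.

```python
from datetime import datetime, time


def scan_cleaning_table(rows, now: datetime):
    """Return the rows that are valid and whose cleaning time has been reached."""
    due = []
    for row in rows:
        if not row["is_valid"]:
            continue  # invalid configuration tasks are skipped
        hour, minute = map(int, row["cleaning_time"].split(":"))
        if now.time() >= time(hour, minute):
            due.append(row)
    return due
```

Passing `now` in as a parameter, rather than reading the clock inside the function, keeps the scan deterministic and easy to check.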
Referring to fig. 4, a flowchart illustrating a fourth step of the data cleaning method according to the fourth embodiment of the present invention is shown, where the acquiring of the data in the configuration task for performing the multi-thread data cleaning specifically includes the following steps:
step S301, determining the number of the configuration tasks to be cleaned;
step S302, when the number of the configuration tasks to be cleaned is N, corresponding N independent threads are established, and data cleaning is synchronously performed on the N configuration tasks to be cleaned according to a preset cleaning rule flow, wherein N is larger than or equal to 1.
In the embodiment of the invention, the number of configuration tasks to be cleaned in the configuration task cleaning table is determined, a matching number of independent threads is established, and the configuration tasks to be cleaned are cleaned synchronously (that is, at the same time) through those threads and the preset cleaning rule flow. If only one configuration task needs data cleaning, one independent thread is established for it and the cleaning runs on that thread. If N configuration tasks need data cleaning, N independent threads are established, one per task, and the N tasks are processed synchronously. Matching threads to tasks one to one, that is, one independent thread per configuration task to be cleaned, preserves one-to-one processing even when several configuration tasks need cleaning at once. This allows synchronous processing, improves cleaning efficiency, and helps avoid cleaning errors, such as cleaning the data of another configuration task or of a configuration task that does not need cleaning.
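The one-thread-per-task arrangement described above can be sketched with Python's standard `threading` module. The function name `clean_concurrently` and the `clean_one` callback are hypothetical; the callback stands in for running the cleaning rule flow on one task. Catching exceptions per thread is a design choice of this sketch, so that one failing task does not hide the results of the others.

```python
import threading


def clean_concurrently(task_names, clean_one):
    """Start one independent thread per task name and wait for all of them.

    Returns a dict mapping each task name to ("ok", result) or ("error", exc).
    """
    results = {}
    lock = threading.Lock()  # protects the shared results dict

    def worker(name):
        try:
            outcome = ("ok", clean_one(name))
        except Exception as exc:  # keep a failure local to its own task
            outcome = ("error", exc)
        with lock:
            results[name] = outcome

    threads = [threading.Thread(target=worker, args=(name,)) for name in task_names]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

With a very large N, a bounded thread pool may be more practical than N threads, but that goes beyond what the text describes.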
Referring to fig. 5, a flowchart of a fifth step of an embodiment of the data cleaning method according to the present invention is shown, where when the number of the configuration tasks to be cleaned is N, N corresponding independent threads are established, and data cleaning is performed on the N configuration tasks to be cleaned synchronously according to a preset cleaning rule flow, where N is 1, the method specifically includes the following steps:
step S3021, performing condition judgment on the configuration task to be cleaned according to a preset cleaning rule flow;
step S3022, when the to-be-cleaned configuration task can meet the condition judgment in the preset cleaning rule flow, performing data cleaning on the to-be-cleaned configuration task.
In the embodiment of the invention, when only one configuration task needs data cleaning, an independent thread is established for it, and the condition judgments of the preset cleaning rule flow are carried out within that thread. When the configuration task to be cleaned satisfies the condition judgments in the preset cleaning rule flow, its data is cleaned. The cleaning rule flow comprises, in sequence, a cleaning condition judgment, a cleaning logic judgment, a cleaning data judgment and a data backup judgment. The data is cleaned only after the configuration task has satisfied every judgment of the flow, in order; if any judgment is not satisfied, data cleaning ends at that judgment.
In a preferred embodiment of the present invention, when N configuration tasks need data cleaning, the cleaning is carried out through N independent threads, one for each configuration task to be cleaned. Whether there is one configuration task or N, the condition judgments of the preset cleaning rule flow are the same: the flow comprises, in sequence, a cleaning condition judgment, a cleaning logic judgment, a cleaning data judgment and a data backup judgment. The flow itself does not change; only the cleaning of each individual configuration task is handled by its own thread. In this way, the configuration tasks to be cleaned can be cleaned by multiple threads in combination with the preset cleaning rule flow.
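The ordered, stop-at-first-failure behavior of the rule flow can be sketched as follows. The function name `run_rule_flow` and the representation of each judgment as a named predicate are assumptions made for illustration. The patent names the four judgments but does not say how they are implemented.

```python
def run_rule_flow(task, judgments):
    """Apply the judgments in order and stop at the first one that is not met.

    judgments is a list of (name, predicate) pairs. Returns (True, None) when
    every judgment passes, otherwise (False, name_of_the_failed_judgment).
    """
    for name, judge in judgments:
        if not judge(task):
            return False, name  # cleaning ends at the unmet judgment
    return True, None
```

Because the same list of judgments is passed to every thread, the flow stays identical whether one task or N tasks are being cleaned.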
Referring to fig. 6, a flowchart of a sixth step of a data cleaning method embodiment of the present invention is shown, where when the configuration task to be cleaned can meet condition judgment in the preset cleaning rule flow, data cleaning is performed on the configuration task to be cleaned, where the cleaning rule flow sequentially includes cleaning condition judgment, cleaning logic judgment, cleaning data judgment, and data backup judgment, and specifically may include the following steps:
step S30221, judging whether the cleaning time in the configuration task to be cleaned meets the cleaning condition, and if so, finishing the judgment of the cleaning condition;
step S30222, detecting whether a cleaning logic is configured in the configuration task to be cleaned, and if so, screening data according to the cleaning logic to complete the judgment of the cleaning logic;
step S30223, judging whether the data in the configuration task to be cleaned needs to be cleaned or not by performing paging query according to the cleaning duration, and if so, finishing the judgment of cleaning the data;
step S30224, judging whether data backup is needed in the configuration task to be cleaned, if so, backing up the data to be backed up and deleting the data from the configuration task cleaning table to complete data backup judgment;
step S30225, performing data cleaning on the configuration task to be cleaned according to the completion of the cleaning rule flow.
In the embodiment of the invention, the configuration task to be cleaned is checked against the cleaning condition judgment, cleaning logic judgment, cleaning data judgment and data backup judgment, in that order, and its data is cleaned only when the judgments are satisfied in sequence;
firstly, it is judged from the cleaning time of the configuration task to be cleaned whether the cleaning condition is met; if so, the cleaning condition judgment is complete and the next operation is carried out, and if not, the data cleaning task ends;
then, it is judged whether a cleaning logic is configured in the configuration task to be cleaned; if so, the data is screened according to the cleaning logic, which includes, for example, screening by field, screening by modification time and screening by table partition, after which the cleaning logic judgment is complete and the next operation is carried out;
next, a paging query is carried out according to the cleaning duration in the configuration task to be cleaned. Paging is a technique for presenting all of the data to the user in segments: the user sees only one part of the data at a time and can jump to a page number or turn pages to find the desired content. The paging query shows whether there is data to be cleaned within the cleaning duration while saving query time, and it avoids page-by-page lookups, wasted time and added load on the medical system and the server. It is thereby judged whether the data in the configuration task needs to be cleaned: if there is no such data, the data cleaning task ends; if there is, the cleaning data judgment is complete and the next operation is carried out;
finally, it is judged whether the configuration task to be cleaned requires data backup; if so, the data that needs to be backed up is backed up, the backed-up data is added to a backup table, and the data is deleted from the configuration task cleaning table, which completes the data backup judgment; if data backup is not needed, the whole cleaning rule flow is finished.
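The paging query and backup steps above can be sketched together using Python's built-in `sqlite3` module. This is a sketch under assumptions: the cleaned table has an integer `id` primary key and a `created_at` column holding ISO-8601 date strings, the backup table has the same columns, and the function name `clean_expired_rows` is hypothetical. The patent does not name a database, a schema or a page size.

```python
import sqlite3


def clean_expired_rows(conn, table, backup_table, cutoff, page_size=500):
    """Back up (if requested) and delete rows older than cutoff, page by page.

    Table names must come from trusted configuration, because SQL identifiers
    cannot be bound as parameters. Returns the number of rows deleted.
    """
    total = 0
    while True:
        # Paging query: fetch at most one page of expired row ids.
        ids = [row[0] for row in conn.execute(
            f"SELECT id FROM {table} WHERE created_at < ? ORDER BY id LIMIT ?",
            (cutoff, page_size),
        )]
        if not ids:
            return total  # no data left to clean, so the task ends
        marks = ",".join("?" * len(ids))
        with conn:  # one transaction per page: backup and delete succeed together
            if backup_table:
                conn.execute(
                    f"INSERT INTO {backup_table} "
                    f"SELECT * FROM {table} WHERE id IN ({marks})",
                    ids,
                )
            conn.execute(f"DELETE FROM {table} WHERE id IN ({marks})", ids)
        total += len(ids)
```

Working one page at a time keeps each transaction short, which is one way to limit the load on a live system while old rows are removed. Passing `None` as `backup_table` covers the case where no backup is configured.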
According to the above, condition judgments are applied to the configuration task to be cleaned, and its data is cleaned once it satisfies the cleaning rule flow. Data that the medical system no longer needs can thus be cleaned and backed up dynamically, excessive garbage data need not be stored in the medical system, and the data corresponding to a configuration task can subsequently be extracted from the medical system more quickly and more accurately.
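The screening options named for the cleaning logic (by field, by modification time) could be expressed as a small translation from a configuration entry to a parameterized SQL condition. The dictionary format, the `kind` values and the function name `build_filter` are invented for this sketch, and screening by table partition is omitted because its syntax depends on the database.

```python
def build_filter(logic, cutoff):
    """Translate a cleaning-logic entry into (where_clause, parameters).

    logic is a dict such as {"kind": "field", "column": "status", "value": "sent"}
    or {"kind": "modified_before", "column": "updated_at"} (assumed format).
    """
    column = logic["column"]
    if not column.isidentifier():
        raise ValueError(f"invalid column name: {column!r}")
    kind = logic["kind"]
    if kind == "field":
        # Screening by field: keep only rows whose column equals the value.
        return f"{column} = ?", [logic["value"]]
    if kind == "modified_before":
        # Screening by modification time: rows last modified before the cutoff.
        return f"{column} < ?", [cutoff]
    raise ValueError(f"unsupported cleaning logic: {kind}")
```

Values are passed as bound parameters and only the column name is interpolated, which is why the sketch checks that the column is a plain identifier.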
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or a combination of acts, but those skilled in the art will recognize that the present invention is not limited by the described order of acts, as some steps may be performed in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments and that the acts involved are not necessarily required by the invention.
Since the device embodiment is basically similar to the method embodiment, it is described only briefly; for relevant details, refer to the corresponding description of the method embodiment.
Referring to fig. 7, a block diagram of a data cleansing apparatus according to an embodiment of the present invention is shown, which may specifically include the following modules:
a generating module 1001, configured to generate a preset configuration task cleaning table according to a configuration task;
the detection module 1002 is configured to detect a configuration task to be cleaned in the preset configuration task cleaning table;
the data cleaning module 1003 is configured to perform data cleaning on the data in the configuration task to be cleaned by using a multithreading mode and a preset cleaning rule flow; the cleaning rule process comprises cleaning condition judgment, cleaning logic judgment, cleaning data judgment and data backup judgment in sequence.
In a preferred embodiment of the present invention, the generating module 1001 is configured to generate a preset configuration task cleaning table according to a configuration task, and includes:
the acquisition unit is used for acquiring the configuration task and corresponding data information in the configuration task; the data information at least comprises a table name, duration, cleaning time, cleaning logic, backup requirement, backup table name and validity;
and the generating unit is used for generating a preset configuration task cleaning table according to the configuration task and the data information.
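One way to picture a row of the configuration task cleaning table is as a small record holding the seven fields listed above. The sketch below is illustrative only: the class name `CleanTask`, the field names, the field types and the validation rule are assumptions, not taken from the patent.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class CleanTask:
    table_name: str              # table whose data is to be cleaned
    retention_days: int          # "duration": how long data is kept
    clean_time: str              # "cleaning time", assumed to be "HH:MM"
    clean_logic: Optional[str]   # optional extra filter, e.g. a WHERE fragment
    needs_backup: bool           # backup requirement
    backup_table: Optional[str]  # backup table name, if backup is required
    valid: bool                  # validity flag: invalid tasks are skipped

def build_cleaning_table(raw_tasks):
    """Build an in-memory cleaning table from raw configuration dictionaries."""
    table = []
    for raw in raw_tasks:
        task = CleanTask(**raw)
        # Assumed sanity check: a task that requires backup must name a backup table.
        if task.needs_backup and not task.backup_table:
            raise ValueError(f"{task.table_name}: backup required but no backup table named")
        table.append(task)
    return table

# Hypothetical configuration for one task.
table = build_cleaning_table([
    {
        "table_name": "visit_log",
        "retention_days": 90,
        "clean_time": "02:00",
        "clean_logic": None,
        "needs_backup": True,
        "backup_table": "visit_log_bak",
        "valid": True,
    }
])
```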
In a preferred embodiment of the present invention, the detecting module 1002 is configured to detect a configuration task to be cleaned in the preset configuration task cleaning table, and includes:
a scanning time unit for presetting scanning time;
the scanning result unit is used for scanning the configuration task cleaning table according to the preset scanning time to obtain a scanning result;
and the configuration task unit is used for obtaining a configuration task to be cleaned according to the scanning result.
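The scan step can be sketched as a function that a timer calls at the preset scanning time and that returns the tasks that are due. This is a simplified illustration under stated assumptions: tasks are plain dictionaries, the hypothetical `clean_time` field is a zero-padded "HH:MM" string, and the sketch does not track whether a task has already run on the current day.

```python
from datetime import datetime, time

def scan_due_tasks(cleaning_table, now):
    """Return the valid tasks whose configured cleaning time has been reached."""
    due = []
    for task in cleaning_table:
        if not task["valid"]:
            continue  # invalid tasks are never scheduled
        hour, minute = map(int, task["clean_time"].split(":"))
        if now.time() >= time(hour, minute):
            due.append(task)
    return due

# Hypothetical cleaning table with three tasks.
cleaning_table = [
    {"table_name": "visit_log", "clean_time": "02:00", "valid": True},
    {"table_name": "audit_log", "clean_time": "03:00", "valid": True},
    {"table_name": "old_cache", "clean_time": "01:00", "valid": False},
]
due = scan_due_tasks(cleaning_table, datetime(2020, 10, 29, 2, 30))
due_names = [task["table_name"] for task in due]
```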
In a preferred embodiment of the present invention, the data cleaning module 1003, which is configured to perform data cleaning on the data in the configuration task to be cleaned by using a multithreading manner and a preset cleaning rule flow, wherein the cleaning rule flow sequentially includes cleaning condition judgment, cleaning logic judgment, cleaning data judgment and data backup judgment, includes:
the determining unit is used for determining the number of the configuration tasks to be cleaned;
and the multithread cleaning unit is used for establishing N corresponding independent threads when the number of the configuration tasks to be cleaned is N, and synchronously cleaning data of the N configuration tasks to be cleaned according to a preset cleaning rule flow, wherein N is more than or equal to 1.
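The one-thread-per-task arrangement can be sketched with the standard `threading` module. This is an illustration under assumptions: `clean_all` and the `clean_one` callback are hypothetical names, tasks are identified by their table names, and error handling inside the worker is omitted.

```python
import threading

def clean_all(tasks, clean_one):
    """Run clean_one(task) on its own thread for each of the N tasks and wait for all."""
    results = {}
    lock = threading.Lock()

    def worker(task):
        outcome = clean_one(task)
        with lock:  # protect the shared result dictionary
            results[task] = outcome

    # N tasks to be cleaned -> N independent threads.
    threads = [threading.Thread(target=worker, args=(task,)) for task in tasks]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results

# Hypothetical cleaning callback that only reports what it would clean.
results = clean_all(["visit_log", "audit_log"], lambda name: f"cleaned {name}")
```

When N can grow large, a bounded pool such as `concurrent.futures.ThreadPoolExecutor` may be preferable to one thread per task; the description itself specifies N independent threads.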
In a preferred embodiment of the present invention, the multithread cleaning unit, which is configured to, when the number of the configuration tasks to be cleaned is N, establish N corresponding independent threads and synchronously perform data cleaning on the N configuration tasks to be cleaned according to a preset cleaning rule flow, where N is greater than or equal to 1, includes:
a cleaning rule flow unit, configured to perform condition judgment on the to-be-cleaned configuration task according to a preset cleaning rule flow;
and the execution unit is used for executing data cleaning on the configuration task to be cleaned when the configuration task to be cleaned can meet the condition judgment in the preset cleaning rule flow.
In a preferred embodiment of the present invention, the executing unit is configured to execute data cleaning on the to-be-cleaned configuration task when the to-be-cleaned configuration task can meet condition judgment in the preset cleaning rule flow, where the cleaning rule flow sequentially includes cleaning condition judgment, cleaning logic judgment, cleaning data judgment, and data backup judgment, and includes:
a cleaning condition judgment unit, configured to judge whether cleaning time in the configuration task to be cleaned satisfies a cleaning condition, and if so, finish the judgment of the cleaning condition;
a cleaning logic judgment unit, configured to detect whether a cleaning logic is configured in the configuration task to be cleaned, and if the cleaning logic is configured, perform data screening according to the cleaning logic to complete the cleaning logic judgment;
the cleaning data judgment unit is used for carrying out a paging query according to the cleaning duration in the configuration task to be cleaned so as to judge whether there is data that needs cleaning, and if so, finishing the judgment of cleaning data;
the data backup judging unit is used for judging whether data backup is needed in the configuration task to be cleaned, if the data backup is needed, the data needing to be backed up are backed up and deleted from the configuration task cleaning table, and the data backup judgment is completed;
and the data cleaning subunit is used for cleaning the data of the configuration task to be cleaned according to the completion of the cleaning rule flow.
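The order of the four judgments can be sketched as a single function that stops as soon as one judgment fails. This is a simplified, in-memory illustration: the function name `run_rule_flow`, the dictionary keys, and the use of a Python callable as the cleaning logic are all assumptions made for the example.

```python
def run_rule_flow(task, now_hhmm, rows):
    """Apply the four judgments in order; return (rows_to_delete, rows_to_back_up)."""
    # 1. Cleaning condition: has the configured cleaning time arrived?
    #    Zero-padded "HH:MM" strings compare correctly as text.
    if now_hhmm < task["clean_time"]:
        return [], []
    # 2. Cleaning logic: if a filter is configured, screen the rows with it.
    logic = task.get("clean_logic")
    candidates = [row for row in rows if logic(row)] if logic else list(rows)
    # 3. Cleaning data: keep only rows older than the retention duration.
    expired = [row for row in candidates if row["age_days"] > task["retention_days"]]
    if not expired:
        return [], []  # nothing to clean: the task ends here
    # 4. Data backup: decide whether the expired rows are copied before deletion.
    to_back_up = expired if task["needs_backup"] else []
    return expired, to_back_up

# Hypothetical task and rows.
task = {
    "clean_time": "02:00",
    "clean_logic": lambda row: row["status"] == "closed",
    "retention_days": 90,
    "needs_backup": True,
}
rows = [
    {"id": 1, "status": "closed", "age_days": 120},
    {"id": 2, "status": "open", "age_days": 200},
    {"id": 3, "status": "closed", "age_days": 10},
]
to_delete, to_back_up = run_rule_flow(task, "02:30", rows)
deleted_ids = [row["id"] for row in to_delete]
```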
Referring to fig. 8, in an embodiment of the present invention, the present invention further provides a computer device, where the computer device 12 is represented in a form of a general-purpose computing device, and components of the computer device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Computer device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by computer device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM)31 and/or cache memory 32. Computer device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (commonly referred to as "hard drives"). Although not shown in FIG. 8, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. The memory may include at least one program product having a set (e.g., at least one) of program modules 42, with the program modules 42 configured to carry out the functions of embodiments of the invention.
A program/utility 41 having a set (at least one) of program modules 42 may be stored, for example, in memory. Such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules 42, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Computer device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, camera, etc.), with one or more devices that enable a user to interact with computer device 12, and/or with any devices (e.g., network card, modem, etc.) that enable computer device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. Also, computer device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via network adapter 20. As shown, the network adapter 20 communicates with the other modules of the computer device 12 via the bus 18. It should be appreciated that although not shown in FIG. 8, other hardware and/or software modules may be used in conjunction with computer device 12, including but not limited to: microcode, device drivers, redundant processing units 16, external disk drive arrays, RAID systems, tape drives, and data backup storage systems 34, etc.
The processing unit 16 executes various functional applications and data processing, such as implementing a data cleaning method provided by an embodiment of the present invention, by executing programs stored in the system memory 28.
That is, the processing unit 16 implements, when executing the program: generating a preset configuration task cleaning table according to the configuration task; detecting a configuration task to be cleaned in a preset configuration task cleaning table; data cleaning is carried out on the data in the configuration task to be cleaned in a multithreading mode and a preset cleaning rule flow; the cleaning rule process comprises cleaning condition judgment, cleaning logic judgment, cleaning data judgment and data backup judgment in sequence.
In an embodiment of the present invention, the present invention further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor, implements the data cleaning method as provided in all embodiments of the present application.
That is, the program when executed by the processor implements: generating a preset configuration task cleaning table according to the configuration task; detecting a configuration task to be cleaned in a preset configuration task cleaning table; data cleaning is carried out on the data in the configuration task to be cleaned in a multithreading mode and a preset cleaning rule flow; the cleaning rule process comprises cleaning condition judgment, cleaning logic judgment, cleaning data judgment and data backup judgment in sequence.
Any combination of one or more computer-readable media may be employed. The computer-readable medium may be a computer-readable storage medium or a computer-readable signal medium. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer-readable signal medium may be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, C++ or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The data cleaning method, device, apparatus and storage medium provided by the present invention are described in detail above, and a specific example is applied in the text to explain the principle and the implementation of the present invention, and the description of the above embodiment is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (9)

1. A data cleaning method for cleaning data of a medical system is characterized by comprising the following steps:
generating a preset configuration task cleaning table according to the configuration task;
detecting a configuration task to be cleaned in a preset configuration task cleaning table;
data cleaning is carried out on the data in the configuration task to be cleaned in a multithreading mode and a preset cleaning rule flow; the cleaning rule process comprises cleaning condition judgment, cleaning logic judgment, cleaning data judgment and data backup judgment in sequence.
2. The method according to claim 1, wherein the generating a preset configuration task cleaning table according to the configuration task comprises:
acquiring a configuration task and corresponding data information in the configuration task; the data information at least comprises a table name, duration, cleaning time, cleaning logic, backup requirement, backup table name and validity;
and generating a preset configuration task cleaning table according to the configuration task and the data information.
3. The method according to claim 1, wherein the detecting the configuration task to be cleaned in the preset configuration task cleaning table comprises:
presetting scanning time;
scanning the configuration task cleaning table according to the preset scanning time to obtain a scanning result;
and obtaining a configuration task to be cleaned according to the scanning result.
4. The method according to claim 1, wherein the data cleaning performed on the data in the configuration task to be cleaned in a multithreading manner and a preset cleaning rule flow, the cleaning rule flow sequentially including cleaning condition judgment, cleaning logic judgment, cleaning data judgment and data backup judgment, includes:
determining the number of the configuration tasks to be cleaned;
and when the number of the configuration tasks to be cleaned is N, establishing N corresponding independent threads, and synchronously cleaning data of the N configuration tasks to be cleaned according to a preset cleaning rule flow, wherein N is more than or equal to 1.
5. The method according to claim 4, wherein the establishing, when the number of the configuration tasks to be cleaned is N, of N corresponding independent threads and the synchronous data cleaning of the N configuration tasks to be cleaned according to a preset cleaning rule flow, where N is greater than or equal to 1, includes:
carrying out condition judgment on the to-be-cleaned configuration task according to a preset cleaning rule flow;
and when the configuration task to be cleaned can meet the condition judgment in the preset cleaning rule flow, performing data cleaning on the configuration task to be cleaned.
6. The method according to claim 5, wherein when the configuration task to be cleaned can meet the condition judgment in the preset cleaning rule flow, performing data cleaning on the configuration task to be cleaned, wherein the cleaning rule flow sequentially includes a cleaning condition judgment, a cleaning logic judgment, a cleaning data judgment and a data backup judgment, and includes:
judging whether the cleaning time in the configuration task to be cleaned meets the cleaning condition, if so, finishing the judgment of the cleaning condition;
detecting whether a cleaning logic is configured in the configuration task to be cleaned, and if the cleaning logic is configured, screening data according to the cleaning logic to finish the judgment of the cleaning logic;
carrying out a paging query according to the cleaning duration in the configuration task to be cleaned to judge whether data needs to be cleaned, and if so, finishing the judgment of the cleaning data;
judging whether data backup is needed in the configuration task to be cleaned, if so, backing up the data needed to be backed up and deleting the data from the configuration task cleaning table to finish data backup judgment;
and performing data cleaning on the configuration task to be cleaned according to the completion of the cleaning rule flow.
7. A data cleansing apparatus, comprising:
the generating module is used for generating a preset configuration task cleaning table according to the configuration tasks;
the detection module is used for detecting the configuration tasks to be cleaned in the preset configuration task cleaning table;
the data cleaning module is used for cleaning data in the configuration task to be cleaned in a multithreading mode and a preset cleaning rule flow; the cleaning rule process comprises cleaning condition judgment, cleaning logic judgment, cleaning data judgment and data backup judgment in sequence.
8. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the method according to any one of claims 1 to 6 when executing the program.
9. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, carries out the method of any one of claims 1 to 6.
CN202011186576.3A 2020-10-29 2020-10-29 Data cleaning method, device, equipment and storage medium Pending CN112286922A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011186576.3A CN112286922A (en) 2020-10-29 2020-10-29 Data cleaning method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011186576.3A CN112286922A (en) 2020-10-29 2020-10-29 Data cleaning method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112286922A true CN112286922A (en) 2021-01-29

Family

ID=74353019

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011186576.3A Pending CN112286922A (en) 2020-10-29 2020-10-29 Data cleaning method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112286922A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115617788A (en) * 2022-12-20 2023-01-17 广州嘉为科技有限公司 Product cleaning method and device and computer readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105760487A (en) * 2016-02-17 2016-07-13 中国工商银行股份有限公司 Historical data cleaning method and device
CN106528840A (en) * 2016-11-11 2017-03-22 中国银行股份有限公司 Service data clearing method and system based on banking system
CN107291804A (en) * 2017-05-15 2017-10-24 努比亚技术有限公司 Method, equipment and the computer-readable recording medium of data scrubbing
CN108829782A (en) * 2018-05-31 2018-11-16 平安科技(深圳)有限公司 data table cleaning method, server and computer readable storage medium
CN109885565A (en) * 2019-02-14 2019-06-14 中国银行股份有限公司 A kind of tables of data method for cleaning and device

Similar Documents

Publication Publication Date Title
CN102129425B (en) The access method of big object set table and device in data warehouse
CN113238843B (en) Task execution method, device, equipment and storage medium
CN114416667A (en) Method and device for rapidly sharing network disk file, network disk and storage medium
CN110990445A (en) Data processing method, device, equipment and medium
US11704216B2 (en) Dynamically adjusting statistics collection time in a database management system
CN112286922A (en) Data cleaning method, device, equipment and storage medium
CN113064895B (en) Incremental updating method, device and system for map
US8818960B2 (en) Tracking redo completion at a page level
US20140025373A1 (en) Fixing Broken Tagged Words
CN112131248B (en) Data analysis method, device, equipment and storage medium
US8234287B2 (en) Sorting records based on free text field content
CN110377891B (en) Method, device and equipment for generating event analysis article and computer readable storage medium
WO2018057401A1 (en) Preserve input focus while scrolling in a virtualized dataset
JP2020123321A (en) Method and apparatus for search processing based on clipboard data
CN111262727B (en) Service capacity expansion method, device, equipment and storage medium
US20180113920A1 (en) Recursive extractor framework for forensics and electronic discovery
CN112364268A (en) Resource acquisition method and device, electronic equipment and storage medium
CN112818204A (en) Service processing method, device, equipment and storage medium
US8495033B2 (en) Data processing
CN110795470A (en) Associated data acquisition method, device, equipment and storage medium
US20160042022A1 (en) Data coordination support apparatus and data coordination support method
CN110750569A (en) Data extraction method, device, equipment and storage medium
CN111209286A (en) Data calling method and system
CN110389862B (en) Data storage method, device, equipment and storage medium
CN112905224B (en) Time-consuming determination method, device and equipment for code review

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination