CN110928863A - Method for task breakpoint resume applied to data cleaning tool - Google Patents

Publication number
CN110928863A
Authority
CN
China
Prior art keywords
data
breakpoint
task
source
marking
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911141715.8A
Other languages
Chinese (zh)
Inventor
纪峥嵘
刘军
叶庆楚
陈博文
吴永佳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuxi Shiling Technology Co ltd
Original Assignee
Wuxi Shiling Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuxi Shiling Technology Co ltd
Priority to CN201911141715.8A
Publication of CN110928863A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21: Design, administration or maintenance of databases
    • G06F16/215: Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H40/00: ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Quality & Reliability (AREA)
  • Biomedical Technology (AREA)
  • General Engineering & Computer Science (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention relates to the technical field of data processing and discloses a method for task breakpoint resume applied to a data cleaning tool, comprising the following steps: (1) extracting the target source data and splitting it at breakpoints into data source chunks; (2) when a processing task fails and has to be restarted, querying the source data breakpoint grouping mark table and locating the latest marked breakpoint that is still in the unprocessed state, i.e. the point at which the task was interrupted; (3) fetching the data source chunks that correspond to the group numbers of the unprocessed breakpoint marks, executing those chunks in order, and so completing the cleaning task; (4) when every array in the source data breakpoint grouping mark table is in the processed state, the task is complete. By splitting the source business data at breakpoints, grouping the pieces and marking them, the invention cleans the data segment by segment; after a cleaning task is interrupted abnormally, the breakpoint marks are compared and the remaining work continues from the point of interruption.

Description

Method for task breakpoint resume applied to data cleaning tool
Technical Field
The invention relates to the technical field of data processing, in particular to a method for task breakpoint resume applied to a data cleaning tool.
Background
With the development of medical informatization, hospital information integration platforms are being built widely. The ETL data cleaning tool included in such a platform is mainly used to create a hospital-wide data center and to implement an independent data warehouse. ETL is the process of data extraction (Extract), transformation (Transform) and loading (Load); it is the core of BI/DW (business intelligence/data warehouse) and an important link in building a data center. A user extracts the required data from a data source, converts it, and finally loads it into the data center according to a predefined data center model. Business data from medical business systems such as HIS, LIS, PACS and EMR is extracted by ETL into the business data layer of the data center. The source data in the business data layer is cleaned, converted to a standard form, and extracted into the standard layer of the data center. The standard layer integrates the data further, after which it is extracted, converted and loaded into data application layers such as a data warehouse, a clinical knowledge base and an index base. During execution, however, an ETL task may be interrupted by external causes such as a background program fault, an unstable network or a server power failure. Fast and timely cleaning of business data is therefore very important for the management of medical data.
Disclosure of Invention
To address the slow cleaning of business data and the unstable running process in the prior art, the invention provides a method for task breakpoint resume applied to a data cleaning tool.
To solve the above technical problems, the invention adopts the following technical solution.
A method for task breakpoint resume applied to a data cleaning tool comprises the following steps:
(1) extracting the target source data, splitting it at breakpoints into data source chunks, grouping the chunks in order into set arrays, numbering each array, generating a source data breakpoint grouping mark table, marking each array as unprocessed, and marking an array as processed once its processing task has run;
(2) when a processing task fails and has to be restarted, querying the source data breakpoint grouping mark table and locating the latest marked breakpoint that is still in the unprocessed state, i.e. the point at which the task was interrupted;
(3) fetching the data source chunks that correspond to the group numbers of the unprocessed breakpoint marks, executing those chunks in order, and so completing the cleaning task;
(4) when every array in the source data breakpoint grouping mark table is in the processed state, the task is complete.
Preferably, in step (1), grouping in order to form the set array and numbering the arrays comprises: dividing the data held in memory into a plurality of temporary storage tables, forming the set array from all the temporary storage tables, generating group numbers for the set array in order, and storing the marked relation of the set array in the source data breakpoint grouping mark table.
Preferably, after step (3) is completed, the completed data source chunk is marked as processed.
Preferably, in step (3), executing the cleaning task comprises cleaning the data, converting it, and loading it into the target database table.
Preferably, the method further comprises step (5): after the whole task has been executed, deleting the task's breakpoint marking information from the source data breakpoint grouping mark table.
With this technical solution, the invention achieves notable technical effects. By splitting the source business data at breakpoints, grouping the pieces and marking them, the invention cleans the data segment by segment and brings fine-grained management to a cleaning process that was previously uncontrollable; after a cleaning task is interrupted abnormally, the breakpoint marks are compared and the remaining work continues from the point of interruption. This avoids the whole task being invalidated by a single exception and saves server computing resources and computing time. The invention adds handling for cleaning tasks that are interrupted by external causes to the ETL data cleaning service: once the fault has been cleared, the task can be executed again and continues from the point where it was last interrupted until the rest of the task is finished. This makes operation more user-friendly, avoids re-executing work that has already been done, and saves the user time.
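Steps (1) to (5) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the implementation disclosed here: the SQLite table breakpoint_marks, the state strings, and every function name (create_mark_table, split_into_chunks, register_chunks, first_unprocessed, run_task) are hypothetical, and the sketch resumes at the lowest-numbered unprocessed group.

```python
import sqlite3

# Hypothetical state labels; the patent only names "processed" and "unprocessed".
UNPROCESSED, PROCESSED = "unprocessed", "processed"


def create_mark_table(conn):
    """Create the (assumed) source data breakpoint grouping mark table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS breakpoint_marks ("
        " task_name TEXT NOT NULL,"
        " group_no INTEGER NOT NULL,"
        " state TEXT NOT NULL,"
        " PRIMARY KEY (task_name, group_no))"
    )


def split_into_chunks(rows, chunk_size):
    """Step (1): split the source rows at fixed breakpoints into chunks."""
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def register_chunks(conn, task_name, chunk_count):
    """Step (1): mark every group unprocessed; marks left by an
    interrupted run are kept because of INSERT OR IGNORE."""
    with conn:
        conn.executemany(
            "INSERT OR IGNORE INTO breakpoint_marks VALUES (?, ?, ?)",
            [(task_name, n, UNPROCESSED) for n in range(1, chunk_count + 1)],
        )


def first_unprocessed(conn, task_name):
    """Step (2): lowest group number still unprocessed, or None."""
    row = conn.execute(
        "SELECT MIN(group_no) FROM breakpoint_marks"
        " WHERE task_name = ? AND state = ?",
        (task_name, UNPROCESSED),
    ).fetchone()
    return row[0]


def run_task(conn, task_name, chunks, clean_chunk):
    """Steps (3) to (5): clean the remaining chunks in order."""
    register_chunks(conn, task_name, len(chunks))
    while (group_no := first_unprocessed(conn, task_name)) is not None:
        clean_chunk(chunks[group_no - 1])  # may raise and interrupt the task
        with conn:  # commit the mark only after the chunk succeeded
            conn.execute(
                "UPDATE breakpoint_marks SET state = ?"
                " WHERE task_name = ? AND group_no = ?",
                (PROCESSED, task_name, group_no),
            )
    with conn:  # step (5): the task is complete, so drop its marks
        conn.execute(
            "DELETE FROM breakpoint_marks WHERE task_name = ?", (task_name,)
        )
```

Because a group is marked processed only after its chunk has been cleaned, calling run_task again after a failure repeats at most the one chunk that was in flight. The sketch assumes the cleaning function can safely be re-run on that chunk.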
Drawings
FIG. 1 is a flow chart illustrating a method for resuming a task at a breakpoint applied to a data cleaning tool according to the present invention.
FIG. 2 is a schematic diagram of the breakpoint splitting of patient admission registration information.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
As shown in FIG. 1, a method for task breakpoint resume applied to a data cleaning tool comprises the following steps:
(1) extracting the target source data, splitting it at breakpoints into data source chunks, grouping the chunks in order into set arrays, numbering each array, generating a source data breakpoint grouping mark table, marking each array as unprocessed, and marking an array as processed once its processing task has run;
(2) when a processing task fails and has to be restarted, querying the source data breakpoint grouping mark table and locating the latest marked breakpoint that is still in the unprocessed state, i.e. the point at which the task was interrupted;
(3) fetching the data source chunks that correspond to the group numbers of the unprocessed breakpoint marks, executing those chunks in order, and so completing the cleaning task;
(4) when every array in the source data breakpoint grouping mark table is in the processed state, the task is complete.
Example 1
Step 1: extract the third-party business source data, split the source data at breakpoints, and group and mark the resulting chunks;
Step 2: query and compare against the 'source data breakpoint grouping mark storage table'; if it holds no data for the task, insert the breakpoint marks; if it does, find the group numbers of the breakpoint marks that are still unprocessed;
Step 3: fetch the data source chunks that correspond to the group numbers of the unprocessed breakpoint marks and execute them in order: clean, convert, and load into the target database table;
Step 4: each time a data source chunk has been cleaned, change its breakpoint mark state in the mark storage table to processed; once the whole task has been executed, delete the task's breakpoint marking information from the source data breakpoint grouping mark storage table.
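The compare-then-insert logic of steps 2 and 4 can be sketched in a few lines of Python. This is a hedged illustration only: the mark storage table is modeled as a plain in-memory dictionary rather than a database table, and the names plan_run and finish_chunk are hypothetical.

```python
def plan_run(mark_table, task_name, chunk_count):
    """Step 2: insert marks if the task has none, else reuse the stored
    ones; return the group numbers that still need cleaning, in order."""
    marks = mark_table.setdefault(task_name, {})
    if not marks:  # no data yet: first run of this task
        marks.update({n: "unprocessed" for n in range(1, chunk_count + 1)})
    return sorted(n for n, state in marks.items() if state == "unprocessed")


def finish_chunk(mark_table, task_name, group_no):
    """Step 4: mark one group processed; delete the task's marks once
    every group has been processed."""
    marks = mark_table[task_name]
    marks[group_no] = "processed"
    if all(state == "processed" for state in marks.values()):
        del mark_table[task_name]
```

A caller would loop over the result of plan_run, clean each chunk, and call finish_chunk only after the chunk has been loaded, so that an interruption leaves the unfinished group marked unprocessed.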
Take a task that cleans 10,000 patient admission registration records as an example, as shown in FIG. 2:
Step 5: extract the patient admission registration information and complete the breakpoint splitting, grouping and marking;
Step 6: the 10,000 records are split across 10 temporary tables (breakpoints) of 1,000 records each, each table is marked with its group number, and the correspondence is stored in the 'source data breakpoint grouping mark' relation table, as follows:
Serial number   Task name                            Breakpoint (group) number   Processing state
1               Admission registration information   1                           Processed
2               Admission registration information   2                           Processed
3               Admission registration information   3                           Processed
4               Admission registration information   4                           Processed
5               Admission registration information   5                           Processed
6               Admission registration information   6                           Processed
7               Admission registration information   7                           Unprocessed
8               Admission registration information   8                           Unprocessed
9               Admission registration information   9                           Unprocessed
10              Admission registration information   10                          Unprocessed
TABLE 1
Step 7: when the task is re-executed after the fault has been repaired, the service queries the 'source data breakpoint grouping mark' relation table and quickly locates unprocessed breakpoint 7; it then compares this against the group of temporary tables in memory, skips the first 6 groups, finds source data temporary table No. 7, and completes the remaining cleaning in order.
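The arithmetic of this example can be checked with a short Python sketch. The record count, chunk size and processing states come from the example and Table 1; the variable names and the representation of each temporary table as a half-open range of record offsets are assumptions made for illustration.

```python
RECORDS = 10_000
CHUNK_SIZE = 1_000

# Group number -> half-open range of record offsets held by that temporary table.
temp_tables = {
    n + 1: (n * CHUNK_SIZE, (n + 1) * CHUNK_SIZE)
    for n in range(RECORDS // CHUNK_SIZE)
}

# Processing states as listed in Table 1: groups 1 to 6 are done.
states = {n: ("processed" if n <= 6 else "unprocessed") for n in temp_tables}

# Step 7: locate the first unprocessed breakpoint and the work that remains.
resume_at = min(n for n, s in states.items() if s == "unprocessed")
remaining = [n for n in sorted(states) if states[n] == "unprocessed"]
skipped_records = sum(
    end - start for n, (start, end) in temp_tables.items() if n < resume_at
)

print(resume_at, remaining, skipped_records)
# prints: 7 [7, 8, 9, 10] 6000
```

In this scenario the restarted task skips the 6,000 records already cleaned and only processes the 4,000 records in temporary tables 7 to 10.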
In summary, the above-mentioned embodiments are only preferred embodiments of the present invention, and all equivalent changes and modifications made in the claims of the present invention should be covered by the claims of the present invention.

Claims (5)

1. A method for task breakpoint resume applied to a data cleaning tool, characterized by comprising the following steps:
(1) extracting the target source data, splitting it at breakpoints into data source chunks, grouping the chunks in order into set arrays, numbering each array, generating a source data breakpoint grouping mark table, marking each array as unprocessed, and marking an array as processed once its processing task has run;
(2) when a processing task fails and has to be restarted, querying the source data breakpoint grouping mark table and locating the latest marked breakpoint that is in the unprocessed state;
(3) fetching the data source chunks that correspond to the group numbers of the unprocessed breakpoint marks, executing those chunks in order, and so completing the cleaning task;
(4) when every array in the source data breakpoint grouping mark table is in the processed state, the task is complete.
2. The method of claim 1, wherein in step (1), grouping in order to form the set array and numbering the arrays comprises: dividing the data held in memory into a plurality of temporary storage tables, forming the set array from all the temporary storage tables, generating group numbers for the set array in order, and storing the marked relation of the set array in the source data breakpoint grouping mark table.
3. The method of claim 1, wherein after step (3) is completed, the completed data source chunk is marked as processed.
4. The method of claim 1, wherein in step (3), executing the cleaning task comprises cleaning the data, converting it, and loading it into the target database table.
5. The method of claim 1, further comprising step (5): after the whole task has been executed, deleting the task's breakpoint marking information from the source data breakpoint grouping mark table.
CN201911141715.8A 2019-11-20 2019-11-20 Method for task breakpoint resume applied to data cleaning tool Pending CN110928863A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911141715.8A CN110928863A (en) 2019-11-20 2019-11-20 Method for task breakpoint resume applied to data cleaning tool

Publications (1)

Publication Number Publication Date
CN110928863A (en) 2020-03-27

Family

ID=69851314

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911141715.8A Pending CN110928863A (en) 2019-11-20 2019-11-20 Method for task breakpoint resume applied to data cleaning tool

Country Status (1)

Country Link
CN (1) CN110928863A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105187564A (en) * 2015-10-14 2015-12-23 中科宇图天下科技有限公司 Method for breakpoint resuming of mobile phone side file
CN107426270A (en) * 2017-03-21 2017-12-01 北京智行鸿远汽车有限公司 A kind of data breakpoint continuous transmission method of vehicle remote monitoring terminal
CN109271435A (en) * 2018-09-14 2019-01-25 南威软件股份有限公司 A kind of data pick-up method and system for supporting breakpoint transmission

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112231403A (en) * 2020-10-15 2021-01-15 北京人大金仓信息技术股份有限公司 Consistency checking method, device, equipment and storage medium for data synchronization
CN112231403B (en) * 2020-10-15 2024-01-30 北京人大金仓信息技术股份有限公司 Consistency verification method, device, equipment and storage medium for data synchronization
CN113641694A (en) * 2021-07-16 2021-11-12 南京国电南自维美德自动化有限公司 Massive historical data backup method and recovery method for database
CN113641694B (en) * 2021-07-16 2023-12-22 南京国电南自维美德自动化有限公司 Database massive historical data backup method and database massive historical data recovery method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200327