CN110928863A - Method for task breakpoint resume applied to data cleaning tool - Google Patents
- Publication number
- CN110928863A
- Authority
- CN
- China
- Prior art keywords
- data
- breakpoint
- task
- source
- marking
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H40/00—ICT specially adapted for the management or administration of healthcare resources or facilities; ICT specially adapted for the management or operation of medical equipment or devices
Abstract
The invention relates to the technical field of data processing and discloses a method for resuming a task from a breakpoint in a data cleaning tool. The method comprises the following steps: (1) extracting the target source data and splitting it at breakpoints into data source chunks; (2) when a processing task has failed abnormally and needs to be restarted, querying the source data breakpoint grouping mark table and locating the latest marked breakpoint that is still in the unprocessed state; (3) fetching the corresponding data source chunks according to the group numbers of the unprocessed breakpoint marks, executing the unprocessed chunks in sequence, and continuing the cleaning task to completion; (4) when all arrays in the source data breakpoint grouping mark table are in the processed state, the task is complete. By splitting the source business data at breakpoints, grouping the chunks, and marking them, the invention cleans the data in segments, so that after a cleaning task is interrupted abnormally, the breakpoint marks can be compared and the remaining work resumed from the point of interruption.
Description
Technical Field
The invention relates to the technical field of data processing, and in particular to a method for resuming a task from a breakpoint in a data cleaning tool.
Background
With the development of medical informatization, hospital information integration platforms are being built widely. The ETL data cleaning tool included in such a platform is mainly used to create a hospital-wide data center and to implement an independent data warehouse. ETL is the process of data extraction (Extract), transformation (Transform), and loading (Load); it is the core of BI/DW (business intelligence/data warehouse) and an important link in building a data center. A user extracts the required data from a data source, converts it, and finally loads it into the data center according to a predefined data center model. Business data from medical business systems such as HIS, LIS, PACS, and EMR is extracted through ETL into the business data layer of the data center. The source data in the business data layer is then cleaned, converted into a standardized form, and extracted into the standard layer of the data center. After further integration in the standard layer, the data is extracted, converted, and loaded into data application layers such as a data warehouse, a clinical knowledge base, and an index base. However, while an ETL task is executing, it may be interrupted by external causes such as a faulty background program, an unstable network, or a server power failure. Quick and timely cleaning of business data is therefore very important for the management of medical data.
Disclosure of Invention
In view of the low business data cleaning speed and the unstable operation of the prior art, the invention provides a method for resuming a task from a breakpoint in a data cleaning tool.
The above technical problems are solved by the following technical solution.
A method for resuming a task from a breakpoint in a data cleaning tool comprises the following steps:
(1) extracting the target source data, splitting it at breakpoints into data source chunks, grouping the chunks in sequence to form a set of arrays, numbering each array, generating a source data breakpoint grouping mark table, marking each array as unprocessed, and marking an array as processed once its processing task has finished;
(2) when a processing task has failed abnormally and needs to be restarted, querying the source data breakpoint grouping mark table and locating the latest marked breakpoint that is still in the unprocessed state;
(3) fetching the corresponding data source chunks according to the group numbers of the unprocessed breakpoint marks, executing the unprocessed data source chunks in sequence, and continuing the cleaning task to completion;
(4) when all arrays in the source data breakpoint grouping mark table are in the processed state, the task execution is complete.
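Steps (1) to (4) can be sketched in a few lines of Python. This is a minimal in-memory illustration under stated assumptions, not the patented implementation: the names `split_into_chunks`, `run_cleaning_task`, and `clean_chunk`, the dictionary standing in for the mark table, and the simulated failure are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence

UNPROCESSED = "unprocessed"
PROCESSED = "processed"


def split_into_chunks(records: Sequence[dict], chunk_size: int) -> Dict[int, List[dict]]:
    """Step (1): split the source data into numbered chunks (group numbers start at 1)."""
    return {
        number: list(records[start:start + chunk_size])
        for number, start in enumerate(range(0, len(records), chunk_size), start=1)
    }


def run_cleaning_task(
    chunks: Dict[int, List[dict]],
    mark_table: Dict[int, str],
    clean_chunk: Callable[[List[dict]], None],
) -> bool:
    """Steps (2)-(4): process every chunk still marked unprocessed, in group order."""
    for number in sorted(mark_table):
        if mark_table[number] == PROCESSED:
            continue  # cleaned before the interruption, so skip it
        clean_chunk(chunks[number])
        mark_table[number] = PROCESSED  # mark only after the chunk succeeds
    return all(state == PROCESSED for state in mark_table.values())


# Hypothetical demo data: 10 records in chunks of 3 gives groups 1..4.
records = [{"id": i} for i in range(10)]
chunks = split_into_chunks(records, chunk_size=3)
mark_table = {number: UNPROCESSED for number in chunks}

cleaned: List[int] = []
fail_once = {"armed": True}


def clean_chunk(chunk: List[dict]) -> None:
    # Stand-in cleaning step that fails once on the chunk containing id 6.
    if fail_once["armed"] and any(r["id"] == 6 for r in chunk):
        fail_once["armed"] = False
        raise RuntimeError("simulated interruption")
    cleaned.extend(r["id"] for r in chunk)


try:
    run_cleaning_task(chunks, mark_table, clean_chunk)
except RuntimeError:
    pass  # the task was interrupted; the mark table keeps the progress

finished = run_cleaning_task(chunks, mark_table, clean_chunk)  # restart
```

In this sketch the first run cleans groups 1 and 2 and fails on group 3; the restart skips the two processed groups and finishes groups 3 and 4, so no record is cleaned twice.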
Preferably, in step (1), grouping the chunks in sequence into a set of arrays and numbering each array comprises: dividing the data in memory into a plurality of temporary storage tables, forming the set of arrays from all the temporary storage tables, generating group numbers for the arrays in sequence, and storing the marked relation of the arrays in the source data breakpoint grouping mark table.
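One way to picture the temporary storage tables and the mark table is with Python's built-in `sqlite3` module. The sketch below is an assumption-laden illustration: the table names `tmp_chunk_<n>` and `breakpoint_marks`, the column layout, and the function `stage_chunks` are hypothetical and do not come from the patent.

```python
import sqlite3
from typing import List, Sequence, Tuple


def stage_chunks(
    conn: sqlite3.Connection,
    task_name: str,
    rows: Sequence[Tuple[int, str]],
    chunk_size: int,
) -> List[int]:
    """Store rows in numbered temporary tables and record each group in the mark table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS breakpoint_marks ("
        "task_name TEXT, group_no INTEGER, temp_table TEXT, state TEXT)"
    )
    group_numbers = []
    for group_no, start in enumerate(range(0, len(rows), chunk_size), start=1):
        temp_table = f"tmp_chunk_{group_no}"  # built from an integer, not user input
        conn.execute(f"CREATE TEMP TABLE {temp_table} (id INTEGER, payload TEXT)")
        conn.executemany(
            f"INSERT INTO {temp_table} VALUES (?, ?)",
            rows[start:start + chunk_size],
        )
        # Every group starts in the unprocessed state.
        conn.execute(
            "INSERT INTO breakpoint_marks VALUES (?, ?, ?, 'unprocessed')",
            (task_name, group_no, temp_table),
        )
        group_numbers.append(group_no)
    conn.commit()
    return group_numbers


conn = sqlite3.connect(":memory:")
rows = [(i, f"record-{i}") for i in range(25)]
groups = stage_chunks(conn, "admission_registration", rows, chunk_size=10)
marks = conn.execute(
    "SELECT group_no, temp_table, state FROM breakpoint_marks ORDER BY group_no"
).fetchall()
last_chunk_size = conn.execute("SELECT COUNT(*) FROM tmp_chunk_3").fetchone()[0]
```

SQLite temporary tables disappear when the connection closes, so this sketch only shows the grouping and numbering; how the staged chunks survive a restart is not specified here.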
Preferably, after step (3) is completed, the completed data source chunk is marked as processed.
Preferably, in step (3), the cleaning task comprises cleaning the data, converting it, and loading it into the target database table.
Preferably, the method further comprises step (5): after the task has been executed in full, deleting the task's breakpoint marking information from the source data breakpoint grouping mark table.
With the above technical solution, the invention achieves notable technical effects. By splitting the source business data at breakpoints, grouping the chunks, and marking them, it cleans the data in segments and brings fine-grained management to a cleaning process that was previously uncontrollable; after a cleaning task is interrupted abnormally, the breakpoint marks are compared and the remaining work continues from the point of interruption. This avoids discarding an entire task because of an exception and saves server computing resources and computing time. The design adds to the ETL data cleaning service the handling of cleaning tasks interrupted by external causes: once the fault has been cleared, the task can be executed again, and it resumes from the point where the previous run was interrupted and completes the remainder. This makes operation more user-friendly, avoids repeating work that has already been done, saves the user time, and improves speed.
Drawings
FIG. 1 is a flow chart illustrating a method for resuming a task at a breakpoint applied to a data cleaning tool according to the present invention.
FIG. 2 is a schematic diagram of breakpoint splitting applied to patient admission registration information.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
As shown in FIG. 1, a method for resuming a task from a breakpoint in a data cleaning tool comprises the following steps:
(1) extracting the target source data, splitting it at breakpoints into data source chunks, grouping the chunks in sequence to form a set of arrays, numbering each array, generating a source data breakpoint grouping mark table, marking each array as unprocessed, and marking an array as processed once its processing task has finished;
(2) when a processing task has failed abnormally and needs to be restarted, querying the source data breakpoint grouping mark table and locating the latest marked breakpoint that is still in the unprocessed state;
(3) fetching the corresponding data source chunks according to the group numbers of the unprocessed breakpoint marks, executing the unprocessed data source chunks in sequence, and continuing the cleaning task to completion;
(4) when all arrays in the source data breakpoint grouping mark table are in the processed state, the task execution is complete.
Example 1
Step 1: extract the third-party business source data, split the source data at breakpoints, and group and mark the chunks;
Step 2: query and compare the "source data breakpoint grouping mark storage table"; if it holds no data, insert the breakpoint marks, and if it holds data, find the group numbers of the unprocessed breakpoint marks;
Step 3: fetch the corresponding data source chunks according to the group numbers of the unprocessed breakpoint marks, and execute the unprocessed chunks in sequence, cleaning, converting, and loading them into the target database table;
Step 4: each time a group of source data chunks has been cleaned, change its breakpoint mark state in the mark storage table to processed; once the task has been executed in full, delete the task's breakpoint marking information from the source data breakpoint grouping mark storage table.
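The query-and-compare logic of Step 2 and the state changes of Step 4 might look roughly like the following `sqlite3` sketch. The schema, the table name `breakpoint_marks`, and the functions `pending_groups` and `mark_processed` are assumptions made for illustration; the patent does not prescribe a database or a table layout.

```python
import sqlite3
from typing import List, Sequence


def pending_groups(
    conn: sqlite3.Connection, task_name: str, group_numbers: Sequence[int]
) -> List[int]:
    """Step 2: insert marks if the task has none, then return the unprocessed groups."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS breakpoint_marks ("
        "task_name TEXT, group_no INTEGER, state TEXT, "
        "PRIMARY KEY (task_name, group_no))"
    )
    existing = conn.execute(
        "SELECT COUNT(*) FROM breakpoint_marks WHERE task_name = ?", (task_name,)
    ).fetchone()[0]
    if existing == 0:  # first run: no marks stored yet for this task
        conn.executemany(
            "INSERT INTO breakpoint_marks VALUES (?, ?, 'unprocessed')",
            [(task_name, n) for n in group_numbers],
        )
        conn.commit()
    rows = conn.execute(
        "SELECT group_no FROM breakpoint_marks "
        "WHERE task_name = ? AND state = 'unprocessed' ORDER BY group_no",
        (task_name,),
    ).fetchall()
    return [n for (n,) in rows]


def mark_processed(conn: sqlite3.Connection, task_name: str, group_no: int) -> int:
    """Step 4: mark one group processed; delete the task's marks once none remain."""
    conn.execute(
        "UPDATE breakpoint_marks SET state = 'processed' "
        "WHERE task_name = ? AND group_no = ?",
        (task_name, group_no),
    )
    remaining = conn.execute(
        "SELECT COUNT(*) FROM breakpoint_marks "
        "WHERE task_name = ? AND state = 'unprocessed'",
        (task_name,),
    ).fetchone()[0]
    if remaining == 0:
        conn.execute("DELETE FROM breakpoint_marks WHERE task_name = ?", (task_name,))
    conn.commit()
    return remaining


conn = sqlite3.connect(":memory:")
first_run = pending_groups(conn, "demo_task", [1, 2, 3])  # no marks yet: inserts them
mark_processed(conn, "demo_task", 1)  # group 1 cleaned, then imagine an interruption
after_restart = pending_groups(conn, "demo_task", [1, 2, 3])
for group_no in after_restart:
    mark_processed(conn, "demo_task", group_no)
marks_left = conn.execute("SELECT COUNT(*) FROM breakpoint_marks").fetchone()[0]
```

Committing the state change immediately after each group is a design choice of this sketch: it keeps the stored marks no further behind the real progress than one group.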
Take a task that cleans 10,000 patient admission registration records as an example, as shown in FIG. 2:
Step 5: extract the patient admission registration information and complete the breakpoint splitting, grouping, and marking;
Step 6: the 10,000 records are split into 10 temporary tables (breakpoints) of 1,000 records each, the corresponding group numbers are marked, and the correspondence is stored in the "source data breakpoint grouping mark" relation table, as follows:
Serial number | Task name | Breakpoint (group) | Processing state |
---|---|---|---|
1 | | 1 | Processed |
2 | | 2 | Processed |
3 | | 3 | Processed |
4 | | 4 | Processed |
5 | | 5 | Processed |
6 | | 6 | Processed |
7 | | 7 | Unprocessed |
8 | | 8 | Unprocessed |
9 | | 9 | Unprocessed |
10 | | 10 | Unprocessed |
TABLE 1
Step 7: when the task is re-executed after the fault has been repaired, the service queries the "source data breakpoint grouping mark" relation table and quickly locates unprocessed breakpoint 7; at the same time it compares the group of temporary tables in memory, skips the first 6 groups, finds source data temporary table No. 7, and continues in sequence to complete the rest of the cleaning task.
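The arithmetic behind this example can be checked with a short Python sketch. It assumes, as an illustration only, that records are assigned to groups in their original order, so that group 7 covers records 6,001 to 7,000; the helper names `resume_point` and `record_range` are hypothetical.

```python
from typing import Dict, Optional, Tuple

TOTAL_RECORDS = 10_000
GROUP_SIZE = 1_000

# Processing state per breakpoint group, as listed in Table 1.
states: Dict[int, str] = {g: "processed" for g in range(1, 7)}
states.update({g: "unprocessed" for g in range(7, 11)})


def resume_point(states: Dict[int, str]) -> Optional[int]:
    """Return the lowest-numbered group still unprocessed, or None if the task is done."""
    pending = sorted(g for g, s in states.items() if s == "unprocessed")
    return pending[0] if pending else None


def record_range(group_no: int, group_size: int = GROUP_SIZE) -> Tuple[int, int]:
    """Return the 1-based (first, last) record positions covered by a group."""
    first = (group_no - 1) * group_size + 1
    return first, group_no * group_size


start_group = resume_point(states)
skipped_records = (start_group - 1) * GROUP_SIZE  # work that is not repeated
remaining_records = TOTAL_RECORDS - skipped_records
print(start_group, record_range(start_group), skipped_records, remaining_records)
# prints: 7 (6001, 7000) 6000 4000
```

Under these assumptions the restart skips 6,000 already-cleaned records and processes only the remaining 4,000.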
In summary, the above embodiments are only preferred embodiments of the present invention, and all equivalent changes and modifications made within the scope of the claims of the present invention shall be covered by those claims.
Claims (5)
1. A method for resuming a task from a breakpoint, applied to a data cleaning tool, characterized by comprising the following steps:
(1) extracting the target source data, splitting it at breakpoints into data source chunks, grouping the chunks in sequence to form a set of arrays, numbering each array, generating a source data breakpoint grouping mark table, marking each array as unprocessed, and marking an array as processed once its processing task has finished;
(2) when a processing task has failed abnormally and needs to be restarted, querying the source data breakpoint grouping mark table and locating the latest marked breakpoint that is still in the unprocessed state;
(3) fetching the corresponding data source chunks according to the group numbers of the unprocessed breakpoint marks, executing the unprocessed data source chunks in sequence, and continuing the cleaning task to completion;
(4) when all arrays in the source data breakpoint grouping mark table are in the processed state, the task execution is complete.
2. The method of claim 1, characterized in that, in step (1), grouping the chunks in sequence into a set of arrays and numbering each array comprises: dividing the data in memory into a plurality of temporary storage tables, forming the set of arrays from all the temporary storage tables, generating group numbers for the arrays in sequence, and storing the marked relation of the arrays in the source data breakpoint grouping mark table.
3. The method of claim 1, characterized in that, after step (3) is completed, the completed data source chunk is marked as processed.
4. The method of claim 1, characterized in that, in step (3), the cleaning task comprises cleaning the data, converting it, and loading it into the target database table.
5. The method of claim 1, characterized by further comprising step (5): after the task has been executed in full, deleting the task's breakpoint marking information from the source data breakpoint grouping mark table.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911141715.8A CN110928863A (en) | 2019-11-20 | 2019-11-20 | Method for task breakpoint resume applied to data cleaning tool |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110928863A (en) | 2020-03-27 |
Family
ID=69851314
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911141715.8A Pending CN110928863A (en) | 2019-11-20 | 2019-11-20 | Method for task breakpoint resume applied to data cleaning tool |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110928863A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20200327 |