CN110362300A - A kind of data cleansing tool - Google Patents

A kind of data cleansing tool

Publication number
CN110362300A
Authority
CN
China
Prior art keywords
data
component
node
cleansing tool
data cleansing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910649088.2A
Other languages
Chinese (zh)
Inventor
侯战斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Tongda Polytron Technologies Inc
Original Assignee
Beijing Tongda Polytron Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-07-18
Filing date
2019-07-18
Publication date
2019-10-22
Application filed by Beijing Tongda Polytron Technologies Inc
Priority to CN201910649088.2A
Publication of CN110362300A
Current legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25: Integrating or interfacing systems involving database management systems
    • G06F16/258: Data format conversion from or to a database
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00: Arrangements for software engineering
    • G06F8/30: Creation or generation of source code
    • G06F8/34: Graphical or visual programming

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the field of data processing and specifically relates to a data cleansing tool. Existing data cleansing is time-consuming, involves a cumbersome process, and demands a high level of professional skill from staff. To solve these problems, the invention proposes a data cleansing tool in which the cleansing process is edited graphically and can be called repeatedly, and which is easy to operate without strong professional skills. To achieve the above purpose, the technical solution adopted by the invention is a data cleansing tool comprising: an operation monitoring module, an import execution module, an import configuration module, and a data definition module. The import configuration module includes a configuration unit, a cleansing rule definition unit, and a cleansing process definition unit. The cleansing rule definition unit includes: a control node, a string processing node, a date processing node, a number processing node, a data processing node, and a data loading node.

Description

A kind of data cleansing tool
Technical field
The invention belongs to the field of data processing and specifically relates to a data cleansing tool.
Background art
Data cleansing is the process of converting data that comes from different sources in different formats into data of a single format and then storing it in a database. At present, however, most data cleansing is carried out manually. Some software can also perform data cleansing, but its functionality is weak, it requires staff to have advanced computer expertise, and it handles only a single, fixed form of data: each new batch of data requires reprogramming, which takes a long time.
Summary of the invention
Existing data cleansing is time-consuming, involves a cumbersome process, and demands a high level of professional skill from staff. To solve these problems, the invention proposes a data cleansing tool in which the cleansing process is edited graphically and can be called repeatedly, and which is simple to operate and does not require strong professional skills.
To achieve the above purpose, the technical solution adopted by the invention is a data cleansing tool comprising: an operation monitoring module, an import execution module, an import configuration module, and a data definition module. The import configuration module includes a configuration unit, a cleansing rule definition unit, and a cleansing process definition unit.
Preferably, the cleansing rule definition unit includes: a control node, a string processing node, a date processing node, a number processing node, a data processing node, and a data loading node.
Preferably, the control node includes: a start component, an end component, a decision component, and a branch component.
Preferably, the string processing node includes: a get-string-length component, a find-substring component, an extract-substring component, a concatenate-strings component, and a string replacement component.
Preferably, the date processing node includes: a get-current-time component, a time calculation component, and a formatting component.
Preferably, the number processing node includes: a number formatting component and a numeric calculation component.
Preferably, the data processing node includes: a variable assignment component, a data check component, and a data conversion component.
Preferably, the data loading node includes: a deduplication rule node and a data loading node.
Preferably, the user can freely combine the components of each node in the cleansing rule definition unit by dragging them, forming a cleansing rule process that meets the user's needs.
Preferably, the data definition module includes a data table definition unit and a data source definition unit. The main purpose of the data table definition unit is to identify and standardize external data according to the target system's data requirements, so that the data, once it meets those requirements, is loaded into the corresponding database table of the target system.
Beneficial effects of the invention: (1) In this application, the user can assemble a data cleansing process that meets the user's needs by dragging graphical components. Although building a process for the first time takes longer, the finished process is saved, so the next time data of the same type is processed the earlier process can be called directly, which shortens the time and improves efficiency. (2) The application uses a graphical design: building a cleansing process only requires dragging graphical elements and setting parameters as needed, with no reprogramming, so the demands on the staff's programming ability are low. (3) Because the cleansing process is composed from components rather than written in code, the cleansing logic can be inspected, and the history records of the cleansed data can be reviewed.
Detailed description
A data cleansing tool comprises: an operation monitoring module, an import execution module, an import configuration module, and a data definition module. The import configuration module includes a configuration unit, a cleansing rule definition unit, and a cleansing process definition unit.
The cleansing rule definition unit includes: a control node, a string processing node, a date processing node, a number processing node, a data processing node, and a data loading node.
The control node includes: a start component, an end component, a decision component, and a branch component. The string processing node includes: a get-string-length component, a find-substring component, an extract-substring component, a concatenate-strings component, and a string replacement component.
The date processing node includes: a get-current-time component, a time calculation component, and a formatting component. The number processing node includes: a number formatting component and a numeric calculation component. The data processing node includes: a variable assignment component, a data check component, and a data conversion component.
The data loading node includes: a deduplication rule node and a data loading node. The user can freely combine the components of each node in the cleansing rule definition unit by dragging them, forming a cleansing rule process that meets the user's needs.
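The way components from different nodes combine into one reusable cleansing rule process can be sketched in code. This is a minimal sketch under assumptions, not the patented implementation: the names `CleansingFlow`, `replace_text`, `format_date`, `format_number`, `assign`, and `branch` are hypothetical, and a chained method call stands in for the drag-and-drop editor.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Callable, Dict, List

Record = Dict[str, object]
Component = Callable[[Record], Record]


def replace_text(name: str, old: str, new: str) -> Component:
    """String processing: replace every occurrence of `old` in one field."""
    def apply(record: Record) -> Record:
        return {**record, name: str(record[name]).replace(old, new)}
    return apply


def format_date(name: str, in_fmt: str, out_fmt: str) -> Component:
    """Date processing: parse a field with `in_fmt` and rewrite it as `out_fmt`."""
    def apply(record: Record) -> Record:
        parsed = datetime.strptime(str(record[name]), in_fmt)
        return {**record, name: parsed.strftime(out_fmt)}
    return apply


def format_number(name: str, digits: int) -> Component:
    """Number processing: render a numeric field with a fixed number of decimals."""
    def apply(record: Record) -> Record:
        return {**record, name: f"{float(str(record[name])):.{digits}f}"}
    return apply


def assign(name: str, value: object) -> Component:
    """Data processing: variable assignment."""
    def apply(record: Record) -> Record:
        return {**record, name: value}
    return apply


def branch(condition: Callable[[Record], bool],
           if_true: Component, if_false: Component) -> Component:
    """Control: run one of two components depending on a decision."""
    def apply(record: Record) -> Record:
        return if_true(record) if condition(record) else if_false(record)
    return apply


@dataclass
class CleansingFlow:
    """An ordered, reusable sequence of components (hypothetical model)."""
    name: str
    steps: List[Component] = field(default_factory=list)

    def then(self, component: Component) -> "CleansingFlow":
        self.steps.append(component)
        return self

    def run(self, record: Record) -> Record:
        current = dict(record)  # never mutate the caller's record
        for step in self.steps:
            current = step(current)
        return current


# Build the flow once, then reuse it for every batch of the same data type.
flow = (
    CleansingFlow("customer_import")
    .then(replace_text("phone", "-", ""))
    .then(format_date("joined", "%d/%m/%Y", "%Y-%m-%d"))
    .then(format_number("balance", 2))
    .then(branch(lambda r: len(str(r["phone"])) == 11,
                 assign("phone_ok", True), assign("phone_ok", False)))
)
cleaned = flow.run({"phone": "138-0013-8000", "joined": "18/07/2019", "balance": "12.5"})
```

Because each component is a plain function from record to record, a saved flow can be rerun on later data of the same type without any reprogramming, which is the reuse benefit the description emphasizes.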
The data definition module includes a data table definition unit and a data source definition unit. The main purpose of the data table definition unit is to identify and standardize external data according to the target system's data requirements, so that the data, once it meets those requirements, is loaded into the corresponding database table of the target system.
Before a data cleansing operation is executed, the target database table must first be defined with the data definition module so that it can receive the valid data produced by cleansing. Besides the basic information of each data field, the data definition module can also define whether the data may be empty, whether the data is encrypted, the data check mode, and so on. For enumeration-type fields whose value range is limited, a list of allowed values can also be provided. For fields that reference other tables, the referenced table and field can be set by specifying a data reference relationship.
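The field attributes described here could be modeled roughly as follows. This is a minimal sketch under assumptions: `FieldDef`, `TableDef`, `validate`, and the `lookup` argument are hypothetical names that do not come from the patent, and the `encrypted` flag is carried only as metadata because the source does not say how encryption is applied.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Sequence, Set, Tuple

Row = Dict[str, object]
# Known values of referenced columns, keyed by (table name, field name).
Lookup = Dict[Tuple[str, str], Set[object]]


@dataclass
class FieldDef:
    """Definition of one target-table field (hypothetical model)."""
    name: str
    nullable: bool = True                              # may the value be empty?
    encrypted: bool = False                            # metadata only in this sketch
    check: Optional[Callable[[object], bool]] = None   # data check mode
    allowed: Optional[Sequence[object]] = None         # enumeration values
    references: Optional[Tuple[str, str]] = None       # (table, field) reference


@dataclass
class TableDef:
    """Definition of a target database table."""
    name: str
    fields: List[FieldDef]

    def validate(self, row: Row, lookup: Lookup) -> List[str]:
        """Return the reasons a row is invalid; an empty list means valid."""
        errors: List[str] = []
        for f in self.fields:
            value = row.get(f.name)
            if value is None or value == "":
                if not f.nullable:
                    errors.append(f"{f.name}: must not be empty")
                continue  # nothing further to check on an empty value
            if f.allowed is not None and value not in f.allowed:
                errors.append(f"{f.name}: {value!r} is not an allowed value")
            if f.check is not None and not f.check(value):
                errors.append(f"{f.name}: failed data check")
            if f.references is not None:
                if value not in lookup.get(f.references, set()):
                    table, column = f.references
                    errors.append(f"{f.name}: no match in {table}.{column}")
        return errors


customer = TableDef("customer", [
    FieldDef("id", nullable=False),
    FieldDef("gender", allowed=("M", "F")),
    FieldDef("email", check=lambda v: "@" in str(v)),
    FieldDef("dept_id", references=("department", "id")),
    FieldDef("id_number", encrypted=True),
])
known_values: Lookup = {("department", "id"): {1, 2}}
problems = customer.validate(
    {"id": None, "gender": "X", "email": "nope", "dept_id": 9}, known_values
)
```

Returning the reasons as text, rather than a bare pass or fail, matches the later step in which invalid data is corrected manually according to why it was rejected.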
In addition to the above rules, a deduplication rule can be set for the target table, specifying how the system handles duplicate data when it is found. If the processing mode is update, the engine assembles the fields to be updated according to the update mode set on each table field.
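A deduplication rule with per-field update modes might look like the sketch below. The source names only the update processing mode; the `skip` mode and the per-field modes `overwrite`, `fill_if_empty`, and `keep` are assumptions added for illustration, as are the function names `merge_row` and `load_rows`.

```python
from typing import Dict, Iterable, Optional, Sequence, Tuple

Row = Dict[str, object]
Table = Dict[Tuple[object, ...], Row]


def merge_row(existing: Row, incoming: Row, update_modes: Dict[str, str]) -> Row:
    """Assemble the fields to update from each field's update mode."""
    merged = dict(existing)
    for name, value in incoming.items():
        mode = update_modes.get(name, "keep")  # assumed default: leave the field alone
        if mode == "overwrite":
            merged[name] = value
        elif mode == "fill_if_empty" and merged.get(name) in (None, ""):
            merged[name] = value
    return merged


def load_rows(table: Table, incoming: Iterable[Row], key_fields: Sequence[str],
              on_duplicate: str = "update",
              update_modes: Optional[Dict[str, str]] = None) -> Table:
    """Load rows into `table`, treating rows with equal key fields as duplicates."""
    if on_duplicate not in ("update", "skip"):
        raise ValueError(f"unknown processing mode: {on_duplicate}")
    modes = update_modes or {}
    for row in incoming:
        key = tuple(row[k] for k in key_fields)
        if key not in table:
            table[key] = dict(row)                    # new row: insert as-is
        elif on_duplicate == "update":
            table[key] = merge_row(table[key], row, modes)
        # on_duplicate == "skip": keep the existing row unchanged
    return table


target: Table = {("A1",): {"code": "A1", "name": "Old", "phone": ""}}
load_rows(
    target,
    [{"code": "A1", "name": "New", "phone": "555"},
     {"code": "B2", "name": "Bee", "phone": "1"}],
    key_fields=["code"],
    on_duplicate="update",
    update_modes={"name": "overwrite", "phone": "fill_if_empty"},
)
```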
During an import, the data cleansing engine automatically records the import-related data (the raw data, the import results, and the invalid data). After the import completes, information about the import process can be obtained at any time by querying the import history. The import result data can also be exported for offline processing: invalid data is corrected manually according to the reason it was rejected and then imported again, which maximizes the proportion of data that is successfully loaded.
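The record-keeping and re-import cycle can be illustrated with a small sketch. It assumes a validator that returns a rejection reason or `None`; `ImportRecord`, `ImportHistory`, and `load_rate` are hypothetical names, and an in-memory list stands in for whatever storage the real tool uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Optional, Tuple

Row = Dict[str, object]
Validator = Callable[[Row], Optional[str]]  # returns a rejection reason, or None


@dataclass
class ImportRecord:
    """What one import run keeps: raw data, loaded rows, and invalid rows."""
    raw: List[Row]
    loaded: List[Row] = field(default_factory=list)
    invalid: List[Tuple[Row, str]] = field(default_factory=list)

    @property
    def load_rate(self) -> float:
        """Share of raw rows that were loaded successfully."""
        return len(self.loaded) / len(self.raw) if self.raw else 0.0


@dataclass
class ImportHistory:
    """Queryable history of import runs (in memory for this sketch)."""
    records: List[ImportRecord] = field(default_factory=list)

    def run(self, rows: Iterable[Row], validate: Validator) -> ImportRecord:
        record = ImportRecord(raw=[dict(r) for r in rows])
        for row in record.raw:
            reason = validate(row)
            if reason is None:
                record.loaded.append(dict(row))
            else:
                record.invalid.append((dict(row), reason))
        self.records.append(record)
        return record


def needs_name(row: Row) -> Optional[str]:
    return None if row.get("name") else "name is empty"


history = ImportHistory()
first = history.run([{"id": 1, "name": "Ann"}, {"id": 2, "name": ""}], needs_name)

# Offline step: fix each invalid row according to its recorded reason, then re-import.
fixed = [{**row, "name": "Bob"} for row, _reason in first.invalid]
second = history.run(fixed, needs_name)
```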
In this application, the user can assemble a data cleansing process that meets the user's needs by dragging graphical components. Although building a process for the first time takes longer, the finished process is saved, so the next time data of the same type is processed the earlier process can be called directly, which shortens the time and improves efficiency. The application uses a graphical design: building a cleansing process only requires dragging graphical elements and setting parameters as needed, with no reprogramming, so the demands on the staff's programming ability are low. Because the cleansing process is composed from components rather than written in code, the cleansing logic can be inspected, and the history records of the cleansed data can be reviewed. The graphical data cleansing tool offers a rich set of components, and the cleansing logic is shown graphically, which is concise and intuitive, easy to understand and modify, and provides strong support for data processing.
The preferred embodiments of the invention have been described in detail above. It should be understood that a person of ordinary skill in the art can make many modifications and variations according to the concept of the invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain through logical analysis, reasoning, or limited experimentation on the basis of the prior art, in accordance with the concept of the invention, shall fall within the scope of protection determined by the claims.

Claims (10)

1. A data cleansing tool, characterized by comprising: an operation monitoring module, an import execution module, an import configuration module, and a data definition module; the import configuration module includes a configuration unit, a cleansing rule definition unit, and a cleansing process definition unit.
2. The data cleansing tool according to claim 1, characterized in that the cleansing rule definition unit includes: a control node, a string processing node, a date processing node, a number processing node, a data processing node, and a data loading node.
3. The data cleansing tool according to claim 2, characterized in that the control node includes: a start component, an end component, a decision component, and a branch component.
4. The data cleansing tool according to claim 2, characterized in that the string processing node includes: a get-string-length component, a find-substring component, an extract-substring component, a concatenate-strings component, and a string replacement component.
5. The data cleansing tool according to claim 2, characterized in that the date processing node includes: a get-current-time component, a time calculation component, and a formatting component.
6. The data cleansing tool according to claim 2, characterized in that the number processing node includes: a number formatting component and a numeric calculation component.
7. The data cleansing tool according to claim 2, characterized in that the data processing node includes: a variable assignment component, a data check component, and a data conversion component.
8. The data cleansing tool according to claim 2, characterized in that the data loading node includes: a deduplication rule node and a data loading node.
9. The data cleansing tool according to any one of claims 1 to 8, characterized in that the user can freely combine the components of each node in the cleansing rule definition unit by dragging them, forming a cleansing rule process that meets the user's needs.
10. The data cleansing tool according to claim 1, characterized in that the data definition module includes a data table definition unit and a data source definition unit; the main purpose of the data table definition unit is to identify and standardize external data according to the target system's data requirements, so that the data, once it meets those requirements, is loaded into the corresponding database table of the target system.
CN201910649088.2A 2019-07-18 2019-07-18 A kind of data cleansing tool Pending CN110362300A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910649088.2A CN110362300A (en) 2019-07-18 2019-07-18 A kind of data cleansing tool

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910649088.2A CN110362300A (en) 2019-07-18 2019-07-18 A kind of data cleansing tool

Publications (1)

Publication Number Publication Date
CN110362300A (en) 2019-10-22

Family

ID=68220697

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910649088.2A Pending CN110362300A (en) 2019-07-18 2019-07-18 A kind of data cleansing tool

Country Status (1)

Country Link
CN (1) CN110362300A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117171153A (en) * 2023-09-11 2023-12-05 北京三维天地科技股份有限公司 Visual data cleaning method and system supporting custom cleaning flow

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101231661A (en) * 2008-02-19 2008-07-30 上海估家网络科技有限公司 Method and system for digging object grade knowledge
CA2648210A1 (en) * 2008-01-03 2009-07-03 Accenture Global Services Gmbh System and method for automating etl applications
GB2509090A (en) * 2012-12-20 2014-06-25 Ibm An extract-transform-load (ETL) processor controller indicates a degree of preferredness of a proposed placement of data
CN105976158A (en) * 2016-04-26 2016-09-28 中国电子科技网络信息安全有限公司 Visual ETL flow management and scheduling monitoring method
CN108363782A (en) * 2018-02-11 2018-08-03 中国联合网络通信集团有限公司 A kind of data cleaning method and Data clean system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
李代平: "《软件工程》", 31 August 2002, 冶金工业出版社 *
王铭军: "可视数据清洗综述", 《中国图象图形学报》 *
肖英: "《Java程序设计基础》", 31 January 2017, 华中科技大学出版社 *

Similar Documents

Publication Publication Date Title
Gehlot et al. An introduction to systems modeling and simulation with colored petri nets
US7379934B1 (en) Data mapping
CN106897809A (en) Workflow creation method, workflow designer and workflow system
CN106897806A (en) Workflow creation method and system, operation system
Chen et al. Multiobjective optimization of airline crew roster recovery problems under disruption conditions
CN103092631B (en) A kind of data base application system development platform and development approach
JPH06266813A (en) Data collecting device and method for collecting and inputting data and requirement from plurality of user for constructing process-model and data-model
CN106095404A (en) A kind of business process model is to the automodel conversion method servicing composition model
CN107153921A (en) The method that workflow approval node is set examination & approval role by department's rank
CN109472496A (en) Workflow construction method and device based on visualization guidance and automatic Verification
CN109241104A (en) The resolver and its implementation of AISQL in decision type distributed data base system
CN107578217A (en) A kind of working electronic stream is autonomously generated method, apparatus and office management system
CN110362300A (en) A kind of data cleansing tool
Boring et al. Human performance modeling for dynamic human reliability analysis
CN111046189A (en) Modeling method of power distribution network knowledge graph model
CN101331505A (en) Method and apparatus for an algorithm development environment for solving a class of real-life combinatorial optimization problems
CN104575148B (en) Simulation operating system for training nuclear power plant reactor operator
CN105224300B (en) A kind of visual modeling method based on system member view construction system view
Seyfang et al. Visualizing complex process hierarchies during the modeling process
KR20100046504A (en) Smart business model supporting apparatus and a method thereof
Simon Web-Based Simulation Of Production Schedules With High-Level Petri Nets.
CN106294064A (en) A kind of device and method that baseboard management controller attribute is set
CN108205564A (en) Knowledge hierarchy construction method and system
Kavelar et al. Web-based decision making for complex event processing systems
Fazio et al. Select-HVAC: knowledge-based system as an advisor to configure HVAC systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191022