CN105740465A - Flexible custom comparison method - Google Patents


Info

Publication number
CN105740465A
Authority
CN
China
Prior art keywords
data
comparison
field
self
master
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610122752.4A
Other languages
Chinese (zh)
Inventor
刘华兴
邹建军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Software Group Co Ltd
Original Assignee
Inspur Software Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-03-04
Filing date
2016-03-04
Publication date
2016-07-06
Application filed by Inspur Software Group Co Ltd
Priority to CN201610122752.4A
Publication of CN105740465A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a flexible custom comparison method. During data comparison, the master data and the matching fields in the data are defined, and the comparison is carried out in combination with the master data flow: the user selects the two tables to be compared and the fields to be compared, sets comparison rules for those fields, and the system marks the data that differ according to the rules set by the user. Because the comparison result can be calculated once the comparison rules between the fields of the master table and the slave table are determined, the method greatly improves the efficiency of comparing different data tables.

Description

A flexible custom comparison method
Technical field
The present invention relates to the field of computer communication technology, and specifically to a flexible custom comparison method, that is, a method for flexibly selecting data from different sources and comparing them.
Background art
With the spread of information technology applications, the number of information systems in large organizations is steadily growing. However, because the data specifications of individual systems are incomplete and the systems lack data interfaces to one another, data integration has become an important task for IT departments. Data integration covers the extraction, transformation, and loading of data, and involves both computer technology and business logic. It is very important and can even determine the success or failure of an information system or data warehouse project. Because the same object is stored in multiple systems, data must be compared during data integration, and this comparison work is the core of data integration.
Each system views the same object from a different angle, so the object definitions they provide also differ; in addition, issues such as the completeness and timeliness of the data make data comparison difficult. Data comparison is generally processed automatically by computer programs, with manual participation required at certain stages. Data integration that involves data comparison occurs in the construction of many information systems, such as the national basic information database system: population database data comes from the public security bureau, the labor bureau, the education bureau, the labor and social security bureau, and so on; legal-entity database data comes from the industry and commerce bureau, the tax bureau, the economic and trade commission, and so on; and the geographic information resource database draws on the planning bureau, the real estate administration, the water resources bureau, the construction administration bureau, the traffic administration bureau, and so on.
Summary of the invention
The technical problem to be solved by the present invention is as follows: because data standards differ between systems and the systems lack interfaces to one another, data comparison is difficult to carry out. To solve the problem of comparing data from different sources, the present invention provides a flexible custom comparison method.
The technical solution adopted by the present invention is:
A flexible custom comparison method in which the master data and the matching fields in the data are defined during data comparison, and the comparison is carried out in combination with the master data flow: the two tables to be compared and the fields to be compared are user-defined, comparison rules are set for the fields, and the system marks the data that differ according to the rules set by the user.
The method completes the data comparison through the steps of data acquisition, master data definition, matching field setting, and comparison rule setting.
The data acquisition process is as follows:
First, define the data acquisition template to ensure that the key fields are captured in the collected data;
Second, set the acquisition time of the data, so that later data can be compared along different dimensions;
Finally, store the data from different systems separately and mark the data source.
The master data definition process is as follows:
According to the requirements of the data comparison business, define the master table, the slave table, and the fields to be compared, and determine the meaning of these fields, the source of the data, and the field names in the respective systems.
The matching field setting process is as follows:
Choose fields with the same meaning and set them as the matching fields of the master table and the slave table.
The comparison rule setting process is as follows:
During data comparison, some fields can be checked for differences only after a calculation; the comparison of such fields is implemented by setting a calculation formula.
The operating steps of the method are as follows:
1) Standardized data acquisition:
Define a custom data acquisition module that cleans internet data or data provided by other organizations, organizes it into standardized data, and stores it in database tables;
2) Fast data matching:
Associate the data in the two tables; if there is no key field, use matching fields for fuzzy matching, which facilitates the subsequent comparison;
3) Custom comparison rules:
Select two tables, designate the master table and the slave table, select the fields to be compared and the fields used to associate the data, then set the calculation relationship between the fields to be compared; the calculation directly displays the records and fields that differ.
The method uses a standardized data acquisition specification: internet data is obtained with web crawler technology, and data from other organizations is supplied as Excel files or through a direct database connection.
The system involved in the method adopts a B/S (browser/server) architecture, and the user performs master data definition, matching field setting, and comparison rule setting in the browser interface.
The beneficial effects of the invention are:
The method only requires the comparison rules between the fields of the master table and the slave table to be specified in order to calculate the comparison result, which greatly improves the efficiency of comparing different data tables.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention.
Detailed description of the embodiments
The present invention is further described below through specific embodiments, with reference to the accompanying drawing:
Embodiment 1:
A flexible custom comparison method in which the master data and the matching fields in the data are defined during data comparison, and the comparison is carried out in combination with the master data flow: the two tables to be compared and the fields to be compared are user-defined, comparison rules are set for the fields, and the system marks the data that differ according to the rules set by the user.
Embodiment 2:
As shown in Fig. 1, on the basis of embodiment 1, the method of this embodiment completes the data comparison through the steps of data acquisition, master data definition, matching field setting, and comparison rule setting.
Embodiment 3:
On the basis of embodiment 2, the data acquisition process of this embodiment is as follows:
First, define the data acquisition template to ensure that the key fields are captured in the collected data; when collecting personal information, for example, fields such as name, age, sex, and identity card number must be present;
Second, set the acquisition time of the data, so that later data can be compared along different dimensions;
Finally, store the data from different systems separately and mark the data source, so that the comparison can be better targeted.
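The three acquisition steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `TEMPLATE` tuple, the field names, and the `collect` function are hypothetical and do not come from the patent. The sketch rejects a record that lacks a key field, then stamps it with an acquisition time and a source label.

```python
from datetime import datetime, timezone
from typing import Optional

# Hypothetical acquisition template: key fields every person record must carry.
TEMPLATE = ("name", "age", "sex", "id_number")


def collect(record: dict, source: str, acquired_at: Optional[datetime] = None) -> dict:
    """Validate a record against the template, then tag it with time and source."""
    missing = [f for f in TEMPLATE if record.get(f) in (None, "")]
    if missing:
        raise ValueError(f"missing key fields: {missing}")
    stamped = dict(record)  # copy so the caller's record is not modified
    stamped["acquired_at"] = (acquired_at or datetime.now(timezone.utc)).isoformat()
    stamped["source"] = source  # keeps data from different systems distinguishable
    return stamped
```

Keeping the source label on every record is what later allows records from, say, two bureaus to be stored separately and compared against each other.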
Embodiment 4:
On the basis of embodiment 2, the master data definition process of this embodiment is as follows:
According to the requirements of the data comparison business, define the master table, the slave table, and the fields to be compared, and determine the meaning of these fields, the source of the data, and the field names in the respective systems.
This step is the core of the data comparison work; the source of each key data item and the responsibility for it must be made clear. For population data, for example, name and identity card number come from the public security bureau, education level comes from the education bureau, and employer information comes from the labor and social security bureau.
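One possible way to record such a definition is a small table of field descriptions. The sketch below is an assumption for illustration only: the `FieldDefinition` class, the column names (`sfzh`, `xl`), and the `rename_to_master` helper are hypothetical. Each entry captures a field's meaning, its authoritative source, and its name in the master and slave tables, so that slave rows can be expressed in master-table terms before comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDefinition:
    meaning: str      # what the field means in business terms
    source: str       # system regarded as authoritative for this item
    master_name: str  # column name in the master table
    slave_name: str   # column name in the slave table


# Hypothetical definitions for a population-data comparison.
DEFINITIONS = [
    FieldDefinition("identity card number", "public security bureau", "id_number", "sfzh"),
    FieldDefinition("education level", "education bureau", "education", "xl"),
]


def rename_to_master(slave_row: dict, definitions: list) -> dict:
    """Translate a slave-table row into master-table column names."""
    mapping = {d.slave_name: d.master_name for d in definitions}
    # Columns without a definition are not part of the comparison and are dropped.
    return {mapping[k]: v for k, v in slave_row.items() if k in mapping}
```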
Embodiment 5:
On the basis of embodiment 2, the matching field setting process of this embodiment is as follows:
Fields with the same meaning are chosen and set as the matching fields of the master table and the slave table; for a comparison of personal information, for example, fields such as identity card number and name can be chosen. Matching fields are chosen so that records representing the same information can be located and compared more effectively. Because the data tables are large, comparing without matching fields would produce a Cartesian product and inflate the volume of data to be compared. Choosing matching fields with different meanings can produce different comparison results, which lets the user compare the data along multiple dimensions.
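The effect of a matching field can be illustrated with a keyed lookup. The sketch below is a minimal illustration rather than the patent's implementation; `match_rows` and the sample column names are hypothetical. Instead of pairing every master row with every slave row, it indexes the slave table by the matching field and pairs only rows that share a value.

```python
from collections import defaultdict


def match_rows(master: list, slave: list, key: str) -> list:
    """Pair master and slave rows that share the same matching-field value."""
    index = defaultdict(list)
    for row in slave:
        index[row[key]].append(row)  # group slave rows by the matching field
    pairs = []
    for m in master:
        # Only rows with an equal key are paired, so no Cartesian product arises.
        for s in index.get(m[key], []):
            pairs.append((m, s))
    return pairs
```

With two master rows and three slave rows, a Cartesian product would yield six candidate pairs, while the keyed lookup yields only the pairs whose matching field agrees. Passing a different `key` (name instead of identity card number, for instance) gives a different pairing, which corresponds to comparing along another dimension.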
Embodiment 6:
On the basis of embodiment 2, the comparison rule setting process of this embodiment is as follows:
During data comparison, some fields cannot be compared directly and a calculation is needed to confirm whether they differ, so the comparison of such fields can be implemented by setting a calculation formula. In a comparison of housing transaction information, for example, table A contains the total transaction amount while table B contains the floor area and the price per square meter; the rule can then be set to compare floor area * price per square meter from table B with the total transaction amount field in table A.
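The housing example can be written as a small rule function. This is a sketch under assumptions: the column names `total_amount`, `floor_area`, and `price_per_sqm`, and the tolerance used to absorb rounding, are hypothetical choices that the patent does not specify.

```python
import math


def amount_differs(a_row: dict, b_row: dict, tolerance: float = 0.01) -> bool:
    """Return True when table A's total disagrees with area * unit price from table B."""
    computed = b_row["floor_area"] * b_row["price_per_sqm"]
    # abs_tol absorbs small rounding differences between the two systems.
    return not math.isclose(a_row["total_amount"], computed, abs_tol=tolerance)
```

A record pair is marked as differing only when the computed amount and the stored total disagree by more than the tolerance.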
Embodiment 7:
On the basis of any one of embodiments 1 to 6, the operating steps of the method of this embodiment are as follows:
1) Standardized data acquisition:
Define a custom data acquisition module that cleans internet data or data provided by other organizations, organizes it into standardized data, and stores it in database tables;
2) Fast data matching:
Associate the data in the two tables. If there is no key field such as a registration number or identity card number, matching fields such as name and region can be used for fuzzy matching, which facilitates the subsequent comparison;
3) Custom comparison rules:
Select two tables, designate the master table and the slave table, select the fields to be compared and the fields used to associate the data, then set the calculation relationship between the fields to be compared; the calculation directly displays the records and fields that differ.
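Steps 2) and 3) can be sketched as follows for the case where no key field exists. The patent does not say which fuzzy-matching technique is used; this sketch substitutes the similarity ratio from Python's standard `difflib` module, and the function names, the 0.8 threshold, and the sample fields are all assumptions. The scan is deliberately naive (every master row against every slave row) and is meant only to show the idea.

```python
from difflib import SequenceMatcher


def fuzzy_match(master: list, slave: list, fields: tuple, threshold: float = 0.8) -> list:
    """Pair each master row with its most similar slave row on the given fields."""
    def text(row):
        return " ".join(str(row[f]) for f in fields)

    pairs = []
    for m in master:
        best, best_score = None, 0.0
        for s in slave:
            score = SequenceMatcher(None, text(m), text(s)).ratio()
            if score > best_score:
                best, best_score = s, score
        # Keep the pair only when the best candidate is similar enough.
        if best is not None and best_score >= threshold:
            pairs.append((m, best))
    return pairs


def differing_fields(m: dict, s: dict, compare: tuple) -> list:
    """List the compared fields whose values differ between the paired rows."""
    return [f for f in compare if m[f] != s[f]]
```

After pairing, `differing_fields` returns the names of the fields that disagree, which is the information a user interface would highlight.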
Embodiment 8:
On the basis of embodiment 7, the method of this embodiment uses a standardized data acquisition specification: internet data is obtained with web crawler technology, and data from other organizations is supplied as Excel files, through a direct database connection, or by similar means.
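A minimal sketch of loading organization-supplied data into a database table is shown below. It rests on several assumptions: a CSV export stands in for the Excel file (reading `.xlsx` directly would need a third-party library), SQLite stands in for the target database, and the `staging` table, its columns, and the `load_rows` function are hypothetical.

```python
import csv
import io
import sqlite3


def load_rows(conn: sqlite3.Connection, csv_text: str, source: str) -> int:
    """Clean rows from a delimited export and store them in a staging table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS staging (id_number TEXT, name TEXT, source TEXT)"
    )
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        # Normalize whitespace and letter case so later matching is consistent.
        (r["id_number"].strip().upper(), r["name"].strip(), source)
        for r in reader
        if r["id_number"].strip()  # drop rows without the key field
    ]
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)
```

The function returns the number of rows stored, so a caller can see how many records were dropped during cleaning.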
Embodiment 9:
On the basis of embodiment 7, the system involved in the method of this embodiment adopts a B/S (browser/server) architecture, and the user performs master data definition, matching field setting, and comparison rule setting in the browser interface.
The above embodiments merely illustrate the present invention and do not limit it. A person of ordinary skill in the relevant technical field may make various changes and modifications without departing from the spirit and scope of the present invention, so all equivalent technical solutions also fall within the scope of the invention, and the scope of patent protection of the present invention shall be defined by the claims.

Claims (9)

1. A flexible custom comparison method, characterized in that: the master data and the matching fields in the data are defined during data comparison, and the comparison is carried out in combination with the master data flow; the two tables to be compared and the fields to be compared are user-defined, comparison rules are set for the fields, and the system marks the data that differ according to the rules set by the user.
2. The flexible custom comparison method according to claim 1, characterized in that: the method completes the data comparison through data acquisition, master data definition, matching field setting, and comparison rule setting.
3. The flexible custom comparison method according to claim 2, characterized in that the data acquisition process is as follows:
First, define the data acquisition template to ensure that the key fields are captured in the collected data;
Second, set the acquisition time of the data, so that later data can be compared along different dimensions;
Finally, store the data from different systems separately and mark the data source.
4. The flexible custom comparison method according to claim 2, characterized in that the master data definition process is as follows:
According to the requirements of the data comparison business, define the master table, the slave table, and the fields to be compared, and determine the meaning of these fields, the source of the data, and the field names in the respective systems.
5. The flexible custom comparison method according to claim 2, characterized in that the matching field setting process is as follows:
Choose fields with the same meaning and set them as the matching fields of the master table and the slave table.
6. The flexible custom comparison method according to claim 2, characterized in that the comparison rule setting process is as follows:
During data comparison, some fields can be checked for differences only after a calculation; the comparison of such fields is implemented by setting a calculation formula.
7. The flexible custom comparison method according to any one of claims 1 to 6, characterized in that the operating steps of the method are as follows:
1) Standardized data acquisition:
Define a custom data acquisition module that cleans internet data or data provided by other organizations, organizes it into standardized data, and stores it in database tables;
2) Fast data matching:
Associate the data in the two tables; if there is no key field, use matching fields for fuzzy matching, which facilitates the subsequent comparison;
3) Custom comparison rules:
Select two tables, designate the master table and the slave table, select the fields to be compared and the fields used to associate the data, then set the calculation relationship between the fields to be compared; the calculation directly displays the records and fields that differ.
8. The flexible custom comparison method according to claim 7, characterized in that: the method uses a standardized data acquisition specification; internet data is obtained with web crawler technology, and data from other organizations is supplied as Excel files or through a direct database connection.
9. The flexible custom comparison method according to claim 7, characterized in that: the system involved in the method adopts a B/S (browser/server) architecture, and the user performs master data definition, matching field setting, and comparison rule setting in the browser interface.
CN201610122752.4A 2016-03-04 2016-03-04 Flexible custom comparison method Pending CN105740465A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610122752.4A CN105740465A (en) 2016-03-04 2016-03-04 Flexible custom comparison method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610122752.4A CN105740465A (en) 2016-03-04 2016-03-04 Flexible custom comparison method

Publications (1)

Publication Number Publication Date
CN105740465A true CN105740465A (en) 2016-07-06

Family

ID=56249772

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610122752.4A Pending CN105740465A (en) 2016-03-04 2016-03-04 Flexible custom comparison method

Country Status (1)

Country Link
CN (1) CN105740465A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106971002A (en) * 2017-04-18 2017-07-21 北京思特奇信息技术股份有限公司 A kind of data auditing method and system
CN107679054A (en) * 2017-06-12 2018-02-09 平安科技(深圳)有限公司 Data comparison method, device and readable storage medium storing program for executing
CN107767924A (en) * 2017-11-13 2018-03-06 医渡云(北京)技术有限公司 Initial data checking method, device, electronic equipment and storage medium
CN107918679A (en) * 2018-01-04 2018-04-17 国网福建省电力有限公司 Data warehouse model technological layer difference comparison method
CN108170805A (en) * 2017-12-28 2018-06-15 福建中金在线信息科技有限公司 A kind of tables of data comparative approach, device, electronic equipment and readable storage medium storing program for executing
CN108334452A (en) * 2018-02-08 2018-07-27 深圳壹账通智能科技有限公司 Regular data transfers test method, device, computer equipment and storage medium
CN109828985A (en) * 2018-12-27 2019-05-31 大唐软件技术股份有限公司 A kind of list difference querying method and device
CN110175182A (en) * 2019-05-30 2019-08-27 口碑(上海)信息技术有限公司 Verification of data method and device
WO2022012536A1 (en) * 2020-07-14 2022-01-20 International Business Machines Corporation Auto detection of matching fields in entity resolution systems

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101458705A (en) * 2008-12-29 2009-06-17 阿里巴巴集团控股有限公司 Data collating method between different utility systems, apparatus and system
CN101902532A (en) * 2009-05-27 2010-12-01 北京汉铭通信有限公司 Data auditing method and system of telecommunication services
CN103391311A (en) * 2013-06-24 2013-11-13 北京奇虎科技有限公司 Method and system for consistency verification of data among multiple platforms
US20130339795A1 (en) * 2012-06-15 2013-12-19 The Boeing Company Failure Analysis Validation And Visualization
CN104156832A (en) * 2014-08-28 2014-11-19 国家电网公司 Intersystem data verification method and device
CN104199958A (en) * 2014-09-18 2014-12-10 浪潮软件集团有限公司 Intelligent data analysis, comparison and pushing method

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106971002A (en) * 2017-04-18 2017-07-21 北京思特奇信息技术股份有限公司 A kind of data auditing method and system
CN107679054B (en) * 2017-06-12 2019-11-05 平安科技(深圳)有限公司 Data comparison method, device and readable storage medium storing program for executing
CN107679054A (en) * 2017-06-12 2018-02-09 平安科技(深圳)有限公司 Data comparison method, device and readable storage medium storing program for executing
WO2018227880A1 (en) * 2017-06-12 2018-12-20 平安科技(深圳)有限公司 Data comparison method, apparatus and device, and readable storage medium
CN107767924A (en) * 2017-11-13 2018-03-06 医渡云(北京)技术有限公司 Initial data checking method, device, electronic equipment and storage medium
CN108170805A (en) * 2017-12-28 2018-06-15 福建中金在线信息科技有限公司 A kind of tables of data comparative approach, device, electronic equipment and readable storage medium storing program for executing
CN108170805B (en) * 2017-12-28 2021-07-02 福建中金在线信息科技有限公司 Data table comparison method and device, electronic equipment and readable storage medium
CN107918679A (en) * 2018-01-04 2018-04-17 国网福建省电力有限公司 Data warehouse model technological layer difference comparison method
CN108334452A (en) * 2018-02-08 2018-07-27 深圳壹账通智能科技有限公司 Regular data transfers test method, device, computer equipment and storage medium
CN109828985B (en) * 2018-12-27 2021-06-25 大唐软件技术股份有限公司 Form difference query method and device
CN109828985A (en) * 2018-12-27 2019-05-31 大唐软件技术股份有限公司 A kind of list difference querying method and device
CN110175182A (en) * 2019-05-30 2019-08-27 口碑(上海)信息技术有限公司 Verification of data method and device
CN110175182B (en) * 2019-05-30 2021-07-02 口碑(上海)信息技术有限公司 Data checking method and device
WO2022012536A1 (en) * 2020-07-14 2022-01-20 International Business Machines Corporation Auto detection of matching fields in entity resolution systems
US11726980B2 (en) 2020-07-14 2023-08-15 International Business Machines Corporation Auto detection of matching fields in entity resolution systems

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20160706
