CN105740465A - Flexible custom comparison method - Google Patents
- Publication number
- CN105740465A (application number CN201610122752.4A)
- Authority
- CN
- China
- Prior art keywords
- data
- comparison
- field
- self
- master
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2282—Tablespace storage structures; Management thereof
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Software Systems (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a flexible user-defined data comparison method. In the data comparison process, the master data and the matching fields within the data are defined, and the comparison is carried out along the master data flow: the user specifies the two tables to be compared and the fields to be compared and sets comparison rules for those fields, and the system marks the data that differ according to the rules the user has set. The comparison result can be computed once the comparison rules between the fields of the master table and the slave table are specified, which greatly improves the efficiency of comparing different data tables.
Description
Technical field
The present invention relates to the field of computer communication technology, and specifically to a flexible user-defined comparison method, that is, a method for flexibly selecting and comparing data from different sources.
Background art
As information technology applications become widespread, the number of information systems in large organizations is gradually increasing. However, because the data standards of individual systems are incomplete and data interfaces between systems are lacking, data integration has become an important task for IT departments. Data integration covers the extraction, transformation, and loading of data, and involves both computer technology and business logic. It is very important and can even determine the success or failure of an information system or data warehouse project. Because one object is stored in multiple systems, data must be compared during integration, and this comparison work is the core of data integration.
Each system views the same object from a different angle, so the object definitions they provide also differ; in addition, factors such as data completeness and data timeliness keep the comparison work from going smoothly. Data comparison is generally handled automatically by computer programs, with manual participation needed at certain stages. Data integration that involves data comparison arises in the construction of many information systems, such as the national basic information database systems: population database data come from the public security bureau, the labor bureau, the education bureau, the labor and social security bureau, and others; legal-entity database data come from the industry and commerce bureau, the tax bureau, the economic and trade commission, and others; and the geographic information resource database draws on the planning bureau, the real estate administration, the water resources bureau, the construction management office, the traffic administration bureau, and others.
Summary of the invention
The technical problem to be solved by the invention is as follows: because data standards differ between systems and interfaces between systems are lacking, data comparison is difficult to carry out. To solve the problem of comparing data from different sources, the invention provides a flexible user-defined comparison method.
The technical solution adopted by the invention is:
A flexible user-defined comparison method: the method defines the master data and the matching fields within the data for the data comparison process and performs the comparison along the master data flow. The user specifies the two tables to be compared and the fields to be compared and sets the comparison rules for those fields, and the system marks the data that differ according to the rules set by the user.
The method completes the data comparison through the steps of data collection, master data definition, auxiliary field setting, and comparison rule setting.
The data collection process is as follows:
First, a data collection template is defined to ensure that the key fields are present in the collected data;
Second, the collection time of the data is set, so that later comparisons can be made along different dimensions;
Finally, data from different systems are stored separately and tagged with their source.
The master data definition process is as follows:
According to the requirements of the comparison business, the master table, the slave table, and the fields to be compared are defined, and the meaning of each field, its data source, and its name in each system are determined.
The auxiliary field setting process is as follows:
Fields with the same meaning are chosen and set as the auxiliary fields of the master table and the slave table.
The comparison rule setting process is as follows:
During data comparison, fields whose differences can only be confirmed through calculation are compared by setting a calculation formula.
The operating steps of the method are as follows:
1) Standardized data collection:
A user-defined data collection module cleans internet data or data supplied by other organizations, organizes it into normalized data, and stores it in database tables;
2) Fast data matching:
The data in the two tables are associated; if no key field exists, auxiliary fields are used for fuzzy matching to prepare for the subsequent comparison;
3) User-defined comparison rules:
Two tables are selected and designated as the master table and the slave table; the fields to be compared and the fields used to associate the data are selected; the calculation relationship between the fields to be compared is then set, and the data and fields that differ are computed and displayed directly.
The method follows a standardized data collection specification: internet data are obtained with web crawler technology, and data from other organizations are supplied as Excel files or through a direct database connection.
The system used by the method adopts a B/S (browser/server) architecture; the user performs master data definition, auxiliary field setting, and comparison rule setting in the browser interface.
The beneficial effects of the invention are:
The method only requires the comparison rules between the master-table and slave-table fields to be specified in order to compute the comparison result, which greatly improves the efficiency of comparing different data tables.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the invention.
Detailed description of the embodiments
The invention is further described below through specific embodiments, with reference to the accompanying drawings:
Embodiment 1:
A flexible user-defined comparison method: the method defines the master data and the matching fields within the data for the data comparison process and performs the comparison along the master data flow. The user specifies the two tables to be compared and the fields to be compared and sets the comparison rules for those fields, and the system marks the data that differ according to the rules set by the user.
Embodiment 2:
As shown in Fig. 1, on the basis of Embodiment 1, the method of this embodiment completes the data comparison through the steps of data collection, master data definition, auxiliary field setting, and comparison rule setting.
Embodiment 3:
On the basis of Embodiment 2, the data collection process of this embodiment is as follows:
First, a data collection template is defined to ensure that the key fields are present in the collected data; for example, when collecting personal information, fields such as name, age, sex, and ID card number must be present;
Second, the collection time of the data is set, so that later comparisons can be made along different dimensions;
Finally, data from different systems are stored separately and tagged with their source, which makes the comparison process more targeted.
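The three collection steps above can be illustrated with a minimal Python sketch. The patent does not prescribe any implementation; the names `REQUIRED_FIELDS`, `CollectedRecord`, `collect`, and `store`, as well as the sample values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime

# Hypothetical template: key fields every collected person record must carry.
REQUIRED_FIELDS = ("name", "age", "sex", "id_number")


@dataclass
class CollectedRecord:
    source: str             # which system supplied the record
    collected_at: datetime  # collection time, kept for later time-based comparison
    fields: dict            # the collected field values


def collect(raw, source, collected_at):
    """Accept a raw record only if every key field of the template is present."""
    missing = [f for f in REQUIRED_FIELDS if raw.get(f) in (None, "")]
    if missing:
        raise ValueError(f"record from {source} is missing key fields: {missing}")
    return CollectedRecord(source=source, collected_at=collected_at, fields=dict(raw))


# Records from different systems are stored separately, keyed by their source.
store = {}
rec = collect(
    {"name": "Zhang San", "age": 30, "sex": "M", "id_number": "ID-0001"},
    source="public_security",
    collected_at=datetime(2016, 3, 4),
)
store.setdefault(rec.source, []).append(rec)
```

Rejecting incomplete records at collection time is one possible design; a real system might instead keep them and flag them for manual review.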
Embodiment 4:
On the basis of Embodiment 2, the master data definition process of this embodiment is as follows:
According to the requirements of the comparison business, the master table, the slave table, and the fields to be compared are defined, and the meaning of each field, its data source, and its name in each system are determined.
This step is the core of the data comparison work: the source of, and responsibility for, each key data item must be made clear. For population data, for example, name and ID card number come from the public security bureau, education level comes from the education bureau, and employer information comes from the labor and social security bureau.
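One way to record such a master data definition is a small field dictionary that names, for each field, the authoritative source and the column name used in each system. The sketch below is an assumption for illustration only: `FIELD_DEFINITIONS`, `to_canonical`, `master_system`, and all column names are hypothetical and do not come from the patent.

```python
# Hypothetical field dictionary: for each canonical field, the authoritative
# source system and the column name that each system uses for it.
FIELD_DEFINITIONS = {
    "id_number": {
        "authority": "public_security",
        "columns": {"public_security": "sfzh", "education": "id_card"},
    },
    "name": {
        "authority": "public_security",
        "columns": {"public_security": "xm", "education": "student_name"},
    },
    "education_level": {
        "authority": "education",
        "columns": {"education": "degree"},
    },
}


def to_canonical(row, system):
    """Rename a system-specific row to the canonical field names."""
    out = {}
    for field, spec in FIELD_DEFINITIONS.items():
        column = spec["columns"].get(system)
        if column is not None and column in row:
            out[field] = row[column]
    return out


def master_system(field):
    """Return the system whose value is treated as the master copy of a field."""
    return FIELD_DEFINITIONS[field]["authority"]
```

For example, `to_canonical({"id_card": "ID-0001", "degree": "BSc"}, "education")` yields a row keyed by `id_number` and `education_level`, so rows from different systems can be compared field by field.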
Embodiment 5:
On the basis of Embodiment 2, the auxiliary field setting process of this embodiment is as follows:
Fields with the same meaning are chosen and set as the auxiliary fields of the master table and the slave table; for a comparison of personal information, for example, fields such as ID card number and name can be chosen. Auxiliary fields are chosen so that records representing the same information can be located and compared during the comparison. Because the data tables are large, comparing without auxiliary fields would produce a Cartesian product and inflate the volume of data to be compared. Choosing auxiliary fields with different meanings can produce different comparison results, which lets the user compare the data along multiple dimensions.
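The effect of the auxiliary fields can be seen in a small sketch. It pairs master and slave rows through a hash index built on the chosen fields; the function `pair_rows` and the sample rows are hypothetical, and the index-based join is only one possible way to perform the association.

```python
from collections import defaultdict


def pair_rows(master, slave, key_fields):
    """Pair master and slave rows that agree on every auxiliary field."""
    index = defaultdict(list)
    for row in slave:
        index[tuple(row[k] for k in key_fields)].append(row)
    pairs = []
    for m in master:
        for s in index.get(tuple(m[k] for k in key_fields), []):
            pairs.append((m, s))
    return pairs


master = [
    {"id_number": "A1", "name": "Zhang San"},
    {"id_number": "A2", "name": "Li Si"},
]
slave = [
    {"id_number": "A1", "name": "Zhang San"},
    {"id_number": "A3", "name": "Li Si"},
]

by_id = pair_rows(master, slave, ["id_number"])  # pairs rows sharing an ID number
by_name = pair_rows(master, slave, ["name"])     # a different field gives different pairs
all_pairs = pair_rows(master, slave, [])         # no auxiliary field: Cartesian product
```

With these sample rows, pairing on `id_number` gives one pair, pairing on `name` gives two, and pairing with no auxiliary field gives all four combinations, which is the Cartesian-product growth the text warns about.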
Embodiment 6:
On the basis of Embodiment 2, the comparison rule setting process of this embodiment is as follows:
Because some fields cannot be compared directly during data comparison and a calculation is needed to confirm whether they differ, such fields can be compared by setting a calculation formula. In a comparison of housing transaction information, for example, table A contains the total transaction amount while table B contains the floor area and the price per square meter; the rule can then be set so that floor area × price per square meter in table B is compared with the total transaction amount field in table A.
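The housing example can be written as a small formula rule. This is a sketch under stated assumptions: the function `transaction_rule`, the field names, the sample figures, and the tolerance of 0.01 are hypothetical, and the patent does not say how rounding differences should be treated.

```python
import math


def transaction_rule(a_row, b_row, tolerance=0.01):
    """Compare A.total_amount with B.floor_area * B.price_per_sqm."""
    expected = b_row["floor_area"] * b_row["price_per_sqm"]
    actual = a_row["total_amount"]
    # A small absolute tolerance (an assumption) absorbs rounding differences.
    differs = not math.isclose(actual, expected, abs_tol=tolerance)
    return {"expected": expected, "actual": actual, "differs": differs}


a_row = {"total_amount": 1_200_000.0}                     # table A
b_row = {"floor_area": 100.0, "price_per_sqm": 12_000.0}  # table B

result = transaction_rule(a_row, b_row)
mismatch = transaction_rule({"total_amount": 1_250_000.0}, b_row)
```

Here `result["differs"]` is `False` because 100 × 12,000 equals the recorded total, while `mismatch["differs"]` is `True` and the row would be marked as a difference.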
Embodiment 7:
On the basis of any one of Embodiments 1-6, the operating steps of the method of this embodiment are as follows:
1) Standardized data collection:
A user-defined data collection module cleans internet data or data supplied by other organizations, organizes it into normalized data, and stores it in database tables;
2) Fast data matching:
The data in the two tables are associated; if key fields such as registration number or ID card number do not exist, auxiliary fields such as name and region can be used for fuzzy matching to prepare for the subsequent comparison;
3) User-defined comparison rules:
Two tables are selected and designated as the master table and the slave table; the fields to be compared and the fields used to associate the data are selected; the calculation relationship between the fields to be compared is then set, and the data and fields that differ are computed and displayed directly.
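Step 2 can be sketched as an exact match on the key field with a fuzzy fallback on auxiliary fields. The patent does not name a fuzzy-matching algorithm; the sketch below substitutes the similarity ratio from Python's standard `difflib.SequenceMatcher`, and the function names, field names, sample rows, and the 0.8 threshold are all assumptions.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Similarity ratio in [0, 1] between two strings."""
    return SequenceMatcher(None, a, b).ratio()


def match_row(master_row, slave_rows, key="registration_no",
              fallback=("name", "region"), threshold=0.8):
    """Match on the key field if present, otherwise fuzzy-match on auxiliary fields."""
    key_value = master_row.get(key)
    if key_value:
        for s in slave_rows:
            if s.get(key) == key_value:
                return s
        return None
    best, best_score = None, 0.0
    for s in slave_rows:
        # Average the per-field similarity over the auxiliary fields.
        score = sum(
            similarity(str(master_row.get(f, "")), str(s.get(f, "")))
            for f in fallback
        ) / len(fallback)
        if score > best_score:
            best, best_score = s, score
    return best if best_score >= threshold else None


slave_rows = [
    {"registration_no": "R1", "name": "Acme Trading Co.", "region": "Jinan"},
    {"registration_no": "R2", "name": "Northern Foods Ltd", "region": "Beijing"},
]

# The master row has no registration number, so name and region are used instead.
fuzzy_hit = match_row({"name": "Acme Trading Company", "region": "Jinan"}, slave_rows)
exact_hit = match_row({"registration_no": "R2", "name": "x", "region": "y"}, slave_rows)
```

Matched pairs would then be passed to the comparison rules of step 3, and rows that fall below the threshold could be left for manual review.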
Embodiment 8:
On the basis of Embodiment 7, the method of this embodiment follows a standardized data collection specification: internet data are obtained with web crawler technology, and data from other organizations are supplied as Excel files, through a direct database connection, or by similar means.
Embodiment 9:
On the basis of Embodiment 7, the system used by the method of this embodiment adopts a B/S (browser/server) architecture; the user performs master data definition, auxiliary field setting, and comparison rule setting in the browser interface.
The above embodiments merely illustrate the invention and do not limit it. A person of ordinary skill in the relevant technical field may make various changes and modifications without departing from the spirit and scope of the invention; all equivalent technical solutions therefore fall within the scope of the invention, and the scope of patent protection shall be defined by the claims.
Claims (9)
1. A flexible user-defined comparison method, characterized in that: the method defines the master data and the matching fields within the data for the data comparison process and performs the comparison along the master data flow; the user specifies the two tables to be compared and the fields to be compared and sets the comparison rules for those fields, and the system marks the data that differ according to the rules set by the user.
2. The flexible user-defined comparison method according to claim 1, characterized in that: the method completes the data comparison through data collection, master data definition, auxiliary field setting, and comparison rule setting.
3. The flexible user-defined comparison method according to claim 2, characterized in that: the data collection process is as follows:
First, a data collection template is defined to ensure that the key fields are present in the collected data;
Second, the collection time of the data is set, so that later comparisons can be made along different dimensions;
Finally, data from different systems are stored separately and tagged with their source.
4. The flexible user-defined comparison method according to claim 2, characterized in that: the master data definition process is as follows:
According to the requirements of the comparison business, the master table, the slave table, and the fields to be compared are defined, and the meaning of each field, its data source, and its name in each system are determined.
5. The flexible user-defined comparison method according to claim 2, characterized in that: the auxiliary field setting process is as follows:
Fields with the same meaning are chosen and set as the auxiliary fields of the master table and the slave table.
6. The flexible user-defined comparison method according to claim 2, characterized in that: the comparison rule setting process is as follows:
During data comparison, fields whose differences can only be confirmed through calculation are compared by setting a calculation formula.
7. The flexible user-defined comparison method according to any one of claims 1-6, characterized in that: the operating steps of the method are as follows:
1) Standardized data collection:
A user-defined data collection module cleans internet data or data supplied by other organizations, organizes it into normalized data, and stores it in database tables;
2) Fast data matching:
The data in the two tables are associated; if no key field exists, auxiliary fields are used for fuzzy matching to prepare for the subsequent comparison;
3) User-defined comparison rules:
Two tables are selected and designated as the master table and the slave table; the fields to be compared and the fields used to associate the data are selected; the calculation relationship between the fields to be compared is then set, and the data and fields that differ are computed and displayed directly.
8. The flexible user-defined comparison method according to claim 7, characterized in that: the method follows a standardized data collection specification; internet data are obtained with web crawler technology, and data from other organizations are supplied as Excel files or through a direct database connection.
9. The flexible user-defined comparison method according to claim 7, characterized in that: the system used by the method adopts a B/S (browser/server) architecture, and the user performs master data definition, auxiliary field setting, and comparison rule setting in the browser interface.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610122752.4A CN105740465A (en) | 2016-03-04 | 2016-03-04 | Flexible custom comparison method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610122752.4A CN105740465A (en) | 2016-03-04 | 2016-03-04 | Flexible custom comparison method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105740465A (en) | 2016-07-06 |
Family
ID=56249772
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610122752.4A Pending CN105740465A (en) | 2016-03-04 | 2016-03-04 | Flexible custom comparison method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105740465A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106971002A (en) * | 2017-04-18 | 2017-07-21 | 北京思特奇信息技术股份有限公司 | A kind of data auditing method and system |
CN107679054A (en) * | 2017-06-12 | 2018-02-09 | 平安科技(深圳)有限公司 | Data comparison method, device and readable storage medium storing program for executing |
CN107767924A (en) * | 2017-11-13 | 2018-03-06 | 医渡云(北京)技术有限公司 | Initial data checking method, device, electronic equipment and storage medium |
CN107918679A (en) * | 2018-01-04 | 2018-04-17 | 国网福建省电力有限公司 | Data warehouse model technological layer difference comparison method |
CN108170805A (en) * | 2017-12-28 | 2018-06-15 | 福建中金在线信息科技有限公司 | A kind of tables of data comparative approach, device, electronic equipment and readable storage medium storing program for executing |
CN108334452A (en) * | 2018-02-08 | 2018-07-27 | 深圳壹账通智能科技有限公司 | Regular data transfers test method, device, computer equipment and storage medium |
CN109828985A (en) * | 2018-12-27 | 2019-05-31 | 大唐软件技术股份有限公司 | A kind of list difference querying method and device |
CN110175182A (en) * | 2019-05-30 | 2019-08-27 | 口碑(上海)信息技术有限公司 | Verification of data method and device |
WO2022012536A1 (en) * | 2020-07-14 | 2022-01-20 | International Business Machines Corporation | Auto detection of matching fields in entity resolution systems |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101458705A (en) * | 2008-12-29 | 2009-06-17 | 阿里巴巴集团控股有限公司 | Data collating method between different utility systems, apparatus and system |
CN101902532A (en) * | 2009-05-27 | 2010-12-01 | 北京汉铭通信有限公司 | Data auditing method and system of telecommunication services |
CN103391311A (en) * | 2013-06-24 | 2013-11-13 | 北京奇虎科技有限公司 | Method and system for consistency verification of data among multiple platforms |
US20130339795A1 (en) * | 2012-06-15 | 2013-12-19 | The Boeing Company | Failure Analysis Validation And Visualization |
CN104156832A (en) * | 2014-08-28 | 2014-11-19 | 国家电网公司 | Intersystem data verification method and device |
CN104199958A (en) * | 2014-09-18 | 2014-12-10 | 浪潮软件集团有限公司 | Intelligent data analysis, comparison and pushing method |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101458705A (en) * | 2008-12-29 | 2009-06-17 | 阿里巴巴集团控股有限公司 | Data collating method between different utility systems, apparatus and system |
CN101902532A (en) * | 2009-05-27 | 2010-12-01 | 北京汉铭通信有限公司 | Data auditing method and system of telecommunication services |
US20130339795A1 (en) * | 2012-06-15 | 2013-12-19 | The Boeing Company | Failure Analysis Validation And Visualization |
CN103391311A (en) * | 2013-06-24 | 2013-11-13 | 北京奇虎科技有限公司 | Method and system for consistency verification of data among multiple platforms |
CN104156832A (en) * | 2014-08-28 | 2014-11-19 | 国家电网公司 | Intersystem data verification method and device |
CN104199958A (en) * | 2014-09-18 | 2014-12-10 | 浪潮软件集团有限公司 | Intelligent data analysis, comparison and pushing method |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106971002A (en) * | 2017-04-18 | 2017-07-21 | 北京思特奇信息技术股份有限公司 | A kind of data auditing method and system |
CN107679054B (en) * | 2017-06-12 | 2019-11-05 | 平安科技(深圳)有限公司 | Data comparison method, device and readable storage medium storing program for executing |
CN107679054A (en) * | 2017-06-12 | 2018-02-09 | 平安科技(深圳)有限公司 | Data comparison method, device and readable storage medium storing program for executing |
WO2018227880A1 (en) * | 2017-06-12 | 2018-12-20 | 平安科技(深圳)有限公司 | Data comparison method, apparatus and device, and readable storage medium |
CN107767924A (en) * | 2017-11-13 | 2018-03-06 | 医渡云(北京)技术有限公司 | Initial data checking method, device, electronic equipment and storage medium |
CN108170805A (en) * | 2017-12-28 | 2018-06-15 | 福建中金在线信息科技有限公司 | A kind of tables of data comparative approach, device, electronic equipment and readable storage medium storing program for executing |
CN108170805B (en) * | 2017-12-28 | 2021-07-02 | 福建中金在线信息科技有限公司 | Data table comparison method and device, electronic equipment and readable storage medium |
CN107918679A (en) * | 2018-01-04 | 2018-04-17 | 国网福建省电力有限公司 | Data warehouse model technological layer difference comparison method |
CN108334452A (en) * | 2018-02-08 | 2018-07-27 | 深圳壹账通智能科技有限公司 | Regular data transfers test method, device, computer equipment and storage medium |
CN109828985B (en) * | 2018-12-27 | 2021-06-25 | 大唐软件技术股份有限公司 | Form difference query method and device |
CN109828985A (en) * | 2018-12-27 | 2019-05-31 | 大唐软件技术股份有限公司 | A kind of list difference querying method and device |
CN110175182A (en) * | 2019-05-30 | 2019-08-27 | 口碑(上海)信息技术有限公司 | Verification of data method and device |
CN110175182B (en) * | 2019-05-30 | 2021-07-02 | 口碑(上海)信息技术有限公司 | Data checking method and device |
WO2022012536A1 (en) * | 2020-07-14 | 2022-01-20 | International Business Machines Corporation | Auto detection of matching fields in entity resolution systems |
US11726980B2 (en) | 2020-07-14 | 2023-08-15 | International Business Machines Corporation | Auto detection of matching fields in entity resolution systems |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20160706 |