CN103530334B - Data matching system and method based on a comparison template - Google Patents
Data matching system and method based on a comparison template
- Publication number: CN103530334B (application CN201310456767.0A; also published as CN103530334A)
- Authority: CN (China)
- Prior art keywords: data, matching, threshold, similarity, data records
- Prior art date
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/22—Social work
Abstract
Description
ID | A | B | C | D |
---|---|---|---|---|
1 | a1 | b1 | c1 | d1 |
2 | a2 | b2 | c2 | d2 |
Claims (8)
- 1. A data matching system based on a comparison template, characterized by comprising: a blocking unit, configured to receive data from different domains and to partition the data into blocks according to a set index item, the index item comprising one or more fields of the data; a comparison unit, configured to obtain matching pairs for each data block, each matching pair comprising two data records, and to calculate a similarity for each matching pair according to the rules of the comparison template; and a classification unit, configured to determine the matching relationship between the two data records of a matching pair according to preset similarity thresholds; wherein the comparison unit comprises an obtaining subunit, configured to, for each data block, form matching pairs from data whose index item values are identical; the similarity thresholds comprise a first threshold and a second threshold, the first threshold being greater than the second threshold; and the classification unit is further configured to: when the similarity of the two data records is greater than or equal to the first threshold, determine that the relationship between the two data records is a matching relationship and generate a unique identifier for associating the two data records; when the similarity of the two data records is less than the first threshold and greater than the second threshold, determine that the relationship between the two data records is a suspected relationship; and when the similarity of the two data records is less than or equal to the second threshold, determine that the relationship between the two data records is a mismatch relationship.
- 2. The data matching system based on a comparison template according to claim 1, characterized by further comprising: a data cleaning unit, configured to process the data according to a preset data format so that the data conforms to the predetermined format.
- 3. The data matching system based on a comparison template according to claim 1 or 2, characterized in that the comparison unit comprises a calculation subunit, configured to, for each field shared by the two data records, calculate a similarity value of the corresponding content of that field in the two data records, and to determine the similarity according to the similarity values of the shared fields.
- 4. The data matching system based on a comparison template according to claim 3, characterized in that the calculation subunit is further configured to, when the two data records have multiple shared fields, take the sum of the similarity values of the shared fields as the similarity of the two data records.
- 5. A data matching method based on a comparison template, characterized by comprising: receiving data from different domains and partitioning the data into blocks according to a set index item, the index item comprising one or more fields of the data; obtaining matching pairs for each data block, each matching pair comprising two data records; and calculating a similarity for each matching pair according to the rules of the comparison template, and determining the matching relationship between the two data records of a matching pair according to preset similarity thresholds; wherein the step of obtaining matching pairs for each data block specifically comprises: for each data block, forming matching pairs from data whose index item values are identical; the similarity thresholds comprise a first threshold and a second threshold, the first threshold being greater than the second threshold; and when the similarity of the two data records is greater than or equal to the first threshold, the relationship between the two data records is determined to be a matching relationship and a unique identifier for associating the two data records is generated; when the similarity of the two data records is less than the first threshold and greater than the second threshold, the relationship between the two data records is determined to be a suspected relationship; and when the similarity of the two data records is less than or equal to the second threshold, the relationship between the two data records is determined to be a mismatch relationship.
- 6. The data matching method based on a comparison template according to claim 5, characterized in that, before the data is partitioned into blocks, the method further comprises: processing the data according to a preset data format so that the data conforms to the predetermined format.
- 7. The data matching method based on a comparison template according to claim 5 or 6, characterized in that the step of calculating a similarity for each matching pair according to the rules of the comparison template specifically comprises: for each field shared by the two data records, calculating a similarity value of the corresponding content of that field in the two data records, and determining the similarity according to the similarity values of the shared fields.
- 8. The data matching method based on a comparison template according to claim 7, characterized in that, when the two data records have multiple shared fields, the sum of the similarity values of the shared fields is taken as the similarity of the two data records.
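The claimed flow (blocking on an index item, pairing records within a block, summing per-field similarity values, then classifying against two thresholds) can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration and does not come from the patent text: the function names, the representation of the comparison template as a mapping from field name to a comparison rule, the `exact` and `fuzzy` rules, and the use of a UUID as the unique identifier. Records are assumed to be dictionaries carrying an `ID` field, as in the description table.

```python
import itertools
import uuid
from collections import defaultdict
from difflib import SequenceMatcher


def exact(a, b):
    """Hypothetical template rule: 1.0 if the values are equal, else 0.0."""
    return 1.0 if a == b else 0.0


def fuzzy(a, b):
    """Hypothetical template rule: character-level similarity ratio in [0, 1]."""
    return SequenceMatcher(None, str(a), str(b)).ratio()


def block_records(records, index_fields):
    """Group records into blocks keyed by the values of the index fields."""
    blocks = defaultdict(list)
    for record in records:
        key = tuple(record.get(field) for field in index_fields)
        blocks[key].append(record)
    return blocks


def pair_similarity(rec_a, rec_b, template):
    """Sum the per-field similarity values over fields present in both records."""
    shared = [f for f in template if f in rec_a and f in rec_b]
    return sum(template[f](rec_a[f], rec_b[f]) for f in shared)


def classify(similarity, first_threshold, second_threshold):
    """Map a similarity score to 'match', 'suspected', or 'mismatch'."""
    if first_threshold <= second_threshold:
        raise ValueError("first threshold must be greater than second threshold")
    if similarity >= first_threshold:
        return "match"
    if similarity > second_threshold:
        return "suspected"
    return "mismatch"


def match_records(records, index_fields, template, first_threshold, second_threshold):
    """Return (id_a, id_b, similarity, relation, link_id) for every pair in a block."""
    results = []
    for block in block_records(records, index_fields).values():
        for rec_a, rec_b in itertools.combinations(block, 2):
            score = pair_similarity(rec_a, rec_b, template)
            relation = classify(score, first_threshold, second_threshold)
            # Only matched pairs receive an identifier linking the two records.
            link_id = str(uuid.uuid4()) if relation == "match" else None
            results.append((rec_a["ID"], rec_b["ID"], score, relation, link_id))
    return results


# Illustrative data only: field A is the index item, B and C are compared.
records = [
    {"ID": 1, "A": "x", "B": "b1", "C": "c1"},
    {"ID": 2, "A": "x", "B": "b1", "C": "c1"},
    {"ID": 3, "A": "x", "B": "b1", "C": "c9"},
    {"ID": 4, "A": "y", "B": "b1", "C": "c1"},
]
template = {"B": exact, "C": exact}
for row in match_records(records, ["A"], template, 2.0, 0.5):
    print(row)
```

With these illustrative thresholds, records 1 and 2 agree on both compared fields and are classified as a match, while record 3 agrees on only one field and lands in the suspected band. Record 4 falls in a different block and is never compared, which is the point of blocking: it limits the number of pairs that have to be scored.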
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310456767.0A (CN103530334B) | 2013-09-29 | 2013-09-29 | Data matching system and method based on a comparison template |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310456767.0A (CN103530334B) | 2013-09-29 | 2013-09-29 | Data matching system and method based on a comparison template |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103530334A (en) | 2014-01-22 |
CN103530334B (en) | 2018-01-23 |
Family
ID=49932343
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310456767.0A (Active, CN103530334B) | Data matching system and method based on a comparison template | 2013-09-29 | 2013-09-29 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103530334B (en) |
Families Citing this family (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104809141A (en) * | 2014-01-29 | 2015-07-29 | 携程计算机技术(上海)有限公司 | Matching system and method of hotel data |
CN105096028A (en) * | 2014-11-20 | 2015-11-25 | 北京航天金盾科技有限公司 | Intelligent matching method of population data |
CN106681524A (en) * | 2015-11-10 | 2017-05-17 | 阿里巴巴集团控股有限公司 | Method and device for processing information |
CN107291672B (en) * | 2016-03-31 | 2020-11-20 | 阿里巴巴集团控股有限公司 | Data table processing method and device |
CN106021526B (en) * | 2016-05-25 | 2019-09-27 | 东软集团股份有限公司 | News classification method and device |
CN108572947B (en) * | 2017-03-13 | 2019-11-19 | 腾讯科技(深圳)有限公司 | Data fusion method and device |
CN108664497B (en) * | 2017-03-30 | 2020-11-03 | 大有秦鼎(北京)科技有限公司 | Data matching method and device |
CN107193860B (en) * | 2017-03-31 | 2021-03-02 | 苏州艾隆信息技术有限公司 | Medicine information multidimensional identification method and system |
CN107203686B (en) * | 2017-03-31 | 2021-04-20 | 苏州艾隆信息技术有限公司 | Medicine information difference processing method and system |
CN107103048B (en) * | 2017-03-31 | 2021-04-20 | 苏州艾隆信息技术有限公司 | Medicine information matching method and system |
CN108038504B (en) * | 2017-12-11 | 2019-12-27 | 深圳房讯通信息技术有限公司 | Method for analyzing content of house property certificate photo |
CN108920601B (en) * | 2018-06-27 | 2020-12-01 | 中国联合网络通信集团有限公司 | Data matching method and device |
CN109063178B (en) * | 2018-08-22 | 2019-12-24 | 四川新网银行股份有限公司 | Method and device for automatically expanding self-help analysis report |
CN113535943A (en) * | 2020-04-14 | 2021-10-22 | 阿里巴巴集团控股有限公司 | Medical record classification method and device and data record classification method and device |
CN111737533B (en) * | 2020-06-19 | 2024-02-09 | 东软集团股份有限公司 | Method, device, storage medium and equipment for processing inspection items |
CN112732703B (en) * | 2021-03-23 | 2022-04-12 | 中国信息通信研究院 | Metadata processing method, metadata processing apparatus, and readable storage medium |
CN113434584B (en) * | 2021-06-28 | 2022-10-14 | 国网北京市电力公司 | Data processing method and device for power equipment and electronic equipment |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103186427A (en) * | 2011-12-31 | 2013-07-03 | 中国银联股份有限公司 | System and method for analyzing data record set |
CN103257961A (en) * | 2012-02-15 | 2013-08-21 | 北大方正集团有限公司 | Method, device and system of bibliography repeat removal |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101739414A (en) * | 2008-11-25 | 2010-06-16 | 华中师范大学 | Ontological concept mapping method |
US9069850B2 (en) * | 2011-11-08 | 2015-06-30 | Comcast Cable Communications, Llc | Content descriptor |
CN102542262B (en) * | 2012-01-04 | 2013-07-31 | 东南大学 | Waveform identification method based on operating-characteristic working condition waveform library of high-speed rail |
Non-Patent Citations (1)
Title |
---|
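"Matching detection and borrowing of related data across different application systems"; Qi Weihua; Proceedings of the 2007 CAD/CAM Academic Exchange Conference; 2007-05-31; pp. 180-181 * |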
Also Published As
Publication number | Publication date |
---|---|
CN103530334A (en) | 2014-01-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103530334B (en) | Data matching system and method based on a comparison template | |
CN112365987A (en) | Diagnostic data anomaly detection method and device, computer equipment and storage medium | |
US8429220B2 (en) | Data exchange among data sources | |
CN103473375A (en) | Data cleaning method and data cleaning system | |
JP2012511763A (en) | Assertion-based record linkage in a decentralized autonomous medical environment | |
CN110175697B (en) | Adverse event risk prediction system and method | |
US20100169348A1 (en) | Systems and Methods for Handling Multiple Records | |
WO2015027425A1 (en) | Method and device for storing data | |
CN103473373A (en) | Threshold matching model-based similarity analysis system and threshold matching model-based similarity analysis method | |
CN109062936B (en) | Data query method, computer readable storage medium and terminal equipment | |
CN111597177A (en) | Data governance method for improving data quality | |
CN109145003A (en) | Method and device for constructing a knowledge graph | |
CN105512300B (en) | Information filtering method and system | |
CN110516752A (en) | Cluster quality evaluation method, device, equipment and storage medium | |
CN110909168A (en) | Knowledge graph updating method and device, storage medium and electronic device | |
CN113111063A (en) | Medical patient main index discovery method applied to multiple data sources | |
CN110019542B (en) | Generation of enterprise relationship, generation of organization member database and identification of same name member | |
WO2022222942A1 (en) | Method and apparatus for generating question and answer record, electronic device, and storage medium | |
WO2022247549A1 (en) | Drug prediction method, apparatus and device, and storage medium | |
CN109346146B (en) | Prescription checking and distributing method, electronic equipment and storage medium | |
US10192031B1 (en) | System for extracting information from DICOM structured reports | |
CN109558461B (en) | Medical data classified storage method and device | |
CN107861965A (en) | Intelligent data recognition method and system | |
Schnell et al. | Building a national perinatal data base without the use of unique personal identifiers | |
CN105512270B (en) | Method and device for determining related objects |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
ASS | Succession or assignment of patent right | Owner name: PKU HEALTHCARE IT CO., LTD.; former owners: FOUNDER INTERNATIONAL CO., LTD. and FOUNDER INTERNATIONAL (BEIJING) CO., LTD.; effective date: 2015-02-03 |
C41 | Transfer of patent application or patent right or utility model | ||
COR | Change of bibliographic data | Correct: address; from: 215123 Suzhou, Jiangsu Province; to: 100080 Haidian, Beijing |
TA01 | Transfer of patent application right | Effective date of registration: 2015-02-03; address after: 100080, No. 19, No. 52 West Fourth Ring Road, Haidian District, Beijing; applicant after: Peking University Medical Information Technology Co.,Ltd.; address before: Founder International Building, Creative Industry Park, No. 328 Xinghu Street, Suzhou Industrial Park, Suzhou, Jiangsu Province, 215123; applicants before: FOUNDER INTERNATIONAL Co.,Ltd. and Founder International Co.,Ltd. (Beijing) |
GR01 | Patent grant | ||
PP01 | Preservation of patent right | Effective date of registration: 2024-02-02; granted publication date: 2018-01-23 |