CN104331481A - Method for obtaining relation between business model data and physical model data based on large-scale data collision - Google Patents

Publication number
CN104331481A
Authority
CN
China
Prior art keywords
data
model data
business
business model
physical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410626483.6A
Other languages
Chinese (zh)
Inventor
杨高超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Communication Information System Co Ltd
Original Assignee
Inspur Communication Information System Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Communication Information System Co Ltd filed Critical Inspur Communication Information System Co Ltd
Priority to CN201410626483.6A priority Critical patent/CN104331481A/en
Publication of CN104331481A publication Critical patent/CN104331481A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465Query processing support for facilitating data mining operations in structured databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Fuzzy Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for obtaining relation between business model data and physical model data based on large-scale data collision, belonging to the field of business intelligence. A scene verified by the method is an exterior line transmission system in a communication network. The method comprises the following steps: capturing effective exterior line transmission model data from a foreground user interface to analyze and obtain a corresponding relation between an exterior line transmission user business model and a physical model of an exterior line system database to obtain a source structure of data. Compared with the prior art, the method is capable of obtaining most of the valuable metadata information of a target data source and has a good popularization and application value.

Description

Method for obtaining the relation between business model data and physical model data based on large-scale data collision
 
Technical field
The present invention relates to the field of business intelligence, and specifically to a method for obtaining the relation between business model data and physical model data based on large-scale data collision.
Background art
The rapid development of business intelligence gives enterprises strong support for making sound business decisions. In a user's business intelligence project, about 80% of the work, time, and expense is spent on data integration. However, because the data sources of the individual enterprise information systems are independent of and closed to one another, identifying and mapping data features faces a considerable "obstacle". How to integrate and manage data effectively has therefore become an unavoidable choice for strengthening an enterprise's business competitiveness.
Data integration requires the integrated applications to disclose the necessary data structures, that is, the table structures, the relations between tables, the meaning of codes, and so on. The "obstacles" between enterprises, however, may make it impossible to obtain this information comprehensively and accurately.
Summary of the invention
The technical task of the present invention is to address the practical difficulty described above by providing a method for obtaining the relation between business model data and physical model data based on large-scale data collision. Using a probability model, the method performs large-scale data collision matching to determine the most probable true mapping between business model data and physical model data, and thereby obtains most of the valuable metadata information of the target data source, which solves the data integration problem.
The technical task of the present invention is achieved as follows. In the method for obtaining the relation between business model data and physical model data based on large-scale data collision, the example target is the exterior line transmission system in a communication network, although the method is not limited to that system. The user business model data sets of the exterior line system are captured from the system's user interface and analyzed together with the system's background database. This yields the mapping between the business model data sets and the tables of the background database, that is, the mapping between business model data and physical model data, and hence the business meaning of the data tables in the exterior line database.
Specifically, the method first captures the user business model data of the exterior line system from its foreground user interface; this data is called a business model data set. It can be captured with a web crawler or manually. The captured business model data set is then analyzed against the physical data in the background database of the exterior line transmission system to determine which physical model data in the background database the data shown in the foreground user interface comes from. This gives the correspondence between the business model data set of the foreground user interface and the physical model data tables stored in the background, and finally the metadata information of the exterior line transmission system.
The method comprises the following steps:
A. Capture the business data records of one user interface of the exterior line system, that is, the data the user sees in an interface of the exterior line system application. This yields a data instance of one business model, referred to here as a business model data set;
B. Match the business model data set obtained in step A against the full data of the background physical model of the exterior line system, that is, against the data of all background physical models in the system. This yields the possible matching relations between the business model data set and the corresponding physical model data tables in the exterior line database; store and record these relational models;
C. Analyze all relational models that may exist to locate the metadata information that is valuable to the user, providing a basis for subsequent data integration.
Preferably, step B uses database reverse analysis and web crawler technology to match the business data.
In step B, the correspondence between business model data and physical model data is one of the following:
The records of the business model data set match a single data table in the physical model with the maximum matching rate;
Or:
The records of the business model data set match a single master data table and multiple associated data tables in the physical model with the maximum matching rate;
Or:
Some fields of the business data record set match some fields of the data tables in the physical model through a function mapping.
In the method of the present invention for obtaining the relation between business model data and physical model data based on large-scale data collision, after a business model data set is obtained, it is collided and compared in full with the physical model data stored in the background database. A probability model of possible matches is used to find the most probable correspondence between the business model data set and the data tables of the physical model, and thereby to obtain most of the valuable metadata information of the target data source. This breaks down the "barriers" and "obstacles" that companies place on data sharing during data integration and provides a basis for subsequent data integration. Compared with the prior art, the method has the following outstanding beneficial effects:
1) Analysis through user interface business data: the user's interface business data serves as the medium for reverse analysis of the data source's data structure, which breaks down the "obstacle" that companies place on data sharing;
2) Analysis based on probability statistics: the degree of matching between the user business data obtained from the interface and the physical database tables is computed as a probability; the higher the degree of matching, the more accurate the data source structure obtained;
3) Full data collision analysis: all business data obtained is matched against all data in the user's database to analyze the relations between business model data and physical model data, so that the valuable metadata information of the user's data source can be mined in depth.
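As a rough illustration of the probability-based matching in item 2), the matching degree between one business field and one physical column can be taken as the share of captured business values that also occur in that column. The patent does not give a formula, so this is a minimal sketch under that assumption; the function name `match_rate` and the normalization to trimmed strings are illustrative choices, not part of the source.

```python
def match_rate(business_values, column_values):
    """Share of non-empty business values that also occur in the physical column."""
    # Normalize to trimmed strings so that 12 and "12 " collide (an assumption).
    wanted = [str(v).strip() for v in business_values
              if v is not None and str(v).strip()]
    if not wanted:
        return 0.0
    pool = {str(v).strip() for v in column_values if v is not None}
    hits = sum(1 for v in wanted if v in pool)
    return hits / len(wanted)


# Two of the four captured values occur in the column, so the rate is 0.5.
print(match_rate(["a", "b", "c", "d"], ["a", "b", "x"]))
```

Building a set from the column values keeps each lookup cheap, which matters when every business field is compared against every column of a large database.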
Description of the drawings
Figure 1 is a flow chart of the method of the present invention for obtaining the relation between business model data and physical model data based on large-scale data collision;
Figure 2 is a functional model diagram of the present invention for resolving physical data through large-scale data collision.
Detailed description
The method of the present invention for obtaining the relation between business model data and physical model data based on large-scale data collision is described in detail below with reference to the accompanying drawings and a specific embodiment.
Embodiment:
As shown in Figures 1 and 2, the method of the present invention for obtaining the relation between business model data and physical model data based on large-scale data collision is implemented as follows:
A. The user database analyzed by the method is that of the exterior line transmission system in a communication network. Statistical analysis and data mining are performed on all captured user interface business data sets, and the master table corresponding to each business model data set is listed with its related information, including: sequence number, master table, associated tables, and number of business fields. Taking the CTP model as an example, the CTP business model data set captured from the user interface is shown in Table 1:
Table 1: CTP business model data set
B. The business model data set obtained in step A is collided and matched against the data in the data tables of the background database of the exterior line transmission system. Using the metadata management features of the database in question, the names of all data tables and data fields in the background database are obtained. Based on this physical model metadata, the business model data set is collided in full against all physical model data in the background, and the correspondence between business model data and physical model data is obtained by probability statistics. The matching result is as follows:
Master table: RES_CTP
CTP title->CTP_NAME; USERLABEL; LABEL_CN;
ID of associated region->RELATED_DISTRICT_CUID;
CUID of owning EMS->RELATED_EMS_CUID;
Unique identifier of owning network element->RELATED_NE_CUID;
Unique identifier of owning port->RELATED_PTP_CUID;
CTP layer rate->CTP_LAYER_RATE;
Chinese name->CTP_NAME; USERLABEL; LABEL_CN;
Cascade scope->CONTIGUOUS_RANGE; REMARK; REST_VOL;
Identifier generated by the system at collection from the transmission system->FDN;
CUID of owning card->RELATED_CARD_CUID;
Group version->GT_VERSION; IS_MAP_MODE;
Internal time slot number->INNER_NUM; PROTECT_MODE;
Object type code->OBJECT_TYPE_CODE;
2M reduced rate->CONTIGUOUS_RANGE; REMARK; REST_VOL;
Circuit information remarks->CONTIGUOUS_RANGE; REMARK; REST_VOL;
User-friendly name as collected->CTP_NAME; USERLABEL; LABEL_CN;
CUID->CUID;
Availability flag->ISDELETE; STATEFLAG;
Timestamp->TIME_STAMP;
Creation time->CREAT_TIME;
Last modification time->LAST_MODIFY_TIME;
Unique identifier->INT_ID;
Unique identifier-B->OBJECTID;
Deletion flag->IS_CHANNEL; ISDELETE;
---------------------------------------------------------------------
Associated table: RMS_CITY
Affiliated cities and counties->ZH_LABEL; ZH1;
RMS_CITY.INT_ID->RES_CTP.CITY_ID
Associated table: RES_CARD
Owning board->ZH_LABEL;
Unique identifier of owning network element->RELATED_DEVICE_CUID;
Association information:
RES_CARD.CITY_ID->RES_CTP.CITY_ID
Unmatched fields:
Connection status
Cascade style
---------------------------------------------------------------------
This output shows that the physical model master table corresponding to the business model data set is RES_CTP. The "CTP title" field in the business model may correspond to three fields in the master table, CTP_NAME, USERLABEL, and LABEL_CN, because these three fields hold the same values and all match the CTP titles in the business model data set. "Affiliated cities and counties" in the business model data set corresponds to ZH_LABEL in the physical model table RMS_CITY, and the field CITY_ID in the master table RES_CTP is associated with the field INT_ID in the associated table RMS_CITY. No correspondence was found for the "Connection status" and "Cascade style" fields of the business model data set.
Different business model data sets are captured from the different interfaces of the exterior line transmission system, manually or with a web crawler, and the operations above are repeated for each of them. This gives the correspondence between the business model data sets on all visible interfaces of the system and the physical model stored in its background database.
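The metadata lookup and full collision of step B can be sketched as follows. The patent does not name a database product, so this sketch assumes SQLite purely for illustration and reads table and column names from its catalog (`sqlite_master` and `PRAGMA table_info`). The functions `list_columns` and `collide`, the sample rows, and the reduced RES_CTP schema are hypothetical.

```python
import sqlite3


def list_columns(conn):
    """Return (table, column) pairs for every table, read from SQLite's catalog."""
    pairs = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        for row in conn.execute(f'PRAGMA table_info("{table}")').fetchall():
            pairs.append((table, row[1]))  # row[1] holds the column name
    return pairs


def collide(conn, business_rows):
    """Score every physical column against every business field.

    Returns {field: {(table, column): rate}}, where rate is the share of the
    captured field values that occur in that column.
    """
    fields = list(business_rows[0]) if business_rows else []
    captured = {
        f: [str(r[f]) for r in business_rows if r.get(f) is not None]
        for f in fields
    }
    scores = {f: {} for f in fields}
    for table, column in list_columns(conn):
        cursor = conn.execute(f'SELECT DISTINCT "{column}" FROM "{table}"')
        pool = {str(v) for (v,) in cursor if v is not None}
        for f in fields:
            values = captured[f]
            hits = sum(1 for v in values if v in pool)
            scores[f][(table, column)] = hits / len(values) if values else 0.0
    return scores


# Hypothetical miniature of the master table and of a captured data set.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE RES_CTP (CTP_NAME TEXT, USERLABEL TEXT, CTP_LAYER_RATE TEXT)"
)
conn.executemany(
    "INSERT INTO RES_CTP VALUES (?, ?, ?)",
    [("ctp-1", "ctp-1", "VC12"), ("ctp-2", "ctp-2", "VC4")],
)
business_rows = [
    {"CTP title": "ctp-1", "CTP layer rate": "VC12"},
    {"CTP title": "ctp-2", "CTP layer rate": "VC4"},
]
for field, candidates in collide(conn, business_rows).items():
    print(field, [col for col, rate in candidates.items() if rate == 1.0])
```

In this miniature, "CTP title" collides fully with both CTP_NAME and USERLABEL, which mirrors the situation in the listing where several columns holding the same values are all reported as candidates.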
Based on the implementation above, the analysis process of the present invention is as follows:
1) Capture business model data sets from the user interface, manually or with a web crawler, and group and count them by business, obtaining the per-business statistics shown in Table 2;
Table 2: business statistics table
2) Analyze and organize the results of step 1) to form the matching statistics shown in Table 3. The matching rate is computed by probability statistics, and the same business data may correspond to different data source tables with different probabilities. Choosing the maximum matching probability to determine the data source table structure is therefore the key to the analysis;
Table 3: matching statistics table
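Choosing the data source table by maximum matching probability, as in step 2), might look like the sketch below. The patent does not state how field-level results are combined into a table-level matching rate, so the sketch assumes that a field counts as matched by a table when any of that table's columns reaches a threshold, and that the table-level rate is the share of matched fields. The function name `best_table`, the 0.8 threshold, and the sample scores are assumptions.

```python
def best_table(field_scores, threshold=0.8):
    """Pick the table that matches the most business fields.

    field_scores maps each business field to {(table, column): rate}.
    Returns (table, share of business fields matched by that table).
    """
    fields = list(field_scores)
    matched_per_table = {}
    for candidates in field_scores.values():
        # A table matches this field if any of its columns reaches the threshold.
        tables = {t for (t, _col), rate in candidates.items() if rate >= threshold}
        for t in tables:
            matched_per_table[t] = matched_per_table.get(t, 0) + 1
    if not matched_per_table:
        return None, 0.0
    # Sorting first makes ties resolve alphabetically and deterministically.
    best = max(sorted(matched_per_table), key=matched_per_table.get)
    return best, matched_per_table[best] / len(fields)


scores = {
    "field_1": {("T1", "x"): 1.0, ("T2", "y"): 0.9},
    "field_2": {("T1", "z"): 0.95, ("T2", "w"): 0.1},
    "field_3": {("T1", "q"): 0.0},
}
table, rate = best_table(scores)
print(table, round(rate, 2))
```

Here T1 matches two of the three business fields and T2 only one, so T1 is selected as the master table candidate.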
3) The data analyzed here is the exterior line database, which has 653 tables, 12442 fields, and 147810967 records in total. For this analysis, 61 valid exterior line transmission model data sets were captured from the user interface. The correspondence with the physical model data of the background database was obtained for 55 of the business model data sets, and the matching results were summarized statistically, including: total master tables, total business fields, total determined fields, total optional fields, total unmatched fields, and overall match percentage. The analysis result is shown in Table 4:
Table 4: result statistics table
4) As Table 4 shows, with the present invention, data collision detection can obtain more than 80% of the valuable metadata information in the target data source, which provides an effective solution for data integration and sharing;
5) Large-scale data collision analysis and matching probability statistics identify the following three kinds of correspondence between business model data and physical model data, illustrated with examples:
A. A business model data set matches a single data table in the physical model with the maximum matching rate: an acquired business data table can be matched one to one against all the fields of a table in the exterior line database. Examples are shown in Tables 5 and 6:
Table 5: a business data table
Table 6: physical data table
B. A business model data set matches multiple associated tables in the physical model with the maximum matching rate: an acquired business data table matches a table in the exterior line database, together with its associated tables, on the largest number of fields. Examples are shown in Tables 7, 8, and 9:
Table 7: a business data table
Table 8: physical data table
Table 9: association physical data table
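For case B, the master table has to be linked to its associated tables, as in the association RMS_CITY.INT_ID->RES_CTP.CITY_ID reported in the matching result. The patent does not say how such links are found; one plausible approach, sketched here as an assumption, is to look for a master-table column whose values are all contained in a column of the associated table. The function name `find_links` and the sample values are hypothetical.

```python
def find_links(main_cols, assoc_cols, min_rate=1.0):
    """Find candidate join columns between a master and an associated table.

    Both arguments map column names to lists of values. A pair is reported
    when at least min_rate of the distinct master values occur in the
    associated column.
    """
    links = []
    for m_name, m_values in main_cols.items():
        m_set = {v for v in m_values if v is not None}
        if not m_set:
            continue
        for a_name, a_values in assoc_cols.items():
            a_set = {v for v in a_values if v is not None}
            rate = len(m_set & a_set) / len(m_set)
            if rate >= min_rate:
                links.append((m_name, a_name, rate))
    return links


# Hypothetical values; only the column names come from the matching result.
res_ctp = {"CITY_ID": [1, 2, 2], "CTP_NAME": ["ctp-1", "ctp-2", "ctp-3"]}
rms_city = {"INT_ID": [1, 2, 3], "ZH_LABEL": ["city-A", "city-B", "city-C"]}
print(find_links(res_ctp, rms_city))
```

Value containment alone can produce false links between unrelated numeric columns, so in practice a candidate would need to be confirmed, for example by checking that the joined rows reproduce the captured business records.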
C. Some fields of the business model data set match some fields of the physical model data tables through a function mapping: in theory, colliding the business data against the full physical data should give a 100% match, that is, a match on all of the business fields counted in Table 4. In practice, however, only 80% or more of the fields can be matched, and statistical analysis shows that a further kind of correspondence exists between business model data and physical model data: the function mapping relation. In this case data collision matching alone is not enough, because a field X of the business data may be stored in the physical table as Y through some mapping (a simple Boolean correspondence or a more complex function). The data collision matching process then needs to introduce the relevant function and match after the mapping conversion. Because function mappings can be complex, the present invention handles only simple mappings. Examples follow:
a) Alias correspondence: for ease of storage, short, easy-to-remember aliases (Boolean values, numbers, words, letters, and so on) are defined by convention to stand for the real business names, as shown in Tables 10 and 11:
Table 10: business datum table
Table 11: physical data table
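The alias correspondence can be folded into collision matching by translating the stored codes before comparing. The sketch below assumes the alias dictionary is already known or guessed; the function name `match_rate_with_alias` and the example codes and labels are hypothetical and are not the contents of Tables 10 and 11.

```python
def match_rate_with_alias(business_values, column_values, alias):
    """Match rate after translating stored codes through an alias dictionary.

    Codes missing from the dictionary are compared unchanged.
    """
    if not business_values:
        return 0.0
    translated = {alias.get(v, v) for v in column_values}
    hits = sum(1 for v in business_values if v in translated)
    return hits / len(business_values)


shown = ["in use", "idle", "in use"]   # values seen in the user interface
stored = [1, 0, 1]                     # codes held in the physical column
print(match_rate_with_alias(shown, stored, {}))                       # 0.0
print(match_rate_with_alias(shown, stored, {1: "in use", 0: "idle"}))  # 1.0
```

Without the dictionary the two columns never collide; with it they match completely, which is the effect the function mapping case describes.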
b) Operation correspondence: the business data is obtained by an operation on certain fields of the physical data, as shown in Tables 12 and 13:
Table 12: business datum table
Table 13: physical data table
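For the operation correspondence, one simple approach is to try a small set of candidate functions over the physical rows and keep the one whose results collide best with the business values. This is a sketch under that assumption; the function name `best_function`, the candidate functions, and the TOTAL and USED columns are hypothetical and are not taken from Tables 12 and 13.

```python
def best_function(business_values, physical_rows, candidates):
    """Return (name, rate) of the candidate function that best explains the data.

    candidates maps a name to a function taking one physical row (a dict).
    rate is the share of distinct business values the function reproduces.
    """
    wanted = set(business_values)
    best = (None, 0.0)
    for name, fn in candidates.items():
        produced = {fn(row) for row in physical_rows}
        rate = len(wanted & produced) / len(wanted) if wanted else 0.0
        if rate > best[1]:
            best = (name, rate)
    return best


rows = [{"TOTAL": 10, "USED": 4}, {"TOTAL": 8, "USED": 8}]
shown = [6, 0]  # values seen in the user interface
candidates = {
    "sum": lambda r: r["TOTAL"] + r["USED"],
    "difference": lambda r: r["TOTAL"] - r["USED"],
}
print(best_function(shown, rows, candidates))
```

The candidate list has to stay short, because every added function multiplies the number of collisions to compute; this is consistent with the patent's choice to handle only simple mappings.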
6) Through the statistical analysis above, the mapping between the physical model of the exterior line transmission system and the business models visible in its user interface can be determined. The structure of the physical model can thus be analyzed automatically, providing a basis for subsequent enterprise data integration.
7) The scheme also has problems it cannot solve. If data in the physical model is not shown in the user interface, that is, it is never used there, there is no way to obtain the mapping between that part of the physical model and the business model. Since this data is invisible to the user, however, it may well be dispensable to the business.
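The limitation in item 7) can at least be made visible by listing the physical columns that no business field was mapped to. A minimal sketch, assuming the mapping is held as a dictionary from business field to matched (table, column) pairs; the function name `unmapped_columns` and the sample mapping are hypothetical.

```python
def unmapped_columns(all_columns, mapping):
    """List physical (table, column) pairs that no business field maps to."""
    mapped = {col for cols in mapping.values() for col in cols}
    return sorted(set(all_columns) - mapped)


all_columns = [("RES_CTP", "CTP_NAME"), ("RES_CTP", "REMARK")]
mapping = {"CTP title": [("RES_CTP", "CTP_NAME")]}
print(unmapped_columns(all_columns, mapping))
```

Columns in this list are either unused by the application or never shown in an interface that was captured, so the list also hints at which interfaces may still be worth capturing.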

Claims (3)

1. A method for obtaining the relation between business model data and physical model data based on large-scale data collision, characterized in that:
The example data scope of the method is the exterior line transmission system of a communication network. The user business model data of the exterior line system is first captured from its foreground user interface and is called a business model data set; it can be captured with a web crawler or manually. The captured business model data set is analyzed against the physical data in the background database of the exterior line transmission system to determine which physical model data in the background database the business model data set in the foreground user interface comes from, thereby obtaining the correspondence between the business model data set of the foreground user interface and the physical model data tables stored in the background, and finally the metadata information of the exterior line transmission system.
The method comprises the following steps:
A. Capture the business data records of one user interface of the exterior line system to obtain a business model data set;
B. Match the business model data set obtained in step A against the full data of the background physical model of the exterior line system to obtain the possible matching relations between the business model data set and the corresponding physical model data tables in the exterior line database, and store and record these relational models;
C. Analyze all relational models that may exist to locate the metadata information that is valuable to the user, providing a basis for subsequent data integration.
2. The method for obtaining the relation between business model data and physical model data based on large-scale data collision according to claim 1, characterized in that step B uses database reverse analysis and web crawler technology to match the business data.
3. The method for obtaining the relation between business model data and physical model data based on large-scale data collision according to claim 1, characterized in that the correspondence between business model data and physical model data in step B is:
The records of the business model data set match a single data table in the physical model with the maximum matching rate;
Or:
The records of the business model data set match a single master data table and multiple associated data tables in the physical model with the maximum matching rate;
Or:
Some fields of the business data record set match some fields of the data tables in the physical model through a function mapping.
CN201410626483.6A 2014-11-10 2014-11-10 Method for obtaining relation between business model data and physical model data based on large-scale data collision Pending CN104331481A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410626483.6A CN104331481A (en) 2014-11-10 2014-11-10 Method for obtaining relation between business model data and physical model data based on large-scale data collision

Publications (1)

Publication Number Publication Date
CN104331481A true CN104331481A (en) 2015-02-04

Family

ID=52406208

Country Status (1)

Country Link
CN (1) CN104331481A (en)


* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103729460A (en) * 2014-01-10 2014-04-16 中国南方电网有限责任公司 Graphical data model managing method and system based on metadata
CN103853843A (en) * 2014-03-20 2014-06-11 浪潮集团山东通用软件有限公司 Method for realizing data concentration across security domains based on main data mapping
CN104049690A (en) * 2014-06-10 2014-09-17 浪潮电子信息产业股份有限公司 Model design method by using critical application host to cope with high concurrent business

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang Zhongping et al.: "Design of a metadata-driven ETL architecture" (基于元数据驱动的ETL架构设计), Computer Applications and Software (《计算机应用与软件》) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107633028A (en) * 2017-09-01 2018-01-26 广州慧睿思通信息科技有限公司 A kind of method and system of dynamic data collision association
CN107633028B (en) * 2017-09-01 2020-10-30 广州慧睿思通信息科技有限公司 Dynamic data collision association method and system
CN108763565A (en) * 2018-06-04 2018-11-06 广东京信软件科技有限公司 A kind of matched construction method of data auto-associating based on deep learning

Similar Documents

Publication Publication Date Title
CN105069703B (en) A kind of electrical network mass data management method
CN102404126B (en) Charging method of cloud computing during application process
CN106709012A (en) Method and device for analyzing big data
CN111459766A (en) Calling chain tracking and analyzing method for micro-service system
CN106778876A (en) User classification method and system based on mobile subscriber track similitude
CN102111296A (en) Mining method for communication alarm association rule based on maximal frequent item set
CN104462222A (en) Distributed storage method and system for checkpoint vehicle pass data
CN103605651A (en) Data processing showing method based on on-line analytical processing (OLAP) multi-dimensional analysis
CN107818024A (en) A kind of request ID transmission methods and system based on spring blockers
CN106250424A (en) The searching method of a kind of daily record context, Apparatus and system
CN103488683B (en) Microblog data management system and implementation method thereof
CN111046000B (en) Government data exchange sharing oriented security supervision metadata organization method
CN103049496A (en) Method, apparatus and device for dividing multiple users into user groups
CN105071966B (en) Server is extracted in a kind of log information management method and daily record
CN111400393B (en) Data processing method and device based on multi-application platform and storage medium
KR101982756B1 (en) System and Method for processing complex stream data using distributed in-memory
CN112100402A (en) Power grid knowledge graph construction method and device
CN115278737A (en) Data acquisition method of 5G network
CN102110139B (en) Analytic algorithm for geographic grid in telecommunication field
CN104331481A (en) Method for obtaining relation between business model data and physical model data based on large-scale data collision
CN109614521A (en) A kind of efficient secret protection subgraph inquiry processing method
CN109344190A (en) A kind of police service data processing method and device
CN105589900A (en) Data mining method based on multi-dimensional analysis
CN105512270A (en) Method and device for determining related objects
CN106161403A (en) Application program restored method, device and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150204