CN110807026A - Automatic capture system for analyzing financial big data blood relationship - Google Patents


Info

Publication number
CN110807026A
CN110807026A CN201911015208.XA CN201911015208A CN110807026A CN 110807026 A CN110807026 A CN 110807026A CN 201911015208 A CN201911015208 A CN 201911015208A CN 110807026 A CN110807026 A CN 110807026A
Authority
CN
China
Prior art keywords
node
data
information
big data
analyzing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911015208.XA
Other languages
Chinese (zh)
Inventor
万洋
吴非
何坚
薛小朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongke Jiexin Information Technology Co Ltd
Original Assignee
Beijing Zhongke Jiexin Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongke Jiexin Information Technology Co Ltd
Priority to CN201911015208.XA
Publication of CN110807026A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/221 Column-oriented storage; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes

Abstract

The invention provides an automatic capture system for analyzing the blood relationship (data lineage) of financial big data, comprising: a big data platform node collection unit, used for collecting information nodes; a node analysis unit, used for parsing the collected information nodes to obtain a main node, data outflow nodes and data inflow nodes; a node cleaning unit, used for parsing non-multi-source heterogeneous data; a big data platform node storage unit, covering information node storage, process relation storage and the building of a relation information index; and information node relation visualization, namely a data blood relationship map. The system can establish blood relationships while the data models of big data platform components such as HIVE, FALCON and SQOOP are created, changed and converted, can capture them quickly within massive metadata models, and adds efficient, automatic blood relationship construction for massive metadata.

Description

Automatic capture system for analyzing financial big data blood relationship
Technical Field
The invention relates to the technical field of financial big data, and in particular to an automatic capture system for analyzing the blood relationship of financial big data.
Background
In the big data era, data grows explosively, and massive data of many types is generated rapidly. This huge and complex body of data produces new data through linking, fusion, transformation and circulation, and the new data in turn flows into the ocean of data.
From generation, through processing, fusion and circulation, to final extinction, data naturally form relationships with one another. We borrow the analogous relationship from human society to describe this, and call it the blood relationship of data. Unlike blood relationships in human society, the blood relationship of data also has some characteristic features:
1. Attribution. Specific data generally belongs to a specific organization or individual, so data has ownership.
2. Multiple sources. The same data may have multiple sources (multiple parents). One piece of data may be produced by processing several pieces of data, and there may be several such processing steps.
3. Traceability. The blood relationship of data reflects the life cycle of the data and shows the whole process from its generation to its extinction, so it is traceable.
4. Hierarchy. The blood relationship of data is hierarchical. Descriptive information about data, such as its classification, induction and summarization, forms new data, and descriptive information at different levels of detail forms the hierarchy of the data.
However, in existing practice only a superficial blood relationship is constructed, and only inside the data warehouse, so the range of application is narrow.
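The multi-source, attribution and traceability properties listed above can be pictured as a directed graph in which one data item may have several parents. The following is a minimal Python sketch under that assumption; the class `LineageGraph`, its method names and the table names are hypothetical and do not come from the patent.

```python
from collections import defaultdict


class LineageGraph:
    """Directed graph where a parent -> child edge means 'child is derived from parent'."""

    def __init__(self):
        self.parents = defaultdict(set)  # child name -> set of direct parent names
        self.owner = {}                  # data name -> owning organization (attribution)

    def add_data(self, name, owner):
        self.owner[name] = owner

    def add_derivation(self, child, *sources):
        # Multiple sources: one data item may be derived from several parents.
        self.parents[child].update(sources)

    def trace_upstream(self, name):
        """Return every ancestor of `name` (traceability) using depth-first search."""
        seen = set()
        stack = [name]
        while stack:
            current = stack.pop()
            for parent in self.parents.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


graph = LineageGraph()
graph.add_data("loan_summary", owner="risk_department")
graph.add_derivation("loan_detail", "raw_loans", "raw_customers")
graph.add_derivation("loan_summary", "loan_detail")
ancestors = graph.trace_upstream("loan_summary")
print(sorted(ancestors))
```

Tracing upstream from `loan_summary` walks through `loan_detail` to both raw tables, which is the multi-level, multi-parent behavior the list describes.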
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an automatic capture system for analyzing the blood relationship of financial big data.
The invention is realized by the following technical scheme:
The invention provides an automatic capture system for analyzing the blood relationship of financial big data, comprising:
a big data platform node collection unit, used for collecting information nodes;
a big data platform node storage unit, covering information node storage, process relation storage and the building of a relation information index;
a node analysis unit, used for parsing the collected information nodes to obtain a main node, data outflow nodes and data inflow nodes;
a node cleaning unit, used for parsing non-multi-source heterogeneous data;
and information node relation visualization, namely a data blood relationship map.
Preferably, in the collection node aggregation stage, the parsing of multi-source heterogeneous data is completed through a data synchronization tool, which includes the open source framework DataX.
Preferably, the blood relationships of the collected data are persisted into a NoSQL distributed database.
Preferably, the blood relationship map is visualized with D3, whose built-in interfaces allow the map to be rendered quickly and efficiently.
Preferably, if the collection nodes contain parameters that need to be replaced, the node analysis unit replaces the parameters to obtain a data inflow node set and a data outflow node set;
the node analysis unit then analyzes the obtained data inflow node set and data outflow node set to obtain the data blood relationship.
According to the embodiment of the invention, the data blood relationship graph is constructed by parsing task statements, so that multi-level blood relationships and multi-source heterogeneous blood relationships can both be analyzed, and blood relationships can be established and traced on big data base platforms such as HIVE, HBASE, SQOOP and FALCON.
Drawings
FIG. 1 is a schematic flow chart of an automatic capture system for analyzing the blood relationship of financial big data according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to FIG. 1, the automatic capture system for analyzing the blood relationship of financial big data of the present application comprises:
Big data platform node collection: used for collecting information nodes. Pre-embedded hooks monitor and collect, in real time, the relevant operations performed on the models of the big data platform components.
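The hook-based collection described above can be illustrated with a generic callback registry. This is only a sketch of the pattern, not the hook interface of any specific platform component: `HookRegistry`, `run_operation` and the example statements are hypothetical names introduced here.

```python
import time


class HookRegistry:
    """Minimal hook mechanism: registered callbacks run for every platform operation."""

    def __init__(self):
        self._hooks = []

    def register(self, callback):
        self._hooks.append(callback)

    def run_operation(self, component, statement):
        # A real platform would execute the statement here; this sketch only
        # notifies the registered hooks that the operation took place.
        event = {"component": component, "statement": statement, "timestamp": time.time()}
        for callback in self._hooks:
            callback(event)
        return event


collected = []  # stands in for the collection unit's buffer of information nodes

registry = HookRegistry()
registry.register(collected.append)
# Illustrative operation strings, not verified command syntax.
registry.run_operation("HIVE", "CREATE TABLE dw.accounts AS SELECT * FROM ods.accounts")
registry.run_operation("SQOOP", "transfer ods.accounts from the source database")
print([event["component"] for event in collected])
```

Because the hook is called at operation time, the collector sees each model change as it happens instead of scanning metadata afterwards.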
Big data platform node storage: covers information node storage, process relation storage and the building of a relation information index. In this scheme, metadata relations and entity information are stored in HBASE + ELASTICSEARCH in a schemaless mode, so there is no need to be concerned with the metadata entities and entity rule model structures of the different platforms. The unified storage model supports, to the greatest extent, metadata with different attribute information from different business fields; it supports massive data access and guarantees flexible horizontal scaling; and it provides full-text retrieval, so that metadata entities can be located quickly and blood relationships traced efficiently. The blood relationships of the collected data are persisted into a NoSQL distributed database.
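The three storage responsibilities (entity storage, process relation storage and a relation information index) can be sketched with plain Python containers. The class below is an in-memory stand-in written for illustration; it does not use or imitate the HBASE or ELASTICSEARCH client APIs, and all names in it are hypothetical.

```python
from collections import defaultdict


class MetadataStore:
    """In-memory stand-in for the entity store, relation store and search index."""

    def __init__(self):
        self.entities = {}             # entity id -> free-form attribute dict
        self.relations = []            # (source id, process name, target id)
        self.index = defaultdict(set)  # lowercase token -> entity ids

    def put_entity(self, entity_id, **attributes):
        # Schemaless: any attributes are accepted, there is no fixed column set.
        self.entities[entity_id] = attributes
        for value in [entity_id, *attributes.values()]:
            text = str(value).lower().replace(".", " ").replace("_", " ")
            for token in text.split():
                self.index[token].add(entity_id)

    def put_relation(self, source_id, process, target_id):
        self.relations.append((source_id, process, target_id))

    def search(self, keyword):
        """Return the ids of entities whose id or attributes contain the keyword."""
        return sorted(self.index.get(keyword.lower(), set()))


store = MetadataStore()
store.put_entity("hive.dw.accounts", platform="HIVE", owner="finance")
store.put_entity("hbase.risk_scores", platform="HBASE", column_family="cf1", ttl_days=30)
store.put_relation("hive.dw.accounts", "score_job", "hbase.risk_scores")
hits = store.search("accounts")
print(hits)
```

The two entities carry different attribute sets, yet both fit the same store, which is the point of the schemaless model described above.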
Node analysis unit: used for parsing the collected information nodes to obtain a main node, data outflow nodes and data inflow nodes. In the collection node aggregation stage, the parsing of multi-source heterogeneous data is completed through a data synchronization tool, which includes the open source framework DataX. If the collection nodes contain parameters that need to be replaced, the node analysis unit replaces the parameters to obtain a data inflow node set and a data outflow node set;
the node analysis unit then analyzes the obtained data inflow node set and data outflow node set to obtain the data blood relationship.
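Parameter replacement followed by extraction of the inflow and outflow sets can be sketched as below. The `${name}` placeholder syntax, the regular expressions and the example statement are assumptions made for illustration; the patent does not specify a placeholder format or a parsing method, and the main node is not modeled here.

```python
import re


def replace_parameters(statement, parameters):
    """Substitute ${name} placeholders; unknown names are left untouched."""
    return re.sub(
        r"\$\{(\w+)\}",
        lambda match: str(parameters.get(match.group(1), match.group(0))),
        statement,
    )


def parse_nodes(statement):
    """Return (inflow set, outflow set) for a simple INSERT ... SELECT statement."""
    table = r"([\w.]+)"
    outflow = set(re.findall(
        r"\bINSERT\s+(?:INTO|OVERWRITE)\s+(?:TABLE\s+)?" + table, statement, re.I))
    inflow = set(re.findall(r"\b(?:FROM|JOIN)\s+" + table, statement, re.I))
    return inflow - outflow, outflow


task = (
    "INSERT OVERWRITE TABLE dw.daily_balance_${day} "
    "SELECT a.id, b.balance FROM ods.accounts a "
    "JOIN ods.balances_${day} b ON a.id = b.id"
)
resolved = replace_parameters(task, {"day": "20191024"})
inflow, outflow = parse_nodes(resolved)
print(sorted(inflow), sorted(outflow))
```

A regular expression only covers simple statements; subqueries, common table expressions and aliases would need a real SQL parser.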
Node cleaning unit: used for parsing non-multi-source heterogeneous data. A cleaning rule node expresses the screening standard applied during data circulation. Large amounts of data are distributed in different places, and each place has different requirements on data quality. A data receiver can filter the incoming data according to its own requirements; these requirements form data standards, and data cleaning is carried out according to them. Cleaning rules can take many forms, for example that a value must not be null, or that it must conform to a certain format. On the visual graph, cleaning rules are represented by a circle marked with a capital letter "E", so that the various rules are expressed in a simplified way and the graph stays simple and clear. Viewing the rule contents is also simple: when the mouse is moved over the circle marked "E", the list of standards is displayed automatically.
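The two example rules named above (a value must not be null, and a value must conform to a format) can be written as small predicate functions. This is a minimal sketch; the helper names, field names and date pattern are hypothetical.

```python
import re


def not_null(field):
    """Rule: the field must be present and not None."""
    return lambda record: record.get(field) is not None


def matches_format(field, pattern):
    """Rule: the field must be a string that fully matches the pattern."""
    compiled = re.compile(pattern)
    return lambda record: (
        isinstance(record.get(field), str)
        and compiled.fullmatch(record[field]) is not None
    )


def clean(records, rules):
    """Keep only the records that satisfy every rule."""
    return [record for record in records if all(rule(record) for rule in rules)]


rules = [not_null("account_id"), matches_format("trade_date", r"\d{4}-\d{2}-\d{2}")]
records = [
    {"account_id": "A1", "trade_date": "2019-10-24"},
    {"account_id": None, "trade_date": "2019-10-24"},
    {"account_id": "A3", "trade_date": "24/10/2019"},
]
accepted = clean(records, rules)
print(accepted)
```

Keeping each rule as a separate function means the rule list attached to a cleaning node can also be shown to the user as a list of standards.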
Information node relation visualization, namely a data blood relationship map. The blood relationship map is visualized with D3, whose built-in interfaces allow the map to be rendered quickly and efficiently. Only through visualization can the blood relationships be shown clearly to the user.
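On the server side, the blood relationship edges have to be handed to the front end in some serialized form. The sketch below produces a nodes-and-links JSON document, a shape commonly used as input for D3 graph layouts; the patent does not state which data format is used, so the structure and the function name `to_node_link` are assumptions.

```python
import json


def to_node_link(edges):
    """Convert (source, target) pairs into a {"nodes": [...], "links": [...]} dict."""
    names = []
    for source, target in edges:
        for name in (source, target):
            if name not in names:  # keep first-seen order, no duplicates
                names.append(name)
    return {
        "nodes": [{"id": name} for name in names],
        "links": [{"source": source, "target": target} for source, target in edges],
    }


edges = [("ods.accounts", "dw.daily_balance"), ("ods.balances", "dw.daily_balance")]
payload = json.dumps(to_node_link(edges), indent=2)
print(payload)
```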
The present invention is not limited to the above preferred embodiments, and any modifications, equivalent substitutions and improvements made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (5)

1. An automatic capture system for analyzing the blood relationship of financial big data, comprising:
a big data platform node collection unit, used for collecting information nodes;
a big data platform node storage unit, covering information node storage, process relation storage and the building of a relation information index;
a node analysis unit, used for parsing the collected information nodes to obtain a main node, data outflow nodes and data inflow nodes;
a node cleaning unit, used for parsing non-multi-source heterogeneous data;
and information node relation visualization, namely a data blood relationship map.
2. The automatic capture system for analyzing the blood relationship of financial big data according to claim 1, characterized in that, in the collection node aggregation stage, the parsing of multi-source heterogeneous data is completed through a data synchronization tool, which comprises the open source framework DataX.
3. The automatic capture system for analyzing the blood relationship of financial big data according to claim 2, characterized in that the blood relationships of the collected data are persisted into a NoSQL distributed database.
4. The automatic capture system for analyzing the blood relationship of financial big data according to claim 1, characterized in that the blood relationship map is visualized with D3, whose built-in interfaces allow the map to be rendered quickly and efficiently.
5. The automatic capture system for analyzing the blood relationship of financial big data according to claim 1, characterized in that, if the collection nodes contain parameters that need to be replaced, the node analysis unit replaces the parameters to obtain a data inflow node set and a data outflow node set;
the node analysis unit then analyzes the obtained data inflow node set and data outflow node set to obtain the data blood relationship.
CN201911015208.XA 2019-10-24 2019-10-24 Automatic capture system for analyzing financial big data blood relationship Pending CN110807026A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911015208.XA CN110807026A (en) 2019-10-24 2019-10-24 Automatic capture system for analyzing financial big data blood relationship

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911015208.XA CN110807026A (en) 2019-10-24 2019-10-24 Automatic capture system for analyzing financial big data blood relationship

Publications (1)

Publication Number Publication Date
CN110807026A true CN110807026A (en) 2020-02-18

Family

ID=69489034

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911015208.XA Pending CN110807026A (en) 2019-10-24 2019-10-24 Automatic capture system for analyzing financial big data blood relationship

Country Status (1)

Country Link
CN (1) CN110807026A (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111723253A (en) * 2020-05-25 2020-09-29 贵州华泰智远大数据服务有限公司 Data blood relationship query method and query system based on graph database
CN112100201A (en) * 2020-09-30 2020-12-18 东莞市盟大塑化科技有限公司 Data monitoring method, device, equipment and storage medium based on big data technology
CN112732987A (en) * 2020-12-31 2021-04-30 北京百分点科技集团股份有限公司 Full life cycle data map generation system and method
CN113282678A (en) * 2021-03-30 2021-08-20 杭州数梦工场科技有限公司 Data blood relationship display method and device
CN113722310A (en) * 2021-09-16 2021-11-30 北京航空航天大学 Blood relationship information visual representation method
CN113868253A (en) * 2021-09-28 2021-12-31 中通服创立信息科技有限责任公司 Data relationship capturing and big data relationship tree construction method
CN115203179A (en) * 2022-05-16 2022-10-18 北京航空航天大学 Data cleaning method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050138160A1 (en) * 2003-08-28 2005-06-23 Accenture Global Services Gmbh Capture, aggregation and/or visualization of structural data of architectures
CN108228747A (en) * 2017-12-20 2018-06-29 江苏数加数据科技有限责任公司 Data genetic connection visualized graphs system in data improvement
CN109684402A (en) * 2018-12-21 2019-04-26 福建南威软件有限公司 One kind being based on big data platform metadata genetic connection implementation method
CN109710703A (en) * 2019-01-03 2019-05-03 北京顺丰同城科技有限公司 A kind of generation method and device of genetic connection network
CN110019315A (en) * 2018-06-19 2019-07-16 杭州数澜科技有限公司 A kind of method and apparatus for the parsing of data blood relationship
CN110019384A (en) * 2017-08-15 2019-07-16 阿里巴巴集团控股有限公司 A kind of acquisition methods of blood relationship data provide the method and device of blood relationship data

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050138160A1 (en) * 2003-08-28 2005-06-23 Accenture Global Services Gmbh Capture, aggregation and/or visualization of structural data of architectures
CN110019384A (en) * 2017-08-15 2019-07-16 阿里巴巴集团控股有限公司 A kind of acquisition methods of blood relationship data provide the method and device of blood relationship data
CN108228747A (en) * 2017-12-20 2018-06-29 江苏数加数据科技有限责任公司 Data genetic connection visualized graphs system in data improvement
CN110019315A (en) * 2018-06-19 2019-07-16 杭州数澜科技有限公司 A kind of method and apparatus for the parsing of data blood relationship
CN109684402A (en) * 2018-12-21 2019-04-26 福建南威软件有限公司 One kind being based on big data platform metadata genetic connection implementation method
CN109710703A (en) * 2019-01-03 2019-05-03 北京顺丰同城科技有限公司 A kind of generation method and device of genetic connection network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张岩等: "商业银行信息系统中"血缘分析"技术的应用研究", 《信息技术与信息化》 *
金泳: "基于数据仓库的数据血缘管理研究", 《轻工科技》 *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111723253A (en) * 2020-05-25 2020-09-29 贵州华泰智远大数据服务有限公司 Data blood relationship query method and query system based on graph database
CN112100201A (en) * 2020-09-30 2020-12-18 东莞市盟大塑化科技有限公司 Data monitoring method, device, equipment and storage medium based on big data technology
CN112732987A (en) * 2020-12-31 2021-04-30 北京百分点科技集团股份有限公司 Full life cycle data map generation system and method
CN112732987B (en) * 2020-12-31 2022-12-06 北京百分点科技集团股份有限公司 Full life cycle data map generation system and method
CN113282678A (en) * 2021-03-30 2021-08-20 杭州数梦工场科技有限公司 Data blood relationship display method and device
CN113722310A (en) * 2021-09-16 2021-11-30 北京航空航天大学 Blood relationship information visual representation method
CN113868253A (en) * 2021-09-28 2021-12-31 中通服创立信息科技有限责任公司 Data relationship capturing and big data relationship tree construction method
CN113868253B (en) * 2021-09-28 2024-04-23 中通服创立信息科技有限责任公司 Data relationship capturing and big data relationship tree construction method
CN115203179A (en) * 2022-05-16 2022-10-18 北京航空航天大学 Data cleaning method and device

Similar Documents

Publication Publication Date Title
CN110807026A (en) Automatic capture system for analyzing financial big data blood relationship
CN107886238B (en) Business process management system and method based on mass data analysis
CN107943463B (en) Interactive mode automation big data analysis application development system
Ragan et al. Characterizing provenance in visualization and data analysis: an organizational framework of provenance types and purposes
CN107315776B (en) Data management system based on cloud computing
US9960974B2 (en) Dependency mapping among a system of servers, analytics and visualization thereof
Maiti et al. Capturing, eliciting, predicting and prioritizing (CEPP) non-functional requirements metadata during the early stages of agile software development
CN111125068A (en) Metadata management method and system
US8799859B2 (en) Augmented design structure matrix visualizations for software system analysis
CN106649718B (en) A kind of big data acquisition and processing method for PDM system
CN112527791A (en) Intelligent urban brain big data system
CN112181960A (en) Intelligent operation and maintenance framework system based on AIOps
CN117056867B (en) Multi-source heterogeneous data fusion method and system for digital twin
CN109684402A (en) One kind being based on big data platform metadata genetic connection implementation method
CN116662441A (en) Distributed data blood margin construction and display method
CN103842973A (en) Monitoring stored procedure execution
CN112579563A (en) Power grid big data-based warehouse visualization modeling system and method
Taleghani Executive information systems development lifecycle
CN113987139A (en) Knowledge graph-based visual query management system for software defect cases of aircraft engine FADEC system
CN105117980A (en) Power grid equipment state automatic evaluation method
CN116991931A (en) Metadata management method and system
Duan et al. Visualization and analysis in automated trace retrieval
CN116596412A (en) Method and system for realizing talent type portrait
CN111414355A (en) Offshore wind farm data monitoring and storing system, method and device
CN117009441A (en) Knowledge graph construction system and method based on relational database

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200218
