CN110807026A - Automatic capture system for analyzing financial big data blood relationship - Google Patents
- Publication number: CN110807026A
- Application number: CN201911015208.XA
- Authority: CN (China)
- Prior art keywords: node, data, information, big data, analyzing
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/21—Design, administration or maintenance of databases
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/221—Column-oriented storage; Management thereof
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/288—Entity relationship models
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
Abstract
The invention provides an automatic capture system for analyzing the blood relationship (data lineage) of financial big data, comprising: big data platform node collection, used for collecting information nodes; a node analysis unit, used for analyzing the collected information nodes to obtain a main node, a data outflow node and a data inflow node; a node cleaning unit, used for parsing non-multi-source heterogeneous data; big data platform node storage, comprising information node storage, process relationship storage and the establishment of a relationship information index; and information node relationship visualization, namely a data blood relationship map. The system can establish blood relationships as the data models of big data base platform components such as HIVE, FALCON and SQOOP are created, changed and converted, can capture them quickly within massive metadata models, and adds efficient, automated blood relationship building support for massive metadata.
Description
Technical Field
The invention relates to the technical field of financial big data, and in particular to an automatic capture system for analyzing the blood relationship (data lineage) of financial big data.
Background
In the big data era, data grows explosively, and massive data of many types is generated rapidly. This huge and complex body of data produces new data through association and fusion, conversion and transformation, and circulation, and the new data in turn flows into the ocean of data.
From generation, through processing, fusion and circulation, until it finally dies out, data naturally forms relationships with other data. Borrowing a similar relationship from human society, we call this relationship between data the blood relationship of data. Unlike blood relationships in human society, the blood relationship of data also has some distinctive features:
1. Attribution. Generally, specific data belongs to a specific organization or individual; data has ownership.
2. Multiple sources. The same data may have multiple sources (multiple parents). One piece of data may be produced by processing several pieces of data, and there may be several such processing steps.
3. Traceability. The blood relationship of data reflects the life cycle of the data, showing the whole process from its generation to its extinction, and is therefore traceable.
4. Hierarchy. The blood relationship of data is hierarchical. Descriptive information such as classifications, generalizations and summaries of data forms new data, and descriptions at different degrees of abstraction form the levels of the data.
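The four features above can be illustrated with a small in-memory graph. This is only a sketch under stated assumptions, not the structure used by the invention: the class `LineageGraph`, its methods, and the table names are hypothetical. It records an owner per node (attribution), allows several parents per node (multiple sources), walks upstream (traceability), and reports the distance of each ancestor (hierarchy).

```python
class LineageGraph:
    """Directed graph in which a child node is derived from its parent nodes."""

    def __init__(self):
        self.parents = {}  # child name -> set of direct parent names
        self.owner = {}    # node name -> owning organization or person

    def add_node(self, name, owner):
        self.owner[name] = owner

    def add_derivation(self, child, *sources):
        # One data item may be derived from several sources (multiple parents).
        self.parents.setdefault(child, set()).update(sources)

    def trace_upstream(self, name):
        """Return every ancestor of `name` mapped to its shortest distance."""
        levels = {}
        frontier = [(p, 1) for p in self.parents.get(name, ())]
        while frontier:
            node, depth = frontier.pop()
            if node in levels and levels[node] <= depth:
                continue  # already reached by an equal or shorter path
            levels[node] = depth
            frontier.extend((p, depth + 1) for p in self.parents.get(node, ()))
        return levels


# Hypothetical tables, used only for illustration.
g = LineageGraph()
g.add_node("ods.trades", "trading desk")
g.add_node("ods.accounts", "operations")
g.add_derivation("dw.positions", "ods.trades", "ods.accounts")
g.add_derivation("rpt.risk", "dw.positions", "ods.rates")
ancestors = g.trace_upstream("rpt.risk")
```

Here `ancestors` maps the two direct parents of `rpt.risk` to level 1 and the two source tables behind `dw.positions` to level 2.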
However, existing approaches simply construct a shallow blood relationship within the data warehouse, and their range of application is narrow.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing an automatic capture system for analyzing the blood relationship of financial big data.
The invention is realized by the following technical solution:
The invention provides an automatic capture system for analyzing the blood relationship of financial big data, comprising: big data platform node collection: used for collecting information nodes;
big data platform node storage: comprising information node storage, process relationship storage and the establishment of a relationship information index;
a node analysis unit: used for analyzing the collected information nodes to obtain a main node, a data outflow node and a data inflow node;
a node cleaning unit: used for parsing non-multi-source heterogeneous data;
and information node relationship visualization, namely a data blood relationship map.
Preferably, in the collection node aggregation stage, the parsing of multi-source heterogeneous data is completed through a data synchronization tool, which includes the open source framework DataX.
Preferably, the blood relationships of the collected data are persisted into a NoSQL distributed database.
Preferably, the blood relationship map is visualized with D3, whose various built-in interfaces allow the map to be rendered quickly and efficiently.
Preferably, if the collected nodes contain parameters that need to be replaced, the node analysis unit replaces the parameters to obtain a data inflow node set and a data outflow node set;
the data inflow node set and the data outflow node set obtained by the node analysis unit are then analyzed to obtain the data blood relationship.
According to the embodiment of the invention, the data blood relationship graph is constructed by parsing task statements, so that multi-level blood relationships can be analyzed and multi-source heterogeneous blood relationships can be parsed. Blood relationships can thus be established and traced for big data base platforms such as HIVE, HBASE, SQOOP and FALCON.
Drawings
FIG. 1 is a schematic flow chart of an automated capture system for analyzing the blood relationship of financial big data according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to FIG. 1, the automated capture system for analyzing the blood relationship of financial big data of the present application comprises:
Big data platform node collection: used for collecting information nodes. Operations on the models of the big data platform components are monitored and collected in real time through pre-embedded hooks.
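The collection side of such hooks might look like the following sketch. It is an illustration only: the text does not say how hooks are registered inside HIVE, SQOOP or FALCON, so that part is left out, and the names `OperationEvent`, `HookCollector` and `make_hook` are hypothetical. The sketch assumes each embedded hook simply calls back with the operation type and the raw statement.

```python
import time
from dataclasses import dataclass, field


@dataclass
class OperationEvent:
    component: str   # platform component that fired the hook, e.g. "HIVE"
    operation: str   # kind of model operation, e.g. "CREATE" or "INSERT"
    statement: str   # raw task statement captured by the hook
    timestamp: float = field(default_factory=time.time)


class HookCollector:
    """Gathers events pushed by hooks embedded in platform components."""

    def __init__(self):
        self.events = []

    def make_hook(self, component):
        # Returns the callback that a component would invoke after an operation.
        def hook(operation, statement):
            self.events.append(OperationEvent(component, operation, statement))
        return hook


collector = HookCollector()
hive_hook = collector.make_hook("HIVE")
hive_hook("CREATE", "CREATE TABLE dw.positions (id STRING, owner STRING)")
hive_hook("INSERT", "INSERT INTO dw.positions SELECT id, owner FROM ods.trades")
```

Each captured event keeps the raw statement so that a later parsing step can work from it.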
Big data platform node storage: comprising information node storage, process relationship storage and the establishment of a relationship information index. In this scheme, metadata relationships and entity information are stored using HBASE + ELASTICSEARCH in a schemaless mode, so the structure of the metadata entities and entity rule models of different platforms does not need to be considered. The unified storage model supports, to the greatest extent, metadata with different attribute information from different business domains; it supports access to massive data and allows flexible horizontal scaling; and it provides full-text retrieval, so that metadata entities can be located quickly and blood relationships traced efficiently. The blood relationships of the collected data are persisted into a NoSQL distributed database.
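To make the schemaless idea concrete, the sketch below keeps entities as free-form attribute dictionaries, stores process relationships separately, and builds a simple token index for lookup. It is an in-memory stand-in written for illustration; it does not use or imitate the HBASE or ELASTICSEARCH APIs, and `MetadataStore` and its methods are hypothetical names.

```python
from collections import defaultdict


class MetadataStore:
    def __init__(self):
        self.entities = {}             # entity id -> arbitrary attribute dict
        self.relations = []            # (source id, target id, process name)
        self.index = defaultdict(set)  # lowercase token -> entity ids

    def put_entity(self, entity_id, **attributes):
        # No fixed schema: each platform may supply different attributes.
        self.entities[entity_id] = attributes
        for value in (entity_id, *attributes.values()):
            for token in str(value).lower().replace(".", " ").split():
                self.index[token].add(entity_id)

    def put_relation(self, source, target, process):
        self.relations.append((source, target, process))

    def search(self, text):
        """Return ids of entities that contain every token in `text`."""
        tokens = text.lower().split()
        if not tokens:
            return set()
        hits = [self.index.get(token, set()) for token in tokens]
        return set.intersection(*hits)


store = MetadataStore()
store.put_entity("hive.dw.positions", platform="HIVE", owner="risk team")
store.put_entity("hbase.quotes", platform="HBASE", column_family="q")
store.put_relation("hbase.quotes", "hive.dw.positions", "nightly_load")
found = store.search("hive positions")
```

The two entities carry different attribute sets, yet both are stored and indexed the same way, which is the property the schemaless mode is meant to provide.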
A node analysis unit: used for analyzing the collected information nodes to obtain a main node, a data outflow node and a data inflow node. In the collection node aggregation stage, the parsing of multi-source heterogeneous data is completed through a data synchronization tool, which includes the open source framework DataX. If the collected nodes contain parameters that need to be replaced, the node analysis unit replaces the parameters to obtain a data inflow node set and a data outflow node set;
the data inflow node set and the data outflow node set obtained by the node analysis unit are then analyzed to obtain the data blood relationship.
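Parameter replacement followed by extraction of the two node sets can be sketched as below. Several things here are assumptions rather than details from the text: the `${name}` placeholder syntax, the reading of "inflow" as tables the statement reads and "outflow" as tables it writes, and the use of regular expressions in place of a real SQL parser (subqueries, CTEs and quoted identifiers are not handled). The function names are hypothetical.

```python
import re

PARAM = re.compile(r"\$\{(\w+)\}")
OUTFLOW = re.compile(r"\bINSERT\s+(?:INTO|OVERWRITE)\s+(?:TABLE\s+)?([\w.]+)", re.I)
INFLOW = re.compile(r"\b(?:FROM|JOIN)\s+([\w.]+)", re.I)


def substitute(statement, params):
    """Replace every ${name} placeholder with its value from `params`."""
    return PARAM.sub(lambda m: params[m.group(1)], statement)


def parse_flows(statement, params):
    """Return (inflow set, outflow set) for one task statement."""
    resolved = substitute(statement, params)
    outflow = set(OUTFLOW.findall(resolved))
    inflow = set(INFLOW.findall(resolved)) - outflow
    return inflow, outflow


# Hypothetical task statement with a date parameter.
statement = (
    "INSERT OVERWRITE TABLE dw.positions_${dt} "
    "SELECT t.id, a.owner FROM ods.trades_${dt} t "
    "JOIN ods.accounts a ON t.acct = a.id"
)
inflow, outflow = parse_flows(statement, {"dt": "20191024"})
```

Replacing parameters before parsing matters because the placeholder would otherwise end up inside the node name, and the same task run on different dates could not be tied to the concrete tables it touched.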
A node cleaning unit: used for parsing non-multi-source heterogeneous data. A cleaning rule node expresses the screening standard applied as data circulates. Large amounts of data are distributed across different places, and each place has different requirements for data quality. A data receiver can filter incoming data according to its own requirements; these requirements form data standards, and data cleaning is carried out according to them. Cleaning rules can take many forms, for example requiring that a value is not null or that it conforms to a certain format. On the visualized graph, cleaning rules are represented by a circle marked with a capital letter "E", a simplified representation of the various rules that keeps the graph simple and clear. Viewing the rule contents is also simple: when the mouse is moved over the circle marked "E", the list of standards is displayed automatically.
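The two example rules mentioned above (not null, and conforming to a format) can be sketched as small rule objects applied by the data receiver. This is an illustrative sketch only; `CleaningRule`, `not_null`, `matches` and `clean` are hypothetical names, and the column names and date format are invented for the example.

```python
import re
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class CleaningRule:
    name: str                        # human-readable standard
    column: str                      # column the rule applies to
    check: Callable[[object], bool]  # returns True when the value passes


def not_null(column):
    return CleaningRule(f"{column} is not null", column,
                        lambda v: v is not None and v != "")


def matches(column, pattern):
    regex = re.compile(pattern)
    return CleaningRule(f"{column} matches {pattern}", column,
                        lambda v: isinstance(v, str) and regex.fullmatch(v) is not None)


def clean(rows, rules):
    """Split rows into those passing all rules and those failing at least one."""
    kept, rejected = [], []
    for row in rows:
        failed = [r.name for r in rules if not r.check(row.get(r.column))]
        if failed:
            rejected.append((row, failed))
        else:
            kept.append(row)
    return kept, rejected


rules = [not_null("account_id"), matches("trade_date", r"\d{4}-\d{2}-\d{2}")]
rows = [
    {"account_id": "A1", "trade_date": "2019-10-24"},
    {"account_id": None, "trade_date": "2019/10/24"},
]
kept, rejected = clean(rows, rules)
```

The `name` field of each rule is the kind of text that could populate the list of standards shown for an "E" marker on the graph.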
Information node relationship visualization, namely a data blood relationship map. The blood relationship map is visualized with D3, whose various built-in interfaces allow the map to be rendered quickly and efficiently. Only through visualization can blood relationships be presented clearly to the user.
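On the server side, preparing data for such a map could be as simple as the sketch below, which turns lineage edges into the `nodes`/`links` JSON shape commonly fed to D3 force layouts. The rendering itself happens in browser JavaScript and is not shown. The function `to_d3_graph`, the `hasRule` flag (marking links that pass through a cleaning rule) and the table names are assumptions made for illustration.

```python
import json


def to_d3_graph(edges, rule_edges=()):
    """Convert (source, target) pairs into a nodes/links dictionary."""
    names = sorted({name for edge in edges for name in edge})
    ruled = set(rule_edges)
    return {
        "nodes": [{"id": name} for name in names],
        "links": [
            {"source": s, "target": t, "hasRule": (s, t) in ruled}
            for s, t in edges
        ],
    }


edges = [
    ("ods.trades", "dw.positions"),
    ("ods.accounts", "dw.positions"),
    ("dw.positions", "rpt.risk"),
]
graph = to_d3_graph(edges, rule_edges=[("ods.trades", "dw.positions")])
payload = json.dumps(graph, indent=2)
```

The resulting `payload` string can be served to the front end, where a link with `hasRule` set could be drawn with the "E" marker.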
The present invention is not limited to the above preferred embodiments, and any modifications, equivalent substitutions and improvements made within the spirit and principle of the present invention should be included in the protection scope of the present invention.
Claims (5)
1. An automated capture system for analyzing the blood relationship of financial big data, comprising:
big data platform node collection: used for collecting information nodes;
big data platform node storage: comprising information node storage, process relationship storage and the establishment of a relationship information index;
a node analysis unit: used for analyzing the collected information nodes to obtain a main node, a data outflow node and a data inflow node;
a node cleaning unit: used for parsing non-multi-source heterogeneous data;
and information node relationship visualization, namely a data blood relationship map.
2. The automated capture system for analyzing the blood relationship of financial big data according to claim 1, characterized in that in the collection node aggregation stage, the parsing of multi-source heterogeneous data is completed through a data synchronization tool, which includes the open source framework DataX.
3. The automated capture system for analyzing the blood relationship of financial big data according to claim 2, wherein the blood relationships of the collected data are persisted into a NoSQL distributed database.
4. The automated capture system for analyzing the blood relationship of financial big data according to claim 1, wherein the blood relationship map is visualized with D3, whose various built-in interfaces allow the map to be rendered quickly and efficiently.
5. The automated capture system for analyzing the blood relationship of financial big data according to claim 1, wherein if the collected nodes contain parameters that need to be replaced, the node analysis unit replaces the parameters to obtain a data inflow node set and a data outflow node set;
the data inflow node set and the data outflow node set obtained by the node analysis unit are then analyzed to obtain the data blood relationship.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911015208.XA CN110807026A (en) | 2019-10-24 | 2019-10-24 | Automatic capture system for analyzing financial big data blood relationship |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911015208.XA CN110807026A (en) | 2019-10-24 | 2019-10-24 | Automatic capture system for analyzing financial big data blood relationship |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110807026A (en) | 2020-02-18 |
Family
ID=69489034
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911015208.XA Pending CN110807026A (en) | 2019-10-24 | 2019-10-24 | Automatic capture system for analyzing financial big data blood relationship |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110807026A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050138160A1 (en) * | 2003-08-28 | 2005-06-23 | Accenture Global Services Gmbh | Capture, aggregation and/or visualization of structural data of architectures |
CN110019384A (en) * | 2017-08-15 | 2019-07-16 | 阿里巴巴集团控股有限公司 | A kind of acquisition methods of blood relationship data provide the method and device of blood relationship data |
CN108228747A (en) * | 2017-12-20 | 2018-06-29 | 江苏数加数据科技有限责任公司 | Data genetic connection visualized graphs system in data improvement |
CN110019315A (en) * | 2018-06-19 | 2019-07-16 | 杭州数澜科技有限公司 | A kind of method and apparatus for the parsing of data blood relationship |
CN109684402A (en) * | 2018-12-21 | 2019-04-26 | 福建南威软件有限公司 | One kind being based on big data platform metadata genetic connection implementation method |
CN109710703A (en) * | 2019-01-03 | 2019-05-03 | 北京顺丰同城科技有限公司 | A kind of generation method and device of genetic connection network |
Non-Patent Citations (2)
Title |
---|
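Zhang Yan et al.: "Research on the application of 'blood relationship analysis' technology in commercial bank information systems" (商业银行信息系统中"血缘分析"技术的应用研究), Information Technology and Informatization (《信息技术与信息化》) * |
Jin Yong: "Research on data blood relationship management based on data warehouses" (基于数据仓库的数据血缘管理研究), Light Industry Science and Technology (《轻工科技》) * |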
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111723253A (en) * | 2020-05-25 | 2020-09-29 | 贵州华泰智远大数据服务有限公司 | Data blood relationship query method and query system based on graph database |
CN112100201A (en) * | 2020-09-30 | 2020-12-18 | 东莞市盟大塑化科技有限公司 | Data monitoring method, device, equipment and storage medium based on big data technology |
CN112732987A (en) * | 2020-12-31 | 2021-04-30 | 北京百分点科技集团股份有限公司 | Full life cycle data map generation system and method |
CN112732987B (en) * | 2020-12-31 | 2022-12-06 | 北京百分点科技集团股份有限公司 | Full life cycle data map generation system and method |
CN113282678A (en) * | 2021-03-30 | 2021-08-20 | 杭州数梦工场科技有限公司 | Data blood relationship display method and device |
CN113722310A (en) * | 2021-09-16 | 2021-11-30 | 北京航空航天大学 | Blood relationship information visual representation method |
CN113868253A (en) * | 2021-09-28 | 2021-12-31 | 中通服创立信息科技有限责任公司 | Data relationship capturing and big data relationship tree construction method |
CN113868253B (en) * | 2021-09-28 | 2024-04-23 | 中通服创立信息科技有限责任公司 | Data relationship capturing and big data relationship tree construction method |
CN115203179A (en) * | 2022-05-16 | 2022-10-18 | 北京航空航天大学 | Data cleaning method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110807026A (en) | Automatic capture system for analyzing financial big data blood relationship | |
CN107886238B (en) | Business process management system and method based on mass data analysis | |
CN107943463B (en) | Interactive mode automation big data analysis application development system | |
Ragan et al. | Characterizing provenance in visualization and data analysis: an organizational framework of provenance types and purposes | |
CN107315776B (en) | Data management system based on cloud computing | |
US9960974B2 (en) | Dependency mapping among a system of servers, analytics and visualization thereof | |
Maiti et al. | Capturing, eliciting, predicting and prioritizing (CEPP) non-functional requirements metadata during the early stages of agile software development | |
CN111125068A (en) | Metadata management method and system | |
US8799859B2 (en) | Augmented design structure matrix visualizations for software system analysis | |
CN106649718B (en) | A kind of big data acquisition and processing method for PDM system | |
CN112527791A (en) | Intelligent urban brain big data system | |
CN112181960A (en) | Intelligent operation and maintenance framework system based on AIOps | |
CN117056867B (en) | Multi-source heterogeneous data fusion method and system for digital twin | |
CN109684402A (en) | One kind being based on big data platform metadata genetic connection implementation method | |
CN116662441A (en) | Distributed data blood margin construction and display method | |
CN103842973A (en) | Monitoring stored procedure execution | |
CN112579563A (en) | Power grid big data-based warehouse visualization modeling system and method | |
Taleghani | Executive information systems development lifecycle | |
CN113987139A (en) | Knowledge graph-based visual query management system for software defect cases of aircraft engine FADEC system | |
CN105117980A (en) | Power grid equipment state automatic evaluation method | |
CN116991931A (en) | Metadata management method and system | |
Duan et al. | Visualization and analysis in automated trace retrieval | |
CN116596412A (en) | Method and system for realizing talent type portrait | |
CN111414355A (en) | Offshore wind farm data monitoring and storing system, method and device | |
CN117009441A (en) | Knowledge graph construction system and method based on relational database |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20200218 |