CN111324602A - Method for realizing financial big data oriented analysis visualization - Google Patents
Method for realizing financial big data oriented analysis visualization
- Publication number
- CN111324602A (application CN202010106706.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- management
- analysis
- model
- checking
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G06F16/254—Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
- G06F16/26—Visual data mining; Browsing structured data
- G06Q40/00—Finance; Insurance; Tax strategies; Processing of corporate or income taxes
Abstract
The invention discloses a visualization method for financial big data analysis, and relates to the technical field of financial big data analysis. The method comprises the following steps: S01, metadata collection management; S02, data quality management; S03, data standardization; S04, data warehouse management; S05, data visualization; S06, data analysis. By aggregating mass data, the invention mines weak correlations among the data and generates more data value; through data governance it supports calculation and analysis of the acquired data and upgrades of the algorithms and models, and enables the system's multidimensional and dynamic analysis to be displayed visually.
Description
Technical Field
The invention belongs to the technical field of financial big data analysis, and particularly relates to a method for realizing visualization oriented to financial big data analysis.
Background
With the rapid growth of data on the internet, data with uncertain relationships is difficult to manage, describe and analyze by traditional means. Big data is not simply the statistics and analysis of informatization; rather, massive data is aggregated to mine weak correlations among the data and thereby generate more data value.
To better perform calculation and analysis on the acquired data and to upgrade the algorithms and models, data governance is critical: a good data governance system is the key link for multidimensional and dynamic analysis in the financial industry.
The construction of a financial big data base platform mainly comprises modules for metadata management, data quality management, data standardization, data warehouse management, data visualization and data analysis.
Metadata is the basic information describing data. For example, to establish a personal information base, basic information such as name, sex, date of birth, height and weight is needed; each such item is metadata, and effective, complete data information consists of many metadata items. Owing to the massive growth of data, declining data quality, lack of timeliness and imperfect algorithm models, the fusion analysis and data situation prediction of existing financial big data clusters need to be improved: individuals and groups are analyzed, classified and graded, and key points distinguished, so as to achieve the expected predictive analysis, intelligence and automation.
In the current financial field, financial data grows exponentially, and data with uncertain relationships is difficult to manage, describe and analyze by traditional means. Likewise, financial data is not simply counted and analyzed; massive data is aggregated to mine weak correlations and generate more data value. Since data governance is the key link for the system's multidimensional and dynamic analysis, realizing the visualization of financial big data analysis is of great significance.
Disclosure of Invention
The invention provides a method for realizing visualization oriented to financial big data analysis, aiming to solve the above problems.
In order to solve the technical problems, the invention is realized by the following technical scheme:
the invention discloses a method for realizing financial big data analysis-oriented visualization, which comprises the following steps:
s01, metadata collection management: the system is used for interacting with a financial system, acquiring initial data, and maintaining and acquiring financial metadata according to basic information, attributes and association of financial attribute data; the system provides life cycle management of metadata and version management function, thereby ensuring the quality of the metadata and the authority and reliability of a subsequent metadata system;
s02, data quality management: the system is used for managing data quality problems, comprises data quality index management, index scheduling and execution management, problem management and data quality analysis management, and is realized by an interface layer, a response layer, a functional layer, a system management layer and a storage layer;
s03, data standardization: the system is used for standardizing the acquired financial metadata to acquire standardized data, and adopts a comprehensive standardized or progressive standardized mode, and comprises the steps of standardized object selection, word standardization, domain standardization and expression standardization;
s04, data warehouse management: a step for storing the financial metadata subjected to data standardization in the form of a data warehouse, including data extraction, data cleansing conversion and data warehouse log and warning transmission;
s05, data visualization: the background is used for acquiring data in the data warehouse according to the data source, the execution condition of the data acquisition task and the backup condition of the system data; specifically, data are acquired from a background through a front-end h5 technology, analysis data are processed, a JSONObject is used for returning a result, the result returned by the background is acquired by the front end for judgment, the result is analyzed by the front end, the last link entering the background is determined, and relevant information of a data extraction task is displayed to an interface through an Echarts visualization technology;
s06, data analysis: the convolutional neural network and the text information processing network of the deep learning network model are utilized to analyze the time complexity, the space complexity and the influence on the model from three parties.
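The six steps above can be sketched as a minimal pipeline skeleton. All function names and the sample record below are illustrative placeholders, not the patented implementation; only a non-null quality rule and a key-normalization step are shown.

```python
# Minimal sketch of the S01-S06 pipeline; function names are hypothetical.

def collect_metadata(source):
    """S01: gather basic information and attributes as metadata."""
    return {"name": source["name"], "attributes": source.get("attributes", {})}

def check_quality(record):
    """S02: apply a simple quality rule (non-null checking shown)."""
    return all(v is not None for v in record.values())

def standardize(record):
    """S03: normalize key names toward a standard word dictionary."""
    return {k.strip().lower(): v for k, v in record.items()}

def run_pipeline(sources):
    warehouse = []                    # S04: standardized records are stored
    for src in sources:
        meta = collect_metadata(src)
        if check_quality(meta):
            warehouse.append(standardize(meta))
    return warehouse                  # S05/S06 would visualize and analyze

records = run_pipeline([{"name": "loan_table", "attributes": {"rows": 100}}])
```

Each stage here stands in for a whole module of the platform; the point is only the ordering of collection, quality checking, standardization and warehousing.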
Further, in the step S01, the step of acquiring initial data includes:
T01, establishing adapters: including but not limited to JDBC, EXCEL, template, Hive and DB data dictionary adapters;
T02, establishing a suspension (hang) point: a meta-model is selected to match the adapter;
T03, creating a data source: selecting the adapter, adapter version, acquisition mode, suspension point and parameter configuration; judging whether an audit is needed; if so, proceeding to step T04, and if not, to step T05;
T04, interacting with the quality retrieval system;
T05, acquisition template management: template customization and template mapping;
T06, manual acquisition: selecting a data source through acquisition task management, uploading the template, and starting acquisition; the acquisition log can be checked, with errors and progress displayed; for automatic acquisition, the task can be configured by setting the acquisition time, running immediately, and ending the acquisition process;
T07, view distribution: the distribution of collected data to different levels of the user view is managed in the view.
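Step T03 (creating a data source) can be illustrated as follows; the adapter set mirrors T01, and the audit branch decides whether control passes to T04 or T05. Function and field names are hypothetical.

```python
# Sketch of data-source creation (step T03); names are illustrative only.

ADAPTERS = {"JDBC", "EXCEL", "template", "hive", "DB data dictionary"}

def create_data_source(adapter, version, mode, hang_point, params, needs_audit):
    """Select adapter, version, acquisition mode, suspension point and params."""
    if adapter not in ADAPTERS:
        raise ValueError(f"unknown adapter: {adapter}")
    source = {"adapter": adapter, "version": version, "mode": mode,
              "hang_point": hang_point, "params": params}
    # Audit branch of T03: audited sources go to the quality system (T04),
    # the rest proceed directly to template management (T05).
    source["next_step"] = "T04" if needs_audit else "T05"
    return source

src = create_data_source("JDBC", "1.0", "manual", "finance_meta", {},
                         needs_audit=False)
```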
Further, the data quality index management specifies checking rules for six typical data quality problems according to the eight major elements of data quality, including: non-null checking, uniqueness checking, primary/foreign key checking, length checking, code checking and consistency checking; the eight major elements of data quality checking are integrity, legality, uniqueness, consistency, accuracy, timeliness, security and extensibility;
The index scheduling and execution management is the process of executing the checking indexes, i.e. inspecting the source system for data quality problems, which are found in an automatic or manual manner;
The problem management is realized by a quality problem management module and is divided into automatic management and manual management of checked problems; lineage (blood-relationship) analysis, influence analysis, detail checking, export functions and process management are provided for the checked problems;
The data quality analysis management performs distributed analysis on the data quality checking results, including index query, viewing of trend analysis views, and viewing and downloading of data quality reports; through a graphical chart interface, the causes of problems and their historical trends are quickly located, assisting data administrators in solving data quality problems.
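Five of the six typical checking rules can be sketched as simple predicates over rows (consistency checking would compare related columns in the same way); the rule functions and sample rows are illustrative, not the patented rule engine.

```python
# Illustrative checking rules from the data quality index management.

def non_null(rows, col):
    """Non-null checking: no missing values in the column."""
    return all(r.get(col) is not None for r in rows)

def unique(rows, col):
    """Uniqueness checking: no duplicated values in the column."""
    vals = [r[col] for r in rows]
    return len(vals) == len(set(vals))

def foreign_key(rows, col, parent_keys):
    """Primary/foreign key checking: every value exists in the parent table."""
    return all(r[col] in parent_keys for r in rows)

def length(rows, col, max_len):
    """Length checking: values do not exceed the declared length."""
    return all(len(str(r[col])) <= max_len for r in rows)

def code(rows, col, code_set):
    """Code checking: values come from an agreed code table."""
    return all(r[col] in code_set for r in rows)

rows = [{"id": 1, "ccy": "CNY"}, {"id": 2, "ccy": "USD"}]
report = {
    "non_null_id": non_null(rows, "id"),
    "unique_id": unique(rows, "id"),
    "code_ccy": code(rows, "ccy", {"CNY", "USD", "EUR"}),
}
```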
Further, the standardized object selection in step S03 comprises: specifying a detailed execution plan, determining standardization principles, defining standardization guidelines including naming rules, selecting the standardized objects, and collecting source data;
The word standardization comprises: selecting a reference dictionary, analyzing morphemes, defining words, grouping English and abbreviated synonyms, and constructing a standard word dictionary;
The domain standardization comprises analyzing data types, classifying and selecting domains, and defining domains and the data types and lengths of the defined domains; the standard word dictionary is completed after the standard domain dictionary is constructed;
The expression standardization comprises applying the standards to the data model, judging the compliance of expressions, defining expressions and constructing a standard expression dictionary, after which the standard word dictionary is completed.
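Word standardization can be illustrated as mapping the morphemes of a raw column name onto a standard word dictionary; the dictionary entries below are hypothetical examples.

```python
# Sketch of word standardization (S03): synonyms and English abbreviations
# are grouped and mapped onto standard words; entries are illustrative.

STANDARD_WORDS = {
    "cust": "customer", "customer": "customer",
    "amt": "amount", "amount": "amount",
    "dt": "date", "date": "date",
}

def standardize_name(raw_name):
    """Split a raw column name into morphemes, map each to its standard word."""
    parts = raw_name.lower().split("_")
    return "_".join(STANDARD_WORDS.get(p, p) for p in parts)

std = standardize_name("CUST_AMT")
```

Domain and expression standardization would layer the standard domain and expression dictionaries on top of the same lookup idea.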
Further, the data extraction step in the data warehouse management of step S04 is as follows: first clarify which business systems the data comes from, which DBMS the database server of each business system runs, whether manual data exists and how large it is, and whether unstructured data exists; after this information is collected, the data extraction can be designed, as follows:
For data sources handled by the same DBMS as the one storing the DW: the DBMS (SQL Server, Oracle) provides a database link function; a direct link is established between the DW database server and the original business system, so that Select statements can be written for direct access;
For data sources different from the DW database system: a database link is established, e.g. between SQL Server and Oracle via ODBC; if a database link cannot be established, two ways remain: exporting the source data to txt or xls files through a tool and then importing these source files into the ODS, or going through a program interface;
For file-type data sources (.txt, .xls): business personnel are trained to import the data into a specified database using database tools, after which the data is extracted from that database;
For the problem of incremental updates: the business system records the time at which each transaction occurs and uses it as the increment marker; before each extraction, the maximum time recorded in the ODS is determined, and then all records later than that time are extracted.
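The incremental-update rule can be sketched with an in-memory database: find the maximum business timestamp already in the ODS, then extract only newer source records. Table and column names are hypothetical.

```python
# Sketch of incremental extraction by maximum timestamp (step S04).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ods_txn (id INTEGER, occurred_at TEXT)")
conn.execute("CREATE TABLE src_txn (id INTEGER, occurred_at TEXT)")
conn.executemany("INSERT INTO ods_txn VALUES (?, ?)",
                 [(1, "2020-01-01"), (2, "2020-01-02")])
conn.executemany("INSERT INTO src_txn VALUES (?, ?)",
                 [(1, "2020-01-01"), (2, "2020-01-02"), (3, "2020-01-03")])

# 1) maximum business time already recorded in the ODS
(max_t,) = conn.execute("SELECT MAX(occurred_at) FROM ods_txn").fetchone()
# 2) extract every source record later than that time
new_rows = conn.execute(
    "SELECT id, occurred_at FROM src_txn WHERE occurred_at > ?", (max_t,)
).fetchall()
```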
Further, the data cleansing in the data warehouse management of step S04 filters out data that does not meet the requirements, including incomplete data, erroneous data and duplicate data; the filtered results are sent to the responsible business department, which decides whether they are discarded or corrected by the business unit before being extracted again;
The data conversion in the data warehouse management of step S04 performs inconsistent-data conversion, data granularity conversion and business rule calculations;
The data warehouse logs in the data warehouse management of step S04 include execution process logs, error logs and general logs, used to monitor the operation of the data warehouse at any time and to find errors in time;
The warning transmission in step S04: when the data warehouse encounters an error, an error log is formed and a warning is sent to the system administrator; the warning may be sent as a mail to the system administrator with the error information attached, making it convenient for the administrator to check the error.
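The cleansing step can be sketched as routing incomplete, erroneous and duplicate records to a review list for the business department instead of loading them; the rules and field names shown are illustrative.

```python
# Sketch of data cleansing (S04): filter non-compliant records and route
# them to a review list; rules and field names are hypothetical.

def cleanse(rows, required=("id", "amount")):
    clean, review, seen = [], [], set()
    for r in rows:
        if any(r.get(c) is None for c in required):
            review.append((r, "incomplete"))          # missing required field
        elif not isinstance(r["amount"], (int, float)) or r["amount"] < 0:
            review.append((r, "erroneous"))           # invalid business value
        elif r["id"] in seen:
            review.append((r, "duplicate"))           # repeated primary key
        else:
            seen.add(r["id"])
            clean.append(r)
    return clean, review

clean, review = cleanse([
    {"id": 1, "amount": 10.0},
    {"id": 1, "amount": 20.0},   # duplicate id
    {"id": 2, "amount": None},   # incomplete record
])
```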
Further, the data visualization in step S05 is implemented by high-dimensional visualization components, including but not limited to trend graphs, radar charts, 3D scatter plots, network graphs, hierarchy graphs and word cloud graphs.
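Such components are typically driven by Echarts options (step S05); a minimal trend-graph option, built as a Python dict and serialized to the JSON a front end would receive, might look as follows. The series data and titles are illustrative.

```python
# Sketch of a backend producing an Echarts line-chart (trend graph) option
# as JSON; the option structure follows standard Echarts conventions, and
# all data shown is illustrative.
import json

option = {
    "title": {"text": "Daily transaction volume"},
    "xAxis": {"type": "category", "data": ["Mon", "Tue", "Wed"]},
    "yAxis": {"type": "value"},
    "series": [{"type": "line", "data": [120, 200, 150]}],
}
# Round-trip through JSON, as a background service would return it.
payload = json.loads(json.dumps(option))
```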
Further, the data analysis of step S06 adopts a time complexity model comprising a single convolutional layer, a space complexity model comprising the memory access amount, training/prediction time determined by the time complexity, and a pointer network model; the pointer network model comprises a basic encoder-decoder model, an attention-mechanism encoder-decoder model, a self-attention encoder-decoder model and a constructed text summarization model.
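The single-convolutional-layer complexities can be estimated with the standard formulas (multiply-accumulates proportional to M^2 * K^2 * C_in * C_out, weights proportional to K^2 * C_in * C_out); the patent does not give these formulas, so they are stated here as an assumption.

```python
# Standard complexity estimates for one square convolutional layer,
# stated as an assumption (the patent does not spell out the formulas).

def conv_layer_flops(m_out, kernel, c_in, c_out):
    """Time complexity: multiply-accumulate count, M^2 * K^2 * C_in * C_out."""
    return (m_out ** 2) * (kernel ** 2) * c_in * c_out

def conv_layer_params(kernel, c_in, c_out):
    """Space complexity contribution: weight count, ignoring biases."""
    return (kernel ** 2) * c_in * c_out

flops = conv_layer_flops(m_out=32, kernel=3, c_in=16, c_out=32)
params = conv_layer_params(kernel=3, c_in=16, c_out=32)
```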
The invention has the following beneficial effects:
1. Through metadata management, the invention maintains, retrieves, updates and manages the life cycle of financial metadata, and imports and exports metadata; all data to be exported can be exported from the financial metadata tree to a file; the analysis results mainly include influence analysis, pedigree (lineage) analysis and data warehouse mapping analysis, and support data export and picture export. Version management imposes a strict process on the whole-life-cycle management, release, deletion and state change of financial data, provides data life-cycle management, guarantees the quality of the stabilized data, and ensures the reliability of subsequent metadata-consuming systems.
2. Through metadata quality management, the invention specifies checking rules according to the data quality indexes, executes the checking indexes to find data quality problems in the system, provides lineage analysis, influence analysis, data panorama analysis, a data engine and process management for the checked problems, and finally performs distributed analysis on the data quality management results to help business experts locate the problems. Data quality problems fall into four problem domains (data, management, process and technical), covering the most frequent cases, and the financial data quality information in these four domains can be improved.
3. The invention standardizes data from the source through data standardization, so that the meaning, expression and value range of each data item are standardized and unified across the processing and operation of every system, ensuring consistency from the data's origin and enabling smooth information exchange and sharing. The data standardization makes the financial data same-directional and dimensionless. Same-direction processing addresses indicators of different natures: directly summing such indicators cannot correctly reflect the combined result of different acting forces, so the data properties of inverse indicators are first changed, making all indicators act in the same direction with respect to the evaluation scheme, after which summation yields a correct result. Dimensionless processing addresses the comparability of data; Z-score normalization is used. Through this standardization, the original data are converted into dimensionless index evaluation values, i.e. all index values are on the same order of magnitude, improving the efficiency of comprehensive evaluation analysis.
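The Z-score normalization mentioned above follows the standard formula z = (x - mean) / std; the sample values below are illustrative.

```python
# Z-score normalization, as used for the dimensionless processing;
# population standard deviation is assumed here.
from statistics import mean, pstdev

def z_score(values):
    """Map values onto the same scale: z = (x - mean) / std."""
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]

z = z_score([10.0, 20.0, 30.0])
```

After this transformation all index values share mean 0 and unit variance, so indicators of different magnitudes become directly comparable.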
4. Through data warehouse management, financial data of different sources, formats and characteristics are physically and organically integrated in the governance process and stored centrally, forming standard data storage and comprehensively improving data governance efficiency. Part of the data integrated by data warehouse management is business data, mostly text data that rarely changes, which is stored directly in the storage-layer database; data processing that requires computing power is transmitted to the cluster of the computing layer, which performs big data analysis and statistics, real-time data processing and data model analysis; structured processing results are stored in a relational database, while images and similar information are stored in a non-relational database. During data processing, the data is normalized and standardized according to the data governance requirements, and fused and integrated into standard data stored in the formal database.
5. Through data visualization governance and the analysis of characteristics among the data, the invention dynamically displays the association relationships among metadata and shows the value that the data provide to the business, thereby providing a basis for multidimensional and intelligent data analysis.
6. Through data analysis, the invention addresses the high data analysis and computing requirements of existing deep learning network models; the network models mainly used comprise the convolutional neural network and the text information processing network, and the model complexity of the convolutional network and the pointer network is analyzed.
Of course, it is not necessary for any product in which the invention is practiced to achieve all of the above-described advantages at the same time.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is an overall step diagram of a method for implementing visualization oriented to financial big data analysis according to the present invention;
FIG. 2 is a schematic diagram of the step of collecting initial data of step S01 in FIG. 1;
FIG. 3 is a system configuration diagram of the data quality management in step S02 of FIG. 1;
FIG. 4 is a flowchart of the data normalization in step S03 of FIG. 1;
FIG. 5 is a schematic flow chart of the data warehouse management in step S04 of FIG. 1;
FIG. 6 is a detailed flow diagram of metadata collection, metadata management, and metadata analysis of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-6, a method for implementing visualization oriented to financial big data analysis according to the present invention includes the following steps:
s01, metadata collection management: the system is used for interacting with a financial system, acquiring initial data, and maintaining and acquiring financial metadata according to basic information, attributes and association of financial attribute data; the system provides life cycle management of metadata and version management function, thereby ensuring the quality of the metadata and the authority and reliability of a subsequent metadata system;
s02, data quality management: the system is used for managing data quality problems, comprises data quality index management, index scheduling and execution management, problem management and data quality analysis management, and is realized by an interface layer, a response layer, a functional layer, a system management layer and a storage layer;
s03, data standardization: the system is used for standardizing the acquired financial metadata to acquire standardized data, and adopts a comprehensive standardized or progressive standardized mode, and comprises the steps of standardized object selection, word standardization, domain standardization and expression standardization;
s04, data warehouse management: a step for storing the financial metadata subjected to data standardization in the form of a data warehouse, including data extraction, data cleansing conversion and data warehouse log and warning transmission;
s05, data visualization: the background is used for acquiring data in the data warehouse according to the data source, the execution condition of the data acquisition task and the backup condition of the system data; specifically, data are acquired from a background through a front-end h5 technology, analysis data are processed, a JSONObject is used for returning a result, the result returned by the background is acquired by the front end for judgment, the result is analyzed by the front end, the last link entering the background is determined, and relevant information of a data extraction task is displayed to an interface through an Echarts visualization technology;
s06, data analysis: the convolutional neural network and the text information processing network of the deep learning network model are utilized to analyze the time complexity, the space complexity and the influence on the model from three parties.
Further, in the step S01, the step of acquiring initial data includes:
t01, establishing adapter: including but not limited to JDBC, EXCEL, template, hive, DB data dictionary adapters;
t02, establishing a suspension point: a meta-model needs to be selected to be matched with the adapter;
t03, creating data source: selecting an adapter, an adapter version, an acquisition mode, a suspension point and parameter configuration; judging whether the audit is needed, if so, entering a step T04, and if not, entering a step T05;
t04, interacting with a quality retrieval system;
t05, acquisition template management: template customization and template mapping;
t06, manual acquisition: selecting a data source through collection-task management, uploading a template, and starting collection; the resulting collection log can be checked, with errors and progress displayed; for automatic acquisition, a task may be configured: setting the acquisition time, running immediately, and ending the acquisition process;
t07, view distribution: the distribution of collected data to different levels of the user view is managed at the view.
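The data-source creation flow of steps T01 to T05 can be sketched as follows. This is a minimal illustration only: the adapter names come from step T01, but the function signature, field names and audit routing are assumptions, not the patent's actual interface.

```python
# Hypothetical sketch of steps T03-T05: register a data source against a known
# adapter and route it either to the quality retrieval system (T04) or directly
# to acquisition-template management (T05).
ADAPTERS = {"JDBC", "EXCEL", "template", "hive", "DB data dictionary"}

def create_data_source(name, adapter, version, mode, suspension_point, params, needs_audit):
    """Register a data source (step T03); the audit flag decides the next step."""
    if adapter not in ADAPTERS:
        raise ValueError(f"unknown adapter: {adapter}")
    source = {
        "name": name,
        "adapter": adapter,
        "version": version,
        "mode": mode,                        # e.g. "manual" or "scheduled"
        "suspension_point": suspension_point,
        "params": params,
    }
    # T04: interact with the quality retrieval system before first use;
    # otherwise proceed straight to template management (T05).
    source["next_step"] = "quality_audit" if needs_audit else "template_management"
    return source

src = create_data_source("core_ledger", "JDBC", "1.0", "scheduled",
                         "finance_meta", {"url": "jdbc:..."}, needs_audit=True)
print(src["next_step"])  # quality_audit
```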
Further, data quality index management specifies, according to the eight major elements of data quality, the checking rules for six typical data quality problems: non-null checking, uniqueness checking, primary/foreign key checking, length checking, code checking and consistency checking. The eight major elements of data quality checking are: integrity, legality, uniqueness, consistency, accuracy, timeliness, security and extensibility;
the index scheduling and execution management is the process of executing the checking indexes, i.e. the process of checking the source system for data quality problems; it finds the data quality problems existing in the system in an automatic or manual mode;
the problem management is realized by a quality-problem management module and is divided into automatic management and manual management of checking problems; lineage analysis, impact analysis, detail viewing, export functions and process management are provided for the checking problems;
the data quality analysis management performs distributed analysis on the data quality checking results, including index query, viewing of trend-analysis views, and viewing and downloading of data quality reports; through a graphical chart interface, the causes of a problem and its historical trend can be quickly located, helping data administrators resolve data quality problems.
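The six checking rules named above can be sketched in a few lines. This is a minimal illustration, not the patent's implementation; the record layout and sample values are assumptions.

```python
# Sketch of the six data-quality checking rules: non-null, uniqueness,
# primary/foreign key, length, code, and consistency. Each checker returns
# the offending records so they can be routed to problem management.
def check_non_null(rows, field):
    return [r for r in rows if r.get(field) in (None, "")]

def check_unique(rows, field):
    seen, dups = set(), []
    for r in rows:
        if r[field] in seen:
            dups.append(r)
        else:
            seen.add(r[field])
    return dups

def check_foreign_key(rows, field, parent_keys):
    return [r for r in rows if r[field] not in parent_keys]

def check_length(rows, field, max_len):
    return [r for r in rows if len(str(r[field])) > max_len]

def check_code(rows, field, code_set):
    return [r for r in rows if r[field] not in code_set]

def check_consistency(rows, rule):
    """rule: a predicate that must hold for every record."""
    return [r for r in rows if not rule(r)]

rows = [{"id": 1, "branch": "BJ", "cust": "A001"},
        {"id": 1, "branch": "XX", "cust": ""}]
print(len(check_unique(rows, "id")))                   # 1 duplicate id
print(len(check_non_null(rows, "cust")))               # 1 missing customer
print(len(check_code(rows, "branch", {"BJ", "SH"})))   # 1 invalid branch code
```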
Further, the selecting of the standardized object in the step S03 includes the steps of: specifying a detailed execution plan, determining standardized principles, defining standardized guidelines including naming rules, selecting standardized objects, collecting source data;
the word normalization comprises the steps of: selecting a reference dictionary, analyzing morphemes, defining words, grouping English and abbreviation naming synonyms, and constructing a standard word dictionary;
the domain standardization comprises analyzing data types, classifying and selecting domains, defining domains, defining the data type and length of each domain, and constructing a standard domain dictionary;
the expression standardization comprises applying the standards to the data model, judging the compliance of expressions, defining expressions, and constructing a standard expression dictionary;
at present, two approaches to data standardization are adopted in the enterprise informatization process: comprehensive standardization and progressive standardization. Comprehensive standardization first implements an independent, comprehensive data standardization project, which essentially completes the work of Information Resource Planning (IRP) across the whole enterprise, establishes a long-term stable subject database system, and builds each subsystem on this stable "information resource platform";
progressive standardization first establishes the enterprise's data standardization framework, completes the standardization of the business data and part of the management data related to a pilot subsystem in step with that subsystem's construction, then completes the data standardization of each subsequent subsystem project under a unified principle, incorporating the standardization results into the enterprise data resource platform. Generally the data standardization system is built progressively: the data standardization process proceeds in parallel with information-project construction, which guarantees construction speed while adhering to the standardization principles, supports full sharing of enterprise information resources and integration of the subsystems, combines speed with standards, keeps data standardization practical, and prevents it from becoming hollow or a mere formality.
Further, in the step S04, part of the data in database management is business data, while most of it is text data that rarely changes and is stored directly in the storage-layer database. Data processing that requires significant computing power is sent to the cluster of the operation layer, which performs big-data statistical analysis, real-time data processing, data-model analysis and the like; the structured results of processing are stored in a relational database, while images and similar information are stored in a non-relational database. During data processing, the data are normalized and standardized according to the data governance requirements, then fused and integrated to form standard, compliant data stored in the formal database;
(1) data extraction (Extract)
A great deal of this work must be done in the investigation phase. First, clarify which business systems the data come from; which DBMS each business system's database server runs; whether manual data exist, and how large the volume of manual data is; whether unstructured data exist; and so on. Only after this information is collected can data extraction be designed.
a. Processing method for data source same as database system storing DW
This type of data source is relatively easy to design for. Generally the DBMS (SQL Server, Oracle) provides a database-link function: a direct link is established between the DW database server and the original business system, so that data can be accessed directly by writing Select statements.
b. Processing method for data source different from DW database system
For this kind of data source, a database link can usually still be established in ODBC mode, for example between SQL Server and Oracle. If a database link cannot be established, there are two alternatives: one is to export the source data as txt or xls files by means of a tool and then import these source files into the ODS; the other is through a program interface.
c. For a file type data source (. txt,. xls), business personnel may be trained to import the data into a specified database using a database tool and then extract the data from the specified database.
d. Problem of incremental updates
For systems with large amounts of data, incremental extraction must be considered. Typically the business system records the time at which each transaction occurred, and this timestamp can be used to mark the increment: before each extraction, first determine the maximum time recorded in the ODS, then extract from the business system all records later than that time.
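The incremental-extraction rule above can be sketched with an in-memory database. The table and column names (`orders`, `biz_time`, `ods_orders`) are assumptions for illustration only.

```python
# Sketch of timestamp-based incremental extraction: read the maximum business
# timestamp already loaded into the ODS, then pull only newer records from the
# business system.
import sqlite3

biz = sqlite3.connect(":memory:")  # stand-in for the business system
biz.execute("CREATE TABLE orders (id INTEGER, amount REAL, biz_time TEXT)")
biz.executemany("INSERT INTO orders VALUES (?,?,?)",
                [(1, 10.0, "2020-01-01"), (2, 20.0, "2020-01-03")])

ods = sqlite3.connect(":memory:")  # stand-in for the ODS
ods.execute("CREATE TABLE ods_orders (id INTEGER, amount REAL, biz_time TEXT)")
ods.execute("INSERT INTO ods_orders VALUES (1, 10.0, '2020-01-01')")

# 1. maximum timestamp already present in the ODS
(max_time,) = ods.execute("SELECT MAX(biz_time) FROM ods_orders").fetchone()
# 2. extract only records strictly newer than that from the business system
delta = biz.execute("SELECT * FROM orders WHERE biz_time > ?", (max_time,)).fetchall()
ods.executemany("INSERT INTO ods_orders VALUES (?,?,?)", delta)
print(len(delta))  # 1 new record extracted
```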
(2) Cleaning and conversion of data (Cleaning, Transform)
Generally, the data warehouse is divided into the ODS and the DW. The general method is to clean the data from the business system into the ODS, filtering out dirty and incomplete data, and then, in the process from the ODS to the DW, to convert the data and perform calculation and aggregation of some business rules.
a. Data cleansing
The task of data cleansing is to filter out the data that do not meet the requirements; the filtered results are sent to the responsible business department, which confirms whether the data should be discarded or corrected by the business unit and then re-extracted.
The unqualified data fall mainly into three categories: incomplete data, erroneous data and duplicate data.
Incomplete data: the main problem with this kind of data is missing necessary information, such as a missing supplier name or branch name, missing customer region information, or failure to match the master and detail tables in the business system. Such data are filtered out and written into separate Excel files according to the missing content, then submitted to the client with a requirement to complete them within a specified time; once completed, they are written into the data warehouse.
Erroneous data: such errors arise because the business system is not robust enough and writes input directly into the background database without validation; for example, numeric data entered as full-width characters, a carriage return after string data, an incorrect date format, or a date out of range. This type of data also needs to be classified. Problems such as full-width characters or invisible characters before and after the data can be found only by writing SQL statements; the client is then asked to re-extract after the business system has been corrected. Errors such as an incorrect date format or an out-of-range date cause the data warehouse job to fail; these must be selected from the business system database by SQL, submitted to the responsible business department with a deadline for correction, and extracted after correction.
Duplicate data: this type of data occurs particularly in dimension tables. All fields of the duplicate records are exported and passed to the customer for confirmation and collation.
Data cleansing is an iterative process that requires constant correction of problems and is difficult to accomplish in a short time. Whether to filter or to correct generally requires customer confirmation. The filtered data are written into an Excel file or a data table; in the early stage of data warehouse development, e-mails about the filtered data can be sent to the business units every day to prompt them to correct the errors as soon as possible, and the filtered data can also serve as a basis for future data verification. Data cleansing must take care not to filter out useful data: verify each filtering rule carefully and have the user confirm it.
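The three cleansing categories above can be sketched in one routing function. The field names, the required-field list and the full-width-digit repair are illustrative assumptions, not the patent's actual rules.

```python
# Sketch of data cleansing: incomplete records are set aside for the business
# unit to complete, malformed numeric values are repaired (full-width digits,
# stray whitespace) or flagged as errors, and duplicate records are dropped.
import unicodedata

REQUIRED = ("supplier", "region")

def clean(rows):
    incomplete, errors, seen, good = [], [], set(), []
    for r in rows:
        if any(not r.get(f) for f in REQUIRED):
            incomplete.append(r)            # send back for completion
            continue
        amt = unicodedata.normalize("NFKC", str(r["amount"]))  # fix full-width digits
        try:
            r["amount"] = float(amt.strip())  # strip stray whitespace / carriage return
        except ValueError:
            errors.append(r)                # submit for correction, re-extract later
            continue
        key = (r["supplier"], r["region"])
        if key in seen:                     # duplicate: export for confirmation
            continue
        seen.add(key)
        good.append(r)
    return good, incomplete, errors

rows = [{"supplier": "S1", "region": "East", "amount": "１２３"},  # full-width digits
        {"supplier": "",   "region": "East", "amount": "5"},      # incomplete
        {"supplier": "S1", "region": "East", "amount": "123"}]    # duplicate
good, inc, err = clean(rows)
print(len(good), len(inc), len(err))  # 1 1 0
```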
b. Data conversion
The task of data transformation is mainly to perform inconsistent data transformation, transformation of data granularity, and computation of some rules.
Inconsistent data conversion: this is an integration process that unifies the same type of data from different business systems; for example, the code of one supplier is XX0001 in the settlement system and YY0001 in the CRM, so after extraction the data are unified and converted into a single code.
Conversion of data granularity: business systems typically store very detailed data, and the data in the data warehouse is used for analysis and need not be very detailed. Typically, business system data is aggregated at a data warehouse granularity.
Business rule calculation: different enterprises have different business rules and different data indexes, and sometimes these indexes cannot be produced by simple operations. In such cases the data indexes must be computed in the data warehouse and then stored there for analysis.
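The two conversions described above (code unification and granularity aggregation) can be sketched as follows. The code map and sample records are assumptions; they mirror the XX0001/YY0001 supplier example in the text.

```python
# Sketch of data conversion: map each system-specific supplier code to one
# unified code, then aggregate detail rows up to the warehouse granularity.
from collections import defaultdict

CODE_MAP = {"XX0001": "SUP-1", "YY0001": "SUP-1"}  # settlement and CRM codes unified

detail = [{"supplier": "XX0001", "amount": 100.0},
          {"supplier": "YY0001", "amount": 50.0}]

# Inconsistent-data conversion: rewrite every source code to the unified code.
for row in detail:
    row["supplier"] = CODE_MAP[row["supplier"]]

# Granularity conversion: aggregate the detailed rows per supplier for the DW.
totals = defaultdict(float)
for row in detail:
    totals[row["supplier"]] += row["amount"]
print(dict(totals))  # {'SUP-1': 150.0}
```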
(3) Data warehouse log and alert sending
a. Data warehouse logs
Data warehouse logs are divided into three categories:
one type is an execution process log. This partial log records each step performed during the execution of the data warehouse: and recording the starting time of each step of each operation, and influencing the data of the rows, the flow accounting form and the like.
One type is an error log. When a certain module has errors, the time of each error, the module with the errors, the information of the errors and the like are recorded.
The third type of log is a global log. Only information on the start time, end time and success or not of the data warehouse is recorded. If a data warehouse tool is used, the data warehouse tool automatically generates logs, which may also be part of the data warehouse log.
The purpose of recording the log is to know the operation condition of the data warehouse at any time and find errors in time.
b. Warning transmission
If the data warehouse fails, not only should a data warehouse error log be written, but a warning should also be sent to the system administrator. There are many ways to send the warning; usually an e-mail with the error information attached is sent to the system administrator so that the error can be checked conveniently.
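The alert rule above can be sketched with the standard library. The addresses, module name and mail host are assumptions; the actual SMTP send is shown as a comment because it is environment-specific.

```python
# Sketch of warning transmission: on a warehouse job failure, write an
# error-log entry and compose an e-mail with the error details attached.
from datetime import datetime
from email.message import EmailMessage

def build_alert(module, error):
    log_entry = {"time": datetime.now().isoformat(), "module": module, "error": error}
    msg = EmailMessage()
    msg["Subject"] = f"[DW ALERT] {module} failed"
    msg["From"] = "dw@example.com"          # assumed sender
    msg["To"] = "admin@example.com"         # assumed administrator address
    msg.set_content(f"Module {module} failed at {log_entry['time']}:\n{error}")
    # smtplib.SMTP("mailhost").send_message(msg)  # actual send, site-specific
    return log_entry, msg

log, mail = build_alert("cleansing", "date out of range in orders")
print(mail["Subject"])  # [DW ALERT] cleansing failed
```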
2. Implementation of data warehouse
There are many ways to implement a data warehouse; three are commonly used. One relies on data warehouse tools (such as Oracle's OWB, SQL Server 2000's DTS, SQL Server 2005's SSIS service, Informatica and the like); the second is implemented purely in SQL; and the third combines data warehouse tools with SQL. The first two each have advantages and disadvantages: tools make it possible to establish the data warehouse project quickly, shield complex coding tasks, effectively improve operational efficiency and reduce implementation difficulty, but they lack flexibility. The SQL method is flexible enough and can improve the operational efficiency of the data warehouse, but the coding is complex and the technical requirements are high. The third approach combines the advantages of the first two and can greatly improve the development speed and efficiency of the data warehouse.
Further, in the step S05, data visualization relies on data governance: by analyzing the characteristics of the data, the associations between metadata are displayed dynamically, indicating the business value carried between the data and providing a basis for multidimensional and intelligent analysis of the data; high-dimensional visualization components are used to realize the data visualization, including but not limited to trend graphs, radar charts, 3D scatter diagrams, network diagrams, hierarchical diagrams and word-cloud diagrams. According to the various data sources, the execution status of the data acquisition tasks and the backup status of the system data, after the base-level data have undergone data governance, the front end obtains data from the background through the H5 technology, the data are processed and analyzed, and the results are returned as JSONObjects. The front end receives and parses the result returned by the background and decides the final background step to enter. Finally, the relevant information of the data extraction task is displayed on the interface through the ECharts visualization technology;
wherein the trend graph, which may also be called a statistical chart, presents the development trend of something or of some information data in statistical-chart form. It displays measurements taken over a time interval such as a day, week or month: the measured quantity is plotted on the vertical axis and time on the horizontal axis, resembling a changing scoreboard. Its primary use is to determine whether various types of problems exhibit significant temporal patterns, so that their causes can be investigated.
Radar chart: also called a spider-web chart, it is one kind of analysis report. By dividing the important items over a circular fixed grid to represent their ratios, the user can clearly see the variation and trend of each index.
Scatter diagram: displays a series as a set of points; values are represented by the positions of the points in the chart and categories by different markers. Scatter plots are typically used to compare aggregated data across categories.
Word-cloud diagram: a visual depiction of keywords, used to summarize user-generated tags or the text content of a website. The tags are generally single words, often arranged alphabetically, with their importance expressed through font size or color, so the tag cloud can be browsed flexibly either in word order or by popularity. Most tags are themselves hyperlinks, pointing directly to a series of entries associated with the tag.
Bubble diagram: the data arranged in the columns of the worksheet (x values listed in the first column and corresponding y values and bubble size values listed in the adjacent columns) may be plotted in a bubble map.
In addition, the high-dimensional visualization component integrates the display modes of multiple components and displays them in superposition, so that the data can be presented comprehensively from every angle, every dimension and multiple directions.
Further, the data analysis of step S06 considers time complexity (including that of a single convolutional layer), space complexity (including the memory footprint), the training/prediction time determined by the time complexity, and a pointer network model; the pointer network model comprises a basic encoder-decoder model, an attention-mechanism encoder-decoder model, a self-attention-mechanism encoder-decoder model and a constructed text summarization model;
the data analysis and calculation aspect is mainly that the deep learning network model has high requirements on data analysis and calculation capacity. The network models mainly used include a convolutional neural network and a text information processing network. The complexity of the network model of the convolutional neural network and the pointer network is mainly analyzed.
The convolutional neural network is a commonly used network model in deep learning, and mainly analyzes the time complexity, the space complexity and the influence on the model in three parties;
time complexity
The time complexity of a single convolutional layer, i.e., the number of model operations, can be measured by FLOPs, i.e., the number of floating-point operations.
Time ~ O(M^2 · K^2 · C_in · C_out)
M: the side length of the output feature map (Feature Map) of each convolution kernel;
K: the side length of each convolution kernel (Kernel);
C_in: the number of channels of each convolution kernel, i.e. the number of input channels, which equals the number of output channels of the previous layer;
C_out: the number of convolution kernels the convolutional layer has, i.e. the number of output channels.
It can be seen that the time complexity of each convolutional layer is completely determined by the output feature-map area M^2, the convolution-kernel area K^2, the input channel number C_in and the output channel number C_out.
The output feature-map size is itself determined by four parameters: the input matrix size X, the convolution kernel size K, the Padding and the Stride, and is expressed as follows:
M = (X − K + 2 · Padding) / Stride + 1
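The two formulas above, the output feature-map side length and the single-layer time complexity, can be checked numerically. The sample sizes (32×32 input, 3×3 kernel, 64 input and 128 output channels) are illustrative.

```python
# Sketch of the per-layer complexity formulas: output feature-map side length
# M from (X, K, Padding, Stride), and single-layer FLOPs M^2 * K^2 * Cin * Cout.
def out_size(X, K, padding, stride):
    return (X - K + 2 * padding) // stride + 1

def conv_flops(M, K, c_in, c_out):
    return M * M * K * K * c_in * c_out

M = out_size(X=32, K=3, padding=1, stride=1)   # "same" convolution keeps size
print(M)                                       # 32
print(conv_flops(M, K=3, c_in=64, c_out=128))  # 32*32*9*64*128 = 75497472
```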
Time complexity of the convolutional neural network as a whole
Time ~ O(Σ_{l=1..D} M_l^2 · K_l^2 · C_{l−1} · C_l)
This is the existing, fundamental principle for computing time complexity, not an algorithm in itself; it is mainly used to compute the time complexity of the data analysis calculations, which determines the training/prediction time of the model. If the complexity is too high, model training and prediction consume a great deal of time: ideas cannot be verified quickly, the model cannot be improved, and fast prediction cannot be realized.
D: the number of convolutional layers the neural network has, i.e. the depth of the network;
l: the l-th convolutional layer of the neural network;
C_l: the number of output channels C_out of the l-th convolutional layer, which is also its number of convolution kernels; for the l-th convolutional layer, the number of input channels C_in is the number of output channels of the (l−1)-th convolutional layer.
It can be seen that the overall time complexity of the CNN is not mysterious: it is simply the accumulation of the time complexities of all the convolutional layers; multiplication within a layer, accumulation across layers.
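The layer-wise accumulation above can be sketched directly: each layer contributes M_l^2 · K_l^2 · C_{l−1} · C_l, with the input channel count of one layer chained from the output of the previous layer. The two-layer network here is illustrative.

```python
# Sketch of the whole-network time complexity: sum the per-layer FLOPs,
# chaining each layer's input channels from the previous layer's output.
def total_flops(layers, c_in):
    """layers: list of (M, K, c_out) tuples, one per convolutional layer."""
    total = 0
    for M, K, c_out in layers:
        total += M * M * K * K * c_in * c_out  # intra-layer multiplication
        c_in = c_out                           # inter-layer chaining: C_in = C_{l-1}
    return total

layers = [(32, 3, 64), (16, 3, 128)]  # (feature-map side, kernel side, out channels)
print(total_flops(layers, c_in=3))    # 1769472 + 18874368 = 20643840
```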
Spatial complexity
Space complexity (memory footprint) comprises two parts: the total number of parameters plus the output feature maps of every layer.
Space ~ O(Σ_{l=1..D} K_l^2 · C_{l−1} · C_l + Σ_{l=1..D} M_l^2 · C_l)
Parameter count: the sum of the weight parameters of all parameterized layers of the model, i.e. the model volume (the first summation above).
Feature maps: the size of the output feature map computed by each layer while the model runs (the second summation above).
The total parameter count depends only on the size K of the convolution kernels, the channel counts C and the number of layers D; it is independent of the size of the input data.
The space occupied by the output feature maps is relatively simple to compute: the product of the spatial size M^2 and the channel count C.
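The space-complexity decomposition above can be checked numerically in the same style. Bias terms are omitted for simplicity, and the layer list is illustrative.

```python
# Sketch of space complexity: total weight parameters (K^2 * C_{l-1} * C_l per
# layer) plus the output feature-map cells (M^2 * C_l per layer).
def space_complexity(layers, c_in):
    """layers: list of (M, K, c_out). Returns (param_count, feature_map_cells)."""
    params, fmaps = 0, 0
    for M, K, c_out in layers:
        params += K * K * c_in * c_out  # this layer's weights (model volume)
        fmaps += M * M * c_out          # this layer's output feature map
        c_in = c_out
    return params, fmaps

p, f = space_complexity([(32, 3, 64), (16, 3, 128)], c_in=3)
print(p, f)  # note: p depends only on K and the channel counts, not input size
```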
Effect of complexity on model
The temporal complexity determines the training/prediction time of the model. If the complexity is too high, a lot of time is consumed for model training and prediction, and the idea cannot be quickly verified and the model cannot be improved, and the quick prediction cannot be realized.
The space complexity determines the number of parameters of the model. Owing to the curse of dimensionality, the more parameters a model has, the more data are needed to train it; since real-world data sets are usually not large, a model with more parameters overfits more easily.
When a model needs to be pruned, since the spatial size of the convolution kernel is usually very small (3×3) and the network depth is closely tied to the model's representational power, neither is suitable for aggressive pruning; therefore the first place to prune a model is usually the number of channels.
Pointer network
The pointer network model takes 6 days and 14 hours to train to the best result on a 50k dictionary, and 8 days and 20 hours with a 150k dictionary. The training time of the pointer model is shorter: the model converges after 600,000 training iterations, taking 6 days and 7 hours. Pointer models learn faster in the early stages of training; the last 30,000 iterations took approximately 8 hours.
The pointer network is subjected to encoding and decoding operations of information during training.
(1) Basic codec model
The input sentence is read by one recurrent neural network, which compresses the whole sentence into a vector of fixed dimension; another recurrent neural network reads this vector and decodes it into an output sentence in the target language. These two recurrent neural networks are called the Encoder and the Decoder, respectively, so the seq2seq model is also called the encoder-decoder model. The structure diagram of the encoder and decoder is shown in the following figure: A, B, C, D represent the word vectors of the words input to the network model; h1, h2, h3, h4 represent the hidden states output by the encoder's hidden layer; s1, s2, s3, s4 represent the hidden states of the decoder; and X, Y, Z are the prediction outputs of the decoder.
(2) Encoder decoder model of attention mechanism
In the seq2seq model, the encoder condenses the complete input sentence into a fixed-dimension vector and passes it to the decoder, which predicts the output sentence from that vector. However, when the input sentence is long, a fixed-dimension intermediate vector can hardly store enough information, and this becomes the bottleneck of the basic seq2seq model. The Attention mechanism was proposed to address this problem: it allows the decoder to look at the words or segments of the input sentence in the encoder at any time, so the intermediate vector no longer has to store all the information. The structure diagram of the attention mechanism is shown in the figure, in which A, B, C, D and E represent the word vectors input to the hidden layer, the rectangular blocks represent the hidden layers of the bidirectional recurrent neural network, and h0, h1, h2, h3 represent the output states of the encoder's hidden layer. t_{j−1}, t_j, t_{j+1} also denote rectangular hidden-layer blocks; besides the output of the previous moment, the hidden state output by the hidden unit at the previous moment is also used as the input of the hidden unit at the next moment.
The decoder takes the hidden state as the input of the query in each decoding step, inputs the hidden state into the encoder to query the hidden state of the encoder, calculates a weight of the degree of correlation with the query at each input position, then calculates a weighted average of the hidden state of each input position according to the weight, and obtains a vector called a context vector after weighted average, which represents the original text information most relevant to the currently output word. When the next word is decoded, the context vector is input as additional information into the recurrent neural network of the decoder, so that the decoder can read the most relevant original information to the current output at any time without completely depending on the hidden state at the previous moment.
The mathematical definition of the attention mechanism, according to the calculation in the attention-mechanism model, is:
e_{t,i} = f(s_t, h_i)
a_t = softmax(e_t)
c_t = Σ_i a_{t,i} · h_i
where f is a function that computes the degree of correlation between each word of the source text and the current decoding state s_t; the most commonly used f is a small neural network with a single hidden layer. The weights a_t are computed by the softmax function, and the context vector c_t is computed by weighted averaging; this context vector can be regarded as a fixed-dimension vector of information read from the source text at each step.
This algorithm is mainly used for semantic extraction: it can clean and classify the raw data in the system, formatting non-compliant text information into standard-compliant metadata information for subsequent management.
From the principle of the attention mechanism, the text information input to the encoder is converted into word vectors and fed into the hidden layer; the output state of the hidden layer is h_i, and a_i represents the weight between the encoder hidden state and the hidden state output by the decoder's hidden layer. t_{j−1}, the output of the decoder at time j−1, is also used as the decoder's input at time j. From the internal details of the conventional mechanism, after the context vector of step j has been computed it is fed in at time j+1 as the input of the next moment; through the context vector the decoder can query the information most relevant to the source text at every decoding step, which avoids the information-bottleneck problem of the basic seq2seq model.
As the figure shows, the encoder with the attention mechanism uses a bidirectional recurrent neural network. Choosing a bidirectional network is essential in the attention model: when the decoder predicts a word through attention, it usually needs some of the information surrounding that word; with a unidirectional recurrent network, the hidden state of each word would contain only the textual information to its left and none to its right. Using a bidirectional recurrent neural network lets the hidden state of each word carry information from both sides at once.
The attention mechanism, besides using the bidirectional recurrent neural network, also removes the direct connection between the encoder and the decoder: the decoder relies entirely on the attention mechanism to obtain the source information, so the encoder and decoder can each freely choose their neural network model. The attention mechanism is an efficient way to obtain information: it allows the decoder to query the most relevant input information at each step, greatly shortening the distance over which information flows.
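The attention computation defined above (correlation scores e_t, softmax weights a_t, weighted-average context vector c_t) can be sketched numerically. Dot-product scoring is used here as a simple stand-in for the single-hidden-layer scoring network f; the two encoder states are illustrative.

```python
# Minimal numeric sketch of the attention formulas: e_{t,i} = f(s_t, h_i),
# a_t = softmax(e_t), and the context vector c_t as a weighted average of the
# encoder hidden states.
import math

def attention(s_t, H):
    e = [sum(si * hi for si, hi in zip(s_t, h)) for h in H]  # e_{t,i}: dot-product f
    mx = max(e)
    exp = [math.exp(x - mx) for x in e]                      # numerically stable
    a = [x / sum(exp) for x in exp]                          # a_t = softmax(e_t)
    ctx = [sum(a[i] * H[i][d] for i in range(len(H)))        # c_t: weighted average
           for d in range(len(H[0]))]
    return a, ctx

H = [[1.0, 0.0], [0.0, 1.0]]       # encoder hidden states h_1, h_2
a, ctx = attention([1.0, 0.0], H)  # decoder state s_t aligned with h_1
print(round(a[0], 3))              # larger weight on the matching encoder state
```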
(3) Self-attention mechanism encoder decoder model
The encoder-decoder model with the basic attention mechanism lets the decoder extract semantic features of the input through the internal attention computation, but the extracted semantic features are weak and the correlations within the input information cannot be captured, so the generated text summary is weak as well. To better extract the semantic relations inside the input text, the self-attention mechanism is introduced for news-summary generation, in the expectation of achieving better semantic extraction.
(4) Constructing text abstract model
The hidden-layer output of a text summarization model encoder with a self-attention mechanism differs from that of the basic encoder-decoder model with attention: the self-attention mechanism computes the degree of correlation between every pair of words in the input sequence, extracting more semantic information. The structure diagram of the self-attention mechanism is shown in the following figure:
therefore, in order to better perform relevant calculation analysis on the acquired data and upgrade the algorithm and the model, a corresponding calculation server needs to be added for support on the basis of the first-stage construction.
As shown in fig. 6:
System access flow: after data extraction, the data source is stored in the data platform and the obtained metadata are placed in the metadata subsystem, where developers can query the related metadata. Data-change impact evaluation can be applied to the data service system: the metadata subsystem performs impact analysis on the data change and feeds the impact back to the data service system through a metadata administrator. Problems in downstream systems are located by lineage analysis.
Lineage, impact, full-chain and table-association analysis: data are obtained from different data sources and then converted as required; the extracted data are transformed according to pre-designed rules, unnecessary or unusable data are cleaned out, originally heterogeneous data formats are unified, and, combined with the data's metadata, statistical analysis is performed to reach regular, conclusive analyses.
Lineage analysis: knowing, across tools, the source and destination of the flow and variation of data in the system.
Impact analysis: tracking the enterprise-wide impact of data changes across tools.
The system provides lifecycle management of metadata and version management functions, which ensure the quality of metadata and the authority and reliability of subsequent use of the metadata system.
The overall technical scheme of the invention is as follows:
1. Initial data are acquired through data interfaces.
2. The data are checked against the different required standards to ensure their availability and uniqueness.
3. Metadata are extracted, then summarized and sorted, for example by formatting them or by attaching tag-like metadata to the data, to facilitate subsequent processing.
4. By comparing all values of the same metadata field across a large amount of information and computing statistics, regular conclusions can be obtained.
5. Specific data relationships are searched for according to requirements and can be represented visually with front-end visualization technologies, enhancing the acceptability of the data.
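Step 4 above can be sketched as a simple frequency statistic over one metadata field. The `category` tag and the record layout are hypothetical illustrations, not structures defined by the patent:

```python
from collections import Counter

def metadata_statistics(tagged_records, field):
    """Compare all values of the same metadata field across many records
    and count them, yielding a regular (pattern-like) conclusion."""
    return Counter(rec["meta"][field]
                   for rec in tagged_records
                   if field in rec["meta"])

records = [
    {"data": "...", "meta": {"category": "loan"}},
    {"data": "...", "meta": {"category": "loan"}},
    {"data": "...", "meta": {"category": "deposit"}},
]
stats = metadata_statistics(records, "category")
print(stats.most_common(1))  # the dominant category and its count
```

The resulting counts are exactly the kind of aggregate a front-end chart component could then render in step 5.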
Financial metadata are maintained, retrieved, updated, lifecycle-managed, and imported and exported. Metadata maintenance mainly refers to querying, modifying and deleting the basic information, attributes, associations and other information of the financial attribute data. Metadata retrieval screens, according to search conditions, the metadata that the user's financial-data access permissions allow. Custom multi-dimensional analysis and export mainly provide export of the financial system metadata and of analysis results: all data to be exported are written from the financial metadata tree to a file, and the analysis results (chiefly impact analysis, lineage analysis and data warehouse mapping analysis) support both data export and picture export. Version management applies strict whole-lifecycle processes to the release, deletion and state changes of financial data, providing data lifecycle management that ensures stable data quality and the reliability of systems that subsequently use the metadata.
The preferred embodiments of the invention disclosed above are intended to be illustrative only. The preferred embodiments are not intended to be exhaustive or to limit the invention to the precise embodiments disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best utilize the invention. The invention is limited only by the claims and their full scope and equivalents.
Claims (8)
1. A method for realizing financial big data analysis-oriented visualization is characterized by comprising the following steps:
s01, metadata collection and management: a step for interacting with the financial system to acquire initial data, and for maintaining and acquiring financial metadata according to the basic information, attributes and associations of the financial attribute data; the system provides metadata lifecycle management and version management functions, thereby ensuring the quality of the metadata and the authority and reliability of the subsequent metadata system;
s02, data quality management: the system is used for managing data quality problems, comprises data quality index management, index scheduling and execution management, problem management and data quality analysis management, and is realized by an interface layer, a response layer, a functional layer, a system management layer and a storage layer;
s03, data standardization: a step for standardizing the acquired financial metadata to obtain standardized data, adopting either a comprehensive or a progressive standardization mode and comprising standardized-object selection, word standardization, domain standardization and expression standardization;
s04, data warehouse management: a step for storing the standardized financial metadata in the form of a data warehouse, including data extraction, data cleansing and conversion, and data warehouse logging and warning transmission;
s05, data visualization: a step for acquiring, from the background, data in the data warehouse according to the data source, the execution condition of data acquisition tasks and the backup condition of system data; specifically, the front end acquires data from the background through the H5 technology, the background processes the analysis data and returns the result with a JSONObject, the front end obtains and parses the returned result, determines the last link entering the background, and displays the relevant information of the data extraction task on the interface through the ECharts visualization technology;
s06, data analysis: the convolutional neural network and the text-information processing network of a deep learning network model are analyzed from three aspects: time complexity, space complexity, and their influence on the model.
2. The method for implementing visualization oriented to financial big data analysis according to claim 1, wherein in the step of S01, the step of collecting initial data includes:
t01, establishing adapter: including but not limited to JDBC, EXCEL, template, hive, DB data dictionary adapters;
t02, establishing a suspension point: a meta-model needs to be selected to be matched with the adapter;
t03, creating data source: selecting an adapter, an adapter version, an acquisition mode, a suspension point and parameter configuration; judging whether the audit is needed, if so, entering a step T04, and if not, entering a step T05;
t04, interacting with a quality retrieval system;
t05, acquisition template management: template customization and template mapping;
t06, manual acquisition: selecting a data source through collection-task management, uploading a template, and starting collection; viewing the generated collection log, which displays errors and progress; for automatic acquisition, task configuration may be performed: setting the acquisition time, running immediately, and ending the acquisition process;
t07, view distribution: the distribution of collected data to different levels of the user view is managed at the view.
3. The method for implementing visualization oriented to financial big data analysis according to claim 1, wherein the data quality index management specifies checking rules for six typical data quality problems according to the eight-major-element specification of data quality, comprising: non-null checking, uniqueness checking, primary/foreign key checking, length checking, code checking and consistency checking; the data quality check covers eight major elements: integrity, legality, uniqueness, consistency, accuracy, timeliness, safety and extensibility;
the index scheduling and execution management is the process of executing the checking indexes, i.e. of checking for data quality problems in the source system, and finds such problems in an automatic or manual manner;
the problem management is realized by a quality problem management module and is divided into automatic management and manual management of checking problems; lineage analysis, impact analysis, detail viewing, export functions and process management are provided for the checking problems;
the data quality analysis management is used for performing distributed analysis on the data quality checking results, including index query, viewing of trend analysis views, and viewing and downloading of data quality reports; through a graphical chart interface, the causes and historical trends of a queried problem are quickly located, assisting data managers in solving data quality problems.
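Three of the checking rules listed in this claim (non-null, uniqueness and length checking) can be sketched as follows. The record layout and thresholds are hypothetical illustrations, not definitions from the patent:

```python
# Each check returns the rows that violate the rule, so the quality
# problem management module can route them for automatic or manual handling.

def non_null_check(rows, field):
    """Rows whose value for `field` is missing (null)."""
    return [r for r in rows if r.get(field) is None]

def uniqueness_check(rows, field):
    """Rows whose value for `field` duplicates an earlier row."""
    seen, dups = set(), []
    for r in rows:
        if r[field] in seen:
            dups.append(r)
        seen.add(r[field])
    return dups

def length_check(rows, field, max_len):
    """Rows whose value for `field` exceeds the allowed length."""
    return [r for r in rows if len(str(r[field])) > max_len]

rows = [{"id": "A1"}, {"id": "A1"}, {"id": None}, {"id": "TOO_LONG_ID"}]
print(len(non_null_check(rows, "id")),
      len(uniqueness_check(rows, "id")),
      len(length_check(rows, "id", 8)))
```

Primary/foreign key, code and consistency checking would follow the same pattern: a rule function that returns the violating rows.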
4. The method for implementing visualization oriented to financial big data analysis as claimed in claim 1, wherein the selecting of standardized objects in the step of S03 includes the steps of: specifying a detailed execution plan, determining standardized principles, defining standardized guidelines including naming rules, selecting standardized objects, collecting source data;
the word normalization comprises the steps of: selecting a reference dictionary, analyzing morphemes, defining words, grouping English and abbreviation naming synonyms, and constructing a standard word dictionary;
the domain standard comprises an analysis data type, a domain classification and selection standard, a definition domain, a data type and a length of the definition domain, and a standard word dictionary is constructed after a standard domain dictionary is constructed;
the expression standardization comprises the steps of applying standardization to a data model, judging the compliance of the expression, defining the expression, constructing a standard expression dictionary, and then constructing a standard word dictionary.
5. The method for implementing visualization oriented to financial big data analysis according to claim 1, wherein the data extraction in the data warehouse management of step S04 proceeds as follows: first clarify how many business systems the data come from, which DBMS each business system's database server runs, whether manual data exist and how large their volume is, and whether unstructured data exist; only after this information is collected can the data extraction be designed, which comprises:
for data sources using the same DBMS as the database storing the DW: the DBMS (SQL Server, Oracle) provides a database link function, so a direct link is established between the DW database server and the original business system and a Select statement can be written for direct access;
for data sources different from the DW database system: a database link, e.g. between SQL Server and Oracle, is established through ODBC; if no database link can be established, the task is completed in one of two ways: exporting the source data to txt or xls files with a tool and then importing these source-system files into the ODS, or using a program interface;
for file-type data sources (.txt, .xls): business personnel are trained to import the data into a specified database with a database tool, from which the data are then extracted;
for the problem of incremental updates: the business system records the time at which each transaction occurs, which serves as the increment marker; before each extraction, the maximum time recorded in the ODS is determined, and all records later than that time are then extracted.
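The incremental-update rule in this claim can be sketched as the following query pattern. The table and column names are hypothetical, and an in-memory sqlite3 database stands in for the business DBMS purely for illustration:

```python
import sqlite3

# Hypothetical ODS and source tables; a real system would use the
# business database's own DBMS rather than sqlite3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ods_trades(id INTEGER, occurred_at TEXT);
    CREATE TABLE src_trades(id INTEGER, occurred_at TEXT);
    INSERT INTO ods_trades VALUES (1, '2020-02-20 09:00:00');
    INSERT INTO src_trades VALUES (1, '2020-02-20 09:00:00'),
                                  (2, '2020-02-21 10:00:00');
""")

# 1. Determine the maximum time already recorded in the ODS ...
(max_time,) = conn.execute("SELECT MAX(occurred_at) FROM ods_trades").fetchone()

# 2. ... then extract every source record later than that time.
new_rows = conn.execute(
    "SELECT id, occurred_at FROM src_trades WHERE occurred_at > ?",
    (max_time,),
).fetchall()
print(new_rows)  # only the record newer than the ODS high-water mark
```

Because ISO-formatted timestamps sort lexicographically, a plain string comparison implements the "later than this time" rule.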
6. The method for implementing visualization oriented to financial big data analysis according to claim 1, wherein the data cleansing in the data warehouse management of step S04 is used for filtering out unqualified data, comprising incomplete data, erroneous data and repeated data; the filtered results are sent to the responsible business department, which decides whether they are discarded or corrected by the business unit before being extracted;
the data conversion in the data warehouse management of step S04 is used to perform inconsistent-data conversion, data-granularity conversion and business-rule calculation;
the data warehouse logs in the data warehouse management of step S04 comprise execution-process logs, error logs and general logs, and are used for monitoring the operation of the data warehouse at any time and finding errors promptly;
the warning transmission in step S04 is used for forming a data warehouse error log when an error occurs in the data warehouse and sending a warning to the system administrator; the warning is sent as a mail to the system administrator with the error information attached, so that the administrator can conveniently check the error.
7. The method for implementing visualization oriented to financial big data analysis as claimed in claim 1, wherein the data visualization in step S05 is implemented by high-dimensional visualization components including but not limited to trend graph, radar graph, 3D scatter diagram, network diagram, hierarchy diagram, word cloud diagram.
8. The method for implementing visualization oriented to financial big data analysis of claim 1, wherein the data analysis of step S06 employs a time complexity model covering a single convolutional layer, a space complexity model covering the access amount, the model training/prediction time determined by the time complexity, and a pointer network model; the pointer network model comprises a basic encoder-decoder model, an attention-mechanism encoder-decoder model, a self-attention-mechanism encoder-decoder model and the constructed text summarization model.
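As a rough illustration of the single-convolutional-layer time complexity model mentioned in this claim, the per-layer computational cost is commonly estimated as output-map area times kernel area times input and output channel counts. This standard estimate and the example layer sizes below are not formulas given in the patent:

```python
def conv_layer_flops(out_size, kernel_size, c_in, c_out):
    """Approximate multiply-accumulate count of one convolutional layer.

    Standard estimate O(M^2 * K^2 * C_in * C_out), where M is the side
    length of the output feature map and K the kernel side length.
    """
    return out_size ** 2 * kernel_size ** 2 * c_in * c_out

def network_flops(layers):
    """Total time-complexity estimate: the sum over all convolutional layers."""
    return sum(conv_layer_flops(*layer) for layer in layers)

# A small hypothetical two-layer network:
# 32x32 output, 3x3 kernel, 3 -> 16 channels, then 16x16, 3x3, 16 -> 32.
layers = [(32, 3, 3, 16), (16, 3, 16, 32)]
print(network_flops(layers))
```

The time complexity in turn bounds the model's training and prediction time, which is why the claim ties the two together.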
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010106706.1A CN111324602A (en) | 2020-02-21 | 2020-02-21 | Method for realizing financial big data oriented analysis visualization |
Publications (1)
Publication Number | Publication Date |
---|---|
CN111324602A true CN111324602A (en) | 2020-06-23 |
Family
ID=71167094
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010106706.1A Pending CN111324602A (en) | 2020-02-21 | 2020-02-21 | Method for realizing financial big data oriented analysis visualization |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111324602A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2001033468A1 (en) * | 1999-11-03 | 2001-05-10 | Accenture Llp | Data warehouse computing system |
US20030204487A1 (en) * | 2002-04-26 | 2003-10-30 | Sssv Muni Kumar | A System of reusable components for implementing data warehousing and business intelligence solutions |
CN109597850A (en) * | 2018-11-22 | 2019-04-09 | 四川省烟草公司成都市公司 | Tobacco integrated information data mart modeling stores platform and data processing method |
Non-Patent Citations (1)
Title |
---|
Jiang Ying; Huang Hui; Lu Wenda; Luo Weiyi: "Research on a Power Full-Business Data Operation Management Platform Based on Big Data Technology" * |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111881224A (en) * | 2020-08-06 | 2020-11-03 | 广东省信息工程有限公司 | Multidimensional data analysis method and system |
CN111949642A (en) * | 2020-08-13 | 2020-11-17 | 中国工商银行股份有限公司 | Data quality control method and device |
CN111949642B (en) * | 2020-08-13 | 2024-07-09 | 中国工商银行股份有限公司 | Method and device for data quality control |
CN111949259A (en) * | 2020-08-14 | 2020-11-17 | 中国工商银行股份有限公司 | Risk decision configuration method, system, electronic equipment and storage medium |
CN112052298A (en) * | 2020-09-11 | 2020-12-08 | 武汉众腾智创信息技术有限公司 | Multidimensional data acquisition and accurate correlation system and method thereof |
CN112052298B (en) * | 2020-09-11 | 2024-03-15 | 武汉众腾智创信息技术有限公司 | Multidimensional data acquisition and accurate association system and method thereof |
CN112256806A (en) * | 2020-11-04 | 2021-01-22 | 成都市食品药品检验研究院 | Method and system for constructing risk information base in whole course of food production and operation |
CN112256806B (en) * | 2020-11-04 | 2021-05-18 | 成都市食品药品检验研究院 | Method and system for constructing risk information base in whole course of food production and operation |
CN114647344A (en) * | 2020-12-21 | 2022-06-21 | 京东科技控股股份有限公司 | Data processing method and device |
CN112465075B (en) * | 2020-12-31 | 2021-05-25 | 杭银消费金融股份有限公司 | Metadata management method and system |
CN112699100A (en) * | 2020-12-31 | 2021-04-23 | 天津浪淘科技股份有限公司 | Management and analysis system based on metadata |
CN112465075A (en) * | 2020-12-31 | 2021-03-09 | 杭银消费金融股份有限公司 | Metadata management method and system |
CN113032495A (en) * | 2021-03-23 | 2021-06-25 | 深圳市酷开网络科技股份有限公司 | Multi-layer data storage system based on data warehouse, processing method and server |
CN113204684A (en) * | 2021-03-23 | 2021-08-03 | 厦门速言科技有限公司 | Intelligent question-answer interaction robot |
CN113032495B (en) * | 2021-03-23 | 2023-08-01 | 深圳市酷开网络科技股份有限公司 | Multi-layer data storage system, processing method and server based on data warehouse |
CN112947263A (en) * | 2021-04-20 | 2021-06-11 | 南京云玑信息科技有限公司 | Management control system based on data acquisition and coding |
CN113239188A (en) * | 2021-04-21 | 2021-08-10 | 上海快确信息科技有限公司 | Financial transaction conversation information analysis technical scheme |
CN113610113A (en) * | 2021-07-09 | 2021-11-05 | 中国银行股份有限公司 | Data visualization method and device |
CN113849546A (en) * | 2021-09-08 | 2021-12-28 | 国家电网公司东北分部 | System based on electric power K line analysis data |
CN114490602A (en) * | 2022-01-10 | 2022-05-13 | 杭州数查科技有限公司 | Multidimensional data management method based on data analysis and database system |
CN115190026A (en) * | 2022-05-09 | 2022-10-14 | 广州中南网络技术有限公司 | Internet digital circulation method |
CN115794798A (en) * | 2022-12-12 | 2023-03-14 | 江苏省工商行政管理局信息中心 | Market supervision informationized standard management and dynamic maintenance system and method |
CN115794798B (en) * | 2022-12-12 | 2023-09-15 | 江苏省工商行政管理局信息中心 | Market supervision informatization standard management and dynamic maintenance system and method |
CN116644151A (en) * | 2023-05-15 | 2023-08-25 | 绵阳市商业银行股份有限公司 | Intelligent system for applying NLP and ML to data standard alignment |
CN116644151B (en) * | 2023-05-15 | 2024-03-22 | 绵阳市商业银行股份有限公司 | Intelligent system for applying NLP and ML to data standard alignment |
CN116665020A (en) * | 2023-07-31 | 2023-08-29 | 国网浙江省电力有限公司 | Image recognition method, device, equipment and storage medium based on operator fusion |
CN116665020B (en) * | 2023-07-31 | 2024-04-12 | 国网浙江省电力有限公司 | Image recognition method, device, equipment and storage medium based on operator fusion |
CN117648388A (en) * | 2024-01-29 | 2024-03-05 | 成都七柱智慧科技有限公司 | Visual safe real-time data warehouse implementation method and system |
CN117648388B (en) * | 2024-01-29 | 2024-04-12 | 成都七柱智慧科技有限公司 | Visual safe real-time data warehouse implementation method and system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111324602A (en) | Method for realizing financial big data oriented analysis visualization | |
US10740396B2 (en) | Representing enterprise data in a knowledge graph | |
US7165036B2 (en) | System and method for managing a procurement process | |
AU2020200130A1 (en) | Ai-driven transaction management system | |
US20040167870A1 (en) | Systems and methods for providing a mixed data integration service | |
US20100070463A1 (en) | System and method for data provenance management | |
CN112000656A (en) | Intelligent data cleaning method and device based on metadata | |
CN111611458A (en) | Method for realizing system data architecture combing based on metadata and data analysis technology in big data management | |
WO2012057728A1 (en) | Providing information management | |
US20170139966A1 (en) | Automated analysis of data reports to determine data structure and to perform automated data processing | |
CN114880405A (en) | Data lake-based data processing method and system | |
JP6375029B2 (en) | A metadata-based online analytical processing system that analyzes the importance of reports | |
US8688499B1 (en) | System and method for generating business process models from mapped time sequenced operational and transaction data | |
CN117892820A (en) | Multistage data modeling method and system based on large language model | |
Li | Data quality and data cleaning in database applications | |
CN108549672A (en) | A kind of intelligent data analysis method and system | |
US11893008B1 (en) | System and method for automated data harmonization | |
Toivonen | Big data quality challenges in the context of business analytics | |
US20090012919A1 (en) | Explaining changes in measures thru data mining | |
CN115982429A (en) | Knowledge management method and system based on flow control | |
CN114926082A (en) | Artificial intelligence-based data fluctuation early warning method and related equipment | |
Ahmed et al. | Generating data warehouse schema | |
Ahuja et al. | Data: Its Nature and Modern Data Analytical Tools | |
CN112115699B (en) | Method and system for analyzing data | |
CN118820325A (en) | Account period data processing method, system, equipment and medium based on Microsoft 365 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
AD01 | Patent right deemed abandoned |
Effective date of abandoning: 20240419 |