CN113568990A - Management system of data warehouse model - Google Patents

Info

Publication number
CN113568990A
Authority
CN
China
Prior art keywords
model
data
model index
index
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111019380.XA
Other languages
Chinese (zh)
Inventor
郭青松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Zhongtongji Network Technology Co Ltd
Original Assignee
Shanghai Zhongtongji Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Zhongtongji Network Technology Co Ltd
Priority to CN202111019380.XA
Publication of CN113568990A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/283 Multi-dimensional databases or data warehouses, e.g. MOLAP or ROLAP
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/211 Schema design and management
    • G06F16/212 Schema design and management with details for data modelling support
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/285 Clustering or classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention relates to a management system of a data warehouse model, comprising a standard model development module, a model monitoring module and a model index hierarchical classification module. The standard model development module is used for detecting model index data according to a preset model development standard to obtain a detection result and sending the detection result to a preset terminal. The model monitoring module is used for acquiring quality information of the model index data and acquiring a monitoring result of the model index data according to execution data of a preset model index monitoring rule, the preset model index monitoring rule being determined according to the quality information of the model index data. The model index hierarchical classification module is used for obtaining model index data to be classified and carrying out model index classification on it to obtain model index hierarchical classification metadata. The invention improves the quality of the data model and promotes adoption of the data model; because most of the model data is processed automatically, the invention also has the advantage of low labor cost.

Description

Management system of data warehouse model
Technical Field
The invention relates to the technical field of data model processing, in particular to a management system of a data warehouse model.
Background
With the rapid development of the express delivery industry, business domains keep expanding and the data volume grows geometrically. To respond quickly to the needs of business, analysis, application and other parties, the data models of the data warehouse were developed in the early stage in a chimney-style (siloed) manner. As a result, the data models of the current data warehouse have no systematic constraints on specification, definition, timeliness, quality and other aspects.
To address these problems, the related art scores a recommendation index for each data model at the model level to obtain model quality data of the data model. However, such coarse-grained data makes it difficult to improve the quality of the data model or to promote its adoption, and enormous communication effort is still required for specific model indexes.
Disclosure of Invention
In view of this, a management system for a data warehouse model is provided to solve the problems that the quality of the data model and the popularization effect of the data model are difficult to improve and the labor cost is high in the related art.
The invention adopts the following technical scheme:
a management system for a data warehouse model, comprising: a standard model development module, a model monitoring module and a model index hierarchical classification module;
the standard model development module is used for detecting model index data according to a preset model development standard to obtain a detection result and sending the detection result to a preset terminal;
the model monitoring module is used for acquiring quality information of model index data and acquiring a model index data monitoring result according to execution data of a preset model index monitoring rule; the preset model index monitoring rule is determined according to the model index data quality information;
the model index hierarchical classification module is used for obtaining model index data to be classified and carrying out model index classification on the model index data to be classified to obtain model index hierarchical classification metadata.
Preferably, the management system of the data warehouse model further comprises a model index evaluation module;
the model index evaluation module is used for carrying out hierarchical classification on the model index hierarchical classification metadata according to a preset model index hierarchical classification standard.
Preferably, the classification result of the hierarchical classification includes core, important, common and general.
Preferably, the model index data includes a magnitude, a calculation caliber (the basis on which the index is computed), a model index definition and a business module.
Preferably, the performing model index classification on the model index data to be classified includes:
carrying out data cleaning on the model index data to be classified based on feature engineering;
and carrying out model index classification on the model index data to be classified after data cleaning based on a decision tree algorithm.
By adopting the above technical scheme, the invention provides a management system of a data warehouse model, comprising: a standard model development module, a model monitoring module and a model index hierarchical classification module; the standard model development module is used for detecting model index data according to a preset model development standard to obtain a detection result and sending the detection result to a preset terminal; the model monitoring module is used for acquiring quality information of model index data and acquiring a monitoring result of the model index data according to execution data of a preset model index monitoring rule; the preset model index monitoring rule is determined according to the quality information of model index data; the model index hierarchical classification module is used for obtaining model index data to be classified and carrying out model index classification on the model index data to be classified to obtain model index hierarchical classification metadata.
The beneficial effects are as follows: the standard model development module ensures that the model indexes of the data model are clearly defined; the model monitoring module determines the model index monitoring rules and monitoring results of the data model, providing quantitative targets for optimizing the data model; and the model index hierarchical classification module classifies the model index data to avoid repeated development of model indexes. On this basis, the quality of the data model is improved and its adoption is promoted, and because most of the processing is automated, labor cost is low.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is an architecture diagram of a management system of a data warehouse model according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without any inventive step, are within the scope of the present invention.
Fig. 1 is an architecture diagram of a management system of a data warehouse model according to an embodiment of the present invention. As shown in Fig. 1, the management system of the data warehouse model of the present embodiment includes: a standard model development module 11, a model monitoring module 12 and a model index hierarchical classification module 13.
The standard model development module 11 is configured to detect model index data according to a preset model development standard, obtain a detection result, and send the detection result to a preset terminal.
Specifically, the model index data includes model indexes of new data models and model indexes of historical data models. When a model developer develops a data model, the system receives the model indexes of the corresponding new data model, detects the model index data according to the preset model development standard to obtain a detection result, and sends the detection result to the model developer's terminal, so that the developer can review the result and enter correct model index data accordingly. In addition, the system scans the model indexes of historical data models to find those that do not conform to the preset model development standard and sends them to the terminal of the corresponding model owner, so that the model owner can correct the model indexes of the non-conforming data model. In detail, the preset model development standard includes a specification for defining model indexes. Through the standard model development module 11, each model index is therefore clearly defined and its applicable scenarios are clear, so application developers can quickly select a specific and appropriate model index, which improves their development efficiency.
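The detection step described above can be sketched as follows. This is a minimal illustration only: the field names, the minimum definition length and the `notify` stand-in are assumptions and are not specified in the source, which states only that model index data is checked against a preset development standard and that the result is sent to a preset terminal.

```python
from dataclasses import dataclass

# Hypothetical field names. The source says only that model index data includes
# a magnitude, a calculation caliber, a definition and a business module.
REQUIRED_FIELDS = ("magnitude", "calculation_caliber", "definition", "business_module")


@dataclass
class DetectionResult:
    index_name: str
    violations: list

    @property
    def passed(self) -> bool:
        return not self.violations


def detect(index_name: str, index_data: dict, min_definition_length: int = 10) -> DetectionResult:
    """Check one model index against an illustrative development standard."""
    violations = []
    for field in REQUIRED_FIELDS:
        # Treat absent, None and blank values as missing.
        if not str(index_data.get(field) or "").strip():
            violations.append(f"missing field: {field}")
    definition = str(index_data.get("definition") or "").strip()
    if definition and len(definition) < min_definition_length:
        violations.append("definition too short to be unambiguous")
    return DetectionResult(index_name, violations)


def notify(terminal_id: str, result: DetectionResult) -> str:
    """Stand-in for sending the detection result to a preset terminal."""
    status = "OK" if result.passed else "; ".join(result.violations)
    return f"[{terminal_id}] {result.index_name}: {status}"
```

The same `detect` function could be applied both to indexes submitted with a new model and to a batch scan of historical indexes; only the terminal that receives the message would differ.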
The model monitoring module 12 is configured to obtain model index data quality information, and obtain a model index data monitoring result according to execution data of a preset model index monitoring rule; and the preset model index monitoring rule is determined according to the model index data quality information.
Specifically, the system generates a questionnaire about model usage according to input from model governance personnel, sends the questionnaire to the terminals of the target survey respondents, and obtains the survey result data they return. From the survey result data, the system determines the problems encountered when using the actual model data and the problems with the online model data, and thereby determines the quality information of the model index data. Model governance personnel then analyze, summarize and collate this quality information to obtain the preset model index monitoring rule. The preset model index monitoring rule includes a data quality baseline, which provides a quantitative target for model index optimization and facilitates iterative optimization of the model indexes. The preset model index monitoring rules comprise single-column data quality rules, cross-row data quality rules and cross-table data quality rules. Specific rules are configured on a big data platform based on these rule categories, and the condition of the model index data is reported by monitoring rule execution every day.
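A minimal sketch of the three rule categories named above and of the comparison against a data quality baseline follows. The concrete checks (non-null, uniqueness, referential existence), the table and column names, and the baseline values are assumptions chosen for illustration; the source does not specify how rules are expressed on the big data platform.

```python
def single_column_rule(rows, column, predicate):
    """Fraction of rows whose value in `column` satisfies `predicate`."""
    if not rows:
        return 1.0
    return sum(1 for r in rows if predicate(r.get(column))) / len(rows)


def cross_row_rule(rows, key):
    """Fraction of rows whose `key` value is unique across all rows."""
    if not rows:
        return 1.0
    counts = {}
    for r in rows:
        counts[r.get(key)] = counts.get(r.get(key), 0) + 1
    return sum(1 for r in rows if counts[r.get(key)] == 1) / len(rows)


def cross_table_rule(rows, column, reference_rows, reference_column):
    """Fraction of rows whose `column` value exists in the reference table."""
    if not rows:
        return 1.0
    known = {r.get(reference_column) for r in reference_rows}
    return sum(1 for r in rows if r.get(column) in known) / len(rows)


def monitor(pass_rates, baseline):
    """Compare each rule's pass rate with its data quality baseline."""
    return {
        name: {"pass_rate": rate, "ok": rate >= baseline[name]}
        for name, rate in pass_rates.items()
    }


# Hypothetical sample data: one duplicate waybill, one null fee, one unknown outlet.
waybills = [
    {"waybill_id": "W1", "outlet_id": "A", "fee": 1.5},
    {"waybill_id": "W2", "outlet_id": "B", "fee": None},
    {"waybill_id": "W2", "outlet_id": "C", "fee": 2.0},
    {"waybill_id": "W3", "outlet_id": "A", "fee": 0.5},
]
outlets = [{"outlet_id": "A"}, {"outlet_id": "B"}]

pass_rates = {
    "fee_not_null": single_column_rule(waybills, "fee", lambda v: v is not None),
    "waybill_id_unique": cross_row_rule(waybills, "waybill_id"),
    "outlet_exists": cross_table_rule(waybills, "outlet_id", outlets, "outlet_id"),
}
report = monitor(pass_rates, {"fee_not_null": 0.7, "waybill_id_unique": 0.99, "outlet_exists": 0.75})
```

Running such checks once a day and storing `report` would give the daily feedback on model index data that the paragraph describes.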
The model index hierarchical classification module 13 is configured to obtain model index data to be classified, and perform model index classification on the model index data to be classified to obtain model index hierarchical classification metadata.
In detail, the model index data to be classified consists of model indexes whose levels are disordered. After the model index data to be classified is obtained, data lineage analysis is performed on the model indexes to be classified, and feature data such as each index's business domain, subject domain, calculation logic and attributes are collected. The model indexes to be classified are normalized through feature engineering and then hierarchically classified to obtain the model index hierarchical classification metadata. In this way, rules and machine learning algorithms are used to classify the indexes hierarchically and to identify similar indexes, which provides a reference for model reconstruction and optimization, avoids developing large numbers of duplicate or similar indexes, and saves computing and storage resources.
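One way to picture lineage-based feature collection and the identification of similar indexes is sketched below. The source does not say how similarity is measured; Jaccard similarity over feature sets is an assumption used here for illustration, and the lineage graph, index names and threshold are hypothetical.

```python
def upstream_sources(lineage, node):
    """Collect every upstream node reachable from `node` (lineage analysis)."""
    seen, stack = set(), list(lineage.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(lineage.get(current, ()))
    return seen


def jaccard(a, b):
    """Size of the intersection divided by size of the union of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def similar_pairs(features, threshold=0.6):
    """Return index pairs whose feature sets overlap at or above `threshold`."""
    names = sorted(features)
    pairs = []
    for i, left in enumerate(names):
        for right in names[i + 1:]:
            score = jaccard(features[left], features[right])
            if score >= threshold:
                pairs.append((left, right, round(score, 2)))
    return pairs


# Hypothetical lineage: each key is computed from the listed upstream nodes.
lineage = {
    "monthly_pickups": ["dwd_waybill"],
    "pickups_per_month": ["dwd_waybill"],
    "transit_cost": ["dwd_transit", "dim_outlet"],
    "dwd_waybill": ["ods_waybill"],
    "dwd_transit": ["ods_transit"],
}
# Hypothetical business-domain, calculation-logic and grain attributes.
attributes = {
    "monthly_pickups": {"domain:order", "logic:count", "grain:month"},
    "pickups_per_month": {"domain:order", "logic:count", "grain:month"},
    "transit_cost": {"domain:transit", "logic:sum", "grain:month"},
}
features = {
    name: attrs | {f"source:{s}" for s in upstream_sources(lineage, name)}
    for name, attrs in attributes.items()
}
duplicates = similar_pairs(features)
```

In this sample, the two pickup indexes share every feature and are flagged as a pair, which is the kind of signal that could prevent a duplicate index from being developed.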
The model index levels comprise three levels: basic indexes, composite indexes and derived indexes. A basic index is a concept that expresses an attribute of a business entity and is atomic (it cannot be decomposed further), such as the electronic waybill fee. A composite index is built on basic indexes: it is obtained from several basic indexes through defined operation rules and cannot be decomposed further from a business perspective, such as a per-ticket index. A derived index is generated by combining a basic index or a composite index with one or more dimension values, such as the number of parcels collected by an outlet in the current month.
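The three levels can be modeled with a small data structure. The sketch below is illustrative: the class and field names, and the rule that infers a level from an index's inputs and dimensions, are assumptions rather than the classification method of the source.

```python
from dataclasses import dataclass
from enum import Enum


class IndexLevel(Enum):
    BASIC = "basic"
    COMPOSITE = "composite"
    DERIVED = "derived"


@dataclass(frozen=True)
class ModelIndex:
    name: str
    inputs: tuple = ()      # names of the indexes this one is computed from
    dimensions: tuple = ()  # dimension values that qualify the index


def level_of(index: ModelIndex) -> IndexLevel:
    """Infer the level from structure: dimensions imply derived, inputs imply composite."""
    if index.dimensions:
        if not index.inputs:
            raise ValueError("a derived index needs a basic or composite input")
        return IndexLevel.DERIVED
    if index.inputs:
        return IndexLevel.COMPOSITE
    return IndexLevel.BASIC


# Hypothetical examples loosely following the ones in the text.
waybill_fee = ModelIndex("electronic_waybill_fee")
fee_per_ticket = ModelIndex("fee_per_ticket", inputs=("electronic_waybill_fee", "ticket_count"))
monthly_pickups = ModelIndex(
    "outlet_monthly_pickups",
    inputs=("pickup_count",),
    dimensions=("outlet", "current_month"),
)
```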
This embodiment adopts the above technical scheme. The management system of the data warehouse model includes: a standard model development module, a model monitoring module and a model index hierarchical classification module; the standard model development module is used for detecting model index data according to a preset model development standard to obtain a detection result and sending the detection result to a preset terminal; the model monitoring module is used for acquiring quality information of model index data and acquiring a monitoring result of the model index data according to execution data of a preset model index monitoring rule; the preset model index monitoring rule is determined according to the quality information of model index data; the model index hierarchical classification module is used for obtaining model index data to be classified and carrying out model index classification on the model index data to be classified to obtain model index hierarchical classification metadata.
The beneficial effects are as follows: the standard model development module ensures that the model indexes of the data model are clearly defined; the model monitoring module determines the model index monitoring rules and monitoring results of the data model, providing quantitative targets for optimizing the data model; and the model index hierarchical classification module classifies the model index data to avoid repeated development of model indexes. On this basis, a machine learning algorithm is integrated into traditional model index development, models are reconstructed and optimized through accurate model index classification, a complete closed loop of model index metadata is established, and a sound model construction process is formed, which improves the quality of the data model and promotes its adoption.
Preferably, the management system of the data warehouse model of the embodiment further comprises a model index evaluation module; the model index evaluation module is used for carrying out hierarchical classification on the model index hierarchical classification metadata according to a preset model index hierarchical classification standard.
In detail, the classification result of the hierarchical classification includes core, important, common and general. The preset model index hierarchical classification standard is determined by four factors: how frequently the index is referenced, whether the index belongs to an important business domain, the inheritance level of the index, and the level of the index's users.
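As a rough sketch, the four factors can be combined into a score that maps to the four grades. The source names the factors and the grades but gives no weights or thresholds, so every number, the user level labels and the function name below are assumptions.

```python
def grade_index(reference_count, in_important_domain, inheritance_depth, user_level):
    """Grade an index as core, important, common or general (illustrative scoring)."""
    score = 0

    # Factor 1: how frequently the index is referenced.
    if reference_count >= 100:
        score += 2
    elif reference_count >= 10:
        score += 1

    # Factor 2: whether the index belongs to an important business domain.
    if in_important_domain:
        score += 2

    # Factor 3: inheritance level, assuming shallower indexes matter more.
    if inheritance_depth <= 1:
        score += 2
    elif inheritance_depth <= 3:
        score += 1

    # Factor 4: level of the index's users (hypothetical labels).
    score += {"executive": 2, "manager": 1}.get(user_level, 0)

    if score >= 7:
        return "core"
    if score >= 5:
        return "important"
    if score >= 3:
        return "common"
    return "general"
```

For example, under these assumed thresholds `grade_index(250, True, 1, "executive")` scores 8 and is graded "core", while an index referenced 20 times from a non-critical domain by analysts is graded "general".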
Preferably, the model index data includes a magnitude, a calculation caliber (the basis on which the index is computed), a model index definition and a business module.
Specifically, there are 21 business modules in total: a customer service module, a user module (including merchants, senders and recipients), a journal module, a member module, a date module, a volume module, a public module, an international module, an outlet (network point) module, a performance module, an address module, a terminal module, a financial module, a transit module, a personnel module (OA module), a road transportation (trucking) module, an aviation module, a timeliness module, an organization module, an order module, and a staff module (including couriers and operators).
Preferably, the performing model index classification on the model index data to be classified includes:
carrying out data cleaning on the model index data to be classified based on feature engineering;
and carrying out model index classification on the model index data to be classified after data cleaning based on a decision tree algorithm.
In detail, data lineage analysis is performed on the model indexes, and index features are collected by combining standard business domains, subject domains and atomic vocabulary. The model indexes are then classified by fitting a decision tree (DT) model. Finally, a level is defined for each index, and metadata such as similarity labels is attached to it.
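The cleaning and classification steps can be illustrated with a small, dependency-free decision tree. This is a sketch under stated assumptions: the source names a decision tree algorithm but not the variant, so a CART-style tree with Gini impurity and equality tests on categorical features is used here; the feature names, the cleaning rules and the training rows are hypothetical.

```python
from collections import Counter


def clean(raw):
    """Normalize feature values: trim, lower-case, and mark missing values."""
    return {k: str(v).strip().lower() if v not in (None, "") else "unknown" for k, v in raw.items()}


def gini(labels):
    """Gini impurity of a list of class labels."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())


def best_split(rows, labels):
    """Return the (feature, value) equality test with the lowest weighted Gini, or None."""
    best, best_score = None, gini(labels)
    for feature in rows[0]:
        for value in sorted({row[feature] for row in rows}):
            left = [l for row, l in zip(rows, labels) if row[feature] == value]
            right = [l for row, l in zip(rows, labels) if row[feature] != value]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (feature, value), score
    return best


def build_tree(rows, labels):
    """Recursively split until no test reduces impurity, then emit a majority leaf."""
    split = best_split(rows, labels)
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    feature, value = split
    yes = [(r, l) for r, l in zip(rows, labels) if r[feature] == value]
    no = [(r, l) for r, l in zip(rows, labels) if r[feature] != value]
    return {
        "test": split,
        "yes": build_tree([r for r, _ in yes], [l for _, l in yes]),
        "no": build_tree([r for r, _ in no], [l for _, l in no]),
    }


def predict(tree, row):
    """Walk the tree from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        feature, value = tree["test"]
        tree = tree["yes"] if row[feature] == value else tree["no"]
    return tree


# Hypothetical, deliberately messy training data (inconsistent case and spacing).
raw_rows = [
    {"has_inputs": "No ", "has_dimensions": "no"},
    {"has_inputs": "no", "has_dimensions": None},
    {"has_inputs": "YES", "has_dimensions": "no"},
    {"has_inputs": "yes", "has_dimensions": " no"},
    {"has_inputs": "yes", "has_dimensions": "Yes"},
    {"has_inputs": "yes", "has_dimensions": "yes "},
]
labels = ["basic", "basic", "composite", "composite", "derived", "derived"]

rows = [clean(r) for r in raw_rows]
tree = build_tree(rows, labels)
```

A production system would more likely rely on an established library implementation and far richer features (business domain, subject domain, atomic vocabulary); the point here is only the order of operations: clean first, then fit, then label.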
It is understood that the same or similar parts of the above embodiments may be referred to one another, and that content not described in detail in one embodiment may be found in the same or similar parts of other embodiments.
It should be noted that the terms "first," "second," and the like in the description of the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present invention, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow diagrams or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present invention includes additional implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (5)

1. A management system for a data warehouse model, comprising: a standard model development module, a model monitoring module and a model index hierarchical classification module;
the standard model development module is used for detecting model index data according to a preset model development standard to obtain a detection result and sending the detection result to a preset terminal;
the model monitoring module is used for acquiring quality information of model index data and acquiring a model index data monitoring result according to execution data of a preset model index monitoring rule; the preset model index monitoring rule is determined according to the model index data quality information;
the model index hierarchical classification module is used for obtaining model index data to be classified and carrying out model index classification on the model index data to be classified to obtain model index hierarchical classification metadata.
2. The management system of the data warehouse model of claim 1, further comprising a model index evaluation module;
the model index evaluation module is used for carrying out hierarchical classification on the model index hierarchical classification metadata according to a preset model index hierarchical classification standard.
3. The management system of the data warehouse model of claim 2, wherein the classification result of the hierarchical classification includes core, important, common and general.
4. The management system of the data warehouse model of claim 1, wherein the model index data includes a magnitude, a calculation caliber, a model index definition and a business module.
5. The management system of the data warehouse model of claim 1, wherein said performing model index classification on said model index data to be classified comprises:
carrying out data cleaning on the model index data to be classified based on feature engineering;
and carrying out model index classification on the model index data to be classified after data cleaning based on a decision tree algorithm.
CN202111019380.XA 2021-09-01 2021-09-01 Management system of data warehouse model Pending CN113568990A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111019380.XA CN113568990A (en) 2021-09-01 2021-09-01 Management system of data warehouse model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111019380.XA CN113568990A (en) 2021-09-01 2021-09-01 Management system of data warehouse model

Publications (1)

Publication Number Publication Date
CN113568990A true CN113568990A (en) 2021-10-29

Family

ID=78173355

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111019380.XA Pending CN113568990A (en) 2021-09-01 2021-09-01 Management system of data warehouse model

Country Status (1)

Country Link
CN (1) CN113568990A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106327325A (en) * 2016-08-26 2017-01-11 北京元丁科技有限公司 Bank big data operation management system and method
US20170293641A1 (en) * 2016-04-06 2017-10-12 International Business Machines Corporation Data warehouse model validation
CN110569313A (en) * 2018-05-17 2019-12-13 北京京东尚科信息技术有限公司 Method and device for judging grade of model table of data warehouse
CN111241689A (en) * 2020-01-15 2020-06-05 北京航空航天大学 Method and device for evaluating maturity of model
CN111523811A (en) * 2020-04-24 2020-08-11 同盾控股有限公司 Model verification and monitoring method, system, equipment and storage medium
CN112380189A (en) * 2020-11-17 2021-02-19 国网福建省电力有限公司信息通信分公司 Online management system of data model
CN112672370A (en) * 2020-12-23 2021-04-16 中移(杭州)信息技术有限公司 Method, system, equipment and storage medium for automatically detecting network element index data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination