CN113127455A - Data management method and device, electronic equipment and readable storage medium - Google Patents

Publication number
CN113127455A
Authority
CN
China
Prior art keywords
data
model
data asset
configuration information
asset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911399691.6A
Other languages
Chinese (zh)
Inventor
徐皓
徐文贵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qihoo Technology Co Ltd
Original Assignee
Beijing Qihoo Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qihoo Technology Co Ltd
Priority to CN201911399691.6A
Publication of CN113127455A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/288 Entity relationship models

Abstract

The invention discloses a data management method, a data management device, electronic equipment and a readable storage medium. The method comprises the following steps: receiving data asset configuration information; creating a data asset model corresponding to an entity data source according to the data asset configuration information; and responding to a data asset management request according to the data asset model. The method has the advantages that the configuration information of the data assets can be combined, the data assets can be comprehensively managed through the data asset model, the working efficiency of data asset management can be effectively improved, and the management level and the quality control level of huge data assets can be improved.

Description

Data management method and device, electronic equipment and readable storage medium
Technical Field
The invention relates to the field of data assets, in particular to a data governance method, a data governance device, electronic equipment and a readable storage medium.
Background
With the development of information technology, market participants accumulate large volumes of data closely related to assets and transactions in their daily operations, and this volume continues to grow. The quality of the data, however, is often uneven, and the quality of some data cannot even be evaluated. Effective management and quality maintenance of data assets allow the value of the data to be fully used and mined, which in turn improves enterprise competitiveness. Therefore, to improve the value and management level of data assets, data quality needs to be evaluated with field-level and table-level rules so that users understand the quality of their assets, and data processing work such as tracking-point ("dotting") adjustment, cleaning and optimization needs to be performed according to the evaluation results. However, the prior art manages data assets inefficiently and with poor results, which leads to uneven data quality and prevents the value of the data assets from being fully used.
Disclosure of Invention
In view of the above, the present invention has been made to provide a data governance method, apparatus, electronic device and readable storage medium that overcome or at least partially solve the above problems.
According to an aspect of the present invention, there is provided a data governance method, comprising:
receiving data asset configuration information;
creating a data asset model corresponding to an entity data source according to the data asset configuration information;
and responding to a data asset management request according to the data asset model.
Optionally, the data asset configuration information includes model basic information of at least one of:
the model name, the model category, the security level, the model entity, the data structure of the entity data source, the business table type of the entity data source, the model label and the model description information.
Optionally, the data asset configuration information includes a data storage type; the data storage types include: relational database storage and streaming storage.
Optionally, the data asset configuration information further includes at least one of the following storage information corresponding to the relational database storage:
the storage name, the model physical name and the table type.
Optionally, the data asset configuration information further includes storage information corresponding to the streaming storage, where the storage information includes at least one of:
the storage name, the storage directory, whether subdirectories are included, the data period, the file name, the separator, the file encoding, the compression format, the data format and whether the first row is a header.
Optionally, the data asset configuration information includes field configuration information of at least one of:
the field physical name, the field logical name, the data type, the length, the precision, the default value, whether the field is allowed to be empty, whether a data element is applied, the data identification and the field description information.
Optionally, the data asset configuration information includes model lifecycle information;
the model lifecycle information includes at least one of the following model operations performed in the model lifecycle: no processing for the time being, data archiving and data cleaning.
Optionally, the data asset configuration information comprises data security configuration information;
the responding to a data asset management request according to the data asset model comprises: and responding to the data asset preview request, reading corresponding data from the entity data source according to the data asset model, and displaying the read data after performing security processing on the read data according to the data security configuration information.
Optionally, the data asset configuration information includes data parsing configuration information, and responding to the data asset management request according to the data asset model includes:
and responding to the data asset preview request, analyzing the data in the entity data source according to the data asset model, and displaying an analysis result.
Optionally, the data asset configuration information includes quality audit configuration information, and the responding to the data asset management request according to the data asset model includes:
and responding to the quality audit request, and performing quality audit on the data in the entity data source according to the data asset model.
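By way of illustration only, the preview with security processing and the quality audit described above might be sketched as follows in Python. The masking rule (keep the first two characters), the null-rate metric and all function names are assumptions made for this sketch; the disclosure does not specify them.

```python
def mask_value(value: str, keep: int = 2) -> str:
    """Keep the first `keep` characters and replace the rest with '*'."""
    return value[:keep] + "*" * max(len(value) - keep, 0)


def preview(rows: list[dict], masked_fields: set[str]) -> list[dict]:
    """Return copies of the rows with the configured fields masked."""
    return [
        {
            key: mask_value(str(val)) if key in masked_fields and val is not None else val
            for key, val in row.items()
        }
        for row in rows
    ]


def null_rate(rows: list[dict], field_name: str) -> float:
    """A simple field-level quality rule: the share of empty values."""
    if not rows:
        return 0.0
    return sum(1 for row in rows if row.get(field_name) is None) / len(rows)


# Hypothetical rows read from an entity data source.
rows = [
    {"name": "alice", "phone": "13800001111"},
    {"name": "bob", "phone": None},
]
shown = preview(rows, masked_fields={"phone"})
phone_null_rate = null_rate(rows, "phone")
```

The source rows are left untouched, so the same data can be audited after it has been previewed.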
According to another aspect of the present invention, there is provided a data governance device comprising:
a receiving unit that receives data asset configuration information;
the model unit is used for creating a data asset model corresponding to the entity data source according to the data asset configuration information;
and the management unit responds to the data asset management request according to the data asset model.
Optionally, the data asset configuration information includes model basic information of at least one of:
the model name, the model category, the security level, the model entity, the data structure of the entity data source, the business table type of the entity data source, the model label and the model description information.
Optionally, the data asset configuration information includes a data storage type; the data storage types include: relational database storage and streaming storage.
Optionally, the data asset configuration information further includes at least one of the following storage information corresponding to the relational database storage:
the storage name, the model physical name and the table type.
Optionally, the data asset configuration information further includes storage information corresponding to the streaming storage, where the storage information includes at least one of:
the storage name, the storage directory, whether subdirectories are included, the data period, the file name, the separator, the file encoding, the compression format, the data format and whether the first row is a header.
Optionally, the data asset configuration information includes field configuration information of at least one of:
the field physical name, the field logical name, the data type, the length, the precision, the default value, whether the field is allowed to be empty, whether a data element is applied, the data identification and the field description information.
Optionally, the data asset configuration information includes model lifecycle information;
the model lifecycle information includes at least one of the following model operations performed in the model lifecycle: no processing for the time being, data archiving and data cleaning.
Optionally, the data asset configuration information comprises data security configuration information;
and the management unit is suitable for responding to the data asset preview request, reading corresponding data from an entity data source according to the data asset model, and displaying the read data after performing security processing on the read data according to the data security configuration information.
Optionally, the data asset configuration information includes data parsing configuration information, and the management unit is adapted to, in response to a data asset preview request, parse data in an entity data source according to the data asset model, and display a parsing result.
Optionally, the data asset configuration information includes quality audit configuration information, and the management unit is adapted to respond to a quality audit request and perform quality audit on data in the entity data source according to the data asset model.
In accordance with still another aspect of the present invention, there is provided an electronic apparatus including: a processor; and a memory arranged to store computer executable instructions that, when executed, cause the processor to perform the method according to any one of the above.
According to a further aspect of the invention, there is provided a computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method according to any one of the above.
According to the technical solution of the invention, data asset configuration information is received; a data asset model corresponding to an entity data source is created according to the data asset configuration information; and a data asset management request is responded to according to the data asset model. The advantages are that, by drawing on the configuration information of the data assets, the data assets can be comprehensively managed through the data asset model, the efficiency of data asset management can be effectively improved, and the management and quality control of very large volumes of data assets can be improved.
The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 shows a schematic flow diagram of a data governance method according to one embodiment of the present invention;
FIG. 2 shows a schematic diagram of a data governance device according to one embodiment of the present invention;
FIG. 3 shows a schematic structural diagram of an electronic device according to one embodiment of the invention;
FIG. 4 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
FIG. 1 shows a flow diagram of a data governance method according to one embodiment of the present invention. As shown in FIG. 1, the method includes:
Step S110, receiving data asset configuration information.
The technical solution of the invention may rely on a data asset management and maintenance comprehensive platform, and may be implemented either by governance units integrated in the platform or by an independent service component embedded in an application. When a data governance request exists, the request for the target data asset can be received through a front-end page; the platform then responds to the data governance configuration request and generates the data governance configuration information corresponding to the target data asset.
Step S120, creating a data asset model corresponding to the entity data source according to the data asset configuration information.
Governing the target data asset requires fully parsing its content, which can be achieved by creating a data asset model corresponding to the entity data source. Specifically, the data asset model may be created in advance on the data asset management and maintenance comprehensive platform, so that when the platform receives a data governance request, the content of the target data asset can be read through the data asset model.
Step S130, responding to the data asset management request according to the data asset model.
The data asset model integrates the processing logic for the user's management requests on the target data asset. After the data asset management and maintenance comprehensive platform receives a data management request from the user, it calls the corresponding data asset model and responds to the request according to that model, so as to execute the specific management items for the data asset. Large volumes of data assets can thereby be governed efficiently according to the data asset model.
By drawing on the configuration information of the data assets, the method shown in FIG. 1 manages the data assets comprehensively through the data asset model, which can effectively improve the efficiency of data asset management and the management and quality control of very large volumes of data assets.
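By way of illustration only, steps S110 to S130 might be sketched as follows in Python. The names `DataAssetModel` and `GovernancePlatform`, the configuration keys and the "preview" request type are hypothetical stand-ins chosen for this sketch and are not defined by the disclosure; the entity data source is replaced by an in-memory list.

```python
from dataclasses import dataclass, field


@dataclass
class DataAssetModel:
    """Hypothetical model created from data asset configuration information."""
    name: str
    source: list[dict]                      # stand-in for the entity data source
    fields: list[str] = field(default_factory=list)


class GovernancePlatform:
    """Toy stand-in for the data asset management and maintenance platform."""

    def __init__(self) -> None:
        self._models: dict[str, DataAssetModel] = {}

    def receive_config(self, config: dict, source: list[dict]) -> DataAssetModel:
        # S110: receive the configuration; S120: create the corresponding model.
        model = DataAssetModel(config["model_name"], source, list(config["fields"]))
        self._models[model.name] = model
        return model

    def handle_request(self, model_name: str, request: str) -> list[dict]:
        # S130: respond to a management request according to the model.
        model = self._models[model_name]
        if request == "preview":
            return [{f: row.get(f) for f in model.fields} for row in model.source]
        raise ValueError(f"unsupported request: {request}")


platform = GovernancePlatform()
rows = [{"id": 1, "name": "a", "internal": "x"}]
platform.receive_config({"model_name": "users", "fields": ["id", "name"]}, rows)
preview = platform.handle_request("users", "preview")
```

Only the fields named in the configuration are returned, which is one simple way a model can mediate access to its source.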
In an embodiment of the present invention, in the method, the data asset configuration information includes model basic information of at least one of: the model name, the model category, the security level, the model entity, the data structure of the entity data source, the business table type of the entity data source, the model label and the model description information.
The configuration information of the data asset may reflect specific characteristics of the data asset in detail, for example, the configuration information of the data asset may include model names, model categories, security levels, model entities, data structures of entity data sources, business table types of the entity data sources, model tags, model description information, and other model basic information. Therefore, the specific conditions of the model in the data asset configuration information can be reflected more completely through the specific configuration information.
In one embodiment of the present invention, in the method, the data asset configuration information includes a data storage type; the data storage types include: relational database storage and streaming storage.
The data asset configuration information may also specify one of two data storage types: relational database storage and streaming storage. Data assets in relational database storage mainly exist as offline data assets, and the content read from a relational database can be viewed and used directly. Streaming data is a dynamic data set that grows without bound over time; data assets in streaming storage are recorded in the form of logs and typically need to be read and then parsed before they can be viewed conveniently.
In an embodiment of the present invention, in the method, the data asset configuration information further includes at least one of the following storage information corresponding to the relational database storage: the storage name, the model physical name and the table type.
The data asset configuration information may also include storage information corresponding to the relational database storage, such as the storage name, the model physical name and the table type. In this way, the information content of the relational database storage can be clearly indicated.
In an embodiment of the present invention, in the above method, the data asset configuration information further includes at least one of the following storage information corresponding to streaming storage: the storage name, the storage directory, whether subdirectories are included, the data period, the file name, the separator, the file encoding, the compression format, the data format and whether the first row is a header.
In order to describe the characteristics of streaming data more clearly, the configuration information of a streaming data asset may include storage information corresponding to the streaming storage. For example, this information may include the storage name, the storage directory, whether the storage directory includes subdirectories, the data period, the file name, the separator, the file encoding, the compression format, the data format, whether the first row is a header, and the like. The storage characteristics of streaming data assets can thus be reflected more completely through this specific information.
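As a rough sketch of how such streaming storage information could drive parsing, the Python example below reads a delimited, optionally gzip-compressed log payload. The class `StreamStorageConfig` and its attribute names are hypothetical, only a few of the listed items are modelled, and the last storage item is read here as a flag saying whether the first row of the file is a header, which is an interpretation rather than something the disclosure spells out.

```python
import csv
import gzip
import io
from dataclasses import dataclass


@dataclass
class StreamStorageConfig:
    """Hypothetical subset of the streaming storage information."""
    file_name: str
    delimiter: str = ","
    encoding: str = "utf-8"
    compression: str = "none"        # "none" or "gzip" in this sketch
    first_row_is_header: bool = True


def parse_stream_file(raw: bytes, cfg: StreamStorageConfig) -> list[dict]:
    """Decompress, decode and split one file according to its configuration."""
    if cfg.compression == "gzip":
        raw = gzip.decompress(raw)
    text = raw.decode(cfg.encoding)
    rows = list(csv.reader(io.StringIO(text), delimiter=cfg.delimiter))
    if not rows:
        return []
    if cfg.first_row_is_header:
        header, body = rows[0], rows[1:]
    else:
        # No header row: fall back to positional column names.
        header, body = [f"col{i}" for i in range(len(rows[0]))], rows
    return [dict(zip(header, row)) for row in body]


cfg = StreamStorageConfig("access.log.gz", delimiter="|", compression="gzip")
payload = gzip.compress("ts|user\n1|alice\n2|bob\n".encode("utf-8"))
records = parse_stream_file(payload, cfg)
```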
In an embodiment of the present invention, in the method, the data asset configuration information includes field configuration information of at least one of: the field physical name, the field logical name, the data type, the length, the precision, the default value, whether the field is allowed to be empty, whether a data element is applied, the data identification and the field description information.
In order to describe the characteristics of field data more clearly, the field configuration information can be recorded in full in the configuration information of the data asset. For example, it may contain the physical name of the field, the logical name of the field, the data type, the length, the precision, the default value, whether the field is allowed to be empty, whether a data element is applied, the data identification, the description information of the field, and the like. The characteristics of the field data can thus be reflected more completely through this specific information.
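One possible way to put field configuration information to work is shown in the Python sketch below, which fills default values and checks emptiness, type and length for a single record. The `FieldConfig` class, its attribute names and the specific checks are assumptions for illustration; precision, data elements and data identification are left out.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class FieldConfig:
    """Hypothetical subset of the field configuration information."""
    physical_name: str
    logical_name: str
    data_type: type
    length: Optional[int] = None
    default: Any = None
    nullable: bool = True
    description: str = ""


def apply_field_config(row: dict, fields: list[FieldConfig]) -> dict:
    """Fill defaults, then check emptiness, type and length for one record."""
    out = {}
    for f in fields:
        value = row.get(f.physical_name, f.default)
        if value is None:
            if not f.nullable:
                raise ValueError(f"{f.physical_name} must not be empty")
        else:
            if not isinstance(value, f.data_type):
                raise TypeError(f"{f.physical_name} must be {f.data_type.__name__}")
            if f.length is not None and isinstance(value, str) and len(value) > f.length:
                raise ValueError(f"{f.physical_name} exceeds length {f.length}")
        out[f.physical_name] = value
    return out


fields = [
    FieldConfig("user_id", "User ID", int, nullable=False),
    FieldConfig("city", "City", str, length=8, default="unknown"),
]
clean = apply_field_config({"user_id": 7}, fields)
```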
In one embodiment of the present invention, in the method, the data asset configuration information includes model lifecycle information; the model lifecycle information includes at least one of the following model operations performed in the model lifecycle: no processing for the time being, data archiving and data cleaning.
The lifecycle determines when the corresponding operation is executed once the data asset model has been established and certain conditions are met. For example, a length of time may be set as the condition, and the model operation executed once the condition is satisfied may be no processing for the time being, archiving, or cleaning. The lifecycle information of each model can be set accordingly. If no processing is selected, no operation is performed for the time being. If archiving is selected, data older than n days can be archived, the archived data can be retained for a further n days, and the archive path can be set. If cleaning is selected, the fields to be cleaned can be specified and the number of days of data to retain can be set. The data assets can thus be managed by completing the model lifecycle configuration and then executing the corresponding operations.
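The three lifecycle operations could, for example, be applied to date-stamped rows as in the Python sketch below. The `LifecyclePolicy` class, the operation labels "none", "archive" and "clean", and the `day` key are hypothetical; the archive path and retention of archived data are omitted, and cleaning is simplified to dropping whole rows rather than specified fields.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class LifecyclePolicy:
    """Hypothetical lifecycle setting for one data asset model."""
    operation: str             # "none", "archive" or "clean"
    older_than_days: int = 0   # act on rows older than this many days


def apply_lifecycle(rows: list[dict], policy: LifecyclePolicy, today: date):
    """Return (kept_rows, archived_rows) after applying the policy."""
    if policy.operation == "none":
        return list(rows), []                 # no processing for the time being
    cutoff = today - timedelta(days=policy.older_than_days)
    old = [r for r in rows if r["day"] < cutoff]
    kept = [r for r in rows if r["day"] >= cutoff]
    if policy.operation == "archive":
        return kept, old                      # old rows move to the archive
    if policy.operation == "clean":
        return kept, []                       # old rows are discarded
    raise ValueError(f"unknown operation: {policy.operation}")


today = date(2024, 1, 10)
rows = [{"id": 1, "day": date(2024, 1, 1)}, {"id": 2, "day": date(2024, 1, 9)}]
kept, archived = apply_lifecycle(rows, LifecyclePolicy("archive", 7), today)
```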
The collection of the target data assets can be realized by adopting the following embodiments:
In one embodiment of the invention, a data collection request is received, and a data source identifier is parsed from the data collection request.
The technical solution of the invention may rely on a data asset management and maintenance comprehensive platform, and may be implemented either by a target data asset collection unit integrated in the platform or by an independent service component embedded in an application. When a collection request for a target data asset exists, the data collection request can be received through a front-end page; the platform then responds to the collection configuration request and parses a data source identifier from the data collection request so that the target data source can be identified accurately.
In one embodiment of the invention, the storage configuration information of the corresponding data source is obtained according to the data source identifier.
The storage configuration information clearly reflects how the target data asset is stored. To identify the data source of the target data asset accurately, the data source identifier of the target data asset and the storage configuration information of the data source can be placed in one-to-one correspondence in advance, so that when a data collection request exists, the storage configuration information of the corresponding data source can be obtained through the data source identifier.
In one embodiment of the invention, differential collection logic adapted to the storage type of the corresponding data source is selected according to the storage configuration information.
Because target data assets are stored in different ways, the collection logic specific to each storage mode must be taken into account when different data assets are collected. When a data collection request exists, the differential collection logic adapted to the storage type of the corresponding data source can be selected according to the storage configuration information in order to collect the target data asset.
In one embodiment of the invention, metadata is collected from the corresponding data source based on the selected differential collection logic.
By combining the user's collection request for the target data asset with the differential collection logic selected according to the storage configuration information, metadata can be collected from the corresponding data source quickly and efficiently. Relationship data between models, such as upstream and downstream relationships, dependency relationships and calculation relationships, can also be collected. This facilitates efficient maintenance of model relationships and further data analysis.
This embodiment can therefore collect metadata in batches from the storage of the corresponding data source according to the storage configuration information of the data source of the target data asset, in combination with differential collection logic; it collects efficiently and can meet the batch collection requirements of large-scale data assets.
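The four steps above (parse the identifier, look up the storage configuration, select the differential logic, collect the metadata) can be sketched in Python as follows. SQLite from the Python standard library stands in for a real data source, and the registries `STORAGE_CONFIGS` and `COLLECTORS`, the request key `data_source_id` and the identifier "ds-001" are hypothetical names introduced only for this sketch.

```python
import sqlite3

# Hypothetical registry: data source identifier -> storage configuration.
STORAGE_CONFIGS = {
    "ds-001": {"storage_type": "sqlite"},
}


def collect_sqlite(conn: sqlite3.Connection) -> dict:
    """Differential logic for SQLite: read table names and their columns."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    # PRAGMA table_info returns (cid, name, type, notnull, dflt_value, pk).
    return {t: [col[1] for col in conn.execute(f"PRAGMA table_info({t})")]
            for t in tables}


# Hypothetical registry: storage type -> differential collection logic.
COLLECTORS = {"sqlite": collect_sqlite}


def handle_collection_request(request: dict, conn: sqlite3.Connection) -> dict:
    source_id = request["data_source_id"]             # 1. parse the identifier
    storage = STORAGE_CONFIGS[source_id]              # 2. look up storage configuration
    collector = COLLECTORS[storage["storage_type"]]   # 3. select differential logic
    return collector(conn)                            # 4. collect metadata


# An already-open connection is passed in to keep the sketch short.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
metadata = handle_collection_request({"data_source_id": "ds-001"}, conn)
```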
Specifically, the collection of the target data assets can be implemented by the following embodiments:
In an embodiment of the present invention, the receiving of a data collection request in the method includes: presetting data collection interfaces corresponding to data of different levels, and receiving data collection requests through the data collection interfaces.
Target data assets usually differ in data type, level and so on; for example, field-level data assets and table-level data assets can be extracted separately according to the specific application requirements. Data collection interfaces corresponding to data of different levels can therefore be preset, and data collection requests received through them. In this way, requests for data assets at different levels can be received in an orderly manner.
In an embodiment of the present invention, in the method, the data collection interface includes a standard interface and an extended interface; the standard interface is used for receiving data collection requests sent by standard applications, and the extended interface is used for receiving data collection requests sent by third-party applications and/or data collection requests across programming languages.
In the interface layer, the standard interface is used for receiving data collection requests sent by standard applications. For example, the standard interface may serve external Java applications and be shipped with a Driver and a start script, so that no restart is needed when a new data asset storage mode is added and calls remain simple and efficient. The extended interface is used for receiving data collection requests sent by third-party applications and/or data collection requests across programming languages, for example in scenarios that interface with third-party or cross-language applications. Assigning different interfaces to applications according to their differences improves the efficiency of calling the corresponding applications.
In an embodiment of the present invention, the obtaining, according to the data source identifier, storage configuration information of a corresponding data source includes: and determining a data asset model corresponding to the data source identification, and reading storage configuration information of the data asset model.
Because data asset storage modes differ, the storage configuration information needs to be adjusted accordingly. The data source can support many forms of target data asset storage, such as Oracle, MySQL, GBase, Dameng, Kingbase, Hive, HDFS, Kafka, the File Transfer Protocol (FTP), the Secure File Transfer Protocol (SFTP) and the like. Data asset collection can be realized through a data asset model: a data collection model can be created in advance on the data asset management and maintenance comprehensive platform, and when the platform receives a data collection request, the content of the target data asset can be collected through the data collection model. Accordingly, when the storage configuration information of the corresponding data source is obtained through the data source identifier, the data asset model corresponding to the data source identifier can be determined at the same time, and the storage configuration information of that data asset model can be read. Collecting the target data asset thus requires a full analysis of its storage configuration information and collection logic, the corresponding data asset model, the storage configuration information of that model, and so on. In this way the data asset model can be selected accurately, and the target data asset can be read and collected quickly and efficiently.
In an embodiment of the present invention, the collecting metadata from the corresponding data source based on the selected difference collecting logic includes: and according to the general acquisition logic and the selected differential acquisition logic, acquiring metadata from a corresponding data source through a Mybatis component and a JDBC component.
To further increase the collection efficiency of data assets, the general collection logic may be combined with the selected differential collection logic. Specifically, a common collection logic layer, the Common Collector, may be provided. It is responsible for the main collection flow control, transactions and common logic, shields the logic differences of the underlying collection, and provides a uniform call entry for the data asset collection interface. The differential collection logic layer handles the collection logic differences between storage modes; for example, a MySQL Collector, an Oracle Collector, a GBase Collector and the like may be provided for data asset storage modes such as MySQL, Oracle and GBase respectively. The general collection logic and the differential collection logic are thus integrated. Metadata can then be collected from the corresponding data sources, according to the general collection logic and the selected differential collection logic, through the Mybatis component and the JDBC component.
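The layering described above resembles the template method pattern, and a minimal Python sketch under that assumption is given below. The class and method names are hypothetical, the query runner passed to the constructor merely stands in for the Mybatis and JDBC components, and the two catalog views used (`information_schema.COLUMNS` in MySQL and `ALL_TAB_COLUMNS` in Oracle) are chosen here for illustration; the disclosure does not say which queries its collectors issue.

```python
from abc import ABC, abstractmethod
from typing import Callable

# Stand-in for the Mybatis/JDBC call: takes SQL text, returns rows as dicts.
RunQuery = Callable[[str], list[dict]]


class CommonCollector(ABC):
    """Common layer: one call entry and the shared main flow."""

    def __init__(self, run_query: RunQuery) -> None:
        self._run_query = run_query

    def collect(self) -> list[dict]:
        rows = self._run_query(self.metadata_query())
        # Normalize source-specific rows into one uniform shape.
        return [
            {
                "schema": row[self.schema_column()],
                "table": row["TABLE_NAME"],
                "column": row["COLUMN_NAME"],
            }
            for row in rows
        ]

    @abstractmethod
    def metadata_query(self) -> str: ...

    @abstractmethod
    def schema_column(self) -> str: ...


class MySQLCollector(CommonCollector):
    """Differential layer for MySQL."""

    def metadata_query(self) -> str:
        return "SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME FROM information_schema.COLUMNS"

    def schema_column(self) -> str:
        return "TABLE_SCHEMA"


class OracleCollector(CommonCollector):
    """Differential layer for Oracle."""

    def metadata_query(self) -> str:
        return "SELECT OWNER, TABLE_NAME, COLUMN_NAME FROM ALL_TAB_COLUMNS"

    def schema_column(self) -> str:
        return "OWNER"


def fake_oracle(sql: str) -> list[dict]:
    # Hypothetical canned result used instead of a live database.
    return [{"OWNER": "HR", "TABLE_NAME": "EMP", "COLUMN_NAME": "ID"}]


result = OracleCollector(fake_oracle).collect()
```

Callers only ever invoke `collect()`, so adding a collector for another storage mode does not change the calling code.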
In an embodiment of the present invention, the above method further includes: in response to a differential collection logic configuration request, adding, deleting or modifying the differential collection logic and the corresponding differential collection logic configuration information.
When data must be collected from a new data asset storage mode beyond those already supported, this can be done by adding, deleting or modifying the differential collection logic and its configuration information. Specifically, when a collection request for a new target data asset storage mode arrives, the request can be received through the front-end page; the data asset management and maintenance comprehensive platform then responds to the request, adds, deletes or modifies the differential collection logic and its configuration information accordingly, and adds the configuration information of the corresponding collector to the XML, so that the collection logic is adapted and the new storage mode can be collected. In this way, common databases as well as the non-relational store Hive are well supported, data can be collected across diverse storage modes, and the scheme has good compatibility and extensibility.
In an embodiment of the present invention, the above method further includes: generating collection version information for the collected metadata, and storing the collected metadata in a collection result library according to the collection version information.
To ensure the integrity and consistency of the data, unique version information can be generated for the collected metadata. When the collected metadata is written to the database, version numbers are compared, and the metadata is stored in the collection result library under the rule that a higher version number overwrites a lower one and that each data asset exists under only one version number. If a step fails during collection, or the business code throws an exception during verification, the operations of the collection process can be rolled back so that the consistency of the data assets is not damaged. In this way, the integrity and uniqueness of the data assets are ensured.
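The overwrite rule can be sketched as follows. This is a minimal illustration that assumes an in-memory dictionary stands in for the collection result library and that versions are comparable integers; the function name `store_with_version` is hypothetical.

```python
def store_with_version(result_store: dict, asset_id: str, version: int, metadata: dict) -> bool:
    """Keep only the highest version per asset. Returns True if the entry was stored."""
    current = result_store.get(asset_id)
    if current is not None and current["version"] >= version:
        # An equal or lower version never replaces the stored one.
        return False
    # A higher version overwrites the lower one, so one version exists per asset.
    result_store[asset_id] = {"version": version, "metadata": metadata}
    return True
```

With this rule, replaying an older collection run leaves the stored result unchanged.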
In one embodiment of the present invention, the above method, wherein the operation on the collection database is implemented in a database transaction.
A database transaction is a logical unit of work consisting of all database operations performed between the beginning and the end of the transaction. When a sequence of operations accesses and possibly modifies various data items, those operations are either all performed or none performed, so a transaction is an indivisible unit of work. Transactional access to the database has the advantage that operations related to the collection logic can be grouped together, changes to data assets can be previewed before they are made permanent, and data consistency can be guaranteed. In addition, so that a user can query the collection progress in time, a stage-success log entry can be printed after each step of data collection is completed. By checking the log on the front-end page, the user can asynchronously query the collection progress, collection result statistics, failure information, success information and other information.
In an embodiment of the invention, the method described in any of the above is implemented on the basis of independent components.
According to this technical scheme, when different storage modes are encountered, the upper-layer logic can be abstracted so that differences between storage media are transparent to the application side, while the lower-layer storage logic is adapted, so that a complete and appropriate solution is available for the differing data collection logics. Because it is implemented with independent components, the method of any one of the above is not limited to the field of data asset collection.
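The all-or-nothing behaviour can be illustrated with Python's standard `sqlite3` module, whose connection object commits on success and rolls back on an exception when used as a context manager. The table name `collected_tables`, the helper names and the validation check are assumptions made for this sketch; the platform's actual result library is not specified in the text.

```python
import sqlite3


def make_result_db() -> sqlite3.Connection:
    """Create an in-memory stand-in for the collection result library."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE collected_tables(name TEXT PRIMARY KEY, column_count INTEGER)")
    conn.commit()
    return conn


def save_collection(conn: sqlite3.Connection, rows: list) -> bool:
    """Insert all rows in one transaction; roll back every insert on any error."""
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            for name, column_count in rows:
                if column_count < 0:
                    # Hypothetical verification step that rejects bad metadata.
                    raise ValueError(f"invalid column count for {name}")
                conn.execute(
                    "INSERT INTO collected_tables(name, column_count) VALUES (?, ?)",
                    (name, column_count),
                )
        return True
    except (ValueError, sqlite3.Error):
        return False
```

If the second row of a batch fails verification, the first row of that batch is rolled back as well, so the library never holds a partially written collection run.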
In an embodiment of the present invention, in the method, the data asset configuration information includes data security configuration information; responding to the data asset management request according to the data asset model comprises: in response to a data asset preview request, reading the corresponding data from the entity data source according to the data asset model, performing security processing on the read data according to the data security configuration information, and displaying the processed data.
When data asset protection is needed, a preview request for a target data asset can be received through the front-end page, and the security protection processing of the data then begins. Because data assets differ in data type, sensitivity and other respects, the choice of data security policy affects how well an asset is protected, so a suitable policy should be selected according to the asset type, the data sensitivity and similar conditions. The content of the data asset is the main object of security protection, so that content must be acquired first to allow the subsequent security processing. Specifically, a user submits a preview request for a target data asset; after the front-end page receives the request, the content of the corresponding target data asset is found and the data security policy corresponding to it is determined. The content is thus obtained on user request, and a matching data security policy is selected flexibly according to the type, sensitivity and other conditions of each data asset. Once the matching data security policy is found, the target data asset is processed according to that policy, and the processed preview content is generated and displayed to the user. The data asset is therefore shown to the user only after being processed under its data security policy, which achieves the purpose of protecting the data asset.
The corresponding data security policy can be determined according to different contents and types of different data assets, the corresponding preview effect can be generated automatically, the management level of the data assets is improved, and the effective protection of the data assets is realized.
Specifically, the security processing and displaying of the read data according to the data security configuration information may be implemented according to the following embodiments:
In an embodiment of the present invention, finding the content of the target data asset and the corresponding data security policy according to the preview request includes: determining a data asset model of the target data asset according to the preview request; reading the content of the target data asset from its data source according to the data asset model; and determining the data security policy corresponding to the target data asset according to the data asset model.
The data asset model integrates the processing logic of a user for target data asset management request items, and after the data asset management and maintenance comprehensive platform receives the user data management request, the data asset management and maintenance comprehensive platform calls a corresponding data asset model and responds to the data asset management request according to the data asset model so as to execute the specific management items of the data asset. Therefore, the high-efficiency treatment of huge data assets is realized according to the data asset model. The data asset management method can realize the comprehensive management of the data assets through the data asset model by combining the configuration information of the data assets, effectively improve the working efficiency of the data asset management and improve the management level and the quality control level of huge data assets. The data asset configuration information includes model base information of at least one of: the model name, the model category, the security level, the model entity, the data structure of the entity data source, the business table type of the entity data source, the model label and the model description information.
A data security policy corresponding to the target data asset may be determined using the data asset model. Specifically, when a preview request for a target data asset is received, a data asset model of the target data asset is determined, and then the data asset model may determine a data security policy corresponding to the target data asset according to the content of the data asset. Therefore, the data security policy corresponding to the target data asset can be accurately and automatically determined.
In one embodiment of the present invention, in the method, the data security policy is determined according to data security configuration information received during the creation of the data asset model.
The data security configuration information may include the data type, data length, sensitivity and similar attributes of the data asset, which reflect its data characteristics. Because this information differs between assets, it allows a more accurate data security policy to be determined. A data security policy matched to the target data asset can therefore be worked out more accurately by taking the data security configuration information into account.
In an embodiment of the present invention, in the method, the data security configuration information includes: data security configuration information received during a data source field configuration phase, and/or data security configuration information received during a data element creation phase.
To capture the characteristics of the data assets comprehensively, data security configuration information from different stages can be gathered according to the data characteristics of the specific target data asset. The data security configuration information received in the data source field configuration stage and/or the data security configuration information received in the data element creation stage can be used. In this way, the information characteristics of the target data asset are acquired comprehensively.
In an embodiment of the present invention, in the method, the data security policy includes a data security level; the data security level comprises a preset level and a user-defined level; the data security level comprises at least one data security processing mode.
Different security levels can be defined according to the sensitivity of the target data assets and the processing requirements, and the target data assets are then processed according to their level. A data security level may be preset or user-defined. When a data asset is processed, the scope of security processing can be determined from the composition of its content and the sensitivity of each part. For example, security processing may be applied to the entire content of the data asset, or only to particular fields and table data within it. Different protection levels can likewise be set according to data sensitivity, processing requirements and the like. For example, levels L1, L2, L3, L4, L5 and so on are defined in sequence, each corresponding to different encryption algorithms and processing modes: L1 is the lowest level and may perform no data security processing; L2 performs data security processing using encryption algorithm a; L3 uses encryption algorithm b; L4 uses both encryption algorithm a and encryption algorithm b; and so on. A matching data security level can therefore be set flexibly for each data asset, realizing security protection processing of the data assets.
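One way to picture the level scheme is a table that maps each level to an ordered list of processing steps. In the sketch below, `algorithm_a` and `algorithm_b` are mere stand-ins (Base64 and hexadecimal encoding, neither of which is real encryption) for the unspecified algorithms a and b, and the names `SECURITY_LEVELS` and `apply_security_level` are assumptions for the example.

```python
import base64


def algorithm_a(text: str) -> str:
    # Stand-in for "encryption algorithm a"; Base64 is only an encoding.
    return base64.b64encode(text.encode("utf-8")).decode("ascii")


def algorithm_b(text: str) -> str:
    # Stand-in for "encryption algorithm b"; hex is only an encoding.
    return text.encode("utf-8").hex()


# Each level lists the processing steps applied in order; L1 applies none.
SECURITY_LEVELS = {
    "L1": [],
    "L2": [algorithm_a],
    "L3": [algorithm_b],
    "L4": [algorithm_a, algorithm_b],
}


def apply_security_level(level: str, value: str) -> str:
    """Run a field value through every step configured for the level."""
    for step in SECURITY_LEVELS[level]:
        value = step(value)
    return value
```

A user-defined level would then be one more entry in the table, for example `SECURITY_LEVELS["L5"] = [algorithm_b, algorithm_a]`.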
In an embodiment of the present invention, in the method, the data security processing manner includes: the content is encrypted using a data encryption algorithm and/or at least part of the content is masked.
The data security processing mode can encrypt field data with algorithms commonly used in the industry; for example, MD5, AES and Base64 can be used to encrypt the target data asset. Alternatively, at least part of the content of the target data asset can be masked, replacing part of the data in a sensitive field so that the value is obscured. For example, as shown in fig. 3, part C1 applies an algorithm such as MD5, AES or Base64, so that the field content of the data asset is replaced by the output of the algorithm. Part C2 applies masking, so that the head and tail of the field content remain visible and the middle is covered. The target data asset can thus be protected in different ways according to different requirements.
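The two processing modes can be sketched with the Python standard library. The numbers of visible head and tail characters are assumptions (fig. 3 is not reproduced here), as are the function names. Note that MD5 yields a one-way digest rather than reversible ciphertext, so the original value cannot be recovered from it.

```python
import hashlib


def mask_middle(value: str, head: int = 3, tail: int = 4, mask_char: str = "*") -> str:
    """Keep `head` leading and `tail` trailing characters and cover the rest."""
    if len(value) <= head + tail:
        # Too short to reveal both ends without exposing everything.
        return mask_char * len(value)
    hidden = len(value) - head - tail
    return value[:head] + mask_char * hidden + value[len(value) - tail:]


def md5_digest(value: str) -> str:
    """Replace a field value by its MD5 digest, as a hex string."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()
```

For example, `mask_middle("13812345678")` returns `"138****5678"`.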
In an embodiment of the present invention, in the method, the data security processing mode includes a preset mode and a custom mode.
In addition to the preset data security processing modes, a user-defined mode can be adopted. The user can thus process the target data fields with a new, self-defined data security processing mode, which broadens the range of available modes, offers great flexibility, and adapts well to the development and changing trends of data security.
In an embodiment of the present invention, in the method, the data asset configuration information includes data parsing configuration information, and responding to the data asset management request according to the data asset model includes: in response to a data asset preview request, parsing the data in the entity data source according to the data asset model, and displaying the parsing result.
When data asset management is needed, a request to acquire the configuration information of the target data asset can be received through the front-end page; the data asset management and maintenance comprehensive platform responds to the request and receives the configuration information of the data asset. The configuration information can include the physical storage address, data size, data type and similar information of the data asset, and prepares for subsequent data asset management. After the data asset configuration information is acquired, a corresponding data asset model can be created from it. Because the data asset model corresponds to the entity data source, the target data asset can be parsed quickly and efficiently through the model, which enables automatic and efficient data governance of the target data asset. A user can send a data asset preview request through the front-end page; the platform responds by parsing the data in the entity data source with the data asset model and displaying the parsing result on the front-end page. The user can therefore send a preview request and receive the parsing result through the front-end page of the platform, and the data in the entity data source is parsed automatically and quickly by the data asset model, which improves working efficiency and supports efficient management of the data assets.
Specifically, parsing the data in the entity data source according to the data asset model in response to the data asset preview request, and displaying the parsing result, can be implemented according to the following embodiments:
In an embodiment of the present invention, in the method, the entity data source is a distributed file storage system, and data in the entity data source is log data.
The data assets are generally large in quantity and complex, and will continue to grow at a high speed, so that in order to effectively solve the storage and management problems of the data assets, the advantages of the distributed file storage system in terms of good expansion capability, high availability, data backup and data safety can be utilized, and the entity data source adopts the distributed file storage system for storage and management. A large amount of log data can be accumulated in the business processing, and the log data has the characteristic of continuously increasing at a high speed along with the continuation of the business processing, so that the management level of the log data has practical significance for the effective treatment of the data assets. The invention can analyze the log data as the target data assets.
In an embodiment of the present invention, in the method, the data asset configuration information includes: a data type of the entity data source; parsing data in an entity data source according to a data asset model includes: selecting the data parsing rule matched with the data type configured in the data asset model, and parsing the data in the entity data source with it.
The configuration information for the data asset may include information such as the physical storage address, data size, data type, etc. of the data asset. When the data asset model analyzes the target data asset, the data asset model selects a data analysis rule matched with the data type according to the acquired target data asset configuration information, and analyzes the data in the entity data source. These different parsing rules may be preconfigured during the data asset model creation process. Therefore, the data asset model can select a proper analysis rule for analysis according to different configuration information of the target data asset, and the analysis efficiency and the analysis effect can be improved.
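Rule selection by data type can be sketched as a lookup table keyed on the type configured in the model. The model is represented here as a plain dictionary with hypothetical keys `data_type` and `parse_options`; only the JSON and character string types are shown, using the standard `json` and `re` modules.

```python
import json
import re


def parse_json(raw: str, options: dict):
    # JSON object parsing rule.
    return json.loads(raw)


def parse_string(raw: str, options: dict) -> dict:
    # Regular expression parsing rule; named groups become field names.
    match = re.match(options["pattern"], raw)
    return match.groupdict() if match else {}


# Parsing rules preconfigured per data type (hypothetical registry).
PARSE_RULES = {"json": parse_json, "string": parse_string}


def parse_record(model: dict, raw: str):
    """Pick the rule matching the model's configured data type and apply it."""
    rule = PARSE_RULES[model["data_type"]]
    return rule(raw, model.get("parse_options", {}))
```

Supporting another data type would then mean registering one more function in `PARSE_RULES`, without touching `parse_record`.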
In an embodiment of the present invention, the above method further includes: receiving a data sample conforming to the data type; parsing the data sample according to the data parsing rule matched with the data type; and displaying the parsing result.
To further guarantee the accuracy of data asset parsing, the data asset management and maintenance comprehensive platform can receive a data sample conforming to the data type as a target data asset, determine the data type of the sample from its configuration information, and parse the sample with the data parsing rule matched to that type to obtain a parsing result. The sample is thus parsed with its corresponding rule, the parsing effect can be checked promptly, and a basis is provided for improving the accuracy of data asset parsing.
In an embodiment of the present invention, in the method, the data type is an array type, and the method further includes: receiving an array delimiter identifier, and determining the data parsing rule matched with the array type according to the array delimiter identifier.
The data type of the target data asset may be an array type. The data content may contain a single delimiter or several, and an inaccurately identified delimiter can distort the parsing result. The data asset management and maintenance comprehensive platform can therefore receive the delimiter identifier and then determine, according to that identifier, the data parsing rule matched with the array type in order to parse the target data asset. In this way, parsing of array-type target data assets is achieved.
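As a minimal sketch of deriving an array parsing rule from received delimiters, the function below builds a splitter from one or more delimiter strings. The name `build_array_rule` and the choice to treat every delimiter literally are assumptions for the example.

```python
import re


def build_array_rule(delimiters: list):
    """Return a parser that splits a record on any of the given delimiters."""
    if not delimiters:
        raise ValueError("at least one delimiter is required")
    # Longer delimiters go first so that "||" is not consumed as two "|".
    ordered = sorted(delimiters, key=len, reverse=True)
    # Escape each delimiter so characters such as "|" are taken literally.
    splitter = re.compile("|".join(re.escape(d) for d in ordered))

    def parse(raw: str) -> list:
        return splitter.split(raw)

    return parse
```

For instance, `build_array_rule([",", "||"])("a,b||c")` yields `["a", "b", "c"]`.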
In an embodiment of the present invention, in the above method, the data type is an Nginx log type, and the data parsing rule matched with the data type configured in the data asset model is a data parsing rule determined according to the default Nginx log configuration information and/or the custom Nginx log configuration information.
When the data type of the target data asset is the Nginx log type, the configuration information of the target data asset in the entity data source can be acquired through the data asset management and maintenance comprehensive platform. The target data asset is then parsed with the parsing rule that the data asset model matches to that configuration information, the rule being determined from the default Nginx log configuration information and/or the custom Nginx log configuration information. In this way, parsing of Nginx-log-type target data assets is achieved.
In an embodiment of the present invention, in the above method, the data type is a character string type, and the data parsing rule matched with the data type configured in the data asset model is a regular expression parsing rule.
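As a hedged illustration, the default case can be approximated with a regular expression for Nginx's predefined `combined` access-log format, with an optional user-supplied pattern standing in for custom log configuration. How the platform actually derives its rule from the log configuration is not described in the text; the names `COMBINED_PATTERN` and `parse_nginx_line` are assumptions.

```python
import re

# Field names follow the variables of Nginx's predefined "combined" format.
COMBINED_PATTERN = re.compile(
    r'(?P<remote_addr>\S+) - (?P<remote_user>\S+) \[(?P<time_local>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<body_bytes_sent>\d+) '
    r'"(?P<http_referer>[^"]*)" "(?P<http_user_agent>[^"]*)"'
)


def parse_nginx_line(line: str, custom_pattern=None):
    """Parse one access-log line; use the default pattern unless a custom one is given."""
    pattern = custom_pattern or COMBINED_PATTERN
    match = pattern.match(line)
    return match.groupdict() if match else None
```

A line that does not fit the pattern returns `None`, which a preview page could surface as a parsing failure.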
When the data type of the target data asset is a character string type, the configuration information of the target data asset in the entity data source can be acquired through the data asset management and maintenance comprehensive platform, and then the target data asset is analyzed according to the data asset model and the regular expression analysis rule. In this way, the parsing of the string-type target data asset is achieved.
In an embodiment of the present invention, in the above method, the data type is a JSON type, and the data parsing rule matched with the data type configured in the data asset model is a JSON object parsing rule.
When the data type of the target data asset is a JSON type, the configuration information of the target data asset in the entity data source can be acquired through the data asset management and maintenance comprehensive platform, and the target data asset is then parsed according to the data asset model and the JSON object parsing rule. In this way, parsing of JSON-type target data assets is achieved.
In an embodiment of the present invention, in the method, the data asset configuration information includes quality audit configuration information, and responding to the data asset management request according to the data asset model includes: in response to a quality audit request, performing quality audit on the data in the entity data source according to the data asset model.
When a data quality audit is requested, the data quality audit request for the target data asset can be received through the front-end page, and the data asset management and maintenance comprehensive platform responds to the quality audit configuration request by generating quality audit configuration information corresponding to the target data asset. Generating the quality audit configuration information includes adapting and determining the audit rules. Assessing the quality of the target data asset requires fully parsing its content, which can be done through a data asset model. Specifically, a data asset model may be created in advance on the platform, and when the platform receives a data quality audit request, the content of the target data asset is read through that model. The data source can support many forms of target data asset storage, such as Oracle, MySQL, GBase, Dameng, Kingbase, Hive, HDFS, Kafka, File Transfer Protocol (FTP) and Secure File Transfer Protocol (SFTP). The target data asset can therefore be read quickly and efficiently according to the data asset model, and its quality can be assessed by combining the quality audit configuration information with the data read through the model. In this way, the quality audit helps the user understand the quality of the asset, provides a basis for subsequent processing, and effectively improves the management and quality control of very large volumes of data assets.
Specifically, quality auditing of data in an entity data source according to a data asset model may be specifically implemented according to the following embodiments:
In an embodiment of the present invention, in the method, the quality audit configuration information includes: available quality audit rules.
Because data assets come in various types and have different quality audit requirements, an appropriate, matching quality audit rule must be selected when auditing a target data asset to ensure the audit is effective. When the quality audit configuration information is generated, the audit rule can be obtained in JSON form and then adapted and determined.
In an embodiment of the present invention, in the method, the quality audit configuration request includes a quality audit rule configuration request, and generating the quality audit configuration information corresponding to the target data assets in response to the quality audit configuration request includes: determining a quality audit rule to be verified according to the quality audit rule configuration request; and verifying the quality audit rule to be verified, and using the quality audit rule passing the verification as an available quality audit rule.
Verifying the quality audit rule to be verified ensures that the rule chosen for the target data asset is usable and audits as intended. Specifically, a quality audit request for a target data asset can be received through the front-end page of the data asset management and maintenance comprehensive platform, and quality audit configuration information corresponding to the target data asset is generated. A quality audit rule to be verified is determined as the candidate rule for the target data asset and is then verified; if it passes, it is used as an available quality audit rule. Verification thus guarantees the usability of the quality audit rule.
In an embodiment of the present invention, the verification includes self-correctness verification and/or verification according to preview data.
When a quality audit rule to be verified is verified, its own correctness can be checked first, which ensures that the rule contains no obvious errors. The rule can also be verified against preview data: the rule is applied to the preview data, the resulting audit effect on the target data asset is inspected, and it is quickly learned whether the rule passes verification and meets expectations. The data asset management and maintenance comprehensive platform can also integrate a debug function so that users can debug and check rules promptly. Performing self-correctness verification and/or verification against preview data thus effectively improves the efficiency of verifying quality audit rules.
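The two verification stages can be sketched for one kind of rule, a regular expression check on a single field. The rule layout (a dictionary with `field` and `pattern` keys) and the shape of the returned report are assumptions made for this example only.

```python
import re


def verify_rule(rule: dict, preview_rows: list) -> dict:
    """Run a self-correctness check, then a dry run against preview data."""
    # Stage 1: self-correctness. Required keys exist and the pattern compiles.
    for key in ("field", "pattern"):
        if key not in rule:
            return {"ok": False, "reason": f"missing key: {key}"}
    try:
        compiled = re.compile(rule["pattern"])
    except re.error as exc:
        return {"ok": False, "reason": f"invalid pattern: {exc}"}

    # Stage 2: preview verification. Apply the rule to the sample rows.
    if any(rule["field"] not in row for row in preview_rows):
        return {"ok": False, "reason": "field not found in preview data"}
    passed = sum(
        1 for row in preview_rows if compiled.fullmatch(str(row[rule["field"]]))
    )
    return {"ok": True, "passed": passed, "total": len(preview_rows)}
```

The `passed` and `total` counts give the user a quick indication of whether the rule behaves as expected before it is registered as an available rule.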
In an embodiment of the invention, in the method, the quality audit rule includes a custom rule and/or a native rule in a quality audit configuration rule base.
The technical scheme provided by the invention decouples storage from rules: when a storage is added, no new set of audit rule syntax has to be written for it, and when a rule is added, the syntax differences between storages need not be considered, which improves the efficiency of both quality audit work and development work. The decoupling can be realized with the strategy pattern; for example, an evaluation strategy EvaluateStrategy can serve as the abstract rule strategy and a storage strategy StorageStrategy as the abstract storage strategy. When a rule or a storage is added, it only needs to inherit the corresponding strategy class, and storage and rules remain decoupled. The quality audit rules can be defined by the user or taken directly from the native rules in the quality audit configuration rule base. This variety of ways to determine quality audit rules ensures their flexibility and adaptability and improves the efficiency of determining them.
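A minimal sketch of this decoupling with two abstract strategy classes follows. The class names loosely follow the abstract rule strategy and abstract storage strategy named in the text; the method names, the in-memory storage and the uniqueness rule are assumptions made for illustration.

```python
from abc import ABC, abstractmethod


class StorageStrategy(ABC):
    """Abstract storage strategy: hides how values are fetched."""

    @abstractmethod
    def read_column(self, table: str, column: str) -> list:
        ...


class EvaluateStrategy(ABC):
    """Abstract rule strategy: scores values without knowing the storage."""

    @abstractmethod
    def evaluate(self, values: list) -> float:
        ...


class InMemoryStorage(StorageStrategy):
    # Stand-in for a real storage such as MySQL or Hive.
    def __init__(self, tables: dict):
        self.tables = tables

    def read_column(self, table: str, column: str) -> list:
        return [row.get(column) for row in self.tables[table]]


class UniquenessRule(EvaluateStrategy):
    def evaluate(self, values: list) -> float:
        # Share of distinct values; an empty column counts as fully unique.
        return len(set(values)) / len(values) if values else 1.0


def audit(storage: StorageStrategy, rule: EvaluateStrategy, table: str, column: str) -> float:
    """Any rule can be combined with any storage through the two interfaces."""
    return rule.evaluate(storage.read_column(table, column))
```

Because `audit` depends only on the two abstract classes, a new storage or a new rule is added by writing one subclass, with no change to the other side.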
In an embodiment of the present invention, in the above method, the custom rule includes a custom rule in the form of a code and/or a custom rule in the form of a custom Jar package.
When the user determines the quality audit rule of the target data assets in a user-defined mode, the user can directly calculate the logic of user quality audit in a code mode, and the diversified requirements of the user are met. Meanwhile, the data asset management and maintenance comprehensive platform can also integrate some common templates and built-in functions for users to use so as to further improve the working efficiency, and the platform can also store custom templates and functions written by the users. The method can also be uploaded to a platform in an integrated self-defined Jar package form, provides personalized quality audit rules for users, and meets the diversified requirements of the users. Meanwhile, the existing asset models of the user association platform are supported to carry out quality audit. Therefore, the quality audit rule can be determined more variously by giving a user flexibility to determine the quality audit rule, and different requirements of quality audit of target data assets can be met better.
In one embodiment of the present invention, in the above method, the native rules include field-level rules and/or table-level rules; the field-level rules include at least one of: null value check rules, value range check rules, numerical range rules, regular expression check rules, violation check rules, missing record rules, primary/foreign key constraint rules and uniqueness check rules; the table-level rules include at least one of: timeliness rules and volatility rules.
When the user adopts native rules as the quality audit configuration rules of the target data assets, the rules can be selected directly from the quality audit configuration rule base. The native rules include field-level rules and/or table-level rules. The field-level rules offer several quality audit configuration rules to choose from. For example, the null value check rule can check whether a null value exists in a field and calculate the score of the field through a formula according to the satisfaction degree configured by the user. The value range check rule can check whether the value in the field falls within the value range of the dictionary table configured by the user, and calculates the score according to the user configuration and the satisfaction degree. The numerical range rule can check whether the value in the field falls within the numerical range configured by the user. The regular expression check rule can check whether the value in the field matches the regular expression configured by the user, and calculates the score according to the user configuration and the satisfaction degree. The violation check rule can check whether the value in the field meets the specification configured by the user, and calculates the score according to the user configuration and the satisfaction degree. The missing record rule can check whether the values in the field completely contain the values configured by the user, supports joint checking of multiple fields, and calculates the score according to the user configuration and the satisfaction degree. The primary/foreign key constraint rule can check whether the value in the field satisfies the primary/foreign key constraint configured by the user, supports constraints over multiple fields, and calculates the score according to the user configuration and the satisfaction degree. The uniqueness check rule can check whether the value in the field is unique, supports multi-field uniqueness checks, and calculates the score according to the satisfaction degree.
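As an illustration of how a few of these field-level rules might screen a single field, the following sketch returns the positions of non-compliant values. The function names, the sample data and the choice to report positions are assumptions made for the example; the invention does not prescribe this interface.

```python
import re
from collections import Counter
from typing import List, Optional, Sequence, Set


def null_check(values: Sequence[Optional[str]]) -> List[int]:
    """Positions whose value is null (None)."""
    return [i for i, v in enumerate(values) if v is None]


def value_range_check(values: Sequence[Optional[str]], allowed: Set[str]) -> List[int]:
    """Positions whose value is not in the configured dictionary of allowed values."""
    return [i for i, v in enumerate(values) if v not in allowed]


def regex_check(values: Sequence[Optional[str]], pattern: str) -> List[int]:
    """Positions whose value does not fully match the configured regular expression."""
    compiled = re.compile(pattern)
    return [i for i, v in enumerate(values)
            if v is None or compiled.fullmatch(v) is None]


def uniqueness_check(values: Sequence[Optional[str]]) -> List[int]:
    """Positions whose value occurs more than once in the field."""
    counts = Counter(values)
    return [i for i, v in enumerate(values) if counts[v] > 1]


# Hypothetical sample fields.
genders = ["M", "F", "X", None]
phones = ["13800000000", "abc", "13800000000"]

null_positions = null_check(genders)
out_of_range = value_range_check(genders, {"M", "F"})
bad_format = regex_check(phones, r"\d{11}")
duplicates = uniqueness_check(phones)
```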
The table-level rules likewise offer several quality audit configuration rules to choose from. For example, the timeliness rule can check whether the data in the table has been computed within a specified time and calculate the score according to the user configuration. The volatility rule can calculate the period-over-period and year-over-year changes of the data in the table and calculate the score according to the user configuration.
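A volatility check of this kind might be sketched as below. The text does not give the scoring formula for volatility, so the threshold comparison, the function names and the sample row counts are all assumptions for illustration.

```python
def percent_change(current: float, reference: float) -> float:
    """Relative change of `current` against `reference`, in percent."""
    if reference == 0:
        raise ValueError("reference value must be non-zero")
    return (current - reference) * 100 / reference


def volatility_ok(current: float, previous_period: float,
                  same_period_last_year: float, max_change_percent: float) -> bool:
    """Assumed rule: both changes must stay within a user-configured limit."""
    period_change = percent_change(current, previous_period)
    yearly_change = percent_change(current, same_period_last_year)
    return (abs(period_change) <= max_change_percent
            and abs(yearly_change) <= max_change_percent)


# Hypothetical row counts: today, yesterday, and the same day last year.
period_over_period = percent_change(1100, 1000)
year_over_year = percent_change(1100, 880)
within_limit = volatility_ok(1100, 1000, 880, max_change_percent=20)
```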
In an embodiment of the present invention, in the method, the quality audit configuration information further includes: quality audit task information corresponding to available quality audit rules; the quality audit of the read data according to the quality audit configuration information comprises the following steps: and when the task execution condition in the quality audit task information is met, performing quality audit on the read data.
In order to perform long-term, stable quality control on the target data assets, each quality audit rule can be associated with a specific quality audit task. For example, when the quality audit configuration information is generated, quality audit task information corresponding to the available quality audit rules may be generated at the same time, and task execution conditions may be set in the quality audit task information, such as starting once a week or starting after the data assets reach a certain volume. In this way, the quality audit work of the target data assets is tied to specific quality audit tasks, which facilitates the quality audit of the data assets.
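The two example execution conditions (a weekly start, or a start once enough data has accumulated) could be modelled roughly as follows. The AuditTask class, its field names and the sample values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class AuditTask:
    """Hypothetical quality audit task bound to one rule."""
    rule_name: str
    interval: Optional[timedelta] = None   # e.g. run once a week
    min_new_rows: Optional[int] = None     # e.g. run after N new rows arrive
    last_run: Optional[datetime] = None

    def should_run(self, now: datetime, new_rows: int) -> bool:
        """True when any configured execution condition is met."""
        due_by_time = self.interval is not None and (
            self.last_run is None or now - self.last_run >= self.interval
        )
        due_by_volume = self.min_new_rows is not None and new_rows >= self.min_new_rows
        return due_by_time or due_by_volume


weekly = AuditTask("null_check", interval=timedelta(days=7),
                   last_run=datetime(2024, 1, 1))
by_volume = AuditTask("uniqueness_check", min_new_rows=1000)

run_weekly_now = weekly.should_run(datetime(2024, 1, 8), new_rows=0)
run_weekly_early = weekly.should_run(datetime(2024, 1, 5), new_rows=0)
run_by_volume = by_volume.should_run(datetime(2024, 1, 5), new_rows=1500)
```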
In an embodiment of the present invention, in the method, performing quality audit on the read data according to the quality audit configuration information includes: performing quality audit on the read data by using a hybrid computing engine and the available quality audit rules.
When auditing the target data assets, the hybrid computing engine can be used to audit the quality of the target data asset content read through the data asset model, according to the quality audit configuration information and the available quality audit rules. The hybrid computing engine can be a computing engine developed based on Spark and/or Flink. Spark is a fast, general-purpose computing engine designed for large-scale data processing; it offers high processing speed, interactive computation and support for complex algorithms, and can handle operations such as SQL queries, text processing and machine learning. Flink is a distributed stream data flow engine that supports iterative algorithms, executes arbitrary stream data programs in a data-parallel and pipelined manner, and runs both batch and stream processing programs. If storage and quality audit rules are not decoupled, data asset management and maintenance become difficult. For example, the SQL syntax currently differs between storages such as Hive and Oracle, so each storage needs its own version of a rule, which makes data assets harder to manage and maintain and increases development difficulty. Using Spark as the hybrid computing engine unifies the rule expressions across different storages and helps decouple storage from rules. Therefore, the hybrid computing engine can efficiently audit the quality of the target data assets according to the quality audit configuration information and the available quality audit rules, supports data quality audit of tables associated with the target data assets in different storages, and allows different business rules to be implemented in a uniform SQL language, which facilitates task execution and improves working efficiency.
In an embodiment of the present invention, in the method, performing quality audit on the read data according to the quality audit configuration information includes: adding a unique identifier to each row of the read data; taking, according to the unique identifiers, the union of the field-level non-compliant data screened out by an available quality audit rule to obtain rule-level non-compliant data; and taking the union of the rule-level non-compliant data of all rules to obtain table-level non-compliant data.
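The three steps above (row identifiers, a union per rule, then a union per table) can be sketched with plain sets. Using the row position as the unique identifier, and the two sample rules, are assumptions made for this example.

```python
from typing import Callable, Dict, Iterable, List, Set

Row = Dict[str, object]


def add_row_ids(rows: List[Row]) -> Dict[int, Row]:
    """Attach a unique identifier to every row (here simply its position)."""
    return {row_id: row for row_id, row in enumerate(rows)}


def field_level(rows_by_id: Dict[int, Row], field: str,
                is_valid: Callable[[object], bool]) -> Set[int]:
    """Identifiers of rows whose `field` value fails the check."""
    return {row_id for row_id, row in rows_by_id.items()
            if not is_valid(row.get(field))}


def union_all(id_sets: Iterable[Set[int]]) -> Set[int]:
    """Union of several sets of row identifiers."""
    return set().union(*id_sets)


rows_by_id = add_row_ids([
    {"name": "a", "email": "a@x.io", "age": 30},
    {"name": None, "email": "b@x.io", "age": 41},
    {"name": "c", "email": None, "age": -5},
    {"name": "d", "email": "d@x.io", "age": 200},
])

not_null = lambda v: v is not None
in_range = lambda v: v is not None and 0 <= v <= 150

# Rule level: union of the field-level results produced by one rule.
null_rule = union_all([field_level(rows_by_id, "name", not_null),
                       field_level(rows_by_id, "email", not_null)])
range_rule = union_all([field_level(rows_by_id, "age", in_range)])

# Table level: union of all rule-level results.
table_level = union_all([null_rule, range_rule])
```

Because every result is a set of row identifiers, a row that violates several rules appears once at the table level and can still be traced back to each rule that flagged it.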
Non-compliant data is determined together with the calculation of the field-level, rule-level and table-level scores. Quality scoring mainly considers uniqueness, integrity, timeliness and validity.
For the uniqueness score, a score is first calculated for each field. Let the calculated value be $A = \frac{\text{count distinct}}{\text{count}} \times 100$, where count distinct is the number of distinct values in the field and count is the total number of records, and let $B$ be the base value. The calculated value $A$ is compared with the base value $B$: if $A \ge B$, the score is 100; if $A < B$, the score is $\left(1 - \frac{B - A}{B}\right) \times 100 = \frac{A}{B} \times 100$. The final uniqueness score is the average of the field scores.
For the integrity score, a score is first calculated for each field. Let the calculated value be $A = \frac{\text{number of non-null values}}{\text{count}} \times 100$ and let $B$ be the base value. The calculated value $A$ is compared with $100 - B$: if $A \ge 100 - B$, the score is 100; if $A < 100 - B$, the score is $\frac{A}{100 - B} \times 100$. The final integrity score is the average of the field scores.
For the timeliness score, the time $R$ of the check is compared with the base value $B$. If $R < B$, no score is given. If $R > B$, the calculation distinguishes two cases, "not finished" and "finished". In the "not finished" case, $R - B$ is compared with 10: if $R - B \le 10$, no score is given; if $R - B > 10$, the difference is converted into minutes and the score is $\frac{10}{R - B} \times 100$. In the "finished" case, the calculation completion time $A$ is compared with the base value $B$: if $A \le B$, the score is 100; if $A > B$, then $A - B$ is compared with 10. If $A - B \le 10$, the timeliness score is 100, that is, a delay within 10 minutes is allowed; if $A - B > 10$, the difference is converted into minutes and the timeliness score is $\frac{10}{A - B} \times 100$.
For the validity score, the score is $\frac{\text{number of records satisfying the regular expression and the maximum and minimum value conditions}}{\text{total number of records in the field}} \times 100$.
In order to obtain the audit result of table-level non-compliant data, a unique identifier can be added to each row of data read from the target data asset through the data asset model, so that every row can be traced. Each row of the target data asset is then audited according to the available quality audit rules, and the field-level non-compliant data screened out is merged by union to obtain the rule-level non-compliant data. Because each row carries a unique identifier, the rule-level non-compliant data clearly shows the source of the non-compliant data from a global view, so that data quality problems can be identified and solved quickly. The rule-level non-compliant data of all rules is then merged again by union to obtain the table-level non-compliant data. Therefore, the table-level non-compliant data is finally obtained, and its source can be viewed from a global perspective according to the unique identifiers.
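The per-field scoring formulas described above can be written out as a short sketch. The function names are hypothetical; only the "finished" timeliness case is shown, with times expressed in minutes; and the boundary at exactly 10 minutes is treated as an allowed delay, which is an interpretation of the text.

```python
from statistics import mean
from typing import Optional, Sequence


def uniqueness_score(values: Sequence[Optional[str]], base: float) -> float:
    """A = distinct / total * 100; full marks when A >= B, otherwise A / B * 100."""
    a = len(set(values)) / len(values) * 100
    return 100.0 if a >= base else a / base * 100


def integrity_score(values: Sequence[Optional[str]], base: float) -> float:
    """A = non-null / total * 100, compared with 100 - B."""
    non_null = sum(1 for v in values if v is not None)
    a = non_null / len(values) * 100
    threshold = 100 - base
    return 100.0 if a >= threshold else a / threshold * 100


def timeliness_score_finished(finish_minute: float, base_minute: float) -> float:
    """'Finished' case: a delay of up to 10 minutes still scores 100."""
    delay = finish_minute - base_minute
    return 100.0 if delay <= 10 else 10 / delay * 100


def total_score(field_scores: Sequence[float]) -> float:
    """The final score is the average of the per-field scores."""
    return mean(field_scores)


# Hypothetical fields and base values.
unique_partial = uniqueness_score(["a", "b", "b", "c"], base=100)
unique_full = uniqueness_score(["a", "b", "b", "c"], base=50)
integrity = integrity_score(["1", None, "3", "4"], base=10)
late = timeliness_score_finished(finish_minute=620, base_minute=600)
overall_uniqueness = total_score([unique_partial, unique_full])
```

Note that these functions assume a non-empty field; an empty sequence would raise ZeroDivisionError and would need separate handling.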
In an embodiment of the present invention, in the above method, the method further includes: and displaying the quality audit result through a quality audit result display page, wherein the quality audit result comprises non-compliance data and quality audit rules corresponding to the non-compliance data.
After the target data asset quality audit is finished, the target data asset quality audit result can be displayed on the front-end page of the data asset management and maintenance comprehensive platform through the quality audit result display page. The quality audit result comprises non-compliance data and quality audit rules corresponding to the non-compliance data. In this way, non-compliant data under the corresponding quality audit rules can be clearly demonstrated.
In an embodiment of the present invention, in the above method, the method further includes: and displaying a quality audit state through a state page, wherein the quality audit state comprises a data asset model state and/or a rule state.
In order to keep accurate track of the quality audit state of the target data assets, the quality audit state can be displayed through a state page at the front end of the data asset management and maintenance comprehensive platform. The displayed quality audit state may include the data asset model state and/or the rule state. Models with no rules configured, models whose tasks executed successfully and models whose tasks failed can be displayed, together with the quality audit rules corresponding to the failures. In this way, the state page shows the specific working state of the quality audit, so that the user can follow the progress of the quality audit of the target data assets in time and can directly locate and troubleshoot the failed quality audit rules.
In an embodiment of the present invention, in the method, the quality audit configuration information further includes: service subscription configuration information; the method further comprises the following steps: and generating a corresponding quality report and/or warning information according to the quality audit result, generating a corresponding service subscription message according to the service subscription configuration information, and sending the service subscription message.
In order to notify users of the quality audit results of the target data assets in time, a corresponding quality report and/or warning information can be generated after the quality audit of the target data assets is completed. Specifically, the service subscription configuration information may be generated at the same time as the quality audit configuration information, and the corresponding quality report and/or warning information is then generated and sent to the user in the form of a service subscription message. The quality report may include basic information of the target data asset, the scores under each quality audit rule, quality audit assessment suggestions, and the like. In this way, the user is notified through a service subscription message once the quality audit of the target data assets is finished, and can grasp the quality audit results in a timely and accurate manner.
In an embodiment of the present invention, in the above method, the method further includes: maintaining quality rule solutions through a quality rule knowledge base, the quality rule solutions including solutions for quality audit tasks that failed to execute.
In order to remedy failed quality audit tasks of the target data assets and support maintenance work, a quality rule knowledge base can be established. After a quality audit task fails to execute, the quality rule solutions maintained in the knowledge base are used to resolve the failure. The quality rule knowledge base can be configured in advance and can also be updated as maintenance solutions accumulate and improve in actual maintenance work. In this way, quality audit tasks are properly maintained through the quality rule knowledge base.
FIG. 2 shows a schematic diagram of a data governance device according to one embodiment of the present invention. As shown in FIG. 2, the data governance device 200 includes:
a receiving unit 210, which receives data asset configuration information.
The implementation of the technical scheme of the invention can depend on a data asset management and maintenance comprehensive platform, and can be realized by various governing units integrated in the platform or by embedding independent service components into the application. When a data governance request exists, the data governance request of the target data asset can be received through the front-end page, and then the data asset management and maintenance comprehensive platform responds to the configuration request of the data governance and generates data governance configuration information corresponding to the target data asset.
A model unit 220, which creates a data asset model corresponding to the entity data source according to the data asset configuration information.
Governing the target data asset requires fully parsing its content, which can be achieved by creating a data asset model corresponding to the entity data source. Specifically, a data asset model may be created in advance on the data asset management and maintenance comprehensive platform, and when the platform receives a data governance request, the content of the target data asset can be read through the data asset model.
A management unit 230, which responds to a data asset management request according to the data asset model.
The data asset model integrates the user's processing logic for management requests on the target data asset. After receiving the user's data management request, the data asset management and maintenance comprehensive platform calls the corresponding data asset model and responds to the data asset management request according to that model, so as to execute the specific management operations on the data asset. In this way, huge volumes of data assets are governed efficiently according to the data asset model.
The device shown in fig. 2 can realize the comprehensive management of the data assets through the data asset model by combining the configuration information of the data assets, can effectively improve the working efficiency of the data asset management, and can improve the management level and the quality control level of the data assets with huge quantities.
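The cooperation of the three units can be illustrated with a minimal sketch. The function names, the configuration keys model_name and fields, and the single "preview" request type are all hypothetical; the entity data source is stood in for by an in-memory list.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class DataAssetModel:
    """Hypothetical data asset model bound to one entity data source."""
    name: str
    fields: List[str]
    read: Callable[[], List[Row]]


def receive_configuration(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Receiving unit: accept the configuration after a basic sanity check."""
    if "model_name" not in raw or "fields" not in raw:
        raise ValueError("configuration needs 'model_name' and 'fields'")
    return dict(raw)


def create_model(config: Dict[str, Any], source: List[Row]) -> DataAssetModel:
    """Model unit: build a model that exposes only the configured fields."""
    fields = list(config["fields"])

    def read() -> List[Row]:
        return [{f: row.get(f) for f in fields} for row in source]

    return DataAssetModel(name=str(config["model_name"]), fields=fields, read=read)


def respond(model: DataAssetModel, request: str) -> List[Row]:
    """Management unit: answer a management request through the model."""
    if request == "preview":
        return model.read()
    raise ValueError(f"unsupported request: {request}")


entity_source = [{"id": 1, "name": "a", "internal_flag": "x"}]
config = receive_configuration({"model_name": "user_model", "fields": ["id", "name"]})
model = create_model(config, entity_source)
preview = respond(model, "preview")
```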
In an embodiment of the present invention, in the above apparatus, the data asset configuration information includes at least one of the following items of model basic information: the model name, the model category, the security level, the model entity, the data structure of the entity data source, the business table type of the entity data source, the model label and the model description information.
In an embodiment of the present invention, in the above apparatus, the data asset configuration information includes a data storage type; the data storage types include: relational database storage and streaming storage.
In an embodiment of the present invention, in the above apparatus, the data asset configuration information further includes at least one of the following storage information corresponding to the relational database storage: store name, model physical name, table type.
In an embodiment of the present invention, in the above apparatus, the data asset configuration information further includes at least one of the following storage information corresponding to the streaming storage: the storage name, the storage directory, whether subdirectories are included, the data period, the file name, the separator, the file encoding, the compression format, the data format and whether the first row is a header.
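To show how a few of these storage settings (separator, file encoding, header flag) might drive the reading of a file, consider the following sketch. The FileStorageConfig class and its field names are hypothetical, and the file content is supplied as bytes instead of being read from a storage directory.

```python
import csv
import io
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class FileStorageConfig:
    """Hypothetical subset of the storage information for file-based storage."""
    separator: str = ","
    encoding: str = "utf-8"
    first_row_is_header: bool = True


def parse_file_bytes(data: bytes, config: FileStorageConfig,
                     field_names: List[str]) -> List[Dict[str, str]]:
    """Decode and split the file according to the configured storage information."""
    text = data.decode(config.encoding)
    rows = list(csv.reader(io.StringIO(text), delimiter=config.separator))
    if config.first_row_is_header:
        rows = rows[1:]  # drop the header line, the model supplies field names
    return [dict(zip(field_names, row)) for row in rows]


raw_file = "id|name\n1|a\n2|b\n".encode("utf-8")
records = parse_file_bytes(raw_file, FileStorageConfig(separator="|"), ["id", "name"])
```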
In an embodiment of the present invention, in the above apparatus, the data asset configuration information includes at least one of the following items of field configuration information: the field physical name, the field logical name, the data type, the length, the precision, the default value, whether null is allowed, whether a data element is applied, the data identification and the field description information.
In one embodiment of the present invention, in the above apparatus, the data asset configuration information includes model lifecycle information; the model lifecycle information includes: performing at least one model operation in a model lifecycle as follows: temporary non-processing, data archiving and data cleaning.
In an embodiment of the present invention, in the above apparatus, the data asset configuration information includes data security configuration information; the management unit 230 is adapted to, in response to the data asset preview request, read corresponding data from the entity data source according to the data asset model, perform security processing on the read data according to the data security configuration information, and then display the processed data.
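One possible form of the security processing applied before a preview is shown below. This passage does not specify the security processing, so masking all but the last four characters of configured fields is purely an assumption, as are the function names.

```python
from typing import Any, Dict, List, Set

Row = Dict[str, Any]


def mask_value(value: str, keep_last: int = 4) -> str:
    """Replace all but the last `keep_last` characters with asterisks."""
    if len(value) <= keep_last:
        return "*" * len(value)
    return "*" * (len(value) - keep_last) + value[-keep_last:]


def secure_preview(rows: List[Row], masked_fields: Set[str]) -> List[Row]:
    """Apply masking to the configured fields before the rows are displayed."""
    return [
        {key: mask_value(str(value)) if key in masked_fields and value is not None
         else value
         for key, value in row.items()}
        for row in rows
    ]


preview_rows = secure_preview(
    [{"name": "Li", "phone": "13812345678"}],
    masked_fields={"phone"},
)
```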
In an embodiment of the present invention, in the above apparatus, the data asset configuration information includes data parsing configuration information, and the management unit 230 is adapted to, in response to the data asset preview request, parse data in the entity data source according to the data asset model, and display a parsing result.
In an embodiment of the present invention, in the above apparatus, the data asset configuration information includes quality audit configuration information, and the management unit 230 is adapted to perform quality audit on the data in the entity data source according to the data asset model in response to the quality audit request.
It should be noted that, for the specific implementation of each apparatus embodiment, reference may be made to the specific implementation of the corresponding method embodiment, which is not described herein again.
In summary, according to the technical solution of the present invention, data asset configuration information is received; creating a data asset model corresponding to an entity data source according to the data asset configuration information; and responding to a data asset management request according to the data asset model. The method has the advantages that the configuration information of the data assets can be combined, the data assets can be comprehensively managed through the data asset model, the working efficiency of data asset management can be effectively improved, and the management level and the quality control level of huge data assets can be improved.
It should be noted that:
the algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose devices may be used with the teachings herein. The required structure for constructing such a device will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the invention and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
The various component embodiments of the invention may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components in a data governance device according to an embodiment of the present invention. The present invention may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present invention may be stored on computer-readable media or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
For example, fig. 3 shows a schematic structural diagram of an electronic device according to an embodiment of the invention. The electronic device 300 comprises a processor 310 and a memory 320 arranged to store computer executable instructions (computer readable program code). The memory 320 may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read only memory), an EPROM, a hard disk, or a ROM. The memory 320 has a storage space 330 storing computer readable program code 331 for performing any of the method steps described above. For example, the storage space 330 for storing the computer readable program code may comprise respective computer readable program codes 331 for respectively implementing various steps in the above method. The computer readable program code 331 may be read from or written to one or more computer program products. These computer program products comprise a program code carrier such as a hard disk, a Compact Disc (CD), a memory card or a floppy disk. Such a computer program product is typically a computer readable storage medium such as described in fig. 4. Fig. 4 shows a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention. The computer readable storage medium 400 has stored thereon a computer readable program code 331 for performing the steps of the method according to the invention, readable by a processor 310 of the electronic device 300, which computer readable program code 331, when executed by the electronic device 300, causes the electronic device 300 to perform the steps of the method described above, in particular the computer readable program code 331 stored on the computer readable storage medium may perform the method shown in any of the embodiments described above. The computer readable program code 331 may be compressed in a suitable form.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The usage of the words first, second and third, etcetera do not indicate any ordering. These words may be interpreted as names.
The invention discloses A1, a data management method, comprising:
receiving data asset configuration information;
creating a data asset model corresponding to an entity data source according to the data asset configuration information;
and responding to a data asset management request according to the data asset model.
A2, the method of A1, wherein the data asset configuration information includes at least one of the following items of model basic information:
the model name, the model category, the security level, the model entity, the data structure of the entity data source, the business table type of the entity data source, the model label and the model description information.
A3, the method of A1, wherein the data asset configuration information includes data storage type; the data storage types include: relational database storage and streaming storage.
A4, the method of A3, wherein the data asset configuration information further includes at least one of the following stored information corresponding to the relational database store:
store name, model physical name, table type.
A5, the method of A3, wherein the data asset configuration information further includes at least one of the following storage information corresponding to the streaming storage:
the storage name, the storage directory, whether subdirectories are included, the data period, the file name, the separator, the file encoding, the compression format, the data format and whether the first row is a header.
A6, the method of A1, wherein the data asset configuration information includes field configuration information for at least one of:
the field physical name, the field logical name, the data type, the length, the precision, the default value, whether null is allowed, whether a data element is applied, the data identification and the field description information.
A7, the method of A1, wherein the data asset configuration information includes model lifecycle information;
the model lifecycle information includes: performing at least one model operation in a model lifecycle as follows: temporary non-processing, data archiving and data cleaning.
A8, the method of A1, wherein the data asset configuration information includes data security configuration information;
the responding to a data asset management request according to the data asset model comprises: and responding to the data asset preview request, reading corresponding data from the entity data source according to the data asset model, and displaying the read data after performing security processing on the read data according to the data security configuration information.
A9, the method of A1, wherein the data asset configuration information includes data parsing configuration information, and the responding to a data asset management request according to the data asset model includes:
and responding to the data asset preview request, analyzing the data in the entity data source according to the data asset model, and displaying an analysis result.
A10, the method of A1, wherein the data asset configuration information includes quality audit configuration information, and the responding to a data asset management request according to the data asset model comprises:
in response to a quality audit request, performing a quality audit on the data in the entity data source according to the data asset model.
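By way of non-limiting illustration, a quality audit driven by configuration could count rule violations per field. The text does not specify which audit rules are supported, so the two rule kinds (`not_null`, `unique`), the `AuditRule` class and the `audit` function below are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class AuditRule:
    """Hypothetical quality audit rule attached to one field of the model."""
    field: str
    kind: str  # "not_null" or "unique" in this sketch


def audit(rows: List[Dict[str, Any]], rules: List[AuditRule]) -> Dict[str, int]:
    """Return the number of violations found for each configured rule."""
    report: Dict[str, int] = {}
    for rule in rules:
        values = [row.get(rule.field) for row in rows]
        if rule.kind == "not_null":
            violations = sum(1 for v in values if v is None or v == "")
        elif rule.kind == "unique":
            present = [v for v in values if v is not None]
            violations = len(present) - len(set(present))
        else:
            raise ValueError(f"unknown rule kind: {rule.kind}")
        report[f"{rule.field}:{rule.kind}"] = violations
    return report


example_rows = [
    {"id": 1, "email": "a@x.com"},
    {"id": 1, "email": ""},
    {"id": 2, "email": None},
]
example_report = audit(
    example_rows, [AuditRule("id", "unique"), AuditRule("email", "not_null")]
)
```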
The invention also discloses B11, a data governance device, comprising:
a receiving unit adapted to receive data asset configuration information;
a model unit adapted to create a data asset model corresponding to an entity data source according to the data asset configuration information;
and a management unit adapted to respond to a data asset management request according to the data asset model.
B12, the apparatus of B11, wherein the data asset configuration information includes model base information comprising at least one of:
model name, model category, security level, model entity, data structure of the entity data source, business table type of the entity data source, model label, and model description information.
B13, the apparatus of B11, wherein the data asset configuration information includes a data storage type; the data storage types include: relational database storage and streaming storage.
B14, the apparatus of B13, wherein the data asset configuration information further includes at least one of the following items of storage information corresponding to the relational database storage:
storage name, model physical name, table type.
B15, the apparatus of B13, wherein the data asset configuration information further includes at least one of the following items of storage information corresponding to the streaming storage:
storage name, storage directory, whether subdirectories are included, data period, file name, delimiter, file encoding, compression format, data format, and whether the first row is a header.
B16, the apparatus of B11, wherein the data asset configuration information includes field configuration information comprising at least one of:
field physical name, field logical name, data type, length, precision, default value, whether the data element is allowed to be empty, whether the data element is applied, data identification, and field description information.
B17, the apparatus of B11, wherein the data asset configuration information includes model lifecycle information;
the model lifecycle information specifies at least one of the following model operations to be performed during the model lifecycle: no processing for the time being, data archiving, and data cleaning.
B18, the apparatus of B11, wherein the data asset configuration information includes data security configuration information;
the management unit is adapted to, in response to a data asset preview request, read corresponding data from the entity data source according to the data asset model, perform security processing on the read data according to the data security configuration information, and then display the processed data.
B19, the apparatus of B11, wherein the data asset configuration information includes data parsing configuration information, and the management unit is adapted to, in response to a data asset preview request, parse the data in the entity data source according to the data asset model and display the parsing result.
B20, the apparatus of B11, wherein the data asset configuration information includes quality audit configuration information, and the management unit is adapted to, in response to a quality audit request, perform a quality audit on the data in the entity data source according to the data asset model.
The invention also discloses C21, an electronic device, wherein the electronic device comprises: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the method of any one of A1-A10.
The invention also discloses D22, a computer-readable storage medium, wherein the computer-readable storage medium stores one or more programs which, when executed by a processor, implement the method of any one of A1-A10.

Claims (10)

1. A data governance method, comprising:
receiving data asset configuration information;
creating a data asset model corresponding to an entity data source according to the data asset configuration information;
and responding to a data asset management request according to the data asset model.
2. The method of claim 1, wherein the data asset configuration information includes model base information of at least one of:
the model name, the model category, the security level, the model entity, the data structure of the entity data source, the business table type of the entity data source, the model label and the model description information.
3. The method of claim 1, wherein the data asset configuration information comprises a data storage type; the data storage types include: relational database storage and streaming storage.
4. The method of claim 3, wherein the data asset configuration information further comprises at least one of the following items of storage information corresponding to the relational database storage:
storage name, model physical name, table type.
5. A data governance device comprising:
a receiving unit adapted to receive data asset configuration information;
a model unit adapted to create a data asset model corresponding to an entity data source according to the data asset configuration information;
and a management unit adapted to respond to a data asset management request according to the data asset model.
6. The apparatus of claim 5, wherein the data asset configuration information comprises model base information of at least one of:
the model name, the model category, the security level, the model entity, the data structure of the entity data source, the business table type of the entity data source, the model label and the model description information.
7. The apparatus of claim 5, wherein the data asset configuration information comprises a data storage type; the data storage types include: relational database storage and streaming storage.
8. The apparatus of claim 7, wherein the data asset configuration information further comprises at least one of the following items of storage information corresponding to the relational database storage:
storage name, model physical name, table type.
9. An electronic device, wherein the electronic device comprises: a processor; and a memory arranged to store computer-executable instructions that, when executed, cause the processor to perform the method of any one of claims 1-4.
10. A computer readable storage medium, wherein the computer readable storage medium stores one or more programs which, when executed by a processor, implement the method of any of claims 1-4.
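By way of non-limiting illustration, the three steps of the claimed method (receiving configuration information, creating a data asset model for an entity data source, and responding to management requests according to that model) can be sketched as follows. This is a minimal sketch under stated assumptions and not the claimed implementation: `GovernanceService`, `DataAssetModel`, the configuration keys and the request names `describe` and `preview` are hypothetical, and a callable stands in for access to the entity data source.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


@dataclass
class DataAssetModel:
    """Hypothetical model built from the received configuration."""
    name: str
    storage_type: str  # "relational" or "streaming"
    storage_info: Dict[str, Any]
    read_rows: Callable[[], List[Row]]  # stand-in for the entity data source


class GovernanceService:
    def __init__(self) -> None:
        self._models: Dict[str, DataAssetModel] = {}

    def create_model(
        self, config: Dict[str, Any], read_rows: Callable[[], List[Row]]
    ) -> DataAssetModel:
        """Steps 1 and 2: receive configuration and create the corresponding model."""
        storage_type = config["storage_type"]
        if storage_type not in ("relational", "streaming"):
            raise ValueError(f"unknown storage type: {storage_type}")
        model = DataAssetModel(
            name=config["model_name"],
            storage_type=storage_type,
            storage_info=dict(config.get("storage_info", {})),
            read_rows=read_rows,
        )
        self._models[model.name] = model
        return model

    def handle(self, model_name: str, request: str) -> Any:
        """Step 3: respond to a management request according to the model."""
        model = self._models[model_name]
        if request == "describe":
            return {"name": model.name, "storage_type": model.storage_type, **model.storage_info}
        if request == "preview":
            return model.read_rows()[:5]
        raise ValueError(f"unsupported request: {request}")
```

A caller would register a model once and then route each management request through `handle`, so that every response is derived from the stored model rather than from the data source directly.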
CN201911399691.6A 2019-12-30 2019-12-30 Data management method and device, electronic equipment and readable storage medium Pending CN113127455A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911399691.6A CN113127455A (en) 2019-12-30 2019-12-30 Data management method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN113127455A true CN113127455A (en) 2021-07-16

Family

ID=76768412

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911399691.6A Pending CN113127455A (en) 2019-12-30 2019-12-30 Data management method and device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN113127455A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113706060A (en) * 2021-10-29 2021-11-26 中国电力科学研究院有限公司 Power grid regulation and control data asset processing method, system, equipment and storage medium
CN115328948A (en) * 2022-02-22 2022-11-11 杭州美创科技有限公司 Master data quality management method, master data quality management device, computer equipment and storage medium
CN115934825A (en) * 2023-02-02 2023-04-07 成都卓讯智安科技有限公司 Data access method and system based on Elasticsearch, electronic equipment and storage medium
CN115934825B (en) * 2023-02-02 2023-08-25 成都卓讯智安科技有限公司 Data access method, system, electronic device and storage medium based on elastic search

Similar Documents

Publication Publication Date Title
US11249981B2 (en) Data quality analysis
US20200311098A1 (en) Mapping instances of a dataset within a data management system
US7693857B2 (en) Clinical genomics merged repository and partial episode support with support abstract and semantic meaning preserving data sniffers
CN105550241B (en) Multi-dimensional database querying method and device
US10956422B2 (en) Integrating event processing with map-reduce
US7925672B2 (en) Metadata management for a data abstraction model
US8671084B2 (en) Updating a data warehouse schema based on changes in an observation model
US7464073B2 (en) Application of queries against incomplete schemas
US9002905B2 (en) Rapidly deploying virtual database applications using data model analysis
CN113127455A (en) Data management method and device, electronic equipment and readable storage medium
CN116450890A (en) Graph data processing method, device and system, electronic equipment and storage medium
CN113127458A (en) Data quality auditing method and device, electronic equipment and storage medium
US20080189289A1 (en) Generating logical fields for a data abstraction model
Hinrichs et al. An ISO 9001: 2000 Compliant Quality Management System for Data Integration in Data Warehouse Systems.
US20230289331A1 (en) Model generation service for data retrieval
US20190266163A1 (en) System and method for behavior-on-read query processing
US20040267704A1 (en) System and method to retrieve and analyze data
US20220019566A1 (en) System and method for integrating systems to implement data quality processing
US7469249B2 (en) Query-driven partial materialization of relational-to-hierarchical mappings
CN113128805A (en) Method and device for treating streaming data, electronic equipment and storage medium
CN113064943A (en) Data acquisition method and device, electronic equipment and storage medium
US7711730B2 (en) Method of returning data during insert statement processing
CN115599520A (en) Intelligent data processing method, device, equipment and storage medium
CN113342392A (en) Project information query method and device based on multiple code warehouses
Büchi et al. Relational Data Access on Big Data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination