CN113641663A

CN113641663A - Big data management method and system based on DAMA theory

Info

Publication number: CN113641663A
Application number: CN202111213648.3A
Authority: CN
Inventors: 周文群
Original assignee: Beijing Jinhongrui Information Technology Co ltd
Current assignee: Beijing Jinhongrui Information Technology Co ltd
Priority date: 2021-10-19
Filing date: 2021-10-19
Publication date: 2021-11-12
Anticipated expiration: 2041-10-19
Also published as: CN113641663B

Abstract

The invention provides a big data management method based on DAMA theory, comprising: obtaining accessible data resources; accessing the accessible data resources, organizing and comprehensively managing them based on a preset big data base platform, and generating a service management result; and receiving the service management result and transmitting it to a consumption port. The invention covers governance across the full data life cycle and provides a multi-level data standard management system, a flexible and extensible data source adaptation method, a rapid data service release mode and similar functions. It thereby offers an efficient and systematic solution for big data governance and realizes a unified governance system across big data platforms.

Description

Big data management method and system based on DAMA theory

Technical Field

The invention relates to the technical field of data processing systems and big data management, in particular to a big data management method and a big data management system based on a DAMA theory.

Background

At present, data processing systems are used in administrative, commercial, financial and many other industries, which place demands on a variety of components and on unified data. To meet different business requirements, data may be stored in different components and processed along multiple calculation paths, such as real-time calculation and offline calculation. Most existing data processing systems nevertheless adopt similarly narrow data processing methods: they cannot govern data from its original source through to its final point of use, and they struggle to provide unified control over platform products that follow differing specifications.

The DAMA theory is a knowledge system for data management and mainly provides a set of theoretical framework for data management work.

Disclosure of Invention

The invention provides a big data management method and a big data management system based on DAMA theory, which aim to solve the above problems.

The invention provides a big data management method based on DAMA theory, which is characterized by comprising the following steps:

step 1: obtaining accessible data resources;

step 2: accessing the accessible data resources, organizing and comprehensively managing the accessible data resources based on a preset big data base platform, and generating a service management result;

step 3: receiving the service management result and transmitting the service management result to a consumption port; wherein

the consumption ports include at least one or more of a user port, an application port, and an analysis port.
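As a non-limiting illustration, steps 1 to 3 can be sketched as a small pipeline. The function names, the in-memory sources and the reduction of "management" to dropping empty records are all assumptions made for this sketch; only the three consumption ports come from the text.

```python
from enum import Enum
from typing import Dict, Iterable, List


class ConsumptionPort(Enum):
    """The three consumption ports named in the text."""
    USER = "user"
    APPLICATION = "application"
    ANALYSIS = "analysis"


def obtain_resources(sources: Iterable[Iterable[dict]]) -> List[dict]:
    """Step 1: gather records from every accessible source (hypothetical in-memory sources)."""
    return [record for source in sources for record in source]


def govern(records: List[dict]) -> dict:
    """Step 2: stand-in for the platform's organization and management.

    Assumption: 'management' is reduced here to dropping empty records.
    """
    kept = [record for record in records if record]
    return {"record_count": len(kept), "records": kept}


def deliver(result: dict, ports: Iterable[ConsumptionPort]) -> Dict[ConsumptionPort, dict]:
    """Step 3: hand the same service management result to each requested port."""
    return {port: result for port in ports}


sources = [[{"id": 1}], [{}, {"id": 2}]]
result = govern(obtain_resources(sources))
delivered = deliver(result, [ConsumptionPort.USER, ConsumptionPort.ANALYSIS])
```

In a real platform each port would likely receive a view tailored to its consumer; the sketch passes the same result to every port for brevity.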

As an embodiment of the present technical solution, the following steps are further performed before step 2:

step 100: calculating the sampling capacity of the accessible data resources;

[Formula not preserved in the source text.]

wherein the formula defines, in order: the sampling capacity of the accessible data resources; the sampling capacity of the i-th batch of accessible data resources, where i runs from 1 to the total number of batches of accessible data resources; the j-th resource data in the i-th batch of accessible data resources, where j runs from 1 to the total number of resource data; the j-th sampling capacity in the i-th batch of accessible data resources; and the capacity occupied during sampling by the j-th resource data in the i-th batch of accessible data resources;

step 101: calculating a loss difference value between a preset memory access capacity and the sampling capacity;

[Formula not preserved in the source text.]

wherein the formula defines, in order: the loss difference value; the capacity difference of the i-th batch of accessible data resources; and the memory access capacity;

step 102: dividing the loss difference value to determine a division result;

[Formula not preserved in the source text.]

wherein the formula defines, in order: the division result; and the predicted loss difference value. The division result falls into one of four ranges, whose bounds are not preserved in the source text: when it falls in the first range, the divided loss difference value is the first-type division result; when it falls in the second range, the divided loss difference value is the second-type division result; when it falls in the third range, the divided loss difference value is the third-type division result; and when it falls in the fourth range, the divided loss difference value is the fourth-type division result;

step 103: when the divided loss difference value is the first-type division result, incrementing the loss difference value of the accessible data resources to determine incremental data;

when the divided loss difference value is the second-type or the third-type division result, acquiring the data capacity of the accessible data resources, and determining corresponding batch data or full data according to the data capacity;

and when the divided loss difference value is the fourth-type division result, performing real-time access on the accessible data resources to determine real-time data.
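Steps 100 to 103 can be illustrated with a minimal sketch. Because the formulas and range bounds are not available in the text, everything numeric below is an assumption: the sampling capacity is taken as a plain sum of per-item occupied capacity, the loss difference as memory access capacity minus sampling capacity, the division result as the ratio to a positive predicted loss difference, and the bounds `(1.0, 0.5, 0.1)` and `full_load_limit` are hypothetical. The only parts taken from the text are the four result types and the mapping from type to access mode.

```python
from enum import Enum
from typing import Sequence


class AccessMode(Enum):
    INCREMENTAL = "incremental"
    BATCH = "batch"
    FULL = "full"
    REAL_TIME = "real_time"


def sampling_capacity(batches: Sequence[Sequence[float]]) -> float:
    """Step 100 (assumed form): add up the capacity each item occupies, batch by batch."""
    return sum(sum(batch) for batch in batches)


def loss_difference(memory_access_capacity: float,
                    batches: Sequence[Sequence[float]]) -> float:
    """Step 101 (assumed form): preset memory access capacity minus sampling capacity."""
    return memory_access_capacity - sampling_capacity(batches)


def division_type(loss_diff: float, predicted_loss_diff: float,
                  bounds=(1.0, 0.5, 0.1)) -> int:
    """Step 102 (assumed form): compare the ratio with hypothetical descending bounds.

    predicted_loss_diff is assumed to be positive.
    """
    ratio = abs(loss_diff) / predicted_loss_diff
    for kind, bound in enumerate(bounds, start=1):
        if ratio >= bound:
            return kind
    return 4


def access_mode(kind: int, data_capacity: float, full_load_limit: float) -> AccessMode:
    """Step 103: map the division type to an access mode."""
    if kind == 1:
        return AccessMode.INCREMENTAL
    if kind in (2, 3):
        # Assumption: data small enough is loaded in full, otherwise in batches.
        return AccessMode.FULL if data_capacity <= full_load_limit else AccessMode.BATCH
    return AccessMode.REAL_TIME


batches = [[10, 20], [30]]
kind = division_type(loss_difference(100, batches), predicted_loss_diff=40)
mode = access_mode(kind, sampling_capacity(batches), full_load_limit=100)
```

The descending bounds reflect the explanation given in the embodiments, where the largest loss leads to incremental access and the smallest to real-time access.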

As an embodiment of the present technical solution, the step 2 includes:

step 201: obtaining accessible data resources, accessing the accessible data resources into a big data unit through a preset data access mode, and determining multi-source heterogeneous data resources; wherein

the data access mode comprises batch data access, real-time data access, full data access and incremental data access;

step 202: uniformly converging the multi-source heterogeneous data resources, determining service data, and transmitting the service data to a one-stop data organization management unit;

step 203: managing, comprehensively utilizing and security-controlling the service data, and generating a corresponding service management result.

As an embodiment of the present technical solution, the step 203 further includes the following steps:

step S1: acquiring service data, and configuring service basic information according to the service data; wherein

the service basic information at least comprises a data source and an organization role;

step S2: carrying out standard design on the service basic information based on a preset standard design tool to determine standard data; wherein

the standard design comprises business modeling, standard definition and physical table creation;

step S3: processing, refining and storing the standard data in a preset task processing model based on a preset visual tool to generate target data;

step S4: periodically detecting the target data based on a preset data detection period and generating a quality report; wherein

the data detection period comprises a data demand response period and a data service construction period;

step S5: when the quality report is qualified, publishing a data service in a preset data service mode; wherein

the data service mode is a mode of authorizing corresponding data through a preset rule;

step S6: receiving a service result of the data service, regulating technical assets through the service result, assisting a user in constructing business assets through the technical assets, and generating a corresponding business management result.
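The quality gate between steps S4 and S5 can be sketched as follows. This is an illustration only: the pass criterion (a minimum share of rows with all required fields filled), the `min_pass_rate` value and the rule signature `rule(role, row)` are assumptions, since the text states only that a data service is published when the quality report is qualified and that data is authorized through a preset rule.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class QualityReport:
    total: int
    failed: int

    def qualified(self, min_pass_rate: float = 0.95) -> bool:
        """Hypothetical pass criterion: share of clean rows meets a threshold."""
        if self.total == 0:
            return False
        return (self.total - self.failed) / self.total >= min_pass_rate


def detect_quality(rows: List[dict], required: Iterable[str]) -> QualityReport:
    """Step S4 (simplified): a row fails if any required field is missing or empty."""
    required = list(required)
    failed = sum(1 for row in rows
                 if any(row.get(field) in (None, "") for field in required))
    return QualityReport(total=len(rows), failed=failed)


def publish(rows: List[dict], report: QualityReport,
            rule: Callable[[str, dict], bool], role: str,
            min_pass_rate: float = 0.95) -> List[dict]:
    """Step S5 (simplified): release only qualified data, filtered by an authorization rule."""
    if not report.qualified(min_pass_rate):
        raise ValueError("quality report not qualified; data service not published")
    return [row for row in rows if rule(role, row)]


rows = [{"id": 1, "dept": "sales"}, {"id": 2, "dept": "hr"}, {"id": 3, "dept": ""}]
report = detect_quality(rows, ["id", "dept"])
same_dept = lambda role, row: row["dept"] == role  # hypothetical rule
visible = publish(rows, report, same_dept, "sales", min_pass_rate=0.6)
```

With the default threshold the same call raises `ValueError`, because one of the three rows fails the check.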

As an embodiment of the present invention, the step S4 includes:

step S401: receiving asset data and security control data, determining the receiving duration, and determining a data demand response period from the receiving duration;

step S402: dividing out a data service construction period according to the data demand response period;

step S403: periodically detecting the target data through the data service construction period, and generating a quality report.
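One possible reading of steps S401 to S403 is sketched below. The text does not say how the periods are computed, so two assumptions are made: the data demand response period equals the observed receiving duration, and the data service construction period is an equal division of it by a hypothetical `divisions` parameter.

```python
from datetime import datetime, timedelta
from typing import List


def demand_response_period(received_start: datetime, received_end: datetime) -> timedelta:
    """Step S401 (assumed): the response period equals the observed receiving duration."""
    return received_end - received_start


def construction_period(response_period: timedelta, divisions: int) -> timedelta:
    """Step S402 (assumed): split the response period into equal construction periods."""
    if divisions < 1:
        raise ValueError("divisions must be at least 1")
    return response_period / divisions


def detection_times(start: datetime, period: timedelta, count: int) -> List[datetime]:
    """Step S403: instants at which the target data would be checked."""
    return [start + period * k for k in range(1, count + 1)]


start = datetime(2021, 10, 19, 0, 0)
end = datetime(2021, 10, 19, 6, 0)
response = demand_response_period(start, end)
period = construction_period(response, divisions=3)
schedule = detection_times(end, period, count=3)
```

A quality detection job such as the one in step S4 would then run at each instant in `schedule`.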

As an embodiment of the present invention, the step S6 includes:

step S601: receiving a service result of the data service, regulating technical assets through the service result, assisting a user to construct business assets through the technical assets, and generating a corresponding business management result.

step S602: receiving a service result of the data service based on the big data center, and determining a corresponding service data type; wherein

the service data types comprise theme data, integration data and real-time data;

step S603: determining a data service according to the service data type, and regulating technical assets through the data service;

step S604: automatically summarizing, through the technical assets, the asset metadata, applicable standards and lineage graph of each asset;

step S606: constructing the business assets through the asset metadata, the applicable standards and the lineage graph, and generating corresponding business management results.
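The per-asset summary of metadata, applicable standards and lineage (the graph of blood relationships between assets) can be illustrated with a short sketch. The dictionary layout, the asset names and the choice to report all direct and indirect upstream assets are assumptions for illustration; the text does not specify a data model.

```python
from typing import Dict, List


def upstream_lineage(asset: str, parents: Dict[str, List[str]]) -> List[str]:
    """Return every direct and indirect upstream asset, sorted by name."""
    seen = set()
    stack = list(parents.get(asset, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return sorted(seen)


def summarize_asset(asset: str, metadata: Dict[str, dict],
                    standards: Dict[str, List[str]],
                    parents: Dict[str, List[str]]) -> dict:
    """Collect the three kinds of information named in the text for one asset."""
    return {
        "asset": asset,
        "metadata": metadata.get(asset, {}),
        "standards": standards.get(asset, []),
        "lineage": upstream_lineage(asset, parents),
    }


# Hypothetical assets: a report built from a detail table fed by two source tables.
parents = {"sales_report": ["dwd_orders"], "dwd_orders": ["ods_orders", "ods_users"]}
metadata = {"sales_report": {"owner": "finance"}}
standards = {"sales_report": ["amount_cny"]}
summary = summarize_asset("sales_report", metadata, standards, parents)
```

The `seen` set also keeps the traversal from looping if the parent map accidentally contains a cycle.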

A big data governance system based on DAMA theory is characterized by comprising:

a data resource module: for acquiring accessible data resources;

a data management platform module: used for accessing the accessible data resources, organizing and comprehensively managing them based on a preset big data base platform, and generating a service management result;

a data consumption module: used for receiving the service management result and transmitting the service management result to a consumption port; wherein

the consumption ports include at least one or more of a user port, an application port, and an analysis port.

As an embodiment of the present technical solution, the accessible data resource at least includes one or more of a relational database data source, a big data source, a file server data source, a message middleware data source, an interface data source, and a search engine data source.
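An extensible way to adapt such heterogeneous sources is a registry that maps a source type to a reader function. The following is a minimal sketch, not the platform's actual adapter mechanism: the registry, the decorator, the configuration keys (`type`, `content`, `payload`) and the two in-memory readers are hypothetical, and real adapters would connect to databases, message middleware and so on.

```python
import csv
import io
import json
from typing import Callable, Dict, List

Adapter = Callable[[dict], List[dict]]
ADAPTERS: Dict[str, Adapter] = {}


def adapter(source_type: str) -> Callable[[Adapter], Adapter]:
    """Register a reader function under a source-type name."""
    def register(fn: Adapter) -> Adapter:
        ADAPTERS[source_type] = fn
        return fn
    return register


@adapter("file_server")
def read_csv_text(config: dict) -> List[dict]:
    # Assumption: the file content is already available as CSV text.
    return [dict(row) for row in csv.DictReader(io.StringIO(config["content"]))]


@adapter("interface")
def read_json_payload(config: dict) -> List[dict]:
    # Assumption: the interface returned a JSON array of objects.
    return json.loads(config["payload"])


def acquire(config: dict) -> List[dict]:
    """Dispatch to the adapter registered for config['type']."""
    try:
        reader = ADAPTERS[config["type"]]
    except KeyError:
        raise ValueError(f"no adapter registered for {config.get('type')!r}") from None
    return reader(config)


csv_rows = acquire({"type": "file_server", "content": "id,name\n1,a\n"})
json_rows = acquire({"type": "interface", "payload": '[{"id": 2}]'})
```

Supporting a new source type then only requires registering one more function, without changing `acquire`.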

As an embodiment of the present technical solution, the data management platform module includes:

a data access unit: used for obtaining accessible data resources, accessing the accessible data resources into a big data unit through a preset data access mode, and determining multi-source heterogeneous data resources; wherein

the data access mode comprises a batch data access mode, a real-time data access mode, a full data access mode and an incremental data access mode;

a big data unit: used for uniformly converging the multi-source heterogeneous data resources to determine service data;

a data management unit: used for managing, comprehensively utilizing and security-controlling the service data, and generating a corresponding service management result.

As an embodiment of the present technical solution, the data management unit further includes:

a one-stop data organization management unit: used for carrying out service mining on the service data, determining the mined data and carrying out organization management on the mined data; wherein the organization management comprises standard design of data, data development, quality evaluation and asset estimation;

a data comprehensive utilization unit: used for comprehensively utilizing the service data; wherein the comprehensive utilization comprises comprehensive retrieval of data, asset navigation, service navigation, a data cockpit and a knowledge base;

a data security management and control unit: used for carrying out security control on the service data; wherein the security control comprises user management, service management, audit management, log audit and panoramic operation and maintenance.

The invention has the following beneficial effects:

The embodiment of the invention provides a big data governance method based on DAMA theory. Accessible data resources are obtained, and different accessible data resources are accessed into a data management platform, with the data processed according to the corresponding data access mode. The accessible data resources are organized and comprehensively managed based on a preset big data base platform to generate a service management result, which is then received and transmitted to a consumption port; the consumption ports include at least one or more of a user port, an application port and an analysis port. Organization management carries out standard design, data development, data quality evaluation and data asset estimation, while comprehensive utilization provides comprehensive retrieval, asset navigation, service navigation and service development; together these realize comprehensive governance of the data. The big data base platform passes data to one-stop data organization management, and security control over the data implements user management, service management, audit management, log audit and panoramic operation and maintenance, so that the data can be consumed. Based on a domain-oriented abstract design method, a unified governance system across big data platforms is realized, and an efficient and systematic solution for big data governance is provided in a "tool + knowledge + operation" mode.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.

The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

FIG. 1 is a flow chart of a big data governance method based on DAMA theory in the embodiment of the present invention;

FIG. 2 is a block diagram of a big data governance system based on DAMA theory in an embodiment of the present invention;

FIG. 3 is a block diagram of a big data governance system based on DAMA theory in an embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.

It will be understood that when an element is referred to as being "secured to" or "disposed on" another element, it can be directly on the other element or be indirectly on the other element. When an element is referred to as being "connected to" another element, it can be directly or indirectly connected to the other element.

It will be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like, as used herein, refer to an orientation or positional relationship indicated in the drawings that is solely for the purpose of facilitating the description and simplifying the description, and do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and is therefore not to be construed as limiting the invention.

Moreover, it is noted that, in this document, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions, and "a plurality" means two or more unless specifically limited otherwise. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Example 1:

as shown in fig. 1, an embodiment of the present invention provides a big data governance method based on DAMA theory, which is characterized by including:

step 1: obtaining accessible data resources;

The working principle and the beneficial effects of the technical scheme are as follows:

Example 2:

this technical solution provides an embodiment in which the following steps are further performed before step 2:

step 100: calculating the sampling capacity of the accessible data resources;

[Formula not preserved in the source text.]

wherein the formula defines, in order: the sampling capacity of the accessible data resources; the sampling capacity of the i-th batch of accessible data resources, where i runs from 1 to the total number of batches of accessible data resources; the j-th resource data in the i-th batch of accessible data resources, where j runs from 1 to the total number of resource data; the j-th sampling capacity in the i-th batch of accessible data resources; and the capacity occupied during sampling by the j-th resource data in the i-th batch of accessible data resources;

step 101: calculating a loss difference value between a preset memory access capacity and the sampling capacity;

[Formula not preserved in the source text.]

wherein the formula defines, in order: the loss difference value; the capacity difference of the i-th batch of accessible data resources; and the memory access capacity;

step 102: dividing the loss difference value to determine a division result;

[Formula not preserved in the source text.]

wherein the formula defines, in order: the division result; and the predicted loss difference value. The division result falls into one of four ranges, whose bounds are not preserved in the source text, and the divided loss difference value is correspondingly the first-type, second-type, third-type or fourth-type division result;

In this technical solution, the sampling capacity of the accessible data resources is calculated, and the loss difference value between the preset memory access capacity and the sampling capacity is calculated from it. The loss difference value is then divided to determine a division result (the formulas and range bounds are not preserved in the source text). When the divided loss difference value is the first-type division result, the loss difference value of the accessible data resources is incremented to determine incremental data, because the data loss is too large and the data must be processed and supplemented. When the divided loss difference value is the second-type or third-type division result, the accessible data resources are within the threshold range, so full access can be performed. Classifying data resources by condition in this way allows multiple processes to run simultaneously, which improves operating efficiency and reduces operating cost. When the divided loss difference value is the fourth-type division result, the accessible data resources are accessed in real time to determine real-time data: when the loss difference value reaches its minimum, the data are regarded as approximately ideal and are accessed in real time, which improves the accuracy and flexibility of the data.

Example 3:

this technical solution provides an embodiment in which step 2 includes:

In this technical solution, accessible data resources are obtained and accessed into a big data unit through a preset data access mode, and multi-source heterogeneous data resources are determined. The data access modes comprise batch data access, real-time data access, full data access and incremental data access; offering different access modes improves data transmission efficiency and the flexibility of process operation. The multi-source heterogeneous data resources are uniformly converged to determine service data, which is transmitted to a one-stop data organization management unit. The service data is then managed, comprehensively utilized and security-controlled to generate a corresponding service management result. This achieves comprehensive management of services, improves data utilization and speeds up operations.

Example 4:

this technical solution provides an embodiment, and step 203 further includes the following steps:

step S4: periodically detecting the target data based on a preset data detection period and generating a quality report;

This technical solution is based on DAMA theory and best practice. Service data is obtained, the data management system is started, and basic information such as data sources and organization roles is configured. Standard design is carried out on the service basic information with a preset standard design tool, which is used for business modeling, standard definition and physical table creation. A data access task is configured so that multi-source heterogeneous data is uniformly converged. Based on a preset visual tool, the standard data is processed, refined and stored again in a preset task processing model to generate target data. The target data is periodically detected based on a preset data detection period: a quality detection job is set up, the detection task is executed periodically, and a quality report is output. When the quality report is qualified, a data service is published in a preset data service mode, namely a "data + rule + authorization" mode in which the corresponding data is authorized through a preset rule. A service result of the data service is received, technical assets are automatically organized through the service result, and the user is assisted in constructing business assets through the technical assets; asset metadata, applicable standards, the lineage graph and other information are automatically summarized for each asset. Data modeling, data development, data management, data sharing and other work are thereby carried out, achieving a systematic management system for the full data life cycle and effectively supporting the scenario requirements of data organization management, comprehensive data utilization and data security control.

Example 5:

the present technical solution provides an embodiment, where in the step S4, the method includes:

In this technical solution, asset data and security control data are received, the receiving duration is determined, and the data demand response period is determined from the receiving duration. A data response rule is thereby established, so that data can be mined more flexibly and in a more targeted way according to the rule. The data service construction period is divided out according to the data demand response period. The data demand response period is used for periodically acquiring data, and the data service construction period is used for mining and constructing data. The target data is periodically detected through the data service construction period and a quality report is generated, which improves data accuracy. Based on DAMA theory and best practice, the solution covers governance across the full data life cycle and provides a multi-level data standard management system, a flexible and extensible data source adaptation method, a rapid data service release mode and similar functions.

Example 6:

the present technical solution provides an embodiment, where in the step S6, the method includes:

step S601: receiving a service result of the data service, regulating technical assets through the service result, assisting a user in constructing business assets through the technical assets, and generating a corresponding business management result;

In this technical solution, a service result of the data service is received, technical assets are regulated through the service result, the user is assisted in constructing business assets through the technical assets, and a corresponding business management result is generated, which helps the user build the business. The service result of the data service is received based on the big data center, and the corresponding service data type is determined; the service data types comprise theme data, integration data and real-time data. A data service is determined according to the service data type, and the technical assets are regulated through the data service. The asset metadata, applicable standards and lineage graph of each asset are automatically summarized through the technical assets, and the business assets are constructed from them to generate the corresponding business management results.

Example 7:

a big data governance system based on DAMA theory, comprising:

a data resource module: for acquiring accessible data resources;

the technical scheme provides a big data management system based on a DAMA theory, which comprises a data resource module, a data management platform module and a data consumption module, wherein the data resource module is used for acquiring accessible data resources; the data management platform module is used for accessing the accessible data, organizing and comprehensively managing the accessible data based on a preset big data base platform, and generating a service management result; the data consumption module is used for receiving the service management result and transmitting the service management result to the consumption port; the consumption ports include at least one or more of a user port, an application port, and an analysis port.

Example 8:

the present disclosure provides an embodiment, where the accessible data resources include at least one or more of a relational database data source, a big data source, a file server data source, a message middleware data source, an interface data source, and a search engine data source.

The accessible data resources include at least one or more of a relational database data source, a big data source, a file server data source, a message middleware data source, an interface data source and a search engine data source. These multi-source heterogeneous data sources are uniformly converged to build services, which improves the utilization rate and commercial value of the data.

Example 9:

this technical solution provides an embodiment, and the data management platform module includes:

the data management platform module comprises a data access unit, a big data unit and a data management unit, wherein the data access unit is used for acquiring accessible data resources, accessing the accessible data resources to the big data unit through a preset data access mode and determining multi-source heterogeneous data resources, and the data access mode comprises a batch data access mode, a real-time data access mode, a full data access mode and an incremental data access mode; the big data unit is used for uniformly converging multi-source heterogeneous data resources and determining service data; the data management unit is used for managing, comprehensively utilizing and safely controlling the service data and generating a corresponding service management result.

Example 10:

the technical solution provides an embodiment, where the data management unit further includes:

a one-stop data organization management unit: used for carrying out service mining on the service data, determining the mined data and carrying out organization management on the mined data; wherein

the organization management comprises standard design of data, data development, quality evaluation and asset estimation;

a data comprehensive utilization unit: used for comprehensively utilizing the service data; wherein

the comprehensive utilization comprises comprehensive retrieval of data, asset navigation, service navigation, a data cockpit and a knowledge base;

a data security management and control unit: used for carrying out security control on the service data; wherein

the security control comprises user management, service management, audit management, log audit and panoramic operation and maintenance.

In the data management unit of this technical solution, the one-stop data organization management unit carries out service mining on the service data, determines the mined data and carries out organization management on it, providing original data for the data service. The organization management includes standard design of data, data development, quality evaluation and asset estimation, and judges the value of the data so as to achieve maximum commercial utilization. The data comprehensive utilization unit comprehensively utilizes the service data, which includes comprehensive retrieval of data, asset navigation, service navigation, a data cockpit and a knowledge base. The data security management and control unit carries out security control on the service data, which includes user management, service management, audit management, log audit and panoramic operation and maintenance.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims

1. A big data governance method based on DAMA theory is characterized by comprising the following steps:

step 1: obtaining accessible data resources;

2. The big data governance method based on DAMA theory as claimed in claim 1, wherein, before said step 2, the method further comprises:

step 100: calculating the sampling capacity of the accessible data resources:

[formula not reproduced in the source text]

wherein the symbols of the formula, likewise not reproduced, represent in order: the sample capacity of the accessible data resources; the sample capacity of a given batch of accessible data resources; the total number of batches of accessible data resources; a given resource data within a given batch of accessible data resources; the total number of resource data; the sampling capacity of a given resource data within a given batch of accessible data resources; and the capacity occupied by a given resource data within a given batch of accessible data resources during sampling;

[formula for the loss difference not reproduced in the source text]

wherein the symbols represent in order: the loss difference; the capacity difference of a given batch of accessible data resources; and the memory access capacity;

step 102: dividing the loss difference value to determine a division result:

[formula not reproduced in the source text]

wherein the symbols represent in order: the division result; and the predicted loss difference; the value ranges of the division result that distinguish the individual types are not reproduced in the source text, and under the last of these ranges the divided loss difference value is the fourth type division result;

3. The big data governance method based on the DAMA theory as claimed in claim 1, wherein said step 2 comprises:

4. The big data governance method based on DAMA theory as claimed in claim 3, wherein said step 203 further comprises the steps of:

5. The big data governance method based on the DAMA theory as claimed in claim 4, wherein said step S4 includes:

6. The big data governance method based on the DAMA theory as claimed in claim 4, wherein said step S6 includes:

step S602: receiving a service result of the data service based on the big data center, and determining a corresponding service data type; wherein the service data types comprise subject data, integration data and real-time data;

7. A big data governance system based on DAMA theory is characterized by comprising:

a data resource module: for acquiring accessible data resources;

8. The DAMA-theory based big data governance system in accordance with claim 7, wherein said accessible data resources comprise one or more of relational database data sources, big data sources, file server data sources, message middleware data sources, interface data sources, and search engine data sources.

9. The DAMA-theory based big data governance system according to claim 7, wherein said data management platform module comprises:

10. The DAMA theory-based big data governance system according to claim 9, wherein said data management unit further comprises: