CN113468160A - Data management method and device and electronic equipment - Google Patents

Data management method and device and electronic equipment

Info

Publication number
CN113468160A
Authority
CN
China
Prior art keywords
field
data
base table
data set
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110837541.XA
Other languages
Chinese (zh)
Inventor
刘圣财
许阳
叶科
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dt Dream Technology Co Ltd
Original Assignee
Hangzhou Dt Dream Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dt Dream Technology Co Ltd filed Critical Hangzhou Dt Dream Technology Co Ltd
Priority to CN202110837541.XA priority Critical patent/CN113468160A/en
Publication of CN113468160A publication Critical patent/CN113468160A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/254 Extract, transform and load [ETL] procedures, e.g. ETL data flows in data warehouses
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the application provide a data management method and device and electronic equipment. The method comprises the following steps: configuring a business data set containing field information of a subject library model and a field standard corresponding to the field information; based on information of the data items in the business data set, constructing a mapping relation between each data item and a field in an original base table; based on the field standard that corresponds, in the business data set, to each field in the original base table, performing data cleaning on the data in the original base table, and storing the cleaned standardized data into a standard base table that inherits the field information of the original base table; and after data fusion is performed on the standardized data in the standard base table, determining, based on the mapping relation between the data items of the business data set and the fields in the original base table, which fields in the subject base table correspond to the fields that the standard base table inherits from the original base table, so that the fused data are stored in the subject base table.

Description

Data management method and device and electronic equipment
Technical Field
The embodiment of the application relates to the technical field of internet, in particular to a data management method and device and electronic equipment.
Background
Data governance means that internal and external shared data are governed through data access, data cleaning, data fusion, special-topic processing and other governance steps, so as to form a unified big data resource library. The big data resource library then provides a unified data directory service for internal business systems and for external sharing and exchange.
Disclosure of Invention
The embodiment of the specification provides a data management method and device and an electronic device:
according to a first aspect of embodiments herein, there is provided a data governance method, the method comprising:
configuring a business data set containing field information of a subject library model and a field standard corresponding to the field information;
based on information of the data items in the business data set, constructing a mapping relation between each data item of the business data set and a field in an original base table;
based on the field standard that corresponds, in the business data set, to each field in the original base table, performing data cleaning on the data in the original base table, and storing the cleaned standardized data into a standard base table that inherits the field information of the original base table;
after data fusion is performed on the standardized data in the standard base table, determining, based on the mapping relation between the data items of the business data set and the fields in the original base table, which fields in the subject base table correspond to the fields that the standard base table inherits from the original base table, so that the fused data are stored in the subject base table.
According to a second aspect of embodiments herein, there is provided a data governance device, the device comprising:
a configuration unit, configured to configure a business data set containing field information of a subject library model and a field standard corresponding to the field information;
a construction unit, configured to construct a mapping relation between each data item of the business data set and a field in an original base table, based on information of the data items in the business data set;
a cleaning unit, configured to perform data cleaning on the data in the original base table based on the field standard that corresponds, in the business data set, to each field in the original base table, and to store the cleaned standardized data into a standard base table that inherits the field information of the original base table;
and a fusion unit, configured to, after data fusion is performed on the standardized data in the standard base table, determine, based on the mapping relation between the data items of the business data set and the fields in the original base table, which fields in the subject base table correspond to the fields that the standard base table inherits from the original base table, so that the fused data are stored in the subject base table.
According to a third aspect of embodiments herein, there is provided an electronic apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform any one of the data governance methods described above.
The embodiments of the present specification provide a data governance scheme that is driven by the governance target and guided by the result. A business-level data set (including a subject library model and the data element and cleaning rule of each subject library field) is sorted out first, and the data governance process from the original library -> standard library -> subject library is then optimized and controlled by mapping on the basis of the business data set, so as to shorten the time for standardized cleaning of tables and improve the quality of the cleaned data.
Drawings
FIG. 1 is a schematic diagram of a prior art data governance system as provided herein.
FIG. 2 is a flowchart of a data governance method provided in an embodiment of the present specification.
FIG. 3 is a schematic diagram of an improved data governance system provided in an embodiment of the present specification.
FIG. 4 is a hardware structure diagram of the equipment in which a data governance device provided in an embodiment of the present specification is located.
FIG. 5 is a block diagram of a data governance device provided in an embodiment of the present specification.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the specification, as detailed in the appended claims.
The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of the present specification. The word "if" as used herein may be interpreted as "upon ...", "when ..." or "in response to a determination", depending on the context.
Data governance means that internal and external shared data are governed through data access, data cleaning, data fusion, special-topic processing and other governance steps, so as to form a unified big data resource library. The big data resource library then provides a unified data directory service for internal business systems and for external sharing and exchange.
FIG. 1 is a schematic diagram of a prior art data governance system. In FIG. 1, a big data resource library interfaces with the data sources on one side and with the data services of the application layer on the other.
The data sources may be individual business systems. A business system may connect to the big data resource library through ETL (Extract-Transform-Load) and transmit the data it generates to the original library of the big data resource library. In FIG. 1, the data types of the data sources may include structured data, semi-structured data, and unstructured data, which shows that the data standards of the data sources are not uniform.
Data services at the application layer may include data sharing, service development, BI reporting, data mining, and the like. These data services rely on the governed data provided by the special-topic library.
As shown in FIG. 1, the big data resource library is divided into an original library, a standard library, a subject library and a special-topic library.
The original library is used for interfacing with the data sources and storing the data transmitted from them. The original library mainly relates to the data access link in the data governance process.
The standard library is used for exploring the data stored in the original library and cleaning and converting it according to data standards (including data elements, data dictionaries and cleaning rules) to form standardized data; the table structure after cleaning directly inherits the table structure of the original library. The standard library mainly relates to the data cleaning link in the data governance process.
The subject library is used for performing fusion processing (such as analysis, synthesis, classification and fusion) on the standardized data stored in the standard library, abstracting entity objects of the business field, and finally forming a normalized, complete and consistent data set for each entity object. For example, in the field of safety production supervision, a plurality of business systems involve production enterprises; the enterprise-related information of the different business systems is analyzed, extracted and designed into a large and complete enterprise data model, and data fusion then forms enterprise subject information that can serve various business scenarios and business fields.
The special-topic library is a data mart layer and is used for generating, from the data of the subject library and the standard library and according to the requirements of the application layer, data that satisfies specific business scenarios, thereby supporting the services of the application layer.
Each library corresponds to a link in the data governance process: the original library relates to the data access link, the standard library to the data cleaning link, the subject library to the data fusion link, and the special-topic library to the special-topic processing link.
1. Original library flow (data governance flow of the original library): relying mainly on an ETL tool, data is extracted into the original library of the big data resource library in full or incrementally, on a schedule or in real time, according to the actual situation of the business production data. This process does not process the data.
2. Standard library flow (data governance flow of the standard library): based on the data of the original library, standardized cleaning is carried out in the following steps:
1) Data exploration: exploring the business data, mainly the vacancy rate, maximum text length, value range and code distribution of the field data in each data table, so that the data of every field in the table is comprehensively understood from the exploration result.
2) Sorting out standards: sorting out all data standards (mainly qualifiers, data elements and data dictionaries) for the fields of the data table, based on the understanding of the fields and in combination with national/industry standards.
3) Data cleaning: associating the sorted-out data standards with the fields of the original library data tables and carrying out standardized cleaning to form standardized data.
3. Subject library flow (data governance flow of the subject library):
1) Carrying out in-depth analysis of the data after standardized cleaning and abstracting a business data model.
2) Sorting out the priority of each business system's data fields relative to the data model fields, according to the authority of the different business systems over different data.
3) According to the field priority relation and taking the standard library data as a reference, fusing the data to form the subject library.
4. Special-topic library flow (data governance flow of the special-topic library): on the premise of the business requirements of the business users, processing the data in a multidimensional manner, including defining the dimensions, the indicators to be calculated, the dimension hierarchies and the like, to generate a data set oriented to the decision analysis requirements of the business systems.
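The data exploration step of the standard library flow can be illustrated with a minimal Python sketch. It is not taken from the application: the function name `profile_field`, the treatment of blank strings as vacant, and the use of string comparison for the value range are all assumptions made for illustration.

```python
from collections import Counter


def profile_field(values):
    """Profile one column: vacancy rate, max text length, value range, code distribution.

    Assumption: None and blank strings count as vacant, and the value range
    is computed on the string form of the values (lexicographic order).
    """
    total = len(values)
    present = [str(v) for v in values if v is not None and str(v).strip() != ""]
    return {
        "vacancy_rate": (total - len(present)) / total if total else 0.0,
        "max_text_length": max((len(t) for t in present), default=0),
        "value_range": (min(present), max(present)) if present else None,
        "code_distribution": Counter(present),
    }


# Hypothetical column of dictionary codes with two vacant entries.
sample = ["01", "02", "", None, "01"]
report = profile_field(sample)
```

Running such a profile over every column of an original library table gives the per-field picture that the standards-sorting step relies on.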
Given the existing data governance system described above, the data governance process has the following problems for customers with common characteristics (such as industry customers in emergency management, fire fighting and the like):
1. The cost of establishing data standards is high. Data standards are established in the standard library flow; they are very important and directly affect the data quality after cleaning. Sorting out the data standards (qualifiers, data elements, data dictionaries, field rules) according to the exploration result requires not only understanding the business data but also searching online for the national/industry standards of the related fields. Repeating this process for every project means high investment and low output.
2. The rules for data cleaning are not uniform. In the data cleaning step of the standard library flow, associating the sorted-out data standards with the original library data is simple given an understanding of the business data. However, because each customer has a great deal of business data and a large number of fields, this link accounts for a large share of the workload. Moreover, because different people understand the data standards differently and use different cleaning algorithms, the same field may even be cleaned differently in different business systems, so that the quality of the cleaned standard data is uneven.
3. Data fusion lacks a uniform fusion standard. In the subject library flow, the data of the business systems needs to be fused to generate the subject library. Although the same business subjects exist within the same industry, the subjects finally fused by different people differ greatly because of differences in cognitive level, business understanding and understanding of data authority.
4. The standard library lacks an association with the subject library. In existing data cleaning, the cleaning rules from the original library to the standard library are generated from the data standards in the business data set, but automatic generation of the subject library is not considered, nor is a uniqueness check of the cleaning rules applied to the same field in different businesses. As a result, inconsistent data must be cleaned again and the subject library must be fused again with manually written SQL.
In summary, the existing data governance process mainly follows the sequence original library -> standard library -> subject library -> special-topic library, and depends excessively on how familiar the on-site data governance personnel are with the customer's industry, so that the quality of the governed data is uneven.
In order to solve the above problems, the present application provides a data governance scheme that is driven by the governance target and guided by the result. A business-level data set (including a subject library model and the data element and cleaning rule of each subject library field) is sorted out first, and the data governance process from the original library -> standard library -> subject library -> special-topic library is then optimized and controlled by mapping on the basis of the business data set, so that the time for standardized cleaning of tables is shortened and the quality of the cleaned data is improved.
The scheme is described below by way of example with reference to the data governance method shown in FIG. 2, which may include the following steps:
Step 110: configuring a business data set containing field information of a subject library model and a field standard corresponding to the field information.
In this embodiment, driven by the governance target and guided by the result, the business-level data set is sorted out starting from the subject base table.
Specifically, the step 110 may include:
Step A1: determining the field information of a subject base table that conforms to the target business, based on the industry specifications related to the target business.
By consulting existing guiding documents of the industry (such as laws and regulations, policy bulletins, construction task books and the like) and combining them with the relevant specifications of the industry, the specifications and functions of the industry and the field information of the related data tables are sorted out to form the field information of the business-level subject base table.
For example, enterprise basic information, expert information, hazardous chemical substance information, hazard source information and potential safety hazard information in the safety production industry; earthquake information, flood disaster information, drought information and hazardous chemical substance explosion information in disaster accident service; rescue team and rescue personnel information in the emergency rescue force service; rescue goods and materials information, rescue equipment information and the like in the emergency rescue goods and materials service.
Step A2: determining the field standard of the subject base table based on the field information of the subject base table.
Based on the field information of the subject base table, the field standards of the data can be sorted out in a unified manner, ensuring that each field in the subject base table corresponds to a unique field standard.
In addition, because multiple roles and multiple process stages can cause the same field standard to correspond to different business data sets, the same field standard may be associated through different qualifiers.
For example, the legal representative ID card number, the responsible person ID card number and the on-duty person ID card number share the ID card number field standard, and the corresponding qualified field standards are legal representative-ID card number, responsible person-ID card number and on-duty person-ID card number.
Step A3: constructing the business data set containing the field information of the subject library model and the field standard corresponding to the field information.
The field information of the business data set may include at least one of a Chinese name, an English name, a length, a field origin, a data type, and the meaning expressed by the field.
The field standard may include at least one of an object word, a characteristic word, a representation word, a field value rule, a dictionary code, and a value range.
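One way to picture the business data set configured in step 110 is as a small set of records. The following Python sketch is only an illustration: the class names, attribute names, example values and the regular expression are hypothetical and are not specified by the application.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class FieldStandard:
    """Field standard: object word, characteristic word, representation word, rules."""
    object_word: str
    characteristic_word: str
    representation_word: str
    value_rule: Optional[str] = None       # hypothetical: a regular expression
    dictionary_code: Optional[str] = None
    value_range: Optional[tuple] = None


@dataclass
class DataItem:
    """Field information of one subject base table field plus its standard."""
    chinese_name: str
    english_name: str
    data_type: str
    length: int
    standard: FieldStandard
    qualifier: str = ""                    # distinguishes reuse of one standard


@dataclass
class BusinessDataSet:
    name: str
    items: dict = field(default_factory=dict)

    def add(self, item: DataItem) -> None:
        # Each field of the subject base table must appear exactly once.
        if item.english_name in self.items:
            raise ValueError(f"duplicate data item: {item.english_name}")
        self.items[item.english_name] = item


# One shared standard, associated through different qualifiers.
id_standard = FieldStandard("person", "identity", "number", value_rule=r"\d{17}[\dX]")
enterprise = BusinessDataSet("enterprise_basic_information")
enterprise.add(DataItem("法定代表人身份证号", "legal_rep_id_number", "string", 18,
                        id_standard, qualifier="legal representative"))
enterprise.add(DataItem("负责人身份证号", "responsible_person_id_number", "string", 18,
                        id_standard, qualifier="responsible person"))
```

Because both data items point at the same `FieldStandard` object, any cleaning rule derived from it applies identically to both fields, which is the uniqueness property step A2 asks for.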
Step 120: constructing a mapping relation between each data item of the business data set and a field in an original base table, based on the information of the data items in the business data set.
In the implementation process, a data source needs to be connected to an original base table of the big data resource library. Because the original base table does not process the data of the data source, the data retains its original data standard.
To facilitate the subsequent data fusion from the standard base tables to the subject base tables, a mapping relation between each data item of the business data set and a field in the original base table can be constructed here, based on the information of the data items in the business data set.
Considering that the existing table structure of each business differs to some extent from the subject base tables in the business data set, the mapping process may encounter the following situations:
1. One-to-one mapping. Correspondingly, in step 120, constructing the mapping relation between each data item of the business data set and a field in the original base table includes:
when the data items of the business data set correspond to the fields in the original base table, constructing a one-to-one mapping relation between the data items of the business data set and the fields in the original base table.
The one-to-one case is the easiest to handle: the data items in the business data set are mapped to the fields in the original base table one by one.
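A minimal sketch of the one-to-one case follows, assuming the mapping is held as a dictionary from data item name to original column name. The function name and the column names (`qymc`, `tyshxydm`, `frsfzh`) are hypothetical.

```python
def build_one_to_one_mapping(item_to_column, original_columns):
    """Validate and return a data-item -> original-column mapping."""
    missing = [c for c in item_to_column.values() if c not in original_columns]
    if missing:
        raise KeyError(f"columns not in original base table: {missing}")
    # One-to-one: no original column may serve two data items.
    if len(set(item_to_column.values())) != len(item_to_column):
        raise ValueError("an original column is mapped to more than one data item")
    return dict(item_to_column)


# Hypothetical original base table columns and business data set items.
original_columns = ["qymc", "tyshxydm", "frsfzh"]
mapping = build_one_to_one_mapping(
    {
        "enterprise_name": "qymc",
        "credit_code": "tyshxydm",
        "legal_rep_id_number": "frsfzh",
    },
    original_columns,
)
```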
2. One-to-many mapping. Correspondingly, in step 120, constructing the mapping relation between each data item of the business data set and a field in the original base table includes:
when the original base table contains non-primary-key information of other tables, splitting and mapping the original base table, taking the business data set as a reference, into a plurality of temporary tables corresponding to the subject base tables;
after filtering out the duplicate fields in the temporary tables, constructing the mapping relation between the business data set and the fields in the temporary tables that correspond to the fields in the original base table.
This case arises when third normal form (3NF) was not followed in designing the data tables of the data source, that is, one original base table contains non-primary-key information of other tables. It is handled mainly by table splitting: the original base table, which is a wide table, is mapped to a plurality of temporary tables corresponding to the business data sets. After splitting, a temporary table may contain duplicate data, which can be checked for and filtered out first.
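The table splitting described above can be sketched as follows, under the assumption that tables are lists of dictionaries and that duplicates are whole-row duplicates. The function name, table names and sample rows are made up for illustration.

```python
def split_wide_table(rows, layout):
    """Split a wide original table into temporary tables and drop duplicate rows.

    layout maps each temporary table name to the columns it takes from the
    wide table (one temporary table per business data set).
    """
    temp_tables = {}
    for table_name, columns in layout.items():
        seen = set()
        out = []
        for row in rows:
            key = tuple(row[c] for c in columns)
            if key not in seen:          # filter repeated data after the split
                seen.add(key)
                out.append(dict(zip(columns, key)))
        temp_tables[table_name] = out
    return temp_tables


# Hypothetical wide table: hazard records that repeat enterprise attributes.
wide = [
    {"hazard_id": "H1", "hazard_desc": "Leaking valve",
     "enterprise_id": "E1", "enterprise_name": "Acme Chemical Co"},
    {"hazard_id": "H2", "hazard_desc": "Blocked exit",
     "enterprise_id": "E1", "enterprise_name": "Acme Chemical Co"},
]
temp = split_wide_table(wide, {
    "tmp_enterprise": ["enterprise_id", "enterprise_name"],
    "tmp_hazard": ["hazard_id", "hazard_desc", "enterprise_id"],
})
```

Here the enterprise attributes collapse to a single row in `tmp_enterprise`, while both hazard rows survive in `tmp_hazard`.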
3. Many-to-one mapping. Correspondingly, in step 120, constructing the mapping relation between each data item of the business data set and a field in the original base table includes:
when a plurality of original base tables belong to the same relational table, associating and combining the plurality of original base tables on their common fields, and constructing the correspondence between the business data set and the fields after the association and combination.
This case arises mainly when a data source stores data that originally belongs to the same relational table dispersedly in a plurality of original base tables. In this case, the tables need to be associated and combined into one table.
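As a rough illustration of the many-to-one case, the sketch below combines several original tables on a shared key and then maps data items onto the combined columns. The function name, the key `enterprise_id` and the sample data are assumptions.

```python
def combine_tables(tables, key):
    """Associate several original tables on a common key into one table."""
    combined = {}
    for rows in tables:
        for row in rows:
            # Rows with the same key value are merged into one record.
            combined.setdefault(row[key], {}).update(row)
    return list(combined.values())


# Hypothetical original tables that together describe one enterprise.
enterprise_basic = [{"enterprise_id": "E1", "name": "Acme Chemical Co"}]
enterprise_contact = [{"enterprise_id": "E1", "phone": "555-0100"}]

combined = combine_tables([enterprise_basic, enterprise_contact], "enterprise_id")

# Data items of the business data set mapped onto the combined columns.
item_to_column = {"enterprise_name": "name", "contact_phone": "phone"}
```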
4. Many-to-many mapping. Correspondingly, in step 120, constructing the mapping relation between each data item of the business data set and a field in the original base table includes:
when fields of the same class of objects are stored in a plurality of original base tables, associating the original base tables into a wide table, and splitting the wide table into a plurality of temporary tables for mapping to the business data set;
removing duplicates from the temporary tables by using the key fields, and constructing the mapping relation between the business data set and the fields in the deduplicated temporary tables that correspond to the fields in the original base tables.
This case arises when, because of the particular framework of the data source, objects of the same class are split across different original base tables. In this case, the different tables need to be associated into a wide table, which is then split into different temporary tables as in case 2 (one-to-many) before the mapping is performed.
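The many-to-many case chains the two previous operations. A compact sketch, assuming each temporary table is deduplicated on a single key field; all names and sample rows are hypothetical.

```python
def join_on_key(tables, key):
    """Associate several original tables into one wide table on a common key."""
    wide = {}
    for rows in tables:
        for row in rows:
            wide.setdefault(row[key], {}).update(row)
    return list(wide.values())


def split_and_dedupe(wide_rows, layout):
    """layout: {temp_table: (key_field, [columns])}; first row per key wins."""
    temp = {}
    for name, (key_field, columns) in layout.items():
        by_key = {}
        for row in wide_rows:
            by_key.setdefault(row[key_field], {c: row[c] for c in columns})
        temp[name] = list(by_key.values())
    return temp


# Hypothetical rescue-force data scattered over two original tables.
person_base = [
    {"person_id": "P1", "name": "Zhang Yi", "team_id": "T1"},
    {"person_id": "P2", "name": "Li Er", "team_id": "T1"},
]
person_team = [
    {"person_id": "P1", "team_name": "City Rescue Team"},
    {"person_id": "P2", "team_name": "City Rescue Team"},
]

wide = join_on_key([person_base, person_team], "person_id")
temp = split_and_dedupe(wide, {
    "tmp_rescuer": ("person_id", ["person_id", "name", "team_id"]),
    "tmp_team": ("team_id", ["team_id", "team_name"]),
})
```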
5. Column-to-row conversion. Correspondingly, in step 120, constructing the mapping relation between each data item of the business data set and a field in the original base table includes:
when dictionary names are used as column names in the original base table, converting the dictionary names in the original base table into a field name according to the data standard in the business data set, and converting the column names into field values;
and constructing the mapping relation between each data item of the business data set and the fields in the original base table.
This case arises mainly when the dictionary types were limited at the time the data source was designed, so the dictionary types were listed one by one as columns of the original base table. For example, for the information on which teacher teaches each class in a school educational administration system, the data source creates a class table with each course as a column, as shown in Table 1 below:
Table 1: Dictionary names used directly as column names
Class         | Chinese   | Mathematics | English | Art      | Sports
Class One (1) | Zhang Yi  | Li Yi       | Liu Er  | Wang Yi  | Zhao San
Class One (2) | Zhang Er  | Li Er       | Liu Er  | Wang San | Zhao San
Class One (3) | Zhang San | Li Yi       | Liu San | Wang Yi  | Zhao San
In this case, the dictionary names in the original base table are converted into a field name according to the data standard in the business data set, and the column names are converted into field values, as shown in Table 2 below:
Table 2: Storage with column names converted into field values
[Table 2 appears as an image in the original publication (image references BDA0003177706140000091 and BDA0003177706140000101).]
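The column-to-row conversion can be sketched in a few lines of Python. The exact layout of Table 2 is not recoverable from the text, so the output field names `course` and `teacher` are assumptions; the sample rows are taken from two classes and two courses of Table 1.

```python
def columns_to_rows(rows, id_column, field_name, value_name):
    """Turn dictionary-named columns into (field value, value) rows."""
    out = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue
            out.append({
                id_column: row[id_column],
                field_name: column,    # former column name becomes a field value
                value_name: value,
            })
    return out


class_table = [
    {"class": "Class One (2)", "Chinese": "Zhang Er", "Mathematics": "Li Er"},
    {"class": "Class One (3)", "Chinese": "Zhang San", "Mathematics": "Li Yi"},
]
long_rows = columns_to_rows(class_table, "class", "course", "teacher")
```

After the conversion, a new course only adds rows rather than a new column, so the mapping to the business data set stays fixed at three fields.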
Step 130: based on the field standard that corresponds, in the business data set, to each field in the original base table, performing data cleaning on the data in the original base table, and storing the cleaned standardized data into the standard base table that inherits the field information of the original base table.
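Step 130 can be pictured as looking up one cleaning rule per field standard and applying it to every original column mapped to that standard. The sketch below is an assumption-laden illustration: the rule functions, the standard names, the column names and the sample values (including the made-up ID numbers) are hypothetical.

```python
def normalize_id_number(value):
    # Hypothetical rule: trim whitespace and upper-case the check character.
    return value.strip().upper() if value else None


def normalize_date(value):
    # Hypothetical rule: use '-' as the date separator.
    return value.strip().replace("/", "-") if value else None


# One rule per field standard, shared by every column mapped to that standard.
RULES = {"id_card_number": normalize_id_number, "date": normalize_date}


def clean_table(rows, column_to_standard, rules=RULES):
    """Clean original rows; the output keeps the original column names."""
    cleaned = []
    for row in rows:
        out = {}
        for column, value in row.items():
            rule = rules.get(column_to_standard.get(column))
            out[column] = rule(value) if rule else value
        cleaned.append(out)
    return cleaned


original_rows = [
    {"frsfzh": " 11010119900101123x ", "zbrsfzh": "11010119900101456X",
     "clrq": "2020/01/05", "qymc": "Acme Chemical Co"},
]
column_to_standard = {
    "frsfzh": "id_card_number",   # legal representative-ID card number
    "zbrsfzh": "id_card_number",  # on-duty person-ID card number
    "clrq": "date",
}
standard_rows = clean_table(original_rows, column_to_standard)
```

Keeping the original column names in the output mirrors the requirement that the standard base table inherit the field information of the original base table.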
Step 140: after data fusion is performed on the standardized data in the standard base table, determining, based on the mapping relation between the data items of the business data set and the fields in the original base table, which fields in the subject base table correspond to the fields that the standard base table inherits from the original base table, so that the fused data are stored in the subject base table.
After the business data set has been configured and the mapping relations constructed, the flow of the data governance system of this embodiment differs from that of the data governance system shown in FIG. 1.
Reference may be made to the schematic diagram of the improved data governance system shown in FIG. 3. FIG. 3 improves on FIG. 1 in the flow from the standard library to the subject library, wherein:
Solid black line 1: the relation between the subject base table and the business data set; the fields of the business data set are the same as those of the subject base table.
Solid black line 2: the mapping relation constructed between the data items of the business data set and the original base table.
Dashed black line 3: the standard base table inherits the field information of the original base table, that is, the fields of the standard base table are the same as those of the original base table, and the fields of the subject base table are the same as the fields of the business data set. Therefore, the field relation from the standard base table to the subject base table can directly inherit the mapping relation between the original base table and the business data set.
In the embodiments of this specification, the data in the base tables is cleaned according to the field standards configured for the subject base table, which yields standardized data and keeps the field cleaning rules consistent.
The field relationship from the standard base table to the subject base table directly inherits the mapping relationship between the original base table and the business data set, which avoids reconfiguring field relationships for the fusion stage and shortens the development time of the subject base table.
The standard base table serves as an intermediate temporary table, which reduces the number of times a field is cleaned and avoids the waste of computing resources caused by cleaning the same field repeatedly.
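As a hedged illustration of cleaning each field once into the intermediate standard base table, the following Python sketch applies per-field rules and keeps the original field names. The rule table `FIELD_STANDARDS`, the field names `xm` and `xb`, and the dictionary codes are assumptions made for the example only.

```python
# Hypothetical field standards keyed by original-table field name.
FIELD_STANDARDS = {
    "xm": str.strip,  # trim surrounding whitespace
    # map free-form values to an assumed dictionary code; "9" means unknown
    "xb": lambda v: {"M": "1", "F": "2"}.get(v.strip().upper(), "9"),
}


def clean_to_standard_table(original_rows, standards):
    """Clean each field once and return the rows of the standard base table.

    The output keeps the original table's field names (the standard table
    inherits them); only the values are normalized.
    """
    standard_rows = []
    for row in original_rows:
        cleaned = {}
        for field, value in row.items():
            rule = standards.get(field)
            cleaned[field] = rule(value) if rule else value
        standard_rows.append(cleaned)
    return standard_rows


original = [{"xm": " Li Si ", "xb": "m"}, {"xm": "Wang Wu", "xb": "?"}]
standard = clean_to_standard_table(original, FIELD_STANDARDS)
```

Any number of subject base tables could then read from `standard` without applying the rules again, which is the saving the paragraph above describes.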
In summary, the embodiments of this specification are result-oriented: the business data sets are first sorted out at the business level, and the mapping relationships then drive the data governance flow from the original library to the standard library to the subject library. This shortens data governance time, reduces the consumption of computing resources, and improves the quality of the governed data.
Corresponding to the foregoing embodiments of the data governance method, this specification also provides embodiments of a data governance device. The device embodiments may be implemented in software, in hardware, or in a combination of the two. Taking a software implementation as an example, the device, as a logical device, is formed when the processor of the equipment on which it resides reads the corresponding computer program instructions from the non-volatile memory into the memory and runs them. At the hardware level, FIG. 4 shows a hardware structure diagram of the equipment on which the data governance device of this specification resides. In addition to the processor, network interface, memory, and non-volatile memory shown in FIG. 4, the equipment may include other hardware depending on the actual data governance function, which is not described further here.
FIG. 5 is a block diagram of a data governance device according to an embodiment of this specification. The device corresponds to the embodiment shown in FIG. 2 and includes:
a configuration unit 310, configured to configure a business data set that includes field information in the subject library model corresponding to a business and the field standard corresponding to the field information;
a constructing unit 320, configured to construct, based on the data item information in the business data set, a mapping relationship between each data item of the business data set and a field in the original base table;
a cleaning unit 330, configured to perform data cleaning on the data in the original base table based on the field standard in the business data set that corresponds to each field in the original base table, and to store the cleaned, standardized data in a standard base table that inherits the field information of the original base table;
a fusion unit 340, configured to, after data fusion is performed on the standardized data in the standard base table, determine, based on the mapping relationship between the data items of the business data set and the fields in the original base table, which field in the subject base table corresponds to each field that the standard base table inherited from the original base table, and to store the fused data in the subject base table.
Optionally, the configuration unit 310 is specifically configured to perform:
determining, based on an industry standard related to the target business, field information of a subject base table that conforms to the target business;
determining the field standards of the subject base table based on the field information of the subject base table;
and constructing a business data set that contains the field information of the subject library model and the field standards corresponding to the field information.
Optionally, each piece of field information corresponds to a unique field standard; the field standard comprises at least one of an object word, a characteristic word, a representation word, a field value rule, a dictionary code, and a value range.
Optionally, the field information of the business data set includes at least one of: a Chinese name, an English name, a length, a field origin, a data type, and the meaning expressed by the field.
Optionally, the configuration unit 310 is further configured to perform:
when the same field standard corresponds to data items in different business data sets, associating that field standard with different qualifiers so that the data items can be distinguished.
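One way to picture a shared field standard distinguished by qualifiers is the following Python sketch. The attribute names mirror the components listed above (object word, characteristic word, representation word, value rule, dictionary code, value range), but the class layout and the `qualifier_object_feature` naming scheme are assumptions for illustration, not something the specification prescribes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FieldStandard:
    """Illustrative field standard; the attribute names are assumptions."""
    object_word: str
    feature_word: str
    representation_word: str
    value_rule: Optional[str] = None
    dictionary_code: Optional[str] = None
    value_range: Optional[Tuple[int, int]] = None


@dataclass(frozen=True)
class DataItem:
    """A business data item: one shared standard plus a distinguishing qualifier."""
    standard: FieldStandard
    qualifier: str

    @property
    def name(self) -> str:
        # Hypothetical naming scheme: qualifier + object word + feature word.
        return f"{self.qualifier}_{self.standard.object_word}_{self.standard.feature_word}"


# One standard (an 11-digit phone number) reused by two differently qualified items.
phone = FieldStandard("person", "phone", "number", value_rule=r"^\d{11}$")
home = DataItem(phone, "home")
work = DataItem(phone, "work")
```

Both items are cleaned by the same rule because they share the `phone` standard, while the qualifier keeps their names distinct.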
Optionally, in the constructing unit 320, constructing the mapping relationship between each data item and a field in the original base table includes:
when the data items correspond one-to-one to the fields in the original base table, constructing a one-to-one mapping relationship between the data items and the fields in the original base table.
Optionally, in the constructing unit 320, constructing the mapping relationship between each data item and a field in the original base table includes:
when the original base table contains non-key information of other tables, splitting the original base table into a plurality of temporary tables;
and after filtering out duplicates in the temporary tables, constructing a mapping relationship between the data items and the fields in the temporary tables that correspond to fields in the original base table.
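The split-and-filter case can be sketched in Python as follows. The sketch reads "duplicates" as repeated records that appear once the original table is projected onto a temporary table, which is an interpretation rather than something the text states; the table names, field names, and the helper `split_into_temp_tables` are all hypothetical.

```python
def split_into_temp_tables(rows, table_fields):
    """Project each row onto per-table field lists and drop repeated records.

    `table_fields` maps a temporary table name to the fields it keeps.
    Row order is preserved; duplicates are detected on the full projected row.
    """
    temp_tables = {}
    for table, fields in table_fields.items():
        seen = set()
        out = []
        for row in rows:
            projected = tuple(row[f] for f in fields)
            if projected not in seen:
                seen.add(projected)
                out.append(dict(zip(fields, projected)))
        temp_tables[table] = out
    return temp_tables


# An order table that also carries non-key customer information.
rows = [
    {"order_id": 1, "cust_id": "c1", "cust_name": "Ann"},
    {"order_id": 2, "cust_id": "c1", "cust_name": "Ann"},
]
temp = split_into_temp_tables(
    rows,
    {"t_order": ["order_id", "cust_id"], "t_customer": ["cust_id", "cust_name"]},
)
# Data item -> (temporary table, field); the field still exists in the original table.
item_mapping = {"customer_name": ("t_customer", "cust_name")}
```

Here the customer information repeated on every order collapses to a single record in `t_customer`, and the data item is mapped to that temporary table's field.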
Optionally, in the constructing unit 320, constructing the mapping relationship between each data item and a field in the original base table includes:
when a plurality of original base tables belong to the same relationship, joining and merging the original base tables on their shared fields, and constructing the correspondence between the data items and the fields of the merged result.
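A minimal Python sketch of joining several original base tables on a shared field is shown below. It merges records that have the same key value, similar in spirit to a full outer join; the helper `join_on_shared_field`, the key `pid`, and the sample tables are hypothetical.

```python
from collections import defaultdict


def join_on_shared_field(tables, key):
    """Merge rows from several tables that share the same value of `key`.

    Rows are combined field by field; a key present in only one table still
    yields a (partial) merged row. Keys keep their first-seen order.
    """
    merged = defaultdict(dict)
    for rows in tables:
        for row in rows:
            merged[row[key]].update(row)
    return list(merged.values())


base = [{"pid": "p1", "name": "Ann"}, {"pid": "p2", "name": "Bob"}]
contact = [{"pid": "p1", "phone": "555-0100"}]
joined = join_on_shared_field([base, contact], "pid")
# Data items are then mapped to fields of the merged result.
item_mapping = {"person_name": "name", "person_phone": "phone"}
```

The mapping is built against the merged fields rather than against each original base table separately.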
Optionally, in the constructing unit 320, constructing the mapping relationship between each data item and a field in the original base table includes:
when fields of the same class of objects are stored in a plurality of original base tables, joining the original base tables into a wide table, and splitting the wide table into a plurality of temporary tables for mapping to the business data set;
and deduplicating the temporary tables by their key fields, and constructing a mapping relationship between the business data set and the fields in the deduplicated temporary tables that correspond to fields in the original base table.
Optionally, in the constructing unit 320, constructing the mapping relationship between each data item and a field in the original base table includes:
when dictionary names are used as column names in the original base table, converting the dictionary name into a field name according to the data standard of the data item, and converting the column names into field values;
and constructing the mapping relationship between each data item and the field in the original base table.
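The dictionary-column case resembles an unpivot. The Python sketch below assumes one plausible reading: the original table has one flag column per dictionary entry (for example `married` and `single`), and these are collapsed into a single field whose value is the name of the set column. The helper `unpivot_dictionary_columns` and all column names are hypothetical.

```python
def unpivot_dictionary_columns(row, dictionary_columns, field_name):
    """Collapse flag-style dictionary columns into one field.

    The single column whose value is truthy becomes the value of `field_name`;
    if no column or more than one column is set, the field is left as None.
    """
    chosen = [c for c in dictionary_columns if row.get(c)]
    result = {k: v for k, v in row.items() if k not in dictionary_columns}
    result[field_name] = chosen[0] if len(chosen) == 1 else None
    return result


wide_row = {"id": 7, "married": 0, "single": 1}
narrow_row = unpivot_dictionary_columns(wide_row, ["married", "single"], "marital_status")
```

After the conversion, a data item such as a marital status can be mapped to the single `marital_status` field instead of to several columns.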
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may take the form of a personal computer, laptop computer, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
For details of how the functions and roles of each unit in the above device are implemented, see the implementation of the corresponding steps in the above method; they are not repeated here.
Because the device embodiments substantially correspond to the method embodiments, refer to the corresponding parts of the description of the method embodiments for relevant points. The device embodiments described above are merely illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this specification. A person of ordinary skill in the art can understand and implement it without inventive effort.
FIG. 5 above describes the internal functional modules and schematic structure of the data governance device. In practice, the executing entity may be an electronic device, which includes:
a processor; a memory for storing processor-executable instructions;
wherein the processor is configured to perform the data governance method of any preceding embodiment.
In the above embodiments of the electronic device, it should be understood that the Processor may be a Central Processing Unit (CPU), other general-purpose processors, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), etc. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor, and the aforementioned memory may be a read-only memory (ROM), a Random Access Memory (RAM), a flash memory, a hard disk, or a solid state disk. The steps of a method disclosed in connection with the embodiments of the present application may be directly implemented by a hardware processor, or may be implemented by a combination of hardware and software modules in a processor.
The embodiments in this specification are described progressively; for parts that are the same or similar across embodiments, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, because the electronic device embodiment is substantially similar to the method embodiment, it is described briefly; for relevant points, refer to the corresponding parts of the description of the method embodiment.
Other embodiments of the present description will be apparent to those skilled in the art from consideration of the specification and practice of the application disclosed herein. This specification is intended to cover any variations, uses, or adaptations of the specification following, in general, the principles of the specification and including such departures from the present disclosure as come within known or customary practice within the art to which the specification pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the specification being indicated by the following claims.
It will be understood that the present description is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from the scope thereof. The scope of the present description is limited only by the appended claims.

Claims (13)

1. A data governance method, comprising:
configuring a business data set comprising field information in a subject library model corresponding to a business and a field standard corresponding to the field information;
constructing, based on data item information in the business data set, a mapping relationship between the data items and fields in an original base table;
performing data cleaning on data in the original base table based on the field standard of the data item in the business data set that corresponds to each field in the original base table, and storing the cleaned standardized data in a standard base table that inherits the field information of the original base table;
and after performing data fusion on the standardized data in the standard base table, determining, based on the mapping relationship between the data item information in the business data set and the fields in the original base table, that the fields inherited from the original base table in the standard base table correspond to fields in a subject base table, so that the fused data is stored in the subject base table corresponding to the subject library model.
2. The method of claim 1, wherein configuring the business data set comprising the field information in the subject library model and the field standard corresponding to the field information comprises:
determining, based on an industry standard related to a target business, field information of a subject base table that conforms to the target business;
determining the field standard of the subject base table based on the field information of the subject base table;
and constructing a business data set that contains the field information of the subject library model and the field standard corresponding to the field information.
3. The method of claim 2, wherein each piece of field information corresponds to a unique field standard, and wherein the field standard comprises at least one of an object word, a characteristic word, a representation word, a field value rule, a dictionary code, and a value range.
4. The method of claim 2, wherein the field information of the service data set comprises: at least one of a Chinese name, an English name, a length, a field origin, a data type, and a meaning expressed by the field.
5. The method of claim 2, wherein when the same field standard exists in the business data set and corresponds to data items of different business scenarios, the data items are distinguished by using the same field standard together with different qualifiers.
6. The method of claim 1, wherein constructing a mapping relationship between each data item of the business data set and a field in the original base table comprises:
when the data items of the business data set correspond one-to-one to the fields in the original base table, constructing a one-to-one mapping relationship between the data items of the business data set and the fields in the original base table.
7. The method of claim 1, wherein constructing a mapping relationship between each data item of the business data set and a field in the original base table comprises:
when the original base table contains non-key information of other tables, splitting the original base table into a plurality of temporary tables according to the business data set;
and after filtering out duplicates in the temporary tables, constructing a mapping relationship between the business data set and the fields in the temporary tables that correspond to fields in the original base table.
8. The method of claim 1, wherein constructing a mapping relationship between each data item of the business data set and a field in the original base table comprises:
when a plurality of original base tables belong to the same relationship, joining and merging the original base tables on their shared fields, and constructing the correspondence between the business data set and the fields of the merged result.
9. The method of claim 1, wherein constructing a mapping relationship between each data item of the business data set and a field in the original base table comprises:
when fields of the same class of objects are stored in a plurality of original base tables, joining the original base tables into a wide table, and splitting the wide table into a plurality of temporary tables for mapping to the inherited business data set;
and deduplicating the temporary tables by their key fields, and constructing a mapping relationship between the business data set and the fields in the deduplicated temporary tables that correspond to fields in the original base table.
10. The method of claim 1, wherein constructing a mapping relationship between each data item of the business data set and a field in the original base table comprises:
when dictionary names are used as column names in the original base table, converting the dictionary name into a field name according to the business data set, and converting the column names into field values;
and constructing the mapping relationship between each data item of the business data set and the field in the original base table.
11. A data governance device, the device comprising:
a configuration unit, configured to configure a business data set comprising field information in a subject library model corresponding to a business and a field standard corresponding to the field information;
a construction unit, configured to construct, based on the data item information of the business data set, a mapping relationship between each data item of the business data set and a field in an original base table;
a cleaning unit, configured to perform data cleaning on data in the original base table based on the field standard of the data item in the business data set that corresponds to each field in the original base table, and to store the cleaned standardized data in a standard base table that inherits the field information of the original base table;
and a fusion unit, configured to, after data fusion is performed on the standardized data in the standard base table, determine, based on the mapping relationship between the data items of the business data set and the fields in the original base table, that the fields inherited from the original base table in the standard base table correspond to fields in a subject base table, so that the fused data is stored in the subject base table.
12. An electronic device, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to perform the method of any one of claims 1-10.
13. A computer-readable storage medium, wherein instructions in the computer-readable storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method of any of claims 1-10.
CN202110837541.XA 2021-07-23 2021-07-23 Data management method and device and electronic equipment Pending CN113468160A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110837541.XA CN113468160A (en) 2021-07-23 2021-07-23 Data management method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN113468160A (en) 2021-10-01

Family

ID=77882172

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110837541.XA Pending CN113468160A (en) 2021-07-23 2021-07-23 Data management method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113468160A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114661723A (en) * 2022-03-29 2022-06-24 杭州数梦工场科技有限公司 Data processing method and device and electronic equipment
CN115599840A (en) * 2022-10-17 2023-01-13 中电科大数据研究院有限公司(Cn) Complex service data management method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170337291A1 (en) * 2016-05-17 2017-11-23 JustTagIt, Inc. Function and memory mapping registry with reactive management events
CN111061833A (en) * 2019-12-10 2020-04-24 北京明略软件系统有限公司 Data processing method and device, electronic equipment and computer readable storage medium
US20200169685A1 (en) * 2018-11-23 2020-05-28 Sony Corporation Apparatus and method for tuner control by middleware
US20200233862A1 (en) * 2019-01-23 2020-07-23 Servicenow, Inc. Grammar-based searching of a configuration management database
CN112364003A (en) * 2020-11-09 2021-02-12 南威软件股份有限公司 Big data management method, device, equipment and medium for different industries

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
杨兴凯;: "基于本体的政务数据仓库构建方法研究", 计算机工程与设计, no. 07 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination