CN115374105A

CN115374105A - Data processing method and device

Info

Publication number: CN115374105A
Application number: CN202210819792.XA
Authority: CN
Inventors: 王明; 王天振; 陈建欣; 李印; 庞艳蓓; 付大超; 李飞飞
Original assignee: Alibaba Cloud Computing Ltd
Current assignee: Alibaba Cloud Computing Ltd
Priority date: 2022-07-13
Filing date: 2022-07-13
Publication date: 2022-11-22

Abstract

An embodiment of the present specification provides a data processing method and apparatus. The data processing method includes: acquiring at least two service data tables; constructing an initial knowledge graph based on the fields contained in the at least two service data tables; determining an association relationship between different fields in the at least two service data tables according to historical operation data related to the at least two service data tables; updating the initial knowledge graph according to the association relationship to generate a target knowledge graph; and constructing a service wide table of the target service based on the target knowledge graph.

Description

Data processing method and device

Technical Field

The embodiments of this specification relate to the field of computer technology, and in particular to a data processing method.

Background

A data warehouse (Data Warehouse, abbreviated as DW or DWH) is a structured data environment. A data warehouse provides data support for applications such as data analysis, data reporting, and data mining. Data warehouse management is a core part of data warehouse operation and maintenance, and generally includes data maintenance, evaluation of data warehouse construction, and the like. A data warehouse is mainly used to sort, summarize, and reorganize information and to provide it to decision makers in a timely manner.

Disclosure of Invention

In view of this, the embodiments of the present specification provide a data processing method. One or more embodiments of the present specification also relate to a data processing apparatus, a computing device, a computer-readable storage medium, and a computer program, so as to solve the technical deficiencies of the prior art.

According to a first aspect of embodiments herein, there is provided a data processing method comprising:

acquiring at least two service data tables, and constructing an initial knowledge graph based on fields contained in the at least two service data tables, wherein the at least two service data tables respectively correspond to different service types of a target service;

determining an association relationship between different fields in the at least two service data tables according to historical operation data related to the at least two service data tables;

updating the initial knowledge graph according to the association relationship to generate a target knowledge graph;

and constructing a service wide table of the target service based on the target knowledge graph.

Optionally, the constructing an initial knowledge graph based on fields contained in the at least two business data tables includes:

taking a table identifier of a target service data table as a first node, taking a field identifier corresponding to different fields in the target service data table as a second node, taking an inclusion relation between the target service data table and the different fields as an edge between the first node and the second node, and constructing an initial sub-knowledge graph corresponding to the target service data table, wherein the target service data table is one of the at least two service data tables, and the initial sub-knowledge graphs corresponding to the at least two service data tables jointly form the initial knowledge graph.

Optionally, the updating the initial knowledge graph according to the association relationship to generate a target knowledge graph includes:

under the condition that the association relationship exists between a first field in a first service data table and a second field in a second service data table, establishing an edge between a second node corresponding to the first field and a second node corresponding to the second field in the initial knowledge graph based on the association relationship, and establishing an edge between the first node of the first service data table and the first node of the second service data table, so as to update the initial knowledge graph and generate a target knowledge graph.

Optionally, the determining, according to the historical operation data related to the at least two service data tables, an association relationship between different fields in the at least two service data tables includes:

determining, according to historical operation data related to the at least two service data tables, a first association relationship between the at least two service data tables and a second association relationship between different fields in the at least two service data tables;

correspondingly, the updating the initial knowledge graph according to the association relationship to generate a target knowledge graph includes:

under the condition that a first association relation exists between a first service data table and a second service data table, establishing edges among a first node of the first service data table and a first node of the second service data table in the initial knowledge graph based on the first association relation;

and under the condition that a second association relationship exists between a first field in the first service data table and a second field in the second service data table, establishing edges among a second node corresponding to the first field and a second node corresponding to the second field in the initial knowledge graph based on the second association relationship so as to update the initial knowledge graph and generate a target knowledge graph.

Optionally, the service wide table comprises a database table;

correspondingly, the constructing a service wide table of the target service based on the target knowledge graph comprises:

taking the field identifiers corresponding to the nodes in the target knowledge graph as fields to construct an initial database table;

adjusting the field positions in the initial database table according to the association relationships among the nodes in the target knowledge graph to generate an intermediate database table;

and adding the service data in the at least two service data tables to the data units of the corresponding fields in the intermediate database table to generate a target database table.
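The first two of these steps can be illustrated with a minimal sketch. It assumes, purely for illustration, that "adjusting the field position" means placing associated fields next to each other; the function name `layout_wide_table`, the `table.field` naming of nodes, and the sample data are all hypothetical and are not taken from this specification. Filling in the service data is not shown.

```python
def layout_wide_table(field_nodes, associations):
    """Return a column order in which associated fields sit side by side.

    field_nodes:  field identifiers taken from the target knowledge graph
    associations: (field_a, field_b) pairs taken from the graph's edges
    """
    # Initial database table: one column per field node, in graph order.
    columns = []
    for field in field_nodes:
        if field not in columns:
            columns.append(field)
    # Intermediate database table: move each associated field directly
    # after its partner (an assumed interpretation of "adjusting position").
    for a, b in associations:
        if a != b and a in columns and b in columns:
            columns.remove(b)
            columns.insert(columns.index(a) + 1, b)
    return columns


# Hypothetical field nodes from two service data tables.
field_nodes = ["t1.user_id", "t1.name", "t2.buyer_id", "t2.item"]
associations = [("t1.user_id", "t2.buyer_id")]
columns = layout_wide_table(field_nodes, associations)
print(columns)
```

Keeping the layout step separate from the data-filling step mirrors the initial, intermediate, and target database tables named above.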

Optionally, the constructing a service wide table of the target service based on the target knowledge graph includes:

constructing an enhanced entity relationship graph based on the target knowledge graph, and constructing the service wide table of the target service according to the association relationships among different entities in the enhanced entity relationship graph.

Optionally, the constructing an enhanced entity relationship graph based on the target knowledge graph includes:

determining a first field and a second field, in the at least two service data tables, between which an association relationship exists;

performing deduplication processing on the service data contained in the first field and the second field;

determining the data volume of the service data contained in the first field and the second field according to the deduplication result, and dividing the at least two service data tables into a master table and a slave table according to the data volume;

determining the association relationship between the master table and the slave table according to the association relationship between the nodes in the target knowledge graph;

and constructing an enhanced entity relationship graph based on the association relationship between the master table and the slave table.

Optionally, the dividing the at least two service data tables into a master table and a slave table according to the data amount includes:

classifying, as a master table, the service data table to which a target field belongs, the target field being whichever of the first field and the second field has a data volume larger than a preset data volume threshold, and classifying the remaining service data tables of the at least two service data tables as slave tables;

correspondingly, the determining the association relationship between the master table and the slave table according to the association relationship between the nodes in the target knowledge graph includes:

and determining the association relationship between the master table and each slave table according to the association relationship between the nodes in the target knowledge graph.
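The master/slave division described above can be sketched as follows. This is only an illustration under stated assumptions: tables are represented as lists of row dictionaries, the data volume is taken to be the number of distinct values after deduplication, and the function name `split_master_slave`, the table names, and the threshold value are hypothetical.

```python
def split_master_slave(tables, field_pair, threshold):
    """Divide tables into master and slave tables.

    tables:     table id -> list of row dicts
    field_pair: ((table_id, field), (table_id, field)), the associated fields
    threshold:  assumed preset data volume threshold
    """
    masters = set()
    for table_id, field in field_pair:
        # Deduplication: keep only the distinct values stored under the field.
        distinct = {row[field] for row in tables[table_id] if field in row}
        if len(distinct) > threshold:
            masters.add(table_id)
    # Every table that is not a master table is a slave table.
    slaves = set(tables) - masters
    return masters, slaves


# Hypothetical sample data: three distinct users, one distinct buyer.
tables = {
    "members": [{"user_id": 1}, {"user_id": 2}, {"user_id": 3}],
    "orders": [{"buyer_id": 1}, {"buyer_id": 1}],
}
pair = (("members", "user_id"), ("orders", "buyer_id"))
masters, slaves = split_master_slave(tables, pair, threshold=2)
print(masters, slaves)
```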

Optionally, the adding the service data in the at least two service data tables to the data unit of the corresponding field in the intermediate database table to generate a target database table includes:

determining a mapping relation between a target service data table and each field in the intermediate database table, wherein the target service data table is one of the at least two service data tables;

and based on the table structure of the intermediate database table, adding the business data in the target business data table to the data unit of the corresponding field in the intermediate database table according to the mapping relation, and generating the target database table.

Optionally, after the constructing a service wide table of the target service based on the target knowledge graph, the method further includes:

receiving a data query instruction, wherein the data query instruction carries a field identifier of a target field to be queried and a table identifier of the target service data table to which the target field belongs;

taking the table identifier as index information, and performing data indexing according to the mapping relation between the table identifier and the target field in the service wide table;

and outputting the index result as a data query result.
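A query of this kind might look like the sketch below. It assumes that the mapping relation is kept as a nested dictionary from table identifier to field identifier to wide-table column; the names `query_wide_table`, `column_index`, `t_register`, and `t_consume` and the sample rows are hypothetical.

```python
# Hypothetical wide table rows and the mapping from source tables to columns.
wide_rows = [
    {"user_id": 1, "name": "Ann", "level": "gold"},
    {"user_id": 2, "name": "Bo", "level": None},
]
column_index = {
    "t_register": {"name": "name"},
    "t_consume": {"level": "level"},
}


def query_wide_table(wide_rows, column_index, table_id, field_id):
    """Look up a target field, using the table identifier as the index."""
    fields = column_index.get(table_id)
    if fields is None or field_id not in fields:
        return []  # unknown table or field: nothing to return
    column = fields[field_id]
    return [row.get(column) for row in wide_rows]


result = query_wide_table(wide_rows, column_index, "t_consume", "level")
print(result)
```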

Optionally, after the constructing a service wide table of the target service based on the target knowledge graph, the method further includes:

and under the condition that incremental data exist in the at least two service data tables, updating the service width table based on the incremental data.
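One possible form of such an incremental update is an upsert keyed on a shared identifier, sketched below. The specification does not state how the update is performed; the key name `user_id`, the function name `apply_increment`, and the sample rows are assumptions made for illustration.

```python
def apply_increment(wide_rows, incremental_rows, key="user_id"):
    """Merge incremental rows into the wide table, matching rows on `key`."""
    # Copy the existing rows so that the original wide table is not mutated.
    by_key = {row[key]: dict(row) for row in wide_rows}
    for row in incremental_rows:
        # Existing key: overwrite changed fields. New key: add a new row.
        by_key.setdefault(row[key], {}).update(row)
    return [by_key[k] for k in sorted(by_key)]


wide = [{"user_id": 1, "level": "silver"}]
delta = [{"user_id": 1, "level": "gold"}, {"user_id": 2, "level": "silver"}]
updated = apply_increment(wide, delta)
print(updated)
```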

Optionally, the data processing method further includes:

inputting the field information contained in the at least two service data tables into a text processing model for similarity calculation, and determining the association relation between different fields in the at least two service data tables according to the similarity calculation result.

According to a second aspect of embodiments herein, there is provided a data processing apparatus comprising:

an acquisition module configured to acquire at least two service data tables and construct an initial knowledge graph based on fields contained in the at least two service data tables, wherein the at least two service data tables respectively correspond to different service types of a target service;

a determining module configured to determine an association relationship between different fields in the at least two service data tables according to historical operation data related to the at least two service data tables;

a generating module configured to update the initial knowledge graph according to the association relationship to generate a target knowledge graph;

a construction module configured to construct a service wide table of the target service based on the target knowledge graph.

According to a third aspect of embodiments herein, there is provided a computing device comprising:

a memory and a processor;

the memory is used for storing computer-executable instructions, and the processor is used for executing the computer-executable instructions to implement the steps of any one of the data processing methods described above.

According to a fourth aspect of embodiments herein, there is provided a computer-readable storage medium storing computer-executable instructions that, when executed by a processor, implement the steps of any one of the data processing methods described above.

According to a fifth aspect of embodiments herein, there is provided a computer program, wherein the computer program, when executed in a computer, causes the computer to perform the steps of the above-mentioned data processing method.

In an embodiment of the present specification, at least two service data tables are acquired, and an initial knowledge graph is constructed based on the fields contained in them, where the at least two service data tables respectively correspond to different service types of a target service. An association relationship between different fields in the at least two service data tables is determined according to historical operation data related to the tables, the initial knowledge graph is updated according to the association relationship to generate a target knowledge graph, and a service wide table of the target service is constructed based on the target knowledge graph.

The embodiments of this specification build the initial knowledge graph from the fields in the service data tables and generate the target knowledge graph by updating the association relationships between nodes in the initial knowledge graph based on historical operation data, so that the service wide table can be built from the target knowledge graph. This automates or semi-automates the construction of the service wide table to a certain extent, which reduces the construction cost, improves the construction efficiency, and ensures the accuracy of the construction result.

Drawings

FIG. 1 is a flowchart of a data processing method provided by an embodiment of the present specification;

FIG. 2a is a schematic diagram of an initial knowledge graph construction result provided by an embodiment of the present specification;

FIG. 2b is a schematic diagram of a target knowledge graph provided by an embodiment of the present specification;

FIG. 3 is a flowchart illustrating a data processing method provided by an embodiment of the present specification;

FIG. 4 is a schematic structural diagram of a data processing apparatus provided by an embodiment of the present specification;

FIG. 5 is a block diagram of a computing device provided by an embodiment of the present specification.

Detailed Description

In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present description. This description may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein, as those skilled in the art will be able to make and use the present disclosure without departing from the spirit and scope of the present disclosure.

The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in one or more embodiments of the present specification refers to and encompasses any and all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, etc. may be used herein in one or more embodiments to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, a first can also be referred to as a second and, similarly, a second can also be referred to as a first without departing from the scope of one or more embodiments of the present description. The word "if" as used herein may be interpreted as "at the time of", "when", or "in response to a determination", depending on the context.

First, the terms used in one or more embodiments of the present specification are explained.

Metadata: data that describes data, that is, descriptive information about data and information resources. Metadata may describe attributes such as the storage size of the data, acquisition time, update time, and maintainer.

Data asset: a data resource that is owned or controlled by an individual or an enterprise and that can bring economic benefit.

Knowledge graph: a structure that integrates the knowledge of a vertical domain in the form of a graph schema, describes the domain knowledge through the entities and relations of the graph, and provides production value.

ER graph: an entity relationship diagram, a method that uses entities, attributes, and relationships to describe a conceptual model of the real world.

Data warehouse wide table: a database table that puts the indicators, dimensions, and attributes related to the same business subject together; it is used for data warehouse modeling of big data, data mining, and the like.

Data warehouse modeling: starting from the business process, selecting suitable dimensions for the data warehouse and integrating the corresponding fact indicators.

In existing business systems, online data is stored and used by an OLTP data system. To support statistical reports and the learning of AI systems, an enterprise often builds a data warehouse system based on an OLAP system. The two kinds of system store and use data differently: data in the OLTP system of the online system is usually stored in a dispersed manner according to data normal forms and business domains, whereas data in the OLAP system of the data warehouse is usually modeled so that data of closely related business domains is stored and used together. This requires building a data model in the data warehouse.

The data management system of the embodiments of this specification maintains the data and metadata assets of the online systems and warehouse systems within an enterprise. By combining the business relationships and operation relationships of these data assets, it constructs a data asset knowledge graph that stores the entity information of the enterprise's data assets and the association relationships among the business domains, and a database table can be built automatically on the basis of this data asset knowledge graph.

In the present specification, a data processing method is provided, and the present specification relates to a data processing apparatus, a computing device, a computer-readable storage medium, and a computer program, which are described in detail one by one in the following embodiments.

Fig. 1 shows a flowchart of a data processing method provided in accordance with an embodiment of the present specification, which specifically includes the following steps.

Step 102: acquiring at least two service data tables, and constructing an initial knowledge graph based on fields contained in the at least two service data tables, wherein the at least two service data tables respectively correspond to different service types of a target service.

In the embodiments of this specification, a data warehouse is a strategic collection that provides all types of data support for a user's decision-making process. It is an environment that supplies the current and historical data used for decision support, data that is difficult or impossible to obtain from a traditional operational database. Data warehouse technology is a general term for the technologies and modules that integrate operational data into a unified environment to provide decision-oriented data access; its ultimate purpose is to let users query the information they need more quickly and conveniently and to provide decision support.

Traditionally, business domain modeling for a data warehouse is done manually by business domain experts. Several experts in a given domain of the data warehouse study and summarize the business, design and develop the model, and convert the data storage organization of the online system into the data storage organization of the data warehouse.

In an actual OLTP business system, data is often distributed across several business systems because of business and performance requirements. Take the service data tables of a member domain as an example: assuming that 4 systems maintain member information, the member information is stored in 4 service data tables, namely ods_huiyuan_t1, ods_huiyuan_t2, ods_huiyuan_t3, and ods_huiyuan_t4. These service data tables store information about different dimensions of the member users.

These data are distributed across 4 systems and 4 databases, which is the form in which member-domain data is stored in OLTP, and each database serves the business needs of one of the 4 business systems. However, data that is stored and used in such a dispersed way is very unfriendly to data analysts and business operators, who find it difficult to analyze customer data. In this case, a data warehouse developer therefore uses the OLAP system to re-model the data. A common development task is to integrate the data of several business domains and construct a service wide table that contains most of the relevant data in the domain. For example, in a data warehouse, the data of the member domain corresponds to a wide table named ods_huiyuan_info, which has a unique user id as its primary key, with the other fields holding member-related information. A data warehouse developer can design such a wide table model for the member-domain data and write the member-domain data of the business systems into it, so that data analysts and business staff can subsequently use the wide table in the data warehouse directly for analysis, without reading the service data tables scattered across the 4 business systems.
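The member-domain example can be made concrete with a small sketch. Only the table names come from the text; the columns, the values, the `user_id` key name, and the function `build_wide_table` are invented for illustration, and plain Python lists stand in for database tables.

```python
from collections import defaultdict

# Hypothetical rows from the four scattered member tables.
ods_huiyuan_t1 = [{"user_id": 1, "name": "Ann"}, {"user_id": 2, "name": "Bo"}]
ods_huiyuan_t2 = [{"user_id": 1, "level": "gold"}]
ods_huiyuan_t3 = [{"user_id": 2, "city": "Hangzhou"}]
ods_huiyuan_t4 = [{"user_id": 1, "points": 120}, {"user_id": 2, "points": 30}]


def build_wide_table(tables, key="user_id"):
    """Merge rows that share the same key into one wide row per key."""
    wide = defaultdict(dict)
    for rows in tables:
        for row in rows:
            wide[row[key]].update(row)
    # Collect every column seen so that each wide row has the same fields.
    columns = [key]
    for row in wide.values():
        for col in row:
            if col not in columns:
                columns.append(col)
    # Fields that a member has no value for are filled with None.
    return [{col: wide[k].get(col) for col in columns} for k in sorted(wide)]


ods_huiyuan_info = build_wide_table(
    [ods_huiyuan_t1, ods_huiyuan_t2, ods_huiyuan_t3, ods_huiyuan_t4]
)
for row in ods_huiyuan_info:
    print(row)
```

An analyst can then read one row per member from the wide table instead of joining the four source tables by hand.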

This modeling process relies on the data warehouse developers' understanding of the business of every subsystem in the online business system, so the developers must be very familiar with the data model of the online business system to ensure that the result is accurate. It also consumes a great deal of manpower and time; in particular, building the data warehouse model for a new business system requires extensive business learning and model review.

The embodiments of this specification can use the data management system to manage the service data of each subsystem and use that service data to construct the data asset knowledge graph, so that the service wide table is built automatically based on the data asset knowledge graph; the service wide table can be a database table.

Specifically, the purpose of a data warehouse is to let users query the information they need more quickly and conveniently and to provide decision support. The target service in the embodiments of this specification is therefore a service that a service platform can provide to users, including but not limited to a member purchasing service, a financial service, and the like. The service type is the type of service for which the user queries decision information. For example, where the target service is a member purchasing service, the service types include but are not limited to a member registration service type and a member consumption service type; where the target service is a financial service, the service types include a financial product consultation service type or a financial product transaction service type.

The service data related to different service types is stored independently in the service data tables corresponding to those service types, and constructing the service wide table means integrating the data in the service data tables to generate a data warehouse wide table. Therefore, when constructing the wide table, at least two service data tables are first acquired so that an initial knowledge graph can be constructed based on the service data (physical metadata) contained in each service data table; the initial knowledge graph can subsequently be updated to generate a target knowledge graph.

In practical applications, the at least two service data tables may be stored in the data management system, so acquiring the at least two service data tables means acquiring them from the data management system.

In specific implementation, the constructing of the initial knowledge graph based on the fields contained in the at least two service data tables includes:

Specifically, the service data tables are independent of each other, and when the initial knowledge graph is constructed from the service data tables alone, the association relationships between them are not yet available. Therefore, an initial sub-knowledge graph is first constructed for each service data table based only on the service data it contains, and the at least two initial sub-knowledge graphs together form the initial knowledge graph. Specifically, the table identifier of the target service data table is used as a first node, the field identifiers corresponding to the different fields in the target service data table are used as second nodes, and the inclusion relation between the target service data table and its fields is used as the edges between the first node and the second nodes, thereby constructing the initial sub-knowledge graph corresponding to the target service data table.

FIG. 2a is a schematic diagram of an initial knowledge graph construction result provided in an embodiment of this specification. The initial knowledge graph shown in FIG. 2a includes 4 initial sub-knowledge graphs, each constructed from one service data table. The node in the middle of an initial sub-knowledge graph is a table identifier, which represents a service data table, and the other nodes are the field identifiers of the fields contained in that service data table.
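The construction of the initial knowledge graph can be sketched as below. The representation is an assumption made for illustration: nodes are kept in a dictionary, edges in a set of triples, and a field node is named `table.field` so that same-named fields in different tables stay distinct. The function name `build_initial_graph`, the relation label `contains`, and the sample fields are hypothetical.

```python
def build_initial_graph(tables):
    """tables: mapping of table identifier -> list of field identifiers."""
    nodes = {}     # node id -> node kind ("table" or "field")
    edges = set()  # (source node, target node, relation)
    for table_id, fields in tables.items():
        nodes[table_id] = "table"                  # first node
        for field in fields:
            field_node = f"{table_id}.{field}"     # second node
            nodes[field_node] = "field"
            # Inclusion relation between the table and its field.
            edges.add((table_id, field_node, "contains"))
    return nodes, edges


# Hypothetical fields for two of the member tables.
tables = {
    "ods_huiyuan_t1": ["user_id", "name"],
    "ods_huiyuan_t2": ["user_id", "level"],
}
nodes, edges = build_initial_graph(tables)
print(len(nodes), len(edges))
```

Each table yields a star-shaped sub-graph, and no edge yet connects one sub-graph to another, which matches the situation shown in FIG. 2a.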

Step 104: determining the association relationship between different fields in the at least two service data tables according to historical operation data related to the at least two service data tables.

Specifically, after the initial knowledge graph is constructed, the association relationship among the nodes in the initial knowledge graph can be determined according to historical operation data so as to update the initial knowledge graph.

The historical operation data may be used to represent historical operation relationships between different fields in the same service data table or different service data tables, including but not limited to operation relationships such as modification and update of a field in a service data table by a user, or association operations between two service data tables. Therefore, the association relationship between different fields in each business data table can be determined according to the historical operation data related to each business data table.

For example, if a user queries a first service data table and then switches to querying a second service data table, this indicates that the first service data table and the second service data table are associated, and that the user is associated with both tables. As another example, the first service data table stores a user identifier field, under which the user identifier of user U1 is stored; the second service data table stores a commodity field, under which commodity M1 is stored. If the historical operation data contains a record that user U1 purchased commodity M1, it can be determined that an association relationship exists between the user identifier field in the first service data table and the commodity field in the second service data table.
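One way to derive such associations from historical operation data is to count how often fields from different tables are touched by the same operation. The sketch below assumes a hypothetical log format in which each operation is a list of (table, field) pairs, considers only cross-table pairs, and uses an assumed minimum count; none of these details are specified in the text.

```python
from collections import Counter
from itertools import combinations


def associations_from_history(operations, min_count=2):
    """Return cross-table field pairs that co-occur in enough operations."""
    counts = Counter()
    for fields in operations:
        # Each operation lists the (table, field) pairs it touched together.
        for a, b in combinations(sorted(set(fields)), 2):
            if a[0] != b[0]:  # keep only pairs from different tables
                counts[(a, b)] += 1
    return {pair for pair, n in counts.items() if n >= min_count}


# Hypothetical operation log.
operations = [
    [("t1", "user_id"), ("t2", "buyer_id")],
    [("t1", "user_id"), ("t2", "buyer_id"), ("t2", "item")],
    [("t1", "name"), ("t2", "item")],
]
found = associations_from_history(operations)
print(found)
```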

Alternatively, the association relationship between different fields in the at least two service data tables can be determined from the field information in those tables. Specifically, the field information contained in the at least two service data tables can be input into a text processing model for similarity calculation, and the association relationship between different fields in the at least two service data tables can be determined according to the similarity calculation result.

Specifically, after the initial knowledge graph is constructed based on the fields contained in the at least two service data tables, the field information in each service data table can be obtained, knowledge can be extracted from the field information by machine learning, and the association relationship between different fields can be determined from the extraction result. For instance, the field information can be input into a text processing model for semantic recognition, and the similarity between the field information of the service data tables can be calculated from the semantic recognition result. An association relationship is then determined between any two fields whose similarity is greater than a preset similarity threshold, and the initial knowledge graph is updated according to that association relationship.
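The thresholding step can be illustrated with a simple stand-in. The sketch below does not use a text processing model; it substitutes character-level similarity from Python's standard `difflib.SequenceMatcher` for the semantic similarity described above, and the threshold of 0.8, the function name `similar_fields`, and the field names are assumptions.

```python
from difflib import SequenceMatcher


def similar_fields(fields_a, fields_b, threshold=0.8):
    """Return (field_a, field_b, score) for pairs scoring above the threshold."""
    pairs = []
    for a in fields_a:
        for b in fields_b:
            # Stand-in for a semantic model: character-level similarity ratio.
            score = SequenceMatcher(None, a.lower(), b.lower()).ratio()
            if score > threshold:
                pairs.append((a, b, round(score, 2)))
    return pairs


# Hypothetical field names from two service data tables.
matches = similar_fields(["user_id", "name"], ["userid", "level"])
print(matches)
```

A real text processing model would also catch semantically related names that share few characters, which a string ratio cannot do.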

In practical application, the historical operation data (operation metadata) can be stored in the data management system as well; or, since the at least two service data tables may be stored in the data management system, the field information in the at least two service data tables may be obtained from the data management system, so as to determine the association relationship between different fields in the at least two service data tables according to the similarity between the field information.

Step 106: updating the initial knowledge graph according to the association relationship to generate a target knowledge graph.

Specifically, the initial knowledge graph is constructed from the fields contained in the service data tables, and the service data tables are independent of each other. Because the initial knowledge graph is built only from the service data tables, the information it can cover is limited. The embodiments of this specification can therefore acquire historical operation data and/or business metadata related to each service type, determine the association relationship between different fields in the service data tables based on that data, and update the initial knowledge graph based on the association relationship to obtain the target knowledge graph. The historical operation data and/or business metadata related to each service type can be stored in the data management system.

In specific implementation, the initial knowledge graph is updated according to the association relationship to generate a target knowledge graph, specifically, under the condition that it is determined that an association relationship exists between a first field in a first service data table and a second field in a second service data table, an edge is constructed between a second node corresponding to the first field and a second node corresponding to the second field in the initial knowledge graph based on the association relationship, and an edge is constructed between the first node of the first service data table and the first node of the second service data table to update the initial knowledge graph to generate the target knowledge graph.

Specifically, fig. 2a is a schematic diagram of an initial knowledge graph construction result. In it, a connection exists only between the node corresponding to the table identifier of a service data table and the nodes corresponding to the field identifiers of the fields contained in that table. Neither a table-identifier node nor a field-identifier node is connected to the table-identifier node or to any field-identifier node of another service data table.

Therefore, in the embodiments of the present specification, connections need to be constructed between the nodes corresponding to the field identifiers of fields according to the association relationship between different fields in the service data tables. Specifically, when it is determined that an association relationship exists between a first field in a first service data table and a second field in a second service data table, an edge is constructed in the initial knowledge graph, based on the association relationship, between the second node corresponding to the first field and the second node corresponding to the second field (a second node corresponds to the field identifier of a field), and an edge is constructed between the first node of the first service data table and the first node of the second service data table (a first node corresponds to the table identifier of a service data table). Alternatively, when an association relationship exists between a first field and a second field in the same service data table, an edge is constructed between the second node of the first field and the second node of the second field. The initial knowledge graph is updated with these connections to generate the target knowledge graph.
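The update described above can be illustrated with a minimal sketch. The node naming scheme (`table:…`, `field:…`), the use of a set of unordered pairs as the graph, and the function names are all assumptions made for illustration; the specification does not prescribe a graph representation.

```python
# Hypothetical representation: nodes are strings, edges are unordered pairs.

def table_node(table_id):
    """First node: corresponds to a table identifier."""
    return f"table:{table_id}"

def field_node(table_id, field_id):
    """Second node: corresponds to a field identifier."""
    return f"field:{table_id}.{field_id}"

def build_initial_graph(tables):
    """tables: dict mapping a table identifier to its field identifiers."""
    edges = set()
    for table_id, fields in tables.items():
        for field_id in fields:
            # Inclusion edge between a table and one of its fields.
            edges.add(frozenset({table_node(table_id),
                                 field_node(table_id, field_id)}))
    return edges

def add_association(edges, first, second):
    """first, second: (table_id, field_id) pairs found to be associated."""
    (t1, f1), (t2, f2) = first, second
    updated = set(edges)
    # Edge between the two field nodes.
    updated.add(frozenset({field_node(t1, f1), field_node(t2, f2)}))
    # Edge between the two table nodes, only when the tables differ.
    if t1 != t2:
        updated.add(frozenset({table_node(t1), table_node(t2)}))
    return updated

tables = {"T1": ["user_id", "name"], "T2": ["buyer_id", "item_id"]}
initial = build_initial_graph(tables)
target = add_association(initial, ("T1", "user_id"), ("T2", "buyer_id"))
```

Here `initial` contains only table-to-field edges, while `target` additionally links the two associated field nodes and the two table nodes.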

Fig. 2b is a schematic diagram of a target knowledge graph provided in this embodiment. Compared with the initial knowledge graph in fig. 2a, the target knowledge graph in fig. 2b adds connections between the nodes corresponding to the field identifiers of fields in the same service data table or in different service data tables.

Continuing the above example, if the user identifier U1 contained in the user field of the first service data table has a purchase operation on the commodity M1 contained in the commodity field of the second service data table, an edge can be established between the node corresponding to the user identifier U1 and the node corresponding to the commodity M1; similarly, if it is determined that a lineage relationship, a calculation relationship, or the like exists between any two nodes, an edge can be established between the two nodes.

The embodiments of this specification update the association relationships among different nodes in the initial knowledge graph based on historical operation data. This can realize automatic or semi-automatic construction of the knowledge graph to a certain extent, helps improve the construction efficiency of the target knowledge graph, and helps ensure the accuracy of its construction result.

Alternatively, determining the association relationship between different fields in the at least two service data tables according to the historical operation data related to the at least two service data tables, including:

Further, updating the initial knowledge graph according to the association relationship to generate a target knowledge graph, including:

and under the condition that a second association relationship exists between a first field in the first service data table and a second field in the second service data table, constructing edges among a second node corresponding to the first field and a second node corresponding to the second field in the initial knowledge graph based on the second association relationship so as to update the initial knowledge graph and generate a target knowledge graph.

Specifically, the historical operation data may be used to represent historical operation relationships between different fields in the same service data table or different service data tables, including but not limited to operation relationships such as modification and update of a certain field in a service data table by a user, or associated operations between two service data tables. Therefore, the association relationship between the business data tables and the association relationship between different fields in the same business data table or different business data tables can be determined according to the historical operation data related to the business data tables.

For example, if a user queries a first service data table and then switches to querying a second service data table, this indicates that the first service data table and the second service data table are associated, and that the user is associated with each of the two tables. As another example, if the user performed an association operation on the first service data table and the second service data table at a certain historical time point, this also indicates that the two tables are associated. Or, if the historical operation data includes a record that the user U1 purchased the product M1, it can be determined that the user identification field in the first service data table and the product field in the second service data table have an association relationship.

Therefore, the association relationship between the at least two business data tables and between different fields in the at least two business data tables can be determined according to the historical operation data related to the at least two business data tables, so as to update the initial knowledge graph based on the association relationship.
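As one hedged illustration of how field associations might be derived from historical operation data, the sketch below counts join operations in a log and keeps field pairs that were joined at least `min_count` times. The log record layout, the `"join"` operation type, and the threshold are hypothetical; the specification does not define the format of the operation metadata.

```python
from collections import Counter

def associations_from_history(operations, min_count=1):
    """Return field pairs joined at least min_count times.

    operations: list of dicts (hypothetical layout); a join record has
    "left" and "right" entries, each a (table_id, field_id) pair.
    """
    counts = Counter()
    for op in operations:
        if op["type"] != "join":
            continue  # other operation types are ignored in this sketch
        # Sort so that (A, B) and (B, A) count as the same pair.
        pair = tuple(sorted([op["left"], op["right"]]))
        counts[pair] += 1
    return {pair for pair, n in counts.items() if n >= min_count}

history = [
    {"type": "join", "left": ("T1", "user_id"), "right": ("T2", "buyer_id")},
    {"type": "join", "left": ("T2", "buyer_id"), "right": ("T1", "user_id")},
    {"type": "update", "table": "T1", "field": "name"},
]
pairs = associations_from_history(history, min_count=2)
```

Each returned pair could then be used to add a field-to-field edge, and a table-to-table edge where the tables differ, to the initial knowledge graph.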

Specifically, when it is determined that a first association relationship exists between the first service data table and the second service data table, an edge is constructed in the initial knowledge graph between the first node of the first service data table and the first node of the second service data table based on the first association relationship. When it is determined that a second association relationship exists between a first field in the first service data table and a second field in the second service data table, an edge is constructed in the initial knowledge graph between the second node corresponding to the first field and the second node corresponding to the second field based on the second association relationship. The initial knowledge graph is thereby updated to generate the target knowledge graph.

In practical applications, when an edge is constructed between the first node of a first service data table and the first node of a second service data table, the domain knowledge that may exist between the two tables can be determined from the association relationship between them in combination with the target service. For example, if the target service is an e-commerce service, the service data tables may include a member table, a transaction table, and an object table. If it is determined from historical operation data (shopping data of a user, or the user's association operations on the three tables) that an association relationship exists among the three tables, the domain knowledge among them may be determined to be "transaction-related". If the target service is a financial service, the domain knowledge among the service data tables may be "loan-related".

After determining the domain knowledge between the first service data table and the second service data table, an edge can be constructed between the first node of the first service data table and the first node of the second service data table, and the domain knowledge is added to the initial knowledge graph as the attribute information of the edge, so as to update the knowledge graph.

In addition, for a node corresponding to a field identifier in the initial knowledge graph, that is, a second node, if it is determined that the first field and the second field in the service data table have an association relationship, an edge between the second node corresponding to the first field and the second node corresponding to the second field may be constructed in the initial knowledge graph, and the association relationship may be added to the initial knowledge graph as attribute information of the edge, so as to update the initial knowledge graph and generate the target knowledge graph.
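Storing domain knowledge or an association relationship as attribute information of an edge can be sketched as follows. The dictionary-based graph, the node names, and the attribute keys `domain_knowledge` and `relation` are assumptions for illustration only.

```python
def add_edge(graph, node_a, node_b, **attributes):
    """Add an undirected edge, or merge attributes into an existing one.

    graph: dict mapping an unordered node pair to its attribute dict.
    """
    key = frozenset({node_a, node_b})
    graph.setdefault(key, {}).update(attributes)
    return graph

graph = {}
# Table-level edge carrying domain knowledge of the target service.
add_edge(graph, "table:member", "table:transaction",
         domain_knowledge="transaction-related")
# Field-level edge carrying the association relationship itself.
add_edge(graph, "field:member.member_id", "field:transaction.buyer_id",
         relation="join")
```

Because the key is an unordered pair, calling `add_edge` again with the same two nodes in either order updates the existing edge rather than duplicating it.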

The embodiments of this specification combine the knowledge graph with the field of data asset management and break from the traditional storage and organization of data assets. The association relationships between the service data tables in the data assets and the fields in those tables are stored in the initial knowledge graph. The domain knowledge of the target service and the relationship information between different fields in the service data tables are derived from the business metadata and/or operation metadata in the data management system and are then introduced into the initial knowledge graph, so as to construct a complete data asset knowledge graph belonging to an individual or an enterprise. This can realize automatic or semi-automatic construction of the knowledge graph to a certain extent, helps improve the construction efficiency of the target knowledge graph, and helps ensure the accuracy of its construction result.

Step 108, constructing a service width table of the target service based on the target knowledge graph.

Specifically, after the target knowledge graph is generated, a service width table of the target service can be automatically constructed based on the target knowledge graph, and the service width table can be a database table.

In specific implementation, the service width table comprises a database table;

Specifically, since a database table is composed of different fields, when the target database table is constructed, the field identifier corresponding to each node in the target knowledge graph can be used as a field and added to the initial database table, thereby constructing the initial database table. The order of the fields in the initial database table is then adjusted according to the association relationship between the nodes in the target knowledge graph. For example, if the field Z1 is located in the first column of the initial database table and the field Z2 is located in the tenth column, but an association relationship between the field Z1 and the field Z2 can be determined from the target knowledge graph, the field Z2 can be moved from the tenth column to the second column. The specific adjustment manner can be determined according to actual needs and is not limited herein.
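Since the adjustment manner is left open, the following is only one possible sketch: it walks the fields in their initial order and places each field's associated fields directly after it. The function name and the depth-first placement strategy are assumptions, not the method of the specification.

```python
def order_fields(fields, associations):
    """Reorder fields so that associated fields sit next to each other.

    fields: field identifiers in their initial column order.
    associations: (field_a, field_b) pairs taken from the knowledge graph;
    every field named here must also appear in `fields`.
    """
    neighbours = {field: [] for field in fields}
    for a, b in associations:
        neighbours[a].append(b)
        neighbours[b].append(a)

    ordered, seen = [], set()

    def place(field):
        if field in seen:
            return
        seen.add(field)
        ordered.append(field)
        # Pull associated fields forward, directly after this field.
        for other in neighbours[field]:
            place(other)

    for field in fields:
        place(field)
    return ordered

columns = order_fields(["Z1", "A", "B", "Z2"], [("Z1", "Z2")])
```

In this example the field Z2 moves from the last column to the second column, next to the associated field Z1, and the remaining fields keep their relative order.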

The intermediate database table is generated after the adjustment, and the service data in the service data tables is then added to the data units under the corresponding fields in the intermediate database table to generate the target database table. For example, if the first column of the intermediate database table is the member ID, the member IDs contained in each service data table can be added to the data units of the first column.

Adding the service data in the at least two service data tables to the data units of the corresponding fields in the intermediate database table to generate the target database table may specifically include: determining a mapping relationship between a target service data table and each field in the intermediate database table, where the target service data table is one of the at least two service data tables; and, based on the table structure of the intermediate database table, adding the service data in the target service data table to the data units of the corresponding fields in the intermediate database table according to the mapping relationship, to generate the target database table.

Specifically, because different service data tables contain different fields, when the intermediate database table is constructed, a mapping relationship between each service data table and the fields in the intermediate database table can be established according to the inclusion relationship between the service data table and its fields. The mapping relationship represents the storage position, in the database table, of the service data from the service data table.

In addition, one possible table structure of the intermediate database table is shown in Table 1.

TABLE 1

Member ID	KEY1	KEY2	……	KEYn	……
1	V1	V2	……	……	……
2	……	……	……	Vn	……
……	……	……	……	……	……

Therefore, after the intermediate database table is generated, the mapping relationship between the target business data table of the at least two business data tables and each field in the intermediate database table can be determined, and the business data in the target business data table is added to the data unit of the corresponding field in the intermediate database table according to the mapping relationship based on the table structure of the intermediate database table, so as to generate the target database table.
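The population step can be sketched as below, using a table structure in the spirit of Table 1 (a member ID column followed by KEY columns). The source table names, source field names, and the mapping layout are hypothetical; a real implementation would take them from the target knowledge graph and the data management system.

```python
def fill_wide_table(columns, key_column, sources, mappings):
    """Populate the intermediate table from several source tables.

    columns: column names of the intermediate table.
    sources: {table_id: list of record dicts}.
    mappings: {table_id: {source_field: wide_column}} (assumed layout).
    """
    rows = {}
    for table_id, records in sources.items():
        mapping = mappings[table_id]
        for record in records:
            # Translate source field names into wide-table column names.
            translated = {mapping[f]: v for f, v in record.items()
                          if f in mapping}
            # One row per key value; unknown cells start as None.
            row = rows.setdefault(translated[key_column],
                                  dict.fromkeys(columns))
            row.update(translated)
    return [rows[key] for key in sorted(rows)]

columns = ["member_id", "KEY1", "KEY2"]
sources = {
    "B1": [{"uid": 1, "level": "gold"}, {"uid": 2, "level": "silver"}],
    "B2": [{"buyer": 1, "orders": 3}],
}
mappings = {
    "B1": {"uid": "member_id", "level": "KEY1"},
    "B2": {"buyer": "member_id", "orders": "KEY2"},
}
wide = fill_wide_table(columns, "member_id", sources, mappings)
```

Member 1 receives values from both source tables, while member 2 has no record in B2, so its KEY2 cell stays empty (`None`).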

The embodiments of this specification build the initial knowledge graph based on the fields in the service data tables and generate the target knowledge graph by updating the association relationships between different nodes in the initial knowledge graph based on historical operation data, so that the database table can be built from the target knowledge graph. This can realize automation or semi-automation of database modeling to a certain extent, reduces the cost of building the database table, improves its construction efficiency, and helps ensure the accuracy of the construction result.

Or, constructing a service width table of the target service based on the target knowledge graph, including:

and constructing an enhanced entity relationship diagram based on the target knowledge graph, and constructing a service width table of the target service according to the incidence relation among different entities in the enhanced entity relationship diagram.

Wherein, constructing an enhanced entity relationship graph based on the target knowledge graph comprises:

Further, dividing the at least two service data tables into a master table and a slave table according to the data volume, including:

dividing a service data table to which a target field with a data volume larger than a preset data volume threshold value belongs in the first field and the second field into a master table, and dividing a service data table outside the master table in the at least two service data tables into a slave table;

Specifically, the enhanced entity relationship graph is an Enhanced-ER model.

In the target knowledge graph, more association relationships exist between fields after the business metadata and the operation metadata are introduced. The embodiments of the present specification can therefore determine, based on these association relationships, whether an association relationship exists between two service data tables, and construct the enhanced entity relationship graph based on the association relationships between the service data tables.

In a specific process of constructing the enhanced entity relationship graph, a first field and a second field that have an association relationship can first be determined in the at least two service data tables, that is, a first field in a first service data table and a second field in a second service data table that are associated with each other. The service data contained in the first field and the second field can then be deduplicated, and the data volume of the service data contained in each field is determined from the deduplication result. The service data table containing the field with the larger data volume (the target field, which is either the first field or the second field) is used as the master table, and the other service data tables are used as slave tables. Whether an association relationship exists between the master table and each slave table is then determined according to the association relationship between nodes in the target knowledge graph, and the enhanced entity relationship graph is constructed based on these association relationships. For example, if a connection exists between one node of the master table B1 and one node of a slave table B2, it is determined that the master table B1 and the slave table B2 have an association relationship. If a node of the slave table B2 is connected to a node of a slave table B3 but no node of the master table B1 is connected to any node of the slave table B3, it can still be determined that the master table B1 has an association relationship with the slave table B3, because the two are linked through the slave table B2.
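The master/slave division and the indirect association in the B1, B2, B3 example can be sketched as follows. This variant picks the table whose associated field has the larger deduplicated data volume; the input layouts and function names are assumptions, and table-level edges are assumed to have been extracted from the target knowledge graph beforehand.

```python
from collections import deque

def choose_master(field_values):
    """Pick the table whose associated field has the most distinct values.

    field_values: {(table_id, field_id): list of values} (assumed layout).
    """
    distinct = {key: len(set(values)) for key, values in field_values.items()}
    (master_table, _field), _count = max(distinct.items(),
                                         key=lambda item: item[1])
    return master_table

def tables_associated_with(master, table_edges):
    """Tables reachable from the master, directly or through other tables."""
    neighbours = {}
    for a, b in table_edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    reached, queue = {master}, deque([master])
    while queue:  # breadth-first traversal over table-level edges
        current = queue.popleft()
        for other in neighbours.get(current, ()):
            if other not in reached:
                reached.add(other)
                queue.append(other)
    return reached - {master}

field_values = {
    ("B1", "member_id"): [1, 2, 3, 3],   # 3 distinct values
    ("B2", "buyer_id"): [1, 1, 2],       # 2 distinct values
}
master = choose_master(field_values)
slaves = tables_associated_with(master, [("B1", "B2"), ("B2", "B3")])
```

Here B1 becomes the master table, and B3 is treated as associated with B1 even though it is only connected through B2.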

A visualization of the enhanced ER diagram constructed from the target knowledge graph shows that association relationships now exist between service data tables that were originally unassociated. Such an association relationship can be a field join relationship obtained from the data management system, and the knowledge graph can construct the corresponding relationship wherever a similar learning and acquisition mechanism exists.

Based on the enhanced ER capability provided by the data asset knowledge graph, the embodiments of the present specification can reverse-model the member tables of four subsystems that share associated fields, forming a service width table that links the uid_1 field of ods_huiyuan_t1, the uid_2 field of ods_huiyuan_t2, the uid_3 field of ods_huiyuan_t3, and the uid_4 field of ods_huiyuan_t4. This corresponds to the wide table that was previously modeled manually in the data warehouse, and thus implements the function of automatically constructing the service width table based on the data asset knowledge graph.

In addition, after the service width table of the target service is constructed based on the target knowledge graph, the method further comprises the following steps:

and outputting the index result as a data query result.

Specifically, a data warehouse is a strategic collection that provides all types of data support for the user's decision-making process, so that the user can query required information more quickly and conveniently. Therefore, after the service width table, namely the target data warehouse table, is generated, a data query instruction of the user can be received. The table identifier of the target service data table carried in the data query instruction is used as index information, data index processing is performed on the service data under the target field according to the mapping relationship between the table identifier and each field in the target data warehouse table, and the index result is output.

Still taking the target service as a transaction service as an example, if the data query instruction is to query service data under a member ID field in a service data table B1, taking a table identifier "B1" as index information, performing data index processing according to a mapping relationship between the table identifier "B1" in the target database table and the member ID field, and outputting an index result (1, 2).
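The query in this example can be sketched as a lookup keyed by table identifier. The row layout, the `table_fields` mapping, and the function name are assumptions made for illustration.

```python
def query(wide_rows, table_fields, table_id, field):
    """Return the values stored under `field` for the given source table.

    wide_rows: list of row dicts of the wide table.
    table_fields: {table_id: set of wide-table fields fed by that table}.
    """
    # The table identifier is used as index information.
    if field not in table_fields[table_id]:
        raise KeyError(f"{field!r} is not mapped for table {table_id!r}")
    return tuple(row[field] for row in wide_rows if row[field] is not None)

wide_rows = [
    {"member_id": 1, "KEY1": "V1"},
    {"member_id": 2, "KEY1": None},
]
table_fields = {"B1": {"member_id", "KEY1"}}
result = query(wide_rows, table_fields, "B1", "member_id")
```

With these sample rows, `result` is `(1, 2)`, matching the index result in the example above.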

Providing the data query service to the user through the target database table makes it more convenient for the user to query multiple classes of data.

In addition, after the service width table is constructed based on the target knowledge graph, in the case that incremental data exists in the at least two service data tables, the service width table is updated based on the incremental data.
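The specification does not detail how incremental data is applied. One simple possibility, shown here purely as an assumption, is an upsert keyed on the member ID: existing rows are updated and new keys are appended.

```python
def apply_increment(wide_rows, columns, key_column, increments):
    """Return a new wide table with incremental records merged in.

    increments: list of record dicts already expressed in wide-table
    column names (an assumption of this sketch).
    """
    # Copy each row so the original wide table is left untouched.
    by_key = {row[key_column]: dict(row) for row in wide_rows}
    for record in increments:
        row = by_key.setdefault(record[key_column], dict.fromkeys(columns))
        row.update(record)
    return list(by_key.values())

columns = ["member_id", "KEY1"]
wide = [{"member_id": 1, "KEY1": "V1"}]
updated = apply_increment(
    wide, columns, "member_id",
    [{"member_id": 1, "KEY1": "V1b"}, {"member_id": 2}],
)
```

Member 1 has its KEY1 value replaced, and member 2 is added with an empty KEY1 cell.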

The embodiments of this specification use the data asset knowledge graph and the association relationships in the metadata service domain to automatically construct the service wide table for a service domain. Service data dispersed across several independent service systems is unified into the wide table model required in the data warehouse, and developers who model the data warehouse need not intervene in the process or learn knowledge of the service domain. Automation or semi-automation of data warehouse modeling can thus be achieved to a certain extent, which reduces the cost of constructing the data warehouse table, improves construction efficiency, and helps ensure the accuracy of the construction result.

The embodiments of this specification build the initial knowledge graph based on the fields in the service data tables and generate the target knowledge graph by updating the association relationships between different nodes in the initial knowledge graph based on the historical operation data, so that the service wide table can be built from the target knowledge graph. This can realize automation or semi-automation of the construction of the service wide table to a certain extent, reduces the cost of building the service wide table, improves its construction efficiency, and helps ensure the accuracy of the construction result.

The following describes the data processing method further by taking the application of the data processing method provided in the present specification to member purchasing as an example, with reference to fig. 3. Fig. 3 shows a processing procedure flowchart of a data processing method provided in an embodiment of the present specification, which specifically includes the following steps.

Step 302, obtaining at least two member data tables stored in the data management system, where the at least two member data tables respectively correspond to different service types.

Step 304, taking the table identifier of a target member data table as a first node, taking the field identifiers corresponding to different fields in the target member data table as second nodes, taking the inclusion relationship between the target member data table and the different fields as edges between the first node and the second nodes, and constructing an initial sub-knowledge graph corresponding to the target member data table, where the target member data table is one of the at least two member data tables.

Step 306, combining the initial sub-knowledge graphs respectively corresponding to the at least two member data tables to form an initial knowledge graph.

Step 308, determining the association relationship between different fields in the at least two member data tables according to the historical operation data which is stored in the data management system and is related to the at least two member data tables.

Step 310, when it is determined that a first field and a second field in the at least two member data tables have an association relationship, constructing an edge in the initial knowledge graph between the second node corresponding to the first field and the second node corresponding to the second field based on the association relationship, so as to update the initial knowledge graph and generate the target knowledge graph.

Step 312, dividing any member data table, among the at least two member data tables, whose data volume is greater than a preset data volume threshold into a master table, and dividing the member data tables other than the master table into slave tables.

Step 314, determining the association relationship between the master table and each slave table according to the association relationship between the nodes in the target knowledge graph, and constructing an enhanced entity relationship graph according to the association relationship between the master table and each slave table.

Step 316, constructing a target database table according to the association relationship among different entities in the enhanced entity relationship graph.
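Steps 304 and 306 can be sketched as building one sub-graph per member data table and then taking their union. The tuple-based node encoding and the function names are assumptions; the table and uid field names follow the member-table example given earlier in this description, while the `name` field is hypothetical.

```python
def build_sub_graph(table_id, fields):
    """Step 304 (sketch): one table node plus one node per field."""
    table = ("table", table_id)            # first node
    nodes, edges = {table}, set()
    for field_id in fields:
        field = ("field", table_id, field_id)  # second node
        nodes.add(field)
        edges.add((table, field))          # inclusion relationship
    return nodes, edges

def merge_sub_graphs(sub_graphs):
    """Step 306 (sketch): union of all sub-graphs."""
    nodes, edges = set(), set()
    for sub_nodes, sub_edges in sub_graphs:
        nodes |= sub_nodes
        edges |= sub_edges
    return nodes, edges

member_tables = {
    "ods_huiyuan_t1": ["uid_1", "name"],
    "ods_huiyuan_t2": ["uid_2"],
}
nodes, edges = merge_sub_graphs(
    build_sub_graph(table_id, fields)
    for table_id, fields in member_tables.items()
)
```

At this stage the merged graph contains no edge between the two member tables; such edges are only added in step 310, once field associations are known.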

The embodiments of this specification build the initial knowledge graph based on the fields in the member data tables and generate the target knowledge graph by updating the association relationships between different nodes in the initial knowledge graph based on historical operation data, so that the database table can be built from the target knowledge graph. This can realize automation or semi-automation of database modeling to a certain extent, reduces the cost of building the database table, improves its construction efficiency, and helps ensure the accuracy of the construction result.

Corresponding to the above method embodiment, this specification further provides an embodiment of a data processing apparatus, and fig. 4 shows a schematic structural diagram of a data processing apparatus provided in an embodiment of this specification. As shown in fig. 4, the apparatus includes:

an obtaining module 402, configured to obtain at least two service data tables, and construct an initial knowledge graph based on fields included in the at least two service data tables, where the at least two service data tables respectively correspond to different service types of a target service;

a determining module 404 configured to determine an association relationship between different fields in the at least two service data tables according to historical operation data related to the at least two service data tables;

a generating module 406 configured to update the initial knowledge graph according to the association relationship to generate a target knowledge graph;

a building module 408 configured to build a service width table of the target service based on the target knowledge graph.

Optionally, the obtaining module 402 is further configured to:

Optionally, the generating module 406 is further configured to:

under the condition that the association relationship exists between a first field in a first service data table and a second field in a second service data table, constructing an edge between a second node corresponding to the first field and a second node corresponding to the second field in the initial knowledge graph based on the association relationship, and constructing an edge between a first node of the first service data table and a first node of the second service data table so as to update the initial knowledge graph and generate a target knowledge graph.

Optionally, the determining module 404 is further configured to:

Optionally, the generating module 406 is further configured to:

Optionally, the service width table comprises a database table;

accordingly, the building module 408 is further configured to:

constructing an initial database table by taking field identifications corresponding to each node in the target knowledge graph as fields;

Optionally, the building module 408 is further configured to:

and constructing an enhanced entity relationship graph based on the association relationship between the master table and the slave table.

Optionally, the building module 408 is further configured to:

and adding the business data in the target business data table to the data units of the corresponding fields in the intermediate database table according to the mapping relation based on the table structure of the intermediate database table to generate the target database table.

Optionally, the data processing apparatus further includes a query module configured to:

and outputting the index result as a data query result.

Optionally, the data processing apparatus further includes a processing module configured to:

Optionally, the data processing apparatus further comprises an input module configured to:

The foregoing is a schematic arrangement of a data processing apparatus of the present embodiment. It should be noted that the technical solution of the data processing apparatus and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the data processing apparatus can be referred to the description of the technical solution of the data processing method.

FIG. 5 illustrates a block diagram of a computing device 500 provided in accordance with one embodiment of the present description. The components of the computing device 500 include, but are not limited to, a memory 510 and a processor 520. Processor 520 is coupled to memory 510 via bus 530, and database 550 is used to store data.

Computing device 500 also includes access device 540, which enables computing device 500 to communicate via one or more networks 560. Examples of such networks include the Public Switched Telephone Network (PSTN), a Local Area Network (LAN), a Wide Area Network (WAN), a Personal Area Network (PAN), or a combination of communication networks such as the internet. The access device 540 may include one or more of any type of network interface, wired or wireless, e.g., a Network Interface Card (NIC), such as an IEEE 802.11 Wireless Local Area Network (WLAN) wireless interface, a worldwide interoperability for microwave access (Wi-MAX) interface, an ethernet interface, a Universal Serial Bus (USB) interface, a cellular network interface, a bluetooth interface, a Near Field Communication (NFC) interface, and so forth.

In one embodiment of the present description, the above-described components of computing device 500, as well as other components not shown in FIG. 5, may also be connected to each other, such as by a bus. It should be understood that the block diagram of the computing device structure shown in FIG. 5 is for illustration purposes only and is not intended to limit the scope of the present description. Those skilled in the art may add or replace other components as desired.

Computing device 500 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (e.g., tablet, personal digital assistant, laptop, notebook, netbook, etc.), mobile phone (e.g., smartphone), wearable computing device (e.g., smartwatch, smartglasses, etc.), or other type of mobile device, or a stationary computing device such as a desktop computer or PC. Computing device 500 may also be a mobile or stationary server.

The processor 520 is configured to execute computer-executable instructions, which, when executed by the processor, implement the steps of the data processing method described above.

The above is an illustrative scheme of a computing device of the present embodiment. It should be noted that the technical solution of the computing device and the technical solution of the data processing method belong to the same concept, and details that are not described in detail in the technical solution of the computing device can be referred to the description of the technical solution of the data processing method.

An embodiment of the present specification further provides a computer-readable storage medium storing computer-executable instructions, which when executed by a processor implement the steps of the data processing method described above.

The above is an illustrative scheme of a computer-readable storage medium of the present embodiment. It should be noted that the technical solution of the storage medium and the technical solution of the data processing method belong to the same concept; for details not described in the technical solution of the storage medium, refer to the description of the technical solution of the data processing method.

An embodiment of the present specification further provides a computer program which, when executed in a computer, causes the computer to execute the steps of the data processing method described above.

The above is an illustrative scheme of a computer program of the present embodiment. It should be noted that the technical solution of the computer program and the technical solution of the data processing method belong to the same concept; for details not described in the technical solution of the computer program, refer to the description of the technical solution of the data processing method.

The foregoing description of specific embodiments has been presented for purposes of illustration and description. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.

The computer instructions comprise computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include: any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content covered by the computer-readable medium may be appropriately expanded or restricted as required by legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, computer-readable media do not include electrical carrier signals and telecommunications signals.

It should be noted that, for the sake of simplicity, the foregoing method embodiments are described as a series of combined acts, but those skilled in the art should understand that the embodiments are not limited by the described order of acts, since some steps may be performed in other orders or simultaneously according to the embodiments. Further, those skilled in the art should also appreciate that the embodiments described in this specification are preferred embodiments, and that the acts and modules referred to are not necessarily required by every embodiment of the specification.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.

The preferred embodiments of the present specification disclosed above are intended only to aid in the description of the specification. The alternative embodiments do not exhaustively describe every detail, nor do they limit the invention to the precise embodiments described. Obviously, many modifications and variations are possible in light of the teaching of the embodiments of the present specification. The embodiments were chosen and described in order to best explain their principles and practical application, and thereby to enable others skilled in the art to best understand and utilize them. The specification is limited only by the claims and their full scope and equivalents.

Claims

1. A data processing method, comprising:

2. The data processing method of claim 1, wherein constructing an initial knowledge graph based on fields contained in the at least two service data tables comprises:

3. The data processing method of claim 2, wherein updating the initial knowledge graph according to the association relationship to generate a target knowledge graph comprises:

4. The data processing method according to claim 2, wherein the determining, according to the historical operation data related to the at least two service data tables, an association relationship between different fields in the at least two service data tables includes:

5. The data processing method of claim 4, wherein updating the initial knowledge graph according to the association relationship to generate a target knowledge graph comprises:

6. The data processing method of any one of claims 1 to 5, wherein the service width table comprises a database table;

7. The data processing method of claim 1, wherein constructing the service width table of the target service based on the target knowledge graph comprises:

8. The data processing method of claim 7, wherein building an enhanced entity relationship graph based on the target knowledge graph comprises:

9. The data processing method according to claim 8, wherein the dividing the at least two service data tables into a master table and a slave table according to the data amount comprises:

10. The data processing method of claim 6, wherein adding the service data in the at least two service data tables to the data units of the corresponding fields in the intermediate database table to generate a target database table comprises:

11. The data processing method of claim 1, further comprising:

12. A data processing apparatus comprising:

an acquisition module configured to acquire at least two service data tables and construct an initial knowledge graph based on fields contained in the at least two service data tables, wherein the at least two service data tables respectively correspond to different service types of a target service;

a construction module configured to construct a service width table of the target service based on the target knowledge graph.

13. A computing device, comprising:

a memory and a processor;

the memory is for storing computer-executable instructions, and the processor is for executing the computer-executable instructions, which when executed by the processor, implement the steps of the data processing method of any one of claims 1 to 11.

14. A computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the steps of the data processing method of any one of claims 1 to 11.