CN113157947A

CN113157947A - Knowledge graph construction method, tool, device and server

Info

Publication number: CN113157947A
Application number: CN202110551912.8A
Authority: CN
Inventors: 张梦迪; 贾玉红; 徐聿帆; 陆怡
Original assignee: Industrial and Commercial Bank of China Ltd ICBC
Current assignee: Industrial and Commercial Bank of China Ltd ICBC
Priority date: 2021-05-20
Filing date: 2021-05-20
Publication date: 2021-07-23

Abstract

This specification provides a method, a tool, a device and a server for constructing a knowledge graph. In the construction method, the data structure type of the target source data to be processed is determined first. Then, a target knowledge extraction unit matching the target source data is constructed according to a preset construction rule and that data structure type. The target knowledge extraction unit is then called to process the target source data and obtain an entity relationship file that contains a plurality of triples and meets the requirements. Finally, a target knowledge graph associated with the target source data is constructed from the entity relationship file. This simplifies user-side operation, lowers the difficulty of constructing knowledge graphs, and allows users to efficiently and accurately construct knowledge graphs that meet diversified business requirements.

Description

Knowledge graph construction method, tool, device and server

Technical Field

This specification belongs to the technical field of artificial intelligence and, in particular, relates to a knowledge graph construction method, tool, device and server.

Background

The knowledge graph is an important branch of artificial intelligence technology and plays an important role in machine learning and cognition.

However, existing methods for constructing knowledge graphs have a high technical threshold and are difficult to use for users who need to construct knowledge graphs. Moreover, when a knowledge graph is actually constructed with existing methods, the operation is often complex and tedious, construction efficiency is low, and users' diversified business requirements cannot be met.

No effective solution to the above problems has yet been proposed.

Disclosure of Invention

This specification provides a method, a tool, a device and a server for constructing a knowledge graph, which simplify user-side operation, reduce the difficulty of constructing knowledge graphs, and enable users to efficiently and accurately construct knowledge graphs that meet diversified business requirements.

An embodiment of this specification provides a method for constructing a knowledge graph, comprising the following steps:

acquiring target source data;

determining a data structure type of target source data;

constructing a target knowledge extraction unit matched with the target source data according to a preset construction rule and the data structure type of the target source data;

calling the target knowledge extraction unit to process the target source data to obtain an entity relationship file meeting the requirements, wherein the entity relationship file comprises a plurality of triples (ternary data groups), and each triple comprises at least two data objects connected by a data relationship;

and constructing a target knowledge graph associated with the target source data according to the entity relationship file.

In some embodiments, the data structure type of the target source data comprises at least one of: structured data, unstructured data, semi-structured data.

In some embodiments, constructing a target knowledge extraction unit matched with the target source data according to a preset construction rule and a data structure type of the target source data includes:

selecting, from a plurality of preset data source operators according to the preset construction rule, a target source operator corresponding to the target source data, wherein the target source operator is used to feed the target source data into the target knowledge extraction unit;

determining a matching target data processing structure according to the data structure type of the target source data, wherein the target data processing structure is used to process the target source data to obtain a plurality of triples;

determining and configuring a target identifier termination operator, wherein the target identifier termination operator is used to extract the triples meeting the requirements from the plurality of triples output by the target data processing structure, so as to obtain the corresponding entity relationship file;

and combining the target source operator, the target data processing structure and the target identifier termination operator to obtain the target knowledge extraction unit matching the target source data.
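The combination step above can be pictured as a three-stage pipeline. The following is a minimal sketch under stated assumptions: the class `KnowledgeExtractionUnit` and the `demo_*` functions are hypothetical stand-ins for configured operators, not the tool's actual implementation, and a triple is modeled as a `(head, relationship, tail)` tuple.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

# (head data object, data relationship, tail data object)
Triple = Tuple[str, str, str]


@dataclass
class KnowledgeExtractionUnit:
    """Hypothetical container that chains the three combined parts in order."""
    source_operator: Callable[[], Iterable[dict]]                   # feeds in target source data
    processing_structure: Callable[[Iterable[dict]], List[Triple]]  # records -> candidate triples
    termination_operator: Callable[[List[Triple]], List[Triple]]    # keeps triples meeting requirements

    def run(self) -> List[Triple]:
        records = self.source_operator()
        candidates = self.processing_structure(records)
        return self.termination_operator(candidates)


# Illustrative stand-ins for configured operators (all hypothetical)
def demo_source() -> List[dict]:
    return [{"payer": "A", "payee": "B"}, {"payer": "B", "payee": "C"}, {"payer": "", "payee": "D"}]


def demo_processing(records: Iterable[dict]) -> List[Triple]:
    return [(r["payer"], "transfers_to", r["payee"]) for r in records]


def demo_termination(triples: List[Triple]) -> List[Triple]:
    # Keep only triples whose two data objects are both non-empty
    return [t for t in triples if t[0] and t[2]]


unit = KnowledgeExtractionUnit(demo_source, demo_processing, demo_termination)
entity_relationship_file = unit.run()
```

Keeping the three parts as separate callables mirrors the idea that each can be selected and configured independently before being combined.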

In some embodiments, determining a matching target data processing structure according to the data structure type of the target source data includes:

selecting an initial processing operator from a plurality of preset data processing operators when the data structure type of the target source data is determined to be structured data;

configuring the initial processing operator accordingly to obtain a target processing operator, and determining the target processing operator as the matching target data processing structure.

In some embodiments, the preset data processing operators comprise at least one of: an SQL operator, a HIVE operator and a SPARK operator.

In some embodiments, when the data structure type of the target source data is determined to be unstructured data or semi-structured data, a preset triple extraction model is determined as the matching target data processing structure.
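The type-dependent choice of processing structure described above amounts to a dispatch on the data structure type. This is a minimal sketch assuming hypothetical stand-ins: `sql_like_operator` imitates a configured SQL/HIVE/SPARK operator, and `triple_extraction_model` is a naive pattern matcher standing in for a pre-trained triple extraction model.

```python
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]


def sql_like_operator(rows: List[dict]) -> List[Triple]:
    """Stand-in for a configured SQL/HIVE/SPARK operator applied to structured rows."""
    return [(r["customer"], "owns_account", r["account"]) for r in rows]


def triple_extraction_model(texts: List[str]) -> List[Triple]:
    """Stand-in for a pre-trained triple extraction model: a naive 'X owns Y' pattern."""
    triples = []
    for text in texts:
        words = text.split()
        if len(words) == 3 and words[1] == "owns":
            triples.append((words[0], "owns_account", words[2]))
    return triples


# Data structure type -> matching target data processing structure
PROCESSING_STRUCTURES: Dict[str, Callable[[list], List[Triple]]] = {
    "structured": sql_like_operator,
    "semi-structured": triple_extraction_model,
    "unstructured": triple_extraction_model,
}


def select_processing_structure(data_type: str) -> Callable[[list], List[Triple]]:
    try:
        return PROCESSING_STRUCTURES[data_type]
    except KeyError:
        raise ValueError(f"unsupported data structure type: {data_type}") from None
```

For example, `select_processing_structure("unstructured")(["Bob owns 6228"])` yields `[("Bob", "owns_account", "6228")]`.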

In some embodiments, after determining the data structure type of the target source data, the method further comprises:

selecting recommended knowledge extraction units from a plurality of preset knowledge extraction units according to the data structure type of the target source data;

presenting the recommended knowledge extraction units to the user;

and determining the recommended knowledge extraction unit selected by the user as the target knowledge extraction unit.
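The recommendation steps above can be sketched as a filter over a catalogue of presets. The catalogue entries (`table_to_triples`, `text_to_triples`, `json_to_triples`) and the `PresetUnit` type are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class PresetUnit:
    name: str
    supported_types: FrozenSet[str]


# Hypothetical catalogue of preset knowledge extraction units
PRESET_UNITS = [
    PresetUnit("table_to_triples", frozenset({"structured"})),
    PresetUnit("text_to_triples", frozenset({"unstructured", "semi-structured"})),
    PresetUnit("json_to_triples", frozenset({"semi-structured"})),
]


def recommend_units(data_type: str, presets: List[PresetUnit] = PRESET_UNITS) -> List[str]:
    """Screen the presets whose supported types include the detected data structure type."""
    return [u.name for u in presets if data_type in u.supported_types]


def choose_unit(data_type: str, user_choice: str) -> str:
    """Accept the user's pick only if it was among the recommendations presented."""
    if user_choice not in recommend_units(data_type):
        raise ValueError(f"{user_choice!r} was not recommended for {data_type} data")
    return user_choice
```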

In some embodiments, constructing a target knowledge graph associated with the target source data according to the entity relationship file comprises:

acquiring a definition parameter file related to the target knowledge graph, wherein the definition parameter file comprises definition parameters of data objects and/or definition parameters of data relationships;

and generating the target knowledge graph associated with the target source data by performing data mapping according to the entity relationship file and the definition parameter file.
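One way to read "data mapping" here is as schema checking: each triple becomes a graph edge only if its relationship and endpoint types are declared in the definition parameter file. The JSON layout below and the `object_types` lookup are assumptions made for this sketch, not a format defined in this specification.

```python
import json
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical definition parameter file: declared object types and relation types
DEFINITION_JSON = """
{
  "objects": ["Customer", "Account"],
  "relations": {"owns_account": {"head": "Customer", "tail": "Account"}}
}
"""


def map_to_graph(triples: List[Triple], object_types: Dict[str, str], definition: dict) -> dict:
    """Map triples onto the schema; object_types maps each data object to its type."""
    declared = set(definition["objects"])
    nodes: Dict[str, str] = {}
    edges: List[Triple] = []
    for head, relation, tail in triples:
        spec = definition["relations"].get(relation)
        if spec is None or spec["head"] not in declared or spec["tail"] not in declared:
            continue  # relationship not defined in the definition parameter file
        if object_types.get(head) != spec["head"] or object_types.get(tail) != spec["tail"]:
            continue  # endpoint types do not match the definition
        nodes[head] = spec["head"]
        nodes[tail] = spec["tail"]
        edges.append((head, relation, tail))
    return {"nodes": nodes, "edges": edges}


definition = json.loads(DEFINITION_JSON)
graph = map_to_graph(
    [("Alice", "owns_account", "6222"), ("Alice", "likes", "Bob")],
    {"Alice": "Customer", "6222": "Account"},
    definition,
)
```

The undefined `likes` relationship is dropped, so only the `owns_account` edge reaches the graph.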

In some embodiments, the definition parameter file further comprises an index definition parameter;

correspondingly, in the process of constructing the target knowledge graph associated with the target source data according to the entity relationship file, the method further comprises the following steps:

and constructing, according to the index definition parameters, a target query index for the target knowledge graph using the definition parameters of the data objects and/or the definition parameters of the data relationships.
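A query index of this kind can be sketched as attribute-to-node lookups built only for the attributes named in the index definition parameters. Representing the index definition as a list of attribute names is an assumption for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def build_query_index(nodes: Dict[str, dict], index_definition: List[str]) -> Dict[str, Dict[str, Set[str]]]:
    """Build attribute -> value -> node-id lookups for the attributes named in index_definition."""
    index = {attr: defaultdict(set) for attr in index_definition}
    for node_id, attrs in nodes.items():
        for attr in index_definition:
            if attr in attrs:
                index[attr][attrs[attr]].add(node_id)
    # Convert to plain dicts so missing values do not silently create entries
    return {attr: dict(values) for attr, values in index.items()}


nodes = {
    "n1": {"type": "Customer", "name": "Alice"},
    "n2": {"type": "Account", "number": "6222"},
    "n3": {"type": "Customer", "name": "Alice"},
}
index = build_query_index(nodes, ["name", "type"])
```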

In some embodiments, after the target knowledge graph associated with the target source data is constructed according to the entity relationship file, the method further comprises:

receiving a target query statement, wherein the target query statement carries at least a target identifier of the target knowledge graph;

searching a graph database according to the target identifier to determine the target knowledge graph;

in response to the target query statement, performing a query operation on the target knowledge graph to obtain a corresponding query result;

and feeding back the query result.
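The query flow above can be sketched with a toy in-memory store keyed by target identifier. `InMemoryGraphDatabase` is hypothetical and is not the API of any real graph database; the query form (edges whose head matches a given data object) is likewise an assumption.

```python
from typing import Dict, List, Tuple

Edge = Tuple[str, str, str]


class InMemoryGraphDatabase:
    """Toy stand-in for a graph database; not a real graph-database API."""

    def __init__(self) -> None:
        self._graphs: Dict[str, List[Edge]] = {}

    def store(self, graph_id: str, edges: List[Edge]) -> None:
        self._graphs[graph_id] = list(edges)

    def handle_query(self, query: dict) -> List[Edge]:
        # The query statement must carry at least the target identifier
        graph_id = query["graph_id"]
        if graph_id not in self._graphs:
            raise KeyError(f"no knowledge graph with identifier {graph_id!r}")
        # Hypothetical query form: all edges whose head is the given data object
        return [e for e in self._graphs[graph_id] if e[0] == query["head"]]


db = InMemoryGraphDatabase()
db.store("kg-2020", [("A", "transfers_to", "B"), ("B", "transfers_to", "C")])
result = db.handle_query({"graph_id": "kg-2020", "head": "B"})
```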

In some embodiments, the target source data comprises running records of customers' transaction data; correspondingly, the query result comprises a flow graph of the target customer's transaction data.
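A flow graph for one customer can be illustrated as the set of transfers reachable downstream from that customer within a few hops. This is a sketch using breadth-first search; the `(payer, payee, amount)` record shape and the `max_hops` limit are assumptions, not parameters from this specification.

```python
from collections import deque
from typing import Dict, List, Tuple

Transfer = Tuple[str, str, float]  # (payer, payee, amount)


def fund_flow(transfers: List[Transfer], customer: str, max_hops: int = 3) -> List[Transfer]:
    """Collect transfers reachable downstream from `customer` within max_hops (BFS)."""
    outgoing: Dict[str, List[Transfer]] = {}
    for payer, payee, amount in transfers:
        outgoing.setdefault(payer, []).append((payer, payee, amount))
    seen = {customer}
    flow: List[Transfer] = []
    queue = deque([(customer, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for edge in outgoing.get(node, []):
            flow.append(edge)
            if edge[1] not in seen:
                seen.add(edge[1])
                queue.append((edge[1], depth + 1))
    return flow


transfers = [("A", "B", 100.0), ("B", "C", 90.0), ("C", "D", 80.0), ("X", "Y", 5.0)]
```

Each customer node is expanded at most once, so cycles in the transfer records do not cause repeated edges.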

The embodiments of this specification further provide a knowledge graph construction tool, which includes at least a source data import interface, a first processing interface and a second processing interface, wherein:

the source data import interface is used to allow the user to import target source data;

the first processing interface is used to allow the user to set definition parameters of data objects and/or definition parameters of data relationships in the target knowledge graph, so as to generate a definition parameter file related to the target knowledge graph;

the second processing interface is used to allow the user to determine and combine a matching target source operator, target data processing structure and target identifier termination operator according to a preset construction rule, so as to obtain a target knowledge extraction unit matching the target source data;

the knowledge graph construction tool is further used to call the target knowledge extraction unit to process the target source data to obtain a corresponding entity relationship file, and to generate a target knowledge graph associated with the target source data by performing data mapping according to the entity relationship file and the definition parameter file.

The embodiment of the present specification further provides a device for constructing a knowledge graph, including:

the acquisition module is used for acquiring target source data;

the determining module is used for determining the data structure type of the target source data;

the first construction module is used for constructing a target knowledge extraction unit matched with the target source data according to a preset construction rule and the data structure type of the target source data;

the calling module is used to call the target knowledge extraction unit to process the target source data so as to obtain an entity relationship file meeting the requirements, wherein the entity relationship file comprises a plurality of triples, and each triple comprises at least two data objects connected by a data relationship;

and the second construction module is used for constructing a target knowledge graph associated with the target source data according to the entity relationship file.

The embodiments of this specification also provide a server comprising a processor and a memory for storing processor-executable instructions, wherein the processor, when executing the instructions, implements the steps of the method for constructing a knowledge graph.

This specification also provides a computer storage medium storing computer instructions which, when executed, implement the steps of the method for constructing a knowledge graph.

With the method, tool, device and server for constructing a knowledge graph provided in this specification, the data structure type of the target source data to be processed is determined first. Then, a more targeted target knowledge extraction unit matching the target source data is constructed according to a preset construction rule and that data structure type. The target knowledge extraction unit is then called to process the target source data and obtain an entity relationship file that contains a plurality of triples and meets the requirements. Finally, a target knowledge graph associated with the target source data is constructed from the entity relationship file. This effectively simplifies user-side operation, reduces the difficulty of constructing knowledge graphs, and enables users to efficiently and accurately construct knowledge graphs that meet diversified business requirements.

Drawings

To illustrate the embodiments of this specification more clearly, the drawings used in the embodiments are briefly described below. The drawings described below show only some of the embodiments of this specification, and those skilled in the art can derive other drawings from them without creative effort.

FIG. 1 is a schematic diagram of one embodiment of the structure of a system to which the knowledge graph construction method provided by the embodiments of this specification is applied;

FIG. 2 is a schematic diagram of a knowledge graph construction tool provided by one embodiment of this specification;

FIG. 3 is a flow diagram of a method for constructing a knowledge graph provided by one embodiment of this specification;

FIG. 4 is a schematic structural diagram of a server provided by an embodiment of this specification;

FIG. 5 is a schematic structural diagram of a knowledge graph construction apparatus provided by an embodiment of this specification;

FIG. 6 is a schematic diagram of an embodiment of the knowledge graph construction method provided by the embodiments of this specification in a specific scenario example.

Detailed Description

To help those skilled in the art better understand the technical solutions in this specification, the technical solutions in the embodiments of this specification are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this specification. All other embodiments obtained by those skilled in the art based on the embodiments in this specification without creative effort shall fall within the scope of protection of this specification.

An embodiment of this specification provides a method for constructing a knowledge graph, which can be applied in a system comprising a server and a terminal device, as shown in FIG. 1. The server and the terminal device can be connected in a wired or wireless manner for data interaction.

In this embodiment, the server may be a background server deployed on the network platform side that can implement functions such as data transmission and data processing. Specifically, the server may be an electronic device with data computation, storage and network interaction functions. Alternatively, the server may be a software program running on an electronic device that provides support for data processing, storage and network interaction. The number of servers is not particularly limited in this embodiment: there may be one server, several servers, or a server cluster formed by several servers.

In this embodiment, the terminal device may be a front-end electronic device on the user side that can implement functions such as data acquisition and data transmission, for example a desktop computer, a tablet computer, a notebook computer or a smartphone. Alternatively, the terminal device may be a software application running on an electronic device, for example an app running on a smartphone.

In this embodiment, the server may further be connected to a graph database of the network platform and be responsible for maintaining and managing it. The graph database may store a plurality of knowledge graphs. A knowledge graph construction tool may be deployed on the terminal device.

In this embodiment, when a user needs to process a batch of data (for example, the running records of customers' transaction data at Bank XX for the year 2020) to construct a target knowledge graph that meets a business requirement (for example, one suitable for analyzing whether a customer is at risk of illegal transactions), the user may issue a start instruction on the terminal device to launch the knowledge graph construction tool.

Correspondingly, the terminal device launches the knowledge graph construction tool and displays its operation interface to the user, as shown in FIG. 2. The user can then use the tool on the terminal device to process the corresponding target source data and construct a target knowledge graph that meets the business requirement.

The displayed operation interface of the knowledge graph construction tool includes at least a source data import interface, a first processing interface and a second processing interface.

In a specific implementation, the user first follows the instructions on the operation interface and uses the source data import interface to select and import a batch of data to be processed as the target source data, through any of several import modes such as local file upload, HDFS file import, database table import and third-party data access.

When importing the target source data, the user can also customize the import mode. For example, while the target source data is being imported through the source data import interface, the interface can display preview information of the data. The preview information may be data volume parameters (e.g., the number of rows and columns, or the total number of records) or content parameters (e.g., content keywords, a preview of the first few rows, or the data name). The user can then check the import against the preview information, which helps avoid import errors.
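Preview information of this kind could be computed as follows. This minimal sketch assumes the imported data is CSV text with a header row; the `preview` function and its output keys are hypothetical.

```python
import csv
import io
from typing import Dict


def preview(csv_text: str, n_rows: int = 3) -> Dict[str, object]:
    """Summarize a CSV payload before import: size parameters plus the first few rows."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if not rows:
        return {"columns": [], "n_columns": 0, "n_rows": 0, "first_rows": []}
    header, body = rows[0], rows[1:]
    return {
        "columns": header,           # doubles as a content parameter (column names)
        "n_columns": len(header),    # data volume parameters
        "n_rows": len(body),
        "first_rows": body[:n_rows],  # preview of the first few rows
    }


info = preview("customer,account,amount\nAlice,6222,100\nBob,6228,50\n", n_rows=1)
```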

Next, the user may use the first processing interface to set definition parameters of the data objects and/or of the data relationships in the target knowledge graph to be built.

Specifically, in the first processing interface, the user may set the name, attributes, type and so on of a data object (also called an entity object) as its definition parameters (e.g., for data objects such as a customer's name, account or business). The user may likewise set the name, attributes, type and so on of a data relationship as its definition parameters (e.g., an affiliation, transfer relationship or debt relationship between data objects).

Correspondingly, the terminal device receives the definition parameters of the data objects and/or data relationships set by the user through the first processing interface, and generates a definition parameter file for the target knowledge graph from them.
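Generating the definition parameter file can be sketched as validating the user's settings and serializing them. The JSON layout (`objects` with `name`/`attributes`, `relations` with `name`/`head`/`tail`) is an assumed format chosen for illustration; this specification does not fix one.

```python
import json
from typing import Dict, List


def build_definition_file(objects: List[Dict], relations: List[Dict]) -> str:
    """Serialize user-set definition parameters into a JSON file body (hypothetical format)."""
    object_names = {o["name"] for o in objects}
    for rel in relations:
        for end in ("head", "tail"):
            if rel[end] not in object_names:
                raise ValueError(f"relation {rel['name']!r} refers to undefined object {rel[end]!r}")
    return json.dumps({"objects": objects, "relations": relations}, ensure_ascii=False, indent=2)


objects = [
    {"name": "Customer", "attributes": ["name", "account"]},
    {"name": "Account", "attributes": ["number"]},
]
relations = [{"name": "owns_account", "head": "Customer", "tail": "Account"}]
definition_file = build_definition_file(objects, relations)
```

Checking that every relationship refers to a defined data object catches schema mistakes before any extraction runs.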

In this way, the user can flexibly define the graph structure of the target knowledge graph through the first processing interface, based on actual business requirements, by generating its definition parameter file.

It should be noted that, in a specific implementation, the user may instead input a specific business requirement in the first processing interface, and the terminal device automatically generates the definition parameter file for the target knowledge graph according to that requirement and the data characteristics of the target source data.

Then, using the second processing interface, the user can efficiently and simply construct, according to the preset construction rule and the data structure type of the target source data, a target knowledge extraction unit that matches the target source data and meets the business requirement. The target knowledge extraction unit processes the target source data to extract the entity relationship file used to generate the target knowledge graph. The entity relationship file may include a plurality of triples extracted from the target source data. Each triple may include at least two data objects and a data relationship, the two data objects in the same triple being connected by that data relationship.

Specifically, as shown in FIG. 2, the second processing interface may include a menu bar, a main canvas, a parameter configuration bar and the like. The menu bar may display a data source operator selection box, a data processing structure selection box and an identifier termination operator selection box. The parameter configuration bar may provide parameter configuration interfaces for the data source operator, the data processing structure and the identifier termination operator.

The data source operator selection box may include a plurality of preset data source operators for the user to choose from, for example a DATAS operator for structured data and a DATAU operator for unstructured data. The data processing structure selection box may include a plurality of preset data processing structures, for example preset data processing operators matched with structured data (SQL, HIVE and SPARK operators, etc.) and a pre-trained triple extraction model matched with unstructured or semi-structured data. The identifier termination operator selection box may include a plurality of preset identifier termination operators, for example an MDATAS operator.

Specifically, based on the preset construction rule and taking into account the specific business requirement and the data structure type of the target source data, the user can select a mutually matching preset data source operator, preset data processing structure and preset identifier termination operator from the menu bar of the second processing interface, and configure their parameters in the parameter configuration bar to obtain the corresponding target source operator, target data processing structure and target identifier termination operator. These three are then combined on the main canvas to obtain a target knowledge extraction unit that matches the target source data and meets the user's personalized business requirements.

For example, take the running records of customers' transaction data at Bank XX for 2020 as the target source data. First, based on the preset construction rule and given that the target source data is structured data imported from a database table, the data source operator for structured data (the DATAS operator) may be selected and its import parameters configured, customizing the import mode of the target source data and yielding the corresponding target source operator. Meanwhile, an SQL operator suitable for processing structured data may be selected and its processing logic configured, customizing how knowledge is extracted from the target source data and yielding the corresponding target data processing structure. Then, in view of the specific business requirement, an MDATAS operator may be selected and its extraction parameters configured (e.g., identification information of the data objects and data relationships to be extracted), customizing which data objects and relationships are extracted and yielding the corresponding target identifier termination operator.
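The SQL step of this example can be illustrated with an in-memory SQLite table. The table name `txn`, its columns, the query text and the `termination_operator` filter are all hypothetical; the filter merely stands in for the MDATAS operator's configured extraction parameters, whose real behavior is not detailed here.

```python
import sqlite3
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]


def sql_operator(conn: sqlite3.Connection, query: str) -> List[Triple]:
    """Run the configured SQL processing logic; each result row must be (head, relation, tail)."""
    return [tuple(row) for row in conn.execute(query)]


def termination_operator(triples: Iterable[Triple], allowed_relations: Set[str]) -> List[Triple]:
    """Keep only triples whose relationship is among the configured extraction parameters."""
    return [t for t in triples if t[1] in allowed_relations]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (payer TEXT, payee TEXT, amount REAL, year INTEGER)")
conn.executemany(
    "INSERT INTO txn VALUES (?, ?, ?, ?)",
    [("Alice", "Bob", 100.0, 2020), ("Bob", "Carol", 90.0, 2020), ("Alice", "Dave", 10.0, 2019)],
)
# Hypothetical processing logic: one 'transfers_to' triple per distinct 2020 payer/payee pair
query = """
    SELECT DISTINCT payer, 'transfers_to', payee
    FROM txn WHERE year = 2020 ORDER BY payer, payee
"""
triples = sql_operator(conn, query)
entity_relationship_file = termination_operator(triples, {"transfers_to"})
```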

The user can then drag the target source operator, the target data processing structure and the target identifier termination operator onto the main canvas, arrange them in sequence, and connect the target source operator to the target data processing structure and the target data processing structure to the target identifier termination operator with connecting lines. This completes the combination and yields a target knowledge extraction unit that meets the user's business requirement and matches the target source data.

In this way, the user can use the second processing interface to construct a suitable target knowledge extraction unit efficiently and conveniently based on actual business requirements.

It should be noted that, in a specific implementation, this process may also be performed automatically: the terminal device generates the target knowledge extraction unit based on the preset construction rule, according to the data structure type of the target source data and the specific business requirement.

After the target knowledge extraction unit is obtained, the user may issue a run instruction through a corresponding operation on the knowledge graph construction tool (e.g., clicking a confirm-run icon). In response, the terminal device, based on the program in the knowledge graph construction tool, calls the target knowledge extraction unit to process the target source data, efficiently extracting from it the triples used to construct the target knowledge graph and thereby obtaining the corresponding entity relationship file.

Further, based on the program in the knowledge graph construction tool, the terminal device may generate the target knowledge graph associated with the target source data by performing data mapping according to the entity relationship file and the definition parameter file.

In this way, the user can efficiently and accurately construct a target knowledge graph that meets diversified business requirements with only simple operations.

After the target knowledge graph is obtained, the terminal device may display it to the user for querying and use.

The user can also use the knowledge graph construction tool on the terminal device to modify and edit the target knowledge graph, name it, and so on.

Furthermore, the terminal device may set a corresponding target identifier for the target knowledge graph (e.g., its generation number or name) and send the target knowledge graph carrying the target identifier to the server. Correspondingly, the server may store the received target knowledge graph carrying the target identifier in the graph database.

Subsequently, when the user needs to query the target knowledge graph again, a corresponding target query statement can be generated on the terminal device and sent to the server. The target query statement carries at least the target identifier.

Correspondingly, the server receives the target query statement and searches the graph database according to the target identifier it carries to find the target knowledge graph the user wants to query. The server then responds to the target query statement by performing the specific query operation on the target knowledge graph to obtain the corresponding query result, and feeds the query result back to the terminal device.

The terminal device receives the query result and displays it to the user.

In this way, the terminal device can query the target knowledge graph efficiently and conveniently to obtain the required query result, and can perform further data processing based on it.

For example, the terminal device may use the flow graph of the target customer's transaction data to analyze whether there are anomalies in the flow of that customer's transaction data, and then determine whether the target customer is at risk of violations (e.g., money laundering or gambling). Customers at risk of illegal transactions can thus be identified efficiently and accurately.
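As one illustrative anomaly check (a common heuristic swapped in here, not the analysis this specification prescribes), funds that leave a customer and flow back to the same customer through intermediaries form a cycle in the flow graph. The sketch below finds such a cycle with depth-first search; the `(payer, payee, amount)` edge shape is an assumption.

```python
from typing import Dict, List, Set, Tuple

Transfer = Tuple[str, str, float]  # (payer, payee, amount)


def find_cycle_from(edges: List[Transfer], start: str) -> List[str]:
    """Return one path start -> ... -> start if funds can flow back to start, else []."""
    graph: Dict[str, List[str]] = {}
    for payer, payee, _amount in edges:
        graph.setdefault(payer, []).append(payee)

    def dfs(node: str, path: List[str], visited: Set[str]) -> List[str]:
        for nxt in graph.get(node, []):
            if nxt == start:
                return path + [start]  # funds returned to the starting customer
            if nxt not in visited:
                visited.add(nxt)
                found = dfs(nxt, path + [nxt], visited)
                if found:
                    return found
        return []

    return dfs(start, [start], {start})


edges = [("A", "B", 100.0), ("B", "C", 95.0), ("C", "A", 90.0), ("C", "D", 5.0)]
```

A detected cycle would only be a signal for further review; real risk assessment would weigh amounts, timing and other factors.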

Referring to FIG. 3, an embodiment of this specification provides a method for constructing a knowledge graph. In a specific implementation, the method may include the following steps.

S301: Acquire target source data.

S302: Determine the data structure type of the target source data.

S303: Construct a target knowledge extraction unit matching the target source data according to a preset construction rule and the data structure type of the target source data.

S304: Call the target knowledge extraction unit to process the target source data to obtain an entity relationship file meeting the requirements, wherein the entity relationship file comprises a plurality of triples, and each triple comprises at least two data objects connected by a data relationship.

S305: Construct a target knowledge graph associated with the target source data according to the entity relationship file.
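Steps S301 to S305 can be strung together in a compact end-to-end sketch. Every rule here is a simplified assumption: a list of dict rows counts as structured data, unstructured text is split into "subject verb object" sentences, and the graph is an adjacency list keyed by head data object.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Graph = Dict[str, List[Tuple[str, str]]]


def acquire(raw):
    """S301: acquire target source data (here it is simply passed in)."""
    return raw


def detect_type(data) -> str:
    """S302: simplified rule, a list of dict rows counts as structured data."""
    if isinstance(data, list) and all(isinstance(r, dict) for r in data):
        return "structured"
    return "unstructured"


def build_unit(data_type: str):
    """S303: return a matching extraction unit (hypothetical rules)."""
    if data_type == "structured":
        return lambda rows: [(r["payer"], "transfers_to", r["payee"]) for r in rows]
    # Unstructured: treat each 'subject verb object' sentence as one triple
    return lambda text: [tuple(s.split()) for s in text.split(".") if len(s.split()) == 3]


def construct_graph(triples: List[Triple]) -> Graph:
    """S305: adjacency list keyed by head data object."""
    graph: Graph = {}
    for head, relation, tail in triples:
        graph.setdefault(head, []).append((relation, tail))
    return graph


def build_knowledge_graph(raw) -> Graph:
    data = acquire(raw)
    unit = build_unit(detect_type(data))
    entity_relationship_file = unit(data)  # S304: list of triples
    return construct_graph(entity_relationship_file)
```

For instance, `build_knowledge_graph("Alice pays Bob. Bob pays Carol.")` returns `{"Alice": [("pays", "Bob")], "Bob": [("pays", "Carol")]}`.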

By the embodiment, the user side operation can be effectively simplified, the construction difficulty of the knowledge graph is reduced, and the user can effectively and accurately construct the target knowledge graph meeting the diversified business requirements of the user.

In some embodiments, the target source data may specifically refer to source data for generating a target knowledge graph required by a user. The target source data may be data of different contents, corresponding to different application scenarios and different service requirements. Specifically, for example, in a transaction risk prediction scenario of a customer, the target source data may specifically be a running record of transaction data (e.g., asset data, financial data, etc.) of the customer, and the like. For example, in the relation testimonial scenario of historical people, the target source data may be communication records between different historical people.

In some embodiments, the target source data may specifically include data of a plurality of different data structure types. Specifically, the data structure type of the target source data may specifically include at least one of the following: structured data, unstructured data, semi-structured data, and the like.

The structured data may specifically refer to data that satisfies a preset data format (e.g., key-value pair format, etc.). Generally, for a certain structured data, according to a preset data format corresponding to the data, specific attributes of different data contained in the data can be relatively directly determined. For example, for a piece of data satisfying the key-value pair format, it can be determined more directly which data is the key value and which data is the value

Semi-structured data refers to data that does not satisfy a preset data format but does satisfy some other conventional format (e.g., a table format). For semi-structured data, the specific attributes of the items it contains cannot be determined as directly as for structured data; however, they can be determined through some semantic analysis combined with the corresponding conventional format.

Unstructured data refers to data that satisfies neither a preset data format nor a conventional format, for example, a free-text message in a trade order. For unstructured data, semantic analysis is generally required to determine the specific attributes of the items it contains.

With this embodiment, the knowledge graph construction method provided in the embodiments of this specification can be applied to target source data of various data structure types, so as to meet users' diversified business requirements.

In some embodiments, structured target source data may be acquired in any of the following ways: uploading a local file; importing an HDFS file; or importing a database table.

Semi-structured and unstructured target source data may be acquired in any of the following ways: uploading a local file; accessing data provided by a third party; or receiving data transmitted from other distributed clusters.

In some embodiments, to acquire target source data more efficiently when it is imported from an HDFS file, the file path of the target source data (corresponding to its HDFS metadata) may be recorded in a database in advance, taking advantage of the characteristics of HDFS. When the target source data is needed, the database is queried for the file path, and the path is used to access the corresponding target source data directly. This avoids writing (landing) the data to storage multiple times during acquisition and improves acquisition efficiency.
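The path-recording idea above can be sketched as follows. This is a minimal illustration, assuming an SQLite table stands in for "the database"; the table name `source_catalog`, the helper functions, and the HDFS path are hypothetical, and the sketch stops at returning the path rather than opening the HDFS file.

```python
import sqlite3
from typing import Optional

def init_catalog(conn: sqlite3.Connection) -> None:
    # Hypothetical metadata table: dataset name -> HDFS file path.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS source_catalog ("
        "dataset TEXT PRIMARY KEY, hdfs_path TEXT NOT NULL)"
    )

def register_source(conn: sqlite3.Connection, dataset: str, hdfs_path: str) -> None:
    # Record the path once, when the file is imported into HDFS.
    conn.execute(
        "INSERT OR REPLACE INTO source_catalog (dataset, hdfs_path) VALUES (?, ?)",
        (dataset, hdfs_path),
    )
    conn.commit()

def resolve_source(conn: sqlite3.Connection, dataset: str) -> Optional[str]:
    # Look up the recorded path; the caller then reads the HDFS file in place
    # instead of copying (landing) the data again.
    row = conn.execute(
        "SELECT hdfs_path FROM source_catalog WHERE dataset = ?", (dataset,)
    ).fetchone()
    return row[0] if row else None

conn = sqlite3.connect(":memory:")
init_catalog(conn)
register_source(conn, "customer_txn_2021", "hdfs://namenode:8020/data/txn/2021/part-00000")
print(resolve_source(conn, "customer_txn_2021"))
```

Keeping only the path in the database, not the data itself, is what lets the reader skip the intermediate copy.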

In some embodiments, the data structure type of the target source data may be detected according to the data structure features of the target source data.

In some embodiments, to better handle target source data of several data structure types, the data structure type of the target source data may be determined first. Target source data of different types is then distinguished and processed in a matching manner, so as to construct a target knowledge graph that meets the user's business requirements.

Specifically, the data may first be divided into two categories: a first category (structured data) and a second category (unstructured and semi-structured data). For each category, a matching target knowledge extraction unit is then constructed according to a preset construction rule, and the target source data is processed with that unit to construct the corresponding target knowledge graph.
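The detection and two-category routing above might look like the heuristic below. It is only a sketch under assumptions not stated in the specification: "key-value format" is taken to mean a JSON object (or a list of objects), "table format" to mean delimited lines with a consistent column count, and everything else is treated as unstructured. All names are hypothetical.

```python
import csv
import io
import json
from enum import Enum

class DataStructureType(Enum):
    STRUCTURED = "structured"
    SEMI_STRUCTURED = "semi-structured"
    UNSTRUCTURED = "unstructured"

def _is_key_value(text: str) -> bool:
    # Assumption: key-value data is a JSON object or a non-empty list of objects.
    try:
        obj = json.loads(text)
    except ValueError:
        return False
    if isinstance(obj, dict):
        return True
    return isinstance(obj, list) and len(obj) > 0 and all(isinstance(x, dict) for x in obj)

def _is_table(text: str) -> bool:
    # Assumption: table data has >= 2 lines with the same number (>= 2) of fields.
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if len(lines) < 2:
        return False
    for delimiter in (",", "\t"):
        rows = csv.reader(io.StringIO("\n".join(lines)), delimiter=delimiter)
        widths = {len(row) for row in rows}
        if len(widths) == 1 and widths.pop() >= 2:
            return True
    return False

def detect_type(text: str) -> DataStructureType:
    if _is_key_value(text):
        return DataStructureType.STRUCTURED
    if _is_table(text):
        return DataStructureType.SEMI_STRUCTURED
    return DataStructureType.UNSTRUCTURED

def category(t: DataStructureType) -> str:
    # First category: structured; second category: semi-structured and unstructured.
    return "first" if t is DataStructureType.STRUCTURED else "second"

print(detect_type("payer,payee,amount\nA001,B002,500\nA003,B004,80"))
```

A production detector would likely inspect schemas or file metadata rather than raw text; the point here is only the branch from type to category.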

In some embodiments, constructing the target knowledge extraction unit matching the target source data according to the preset construction rule and the data structure type of the target source data may include the following steps:

S1: selecting, from a plurality of preset data source operators and according to the preset construction rule, a target source operator corresponding to the target source data, where the target source operator feeds the target source data into the target knowledge extraction unit;

S2: determining a matching target data processing structure according to the data structure type of the target source data, where the target data processing structure processes the target source data to obtain a plurality of triples;

S3: determining and configuring a target identifier termination operator, which extracts the triples that meet the requirements from the triples output by the target data processing structure, so as to obtain the corresponding entity relationship file;

S4: combining the target source operator, the target data processing structure, and the target identifier termination operator to obtain the target knowledge extraction unit matching the target source data.
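Steps S1 to S4 can be pictured as composing three callables into one unit. The sketch below assumes each operator is a plain Python function; the class, field names, and sample records are hypothetical and are not the tool's actual operator interface.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple

Triple = Tuple[str, str, str]  # (head data object, data relationship, tail data object)

@dataclass
class KnowledgeExtractionUnit:
    source_op: Callable[[], Iterable[dict]]               # S1: feeds source data in
    processing: Callable[[Iterable[dict]], List[Triple]]  # S2: records -> triples
    terminator: Callable[[List[Triple]], List[Triple]]    # S3: keeps required triples

    def run(self) -> List[Triple]:
        # S4: the three parts are chained in a fixed order.
        return self.terminator(self.processing(self.source_op()))

def rows_to_triples(rows: Iterable[dict]) -> List[Triple]:
    # Hypothetical processing logic: each record yields two triples.
    triples: List[Triple] = []
    for r in rows:
        triples.append((r["owner"], "owns", r["payer"]))
        triples.append((r["payer"], "transfer_to", r["payee"]))
    return triples

records = [{"owner": "Zhang", "payer": "A001", "payee": "B002"}]
unit = KnowledgeExtractionUnit(
    source_op=lambda: records,
    processing=rows_to_triples,
    terminator=lambda ts: [t for t in ts if t[1] == "transfer_to"],
)
print(unit.run())  # [('A001', 'transfer_to', 'B002')]
```

Because each part is swappable, the same unit shape serves both categories: only the processing callable changes between an SQL-style operator and a triple extraction model.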

With this embodiment, a target knowledge extraction unit that matches the target source data and meets the user's business requirements can be built accurately, based on the preset construction rule together with data characteristics such as the data structure type of the target source data.

In some embodiments, determining the matching target data processing structure according to the data structure type of the target source data may include: when the data structure type of the target source data is determined to be structured data, selecting an initial processing operator from a plurality of preset data processing operators; configuring the initial processing operator to obtain a target processing operator; and using the target processing operator as the matching target data processing structure.

With this embodiment, a well-targeted, matching target data processing structure can be determined, based on the preset construction rule, for target source data whose data structure type is structured data.

In some embodiments, a suitable preset data processing operator may be selected from the plurality of preset data processing operators as the initial processing operator according to the user's business requirements, the data characteristics of the target source data, the user's preferred programming language, and so on. The processing logic of the initial processing operator is then configured according to the business requirements to obtain the matching target data processing structure.

In some embodiments, the preset data processing operator may specifically include at least one of: SQL operators, HIVE operators, SPARK operators, etc.

It should be noted that the preset data processing operators listed above are only illustrative. In a specific implementation, other types of data processing operators may be introduced as preset data processing operators, depending on the situation and the programming language used.

With this embodiment, several preset data processing operators are available for target source data whose data structure type is structured data, so a target data processing structure that matches the data more closely and accurately can be obtained.
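As an illustration of a configured SQL operator, the sketch below uses SQLite as a stand-in for whatever SQL engine the operator actually targets (the specification names SQL, HIVE, and SPARK operators but does not fix an engine). The `transfers` table, the aggregation threshold, and the rule that the query must return three columns are all assumptions made for the example.

```python
import sqlite3
from typing import List, Tuple

def sql_operator(conn: sqlite3.Connection, processing_sql: str) -> List[Tuple[str, ...]]:
    # Assumption: the user-configured SQL returns three columns: head, relation, tail.
    rows = conn.execute(processing_sql).fetchall()
    return [tuple(str(v) for v in row) for row in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transfers (payer TEXT, payee TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO transfers VALUES (?, ?, ?)",
    [("A001", "B002", 500.0), ("A001", "B002", 300.0), ("A003", "B004", 20.0)],
)

# Hypothetical processing logic entered in the configuration sidebar:
# one transfer_to triple per account pair whose total amount is at least 100.
PROCESSING_SQL = """
SELECT payer, 'transfer_to', payee
FROM transfers
GROUP BY payer, payee
HAVING SUM(amount) >= 100
ORDER BY payer, payee
"""
print(sql_operator(conn, PROCESSING_SQL))  # [('A001', 'transfer_to', 'B002')]
```

Pushing the filtering into the query keeps the operator itself generic; only the configured SQL text carries business logic.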

In some embodiments, determining the matching target data processing structure according to the data structure type of the target source data may further include: when the data structure type of the target source data is determined to be unstructured or semi-structured data, using a preset triple extraction model as the matching target data processing structure.

With this embodiment, a well-targeted, matching target data processing structure can be determined, based on the preset construction rule, for target source data whose data structure type is unstructured or semi-structured data.

In some embodiments, the preset triple extraction model may be a pre-trained model capable of extracting triples from text data based on semantic recognition. A triple includes two data objects connected by a data relationship.

In some embodiments, in a customer transaction risk prediction scenario, a data object may be a customer's name, a customer's account, an enterprise in which the customer holds shares, and so on. A data relationship may be a transfer relationship, a profit attribution relationship, or a debt relationship between data objects, and so on. The data objects and data relationships listed above are only illustrative; depending on the application scenario and business requirements, they may have other contents.

In some embodiments, the preset triple extraction model may be trained in advance as follows: acquiring sample text data; labeling pairs of data objects that have a data relationship in the sample text data to obtain labeled sample text data; and training a model on the labeled sample text data to obtain the preset triple extraction model.
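To show the input and output shape of this training procedure, here is a deliberately simple stand-in: a pattern learner that remembers the phrase connecting each labeled pair. This is not the semantic-recognition model the specification refers to; it only illustrates "labeled samples in, triples out". The sample sentences, relation names, and functions are hypothetical.

```python
import re
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
LabeledSample = Tuple[str, Triple]  # (sample text, labeled (head, relation, tail))

def train(samples: List[LabeledSample]) -> Dict[str, str]:
    """Learn connector phrase -> relation from labeled samples (toy stand-in)."""
    patterns: Dict[str, str] = {}
    for text, (head, relation, tail) in samples:
        start = text.find(head)
        if start < 0:
            continue
        end = text.find(tail, start + len(head))
        if end < 0:
            continue
        connector = text[start + len(head):end].strip()
        if connector:
            patterns[connector] = relation
    return patterns

def extract(text: str, patterns: Dict[str, str]) -> List[Triple]:
    # Match "<word> <connector> <word>" for every learned connector.
    triples: List[Triple] = []
    for connector, relation in patterns.items():
        regex = r"(\w+)\s+" + re.escape(connector) + r"\s+(\w+)"
        for m in re.finditer(regex, text):
            triples.append((m.group(1), relation, m.group(2)))
    return triples

samples = [
    ("Zhang transferred funds to Li last week.", ("Zhang", "transfer_to", "Li")),
    ("Wang holds shares of Acme.", ("Wang", "shareholder_of", "Acme")),
]
model = train(samples)
print(extract("Chen transferred funds to Zhao, and Zhao holds shares of Beta.", model))
```

A trained neural extractor would generalize across paraphrases, which this toy cannot; it is included only to make the data flow concrete.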

In some embodiments, after the data structure type of the target source data is determined, the method may further include: selecting recommended knowledge extraction units from a plurality of preset knowledge extraction units according to the data structure type of the target source data; presenting the recommended knowledge extraction units to the user; and using the recommended knowledge extraction unit selected by the user as the target knowledge extraction unit.

With this embodiment, a plurality of preset knowledge extraction units may be configured in advance, based on historical processing records, for commonly encountered target source data and business requirements. In a specific implementation, the preset knowledge extraction units that match the data structure type of the target source data are selected as recommended knowledge extraction units for the user to choose from, and the user only needs to pick, according to the specific business requirement, the recommended unit that meets it as the target knowledge extraction unit. A suitable target knowledge extraction unit can thus be obtained more efficiently and conveniently.
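The recommend-then-choose flow can be sketched as a filter over a preset catalogue. The catalogue entries, their names, and the idea of tagging each preset with the data structure types it supports are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Sequence

@dataclass(frozen=True)
class PresetUnit:
    name: str
    supported_types: frozenset  # data structure types this preset can process
    scenario: str

# Hypothetical presets configured from historical processing records.
PRESETS: Sequence[PresetUnit] = (
    PresetUnit("txn_sql_unit", frozenset({"structured"}), "transaction risk"),
    PresetUnit("remark_triple_unit", frozenset({"unstructured", "semi-structured"}), "transaction risk"),
    PresetUnit("letters_triple_unit", frozenset({"unstructured"}), "historical figures"),
)

def recommend(data_type: str, presets: Sequence[PresetUnit] = PRESETS) -> List[PresetUnit]:
    # Keep only presets that can process this data structure type.
    return [p for p in presets if data_type in p.supported_types]

def choose(recommended: List[PresetUnit], user_choice: str) -> PresetUnit:
    # The unit the user picks becomes the target knowledge extraction unit.
    for p in recommended:
        if p.name == user_choice:
            return p
    raise ValueError(f"{user_choice!r} is not among the recommended units")

recs = recommend("unstructured")
print([p.name for p in recs])  # ['remark_triple_unit', 'letters_triple_unit']
target = choose(recs, "remark_triple_unit")
```

Restricting `choose` to the recommended list prevents the user from picking a preset that cannot handle the detected data type.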

In some embodiments, after the target knowledge extraction unit is constructed, it may be invoked to process the target source data and extract triples from it; the triples associated with the user's business requirement are then selected to construct an entity relationship file that meets the requirements. The entity relationship file thus contains a plurality of triples associated with the user's business requirement.
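The specification does not fix a file format for the entity relationship file. The sketch below assumes a three-column CSV and models "associated with the business requirement" as a set of required relation names; both choices are hypothetical.

```python
import csv
import io
from typing import Iterable, Set, Tuple

Triple = Tuple[str, str, str]

def build_entity_relationship_file(triples: Iterable[Triple], required_relations: Set[str]) -> str:
    """Keep triples whose relationship the business requirement asks for; return CSV text."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["head", "relation", "tail"])
    for triple in triples:
        if triple[1] in required_relations:
            writer.writerow(triple)
    return buf.getvalue()

extracted = [("A001", "transfer_to", "B002"), ("Zhang", "owns", "A001")]
print(build_entity_relationship_file(extracted, {"transfer_to"}), end="")
```

Using the `csv` module rather than string joining keeps object names containing commas or quotes correctly escaped.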

In some embodiments, constructing the target knowledge graph associated with the target source data according to the entity relationship file may include the following steps:

S1: acquiring a definition parameter file related to the target knowledge graph, where the definition parameter file includes definition parameters of data objects and/or definition parameters of data relationships;

S2: generating the target knowledge graph associated with the target source data by performing data mapping according to the entity relationship file and the definition parameter file.

With this embodiment, the target knowledge graph required by the user can be constructed efficiently and accurately through data mapping, using the entity relationship file and the definition parameter file.

In some embodiments, the target knowledge graph may be a graph comprising a plurality of nodes and connecting edges, where each node corresponds to one data object, each connecting edge corresponds to at least one data relationship, and nodes are connected to one another through connecting edges.

In some embodiments, when performing data mapping according to the entity relationship file and the definition parameter file, the data objects defined in the definition parameter file may be mapped to nodes and the data relationships to connecting edges, according to the entity relationship file; the mapped nodes and connecting edges are then converted into the corresponding graph data to construct the target knowledge graph.

In some embodiments, when constructing the target knowledge graph, the attribute information of nodes and/or connecting edges may also be determined from the entity relationship file and the definition parameter file and used to label the corresponding nodes and/or connecting edges in the graph, yielding a target knowledge graph with richer data content.
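A minimal sketch of the mapping and attribute labeling described above, assuming the entity relationship file has already been parsed into dictionaries and the definition parameter file into a nested dict. The `DEFINITION` layout, the vertex label `Account`, and the `amount` attribute are hypothetical; only relationship definitions are used here.

```python
from typing import Dict, List, Tuple

# Hypothetical definition parameter file, already parsed into a dict.
# Only relationship definitions are used in this sketch.
DEFINITION = {
    "edges": {
        "transfer_to": {"from": "Account", "to": "Account", "attributes": ["amount"]},
    },
}

def map_to_graph(rows: List[dict], definition: dict) -> Tuple[Dict[str, dict], List[dict]]:
    nodes: Dict[str, dict] = {}
    edges: List[dict] = []
    for row in rows:
        edge_def = definition["edges"].get(row["relation"])
        if edge_def is None:
            continue  # relationship not defined in the parameter file: not mapped
        head, tail = row["head"], row["tail"]
        # Data objects become nodes, typed by the relationship definition.
        nodes.setdefault(head, {"label": edge_def["from"], "id": head})
        nodes.setdefault(tail, {"label": edge_def["to"], "id": tail})
        # Attribute labeling: copy only the attributes the definition declares.
        attrs = {k: row[k] for k in edge_def.get("attributes", []) if k in row}
        edges.append({"label": row["relation"], "src": head, "dst": tail, **attrs})
    return nodes, edges

rows = [
    {"head": "A001", "relation": "transfer_to", "tail": "B002", "amount": 800.0, "channel": "mobile"},
    {"head": "Zhang", "relation": "owns", "tail": "A001"},
]
nodes, edges = map_to_graph(rows, DEFINITION)
print(sorted(nodes))  # ['A001', 'B002']
print(edges)  # [{'label': 'transfer_to', 'src': 'A001', 'dst': 'B002', 'amount': 800.0}]
```

Letting the definition file decide which attributes survive means undeclared fields (here `channel`) and undefined relationships (here `owns`) are dropped rather than silently entering the graph.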

In some embodiments, the definition parameter file may further include index definition parameters. Correspondingly, constructing the target knowledge graph associated with the target source data according to the entity relationship file may further include: constructing, according to the index definition parameters, a target query index for the target knowledge graph using the definition parameters of data objects and/or the definition parameters of data relationships.

With this embodiment, a target query index is built according to the index definition parameters in the definition parameter file at the same time as the target knowledge graph, so that the target knowledge graph can later be queried more efficiently through the target query index.
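A target query index can be as simple as an inverted map from a property value to node ids. The sketch assumes an index definition names one vertex label and one property; real graph databases offer richer (composite, full-text) indexes, and the node data here is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set

def build_index(nodes: Dict[str, dict], index_def: Dict[str, str]) -> Dict[object, Set[str]]:
    """Map each value of one property to the ids of nodes (of one label) carrying it."""
    label, prop = index_def["label"], index_def["property"]
    index: Dict[object, Set[str]] = defaultdict(set)
    for node_id, props in nodes.items():
        if props.get("label") == label and prop in props:
            index[props[prop]].add(node_id)
    return dict(index)

nodes = {
    "A001": {"label": "Account", "owner": "Zhang"},
    "A002": {"label": "Account", "owner": "Zhang"},
    "A003": {"label": "Account", "owner": "Li"},
    "C9": {"label": "Company", "owner": "Zhang"},
}
owner_index = build_index(nodes, {"label": "Account", "property": "owner"})
print(sorted(owner_index["Zhang"]))  # ['A001', 'A002']
```

With the index, a lookup such as "all accounts owned by Zhang" becomes a single dictionary access instead of a scan over every node.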

In some embodiments, the constructed target knowledge graph and its target query index may be stored together in a graph database for subsequent use.

In some embodiments, constructing target knowledge graphs in batch for multiple large-volume target source data consumes substantial data processing resources, which can place a heavy load on the system (or server, terminal device, etc.) and affect the stability of its overall operation. Therefore, when multiple target knowledge graphs are constructed in batch, the system may be further configured to estimate whether the data processing amount required to construct each target knowledge graph reaches a preset threshold processing amount, which may be determined according to the overall processing performance of the system.

When the data processing amount required to construct a target knowledge graph is estimated to be smaller than the preset threshold processing amount, the system loads the data and constructs the target knowledge graph normally.

When the required data processing amount is greater than or equal to the preset threshold processing amount, the system may suspend data loading and construction of that target knowledge graph and notify the user who initiated it that the construction must first be approved; the construction proceeds normally once approval is granted. In addition, the system may monitor its load state in real time and resume data loading and construction of the target knowledge graph once the load state allows it. This protects the stable and reliable operation of the whole system.
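The admission logic above can be sketched as a split followed by a resume check. The threshold value, the load figures, and the decision to require both approval and available headroom before resuming are assumptions (the specification describes approval and load monitoring but does not say exactly how they combine); all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple

@dataclass
class BuildJob:
    graph_id: str
    estimated_load: int  # estimated data processing amount (e.g. number of records)

def plan_batch(jobs: List[BuildJob], threshold: int) -> Tuple[List[BuildJob], List[BuildJob]]:
    """Split a batch into jobs that run now and jobs suspended pending approval."""
    run_now = [j for j in jobs if j.estimated_load < threshold]
    suspended = [j for j in jobs if j.estimated_load >= threshold]
    return run_now, suspended

def resumable(suspended: List[BuildJob], approved: Set[str],
              system_load: float, max_load: float) -> List[BuildJob]:
    """Suspended jobs that are approved, provided the current load leaves headroom."""
    if system_load >= max_load:
        return []
    return [j for j in suspended if j.graph_id in approved]

jobs = [BuildJob("kg_small", 10_000), BuildJob("kg_large", 5_000_000)]
run_now, suspended = plan_batch(jobs, threshold=1_000_000)
print([j.graph_id for j in run_now], [j.graph_id for j in suspended])  # ['kg_small'] ['kg_large']
print([j.graph_id for j in resumable(suspended, {"kg_large"}, system_load=0.5, max_load=0.8)])  # ['kg_large']
```

A real scheduler would re-run `resumable` periodically as the monitored load changes.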

In some embodiments, after the target knowledge graph associated with the target source data is constructed according to the entity relationship file, the method may further include: receiving a target query statement that carries at least a target identifier of a target knowledge graph; searching the graph database according to the target identifier to determine the target knowledge graph; performing a query operation on the target knowledge graph in response to the target query statement to obtain a query result; and returning the query result.

With this embodiment, a target query statement initiated by the user can be answered by locating the corresponding target knowledge graph in the graph database, querying it, and returning the query result to the user in a timely manner.

In some embodiments, after the target knowledge graph is determined, it may be checked whether a target query index of the target knowledge graph is stored in the graph database. If the target query index is found, the query on the target knowledge graph is performed in combination with the target query index in response to the target query statement, which further improves query efficiency and the user's query experience.

For example, in a customer transaction risk prediction scenario, the transaction data may be fund data, and the query result may be a flow graph of the target customer's fund data. From this flow graph, it can be analyzed whether the flow of the target customer's funds is abnormal, and thereby whether the target customer carries a corresponding transaction risk (e.g., money laundering, gambling, or fraud risk).
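A fund-flow query result of this kind can be approximated by a bounded breadth-first traversal of transfer edges starting from the target customer's account. The edge list and hop limit below are hypothetical, and deciding whether a returned pattern is abnormal is left to the analysis step described above.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str, float]  # (payer account, payee account, amount)

def fund_flow(edges: List[Edge], start: str, max_hops: int = 3) -> List[Edge]:
    """Collect transfer edges reachable from `start` within `max_hops` hops (breadth-first)."""
    out: Dict[str, List[Edge]] = {}
    for e in edges:
        out.setdefault(e[0], []).append(e)
    visited: Set[str] = {start}
    queue = deque([(start, 0)])
    flow: List[Edge] = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for e in out.get(node, []):
            flow.append(e)
            if e[1] not in visited:
                visited.add(e[1])
                queue.append((e[1], depth + 1))
    return flow

edges = [
    ("A001", "B002", 800.0),
    ("B002", "C003", 790.0),
    ("C003", "A001", 780.0),
    ("X9", "Y8", 5.0),
]
print(fund_flow(edges, "A001"))
```

Here the traversal returns the three edges of the A001 to B002 to C003 to A001 loop and ignores the unrelated X9 transfer; the visited set keeps the loop from being expanded twice.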

With this embodiment, the knowledge graph construction method provided in the embodiments of this specification can be applied to customer transaction risk prediction, using the constructed target knowledge graph to predict accurately and efficiently whether a target customer carries a corresponding transaction risk.

As can be seen from the above, with the knowledge graph construction method provided in the embodiments of this specification, the data structure type of the target source data to be processed is determined first; a target knowledge extraction unit matching the target source data is then constructed according to a preset construction rule and that data structure type; the target knowledge extraction unit is invoked to process the target source data into an entity relationship file that contains a plurality of triples and meets the requirements; and a target knowledge graph associated with the target source data is constructed according to the entity relationship file. This simplifies user-side operations, lowers the difficulty of constructing knowledge graphs, and lets the user efficiently and accurately construct knowledge graphs that meet diversified business requirements.

Referring to fig. 2, an embodiment of this specification further provides a knowledge graph construction tool, which includes at least a source data import interface, a first processing interface, and a second processing interface, where:

the source data import interface supports the user in importing target source data;

the first processing interface supports the user in setting definition parameters of data objects and/or definition parameters of data relationships in the target knowledge graph, so as to generate a definition parameter file related to the target knowledge graph;

the second processing interface supports the user in determining and combining a matching target source operator, target data processing structure, and identifier termination operator according to a preset construction rule, so as to obtain a target knowledge extraction unit matching the target source data;

the knowledge graph construction tool invokes the target knowledge extraction unit to process the target source data into a corresponding entity relationship file, and generates the target knowledge graph associated with the target source data by performing data mapping according to the entity relationship file and the definition parameter file.

With this embodiment, the user can use the knowledge graph construction tool to carry out one-stop construction of a target knowledge graph that meets diversified business requirements, which simplifies user-side operations, lowers the difficulty of constructing knowledge graphs, and improves the user experience.

An embodiment of this specification further provides a server, including a processor and a memory storing processor-executable instructions. When executing the instructions, the processor may perform the following steps: acquiring target source data; determining the data structure type of the target source data; constructing a target knowledge extraction unit matching the target source data according to a preset construction rule and the data structure type of the target source data; invoking the target knowledge extraction unit to process the target source data to obtain an entity relationship file that meets the requirements, where the entity relationship file contains a plurality of triples and each triple includes at least two data objects connected by a data relationship; and constructing a target knowledge graph associated with the target source data according to the entity relationship file.

To carry out the above instructions more accurately, and referring to fig. 4, an embodiment of this specification provides another specific server, which includes a network communication port 401, a processor 402, and a memory 403 connected by an internal cable so that they can exchange data.

The network communication port 401 may be configured to acquire the target source data.

The processor 402 may be configured to determine the data structure type of the target source data; construct a target knowledge extraction unit matching the target source data according to a preset construction rule and the data structure type of the target source data; invoke the target knowledge extraction unit to process the target source data to obtain an entity relationship file that meets the requirements, where the entity relationship file contains a plurality of triples and each triple includes at least two data objects connected by a data relationship; and construct a target knowledge graph associated with the target source data according to the entity relationship file.

The memory 403 may be configured to store the corresponding instruction program.

In this embodiment, the network communication port 401 may be a virtual port bound to different communication protocols so as to send or receive different data, for example, a port for web data communication, a port for FTP data communication, or a port for mail data communication. The network communication port may also be a physical communication interface or communication chip, for example, a wireless mobile network communication chip (such as GSM or CDMA), a Wi-Fi chip, or a Bluetooth chip.

In this embodiment, the processor 402 may be implemented in any suitable manner. For example, the processor may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, an embedded microcontroller, and so forth. This specification imposes no limitation in this respect.

In this embodiment, the memory 403 may include multiple levels. In a digital system, anything that can store binary data may serve as memory; in an integrated circuit, a circuit with a storage function but no physical form, such as a RAM or a FIFO, is also called memory; in a system, a storage device in physical form, such as a memory module or a TF card, is also called memory.

Based on the above knowledge graph construction method, this specification further provides a computer storage medium storing computer program instructions which, when executed, implement: acquiring target source data; determining the data structure type of the target source data; constructing a target knowledge extraction unit matching the target source data according to a preset construction rule and the data structure type of the target source data; invoking the target knowledge extraction unit to process the target source data to obtain an entity relationship file that meets the requirements, where the entity relationship file contains a plurality of triples and each triple includes at least two data objects connected by a data relationship; and constructing a target knowledge graph associated with the target source data according to the entity relationship file.

In this embodiment, the storage medium includes, but is not limited to, random access memory (RAM), read-only memory (ROM), cache, a hard disk drive (HDD), or a memory card. The memory may be used to store computer program instructions. The network communication unit may be an interface for network connection communication, configured according to the standard prescribed by a communication protocol.

In this embodiment, the functions and effects of the program instructions stored in the computer storage medium can be understood by reference to the other embodiments and are not repeated here.

Referring to fig. 5, at the software level, an embodiment of this specification further provides a knowledge graph construction apparatus, which may include the following modules:

the obtaining module 501 may be configured to acquire target source data;

the determining module 502 may be configured to determine the data structure type of the target source data;

the first constructing module 503 may be configured to construct a target knowledge extraction unit matching the target source data according to a preset construction rule and the data structure type of the target source data;

the calling module 504 may be configured to invoke the target knowledge extraction unit to process the target source data to obtain an entity relationship file that meets the requirements, where the entity relationship file contains a plurality of triples and each triple includes at least two data objects connected by a data relationship;

the second constructing module 505 may be configured to construct a target knowledge graph associated with the target source data according to the entity relationship file.

It should be noted that the units, apparatuses, modules, etc. described in the above embodiments may be implemented by a computer chip or entity, or by a product with certain functions. For convenience of description, the above apparatus is described as divided into modules by function. When implementing this specification, the functions of the modules may be implemented in one or more pieces of software and/or hardware, or a module implementing one function may be implemented by a combination of several sub-modules or sub-units. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a logical division, and other divisions are possible in practice: several units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

Therefore, the knowledge graph construction apparatus provided in the embodiments of this specification simplifies user-side operations, lowers the difficulty of constructing knowledge graphs, and enables the user to efficiently and accurately construct knowledge graphs that meet diversified business requirements.

In a specific scenario example, the knowledge graph construction method provided in the embodiments of this specification may be applied to importing data and constructing a corresponding knowledge graph. The implementation process, shown in fig. 6, includes the following steps.

Step 1: Import data of various types (e.g., target source data of several different data structure types) into the system.

Step 2: Construct the graph structure according to the scenario requirements (e.g., generate the corresponding definition parameter file).

In this scenario example, constructing the graph structure may include defining the entity information (e.g., definition parameters of data objects), the relationship information (e.g., definition parameters of data relationships), the attribute information of entities and relationships, and the index information of the graph. For example, an entity VERTEX may be defined to include, but not be limited to, a type tag of the VERTEX and attributes of various types; a relationship EDGE may be defined to include, but not be limited to, a type tag of the EDGE, the start-point type of the EDGE, and attributes of various types. For scenarios with specific query requirements, subsequent query efficiency can be improved by building an index (e.g., a target query index) on entities, relationships, attributes, or a combination of the three. For example, a vertex index can be built on VERTEX to speed up queries that start from a vertex.
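One way to represent such a graph structure definition, and to catch inconsistencies before any data is loaded, is sketched below. The dataclasses, the attribute type strings, and the validation rules are hypothetical; in particular, the end-point type on `EdgeDef` is an assumption added for completeness, since the text above names only the start-point type.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class VertexDef:
    label: str                                                # type tag of the VERTEX
    attributes: Dict[str, str] = field(default_factory=dict)  # name -> type

@dataclass
class EdgeDef:
    label: str       # type tag of the EDGE
    from_label: str  # start-point vertex type
    to_label: str    # end-point vertex type (assumption)
    attributes: Dict[str, str] = field(default_factory=dict)

@dataclass
class IndexDef:
    target: str            # vertex or edge label the index is built on
    properties: List[str]  # one or more attributes (several = composite index)

def validate_schema(vertices: List[VertexDef], edges: List[EdgeDef],
                    indexes: List[IndexDef]) -> List[str]:
    """Return a list of problems; an empty list means the schema is consistent."""
    problems: List[str] = []
    vlabels = {v.label for v in vertices}
    all_attrs = {v.label: v.attributes for v in vertices}
    all_attrs.update({e.label: e.attributes for e in edges})
    for e in edges:
        for end in (e.from_label, e.to_label):
            if end not in vlabels:
                problems.append(f"edge {e.label}: unknown vertex type {end}")
    for ix in indexes:
        if ix.target not in all_attrs:
            problems.append(f"index on unknown label {ix.target}")
            continue
        for p in ix.properties:
            if p not in all_attrs[ix.target]:
                problems.append(f"index on {ix.target}: unknown property {p}")
    return problems

vertices = [VertexDef("Customer", {"name": "string"}),
            VertexDef("Account", {"id": "string", "balance": "double"})]
edges = [EdgeDef("owns", "Customer", "Account"),
         EdgeDef("transfer_to", "Account", "Account", {"amount": "double"}),
         EdgeDef("holds_shares", "Customer", "Company")]
indexes = [IndexDef("Account", ["id"]), IndexDef("transfer_to", ["amount"])]
print(validate_schema(vertices, edges, indexes))  # ['edge holds_shares: unknown vertex type Company']
```

Validating the definition up front means a typo in a vertex label surfaces at step 2 rather than as missing nodes after step 5.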

Step 3: Structure each type of data (to obtain the corresponding entity relationship file).

In this scenario example, for structured data, the structured data operator DATAS to be processed (e.g., a target source operator) is first selected in the main sidebar (in the second processing interface) and dragged onto the main canvas. Next, a data processing operator, such as an SQL operator, is selected in the main sidebar, its specific processing logic is filled in on the configuration sidebar (yielding a target data processing structure), and the operator is run. After the run succeeds, an identifier termination operator is selected, its name MDATAS is filled in (yielding a target identifier termination operator), and the operator is run until it succeeds.
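The structured-data branch chains three operators: a source operator, an SQL operator, and an identifier termination operator. A minimal sketch, assuming SQLite (Python's `sqlite3`) as a stand-in SQL engine, a hypothetical table `datas` of transfer records, and a hypothetical relation name `TRANSFERS_TO`; the functions are illustrative, not the tool's actual operator interfaces.

```python
import sqlite3
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def source_operator() -> List[Tuple[str, str, float]]:
    """DATAS: hypothetical structured transfer records (payer, payee, amount)."""
    return [
        ("C001", "C002", 1500.0),
        ("C001", "C003", 20.0),
        ("C002", "C003", 3000.0),
        ("C001", "C002", 2000.0),
    ]


def sql_operator(rows: List[Tuple[str, str, float]], sql: str) -> List[tuple]:
    """Load the upstream rows into an in-memory table and run the user's SQL."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE datas (payer TEXT, payee TEXT, amount REAL)")
        conn.executemany("INSERT INTO datas VALUES (?, ?, ?)", rows)
        return conn.execute(sql).fetchall()
    finally:
        conn.close()


def termination_operator(name: str, pairs: List[tuple], relation: str,
                         registry: Dict[str, List[Triple]]) -> None:
    """Register the output as (subject, relation, object) triples under `name`."""
    registry[name] = [(str(s), relation, str(o)) for s, o in pairs]


# User-supplied processing logic, as it might be typed into the configuration sidebar.
USER_SQL = """
SELECT payer, payee FROM datas
WHERE amount >= 1000
GROUP BY payer, payee
ORDER BY payer, payee
"""

registry: Dict[str, List[Triple]] = {}
termination_operator("MDATAS", sql_operator(source_operator(), USER_SQL),
                     "TRANSFERS_TO", registry)
```

Here the termination operator only names and stores the result; in the tool it presumably also persists the entity relationship file.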

Similarly, for unstructured or semi-structured data, the unstructured data DATAU is first dragged onto the canvas; a trained model operator (e.g., a preset triple extraction model) is then selected and run for prediction; after it runs successfully, the identifier termination operator MDATAU is selected and run until it succeeds.
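The unstructured branch replaces the SQL operator with a model operator that predicts triples. A minimal sketch, assuming the trained triple extraction model can be treated as any callable from text to triples; the regular-expression `toy_triple_model` below is a hypothetical stand-in with a fixed verb list, not a trained model.

```python
import re
from typing import Callable, Dict, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical stand-in for the trained triple extraction model: it recognizes
# "<Subject> <verb> <Object>" for a fixed set of verbs and capitalized names.
_PATTERN = re.compile(r"\b([A-Z]\w*) (owns|manages|pays) ([A-Z]\w*)\b")


def toy_triple_model(text: str) -> List[Triple]:
    return [(s, verb.upper(), o) for s, verb, o in _PATTERN.findall(text)]


def model_operator(documents: List[str],
                   model: Callable[[str], List[Triple]]) -> List[Triple]:
    """Run prediction on every document and drop duplicate triples, keeping order."""
    triples: List[Triple] = []
    for doc in documents:
        triples.extend(model(doc))
    return list(dict.fromkeys(triples))


DATAU = [
    "Alice owns Acct1. Bob pays Alice.",
    "Bob pays Alice again; Carol manages Acct1.",
]
registry: Dict[str, List[Triple]] = {"MDATAU": model_operator(DATAU, toy_triple_model)}
```

Because `model_operator` accepts any callable, a real model's prediction function could be passed in place of the toy one without changing the pipeline.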

Step 4: Perform knowledge mapping between the ontology and the data.

In this scenario example, the data sources to be used are first selected from the full set of data as candidate data sources, and the ontology model to be mapped is then selected. Next, an entity VERTEX in the ontology model is selected, together with the data source MDATAS corresponding to VERTEX. Finally, each attribute of VERTEX is mapped to one field of MDATAS. All entities and relationships are mapped to their data source files in the same way.
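The attribute-to-field mapping can be expressed as a dictionary from ontology attribute to source field. A minimal sketch, assuming MDATAS records are available as dictionaries keyed by column name; the function `map_fields` and the column names `cust_no`, `cust_name`, `branch` are hypothetical.

```python
from typing import Dict, List


def map_fields(vertex_label: str, attributes: List[str], field_mapping: Dict[str, str],
               records: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Map each data-source record onto the attributes of one ontology type."""
    # Every ontology attribute must be bound to some source field.
    unmapped = [attr for attr in attributes if attr not in field_mapping]
    if unmapped:
        raise ValueError(f"{vertex_label}: no source field for {unmapped}")
    mapped = []
    for record in records:
        missing = [field_mapping[a] for a in attributes if field_mapping[a] not in record]
        if missing:
            raise KeyError(f"{vertex_label}: record lacks fields {missing}")
        # Keep only mapped attributes; unmapped source columns are ignored.
        mapped.append({"label": vertex_label,
                       **{attr: record[field_mapping[attr]] for attr in attributes}})
    return mapped


MDATAS = [
    {"cust_no": "C001", "cust_name": "Zhang", "branch": "BJ"},
    {"cust_no": "C002", "cust_name": "Li", "branch": "SH"},
]
customers = map_fields("Customer", ["customer_id", "name"],
                       {"customer_id": "cust_no", "name": "cust_name"}, MDATAS)
```

Repeating the call for each entity and relationship type corresponds to mapping "all entity relations" in turn.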

And 5: and (4) importing the graph structure and the data into a database (generating a corresponding knowledge graph and storing the knowledge graph into a graph database).

In this scenario example, when the knowledge graph is generated and stored, some configuration information of the knowledge graph may also be filled in, including but not limited to the graph name. The user then clicks the graph import control to import the ontology model and the data into the knowledge graph in batch with one click.
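One-click batch import amounts to loading all mapped vertices and relationship triples into the store in a single pass. A minimal sketch, assuming an in-memory adjacency structure stands in for the graph database; the class `InMemoryGraph` and the graph name `customer_transfers` are hypothetical and do not reflect any real graph database client API.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


class InMemoryGraph:
    """Toy property-graph store standing in for a real graph database."""

    def __init__(self, name: str) -> None:
        self.name = name  # configuration info, e.g. the graph name
        self.vertices: Dict[str, Dict[str, object]] = {}
        self.out_edges: Dict[str, List[Tuple[str, str]]] = defaultdict(list)

    def bulk_import(self, vertices: Iterable[Tuple[str, Dict[str, object]]],
                    triples: Iterable[Triple]) -> Tuple[int, int]:
        """Load mapped vertices and relationship triples in one batch."""
        vertex_count = edge_count = 0
        for vertex_id, props in vertices:
            self.vertices.setdefault(vertex_id, {}).update(props)
            vertex_count += 1
        for subject, relation, obj in triples:
            # Endpoints without mapped attributes are still created as bare vertices.
            self.vertices.setdefault(subject, {})
            self.vertices.setdefault(obj, {})
            self.out_edges[subject].append((relation, obj))
            edge_count += 1
        return vertex_count, edge_count


graph = InMemoryGraph(name="customer_transfers")
counts = graph.bulk_import(
    vertices=[("C001", {"name": "Zhang"}), ("C002", {"name": "Li"})],
    triples=[("C001", "TRANSFERS_TO", "C002"), ("C002", "TRANSFERS_TO", "C003")],
)
```

Returning the loaded counts gives the user a simple confirmation that the batch import completed.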

Step 6: Visually display the graph through a visualization module.

In this scenario example, when the user needs to query the graph data, the user can fill in and submit a corresponding query statement; the graph data are then queried and the query result is displayed visually.
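A typical query that starts from a vertex follows one relationship type for a bounded number of hops, for example tracing where a customer's transfers flow. A minimal sketch, assuming breadth-first traversal over plain triples stands in for whatever query language the graph database actually accepts; `query_flow` and the sample data are hypothetical.

```python
from collections import deque
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


def query_flow(triples: List[Triple], start: str, relation: str,
               max_hops: int) -> List[Triple]:
    """Return the `relation` edges reachable from `start` within `max_hops` hops."""
    adjacency: Dict[str, List[str]] = {}
    for subject, rel, obj in triples:
        if rel == relation:
            adjacency.setdefault(subject, []).append(obj)

    result: List[Triple] = []
    visited = {start}
    queue = deque([(start, 0)])  # breadth-first search from the start vertex
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue
        for neighbor in adjacency.get(node, []):
            result.append((node, relation, neighbor))
            if neighbor not in visited:  # guard against cycles in the flow
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return result


transfers = [
    ("C001", "TRANSFERS_TO", "C002"),
    ("C002", "TRANSFERS_TO", "C003"),
    ("C003", "TRANSFERS_TO", "C001"),
    ("C001", "OWNS", "A100"),
]
```

The returned edge list is the kind of result a visualization module could render directly as a flow diagram.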

This scenario example verifies that the method for constructing a knowledge graph provided by the embodiments of this specification is a one-stop, simple, and efficient method with the following advantages. First, it provides a simple and efficient graph construction and visualization platform for business personnel without a technical background who use knowledge graph technology for analysis and exploration, and it optimizes the usability of every construction step, addressing the problem that knowledge graphs have many applicable scenarios but a high technical threshold. Second, it comprehensively considers and summarizes the essential stages of graph construction, forming a simple, easy-to-use, low-threshold one-stop graph construction system. Each stage of the construction process is split into modules with reference to ETL (extract, transform, load), so that, compared with existing processes in the industry, each stage is presented more clearly and matches the user's cognitive logic, which offers a new approach to building graph construction tools. Third, the knowledge extraction module provides a DAG (directed acyclic graph)-based data processing flow, which is more capable and easier to use than common rule-based models.
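The DAG-based processing flow can be executed by running operators in a topological order of their dependencies. A minimal sketch, assuming each operator is a Python callable and the canvas wiring is given as a mapping from operator name to its upstream operator names; it uses the standard-library `graphlib.TopologicalSorter` (Python 3.9+). The operator names reuse DATAS, SQL and MDATAS from the scenario, but their bodies are toy stand-ins.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List


def run_dag(operators: Dict[str, Callable[..., object]],
            upstream: Dict[str, List[str]]) -> Dict[str, object]:
    """Run each operator after its upstream operators, passing their outputs in order."""
    outputs: Dict[str, object] = {}
    # static_order() raises graphlib.CycleError (a ValueError) if the flow has a cycle.
    for name in TopologicalSorter(upstream).static_order():
        inputs = [outputs[dep] for dep in upstream.get(name, [])]
        outputs[name] = operators[name](*inputs)
    return outputs


operators = {
    "DATAS": lambda: [("C001", "C002", 1500.0), ("C001", "C003", 20.0)],
    "SQL": lambda rows: [(payer, payee) for payer, payee, amount in rows if amount >= 1000],
    "MDATAS": lambda pairs: [(p, "TRANSFERS_TO", q) for p, q in pairs],
}
upstream = {"DATAS": [], "SQL": ["DATAS"], "MDATAS": ["SQL"]}
outputs = run_dag(operators, upstream)
```

Unlike a fixed rule chain, the same executor handles branching and merging flows, since an operator with several upstream operators simply receives several inputs.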

Although the present specification provides method steps as described in the examples or flowcharts, additional or fewer steps may be included based on conventional or non-inventive means. The order of steps recited in the embodiments is merely one of many possible execution orders and does not represent the only order of execution. When an actual apparatus or client product executes the method, it may execute the steps sequentially or in parallel (e.g., in a parallel processor or multithreaded processing environment, or even in a distributed data processing environment) according to the embodiments or methods shown in the figures. The terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the presence of additional identical or equivalent elements in a process, method, article, or apparatus that comprises the recited elements is not excluded. The terms first, second, etc. are used to denote names, but not any particular order.

Those skilled in the art will also appreciate that, in addition to implementing the controller as pure computer readable program code, the same functionality can be implemented by logically programming method steps such that the controller is in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Such a controller may therefore be considered as a hardware component, and the means included therein for performing the various functions may also be considered as a structure within the hardware component. Or even means for performing the functions may be regarded as being both a software module for performing the method and a structure within a hardware component.

This specification may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, classes, etc. that perform particular tasks or implement particular abstract data types. The specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

From the above description of the embodiments, it is clear to those skilled in the art that the present specification can be implemented by software plus necessary general hardware platform. With this understanding, the technical solutions in the present specification may be essentially embodied in the form of a software product, which may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, an optical disk, etc., and includes several instructions for enabling a computer device (which may be a personal computer, a mobile terminal, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments in the present specification.

The embodiments in the present specification are described in a progressive manner, and the same or similar parts among the embodiments are referred to each other, and each embodiment focuses on the differences from the other embodiments. The description is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable electronic devices, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

While the specification has been described with examples, those skilled in the art will appreciate that there are numerous variations and permutations of the specification that do not depart from the spirit of the specification, and it is intended that the appended claims include such variations and modifications that do not depart from the spirit of the specification.

Claims

1. A method for constructing a knowledge graph, comprising:

acquiring target source data;

determining a data structure type of target source data;

2. The method of claim 1, wherein the data structure type of the target source data comprises at least one of: structured data, unstructured data, semi-structured data.

3. The method of claim 2, wherein constructing a target knowledge extraction unit matched with the target source data according to a preset construction rule and a data structure type of the target source data comprises:

4. The method of claim 3, wherein determining a matching target data processing structure based on the data structure type of the target source data comprises:

5. The method of claim 4, wherein the predetermined data processing operator comprises at least one of: SQL operator, HIVE operator and SPARK operator.

6. The method of claim 3, wherein determining a matching target data processing structure based on the data structure type of the target source data comprises:

7. The method of claim 1, wherein after determining the data structure type of the target source data, the method further comprises:

presenting the recommended knowledge extraction unit to a user;

8. The method of claim 1, wherein constructing a target knowledge-graph associated with the target source data from the entity relationship file comprises:

9. The method of claim 8, wherein the definition parameter file further comprises an index definition parameter;

10. The method of claim 1, wherein after constructing a target knowledge-graph associated with the target source data from the entity relationship file, the method further comprises:

and feeding back the query result.

11. The method of claim 10, wherein the target source data comprises running records of customers' transaction data; correspondingly, the query result comprises a flow diagram of the transaction data of the target customer.

12. A knowledge graph building tool, comprising at least a source data import interface, a first processing interface, and a second processing interface; wherein:

13. An apparatus for constructing a knowledge graph, comprising:

the acquisition module is used for acquiring target source data;

14. A server comprising a processor and a memory for storing processor-executable instructions which, when executed by the processor, implement the steps of the method of any one of claims 1 to 11.

15. A computer storage medium having stored thereon computer instructions which, when executed, implement the steps of the method of any one of claims 1 to 11.