WO2020143326A1 - Knowledge data storage method, device, computer apparatus, and storage medium - Google Patents

Knowledge data storage method, device, computer apparatus, and storage medium

Info

Publication number
WO2020143326A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
knowledge data
knowledge
information
entity
Prior art date
Application number
PCT/CN2019/118619
Other languages
French (fr)
Chinese (zh)
Inventor
孙佳兴
胡逸凡
陈泽晖
黄鸿顺
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020143326A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of knowledge graph technology, and in particular, to a method, device, computer equipment, and storage medium for storing knowledge data.
  • A knowledge graph, also called a scientific knowledge graph (and, in the library and information science field, knowledge domain visualization or a knowledge domain mapping map), is a collection of graphs that show how knowledge develops and how it is structured; visualization techniques are used to describe knowledge resources and their carriers, and to mine, analyze, construct, draw, and display knowledge and the interrelationships between items of knowledge.
  • This application provides a method for storing knowledge data, including the following steps:
  • extracting entity information from the knowledge data and vectorizing the entity information to generate entity data vectors;
  • extracting relationship information from the knowledge data and vectorizing the relationship information to generate relationship data vectors;
  • This application provides a knowledge data storage device, including the following modules:
  • the data acquisition module is configured to send a knowledge data extraction instruction to the knowledge data source from which knowledge data is to be extracted, receive feedback information from the knowledge data source, and extract the knowledge data of the knowledge data source according to the form of the knowledge data indicated in the feedback information;
  • the vector generation module is configured to extract entity information from the knowledge data, vectorize the entity information to generate entity data vectors, extract relationship information from the knowledge data, and vectorize the relationship information to generate relationship data vectors;
  • the data clustering module is configured to obtain the entity ID of each entity data vector and the relationship ID of each relationship data vector, cluster knowledge data having the same entity ID to form a knowledge data set, and cluster knowledge data in the knowledge data set having the same relationship ID to form knowledge data subsets;
  • a node establishment module configured to calculate the information similarity of any two of the knowledge data subsets, and to establish a knowledge graph node between the knowledge data subsets whose information similarity is greater than a preset similarity threshold;
  • the data storage module is configured to acquire the feature information of the nodes of the knowledge graph, and store the knowledge data in the database according to the correspondence between the feature information and the storage location of the database.
  • a computer device includes a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the knowledge data storage method.
  • a storage medium stores computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the above knowledge data storage method.
  • FIG. 1 is an overall flowchart of a method for storing knowledge data in an embodiment of the present application
  • FIG. 2 is a schematic diagram of a data acquisition process in a method for storing knowledge data in an embodiment of the present application
  • FIG. 3 is a schematic diagram of a vector generation process in a method for storing knowledge data in an embodiment of the present application
  • FIG. 4 is a structural diagram of a knowledge data storage device in an embodiment of the present application.
  • FIG. 1 is an overall flowchart of a method for storing knowledge data in an embodiment of the present application. As shown in FIG. 1, a method for storing knowledge data includes the following steps:
  • S1 Send a knowledge data extraction instruction to the knowledge data source of the knowledge data to be extracted, receive feedback information of the knowledge data source, and extract the knowledge data of the knowledge data source according to the form of knowledge data contained in the feedback information;
  • the IP address of the knowledge data source from which knowledge data is to be extracted is obtained, the data collection server closest to that IP address is identified, and that data collection server sends the knowledge data extraction instruction to the knowledge data source.
  • after the feedback information from the knowledge data source is received, it is segmented into several sub-segments, and feature words indicating the form of the knowledge data are extracted from the sub-segments.
  • the knowledge data mainly contains three kinds of information, namely: entity information, relationship information and attribute information.
  • in the original knowledge data, the entity information and relationship information exist as text, which is inconvenient for similarity comparison; vectorizing the entity information and relationship information yields entity data vectors and relationship data vectors that can be compared quantitatively, which speeds up information processing.
  • the entity ID is assigned when the entity data vector is generated, and the generation time of the entity data vector can be used as the entity ID. For example, if entity vector A is generated at 10:00, its entity ID is 1000. Similarly, the relationship ID of a relationship data vector can be assigned in the same way.
  • the information similarity may be calculated using the Euclidean distance, the Pearson correlation coefficient, or cosine similarity.
  • in the specific calculation, one or more of the above methods can be used. When several similarity algorithms are used, their results can be compared; if the difference between the similarities produced by two algorithms is greater than an error threshold (usually 95%), the knowledge data subsets need to be re-established.
  • establishing a node of the knowledge graph means adding a knowledge point to the existing knowledge graph.
  • for example, if the attribute "vegetable" in the existing knowledge graph is connected to the three entities "cabbage", "cauliflower", and "chili", and the newly added entity information is "green pepper", which is similar to "chili", then the node "green pepper" is established in the existing knowledge graph.
  • the feature information of a node of the knowledge graph is the information that distinguishes that node from other nodes.
  • for example, compared with the "chili" node, the feature information of the "green pepper" node is "green".
  • the feature information is binarized to obtain a binary character string. The first 5 characters of the binary string are extracted and compared with the key values of the database; after the database storage location whose key value matches those first 5 characters is obtained, the knowledge data is stored at that location in the database.
  • FIG. 2 is a schematic diagram of a data acquisition process in a method for storing knowledge data in an embodiment of the present application.
  • a knowledge data extraction instruction is sent to a knowledge data source of knowledge data to be extracted.
  • extracting the knowledge data of the source of the knowledge data according to the form of the knowledge data contained in the feedback information, including:
  • the network address of the knowledge data source from which knowledge data is to be extracted is obtained, and the type of the network address is determined from its format, that is, whether it is a static IP address or a dynamic IP address. If it is a static IP address, the IP address table is retrieved from the database for comparison to determine whether the static IP address is on the IP address table; if it is, the knowledge data acquisition instruction is sent, and if not, it is not sent. If it is a dynamic IP address, DNS resolution is performed on the dynamic IP address to obtain the corresponding DNS resolution code, and the DNS resolution code table in the database is then retrieved for comparison to determine whether the DNS resolution code is on the DNS resolution code table; if it is, the knowledge data acquisition instruction is sent, and if not, it is not sent.
  • S102 Receive feedback information of the knowledge data source, extract form keywords of the data source form from the feedback information, and determine the form of the knowledge data source according to the form keywords;
  • the form keyword indicates whether the knowledge data is structured data, semi-structured data, or unstructured data.
  • for example, if the form keyword "table" appears in the feedback information, it corresponds to structured data; the form keyword "webpage" corresponds to semi-structured data; and the form keyword "text" corresponds to unstructured data.
  • different forms of data sources correspond to different data extraction methods.
  • semi-structured web page data is usually crawled by web crawlers.
  • for unstructured text, a text-processing language is usually used for extraction.
  • the data form of the source of knowledge data is determined, so that the knowledge data of the source of knowledge data can be extracted by using the correct extraction method.
  • FIG. 3 is a schematic diagram of a vector generation process in a method for storing knowledge data in an embodiment of the present application.
  • S2, extracting entity information from the knowledge data and vectorizing it to generate entity data vectors, and extracting relationship information from the knowledge data and vectorizing it to generate relationship data vectors, includes:
  • the existing knowledge graph refers to a knowledge graph that has already been stored in the database; querying it by entity feature words yields the amount of entity data.
  • for example, the entity feature words in a sports knowledge graph may be "ball", "swimming", "car", and so on, and the corresponding entity data, such as "basketball" or "800-meter freestyle", can then be found from these feature words.
  • the vector dimension corresponding to the entity information is the number of repeated occurrences of the entity information
  • the vector dimension corresponding to the relationship information is the number of repeated occurrences of the relationship information.
  • S202: generate the elements of each dimension of the vector corresponding to the entity information, according to the vector dimension corresponding to the entity information and the entity data contained in the knowledge data from the knowledge data source, to obtain an initial entity data vector;
  • an entity data vector represents the different entity data in the knowledge graph in vector form; it may be, for example, a person entity data vector, a region entity data vector, a disease entity data vector, or a symptom entity data vector.
  • a relationship data vector represents, in vector form, the relationship data that connects different entity data; the relationship data may be, for example, symptom relationship data or physical examination relationship data.
  • the entity information and the relationship information are quantified, thereby facilitating the analysis of the correlation between the entity information and the relationship information.
  • the entity ID of each entity data vector and the relationship ID of each relationship data vector are obtained, knowledge data having the same entity ID is clustered to form a knowledge data set, and knowledge data in the knowledge data set having the same relationship ID is clustered to form knowledge data subsets.
  • by specifying how the entity ID and relationship ID are formed, the location of problematic data can be found effectively during data tracing.
  • the information similarity of any two of the knowledge data subsets is calculated, and a node of the knowledge graph is established between the knowledge data subsets whose information similarity is greater than a preset similarity threshold, which includes:
  • discretization maps a finite number of individuals from an infinite space into a finite space, so as to improve the space-time efficiency of the algorithm. Before discretization, a function such as unique() can be used to remove duplicate data from the knowledge data, after which the knowledge data is discretized.
  • the discrete values corresponding to any two data subsets are passed as inputs to a similarity function, whose output gives the information similarity of those two data subsets;
  • the similarity function may be a Euclidean distance function, a cosine function, a Hamming function, and so on.
  • the information similarity is passed into an error correction function to obtain a corrected information similarity, and the corrected information similarity is compared with the similarity threshold: if the corrected information similarity is greater than the similarity threshold, a node of the knowledge graph is established between the knowledge data subsets; otherwise, it is not established.
  • the error correction function may be a first-order (linear) error correction function or a second-order (quadratic) error correction function.
  • when the second-order error correction function is used, cointegration regression must be performed on the information similarity values before the calculation.
  • the similarity threshold is derived from historical data; its value is usually 99%.
  • by placing conditions on when nodes of the knowledge graph are established, the storage location of the knowledge data can be determined more reliably.
  • S5 acquiring feature information of nodes of the knowledge graph, and storing the knowledge data in the database according to the correspondence between the feature information and the storage location of the database, including:
  • when the attribute information is converted into a numerical value, one conversion method is to obtain the number of characters or the number of strokes of the attribute information and use that count as the attribute value.
  • according to the database storage locations, a tree-shaped (dendritic) storage index of the knowledge data is established, and, according to the node positions of the knowledge data subsets in the dendritic storage index, the knowledge data in the subsets connected by the nodes of the knowledge graph is stored in the database.
  • the dendritic storage index organizes the storage locations in the database into a hierarchical tree.
  • for example, if data X is stored in the database under area A, folder B, subfolder C, the dendritic storage index is A-B-C, where A is the master node of the index, B is a slave node, and C is a secondary slave node.
  • the accurate storage location of the knowledge data is effectively obtained, thereby facilitating the query of the knowledge data.
  • acquiring an extraction method corresponding to the form of the knowledge data source, and extracting the knowledge data of the knowledge data source according to the extraction method, includes:
  • if the form of the knowledge data source is unstructured text data, using a neural network model to extract the knowledge data of the knowledge data source, which includes:
  • obtain the unstructured text data, and perform matrix conversion on the unstructured text data according to a pre-trained word vector layer to generate a text matrix, where the elements of the text matrix are numericized unstructured text data;
  • the trained word vector layer is obtained by training a long short-term memory (LSTM) neural network model on historical data; when the unstructured text data is converted into a matrix, the numericized unstructured text data is written into the text matrix according to the positions generated by the word vector layer. The text matrix is then regularized to obtain a regularized text matrix.
  • extract the numerical elements of the regularized text matrix, pass them as inputs to the cross-entropy loss function, obtain corrected numerical elements from the function's output, and return the corrected numerical elements to their original positions in the regularized text matrix to obtain a corrected regularized text matrix, where the cross-entropy loss function is calculated as:
  • where L(θ) denotes the corrected numerical element; m is the total number of predefined relationship types; r_i is the probability value of the i-th predefined relationship type, taking the value 0 or 1; M is the total number of predefined labels; y_j is the probability value of the j-th predefined label, taking the value 0 or 1; and θ denotes the numerical element.
  • the predefined relationship types are the relationship types between the text data corresponding to the word vectors, for example, a verb following a noun;
  • the probability value of a predefined relationship type is the probability that the relationship type between any two word vectors occurs; for example, the probability that "eat" (吃) and "meal" (饭) are directly adjacent, forming "吃饭" ("to have a meal"), is 90%, while the probability of the separated pattern "吃XX饭" is 10%;
  • the predefined labels are the labels of the word vectors; for example, with 5 adverbs and 3 nouns, the total number of labels is 8.
  • the probability of a predefined label is the probability that a word vector with that label appears; for example, in the above example, the probability of an adverb is 0.675.
  • after the elements of the corrected regularized text matrix are fed, in turn, into the long short-term memory neural network model for training, the feature code of the unstructured text data is obtained, and the knowledge data of the knowledge data source is extracted according to the feature code.
  • one-hot encoding can be used as the feature encoding: the text data in the knowledge data source is one-hot encoded, all of the encoded text data is then compared with the data encoded in previous rounds, and the portion of the data that matches is extracted.
  • the required knowledge data can be effectively extracted from the unstructured text data, and the efficiency of knowledge data extraction is improved.
  • a knowledge data storage device including:
  • the data acquisition module 41 is configured to send a knowledge data extraction instruction to the knowledge data source from which knowledge data is to be extracted, receive feedback information from the knowledge data source, and extract the knowledge data of the knowledge data source according to the form of the knowledge data indicated in the feedback information;
  • the vector generation module 42 is configured to extract entity information from the knowledge data, vectorize the entity information to generate entity data vectors, extract relationship information from the knowledge data, and vectorize the relationship information to generate relationship data vectors;
  • the data clustering module 43 is configured to obtain the entity ID of each entity data vector and the relationship ID of each relationship data vector, cluster knowledge data having the same entity ID to form a knowledge data set, and cluster knowledge data in the knowledge data set having the same relationship ID to form knowledge data subsets;
  • the node establishment module 44 is configured to calculate the information similarity of any two of the knowledge data subsets, and establish a node of the knowledge graph between the knowledge data subsets whose information similarity is greater than a preset similarity threshold;
  • the data storage module 45 is configured to acquire the feature information of the nodes of the knowledge graph, and store the knowledge data in the database according to the correspondence between the feature information and the storage location of the database.
  • a computer device includes a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the knowledge data storage method described in the foregoing embodiments.
  • a storage medium stores computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the knowledge data storage method described in the foregoing embodiments.
  • the storage medium may be a non-volatile storage medium or a volatile storage medium, which is not specifically limited in this application.
  • the program may be stored in a computer-readable storage medium, and the storage medium may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and the like.

Abstract

A knowledge data storage method, a device, a computer apparatus, and a storage medium, pertaining to the technical field of knowledge graphs. The method comprises: extracting knowledge data from a knowledge data source; extracting entity information from the knowledge data, performing vectorization conversion on the entity information, generating entity data vectors, extracting relation information from the knowledge data, performing vectorization conversion on the relation information, and generating relation data vectors; acquiring entity ID identifiers of the entity data vectors and relation ID identifiers of the relation data vectors, and performing clustering to form knowledge data subsets; calculating an information similarity level between any two of the knowledge data subsets, and establishing nodes of a knowledge graph; and acquiring feature information of the nodes of the knowledge graph, and storing the knowledge data in a database according to a correspondence relationship between the feature information and a storing position of the database. The method effectively solves the problem of insufficient efficiency in storing and querying knowledge data.

Description

Knowledge data storage method, device, computer equipment and storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on January 11, 2019, with application number 201910025164.2 and entitled "Knowledge data storage method, device, computer equipment and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of knowledge graph technology, and in particular, to a knowledge data storage method, device, computer equipment, and storage medium.
Background
A knowledge graph, also called a scientific knowledge graph (and, in the library and information science field, knowledge domain visualization or a knowledge domain mapping map), is a collection of graphs that show how knowledge develops and how it is structured; visualization techniques are used to describe knowledge resources and their carriers, and to mine, analyze, construct, draw, and display knowledge and the interrelationships between items of knowledge.
The inventors realized that when the knowledge data in a knowledge graph is stored in a database, the large amount of data associated with the knowledge graph leads to long storage times, and that when the knowledge data in the knowledge graph is queried, the required knowledge data cannot be found quickly.
Summary of the Invention
In view of this, it is necessary to provide a knowledge data storage method, device, computer equipment, and storage medium that address the problems of long storage time and slow query speed for existing knowledge data.
This application provides a knowledge data storage method, including the following steps:
sending a knowledge data extraction instruction to a knowledge data source from which knowledge data is to be extracted, receiving feedback information from the knowledge data source, and extracting the knowledge data of the knowledge data source according to the form of the knowledge data indicated in the feedback information;
extracting entity information from the knowledge data and vectorizing the entity information to generate entity data vectors, and extracting relationship information from the knowledge data and vectorizing the relationship information to generate relationship data vectors;
obtaining the entity ID of each entity data vector and the relationship ID of each relationship data vector, clustering knowledge data having the same entity ID to form a knowledge data set, and clustering knowledge data in the knowledge data set having the same relationship ID to form knowledge data subsets;
calculating the information similarity of any two of the knowledge data subsets, and establishing a node of the knowledge graph between knowledge data subsets whose information similarity is greater than a preset similarity threshold;
obtaining feature information of the nodes of the knowledge graph, and storing the knowledge data in a database according to the correspondence between the feature information and the storage locations of the database.
This application provides a knowledge data storage device, including the following modules:
a data acquisition module, configured to send a knowledge data extraction instruction to a knowledge data source from which knowledge data is to be extracted, receive feedback information from the knowledge data source, and extract the knowledge data of the knowledge data source according to the form of the knowledge data indicated in the feedback information;
a vector generation module, configured to extract entity information from the knowledge data, vectorize the entity information to generate entity data vectors, extract relationship information from the knowledge data, and vectorize the relationship information to generate relationship data vectors;
a data clustering module, configured to obtain the entity ID of each entity data vector and the relationship ID of each relationship data vector, cluster knowledge data having the same entity ID to form a knowledge data set, and cluster knowledge data in the knowledge data set having the same relationship ID to form knowledge data subsets;
a node establishment module, configured to calculate the information similarity of any two of the knowledge data subsets, and establish a node of the knowledge graph between knowledge data subsets whose information similarity is greater than a preset similarity threshold;
a data storage module, configured to obtain feature information of the nodes of the knowledge graph, and store the knowledge data in a database according to the correspondence between the feature information and the storage locations of the database.
A computer device includes a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the knowledge data storage method.
A storage medium stores computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the above knowledge data storage method.
Brief Description of the Drawings
FIG. 1 is an overall flowchart of a knowledge data storage method in an embodiment of the present application;
FIG. 2 is a schematic diagram of the data acquisition process in a knowledge data storage method in an embodiment of the present application;
FIG. 3 is a schematic diagram of the vector generation process in a knowledge data storage method in an embodiment of the present application;
FIG. 4 is a structural diagram of a knowledge data storage device in an embodiment of the present application.
Detailed Description
In order to make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application and are not intended to limit it.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of this application refers to the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
FIG. 1 is an overall flowchart of a knowledge data storage method in an embodiment of the present application. As shown in FIG. 1, the knowledge data storage method includes the following steps:
S1: send a knowledge data extraction instruction to the knowledge data source from which knowledge data is to be extracted, receive feedback information from the knowledge data source, and extract the knowledge data of the knowledge data source according to the form of the knowledge data indicated in the feedback information.
Specifically, the IP address of the knowledge data source from which knowledge data is to be extracted is obtained, the data collection server closest to that IP address is identified according to the IP address, and that data collection server sends the knowledge data extraction instruction to the knowledge data source. After the feedback information from the knowledge data source is received, it is segmented into several sub-segments, and feature words indicating the form of the knowledge data are extracted from the sub-segments. The knowledge data takes three main forms: structured knowledge data, semi-structured knowledge data, and unstructured knowledge data.
S2: extract the entity information from the knowledge data and vectorize it to generate entity data vectors, and extract the relationship information from the knowledge data and vectorize it to generate relationship data vectors.
Specifically, the knowledge data mainly contains three kinds of information: entity information, relationship information, and attribute information. In the original knowledge data, the entity information and relationship information exist as text, which is inconvenient for similarity comparison; vectorizing the entity information and relationship information yields entity data vectors and relationship data vectors that can be compared quantitatively, which speeds up information processing.
S3: obtain the entity ID of each entity data vector and the relationship ID of each relationship data vector, cluster knowledge data having the same entity ID to form a knowledge data set, and cluster knowledge data in the knowledge data set having the same relationship ID to form knowledge data subsets.
Specifically, the entity ID is assigned when the entity data vector is generated, and the generation time of the entity data vector can be used as the entity ID. For example, if entity vector A is generated at 10:00, its entity ID is 1000. Similarly, the relationship ID of a relationship data vector can be assigned in the same way.
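As an illustration of this ID scheme, the short sketch below (a minimal sketch of one possible implementation, not code from the patent) derives an ID from a vector's generation timestamp, following the 10:00 → 1000 example above.

```python
from datetime import datetime

def id_from_generation_time(generated_at: datetime) -> str:
    """Derive an ID from the HHMM generation time, e.g. 10:00 -> "1000"."""
    return generated_at.strftime("%H%M")

# An entity vector generated at 10:00 and a relationship vector generated
# at 10:05 would receive the IDs "1000" and "1005" respectively.
entity_id = id_from_generation_time(datetime(2019, 1, 11, 10, 0))
relation_id = id_from_generation_time(datetime(2019, 1, 11, 10, 5))
print(entity_id, relation_id)
```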
S4: calculate the information similarity of any two of the knowledge data subsets, and establish a node of the knowledge graph between knowledge data subsets whose information similarity is greater than a preset similarity threshold.
Specifically, the information similarity may be calculated using the Euclidean distance, the Pearson correlation coefficient, cosine similarity, or similar algorithms. In the specific calculation, one or more of these methods can be used; when several similarity algorithms are used, their results can be compared, and if the difference between the similarities produced by two algorithms is greater than an error threshold (usually 95%), the knowledge data subsets need to be re-established.
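The following sketch illustrates the cross-check between two of the listed similarity measures; converting the Euclidean distance into a similarity score via 1/(1+d) and reading the 95% figure as a bound on the absolute difference are assumptions made only for illustration.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def euclidean_similarity(a, b):
    # Assumed convention: map the Euclidean distance into (0, 1].
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)

def subsets_need_rebuild(a, b, error_threshold=0.95):
    """If the results of two similarity algorithms differ by more than the
    error threshold (usually 95%), the knowledge data subsets are re-established."""
    return abs(cosine_similarity(a, b) - euclidean_similarity(a, b)) > error_threshold

print(subsets_need_rebuild([1.0, 2.0, 3.0], [1.0, 2.1, 2.9]))
```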
In this step, establishing a node of the knowledge graph means adding a knowledge point to the existing knowledge graph. For example, if the attribute "vegetable" in the existing knowledge graph is connected to the three entities "cabbage", "cauliflower", and "chili", and the newly added entity information is "green pepper", then, after the similarity with "chili" is computed, the node "green pepper" is established in the existing knowledge graph.
S5: obtain the feature information of the nodes of the knowledge graph, and store the knowledge data in the database according to the correspondence between the feature information and the storage locations of the database.
Specifically, the feature information of a node of the knowledge graph is the information that distinguishes that node from other nodes; for example, compared with the "chili" node, the feature information of the "green pepper" node is "green". The feature information is binarized to obtain a binary character string. The first 5 characters of the binary string are extracted and compared with the key values of the database; after the database storage location whose key value matches those first 5 characters is obtained, the knowledge data is stored at that location in the database.
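A minimal sketch of this binarize-and-look-up step, assuming a UTF-8 based binarization and a hypothetical key-to-location table, neither of which is fixed by the description:

```python
def feature_to_key(feature_info: str, key_bits: int = 5) -> str:
    """Binarize the feature information and keep the first 5 bits as the key."""
    bits = "".join(f"{byte:08b}" for byte in feature_info.encode("utf-8"))
    return bits[:key_bits]

# Hypothetical mapping from database key values to storage locations.
storage_locations = {"11100": "/db/area_a/folder_b", "01100": "/db/area_a/folder_c"}

key = feature_to_key("green")
location = storage_locations.get(key)  # store the knowledge data here if a match exists
print(key, location)
```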
In this embodiment, by organizing the knowledge data effectively, it can be stored quickly at the corresponding location in the database, which makes it easier to query the knowledge data.
FIG. 2 is a schematic diagram of the data acquisition process in a knowledge data storage method in an embodiment of the present application. As shown in the figure, S1, sending a knowledge data extraction instruction to the knowledge data source from which knowledge data is to be extracted, receiving feedback information from the knowledge data source, and extracting the knowledge data of the knowledge data source according to the form of the knowledge data indicated in the feedback information, includes:
S101: obtain the network address of the knowledge data source from which knowledge data is to be extracted, compare the network address with the contents of a preset network address list, and send the knowledge data extraction instruction if the network address is in the network address list; otherwise, do not send it.
Specifically, the network address of the knowledge data source from which knowledge data is to be extracted is obtained, and the type of the network address is determined from its format, that is, whether the network address is a static IP address or a dynamic IP address. If it is a static IP address, the IP address table is retrieved from the database for comparison to determine whether the static IP address is on the IP address table; if it is, the knowledge data acquisition instruction is sent, and if not, it is not sent. If it is a dynamic IP address, DNS resolution is performed on the dynamic IP address to obtain the DNS resolution code corresponding to the dynamic IP address, and the DNS resolution code table in the database is then retrieved for comparison to determine whether the DNS resolution code is on the DNS resolution code table; if it is, the knowledge data acquisition instruction is sent, and if not, it is not sent.
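The sketch below mirrors the branching described in this step; the dotted-quad test for static addresses, the whitelist contents, and the resolve_dns_code helper are illustrative assumptions rather than the concrete implementation.

```python
import re

STATIC_IP_TABLE = {"203.0.113.7", "198.51.100.23"}  # assumed IP address table from the database
DNS_CODE_TABLE = {"dns-code-001", "dns-code-002"}    # assumed DNS resolution code table

def looks_like_static_ip(address: str) -> bool:
    # Assumption: a bare dotted-quad is treated as a static IP address.
    return re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", address) is not None

def resolve_dns_code(address: str) -> str:
    # Placeholder for DNS resolution of a dynamic address; hypothetical helper.
    return "dns-code-001"

def should_send_extraction_instruction(address: str) -> bool:
    if looks_like_static_ip(address):
        return address in STATIC_IP_TABLE
    return resolve_dns_code(address) in DNS_CODE_TABLE

print(should_send_extraction_instruction("203.0.113.7"))   # static IP found on the table
print(should_send_extraction_instruction("news.example"))  # decided via the DNS code table
```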
S102: receive the feedback information from the knowledge data source, extract the form keyword describing the form of the data source from the feedback information, and determine the form of the knowledge data source according to the form keyword.
Specifically, the form keyword indicates whether the knowledge data is structured data, semi-structured data, or unstructured data. For example, if the form keyword "table" appears in the feedback information, it corresponds to structured data; the form keyword "webpage" corresponds to semi-structured data; and the form keyword "text" corresponds to unstructured data.
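A compact sketch of mapping form keywords in the feedback information to a data-source form; the keyword table simply restates the examples above.

```python
FORM_KEYWORDS = {"table": "structured", "webpage": "semi-structured", "text": "unstructured"}

def detect_source_form(feedback):
    """Return the data-source form implied by the first form keyword found, else None."""
    for keyword, form in FORM_KEYWORDS.items():
        if keyword in feedback:
            return form
    return None

print(detect_source_form("feedback: the source exposes a webpage listing"))  # semi-structured
```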
S103: obtain the extraction method corresponding to the form of the knowledge data source, and extract the knowledge data of the knowledge data source according to the extraction method.
Specifically, different forms of data source correspond to different data extraction methods; for example, semi-structured web page data is usually crawled with a web crawler, while unstructured text is usually extracted with a text-processing language.
In this embodiment, by analyzing the feedback information from the knowledge data source, the data form of the knowledge data source is determined, so that the correct extraction method can be used to extract the knowledge data of the knowledge data source.
FIG. 3 is a schematic diagram of the vector generation process in a knowledge data storage method in an embodiment of the present application. As shown in the figure, S2, extracting the entity information from the knowledge data and vectorizing it to generate entity data vectors, and extracting the relationship information from the knowledge data and vectorizing it to generate relationship data vectors, includes:
S201: obtain the vector dimension corresponding to the entity information according to the amount of entity data in the existing knowledge graph, and obtain the vector dimension corresponding to the relationship information according to the amount of relationship data in the existing knowledge graph.
Specifically, the existing knowledge graph is a knowledge graph that has already been stored in the database, and querying it by entity feature words yields the amount of entity data. For example, the entity feature words in a sports knowledge graph may be "ball", "swimming", "car", and so on, and the corresponding entity data, such as "basketball" or "800-meter freestyle", can then be found from these feature words. The vector dimension corresponding to the entity information is the number of times the entity information appears, and the vector dimension corresponding to the relationship information is the number of times the relationship information appears.
S202: generate the elements of each dimension of the vector corresponding to the entity information, according to the vector dimension corresponding to the entity information and the entity data contained in the knowledge data from the knowledge data source, to obtain an initial entity data vector.
Specifically, an entity data vector represents the different entity data in the knowledge graph in vector form; it may be, for example, a person entity data vector, a region entity data vector, a disease entity data vector, or a symptom entity data vector.
S203: generate the elements of each dimension of the vector corresponding to the relationship information, according to the vector dimension corresponding to the relationship information and the relationship data contained in the knowledge data from the knowledge data source, to obtain an initial relationship data vector.
Specifically, a relationship data vector represents, in vector form, the relationship data that connects different entity data; the relationship data may be, for example, symptom relationship data or physical examination relationship data.
S204: normalize the initial entity data vector to obtain the entity data vector, and normalize the initial relationship data vector to obtain the relationship data vector.
In this embodiment, by constructing entity data vectors and relationship data vectors, the entity information and relationship information are represented quantitatively, which makes it easier to analyze the correlation between entity information and relationship information.
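To make S201 through S204 concrete, the sketch below builds a count-based initial vector over assumed feature words and then normalizes it; treating each dimension as an occurrence count and using L2 normalization are assumptions consistent with, but not dictated by, the description.

```python
import math

def initial_vector(feature_words, knowledge_items):
    """One dimension per feature word; each element counts the knowledge items
    that mention that feature word."""
    return [sum(word in item for item in knowledge_items) for word in feature_words]

def normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector

# Assumed entity feature words from a sports knowledge graph.
entity_feature_words = ["ball", "swimming", "car"]
knowledge_items = ["basketball game", "800-meter freestyle swimming", "football"]

entity_vector = normalize(initial_vector(entity_feature_words, knowledge_items))
print(entity_vector)
```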
In one embodiment, S3, obtaining the entity ID of each entity data vector and the relationship ID of each relationship data vector, clustering knowledge data having the same entity ID to form a knowledge data set, and clustering knowledge data in the knowledge data set having the same relationship ID to form knowledge data subsets, includes:
transposing the entity data vector and multiplying it with the original entity data vector to form an entity information matrix, where the elements of the entity information matrix are products of the entity data contained in the knowledge data from the knowledge data source;
binarizing the entity information matrix to obtain a binarized entity information matrix, obtaining the main diagonal elements of the binarized entity information matrix, and summing the main diagonal elements to obtain the entity ID;
extracting the knowledge data having the same entity ID and sorting it in the chronological order in which the knowledge data was generated, to form a knowledge data set;
transposing the relationship data vector and multiplying it with the original relationship data vector to form a relationship information matrix, where the elements of the relationship information matrix are products of the relationship data contained in the knowledge data from the knowledge data source;
binarizing the relationship information matrix to obtain a binarized relationship information matrix, obtaining the main diagonal elements of the binarized relationship information matrix, and summing the main diagonal elements to obtain the relationship ID;
traversing the knowledge data set, extracting from the relationship information contained in the knowledge data set the knowledge data carrying the relationship ID, and sorting the extracted knowledge data by its position in the knowledge data set at the time of extraction, to form a knowledge data subset.
In this embodiment, by specifying how the entity ID and relationship ID are formed, the location of problematic data can be found effectively during data tracing.
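The sketch below follows the transpose-product, binarize, and diagonal-sum recipe above. For a vector v, the outer product of v with its transpose has v_i squared on the main diagonal, so with a non-zero-to-1 binarization (an assumed rule) the diagonal sum is simply the number of non-zero components of the vector; this is an observation about the construction, not a claim taken from the filing.

```python
def outer_product(v):
    return [[x * y for y in v] for x in v]

def binarize(matrix):
    # Assumed binarization rule: non-zero elements become 1, zeros stay 0.
    return [[1 if x != 0 else 0 for x in row] for row in matrix]

def id_from_vector(v) -> int:
    """Entity or relationship ID: sum of the main diagonal of the binarized
    outer-product matrix of the data vector."""
    m = binarize(outer_product(v))
    return sum(m[i][i] for i in range(len(v)))

entity_vector = [0.89, 0.0, 0.45, 0.0, 0.11]
print(id_from_vector(entity_vector))  # 3 non-zero dimensions -> ID 3
```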
In one embodiment, S4, calculating the information similarity of any two of the knowledge data subsets and establishing a node of the knowledge graph between knowledge data subsets whose information similarity is greater than a preset similarity threshold, includes:
discretizing the knowledge data in the knowledge data subsets to obtain the discrete values of the knowledge data subsets;
Specifically, discretization maps a finite number of individuals from an infinite space into a finite space, so as to improve the space-time efficiency of the algorithm. Before discretization, a function such as unique() can be used to remove duplicate data from the knowledge data, after which the knowledge data is discretized.
passing the discrete values corresponding to any two data subsets as inputs to a similarity function, whose output gives the information similarity of those two data subsets;
Specifically, the similarity function may be a Euclidean distance function, a cosine function, a Hamming function, and so on.
passing the information similarity into an error correction function to obtain a corrected information similarity, and comparing the corrected information similarity with the similarity threshold: if the corrected information similarity is greater than the similarity threshold, a node of the knowledge graph is established between the knowledge data subsets; otherwise, it is not established.
Specifically, the error correction function may be a first-order (linear) error correction function or a second-order (quadratic) error correction function; when the second-order error correction function is used, cointegration regression must be performed on the information similarity values before the calculation. The similarity threshold is derived from historical data, and its value is usually 99%.
In this embodiment, by placing conditions on when nodes of the knowledge graph are established, the storage location of the knowledge data can be determined more reliably.
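The sketch below strings the three sub-steps above together; the rank-based discretization, the choice of cosine similarity, and the linear error-correction coefficients are illustrative assumptions.

```python
import math

def discretize(values):
    """Remove duplicates and map each remaining value to its rank (an assumed
    discretization; the description only requires mapping into a finite space)."""
    unique_sorted = sorted(set(values))
    rank = {v: i for i, v in enumerate(unique_sorted)}
    return [rank[v] for v in unique_sorted]

def cosine_similarity(a, b):
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def corrected_similarity(sim, slope=0.98, intercept=0.01):
    # Assumed first-order (linear) error correction function.
    return slope * sim + intercept

def should_create_node(subset_a, subset_b, threshold=0.99):
    sim = cosine_similarity(discretize(subset_a), discretize(subset_b))
    return corrected_similarity(sim) > threshold

print(should_create_node([3.2, 3.2, 4.1, 5.0], [3.3, 4.1, 5.0, 5.0]))
```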
In one embodiment, S5, obtaining the feature information of the nodes of the knowledge graph and storing the knowledge data in the database according to the correspondence between the feature information and the storage locations of the database, includes:
extracting the attribute information contained in the knowledge data subset connected by the node of the knowledge graph, and obtaining the attribute value of the attribute information;
Specifically, when the attribute information is converted into a numerical value, one conversion method is to obtain the number of characters or the number of strokes of the attribute information and use that count as the attribute value.
using the attribute value as the key value for storage in the database, and obtaining the database storage location corresponding to the key value;
establishing, according to the database storage locations, a tree-shaped (dendritic) storage index of the knowledge data, and storing the knowledge data in the knowledge data subsets connected by the nodes of the knowledge graph into the database according to the node positions of the knowledge data subsets in the dendritic storage index.
The dendritic storage index organizes the storage locations in the database into a hierarchical tree. For example, if data X is stored in the database under area A, folder B, subfolder C, the dendritic storage index is A-B-C, where A is the master node of the index, B is a slave node, and C is a secondary slave node. When the storage location of data X is needed, master node A is obtained first, followed in turn by slave node B and secondary slave node C, which gives the storage location of data X.
In this embodiment, the exact storage location of the knowledge data is obtained effectively, which makes it easier to query the knowledge data.
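A minimal sketch of the A-B-C style dendritic storage index using nested dictionaries; the helper names and the example path are assumptions.

```python
def add_to_index(index: dict, path: list, knowledge_data) -> None:
    """Insert knowledge data under a path such as ["A", "B", "C"]
    (area -> folder -> subfolder)."""
    node = index
    for part in path[:-1]:
        node = node.setdefault(part, {})
    node.setdefault(path[-1], []).append(knowledge_data)

def locate(index: dict, path: list):
    """Walk the master node, slave node, and secondary slave node in turn."""
    node = index
    for part in path:
        node = node[part]
    return node

dendritic_index: dict = {}
add_to_index(dendritic_index, ["A", "B", "C"], "knowledge data X")
print(locate(dendritic_index, ["A", "B", "C"]))  # ['knowledge data X']
```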
In one embodiment, S103, obtaining the extraction method corresponding to the form of the knowledge data source and extracting the knowledge data of the knowledge data source according to the extraction method, includes:
if the form of the knowledge data source is unstructured text data, using a neural network model to extract the knowledge data of the knowledge data source, which includes:
obtaining the unstructured text data, and performing matrix conversion on the unstructured text data according to a pre-trained word vector layer to generate a text matrix, where the elements of the text matrix are numericized unstructured text data;
The trained word vector layer is obtained by training a long short-term memory (LSTM) neural network model on historical data. When the unstructured text data is converted into a matrix, the numericized unstructured text data is written into the text matrix according to the positions generated by the word vector layer.
regularizing the text matrix to obtain a regularized text matrix;
extracting the numerical elements of the regularized text matrix, passing them as inputs to a cross-entropy loss function, obtaining corrected numerical elements from the function's output, and returning the corrected numerical elements to their original positions in the regularized text matrix to obtain a corrected regularized text matrix, where the cross-entropy loss function is calculated as:
(formula image: PCTCN2019118619-appb-000001)
where L(θ) denotes the corrected numerical element; m denotes the total number of predefined relationship types; r_i is the probability value of the i-th predefined relationship type, taking the value 0 or 1; M is the total number of predefined labels; y_j is the probability value of the j-th predefined label, taking the value 0 or 1; and θ denotes the numerical element.
In this embodiment, a predefined relationship type refers to a relationship type between the word vectors of the text data, for example a noun being followed by a verb. The probability value of a predefined relationship type refers to the probability that the relationship type between any two word vectors occurs; for example, the probability that "吃" (eat) and "饭" (meal) are directly joined as "吃饭" (to have a meal) is 90%, while the probability of the separated form "吃XX饭" is 10%. A predefined label refers to the label of a word vector; for example, with 5 adverbs and 3 nouns the total number of labels is 8. The probability of a predefined label refers to the probability that a word vector with a given label appears; in the above example, the probability of an adverb is 5/8 = 0.625.
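The loss formula itself is only available as an image in this publication (PCTCN2019118619-appb-000001). A standard multi-label cross-entropy form that is consistent with the variable definitions above — offered here only as a hedged reconstruction, not as the formula as filed — would be:

L(θ) = -\frac{1}{m} \sum_{i=1}^{m} r_i \sum_{j=1}^{M} y_j \log p_j(\theta)

where p_j(θ) would denote the predicted probability of the j-th predefined label given the numerical element θ; this p_j(θ) term is an assumption introduced for illustration.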
After the elements of the corrected regularized text matrix are fed in sequence into the long short-term memory neural network model for training, the feature encoding of the unstructured text data is obtained, and the knowledge data of the knowledge data source is extracted according to the feature encoding.
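A minimal PyTorch sketch of this feature-encoding step is given below; the layer sizes, the dummy input matrix, and the use of the final hidden state as the feature code are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

# Corrected regularized text matrix: (sequence length, embedding dimension).
text_matrix = torch.randn(2, 4)

# Long short-term memory model; hidden size 8 is an arbitrary illustrative choice.
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)

# Feed the matrix rows in sequence (batch of 1) and take the final hidden state
# as the feature encoding of the unstructured text.
output, (h_n, c_n) = lstm(text_matrix.unsqueeze(0))
feature_code = h_n[-1]          # shape: (1, 8)
print(feature_code.shape)
```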
Here, the feature encoding may use one-hot encoding: the text data in the knowledge data source is encoded with one-hot encoding, all the encoded text data information is then compared with the data information from previous encodings, and the portion of the data for which the comparison matches is extracted.
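A minimal sketch of this one-hot comparison follows; the vocabulary and the matching rule (keeping tokens whose codes also appeared in earlier encodings) are illustrative assumptions.

```python
def one_hot(token, vocabulary):
    """One-hot encode a token against a fixed vocabulary."""
    return tuple(1 if token == v else 0 for v in vocabulary)

vocabulary = ["knowledge", "graph", "storage", "index"]
previous_codes = {one_hot(t, vocabulary) for t in ["knowledge", "graph"]}

new_tokens = ["knowledge", "storage", "graph"]
# Keep only the tokens whose one-hot codes match codes seen in earlier encodings.
extracted = [t for t in new_tokens if one_hot(t, vocabulary) in previous_codes]
print(extracted)  # -> ['knowledge', 'graph']
```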
In this embodiment, the required knowledge data can be extracted effectively from unstructured text data, improving the efficiency of knowledge data extraction.
In one embodiment, a knowledge data storage device is provided, as shown in FIG. 4, including:
a data acquisition module 41, configured to send a knowledge data extraction instruction to a knowledge data source of knowledge data to be extracted, receive feedback information from the knowledge data source, and extract the knowledge data of the knowledge data source according to the form of knowledge data contained in the feedback information;
a vector generation module 42, configured to extract entity information in the knowledge data, vectorize the entity information to generate an entity data vector, extract relationship information in the knowledge data, and vectorize the relationship information to generate a relationship data vector;
a data clustering module 43, configured to acquire an entity ID of the entity data vector and a relationship ID of the relationship data vector, cluster knowledge data having the same entity ID to form a knowledge data set, and cluster knowledge data having the same relationship ID within the knowledge data set to form knowledge data subsets;
a node establishment module 44, configured to calculate the information similarity of any two of the knowledge data subsets and establish a knowledge graph node between knowledge data subsets whose information similarity is greater than a preset similarity threshold;
a data storage module 45, configured to acquire feature information of the nodes of the knowledge graph and store the knowledge data in a database according to the correspondence between the feature information and database storage locations.
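A minimal sketch of how the five modules might be composed into the device is shown below; the class name, the callables standing in for each module, and the run() entry point are assumptions made for illustration and do not appear in this application.

```python
class KnowledgeDataStorageDevice:
    """Illustrative composition of the five modules described above."""

    def __init__(self, acquire, vectorize, cluster, build_nodes, store):
        # Each argument is a callable standing in for one module.
        self.acquire = acquire          # data acquisition module
        self.vectorize = vectorize      # vector generation module
        self.cluster = cluster          # data clustering module
        self.build_nodes = build_nodes  # node establishment module
        self.store = store              # data storage module

    def run(self, source):
        knowledge_data = self.acquire(source)
        entity_vecs, relation_vecs = self.vectorize(knowledge_data)
        subsets = self.cluster(entity_vecs, relation_vecs, knowledge_data)
        nodes = self.build_nodes(subsets)
        return self.store(nodes, subsets)
```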
A computer device includes a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the knowledge data storage method described in the foregoing embodiments.
A storage medium stores computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the knowledge data storage method described in the foregoing embodiments. The storage medium may be a non-volatile storage medium or a volatile storage medium, which is not specifically limited in this application.
A person of ordinary skill in the art can understand that all or part of the steps of the methods in the foregoing embodiments may be completed by instructing related hardware through a program. The program may be stored in a computer-readable storage medium, and the storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as such combinations are not contradictory, they should be regarded as falling within the scope of this specification.
The above embodiments only express some exemplary embodiments of this application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the patent scope of this application. It should be noted that a person of ordinary skill in the art may make several modifications and improvements without departing from the concept of this application, all of which fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (20)

  1. A method for storing knowledge data, including:
    sending a knowledge data extraction instruction to a knowledge data source of knowledge data to be extracted, receiving feedback information from the knowledge data source, and extracting the knowledge data of the knowledge data source according to the form of knowledge data contained in the feedback information;
    extracting entity information in the knowledge data, vectorizing the entity information to generate an entity data vector, extracting relationship information in the knowledge data, and vectorizing the relationship information to generate a relationship data vector;
    acquiring an entity ID of the entity data vector and a relationship ID of the relationship data vector, clustering knowledge data having the same entity ID to form a knowledge data set, and clustering knowledge data having the same relationship ID within the knowledge data set to form knowledge data subsets;
    calculating the information similarity of any two of the knowledge data subsets, and establishing a knowledge graph node between knowledge data subsets whose information similarity is greater than a preset similarity threshold;
    acquiring feature information of the nodes of the knowledge graph, and storing the knowledge data in a database according to the correspondence between the feature information and database storage locations.
  2. The knowledge data storage method according to claim 1, wherein sending the knowledge data extraction instruction to the knowledge data source of the knowledge data to be extracted, receiving the feedback information from the knowledge data source, and extracting the knowledge data of the knowledge data source according to the form of knowledge data contained in the feedback information includes:
    acquiring a network address of the knowledge data source of the knowledge data to be extracted, comparing the network address with the contents of a preset network address list, and sending the knowledge data extraction instruction if the network address is in the network address list, and otherwise not sending it;
    receiving the feedback information from the knowledge data source, extracting form keywords describing the form of the data source from the feedback information, and determining the form of the knowledge data source according to the form keywords;
    acquiring an extraction method corresponding to the form of the knowledge data source, and extracting the knowledge data of the knowledge data source according to the extraction method.
  3. The knowledge data storage method according to claim 1, wherein extracting the entity information in the knowledge data, vectorizing the entity information to generate the entity data vector, extracting the relationship information in the knowledge data, and vectorizing the relationship information to generate the relationship data vector includes:
    acquiring the vector dimension corresponding to the entity information according to the amount of entity data in an existing knowledge graph, and acquiring the vector dimension corresponding to the relationship information according to the amount of relationship data in the existing knowledge graph;
    generating the elements of each dimension of the vector corresponding to the entity information according to the vector dimension corresponding to the entity information and the entity data contained in the knowledge data of the knowledge data source, to obtain an initial entity data vector;
    generating the elements of each dimension of the vector corresponding to the relationship information according to the vector dimension corresponding to the relationship information and the relationship data contained in the knowledge data of the knowledge data source, to obtain an initial relationship data vector;
    normalizing the initial entity data vector to obtain the entity data vector, and normalizing the initial relationship data vector to obtain the relationship data vector.
  4. The knowledge data storage method according to claim 1, wherein acquiring the entity ID of the entity data vector and the relationship ID of the relationship data vector, clustering knowledge data having the same entity ID to form the knowledge data set, and clustering knowledge data having the same relationship ID within the knowledge data set to form the knowledge data subsets includes:
    transposing the entity data vector and multiplying it with the original entity data vector to form an entity information matrix, where the elements of the entity information matrix are product values of the entity data contained in the knowledge data of the knowledge data source;
    binarizing the entity information matrix to obtain a binarized entity information matrix, acquiring the main diagonal elements of the binarized entity information matrix, and adding the main diagonal elements to obtain the entity ID;
    extracting knowledge data having the same entity ID and sorting it in the chronological order in which the knowledge data was generated, to form a knowledge data set;
    transposing the relationship data vector and multiplying it with the original relationship data vector to form a relationship information matrix, where the elements of the relationship information matrix are product values of the relationship data contained in the knowledge data of the knowledge data source;
    binarizing the relationship information matrix to obtain a binarized relationship information matrix, acquiring the main diagonal elements of the binarized relationship information matrix, and adding the main diagonal elements to obtain the relationship ID;
    traversing the knowledge data set, extracting the knowledge data carrying the relationship ID from the relationship information contained in the knowledge data set, and sorting it according to the position of the knowledge data in the knowledge data set at the time of extraction, to form a knowledge data subset.
  5. The knowledge data storage method according to claim 1, wherein calculating the information similarity of any two of the knowledge data subsets and establishing a knowledge graph node between knowledge data subsets whose information similarity is greater than the preset similarity threshold includes:
    discretizing the knowledge data in the knowledge data subsets to obtain discrete values of the knowledge data subsets;
    passing the discrete values corresponding to any two data subsets into a similarity function for computation, and obtaining the information similarity of the two data subsets as the output;
    passing the information similarity into an error correction function to obtain a corrected information similarity, comparing the corrected information similarity with the similarity threshold, and establishing a knowledge graph node between the knowledge data subsets if the corrected information similarity is greater than the similarity threshold, and otherwise not establishing one.
  6. The knowledge data storage method according to claim 1, wherein acquiring the feature information of the nodes of the knowledge graph and storing the knowledge data in the database according to the correspondence between the feature information and database storage locations includes:
    extracting the attribute information contained in the knowledge data subset connected to a node of the knowledge graph, and acquiring the attribute value of the attribute information;
    using the attribute value as a key for storage in the database, and acquiring the database storage location corresponding to the key;
    establishing a dendritic storage index of the knowledge data according to the database storage location, and storing the knowledge data in the knowledge data subset connected to the node of the knowledge graph in the database according to the node position of the knowledge data subset in the dendritic storage index.
  7. The knowledge data storage method according to claim 2, wherein acquiring the extraction method corresponding to the form of the knowledge data source and extracting the knowledge data of the knowledge data source according to the extraction method includes:
    if the form of the knowledge data source is unstructured text data, using a neural network model to extract the knowledge data of the knowledge data source, including:
    acquiring the unstructured text data, and performing matrix conversion on the unstructured text data according to a pre-trained word vector layer to generate a text matrix, where the elements of the text matrix are numericized unstructured text data;
    performing regularization processing on the text matrix to obtain a regularized text matrix; extracting the numerical elements in the regularized text matrix, passing the numerical elements into a cross-entropy loss function for computation, obtaining corrected numerical elements as the output, and returning the corrected numerical elements to their original positions in the regularized text matrix to obtain a corrected regularized text matrix, where the cross-entropy loss function is calculated as:
    (formula image: PCTCN2019118619-appb-100001)
    where L(θ) denotes the corrected numerical element, m denotes the total number of predefined relationship types, r_i is the probability value of the i-th predefined relationship type and takes the value 0 or 1, M is the total number of predefined labels, y_j is the probability value of the j-th predefined label and takes the value 0 or 1, and θ denotes the numerical element;
    feeding the elements of the corrected regularized text matrix in sequence into a long short-term memory neural network model for training to obtain the feature encoding of the unstructured text data, and extracting the knowledge data of the knowledge data source according to the feature encoding.
  8. A knowledge data storage device, including:
    a data acquisition module, configured to send a knowledge data extraction instruction to a knowledge data source of knowledge data to be extracted, receive feedback information from the knowledge data source, and extract the knowledge data of the knowledge data source according to the form of knowledge data contained in the feedback information;
    a vector generation module, configured to extract entity information in the knowledge data, vectorize the entity information to generate an entity data vector, extract relationship information in the knowledge data, and vectorize the relationship information to generate a relationship data vector;
    a data clustering module, configured to acquire an entity ID of the entity data vector and a relationship ID of the relationship data vector, cluster knowledge data having the same entity ID to form a knowledge data set, and cluster knowledge data having the same relationship ID within the knowledge data set to form knowledge data subsets;
    a node establishment module, configured to calculate the information similarity of any two of the knowledge data subsets and establish a knowledge graph node between knowledge data subsets whose information similarity is greater than a preset similarity threshold;
    a data storage module, configured to acquire feature information of the nodes of the knowledge graph and store the knowledge data in a database according to the correspondence between the feature information and database storage locations.
  9. The knowledge data storage device according to claim 8, wherein the data acquisition module is further configured to:
    acquire a network address of the knowledge data source of the knowledge data to be extracted, compare the network address with the contents of a preset network address list, and send the knowledge data extraction instruction if the network address is in the network address list, and otherwise not send it;
    receive the feedback information from the knowledge data source, extract form keywords describing the form of the data source from the feedback information, and determine the form of the knowledge data source according to the form keywords;
    acquire an extraction method corresponding to the form of the knowledge data source, and extract the knowledge data of the knowledge data source according to the extraction method.
  10. The knowledge data storage device according to claim 8, wherein the vector generation module is further configured to:
    acquire the vector dimension corresponding to the entity information according to the amount of entity data in an existing knowledge graph, and acquire the vector dimension corresponding to the relationship information according to the amount of relationship data in the existing knowledge graph;
    generate the elements of each dimension of the vector corresponding to the entity information according to the vector dimension corresponding to the entity information and the entity data contained in the knowledge data of the knowledge data source, to obtain an initial entity data vector;
    generate the elements of each dimension of the vector corresponding to the relationship information according to the vector dimension corresponding to the relationship information and the relationship data contained in the knowledge data of the knowledge data source, to obtain an initial relationship data vector;
    normalize the initial entity data vector to obtain the entity data vector;
    normalize the initial relationship data vector to obtain the relationship data vector.
  11. The knowledge data storage device according to claim 8, wherein the data clustering module is further configured to:
    transpose the entity data vector and multiply it with the original entity data vector to form an entity information matrix, where the elements of the entity information matrix are product values of the entity data contained in the knowledge data of the knowledge data source;
    binarize the entity information matrix to obtain a binarized entity information matrix, acquire the main diagonal elements of the binarized entity information matrix, and add the main diagonal elements to obtain the entity ID;
    extract knowledge data having the same entity ID and sort it in the chronological order in which the knowledge data was generated, to form a knowledge data set;
    transpose the relationship data vector and multiply it with the original relationship data vector to form a relationship information matrix, where the elements of the relationship information matrix are product values of the relationship data contained in the knowledge data of the knowledge data source;
    binarize the relationship information matrix to obtain a binarized relationship information matrix, acquire the main diagonal elements of the binarized relationship information matrix, and add the main diagonal elements to obtain the relationship ID;
    traverse the knowledge data set, extract the knowledge data carrying the relationship ID from the relationship information contained in the knowledge data set, and sort it according to the position of the knowledge data in the knowledge data set at the time of extraction, to form a knowledge data subset.
  12. The knowledge data storage device according to claim 8, wherein the node establishment module is further configured to:
    discretize the knowledge data in the knowledge data subsets to obtain discrete values of the knowledge data subsets;
    pass the discrete values corresponding to any two data subsets into a similarity function for computation, and obtain the information similarity of the two data subsets as the output;
    pass the information similarity into an error correction function to obtain a corrected information similarity, compare the corrected information similarity with the similarity threshold, and establish a knowledge graph node between the knowledge data subsets if the corrected information similarity is greater than the similarity threshold, and otherwise not establish one.
  13. The knowledge data storage device according to claim 8, wherein the data storage module is further configured to:
    extract the attribute information contained in the knowledge data subset connected to a node of the knowledge graph, and acquire the attribute value of the attribute information;
    use the attribute value as a key for storage in the database, and acquire the database storage location corresponding to the key;
    establish a dendritic storage index of the knowledge data according to the database storage location, and store the knowledge data in the knowledge data subset connected to the node of the knowledge graph in the database according to the node position of the knowledge data subset in the dendritic storage index.
  14. The knowledge data storage device according to claim 13, wherein the data acquisition module is further configured to:
    if the form of the knowledge data source is unstructured text data, use a neural network model to extract the knowledge data of the knowledge data source, including:
    acquiring the unstructured text data, and performing matrix conversion on the unstructured text data according to a pre-trained word vector layer to generate a text matrix, where the elements of the text matrix are numericized unstructured text data;
    performing regularization processing on the text matrix to obtain a regularized text matrix; extracting the numerical elements in the regularized text matrix, passing the numerical elements into a cross-entropy loss function for computation, obtaining corrected numerical elements as the output, and returning the corrected numerical elements to their original positions in the regularized text matrix to obtain a corrected regularized text matrix, where the cross-entropy loss function is calculated as:
    (formula image: PCTCN2019118619-appb-100002)
    where L(θ) denotes the corrected numerical element, m denotes the total number of predefined relationship types, r_i is the probability value of the i-th predefined relationship type and takes the value 0 or 1, M is the total number of predefined labels, y_j is the probability value of the j-th predefined label and takes the value 0 or 1, and θ denotes the numerical element;
    feeding the elements of the corrected regularized text matrix in sequence into a long short-term memory neural network model for training to obtain the feature encoding of the unstructured text data, and extracting the knowledge data of the knowledge data source according to the feature encoding.
  15. A computer device, including a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps:
    sending a knowledge data extraction instruction to a knowledge data source of knowledge data to be extracted, receiving feedback information from the knowledge data source, and extracting the knowledge data of the knowledge data source according to the form of knowledge data contained in the feedback information;
    extracting entity information in the knowledge data, vectorizing the entity information to generate an entity data vector, extracting relationship information in the knowledge data, and vectorizing the relationship information to generate a relationship data vector;
    acquiring an entity ID of the entity data vector and a relationship ID of the relationship data vector, clustering knowledge data having the same entity ID to form a knowledge data set, and clustering knowledge data having the same relationship ID within the knowledge data set to form knowledge data subsets;
    calculating the information similarity of any two of the knowledge data subsets, and establishing a knowledge graph node between knowledge data subsets whose information similarity is greater than a preset similarity threshold;
    acquiring feature information of the nodes of the knowledge graph, and storing the knowledge data in a database according to the correspondence between the feature information and database storage locations.
  16. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    sending a knowledge data extraction instruction to a knowledge data source of knowledge data to be extracted, receiving feedback information from the knowledge data source, and extracting the knowledge data of the knowledge data source according to the form of knowledge data contained in the feedback information;
    extracting entity information in the knowledge data, vectorizing the entity information to generate an entity data vector, extracting relationship information in the knowledge data, and vectorizing the relationship information to generate a relationship data vector;
    acquiring an entity ID of the entity data vector and a relationship ID of the relationship data vector, clustering knowledge data having the same entity ID to form a knowledge data set, and clustering knowledge data having the same relationship ID within the knowledge data set to form knowledge data subsets;
    calculating the information similarity of any two of the knowledge data subsets, and establishing a knowledge graph node between knowledge data subsets whose information similarity is greater than a preset similarity threshold;
    acquiring feature information of the nodes of the knowledge graph, and storing the knowledge data in a database according to the correspondence between the feature information and database storage locations.
  17. The storage medium storing computer-readable instructions according to claim 16, wherein the computer-readable instructions, when executed by the one or more processors and performing the step of sending the knowledge data extraction instruction to the knowledge data source of the knowledge data to be extracted, receiving the feedback information from the knowledge data source, and extracting the knowledge data of the knowledge data source according to the form of knowledge data contained in the feedback information, further cause the one or more processors to perform the following steps:
    acquiring a network address of the knowledge data source of the knowledge data to be extracted, comparing the network address with the contents of a preset network address list, and sending the knowledge data extraction instruction if the network address is in the network address list, and otherwise not sending it;
    receiving the feedback information from the knowledge data source, extracting form keywords describing the form of the data source from the feedback information, and determining the form of the knowledge data source according to the form keywords;
    acquiring an extraction method corresponding to the form of the knowledge data source, and extracting the knowledge data of the knowledge data source according to the extraction method.
  18. The storage medium storing computer-readable instructions according to claim 16, wherein the computer-readable instructions, when executed by the one or more processors and performing the step of extracting the entity information in the knowledge data, vectorizing the entity information to generate the entity data vector, extracting the relationship information in the knowledge data, and vectorizing the relationship information to generate the relationship data vector, further cause the one or more processors to perform the following steps:
    acquiring the vector dimension corresponding to the entity information according to the amount of entity data in an existing knowledge graph, and acquiring the vector dimension corresponding to the relationship information according to the amount of relationship data in the existing knowledge graph;
    generating the elements of each dimension of the vector corresponding to the entity information according to the vector dimension corresponding to the entity information and the entity data contained in the knowledge data of the knowledge data source, to obtain an initial entity data vector;
    generating the elements of each dimension of the vector corresponding to the relationship information according to the vector dimension corresponding to the relationship information and the relationship data contained in the knowledge data of the knowledge data source, to obtain an initial relationship data vector;
    normalizing the initial entity data vector to obtain the entity data vector;
    normalizing the initial relationship data vector to obtain the relationship data vector.
  19. The storage medium storing computer-readable instructions according to claim 16, wherein the computer-readable instructions, when executed by the one or more processors and performing the step of acquiring the entity ID of the entity data vector and the relationship ID of the relationship data vector, clustering knowledge data having the same entity ID to form the knowledge data set, and clustering knowledge data having the same relationship ID within the knowledge data set to form the knowledge data subsets, further cause the one or more processors to perform the following steps:
    transposing the entity data vector and multiplying it with the original entity data vector to form an entity information matrix, where the elements of the entity information matrix are product values of the entity data contained in the knowledge data of the knowledge data source;
    binarizing the entity information matrix to obtain a binarized entity information matrix, acquiring the main diagonal elements of the binarized entity information matrix, and adding the main diagonal elements to obtain the entity ID;
    extracting knowledge data having the same entity ID and sorting it in the chronological order in which the knowledge data was generated, to form a knowledge data set;
    transposing the relationship data vector and multiplying it with the original relationship data vector to form a relationship information matrix, where the elements of the relationship information matrix are product values of the relationship data contained in the knowledge data of the knowledge data source;
    binarizing the relationship information matrix to obtain a binarized relationship information matrix, acquiring the main diagonal elements of the binarized relationship information matrix, and adding the main diagonal elements to obtain the relationship ID;
    traversing the knowledge data set, extracting the knowledge data carrying the relationship ID from the relationship information contained in the knowledge data set, and sorting it according to the position of the knowledge data in the knowledge data set at the time of extraction, to form a knowledge data subset.
  20. The storage medium storing computer-readable instructions according to claim 16, wherein the computer-readable instructions, when executed by the one or more processors and performing the step of calculating the information similarity of any two of the knowledge data subsets and establishing a knowledge graph node between knowledge data subsets whose information similarity is greater than the preset similarity threshold, further cause the one or more processors to perform the following steps:
    discretizing the knowledge data in the knowledge data subsets to obtain discrete values of the knowledge data subsets;
    passing the discrete values corresponding to any two data subsets into a similarity function for computation, and obtaining the information similarity of the two data subsets as the output;
    passing the information similarity into an error correction function to obtain a corrected information similarity, comparing the corrected information similarity with the similarity threshold, and establishing a knowledge graph node between the knowledge data subsets if the corrected information similarity is greater than the similarity threshold, and otherwise not establishing one.
PCT/CN2019/118619 2019-01-11 2019-11-15 Knowledge data storage method, device, computer apparatus, and storage medium WO2020143326A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910025164.2 2019-01-11
CN201910025164.2A CN109885692B (en) 2019-01-11 2019-01-11 Knowledge data storage method, apparatus, computer device and storage medium

Publications (1)

Publication Number Publication Date
WO2020143326A1 true WO2020143326A1 (en) 2020-07-16

Family

ID=66925945

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118619 WO2020143326A1 (en) 2019-01-11 2019-11-15 Knowledge data storage method, device, computer apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN109885692B (en)
WO (1) WO2020143326A1 (en)


Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109885692B (en) * 2019-01-11 2023-06-16 平安科技(深圳)有限公司 Knowledge data storage method, apparatus, computer device and storage medium
CN110569372B (en) * 2019-09-20 2022-08-30 四川大学 Construction method of heart disease big data knowledge graph system
CN111026865B (en) * 2019-10-18 2023-07-21 平安科技(深圳)有限公司 Knowledge graph relationship alignment method, device, equipment and storage medium
CN111752943A (en) * 2020-05-19 2020-10-09 北京网思科平科技有限公司 Map relation path positioning method and system
CN112364173B (en) * 2020-10-21 2022-03-18 中国电子科技网络信息安全有限公司 IP address mechanism tracing method based on knowledge graph
CN112328791A (en) * 2020-11-09 2021-02-05 济南大学 Text classification method of Chinese government affair information based on DiTextCNN
CN112380355A (en) * 2020-11-20 2021-02-19 华南理工大学 Method for representing and storing time slot heterogeneous knowledge graph
CN115129719A (en) * 2022-06-28 2022-09-30 深圳市规划和自然资源数据管理中心 Knowledge graph-based qualitative position space range construction method
CN115187153B (en) * 2022-09-14 2022-12-09 杭银消费金融股份有限公司 Data processing method and system applied to business risk tracing
CN116720578B (en) * 2023-05-12 2024-01-23 航天恒星科技有限公司 Storage method of knowledge graph with space-time characteristics
CN117033541B (en) * 2023-10-09 2023-12-19 中南大学 Space-time knowledge graph indexing method and related equipment


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105279277A (en) * 2015-11-12 2016-01-27 百度在线网络技术(北京)有限公司 Knowledge data processing method and device
US9727554B2 (en) * 2015-11-24 2017-08-08 International Business Machines Corporation Knowledge-based editor with natural language interface
CN107943874B (en) * 2017-11-13 2019-08-23 平安科技(深圳)有限公司 Knowledge mapping processing method, device, computer equipment and storage medium
CN108595449A (en) * 2017-11-23 2018-09-28 北京科东电力控制系统有限责任公司 The structure and application process of dispatch automated system knowledge mapping
CN107943998B (en) * 2017-12-05 2021-05-11 竹间智能科技(上海)有限公司 Man-machine conversation control system and method based on knowledge graph
CN108345647B (en) * 2018-01-18 2021-12-03 北京邮电大学 Web-based domain knowledge graph construction system and method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190005163A1 (en) * 2017-06-29 2019-01-03 International Business Machines Corporation Extracting a knowledge graph from program source code
CN107665252A (en) * 2017-09-27 2018-02-06 深圳证券信息有限公司 A kind of method and device of creation of knowledge collection of illustrative plates
CN107944012A (en) * 2017-12-08 2018-04-20 北京百度网讯科技有限公司 Knowledge data computing system, method, server and storage medium
CN108804419A (en) * 2018-05-22 2018-11-13 湖南大学 Medicine is sold accurate recommended technology under a kind of line of knowledge based collection of illustrative plates
CN109086347A (en) * 2018-07-13 2018-12-25 武汉尼维智能科技有限公司 A kind of construction method, device and the storage medium of international ocean shipping dangerous cargo knowledge mapping system
CN109885692A (en) * 2019-01-11 2019-06-14 平安科技(深圳)有限公司 Knowledge data storage method, device, computer equipment and storage medium

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111932174A (en) * 2020-07-28 2020-11-13 中华人民共和国深圳海关 Freight monitoring abnormal information acquisition method, device, server and storage medium
CN112256927A (en) * 2020-10-21 2021-01-22 网易(杭州)网络有限公司 Method and device for processing knowledge graph data based on attribute graph
CN112256884A (en) * 2020-10-23 2021-01-22 国网辽宁省电力有限公司信息通信分公司 Knowledge graph-based data asset library access method and device
CN112306687A (en) * 2020-10-30 2021-02-02 平安数字信息科技(深圳)有限公司 Resource allocation method and device based on knowledge graph, computer equipment and medium
CN112612899A (en) * 2020-11-24 2021-04-06 中国传媒大学 Knowledge graph construction method and device, storage medium and electronic equipment
CN112579789A (en) * 2020-12-04 2021-03-30 珠海格力电器股份有限公司 Equipment fault diagnosis method and device and equipment
CN112487214A (en) * 2020-12-23 2021-03-12 中译语通科技股份有限公司 Knowledge graph relation extraction method and system based on entity co-occurrence matrix
CN112633504A (en) * 2020-12-23 2021-04-09 北京工业大学 Wisdom cloud knowledge service system and method for fruit tree diseases and insect pests based on knowledge graph
CN112650858B (en) * 2020-12-29 2023-09-26 中国平安人寿保险股份有限公司 Emergency assistance information acquisition method and device, computer equipment and medium
CN112650858A (en) * 2020-12-29 2021-04-13 中国平安人寿保险股份有限公司 Method and device for acquiring emergency assistance information, computer equipment and medium
CN112883735A (en) * 2021-02-10 2021-06-01 海尔数字科技(上海)有限公司 Form image structured processing method, device, equipment and storage medium
CN112883735B (en) * 2021-02-10 2024-01-12 卡奥斯数字科技(上海)有限公司 Method, device, equipment and storage medium for structured processing of form image
CN113094506A (en) * 2021-04-14 2021-07-09 每日互动股份有限公司 Early warning method based on relation map, computer equipment and storage medium
CN113094506B (en) * 2021-04-14 2023-08-18 每日互动股份有限公司 Early warning method based on relational graph, computer equipment and storage medium
CN113312410B (en) * 2021-06-10 2023-11-21 平安证券股份有限公司 Data map construction method, data query method and terminal equipment
CN113312410A (en) * 2021-06-10 2021-08-27 平安证券股份有限公司 Data map construction method, data query method and terminal equipment
CN113590835A (en) * 2021-07-28 2021-11-02 上海致景信息科技有限公司 Method and device for establishing knowledge graph of textile industry data and processor
CN113837028A (en) * 2021-09-03 2021-12-24 广州大学 Road flow analysis method and device based on space-time knowledge graph
CN114840686A (en) * 2022-05-07 2022-08-02 中国电信股份有限公司 Knowledge graph construction method, device and equipment based on metadata and storage medium
CN114840686B (en) * 2022-05-07 2024-01-02 中国电信股份有限公司 Knowledge graph construction method, device, equipment and storage medium based on metadata
CN116523039A (en) * 2023-04-26 2023-08-01 华院计算技术(上海)股份有限公司 Continuous casting knowledge graph generation method and device, storage medium and terminal
CN116523039B (en) * 2023-04-26 2024-02-09 华院计算技术(上海)股份有限公司 Continuous casting knowledge graph generation method and device, storage medium and terminal

Also Published As

Publication number Publication date
CN109885692B (en) 2023-06-16
CN109885692A (en) 2019-06-14

Similar Documents

Publication Publication Date Title
WO2020143326A1 (en) Knowledge data storage method, device, computer apparatus, and storage medium
WO2020182019A1 (en) Image search method, apparatus, device, and computer-readable storage medium
WO2020143184A1 (en) Knowledge fusion method and apparatus, computer device, and storage medium
Dong et al. From data fusion to knowledge fusion
WO2017118427A1 (en) Webpage training method and device, and search intention identification method and device
US20150331936A1 (en) Method and system for extracting a product and classifying text-based electronic documents
CN108647322B (en) Method for identifying similarity of mass Web text information based on word network
US11556590B2 (en) Search systems and methods utilizing search based user clustering
CN112115232A (en) Data error correction method and device and server
US11886515B2 (en) Hierarchical clustering on graphs for taxonomy extraction and applications thereof
Song et al. Improving neural named entity recognition with gazetteers
US20220101057A1 (en) Systems and methods for tagging datasets using models arranged in a series of nodes
CN110019474B (en) Automatic synonymy data association method and device in heterogeneous database and electronic equipment
CN116127090A (en) Aviation system knowledge graph construction method based on fusion and semi-supervision information extraction
CN113971210B (en) Data dictionary generation method and device, electronic equipment and storage medium
WO2019163642A1 (en) Summary evaluation device, method, program, and storage medium
US10866944B2 (en) Reconciled data storage system
US10614031B1 (en) Systems and methods for indexing and mapping data sets using feature matrices
Liu et al. A framework for image dark data assessment
Zandieh et al. Clustering data text based on semantic
Ramirez et al. Natural language inference over tables: Enabling explainable data exploration on data lakes
Shao et al. Web and Big Data: Third International Joint Conference, APWeb-WAIM 2019, Chengdu, China, August 1–3, 2019, Proceedings, Part I
Ajeissh et al. An adaptive distributed approach of a self organizing map model for document clustering using ring topology
Manzoor et al. Toward a New Paradigm for Author Name Disambiguation
JP5569908B2 (en) Analogue device, analogy method and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19908527

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19908527

Country of ref document: EP

Kind code of ref document: A1