WO2020143326A1 - Knowledge data storage method, device, computer apparatus, and storage medium - Google Patents

Knowledge data storage method, device, computer apparatus, and storage medium

Info

Publication number
WO2020143326A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
knowledge data
knowledge
information
entity
Prior art date
Application number
PCT/CN2019/118619
Other languages
French (fr)
Chinese (zh)
Inventor
孙佳兴
胡逸凡
陈泽晖
黄鸿顺
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2020143326A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • the present application relates to the field of knowledge graph technology, and in particular, to a method, device, computer equipment, and storage medium for storing knowledge data.
  • A knowledge graph, also called a scientific knowledge graph (and, in the library and information science field, knowledge domain visualization or a knowledge domain mapping map), is a collection of graphs that show how knowledge develops and how it is structured; visualization techniques are used to describe knowledge resources and their carriers, and to mine, analyze, construct, draw, and display knowledge and the interrelationships between items of knowledge.
  • This application provides a method for storing knowledge data, including the following steps:
  • extracting entity information from the knowledge data and vectorizing the entity information to generate entity data vectors;
  • extracting relationship information from the knowledge data and vectorizing the relationship information to generate relationship data vectors;
  • This application provides a knowledge data storage device, including the following modules:
  • the data acquisition module is configured to send a knowledge data extraction instruction to the knowledge data source from which knowledge data is to be extracted, receive feedback information from the knowledge data source, and extract the knowledge data of the knowledge data source according to the form of the knowledge data indicated in the feedback information;
  • the vector generation module is configured to extract entity information from the knowledge data, vectorize the entity information to generate entity data vectors, extract relationship information from the knowledge data, and vectorize the relationship information to generate relationship data vectors;
  • the data clustering module is configured to obtain the entity ID of each entity data vector and the relationship ID of each relationship data vector, cluster knowledge data having the same entity ID to form a knowledge data set, and cluster knowledge data in the knowledge data set having the same relationship ID to form knowledge data subsets;
  • a node establishment module configured to calculate the information similarity of any two of the knowledge data subsets, and to establish a knowledge graph node between the knowledge data subsets whose information similarity is greater than a preset similarity threshold;
  • the data storage module is configured to acquire the feature information of the nodes of the knowledge graph, and store the knowledge data in the database according to the correspondence between the feature information and the storage location of the database.
  • a computer device includes a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the knowledge data storage method.
  • a storage medium stores computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the above knowledge data storage method.
  • FIG. 1 is an overall flowchart of a method for storing knowledge data in an embodiment of the present application
  • FIG. 2 is a schematic diagram of a data acquisition process in a method for storing knowledge data in an embodiment of the present application
  • FIG. 3 is a schematic diagram of a vector generation process in a method for storing knowledge data in an embodiment of the present application
  • FIG. 4 is a structural diagram of a knowledge data storage device in an embodiment of the present application.
  • FIG. 1 is an overall flowchart of a method for storing knowledge data in an embodiment of the present application. As shown in FIG. 1, a method for storing knowledge data includes the following steps:
  • S1 Send a knowledge data extraction instruction to the knowledge data source of the knowledge data to be extracted, receive feedback information of the knowledge data source, and extract the knowledge data of the knowledge data source according to the form of knowledge data contained in the feedback information;
  • the IP address of the knowledge data source from which knowledge data is to be extracted is obtained, the data collection server closest to that IP address is identified, and that data collection server sends the knowledge data extraction instruction to the knowledge data source.
  • after the feedback information from the knowledge data source is received, it is segmented into several sub-segments, and feature words indicating the form of the knowledge data are extracted from the sub-segments.
  • the knowledge data mainly contains three kinds of information, namely: entity information, relationship information and attribute information.
  • in the original knowledge data, the entity information and relationship information exist as text, which is inconvenient for similarity comparison; vectorizing the entity information and relationship information yields entity data vectors and relationship data vectors that can be compared quantitatively, which speeds up information processing.
  • the entity ID is assigned when the entity data vector is generated, and the generation time of the entity data vector can be used as the entity ID. For example, if entity vector A is generated at 10:00, its entity ID is 1000. Similarly, the relationship ID of a relationship data vector can be assigned in the same way.
  • the information similarity may be calculated using the Euclidean distance, the Pearson correlation coefficient, or cosine similarity.
  • in the specific calculation, one or more of the above methods can be used. When several similarity algorithms are used, their results can be compared; if the difference between the similarities produced by two algorithms is greater than an error threshold (usually 95%), the knowledge data subsets need to be re-established.
  • establishing a node of the knowledge graph means adding a knowledge point to the existing knowledge graph.
  • for example, if the attribute "vegetable" in the existing knowledge graph is connected to the three entities "cabbage", "cauliflower", and "chili", and the newly added entity information is "green pepper", which is similar to "chili", then the node "green pepper" is established in the existing knowledge graph.
  • the feature information of a node of the knowledge graph is the information that distinguishes that node from other nodes.
  • for example, compared with the "chili" node, the feature information of the "green pepper" node is "green".
  • the feature information is binarized to obtain a binary character string. The first 5 characters of the binary string are extracted and compared with the key values of the database; after the database storage location whose key value matches those first 5 characters is obtained, the knowledge data is stored at that location in the database.
  • FIG. 2 is a schematic diagram of a data acquisition process in a method for storing knowledge data in an embodiment of the present application.
  • a knowledge data extraction instruction is sent to a knowledge data source of knowledge data to be extracted.
  • extracting the knowledge data of the source of the knowledge data according to the form of the knowledge data contained in the feedback information, including:
  • the network address of the knowledge data source from which knowledge data is to be extracted is obtained, and the type of the network address is determined from its format, that is, whether it is a static IP address or a dynamic IP address. If it is a static IP address, the IP address table is retrieved from the database for comparison to determine whether the static IP address is on the IP address table; if it is, the knowledge data acquisition instruction is sent, and if not, it is not sent. If it is a dynamic IP address, DNS resolution is performed on the dynamic IP address to obtain the corresponding DNS resolution code, and the DNS resolution code table in the database is then retrieved for comparison to determine whether the DNS resolution code is on the DNS resolution code table; if it is, the knowledge data acquisition instruction is sent, and if not, it is not sent.
  • S102 Receive feedback information of the knowledge data source, extract form keywords of the data source form from the feedback information, and determine the form of the knowledge data source according to the form keywords;
  • the form keyword indicates whether the knowledge data is structured data, semi-structured data, or unstructured data.
  • for example, if the form keyword "table" appears in the feedback information, it corresponds to structured data; the form keyword "webpage" corresponds to semi-structured data; and the form keyword "text" corresponds to unstructured data.
  • different forms of data sources correspond to different data extraction methods.
  • semi-structured web page data is usually crawled by web crawlers.
  • for unstructured text, a text-processing language is usually used for extraction.
  • the data form of the source of knowledge data is determined, so that the knowledge data of the source of knowledge data can be extracted by using the correct extraction method.
  • FIG. 3 is a schematic diagram of a vector generation process in a method for storing knowledge data in an embodiment of the present application.
  • S2, extracting entity information from the knowledge data and vectorizing it to generate entity data vectors, and extracting relationship information from the knowledge data and vectorizing it to generate relationship data vectors, includes:
  • the existing knowledge graph refers to a knowledge graph that has already been stored in the database; querying it by entity feature words yields the amount of entity data.
  • for example, the entity feature words in a sports knowledge graph may be "ball", "swimming", "car", and so on, and the corresponding entity data, such as "basketball" or "800-meter freestyle", can then be found from these feature words.
  • the vector dimension corresponding to the entity information is the number of repeated occurrences of the entity information
  • the vector dimension corresponding to the relationship information is the number of repeated occurrences of the relationship information.
  • S202: generate the elements of each dimension of the vector corresponding to the entity information, according to the vector dimension corresponding to the entity information and the entity data contained in the knowledge data from the knowledge data source, to obtain an initial entity data vector;
  • an entity data vector represents the different entity data in the knowledge graph in vector form; it may be, for example, a person entity data vector, a region entity data vector, a disease entity data vector, or a symptom entity data vector.
  • a relationship data vector represents, in vector form, the relationship data that connects different entity data; the relationship data may be, for example, symptom relationship data or physical examination relationship data.
  • the entity information and the relationship information are quantified, thereby facilitating the analysis of the correlation between the entity information and the relationship information.
  • the entity ID of each entity data vector and the relationship ID of each relationship data vector are obtained, knowledge data having the same entity ID is clustered to form a knowledge data set, and knowledge data in the knowledge data set having the same relationship ID is clustered to form knowledge data subsets.
  • by specifying how the entity ID and relationship ID are formed, the location of problematic data can be found effectively during data tracing.
  • the information similarity of any two of the knowledge data subsets is calculated, and a node of the knowledge graph is established between the knowledge data subsets whose information similarity is greater than a preset similarity threshold, which includes:
  • discretization maps a finite number of individuals from an infinite space into a finite space, so as to improve the space-time efficiency of the algorithm. Before discretization, a function such as unique() can be used to remove duplicate data from the knowledge data, after which the knowledge data is discretized.
  • the discrete values corresponding to any two data subsets are passed as inputs to a similarity function, whose output gives the information similarity of those two data subsets;
  • the similarity function may be a Euclidean distance function, a cosine function, a Hamming function, and so on.
  • the information similarity is passed into an error correction function to obtain a corrected information similarity, and the corrected information similarity is compared with the similarity threshold: if the corrected information similarity is greater than the similarity threshold, a node of the knowledge graph is established between the knowledge data subsets; otherwise, it is not established.
  • the error correction function may be a first-order (linear) error correction function or a second-order (quadratic) error correction function.
  • when the second-order error correction function is used, cointegration regression must be performed on the information similarity values before the calculation.
  • the similarity threshold is derived from historical data; its value is usually 99%.
  • by placing conditions on when nodes of the knowledge graph are established, the storage location of the knowledge data can be determined more reliably.
  • S5 acquiring feature information of nodes of the knowledge graph, and storing the knowledge data in the database according to the correspondence between the feature information and the storage location of the database, including:
  • when the attribute information is converted into a numerical value, one conversion method is to obtain the number of characters or the number of strokes of the attribute information and use that count as the attribute value.
  • according to the database storage locations, a tree-shaped (dendritic) storage index of the knowledge data is established, and, according to the node positions of the knowledge data subsets in the dendritic storage index, the knowledge data in the subsets connected by the nodes of the knowledge graph is stored in the database.
  • the dendritic storage index organizes the storage locations in the database into a hierarchical tree.
  • for example, if data X is stored in the database under area A, folder B, subfolder C, the dendritic storage index is A-B-C, where A is the master node of the index, B is a slave node, and C is a secondary slave node.
  • the accurate storage location of the knowledge data is effectively obtained, thereby facilitating the query of the knowledge data.
  • acquiring an extraction method corresponding to the form of the knowledge data source, and extracting the knowledge data of the knowledge data source according to the extraction method, includes:
  • if the form of the knowledge data source is unstructured text data, using a neural network model to extract the knowledge data of the knowledge data source, which includes:
  • obtain the unstructured text data, and perform matrix conversion on the unstructured text data according to a pre-trained word vector layer to generate a text matrix, where the elements of the text matrix are numericized unstructured text data;
  • the trained word vector layer is obtained by training a long short-term memory (LSTM) neural network model on historical data; when the unstructured text data is converted into a matrix, the numericized unstructured text data is written into the text matrix according to the positions generated by the word vector layer. The text matrix is then regularized to obtain a regularized text matrix.
  • extract the numerical elements of the regularized text matrix, pass them as inputs to the cross-entropy loss function, obtain corrected numerical elements from the function's output, and return the corrected numerical elements to their original positions in the regularized text matrix to obtain a corrected regularized text matrix, where the cross-entropy loss function is calculated as:
  • where L(θ) denotes the corrected numerical element; m is the total number of predefined relationship types; r_i is the probability value of the i-th predefined relationship type, taking the value 0 or 1; M is the total number of predefined labels; y_j is the probability value of the j-th predefined label, taking the value 0 or 1; and θ denotes the numerical element.
  • the predefined relationship types are the relationship types between the text data corresponding to the word vectors, for example, a verb following a noun;
  • the probability value of a predefined relationship type is the probability that the relationship type between any two word vectors occurs; for example, the probability that "eat" (吃) and "meal" (饭) are directly adjacent, forming "吃饭" ("to have a meal"), is 90%, while the probability of the separated pattern "吃XX饭" is 10%;
  • the predefined labels are the labels of the word vectors; for example, with 5 adverbs and 3 nouns, the total number of labels is 8.
  • the probability of a predefined label is the probability that a word vector with that label appears; for example, in the above example, the probability of an adverb is 0.675.
  • after the elements of the corrected regularized text matrix are fed, in turn, into the long short-term memory neural network model for training, the feature code of the unstructured text data is obtained, and the knowledge data of the knowledge data source is extracted according to the feature code.
  • one-hot encoding can be used as the feature encoding: the text data in the knowledge data source is one-hot encoded, all of the encoded text data is then compared with the data encoded in previous rounds, and the portion of the data that matches is extracted.
  • the required knowledge data can be effectively extracted from the unstructured text data, and the efficiency of knowledge data extraction is improved.
  • a knowledge data storage device including:
  • the data acquisition module 41 is configured to send a knowledge data extraction instruction to the knowledge data source from which knowledge data is to be extracted, receive feedback information from the knowledge data source, and extract the knowledge data of the knowledge data source according to the form of the knowledge data indicated in the feedback information;
  • the vector generation module 42 is configured to extract entity information from the knowledge data, vectorize the entity information to generate entity data vectors, extract relationship information from the knowledge data, and vectorize the relationship information to generate relationship data vectors;
  • the data clustering module 43 is configured to obtain the entity ID of each entity data vector and the relationship ID of each relationship data vector, cluster knowledge data having the same entity ID to form a knowledge data set, and cluster knowledge data in the knowledge data set having the same relationship ID to form knowledge data subsets;
  • the node establishment module 44 is configured to calculate the information similarity of any two of the knowledge data subsets, and establish a node of the knowledge graph between the knowledge data subsets whose information similarity is greater than a preset similarity threshold;
  • the data storage module 45 is configured to acquire the feature information of the nodes of the knowledge graph, and store the knowledge data in the database according to the correspondence between the feature information and the storage location of the database.
  • a computer device includes a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the knowledge data storage method described in the foregoing embodiments.
  • a storage medium stores computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the knowledge data storage method described in the foregoing embodiments.
  • the storage medium may be a non-volatile storage medium or a volatile storage medium, which is not specifically limited in this application.
  • the program may be stored in a computer-readable storage medium, and the storage medium may include read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disk, and the like.

Abstract

A knowledge data storage method, a device, a computer apparatus, and a storage medium, pertaining to the technical field of knowledge graphs. The method comprises: extracting knowledge data from a knowledge data source; extracting entity information from the knowledge data, performing vectorization conversion on the entity information, generating entity data vectors, extracting relation information from the knowledge data, performing vectorization conversion on the relation information, and generating relation data vectors; acquiring entity ID identifiers of the entity data vectors and relation ID identifiers of the relation data vectors, and performing clustering to form knowledge data subsets; calculating an information similarity level between any two of the knowledge data subsets, and establishing nodes of a knowledge graph; and acquiring feature information of the nodes of the knowledge graph, and storing the knowledge data in a database according to a correspondence relationship between the feature information and a storing position of the database. The method effectively solves the problem of insufficient efficiency in storing and querying knowledge data.

Description

Knowledge data storage method, device, computer equipment and storage medium
This application claims priority to the Chinese patent application filed with the China Patent Office on January 11, 2019, with application number 201910025164.2 and entitled "Knowledge data storage method, device, computer equipment and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of knowledge graph technology, and in particular, to a knowledge data storage method, device, computer equipment, and storage medium.
Background
A knowledge graph, also called a scientific knowledge graph (and, in the library and information science field, knowledge domain visualization or a knowledge domain mapping map), is a collection of graphs that show how knowledge develops and how it is structured; visualization techniques are used to describe knowledge resources and their carriers, and to mine, analyze, construct, draw, and display knowledge and the interrelationships between items of knowledge.
The inventors realized that when the knowledge data in a knowledge graph is stored in a database, the large amount of data associated with the knowledge graph leads to long storage times, and that when the knowledge data in the knowledge graph is queried, the required knowledge data cannot be found quickly.
Summary of the Invention
In view of this, it is necessary to provide a knowledge data storage method, device, computer equipment, and storage medium that address the problems of long storage time and slow query speed for existing knowledge data.
This application provides a knowledge data storage method, including the following steps:
sending a knowledge data extraction instruction to a knowledge data source from which knowledge data is to be extracted, receiving feedback information from the knowledge data source, and extracting the knowledge data of the knowledge data source according to the form of the knowledge data indicated in the feedback information;
extracting entity information from the knowledge data and vectorizing the entity information to generate entity data vectors, and extracting relationship information from the knowledge data and vectorizing the relationship information to generate relationship data vectors;
obtaining the entity ID of each entity data vector and the relationship ID of each relationship data vector, clustering knowledge data having the same entity ID to form a knowledge data set, and clustering knowledge data in the knowledge data set having the same relationship ID to form knowledge data subsets;
calculating the information similarity of any two of the knowledge data subsets, and establishing a node of the knowledge graph between knowledge data subsets whose information similarity is greater than a preset similarity threshold;
obtaining feature information of the nodes of the knowledge graph, and storing the knowledge data in a database according to the correspondence between the feature information and the storage locations of the database.
This application provides a knowledge data storage device, including the following modules:
a data acquisition module, configured to send a knowledge data extraction instruction to a knowledge data source from which knowledge data is to be extracted, receive feedback information from the knowledge data source, and extract the knowledge data of the knowledge data source according to the form of the knowledge data indicated in the feedback information;
a vector generation module, configured to extract entity information from the knowledge data, vectorize the entity information to generate entity data vectors, extract relationship information from the knowledge data, and vectorize the relationship information to generate relationship data vectors;
a data clustering module, configured to obtain the entity ID of each entity data vector and the relationship ID of each relationship data vector, cluster knowledge data having the same entity ID to form a knowledge data set, and cluster knowledge data in the knowledge data set having the same relationship ID to form knowledge data subsets;
a node establishment module, configured to calculate the information similarity of any two of the knowledge data subsets, and establish a node of the knowledge graph between knowledge data subsets whose information similarity is greater than a preset similarity threshold;
a data storage module, configured to obtain feature information of the nodes of the knowledge graph, and store the knowledge data in a database according to the correspondence between the feature information and the storage locations of the database.
A computer device includes a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the knowledge data storage method.
A storage medium stores computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the above knowledge data storage method.
Brief Description of the Drawings
FIG. 1 is an overall flowchart of a knowledge data storage method in an embodiment of the present application;
FIG. 2 is a schematic diagram of the data acquisition process in a knowledge data storage method in an embodiment of the present application;
FIG. 3 is a schematic diagram of the vector generation process in a knowledge data storage method in an embodiment of the present application;
FIG. 4 is a structural diagram of a knowledge data storage device in an embodiment of the present application.
Detailed Description
In order to make the purpose, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present application and are not intended to limit it.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used herein may also include the plural forms. It should be further understood that the word "comprising" used in the specification of this application refers to the presence of the stated features, integers, steps, operations, elements, and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
FIG. 1 is an overall flowchart of a knowledge data storage method in an embodiment of the present application. As shown in FIG. 1, the knowledge data storage method includes the following steps:
S1: send a knowledge data extraction instruction to the knowledge data source from which knowledge data is to be extracted, receive feedback information from the knowledge data source, and extract the knowledge data of the knowledge data source according to the form of the knowledge data indicated in the feedback information.
Specifically, the IP address of the knowledge data source from which knowledge data is to be extracted is obtained, the data collection server closest to that IP address is identified according to the IP address, and that data collection server sends the knowledge data extraction instruction to the knowledge data source. After the feedback information from the knowledge data source is received, it is segmented into several sub-segments, and feature words indicating the form of the knowledge data are extracted from the sub-segments. The knowledge data takes three main forms: structured knowledge data, semi-structured knowledge data, and unstructured knowledge data.
S2: extract the entity information from the knowledge data and vectorize it to generate entity data vectors, and extract the relationship information from the knowledge data and vectorize it to generate relationship data vectors.
Specifically, the knowledge data mainly contains three kinds of information: entity information, relationship information, and attribute information. In the original knowledge data, the entity information and relationship information exist as text, which is inconvenient for similarity comparison; vectorizing the entity information and relationship information yields entity data vectors and relationship data vectors that can be compared quantitatively, which speeds up information processing.
S3: obtain the entity ID of each entity data vector and the relationship ID of each relationship data vector, cluster knowledge data having the same entity ID to form a knowledge data set, and cluster knowledge data in the knowledge data set having the same relationship ID to form knowledge data subsets.
Specifically, the entity ID is assigned when the entity data vector is generated, and the generation time of the entity data vector can be used as the entity ID. For example, if entity vector A is generated at 10:00, its entity ID is 1000. Similarly, the relationship ID of a relationship data vector can be assigned in the same way.
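As an illustration of this ID scheme, the short sketch below (a minimal sketch of one possible implementation, not code from the patent) derives an ID from a vector's generation timestamp, following the 10:00 → 1000 example above.

```python
from datetime import datetime

def id_from_generation_time(generated_at: datetime) -> str:
    """Derive an ID from the HHMM generation time, e.g. 10:00 -> "1000"."""
    return generated_at.strftime("%H%M")

# An entity vector generated at 10:00 and a relationship vector generated
# at 10:05 would receive the IDs "1000" and "1005" respectively.
entity_id = id_from_generation_time(datetime(2019, 1, 11, 10, 0))
relation_id = id_from_generation_time(datetime(2019, 1, 11, 10, 5))
print(entity_id, relation_id)
```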
S4: calculate the information similarity of any two of the knowledge data subsets, and establish a node of the knowledge graph between knowledge data subsets whose information similarity is greater than a preset similarity threshold.
Specifically, the information similarity may be calculated using the Euclidean distance, the Pearson correlation coefficient, cosine similarity, or similar algorithms. In the specific calculation, one or more of these methods can be used; when several similarity algorithms are used, their results can be compared, and if the difference between the similarities produced by two algorithms is greater than an error threshold (usually 95%), the knowledge data subsets need to be re-established.
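The following sketch illustrates the cross-check between two of the listed similarity measures; converting the Euclidean distance into a similarity score via 1/(1+d) and reading the 95% figure as a bound on the absolute difference are assumptions made only for illustration.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def euclidean_similarity(a, b):
    # Assumed convention: map the Euclidean distance into (0, 1].
    dist = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + dist)

def subsets_need_rebuild(a, b, error_threshold=0.95):
    """If the results of two similarity algorithms differ by more than the
    error threshold (usually 95%), the knowledge data subsets are re-established."""
    return abs(cosine_similarity(a, b) - euclidean_similarity(a, b)) > error_threshold

print(subsets_need_rebuild([1.0, 2.0, 3.0], [1.0, 2.1, 2.9]))
```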
In this step, establishing a node of the knowledge graph means adding a knowledge point to the existing knowledge graph. For example, if the attribute "vegetable" in the existing knowledge graph is connected to the three entities "cabbage", "cauliflower", and "chili", and the newly added entity information is "green pepper", then, after the similarity with "chili" is computed, the node "green pepper" is established in the existing knowledge graph.
S5: obtain the feature information of the nodes of the knowledge graph, and store the knowledge data in the database according to the correspondence between the feature information and the storage locations of the database.
Specifically, the feature information of a node of the knowledge graph is the information that distinguishes that node from other nodes; for example, compared with the "chili" node, the feature information of the "green pepper" node is "green". The feature information is binarized to obtain a binary character string. The first 5 characters of the binary string are extracted and compared with the key values of the database; after the database storage location whose key value matches those first 5 characters is obtained, the knowledge data is stored at that location in the database.
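A minimal sketch of this binarize-and-look-up step, assuming a UTF-8 based binarization and a hypothetical key-to-location table, neither of which is fixed by the description:

```python
def feature_to_key(feature_info: str, key_bits: int = 5) -> str:
    """Binarize the feature information and keep the first 5 bits as the key."""
    bits = "".join(f"{byte:08b}" for byte in feature_info.encode("utf-8"))
    return bits[:key_bits]

# Hypothetical mapping from database key values to storage locations.
storage_locations = {"11100": "/db/area_a/folder_b", "01100": "/db/area_a/folder_c"}

key = feature_to_key("green")
location = storage_locations.get(key)  # store the knowledge data here if a match exists
print(key, location)
```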
In this embodiment, by organizing the knowledge data effectively, it can be stored quickly at the corresponding location in the database, which makes it easier to query the knowledge data.
FIG. 2 is a schematic diagram of the data acquisition process in a knowledge data storage method in an embodiment of the present application. As shown in the figure, S1, sending a knowledge data extraction instruction to the knowledge data source from which knowledge data is to be extracted, receiving feedback information from the knowledge data source, and extracting the knowledge data of the knowledge data source according to the form of the knowledge data indicated in the feedback information, includes:
S101: obtain the network address of the knowledge data source from which knowledge data is to be extracted, compare the network address with the contents of a preset network address list, and send the knowledge data extraction instruction if the network address is in the network address list; otherwise, do not send it.
Specifically, the network address of the knowledge data source from which knowledge data is to be extracted is obtained, and the type of the network address is determined from its format, that is, whether the network address is a static IP address or a dynamic IP address. If it is a static IP address, the IP address table is retrieved from the database for comparison to determine whether the static IP address is on the IP address table; if it is, the knowledge data acquisition instruction is sent, and if not, it is not sent. If it is a dynamic IP address, DNS resolution is performed on the dynamic IP address to obtain the DNS resolution code corresponding to the dynamic IP address, and the DNS resolution code table in the database is then retrieved for comparison to determine whether the DNS resolution code is on the DNS resolution code table; if it is, the knowledge data acquisition instruction is sent, and if not, it is not sent.
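The sketch below mirrors the branching described in this step; the dotted-quad test for static addresses, the whitelist contents, and the resolve_dns_code helper are illustrative assumptions rather than the concrete implementation.

```python
import re

STATIC_IP_TABLE = {"203.0.113.7", "198.51.100.23"}  # assumed IP address table from the database
DNS_CODE_TABLE = {"dns-code-001", "dns-code-002"}    # assumed DNS resolution code table

def looks_like_static_ip(address: str) -> bool:
    # Assumption: a bare dotted-quad is treated as a static IP address.
    return re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", address) is not None

def resolve_dns_code(address: str) -> str:
    # Placeholder for DNS resolution of a dynamic address; hypothetical helper.
    return "dns-code-001"

def should_send_extraction_instruction(address: str) -> bool:
    if looks_like_static_ip(address):
        return address in STATIC_IP_TABLE
    return resolve_dns_code(address) in DNS_CODE_TABLE

print(should_send_extraction_instruction("203.0.113.7"))   # static IP found on the table
print(should_send_extraction_instruction("news.example"))  # decided via the DNS code table
```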
S102: receive the feedback information from the knowledge data source, extract the form keyword describing the form of the data source from the feedback information, and determine the form of the knowledge data source according to the form keyword.
Specifically, the form keyword indicates whether the knowledge data is structured data, semi-structured data, or unstructured data. For example, if the form keyword "table" appears in the feedback information, it corresponds to structured data; the form keyword "webpage" corresponds to semi-structured data; and the form keyword "text" corresponds to unstructured data.
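A compact sketch of mapping form keywords in the feedback information to a data-source form; the keyword table simply restates the examples above.

```python
FORM_KEYWORDS = {"table": "structured", "webpage": "semi-structured", "text": "unstructured"}

def detect_source_form(feedback):
    """Return the data-source form implied by the first form keyword found, else None."""
    for keyword, form in FORM_KEYWORDS.items():
        if keyword in feedback:
            return form
    return None

print(detect_source_form("feedback: the source exposes a webpage listing"))  # semi-structured
```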
S103: obtain the extraction method corresponding to the form of the knowledge data source, and extract the knowledge data of the knowledge data source according to the extraction method.
Specifically, different forms of data source correspond to different data extraction methods; for example, semi-structured web page data is usually crawled with a web crawler, while unstructured text is usually extracted with a text-processing language.
In this embodiment, by analyzing the feedback information from the knowledge data source, the data form of the knowledge data source is determined, so that the correct extraction method can be used to extract the knowledge data of the knowledge data source.
FIG. 3 is a schematic diagram of the vector generation process in a knowledge data storage method in an embodiment of the present application. As shown in the figure, S2, extracting the entity information from the knowledge data and vectorizing it to generate entity data vectors, and extracting the relationship information from the knowledge data and vectorizing it to generate relationship data vectors, includes:
S201: obtain the vector dimension corresponding to the entity information according to the amount of entity data in the existing knowledge graph, and obtain the vector dimension corresponding to the relationship information according to the amount of relationship data in the existing knowledge graph.
Specifically, the existing knowledge graph is a knowledge graph that has already been stored in the database, and querying it by entity feature words yields the amount of entity data. For example, the entity feature words in a sports knowledge graph may be "ball", "swimming", "car", and so on, and the corresponding entity data, such as "basketball" or "800-meter freestyle", can then be found from these feature words. The vector dimension corresponding to the entity information is the number of times the entity information appears, and the vector dimension corresponding to the relationship information is the number of times the relationship information appears.
S202: generate the elements of each dimension of the vector corresponding to the entity information, according to the vector dimension corresponding to the entity information and the entity data contained in the knowledge data from the knowledge data source, to obtain an initial entity data vector.
Specifically, an entity data vector represents the different entity data in the knowledge graph in vector form; it may be, for example, a person entity data vector, a region entity data vector, a disease entity data vector, or a symptom entity data vector.
S203: generate the elements of each dimension of the vector corresponding to the relationship information, according to the vector dimension corresponding to the relationship information and the relationship data contained in the knowledge data from the knowledge data source, to obtain an initial relationship data vector.
Specifically, a relationship data vector represents, in vector form, the relationship data that connects different entity data; the relationship data may be, for example, symptom relationship data or physical examination relationship data.
S204: normalize the initial entity data vector to obtain the entity data vector, and normalize the initial relationship data vector to obtain the relationship data vector.
In this embodiment, by constructing entity data vectors and relationship data vectors, the entity information and relationship information are represented quantitatively, which makes it easier to analyze the correlation between entity information and relationship information.
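To make S201 through S204 concrete, the sketch below builds a count-based initial vector over assumed feature words and then normalizes it; treating each dimension as an occurrence count and using L2 normalization are assumptions consistent with, but not dictated by, the description.

```python
import math

def initial_vector(feature_words, knowledge_items):
    """One dimension per feature word; each element counts the knowledge items
    that mention that feature word."""
    return [sum(word in item for item in knowledge_items) for word in feature_words]

def normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector] if norm else vector

# Assumed entity feature words from a sports knowledge graph.
entity_feature_words = ["ball", "swimming", "car"]
knowledge_items = ["basketball game", "800-meter freestyle swimming", "football"]

entity_vector = normalize(initial_vector(entity_feature_words, knowledge_items))
print(entity_vector)
```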
In one embodiment, S3, obtaining the entity ID of each entity data vector and the relationship ID of each relationship data vector, clustering knowledge data having the same entity ID to form a knowledge data set, and clustering knowledge data in the knowledge data set having the same relationship ID to form knowledge data subsets, includes:
transposing the entity data vector and multiplying it with the original entity data vector to form an entity information matrix, where the elements of the entity information matrix are products of the entity data contained in the knowledge data from the knowledge data source;
binarizing the entity information matrix to obtain a binarized entity information matrix, obtaining the main diagonal elements of the binarized entity information matrix, and summing the main diagonal elements to obtain the entity ID;
extracting the knowledge data having the same entity ID and sorting it in the chronological order in which the knowledge data was generated, to form a knowledge data set;
transposing the relationship data vector and multiplying it with the original relationship data vector to form a relationship information matrix, where the elements of the relationship information matrix are products of the relationship data contained in the knowledge data from the knowledge data source;
binarizing the relationship information matrix to obtain a binarized relationship information matrix, obtaining the main diagonal elements of the binarized relationship information matrix, and summing the main diagonal elements to obtain the relationship ID;
traversing the knowledge data set, extracting from the relationship information contained in the knowledge data set the knowledge data carrying the relationship ID, and sorting the extracted knowledge data by its position in the knowledge data set at the time of extraction, to form a knowledge data subset.
In this embodiment, by specifying how the entity ID and relationship ID are formed, the location of problematic data can be found effectively during data tracing.
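The sketch below follows the transpose-product, binarize, and diagonal-sum recipe above. For a vector v, the outer product of v with its transpose has v_i squared on the main diagonal, so with a non-zero-to-1 binarization (an assumed rule) the diagonal sum is simply the number of non-zero components of the vector; this is an observation about the construction, not a claim taken from the filing.

```python
def outer_product(v):
    return [[x * y for y in v] for x in v]

def binarize(matrix):
    # Assumed binarization rule: non-zero elements become 1, zeros stay 0.
    return [[1 if x != 0 else 0 for x in row] for row in matrix]

def id_from_vector(v) -> int:
    """Entity or relationship ID: sum of the main diagonal of the binarized
    outer-product matrix of the data vector."""
    m = binarize(outer_product(v))
    return sum(m[i][i] for i in range(len(v)))

entity_vector = [0.89, 0.0, 0.45, 0.0, 0.11]
print(id_from_vector(entity_vector))  # 3 non-zero dimensions -> ID 3
```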
In one embodiment, S4, calculating the information similarity of any two of the knowledge data subsets and establishing a node of the knowledge graph between knowledge data subsets whose information similarity is greater than a preset similarity threshold, includes:
discretizing the knowledge data in the knowledge data subsets to obtain the discrete values of the knowledge data subsets;
Specifically, discretization maps a finite number of individuals from an infinite space into a finite space, so as to improve the space-time efficiency of the algorithm. Before discretization, a function such as unique() can be used to remove duplicate data from the knowledge data, after which the knowledge data is discretized.
passing the discrete values corresponding to any two data subsets as inputs to a similarity function, whose output gives the information similarity of those two data subsets;
Specifically, the similarity function may be a Euclidean distance function, a cosine function, a Hamming function, and so on.
passing the information similarity into an error correction function to obtain a corrected information similarity, and comparing the corrected information similarity with the similarity threshold: if the corrected information similarity is greater than the similarity threshold, a node of the knowledge graph is established between the knowledge data subsets; otherwise, it is not established.
Specifically, the error correction function may be a first-order (linear) error correction function or a second-order (quadratic) error correction function; when the second-order error correction function is used, cointegration regression must be performed on the information similarity values before the calculation. The similarity threshold is derived from historical data, and its value is usually 99%.
In this embodiment, by placing conditions on when nodes of the knowledge graph are established, the storage location of the knowledge data can be determined more reliably.
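The sketch below strings the three sub-steps above together; the rank-based discretization, the choice of cosine similarity, and the linear error-correction coefficients are illustrative assumptions.

```python
import math

def discretize(values):
    """Remove duplicates and map each remaining value to its rank (an assumed
    discretization; the description only requires mapping into a finite space)."""
    unique_sorted = sorted(set(values))
    rank = {v: i for i, v in enumerate(unique_sorted)}
    return [rank[v] for v in unique_sorted]

def cosine_similarity(a, b):
    n = min(len(a), len(b))
    a, b = a[:n], b[:n]
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def corrected_similarity(sim, slope=0.98, intercept=0.01):
    # Assumed first-order (linear) error correction function.
    return slope * sim + intercept

def should_create_node(subset_a, subset_b, threshold=0.99):
    sim = cosine_similarity(discretize(subset_a), discretize(subset_b))
    return corrected_similarity(sim) > threshold

print(should_create_node([3.2, 3.2, 4.1, 5.0], [3.3, 4.1, 5.0, 5.0]))
```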
In one embodiment, S5, obtaining the feature information of the nodes of the knowledge graph and storing the knowledge data in the database according to the correspondence between the feature information and the storage locations of the database, includes:
extracting the attribute information contained in the knowledge data subset connected by the node of the knowledge graph, and obtaining the attribute value of the attribute information;
Specifically, when the attribute information is converted into a numerical value, one conversion method is to obtain the number of characters or the number of strokes of the attribute information and use that count as the attribute value.
using the attribute value as the key value for storage in the database, and obtaining the database storage location corresponding to the key value;
establishing, according to the database storage locations, a tree-shaped (dendritic) storage index of the knowledge data, and storing the knowledge data in the knowledge data subsets connected by the nodes of the knowledge graph into the database according to the node positions of the knowledge data subsets in the dendritic storage index.
The dendritic storage index organizes the storage locations in the database into a hierarchical tree. For example, if data X is stored in the database under area A, folder B, subfolder C, the dendritic storage index is A-B-C, where A is the master node of the index, B is a slave node, and C is a secondary slave node. When the storage location of data X is needed, master node A is obtained first, followed in turn by slave node B and secondary slave node C, which gives the storage location of data X.
In this embodiment, the exact storage location of the knowledge data is obtained effectively, which makes it easier to query the knowledge data.
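A minimal sketch of the A-B-C style dendritic storage index using nested dictionaries; the helper names and the example path are assumptions.

```python
def add_to_index(index: dict, path: list, knowledge_data) -> None:
    """Insert knowledge data under a path such as ["A", "B", "C"]
    (area -> folder -> subfolder)."""
    node = index
    for part in path[:-1]:
        node = node.setdefault(part, {})
    node.setdefault(path[-1], []).append(knowledge_data)

def locate(index: dict, path: list):
    """Walk the master node, slave node, and secondary slave node in turn."""
    node = index
    for part in path:
        node = node[part]
    return node

dendritic_index: dict = {}
add_to_index(dendritic_index, ["A", "B", "C"], "knowledge data X")
print(locate(dendritic_index, ["A", "B", "C"]))  # ['knowledge data X']
```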
In one embodiment, S103, obtaining the extraction method corresponding to the form of the knowledge data source and extracting the knowledge data of the knowledge data source according to the extraction method, includes:
if the form of the knowledge data source is unstructured text data, using a neural network model to extract the knowledge data of the knowledge data source, which includes:
obtaining the unstructured text data, and performing matrix conversion on the unstructured text data according to a pre-trained word vector layer to generate a text matrix, where the elements of the text matrix are numericized unstructured text data;
The trained word vector layer is obtained by training a long short-term memory (LSTM) neural network model on historical data. When the unstructured text data is converted into a matrix, the numericized unstructured text data is written into the text matrix according to the positions generated by the word vector layer.
regularizing the text matrix to obtain a regularized text matrix;
extracting the numerical elements of the regularized text matrix, passing them as inputs to a cross-entropy loss function, obtaining corrected numerical elements from the function's output, and returning the corrected numerical elements to their original positions in the regularized text matrix to obtain a corrected regularized text matrix, where the cross-entropy loss function is calculated as:
(formula image: PCTCN2019118619-appb-000001)
where L(θ) denotes the corrected numerical element; m denotes the total number of predefined relationship types; r_i is the probability value of the i-th predefined relationship type, taking the value 0 or 1; M is the total number of predefined labels; y_j is the probability value of the j-th predefined label, taking the value 0 or 1; and θ denotes the numerical element.
In this embodiment, a predefined relationship type refers to a relationship type between the word vectors of the text data, for example a noun being followed by a verb. The probability value of a predefined relationship type refers to the probability that the relationship type between any two word vectors occurs; for example, the probability that "吃" (eat) and "饭" (meal) are directly joined as "吃饭" (to have a meal) is 90%, while the probability of the separated form "吃XX饭" is 10%. A predefined label refers to the label of a word vector; for example, with 5 adverbs and 3 nouns the total number of labels is 8. The probability of a predefined label refers to the probability that a word vector with a given label appears; in the above example, the probability of an adverb is 5/8 = 0.625.
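The loss formula itself is only available as an image in this publication (PCTCN2019118619-appb-000001). A standard multi-label cross-entropy form that is consistent with the variable definitions above — offered here only as a hedged reconstruction, not as the formula as filed — would be:

L(θ) = -\frac{1}{m} \sum_{i=1}^{m} r_i \sum_{j=1}^{M} y_j \log p_j(\theta)

where p_j(θ) would denote the predicted probability of the j-th predefined label given the numerical element θ; this p_j(θ) term is an assumption introduced for illustration.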
After the elements of the corrected regularized text matrix are fed in sequence into the long short-term memory neural network model for training, the feature encoding of the unstructured text data is obtained, and the knowledge data of the knowledge data source is extracted according to the feature encoding.
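A minimal PyTorch sketch of this feature-encoding step is given below; the layer sizes, the dummy input matrix, and the use of the final hidden state as the feature code are assumptions made for illustration only.

```python
import torch
import torch.nn as nn

# Corrected regularized text matrix: (sequence length, embedding dimension).
text_matrix = torch.randn(2, 4)

# Long short-term memory model; hidden size 8 is an arbitrary illustrative choice.
lstm = nn.LSTM(input_size=4, hidden_size=8, batch_first=True)

# Feed the matrix rows in sequence (batch of 1) and take the final hidden state
# as the feature encoding of the unstructured text.
output, (h_n, c_n) = lstm(text_matrix.unsqueeze(0))
feature_code = h_n[-1]          # shape: (1, 8)
print(feature_code.shape)
```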
Here, the feature encoding may use one-hot encoding: the text data in the knowledge data source is encoded with one-hot encoding, all the encoded text data information is then compared with the data information from previous encodings, and the portion of the data for which the comparison matches is extracted.
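A minimal sketch of this one-hot comparison follows; the vocabulary and the matching rule (keeping tokens whose codes also appeared in earlier encodings) are illustrative assumptions.

```python
def one_hot(token, vocabulary):
    """One-hot encode a token against a fixed vocabulary."""
    return tuple(1 if token == v else 0 for v in vocabulary)

vocabulary = ["knowledge", "graph", "storage", "index"]
previous_codes = {one_hot(t, vocabulary) for t in ["knowledge", "graph"]}

new_tokens = ["knowledge", "storage", "graph"]
# Keep only the tokens whose one-hot codes match codes seen in earlier encodings.
extracted = [t for t in new_tokens if one_hot(t, vocabulary) in previous_codes]
print(extracted)  # -> ['knowledge', 'graph']
```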
In this embodiment, the required knowledge data can be extracted effectively from unstructured text data, improving the efficiency of knowledge data extraction.
In one embodiment, a knowledge data storage device is provided, as shown in FIG. 4, including:
a data acquisition module 41, configured to send a knowledge data extraction instruction to a knowledge data source of knowledge data to be extracted, receive feedback information from the knowledge data source, and extract the knowledge data of the knowledge data source according to the form of knowledge data contained in the feedback information;
a vector generation module 42, configured to extract entity information in the knowledge data, vectorize the entity information to generate an entity data vector, extract relationship information in the knowledge data, and vectorize the relationship information to generate a relationship data vector;
a data clustering module 43, configured to acquire an entity ID of the entity data vector and a relationship ID of the relationship data vector, cluster knowledge data having the same entity ID to form a knowledge data set, and cluster knowledge data having the same relationship ID within the knowledge data set to form knowledge data subsets;
a node establishment module 44, configured to calculate the information similarity of any two of the knowledge data subsets and establish a knowledge graph node between knowledge data subsets whose information similarity is greater than a preset similarity threshold;
a data storage module 45, configured to acquire feature information of the nodes of the knowledge graph and store the knowledge data in a database according to the correspondence between the feature information and database storage locations.
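A minimal sketch of how the five modules might be composed into the device is shown below; the class name, the callables standing in for each module, and the run() entry point are assumptions made for illustration and do not appear in this application.

```python
class KnowledgeDataStorageDevice:
    """Illustrative composition of the five modules described above."""

    def __init__(self, acquire, vectorize, cluster, build_nodes, store):
        # Each argument is a callable standing in for one module.
        self.acquire = acquire          # data acquisition module
        self.vectorize = vectorize      # vector generation module
        self.cluster = cluster          # data clustering module
        self.build_nodes = build_nodes  # node establishment module
        self.store = store              # data storage module

    def run(self, source):
        knowledge_data = self.acquire(source)
        entity_vecs, relation_vecs = self.vectorize(knowledge_data)
        subsets = self.cluster(entity_vecs, relation_vecs, knowledge_data)
        nodes = self.build_nodes(subsets)
        return self.store(nodes, subsets)
```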
A computer device includes a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the knowledge data storage method described in the foregoing embodiments.
A storage medium stores computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the knowledge data storage method described in the foregoing embodiments. The storage medium may be a non-volatile storage medium or a volatile storage medium, which is not specifically limited in this application.
A person of ordinary skill in the art can understand that all or part of the steps of the methods in the foregoing embodiments may be completed by instructing related hardware through a program. The program may be stored in a computer-readable storage medium, and the storage medium may include a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as such combinations are not contradictory, they should be regarded as falling within the scope of this specification.
The above embodiments only express some exemplary embodiments of this application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the patent scope of this application. It should be noted that a person of ordinary skill in the art may make several modifications and improvements without departing from the concept of this application, all of which fall within the protection scope of this application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (20)

  1. A method for storing knowledge data, including:
    sending a knowledge data extraction instruction to a knowledge data source of knowledge data to be extracted, receiving feedback information from the knowledge data source, and extracting the knowledge data of the knowledge data source according to the form of knowledge data contained in the feedback information;
    extracting entity information in the knowledge data, vectorizing the entity information to generate an entity data vector, extracting relationship information in the knowledge data, and vectorizing the relationship information to generate a relationship data vector;
    acquiring an entity ID of the entity data vector and a relationship ID of the relationship data vector, clustering knowledge data having the same entity ID to form a knowledge data set, and clustering knowledge data having the same relationship ID within the knowledge data set to form knowledge data subsets;
    calculating the information similarity of any two of the knowledge data subsets, and establishing a knowledge graph node between knowledge data subsets whose information similarity is greater than a preset similarity threshold;
    acquiring feature information of the nodes of the knowledge graph, and storing the knowledge data in a database according to the correspondence between the feature information and database storage locations.
  2. The knowledge data storage method according to claim 1, wherein sending the knowledge data extraction instruction to the knowledge data source of the knowledge data to be extracted, receiving the feedback information from the knowledge data source, and extracting the knowledge data of the knowledge data source according to the form of knowledge data contained in the feedback information includes:
    acquiring a network address of the knowledge data source of the knowledge data to be extracted, comparing the network address with the contents of a preset network address list, and sending the knowledge data extraction instruction if the network address is in the network address list, and otherwise not sending it;
    receiving the feedback information from the knowledge data source, extracting form keywords describing the form of the data source from the feedback information, and determining the form of the knowledge data source according to the form keywords;
    acquiring an extraction method corresponding to the form of the knowledge data source, and extracting the knowledge data of the knowledge data source according to the extraction method.
  3. The knowledge data storage method according to claim 1, wherein extracting the entity information in the knowledge data, vectorizing the entity information to generate the entity data vector, extracting the relationship information in the knowledge data, and vectorizing the relationship information to generate the relationship data vector includes:
    acquiring the vector dimension corresponding to the entity information according to the amount of entity data in an existing knowledge graph, and acquiring the vector dimension corresponding to the relationship information according to the amount of relationship data in the existing knowledge graph;
    generating the elements of each dimension of the vector corresponding to the entity information according to the vector dimension corresponding to the entity information and the entity data contained in the knowledge data of the knowledge data source, to obtain an initial entity data vector;
    generating the elements of each dimension of the vector corresponding to the relationship information according to the vector dimension corresponding to the relationship information and the relationship data contained in the knowledge data of the knowledge data source, to obtain an initial relationship data vector;
    normalizing the initial entity data vector to obtain the entity data vector, and normalizing the initial relationship data vector to obtain the relationship data vector.
  4. The knowledge data storage method according to claim 1, wherein acquiring the entity ID of the entity data vector and the relationship ID of the relationship data vector, clustering knowledge data having the same entity ID to form the knowledge data set, and clustering knowledge data having the same relationship ID within the knowledge data set to form the knowledge data subsets includes:
    transposing the entity data vector and multiplying it with the original entity data vector to form an entity information matrix, where the elements of the entity information matrix are product values of the entity data contained in the knowledge data of the knowledge data source;
    binarizing the entity information matrix to obtain a binarized entity information matrix, acquiring the main diagonal elements of the binarized entity information matrix, and adding the main diagonal elements to obtain the entity ID;
    extracting knowledge data having the same entity ID and sorting it in the chronological order in which the knowledge data was generated, to form a knowledge data set;
    transposing the relationship data vector and multiplying it with the original relationship data vector to form a relationship information matrix, where the elements of the relationship information matrix are product values of the relationship data contained in the knowledge data of the knowledge data source;
    binarizing the relationship information matrix to obtain a binarized relationship information matrix, acquiring the main diagonal elements of the binarized relationship information matrix, and adding the main diagonal elements to obtain the relationship ID;
    traversing the knowledge data set, extracting the knowledge data carrying the relationship ID from the relationship information contained in the knowledge data set, and sorting it according to the position of the knowledge data in the knowledge data set at the time of extraction, to form a knowledge data subset.
  5. The knowledge data storage method according to claim 1, wherein calculating the information similarity of any two of the knowledge data subsets and establishing a knowledge graph node between knowledge data subsets whose information similarity is greater than the preset similarity threshold includes:
    discretizing the knowledge data in the knowledge data subsets to obtain discrete values of the knowledge data subsets;
    passing the discrete values corresponding to any two data subsets into a similarity function for computation, and obtaining the information similarity of the two data subsets as the output;
    passing the information similarity into an error correction function to obtain a corrected information similarity, comparing the corrected information similarity with the similarity threshold, and establishing a knowledge graph node between the knowledge data subsets if the corrected information similarity is greater than the similarity threshold, and otherwise not establishing one.
  6. The knowledge data storage method according to claim 1, wherein acquiring the feature information of the nodes of the knowledge graph and storing the knowledge data in the database according to the correspondence between the feature information and database storage locations includes:
    extracting the attribute information contained in the knowledge data subset connected to a node of the knowledge graph, and acquiring the attribute value of the attribute information;
    using the attribute value as a key for storage in the database, and acquiring the database storage location corresponding to the key;
    establishing a dendritic storage index of the knowledge data according to the database storage location, and storing the knowledge data in the knowledge data subset connected to the node of the knowledge graph in the database according to the node position of the knowledge data subset in the dendritic storage index.
  7. The knowledge data storage method according to claim 2, wherein acquiring the extraction method corresponding to the form of the knowledge data source and extracting the knowledge data of the knowledge data source according to the extraction method includes:
    if the form of the knowledge data source is unstructured text data, using a neural network model to extract the knowledge data of the knowledge data source, including:
    acquiring the unstructured text data, and performing matrix conversion on the unstructured text data according to a pre-trained word vector layer to generate a text matrix, where the elements of the text matrix are numericized unstructured text data;
    performing regularization processing on the text matrix to obtain a regularized text matrix; extracting the numerical elements in the regularized text matrix, passing the numerical elements into a cross-entropy loss function for computation, obtaining corrected numerical elements as the output, and returning the corrected numerical elements to their original positions in the regularized text matrix to obtain a corrected regularized text matrix, where the cross-entropy loss function is calculated as:
    (formula image: PCTCN2019118619-appb-100001)
    where L(θ) denotes the corrected numerical element, m denotes the total number of predefined relationship types, r_i is the probability value of the i-th predefined relationship type and takes the value 0 or 1, M is the total number of predefined labels, y_j is the probability value of the j-th predefined label and takes the value 0 or 1, and θ denotes the numerical element;
    feeding the elements of the corrected regularized text matrix in sequence into a long short-term memory neural network model for training to obtain the feature encoding of the unstructured text data, and extracting the knowledge data of the knowledge data source according to the feature encoding.
  8. A knowledge data storage device, including:
    a data acquisition module, configured to send a knowledge data extraction instruction to a knowledge data source of knowledge data to be extracted, receive feedback information from the knowledge data source, and extract the knowledge data of the knowledge data source according to the form of knowledge data contained in the feedback information;
    a vector generation module, configured to extract entity information in the knowledge data, vectorize the entity information to generate an entity data vector, extract relationship information in the knowledge data, and vectorize the relationship information to generate a relationship data vector;
    a data clustering module, configured to acquire an entity ID of the entity data vector and a relationship ID of the relationship data vector, cluster knowledge data having the same entity ID to form a knowledge data set, and cluster knowledge data having the same relationship ID within the knowledge data set to form knowledge data subsets;
    a node establishment module, configured to calculate the information similarity of any two of the knowledge data subsets and establish a knowledge graph node between knowledge data subsets whose information similarity is greater than a preset similarity threshold;
    a data storage module, configured to acquire feature information of the nodes of the knowledge graph and store the knowledge data in a database according to the correspondence between the feature information and database storage locations.
  9. The knowledge data storage device according to claim 8, wherein the data acquisition module is further configured to:
    acquire a network address of the knowledge data source of the knowledge data to be extracted, compare the network address with the contents of a preset network address list, and send the knowledge data extraction instruction if the network address is in the network address list, and otherwise not send it;
    receive the feedback information from the knowledge data source, extract form keywords describing the form of the data source from the feedback information, and determine the form of the knowledge data source according to the form keywords;
    acquire an extraction method corresponding to the form of the knowledge data source, and extract the knowledge data of the knowledge data source according to the extraction method.
  10. The knowledge data storage device according to claim 8, wherein the vector generation module is further configured to:
    acquire the vector dimension corresponding to the entity information according to the amount of entity data in an existing knowledge graph, and acquire the vector dimension corresponding to the relationship information according to the amount of relationship data in the existing knowledge graph;
    generate the elements of each dimension of the vector corresponding to the entity information according to the vector dimension corresponding to the entity information and the entity data contained in the knowledge data of the knowledge data source, to obtain an initial entity data vector;
    generate the elements of each dimension of the vector corresponding to the relationship information according to the vector dimension corresponding to the relationship information and the relationship data contained in the knowledge data of the knowledge data source, to obtain an initial relationship data vector;
    normalize the initial entity data vector to obtain the entity data vector;
    normalize the initial relationship data vector to obtain the relationship data vector.
  11. The knowledge data storage device according to claim 8, wherein the data clustering module is further configured to:
    transpose the entity data vector and multiply it with the original entity data vector to form an entity information matrix, where the elements of the entity information matrix are product values of the entity data contained in the knowledge data of the knowledge data source;
    binarize the entity information matrix to obtain a binarized entity information matrix, acquire the main diagonal elements of the binarized entity information matrix, and add the main diagonal elements to obtain the entity ID;
    extract knowledge data having the same entity ID and sort it in the chronological order in which the knowledge data was generated, to form a knowledge data set;
    transpose the relationship data vector and multiply it with the original relationship data vector to form a relationship information matrix, where the elements of the relationship information matrix are product values of the relationship data contained in the knowledge data of the knowledge data source;
    binarize the relationship information matrix to obtain a binarized relationship information matrix, acquire the main diagonal elements of the binarized relationship information matrix, and add the main diagonal elements to obtain the relationship ID;
    traverse the knowledge data set, extract the knowledge data carrying the relationship ID from the relationship information contained in the knowledge data set, and sort it according to the position of the knowledge data in the knowledge data set at the time of extraction, to form a knowledge data subset.
  12. The knowledge data storage device according to claim 8, wherein the node establishment module is further configured to:
    discretize the knowledge data in the knowledge data subsets to obtain discrete values of the knowledge data subsets;
    pass the discrete values corresponding to any two data subsets into a similarity function for computation, and obtain the information similarity of the two data subsets as the output;
    pass the information similarity into an error correction function to obtain a corrected information similarity, compare the corrected information similarity with the similarity threshold, and establish a knowledge graph node between the knowledge data subsets if the corrected information similarity is greater than the similarity threshold, and otherwise not establish one.
  13. The knowledge data storage device according to claim 8, wherein the data storage module is further configured to:
    extract the attribute information contained in the knowledge data subset connected to a node of the knowledge graph, and acquire the attribute value of the attribute information;
    use the attribute value as a key for storage in the database, and acquire the database storage location corresponding to the key;
    establish a dendritic storage index of the knowledge data according to the database storage location, and store the knowledge data in the knowledge data subset connected to the node of the knowledge graph in the database according to the node position of the knowledge data subset in the dendritic storage index.
  14. The knowledge data storage device according to claim 13, wherein the data acquisition module is further configured to:
    if the form of the knowledge data source is unstructured text data, use a neural network model to extract the knowledge data of the knowledge data source, including:
    acquiring the unstructured text data, and performing matrix conversion on the unstructured text data according to a pre-trained word vector layer to generate a text matrix, where the elements of the text matrix are numericized unstructured text data;
    performing regularization processing on the text matrix to obtain a regularized text matrix; extracting the numerical elements in the regularized text matrix, passing the numerical elements into a cross-entropy loss function for computation, obtaining corrected numerical elements as the output, and returning the corrected numerical elements to their original positions in the regularized text matrix to obtain a corrected regularized text matrix, where the cross-entropy loss function is calculated as:
    (formula image: PCTCN2019118619-appb-100002)
    where L(θ) denotes the corrected numerical element, m denotes the total number of predefined relationship types, r_i is the probability value of the i-th predefined relationship type and takes the value 0 or 1, M is the total number of predefined labels, y_j is the probability value of the j-th predefined label and takes the value 0 or 1, and θ denotes the numerical element;
    feeding the elements of the corrected regularized text matrix in sequence into a long short-term memory neural network model for training to obtain the feature encoding of the unstructured text data, and extracting the knowledge data of the knowledge data source according to the feature encoding.
  15. A computer device, including a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps:
    sending a knowledge data extraction instruction to a knowledge data source of knowledge data to be extracted, receiving feedback information from the knowledge data source, and extracting the knowledge data of the knowledge data source according to the form of knowledge data contained in the feedback information;
    extracting entity information in the knowledge data, vectorizing the entity information to generate an entity data vector, extracting relationship information in the knowledge data, and vectorizing the relationship information to generate a relationship data vector;
    acquiring an entity ID of the entity data vector and a relationship ID of the relationship data vector, clustering knowledge data having the same entity ID to form a knowledge data set, and clustering knowledge data having the same relationship ID within the knowledge data set to form knowledge data subsets;
    calculating the information similarity of any two of the knowledge data subsets, and establishing a knowledge graph node between knowledge data subsets whose information similarity is greater than a preset similarity threshold;
    acquiring feature information of the nodes of the knowledge graph, and storing the knowledge data in a database according to the correspondence between the feature information and database storage locations.
  16. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    sending a knowledge data extraction instruction to a knowledge data source of knowledge data to be extracted, receiving feedback information from the knowledge data source, and extracting the knowledge data of the knowledge data source according to the form of knowledge data contained in the feedback information;
    extracting entity information in the knowledge data, vectorizing the entity information to generate an entity data vector, extracting relationship information in the knowledge data, and vectorizing the relationship information to generate a relationship data vector;
    acquiring an entity ID of the entity data vector and a relationship ID of the relationship data vector, clustering knowledge data having the same entity ID to form a knowledge data set, and clustering knowledge data having the same relationship ID within the knowledge data set to form knowledge data subsets;
    calculating the information similarity of any two of the knowledge data subsets, and establishing a knowledge graph node between knowledge data subsets whose information similarity is greater than a preset similarity threshold;
    acquiring feature information of the nodes of the knowledge graph, and storing the knowledge data in a database according to the correspondence between the feature information and database storage locations.
  17. The storage medium storing computer-readable instructions according to claim 16, wherein the computer-readable instructions, when executed by the one or more processors and performing the step of sending the knowledge data extraction instruction to the knowledge data source of the knowledge data to be extracted, receiving the feedback information from the knowledge data source, and extracting the knowledge data of the knowledge data source according to the form of knowledge data contained in the feedback information, further cause the one or more processors to perform the following steps:
    acquiring a network address of the knowledge data source of the knowledge data to be extracted, comparing the network address with the contents of a preset network address list, and sending the knowledge data extraction instruction if the network address is in the network address list, and otherwise not sending it;
    receiving the feedback information from the knowledge data source, extracting form keywords describing the form of the data source from the feedback information, and determining the form of the knowledge data source according to the form keywords;
    acquiring an extraction method corresponding to the form of the knowledge data source, and extracting the knowledge data of the knowledge data source according to the extraction method.
  18. The storage medium storing computer-readable instructions according to claim 16, wherein the computer-readable instructions, when executed by the one or more processors and performing the step of extracting the entity information in the knowledge data, vectorizing the entity information to generate the entity data vector, extracting the relationship information in the knowledge data, and vectorizing the relationship information to generate the relationship data vector, further cause the one or more processors to perform the following steps:
    acquiring the vector dimension corresponding to the entity information according to the amount of entity data in an existing knowledge graph, and acquiring the vector dimension corresponding to the relationship information according to the amount of relationship data in the existing knowledge graph;
    generating the elements of each dimension of the vector corresponding to the entity information according to the vector dimension corresponding to the entity information and the entity data contained in the knowledge data of the knowledge data source, to obtain an initial entity data vector;
    generating the elements of each dimension of the vector corresponding to the relationship information according to the vector dimension corresponding to the relationship information and the relationship data contained in the knowledge data of the knowledge data source, to obtain an initial relationship data vector;
    normalizing the initial entity data vector to obtain the entity data vector;
    normalizing the initial relationship data vector to obtain the relationship data vector.
  19. The storage medium storing computer-readable instructions according to claim 16, wherein the computer-readable instructions, when executed by the one or more processors and performing the step of acquiring the entity ID of the entity data vector and the relationship ID of the relationship data vector, clustering knowledge data having the same entity ID to form the knowledge data set, and clustering knowledge data having the same relationship ID within the knowledge data set to form the knowledge data subsets, further cause the one or more processors to perform the following steps:
    transposing the entity data vector and multiplying it with the original entity data vector to form an entity information matrix, where the elements of the entity information matrix are product values of the entity data contained in the knowledge data of the knowledge data source;
    binarizing the entity information matrix to obtain a binarized entity information matrix, acquiring the main diagonal elements of the binarized entity information matrix, and adding the main diagonal elements to obtain the entity ID;
    extracting knowledge data having the same entity ID and sorting it in the chronological order in which the knowledge data was generated, to form a knowledge data set;
    transposing the relationship data vector and multiplying it with the original relationship data vector to form a relationship information matrix, where the elements of the relationship information matrix are product values of the relationship data contained in the knowledge data of the knowledge data source;
    binarizing the relationship information matrix to obtain a binarized relationship information matrix, acquiring the main diagonal elements of the binarized relationship information matrix, and adding the main diagonal elements to obtain the relationship ID;
    traversing the knowledge data set, extracting the knowledge data carrying the relationship ID from the relationship information contained in the knowledge data set, and sorting it according to the position of the knowledge data in the knowledge data set at the time of extraction, to form a knowledge data subset.
  20. The storage medium storing computer-readable instructions according to claim 16, wherein the computer-readable instructions, when executed by the one or more processors and performing the step of calculating the information similarity of any two of the knowledge data subsets and establishing a knowledge graph node between knowledge data subsets whose information similarity is greater than the preset similarity threshold, further cause the one or more processors to perform the following steps:
    discretizing the knowledge data in the knowledge data subsets to obtain discrete values of the knowledge data subsets;
    passing the discrete values corresponding to any two data subsets into a similarity function for computation, and obtaining the information similarity of the two data subsets as the output;
    passing the information similarity into an error correction function to obtain a corrected information similarity, comparing the corrected information similarity with the similarity threshold, and establishing a knowledge graph node between the knowledge data subsets if the corrected information similarity is greater than the similarity threshold, and otherwise not establishing one.
PCT/CN2019/118619 2019-01-11 2019-11-15 Knowledge data storage method, device, computer apparatus, and storage medium WO2020143326A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910025164.2 2019-01-11
CN201910025164.2A CN109885692B (en) 2019-01-11 2019-01-11 Knowledge data storage method, apparatus, computer device and storage medium

Publications (1)

Publication Number Publication Date
WO2020143326A1 true WO2020143326A1 (en) 2020-07-16

Family

ID=66925945

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/118619 WO2020143326A1 (en) 2019-01-11 2019-11-15 Knowledge data storage method, device, computer apparatus, and storage medium

Country Status (2)

Country Link
CN (1) CN109885692B (en)
WO (1) WO2020143326A1 (en)


Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109885692B (en) * 2019-01-11 2023-06-16 平安科技(深圳)有限公司 Knowledge data storage method, apparatus, computer device and storage medium
CN110569372B (en) * 2019-09-20 2022-08-30 四川大学 Construction method of heart disease big data knowledge graph system
CN111026865B (en) * 2019-10-18 2023-07-21 平安科技(深圳)有限公司 Knowledge graph relationship alignment method, device, equipment and storage medium
CN111752943A (en) * 2020-05-19 2020-10-09 北京网思科平科技有限公司 Map relation path positioning method and system
CN112364173B (en) * 2020-10-21 2022-03-18 中国电子科技网络信息安全有限公司 IP address mechanism tracing method based on knowledge graph
CN112328791A (en) * 2020-11-09 2021-02-05 济南大学 Text classification method of Chinese government affair information based on DiTextCNN
CN112380355A (en) * 2020-11-20 2021-02-19 华南理工大学 Method for representing and storing time slot heterogeneous knowledge graph
CN115129719A (en) * 2022-06-28 2022-09-30 深圳市规划和自然资源数据管理中心 Knowledge graph-based qualitative position space range construction method
CN115187153B (en) * 2022-09-14 2022-12-09 杭银消费金融股份有限公司 Data processing method and system applied to business risk tracing
CN116720578B (en) * 2023-05-12 2024-01-23 航天恒星科技有限公司 Storage method of knowledge graph with space-time characteristics
CN117033541B (en) * 2023-10-09 2023-12-19 中南大学 Space-time knowledge graph indexing method and related equipment


Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105279277A (en) * 2015-11-12 2016-01-27 百度在线网络技术(北京)有限公司 Knowledge data processing method and device
US9727554B2 (en) * 2015-11-24 2017-08-08 International Business Machines Corporation Knowledge-based editor with natural language interface
CN107943874B (en) * 2017-11-13 2019-08-23 平安科技(深圳)有限公司 Knowledge mapping processing method, device, computer equipment and storage medium
CN108595449A (en) * 2017-11-23 2018-09-28 北京科东电力控制系统有限责任公司 The structure and application process of dispatch automated system knowledge mapping
CN107943998B (en) * 2017-12-05 2021-05-11 竹间智能科技(上海)有限公司 Man-machine conversation control system and method based on knowledge graph
CN108345647B (en) * 2018-01-18 2021-12-03 北京邮电大学 Web-based domain knowledge graph construction system and method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190005163A1 (en) * 2017-06-29 2019-01-03 International Business Machines Corporation Extracting a knowledge graph from program source code
CN107665252A (en) * 2017-09-27 2018-02-06 深圳证券信息有限公司 A kind of method and device of creation of knowledge collection of illustrative plates
CN107944012A (en) * 2017-12-08 2018-04-20 北京百度网讯科技有限公司 Knowledge data computing system, method, server and storage medium
CN108804419A (en) * 2018-05-22 2018-11-13 湖南大学 Medicine is sold accurate recommended technology under a kind of line of knowledge based collection of illustrative plates
CN109086347A (en) * 2018-07-13 2018-12-25 武汉尼维智能科技有限公司 A kind of construction method, device and the storage medium of international ocean shipping dangerous cargo knowledge mapping system
CN109885692A (en) * 2019-01-11 2019-06-14 平安科技(深圳)有限公司 Knowledge data storage method, device, computer equipment and storage medium

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111932174A (en) * 2020-07-28 2020-11-13 中华人民共和国深圳海关 Freight monitoring abnormal information acquisition method, device, server and storage medium
CN112256927A (en) * 2020-10-21 2021-01-22 网易(杭州)网络有限公司 Method and device for processing knowledge graph data based on attribute graph
CN112256884A (en) * 2020-10-23 2021-01-22 国网辽宁省电力有限公司信息通信分公司 Knowledge graph-based data asset library access method and device
CN112306687A (en) * 2020-10-30 2021-02-02 平安数字信息科技(深圳)有限公司 Resource allocation method and device based on knowledge graph, computer equipment and medium
CN112612899A (en) * 2020-11-24 2021-04-06 中国传媒大学 Knowledge graph construction method and device, storage medium and electronic equipment
CN112579789A (en) * 2020-12-04 2021-03-30 珠海格力电器股份有限公司 Equipment fault diagnosis method and device and equipment
CN112487214A (en) * 2020-12-23 2021-03-12 中译语通科技股份有限公司 Knowledge graph relation extraction method and system based on entity co-occurrence matrix
CN112633504A (en) * 2020-12-23 2021-04-09 北京工业大学 Wisdom cloud knowledge service system and method for fruit tree diseases and insect pests based on knowledge graph
CN112650858B (en) * 2020-12-29 2023-09-26 中国平安人寿保险股份有限公司 Emergency assistance information acquisition method and device, computer equipment and medium
CN112650858A (en) * 2020-12-29 2021-04-13 中国平安人寿保险股份有限公司 Method and device for acquiring emergency assistance information, computer equipment and medium
CN112883735A (en) * 2021-02-10 2021-06-01 海尔数字科技(上海)有限公司 Form image structured processing method, device, equipment and storage medium
CN112883735B (en) * 2021-02-10 2024-01-12 卡奥斯数字科技(上海)有限公司 Method, device, equipment and storage medium for structured processing of form image
CN113094506A (en) * 2021-04-14 2021-07-09 每日互动股份有限公司 Early warning method based on relation map, computer equipment and storage medium
CN113094506B (en) * 2021-04-14 2023-08-18 每日互动股份有限公司 Early warning method based on relational graph, computer equipment and storage medium
CN113312410B (en) * 2021-06-10 2023-11-21 平安证券股份有限公司 Data map construction method, data query method and terminal equipment
CN113312410A (en) * 2021-06-10 2021-08-27 平安证券股份有限公司 Data map construction method, data query method and terminal equipment
CN113590835A (en) * 2021-07-28 2021-11-02 上海致景信息科技有限公司 Method and device for establishing knowledge graph of textile industry data and processor
CN113837028A (en) * 2021-09-03 2021-12-24 广州大学 Road flow analysis method and device based on space-time knowledge graph
CN114840686A (en) * 2022-05-07 2022-08-02 中国电信股份有限公司 Knowledge graph construction method, device and equipment based on metadata and storage medium
CN114840686B (en) * 2022-05-07 2024-01-02 中国电信股份有限公司 Knowledge graph construction method, device, equipment and storage medium based on metadata
CN116523039A (en) * 2023-04-26 2023-08-01 华院计算技术(上海)股份有限公司 Continuous casting knowledge graph generation method and device, storage medium and terminal
CN116523039B (en) * 2023-04-26 2024-02-09 华院计算技术(上海)股份有限公司 Continuous casting knowledge graph generation method and device, storage medium and terminal

Also Published As

Publication number Publication date
CN109885692B (en) 2023-06-16
CN109885692A (en) 2019-06-14

Similar Documents

Publication Publication Date Title
WO2020143326A1 (en) Knowledge data storage method, device, computer apparatus, and storage medium
WO2020182019A1 (en) Image search method, apparatus, device, and computer-readable storage medium
WO2020143184A1 (en) Knowledge fusion method and apparatus, computer device, and storage medium
Dong et al. From data fusion to knowledge fusion
WO2017118427A1 (en) Webpage training method and device, and search intention identification method and device
US20150331936A1 (en) Method and system for extracting a product and classifying text-based electronic documents
CN108647322B (en) Method for identifying similarity of mass Web text information based on word network
US11556590B2 (en) Search systems and methods utilizing search based user clustering
CN112115232A (en) Data error correction method and device and server
US11886515B2 (en) Hierarchical clustering on graphs for taxonomy extraction and applications thereof
Song et al. Improving neural named entity recognition with gazetteers
US20220101057A1 (en) Systems and methods for tagging datasets using models arranged in a series of nodes
CN110019474B (en) Automatic synonymy data association method and device in heterogeneous database and electronic equipment
CN116127090A (en) Aviation system knowledge graph construction method based on fusion and semi-supervision information extraction
CN113971210B (en) Data dictionary generation method and device, electronic equipment and storage medium
WO2019163642A1 (en) Summary evaluation device, method, program, and storage medium
US10866944B2 (en) Reconciled data storage system
US10614031B1 (en) Systems and methods for indexing and mapping data sets using feature matrices
Liu et al. A framework for image dark data assessment
Zandieh et al. Clustering data text based on semantic
Ramirez et al. Natural language inference over tables: Enabling explainable data exploration on data lakes
Shao et al. Web and Big Data: Third International Joint Conference, APWeb-WAIM 2019, Chengdu, China, August 1–3, 2019, Proceedings, Part I
Ajeissh et al. An adaptive distributed approach of a self organizing map model for document clustering using ring topology
Manzoor et al. Toward a New Paradigm for Author Name Disambiguation
JP5569908B2 (en) Analogue device, analogy method and program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19908527

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19908527

Country of ref document: EP

Kind code of ref document: A1