CN110543570B - Knowledge graph storage method based on Hash addressing - Google Patents
- Publication number
- CN110543570B (application CN201910689943.2A)
- Authority
- CN
- China
- Prior art keywords
- entity
- link
- hash
- resource
- linked
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/31—Indexing; Data structures therefor; Storage structures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/367—Ontology
Abstract
The invention provides a knowledge graph storage method based on Hash addressing, comprising the following steps: acquiring a knowledge graph; and storing the knowledge graph; wherein the data structure of each entity in the knowledge graph comprises data and a link array; the link array comprises one or more links, each link comprising a link name and the Hash of the entity that the link points to; and the data is the information of the entity. The invention realizes Hash-addressed storage of a knowledge graph, which improves update and query speed, avoids redundant storage of resources, and greatly reduces storage consumption.
Description
Technical Field
The invention belongs to the technical field of distributed knowledge organization, and particularly relates to a knowledge graph storage method based on Hash addressing.
Background
With the development and application of artificial intelligence technology, knowledge graphs are gaining attention in both academia and industry, and are applied in fields such as intelligent search, intelligent question answering, personalized recommendation, and content distribution. A knowledge graph aims to describe the entities that exist in the real world and the relationships among them. It forms a huge semantic network graph in which nodes represent concepts or instances and edges represent relationships or attributes.
Knowledge graphs generally use relational databases, graph databases, key-value databases, document databases, or the like as their underlying storage engine. The advantage of a graph database is that it represents the structure of the knowledge graph intuitively: nodes in the graph represent entities of the knowledge graph, and edges represent relationships between entities. Its disadvantages are that updating a graph database is complex, that data updates and queries are slow, and that operations on very large nodes, that is, nodes with a large number of edges, slow down sharply. As the data volume grows and relationships become more complex, the number of relationships that the knowledge graph must process grows geometrically with the data volume.
In summary, conventional graph databases suffer from slow updates and queries, redundant storage of resources, and similar problems. A new, more efficient way to represent and address a knowledge graph is therefore needed.
Disclosure of Invention
In order to overcome the problem of slow update and query speed of the existing knowledge graph storage method or at least partially solve the problem, embodiments of the present invention provide a knowledge graph storage method based on hash addressing.
According to a first aspect of the embodiments of the present invention, there is provided a method for storing a knowledge graph based on hash addressing, including:
acquiring a knowledge graph;
storing the knowledge graph;
the data structures of the entities in the knowledge graph comprise data and a link array;
the link array comprises one or more links, and each link comprises a link name and a Hash of an entity linked to the entity;
the data is information of the entity.
According to a second aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor calls the program instruction to execute the method for storing a knowledge graph based on hash addressing according to any one of the various possible implementations of the first aspect.
The embodiment of the invention provides a knowledge graph storage method based on Hash addressing, which stores a knowledge graph with a specific data structure: the data structure of each entity in the knowledge graph comprises data and a link array, each link array comprises one or more links, each link comprises a link name and the Hash of the entity linked to, and the information of the entity is stored in the data. A Hash-addressed storage mode for the knowledge graph is thereby realized. On the one hand, the complex nonlinear directed graph structure of the knowledge graph is converted into a linear array structure, which avoids redundant storage of resources and greatly reduces storage consumption; on the other hand, a stored entity can be found quickly through its Hash value in the data structure, which improves update and query speed.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic overall flow chart of a knowledge graph storage method based on Hash addressing according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the resource connections of an entity in the knowledge graph storage method based on Hash addressing according to the present invention;
FIG. 3 is a schematic diagram of adding links and entities in the knowledge graph storage method based on Hash addressing according to the present invention;
FIG. 4 is a schematic diagram of link deletion in the knowledge graph storage method based on Hash addressing according to the present invention;
FIG. 5 is a schematic diagram of updating an entity in the knowledge graph storage method based on Hash addressing according to the present invention;
FIG. 6 is a schematic diagram of the storage mechanism for entities with the same name in the knowledge graph storage method based on Hash addressing according to the present invention;
FIG. 7 is a schematic diagram of the overall structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In an embodiment of the present invention, a knowledge graph storage method based on Hash addressing is provided. FIG. 1 is a schematic overall flow chart of the method, which includes: S101, acquiring a knowledge graph; S102, storing the knowledge graph; wherein the data structure of each entity in the knowledge graph comprises data and a link array; the link array comprises one or more links, and each link comprises a link name and the Hash of the entity that the link points to; the data is the information of the entity.
Wherein, the entity is described by adopting the following data structure:
data: information representing entities, such as resource content or non-link attributes;
links: an array of Link data structures, the entities being linked to other entities by links;
the Link data structure contains two fields:
name: the name of Link;
hash: the Hash of the entity to which the Link is linked.
In the prior art, the knowledge graph is a complex nonlinear directed graph structure, and the embodiment represents the data structure of the knowledge graph as a linear array structure, so that less storage space is occupied when physical storage is performed according to the data structure of the knowledge graph.
In the embodiment, the knowledge graph of a specific data structure is stored, the data structure of each entity in the knowledge graph comprises data and a link array, each link array comprises one or more links, each link comprises a link name and a Hash of the entity linked to the entity, and the information of the entity is stored in the data, so that a knowledge graph storage mode based on Hash addressing is realized, on one hand, a complex nonlinear directed graph structure of the knowledge graph is converted into a linear array structure, redundant storage of resources is avoided, and storage consumption is greatly reduced; on the other hand, the stored entity can be quickly searched through the Hash value of the entity in the data structure, and the updating and inquiring speed is improved.
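The data/links structure described above can be sketched in a few lines of Python. This is a minimal illustration only: the use of SHA-256, the in-memory dictionary `store`, and the names `h`, `Link`, and `Entity` are assumptions made for the example and do not come from the patent text.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Any


def h(text: str) -> str:
    """Hex digest used as an entity's Hash (SHA-256 is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class Link:
    name: str  # name of the link
    hash: str  # Hash of the entity the link points to


@dataclass
class Entity:
    data: Any = None                                 # information of the entity
    links: list[Link] = field(default_factory=list)  # linear array of links


# A flat store keyed by Hash: following a link is a single lookup.
store: dict[str, Entity] = {
    h("Aaron"): Entity(links=[Link(name="Category", hash=h("human"))]),
    h("human"): Entity(),
}

aaron = store[h("Aaron")]
target = store[aaron.links[0].hash]
```

Because each entity holds only the hashes of its neighbours, the graph is stored as a collection of flat records, and traversing an edge amounts to one keyed lookup.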
On the basis of the above embodiments, the entities in the knowledge graph in this embodiment include resource entities and non-resource entities. If the linked entity is a resource entity, its Hash is the Hash of its content; if the linked entity is a non-resource entity, its Hash is the Hash of its name. The information of a resource entity is its resource content, which is unstructured data. The information of a non-resource entity is its non-link attributes, stored as an array; each element of the array comprises the key and the value of a non-link attribute, where the key is the hash value of the non-link attribute. The link name is the hash value of the relationship between the entity and the entity it links to.
In this embodiment, entities in the knowledge graph are divided into resource entities and non-resource entities. Resource entities include picture resources, web page resources, and the like. A non-resource entity is a concept or an instance, such as "human", a particular person, or a place. This embodiment describes the data structure of an entity in the knowledge graph. Because relationships in the knowledge graph may be cyclic, the knowledge graph is represented as a directed graph. Since the knowledge graph is expressed as subject-attribute-object triples, in this embodiment the subject is referred to as the link source node, the relationship is referred to as the link, and the object is referred to as the link target node.
The data structures of both resource entities and non-resource entities comprise Data and a link array Links. A resource entity is described by the following data structure:
data: unstructured data representing resource content of the resource entity;
links: an array of Link data structures, the entities being linked to other entities by links;
the Link data structure contains two fields:
name: the name of Link;
hash: the Hash of the entity to which the Link is linked.
For the data structure of a non-resource entity, the Links field is extended and the Data field is redefined, so that the structure better suits the representation of entities and relationships in the knowledge graph.
The following parameters are first defined:
<root>: the entity that issues the link;
<name>: the link name;
<ref>: the entity to be linked to.
A non-resource entity is a data structure that contains two fields:
links: an array of Link data structures, through which the entity is linked to other entities.
The Link data structure contains two fields:
name: the name of the Link, which is the hash value of the relationship between root and ref, that is, the relationship of ref relative to root;
hash: the hash value of ref. When ref is a non-resource entity, the hash value is obtained by hashing the name of ref directly; when ref is a resource entity, the hash value is obtained by hashing the content of ref.
To suit the expression of entities and relationships in the knowledge graph, the relationship between the Name and the Hash in one Link can be one-to-one, one-to-many, many-to-one, or many-to-many.
data: an array of data structures, through which a non-resource entity stores its non-link attributes;
each data structure contains two fields:
key: the key of the non-link attribute, obtained by hashing the non-link attribute directly;
value: a value of string or number type that expresses the value of a non-link attribute of the non-resource entity.
The data structure of the entities in the knowledge graph is as follows:
In one example, the data structure of a human entity named Aaron is described as follows:
The above example defines some of the link attributes and non-link attributes of the human entity Aaron. The link attributes include:
a link with the relationship names "Wikipedia" and "Article" to a web page entity, that is, a Wikipedia attribute that is also an article attribute: the Wikipedia page of the Aaron entry;
a link with the relationship names "Wikiquote" and "Article" to a web page entity, that is, a Wikiquote attribute that is also an article attribute: the Wikiquote page of the Aaron entry;
a link with the relationship name "Category" to the class entity human;
a link with the relationship name "gender" to the class entity male;
a link with the relationship name "profession" to two class entities, dramatist and screenwriter;
a link with the relationship name "photo" to a photo entity.
The non-link attributes include:
the index code in the National Library of Latvia: 000057405;
the Russian national library number: 000002833.
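The Aaron example can be written out as a plain Python structure to show how hashed link names, content-hashed resources, name-hashed non-resource entities, and hashed attribute keys fit together. This is only a sketch: SHA-256, the list-valued `name` and `hash` fields (chosen to allow one-to-many links), the placeholder resource bytes, and the attribute key labels are all assumptions, since the original listing is not reproduced in the text.

```python
import hashlib


def h(value) -> str:
    """Hex SHA-256 of a string or of raw bytes (the hash function is an assumption)."""
    raw = value if isinstance(value, bytes) else value.encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def link(names, targets):
    """One Link; both fields hold lists so a name/hash pair can be one-to-many."""
    return {"name": [h(n) for n in names], "hash": list(targets)}


# Placeholder resource contents (hypothetical; the real pages and photo are not in the text).
wiki_page = b"placeholder Wikipedia page"
quote_page = b"placeholder Wikiquote page"
photo = b"placeholder photo bytes"

aaron = {
    "links": [
        link(["Wikipedia", "Article"], [h(wiki_page)]),   # resource: hash of content
        link(["Wikiquote", "Article"], [h(quote_page)]),
        link(["Category"], [h("human")]),                 # non-resource: hash of name
        link(["gender"], [h("male")]),
        link(["profession"], [h("dramatist"), h("screenwriter")]),
        link(["photo"], [h(photo)]),
    ],
    "data": [
        # Attribute labels below are illustrative; only the values appear in the text.
        {"key": h("library index code"), "value": "000057405"},
        {"key": h("Russian national library number"), "value": "000002833"},
    ],
}
entity_key = h("Aaron")  # the entity itself is addressed by the hash of its name
```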
Table 1 and FIG. 2 are visual representations of the above example: Table 1 shows the attributes of the example directly, while FIG. 2 shows the links and attributes converted into hash values. The naming and linking of entities in this embodiment are detailed below with reference to FIG. 2.
Naming of entities: the name of the person entity Aaron is hashed directly, and the entity is stored under the hash of the string "Aaron". This meets the requirement of querying the knowledge graph by entity name, speeds up retrieval, and provides a large representation range with few collisions. The web page resource entities linked from the person entity Aaron are named by hashing the web page content, and their data field is extended to store the web page resource content. Under this file storage and naming scheme, only one copy of resources with identical content is kept, which removes redundant resources. Other resource entities of the person entity Aaron, such as pictures, are handled in the same way.
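The deduplication effect of naming a resource by the hash of its content can be illustrated with a small content-addressed store. This is a sketch under stated assumptions: `ResourceStore` is a hypothetical in-memory class, and SHA-256 stands in for whatever hash function an implementation would choose.

```python
import hashlib


class ResourceStore:
    """Stores resource content under the hash of that content."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}

    def put(self, content: bytes) -> str:
        key = hashlib.sha256(content).hexdigest()  # SHA-256 is an assumption
        self._blobs.setdefault(key, content)       # identical content is stored once
        return key

    def get(self, key: str) -> bytes:
        return self._blobs[key]

    def __len__(self) -> int:
        return len(self._blobs)


resources = ResourceStore()
first = resources.put(b"same page content")
second = resources.put(b"same page content")   # a second upload of the same bytes
other = resources.put(b"different page content")
```

Storing the same bytes twice yields the same key and a single stored copy, which is the redundancy removal the paragraph describes.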
Naming of links: as shown in FIG. 2, the person entity Aaron has a link named "Wikipedia", and the link is named directly by the hash value of the string "Wikipedia". This has the following advantages: the storage format is uniform, for example the stored names have equal length, and queries are fast, because each attribute can be used to look up the corresponding storage node directly by its hash value. An attribute such as profession likewise corresponds to a class entity. In addition, the Links and Data arrays in the data structure of this embodiment have no fixed length and can be extended dynamically, which suits the following scenarios: one attribute may have several attribute entities, for example Aaron is both a dramatist and a screenwriter; and one resource may belong to several attributes, for example the Wikipedia web page resource is a resource of both the Wikipedia attribute and the Article attribute. The Data field of a resource file is extended to hold unstructured data, so that the resource entity can be stored conveniently.
TABLE 1 visual representation of data structures for Aaron entities
According to the embodiment, the entities in the knowledge graph are divided into the resource entities and the non-resource entities, then the data structures of the resource entities and the non-resource entities are described in different modes according to the relation between the entities in the knowledge graph, and the resource entities and the non-resource entities are stored according to the described data structures, so that the knowledge graph storage mode based on Hash addressing is realized, the updating and inquiring speed is improved, the redundant storage of resources is avoided, and the storage consumption is greatly reduced.
On the basis of the above embodiment, the present embodiment further includes adding an entity and a link to the knowledge graph by the following steps: determining whether the entity to be linked to exists in the knowledge graph, and if not, creating it; taking the hash value of the entity to which the link is to be added as the input of the Kademlia algorithm for hash addressing, to obtain the entity to which the link is to be added; combining the Hash of the entity to be linked to and the Hash of the name of the link to be added into a Link type object; and appending the Link type object to the end of the link array of the entity to which the link is to be added.
Specifically, to accommodate the dynamic nature of entities and relationships in the knowledge graph, entities and relationships need to be added, deleted, and changed. FIG. 3 is a schematic diagram of the method for adding an entity and a link. When entities and relationships need to be added to the knowledge graph, the required parameters are as follows:
<root>: the hash of the entity to modify;
<name>: the name of the link to be created;
<ref>: the entity to be linked to.
The present embodiment creates a new entity and links it from an existing entity. Since the hash value of a non-resource entity is calculated from its name and the hash value of a resource entity is calculated from its content data, adding a relationship does not require modifying the hash value of the upper-level entity. If the entity to be linked to does not exist, it must be created first; the name of the entity is a required parameter for creation, while the Data and Links arrays are optional. The value of the parameter <root> is taken as the input of the Kademlia algorithm, hash addressing is performed, and the source entity to which the link is to be added is found.
Hashing the parameter <name> yields the name of the link, which expresses the relationship of the entity to be linked to relative to the source entity of the link; the purpose of hashing is to facilitate addressing by relationship name. The parameter <ref> and the hashed <name> are combined into a Link type object, which is appended to the end of the Links array of the <root> object to form a new Link.
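The add-link steps above can be sketched as follows. This is a simplified illustration: a local dictionary lookup stands in for Kademlia hash addressing, only non-resource targets (hashed by name) are handled, and `h`, `new_entity`, and `add_link` are hypothetical helper names.

```python
import hashlib


def h(text: str) -> str:
    """Hex SHA-256 of a name (the hash function is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def new_entity() -> dict:
    """An entity with empty Data and Links arrays (both are optional at creation)."""
    return {"data": [], "links": []}


def add_link(store: dict, root: str, name: str, ref_name: str) -> None:
    """Append a link called `name` from the entity at hash `root` to `ref_name`."""
    ref = h(ref_name)
    if ref not in store:                 # create the target entity if it is missing
        store[ref] = new_entity()
    source = store[root]                 # stands in for Kademlia hash addressing
    source["links"].append({"name": h(name), "hash": ref})


store = {h("Aaron"): new_entity()}
add_link(store, h("Aaron"), "profession", "screenwriter")
```

Note that the key of the source entity, `h("Aaron")`, is unchanged by the operation, which matches the observation that adding a relationship does not alter the hash of the upper-level entity.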
On the basis of the foregoing embodiment, in this embodiment the step of taking the hash value of the entity to which the link is to be added as the input of the Kademlia algorithm for hash addressing, and acquiring that entity, specifically includes: checking whether the entity to which the link is to be added is stored on the node initiating the search request, and if so, returning the ID of that node; if not, returning a preset number of nodes whose keys are nearest to the key of the node initiating the search request, and sending the search request to those nodes; each node receiving the search request checks whether it stores the entity to which the link is to be added, and if so, returns its own node ID; if not, it returns the preset number of nodes in its corresponding K-bucket whose keys are closest to its own key; determining whether the node initiating the search request has received a node ID, and if so, ending the search; if not, after the node initiating the search request receives the returned nodes, sending the search request again to those of the returned nodes that have not yet been sent the request, until a node ID is obtained; and acquiring the entity to which the link is to be added according to the node ID, and caching that entity on a node that did not return the node ID.
Specifically, the steps of taking the value of the parameter <root> as the input of the Kademlia algorithm, performing hash addressing, and finding the source entity to which the link is to be added are as follows:
First, the initiator checks whether it stores the <root> entity itself. If so, it directly returns its node ID; otherwise, it returns the K nodes whose Key values are closest to its own Key value and sends a FIND_NODE request, that is, a node search request, to those K nodes.
Second, each node receiving the FIND_NODE request checks whether it stores the <root> entity. If so, it directly returns its own ID; if not, it returns the K nodes in the corresponding K-bucket whose Key values are closest to its own Key value.
Third, if the initiator receives a node ID, the search ends. Otherwise, after receiving the returned nodes, the initiator updates its result list, selects from the K nearest returned nodes those to which no request has yet been sent, and sends the FIND_NODE request to them.
Fourth, the first to third steps are repeated until a node ID is obtained, or until no node closer to <root> than the K nodes currently known to the initiator can be obtained, in which case the node storing the source entity has not been found.
If the node storing the source entity is finally found, the <root> entity is cached on the nearest node that did not return the correct result, so that the next query for the same <root> value is faster.
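The iterative search and the caching step can be sketched with a toy in-memory network. Several simplifications are assumed here and are not taken from the patent: node IDs and keys are small integers, each node keeps a flat peer list instead of real K-buckets, requests are sent one at a time rather than in parallel, and closeness is measured by XOR distance to the searched key, as in standard Kademlia. The names `Node`, `lookup`, and `lookup_and_cache` are hypothetical.

```python
from dataclasses import dataclass, field

K = 2  # preset number of nodes returned per request (value chosen for illustration)


@dataclass
class Node:
    node_id: int
    peers: list[int] = field(default_factory=list)        # flat stand-in for K-buckets
    stored: dict[int, str] = field(default_factory=dict)  # key -> entity

    def find_node(self, key: int):
        """Answer a FIND_NODE request: (True, [own id]) or (False, K closest peers)."""
        if key in self.stored:
            return True, [self.node_id]
        return False, sorted(self.peers, key=lambda p: p ^ key)[:K]


def lookup(network: dict[int, Node], start: int, key: int):
    """Return (holder id or None, ids of queried nodes that did not hold the key)."""
    misses: list[int] = []
    pending = [start]
    while pending:
        pending.sort(key=lambda n: n ^ key)   # query the closest known node first
        current = pending.pop(0)
        found, nodes = network[current].find_node(key)
        if found:
            return current, misses
        misses.append(current)
        pending += [n for n in nodes if n not in misses and n not in pending]
    return None, misses


def lookup_and_cache(network: dict[int, Node], start: int, key: int):
    """Look up `key`, then cache the entity on the nearest node that missed."""
    holder, misses = lookup(network, start, key)
    if holder is not None and misses:
        nearest = min(misses, key=lambda n: n ^ key)
        network[nearest].stored[key] = network[holder].stored[key]
    return holder


network = {
    1: Node(1, peers=[4]),
    4: Node(4, peers=[1, 6]),
    6: Node(6, peers=[4], stored={7: "root entity"}),
}
first_holder = lookup_and_cache(network, start=1, key=7)
second_holder = lookup_and_cache(network, start=1, key=7)
```

In this toy network the first search reaches node 6 through node 4 and caches the entity on node 4, so the second search for the same key ends one hop earlier.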
On the basis of the above embodiment, the present embodiment further includes deleting links from the knowledge-graph by: taking the hash value of the entity to be modified as the input of a Kademlia algorithm for hash addressing, and acquiring the stored entity to be modified; and deleting the specified link item from the link array of the entity to be modified.
Specifically, the deletion method in the present embodiment applies to the deletion of entities and links. The knowledge graph is addressed by content or by name, and an entity may take part in several relationships, each represented by a link pointing to the entity. If deleting a link were bound to deleting the entity, deleting one link would make the entity inaccessible through its other links. The present embodiment therefore separates the deletion of a link from the deletion of an entity. The method of deleting a link is shown in FIG. 4. The parameters required by the method are as follows:
<root>: the entity to modify;
<link>: the link to be removed.
The value of the parameter <root> is taken as the input of the Kademlia algorithm, hash addressing is performed, and the source entity from which the link is to be deleted is found; the specific addressing method is as described above. With root as the source of the link, the item corresponding to the link is deleted from the Links array of the entity, that is, the link is broken.
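A link deletion that leaves the target entity in place can be sketched as below. The dictionary lookup again stands in for Kademlia addressing, and `h` and `delete_link` are hypothetical helpers with SHA-256 as an assumed hash function.

```python
import hashlib


def h(text: str) -> str:
    """Hex SHA-256 of a name (the hash function is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def delete_link(store: dict, root: str, name: str, ref: str) -> bool:
    """Remove the link (name -> ref) from the Links array of the entity at `root`."""
    links = store[root]["links"]   # dictionary lookup stands in for Kademlia addressing
    target = {"name": h(name), "hash": ref}
    if target not in links:
        return False
    links.remove(target)           # only the link is removed, not the entity
    return True


store = {
    h("Aaron"): {"data": [], "links": [
        {"name": h("profession"), "hash": h("screenwriter")},
        {"name": h("gender"), "hash": h("male")},
    ]},
    h("screenwriter"): {"data": [], "links": []},
    h("male"): {"data": [], "links": []},
}
removed = delete_link(store, h("Aaron"), "profession", h("screenwriter"))
removed_again = delete_link(store, h("Aaron"), "profession", h("screenwriter"))
```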
On the basis of the above embodiment, the present embodiment further includes deleting an entity from the knowledge-graph by: performing authority authentication on a user initiating a deletion request; and if the user initiating the deletion request is the user issuing the entity to be deleted, deleting the entity to be deleted.
Specifically, each entity binds the identity of the publisher when publishing to the network, and the owner of the entity has the right to delete the entity after the authority authentication. It should be noted that, for an entity, only its publisher has the right to delete, even if all links pointing to the entity are deleted. If the issuer does not initiate the delete entity request, the entity still exists in the network.
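The publisher-only deletion rule can be sketched as a simple ownership check. The patent does not specify the authentication mechanism, so comparing plain identity strings is purely an assumption here, as are the names `PublishedEntity` and `delete_entity` and the placeholder key `"abc123"`.

```python
from dataclasses import dataclass


@dataclass
class PublishedEntity:
    data: bytes
    publisher: str  # identity bound at publish time (a plain string, as an assumption)


def delete_entity(store: dict[str, PublishedEntity], entity_hash: str, requester: str) -> bool:
    """Delete the entity only if the requester is the user who published it."""
    entity = store.get(entity_hash)
    if entity is None or entity.publisher != requester:
        return False
    del store[entity_hash]
    return True


store = {"abc123": PublishedEntity(data=b"photo bytes", publisher="alice")}
denied = delete_entity(store, "abc123", requester="bob")
still_present = "abc123" in store
allowed = delete_entity(store, "abc123", requester="alice")
```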
On the basis of the above embodiment, the embodiment further includes updating an entity in the knowledge graph by the following steps: creating a new entity from the updated content of the entity to be updated; creating a link from the entity to be updated to the new entity, and defining the name of that link as "update"; and when the new entity is accessed, deleting the link between the upper-level entity and the entity to be updated, and creating a link between the upper-level entity and the new entity.
Specifically, the method for updating an entity consists of two parts, adding an entity and deleting a link, as shown in FIG. 5. Since entities in the knowledge graph cannot be modified directly, updating an entity requires creating a link from the original entity to the new entity; the name of the link is defined as "update", which establishes the relationship between the new entity and the original entity. When the new entity needs to be accessed, the relationship between the upper-level entity and the original entity is deleted, and the relationship between the upper-level entity and the new entity is created using the same parameters. The parameters required to update an entity are as follows:
<root>: the entity to be updated;
<content>: the updated content.
First, a new entity is created from <content> as the link target entity. The value of the parameter <root> is taken as the input of the Kademlia algorithm, hash addressing is performed, the node holding the entity to be updated is found, and the step of adding a link is executed on that node. When the new entity needs to be accessed, the link between the upper-level entity and the original entity is deleted according to the method above, and a link between the upper-level entity and the new entity is created.
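The two phases of an update, adding an "update" link and later repointing the upper-level entity, can be sketched as follows. The store is a local dictionary standing in for the hash-addressed network, and `add_update_link` and `repoint_parent` are hypothetical names; SHA-256 is an assumed hash function.

```python
import hashlib


def h(value) -> str:
    """Hex SHA-256 of a string or of raw bytes (the hash function is an assumption)."""
    raw = value if isinstance(value, bytes) else value.encode("utf-8")
    return hashlib.sha256(raw).hexdigest()


def add_update_link(store: dict, root: str, content: bytes) -> str:
    """Create a new entity from `content` and link it from `root` as "update"."""
    new_hash = h(content)
    store.setdefault(new_hash, {"data": content, "links": []})
    store[root]["links"].append({"name": h("update"), "hash": new_hash})
    return new_hash


def repoint_parent(store: dict, parent: str, old: str, new: str) -> None:
    """Delete the parent's links to `old` and recreate them, same name, to `new`."""
    links = store[parent]["links"]
    for item in [entry for entry in links if entry["hash"] == old]:
        links.remove(item)
        links.append({"name": item["name"], "hash": new})


old_photo = h(b"old photo bytes")
store = {
    h("Aaron"): {"data": [], "links": [{"name": h("photo"), "hash": old_photo}]},
    old_photo: {"data": b"old photo bytes", "links": []},
}
new_photo = add_update_link(store, old_photo, b"new photo bytes")
repoint_parent(store, h("Aaron"), old_photo, new_photo)
```

The original entity is never modified in place; it stays in the store with an "update" link to its successor, while the upper-level entity now points to the new content.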
On the basis of the above embodiment, the step of deleting the entity from the knowledge-graph in this embodiment further includes: when a user sends a request for accessing a resource entity, if the resource entity is checked to be in a failure state, returning failure information of the resource entity, and deleting a link linked to the resource entity; and/or scanning the links of all entities in the knowledge graph at preset time intervals, and deleting the link of the invalid entity if the entity linked by any entity is invalid.
Specifically, the garbage collection method in this embodiment applies to the collection and destruction of empty links. It addresses the case where a publisher has deleted a resource node but links to that resource entity are still stored in the Links arrays of several entities in the knowledge graph. Because the deleted entity cannot be used to find directly all the links that point to it, the empty links cannot all be deleted at deletion time. The embodiment therefore proposes a garbage collection mechanism. When a user accesses the resource node and finds that it is invalid, a resource-invalid message is returned and the link in the source entity of that link is deleted; at this point the source entity is known. In addition, the links of all entities are scanned periodically to remove empty links that have become invalid, in order to reclaim storage space. Combining these two approaches yields a reclamation mechanism for invalid links pointing to deleted resources.
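Both halves of the reclamation mechanism, removal on access and the periodic scan, can be sketched over a dictionary store. The short string keys are placeholders for real hashes, a missing key stands for a failed (deleted) entity, and `follow_link` and `sweep` are hypothetical names; scheduling of the periodic scan is left out.

```python
def follow_link(store: dict, source: str, index: int):
    """Follow one link; if its target has failed, drop the link and report failure."""
    links = store[source]["links"]
    target = store.get(links[index]["hash"])
    if target is None:
        del links[index]   # the source entity is known here, so the link can be removed
        return None        # stands in for the resource failure message
    return target


def sweep(store: dict) -> int:
    """Periodic scan: remove every link whose target no longer exists."""
    removed = 0
    for entity in store.values():
        live = [entry for entry in entity["links"] if entry["hash"] in store]
        removed += len(entity["links"]) - len(live)
        entity["links"] = live
    return removed


# "gone1" and "gone2" were deleted by their publisher; only their links remain.
store = {
    "a": {"data": [], "links": [{"name": "photo", "hash": "gone1"},
                                {"name": "page", "hash": "b"}]},
    "b": {"data": b"page", "links": [{"name": "ref", "hash": "gone2"}]},
}
result = follow_link(store, "a", 0)
reclaimed = sweep(store)
```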
On the basis of the above embodiments, the present embodiment further includes: if entities with the same name exist in the knowledge graph, maintaining a directory for the hash value with the same name; the data structure of the directory is the same as that of the entity, the name of each link in the data structure of the directory is the characteristic attribute of the corresponding entity with the same name, and the Hash in each link is the Hash of the splicing result of the corresponding entity and the characteristic attribute of the corresponding entity.
Specifically, the knowledge graph names an entity by the hash value of its name, but in practice many entities share the same name: several people may have the same name, and "apple" may refer to the fruit or to the company Apple. A mechanism is needed to handle this. In this embodiment, entities with the same name are still addressed by the hash value of the name, but a directory is maintained on the node corresponding to that hash value. The data structure of the directory is the same as that of an ordinary entity, and the directory stores the correspondence between the features of each entity and its hash value. As shown in FIG. 6, the Name of each link is a characteristic attribute, and the Hash is the hash value obtained by splicing the corresponding entity with that characteristic attribute. For example, "blue and white porcelain" is both a kind of porcelain and the name of a song; under the directory for "blue and white porcelain", the Hash in each link is obtained by hashing the entity name together with the corresponding entity attribute. During a search, the intended entity is determined from the context, and a second query is then performed to obtain the result.
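The directory mechanism for same-name entities can be sketched as a two-step lookup. The splicing format (name followed by the feature, with no separator), the unhashed feature used as the link name, and the helper names `build_directory` and `resolve` are assumptions for illustration; SHA-256 is an assumed hash function.

```python
import hashlib


def h(text: str) -> str:
    """Hex SHA-256 of a string (the hash function is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def build_directory(name: str, features: list[str]) -> dict:
    """Directory entity: one link per same-name entity, keyed by its feature."""
    return {
        "data": [],
        "links": [{"name": f, "hash": h(name + f)} for f in features],  # spliced hash
    }


def resolve(store: dict, name: str, feature: str):
    """First query finds the directory; second query fetches the chosen entity."""
    directory = store[h(name)]
    for entry in directory["links"]:
        if entry["name"] == feature:
            return store[entry["hash"]]
    return None


name = "blue and white porcelain"
store = {
    h(name): build_directory(name, ["porcelain", "song"]),
    h(name + "porcelain"): {"data": [{"key": h("kind"), "value": "ceramic ware"}], "links": []},
    h(name + "song"): {"data": [{"key": h("kind"), "value": "music"}], "links": []},
}
song = resolve(store, name, "song")
unknown = resolve(store, name, "film")
```

The feature passed to `resolve` is where the context-based decision described above would plug in.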
This embodiment provides an electronic device. FIG. 7 is a schematic diagram of the overall structure of the electronic device according to the embodiment of the present invention, where the electronic device includes: at least one processor 701, at least one memory 702, and a bus 703; wherein:
the processor 701 and the memory 702 communicate with each other via a bus 703;
the memory 702 stores program instructions executable by the processor 701, and the processor calls the program instructions to perform the methods provided by the method embodiments, for example: acquiring a knowledge graph; storing the knowledge graph; wherein the data structure of each entity in the knowledge graph comprises data and a link array; the link array comprises one or more links, each link comprising a link name and a Hash of the entity to which the entity is linked; the data is information of the entity.
The present embodiment provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above method embodiments, for example: acquiring a knowledge graph; storing the knowledge graph; wherein the data structure of each entity in the knowledge graph comprises data and a link array; the link array comprises one or more links, each link comprising a link name and a Hash of the entity to which the entity is linked; the data is information of the entity.
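As a rough illustration of the entity data structure recited above (data plus a link array of name/Hash pairs), the following Python sketch may help. It follows the addressing rule stated in claim 1, where resource entities are addressed by the Hash of their content and non-resource entities by the Hash of their name. SHA-256, the in-memory `store` dictionary, the helper names, and the sample entity "Alice" with placeholder image bytes are assumptions for illustration only.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Union


def sha(data: bytes) -> str:
    """SHA-256 is an assumption; the source does not name a hash function."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class Link:
    name: str         # link name (claim 1 stores it as a hash of the relationship)
    target_hash: str  # Hash of the entity being linked to


@dataclass
class Entity:
    data: Union[bytes, dict]  # raw content for resources, attributes otherwise
    links: list = field(default_factory=list)


def address_of_resource(content: bytes) -> str:
    """Resource entities (pictures, web pages) are addressed by content Hash."""
    return sha(content)


def address_of_named(name: str) -> str:
    """Non-resource entities are addressed by the Hash of their name."""
    return sha(name.encode("utf-8"))


store = {}  # hypothetical stand-in for the hash-addressed storage

# A resource entity: unstructured bytes, keyed by their own Hash.
photo_bytes = b"placeholder image bytes"
photo_key = address_of_resource(photo_bytes)
store[photo_key] = Entity(data=photo_bytes)

# A non-resource entity with one non-link attribute and one link to the photo.
alice_key = address_of_named("Alice")
store[alice_key] = Entity(data={sha(b"occupation"): "engineer"})
store[alice_key].links.append(
    Link(name=sha(b"portrait"), target_hash=photo_key)
)

# Following a link is a single hash lookup.
linked = store[store[alice_key].links[0].target_hash]
```

Because identical resource content always hashes to the same address, storing the same picture a second time would overwrite the existing entry rather than create a copy, which is one plausible reading of the redundancy saving mentioned in the abstract.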
Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments may be implemented by hardware under the control of program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (9)
1. A knowledge graph storage method based on Hash addressing is characterized by comprising the following steps:
acquiring a knowledge graph;
storing the knowledge graph;
wherein the data structure of the entities in the knowledge graph comprises data and a link array;
the link array comprises one or more links, each link comprising a link name and a Hash of an entity to which the entity is linked;
the data is information of the entity;
entities in the knowledge graph comprise resource entities and non-resource entities;
if the linked entity is a resource entity, the Hash of the linked entity is the Hash of the content of the linked entity; the resource entity comprises a picture resource and a webpage resource;
if the linked entity is a non-resource entity, the Hash of the linked entity is the Hash of the linked entity name; wherein the non-resource entities include a human, a person, and a place;
the information of the resource entity is the resource content of the resource entity and is non-structural data;
the information of the non-resource entity is the non-link attribute of the non-resource entity and is an array structure;
each element in the array structure comprises a key of the non-link attribute and a value of the non-link attribute, wherein the key of the non-link attribute is a hash value of the non-link attribute;
the link name is a hash value of a relationship between the entity and an entity to which the entity is linked.
2. The Hash addressing-based knowledge graph storage method according to claim 1, further comprising adding entities and links in the knowledge graph by:
determining whether the entity to be linked to by the entity to which the link is to be added exists in the knowledge graph, and if not, creating the entity to be linked to;
if so, addressing with the hash value of the entity to which the link is to be added as input, to obtain the entity to which the link is to be added;
combining the Hash of the entity to be linked to and the Hash of the name of the link to be added into a link type object;
and adding the link type object at the end of the link array of the entity to which the link is to be added.
3. The Hash addressing-based knowledge graph storage method according to claim 2, wherein the step of addressing with the hash value of the entity to which the link is to be added as input, to obtain the entity to which the link is to be added, specifically comprises:
checking whether the entity to which the link is to be added is stored on the node initiating the search request, and if so, returning the ID of the node initiating the search request; if not, returning a preset number of nodes whose key codes are nearest to the key code of the node initiating the search request, and sending the search request to the preset number of nodes;
each node receiving the search request checks whether it stores the entity to which the link is to be added, and if so, returns its own node ID; if not, it returns a preset number of nodes, taken from its corresponding K-bucket, whose key codes are closest to the key code of the node;
determining whether the node initiating the search request has received a node ID, and if so, ending the search; if not, after the node initiating the search request receives the returned preset number of nodes, sending the search request to those of the returned nodes to which the search request has not yet been sent, until a node ID is obtained;
and acquiring the entity to which the link is to be added according to the node ID, and caching that entity on the nodes that did not return the node ID.
4. The Hash addressing-based knowledge graph storage method of claim 1, further comprising removing links from the knowledge graph by:
addressing by taking the hash value of the entity to be modified as input, and acquiring the stored entity to be modified;
and deleting the specified link item from the link array of the entity to be modified.
5. The Hash addressing-based knowledge graph storage method of claim 1, further comprising removing entities from the knowledge graph by:
performing authority authentication on a user initiating a deletion request;
and if the user initiating the deletion request is the user issuing the entity to be deleted, deleting the entity to be deleted.
6. The Hash addressing-based knowledge graph storage method according to claim 1, further comprising updating entities in the knowledge graph by:
creating a new entity according to the updated content of the entity to be updated;
creating a link from the entity to be updated that points to the new entity, and naming the link pointing to the new entity "update";
and when the new entity is accessed, deleting the link between the upper layer entity of the entity to be updated and the entity to be updated, and creating the link between the upper layer entity and the new entity.
7. The Hash addressing-based knowledge graph storage method of claim 5, wherein the step of removing entities from the knowledge graph further comprises:
when a user sends a request for accessing a resource entity, if the resource entity is found to be in a failure state, returning failure information for the resource entity and deleting the link that points to the resource entity; and/or
scanning the links of all entities in the knowledge graph at preset intervals, and if the entity linked to by any entity is invalid, deleting the link to the invalid entity.
8. The method of any of claims 1-7, further comprising:
if entities with the same name exist in the knowledge graph, maintaining a directory at the hash value of the shared name; the data structure of the directory is the same as that of an entity, the name of each link in the directory is a characteristic attribute of the corresponding same-name entity, and the Hash in each link is the Hash of the result of splicing the corresponding entity with its characteristic attribute.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the method of hash-addressing-based knowledge-graph storage of any one of claims 1 to 8.
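The link and update operations recited in claims 2, 4 and 6 can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the claimed implementation: a local dictionary `store` replaces the distributed node lookup of claim 3, SHA-256 is assumed as the hash function, the source entity is created on demand if it is missing, and the function names `add_link`, `remove_link`, `update_entity` and `follow` are hypothetical. The relinking in `follow` overwrites the old link in place, which has the same effect as deleting the old link and creating a new one.

```python
import hashlib
from dataclasses import dataclass, field


def h(text: str) -> str:
    """SHA-256 is an assumption; the source does not name a hash function."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


@dataclass
class Link:
    name: str         # Hash of the relationship name
    target_hash: str  # Hash of the entity being linked to


@dataclass
class Entity:
    data: dict = field(default_factory=dict)
    links: list = field(default_factory=list)


store = {}  # hypothetical local stand-in for the distributed addressing step


def add_link(source_name: str, link_name: str, target_name: str) -> None:
    """Claim 2: create the target if missing, then append a link to the source."""
    store.setdefault(h(target_name), Entity())
    source = store.setdefault(h(source_name), Entity())  # on-demand: assumption
    source.links.append(Link(h(link_name), h(target_name)))


def remove_link(source_name: str, link_name: str) -> None:
    """Claim 4: address the entity, then delete the specified link item."""
    source = store[h(source_name)]
    source.links = [l for l in source.links if l.name != h(link_name)]


def update_entity(old_name: str, new_name: str, new_data: dict) -> None:
    """Claim 6: create the new entity and point the old one at it via 'update'."""
    store[h(new_name)] = Entity(data=new_data)
    store[h(old_name)].links.append(Link(h("update"), h(new_name)))


def follow(parent_name: str, link_name: str):
    """Claim 6, on access: relink the upper-layer entity to the new entity."""
    for link in store[h(parent_name)].links:
        if link.name != h(link_name):
            continue
        target = store[link.target_hash]
        for forward in target.links:
            if forward.name == h("update"):
                link.target_hash = forward.target_hash  # replaces the old link
                return store[forward.target_hash]
        return target
    return None


add_link("Alice", "homepage", "page-v1")
update_entity("page-v1", "page-v2", {"title": "new"})
current = follow("Alice", "homepage")
```

Deferring the relink until the next access means an update touches only the old and new entities at write time; upper-layer entities are repaired lazily, one at a time, as they are traversed.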
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910689943.2A CN110543570B (en) | 2019-07-29 | 2019-07-29 | Knowledge graph storage method based on Hash addressing |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110543570A (en) | 2019-12-06 |
CN110543570B (en) | 2022-03-11 |
Family
ID=68709931
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910689943.2A, Active, CN110543570B (en) | 2019-07-29 | 2019-07-29 | Knowledge graph storage method based on Hash addressing |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110543570B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112015986B (en) * | 2020-08-26 | 2024-01-26 | 北京奇艺世纪科技有限公司 | Data pushing method, device, electronic equipment and computer readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102194002A (en) * | 2011-05-25 | 2011-09-21 | 中兴通讯股份有限公司 | Table entry adding, deleting and searching method of hash table and hash table storage device |
CN104462501A (en) * | 2014-12-19 | 2015-03-25 | 北京奇虎科技有限公司 | Knowledge graph construction method and device based on structural data |
CN107491555A (en) * | 2017-09-01 | 2017-12-19 | 北京纽伦智能科技有限公司 | Knowledge mapping construction method and system |
CN108600321A (en) * | 2018-03-26 | 2018-09-28 | 中国科学院计算技术研究所 | A kind of diagram data storage method and system based on distributed memory cloud |
Also Published As
Publication number | Publication date |
---|---|
CN110543570A (en) | 2019-12-06 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||