CN110543570B - Knowledge graph storage method based on Hash addressing - Google Patents

Publication number
CN110543570B
Authority
CN
China
Prior art keywords
entity
link
hash
resource
linked
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910689943.2A
Other languages
Chinese (zh)
Other versions
CN110543570A (en)
Inventor
商彦磊
乔秀全
刘舒
何明会
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201910689943.2A
Publication of CN110543570A
Application granted
Publication of CN110543570B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology

Abstract

The invention provides a knowledge graph storage method based on hash addressing, comprising: acquiring a knowledge graph; and storing the knowledge graph; wherein the data structure of each entity in the knowledge graph comprises data and a link array; the link array comprises one or more links, each link comprising a link name and the hash of the entity to which the entity is linked; and the data is the information of the entity. The invention realizes a knowledge graph storage scheme based on hash addressing, improves update and query speed, avoids redundant storage of resources, and greatly reduces storage consumption.

Description

Knowledge graph storage method based on Hash addressing
Technical Field
The invention belongs to the technical field of distributed knowledge organization, and particularly relates to a knowledge graph storage method based on Hash addressing.
Background
With the development and application of artificial intelligence technology, knowledge graphs are gaining attention in both academia and industry, and are applied in fields such as intelligent search, intelligent question answering, personalized recommendation and content distribution. A knowledge graph aims to describe the entities that exist in the real world and the relations among them, forming a huge semantic network graph in which nodes represent concepts or instances and edges represent relations or attributes.
Knowledge graphs generally use a relational database, graph database, key-value database, document database or the like as the underlying storage engine. The advantage of a graph database is that it represents the structure of the knowledge graph intuitively: the nodes of the graph represent the entities of the knowledge graph, and the edges represent the relations between entities. Its disadvantages are that updating the graph database is complex, data update and query are slow, and operations on oversized nodes, namely nodes with a large number of edges, slow down greatly. As the data volume increases, the relations become more complex, and the number of relations between data items that the knowledge graph must process grows geometrically with the data volume.
In summary, conventional graph databases suffer from slow update and query speed, duplicated resources and similar problems. Therefore, there is a need for a new, more efficient way to represent and address a knowledge graph.
Disclosure of Invention
In order to overcome the problem of slow update and query speed of the existing knowledge graph storage method or at least partially solve the problem, embodiments of the present invention provide a knowledge graph storage method based on hash addressing.
According to a first aspect of the embodiments of the present invention, there is provided a method for storing a knowledge graph based on hash addressing, including:
acquiring a knowledge graph;
storing the knowledge graph;
wherein the data structure of each entity in the knowledge graph comprises data and a link array;
the link array comprises one or more links, and each link comprises a link name and the hash of the entity to which the entity is linked;
the data is the information of the entity.
According to a second aspect of the embodiments of the present invention, there is also provided an electronic device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor invokes the program instructions to execute the method for storing a knowledge graph based on hash addressing according to any one of the possible implementations of the first aspect.
The embodiments of the invention provide a knowledge graph storage method based on hash addressing, which stores a knowledge graph with a specific data structure: the data structure of each entity in the knowledge graph comprises data and a link array, each link array comprises one or more links, each link comprises a link name and the hash of the entity to which the entity is linked, and the information of the entity is stored in the data. A knowledge graph storage scheme based on hash addressing is thereby realized. On the one hand, the complex nonlinear directed graph structure of the knowledge graph is converted into a linear array structure, which avoids redundant storage of resources and greatly reduces storage consumption; on the other hand, a stored entity can be found quickly through the hash value of the entity in the data structure, which improves update and query speed.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic overall flow chart of a knowledge graph storage method based on hash addressing according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the resource connections of entities in the knowledge graph storage method based on hash addressing provided by the present invention;
FIG. 3 is a schematic diagram of adding links and entities in the knowledge graph storage method based on hash addressing provided by the present invention;
FIG. 4 is a schematic diagram of deleting a link in the knowledge graph storage method based on hash addressing provided by the present invention;
FIG. 5 is a schematic diagram of updating an entity in the knowledge graph storage method based on hash addressing provided by the present invention;
FIG. 6 is a schematic diagram of the storage mechanism for entities with the same name in the knowledge graph storage method based on hash addressing provided by the present invention;
FIG. 7 is a schematic diagram of the overall structure of an electronic device according to an embodiment of the present invention.
Detailed Description
In an embodiment of the present invention, a method for storing a knowledge graph based on hash addressing is provided. FIG. 1 is a schematic overall flow chart of the method, which includes: S101, acquiring a knowledge graph; S102, storing the knowledge graph; wherein the data structure of each entity in the knowledge graph comprises data and a link array; the link array comprises one or more links, and each link comprises a link name and the hash of the entity to which the entity is linked; the data is the information of the entity.
Wherein, the entity is described by adopting the following data structure:
data: information representing entities, such as resource content or non-link attributes;
links: an array of Link data structures, the entities being linked to other entities by links;
the Link data structure contains two fields:
name: the name of Link;
hash: the Hash of the entity to which the Link is linked.
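As an illustration only, the two structures described above can be sketched in Python. The class names `Entity` and `Link`, the field types and the placeholder hash string are assumptions made for this sketch; the text fixes only the field names and their meaning.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Link:
    name: str  # the name of the Link
    hash: str  # the Hash of the entity to which the Link is linked


@dataclass
class Entity:
    data: bytes = b""                                # information of the entity
    links: List[Link] = field(default_factory=list)  # the link array


# Hypothetical placeholder hash value, used for illustration only.
photo_hash = "hash-of-photo"
person = Entity(data=b"non-link attributes of the entity")
person.links.append(Link(name="photo", hash=photo_hash))
```

Because an entity refers to other entities only by hash, the whole graph can be kept as a flat collection of such records.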
In the prior art, the knowledge graph is a complex nonlinear directed graph structure, and the embodiment represents the data structure of the knowledge graph as a linear array structure, so that less storage space is occupied when physical storage is performed according to the data structure of the knowledge graph.
In this embodiment, a knowledge graph with a specific data structure is stored: the data structure of each entity in the knowledge graph comprises data and a link array, each link array comprises one or more links, each link comprises a link name and the hash of the entity to which the entity is linked, and the information of the entity is stored in the data. A knowledge graph storage scheme based on hash addressing is thereby realized. On the one hand, the complex nonlinear directed graph structure of the knowledge graph is converted into a linear array structure, which avoids redundant storage of resources and greatly reduces storage consumption; on the other hand, a stored entity can be found quickly through the hash value of the entity in the data structure, which improves update and query speed.
On the basis of the above embodiments, the entities in the knowledge graph in this embodiment include resource entities and non-resource entities. If the linked entity is a resource entity, the Hash of the linked entity is the hash of its content; if the linked entity is a non-resource entity, the Hash of the linked entity is the hash of its name. The information of a resource entity is its resource content, which is unstructured data. The information of a non-resource entity is its non-link attributes, held in an array structure; each element of the array comprises the key and the value of a non-link attribute, where the key is the hash value of the non-link attribute. The link name is the hash value of the relation between the entity and the entity to which it is linked.
In this embodiment, the entities in the knowledge graph are divided into resource entities and non-resource entities. Resource entities include picture resources, web page resources and the like. A non-resource entity is a concept or an instance, such as human, a particular person or a particular place. This embodiment describes the data structure of an entity in the knowledge graph. Because the relations in the knowledge graph are cyclic, the knowledge graph is represented as a directed graph. Since the knowledge graph is expressed as subject-attribute-object triples, in this embodiment the subject is called the link source node, the relation is called the link, and the object is called the link target node.
The data structures of both resource entities and non-resource entities comprise Data and a link array Links. A resource entity is described with the following data structure:
data: unstructured data representing resource content of the resource entity;
links: an array of Link data structures, the entities being linked to other entities by links;
the Link data structure contains two fields:
name: the name of Link;
hash: the Hash of the entity to which the Link is linked.
For the data structure of a non-resource entity, the Links field is extended and the Data field is redefined, so that the structure better suits the representation of entities and relations in the knowledge graph.
The following parameters are first defined:
<root>: the entity from which the link originates;
<name>: the link name;
<ref>: the entity to be linked to.
A non-resource entity is a data structure that contains two fields:
Links: an array of Link data structures, through which the entity is linked to other entities.
The Link data structure contains two fields:
Name: the name of the Link, which is the hash value of the relation between root and ref, that is, the relation of ref relative to root;
Hash: the hash value of ref. When ref is a non-resource entity, the hash value is obtained by hashing the name of ref directly; when ref is a resource entity, the hash value is obtained by hashing the content of ref.
To suit the entity and relation representation of the knowledge graph, the relation between Name and Hash in the Links array can be one-to-one, one-to-many, many-to-one or many-to-many.
Data: an array of data structures, through which the non-resource entity can store non-link attributes.
Each element of Data contains two fields:
Key: the key of the non-link attribute, obtained by hashing the non-link attribute directly;
Value: a value of string or number type that expresses the value of a non-link attribute of the non-resource entity.
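The hashing rules above can be sketched as follows. This is a minimal sketch under stated assumptions: the text does not name a hash function, so SHA-256 is assumed, and the helper names `h`, `entity_hash`, `make_link` and `make_attribute` are hypothetical.

```python
import hashlib


def h(value) -> str:
    """Hex digest of a str or bytes value. SHA-256 is an assumption."""
    raw = value.encode("utf-8") if isinstance(value, str) else value
    return hashlib.sha256(raw).hexdigest()


def entity_hash(name=None, content=None) -> str:
    """Resource entities are hashed by content, non-resource entities by name."""
    if content is not None:
        return h(content)
    if name is None:
        raise ValueError("a non-resource entity needs a name")
    return h(name)


def make_link(relation: str, ref_hash: str) -> dict:
    """Link: Name is the hash of the relation of ref relative to root."""
    return {"Name": h(relation), "Hash": ref_hash}


def make_attribute(attribute: str, value) -> dict:
    """Data element: Key is the hash of the non-link attribute itself."""
    return {"Key": h(attribute), "Value": value}


human = entity_hash(name="human")                           # non-resource entity
page = entity_hash(content=b"<html>example page</html>")    # resource entity
link = make_link("Category", human)
attr = make_attribute("example attribute", "000057405")
```

With a fixed-length digest, every link name, entity hash and attribute key has the same stored length regardless of the original string.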
The data structure of the entities in the knowledge graph is as follows:
[Figure: listing of the data structure of entities in the knowledge graph; image not reproduced]
In one example, the data structure of a person entity named Aaron is described as follows:
[Figure: listing of the data structure of the Aaron entity; image not reproduced]
The above example defines some of the link attributes and non-link attributes of the person entity Aaron. The link attributes include:
a link, with the relation names "Wikipedia" and "Article", to a web page entity that carries the Wikipedia attribute as well as the web page attribute: the Wikipedia page of the Aaron entry;
a link, with the relation names "Wikiquote" and "Article", to a web page entity that carries the Wikiquote attribute as well as the web page attribute: the Wikiquote page of the Aaron entry;
a link to the class entity human, with the relation name "Category";
a link to the class entity male, with the relation name "gender";
links to two class entities, dramatist and screenwriter, with the relation name "profession";
a link to a photo entity, with the relation name "photo".
The non-link attributes include:
index code in the national Ladeswia library: 000057405;
Russian national library number: 000002833.
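The attributes listed above can be assembled into one record as a rough illustration. The original listing is an image that is not reproduced here, so this sketch is reconstructed from the prose only: the resource contents are placeholders, the attribute key strings are hypothetical labels, and SHA-256 is an assumed hash function.

```python
import hashlib


def h(value) -> str:
    """Hex digest of a str or bytes value. SHA-256 is an assumption."""
    raw = value.encode("utf-8") if isinstance(value, str) else value
    return hashlib.sha256(raw).hexdigest()


# Placeholder resource contents; the real pages and photo are not in the text.
wikipedia_page = b"<html>Wikipedia page of the Aaron entry</html>"
wikiquote_page = b"<html>Wikiquote page of the Aaron entry</html>"
photo = b"placeholder photo bytes"

# (relation name, hash of the linked entity)
relations = [
    ("Wikipedia", h(wikipedia_page)),   # resource entity: hash of content
    ("Article", h(wikipedia_page)),
    ("Wikiquote", h(wikiquote_page)),
    ("Article", h(wikiquote_page)),
    ("Category", h("human")),           # non-resource entity: hash of name
    ("gender", h("male")),
    ("profession", h("dramatist")),
    ("profession", h("screenwriter")),
    ("photo", h(photo)),
]

aaron = {
    "Links": [{"Name": h(rel), "Hash": target} for rel, target in relations],
    "Data": [
        # Key strings below are hypothetical labels for the two attributes.
        {"Key": h("Ladeswia national library index code"), "Value": "000057405"},
        {"Key": h("Russian national library number"), "Value": "000002833"},
    ],
}
aaron_hash = h("Aaron")  # the entity itself is addressed by the hash of its name
```

Note that one link name ("profession") appears with two different hashes, and one hash (the Wikipedia page) appears under two link names.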
Table 1 and FIG. 2 are visual representations of the attributes in the above example: Table 1 presents the example in readable form, while FIG. 2 converts the links and attributes into hash values. The naming and linking of entities in this embodiment are described in detail below with reference to FIG. 2.
Naming of entities: the person entity Aaron is named by applying the hash operation directly to the character string "Aaron" and is stored under that hash. This meets the knowledge graph's need to query by entity name and improves retrieval speed; the hash has a large representation range and few collisions. Web page resource entities linked from the person entity Aaron are handled as follows: the web page resource file is named by the hash of the web page content, and the Data field is extended to store the web page resource content. With this file storage and naming scheme, only one copy of resources with identical content is kept, which removes redundant resources. Other resource entities of the person entity Aaron, such as pictures, are handled in the same way.
Naming of links: as shown in FIG. 2, the person entity Aaron has a link named "Wikipedia", and the link is named directly by the hash value of the character string "Wikipedia". This has the following advantages: the stored format is uniform, for example the stored names have equal length; queries are fast; and each attribute can be looked up at its storage node directly by hash value. An attribute such as profession also corresponds to a class entity. In addition, the Links and Data arrays in the data structure of this embodiment have no fixed length and can be extended dynamically, which suits scenarios in which the same attribute has several attribute entities or one entity serves several attributes: for example, Aaron is both a dramatist and a screenwriter, and the Wikipedia web page resource is a resource of both the Wikipedia attribute and the Article attribute. The Data field of a resource file is extended to hold unstructured data, which makes resource entities convenient to store.
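The redundancy removal obtained by naming a resource with the hash of its content can be illustrated with a small sketch. The in-memory dictionary `store` and the function `put_resource` are hypothetical stand-ins for the actual storage layer, and SHA-256 is an assumed hash function.

```python
import hashlib

store = {}  # content hash -> content; in-memory stand-in for the real store


def put_resource(content: bytes) -> str:
    """Store a resource under the hash of its content and return that hash."""
    key = hashlib.sha256(content).hexdigest()
    store.setdefault(key, content)  # identical content is kept only once
    return key


a = put_resource(b"same picture bytes")
b = put_resource(b"same picture bytes")
c = put_resource(b"different picture bytes")
```

Storing the same bytes twice yields the same key, so any number of entities can link to the resource while only one copy is held.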
Table 1. Visual representation of the data structure of the Aaron entity
[Table 1 was an image in the source and is not reproduced]
According to the embodiment, the entities in the knowledge graph are divided into the resource entities and the non-resource entities, then the data structures of the resource entities and the non-resource entities are described in different modes according to the relation between the entities in the knowledge graph, and the resource entities and the non-resource entities are stored according to the described data structures, so that the knowledge graph storage mode based on Hash addressing is realized, the updating and inquiring speed is improved, the redundant storage of resources is avoided, and the storage consumption is greatly reduced.
On the basis of the above embodiment, this embodiment further includes adding an entity and a link to the knowledge graph by the following steps: determining whether the entity to be linked to already exists in the knowledge graph, and if not, creating it; taking the hash value of the entity to which the link is to be added as the input of the Kademlia algorithm for hash addressing, to obtain that entity; combining the hash of the entity to be linked to and the hash of the name of the link to be added into a Link-type object; and appending the Link-type object to the end of the link array of the entity to which the link is to be added.
Specifically, to accommodate the dynamic nature of entities and relations in the knowledge graph, entities and relations need to be added, deleted and changed. FIG. 3 is a schematic diagram of the method for adding entities and links. When entities and relations need to be added to the knowledge graph, the required parameters are as follows:
<root>: the hash of the entity to be modified;
<name>: the name of the link to be created;
<ref>: the entity to be linked to.
This embodiment creates and links a new entity on the basis of an existing entity. Since the hash value of a non-resource entity is calculated from its name and the hash value of a resource entity is calculated from its content data, adding a relation does not require modifying the hash value of the upper-layer entity. If the entity to be linked to does not exist, it must be created first; the entity name is a required parameter for creation, while the Data and Links arrays are optional. The value of the parameter <root> is taken as the input of the Kademlia algorithm, hash addressing is performed, and the source entity to which the link is to be added is found.
Hashing the parameter <name> yields the name of the link, which expresses the relation of the entity to be linked to relative to the source entity of the link; the purpose of hashing is to facilitate addressing by relation name. The parameters <ref> and <name> are combined into a Link-type object, which is appended to the end of the Links array of the <root> object to form a new link.
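The add procedure can be sketched as below. Several simplifications are assumed: a plain dictionary `store` stands in for Kademlia hash addressing, only non-resource targets (hashed by name) are handled, SHA-256 is an assumed hash function, and the function names are hypothetical.

```python
import hashlib


def h(text: str) -> str:
    """Hash a string; SHA-256 is an assumption, the text names no function."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


store = {}  # hash -> entity; a dict stands in for Kademlia hash addressing


def create_entity(name, data=None, links=None) -> str:
    """Create a non-resource entity; only the name is required."""
    key = h(name)
    store.setdefault(key, {"Data": list(data or []), "Links": list(links or [])})
    return key


def add_link(root_hash: str, name: str, ref_name: str) -> dict:
    ref_hash = h(ref_name)
    if ref_hash not in store:        # create the target entity if it is missing
        create_entity(ref_name)
    root = store[root_hash]          # stand-in for the Kademlia lookup of <root>
    link = {"Name": h(name), "Hash": ref_hash}
    root["Links"].append(link)       # append at the end of the Links array
    return link


aaron = create_entity("Aaron")
add_link(aaron, "profession", "dramatist")
add_link(aaron, "profession", "screenwriter")
```

The key of the root entity is the same before and after the links are added, since it depends only on the name "Aaron".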
On the basis of the foregoing embodiment, in this embodiment the step of taking the hash value of the entity to which the link is to be added as the input of the Kademlia algorithm for hash addressing, and obtaining that entity, specifically includes: checking whether the entity is stored on the node initiating the search request, and if so, returning the ID of that node; if not, returning a preset number of nodes whose keys are nearest to the key of the node initiating the search request, and sending the search request to those nodes; each node receiving the search request checks whether it stores the entity, and if so, returns its own node ID; if not, it returns the preset number of nodes in its corresponding K-bucket whose keys are closest to its key; determining whether the node initiating the search request has received a node ID, and if so, ending the search; if not, after the node initiating the search request receives the returned nodes, sending the search request again to those of the returned nodes to which the request has not yet been sent, until a node ID is obtained; and obtaining the entity according to the node ID, and caching the entity on a node that did not return the node ID.
Specifically, the steps of taking the value of the parameter <root> as the input of the Kademlia algorithm, performing hash addressing, and finding the source entity to which the link is to be added are as follows:
First, the initiator checks whether it stores the <root> entity itself. If so, it returns its node ID directly; otherwise, it returns the K nodes whose Key values are closest to its own Key value and sends a FIND_NODE request, that is, a node lookup request, to these K nodes.
Second, a node receiving the FIND_NODE request checks whether it stores the <root> entity. If so, it returns its own node ID directly; if not, it returns the K nodes in its corresponding K-bucket whose Key values are closest.
Third, if the initiator receives a node ID, the lookup ends; otherwise, after receiving the returned nodes, the initiator updates its result list, selects from the returned K nearest nodes those to which no request has yet been sent, and sends the FIND_NODE request again.
Fourth, the first to third steps are repeated until a node ID is obtained, or until no node closer to root than the K nodes currently known to the initiator can be obtained, in which case the node storing the source entity has not been found.
If the node storing the source entity is eventually found, the <root> entity is cached on the nearest node that did not return the correct result, so that the next query for the same <root> value is faster.
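The four lookup steps can be simulated in a single process as a rough sketch. Several things here are assumptions rather than part of the text: nodes are ranked by XOR distance to the looked-up key, as is usual in Kademlia; a flat peer list stands in for the K-buckets; node IDs and keys are small integers; `K = 2` is an arbitrary value; and the final caching step is omitted.

```python
from typing import Dict, List, Optional, Set

K = 2  # number of closest nodes returned per step (hypothetical value)


class Node:
    def __init__(self, node_id: int):
        self.node_id = node_id
        self.peers: List[int] = []     # flat peer list standing in for K-buckets
        self.stored: Set[int] = set()  # keys of the entities held by this node

    def closest(self, key: int) -> List[int]:
        """The K known peers whose IDs are nearest to key by XOR distance."""
        return sorted(self.peers, key=lambda p: p ^ key)[:K]


def find_node(network: Dict[int, Node], start: int, key: int) -> Optional[int]:
    """Return the ID of a node storing key, or None if the lookup stalls."""
    if key in network[start].stored:        # step 1: check the initiator itself
        return start
    queried = {start}
    candidates = network[start].closest(key)
    while True:
        pending = [n for n in candidates if n not in queried]
        if not pending:                     # step 4: no closer unqueried node
            return None
        for node_id in pending:             # steps 2 and 3: query and merge
            queried.add(node_id)
            node = network[node_id]
            if key in node.stored:
                return node_id
            candidates.extend(node.closest(key))
        candidates = sorted(set(candidates), key=lambda n: n ^ key)[:K]


# A four-node toy network; the entity with key 7 is held by node 7.
network = {i: Node(i) for i in (1, 4, 6, 7)}
network[1].peers = [4]
network[4].peers = [1, 6]
network[6].peers = [4, 7]
network[7].peers = [6]
network[7].stored.add(7)

holder = find_node(network, start=1, key=7)
missing = find_node(network, start=1, key=5)
```

Starting from node 1, the lookup moves through nodes 4 and 6 before reaching node 7; a key that no node stores makes the lookup stop once the K closest known nodes have all been queried.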
On the basis of the above embodiment, the present embodiment further includes deleting links from the knowledge-graph by: taking the hash value of the entity to be modified as the input of a Kademlia algorithm for hash addressing, and acquiring the stored entity to be modified; and deleting the specified link item from the link array of the entity to be modified.
Specifically, the deletion method in this embodiment applies to the deletion of entities and links. The knowledge graph representation uses content-based or name-based addressing, and an entity may take part in multiple relations, each represented by a link that points to the entity. If deleting a link also forced deletion of the entity, deleting one link would make the entity inaccessible through its other links. This embodiment therefore separates the deletion of a link from the deletion of an entity. The method of deleting a link is shown in FIG. 4. The parameters required by the method of deleting a link are as follows:
<root>: the entity to be modified;
<link>: the link to be removed.
The value of the parameter <root> is taken as the input of the Kademlia algorithm, hash addressing is performed, and the source entity from which the link is to be deleted is found; the specific addressing method is as described above. With root as the source of the link, the item corresponding to the link is deleted from the Links array of the entity, that is, the link is broken.
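A minimal sketch of link deletion follows, assuming a dictionary `store` in place of Kademlia addressing and readable placeholder strings in place of real hash values; `delete_link` is a hypothetical name.

```python
# Placeholder keys stand in for real hash values.
store = {
    "hash-aaron": {
        "Data": [],
        "Links": [
            {"Name": "hash-photo-relation", "Hash": "hash-photo"},
            {"Name": "hash-gender-relation", "Hash": "hash-male"},
        ],
    },
    "hash-photo": {"Data": b"placeholder photo bytes", "Links": []},
}


def delete_link(root_hash: str, link: dict) -> bool:
    """Remove one item from the Links array of <root>; the target is untouched."""
    root = store[root_hash]  # stand-in for the Kademlia lookup of <root>
    if link not in root["Links"]:
        return False
    root["Links"].remove(link)
    return True


photo_link = {"Name": "hash-photo-relation", "Hash": "hash-photo"}
removed = delete_link("hash-aaron", photo_link)
removed_again = delete_link("hash-aaron", photo_link)
```

After the call, the photo entity is still present in the store and can still be reached through any other link that points to it.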
On the basis of the above embodiment, the present embodiment further includes deleting an entity from the knowledge-graph by: performing authority authentication on a user initiating a deletion request; and if the user initiating the deletion request is the user issuing the entity to be deleted, deleting the entity to be deleted.
Specifically, each entity is bound to the identity of its publisher when it is published to the network, and the owner of the entity has the right to delete it after permission authentication. It should be noted that only the publisher of an entity has the right to delete it, even if all links pointing to the entity have been deleted. Unless the publisher initiates a request to delete the entity, the entity remains in the network.
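The publisher check can be sketched as follows. Permission authentication is reduced here to comparing identity strings, which is an assumption made for brevity; `publish`, `delete_entity` and the record layout are hypothetical.

```python
store = {}  # entity hash -> record holding the entity and its publisher


def publish(entity_hash: str, entity: dict, publisher: str) -> None:
    """Bind the publisher identity to the entity at publication time."""
    store[entity_hash] = {"entity": entity, "publisher": publisher}


def delete_entity(entity_hash: str, user: str) -> bool:
    """Delete the entity only when the requesting user is its publisher."""
    record = store.get(entity_hash)
    if record is None or record["publisher"] != user:
        return False
    del store[entity_hash]
    return True


publish("hash-photo", {"Data": b"placeholder photo bytes", "Links": []}, "alice")
denied = delete_entity("hash-photo", "bob")
still_there = "hash-photo" in store
allowed = delete_entity("hash-photo", "alice")
```

A request from any other user leaves the entity in place, whether or not links to it still exist.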
On the basis of the above embodiment, this embodiment further includes updating an entity in the knowledge graph by the following steps: creating a new entity according to the updated content of the entity to be updated; creating a link that originates from the entity to be updated and points to the new entity, and naming that link "update"; and, when the new entity is accessed, deleting the link between the upper-layer entity of the entity to be updated and the entity to be updated, and creating a link between the upper-layer entity and the new entity.
Specifically, the method for updating an entity consists of two parts, adding an entity and deleting a link, as shown in FIG. 5. Since entities in the knowledge graph cannot be modified directly, updating an entity requires creating a link that originates from the original entity and points to the new entity; the name of the link is defined as "update", thereby establishing the relationship between the new entity and the original system. When the new entity needs to be accessed, the relation between the upper-layer entity and the original entity is deleted, and the relation between the upper-layer entity and the new entity is created with the same parameters. The parameters required to update an entity are as follows:
<root>: the entity to be updated;
<content>: the updated content.
First, a new entity is created from <content> as the link target entity. The value of the parameter <root> is taken as the input of the Kademlia algorithm, hash addressing is performed, the node holding the entity to be updated is found, and the step of adding a link is executed on that node. When the new entity needs to be accessed, the link between the upper-layer entity and the original entity is deleted according to the method described above, and the link between the upper-layer entity and the new entity is created.
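The update procedure can be sketched for a resource entity as follows. The dictionary `store` stands in for Kademlia addressing, SHA-256 is an assumed hash function, and the function names, including the separate `relink_upper` step that is triggered on access, are hypothetical.

```python
import hashlib


def h(value) -> str:
    """Hex digest of a str or bytes value. SHA-256 is an assumption."""
    raw = value.encode("utf-8") if isinstance(value, str) else value
    return hashlib.sha256(raw).hexdigest()


store = {}  # hash -> entity; a dict stands in for Kademlia hash addressing


def put_resource(content: bytes) -> str:
    key = h(content)
    store.setdefault(key, {"Data": content, "Links": []})
    return key


def update_entity(root_hash: str, content: bytes) -> str:
    """Create the new entity and link it from the old one under 'update'."""
    new_hash = put_resource(content)
    store[root_hash]["Links"].append({"Name": h("update"), "Hash": new_hash})
    return new_hash


def relink_upper(upper_hash: str, old_hash: str, new_hash: str) -> None:
    """On access: delete upper -> old links, recreate them as upper -> new."""
    links = store[upper_hash]["Links"]
    moved = [l for l in links if l["Hash"] == old_hash]
    links[:] = [l for l in links if l["Hash"] != old_hash]
    links.extend({"Name": l["Name"], "Hash": new_hash} for l in moved)


old = put_resource(b"photo v1")
upper = h("Aaron")
store[upper] = {"Data": [], "Links": [{"Name": h("photo"), "Hash": old}]}
new = update_entity(old, b"photo v2")
relink_upper(upper, old, new)
```

The old entity is never modified in content; it only gains an "update" link, and the upper-layer entity keeps the same link name while pointing at the new hash.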
On the basis of the above embodiment, the step of deleting the entity from the knowledge-graph in this embodiment further includes: when a user sends a request for accessing a resource entity, if the resource entity is checked to be in a failure state, returning failure information of the resource entity, and deleting a link linked to the resource entity; and/or scanning the links of all entities in the knowledge graph at preset time intervals, and deleting the link of the invalid entity if the entity linked by any entity is invalid.
Specifically, the garbage collection method in this embodiment applies to the collection and destruction of empty links. It covers the case in which a publisher has deleted a resource node while links to that resource entity are still stored in the Links arrays of several entities in the knowledge graph. Since the links that point to an entity cannot all be obtained directly from the deleted entity in order to delete every empty link, this embodiment proposes a garbage collection mechanism: when a user accesses the resource node and it is found to be invalid, a resource failure message is returned and the link is deleted from the entity at which the link starts, which is known at that moment. In addition, the links of all entities are scanned periodically and failed empty links are removed in order to reclaim storage space. Combining these two approaches yields a reclamation mechanism for invalid links that point to deleted resources.
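Both reclamation paths can be sketched as below. In this sketch an entity counts as failed when its hash is absent from the dictionary `store`, which is an assumption; the periodic scan is shown as a function to be called by some scheduler, and all names and keys are hypothetical placeholders.

```python
# Placeholder keys stand in for real hash values; "gone" was deleted earlier.
store = {
    "a": {"Data": [], "Links": [{"Name": "photo", "Hash": "gone"},
                                {"Name": "friend", "Hash": "b"}]},
    "b": {"Data": [], "Links": [{"Name": "photo", "Hash": "gone"}]},
}


def follow(root_hash: str, link: dict):
    """Access path: on a failed target, drop the link from its source entity."""
    target = store.get(link["Hash"])
    if target is None:
        store[root_hash]["Links"].remove(link)
        return None  # stands for the resource failure message
    return target


def sweep() -> int:
    """Periodic path: scan every entity and remove links to failed entities."""
    removed = 0
    for entity in store.values():
        live = [l for l in entity["Links"] if l["Hash"] in store]
        removed += len(entity["Links"]) - len(live)
        entity["Links"] = live
    return removed


first = follow("a", store["a"]["Links"][0])  # lazy removal from entity "a"
removed = sweep()                            # removes the leftover in "b"
```

The access path cleans up only the link that was actually followed, so the periodic scan is still needed for links that nobody follows.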
On the basis of the above embodiments, this embodiment further includes: if entities with the same name exist in the knowledge graph, maintaining a directory for the hash value of that name; the data structure of the directory is the same as that of an entity, the name of each link in the directory is a characteristic attribute of the corresponding same-name entity, and the Hash in each link is the hash of the result of concatenating the corresponding entity with its characteristic attribute.
Specifically, entities in the knowledge graph are named by the hash value of the entity name, and in reality many entities share a name: several people may have the same name, and "apple" may refer both to the fruit and to the company Apple, so a mechanism is needed to resolve this. In this embodiment, entities with the same name are still addressed by the hash value of the name, but a directory is maintained on the node corresponding to that hash value. The data structure of the directory is the same as that of an ordinary entity, and the directory stores the correspondence between the characteristics of each entity and its hash value. As shown in FIG. 6, the Name of a link is a characteristic attribute, and the Hash is the hash value obtained by concatenating the corresponding entity with that characteristic attribute. For example, "blue and white porcelain" is both a kind of porcelain and the name of a song; under the directory of "blue and white porcelain", the Hash values in the links are obtained by hashing each of the two entity attributes together with the entity name. During a search, the entity being queried is determined from the context, and a second query is then performed to obtain the result.
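The directory for same-name entities can be sketched as follows. The concatenation order (name followed by attribute, with no separator), the use of SHA-256, the dictionary `store` and the function names are all assumptions; the link Name is kept as the plain characteristic attribute, following the description above.

```python
import hashlib


def h(text: str) -> str:
    """Hash a string; SHA-256 is an assumption, the text names no function."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


store = {}  # hash -> entity or directory; stand-in for Kademlia addressing


def register_homonym(name: str, feature: str, data) -> str:
    """Store one same-name entity and list it in the directory at h(name)."""
    directory = store.setdefault(h(name), {"Data": [], "Links": []})
    entity_key = h(name + feature)  # concatenation order is an assumption
    store[entity_key] = {"Data": data, "Links": []}
    directory["Links"].append({"Name": feature, "Hash": entity_key})
    return entity_key


def resolve(name: str, feature: str):
    """First query finds the directory, second query fetches the entity."""
    directory = store.get(h(name))
    if directory is None:
        return None
    for link in directory["Links"]:
        if link["Name"] == feature:
            return store.get(link["Hash"])
    return None


name = "blue and white porcelain"
register_homonym(name, "porcelain", ["placeholder attributes of the porcelain"])
register_homonym(name, "song", ["placeholder attributes of the song"])
song = resolve(name, "song")
```

In this sketch the caller supplies the characteristic attribute directly; in the described method it would be chosen from the context of the query.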
This embodiment provides an electronic device. Fig. 7 is a schematic diagram of the overall structure of the electronic device according to an embodiment of the present invention. The electronic device includes at least one processor 701, at least one memory 702, and a bus 703, wherein:
the processor 701 and the memory 702 communicate with each other via a bus 703;
the memory 702 stores program instructions executable by the processor 701, and the processor calls the program instructions to perform the methods provided by the method embodiments, including, for example: acquiring a knowledge graph; storing the knowledge graph; wherein the data structure of each entity in the knowledge graph comprises data and a link array; the link array comprises one or more links, each link comprising a link name and the Hash of the entity to which the entity is linked; the data is information of the entity.
This embodiment provides a non-transitory computer-readable storage medium storing computer instructions that cause a computer to perform the methods provided by the above method embodiments, including, for example: acquiring a knowledge graph; storing the knowledge graph; wherein the data structure of each entity in the knowledge graph comprises data and a link array; the link array comprises one or more links, each link comprising a link name and the Hash of the entity to which the entity is linked; the data is information of the entity.
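For orientation, the entity data structure named above (data plus a link array, each link holding a link name and a Hash) might be modelled as in the following sketch. The class and function names are hypothetical, and SHA-256 is an assumed hash function. The distinction between hashing a resource's content and hashing a non-resource entity's name follows the claims of this document.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Union


def h(value: Union[str, bytes]) -> str:
    """SHA-256 hex digest (the hash function is an assumption)."""
    raw = value.encode("utf-8") if isinstance(value, str) else value
    return hashlib.sha256(raw).hexdigest()


@dataclass
class Link:
    name: str  # hash of the relationship between the two entities
    hash: str  # hash of the entity that is linked to


@dataclass
class Entity:
    data: object                               # information of the entity
    links: list = field(default_factory=list)  # the link array


def resource_address(content: bytes) -> str:
    """Resource entities (pictures, web pages) are addressed by content hash."""
    return h(content)


def non_resource_address(name: str) -> str:
    """Non-resource entities are addressed by the hash of their name."""
    return h(name)


# Example: a non-resource entity whose data is an array of (key, value)
# pairs keyed by the hash of each attribute name, with one link to a picture.
photo = b"example image bytes"
alice = Entity(
    data=[(h("birthplace"), "Beijing")],
    links=[Link(name=h("portrait"), hash=resource_address(photo))],
)
```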
Those of ordinary skill in the art will understand that all or part of the steps of the method embodiments may be implemented by hardware controlled by program instructions. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the method embodiments. The aforementioned storage medium includes various media that can store program code, such as ROM, RAM, magnetic disks, or optical disks.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A knowledge graph storage method based on Hash addressing is characterized by comprising the following steps:
acquiring a knowledge graph;
storing the knowledge graph;
wherein the data structure of the entities in the knowledge graph comprises data and a link array;
the link array comprises one or more links, each link comprising a link name and a Hash of an entity to which the entity is linked;
the data is information of the entity;
entities in the knowledge graph comprise resource entities and non-resource entities;
if the linked entity is a resource entity, the Hash of the linked entity is the Hash of the content of the linked entity; the resource entity comprises a picture resource and a webpage resource;
if the linked entity is a non-resource entity, the Hash of the linked entity is the Hash of the linked entity name; wherein the non-resource entities include a human, a person, and a place;
the information of the resource entity is the resource content of the resource entity and is non-structural data;
the information of the non-resource entity is the non-link attribute of the non-resource entity and is an array structure;
each element in the array structure comprises a key of the non-link attribute and a value of the non-link attribute, wherein the key of the non-link attribute is a hash value of the non-link attribute;
the link name is a hash value of a relationship between the entity and an entity to which the entity is linked.
2. The hash-addressing-based knowledge graph storage method according to claim 1, further comprising adding entities and links in the knowledge graph by:
judging whether an entity to be linked to the entity to be added with the link exists in the knowledge graph or not, and if not, creating the entity to be linked to;
if so, addressing by taking the hash value of the entity to be added with the link as input to obtain the entity to be added with the link;
combining the Hash of the entity to be linked and the Hash of the name of the link to be added into a link type object;
and adding the object of the link type at the end of the link array of the entity to which the link is to be added.
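The link-addition steps of claim 2 can be illustrated with the sketch below. It assumes SHA-256 as the hash function, a dictionary in place of distributed addressing, and that the source entity already exists; the function name `add_link` is hypothetical.

```python
import hashlib


def h(text: str) -> str:
    """SHA-256 hex digest (the hash function is an assumption)."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical store: hash -> {"data": ..., "links": [{"name": ..., "hash": ...}]}
store = {}


def add_link(source_name: str, relation: str, target_name: str) -> dict:
    """Add a link from the source entity to the target entity."""
    target_hash = h(target_name)
    # Create the entity to be linked to if it does not exist yet.
    if target_hash not in store:
        store[target_hash] = {"data": [], "links": []}
    # Address the entity that receives the link by its hash
    # (raises KeyError if the source entity has not been stored).
    source = store[h(source_name)]
    # Combine the target hash and the hashed link name into a link object.
    link = {"name": h(relation), "hash": target_hash}
    # Append the link object at the end of the source's link array.
    source["links"].append(link)
    return link
```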
3. The hash addressing-based knowledge graph storage method according to claim 2, wherein the hash value of the entity to which the link is to be added is used as an input for addressing, and the step of obtaining the entity to which the link is to be added specifically comprises:
searching whether the entity of the link to be added is stored in the node initiating the search request, if so, returning the ID of the node initiating the search request; if not, returning a preset number of nodes with the key codes nearest to the key codes of the nodes initiating the search requests, and sending the search requests to the preset number of nodes;
the node receiving the search request checks whether the node stores the entity to be added with the link, and if so, returns the node ID of the node; if not, returning a preset number of nodes with the key codes closest to the key codes of the nodes in the K-bucket corresponding to the nodes in the K-bucket;
judging whether the node initiating the search request receives the node ID, if so, finishing the search; if not, after the node initiating the search request receives the returned preset number of nodes, sending the search request to the nodes which do not send the search request in the preset number of nodes again until the node ID is obtained;
and acquiring the entity of the link to be added according to the node ID, and caching the entity of the link to be added on the node without returning the node ID.
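The iterative search of claim 3 resembles a Kademlia-style lookup. The sketch below is a simplified single-process model under several assumptions: node IDs and keys are small integers, closeness is measured with the XOR metric borrowed from Kademlia (the claim does not name a metric), each node keeps a flat peer list instead of real K-buckets, and the names `Node`, `on_find` and `find_entity` are hypothetical. Caching is applied to the queried nodes that answered without a node ID.

```python
class Node:
    def __init__(self, node_id: int):
        self.id = node_id
        self.peers = []    # flat stand-in for the K-buckets
        self.storage = {}  # key -> entity

    def closest(self, key: int, k: int):
        """Up to k known peers closest to `key` (XOR metric is an assumption)."""
        return sorted(self.peers, key=lambda n: n.id ^ key)[:k]

    def on_find(self, key: int, k: int):
        """Reply with this node's ID if the entity is stored here, else closer peers."""
        if key in self.storage:
            return self.id, []
        return None, self.closest(key, k)


def find_entity(start: Node, key: int, k: int = 2):
    """Iteratively query nodes until one reports that it stores `key`."""
    holder_id, candidates = start.on_find(key, k)
    if holder_id is not None:
        return start.storage[key]
    queried, misses = {start.id}, []
    while candidates:
        node = candidates.pop(0)
        if node.id in queried:
            continue  # never send the search request to the same node twice
        queried.add(node.id)
        holder_id, closer = node.on_find(key, k)
        if holder_id is not None:
            entity = node.storage[key]
            for miss in misses:
                miss.storage[key] = entity  # cache on nodes that lacked it
            return entity
        misses.append(node)
        candidates.extend(closer)
    return None
```

A production lookup would also keep the candidate list ordered by distance and issue requests in parallel; both are omitted here for brevity.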
4. The hash-addressing-based knowledge graph storage method of claim 1, further comprising removing links from the knowledge graph by:
addressing by taking the hash value of the entity to be modified as input, and acquiring the stored entity to be modified;
and deleting the specified link item from the link array of the entity to be modified.
5. The hash-addressing-based knowledge graph storage method of claim 1, further comprising removing entities from the knowledge graph by:
performing authority authentication on a user initiating a deletion request;
and if the user initiating the deletion request is the user issuing the entity to be deleted, deleting the entity to be deleted.
6. The hash-addressing-based knowledge graph storage method according to claim 1, further comprising updating entities in the knowledge graph by:
creating a new entity according to the updated content of the entity to be updated;
creating a link initiated by the entity to be updated to point to the new entity, and defining the name of the link pointing to the new entity as an update;
and when the new entity is accessed, deleting the link between the upper layer entity of the entity to be updated and the entity to be updated, and creating the link between the upper layer entity and the new entity.
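One possible reading of the update steps in claim 6 is sketched below: the old entity gains an "update" link to the new entity, and an upper-layer entity is re-pointed lazily the next time it follows its link. The assumptions are SHA-256 as the hash function, a dictionary as the store, a plain-text "update" link name, a single hop of update following, and the hypothetical function names `update_entity` and `access`.

```python
import hashlib

UPDATE = "update"  # name of the link that points to the newer version


def h(data: bytes) -> str:
    """SHA-256 hex digest (the hash function is an assumption)."""
    return hashlib.sha256(data).hexdigest()


# Hypothetical store: hash -> {"data": ..., "links": [{"name": ..., "hash": ...}]}
store = {}


def update_entity(old_hash: str, new_content: bytes) -> str:
    """Create a new entity and link the old entity to it with an update link."""
    new_hash = h(new_content)
    store[new_hash] = {"data": new_content, "links": []}
    store[old_hash]["links"].append({"name": UPDATE, "hash": new_hash})
    return new_hash


def access(parent_hash: str, link_name: str):
    """Follow a link from an upper-layer entity, re-pointing it if outdated."""
    parent = store[parent_hash]
    link = next(l for l in parent["links"] if l["name"] == link_name)
    target = store[link["hash"]]
    newer = next((l for l in target["links"] if l["name"] == UPDATE), None)
    if newer is not None:
        # Delete the link to the old entity and create one to the new entity.
        parent["links"].remove(link)
        parent["links"].append({"name": link_name, "hash": newer["hash"]})
        target = store[newer["hash"]]
    return target["data"]
```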
7. The hash-addressing-based knowledge graph storage method of claim 5, wherein the step of removing entities from the knowledge graph further comprises:
when a user sends a request for accessing a resource entity, if the resource entity is checked to be in a failure state, returning failure information of the resource entity, and deleting the link linked to the resource entity; and/or
and scanning the links of all entities in the knowledge graph every other preset time length, and deleting the link of the invalid entity if the entity linked by any entity is invalid.
8. The method of any of claims 1-7, further comprising:
if entities with the same name exist in the knowledge graph, maintaining a directory at the hash value of the shared name; the data structure of the directory is the same as that of an entity, the name of each link in the directory is a characteristic attribute of the corresponding same-name entity, and the Hash in each link is the Hash of the concatenation of the corresponding entity and its characteristic attribute.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program performs the steps of the method of hash-addressing-based knowledge-graph storage of any one of claims 1 to 8.
CN201910689943.2A 2019-07-29 2019-07-29 Knowledge graph storage method based on Hash addressing Active CN110543570B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910689943.2A CN110543570B (en) 2019-07-29 2019-07-29 Knowledge graph storage method based on Hash addressing

Publications (2)

Publication Number Publication Date
CN110543570A CN110543570A (en) 2019-12-06
CN110543570B true CN110543570B (en) 2022-03-11

Family

ID=68709931

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910689943.2A Active CN110543570B (en) 2019-07-29 2019-07-29 Knowledge graph storage method based on Hash addressing

Country Status (1)

Country Link
CN (1) CN110543570B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112015986B (en) * 2020-08-26 2024-01-26 北京奇艺世纪科技有限公司 Data pushing method, device, electronic equipment and computer readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102194002A (en) * 2011-05-25 2011-09-21 中兴通讯股份有限公司 Table entry adding, deleting and searching method of hash table and hash table storage device
CN104462501A (en) * 2014-12-19 2015-03-25 北京奇虎科技有限公司 Knowledge graph construction method and device based on structural data
CN107491555A (en) * 2017-09-01 2017-12-19 北京纽伦智能科技有限公司 Knowledge mapping construction method and system
CN108600321A (en) * 2018-03-26 2018-09-28 中国科学院计算技术研究所 A kind of diagram data storage method and system based on distributed memory cloud

Also Published As

Publication number Publication date
CN110543570A (en) 2019-12-06

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant