CN116186280A - Knowledge-graph data compression and decompression method and system - Google Patents

Info

Publication number
CN116186280A
Authority
CN
China
Prior art keywords
graph
knowledge
data
knowledge graph
entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211682921.1A
Other languages
Chinese (zh)
Inventor
杨娟
杨再飞
邵伯仲
翟士丹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Haizhi Xingtu Technology Co ltd
Original Assignee
Beijing Haizhi Xingtu Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Haizhi Xingtu Technology Co ltd
Priority to CN202211682921.1A
Publication of CN116186280A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Animal Behavior & Ethology (AREA)
  • Computational Linguistics (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention relates to a method and a system for compressing and decompressing knowledge graph data. The method comprises: obtaining a knowledge graph query request, sending the query request to a graph database, and recording the query conditions of the knowledge graph; forming, by a compression program in response to the result returned by the graph database, a first knowledge graph, wherein the first knowledge graph comprises knowledge graph data made up of a plurality of paths, each composed of entities and relations; removing, by the compression program, the repeated entity and relation data in the first knowledge graph data to obtain a second knowledge graph, wherein the second knowledge graph represents the entity set and the relation set after deduplication; processing the second knowledge graph according to the query conditions of the obtained knowledge graph query request; reconstructing, by a decompression program, the entity and relation information of the first knowledge graph from the second knowledge graph; and resolving all paths over the entities and relations of the first knowledge graph according to the compression program and the recorded query conditions of the knowledge graph.

Description

Knowledge-graph data compression and decompression method and system
Technical Field
The invention relates to the technical field of knowledge-graph data compression and decompression, in particular to a knowledge-graph data compression and decompression method and system.
Background
In recent years, knowledge graphs have been applied in more and more fields and graph data has grown ever larger in scale, so how to process knowledge graph data efficiently has become an important subject. Most work focuses on generating, cleaning, and displaying knowledge graph data, and pays little attention to the data expansion that occurs while the data is being processed. In a typical knowledge graph query, the knowledge graph data is stored in a graph database in advance and is queried through the graph database when needed. The result data returned by the graph database usually contains a subgraph, the subgraph contains a plurality of paths, and each path consists of several pieces of entity data and relationship data. The same entity or relationship may be part of different paths.
For these reasons, the result data contains a large amount of repeated entity and relationship data and expands noticeably, which in turn reduces the efficiency of subsequent transmission and computation on the knowledge graph data.
Therefore, a method is needed that can compress the returned knowledge graph data and thereby improve the efficiency of computing on and processing it.
Disclosure of Invention
The invention aims to provide a knowledge graph data compression and decompression method that solves the problem that the result returned by a graph database contains a large amount of repeated entity and relationship data, which expands the returned result data and reduces calculation and processing efficiency.
The first aspect of the invention provides a knowledge-graph data compression and decompression method, which comprises the following steps:
acquiring a knowledge graph query request, sending the query request to a graph database, and recording the query conditions of the knowledge graph;
forming, by a compression program in response to the result returned by the graph database, a first knowledge graph, wherein the first knowledge graph comprises knowledge graph data made up of a plurality of paths, each composed of entities and relations;
removing, by the compression program, the repeated entity and relation data in the first knowledge graph data to obtain a second knowledge graph, wherein the second knowledge graph represents the entity set and the relation set after deduplication;
processing the second knowledge graph according to the query conditions of the acquired knowledge graph query request;
reconstructing, by a decompression program, the entity and relation information of the first knowledge graph from the second knowledge graph; and
resolving all paths over the entities and relations of the first knowledge graph according to the compression program and the recorded query conditions of the knowledge graph, so as to form the first knowledge graph.
In one implementation, in the step of acquiring a knowledge graph query request, sending the query request to a graph database, and recording the query conditions of the knowledge graph, the query conditions include the query starting point and the traversal depth of the knowledge graph.
In one implementation, in the step in which the compression program responds to the result returned by the graph database, the compression program is a program written in Java.
In one implementation, the step of removing the repeated entity and relation data in the first knowledge graph data by the compression program to obtain the second knowledge graph includes:
traversing the first knowledge graph data by the compression program to obtain the entities and relations in each path;
deduplicating the entities and relations according to the primary key field values of the vertices and edges, so that exactly one copy of each entity and each relation is retained; and
establishing an entity data set from all the deduplicated entities and a relation data set from all the deduplicated relations.
In one implementation, the step of processing the second knowledge graph according to the query conditions of the acquired knowledge graph query request includes:
determining the required transmission and processing modes according to the query conditions of the acquired knowledge graph query request; and
performing attribute marking and data filtering on the entities and relations in the second knowledge graph according to the transmission and processing modes.
In one implementation, the step of reconstructing, by a decompression program, the entity and relation information of the first knowledge graph from the second knowledge graph includes:
acquiring the second knowledge graph after attribute marking and data filtering; and
reconstructing the entity and relation information of the first knowledge graph by the decompression program, wherein the decompression program is the jgrapht graph algorithm library.
In one implementation, the step of resolving all paths over the entities and relations of the first knowledge graph according to the compression program and the recorded query conditions of the knowledge graph, so as to form the first knowledge graph, includes:
resolving the path information over the entities and relations of the first knowledge graph by means of the jgrapht graph algorithm library, the recorded query conditions of the knowledge graph, and the entity and relation information of the first knowledge graph.
The second aspect of the present application provides a knowledge-graph data compression and decompression system that implements the aforementioned knowledge-graph data compression and decompression method. The system includes:
an acquisition unit, configured to acquire a knowledge graph query request, send the query request to a graph database, and record the query conditions of the knowledge graph;
a compression unit, configured to form, by the compression program in response to the result returned by the graph database, a first knowledge graph, wherein the first knowledge graph comprises knowledge graph data made up of a plurality of paths, each composed of entities and relations;
a deduplication unit, configured to remove, by the compression program, the repeated entity and relation data in the first knowledge graph data to obtain a second knowledge graph, wherein the second knowledge graph represents the entity set and the relation set after deduplication;
a processing unit, configured to process the second knowledge graph according to the query conditions of the acquired knowledge graph query request;
a decompression unit, configured to reconstruct, by the decompression program, the entity and relation information of the first knowledge graph from the second knowledge graph; and
a restoration unit, configured to resolve all paths over the entities and relations of the first knowledge graph according to the compression program and the recorded query conditions of the knowledge graph, so as to form the first knowledge graph.
A third aspect of the present application provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the knowledge-graph data compression and decompression method described above when executing the computer program.
A fourth aspect of the present application provides a computer storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the steps of the knowledge-graph data compression and decompression method described above.
The invention has the beneficial effects that:
A knowledge graph query request is acquired and sent to a graph database, and the query conditions of the knowledge graph are recorded. The compression program responds to the result returned by the graph database and removes the repeated entity and relation data in that result, thereby obtaining a second knowledge graph. The second knowledge graph is then processed according to the query conditions of the knowledge graph query request, so that the compressed second knowledge graph, and the returned result formed from it, can be identified according to the query conditions. In this way, the decompression program can reconstruct the entity and relation information of the first knowledge graph from the second knowledge graph. Finally, all paths over the entities and relations of the first knowledge graph are resolved using the compression program and the recorded query conditions of the knowledge graph, forming the first knowledge graph made up of a plurality of paths.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention, and a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a knowledge-graph data compression and decompression method of the invention;
FIG. 2 is an example knowledge graph for the knowledge-graph data compression and decompression method of the present invention;
FIG. 3 is a flowchart for deleting duplicate entities and relationships in a knowledge-graph data compression and decompression method according to the present invention;
FIG. 4 is a schematic diagram of deduplication according to the primary key field value in the knowledge-graph data compression and decompression method of the present invention;
FIG. 5 is a flowchart of a second knowledge-graph processing method in the knowledge-graph data compression and decompression method according to the present invention;
FIG. 6 is a schematic comparison of JSON graph database data before and after compression in the knowledge-graph data compression and decompression method of the present invention;
FIG. 7 is a flowchart of constructing a graph object in the knowledge-graph data compression and decompression method of the present invention;
FIG. 8 is a schematic diagram of an embodiment of a knowledge-graph data compression and decompression method according to the present invention.
Detailed Description
In the description of the embodiments of the present invention, those skilled in the art will appreciate that the embodiments of the present invention may be implemented as a method, an apparatus, an electronic device, and a computer-readable storage medium. Thus, embodiments of the present invention may be embodied in the following forms: complete hardware, complete software (including firmware, resident software, micro-code, etc.), a combination of hardware and software. Furthermore, in some embodiments, embodiments of the invention may also be implemented in the form of a computer program product in one or more computer-readable storage media having computer program code embodied therein.
The computer-readable storage media described above may be any combination of one or more computer-readable storage media. The computer-readable storage medium includes: an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination thereof. More specific examples of the computer-readable storage medium include the following: portable computer diskette, hard disk, Random Access Memory (RAM), Read-Only Memory (ROM), Erasable Programmable Read-Only Memory (EPROM), flash memory, optical fiber, Compact Disc Read-Only Memory (CD-ROM), optical storage device, magnetic storage device, or any combination thereof. In embodiments of the present invention, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer program code embodied in the computer-readable storage medium may be transmitted using any appropriate medium, including: wireless, wire, fiber optic cable, Radio Frequency (RF), or any suitable combination thereof.
Computer program code for carrying out operations of embodiments of the present invention may be written in assembly instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, integrated circuit configuration data, or in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the C language or similar programming languages. The computer program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer or to an external computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN).
The embodiment of the invention describes a method, a device and electronic equipment through flowcharts and/or block diagrams.
It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions. These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer readable program instructions may also be stored in a computer readable storage medium that can cause a computer or other programmable data processing apparatus to function in a particular manner. Thus, instructions stored in a computer-readable storage medium produce an instruction means which implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The terms first and second and the like in the description and in the claims of embodiments of the invention, are used for distinguishing between different objects and not necessarily for describing a particular sequential order of objects. For example, the first target object and the second target object, etc., are used to distinguish between different target objects, and are not used to describe a particular order of target objects.
In embodiments of the invention, words such as "exemplary" or "such as" are used to mean serving as an example, instance, or illustration. Any embodiment or design described herein as "exemplary" or "e.g." in an embodiment should not be taken as preferred or advantageous over other embodiments or designs. Rather, the use of words such as "exemplary" or "such as" is intended to present related concepts in a concrete fashion.
In the description of the embodiments of the present invention, unless otherwise indicated, the meaning of "a plurality" means two or more. For example, the plurality of processing units refers to two or more processing units; the plurality of systems means two or more systems.
The terms appearing in the present application are explained as follows:
jgrapht (JGraphT) is an open-source library written in Java that focuses on graph data structures and algorithms.
As shown in fig. 1, a first aspect of the present application provides a knowledge-graph data compression and decompression method, where the method includes:
S100: acquiring a knowledge graph query request, sending the query request to a graph database, and recording the query conditions of the knowledge graph.
The obtained knowledge graph query request is sent to the graph database, which responds to it, and the starting point and traversal depth of the graph query are recorded, so that the query result can be returned according to that starting point and traversal depth.
Specifically, the query conditions include the query starting point of the knowledge graph and the traversal depth.
As shown in FIG. 2, the query starting point is the point in the knowledge graph from which content is to be queried, and the traversal depth is the depth to which content is queried from that starting point. Illustratively, in FIG. 2, the traversal results are H-I, H-G, H-D, H-B, and H-A when the traversal depth is 2 degrees, and the traversal results are A-I and A-B when the traversal depth is 2 degrees with point A as the starting point. With J as the starting point and a traversal depth of 3 degrees, the traversal results are J-I-A and J-I-H.
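The recorded query conditions can be pictured as a small immutable value that is kept alongside the request until the paths are restored. The following is only a minimal sketch: the class name `QueryCondition` and its accessors are hypothetical and do not appear in the application, and the sketch deliberately does not decide how a "degree" of traversal depth is counted.

```java
import java.util.Objects;

/** Hypothetical holder for the query conditions recorded in S100. */
final class QueryCondition {
    private final String startVertexId; // query starting point
    private final int traversalDepth;   // traversal depth, in degrees

    QueryCondition(String startVertexId, int traversalDepth) {
        this.startVertexId = Objects.requireNonNull(startVertexId, "startVertexId");
        if (traversalDepth < 1) {
            throw new IllegalArgumentException("traversalDepth must be at least 1");
        }
        this.traversalDepth = traversalDepth;
    }

    String startVertexId() { return startVertexId; }

    int traversalDepth() { return traversalDepth; }

    @Override
    public String toString() {
        return "QueryCondition{start=" + startVertexId + ", depth=" + traversalDepth + "}";
    }

    public static void main(String[] args) {
        // Record the conditions before the request is sent to the graph database.
        QueryCondition recorded = new QueryCondition("J", 3);
        System.out.println(recorded);
    }
}
```

Keeping the value immutable means the same conditions that produced the query result are the ones later used to restore the paths.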
S200: the compression program responds to the result returned by the graph database.
The returned result forms a first knowledge graph, which comprises knowledge graph data made up of a plurality of paths, each composed of entities and relations.
Specifically, the compression program is a program written in Java. The graph database returns a first knowledge graph composed of a plurality of paths to the Java compression program, and each path contains entities and relationships. It is to be understood that the compression program is not limited to a Java program; a program written in another language, such as C++ or Python, may also be used to initiate the knowledge graph query request to the graph database and to record query conditions such as the graph query starting point and the traversal depth.
It should be noted that the first knowledge graph is the knowledge graph that the query request is meant to obtain. However, because it is the original knowledge graph data from the graph database, it contains many repeated entities and relationships and its data volume is noticeably inflated, which reduces the efficiency of subsequent transmission and computation on the knowledge graph data. Therefore, the knowledge graph data needs to be compressed.
S300: removing the repeated entity and relation data in the first knowledge graph data by the compression program to obtain a second knowledge graph.
The second knowledge graph represents the entity set and the relation set after deduplication.
Specifically, the compression program compresses the obtained first knowledge graph data, deleting repeated entities and repeated relations in the process. The deletion comprises steps S301 to S303.
As shown in FIG. 3, S301: the compression program traverses the first knowledge graph data to obtain the entities and relations in each path.
The compression program traverses the first knowledge graph data, and may do so according to the traversal depth specified in the query conditions of the acquired knowledge graph query request; this traversal yields the entities and relations in each path.
S302: the entities and relations are deduplicated according to the primary key field values of the vertices and edges, so that exactly one copy of each entity and each relation is retained.
As shown in FIG. 4, in this embodiment the first knowledge graph is deduplicated according to the vertex primary key vertex_id and the edge primary key edge_id. Illustratively, the graph database is searched, with vertex1 as the starting point, for all neighbor entities and relations within 2 degrees. The neighbors within 2 degrees of vertex1 are vertex2 and vertex3; the relation between vertex1 and vertex2 is Edge1_2, the relation between vertex1 and vertex3 is Edge1_3, and the relation between vertex2 and vertex3 is Edge2_3. The returned paths are Path1, Path2, and Path3: Path1 runs from vertex1 to vertex2 via Edge1_2; Path2 runs from vertex1 to vertex2 via Edge1_2 and then from vertex2 to vertex3 via Edge2_3; Path3 runs from vertex1 to vertex3 via Edge1_3.
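The three example paths can be worked through in a few lines of Java to see how much repetition the raw result carries. This is an illustrative sketch only: each path is modeled as a plain list that alternates vertex ids and edge ids, and the class and method names (`DedupExample`, `uniqueIds`, `rawRecordCount`) are hypothetical.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class DedupExample {
    /** Each path alternates vertex ids and edge ids: vertex, edge, vertex, ... */
    static final List<List<String>> PATHS = Arrays.asList(
            Arrays.asList("vertex1", "Edge1_2", "vertex2"),
            Arrays.asList("vertex1", "Edge1_2", "vertex2", "Edge2_3", "vertex3"),
            Arrays.asList("vertex1", "Edge1_3", "vertex3"));

    /** Collects the ids at even positions (vertices) or odd positions (edges), without repeats. */
    static Set<String> uniqueIds(List<List<String>> paths, boolean vertices) {
        Set<String> ids = new LinkedHashSet<>();
        for (List<String> path : paths) {
            for (int i = vertices ? 0 : 1; i < path.size(); i += 2) {
                ids.add(path.get(i));
            }
        }
        return ids;
    }

    /** Counts how many vertex and edge records the raw path result carries in total. */
    static int rawRecordCount(List<List<String>> paths) {
        int count = 0;
        for (List<String> path : paths) {
            count += path.size();
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println("raw records: " + rawRecordCount(PATHS));
        System.out.println("unique vertices: " + uniqueIds(PATHS, true));
        System.out.println("unique edges: " + uniqueIds(PATHS, false));
    }
}
```

For these three paths the raw result carries 11 records (7 vertex occurrences and 4 edge occurrences), while the deduplicated sets hold 3 vertices and 3 edges.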
It should be noted that the compression program is not limited to a Java program; programs written in other languages (such as C++ or Python) may also be used to deduplicate the entities and relationships returned by the graph database according to the primary key values and thereby complete the compression of the graph data. It is also to be understood that deduplication is not limited to the primary key: the entities and relationships returned by the graph database may instead be deduplicated using all attribute values, or the MD5 value, of each piece of data. The present embodiment is not limited in this respect.
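Where no primary key is available, one way to derive an MD5-based deduplication key is to hash a canonical rendering of a record's attributes. The sketch below is an assumption about how that could look, not the application's own procedure: the class `ContentKey`, the method `md5Of`, and the `key=value` line format are all hypothetical, and the simple separator scheme would need escaping if attribute values can contain `=` or line breaks.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Map;
import java.util.TreeMap;

final class ContentKey {
    /** Returns the MD5 hex digest of a record's attributes, taken in sorted key order. */
    static String md5Of(Map<String, String> attributes) {
        StringBuilder canonical = new StringBuilder();
        // TreeMap sorts by key, so attribute order does not change the digest.
        for (Map.Entry<String, String> e : new TreeMap<>(attributes).entrySet()) {
            canonical.append(e.getKey()).append('=').append(e.getValue()).append('\n');
        }
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(canonical.toString().getBytes(StandardCharsets.UTF_8));
            // Render the 16-byte digest as 32 lowercase hex digits, keeping leading zeros.
            return String.format("%032x", new BigInteger(1, digest));
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException("MD5 is not available on this platform", ex);
        }
    }
}
```

Two records with the same attribute values then map to the same key regardless of the order in which their attributes were read, so the key can be used in place of `vertex_id` or `edge_id` when building the deduplicated sets.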
S303: an entity data set is established from all the deduplicated entities, and a relation data set from all the deduplicated relations.
The repeated content is removed from the entities and relations obtained in the preceding steps, and an entity data set and a relation data set are established.
In this embodiment, the compression program removes the repeated data in the first knowledge graph, retaining only one copy of each item, and establishes an entity data set and a relation data set respectively, where the entity data set may be represented as Set<Vertex> and the relation data set as Set<Edge>.
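In Java, the two sets fall out naturally if `Vertex` and `Edge` define equality on their primary keys, because `Set.add` then ignores repeats. The sketch below assumes this design; the field names, the `Path` class, and the `compress` method are hypothetical illustrations rather than the application's actual classes.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class SecondGraphSketch {
    /** Entity whose identity is its primary key vertex_id only. */
    static final class Vertex {
        final String vertexId;
        final String label;
        Vertex(String vertexId, String label) { this.vertexId = vertexId; this.label = label; }
        @Override public boolean equals(Object o) {
            return o instanceof Vertex && ((Vertex) o).vertexId.equals(vertexId);
        }
        @Override public int hashCode() { return vertexId.hashCode(); }
    }

    /** Relation whose identity is its primary key edge_id only. */
    static final class Edge {
        final String edgeId;
        final String fromId;
        final String toId;
        Edge(String edgeId, String fromId, String toId) {
            this.edgeId = edgeId; this.fromId = fromId; this.toId = toId;
        }
        @Override public boolean equals(Object o) {
            return o instanceof Edge && ((Edge) o).edgeId.equals(edgeId);
        }
        @Override public int hashCode() { return edgeId.hashCode(); }
    }

    /** One returned path: its vertices in order and the edges between them. */
    static final class Path {
        final List<Vertex> vertices;
        final List<Edge> edges;
        Path(List<Vertex> vertices, List<Edge> edges) { this.vertices = vertices; this.edges = edges; }
    }

    final Set<Vertex> vertices = new LinkedHashSet<>();
    final Set<Edge> edges = new LinkedHashSet<>();

    /** Builds the deduplicated sets; Set.addAll keeps the first copy and ignores later repeats. */
    static SecondGraphSketch compress(List<Path> paths) {
        SecondGraphSketch second = new SecondGraphSketch();
        for (Path p : paths) {
            second.vertices.addAll(p.vertices);
            second.edges.addAll(p.edges);
        }
        return second;
    }
}
```

Using `LinkedHashSet` keeps the order in which entities and relations were first encountered, which is convenient for debugging but is not required by the method.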
After the repeated entities and relations are removed, the relations need to be restored when the data is presented to the user, so that the user can query the knowledge graph data. This is achieved through the following steps.
S400: processing the second knowledge graph according to the query conditions of the acquired knowledge graph query request.
The query conditions recorded for the knowledge graph query request in step S100 are used here. For example, when the knowledge graph query request is sent, it carries corresponding requirements on the returned data, such as the transmission mode and the calculation processing mode, so that the returned knowledge graph can be conveniently used, that is, subjected to the transmission and calculation processing required by the service.
Specifically, the step of processing the second knowledge graph includes S401 and S402.
As shown in FIG. 5, S401: the required transmission and processing modes are determined according to the query conditions of the acquired knowledge graph query request.
The query conditions of the acquired knowledge graph query request need to include the transmission mode and the processing mode for the returned data. The transmission mode is the manner in which data is transmitted over a channel, and the processing mode is the process of extracting valuable information from the entity data set and the relation data set, that is, of converting data into information.
S402: attribute marking and data filtering are performed on the entities and relations in the second knowledge graph according to the transmission and processing modes.
Further, as shown in FIG. 6, the transmission and processing modes involve attribute marking and data filtering of the entities and relations in the entity data set and the relation data set. In this way, entities and relations can be located and searched by their attributes, and data filtering conditions can be set to screen out unused entities and relations.
In this embodiment, transmitting and processing the second knowledge graph in this way, rather than the first knowledge graph data, significantly reduces the consumption of hardware resources such as network, CPU, and memory.
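Attribute marking and data filtering can each be applied once per deduplicated record instead of once per occurrence in every path. The following sketch shows one possible shape for both operations; the `Record` class and the attribute names `type` and `highlight` are hypothetical and chosen only for illustration.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

final class MarkAndFilter {
    /** A deduplicated entity or relation: its primary key plus free-form attributes. */
    static final class Record {
        final String id;
        final Map<String, String> attributes = new HashMap<>();

        Record(String id, String type) {
            this.id = id;
            attributes.put("type", type);
        }
    }

    /** Attribute marking: tags every record of the wanted type and returns how many were tagged. */
    static int mark(Collection<Record> records, String wantedType) {
        int marked = 0;
        for (Record r : records) {
            if (wantedType.equals(r.attributes.get("type"))) {
                r.attributes.put("highlight", "true");
                marked++;
            }
        }
        return marked;
    }

    /** Data filtering: removes every record of the unused type in place and returns how many were removed. */
    static int filterOut(Collection<Record> records, String unusedType) {
        int before = records.size();
        records.removeIf(r -> unusedType.equals(r.attributes.get("type")));
        return before - records.size();
    }
}
```

Because each entity and relation appears only once in the second knowledge graph, the cost of both passes grows with the number of distinct records rather than with the total length of all returned paths.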
S500: the decompression program reconstructs the entity and relation information of the first knowledge graph from the second knowledge graph.
Before the entity and relation information of the first knowledge graph are restored by using the decompression program, a graph object is constructed, and the graph object is expressed as org.
Specifically, the step of constructing the graph object includes S501 and S502.
As shown in FIG. 7, S501: the second knowledge graph that has undergone attribute marking and data filtering is acquired.
The entity data set and the relation data set of the second knowledge graph are obtained as they stand after attribute marking and data filtering.
S502: the entity and relation information of the first knowledge graph is reconstructed by the decompression program, wherein the decompression program is the jgrapht graph algorithm library.
The jgrapht graph algorithm library is used to reconstruct the entity and relation information of the first knowledge graph into a graph object, org.jgrapht.Graph, which supplies the information along each path when the paths are restored in the subsequent step.
The decompression program is not limited to the jgrapht graph algorithm library; other graph algorithm libraries (e.g., Guava's com.google.common.graph, Apache Commons Graph, etc.) may also be used to construct graph objects from the compressed graph data.
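To show what constructing a graph object from the two sets involves without depending on any particular library, the sketch below uses a plain adjacency map as a stand-in for the graph object. It is not the jgrapht API: the class `GraphObjectSketch` and its methods are hypothetical, and treating relations as directed is an assumption made for the illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class GraphObjectSketch {
    /** Directed relation identified by edge_id (direction is an assumption of this sketch). */
    static final class Edge {
        final String edgeId;
        final String fromId;
        final String toId;
        Edge(String edgeId, String fromId, String toId) {
            this.edgeId = edgeId; this.fromId = fromId; this.toId = toId;
        }
    }

    // vertex_id -> outgoing edges, in insertion order
    private final Map<String, List<Edge>> outgoing = new LinkedHashMap<>();

    void addVertex(String vertexId) {
        outgoing.putIfAbsent(vertexId, new ArrayList<>());
    }

    void addEdge(Edge e) {
        if (!outgoing.containsKey(e.fromId) || !outgoing.containsKey(e.toId)) {
            throw new IllegalArgumentException("edge " + e.edgeId + " refers to an unknown vertex");
        }
        outgoing.get(e.fromId).add(e);
    }

    List<Edge> outgoingEdges(String vertexId) {
        return Collections.unmodifiableList(
                outgoing.getOrDefault(vertexId, Collections.emptyList()));
    }

    int vertexCount() {
        return outgoing.size();
    }

    /** Rebuilds the graph object from the deduplicated entity ids and relations. */
    static GraphObjectSketch fromSets(Iterable<String> vertexIds, Iterable<Edge> edges) {
        GraphObjectSketch graph = new GraphObjectSketch();
        for (String id : vertexIds) {
            graph.addVertex(id); // all vertices first, so every edge endpoint exists
        }
        for (Edge e : edges) {
            graph.addEdge(e);
        }
        return graph;
    }
}
```

Adding all vertices before any edge mirrors the usual contract of graph libraries, which reject an edge whose endpoints are not yet part of the graph.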
S600: According to the compression program and the recorded query conditions of the knowledge graph, analyze all paths over the entities and relations of the first knowledge graph to form the first knowledge graph.
The first knowledge graph is restored (reconstructed) from the graph object org.jgrapht.Graph and the query conditions of the knowledge-graph query request recorded in step S100, so that the requester can use it.
Specifically, the path information of the entities and relations of the first knowledge graph is analyzed using the JGraphT graph algorithm library, the recorded query conditions of the knowledge graph, and the entity and relation information of the first knowledge graph.
The method is not limited to the JGraphT graph algorithm library; other graph libraries (e.g., Guava's com.google.common.graph, Apache Commons Graph, etc.) may also be used to apply the original graph query conditions (starting point, traversal depth) and extract path information from the compressed graph object.
It should be noted that a knowledge graph comprises entities, relations, and paths. The entities and relations are obtained in step S500 and the paths in step S600; combining them restores (reconstructs) the first knowledge graph, that is, the knowledge graph the query requester sought in step S100.
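The path restoration of S600 can be illustrated with a depth-limited traversal. The sketch below is an assumption-laden example rather than the patented procedure: it assumes the original query returned every simple directed path (no repeated vertex) of one to maxDepth relations from the starting point, and it uses a plain adjacency map instead of the JGraphT API. The names PathRestorer, Hop, and enumeratePaths are hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PathRestorer {

    // One outgoing relation: the edge_id and the vertex_id it leads to (names are assumptions).
    static final class Hop {
        final String edgeId;
        final String targetId;

        Hop(String edgeId, String targetId) {
            this.edgeId = edgeId;
            this.targetId = targetId;
        }
    }

    // Returns each path as an alternating list: vertex_id, edge_id, vertex_id, ...
    static List<List<String>> enumeratePaths(
            Map<String, List<Hop>> adjacency, String start, int maxDepth) {
        List<List<String>> result = new ArrayList<>();
        Deque<String> current = new ArrayDeque<>();
        Set<String> onPath = new HashSet<>();
        current.addLast(start);
        onPath.add(start);
        walk(adjacency, start, maxDepth, current, onPath, result);
        return result;
    }

    // Depth-first search that records a path after every hop and backtracks afterwards.
    private static void walk(Map<String, List<Hop>> adjacency, String vertex, int remaining,
                             Deque<String> current, Set<String> onPath,
                             List<List<String>> result) {
        if (remaining == 0) {
            return; // traversal depth from the recorded query condition is exhausted
        }
        for (Hop hop : adjacency.getOrDefault(vertex, Collections.emptyList())) {
            if (onPath.contains(hop.targetId)) {
                continue; // skip vertices already on the path so paths stay simple
            }
            current.addLast(hop.edgeId);
            current.addLast(hop.targetId);
            onPath.add(hop.targetId);
            result.add(new ArrayList<>(current));
            walk(adjacency, hop.targetId, remaining - 1, current, onPath, result);
            onPath.remove(hop.targetId);
            current.removeLast();
            current.removeLast();
        }
    }
}
```

For a graph with relations e1 (A to B), e2 (A to C), and e3 (B to C), starting point A and traversal depth 2, this yields the three paths A-e1-B, A-e1-B-e3-C, and A-e2-C. Whether the original query also returns paths shorter than the full depth depends on the graph database's semantics, which the source does not specify.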
Examples
As shown in fig. 8, the application provides a knowledge-graph data compression and decompression method, which includes the following steps:
Step one: A Java program initiates a knowledge-graph query request to the graph database and records the query conditions, such as the graph query starting point and the traversal depth.
Step two: The graph database returns the original knowledge-graph data (the first knowledge graph), composed of a plurality of paths, to the Java program.
Step three: The Java application traverses the original knowledge-graph result data obtained in the previous step, takes out the entity data Vertex and the relation data Edge in each path, de-duplicates them by the primary key field values vertex_id and edge_id so that exactly one copy of each entity and relation is kept, and obtains the de-duplicated entity data Set<Vertex> and relation data Set<Edge> (the second knowledge graph).
Step four: The transmission and computation required by the service, such as attribute marking and data filtering, are performed on the entity data Set<Vertex> and relation data Set<Edge> of the second knowledge graph obtained by compression in the previous step; compared with processing the original path data directly, this significantly reduces the consumption of hardware resources such as network, CPU, and memory.
Step five: The JGraphT graph algorithm library rebuilds a graph object org.jgrapht.Graph in memory from the entity data Set<Vertex> and relation data Set<Edge> processed in the previous step.
Step six: Using the JGraphT graph algorithm library together with the query conditions recorded in step one, such as the graph query starting point and the traversal depth, all path information is re-derived from the graph object org.jgrapht.Graph constructed in the previous step.
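The de-duplication in step three is the compression step proper, and it can be sketched as below. This is a hedged illustration only: the classes PathCompressor, Vertex, Edge, Path, and Compressed and their fields are hypothetical, and maps keyed by vertex_id and edge_id are used here in place of Set<Vertex> and Set<Edge> so that no equals/hashCode implementation is needed; the maps' values play the role of the two sets.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class PathCompressor {

    // Entity as returned inside a path; only the primary key is modeled here.
    static final class Vertex {
        final String vertexId;

        Vertex(String vertexId) {
            this.vertexId = vertexId;
        }
    }

    // Relation as returned inside a path, identified by its primary key edge_id.
    static final class Edge {
        final String edgeId;
        final String sourceId;
        final String targetId;

        Edge(String edgeId, String sourceId, String targetId) {
            this.edgeId = edgeId;
            this.sourceId = sourceId;
            this.targetId = targetId;
        }
    }

    // One path of the first knowledge graph: its entities and relations in order.
    static final class Path {
        final List<Vertex> vertices;
        final List<Edge> edges;

        Path(List<Vertex> vertices, List<Edge> edges) {
            this.vertices = vertices;
            this.edges = edges;
        }
    }

    // The second knowledge graph: one copy of every entity and relation.
    static final class Compressed {
        final Map<String, Vertex> verticesById;
        final Map<String, Edge> edgesById;

        Compressed(Map<String, Vertex> verticesById, Map<String, Edge> edgesById) {
            this.verticesById = verticesById;
            this.edgesById = edgesById;
        }
    }

    // Walks every path and keeps the first copy seen for each primary key.
    static Compressed compress(List<Path> paths) {
        Map<String, Vertex> vertices = new LinkedHashMap<>();
        Map<String, Edge> edges = new LinkedHashMap<>();
        for (Path path : paths) {
            for (Vertex vertex : path.vertices) {
                vertices.putIfAbsent(vertex.vertexId, vertex);
            }
            for (Edge edge : path.edges) {
                edges.putIfAbsent(edge.edgeId, edge);
            }
        }
        return new Compressed(vertices, edges);
    }
}
```

As an example of the saving, two paths A-e1-B-e2-C and A-e1-B-e3-D carry six entity copies and four relation copies in total, while the compressed form holds four entities and three relations. The gain grows with the number of paths that share a prefix.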
The second aspect of the present application provides a knowledge-graph data compression and decompression system that implements the knowledge-graph data compression and decompression method described above, wherein the system comprises:
the acquisition unit is used for acquiring a knowledge graph query request, sending the query request to the graph database and recording the query condition of the knowledge graph;
the compression unit is used for the compression program to respond to the return result of the graph database to form a first knowledge graph, wherein the first knowledge graph comprises knowledge graph data composed of entities and relations and formed by a plurality of paths;
the duplicate removal unit is used for removing duplicate data of the entities and the relations in the first knowledge-graph data according to the compression program to obtain a second knowledge-graph, wherein the second knowledge-graph represents an entity set and a relation set after duplicate removal;
the processing unit is used for processing the second knowledge graph according to the query condition of the acquired knowledge graph query request;
the decompression unit is used for reconstructing the entity and relation information of the first knowledge graph according to the second knowledge graph by the decompression program;
and the reduction unit is used for analyzing all paths of the entity and the relation of the first knowledge graph according to the compression program and the recorded query condition of the knowledge graph to form the first knowledge graph.
A third aspect of the present application provides a computer device including a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of the knowledge-graph data compression and decompression method described above when executing the computer program.
A fourth aspect of the present application provides a computer storage medium having a computer program stored thereon, wherein the computer program when executed by a processor implements the steps of a knowledge-graph data compression and decompression method as described above.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some or all of the technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention.

Claims (10)

1. The knowledge-graph data compression and decompression method is characterized by comprising the following steps of:
acquiring a knowledge graph query request, sending the query request to a graph database, and recording query conditions of the knowledge graph;
the compression program responds to a return result of the graph database to form a first knowledge graph, wherein the first knowledge graph comprises knowledge graph data composed of entities and relations and formed by a plurality of paths;
removing repeated data of entities and relations in the first knowledge-graph data according to the compression program to obtain a second knowledge-graph, wherein the second knowledge-graph represents an entity set and a relation set after repeated removal;
processing the second knowledge graph according to the query condition of the acquired knowledge graph query request;
reconstructing the entity and relation information of the first knowledge graph according to the second knowledge graph by a decompression program;
and according to the compression program and the query condition of the recorded knowledge graph, analyzing all paths of the entity and the relation of the first knowledge graph to form the first knowledge graph.
2. The method according to claim 1, wherein, in the step of acquiring a knowledge-graph query request, sending the query request to a graph database, and recording query conditions of the knowledge graph, the query conditions comprise: the query starting point and the traversal depth of the knowledge graph.
3. A knowledge-graph data compression and decompression method according to claim 1, wherein, in the step in which the compression program responds to the return result of the graph database, the compression program comprises a compression program written in Java.
4. The method for compressing and decompressing knowledge-graph data according to claim 1, wherein the step of removing the repeated data of the entities and the relationships in the first knowledge-graph data according to the compressing program to obtain the second knowledge-graph comprises:
traversing the first knowledge-graph data by the compression program to acquire entities and relations in each path;
de-duplicating the entities and the relations according to the primary key field values of the vertices and edges, so that exactly one copy of each entity and each relation is retained;
and establishing an entity data set from all the de-duplicated entities and a relation data set from all the de-duplicated relations.
5. The method for compressing and decompressing data of a knowledge-graph according to claim 1, wherein the step of processing the second knowledge-graph according to a query condition of a query request for obtaining the knowledge-graph comprises:
determining a required transmission and processing mode according to the query condition of the acquired knowledge graph query request;
and according to the transmission and processing modes, carrying out attribute marking and data filtering on the entities and the relations in the second knowledge graph.
6. The method according to claim 5, wherein the step of reconstructing the entity and relationship information of the first knowledge-graph from the second knowledge-graph according to the decompression program comprises:
acquiring a second knowledge graph subjected to attribute marking and data filtering;
and reconstructing the entity and relation information of the first knowledge graph according to the decompression program, wherein the decompression program is the JGraphT graph algorithm library.
7. The method for compressing and decompressing data of a first knowledge-graph according to claim 6, wherein the step of analyzing all paths of entities and relationships of the first knowledge-graph according to the compressing procedure and the query condition of recording the knowledge-graph, and forming the first knowledge-graph comprises:
and analyzing path information of the entity and the relationship of the first knowledge graph through the jgrapht graph algorithm library, the query condition of the recorded knowledge graph and the entity and the relationship information of the first knowledge graph.
8. A knowledge-graph data compression and decompression system, characterized by implementing a knowledge-graph data compression and decompression method according to any one of claims 1-7, the system comprising:
the acquisition unit is used for acquiring a knowledge graph query request, sending the query request to a graph database and recording the query condition of the knowledge graph;
the compression unit is used for responding to the return result of the graph database by the compression program to form a first knowledge graph, wherein the first knowledge graph comprises knowledge graph data composed of entities and relations and formed by a plurality of paths;
the duplicate removal unit is used for removing duplicate data of the entities and the relations in the first knowledge-graph data according to the compression program to obtain a second knowledge-graph, wherein the second knowledge-graph represents an entity set and a relation set after duplicate removal;
the processing unit is used for processing the second knowledge graph according to the query condition of the acquired knowledge graph query request;
the decompression unit is used for reconstructing the entity and relation information of the first knowledge graph according to the second knowledge graph by the decompression program;
and the reduction unit is used for analyzing all paths of the entity and the relation of the first knowledge graph according to the compression program and the query condition of recording the knowledge graph to form the first knowledge graph.
9. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of a knowledge-graph data compression and decompression method according to any one of claims 1 to 7.
10. A computer storage medium having stored thereon a computer program, which when executed by a processor realizes the steps of a knowledge-graph data compression and decompression method according to any one of claims 1 to 7.
CN202211682921.1A 2022-12-27 2022-12-27 Knowledge-graph data compression and decompression method and system Pending CN116186280A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211682921.1A CN116186280A (en) 2022-12-27 2022-12-27 Knowledge-graph data compression and decompression method and system

Publications (1)

Publication Number Publication Date
CN116186280A true CN116186280A (en) 2023-05-30

Family

ID=86441362

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211682921.1A Pending CN116186280A (en) 2022-12-27 2022-12-27 Knowledge-graph data compression and decompression method and system

Country Status (1)

Country Link
CN (1) CN116186280A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110515968A (en) * 2019-08-30 2019-11-29 北京百度网讯科技有限公司 Method and apparatus for output information
CN113868254A (en) * 2021-09-28 2021-12-31 北京百度网讯科技有限公司 Method, device and storage medium for removing duplication of entity node in graph database
CN114020934A (en) * 2022-01-05 2022-02-08 深圳市其域创新科技有限公司 Method and system for integrating spatial semantic information based on knowledge graph
US20220335270A1 (en) * 2021-04-15 2022-10-20 International Business Machines Corporation Knowledge graph compression
CN115510932A (en) * 2021-06-07 2022-12-23 中移(成都)信息通信科技有限公司 Model training method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination