CN111949810A - Data processing method and device based on graph database - Google Patents

Publication number
CN111949810A
Authority
CN
China
Prior art keywords
data
processed
relation
graph database
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910403525.2A
Other languages
Chinese (zh)
Inventor
曾智嵘
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hitachi Ltd
Original Assignee
Hitachi Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi Ltd
Priority to CN201910403525.2A
Publication of CN111949810A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/53 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a data processing method and device based on a graph database, and belongs to the technical field of data processing. The data processing method based on the graph database comprises the following steps: acquiring data to be processed of a graph database; calculating the relation depth of the data to be processed; and when the relation depth of the data to be processed is greater than a preset relation threshold value, semanticizing the data to be processed and storing the data to be processed in the graph database. The technical scheme of the invention can improve the query efficiency of the graph database.

Description

Data processing method and device based on graph database
Technical Field
The present invention relates to the field of data processing technologies, and in particular, to a data processing method and apparatus based on a graph database.
Background
Graph databases are becoming increasingly popular and are used to support applications such as knowledge graphs and images. A graph database, also referred to as a graph-oriented database, stores and queries data in a "graph" data structure: data is represented and stored by means of nodes, edges, attributes and the like, and operations such as addition, deletion, modification and query are supported.
In the prior art, all data to be stored is semanticized and imported into the graph database. However, some data is not suitable for semantic storage, and storing it in this way reduces the query efficiency of the graph database.
Disclosure of Invention
The invention aims to provide a data processing method and device based on a graph database, which can improve the query efficiency of the graph database.
To solve the above technical problem, embodiments of the present invention provide the following technical solutions:
in one aspect, a method for data processing based on a graph database is provided, including:
acquiring data to be processed of a graph database;
calculating the relation depth of the data to be processed;
and when the relation depth of the data to be processed is greater than a preset relation threshold value, semanticizing the data to be processed and storing the data to be processed in the graph database.
Further, the method also includes:
and when the relation depth of the data to be processed is not greater than the preset relation threshold, storing the data to be processed in a relation database or a key value database.
Further, the calculating the relation depth of the data to be processed includes:
searching the associated data associated with the data to be processed;
respectively calculating the shortest path between the data to be processed and each associated data;
and determining a maximum value from all the calculated shortest paths as the relation depth of the data to be processed.
Further, the data to be processed includes at least one of:
data to be stored in the graph database;
data directly associated with data to be deleted in the graph database.
Further, after the data to be processed is semantically converted and stored in the graph database, the method further comprises:
acquiring modified data of the data to be processed;
and semanticizing the modified data and storing the semanticized modified data in the graph database.
Further, the method further comprises the step of setting the relation threshold, wherein the step of setting the relation threshold comprises the following steps:
determining a value interval of the relation threshold;
selecting a relation threshold test value from the value interval, reading data in the graph database N times with the relation threshold test value as the relation depth, and calculating the sum of the time required by the N reads; repeating this step to obtain the time sum corresponding to each relation threshold test value in the value interval, wherein N is a positive integer;
and determining the relation threshold test value corresponding to the minimum time sum as the relation threshold.
Further, the method further comprises a step of determining an upper limit of the value interval, wherein the step of determining the upper limit of the value interval comprises:
a data selection step of selecting M data, wherein the relation depths of the M data are all greater than or equal to a preset threshold D1, and D1 is an integer greater than 2;
a storage step of storing the M data in each of L different types of databases, wherein the L databases include a graph database, and L is an integer greater than 1;
a testing step of selecting a depth test value from the range 2 to D1, querying the M data in each of the L databases with the depth test value as the relation depth to obtain L query duration sums, and determining the minimum query duration sum and its corresponding database;
and repeating the testing step, and when, for every depth test value greater than or equal to D2, the database corresponding to the minimum query duration sum is the graph database, determining that the upper limit of the value interval is D2.
An embodiment of the present invention further provides a data processing apparatus based on a graph database, including:
the acquisition module is used for acquiring data to be processed of the graph database;
the calculation module is used for calculating the relation depth of the data to be processed;
and the storage module is used for semanticizing the data to be processed and storing the data to be processed in the graph database when the relation depth of the data to be processed is greater than a preset relation threshold value.
Further, the storage module is further configured to store the data to be processed in a relational database or a key-value database when the relation depth of the data to be processed is not greater than the preset relation threshold.
Further, the calculation module includes:
the searching unit is used for searching the associated data associated with the data to be processed;
the shortest path calculating unit is used for calculating the shortest path between the data to be processed and each piece of associated data respectively;
and the determining unit is used for determining a maximum value from all the calculated shortest paths as the relation depth of the data to be processed.
Further, the data to be processed includes at least one of:
data to be stored in the graph database;
data directly associated with data to be deleted in the graph database.
Further, the obtaining module is further configured to obtain modification data of the data to be processed;
the storage module is further used for storing the modified data in the graph database after semantization.
Further, the apparatus also includes:
the relation threshold setting module is used for determining a value interval of the relation threshold; selecting a relation threshold test value from the value interval, reading data in the graph database N times with the relation threshold test value as the relation depth, and calculating the sum of the time required by the N reads; repeating this step to obtain the time sum corresponding to each relation threshold test value in the value interval, wherein N is a positive integer; and determining the relation threshold test value corresponding to the minimum time sum as the relation threshold.
Further, the device also comprises a value interval upper limit determining module, wherein the value interval upper limit determining module comprises:
the data selecting unit is used for selecting M data, the relation depths of the M data are all larger than or equal to a preset threshold value D1, and D1 is an integer larger than 2;
a storage unit, configured to store the M data in L different types of databases, respectively, where the L databases include a graph database;
the testing unit is used for selecting a depth test value from the range 2 to D1, querying the M data in each of the L databases with the depth test value as the relation depth to obtain L query duration sums, and determining the minimum query duration sum and its corresponding database;
and the determining unit is used for determining that the upper limit of the value interval is D2 when, for every depth test value greater than or equal to D2, the database corresponding to the minimum query duration sum is the graph database.
An embodiment of the present invention further provides a data processing device based on a graph database, including:
a processor; and
a memory having computer program instructions stored therein,
wherein the computer program instructions, when executed by the processor, cause the processor to perform the steps of the graph database based data processing method as described above.
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the processor is caused to execute the steps in the data processing method based on the graph database.
The embodiment of the invention has the following beneficial effects:
According to this scheme, after the data to be processed is received, whether it is suitable for semantic storage is judged according to its relation depth. When the relation depth of the data to be processed is greater than the preset relation threshold, the data is judged to be suitable for semantic storage, and is semanticized and stored in the graph database. Data that is not suitable for semantic storage is thereby prevented from being semanticized and stored in the graph database, so the query efficiency of the graph database can be ensured.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating a data processing method based on a graph database according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating relationships between data;
FIG. 3 is a flowchart illustrating setting of a relationship threshold according to an embodiment of the present invention;
FIG. 4 is a block diagram of a graph database based data processing apparatus according to an embodiment of the present invention;
FIG. 5 is a block diagram of a graph database based data processing apparatus according to an embodiment of the present invention;
FIG. 6 is a flowchart illustrating adding data to a graph database according to an embodiment of the present invention;
FIG. 7 is a schematic flow chart of modifying data in a graph database according to an embodiment of the present invention;
FIG. 8 is a flowchart illustrating deletion of data in a graph database according to an embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantages to be solved by the embodiments of the present invention clearer, the following detailed description will be given with reference to the accompanying drawings and specific embodiments.
The embodiment of the invention provides a data processing method and device based on a graph database, which can improve the query efficiency of the graph database.
Example one
An embodiment of the present invention provides a data processing method based on a graph database, as shown in fig. 1, including:
Step 101: acquiring data to be processed of a graph database;
wherein the data to be processed of the graph database comprises at least one of the following: data to be stored in the graph database; data directly associated with data to be deleted in the graph database.
The data to be processed may be in the form of text, such as "Yao Ming was born in Shanghai, China".
Step 102: calculating the relation depth of the data to be processed;
specifically, the associated data associated with the data to be processed may be searched, the shortest paths between the data to be processed and each associated data are respectively calculated, and a maximum value is determined from all the calculated shortest paths as the depth of relationship of the data to be processed.
As shown in fig. 2, take data A as the data to be processed. The data associated with A are B, C and D, where the shortest path between A and D is 1, the shortest path between A and B is 1, and the shortest path between A and C is 2. The relation depth of A is the maximum of 1, 1 and 2, that is, the relation depth of A is 2.
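The relation depth calculation can be sketched with a breadth-first search over an adjacency list. This is only an illustrative sketch: the function name `relation_depth` is hypothetical, and the edge set A-B, A-D, B-C is an assumption chosen to reproduce the shortest paths stated for fig. 2 (the figure itself may connect C differently).

```python
from collections import deque


def relation_depth(graph, start):
    """Return the largest shortest-path length from `start` to any reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                # First visit in BFS gives the shortest path in an unweighted graph.
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return max(dist.values())


# Assumed edges for fig. 2: A-B, A-D, B-C (undirected).
fig2 = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B"], "D": ["A"]}
depth_of_a = relation_depth(fig2, "A")
print(depth_of_a)  # 2
```

With these assumed edges the shortest paths from A are 1 (B), 1 (D) and 2 (C), so the maximum is 2, matching the example.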
Step 103: and when the relation depth of the data to be processed is greater than a preset relation threshold value, semanticizing the data to be processed and storing the data to be processed in the graph database.
Specifically, semanticizing the data to be processed means converting the text into a triple of the form (entity, entity relation, entity). For example, "Yao Ming was born in Shanghai, China" can be represented by the triple (Yao Ming, PlaceOfBirth, Shanghai). Entities serve as nodes, entity relations (including attributes, categories and the like) serve as edges between the nodes, and the triples are stored in the graph database in a graph data structure.
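As an illustration of how a triple maps onto nodes and edges, the following minimal sketch stores triples in an in-memory structure. `Triple` and `TinyGraphStore` are hypothetical names, not part of the described method or of any real graph database API, and the extraction of the triple from text is assumed to have happened already.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    subject: str   # entity, stored as a node
    relation: str  # entity relation, stored as an edge label
    obj: str       # entity, stored as a node


class TinyGraphStore:
    """Hypothetical in-memory stand-in for a graph database."""

    def __init__(self):
        self.nodes = set()
        self.edges = defaultdict(set)  # subject -> {(relation, object)}

    def add(self, triple):
        self.nodes.update((triple.subject, triple.obj))
        self.edges[triple.subject].add((triple.relation, triple.obj))

    def objects(self, subject, relation):
        """Return the objects linked to `subject` by `relation`, sorted."""
        return sorted(o for r, o in self.edges.get(subject, ()) if r == relation)


store = TinyGraphStore()
store.add(Triple("Yao Ming", "PlaceOfBirth", "Shanghai"))
print(store.objects("Yao Ming", "PlaceOfBirth"))  # ['Shanghai']
```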
In this embodiment, after the data to be processed is received, whether it is suitable for semantic storage is judged according to its relation depth. When the relation depth of the data to be processed is greater than the preset relation threshold, the data is judged to be suitable for semantic storage, and is semanticized and stored in the graph database. Data that is not suitable for semantic storage is thereby prevented from being semanticized and stored in the graph database, so the query efficiency of the graph database can be ensured.
When the relation depth of the data to be processed is not greater than the preset relation threshold, the data is considered unsuitable for semantization. Such data should not be semanticized and stored in the graph database, because doing so would reduce the query efficiency of the graph database; it needs to be stored in another type of database instead, such as a relational database or a key-value database.
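The decision rule of step 103 and its complement reduces to a single comparison. The sketch below assumes the relation depth has already been computed; the function name and the returned labels are hypothetical.

```python
def choose_store(relation_depth, relation_threshold):
    """Pick a storage target from the relation depth.

    Strictly greater than the threshold: semanticize and store in the graph
    database. Otherwise: store in a relational or key-value database.
    """
    if relation_depth > relation_threshold:
        return "graph"
    return "relational_or_key_value"


print(choose_store(3, 2))  # graph
print(choose_store(2, 2))  # relational_or_key_value
```

Note that data whose relation depth equals the threshold is not sent to the graph database, since the condition is "greater than", not "greater than or equal to".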
Table 1 compares, under the same query conditions, the query duration for data that is semanticized and stored in the graph database with the query duration for the same data stored in a relational database. When the relation depth of the data is 1, the query takes 0.01876 s in the graph database and 0.00078 s in the relational database; when the relation depth is 2, it takes 0.02304 s and 0.00556 s respectively; when the relation depth is 3, it takes 0.03092 s and 0.20788 s; when the relation depth is 4, it takes 0.04865 s and 9.89270 s. The query conditions are as follows: the number M of queried data is 100000, and the query is repeated 10 times. It can be seen that the larger the relation depth of the data, the more suitable the data is for semantic storage in the graph database, and the smaller the relation depth, the more suitable the data is for storage in other types of databases.
TABLE 1
Relation depth    Graph database (s)    Relational database (s)
1                 0.01876               0.00078
2                 0.02304               0.00556
3                 0.03092               0.20788
4                 0.04865               9.89270
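To make the crossover in Table 1 explicit, the following sketch compares the two columns per relation depth. The durations are copied from Table 1; the names `TABLE_1` and `faster_store` are hypothetical.

```python
# Relation depth -> (graph database seconds, relational database seconds), from Table 1.
TABLE_1 = {
    1: (0.01876, 0.00078),
    2: (0.02304, 0.00556),
    3: (0.03092, 0.20788),
    4: (0.04865, 9.89270),
}


def faster_store(depth):
    """Return which of the two databases answered faster at this depth."""
    graph_s, relational_s = TABLE_1[depth]
    return "graph" if graph_s < relational_s else "relational"


for depth in sorted(TABLE_1):
    graph_s, relational_s = TABLE_1[depth]
    # A ratio above 1 means the graph database was faster.
    print(depth, faster_store(depth), round(relational_s / graph_s, 2))
```

In these measurements the relational database is faster at depths 1 and 2, and the graph database is faster at depths 3 and 4.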
Further, after the data to be processed has been stored in the graph database, when modified data for it is received, the relation depth of the modified data does not need to be judged; the modified data is semanticized and stored in the graph database directly. For example, if "Yao Ming was born in Shanghai, China" is modified to "Yao Ming was born in Beijing, China", then "Yao Ming was born in Beijing, China" is semanticized and stored in the graph database directly, replacing the original data.
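A minimal sketch of the modification flow, assuming the stored data is keyed by (entity, relation) so that a modified triple simply overwrites the old object. The dictionary stands in for the graph database and `apply_modification` is a hypothetical helper; no relation depth check is performed, as the text describes.

```python
# Hypothetical in-memory stand-in for the graph database: (subject, relation) -> object.
graph_store = {("Yao Ming", "PlaceOfBirth"): "Shanghai"}


def apply_modification(store, triple):
    """Overwrite an already stored triple without re-checking the relation depth."""
    subject, relation, obj = triple
    if (subject, relation) not in store:
        raise KeyError("only data already in the graph store can be modified")
    store[(subject, relation)] = obj


apply_modification(graph_store, ("Yao Ming", "PlaceOfBirth", "Beijing"))
print(graph_store[("Yao Ming", "PlaceOfBirth")])  # Beijing
```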
When data in the graph database needs to be deleted, the data in the graph database that is directly associated with the data to be deleted is searched for first. Direct association means that the path between the two data is 1; as shown in fig. 2, the data directly associated with data A are data B and data D. The relation depth of each directly associated datum is then judged. When it is greater than the preset relation threshold, the datum is judged to be still suitable for semantic storage and is left unchanged. When it is not greater than the preset relation threshold, the datum is judged to be unsuitable for semantic storage and, after the data to be deleted has been deleted, is moved to another type of database for storage.
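The deletion flow can be sketched as follows. Two points are assumptions made for illustration: the relation depth of each direct neighbour is recomputed on the graph with the deleted node already removed (the text does not say whether the depth is taken before or after deletion), and the example graph with edges A-B, A-D, B-C, C-E, E-F is hypothetical. `plan_deletion` is a hypothetical helper name.

```python
from collections import deque


def relation_depth(graph, start):
    """Largest shortest-path length from `start` to any reachable node (BFS)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return max(dist.values())


def plan_deletion(graph, target, threshold):
    """Return (direct neighbours of target, neighbours to move to another database)."""
    neighbours = sorted(graph.get(target, ()))
    # Assumption: depths are evaluated on the graph as it looks after deletion.
    remaining = {
        node: [n for n in adjacent if n != target]
        for node, adjacent in graph.items()
        if node != target
    }
    to_move = [n for n in neighbours if relation_depth(remaining, n) <= threshold]
    return neighbours, to_move


# Hypothetical graph: A-B, A-D, B-C, C-E, E-F (undirected).
example = {
    "A": ["B", "D"], "B": ["A", "C"], "C": ["B", "E"],
    "D": ["A"], "E": ["C", "F"], "F": ["E"],
}
direct, moved = plan_deletion(example, "A", threshold=2)
print(direct, moved)  # ['B', 'D'] ['D']
```

After A is removed, B still reaches F in three hops and stays in the graph database, while D becomes isolated (depth 0) and is moved.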
In this embodiment, the value of the relation threshold may be adjusted adaptively according to how the data of the graph database as a whole is read, so as to optimize the query efficiency of the graph database. Specifically, the relation threshold may be set periodically, or after a trigger instruction is received. The step of setting the relation threshold may comprise: determining a value interval of the relation threshold; selecting a relation threshold test value from the value interval, reading data in the graph database N times with the relation threshold test value as the relation depth, and calculating the sum of the time required by the N reads; repeating this step to obtain the time sum corresponding to each relation threshold test value in the value interval, wherein N is a positive integer; and determining the relation threshold test value corresponding to the minimum time sum as the relation threshold.
In a specific example, as shown in fig. 3, the step of setting the relationship threshold includes:
Step 201: determining the value interval of the relation threshold to be 2 to Dmax;
Since the query efficiency of the graph database is low when the relation depth of the data is 1, the graph database need not be considered for a relation depth of 1. The case D = 1 is therefore excluded, and the minimum of the value interval is set to 2.
Step 202: selecting a relation threshold test value Di from the value interval;
Step 203: reading data in the graph database N times with the relation threshold test value Di as the relation depth, and calculating the time sum Ti required by the N reads;
Step 204: repeating steps 202 and 203 to obtain the time sums T2, T3, ..., TDmax corresponding to each relation threshold test value in the value interval;
Step 205: determining the relation threshold test value corresponding to the minimum time sum as the relation threshold D.
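Steps 201 to 205 can be sketched as a timing loop. This is a sketch under assumptions, not the patented implementation: `read_at_depth` is a hypothetical callable standing in for one read of the graph database at a given relation depth, and the `clock` parameter is added only so the loop can be exercised with a deterministic timer.

```python
import time


def choose_relation_threshold(read_at_depth, d_max, n_reads, clock=time.perf_counter):
    """Return (D, time_sums) where D minimises the time sum over depths 2..d_max."""
    if d_max < 2 or n_reads < 1:
        raise ValueError("need d_max >= 2 and n_reads >= 1")
    time_sums = {}
    for d_i in range(2, d_max + 1):          # steps 202 and 204: every test value Di
        total = 0.0
        for _ in range(n_reads):             # step 203: N reads at relation depth Di
            started = clock()
            read_at_depth(d_i)
            total += clock() - started
        time_sums[d_i] = total               # Ti
    best = min(time_sums, key=time_sums.get)  # step 205: smallest time sum wins
    return best, time_sums
```

In practice `read_at_depth` would issue a real query against the graph database; with a stub that takes 0.5, 0.2 and 0.9 time units at depths 2, 3 and 4, the function returns 3.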
The upper limit Dmax of the value interval of the relation threshold can be determined from the performance test results of each database. The step of determining the upper limit of the value interval comprises: a data selection step of selecting M data, wherein the relation depths of the M data are all greater than or equal to a preset threshold D1, and D1 is an integer greater than 2; a storage step of storing the M data in each of L different types of databases, wherein the L databases include a graph database, and L is an integer greater than 1; a testing step of selecting a depth test value from the range 2 to D1, querying the M data in each of the L databases with the depth test value as the relation depth to obtain L query duration sums, and determining the minimum query duration sum and its corresponding database; and repeating the testing step, and when, for every depth test value greater than or equal to D2, the database corresponding to the minimum query duration sum is the graph database, determining that the upper limit Dmax of the value interval is D2.
In a specific example, the data is stored in a graph database, a relational database and a key-value database, and a performance test is carried out on each database. The specific performance test method is as follows: select a fixed set of M data whose relation depths are all greater than or equal to a preset threshold D1, where M is chosen large enough that differences in query efficiency show clearly, and the value of D1 is greater than or equal to Dmax; store the M data in the graph database, the relational database and the key-value database respectively; query all M data at each relation depth from 2 to D1; record the query efficiency of each database at each relation depth; and compare the query efficiencies of the different databases. For example, if the query efficiency of the relational database is best when the relation depth is 2, that of the key-value database is best when the relation depth is 3, and that of the graph database is best when the relation depth is greater than or equal to 4, then Dmax is determined to be 4. That is, Dmax is the minimum relation depth at which, according to the performance tests, the query efficiency of the graph database exceeds that of the other databases.
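The determination of Dmax from the performance test results can be sketched as a scan from the largest tested depth downwards. The function name `upper_limit` and the duration figures in the example are hypothetical placeholders, chosen only so that the relational database wins at depth 2, the key-value database at depth 3 and the graph database from depth 4 upwards, as in the example above.

```python
def upper_limit(duration_sums, graph_name="graph"):
    """Return the smallest depth D2 such that the graph database has the minimum
    query duration sum at D2 and at every larger tested depth, or None.

    duration_sums: {depth: {database name: query duration sum in seconds}}
    """
    d2 = None
    for depth in sorted(duration_sums, reverse=True):
        sums = duration_sums[depth]
        fastest = min(sums, key=sums.get)
        if fastest != graph_name:
            break  # the graph database stops winning below this depth
        d2 = depth
    return d2


# Placeholder measurements (not real data) for depths 2..D1 with D1 = 5.
measured = {
    2: {"graph": 0.9, "relational": 0.3, "key_value": 0.5},
    3: {"graph": 0.8, "relational": 1.2, "key_value": 0.6},
    4: {"graph": 0.7, "relational": 3.0, "key_value": 1.1},
    5: {"graph": 0.9, "relational": 9.0, "key_value": 2.0},
}
print(upper_limit(measured))  # 4
```

If the graph database is not the fastest even at the largest tested depth, the function returns `None`, which would suggest repeating the test with a larger D1.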
The technical scheme of the embodiment can be applied to an edge data integration scene, a typical requirement of edge data integration is real-time processing, and when edge intelligent analysis is provided through semantic processing, logical reasoning and the like, the data query time can be shortened by using the scheme of the embodiment, so that the real-time processing capability is improved. The technical scheme of the embodiment can also be applied to scenes with higher requirements on intelligence and real time, such as a public security real-time monitoring scene, and can provide technical means in the aspects of quickly positioning suspects, preventing public safety incidents and the like; in addition, the technical scheme of the embodiment can also be applied to an intelligent manufacturing field, so that the error reason can be quickly positioned, and the loss possibly caused by production halt can be reduced. In addition, in the application of cloud big data, the technical scheme of the embodiment can shorten the data query time, so that more data can be processed in unit time, and the asset utilization rate of the data center is improved.
Example two
An embodiment of the present invention further provides a data processing apparatus based on a graph database, as shown in fig. 4, including:
an obtaining module 21, configured to obtain to-be-processed data of a graph database;
a calculating module 22, configured to calculate a depth of relationship of the to-be-processed data;
the storage module 23 is configured to semantically store the to-be-processed data in the graph database when the relationship depth of the to-be-processed data is greater than a preset relationship threshold.
In this embodiment, after the data to be processed is received, whether it is suitable for semantic storage is judged according to its relation depth. When the relation depth of the data to be processed is greater than the preset relation threshold, the data is judged to be suitable for semantic storage, and is semanticized and stored in the graph database. Data that is not suitable for semantic storage is thereby prevented from being semanticized and stored in the graph database, so the query efficiency of the graph database can be ensured.
Further, the storage module 23 is further configured to store the data to be processed in a relational database or a key-value database when the relation depth of the data to be processed is not greater than the preset relation threshold.
The obtaining module 21 is specifically configured to obtain to-be-processed data of a graph database, which includes at least one of the following: data to be stored in the graph database; data directly associated with data to be deleted in the graph database. As shown in fig. 4, after receiving the data to be processed, the obtaining module 21 sends it to the calculating module 22. The calculating module 22 calculates the relation depth of the data to be processed and tags the data accordingly. For example, when the relation depth of the data to be processed is greater than the preset relation threshold, the data is judged to be suitable for semantic storage and is marked with a "semantic storage tag"; when the relation depth is not greater than the preset relation threshold, the data is judged to be unsuitable for semantic storage and is marked with a "non-semantic storage tag". The tagged data is sent to the storage module 23. The storage module 23 is connected to the databases and stores the data in the graph database or in another type of database according to its tag: data carrying the "semantic storage tag" is stored in the graph database, and data carrying the "non-semantic storage tag" is stored in another type of database, such as a relational database or a key-value database.
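The hand-off between the calculating module and the storage module can be sketched with a tag enumeration. All names below (`StorageTag`, `calculation_module`, `storage_module`) are hypothetical, the relation depth is passed in rather than computed, and plain lists stand in for the databases.

```python
from enum import Enum


class StorageTag(Enum):
    SEMANTIC = "semantic storage tag"
    NON_SEMANTIC = "non-semantic storage tag"


def calculation_module(item, relation_depth, relation_threshold):
    """Tag the item according to its relation depth."""
    if relation_depth > relation_threshold:
        return item, StorageTag.SEMANTIC
    return item, StorageTag.NON_SEMANTIC


def storage_module(tagged_item, graph_db, other_db):
    """Store the item in the database selected by its tag."""
    item, tag = tagged_item
    target = graph_db if tag is StorageTag.SEMANTIC else other_db
    target.append(item)


graph_db, other_db = [], []  # stand-ins for the real databases
storage_module(calculation_module("deep item", 4, 2), graph_db, other_db)
storage_module(calculation_module("shallow item", 1, 2), graph_db, other_db)
print(graph_db, other_db)  # ['deep item'] ['shallow item']
```

Separating tagging from storing mirrors the module split in fig. 4: the storage module needs only the tag, not the relation depth or the threshold.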
Further, the calculation module 22 includes:
the searching unit is used for searching the associated data associated with the data to be processed;
the shortest path calculating unit is used for calculating the shortest path between the data to be processed and each piece of associated data respectively;
and the determining unit is used for determining a maximum value from all the calculated shortest paths as the relation depth of the data to be processed.
For the newly added data to be stored, the calculating module 22 determines the depth of relationship of the newly added data to be stored, determines whether the data to be stored is suitable for semantic storage, and outputs the determination result to the storing module 23.
When data in the graph database needs to be deleted, the obtaining module 21 first searches the graph database for the data directly associated with the data to be deleted. Direct association means that the path between the two data is 1; as shown in fig. 2, the data directly associated with data A are data B and data D. The calculation module 22 judges the relation depth of each directly associated datum. When it is greater than the preset relation threshold, the datum is judged to be still suitable for semantic storage. When it is not greater than the preset relation threshold, the calculation module 22 judges the datum to be unsuitable for semantic storage, and after the data to be deleted has been deleted, the storage module 23 moves the datum to another type of database for storage.
Further, after the data to be processed has been stored in the graph database, when modified data for it is received, the relation depth of the modified data does not need to be judged; the modified data is semanticized and stored in the graph database directly. The obtaining module 21 is further configured to obtain the modified data of the data to be processed; the storage module 23 is further configured to semanticize the modified data and store it in the graph database. For example, if "Yao Ming was born in Shanghai, China" is modified to "Yao Ming was born in Beijing, China", then "Yao Ming was born in Beijing, China" is semanticized and stored in the graph database directly, replacing the original data.
In this embodiment, the value of the relationship threshold may be adaptively adjusted according to the reading condition of the whole data of the graph database, so as to optimize the query efficiency of the graph database. Specifically, the setting of the relationship threshold may be performed periodically, or may be performed after receiving the trigger instruction.
Further, the apparatus further comprises:
the relation threshold setting module is used for determining a value interval of the relation threshold; selecting a relation threshold test value from the value interval, reading data in the graph database N times with the relation threshold test value as the relation depth, and calculating the total time required by the N reads; repeating these steps to obtain the total time corresponding to each relation threshold test value in the value interval, wherein N is a positive integer; and determining the relation threshold test value corresponding to the minimum total time as the relation threshold.
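The threshold-setting procedure above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `choose_relation_threshold`, `read_at_depth` and `clock` are hypothetical, and how a read "with the test value as the relation depth" is actually issued against the graph database is left to the caller, since the text does not specify it.

```python
import time
from typing import Callable, Iterable


def choose_relation_threshold(
    candidates: Iterable[int],
    read_at_depth: Callable[[int], None],
    n_reads: int,
    clock: Callable[[], float] = time.perf_counter,
) -> int:
    """Return the test value whose N timed reads have the smallest total time.

    `read_at_depth` is a caller-supplied (hypothetical) function that reads
    data from the graph database using the given value as the relation depth.
    """
    if n_reads < 1:
        raise ValueError("n_reads must be a positive integer")
    best_value = None
    best_total = float("inf")
    for value in candidates:
        total = 0.0
        for _ in range(n_reads):  # read N times and sum the elapsed time
            start = clock()
            read_at_depth(value)
            total += clock() - start
        if total < best_total:  # keep the test value with the minimum time sum
            best_value, best_total = value, total
    if best_value is None:
        raise ValueError("the value interval is empty")
    return best_value
```

The `clock` parameter exists only so that the timing source can be replaced in tests; in normal use the default monotonic timer would be kept.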
Further, the device also includes a value interval upper limit determining module, where the value interval upper limit determining module includes:
the data selecting unit is used for selecting M data, the relation depths of the M data are all larger than or equal to a preset threshold value D1, and D1 is an integer larger than 2;
a storage unit, configured to store the M data in L different types of databases, respectively, where the L databases include a graph database;
the testing unit is used for selecting a depth test value from the range of 2 to D1, querying the M data in each of the L databases with the depth test value as the relation depth to obtain L query duration sums, and determining the minimum query duration sum and its corresponding database;
and the determining unit is used for determining that the upper limit of the value interval is D2 when, for every depth test value greater than or equal to D2, the database corresponding to the minimum query duration sum is the graph database.
In a specific example, the data is stored in a graph database, a relational database and a key-value database, and a performance test is run on each database. The test proceeds as follows: select a fixed set of M data whose relation depths are all greater than or equal to a preset threshold D1, where M is chosen so that differences in query efficiency show up clearly and D1 is greater than or equal to Dmax; store the M data in the graph database, the relational database and the key-value database respectively; query all M data at each relation depth from 2 to D1; record the query efficiency of each database at each relation depth; and compare the query efficiencies of the different databases. For example, if the relational database is the most efficient at a relation depth of 2, the key-value database is the most efficient at a relation depth of 3, and the graph database is the most efficient at relation depths greater than or equal to 4, then Dmax is determined to be 4. That is, Dmax is the minimum relation depth, obtained from the performance tests, from which the query efficiency of the graph database exceeds that of the other databases.
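The determination of the upper limit can be illustrated with a short sketch. It assumes the per-depth query duration sums have already been measured; the function name `value_interval_upper_limit`, the database labels, and every timing figure below are hypothetical and chosen only to reproduce the ordering in the example above, not taken from any measurement.

```python
from typing import Dict, Optional


def value_interval_upper_limit(
    totals: Dict[int, Dict[str, float]], graph_name: str = "graph"
) -> Optional[int]:
    """Return the smallest depth D2 such that, at every tested depth >= D2,
    the graph database has the minimum query duration sum.

    totals[depth][database] is the summed time to query the M data at that
    depth. Returns None if the graph database is not fastest at the largest
    tested depth. Ties are resolved by dictionary order (an assumption).
    """
    d2 = None
    for depth in sorted(totals, reverse=True):  # walk down from D1
        fastest = min(totals[depth], key=totals[depth].get)
        if fastest != graph_name:
            break  # graph database no longer wins below this depth
        d2 = depth
    return d2


# Illustrative numbers only: relational wins at depth 2, key-value at depth 3,
# and the graph database at depths 4 to 6.
example_totals = {
    2: {"graph": 9.0, "relational": 4.0, "key_value": 6.0},
    3: {"graph": 8.0, "relational": 9.0, "key_value": 5.0},
    4: {"graph": 7.0, "relational": 12.0, "key_value": 10.0},
    5: {"graph": 8.0, "relational": 20.0, "key_value": 15.0},
    6: {"graph": 9.0, "relational": 30.0, "key_value": 22.0},
}
print(value_interval_upper_limit(example_totals))  # prints 4
```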
Example three
An embodiment of the present invention further provides a data processing device 30 based on a graph database, as shown in fig. 5, including:
a processor 32; and
a memory 34, in which memory 34 computer program instructions are stored,
wherein the computer program instructions, when executed by the processor, cause the processor 32 to perform the steps of:
acquiring data to be processed of a graph database;
calculating the relation depth of the data to be processed;
and when the relation depth of the data to be processed is greater than a preset relation threshold value, semanticizing the data to be processed and storing the data to be processed in the graph database.
In this embodiment, after the data to be processed is received, whether it is suitable for semantic storage is judged according to its relation depth. When the relation depth of the data to be processed is greater than the preset relation threshold, the data is judged to be suitable for semantic storage and is semanticized and stored in the graph database. This prevents data that is unsuitable for semantic storage from being semanticized and stored in the graph database, and so preserves the query efficiency of the graph database.
Further, as shown in FIG. 5, the graph database-based data processing device 30 also includes a network interface 31, an input device 33, a hard disk 35, and a display device 36.
The various interfaces and devices described above may be interconnected by a bus architecture. A bus architecture may be any architecture that may include any number of interconnected buses and bridges. Various circuits of one or more Central Processing Units (CPUs), represented in particular by processor 32, and one or more memories, represented by memory 34, are coupled together. The bus architecture may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like. It will be appreciated that a bus architecture is used to enable communications among the components. The bus architecture includes a power bus, a control bus, and a status signal bus, in addition to a data bus, all of which are well known in the art and therefore will not be described in detail herein.
The network interface 31 may be connected to a network (e.g., the internet, a local area network, etc.), and may obtain relevant data from the network, such as data to be processed in a graph database, and may store the relevant data in the hard disk 35.
The input device 33 can receive various commands input by the operator and send the commands to the processor 32 for execution. The input device 33 may comprise a keyboard or a pointing device (e.g., a mouse, a trackball, a touch pad or a touch screen, etc.).
The display device 36 may display the results of the instructions executed by the processor 32.
The memory 34 is used for storing programs and data necessary for operating the operating system, and data such as intermediate results in the calculation process of the processor 32.
It will be appreciated that memory 34 in embodiments of the invention may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), or a flash memory. Volatile memory can be Random Access Memory (RAM), which acts as external cache memory. The memory 34 of the apparatus and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.
In some embodiments, memory 34 stores the following elements, executable modules or data structures, or a subset thereof, or an expanded set thereof: an operating system 341 and application programs 342.
The operating system 341 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application 342 includes various applications, such as a Browser (Browser), and the like, for implementing various application services. A program implementing the method of an embodiment of the present invention may be included in the application 342.
The processor 32, when calling and executing the application program and data stored in the memory 34, may specifically obtain data to be processed from a graph database; calculating the relation depth of the data to be processed; and when the relation depth of the data to be processed is greater than a preset relation threshold value, semanticizing the data to be processed and storing the data to be processed in the graph database.
The methods disclosed in the above embodiments of the present invention may be implemented in the processor 32 or by the processor 32. The processor 32 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware or instructions in the form of software in the processor 32. The processor 32 may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of the method disclosed in connection with the embodiments of the present invention may be directly implemented by a hardware decoding processor, or implemented by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 34, and the processor 32 reads the information in the memory 34 and completes the steps of the method in combination with its hardware.
It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units designed to perform the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Further, when the relation depth of the data to be processed is not greater than the preset relation threshold, the processor 32 stores the data to be processed in a relational database or a key-value database.
Further, the processor 32 searches for associated data associated with the data to be processed;
respectively calculating the shortest path between the data to be processed and each associated data;
and determining a maximum value from all the calculated shortest paths as the relation depth of the data to be processed.
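The relation depth calculation in the three steps above can be sketched with a breadth-first search. This is a minimal illustration under stated assumptions: path length is counted in hops, associations are undirected, and the edges of the example graph (A-B, A-D, B-C) are hypothetical, chosen so that B and D are directly associated with A and C is two hops away. The function name `relation_depth` is likewise illustrative.

```python
from collections import deque
from typing import Dict, Hashable, Iterable


def relation_depth(graph: Dict[Hashable, Iterable[Hashable]], start: Hashable) -> int:
    """Return the largest shortest-path length from `start` to any associated item.

    Path length is counted in hops (an assumption); an item with no
    associated data gets depth 0.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:  # first visit gives the shortest hop count
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return max(distance.values())


# Hypothetical undirected edges: A-B, A-D, B-C.
example_graph = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B"], "D": ["A"]}
print(relation_depth(example_graph, "A"))  # prints 2
```

Breadth-first search is used here because it yields shortest hop counts directly; a weighted notion of path length would need a different shortest-path routine.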
Further, the data to be processed includes at least one of:
data to be stored in the graph database;
data directly associated with data to be deleted in the graph database.
Further, the processor 32 obtains modification data of the data to be processed;
and semanticizing the modified data and storing the semanticized modified data in the graph database.
Further, the processor 32 determines a value interval of the relation threshold; selects a relation threshold test value from the value interval, reads data in the graph database N times with the relation threshold test value as the relation depth, and calculates the total time required by the N reads; repeats these steps to obtain the total time corresponding to each relation threshold test value in the value interval, wherein N is a positive integer; and determines the relation threshold test value corresponding to the minimum total time as the relation threshold.
Further, the processor 32 performs: a data selection step of selecting M data, wherein the relation depths of the M data are all greater than or equal to a preset threshold D1, and D1 is an integer greater than 2; a storage step of storing the M data in each of L different types of databases, wherein the L databases include a graph database and L is an integer greater than 1; a testing step of selecting a depth test value from the range of 2 to D1, querying the M data in each of the L databases with the depth test value as the relation depth to obtain L query duration sums, and determining the minimum query duration sum and its corresponding database; and repeating the testing step, and determining that the upper limit of the value interval is D2 when, for every depth test value greater than or equal to D2, the database corresponding to the minimum query duration sum is the graph database.
Example four
An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the processor is caused to execute the following steps:
acquiring data to be processed of a graph database;
calculating the relation depth of the data to be processed;
and when the relation depth of the data to be processed is greater than a preset relation threshold value, semanticizing the data to be processed and storing the data to be processed in the graph database.
In this embodiment, after the data to be processed is received, whether it is suitable for semantic storage is judged according to its relation depth. When the relation depth of the data to be processed is greater than the preset relation threshold, the data is judged to be suitable for semantic storage and is semanticized and stored in the graph database. This prevents data that is unsuitable for semantic storage from being semanticized and stored in the graph database, and so preserves the query efficiency of the graph database.
Further, the computer program, when executed by a processor, further causes the processor to perform the steps of:
and when the relation depth of the data to be processed is not greater than the preset relation threshold, storing the data to be processed in a relational database or a key-value database.
Further, the computer program, when executed by a processor, further causes the processor to perform the steps of:
searching the associated data associated with the data to be processed;
respectively calculating the shortest path between the data to be processed and each associated data;
and determining a maximum value from all the calculated shortest paths as the relation depth of the data to be processed.
Further, the data to be processed includes at least one of:
data to be stored in the graph database;
data directly associated with data to be deleted in the graph database.
Further, the computer program, when executed by a processor, further causes the processor to perform the steps of:
acquiring modified data of the data to be processed;
and semanticizing the modified data and storing the semanticized modified data in the graph database.
Further, the computer program, when executed by a processor, further causes the processor to perform the steps of:
determining a value interval of the relation threshold;
selecting a relation threshold test value from the value interval, reading data in the graph database N times with the relation threshold test value as the relation depth, calculating the total time required by the N reads, and repeating these steps to obtain the total time corresponding to each relation threshold test value in the value interval, wherein N is a positive integer;
and determining the relation threshold test value corresponding to the minimum total time as the relation threshold.
Further, the computer program, when executed by a processor, further causes the processor to perform the steps of:
a data selection step of selecting M data, wherein the relation depths of the M data are all greater than or equal to a preset threshold D1, and D1 is an integer greater than 2;
a storage step of storing the M data in each of L different types of databases, wherein the L databases include a graph database and L is an integer greater than 1;
a testing step of selecting a depth test value from the range of 2 to D1, querying the M data in each of the L databases with the depth test value as the relation depth to obtain L query duration sums, and determining the minimum query duration sum and its corresponding database;
and repeating the testing step, and determining that the upper limit of the value interval is D2 when, for every depth test value greater than or equal to D2, the database corresponding to the minimum query duration sum is the graph database.
Example five
The data processing method based on a graph database according to the present invention is further described with reference to a specific embodiment, as shown in fig. 6, when receiving new data to be stored, the embodiment includes the following steps:
Step 401: receiving data to be stored;
Step 402: searching for the associated data of the data to be stored;
Step 403: determining the relation depth of the data to be processed according to the associated data;
Specifically, the associated data associated with the data to be processed may be searched, the shortest path between the data to be processed and each associated data is calculated, and the maximum value among all the calculated shortest paths is determined as the relation depth of the data to be processed.
As shown in fig. 2, taking data A as the data to be processed, the data associated with A are B, C and D. The shortest path between A and D is 1, the shortest path between A and B is 1, and the shortest path between A and C is 2, so the relation depth of A is the maximum of 1, 1 and 2, that is, 2.
Step 404: judging whether the relation depth is greater than a preset relation threshold value, if so, turning to step 405, otherwise, turning to step 406;
Step 405: setting a semantic storage tag on the data to be stored, and turning to step 407;
Step 406: setting a non-semantic storage tag on the data to be stored, and turning to step 407;
Step 407: storing the data to be stored that carries a semantic storage tag in the graph database, and storing the data to be stored that carries a non-semantic storage tag in another type of database.
In the embodiment, data is classified according to the relation depth of the data, if data which is not suitable for semantic storage is found, the data is converted into non-semantic storage, and if data which is not suitable for non-semantic storage is found, the data is converted into semantic storage, so that the data query efficiency of a graph database can be improved.
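Steps 404 to 407 can be sketched as a small tagging and routing function. This is an illustration only: the relation depth is taken as an input, the two Python lists stand in for the graph database and the other type of database, and the names `tag_and_store`, `SEMANTIC` and `NON_SEMANTIC` are hypothetical.

```python
from typing import Dict, List

SEMANTIC = "semantic"
NON_SEMANTIC = "non_semantic"


def tag_and_store(item: str, depth: int, threshold: int, stores: Dict[str, List[str]]) -> str:
    """Tag an item by relation depth (steps 404 to 406) and store it (step 407).

    `stores` maps "graph" and "other" to in-memory lists that stand in for
    the graph database and a relational or key-value database.
    """
    # Strictly greater than the threshold means suitable for semantic storage.
    tag = SEMANTIC if depth > threshold else NON_SEMANTIC
    stores["graph" if tag == SEMANTIC else "other"].append(item)
    return tag
```

Note that a depth exactly equal to the threshold is routed to the other database, matching the "not greater than" wording of the method.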
Example six
The method for processing data based on a graph database according to the present invention will be further described with reference to a specific embodiment, as shown in fig. 7, when modified data of data in the graph database is received, the embodiment includes the following steps:
Step 501: receiving modified data;
Step 502: semanticizing the modified data and storing it in the graph database.
After data is stored in the graph database, when modified data of that data is received, the relation depth of the modified data does not need to be judged, and the modified data is semanticized and stored directly in the graph database. For example, if "Yao Ming is from Shanghai, China" is modified to "Yao Ming is from Beijing, China", then "Yao Ming is from Beijing, China" is semanticized and stored directly in the graph database to replace the original data.
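The modification path can be sketched as a direct overwrite with no relation depth check. The sketch assumes that semanticized data takes the form of a subject, predicate, object triple and that the graph database can be modelled as a dictionary keyed by subject and predicate; neither assumption comes from this section, and the name `apply_modification` is hypothetical.

```python
from typing import Dict, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object), an assumed form


def apply_modification(graph_store: Dict[Tuple[str, str], str], modified: Triple) -> None:
    """Store a modified triple directly, replacing the original object.

    No relation depth is computed: the data was already judged suitable
    for semantic storage when it was first stored.
    """
    subject, predicate, obj = modified
    graph_store[(subject, predicate)] = obj


store = {("Yao Ming", "is from"): "Shanghai, China"}
apply_modification(store, ("Yao Ming", "is from", "Beijing, China"))
print(store[("Yao Ming", "is from")])  # prints Beijing, China
```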
Example seven
The method for processing data based on a graph database according to the present invention is further described with reference to a specific embodiment, as shown in fig. 8, when data in the graph database needs to be deleted, the embodiment includes the following steps:
Step 601: searching the graph database for data directly associated with the data to be deleted;
wherein the data directly associated with the data to be deleted is the data affected by the deletion. Direct association means that the path length between two data items is 1; as shown in fig. 2, the data directly associated with data A are data B and D.
Step 602: adding the data directly associated with the data to be deleted to a data set;
Step 603: calculating the relation depth of each data in the data set;
Step 604: when the relation depth of a data item in the data set is greater than the preset relation threshold, judging that the data item is still suitable for semantic storage; when its relation depth is not greater than the preset relation threshold, judging that the data item is not suitable for semantic storage and moving it to another type of database for storage.
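Steps 601 to 604 can be sketched as follows. The sketch makes two assumptions that the text leaves open: relation depths of the affected data are computed after the deleted item has been removed from the graph, and path length is counted in hops over undirected associations. The function names and the example edges (A-B, A-D, B-C) are hypothetical.

```python
from collections import deque
from typing import Dict, List, Set


def _relation_depth(graph: Dict[str, Set[str]], start: str) -> int:
    """Largest shortest-path hop count from `start` (0 if it has no neighbours)."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return max(distance.values())


def delete_and_reclassify(graph: Dict[str, Set[str]], target: str, threshold: int) -> List[str]:
    """Delete `target` and return the directly associated items that should be
    moved to another type of database.

    Assumption: depths are evaluated on the graph after the deletion.
    """
    affected = sorted(graph.pop(target, set()))  # steps 601 and 602
    for neighbours in graph.values():
        neighbours.discard(target)  # drop dangling associations to the deleted item
    # Steps 603 and 604: items whose depth is not greater than the threshold move out.
    return [item for item in affected if _relation_depth(graph, item) <= threshold]


# Hypothetical undirected edges: A-B, A-D, B-C.
example = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B"}, "D": {"A"}}
print(delete_and_reclassify(example, "A", threshold=0))  # prints ['D']
```

In this example, deleting A leaves D with no associations (depth 0), so D is flagged for relocation, while B keeps its association with C (depth 1) and stays in the graph database.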
This embodiment can monitor changes to data in the graph database and reclassify the affected data: data found to be unsuitable for semantic storage is converted to non-semantic storage, and data found to be unsuitable for non-semantic storage is converted to semantic storage, thereby improving the data query efficiency of the graph database.
The foregoing is a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and these modifications and improvements shall also fall within the protection scope of the present invention.

Claims (14)

1. A method for data processing based on a graph database, comprising:
acquiring data to be processed of a graph database;
calculating the relation depth of the data to be processed;
and when the relation depth of the data to be processed is greater than a preset relation threshold value, semanticizing the data to be processed and storing the data to be processed in the graph database.
2. A method for graph database based data processing according to claim 1, further comprising:
and when the relation depth of the data to be processed is not greater than the preset relation threshold, storing the data to be processed in a relational database or a key-value database.
3. A method for graph database based data processing according to claim 1, wherein said calculating a depth of relationship for said data to be processed comprises:
searching the associated data associated with the data to be processed;
respectively calculating the shortest path between the data to be processed and each associated data;
and determining a maximum value from all the calculated shortest paths as the relation depth of the data to be processed.
4. A method for graph database based data processing according to claim 1 or 2, wherein said data to be processed comprises at least one of:
data to be stored in the graph database;
data directly associated with data to be deleted in the graph database.
5. A method for graph database based data processing according to claim 1, wherein after storing said data to be processed in said graph database after semantization, said method further comprises:
acquiring modified data of the data to be processed;
and semanticizing the modified data and storing the semanticized modified data in the graph database.
6. A method for graph database based data processing according to claim 1, further comprising the step of setting said relationship threshold, said step of setting said relationship threshold comprising:
determining a value interval of the relation threshold;
selecting a relation threshold test value from the value interval, reading data in the graph database N times with the relation threshold test value as the relation depth, calculating the total time required by the N reads, and repeating these steps to obtain the total time corresponding to each relation threshold test value in the value interval, wherein N is a positive integer;
and determining the relation threshold test value corresponding to the minimum total time as the relation threshold.
7. A method as defined in claim 6, further comprising the step of determining an upper limit for said interval of values, said step of determining an upper limit for said interval of values comprising:
a data selection step of selecting M data, wherein the relation depths of the M data are all greater than or equal to a preset threshold D1, and D1 is an integer greater than 2;
a storage step of storing the M data in each of L different types of databases, wherein the L databases include a graph database and L is an integer greater than 1;
a testing step of selecting a depth test value from the range of 2 to D1, querying the M data in each of the L databases with the depth test value as the relation depth to obtain L query duration sums, and determining the minimum query duration sum and its corresponding database;
and repeating the testing step, and determining that the upper limit of the value interval is D2 when, for every depth test value greater than or equal to D2, the database corresponding to the minimum query duration sum is the graph database.
8. A graph database-based data processing apparatus, comprising:
the acquisition module is used for acquiring data to be processed of the graph database;
the calculation module is used for calculating the relation depth of the data to be processed;
and the storage module is used for semanticizing the data to be processed and storing the data to be processed in the graph database when the relation depth of the data to be processed is greater than a preset relation threshold value.
9. The graph database-based data processing apparatus of claim 8, wherein the storage module is further configured to store the data to be processed in a relational database or a key-value database when the relation depth of the data to be processed is not greater than the preset relation threshold.
10. A graph database-based data processing apparatus according to claim 8, wherein said calculation module comprises:
the searching unit is used for searching the associated data associated with the data to be processed;
the shortest path calculating unit is used for calculating the shortest path between the data to be processed and each piece of associated data respectively;
and the determining unit is used for determining a maximum value from all the calculated shortest paths as the relation depth of the data to be processed.
11. A graph database based data processing apparatus according to claim 8 or 9, wherein said data to be processed comprises at least one of:
data to be stored in the graph database;
data directly associated with data to be deleted in the graph database.
12. The graph database-based data processing apparatus according to claim 8,
the acquisition module is also used for acquiring modified data of the data to be processed;
the storage module is further used for storing the modified data in the graph database after semantization.
13. A graph database-based data processing apparatus according to claim 8, further comprising:
the relation threshold setting module is used for determining a value interval of the relation threshold; selecting a relation threshold test value from the value interval, reading data in the graph database N times with the relation threshold test value as the relation depth, and calculating the total time required by the N reads; repeating these steps to obtain the total time corresponding to each relation threshold test value in the value interval, wherein N is a positive integer; and determining the relation threshold test value corresponding to the minimum total time as the relation threshold.
14. The graph database-based data processing apparatus of claim 13, further comprising an interval upper limit determination module, the interval upper limit determination module comprising:
the data selecting unit is used for selecting M data, the relation depths of the M data are all larger than or equal to a preset threshold value D1, and D1 is an integer larger than 2;
a storage unit, configured to store the M data in L different types of databases, respectively, where the L databases include a graph database;
the testing unit is used for selecting a depth test value from the range of 2 to D1, querying the M data in each of the L databases with the depth test value as the relation depth to obtain L query duration sums, and determining the minimum query duration sum and its corresponding database;
and the determining unit is used for determining that the upper limit of the value interval is D2 when, for every depth test value greater than or equal to D2, the database corresponding to the minimum query duration sum is the graph database.
CN201910403525.2A 2019-05-15 2019-05-15 Data processing method and device based on graph database Pending CN111949810A (en)

Publications (1)

Publication Number Publication Date
CN111949810A true CN111949810A (en) 2020-11-17

Family

ID=73336891


Country Link
CN (1) CN111949810A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination