CN111949810A

CN111949810A - Data processing method and device based on graph database

Info

Publication number: CN111949810A
Application number: CN201910403525.2A
Authority: CN
Inventors: 曾智嵘
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 2019-05-15
Filing date: 2019-05-15
Publication date: 2020-11-17

Abstract

The invention provides a data processing method and device based on a graph database, and belongs to the technical field of data processing. The data processing method based on the graph database comprises the following steps: acquiring data to be processed of a graph database; calculating the relation depth of the data to be processed; and when the relation depth of the data to be processed is greater than a preset relation threshold value, semanticizing the data to be processed and storing the data to be processed in the graph database. The technical scheme of the invention can improve the query efficiency of the graph database.

Description

Data processing method and device based on graph database

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to a data processing method and apparatus based on a graph database.

Background

Graph databases are becoming increasingly popular and are used to support applications such as knowledge graphs and images. A graph database, also referred to as a graph-oriented database, stores and queries data in a "graph" data structure: data is represented and stored by means of nodes, edges, attributes and the like, and operations such as addition, deletion, modification and query are supported.

In the prior art, data to be stored is semanticized and imported into a graph database. However, some data is not suitable for semantic storage, and storing it in this way reduces the query efficiency of the graph database.

Disclosure of Invention

The invention aims to provide a data processing method and device based on a graph database, which can improve the query efficiency of the graph database.

To solve the above technical problem, embodiments of the present invention provide the following technical solutions:

in one aspect, a method for data processing based on a graph database is provided, including:

acquiring data to be processed of a graph database;

calculating the relation depth of the data to be processed;

and when the relation depth of the data to be processed is greater than a preset relation threshold value, semanticizing the data to be processed and storing the data to be processed in the graph database.

Further, the method also includes:

and when the relation depth of the data to be processed is not greater than the preset relation threshold, storing the data to be processed in a relational database or a key-value database.

Further, the calculating the relation depth of the data to be processed includes:

searching the associated data associated with the data to be processed;

respectively calculating the shortest path between the data to be processed and each associated data;

and determining a maximum value from all the calculated shortest paths as the relation depth of the data to be processed.

Further, the data to be processed includes at least one of:

data to be stored in the graph database;

data directly associated with data to be deleted in the graph database.

Further, after the data to be processed is semantically converted and stored in the graph database, the method further comprises:

acquiring modified data of the data to be processed;

and semanticizing the modified data and storing the semanticized modified data in the graph database.

Further, the method further comprises the step of setting the relation threshold, wherein the step of setting the relation threshold comprises the following steps:

determining a value interval of the relation threshold;

selecting a relation threshold test value from the value-taking interval, reading data in the graph database for N times by taking the relation threshold test value as a relation depth, calculating the sum of time required by the N times of reading, and repeating the steps to obtain the sum of time corresponding to each relation threshold test value in the value-taking interval, wherein N is a positive integer;

and determining the relation threshold test value corresponding to the minimum time sum as the relation threshold.

Further, the method further comprises a step of determining an upper limit of the value interval, wherein the step of determining the upper limit of the value interval comprises:

a data selection step: selecting M data items whose relation depths are all greater than or equal to a preset threshold D1, where D1 is an integer greater than 2;

a storage step: storing the M data items in each of L different types of databases, where the L databases include a graph database and L is an integer greater than 1;

a testing step: selecting a depth test value from the range 2 to D1, querying the M data items in each of the L databases with the depth test value as the relation depth to obtain L query duration sums, and determining the minimum query duration sum and the database it corresponds to;

and repeating the testing step, and when, for every depth test value greater than or equal to D2, the database corresponding to the minimum query duration sum is the graph database, determining that the upper limit of the value interval is D2.

An embodiment of the present invention further provides a data processing apparatus based on a graph database, including:

the acquisition module is used for acquiring data to be processed of the graph database;

the calculation module is used for calculating the relation depth of the data to be processed;

and the storage module is used for semanticizing the data to be processed and storing the data to be processed in the graph database when the relation depth of the data to be processed is greater than a preset relation threshold value.

Further, the storage module is further configured to store the data to be processed in a relational database or a key-value database when the relation depth of the data to be processed is not greater than the preset relation threshold.

Further, the calculation module includes:

the searching unit is used for searching the associated data associated with the data to be processed;

the shortest path calculating unit is used for calculating the shortest path between the data to be processed and each piece of associated data respectively;

and the determining unit is used for determining a maximum value from all the calculated shortest paths as the relation depth of the data to be processed.

Further, the data to be processed includes at least one of:

data to be stored in the graph database;

data directly associated with data to be deleted in the graph database.

Further, the obtaining module is further configured to obtain modification data of the data to be processed;

the storage module is further used for storing the modified data in the graph database after semantization.

Further, the apparatus also includes:

the relation threshold setting module is used for determining a value interval of the relation threshold; selecting a relation threshold test value from the value interval, reading data in the graph database N times with the relation threshold test value as the relation depth, calculating the sum of the time required by the N reads, and repeating these steps to obtain the time sum corresponding to each relation threshold test value in the value interval, where N is a positive integer; and determining the relation threshold test value corresponding to the minimum time sum as the relation threshold.

Further, the device also comprises a value interval upper limit determining module, wherein the value interval upper limit determining module comprises:

the data selecting unit is used for selecting M data, the relation depths of the M data are all larger than or equal to a preset threshold value D1, and D1 is an integer larger than 2;

a storage unit, configured to store the M data in L different types of databases, respectively, where the L databases include a graph database;

the testing unit is used for selecting a depth test value from the range 2 to D1, querying the M data items in each of the L databases with the depth test value as the relation depth to obtain L query duration sums, and determining the minimum query duration sum and the database it corresponds to;

and the determining unit is used for determining that the upper limit of the value interval is D2 when, for every depth test value greater than or equal to D2, the database corresponding to the minimum query duration sum is the graph database.

An embodiment of the present invention further provides a data processing device based on a graph database, including:

a processor; and

a memory having computer program instructions stored therein,

wherein the computer program instructions, when executed by the processor, cause the processor to perform the steps of the graph database based data processing method as described above.

An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the processor is caused to execute the steps in the data processing method based on the graph database.

The embodiment of the invention has the following beneficial effects:

According to the scheme, after the data to be processed is received, whether it is suitable for semantic storage is judged from its relation depth. When the relation depth is greater than a preset relation threshold, the data is judged to be suitable for semantic storage and is semanticized and stored in the graph database. Data that is not suitable for semantic storage is thus kept from being semanticized and stored in the graph database, which safeguards the query efficiency of the graph database.

Drawings

FIG. 1 is a schematic flow chart diagram illustrating a data processing method based on a graph database according to an embodiment of the present invention;

FIG. 2 is a diagram illustrating relationships between data;

FIG. 3 is a flowchart illustrating setting of a relationship threshold according to an embodiment of the present invention;

FIG. 4 is a block diagram of a graph database based data processing apparatus according to an embodiment of the present invention;

FIG. 5 is a block diagram of a graph database based data processing apparatus according to an embodiment of the present invention;

FIG. 6 is a flowchart illustrating adding data to a graph database according to an embodiment of the present invention;

FIG. 7 is a schematic flow chart of modifying data in a graph database according to an embodiment of the present invention;

FIG. 8 is a flowchart illustrating deletion of data in a graph database according to an embodiment of the present invention.

Detailed Description

In order to make the technical problems, technical solutions and advantages to be solved by the embodiments of the present invention clearer, the following detailed description will be given with reference to the accompanying drawings and specific embodiments.

The embodiment of the invention provides a data processing method and device based on a graph database, which can improve the query efficiency of the graph database.

Example one

An embodiment of the present invention provides a data processing method based on a graph database, as shown in fig. 1, including:

step 101: acquiring data to be processed of a graph database;

wherein, the data to be processed of the graph database comprises at least one of the following data: data to be stored in a graph database; data directly associated with data to be deleted in a graph database.

The data to be processed may be in the form of text, such as "Yao Ming was born in Shanghai, China".

Step 102: calculating the relation depth of the data to be processed;

specifically, the associated data associated with the data to be processed may be searched, the shortest paths between the data to be processed and each associated data are respectively calculated, and a maximum value is determined from all the calculated shortest paths as the depth of relationship of the data to be processed.

As shown in fig. 2, taking data to be processed A as an example, the data associated with A includes B, C and D, where the shortest path between A and D is 1, the shortest path between A and B is 1, and the shortest path between A and C is 2. The relation depth of A is therefore the maximum of 1, 1 and 2, that is, 2.
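The calculation in step 102 can be sketched as a breadth-first search followed by a maximum. This is a minimal illustration, not the patented implementation: the adjacency-list representation, the function names, and the exact edges of fig. 2 (assumed here to be A-B, A-D and B-C, which reproduces the stated distances) are all assumptions.

```python
from collections import deque


def shortest_path_lengths(graph, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def relation_depth(graph, item):
    """Largest shortest-path length from item to any associated item."""
    dist = shortest_path_lengths(graph, item)
    return max((d for node, d in dist.items() if node != item), default=0)


# Assumed undirected edges for fig. 2: A-B, A-D, B-C.
FIG2 = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B"], "D": ["A"]}

print(relation_depth(FIG2, "A"))  # 2
```

An item with no associated data gets a relation depth of 0 in this sketch, so it would never exceed a threshold of 2 or more.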

Step 103: and when the relation depth of the data to be processed is greater than a preset relation threshold value, semanticizing the data to be processed and storing the data to be processed in the graph database.

Specifically, semanticizing the data to be processed means converting the text into a triple, which can be understood as (entity, entity relation, entity). For example, "Yao Ming was born in Shanghai, China" can be represented by the triple (Yao Ming, PlaceOfBirth, Shanghai). Entities are used as nodes, entity relations (including attributes, categories and the like) are used as edges between the nodes, and the triples are stored in the graph database in a graph data structure.
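To make the triple representation concrete, the following sketch stores triples in a small in-memory graph, with entities as nodes and relations as labelled edges. The class `TripleGraph` and its methods are hypothetical names introduced for illustration; extracting the triple from free text is outside the scope of the sketch and is assumed to have been done already.

```python
from collections import defaultdict


class TripleGraph:
    """Toy graph store: entities are nodes, relations are labelled edges."""

    def __init__(self):
        self.nodes = set()
        self.edges = defaultdict(set)  # subject -> {(relation, object)}

    def add_triple(self, subject, relation, obj):
        """Store one (entity, entity relation, entity) triple."""
        self.nodes.update((subject, obj))
        self.edges[subject].add((relation, obj))

    def objects(self, subject, relation):
        """Return all objects linked to subject by the given relation."""
        return sorted(o for r, o in self.edges.get(subject, ()) if r == relation)


graph = TripleGraph()
graph.add_triple("Yao Ming", "PlaceOfBirth", "Shanghai")
print(graph.objects("Yao Ming", "PlaceOfBirth"))  # ['Shanghai']
```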

In this embodiment, after the data to be processed is received, whether it is suitable for semantic storage is judged from its relation depth. When the relation depth is greater than a preset relation threshold, the data is judged to be suitable for semantic storage and is semanticized and stored in the graph database. Data that is not suitable for semantic storage is thus kept from being semanticized and stored in the graph database, which safeguards the query efficiency of the graph database.

When the relation depth of the data to be processed is not greater than the preset relation threshold, the data is considered unsuitable for semantization. Such data should not be semanticized and stored in the graph database, because doing so would reduce the query efficiency of the graph database; it needs to be stored in another type of database instead, such as a relational database or a key-value database.
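The storage decision of step 103 and its complement reduces to a single comparison. The sketch below uses hypothetical in-memory lists as stand-ins for the graph database and for the other database (relational or key-value); only the comparison rule comes from the description.

```python
from dataclasses import dataclass, field


@dataclass
class Stores:
    """Stand-ins for the real databases (assumption for illustration)."""
    graph: list = field(default_factory=list)  # semanticized storage
    other: list = field(default_factory=list)  # relational or key-value storage


def route(item, depth, threshold, stores):
    """Store item in the graph store only if depth exceeds the threshold."""
    if depth > threshold:
        stores.graph.append(item)
        return "graph"
    stores.other.append(item)
    return "other"


stores = Stores()
print(route("deep item", depth=4, threshold=2, stores=stores))     # graph
print(route("shallow item", depth=2, threshold=2, stores=stores))  # other
```

Note that a depth equal to the threshold goes to the other store, matching the "not greater than" wording.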

As shown in Table 1, under the same query conditions, when the relation depth of the data is 1, querying the data takes 0.01876 s if the data is semanticized and stored in the graph database, and 0.00078 s if it is stored in the relational database. When the relation depth is 2, the query takes 0.02304 s in the graph database and 0.00556 s in the relational database. When the relation depth is 3, the query takes 0.03092 s in the graph database and 0.20788 s in the relational database. When the relation depth is 4, the query takes 0.04865 s in the graph database and 9.89270 s in the relational database. The query conditions are as follows: the number M of queried data items is 100000, and the query is repeated 10 times. It can be seen that the larger the relation depth of the data, the better suited the data is to semantic storage in the graph database, and the smaller the relation depth, the better suited it is to storage in other types of databases.

TABLE 1

Depth of relationship	Graph database (s)	Relational database (s)
1	0.01876	0.00078
2	0.02304	0.00556
3	0.03092	0.20788
4	0.04865	9.89270
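The crossover in Table 1 can be read off programmatically. The sketch below copies the table values (in seconds) into a dictionary and reports which store answers faster at each relation depth; the names `TABLE_1` and `faster_store` are hypothetical.

```python
# Query durations in seconds, as listed in Table 1:
# depth -> (graph database, relational database)
TABLE_1 = {
    1: (0.01876, 0.00078),
    2: (0.02304, 0.00556),
    3: (0.03092, 0.20788),
    4: (0.04865, 9.89270),
}


def faster_store(depth):
    """Name the store with the shorter query duration at this depth."""
    graph_s, relational_s = TABLE_1[depth]
    return "graph" if graph_s < relational_s else "relational"


for depth in sorted(TABLE_1):
    print(depth, faster_store(depth))
```

With these figures the relational database is faster at depths 1 and 2 and the graph database is faster at depths 3 and 4.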

Further, after the data to be processed is stored in the graph database, when modified data of the data to be processed is received, the relation depth of the modified data does not need to be judged, and the modified data is semanticized and stored in the graph database directly. For example, if "Yao Ming was born in Shanghai, China" is modified to "Yao Ming was born in Beijing, China", then "Yao Ming was born in Beijing, China" is semanticized and stored in the graph database directly, replacing the original data.

When data in the graph database needs to be deleted, the data in the graph database that is directly associated with the data to be deleted is searched for first. Direct association means that the path between the data is 1; as shown in fig. 2, the data directly associated with data A are data B and data D. The relation depth of each piece of directly associated data is then judged. When that relation depth is greater than the preset relation threshold, the directly associated data is still suitable for semantic storage and is left unchanged. When that relation depth is not greater than the preset relation threshold, the directly associated data is judged to be unsuitable for semantic storage, and after the data to be deleted has been deleted, the directly associated data is moved to another type of database for storage.
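The deletion flow can be sketched as follows. One point is an interpretation rather than something the description states: the relation depth of each direct neighbour is evaluated here on the graph with the deleted item already removed. The adjacency-list layout, the function names and the fig. 2 edges (A-B, A-D, B-C) are likewise assumptions.

```python
from collections import deque


def relation_depth(graph, item):
    """Largest BFS hop count from item to any reachable node (0 if isolated)."""
    dist = {item: 0}
    queue = deque([item])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return max(dist.values())


def delete_and_find_movers(graph, target, threshold):
    """Delete target; list direct neighbours that should leave the graph store.

    Assumption: neighbour depths are measured after the deletion.
    """
    neighbours = sorted(graph.get(target, ()))  # path length 1 from target
    remaining = {
        node: [n for n in adjacent if n != target]
        for node, adjacent in graph.items()
        if node != target
    }
    movers = [n for n in neighbours if relation_depth(remaining, n) <= threshold]
    return remaining, movers


FIG2 = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B"], "D": ["A"]}
remaining, movers = delete_and_find_movers(FIG2, "A", threshold=1)
print(movers)  # ['B', 'D']
```

After deleting A, B keeps only its link to C (depth 1) and D is isolated (depth 0), so with a threshold of 1 both would be moved to another type of database.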

In this embodiment, the value of the relation threshold may be adjusted adaptively according to how the data of the graph database as a whole is read, so as to optimize the query efficiency of the graph database. Specifically, the relation threshold may be set periodically, or after a trigger instruction is received. The step of setting the relation threshold may include: determining a value interval of the relation threshold; selecting a relation threshold test value from the value interval, reading data in the graph database N times with the relation threshold test value as the relation depth, calculating the sum of the time required by the N reads, and repeating these steps to obtain the time sum corresponding to each relation threshold test value in the value interval, where N is a positive integer; and determining the relation threshold test value corresponding to the minimum time sum as the relation threshold.

In a specific example, as shown in fig. 3, the step of setting the relationship threshold includes:

step 201: determining the value interval of the relation threshold to be 2 to Dmax;

since the query efficiency of the graph database is low when the relation depth of the data is 1, it is not necessary to consider using the graph database when the relation depth is 1, and therefore, the minimum value of the value range is set to 2 excluding the case where D is 1.

Step 202: selecting a relation threshold value test value Di from the value-taking interval;

step 203: reading data in the graph database N times with the relation threshold test value Di as the relation depth, and calculating the time sum Ti required by the N reads;

step 204: repeating steps 202 and 203 to obtain the time sums T2, T3, …, TDmax corresponding to each relation threshold test value in the value interval;

step 205: and determining a relation threshold test value corresponding to the minimum time sum as a relation threshold D.
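Steps 201 to 205 amount to timing N reads per candidate depth and keeping the candidate with the smallest total. The sketch below assumes a caller-supplied `read_at_depth(depth)` function that performs one read against the graph database; that function, the injectable `clock` parameter and the `FakeClock` helper used for a deterministic demonstration are all hypothetical.

```python
import time


def total_read_time(read_at_depth, depth, n_reads, clock=time.perf_counter):
    """Time sum Ti: duration of n_reads reads at the given relation depth."""
    start = clock()
    for _ in range(n_reads):
        read_at_depth(depth)
    return clock() - start


def select_relation_threshold(read_at_depth, d_max, n_reads,
                              clock=time.perf_counter):
    """Try every test value in 2..d_max and return the one with the least time."""
    totals = {
        d: total_read_time(read_at_depth, d, n_reads, clock)
        for d in range(2, d_max + 1)
    }
    best = min(totals, key=totals.get)  # ties resolve to the smallest depth
    return best, totals


class FakeClock:
    """Deterministic clock so the demonstration does not depend on real timing."""

    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


clock = FakeClock()
cost_per_read = {2: 3.0, 3: 1.0, 4: 2.0}  # made-up costs, not measurements


def fake_read(depth):
    clock.now += cost_per_read[depth]


best, totals = select_relation_threshold(fake_read, d_max=4, n_reads=5, clock=clock)
print(best, totals)  # 3 {2: 15.0, 3: 5.0, 4: 10.0}
```

In real use the default `time.perf_counter` clock would be kept and `read_at_depth` would issue actual queries.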

The upper limit Dmax of the value interval of the relation threshold can be determined from the performance test results of each database. The step of determining the upper limit of the value interval includes: a data selection step of selecting M data items whose relation depths are all greater than or equal to a preset threshold D1, where D1 is an integer greater than 2; a storage step of storing the M data items in each of L different types of databases, where the L databases include a graph database and L is an integer greater than 1; a testing step of selecting a depth test value from the range 2 to D1, querying the M data items in each of the L databases with the depth test value as the relation depth to obtain L query duration sums, and determining the minimum query duration sum and the database it corresponds to; and repeating the testing step, and when, for every depth test value greater than or equal to D2, the database corresponding to the minimum query duration sum is the graph database, determining that the upper limit Dmax of the value interval is D2.

In a specific example, the data is stored in a graph database, a relational database and a key-value database, and a performance test is run on each database. The performance test proceeds as follows: select a fixed set of M data items whose relation depths are all greater than or equal to a preset threshold D1, where M is chosen so that differences in query efficiency show clearly and D1 is greater than or equal to Dmax; store the M data items in the graph database, the relational database and the key-value database respectively; query all M data items at each relation depth from 2 to D1; record the query efficiency of each database at each relation depth; and compare the query efficiencies of the different databases. For example, if the relational database is the most efficient at relation depth 2, the key-value database is the most efficient at relation depth 3, and the graph database is the most efficient at relation depths greater than or equal to 4, then Dmax is determined to be 4. That is, Dmax is the minimum relation depth at which, according to the performance tests, the query efficiency of the graph database exceeds that of the other databases.
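The rule for Dmax can be sketched as a scan from the largest tested depth downward, stopping at the first depth where the graph database is no longer the fastest. The function name, the dictionary layout and the timing figures below are assumptions; the figures are made-up placeholders arranged to mirror the example (relational best at depth 2, key-value best at depth 3, graph best from depth 4), not measurements from the source.

```python
def upper_limit(duration_sums, graph_name="graph"):
    """Smallest depth D2 such that the graph store wins at D2 and all larger depths.

    duration_sums maps depth -> {database name: query duration sum in seconds}.
    Returns None if the graph store is not the fastest at the largest depth.
    """
    d2 = None
    for depth in sorted(duration_sums, reverse=True):
        sums = duration_sums[depth]
        fastest = min(sums, key=sums.get)
        if fastest != graph_name:
            break
        d2 = depth
    return d2


# Placeholder timings for illustration only (not from the source).
EXAMPLE = {
    2: {"graph": 0.023, "relational": 0.005, "key_value": 0.010},
    3: {"graph": 0.031, "relational": 0.208, "key_value": 0.020},
    4: {"graph": 0.049, "relational": 9.900, "key_value": 1.000},
    5: {"graph": 0.060, "relational": 50.00, "key_value": 5.000},
}

print(upper_limit(EXAMPLE))  # 4
```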

The technical scheme of the embodiment can be applied to an edge data integration scene, a typical requirement of edge data integration is real-time processing, and when edge intelligent analysis is provided through semantic processing, logical reasoning and the like, the data query time can be shortened by using the scheme of the embodiment, so that the real-time processing capability is improved. The technical scheme of the embodiment can also be applied to scenes with higher requirements on intelligence and real time, such as a public security real-time monitoring scene, and can provide technical means in the aspects of quickly positioning suspects, preventing public safety incidents and the like; in addition, the technical scheme of the embodiment can also be applied to an intelligent manufacturing field, so that the error reason can be quickly positioned, and the loss possibly caused by production halt can be reduced. In addition, in the application of cloud big data, the technical scheme of the embodiment can shorten the data query time, so that more data can be processed in unit time, and the asset utilization rate of the data center is improved.

Example two

An embodiment of the present invention further provides a data processing apparatus based on a graph database, as shown in fig. 4, including:

an obtaining module 21, configured to obtain to-be-processed data of a graph database;

a calculating module 22, configured to calculate a depth of relationship of the to-be-processed data;

the storage module 23 is configured to semantically store the to-be-processed data in the graph database when the relationship depth of the to-be-processed data is greater than a preset relationship threshold.

Further, the storage module 23 is further configured to store the data to be processed in a relational database or a key-value database when the relation depth of the data to be processed is not greater than the preset relation threshold.

The obtaining module 21 is specifically configured to obtain to-be-processed data of a graph database, which includes at least one of the following: data to be stored in the graph database; data directly associated with data to be deleted in the graph database. As shown in fig. 4, after the obtaining module 21 receives the data to be processed, it sends the data to the calculating module 22. The calculating module 22 calculates the relation depth of the data to be processed and tags the data accordingly. For example, when the relation depth is greater than a preset relation threshold, the data is judged to be suitable for semantic storage and is marked with a "semantic storage tag"; when the relation depth is not greater than the preset relation threshold, the data is judged to be unsuitable for semantic storage and is marked with a "non-semantic storage tag". The tagged data is sent to the storage module 23, which is connected to the databases and stores the data in the graph database or in another type of database according to its tag. Specifically, data carrying the "semantic storage tag" is stored in the graph database, and data carrying the "non-semantic storage tag" is stored in another type of database, such as a relational database or a key-value database.

Further, the calculation module 22 includes:

a searching unit, configured to search for the associated data associated with the data to be processed;

a shortest path calculating unit, configured to calculate the shortest path between the data to be processed and each piece of associated data respectively;

and a determining unit, configured to determine a maximum value from all the calculated shortest paths as the relation depth of the data to be processed.

For the newly added data to be stored, the calculating module 22 determines the depth of relationship of the newly added data to be stored, determines whether the data to be stored is suitable for semantic storage, and outputs the determination result to the storing module 23.

When data in the graph database needs to be deleted, the obtaining module 21 first searches the graph database for the data directly associated with the data to be deleted. Direct association means that the path between the data is 1; as shown in fig. 2, the data directly associated with data A are data B and data D. The calculation module 22 judges the relation depth of the directly associated data. When that relation depth is greater than a preset relation threshold, the directly associated data is judged to be still suitable for semantic storage. When that relation depth is not greater than the preset relation threshold, the calculation module 22 judges that the directly associated data is unsuitable for semantic storage, and after the data to be deleted has been deleted, the storage module 23 moves the directly associated data to another type of database for storage.

Further, after the data to be processed is stored in the graph database, when modified data of the data to be processed is received, the relation depth of the modified data does not need to be judged, and the modified data is semanticized and stored in the graph database directly. The obtaining module 21 is further configured to obtain modified data of the data to be processed; the storage module 23 is further configured to semanticize the modified data and store it in the graph database. For example, if "Yao Ming was born in Shanghai, China" is modified to "Yao Ming was born in Beijing, China", then "Yao Ming was born in Beijing, China" is semanticized and stored in the graph database directly, replacing the original data.

In this embodiment, the value of the relationship threshold may be adaptively adjusted according to the reading condition of the whole data of the graph database, so as to optimize the query efficiency of the graph database. Specifically, the setting of the relationship threshold may be performed periodically, or may be performed after receiving the trigger instruction.

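Further, the apparatus further comprises:

a relation threshold setting module, configured to determine a value interval of the relation threshold; select a relation threshold test value from the value interval, read data in the graph database N times with the relation threshold test value as the relation depth, calculate the sum of the time required by the N reads, and repeat these steps to obtain the time sum corresponding to each relation threshold test value in the value interval, where N is a positive integer; and determine the relation threshold test value corresponding to the minimum time sum as the relation threshold.

Further, the device also includes a value interval upper limit determining module, where the value interval upper limit determining module includes:

a data selecting unit, configured to select M data items whose relation depths are all greater than or equal to a preset threshold D1, where D1 is an integer greater than 2;

a storage unit, configured to store the M data items in each of L different types of databases, where the L databases include a graph database;

a testing unit, configured to select a depth test value from the range 2 to D1, query the M data items in each of the L databases with the depth test value as the relation depth to obtain L query duration sums, and determine the minimum query duration sum and the database it corresponds to;

and a determining unit, configured to determine that the upper limit of the value interval is D2 when, for every depth test value greater than or equal to D2, the database corresponding to the minimum query duration sum is the graph database.

Example three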

An embodiment of the present invention further provides a data processing device 30 based on a graph database, as shown in fig. 5, including:

a processor 32; and

a memory 34, in which memory 34 computer program instructions are stored,

wherein the computer program instructions, when executed by the processor, cause the processor 32 to perform the steps of:

acquiring data to be processed of a graph database;

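calculating the relation depth of the data to be processed;

and when the relation depth of the data to be processed is greater than a preset relation threshold value, semanticizing the data to be processed and storing the data to be processed in the graph database.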

Further, as shown in FIG. 5, the graph database-based data processing device 30 also includes a network interface 31, an input device 33, a hard disk 35, and a display device 36.

The various interfaces and devices described above may be interconnected by a bus architecture. A bus architecture may be any architecture that may include any number of interconnected buses and bridges. Various circuits of one or more Central Processing Units (CPUs), represented in particular by processor 32, and one or more memories, represented by memory 34, are coupled together. The bus architecture may also connect various other circuits such as peripherals, voltage regulators, power management circuits, and the like. It will be appreciated that a bus architecture is used to enable communications among the components. The bus architecture includes a power bus, a control bus, and a status signal bus, in addition to a data bus, all of which are well known in the art and therefore will not be described in detail herein.

The network interface 31 may be connected to a network (e.g., the internet, a local area network, etc.), and may obtain relevant data from the network, such as data to be processed in a graph database, and may store the relevant data in the hard disk 35.

The input device 33 can receive various commands input by the operator and send the commands to the processor 32 for execution. The input device 33 may comprise a keyboard or a pointing device (e.g., a mouse, a trackball, a touch pad or a touch screen, etc.).

The display device 36 may display the results of the instructions executed by the processor 32.

The memory 34 is used for storing programs and data necessary for operating the operating system, and data such as intermediate results in the calculation process of the processor 32.

It will be appreciated that memory 34 in embodiments of the invention may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an Erasable Programmable Read Only Memory (EPROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), or a flash memory. Volatile memory can be Random Access Memory (RAM), which acts as external cache memory. The memory 34 of the apparatus and methods described herein is intended to comprise, without being limited to, these and any other suitable types of memory.

In some embodiments, memory 34 stores the following elements, executable modules or data structures, or a subset thereof, or an expanded set thereof: an operating system 341 and application programs 342.

The operating system 341 includes various system programs, such as a framework layer, a core library layer, a driver layer, and the like, and is used for implementing various basic services and processing hardware-based tasks. The application programs 342 include various applications, such as a browser, for implementing various application services. A program implementing the method of an embodiment of the present invention may be included in the application programs 342.

When calling and executing the application programs and data stored in the memory 34, the processor 32 may specifically: acquire data to be processed of a graph database; calculate the relation depth of the data to be processed; and, when the relation depth of the data to be processed is greater than a preset relation threshold, semanticize the data to be processed and store it in the graph database.

The methods disclosed in the above embodiments of the present invention may be applied to the processor 32 or implemented by the processor 32. The processor 32 may be an integrated circuit chip having signal processing capabilities. In implementation, the steps of the above method may be performed by integrated logic circuits of hardware in the processor 32 or by instructions in the form of software. The processor 32 may be a general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present invention. A general purpose processor may be a microprocessor or any conventional processor. The steps of the method disclosed in connection with the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in the decoding processor. The software module may be located in a storage medium well known in the art, such as RAM, flash memory, ROM, PROM, EPROM, or registers. The storage medium is located in the memory 34, and the processor 32 reads the information in the memory 34 and completes the steps of the method in combination with its hardware.

It is to be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or any combination thereof. For a hardware implementation, the processing units may be implemented within one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), general purpose processors, controllers, micro-controllers, microprocessors, other electronic units designed to perform the functions described herein, or a combination thereof.

For a software implementation, the techniques described herein may be implemented with modules (e.g., procedures, functions, and so on) that perform the functions described herein. The software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.

Further, when the relation depth of the data to be processed is not greater than the preset relation threshold, the processor 32 stores the data to be processed in a relational database or a key-value database.

Further, the processor 32 searches for associated data associated with the data to be processed;

Further, the data to be processed includes at least one of:

data to be stored in the graph database;

data directly associated with data to be deleted in the graph database.

Further, the processor 32 obtains modification data of the data to be processed;

Further, the processor 32 determines a value interval of the relation threshold; selects a relation threshold test value from the value interval, reads data in the graph database N times using the relation threshold test value as the relation depth, and calculates the sum of the time required by the N reads, where N is a positive integer; repeats this step to obtain the sum of time corresponding to each relation threshold test value in the value interval; and determines the relation threshold test value corresponding to the minimum sum of time as the relation threshold.
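The threshold-setting procedure above can be sketched roughly as follows. This is only an illustrative sketch: the function name `pick_relation_threshold`, the `read_at_depth` callback (standing in for one read of the graph database at a given relation depth) and the injectable `clock` are assumptions made for the example and do not appear in the original text.

```python
import time
from typing import Callable, Dict, Iterable


def pick_relation_threshold(
    candidates: Iterable[int],
    read_at_depth: Callable[[int], object],
    n_reads: int,
    clock: Callable[[], float] = time.perf_counter,
) -> int:
    """Return the candidate depth whose N reads take the least total time.

    `read_at_depth` is a hypothetical callback that performs one read of the
    graph database using the given value as the relation depth.
    """
    if n_reads < 1:
        raise ValueError("N must be a positive integer")

    totals: Dict[int, float] = {}
    for depth in candidates:
        start = clock()
        for _ in range(n_reads):
            read_at_depth(depth)
        # Sum of the time required by the N reads for this test value.
        totals[depth] = clock() - start

    if not totals:
        raise ValueError("the value interval is empty")
    # The test value with the minimum sum of time becomes the threshold.
    return min(totals, key=totals.get)
```

A caller would pass the candidate values of the value interval, for example `range(2, upper_limit + 1)`, together with a function that issues the actual query. Passing a custom `clock` makes the selection reproducible in tests.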

Further, the processor 32 performs: a data selection step of selecting M pieces of data, wherein the relation depths of the M pieces of data are all greater than or equal to a preset threshold D1, and D1 is an integer greater than 2; a storage step of storing the M pieces of data in each of L different types of databases, wherein the L databases include a graph database and L is an integer greater than 1; a testing step of selecting a depth test value from the range 2 to D1, querying the M pieces of data in each of the L databases using the depth test value as the relation depth to obtain L query duration sums, and determining the minimum query duration sum and its corresponding database; and repeating the testing step, and, when the database corresponding to the minimum query duration sum is the graph database whenever the depth test value is greater than or equal to D2, determining that the upper limit of the value interval is D2.
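One possible reading of the upper-limit procedure is sketched below. It assumes that D2 is the smallest depth test value from which the graph database has the minimum query duration sum at every tested depth up to D1. The function name, the `query_time_sum` callback (returning the summed duration of querying the M pieces of data in one database at one depth) and the database labels are hypothetical.

```python
from typing import Callable, Optional, Sequence


def find_interval_upper_limit(
    d1: int,
    databases: Sequence[str],
    query_time_sum: Callable[[str, int], float],
    graph_db: str = "graph",
) -> Optional[int]:
    """Return D2, or None if the graph database is not fastest even at D1.

    `query_time_sum(db, depth)` is a hypothetical callback giving the total
    time to query the M test items in database `db` at relation depth `depth`.
    """
    if d1 <= 2:
        raise ValueError("D1 must be an integer greater than 2")
    if len(databases) < 2 or graph_db not in databases:
        raise ValueError("need L > 1 databases, one of them the graph database")

    upper = None
    # Walk from the deepest test value down to 2 and stop at the first depth
    # where some other database gives the minimum query duration sum.
    for depth in range(d1, 1, -1):
        fastest = min(databases, key=lambda db: query_time_sum(db, depth))
        if fastest != graph_db:
            break
        upper = depth
    return upper
```

With measured sums in which the graph database wins at depths 4 and 5 but a relational database wins at depth 3, the function returns 4 under this interpretation.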

EXAMPLE four

An embodiment of the present invention further provides a computer-readable storage medium, where a computer program is stored, and when the computer program is executed by a processor, the processor is caused to execute the following steps:

acquiring data to be processed of a graph database;

calculating the relation depth of the data to be processed;

Further, the computer program, when executed by a processor, further causes the processor to perform the steps of:

searching the associated data associated with the data to be processed;

Further, the data to be processed includes at least one of:

data to be stored in the graph database;

data directly associated with data to be deleted in the graph database.

acquiring modified data of the data to be processed;

determining a value interval of the relation threshold;

EXAMPLE five

The data processing method based on a graph database according to the present invention is further described below with reference to a specific embodiment. As shown in fig. 6, when new data to be stored is received, this embodiment includes the following steps:

Step 401: receiving data to be stored;

Step 402: searching the associated data of the data to be stored;

Step 403: determining the relation depth of the data to be processed according to the associated data;

As shown in fig. 2, taking data to be processed A as an example, the data associated with A is determined to include B, C and D, where the shortest path between A and D is 1, the shortest path between A and B is 1, and the shortest path between A and C is 2. The relation depth of A is therefore the maximum of 1, 1, and 2, that is, 2.

Step 404: judging whether the relation depth is greater than a preset relation threshold value, if so, turning to step 405, otherwise, turning to step 406;

Step 405: setting a semantic storage tag of the data to be stored, and turning to step 407;

Step 406: setting a non-semantic storage tag of the data to be stored, and turning to step 407;

Step 407: storing the data to be stored that carries a semantic storage tag in the graph database, and storing the data to be stored that carries a non-semantic storage tag in another type of database.

In this embodiment, data is classified according to its relation depth: data found to be unsuitable for semantic storage is converted to non-semantic storage, and data found to be unsuitable for non-semantic storage is converted to semantic storage, so that the data query efficiency of the graph database can be improved.
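The depth calculation and tagging of steps 402 to 406 can be illustrated with a breadth-first search. This is a minimal sketch under assumptions: the adjacency-set representation, the names `relation_depth` and `storage_tag`, and the concrete edges chosen for the fig. 2 example (A-B, A-D, B-C) are not given in the text and are picked only so that the shortest paths match the values stated above.

```python
from collections import deque
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def relation_depth(graph: Graph, start: Hashable) -> int:
    """Maximum shortest-path length from `start` to any associated item."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                # First visit in a BFS gives the shortest path length.
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return max(distance.values())


def storage_tag(graph: Graph, item: Hashable, threshold: int) -> str:
    """Steps 404 to 406: tag as semantic only if depth exceeds the threshold."""
    if relation_depth(graph, item) > threshold:
        return "semantic"
    return "non-semantic"


# Assumed edges for the fig. 2 example: A-B, A-D and B-C.
FIG2: Graph = {"A": {"B", "D"}, "B": {"A", "C"}, "C": {"B"}, "D": {"A"}}
```

With these assumed edges, `relation_depth(FIG2, "A")` evaluates to 2, matching the worked example. `storage_tag(FIG2, "A", 1)` yields `"semantic"`, while a threshold of 2 yields `"non-semantic"`.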

EXAMPLE six

The data processing method based on a graph database according to the present invention is further described below with reference to a specific embodiment. As shown in fig. 7, when modified data for data in the graph database is received, this embodiment includes the following steps:

Step 501: receiving modification data;

Step 502: semanticizing the modified data and storing it in the graph database.

After data has been stored in the graph database, when modified data for that data is received, the relation depth of the modified data does not need to be judged; the modified data is semanticized and stored directly in the graph database. For example, if "Yao Ming is from Shanghai, China" is modified to "Yao Ming is from Beijing, China", then "Yao Ming is from Beijing, China" is semanticized and stored directly in the graph database to replace the original data.
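As a rough illustration of this replacement, the sketch below models semanticized data as subject-predicate-object triples held in a set. The triple representation and the name `apply_modification` are assumptions for the example; the text does not specify how semanticized data is laid out.

```python
from typing import Set, Tuple

# Assumed representation of semanticized data: (subject, predicate, object).
Triple = Tuple[str, str, str]


def apply_modification(store: Set[Triple], old: Triple, new: Triple) -> None:
    """Replace `old` with `new` in place, without re-checking relation depth."""
    store.discard(old)
    store.add(new)


graph_store: Set[Triple] = {("Yao Ming", "is from", "Shanghai, China")}
apply_modification(
    graph_store,
    ("Yao Ming", "is from", "Shanghai, China"),
    ("Yao Ming", "is from", "Beijing, China"),
)
```

After the call, `graph_store` holds only the Beijing triple, mirroring steps 501 and 502.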

EXAMPLE seven

The data processing method based on a graph database according to the present invention is further described below with reference to a specific embodiment. As shown in fig. 8, when data in the graph database needs to be deleted, this embodiment includes the following steps:

Step 601: searching data directly associated with the data to be deleted in the graph database;

The data directly associated with the data to be deleted is the data affected by the deletion. Direct association means that the path length between two pieces of data is 1; as shown in fig. 2, the data directly associated with data A are data B and D.

Step 602: adding the data directly associated with the data to be deleted to a data set;

Step 603: calculating the relation depth of each piece of data in the data set;

Step 604: when the relation depth of data in the data set is greater than the preset relation threshold, the data is judged to be still suitable for semantic storage; when the relation depth of the data is not greater than the preset relation threshold, the data is judged to be unsuitable for semantic storage and is moved to another type of database for storage.

This embodiment can monitor changes to data in the graph database and reclassify the affected data: data found to be unsuitable for semantic storage is converted to non-semantic storage, and data found to be unsuitable for non-semantic storage is converted to semantic storage, thereby improving the data query efficiency of the graph database.
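Steps 601 to 604 might be sketched as follows. The sketch assumes that the relation depth of each affected item is recomputed after the deletion has been applied, which the text does not state explicitly. The adjacency-set graph and the names `relation_depth` and `reclassify_after_delete` are likewise illustrative.

```python
from collections import deque
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]


def relation_depth(graph: Graph, start: Hashable) -> int:
    """Maximum shortest-path length from `start`, found by BFS."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return max(distance.values())


def reclassify_after_delete(
    graph: Graph, to_delete: Hashable, threshold: int
) -> Set[Hashable]:
    """Delete `to_delete` and return the items to move out of the graph store."""
    # Steps 601 and 602: collect the directly associated items (path length 1).
    affected = set(graph.pop(to_delete, set()))
    affected.discard(to_delete)
    for neighbours in graph.values():
        neighbours.discard(to_delete)
    # Steps 603 and 604: items whose depth no longer exceeds the threshold
    # are judged unsuitable for semantic storage (assumed: depth is measured
    # after the deletion).
    return {item for item in affected if relation_depth(graph, item) <= threshold}
```

The returned set contains the items that would be moved to another type of database; the remaining affected items stay in the graph database.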

The foregoing is a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principle of the present invention, and these modifications and improvements should also be regarded as falling within the protection scope of the present invention.

Claims

1. A method for data processing based on a graph database, comprising:

acquiring data to be processed of a graph database;

calculating the relation depth of the data to be processed;

2. A method for graph database based data processing according to claim 1, further comprising:

3. A method for graph database based data processing according to claim 1, wherein said calculating a depth of relationship for said data to be processed comprises:

searching the associated data associated with the data to be processed;

4. A method for graph database based data processing according to claim 1 or 2, wherein said data to be processed comprises at least one of:

data to be stored in the graph database;

data directly associated with data to be deleted in the graph database.

5. A method for graph database based data processing according to claim 1, wherein after storing said data to be processed in said graph database after semantization, said method further comprises:

acquiring modified data of the data to be processed;

6. A method for graph database based data processing according to claim 1, further comprising the step of setting said relationship threshold, said step of setting said relationship threshold comprising:

determining a value interval of the relation threshold;

7. A method as defined in claim 6, further comprising the step of determining an upper limit for said interval of values, said step of determining an upper limit for said interval of values comprising:

8. A graph database-based data processing apparatus, comprising:

9. The graph database-based data processing apparatus of claim 8, wherein the storage module is further configured to store the data to be processed in a relational database or a key-value database when the relation depth of the data to be processed is not greater than the preset relation threshold.

10. A graph database-based data processing apparatus according to claim 8, wherein said calculation module comprises:

11. A graph database based data processing apparatus according to claim 8 or 9, wherein said data to be processed comprises at least one of:

data to be stored in the graph database;

data directly associated with data to be deleted in the graph database.

12. The graph database-based data processing apparatus according to claim 8,

the acquisition module is also used for acquiring modified data of the data to be processed;

13. A graph database-based data processing apparatus according to claim 8, further comprising:

14. The graph database-based data processing apparatus of claim 13, further comprising an interval upper limit determination module, the interval upper limit determination module comprising: