CN112732727B - Graph index flow batch integrated processing method and device

Info

Publication number: CN112732727B
Application number: CN202110365349.5A
Authority: CN
Inventors: 顾凌云; 郭志攀; 王伟; 张晓丰
Original assignee: Nanjing Bingjian Information Technology Co ltd
Current assignee: Nanjing Bingjian Information Technology Co ltd
Priority date: 2021-04-06
Filing date: 2021-04-06
Publication date: 2021-06-18
Anticipated expiration: 2041-04-06
Also published as: CN112732727A

Abstract

According to the graph index flow batch integrated processing method and device, an edge table and a point table are obtained from the full graph data and stored in a Hive database; the edge table and the point table are input into a calculation model to generate a first connected-graph table; the edge table, the point table and the first connected-graph table are input into a preset program to obtain graph database underlying files; a calculation model of the graph index and a parameter combination of the parameter list are called, target data matching the parameter combination is searched for in the graph database underlying files, and the graph index is calculated according to the calculation model to obtain a graph index calculation result; and the graph index calculation result is stored in HBase by calling the HBase API. Because the full graph data is split by connected graph and the data of each connected graph generates its own graph database underlying file, data does not need to be pulled across the network during index calculation, performance is higher, and the calculation result can be stored accurately.

Description

Graph index flow batch integrated processing method and device

Technical Field

The application relates to the technical field of graph calculation, in particular to a graph index flow batch integrated processing method and device.

Background

In related graph index processing techniques, the business system must send two requests to the computation service to compute each graph index. The first request obtains all input data combinations and buffers them inside the business system, which increases the complexity of the business system. In addition, when a graph database instance calculates a graph index, the data participating in the calculation is fetched from the underlying distributed storage system, which can incur large network overhead and expose the calculation to uncontrollable failures.

Disclosure of Invention

The application provides a graph index flow batch integrated processing method and device, so as to solve the technical problems described in the background.

A graph index stream batch integration processing method comprises the following steps:

obtaining an edge table and a point table according to the whole graph data in a graph database, and storing the edge table and the point table into a Hive database;

generating a first connected-graph table based on the edge table and the point table in the Hive database and a preset Spark calculation model; the first connected-graph table comprises two columns, one column being the ID of a point and the other column being the ID of the first connected graph;

inputting the edge table, the point table and the first connected-graph table into a preset Spark program to obtain a graph database underlying file corresponding to each first connected graph;

calling a calculation model of the graph indexes and parameter combinations of the parameter lists, searching target data matched with the parameter combinations in the graph database bottom layer file, and performing graph index calculation on the target data according to the calculation model to obtain a graph index calculation result;

and storing the graph index calculation result into the HBase by calling the api of the HBase.

Further, obtaining an edge table and a point table according to the full graph data in the graph database, and storing the edge table and the point table into the Hive database, includes:

accessing the graph database through a data reading interface, exporting the full data corresponding to the target graph in the graph database to obtain an edge table and a point table, and storing the edge table and the point table into the Hive database.

Further, inputting the edge table, the point table and the first connected-graph table into a preset Spark program to obtain a graph database underlying file corresponding to each first connected graph includes:

when the edge table and the point table are input into the preset Spark calculation model, the Spark calculation model traverses the data of each first connected graph and aggregates the data of the same first connected graph into the same graph database underlying file;

and compressing each graph database underlying file to obtain a corresponding compressed package, and storing the compressed package into an HDFS file system.

Further, before performing graph index calculation on the target data according to the calculation model, the method further includes:

before the graph index calculation is carried out, extracting the parameter combination of a parameter list, and judging whether the parameter combination of the parameter list has a mapping relation with a mapping table of a graph database bottom layer file;

if yes, traversing the parameter combination and calculating the graph index;

and if not, traversing the underlying file of each first connected graph, querying all the parameter combinations, solidifying the mapping relation between the parameter combinations and the underlying files of the first connected graphs into the Hive database, and then traversing the parameter combinations and calculating the graph indexes.

Further, the method further comprises:

and storing the graph index calculation results into the Hive database in parallel.

A graph index stream batch integration processing apparatus, the apparatus comprising:

the data acquisition module is used for obtaining an edge table and a point table according to the full graph data in the graph database and storing the edge table and the point table into the Hive database;

the data calculation module is used for generating a first connected-graph table based on the edge table and the point table in the Hive database and a preset Spark calculation model; the first connected-graph table comprises two columns, one column being the ID of a point and the other column being the ID of the first connected graph;

the data processing module is used for inputting the edge table, the point table and the first connected-graph table into a preset Spark program to obtain a graph database underlying file corresponding to each first connected graph;

the data matching module is used for calling a calculation model of the graph index and a parameter combination of the parameter list, searching target data matched with the parameter combination in the graph database bottom file, and performing graph index calculation on the target data according to the calculation model to obtain a graph index calculation result;

and the data storage module is used for storing the graph index calculation result into the HBase by calling the api of the HBase.

Further, the data acquisition module is specifically configured to:

Further, the data processing module is specifically configured to:

Further, the data matching module is specifically configured to:

if yes, traversing the parameter combination and calculating the graph index;

Further, the data storage module is specifically configured to:

When the graph index flow batch integrated processing method and device of the embodiments of the application are applied, an edge table and a point table are obtained from the full graph data in the graph database and stored in the Hive database; the edge table and the point table are input into a Spark calculation model to generate a first connected-graph table; the edge table, the point table and the first connected-graph table are input into a preset Spark program to obtain graph database underlying files; a calculation model of the graph index and a parameter combination of the parameter list are called, target data matching the parameter combination is searched for in the graph database underlying files, and graph index calculation is performed on the target data according to the calculation model to obtain a graph index calculation result; and the graph index calculation result is stored in HBase by calling the HBase API. Because the full graph data in the graph database is split by connected graph and the data of each connected graph generates its own graph database underlying file, data does not need to be pulled across the network during index calculation, performance is higher, and the calculation result can be stored in HBase accurately and quickly. The full graph data can therefore be processed quickly, which reduces the processing time of the graph indexes, improves processing efficiency and lowers the time cost.

It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.

FIG. 1 is a flowchart of a batch integration processing method for graph index flow according to an embodiment of the present invention;

FIG. 2 is a functional block diagram of a graph index flow batch integrated processing apparatus according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.

Referring to FIG. 1, which shows a flowchart of the graph index flow batch integrated processing method according to an embodiment of the present invention, the method may specifically include steps S21 to S25 described below.

Step S21, obtaining an edge table and a point table according to the full graph data in the graph database, and storing the edge table and the point table into the Hive database.

Illustratively, the edge table is composed of two parts, namely a table head node and a table node, and each vertex in the graph corresponds to one table head node stored in the array. The point table indicates the number of variables used in the automation control system, called the point number.

Step S22, generating a first connected-graph table based on the edge table and the point table in the Hive database and a preset Spark calculation model.

Illustratively, the first connected-graph table includes two columns: one column is the ID of a point, and the other column is the ID of the first connected graph.
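The two-column table described above can be illustrated with a small sketch. The patent does not state which algorithm the preset Spark calculation model uses, so the plain-Python union-find below is only a stand-in, and the function name `connected_graph_table`, the sample point IDs and the choice of the smallest point ID as the connected-graph ID are all assumptions made for illustration.

```python
def connected_graph_table(point_ids, edges):
    """Return (point ID, connected-graph ID) rows for an undirected graph."""
    parent = {p: p for p in point_ids}

    def find(p):
        # Follow parent links to the root, shortening the path on the way.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for src, dst in edges:
        root_s, root_d = find(src), find(dst)
        if root_s != root_d:
            # Keep the smaller ID as root so the connected-graph ID is deterministic.
            low, high = sorted((root_s, root_d))
            parent[high] = low

    return [(p, find(p)) for p in point_ids]


# Hypothetical sample data: two connected graphs, {a, b, c} and {x, y}.
table = connected_graph_table(
    ["a", "b", "c", "x", "y"],
    [("a", "b"), ("b", "c"), ("x", "y")],
)
```

Each row pairs a point with the ID of the connected graph it belongs to, which is the shape the later steps rely on when routing data to a per-graph file.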

Step S23, inputting the edge table, the point table and the first connected-graph table into a preset Spark program to obtain a graph database underlying file corresponding to each first connected graph.

Illustratively, a connected graph captures connectivity, which is a fundamental property of a graph; the graph database underlying file represents the minimum unit of the graph database. In this scheme, the data of different connected graphs generates different graph database underlying files, so data does not need to be pulled across the network during index calculation and performance is higher. Because a Spark calculation model is used, computing resources can be allocated on demand.

Step S24, calling a calculation model of the graph index and a parameter combination of the parameter list, searching target data matched with the parameter combination in the graph database underlying file, and performing graph index calculation on the target data according to the calculation model to obtain a graph index calculation result.

Illustratively, the graph index is used to characterize a corresponding feature (e.g., color, shape, etc.) in the graph.

Step S25, storing the graph index calculation result into HBase by calling the HBase API.
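As a hedged illustration of step S25, the sketch below only prepares the data for a write: it turns one calculation result into a row key and a column-to-value mapping, which is the general shape an HBase put takes. The patent does not describe the row-key layout, column family or client library, so the `index|param...` key, the `r:value` column and the function name `to_hbase_cell` are all hypothetical.

```python
def to_hbase_cell(index_name, combo, value):
    """Build (row key, columns) for one graph index calculation result."""
    # Assumed row-key layout: index name followed by the parameter values.
    row_key = "|".join((index_name, *combo)).encode("utf-8")
    # Assumed column family "r" with a single qualifier "value".
    columns = {b"r:value": str(value).encode("utf-8")}
    return row_key, columns


row_key, columns = to_hbase_cell("degree", ("b",), 2)
```

The resulting pair could then be handed to whichever HBase client API the deployment uses; that call is deliberately left out here.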

It can be understood that, when steps S21 to S25 above are executed, an edge table and a point table are obtained from the full graph data in the graph database and stored in the Hive database; the edge table and the point table are input into a Spark calculation model to generate a first connected-graph table; the edge table, the point table and the first connected-graph table are input into a preset Spark program to obtain graph database underlying files; a calculation model of the graph index and a parameter combination of the parameter list are called, target data matching the parameter combination is searched for in the graph database underlying files, and graph index calculation is performed on the target data according to the calculation model to obtain a graph index calculation result; and the graph index calculation result is stored in HBase by calling the HBase API. Because the full graph data in the graph database is split by connected graph and the data of each connected graph generates its own graph database underlying file, data does not need to be pulled across the network during index calculation, performance is higher, and the calculation result can be stored in HBase accurately and quickly. The full graph data can therefore be processed quickly, which reduces the processing time of the graph indexes, improves processing efficiency and lowers the time cost.

In a specific implementation, the inventors found that data reading errors may occur when the edge table and the point table are obtained from the full graph data in the graph database, making it difficult to store the edge table and the point table in the Hive database accurately. To address this problem, the step described in S21 of obtaining the edge table and the point table from the full graph data in the graph database and storing them in the Hive database may specifically include the following step S211.

Step S211, accessing the graph database through a data reading interface, exporting the full data corresponding to the target graph in the graph database to obtain an edge table and a point table, and storing the edge table and the point table in the Hive database.

Illustratively, accessing the graph database through the corresponding interface makes it possible to obtain the corresponding data and reduces data reading errors.

It can be understood that, when step S211 above is executed, data reading errors are avoided when the edge table and the point table are obtained from the full graph data in the graph database, so that the edge table and the point table can be stored in the Hive database accurately.
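A minimal sketch of the export in step S211 follows, assuming the data reading interface yields one record per relationship with `src`, `dst` and `type` fields. Those field names, the function `export_tables` and the sample records are hypothetical, and the write to Hive is omitted. Points that have no edges would not appear in this simplified export.

```python
def export_tables(records):
    """Split raw relationship records into a point table and an edge table."""
    points = {}  # point ID -> row; a dict keeps first-seen order and removes duplicates
    edge_table = []
    for rec in records:
        for point_id in (rec["src"], rec["dst"]):
            points.setdefault(point_id, {"id": point_id})
        edge_table.append({"src": rec["src"], "dst": rec["dst"], "type": rec["type"]})
    return list(points.values()), edge_table


# Hypothetical records as they might come back from the reading interface.
records = [
    {"src": "a", "dst": "b", "type": "call"},
    {"src": "b", "dst": "c", "type": "call"},
    {"src": "x", "dst": "y", "type": "transfer"},
]
point_table, edge_table = export_tables(records)
```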

In actual operation, the inventors found that data calculation errors may occur when the edge table, the point table and the first connected-graph table are input into the preset Spark program, making it difficult to obtain the graph database underlying file corresponding to each first connected graph accurately. To address this problem, the step described in S23 of inputting the edge table, the point table and the first connected-graph table into the preset Spark program to obtain the graph database underlying file corresponding to each first connected graph may specifically include steps S231 and S232 below.

Step S231, when the edge table and the point table are input into the preset Spark calculation model, the Spark calculation model traverses the data of each first connected graph and aggregates the data of the same first connected graph into the same graph database underlying file.

Illustratively, the data of the same connected graph is merged together in parallel to obtain one graph database underlying file. In this way all connected graphs are integrated and no connected graph is lost. Because the Spark calculation model is used, computing resources can be allocated on demand.
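The aggregation in step S231 can be sketched in plain Python as a group-by on the connected-graph ID. This is a single-process stand-in for the Spark job, and the in-memory dictionary that represents an "underlying file", the function name `group_by_connected_graph` and the sample rows are assumptions; the real file format is not described in the patent.

```python
from collections import defaultdict


def group_by_connected_graph(cc_table, point_table, edge_table):
    """Collect the points and edges of each connected graph into one bucket."""
    graph_of = dict(cc_table)  # point ID -> connected-graph ID
    files = defaultdict(lambda: {"points": [], "edges": []})
    for point in point_table:
        files[graph_of[point["id"]]]["points"].append(point)
    for edge in edge_table:
        # Both endpoints of an edge lie in the same connected graph,
        # so looking up the source point is enough.
        files[graph_of[edge["src"]]]["edges"].append(edge)
    return dict(files)


# Hypothetical inputs shaped like the tables from the earlier steps.
cc_table = [("a", "a"), ("b", "a"), ("c", "a"), ("x", "x"), ("y", "x")]
point_table = [{"id": p} for p in "abcxy"]
edge_table = [
    {"src": "a", "dst": "b"},
    {"src": "b", "dst": "c"},
    {"src": "x", "dst": "y"},
]
underlying_files = group_by_connected_graph(cc_table, point_table, edge_table)
```

Because no edge crosses a connected graph, each bucket is self-contained, which is what lets a later index calculation read a single file instead of pulling data across the network.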

Step S232, compressing each graph database underlying file to obtain a corresponding compressed package, and storing the compressed package into the HDFS file system.

Illustratively, compressing the graph database underlying files in this way effectively reduces the space they occupy, which relieves pressure on storage and helps avoid disordered stored data.
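Step S232 can be illustrated as follows, under stated assumptions: the patent does not name a compression format or serialization, so gzip over JSON is used here purely as an example, the function names are hypothetical, and the upload to HDFS is left out.

```python
import gzip
import json


def compress_underlying_file(content):
    """Serialize one underlying file and compress it into a package (bytes)."""
    raw = json.dumps(content, sort_keys=True).encode("utf-8")
    return gzip.compress(raw)


def decompress_underlying_file(package):
    """Inverse of compress_underlying_file, used before index calculation."""
    return json.loads(gzip.decompress(package).decode("utf-8"))


# Hypothetical content of one connected graph's underlying file.
content = {
    "points": [{"id": "x"}, {"id": "y"}],
    "edges": [{"src": "x", "dst": "y"}],
}
package = compress_underlying_file(content)
restored = decompress_underlying_file(package)
```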

It can be understood that, when steps S231 and S232 are executed, data calculation errors are avoided when the edge table, the point table and the first connected-graph table are input into the preset Spark program, so that the graph database underlying file corresponding to each first connected graph can be obtained accurately.

Based on the above, before the graph index calculation is performed on the target data according to the calculation model, the method further includes the following steps S11-S13.

Step S11, before calculating the graph index, extracting the parameter combination of the parameter list, and judging whether the parameter combination of the parameter list has a mapping relation in the mapping table of the graph database underlying files.

Illustratively, because the mapping relation between parameter combinations and graph database underlying files is solidified, different index calculation logics that use the same parameter combination achieve higher calculation performance.

Step S12, if so, traversing the parameter combination and calculating the graph index.

Step S13, if not, traversing the underlying file of each first connected graph, querying all the parameter combinations, solidifying the mapping relation between the parameter combinations and the underlying files of the first connected graphs into the Hive database, and then traversing the parameter combinations and calculating the graph indexes.

It can be understood that, when steps S11 to S13 above are executed, because the mapping relation between the parameter combinations and the graph database underlying files is solidified, graph indexes can be calculated for only part of the parameter combinations rather than for all of them every time; which combinations are calculated is decided by the parameters passed in by the business system, thereby implementing streaming calculation.
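Steps S11 to S13 can be sketched as a lookup with a lazily built mapping table. The patent does not define what a parameter combination contains or which graph indexes are calculated, so this sketch assumes a combination is a one-element tuple holding a point ID and uses the number of edges touching that point as a stand-in index. The names `build_mapping` and `degree_for` are hypothetical, and an in-memory dictionary replaces the mapping table that the patent stores in Hive.

```python
def build_mapping(underlying_files):
    """S13: scan every underlying file once and record combination -> file ID."""
    mapping = {}
    for file_id, content in underlying_files.items():
        for point in content["points"]:
            mapping[(point["id"],)] = file_id
    return mapping


def degree_for(combo, mapping, underlying_files):
    """Calculate the stand-in index for one parameter combination."""
    if combo not in mapping:  # S11: no solidified mapping for this combination
        mapping.update(build_mapping(underlying_files))  # S13
    file_id = mapping.get(combo)
    if file_id is None:
        return None  # the combination does not occur in any underlying file
    (point_id,) = combo
    # S12: only the one matching underlying file is read.
    edges = underlying_files[file_id]["edges"]
    return sum(point_id in (e["src"], e["dst"]) for e in edges)


# Hypothetical underlying files keyed by connected-graph ID.
files = {
    "a": {
        "points": [{"id": "a"}, {"id": "b"}, {"id": "c"}],
        "edges": [{"src": "a", "dst": "b"}, {"src": "b", "dst": "c"}],
    },
    "x": {
        "points": [{"id": "x"}, {"id": "y"}],
        "edges": [{"src": "x", "dst": "y"}],
    },
}
mapping = {}
result = degree_for(("b",), mapping, files)
```

After the first call the mapping is populated, so a later request for another combination goes straight to its file without rescanning.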

Based on the above, the method further includes the content described in step S31.

Step S31, storing the graph index calculation results into the Hive database in parallel.

It can be understood that, when step S31 above is executed, the results are stored in Hive, so the calculation results can be queried with SQL. This avoids repeated calculation and reduces processing time: a query from the business system becomes a simple retrieval without excessive calculation logic, and because the calculation results are stored in one place, the retrieval can be repeated without running the calculation logic each time. Network overhead may also be minimized.
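The retrieval-only query pattern described above can be sketched with Python's built-in `sqlite3` module standing in for Hive; this is an assumption made only so that the example runs without a cluster. The table name `graph_index_result`, its columns and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the Hive result table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE graph_index_result (index_name TEXT, param_combo TEXT, value REAL)"
)
# Results written once by the batch calculation (hypothetical values).
conn.executemany(
    "INSERT INTO graph_index_result VALUES (?, ?, ?)",
    [("degree", "a", 1.0), ("degree", "b", 2.0), ("degree", "c", 1.0)],
)


def lookup(conn, index_name, param_combo):
    """Fetch a stored result; no calculation logic runs at query time."""
    row = conn.execute(
        "SELECT value FROM graph_index_result "
        "WHERE index_name = ? AND param_combo = ?",
        (index_name, param_combo),
    ).fetchone()
    return None if row is None else row[0]


value = lookup(conn, "degree", "b")
```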

Based on the same inventive concept, the invention also provides a graph index stream batch integrated processing system, which comprises a graph data processing terminal and a graph data input end, wherein the graph data processing terminal is in communication connection with the graph data input end, and the graph data processing terminal is specifically used for:

Further, the graph data processing terminal is specifically configured to:

if yes, traversing the parameter combination and calculating the graph index;

Further, the graph data processing terminal is specifically configured to:

Based on the same inventive concept, please refer to fig. 2 in combination, a functional block diagram of the graph index flow batch integration processing apparatus 500 is also provided, and the detailed description of the graph index flow batch integration processing apparatus 500 is as follows.

A graph index stream batch integration processing device 500 applied to a graph data processing terminal, the device 500 comprising:

the data acquisition module 510 is configured to obtain an edge table and a point table according to the full graph data in the graph database, and store the edge table and the point table in the Hive database;

a data calculation module 520, configured to generate a first connected-graph table based on the edge table and the point table in the Hive database and a preset Spark calculation model; the first connected-graph table comprises two columns, one column being the ID of a point and the other column being the ID of the first connected graph;

the data processing module 530 is configured to input the edge table, the point table and the first connected-graph table into a preset Spark program to obtain a graph database underlying file corresponding to each first connected graph;

the data matching module 540 is configured to call a calculation model of a graph index and a parameter combination of a parameter list, search for target data matched with the parameter combination in the graph database bottom file, and perform graph index calculation on the target data according to the calculation model to obtain a graph index calculation result;

and a data storage module 550, configured to store the graph index calculation result in the HBase by calling the api of the HBase.

It can be understood that the invention provides a better scheme for calculating the calculation results of the graph indexes in a batch processing mode, so that the query of the business system is only one retrieval query without excessive calculation logic, and the calculation results are stored in one place, so that the retrieval query can be repeated without running the calculation logic every time.

The implementation process of the functions and actions of each module in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.

For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, wherein the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.

Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.

It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims

1. A graph index stream batch integration processing method is characterized by comprising the following steps:

calling a calculation model of the graph indexes and parameter combinations of the parameter lists, searching target data matched with the parameter combinations in the graph database bottom layer file, and performing graph index calculation on the target data according to the calculation model to obtain a graph index calculation result; wherein the graph indicator is used to characterize a corresponding feature in the graph;

storing the graph index calculation result into the HBase by calling the api of the HBase;

wherein the method further comprises:

if yes, traversing the parameter combination and calculating the graph index;

2. The method of claim 1, wherein obtaining an edge table and a point table from the full graph data in the graph database, and storing the edge table and the point table in the Hive database comprises:

3. The method according to claim 1, wherein inputting the edge table, the point table, and the first connection diagram into a preset Spark program to obtain a database base file corresponding to each first connection diagram, comprises:

4. The method of claim 1, further comprising:

5. A graph index stream batch integration processing apparatus, the apparatus comprising:

the data matching module is used for calling a calculation model of the graph index and a parameter combination of the parameter list, searching target data matched with the parameter combination in the graph database bottom file, and performing graph index calculation on the target data according to the calculation model to obtain a graph index calculation result; wherein the graph indicator is used to characterize a corresponding feature in the graph;

the data storage module is used for storing the graph index calculation result into the HBase by calling the api of the HBase;

the data matching module is specifically configured to:

if yes, traversing the parameter combination and calculating the graph index;

6. The device of claim 5, wherein the data acquisition module is specifically configured to:

7. The apparatus according to claim 5, wherein the data processing module is specifically configured to:

8. The apparatus of claim 5, wherein the data storage module is specifically configured to: