CN112732727B - Graph index flow batch integrated processing method and device - Google Patents

Info

Publication number
CN112732727B
Authority
CN
China
Prior art keywords
graph
database
data
parameter
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110365349.5A
Other languages
Chinese (zh)
Other versions
CN112732727A (en)
Inventor
顾凌云
郭志攀
王伟
张晓丰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Bingjian Information Technology Co ltd
Original Assignee
Nanjing Bingjian Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Bingjian Information Technology Co ltd
Priority to CN202110365349.5A
Publication of CN112732727A
Application granted
Publication of CN112732727B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/174 Redundancy elimination performed by the file system
    • G06F16/1744 Redundancy elimination performed by the file system using compression, e.g. sparse files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

In the graph index stream-batch integrated processing method and device, an edge table and a point table are obtained from the full graph data and stored in a Hive database; the edge table and the point table are input into a calculation model to generate a first connection chart; the edge table, the point table and the first connection chart are input into a preset program to obtain graph database bottom files; a calculation model of the graph index and a parameter combination of a parameter list are called, target data matching the parameter combination is searched for in the graph database bottom files, and the graph index is calculated on the target data according to the calculation model to obtain a graph index calculation result; and the graph index calculation result is stored in HBase by calling the HBase API. Because the full graph data is split into the data of different connected graphs, each of which generates its own graph database bottom file, data does not need to be pulled across the network during index calculation, performance is higher, and the calculation result can be stored accurately.

Description

Graph index flow batch integrated processing method and device
Technical Field
The application relates to the technical field of graph calculation, in particular to a graph index flow batch integrated processing method and device.
Background
In related graph index processing techniques, the business system must initiate two requests to the computation service for each graph index calculation. The first request obtains all input data combinations and caches them inside the business system, which may increase the complexity of the business system. In addition, when a graph database instance calculates a graph index, the data participating in the calculation is fetched from the underlying distributed storage system, which may cause high network overhead and uncontrollable failures.
Disclosure of Invention
The application provides a graph index stream-batch integrated processing method and device, so as to solve the technical problems described in the Background.
A graph index stream-batch integrated processing method comprises the following steps:
obtaining an edge table and a point table from the full graph data in a graph database, and storing the edge table and the point table in a Hive database;
generating a first connection chart based on the edge table and the point table in the Hive database and a preset Spark calculation model; the first connection chart comprises two columns, one column being the ID of a point and the other column being the ID of the first connected graph;
inputting the edge table, the point table and the first connection chart into a preset Spark program to obtain a graph database bottom file corresponding to each first connected graph;
calling a calculation model of a graph index and a parameter combination of a parameter list, searching the graph database bottom file for target data matching the parameter combination, and performing graph index calculation on the target data according to the calculation model to obtain a graph index calculation result;
and storing the graph index calculation result in HBase by calling the HBase API.
Further, obtaining the edge table and the point table from the full graph data in the graph database, and storing the edge table and the point table in the Hive database, includes:
accessing the graph database through a data reading interface, exporting the full data corresponding to a target graph in the graph database to obtain the edge table and the point table, and storing the edge table and the point table in the Hive database.
Further, the step of inputting the edge table, the point table and the first connection chart into the preset Spark program to obtain the graph database bottom file corresponding to each first connected graph includes:
when the edge table and the point table are input into the preset Spark calculation model, the Spark calculation model traverses the data of each first connected graph and aggregates the data of the same first connected graph into the same graph database bottom file;
and compressing each graph database bottom file to obtain a corresponding compressed package, and storing the compressed package in an HDFS file system.
Further, before the graph index calculation is performed on the target data according to the calculation model, the method further includes:
extracting the parameter combination of the parameter list, and judging whether a mapping relation exists between the parameter combination of the parameter list and the mapping table of the graph database bottom files;
if yes, traversing the parameter combinations and performing the graph index calculation;
and if not, traversing the bottom file of each first connected graph, querying all the parameter combinations, solidifying the mapping relation between the parameter combinations and the bottom files of the first connected graphs in the Hive database, and then traversing the parameter combinations and performing the graph index calculation.
Further, the method further comprises:
storing the graph index calculation results in the Hive database in parallel.
A graph index stream-batch integrated processing apparatus, the apparatus comprising:
a data acquisition module, configured to obtain an edge table and a point table from the full graph data in a graph database and store the edge table and the point table in a Hive database;
a data calculation module, configured to generate a first connection chart based on the edge table and the point table in the Hive database and a preset Spark calculation model; the first connection chart comprises two columns, one column being the ID of a point and the other column being the ID of the first connected graph;
a data processing module, configured to input the edge table, the point table and the first connection chart into a preset Spark program to obtain a graph database bottom file corresponding to each first connected graph;
a data matching module, configured to call a calculation model of a graph index and a parameter combination of a parameter list, search the graph database bottom file for target data matching the parameter combination, and perform graph index calculation on the target data according to the calculation model to obtain a graph index calculation result;
and a data storage module, configured to store the graph index calculation result in HBase by calling the HBase API.
Further, the data acquisition module is specifically configured to:
access the graph database through a data reading interface, export the full data corresponding to a target graph in the graph database to obtain the edge table and the point table, and store the edge table and the point table in the Hive database.
Further, the data processing module is specifically configured to:
when the edge table and the point table are input into the preset Spark calculation model, have the Spark calculation model traverse the data of each first connected graph and aggregate the data of the same first connected graph into the same graph database bottom file;
and compress each graph database bottom file to obtain a corresponding compressed package, and store the compressed package in an HDFS file system.
Further, the data matching module is specifically configured to:
before the graph index calculation is performed, extract the parameter combination of the parameter list, and judge whether a mapping relation exists between the parameter combination of the parameter list and the mapping table of the graph database bottom files;
if yes, traverse the parameter combinations and perform the graph index calculation;
and if not, traverse the bottom file of each first connected graph, query all the parameter combinations, solidify the mapping relation between the parameter combinations and the bottom files of the first connected graphs in the Hive database, and then traverse the parameter combinations and perform the graph index calculation.
Further, the data storage module is specifically configured to:
store the graph index calculation results in the Hive database in parallel.
When the graph index stream-batch integrated processing method and device of the embodiments of the application are applied, the edge table and the point table are obtained from the full graph data in the graph database and stored in the Hive database; the edge table and the point table are input into the Spark calculation model to generate the first connection chart; the edge table, the point table and the first connection chart are input into the preset Spark program to obtain the graph database bottom files; the calculation model of the graph index and the parameter combination of the parameter list are called, target data matching the parameter combination is searched for in the graph database bottom files, and graph index calculation is performed on the target data according to the calculation model to obtain the graph index calculation result; and the graph index calculation result is stored in HBase by calling the HBase API. In the invention, the full graph data in the graph database is split into the data of different connected graphs, and each connected graph generates its own graph database bottom file, so data does not need to be pulled across the network during index calculation, performance is higher, and the calculation result can be stored in HBase accurately and quickly. The full graph data can therefore be processed quickly, which reduces the processing time of the graph index, improves its processing efficiency and lowers the time cost.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a flowchart of a graph index stream-batch integrated processing method according to an embodiment of the present invention;
Fig. 2 is a functional block diagram of a graph index stream-batch integrated processing apparatus according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
Fig. 1 shows a flowchart of a graph index stream-batch integrated processing method according to an embodiment of the present invention. The method may specifically include the following steps S21 to S25.
Step S21, obtaining an edge table and a point table from the full graph data in the graph database, and storing the edge table and the point table in the Hive database.
Illustratively, the edge table is composed of two parts, namely a table head node and a table node, and each vertex in the graph corresponds to one table head node stored in the array. The point table indicates the number of variables used in the automation control system, called the point number.
Step S22, generating a first connection chart based on the edge table and the point table in the Hive database and a preset Spark calculation model.
Illustratively, the first connection chart includes two columns: one column is the ID of a point, and the other column is the ID of the first connected graph.
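The two-column output of step S22 can be sketched as follows. This is a minimal single-machine illustration only: the source does not state which algorithm the preset Spark calculation model uses, so a union-find pass stands in for it here, and the function name `build_connection_chart`, the sample point IDs and the choice of the smallest point ID as the connected-graph ID are all assumptions.

```python
def build_connection_chart(point_ids, edges):
    """Return rows (point_id, connected_graph_id) using union-find."""
    parent = {p: p for p in point_ids}

    def find(x):
        # Follow parent links to the root, halving the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for src, dst in edges:
        root_a, root_b = find(src), find(dst)
        if root_a != root_b:
            # Keep the smaller ID as root so that each connected graph is
            # identified by its smallest point ID (an assumed convention).
            small, large = sorted((root_a, root_b))
            parent[large] = small

    return [(p, find(p)) for p in sorted(point_ids)]


# Hypothetical sample data: two connected graphs, {a, b, c} and {d, e}.
points = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
connection_chart = build_connection_chart(points, edges)
```

Each returned row pairs a point ID with the ID of the connected graph it falls in, which is the shape the later steps rely on when data is grouped per connected graph.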
Step S23, inputting the edge table, the point table and the first connection chart into a preset Spark program to obtain a graph database bottom file corresponding to each first connected graph.
Illustratively, a connected graph characterizes connectivity, which is a fundamental property of a graph; the graph database bottom file represents the minimum unit of the graph database. In this scheme, the data of different connected graphs generates different graph database bottom files, so data does not need to be pulled across the network during index calculation and performance is higher. Using the Spark calculation model allows computing resources to be allocated on demand.
Step S24, calling a calculation model of the graph index and a parameter combination of the parameter list, searching the graph database bottom file for target data matching the parameter combination, and performing graph index calculation on the target data according to the calculation model to obtain a graph index calculation result.
Illustratively, the graph index is used to characterize a corresponding feature (e.g., color, shape, etc.) in the graph.
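A minimal sketch of step S24 follows, under stated assumptions: the source does not name a concrete graph index or the layout of a parameter combination, so the index used here (the number of distinct neighbours of a point) and the names `bottom_files`, `file_of_point`, `neighbor_count` and `compute_index` are hypothetical. The point it illustrates is that only the one bottom file matching the parameter combination is read.

```python
# Hypothetical in-memory stand-ins: one edge list per connected graph, plus a
# lookup from point ID to the bottom file that holds that point.
bottom_files = {
    "a": [("a", "b"), ("b", "c")],
    "d": [("d", "e")],
}
file_of_point = {"a": "a", "b": "a", "c": "a", "d": "d", "e": "d"}


def neighbor_count(edges, point_id):
    """Example graph index: number of distinct neighbours of a point."""
    neighbors = set()
    for src, dst in edges:
        if src == point_id:
            neighbors.add(dst)
        elif dst == point_id:
            neighbors.add(src)
    return len(neighbors)


def compute_index(param_combo):
    """Find the target data for one parameter combination and compute the index."""
    point_id = param_combo["point_id"]
    target_data = bottom_files[file_of_point[point_id]]  # only this file is read
    return neighbor_count(target_data, point_id)


result_b = compute_index({"point_id": "b"})
result_e = compute_index({"point_id": "e"})
```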
Step S25, storing the graph index calculation result in HBase by calling the HBase API.
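How a result might be shaped for an HBase write in step S25 can be sketched as below. The source does not describe the row-key design, column family or client library, so everything here is an assumption: the row-key format, the column `r:value`, and the `InMemoryTable` class, which only imitates a put of a row key with family-qualified columns and is not an HBase client.

```python
class InMemoryTable:
    """Stand-in for an HBase table: row key -> {b"family:qualifier": value}."""

    def __init__(self):
        self.rows = {}

    def put(self, row_key, columns):
        # Later writes to the same row overwrite columns with the same name.
        self.rows.setdefault(row_key, {}).update(columns)


def to_put(index_name, param_combo, value):
    """Build a (row key, columns) pair for one graph index result."""
    # Sort parameter names so the same combination always yields the same key.
    combo = "|".join(f"{k}={param_combo[k]}" for k in sorted(param_combo))
    row_key = f"{index_name}#{combo}".encode("utf-8")
    columns = {b"r:value": str(value).encode("utf-8")}
    return row_key, columns


table = InMemoryTable()
row_key, columns = to_put("neighbor_count", {"point_id": "b"}, 2)
table.put(row_key, columns)
```

Deriving the row key from the index name and the parameter combination lets a business system fetch one result with a single key lookup, which matches the retrieval-only query pattern the description aims for.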
It can be understood that, when steps S21-S25 are executed, the edge table and the point table are obtained from the full graph data in the graph database and stored in the Hive database; the edge table and the point table are input into the Spark calculation model to generate the first connection chart; the edge table, the point table and the first connection chart are input into the preset Spark program to obtain the graph database bottom files; the calculation model of the graph index and the parameter combination of the parameter list are called, target data matching the parameter combination is searched for in the graph database bottom files, and graph index calculation is performed on the target data according to the calculation model to obtain the graph index calculation result; and the graph index calculation result is stored in HBase by calling the HBase API. In the invention, the full graph data in the graph database is split into the data of different connected graphs, and each connected graph generates its own graph database bottom file, so data does not need to be pulled across the network during index calculation, performance is higher, and the calculation result can be stored in HBase accurately and quickly. The full graph data can therefore be processed quickly, which reduces the processing time of the graph index, improves its processing efficiency and lowers the time cost.
In a specific implementation, the inventors found that data reading errors can occur when the edge table and the point table are obtained from the full graph data in the graph database, making it difficult to store the two tables accurately in the Hive database. To address this problem, step S21 may specifically include the following step S211.
Step S211, accessing the graph database through a data reading interface, exporting the full data corresponding to the target graph in the graph database to obtain the edge table and the point table, and storing the edge table and the point table in the Hive database.
Illustratively, accessing the graph database through the corresponding interface allows the corresponding data to be obtained and reduces data reading errors.
It can be understood that, when step S211 is executed, data reading errors are avoided when the edge table and the point table are obtained from the full graph data in the graph database, so the edge table and the point table can be stored accurately in the Hive database.
In actual operation, the inventors found that data calculation errors can occur when the edge table, the point table and the first connection chart are input into the preset Spark program, making it difficult to obtain accurately the graph database bottom file corresponding to each first connected graph. To address this problem, step S23 may specifically include the following steps S231 and S232.
Step S231, when the edge table and the point table are input into the preset Spark calculation model, the Spark calculation model traverses the data of each first connected graph and aggregates the data of the same first connected graph into the same graph database bottom file.
Illustratively, the data of the second connected graph is merged in parallel to obtain one graph database bottom file. In this way all connected graphs can be effectively integrated, and the loss of connected graphs is effectively avoided. Because the Spark calculation model is used, computing resources can be allocated on demand.
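The aggregation in step S231 can be sketched as a group-by on the connected-graph ID. This is a single-machine illustration under assumptions: in the source the work is done by a Spark calculation model whose code is not given, and the function name `build_bottom_files` and the dictionary layout of a "bottom file" are hypothetical.

```python
from collections import defaultdict


def build_bottom_files(point_ids, edges, connection_chart):
    """Group points and edges by connected-graph ID, one entry per bottom file."""
    graph_of = dict(connection_chart)  # point ID -> connected-graph ID
    files = defaultdict(lambda: {"points": [], "edges": []})
    for point_id in point_ids:
        files[graph_of[point_id]]["points"].append(point_id)
    for src, dst in edges:
        # Both endpoints of an edge lie in the same connected graph,
        # so looking up the source endpoint is sufficient.
        files[graph_of[src]]["edges"].append((src, dst))
    return dict(files)


# Hypothetical sample data; "f" is an isolated point forming its own graph.
points = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("d", "e")]
connection_chart = [("a", "a"), ("b", "a"), ("c", "a"),
                    ("d", "d"), ("e", "d"), ("f", "f")]
bottom_files = build_bottom_files(points, edges, connection_chart)
```

Because every point is assigned to exactly one connected graph, no point or edge is dropped or duplicated across files.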
Step S232, compressing each graph database bottom file to obtain a corresponding compressed package, and storing the compressed package in the HDFS file system.
Illustratively, compressing the graph database bottom file in this way effectively reduces the space it occupies, which relieves pressure on storage space and avoids disorder in the stored data.
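Step S232 can be illustrated with the sketch below. The source does not specify the compression format or how the package is written to HDFS, so gzip-compressed JSON and a local temporary directory are used here purely as stand-ins; the file naming scheme and both function names are assumptions.

```python
import gzip
import json
import tempfile
from pathlib import Path


def compress_bottom_file(graph_id, content, out_dir):
    """Write one bottom file as gzip-compressed JSON and return its path."""
    path = Path(out_dir) / f"graph_{graph_id}.json.gz"
    with gzip.open(path, "wt", encoding="utf-8") as fh:
        json.dump(content, fh)
    return path


def load_bottom_file(path):
    """Read a compressed bottom file back into memory."""
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        return json.load(fh)


# A local temporary directory stands in for the HDFS target directory.
out_dir = tempfile.mkdtemp()
content = {"points": ["a", "b", "c"], "edges": [["a", "b"], ["b", "c"]]}
path = compress_bottom_file("a", content, out_dir)
restored = load_bottom_file(path)
```

Keeping one package per connected graph means an index calculation only needs to fetch and unpack the package for the graph it touches.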
It can be understood that, when steps S231 and S232 are executed, data calculation errors are avoided when the edge table, the point table and the first connection chart are input into the preset Spark program, so the graph database bottom file corresponding to each first connected graph can be obtained accurately.
Based on the above, before the graph index calculation is performed on the target data according to the calculation model, the method further includes the following steps S11-S13.
Step S11, before the graph index calculation is performed, extracting the parameter combination of the parameter list, and judging whether a mapping relation exists between the parameter combination of the parameter list and the mapping table of the graph database bottom files.
Illustratively, because the mapping relation between the parameter combinations and the graph database bottom files is solidified, different index calculation logics applied to the same parameter combination achieve higher calculation performance.
Step S12, if yes, traversing the parameter combinations and performing the graph index calculation.
Step S13, if not, traversing the bottom file of each first connected graph, querying all the parameter combinations, solidifying the mapping relation between the parameter combinations and the bottom files of the first connected graphs in the Hive database, and then traversing the parameter combinations and performing the graph index calculation.
It can be understood that, when steps S11-S13 are executed, because the mapping relation between the parameter combinations and the graph database bottom files is solidified, the graph index can be calculated for only part of the parameter combinations rather than for all of them each time. Which combinations are calculated is decided by the parameters passed in by the business system, which realizes streaming calculation.
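The branch in steps S11-S13 can be sketched as a lookup with a one-time rebuild. This is an illustration under assumptions: the source keeps the mapping table in Hive, whereas a plain dictionary stands in for it here, and the form of a parameter combination (a one-element tuple holding a point ID) and the name `resolve_mapping` are hypothetical.

```python
def resolve_mapping(param_combos, mapping_table, bottom_files):
    """Return ({combo: file_id}, rebuilt) for the requested combinations."""
    # S11/S12: if every requested combination is already mapped, use it as is.
    if all(combo in mapping_table for combo in param_combos):
        return {c: mapping_table[c] for c in param_combos}, False

    # S13: scan every bottom file, collect all parameter combinations and
    # solidify the combination -> bottom file mapping for later requests.
    for file_id, content in bottom_files.items():
        for point_id in content["points"]:
            mapping_table[(point_id,)] = file_id
    resolved = {c: mapping_table[c] for c in param_combos if c in mapping_table}
    return resolved, True


# Hypothetical sample data.
bottom_files = {
    "a": {"points": ["a", "b", "c"], "edges": [("a", "b"), ("b", "c")]},
    "d": {"points": ["d", "e"], "edges": [("d", "e")]},
}
mapping_table = {}  # stand-in for the mapping table persisted in Hive

first, rebuilt_first = resolve_mapping([("b",)], mapping_table, bottom_files)
second, rebuilt_second = resolve_mapping([("e",)], mapping_table, bottom_files)
```

The first call pays for the full scan; the second is answered from the solidified mapping alone, which is what allows a later request to calculate only the combinations it actually passes in.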
Based on the above, the method further includes the content described in step S31.
Step S31, storing the graph index calculation results in the Hive database in parallel.
It can be understood that, when step S31 is executed, the results are stored in Hive, so the calculation results can be queried with SQL. This avoids recomputation and reduces processing time: a query from the business system is only a retrieval, without excessive calculation logic, and because the calculation results are stored in one place the retrieval can be repeated without running the calculation logic each time. Network overhead may also be minimized.
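The retrieval-only query pattern described for step S31 can be sketched as follows. An in-memory SQLite database is used solely as a runnable stand-in for Hive, and the table name `graph_index_result`, its columns and the sample rows are assumptions that do not come from the source.

```python
import sqlite3

# In-memory SQLite stands in for the Hive table that holds the results.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE graph_index_result ("
    "index_name TEXT, param_combo TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO graph_index_result VALUES (?, ?, ?)",
    [
        ("neighbor_count", "point_id=b", 2),
        ("neighbor_count", "point_id=e", 1),
    ],
)


def lookup(index_name, param_combo):
    """Fetch a stored result with plain SQL; no graph computation is run."""
    row = conn.execute(
        "SELECT value FROM graph_index_result "
        "WHERE index_name = ? AND param_combo = ?",
        (index_name, param_combo),
    ).fetchone()
    return None if row is None else row[0]


stored = lookup("neighbor_count", "point_id=b")
missing = lookup("neighbor_count", "point_id=z")
```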
Based on the same inventive concept, the invention also provides a graph index stream-batch integrated processing system, which comprises a graph data processing terminal and a graph data input end, wherein the graph data processing terminal is in communication connection with the graph data input end, and the graph data processing terminal is specifically configured to:
obtain an edge table and a point table from the full graph data in a graph database, and store the edge table and the point table in a Hive database;
generate a first connection chart based on the edge table and the point table in the Hive database and a preset Spark calculation model; the first connection chart comprises two columns, one column being the ID of a point and the other column being the ID of the first connected graph;
input the edge table, the point table and the first connection chart into a preset Spark program to obtain a graph database bottom file corresponding to each first connected graph;
call a calculation model of a graph index and a parameter combination of a parameter list, search the graph database bottom file for target data matching the parameter combination, and perform graph index calculation on the target data according to the calculation model to obtain a graph index calculation result;
and store the graph index calculation result in HBase by calling the HBase API.
Further, the graph data processing terminal is specifically configured to:
access the graph database through a data reading interface, export the full data corresponding to a target graph in the graph database to obtain the edge table and the point table, and store the edge table and the point table in the Hive database.
Further, the graph data processing terminal is specifically configured to:
when the edge table and the point table are input into the preset Spark calculation model, have the Spark calculation model traverse the data of each first connected graph and aggregate the data of the same first connected graph into the same graph database bottom file;
and compress each graph database bottom file to obtain a corresponding compressed package, and store the compressed package in an HDFS file system.
Further, the graph data processing terminal is specifically configured to:
before the graph index calculation is performed, extract the parameter combination of the parameter list, and judge whether a mapping relation exists between the parameter combination of the parameter list and the mapping table of the graph database bottom files;
if yes, traverse the parameter combinations and perform the graph index calculation;
and if not, traverse the bottom file of each first connected graph, query all the parameter combinations, solidify the mapping relation between the parameter combinations and the bottom files of the first connected graphs in the Hive database, and then traverse the parameter combinations and perform the graph index calculation.
Further, the graph data processing terminal is specifically configured to:
store the graph index calculation results in the Hive database in parallel.
Based on the same inventive concept, and referring also to fig. 2, a functional block diagram of the graph index stream-batch integrated processing apparatus 500 is provided. The apparatus 500 is described in detail as follows.
A graph index stream-batch integrated processing apparatus 500 applied to a graph data processing terminal, the apparatus 500 comprising:
a data acquisition module 510, configured to obtain an edge table and a point table from the full graph data in the graph database, and store the edge table and the point table in the Hive database;
a data calculation module 520, configured to generate a first connection chart based on the edge table and the point table in the Hive database and a preset Spark calculation model; the first connection chart comprises two columns, one column being the ID of a point and the other column being the ID of the first connected graph;
a data processing module 530, configured to input the edge table, the point table and the first connection chart into a preset Spark program to obtain a graph database bottom file corresponding to each first connected graph;
a data matching module 540, configured to call a calculation model of a graph index and a parameter combination of a parameter list, search the graph database bottom file for target data matching the parameter combination, and perform graph index calculation on the target data according to the calculation model to obtain a graph index calculation result;
and a data storage module 550, configured to store the graph index calculation result in HBase by calling the HBase API.
It can be understood that the invention provides a better scheme in which the graph index calculation results are computed in batch mode, so that a query from the business system is only a retrieval without excessive calculation logic, and because the calculation results are stored in one place the retrieval can be repeated without running the calculation logic each time.
The implementation process of the functions and actions of each module in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, wherein the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (8)

1. A graph index stream-batch integrated processing method, characterized by comprising the following steps:
obtaining an edge table and a point table from the full graph data in a graph database, and storing the edge table and the point table in a Hive database;
generating a first connected graph based on the edge table and the point table in the Hive database and a preset Spark calculation model, wherein the first connected graph comprises two columns, one column being the ID of a point and the other column being the ID of the first connected graph;
inputting the edge table, the point table and the first connected graph into a preset Spark program to obtain a graph database underlying file corresponding to each first connected graph;
calling a calculation model of the graph index and the parameter combinations of a parameter list, searching the graph database underlying file for target data matching the parameter combinations, and performing graph index calculation on the target data according to the calculation model to obtain a graph index calculation result, wherein the graph index is used to characterize a corresponding feature in the graph;
storing the graph index calculation result in HBase by calling the HBase API;
wherein the method further comprises:
before the graph index calculation is carried out, extracting the parameter combinations of the parameter list, and judging whether the parameter combinations have a mapping relation in a mapping table of the graph database underlying files;
if yes, traversing the parameter combinations and calculating the graph index;
and if not, traversing the underlying file of each first connected graph, querying all the parameter combinations, persisting the mapping relation between the parameter combinations and the underlying files of the first connected graphs in the Hive database, and then traversing the parameter combinations and calculating the graph index.
2. The method of claim 1, wherein obtaining an edge table and a point table from the full graph data in the graph database, and storing the edge table and the point table in the Hive database comprises:
accessing the graph database through a data reading interface, exporting the full data corresponding to the target graph in the graph database to obtain the edge table and the point table, and storing the edge table and the point table in the Hive database.
3. The method according to claim 1, wherein inputting the edge table, the point table and the first connected graph into a preset Spark program to obtain a graph database underlying file corresponding to each first connected graph comprises:
when the edge table and the point table are input into a preset Spark calculation model, the Spark calculation model traverses the data of each first connected graph, and the data of the same first connected graph are aggregated into the same graph database underlying file;
and compressing each graph database underlying file to obtain a corresponding compressed package, and storing the compressed package in an HDFS file system.
4. The method of claim 1, further comprising:
and storing the graph index calculation results into the Hive database in parallel.
5. A graph index stream-batch integrated processing apparatus, the apparatus comprising:
the data acquisition module is used for obtaining an edge table and a point table from the full graph data in a graph database and storing the edge table and the point table in a Hive database;
the data calculation module is used for generating a first connected graph based on the edge table and the point table in the Hive database and a preset Spark calculation model, wherein the first connected graph comprises two columns, one column being the ID of a point and the other column being the ID of the first connected graph;
the data processing module is used for inputting the edge table, the point table and the first connected graph into a preset Spark program to obtain a graph database underlying file corresponding to each first connected graph;
the data matching module is used for calling a calculation model of the graph index and the parameter combinations of a parameter list, searching the graph database underlying file for target data matching the parameter combinations, and performing graph index calculation on the target data according to the calculation model to obtain a graph index calculation result, wherein the graph index is used to characterize a corresponding feature in the graph;
the data storage module is used for storing the graph index calculation result in HBase by calling the HBase API;
the data matching module is specifically configured to:
before the graph index calculation is carried out, extract the parameter combinations of the parameter list, and judge whether the parameter combinations have a mapping relation in a mapping table of the graph database underlying files;
if yes, traverse the parameter combinations and calculate the graph index;
and if not, traverse the underlying file of each first connected graph, query all the parameter combinations, persist the mapping relation between the parameter combinations and the underlying files of the first connected graphs in the Hive database, and then traverse the parameter combinations and calculate the graph index.
6. The device of claim 5, wherein the data acquisition module is specifically configured to:
access the graph database through a data reading interface, export the full data corresponding to the target graph in the graph database to obtain the edge table and the point table, and store the edge table and the point table in the Hive database.
7. The apparatus according to claim 5, wherein the data processing module is specifically configured to:
when the edge table and the point table are input into a preset Spark calculation model, the Spark calculation model traverses the data of each first connected graph, and the data of the same first connected graph are aggregated into the same graph database underlying file;
and compress each graph database underlying file to obtain a corresponding compressed package, and store the compressed package in an HDFS file system.
8. The apparatus of claim 5, wherein the data storage module is specifically configured to:
store the graph index calculation results in the Hive database in parallel.
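The data flow recited in claims 1 and 3 can be sketched in plain Python under stated assumptions. This is an illustrative stand-in, not the claimed Spark, Hive or HBase implementation: union-find replaces the preset Spark calculation model, in-memory dicts replace the underlying files and the mapping table, a parameter combination is assumed to carry a point ID as its first element, and the graph index is assumed to be a simple degree count. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Edge = Tuple[str, str]
Params = Tuple[str, ...]


def connected_graph_table(points: Iterable[str], edges: Iterable[Edge]) -> Dict[str, str]:
    """Return {point ID: connected graph ID} using union-find.

    Assumes every edge endpoint appears in `points`. The connected graph ID
    is the smallest point ID in the component.
    """
    parent = {p: p for p in points}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for src, dst in edges:
        a, b = find(src), find(dst)
        if a != b:
            low, high = min(a, b), max(a, b)
            parent[high] = low  # the smaller ID stays the root
    return {p: find(p) for p in parent}


def build_underlying_files(edges: Iterable[Edge], table: Dict[str, str]) -> Dict[str, List[Edge]]:
    """Aggregate the edges of each connected graph into one 'file' (a list here)."""
    files: Dict[str, List[Edge]] = defaultdict(list)
    for src, dst in edges:
        files[table[src]].append((src, dst))
    return dict(files)


def ensure_mapping(
    combos: List[Params], mapping: Dict[Params, str], files: Dict[str, List[Edge]]
) -> Dict[Params, str]:
    """If any combination lacks a mapping, traverse every file and record it."""
    missing = [c for c in combos if c not in mapping]
    if missing:
        for graph_id, file_edges in files.items():
            file_points = {p for edge in file_edges for p in edge}
            for combo in missing:
                if combo[0] in file_points:
                    mapping[combo] = graph_id  # persisted to Hive in the claims
    return mapping


def compute_graph_index(
    combos: List[Params], mapping: Dict[Params, str], files: Dict[str, List[Edge]]
) -> Dict[Params, int]:
    """Traverse the combinations and compute a degree count on the matched file."""
    results: Dict[Params, int] = {}
    for combo in combos:
        graph_id = mapping.get(combo)
        if graph_id is None:
            continue  # no target data matches this combination
        results[combo] = sum(combo[0] in edge for edge in files[graph_id])
    return results
```

Because each combination is resolved to a single connected graph before any calculation, the index computation only reads the file for that component instead of the full graph, which is the apparent purpose of the mapping check in the claims.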
CN202110365349.5A 2021-04-06 2021-04-06 Graph index flow batch integrated processing method and device Active CN112732727B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110365349.5A CN112732727B (en) 2021-04-06 2021-04-06 Graph index flow batch integrated processing method and device

Publications (2)

Publication Number Publication Date
CN112732727A (en) 2021-04-30
CN112732727B (en) 2021-06-18

Family

ID=75596434

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110365349.5A Active CN112732727B (en) 2021-04-06 2021-04-06 Graph index flow batch integrated processing method and device

Country Status (1)

Country Link
CN (1) CN112732727B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502718A (en) * 2019-07-10 2019-11-26 中国电力科学研究院有限公司 A kind of power information system high-performance formula calculating realization method and system

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5689361B2 (en) * 2011-05-20 2015-03-25 インターナショナル・ビジネス・マシーンズ・コーポレーションInternational Business Machines Corporation Method, program, and system for converting a part of graph data into a data structure that is an image of a homomorphic map
CN108090179A (en) * 2017-12-15 2018-05-29 北京海致星图科技有限公司 A kind of method of the concurrent subgraph inquiries of Spark
CN110727836B (en) * 2019-12-17 2020-04-07 南京华飞数据技术有限公司 Social network analysis system based on Spark GraphX and implementation method thereof
CN111241353B (en) * 2020-01-16 2023-08-22 支付宝(杭州)信息技术有限公司 Partitioning method, device and equipment for graph data
CN112148834B (en) * 2020-08-24 2022-03-29 北京工商大学 Graph embedding-based high-risk food and hazard visual analysis method and system
CN112052413B (en) * 2020-08-28 2024-02-13 上海谋乐网络科技有限公司 URL fuzzy matching method, device and system
CN112230894B (en) * 2020-10-19 2024-09-13 浪潮通信信息系统有限公司 Flink-based flow batch integrated index design method

Also Published As

Publication number Publication date
CN112732727A (en) 2021-04-30

Similar Documents

Publication Publication Date Title
CN106407311B (en) Method and device for obtaining search result
CN111258978B (en) Data storage method
US11954133B2 (en) Method and apparatus for managing and controlling resource, device and storage medium
CN108536808B (en) Spark calculation framework-based data acquisition method and device
WO2018201887A1 (en) Data response method, apparatus, terminal device, and medium
CN110569252B (en) Data processing system and method
CN108363741B (en) Big data unified interface method, device, equipment and storage medium
EP4170514A1 (en) Data association query method and apparatus, and device and storage medium
CN108268468B (en) Big data analysis method and system
CN111382182A (en) Data processing method and device, electronic equipment and storage medium
CN111125199A (en) Database access method and device and electronic equipment
CN112732727B (en) Graph index flow batch integrated processing method and device
CN107463671B (en) Method and device for path query
CN117131230A (en) Data blood edge analysis method, device, equipment and storage medium
CN110083598B (en) Spark-oriented remote sensing data indexing method, system and electronic equipment
CN111125108A (en) HBASE secondary index method, device and computer equipment based on Lucene
CN113268487B (en) Data statistical method, device and computer readable storage medium
CN113360503B (en) Test data tracking method and device for distributed database
CN115292415A (en) Database access method and device
CN116204579A (en) Method, apparatus, device, storage medium and program product for selecting computing engine
CN114492844A (en) Method and device for constructing machine learning workflow, electronic equipment and storage medium
CN114443686A (en) Compression graph construction method and device based on relational data
CN109960695B (en) Management method and device for database in cloud computing system
CN113064914A (en) Data extraction method and device
CN112749189A (en) Data query method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant