CN117009038A - Graph computing platform based on cloud native technology - Google Patents


Info

Publication number
CN117009038A
CN117009038A
Authority
CN
China
Prior art keywords
graph
data
module
query
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311283918.7A
Other languages
Chinese (zh)
Other versions
CN117009038B (en)
Inventor
杨建明
陈红阳
吕劲松
杨文涛
余磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202311283918.7A
Publication of CN117009038A
Application granted
Publication of CN117009038B
Active legal status
Anticipated expiration legal status

Classifications

    • G06F9/45558 Hypervisor-specific management and integration aspects
    • G06F9/5072 Grid computing
    • G06F9/5077 Logical partitioning of resources; Management or configuration of virtualized resources
    • G06F2009/45562 Creating, deleting, cloning virtual machine instances
    • G06F2009/4557 Distribution of virtual machine instances; Migration and load balancing
    • G06F2009/45595 Network integration; Enabling network access in virtual machine instances
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a graph computing platform based on cloud native technology, comprising a software-hardware collaboration system, a graph storage system, a graph computation engine, and a graph development factory system, all designed on a cloud native architecture. The software-hardware collaboration system, the graph storage system, and the graph computation engine run at the back end, while the graph development factory runs at the front end. The graph computation engine comprises a graph query engine, a graph analysis engine, and a graph learning engine. The graph development factory system is a visual operating system for the graph computing field built on top of the graph computation engine; it is deployed with K8s containerization technology and makes the entire development process of a graph computation algorithm componentized, workflow-driven, and visual. The platform offers a clear architecture, strong extensibility, a low barrier to use, and high computational efficiency.

Description

Graph computing platform based on cloud native technology
Technical Field
The application relates to the field of graph computation, in particular to a graph computation platform based on a cloud native technology.
Background
The development of graph theory can be traced back to the 18th century, when Euler posed the well-known "seven bridges of Königsberg" problem, which initiated a wave of research in graph theory; over time, it gradually became an independent discipline. Euclidean data has a regular distribution and a fixed structure and cannot flexibly represent complex relationships among things, whereas graph structures in non-Euclidean space can represent the complex relationships of all things and have strong data-expression capability. In recent years, graph computation has been one of the hottest areas of artificial intelligence research; it is regarded as a core element in pushing artificial intelligence from the stage of perceptual intelligence to the stage of cognitive intelligence, with application scenarios including social network analysis, bioinformatics, road planning, financial risk control, and recommendation systems. As the infrastructure of graph computation, the graph computing platform plays a vital role in graph algorithm development efficiency and adoption.
A graph computing platform generally covers graph data storage and query, distributed graph analysis algorithms, and graph neural network models and their training. The computing modes are diverse and involve the collaboration of several complex systems, so the threshold for users to learn and use such a platform is high. Meanwhile, because graph data is large, the scale of a graph affects the efficiency of the computing framework; common practice is therefore to design for the requirements and characteristics of a specific scenario in order to reduce implementation difficulty.
The mixed execution behavior of the graph aggregation and graph update phases leads to different access patterns, data-reuse rates, computation modes, computation intensities, and execution constraints, which make efficient graph computation extremely complex. The challenges fall mainly into two aspects: computation and memory access. 1) Computation: the platform must efficiently handle both irregular computation and dense regular computation. Because nodes in a graph are highly irregular and follow a power-law distribution, traversing neighbor nodes in the graph aggregation phase causes severe load imbalance. 2) Memory access: the platform must efficiently handle both irregular fine-grained and regular coarse-grained memory access together with high bandwidth requirements. These two problems limit the scale and efficiency of graph computation.
Graph learning algorithms make machine learning applicable to graph structures in non-Euclidean space and give machines the ability to learn on graphs. However, the graph computing platforms developed in industry are limited by their respective business scenarios, and their subsystems are fragmented: some platforms focus on graph data storage and query analysis, some excel at graph analysis algorithms, and most existing platforms in the graph learning field target supervised, centralized business scenarios. Each solves only part of the problems in the graph computing field, and systematic support for a workflow-driven, visual, full graph-computing development cycle is lacking. In addition, current graph computing platforms offer limited support for data and algorithms in scientific computing, insufficient support for domestic hardware in strategically constrained ("choke-point") areas, and a high barrier for non-specialists entering graph computing research and development.
Disclosure of Invention
To address the problems of system complexity, high barriers to use, and low computational efficiency common in current graph computing systems, the application provides a graph computing platform based on cloud native technology. It supports general hardware resources and graph computing chips through hardware virtualization to provide heterogeneous, software-hardware-collaborative computing capability; it alleviates the I/O bottleneck in graph computation with a shared-memory-pool approach; it provides the main computation engines and storage engines of a graph computing platform; it performs task scheduling and resource management for graph computation by deploying all subsystems in containers; and it offers a one-stop visual graph development factory in which graph data modeling, query analysis, graph algorithm training, model deployment, and other work can be completed through a componentized, workflow-style mode of operation, supporting research, development, and application building in the field.
The aim of the application is achieved by the following technical scheme:
A graph computing platform based on cloud native technology comprises a software-hardware collaboration system, a graph storage system, a graph computation engine, and a graph development factory system, all designed with a cloud native architecture; the software-hardware collaboration system, the graph storage system, and the graph computation engine run at the back end, and the graph development factory runs at the front end;
the software-hardware collaboration system provides hardware computing resources and a software-hardware adaptation environment, where the hardware computing resources include various central processing units, data processing units, FPGA accelerators, and graph computing chips; the software-hardware adaptation environment includes a dataflow acceleration library and an operator acceleration library that accelerate graph computation algorithms on heterogeneous hardware resources;
the graph storage system comprises a graph partitioning module, a distributed persistent storage module, and a distributed shared memory pool module, where the graph partitioning module has several built-in graph partitioning algorithms; the distributed persistent storage module performs on-disk storage of graph data through a distributed file system and provides interfaces and services for reading and writing graph data; the distributed shared memory pool module manages the global memory space of host memory and GPU memory across multiple computing nodes based on remote direct memory access, realizing distributed memory sharing within the cluster;
the graph computation engine comprises a graph query engine, a graph analysis engine, and a graph learning engine, which respectively realize direct query of graph-associated data, distributed graph analysis, and parallel training of graph learning models;
the graph development factory system is a visual operating system for the graph computing field built on the graph computation engine; it is deployed with K8s containerization technology and makes the entire development process of graph computation algorithms componentized, workflow-driven, and visual.
Further, the software-hardware collaboration system supports extensible hardware devices including GPUs, CPUs, FPGA accelerator cards, and compute-in-memory chips; it develops the corresponding compilation environments and software tools, provides compute-in-memory operators for graph data, and provides heterogeneous computing capability to the graph computation engine, where the compute-in-memory operators are an operator acceleration module adapted to specific hardware.
Further, the distributed persistent storage module is realized based on NVMe SSD technology; the distributed shared memory pool module is realized based on RDMA technology and performs data synchronization with the distributed persistent storage module through the distributed file system.
Further, direct query of graph-associated data in the graph query engine includes multi-order (k-hop) neighbor queries of a single point, association-path queries between two points, and sub-graph queries that obtain the associations among multiple points.
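As an illustration of these three query types (not the platform's actual implementation), the following sketch runs them over a toy adjacency-list graph; all function names are hypothetical.

```python
from collections import deque

def k_hop_neighbors(adj, src, k):
    """Multi-order neighbor query: all nodes within k hops of src (BFS by level)."""
    seen, frontier = {src}, {src}
    for _ in range(k):
        frontier = {n for u in frontier for n in adj.get(u, ()) if n not in seen}
        seen |= frontier
    return seen - {src}

def shortest_path(adj, src, dst):
    """Association-path query: one shortest path between two points, or None."""
    prev, q = {src: None}, deque([src])
    while q:
        u = q.popleft()
        if u == dst:                      # walk predecessor links back to src
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for n in adj.get(u, ()):
            if n not in prev:
                prev[n] = u
                q.append(n)
    return None

def induced_subgraph(adj, nodes):
    """Sub-graph query: keep only the edges among a given node set."""
    nodes = set(nodes)
    return {u: [n for n in adj.get(u, ()) if n in nodes] for u in nodes}
```

A production engine would evaluate these against the graph storage system rather than an in-memory dict, but the query semantics are the same.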
Further, the graph query engine comprises a graph query parser and a graph query executor. The graph query parser parses query statements input by the user, constructs a query syntax tree, and generates an execution plan; the graph query executor optimizes the syntax tree produced by the parser, then performs resource scheduling and plan execution according to resource quotas, using distributed parallel computing to process large-scale graph data in the graph storage system while minimizing running time and resource consumption.
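The parse-then-plan flow can be sketched as a toy two-stage pipeline; the `MATCH a -> d` grammar and every name below are invented for illustration and do not reflect the patent's actual query language.

```python
def parse(query):
    """Toy parser: turn 'MATCH <src> -> <dst>' into a syntax-tree dict."""
    tokens = query.split()
    if len(tokens) != 4 or tokens[0] != "MATCH" or tokens[2] != "->":
        raise ValueError("expected: MATCH <src> -> <dst>")
    return {"op": "path", "src": tokens[1], "dst": tokens[3]}

def plan(tree):
    """Toy executor front half: lower the tree into an ordered step list
    (a real executor would also optimize and schedule against quotas)."""
    return [("scan", tree["src"]), ("expand", tree["dst"])]

steps = plan(parse("MATCH a -> d"))
```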
Further, the graph analysis engine comprises a graph analysis algorithm library and a graph analysis computation module. The algorithm library contains efficient data structures for graph data processing and various general functions and class libraries; it provides the user with a programmable toolset that allows implementing and testing graph analysis algorithms and changing and tuning their parameters, so as to optimize and improve algorithm performance.
The graph analysis computation module runs on a distributed system and comprises a series of low-level operators and data-processing logic capable of handling graph structures; it supports communication and cooperation among multiple nodes and, combined with the computing capability provided by the hardware resources, realizes distributed and parallel execution of graph analysis algorithms.
Further, the graph learning engine comprises a graph learning algorithm library and a graph learning training module. The graph learning algorithm library is a software toolkit for constructing and training graph neural network models and includes various graph datasets and pre-trained graph neural network models; the graph learning training module realizes, in order, feature extraction and representation, subgraph sampling, model construction and selection, training strategy and parameter tuning, and model evaluation and optimization.
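As a hedged sketch of what one forward step in such a training module might compute, the following NumPy snippet implements a single mean-aggregation message-passing (GCN-style) layer; it assumes NumPy is available and is not the patent's model code.

```python
import numpy as np

def gcn_layer(A, X, W):
    """One message-passing layer: H = ReLU(D^-1 (A + I) X W).
    A: (n, n) adjacency matrix; X: (n, f) node features; W: (f, h) weights."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))       # mean over each node's neighbors
    return np.maximum(D_inv @ A_hat @ X @ W, 0.0)  # aggregate, transform, ReLU

rng = np.random.default_rng(0)
A = np.array([[0.0, 1.0], [1.0, 0.0]])  # two connected nodes
X = rng.normal(size=(2, 4))
W = rng.normal(size=(4, 3))
H = gcn_layer(A, X, W)                  # (2, 3) hidden representations
```

A full training loop would stack such layers, add a loss, and backpropagate; distributed training would shard `A` and `X` across the shared memory pool.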
Further, the graph development factory system comprises a graph data modeling module, a graph data visualization module, a graph algorithm component module, a graph training workflow construction module, and a graph model deployment module;
the graph data modeling module defines a standard storage structure for graph data, abstracts things in the real world into nodes and edges in a graph, and describes the relationships and characteristics among things using the structure and attributes of the graph;
the graph data visualization module visualizes the result data of graph query, graph analysis, and graph learning;
the graph algorithm component module implements model code and related algorithms for model training and inference, and comprises a graph data conversion component, a feature engineering component, a graph algorithm operator component, a GNN model training and evaluation component, a parameter optimization component, and an architecture search component;
the graph training workflow construction module uses automated tools to build and manage the processing steps of data preprocessing, feature engineering, model training, and model evaluation in a Pipeline manner, forming a complete and repeatable data-processing flow;
the graph model deployment module deploys the trained graph algorithm model to a remote server and realizes model deployment, model monitoring, configuration management, and model inference services, providing support for business systems or end users.
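The Pipeline-style construction in the graph training workflow module can be illustrated, purely schematically, by composing named stages into one repeatable callable; the stage functions below are stand-ins for real preprocessing, feature, and training steps.

```python
def make_pipeline(*steps):
    """Compose (name, fn) stages into one repeatable run(data) callable."""
    def run(data):
        for name, fn in steps:   # each stage consumes the previous stage's output
            data = fn(data)
        return data
    return run

pipeline = make_pipeline(
    ("preprocess", lambda xs: [x for x in xs if x is not None]),  # drop missing
    ("feature",    lambda xs: [float(x) for x in xs]),            # featurize
    ("train",      lambda xs: sum(xs) / len(xs)),                 # stand-in for fitting
)
```

Because the pipeline is a value, the same sequence of steps can be re-run on new data, which is the repeatability property the module aims for.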
Further, the visualization methods include: node coloring and sizing, edge drawing and coloring, clustering and subgraphs, and adjustment via interactive controls.
Further, visualization through clustering and subgraphs specifically comprises: dividing nodes into different groups according to the similarity between nodes and presenting each group as a subgraph.
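A minimal sketch of the clustering-and-subgraph idea, grouping nodes by connected component so each group can be rendered as its own subgraph; the real module's similarity-based clustering may of course be more sophisticated.

```python
def connected_components(adj):
    """Group nodes into clusters (connected components) for per-subgraph rendering."""
    seen, groups = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:                      # depth-first flood fill from start
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(adj.get(u, ()))
        seen |= comp
        groups.append(comp)
    return groups
```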
The beneficial effects of the application are as follows:
the graph computing platform provided by the application can realize high-efficiency coordination among all subsystems of the platform, can perform reasonable resource management according to requirements, and realizes dynamic capacity expansion/contraction; the software and hardware collaboration system performs virtualization management on heterogeneous computing resources based on a cloud primitive technology, and transparentization of computing resources and computing logic is realized based on the provided graph algorithm operator library and software and hardware adaptation.
The interconnection and intercommunication of the GPU video memories of all nodes in the cluster and the host memory are realized based on the distributed shared memory pool module, extremely high communication efficiency can be achieved depending on the bandwidth of a high-speed network, the widely existing I/O bottleneck in graph calculation can be solved, and the distributed calculation of a large-scale graph algorithm is accelerated.
The graph development factory designed by the application is a visual one-stop multi-user collaborative graph computing development foreground system, is a machine learning operating system in the graph computing field, can reduce the threshold of graph computing development in a componentized, procedural and easy-to-operate use mode, and greatly improves the efficiency of algorithm development and online iteration.
Drawings
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
Fig. 1 is a schematic architecture diagram of a graph computing platform based on cloud native technology according to an embodiment of the present application.
Fig. 2 is a schematic diagram of a software and hardware collaboration system according to an embodiment of the application.
Fig. 3 is a schematic diagram of an architecture of a graph storage system according to an embodiment of the application.
FIG. 4 is a schematic diagram of the architecture of a graph computation engine according to an embodiment of the present application.
FIG. 5 is a schematic diagram of a graph development factory system according to an embodiment of the present application.
Detailed Description
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used in this disclosure refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, this information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the application. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
CPU: central Processing Unit, a central processing unit;
GPU: graphics Processing Unit, a graphics processing unit;
DPU: data Processing Unit, a data processing unit;
UI: user Interface;
RDMA: remote Direct Memory Access, remote direct data access;
MLOps: machine Learning Operations, machine learning operating system;
NVMe: non-Volatile Memory Express, nonvolatile memory;
SSD: solid State Disk, solid State memory;
HDD: hard Disk Drive, mechanical Hard Disk;
DPU: data Processing Unit, a data processor;
ROM: read-Only Memory;
RAM: random Access Memory, random access memory;
and (3) FPGA: field Programmable Gate Array, field programmable gate array;
k8s: kubernetes, distributed architecture solution based on container technology;
API: application Programming Interface, application program interface;
GNN: graph Neural Network, neural network.
The graph computing platform based on cloud native technology of the present application may be implemented in any computer language such as C, C++, Java, Go, or Python, and the implemented methods and systems may run on one or more computer nodes, each realized in a hardware environment that may include one or more of the following components: CPU, GPU, FPGA accelerator card, memory, and network card. The memory may include random access memory (RAM) or read-only memory (ROM) and may be used to store instructions, programs, code, code sets, or instruction sets.
The processor may include one or more processing cores. The processor connects the parts of the terminal using various interfaces and lines, and performs the functions of the terminal and processes data by executing instructions, programs, code sets, or instruction sets stored in the memory and invoking data stored in the memory. Those skilled in the art will appreciate that the hardware described above is not limiting: it may include more or fewer components, combine certain components, or arrange components differently. For example, the hardware may further include a radio-frequency circuit, an input unit, sensors, an audio circuit, a power supply, and the like, which are not described here.
As shown in FIG. 1, the graph computing platform based on cloud native technology in this embodiment includes a software-hardware collaboration system, a graph storage system, a graph computation engine, and a graph development factory system, all designed with a cloud native architecture. Each subsystem performs resource management and task scheduling based on containerization, and the subsystems interact through standardized interfaces. The software-hardware collaboration system, graph storage system, and graph computation engine run at the back end; the graph development factory comprises a background module and a foreground module and provides the front-end UI functionality.
1. Software-hardware collaboration system
The software-hardware collaboration system provides mainstream domestic and international hardware computing resources and a software-hardware adaptation environment, as shown in FIG. 2.
1. Hardware resources: these include mainstream domestic and international CPUs, GPUs, and DPUs from manufacturers including but not limited to NVIDIA, AMD, Intel, Huawei, and Sunway, with each computing node equipped with at least one CPU and at least one GPU. In addition, the graph computing chip is a custom-developed chip dedicated to the graph computing field; compute-in-memory chips include near-memory computing chips, unified compute-in-memory chips, and the like; and the FPGA accelerator card is an FPGA programmable-array computing unit customized for graph computing. Storage resources include RAM, ROM, HDD, SSD, and other types; network resources include network cards, routers, network communication protocols, and other communication facilities. The hardware resources as a whole form a heterogeneous computing architecture.
2. Software-hardware adaptation: to adapt to heterogeneous computing and storage resources, cloud native technology virtualizes the software-hardware adaptation environment, makes the hardware resources transparent to the software layers, and abstracts the following sub-modules:
(1) Hardware environment adaptation: adaptation of low-level drivers, programming interfaces, and modules;
(2) Compilation environment and tools: hardware drivers, compilers, and development kits;
(3) Algorithm operator module: provides a dataflow acceleration library and an operator acceleration library that accelerate graph computation algorithms on heterogeneous hardware resources, provides a programming model suited to heterogeneous architectures, adapts the dependency environment for CPU, GPU, FPGA, dedicated graph computing chips, and other computing resources, and provides a general, portable development interface;
(4) Interactive intelligence module: automatic learning of optimization parameters, an intelligent programming interface, and automatic recommendation of parameters and models.
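One common way to realize such an operator acceleration library is a registry that dispatches each operator to a kernel adapted to the target device, falling back to a CPU implementation when no device-specific kernel exists. The sketch below is an assumption about the design, not the patent's code; all names are hypothetical.

```python
_registry = {}

def register(op, device):
    """Register a hardware-specific kernel for an operator (cpu/gpu/fpga/...)."""
    def deco(fn):
        _registry[(op, device)] = fn
        return fn
    return deco

def dispatch(op, device, *args):
    """Pick the kernel adapted to the device; fall back to the CPU kernel."""
    fn = _registry.get((op, device)) or _registry[(op, "cpu")]
    return fn(*args)

@register("scatter_add", "cpu")
def scatter_add_cpu(dst, idx, src):
    """Reference CPU kernel: dst[idx[i]] += src[i], a core graph-aggregation op."""
    out = list(dst)
    for i, v in zip(idx, src):
        out[i] += v
    return out
```

The fallback path is what makes the hardware layer "transparent" to calling code: a caller can request a GPU kernel and still get a correct result on CPU-only nodes.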
2. Graph storage system
Based on the software-hardware collaboration capability, the graph storage system layer provides partitioning algorithms, a distributed persistent storage mode, and a distributed shared memory pool mode for large-scale graph data, and provides the graph computation engine with read-write capability for massive graph data, as shown in FIG. 3. It comprises the following modules.
The graph storage system builds a graph data benchmark and a graph modeling mode on a built-in standardized graph data format specification, abstracts a general toolkit for loading/uploading graph data, and defines the read-write interaction specification of the graph storage system.
1. Graph dividing module
Provides partitioning methods for large graphs, including common graph partitioning algorithms such as random partitioning, vertex partitioning, edge partitioning, and temporal partitioning. Different partitioning schemes are selected according to the size and type of the graph data and the business scenario, and the partitioning algorithms apply to both storage modes: graph persistent storage (e.g., a graph database) and the distributed shared memory pool (an in-memory graph database).
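The partitioning schemes above can be made concrete with a small, dependency-free sketch of hash-based edge partitioning, a simple member of the edge-partitioning family. All names here are illustrative, not the platform's actual API.

```python
# Minimal sketch of hash-based edge partitioning: each edge is assigned to a
# partition by hashing its source vertex, which keeps a vertex's outgoing
# edges together (useful for algorithms that scan out-neighbors).
from collections import defaultdict

def partition_edges(edges, num_partitions):
    """Map each (src, dst) edge to one of num_partitions buckets."""
    partitions = defaultdict(list)
    for src, dst in edges:
        partitions[hash(src) % num_partitions].append((src, dst))
    return dict(partitions)

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 0)]
parts = partition_edges(edges, 2)
```

A production partitioner would additionally balance partition sizes and minimize cut edges; this sketch only shows the assignment step.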
2. Distributed persistent storage module
Based on HDDs and SSDs, graph data is split using the partitioning algorithm provided by the graph partitioning module and persisted to disk through a distributed file system such as HDFS, UnionFS, or JuiceFS, realizing a distributed persistent storage scheme for large-scale graph data and providing interfaces and services for reading and writing graph data.
3. Distributed shared memory pool module
Based on RDMA technology, the distributed shared memory pool module manages the global memory space formed by host memory and GPU memory across multiple computing nodes, realizing distributed memory/GPU-memory sharing within the cluster. Graph data is split using the partitioning algorithm provided by the graph partitioning module and loaded from the distributed persistent storage module into the distributed shared memory pool module, which effectively relieves the I/O performance bottleneck in graph computation. The application uses the shared memory pool to share distributed data in memory with extremely low latency, reducing additional I/O storage overhead and improving real-time performance; at the same time, graph data in memory is abstracted at a high level and objects are described with hierarchical metadata, eliminating data-format conversion costs. This scheme largely resolves the following problems in graph computation:
It resolves the high read/write waiting and communication overhead caused by the frequent iterations of graph computation.
It resolves graph algorithms' computational dependence on the neighbor information of vertices and edges.
It resolves the difficulty graph algorithms have in computing in parallel over unevenly distributed partitions, a consequence of the complex structure of graph data.
Meanwhile, the application provides several modes for loading graph data into the memory pool:
graph structure data stored in GPU memory, feature data sharded in GPU memory;
graph structure data stored in GPU memory, feature data sharded in host memory;
graph structure data stored in host memory, feature data sharded in GPU memory;
graph structure data stored in host memory, feature data sharded in host memory.
The loading mode can be chosen according to the size of the graph data and the business scenario; the application provides the relevant toolkits and parameter choices.
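The selection among the four loading modes listed above can be sketched as a simple capacity check. The function name and the rule (fit structure first, then features, into GPU memory; host memory is assumed large enough as fallback) are our illustrative assumptions, not the platform's documented policy.

```python
# Hypothetical selector for the memory-pool loading modes: place the graph
# structure in GPU memory if it fits, then place feature shards in whatever
# GPU capacity remains, falling back to host memory otherwise.
def choose_loading_mode(structure_bytes, feature_bytes, gpu_capacity):
    """Return (structure_location, feature_location), each "gpu" or "host"."""
    struct_loc = "gpu" if structure_bytes <= gpu_capacity else "host"
    remaining_gpu = gpu_capacity - (structure_bytes if struct_loc == "gpu" else 0)
    feat_loc = "gpu" if feature_bytes <= remaining_gpu else "host"
    return struct_loc, feat_loc
```

The four possible return values correspond one-to-one to the four modes in the list above.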
3. Graph computation engine
The graph computation engine is designed, on a cloud native architecture, as three core computing engines, as shown in fig. 4:
1. graph query engine
The graph query engine builds on the distributed read/write interface of the graph storage layer and applies targeted optimizations to the graph data storage model, data structures, and query algorithms. It provides direct querying of graph-associated data using the Cypher and Gremlin query languages, including single-point multi-hop neighbor queries, association-path queries between two points, and queries that obtain the association subgraph among multiple points. It also supports efficient insertion of graph data such as knowledge graphs; in particular, it can rapidly insert data volumes exceeding one hundred million records. It further comprises the following two sub-modules:
(1) Graph query parser
It allows a user to write and execute graph query statements in an SQL-like manner to retrieve relevant information from graph data in the graph storage system. Through the graph query parser, users can precisely filter the data they need and perform customized data analysis. The process typically involves parsing the query entered by the user, building a query syntax tree, and generating an execution plan. Common graph query languages include Cypher and Gremlin.
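The parse-then-build-a-syntax-tree step described above can be illustrated with a toy parser for one Cypher-like single-hop pattern. The grammar it accepts and the dictionary layout of the tree are our own simplifications, not the engine's real parser output.

```python
# Toy parser: turn a Cypher-like "MATCH (a)-[:REL]->(b) RETURN x" statement
# into a tiny query syntax tree (nested dicts). Real parsers handle a full
# grammar and then hand the tree to a planner/executor.
import re

PATTERN = re.compile(r"MATCH \((\w+)\)-\[:(\w+)\]->\((\w+)\) RETURN (\w+)")

def parse_query(text):
    m = PATTERN.fullmatch(text.strip())
    if m is None:
        raise ValueError("unsupported query")
    src, rel, dst, ret = m.groups()
    return {
        "op": "match",
        "pattern": {"src": src, "rel": rel, "dst": dst},
        "return": ret,
    }

tree = parse_query("MATCH (a)-[:KNOWS]->(b) RETURN b")
```

The executor module described next would take such a tree, optimize it, and schedule it against the storage layer.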
(2) Graph query executor
As the runtime engine at the bottom of the graph query engine, its goal is to improve query efficiency and scalability so that users can quickly complete complex query operations on large graphs. Its main function is to optimize the syntax tree produced by the query parser, then schedule resources and plan execution according to resource quotas, and use distributed parallel computing techniques to process large-scale graph data in the graph storage system while minimizing running time and resource consumption.
2. Graph analysis engine
The graph analysis engine provides a rich library of graph analysis algorithms, such as graph clustering, partitioning, spanning trees, and PageRank computation, together with distributed computing capability for these algorithms, including offline analysis of large-scale graph data and online analysis of streaming graph data; the engine exchanges data reads and writes with the graph storage system. The graph analysis engine includes the following sub-modules:
(1) Graph analysis algorithm library
It generally contains efficient data structures for graph data processing as well as various general-purpose functions and class libraries, offering users a programmable, fully customizable tool set that makes it easy to implement and test graph analysis algorithms and to alter and tune algorithm parameters, thereby optimizing and improving algorithm performance. The graph analysis algorithm library is also well suited to searching for and identifying large-scale combined patterns and pattern similarities, which can help users uncover hidden data associations and patterns. As an efficient and fast graph analysis tool provided to users, it ships with graph analysis algorithms such as community detection, node centrality computation, graph matching, and path computation that can quickly and conveniently process large amounts of complex graph data, and it is applied in a variety of graph studies, including social networks, network security, and financial transaction analysis.
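As an example of the kind of algorithm such a library exposes, here is a compact, dependency-free PageRank iteration (one of the algorithms the engine names above). Parameter names and the uniform redistribution for dangling nodes are our choices for the sketch.

```python
# Power-iteration PageRank over an adjacency dict (node -> out-neighbors).
# Dangling nodes (no out-edges) redistribute their rank uniformly so the
# total rank mass stays 1.0.
def pagerank(adj, damping=0.85, iters=50):
    n = len(adj)
    rank = {v: 1.0 / n for v in adj}
    for _ in range(iters):
        new = {v: (1.0 - damping) / n for v in adj}
        for v, outs in adj.items():
            if outs:
                share = damping * rank[v] / len(outs)
                for u in outs:
                    new[u] += share
            else:  # dangling node
                for u in adj:
                    new[u] += damping * rank[v] / n
        rank = new
    return rank

ranks = pagerank({"a": ["b"], "b": ["c"], "c": ["a", "b"]})
```

In the three-node example, "b" receives links from both "a" and "c" and therefore ends up with the highest rank.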
(2) Graph analysis and calculation module
Based on the algorithm packages provided by the graph analysis algorithm library, this module provides the parallel computing capability for graph analysis algorithms, including a series of low-level operators and data processing logic that can handle graph structures, and supports communication and cooperation among multiple nodes. Combined with the computing power supplied by the hardware resources, it realizes distributed and parallel computation of graph analysis algorithms through software-hardware co-acceleration; it is highly scalable, reliable, and efficient, and can rapidly process large volumes of complex graph-structured data.
3. Graph learning engine
The graph learning engine is used for training parallel graph learning models and comprises:
(1) Graph learning algorithm library
It mainly provides extensible implementations of common graph neural network algorithms, supports loading various graph models and datasets and preprocessing the data, and is a software toolkit for building and training graph neural network models. It typically contains a series of algorithm implementations based on graph neural networks, such as the graph convolutional network (GCN), the graph attention network (GAT), the graph autoencoder (Graph Autoencoder), the graph generative network (Graph Generative Network), and so on. Unlike conventional deep neural networks, the graph neural network is a model particularly well suited to processing graph and network data: it treats the nodes and edges in graph and network data as trainable data and can therefore better describe the relationships between nodes and edges. The algorithm library aims to provide efficient implementations and usually ships with various graph datasets and pre-trained models as experimental baselines, which greatly reduces the difficulty users face in implementing and training graph neural network models. Beyond the built-in graph neural networks, a parameter configuration interface is also opened to users, who can build personalized graph neural network structures under the graph algorithm framework according to their own needs.
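To make the GCN idea concrete, the sketch below computes one propagation step with plain lists: each node's new features are a mix of its own and its neighbors' features, followed by a learned linear map. The mean aggregation used here is a simplification of the symmetric normalization in the original GCN, and all names are illustrative.

```python
# One GCN-style propagation step: mean-aggregate each node's features with
# its neighbors' (self-loop included), then apply a dense weight matrix.
def gcn_step(adj, feats, weight):
    """adj: node -> neighbor list; feats: node -> feature vector;
    weight: matrix W, applied as out = agg @ W."""
    out = {}
    for v in adj:
        group = [feats[v]] + [feats[u] for u in adj[v]]  # self + neighbors
        agg = [sum(col) / len(group) for col in zip(*group)]
        out[v] = [sum(a * w for a, w in zip(agg, col)) for col in zip(*weight)]
    return out

adj = {0: [1], 1: [0]}
feats = {0: [1.0, 0.0], 1: [0.0, 1.0]}
identity = [[1.0, 0.0], [0.0, 1.0]]
h = gcn_step(adj, feats, identity)
```

With an identity weight matrix, two mutually linked nodes each end up with the average of the two feature vectors, showing how neighbor information flows in a single layer.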
(2) Graph training module
Based on the algorithm components of the graph learning algorithm library, this module provides an efficient distributed computing engine and the ability to optimize parameters during graph neural network model training, so that the model can efficiently learn features representing the relationships between nodes and edges in the given graph data. It covers the following functions: feature extraction and representation, involving node vectorization, edge embedding, or other forms of feature abstraction; subgraph sampling, which exploits the topology and attribute information of the distributed graph store and provides an efficient graph sampling and query interface through which algorithm models obtain subgraphs for computation; model construction and selection, where a suitable graph neural network model is chosen for training according to the characteristics of the problem; training strategy and parameter adjustment, which specifies the training strategy and tuning method and sets stopping conditions and adjustment policies to ensure convergence and stable performance during training; and model evaluation and tuning, which uses validation-set and test-set evaluation, parameter optimization, and similar means to ensure the model's actual effect meets expectations.
4. Graph development factory system
Based on the three-core graph computation engine and the graph storage system, the graph development factory system builds an MLOps visual operating system for the graph computing field, provides continuous delivery and automated pipeline capability across the graph development life cycle, and makes the whole development process of graph computing algorithms modular, procedural, and visual, facilitating team collaboration and iterative algorithm maintenance. The bottom layer of the graph development factory system cooperates with the engine layer: the graph algorithm operator library can be invoked quickly through parameter configuration, and a Pipeline execution model is built on the operator library's interfaces. The graph development factory is deployed with K8s containerization technology, supports a multi-user collaborative development mode, and, as shown in FIG. 5, mainly comprises the following sub-modules:
1. graph data modeling module
The goal of graph data modeling is to abstract real-world things into the nodes and edges of a graph and to use the graph's structure and attributes to describe the relationships and features between things. The module defines a standard storage structure for graph data, designs semantic conventions for graph structure data and graph feature data by defining a Schema, and provides a visual design interface backed by usability development tools, enabling standardized modeling of domain data, convenient generation and loading of graph data, and a unified read/write interface for other modules.
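A minimal sketch of the Schema idea described above: the schema names node and edge labels with their required, typed properties, and a validator checks records against it. The labels, fields, and function names are hypothetical examples, not the module's defined format.

```python
# Illustrative graph Schema: node/edge labels map to required property types.
SCHEMA = {
    "nodes": {"Person": {"name": str, "age": int}},
    "edges": {"KNOWS": {"since": int}},
}

def validate_node(label, props, schema=SCHEMA):
    """Check that a node record carries every property its label requires."""
    spec = schema["nodes"].get(label)
    if spec is None:
        return False  # unknown label
    return all(key in props and isinstance(props[key], typ)
               for key, typ in spec.items())

ok = validate_node("Person", {"name": "Ada", "age": 36})
bad = validate_node("Person", {"name": "Ada"})  # missing required "age"
```

Validating at load time is what lets the module promise a standardized structure to the downstream read/write interface.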
2. Graph data visualization module
The result data produced by graph query, graph analysis, and graph learning can mostly be presented as graph data visualizations to help people better understand and analyze the data. Visualization methods include: node coloring and sizing, where a node's size and color are adjusted according to its attributes, such as centrality or clustering coefficient, so that the node's importance and the characteristics of its group can be expressed more clearly; edge styling, where the color, line width, and dash style of edges are adjusted to reflect different edge attributes; clustering and subgraphs, where nodes are divided into groups according to their similarity and the groups are presented as subgraphs; and interactive controls, such as zoom, drag, and select, which let users view the data freely as needed and make analysis and mining more convenient.
3. Graph algorithm component module
This module implements the model code and related algorithms for model training and inference, handling tasks such as graph data transformation, feature engineering, GNN model training and evaluation, parameter optimization, and architecture search. Accordingly, the graph algorithm component module includes a graph data transformation component, a feature engineering component, a graph algorithm operator component, a GNN model training and evaluation component, a parameter optimization component, and an architecture search component. Each component is designed and implemented with its own specific logic, and includes both implementations of graph machine learning frameworks and libraries, such as TensorFlow and PyTorch (DGL, PyG), and components for graph data loading, graph data processing, graph feature extraction, graph sampling, graph convolution, and graph analysis algorithms. These components help developers quickly build and train machine learning models, and during deployment they are responsible for packaging the trained models into executable files for subsequent deployment.
4. Graph training process construction module
This module manages the series of processing steps in graph machine learning model training as a flow, including data preprocessing, feature engineering, model training, and model evaluation. These steps, composed of multiple parallel and serial stages, are built and managed with automation tools and organized in Pipeline form into a complete, repeatable data processing flow, improving the efficiency and accuracy of model training.
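The flow described above can be sketched as a toy Pipeline that chains the named stages as plain functions. The class name, stage names, and payloads are illustrative, not the module's actual interface.

```python
# Toy Pipeline: named stages run in order, each consuming the previous
# stage's output, mirroring preprocessing -> features -> training.
class Pipeline:
    def __init__(self):
        self.steps = []

    def add(self, name, fn):
        self.steps.append((name, fn))
        return self  # allow fluent chaining

    def run(self, data):
        for name, fn in self.steps:
            data = fn(data)
        return data

result = (Pipeline()
          .add("preprocess", lambda d: [x for x in d if x is not None])
          .add("feature_engineering", lambda d: [x * 2 for x in d])
          .add("train", lambda d: sum(d) / len(d))
          .run([1, None, 2, 3]))
```

Because stages are explicit and ordered, the same flow can be re-run unchanged, which is the repeatability the module aims for.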
5. Graph model deployment module
This module deploys the trained graph algorithm model to a remote server and provides online inference for the model along with the capacity to handle large-scale parallel requests. It deploys the trained model into the production environment to serve business systems or end users, providing model deployment, model monitoring, configuration management, and model inference services, effectively reducing operation and maintenance costs for developers and enterprises and improving business efficiency. Model services are typically implemented as Web APIs or other Web services that provide real-time prediction or recommendation functionality to support enterprise production and business applications.
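As a sketch of the inference-service idea, the snippet below shows only the request-handling core with a stub model; a real deployment would wrap this handler in an HTTP framework. The function names and JSON shape are our assumptions, not the module's API.

```python
# Core of a model inference handler: decode a JSON request, run the model,
# and encode a JSON response, with a simple error path for bad requests.
import json

def make_handler(model):
    def handle(request_body: str) -> str:
        try:
            payload = json.loads(request_body)
            score = model(payload["features"])
            return json.dumps({"status": "ok", "score": score})
        except (KeyError, ValueError):
            return json.dumps({"status": "error", "message": "bad request"})
    return handle

stub_model = lambda feats: sum(feats)  # stand-in for a trained graph model
handler = make_handler(stub_model)
resp = json.loads(handler('{"features": [0.5, 1.5]}'))
```

Keeping the handler pure (string in, string out) is what makes it easy to put behind any Web API and to monitor, as the module describes.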
The graph computing platform based on cloud native technology can also include various domain applications: application systems incubated for industry domains that use the platform's graph computing algorithm libraries, models, algorithm components, computation engines, and visual development factory, such as intelligent computing products that apply graph algorithms to data in pharmaceuticals, biology, astronomy, and other fields; each domain application is deployed with K8s containerization technology.
It will be appreciated by those skilled in the art that the foregoing describes preferred embodiments of the application and is not intended to limit it; those skilled in the art may still modify the technical solutions described in the foregoing embodiments or substitute equivalents for some of their features. Any modifications, equivalents, and improvements falling within the spirit and principles of the application are intended to be included within its scope.

Claims (10)

1. The graph computing platform based on the cloud native technology is characterized by comprising a software-hardware collaboration system, a graph storage system, a graph computation engine, and a graph development factory system, all designed with a cloud native architecture; the software-hardware collaboration system, the graph storage system, and the graph computation engine run at the back end, and the graph development factory runs at the front end;
the hardware and software cooperation system provides hardware computing resources and a hardware and software adaptation environment, wherein the hardware computing resources comprise various central processors, data processors, FPGA accelerators and graph computing chips; the software and hardware adaptation environment comprises a data flow acceleration library and an operator acceleration library which accelerate a graph calculation algorithm based on heterogeneous hardware resources;
the graph storage system comprises a graph dividing module, a distributed persistent storage module and a distributed shared memory pool module, wherein the graph dividing module is internally provided with a plurality of graph partitioning algorithms; the distributed persistent storage module is used for carrying out disk-drop storage of the graph data through the distributed file system and providing interfaces and services for reading and writing the graph data; the distributed shared memory pool module is used for managing the global memory space of the host memory and the GPU video memory on a plurality of computing nodes based on the remote direct data access technology, so as to realize the distributed memory sharing in the cluster;
the graph computation engine comprises a graph query engine, a graph analysis engine, and a graph learning engine, used respectively for direct querying of graph-associated data, distributed graph analysis, and training of parallel graph learning models;
the graph development factory system is a visual operating system for the graph computing field built on the graph computation engine, is deployed with K8s containerization technology, and is used to make the whole development process of graph computing algorithms modular, procedural, and visual.
2. The graph computing platform based on the cloud native technology according to claim 1, wherein the software-hardware collaboration system supports extensible hardware devices including GPU processors, CPU processors, FPGA accelerator cards, and compute-in-memory chips, develops corresponding compiling environments and software tools, provides compute-in-memory operators for graph data and operator acceleration modules adapted to specific hardware, and provides heterogeneous computing capability for the graph computation engine.
3. The cloud native technology-based graph computing platform of claim 1, wherein the distributed persistent storage module is implemented based on NVMe SSD technology; the distributed shared memory pool module is realized based on RDMA technology, and performs data synchronization based on the distributed persistent storage module and the distributed file system.
4. The graph computing platform based on cloud native technology of claim 1, wherein the direct query of graph association data in the graph query engine comprises a single point multi-level neighbor query, an association path query between two points, and a sub-graph query that obtains an association between multiple points.
5. The graph computing platform based on the cloud native technology according to claim 1, wherein the graph query engine comprises a graph query parser and a graph query executor; the graph query parser is used to parse query statements input by a user, build a query syntax tree, and generate an execution plan; the graph query executor is used to optimize the syntax tree produced by the graph query parser, then schedule resources and plan execution according to resource quotas, and use distributed parallel computing technology to process large-scale graph data in the graph storage system while minimizing running time and resource consumption.
6. The graph computing platform based on cloud native technology of claim 1, wherein the graph analysis engine comprises a graph analysis algorithm library and a graph analysis computing module, the graph analysis algorithm library containing efficient data structures for graph data processing and various general functions and class libraries, providing a programmable tool set for a user, allowing the user to implement and test the graph analysis algorithm, and altering and adjusting parameters of the graph analysis algorithm, thereby optimizing and improving the performance of the graph analysis algorithm;
the graph analysis and calculation module operates on a distributed system and comprises a series of bottom operators and data processing logic capable of processing graph structures, supports communication and cooperation among multiple nodes, and combines calculation capability provided by hardware calculation resources to realize distributed calculation and parallel calculation of a graph analysis algorithm.
7. The cloud native technology based graph computing platform of claim 1, wherein the graph learning engine comprises a graph learning algorithm library and a graph learning training module, the graph learning algorithm library being a software toolkit for building and training graph neural network models, including various graph datasets and pre-trained graph neural network models; the graph learning training module is used to realize, in turn, feature extraction and representation, subgraph sampling, model construction and selection, training strategy and parameter adjustment, and model evaluation and tuning.
8. The cloud native technology-based graph computing platform of claim 1, wherein the graph development factory system comprises a graph data modeling module, a graph data visualization module, a graph algorithm component module, a graph training flow construction module, and a graph model deployment module;
the graph data modeling module is used to define a standard storage structure for graph data, abstract real-world things into nodes and edges in a graph, and describe the relationships and features between things using the graph's structure and attributes;
the graph data visualization module is used to visualize the result data produced by graph query, graph analysis, and graph learning;
the graph algorithm component module is used to implement the model code and related algorithms for model training and inference, and comprises a graph data transformation component, a feature engineering component, a graph algorithm operator component, a GNN model training and evaluation component, a parameter optimization component, and an architecture search component;
the graph training flow construction module uses automation tools to build and manage the processing steps of data preprocessing, feature engineering, model training, and model evaluation in Pipeline form, forming a complete, repeatable data processing flow;
the graph model deployment module is used to deploy the trained graph algorithm model to a remote server, realize model deployment, model monitoring, configuration management, and model inference services, and provide support for business systems or end users.
9. The cloud-native technology-based graph computing platform of claim 8, wherein the manner of visualization comprises: adjusting node coloring and size, edge drawing and coloring, clustering and subgraphs, and interactive controls.
10. The graph computing platform based on cloud native technology according to claim 9, wherein visualization by clustering and subgraphs comprises: dividing nodes into different groups according to the similarity between nodes and presenting the groups as subgraphs.
CN202311283918.7A 2023-10-07 2023-10-07 Graph computing platform based on cloud native technology Active CN117009038B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311283918.7A CN117009038B (en) 2023-10-07 2023-10-07 Graph computing platform based on cloud native technology

Publications (2)

Publication Number Publication Date
CN117009038A true CN117009038A (en) 2023-11-07
CN117009038B CN117009038B (en) 2024-02-13

Family

ID=88567628


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117743470A (en) * 2024-02-06 2024-03-22 中科云谷科技有限公司 Processing system for heterogeneous big data
CN117743470B (en) * 2024-02-06 2024-05-07 中科云谷科技有限公司 Processing system for heterogeneous big data

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112148566A (en) * 2020-11-09 2020-12-29 中国平安人寿保险股份有限公司 Monitoring method and device of computing engine, electronic equipment and storage medium
CN112559128A (en) * 2020-12-15 2021-03-26 跬云(上海)信息科技有限公司 Apache Kylin hosting system and method based on cloud computing
CN112866333A (en) * 2020-12-28 2021-05-28 上海领健信息技术有限公司 Cloud-native-based micro-service scene optimization method, system, device and medium
US20210374143A1 (en) * 2020-05-29 2021-12-02 Rn Technologies, Llc Real-time processing of a data stream using a graph-based data model
CN115543614A (en) * 2022-09-29 2022-12-30 上海商汤科技开发有限公司 Model training method, device, system, electronic equipment and storage medium
US11681523B1 (en) * 2022-01-31 2023-06-20 Sap Se Metadata model and use thereof for cloud native software systems
CN116307757A (en) * 2023-01-18 2023-06-23 辽宁荣科智维云科技有限公司 Intelligent data interaction method, interaction system, computer equipment and application
CN116431726A (en) * 2022-01-04 2023-07-14 中移(苏州)软件技术有限公司 Graph data processing method, device, equipment and computer storage medium


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WANG Ping; LIU Ling: "Research on a Learning Recommendation System Based on the PaaS Cloud Model", China Education Informatization, no. 03 *




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant