CN115248826A - Method and system for large-scale distributed graph database cluster operation and maintenance management - Google Patents
- Publication number
- CN115248826A (application CN202211148001.1A)
- Authority
- CN
- China
- Prior art keywords
- monitoring
- graph database
- control plane
- database cluster
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/25—Integrating or interfacing systems involving database management systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/242—Query formulation
- G06F16/2433—Query languages
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/27—Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The application relates to a method and a system for operation and maintenance management of a large-scale distributed graph database cluster. The method comprises: constructing a control plane, importing a distributed graph database cluster into the control plane, and connecting the control plane, through ssh connection information, to the distributed graph database cluster nodes of the corresponding resource plane; acquiring monitoring index data of the distributed graph database cluster through a Nebula proxy service component on each corresponding node, and reporting the monitoring index data to a prometheus component of the control plane for graph data service monitoring; and sending a prometheus query language statement to the prometheus component so that the monitoring data is displayed and rendered on a monitoring display page of the control plane. The method and the system solve the problem of inefficient operation and maintenance management of large-scale distributed graph database clusters and improve operation and maintenance efficiency.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method and a system for operation and maintenance management of a large-scale distributed graph database cluster.
Background
With the wide application of graph databases in fields such as social networks, financial risk control, and knowledge graphs, more and more enterprises choose graph databases to store user data. Distributed graph databases are the preferred choice for responding quickly to traffic surges and drops and for reducing the overall cost of using the system. Because a distributed graph database architecture separates storage from computation, computing and storage resources can each be scaled out or scaled in online as required to cope with various service scenarios. However, as data volume grows, a distributed graph database cluster often contains more nodes and services, and complex operation and maintenance operations, such as cluster scale-out and scale-in or version upgrades, carry a risk of error during execution. Therefore, how to efficiently manage large-scale distributed graph database clusters is a difficult problem.
Disclosure of Invention
The embodiment of the application provides a method and a system for operation and maintenance management of a large-scale distributed graph database cluster, which at least solve the problem of low efficiency of operation and maintenance management of the large-scale distributed graph database cluster in the related technology.
In a first aspect, an embodiment of the present application provides a method for operation and maintenance management of a large-scale distributed graph database cluster, where the method includes:
constructing a control plane, importing a distributed graph database cluster into the control plane, and connecting the control plane to a distributed graph database cluster node corresponding to a resource plane through ssh connection information;
acquiring monitoring index data of the distributed graph database cluster through Nebula proxy service components on the corresponding nodes, and reporting the monitoring index data to a prometheus component of the control plane for graph data service monitoring;
and sending a prometheus query language statement to a prometheus component to enable the monitoring data to be displayed and rendered on a monitoring display page of the control plane.
In some embodiments, the obtaining, by a Nebula proxy service component on a corresponding node, monitoring indicator data of a distributed graph database cluster, and the reporting to a prometheus component of the control plane for graph data service monitoring includes:
the Nebula proxy service component acquires monitoring index data of the graph database by periodically sending http requests to each graph database service on the corresponding node, and labels the data according to the structure IP-port-component;
the control plane configures the Nebula proxy service component as a collection target of the prometheus component, and prometheus periodically pulls the monitoring index data collected for each node from the Nebula proxy service component, then aggregates and stores the data, wherein the label is used for distinguishing different nodes and services.
In some of these embodiments, in monitoring a distributed graph database cluster through a monitoring display page, the method includes:
when the distributed graph database cluster fails, issuing batch start-stop operation and maintenance instructions to the Nebula proxy service component through the control plane.
In some of these embodiments, in monitoring the distributed graph database cluster through the monitoring display page, the method further comprises:
when the cluster load is high or service traffic surges, issuing Execute instructions in batch, through the task interface, to the Nebula proxy service components on the nodes, adding new node resources to expand the cluster, and evenly distributing the shards in the graph database space to the new nodes through a balance graph data instruction, so as to share the access pressure among the nodes of the distributed graph database cluster;
and after the traffic peak has passed, when a plurality of nodes have been idle for a long time, issuing Execute instructions to the idle nodes in batch to scale the cluster in.
In some embodiments, when instructions are issued in batch, if an instruction fails to execute, a rollback operation is performed through a Rollback instruction to return to the previous step.
In some embodiments, sending a prometheus query language statement to a prometheus component to cause the monitoring data to be displayed and rendered on a monitoring display page of the control plane comprises:
obtaining graph space job management data of the graph database cluster, displaying it on the monitoring page, performing remote execution and viewing information through the NebulaGraph query language, and providing interfaces for stopping and resuming graph space jobs in the graph database cluster.
In a second aspect, an embodiment of the present application provides a system for operation and maintenance management of a large-scale distributed graph database cluster, where the system includes:
the communication module is used for constructing a control plane, importing a distributed graph database cluster into the control plane, and connecting the control plane to a distributed graph database cluster node corresponding to a resource plane through ssh connection information;
a monitoring display module used for acquiring monitoring index data of the distributed graph database cluster through the Nebula proxy service component on the corresponding node and reporting the monitoring index data to the prometheus component of the control plane for graph data service monitoring,
and sending a prometheus query language statement to a prometheus component to display and render the monitoring data on a monitoring display page of the control plane.
In some embodiments, the monitoring display module is further configured such that the Nebula proxy service component collects monitoring index data of the graph database by periodically sending http requests to each graph database service on the corresponding node and labels the monitoring index data according to the structure IP-port-component,
the control plane configures the Nebula proxy service component as a collection target of the prometheus component, and prometheus periodically pulls the monitoring index data collected for each node from the Nebula proxy service component, then aggregates and stores the data, wherein the label is used for distinguishing different nodes and services.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the computer program, implements the method for operation and maintenance management of a large-scale distributed graph database cluster as described in the first aspect.
In a fourth aspect, embodiments of the present application provide a storage medium, on which a computer program is stored, where the program, when executed by a processor, implements the method for operation and maintenance management of a large-scale distributed graph database cluster as described in the first aspect above.
Compared with the related technology, the method for operation and maintenance management of a large-scale distributed graph database cluster provided by the embodiment of the application comprises: constructing a control plane, importing the distributed graph database cluster into the control plane, and connecting the control plane, through ssh connection information, to the distributed graph database cluster nodes of the corresponding resource plane; acquiring monitoring index data of the distributed graph database cluster through a Nebula proxy service component on each corresponding node, and reporting the monitoring index data to a prometheus component of the control plane for graph data service monitoring; and sending a prometheus query language statement to the prometheus component so that the monitoring data is displayed and rendered on a monitoring display page of the control plane.
The whole operation and maintenance management system is abstracted into a control plane and a resource plane, and the control plane is mainly responsible for service monitoring and alarming of the whole cluster and batch issuing of operation and maintenance instructions; the resource plane takes the server node as a unit and is mainly responsible for operating the graph database service, collecting the monitoring indexes corresponding to the graph database and responding to the operation and maintenance instructions issued by the control plane. By the method for separating the control plane from the resource plane, the complexity of system operation and maintenance management can be effectively reduced, the problem of low efficiency of operation and maintenance management of a large-scale distributed graph database cluster is solved, and the operation efficiency is improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow diagram of a method for large scale distributed graph database cluster operation and maintenance management according to an embodiment of the present application;
FIG. 2 is a schematic flow diagram of operation and maintenance management of a large-scale distributed graph database cluster according to an embodiment of the present application;
FIG. 3 is a block diagram of a system for large scale distributed graph database cluster operation and maintenance management according to an embodiment of the present application;
FIG. 4 is an internal structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clearly understood, the present application is described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms used herein have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. The words "a," "an," "the," and similar words in this application do not denote a limitation of quantity and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof in this application are intended to cover non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. The words "connected," "coupled," and the like in this application are not limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. "A plurality" herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular ordering of the objects.
Fig. 1 is a flowchart of a method for operation and maintenance management of a large-scale distributed graph database cluster according to an embodiment of the present application, and as shown in fig. 1, the flowchart includes the following steps:
Step S101: a control plane is constructed, a distributed graph database cluster is imported into the control plane, and the control plane is connected, through ssh connection information, to the distributed graph database cluster nodes of the corresponding resource plane.
First, the control plane is constructed, which specifically comprises: deploying a NebulaGraph control panel component, which provides the operation and maintenance UI that operation and maintenance personnel use in a browser; deploying a prometheus component for persistent monitoring of the graph data service; and deploying an alertmanager component (namely an alarm module) for sending alarm notifications to operation and maintenance personnel when the cluster has a problem.
Fig. 2 is a schematic flow diagram of operation and maintenance management of a large-scale distributed graph database cluster according to an embodiment of the present application. As shown in Fig. 2, a distributed graph database cluster is imported into the control plane, and the control plane connects to the distributed graph database cluster nodes of the corresponding resource plane through ssh connection information, that is, port 22 information. The control plane then automatically deploys a Nebula proxy service component on each node of the resource plane; once started, the Nebula proxy service component establishes communication with the corresponding graph database cluster and joins the cluster. It should be noted that the Nebula proxy service component is a very simple stateless service. It is mainly responsible for collecting the various indexes of the graph database services and reporting them to prometheus, and for communicating with the NebulaGraph control panel of the control plane to receive and execute operation and maintenance instructions; it is minimally intrusive to the cluster. In addition, it resides in binary form in the installation directory of each server, and when the cluster service is started it can provide an operation and maintenance instruction interface to the NebulaGraph control panel through RPC (Remote Procedure Call).
Taking a NebulaGraph cluster as an example, the Nebula proxy service component obtains the topology information of the NebulaGraph cluster by communicating with the metad service of NebulaGraph. After acquiring the topology information, the Nebula proxy service component reports it to the NebulaGraph control panel of the control plane, which completes the establishment of communication between the Nebula proxy service component and the NebulaGraph control panel.
It should be noted that the resource plane refers to a plane formed by server nodes where distributed graph databases are located.
Step S102: acquiring monitoring index data of the distributed graph database cluster through the Nebula proxy service component on each corresponding node, and reporting the monitoring index data to the prometheus component of the control plane for graph data service monitoring; and sending a prometheus query language statement to the prometheus component so that the monitoring data is displayed and rendered on a monitoring display page of the control plane.
Preferably, in this embodiment, the Nebula proxy service component acquires the monitoring index data of the graph database by periodically sending http requests to each graph database service on the corresponding cluster node, and labels the monitoring index data according to the structure IP-port-component. For each registered Nebula proxy service component, the control plane configures the Nebula proxy service component as a collection target of the prometheus component, and prometheus periodically pulls the monitoring index data collected for each node from the Nebula proxy service component, then aggregates and stores it, wherein the label is used for distinguishing different nodes and services.
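By way of illustration only, the labelling and target registration described above may be sketched as follows. The function names, the sample metric name, the `job` value, and the proxy port are hypothetical and do not come from the application; the target document assumes prometheus file-based service discovery, which is one possible way for the control plane to configure collection targets.

```python
import json


def make_label(ip: str, port: int, component: str) -> str:
    """Build a label with the IP-port-component structure."""
    return f"{ip}-{port}-{component}"


def tag_samples(samples: dict[str, float], ip: str, port: int, component: str) -> list[dict]:
    """Attach the label to every sample collected from one graph database service."""
    label = make_label(ip, port, component)
    return [
        {"name": name, "value": value, "label": label}
        for name, value in sorted(samples.items())
    ]


def collection_targets(proxy_addresses: list[str]) -> str:
    """Render the registered proxies as one file-based discovery target group.

    The "job" label value is a hypothetical name chosen for this sketch.
    """
    group = {"targets": sorted(proxy_addresses), "labels": {"job": "nebula-proxy"}}
    return json.dumps([group], indent=2)
```

Because the label encodes the node address, the port, and the component, two services on the same node, or the same service on two nodes, never share a label.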
Finally, on the monitoring page of the NebulaGraph control panel, a prometheus query language statement, namely a PromQL statement, is sent to prometheus, so that the monitoring data is displayed and rendered on the monitoring display page of the control plane for operation and maintenance personnel to view. Operation and maintenance personnel can then check the running state of each graph database service through the monitoring page of the NebulaGraph control panel, see at a glance the health and load of the whole cluster, and respond quickly when a service becomes abnormal.
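As a minimal sketch of how a monitoring page might address prometheus, the following builds an instant-query URL for the prometheus HTTP API endpoint `/api/v1/query`. The metric name, the label key `label`, and the base URL are assumptions made for illustration; the application does not specify them, and the request itself is not sent here.

```python
from urllib.parse import urlencode


def promql_for_service(metric: str, label_value: str) -> str:
    """Build a PromQL selector for one service, filtered by its IP-port-component label.

    The label key "label" is a hypothetical choice for this sketch.
    """
    return f'{metric}{{label="{label_value}"}}'


def build_query_url(base_url: str, promql: str) -> str:
    """Build the instant-query URL; the PromQL statement is URL-encoded."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"
```

A page could, for example, call `build_query_url("http://localhost:9090", promql_for_service("num_queries", "10.0.0.5-9669-graphd"))` and render the returned time series.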
In some embodiments, jobs of the graph database cluster may be managed through the monitoring page of the NebulaGraph control panel. Specifically: graph space job management data (namely Job management data) of the graph database cluster is obtained and displayed on the monitoring page, jobs are executed remotely and their information is viewed through the NebulaGraph query language (namely nGQL), and interfaces are provided for stopping and resuming graph space jobs in the graph database cluster. Taking NebulaGraph as an example, long-running operation and maintenance tasks on the storage service (Storage) are called graph space jobs, such as COMPACT (compacting stored data), FLUSH (persisting data to disk), STATS (collecting graph data statistics), and BALANCE (balancing graph data). As the volume of service data grows, running such tasks is often time-consuming. Therefore, in this embodiment, the NebulaGraph control panel in the control plane periodically checks all currently running graph space job instances and displays them on the monitoring page, and it provides interfaces for remotely executing jobs, viewing their details by executing NebulaGraph query language statements, and stopping and resuming graph space jobs in the graph database.
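For illustration, the job management interfaces described above may be sketched as helpers that build NebulaGraph query language statements. The helper names are hypothetical, and the statement forms should be checked against the NebulaGraph version in use. BALANCE is left out of the sketch because its statement form differs between versions.

```python
# Job types covered by this sketch; BALANCE is intentionally omitted.
JOB_TYPES = {"COMPACT", "FLUSH", "STATS"}


def submit_job(job_type: str) -> str:
    """Build the statement that submits a graph space job."""
    job = job_type.upper()
    if job not in JOB_TYPES:
        raise ValueError(f"unsupported job type: {job_type}")
    return f"SUBMIT JOB {job};"


def show_jobs() -> str:
    """Build the statement that lists the jobs of the current graph space."""
    return "SHOW JOBS;"


def show_job(job_id: int) -> str:
    """Build the statement that shows the details of one job."""
    return f"SHOW JOB {job_id};"


def stop_job(job_id: int) -> str:
    """Build the statement that stops one job."""
    return f"STOP JOB {job_id};"


def recover_jobs() -> str:
    """Build the statement that resumes failed or stopped jobs."""
    return "RECOVER JOB;"
```

The control panel would execute these statements remotely after selecting the target graph space.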
Through the steps S101 to S102, the present embodiment abstracts the entire operation and maintenance management system into two parts, namely a control plane and a resource plane, where the control plane is mainly responsible for service monitoring and alarm of the entire cluster and batch issuing of operation and maintenance instructions; the resource plane takes the server node as a unit and is mainly responsible for operating the graph database service, collecting the monitoring indexes corresponding to the graph database and responding to the operation and maintenance instructions issued by the control plane. By the method for separating the control plane from the resource plane, the complexity of system operation and maintenance management can be effectively reduced, the problem of low efficiency of operation and maintenance management of a large-scale distributed graph database cluster is solved, and the operation efficiency is improved.
In some embodiments, while the distributed graph database cluster is monitored through the monitoring display page, if the cluster fails, batch start-stop operation and maintenance instructions are issued to the Nebula proxy service component through the control plane. For example, when an accident occurs in a resource plane machine room, or when a sudden traffic surge brings some services down, many services are stopped; in that case all abnormally stopped machines can be selected and start or stop instructions can be sent to the Nebula proxy service components in batch.
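The batch selection described above may be sketched as follows, assuming the monitoring data reduces to a state string per (node, service) pair. The `Instruction` type, the state names, and the service names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instruction:
    """One start or stop instruction addressed to the proxy on a node."""
    node: str
    service: str
    action: str  # "start" or "stop"


def batch_start_instructions(status: dict[tuple[str, str], str]) -> list[Instruction]:
    """Select every abnormally stopped service and build one start instruction for each."""
    return [
        Instruction(node, service, "start")
        for (node, service), state in sorted(status.items())
        if state == "stopped"
    ]
```

The resulting list can then be issued to the proxies in a single batch rather than service by service.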
In some embodiments, while the distributed graph database cluster is monitored through the monitoring display page, if the cluster load is high or service traffic surges, Execute instructions are issued in batch, through the task interface, to the Nebula proxy service components on the nodes, new node resources are added to expand the cluster, and the shards in the graph database space are evenly distributed to the new nodes through a balance graph data instruction, so as to share the access pressure among the nodes of the distributed graph database cluster. After the traffic peak has passed, when a plurality of nodes have been idle for a long time, Execute instructions are issued to the idle nodes in batch to scale the cluster in.
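A minimal sketch of the scaling decision and of an even shard placement follows. The thresholds, the load scale of 0 to 1, and the round-robin placement are assumptions made for illustration; the application does not specify how load is measured or how the balance instruction chooses a placement, and the sketch ignores how long a node has been idle.

```python
def plan_scaling(loads: dict[str, float], high: float = 0.8, idle: float = 0.1) -> dict:
    """Decide whether to scale out, scale in, or do nothing from per-node load."""
    if not loads:
        raise ValueError("no nodes to evaluate")
    average = sum(loads.values()) / len(loads)
    if average >= high:
        return {"action": "scale_out", "nodes": []}
    idle_nodes = sorted(node for node, load in loads.items() if load <= idle)
    if idle_nodes:
        return {"action": "scale_in", "nodes": idle_nodes}
    return {"action": "none", "nodes": []}


def distribute_shards(shard_count: int, nodes: list[str]) -> dict[str, list[int]]:
    """Place shards 1..shard_count on the nodes round-robin so counts differ by at most one."""
    if not nodes:
        raise ValueError("no nodes available")
    placement: dict[str, list[int]] = {node: [] for node in nodes}
    for shard in range(1, shard_count + 1):
        placement[nodes[(shard - 1) % len(nodes)]].append(shard)
    return placement
```

After a scale-out, calling `distribute_shards` with the enlarged node list shows the target state in which every node, including the new ones, holds an equal share of the shards.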
It should be noted that Execute is a method of the task class in this system's workflow for executing operation and maintenance instructions in batch. It is mainly used to execute the corresponding graph database operation and maintenance instructions on the corresponding resource nodes, for example batch scale-out and scale-in.
In some embodiments, when instructions are issued in batch, if an instruction fails to execute, a rollback operation is performed through a Rollback instruction to return to the previous step. In this embodiment, Rollback is another method of the task class in this system's workflow for executing operation and maintenance instructions in batch. It is mainly used to roll back after a series of operation and maintenance instructions fails to execute, and when an error occurs the rollback can be performed according to a specified instruction. For example, if an installation package fails to download because of a network abnormality while the graph database service is being scaled out, a rollback flow can be triggered, and Rollback cleans up the partially downloaded installation packages to ensure the atomicity of the operation.
It should be noted that all the instructions issued together in one batch make up a single batch-execution workflow.
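By way of example, a workflow built from tasks that each offer an execute and a rollback method may be sketched as follows. The class and function names are hypothetical, and the rollback policy shown here (undo the failed task and every earlier task in reverse order) is one assumed reading of the description, not necessarily the policy of the actual system.

```python
from typing import Callable


class Task:
    """One operation and maintenance step with an execute and a rollback action."""

    def __init__(self, name: str, execute: Callable[[], None], rollback: Callable[[], None]):
        self.name = name
        self._execute = execute
        self._rollback = rollback

    def execute(self) -> None:
        self._execute()

    def rollback(self) -> None:
        self._rollback()


def run_workflow(tasks: list[Task]) -> bool:
    """Run tasks in order; on failure roll back every started task in reverse order."""
    started: list[Task] = []
    for task in tasks:
        # Record the task before running it so a partial failure is also cleaned up.
        started.append(task)
        try:
            task.execute()
        except Exception:
            for done in reversed(started):
                done.rollback()
            return False
    return True
```

Rolling back the failed task itself covers the case described above, in which a partially downloaded installation package must be cleaned up.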
According to the embodiment, each operation and maintenance instruction is distributed in batches in a workflow manner, the instruction issuing efficiency is effectively improved, a rollback mechanism is provided for each high-risk operation, and the fault tolerance of each operation and maintenance operation is improved. Therefore, the efficiency can be remarkably improved under the scene of simultaneously managing a plurality of large-scale graph databases.
It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order different than presented herein.
The present embodiment further provides a system for large-scale distributed graph database cluster operation and maintenance management, which is used to implement the foregoing embodiments and preferred embodiments; descriptions already given are not repeated. As used hereinafter, the terms "module," "unit," "subunit," and the like may refer to a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware or a combination of software and hardware is also possible and contemplated.
Fig. 3 is a block diagram of a system for operation and maintenance management of a large-scale distributed graph database cluster according to an embodiment of the present application, and as shown in fig. 3, the system includes a communication module 31 and a monitoring display module 32:
the communication module 31 is used for constructing a control plane, importing a distributed graph database cluster into the control plane, and connecting the control plane to a distributed graph database cluster node corresponding to a resource plane through ssh connection information; and the monitoring display module 32 is configured to acquire monitoring index data of the distributed graph database cluster through the Nebula proxy service component on the corresponding node, report the monitoring index data to the prometheus component of the control plane for graph data service monitoring, and send a prometheus query language statement to the prometheus component so that the monitoring data is displayed and rendered on a monitoring display page of the control plane.
With this system, operation and maintenance personnel do not need to manually perform each step for each service; they only need to select the corresponding machines and services on the control plane through the UI page. The system automatically splits all the steps into individual tasks according to the workflow, organizes the tasks as a whole, and sends them to the Nebula proxy service component, which then executes the preset operation and maintenance instructions on the corresponding resource plane. Operations such as scaling a graph database cluster out or in can be completed with one click, the resource usage after the operation can be checked quickly through the monitoring service, and operation and maintenance management can be carried out on the whole cluster, which improves efficiency.
It should be noted that, for specific examples in this embodiment, reference may be made to the examples described in the foregoing embodiment and optional implementation manners, and details of this embodiment are not described herein again.
Note that each of the modules may be a functional module or a program module, and may be implemented by software or hardware. For a module implemented by hardware, the above modules may be located in the same processor; or the modules can be respectively positioned in different processors in any combination.
The present embodiment also provides an electronic device, comprising a memory having a computer program stored therein and a processor configured to run the computer program to perform the steps of any of the method embodiments described above.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
In addition, in combination with the method for operation and maintenance management of a large-scale distributed graph database cluster in the foregoing embodiments, an embodiment of the present application may provide a storage medium for implementing the method. The storage medium has a computer program stored thereon; the computer program, when executed by a processor, implements any of the methods for large-scale distributed graph database cluster operation and maintenance management in the above embodiments.
In one embodiment, a computer device is provided, which may be a terminal. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a method for large-scale distributed graph database cluster operation and maintenance management. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, a key, a trackball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad, mouse or the like.
In one embodiment, fig. 4 is a schematic diagram of the internal structure of an electronic device according to an embodiment of the present application. As shown in fig. 4, an electronic device is provided, which may be a server. The electronic device comprises a processor, a network interface, an internal memory and a non-volatile memory connected by an internal bus, wherein the non-volatile memory stores an operating system, a computer program and a database. The processor provides computing and control capabilities; the network interface communicates with an external terminal through a network connection; the internal memory provides an environment for running the operating system and the computer program; the computer program, when executed by the processor, implements a method for large-scale distributed graph database cluster operation and maintenance management; and the database is used for storing data.
Those skilled in the art will appreciate that the configuration shown in fig. 4 is a block diagram of only the portion of the configuration relevant to the present application and does not limit the electronic device to which the present application is applied; a particular electronic device may include more or fewer components than those shown in the drawing, combine certain components, or have a different arrangement of components.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It should be understood by those skilled in the art that various features of the above-described embodiments can be combined in any combination, and for the sake of brevity, all possible combinations of features in the above-described embodiments are not described in detail, but rather, all combinations of features which are not inconsistent with each other should be construed as being within the scope of the present disclosure.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the scope of protection of the present application. Therefore, the scope of protection of this patent shall be subject to the appended claims.
Claims (10)
1. A method for large-scale distributed graph database cluster operation and maintenance management, the method comprising:
constructing a control plane, importing a distributed graph database cluster into the control plane, and connecting the control plane to a distributed graph database cluster node corresponding to a resource plane through ssh connection information;
acquiring monitoring index data of a distributed graph database cluster through a Nebula proxy service component on a corresponding node, reporting to a prometheus component of the control plane for graph data service monitoring, and sending a prometheus query language statement to the prometheus component to display and render the monitoring data on a monitoring display page of the control plane.
2. The method according to claim 1, wherein the acquiring of the monitoring index data of the distributed graph database cluster by the Nebula proxy service component on the corresponding node, and the reporting of the prometheus component to the control plane for graph data service monitoring comprises:
the Nebula proxy service component acquires monitoring index data of the graph database by periodically sending an http request to each graph database service on the corresponding node, and attaches a label structured as IP-port-component;
the control plane configures the Nebula proxy service component as a collection target of the prometheus component, and the prometheus component periodically acquires the collected monitoring index data of each node from the Nebula proxy service component, and aggregates and stores the data, wherein the label is used for distinguishing different nodes and services.
3. The method of claim 1, wherein in monitoring a distributed graph database cluster through a monitoring display page, the method comprises:
when the distributed graph database cluster fails, issuing batch start-stop operation and maintenance instructions to the Nebula proxy service component through the control plane.
4. The method of claim 1, wherein in monitoring a distributed graph database cluster through a monitoring display page, the method further comprises:
when the cluster load is high or the service traffic increases suddenly, issuing Execute instructions in batches through the task interface to the Nebula proxy service components on the nodes, adding new node resources to scale out the cluster, and evenly distributing the shards in the graph database space to the new nodes through a balance graph data instruction, so as to share the access pressure among the nodes of the distributed graph database cluster;
and after the traffic peak has passed, when a plurality of nodes remain idle for a long time, issuing Execute instructions to the idle nodes in batches to scale in the cluster.
5. The method according to claim 3 or 4,
wherein, when the instructions are issued in batches, if execution of an instruction fails, a rollback operation is performed through a Rollback instruction to return to the previous operation.
6. The method of claim 1, wherein sending a prometheus query language statement to a prometheus component to cause display and rendering of monitoring data on a monitoring display page of the control plane comprises:
acquiring graph space operation management data of the graph database cluster and displaying the data on the monitoring page, performing remote execution and information viewing on the graph space operation management data through the NebulaGraph query language, and stopping and recovering the relevant interfaces in the graph space operation management of the graph database cluster.
7. A system for large scale distributed graph database cluster operation and maintenance management, the system comprising:
the communication module is used for constructing a control plane, importing a distributed graph database cluster into the control plane, and connecting the control plane to a distributed graph database cluster node corresponding to a resource plane through ssh connection information;
a monitoring display module used for acquiring monitoring index data of the distributed graph database cluster through a Nebula proxy service component on the corresponding node and reporting the monitoring index data to a prometheus component of the control plane for graph data service monitoring,
and sending a prometheus query language statement to a prometheus component to display and render the monitoring data on a monitoring display page of the control plane.
8. The system of claim 7,
the monitoring display module is further used for acquiring monitoring index data of the graph database through the Nebula proxy service component, which periodically sends an http request to each graph database service on the corresponding node and attaches a label structured as IP-port-component,
the control plane configures the Nebula proxy service component as a collection target of the prometheus component, and the prometheus component periodically acquires the collected monitoring index data of each node from the Nebula proxy service component, and aggregates and stores the data, wherein the label is used for distinguishing different nodes and services.
9. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and the processor is configured to execute the computer program to perform the method for large-scale distributed graph database cluster operation and maintenance management according to any of claims 1 to 6.
10. A storage medium having stored thereon a computer program, wherein the computer program is arranged to execute the method for operation and maintenance management of a large scale distributed graph database cluster according to any of claims 1 to 6 when running.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211148001.1A CN115248826B (en) | 2022-09-21 | 2022-09-21 | Method and system for large-scale distributed graph database cluster operation and maintenance management |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115248826A (en) | 2022-10-28 |
CN115248826B (en) | 2023-04-11 |
Family
ID=83699443
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211148001.1A Active CN115248826B (en) | 2022-09-21 | 2022-09-21 | Method and system for large-scale distributed graph database cluster operation and maintenance management |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115248826B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116127149A (en) * | 2023-04-14 | 2023-05-16 | 杭州悦数科技有限公司 | Quantification method and system for health degree of graph database cluster |
CN116955674A (en) * | 2023-09-20 | 2023-10-27 | 杭州悦数科技有限公司 | Method and web device for generating graph database statement through LLM |
CN116992065A (en) * | 2023-09-26 | 2023-11-03 | 之江实验室 | Graph database data importing method, system, electronic equipment and medium |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111190888A (en) * | 2020-01-03 | 2020-05-22 | 中国建设银行股份有限公司 | Method and device for managing graph database cluster |
CN112202617A (en) * | 2020-10-09 | 2021-01-08 | 腾讯科技(深圳)有限公司 | Resource management system monitoring method and device, computer equipment and storage medium |
CN112395350A (en) * | 2020-11-17 | 2021-02-23 | 中国工商银行股份有限公司 | Method and device for visualizing monitoring data of multiple data sources |
US11222072B1 (en) * | 2015-07-17 | 2022-01-11 | EMC IP Holding Company LLC | Graph database management system and method for a distributed computing environment |
US20220067011A1 (en) * | 2020-08-31 | 2022-03-03 | Vesoft Inc. | Data processing method and system of a distributed graph database |
CN114528085A (en) * | 2022-02-21 | 2022-05-24 | 中国工商银行股份有限公司 | Resource scheduling method, device, computer equipment, storage medium and program product |
CN114924931A (en) * | 2022-04-24 | 2022-08-19 | 杭州悦数科技有限公司 | Method, system, device and medium for monitoring and maintaining graph database |
CN114924952A (en) * | 2022-04-28 | 2022-08-19 | 杭州悦数科技有限公司 | Method, system and medium for diagnosing health condition of distributed graph database black box |
CN115033722A (en) * | 2022-08-10 | 2022-09-09 | 杭州悦数科技有限公司 | Method, system, device and medium for accelerating data query of database |
Non-Patent Citations (2)
Title |
---|
PRASANNA BAGADE ET AL.: "Designing performance monitoring tool for NoSQL Cassandra distributed database", 《IEEE》 * |
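WANG MEI ET AL.: "Research on the technical architecture of a big data management platform based on distributed systems", 《Computer & Telecommunication》 (《电脑与电信》) * |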
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116127149A (en) * | 2023-04-14 | 2023-05-16 | 杭州悦数科技有限公司 | Quantification method and system for health degree of graph database cluster |
CN116955674A (en) * | 2023-09-20 | 2023-10-27 | 杭州悦数科技有限公司 | Method and web device for generating graph database statement through LLM |
CN116955674B (en) * | 2023-09-20 | 2024-01-09 | 杭州悦数科技有限公司 | Method and web device for generating graph database statement through LLM |
CN116992065A (en) * | 2023-09-26 | 2023-11-03 | 之江实验室 | Graph database data importing method, system, electronic equipment and medium |
CN116992065B (en) * | 2023-09-26 | 2024-01-12 | 之江实验室 | Graph database data importing method, system, electronic equipment and medium |
Also Published As
Publication number | Publication date |
---|---|
CN115248826B (en) | 2023-04-11 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||