CN114637656A

CN114637656A - Redis-based monitoring method, device, storage medium and equipment

Info

Publication number: CN114637656A
Application number: CN202210519248.3A
Authority: CN
Inventors: 陈实; 张益军; 王金明
Original assignee: Feihu Information Technology Tianjin Co Ltd
Current assignee: Feihu Information Technology Tianjin Co Ltd
Priority date: 2022-05-13
Filing date: 2022-05-13
Publication date: 2022-06-17
Anticipated expiration: 2042-05-13
Also published as: CN114637656B

Abstract

The application discloses a Redis-based monitoring method, device, storage medium and equipment. The method comprises: performing big data analysis on sample operation data of each instance in Redis to obtain an index set, together with the fault type and fault analysis corresponding to each index in the index set; after Redis starts running, acquiring the value of each index in the index set of each instance according to the monitoring period corresponding to that index; for each instance, identifying the instance as a target instance if its index set contains a problem index; and generating an alarm prompt based on the preset id of each target instance and the fault type and fault analysis corresponding to each problem index contained in that target instance, and sending the alarm prompt to a user. Compared with the prior art, instances no longer need to be troubleshot manually one by one, which effectively improves the efficiency of Redis troubleshooting.

Description

Redis-based monitoring method and device, storage medium and equipment

Technical Field

The present application relates to the field of databases, and in particular, to a method, an apparatus, a storage medium, and a device for monitoring based on Redis.

Background

Remote Dictionary Server (Redis) is a key-value storage system. It is a cross-platform non-relational database that can be used as a cache, a database and message middleware. Redis is widely used in enterprises, and enterprises encounter various problems while using it. When a service is found to be abnormal, the underlying Redis problems need to be located and eliminated.

Currently, when a service is abnormal, Redis instances are usually troubleshot manually. However, when Redis contains a large number of instances, manual troubleshooting is time-consuming. It is then difficult to finish troubleshooting every instance within a limited time.

Therefore, improving the efficiency of Redis troubleshooting is a problem that urgently needs to be solved in this field.

Disclosure of Invention

The application provides a Redis-based monitoring method, device, storage medium and equipment, aiming to improve the efficiency of Redis troubleshooting.

In order to achieve the above object, the present application provides the following technical solutions:

A Redis-based monitoring method, comprising:

performing big data analysis on sample operation data of each instance in Redis to obtain an index set, together with the fault type and fault analysis corresponding to each index in the index set;

configuring a monitoring period, a value range and an importance degree corresponding to each index;

after Redis starts running, calling a preset Info command according to the monitoring period corresponding to each index to acquire the value of each index in the index set of each instance;

for each instance, identifying the instance as a target instance if its index set contains a problem index, where a problem index is an index whose value falls outside the value range corresponding to that index and whose importance degree is greater than a preset value;

and generating an alarm prompt based on the preset id of each target instance and the fault type and fault analysis corresponding to each problem index contained in that target instance, and sending the alarm prompt to a user.
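The identification rule above can be sketched in Python. This is a minimal illustration under stated assumptions, not the application's implementation. The `IndexConfig` structure, the function names and the sample values are all hypothetical. A problem index is read literally: its value lies outside its configured range and its importance degree exceeds the preset value.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class IndexConfig:
    """Hypothetical per-index configuration: allowed value range and importance degree."""
    low: float
    high: float
    importance: int


def problem_indexes(values: Dict[str, float], configs: Dict[str, IndexConfig],
                    importance_threshold: int) -> List[str]:
    """Return indexes whose value is out of range AND whose importance exceeds the threshold."""
    problems = []
    for name, value in values.items():
        cfg = configs.get(name)
        if cfg is None:
            continue  # index has no configuration, so it is not monitored
        out_of_range = not (cfg.low <= value <= cfg.high)
        if out_of_range and cfg.importance > importance_threshold:
            problems.append(name)
    return problems


def target_instances(instance_values: Dict[str, Dict[str, float]],
                     configs: Dict[str, IndexConfig],
                     importance_threshold: int) -> Dict[str, List[str]]:
    """Map each instance id that has at least one problem index to its problem indexes."""
    targets = {}
    for instance_id, values in instance_values.items():
        found = problem_indexes(values, configs, importance_threshold)
        if found:
            targets[instance_id] = found
    return targets
```

Take a fragmentation range of 1.0 to 1.5 at importance 3 and a preset value of 2. An instance reporting 2.1 is flagged. An out-of-range index with importance 1 is ignored.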

Optionally, performing big data analysis on the sample operation data of each instance in Redis to obtain an index set, together with the fault type and fault analysis corresponding to each index in the index set, includes:

grouping the instances of Redis in advance to obtain a global instance set and a special instance set. The global instance set comprises a plurality of global instances, and the special instance set comprises a plurality of special instances. The service processing capacity of each global instance is not greater than a preset threshold, and the service processing capacity of each special instance is greater than the preset threshold;

performing big data analysis on the sample operation data of each global instance to obtain a first index set, together with the fault type and fault analysis corresponding to each index in the first index set;

and performing big data analysis on the sample operation data of each special instance to obtain a second index set, together with the fault type and fault analysis corresponding to each index in the second index set.
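The grouping step amounts to a threshold split. The application does not specify how service processing capacity is measured. This sketch assumes a single number per instance, for example an average operations-per-second figure. The instance names and the threshold are hypothetical.

```python
from typing import Dict, Set, Tuple


def group_instances(capacity: Dict[str, float], threshold: float) -> Tuple[Set[str], Set[str]]:
    """Split instances into (global_set, special_set).

    Global instances: capacity <= threshold.
    Special instances: capacity > threshold.
    """
    global_set = {name for name, value in capacity.items() if value <= threshold}
    special_set = {name for name, value in capacity.items() if value > threshold}
    return global_set, special_set
```

An instance exactly at the threshold is treated as global, matching the "not greater than" wording above.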

Optionally, configuring the monitoring period, value range and importance degree corresponding to each index includes:

configuring a monitoring period, a value range and an importance degree corresponding to each index in the first index set;

and configuring a monitoring period, a value range and an importance degree corresponding to each index in the second index set.

Optionally, after Redis starts running, calling a preset Info command according to the monitoring period corresponding to each index to acquire the value of each index in the index set of each instance includes:

after Redis starts running, calling the preset Info command according to the monitoring period corresponding to each index in the first index set to acquire the value of each index in the first index set of each global instance;

and calling the Info command according to the monitoring period corresponding to each index in the second index set to acquire the value of each index in the second index set of each special instance.

Optionally, for each instance, identifying the instance as a target instance if its index set contains a problem index includes:

for each global instance, identifying the global instance as a target instance if its first index set contains a problem index;

for each special instance, identifying the special instance as a target instance if its second index set contains a problem index.

A Redis-based monitoring device, comprising:

an analysis unit, configured to perform big data analysis on the sample operation data of each instance in Redis to obtain an index set, together with the fault type and fault analysis corresponding to each index in the index set;

a configuration unit, configured to configure the monitoring period, value range and importance degree corresponding to each index;

an obtaining unit, configured to call, after Redis starts running, a preset Info command according to the monitoring period corresponding to each index to acquire the value of each index in the index set of each instance;

an identification unit, configured to identify, for each instance, the instance as a target instance if its index set contains a problem index, where a problem index is an index whose value falls outside the value range corresponding to that index and whose importance degree is greater than a preset value;

and a warning unit, configured to generate a warning prompt based on the preset id of each target instance and the fault type and fault analysis corresponding to each problem index contained in that target instance, and send the warning prompt to a user.

Optionally, the analysis unit is specifically configured to:

Optionally, the configuration unit is specifically configured to:

A computer-readable storage medium comprising a stored program, wherein the program performs the Redis-based monitoring method.

A Redis-based monitoring device comprising: a processor, a memory, and a bus; the processor and the memory are connected through the bus;

the memory is used for storing a program, and the processor is used for executing the program, wherein the program executes the Redis-based monitoring method during the running process.

According to the above technical solution, big data analysis is performed on sample operation data of each instance in Redis. This yields an index set, together with the fault type and fault analysis corresponding to each index. After Redis starts running, the value of each index in the index set of each instance is acquired according to the monitoring period corresponding to that index. Each instance whose index set contains a problem index is identified as a target instance. An alarm prompt is then generated from the preset id of each target instance and the fault type and fault analysis corresponding to its problem indexes, and sent to a user. Because these instances are identified automatically and reported with their ids, fault types and fault analyses, the user can troubleshoot quickly within a limited time. Compared with the prior art, instances no longer need to be checked manually one by one, so the efficiency of Redis troubleshooting is effectively improved.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a schematic flowchart of a Redis-based monitoring method according to an embodiment of the present application;

Fig. 2 is a schematic flowchart of another Redis-based monitoring method according to an embodiment of the present application;

Fig. 3 is a schematic structural diagram of a Redis-based monitoring device according to an embodiment of the present application.

Detailed Description

The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

Fig. 1 is a schematic flowchart of a Redis-based monitoring method provided in an embodiment of the present application. The method may be applied to a server and includes the following steps:

S101: Group the instances of Redis in advance to obtain a global instance set and a special instance set.

The global instance set comprises a plurality of global instances, and the special instance set comprises a plurality of special instances. The service processing capacity of each global instance is not greater than a preset threshold, and that of each special instance is greater than the preset threshold. An instance here is a process that provides the Redis service. This is common general knowledge to those skilled in the art and is not described further.

S102: Perform big data analysis on the sample operation data of each global instance to obtain a first index set, together with the fault type and fault analysis corresponding to each index in the first index set.

The sample operation data of a global instance includes operation data recorded while it processed services at normal efficiency and while it processed services at reduced efficiency. The first index set includes a plurality of indexes that affect the service processing efficiency of the global instance.

Specifically, the first index set includes at least the following indexes:
- the maximum queue length of the output buffer;
- the maximum size of the input buffer;
- the current AOF size;
- the number of AOF blocks per minute;
- the last bgsave status;
- the time taken by the last fork;
- the number of full replications executed per minute;
- the number of failed partial replications per minute;
- the number of successful partial replications per minute;
- the number of rejected connections per minute;
- the real-time ops;
- the memory fragmentation ratio;
- the network output traffic per minute;
- the network input traffic per minute;
- the replication offset difference between the master and slave nodes;
- user CPU consumption;
- kernel CPU consumption;
- the cluster state;
- the number of slots successfully assigned in the cluster.

Big data analysis yields the following fault types and fault analyses for each index.

Maximum queue length of the output buffer. Fault type: the server becomes blocked. Fault analysis: one of the following:
- the client frequently accesses bigkeys;
- the client issues large numbers of bulk commands, such as hgetall and smembers, that fetch all elements;
- the client executes monitor, which blocks other client connections.

Maximum size of the input buffer. Fault type: the server becomes blocked. Fault analysis: bigkey operations executed on the server cause slow queries and block client command execution, or batches of keys are written frequently.

Current AOF size. Fault type: rewriting the instance incurs additional performance overhead. Fault analysis: when the AOF file is too large, rewriting and merging it increases the server's memory and disk overhead.

Number of AOF blocks per minute. Fault type: client calls are affected and Redis performance degrades. Fault analysis: one of the following:
- the machine's disk performance is insufficient and read/write speed cannot keep up;
- other processes on the host are writing to the disk;
- AOF rewrite or RDB operations consume disk IO.

Last bgsave status. Fault type: master-slave instance switchover may fail. Fault analysis: insufficient disk space.

Time taken by the last fork. Fault type: server performance is affected. Fault analysis: the instance consumes too much memory.

Number of full replications executed per minute. Fault type: full replication occurs. Fault analysis: the replication buffer of the master node exceeds its threshold.

Number of failed partial replications per minute. Fault type: partial replication fails. Fault analysis: network failure or insufficient disk space.

Number of successful partial replications per minute. Fault type: replication fails. Fault analysis: network failure.

Number of rejected connections per minute. Fault type: client connections are rejected. Fault analysis: the client creates a large number of connections, or the client code leaks connections.

Real-time ops. Fault type: client performance is affected. Fault analysis: one of the following:
- client traffic surges;
- the number of client machines making calls increases;
- thread concurrency on the client side increases.

Memory fragmentation ratio. Fault type: machine memory is significantly wasted, which may cause an OOM crash when running in a container. Fault analysis: there is too much memory fragmentation.
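Redis reports the memory fragmentation ratio as `mem_fragmentation_ratio`. It is the resident set size (`used_memory_rss`) divided by the memory the allocator reports as in use (`used_memory`). The sketch below reproduces that arithmetic. The 1.0 and 1.5 bounds are common rules of thumb used here as assumptions, not values taken from the application.

```python
def fragmentation_ratio(used_memory_rss: int, used_memory: int) -> float:
    """RSS divided by allocator-reported used memory, as in mem_fragmentation_ratio."""
    if used_memory <= 0:
        raise ValueError("used_memory must be positive")
    return used_memory_rss / used_memory


def classify_fragmentation(ratio: float, low: float = 1.0, high: float = 1.5) -> str:
    """Label a ratio using assumed rule-of-thumb bounds."""
    if ratio > high:
        return "fragmented"     # RSS well above used memory: memory is being wasted
    if ratio < low:
        return "swapping_risk"  # RSS below used memory: part of the data may be swapped out
    return "normal"
```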

Network output traffic per minute. Fault type: the machine's network traffic becomes a bottleneck. Fault analysis: the client's read traffic is too large.

Network input traffic per minute. Fault type: the machine's network traffic becomes a bottleneck. Fault analysis: the client's write traffic is too large.

Master-slave offset difference. Fault type: data on the master and slave instances is inconsistent. Fault analysis: network failure, or a large performance gap between the machines hosting the master and slave instances.

User CPU consumption. Fault type: the process consumes too much CPU. Fault analysis: tasks executed by the forked child process consume CPU.

Kernel CPU consumption. Fault type: the main process consumes too much CPU. Fault analysis: the main process of the instance is busy.

Cluster state. Fault type: the Redis cluster is unavailable. Fault analysis: one or more groups of master and slave instances are down at the same time, or the machine room has failed.

Number of slots successfully assigned in the cluster. Fault type: clients receive errors when a key maps to a slot that is not being served. Fault analysis: one or more groups of master and slave instances are down, and calls to the affected slots become abnormal.
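The results of this analysis amount to a lookup table from each index to its fault type and fault analysis. The alarm step later consults this table. A minimal sketch follows, with three entries transcribed from the text above. The key names and the fallback entry are hypothetical.

```python
from typing import NamedTuple


class FaultInfo(NamedTuple):
    fault_type: str
    fault_analysis: str


# Three entries transcribed from the analysis above; keys are illustrative labels.
FAULT_TABLE = {
    "last_bgsave_status": FaultInfo(
        "Master-slave instance switchover may fail",
        "Insufficient disk space"),
    "last_fork_time": FaultInfo(
        "Server performance is affected",
        "The instance consumes too much memory"),
    "rejected_connections_per_minute": FaultInfo(
        "Client connections are rejected",
        "The client creates many connections, or client code leaks connections"),
}


def lookup_fault(index_name: str) -> FaultInfo:
    """Return the recorded fault type and analysis, or a placeholder for unknown indexes."""
    return FAULT_TABLE.get(index_name, FaultInfo("Unknown", "No analysis recorded for this index"))
```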

It should be noted that the above specific implementation is only illustrative.

S103: Perform big data analysis on the sample operation data of each special instance to obtain a second index set, together with the fault type and fault analysis corresponding to each index in the second index set.

The sample operation data of a special instance includes operation data recorded while it processed services at normal efficiency and while it processed services at reduced efficiency. The second index set includes a plurality of indexes that affect the service processing efficiency of the special instance.

Specifically, the second index set includes all the indexes in the first index set, plus the number of evicted keys and the key expiration duration.

Number of evicted keys. Fault type: the instance hit rate is low. Fault analysis: one or more groups of master and slave instances are down.

Key expiration duration. Fault type: the instance hit rate is low. Fault analysis: master-slave instance data is lost.
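The second index set is the first index set plus two key-related indexes, so it can be expressed as a set union. The short names below are illustrative labels for the indexes listed in the text, not Redis field names.

```python
# Illustrative labels for the 19 indexes of the first index set (global instances).
FIRST_INDEX_SET = frozenset({
    "output_buffer_max_queue_length", "input_buffer_max_size",
    "aof_current_size", "aof_blocks_per_minute", "last_bgsave_status",
    "last_fork_time", "full_syncs_per_minute", "partial_sync_failures_per_minute",
    "partial_sync_successes_per_minute", "rejected_connections_per_minute",
    "realtime_ops", "mem_fragmentation_ratio", "net_output_per_minute",
    "net_input_per_minute", "master_slave_offset_diff", "user_cpu",
    "kernel_cpu", "cluster_state", "cluster_slots_assigned",
})

# Special instances additionally monitor the two key-related indexes.
SECOND_INDEX_SET = FIRST_INDEX_SET | {"evicted_keys", "key_expiration_duration"}


def index_set_for(group: str) -> frozenset:
    """Return the index set monitored for an instance group ("global" or "special")."""
    if group == "global":
        return FIRST_INDEX_SET
    if group == "special":
        return SECOND_INDEX_SET
    raise ValueError(f"unknown group: {group}")
```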

S104: Configure the monitoring period, value range and importance degree corresponding to each index in the first index set.

S105: Configure the monitoring period, value range and importance degree corresponding to each index in the second index set.
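The per-index configuration could be kept in a small document per index set. The sketch below assumes a JSON layout with hypothetical field names: `period_s`, `low`, `high` and `importance`. It rejects obviously invalid entries.

```python
import json
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class MonitorConfig:
    period_s: int    # monitoring period in seconds
    low: float       # lower bound of the value range
    high: float      # upper bound of the value range
    importance: int  # importance degree compared against the preset value


def load_configs(text: str) -> Dict[str, MonitorConfig]:
    """Parse {index_name: {period_s, low, high, importance}} and validate each entry."""
    configs = {}
    for name, item in json.loads(text).items():
        cfg = MonitorConfig(int(item["period_s"]), float(item["low"]),
                            float(item["high"]), int(item["importance"]))
        if cfg.period_s <= 0 or cfg.low > cfg.high:
            raise ValueError(f"invalid configuration for index {name!r}")
        configs[name] = cfg
    return configs
```

One such document per index set keeps the global and special configurations (S104 and S105) independent.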

S106: After Redis starts running, call a preset Info command according to the monitoring period corresponding to each index in the first index set to acquire the value of each index in the first index set of each global instance.
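Each index has its own monitoring period. One way to call the Info command on schedule is to tick once per second. At each tick, collect the indexes whose period divides the elapsed time. Group them by the Info section that reports them, so each section is queried once per tick. This is a sketch under that assumption. The index names, periods and section mapping are hypothetical.

```python
from typing import Dict, List


def due_sections(elapsed_s: int, periods: Dict[str, int],
                 section_of: Dict[str, str]) -> Dict[str, List[str]]:
    """Return {info_section: [indexes due now]} for the given elapsed time in seconds."""
    due: Dict[str, List[str]] = {}
    for name, period in periods.items():
        if period > 0 and elapsed_s % period == 0:
            due.setdefault(section_of[name], []).append(name)
    return due
```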

The preset Info commands include, but are not limited to: Info clients, Info persistence, Info stats, Info replication, Info CPU and Info cluster.
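The reply to an INFO command is plain text. Each section starts with a `# Section` header line, followed by `field:value` lines separated by CRLF. Client libraries such as redis-py already parse this (`Redis.info(section)` returns a dict), so the parser below only illustrates the format. Converting numeric-looking values to int or float is a simplification.

```python
from typing import Dict, Optional, Union

Value = Union[int, float, str]


def _coerce(value: str) -> Value:
    """Convert numeric-looking values to int or float; keep everything else as text."""
    for cast in (int, float):
        try:
            return cast(value)
        except ValueError:
            pass
    return value


def parse_info(reply: str) -> Dict[str, Dict[str, Value]]:
    """Parse a raw INFO reply into {section_name: {field: value}}."""
    sections: Dict[str, Dict[str, Value]] = {}
    current: Optional[str] = None
    for raw_line in reply.splitlines():  # handles both CRLF and LF
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith("#"):
            current = line.lstrip("#").strip().lower()
            sections[current] = {}
        elif current is not None and ":" in line:
            field, _, value = line.partition(":")  # split on the first colon only
            sections[current][field] = _coerce(value)
    return sections
```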

Optionally, each index in the first index set and its acquisition time may be stored in a preset data table, so that the user can view each index and its acquisition time at any time.
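The application does not specify the data table. One option is a single SQLite table keyed by instance, index name and acquisition time, which supports viewing the history of any index. The table and column names below are hypothetical.

```python
import sqlite3
import time
from typing import Dict, List, Optional, Tuple


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) the sample table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS index_samples ("
        " instance_id TEXT NOT NULL,"
        " index_name TEXT NOT NULL,"
        " value TEXT NOT NULL,"
        " acquired_at REAL NOT NULL)"
    )
    return conn


def record(conn: sqlite3.Connection, instance_id: str, values: Dict[str, object],
           acquired_at: Optional[float] = None) -> None:
    """Store one sample per index, all stamped with the same acquisition time."""
    ts = time.time() if acquired_at is None else acquired_at
    conn.executemany(
        "INSERT INTO index_samples VALUES (?, ?, ?, ?)",
        [(instance_id, name, str(value), ts) for name, value in values.items()],
    )
    conn.commit()


def history(conn: sqlite3.Connection, instance_id: str, index_name: str) -> List[Tuple[float, str]]:
    """Return (acquired_at, value) pairs for one index of one instance, oldest first."""
    return conn.execute(
        "SELECT acquired_at, value FROM index_samples"
        " WHERE instance_id = ? AND index_name = ? ORDER BY acquired_at",
        (instance_id, index_name),
    ).fetchall()
```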

Specifically, the maximum queue length of the output buffer and the maximum size of the input buffer can be obtained by calling Info clients.

Calling Info persistence yields the current AOF size, the number of AOF blocks per minute and the last bgsave status.

Calling Info stats yields the following indexes:
- the time taken by the last fork;
- the number of full replications executed per minute;
- the numbers of failed and successful partial replications per minute;
- the number of rejected connections per minute;
- the real-time ops;
- the memory fragmentation ratio;
- the network output and input traffic per minute.
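Several of these indexes are per-minute figures. The corresponding counters reported by Info stats are cumulative since the instance started, for example `sync_full`, `sync_partial_err`, `sync_partial_ok`, `rejected_connections` and `total_net_output_bytes`. A per-minute value can therefore be derived from two consecutive samples. The restart handling below is an assumption.

```python
def per_minute(prev_value: float, prev_ts: float, cur_value: float, cur_ts: float) -> float:
    """Rate of a cumulative counter between two samples, scaled to events per minute."""
    elapsed = cur_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("samples must be in increasing time order")
    delta = cur_value - prev_value
    if delta < 0:
        # Counter went backwards: assume the instance restarted and the counter
        # restarted from zero, so the current value is the growth since restart.
        delta = cur_value
    return delta * 60.0 / elapsed
```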

Calling Info replication yields the replication offset difference between the master and slave nodes.
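On a master, Info replication reports `master_repl_offset` and one `slaveN` line per connected replica, of the form `ip=...,port=...,state=...,offset=...,lag=...`. The offset difference can be computed per replica as below. The function name is hypothetical. The input is assumed to be the already-parsed replication section, with replica lines kept as raw strings.

```python
from typing import Dict, Mapping


def replica_offset_lags(replication: Mapping[str, object]) -> Dict[str, int]:
    """Return {"ip:port": master_repl_offset - replica offset} for each connected replica."""
    master_offset = int(replication["master_repl_offset"])
    lags: Dict[str, int] = {}
    for key, value in replication.items():
        # Replica entries are named slave0, slave1, ...; skip other slave_* fields.
        if not (key.startswith("slave") and key[len("slave"):].isdigit()):
            continue
        fields = dict(item.split("=", 1) for item in str(value).split(","))
        lags[f"{fields['ip']}:{fields['port']}"] = master_offset - int(fields["offset"])
    return lags
```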

Calling Info CPU yields user CPU consumption and kernel CPU consumption.

Calling Info cluster yields the cluster state and the number of slots successfully assigned in the cluster.
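The mapping from indexes to Info fields can be kept as a table. The field names below come from the Redis INFO documentation as we understand it and should be checked against the deployed version. Note the following:
- Redis 5.0 and later report `client_recent_max_output_buffer` and `client_recent_max_input_buffer` in place of the older `client_longest_output_list` and `client_biggest_input_buf`.
- `mem_fragmentation_ratio` appears in the Memory section.
- `evicted_keys` appears in the Stats section.
- `cluster_state` and `cluster_slots_assigned` are returned by the CLUSTER INFO command; Info cluster itself only reports `cluster_enabled`.
- Treating `aof_delayed_fsync` as the AOF blocking counter is an assumption.

```python
from typing import Dict, List, Tuple

# index label -> (source, field); "cluster-info" marks the CLUSTER INFO command.
INDEX_SOURCES: Dict[str, Tuple[str, str]] = {
    "output_buffer_max_queue_length": ("clients", "client_recent_max_output_buffer"),
    "input_buffer_max_size": ("clients", "client_recent_max_input_buffer"),
    "aof_current_size": ("persistence", "aof_current_size"),
    "aof_blocks_per_minute": ("persistence", "aof_delayed_fsync"),  # assumed mapping
    "last_bgsave_status": ("persistence", "rdb_last_bgsave_status"),
    "last_fork_time": ("stats", "latest_fork_usec"),
    "full_syncs_per_minute": ("stats", "sync_full"),
    "partial_sync_failures_per_minute": ("stats", "sync_partial_err"),
    "partial_sync_successes_per_minute": ("stats", "sync_partial_ok"),
    "rejected_connections_per_minute": ("stats", "rejected_connections"),
    "realtime_ops": ("stats", "instantaneous_ops_per_sec"),
    "mem_fragmentation_ratio": ("memory", "mem_fragmentation_ratio"),
    "net_output_per_minute": ("stats", "total_net_output_bytes"),
    "net_input_per_minute": ("stats", "total_net_input_bytes"),
    "master_slave_offset_diff": ("replication", "master_repl_offset"),  # plus slaveN lines
    "user_cpu": ("cpu", "used_cpu_user"),
    "kernel_cpu": ("cpu", "used_cpu_sys"),
    "cluster_state": ("cluster-info", "cluster_state"),
    "cluster_slots_assigned": ("cluster-info", "cluster_slots_assigned"),
    "evicted_keys": ("stats", "evicted_keys"),
}


def fields_by_source(index_names: List[str]) -> Dict[str, List[str]]:
    """Group the Redis fields needed for the given indexes by the section that returns them."""
    grouped: Dict[str, List[str]] = {}
    for name in index_names:
        source, field = INDEX_SOURCES[name]
        grouped.setdefault(source, []).append(field)
    return grouped
```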

S107: Call the Info command according to the monitoring period corresponding to each index in the second index set to acquire the value of each index in the second index set of each special instance.

Most indexes in the second index set are acquired in the same way as described in step S106. The remaining indexes, namely the number of evicted keys and the key expiration duration, can be obtained by calling Info memory.

Optionally, each index in the second index set and its acquisition time may also be stored in a preset data table, so that the user can view them at any time.

S108: For each global instance, identify the global instance as a target instance if its first index set contains a problem index.

A problem index is an index whose value falls outside the value range corresponding to that index and whose importance degree is greater than a preset value.

S109: For each special instance, identify the special instance as a target instance if its second index set contains a problem index.

S110: Generate an alarm prompt based on the preset id of each target instance and the fault type and fault analysis corresponding to each problem index contained in that target instance, and send the alarm prompt to a user.
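The alarm prompt combines the preset id of the target instance with the fault type and fault analysis of each problem index. The application fixes neither a message format nor a delivery channel, so the sketch below only builds the text. The `Problem` structure and the message layout are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Problem:
    index_name: str
    value: object
    fault_type: str
    fault_analysis: str


def build_alert(instance_id: str, problems: List[Problem]) -> str:
    """Format one alarm prompt for a target instance and its problem indexes."""
    lines = [f"[Redis alert] instance {instance_id}: {len(problems)} problem index(es)"]
    for p in problems:
        lines.append(f"- {p.index_name} = {p.value}: {p.fault_type} "
                     f"(possible cause: {p.fault_analysis})")
    return "\n".join(lines)
```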

In summary, each instance containing a problem index is identified as a target instance. The preset id of the target instance is sent to the user, together with the fault type and fault analysis corresponding to each of its problem indexes. This helps the user troubleshoot quickly within a limited time.

It should be noted that S101 in the above embodiment is an optional implementation of the Redis-based monitoring method described in this application, and S107 is likewise an optional implementation. The flow of the above embodiment can therefore be summarized as the method shown in Fig. 2.

Fig. 2 is a schematic flowchart of another Redis-based monitoring method provided in an embodiment of the present application. The method includes the following steps:

S201: Perform big data analysis on the sample operation data of each instance in Redis to obtain an index set, together with the fault type and fault analysis corresponding to each index in the index set.

S202: Configure the monitoring period, value range and importance degree corresponding to each index.

S203: After Redis starts running, call a preset Info command according to the monitoring period corresponding to each index to acquire the value of each index in the index set of each instance.

S204: For each instance, identify the instance as a target instance if its index set contains a problem index.

S205: Generate an alarm prompt based on the preset id of each target instance and the fault type and fault analysis corresponding to each problem index contained in that target instance, and send the alarm prompt to a user.

Corresponding to the Redis-based monitoring method provided in the embodiments of the present application, the embodiments of the present application also provide a Redis-based monitoring device.

Fig. 3 is an architecture diagram of a Redis-based monitoring device provided in an embodiment of the present application. The device includes:

The analysis unit 100 is configured to perform big data analysis on the sample operation data of each instance in Redis to obtain an index set, together with the fault type and fault analysis corresponding to each index in the index set.

Optionally, the analysis unit 100 is specifically configured to:
- group the instances of Redis in advance to obtain a global instance set, comprising a plurality of global instances, and a special instance set, comprising a plurality of special instances; the service processing capacity of each global instance is not greater than a preset threshold, and that of each special instance is greater than the preset threshold;
- perform big data analysis on the sample operation data of each global instance to obtain a first index set, together with the fault type and fault analysis corresponding to each index in the first index set;
- perform big data analysis on the sample operation data of each special instance to obtain a second index set, together with the fault type and fault analysis corresponding to each index in the second index set.

The configuration unit 200 is configured to configure the monitoring period, value range and importance degree corresponding to each index.

Optionally, the configuration unit 200 is specifically configured to: configure the monitoring period, value range and importance degree corresponding to each index in the first index set; and configure the same for each index in the second index set.

The obtaining unit 300 is configured to call, after Redis starts running, a preset Info command according to the monitoring period corresponding to each index to acquire the value of each index in the index set of each instance.

Optionally, the obtaining unit 300 is specifically configured to:
- after Redis starts running, call the preset Info command according to the monitoring period corresponding to each index in the first index set to acquire the value of each index in the first index set of each global instance;
- call the Info command according to the monitoring period corresponding to each index in the second index set to acquire the value of each index in the second index set of each special instance.

The identification unit 400 is configured to identify, for each instance, the instance as a target instance if its index set contains a problem index. A problem index is an index whose value falls outside the value range corresponding to that index and whose importance degree is greater than a preset value.

Optionally, the identification unit 400 is specifically configured to: for each global instance, identify the global instance as a target instance if its first index set contains a problem index; and for each special instance, identify the special instance as a target instance if its second index set contains a problem index.

The warning unit 500 is configured to generate a warning prompt based on the preset id of each target instance and the fault type and fault analysis corresponding to each problem index contained in that target instance, and send the warning prompt to the user.

The present application also provides a computer-readable storage medium including a stored program, wherein the program performs the Redis-based monitoring method provided by the present application.

The application also provides a Redis-based monitoring device, including a processor, a memory and a bus. The processor is connected to the memory through the bus. The memory stores a program and the processor runs it. When the program runs, it executes the Redis-based monitoring method provided by the application, which comprises the following steps:

performing big data analysis on sample operation data of each instance in Redis to obtain an index set, together with the fault type and fault analysis corresponding to each index in the index set;

for each instance, identifying the instance as a target instance if its index set contains a problem index, where a problem index is an index whose value falls outside the value range corresponding to that index and whose importance degree is greater than a preset value;

Specifically, on the basis of the above embodiment, performing big data analysis on the sample operation data of each instance in Redis to obtain an index set, together with the fault type and fault analysis corresponding to each index in the index set, includes:

Specifically, on the basis of the above embodiment, configuring the monitoring period, value range and importance degree corresponding to each index includes:

Specifically, on the basis of the above embodiment, after Redis starts running, calling a preset Info command according to the monitoring period corresponding to each index to acquire the value of each index in the index set of each instance includes:

Specifically, on the basis of the above embodiment, for each instance, identifying the instance as a target instance if its index set contains a problem index includes:

If the functions described in the method of the embodiments of the present application are implemented as software functional units and sold or used as independent products, they may be stored in a storage medium readable by a computing device. Based on this understanding, the part of the embodiments that contributes to the prior art, or part of the technical solution, may be embodied as a software product. This product is stored in a storage medium and includes several instructions that cause a computing device (a personal computer, a server, a mobile computing device, a network device, etc.) to execute all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory, a random access memory, a magnetic disk or an optical disk.

The embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same or similar parts among the embodiments are referred to each other.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A Redis-based monitoring method, characterized by comprising the following steps:

2. The method according to claim 1, wherein performing big data analysis on the sample operation data of each instance in Redis to obtain an index set, together with the fault type and fault analysis corresponding to each index in the index set, comprises:

3. The method according to claim 2, wherein configuring the monitoring period, value range and importance degree corresponding to each index comprises:

4. The method according to claim 3, wherein, after Redis starts running, calling a preset Info command according to the monitoring period corresponding to each index to acquire the value of each index in the index set of each instance comprises:

5. The method according to claim 4, wherein, for each instance, identifying the instance as a target instance if its index set contains a problem index comprises:

6. A Redis-based monitoring device, comprising:

7. The apparatus according to claim 6, wherein the analysis unit is specifically configured to:

8. The apparatus according to claim 7, wherein the configuration unit is specifically configured to:

9. A computer readable storage medium, characterized in that the computer readable storage medium comprises a stored program, wherein the program performs the Redis-based monitoring method of any of claims 1-5.

10. A Redis-based monitoring device, comprising: a processor, a memory, and a bus; the processor and the memory are connected through the bus;

the memory is configured to store a program and the processor is configured to execute the program, wherein the program executes the Redis-based monitoring method according to any one of claims 1 to 5.