CN117806899A

CN117806899A - Data monitoring analysis method, device, server, operation and maintenance system and storage medium

Info

Publication number: CN117806899A
Application number: CN202211172716.0A
Authority: CN
Inventors: 上官栋栋; 张钧宇; 曾维富
Original assignee: Huawei Cloud Computing Technologies Co Ltd
Current assignee: Huawei Cloud Computing Technologies Co Ltd
Priority date: 2022-09-26
Filing date: 2022-09-26
Publication date: 2024-04-02
Also published as: WO2024066506A1

Abstract

The invention discloses a data monitoring and analyzing method, a device, a server, an operation and maintenance system and a storage medium, and relates to the technical field of computers. The data monitoring and analyzing method comprises the following steps: the server receives and executes a change command, wherein the change command is used for executing a change operation on an object in an application; the server monitors the monitoring object associated with the change command to obtain the identification of the first object and the identification of the monitoring object generated in the operation process of the monitoring object. Because only the monitoring object associated with the change command is monitored, the monitoring range is reduced, and the monitoring accuracy and efficiency are improved. In addition, the monitoring range of the server is reduced, so that the monitoring data volume generated by the server is reduced, redundant data generated in the data monitoring process is reduced, and the occupation of the monitoring data to storage resources in the server is reduced; the server carries out risk assessment on the data to obtain the risk level corresponding to the change command, analysis on the redundant data is not needed, and the data analysis efficiency of the server is improved.

Description

Data monitoring analysis method, device, server, operation and maintenance system and storage medium

Technical Field

The present disclosure relates to the field of computers, and in particular, to a data monitoring and analyzing method, a device, a server, an operation and maintenance system, and a storage medium.

Background

The operation and maintenance personnel can execute the command to change the application program according to the requirement of the user so as to delete, modify or newly add the data of the application program and modify the behavior of the application program. During the process of running the application program by the server, the application program is monitored to find out the running fault caused by the change. However, monitoring all processes in an application by a server generates a large amount of monitoring data, and analyzing all monitoring data is inefficient. Therefore, how to provide a more effective monitoring and analyzing method for the modification process is a problem to be solved.

Disclosure of Invention

The present application provides a data monitoring and analysis method, a device, a server, an operation and maintenance system, and a storage medium, to solve the problem of inefficient monitoring and analysis of data in the prior art.

The application adopts the following technical scheme.

In a first aspect, a data monitoring and analysis method is provided, the method being performed by one or more servers in a server cluster, the data monitoring and analysis method comprising: first, the server receives and executes a change command for instructing to execute a change operation on a first object in a first application. Secondly, in the process of executing the change command, the server determines the monitoring object associated with the first object, and obtains the identification of the first object and the identification of the monitoring object generated in the running process of the monitoring object by utilizing a monitor deployed in advance. And finally, the server carries out risk assessment according to the identification of the first object and the identification of the monitoring object to obtain a risk level corresponding to the change command.

For example, the server may determine whether to alert based on the risk level. The first object may be a data file or process in the first application. The monitoring object may be a process associated with a change command.

Compared with the embodiment in which the server monitors all processes of the whole application, in the embodiment, the server only monitors the monitoring object associated with the change command, so that the monitoring range is reduced, and the monitoring accuracy and efficiency are improved. In addition, as the range of the process to be monitored by the server is reduced, the monitoring data volume generated by the server is reduced, redundant data generated in the data monitoring process is reduced, and the occupation of the monitoring data to storage resources in the server is reduced. In addition, the server only analyzes the data of the monitoring object related to the change command, and the redundant data is not required to be analyzed, so that the data analysis efficiency of the server is improved.

In one possible implementation, the monitoring object includes a second object in the first application and an object in the second application, where the second application is an application that interacts with the first application. The above object is used to indicate a process.

In one possible implementation, the monitoring object is associated with the first object through a system call function.

For example, when the first object is a data file, the monitoring object may read the data file through a read function, and the monitoring object is associated with the first object based on the read function.

When the first object is a process, the monitoring object may create the first object through a process creation (copy_process) function, the monitoring object being associated with the first object based on the copy_process function.

The server can clearly show the relation between the monitoring object and the first object through the system call function, and further can accurately correlate the monitoring object with the first object, so that the process of executing change operation on the first object is ensured to belong to the monitoring object, the problem of incomplete monitored data caused by missing the monitoring object is avoided, and the monitoring accuracy is improved.

In one possible implementation, the altering operation includes one or more of adding, deleting, and modifying.

In one possible implementation manner, the server processes the identifier of the first object and the identifier of the monitoring object by using a preset risk assessment model, so as to obtain a risk level corresponding to the change command.

Instead of analyzing index data, the server uses the identification of the first object and the identification of the monitoring object to determine the risk posed by the change command. For example, the server inputs the identification of the first object and the identification of the monitoring object into the risk assessment model for analysis, determines the risk level of the change command, and then determines whether to raise an alarm.
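The flow from identifiers to risk level to alarm decision can be sketched as follows. This is a minimal illustration only: the application does not disclose the internals of the risk assessment model, so the rule table `SENSITIVE_RESOURCES`, the threshold `ALERT_THRESHOLD`, and the names `Identifiers`, `assess_risk` and `should_alert` are all hypothetical stand-ins for a trained model.

```python
from dataclasses import dataclass

# Hypothetical sensitivity table (resource name -> base risk level).
# Not taken from the application; a real deployment would use a trained model.
SENSITIVE_RESOURCES = {"/etc/passwd": 2, "/etc/shadow": 3}
ALERT_THRESHOLD = 2  # assumed: raise an alarm at or above this level


@dataclass(frozen=True)
class Identifiers:
    first_object: str       # e.g. name of the file touched by the change command
    monitoring_object: str  # e.g. name of the function called by the monitored process


def assess_risk(ids: Identifiers) -> int:
    """Rule-based stand-in for the preset risk assessment model."""
    level = SENSITIVE_RESOURCES.get(ids.first_object, 0)
    # Assumed rule: writing to a sensitive resource is riskier than reading it.
    if level > 0 and ids.monitoring_object == "write":
        level += 1
    return level


def should_alert(level: int) -> bool:
    """Decide whether to raise an alarm from the risk level."""
    return level >= ALERT_THRESHOLD
```

For instance, `should_alert(assess_risk(Identifiers("/etc/passwd", "read")))` evaluates to `True` under these assumed rules, while an ordinary file yields level 0 and no alarm.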

In one possible implementation manner, after the execution of the change command is completed, when the server or other servers in the server cluster fail, the server acquires alarm information corresponding to the failure, retrieves the identifier of the first object and the identifier of the monitoring object according to the alarm information, and determines a corresponding operation log according to the retrieved identifier of the first object and the identifier of the monitoring object, so as to obtain the change command in the operation log.

The alarm information is used for indicating fault data generated in the running process of the server; the operation log is used for indicating operation records of a plurality of change commands for the first application, wherein the plurality of change commands comprise the change commands.

For example, the server may employ a spatiotemporal retrieval algorithm of graph computation to determine, from data stored in the server, an identification of the first object and an identification of the monitoring object that match the alert information.

The server performs data retrieval by using a spatiotemporal retrieval algorithm based on graph computation. For example, the spatiotemporal retrieval algorithm based on graph computation may be a deep structured semantic model (Deep Structured Semantic Models, DSSM). When the server queries, based on the DSSM, the influence surface data corresponding to the alarm information, it can quickly obtain the influence surface data that best matches the alarm information, which improves the retrieval efficiency of the influence surface data. The server determines the change command in the operation log corresponding to the identification of the first object and the identification of the monitoring object, and outputs the change command to the front end. This indicates to the user the change command that may have caused the fault, shortens the time spent on anomaly troubleshooting, and improves troubleshooting efficiency.
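The lookup from alarm information back to a change command can be illustrated with a deliberately simplified sketch. It replaces the DSSM-based matching with plain exact matching on a resource name or PID, restricted to commands executed before the alarm was raised. The types `LogEntry` and `Alarm` and the function `find_suspect_commands` are hypothetical and are not part of the application.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class LogEntry:
    command: str                  # change command recorded in the operation log
    first_object: str             # identification of the first object
    monitor_pids: FrozenSet[int]  # identification of the monitoring objects (PIDs)
    executed_at: float            # execution time in seconds


@dataclass(frozen=True)
class Alarm:
    raised_at: float
    resource: Optional[str] = None  # resource named in the alarm, if any
    pid: Optional[int] = None       # process named in the alarm, if any


def find_suspect_commands(alarm: Alarm, log: List[LogEntry]) -> List[str]:
    """Return earlier change commands whose identifiers match the alarm, newest first."""
    hits = [
        entry for entry in log
        if entry.executed_at <= alarm.raised_at
        and (entry.first_object == alarm.resource or alarm.pid in entry.monitor_pids)
    ]
    hits.sort(key=lambda entry: entry.executed_at, reverse=True)
    return [entry.command for entry in hits]
```

Exact matching is chosen here only to keep the example short; a semantic matching model would rank candidates by similarity instead of filtering them.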

In one possible implementation, the server uses the change command and the corresponding risk level as training data to update the interception model deployed in the fort machine, and the updated interception model is used to intercept some of the change commands. The server comprises the fort machine.

The updated interception model can more accurately intercept the execution command with the risk level meeting the set condition, so that the accuracy of interception is improved, the server does not need to judge the execution command intercepted by the interception model, the number of the execution commands required to be monitored and analyzed in the data monitoring and analyzing process is reduced, and the monitoring and analyzing efficiency is improved.

In one possible implementation manner, when the monitored object is executed, the server invokes a preset detection point, and the detection point triggers a monitor which is deployed in advance in the server, and the monitor acquires a system resource processed by the monitored object when executed, so as to obtain an identifier of the first object and an identifier of the monitored object. Wherein the tracking points include detection points and monitors.

For example, the monitor may be an extended Berkeley Packet Filter (eBPF).

In the server, the detection point included in the tracking point cooperates with the monitor: the detection point ensures that the monitoring object is monitored only when it executes a preset command or function, which further narrows the monitoring range and avoids redundant monitoring data. When the monitoring object executes a preset command or function, the detection point triggers the monitor to monitor the monitoring object and obtain the identification of the first object and the identification of the monitoring object (the influence surface data). In the prior art, monitoring data consists of indicator-type data such as performance index data. In contrast, the server inputs the influence surface data into the risk assessment model for analysis and determines the risk level of the change command, thereby determining whether to raise an alarm, which improves the accuracy of the server's alarms.

In another possible implementation manner, when the monitored object is a remote access process, the server obtains a message generated when the monitored object is operated, and the server analyzes the message to obtain the identification of the first object and the identification of the monitored object.

For example, the server may use the eXpress Data Path (XDP), a high-performance data path, to obtain and parse the message generated when the monitoring object runs.

The server analyzes the message of the remote access process to determine the remote service type accessed in the influence surface data, so that the remote access process is monitored, the type of the monitorable process is increased, and the monitoring efficiency is improved.

In one possible implementation manner, the server sends at least one of the identification of the first object, the identification of the monitoring object and the risk level to the front end of the terminal for display. The front end here may be a display connected to the terminal, a display screen provided in the terminal, or the like, and is not limited in this application. The server displays the data at the front end, so that the data visualization is realized, and a user can timely process the command input into the server according to the visualized data.

In a second aspect, a data monitoring and analysis device is provided, which is applied in a server, and the device includes various modules for executing the data monitoring and analysis method in the first aspect or any possible implementation manner of the first aspect. For example, the data monitoring and analyzing apparatus includes: the device comprises a receiving module, an object determining module and a grade determining module. The receiving module is used for receiving the change command; the object determining module is used for determining a monitoring object associated with the first object in the process of executing the change command; and the grade determining module is used for determining the risk grade corresponding to the change command according to the identification of the first object and the identification of the monitoring object. The change command is used for indicating to execute a change operation on a first object in the first application.

The advantages may be seen from the description of any one of the possible implementations of the first aspect, which is not repeated here. The data monitoring analysis device has the function of implementing the behavior in the method instance in any one of the possible implementations of the first aspect. The functions may be implemented by hardware, or may be implemented by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the functions described above.

In a third aspect, a server is provided, the server comprising at least one processor and a memory, the memory storing instructions, the processor invoking the instructions to implement the method in any of the first aspect and the possible implementation of the first aspect.

In a fourth aspect, an operation and maintenance system is provided, the operation and maintenance system comprising: a fort machine and a plurality of servers;

the fort machine is used for receiving and screening the execution command to obtain a change command;

the server is configured to execute a change command, and monitor and analyze a process of executing the change command, so as to implement the method in any one of the first aspect and the possible implementation manner of the first aspect.

In a fifth aspect, the present application provides a computer readable storage medium having stored therein a computer program or instructions which, when executed by a processing device, implement the method of any one of the first aspect and the possible implementation manner of the first aspect.

In a sixth aspect, the present application provides a computer program product comprising a computer program or instructions which, when run on a processing device, cause the processing device to execute the computer program or instructions to implement the method in any one of the possible implementations of the first aspect and the first aspect.

Advantageous effects of the above second aspect to sixth aspect may be referred to the description of the first aspect or any implementation manner of the first aspect, and are not described here.

Further combinations of the present application may be made to provide further implementations based on the implementations provided in the above aspects.

Drawings

Fig. 1 is an application scenario diagram of an operation and maintenance system provided in the present application;

Fig. 2 is a first monitoring schematic diagram of an eBPF provided in the present application;

Fig. 3 is a second monitoring schematic diagram of an eBPF provided in the present application;

Fig. 4 is a schematic diagram of XDP monitoring provided in the present application;

FIG. 5 is a schematic diagram of a process tree provided herein;

FIG. 6 is a schematic flow chart of a data monitoring and analyzing method provided by the present application;

FIG. 7 is a schematic diagram of a server association provided in the present application;

Fig. 8 is a schematic structural diagram of a data monitoring and analyzing device provided in the present application;

Fig. 9 is a schematic structural diagram of a server provided in the present application.

Detailed Description

The application provides a data monitoring analysis method, which comprises the following steps: firstly, a server receives a change command, executes the change command, and stores a log record (or referred to as an operation log) of the change command; the change command is to instruct to perform a change operation on a first object in a first application. Secondly, in the process of executing the change command, the server determines the monitoring object associated with the first object, and obtains the identification of the first object and the identification of the monitoring object generated in the running process of the monitoring object by utilizing a monitor deployed in advance. And finally, the server carries out risk assessment according to the identification of the first object and the identification of the monitoring object to obtain a risk level corresponding to the change command, and the server can determine whether to alarm according to the risk level.

By way of example, the first object may be a data file or a process in the first application, and the monitoring object may be a process associated with the change command. For example, the server may use a deep learning model to perform risk assessment on the identification of the first object and the identification of the monitoring object to obtain an analysis result (such as the risk level described above).

Compared with an embodiment in which the server monitors all processes of the whole application, in this embodiment the server monitors only the monitoring object associated with the change command, so the monitoring range is reduced and the monitoring accuracy and efficiency are improved. In addition, because the range of processes to be monitored by the server is reduced, the volume of monitoring data generated by the server is reduced, the redundant data generated during data monitoring is reduced, and the monitoring data occupies fewer storage resources in the server. Furthermore, the server analyzes only the data of the monitoring object related to the change command and does not need to analyze redundant data, so the data analysis efficiency of the server is improved.

The data monitoring and analysis method provided in this embodiment is described below. Related concepts are introduced first.

In an operation and maintenance scenario, the server's management of execution commands is generally divided into three links: prior management, in-process management, and post-event management.

The prior management refers to a filtering processing link before the server executes the change command.

The in-process management refers to a monitoring link in the process of executing the change command by the server.

Post-event management refers to the failure handling and model update links after the server executes the change command.

An interception error means that, during prior management, because execution commands are complex, the server is prone to misjudging the risk of an execution command when applying an interception method, so that a high-risk execution command is passed or a normal execution command is intercepted. For example, modifying an ordinary file poses no risk, but displaying the /etc/passwd file (the user database, whose fields give each user's user name, real name, home directory, encrypted password and other information) may lead to serious consequences such as leakage of user information.

Risk level refers to the degree of risk of an execution command on the server, and may be divided into a plurality of levels. For example, when execution commands are intercepted during prior management by means of a black-and-white list, and an execution command uses the vim command to view the /etc/passwd file, the risk level of the execution command is 2, so it is a command to be intercepted.

The monitoring object is a process to be monitored during in-process management, for example, a process that executes a read command.

Monitoring behavior refers to an action or behavior of the monitoring object during in-process management. For example, the server monitors the resource calls made by a process during execution.

Secondary risk refers to the following: the commands in an operation and maintenance script depend on one another, so if the server intercepts one command, the next command is executed in an erroneous state, which easily causes unknown risks.

The bastion machine (referred to in this application as the fort machine) is a device that, in a specific network environment, uses various technical means to monitor and record the operations performed by operation and maintenance personnel on servers, network devices, security devices, databases and other devices in the network, so as to protect the network and data from intrusion and damage by external and internal users, and to support centralized alarming, timely handling, and audit-based accountability.

To avoid the secondary risk described above, the present application monitors the processes that execute the change command. Fig. 1 is an application scenario diagram of an operation and maintenance system provided in the present application. As shown in fig. 1, the operation and maintenance system 100 may include a fort machine 102 and n servers 103, where n is a positive integer. The fort machine 102 and any one of the servers 103 can communicate with each other in a wired or wireless manner. The identification of the first object and the identification of the monitoring object are indicated by the influence surface data.

Illustratively, in the operation and maintenance system 100 of fig. 1, after an operator logs in to the fort machine 102, the operator enters one or more execution commands. The fort machine 102 screens and intercepts the execution commands to obtain a screened change command, and forwards the change command to the server 103 where the first application that needs the change operation is located. The server 103 executes the change command, the monitor on the server 103 acquires the influence surface data generated during execution by the monitoring object associated with the change command, and the server 103 determines the risk level of the change command according to the influence surface data and raises an alarm based on the risk level.

In one possible example, an operator may specify, in the change system, the server to be changed, set the operations to be performed, and upload the script program used for the change. The change system then executes the change and returns an execution result, which may be an alarm or no alarm.

In an alternative implementation, the fort machine 102 uses an interceptor to filter and intercept the execution command.

In one possible example, the interceptor may be a black-and-white list interception algorithm. The fort machine 102 uses the black-and-white list interception algorithm to determine the risk level of each execution command, and intercepts or passes the execution command according to the resulting risk level. In the black-and-white list interception algorithm, the fort machine 102 uses the input execution command to query whether the preset blacklist or whitelist contains the same command: if the blacklist contains the command, the fort machine 102 determines the corresponding risk level; if the whitelist contains the command, the execution command is released.
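One reading of the list lookup described above can be sketched as follows. The list contents, the threshold `INTERCEPT_LEVEL`, and the function name `screen` are hypothetical; the risk level of 2 for viewing /etc/passwd with vim follows the example given earlier in this description. The application does not say how a command found in neither list is handled, so passing it on for later monitoring is an assumption.

```python
from typing import Optional, Tuple

# Hypothetical list contents. Blacklist maps a command to its risk level.
BLACKLIST = {"vim /etc/passwd": 2}
WHITELIST = {"ls", "pwd"}
INTERCEPT_LEVEL = 2  # assumed: intercept at or above this risk level


def screen(command: str) -> Tuple[str, Optional[int]]:
    """Return (action, risk_level) for an execution command."""
    cmd = " ".join(command.split())  # normalize whitespace before lookup
    if cmd in BLACKLIST:
        level = BLACKLIST[cmd]
        action = "intercept" if level >= INTERCEPT_LEVEL else "pass"
        return action, level
    if cmd in WHITELIST:
        return "pass", None  # whitelisted commands are released
    # Assumption: commands in neither list are forwarded so that
    # in-process monitoring on the server can assess them.
    return "pass", None
```

Exact string matching keeps the sketch short; it also shows why such a filter misjudges complex commands, which is the interception error the description refers to.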

In another possible example, the interceptor may be a deep learning model, which may include models such as K-nearest neighbor (K-Nearest Neighbor, KNN) and a support vector machine (Support Vector Machine, SVM). For how the deep learning model processes the execution command, refer to the processing steps of the black-and-white list interception algorithm; details are not repeated here.

Before the fort machine 102 sends the change command to the server 103 where the first application that needs the change operation is located, a monitor is deployed in that server 103.

For deploying the monitor during prior management, the present application presents the following two possible implementations.

In a first possible implementation, the server 103 deploys the eBPF locally and uses the eBPF to obtain the identification of the first object and the identification of the monitoring object on which the server 103 operates when running the monitoring object. The eBPF comprises a kernel program, a collector and an intermediate medium, where the intermediate medium is used to exchange data between the kernel program and the collector.

As shown in fig. 2, which is a first monitoring schematic diagram of an eBPF provided in the present application, the server 103 monitors system resources by deploying an eBPF kernel program in kernel mode, and gathers the system resource information by deploying a collector in user mode. The server 103 also deploys an intermediate storage medium (eBPF Map), a shared memory through which the eBPF kernel program and the collector exchange data. After the eBPF kernel program observes how system resources are operated on when the server 103 executes the change command, it writes this information into the eBPF Map, and the collector reads it from the eBPF Map, thereby obtaining the influence surface data.
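The division of labor between the kernel program, the eBPF Map and the collector can be illustrated with a pure-Python simulation. This is not eBPF code and does not run in the kernel: `SharedMap` merely stands in for the eBPF Map, and `kernel_program_on_operation` and `collector_poll` are hypothetical names for the kernel-mode writer and the user-mode reader.

```python
from collections import deque
from typing import Deque, Dict, List

Event = Dict[str, object]


class SharedMap:
    """Stand-in for the eBPF Map: a buffer both sides can access."""

    def __init__(self) -> None:
        self._events: Deque[Event] = deque()

    def push(self, event: Event) -> None:
        self._events.append(event)

    def drain(self) -> List[Event]:
        # Hand over everything written so far and empty the buffer.
        events = list(self._events)
        self._events.clear()
        return events


def kernel_program_on_operation(shared: SharedMap, pid: int, func: str, resource: str) -> None:
    """Kernel-mode side (simulated): record one observed resource operation."""
    shared.push({"pid": pid, "func": func, "resource": resource})


def collector_poll(shared: SharedMap) -> List[Event]:
    """User-mode side (simulated): read the recorded operations as influence surface data."""
    return shared.drain()
```

Keeping the writer and the reader decoupled through the shared map mirrors the design in fig. 2: the kernel-side program only records, and all aggregation happens in user mode.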

In one possible scenario, the server 103 runs a monitor object to access a local dynamic object through shared memory, pipes, signals, shared files, sockets, etc. As shown in fig. 3, fig. 3 is a second monitoring schematic diagram of an eBPF provided in the present application, taking a case that a monitoring object accesses a local dynamic object through a socket as an example, the eBPF will monitor sys_recv and sys_send system calls, and obtain the influence surface data.

In a possible example, the server 103 is further provided with a detection point, where the detection point is used to trigger the eBPF when the server 103 executes a preset command or function during the operation of the monitored object.

For example, when a "vim" command is received in the server 103 to view the content of the file, the server 103 first pulls up a sub-process through Bash to execute a vim program, which invokes the open and read functions of the system. Since the probe point is inserted into the read function in advance, when the sub-process executes the vim program and invokes the read function, the probe point triggers the eBPF to monitor so as to acquire the influence surface data of the sub-process.
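The effect of placing a detection point on the read function only can be sketched as a small simulation. The class `ProbeTracer` and its method `on_call` are hypothetical; in a real system the trigger would be a kernel probe attached to the function rather than an explicit method call.

```python
from typing import List, Set, Tuple


class ProbeTracer:
    """Simulated detection point plus monitor."""

    def __init__(self, probed_functions: Set[str], monitored_pids: Set[int]) -> None:
        self.probed_functions = probed_functions  # functions with a detection point
        self.monitored_pids = monitored_pids      # PIDs of monitoring objects
        self.records: List[Tuple[int, str, str]] = []

    def on_call(self, pid: int, func: str, resource: str) -> bool:
        """Called for every function invocation; records only probed calls by monitored PIDs."""
        if func not in self.probed_functions or pid not in self.monitored_pids:
            return False  # no detection point here, or process not monitored
        self.records.append((pid, func, resource))
        return True
```

With `ProbeTracer({"read"}, {101})`, a call to `open` by process 101 is ignored, a call to `read` by process 101 is recorded, and a call to `read` by an unmonitored process is ignored, which is how the detection point keeps redundant monitoring data out of the records.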

The detection point included in the tracking point cooperates with the eBPF in the server: the detection point ensures that the monitoring object is monitored only when it executes the preset command or function, which further narrows the monitoring range and avoids redundant monitoring data.

In the prior art, monitoring data consists of indicator-type data such as performance index data. In contrast, when the monitoring object executes a preset command or function, the detection point triggers the eBPF to monitor the monitoring object and obtain the influence surface data. The server uses the influence surface data to determine the risk posed by the change command, for example by inputting the influence surface data into a risk assessment model and determining the risk level of the change command, thereby determining whether to raise an alarm.

In a second possible implementation, the server 103 deploys an XDP program locally. When the monitoring object is a remote access process, the server 103 uses the XDP program to parse the messages received or sent by the monitoring object in the server 103 and obtains IP-Port information. Based on the IP-Port information, the server 103 may determine the accessed service in the influence surface data.

Regarding the server 103 obtaining the IP-Port information by using XDP, this embodiment provides an example in which the server 103 uses XDP to identify the remote-node service accessed by the monitoring object. As shown in fig. 4, which is a schematic diagram of XDP monitoring provided in the present application, a monitoring object in the server 103 may access a service of a remote node through HTTP access, gRPC calls, or the like, all of which use the TCP/IP stack. After the network card in the server 103 receives a packet, the XDP program runs before the TCP/IP stack, performs Ethernet protocol parsing, IP protocol parsing, and TCP protocol parsing on the packet to obtain the IP-Port information, and writes the IP-Port information into the eBPF Map; the collector then reads the IP-Port information from the eBPF Map. Finally, the server 103 determines the accessed service based on the correspondence between the IP-Port information and the process.
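The three parsing steps (Ethernet, then IP, then TCP) can be illustrated in user-space Python on a raw frame. A real XDP program would be written in restricted C and run in the kernel, so this sketch only shows the header arithmetic; the function name `parse_ip_port` is hypothetical, and only untagged Ethernet frames carrying IPv4 and TCP are handled.

```python
import socket
import struct
from typing import Optional, Tuple

ETH_HEADER_LEN = 14      # destination MAC (6) + source MAC (6) + EtherType (2)
ETHERTYPE_IPV4 = 0x0800
IPPROTO_TCP = 6


def parse_ip_port(frame: bytes) -> Optional[Tuple[str, int, str, int]]:
    """Return (src_ip, src_port, dst_ip, dst_port), or None if not IPv4/TCP."""
    if len(frame) < ETH_HEADER_LEN + 20:
        return None
    # Ethernet parsing: the EtherType is the last two bytes of the header.
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETHERTYPE_IPV4:
        return None
    # IP parsing: version in the high nibble, header length (in 32-bit words) in the low nibble.
    version_ihl = frame[ETH_HEADER_LEN]
    if version_ihl >> 4 != 4:
        return None
    ip_header_len = (version_ihl & 0x0F) * 4
    if ip_header_len < 20 or frame[ETH_HEADER_LEN + 9] != IPPROTO_TCP:
        return None
    src, dst = struct.unpack_from("!4s4s", frame, ETH_HEADER_LEN + 12)
    # TCP parsing: source and destination ports are the first four bytes.
    tcp_offset = ETH_HEADER_LEN + ip_header_len
    if len(frame) < tcp_offset + 4:
        return None
    src_port, dst_port = struct.unpack_from("!HH", frame, tcp_offset)
    return socket.inet_ntoa(src), src_port, socket.inet_ntoa(dst), dst_port
```

The returned tuple corresponds to the IP-Port information that the description says is written into the eBPF Map; mapping it to a process and a service is a separate step not shown here.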

In one possible scenario, the server 103 may deploy the XDP program in a native kernel mode.

The server 103 analyzes the message of the remote access process by adopting the XDP program, determines the remote service type accessed in the influence surface data, monitors the remote access process, increases the type of the monitorable process, and improves the monitoring efficiency.

The influence surface data may include the content shown in Table 1 below.

TABLE 1

Sequence number	Influence surface data
1	Process PID
2	Calling function name
3	Resource name of calling object (file name)
4	Calling parameters
5	IP-Port
6	Operation object
7	Accessed local cloud services
8	Accessed remote services

It is noted that the influence surface data shown in table 1 includes content only provided as examples of the present application, and should not be construed as limiting the present application, and the influence surface data may include more or less content. The identification of the first object comprises the resource name of the calling object, the operation object, the accessed local cloud service or the accessed remote service and the like. The identification of the monitoring object comprises the calling function name, the process PID, the calling parameter and the like.
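The split of Table 1 into the two identifications can be sketched as a record type. The class name `ImpactSurfaceRecord`, its field names and its two methods are hypothetical; the grouping of fields follows the paragraph above, and IP-Port is left out of both groups because the description does not assign it to either.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass
class ImpactSurfaceRecord:
    pid: int                                   # 1: process PID
    function_name: str                         # 2: calling function name
    resource_name: Optional[str] = None        # 3: resource name of calling object
    call_params: Tuple[Any, ...] = ()          # 4: calling parameters
    ip_port: Optional[str] = None              # 5: IP-Port
    operation_object: Optional[str] = None     # 6: operation object
    local_cloud_service: Optional[str] = None  # 7: accessed local cloud service
    remote_service: Optional[str] = None       # 8: accessed remote service

    def first_object_id(self) -> Dict[str, str]:
        """Fields that identify the first object; unset fields are omitted."""
        fields = {
            "resource_name": self.resource_name,
            "operation_object": self.operation_object,
            "local_cloud_service": self.local_cloud_service,
            "remote_service": self.remote_service,
        }
        return {key: value for key, value in fields.items() if value is not None}

    def monitoring_object_id(self) -> Dict[str, Any]:
        """Fields that identify the monitoring object."""
        return {
            "pid": self.pid,
            "function_name": self.function_name,
            "call_params": self.call_params,
        }
```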

In order to achieve the purpose that the server only monitors the monitoring object associated with the change command, the embodiment provides an implementation manner for determining the monitoring object. As shown in fig. 5, fig. 5 is a schematic diagram of a process tree provided in the present application, where each node in the process tree represents a process, a connection line between nodes represents an association relationship between processes, and a sequence number in a node represents a PID of a process. When the server 103 starts the monitor, the monitor takes the PID of a preset process (initial object) as a starting parameter, and takes the initial object and the PID as a root node (root process) of a process tree, wherein the processes in the process tree all belong to the monitored object. When another process has an association with a process on the process tree, the other process also belongs to the monitor object and is maintained in the process tree. The server only monitors the processes in the process tree, so that the problem that monitored data is incomplete due to omission of the processes needing to be monitored is avoided, and the monitoring accuracy is improved.

For the determination of the monitoring object, the present application gives the following four optional cases.

In the first alternative case, the initial object in the server 103 receives and executes the change command, and a sub-process (second object) created by the server 103 while the initial object executes the change command is a monitoring object. The first object indicates a file in the first application.

For example, while executing the change command, the initial object needs to create a second object to perform the change operation on the file in the first application. For instance, when the change command needs to search the content of the first object, a process for searching is started. The eBPF program determines whether the initial object creates a second object by monitoring the use of the copy_process system call. If a copy_process system call is observed, the second object is taken as a monitoring object. The copy_process system call serves as the association between the initial object and the monitoring object, and the PID of the second object is maintained in the process tree.

The above examples only show that the relationship between the initial object and the child process is determined based on copy_process system call, and the relationship between other processes in the process tree may also be determined based on copy_process system call.

The server 103 associates a plurality of processes through a process tree based on the above-described system call function, and the association relationship between the plurality of processes is determined by the system call function. Since all processes in the process tree are monitored by the server 103, when the server 103 adds the aforementioned second object to the process tree, the second object is also monitored by the server 103, so that it is ensured that all processes associated with the change command are not missed by the server 103, and the integrity and accuracy of the server 103 for monitoring all processes associated with the change command are improved.
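The process-tree bookkeeping described above can be sketched as follows. This is a minimal sketch, not the monitor itself: process-creation events are fed in as plain (parent PID, child PID) pairs, whereas in the described system they would be reported by an eBPF probe on copy_process. The class and method names and the sample PIDs are hypothetical.

```python
class ProcessTree:
    """Tracks which PIDs are monitoring objects, starting from one root PID."""

    def __init__(self, root_pid: int) -> None:
        self.root_pid = root_pid
        self.parent_of = {root_pid: None}    # child PID -> parent PID

    def is_monitored(self, pid: int) -> bool:
        return pid in self.parent_of

    def on_process_created(self, parent_pid: int, child_pid: int) -> bool:
        """Handle one creation event; return True if the child joins the tree."""
        if not self.is_monitored(parent_pid):
            return False                     # creator is outside the tree: ignore
        self.parent_of[child_pid] = parent_pid
        return True

    def lineage(self, pid: int) -> list:
        """Return the chain of PIDs from the given process up to the root."""
        chain = []
        while pid is not None:
            chain.append(pid)
            pid = self.parent_of[pid]
        return chain


# Simulated events: PID 100 is the initial object; 555 is an unrelated process.
tree = ProcessTree(root_pid=100)
events = [(100, 101), (555, 556), (101, 102)]
joined = [child for parent, child in events if tree.on_process_created(parent, child)]
```

Because membership is checked against the creator's PID, a grandchild such as PID 102 is picked up automatically once its parent has joined the tree, while the unrelated pair (555, 556) never enters it.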

In a second alternative case, the initial object interacts with other processes during execution of the change command, and those processes belong to the monitoring objects. The first object indicates a process in the first application.

For example, when the initial object executes the change command to process the running result of the process in the first application, the process in the first application is a monitoring object. The process in the first application is associated with the initial object by a system call function.

In a third alternative case, when the initial object accesses a process in a second application via the network during execution of the change command, the object in the second application belongs to the monitoring objects. The first application and the second application may be applications running in a server.

For example, the change command executed by the initial object needs to call a running process in other applications, where the running process is a monitoring object.

In a fourth alternative scenario, the initial object invokes, views or modifies a file of a process in the first application or the second application during execution of the change command, the process belonging to the monitoring object.

It should be noted that the determination of the monitoring objects is not limited to the above four cases. When a monitoring object in the process tree affects another process while executing the change command, that process has an association with the monitoring object and is therefore also a monitoring object.

To solve the problem that monitoring all processes of the whole application generates redundant data and leads to low analysis efficiency, this embodiment provides a data monitoring and analyzing method. Fig. 6 is a schematic flow chart of the data monitoring and analyzing method provided in the present application, in which the steps performed by the monitor and the analyzer are executed by a processor in the server 103. In this embodiment, the steps belong to in-process management: the server 103 executes a received change command, monitors the execution of the change command with the monitor deployed in the server to obtain influence surface data, and then determines whether to terminate the change according to the influence surface data. The data monitoring and analyzing method of this embodiment may be executed by one or more servers.

Referring to fig. 6, the data monitoring and analyzing method provided in the present embodiment includes the following steps S610 to S630.

S610, the server 103 receives the change command.

The change command is used for indicating to execute a change operation on a first object in the first application.

For the step in which the server 103 receives the change command sent by the fort machine 102, and for the manner in which the fort machine 102 obtains the change command, reference may be made to the description of the fort machine shown in fig. 1, which is not repeated here.

The server 103 performs a change operation on the first object to change the behavior of the first application, for example, to add a new function to the first application, to modify an existing function in the first application, and the like. In the present embodiment, three possible cases of performing the change operation are provided.

In a first possible scenario, the change operation is an addition, which refers to adding a new process for executing commands. For example, the server 103 adds a new command-executing process to the first application so that the first application gains a new function.

In a second possible scenario, the change operation is a deletion, which refers to deleting an existing command-executing process or deleting a data file. For example, when the server 103 deletes the process that implements the search function in the first application, the search function in the first application is taken offline; likewise, when the server 103 deletes the data file that supports the search function in the first application, the search function is also taken offline.

In a third possible scenario, the change operation is a modification, which refers to modifying an existing command-executing process or modifying a data file. For example, when the server 103 modifies the process that implements the hot commodity pushing function in the first application, the hot commodity pushing function is changed into an active pushing function.

It is noted that the change operation may also be a combination of the above cases, such as the change operation including addition and modification.

With continued reference to fig. 6, the data monitoring analysis method provided in the present embodiment further includes step S620.

S620, the server 103 determines a monitoring object associated with the first object in the process of executing the change command.

The monitoring object is a process that needs to be scheduled while the server 103 executes the change command. For example, a change command executed in the initial object may call or view another process; that process is associated with the change command and is therefore a monitoring object. Because the change command corresponds to the first object, the first object is associated with the monitoring object.

For further ways of determining the monitoring object, reference may be made to the description of determining the monitoring object given with fig. 5, which is not repeated here. The change command received by the server 103 is an execution command that has been filtered in advance by the fort machine 102; see the description of the fort machine 102 shown in fig. 1, which is not repeated here.

The server 103 uses the monitor deployed in advance to obtain the operating system resources that the monitoring object operates on while the change command is executed, thereby obtaining the influence surface data. For the process by which the monitor obtains the influence surface data, reference may be made to the process of deploying the monitor, which is not repeated here.

This embodiment provides two possible examples of how the influence surface data is acquired.

In a first possible example, the influence surface data is acquired locally at the server 103. For the acquisition method, reference may be made to the description of deploying the monitor, which is not repeated here.

In a second possible example, the server 103 obtains the influence surface data of a remote server, on which a monitor may also be deployed, through a remote procedure call (RPC). The server 103 calls the remote server through RPC to execute the change command, and reads the influence surface data acquired by the monitor in the remote server.
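The second example can be illustrated with a small sketch built on XML-RPC from the Python standard library. The application does not name an RPC framework, so XML-RPC is only a stand-in, and the function names, the loopback address and the returned record are all hypothetical placeholders.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer


def execute_and_monitor(command):
    """Remote side (hypothetical): run the command under the monitor and report."""
    # A real monitor would collect these values; they are hard-coded placeholders.
    return [{"command": command, "pid": 4242,
             "operation_object": "/etc/app/search.conf"}]


def start_remote_server():
    """Start a throwaway RPC endpoint on a free local port as the 'remote server'."""
    server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
    server.register_function(execute_and_monitor)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def fetch_remote_impact_data(url, command):
    """Local side: ask the remote server to execute the command, read back its data."""
    with ServerProxy(url) as proxy:
        return proxy.execute_and_monitor(command)


remote = start_remote_server()
host, port = remote.server_address
records = fetch_remote_impact_data(f"http://{host}:{port}", "update search config")
remote.shutdown()        # stop the serve_forever loop
remote.server_close()    # release the listening socket
```

A single call both triggers execution and returns the collected data, which mirrors the call-then-read sequence described above; a production system would more likely separate the two and add authentication.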

S630, the server 103 determines a risk level corresponding to the change command according to the influence surface data.

Wherein the risk level is used to indicate an impact of the change command on the first application. The server 103 determines whether to alarm the change command according to the risk level of the change command and a preset alarm table. The alarm table is used for indicating the corresponding relation between the risk level and the analysis result, and the analysis result is alarm or not alarm.

For example, the analyzer in the server 103 may determine the risk level of the influence surface data based on a preset rule or a deep learning model, and the server 103 determines whether to raise an alarm according to the risk level. The following describes how the analyzer processes the influence surface data based on a deep learning model.

In a possible implementation, the server 103 inputs the influence surface data collected by the collector in the monitor into a preset risk assessment model for processing, obtains the risk level corresponding to the change command, and determines the analysis result based on the risk level.

The risk assessment model may be obtained by training based on algorithms such as the support vector machine (SVM), density-based spatial clustering of applications with noise (DBSCAN), k-nearest neighbors (KNN), and neural networks. The analysis result indicates whether the server 103 raises an alarm, and the risk level and the analysis result have a correspondence.
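Of the algorithms listed, KNN is the simplest to sketch. The example below is a minimal illustration under stated assumptions: the three numeric features (files touched, remote services accessed, child processes created) are a hypothetical encoding of the influence surface data, and the labelled samples are invented for the example rather than taken from the application.

```python
from collections import Counter
from math import dist

# Hypothetical labelled samples:
# (files touched, remote services accessed, child processes) -> risk level
TRAINING = [
    ((1, 0, 1), 1), ((2, 0, 1), 1), ((1, 1, 2), 1),
    ((6, 2, 4), 2), ((5, 3, 3), 2), ((7, 2, 5), 2),
    ((15, 6, 9), 3), ((18, 5, 12), 3), ((14, 7, 10), 3),
]


def predict_risk_level(features, k=3):
    """Return the majority risk level among the k nearest training samples."""
    # Sort samples by Euclidean distance to the query and keep the k closest.
    nearest = sorted(TRAINING, key=lambda sample: dist(sample[0], features))[:k]
    votes = Counter(level for _, level in nearest)
    return votes.most_common(1)[0][0]


low = predict_risk_level((2, 0, 2))      # a small, local change
high = predict_risk_level((16, 6, 10))   # a change touching many resources
```

In practice the features would need scaling so that no single dimension dominates the distance, and the model would be trained on labelled change commands collected in production.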

By way of example, the alert table is shown in table 2 below.

TABLE 2

Risk level	Analysis results
1	No alarm
2	Alarm
3	Alarm

It should be noted that the correspondence between the risk level and the analysis result shown in table 2 is only an example provided in the present application, and should not be construed as limiting the present application, and the correspondence between the risk level and the analysis result may also include more or less content.

After the analysis result is determined to be an alarm, the server 103 sends an alarm to the terminal 101 or the fort machine 102 and terminates the execution of the current change command.
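The lookup in Table 2 and the follow-up actions can be sketched as below. The function and parameter names are hypothetical, and treating a risk level that is missing from the table as an alarm is an assumption made for this sketch, not something the application states.

```python
# Table 2 as a mapping: risk level -> whether to raise an alarm.
ALARM_TABLE = {1: False, 2: True, 3: True}


def handle_risk_level(level, command, notify, terminate):
    """Look up the analysis result; on an alarm, notify and stop the command."""
    # Assumption: a level absent from the table is treated as an alarm.
    should_alarm = ALARM_TABLE.get(level, True)
    if should_alarm:
        notify(f"risk level {level}: {command}")   # e.g. to the terminal or fort machine
        terminate(command)                         # stop the current change command
    return should_alarm


# The callbacks are stand-ins that simply record what they were asked to do.
sent, stopped = [], []
first = handle_risk_level(1, "add search process", sent.append, stopped.append)
second = handle_risk_level(3, "delete index file", sent.append, stopped.append)
```

Passing the notification and termination actions in as callbacks keeps the table lookup independent of how the alarm is actually delivered.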

Compared with the method of analyzing and processing by using index type data, in the embodiment, the server inputs the influence surface data into the risk assessment model for analysis and processing, and determines the risk level of the change command, so that whether to alarm or not is determined, and the accuracy of the server for alarming is improved.

In this embodiment, the server 103 monitors only the monitoring object associated with the change command, which reduces the monitoring range and improves the monitoring accuracy and efficiency compared with monitoring all processes of the whole application. Moreover, because the range of processes that the server 103 needs to monitor is reduced, the amount of monitoring data generated by the server 103 is reduced, redundant data generated in the data monitoring process is reduced, and the storage resources in the server occupied by the monitoring data are reduced. In addition, the server 103 analyzes only the data of the monitoring object associated with the change command and does not need to analyze redundant data, which improves the data analysis efficiency of the server 103. Furthermore, compared with common technology in which the monitoring data consists of indicator-type data such as performance index data, the monitoring data acquired by the server 103 describes how the monitoring object operates on resources while it runs. The server inputs this monitoring data into the risk assessment model for analysis and determines the risk level of the change command, thereby determining whether to raise an alarm, which improves the accuracy of the alarms raised by the server.

In an alternative implementation, in post hoc management, when one or more servers in the server cluster fail, the failed server issues a failure alarm. The server 103 obtains the alarm information in the failure alarm, searches the operation log associated with the influence surface data according to the alarm information, and determines one or more change commands corresponding to the alarm information.

After the server 103 obtains the influence surface data, the influence surface data is associated with the corresponding operation log. The operation log stores operation records of a plurality of change commands for the first application, including the change command described above.

The alarm information indicates fault data generated when the server executes the change command. The server 103 performs retrieval and matching between the fault data and the influence surface data to obtain one or more matching items of influence surface data; the server 103 then obtains the change command in the operation log according to the correspondence between the influence surface data and the operation log, and outputs the change command to the terminal 101.
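The chain from fault data to influence surface data to change command can be sketched as follows. The log identifiers, record fields and exact-equality matching rule are hypothetical simplifications; the application does not specify how the association is stored.

```python
# Hypothetical operation log: log entry id -> change command.
OPERATION_LOG = {
    "op-1": "update search index config",
    "op-2": "restart push service",
}

# Hypothetical influence surface data, each record linked to one log entry.
IMPACT_DATA = [
    {"log_id": "op-1", "pid": 1302, "operation_object": "/etc/app/search.conf"},
    {"log_id": "op-2", "pid": 1410, "operation_object": "push-service"},
    {"log_id": "op-1", "pid": 1303, "operation_object": "/var/app/index.db"},
]


def commands_for_alarm(fault, impact_data, operation_log):
    """Map fault data to the change commands whose recorded impact matches it."""
    # A record matches when it agrees with the fault data on at least one field.
    matched = [rec for rec in impact_data
               if any(rec.get(key) == value for key, value in fault.items())]
    # dict.fromkeys de-duplicates the log ids while preserving their order.
    log_ids = dict.fromkeys(rec["log_id"] for rec in matched)
    return [operation_log[log_id] for log_id in log_ids]


suspects = commands_for_alarm({"operation_object": "/var/app/index.db"},
                              IMPACT_DATA, OPERATION_LOG)
```

Exact equality is the crudest possible matcher and is used here only to keep the association between the three data sets visible.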

By way of example, the content included in the alarm information is shown in Table 3 below.

TABLE 3

[The content of Table 3 is not reproduced in this text.]

It should be noted that the alert information shown in table 3 is only an example provided in the present application, and should not be construed as limiting the present application, and the alert information may include more or less content.

In one possible scenario, the server 103 matches the alarm information against the influence surface data by using a spatiotemporal retrieval algorithm based on graph computation. The influence surface data obtained by each server is stored in the local memory of that server, and the data stored in the servers is associated according to the logical order in which the servers execute the service. The server 103 determines the failed server based on the alarm information, and then determines the servers to be searched from this association and the failed server. When no data matching the alarm information is found in the influence surface data stored in the servers to be searched, the range of servers to be searched is enlarged until influence surface data matching the alarm information is obtained.

The spatiotemporal retrieval algorithm based on graph computation may use a deep structured semantic model (DSSM) to compute the similarity between the alarm information and each of the multiple items of influence surface data stored in the server. The server obtains the maximum of these similarities and, based on the maximum, obtains the influence surface data corresponding to the alarm information.
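The select-the-maximum step can be illustrated without a trained model. In the sketch below, cosine similarity over bag-of-words counts is swapped in for the DSSM purely as a stand-in; a real DSSM would compare learned embeddings. The function names and sample strings are hypothetical.

```python
from collections import Counter
from math import sqrt


def cosine(a: str, b: str) -> float:
    """Cosine similarity between whitespace-tokenised bag-of-words vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[token] * vb[token] for token in va)
    norm = (sqrt(sum(c * c for c in va.values()))
            * sqrt(sum(c * c for c in vb.values())))
    return dot / norm if norm else 0.0


def best_match(alarm_text, candidates):
    """Return (similarity, candidate) for the most similar candidate, or None."""
    if not candidates:
        return None
    scored = [(cosine(alarm_text, candidate), candidate) for candidate in candidates]
    return max(scored, key=lambda pair: pair[0])


# Hypothetical textual renderings of stored influence surface data.
stored = [
    "pid 1302 write /etc/app/search.conf",
    "pid 1410 restart push-service",
    "pid 1303 read /var/app/index.db",
]
score, winner = best_match("search failure reading /var/app/index.db", stored)
```

Only the scoring function would change if a trained semantic model were used; the selection of the maximum similarity stays the same.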

For example, in a commodity ordering flow, the commodity display server, the ordering server, the inventory server and the distribution server process the order in sequence. Fig. 7 is a schematic association diagram of servers provided in the present application, showing the association between the servers. The overall flow of the example shown in fig. 7 is as follows:

(1) The commodity display server displays the commodity and, after receiving the click-to-order operation of the client, jumps to the ordering server for processing.

(2) After receiving the payment completion instruction, the ordering server jumps to the inventory server, and the inventory server updates the commodity inventory.

(3) After the inventory server updates the commodity inventory, processing jumps to the distribution server to deliver the commodity to the warehouse.

(4) After the distribution server delivers the commodity, it sends an instruction to the ordering server to indicate that delivery of the commodity is complete.

Servers connected by one line segment are directly associated, for example the inventory server and the distribution server; servers connected through two line segments are indirectly associated, for example the commodity display server and the distribution server.

When the failed server is determined to be the inventory server in fig. 7, the server 103 determines the servers directly connected to the inventory server, such as the ordering server and the distribution server, as servers to be searched. The server 103 matches the alarm information against the influence surface data in the ordering server, the inventory server and the distribution server, and determines the influence surface data corresponding to the alarm information. If the server 103 finds no corresponding influence surface data in this matching process, a server indirectly connected to the inventory server, such as the commodity display server, is determined as a server to be searched. The server 103 then matches the influence surface data stored in the commodity display server against the alarm information and determines the influence surface data corresponding to the alarm information. Finally, the server 103 determines the corresponding change command in the operation log based on the influence surface data.
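The widening search over the server association graph can be sketched as a breadth-first expansion. The adjacency below is a reading of the flow described for fig. 7 and is an assumption, as are the function names; the matching test is passed in as a callback so that any matcher can be used.

```python
# Undirected association graph inferred from the example flow (an assumption).
GRAPH = {
    "display": {"order"},
    "order": {"display", "inventory", "distribution"},
    "inventory": {"order", "distribution"},
    "distribution": {"inventory", "order"},
}


def search_outwards(failed, graph, has_match):
    """Search the failed server and its direct neighbours, then widen hop by hop."""
    seen = {failed}
    frontier = {failed} | graph[failed]    # first round: failed server + neighbours
    while frontier:
        hits = sorted(server for server in frontier if has_match(server))
        if hits:
            return hits                    # stop at the nearest ring with a match
        seen |= frontier
        # Next ring: neighbours of the current ring that have not been searched yet.
        frontier = {n for server in frontier for n in graph[server]} - seen
    return []                              # every reachable server was searched


# The matching data is assumed to live on the display server in this run.
found = search_outwards("inventory", GRAPH, lambda server: server == "display")
```

Stopping at the first ring that yields a match keeps the search close to the failed server, so distant servers are only queried when nearer ones have nothing relevant.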

In post hoc management, the server 103 retrieves the influence surface data matching the alarm information by using the spatiotemporal retrieval algorithm based on graph computation, which the server builds on the DSSM. The DSSM uses the words in the text as the finest segmentation granularity, can reuse the semantics expressed by each word, reduces the dependence on word segmentation, and improves the generalization capability of the model; in addition, the DSSM is trained in a supervised manner and therefore achieves higher accuracy. Therefore, in this example, performing the spatiotemporal retrieval based on the DSSM improves the accuracy of matching the alarm information with the influence surface data. The server 103 determines the change command in the operation log corresponding to the influence surface data and outputs the change command to the front end, indicating to the user the change command that may have caused the fault alarm, which shortens the time spent on anomaly investigation and improves its efficiency.

In an alternative implementation, in post hoc management, the server 103 takes the change command and the risk level determined from the corresponding influence surface data as training data, and updates the interceptor deployed in the fort machine 102 according to the training data.

The interceptor in the fort machine is an interception model obtained by training a deep learning model. The server retrains the interception model with training data obtained in actual production to obtain an updated interception model, and redeploys the updated interception model to the fort machine 102. The updated interception model can intercept execution commands whose risk level meets the set condition more accurately, which improves the interception accuracy. Because the server 103 does not need to evaluate execution commands that the interception model has already intercepted, the number of execution commands to be monitored and analyzed in the data monitoring and analyzing process is reduced, and the monitoring and analysis efficiency is improved.
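The feedback loop from analysed change commands back to the interceptor can be sketched as below. A lookup of previously observed commands is swapped in for the deep learning interception model purely to show the data flow; the class name, the threshold used as the set condition, and the sample commands are all hypothetical.

```python
class Interceptor:
    """Toy stand-in for the interception model: blocks commands seen at high risk."""

    def __init__(self, block_level: int = 2) -> None:
        self.block_level = block_level     # assumed 'set condition': level >= threshold
        self.worst_level = {}              # normalised command -> highest level observed

    @staticmethod
    def _normalise(command: str) -> str:
        return " ".join(command.split())   # collapse runs of whitespace

    def retrain(self, training_data) -> None:
        """Fold (change command, risk level) pairs from production into the model."""
        for command, level in training_data:
            key = self._normalise(command)
            self.worst_level[key] = max(level, self.worst_level.get(key, 0))

    def should_block(self, command: str) -> bool:
        return self.worst_level.get(self._normalise(command), 0) >= self.block_level


interceptor = Interceptor()
interceptor.retrain([("rm  -rf /var/app/index", 3), ("ls /var/app", 1)])
```

After retraining, a command that reached a high risk level in production is stopped at the fort machine and never reaches the server, which is the reduction in monitoring load described above.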

In an alternative implementation, the server 103 sends the influence surface data or the risk level to the terminal 101, and the terminal 101 displays the influence surface data or the risk level on the front end. The front end here may be a display connected to the terminal 101, a display screen provided in the terminal 101, or the like, which is not limited in this application.

For example, the server 103 sends the identification of the first object indicated by the influence surface data to the terminal 101 via the fort machine 102, so that the terminal 101 displays the identification of the first object.

As another example, the server 103 sends the identification of the monitoring object indicated by the influence surface data to the terminal 101 via the fort machine 102, so that the terminal 101 displays the identification of the monitoring object.

As a further example, the server 103 sends the change command and the corresponding risk level to the terminal 101 via the fort machine 102, so that the terminal 101 displays the change command and the risk level.

It should be noted that each of the above examples covers the case in which the server 103 transmits only one type of data. In another case, the server 103 may transmit multiple types of data to the terminal 101 at the same time, such as the identification of the first object together with the identification of the monitoring object.

The server 103 sends at least one of the identification of the first object, the identification of the monitoring object and the risk level to the terminal 101 for display, so as to realize the visualization of the data, and the user can timely process the command input into the server according to the visualized data.

It will be appreciated that, in order to implement the functions of the above embodiments, the processing device includes corresponding hardware structures and/or software modules that perform the respective functions. Those of skill in the art will readily appreciate that the elements and method steps of the examples described in connection with the embodiments disclosed herein may be implemented as hardware or a combination of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application scenario and design constraints imposed on the solution.

The data monitoring and analyzing method provided according to the present embodiment is described in detail above with reference to fig. 1 to 7, and the data monitoring and analyzing device provided according to the present embodiment will be described below with reference to fig. 8, where fig. 8 is a schematic structural diagram of a data monitoring and analyzing device provided according to the present application.

The data monitoring and analyzing device can be used for realizing the functions of the processor in the method embodiment, so that the beneficial effects of the method embodiment can be realized. In this embodiment, the data monitoring and analyzing device may be a module (e.g., a chip) applied to the server 103.

As shown in fig. 8, the data monitoring and analyzing device 800 includes a receiving module 810, an object determining module 820, and a level determining module 830. The data monitoring and analyzing device 800 is used to implement the functions described above in the method embodiments shown in fig. 2 to 7.

The receiving module 810 is configured to receive a change command.

An object determining module 820 is configured to determine a monitoring object associated with the first object during the process of executing the change command.

The monitoring object is a process that needs to be scheduled in the process of executing the change command by the server 103. For example, a change command executed in an initial object may invoke or view another process that is associated with the change command, so that the other process belongs to the monitoring object.

In pre-management, the object determining module 820 uses the monitor deployed in advance to acquire the system resources operated on by the monitoring object while the change command is executed, the system resources including the identification of the first object and the identification of the monitoring object, thereby obtaining the influence surface data. For the monitoring means of the monitor, reference may be made to the foregoing process of deploying the monitor, which is not repeated here.

The level determining module 830 is configured to determine a risk level corresponding to the change command according to the identifier of the first object and the identifier of the monitoring object.

The level determining module 830 may determine the risk level of the influence surface data by using a preset rule or a deep learning model.

To further implement the functions described above in the method embodiments shown in fig. 2 to 7, the data monitoring and analyzing device 800 further includes an information obtaining module 840, a retrieving module 850, an updating module 860, a first monitoring module 870, a second monitoring module 880, and a display module 890.

The information obtaining module 840 is configured to obtain alarm information. The retrieving module 850 is configured to retrieve the operation log according to the alarm information. The updating module 860 is configured to update an interception model deployed in the server by taking the change command and the risk level as input. The first monitoring module 870 is configured to monitor system resources by using tracking points and obtain the identification of the first object and the identification of the monitoring object. The second monitoring module 880 is configured to monitor a remote access process, receive a message generated when the monitoring object is operated, and parse the message to obtain the identification of the first object and the identification of the monitoring object. The display module 890 is configured to display at least one of the influence surface data and the risk level.

It should be understood that the server 103 of the foregoing embodiment may correspond to the data monitoring and analyzing apparatus 800 and may correspond to the respective bodies performing the methods according to the embodiments of the present application, and the operations and/or functions of the respective modules in the data monitoring and analyzing apparatus 800 are respectively for implementing the respective flows of the respective methods of the corresponding embodiments in fig. 2 to 7, and are not repeated herein for brevity.

For example, when the data monitoring and analyzing apparatus 800 is implemented by the foregoing server 103, the server 103 may include various hardware, as shown in fig. 9, and fig. 9 is a schematic structural diagram of a server provided in the present application. The server 900 may be used in the operation and maintenance system shown in fig. 1, and the server may be any one of the fort machine 102 and the server 103.

As shown in fig. 9, the server 900 may include a processor 910, a memory 920, a communication interface 930, a bus 940, and the like, with the processor 910, the memory 920, and the communication interface 930 being connected by the bus 940.

Processor 910 is the operational core and control core of server 900. Processor 910 may be a very large scale integrated circuit. An operating system and other software programs are installed in the processor 910, which enable the processor 910 to access the memory 920 and various peripheral component interconnect express (PCIe) devices. The processor 910 includes one or more processor cores. A processor core in the processor 910 is, for example, a central processing unit (CPU) or an application-specific integrated circuit (ASIC). The processor 910 may also be another general-purpose processor, a digital signal processor (DSP), a field programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. In practice, the server 900 may also include multiple processors.

Memory 920 may be used to store computer-executable program code that includes instructions. The processor 910 executes the various functional applications and data processing of the server 900 by executing the instructions stored in the memory 920. The memory 920 may include a program storage area and a data storage area. The program storage area may store, among other things, an operating system and the application programs required for at least one function (such as a model running function or a sending function). The data storage area may store data created during use of the server 900 (for example, influence surface data). In addition, the memory 920 may include a high-speed random access memory, and may also include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS) device.

The communication interface 930 is used to enable communication of the server 900 with external devices or apparatuses. In this embodiment, the communication interface 930 is used for data interaction with other processing devices.

Bus 940 may include a path for transferring information between components (for example, the processor 910, the memory 920, and the communication interface 930). In addition to a data bus, the bus 940 may include a power bus, a control bus, a status signal bus, and the like. For clarity of illustration, however, the various buses are all labeled as bus 940 in the drawing. Bus 940 may be a PCIe bus, an extended industry standard architecture (EISA) bus, a unified bus (Ubus or UB), a compute express link (CXL), a cache coherent interconnect for accelerators (CCIX), or the like. For example, the processor 910 may access I/O devices via a PCIe bus, and the processor 910 is coupled to the memory 920 by a double data rate (DDR) bus. Different memories 920 may use different data buses to communicate with the processor 910, so the DDR bus may be replaced with other types of data buses; the embodiments of the present application do not limit the bus type.

It should be noted that, in fig. 9, only the server 900 includes 1 processor 910 and 1 memory 920 as an example, where the processor 910 and the memory 920 are respectively used to indicate a type of device or apparatus, and in a specific embodiment, the number of each type of device or apparatus may be determined according to service requirements.

In the above embodiments, it may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer programs or instructions. When the computer program or instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are performed in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, a network device, a user device, or other programmable apparatus. The computer program or instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer program or instructions may be transmitted from one website site, computer, server, or data center to another website site, computer, server, or data center by wired or wireless means. The computer readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that integrates one or more available media. The usable medium may be a magnetic medium, e.g., floppy disk, hard disk, tape; optical media, such as digital video discs (digital video disc, DVD); but also semiconductor media such as solid state disks (solid state drive, SSD).

While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and equivalent substitutions may be made without departing from the scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims

1. A method of data monitoring and analysis, the method performed by a server, the method comprising:

receiving a change command, wherein the change command indicates to execute a change operation on a first object in a first application;

determining a monitoring object associated with the first object in the process of executing the change command;

and determining the risk level corresponding to the change command according to the identification of the first object and the identification of the monitoring object.

2. The method of claim 1, wherein the monitoring object comprises one or more of the following:

a second object in the first application, and an object in a second application.

3. The method according to claim 1 or 2, wherein the monitoring object is associated with the first object by a system call function.

4. The method according to any one of claims 1 to 3, wherein the change operation comprises one or more of:

adding, deleting and modifying.

5. The method according to any one of claims 1 to 4, wherein determining the risk level corresponding to the change command according to the identification of the first object and the identification of the monitoring object includes:

inputting the identification of the first object and the identification of the monitoring object into a risk assessment model to determine the risk level corresponding to the change command.

6. The method according to any one of claims 1 to 5, further comprising: acquiring alarm information, wherein the alarm information is used for indicating fault data generated in the running process of the server;

and searching an operation log according to the alarm information, and determining a change command corresponding to the fault data, wherein the operation log is used for indicating operation records of a plurality of change commands aiming at the first application, and the plurality of change commands comprise the change command.

7. The method according to claim 5 or 6, characterized in that the method further comprises:

taking the change command and the risk level as input to update an interception model deployed in the server, wherein the updated interception model is used for intercepting some change commands.

8. The method according to any one of claims 1 to 7, further comprising:

invoking a tracking point of the monitoring object when the monitoring object runs;

and monitoring system resources through the tracking point to obtain the identification of the first object and the identification of the monitoring object.

9. The method according to any one of claims 1 to 7, wherein the monitoring object is a remote access process, the method further comprising:

receiving a message generated when the monitoring object runs;

analyzing the message to obtain the identification of the first object and the identification of the monitoring object.

10. The method according to any one of claims 1 to 9, further comprising:

displaying at least one of the identification of the first object, the identification of the monitoring object and the risk level.
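By way of illustration only, the flow of claims 1 and 5 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the association table `ASSOCIATIONS`, the weights `RISK_WEIGHTS`, the object names, and the score thresholds are all hypothetical, and a simple weighted rule stands in for the risk assessment model.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical association table: first object -> objects it reaches while the
# change command runs (assumed data, not taken from the source).
ASSOCIATIONS = {
    "app1/config.yaml": ["app1/worker", "app2/db-client"],
    "app1/cache": ["app1/worker"],
}

# Hypothetical per-object weights standing in for a trained risk assessment model.
RISK_WEIGHTS = {"app1/worker": 1, "app2/db-client": 3}


@dataclass(frozen=True)
class ChangeCommand:
    operation: str     # "add", "delete", or "modify"
    first_object: str  # identification of the object being changed


def monitoring_objects(cmd: ChangeCommand) -> list[str]:
    """Return only the objects associated with the changed object."""
    return ASSOCIATIONS.get(cmd.first_object, [])


def risk_level(cmd: ChangeCommand, monitored: list[str]) -> str:
    """Toy stand-in for the risk assessment model (thresholds are assumptions)."""
    score = sum(RISK_WEIGHTS.get(obj, 0) for obj in monitored)
    if cmd.operation == "delete":
        score += 2  # assumed: deletions are treated as riskier
    if score >= 4:
        return "high"
    if score >= 2:
        return "medium"
    return "low"


if __name__ == "__main__":
    cmd = ChangeCommand("modify", "app1/config.yaml")
    monitored = monitoring_objects(cmd)
    print(cmd.first_object, monitored, risk_level(cmd, monitored))
```

Because only the objects associated with the first object are looked up, the input to the risk assessment stays small, which mirrors the reduced monitoring range described in the abstract.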

11. A data monitoring and analysis device, the device comprising:

a receiving module, configured to receive a change command, where the change command indicates to perform a change operation on a first object in a first application;

an object determining module, configured to determine a monitoring object associated with the first object in the process of executing the change command;

and a grade determining module, configured to determine the risk level corresponding to the change command according to the identification of the first object and the identification of the monitoring object.

12. The apparatus of claim 11, wherein the monitoring object comprises one or more of:

a second object in the first application, and an object in a second application.

13. The apparatus of claim 11 or 12, wherein the monitoring object is associated with the first object by a system call function.

14. The apparatus of any one of claims 11 to 13, wherein the change operation comprises one or more of:

adding, deleting and modifying.

15. The apparatus of any one of claims 11 to 14, wherein the grade determining module is further configured to: input the identification of the first object and the identification of the monitoring object into a risk assessment model, and determine the risk level corresponding to the change command.

16. The apparatus according to any one of claims 11 to 15, further comprising:

an acquisition module, configured to acquire alarm information, wherein the alarm information is used for indicating fault data generated in the running process of the server;

and a retrieval module, configured to search an operation log according to the alarm information and determine a change command corresponding to the fault data, wherein the operation log is used for indicating operation records of a plurality of change commands for the first application, and the plurality of change commands comprise the change command.

17. The apparatus according to claim 15 or 16, characterized in that the apparatus further comprises:

an updating module, configured to take the change command and the risk level as input and update an interception model deployed in the server, wherein the updated interception model is used for intercepting some change commands.

18. The apparatus according to any one of claims 11 to 17, further comprising:

a first monitoring module, configured to invoke a tracking point of the monitoring object when the monitoring object runs, and to monitor system resources through the tracking point to obtain the identification of the first object and the identification of the monitoring object.

19. The apparatus according to any one of claims 11 to 17, wherein the monitoring object is a remote access process, the apparatus further comprising:

a second monitoring module, configured to receive a message generated when the monitoring object runs, and to analyze the message to obtain the identification of the first object and the identification of the monitoring object.

20. The apparatus according to any one of claims 11 to 19, further comprising:

a display module, configured to display at least one of the identification of the first object, the identification of the monitoring object and the risk level.

21. A server, comprising: a processor and a memory; the memory stores instructions that are invoked by the processor to implement the method of any one of claims 1 to 10.

22. An operation and maintenance system, characterized by comprising a bastion host and a plurality of servers;

the server is configured to execute the change command and to monitor and analyze the process of executing the change command, so as to implement the method of any one of claims 1 to 10.

23. A computer readable storage medium, characterized in that the storage medium has stored therein a computer program or instructions which, when executed by a processing device, implement the method of any of claims 1 to 10.

24. A computer program product comprising a computer program or instructions which, when executed by a processing device, implements the method of any one of claims 1 to 10.
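By way of illustration only, the alarm tracing of claim 6 and the interception model update of claim 7 can be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the record types `LogEntry` and `Alarm`, the matching rule (same target, within a time window before the alarm), the 300-second default window, and the set-based `InterceptionModel` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass(frozen=True)
class LogEntry:
    timestamp: float  # when the change command was executed, in seconds
    command: str      # text of the change command
    target: str       # identification of the first object


@dataclass(frozen=True)
class Alarm:
    timestamp: float  # when the fault data was produced, in seconds
    target: str       # object on which the fault was observed


def find_change_command(alarm: Alarm, log: List[LogEntry],
                        window: float = 300.0) -> Optional[LogEntry]:
    """Return the latest logged command on the alarm's target that ran at most
    `window` seconds before the alarm, or None (assumed matching rule)."""
    candidates = [
        entry for entry in log
        if entry.target == alarm.target
        and 0 <= alarm.timestamp - entry.timestamp <= window
    ]
    return max(candidates, key=lambda entry: entry.timestamp, default=None)


@dataclass
class InterceptionModel:
    """Toy interception model: remembers commands rated high risk."""
    blocked: Set[str] = field(default_factory=set)

    def update(self, command: str, risk_level: str) -> None:
        # Assumed policy: only high-risk commands are intercepted afterwards.
        if risk_level == "high":
            self.blocked.add(command)

    def intercepts(self, command: str) -> bool:
        return command in self.blocked
```

A command located through `find_change_command` can then be passed to `InterceptionModel.update` together with its risk level, so that a later attempt to run the same command is intercepted.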