CN114741218A - Method, device, equipment, system and medium for extracting abnormal index of operating system - Google Patents
- Publication number
- CN114741218A; CN202210289111.3A
- Authority
- CN
- China
- Prior art keywords
- operating system
- index
- resource
- state
- operating
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0706—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/079—Root cause analysis, i.e. error or fault diagnosis
Abstract
One or more embodiments of the present specification provide a method, apparatus, device, distributed system, and storage medium for extracting an abnormal index of an operating system. The method is applied to a scenario with multiple operating systems and comprises the following steps: determining the running state of the operating system according to the acquisition duration of at least one resource of the operating system; if the running state is an unhealthy state, collecting at least one index to be checked of the operating system and obtaining at least one reference index from other operating systems whose running state is a healthy state; and performing outlier analysis on the index to be checked according to the reference index, and extracting the abnormal index of the operating system in the unhealthy state. The method and the device actively identify the running state of the operating system and actively check an operating system in an unhealthy state so as to extract its abnormal index.
Description
Technical Field
One or more embodiments of the present disclosure relate to the field of terminal technologies, and in particular, to a method, an apparatus, a device, a distributed system, and a storage medium for extracting an abnormal index of an operating system.
Background
An Operating System (OS) is a computer program that manages computer hardware and software resources. The operating system handles basic tasks such as managing and configuring memory, prioritizing system resources, controlling input and output devices, operating the network, and managing the file system. The operating system is the foundation on which all application software runs, and whether it runs healthily directly affects the running quality of the application software.
Disclosure of Invention
In view of this, one or more embodiments of the present disclosure provide an abnormal index extraction method, apparatus, device, distributed system, and storage medium for an operating system.
To achieve the above object, one or more embodiments of the present disclosure provide the following technical solutions:
According to a first aspect of one or more embodiments of the present specification, an abnormal index extraction method for an operating system is provided, which is applied to a multi-operating-system scenario. The method comprises the following steps:
determining the running state of the operating system according to the acquisition duration of at least one resource of the operating system;
if the running state is an unhealthy state, collecting at least one index to be checked of the operating system, and obtaining at least one reference index from other operating systems whose running state is a healthy state; and
performing outlier analysis on the index to be checked according to the reference index, and extracting the abnormal index of the operating system in the unhealthy state.
According to a second aspect of one or more embodiments of the present specification, an abnormal index extraction apparatus for an operating system is provided, which is applied to a multi-operating-system scenario. The apparatus comprises:
a running state determining module, configured to determine the running state of the operating system according to the acquisition duration of at least one resource of the operating system;
an index acquisition module, configured to, if the running state is an unhealthy state, collect at least one index to be checked of the operating system and obtain at least one reference index from other operating systems whose running state is a healthy state; and
an abnormal index extraction module, configured to perform outlier analysis on the index to be checked according to the reference index and extract the abnormal index of the operating system in the unhealthy state.
According to a third aspect of one or more embodiments of the present specification, there is provided an electronic apparatus having a plurality of operating systems installed thereon, the electronic apparatus including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of any of the first aspects by executing the executable instructions.
According to a fourth aspect of one or more embodiments of the present specification, a distributed system is provided, where the distributed system includes a plurality of data nodes, and the data nodes are installed with one or more operating systems;
any of the data nodes is configured to perform the method of any of the first aspects.
According to a fifth aspect of one or more embodiments of the present description, a computer-readable storage medium is presented, having stored thereon computer instructions which, when executed by a processor, implement the steps of the method according to any one of the first aspect.
One or more embodiments of the present disclosure provide a method for extracting an abnormal index of an operating system in a multi-operating-system scenario. The method determines the running state of the operating system according to the acquisition duration of at least one resource of the operating system, so that whether the operating system is running healthily is preliminarily determined from the resource dimension and the running condition of the operating system is actively identified. When the running state is an unhealthy state, at least one index to be checked of the operating system is collected, at least one reference index is obtained from other operating systems whose running state is a healthy state, outlier analysis is performed on the index to be checked according to the reference index, and the abnormal index of the operating system is extracted automatically. The embodiments thus provide a technical scheme for actively checking an operating system: by executing the method in real time or periodically, the running condition of the operating system is actively monitored, and when the operating system is abnormal its abnormal index is extracted automatically so that the cause of the abnormality can be located from it. Extracting the abnormal index automatically also helps improve troubleshooting efficiency.
Drawings
Fig. 1 is a block diagram of a distributed system provided by an exemplary embodiment.
Fig. 2 is a flowchart illustrating an abnormal indicator extracting method of an operating system according to an exemplary embodiment.
Fig. 3 is a schematic structural diagram of an electronic device according to an exemplary embodiment.
Fig. 4 is a block diagram of an abnormal index extraction apparatus of an operating system according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. The following description refers to the accompanying drawings in which the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with one or more embodiments of the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of one or more embodiments of the specification, as detailed in the claims which follow.
It should be noted that: in other embodiments, the steps of the corresponding methods are not necessarily performed in the order shown and described herein. In some other embodiments, the method may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
An Operating System (OS) is a computer program that manages computer hardware and software resources. The operating system needs to handle basic transactions such as managing and configuring memory, prioritizing system resources, controlling input devices and output devices, operating the network, and managing the file system. The operating system also provides an operator interface for the user to interact with the system.
Examples of operating systems include, but are not limited to, the Windows operating system, the Linux operating system, the iOS operating system, the Android operating system, or the UNIX operating system. The operating system is the foundation on which all application software runs, and whether it runs healthily directly affects the running quality of the application software.
The embodiments of this specification address a multi-operating-system scenario and provide an abnormal index extraction method that actively checks an operating system. The running state of the operating system is determined according to the acquisition duration of at least one resource of the operating system, so that the running condition of the operating system is actively identified and whether it is healthy is preliminarily determined from the resource dimension. When the running state is an unhealthy state, at least one index to be checked of the operating system is collected, at least one reference index is obtained from other operating systems whose running state is a healthy state, outlier analysis is performed on the index to be checked according to the reference index, and the abnormal index of the operating system is extracted automatically. The method can be executed in real time or periodically to actively monitor the running condition of the operating system and, when the operating system is abnormal, to extract its abnormal index automatically so that the cause of the abnormality can be located from it. Extracting the abnormal index automatically also helps improve troubleshooting efficiency.
In an exemplary embodiment, the method for extracting an abnormal index of an operating system provided in this specification may be applied to a distributed system as shown in fig. 1, where the distributed system is a loosely coupled system formed by interconnecting a plurality of data nodes 100 through a communication line, each data node 100 is a computing device capable of independently processing a certain transaction, and the computing device may be a physical device or a virtual machine. The data node is installed with one or more operating systems. Any data node in the distributed system may execute the method for extracting the abnormal index of the operating system provided in the embodiment of the present description. In one example, the data node is installed with a computer program product, which includes computer programs/instructions, and when the computer programs/instructions are executed by a processor in the data node, the steps in the method for extracting an abnormal index of an operating system provided by the embodiment of the present disclosure can be implemented.
In another exemplary embodiment, the method for extracting an abnormal index of an operating system provided in the embodiment of the present specification may be applied to an electronic device having a plurality of operating systems. In one example, the electronic device is installed with a computer program product, which includes computer programs/instructions, and when the computer programs/instructions are executed by a processor in the electronic device, the steps in the method for extracting an abnormal index of an operating system provided in the embodiment of the present disclosure can be implemented.
Referring to fig. 2, fig. 2 is a schematic flowchart illustrating an abnormal indicator extraction method for an operating system according to an embodiment of the present disclosure, where the method is applicable to a scenario with multiple operating systems; the method comprises the following steps:
In step S101, the running state of the operating system is determined according to the acquisition duration of at least one resource of the operating system.
In step S102, if the running state is an unhealthy state, at least one index to be checked of the operating system is collected, and at least one reference index is obtained from other operating systems whose running state is a healthy state.
In step S103, outlier analysis is performed on the index to be checked according to the reference index, and the abnormal index of the operating system in the unhealthy state is extracted.
In this embodiment, the data node in the distributed system or the electronic device with multiple operating systems may execute step S101 in real time or periodically according to an actual situation to actively perform health detection on the operating condition of the operating system, so as to ensure good operation of the operating system, and execute step S102 and step S103 when the operating condition of the operating system is an unhealthy condition, so as to automatically extract an abnormal index of the operating system and locate an abnormal cause of the operating system.
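The flow of steps S101 to S103 can be sketched as follows. This is only an illustrative outline, not the implementation of the embodiments: the function name `check_operating_system`, the dictionary layouts, and the two injected callables `is_healthy` and `find_outliers` are all hypothetical, and how durations and indexes are actually collected is left out.

```python
from typing import Callable, Dict, List

Durations = Dict[str, float]  # resource name -> acquisition duration
Indexes = Dict[str, float]    # index name -> measured value


def check_operating_system(
    target: str,
    durations: Dict[str, Durations],
    indexes: Dict[str, Indexes],
    is_healthy: Callable[[Durations], bool],
    find_outliers: Callable[[Indexes, List[Indexes]], List[str]],
) -> List[str]:
    """Run S101-S103 for one operating system; return its abnormal indexes."""
    # S101: classify every operating system from its acquisition durations.
    healthy = {name for name, d in durations.items() if is_healthy(d)}
    if target in healthy:
        return []  # healthy: no indexes need to be checked

    # S102: indexes to be checked, plus reference indexes from healthy systems.
    to_check = indexes[target]
    references = [indexes[name] for name in sorted(healthy)]

    # S103: outlier analysis of the indexes to be checked against the references.
    return find_outliers(to_check, references)
```

Passing the health rule and the outlier rule in as callables keeps the sketch neutral about how each embodiment defines them; a caller could run this function periodically for each operating system on a data node.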
In some embodiments, whether the operating state of the operating system is healthy may be determined based on a length of time that one or more resources of the operating system are acquired. For example, the acquisition duration of each resource of the operating system may be counted, and if the acquisition duration of each resource of the operating system is less than or equal to a preset duration, which indicates that the resource acquisition efficiency of the operating system is high, it may be determined that the operating state of the operating system is a healthy state; on the contrary, if the acquisition duration of at least one resource is longer than the preset duration, which indicates that the resource acquisition efficiency of the operating system is low, it may be determined that the operating state of the operating system is an unhealthy state. It can be understood that the preset durations corresponding to the resources may be the same or different, and the specific value of the preset duration may be specifically set according to an actual application scenario, for example, the preset duration corresponding to a certain resource, such as a CPU resource, may be determined according to an average value of the acquisition durations of the CPU resources in one or more operating systems in a healthy state screened by an operator.
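As a minimal sketch of this rule, the code below derives one preset duration per resource from healthy sample systems and then classifies an operating system. The function names and the `margin` multiplier are assumptions made for illustration: the text only says that the preset duration may be determined "according to" the average, so scaling the average by a margin is one possible choice, not the stated one. Durations are in arbitrary but consistent units.

```python
from statistics import mean
from typing import Dict, List


def preset_durations(
    healthy_samples: List[Dict[str, float]], margin: float = 2.0
) -> Dict[str, float]:
    """Preset duration per resource: mean over healthy systems times `margin`.

    `margin` is a hypothetical tuning knob, not something the text specifies.
    """
    resources = healthy_samples[0].keys()
    return {r: mean(s[r] for s in healthy_samples) * margin for r in resources}


def running_state(durations: Dict[str, float], presets: Dict[str, float]) -> str:
    """Healthy only if every resource is acquired within its preset duration."""
    exceeded = [r for r, d in durations.items() if d > presets[r]]
    return "unhealthy" if exceeded else "healthy"
```

For example, with healthy samples `{"cpu": 1.0, "memory": 2.0}` and `{"cpu": 3.0, "memory": 4.0}`, the presets become `{"cpu": 4.0, "memory": 6.0}`, and a system whose CPU acquisition duration is 4.5 is classified as unhealthy.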
In some embodiments, the at least one resource of the operating system includes, but is not limited to: processing resources (e.g., CPU resources), memory resources, network resources, IO resources, or lock resources, etc.
Illustratively, the acquisition duration of the processing resource includes the duration of a processing resource allocation request, i.e., the time from sending the allocation request until the processing resource is allocated. A shorter acquisition duration indicates that more processing resources are idle and that the operating system is healthier; a longer acquisition duration indicates that more processing resources are occupied and that the operating system may be unhealthy.
Illustratively, the acquisition duration of the memory resource includes the duration of a memory resource allocation request, i.e., the time from sending the allocation request until the memory resource is allocated. A shorter acquisition duration indicates that more memory resources are idle and that the operating system is healthier; a longer acquisition duration indicates that more memory resources are occupied and that the operating system may be unhealthy.
Illustratively, the obtaining duration of the network resource includes: the transceiving time of the data packet, i.e. the time required from sending the request data packet to receiving the response data packet. The shorter the acquisition time of the network resource is, the better the network performance of the operating system is, and the healthier the operating state of the operating system is, and on the contrary, the longer the acquisition time is, the worse the network performance of the operating system is, and the operating system may be in an unhealthy state.
Illustratively, the duration of the IO (input output) resource acquisition includes: a response time period of the input operation and/or the output operation. The shorter the response time, the better the read-write performance of the operating system, and the healthier the running state of the operating system, whereas the longer the response time, the worse the read-write performance of the operating system, and the operating system may be in an unhealthy state.
Illustratively, the acquisition duration of the lock resource includes: a wait duration for accessing a shared resource; and under the condition that the shared resource is occupied, other processes accessing the shared resource enter a waiting state until the shared resource is released. If the waiting time is shorter, the running state of the operating system is healthier, and if the waiting time is longer, the operating system may be in an unhealthy state, which means that a large number of processes may need to access the shared resource.
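To make the notion of an acquisition duration concrete, the following sketch measures the waiting duration for a lock in user space. It is only an analogue under stated assumptions: the embodiments collect such durations inside the operating system, whereas here a Python `threading.Lock` stands in for the shared resource, and the helper names `timed_acquire` and `measure_contended_wait` are hypothetical.

```python
import threading
import time


def timed_acquire(lock: threading.Lock) -> float:
    """Acquire `lock` and return how long the caller waited, in seconds."""
    start = time.perf_counter()
    lock.acquire()
    return time.perf_counter() - start


def measure_contended_wait(hold_seconds: float) -> float:
    """Let a helper thread occupy a lock for `hold_seconds`, then measure our wait."""
    lock = threading.Lock()
    held = threading.Event()

    def holder() -> None:
        with lock:
            held.set()                # signal that the shared resource is occupied
            time.sleep(hold_seconds)  # keep it occupied for a while

    worker = threading.Thread(target=holder)
    worker.start()
    held.wait()                       # only start timing once the lock is taken
    waited = timed_acquire(lock)      # blocks until the holder releases the lock
    lock.release()
    worker.join()
    return waited
```

An uncontended `timed_acquire` returns almost immediately, while `measure_contended_wait(0.2)` returns a wait close to the time the lock was held, which mirrors the "shorter is healthier" reading of the lock resource above.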
In some embodiments, the operating system includes a kernel mode and a user mode. A program product implementing the method for extracting an abnormal index of the operating system provided in the embodiments of the present specification may be installed in the user mode, and the acquisition duration of at least one resource of the operating system may be collected in the kernel mode based on BPF technology. The Berkeley Packet Filter (BPF) was designed to provide a way of filtering packets and to avoid unnecessary copying of packets from kernel mode to user mode. The filtering function of BPF is implemented as an interpreter for a BPF virtual machine language; a program in this language can fetch data from a packet, perform arithmetic operations on that data, compare the result with a constant or with other data in the packet, or test bits in the result, and accept or reject the packet according to the outcome of the comparison. Based on BPF technology, program code for counting the acquisition duration of at least one resource of the operating system can be injected into the kernel mode without restarting the operating system. The program code is then run to count the acquisition duration of the at least one resource, and the resulting statistical information is reported to the program product in the user mode.
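The description of the filter as an interpreter for a small virtual machine language can be illustrated with a toy example. The sketch below is not the real BPF instruction set and does not run in any kernel: the opcodes `LOAD_BYTE`, `AND`, `SKIP_IF_EQ`, and `RETURN`, and the program `ACCEPT_IPV4`, are made up purely to show the load, arithmetic, compare, and accept-or-reject pattern.

```python
from typing import List, Tuple

Instruction = Tuple[str, int]  # (opcode, operand); the opcodes are hypothetical


def run_filter(program: List[Instruction], packet: bytes) -> bool:
    """Interpret a tiny filter program; return True to accept the packet."""
    acc = 0  # accumulator register
    pc = 0   # program counter
    while pc < len(program):
        op, arg = program[pc]
        pc += 1
        if op == "LOAD_BYTE":      # acc = packet[arg]
            if arg >= len(packet):
                return False       # reading past the packet rejects it
            acc = packet[arg]
        elif op == "AND":          # bitwise operation on the loaded data
            acc &= arg
        elif op == "SKIP_IF_EQ":   # compare with a constant; skip next if equal
            if acc == arg:
                pc += 1
        elif op == "RETURN":       # non-zero operand accepts, zero rejects
            return arg != 0
        else:
            raise ValueError(f"unknown opcode: {op}")
    return False                   # falling off the end rejects


# Accept packets whose first byte has 4 in its high nibble (an IPv4 header).
ACCEPT_IPV4: List[Instruction] = [
    ("LOAD_BYTE", 0),
    ("AND", 0xF0),
    ("SKIP_IF_EQ", 0x40),
    ("RETURN", 0),  # not IPv4: reject
    ("RETURN", 1),  # IPv4: accept
]
```

Because the program is interpreted rather than executed directly by the processor, a faulty program can at worst reject or accept the wrong packets in this toy model; it cannot corrupt the interpreter's host.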
Of course, the acquisition duration of the at least one resource of the operating system may also be obtained in other ways. In one example, the kernel of the operating system may be reprogrammed so that the kernel mode has a function of reporting the acquisition duration of the at least one resource to the user mode. In another example, a section of program code may be injected into the operating system as a patch; the program code is then run to count the acquisition duration of the at least one resource, and the resulting statistical information is reported to the program product in the user mode. Patching differs from collecting the acquisition duration with BPF technology. Program code injected based on BPF technology is executed by a virtual machine and does not change the operating system, so the safety of the operating system is ensured to the greatest extent and other functions of the operating system are not affected even if the injected program code contains errors. Patched program code, in contrast, is executed directly by a processor of the physical machine, so errors in the injected program code can affect other functions of the operating system.
In some embodiments, if the running state of the operating system is an unhealthy state, outlier detection may be performed on its indexes. For example, at least one index to be checked of the operating system may be collected, at least one reference index may be obtained from at least one other operating system whose running state is a healthy state, and outlier analysis may then be performed on the index to be checked according to the reference index, so as to extract the abnormal index of the operating system. The reference index and the index to be checked are of the same type, such as CPU utilization. In this embodiment, the outlier analysis quickly screens out the abnormal indexes and thereby locates the reason for the abnormality of the operating system.
In general, if the content indicated by the index to be checked is not abnormal, the index to be checked differs little from the one or more reference indexes of the same type; for example, the difference may fall within a preset difference range. If the content indicated by the index to be checked is abnormal, the difference between the index to be checked and the one or more reference indexes of the same type may be large. Outlier analysis detects whether the indexes to be checked contain an outlier index that differs greatly from the one or more reference indexes of the same type; if so, that outlier index is an abnormal index of the operating system.
In a possible implementation manner, the difference between the index to be checked and the reference index may be compared, and if the difference between the index to be checked and the reference index exceeds a preset difference, it may be determined that the index to be checked is an abnormal index of the operating system in a non-healthy state, and the abnormal index is extracted so as to locate an abnormal reason of the operating system. The reference index is an index of the same type as the index to be checked, such as CPU utilization, memory utilization, or packet loss.
In some embodiments, in the case that the operating state of the operating system is an unhealthy state, all indexes to be checked of the operating system may be collected, and all reference indexes in at least one other operating system whose operating state is a healthy state may be obtained; and aiming at each index to be detected, performing outlier analysis on the index to be detected according to at least one reference index belonging to the same type as the index to be detected, thereby extracting the abnormal index of the operating system based on the outlier analysis result. In this embodiment, all indexes to be checked of the operating system are checked, which is beneficial to accurately positioning the root cause of the problem.
In an example, taking the CPU usage rate as an example, for example, the CPU usage rate to be checked of the operating system in the unhealthy state may be compared with one or more reference CPU usage rates of one or more other operating systems in the healthy state, and if differences between the CPU usage rate to be checked and the one or more reference CPU usage rates are greater than a preset difference, it is determined that the CPU usage rate to be checked is an abnormal indicator of the operating system in the unhealthy state, and the CPU resource of the operating system in the unhealthy state has an abnormal problem.
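The comparison described above can be sketched as follows. The function name, the dictionary layout, and the index names are hypothetical. The sketch follows one reading of the text, in which an index is abnormal only when it differs from every same-type reference by more than the preset difference; indexes with no same-type reference are skipped, which is an assumption of this sketch rather than something the text states.

```python
from typing import Dict, List


def extract_abnormal_indexes(
    to_check: Dict[str, float],
    references: List[Dict[str, float]],
    preset_diff: Dict[str, float],
) -> List[str]:
    """Return the names of indexes that are outliers against all references."""
    abnormal = []
    for name, value in to_check.items():
        # Only compare against reference indexes of the same type.
        same_type = [ref[name] for ref in references if name in ref]
        if not same_type:
            continue  # assumption: without a reference, the index is not judged
        if all(abs(value - r) > preset_diff[name] for r in same_type):
            abnormal.append(name)
    return abnormal
```

With a CPU usage of 95.0 to be checked, reference CPU usages of 30.0 and 35.0 from two healthy operating systems, and a preset difference of 20.0, the CPU usage is extracted as an abnormal index, whereas a memory usage of 41.0 against references of 40.0 and 45.0 is not.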
In other embodiments, the number of indexes to be checked of the operating system may be large, possibly tens or even hundreds, so performing outlier analysis on every index separately is inefficient. To improve the analysis efficiency, this specification makes use of the fact that the health of the operating system is determined from the acquisition duration of each resource: if the acquisition duration of a certain resource is greater than the preset duration, that resource may be abnormal. Taking a memory resource as an example, if the acquisition duration of the memory resource is greater than the preset duration, the memory resource of the operating system may be abnormal. Accordingly, each resource whose acquisition duration is greater than the preset duration may be determined as a resource to be checked, and at least one index to be checked related to the resource to be checked is collected; at least one reference index related to the resource to be checked is obtained from other operating systems whose running state is a healthy state; and outlier analysis is then performed on the index to be checked according to the reference index to extract the abnormal index of the operating system in the unhealthy state. In this embodiment, only the indexes to be checked that correspond to the resource to be checked are analyzed, which reduces the number of indexes to analyze and improves the analysis efficiency.
The indexes corresponding to each resource can be determined from a pre-stored mapping relation between resources and indexes. The mapping relation can be set manually, or it can be determined automatically by the electronic device according to the resources involved when each index is collected. The indexes corresponding to different resources may overlap or differ, and can be set according to the actual application scenario.
Illustratively, the processing resource is a CPU resource, and the indexes corresponding to the CPU resource include, but are not limited to, CPU utilization (such as kernel-mode CPU utilization, user-mode CPU utilization, per-application CPU utilization, and soft-interrupt CPU utilization), CPU cache capacity, CPU operating frequency, the percentage of total CPU time that the CPU spends running tasks for an internal virtual machine, the number of context switches per second, the proportion of time the CPU is idle while processes wait for disk I/O (CPU iowait time), or the number of runnable queues.
Illustratively, the indexes corresponding to the memory resource include, but are not limited to, memory usage (such as kernel-mode memory usage, user-mode memory usage, and cache usage), memory operating frequency, storage speed, available memory size, swap partition size, or the percentage of free swap space.
Illustratively, the indexes corresponding to the network resource include, but are not limited to, incoming bytes per second (incoming network traffic), outgoing bytes per second (outgoing network traffic), packet loss rate, transmission rate, bandwidth, throughput, or delay.
Illustratively, the indexes corresponding to the IO resource include, but are not limited to, the percentage of free disk space, disk read speed, disk write speed, disk inode (index node) usage, or disk IO read latency.
For example, the corresponding indicators of the lock resource include, but are not limited to, an occupancy rate of the shared resource, a number of processes that need to use the shared resource, and the like.
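A pre-stored mapping between resources and indexes, and its use to narrow down the indexes to be checked, might look like the sketch below. The mapping contents are a small hand-picked subset of the indexes listed above, and the names `RESOURCE_INDEXES` and `indexes_to_check` are hypothetical; a real mapping would be set manually or derived automatically as described.

```python
from typing import Dict, List

# Hypothetical pre-stored mapping: resource -> related index names.
RESOURCE_INDEXES: Dict[str, List[str]] = {
    "cpu": ["cpu_usage", "context_switches_per_second", "cpu_iowait"],
    "memory": ["memory_usage", "available_memory", "free_swap_percentage"],
    "network": ["packet_loss_rate", "throughput", "delay"],
    "io": ["disk_read_speed", "disk_write_speed", "disk_inode_usage"],
    "lock": ["shared_resource_occupancy", "waiting_process_count"],
}


def indexes_to_check(
    durations: Dict[str, float],
    presets: Dict[str, float],
    mapping: Dict[str, List[str]] = RESOURCE_INDEXES,
) -> List[str]:
    """Select only the indexes of resources whose acquisition duration is too long."""
    selected: List[str] = []
    for resource, duration in durations.items():
        if duration > presets[resource]:  # this is a resource to be checked
            for index in mapping.get(resource, []):
                if index not in selected:  # different resources may share indexes
                    selected.append(index)
    return selected
```

If only the memory resource exceeds its preset duration, only the three memory-related indexes are returned, so the outlier analysis runs on three indexes instead of all fourteen in this mapping.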
In some embodiments, taking as an example the case where the method provided in the embodiments of the present specification is applied to a data node of a distributed system, the distributed system includes a plurality of data nodes, each installed with one or more operating systems. Any data node can count the number of data nodes in the distributed system whose operating system is in an unhealthy state. If that number exceeds a preset threshold, the resources of the distributed system can be considered overloaded, and prompt information indicating that capacity expansion is needed is output. If that number is lower than the preset threshold, steps S102 and S103 may be executed to perform outlier analysis on the indexes to be checked of the operating system in the unhealthy state, so as to extract its abnormal index and locate the cause of the abnormality. It can be understood that the preset threshold may be set according to the actual application scenario, which this embodiment does not limit; for example, the preset threshold may be 60% or 70% of the total number of data nodes in the distributed system. This embodiment realizes active monitoring of the operating system of each data node in the distributed system and helps guarantee the healthy and stable operation of the distributed system.
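The node-level decision can be sketched as below. The function name `triage_cluster`, the action labels, and the simplification of one running state per data node are assumptions for illustration. The text does not say what happens when the count equals the threshold; this sketch treats that case like the below-threshold case.

```python
from typing import Dict, List, Tuple


def triage_cluster(
    node_states: Dict[str, str], threshold_ratio: float = 0.6
) -> Tuple[str, List[str]]:
    """Decide between a capacity-expansion prompt and per-node outlier analysis.

    `node_states` maps a data node name to "healthy" or "unhealthy".
    `threshold_ratio` expresses the preset threshold as a share of all nodes.
    """
    unhealthy = sorted(n for n, s in node_states.items() if s == "unhealthy")
    if len(unhealthy) > threshold_ratio * len(node_states):
        # Most nodes are struggling: treat it as overload of the whole system.
        return ("expand_capacity", unhealthy)
    # Few nodes are unhealthy: run S102 and S103 on each of them.
    return ("analyze_outliers", unhealthy)
```

With five data nodes and a ratio of 0.6, four unhealthy nodes lead to the capacity-expansion prompt, while a single unhealthy node is handed to outlier analysis.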
Fig. 3 is a schematic block diagram of an electronic device according to an exemplary embodiment. Referring to fig. 3, at the hardware level, the device includes a processor 302, an internal bus 304, a network interface 306, a memory 308, and a non-volatile memory 310, and may also include hardware required for other services. One or more embodiments of this specification may be implemented in software, for example by the processor 302 reading a corresponding computer program from the non-volatile memory 310 into the memory 308 and then executing it. Of course, besides a software implementation, the one or more embodiments of this specification do not exclude other implementations, such as logic devices or a combination of software and hardware; that is, the execution subject of the following processing flow is not limited to logic units, and may also be hardware or logic devices. A plurality of operating systems may be installed on the electronic device.
Referring to fig. 4, the abnormal index extraction apparatus of the operating system may be applied to the electronic device shown in fig. 3 or the data node shown in fig. 1 to implement the technical solution of this specification. The abnormal index extraction apparatus of the operating system may include:
a running state determining module 21, configured to determine the running state of the operating system according to the acquisition duration of at least one resource of the operating system;
an index obtaining module 22, configured to, if the running state is an unhealthy state, collect at least one to-be-checked index of the operating system and obtain at least one reference index from other operating systems whose running state is a healthy state; and
an abnormal index extraction module 23, configured to perform outlier analysis on the to-be-checked index according to the reference index and extract an abnormal index of the operating system in the unhealthy state.
In some embodiments, the running state determining module 21 is specifically configured to: determine that the running state of the operating system is a healthy state if the acquisition duration of each resource of the operating system is less than or equal to a preset duration; and determine that the running state of the operating system is an unhealthy state if the acquisition duration of at least one resource is greater than the preset duration.
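The rule above reduces to a single comparison per resource. The sketch below is illustrative only; the function name `determine_running_state`, the millisecond unit, and the resource keys are assumptions, and a single preset duration is shared by all resources, as in the text.

```python
from typing import Mapping


def determine_running_state(acquisition_ms: Mapping[str, float],
                            preset_ms: float) -> str:
    """Classify an operating system from its resource acquisition durations.

    acquisition_ms maps a resource name (for example "io" or "lock") to
    the time it took to acquire that resource, in milliseconds.
    """
    # One resource slower than the preset duration is enough to mark the
    # operating system as unhealthy; otherwise it is healthy.
    if any(duration > preset_ms for duration in acquisition_ms.values()):
        return "unhealthy"
    return "healthy"
```

A duration exactly equal to the preset duration still counts as healthy, which matches the "less than or equal to" wording.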
In some embodiments, the at least one resource of the operating system comprises: a processing resource, a memory resource, a network resource, an IO resource, or a lock resource. The acquisition duration of the processing resource comprises the duration of a request for allocation of the processing resource; the acquisition duration of the memory resource comprises the duration of a request for allocation of the memory resource; the acquisition duration of the network resource comprises the duration of sending and receiving a data packet; the acquisition duration of the IO resource comprises the response duration of an input operation and/or an output operation; and the acquisition duration of the lock resource comprises the waiting duration for accessing a shared resource, wherein at least two processes that access the shared resource at the same time are mutually exclusive.
In some embodiments, the index obtaining module 22 is specifically configured to: determine each resource whose acquisition duration is greater than the preset duration as a resource to be checked, and collect at least one to-be-checked index related to the resource to be checked; the reference index comprises an index related to the resource to be checked in other operating systems whose running state is a healthy state.
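One way to picture this step is a lookup from each slow resource to the indexes worth collecting for it. The sketch below is hypothetical: the index names in `RELATED_INDEXES` loosely follow the network, IO, and lock examples given in this specification, and the two helper functions are not named in the source.

```python
from typing import Dict, Iterable, List, Mapping

# Hypothetical index names, loosely following the examples listed in this
# specification for network, IO and lock resources.
RELATED_INDEXES: Dict[str, List[str]] = {
    "network": ["incoming_bytes_per_s", "outgoing_bytes_per_s", "packet_loss_rate"],
    "io": ["disk_free_pct", "disk_read_speed", "disk_write_speed", "inode_usage_pct"],
    "lock": ["shared_resource_occupancy", "waiting_process_count"],
}


def resources_to_check(acquisition_ms: Mapping[str, float],
                       preset_ms: float) -> List[str]:
    """Return the resources whose acquisition duration exceeds the preset."""
    return [name for name, duration in acquisition_ms.items()
            if duration > preset_ms]


def indexes_to_collect(resources: Iterable[str]) -> List[str]:
    """Return the to-be-checked index names related to the given resources."""
    names: List[str] = []
    for resource in resources:
        # Unknown resources contribute no indexes in this sketch.
        names.extend(RELATED_INDEXES.get(resource, []))
    return names
```

The same index names would then be requested from the healthy operating systems, so that the reference indexes cover exactly the resources under suspicion.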
In some embodiments, the abnormal index extraction module 23 is specifically configured to: determine that the to-be-checked index is an abnormal index of the operating system in the unhealthy state if the difference between the to-be-checked index and the reference index exceeds a preset difference.
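The difference test can be sketched as below. Two points are assumptions not stated in the source: when several healthy operating systems report the same index, their mean is used as the reference value, and the difference is taken as an absolute difference. The function and parameter names are hypothetical.

```python
from statistics import mean
from typing import Dict, Mapping, Sequence


def extract_abnormal_indexes(to_check: Mapping[str, float],
                             references: Mapping[str, Sequence[float]],
                             preset_diff: Mapping[str, float]) -> Dict[str, float]:
    """Return indexes that deviate from the healthy reference by too much.

    to_check holds the index values of the unhealthy operating system,
    references holds the values of the same indexes on healthy operating
    systems, and preset_diff holds the allowed difference per index.
    The result maps each abnormal index name to its observed difference.
    """
    abnormal: Dict[str, float] = {}
    for name, value in to_check.items():
        ref_values = references.get(name)
        if not ref_values:
            # No healthy reference for this index, so it cannot be judged.
            continue
        # Assumption: the mean of the healthy values is the reference.
        diff = abs(value - mean(ref_values))
        if diff > preset_diff[name]:
            abnormal[name] = diff
    return abnormal
```

Using a per-index preset difference keeps indexes with different units, such as a latency and a percentage, from sharing one cutoff.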
In some embodiments, the operating system includes a kernel mode and a user mode; the acquisition duration of at least one resource of the operating system is collected in the kernel mode based on BPF technology.
In some embodiments, the method is applied to a distributed system, wherein the distributed system comprises a plurality of data nodes, and one or more operating systems are installed on the data nodes; or the method is applied to an electronic device having a plurality of operating systems.
In some embodiments, the method is applied to a distributed system, and the apparatus further comprises a quantity statistics module configured to: output prompt information indicating that capacity expansion is needed if the number of data nodes in the distributed system whose operating system is in an unhealthy state exceeds a preset threshold; and extract the abnormal index of the operating system in the unhealthy state if that number is lower than the preset threshold.
In some embodiments, the embodiments of this specification further provide an electronic device on which a plurality of operating systems are installed, the electronic device including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of any one of the above by executing the executable instructions.
In some embodiments, referring to fig. 1, an embodiment of the present specification further provides a distributed system, where the distributed system includes a plurality of data nodes, and the data nodes are installed with one or more operating systems;
any of the data nodes is configured to perform the method of any of the above.
In an exemplary embodiment, a non-transitory computer-readable storage medium comprising instructions, such as a memory comprising instructions, executable by a processor of an apparatus to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
A non-transitory computer-readable storage medium is further provided; when instructions in the storage medium are executed by a processor of a terminal, the terminal is enabled to perform the above-described method.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may be in the form of a personal computer, laptop, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
In a typical configuration, a computer includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as Random Access Memory (RAM), and/or non-volatile memory, such as Read Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic disk storage, quantum memory, graphene-based storage media or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, a computer-readable medium does not include a transitory computer-readable medium such as a modulated data signal or a carrier wave.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing description of specific embodiments has been presented for purposes of illustration and description. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
The terminology used in the description of the one or more embodiments is for the purpose of describing the particular embodiments only and is not intended to be limiting of the description of the one or more embodiments. As used in one or more embodiments of the present specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in one or more embodiments of the present description to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of one or more embodiments herein. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
The above description is only for the purpose of illustrating the preferred embodiments of the one or more embodiments of the present disclosure, and is not intended to limit the scope of the one or more embodiments of the present disclosure, and any modifications, equivalent substitutions, improvements, etc. made within the spirit and principle of the one or more embodiments of the present disclosure should be included in the scope of the one or more embodiments of the present disclosure.
Claims (12)
1. A method for extracting an abnormal index of an operating system, applied to a multi-operating-system scenario, the method comprising:
determining the running state of the operating system according to the acquisition duration of at least one resource of the operating system;
if the running state is an unhealthy state, collecting at least one to-be-checked index of the operating system and acquiring at least one reference index from other operating systems whose running state is a healthy state;
and performing outlier analysis on the to-be-checked index according to the reference index, and extracting the abnormal index of the operating system in the unhealthy state.
2. The method of claim 1, wherein determining the running state of the operating system according to the acquisition duration of the at least one resource of the operating system comprises:
if the acquisition duration of each resource of the operating system is less than or equal to a preset duration, determining that the running state of the operating system is a healthy state;
and if the acquisition duration of at least one resource is greater than the preset duration, determining that the running state of the operating system is an unhealthy state.
3. The method of claim 1, wherein the at least one resource of the operating system comprises: a processing resource, a memory resource, a network resource, an IO resource, or a lock resource;
the acquisition duration of the processing resource comprises: the duration of a request for allocation of the processing resource;
the acquisition duration of the memory resource comprises: the duration of a request for allocation of the memory resource;
the acquisition duration of the network resource comprises: the duration of sending and receiving a data packet;
the acquisition duration of the IO resource comprises: the response duration of an input operation and/or an output operation;
the acquisition duration of the lock resource comprises: the waiting duration for accessing a shared resource; wherein at least two processes that access the shared resource at the same time are mutually exclusive.
4. The method of claim 1, wherein collecting at least one to-be-checked index of the operating system comprises:
determining each resource whose acquisition duration is greater than the preset duration as a resource to be checked, and collecting at least one to-be-checked index related to the resource to be checked;
the reference index comprises an index related to the resource to be checked in other operating systems whose running state is a healthy state.
5. The method of claim 1, wherein performing outlier analysis on the to-be-checked index according to the reference index comprises:
if the difference between the to-be-checked index and the reference index exceeds a preset difference, determining that the to-be-checked index is an abnormal index of the operating system in the unhealthy state.
6. The method of claim 1, wherein the operating system comprises a kernel mode and a user mode;
the acquisition duration of at least one resource of the operating system is collected in the kernel mode based on BPF technology.
7. The method of claim 1, wherein the method is applied to a distributed system comprising a plurality of data nodes, and one or more operating systems are installed on the data nodes; or
the method is applied to an electronic device having a plurality of operating systems.
8. The method of claim 7, applied to a distributed system, further comprising:
if the number of data nodes in the distributed system whose operating system is in an unhealthy state exceeds a preset threshold, outputting prompt information indicating that capacity expansion is needed;
and if the number of data nodes in the distributed system whose operating system is in an unhealthy state is lower than the preset threshold, extracting the abnormal index of the operating system in the unhealthy state.
9. An apparatus for extracting an abnormal index of an operating system, applied to a multi-operating-system scenario, the apparatus comprising:
a running state determining module, configured to determine the running state of the operating system according to the acquisition duration of at least one resource of the operating system;
an index obtaining module, configured to, if the running state is an unhealthy state, collect at least one to-be-checked index of the operating system and acquire at least one reference index from other operating systems whose running state is a healthy state;
and an abnormal index extraction module, configured to perform outlier analysis on the to-be-checked index according to the reference index and extract the abnormal index of the operating system in the unhealthy state.
10. An electronic device having a plurality of operating systems installed thereon, the electronic device comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor implements the method of any one of claims 1 to 8 by executing the executable instructions.
11. A distributed system comprising a plurality of data nodes, wherein one or more operating systems are installed on the data nodes;
any of the data nodes is configured to perform the method of any of claims 1 to 8.
12. A computer readable storage medium having stored thereon computer instructions which, when executed by a processor, carry out the steps of the method according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210289111.3A CN114741218A (en) | 2022-03-22 | 2022-03-22 | Method, device, equipment, system and medium for extracting abnormal index of operating system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210289111.3A CN114741218A (en) | 2022-03-22 | 2022-03-22 | Method, device, equipment, system and medium for extracting abnormal index of operating system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114741218A true CN114741218A (en) | 2022-07-12 |
Family
ID=82277400
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210289111.3A Pending CN114741218A (en) | 2022-03-22 | 2022-03-22 | Method, device, equipment, system and medium for extracting abnormal index of operating system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114741218A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116107843A (en) * | 2023-04-06 | 2023-05-12 | 阿里云计算有限公司 | Method for determining performance of operating system, task scheduling method and equipment |
2022-03-22: CN application CN202210289111.3A, published as CN114741218A, status Pending
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116107843A (en) * | 2023-04-06 | 2023-05-12 | 阿里云计算有限公司 | Method for determining performance of operating system, task scheduling method and equipment |
CN116107843B (en) * | 2023-04-06 | 2023-09-26 | 阿里云计算有限公司 | Method for determining performance of operating system, task scheduling method and equipment |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108874624B (en) | Server, method for monitoring Java process and storage medium | |
US9658910B2 (en) | Systems and methods for spatially displaced correlation for detecting value ranges of transient correlation in machine data of enterprise systems | |
CN112346829B (en) | Method and equipment for task scheduling | |
CN106452818B (en) | Resource scheduling method and system | |
US8132170B2 (en) | Call stack sampling in a data processing system | |
CN110232010A (en) | A kind of alarm method, alarm server and monitoring server | |
CN111309644B (en) | Memory allocation method and device and computer readable storage medium | |
CN113067875B (en) | Access method, device and equipment based on dynamic flow control of micro-service gateway | |
CN114070755B (en) | Virtual machine network flow determination method and device, electronic equipment and storage medium | |
CN114741218A (en) | Method, device, equipment, system and medium for extracting abnormal index of operating system | |
JP2019012477A (en) | Diagnostic program, diagnostic method, and diagnostic apparatus | |
CN108667740A (en) | The method, apparatus and system of flow control | |
CN109408302B (en) | Fault detection method and device and electronic equipment | |
CN116185799A (en) | Interrupt time acquisition method, device, system, communication equipment and storage medium | |
CN110830385A (en) | Packet capturing processing method, network equipment, server and storage medium | |
CN107193721B (en) | Method and device for generating log | |
CN113220495B (en) | Method and device for processing process abnormal event, electronic equipment and storage medium | |
CN116126621A (en) | Task monitoring method of big data cluster and related equipment | |
CN117632454A (en) | Linux operating system resource monitoring method and device, storage medium and electronic equipment | |
CN112882854B (en) | Method and device for processing request exception | |
CN114285647A (en) | Method and device for detecting abnormal access of bucket in distributed object storage system | |
CN108989461B (en) | Multi-control storage balancing method, device, terminal and storage medium | |
CN115460622A (en) | Modeling method, network element data processing method and device, electronic equipment and medium | |
CN116107843B (en) | Method for determining performance of operating system, task scheduling method and equipment | |
US20230071976A1 (en) | Virtual function performance analysis system and analysis method thereof |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||