CN117632564A

CN117632564A - Global health management method and system based on container and object model operating system

Info

Publication number: CN117632564A
Application number: CN202311692174.4A
Authority: CN
Inventors: 吴翔虎; 魏明; 寇光丽
Original assignee: Shenzhen Academy of Aerospace Technology
Current assignee: Shenzhen Academy of Aerospace Technology
Priority date: 2023-12-11
Filing date: 2023-12-11
Publication date: 2024-03-01

Abstract

The invention discloses a global health management method and system based on a container and object model operating system. The method comprises the following steps: acquiring an error, and judging whether the error is synchronous with the currently executing container; if not, searching a preset system security management table and carrying out a first system-level recovery measure; if so, querying the entry related to the current container in a preset multi-container security management table to obtain the error level; if the error level is system level, searching the preset multi-container security management table within the scope of system-level errors and carrying out a second system-level recovery measure; if the error level is non-system level, determining, within the scope of non-system-level errors, the error level of the error in a preset container security management table; if the error is at the container level, searching the preset container security management table and carrying out the corresponding container-level recovery measure according to the search result; if the error is at the object level, passing the error to the error management object for processing. The invention thereby ensures the global safety of the system.

Description

Global health management method and system based on container and object model operating system

Technical Field

The invention relates to the technical field of container and operating system management, in particular to a global health management method and system based on a container and an object model operating system.

Background

A container may be regarded as an execution unit of application software, and the application software of different subsystems may be deployed in different containers. To let applications under multiple operating systems run independently on the same physical computer without interfering with one another, virtual machine technology is typically used to create a container model that implements an ARM container, and a fixed group of DSP processors is packaged at the software level into a single container to implement a DSP container. The ARM container hosts an operating system kernel based on an object model, and scheduling is performed jointly by two levels of operating system: one between containers and one inside each container.

Virtualization refers to the use of computer software technology to present the hardware resources of one physical computer as the resources of multiple devices. Virtualization technology is now widely used on x86 server platforms. As the use of hardware platforms has shifted, the application scenarios and requirements of virtualization have changed as well. In recent years, the spread of mobile devices and the rapid growth in the number of Internet clients have made system software on ARM embedded hardware an important issue for the mobile Internet industry, and virtualization is becoming an important solution in the development of mobile device operating systems.

Among virtualization technologies, container technology is an emerging scheme that, by virtue of being lightweight and efficient, is becoming a new focus of the virtualization field. Docker is currently the most popular container virtualization project. Docker is an open source container engine that makes it easy to create a lightweight, portable, self-sufficient container for any application; at its core is a virtualization solution based on container technology.

Errors can occur while the container and object model operating system is running, but there is currently no effective technical means for diagnosing, classifying, responding to, and handling such errors, or for monitoring the running condition of the system, so the global safety of the system is difficult to ensure.

Disclosure of Invention

In order to overcome the defects of the prior art, the invention provides a global health management method and system based on a container and object model operating system. It addresses the technical problem that the global safety of the system is difficult to guarantee because no effective technical means currently exists for the health management of the container and object model operating system, and thereby achieves effective health management of the container and object model operating system and ensures the global safety of the system.

In order to solve the problems, the technical scheme adopted by the invention is as follows:

a global health management method based on a container and an object model operating system, comprising the steps of:

acquiring errors based on a container and an object model operating system, and judging whether the errors are synchronous with a currently executed container;

if not, searching a preset system security management table, and carrying out a first system level recovery measure according to the searching result;

if yes, querying the entry related to the current container in a preset multi-container security management table, and obtaining the error level of the error in the preset multi-container security management table;

if the error level is the system level, searching the preset multi-container security management table by taking the system level error as a range, and carrying out second system level recovery measures according to the search result;

if the error level is a non-system level, judging the error level of the error in a preset container security management table by taking the non-system level error as a range;

if the error is at the container level, searching the preset container security management table, and carrying out relevant container level recovery measures according to the search result;

and if the error is at the object level, the error is processed by an error management object.

As a preferred embodiment of the present invention, determining the error level of the error in the preset container security management table includes:

if any one of the following holds, namely the error is a container-level error, the error management object is invalid, or the error was generated by the error management object itself, the error is treated as container level;

if the error is an object-level error, the error management object is valid, and the error was not generated by the error management object, the error is treated as object level.
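The classification rule above can be sketched as a small predicate. This is only an illustrative sketch, not the implementation of the invention: the names `Level` and `classify_non_system_error` and the three inputs (the level reported for the error, whether the error management object is valid, and whether the error management object itself raised the error) are hypothetical.

```python
from enum import Enum


class Level(Enum):
    CONTAINER = "container"
    OBJECT = "object"


def classify_non_system_error(reported_level, manager_valid, raised_by_manager):
    """Return the level at which a non-system-level error is handled.

    An error is escalated to container level when it is reported as a
    container-level error, when the error management object is invalid,
    or when the error management object itself generated the error.
    """
    if (reported_level is Level.CONTAINER
            or not manager_valid
            or raised_by_manager):
        return Level.CONTAINER
    # Object-level error, valid manager, error not raised by the manager.
    return Level.OBJECT
```

The point of the rule is that an object-level error only stays at object level when there is a working error management object available to handle it.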

As a preferred embodiment of the present invention, retrieving the preset system security management table includes:

querying the entry related to the current system in the preset system security management table, wherein the retrieval result is one of: a system configuration error during system initialization, a system function execution error, a container scheduling error, and a power failure;

carrying out the first system-level recovery measure according to the retrieval result includes:

for a system configuration error during system initialization, outputting system error information and restarting or shutting down the system;

for a system function execution error, outputting system error information and either taking no action or restarting the system;

for a container scheduling error, outputting system error information and taking no action, restarting the container, or shutting down the container;

for a power failure, outputting system error information and restarting or shutting down the system.
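A table-driven sketch of the first system-level recovery measure is given below. It assumes, purely for illustration, that the preset system security management table stores one configured measure per error and that this measure must come from the set of measures the text permits for that error; the text does not say how one measure is chosen among the permitted ones. All identifiers (`SystemFault`, `Action`, `PERMITTED`, `build_system_table`, `first_system_level_recovery`) are hypothetical.

```python
from enum import Enum, auto


class SystemFault(Enum):
    INIT_CONFIG_ERROR = auto()      # configuration error during system initialization
    FUNCTION_EXEC_ERROR = auto()    # system function execution error
    CONTAINER_SCHED_ERROR = auto()  # container scheduling error
    POWER_FAILURE = auto()


class Action(Enum):
    NONE = auto()                # take no action
    RESTART_CONTAINER = auto()
    SHUTDOWN_CONTAINER = auto()  # "closing" the container
    RESTART_SYSTEM = auto()
    SHUTDOWN_SYSTEM = auto()     # "closing" the system


# Measures the text allows for each error.
PERMITTED = {
    SystemFault.INIT_CONFIG_ERROR: {Action.RESTART_SYSTEM, Action.SHUTDOWN_SYSTEM},
    SystemFault.FUNCTION_EXEC_ERROR: {Action.NONE, Action.RESTART_SYSTEM},
    SystemFault.CONTAINER_SCHED_ERROR: {
        Action.NONE, Action.RESTART_CONTAINER, Action.SHUTDOWN_CONTAINER},
    SystemFault.POWER_FAILURE: {Action.RESTART_SYSTEM, Action.SHUTDOWN_SYSTEM},
}


def build_system_table(configured):
    """Validate a preset table (fault -> one measure) against PERMITTED."""
    for fault in SystemFault:
        action = configured.get(fault)
        if action not in PERMITTED[fault]:
            raise ValueError(f"{fault.name}: {action} is not a permitted measure")
    return dict(configured)


def first_system_level_recovery(table, fault, log=print):
    """Output system error information, then return the configured measure."""
    log(f"SYSTEM ERROR: {fault.name}")
    return table[fault]
```

Validating the table once at configuration time means the recovery path itself is a plain lookup, which is a common choice for code that runs while the system is already in a faulty state.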

As a preferred embodiment of the present invention, obtaining the error level of the error in the preset multi-container security management table includes:

if the error is any one of a container manager software access exception, a system core hardware exception, and a container manager processor exception, the error is at the system level;

carrying out the second system-level recovery measure according to the retrieval result includes:

for a container manager software access exception, outputting system error information and either taking no action or restarting the system;

for a system core hardware exception, outputting system error information and restarting or stopping the system;

for a container manager processor exception, outputting system error information and restarting or stopping the system.

As a preferred embodiment of the present invention, retrieving the preset container security management table includes:

querying the entry related to the current container in the preset container security management table, wherein the retrieval result comprises: ARM container-level errors and DSP container-level errors;

wherein the ARM container-level errors comprise: a configuration error during container initialization, a container initialization error, and a container internal mode switching error;

the DSP container-level errors comprise: a container startup configuration error, a container memory partition initialization error, an error in the container's response to an inter-core interrupt from the main processor, and a container suspend or resume error.

As a preferred embodiment of the present invention, carrying out the corresponding container-level recovery measure according to the retrieval result includes:

for any one of a configuration error during container initialization, a container startup configuration error, and a container memory partition initialization error, outputting container error information and stopping the container;

for any one of a container initialization error, a container internal mode switching error, an error in the container's response to an inter-core interrupt from the main processor, and a container suspend or resume error, outputting container error information and restarting the container.
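The container-level mapping described above is fixed (each error leads to either a stop or a restart), so it can be sketched as a direct lookup. The sketch below is illustrative only; `ContainerFault`, `CONTAINER_TABLE`, `container_level_recovery`, and the `container_id` argument are hypothetical names that do not appear in the source.

```python
from enum import Enum, auto


class ContainerFault(Enum):
    # ARM container-level errors
    ARM_INIT_CONFIG_ERROR = auto()         # configuration error during initialization
    ARM_INIT_ERROR = auto()                # container initialization error
    ARM_MODE_SWITCH_ERROR = auto()         # container internal mode switching error
    # DSP container-level errors
    DSP_STARTUP_CONFIG_ERROR = auto()      # container startup configuration error
    DSP_MEM_PARTITION_INIT_ERROR = auto()  # memory partition initialization error
    DSP_INTERCORE_IRQ_ERROR = auto()       # error responding to inter-core interrupt
    DSP_SUSPEND_RESUME_ERROR = auto()      # suspend or resume error


STOP = "stop_container"
RESTART = "restart_container"

# Configuration-type errors stop the container; the others restart it.
CONTAINER_TABLE = {
    ContainerFault.ARM_INIT_CONFIG_ERROR: STOP,
    ContainerFault.DSP_STARTUP_CONFIG_ERROR: STOP,
    ContainerFault.DSP_MEM_PARTITION_INIT_ERROR: STOP,
    ContainerFault.ARM_INIT_ERROR: RESTART,
    ContainerFault.ARM_MODE_SWITCH_ERROR: RESTART,
    ContainerFault.DSP_INTERCORE_IRQ_ERROR: RESTART,
    ContainerFault.DSP_SUSPEND_RESUME_ERROR: RESTART,
}


def container_level_recovery(container_id, fault, log=print):
    """Output container error information, then return the table's measure."""
    log(f"CONTAINER ERROR [{container_id}]: {fault.name}")
    return CONTAINER_TABLE[fault]
```

One plausible reading of the split is that a container with a bad configuration would fail again if restarted, so it is stopped, while transient runtime errors are worth a restart; the source states the mapping but not this rationale.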

As a preferred embodiment of the present invention, handling the error by the error management object includes:

managing application software access exceptions, system internal exceptions, and processor-related hardware exceptions through the error management object.

In a preferred embodiment of the present invention, managing an application software access exception includes:

checking each parameter of an API call through the error management object, and, for an illegal parameter or a missing parameter, printing error information and returning the corresponding error code; when an application program attempts to access an illegal address, printing error information and exiting the error management object;

wherein the application software access exception comprises: object stack overflow errors, object memory access errors.
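The parameter check described above can be sketched as follows. The sketch covers only the parameter-checking half (illegal or missing parameters), not illegal address accesses. The error code values, the `check_api_params` function, and the idea of describing an API by a mapping from parameter name to validator are all assumptions made for illustration; the source does not define them.

```python
# Hypothetical error codes; the source does not specify concrete values.
E_OK = 0
E_MISSING_PARAM = -1
E_ILLEGAL_PARAM = -2


def check_api_params(api_name, spec, args, log=print):
    """Check the arguments of one API call.

    spec maps each required parameter name to a validator function;
    args maps parameter names to the values the caller supplied.
    Prints error information and returns an error code on the first problem.
    """
    for name, is_valid in spec.items():
        if name not in args:
            log(f"{api_name}: missing parameter '{name}'")
            return E_MISSING_PARAM
        if not is_valid(args[name]):
            log(f"{api_name}: illegal parameter '{name}'={args[name]!r}")
            return E_ILLEGAL_PARAM
    return E_OK
```

For example, a hypothetical `obj_send` API with a `priority` in the range 0 to 31 could be described by `{"priority": lambda p: isinstance(p, int) and 0 <= p < 32}`; calling `check_api_params` with `{"priority": 99}` would then print a message and return `E_ILLEGAL_PARAM`.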

In a preferred embodiment of the present invention, managing a system internal exception includes:

judging whether the system internal exception leaves the processor unable to work or causes it to enter an exception mode:

if the processor cannot work, directly outputting error information and resetting the processor;

if the processor enters an exception mode, outputting fault information according to the type of exception and carrying out the corresponding exception handling, then either returning to normal mode or having the processor execute an infinite loop so that the object model operating system program stops in the exception handler and waits for the user to inspect the fault;

wherein the system internal exceptions include: CPU instruction fetch exception, resource access conflict exception, resource access exception, privilege exception, and data abort exception;

managing a processor-related hardware exception includes:

having the object model operating system enter a unified processor exception handler, printing fault information, and calling a stop function to stop the processor in the exception handler;

wherein the processor-related hardware exceptions comprise: DSP interrupt controller exception, L1P memory protection exception, L1D memory protection exception, and uncorrected bit errors detected by LL2.
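The two handling paths above (system internal exceptions and processor-related hardware exceptions) can be simulated in a few lines. This is a sketch under stated assumptions, not the handler of the invention: the string identifiers, the `Outcome` enum, and the `recoverable` flag (used here to choose between returning to normal mode and halting, a choice the source does not specify) are hypothetical, and a real stop function would not return.

```python
from enum import Enum, auto


class Outcome(Enum):
    RESET = auto()    # processor was reset
    RESUMED = auto()  # returned to normal mode after handling
    HALTED = auto()   # stopped in the exception handler, awaiting inspection


INTERNAL = {"cpu_fetch", "resource_conflict", "resource_access",
            "privilege", "data_abort"}
HARDWARE = {"dsp_interrupt_controller", "l1p_mem_protection",
            "l1d_mem_protection", "ll2_uncorrected_bit_error"}


def stop_processor(log):
    """Stand-in for the stop function; here it only records the halt."""
    log("processor stopped in exception handler")
    return Outcome.HALTED


def handle_exception(kind, processor_operational=True, recoverable=False,
                     log=print):
    if kind in HARDWARE:
        # Unified processor exception handler: print, then stop.
        log(f"FAULT: {kind}")
        return stop_processor(log)
    if kind in INTERNAL:
        if not processor_operational:
            log(f"ERROR: {kind}, resetting processor")
            return Outcome.RESET
        log(f"FAULT: {kind}")
        # Exception-specific handling would run here.
        return Outcome.RESUMED if recoverable else Outcome.HALTED
    raise ValueError(f"unknown exception kind: {kind}")
```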

A global health management system based on a container and object model operating system, comprising:

a first-level judging unit, configured to acquire an error based on the container and object model operating system and to judge whether the error is synchronous with the currently executing container;

a first system error recovery unit, configured, when the error is not synchronous with the currently executing container, to search a preset system security management table and carry out a first system-level recovery measure according to the search result;

a second-level judging unit, configured, when the error is synchronous with the currently executing container, to query the entry related to the current container in a preset multi-container security management table and obtain the error level of the error in the preset multi-container security management table;

a second system error recovery unit, configured, when the error level is system level, to search the preset multi-container security management table within the scope of system-level errors and carry out a second system-level recovery measure according to the search result;

a third-level judging unit, configured, when the error level is non-system level, to determine, within the scope of non-system-level errors, the error level of the error in a preset container security management table;

a container error recovery unit, configured, when the error is at the container level, to search the preset container security management table and carry out the corresponding container-level recovery measure according to the search result;

an object error recovery unit, configured, when the error is at the object level, to pass the error to an error management object for processing.

Compared with the prior art, the invention has the beneficial effects that:

(1) The invention divides the global health management of the container plus object model operating system into three layers: system level, container level, and object level; reporting and responding to errors and faults of a system level, a container level and an application object level through a hierarchical global health management mechanism, thereby effectively preventing the spread of the errors, faults and the like of the system;

(2) According to the invention, system-level error judgment and handling is carried out first, then container-level error judgment and handling, and finally application-object-level error judgment and handling. Through this layer-by-layer diagnosis, classification, response, and handling of errors, the running condition of the container and object model operating system is effectively monitored, so that the global safety of the system is effectively ensured.

The invention is described in further detail below with reference to the drawings and the detailed description.

Drawings

FIG. 1 is a three-level management flow diagram of a global health management mechanism of an embodiment of the present invention;

FIG. 2 is a three-level management block diagram of a global health management mechanism of an embodiment of the present invention.

Detailed Description

The invention provides a global health management method based on a container and an object model operating system, which comprises the following steps:

step S1: acquiring errors based on the container and the object model operating system, and judging whether the errors are synchronous with the currently executed container;

step S2: if not, searching a preset system security management table, and carrying out a first system level recovery measure according to the searching result;

step S3: if yes, querying the entry related to the current container in a preset multi-container security management table, and obtaining the error level of the error in the preset multi-container security management table;

step S31: if the error level is the system level, searching a preset multi-container security management table by taking the system level error as a range, and carrying out second system level recovery measures according to the search result;

step S32: if the error level is a non-system level, judging the error level of the error in a preset container security management table by taking the non-system level error as a range;

step S321: if the error is at the container level, searching a preset container security management table, and carrying out related container level recovery measures according to the searching result;

step S322: if the error is at the object level, the error is passed to the error management object for processing.
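Steps S1 to S322 form a three-level decision chain, which the following sketch walks through end to end. It is a simplified illustration under several assumptions: each table is modelled as a plain dictionary from an error code to a recovery measure, membership in the multi-container table stands for "system level", membership in the container table stands for "reported at container level", and `container_default` is a hypothetical fallback measure for errors escalated to container level that have no table entry. None of these names or encodings come from the source.

```python
from dataclasses import dataclass


@dataclass
class Fault:
    code: str
    synchronous: bool               # synchronous with the executing container?
    manager_valid: bool = True      # is the error management object valid?
    raised_by_manager: bool = False # did the error management object raise it?


def dispatch(fault, system_table, multi_table, container_table,
             error_manager, container_default="restart_container"):
    """Return (level, measure) for one fault, following S1 to S322."""
    # S1 -> S2: asynchronous errors use the system security management table.
    if not fault.synchronous:
        return "system", system_table[fault.code]
    # S3 -> S31: system-level errors in the multi-container table.
    if fault.code in multi_table:
        return "system", multi_table[fault.code]
    # S32: decide between container level and object level.
    container_level = (fault.code in container_table
                       or not fault.manager_valid
                       or fault.raised_by_manager)
    if container_level:
        # S321: container-level recovery from the container table.
        return "container", container_table.get(fault.code, container_default)
    # S322: hand the error to the error management object.
    return "object", error_manager(fault)
```

Because each level is checked before the next, an error is always handled at the widest scope that claims it, which matches the stated aim of preventing errors from spreading.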

In the above step S32, determining the error level of the error in the preset container security management table includes:

if any one of the following holds, namely the error is a container-level error, the error management object is invalid, or the error was generated by the error management object itself, the error is treated as container level;

if the error is an object-level error, the error management object is valid, and the error was not generated by the error management object, the error is treated as object level.

In the step S2, when retrieving the preset system security management table, the method includes:

querying the entry related to the current system in the preset system security management table, wherein the retrieval result is one of: a system configuration error during system initialization, a system function execution error, a container scheduling error, and a power failure;

In the step S3, when an error level of the error in the preset multi-container security management table is obtained, the method includes:

if the error is any one of the software access exception of the container manager, the hardware exception of the system core and the processor exception of the container manager, the error is at a system level;

In the step S321, when retrieving a preset container security management table, the method includes:

querying the entry related to the current container in the preset container security management table, wherein the retrieval result comprises: ARM container-level errors and DSP container-level errors;

DSP container-level errors include: a container startup configuration error, a container memory partition initialization error, an error in the container's response to an inter-core interrupt from the main processor, and a container suspend or resume error.

In the step S321, when performing the related container-level restoration measure according to the search result, the method includes:

In step S322, handling the error by the error management object includes:

management including application software access exceptions, system internal exceptions, and processor-related hardware exceptions is performed by the error management object.

Further, when managing the access exception of the application software, the method includes:

checking each parameter of an API call through the error management object, and, for an illegal parameter or a missing parameter, printing error information and returning the corresponding error code; if an application program attempts to access an illegal address, printing error information and exiting the error management object;

Further, when managing the internal abnormality of the system, the method includes:

when the processor enters an exception mode, outputting fault information according to the type of exception and carrying out the corresponding exception handling, then either returning to normal mode or having the processor execute an infinite loop so that the object model operating system program stops in the exception handler and waits for the user to inspect the fault;

wherein, the system internal exception includes: CPU fetch exception, resource access conflict exception, resource access exception, privilege exception, data abort exception;

wherein the processor-related hardware exceptions include: DSP interrupt controller exception, L1P memory protection exception, L1D memory protection exception, and uncorrected bit errors detected by LL2.

Specifically, the stop function is a panic function.

The invention provides a global health management system based on a container and object model operating system, which comprises: a first-level judging unit, a first system error recovery unit, a second-level judging unit, a second system error recovery unit, a third-level judging unit, a container error recovery unit, and an object error recovery unit.

The first-level judging unit acquires an error based on the container and object model operating system and judges whether the error is synchronous with the currently executing container.

The first system error recovery unit, when the error is not synchronous with the currently executing container, searches the preset system security management table and carries out the first system-level recovery measure according to the search result.

The second-level judging unit, when the error is synchronous with the currently executing container, queries the entry related to the current container in the preset multi-container security management table and obtains the error level of the error in that table.

The second system error recovery unit, when the error level is system level, searches the preset multi-container security management table within the scope of system-level errors and carries out the second system-level recovery measure according to the search result.

The third-level judging unit, when the error level is non-system level, determines, within the scope of non-system-level errors, the error level of the error in the preset container security management table.

The container error recovery unit, when the error is at the container level, searches the preset container security management table and carries out the corresponding container-level recovery measure according to the search result.

The object error recovery unit, when the error is at the object level, passes the error to the error management object for processing.

The following examples are further illustrative of the present invention, but the scope of the present invention is not limited thereto.

The global health management mechanism provided in this embodiment responds to errors as shown in fig. 1.

When an object-level error occurs and the error management object is able to handle it, the error is handled by the error management object.

The three-level management structure of the global health management mechanism provided in this embodiment is shown in fig. 2. Each level of management is specifically as follows:

system level health management is primarily to monitor and handle system level errors, including errors for all containers within the system.

When the error is out of sync with the currently executing container, the system level error includes:

1) a system configuration error during system initialization; 2) a system function execution error; 3) a container scheduling error; 4) a power failure, and the like. These errors can cause the system to malfunction or become unable to operate. When such a fault occurs, the system prints system fault information, and the handling measures mainly include: taking no action, shutting down the container, restarting the container, shutting down the system, and restarting the system.

When an error is synchronized with a currently executing container, the system level error includes:

1) a container manager software access exception; 2) a system core hardware exception; 3) a container manager processor exception. When such a fault occurs, the system prints system fault information, and the handling measures mainly include: taking no action, restarting the system, and stopping the system.

Container-level health management covers ARM containers and DSP containers.

An ARM container-level error is an error of an ARM container (virtual machine), including: 1) a configuration error during container initialization; 2) a container initialization error; 3) a container internal mode switching error, and the like. The response to an ARM container-level error is mainly determined by querying the security management table.

A DSP container-level error is an error of a DSP container, including: 1) a container startup configuration error; 2) a container memory partition initialization error; 3) an error in the container's response to an inter-core interrupt from the main processor; 4) a container suspend or resume error, and the like. The response to a DSP container-level error is mainly determined by querying the security management table.

The response obtained by querying the security management table includes: outputting container error information, and stopping or restarting the container.

Object-level health management is mainly the security management of the object model operating system, including the management of application software access exceptions, system internal exceptions, and processor-related hardware exceptions. Exception management is mainly a matter of timely detection and handling: when an error occurs at a point where diagnostic information can be output and execution enters an exception, control jumps immediately to the corresponding handler, which outputs the fault information. The object-level detection and handling of the various exceptions is as follows:

Application software access exceptions include object stack overflow errors, object memory access errors, and the like. An internal interface of the operating system checks each parameter of an API call, prints error information for an illegal or missing parameter, and then returns the corresponding error code. When an application attempts to access an illegal address, the system prints the error information and exits the object.

System internal exceptions include CPU instruction fetch exceptions, resource access conflict exceptions, resource access exceptions, privilege exceptions, data abort exceptions, and the like. If the exception leaves the processor unable to work, the error information is output directly and the processor is then reset. If the processor enters an exception mode, the fault information is first output according to the type of exception, the corresponding exception handling is then carried out, and finally the processor either returns to normal mode or executes an infinite loop, so that the system program stops in the exception handler and waits for the user to inspect the fault.

Processor-related hardware exceptions include DSP interrupt controller exceptions, L1P memory protection exceptions, L1D memory protection exceptions, uncorrected bit errors detected by LL2, and the like. These faults generally mean that the processor cannot work normally. The system enters a unified processor exception handler, prints the fault information, and calls a panic function to stop the processor in the exception handler.

The above embodiments are only preferred embodiments of the present invention, and the scope of the present invention is not limited thereto, but any insubstantial changes and substitutions made by those skilled in the art on the basis of the present invention are intended to be within the scope of the present invention as claimed.

Claims

1. A global health management method based on a container and an object model operating system, comprising the steps of:

2. The global health management method of a container and object model based operating system according to claim 1, wherein when determining the error level of the error in a preset container security management table, comprising:

3. The global health management method based on a container and object model operating system according to claim 1, wherein when retrieving a preset system security management table, comprising:

4. The global health management method based on a container and object model operating system according to claim 1, wherein when obtaining the error level of the error in the preset multi-container security management table, comprising:

5. The global health management method based on a container and object model operating system according to claim 1 or 2, characterized in that, when retrieving the preset container security management table, it comprises:

6. The global health management method based on a container and object model operating system according to claim 5, wherein when performing the related container-level restoration measure according to the search result, comprising:

7. The global health management method of a container and object model based operating system of claim 1, wherein when the error is handled by an error management object, comprising:

8. The global health management method based on a container and object model operating system according to claim 7, wherein when managing an access abnormality of application software, comprising:

9. The global health management method of a container and object model based operating system according to claim 7, wherein when managing internal anomalies of the system, comprising:

10. A global health management system based on a container and object model operating system, comprising: