CN118152215A - Cloud automation operation and maintenance method and system of terminal equipment, storage device and computer readable medium


Info

Publication number
CN118152215A
Authority
CN
China
Legal status
Pending
Application number
CN202410170352.5A
Other languages
Chinese (zh)
Inventor
曹祯庭
刘正方
赵梧初
Current Assignee
Nanjing Nanshu Data Operation Research Institute Co ltd
Original Assignee
Nanjing Nanshu Data Operation Research Institute Co ltd
Filing date
2024-02-06
Publication date
2024-06-07
Application filed by Nanjing Nanshu Data Operation Research Institute Co ltd


Abstract

The invention provides a cloud automation operation and maintenance method and system for terminal equipment, a storage device and a computer-readable medium. A unified management platform for the terminal equipment is configured in the cloud, and the automation systems, mechanisms and functions required for operation and maintenance are provided on that platform. Software belonging to the terminal equipment, such as application programs and machine learning models, is packaged into independent containers using container technology. These containers are then uniformly managed and deployed to the terminal equipment using container orchestration technology. This achieves automated, efficient, cross-platform and cross-system management from the cloud down to the terminal equipment.

Description

Cloud automation operation and maintenance method and system of terminal equipment, storage device and computer readable medium
Technical Field
The invention relates to the field of cloud computing, and in particular to a cloud automation operation and maintenance method, system, storage device and computer-readable medium for terminal equipment.
Background
Containerization is a software development and deployment technique. It packages an application together with its dependencies and execution environment into a single unit called a "container", which is a lightweight, portable unit of software that can run in different working environments. Containerization is used mainly to improve the portability and scalability of applications, so that a program can run across operating systems and in the cloud.
As projects are gradually put into practice, maintaining them becomes a problem. A single project often has a large number of terminal devices distributed across different regions, so many nodes must be operated and maintained. Routine maintenance and upgrades usually have to be carried out on site. When terminal devices in several regions need maintenance or upgrades at the same time, operation and maintenance personnel must travel to each region, which costs a great deal of time and labor. In addition, different terminal devices run different operating system versions and software architectures, so upgrading, backing up and restoring each device requires customized handling. Operation and maintenance know-how is often passed on only by word of mouth, and on-site documentation for older projects gets lost. As a result, operation and maintenance work frequently becomes difficult to carry out.
Disclosure of Invention
Purpose of the invention: to provide a cross-platform, cross-system and highly automated cloud automation operation and maintenance method for terminal equipment, which comprises the following steps:
Building a management platform: establishing a management platform in the cloud to manage device configuration information and device monitoring information;
Device discovery and registration: configuring a device discovery and registration mechanism on the management platform, so that discovery and registration are completed automatically when a terminal device connects;
Establishing a monitoring system: deploying a monitoring system on the management platform to collect real-time data from the terminal equipment;
Remote management configuration: configuring, on the management platform, a remote management function through which the cloud dynamically configures the terminal equipment;
Software and hardware update mechanism configuration: configuring a software and hardware update mechanism on the management platform, through which the cloud pushes updates of application programs, operating systems and firmware to the terminal equipment in real time;
Fault diagnosis and maintenance tool configuration: configuring a fault diagnosis and maintenance tool on the management platform, so that fault diagnosis and repair of the terminal equipment are performed from the cloud, for example through remote logs or a remote command line;
Program containerization: packaging each application program and its dependencies into an independent container using containerization technology, and configuring the independent containers to the cloud;
Container orchestration: configuring a container orchestration tool on the management platform to automatically coordinate the deployment and operation of multiple independent containers on the terminal equipment;
Model containerization: packaging the machine learning model into an independent container using containerization technology;
Model configuration: configuring the machine learning model packaged in an independent container to the cloud, and synchronizing it to the corresponding terminal equipment;
Model management: using the management platform to centrally manage the parameter configuration of the machine learning model, and synchronizing the updated parameter configuration to the terminal equipment.
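The device discovery and registration step can be illustrated with a minimal sketch. It is not the implementation disclosed here. The sketch assumes that each terminal device announces itself with an ID, OS version and CPU architecture when it connects, and that it keeps reconnecting as a heartbeat. The names `DeviceRegistry`, `on_connect` and `offline_devices`, and the 60-second timeout, are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class DeviceRecord:
    device_id: str
    os_version: str
    arch: str
    last_seen: float = field(default_factory=time.time)


class DeviceRegistry:
    """Hypothetical in-memory stand-in for the platform's device store."""

    def __init__(self, heartbeat_timeout_s: float = 60.0) -> None:
        self.heartbeat_timeout_s = heartbeat_timeout_s
        self._devices: Dict[str, DeviceRecord] = {}

    def on_connect(self, device_id: str, os_version: str, arch: str,
                   now: Optional[float] = None) -> bool:
        """Register an unknown device or refresh a known one.

        Returns True only when the device was newly registered.
        """
        now = time.time() if now is None else now
        record = self._devices.get(device_id)
        if record is None:
            self._devices[device_id] = DeviceRecord(device_id, os_version, arch, now)
            return True
        # Known device: keep its reported environment and heartbeat current.
        record.os_version = os_version
        record.arch = arch
        record.last_seen = now
        return False

    def offline_devices(self, now: Optional[float] = None) -> List[str]:
        """IDs of devices whose last heartbeat is older than the timeout."""
        now = time.time() if now is None else now
        return sorted(d.device_id for d in self._devices.values()
                      if now - d.last_seen > self.heartbeat_timeout_s)
```

Recording the OS version and architecture at registration time lets later steps, such as updates and container scheduling, pick artifacts that match each device's platform.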
Specifically, in the step of establishing the monitoring system, the monitoring system is further configured with an alarm mechanism, and the alarm information is displayed on the cloud when the collected terminal data information is abnormal.
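One simple reading of "abnormal" is a reading that exceeds a configured upper bound. The sketch below evaluates such threshold rules against one device's latest readings and produces messages for the cloud dashboard. This threshold model and the metric names (`cpu_percent`, `temp_c`) are assumptions for illustration. Treating a missing metric as abnormal is likewise a design choice, not something the source specifies.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class AlarmRule:
    metric: str        # hypothetical metric name, e.g. "cpu_percent"
    max_value: float   # raise an alarm when the reading exceeds this bound


def evaluate_alarms(device_id: str, readings: Dict[str, float],
                    rules: List[AlarmRule]) -> List[str]:
    """Return alarm messages to display in the cloud for one device."""
    alarms = []
    for rule in rules:
        value = readings.get(rule.metric)
        if value is None:
            # A metric that stopped reporting is treated as abnormal too.
            alarms.append(f"{device_id}: metric '{rule.metric}' missing")
        elif value > rule.max_value:
            alarms.append(f"{device_id}: {rule.metric}={value} exceeds {rule.max_value}")
    return alarms
```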
Specifically, in the software and hardware update mechanism configuration step, the software and hardware update mechanism is further configured with a version rollback function, and rollback operation is performed when update fails.
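The version rollback function can be sketched as follows, with two assumptions. First, an update counts as failed if installation raises an error or a post-install health check fails. Second, the device keeps a version history with the newest entry last. The `install` and `health_check` callables are hypothetical placeholders for whatever the device agent actually does.

```python
from typing import Callable, List


class UpdateError(Exception):
    """Raised when a newly installed version fails its health check."""


def apply_update(installed: List[str], new_version: str,
                 install: Callable[[str], None],
                 health_check: Callable[[str], bool]) -> str:
    """Install new_version, rolling back to the previous version on failure.

    `installed` is the version history (newest last). Returns the active version.
    """
    previous = installed[-1]
    try:
        install(new_version)
        if not health_check(new_version):
            raise UpdateError(f"health check failed for {new_version}")
    except Exception:
        install(previous)  # version rollback to the last known-good release
        return previous
    installed.append(new_version)  # record only updates that succeeded
    return new_version
```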
Specifically, the management platform is further configured with a machine learning model scaling mechanism. This mechanism automatically increases or decreases the number of replicas of the machine learning model according to the workload monitored in real time.
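The source does not state how the replica count is derived from the workload. One plausible rule is the proportional formula used by the Kubernetes Horizontal Pod Autoscaler: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric), with a tolerance band that prevents flapping. The sketch below borrows that rule. The parameter names and the default bounds are assumptions.

```python
import math


def desired_replicas(current_replicas: int, avg_utilization: float,
                     target_utilization: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Proportional replica count for a model service, clamped to [min, max].

    avg_utilization is the mean load per replica (e.g. CPU percent);
    target_utilization is the per-replica load the operator wants to keep.
    """
    if target_utilization <= 0:
        raise ValueError("target_utilization must be positive")
    ratio = avg_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: do not resize
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 2 replicas running at 90% against a 60% target scale out to 3.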
Specifically, the model configuration step includes the following sub-steps:
Model export: exporting the machine learning model into a format that matches the production environment of the terminal device, such as TensorFlow SavedModel or ONNX;
Inference service configuration: configuring inference services, such as a Web service or a REST API, on the management platform;
Production environment configuration: configuring, in the cloud, the framework and software versions that match the machine learning model;
Security settings: configuring an identity verification mechanism on the management platform;
Production environment performance optimization: optimizing the performance of the production environment of the terminal equipment;
Model monitoring and logging: using the monitoring system to monitor the operating metrics of the machine learning model, and setting up logging;
Model version management: using the software and hardware update mechanism to track changes to the model and the inference service in real time;
Model deployment: deploying the machine learning model using CI/CD automation and container tools such as Docker and Kubernetes;
Model gray release: gradually rolling the machine learning model out to the production environment of the terminal equipment using a gray (canary) release strategy.
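A gray (canary) release is often implemented by hashing each device into a stable bucket, then widening the exposed percentage in stages while the new model stays healthy. The sketch below takes that approach. The stage percentages, the error-rate gate and all function names are assumptions, not details from the source.

```python
import hashlib
from typing import List, Sequence


def rollout_bucket(device_id: str, model_version: str) -> int:
    """Stable bucket in [0, 100): a device gets the same answer on every check."""
    digest = hashlib.sha256(f"{model_version}:{device_id}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def devices_in_stage(device_ids: Sequence[str], model_version: str,
                     percent: int) -> List[str]:
    """Devices that should receive model_version at the given rollout percent."""
    return [d for d in device_ids if rollout_bucket(d, model_version) < percent]


def next_stage(stages: Sequence[int], current: int, error_rate: float,
               max_error_rate: float) -> int:
    """Advance to the next percentage if healthy; drop to 0 (roll back) if not."""
    if error_rate > max_error_rate:
        return 0
    later = [s for s in stages if s > current]
    return later[0] if later else current
```

Because the bucket depends only on the device and model version, each stage's device set contains the previous stage's. Devices that already received the new model therefore keep it as the rollout widens.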
Specifically, in the model monitoring and logging step, the machine learning model operation index includes one or more of performance parameters, accuracy and hardware usage.
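The sketch below summarizes the three kinds of operating metrics named above from a batch of inference records. It assumes that per-request latency and CPU utilization are recorded, and that labelled feedback exists to judge whether each prediction was correct. The record fields are hypothetical, and the 95th-percentile latency uses the nearest-rank method.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class InferenceRecord:
    latency_ms: float    # performance parameter
    correct: bool        # assumes labelled feedback is available
    cpu_percent: float   # hardware utilization sample


def summarize(records: List[InferenceRecord]) -> Dict[str, float]:
    """Aggregate per-request records into model operating metrics."""
    if not records:
        raise ValueError("no records to summarize")
    n = len(records)
    latencies = sorted(r.latency_ms for r in records)
    rank = (95 * n + 99) // 100  # integer ceil(0.95 * n): nearest-rank p95
    return {
        "p95_latency_ms": latencies[rank - 1],
        "accuracy": sum(r.correct for r in records) / n,
        "mean_cpu_percent": sum(r.cpu_percent for r in records) / n,
    }
```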
The invention also provides a cloud automation operation and maintenance system for terminal equipment, which comprises:
a management platform module, configured to manage device configuration information and device monitoring information in the cloud;
a program containerization module, configured to package application programs and their dependencies into independent containers and configure the independent containers to the cloud;
a container orchestration module, configured to automatically coordinate the deployment and operation of multiple independent containers on the terminal equipment;
a model containerization module, configured to package the machine learning model into an independent container and configure it to the cloud;
a model configuration module, configured to configure the machine learning model packaged in an independent container to the cloud and synchronize it to the corresponding terminal equipment;
and a model management module, configured to centrally manage the parameter configuration of the machine learning model and synchronize the updated parameter configuration to the terminal equipment.
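Container orchestration tools commonly work by reconciling a desired state with the observed state. The sketch below shows that idea for a single terminal device. It compares the containers the platform wants, as a name-to-image mapping, with the containers actually running, and it lists the start, replace and stop actions needed to converge. The data shapes and image names are assumptions; a real orchestrator such as Kubernetes does considerably more.

```python
from typing import Dict, List, Tuple

Action = Tuple[str, str, str]  # (verb, container name, image reference)


def reconcile(desired: Dict[str, str], running: Dict[str, str]) -> List[Action]:
    """Compute the actions that bring a device's containers to the desired state."""
    actions: List[Action] = []
    for name, image in sorted(desired.items()):
        current = running.get(name)
        if current is None:
            actions.append(("start", name, image))      # missing container
        elif current != image:
            actions.append(("replace", name, image))    # wrong image version
    for name in sorted(running.keys() - desired.keys()):
        actions.append(("stop", name, running[name]))   # no longer wanted
    return actions
```

Running this comparison repeatedly means a device that was offline during a change still converges when it reconnects.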
Specifically, the management platform module comprises a device discovery and registration sub-module, a monitoring system sub-module, a remote management sub-module, a software and hardware update sub-module, and a fault diagnosis and maintenance sub-module. The device discovery and registration sub-module automatically discovers and registers terminal devices. The monitoring system sub-module collects real-time data from the terminal equipment. The remote management sub-module lets the cloud dynamically configure the terminal equipment. The software and hardware update sub-module lets the cloud push updates of application programs, operating systems and firmware to the terminal equipment in real time. The fault diagnosis and maintenance sub-module performs fault diagnosis, troubleshooting and repair of the terminal equipment from the cloud.
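The fault diagnosis and maintenance sub-module may work through remote logs. In that case, one basic operation is pulling the most recent faults out of a device's log. The sketch assumes a hypothetical line format of `<timestamp> <LEVEL> <message>`, and it treats `ERROR` and `CRITICAL` as fault levels. Real devices may log in other formats.

```python
import re
from collections import deque
from typing import Iterable, List, Sequence

# Assumed log line format: "<timestamp> <LEVEL> <message>"
LOG_PATTERN = re.compile(r"^(?P<ts>\S+)\s+(?P<level>[A-Z]+)\s+(?P<msg>.*)$")


def recent_faults(lines: Iterable[str], limit: int = 20,
                  levels: Sequence[str] = ("ERROR", "CRITICAL")) -> List[str]:
    """Return the last `limit` fault entries, skipping lines that do not parse."""
    tail: deque = deque(maxlen=limit)  # keeps only the newest entries
    for line in lines:
        match = LOG_PATTERN.match(line.rstrip("\n"))
        if match and match.group("level") in levels:
            tail.append(f'{match.group("ts")} {match.group("level")}: {match.group("msg")}')
    return list(tail)
```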
The invention also provides a device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, and is characterized in that the processor realizes the steps of the cloud automation operation and maintenance method of the terminal equipment when executing the computer program.
The invention also provides a computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the cloud automation operation and maintenance method of the terminal device.
Beneficial effects: compared with the prior art, the invention has the following notable effects. A unified management platform for the terminal equipment is configured in the cloud. Software belonging to the terminal equipment, such as application programs and machine learning models, is packaged into independent containers using container technology. Container orchestration technology is then used to manage and configure the terminal equipment uniformly from the cloud. This achieves automated, efficient, cross-platform and cross-system management from the cloud down to the terminal equipment.
Drawings
Fig. 1 is a flow chart of the present invention.
Fig. 2 is a flow chart of the model configuration step of the invention.
Detailed Description
The invention is further elucidated below in connection with the drawings and the detailed description.
Example 1:
Referring to fig. 1, the present embodiment provides a cloud automation operation and maintenance method of a terminal device, which includes the following steps:
Step 1: Building a management platform: a unified management platform is established in the cloud to manage device configuration information and device monitoring information, so that users or operation and maintenance personnel can view and manage device information in real time;
Step 2: Device discovery and registration: a device discovery and registration mechanism is configured on the management platform, so that discovery and registration are completed automatically when a terminal device connects;
Step 3: Establishing a monitoring system: a monitoring system is deployed on the management platform to collect real-time data from the terminal equipment; the monitoring system also has an alarm mechanism that displays alarm information in the cloud when the collected terminal data is abnormal;
Step 4: Remote management configuration: a remote management function, through which the cloud dynamically configures the terminal equipment, is configured on the management platform, so that device configuration information can be updated at any time as required;
Step 5: Software and hardware update mechanism configuration: a software and hardware update mechanism is configured on the management platform, through which the cloud pushes updates of application programs, operating systems and firmware to the terminal equipment in real time; the mechanism also has a version rollback function, so that a rollback can be performed when an update fails;
Step 6: Fault diagnosis and maintenance tool configuration: a fault diagnosis and maintenance tool is configured on the management platform, so that fault diagnosis and repair of the terminal equipment are performed from the cloud, for example through remote logs or a remote command line;
Step 7: Program containerization: each application program and its dependencies are packaged into an independent container using containerization technology, and the independent containers are configured to the cloud;
Step 8: Container orchestration: a container orchestration tool is configured on the management platform to automatically coordinate the deployment and operation of multiple independent containers on the terminal equipment;
Step 9: Model containerization: the machine learning model is packaged into an independent container using containerization technology;
Step 10: Model configuration: the machine learning model packaged in an independent container is configured to the cloud and synchronized to the corresponding terminal equipment;
Step 11: Model management: the management platform is used to centrally manage the parameter configuration of the machine learning model, and the updated parameter configuration is synchronized to the terminal equipment.
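Step 11 leaves open how updated parameter configurations reach devices. A minimal sketch is a versioned store on the cloud side that devices poll. Each device sends the version it holds and receives the latest configuration only when it is behind. The class name, its methods and the example parameter `score_threshold` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ParameterStore:
    """Hypothetical cloud-side store: one versioned parameter set per model."""
    versions: Dict[str, int] = field(default_factory=dict)
    params: Dict[str, Dict[str, Any]] = field(default_factory=dict)

    def publish(self, model: str, new_params: Dict[str, Any]) -> int:
        """Store a new parameter set and bump the model's version number."""
        self.versions[model] = self.versions.get(model, 0) + 1
        self.params[model] = dict(new_params)
        return self.versions[model]

    def sync(self, model: str, device_version: int) -> Optional[Dict[str, Any]]:
        """Return the latest configuration if the device is behind, else None."""
        latest = self.versions.get(model, 0)
        if device_version >= latest:
            return None
        return {"version": latest, "params": dict(self.params[model])}
```

Comparing monotonically increasing version numbers keeps the check cheap. The design also makes synchronization idempotent: a device that polls twice gets nothing new the second time.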
In this embodiment, the management platform is further configured with a machine learning model scaling mechanism. This mechanism automatically increases or decreases the number of replicas of the machine learning model according to the workload monitored in real time, so as to cope with changes in traffic.
Referring to Fig. 2, this embodiment provides the specific flow of the model configuration step of the cloud automation operation and maintenance method, which includes the following sub-steps:
Step 101: Model export: the machine learning model is exported in SavedModel format;
Step 102: Inference service configuration: inference services exposed as a Web service and a REST API are configured on the management platform;
Step 103: Production environment configuration: the framework and software versions that match the exported machine learning model are configured in the cloud;
Step 104: Security settings: an identity verification mechanism is configured on the management platform to restrict access to the machine learning model;
Step 105: Production environment performance optimization: the performance of the production environment of the terminal equipment is optimized so that the model has a good response time in real-time operation;
Step 106: Model monitoring and logging: the monitoring system monitors the operating metrics of the machine learning model, including performance parameters, accuracy and hardware utilization, and logging is set up to facilitate troubleshooting when problems occur;
Step 107: Model version management: the software and hardware update mechanism tracks changes to the model and the inference service in real time, and a rollback strategy is set to deal with new problems that a new model may introduce;
Step 108: Model deployment: the machine learning model is deployed using the automation tool Docker;
Step 109: Model gray release: the machine learning model is gradually rolled out to the production environment of the terminal equipment using a gray (canary) release strategy.
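The identity verification mechanism of step 104 is not specified. One common option for device-to-service calls is a shared-secret HMAC signature over the caller's ID and a timestamp. A maximum clock skew limits how long a captured request can be replayed. The sketch below uses Python's standard `hmac` module. The message layout `"<device_id>:<timestamp>"` and the 300-second window are assumptions.

```python
import hashlib
import hmac
import time
from typing import Optional


def sign_request(secret: bytes, device_id: str, timestamp: int) -> str:
    """HMAC-SHA256 signature over an assumed '<device_id>:<timestamp>' message."""
    message = f"{device_id}:{timestamp}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


def verify_request(secret: bytes, device_id: str, timestamp: int, signature: str,
                   now: Optional[int] = None, max_skew_s: int = 300) -> bool:
    """Accept a call to the model service only if the signature is valid and fresh."""
    now = int(time.time()) if now is None else now
    if abs(now - timestamp) > max_skew_s:
        return False  # stale or future-dated request: limits replay
    expected = sign_request(secret, device_id, timestamp)
    return hmac.compare_digest(expected, signature)  # constant-time comparison
```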
The cloud automation operation and maintenance method for terminal equipment provides a way to configure, monitor, update, troubleshoot and repair, from the cloud, the large number of built-in programs and machine learning models on terminal equipment. It is suitable for cross-platform automated operation and maintenance across multiple scenarios and systems.
Example 2:
This embodiment provides a cloud automation operation and maintenance system for terminal equipment, which comprises:
a management platform module, configured to manage device configuration information and device monitoring information in the cloud;
a program containerization module, configured to package application programs and their dependencies into independent containers and configure the independent containers to the cloud;
a container orchestration module, configured to automatically coordinate the deployment and operation of multiple independent containers on the terminal equipment;
a model containerization module, configured to package the machine learning model into an independent container and configure it to the cloud;
a model configuration module, configured to configure the machine learning model packaged in an independent container to the cloud and synchronize it to the corresponding terminal equipment;
and a model management module, configured to centrally manage the parameter configuration of the machine learning model and synchronize the updated parameter configuration to the terminal equipment.
The management platform module comprises a device discovery and registration sub-module, a monitoring system sub-module, a remote management sub-module, a software and hardware update sub-module, and a fault diagnosis and maintenance sub-module. The device discovery and registration sub-module automatically discovers and registers terminal devices. The monitoring system sub-module collects real-time data from the terminal equipment. The remote management sub-module lets the cloud dynamically configure the terminal equipment. The software and hardware update sub-module lets the cloud push updates of application programs, operating systems and firmware to the terminal equipment in real time. The fault diagnosis and maintenance sub-module performs fault diagnosis, troubleshooting and repair of the terminal equipment from the cloud.
The cloud automation operation and maintenance system for terminal equipment provides an overall solution for managing the operation and maintenance of multiple terminal devices automatically from the cloud. Once the system is deployed in the cloud and the terminal equipment is connected to it, a large number of terminal devices can be operated and maintained automatically and centrally.
Example 3:
The embodiment provides an apparatus, including a memory, a processor, and a computer program stored in the memory and capable of running on the processor, where the steps of the cloud automation operation and maintenance method of the terminal device in embodiment 1 are implemented when the processor executes the computer program.
Example 4:
This embodiment provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the cloud automation operation and maintenance method for terminal equipment described in Embodiment 1.

Claims (10)

1. A cloud automation operation and maintenance method for terminal equipment, characterized by comprising the following steps:
Building a management platform: establishing a management platform in the cloud to manage device configuration information and device monitoring information;
Device discovery and registration: configuring a device discovery and registration mechanism on the management platform, so that discovery and registration are completed automatically when a terminal device connects;
Establishing a monitoring system: deploying a monitoring system on the management platform to collect real-time data from the terminal equipment;
Remote management configuration: configuring, on the management platform, a remote management function through which the cloud dynamically configures the terminal equipment;
Software and hardware update mechanism configuration: configuring a software and hardware update mechanism on the management platform, through which the cloud pushes updates of application programs, operating systems and firmware to the terminal equipment in real time;
Fault diagnosis and maintenance tool configuration: configuring a fault diagnosis and maintenance tool on the management platform, and performing fault diagnosis and repair of the terminal equipment from the cloud;
Program containerization: packaging each application program and its dependencies into an independent container using containerization technology, and configuring the independent containers to the cloud;
Container orchestration: configuring a container orchestration tool on the management platform to automatically coordinate the deployment and operation of multiple independent containers on the terminal equipment;
Model containerization: packaging the machine learning model into an independent container using containerization technology;
Model configuration: configuring the machine learning model packaged in an independent container to the cloud, and synchronizing it to the corresponding terminal equipment;
Model management: using the management platform to centrally manage the parameter configuration of the machine learning model, and synchronizing the updated parameter configuration to the terminal equipment.
2. The cloud automation operation and maintenance method according to claim 1, wherein the model configuration step comprises the following sub-steps:
Model export: exporting the machine learning model into a format that matches the production environment of the terminal device;
Inference service configuration: configuring an inference service on the management platform;
Production environment configuration: configuring, in the cloud, the framework and software versions that match the machine learning model;
Security settings: configuring an identity verification mechanism on the management platform;
Production environment performance optimization: optimizing the performance of the production environment of the terminal equipment;
Model monitoring and logging: using the monitoring system to monitor the operating metrics of the machine learning model, and setting up logging;
Model version management: using the software and hardware update mechanism to track changes to the model and the inference service in real time;
Model deployment: deploying the machine learning model using an automation tool;
Model gray release: gradually rolling the machine learning model out to the production environment of the terminal equipment using a gray release strategy.
3. The cloud automation operation and maintenance method according to claim 1, wherein: in the step of establishing the monitoring system, the monitoring system is further provided with an alarm mechanism, and the alarm information is displayed on the cloud when the collected terminal data information is abnormal.
4. The cloud automation operation and maintenance method according to claim 1, wherein: in the software and hardware updating mechanism configuration step, the software and hardware updating mechanism is also configured with a version rollback function, and rollback operation is carried out when updating fails.
5. The cloud automation operation and maintenance method according to claim 1, wherein the management platform is further provided with a machine learning model scaling mechanism, which automatically increases or decreases the number of replicas of the machine learning model according to the workload monitored in real time.
6. The cloud automation operation and maintenance method according to claim 2, wherein: in the model monitoring and logging steps, the machine learning model operation index comprises one or more of performance parameters, accuracy and hardware utilization rate.
7. A cloud automation operation and maintenance system for terminal equipment, characterized by comprising:
a management platform module, configured to manage device configuration information and device monitoring information in the cloud;
a program containerization module, configured to package application programs and their dependencies into independent containers and configure the independent containers to the cloud;
a container orchestration module, configured to automatically coordinate the deployment and operation of multiple independent containers on the terminal equipment;
a model containerization module, configured to package the machine learning model into an independent container and configure it to the cloud;
a model configuration module, configured to configure the machine learning model packaged in an independent container to the cloud and synchronize it to the corresponding terminal equipment;
and a model management module, configured to centrally manage the parameter configuration of the machine learning model and synchronize the updated parameter configuration to the terminal equipment.
8. The cloud automation operation and maintenance system of claim 7, wherein:
the management platform module comprises a device discovery and registration sub-module, a monitoring system sub-module, a remote management sub-module, a software and hardware update sub-module, and a fault diagnosis and maintenance sub-module; the device discovery and registration sub-module is configured to automatically discover and register terminal devices; the monitoring system sub-module is configured to collect real-time data from the terminal equipment; the remote management sub-module is configured for dynamic configuration of the terminal equipment from the cloud; the software and hardware update sub-module is configured for the cloud to push updates of application programs, operating systems and firmware to the terminal equipment in real time; and the fault diagnosis and maintenance sub-module is configured to perform fault diagnosis, troubleshooting and repair of the terminal equipment from the cloud.
9. An apparatus comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method of any one of claims 1 to 6 when the computer program is executed.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
Application CN202410170352.5A, filed 2024-02-06: Cloud automation operation and maintenance method and system of terminal equipment, storage device and computer readable medium. Status: pending; published as CN118152215A (en).

Publications (1)

Publication Number Publication Date
CN118152215A 2024-06-07


Legal Events

Date Code Title Description
PB01 Publication