CN112463422A - Internet of things fault operation and maintenance method and device, computer equipment and storage medium - Google Patents

Internet of things fault operation and maintenance method and device, computer equipment and storage medium

Info

Publication number
CN112463422A
Authority
CN
China
Prior art keywords
fault
maintenance
state value
data
internet
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011219233.2A
Other languages
Chinese (zh)
Inventor
董学帅
陈旃
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Cormorant Technology Suzhou Co ltd
Original Assignee
Cormorant Technology Suzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Cormorant Technology Suzhou Co ltd
Priority to CN202011219233.2A
Publication of CN112463422A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation the processing taking place on a specific hardware platform or in a specific software environment
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25 Integrating or interfacing systems involving database management systems
    • G06F16/252 Integrating or interfacing systems involving database management systems between a Database Management System and a front-end application
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y40/00 IoT characterised by the purpose of the information processing
    • G16Y40/10 Detection; Monitoring
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16Y INFORMATION AND COMMUNICATION TECHNOLOGY SPECIALLY ADAPTED FOR THE INTERNET OF THINGS [IoT]
    • G16Y40/00 IoT characterised by the purpose of the information processing
    • G16Y40/20 Analytics; Diagnosis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The application discloses a fault operation and maintenance method, a fault operation and maintenance device, computer equipment and a storage medium for the Internet of things, and relates to the technical field of Internet of things networks.

Description

Internet of things fault operation and maintenance method and device, computer equipment and storage medium
Technical Field
The application relates to the technical field of internet of things, in particular to a method and a device for operation and maintenance of faults of the internet of things, computer equipment and a storage medium.
Background
In a traditional internet of things network, existing operation and maintenance management platforms and methods focus on infrastructure (computing, storage and network) and depend on the experience of personnel; that is, operation and maintenance is manual. Maintenance personnel overhaul the equipment/system only after it has failed, and maintenance process data such as fault cause judgment, fault cause testing and the maintenance method are not recorded or summarized. As a result, no effective maintenance fault data pool or knowledge base can be formed: when the same fault occurs again, all maintenance methods and processes must be repeated, and knowledge of previous fault maintenance methods cannot be shared. The current operation and maintenance workload is therefore large and the efficiency is low.
Disclosure of Invention
The embodiment of the application aims to provide an Internet of things fault operation and maintenance method to solve the problem of low network fault operation and maintenance efficiency in an Internet of things network.
In order to solve the technical problem, an embodiment of the present application provides an internet of things fault operation and maintenance method, including the following steps:
acquiring fault data;
inputting fault data into a trained fault diagnosis model to determine a fault type corresponding to the fault data;
and triggering operation and maintenance measures corresponding to the fault types according to a preset fault operation and maintenance strategy.
Further, acquiring fault data comprises:
acquiring running data of an application node in real time, wherein the application node comprises a virtual server and a real server, and the running data comprises a state value;
determining an abnormal state value from the state values according to a preset normal state reference table;
and taking the abnormal state value and the application node corresponding to the abnormal state value as fault data, and storing the fault data in the operation and maintenance database.
Further, when the fault diagnosis model is a deep neural network model, inputting the fault data into the trained fault diagnosis model to determine the fault type corresponding to the fault data includes:
detecting the abnormal state value through a deep neural network model;
if the known fault type corresponding to the abnormal state value is detected, outputting the fault type;
and if the known fault type corresponding to the abnormal state value cannot be detected, taking the abnormal state value as an abnormal variable value, and storing the abnormal variable value into the operation and maintenance database.
Further, the deep neural network model training method includes:
marking the abnormal variable value;
inputting the marked abnormal variable values into a deep neural network model for training, and outputting results;
and if the matching probability of the output result is smaller than a preset matching threshold, performing parameter adjustment on the deep neural network model, and stopping training until the matching probability of the output result reaches the matching threshold.
Further, inputting the marked abnormal variable values into a deep neural network model for training, and outputting the result comprises:
calculating a fault weight value of each fault type according to the abnormal variable value and each fault type preset in the deep neural network model;
and taking the fault type with the fault weight value larger than a preset weight threshold value as an output result.
In order to solve the technical problem, an embodiment of the present application further provides an internet of things fault operation and maintenance device, where the internet of things fault operation and maintenance device includes:
the acquisition module is used for acquiring fault data;
the diagnosis module is used for inputting the fault data into the trained fault diagnosis model so as to determine the fault type corresponding to the fault data;
and the operation and maintenance module is used for triggering operation and maintenance measures corresponding to the fault types according to a preset fault operation and maintenance strategy.
Further, the acquisition module includes:
the acquisition unit is used for acquiring the running data of an application node in real time, wherein the application node comprises a virtual server and a real server, and the running data comprises a state value;
the determining unit is used for determining an abnormal state value from the state values according to a preset normal state reference table;
and the storage unit is used for taking the abnormal state value and the application node corresponding to the abnormal state value as fault data and storing the fault data in the operation and maintenance database.
Further, when the fault diagnosis model is a deep neural network model, the diagnosis module includes:
the detection unit is used for detecting the abnormal state value through the deep neural network model;
the output unit is used for outputting the fault type if the known fault type corresponding to the abnormal state value is detected;
and the exception unit is used for taking the abnormal state value as an abnormal variable value and storing the abnormal variable value into the operation and maintenance database if the known fault type corresponding to the abnormal state value is not detected.
Further, the internet of things fault operation and maintenance device further comprises:
the marking module is used for marking the abnormal variable value;
the training module is used for inputting the marked abnormal variable values into the deep neural network model for training and outputting results;
and the adjusting module is used for adjusting parameters of the deep neural network model if the matching probability of the output result is smaller than a preset matching threshold value, and stopping training until the matching probability of the output result reaches the matching threshold value.
Further, the training module comprises:
the calculating unit is used for calculating a fault weight value of each fault type according to the abnormal variable value and each fault type preset in the deep neural network model;
and the result unit is used for taking the fault type with the fault weight value larger than a preset weight threshold value as an output result.
In order to solve the technical problem, an embodiment of the present application further provides a computer device, which includes a memory and a processor, where the memory stores a computer program, and the processor implements the steps of the internet of things fault operation and maintenance method when executing the computer program.
In order to solve the technical problem, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the internet of things fault operation and maintenance method are implemented.
Compared with the prior art, the embodiment of the application mainly has the following beneficial effects:
by acquiring fault data, inputting the fault data into the trained fault diagnosis model to determine the fault type corresponding to the fault data, and triggering the operation and maintenance measures corresponding to the fault type according to the preset fault operation and maintenance strategy, the workload of manual operation and maintenance is reduced and the operation and maintenance efficiency is improved.
Drawings
In order to more clearly illustrate the solution of the present application, the drawings needed for describing the embodiments of the present application will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a schematic structural diagram of an embodiment of the internet of things fault operation and maintenance method provided by the present application;
FIG. 3 is a flowchart of an embodiment of the internet of things fault operation and maintenance method provided by the present application;
FIG. 4 is a schematic structural diagram of an embodiment of the internet of things fault operation and maintenance device provided by the present application;
FIG. 5 is a schematic block diagram of one embodiment of a computer device provided herein.
Detailed Description
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs; the terminology used in the description of the application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application; the terms "including" and "having," and any variations thereof, in the description and claims of this application and the description of the above figures are intended to cover non-exclusive inclusions. The terms "first," "second," and the like in the description and claims of this application or in the above-described drawings are used for distinguishing between different objects and not for describing a particular order.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a web browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to a smart phone, a tablet computer, an e-book reader, an MP3 player (Moving Picture Experts Group Audio Layer III), an MP4 player (Moving Picture Experts Group Audio Layer IV), a laptop portable computer, a desktop computer, and the like.
The server 105 may be a server providing various services, such as a background server providing support for pages displayed on the terminal devices 101, 102, 103.
It should be noted that the internet of things fault operation and maintenance method provided by the embodiment of the present application is generally executed by a server/terminal device, and accordingly, the internet of things fault operation and maintenance device is generally arranged in the server/terminal device.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continuing reference to fig. 2, fig. 2 is a schematic structural diagram of an embodiment of the internet of things fault operation and maintenance method provided by the present application. This embodiment takes an internet of things network as its application scenario, and the method is executed by an internet of things application delivery device. The device comprises an internet of things network data analysis module, a machine learning module and an artificial intelligence control module. The network data analysis module analyzes the operation data of the internet of things network, converts it into a data format that meets requirements, and screens fault data out of the operation data. The fault data is input into the trained fault diagnosis model in the machine learning module to obtain the fault type of the fault data, and the operation and maintenance measures corresponding to the fault type are triggered from the fault operation and maintenance strategy in the artificial intelligence control module, thereby making network operation and maintenance intelligent.
Further, the internet of things network of the present application is a seven-layer network, which refers to the Open System Interconnection (OSI) model. The OSI model enables reliable communication between different systems and different networks through a seven-layer structure comprising an application layer, a presentation layer, a session layer, a transport layer, a network layer, a data link layer and a physical layer.
With continued reference to fig. 3, a flowchart of an embodiment of a method for fault operation and maintenance of the internet of things of the present application is shown. The Internet of things fault operation and maintenance method comprises the following steps:
S301: acquiring fault data.
In the embodiment of the application, the running data of each application delivery node, each virtual server and each real server are collected in real time. The running data are performance data, and the performance data comprise state values such as CPU utilization, memory utilization, connection number and bandwidth.
Further, the specific process of acquiring the fault data includes:
acquiring running data of an application node in real time, wherein the application node comprises a virtual server and a real server, and the running data comprises a state value;
determining an abnormal state value from the state values according to a preset normal state reference table;
and taking the abnormal state value and the application node corresponding to the abnormal state value as fault data, and storing the fault data in the operation and maintenance database.
Specifically, when the application node is a virtual server, the following operation data is collected for each virtual server in real time: the current active connection number, the current active data packet number, the current active bandwidth, the maximum/average/minimum response time of the virtual server to the first data packet requested by the client, the maximum/average/minimum response time of all data packets returned after the virtual server processes the client request, the current maximum/average/minimum RTT (round-trip time), the WAF (web application firewall) intrusion number and the WAF processing number. The RTT is composed of the propagation time of the link, the processing time of the end systems, and the queuing and processing time in router buffers.
Specifically, when the application node is a real server, the current active connection number, the current active bandwidth number, the maximum/average/minimum response time of the real server to the first data packet requested by the client, the maximum/average/minimum response time of all data packets returned after the real server processes the client request, the current RTT maximum/average/minimum, and the number of connections maintained in the current session are collected.
Specifically, when the application node is an application delivery node, the CPU utilization, the memory utilization, the total connection number, the total packet number, the total bandwidth, the real-time status of the virtual server such as UP/DOWN/disable, the real-time status of the virtual service sub-node such as UP/DOWN/disable, the real-time status of the real server such as UP/DOWN/disable, the current active connection number, the current active packet number, and the current active bandwidth number are collected in real time.
The preset normal state reference table holds the operation data of different application nodes in normal operation, that is, it records the normal state value range of each application node. The currently acquired state value of an application node is compared against the normal state value range of that node, and when the state value is outside the range, it is marked as an abnormal state value. For example, the mark may be a tag carrying a preset field, or a color mark on the field where the state value is located, so that the fault source can be located accurately.
Furthermore, the abnormal state value and the application node generating the abnormal state value are used as fault data to be stored in the operation and maintenance database, so that the corresponding fault type can be more accurately identified by adopting a fault diagnosis model subsequently, and a data basis is provided for training the fault diagnosis model.
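The range check described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's implementation: the table layout, the node name "vs-1", the metric names, the numeric ranges and the in-memory list standing in for the operation and maintenance database are all hypothetical.

```python
from dataclasses import dataclass

# Hypothetical normal state reference table:
# node id -> metric name -> (lowest normal value, highest normal value).
NORMAL_STATE_TABLE = {
    "vs-1": {"cpu_utilization": (0.0, 0.85), "active_connections": (0, 5000)},
}


@dataclass(frozen=True)
class FaultRecord:
    """One item of fault data: the abnormal state value and its node."""
    node: str
    metric: str
    value: float


def find_abnormal_states(node, state_values, table=NORMAL_STATE_TABLE):
    """Return a FaultRecord for every state value outside its normal range."""
    ranges = table.get(node, {})
    records = []
    for metric, value in state_values.items():
        if metric not in ranges:
            continue  # no reference range recorded for this metric; skip it
        low, high = ranges[metric]
        if not (low <= value <= high):
            records.append(FaultRecord(node, metric, value))
    return records


# A plain list stands in for the operation and maintenance database here.
ops_database = []
ops_database.extend(
    find_abnormal_states("vs-1", {"cpu_utilization": 0.97, "active_connections": 1200})
)
```

Keeping the node identifier inside each record mirrors the text's point that the abnormal state value is stored together with the application node that produced it.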
S302: inputting the fault data into a trained fault diagnosis model to determine the fault type corresponding to the fault data.
The trained fault diagnosis model comprises interest information such as an abnormal state value and a corresponding application node of each fault type, and a weight value corresponding to the interest information of each fault type, wherein the weight values corresponding to the same interest information of different fault types can be the same or different.
Further, determining the fault type corresponding to the fault data specifically includes:
detecting the abnormal state value through a deep neural network model;
if the known fault type corresponding to the abnormal state value is detected, outputting the fault type;
and if the known fault type corresponding to the abnormal state value cannot be detected, taking the abnormal state value as an abnormal variable value, and storing the abnormal variable value into the operation and maintenance database.
In this embodiment, detecting the abnormal state value through the deep neural network model includes checking whether the abnormal state value matches the interest information of a fault type. For example, suppose the abnormal state values are that the number of active data packets of the virtual server is m and that the response time of the virtual server to the first data packet requested by the client is n, the interest information of fault type A includes m and n, and the interest information of fault type B includes m, n and x'. Fault type A and fault type B are then preliminarily detected as corresponding to the abnormal state values. Next, m and n are combined with their weights in fault type A and in fault type B to calculate the fault weight value A' of the abnormal state values for fault type A and the fault weight value B' for fault type B, and each fault weight value is compared with a preset weight threshold. A fault weight value greater than the threshold indicates that the abnormal state value is a key factor of that fault type, that is, the abnormal state value affects the normal operation of the application node, so the fault type corresponding to the abnormal state value can be determined. One or more fault types may be output: for example, if the fault weight values A' and B' are both greater than the preset weight threshold, fault type A and fault type B are both output, and they may be ordered by fault weight value. The fault weight value may be calculated by multiplication and accumulation: for example, if the weights corresponding to m and n in fault type A are a and b respectively, then A' = m × a + n × b, where a and b are preset percentage values that both lie in the range (0, 1).
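The multiply-accumulate scoring in this example can be sketched as below. The sketch covers only the weighted-sum and threshold step, not a deep neural network; the metric names, the weight values, the threshold and the sample values of m and n are assumptions chosen for illustration.

```python
# Hypothetical fault types: each maps its metrics of interest to a weight in (0, 1).
FAULT_TYPES = {
    "A": {"active_packets": 0.5, "first_packet_response_time": 0.25},
    "B": {
        "active_packets": 0.25,
        "first_packet_response_time": 0.125,
        "waf_intrusions": 0.5,
    },
}


def fault_weight(abnormal, interest_weights):
    """Multiply-accumulate: sum of value * weight over the metrics both share."""
    return sum(abnormal[m] * w for m, w in interest_weights.items() if m in abnormal)


def diagnose(abnormal, fault_types=FAULT_TYPES, weight_threshold=1.0):
    """Return (fault type, fault weight value) pairs above the threshold,
    ordered from the largest fault weight value to the smallest."""
    candidates = []
    for name, weights in fault_types.items():
        if not set(abnormal) & set(weights):
            continue  # no interest information matches this fault type
        score = fault_weight(abnormal, weights)
        if score > weight_threshold:
            candidates.append((name, score))
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


# m = 2.0 active packets (assumed to be normalized), n = 1.0 response time.
abnormal_values = {"active_packets": 2.0, "first_packet_response_time": 1.0}
result = diagnose(abnormal_values)  # A' = 2.0*0.5 + 1.0*0.25 = 1.25; B' = 0.625
```

With the assumed threshold of 1.0 only fault type A is output; lowering the threshold to 0.5 would output both A and B, with A first. The text does not say how state values are scaled before weighting, so treating m and n as already normalized is a further assumption.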
Further, if the known fault type corresponding to the abnormal state value is not detected, for example, the abnormal state value includes that the WAF intrusion amount is x, and the interested information does not exist in all the fault types, the abnormal state value is used as an abnormal variable value, an alarm can be further issued at this time to prompt manual diagnosis and manual operation and maintenance, and a new fault type redefined after the manual operation and maintenance and a corresponding operation and maintenance measure are stored in the operation and maintenance database together to be used as sample data for subsequently adjusting the deep neural network model.
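One way to picture this fallback path is the sketch below. The record layout, the function names and the alert callback are hypothetical; the text only states that the unmatched value is stored, an alarm is issued, and the manually defined fault type and measure are saved as later sample data.

```python
def route_unmatched(abnormal, fault_types, ops_database, alert):
    """Store abnormal values that no known fault type is interested in,
    and raise an alert so that they are diagnosed manually."""
    known_metrics = set().union(*fault_types.values())
    unmatched = {m: v for m, v in abnormal.items() if m not in known_metrics}
    for metric, value in unmatched.items():
        # label is None until operation and maintenance staff define a fault type
        ops_database.append({"metric": metric, "value": value, "label": None})
        alert(f"unknown fault: {metric}={value}; manual diagnosis required")
    return unmatched


def record_manual_resolution(ops_database, metric, fault_type, measure):
    """Attach the manually defined fault type and measure to the stored
    abnormal variable values, making them usable as training samples."""
    for row in ops_database:
        if row["metric"] == metric and row["label"] is None:
            row["label"] = fault_type
            row["measure"] = measure


# Hypothetical data: no known fault type lists WAF intrusions as interest information.
known_types = {"A": {"active_packets": 0.5}}
database, alerts = [], []
route_unmatched({"waf_intrusions": 7}, known_types, database, alerts.append)
record_manual_resolution(database, "waf_intrusions", "C", "block_source")
```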
Further, the deep neural network model training method includes:
marking the abnormal variable value;
inputting the marked abnormal variable values into a deep neural network model for training, and outputting results;
and if the matching probability of the output result is smaller than a preset matching threshold, performing parameter adjustment on the deep neural network model, and stopping training until the matching probability of the output result reaches the matching threshold.
In the embodiment of the application, the training data for training the deep neural network model may be obtained from the operation and maintenance database, and the historical operation data may be obtained from each application delivery node by day, month and year. The historical operation data of each application delivery node includes the number of connections, the number of packets, the bandwidth, TPS (Transactions Per Second) and the number of SSL (Secure Sockets Layer) connections. The historical operation data of each virtual server includes the number of active connections, the active bandwidth, the maximum/average/minimum response time of the virtual server to the first data packet requested by the client, the maximum/average/minimum response time of all data packets returned after the virtual server processes the client request, the maximum/average/minimum RTT, the number of WAF intrusions, the number of WAF processes and WEB data compression. The historical operation data of each real server includes the number of active connections, the active bandwidth, the maximum/average/minimum response time of the real server to the first data packet requested by the client, the maximum/average/minimum response time of all data packets returned after the real server processes the client request, the maximum/average/minimum RTT, session holding connections, and the like.
Specifically, inputting the marked abnormal variable values into a deep neural network model for training, and outputting the result comprises the following steps:
calculating a fault weight value of each fault type according to the abnormal variable value and each fault type preset in the deep neural network model;
and taking the fault type with the fault weight value larger than a preset weight threshold value as an output result.
In the embodiment of the present application, training is based on supervised learning. The abnormal variable values are labeled, for example with tags, and the labeled abnormal variable values are input into an initial deep neural network model for training; the initial deep neural network model is configured with parameters corresponding to the known fault types. The output result of training is matched against the fault type corresponding to each labeled abnormal variable value to obtain a matching probability, which is the accuracy of the fault type comparison. For example, the fault types output for 1000 labeled abnormal variable values are compared with the fault types that actually correspond to them; a consistent comparison means the abnormal variable value was successfully matched to its fault type. If 950 comparisons are consistent, the matching probability is 950 ÷ 1000 × 100% = 95%. If the preset matching threshold is 98%, parameter adjustment is performed on the deep neural network model. The parameter adjustment may take abnormal variable values that have been clustered and show distinct interest information features as the abnormal state values of a new fault type in the deep neural network model and assign the weights corresponding to those abnormal state values. The labeled abnormal variable values are then input into the adjusted deep neural network model again for training, so that they can be better matched to the corresponding fault types. Training continues in this way until the matching probability of the output result reaches the matching threshold, so that more abnormal state values can be identified with their corresponding fault types, improving identification accuracy and efficiency.
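The stopping rule in this paragraph can be sketched as below. The model is represented as any callable from a sample to a fault type, and `adjust` is a hypothetical placeholder for the parameter adjustment step; `max_rounds` is an added safety limit that the text does not mention.

```python
def matching_probability(predicted, actual):
    """Fraction of predictions that agree with the labeled fault types."""
    if not actual or len(predicted) != len(actual):
        raise ValueError("predicted and actual must be non-empty and equal in length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


def train_until_threshold(model, samples, labels, adjust, threshold=0.98, max_rounds=100):
    """Adjust the model repeatedly until the matching probability reaches the
    threshold (or max_rounds is exhausted). Returns (model, probability, rounds)."""
    probability = matching_probability([model(s) for s in samples], labels)
    rounds = 0
    while probability < threshold and rounds < max_rounds:
        model = adjust(model, samples, labels)  # hypothetical adjustment step
        probability = matching_probability([model(s) for s in samples], labels)
        rounds += 1
    return model, probability, rounds


# The worked figures from the text: 950 of 1000 outputs agree with the labels.
example_probability = matching_probability(["A"] * 950 + ["B"] * 50, ["A"] * 1000)

# Toy run: a model that always answers "A", adjusted by memorizing the labels.
toy_samples = list(range(10))
toy_labels = ["A"] * 9 + ["B"]
memorize = lambda model, xs, ys: dict(zip(xs, ys)).get
_, toy_probability, toy_rounds = train_until_threshold(
    lambda s: "A", toy_samples, toy_labels, memorize
)
```

The 95% figure falls below the assumed 98% threshold, so the loop would enter at least one adjustment round, which is the situation the paragraph describes.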
S303: and triggering operation and maintenance measures corresponding to the fault types according to a preset fault operation and maintenance strategy.
The preset fault operation and maintenance strategy consists of the operation and maintenance measures that operation and maintenance personnel have recorded in advance in the operation and maintenance database for each known fault type. The operation and maintenance measures include operation and maintenance code segments, a capacity expansion tool, a file calling tool and the like. For example, when the fault type is link jitter, packet loss, or delay, and the operation and maintenance measure retrieved for that fault type is capacity expansion, the capacity expansion tool is used to expand capacity.
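One way to picture such a strategy is as a lookup table from fault type to measure. The following Python sketch is illustrative only: the table `FAULT_STRATEGY`, the function `expand_capacity`, and the node identifier are hypothetical names, and the real measures would be the code segments and tools recorded in the operation and maintenance database.

```python
from typing import Callable, Dict, Optional


def expand_capacity(node: str) -> str:
    """Stand-in for the capacity expansion tool (hypothetical)."""
    return f"capacity expansion triggered for {node}"


# Hypothetical strategy table: each known fault type maps to one measure.
FAULT_STRATEGY: Dict[str, Callable[[str], str]] = {
    "link jitter": expand_capacity,
    "packet loss": expand_capacity,
    "delay": expand_capacity,
}


def trigger_measure(
    fault_type: str,
    node: str,
    strategy: Dict[str, Callable[[str], str]] = FAULT_STRATEGY,
) -> Optional[str]:
    """Look up and run the measure for a fault type; return None if none is recorded."""
    measure = strategy.get(fault_type)
    if measure is None:
        return None
    return measure(node)
```

Keeping the strategy as data rather than as branching logic means that a newly recognized fault type can be served by adding one table entry.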
In the embodiment of the application, the fault data is acquired and input into the trained fault diagnosis model to determine the fault type corresponding to the fault data, and the operation and maintenance measures corresponding to the fault type are triggered according to the preset fault operation and maintenance strategy, so that the workload of manual operation and maintenance is reduced, and the operation and maintenance efficiency is improved.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium and which, when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a Read-Only Memory (ROM), or may be a Random Access Memory (RAM).
It should be understood that, although the steps in the flowcharts of the figures are shown in the order indicated by the arrows, the steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time or in sequence, but may be performed at different times and in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
With further reference to fig. 4, as an implementation of the method shown in fig. 3, the present application provides an embodiment of an internet of things fault operation and maintenance device, where the embodiment of the device corresponds to the embodiment of the method shown in fig. 3, and the device may be specifically applied to various electronic devices.
As shown in fig. 4, the internet of things fault operation and maintenance device described in this embodiment includes: an acquisition module 401, a diagnosis module 402 and an operation and maintenance module 403. Wherein:
an acquisition module 401, configured to acquire fault data;
a diagnosis module 402, configured to input fault data into a trained fault diagnosis model to determine a fault type corresponding to the fault data;
and the operation and maintenance module 403 is configured to trigger an operation and maintenance measure corresponding to the fault type according to a preset fault operation and maintenance policy.
Further, the acquisition module includes:
an acquisition unit, configured to acquire running data of an application node in real time, wherein the application node comprises a virtual server and a real server, and the running data comprises a state value;
a determining unit, configured to determine an abnormal state value from the state values according to a preset normal state reference table;
and a storage unit, configured to take the abnormal state value and the application node corresponding to the abnormal state value as fault data, and to store the fault data in the operation and maintenance database.
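The work of the determining unit and the storage unit can be sketched as follows. This is a minimal illustration under stated assumptions: the normal state reference table is assumed here to map each state name to an allowed numeric range, and the state names, ranges, and record layout are hypothetical, since this passage does not define the table's format.

```python
from typing import Dict, List, Tuple

# Hypothetical normal state reference table: state name -> (lowest, highest) normal value.
NORMAL_STATE_REFERENCE: Dict[str, Tuple[float, float]] = {
    "cpu_usage_percent": (0.0, 85.0),
    "packet_loss_percent": (0.0, 1.0),
}


def find_fault_data(
    node: str,
    state_values: Dict[str, float],
    reference: Dict[str, Tuple[float, float]] = NORMAL_STATE_REFERENCE,
) -> List[dict]:
    """Return one fault record per state value that falls outside its normal range."""
    fault_data = []
    for name, value in state_values.items():
        bounds = reference.get(name)
        if bounds is None:
            continue  # no reference entry, so the value cannot be judged here
        low, high = bounds
        if not (low <= value <= high):
            # Pair the abnormal state value with the application node it came from.
            fault_data.append({"node": node, "state": name, "value": value})
    return fault_data
```

The returned records pair each abnormal state value with its application node, which is the form in which the fault data would be written to the operation and maintenance database.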
Further, when the fault diagnosis model is a deep neural network model, the diagnosis module includes:
the detection unit is used for detecting the abnormal state value through the deep neural network model;
the output unit is used for outputting the fault type if the known fault type corresponding to the abnormal state value is detected;
and the exception unit is used for taking the abnormal state value as an abnormal variable value and storing the abnormal variable value into the operation and maintenance database if no known fault type corresponding to the abnormal state value is detected.
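The branching between the output unit and the exception unit can be sketched in Python as below. The `classify` callable stands in for the trained deep neural network model and the list stands in for the operation and maintenance database; both are assumptions made for illustration.

```python
from typing import Callable, List, Optional


def diagnose(
    abnormal_state_value: dict,
    classify: Callable[[dict], Optional[str]],  # hypothetical stand-in for the trained model
    abnormal_variable_store: List[dict],        # hypothetical stand-in for the database
) -> Optional[str]:
    """Return the known fault type, or store the value for later labeling and return None."""
    fault_type = classify(abnormal_state_value)
    if fault_type is not None:
        return fault_type  # output unit: a known fault type was detected
    # Exception unit: keep the value as an abnormal variable value for later training.
    abnormal_variable_store.append(abnormal_state_value)
    return None
```

Values collected in the store are the ones that are later labeled and used to train the model further.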
Further, the internet of things fault operation and maintenance device further comprises:
the marking module is used for marking the abnormal variable value;
the training module is used for inputting the marked abnormal variable values into the deep neural network model for training and outputting results;
and the adjusting module is used for adjusting parameters of the deep neural network model if the matching probability of the output result is smaller than a preset matching threshold value, and for continuing training until the matching probability of the output result reaches the matching threshold value, at which point training stops.
Further, the training module comprises:
the calculating unit is used for calculating a fault weight value of each fault type according to the abnormal variable value and each fault type preset in the deep neural network model;
and the result unit is used for taking the fault type with the fault weight value larger than a preset weight threshold value as an output result.
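The selection performed by the result unit can be sketched as a simple filter. How the calculating unit derives the fault weight values is not detailed in this passage, so the sketch below assumes they are already available as a mapping from fault type to weight; the function name and the example weights are hypothetical.

```python
from typing import Dict, List


def select_fault_types(fault_weights: Dict[str, float], weight_threshold: float) -> List[str]:
    """Return the fault types whose fault weight value is larger than the threshold."""
    return [
        fault_type
        for fault_type, weight in fault_weights.items()
        if weight > weight_threshold
    ]


# Illustrative weights only; real values would come from the model.
example_weights = {"link jitter": 0.7, "packet loss": 0.2, "delay": 0.55}
selected = select_fault_types(example_weights, weight_threshold=0.5)
```

Because the comparison is strict, a fault type whose weight equals the threshold is not included in the output result.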
With regard to the internet of things fault operation and maintenance device in the above embodiment, the specific manner in which each module performs operations has been described in detail in the embodiment related to the method, and will not be elaborated here.
In order to solve the technical problem, an embodiment of the present application further provides a computer device. Referring to fig. 5, fig. 5 is a block diagram of a basic structure of a computer device according to the present embodiment.
The computer device 5 comprises a memory 51, a processor 52 and a network interface 53, which are communicatively connected to each other via a system bus. It is noted that only a computer device 5 having components 51-53 is shown; it should be understood that not all of the shown components are required, and that more or fewer components may be implemented instead. As will be understood by those skilled in the art, the computer device is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, a microprocessor, an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA), a Digital Signal Processor (DSP), an embedded device, and the like.
The computer device can be a desktop computer, a notebook, a palm computer, a cloud server and other computing devices. The computer equipment can carry out man-machine interaction with a user through a keyboard, a mouse, a remote controller, a touch panel or voice control equipment and the like.
The memory 51 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., an SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, etc. In some embodiments, the memory 51 may be an internal storage unit of the computer device 5, such as a hard disk or a memory of the computer device 5. In other embodiments, the memory 51 may also be an external storage device of the computer device 5, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the computer device 5. Of course, the memory 51 may also comprise both an internal storage unit of the computer device 5 and an external storage device thereof. In this embodiment, the memory 51 is generally used for storing an operating system installed in the computer device 5 and various types of application software, such as the program code of the internet of things fault operation and maintenance method. Further, the memory 51 may also be used to temporarily store various types of data that have been output or are to be output.
The processor 52 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data Processing chip in some embodiments. The processor 52 is typically used to control the overall operation of the computer device 5. In this embodiment, the processor 52 is configured to execute the program code stored in the memory 51 or process data, for example, execute the program code of the internet of things fault operation and maintenance method.
The network interface 53 may comprise a wireless network interface or a wired network interface, and the network interface 53 is generally used for establishing communication connections between the computer device 5 and other electronic devices.
The present application further provides another embodiment, that is, a computer-readable storage medium is provided, where an internet of things fault operation and maintenance program is stored in the computer-readable storage medium, and the internet of things fault operation and maintenance program is executable by at least one processor, so as to cause the at least one processor to execute the steps of the internet of things fault operation and maintenance method as described above.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
It is to be understood that the above-described embodiments are only some, and not all, of the embodiments of the present application, and that the appended drawings illustrate preferred embodiments of the application without limiting its scope. This application can be embodied in many different forms; the embodiments are provided so that the disclosure of the application will be understood thoroughly. Although the present application has been described in detail with reference to the foregoing embodiments, it will be apparent to one skilled in the art that the technical solutions described in the foregoing embodiments may still be modified, or that equivalents may be substituted for some of their features. All equivalent structures made by using the contents of the specification and the drawings of the present application, whether applied directly or indirectly in other related technical fields, are within the protection scope of the present application.

Claims (10)

1. An Internet of things fault operation and maintenance method is characterized by comprising the following steps:
acquiring fault data;
inputting the fault data into a trained fault diagnosis model to determine a fault type corresponding to the fault data;
and triggering operation and maintenance measures corresponding to the fault types according to a preset fault operation and maintenance strategy.
2. The internet of things fault operation and maintenance method according to claim 1, wherein the obtaining fault data comprises:
acquiring running data of an application node in real time, wherein the application node comprises a virtual server and a real server, and the running data comprises a state value;
determining an abnormal state value from the state values according to a preset normal state reference table;
and taking the abnormal state value and the application node corresponding to the abnormal state value as fault data, and storing the fault data in an operation and maintenance database.
3. The internet of things fault operation and maintenance method according to claim 2, wherein when the fault diagnosis model is a deep neural network model, the inputting the fault data into the trained fault diagnosis model to determine the fault type corresponding to the fault data comprises:
detecting the abnormal state value through the deep neural network model;
if the known fault type corresponding to the abnormal state value is detected, outputting the fault type;
and if the known fault type corresponding to the abnormal state value cannot be detected, taking the abnormal state value as an abnormal variable value, and storing the abnormal variable value into the operation and maintenance database.
4. The internet of things fault operation and maintenance method according to claim 3, wherein the deep neural network model is trained in a manner that:
marking the abnormal variable value;
inputting the marked abnormal variable values into a deep neural network model for training, and outputting results;
and if the matching probability of the output result is smaller than a preset matching threshold, performing parameter adjustment on the deep neural network model and continuing training until the matching probability of the output result reaches the matching threshold, at which point training stops.
5. The internet of things fault operation and maintenance method according to claim 4, wherein the inputting the marked abnormal variable values into the deep neural network model for training and outputting the result comprises:
calculating a fault weight value of each fault type according to the abnormal variable value and each fault type preset in the deep neural network model;
and taking the fault type with the fault weight value larger than a preset weight threshold value as an output result.
6. An internet of things fault operation and maintenance device, characterized by comprising:
the acquisition module is used for acquiring fault data;
the diagnosis module is used for inputting the fault data into a trained fault diagnosis model so as to determine the fault type corresponding to the fault data;
and the operation and maintenance module is used for triggering the operation and maintenance measures corresponding to the fault types according to a preset fault operation and maintenance strategy.
7. The internet of things fault operation and maintenance device according to claim 6, wherein the obtaining module comprises:
an acquisition unit, configured to acquire running data of an application node in real time, wherein the application node comprises a virtual server and a real server, and the running data comprises a state value;
a determining unit, configured to determine an abnormal state value from the state values according to a preset normal state reference table;
and a storage unit, configured to take the abnormal state value and the application node corresponding to the abnormal state value as fault data, and to store the fault data in an operation and maintenance database.
8. The internet of things fault operation and maintenance device according to claim 7, wherein when the fault diagnosis model is a deep neural network model, the diagnosis module comprises:
the detection unit is used for detecting the abnormal state value through the deep neural network model;
the output unit is used for outputting the fault type if the known fault type corresponding to the abnormal state value is detected;
and the exception unit is used for taking the abnormal state value as an abnormal variable value and storing the abnormal variable value into the operation and maintenance database if no known fault type corresponding to the abnormal state value is detected.
9. A computer device comprising a memory and a processor, the memory having a computer program stored therein, wherein the processor, when executing the computer program, implements the steps of the internet of things fault operation and maintenance method as claimed in any one of claims 1 to 5.
10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when executed by a processor, the computer program implements the steps of the internet of things fault operation and maintenance method according to any one of claims 1 to 5.

Priority Applications (1)

Application Number Publication Number Priority Date Filing Date Title
CN202011219233.2A CN112463422A (en) 2020-11-04 2020-11-04 Internet of things fault operation and maintenance method and device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Publication Number Priority Date Filing Date Title
CN202011219233.2A CN112463422A (en) 2020-11-04 2020-11-04 Internet of things fault operation and maintenance method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN112463422A (en) 2021-03-09

Family

ID=74835111

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN202011219233.2A Pending CN112463422A (en) 2020-11-04 2020-11-04 Internet of things fault operation and maintenance method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN112463422A (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109163913A (en) * 2018-09-30 2019-01-08 深圳市元征科技股份有限公司 A kind of Diagnosis method of automobile faults and relevant device
CN110362068A (en) * 2019-08-02 2019-10-22 苏州容思恒辉智能科技有限公司 A kind of mechanical equipment fault method for early warning, system and readable storage medium storing program for executing based on industrial Internet of Things

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113435307A (en) * 2021-06-23 2021-09-24 国网天津市电力公司 Operation and maintenance method, system and storage medium based on visual identification technology
CN113434326B (en) * 2021-07-12 2024-05-31 国泰君安证券股份有限公司 Method and device for realizing network system fault positioning based on distributed cluster topology, processor and computer readable storage medium thereof
CN114500235A (en) * 2022-04-06 2022-05-13 深圳粤讯通信科技有限公司 Communication equipment safety management system based on Internet of things
CN114490303A (en) * 2022-04-07 2022-05-13 阿里巴巴达摩院(杭州)科技有限公司 Fault root cause determination method and device and cloud equipment
CN114490303B (en) * 2022-04-07 2022-07-12 阿里巴巴达摩院(杭州)科技有限公司 Fault root cause determination method and device and cloud equipment
CN116155956A (en) * 2023-04-18 2023-05-23 武汉森铂瑞科技有限公司 Multiplexing communication method and system based on gradient decision tree model
CN116155956B (en) * 2023-04-18 2023-08-22 武汉森铂瑞科技有限公司 Multiplexing communication method and system based on gradient decision tree model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210309
