CN112783682B - Abnormal automatic repairing method based on cloud mobile phone service - Google Patents

Publication number
CN112783682B
Authority
CN
China
Prior art keywords
abnormal
event
abnormal condition
mobile phone
cloud mobile
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110133683.8A
Other languages
Chinese (zh)
Other versions
CN112783682A (en
Inventor
汪小烽
连寿哲
杨重魁
李晶莹
郭志斌
林道桢
李毅
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujian Duoduoyun Technology Co ltd
Original Assignee
Fujian Duoduoyun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujian Duoduoyun Technology Co ltd filed Critical Fujian Duoduoyun Technology Co ltd
Priority to CN202110133683.8A priority Critical patent/CN112783682B/en
Publication of CN112783682A publication Critical patent/CN112783682A/en
Application granted granted Critical
Publication of CN112783682B publication Critical patent/CN112783682B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0706 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment
    • G06F11/0709 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation, the processing taking place on a specific hardware platform or in a specific software environment, in a distributed system consisting of a plurality of standalone computer nodes, e.g. clusters, client-server systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/079 Root cause analysis, i.e. error or fault diagnosis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F11/0703 Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
    • G06F11/0793 Remedial or corrective actions

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computer Hardware Design (AREA)
  • Mobile Radio Communication Systems (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention provides a method for automatically repairing abnormalities in a cloud mobile phone service. The method is applied to a cloud mobile phone service system comprising a user terminal and a cloud mobile phone service center, where the user terminal exchanges data with the cloud mobile phone service center over a network. The equipment state is detected automatically; when an abnormal condition occurs, a repair strategy is selected automatically according to the abnormal state code, a repair is attempted, and the repair result is notified. The whole process is automatic, convenient and rapid, and requires no manual intervention.

Description

Abnormal automatic repairing method based on cloud mobile phone service
[ technical field ]
The invention relates to the technical field of anomaly detection and alarm, in particular to an anomaly automatic repairing method based on cloud mobile phone service.
[ background of the invention ]
Cloud mobile phones and cloud services meet the needs of Internet users well and provide very rich resources. In a cloud network, however, heavy load causes system crashes on a large number of servers, which makes cloud services highly uncertain. The prior art proposes various fault detection methods, but their reliability is unsatisfactory. One approach, for example, defines indexes (including actual fault detection speed, false detection rate, etc.) to describe the service quality of the detector quantitatively and then performs probability-based fault detection on that basis; this approach does not suit the dynamic characteristics of the network, and the resulting service quality is low. At present, the main problems faced in fault detection are:
(1) automatic analysis: a distributed software application in a cloud computing environment generally consists of hundreds of nodes divided into several layers, and a system administrator cannot analyze the problems of such a large system manually from experience;
(2) problem component positioning: a distributed software application in a computing environment generally consists of many components that are distributed across different nodes and call various services; the interactions among the components are complex and tightly coupled, so the faulty component that causes a system fault is difficult to locate accurately;
(3) on-line detection: the faults of many software systems appear only during large-scale operation, and operation and maintenance personnel find it difficult to reproduce such problems in an offline environment in order to track and locate their causes;
(4) environment adaptability: the execution environment can change dynamically while the application runs (for example external load fluctuation, migration of the application across hosts, and dynamic adjustment of virtual machine resources), and the application behaves differently as the environment changes, so a model established offline can hardly describe the system state accurately.
How to automatically repair abnormal conditions of the cloud mobile phone service has therefore become a technical problem in urgent need of a solution.
[ summary of the invention ]
The application provides an automatic abnormal repairing method based on cloud mobile phone service, and aims to solve one or more of the above-mentioned technical problems. By automatically detecting the state of the equipment, when an abnormal condition occurs, the repairing strategy is automatically selected according to the abnormal state code, repairing is attempted, and a repairing result is notified, so that the whole process is automatic, convenient and rapid, and manual intervention is not needed.
The technical scheme adopted by the application is as follows:
an automatic abnormal repairing method based on cloud mobile phone service is applied to a cloud mobile phone service system, the cloud mobile phone service system comprises a user terminal and a cloud mobile phone service center, the user terminal performs data interaction with the cloud mobile phone service center through a network, and the method specifically comprises the following steps:
step 1, obtaining abnormal condition event data of cloud mobile phone application service, and sending the abnormal condition event data to an abnormal condition classification module;
step 2, the abnormal situation classification module analyzes log description information contained in the abnormal situation event data and judges the determined classification of the abnormal situation event data according to the preset abnormal situation category;
step 3, acquiring a corresponding abnormal condition event training set according to the determined classification by an abnormal condition weighting analysis module, and calculating a characteristic information weight of the abnormal condition event data;
step 4, the state notification module sorts the feature information weight values in a descending order, and generates state notification information for abnormal condition event data corresponding to feature information larger than the feature information weight value threshold value according to a preset feature information weight value threshold value, and sends the state notification information to operation and maintenance personnel;
step 5, determining a corresponding abnormal condition event repairing strategy by the abnormal condition event repairing module according to the characteristic information weight, starting a repairing process, and sending repairing result notification information to operation and maintenance personnel through the state notification module.
Furthermore, the cloud mobile phone service center is provided with a monitoring device and a database device, wherein the monitoring device is used for collecting monitoring data of each layer, analyzing and processing the monitoring data, and executing a corresponding control strategy according to the analysis and processing result.
Furthermore, the monitoring device comprises an abnormal condition classification module, an abnormal condition weighting analysis module, a state notification module and an abnormal condition event restoration module.
Furthermore, the database device temporarily stores the abnormal condition event data collected periodically.
Further, the state notification information comprises 3 parts: the affected service, the impact situation, and the abnormal information description.
Further, the abnormal event is divided into 3 types, namely, a network transmission abnormal event, a program equipment abnormal event and an application service abnormal event;
the program equipment exception event comprises a virtualization layer exception event and a physical layer exception event;
the network transmission abnormal event comprises a message middleware abnormal event, an operating system platform abnormal event, a network abnormal event and a transaction middleware abnormal event;
the application service abnormal event comprises a Web application service abnormal event and a browser abnormal event.
Further, the abnormal situation event repairing strategy comprises an active repairing mode, a negotiation repairing mode and a passive repairing mode.
Furthermore, in the active repair mode, the service data flow affected by the abnormal condition event does not have service quality guarantee, and real-time data interaction with the cloud mobile phone service center is not needed.
Furthermore, in the negotiation and repair mode, the service data flow affected by the abnormal condition event has service quality guarantee, and real-time data interaction with the cloud mobile phone service center is required.
Further, the passive repair mode is started by sending a repair request to the cloud mobile phone service center. Because of the abnormal condition event, redundant service resources cannot be allocated and switched, so the protection mechanism cannot start and complete the repair normally; the service center can allocate and switch the redundant service resources only after manual intervention and corresponding measures.
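The three repair modes described above can be sketched as a small selection function. This is only an illustration: the text states that the strategy is determined from the feature information weight, and the two boolean flags used here (`has_qos_guarantee`, `redundancy_switchable`) are hypothetical stand-ins for whatever that weight analysis yields.

```python
from enum import Enum


class RepairMode(Enum):
    ACTIVE = "active"
    NEGOTIATION = "negotiation"
    PASSIVE = "passive"


def choose_repair_mode(has_qos_guarantee: bool, redundancy_switchable: bool) -> RepairMode:
    """Pick a repair mode from two hypothetical flags describing the affected flow."""
    if not redundancy_switchable:
        # Redundant resources cannot be allocated or switched: a repair
        # request is sent and manual intervention is required.
        return RepairMode.PASSIVE
    if has_qos_guarantee:
        # QoS-guaranteed flows need real-time interaction with the service center.
        return RepairMode.NEGOTIATION
    # No QoS guarantee: repair proceeds without real-time interaction.
    return RepairMode.ACTIVE
```

Checking the redundancy flag first reflects the description of the passive mode as the fallback when the protection mechanism cannot start by itself; the ordering is an assumption of this sketch.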
Through the embodiment of the application, the following technical effects can be obtained:
1) compared with the prior art, the classification approach is innovatively applied to abnormal conditions of the cloud mobile phone application service, and after training it can classify the information of subsequent abnormal conditions. Alarms can thus be issued in a more targeted way, alarm misjudgment is reduced, and operation and maintenance personnel are supported in locating the cause of a fault more quickly and accurately;
2) the classification method for abnormal events of the cloud mobile phone application service can be applied directly to a cloud mobile phone service center or an application service operation and maintenance platform: abnormal conditions are sorted into broad classes, the alarms related to each class are issued, irrelevant alarms are converged, and the relevant alarms are pushed to operation and maintenance personnel. The association relationship can be changed and extended through updating and dynamic adjustment, so the timeliness of the classification can be ensured according to actual requirements;
3) According to the invention, the equipment state is automatically detected, when an abnormal condition occurs, the repair strategy is automatically selected according to the abnormal state code, repair is attempted, and the repair result is notified, so that the whole process is automatic, convenient and rapid, and manual intervention is not needed.
[ description of the drawings ]
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and those skilled in the art can also obtain other drawings according to the drawings without inventive labor.
Fig. 1 is a schematic structural diagram of a cloud mobile phone service system;
Fig. 2 is a schematic diagram illustrating a flow of detecting an abnormal situation event;
Fig. 3 is a diagram illustrating a basic format of status notification information;
Fig. 4 is a schematic diagram of the structure of an abnormal event;
Fig. 5 is a schematic diagram of a component structure of an abnormal event repairing strategy.
[ detailed description ]
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The device state notification method is applied to a cloud mobile phone service system, and fig. 1 is a schematic view of a composition structure of the cloud mobile phone service system. The cloud mobile phone service system comprises a user terminal 1 and a cloud mobile phone service center 2, wherein the user terminal performs data interaction with the cloud mobile phone service center through a wireless network.
The cloud mobile phone service center is provided with a monitoring device and a database device, wherein the monitoring device is used for collecting monitoring data of each layer, analyzing and processing the monitoring data and executing a corresponding control strategy according to an analysis and processing result, and the monitoring device comprises an abnormal condition classification module, an abnormal condition weighting analysis module, a state notification module and an abnormal condition event restoration module. The database device temporarily stores the abnormal condition event data collected periodically. In background operation and maintenance of cloud mobile phone application, a maintainer can record feedback information of a cloud mobile phone user every day. Because a large amount of user feedback is generated when the cloud mobile phone application service is abnormal, the cloud mobile phone service center needs to cluster effective information from the large amount of user feedback, and forms a cloud mobile phone application service abnormal condition list after sorting and delivers the list to a maintainer responsible for corresponding application service for processing.
Each cloud mobile phone application service abnormal condition list records one sudden cloud mobile phone application service alarm event, such as an application crash (flash-back) or prolonged unresponsiveness. In the existing processing mode, the abnormal condition event is usually sent directly to the corresponding maintenance personnel. Because the cloud mobile phone service center involves many service types, abnormal conditions occurring at the same time can generate many kinds of alarms, so this processing mode is inefficient; if an abnormal condition is misjudged or misreported, its handling is delayed and maintenance resources are wasted.
Fig. 2 is a schematic diagram of a detection process of an abnormal event in the present invention, which includes the following steps:
step 1, obtaining abnormal condition event data of cloud mobile phone application service, and sending the abnormal condition event data to an abnormal condition classification module;
step 2, the abnormal situation classification module analyzes log description information contained in the abnormal situation event data and judges the determined classification of the abnormal situation event data according to the preset abnormal situation category;
step 3, acquiring a corresponding abnormal condition event training set according to the determined classification by an abnormal condition weighting analysis module, and calculating a characteristic information weight of the abnormal condition event data;
step 4, the state notification module sorts the feature information weight values in a descending order, and generates state notification information for abnormal condition event data corresponding to feature information larger than the feature information weight value threshold value according to a preset feature information weight value threshold value, and sends the state notification information to operation and maintenance personnel;
step 5, determining a corresponding abnormal condition event repairing strategy by the abnormal condition event repairing module according to the characteristic information weight, starting a repairing process, and sending repairing result notification information to operation and maintenance personnel through the state notification module.
The basic format of the status notification information is shown in Fig. 3. It comprises 3 parts: the affected service, the impact situation, and the abnormal information description.
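Step 4 and the three-part notification format can be illustrated with a short sketch. The dictionary keys (`weight`, `service`, `impact`, `description`) and the class and function names are assumptions made for this example; the text only specifies the descending sort, the preset threshold and the three parts of the notification.

```python
from dataclasses import dataclass


@dataclass
class StatusNotification:
    affected_service: str
    impact: str
    description: str


def build_notifications(events, threshold):
    """Sort events by feature weight (descending) and keep those above the threshold.

    events: list of dicts with hypothetical keys
            'weight', 'service', 'impact', 'description'.
    """
    ranked = sorted(events, key=lambda e: e["weight"], reverse=True)
    return [
        StatusNotification(e["service"], e["impact"], e["description"])
        for e in ranked
        if e["weight"] > threshold
    ]


events = [
    {"weight": 0.2, "service": "login", "impact": "minor", "description": "slow response"},
    {"weight": 0.9, "service": "streaming", "impact": "outage", "description": "VM crash"},
    {"weight": 0.5, "service": "storage", "impact": "degraded", "description": "disk IO high"},
]
notifications = build_notifications(events, threshold=0.4)
```

With the sample data, only the two events whose weight exceeds 0.4 produce notifications, and the highest-weight event comes first.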
In the present application, the abnormal situation event is evaluated and calculated according to the analysis result of each feature in the abnormal situation event, and the abnormal situation event is divided into 3 types, i.e., a network transmission abnormal event, a program device abnormal event, and an application service abnormal event according to the evaluation result, and fig. 4 is a schematic view of a composition structure of the abnormal situation event in the present invention.
The program equipment exception event comprises a virtualization layer exception event and a physical layer exception event; the virtualization layer abnormal events comprise VMM abnormal events, virtual software abnormal events, VM abnormal events and virtual network abnormal events; the physical layer abnormal events comprise server abnormal events, database abnormal events, storage facility abnormal events, processor abnormal events and physical network abnormal events;
the network transmission abnormal event comprises a message middleware abnormal event, an operating system platform abnormal event, a network abnormal event and a transaction middleware abnormal event; the message middleware abnormal events comprise server blocking abnormal events, transmission overtime abnormal events, queue overload abnormal events and array boundary crossing abnormal events; the abnormal events of the operating system platform comprise memory abnormal events, CPU abnormal events and disk IO abnormal events; the network abnormal events comprise network packet loss abnormal events and network delay abnormal events; the transaction middleware abnormal event comprises a configuration error abnormal event, a database deadlock abnormal event and a queue deletion abnormal event;
the application service abnormal event comprises a Web application service abnormal event and a browser abnormal event;
the Web application service abnormal event comprises a system port abnormal event and a software conflict abnormal event;
the browser abnormal events comprise browser blocking abnormal events, browser closing abnormal events, browser crash abnormal events and browser error reporting abnormal events;
Cloud mobile phone application anomaly detection starts at the first layer of Fig. 4. If the abnormal event data to be detected belongs to one or more of the categories, detection continues only along the corresponding classification branches, and abnormality notification messages concerning the equipment state are sent to the corresponding maintainers according to the classification.
The abnormal condition classification module analyzes the log description information of an abnormal condition and determines its classification according to the preset abnormal condition event categories, of which there are 3: network transmission abnormal events, program equipment abnormal events and application service abnormal events. The module reduces false alarms for abnormal conditions, so that the state notification of an abnormal condition is more targeted.
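The hierarchy of Fig. 4, as listed in the text above, can be written down as a nested mapping so that detection follows a single branch. The data structure and the `find_branch` helper are illustrative assumptions; the category names are taken from the description.

```python
TAXONOMY = {
    "program equipment": {
        "virtualization layer": ["VMM", "virtual software", "VM", "virtual network"],
        "physical layer": ["server", "database", "storage facility", "processor",
                           "physical network"],
    },
    "network transmission": {
        "message middleware": ["server blocking", "transmission overtime",
                               "queue overload", "array boundary crossing"],
        "operating system platform": ["memory", "CPU", "disk IO"],
        "network": ["network packet loss", "network delay"],
        "transaction middleware": ["configuration error", "database deadlock",
                                   "queue deletion"],
    },
    "application service": {
        "Web application service": ["system port", "software conflict"],
        "browser": ["browser blocking", "browser closing", "browser crash",
                    "browser error reporting"],
    },
}


def find_branch(leaf):
    """Return (top-level category, subcategory) for a leaf event name, or None."""
    for top, subcategories in TAXONOMY.items():
        for sub, leaves in subcategories.items():
            if leaf in leaves:
                return top, sub
    return None
```

For example, `find_branch("database deadlock")` walks the mapping and returns the network transmission / transaction middleware branch, which is the branch the notification would be routed along.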
The abnormal condition classification module comprises a preprocessing unit, a mapping association unit and a type identification unit which are sequentially connected;
the preprocessing unit is used for sorting the abnormal condition event data and extracting characteristic information from the sorted abnormal condition event data;
the mapping association unit is used for carrying out initial abnormal condition event type labeling on the extracted feature information, carrying out mapping association on the feature information and labeled types to form a feature information base of each initial abnormal condition event type, and carrying out association storage on the feature information base and the labeled types in mapping association to form a corresponding association relation between the feature information base and the labeled types;
and the type identification unit is used for analyzing and calculating the abnormal condition event types according to the corresponding association relation between the characteristic information base and the labeled types, taking the calculation result with the maximum weight as the finally determined abnormal condition event type, and finishing the classification of the abnormal condition events.
The acquired abnormal condition event data of the cloud mobile phone application service is automatically classified into 3 abnormal condition event categories, namely a network transmission abnormal event, a program equipment abnormal event and an application service abnormal event, after being processed by the abnormal condition classification module. Through classification, irrelevant abnormal condition events can be converged, so that the specific notification of the abnormal condition events is realized.
The following respectively describes each unit in the abnormal situation classification module in detail:
Preprocessing unit
The data describing abnormal condition events of the cloud mobile phone application service is extracted from the feedback information of cloud mobile phone users. The feedback information comprises user information (such as user identification, equipment identification number and login duration) and the problem descriptions given by users of the cloud mobile phone application service. This information may contain much content that is irrelevant to the abnormal condition, not directly related to it, or of no practical significance to operation and maintenance personnel. The abnormal condition event data therefore needs to be preprocessed: the data is sorted, and feature information is extracted from the sorted data, finally forming a feature information set that describes the abnormal condition event.
The extracting of the feature information from the sorted abnormal situation event data specifically includes:
Let $sentence_j$ be the problem description item of a sorted abnormal situation event $j$, and let $word_i$ be the $i$-th keyword in $sentence_j$. The sorting of the abnormal situation event data and the extraction of keywords are realized as follows:
$sentence_j = \mathrm{clean}(data)$
$word_i = \mathrm{extra}(sentence_j)$
where clean is the data sorting function and extra is the data extraction function. Applying the data extraction operation to the sorted abnormal condition event data yields the feature information, which is provided to the mapping association unit.
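The text names the two functions but does not define their internals. A minimal sketch, assuming that sorting means text normalisation and that extraction means dropping stop words, might look as follows; the stop-word list and the regular expressions are hypothetical choices for this example.

```python
import re

# Hypothetical stop-word list; a real deployment would supply its own.
STOP_WORDS = {"the", "a", "an", "is", "and", "to", "of", "my", "when", "i"}


def clean(data: str) -> str:
    """Normalise raw feedback: lower-case, strip punctuation, collapse whitespace."""
    text = re.sub(r"[^\w\s]", " ", data.lower())
    return re.sub(r"\s+", " ", text).strip()


def extra(sentence: str) -> list[str]:
    """Extract keywords from a cleaned sentence by dropping stop words."""
    return [word for word in sentence.split() if word not in STOP_WORDS]


sentence_j = clean("The app crashed!!  Network is  down.")
words = extra(sentence_j)
```

Here `sentence_j` becomes a single normalised string and `words` the keyword list handed to the mapping association unit.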
Mapping association unit
The mapping association unit performs initial abnormal condition event category labeling on the extracted feature information, maps the feature information to the labeled categories to form a feature information base for each initial abnormal condition event category, and stores each feature information base in association with its mapped labeled category, thereby forming the association relationship between the feature information bases and the labeled categories;
the forming of the association relationship between the feature information base and the labeled categories specifically includes:
step 201, for the abnormal condition events with labeled categories, extracting the three pieces of feature information that rank highest by frequency of occurrence in each abnormal condition event data item, using the following formula, and constructing the feature information base of the corresponding abnormal condition event category;
$w_{ij} = tf_{ij} \times idf_j = tf_{ij} \times \log(N / n_j)$
wherein $t_j$ is a piece of feature information; $tf_{ij}$ is the number of occurrences of $t_j$ in the abnormal situation event data $d_i$; $idf_j$ is the inverse frequency; $N$ is the total amount of information; $n_j$ is the number of $t_j$; and $w_{ij}$ is the frequency of occurrence;
step 202, comparing the frequency of occurrence $w_{ij}$ of the same feature information in different feature information bases, and assigning that feature information to the feature information base in which its $w_{ij}$ value is higher;
a feature information base is established under each category. Because abnormal conditions of different categories may be described in the same terms, different feature information bases may contain the same feature information; the $w_{ij}$ values of the same feature information in the different feature information bases therefore need to be compared to ensure the accuracy of the established feature information bases;
step 203, performing associated storage on the characteristic information base and the labeled categories associated with mapping;
for example, descriptions of network transmission abnormal events refer to the term "network transmission", and descriptions of program equipment abnormal events may also refer to it frequently. The $w_{ij}$ values of the term in both categories are relatively high, so it is initially included in the feature information bases of both the network transmission abnormal event and the program equipment abnormal event. The total value of "network transmission" under the network transmission abnormal event is then compared with its total value under the program equipment abnormal event; the former is higher, so the term is kept in the feature information base of the network transmission abnormal event, and so on.
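Steps 201 to 203 can be sketched in a few lines. The sketch makes two assumptions that the text leaves open: $n_j$ is read as the number of events whose keywords contain $t_j$ (the usual TF-IDF convention), and the "total value" used to resolve a shared term is the sum of its $w_{ij}$ over the events of a category. The function name and input shape are hypothetical.

```python
import math
from collections import Counter, defaultdict


def build_feature_bases(events, top_n=3):
    """Build one feature base per category from labeled events.

    events: list of (category, keywords) pairs.
    Returns {category: set of features}; each feature belongs to one category.
    """
    n_docs = len(events)
    # n_j: number of events containing feature t_j (assumed reading of n_j).
    doc_freq = Counter(t for _, words in events for t in set(words))

    # Step 201: sum w_ij per (category, feature) over each event's top-n features.
    totals = defaultdict(float)
    for category, words in events:
        tf = Counter(words)
        weights = {t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf}
        for t in sorted(weights, key=weights.get, reverse=True)[:top_n]:
            totals[(category, t)] += weights[t]

    # Step 202: a feature shared by categories goes to the one with the larger total.
    best = {}
    for (category, t), total in totals.items():
        if t not in best or total > best[t][1]:
            best[t] = (category, total)

    # Step 203: store each feature base together with its category.
    bases = defaultdict(set)
    for t, (category, _) in best.items():
        bases[category].add(t)
    return dict(bases)


sample = [
    ("network", ["timeout", "timeout", "packet", "loss"]),
    ("network", ["timeout", "delay"]),
    ("device", ["timeout", "disk", "disk"]),
    ("device", ["cpu", "memory"]),
]
bases = build_feature_bases(sample)
```

In the sample, "timeout" appears under both categories but accumulates a larger total under "network", so it ends up only in that feature base, mirroring the "network transmission" example above.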
Through the above processes, the association relation between the 3 categories of the network transmission abnormal event, the program equipment abnormal event and the application service abnormal event and the corresponding characteristic information base is established, and the association relation is provided for the classification use of the next process. The association relationship can be directly used, but if the subsequent abnormal condition description is updated, the association relationship can be updated and dynamically adjusted by the method.
Type identification Unit
The type identification unit analyzes and calculates the abnormal condition event category according to the feature information bases stored in association and their mapped labeled categories, takes the calculation result with the maximum weight as the finally determined abnormal condition event category, and completes the classification of the abnormal condition event;
the abnormal condition event category analysis and calculation is realized by the following modes:
step 301, obtaining the problem description item $sentence_j$ of each abnormal situation event and the keywords $word_i$ contained in it, and calculating the probability $P_k$ that the problem description item belongs to each category. $P_k$ is the number of keywords mapped to $Class_k$ divided by the length of the problem description item $sentence_j$:
$P_k = \mathrm{count}(word_i \in Class_k) / \mathrm{length}(sentence_j)$, where $k = 1, 2, 3$;
when $P_1 > P_2$ and $P_1 > P_3$, $Label = 1$;
when $P_2 > P_1$ and $P_2 > P_3$, $Label = 2$;
when $P_3 > P_1$ and $P_3 > P_2$, $Label = 3$;
wherein $P_k$ is the probability under each category and $k$ is the index of the category: 1 for network transmission abnormal events, 2 for program equipment abnormal events, and 3 for application service abnormal events. $Class_k$ is the feature information base of category $k$ in the association relationship: $Class_1$ for network transmission abnormal events, $Class_2$ for program equipment abnormal events, and $Class_3$ for application service abnormal events. $Label$ is the finally determined abnormal condition category;
step 302, comparing the probabilities $P_k$ of the categories: if $P_1$ is the highest, the event is judged to be a network transmission abnormal event and Label = 1; if $P_2$ is the highest, the event is judged to be a program equipment abnormal event and Label = 2; if $P_3$ is the highest, the event is judged to be an application service abnormal event and Label = 3; if the probabilities are the same, the event is judged to be a program equipment abnormal event and Label = 2;
cases in which the probabilities are the same are very rare, so some fault tolerance is acceptable; considering that program equipment abnormal events are the most numerous, such cases are labeled as program equipment abnormal events, i.e. Label = 2, which completes the classification process.
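Steps 301 and 302 can be sketched as follows. The sketch assumes that the problem description item has already been split into keywords, that its length is the number of those keywords, and that any tie for the highest probability falls back to Label = 2; the function name `classify` and the constant names are illustrative only.

```python
from typing import Dict, List, Set

NETWORK, DEVICE, APPLICATION = 1, 2, 3  # category indices k = 1, 2, 3


def classify(keywords: List[str], feature_bases: Dict[int, Set[str]]) -> int:
    """Return Label for one problem description item.

    P_k = count(word_i in Class_k) / length(sentence_j), where the length is
    taken to be the number of keywords in the item (assumption).
    """
    length = len(keywords)
    if length == 0:
        return DEVICE  # nothing to count: treated like a tie (assumption)

    probabilities = {
        k: sum(1 for word in keywords if word in base) / length
        for k, base in feature_bases.items()
    }
    highest = max(probabilities.values())
    winners = [k for k, p in probabilities.items() if p == highest]
    # A unique maximum decides the label; equal probabilities give Label = 2.
    return winners[0] if len(winners) == 1 else DEVICE
```

For instance, with characteristic information bases `{1: {"network transmission", "packet loss"}, 2: {"disk"}, 3: {"browser"}}`, the keywords `["network transmission", "packet loss", "disk"]` give $P_1 = 2/3$, $P_2 = 1/3$, $P_3 = 0$, so the item is labeled 1.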
The classification method for the abnormal events of the cloud mobile phone application service can be directly applied to a cloud mobile phone service center or an application service operation and maintenance platform. A large number of abnormal conditions are eliminated, alarm information related to the categories is issued, and the related alarms are pushed to operation and maintenance personnel; the whole process is automatic, convenient and fast, and requires no manual intervention. Moreover, the association relation can be changed and expanded through updating and dynamic adjustment, so that the timeliness of classification can be guaranteed according to actual requirements.
After the abnormal condition events are classified, the classified abnormal condition events are weighted by an abnormal condition event weighting analysis module. The weighting assigns different weight values to the various abnormal condition events among associated abnormal condition events, and also serves as a reference for the priority among them.
The abnormal situation weighted analysis module obtains an abnormal situation event training set and calculates a feature information weight, wherein the abnormal situation event training set is $E = \{(x_i, c_i, \mu(x_i)) \mid i = 1, \dots, n\}$, where $x_i$ is a sample data item in the training set, $c_i$ is the abnormal situation event classification corresponding to $x_i$, and $\mu(x_i)$ is the degree of adherence value of sample data item $x_i$ in the training set;
the calculating the feature information weight specifically includes:
step 401, normalizing each sample data item of the abnormal condition event training set E and determining the typical feature vector $id_t$ of the abnormal condition event training set E, where the typical feature vector $id_t$ is determined by averaging all samples in each abnormal condition event classification;
step 402, calculating the similarity between each feature of sample data item $x_i$ in the abnormal condition event training set E and the corresponding feature of the typical feature vector $id_t$ of its abnormal condition event classification; the similarity is calculated as follows:
Figure BDA0002926306710000131
wherein x isilFor the ith characteristic, id, of the ith sample data item in the abnormal condition event training set EtlFor sample data item xiCorresponding canonical feature vector idtObtaining an n × p-order similarity matrix S by similarity calculation;
Figure BDA0002926306710000132
step 403, according to the similarity matrix S, taking the similarity corresponding to the feature of each sample data item as an adherence degree value, and calculating an entropy value of the feature of each sample data item by using the following formula:
Figure BDA0002926306710000133
where $l = 1, \dots, p$ indexes the features of the sample data item;
step 403, calculating the sum of the correlation information between the $l$-th feature and the other features of the sample data items in the abnormal condition event training set E, wherein the specific calculation formula is as follows:
Figure BDA0002926306710000141
where $r = 1, 2, \dots, p$; $I(l)$ is the sum of the correlation information between the $l$-th feature and the other features of the sample data items in the abnormal condition event training set E; $H(r)$ is the entropy value of the $r$-th feature of the sample data items in the abnormal condition event training set E; and $H(l, r)$ is the joint entropy value of the $l$-th feature and the $r$-th feature of the sample data items, which is calculated by the following formula:
Figure BDA0002926306710000142
step 404, calculating the weight $w_l$ of each feature of the sample data items according to the correlation information sum $I(l)$, wherein the specific calculation formula is as follows:
Figure BDA0002926306710000143
where $w_l$ represents the weight of the $l$-th feature of the sample data items in the abnormal condition event training set E.
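The first two steps of the weight calculation (normalization, typical feature vectors, and the $n \times p$ similarity matrix S) can be sketched as below. The similarity, entropy and weight formulas appear in this text only as figure placeholders, so the sketch does not reproduce them: min-max normalization and the similarity $1 - |x_{il} - id_{tl}|$ are assumptions chosen for illustration, as are all function names.

```python
from typing import Dict, List, Sequence


def min_max_normalize(samples: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale every feature to [0, 1] (assumed normalization for step 401)."""
    p = len(samples[0])
    lows = [min(s[l] for s in samples) for l in range(p)]
    highs = [max(s[l] for s in samples) for l in range(p)]
    return [
        [(s[l] - lows[l]) / (highs[l] - lows[l]) if highs[l] > lows[l] else 0.0
         for l in range(p)]
        for s in samples
    ]


def typical_vectors(
    samples: Sequence[Sequence[float]], classes: Sequence[int]
) -> Dict[int, List[float]]:
    """Typical feature vector id_t: per-class mean of the samples (step 401)."""
    grouped: Dict[int, List[Sequence[float]]] = {}
    for x, c in zip(samples, classes):
        grouped.setdefault(c, []).append(x)
    return {c: [sum(col) / len(rows) for col in zip(*rows)]
            for c, rows in grouped.items()}


def similarity_matrix(
    samples: Sequence[Sequence[float]], classes: Sequence[int]
) -> List[List[float]]:
    """n x p matrix S; the entry 1 - |x_il - id_tl| is an assumed similarity."""
    normalized = min_max_normalize(samples)
    ids = typical_vectors(normalized, classes)
    return [
        [1.0 - abs(x_l - id_l) for x_l, id_l in zip(x, ids[c])]
        for x, c in zip(normalized, classes)
    ]
```

Each row of the returned matrix corresponds to one sample data item and each column to one feature, which matches the shape that the entropy calculation of step 403 takes as input.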
Fig. 5 is a schematic diagram of the component structure of the abnormal event repairing strategy. In this application, the abnormal event repair strategy is divided into 3 repair strategy modes according to the characteristic information weight: an active repair mode, a negotiation repair mode and a passive repair mode.
In the active repair mode, a repair request is sent to the cloud mobile phone service center to trigger the allocation and switching of redundant service resources, thereby repairing the abnormal condition. This mode applies, for example, when the abnormal condition can be successfully repaired through redundant service resources, the affected service data flow has no service quality guarantee (its real-time requirement is low), and real-time data interaction with the service center is not required.
For example, in the case of a network transmission abnormal event, if only the main transmission link is abnormal, transmission of the service data stream can be recovered by switching the stream to the redundant auxiliary transmission link. However, the main transmission link is usually chosen as the optimal transmission link, and when it becomes abnormal there is no guarantee that the redundant auxiliary transmission link is the optimal link in the current network transmission environment. For a service data stream that has a low real-time requirement and no service quality guarantee, delay does not affect the interaction of the stream and does not degrade the user experience of the cloud mobile phone, so the stream does not have to be switched to another optimal transmission link; in the active repair mode, the service center can directly and actively switch it to the redundant auxiliary transmission link.
However, a service data stream with a service quality guarantee, such as a high-bandwidth real-time multimedia service data stream, places high requirements on transmission delay and packet loss rate, and a redundant auxiliary transmission link must be selected and switched to in time to meet the service quality guarantee; this is the negotiation repair mode. The negotiation repair mode also triggers the allocation and switching of redundant service resources by sending a repair request to the cloud mobile phone service center. It differs from the active repair mode in that the affected service data stream has a service quality guarantee (its real-time requirement is high) and real-time data interaction with the service center is needed to ensure the timeliness of the data stream and the user experience. In this situation the service center should select the optimal redundant service resources according to the characteristics of the service data stream and switch to them quickly, so that the abnormal condition is repaired quickly; an example is the high-bandwidth real-time audio and video data stream service in the cloud mobile phone application service.
The passive repair mode is also started by sending a repair request to the cloud mobile phone service center. In this mode, however, the abnormal condition event prevents redundant service resources from being allocated and switched, so the protection mechanism cannot start the repair normally, and the service center can allocate and switch redundant service resources only after manual intervention and corresponding measures. For example, when both the main transmission link and the auxiliary transmission link are abnormal, the only option is to reselect a transmission link in the passive repair mode. Because this repair mode additionally occupies storage space on the hardware device and generates computational load, the passive repair mode is generally used only in such cases.
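The distinction between the three repair modes can be summarized in a minimal sketch. The passage above does not spell out how the characteristic information weight maps onto a mode, so the sketch decides only from the two conditions it does describe: whether redundant service resources can be allocated and switched, and whether the affected service data stream has a service quality guarantee. The names `RepairMode`, `AffectedFlow` and `select_repair_mode` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class RepairMode(Enum):
    ACTIVE = "active"
    NEGOTIATION = "negotiation"
    PASSIVE = "passive"


@dataclass
class AffectedFlow:
    has_qos_guarantee: bool             # stream needs real-time interaction
    redundant_resource_available: bool  # redundant resources can be switched in


def select_repair_mode(flow: AffectedFlow) -> RepairMode:
    """Pick a repair mode from the two conditions described in the text."""
    if not flow.redundant_resource_available:
        # Redundant resources cannot be allocated or switched automatically,
        # so manual intervention is required.
        return RepairMode.PASSIVE
    if flow.has_qos_guarantee:
        # The service center must choose the optimal redundant resource.
        return RepairMode.NEGOTIATION
    # Low real-time requirement: switch directly to the redundant resource.
    return RepairMode.ACTIVE
```

Under these assumptions, a failure of the main transmission link alone leads to the active mode for a stream without a service quality guarantee and to the negotiation mode for a real-time audio and video stream, while a failure of both links leads to the passive mode.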
According to the technical scheme, the abnormal condition event data are weighted, the accuracy problem of abnormal condition event detection is solved while priority order reference is provided, the defects of the prior art are overcome, the practicability is good, and the application scene is rich.
In some embodiments, part or all of the computer program may be loaded and/or installed onto the device via ROM. When loaded and executed, the computer program may carry out one or more of the steps of the method described above.
The functions described above in this disclosure may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a System on a Chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (4)

1. An automatic abnormal repairing method based on cloud mobile phone service is applied to a cloud mobile phone service system, the cloud mobile phone service system comprises a user terminal and a cloud mobile phone service center, and the user terminal performs data interaction with the cloud mobile phone service center through a network, and is characterized by specifically comprising the following steps:
step 1, obtaining abnormal condition event data of cloud mobile phone application service, and sending the abnormal condition event data to an abnormal condition classification module;
step 2, the abnormal situation classification module analyzes log description information contained in the abnormal situation event data, and judges the determined classification of the abnormal situation event data according to the preset abnormal situation category, wherein the abnormal situation event is classified into 3 types, namely a network transmission abnormal event, a program equipment abnormal event and an application service abnormal event;
step 3, acquiring a corresponding abnormal condition event training set according to the determined classification by an abnormal condition weighting analysis module, and calculating a characteristic information weight of the abnormal condition event data;
step 4, the state notification module sorts the feature information weight values in a descending order, and generates state notification information for abnormal condition event data corresponding to feature information larger than the feature information weight value threshold value according to a preset feature information weight value threshold value, and sends the state notification information to operation and maintenance personnel;
step 5, the abnormal condition event repairing module determines a corresponding abnormal condition event repairing strategy according to the characteristic information weight, starts a repairing process and sends repairing result notification information to operation and maintenance personnel through the state notification module;
the abnormal condition event repairing strategy comprises an active repairing mode, a negotiation repairing mode and a passive repairing mode;
in the active repair mode, service data flow influenced by abnormal condition events does not have service quality guarantee, and real-time data interaction with a cloud mobile phone service center is not needed; in the active repair mode, a repair request is sent to a cloud mobile phone service center to trigger the allocation and switching of redundant service resources, so that the repair of abnormal conditions is realized;
in the negotiation repair mode, the service data flow influenced by the abnormal condition event has service quality guarantee and needs to perform real-time data interaction with the cloud mobile phone service center;
the passive repair mode is started by sending a repair request to a cloud mobile phone service center, and due to the influence of abnormal events, redundant service resources cannot be distributed and switched, so that a protection mechanism cannot be normally started and repaired, and the service center can realize the distribution and switching of the redundant service resources under the condition of manual intervention and taking corresponding measures;
the cloud mobile phone service center is provided with a monitoring device and a database device, wherein the monitoring device is used for collecting monitoring data of each layer, analyzing and processing the monitoring data and executing a corresponding control strategy according to the analysis and processing result;
the monitoring device comprises an abnormal condition classification module, an abnormal condition weighting analysis module, a state notification module and an abnormal condition event restoration module;
the abnormal condition classification module is used for analyzing the log description information of the abnormal condition and judging the determined classification of the abnormal condition according to the preset abnormal condition event category; the abnormal condition classification module comprises a preprocessing unit, a mapping association unit and a type identification unit which are sequentially connected;
the preprocessing unit is used for sorting the abnormal condition event data and extracting characteristic information from the sorted abnormal condition event data;
the mapping association unit is used for carrying out initial abnormal condition event type labeling on the extracted feature information, carrying out mapping association on the feature information and labeled types to form a feature information base of each initial abnormal condition event type, and carrying out association storage on the feature information base and the labeled types in mapping association to form a corresponding association relation between the feature information base and the labeled types;
and the type identification unit is used for analyzing and calculating the abnormal condition event types according to the corresponding association relation between the characteristic information base and the labeled types, taking the calculation result with the maximum weight as the finally determined abnormal condition event type, and finishing the classification of the abnormal condition events.
2. The method for automatically remedying an abnormality according to claim 1, wherein periodically collected abnormal situation event data is temporarily stored in the database device.
3. The method according to claim 1, wherein the status notification information comprises 3 parts: the affected service, the impact situation, and the exception information description.
4. The method for automatically repairing an exception according to claim 1, wherein the exception event is classified into 3 types of a network transmission exception, a program device exception, and an application service exception;
the program equipment exception event comprises a virtualization layer exception event and a physical layer exception event;
the network transmission abnormal event comprises a message middleware abnormal event, an operating system platform abnormal event, a network abnormal event and a transaction middleware abnormal event;
the application service abnormal event comprises a Web application service abnormal event and a browser abnormal event.
CN202110133683.8A 2021-02-01 2021-02-01 Abnormal automatic repairing method based on cloud mobile phone service Active CN112783682B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110133683.8A CN112783682B (en) 2021-02-01 2021-02-01 Abnormal automatic repairing method based on cloud mobile phone service

Publications (2)

Publication Number Publication Date
CN112783682A CN112783682A (en) 2021-05-11
CN112783682B true CN112783682B (en) 2022-02-22

Family

ID=75760221

Country Status (1)

Country Link
CN (1) CN112783682B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114025037B (en) * 2021-10-28 2024-04-30 北京百度网讯科技有限公司 Cloud mobile phone overhaul method and device, electronic equipment and storage medium
CN114567539B (en) * 2022-03-22 2024-04-12 中国农业银行股份有限公司 Network system exception handling method, device, equipment and medium
CN114968761B (en) * 2022-04-11 2023-07-21 杭州德适生物科技有限公司 Software running environment safety supervision system based on Internet
CN115098294B (en) * 2022-08-24 2022-11-15 摩尔线程智能科技(北京)有限责任公司 Abnormal event processing method, electronic equipment and management terminal

Similar Documents

Publication Publication Date Title
CN112783682B (en) Abnormal automatic repairing method based on cloud mobile phone service
KR101984730B1 (en) Automatic predicting system for server failure and automatic predicting method for server failure
KR102522005B1 (en) Apparatus for VNF Anomaly Detection based on Machine Learning for Virtual Network Management and a method thereof
US8655623B2 (en) Diagnostic system and method
US8635498B2 (en) Performance analysis of applications
CN110704231A (en) Fault processing method and device
CN112954031B (en) Equipment state notification method based on cloud mobile phone
US11886276B2 (en) Automatically correlating phenomena detected in machine generated data to a tracked information technology change
CN111290913A (en) Fault location visualization system and method based on operation and maintenance data prediction
CN113515434A (en) Abnormity classification method, abnormity classification device, abnormity classification equipment and storage medium
CN111027591B (en) Node fault prediction method for large-scale cluster system
CN112969172B (en) Communication flow control method based on cloud mobile phone
CN116881962B (en) Security monitoring system, method, device and storage medium
CN116755974A (en) Cloud computing platform operation and maintenance method and device, electronic equipment and storage medium
CN115580528A (en) Fault root cause positioning method, device, equipment and readable storage medium
CN116264541A (en) Multi-dimension-based database disaster recovery method and device
CN114881112A (en) System anomaly detection method, device, equipment and medium
CN111835566A (en) System fault management method, device and system
CN117596133B (en) Service portrayal and anomaly monitoring system and monitoring method based on multidimensional data
CN115599077B (en) Vehicle fault delimiting method and device, electronic equipment and storage medium
CN112905479B (en) Cloud platform-based method and system for determining optimal path of alarm accident root cause
CN116841792B (en) Application program development fault repairing method
TR2022013419A2 (en) ROOT CAUSE DETECTION SYSTEM THAT PREDICTATS FAILURES THROUGH REAL-TIME ERROR LOGS
CN117827608A (en) Intelligent early warning and disposal method based on historical monitoring data
CN115587725A (en) Script type decision management system and method with big data association

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant