CN112783682B

CN112783682B - Abnormal automatic repairing method based on cloud mobile phone service

Info

Publication number: CN112783682B
Application number: CN202110133683.8A
Authority: CN
Inventors: 汪小烽; 连寿哲; 杨重魁; 李晶莹; 郭志斌; 林道桢; 李毅
Original assignee: Fujian Duoduoyun Technology Co ltd
Current assignee: Fujian Duoduoyun Technology Co ltd
Priority date: 2021-02-01
Filing date: 2021-02-01
Publication date: 2022-02-22
Anticipated expiration: 2041-02-01
Also published as: CN112783682A

Abstract

The invention provides an automatic anomaly repairing method based on cloud mobile phone service, applied to a cloud mobile phone service system that comprises a user terminal and a cloud mobile phone service center, the user terminal performing data interaction with the cloud mobile phone service center through a network. The equipment state is detected automatically; when an abnormal condition occurs, a repair strategy is selected automatically according to the abnormal state code, repair is attempted, and the repair result is notified, so that the whole process is automatic, convenient and rapid, and requires no manual intervention.

Description

Abnormal automatic repairing method based on cloud mobile phone service

[ technical field ]

The invention relates to the technical field of anomaly detection and alarm, in particular to an anomaly automatic repairing method based on cloud mobile phone service.

[ background of the invention ]

Cloud mobile phones and cloud services can meet the needs of Internet users well and provide very rich resources. In a cloud network, however, heavy load causes system crashes on a large number of servers, which makes cloud services highly uncertain. Various fault detection methods have been proposed in the prior art, but their reliability is unsatisfactory. For example, one approach sets up indexes (including actual fault detection speed, false detection rate, etc.) to describe the service quality of the detector quantitatively and, on that basis, performs fault detection based on probability; this method is not suited to the dynamic characteristics of the network, and its service quality is not high. At present, the problems faced in fault detection mainly include:

(1) automatic analysis: a distributed software application in a cloud computing environment generally consists of hundreds of nodes divided into a plurality of layers, and a system administrator facing so large a system cannot analyze its problems manually from experience;

(2) problem component positioning: a distributed software application in a cloud computing environment generally consists of a plurality of components that are distributed over different nodes and call various services; the interaction among the components is complex and tightly coupled, so the faulty component causing a system fault is difficult to locate accurately;

(3) on-line detection: faults of many software systems appear only during large-scale operation, and operation and maintenance personnel can hardly reproduce, in an off-line environment, the problems that arise in production in order to track down their causes;

(4) environment adaptability: the execution environment can change dynamically while the application runs (for example external load fluctuation, migration of the application across hosts, and dynamic adjustment of virtual machine resources), the application behaves differently in response to the environment, and a model established offline can hardly depict the system state accurately.

How to automatically repair abnormal conditions of the cloud mobile phone service has therefore become a technical problem that urgently needs to be solved.

[ summary of the invention ]

The application provides an automatic abnormal repairing method based on cloud mobile phone service, and aims to solve one or more of the above-mentioned technical problems. By automatically detecting the state of the equipment, when an abnormal condition occurs, the repairing strategy is automatically selected according to the abnormal state code, repairing is attempted, and a repairing result is notified, so that the whole process is automatic, convenient and rapid, and manual intervention is not needed.

The technical scheme adopted by the application is as follows:

an automatic abnormal repairing method based on cloud mobile phone service is applied to a cloud mobile phone service system, the cloud mobile phone service system comprises a user terminal and a cloud mobile phone service center, the user terminal performs data interaction with the cloud mobile phone service center through a network, and the method specifically comprises the following steps:

step 1, obtaining abnormal condition event data of cloud mobile phone application service, and sending the abnormal condition event data to an abnormal condition classification module;

step 2, the abnormal situation classification module analyzes log description information contained in the abnormal situation event data and judges the determined classification of the abnormal situation event data according to the preset abnormal situation category;

step 3, acquiring a corresponding abnormal condition event training set according to the determined classification by an abnormal condition weighting analysis module, and calculating a characteristic information weight of the abnormal condition event data;

step 4, the state notification module sorts the feature information weight values in a descending order, and generates state notification information for abnormal condition event data corresponding to feature information larger than the feature information weight value threshold value according to a preset feature information weight value threshold value, and sends the state notification information to operation and maintenance personnel;

step 5, the abnormal condition event repairing module determines a corresponding abnormal condition event repairing strategy according to the characteristic information weight, starts a repairing process, and sends repairing result notification information to operation and maintenance personnel through the state notification module.

Furthermore, the cloud mobile phone service center is provided with a monitoring device and a database device, wherein the monitoring device is used for collecting monitoring data of each layer, analyzing and processing the monitoring data, and executing a corresponding control strategy according to the analysis and processing result.

Furthermore, the monitoring device comprises an abnormal condition classification module, an abnormal condition weighting analysis module, a state notification module and an abnormal condition event restoration module.

Furthermore, the database device temporarily stores the abnormal condition event data collected periodically.

Further, the state notification information includes 3 parts of impact service, impact situation, and abnormal information description.

Further, the abnormal event is divided into 3 types, namely, a network transmission abnormal event, a program equipment abnormal event and an application service abnormal event;

the program equipment exception event comprises a virtualization layer exception event and a physical layer exception event;

the network transmission abnormal event comprises a message middleware abnormal event, an operating system platform abnormal event, a network abnormal event and a transaction middleware abnormal event;

the application service abnormal event comprises a Web application service abnormal event and a browser abnormal event.

Further, the abnormal situation event repairing strategy comprises an active repairing mode, a negotiation repairing mode and a passive repairing mode.

Furthermore, in the active repair mode, the service data flow affected by the abnormal condition event does not have service quality guarantee, and real-time data interaction with the cloud mobile phone service center is not needed.

Furthermore, in the negotiation and repair mode, the service data flow affected by the abnormal condition event has service quality guarantee, and real-time data interaction with the cloud mobile phone service center is required.

Further, the passive repair mode is started by sending a repair request to the cloud mobile phone service center; because of the abnormal condition event, the redundant service resources cannot be allocated and switched, so the protection mechanism cannot start and complete the repair normally, and the service center can allocate and switch the redundant service resources only after manual intervention and corresponding measures.
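The three repair modes described above can be told apart by two properties: whether redundant service resources can still be allocated and switched, and whether the affected service data flow carries a service quality guarantee. The following is a minimal sketch under that reading. The names `RepairMode` and `select_repair_mode` and the two boolean inputs are illustrative assumptions; how the characteristic information weight maps onto these modes is not spelled out in this passage, so the sketch selects on the mode properties instead.

```python
from enum import Enum


class RepairMode(Enum):
    ACTIVE = "active"            # no QoS guarantee, no real-time interaction needed
    NEGOTIATION = "negotiation"  # QoS guarantee, real-time interaction required
    PASSIVE = "passive"          # redundancy unavailable, manual intervention needed


def select_repair_mode(redundancy_available: bool, qos_guaranteed: bool) -> RepairMode:
    """Pick a repair mode from the two properties named in the text (assumed inputs)."""
    if not redundancy_available:
        # Redundant resources cannot be allocated and switched automatically.
        return RepairMode.PASSIVE
    if qos_guaranteed:
        # The affected flow has a QoS guarantee, so the switch must be negotiated.
        return RepairMode.NEGOTIATION
    return RepairMode.ACTIVE


mode = select_repair_mode(redundancy_available=True, qos_guaranteed=False)
```

Checking redundancy first reflects the description of the passive mode as the case in which the protection mechanism cannot start at all.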

Through the embodiment of the application, the following technical effects can be obtained:

1) compared with the prior art, the application innovatively applies classification to abnormal conditions of the cloud mobile phone application service and, through training, can classify information on subsequent abnormal conditions. Alarms can therefore be issued in a more targeted way, alarm misjudgment is reduced, and operation and maintenance personnel are supported in locating the cause of a fault more quickly and accurately;

2) the classification method for abnormal events of the cloud mobile phone application service can be applied directly to a cloud mobile phone service center or an application service operation and maintenance platform: abnormal conditions are sorted into the major categories, irrelevant alarms are converged, and the alarms related to each category are issued and pushed to operation and maintenance personnel; the association relation can be changed and expanded through updating and dynamic adjustment, so that the timeliness of classification can be ensured according to actual requirements;

3) According to the invention, the equipment state is automatically detected, when an abnormal condition occurs, the repair strategy is automatically selected according to the abnormal state code, repair is attempted, and the repair result is notified, so that the whole process is automatic, convenient and rapid, and manual intervention is not needed.

[ description of the drawings ]

In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present application, and those skilled in the art can also obtain other drawings according to the drawings without inventive labor.

FIG. 1 is a schematic structural diagram of a cloud mobile phone service system;

FIG. 2 is a schematic diagram illustrating a flow of detecting an abnormal situation event;

FIG. 3 is a diagram illustrating a basic format of status notification information;

FIG. 4 is a schematic diagram of the structure of an abnormal event;

FIG. 5 is a schematic diagram of a component structure of an abnormal event repairing strategy.

[ detailed description ]

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The device state notification method is applied to a cloud mobile phone service system, and fig. 1 is a schematic view of a composition structure of the cloud mobile phone service system. The cloud mobile phone service system comprises a user terminal 1 and a cloud mobile phone service center 2, wherein the user terminal performs data interaction with the cloud mobile phone service center through a wireless network.

The cloud mobile phone service center is provided with a monitoring device and a database device, wherein the monitoring device is used for collecting monitoring data of each layer, analyzing and processing the monitoring data and executing a corresponding control strategy according to an analysis and processing result, and the monitoring device comprises an abnormal condition classification module, an abnormal condition weighting analysis module, a state notification module and an abnormal condition event restoration module. The database device temporarily stores the abnormal condition event data collected periodically. In background operation and maintenance of cloud mobile phone application, a maintainer can record feedback information of a cloud mobile phone user every day. Because a large amount of user feedback is generated when the cloud mobile phone application service is abnormal, the cloud mobile phone service center needs to cluster effective information from the large amount of user feedback, and forms a cloud mobile phone application service abnormal condition list after sorting and delivers the list to a maintainer responsible for corresponding application service for processing.

Each cloud mobile phone application service abnormal condition list records an emergent cloud mobile phone application service alarm event, such as an application crash, a long period without response and the like. In the existing processing mode, the abnormal condition event is usually sent directly to the corresponding maintenance personnel. Because the cloud mobile phone service center handles many types of service, abnormal conditions occurring at the same time can generate alarms of many types; this processing mode is inefficient, and if an abnormal condition is misjudged or misreported, its handling is delayed and maintenance resources are wasted.

Fig. 2 is a schematic diagram of a detection process of an abnormal event in the present invention, which includes the following steps:

The basic format of the status notification information is shown in fig. 3, and the status notification information includes 3 parts, namely, an impact service, an impact situation, and an exception information description.
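Step 4 of the method (sort the weights in descending order, keep the events whose weight exceeds the threshold, and build a three-part notification) might be sketched as follows. The class name `StatusNotification`, its field names, the dict keys of the input events, and the sample values are illustrative assumptions; only the three-part structure and the sort-then-threshold logic come from the text.

```python
from dataclasses import dataclass


@dataclass
class StatusNotification:
    impacted_service: str   # "impact service"
    impact_situation: str   # "impact situation"
    description: str        # "abnormal information description"


def build_notifications(events, threshold):
    """Sort events by weight (descending) and notify only those above the threshold."""
    ranked = sorted(events, key=lambda e: e["weight"], reverse=True)
    return [
        StatusNotification(e["service"], e["impact"], e["description"])
        for e in ranked
        if e["weight"] > threshold
    ]


# Hypothetical sample events with already-computed feature information weights.
events = [
    {"weight": 0.2, "service": "login", "impact": "minor", "description": "slow response"},
    {"weight": 0.9, "service": "streaming", "impact": "outage", "description": "packet loss"},
    {"weight": 0.6, "service": "storage", "impact": "degraded", "description": "disk IO wait"},
]
notes = build_notifications(events, threshold=0.5)
```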

In the present application, the abnormal situation event is evaluated and calculated according to the analysis result of each feature in the abnormal situation event, and the abnormal situation event is divided into 3 types, i.e., a network transmission abnormal event, a program device abnormal event, and an application service abnormal event according to the evaluation result, and fig. 4 is a schematic view of a composition structure of the abnormal situation event in the present invention.

The program equipment exception event comprises a virtualization layer exception event and a physical layer exception event; the virtualization layer abnormal events comprise VMM abnormal events, virtual software abnormal events, VM abnormal events and virtual network abnormal events; the physical layer abnormal events comprise server abnormal events, database abnormal events, storage facility abnormal events, processor abnormal events and physical network abnormal events;

the network transmission abnormal event comprises a message middleware abnormal event, an operating system platform abnormal event, a network abnormal event and a transaction middleware abnormal event; the message middleware abnormal events comprise server blocking abnormal events, transmission overtime abnormal events, queue overload abnormal events and array boundary crossing abnormal events; the abnormal events of the operating system platform comprise memory abnormal events, CPU abnormal events and disk IO abnormal events; the network abnormal events comprise network packet loss abnormal events and network delay abnormal events; the transaction middleware abnormal event comprises a configuration error abnormal event, a database deadlock abnormal event and a queue deletion abnormal event;

the application service abnormal event comprises a Web application service abnormal event and a browser abnormal event;

the Web application service abnormal event comprises a system port abnormal event and a software conflict abnormal event;

the browser abnormal events comprise browser blocking abnormal events, browser closing abnormal events, browser crash abnormal events and browser error reporting abnormal events;

cloud mobile phone application anomaly detection starts from the first layer of fig. 4; if the abnormal event data to be detected belongs to one or more of the first-layer types, detection proceeds only along the corresponding classification branches, and anomaly notification messages related to the equipment state are sent to the corresponding maintainers according to the classification.
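The three-layer structure listed above, and the rule that detection only follows the branches of the matched first-layer types, can be written down as a nested mapping. This is a sketch for illustration: the names `TAXONOMY` and `branches` are hypothetical, and the leaf labels are shortened forms of the event names in the text.

```python
# First layer -> second layer -> leaf abnormal events, as enumerated in the text.
TAXONOMY = {
    "program equipment": {
        "virtualization layer": ["VMM", "virtual software", "VM", "virtual network"],
        "physical layer": ["server", "database", "storage facility", "processor",
                           "physical network"],
    },
    "network transmission": {
        "message middleware": ["server blocking", "transmission timeout",
                               "queue overload", "array out of bounds"],
        "operating system platform": ["memory", "CPU", "disk IO"],
        "network": ["packet loss", "network delay"],
        "transaction middleware": ["configuration error", "database deadlock",
                                   "queue deletion"],
    },
    "application service": {
        "Web application service": ["system port", "software conflict"],
        "browser": ["browser blocking", "browser closing", "browser crash",
                    "browser error reporting"],
    },
}


def branches(top_level_types):
    """Return (type, subtype, leaf) paths under the matched first-layer types only."""
    paths = []
    for top in top_level_types:
        for sub, leaves in TAXONOMY[top].items():
            for leaf in leaves:
                paths.append((top, sub, leaf))
    return paths


paths = branches(["application service"])
```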

And the abnormal condition classification module is used for analyzing the log description information of the abnormal condition and judging the determined classification of the abnormal condition according to the preset abnormal condition event types, wherein the preset abnormal condition event types comprise 3 types, namely a network transmission abnormal event, a program equipment abnormal event and an application service abnormal event. The abnormal condition classification module can reduce the false alarm of the abnormal condition, so that the state notification of the abnormal condition has higher pertinence.

The abnormal condition classification module comprises a preprocessing unit, a mapping association unit and a type identification unit which are sequentially connected;

the preprocessing unit is used for sorting the abnormal condition event data and extracting characteristic information from the sorted abnormal condition event data;

the mapping association unit is used for carrying out initial abnormal condition event type labeling on the extracted feature information, carrying out mapping association on the feature information and labeled types to form a feature information base of each initial abnormal condition event type, and carrying out association storage on the feature information base and the labeled types in mapping association to form a corresponding association relation between the feature information base and the labeled types;

and the type identification unit is used for analyzing and calculating the abnormal condition event types according to the corresponding association relation between the characteristic information base and the labeled types, taking the calculation result with the maximum weight as the finally determined abnormal condition event type, and finishing the classification of the abnormal condition events.

The acquired abnormal condition event data of the cloud mobile phone application service is automatically classified into 3 abnormal condition event categories, namely a network transmission abnormal event, a program equipment abnormal event and an application service abnormal event, after being processed by the abnormal condition classification module. Through classification, irrelevant abnormal condition events can be converged, so that the specific notification of the abnormal condition events is realized.

The following respectively describes each unit in the abnormal situation classification module in detail:

Preprocessing unit

The data describing abnormal condition events of the cloud mobile phone application service is extracted from feedback information of cloud mobile phone users. The feedback information comprises user information (such as user identification, equipment identification number, login duration and the like) and the problem descriptions given by cloud mobile phone application service users. Because this information may contain much content that is irrelevant to the abnormal condition, not directly relevant to it, or of no practical significance to operation and maintenance personnel, the abnormal condition event data needs to be preprocessed: the data is sorted, and feature information is extracted from the sorted data, finally forming a feature information set that describes the abnormal condition event.

The extracting of the feature information from the sorted abnormal situation event data specifically includes:

let sentence_j be the problem description item of the j-th sorted abnormal situation event, and let word_i be the i-th keyword in sentence_j; the data sorting and extraction for the abnormal condition event are realized as follows:

sentence_j = clean(data)

word_i = extra(sentence_j)

wherein clean is a data sorting function and extra is a data extraction function; the data extraction operation is performed on the sorted abnormal condition event data to obtain the feature information, which is provided to the mapping association unit.
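The text names the two functions clean and extra but does not define them. The sketch below shows one plausible shape for them, assuming English-language feedback, lowercase normalization, and a small stop-word list; the stop-word list and the sample feedback string are hypothetical.

```python
import re

# Hypothetical stop-word list; a real deployment would use a fuller one.
STOP_WORDS = {"the", "a", "an", "is", "and", "to", "my", "it", "of", "for"}


def clean(data: str) -> str:
    """Data sorting: lowercase, replace punctuation with spaces, collapse whitespace."""
    text = re.sub(r"[^a-z0-9\s]", " ", data.lower())
    return re.sub(r"\s+", " ", text).strip()


def extra(sentence: str) -> list:
    """Data extraction: keep the tokens that are not stop words as keywords."""
    return [w for w in sentence.split() if w not in STOP_WORDS]


sentence_j = clean("The app CRASHED!! Network   timeout for 30s.")
words = extra(sentence_j)
```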

Mapping association unit

This unit performs initial abnormal condition event category labeling on the extracted feature information, maps the feature information to the labeled categories to form a feature information base for each initial abnormal condition event category, and stores each feature information base in association with its labeled category;

the forming of the association relationship between the feature information base and the labeled categories specifically includes:

step 201, for the abnormal condition events with labeled categories, extracting the feature information with the frequency ranking of the first three in each abnormal condition event data by using the following formula, and constructing a feature information base of the corresponding abnormal condition event category;

w_ij = tf_ij × idf_j = tf_ij × log(N / n_j)

wherein t_j is a feature information item; tf_ij is the number of occurrences of t_j in abnormal situation event data d_i; idf_j represents the inverse frequency; N represents the total amount of information; n_j represents the number of items in which t_j appears; and w_ij represents the frequency of occurrence value;

step 202, comparing the frequency of occurrence values w_ij of the same feature information in different feature information bases, and assigning that feature information to the feature information base in which its w_ij value is higher;

a feature information base is established under each category; because the descriptions of abnormal conditions under different categories may coincide, different feature information bases may contain the same feature information, so the w_ij values of the same feature information in the different feature information bases need to be compared to ensure the accuracy of the established feature information bases;

step 203, performing associated storage on the characteristic information base and the labeled categories associated with mapping;

for example, a description of a network transmission exception may use the term "network transmission", and descriptions of program equipment exceptions may also use it frequently. The w_ij values of the term "network transmission" are relatively high in both categories, so the term is initially included in the feature information bases of both the network transmission abnormal event and the program equipment abnormal event. Comparing its total value under the network transmission abnormal event with its total value under the program equipment abnormal event shows that the former is higher, so the term is kept in the feature information base of the network transmission abnormal event, and so on.

Through the above processes, the association relation between the 3 categories of the network transmission abnormal event, the program equipment abnormal event and the application service abnormal event and the corresponding characteristic information base is established, and the association relation is provided for the classification use of the next process. The association relationship can be directly used, but if the subsequent abnormal condition description is updated, the association relationship can be updated and dynamically adjusted by the method.
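Steps 201 to 203 can be sketched as below: compute tf × log(N / n_j) per event, keep the three highest-weighted terms of each event, accumulate them per labeled category, and leave a shared term only in the category where its total is higher. The function name `build_feature_bases`, the input shape (a list of keyword lists with labels), the alphabetical tie-break among equal weights, and the sample events are assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


def build_feature_bases(events, top_k=3):
    """events: list of (keywords, category). Returns {category: set of terms}."""
    n_docs = len(events)
    doc_freq = Counter()  # n_j: number of events in which each term appears
    for words, _ in events:
        doc_freq.update(set(words))

    totals = defaultdict(lambda: defaultdict(float))  # category -> term -> summed w_ij
    for words, category in events:
        tf = Counter(words)
        weights = {t: tf[t] * math.log(n_docs / doc_freq[t]) for t in tf}
        top = sorted(weights, key=lambda t: (-weights[t], t))[:top_k]
        for t in top:
            totals[category][t] += weights[t]

    # Step 202: a term found in several bases stays only where its total is highest.
    best = {}
    for category, terms in totals.items():
        for t, w in terms.items():
            if t not in best or w > best[t][1]:
                best[t] = (category, w)

    # Step 203: store each base in association with its labeled category.
    bases = defaultdict(set)
    for t, (category, _) in best.items():
        bases[category].add(t)
    return dict(bases)


# Hypothetical labeled events.
events = [
    (["network", "network", "timeout"], "network transmission"),
    (["network", "vm", "vm"], "program equipment"),
    (["browser", "crash", "crash"], "application service"),
]
bases = build_feature_bases(events)
```

In this sample the term "network" appears under two categories and ends up only in the network transmission base, mirroring the "network transmission" example in the text.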

Type identification unit

This unit analyzes and calculates the abnormal condition event category according to the feature information bases and the labeled categories stored in association with them, takes the calculation result with the maximum weight as the finally determined abnormal condition event category, and thereby completes the classification of the abnormal condition event;

the abnormal condition event category analysis and calculation is realized by the following modes:

step 301, obtaining the problem description item sentence_j of each abnormal situation event and the keywords word_i contained therein, and calculating the probability P_k that the problem description item belongs to each category; P_k is the number of keywords mapped to Class_k divided by the length of the problem description item sentence_j, and the specific calculation formula is as follows:

P_k = count(word_i ∈ Class_k) / length(sentence_j), where k = 1, 2, 3;

when P_1 > P_2 and P_1 > P_3, Label = 1;

when P_2 > P_1 and P_2 > P_3, Label = 2;

when P_3 > P_1 and P_3 > P_2, Label = 3;

wherein P_k is the probability under each category and k is the sequence number of the category: network transmission abnormal events are 1, program equipment abnormal events are 2, and application service abnormal events are 3; Class_k is the feature information base of category k in the association relationship, so that Class_1 is the feature information base of network transmission abnormal events, Class_2 is that of program equipment abnormal events, and Class_3 is that of application service abnormal events; Label represents the finally determined abnormal condition category;

step 302, comparing the probabilities P_k of the categories: if P_1 is the highest, the event is judged to be a network transmission abnormal event and Label = 1; if P_2 is the highest, the event is judged to be a program equipment abnormal event and Label = 2; if P_3 is the highest, the event is judged to be an application service abnormal event and Label = 3; if the probabilities are the same, the event is judged to be a program equipment abnormal event and Label = 2;

cases in which the probabilities are the same are very rare, so some fault tolerance is acceptable; considering that program equipment abnormal events are the most numerous, such cases are labeled as program equipment abnormal events, i.e. Label = 2, which completes the classification process.
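Steps 301 and 302 amount to a keyword-share vote with a fixed fallback. The sketch below assumes that the length of a problem description item is its keyword count and that any tie for the highest probability falls back to Label 2; the text only states the fallback for "the same" probabilities, so the broader tie rule is an assumption, as are the function name `classify` and the sample bases.

```python
def classify(words, class_bases):
    """class_bases: {1: set, 2: set, 3: set}. Returns (label, {k: P_k})."""
    length = len(words)
    probs = {
        k: (sum(1 for w in words if w in base) / length if length else 0.0)
        for k, base in class_bases.items()
    }
    top = max(probs.values())
    winners = [k for k, p in probs.items() if p == top]
    # A unique maximum decides the label; ties fall back to program equipment (2).
    label = winners[0] if len(winners) == 1 else 2
    return label, probs


# Hypothetical feature information bases for Class_1, Class_2 and Class_3.
class_bases = {
    1: {"network", "timeout"},   # network transmission abnormal events
    2: {"vm"},                   # program equipment abnormal events
    3: {"browser", "crash"},     # application service abnormal events
}
label, probs = classify(["network", "timeout", "crash"], class_bases)
tie_label, _ = classify(["hello"], class_bases)
```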

The classification method for the abnormal events of the cloud mobile phone application service can be directly applied to a cloud mobile phone service center or an application service operation and maintenance platform, a large number of abnormal conditions are eliminated, alarm information related to categories is issued, related alarms are pushed to operation and maintenance personnel, the whole process is automatic, convenient and fast, and manual intervention is not needed. Moreover, the incidence relation can be changed and expanded through updating and dynamic adjustment, so that the timeliness of classification can be guaranteed according to actual requirements.

After the abnormal condition events are classified, the classified abnormal condition events are weighted, and the weighting processing of the abnormal condition events is realized by an abnormal condition event weighting analysis module. The weighting of the abnormal condition events can determine different weighted values for various abnormal condition events in the abnormal condition event correlation events, and can also serve as corresponding references for the priority among the abnormal condition events.

The abnormal situation weighting analysis module obtains an abnormal situation event training set and calculates the feature information weights, wherein the abnormal situation event training set is E = {(x_i, c_i, μ(x_i)) | i = 1, …, n}, where x_i is a sample data item in the training set, c_i is the abnormal situation event classification corresponding to x_i, and μ(x_i) is the adherence degree value of sample data item x_i;

the calculating the feature information weight specifically includes:

step 401, normalizing each sample data item of the abnormal condition event training set E, and determining the typical feature vector id_t of the abnormal condition event training set E, the typical feature vector id_t being determined by performing a mean operation on all samples in each abnormal situation event classification;

step 402, calculating the similarity between each feature of sample data item x_i in the abnormal condition event training set E and the corresponding feature of the typical feature vector id_t of its abnormal situation event classification, the calculation formula of the similarity is as follows:

wherein x_il is the l-th feature of the i-th sample data item in the abnormal condition event training set E, and id_tl is the l-th feature of the typical feature vector id_t corresponding to sample data item x_i; the similarity calculation yields an n × p similarity matrix S;

step 403, according to the similarity matrix S, taking the similarity corresponding to the feature of each sample data item as an adherence degree value, and calculating an entropy value of the feature of each sample data item by using the following formula:

where l = 1, …, p indexes the l-th feature of the sample data item;

step 403, calculating a sum of correlation information between the ith feature and other features of the ith sample data item in the abnormal condition event training set E, wherein a specific calculation formula is as follows:

wherein r is 1, 2, …, p, i (l), H (r) is the sum of correlation information between the ith feature and other features of the ith sample data item in the abnormal condition event training set E, H (r) is the entropy value of the ith feature of the sample data item in the abnormal condition event training set E, H (l, r) is the joint entropy value of the ith feature and the ith feature of the sample data item, and the joint entropy value is calculated by the following formula:

step 404, calculating the weight w_l of each feature of the sample data items according to the correlation information sum I(l), wherein the specific calculation formula is as follows:

wherein w_l represents the weight of the l-th feature of the sample data items in the abnormal condition event training set E.
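The steps above can be illustrated with a minimal Python sketch. The formulas themselves are not reproduced in this text, so several parts are assumptions made only for illustration: min-max scaling as the normalization, a similarity of the form 1 - |x_il - id_tl|, Shannon entropy over discretized similarity values, and a correlation sum built from the terms named above as H(l) + H(r) - H(l, r) summed over the other features r. All function names are hypothetical, and the final weight formula for w_l is deliberately left out because it cannot be recovered from the description.

```python
import math
from collections import Counter, defaultdict


def normalize(samples):
    """Min-max scale every feature column to [0, 1] (assumed normalization)."""
    columns = list(zip(*samples))
    lows = [min(col) for col in columns]
    highs = [max(col) for col in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in samples
    ]


def typical_vectors(samples, labels):
    """Step 401: the mean of all samples in each class gives its typical vector."""
    groups = defaultdict(list)
    for row, label in zip(samples, labels):
        groups[label].append(row)
    return {
        label: [sum(col) / len(rows) for col in zip(*rows)]
        for label, rows in groups.items()
    }


def similarity_matrix(samples, labels, typical):
    """Step 402 (assumed form): s_il = 1 - |x_il - id_tl|, an n x p matrix."""
    return [
        [1.0 - abs(v - t) for v, t in zip(row, typical[label])]
        for row, label in zip(samples, labels)
    ]


def entropy(values):
    """Shannon entropy in bits of a sequence of discrete symbols."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def correlation_sums(S, bins=4):
    """Assumed form of I(l): sum over r != l of H(l) + H(r) - H(l, r)."""
    # Discretize each similarity into `bins` levels so entropies can be counted.
    cols = [[min(int(s * bins), bins - 1) for s in col] for col in zip(*S)]
    H = [entropy(col) for col in cols]
    p = len(cols)
    return [
        sum(H[l] + H[r] - entropy(list(zip(cols[l], cols[r])))
            for r in range(p) if r != l)
        for l in range(p)
    ]
```

A caller would pass the normalized samples and their class labels through `typical_vectors` and `similarity_matrix`, then feed the resulting matrix S to `correlation_sums` to obtain one value I(l) per feature.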

Fig. 5 is a schematic diagram of the component structure of an abnormal event repairing strategy. In the present application, the abnormal event repair strategy is divided into three repair strategy modes according to the feature information weight: an active repair mode, a negotiation repair mode, and a passive repair mode.

In the active repair mode, a repair request is sent to the cloud mobile phone service center to trigger the allocation and switching of redundant service resources, so that the abnormal condition is repaired. This mode applies when, for example, the abnormal condition can be successfully repaired through redundant service resources, the affected service data flow has no quality-of-service guarantee (a low real-time requirement), and real-time data interaction with the service center is not required.

For example, in the case of a network transmission abnormal event, if only the main transmission link is abnormal, transmission of the service data stream can be recovered by switching the stream to the redundant auxiliary transmission link. However, the main transmission link is usually chosen as the optimal transmission link, and when it becomes abnormal there is no guarantee that the redundant auxiliary transmission link is the optimal link in the current network transmission environment. For a service data stream that has a low real-time requirement and no quality-of-service guarantee, delay neither affects the interaction of the service data stream nor harms the user experience of the cloud mobile phone, so the stream need not be switched to another optimal transmission link; in the active repair mode, the service center can directly and actively switch it to the redundant auxiliary transmission link.

However, a service data stream with a quality-of-service guarantee, such as a high-bandwidth real-time multimedia service data stream, places strict requirements on transmission delay and packet loss rate, so a redundant auxiliary transmission link must be selected and switched to in time to meet those requirements; this is the negotiation repair mode. The negotiation repair mode also triggers the allocation and switching of redundant service resources by sending a repair request to the cloud mobile phone service center. It differs from the active repair mode in that the affected service data stream has a quality-of-service guarantee (a high real-time requirement) and needs real-time data interaction with the service center to ensure the timeliness of the data stream and the user experience. In this situation, the service center should select the optimal redundant service resources according to the characteristics of the service data stream and switch to them quickly so that the abnormal condition is repaired quickly, for example for a high-bandwidth real-time audio and video data stream service in a cloud mobile phone application service.
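The difference in link selection between the active repair mode and the negotiation repair mode can be sketched as follows. This is a hedged illustration only: the `Link` record, its fields, the threshold parameters, and both helper functions are hypothetical names, and ranking candidates by delay and then by packet loss rate is an assumed way of picking the "optimal" redundant link.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Link:
    """Hypothetical description of one redundant auxiliary transmission link."""
    name: str
    healthy: bool
    delay_ms: float
    loss_rate: float


def first_healthy(links: List[Link]) -> Optional[Link]:
    """Active repair: any working auxiliary link is acceptable, so take the first."""
    return next((link for link in links if link.healthy), None)


def best_for_qos(links: List[Link], max_delay_ms: float,
                 max_loss_rate: float) -> Optional[Link]:
    """Negotiation repair: keep only links that meet the delay and loss limits,
    then pick the one with the lowest delay (ties broken by loss rate)."""
    candidates = [
        link for link in links
        if link.healthy
        and link.delay_ms <= max_delay_ms
        and link.loss_rate <= max_loss_rate
    ]
    return min(candidates, key=lambda link: (link.delay_ms, link.loss_rate),
               default=None)
```

In this sketch a result of `None` from `best_for_qos` means that no redundant link satisfies the stream's requirements, which a caller would have to handle separately.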

The passive repair mode is also started by sending a repair request to the cloud mobile phone service center. In this mode, however, the abnormal condition event prevents the redundant service resources from being allocated and switched, so the protection mechanism cannot start the repair normally, and the service center can allocate and switch the redundant service resources only after manual intervention and corresponding measures. For example, when both the main transmission link and the auxiliary transmission link are abnormal, a transmission link can only be reselected through the passive repair mode. Because this repair mode additionally occupies storage space on the hardware device and generates computational load, the passive repair mode is usually reserved for such cases.
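The conditions that distinguish the three repair modes can be summarized in a short sketch. It is an assumption-laden illustration: the application selects the strategy according to the feature information weight, whereas this sketch keys only on the two conditions described in the text (whether redundant resources can still be switched, and whether the affected stream has a quality-of-service guarantee). The names `RepairMode`, `AffectedFlow`, and `select_repair_mode` are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class RepairMode(Enum):
    ACTIVE = "active"
    NEGOTIATION = "negotiation"
    PASSIVE = "passive"


@dataclass
class AffectedFlow:
    """Hypothetical summary of a service data stream hit by an abnormal event."""
    has_qos_guarantee: bool                # high real-time requirement
    redundant_resources_switchable: bool   # can redundant resources be allocated?


def select_repair_mode(flow: AffectedFlow) -> RepairMode:
    # Redundant resources cannot be allocated or switched automatically:
    # manual intervention is needed, which corresponds to the passive mode.
    if not flow.redundant_resources_switchable:
        return RepairMode.PASSIVE
    # A stream with a quality-of-service guarantee needs the optimal
    # redundant resource and real-time interaction: negotiation mode.
    if flow.has_qos_guarantee:
        return RepairMode.NEGOTIATION
    # Otherwise any redundant resource will do: active mode.
    return RepairMode.ACTIVE
```

Checking switchability first reflects the description's point that the passive mode applies whenever the protection mechanism itself cannot start, regardless of the stream's real-time requirement.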

The technical scheme weights the abnormal condition event data, which improves the accuracy of abnormal condition event detection while providing a priority-order reference; it overcomes the defects of the prior art, is practical, and suits a wide range of application scenarios.

In some embodiments, part or all of the computer program may be loaded and/or installed onto the device via ROM. When loaded and executed, the computer program may carry out one or more of the steps of the method described above.

The functions described above in this disclosure may be performed at least in part by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a System on a Chip (SOC), a Complex Programmable Logic Device (CPLD), and the like.

Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.

In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.

Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims

1. An automatic abnormal repairing method based on cloud mobile phone service is applied to a cloud mobile phone service system, the cloud mobile phone service system comprises a user terminal and a cloud mobile phone service center, and the user terminal performs data interaction with the cloud mobile phone service center through a network, and is characterized by specifically comprising the following steps:

step 2, the abnormal situation classification module analyzes log description information contained in the abnormal situation event data, and judges the determined classification of the abnormal situation event data according to the preset abnormal situation category, wherein the abnormal situation event is classified into 3 types, namely a network transmission abnormal event, a program equipment abnormal event and an application service abnormal event;

step 5, the abnormal condition event repairing module determines a corresponding abnormal condition event repairing strategy according to the characteristic information weight, starts a repairing process and sends repairing result notification information to operation and maintenance personnel through the state notification module;

the abnormal condition event repairing strategy comprises an active repairing mode, a negotiation repairing mode and a passive repairing mode;

in the active repair mode, service data flow influenced by abnormal condition events does not have service quality guarantee, and real-time data interaction with a cloud mobile phone service center is not needed; in the active repair mode, a repair request is sent to a cloud mobile phone service center to trigger the allocation and switching of redundant service resources, so that the repair of abnormal conditions is realized;

in the negotiation repair mode, the service data flow influenced by the abnormal condition event has service quality guarantee and needs to perform real-time data interaction with the cloud mobile phone service center;

the passive repair mode is started by sending a repair request to a cloud mobile phone service center, and due to the influence of abnormal events, redundant service resources cannot be distributed and switched, so that a protection mechanism cannot be normally started and repaired, and the service center can realize the distribution and switching of the redundant service resources under the condition of manual intervention and taking corresponding measures;

the cloud mobile phone service center is provided with a monitoring device and a database device, wherein the monitoring device is used for collecting monitoring data of each layer, analyzing and processing the monitoring data and executing a corresponding control strategy according to the analysis and processing result;

the monitoring device comprises an abnormal condition classification module, an abnormal condition weighting analysis module, a state notification module and an abnormal condition event restoration module;

the abnormal condition classification module is used for analyzing the log description information of the abnormal condition and judging the determined classification of the abnormal condition according to the preset abnormal condition event category; the abnormal condition classification module comprises a preprocessing unit, a mapping association unit and a type identification unit which are sequentially connected;

2. The method for automatically remedying an abnormality according to claim 1, wherein periodically collected abnormal situation event data is temporarily stored in the database device.

3. The method according to claim 1, wherein the status notification information includes three parts: the affected service, the impact situation, and a description of the exception information.

4. The method for automatically repairing an exception according to claim 1, wherein the exception event is classified into 3 types of a network transmission exception, a program device exception, and an application service exception;