WO2021068831A1 - Service alarm method, device and storage medium - Google Patents


Info

Publication number
WO2021068831A1
Authority
WO
WIPO (PCT)
Prior art keywords
alarm
records
record
service
target
Application number
PCT/CN2020/119303
Other languages
English (en)
French (fr)
Inventor
Luo Gang (罗刚)
Original Assignee
Ping An Technology (Shenzhen) Co., Ltd. (平安科技(深圳)有限公司)
Application filed by Ping An Technology (Shenzhen) Co., Ltd.
Publication of WO2021068831A1 (Chinese-language publication)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3003 Monitoring arrangements specially adapted to the computing system or computing system component being monitored
    • G06F11/3006 Monitoring arrangements specially adapted to the computing system or computing system component being monitored where the computing system is distributed, e.g. networked systems, clusters, multiprocessor systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3055 Monitoring arrangements for monitoring the status of the computing system or of the computing system component, e.g. monitoring if the computing system is on, off, available, not available
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F11/00 Error detection; Error correction; Monitoring
    • G06F11/30 Monitoring
    • G06F11/3065 Monitoring arrangements determined by the means or processing involved in reporting the monitored data
    • G06F11/3072 Monitoring arrangements determined by the means or processing involved in reporting the monitored data where the reporting involves data filtering, e.g. pattern matching, time or event triggered, adaptive or policy-based reporting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Definitions

  • This application relates to the field of data security, and in particular to a service alarm method, device and storage medium.
  • A service monitoring platform receives tens of thousands of alarm records every day, and abnormal services can be monitored by analyzing these records.
  • The inventor realized that alarm records are currently analyzed mainly by checking them manually one by one. System operation and maintenance (O&M) personnel judge from experience which abnormal service an alarm record may be alarming about.
  • The embodiments of the present application provide a service alarm method that enables rapid alarming for abnormal services.
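  • In a first aspect, an embodiment of the present application provides a service alarm method, which includes: clustering multiple alarm records to obtain the alarm set to which each alarm record belongs, where different alarm sets correspond to different service labels; generating training samples according to the alarm records and the service label of each alarm record; training a support vector machine with the training samples to obtain a service alarm model; and
  • analyzing a target alarm record by using the service alarm model to obtain the service label of the target alarm record, and generating alarm information that includes the target alarm record and its service label.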
  • An embodiment of the present application further provides a service alarm device, which includes units for executing the service alarm method of the first aspect, namely:
  • a clustering unit, configured to cluster multiple alarm records to obtain the alarm set to which each of the alarm records belongs,
  • where different alarm sets correspond to different service labels, and a service label indicates the service alarmed by the alarm records in the corresponding alarm set;
  • a generating unit, configured to generate training samples according to the alarm records and the service label of each alarm record;
  • a training unit, configured to train a support vector machine with the training samples to obtain a service alarm model;
  • an analysis unit, configured to analyze the target alarm record by using the service alarm model to obtain the service label of the target alarm record; and
  • an alarm unit, configured to generate alarm information that includes the target alarm record and its service label.
  • An embodiment of the present application further provides a service alarm device including a processor and a memory that are connected to each other. The memory is used to store a computer program comprising program instructions,
  • and the processor is configured to call the program instructions to execute the following method:
  • the target alarm record is analyzed by using the service alarm model to obtain the service label of the target alarm record, and alarm information including the target alarm record and the service label of the target alarm record is generated.
  • An embodiment of the present application further provides a computer-readable storage medium that stores a computer program comprising program instructions. When executed by a processor, the program instructions cause the processor to execute the following method:
  • the target alarm record is analyzed by using the service alarm model to obtain the service label of the target alarm record, and alarm information including the target alarm record and the service label of the target alarm record is generated.
  • This application enables rapid alarming for abnormal services.
  • FIG. 1 is a schematic diagram of a service alarm system provided by an embodiment of the present application.
  • FIG. 2 is a schematic flowchart of a service alarm method provided by an embodiment of the present application.
  • FIG. 3 is a schematic flowchart of a service alarm method provided by another embodiment of the present application.
  • FIG. 4 is a schematic block diagram of a service alarm device provided by an embodiment of the present application.
  • FIG. 5 is a structural block diagram of a service alarm device provided by an embodiment of the present application.
  • The technical solution of the present application can be applied to the fields of artificial intelligence, blockchain and/or big data. Related data such as alarm records and alarm information can be stored in a database or in a blockchain.
  • The service alarm method of the present application can be applied to a service alarm device, which may be a node in a blockchain.
  • This application is mainly applied to a service alarm device. That device can be a traditional service alarm device, a terminal device, a server, or the service alarm device described in the third and fourth embodiments of this application; this is not limited in this application.
  • The service alarm device and other devices record and transmit data features in a preset format. They can process and analyze data received in that format accordingly, where the data features include time, location, type, and so on.
  • The service monitoring platform receives a large number of alarm records every day. Finding abnormal online services by manually checking individual alarm records is very inefficient, so an efficient service alarm method is still lacking.
  • The embodiments of the present application provide a service alarm method that detects service abnormalities automatically and efficiently. Specifically, multiple alarm records are first obtained and clustered according to the number of services that require alarms, yielding as many alarm sets as there are services. The alarm records in one alarm set alarm the same service, so they are labeled with the same service label. Finally, training samples containing the alarm records and their corresponding service labels are generated.
  • A support vector machine (SVM) is then trained with the training samples to obtain an alarm classification model that can accurately classify any alarm record.
  • A target alarm record is obtained in real time and classified with this alarm classification model, which yields the service label of the target alarm record.
  • Alarm information including the target alarm record and its service label is generated and sent to O&M personnel, so that they can inspect and maintain the abnormal service based on the alarm information.
  • An SVM is a classification model that maps data points that are not linearly separable in the original space into a new space in which they become linearly separable, and classifies them there. SVMs classify quickly and efficiently.
  • Training the SVM with the training samples lets it learn the classification rules contained in those samples. The result is a service alarm model that can directly classify any alarm record.
  • SVMs can be used for linear and non-linear classification as well as for regression. They have a low generalization error rate and good learning ability, and the trained models generalize well.
  • This application therefore does not require alarm records to be checked manually one by one. Instead, it makes full use of historical alarm records through machine learning and analyzes alarm records automatically. This reduces the time O&M personnel spend on analysis and troubleshooting and thereby improves O&M quality and efficiency.
  • The application of the embodiments of the present application is introduced below with reference to FIG. 1.
  • The embodiments can be applied to scenarios in which service abnormalities are detected.
  • The service alarm device may obtain the alarm records from a cloud server or from other terminal devices; this is not limited in the embodiments.
  • The following description, with reference to FIG. 1, takes as an example the case in which the service alarm device obtains the alarm records from the cloud server.
  • The service alarm device first collects data to obtain multiple alarm records and then divides them into multiple alarm sets by clustering.
  • Each alarm set corresponds to a service label, which identifies the service alarmed by each alarm record in that set. Finally, the SVM is trained with training samples composed of the alarm records and the service label of each alarm record. This yields a service alarm model that can accurately analyze the service label of any alarm record.
  • After the service alarm device obtains a target alarm record, it uses the service alarm model to analyze the record and obtain its service label. It then generates alarm information containing the target alarm record and its service label,
  • which alerts service maintenance personnel to check the service and take corresponding countermeasures.
  • The service alarm device may collect the alarm records directly into a database. Alternatively, it may first collect multiple original alarm records into the database and then apply digital characterization to them to obtain the alarm records. This reduces their data volume while retaining their data characteristics.
  • An original alarm record contains multiple alarm items, and digital characterization leaves the alarm items unchanged. The difference is that in the original alarm record each alarm item corresponds to alarm data, whereas in the resulting alarm record it corresponds to an alarm value.
  • Alarm data is expressed as characters such as numbers or text,
  • and the alarm data under different alarm items can take different character forms.
  • The alarm data under the different alarm items of each original alarm record is digitally characterized to obtain the alarm records. For example, the alarm data under the alarm item "central processing unit utilization rate" is positively normalized, the alarm data under "interface call" is hashed, and the alarm data under "network connectivity" is character-encoded.
  • The alarm records can be clustered directly with any one of fuzzy C-means clustering, hierarchical clustering, density-based clustering, and the k-means algorithm
  • to obtain multiple alarm sets. Alternatively, the number of services to be alarmed is first set as a preset number. Then, according to the similarity distance between each pair of alarm records, as many cluster centers as there are services are determined among the alarm records, and an alarm set is formed around each cluster center. The number of alarm sets is therefore also the preset number.
  • Determining the cluster centers is an iterative clustering process. Specifically, a preset number of alarm records are randomly selected as centers, and the alarm records are clustered around them to obtain a preset number of alarm sets. The alarm record that is the actual center of each alarm set is then determined, and the alarm records are clustered again around these actual centers.
  • This yields a preset number of new alarm sets, whose actual centers are determined in turn and used as centers for the next round of clustering. The process repeats until the actual centers of the alarm sets no longer change. The final actual centers are taken as the preset number of cluster centers, and the alarm records are clustered around them to obtain the preset number of alarm sets.
  • Clustering the alarm records around given centers means computing the similarity distance between each alarm record and each center, then assigning each alarm record to the center with the smallest similarity distance.
  • Determining the actual center of an alarm set means computing, for each alarm record in the set, its average similarity distance to the other alarm records in the set. The alarm record with the smallest average similarity distance is taken as the actual center of the set.
  • Generating training samples means combining the alarm records with the service label of each alarm record.
  • Another way to generate training samples is to filter out the valid alarm records and combine them with their service labels. Specifically:
    - the effective threshold of each alarm item in each alarm record is obtained;
    - valid alarm records are filtered out of the alarm records according to these effective thresholds and the alarm value under each alarm item; and
    - training samples containing the valid alarm records and their service labels are generated.
    Training samples generated in this latter way are of higher quality, and the service alarm model trained on them also analyzes more efficiently.
  • If the alarm value under any alarm item of an alarm record is not valid, the alarm record is an invalid alarm record. Only when the alarm values under all alarm items of the record are valid is it a valid alarm record. An alarm value is valid if it meets the effective threshold of its alarm item. For some alarm items, the value meets the threshold when it is greater than or equal to the threshold; for other alarm items, it meets the threshold when it is less than or equal to the threshold.
  • After the service alarm model is obtained by training with the above training samples, it can be trained further later to correct it. Specifically, after the model outputs the service label of the target alarm record, multiple pieces of feedback information are received. Each piece is a service label of the target alarm record marked by a different user. The service label that occurs most often in the feedback is taken as the target service label of the target alarm record, and this target service label is then used to correct the service alarm model.
  • The correction can be implemented in the same way as the training process described above, but the two serve different purposes. Training aims to obtain a service alarm model that applies to a wide range of service alarm scenarios,
  • whereas correction adapts the model to the service alarm scenario in which it is currently applied, so that it is optimized through actual use and better suited to current service alarm problems. For example, if the model is repeatedly corrected with service labels of alarm records of the network connection service, the corrected model will alarm on abnormalities of the network connection service more accurately.
  • In summary, the embodiment first obtains training samples through clustering and trains the SVM with them to obtain the service alarm model. The model is then used to analyze the target alarm record and quickly obtain its service label, and alarm information containing the target alarm record and its service label is generated.
  • The service label that the model outputs for the target alarm record indicates that an abnormality has occurred in the corresponding service, i.e., that this service is abnormal. The generated alarm information therefore provides a rapid alarm for the abnormal service.
  • The content shown in FIG. 1 is only an example and does not limit the embodiments of the present application, because the service alarm device can obtain any number of alarm records from any number of other devices.
  • The service alarm method may include the following.
  • The service alarm device first collects multiple alarm records into its database, then obtains the alarm records and clusters them to divide them into alarm sets.
  • Each alarm set corresponds to a service label, so alarm records in the same alarm set have the same service label. This label indicates the service alarmed by the alarm records in that set.
  • For example, the network communication service of one network point corresponds to one service label. The correspondence between services and service labels can be obtained by querying a mapping table between services and service labels.
  • Alarm records describe how services are running, and each alarm record contains multiple alarm items.
  • Alarm items are detailed operating items of a service. Each alarm item contains an alarm value, which is the specific value of what the alarm item indicates. Examples of alarm items include time, machine room/network area, system, application name, node, host name/Internet Protocol (IP) address, central processing unit (CPU) usage, network interruption/delay, network connectivity, disk space/input-output (IO), interface call, alarm level, service impact, upstream system, downstream system, handling plan, and handling result. Note that the alarm value under an alarm item can be any value, including a null value. Some alarm items also have fixed correspondences with each other, and these correspondences are pre-stored in a dictionary in the database.
  • Once the alarm value under one such alarm item is determined, the alarm value under the alarm item that corresponds to it is also determined. For example, there is a fixed correspondence between the alarm items "upstream system" and "downstream system":
  • once the alarm value under "upstream system" is determined, the alarm value under "downstream system" is also determined. It is the value that the dictionary in the database associates with that "upstream system" value.
  • Clustering the alarm records to obtain the alarm set of each record works as follows: the number of services to be alarmed is set as a preset number, and the alarm records are then clustered according to that number.
  • This yields a preset number of alarm sets, equal to the number of services. Each alarm set corresponds to the service label of one service, and the alarm records in the same alarm set share the same service label.
  • The clustering can use one of fuzzy C-means clustering, hierarchical clustering, density-based clustering, and the k-means algorithm.
  • Clustering the alarm records into a preset number of alarm sets according to the number of services works in two steps. First, a preset number of cluster centers are determined among the alarm records according to the similarity distance between each pair of alarm records. Then an alarm set is formed around each cluster center.
  • The similarity distance can be one of the Euclidean distance, the Mahalanobis distance, the Manhattan distance, and the cosine of the included angle.
  • The similarity distance between alarm records reflects how closely they are associated. The smaller the similarity distance, the more closely related the alarm records are; the larger the distance, the less related they are.
  • The service alarm device first randomly selects n alarm records from the m (m > n) alarm records as centers. It then computes the similarity distances between each of the remaining (m - n) alarm records and the n centers, and assigns each remaining record to the alarm set of the center with the smallest similarity distance.
  • The actual center of each of the n alarm sets is then determined: in each alarm set, the alarm record with the smallest average similarity distance to the other alarm records in the set is taken as its actual center. Suppose a target alarm set contains the i-th alarm record, and the similarity distances between the i-th record and the other alarm records in the target set are $b_1, b_2, \ldots, b_j$.
  • The average similarity distance between the i-th alarm record and the other alarm records in the target set is then $\bar{b}_i = (b_1 + b_2 + \cdots + b_j)/j$.
  • This average is computed for every alarm record in the target set, and the record with the smallest average is taken as the actual center of the set. The m alarm records are then clustered again around the actual centers of the alarm sets,
  • which yields n new alarm sets, and a new actual center is determined in each new alarm set.
  • Alarm records from each source are collected into the MongoDB database through the distributed messaging system Kafka, which provides the multiple alarm records.
  • MongoDB is a document-oriented database that supports sharding and offers fast retrieval and highly concurrent access.
  • Kafka is a distributed, partitioned, multi-replica messaging system with distributed coordination. Its main strength is processing large amounts of data in real time, which suits a wide range of scenarios.
  • Each original alarm record contains multiple alarm items, and each alarm item contains alarm data.
  • An original alarm record and the alarm record derived from it contain the same alarm items. In the original alarm record each alarm item corresponds to alarm data, whereas in the alarm record it corresponds to an alarm value.
  • Digital characterization of the alarm data yields the alarm value under each alarm item of the alarm record.
  • The alarm values of different alarm items can take different character forms,
  • such as numbers or text. Note that some data, such as time, does not need digital characterization, whereas other data, such as CPU usage, network connectivity, and interface calls, does.
  • Digital characterization processes the original alarm records according to digital characterization rules. The resulting alarm records retain the data characteristics of the original alarm records while reducing their data volume.
  • Digital characterization includes at least one of positive numerical normalization, hash calculation, and character encoding of the alarm data.
  • Positive numerical normalization converts alarm data into positive integers;
  • hash calculation converts alarm data into hash values;
  • character encoding converts alarm data into numeric codes. Character encoding turns alarm data that is inconvenient to transmit into alarm values that are convenient to transmit, and a mapping relationship exists between the alarm data and the alarm values.
  • For example, the characterization rule for the alarm item "central processing unit utilization rate" is positive normalization. The alarm data "96%" under this item in the original alarm record is therefore positively normalized, i.e., the CPU usage is converted into a positive integer, giving the alarm value "96";
  • the characterization rule for the alarm item "interface call" is hashing, so the alarm data "384592546" under "interface call" in the original alarm record is hashed, giving the alarm value "83c278845f00450c4222da1a4e35f408";
  • and the characterization rule for the alarm item "network connectivity" is character encoding, so the alarm data "data packet received from the ping end" under "network connectivity" in the original alarm record is character-encoded, giving the alarm value "1".
  • If the alarm item in the original alarm record is "network connectivity", …
  • a training sample is generated.
  • the above-mentioned multiple alarm records and the service label of each alarm record of the above-mentioned multiple alarm records are combined to obtain training samples for subsequent training of the support vector machine.
  • generating training samples based on the multiple alarm records and the business label of each alarm record of the multiple alarm records refers to filtering the multiple alarm records, and then meeting the filtering conditions
  • the alarm records of and the service labels of the alarm records that meet the screening conditions are combined to obtain training samples for subsequent training of the support vector machine.
  • each alarm record contains multiple alarm items, and each alarm item contains an alarm value; according to each alarm record in each alarm record.
  • the effective threshold corresponding to each alarm item, and the alarm value under each alarm item in each alarm record filter out effective alarm records from multiple alarm records; generate a business label containing the effective alarm record and the effective alarm record Training samples.
  • That is, the effective alarm records among the multiple alarm records are filtered out and combined with their service labels to obtain training samples. Specifically, the effective threshold corresponding to each alarm item in each alarm record is obtained. Then, according to these effective thresholds and the alarm value under each alarm item in each alarm record, the effective alarm records are filtered out of the multiple alarm records. Finally, training samples containing the effective alarm records and their service labels are generated.
  • Each alarm item in an alarm record corresponds to an effective threshold, which is used to judge whether the data under that alarm item meets the filtering conditions, so that the multiple alarm records can be filtered. First, the effective threshold corresponding to each alarm item in the alarm record is obtained. Then the alarm value of each alarm item in each alarm record is compared with its effective threshold. If any alarm value in an alarm record does not meet its effective threshold, the alarm record does not meet the filtering conditions and is filtered out. Conversely, if every alarm value meets its effective threshold, the alarm record is selected.
  • Comparing an alarm value with its effective threshold means checking whether the alarm value under the alarm item is greater than, or less than, the corresponding effective threshold. The direction of the comparison depends on the alarm item, and different alarm items also have different effective thresholds.
  • In one approach, an alarm record is an invalid alarm record if any of its alarm values is invalid. Only when the alarm values under all alarm items in the alarm record are valid values is the alarm record a valid alarm record.
  • the method for judging whether the alarm value under the alarm item is a valid alarm value can refer to the method described before in this embodiment, which will not be repeated here.
  • In another approach, an alarm record is a valid alarm record only when at least a preset number of its alarm items have valid alarm values; otherwise it is an invalid alarm record.
  • each alarm item corresponds to a weight, and when the total weight of the alarm items containing valid values in the alarm record exceeds the preset weight, the alarm record is a valid alarm record.
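The three validity rules above ("all values valid", "at least a preset number valid", and "valid weight exceeds a preset weight") can be sketched as one filter. This is a minimal sketch under stated assumptions. The threshold table format, the `"above"`/`"below"` comparison directions, and the parameter names are hypothetical, because the text says only that the comparison is "greater than or less than" depending on the alarm item.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical threshold table: item -> (effective threshold, direction).
# "above": valid when value >= threshold; "below": valid when value <= threshold.
Thresholds = Dict[str, Tuple[float, str]]


def is_valid_value(value: float, threshold: float, direction: str) -> bool:
    """Check one alarm value against its effective threshold."""
    if direction == "above":
        return value >= threshold
    if direction == "below":
        return value <= threshold
    raise ValueError(f"unknown direction: {direction!r}")


def valid_flags(record: Dict[str, float], thresholds: Thresholds) -> Dict[str, bool]:
    """Mark each thresholded alarm item of one record as valid or invalid."""
    return {item: is_valid_value(record[item], *thresholds[item]) for item in thresholds}


def filter_records(
    records: List[Dict[str, float]],
    labels: List[str],
    thresholds: Thresholds,
    mode: str = "all",
    min_valid: int = 1,
    weights: Optional[Dict[str, float]] = None,
    min_weight: float = 0.0,
) -> List[Tuple[Dict[str, float], str]]:
    """Return (record, service label) training samples for the effective records.

    mode="all":    every alarm value must be valid
    mode="count":  at least `min_valid` alarm values must be valid
    mode="weight": total weight of the valid items must exceed `min_weight`
    """
    samples = []
    for record, label in zip(records, labels):
        flags = valid_flags(record, thresholds)
        if mode == "all":
            keep = all(flags.values())
        elif mode == "count":
            keep = sum(flags.values()) >= min_valid
        elif mode == "weight":
            if weights is None:
                raise ValueError("weights are required for mode='weight'")
            keep = sum(weights[item] for item, ok in flags.items() if ok) > min_weight
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        if keep:
            samples.append((record, label))
    return samples
```

Keeping the three rules behind a single `mode` switch makes it easy to compare how strict each rule is on the same set of alarm records.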
  • The support vector machine is trained with the training samples so that it can fully learn the classification rules they contain. The alarm records in the training samples are input into the support vector machine, which analyzes them and outputs a predicted service label for each alarm record. The training samples also contain the actual service label of each alarm record, that is, the correct, manually labeled service label, so the predicted service label may be inconsistent with the actual service label.
  • A loss function is used to calculate the classification error of the support vector machine, and the error is then used in a backward training pass to modify the parameters of the support vector machine. After the support vector machine has been trained on all the alarm records in the training samples, it becomes a mature service alarm model that can quickly and efficiently classify other, arbitrary alarm records.
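The training loop described above can be sketched as a linear SVM that computes a hinge-loss error and then moves its parameters against the error's subgradient. This is a minimal sketch, not the patent's implementation. Classical SVM libraries usually solve a quadratic program instead, for example SMO in libsvm. The one-vs-rest scheme for multiple service labels and the hyperparameters (`epochs`, `lr`, `reg`) are assumptions chosen for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Model = Tuple[List[float], float]  # (weights w, bias b)


def decision(model: Model, x: Sequence[float]) -> float:
    """Linear decision function w.x + b."""
    w, b = model
    return sum(wj * xj for wj, xj in zip(w, x)) + b


def train_binary_svm(X, y, epochs=500, lr=0.05, reg=0.01) -> Model:
    """Minimize reg/2 * ||w||^2 + mean(max(0, 1 - y * (w.x + b))), y in {+1, -1}."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w = [reg * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            if yi * decision((w, b), xi) < 1:  # hinge loss active: sample contributes error
                for j in range(n_features):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        # Backward step: move the parameters against the error (sub)gradient.
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def train_svm(X, labels, **kwargs) -> Dict[str, Model]:
    """One-vs-rest: train one binary SVM per service label."""
    models = {}
    for label in sorted(set(labels)):
        y = [1 if lab == label else -1 for lab in labels]
        models[label] = train_binary_svm(X, y, **kwargs)
    return models


def predict(models: Dict[str, Model], x: Sequence[float]) -> str:
    """Return the service label whose decision function is largest."""
    return max(models, key=lambda label: decision(models[label], x))
```

In practice the characterized alarm values would be the feature vectors `X` and the cluster-derived service labels the `labels`. A production system would more likely use an established SVM library than this hand-written loop.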
  • The target alarm record is analyzed using the service alarm model to obtain its service label, and alarm information containing the target alarm record and its service label is then generated. The alarm information indicates that the service corresponding to the service label of the target alarm record is abnormal. The target alarm record is an alarm record obtained in real time by the local service alarm device.
  • The alarm information is sent to the terminal device of the operation and maintenance personnel by email, telephone, or other means, either to notify them to handle the problem or to automatically trigger and execute the associated error-handling program.
  • It may first be determined whether the service label of the target alarm record is an important service label, and if so, the alarm information is sent to the terminal device of the operation and maintenance personnel.
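Generating the alarm information and sending it only for important service labels can be sketched as follows. This is a minimal sketch under stated assumptions. `AlarmInfo`, the message wording, the set of important labels, and the `send` callable (standing in for an email or telephone gateway) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Collection


@dataclass
class AlarmInfo:
    """Alarm information: the target alarm record plus its predicted service label."""
    record: dict
    service_label: str
    message: str


def build_alarm_info(record: dict, service_label: str) -> AlarmInfo:
    """Bundle the target alarm record with its service label."""
    return AlarmInfo(record, service_label, f"service '{service_label}' may be abnormal")


def dispatch(info: AlarmInfo, important_labels: Collection[str],
             send: Callable[[AlarmInfo], None]) -> bool:
    """Send the alarm only when its service label is an important service label.

    `send` is a placeholder for an email/telephone gateway; returns True if sent."""
    if info.service_label in important_labels:
        send(info)
        return True
    return False
```

Injecting `send` as a parameter keeps the gating rule independent of the delivery channel, so the same check works whether the alarm goes to email, telephone, or an automated error-handling program.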
  • The local service alarm device may also receive multiple pieces of feedback information, each being a service label of the target alarm record marked by a different user. The device determines the service label that occurs most often in the feedback information, uses it as the target service label of the target alarm record, and uses the target service label to revise the service alarm model.
  • Any number of users can also manually label the service label of the target alarm record, so the local service alarm device receives multiple pieces of feedback information marked by different users. Each piece of feedback information includes the service label that the user marked for the target alarm record.
  • the service alarm device determines the service label with the most occurrences in the received multiple pieces of feedback information, and uses the label with the most occurrences as the target service label of the target alarm record.
  • Use the target service label to train the service alarm model, thereby further revising the service alarm model, so that the service alarm model can subsequently analyze the service label of the alarm record more accurately and improve the accuracy of the analysis.
  • For example, suppose n pieces of feedback information are received, of which n1 give the first service label s1, n2 give the second service label s2, and n3 give the third service label s3, where n1 + n2 + n3 = n and n1 is greater than both n2 and n3.
  • The service label that appears most frequently in the n pieces of feedback information is therefore the first service label s1. s1 is used as the target service label of the target alarm record, and the first service label s1 is used to train the service alarm model.
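The majority vote in the s1/s2/s3 example can be sketched with `collections.Counter`. This is a minimal sketch. The helper names are hypothetical, and the tie-breaking behavior (first label seen wins, which follows from `Counter.most_common`) is an assumption because the text does not address ties.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def target_label(feedback: Iterable[str]) -> str:
    """Return the service label marked most often by users (ties: first seen)."""
    counts = Counter(feedback)
    if not counts:
        raise ValueError("no feedback information received")
    label, _count = counts.most_common(1)[0]
    return label


def add_feedback_sample(samples: List[Tuple[dict, str]], record: dict,
                        feedback: Iterable[str]) -> str:
    """Append (target record, target service label) so the model can be retrained."""
    label = target_label(feedback)
    samples.append((record, label))
    return label
```

With n1 = 5, n2 = 3, and n3 = 2, `target_label(["s1"] * 5 + ["s2"] * 3 + ["s3"] * 2)` returns `"s1"`. The resulting sample can then be added to the training set before the service alarm model is retrained.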
  • This application uses clustering to group the multiple alarm records into alarm sets whose number equals the number of services.
  • Each alarm set corresponds to a business label.
  • The alarm records in the same alarm set alert on the same service. Each alarm record and its service label are then combined into a training sample, and the training samples are used to train the support vector machine to obtain the service alarm model.
  • Finally, the service alarm model is used to analyze the target alarm record, so that its service label can be obtained quickly and alarm information containing the target alarm record and its service label can be generated.
  • In summary, the embodiments of this application first obtain, through clustering, training samples containing alarm records and their service labels. They then train the support vector machine with those samples to obtain the service alarm model, and finally use the service alarm model to analyze the service label of a received target alarm record. The service label obtained indicates that the corresponding service is abnormal, and alarm information containing the target alarm record and its service label is generated, enabling rapid alarms for abnormal services.
  • the service alarm method may include:
  • Each original alarm record contains multiple alarm items, and each alarm item contains alarm data.
  • The original alarm records from each source are collected uniformly into the database mongodb through the distributed messaging system kafka, so that multiple original alarm records are obtained.
  • After the multiple original alarm records are collected, they are digitally characterized to obtain the multiple alarm records.
  • An original alarm record and the corresponding alarm record contain the same alarm items. The difference is that each alarm item holds alarm data in the original alarm record and an alarm value in the alarm record.
  • That is, the alarm value under each alarm item of the alarm record is obtained by digitally characterizing the corresponding alarm data.
  • the alarm values of different alarm items can be expressed in different character forms.
  • The character forms include numerical values, text, and so on. Note that some data, such as time and other dimension data, does not require digital characterization, whereas other data, such as conventional metrics like CPU usage, network connectivity, and interface calls, does.
  • Digital characterization processing is performed on the alarm data under the different alarm items in each of the multiple original alarm records to obtain the multiple alarm records.
  • the characterization rules include at least one of numerical positive normalization, hash calculation, and character encoding of the alarm data.
  • That is, the digital characterization processing includes at least one of numerical positive normalization, hash calculation, and character encoding.
  • Clustering the multiple alarm records to obtain the alarm set to which each belongs means first determining the number of services to be alerted on as a preset number, and then clustering the multiple alarm records according to that number of services to obtain the preset number of alarm sets.
  • the number of alarm sets is consistent with the number of services.
  • Each alarm set corresponds to the service label of one service, and the alarm records in the same alarm set share the same service label.
  • For the clustering, one of fuzzy C-means clustering, hierarchical clustering, density-based clustering, and the k-value clustering algorithm (also known as the kmeans algorithm) can be used.
  • Clustering the multiple alarm records according to the number of services to obtain a preset number of alarm sets means determining a preset number of cluster centers among the multiple alarm records according to the similarity distance between any two of them, and then determining an alarm set centered on each cluster center, giving the preset number of alarm sets.
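Choosing a preset number of cluster centers and building one alarm set around each can be sketched with k-means on characterized alarm values, with k equal to the number of services. This is a minimal sketch under stated assumptions. It uses Euclidean distance, and the deterministic farthest-first initialization is an illustrative choice; the text does not specify how the centers are initialized.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def init_centers(points: Sequence[Vector], k: int) -> List[List[float]]:
    """Farthest-first initialization (deterministic): start from the first record,
    then repeatedly add the record farthest from every center chosen so far."""
    centers = [list(points[0])]
    while len(centers) < k:
        farthest = max(points, key=lambda p: min(euclidean(p, c) for c in centers))
        centers.append(list(farthest))
    return centers


def kmeans(points: Sequence[Vector], k: int,
           max_iter: int = 100) -> Tuple[Optional[List[int]], List[List[float]]]:
    """Cluster alarm records into k alarm sets (k = number of services).

    Returns the alarm-set index of each record and the final cluster centers."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of records")
    centers = init_centers(points, k)
    assignment: Optional[List[int]] = None
    for _ in range(max_iter):
        # Assign every record to its nearest cluster center.
        new_assignment = [min(range(k), key=lambda j: euclidean(p, centers[j])) for p in points]
        if new_assignment == assignment:
            break  # converged: no record changed alarm set
        assignment = new_assignment
        # Move each center to the mean of the records assigned to it.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assignment, centers
```

Each alarm-set index can then be mapped to a service label through a hypothetical list such as `service_labels = ["payment", "login"]`, giving `[service_labels[a] for a in assignment]` as the labels of the training samples.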
  • The similarity distance can be one of the Euclidean distance, the Mahalanobis distance, the Manhattan distance, and the cosine of the included angle.
  • The size of the similarity distance between alarm records reflects the degree of association between them: the greater the similarity distance, the higher the correlation between the alarm records, and the smaller the similarity distance, the lower the correlation.
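Three of the listed measures can be computed as follows. For the cosine of the included angle, a larger value (closer to 1) means more similar records. For the Euclidean and Manhattan distances, a smaller value means more similar records. The Mahalanobis distance is omitted from this sketch because it also requires an estimated covariance matrix and its inverse.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; smaller means more similar."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences; smaller means more similar."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the included angle; larger (closer to 1) means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```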
  • a training sample is generated.
  • In a first method, the multiple alarm records and the service label of each alarm record are combined directly to obtain training samples for subsequent training of the support vector machine.
  • The second method of generating training samples is as follows. First, obtain the effective threshold corresponding to each alarm item in each alarm record, where each alarm record contains multiple alarm items and each alarm item contains an alarm value. Next, filter effective alarm records out of the multiple alarm records according to the effective threshold corresponding to each alarm item in each alarm record and the alarm value under each alarm item in each alarm record. Finally, generate training samples containing the effective alarm records and their service labels.
  • In one approach, an alarm record is an invalid alarm record if any of its alarm values is invalid. Only when the alarm values under all alarm items in the alarm record are valid values is the alarm record a valid alarm record.
  • the method for judging whether the alarm value under the alarm item is a valid alarm value can refer to the method described before in this embodiment, which will not be repeated here.
  • In another approach, an alarm record is a valid alarm record only when at least a preset number of its alarm items have valid alarm values; otherwise it is an invalid alarm record.
  • each alarm item corresponds to a weight, and when the total weight of the alarm items containing valid values in the alarm record exceeds the preset weight, the alarm record is a valid alarm record.
  • any number of users can also manually label the service labels of the target alarm records, so the local service alarm device will receive multiple pieces of feedback information marked by different users, and the feedback information includes The service label marked by the user for the target alarm record.
  • The target service label determined in the above steps is used to train the service alarm model, further revising it so that it can subsequently analyze the service labels of alarm records more accurately.
  • This embodiment of the application provides a more detailed implementation of the service alarm method. Note that the descriptions of the various embodiments above tend to emphasize their differences; for parts that are the same or similar, the embodiments may be referred to each other. For brevity, details are not repeated here.
  • An embodiment of this application also provides a service alarm device, which includes units for executing any one of the foregoing service alarm methods.
  • FIG. 4 is a schematic block diagram of a service alarm device provided by an embodiment of this application.
  • The service alarm device of this embodiment includes a clustering unit 410, a generating unit 420, a training unit 430, an analysis unit 440, and an alarm unit 450. Specifically:
  • The clustering unit 410 is configured to cluster multiple alarm records to obtain the alarm set to which each of the multiple alarm records belongs. Different alarm sets correspond to different service labels, and a service label indicates the service alarmed by the alarm records in the corresponding alarm set. The generating unit 420 is configured to generate training samples according to the multiple alarm records and the service label of each of them. The training unit 430 is configured to train the support vector machine using the training samples to obtain a service alarm model. The analysis unit 440 is configured to analyze a target alarm record using the service alarm model to obtain the service label of the target alarm record. The alarm unit 450 is configured to generate alarm information containing the target alarm record and its service label.
  • The service alarm device further includes a determining unit 460 configured to determine the number of services to be alerted on. The clustering unit 410 is specifically configured to cluster the multiple alarm records according to the number of services to obtain a preset number of alarm sets, the number of alarm sets being equal to the number of services.
  • The clustering unit 410 is specifically configured to determine a preset number of cluster centers among the multiple alarm records according to the similarity distance between any two of them, and to determine an alarm set centered on each cluster center, obtaining the preset number of alarm sets.
  • The generating unit 420 is specifically configured to obtain the effective threshold corresponding to each alarm item in each alarm record.
  • Each alarm record contains multiple alarm items, and each alarm item contains an alarm value.
  • The generating unit 420 then filters effective alarm records out of the multiple alarm records according to the effective threshold corresponding to each alarm item in each alarm record and the alarm value under each alarm item in each alarm record, and generates training samples containing the effective alarm records and their service labels.
  • The service alarm device further includes a collection unit 470, configured to collect multiple original alarm records into the database through the distributed messaging system, where each original alarm record contains multiple alarm items and each alarm item contains alarm data;
  • the preprocessing unit 480 is used to perform digital characterization processing on the alarm data under each alarm item in the above-mentioned multiple original alarm records, to obtain the above-mentioned multiple alarm records.
  • Each alarm record contains multiple alarm items, and each alarm item contains an alarm value.
  • The preprocessing unit 480 is specifically configured to perform digital characterization processing on the alarm data under different alarm items in each of the multiple original alarm records, according to the characterization rules corresponding to the different alarm items, to obtain the multiple alarm records. A characterization rule includes at least one of numerical positive normalization, hash calculation, and character encoding of the alarm data.
  • The service alarm device further includes a receiving unit 490, configured to receive multiple pieces of feedback information, each being the service label of the target alarm record marked by a different user. The determining unit 460 is further configured to determine the service label that occurs most often in the feedback information and use it as the target service label of the target alarm record. The training unit 430 is further configured to revise the service alarm model using the target service label.
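The wiring of units 410 to 450 described above can be sketched as a small class whose units are injected callables. This is a hypothetical illustration of the data flow only. The class name, field names, and the choice of plain callables are assumptions. The generating unit is reduced here to attaching each record's alarm-set label, that is, the first method of generating training samples.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Record = Sequence[float]


@dataclass
class ServiceAlarmDevice:
    """Hypothetical wiring of units 410-450; each unit is an injected callable."""
    cluster: Callable[[List[Record]], List[int]]                         # clustering unit 410
    train: Callable[[List[Record], List[str]], Callable[[Record], str]]  # training unit 430
    set_labels: Sequence[str]                      # service label of each alarm set
    model: Optional[Callable[[Record], str]] = None  # trained service alarm model

    def fit(self, records: List[Record]) -> List[str]:
        assignment = self.cluster(records)                 # 410: alarm set per record
        labels = [self.set_labels[a] for a in assignment]  # 420: label each record
        self.model = self.train(records, labels)           # 430: train the model
        return labels

    def alarm(self, target: Record) -> dict:
        if self.model is None:
            raise RuntimeError("call fit() before alarm()")
        label = self.model(target)                         # 440: analyze target record
        return {"record": target, "service_label": label}  # 450: alarm information
```

Injecting the clustering and training steps keeps the device independent of any particular clustering algorithm or classifier, which mirrors the way the text lets one of several clustering methods be chosen.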
  • Through clustering, the clustering unit in the service alarm device groups the multiple alarm records into alarm sets whose number equals the number of services. Each alarm set is used to alert on one service, and the alarm records in the same alarm set correspond to one service label.
  • the generating unit combines the alarm record and the service label of the alarm record into a training sample.
  • The training unit uses the training samples to train the support vector machine to obtain the service alarm model. Finally, the analysis unit uses the service alarm model to analyze the target alarm record and quickly obtain its service label, and the alarm unit generates alarm information containing the target alarm record and its service label.
  • In summary, the embodiments of this application first obtain, through clustering, training samples containing alarm records and their service labels. They then train the support vector machine with those samples to obtain the service alarm model, and finally use the service alarm model to analyze the service labels of other received alarm records. The service label of the target alarm record obtained by the service alarm model indicates that the corresponding service is abnormal, and alarm information containing the target alarm record and its service label is generated, enabling rapid alarms for abnormal services.
  • FIG. 5 is a schematic block diagram of a service alarm device provided by another embodiment of the present application.
  • the service alarm device in this embodiment may include: a processor 510 and a memory 520.
  • The processor 510 and the memory 520 are connected through a bus 530. Specifically:
  • The processor 510 performs the following unit functions:
    • clustering unit 410: clusters multiple alarm records to obtain the alarm set to which each belongs, where different alarm sets correspond to different service labels and a service label indicates the service alarmed by the alarm records in the corresponding alarm set;
    • generating unit 420: generates training samples according to the multiple alarm records and the service label of each of them;
    • training unit 430: trains the support vector machine with the training samples to obtain the service alarm model;
    • analysis unit 440: analyzes a target alarm record with the service alarm model to obtain its service label;
    • alarm unit 450: generates alarm information containing the target alarm record and its service label.
  • The processor 510 is further configured to perform the function of the determining unit 460, determining the number of services to be alerted on. The processor 510 is specifically configured to cluster the multiple alarm records according to the number of services to obtain a preset number of alarm sets, the number of alarm sets being equal to the number of services.
  • The processor 510 is specifically configured to determine a preset number of cluster centers among the multiple alarm records according to the similarity distance between any two of them, and to determine an alarm set centered on each cluster center, obtaining the preset number of alarm sets.
  • The processor 510 is specifically configured to obtain the effective threshold corresponding to each alarm item in each alarm record, where each alarm record contains multiple alarm items and each alarm item contains an alarm value;
  • The processor 510 is further configured to perform the function of the collection unit 470, collecting multiple original alarm records into the database through the distributed messaging system, where each original alarm record contains multiple alarm items and each alarm item contains alarm data;
  • The processor 510 is further configured to perform the function of the preprocessing unit 480, performing digital characterization processing on the alarm data under each alarm item in the multiple original alarm records to obtain the multiple alarm records, where each alarm record contains multiple alarm items and each alarm item contains an alarm value.
  • The processor 510 is specifically configured to perform digital characterization processing on the alarm data under different alarm items in each of the multiple original alarm records, according to the characterization rules corresponding to the different alarm items, to obtain the multiple alarm records. A characterization rule includes at least one of numerical positive normalization, hash calculation, and character encoding of the alarm data.
  • The service alarm device further includes an input device 540, configured to perform the function of the receiving unit 490, receiving multiple pieces of feedback information, each being the service label of the target alarm record marked by a different user.
  • The processor 510 is further configured to perform the function of the determining unit 460, determining the service label that occurs most often in the multiple pieces of feedback information and using it as the target service label of the target alarm record. The processor 510 is further configured to revise the service alarm model using the target service label.
  • The processor 510 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the memory 520 may include a read-only memory and a random access memory, and provides instructions and data to the processor 510. A part of the memory 520 may also include a non-volatile random access memory. For example, the memory 520 may also store device type information.
  • the embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored.
  • When the computer program is executed by a processor, the following method can be implemented:
    • clustering multiple alarm records to obtain the alarm set to which each belongs, where different alarm sets correspond to different service labels and a service label indicates the service alarmed by the alarm records in the corresponding alarm set;
    • generating training samples according to the multiple alarm records and the service label of each of them;
    • training a support vector machine with the training samples to obtain a service alarm model;
    • analyzing a target alarm record with the service alarm model to obtain its service label, and generating alarm information including the target alarm record and its service label.
  • When the computer program is executed by the processor, it can also implement the other steps of the method in the foregoing embodiments, which are not repeated here.
  • the storage medium involved in this application such as a computer-readable storage medium, may be non-volatile or volatile.
  • the computer-readable storage medium may be an internal storage unit of the business alarm device of any of the foregoing embodiments, such as the hard disk or memory of the business alarm device.
  • The computer-readable storage medium can also be an external storage device of the service alarm device, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the service alarm device.
  • the computer-readable storage medium may also include both the internal storage unit of the service alarm device and the external storage device.
  • the computer-readable storage medium is used to store computer programs and other programs and data required by the business alarm device.
  • the computer-readable storage medium can also be used to temporarily store data that has been output or will be output.
  • The processor 510 described in the embodiments of this application can execute the implementations described in the second and third embodiments of the service alarm method provided in the embodiments of this application. It can also execute the implementation of the service alarm device described in the embodiments of this application, which is not repeated here.
  • the disclosed service alarm device and service alarm method can be implemented in other ways.
  • The device embodiments described above are merely illustrative. For example, the division into units is merely a logical functional division, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, and may also be electrical, mechanical or other forms of connection.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments of the present application.
  • the functional units in the various embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit can be implemented in the form of hardware or software functional unit.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
  • The technical solution of this application essentially, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a service alarm device, a network device, or the like) to execute all or part of the steps of the methods in the embodiments of this application.
  • The foregoing storage media include a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, an optical disc, and other media that can store program code.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Alarm Systems (AREA)

Abstract

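A service alarm method, device, and storage medium. The method includes the following steps.

(201) Cluster a plurality of alarm records to obtain the alarm set to which each alarm record belongs. Different alarm sets correspond to different service labels.

(202) Generate training samples from the alarm records and the service label of each alarm record.

(203) Train a support vector machine with the training samples to obtain a service alarm model.

(204) Analyze a target alarm record with the service alarm model to obtain its service label, and generate alarm information that includes the target alarm record and its service label.

The method first uses clustering to obtain training samples that contain alarm records and their service labels. It then trains the support vector machine on these samples to obtain the service alarm model. Finally, it uses the model to determine the service label of the target alarm record and generate alarm information, thereby enabling fast alarms for abnormal services.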

Description

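Service Alarm Method, Device, and Storage Medium
This application claims priority to Chinese Patent Application No. 201910961590.7, filed with the Chinese Patent Office on October 10, 2019 and entitled "Service Alarm Method, Device, and Storage Medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of data security, and in particular to a service alarm method, device, and storage medium.
Background
A service monitoring platform receives tens of thousands of alarm records every day. By analyzing these records, services that have become abnormal can be detected.
The inventor realized that alarm records are currently analyzed mainly by hand. Scattered, individual alarm records are checked one at a time, and system operations and maintenance (O&M) staff rely on experience to judge which possibly abnormal service an alarm record is reporting.
Locating abnormal services by checking alarm records one by one is therefore difficult. Because the process depends heavily on the experience of O&M staff, it is also inefficient. In short, an efficient service alarm method is still lacking.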
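Summary
The embodiments of this application provide a service alarm method that enables fast alarms for abnormal services.
In a first aspect, an embodiment of this application provides a service alarm method, including:
clustering a plurality of alarm records to obtain the alarm set to which each of the alarm records belongs, where different alarm sets correspond to different service labels, and a service label indicates the service that the alarm records in the corresponding alarm set raise alarms about;
generating training samples according to the alarm records and the service label of each alarm record;
training a support vector machine with the training samples to obtain a service alarm model; and
analyzing a target alarm record with the service alarm model to obtain the service label of the target alarm record, and generating alarm information that includes the target alarm record and its service label.
In a second aspect, an embodiment of this application provides a service alarm device that includes units for executing the service alarm method of the first aspect. The service alarm device includes:
a clustering unit, configured to cluster a plurality of alarm records to obtain the alarm set to which each of the alarm records belongs, where different alarm sets correspond to different service labels, and a service label indicates the service that the alarm records in the corresponding alarm set raise alarms about;
a generation unit, configured to generate training samples according to the alarm records and the service label of each alarm record;
a training unit, configured to train a support vector machine with the training samples to obtain a service alarm model;
an analysis unit, configured to analyze a target alarm record with the service alarm model to obtain the service label of the target alarm record; and
an alarm unit, configured to generate alarm information that includes the target alarm record and its service label.
In a third aspect, an embodiment of this application provides a service alarm device that includes a processor and a memory connected to each other. The memory is configured to store a computer program that includes program instructions, and the processor is configured to invoke the program instructions to execute the following method:
clustering a plurality of alarm records to obtain the alarm set to which each of the alarm records belongs, where different alarm sets correspond to different service labels, and a service label indicates the service that the alarm records in the corresponding alarm set raise alarms about;
generating training samples according to the alarm records and the service label of each alarm record;
training a support vector machine with the training samples to obtain a service alarm model; and
analyzing a target alarm record with the service alarm model to obtain the service label of the target alarm record, and generating alarm information that includes the target alarm record and its service label.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium that stores a computer program. The program includes program instructions that, when executed by a processor, perform the following method:
clustering a plurality of alarm records to obtain the alarm set to which each of the alarm records belongs, where different alarm sets correspond to different service labels, and a service label indicates the service that the alarm records in the corresponding alarm set raise alarms about;
generating training samples according to the alarm records and the service label of each alarm record;
training a support vector machine with the training samples to obtain a service alarm model; and
analyzing a target alarm record with the service alarm model to obtain the service label of the target alarm record, and generating alarm information that includes the target alarm record and its service label.
This application enables fast alarms for abnormal services.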
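Brief Description of the Drawings
To describe the technical solutions of the embodiments more clearly, the drawings used in describing the embodiments are briefly introduced below.
FIG. 1 is a schematic diagram of a service alarm system according to an embodiment of this application;
FIG. 2 is a schematic flowchart of a service alarm method according to an embodiment of this application;
FIG. 3 is a schematic flowchart of a service alarm method according to another embodiment of this application;
FIG. 4 is a schematic block diagram of a service alarm device according to an embodiment of this application;
FIG. 5 is a structural block diagram of a service alarm device according to an embodiment of this application.
Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings.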
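The technical solutions of this application can be applied to artificial intelligence, blockchain, and/or big data. Data involved, such as alarm records and alarm information, may be stored in a database or in a blockchain. For example, the service alarm method of this application may be applied to a service alarm device, and that device may be a node in a blockchain.
This application is mainly applied to a service alarm device. The device may be a conventional service alarm device, a terminal device, a server, or the service alarm device described in the third and fourth embodiments of this application; this application does not limit this. When the service alarm device exchanges data with other devices, each side records and transmits the characteristics of the data in a preset format. Each side can also process and parse received data in that preset format. The characteristics of the data include time, location, type, and so on.
A service monitoring platform receives a large number of alarm records every day. Finding abnormal online services by manually checking scattered, individual alarm records one at a time is very inefficient, so an efficient service alarm method is still lacking.
To solve this problem, the embodiments of this application provide a service alarm method that performs automatic and efficient anomaly detection for services.

First, a plurality of alarm records is obtained and clustered according to the number of services to be monitored, producing as many alarm sets as there are services. The alarm records in one alarm set raise alarms about the same service, so they all receive the same service label. Training samples are then generated that contain the alarm records and their corresponding service labels.

Next, a support vector machine (SVM) is trained on these samples to obtain a service alarm model that can classify any alarm record. Target alarm records are obtained in real time and classified with the service alarm model to obtain their service labels.

Finally, alarm information containing each target alarm record and its service label is generated and sent to O&M staff. The staff use it to check the service for anomalies and perform maintenance.
Note that an SVM is a classification model. It maps data points that are not linearly separable in the original space into a new space where they become linearly separable, and classifies them there. Its classification speed and efficiency compare favorably with those of traditional classification methods. In the embodiments of this application, the SVM is trained on the training samples so that it learns the classification patterns they contain. The result is a service alarm model that can directly classify any alarm record. SVMs can be used for linear and nonlinear classification as well as for regression. They typically have a low generalization error and good learning ability, so the trained model generalizes well.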
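The mapping idea in the note above can be illustrated with a minimal sketch. The four XOR-style points are hypothetical data chosen for illustration, not alarm records from this application. No straight line in the (x1, x2) plane separates the two classes. After the explicit map (x1, x2) → (x1, x2, x1·x2), however, the plane x3 = 0 separates them. An SVM with a suitable kernel, for example a degree-2 polynomial kernel, works in such a feature space implicitly rather than computing the map.

```python
from typing import List, Tuple

Point = Tuple[float, float]

# Hypothetical XOR-style data: labels +1 / -1 cannot be split by a line in 2-D.
POINTS: List[Point] = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
LABELS: List[int] = [1, 1, -1, -1]


def feature_map(p: Point) -> Tuple[float, float, float]:
    """Lift a 2-D point into 3-D by appending the product x1 * x2."""
    x1, x2 = p
    return (x1, x2, x1 * x2)


def linear_score(z: Tuple[float, float, float],
                 w: Tuple[float, float, float] = (0.0, 0.0, 1.0),
                 b: float = 0.0) -> float:
    """Signed score of the hyperplane w.z + b = 0 in the lifted space."""
    return sum(wi * zi for wi, zi in zip(w, z)) + b


def classify(p: Point) -> int:
    """+1 on the positive side of the plane x3 = 0, otherwise -1."""
    return 1 if linear_score(feature_map(p)) > 0 else -1


if __name__ == "__main__":
    print([classify(p) for p in POINTS])  # [1, 1, -1, -1]
```

A trained SVM would learn w and b from the data. Here they are fixed by hand only to show that a linear separator exists after the mapping.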
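Thus, this application does not require alarm records to be checked manually one by one. Instead, it uses machine learning to make full use of historical alarm records and analyzes new alarm records automatically. This reduces the time O&M staff spend on analysis and troubleshooting, improving both O&M quality and O&M efficiency.
For a better understanding of the embodiments, the method is introduced below with reference to FIG. 1. The embodiments can be applied to scenarios in which service anomalies are detected.
The service alarm device may obtain the alarm records from a cloud server or from other terminal devices; the embodiments do not limit this. For ease of understanding, the process is described with reference to FIG. 1, using a device that obtains alarm records from a cloud server as the example.
The process runs as follows:

1. The service alarm device collects data to obtain a plurality of alarm records.
2. It clusters the records into a plurality of alarm sets, each corresponding to one service label. The service label determines which service each alarm record raises an alarm about.
3. It trains a support vector machine with training samples made up of the alarm records and their service labels. The result is a service alarm model that can determine the service label of any alarm record.
4. When the device obtains a target alarm record, it analyzes the record with the model to obtain its service label.
5. It generates alarm information containing the target alarm record and its service label. This prompts service maintenance staff to review the alarm and take appropriate measures.
There are two ways to collect alarm records. The service alarm device may collect the alarm records directly into a database. Alternatively, it may first collect raw alarm records into the database and then apply numeric featurization to them. Featurization reduces the data volume of the raw records while retaining their data features, and its output is the alarm records.

A raw alarm record contains a plurality of alarm items. Featurization does not change the alarm items, only the alarm data under each item. Alarm data is expressed as characters such as numbers or text, and data under different alarm items may take different forms. Featurizing a raw alarm record therefore means featurizing the alarm data under its alarm items.

Specifically, the alarm data under each alarm item of each raw alarm record is featurized according to the featurization rule for that item. For example:

- alarm data under "CPU usage" is converted to a positive integer;
- alarm data under "interface call" is hashed;
- alarm data under "network connectivity" is character-encoded.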
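Clustering the alarm records can be done in either of two ways.

(1) One clustering algorithm, such as fuzzy C-means clustering, hierarchical clustering, density-based clustering, or k-means, is applied directly to the alarm records to obtain a plurality of alarm sets.

(2) The number of services to be monitored is first set as a preset number. Then, based on the similarity distance between every two alarm records, that many cluster centers are chosen from among the alarm records. An alarm set is formed around each cluster center, so the number of alarm sets also equals the preset number.

Determining the cluster centers in way (2) is an iterative process:

1. The preset number of alarm records is randomly selected as centers, and all alarm records are clustered around them to obtain that many alarm sets.
2. The alarm record acting as the actual center of each alarm set is determined.
3. All alarm records are clustered again around these actual centers to obtain new alarm sets.
4. Steps 2 and 3 repeat until the actual centers of the alarm sets no longer change.

The final actual centers are taken as the cluster centers, and the alarm records are clustered around them to obtain the preset number of alarm sets. Finally, the service label of each alarm set is determined by manual annotation. All alarm records in the same alarm set share the same service label.
Here, clustering the alarm records around a set of centers means computing the similarity distance between each alarm record and each center, then assigning each record to the center with the smallest distance. Determining the actual center of an alarm set means computing, for each alarm record in the set, its average similarity distance to the other records in the set. The record with the smallest average distance is the actual center.
Generating training samples means combining the alarm records with the service label of each alarm record. Alternatively, only the valid alarm records are selected and combined with their service labels. In that case:

1. The validity threshold for each alarm item of each alarm record is obtained.
2. Valid alarm records are selected according to these thresholds and the alarm values under the alarm items.
3. Training samples containing the valid alarm records and their service labels are generated.

The latter approach yields higher-quality training samples, so the resulting service alarm model analyzes records more efficiently.
Note that an alarm record is invalid if the alarm value under any of its alarm items is not a valid value. It is valid only if the alarm values under all of its alarm items are valid. An alarm value is valid if it satisfies the validity threshold of its alarm item. For some alarm items, this means the value is greater than or equal to the threshold; for other items, it means the value is less than or equal to the threshold.
After the service alarm model has been trained, it can be trained further to correct it. After the model has determined the service label of a target alarm record, a plurality of pieces of feedback is received. Each piece is a service label for that record annotated by a different user. The label that occurs most often is taken as the target service label of the record, and the model is corrected with it.

The correction can be implemented in the same way as the training; only the purpose differs. Initial training aims to produce a model that applies to a wide range of alarm scenarios. Correction adapts the model to the scenario in which it is currently used, so that actual use optimizes it for the current alarm problem. For example, if the model is repeatedly corrected with alarm records labeled as the network-connection service, it will raise alarms about that service's anomalies more accurately.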
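In summary, the embodiments proceed in three stages:

1. Obtain training samples through clustering.
2. Train a support vector machine with these samples to obtain a service alarm model.
3. Analyze a target alarm record with the model, quickly obtain its service label, and generate alarm information containing the record and its label.

The service label determined by the model indicates that the corresponding service has become abnormal, that is, that it is an abnormal service. The generated alarm information therefore provides a fast alarm about the abnormal service.
Note that the content shown in FIG. 1 is an example and does not limit the embodiments: the service alarm device may obtain any number of alarm records from any number of other devices.
Referring to FIG. 2, which is a schematic flowchart of a service alarm method according to an embodiment of this application, the method may include the following steps.
201: Cluster a plurality of alarm records to obtain the alarm set to which each alarm record belongs, where different alarm sets correspond to different service labels.
In this embodiment, the service alarm device first collects a plurality of alarm records into its database. It then obtains the alarm records and clusters them into a plurality of alarm sets, each corresponding to one service label, so alarm records in the same set share the same label. A service label indicates the service that the alarm records in the corresponding alarm set raise alarms about.

Here a service is, for example, the network communication service of a particular branch, and each branch's network communication service corresponds to one service label. The correspondence between services and service labels can be obtained by looking up a service-to-label mapping table.

An alarm record describes how a service is running. Each alarm record contains a plurality of alarm items, each being a detailed running item of the service. Each alarm item holds an alarm value, which is the concrete value of that item. Examples of alarm items include:

- time, machine room/network zone, system, application name, node, and host name/Internet Protocol (IP) address;
- central processing unit (CPU) usage, network interruption/latency, network connectivity, disk space/input-output (IO), and interface call;
- alarm level, service impact, upstream system, downstream system, handling plan, and handling result.

The alarm value under an alarm item can be any value, including null. Some alarm items have fixed correspondences with each other, stored in advance in a dictionary in the database. When the alarm value under one such item is determined, the value under the corresponding item is also determined. For example, "upstream system" and "downstream system" have a fixed correspondence. Once the value under "upstream system" is known, the value under "downstream system" is the one the database dictionary maps it to.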
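The record structure and the dictionary of fixed correspondences described above might look like the following sketch. The field names, the example system names, and the helper `fill_dependent_items` are hypothetical. Only the idea that the "downstream system" value is looked up from the "upstream system" value comes from the text.

```python
from typing import Dict, Optional

# An alarm record maps alarm items to alarm values; None stands for a null value.
AlarmRecord = Dict[str, Optional[str]]

# Hypothetical dictionary stored in the database: upstream system -> downstream system.
UPSTREAM_TO_DOWNSTREAM: Dict[str, str] = {
    "order-service": "inventory-service",
    "payment-gateway": "settlement-service",
}


def fill_dependent_items(record: AlarmRecord) -> AlarmRecord:
    """Return a copy of the record whose 'downstream_system' item is derived
    from its 'upstream_system' item when the dictionary defines a mapping.
    Unknown or null upstream values leave the record unchanged."""
    filled = dict(record)
    upstream = filled.get("upstream_system")
    if upstream is not None and upstream in UPSTREAM_TO_DOWNSTREAM:
        filled["downstream_system"] = UPSTREAM_TO_DOWNSTREAM[upstream]
    return filled


if __name__ == "__main__":
    record: AlarmRecord = {
        "time": "2019-10-10 08:00:00",
        "host_ip": "10.0.0.8",
        "cpu_usage": "96%",
        "upstream_system": "order-service",
        "downstream_system": None,
    }
    print(fill_dependent_items(record)["downstream_system"])  # inventory-service
```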
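Further, clustering the alarm records to obtain the alarm set of each record works as follows. The number of services to be monitored is first set as a preset number, and the alarm records are then clustered according to that number to obtain the preset number of alarm sets. The number of alarm sets equals the number of services, and each alarm set corresponds to the service label of one service. Alarm records in the same alarm set share the same service label. The clustering may use fuzzy C-means clustering, hierarchical clustering, density-based clustering, or k-means; this embodiment does not limit this.
Optionally, clustering by the number of services works as follows. Based on the similarity distance between every two alarm records, the preset number of cluster centers is chosen from among the alarm records. An alarm set is then formed around each cluster center, giving the preset number of alarm sets.

The similarity distance may be the Euclidean distance, the Mahalanobis distance, the Manhattan distance, or the cosine of the included angle. It reflects how closely two alarm records are related: the smaller the distance, the more closely the records are related. The cosine is a similarity rather than a distance, so for it a larger value indicates a closer relationship.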
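Here is a minimal sketch of three of the similarity distances named above. It assumes alarm records have already been featurized into equal-length numeric vectors. For the cosine, the sketch uses 1 − cos θ so that, as with the other two, a smaller value means more closely related records. The Mahalanobis distance is omitted because it additionally needs a covariance matrix estimated from the data.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Vector, b: Vector) -> float:
    """Sum of absolute per-item differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_distance(a: Vector, b: Vector) -> float:
    """1 - cosine of the angle between the vectors (0 means same direction)."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (norm_a * norm_b)


if __name__ == "__main__":
    print(euclidean([0, 0], [3, 4]), manhattan([0, 0], [3, 4]))  # 5.0 7
```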
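More specifically, suppose the services have p dimensions (for example region, system, application node, and IP segment), and the dimensions contain p_1, p_2, …, p_p services respectively. There are then n = p_1 × p_2 × … × p_p services in total. The procedure runs as follows:

1. The service alarm device randomly selects n of the m (m > n) alarm records as cluster centers.
2. It computes the similarity distance between each of the remaining m - n alarm records and each of the n centers, and assigns each record to the alarm set of its nearest center. This yields n alarm sets, one per center. For example, suppose the distances between the i-th alarm record and the n centers are a_1, a_2, …, a_n and the smallest is a_2. The i-th record is then placed in the alarm set of the 2nd center. Assignment continues until all m - n records belong to one of the n alarm sets.
3. The actual center of each alarm set is determined: it is the record with the smallest average similarity distance to the other records in its set. For example, suppose a target alarm set contains the i-th alarm record, and the distances between that record and the other j records in the set are b_1, b_2, …, b_j. Its average similarity distance is (b_1 + b_2 + … + b_j)/j. Computing this for every record in the set identifies the record with the smallest average, which is the actual center.
4. The actual centers become the centers for re-clustering all m alarm records into n new alarm sets, whose actual centers are determined in turn.
5. Steps 3 and 4 repeat until the process reaches n stable alarm sets. In a stable set, neither the actual center nor the member records change, however many further iterations are run.

Finally, the actual centers of the n stable alarm sets are taken as the n cluster centers. The m alarm records are clustered around them to obtain the n stable alarm sets.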
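The iterative procedure above is, in effect, a k-medoids style alternation. Each record is assigned to its nearest center, and each center then moves to the member with the smallest average distance to the rest of its set. The sketch below makes these assumptions:

- Euclidean distance on featurized vectors.
- `count_services`, `cluster_alarm_records`, and the optional `initial_centers` argument are hypothetical names.
- The `max_iter` cap is an added safeguard that the text does not mention.

Like other k-medoids variants, the procedure can settle in a local optimum that depends on the random starting centers.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]


def count_services(dimension_sizes: Sequence[int]) -> int:
    """n = p_1 * p_2 * ... * p_p: one service per combination of dimension values."""
    return math.prod(dimension_sizes)


def euclidean(a: Vector, b: Vector) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(records: Sequence[Vector], centers: List[int]) -> List[int]:
    """Index k of the nearest center for every record (ties go to the lower k)."""
    return [min(range(len(centers)), key=lambda k: euclidean(r, records[centers[k]]))
            for r in records]


def actual_center(records: Sequence[Vector], members: List[int]) -> int:
    """Member whose average distance to the other members is smallest."""
    if len(members) == 1:
        return members[0]

    def avg_distance(i: int) -> float:
        others = [j for j in members if j != i]
        return sum(euclidean(records[i], records[j]) for j in others) / len(others)

    return min(members, key=avg_distance)


def cluster_alarm_records(records: Sequence[Vector], n: int, seed: int = 0,
                          initial_centers: Optional[List[int]] = None,
                          max_iter: int = 100) -> Tuple[List[int], List[int]]:
    """Return (center record indices, alarm-set index for every record)."""
    if not 0 < n <= len(records):
        raise ValueError("n must be between 1 and the number of records")
    if initial_centers is not None:
        centers = list(initial_centers)
    else:
        centers = random.Random(seed).sample(range(len(records)), n)
    if len(centers) != n:
        raise ValueError("initial_centers must contain exactly n indices")
    for _ in range(max_iter):
        labels = assign(records, centers)
        new_centers = []
        for k, old in enumerate(centers):
            members = [i for i, lab in enumerate(labels) if lab == k]
            # Keep the old center if its set is empty (only possible with duplicate vectors).
            new_centers.append(actual_center(records, members) if members else old)
        if new_centers == centers:  # the actual centers no longer change
            break
        centers = new_centers
    return centers, assign(records, centers)


if __name__ == "__main__":
    data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
    print(count_services([2, 3]))  # 6
    print(cluster_alarm_records(data, 2, initial_centers=[0, 1]))  # ([0, 3], [0, 0, 0, 1, 1, 1])
```

In the usage example both starting centers lie in the same group. After one round the second center moves to record 3, and the second round leaves the centers unchanged, which ends the loop.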
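Optionally, before the alarm records are clustered, alarm records from the various sources are collected into the MongoDB database through the Kafka distributed messaging system. MongoDB is a database that supports sharding and offers fast retrieval and highly concurrent access. Kafka is a distributed, partitioned, replicated messaging system whose main strength is processing large volumes of data in real time, which suits a variety of scenarios.
Optionally, the records collected this way are raw alarm records, and they are numerically featurized to obtain the alarm records. Each raw alarm record contains a plurality of alarm items, each holding alarm data. A raw alarm record and the corresponding alarm record have the same alarm items. The difference is that the items of the raw record hold alarm data, whereas those of the alarm record hold alarm values obtained by featurizing that data. In a raw alarm record, the data under different alarm items may be expressed in different character forms, such as numbers or text. Some data needs no featurization, such as time and other dimension data. Routine data such as CPU usage, network connectivity, and interface calls does need it.
Numeric featurization processes raw alarm records according to featurization rules. The resulting alarm records retain the data features of the raw records but have a smaller data volume. The alarm data under each alarm item of each raw alarm record is featurized according to the rule for that item. Each rule uses at least one of three transformations of the alarm data:

- positive-integer conversion turns alarm data into a positive integer;
- hash computation turns alarm data into a hash value;
- character encoding turns alarm data into a numeric code. It is used to encode alarm data that is inconvenient to transmit into alarm values that are convenient to transmit.

There is a mapping between alarm data and alarm values.
For example, each of the following alarm items uses a different rule:

- "CPU usage" uses positive-integer conversion. Its alarm data "96%" is converted to the positive integer 96, giving the alarm value "96".
- "Interface call" uses hash computation. Its alarm data "384592546" is hashed, giving the alarm value "83c278845f00450c4222da1a4e35f408".
- "Network connectivity" uses character encoding. The alarm data "packets received from the ping end" is encoded as the alarm value "1". If the alarm data is instead "no packets received from the ping end", the encoded alarm value is "0".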
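Here is a sketch of the per-item featurization rules, with these assumptions stated:

- The item keys and the text strings for network connectivity are hypothetical.
- MD5 is an assumed choice. The application does not name its hash function, so the hash produced here is not claimed to match the example value above.
- The numeric conversion clamps at 0, so it yields a non-negative integer.

Items without a rule, such as time, pass through unchanged, as the text describes.

```python
import hashlib
from typing import Callable, Dict

AlarmData = Dict[str, str]


def to_positive_integer(value: str) -> int:
    """'96%' -> 96: drop a trailing percent sign and round to a whole number (clamped at 0)."""
    return max(0, round(float(value.strip().rstrip("%"))))


def to_hash(value: str) -> str:
    """Hash the raw text; MD5 is an assumption, any stable hash would serve."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()


# Hypothetical code table for the character-encoding rule.
CONNECTIVITY_CODES: Dict[str, int] = {
    "packets received from the ping end": 1,
    "no packets received from the ping end": 0,
}


def to_code(value: str) -> int:
    """Map a known text value to its numeric code."""
    return CONNECTIVITY_CODES[value]


# Featurization rule registered for each alarm item.
RULES: Dict[str, Callable[[str], object]] = {
    "cpu_usage": to_positive_integer,
    "interface_call": to_hash,
    "network_connectivity": to_code,
}


def featurize(raw: AlarmData) -> Dict[str, object]:
    """Turn a raw alarm record (alarm data) into an alarm record (alarm values)."""
    return {item: RULES[item](data) if item in RULES else data
            for item, data in raw.items()}


if __name__ == "__main__":
    raw = {
        "time": "2019-10-10 08:00:00",
        "cpu_usage": "96%",
        "interface_call": "384592546",
        "network_connectivity": "packets received from the ping end",
    }
    print(featurize(raw))
```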
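202: Generate training samples according to the alarm records and the service label of each alarm record.
In this embodiment, the alarm records and the service label of each alarm record are combined to obtain the training samples used to train the support vector machine.
Optionally, the alarm records are first filtered, and only those that meet the filtering condition are combined with their service labels to form the training samples.
Specifically, the filtering works as follows:

1. Obtain the validity threshold for each alarm item in each alarm record. Each alarm record contains a plurality of alarm items, and each item holds an alarm value.
2. Select valid alarm records according to these thresholds and the alarm values under the alarm items.
3. Generate training samples containing the valid alarm records and their service labels.
In this embodiment, the valid alarm records selected by this procedure are combined with their service labels to form the training samples.
More specifically, each alarm item in an alarm record has a validity threshold. The threshold is used to judge whether the data under that item meets the filtering condition. The alarm value of each item in each record is compared with the corresponding threshold. If any alarm value in a record does not satisfy its threshold, the record fails the filtering condition and is filtered out; if every value satisfies its threshold, the record is selected. A value satisfies its threshold when it is greater than the threshold for some alarm items, or less than it for others. Different alarm items use different comparison directions and different validity thresholds.
Note that an alarm record is invalid if the alarm value under at least one of its alarm items is not a valid value. It is valid only if the alarm values under all of its alarm items are valid. Whether an alarm value is valid is judged as described earlier in this embodiment, and the method is not repeated here.
Optionally, two alternative validity rules may be used instead:

- an alarm record is valid only if at least a preset number of its alarm items hold valid values, and invalid if fewer than that number do;
- each alarm item has a weight, and an alarm record is valid when the total weight of its items that hold valid values exceeds a preset weight.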
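The validity rules above can be sketched as follows: all items valid, at least a preset number of items valid, or a weighted total above a preset weight. Which items use a lower bound and which use an upper bound, the example thresholds, and all function names are assumptions for illustration only.

```python
from dataclasses import dataclass
from typing import List, Mapping, Sequence, Tuple

AlarmRecord = Mapping[str, float]


@dataclass(frozen=True)
class ValidityThreshold:
    value: float
    at_least: bool  # True: valid when x >= value; False: valid when x <= value


Thresholds = Mapping[str, ValidityThreshold]


def is_valid_value(x: float, t: ValidityThreshold) -> bool:
    return x >= t.value if t.at_least else x <= t.value


def valid_items(record: AlarmRecord, thresholds: Thresholds) -> List[str]:
    """Alarm items whose value is present and satisfies its threshold."""
    return [item for item, t in thresholds.items()
            if item in record and is_valid_value(record[item], t)]


def valid_all(record: AlarmRecord, thresholds: Thresholds) -> bool:
    """Default rule: every alarm item must hold a valid value."""
    return len(valid_items(record, thresholds)) == len(thresholds)


def valid_at_least(record: AlarmRecord, thresholds: Thresholds, k: int) -> bool:
    """Optional rule: at least k alarm items hold valid values."""
    return len(valid_items(record, thresholds)) >= k


def valid_weighted(record: AlarmRecord, thresholds: Thresholds,
                   weights: Mapping[str, float], min_weight: float) -> bool:
    """Optional rule: the total weight of valid items exceeds min_weight."""
    return sum(weights.get(item, 0.0) for item in valid_items(record, thresholds)) > min_weight


def build_training_samples(records: Sequence[AlarmRecord], labels: Sequence[str],
                           thresholds: Thresholds) -> List[Tuple[AlarmRecord, str]]:
    """Keep (record, service label) pairs whose record passes the default rule."""
    return [(r, lab) for r, lab in zip(records, labels) if valid_all(r, thresholds)]


if __name__ == "__main__":
    thresholds = {"cpu_usage": ValidityThreshold(50, at_least=True),
                  "latency_ms": ValidityThreshold(1000, at_least=False)}
    records = [{"cpu_usage": 96, "latency_ms": 200}, {"cpu_usage": 30, "latency_ms": 200}]
    print(build_training_samples(records, ["service-A", "service-B"], thresholds))
    # [({'cpu_usage': 96, 'latency_ms': 200}, 'service-A')]
```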
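203: Train a support vector machine with the training samples to obtain a service alarm model.
In this embodiment, the support vector machine is trained so that it learns the classification patterns in the training samples. Training proceeds as follows:

1. Each alarm record in the training samples is input to the support vector machine, which outputs a predicted service label for it.
2. The training samples contain the actual service label of each record, i.e. the correct, manually annotated label.
3. When the predicted label differs from the actual label, a loss function computes the classification error of the support vector machine.
4. The error is propagated back to adjust the parameters of the support vector machine.

After the support vector machine has been trained with all the alarm records in the training samples, it becomes a mature service alarm model. The model can classify any other alarm record quickly and efficiently.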
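The training step can be sketched, under assumptions, as follows:

- a linear SVM trained by full-batch subgradient descent on the regularized hinge loss;
- one binary classifier per service label (one-vs-rest);
- hypothetical hyperparameters `lam`, `lr`, and `epochs`, with hypothetical names throughout.

The application does not specify the kernel, loss, or optimizer. In practice, a library implementation such as scikit-learn's `SVC` or `LinearSVC` would normally replace this hand-written loop. As in the description, a sample contributes to an update only when it is misclassified or falls inside the margin, that is, when its hinge loss is positive.

```python
from typing import Any, Dict, Hashable, List, Sequence, Tuple

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def train_binary_svm(xs: Sequence[Vector], ys: Sequence[float], lam: float = 0.001,
                     lr: float = 0.01, epochs: int = 2000) -> Tuple[List[float], float]:
    """Full-batch subgradient descent on the primal objective
    (lam / 2) * ||w||^2 + mean(max(0, 1 - y * (w . x + b))), with y in {-1, +1}."""
    n, d = len(xs), len(xs[0])
    w: List[float] = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        grad_w = [lam * wi for wi in w]  # gradient of the regularizer (bias not regularized)
        grad_b = 0.0
        for x, y in zip(xs, ys):
            if y * (dot(w, x) + b) < 1:  # hinge loss positive: misclassified or inside margin
                for j in range(d):
                    grad_w[j] -= y * x[j] / n
                grad_b -= y / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


class OneVsRestSVM:
    """One binary linear SVM per service label; predicts the label with the highest score."""

    def __init__(self, **hyperparams: Any) -> None:
        self.hyperparams = hyperparams  # forwarded to train_binary_svm (lam, lr, epochs)
        self.models: Dict[Hashable, Tuple[List[float], float]] = {}

    def fit(self, xs: Sequence[Vector], labels: Sequence[Hashable]) -> "OneVsRestSVM":
        for label in dict.fromkeys(labels):  # unique labels in first-seen order
            ys = [1.0 if lab == label else -1.0 for lab in labels]
            self.models[label] = train_binary_svm(xs, ys, **self.hyperparams)
        return self

    def score(self, x: Vector, label: Hashable) -> float:
        w, b = self.models[label]
        return dot(w, x) + b

    def predict(self, x: Vector) -> Hashable:
        return max(self.models, key=lambda label: self.score(x, label))


if __name__ == "__main__":
    # Toy featurized alarm records for three hypothetical services.
    xs = [(-4.0, 0.0), (-4.5, 0.5), (4.0, 0.0), (4.5, 0.5), (0.0, 4.0), (0.5, 4.5)]
    labels = ["service-A", "service-A", "service-B", "service-B", "service-C", "service-C"]
    model = OneVsRestSVM().fit(xs, labels)
    print([model.predict(x) for x in xs])
```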
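204: Analyze a target alarm record with the service alarm model to obtain its service label, and generate alarm information that includes the target alarm record and its service label.
In this embodiment, after the service alarm model has been trained, it analyzes the target alarm record to obtain its service label. Alarm information containing the target alarm record and its service label is then generated. The alarm information indicates that the service corresponding to that service label has become abnormal. The target alarm record is an alarm record that the local service alarm device obtains in real time.
Further, the alarm information is sent by email, telephone, or other means to the terminal devices of O&M staff so that they can handle it. Alternatively, an associated error-handling program is executed automatically. In another approach, before the alarm information is sent, the device checks whether the service label of the target alarm record is an important service label, and sends the alarm information only if it is.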
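Here is a minimal sketch of generating alarm information and applying the optional important-label check before notifying O&M staff. `AlarmInfo`, `important_labels`, and the `send` callback are hypothetical; the callback stands in for email, telephone, or another channel. The classifier is passed in as any function that maps a record to a service label.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Set

AlarmRecord = Dict[str, object]


@dataclass
class AlarmInfo:
    record: AlarmRecord   # the target alarm record
    service_label: str    # the label predicted by the service alarm model


def generate_alarm(record: AlarmRecord, classify: Callable[[AlarmRecord], str]) -> AlarmInfo:
    """Analyze the target record and bundle it with its predicted service label."""
    return AlarmInfo(record=record, service_label=classify(record))


def notify_if_important(alarm: AlarmInfo, important_labels: Set[str],
                        send: Callable[[AlarmInfo], None]) -> bool:
    """Forward the alarm only when its label is marked important; report whether it was sent."""
    if alarm.service_label in important_labels:
        send(alarm)
        return True
    return False


if __name__ == "__main__":
    outbox: List[AlarmInfo] = []
    alarm = generate_alarm({"cpu_usage": 96}, lambda r: "branch-network-01")
    print(notify_if_important(alarm, {"branch-network-01"}, outbox.append), len(outbox))  # True 1
```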
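Further, after generating the alarm information, the local service alarm device may correct the model using user feedback:

1. Receive a plurality of pieces of feedback, each being a service label of the target alarm record annotated by a different user.
2. Determine the service label that occurs most often in the feedback, and take it as the target service label of the target alarm record.
3. Correct the service alarm model with the target service label.
In this embodiment, any number of users may manually annotate the service label of the target alarm record. The local service alarm device therefore receives feedback from different users, each piece containing the label that user assigned. The device takes the label that occurs most often as the target service label of the target alarm record. It then trains the service alarm model with that label, further correcting the model so that later service labels are more accurate.
For example, suppose the local service alarm device receives n pieces of feedback for the target alarm record:

- n1 give a first service label s1;
- n2 give a second service label s2;
- n3 give a third service label s3.

Here n1 + n2 + n3 = n, and n1 is greater than both n2 and n3. The label that occurs most often is therefore s1. It becomes the target service label of the target alarm record, and the service alarm model is trained with s1.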
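The vote in this example can be sketched with `collections.Counter`. The application does not specify how ties are broken. This sketch keeps the label seen first, because `Counter.most_common` orders equal counts by first occurrence. The (record, target label) pair is stored in a hypothetical `corrections` list for later retraining, since the retraining call depends on how the model is implemented.

```python
from collections import Counter
from typing import Dict, Iterable, List, Optional, Tuple

AlarmRecord = Dict[str, object]


def target_service_label(feedback: Iterable[str]) -> Optional[str]:
    """Most frequent user-annotated label; None when there is no feedback."""
    counts = Counter(feedback)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def add_correction(corrections: List[Tuple[AlarmRecord, str]], record: AlarmRecord,
                   feedback: Iterable[str]) -> Optional[str]:
    """Append (record, voted label) to the set used to retrain (correct) the model."""
    label = target_service_label(feedback)
    if label is not None:
        corrections.append((record, label))
    return label


if __name__ == "__main__":
    votes = ["s1"] * 5 + ["s2"] * 3 + ["s3"] * 2  # n1 = 5, n2 = 3, n3 = 2
    print(target_service_label(votes))  # s1
```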
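In this application, clustering divides the alarm records into as many alarm sets as there are services. Each alarm set corresponds to one service label, and the records in a set raise alarms about the same service. The alarm records and their service labels are combined into training samples, which are used to train a support vector machine into a service alarm model. The model then analyzes a target alarm record, quickly obtains its service label, and generates alarm information containing the record and its label. The service label determined by the model indicates that the corresponding service has become abnormal, that is, that it is an abnormal service. The alarm information therefore provides a fast alarm about that abnormal service.
Referring to FIG. 3, which is a schematic flowchart of another service alarm method according to an embodiment of this application, the method may include the following steps.
301: Collect a plurality of raw alarm records into a database through a distributed messaging system, where each raw alarm record contains a plurality of alarm items and each alarm item holds alarm data.
In this embodiment, before the alarm records are clustered, raw alarm records from the various sources are collected into the MongoDB database through the Kafka distributed messaging system.
302: Numerically featurize the alarm data under each alarm item of the raw alarm records to obtain a plurality of alarm records.
In this embodiment, after the raw alarm records are collected, they are numerically featurized to obtain the alarm records.
Specifically, each raw alarm record contains a plurality of alarm items, each holding alarm data. A raw alarm record and the corresponding alarm record have the same alarm items. The items of the raw record hold alarm data, whereas the items of the alarm record hold alarm values, obtained by featurizing that data. In a raw alarm record, the data under different alarm items may be expressed in different character forms, such as numbers or text. Some data needs no featurization, such as time and other dimension data. Routine data such as CPU usage, network connectivity, and interface calls does need it.
More specifically, the alarm data under each alarm item of each raw alarm record is featurized according to the featurization rule for that item, giving the alarm records. The featurization rules include at least one of positive-integer conversion, hash computation, and character encoding of the alarm data. Correspondingly, featurization includes at least one of positive-integer conversion, hash computation, and character encoding.
303: Cluster the alarm records to obtain the alarm set to which each alarm record belongs, where different alarm sets correspond to different service labels.
In this embodiment, the number of services to be monitored is first set as a preset number, and the alarm records are clustered according to that number to obtain the preset number of alarm sets. The number of alarm sets equals the number of services, each alarm set corresponds to the service label of one service, and alarm records in the same set share the same label. The clustering may use fuzzy C-means clustering, hierarchical clustering, density-based clustering, or k-means; this embodiment does not limit this.
Optionally, clustering by the number of services works as follows. Based on the similarity distance between every two alarm records, the preset number of cluster centers is chosen from among the alarm records, and an alarm set is formed around each. The similarity distance may be the Euclidean distance, the Mahalanobis distance, the Manhattan distance, or the cosine of the included angle. It reflects how closely two alarm records are related: the smaller the distance, the more closely related they are. For the cosine, which is a similarity, a larger value indicates a closer relationship.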
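304: Generate training samples according to the alarm records and the service label of each alarm record.
In this embodiment, there are two ways to build the training samples for the support vector machine:

1. Combine the alarm records with their service labels.
2. Filter the alarm records, and combine only those that meet the filtering condition with their service labels.
Specifically, the second way proceeds as follows:

1. Obtain the validity threshold for each alarm item in each alarm record. Each alarm record contains a plurality of alarm items, and each item holds an alarm value.
2. Select valid alarm records according to these thresholds and the alarm values under the alarm items.
3. Generate training samples containing the valid alarm records and their service labels.
Note that an alarm record is invalid if the alarm value under at least one of its alarm items is not a valid value. It is valid only if the alarm values under all of its alarm items are valid. Whether an alarm value is valid is judged as described earlier in this embodiment, and the method is not repeated here.
Optionally, two alternative validity rules may be used instead:

- an alarm record is valid only if at least a preset number of its alarm items hold valid values, and invalid if fewer than that number do;
- each alarm item has a weight, and an alarm record is valid when the total weight of its items that hold valid values exceeds a preset weight.
305: Train a support vector machine with the training samples to obtain a service alarm model.
306: Analyze the target alarm record with the service alarm model to obtain its service label, and generate alarm information that includes the target alarm record and its service label.
307: Receive a plurality of pieces of feedback, each being a service label of the target alarm record annotated by a different user.
In this embodiment, any number of users may manually annotate the service label of the target alarm record. The local service alarm device therefore receives feedback from different users, each piece containing the service label that user assigned to the target alarm record.
308: Determine the service label that occurs most often in the feedback, and take it as the target service label of the target alarm record.
309: Correct the service alarm model with the target service label.
In this embodiment, the target service label determined in the preceding step is used to train the service alarm model. This further corrects the model so that it subsequently determines the service labels of alarm records more accurately.
Building on the first embodiment, this embodiment describes the implementation of the service alarm method of this application in more detail. The descriptions of the embodiments emphasize their differences; for identical or similar parts, the embodiments may refer to one another. For brevity, those parts are not repeated here.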
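An embodiment of this application further provides a service alarm device that includes units for executing any of the foregoing service alarm methods. Referring to FIG. 4, which is a schematic block diagram of a service alarm device according to an embodiment of this application, the device of this embodiment includes a clustering unit 410, a generation unit 420, a training unit 430, an analysis unit 440, and an alarm unit 450. The units are configured as follows:

- the clustering unit 410 clusters a plurality of alarm records to obtain the alarm set to which each record belongs. Different alarm sets correspond to different service labels, and a service label indicates the service that the records in the corresponding set raise alarms about;
- the generation unit 420 generates training samples according to the alarm records and the service label of each record;
- the training unit 430 trains a support vector machine with the training samples to obtain a service alarm model;
- the analysis unit 440 analyzes a target alarm record with the service alarm model to obtain its service label;
- the alarm unit 450 generates alarm information that includes the target alarm record and its service label.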
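One way to mirror the unit decomposition of FIG. 4 in code is to inject each unit as a callable, as in the sketch below. The class name, the field names, and the toy stand-in units in the usage example are hypothetical. The toy training unit is a nearest-neighbor lookup, not an SVM. Real units would wrap the clustering, sample-generation, SVM-training, and alarm-generation logic described above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, float]
Sample = Tuple[Record, str]
Model = Callable[[Record], str]


@dataclass
class ServiceAlarmDevice:
    clustering_unit: Callable[[List[Record]], List[str]]           # 410: label per record
    generation_unit: Callable[[List[Record], List[str]], List[Sample]]  # 420
    training_unit: Callable[[List[Sample]], Model]                  # 430
    alarm_unit: Callable[[Record, str], Dict[str, object]]          # 450
    model: Optional[Model] = field(default=None, init=False)

    def build_model(self, records: List[Record]) -> None:
        """Clustering -> sample generation -> training."""
        labels = self.clustering_unit(records)
        samples = self.generation_unit(records, labels)
        self.model = self.training_unit(samples)

    def analyze(self, target: Record) -> Dict[str, object]:
        """Analysis unit 440 (apply the model) followed by alarm unit 450."""
        if self.model is None:
            raise RuntimeError("call build_model first")
        return self.alarm_unit(target, self.model(target))


if __name__ == "__main__":
    device = ServiceAlarmDevice(
        clustering_unit=lambda rs: ["high-cpu" if r["cpu"] > 50 else "normal" for r in rs],
        generation_unit=lambda rs, ls: list(zip(rs, ls)),
        # Toy stand-in for SVM training: 1-nearest-neighbor on the "cpu" value.
        training_unit=lambda s: (lambda r: min(s, key=lambda p: abs(p[0]["cpu"] - r["cpu"]))[1]),
        alarm_unit=lambda r, label: {"record": r, "service_label": label},
    )
    device.build_model([{"cpu": 96.0}, {"cpu": 12.0}])
    print(device.analyze({"cpu": 80.0}))  # {'record': {'cpu': 80.0}, 'service_label': 'high-cpu'}
```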
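In one implementation, the service alarm device further includes a determining unit 460, configured to determine the number of services to be monitored. The clustering unit 410 is then specifically configured to cluster the alarm records according to that number to obtain the preset number of alarm sets. The number of alarm sets equals the number of services.
In one implementation, the clustering unit 410 is specifically configured to choose the preset number of cluster centers from among the alarm records, based on the similarity distance between every two records. It then forms an alarm set around each cluster center to obtain the preset number of alarm sets.
In one implementation, the generation unit 420 is specifically configured to:

1. obtain the validity threshold for each alarm item in each alarm record, where each record contains a plurality of alarm items and each item holds an alarm value;
2. select valid alarm records according to these thresholds and the alarm values under the alarm items;
3. generate training samples containing the valid alarm records and their service labels.
In one implementation, the service alarm device further includes:

- a collection unit 470, configured to collect a plurality of raw alarm records into a database through a distributed messaging system. Each raw alarm record contains a plurality of alarm items, and each item holds alarm data;
- a preprocessing unit 480, configured to numerically featurize the alarm data under each alarm item of the raw alarm records to obtain the alarm records. Each alarm record contains a plurality of alarm items, and each item holds an alarm value.
In one implementation, the preprocessing unit 480 is specifically configured to featurize the alarm data under each alarm item of each raw alarm record according to the featurization rule for that item, giving the alarm records. The featurization rules include at least one of positive-integer conversion, hash computation, and character encoding of the alarm data.
In one implementation, the service alarm device further includes a receiving unit 490 and the determining unit 460:

- the receiving unit 490 receives a plurality of pieces of feedback, each being a service label of the target alarm record annotated by a different user;
- the determining unit 460 determines the label that occurs most often in the feedback and takes it as the target service label of the target alarm record;
- the training unit 430 is further configured to correct the service alarm model with the target service label.
In this embodiment, the units work together as follows:

1. The clustering unit divides the alarm records into as many alarm sets as there are services. Each set raises alarms about one service, and the records in a set share one service label.
2. The generation unit combines the alarm records and their service labels into training samples.
3. The training unit trains a support vector machine with these samples to obtain a service alarm model.
4. The analysis unit uses the model to analyze a target alarm record and quickly obtain its service label.
5. The alarm unit generates alarm information containing the target alarm record and its service label.

The service label determined by the model indicates that the corresponding service has become abnormal, that is, that it is an abnormal service. The generated alarm information therefore provides a fast alarm about that abnormal service.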
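Referring to FIG. 5, which is a schematic block diagram of a service alarm device according to another embodiment of this application, the device in this embodiment may include a processor 510 and a memory 520, connected through a bus 530. Specifically:
The processor 510 performs the functions of the following units:

- the clustering unit 410: clustering a plurality of alarm records to obtain the alarm set to which each record belongs. Different alarm sets correspond to different service labels, and a service label indicates the service that the records in the corresponding set raise alarms about;
- the generation unit 420: generating training samples according to the alarm records and the service label of each record;
- the training unit 430: training a support vector machine with the training samples to obtain a service alarm model;
- the analysis unit 440: analyzing a target alarm record with the service alarm model to obtain its service label;
- the alarm unit 450: generating alarm information that includes the target alarm record and its service label.
In one implementation, the processor 510 further performs the function of the determining unit 460, determining the number of services to be monitored. It is then specifically configured to cluster the alarm records according to that number to obtain the preset number of alarm sets. The number of alarm sets equals the number of services.
In one implementation, the processor 510 is specifically configured to choose the preset number of cluster centers from among the alarm records, based on the similarity distance between every two records. It then forms an alarm set around each cluster center to obtain the preset number of alarm sets.
In one implementation, the processor 510 is specifically configured to:

1. obtain the validity threshold for each alarm item in each alarm record, where each record contains a plurality of alarm items and each item holds an alarm value;
2. select valid alarm records according to these thresholds and the alarm values under the alarm items;
3. generate training samples containing the valid alarm records and their service labels.
In one implementation, the processor 510 further performs the functions of two more units:

- the collection unit 470: collecting a plurality of raw alarm records into a database through a distributed messaging system. Each raw alarm record contains a plurality of alarm items, and each item holds alarm data;
- the preprocessing unit 480: numerically featurizing the alarm data under each alarm item of the raw alarm records to obtain the alarm records. Each alarm record contains a plurality of alarm items, and each item holds an alarm value.
In one implementation, the processor 510 is specifically configured to featurize the alarm data under each alarm item of each raw alarm record according to the featurization rule for that item, giving the alarm records. The featurization rules include at least one of positive-integer conversion, hash computation, and character encoding of the alarm data.
In one implementation, the service alarm device further includes an input device 540, and the work is divided as follows:

- the input device 540 performs the function of the receiving unit 490, receiving a plurality of pieces of feedback. Each piece is a service label of the target alarm record annotated by a different user;
- the processor 510 performs the function of the determining unit 460, determining the label that occurs most often in the feedback and taking it as the target service label of the target alarm record;
- the processor 510 also corrects the service alarm model with the target service label.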
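It should be understood that in the embodiments of this application, the processor 510 may be a central processing unit (CPU). It may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 520 may include a read-only memory and a random access memory, and provides instructions and data to the processor 510. Part of the memory 520 may also include a non-volatile random access memory. For example, the memory 520 may also store information about the device type.
An embodiment of this application further provides a computer-readable storage medium that stores a computer program. When executed by a processor, the program implements some or all of the steps of the methods in the foregoing embodiments, or the functions of the modules/units of the devices in those embodiments. For example, when executed by a processor, the program may implement the following method:

1. clustering a plurality of alarm records to obtain the alarm set to which each record belongs, where different alarm sets correspond to different service labels and a service label indicates the service that the records in the corresponding set raise alarms about;
2. generating training samples according to the alarm records and the service label of each record;
3. training a support vector machine with the training samples to obtain a service alarm model;
4. analyzing a target alarm record with the service alarm model to obtain its service label, and generating alarm information that includes the target alarm record and its service label.

Optionally, when executed by a processor, the program may also implement other steps of the methods in the foregoing embodiments, which are not repeated here. Optionally, the storage medium in this application, such as a computer-readable storage medium, may be non-volatile or volatile.
The computer-readable storage medium may be an internal storage unit of the service alarm device of any foregoing embodiment, such as its hard disk or memory. It may also be an external storage device of the service alarm device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a flash card. It may also include both an internal storage unit and an external storage device. The medium stores the computer program and the other programs and data the service alarm device requires. It may also temporarily store data that has been or will be output.
In a specific implementation, the processor 510 described in the embodiments may execute the implementations described in the second and third embodiments of the service alarm method provided by this application. It may also execute the implementation of the service alarm device described in the embodiments. Details are not repeated here.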
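A person of ordinary skill in the art may be aware that the example units and algorithm steps described in connection with the embodiments disclosed herein can be implemented by electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the compositions and steps of the examples have been described above generally in terms of their functions. Whether these functions are performed by hardware or software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such implementations should not be considered beyond the scope of this application.
A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the service alarm device and units described above, reference may be made to the corresponding processes in the foregoing method embodiments; details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed service alarm device and service alarm method may be implemented in other ways. For example, the device embodiments described above are merely illustrative. The division into units is merely a division by logical function, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units. That is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of this application may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of this application in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a service alarm device, a network device, or the like) to execute all or part of the steps of the methods in the embodiments of this application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.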

Claims (20)

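  1. A service alarm method, comprising:
    clustering a plurality of alarm records to obtain the alarm set to which each of the plurality of alarm records belongs, wherein different alarm sets correspond to different service labels, and a service label indicates the service that the alarm records in the corresponding alarm set raise alarms about;
    generating training samples according to the plurality of alarm records and the service label of each of the plurality of alarm records;
    training a support vector machine with the training samples to obtain a service alarm model; and
    analyzing a target alarm record with the service alarm model to obtain a service label of the target alarm record, and generating alarm information comprising the target alarm record and the service label of the target alarm record.
  2. The method according to claim 1, wherein the clustering a plurality of alarm records to obtain the alarm set to which each of the plurality of alarm records belongs comprises:
    determining a number of services to be monitored; and
    clustering the plurality of alarm records according to the number of services to obtain a preset number of alarm sets, wherein the number of alarm sets is equal to the number of services.
  3. The method according to claim 2, wherein the clustering the plurality of alarm records according to the number of services to obtain a preset number of alarm sets comprises:
    determining a preset number of cluster centers among the plurality of alarm records according to a similarity distance between every two of the plurality of alarm records; and
    determining an alarm set centered on each cluster center, to obtain the preset number of alarm sets.
  4. The method according to any one of claims 1 to 3, wherein the generating training samples according to the plurality of alarm records and the service label of each of the plurality of alarm records comprises:
    obtaining a validity threshold corresponding to each alarm item in each alarm record, wherein each alarm record comprises a plurality of alarm items and each alarm item holds an alarm value;
    selecting valid alarm records from the plurality of alarm records according to the validity threshold corresponding to each alarm item in each alarm record and the alarm value under each alarm item in each alarm record; and
    generating training samples comprising the valid alarm records and the service labels of the valid alarm records.
  5. The method according to any one of claims 1 to 3, further comprising, before the clustering a plurality of alarm records:
    collecting a plurality of raw alarm records into a database through a distributed messaging system, wherein each raw alarm record comprises a plurality of alarm items and each alarm item holds alarm data; and
    performing numeric featurization on the alarm data under each alarm item of the plurality of raw alarm records to obtain the plurality of alarm records, wherein each of the plurality of alarm records comprises a plurality of alarm items and each alarm item holds an alarm value.
  6. The method according to claim 5, wherein the performing numeric featurization on the alarm data under each alarm item of the plurality of raw alarm records to obtain the plurality of alarm records comprises:
    performing numeric featurization on the alarm data under the different alarm items of each of the plurality of raw alarm records according to featurization rules corresponding to the different alarm items, to obtain the plurality of alarm records, wherein the featurization rules comprise at least one of positive-integer conversion, hash computation, and character encoding of the alarm data.
  7. The method according to any one of claims 1 to 3, further comprising, after the generating alarm information comprising the target alarm record and the service label of the target alarm record:
    receiving a plurality of pieces of feedback, the pieces of feedback being service labels of the target alarm record annotated by different users;
    determining the service label that occurs most often in the plurality of pieces of feedback, and taking the service label that occurs most often as a target service label of the target alarm record; and
    correcting the service alarm model with the target service label.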
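  8. A service alarm device, comprising:
    a clustering unit, configured to cluster a plurality of alarm records to obtain the alarm set to which each of the plurality of alarm records belongs, wherein different alarm sets correspond to different service labels, and a service label indicates the service that the alarm records in the corresponding alarm set raise alarms about;
    a generation unit, configured to generate training samples according to the plurality of alarm records and the service label of each of the plurality of alarm records;
    a training unit, configured to train a support vector machine with the training samples to obtain a service alarm model;
    an analysis unit, configured to analyze a target alarm record with the service alarm model to obtain a service label of the target alarm record; and
    an alarm unit, configured to generate alarm information comprising the target alarm record and the service label of the target alarm record.
  9. A service alarm device, comprising a processor and a memory connected to each other, wherein the memory is configured to store a computer program comprising program instructions, and the processor is configured to invoke the program instructions to perform the following method:
    clustering a plurality of alarm records to obtain the alarm set to which each of the plurality of alarm records belongs, wherein different alarm sets correspond to different service labels, and a service label indicates the service that the alarm records in the corresponding alarm set raise alarms about;
    generating training samples according to the plurality of alarm records and the service label of each of the plurality of alarm records;
    training a support vector machine with the training samples to obtain a service alarm model; and
    analyzing a target alarm record with the service alarm model to obtain a service label of the target alarm record, and generating alarm information comprising the target alarm record and the service label of the target alarm record.
  10. The device according to claim 9, wherein, when clustering a plurality of alarm records to obtain the alarm set to which each of the plurality of alarm records belongs, the processor is specifically configured to:
    determine a number of services to be monitored; and
    cluster the plurality of alarm records according to the number of services to obtain a preset number of alarm sets, wherein the number of alarm sets is equal to the number of services.
  11. The device according to claim 10, wherein, when clustering the plurality of alarm records according to the number of services to obtain a preset number of alarm sets, the processor is specifically configured to:
    determine a preset number of cluster centers among the plurality of alarm records according to a similarity distance between every two of the plurality of alarm records; and
    determine an alarm set centered on each cluster center, to obtain the preset number of alarm sets.
  12. The device according to any one of claims 9 to 11, wherein, when generating training samples according to the plurality of alarm records and the service label of each of the plurality of alarm records, the processor is specifically configured to:
    obtain a validity threshold corresponding to each alarm item in each alarm record, wherein each alarm record comprises a plurality of alarm items and each alarm item holds an alarm value;
    select valid alarm records from the plurality of alarm records according to the validity threshold corresponding to each alarm item in each alarm record and the alarm value under each alarm item in each alarm record; and
    generate training samples comprising the valid alarm records and the service labels of the valid alarm records.
  13. The device according to any one of claims 9 to 11, wherein, before the clustering of a plurality of alarm records, the processor is further configured to perform:
    collecting a plurality of raw alarm records into a database through a distributed messaging system, wherein each raw alarm record comprises a plurality of alarm items and each alarm item holds alarm data; and
    performing numeric featurization on the alarm data under each alarm item of the plurality of raw alarm records to obtain the plurality of alarm records, wherein each of the plurality of alarm records comprises a plurality of alarm items and each alarm item holds an alarm value.
  14. The device according to any one of claims 9 to 11, wherein, after the generating of alarm information comprising the target alarm record and the service label of the target alarm record, the processor is further configured to perform:
    receiving a plurality of pieces of feedback, the pieces of feedback being service labels of the target alarm record annotated by different users;
    determining the service label that occurs most often in the plurality of pieces of feedback, and taking the service label that occurs most often as a target service label of the target alarm record; and
    correcting the service alarm model with the target service label.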
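  15. A computer-readable storage medium, storing a computer program comprising program instructions which, when executed by a processor, perform the following method:
    clustering a plurality of alarm records to obtain the alarm set to which each of the plurality of alarm records belongs, wherein different alarm sets correspond to different service labels, and a service label indicates the service that the alarm records in the corresponding alarm set raise alarms about;
    generating training samples according to the plurality of alarm records and the service label of each of the plurality of alarm records;
    training a support vector machine with the training samples to obtain a service alarm model; and
    analyzing a target alarm record with the service alarm model to obtain a service label of the target alarm record, and generating alarm information comprising the target alarm record and the service label of the target alarm record.
  16. The computer-readable storage medium according to claim 15, wherein, when clustering a plurality of alarm records to obtain the alarm set to which each of the plurality of alarm records belongs, the program instructions are specifically used to:
    determine a number of services to be monitored; and
    cluster the plurality of alarm records according to the number of services to obtain a preset number of alarm sets, wherein the number of alarm sets is equal to the number of services.
  17. The computer-readable storage medium according to claim 16, wherein, when clustering the plurality of alarm records according to the number of services to obtain a preset number of alarm sets, the program instructions are specifically used to:
    determine a preset number of cluster centers among the plurality of alarm records according to a similarity distance between every two of the plurality of alarm records; and
    determine an alarm set centered on each cluster center, to obtain the preset number of alarm sets.
  18. The computer-readable storage medium according to any one of claims 15 to 17, wherein, when generating training samples according to the plurality of alarm records and the service label of each of the plurality of alarm records, the program instructions are specifically used to:
    obtain a validity threshold corresponding to each alarm item in each alarm record, wherein each alarm record comprises a plurality of alarm items and each alarm item holds an alarm value;
    select valid alarm records from the plurality of alarm records according to the validity threshold corresponding to each alarm item in each alarm record and the alarm value under each alarm item in each alarm record; and
    generate training samples comprising the valid alarm records and the service labels of the valid alarm records.
  19. The computer-readable storage medium according to any one of claims 15 to 17, wherein, before the clustering of a plurality of alarm records, the program instructions, when executed by the processor, are further used to perform:
    collecting a plurality of raw alarm records into a database through a distributed messaging system, wherein each raw alarm record comprises a plurality of alarm items and each alarm item holds alarm data; and
    performing numeric featurization on the alarm data under each alarm item of the plurality of raw alarm records to obtain the plurality of alarm records, wherein each of the plurality of alarm records comprises a plurality of alarm items and each alarm item holds an alarm value.
  20. The computer-readable storage medium according to any one of claims 15 to 17, wherein, after the generating of alarm information comprising the target alarm record and the service label of the target alarm record, the program instructions, when executed by the processor, are further used to perform:
    receiving a plurality of pieces of feedback, the pieces of feedback being service labels of the target alarm record annotated by different users;
    determining the service label that occurs most often in the plurality of pieces of feedback, and taking the service label that occurs most often as a target service label of the target alarm record; and
    correcting the service alarm model with the target service label.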
PCT/CN2020/119303 2019-10-10 2020-09-30 Service alarm method, device, and storage medium WO2021068831A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910961590.7 2019-10-10
CN201910961590.7A CN110851321B (zh) 2019-10-10 Service alarm method, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021068831A1 true WO2021068831A1 (zh) 2021-04-15

Family

ID=69597973

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/119303 WO2021068831A1 (zh) 2019-10-10 2020-09-30 Service alarm method, device, and storage medium

Country Status (2)

Country Link
CN (1) CN110851321B (zh)
WO (1) WO2021068831A1 (zh)



Also Published As

Publication number Publication date
CN110851321A (zh) 2020-02-28
CN110851321B (zh) 2022-06-28


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20874545

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20874545

Country of ref document: EP

Kind code of ref document: A1