CN116132273A

CN116132273A - Service abnormality warning method and device, equipment, medium and product thereof

Info

Publication number: CN116132273A
Application number: CN202211599211.2A
Authority: CN
Inventors: 姜金涛
Original assignee: Guangzhou Wangxing Information Technology Co Ltd
Current assignee: Guangzhou Wangxing Information Technology Co Ltd
Priority date: 2022-12-12
Filing date: 2022-12-12
Publication date: 2023-05-16

Abstract

The application relates to a service abnormality warning method and a corresponding apparatus, device, medium, and product. The method comprises the following steps: in response to a start event of an online service, starting a feature detector for monitoring local log records generated by the online service; intercepting, by the feature detector, a local log record of the online service and generating the category feature to which the abnormal event represented by the local log record belongs; periodically sending, by the feature detector, a reporting event carrying the category feature to a service controller, triggering the service controller to update the total reported record amount corresponding to the category feature according to the number of occurrences of the category feature in the corresponding timing period; and monitoring, by the service controller, whether the total reported record amount of each category feature exceeds a preset threshold, and generating alarm information corresponding to the category feature when the threshold is exceeded. The method balances the processing timeliness and the data scale required for online service abnormality alarming, so that abnormal events are discovered in time.

Description

Service abnormality warning method and device, equipment, medium and product thereof

Technical Field

The present disclosure relates to network security technologies, and in particular, to a service abnormality warning method and apparatus, device, medium, and product thereof.

Background

Large-scale internet platforms usually deploy online services by means of a micro-service architecture; the large number of online services enriches the platform's functions and keeps the platform running robustly. To meet platform maintenance needs, an online service can be instrumented in advance for the various exceptions it may raise, so that it generates log records at key instruction nodes and stores them in a log file for later inspection.

To achieve centralized management, in the conventional technology the platform uploads the log files generated by the various online services to a service controller with centralized management capability. The service controller thus gathers the log files of all online services, and operation and maintenance personnel can analyze log data from all sources on the service controller to investigate problems. It is easy to see that the conventional technology has a timeliness problem: log uploading is not triggered in real time, so by the time operation and maintenance personnel discover a problem from the log files, they are already lagging behind the moment the problem actually occurred.

In addition to the timeliness problem, there is a data-scale problem. Each platform runs a huge number of online services, so the log records they generate are naturally massive. When many online services frequently submit massive volumes of raw log records, the service controller is inevitably overwhelmed and its computing and storage resources are strained; the indirect consequence is a high deployment cost for the platform.

In view of the shortcomings of the conventional technology in finding abnormal events of online services, it is necessary to explore effective means to timely and efficiently capture abnormal events of various online services so as to quickly find problems.

Disclosure of Invention

It is an object of the present application to solve the above-mentioned problems and provide a service abnormality warning method and corresponding apparatus, device, non-volatile readable storage medium, and computer program product.

According to one aspect of the present application, there is provided a service abnormality warning method, including:

starting a feature detector for monitoring local log records generated by an online service;

intercepting, by the feature detector, a local log record of the online service, and generating the category feature to which the abnormal event represented by the local log record belongs;

periodically sending, by the feature detector, a reporting event carrying the category feature to a service controller, and triggering the service controller to update the total reported record amount corresponding to the category feature according to the number of occurrences of the category feature in the corresponding timing period;

and monitoring, by the service controller, whether the total reported record amount of each category feature exceeds a preset threshold, and generating alarm information corresponding to the category feature when the total reported record amount exceeds the preset threshold.

According to another aspect of the present application, there is provided a service abnormality warning apparatus including:

a service starting module configured to start a feature detector for monitoring local log records generated by an online service;

a feature generation module configured to have the feature detector respond to a generation event of a local log record of the online service and generate the category feature to which the abnormal event represented by the local log record belongs;

a reporting processing module configured to have the feature detector send a reporting event carrying the category feature to a service controller, triggering the service controller to update the total reported record amount corresponding to the category feature;

and a monitoring alarm module configured to have the service controller monitor whether the total reported record amount of each category feature exceeds a preset threshold and, when the total reported record amount exceeds the preset threshold, generate alarm information corresponding to the category feature.

According to another aspect of the present application, there is provided a service abnormality warning device including a central processor and a memory, the central processor being configured to invoke and run a computer program stored in the memory to perform the steps of the service abnormality warning method described herein.

According to another aspect of the present application, there is provided a non-volatile readable storage medium storing in the form of computer readable instructions a computer program implemented according to the service abnormality warning method, the computer program executing the steps comprised by the method when being invoked by a computer.

According to another aspect of the present application, there is provided a computer program product comprising computer programs/instructions which when executed by a processor implement the steps of the method as described in any of the embodiments of the present application.

The present application has various technical advantages over the prior art, including but not limited to the following. The feature detector of an online service intercepts the local log records triggered by abnormal events of that service and extracts category features from them. It then periodically obtains the number of occurrences of each category feature, associates the count with the category feature, and submits it to the service controller for aggregation. The service controller determines the total reported record amount of each category feature and decides from that total whether an alarm is needed. The work is thus divided sensibly between the online service and the service controller: because the online side counts occurrences per period and reports only periodically, the number of requests the service controller must handle is greatly reduced, while alarms can still be raised from the relatively up-to-date information of a short period. The processing timeliness and the data scale required for online service abnormality alarming are thereby balanced; the service controller needs fewer computing and storage resources and runs more robustly, abnormal events are discovered more quickly, and deployment cost is lower.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a network architecture schematic diagram of an exemplary deployment environment of the present application;

FIG. 2 is a flow chart of an embodiment of a service anomaly alerting method of the present application;

FIG. 3 is a flow chart of a feature detector processing local log records according to an embodiment of the present application;

FIG. 4 is a schematic flow chart of the interaction of the feature detector with the service controller in the embodiment of the present application;

FIG. 5 is a flowchart illustrating a process of updating the total reported records by the service controller according to an embodiment of the present application;

FIG. 6 is a flow chart of a service controller distribution feature detector in an embodiment of the present application;

FIG. 7 is a flowchart of a service controller responding to an event review instruction according to an embodiment of the present application;

FIG. 8 is a schematic block diagram of a service anomaly alerting device of the present application;

FIG. 9 is a schematic structural diagram of a service abnormality warning apparatus employed in the present application.

Detailed Description

Referring to FIG. 1, a network architecture adopted in an exemplary application scenario of the present application includes a terminal device 80, a front-end server 81, and a service controller 82. The service controller 82 deploys a service abnormality warning service and distributes corresponding function plug-ins, such as feature detectors, to the online services running on the front-end server 81, so that those services communicate with the service abnormality warning service through the plug-ins. When an operation and maintenance user accesses the corresponding page of the service controller 82 from the terminal device 80, the various information provided by the service abnormality warning service can be obtained.

The service abnormality warning service of the present application may be implemented by executing the service abnormality warning method of the present application. Specifically, the method may be implemented as a computer program product installed in a corresponding device, for example the service controller; once the program is running, its functional components execute the method and thereby make the service abnormality warning service available.

Referring to fig. 2, in one embodiment, a service abnormality warning method provided in the present application includes the following steps:

step S1100, starting a feature detector for monitoring local log records generated by the online service;

In one embodiment, through pre-configuration, when an online service on the internet is started, the feature detector preset for that service is started as well and enters a servo state, ready to receive the local log records redirected from the online service. In another embodiment, the feature detector may be started in advance on the front-end server where each online service is deployed, so that it is already in the servo state; this achieves an equivalent effect.

The feature detector may monitor the local log records generated by the online service in several ways. In one embodiment, the log management module preset in the online service is modified so that the service's local log record output interface is redirected to the feature detector. In another embodiment, the feature detector is implemented as a hook function that hooks the code instructions with which the online service outputs local log records, so that the feature detector's business logic runs whenever those instructions are about to execute. In yet another embodiment, the feature detector monitors the incremental local log records that the online service writes to shared memory or to a local log file. The feature detector thus has a number of ways to obtain local log records generated by an online service.

There may be a plurality of online services, which are deployed in a plurality of front-end servers, respectively, and the same front-end server may also deploy a plurality of online services. Each online service may be configured with a dedicated feature detector alone, or multiple online services may be configured to share a feature detector in the same front-end server.

The data structure of a local log record is specified by the online service that generates it. For example, in one embodiment the data may be organized as follows:

{ abnormal event level; generation time; thread identifier; file name; abnormal instruction line number; message body }

Different online services generally use different local log record data structures. The feature detector can be adapted in advance to its online service, so that it knows which specific type of local log record to acquire and how to recognize it.
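As an illustration only, the six attribute items listed above could be modeled and parsed as in the following Python sketch. The semicolon-separated text layout, the `LocalLogRecord` class, and the `parse_record` function are assumptions made for this example; the application does not prescribe a concrete serialization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalLogRecord:
    """One local log record with the six attribute items (names are hypothetical)."""
    level: str        # abnormal event level, e.g. "ERROR"
    timestamp: str    # generation time
    thread_id: str    # thread identifier
    file_name: str    # file that emitted the record
    line_number: int  # abnormal instruction line number
    message: str      # message body


def parse_record(line: str) -> LocalLogRecord:
    """Parse one record from an assumed ';'-separated text line.

    Splitting at most five times keeps any ';' inside the message body intact.
    """
    level, timestamp, thread_id, file_name, line_number, message = (
        part.strip() for part in line.split(";", 5)
    )
    return LocalLogRecord(level, timestamp, thread_id, file_name,
                          int(line_number), message)


record = parse_record(
    "ERROR; 2022-12-12T10:00:00; worker-3; order.py; 42; db timeout; retrying"
)
```

A feature detector adapted to a different online service would swap in a parser for that service's own record layout while keeping the same attribute items.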

Step S1200, intercepting, by the feature detector, a local log record of an online service, and generating a category feature to which an abnormal event represented by the local log record belongs;

Once the feature detector is running in the servo state, it can intercept the local log records generated by the corresponding online service. Each local log record usually indicates that an abnormal event has occurred. By parsing the record's data structure, the feature detector obtains its attribute items and their attribute values; the attribute values of some or all of the attribute items identify the abnormal event the record represents, and a category feature indicating the type of that abnormal event can be generated from them.

When the category feature is generated from only some of the attribute values in the local log record, the attribute items that are meaningful for distinguishing abnormal events are selected, and their attribute values form the category data from which the category feature is generated. Because the selected attribute values characterize the abnormal event represented by the local log record, the category feature generated from the category data is correspondingly unique to that event.

In one embodiment, the category data may consist of the following attribute items: abnormal event level, file name, and abnormal instruction line number. In this example the level, file name, and line number determine the nature, source, and specific position of the abnormal event, so the event can be located precisely and investigated quickly. It is easy to see that each generated category feature distinguishes not only the nature of the abnormal event but also its source and specific position: whenever the nature, source, or position differs, a different category feature is generated, and the nature, source, and specific position of the corresponding abnormal event can be determined from the category feature.

To generate a unique category feature from the category data, a hash function may be used. The hash function maps the category data to an encoded string of fixed length that is, for practical purposes, unique; the string may be 128, 256, or 512 bits of binary data and may be converted to hexadecimal form.
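A minimal sketch of this mapping, assuming SHA-256 as the hash function (the application only says that a hash function with 128, 256, or 512-bit output may be used) and a hypothetical `category_feature` helper that joins the three attribute values with a `|` separator:

```python
import hashlib


def category_feature(level: str, file_name: str, line_number: int) -> str:
    """Map category data to a fixed-length hexadecimal category feature.

    SHA-256 and the '|' separator are illustrative choices, not taken
    from the application.
    """
    category_data = f"{level}|{file_name}|{line_number}"
    return hashlib.sha256(category_data.encode("utf-8")).hexdigest()


feature_a = category_feature("ERROR", "order.py", 42)
feature_b = category_feature("ERROR", "order.py", 43)
```

Identical category data always yields the same 64-character hexadecimal string, while a change in level, file name, or line number yields a different one, which is what lets the service controller key its totals by category feature.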

When a local log record for a given abnormal event arrives for the first time, its category data and category feature are determined as described above. So that the service controller can later look up and identify the abnormal event, the category data and category feature of this first record are sent to the service controller as mapping relationship data, and the service controller stores them in its database for later retrieval.

Once the service controller holds the category data and category feature for an abnormal event, the feature detector need not transmit the category data again when it later obtains the same category data and generates the same category feature. It only needs to upload mapping relationship data consisting of the category feature and the number of occurrences counted for it in each period.
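The distinction between the first report and later reports can be sketched as follows. The `InitializationTracker` class, the message layout, and the use of a local in-memory set to remember announced category features are all assumptions for illustration; the application does not fix how the feature detector remembers what it has already sent.

```python
from typing import Dict, Optional, Set


class InitializationTracker:
    """Remembers which category features were already announced (hypothetical helper)."""

    def __init__(self) -> None:
        self._announced: Set[str] = set()

    def initialization_message(
        self, feature: str, category_data: Dict[str, object]
    ) -> Optional[dict]:
        """Return the one-time mapping message, or None if it was sent before."""
        if feature in self._announced:
            return None
        self._announced.add(feature)
        return {"type": "init", "feature": feature, "category_data": category_data}


tracker = InitializationTracker()
data = {"level": "ERROR", "file_name": "order.py", "line_number": 42}
first = tracker.initialization_message("abc123", data)
second = tracker.initialization_message("abc123", data)
```

Only the first call produces a message carrying the category data; afterwards the category feature alone is enough for the service controller to find the stored category data.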

Step S1300, periodically sending, by the feature detector, a reporting event carrying the category feature to a service controller, and triggering the service controller to update the total reported record amount corresponding to the category feature according to the number of occurrences of the category feature in the corresponding timing period;

To avoid sending the service controller a report for every local log record, and so to relieve its traffic load, the feature detector uses a caching mechanism: it sets aside a cache area in the memory of the front-end server on which it runs and caches there the number of occurrences of each category feature counted within a preset timing period.

Specifically, after a new category feature first appears, the local log records belonging to it are counted period by period: within each timing period, the occurrences of local log records with the same category feature are accumulated, yielding mapping relationship data between each category feature and its number of occurrences. Each period starts counting from an initial value; when a period ends, the count of each category feature is reset to the initial value, for example 0, and counting resumes from that value in the next period.

When a period ends, the feature detector packages the mapping relationship data of every category feature and its number of occurrences into a single message body and submits it to the service controller in one reporting event. Instead of reporting to the service controller each time a local log record is generated, the feature detector reports the per-period counts once per period, which greatly reduces data exchange with the service controller and lowers its traffic load. Moreover, because the category data for each category feature was submitted to the service controller in advance, later reports need not carry the category data again: the service controller can look it up from the category feature that accompanies the count.
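The per-period cache and the single message body can be sketched as below. The `PeriodCache` class and the JSON layout with a `counts` key are assumptions for this example; in a real deployment a timer would call `flush` at the end of each timing period and the returned body would be sent to the service controller.

```python
import json
import threading
from collections import Counter


class PeriodCache:
    """In-memory occurrence counts for one timing period (illustrative sketch)."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()
        self._lock = threading.Lock()

    def record(self, feature: str) -> None:
        """Count one local log record belonging to the given category feature."""
        with self._lock:
            self._counts[feature] += 1

    def flush(self) -> str:
        """Package all counts into one message body and reset them for the next period."""
        with self._lock:
            snapshot = dict(self._counts)
            self._counts.clear()
        return json.dumps({"counts": snapshot}, sort_keys=True)


cache = PeriodCache()
for feature in ("a1", "a1", "b2"):
    cache.record(feature)
body = cache.flush()        # one reporting event for the whole period
next_body = cache.flush()   # nothing was counted since the reset
```

Three log records result in a single report, and the second flush carries no counts because the cache was reset when the period ended.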

The service controller maintains an alarm detection table that stores the total reported record amount of each category feature. Specifically, after receiving the mapping relationship data between a category feature and its category data, the service controller stores it and then creates a data record for the category feature in the alarm detection table. The data record contains at least two fields: a category field, which stores the category feature, and an event total field, which stores the total reported record amount of that category feature. The total reported record amount may be initialized to 0. In one embodiment, the category feature and its total reported record amount are stored as a key-value pair in Redis, with the category feature as the key and the total reported record amount as the value. In another embodiment, the data record is implemented in a relational database and may contain, besides the category field and the event total field, a data field storing the category data that corresponds to the category feature.

Each time the service controller receives the category features and occurrence counts periodically submitted by the feature detectors running on the front-end servers, it first looks up the matching data record in the alarm detection table by category feature and then adds the reported number of occurrences to the total reported record amount in that record. The total is thereby updated to its latest value for the period.
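The accumulation on the service controller side can be sketched with an in-memory dictionary standing in for the alarm detection table. The `AlarmDetectionTable` class is hypothetical; with the Redis-based embodiment, the same per-key addition could plausibly be done with an atomic increment command such as INCRBY.

```python
from collections import defaultdict
from typing import Dict, Mapping


class AlarmDetectionTable:
    """Category feature -> total reported record amount (in-memory stand-in)."""

    def __init__(self) -> None:
        # Missing category features start from a total of 0.
        self.totals: Dict[str, int] = defaultdict(int)

    def apply_report(self, counts: Mapping[str, int]) -> None:
        """Add one period's occurrence counts to the running totals."""
        for feature, occurrences in counts.items():
            self.totals[feature] += occurrences


table = AlarmDetectionTable()
table.apply_report({"a1": 2, "b2": 1})   # report from one feature detector
table.apply_report({"a1": 5})            # report from another front-end server
```

Reports from different feature detectors for the same category feature land in the same data record, so the total reflects the whole platform rather than a single front-end server.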

Although the total reported record amount of each category feature is updated periodically rather than immediately for every local log record, the timing period can be set to a reasonable time interval, for example 1 minute or 30 seconds, which trades off traffic efficiency against timeliness. The service controller is spared from frequently responding to requests related to local log records from many concurrent online services, its load is greatly reduced, and it runs more robustly.

Step S1400, monitoring, by the service controller, whether the total reported record amount of each category feature exceeds a preset threshold, and generating alarm information corresponding to the category feature when the total reported record amount exceeds the preset threshold.

In order to identify whether the abnormal event corresponding to each category characteristic needs to be intervened, the service controller can monitor the total reported record amount of each category characteristic in the alarm detection table through an independent process or thread, and alarm is carried out when the abnormality is monitored.

Specifically, a threshold for triggering an alarm is preset. The process or thread traverses the alarm detection table at regular or irregular intervals and checks, for the category feature in each data record, whether the total reported record amount exceeds the threshold. If the threshold is not exceeded, the alarm condition is not met and the record is ignored; if it is exceeded, the alarm condition is met, and alarm information corresponding to the category feature is generated and sent to a preset communication interface.
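One traversal of the alarm detection table can be sketched as follows. The `find_alarms` function, the dictionary shape of the alarm information, and the threshold value of 100 are assumptions for illustration; "exceeds" is read here as strictly greater than the threshold.

```python
from typing import Dict, List, Mapping


def find_alarms(totals: Mapping[str, int], threshold: int) -> List[Dict[str, object]]:
    """Return alarm information for every category feature whose total exceeds the threshold."""
    alarms = []
    for feature in sorted(totals):
        total = totals[feature]
        if total > threshold:  # totals at or below the threshold are ignored
            alarms.append({"feature": feature, "total": total})
    return alarms


totals = {"a1": 120, "b2": 100, "c3": 7}
alarms = find_alarms(totals, threshold=100)
```

A monitoring process or thread would call such a function on each pass and hand the resulting alarm information to the preset communication interface.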

In one embodiment, the alarm information may include the category feature and the corresponding category data retrieved from the database by category feature; it may also include the total reported record amount of the category feature as required.

The communication interface can be an instant communication interface or a mailbox address, and the alarm information can be sent to corresponding operation and maintenance users through the communication interface, so that the related operation and maintenance users can quickly master the details of the abnormal event and quickly handle the abnormal event.

In one embodiment, in step S1200, when the feature detector first submits the category data and category feature of an abnormal event to the service controller, it may also obtain running state information of the corresponding online service, such as CPU usage and storage usage, associate that information with the category feature, and submit it to the service controller, which stores it in the database in association with the category feature for later investigation. In step S1400, when the alarm information is assembled, the running state information can be packaged together with the category data. The operation and maintenance user can then not only view the category data of the abnormal event and determine its nature, source, and position, but also gauge the health of the front-end server hosting the online service. The information is more complete, and the abnormal event can be handled quickly, accurately, and efficiently.

From the above embodiments, the present application has various technical advantages, including but not limited to the following. The feature detector of an online service intercepts the local log records triggered by abnormal events of that service and extracts category features from them. It then periodically obtains the number of occurrences of each category feature, associates the count with the category feature, and submits it to the service controller for aggregation. The service controller determines the total reported record amount of each category feature and decides from that total whether an alarm is needed. The work is thus divided sensibly between the online service and the service controller: because the online side counts occurrences per period and reports only periodically, the number of requests the service controller must handle is greatly reduced, while alarms can still be raised from the relatively up-to-date information of a short period. The processing timeliness and the data scale required for online service abnormality alarming are thereby balanced; the service controller needs fewer computing and storage resources and runs more robustly, abnormal events are discovered more quickly, and deployment cost is lower.

On the basis of any embodiment of the present application, referring to fig. 3, intercepting, by the feature detector, a local log record of an online service, and generating a category feature to which an abnormal event represented by the local log record belongs, including:

step S1210, the feature detector receives the local log record output by the online service redirection, analyzes the data structure of the local log record, extracts the attribute values of part of attribute items in the local log record, and forms category data;

in this embodiment, the output object of the local log record of the online service may be configured in advance, and redirected to the feature detector corresponding to the online service by configuration, so that the feature detector may receive each local log record output by the online service.

After obtaining a local log record, the feature detector parses its data structure and, following the attribute items listed in a preset template for describing abnormal events, extracts the attribute values of those items from the record. These attribute values form the category data.

Step S1220, the feature detector determines whether the attribute value of the predetermined attribute item satisfies a preset condition, and if not, ignores the local log record; if yes, generating category characteristics of the characterized abnormal event according to the category data;

The local log records output by an online service usually span different levels, so each local log record has an attribute item dedicated to the abnormal event level, whose attribute value names the level. For example, the attribute value may be FATAL, ERROR, WARN, INFO, DEBUG, TRACE, or ALL, indicating the fatal, error, warning, information, debugging, tracing, and general levels respectively. Only some of these levels, such as FATAL, ERROR, and WARN, actually call for an alarm. The feature detector therefore presets these values as the predetermined attribute values and, for each local log record, checks whether the attribute value of the level attribute item is one of them. If not, the local log record is ignored and does not count toward the occurrences of any category feature. If so, the local log record is counted: category data is extracted from it as described in this application, the corresponding category feature is generated, and the number of occurrences of that category feature is updated.

Through the step, the screening of the local log records is realized, the generation and reporting of category characteristics aiming at each local log record can be avoided, and the key type information is focused, so that the processing efficiency of the characteristic detector is improved, and the counted information is more accurate.
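The screening in step S1220 amounts to a membership test on the level attribute, as in this sketch. The `ALERT_LEVELS` set mirrors the FATAL, ERROR, and WARN example above; the `should_count` helper and its case-insensitive comparison are assumptions for illustration.

```python
# Levels that are treated as alarm-worthy in the example above.
ALERT_LEVELS = frozenset({"FATAL", "ERROR", "WARN"})


def should_count(level: str) -> bool:
    """Return True if a record with this level counts toward a category feature."""
    return level.strip().upper() in ALERT_LEVELS


levels = ["INFO", "ERROR", "DEBUG", "warn", "FATAL"]
kept = [lvl for lvl in levels if should_count(lvl)]
```

Records at the INFO and DEBUG levels are dropped before any category feature is generated, so they never reach the per-period counts or the service controller.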

In step S1230, the feature detector stores the local log record in a log file of the online service.

Considering that the online service needs to maintain the local log file stored in the online service, the feature detector intercepts the local log record of the online service by redirecting, and thus, after finishing the screening and subsequent processing of the local log record, the intercepted local log record can be restored back to the log file of the online service so as to avoid destroying the log file of the online service.

According to the above embodiment, the feature detector obtains the local log record of the online service based on redirection, processes the local log record and then returns the local log record to the log file of the online service, so that the original business logic of the online service is not damaged, the online service is not required to realize the business logic interacting with the service controller, and the local log file can be screened by loading the standardized feature detector, thereby further realizing reporting of abnormal events of a specific level, being very efficient and convenient and being beneficial to standardized realization.
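The intercept-then-restore flow can be sketched as follows. This is only an illustration: `relay_records` and its `handle` callback are hypothetical names, and in-memory streams stand in for the redirected output and the log file of the online service.

```python
import io
from typing import Callable, TextIO


def relay_records(source: TextIO, log_file: TextIO, handle: Callable[[str], None]) -> int:
    """Process each redirected record, then write it back to the log file unchanged."""
    count = 0
    for line in source:
        handle(line.rstrip("\n"))  # screening and category feature generation
        log_file.write(line)       # restore the record to the service's own log file
        count += 1
    return count


# In-memory streams stand in for the redirected output and the real log file.
source = io.StringIO("ERROR a.py:3 boom\nINFO started\n")
log_file = io.StringIO()
seen = []
relayed = relay_records(source, log_file, seen.append)
```

Because every record is written back verbatim after processing, the log file of the online service ends up with the same content it would have had without the feature detector.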

On the basis of any embodiment of the present application, referring to fig. 4, after generating the category characteristics of the characterized abnormal event according to the category data, the method includes:

Step S1221, the feature detector judges whether the category data appears for the first time; if so, it starts a period timer to count the occurrences of the category data within a preset period and submits an initialization instruction to the service controller, the initialization instruction carrying the category data and its category feature;

Each time the category data of a local log record has been extracted and the corresponding category feature generated from it, the feature detector judges whether the category data appears for the first time. This can be implemented in various ways. In another embodiment, where the service controller stores the mapping relationship data between category features and category data, a query request may be submitted to the service controller; the service controller queries whether it has already stored the mapping relationship data for the corresponding category feature and returns the result. If the service controller holds no such mapping relationship data, the corresponding category data is appearing for the first time.

For category data that appears for the first time, the feature detector starts a corresponding period timer and determines the timing period from the duration of the predetermined period, for example 1 minute or 30 seconds. When one timing period ends, the period timer can be reset to time the next period.

In a timing period corresponding to the period timer, the feature detector calculates the occurrence number of the corresponding local log records for the same category feature, that is, after the corresponding category data first appears, the feature detector counts the occurrence number corresponding to the category feature generated according to the category data in the corresponding timing period, and in fact counts the periodic total number of the local log records corresponding to the same category feature.

After confirming that the category data appears for the first time, the feature detector also sends an initialization instruction to the service controller. The initialization instruction carries the category data appearing for the first time and the category feature generated from it, so that the category data and the category feature form mapping relation data.

It will be appreciated that if the category data of a newly obtained local log record is not appearing for the first time, the initialization instruction need not be submitted; only the occurrence count of the corresponding category feature is incremented.

In one embodiment, the initialization instruction may also carry operation state information of the online service, so that the service controller can store the operation state information in association with the category data of the category feature for later query.

Step S1222, the service controller responds to the initialization instruction to create a data record indexed by the category characteristic in a database, so that the category characteristic is associated with the category data, and the total amount of the reported records corresponding to the category characteristic is initialized to be an initial value;

After receiving the initialization instruction, the service controller responds to it: it parses the mapping relation data, creates a data record in the database, and stores the mapping relation data accordingly, so that the category data and its category feature are saved to the database. The category data corresponding to the category feature is thereby recorded and available for later query and retrieval.

Storing the category data in the database means that the service controller has received a new category feature and needs to start counting the total reported records corresponding to that category feature. Accordingly, the total reported records are stored in the database in association with the category feature and initialized to an initial value, for example 0. As in the embodiments disclosed earlier in the present application, the mapping relationship between a category feature and its total reported records may be stored in the form of key-value pairs.

Step S1223, in response to the timing arrival event of the period timer, the feature detector forms mapping relationship data from the category feature and the number of occurrences obtained in the timing period of the period timer, and generates the reporting event, which contains the mapping relationship data.

When the period timer expires, a timing arrival event is triggered. In response to this event, the mapping relation data between each category feature and its number of occurrences counted in the corresponding timing period is packaged into a message body, and a reporting event is constructed for the message body and sent to the service controller. The service controller thereby obtains the category features and their numbers of occurrences in the corresponding period and uses them to update the total reported records corresponding to each category feature.

According to the above embodiment, the feature detector needs to submit the category data to the service controller only when the category data of a category feature appears for the first time; at all other times it only needs to periodically report the number of occurrences corresponding to the category feature, so that the service controller can rapidly count the total reported records corresponding to each category feature.
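Steps S1221 and S1223 can be sketched on the feature detector side as below. The sketch makes several simplifying assumptions: the class `FeatureCounter`, its `send` callback, and the message layout are hypothetical; first appearance is tracked by category feature, on the assumption that the feature is derived deterministically from the category data; and the timer is replaced by an explicit `on_timer()` call shared by all category features, whereas the text describes a period timer started when category data first appears.

```python
from collections import Counter


class FeatureCounter:
    """Counts occurrences per category feature and reports them once per period."""

    def __init__(self, send):
        self.send = send          # hypothetical callable that delivers a message to the service controller
        self.seen = set()         # category features that have already appeared
        self.counts = Counter()   # occurrences within the current timing period

    def observe(self, category_data: dict, feature: str) -> None:
        if feature not in self.seen:
            # First appearance: submit the initialization instruction once.
            self.seen.add(feature)
            self.send({"type": "init", "feature": feature, "category_data": category_data})
        self.counts[feature] += 1

    def on_timer(self) -> None:
        """Called when the timing period ends (stands in for the timing arrival event)."""
        if self.counts:
            self.send({"type": "report", "counts": dict(self.counts)})
        self.counts.clear()  # start counting the next period from zero
```

With this layout the category data crosses the network once per category feature, and every later message carries only feature-to-count pairs.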

On the basis of any embodiment of the present application, referring to fig. 5, periodically sending, by the feature detector, a report event carrying the category feature to a service controller, triggering the service controller to statistically update the total report records corresponding to the category feature according to the occurrence number of the category feature in a corresponding timing period, where the steps include:

Step S1310, the service controller responds to the reporting event and parses the mapping relation data carried in the reporting event to obtain the corresponding category features and their numbers of occurrences in a single timing period;

After the service controller receives a reporting event submitted by the feature detector of any online service, it responds to the reporting event and parses the mapping relation data carried in it. As mentioned above, the mapping relation data comprises each category feature and its number of occurrences in the corresponding period, so each category feature and its number of occurrences in a single timing period are obtained.

Step S1320, the service controller looks up the data record corresponding to the category feature in the database, adds the number of occurrences to the total reported records in the data record, and thereby updates the total reported records.

The service controller stores the category characteristics and the corresponding total reported records in a database, so that after the new category characteristics and the mapping relation data of the occurrence times are obtained, the corresponding data records are inquired and determined from the database according to the category characteristics, and then the corresponding occurrence times of the category characteristics are accumulated on the basis of the total reported records in the data records of the category characteristics, so that the total reported records of the category characteristics can be updated.

According to the above embodiment, the service controller can quickly find the corresponding data record according to the category feature, and timely accumulate the occurrence times corresponding to the category feature into the total amount of the report records corresponding to the category feature, so as to realize quick and light update, and can embody efficiency advantages when submitting massive report events concurrently for massive online services.
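The controller-side bookkeeping of steps S1222, S1310, and S1320, together with the threshold check, might look like the following sketch. A plain dictionary stands in for the database, and the class and method names are hypothetical.

```python
class ServiceController:
    """In-memory stand-in for the service controller's database logic."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold
        self.records = {}  # category feature -> {"category_data": ..., "total": int}

    def handle_init(self, feature: str, category_data: dict) -> None:
        # Create the data record indexed by the category feature, with total = 0.
        self.records.setdefault(feature, {"category_data": category_data, "total": 0})

    def handle_report(self, counts: dict) -> None:
        # Add each per-period occurrence count to the stored total.
        for feature, occurrences in counts.items():
            record = self.records.setdefault(feature, {"category_data": None, "total": 0})
            record["total"] += occurrences

    def features_over_threshold(self) -> list:
        # Category features whose total reported records exceed the preset threshold.
        return sorted(f for f, r in self.records.items() if r["total"] > self.threshold)
```

Each reporting event costs one keyed lookup and one addition per category feature, which is what keeps the update lightweight when many online services report concurrently.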

On the basis of any embodiment of the present application, referring to fig. 6, before starting a feature detector for monitoring a local log record generated by an online service, the method includes:

step S2100, the service controller responds to the deployment completion notification message corresponding to the online service, and sends the feature detector and the identity token thereof to the online service;

The various online services may be implemented through different development frameworks, and thus feature detectors corresponding to the different development frameworks may be standardized and then distributed accordingly for each online service by the service controller.

After the online service is deployed on the corresponding front-end server, it can submit a distribution request containing the corresponding development framework identification information to the service controller. After receiving the distribution request, the service controller looks up the resource file of the corresponding feature detector according to the development framework identification information, generates an identity token that uniquely identifies the online service, and sends the resource file together with the identity token to the corresponding online service.

Step S2200, the online service configures the feature detector as a synchronization plug-in that is started in response to the start of the online service, and configures the identity token as necessary information for the feature detector to communicate data with the service controller.

After the online service receives the resource file and the identity token corresponding to the feature detector, it caches the identity token and configures the resource file as a synchronization plug-in started in response to the start of the online service, which completes the configuration.

Through configuration, when the online service is started each time, a pre-configured resource file is operated to correspondingly call the pre-configured feature detector, the feature detector carries the identity token in a request or instruction each time in the process of carrying out data communication with the service controller, the service controller checks the communication legitimacy of the feature detector by the identity token each time, only receives legal requests or instructions, and refuses to respond to illegal requests or instructions, thereby ensuring that the service controller is prevented from illegal attack and ensuring network security.

According to the above embodiment, the feature detector can be standardized and distributed intensively by the service controller, which is beneficial to centralized configuration of massive online services of the distributed architecture by a large platform, so that the installation and update efficiency of the feature detector is higher, and the network communication security can be ensured by the identity token, so that the operation of the service controller is more stable.
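A minimal sketch of issuing and checking identity tokens is given below. The `TokenRegistry` class and the idea of keying tokens by a service identifier are assumptions for illustration; the text only requires that the token be unique and that the service controller verify it on every request or instruction.

```python
import hmac
import secrets


class TokenRegistry:
    """Issues one identity token per online service and validates incoming tokens."""

    def __init__(self) -> None:
        self._tokens = {}  # hypothetical service identifier -> issued token

    def issue(self, service_id: str) -> str:
        token = secrets.token_hex(16)  # 32 hexadecimal characters, random per service
        self._tokens[service_id] = token
        return token

    def is_valid(self, service_id: str, token: str) -> bool:
        expected = self._tokens.get(service_id)
        if expected is None:
            return False  # unknown service: refuse to respond
        # Constant-time comparison avoids leaking how many characters matched.
        return hmac.compare_digest(expected.encode("utf-8"), token.encode("utf-8"))
```

The service controller would call `is_valid` before handling any initialization instruction or reporting event and drop the message when it returns `False`.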

On the basis of any embodiment of the present application, referring to fig. 7, the service controller monitors whether the total amount of reported records of each category feature exceeds a preset threshold, and when the total amount of reported records exceeds the preset threshold, after generating alarm information corresponding to the category feature, the method includes:

Step S3100, the service controller responds to an event checking instruction to obtain each category characteristic in the database, category data and total reported records of the category characteristic, and corresponding data samples are formed;

An operation and maintenance user can check the overall situation of abnormal events across the massive online services of the entire large-scale internet platform by accessing a page provided by the service controller. Specifically, through the page, the operation and maintenance user can trigger an event checking instruction to be sent to the service controller. After receiving the event checking instruction, the service controller responds to it by retrieving specific data from the database. The specific data may include the mapping relationship data between category features and their total reported records in the alarm detection table, and may also include the mapping relationship data between category features and their corresponding category data; that is, for each category feature, its total reported records and its category data. Each category feature, together with its total reported records and category data, is constructed into a corresponding data sample. Including the category data in the data sample enriches the semantics of the data sample and helps the subsequent clustering produce more accurate results.

Step S3200, the service controller clusters the data samples to obtain a plurality of clusters, and determines the accumulated value of the total reported records of the data samples contained in each cluster as the total cluster event;

Considering that different category features may belong to different online services yet be similar because of the file names they contain, and that their total reported records may also show a certain similarity, a clustering algorithm may be applied to cluster the data samples, so as to examine the overall classification of a large number of abnormal events from the perspective of the whole platform.

In one embodiment, to facilitate subsequent clustering, each item of information in each data sample may be encoded to obtain a corresponding encoded vector. Deep semantic information is then extracted from the encoded vector of each data sample by a feature extraction model trained to convergence in advance, yielding a corresponding semantic vector, and all data samples are clustered based on their semantic vectors.

In the clustering, any mature clustering algorithm may be used for implementation, and the clustering algorithm includes, but is not limited to, any of the following: K-Means clustering algorithm, mean shift clustering algorithm, density-based clustering algorithm (DBSCAN), maximum expectation clustering algorithm based on Gaussian mixture model, aggregation hierarchy clustering algorithm, graph group detection clustering algorithm and the like.

The K-Means clustering algorithm first requires selecting the number of clusters and randomly initializing a center point for each cluster. In the present application, each sample point corresponds to a data sample. Each center point is a vector of the same length as each sample point vector. Then, the distance from each sample point to each center point is calculated, and each sample point is assigned to the cluster whose center point is closest. Next, the center point of each cluster is recalculated. These steps are repeated until the center point of each cluster changes by no more than a preset range between iterations. Alternatively, the center points may be randomly initialized several times and the run with the best result selected.
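The K-Means procedure described above can be sketched in plain Python as follows. This is an illustrative implementation only: it uses squared Euclidean distance, initializes the center points by sampling `k` distinct input points with a fixed seed, and the parameter names are hypothetical.

```python
import random


def kmeans(points, k, iters=100, tol=1e-9, seed=0):
    """Cluster a list of equal-length numeric tuples; return (labels, centers)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial center points
    labels = [0] * len(points)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    for _ in range(iters):
        # Assignment step: each sample point joins the cluster of its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center becomes the mean of its member points.
        new_centers = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center unchanged
        shift = max(sq_dist(old, new) for old, new in zip(centers, new_centers))
        centers = new_centers
        if shift <= tol:  # centers moved by no more than the preset range
            break
    return labels, centers
```

In practice the points would be the semantic vectors of the data samples; the sketch accepts any equal-length numeric tuples.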

The mean shift clustering algorithm is an algorithm based on a sliding window, and a dense region of sample points is found through the sliding window. This is a centroid-based algorithm that locates the center point of each cluster by updating the candidate points for the center point to the mean of the points within the sliding window. And then removing similar windows of the candidate windows to finally form a center point set and corresponding cluster types.

The density-based clustering algorithm (DBSCAN), like the mean shift clustering algorithm, is based on density. First, a neighborhood range and a preset quantity threshold are determined. Starting from an arbitrary unvisited sample point, the algorithm checks whether the number of sample points within the neighborhood range of that point is greater than or equal to the preset quantity threshold; if so, the sample point is marked as a center point, otherwise it is marked as a noise node. If a noise node lies within the neighborhood range of some center point, it is re-marked as an edge node; otherwise it remains a noise node. The process is repeated until all sample points have been visited.
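A compact sketch of DBSCAN follows, under the usual formulation in which the "center points" of the text are core points and the "edge nodes" are border points. The function name, the brute-force neighbor search, and the convention of labelling noise as -1 are choices made for this illustration.

```python
def dbscan(points, eps, min_pts):
    """Return one cluster label per point; -1 marks noise."""
    NOISE = -1
    labels = [None] * len(points)  # None means the point has not been visited yet

    def neighbors(i):
        # All points within the neighborhood range eps of point i (including i itself).
        return [
            j for j, q in enumerate(points)
            if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps * eps
        ]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        nb = neighbors(i)
        if len(nb) < min_pts:
            labels[i] = NOISE  # may later be re-marked as a border point
            continue
        cluster += 1  # point i is a core point and starts a new cluster
        labels[i] = cluster
        queue = [j for j in nb if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise inside a core point's range becomes a border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb_j = neighbors(j)
            if len(nb_j) >= min_pts:
                queue.extend(nb_j)  # j is also a core point, so keep expanding
    return labels
```

Unlike K-Means, the number of clusters is not chosen in advance; it follows from the neighborhood range and the quantity threshold.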

The expectation-maximization (EM) clustering algorithm based on the Gaussian mixture model (GMM) first selects the number of clusters (as in K-Means) and randomly initializes the Gaussian distribution parameters (mean and variance) of each cluster; alternatively, the data can be inspected first to give relatively accurate initial means and variances. Then, given the Gaussian distribution of each cluster, the probability that each sample point belongs to each cluster is calculated. The closer a sample point is to the center of a Gaussian distribution, the more likely it belongs to that cluster. Next, new Gaussian distribution parameters are calculated from these probabilities so as to maximize the likelihood of the sample points; the new parameters can be computed as weighted sums of the sample points, where each weight is the probability that the sample point belongs to the cluster. The preceding two steps are then iterated until the change between iterations does not exceed a preset range, at which point clustering is complete.

Hierarchical clustering algorithms are divided into two categories: top-down and bottom-up. Agglomerative hierarchical clustering (HAC) is a bottom-up clustering algorithm. HAC first treats each sample point as a single cluster and then repeatedly calculates the distances between all clusters and merges the closest ones, until all clusters have been aggregated into one cluster.

The graph community detection clustering algorithm first treats each sample point as a vertex and assigns it to its own community, then computes the modularity M of the entire network. Second, for each pair of communities, the algorithm calculates the change in modularity ΔM that would result from merging them, and merges the community pair for which ΔM increases the most. Third, a new modularity M is calculated for the resulting clustering and recorded. The merging steps are then repeated, each time fusing the community pair that yields the greatest gain in ΔM and recording the new clustering and its corresponding modularity score M.

As the above description of the optional clustering algorithms shows, whichever feasible clustering algorithm is adopted, the data samples can be clustered based on their mutual similarity to determine one or more clusters, each composed of a plurality of data samples. Because the data samples are connected to one another on the basis of similarity, each cluster obtained is generally composed of data samples with the same semantics.

Step S3300, sorting each cluster according to the cluster event total amount, generating a mapping relationship list of clusters and cluster event total amounts thereof, and responding to the event checking instruction.

Each cluster has a corresponding cluster event total, and the larger the cluster event total, the more prominent the corresponding problem. The clusters are therefore sorted by cluster event total, in order of priority, to obtain a mapping relationship list that stores the mapping relationship data between the identifier of each cluster and its cluster event total. The mapping relationship list is then pushed to the operation and maintenance user, which completes the response to the event checking instruction. After the browser parses and displays the mapping relationship list, the operation and maintenance user can investigate problems as a whole. Further, when the operation and maintenance user clicks the identifier of any cluster, details of each data sample in that cluster are pushed to the user, showing the category feature of each data sample, the corresponding category data, the operation state information of the corresponding online service, and so on. This allows the operation and maintenance user to analyze and handle, from the surface down to the details, the problems behind the abnormal events across the whole internet platform.
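Steps S3200 and S3300 reduce to summing the total reported records per cluster and sorting the sums. The sketch below assumes each data sample is a dictionary with a `total` key and that cluster labels come from whichever clustering algorithm was used; both are illustrative conventions.

```python
from collections import defaultdict


def rank_clusters(samples, labels):
    """Return (cluster label, cluster event total) pairs, largest total first."""
    totals = defaultdict(int)
    for sample, label in zip(samples, labels):
        totals[label] += sample["total"]  # accumulate total reported records per cluster
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

The returned list plays the role of the mapping relationship list: the first entry is the cluster with the most reported abnormal events.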

According to the above embodiment, the category features, category data, and total reported records of the present application provide rich initial semantics. With the help of a clustering algorithm, macro-level clustering of the abnormal events reported by massive online services can be achieved, and associations among abnormal events of different sources, positions, and natures are mined across online services. The operation and maintenance user can thus quickly grasp the overall profile of abnormal events across the whole internet platform, which improves operation and maintenance efficiency and helps with rapid troubleshooting.

Referring to fig. 8, a service abnormality alarm device provided according to an aspect of the present application includes a service starting module 1100, a feature generating module 1200, a reporting processing module 1300, and a monitoring alarm module 1400, where: the service starting module 1100 is configured to start a feature detector, and is configured to monitor a local log record generated by an online service; the feature generation module 1200 is configured to generate, by the feature detector, a category feature to which the abnormal event characterized by the local log record belongs in response to a generation event of the local log record of the online service; the report processing module 1300 is configured to send a report event carrying the category feature to a service controller by using the feature detector, and trigger the service controller to statistically update the total report records corresponding to the category feature; the monitoring alarm module 1400 is configured to monitor, by the service controller, whether the total amount of the reported records of each of the class features exceeds a preset threshold, and when the total amount of the reported records exceeds the preset threshold, generate alarm information corresponding to the class features.

On the basis of any embodiment of the present application, the feature generating module 1200 includes: the record analysis unit is used for receiving the local log record output by the online service redirection by the feature detector, analyzing the data structure of the local log record, extracting the attribute values of part of attribute items in the local log record, and forming category data; a record screening unit, configured to determine whether the attribute value of the predetermined attribute term meets a preset condition by using the feature detector, and if not, ignore the local log record; if yes, generating category characteristics of the characterized abnormal event according to the category data; and the local storage unit is used for storing the local log record in a log file of the online service by the feature detector.

On the basis of any embodiment of the present application, the feature generating module 1200 further includes: an initial preparation unit, configured to determine whether the category data appears for the first time by using the feature detector, if so, starting a period timer to count the appearance times of the category data in a preset period, and submitting an initialization instruction to the service controller, where the initialization instruction carries the category data and the category features thereof; the initial processing unit is configured to respond to the initialization instruction, create a data record indexed by the category characteristic in a database, associate the category characteristic with category data of the data record, and initialize the total amount of reported records corresponding to the category characteristic as an initial value; the period accumulating unit is configured to respond to the timing arrival event of the period timer, form mapping relation data with the appearance times and the category characteristics obtained in the timing period of the period timer, generate the reporting event, and contain the mapping relation data in the reporting event.

On the basis of any embodiment of the present application, the report processing module 1300 includes: the reporting analysis unit is configured to analyze mapping relation data carried in the reporting event in response to the reporting event by the service controller, and obtain corresponding category characteristics and corresponding occurrence times of the category characteristics in a single timing period; and the total amount updating unit is arranged for the service controller to search out the data record corresponding to the category characteristic from the database, accumulate the total amount of the reported records in the data record for the occurrence times and update the total amount of the reported records.

On the basis of any embodiment of the application, the initialization instruction also carries running state information of the online service, and the category data comprises attribute values for describing the nature, source and/or position of the abnormal event.

On the basis of any embodiment of the application, the service abnormality warning device of the application further comprises: the distribution processing module is configured to respond to the deployment completion notification message corresponding to the online service by the service controller and send the feature detector and the identity token thereof to the online service; a distribution configuration module arranged for the online service to configure the feature detector as a synchronization plug-in activated in response to the online service and to configure the identity token as necessary information for the feature detector to communicate data with the service controller.

On the basis of any embodiment of the present application, the service abnormality warning device of the present application further comprises: a checking response module, configured for the service controller to respond to the event checking instruction, obtain each category feature in the database together with its category data and total reported records, and form the corresponding data samples; a clustering processing module, configured for the service controller to cluster the data samples to obtain a plurality of clusters and to determine the accumulated value of the total reported records of the data samples contained in each cluster as the cluster event total; and an information output module, configured to sort the clusters according to their cluster event totals, generate a mapping relationship list of the clusters and their cluster event totals, and respond to the event checking instruction.

Another embodiment of the present application further provides a service abnormality alarm device. As shown in fig. 9, the internal structure of the service abnormality warning apparatus is schematically shown. The service abnormality warning apparatus includes a processor, a computer readable storage medium, a memory, and a network interface connected by a system bus. The computer readable non-volatile storage medium of the service abnormality warning device stores an operating system, a database and computer readable instructions, the database can store information sequences, and the computer readable instructions, when executed by a processor, can enable the processor to realize a service abnormality warning method.

The processor of the service anomaly alarm device is configured to provide computing and control capabilities to support the operation of the entire service anomaly alarm device. The memory of the service anomaly alarm device may store computer readable instructions that, when executed by the processor, cause the processor to perform the service anomaly alarm method of the present application. The network interface of the service anomaly alarm device is used to connect to and communicate with a terminal.

It will be appreciated by those skilled in the art that the structure shown in fig. 9 is merely a block diagram of a portion of the structure associated with the present application and does not constitute a limitation of the service anomaly alarm device to which the present application is applied, and that a particular service anomaly alarm device may include more or fewer components than shown in the figures, or may combine certain components, or have a different arrangement of components.

The processor in this embodiment is configured to perform specific functions of each module in fig. 8, and the memory stores program codes and various types of data required for executing the above-described modules or sub-modules. The network interface is used for realizing data transmission between the user terminals or the servers. The nonvolatile readable storage medium in this embodiment stores therein program codes and data necessary for executing all modules in the service abnormality warning apparatus of the present application, and the server can call the program codes and data of the server to execute the functions of all modules.

The present application also provides a non-transitory readable storage medium storing computer readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the service anomaly alerting method of any embodiment of the present application.

The present application also provides a computer program product comprising computer programs/instructions which when executed by one or more processors implement the steps of the method described in any of the embodiments of the present application.

It will be appreciated by those skilled in the art that all or part of the processes of the above-described method embodiments may be implemented by a computer program stored in a non-transitory readable storage medium; when executed, the program may carry out the steps of those method embodiments. The storage medium may be a computer-readable storage medium such as a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

In summary, the present application can balance the processing timeliness and data scale required for online service abnormality warning. It lowers the computing and storage resources required by the service controller, makes the controller's operation more robust, allows abnormal events to be discovered more quickly, and reduces deployment and implementation costs, thereby achieving economies of scale.

Claims

1. A service abnormality warning method, characterized by comprising:

2. The service abnormality warning method of claim 1, wherein intercepting, by the feature detector, the local log record of the online service and generating the category feature of the abnormal event characterized by the local log record comprises:

the feature detector receives the local log record redirected and output by the online service, parses the data structure of the local log record, and extracts the attribute values of some of the attribute items in the local log record to form category data;

the feature detector judges whether the attribute value of a preset attribute item meets a preset condition; if not, the local log record is ignored; if so, the category feature of the characterized abnormal event is generated according to the category data;

the feature detector stores the local log record in a log file of the online service.
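
By way of non-limiting illustration, the steps of claim 2 can be sketched as follows. The sketch assumes that each local log record is a single JSON object, that the preset condition is `level == "ERROR"`, and that the category feature is a SHA-1 digest of the category data. The attribute names (`level`, `error_type`, `source`, `location`) and the function names are hypothetical and are not specified by the present application.

```python
import hashlib
import json
from typing import Optional, Tuple

# Hypothetical attribute items; the claim does not name specific fields.
CATEGORY_ATTRIBUTES = ("error_type", "source", "location")


def extract_category(record_line: str) -> Optional[Tuple[dict, str]]:
    """Return (category_data, category_feature), or None if the record is ignored."""
    record = json.loads(record_line)  # assumed: one JSON object per log record
    # Preset condition (assumed): only ERROR-level records characterize an abnormal event.
    if record.get("level") != "ERROR":
        return None
    category_data = {key: record.get(key) for key in CATEGORY_ATTRIBUTES}
    # Category feature (assumed): digest of the canonically serialized category data.
    canonical = json.dumps(category_data, sort_keys=True)
    feature = hashlib.sha1(canonical.encode("utf-8")).hexdigest()
    return category_data, feature


def handle_record(record_line: str, log_path: str) -> Optional[Tuple[dict, str]]:
    """Classify one redirected record, then append it to the service's log file."""
    result = extract_category(record_line)
    with open(log_path, "a", encoding="utf-8") as log_file:
        log_file.write(record_line.rstrip("\n") + "\n")
    return result
```

In this sketch, records that differ only in attributes outside the category data (for example a free-text message) map to the same category feature, which is what allows their occurrences to be counted together.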

3. The service abnormality warning method of claim 2, wherein, after generating the category feature of the characterized abnormal event according to the category data, the method comprises:

the feature detector judges whether the category data appears for the first time; if so, it starts a period timer to count the occurrences of the category data within a preset period and submits an initialization instruction to the service controller, the initialization instruction carrying the category data and its category feature;

the service controller, in response to the initialization instruction, creates in a database a data record indexed by the category feature, associates the category feature with the category data in the data record, and initializes the total number of reported records corresponding to the category feature to an initial value;

in response to a timing arrival event of the period timer, the number of occurrences obtained within the timing period of the period timer and the category feature are formed into mapping relation data, the reporting event is generated, and the mapping relation data is included in the reporting event.
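
The detector-side behaviour of claim 3 might be sketched as below. This is a simplified illustration under stated assumptions: a single shared counting period is used instead of one timer per category, first-occurrence detection is keyed on the category feature (assumed to correspond one-to-one with the category data), and the timing arrival event is modelled by an explicit call to `on_timer_expired`. The class name, the message layout, and the `send` callback are hypothetical.

```python
from collections import Counter
from typing import Callable


class PeriodReporter:
    """Counts occurrences per category feature within one timing period."""

    def __init__(self, send: Callable[[dict], None]) -> None:
        self._send = send         # hypothetical transport to the service controller
        self._seen = set()        # features already announced to the controller
        self._counts = Counter()  # occurrences within the current period

    def observe(self, feature: str, category_data: dict) -> None:
        """Record one occurrence; on first occurrence, submit an initialization instruction."""
        if feature not in self._seen:
            self._seen.add(feature)
            self._send({"type": "init", "feature": feature,
                        "category_data": category_data})
        self._counts[feature] += 1

    def on_timer_expired(self) -> None:
        """Call on each timing arrival event; reports and resets the period counts."""
        if not self._counts:
            return  # nothing occurred in this period, so no reporting event is sent
        # Mapping relation data: category feature -> occurrences in this period.
        self._send({"type": "report", "counts": dict(self._counts)})
        self._counts.clear()
```

Because only per-period counts are transmitted rather than individual log records, the volume of data reaching the service controller scales with the number of categories and periods, not with the number of raw records.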

4. The service abnormality warning method according to claim 3, wherein periodically sending, by the feature detector, a reporting event carrying the category feature to the service controller, and triggering the service controller to count and update the total number of reported records corresponding to the category feature according to the number of occurrences of the category feature in the corresponding timing period, comprises:

the service controller, in response to the reporting event, parses the mapping relation data carried in the reporting event to obtain the corresponding category feature and its number of occurrences in a single timing period;

the service controller looks up the data record corresponding to the category feature in the database, adds the number of occurrences to the total number of reported records in the data record, and thereby updates the total number of reported records.
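
The controller-side accumulation of claim 4, together with the threshold check described in the abstract, could look roughly like the following sketch. An in-memory dictionary stands in for the database, the initial value is assumed to be 0, and the class name, method names, and alarm message format are hypothetical.

```python
from typing import Dict, List


class ServiceController:
    """Accumulates reported counts per category feature and raises alarms."""

    def __init__(self, threshold: int) -> None:
        self.threshold = threshold          # preset threshold (value is an assumption)
        self.records: Dict[str, dict] = {}  # in-memory stand-in for the database

    def handle_init(self, feature: str, category_data: dict) -> None:
        # Create the data record indexed by the category feature; total starts at 0.
        self.records.setdefault(feature, {"category_data": category_data, "total": 0})

    def handle_report(self, counts: Dict[str, int]) -> List[str]:
        """Add per-period counts to the totals; return alarms for totals over the threshold."""
        alarms = []
        for feature, occurrences in counts.items():
            record = self.records.get(feature)
            if record is None:
                continue  # no initialization instruction was received for this feature
            record["total"] += occurrences
            if record["total"] > self.threshold:
                alarms.append(f"category {feature}: {record['total']} reported records")
        return alarms
```

For example, with `threshold=3`, two successive reports of `{"f1": 2}` produce no alarm after the first report and one alarm after the second, once the total reaches 4.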

5. The service abnormality warning method according to claim 3, characterized in that the initialization instruction also carries running state information of the online service, and the category data comprises attribute values describing the nature, source and/or location of the abnormal event.

6. The service abnormality warning method of any one of claims 1 to 5, wherein, before the feature detector is started to listen to the local log record generated by the online service, the method comprises:

the service controller, in response to a deployment completion notification message corresponding to the online service, sends the feature detector and its identity token to the online service;

the online service configures the feature detector as a plug-in that is started synchronously in response to the start of the online service, and configures the identity token as information required for the feature detector to exchange data with the service controller.

7. The service abnormality warning method according to any one of claims 1 to 5, characterized in that, after the service controller monitors whether the total number of reported records of each category feature exceeds a preset threshold and generates warning information corresponding to the category feature when the total number of reported records exceeds the preset threshold, the method comprises:

the service controller, in response to an event checking instruction, obtains each category feature in the database together with its category data and total number of reported records, to form corresponding data samples;

the service controller clusters the data samples to obtain a plurality of clusters, and determines the sum of the total numbers of reported records of the data samples contained in each cluster as the cluster event total;

all the clusters are sorted according to the cluster event total, a mapping relation list of the clusters and their cluster event totals is generated, and the list is returned in response to the event checking instruction.
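
The clustering and ranking of claim 7 can be illustrated with the sketch below. The claim does not name a clustering algorithm, so a simple single-pass leader clustering is substituted here purely for illustration: a sample joins the first cluster whose leader agrees with it on at least `min_similarity` of the attribute items, and otherwise starts a new cluster. The similarity measure, the default of 0.5, and the sample layout are all assumptions.

```python
from typing import List, Tuple


def similarity(a: dict, b: dict) -> float:
    """Fraction of attribute items on which two category-data dicts agree."""
    keys = set(a) | set(b)
    if not keys:
        return 1.0
    # A key missing on either side counts as a mismatch.
    matches = sum(1 for k in keys if k in a and k in b and a[k] == b[k])
    return matches / len(keys)


def rank_clusters(samples: List[dict], min_similarity: float = 0.5) -> List[Tuple[int, int]]:
    """Cluster samples; return (cluster_id, cluster_event_total) sorted by total, descending."""
    leaders: List[dict] = []  # category data of the first sample in each cluster
    totals: List[int] = []    # accumulated total reported records per cluster
    for sample in samples:
        for cluster_id, leader in enumerate(leaders):
            if similarity(sample["category_data"], leader) >= min_similarity:
                totals[cluster_id] += sample["total"]
                break
        else:
            # No existing cluster is similar enough: this sample leads a new cluster.
            leaders.append(sample["category_data"])
            totals.append(sample["total"])
    return sorted(enumerate(totals), key=lambda item: item[1], reverse=True)
```

The returned list plays the role of the mapping relation list between clusters and cluster event totals; any other clustering method could be substituted without changing the summing and sorting steps.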

8. A service abnormality warning apparatus, characterized by comprising:

9. A service anomaly alerting device comprising a central processor and a memory, wherein the central processor is arranged to invoke a computer program stored in the memory to perform the steps of the method of any one of claims 1 to 7.

10. A non-transitory readable storage medium, characterized in that it stores a computer program in the form of computer readable instructions, which when invoked by a computer to run, performs the steps comprised by the method according to any one of claims 1 to 7.

11. A computer program product comprising computer programs/instructions which, when executed by a processor, perform the steps of the method of any of claims 1 to 7.