CN110519102A - Server fault identification method, device and storage medium - Google Patents

Server fault identification method, device and storage medium

Info

Publication number
CN110519102A
Authority
CN
China
Prior art keywords
server
information
failure
terminal
condition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910865239.8A
Other languages
Chinese (zh)
Other versions
CN110519102B (en)
Inventor
孙翌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Abacus Industrial Technology Co ltd
Original Assignee
Guizhou Gloud Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Gloud Technology Co Ltd
Priority to CN201910865239.8A
Publication of CN110519102A
Application granted
Publication of CN110519102B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/06 Management of faults, events, alarms or notifications
    • H04L41/0677 Localisation of faults

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a server fault identification method, device and storage medium. The server serves a predetermined project; multiple servers serving the predetermined project are distributed across multiple regions, and a terminal that logs in to the predetermined project accesses a server serving that project. The method comprises the following steps: obtaining abnormality information that occurs when a terminal accesses a server; determining, according to the abnormality information, the time node at which the terminal exhibiting the abnormality accessed the server and the information collection interval corresponding to that time node; obtaining the trend information and login duration information of the information collection interval; and judging, according to the trend information and the login duration information, whether the server has failed. The faulty server and its location are thereby identified quickly and accurately, which solves the prior-art problem that identifying and locating a faulty server is time-consuming and insufficiently accurate.

Description

Server fault identification method, device and storage medium
Technical field
The present invention relates to the technical field of fault identification, and in particular to a server fault identification method, device and storage medium.
Background
Traditional server fault identification relies mainly on periodic inspection by staff and on feedback from the game players who use the server. Although fault information can be obtained this way, too many faults go unreported and reports arrive with serious delay, so a faulty server cannot be located quickly and accurately, and fault-handling personnel are often left at a loss. Such methods depend too heavily on people, do not allow data mining on faults, and leave the troubleshooting target unclear.
Existing server back-end systems can collect, for every user (i.e. every game player who enters a game through a terminal), the duration of each access to the server (the actual online play time), together with information on terminals whose service was abnormal. The abnormality information and the access durations reflect how users are using the server, but the presence of terminals with abnormal service does not necessarily mean that the server or the service has failed. Fault-handling personnel must check the back-end data item by item and diagnose manually in order to locate the faulty server, enter it into the system, and then dispatch a maintenance engineer to repair it.
As a result, the prior-art process of identifying and locating a faulty server is time-consuming and insufficiently accurate.
Summary of the invention
To solve the above technical problem, the present invention provides a server fault identification method, device and storage medium.
In the server fault identification method provided by the invention, the server serves a predetermined project, multiple servers serving the predetermined project are distributed across multiple regions, and a terminal that logs in to the predetermined project accesses a server serving the predetermined project. The method comprises the following steps:
obtaining abnormality information that occurs when a terminal accesses a server;
determining, according to the abnormality information, the time node at which the terminal exhibiting the abnormality accessed the server, and the information collection interval corresponding to the time node;
obtaining the trend information and login duration information of the information collection interval;
judging, according to the trend information and the login duration information, whether the server has failed.
The above method also has the following feature: the method further comprises the following steps:
obtaining the total time information and total quantity information of all terminals accessing the region to which the server belongs;
judging, according to the total time information and the total quantity information, whether the server has a fault.
The above method also has the following feature: judging, according to the trend information and the login duration information, whether the server has failed comprises:
judging, according to the trend information and the login duration information and using a pre-stored first decision condition, whether the predetermined project served by the server has a first-type fault;
and/or
judging, according to the total time information and the total quantity information, whether the server has a fault comprises:
judging, according to the total time information and the total quantity information and using a pre-stored second decision condition, whether the predetermined project served by the region to which the server belongs has a second-type fault.
The above method also has the following feature: judging, according to the trend information and the login duration information and using the pre-stored first decision condition, whether the predetermined project served by the server has a first-type fault comprises:
when any one of the following conditions is met, determining that the predetermined project served by the server has a fault:
Condition one: the trend information is less than a first preset value;
Condition two: the trend information is equal to a second preset value;
Condition three: the login duration information is less than a third preset value;
and/or
judging, according to the total time information and the total quantity information and using the pre-stored second decision condition, whether the predetermined project served by the region to which the server belongs has a second-type fault comprises:
when all of the following conditions are met, determining that the predetermined project served by the region to which the server belongs has a fault:
Condition four: the total time information is less than a first duration, the first duration being the lower quartile of the average login durations with which terminals accessing the whole region log in to the predetermined project;
Condition five: the total time information is less than a second duration;
Condition six: the total quantity information is greater than a fourth preset value.
The above method also has the following feature: the method further comprises:
obtaining the number of times terminals access the server;
determining, according to the number of times terminals access the server, the variation ratio of the terminals' access counts;
when conditions four, five and six are all met, judging whether the variation ratio is less than a preset variation ratio, and if so, determining that the predetermined project served by the region to which the server belongs has no fault.
The above method also has the following feature: the method further comprises:
when all of the following conditions are met, determining that the server has a fault:
the region to which the server belongs is judged to have a second-type fault, and the number of terminals accessing the server is greater than the number of terminals accessing any other server in that region;
the average login duration of the terminals accessing the server is less than the average login duration of the terminals accessing any other server in the region to which the server belongs;
more than a predetermined proportion of all terminals accessing the region access the server.
The above method also has the following feature: the method of determining the information collection interval comprises:
taking the time node as the centre, selecting on the time dimension n consecutive data access points upstream and n consecutive data access points downstream of the time node, giving 2n+1 nodes in total as information collection points, wherein a data access point is a time node at which another terminal accesses the server.
The above method also has the following feature: the trend information comprises the slope of the line through the 2n+1 nodes;
the login duration information comprises the total time spent in the predetermined project after the terminal accesses the server.
The present invention also provides a server fault identification device. The server serves a predetermined project, multiple servers serving the predetermined project are distributed across multiple regions, and a terminal that logs in to the predetermined project accesses a server serving the predetermined project. The identification device comprises:
an abnormality information obtaining module, configured to obtain abnormality information that occurs when a terminal accesses a server;
an interval determining module, configured to determine, according to the abnormality information, the time node at which the terminal exhibiting the abnormality accessed the server, and the information collection interval corresponding to the time node;
a decision parameter obtaining module, configured to obtain the trend information and login duration information of the information collection interval;
a first judgment module, configured to judge, according to the trend information and the login duration information, whether the server has failed.
The above device also has the following feature: the device further comprises:
a region information obtaining module, configured to obtain the total time information and total quantity information of all terminals accessing the region to which the server belongs;
a second judgment module, configured to judge, according to the total time information and the total quantity information, whether the server has a fault.
The above device also has the following feature: the first judgment module is configured to judge, according to the trend information and the login duration information and using a pre-stored first decision condition, whether the predetermined project served by the server has a first-type fault;
and/or
the second judgment module is configured to judge, according to the total time information and the total quantity information and using a pre-stored second decision condition, whether the predetermined project served by the region to which the server belongs has a second-type fault.
The above device also has the following feature: the first judgment module is configured to perform the following judgment:
when any one of the following conditions is met, determining that the predetermined project served by the server has a first-type fault:
Condition one: the trend information is less than a first preset value;
Condition two: the trend information is equal to a second preset value;
Condition three: the login duration information is less than a third preset value;
and/or
the second judgment module is configured to perform the following judgment:
when all of the following conditions are met, determining that the predetermined project served by the region to which the server belongs has a second-type fault:
Condition four: the total time information is less than a first duration, the first duration being the lower quartile of the average login durations with which terminals accessing the whole region log in to the predetermined project;
Condition five: the total time information is less than a second duration;
Condition six: the total quantity information is greater than a fourth preset value.
The above device also has the following feature: the device further comprises:
an access count obtaining module, configured to obtain the number of times terminals access the server;
a variation ratio determining module, configured to determine, according to the number of times terminals access the server, the variation ratio of the terminals' access counts;
a third judgment module, configured to judge, when conditions four, five and six are all met, whether the variation ratio is less than a preset variation ratio, and if so, to determine that the predetermined project served by the region to which the server belongs has no fault.
The above device also has the following feature: the device further comprises:
a fourth judgment unit, configured to perform the following judgment:
when all of the following conditions are met, determining that the server has a fault:
the region to which the server belongs is judged to have a second-type fault, and the number of terminals accessing the server is greater than the number of terminals accessing any other server in that region;
the average login duration of the terminals accessing the server is less than the average login duration of the terminals accessing any other server in the region to which the server belongs;
more than a predetermined proportion of all terminals accessing the region access the server.
The above device also has the following feature: the region information obtaining module comprises:
a region information determining unit, configured to take the time node as the centre and select, on the time dimension, n consecutive data access points upstream and n consecutive data access points downstream of the time node, giving 2n+1 nodes in total as information collection points, wherein a data access point is a time node at which another terminal accesses the server.
The present invention also provides a computer-readable storage medium on which a computer program is stored, the computer program implementing the server fault identification method described above when executed by a processor.
The server fault identification method and device of the present invention use information that is already easy to obtain in the prior art, such as terminal login durations and login time nodes, to quickly judge the server accessed by a terminal exhibiting an abnormality. The faulty server and its location are thereby identified quickly and accurately, which solves the prior-art problem that identifying and locating a faulty server is time-consuming and insufficiently accurate.
Brief description of the drawings
The accompanying drawings, which form part of the invention, provide a further understanding of the invention. The illustrative embodiments of the invention and their descriptions serve to explain the invention and do not unduly limit it. In the drawings:
Fig. 1 is the first flowchart of the server fault identification method in an embodiment;
Fig. 2 is the second flowchart of the server fault identification method in an embodiment;
Fig. 3 is a structural diagram of the server fault identification device in an embodiment.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention. It should be noted that, where no conflict arises, the embodiments of the present application and the features in them may be combined with one another in any way.
The present application provides a server fault identification method that uses information obtainable in the prior art, such as terminal login durations and login time nodes, to quickly judge the server accessed by a terminal exhibiting an abnormality, so that the faulty server and its location are identified quickly and accurately.
Operating a game or a large nationwide project requires servers to support and safeguard its operation. To ensure that game players or users throughout China can connect to the game or project quickly, servers can be deployed in multiple regions of the country, with multiple servers in each region. Each server can serve multiple games or projects; that is, multiple games or projects can run on one server at the same time.
As shown in Fig. 1, a server fault identification method comprises the following steps:
S10: obtaining abnormality information that occurs when a terminal accesses a server;
S20: determining, according to the abnormality information, the time node at which the terminal exhibiting the abnormality accessed the server, and the information collection interval corresponding to the time node;
S30: obtaining the trend information and login duration information of the information collection interval;
S40: judging, according to the trend information and the login duration information, whether the server has failed.
In S10, the abnormality information is issued by the terminal. For example, crashes, stuttering, game start failure, or being unable to enter the game through the App while logging in to the game all constitute abnormality information. The causes of a game start failure are very complex, but the abnormality information that leads to a game start failure is reported by the server. Because the present application cannot cover every possible problem, less severe abnormalities that occur when a terminal accesses a server, such as stuttering, crashes, a player reconnecting after a disconnection, failure to create a decoding channel, or insufficient server GPU resources, are generally ignored: when such minor abnormalities occur, the probability that the server has failed is very small, so no fault identification is performed on the server. Fault identification is performed only when a terminal experiences a more severe abnormality while accessing the server, since such an abnormality may have been caused by a server fault. Accordingly, the abnormality information referred to in S10 is the more severe abnormality information that occurs when a terminal accesses the server. It should be noted that the abnormality information is obtained using existing prior-art methods.
The above steps mainly judge whether the predetermined project (a game or a large project) running on the server has a problem. To further ensure that the predetermined project runs normally and stably, the method also includes steps for judging whether the predetermined project running in the region to which the server belongs, i.e. the server on which a terminal experienced the abnormality when logging in, has a problem. As shown in Fig. 2, these steps are:
S50: obtaining the total time information and total quantity information of all terminals accessing the region to which the server belongs;
S60: judging, according to the total time information and the total quantity information, whether the server has a fault.
The above method involves an information collection interval, which can also be regarded as the window for data collection. Data are collected from the nodes in the information collection interval to obtain the trend information and login duration information of S30. The information collection interval is determined as follows: taking the time node as the centre, n consecutive data access points upstream and n consecutive data access points downstream of the time node are selected on the time dimension, giving 2n+1 nodes in total as information collection points, where a data access point is a time node at which another terminal accesses the server. The value of n is chosen according to the actual situation, for example according to the performance of the server: when server performance is good, relatively few nodes can be selected; when server performance is poor, more nodes are selected in order to obtain more accurate data. Preferably, n is between 5 and 20. It should be noted that the n nodes upstream of the time node are the time nodes at which terminals accessed the server before the terminal exhibiting the abnormality did; they may be time nodes of normal logins or time nodes at which abnormality information occurred. The n nodes downstream of the time node are the time nodes at which terminals accessed the server after the terminal exhibiting the abnormality did. Both the n upstream nodes and the n downstream nodes are consecutive in time, which ensures the reliability of the trend information obtained from the information collection interval.
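The window selection described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `collection_window`, the use of plain numeric timestamps, and the choice to raise an error when fewer than n access points exist on either side are all assumptions made for the example.

```python
from bisect import bisect_left
from typing import List, Sequence


def collection_window(access_times: Sequence[float],
                      anomaly_time: float,
                      n: int) -> List[float]:
    """Return the 2n+1 access timestamps centred on the anomalous access.

    access_times must be sorted in ascending order and contain anomaly_time.
    (Hypothetical helper; the source does not prescribe a data layout.)
    """
    i = bisect_left(access_times, anomaly_time)
    if i == len(access_times) or access_times[i] != anomaly_time:
        raise ValueError("anomaly_time is not one of the access times")
    if i < n or i + n >= len(access_times):
        raise ValueError("fewer than n access points on one side")
    # n consecutive upstream nodes, the anomalous node, n consecutive downstream nodes
    return list(access_times[i - n:i + n + 1])


if __name__ == "__main__":
    times = [1, 2, 3, 4, 5, 6, 7]
    print(collection_window(times, anomaly_time=4, n=2))
```

Taking a contiguous slice of the sorted access times is what keeps the upstream and downstream nodes consecutive, as the description requires.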
In a specific embodiment, the trend information obtained in S30 from the information collection interval is the slope of the line through the 2n+1 nodes. That is, with time as the abscissa and the duration spent on each access to the server as the ordinate, connecting the 2n+1 nodes in chronological order yields approximately a straight line, with the 2n+1 nodes distributed on either side of it. The slope of this line is taken as the trend information of the information collection interval.
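One way to obtain such a slope is an ordinary least-squares fit through the 2n+1 points. The source only says the nodes lie on either side of an approximately straight line, so the least-squares choice and the name `trend_slope` are assumptions of this sketch.

```python
from typing import Sequence


def trend_slope(times: Sequence[float], durations: Sequence[float]) -> float:
    """Least-squares slope of login duration against access time.

    times: access time of each node in the collection interval (abscissa).
    durations: duration spent on each access (ordinate).
    """
    if len(times) != len(durations) or len(times) < 2:
        raise ValueError("need at least two (time, duration) pairs")
    mean_t = sum(times) / len(times)
    mean_d = sum(durations) / len(durations)
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all access times are identical")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, durations))
    return sxy / sxx


if __name__ == "__main__":
    # Durations shrinking by 2 minutes per time unit give a slope of about -2.
    print(trend_slope([0, 1, 2, 3, 4], [30, 28, 26, 24, 22]))
```

A negative slope means later accesses in the window lasted for shorter times than earlier ones, which is the pattern condition one in the method looks for.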
In S30, the login duration information comprises the total time spent in the predetermined project, for example a predetermined game, after the terminal accesses the server. For example, if a terminal takes 30 minutes in total from launching the App to exiting the game, the login duration information of that terminal is 30 minutes.
In S50, the total time information is determined as follows: determine the region to which the server belongs, obtain the number of servers in the region, count the terminals accessing each server, and measure for each terminal the online duration from the moment it accesses the server to the moment it disconnects from the server. The online durations of all terminals on all servers in the region are summed and averaged, and the average is taken as the total time information. That is, the total time information is the average time for which the players corresponding to the terminals accessing all servers in a given region run the game.
In S50, the total quantity information is determined as follows: determine the region to which the server belongs, obtain the number of servers in the region, count the terminals accessing each server, and sum the numbers of terminals accessing all servers in the region; the sum is taken as the total quantity information. That is, the total quantity information is the total sample size of players running a given game on the servers in a given region.
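The two regional aggregates of S50 can be sketched as below. The dictionary layout (server id mapped to a list of per-terminal online durations) and the name `region_totals` are assumptions chosen for the example.

```python
from typing import Dict, List, Tuple


def region_totals(sessions_by_server: Dict[str, List[float]]) -> Tuple[float, int]:
    """Return (total_time, total_quantity) for one region.

    sessions_by_server maps a server id to the online durations (in minutes)
    of the terminals that accessed that server.
    total_time is the mean online duration over all terminals in the region;
    total_quantity is the number of terminals in the region.
    """
    all_durations = [d for durations in sessions_by_server.values() for d in durations]
    if not all_durations:
        raise ValueError("no terminal sessions in this region")
    total_quantity = len(all_durations)
    total_time = sum(all_durations) / total_quantity
    return total_time, total_quantity


if __name__ == "__main__":
    print(region_totals({"server-a": [10, 20], "server-b": [30]}))
```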
In a preferred embodiment, judging, according to the trend information and the login duration information, whether the server has failed comprises, in a specific implementation:
judging, according to the trend information and the login duration information and using a pre-stored first decision condition, whether the predetermined project served by the server has a first-type fault.
Specifically, judging, according to the trend information and the login duration information and using the pre-stored first decision condition, whether the predetermined project served by the server has a first-type fault comprises:
when any one of the following conditions is met, determining that the predetermined project served by the server has a fault:
Condition one: the trend information is less than a first preset value;
Condition two: the trend information is equal to a second preset value;
Condition three: the login duration information is less than a third preset value.
Here the second preset value is 0; that is, condition two means that the login duration of every node in the whole information collection interval is the same. Because terminal performance and the network environment of each terminal both cause login durations to differ, a trend value of exactly 0 indicates that a problem certainly exists. The first and third preset values are determined by a commonly used classification prediction model, such as a CART decision tree, when the fault identification model is built (described in detail below).
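The first decision condition is a logical OR over conditions one to three, which can be written as a short predicate. The threshold values in the usage lines are made up for illustration; in the method they come from the trained fault identification model. The floating-point tolerance used for the comparison with the second preset value is also an assumption of this sketch.

```python
import math


def has_first_type_fault(slope: float,
                         login_duration: float,
                         first_preset: float,
                         third_preset: float,
                         second_preset: float = 0.0) -> bool:
    """True if any of conditions one to three holds."""
    cond_one = slope < first_preset
    # Condition two: slope equals the second preset value (0 in the description);
    # a small tolerance is used because the slope is a float.
    cond_two = math.isclose(slope, second_preset, abs_tol=1e-9)
    cond_three = login_duration < third_preset
    return cond_one or cond_two or cond_three


if __name__ == "__main__":
    # Hypothetical thresholds: first preset -1.0, third preset 5 minutes.
    print(has_first_type_fault(-2.0, 30, first_preset=-1.0, third_preset=5))
    print(has_first_type_fault(0.5, 30, first_preset=-1.0, third_preset=5))
```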
In another preferred embodiment, judging, according to the total time information and the total quantity information, whether the server has a fault comprises:
judging, according to the total time information and the total quantity information and using a pre-stored second decision condition, whether the predetermined project served by the region to which the server belongs has a second-type fault.
Specifically, judging, according to the total time information and the total quantity information and using the pre-stored second decision condition, whether the predetermined project served by the region to which the server belongs has a second-type fault comprises:
when all of the following conditions are met, determining that the predetermined project served by the region to which the server belongs has a fault:
Condition four: the total time information is less than a first duration, the first duration being the lower quartile of the average login durations with which terminals accessing the whole region log in to the predetermined project;
Condition five: the total time information is less than a second duration;
Condition six: the total quantity information is greater than a fourth preset value.
The first duration is determined statistically: the average login durations with which terminals accessing the whole region log in to the predetermined project are sorted from longest to shortest, and if the total time information falls in the last quarter of this ordering, condition four is considered met. The second duration and the fourth preset value are likewise determined by a commonly used classification prediction model, such as a CART decision tree, when the server fault identification model is built.
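The second decision condition is a logical AND over conditions four to six. The sketch below computes the first duration with `statistics.quantiles` from the Python standard library; which quartile convention the patent intends is not stated, so the library default is an assumption, as are the function names and the sample thresholds.

```python
from statistics import quantiles
from typing import Sequence


def lower_quartile(avg_login_durations: Sequence[float]) -> float:
    """First quartile (Q1) of the average login durations."""
    return quantiles(avg_login_durations, n=4)[0]


def has_second_type_fault(total_time: float,
                          total_quantity: int,
                          first_duration: float,
                          second_duration: float,
                          fourth_preset: int) -> bool:
    """True only if conditions four, five and six all hold."""
    cond_four = total_time < first_duration
    cond_five = total_time < second_duration
    cond_six = total_quantity > fourth_preset
    return cond_four and cond_five and cond_six


if __name__ == "__main__":
    q1 = lower_quartile([10, 20, 30, 40, 50, 60, 70])
    # Hypothetical thresholds: second duration 18 minutes, fourth preset 100 terminals.
    print(q1, has_second_type_fault(15, 500, q1, second_duration=18, fourth_preset=100))
```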
It should be noted that when the network environment of a terminal is poor, the terminal may keep retrying the connection indefinitely, which drags down the value of the total time information and leads to misjudgment, so that a fault-free situation is classified as faulty. To solve this problem and avoid misjudging second-type faults, the method of the present invention further comprises:
obtaining the number of times terminals access the server;
determining, according to the number of times terminals access the server, the variation ratio of the terminals' access counts;
when conditions four, five and six are all met, judging whether the variation ratio is less than a preset variation ratio, and if so, determining that the predetermined project served by the region to which the server belongs has no fault.
The preset variation ratio is set according to the network conditions of the region in which the server is located, and is preferably 70%. That is, when the accesses by a single terminal account for more than 30% of the total accesses by all terminals, the situation is considered not to be a second-type fault; the server is judged to be operating normally and the problem is attributed to the terminal.
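The retry check can be sketched with the usual statistical definition of the variation ratio, one minus the share of the most frequent category, where each terminal is a category and its access count is the frequency. That reading of the term, and the names `variation_ratio` and `confirm_region_fault`, are assumptions of this example.

```python
from typing import Dict


def variation_ratio(access_counts: Dict[str, int]) -> float:
    """1 - (accesses by the most active terminal / accesses by all terminals)."""
    total = sum(access_counts.values())
    if total <= 0:
        raise ValueError("no accesses recorded")
    return 1.0 - max(access_counts.values()) / total


def confirm_region_fault(meets_conditions_4_to_6: bool,
                         access_counts: Dict[str, int],
                         preset_ratio: float = 0.70) -> bool:
    """Keep the second-type fault verdict only if no single terminal dominates."""
    if not meets_conditions_4_to_6:
        return False
    # A ratio below the preset means one terminal is retrying heavily,
    # so the region is judged fault-free.
    return variation_ratio(access_counts) >= preset_ratio


if __name__ == "__main__":
    retry_storm = {"t1": 80, "t2": 10, "t3": 10}
    print(variation_ratio(retry_storm), confirm_region_fault(True, retry_storm))
```

With the preferred 70% threshold, a terminal responsible for 80% of all accesses gives a ratio of about 0.2, so the regional verdict is withdrawn.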
Further, to improve the accuracy of the judgment and avoid misjudgments, the fault identification method of the present invention further includes:
When all of the following conditions are met, determining that the server has a failure:
The region to which the server belongs is judged to have the second-class failure, and the number of terminals accessing the server is greater than the number of terminals accessing any other server in that region;
The average log duration of the terminals accessing the server is less than the average log duration of the terminals accessing any other server in that region;
More than a predetermined proportion of all the terminals accessing the region access this server.
If one and only one server meets all three of the above conditions, that server is determined to have failed, rather than the predetermined item in the whole region having a problem.
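The three conditions can be illustrated with a short sketch. The `ServerStats` record, the function name and the way the predetermined proportion is passed in are assumptions made for the example; the text does not specify how these statistics are stored.

```python
from dataclasses import dataclass


@dataclass
class ServerStats:
    # Hypothetical per-server summary for one region.
    name: str
    terminal_count: int
    avg_log_duration: float


def find_single_faulty_server(servers, region_has_second_class_failure,
                              predetermined_ratio):
    """Return the one server meeting all three conditions, else None."""
    if not region_has_second_class_failure:
        return None
    total = sum(s.terminal_count for s in servers)
    if total == 0:
        return None
    matches = []
    for s in servers:
        others = [o for o in servers if o is not s]
        # Condition 1: more terminals than every other server in the region.
        most_terminals = all(s.terminal_count > o.terminal_count for o in others)
        # Condition 2: shorter average log duration than every other server.
        shortest = all(s.avg_log_duration < o.avg_log_duration for o in others)
        # Condition 3: its share of the region's terminals exceeds the ratio.
        concentrated = s.terminal_count / total > predetermined_ratio
        if most_terminals and shortest and concentrated:
            matches.append(s)
    return matches[0] if len(matches) == 1 else None
```

If no server satisfies all three conditions, the sketch returns `None`, which corresponds to treating the problem as a region-wide one.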
The server fault identification method of the present invention is deployed on a Linux system. When the above method determines that a failure exists, whether a first-class failure, a second-class failure or a single-server failure, at least one of the following measures is taken:
1. Record the fault information and push it to the back end of the Linux system, where operators can see the server's fault information, so as to achieve coordinated management;
2. Shut down the services of the servers involved in the failure;
3. Send the server's fault information to maintenance personnel by e-mail, SMS or WeChat for handling;
4. After the failed server has been repaired, change the server's state to the normal online state in the back end of the Linux system, so that the server can be used normally.
Before the fault identification method of the present application is run, a fault identification model needs to be established. While establishing the model, servers with failures are sampled to construct sample data, from which the various thresholds used during fault identification are obtained. When sampling failures, all server fault information within a sufficiently long period is collected, so that the sampled data is large enough to reflect the fault information clearly and completely. For example, all information from 0:00 to 24:00 of the current day is collected, including fault information and non-fault information. The fault information is then processed, for example by removing unnecessary noise and transforming the data, to obtain the values of the judgment conditions at each fault point. For example, the first judgment condition involves a first preset value used to judge the variation tendency information and a third preset value used to judge the log duration information. To obtain the first preset value and the third preset value, the fault data must therefore be processed. During this processing, the fault variation tendency information obtained from the fault information collection section is the slope of the line through the 2n+1 nodes, and the slope of this line is taken as the variation tendency information of the fault information collection section.
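As an illustration of the slope computation, the sketch below fits a least-squares line through the 2n+1 nodes and returns its slope. The text only says "the slope of the line through the 2n+1 nodes"; the use of a least-squares fit, the choice of the per-node value and the function name are assumptions made for the example.

```python
def trend_slope(values, times=None):
    """Least-squares slope of a line fitted through 2n+1 node values.

    values: one measurement per node (for example a log duration).
    times:  optional x coordinates; node indices 0..2n are used if omitted.
    """
    m = len(values)
    if m < 3 or m % 2 == 0:
        raise ValueError("expected 2n+1 values with n >= 1")
    xs = list(times) if times is not None else list(range(m))
    if len(xs) != m:
        raise ValueError("times and values must have the same length")
    mean_x = sum(xs) / m
    mean_y = sum(values) / m
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x coordinates are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, values))
    return sxy / sxx
```

A steadily falling series such as 10, 8, 6, 4, 2 gives a slope of -2, which would then be compared with the first preset value.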
The fault log duration information includes the total time a failed terminal spends in the predetermined item, for example a predetermined game, after accessing the server. For example, if a failed terminal takes 5 minutes in total from launching the app to exiting the game, the fault log duration information of that terminal is 5 minutes.
Using a CART decision tree, the variation tendency information, the log duration information and so on are fed into the decision tree as model input dimensions (the independent variables), and whether the server is a failed server is the dependent variable. After parameter tuning, the decision tree automatically computes the first preset value, the third preset value and the other values used in the first judgment condition. Likewise, every threshold used for judgment in this method can be determined by sampling and a CART decision tree. The decision conditions also contain values that are not fixed, such as the lower quartile. This is because the later judgments need a relative value, and the threshold determined by the earlier classification model was close to a relative statistic of the sample set at that time.
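To show how a CART-style split yields a threshold, the sketch below searches a single feature for the cut point that minimises the weighted Gini impurity, which is the criterion CART uses for classification splits. It is a one-split simplification for illustration, not the tuned decision tree of the text; the function names and the sample data are assumptions.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    p = sum(labels) / n
    return 2 * p * (1 - p)


def best_threshold(values, labels):
    """Cut point on one feature that minimises weighted Gini impurity.

    values: feature values, e.g. log durations in minutes.
    labels: 1 if the sample came from a failed server, else 0.
    Returns None if all feature values are identical.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best_score, best_cut = float("inf"), None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut point between equal values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best_score:
            best_score, best_cut = score, cut
    return best_cut
```

For durations 2, 3, 4, 5 labelled as failures and 20, 25, 30, 40 labelled as normal, the sketch returns 12.5, which would play the role of the third preset value.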
In addition, before the server fault identification model of the present invention goes online to identify server failures, it must be tested. During testing, the precision (PPV) is calculated from a confusion matrix: Precision = TP / (TP + FP), that is, the proportion of correct predictions among all results the model predicts as positive. Testing shows that the PPV of the server fault identification method in the present application is greater than 90%.
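The precision formula can be checked with a small sketch. The label encoding (1 for a failed server, 0 for a normal one) and the function names are assumptions made for the example.

```python
def confusion_counts(actual, predicted):
    """Return (TP, FP, FN, TN) for 0/1 labels, where 1 means 'failed'."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == 1 and a == 1:
            tp += 1
        elif p == 1 and a == 0:
            fp += 1
        elif p == 0 and a == 1:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def precision(tp, fp):
    """Precision (PPV) = TP / (TP + FP)."""
    if tp + fp == 0:
        raise ValueError("the model made no positive predictions")
    return tp / (tp + fp)
```

With three true positives and one false positive, the precision is 3 / 4 = 0.75.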
The present application also provides a server fault identification device. The servers are used to serve a predetermined item; multiple servers serving the predetermined item are distributed across multiple regions, and a terminal accesses a server serving the predetermined item when it logs in to the predetermined item. As shown in FIG. 3, the identification device includes:
An exception information obtaining module, configured to obtain the exception information that occurs when a terminal accesses the server;
A section determining module, configured to determine, according to the exception information, the timing node at which the terminal with the exception information accessed the server and the information collection section corresponding to that timing node;
A judgment parameter obtaining module, configured to obtain the variation tendency information and the log duration information in the information collection section;
A first judging module, configured to judge, according to the variation tendency information and the log duration information, whether the server has failed.
Further, the device also includes:
An area information obtaining module, configured to obtain the total time information and the total quantity information of all terminals accessing the region to which the server belongs;
A second judging module, configured to judge, according to the total time information and the total quantity information, whether the server has a failure.
Wherein the area information obtaining module includes:
An area information determination unit, configured to take the timing node as the center and, along the time dimension, select the n adjacent data access points upstream and the n adjacent data access points downstream of the timing node, for a total of 2n+1 nodes as information collection points, wherein a data access point is a timing node at which another terminal accesses the server.
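The selection of the 2n+1 collection points can be sketched as follows. The representation of access times as plain numbers, the exclusion of access points that coincide exactly with the central timing node and the error raised when fewer than n neighbours exist are assumptions made for the example.

```python
import bisect


def collection_window(center_time, other_access_times, n):
    """Return the 2n+1 collection points centred on center_time.

    center_time:        timing node of the terminal with the exception.
    other_access_times: access times of the remaining terminals.
    """
    times = sorted(other_access_times)
    left = bisect.bisect_left(times, center_time)
    right = bisect.bisect_right(times, center_time)
    upstream = times[max(0, left - n):left]    # n nearest earlier points
    downstream = times[right:right + n]        # n nearest later points
    if len(upstream) < n or len(downstream) < n:
        raise ValueError("not enough neighbouring access points")
    return upstream + [center_time] + downstream
```

For a central node at time 100 with n = 2 and other accesses at 70, 80, 90, 95, 105, 110 and 130, the window is 90, 95, 100, 105, 110.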
Further, the first judging module is configured to judge, according to the variation tendency information and the log duration information and using a prestored first decision condition, whether the predetermined item served by the server has a first-class failure;
And/or
The second judging module is configured to judge, according to the total time information and the total quantity information and using a prestored second decision condition, whether the predetermined item served by the region to which the server belongs has a second-class failure.
Further, the first judging module is configured to perform the following judgment:
When any one of the following conditions is met, determining that the predetermined item served by the server has a failure:
Condition one: the variation tendency information is less than the first preset value;
Condition two: the variation tendency information is equal to the second preset value;
Condition three: the log duration information is less than the third preset value;
And/or
The second judging module is configured to perform the following judgment:
When all of the following conditions are met, determining that the predetermined item served by the region to which the server belongs has a failure:
Condition four: the total time information is less than a first duration, the first duration being the lower quartile of the average log duration of the terminals accessing the whole region when logging in to the predetermined item;
Condition five: the total time information is less than a second duration;
Condition six: the total quantity information is greater than the fourth preset value.
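The two judgments differ in how they combine their conditions: the first uses a logical OR over conditions one to three, the second a logical AND over conditions four to six. The sketch below illustrates this; all function and parameter names are assumptions, and the lower quartile is computed with Python's `statistics.quantiles`, whose default interpolation method may differ from the one used by the authors.

```python
import statistics


def first_class_failure(trend, log_duration,
                        first_preset, second_preset, third_preset):
    """Any one of conditions one to three is enough."""
    return (trend < first_preset            # condition one
            or trend == second_preset       # condition two
            or log_duration < third_preset)  # condition three


def lower_quartile(avg_log_durations):
    """First duration: lower quartile of the region's average log durations."""
    return statistics.quantiles(avg_log_durations, n=4)[0]


def second_class_failure(total_time, total_quantity,
                         first_duration, second_duration, fourth_preset):
    """Conditions four, five and six must all hold."""
    return (total_time < first_duration           # condition four
            and total_time < second_duration      # condition five
            and total_quantity > fourth_preset)   # condition six
```

A region is flagged only when its total time falls below both durations while the number of accessing terminals is above the fourth preset value.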
Further, the device also includes:
An access count obtaining module, configured to obtain the number of times the terminals access the server;
A variation ratio determining module, configured to determine, according to the number of times the terminals access the server, the variation ratio of the terminals' access counts;
A third judging module, configured to judge, when condition four, condition five and condition six are met at the same time, whether the variation ratio is less than a preset variation ratio, and if so, to determine that the predetermined item served by the region to which the server belongs has no failure.
Further, the device also includes:
A fourth judging unit, configured to perform the following judgment:
When all of the following conditions are met, determining that the server has a failure:
The region to which the server belongs is judged to have the second-class failure, and the number of terminals accessing the server is greater than the number of terminals accessing any other server in that region;
The average log duration of the terminals accessing the server is less than the average log duration of the terminals accessing any other server in that region;
More than a predetermined proportion of all the terminals accessing the region access this server.
Since the server fault identification device of the present application is used to implement the above server fault identification method, its functions and effects are not described again here.
In addition, the present application also provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the above server fault identification method.
The present application samples server failures to build an expert sample set and, from that sample set, determines each threshold used for judgment through a classification prediction model, so that server failures can be determined more accurately and quickly.
The above descriptions can be implemented individually or combined in various ways, and all such variants fall within the protection scope of the present invention.
Those of ordinary skill in the art will understand that all or part of the steps of the above method can be completed by a program instructing the relevant hardware, and the program can be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk or an optical disc. Optionally, all or part of the steps of the above embodiments can also be implemented with one or more integrated circuits; accordingly, each module/unit in the above embodiments can be implemented in hardware or as a software function module. The present invention is not limited to any particular combination of hardware and software.
It should be noted that, in this document, the terms "include", "comprise" or any other variant thereof are intended to cover non-exclusive inclusion, so that an article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to the article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the article or device that includes the element.
The above embodiments are only intended to illustrate the technical solution of the present invention and not to limit it; the present invention has been described in detail only with reference to preferred embodiments. Those of ordinary skill in the art should understand that modifications or equivalent replacements may be made to the technical solution of the present invention without departing from its spirit and scope, and all such modifications shall fall within the scope of the claims of the present invention.

Claims (16)

1. A server fault identification method, wherein the servers are used to serve a predetermined item, multiple servers serving the predetermined item are distributed across multiple regions, and a terminal accesses a server serving the predetermined item when logging in to the predetermined item, characterized in that the method comprises the following steps:
Obtaining the exception information that occurs when a terminal accesses the server;
Determining, according to the exception information, the timing node at which the terminal with the exception information accessed the server, and the information collection section corresponding to the timing node;
Obtaining the variation tendency information and the log duration information in the information collection section;
Judging, according to the variation tendency information and the log duration information, whether the server has failed.
2. The server fault identification method according to claim 1, characterized in that the method further comprises the following steps:
Obtaining the total time information and the total quantity information of all terminals accessing the region to which the server belongs;
Judging, according to the total time information and the total quantity information, whether the server has a failure.
3. The server fault identification method according to claim 2, characterized in that judging, according to the variation tendency information and the log duration information, whether the server has failed comprises:
Judging, according to the variation tendency information and the log duration information and using a prestored first decision condition, whether the predetermined item served by the server has a first-class failure;
And/or
Judging, according to the total time information and the total quantity information, whether the server has a failure comprises:
Judging, according to the total time information and the total quantity information and using a prestored second decision condition, whether the predetermined item served by the region to which the server belongs has a second-class failure.
4. The server fault identification method according to claim 3, characterized in that judging, according to the variation tendency information and the log duration information and using the prestored first decision condition, whether the predetermined item served by the server has a first-class failure comprises:
When any one of the following conditions is met, determining that the predetermined item served by the server has a failure:
Condition one: the variation tendency information is less than a first preset value;
Condition two: the variation tendency information is equal to a second preset value;
Condition three: the log duration information is less than a third preset value;
And/or
Judging, according to the total time information and the total quantity information and using the prestored second decision condition, whether the predetermined item served by the region to which the server belongs has a failure comprises:
When all of the following conditions are met, determining that the predetermined item served by the region to which the server belongs has a second-class failure:
Condition four: the total time information is less than a first duration, the first duration being the lower quartile of the average log duration of the terminals accessing the whole region when logging in to the predetermined item;
Condition five: the total time information is less than a second duration;
Condition six: the total quantity information is greater than a fourth preset value.
5. The server fault identification method according to claim 4, characterized in that the method further comprises:
Obtaining the number of times the terminals access the server;
Determining, according to the number of times the terminals access the server, the variation ratio of the terminals' access counts;
When condition four, condition five and condition six are met at the same time, judging whether the variation ratio is less than a preset variation ratio; if so, determining that the predetermined item served by the region to which the server belongs has no failure.
6. The server fault identification method according to claim 5, characterized in that the method further comprises:
When all of the following conditions are met, determining that the server has a failure:
The region to which the server belongs is judged to have the second-class failure, and the number of terminals accessing the server is greater than the number of terminals accessing any other server in that region;
The average log duration of the terminals accessing the server is less than the average log duration of the terminals accessing any other server in that region;
More than a predetermined proportion of all the terminals accessing the region access this server.
7. The server fault identification method according to any one of claims 1 to 6, characterized in that the method for determining the information collection section comprises:
Taking the timing node as the center and, along the time dimension, selecting the n adjacent data access points upstream and the n adjacent data access points downstream of the timing node, for a total of 2n+1 nodes as information collection points, wherein a data access point is a timing node at which another terminal accesses the server.
8. The server fault identification method according to claim 7, characterized in that the variation tendency information comprises the slope of the line through the 2n+1 nodes;
The log duration information comprises the total time spent in the predetermined item after the terminal accesses the server.
9. A server fault identification device, wherein the servers are used to serve a predetermined item, multiple servers serving the predetermined item are distributed across multiple regions, and a terminal accesses a server serving the predetermined item when logging in to the predetermined item, characterized in that the identification device comprises:
An exception information obtaining module, configured to obtain the exception information that occurs when a terminal accesses the server;
A section determining module, configured to determine, according to the exception information, the timing node at which the terminal with the exception information accessed the server and the information collection section corresponding to the timing node;
A judgment parameter obtaining module, configured to obtain the variation tendency information and the log duration information in the information collection section;
A first judging module, configured to judge, according to the variation tendency information and the log duration information, whether the server has failed.
10. The server fault identification device according to claim 9, characterized in that the device further comprises:
An area information obtaining module, configured to obtain the total time information and the total quantity information of all terminals accessing the region to which the server belongs;
A second judging module, configured to judge, according to the total time information and the total quantity information, whether the server has a failure.
11. The server fault identification device according to claim 10, characterized in that the first judging module is configured to judge, according to the variation tendency information and the log duration information and using a prestored first decision condition, whether the predetermined item served by the server has a first-class failure;
And/or
The second judging module is configured to judge, according to the total time information and the total quantity information and using a prestored second decision condition, whether the predetermined item served by the region to which the server belongs has a second-class failure.
12. The server fault identification device according to claim 11, characterized in that the first judging module is configured to perform the following judgment:
When any one of the following conditions is met, determining that the predetermined item served by the server has a first-class failure:
Condition one: the variation tendency information is less than a first preset value;
Condition two: the variation tendency information is equal to a second preset value;
Condition three: the log duration information is less than a third preset value;
And/or
The second judging module is configured to perform the following judgment:
When all of the following conditions are met, determining that the predetermined item served by the region to which the server belongs has a second-class failure:
Condition four: the total time information is less than a first duration, the first duration being the lower quartile of the average log duration of the terminals accessing the whole region when logging in to the predetermined item;
Condition five: the total time information is less than a second duration;
Condition six: the total quantity information is greater than a fourth preset value.
13. The server fault identification device according to claim 12, characterized in that the device further comprises:
An access count obtaining module, configured to obtain the number of times the terminals access the server;
A variation ratio determining module, configured to determine, according to the number of times the terminals access the server, the variation ratio of the terminals' access counts;
A third judging module, configured to judge, when condition four, condition five and condition six are met at the same time, whether the variation ratio is less than a preset variation ratio, and if so, to determine that the predetermined item served by the region to which the server belongs has no failure.
14. The server fault identification device according to claim 13, characterized in that the device further comprises:
A fourth judging unit, configured to perform the following judgment:
When all of the following conditions are met, determining that the server has a failure:
The region to which the server belongs is judged to have the second-class failure, and the number of terminals accessing the server is greater than the number of terminals accessing any other server in that region;
The average log duration of the terminals accessing the server is less than the average log duration of the terminals accessing any other server in that region;
More than a predetermined proportion of all the terminals accessing the region access this server.
15. The server fault identification device according to any one of claims 9 to 14, characterized in that the area information obtaining module comprises:
An area information determination unit, configured to take the timing node as the center and, along the time dimension, select the n adjacent data access points upstream and the n adjacent data access points downstream of the timing node, for a total of 2n+1 nodes as information collection points, wherein a data access point is a timing node at which another terminal accesses the server.
16. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the server fault identification method according to any one of claims 1 to 8.
CN201910865239.8A 2019-09-12 2019-09-12 Server fault identification method and device and storage medium Active CN110519102B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910865239.8A CN110519102B (en) 2019-09-12 2019-09-12 Server fault identification method and device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910865239.8A CN110519102B (en) 2019-09-12 2019-09-12 Server fault identification method and device and storage medium

Publications (2)

Publication Number Publication Date
CN110519102A true CN110519102A (en) 2019-11-29
CN110519102B CN110519102B (en) 2020-10-30

Family

ID=68630775

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910865239.8A Active CN110519102B (en) 2019-09-12 2019-09-12 Server fault identification method and device and storage medium

Country Status (1)

Country Link
CN (1) CN110519102B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102130726A (en) * 2010-01-15 2011-07-20 西门子公司 Fault diagnosis method in vehicle-mounted wireless communication system and device thereof
CN103650569A (en) * 2013-07-22 2014-03-19 华为技术有限公司 Fault diagnosis method and device of wireless network
WO2015150743A1 (en) * 2014-03-31 2015-10-08 British Telecommunications Public Limited Company Network monitor
CN107391341A (en) * 2017-07-21 2017-11-24 郑州云海信息技术有限公司 A kind of fault early warning method and device
CN107864063A (en) * 2017-12-12 2018-03-30 北京奇艺世纪科技有限公司 A kind of abnormality monitoring method, device and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111835702A (en) * 2020-01-20 2020-10-27 北京嘀嘀无限科技发展有限公司 Login method, device, equipment and computer readable storage medium
CN111835702B (en) * 2020-01-20 2023-10-31 北京嘀嘀无限科技发展有限公司 Login method, login device, login equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN110519102B (en) 2020-10-30

Similar Documents

Publication Publication Date Title
CN111209131B (en) Method and system for determining faults of heterogeneous system based on machine learning
CN108052528A (en) A kind of storage device sequential classification method for early warning
US20060129367A1 (en) Systems, methods, and computer program products for system online availability estimation
CN106980627A (en) The display methods and device of log content
CN109586952A (en) Method of server expansion, device
CN108170566A (en) Product failure information processing method, system, equipment and collaboration platform
CN106598020B (en) A kind of equipment failure diagnostic method and system based on BIT and case fusion
CN106407083A (en) Fault detection method and device
CN109495291B (en) Calling abnormity positioning method and device and server
WO2023071039A1 (en) Fault diagnosis method, apparatus and device, and readable storage medium
CN110795260B (en) Smart customer care system
CN112286771A (en) Alarm method for monitoring global resources
CN104364664A (en) An algorithm and structure for creation, definition, and execution of an SPC rule decision tree
CN110175100B (en) Storage disk fault prediction method and prediction system
CN115658420A (en) Database monitoring method and system
CN102959521B (en) The management method of computer system is with administrating system
CN110519102A (en) A kind of server failure recognition methods, device and storage medium
CN106502887A (en) A kind of stability test method, test controller and system
CN110489260A (en) Fault recognition method, device and BMC
CN106878109A (en) Server detection method and server system
CN114090037A (en) Service degradation method, device, computer equipment and storage medium
CN113408969B (en) Maintenance scheme generation method and system for distributed system
CN115840686A (en) Server performance test method and device, electronic equipment and storage medium
CN114661505A (en) Storage component fault processing method, device, equipment and storage medium
CN114338458A (en) Data security detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240403

Address after: Room 503, Building 3, No. 6, Xicheng Xi'an North Road, Xinluo District, Longyan City, Fujian Province, 364000

Patentee after: Xie Xinyong

Country or region after: China

Address before: 550000 floor 5, building a, Liyang building (Gaoke No.1), 160 Changling South Road, Guiyang National High tech Industrial Development Zone, Guiyang City, Guizhou Province

Patentee before: GUIYANG GLOUD TECHNOLOGY Co.,Ltd.

Country or region before: China

TR01 Transfer of patent right

Effective date of registration: 20240424

Address after: Room 501-2432, Office Building, Development Zone, No. 8, Xingsheng South Road, Economic Development Zone, Miyun District, Beijing 100000 (Central Office Area of Economic Development Zone)

Patentee after: Beijing Abacus Industrial Technology Co.,Ltd.

Country or region after: China

Address before: Room 503, Building 3, No. 6, Xicheng Xi'an North Road, Xinluo District, Longyan City, Fujian Province, 364000

Patentee before: Xie Xinyong

Country or region before: China