Summary of the invention
To solve the above technical problems, the present invention provides a server fault identification method, a server fault identification device, and a storage medium.
In the server fault identification method provided by the present invention, the server is used to serve a predetermined project, a plurality of servers for serving the predetermined project are distributed across a plurality of regions, and a terminal accesses a server for serving the predetermined project when it logs in to the predetermined project. The method comprises the following steps:
obtaining exception information that occurs when a terminal accesses a server;
determining, according to the exception information, the time node at which the terminal exhibiting the exception information accessed the server, and the information collection interval corresponding to the time node;
obtaining variation trend information and login duration information for the information collection interval;
judging, according to the variation trend information and the login duration information, whether the server has a fault.
The above method is further characterized in that the method further comprises the following steps:
obtaining total duration information and total quantity information for all terminals accessing the region to which the server belongs;
judging, according to the total duration information and the total quantity information, whether the server has a fault.
The above method is further characterized in that judging, according to the variation trend information and the login duration information, whether the server has a fault comprises:
judging, according to the variation trend information and the login duration information and using a pre-stored first decision condition, whether the predetermined project served by the server has a first-type fault;
and/or
judging, according to the total duration information and the total quantity information, whether the server has a fault comprises:
judging, according to the total duration information and the total quantity information and using a pre-stored second decision condition, whether the predetermined project served by the region to which the server belongs has a second-type fault.
The above method is further characterized in that judging, according to the variation trend information and the login duration information and using the pre-stored first decision condition, whether the predetermined project served by the server has a first-type fault comprises:
when any one of the following conditions is met, determining that the predetermined project served by the server has a fault:
Condition one: the variation trend information is less than a first preset value;
Condition two: the variation trend information is equal to a second preset value;
Condition three: the login duration information is less than a third preset value;
and/or
judging, according to the total duration information and the total quantity information and using the pre-stored second decision condition, whether the predetermined project served by the region to which the server belongs has a second-type fault comprises:
when all of the following conditions are met, determining that the predetermined project served by the region to which the server belongs has a fault:
Condition four: the total duration information is less than a first duration, where the first duration is the lower quartile of the average login duration with which terminals accessing the whole region log in to the predetermined project;
Condition five: the total duration information is less than a second duration;
Condition six: the total quantity information is greater than a fourth preset value.
The above method is further characterized in that the method further comprises:
obtaining the number of times each terminal accesses the server;
determining, according to the number of times each terminal accesses the server, the variation ratio of the terminals' access counts;
when condition four, condition five and condition six are all met, judging whether the variation ratio is less than a preset variation ratio, and if so, determining that the predetermined project served by the region to which the server belongs has no fault.
The above method is further characterized in that the method further comprises:
when all of the following conditions are met, determining that the server has a fault:
the region to which the server belongs is judged to have the second-type fault, and the number of terminals accessing the server is greater than the number of terminals accessing any other server in that region;
the average login duration of the terminals accessing the server is less than the average login duration of the terminals accessing any other server in that region;
more than a predetermined proportion of all terminals accessing the region access the server.
The above method is further characterized in that the information collection interval is determined as follows:
taking the time node as the center, n consecutive data access points upstream of the time node and n consecutive data access points downstream of it are selected along the time dimension, for a total of 2n+1 nodes serving as information collection points, where a data access point is a time node at which another terminal accesses the server.
The above method is further characterized in that the variation trend information comprises the slope of the line through the 2n+1 nodes;
the login duration information comprises the total duration spent in the predetermined project after the terminal accesses the server.
The present invention also provides a server fault identification device, wherein the server is used to serve a predetermined project, a plurality of servers for serving the predetermined project are distributed across a plurality of regions, and a terminal accesses a server for serving the predetermined project when it logs in to the predetermined project. The identification device comprises:
an exception information obtaining module, configured to obtain exception information that occurs when a terminal accesses a server;
an interval determining module, configured to determine, according to the exception information, the time node at which the terminal exhibiting the exception information accessed the server, and the information collection interval corresponding to the time node;
a decision parameter obtaining module, configured to obtain variation trend information and login duration information for the information collection interval;
a first judging module, configured to judge, according to the variation trend information and the login duration information, whether the server has a fault.
The above device is further characterized in that the device further comprises:
an area information obtaining module, configured to obtain total duration information and total quantity information for all terminals accessing the region to which the server belongs;
a second judging module, configured to judge, according to the total duration information and the total quantity information, whether the server has a fault.
The above device is further characterized in that the first judging module is configured to judge, according to the variation trend information and the login duration information and using a pre-stored first decision condition, whether the predetermined project served by the server has a first-type fault;
and/or
the second judging module is configured to judge, according to the total duration information and the total quantity information and using a pre-stored second decision condition, whether the predetermined project served by the region to which the server belongs has a second-type fault.
The above device is further characterized in that the first judging module is configured to perform the following judgment:
when any one of the following conditions is met, determining that the predetermined project served by the server has a first-type fault:
Condition one: the variation trend information is less than a first preset value;
Condition two: the variation trend information is equal to a second preset value;
Condition three: the login duration information is less than a third preset value;
and/or
the second judging module is configured to perform the following judgment:
when all of the following conditions are met, determining that the predetermined project served by the region to which the server belongs has a second-type fault:
Condition four: the total duration information is less than a first duration, where the first duration is the lower quartile of the average login duration with which terminals accessing the whole region log in to the predetermined project;
Condition five: the total duration information is less than a second duration;
Condition six: the total quantity information is greater than a fourth preset value.
The above device is further characterized in that the device further comprises:
a count obtaining module, configured to obtain the number of times each terminal accesses the server;
a variation ratio determining module, configured to determine, according to the number of times each terminal accesses the server, the variation ratio of the terminals' access counts;
a third judging module, configured to judge, when condition four, condition five and condition six are all met, whether the variation ratio is less than a preset variation ratio, and if so, to determine that the predetermined project served by the region to which the server belongs has no fault.
The above device is further characterized in that the device further comprises:
a fourth judging unit, configured to perform the following judgment:
when all of the following conditions are met, determining that the server has a fault:
the region to which the server belongs is judged to have the second-type fault, and the number of terminals accessing the server is greater than the number of terminals accessing any other server in that region;
the average login duration of the terminals accessing the server is less than the average login duration of the terminals accessing any other server in that region;
more than a predetermined proportion of all terminals accessing the region access the server.
The above device is further characterized in that the area information obtaining module comprises:
an area information determination unit, configured to take the time node as the center and select, along the time dimension, n consecutive data access points upstream of the time node and n consecutive data access points downstream of it, for a total of 2n+1 nodes serving as information collection points, where a data access point is a time node at which another terminal accesses the server.
The present invention also provides a computer-readable storage medium on which a computer program is stored, wherein the server fault identification method described above is implemented when the computer program is executed by a processor.
With the server fault identification method and device of the present invention, information that is readily available in the prior art, such as the login duration and login time node of a terminal, is used to quickly assess the server accessed by a terminal exhibiting exception information, so that a server fault and the location of the faulty server are identified quickly and accurately. This effectively solves the problems in the prior art that identifying and locating a faulty server is time-consuming and insufficiently accurate.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention. It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another in any manner.
The present application provides a server fault identification method that uses information obtainable in the prior art, such as the login duration and login time node of a terminal, to quickly assess the server accessed by a terminal exhibiting exception information, so that a server fault and the location of the faulty server are identified quickly and accurately.
When a game or a large nationwide project is operated, servers are needed to support and safeguard the operation of the game or project. To ensure that game players or users nationwide can connect to the game or project quickly, servers can be deployed in multiple regions of the country, with multiple servers in each region. Each server can serve multiple games or multiple projects, that is, multiple games or projects can run on the same server at the same time.
As shown in Figure 1, a server fault identification method comprises the following steps:
S10: obtaining exception information that occurs when a terminal accesses a server;
S20: determining, according to the exception information, the time node at which the terminal exhibiting the exception information accessed the server, and the information collection interval corresponding to the time node;
S30: obtaining variation trend information and login duration information for the information collection interval;
S40: judging, according to the variation trend information and the login duration information, whether the server has a fault.
In S10, the exception information is issued by the terminal. For example, a crash or stuttering while the terminal is logging in to a game, a game start failure, or being unable to enter the game through the App all constitute exception information. The causes of a game start failure are very complicated, but the exception information for a game start failure is reported by the server. Because the present application cannot cover every possible problem, the less severe exceptions that occur when a terminal accesses a server, such as stuttering, crashes, player disconnection and reconnection, failure to create a decoding channel, or insufficient server GPU resources, are generally ignored: when such minor exceptions occur, the probability that the server has a fault is very small, so no fault identification needs to be performed on the server. When a more severe exception occurs while a terminal is accessing a server, it may have been caused by a server fault. In other words, fault identification is performed on the server only when a more severe exception occurs while a terminal is accessing it. The exception information referred to in S10 therefore means information about the more severe exceptions that occur when a terminal accesses a server. It should be noted that the exception information is obtained using methods that exist in the prior art and can already be implemented.
The above method steps mainly judge whether a problem has occurred in the predetermined project (which may be a game or a large project) running on the server. Further, to ensure that the predetermined project runs normally and stably, the method also includes steps for judging whether a problem has occurred in the predetermined project running in the region to which the server belongs, the server being the one the terminal was logging in to when the exception information occurred. As shown in Figure 2, these steps are:
S50: obtaining total duration information and total quantity information for all terminals accessing the region to which the server belongs;
S60: judging, according to the total duration information and the total quantity information, whether the server has a fault.
The above method involves an information collection interval, which can also be regarded as the window for data acquisition. Data are collected from the required nodes within the information collection interval to obtain the variation trend information and login duration information in S30. The information collection interval is determined as follows: taking the time node as the center, n consecutive data access points upstream of the time node and n consecutive data access points downstream of it are selected along the time dimension, for a total of 2n+1 nodes serving as information collection points, where a data access point is a time node at which another terminal accesses the server. The value of n is chosen according to actual conditions. For example, it can be set according to the performance of the server: when server performance is good, relatively few nodes may be selected; when server performance is poor, more nodes are selected in order to obtain more accurate data. Preferably, n is between 5 and 20. It should be noted that the n nodes upstream of the time node are time nodes at which terminals accessed the server before the terminal exhibiting the exception information did; each may be a time node of a normal login or a time node at which exception information occurred. The n nodes downstream of the time node are time nodes at which terminals accessed the server after the terminal exhibiting the exception information did. Both the n upstream nodes and the n downstream nodes are consecutive in time, which ensures the reliability of the variation trend information obtained from the information collection interval.
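The window selection described above can be sketched as follows. This is a minimal illustration, not the implementation of the present application: the function name `collection_window`, the representation of time nodes as plain numbers, and the choice to return `None` when fewer than n neighbours exist on either side are all assumptions made for the example.

```python
from bisect import bisect_left
from typing import List, Optional


def collection_window(access_times: List[float],
                      anomaly_time: float,
                      n: int) -> Optional[List[float]]:
    """Return the 2n+1 time nodes centred on anomaly_time.

    access_times holds the time nodes at which terminals accessed the
    server, including the node of the terminal that showed the exception.
    Returns None when fewer than n neighbours exist on either side
    (an assumption; the source does not say how to handle this case).
    """
    times = sorted(access_times)
    i = bisect_left(times, anomaly_time)
    if i == len(times) or times[i] != anomaly_time:
        raise ValueError("anomaly_time must be one of the access times")
    if i < n or i + n >= len(times):
        return None
    # n upstream nodes, the centre node, and n downstream nodes
    return times[i - n:i + n + 1]
```

For example, with access times 0 to 19 and n = 3, the window around time node 10 is the seven nodes 7 through 13.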
In a specific embodiment, the variation trend information in S30, obtained from the information collection interval, is the slope of the line through the 2n+1 nodes. That is, with time as the abscissa and the duration spent when accessing the server as the ordinate, connecting the 2n+1 nodes in chronological order yields approximately a straight line, with the 2n+1 nodes distributed on either side of it. The slope of this line is taken as the variation trend information of the information collection interval.
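One way to obtain such a slope is an ordinary least-squares fit, sketched below. The source only states that the nodes lie on both sides of an approximately straight line, so the use of least squares, and the function name `trend_slope`, are assumptions made for illustration.

```python
from typing import Sequence


def trend_slope(times: Sequence[float], durations: Sequence[float]) -> float:
    """Least-squares slope of duration against time for the window nodes."""
    if len(times) != len(durations) or len(times) < 2:
        raise ValueError("need two equally long sequences with >= 2 points")
    mean_t = sum(times) / len(times)
    mean_d = sum(durations) / len(durations)
    # Sum of squared deviations of time, and the time/duration cross term
    sxx = sum((t - mean_t) ** 2 for t in times)
    if sxx == 0:
        raise ValueError("all time nodes are identical")
    sxy = sum((t - mean_t) * (d - mean_d) for t, d in zip(times, durations))
    return sxy / sxx
```

A negative slope indicates that the durations of successive accesses are falling across the window, and a slope of zero indicates that they do not change.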
In S30, the login duration information comprises the total duration spent in the predetermined project, for example a given game, after the terminal accesses the server. For example, if a terminal takes 30 minutes in total from launching the App to exiting the game, the login duration information of that terminal is 30 minutes.
In S50, the total duration information is determined as follows: determine the region to which the server belongs; obtain the number of servers in the region; count the number of terminals accessing each server; for each terminal, compute the online duration from the moment it accesses the server to the moment it disconnects from the server; sum the online durations of all terminals on all servers in the region and take the average. This average is the total duration information. In other words, it is the average time for which the players corresponding to the terminals accessing all servers in a region run the game.
In S50, the total quantity information is determined as follows: determine the region to which the server belongs; obtain the number of servers in the region; count the number of terminals accessing each server; sum the numbers of terminals accessing all servers in the region. This sum is the total quantity information. In other words, it is the total sample size of players running a given game on the servers in a region.
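The two regional aggregates in S50 can be computed together, as in the following sketch. The function name `region_totals` and the input layout (a mapping from a server identifier to a list of connect and disconnect times, one pair per terminal) are hypothetical and chosen only for the example.

```python
from typing import Dict, List, Tuple


def region_totals(
    sessions_by_server: Dict[str, List[Tuple[float, float]]]
) -> Tuple[float, int]:
    """Return (total duration information, total quantity information).

    Each session is a (connect_time, disconnect_time) pair for one terminal.
    The total duration information is the mean online duration over all
    terminals on all servers in the region; the total quantity information
    is the number of those terminals.
    """
    durations = [
        end - start
        for sessions in sessions_by_server.values()
        for start, end in sessions
    ]
    if not durations:
        raise ValueError("the region has no terminal sessions")
    return sum(durations) / len(durations), len(durations)
```

For instance, a region with two servers holding sessions of 30, 10 and 20 minutes yields a total duration of 20 minutes and a total quantity of 3.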
In a preferred embodiment, judging, according to the variation trend information and the login duration information, whether the server has a fault comprises, in a specific implementation:
judging, according to the variation trend information and the login duration information and using a pre-stored first decision condition, whether the predetermined project served by the server has a first-type fault.
Specifically, this judgment comprises:
when any one of the following conditions is met, determining that the predetermined project served by the server has a fault:
Condition one: the variation trend information is less than a first preset value;
Condition two: the variation trend information is equal to a second preset value;
Condition three: the login duration information is less than a third preset value.
The second preset value is 0, which means that condition two represents the case in which the login duration of every node in the entire information collection interval is the same. However, the performance of the terminal itself and the network environment in which the terminal is located always cause login durations to differ, so if the variation trend information is 0, a problem certainly exists. The first preset value and the third preset value are determined by a common classification prediction model, such as a CART decision tree, when the fault identification model is established (described in detail below).
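The first decision condition is a logical OR of the three conditions and can be sketched as below. The function name `first_type_fault` and the tolerance `tol` used when comparing a floating-point slope with the second preset value are assumptions; the threshold values shown in the usage note are placeholders, since the source derives them from a trained model and gives no numbers.

```python
def first_type_fault(slope: float,
                     login_duration: float,
                     first_preset: float,
                     third_preset: float,
                     second_preset: float = 0.0,
                     tol: float = 1e-9) -> bool:
    """Return True when any one of conditions one to three is met."""
    condition_one = slope < first_preset
    # Exact float equality is fragile, so a small tolerance is assumed here
    condition_two = abs(slope - second_preset) <= tol
    condition_three = login_duration < third_preset
    return condition_one or condition_two or condition_three
```

With placeholder thresholds `first_preset=-2.0` and `third_preset=5`, a slope of 0.0 triggers condition two on its own, while a slope of 1.0 with a login duration of 30 triggers none of the conditions.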
In another preferred embodiment, judging, according to the total duration information and the total quantity information, whether the server has a fault comprises:
judging, according to the total duration information and the total quantity information and using a pre-stored second decision condition, whether the predetermined project served by the region to which the server belongs has a second-type fault.
Specifically, this judgment comprises:
when all of the following conditions are met, determining that the predetermined project served by the region to which the server belongs has a fault:
Condition four: the total duration information is less than a first duration, where the first duration is the lower quartile of the average login duration with which terminals accessing the whole region log in to the predetermined project;
Condition five: the total duration information is less than a second duration;
Condition six: the total quantity information is greater than a fourth preset value.
The first duration is determined statistically: the average login durations with which terminals accessing the whole region log in to the predetermined project are sorted from largest to smallest, and if the total duration information falls within the last quarter of this ordering, condition four is considered to be met. The second duration and the fourth preset value are likewise determined by a common classification prediction model, such as a CART decision tree, while the server fault identification model is being built.
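The second decision condition requires all three conditions at once, as in the sketch below. The function name `second_type_fault` and the parameter `reference_durations` (the sample of average login durations from which the lower quartile is taken) are assumptions, because the source does not specify exactly which sample the quartile is computed over. The quartile is obtained with `statistics.quantiles` from the Python standard library.

```python
from statistics import quantiles
from typing import Sequence


def second_type_fault(total_duration: float,
                      total_quantity: int,
                      reference_durations: Sequence[float],
                      second_duration: float,
                      fourth_preset: int) -> bool:
    """Return True only when conditions four, five and six are all met."""
    # First duration: lower quartile (first of the three quartile cut points)
    first_duration = quantiles(reference_durations, n=4)[0]
    condition_four = total_duration < first_duration
    condition_five = total_duration < second_duration
    condition_six = total_quantity > fourth_preset
    return condition_four and condition_five and condition_six
```

Requiring condition six alongside the two duration conditions means a short average duration only counts as a regional fault when enough terminals are involved.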
It should be noted that when the network environment of a terminal is poor, the terminal may retry the connection without limit, which drags down the value of the total duration information and causes a misjudgment, so that a situation with no fault is classified as faulty. To solve this problem and avoid misjudging the second-type fault, the method of the present invention further includes:
obtaining the number of times each terminal accesses the server;
determining, according to the number of times each terminal accesses the server, the variation ratio of the terminals' access counts;
when condition four, condition five and condition six are all met, judging whether the variation ratio is less than a preset variation ratio, and if so, determining that the predetermined project served by the region to which the server belongs has no fault.
The preset variation ratio is set according to the network conditions of the region in which the server is located and is preferably 70%. That is, when the number of times a single terminal accesses the server accounts for 30% of the total number of times all terminals access the server, the server is considered not to have the second-type fault and is determined to be operating normally; the problem is attributed to the terminal.
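The retry check above can be sketched using the standard statistical definition of the variation ratio, one minus the share of the most frequent category, where each terminal is a category and its access count is the frequency. That reading of the source term, and the names `variation_ratio` and `second_type_fault_confirmed`, are assumptions made for the example.

```python
from typing import Dict


def variation_ratio(access_counts: Dict[str, int]) -> float:
    """1 minus the share of accesses made by the most active terminal."""
    total = sum(access_counts.values())
    if total <= 0:
        raise ValueError("no accesses recorded")
    return 1.0 - max(access_counts.values()) / total


def second_type_fault_confirmed(conditions_four_to_six_met: bool,
                                access_counts: Dict[str, int],
                                preset_ratio: float = 0.70) -> bool:
    """Keep the regional fault verdict unless one terminal dominates."""
    if not conditions_four_to_six_met:
        return False
    # A ratio below the preset means a single terminal is retrying heavily,
    # so the short durations are attributed to that terminal, not the region.
    return not (variation_ratio(access_counts) < preset_ratio)
```

If one terminal makes 80 of 100 accesses, the variation ratio is 0.2, which is below the 70% preset, so the regional fault verdict is withdrawn.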
Further, to improve the accuracy of the judgment and avoid misjudgments, the fault identification method of the present invention further includes:
when all of the following conditions are met, determining that the server has a fault:
the region to which the server belongs is judged to have the second-type fault, and the number of terminals accessing the server is greater than the number of terminals accessing any other server in that region;
the average login duration of the terminals accessing the server is less than the average login duration of the terminals accessing any other server in that region;
more than a predetermined proportion of all terminals accessing the region are on the server.
If one and only one server meets all three of the above conditions, it is determined that this server has a fault, rather than that the predetermined project has a problem across the whole region.
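The single-server check can be sketched as follows. The function name `faulty_server`, the input layout (a mapping from a server identifier to the login durations of its terminals), and the decision to skip servers with no terminals are assumptions; the source gives no value for the predetermined proportion, so it is passed in as a parameter.

```python
from typing import Dict, List, Optional


def faulty_server(region_has_second_type_fault: bool,
                  durations_by_server: Dict[str, List[float]],
                  predetermined_ratio: float) -> Optional[str]:
    """Return the one server meeting all three conditions, else None."""
    if not region_has_second_type_fault:
        return None
    # Servers with no terminals are skipped (an assumption of this sketch)
    active = {s: d for s, d in durations_by_server.items() if d}
    counts = {s: len(d) for s, d in active.items()}
    means = {s: sum(d) / len(d) for s, d in active.items()}
    total = sum(counts.values())
    matches = []
    for server in active:
        others = [o for o in active if o != server]
        most_terminals = all(counts[server] > counts[o] for o in others)
        shortest_mean = all(means[server] < means[o] for o in others)
        dominant_share = counts[server] / total > predetermined_ratio
        if most_terminals and shortest_mean and dominant_share:
            matches.append(server)
    return matches[0] if len(matches) == 1 else None
```

When no server satisfies all three conditions, the function returns `None`, which corresponds to leaving the verdict at the regional level.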
The server fault identification method of the present invention is deployed on a Linux system. When the above method determines that a fault exists, whether a first-type fault, a second-type fault, or a fault of an individual server, at least one of the following measures is taken:
1. The fault information is recorded and pushed to the back end of the Linux system, where operators can see the fault information of the server, enabling coordinated management;
2. The services of the servers involved in the fault are shut down;
3. The fault information of the server is sent to maintenance personnel by email, SMS or WeChat for handling;
4. After the fault of the faulty server has been resolved, the state of the server is changed to the normal online state in the back end of the Linux system, to ensure that the server can be used normally.
Before the fault identification method of the present application is run, a fault identification model must be established. While the model is being established, servers that have faults are sampled to construct sample data, from which the various thresholds used during fault identification are obtained. When faults are sampled, all server fault information over a sufficiently long period is collected, so that the sample is large enough to reflect the fault information clearly and completely. For example, all information from 00:00 to 24:00 on a given day is collected, including fault information and non-fault information, and the fault information is processed, for example by removing unnecessary noise from it and transforming the data, to obtain the values at the fault points that relate to the decision conditions. For example, the first decision condition involves the first preset value, against which the variation trend information is judged, and the third preset value, against which the login duration information is judged. The fault data must therefore be processed to obtain the first preset value and the third preset value. In processing the fault data, the fault variation trend information obtained from a fault information collection interval is the slope of the line through the 2n+1 nodes, and the slope of this line is taken as the variation trend information of the fault information collection interval.
The fault login duration information comprises the total duration spent in the predetermined project, for example a given game, after the faulty terminal accesses the server. For example, if a faulty terminal takes 5 minutes in total from launching the App to exiting the game, the fault login duration information of that terminal is 5 minutes.
Using a CART decision tree, the variation trend information, the login duration information and similar quantities are fed into the decision tree as the model input dimensions, that is, the independent variables, and whether the server is a faulty server is taken as the dependent variable. After parameter tuning, the decision tree automatically computes the first preset value, the third preset value and the other values used for judgment in the first decision condition. Likewise, every threshold used for judgment in this method can be determined by sampling and using a CART decision tree. Some values in the decision conditions are not fixed, such as the lower quartile, because a relative value is needed for those judgments: the threshold determined by the classification model lies close to a certain relative position in the sample set at that time.
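To illustrate how a CART-style tree arrives at a threshold, the sketch below finds the single split on one numeric feature that minimizes the weighted Gini impurity, which is the criterion CART uses for classification at each node. It is a simplified one-node illustration written without any machine-learning library, not the model of the present application; the function names and the sample values are hypothetical.

```python
from typing import Optional, Sequence


def gini(labels: Sequence[int]) -> float:
    """Gini impurity of binary labels (1 = faulty, 0 = normal)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)


def best_threshold(values: Sequence[float], labels: Sequence[int]) -> float:
    """Split point on one feature that minimizes weighted Gini impurity."""
    pairs = sorted(zip(values, labels))
    best_score = float("inf")
    best_thr: Optional[float] = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no split possible between equal feature values
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(pairs)
        if score < best_score:
            best_score, best_thr = score, thr
    if best_thr is None:
        raise ValueError("need at least two distinct feature values")
    return best_thr
```

Applied to hypothetical login durations of 3, 4, 5 and 6 minutes for faulty samples and 25, 30, 35 and 40 minutes for normal samples, the split falls midway between 6 and 25, which would play the role of the third preset value.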
In addition, before the server fault identification model of the present invention goes online to identify server faults, it must also be tested. During testing, the precision (positive predictive value, PPV) is calculated from a confusion matrix: Precision = TP / (TP + FP), that is, the proportion of correct predictions among all results that the model predicts as positive. In testing, the PPV of the server fault identification method of the present application was greater than 90%.
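The precision formula can be checked on a small labelled sample, as in the following sketch. The function name `precision` and the sample predictions are illustrative only and are not test data from the present application.

```python
from typing import Sequence


def precision(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Precision (PPV) = TP / (TP + FP), with 1 = faulty and 0 = normal."""
    tp = sum(1 for p, a in zip(predicted, actual) if p and a)
    fp = sum(1 for p, a in zip(predicted, actual) if p and not a)
    if tp + fp == 0:
        raise ValueError("the model made no positive predictions")
    return tp / (tp + fp)
```

With predictions `[1, 1, 1, 0, 0, 1]` against ground truth `[1, 1, 0, 0, 1, 1]`, three of the four positive predictions are correct, giving a precision of 0.75.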
The present invention also provides a server fault identification device, wherein the server is used to serve a predetermined project, a plurality of servers for serving the predetermined project are distributed across a plurality of regions, and a terminal accesses a server for serving the predetermined project when it logs in to the predetermined project. As shown in Figure 3, the identification device comprises:
an exception information obtaining module, configured to obtain exception information that occurs when a terminal accesses a server;
an interval determining module, configured to determine, according to the exception information, the time node at which the terminal exhibiting the exception information accessed the server, and the information collection interval corresponding to the time node;
a decision parameter obtaining module, configured to obtain variation trend information and login duration information for the information collection interval;
a first judging module, configured to judge, according to the variation trend information and the login duration information, whether the server has a fault.
Further, the device also comprises:
an area information obtaining module, configured to obtain total duration information and total quantity information for all terminals accessing the region to which the server belongs;
a second judging module, configured to judge, according to the total duration information and the total quantity information, whether the server has a fault.
The area information obtaining module comprises:
an area information determination unit, configured to take the time node as the center and select, along the time dimension, n consecutive data access points upstream of the time node and n consecutive data access points downstream of it, for a total of 2n+1 nodes serving as information collection points, where a data access point is a time node at which another terminal accesses the server.
Further, the first judging module is configured to judge, according to the variation trend information and the login duration information and using a pre-stored first decision condition, whether the predetermined project served by the server has a first-type fault;
and/or
the second judging module is configured to judge, according to the total duration information and the total quantity information and using a pre-stored second decision condition, whether the predetermined project served by the region to which the server belongs has a second-type fault.
Further, the first judgment module is configured to perform the following judgment:
When any one of the following conditions is met, the predetermined item served by the server is determined to have a fault:
Condition one: the variation trend information is less than a first preset value;
Condition two: the variation trend information is equal to a second preset value;
Condition three: the login duration information is less than a third preset value;
And/or
The second judgment module is configured to perform the following judgment:
When all of the following conditions are met, the predetermined item served by the region to which the server belongs is determined to have a fault:
Condition four: the total time information is less than a first duration, the first duration being the lower quartile of the average login duration of the terminals, across all regions, that log in to the predetermined item;
Condition five: the total time information is less than a second duration;
Condition six: the total quantity information is greater than a fourth preset value.
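As a non-limiting illustration, conditions one to six might be evaluated as sketched below. The names `FirstThresholds`, `first_type_fault`, `second_type_fault` and `lower_quartile` are hypothetical, as is the choice of Python's `statistics.quantiles` (with its default "exclusive" method) to obtain the lower quartile; the present application does not prescribe a particular quartile method or any concrete preset values.

```python
import statistics
from dataclasses import dataclass
from typing import Sequence


@dataclass
class FirstThresholds:
    first_preset: float    # condition one: trend below this value
    second_preset: float   # condition two: trend equal to this value
    third_preset: float    # condition three: login duration below this value


def first_type_fault(trend: float, login_duration: float, t: FirstThresholds) -> bool:
    """Any one of conditions one to three is sufficient."""
    return (
        trend < t.first_preset
        or trend == t.second_preset   # literal equality, as worded in the text
        or login_duration < t.third_preset
    )


def lower_quartile(avg_login_durations: Sequence[float]) -> float:
    """First duration: lower quartile of the average login durations (assumed method)."""
    return statistics.quantiles(avg_login_durations, n=4)[0]


def second_type_fault(total_time: float, total_quantity: int,
                      first_duration: float, second_duration: float,
                      fourth_preset: int) -> bool:
    """Conditions four, five and six must all hold."""
    return (
        total_time < first_duration
        and total_time < second_duration
        and total_quantity > fourth_preset
    )


# Illustrative values only; none of them come from the present application.
first_duration = lower_quartile([10, 20, 30, 40, 50, 60, 70])
region_faulty = second_type_fault(total_time=15, total_quantity=500,
                                  first_duration=first_duration,
                                  second_duration=18, fourth_preset=300)
```

The first decision condition is a logical OR over its three conditions, whereas the second decision condition is a logical AND, which is why the two functions are structured differently.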
Further, the device also includes:
An access count acquisition module, configured to acquire the number of times each terminal accesses the server;
A variation ratio determination module, configured to determine, according to the number of times each terminal accesses the server, the variation ratio of the terminals' access counts;
A third judgment module, configured to judge, when condition four, condition five and condition six are met simultaneously, whether the variation ratio is less than a preset variation ratio, and if so, to determine that the predetermined item served by the region to which the server belongs has no fault.
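As a non-limiting illustration, the check performed by the third judgment module might be sketched as follows. The sketch assumes the usual statistical definition of the variation ratio, namely one minus the share of observations that fall in the modal category; the present application does not state the formula, and the names `variation_ratio` and `region_fault_after_check` are hypothetical.

```python
from collections import Counter
from typing import Iterable


def variation_ratio(access_counts: Iterable[int]) -> float:
    """1 - (frequency of the most common access count / number of terminals)."""
    frequencies = Counter(access_counts)
    total = sum(frequencies.values())
    if total == 0:
        raise ValueError("no access counts supplied")
    modal_frequency = max(frequencies.values())
    return 1.0 - modal_frequency / total


def region_fault_after_check(conditions_4_5_6_met: bool, ratio: float,
                             preset_ratio: float) -> bool:
    """A ratio below the preset value overrides the regional fault finding."""
    if not conditions_4_5_6_met:
        return False
    return ratio >= preset_ratio


# Five terminals; three of them accessed the server exactly once.
ratio = variation_ratio([1, 1, 1, 2, 3])
```

Under this assumed definition, a low variation ratio means that most terminals share the same access count, and in that case the regional fault finding is withdrawn.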
Further, the device also includes:
A fourth judgment unit, configured to perform the following judgment:
When all of the following conditions are met, the server is determined to have a fault:
The region to which the server belongs is judged to have the second type of fault, and the number of terminals accessing the server is greater than the number of terminals accessing any other server in that region;
The average login duration of the terminals accessing the server is less than the average login duration of the terminals accessing any other server in that region;
More than a predetermined proportion of all terminals accessing the region access the server.
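As a non-limiting illustration, the judgment of the fourth judgment unit might be sketched as follows. The sketch reads "any other server" as "every other server in the region", which is an interpretation; the names `ServerStats` and `server_is_faulty` and all sample figures are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ServerStats:
    terminal_count: int         # terminals accessing this server
    avg_login_duration: float   # average login duration of those terminals


def server_is_faulty(target: str, region: Dict[str, ServerStats],
                     region_second_type_fault: bool,
                     predetermined_ratio: float) -> bool:
    """All conditions must hold for the target server to be judged faulty."""
    me = region[target]
    others = [s for name, s in region.items() if name != target]
    total = sum(s.terminal_count for s in region.values())
    if not region_second_type_fault or total == 0:
        return False
    # More terminals than every other server in the region.
    most_terminals = all(me.terminal_count > o.terminal_count for o in others)
    # Shorter average login duration than every other server in the region.
    shortest_logins = all(me.avg_login_duration < o.avg_login_duration for o in others)
    # More than the predetermined proportion of the region's terminals are on this server.
    concentrated = me.terminal_count / total > predetermined_ratio
    return most_terminals and shortest_logins and concentrated


region = {
    "a": ServerStats(terminal_count=80, avg_login_duration=12.0),
    "b": ServerStats(terminal_count=15, avg_login_duration=95.0),
    "c": ServerStats(terminal_count=5, avg_login_duration=110.0),
}
faulty = server_is_faulty("a", region, region_second_type_fault=True,
                          predetermined_ratio=0.5)
```

In this example server "a" carries 80% of the region's terminals and shows the shortest average login duration, so it is the server singled out within the faulty region.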
Since the server failure identification device of the present application is used to implement the server failure identification method described above, the functions and effects of the device are not described again here.
In addition, the present application also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, it implements the server failure identification method described above.
The present application samples server failures to establish an expert sample set, and determines each threshold used for judgment by means of a classification prediction model built on the expert sample set, so that server failures can be determined more accurately and quickly.
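As a non-limiting illustration of deriving a threshold from an expert sample set, the sketch below fits a single-feature decision stump, which is one of the simplest classification models. The present application does not state which classification prediction model is used, so the choice of a stump, the function name `learn_threshold`, and the sample data are all assumptions made for this example.

```python
from typing import List, Tuple


def learn_threshold(samples: List[Tuple[float, bool]]) -> float:
    """Pick the cut-off that best separates faulty from healthy samples.

    samples: (login_duration, is_fault) pairs labeled by experts.
    The rule being fitted is "value < threshold means fault".
    """
    values = sorted({value for value, _ in samples})
    # Candidate cut-offs: midpoints between neighboring distinct values.
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    if not candidates:
        raise ValueError("need at least two distinct sample values")

    def errors(threshold: float) -> int:
        # Number of samples the rule would misclassify at this threshold.
        return sum((value < threshold) != is_fault for value, is_fault in samples)

    return min(candidates, key=errors)


# Hypothetical expert sample set: short login durations were labeled as faults.
expert_samples = [(3.0, True), (5.0, True), (8.0, True),
                  (40.0, False), (55.0, False), (90.0, False)]
third_preset = learn_threshold(expert_samples)
```

A threshold obtained in this way could serve, for example, as the third preset value of condition three.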
The features described above may be implemented individually or combined in various ways, and all such variants fall within the protection scope of the present invention.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method may be completed by a program instructing the relevant hardware, and that the program may be stored in a computer-readable storage medium such as a read-only memory, a magnetic disk or an optical disc. Optionally, all or part of the steps of the above embodiments may also be implemented using one or more integrated circuits; accordingly, each module/unit in the above embodiments may be implemented in the form of hardware or in the form of a software functional module. The present invention is not limited to any particular combination of hardware and software.
It should be noted that, in this document, the terms "include", "comprise" or any other variant thereof are intended to cover a non-exclusive inclusion, so that an article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that article or device. Without further limitation, an element defined by the phrase "including ..." does not exclude the presence of other identical elements in the article or device that includes the element.
The above embodiments are intended only to illustrate the technical solution of the present invention and not to limit it; the present invention has been described in detail with reference to preferred embodiments only. Those of ordinary skill in the art should understand that modifications or equivalent replacements may be made to the technical solution of the present invention without departing from its spirit and scope, and all such modifications and replacements shall be covered by the claims of the present invention.