CN110348209A

CN110348209A - Data processing method, device, computer equipment and storage medium

Info

Publication number: CN110348209A
Application number: CN201810307408.1A
Authority: CN
Inventors: 申瑞珉
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2018-04-08
Filing date: 2018-04-08
Publication date: 2019-10-18

Abstract

This application relates to a data processing method, system, computer equipment, and storage medium. The method includes: obtaining raw data, the raw data including a terminal identifier, a user identifier, a login result, and a login address; reconstructing the raw data to obtain reconstructed data, the reconstructed data including a user address and an address distance, the address distance being the distance between the login address and the user address; constructing, from the raw data and the reconstructed data, a first feature vector for each terminal identifier, the first feature vector including a user identifier count factor, an average address distance factor, and a login success rate factor; and inputting the first feature vector into a supervised classification model to determine whether the terminal identifier is a malicious identifier, the supervised classification model being a classification model obtained by training on training samples, the training samples including second feature vectors that have the same data structure as the first feature vector. The method can improve detection accuracy.

Description

Data processing method, device, computer equipment and storage medium

Technical field

This application relates to the technical field of data processing, and in particular to a data processing method, device, computer equipment, and storage medium.

Background

With the development of computer information technology, hackers and a hacker industry chain have emerged. The upstream part of this chain typically obtains the account passwords that users use on websites or in applications through channels such as credential stuffing, Trojans, phishing, or viruses, packages them into account-password sets, and sells them to the downstream part of the chain. The downstream part uses automated tools to attempt a login with each account password in the set, picks out the passwords that are correct for a given website or application, and then steals the accounts. A stolen account password not only threatens the user's virtual assets (such as Q coins) and offline property (for example, by borrowing money from friends and relatives over WeChat) and damages personal reputation (for example, by reposting inappropriate Weibo content); it can even disturb the ecosystem of online social networking (for example, when paid posters promote products through large numbers of stolen accounts), causing trouble for individual users and enterprises. Detecting account theft is therefore of great significance.

Traditional account theft detection includes detection based on client-side viruses and Trojans, and methods that identify malicious jumps in the client login version. These methods concentrate on the upstream industry chain, and client-side detection covers only a small share of the upstream chain's many channels, so detection accuracy is low.

Summary of the invention

In view of the above technical problem, it is necessary to provide a data processing method, device, computer equipment, and storage medium that can improve detection accuracy.

A data processing method, the method comprising:

obtaining raw data, the raw data including a terminal identifier, a user identifier, a login result, and a login address;

reconstructing the raw data to obtain reconstructed data, the reconstructed data including a user address and an address distance, the address distance being the distance between the login address and the user address;

constructing, from the raw data and the reconstructed data, a first feature vector for each terminal identifier, the first feature vector including a user identifier count factor, an average address distance factor, and a login success rate factor;

inputting the first feature vector into a supervised classification model to determine whether the terminal identifier is a malicious identifier, the supervised classification model being a classification model obtained by training on training samples, the training samples including second feature vectors, the second feature vectors having the same data structure as the first feature vector.

A data processing device, the device comprising:

a raw data obtaining module, configured to obtain raw data, the raw data including a terminal identifier, a user identifier, a login result, and a login address;

a data reconstruction module, configured to reconstruct the raw data to obtain reconstructed data, the reconstructed data including a user address and an address distance, the address distance being the distance between the login address and the user address;

a feature vector construction module, configured to construct, from the raw data and the reconstructed data, a first feature vector for each terminal identifier, the first feature vector including a user identifier count factor, an average address distance factor, and a login success rate factor;

a malicious identifier determining module, configured to input the first feature vector into a supervised classification model to determine whether the terminal identifier is a malicious identifier, the supervised classification model being a classification model obtained by training on training samples, the training samples including second feature vectors, the second feature vectors having the same data structure as the first feature vector.

Computer equipment, including a memory and a processor, the memory storing a computer program, the processor implementing the following steps when executing the computer program:

A computer-readable storage medium storing a computer program, the computer program implementing the following steps when executed by a processor:

The above data processing method, device, computer equipment, and storage medium perform detection at the behavior collection stage of the hacker's downstream industry chain. That is, they obtain the raw data produced when a hacker uses automated tools to attempt a login with each account password in an account-password set, and reconstruct the raw data to obtain reconstructed data; construct a first feature vector for each terminal identifier from the raw data and the reconstructed data; and finally input the first feature vector into a supervised classification model to determine whether the terminal identifier is a malicious identifier. In this way, there is no need to detect the many channels of the hacker's upstream industry chain at the client, the risk of the login protocol being cracked at the client is effectively avoided, and detection accuracy is improved.

Brief description of the drawings

Fig. 1 is a diagram of the application environment of the data processing method in one embodiment;

Fig. 2 is a flowchart of the data processing method in one embodiment;

Fig. 3 is a detailed flowchart of one step of the data processing method in one embodiment;

Fig. 4 is a detailed flowchart of another step of the data processing method in one embodiment;

Fig. 5 is a flowchart of building the supervised classification model of the data processing method in one embodiment;

Fig. 6 is a detailed flowchart of a further step of the data processing method in one embodiment;

Fig. 7 is a structural block diagram of the data processing device in one embodiment;

Fig. 8 is a structural block diagram of the data processing device in a second embodiment;

Fig. 9 is a structural block diagram of the data processing device in a third embodiment;

Fig. 10 is a structural block diagram of the data processing device in a fourth embodiment;

Fig. 11 is a diagram of the internal structure of the computer equipment in one embodiment.

Detailed description

To make the objectives, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the application and are not intended to limit it.

The data processing method provided by this application can be applied in the application environment shown in Fig. 1, in which a terminal 102 communicates with a server 104 over a network. The data processing method of this application can run on the server 104. The terminal 102 sends user login requests to the server 104, and the server 104 stores the received user login requests to form a raw database. When the data processing method of an embodiment of this application is executed on the server 104, the raw data can be obtained from this database. It is to be understood that the method can also run on another server different from the server 104, in which case the raw data is read from the server 104 when it is obtained. The terminal 102 can be, but is not limited to, a personal computer, laptop, smartphone, tablet, or portable wearable device, and the server 104 can be implemented as a standalone server or as a cluster of multiple servers.

In one embodiment, as shown in Fig. 2, a data processing method is provided, comprising the following steps:

S201: obtain raw data.

The raw data includes a terminal identifier, a user identifier, a login result, and a login address. The terminal identifier uniquely identifies the terminal that issued the user login request, and can be, for example, an IP address or a physical address. The user identifier uniquely identifies the user who issued the user login request, and can be, for example, an account, a user number, or a user name. The login result indicates whether the user login request succeeded. The login address is the address of the terminal that issued the user login request; it can be, for example, a city, or a specific address down to the street, community, or even household.

S203: reconstruct the raw data to obtain reconstructed data.

The reconstructed data includes a user address and an address distance; the address distance is the distance between the login address and the user address.

The user address can be the place registered when the user signed up, or the place from which the user most often logs in. Like the login address, it can be a city, or a specific address down to the street, community, or even household. For example, the user address can be the city of the user's registered residence, or a specific address down to the street or community.

S205: construct, from the raw data and the reconstructed data, a first feature vector for each terminal identifier.

The first feature vector includes a user identifier count factor, an average address distance factor, and a login success rate factor.

The user identifier count is the number of user identifiers that have logged in on the terminal corresponding to the terminal identifier; it can be the count after deduplication. The user identifier count factor is data related to the user identifier count, or data that can be determined from it. Similarly, the average address distance factor is data related to the average of the address distances, or data that can be determined from that average. The login success rate factor is data related to the login success rate, or data that can be determined from it. The login success rate can be determined from the individual login results. It is to be understood that, in a relatively simple embodiment, the user identifier count factor is the user identifier count itself, the average address distance factor is the average of the address distances, and the login success rate factor is the login success rate.

S207: input the first feature vector into a supervised classification model to determine whether the terminal identifier is a malicious identifier. The supervised classification model is a classification model obtained by training on training samples. The training samples include second feature vectors, which have the same data structure as the first feature vector.

A malicious identifier is the terminal identifier of a malicious terminal. The supervised classification model classifies the first feature vector to determine whether the terminal identifier is a malicious identifier, and thereby whether the terminal corresponding to the terminal identifier is a malicious terminal.

It is to be understood that a training sample may include the classification label assigned to its second feature vector. Like the first feature vector, the second feature vector includes a user identifier count factor, an average address distance factor, and a login success rate factor.

The above data processing method performs detection at the behavior collection stage of the hacker's downstream industry chain. That is, it obtains the raw data produced when a hacker uses automated tools to attempt a login with each account password in an account-password set, and reconstructs the raw data to obtain reconstructed data; constructs a first feature vector for each terminal identifier from the raw data and the reconstructed data; and finally inputs the first feature vector into the supervised classification model to determine whether the terminal identifier is a malicious identifier. In this way, there is no need to detect the many channels of the hacker's upstream industry chain at the client, the risk of the login protocol being cracked at the client is effectively avoided, and detection accuracy is improved.

It should be noted that, compared with determining malicious identifiers directly by rules and policies, this application uses artificial intelligence to perform supervised classification of terminal identifiers. No fixed threshold needs to be set, and subsequent enforcement is decided by the trained supervised classification model, whose behavior is difficult to guess. This can, to some extent, prevent attackers from circumventing enforcement, and ultimately improves detection accuracy.

Moreover, determining malicious identifiers directly by rules and policies requires careful design by security experts, which is costly and depends on the experts' skill and experience. With supervised classification of terminal identifiers, only the training samples need to be maintained based on later feedback; the supervised classification model learns the rules from the training samples automatically, which reduces dependence on security experts and simplifies system operation.

In one embodiment, the terminal identifier includes an IP address. Physical addresses are often protected and hard to obtain, whereas IP addresses are usually easy to obtain, so using the IP address as the terminal identifier has the benefit of easy acquisition.

Further, the step of reconstructing the raw data to obtain reconstructed data, i.e. step S203, may include:

S3031: determine the user address corresponding to each user identifier.

The user address corresponding to each user identifier is determined from the user identifier. The user address can be looked up by user identifier through a data query.

It is to be understood that, before the user address is looked up by user identifier, the method may also include computing user addresses from the raw data. For example, full statistics can be computed over monthly active users (users who have logged in within the last 30 days): the frequency of each login address used by each user is calculated, and the login address with the highest frequency is taken as the user address and stored in a user database.
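The frequency statistic described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `estimate_user_addresses` and the `(uin, city)` record shape are assumptions, and the records are assumed to be already restricted to the 30-day window.

```python
from collections import Counter, defaultdict

def estimate_user_addresses(login_records):
    """Map each user identifier to the login city it used most often.

    login_records: iterable of (uin, city) pairs, assumed to be already
    filtered to the statistics window (for example, the last 30 days).
    """
    city_counts = defaultdict(Counter)
    for uin, city in login_records:
        city_counts[uin][city] += 1
    # most_common(1) returns [(city, count)]; ties go to the city seen first.
    return {uin: counts.most_common(1)[0][0]
            for uin, counts in city_counts.items()}
```

The resulting mapping plays the role of the user database that S3031 later queries.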

S3033: determine the address distance from the user address and the login address.

Once two addresses are determined, the distance between them can be determined.

In one specific embodiment, the step of determining the address distance from the user address and the login address may include:

(a) determining the login longitude and login latitude corresponding to each login address, and the user longitude and user latitude of the user city corresponding to each user identifier.

The login longitude is the longitude of the login address, the login latitude is the latitude of the login address, the user longitude is the longitude of the user address, and the user latitude is the latitude of the user address.

(b) determining the address distance corresponding to each user identifier from the login longitude, login latitude, user longitude, and user latitude.

In this way, the distance between the two addresses is determined from their longitudes and latitudes, which can ultimately make the detection result more accurate. It is to be understood that, in this embodiment, the reconstructed data also includes the login longitude, login latitude, user longitude, and user latitude.
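The text does not say which formula converts the two coordinate pairs into a distance. The sketch below assumes the haversine great-circle distance and a mean Earth radius of 6371 km; both are assumptions chosen for illustration, as is the function name `address_distance_km`.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumed constant

def address_distance_km(login_lat, login_lng, user_lat, user_lng):
    """Great-circle distance in km between login and user coordinates (degrees)."""
    phi1 = math.radians(login_lat)
    phi2 = math.radians(user_lat)
    dphi = phi2 - phi1
    dlmb = math.radians(user_lng - login_lng)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(min(1.0, a)))
```

For city-level addresses a simpler planar approximation would probably also serve, since the feature only needs to separate nearby logins from distant ones.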

In one embodiment, as shown in Fig. 4, the step of constructing the first feature vector for each terminal identifier from the raw data and the reconstructed data, i.e. step S205, includes:

S402: obtain an IP address.

S404: query the user identifiers and login results corresponding to the IP address within a preset time period, and, from those user identifiers and login results, compute the login success rate and the user identifier count for the IP address within the preset time period.

S406: determine the first feature vector from the average of the address distances, the login success rate, and the user identifier count.

In this embodiment, the IP address (Internet Protocol address) serves as the terminal identifier that identifies the terminal. IP addresses can be obtained from the raw database according to a preset rule; for example, the IP addresses of the terminals that made login requests within the preset time period can be obtained. Then, for each such IP address, the login success rate and the user identifier count within the preset time period are computed. Finally, the first feature vector is determined from the average of the address distances, the login success rate, and the user identifier count.

In one specific embodiment, the preset time period is a preset time interval, for example 5 minutes. The data stream can thus be split into batches for statistics, yielding a first feature vector for each terminal in each batch. The first feature vectors of each batch can be input into the supervised classification model for classification, and the results averaged, which makes the detection result more accurate.
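Splitting a login stream into fixed-interval batches might look like the sketch below. The raw data described in this document carries no timestamp field, so the `ts` key (seconds since some epoch) is a hypothetical addition, as is the function name `split_into_batches`.

```python
from collections import defaultdict

def split_into_batches(records, interval_seconds=300):
    """Group login records into consecutive windows of interval_seconds.

    records: iterable of dicts with a hypothetical "ts" key holding a
    timestamp in seconds. Returns the non-empty batches in time order.
    """
    batches = defaultdict(list)
    for record in records:
        # Records whose timestamps fall in the same window share a key.
        batches[record["ts"] // interval_seconds].append(record)
    return [batches[key] for key in sorted(batches)]
```

Each returned batch can then be turned into per-terminal feature vectors independently of the others.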

In one embodiment, the first feature vector further includes a login count factor and/or a login success count factor.

The step of constructing the first feature vector for each terminal identifier from the raw data and the reconstructed data further includes: computing, from the login results, the login count and/or login success count for the IP address within the preset time period.

The step of determining the first feature vector from the average of the address distances, the login success rate, and the user identifier count includes: determining the first feature vector from the average of the address distances, the login success rate, the user identifier count, and the login count and/or login success count.

In this embodiment, the login count factor is data related to the login count, or data that can be determined from it. The login success count factor is data related to the login success count, or data that can be determined from it. In a relatively simple embodiment, the login count can be used directly as the login count factor, and the login success count directly as the login success count factor.

In one specific embodiment, the data structure of the first feature vector is as shown in Table 1, and includes the user identifier count, the average address distance, the login count, the login success count, and the login success rate.

Table 1: First feature vector

Feature name	Description
uin_num	User identifier count (deduplicated)
dist_avg	Average address distance
login_cnt	Login count
login_succ_cnt	Login success count
login_succ_rate	Login success rate
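The five features of Table 1 can be computed per IP address roughly as follows. This is a sketch under stated assumptions: the record keys reuse the field names `ip`, `uin`, `succ`, and `dist` from the tables in this document, the function name `build_feature_vectors` is hypothetical, and averaging the distance over login records (rather than over distinct users) is an assumption the text does not settle.

```python
from collections import defaultdict

def build_feature_vectors(records):
    """Build Table 1 style feature vectors for one time window.

    records: iterable of dicts with keys "ip", "uin", "succ", "dist".
    Returns {ip: [uin_num, dist_avg, login_cnt, login_succ_cnt, login_succ_rate]}.
    """
    rows_by_ip = defaultdict(list)
    for record in records:
        rows_by_ip[record["ip"]].append(record)

    features = {}
    for ip, rows in rows_by_ip.items():
        login_cnt = len(rows)
        login_succ_cnt = sum(1 for r in rows if r["succ"])
        uin_num = len({r["uin"] for r in rows})               # deduplicated
        dist_avg = sum(r["dist"] for r in rows) / login_cnt   # per-login mean
        features[ip] = [uin_num, dist_avg, login_cnt,
                        login_succ_cnt, login_succ_cnt / login_cnt]
    return features
```

An IP address used by automated login attempts would be expected to show a high `uin_num`, a large `dist_avg`, and a low `login_succ_rate` at the same time, which is the pattern the classifier is meant to pick up.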

In one specific embodiment, the data structure of the raw data is as shown in Table 2. The terminal identifier is an IP address, the login result uses a Boolean value to indicate whether the login succeeded, and the login address uses a 32-bit integer login city number to indicate the city of the login.

Table 2: Raw login data

Field name	Description	Data type
ip	IP address	32-bit integer
uin	User identifier	64-bit integer
succ	Login result (whether the login succeeded)	Boolean
city	Login city number	32-bit integer
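A record with the Table 2 layout could be represented as below. The class name `LoginRecord` and helper `make_record` are hypothetical; the conversion from dotted IPv4 text to a 32-bit integer uses Python's standard `ipaddress` module, and the assumption that the `ip` field holds an IPv4 address follows from its 32-bit type.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class LoginRecord:
    ip: int     # IPv4 address as a 32-bit integer
    uin: int    # user identifier (64-bit integer)
    succ: bool  # whether the login succeeded
    city: int   # login city number (32-bit integer)

def make_record(ip_text, uin, succ, city):
    """Build a LoginRecord from a dotted-quad IPv4 string."""
    return LoginRecord(int(ipaddress.IPv4Address(ip_text)), uin, succ, city)
```

Storing the address as an integer keeps grouping by terminal identifier cheap; `ipaddress.IPv4Address(record.ip)` converts it back for display.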

In one specific embodiment, the data structure of the reconstructed data is as shown in Table 3; the user address is represented by the number of the city of the user's registered residence.

Table 3: Reconstructed data

Field name	Description	Data type
home	City ID of the user's registered residence	32-bit integer
city_lat	Login latitude	32-bit float
city_lng	Login longitude	32-bit float
home_lat	User latitude	32-bit float
home_lng	User longitude	32-bit float
dist	City distance	32-bit float

In one embodiment, after the step of inputting the first feature vector into the supervised classification model to determine whether the terminal identifier is a malicious identifier, the method further includes: when the terminal identifier is determined to be a malicious identifier, determining the user identifiers that logged in on the terminal corresponding to the malicious identifier to be stolen user identifiers. Further account protection can then be applied to the users identified by the stolen user identifiers; for example, such a user can be prohibited from logging in, or from performing certain preset operations. This prevents a stolen account from being used, without the user's knowledge, for behavior that harms others or the user.

In one embodiment, after the step of inputting the first feature vector into the supervised classification model to determine whether the terminal identifier is a malicious identifier, the method further includes: when the terminal identifier is determined to be a malicious identifier, prohibiting user logins on the terminal corresponding to the malicious identifier, that is, prohibiting all user login operations on a terminal determined to be malicious. This stops the malicious terminal from continuing its malicious behavior.
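Collecting the stolen user identifiers for a set of malicious IP addresses can be sketched as below. The function name `stolen_user_ids` is hypothetical, and counting only successful logins as stolen is an assumption (it follows the later remark in this document that users who logged in successfully on a malicious terminal can be considered stolen).

```python
def stolen_user_ids(records, malicious_ips):
    """Return the user identifiers that logged in successfully on any
    malicious terminal.

    records: iterable of dicts with keys "ip", "uin", "succ".
    malicious_ips: iterable of IP addresses classified as malicious.
    """
    malicious = set(malicious_ips)
    return {r["uin"] for r in records if r["ip"] in malicious and r["succ"]}
```

The same `malicious` set can serve as a blocklist for rejecting further login requests from those terminals.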

In one embodiment, as shown in Fig. 5, the process of building the supervised classification model includes:

S502: obtain training samples.

S504: input the training samples into a supervised classification algorithm for training to obtain the supervised classification model.

In this way, the supervised classification model is obtained by training on samples. It is to be understood that, in this embodiment, the supervised classification model used in the classification step is the model obtained by this training.

Further, the step of obtaining training samples, i.e. step S502, may include: obtaining malicious samples, each including a second feature vector corresponding to a terminal identifier that identifies a malicious terminal; obtaining normal samples, each including a second feature vector corresponding to a terminal identifier that identifies a normal terminal; and combining the malicious samples and the normal samples to form the training samples. Because the training samples then include both normal and malicious samples, the training result is more accurate, and so is the detection result.

In one specific embodiment, a training sample consists of a second feature vector combined with a classification label. The classification label is a class label that marks whether the terminal corresponding to the second feature vector is a malicious terminal. For convenience of uniform numerical computation, the class label of a malicious terminal can be defined as 1 and the class label of a normal terminal as 0. Table 4 gives some simple examples of training samples.

Table 4: Training samples

In one embodiment, the step of inputting the training samples into a supervised classification algorithm for training to obtain the supervised classification model, i.e. step S504, includes: inputting the training samples into a multilayer neural network for training to obtain the supervised classification model. Because the data structure of the first feature vector is limited, using a multilayer neural network as the supervised classification algorithm can improve the accuracy of the supervised classification model to a greater degree, and thus make the detection result more accurate. It is to be understood that, in other embodiments, the supervised classification algorithm can be a deep neural network, which can also improve the detection result; however, because of the limited data structure of the first feature vector, its effect is poorer than that of the multilayer neural network.

In one embodiment, the activation function used by the middle layers of the multilayer neural network is the rectified linear unit (ReLU) activation function. This helps prevent vanishing gradients during training and improves the accuracy of the training result, that is, the accuracy of the supervised classification model, and thus makes the detection result more accurate. It is to be understood that, in other embodiments, the middle layers of the multilayer neural network can use other activation functions, such as the sigmoid function or the softmax activation function.

In one embodiment, the activation function used by the output layer of the multilayer neural network is the softmax activation function. This keeps the output within the range 0 to 1 and can improve the accuracy of the training result, that is, the accuracy of the supervised classification model, and thus makes the detection result more accurate. It is to be understood that, in other embodiments, the output layer of the multilayer neural network can use other activation functions, such as the sigmoid function or the ReLU activation function.

In one embodiment, as shown in Fig. 6, the step of inputting the training samples into the multilayer neural network for training to obtain the supervised classification model includes:

S602: input the training samples into the multilayer neural network for training, so that the loss function of the multilayer neural network is minimized.

The loss function can be expressed as:

Here, ξ denotes the multilayer neural network being evaluated, $\vec{x}$ is the second feature vector of an input training sample, and $\vec{y}$ is the classification label that the second feature vector of that training sample should theoretically correspond to. The data type of $\vec{y}$ can be a one-hot vector: the one-hot vector $(1,0)^T$ can represent the classification label with class label 0, and the one-hot vector $(0,1)^T$ the classification label with class label 1. The loss is built on the distance between $\xi(\vec{x})$, the classification label actually obtained with the second feature vector as input to the multilayer neural network model, and $\vec{y}$, the classification label that should theoretically be obtained.
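One loss consistent with this description, given purely as an illustrative assumption, is the summed squared Euclidean distance below, with ξ the network, $\vec{x}$ a second feature vector, and $\vec{y}$ its one-hot classification label. The choice of distance and the symbol $T$ for the training set are not taken from the source.

```latex
L(\xi) = \sum_{(\vec{x},\, \vec{y}) \in T} \left\lVert \xi(\vec{x}) - \vec{y} \right\rVert_2^2
```

Other distances, such as the cross-entropy between $\xi(\vec{x})$ and $\vec{y}$, would fit the same description.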

S604: take the weight matrices and bias vectors of the multilayer neural network at the minimum of the loss function as the weight matrices and bias vectors of the supervised classification model.

In this embodiment, minimizing the loss function is the optimization objective of the multilayer neural network, so that the network gradually fits the training samples; applying gradient descent to the optimization objective yields the supervised classification model.

In one specific embodiment, when the data structure of the output of the supervised classification model is a one-hot vector, the output can be written as $(x_1, x_2)^T$. If $x_1 > x_2$, the output is taken to be the classification label with class label 0 (normal terminal identifier); otherwise it is taken to be the classification label with class label 1 (malicious terminal identifier). Users who logged in successfully on a malicious terminal can then be considered stolen.
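The decision rule above is small enough to state directly in code. The function name `class_label` is hypothetical; the tie case falls to label 1 because the rule assigns label 0 only when the first component is strictly greater.

```python
def class_label(output):
    """Map a two-component model output (x1, x2) to a class label.

    Returns 0 (normal terminal) when x1 > x2, otherwise 1 (malicious terminal).
    """
    x1, x2 = output
    return 0 if x1 > x2 else 1
```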

In one embodiment, let $W_1, W_2, \dots, W_n$ denote the weight matrices, $\vec{b}_1, \vec{b}_2, \dots, \vec{b}_n$ the bias vectors, $f$ the activation function, $\vec{x}$ the input second feature vector, and $\vec{y}$ the classification label actually output. The multilayer neural network can then be expressed as:

$$\vec{y} = f\left(W_n \cdots f\left(W_2\, f\left(W_1 \vec{x} + \vec{b}_1\right) + \vec{b}_2\right) \cdots + \vec{b}_n\right)$$

In one embodiment, $f$ denotes the ReLU activation function and $\mathrm{softmax}$ the softmax activation function. The multilayer neural network can then be expressed as:

$$\vec{y} = \mathrm{softmax}\left(W_n \cdots f\left(W_2\, f\left(W_1 \vec{x} + \vec{b}_1\right) + \vec{b}_2\right) \cdots + \vec{b}_n\right)$$

In one embodiment, the softmax activation function can be expressed as:

$$\mathrm{softmax}(\vec{x})_i = \frac{e^{x_i}}{\sum_j e^{x_j}}$$

where $\vec{x}$ denotes the input feature vector.
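A forward pass through such a network, with ReLU on the middle layers and softmax on the output layer, can be sketched in plain Python as below. The function names and the list-of-lists representation of weight matrices are illustrative assumptions, and subtracting the maximum inside `softmax` is a standard numerical-stability step that does not change the result.

```python
import math

def relu(v):
    """Rectified linear unit applied elementwise."""
    return [max(0.0, x) for x in v]

def softmax(v):
    """Softmax of a vector; subtracting max(v) avoids overflow in exp."""
    m = max(v)
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]

def affine(W, b, v):
    """Compute W v + b for a weight matrix W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias
            for row, bias in zip(W, b)]

def forward(weights, biases, x):
    """Apply ReLU layers for all but the last (W, b) pair, then softmax."""
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = relu(affine(W, b, h))
    return softmax(affine(weights[-1], biases[-1], h))
```

The output components are positive and sum to 1, so they can be compared directly to decide between the normal and malicious classes.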

In one specific embodiment, the data processing method comprises:

Obtaining raw data, the raw data including an Internet Protocol (IP) address, a user identifier, a login result, and a login address;

Reconstructing the raw data to obtain reconstructed data, the reconstructed data including a user address and an address distance, the address distance being the distance between the login address and the user address;

Constructing, from the raw data and the reconstructed data, a first feature vector corresponding to each IP address, the first feature vector including a user identifier quantity factor, an average value factor of the address distance, a login success rate factor, and a login count factor and/or a successful login count factor;

Inputting the first feature vector into a supervised classification model and determining whether the IP address is a malicious identifier, the supervised classification model being a classification model obtained by training on training samples, each training sample including a second feature vector whose data structure is identical to that of the first feature vector;

Wherein the process of building the supervised classification model comprises:

Obtaining the training samples;

Inputting the training samples into a multilayer neural network for training, so that the loss function of the multilayer neural network is minimized;

Taking the weight matrices and bias vectors of the multilayer neural network at which the loss function is minimal as the weight matrices and bias vectors of the supervised classification model;

The step of reconstructing the raw data to obtain the reconstructed data comprises:

Determining the user address corresponding to each user identifier;

Determining the address distance from the user address and the login address;

The step of constructing, from the raw data and the reconstructed data, the first feature vector corresponding to each IP address comprises:

Obtaining the IP address;

Querying each user identifier and login result corresponding to the IP address within a preset time period, and computing, from the user identifiers and login results, the login success rate and the user identifier quantity corresponding to the IP address within the preset time period;

Counting, from the login results, the login count and/or the successful login count corresponding to the IP address within the preset time period;

Determining the first feature vector from the average value of the address distances, the login success rate, the user identifier quantity, and the login count and/or the successful login count.
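By way of illustration and not limitation, the feature vector construction in the steps above can be sketched as follows. The sketch assumes that the login records have already been restricted to the preset time period, that each record carries a previously computed address distance, and that the average is taken over login records. The record field names and the ordering of the vector components are hypothetical.

```python
from collections import defaultdict

def build_feature_vectors(records):
    """Group login records by IP address and compute one feature vector per IP.

    Each vector is [user identifier quantity, average address distance,
    login success rate, login count, successful login count].
    """
    by_ip = defaultdict(list)
    for record in records:
        by_ip[record["ip"]].append(record)

    features = {}
    for ip, rows in by_ip.items():
        login_count = len(rows)
        success_count = sum(1 for r in rows if r["success"])
        user_count = len({r["user_id"] for r in rows})
        mean_distance = sum(r["address_distance"] for r in rows) / login_count
        success_rate = success_count / login_count
        features[ip] = [user_count, mean_distance, success_rate,
                        login_count, success_count]
    return features

# Hypothetical records: one IP trying many accounts, one IP with a single normal login.
records = [
    {"ip": "203.0.113.7", "user_id": "a", "success": True, "address_distance": 100.0},
    {"ip": "203.0.113.7", "user_id": "b", "success": False, "address_distance": 200.0},
    {"ip": "203.0.113.7", "user_id": "c", "success": False, "address_distance": 300.0},
    {"ip": "203.0.113.7", "user_id": "c", "success": False, "address_distance": 400.0},
    {"ip": "198.51.100.2", "user_id": "d", "success": True, "address_distance": 2.0},
]
features = build_feature_vectors(records)
```

An IP address used by an automated tool would tend to show many distinct user identifiers, a large average address distance, and a low login success rate, which is the pattern the first example IP above imitates.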

It should be understood that although the steps in the flowcharts of Figs. 2-6 are shown in sequence as indicated by the arrows, these steps are not necessarily executed in the order the arrows indicate. Unless expressly stated herein, there is no strict restriction on the order in which these steps are executed, and they may be executed in other orders. Moreover, at least some of the steps in Figs. 2-6 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they are not necessarily executed sequentially but may be executed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in Fig. 7, a data processing device is provided, comprising:

A raw data acquisition module 701, configured to obtain raw data, the raw data including a terminal identifier, a user identifier, a login result, and a login address;

A data reconstruction module 703, configured to reconstruct the raw data to obtain reconstructed data, the reconstructed data including a user address and an address distance, the address distance being the distance between the login address and the user address;

A feature vector construction module 705, configured to construct, from the raw data and the reconstructed data, a first feature vector corresponding to each terminal identifier, the first feature vector including a user identifier quantity factor, an average value factor of the address distance, and a login success rate factor;

A malicious identifier determination module 707, configured to input the first feature vector into a supervised classification model and determine whether the terminal identifier is a malicious identifier; the supervised classification model is a classification model obtained by training on training samples, each training sample includes a second feature vector, and the data structure of the second feature vector is identical to that of the first feature vector.

The above data processing device performs detection at the behavior collection stage of the hacker's downstream industry chain. It obtains the raw data generated when a hacker uses an automated tool to verify, one by one, the account-password pairs in an account-password set, and reconstructs the raw data to obtain reconstructed data. It then constructs a first feature vector for each terminal identifier from the raw data and the reconstructed data. Finally, it inputs the first feature vector into the supervised classification model and determines whether the terminal identifier is a malicious identifier. In this way, there is no need to detect the various channels of the hacker's upstream industry chain at the client, which effectively avoids the problem of the login protocol being cracked at the client and improves detection accuracy.

In one embodiment, referring to Fig. 8, the data processing device further comprises:

A classification model training module 813, configured to obtain the training samples and input the training samples into a supervised classification algorithm for training to obtain the supervised classification model.

In one embodiment, the classification model training module 813 of the data processing device is configured to input the training samples into a multilayer neural network for training to obtain the supervised classification model.

In one embodiment, the activation function used by the intermediate layers of the multilayer neural network is the rectified linear unit (ReLU) activation function, and the activation function used by the output layer of the multilayer neural network is the softmax activation function.

In one embodiment, still referring to Fig. 8, the classification model training module 813 comprises:

A sample acquisition unit 8131, configured to obtain the training samples;

A model training unit 8132, configured to input the training samples into the multilayer neural network for training, so that the loss function of the multilayer neural network is minimized;

A model determination unit 8134, configured to take the weight matrices and bias vectors of the multilayer neural network at which the loss function is minimal as the weight matrices and bias vectors of the supervised classification model.
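By way of illustration and not limitation, the idea of training until the loss is minimal and then keeping the corresponding weights and biases can be sketched as follows. For brevity the sketch uses a single softmax layer trained by batch gradient descent on a cross-entropy loss, which is a simplification of the multilayer neural network described here. The loss function, learning rate, epoch count, and training samples are all assumptions made for the example.

```python
import math

def softmax(z):
    m = max(z)
    exps = [math.exp(v - m) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def predict(W, b, x):
    # Output of a single softmax layer: softmax(W x + b).
    return softmax([sum(w * xi for w, xi in zip(row, x)) + bi
                    for row, bi in zip(W, b)])

def mean_cross_entropy(W, b, samples):
    # Average negative log-probability assigned to the true label.
    return -sum(math.log(predict(W, b, x)[y]) for x, y in samples) / len(samples)

def train(samples, n_features, n_classes=2, lr=0.5, epochs=200):
    W = [[0.0] * n_features for _ in range(n_classes)]
    b = [0.0] * n_classes
    best = (float("inf"), None, None)
    n = len(samples)
    for _ in range(epochs):
        grad_W = [[0.0] * n_features for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, y in samples:
            p = predict(W, b, x)
            for k in range(n_classes):
                err = p[k] - (1.0 if k == y else 0.0)
                grad_b[k] += err
                for j in range(n_features):
                    grad_W[k][j] += err * x[j]
        for k in range(n_classes):
            b[k] -= lr * grad_b[k] / n
            for j in range(n_features):
                W[k][j] -= lr * grad_W[k][j] / n
        loss = mean_cross_entropy(W, b, samples)
        if loss < best[0]:
            # Keep a copy of the parameters at the lowest loss seen so far.
            best = (loss, [row[:] for row in W], b[:])
    return best

# Hypothetical training samples: (scaled second feature vector, label).
samples = [
    ([0.1, 0.1, 0.9], 0),
    ([0.2, 0.0, 0.8], 0),
    ([0.9, 0.8, 0.1], 1),
    ([0.8, 0.9, 0.2], 1),
]
best_loss, W, b = train(samples, n_features=3)
```

The returned weights and biases play the role of the parameters of the supervised classification model. A real implementation would use a multilayer network and a held-out validation set.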

In one embodiment, referring to Fig. 9, the data reconstruction module 903 comprises:

A user address determination unit 9032, configured to determine, from the user identifiers, the user address corresponding to each user identifier;

An address distance determination unit 9034, configured to determine the address distance from the user address and the login address.
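By way of illustration and not limitation, the two units above can be sketched as follows. The sketch assumes that addresses are (latitude, longitude) pairs, that the user address is the most frequent location in the user's login history, and that the address distance is the great-circle (haversine) distance in kilometres. These choices, the function names, and the field names are hypothetical and are not specified in this passage.

```python
import math
from collections import Counter

EARTH_RADIUS_KM = 6371.0

def usual_address(login_history):
    # Assumption: the user address is the most frequent past login location.
    return Counter(login_history).most_common(1)[0][0]

def haversine_km(a, b):
    # Great-circle distance between two (latitude, longitude) points in degrees.
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def reconstruct(record, history_by_user):
    # Adds the user address and the address distance to a raw login record.
    user_address = usual_address(history_by_user[record["user_id"]])
    distance = haversine_km(record["login_address"], user_address)
    return {**record, "user_address": user_address, "address_distance": distance}

# Hypothetical data: a user who usually logs in near (22.54, 114.06).
history_by_user = {"a": [(22.54, 114.06)] * 3 + [(39.90, 116.40)]}
record = {"ip": "203.0.113.7", "user_id": "a", "success": True,
          "login_address": (39.90, 116.40)}
row = reconstruct(record, history_by_user)
```

A coarser notion of distance, for example whether the login city differs from the usual city, would fit the same structure.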

In one embodiment, the terminal identifier includes an Internet Protocol address, and the feature vector construction module 905 comprises:

A network address acquisition unit 9052, configured to obtain the Internet Protocol address;

A data statistics unit 9054, configured to query each user identifier and login result corresponding to the Internet Protocol address within a preset time period, and to compute, from the user identifiers and login results, the login success rate and the user identifier quantity corresponding to the Internet Protocol address within the preset time period;

A feature vector determination unit 9056, configured to determine the first feature vector from the average value of the address distances, the login success rate, and the user identifier quantity.

In one embodiment, the first feature vector further includes a login count factor and/or a successful login count factor. This covers three cases: (1) the first feature vector includes the user identifier quantity factor, the average value factor of the address distance, the login success rate factor, and the login count factor; (2) the first feature vector includes the user identifier quantity factor, the average value factor of the address distance, the login success rate factor, and the successful login count factor; (3) the first feature vector includes the user identifier quantity factor, the average value factor of the address distance, the login success rate factor, the login count factor, and the successful login count factor.

The feature vector construction module 905 further comprises:

A login count statistics unit 9058, configured to count, from the login results, the login count and/or the successful login count corresponding to the Internet Protocol address within the preset time period;

The feature vector determination unit 9056 is configured to construct the first feature vector from the average value of the address distances, the login success rate, the user identifier quantity, and the login count and/or the successful login count.

In one embodiment, referring to Fig. 10, the device further comprises:

A stolen identifier determination module 1015, configured to, when the malicious identifier determination module determines that the terminal identifier is a malicious identifier, determine the user identifiers logged in on the terminal corresponding to the malicious identifier as stolen user identifiers.

In one embodiment, still referring to Fig. 10, the device further comprises:

A malicious identifier disabling module 1017, configured to, when the malicious identifier determination module determines that the terminal identifier is a malicious identifier, forbid the terminal corresponding to the malicious identifier from performing user login.
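By way of illustration and not limitation, the two follow-up actions above can be sketched as follows. The sketch assumes that the classifier has already produced a label per IP address (1 for malicious) and that only successful logins are counted as stolen accounts. The function names and record fields are hypothetical.

```python
def handle_malicious(records, labels):
    """Return the set of malicious IPs and the set of user identifiers deemed stolen.

    records: login records with "ip", "user_id", and "success" fields.
    labels: mapping from IP address to class label (0 normal, 1 malicious).
    """
    malicious_ips = {ip for ip, label in labels.items() if label == 1}
    stolen_users = {r["user_id"] for r in records
                    if r["ip"] in malicious_ips and r["success"]}
    return malicious_ips, stolen_users

def allow_login(ip, blocked_ips):
    # A login attempt is refused when it comes from a blocked (malicious) IP.
    return ip not in blocked_ips

# Hypothetical records and classifier output.
records = [
    {"ip": "203.0.113.7", "user_id": "a", "success": True},
    {"ip": "203.0.113.7", "user_id": "b", "success": False},
    {"ip": "198.51.100.2", "user_id": "d", "success": True},
]
labels = {"203.0.113.7": 1, "198.51.100.2": 0}
blocked_ips, stolen_users = handle_malicious(records, labels)
```

The two actions are independent, so either one can be applied without the other.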

For specific limitations on the data processing device, refer to the limitations on the data processing method above, which are not repeated here. Each module in the above data processing device may be implemented wholly or partly in software, hardware, or a combination thereof. Each module may be embedded in, or independent of, a processor of the computer device in hardware form, or stored in a memory of the computer device in software form, so that the processor can invoke it to perform the operations corresponding to that module.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure may be as shown in Fig. 11. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores the raw data and the reconstructed data. The network interface of the computer device communicates with external terminals over a network connection. When executed by the processor, the computer program implements a data processing method.

Those skilled in the art will understand that the structure shown in Fig. 11 is only a block diagram of the part of the structure relevant to the solution of this application and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

In one embodiment, a computer device is provided, including a memory and a processor. The memory stores a computer program, and the processor, when executing the computer program, implements the steps of the data processing method described above.

In one embodiment, the process of building the supervised classification model comprises:

Obtaining the training samples;

Inputting the training samples into a supervised classification algorithm for training to obtain the supervised classification model.

In one embodiment, the step of inputting the training samples into a supervised classification algorithm for training to obtain the supervised classification model comprises:

Inputting the training samples into a multilayer neural network for training to obtain the supervised classification model.

In one embodiment, the step of inputting the training samples into a multilayer neural network for training to obtain the supervised classification model comprises:

Inputting the training samples into the multilayer neural network for training, so that the loss function of the multilayer neural network is minimized;

Taking the weight matrices and bias vectors of the multilayer neural network at which the loss function is minimal as the weight matrices and bias vectors of the supervised classification model.

In one embodiment, the step of reconstructing the raw data to obtain the reconstructed data comprises:

Determining the user address corresponding to each user identifier;

Determining the address distance from the user address and the login address.

In one embodiment, the terminal identifier includes an Internet Protocol address, and the step of constructing, from the raw data and the reconstructed data, the first feature vector corresponding to each terminal identifier comprises:

Obtaining the Internet Protocol address;

Querying each user identifier and login result corresponding to the Internet Protocol address within a preset time period, and computing, from the user identifiers and login results, the login success rate and the user identifier quantity corresponding to the Internet Protocol address within the preset time period;

Determining the first feature vector from the average value of the address distances, the login success rate, and the user identifier quantity.

In one embodiment, the first feature vector further includes a login count factor and/or a successful login count factor;

The step of constructing, from the raw data and the reconstructed data, the first feature vector corresponding to each terminal identifier further comprises: counting, from the login results, the login count and/or the successful login count corresponding to the Internet Protocol address within the preset time period;

The step of determining the first feature vector from the average value of the address distances, the login success rate, and the user identifier quantity comprises: determining the first feature vector from the average value of the address distances, the login success rate, the user identifier quantity, and the login count and/or the successful login count.

In one embodiment, after the step of inputting the first feature vector into the supervised classification model and determining whether the terminal identifier is a malicious identifier, the method further comprises:

When the terminal identifier is determined to be a malicious identifier, determining the user identifiers logged in on the terminal corresponding to the malicious identifier as stolen user identifiers;

And/or

When the terminal identifier is determined to be a malicious identifier, forbidding the terminal corresponding to the malicious identifier from performing user login.

In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored. When executed by a processor, the computer program implements the steps of the data processing method described above.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described. However, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.

The above embodiments express only several implementations of this application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of this application, and these all fall within the scope of protection of this application. Therefore, the scope of protection of this patent application shall be subject to the appended claims.

Claims

1. A data processing method, the method comprising:

Obtaining raw data, the raw data including a terminal identifier, a user identifier, a login result, and a login address;

Reconstructing the raw data to obtain reconstructed data, the reconstructed data including a user address and an address distance, the address distance being the distance between the login address and the user address;

Constructing, from the raw data and the reconstructed data, a first feature vector corresponding to each terminal identifier, the first feature vector including a user identifier quantity factor, an average value factor of the address distance, and a login success rate factor;

Inputting the first feature vector into a supervised classification model and determining whether the terminal identifier is a malicious identifier, the supervised classification model being a classification model obtained by training on training samples, each training sample including a second feature vector, and the data structure of the second feature vector being identical to that of the first feature vector.

2. The method according to claim 1, wherein the process of building the supervised classification model comprises:

Obtaining the training samples;

Inputting the training samples into a supervised classification algorithm for training to obtain the supervised classification model.

3. The method according to claim 2, wherein the step of inputting the training samples into a supervised classification algorithm for training to obtain the supervised classification model comprises:

Inputting the training samples into a multilayer neural network for training to obtain the supervised classification model.

4. The method according to claim 3, wherein the step of inputting the training samples into a multilayer neural network for training to obtain the supervised classification model comprises:

Inputting the training samples into the multilayer neural network for training, so that the loss function of the multilayer neural network is minimized;

Taking the weight matrices and bias vectors of the multilayer neural network at which the loss function is minimal as the weight matrices and bias vectors of the supervised classification model.

5. The method according to claim 1, wherein the step of reconstructing the raw data to obtain the reconstructed data comprises:

Determining the user address corresponding to each user identifier;

Determining the address distance from the user address and the login address.

6. The method according to claim 5, wherein the terminal identifier includes an Internet Protocol address, and the step of constructing, from the raw data and the reconstructed data, the first feature vector corresponding to each terminal identifier comprises:

Obtaining the Internet Protocol address;

Querying each user identifier and login result corresponding to the Internet Protocol address within a preset time period, and computing, from the user identifiers and login results, the login success rate and the user identifier quantity corresponding to the Internet Protocol address within the preset time period;

Determining the first feature vector from the average value of the address distances, the login success rate, and the user identifier quantity.

7. The method according to claim 6, wherein the first feature vector further includes a login count factor and/or a successful login count factor;

The step of constructing, from the raw data and the reconstructed data, the first feature vector corresponding to each terminal identifier further comprises: counting, from the login results, the login count and/or the successful login count corresponding to the Internet Protocol address within the preset time period;

The step of determining the first feature vector from the average value of the address distances, the login success rate, and the user identifier quantity comprises: determining the first feature vector from the average value of the address distances, the login success rate, the user identifier quantity, and the login count and/or the successful login count.

8. The method according to any one of claims 1 to 7, wherein after the step of inputting the first feature vector into the supervised classification model and determining whether the terminal identifier is a malicious identifier, the method further comprises:

When the terminal identifier is determined to be a malicious identifier, determining the user identifiers logged in on the terminal corresponding to the malicious identifier as stolen user identifiers;

And/or

When the terminal identifier is determined to be a malicious identifier, forbidding the terminal corresponding to the malicious identifier from performing user login.

9. A data processing device, the device comprising:

A raw data acquisition module, configured to obtain raw data, the raw data including a terminal identifier, a user identifier, a login result, and a login address;

A data reconstruction module, configured to reconstruct the raw data to obtain reconstructed data, the reconstructed data including a user address and an address distance, the address distance being the distance between the login address and the user address;

A feature vector construction module, configured to construct, from the raw data and the reconstructed data, a first feature vector corresponding to each terminal identifier, the first feature vector including a user identifier quantity factor, an average value factor of the address distance, and a login success rate factor;

A malicious identifier determination module, configured to input the first feature vector into a supervised classification model and determine whether the terminal identifier is a malicious identifier, the supervised classification model being a classification model obtained by training on training samples, each training sample including a second feature vector, and the data structure of the second feature vector being identical to that of the first feature vector.

10. The device according to claim 9, further comprising:

A classification model training module, configured to obtain the training samples and input the training samples into a multilayer neural network for training to obtain the supervised classification model.

11. The device according to claim 10, wherein the classification model training module comprises:

A sample acquisition unit, configured to obtain the training samples;

A model training unit, configured to input the training samples into the multilayer neural network for training, so that the loss function of the multilayer neural network is minimized;

A model determination unit, configured to take the weight matrices and bias vectors of the multilayer neural network at which the loss function is minimal as the weight matrices and bias vectors of the supervised classification model.

12. The device according to claim 9, wherein the data reconstruction module comprises:

A user address determination unit, configured to determine the user address corresponding to each user identifier;

An address distance determination unit, configured to determine the address distance from the user address and the login address;

The terminal identifier includes an Internet Protocol address; the first feature vector further includes a login count factor and/or a successful login count factor; the feature vector construction module comprises:

A network address acquisition unit, configured to obtain the Internet Protocol address;

A data statistics unit, configured to query each user identifier and login result corresponding to the Internet Protocol address within a preset time period, and to compute, from the user identifiers and login results, the login success rate and the user identifier quantity corresponding to the Internet Protocol address within the preset time period;

A login count statistics unit, configured to count, from the login results, the login count and/or the successful login count corresponding to the Internet Protocol address within the preset time period;

A feature vector determination unit, configured to construct the first feature vector from the average value of the address distances, the login success rate, the user identifier quantity, and the login count and/or the successful login count.

13. The device according to any one of claims 9 to 12, wherein the device further comprises:

A stolen identifier determination module, configured to, when the malicious identifier determination module determines that the terminal identifier is a malicious identifier, determine the user identifiers logged in on the terminal corresponding to the malicious identifier as stolen user identifiers;

And/or

A malicious identifier disabling module, configured to, when the malicious identifier determination module determines that the terminal identifier is a malicious identifier, forbid the terminal corresponding to the malicious identifier from performing user login.

14. A computer device, including a memory and a processor, the memory storing a computer program, wherein the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 8.

15. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.