CN110990164A - Account detection method and device and account detection model training method and device - Google Patents


Info

Publication number
CN110990164A
CN110990164A
Authority
CN
China
Prior art keywords
training
model
rpc
account
account detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911089114.7A
Other languages
Chinese (zh)
Other versions
CN110990164B (en)
Inventor
曹绍升
崔卿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN201911089114.7A priority Critical patent/CN110990164B/en
Publication of CN110990164A publication Critical patent/CN110990164A/en
Application granted granted Critical
Publication of CN110990164B publication Critical patent/CN110990164B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/547Remote procedure calls [RPC]; Web services
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • G06N20/10Machine learning using kernel methods, e.g. support vector machines [SVM]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/088Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

The embodiments of this specification provide an account detection method and device, and an account detection model training method and device. Because the RPC units within the RPC sequences used to train the unsupervised learning model are interrelated, the model parameters obtained by unsupervised training can serve as initial values in the account detection model training process, so that the trained account detection model achieves higher accuracy when predicting account security.

Description

Account detection method and device and account detection model training method and device
Technical Field
The specification relates to the technical field of artificial intelligence, in particular to an account detection method and device and an account detection model training method and device.
Background
In recent years, it has become common for criminals to obtain illegal gains by stealing users' accounts, which harms users' legitimate interests. Therefore, to ensure account security and protect users' legitimate rights and interests, it is necessary to detect whether an account is secure.
Disclosure of Invention
Based on the above, the embodiment of the specification provides an account detection method and device, and an account detection model training method and device.
According to a first aspect of embodiments herein, there is provided an account detection method, the method comprising:
the method comprises the steps of obtaining an RPC sequence of an account to be detected, wherein the RPC sequence is used for recording a historical operation set generated by the account to be detected;
inputting the RPC sequence into a pre-trained account detection model to detect the security of the account to be detected; the account detection model takes model parameters of an unsupervised learning model which is trained in advance as initial model parameters in the training process, and the unsupervised learning model is trained in advance through a first training RPC sequence.
According to a second aspect of embodiments herein, there is provided a method for training an account detection model, the method comprising:
training an unsupervised learning model through a first training RPC sequence to obtain model parameters of the unsupervised learning model;
and taking the model parameters as initial model parameters of the account detection model, taking a second training RPC sequence as the input of the account detection model, and taking a label of the second training RPC sequence as the output of the account detection model so as to train the account detection model.
According to a third aspect of embodiments herein, there is provided an account detection apparatus, the apparatus comprising:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring an RPC sequence of an account to be detected, and the RPC sequence is used for recording a historical operation set generated by the account to be detected;
the detection module is used for inputting the RPC sequence into a pre-trained account detection model so as to detect the security of the account to be detected; the account detection model takes model parameters of an unsupervised learning model which is trained in advance as initial model parameters in the training process, and the unsupervised learning model is trained in advance through a first training RPC sequence.
According to a fourth aspect of embodiments herein, there is provided an account detection model training apparatus, the apparatus including:
the first training module is used for training an unsupervised learning model through a first training RPC sequence to obtain model parameters of the unsupervised learning model;
and the second training module is used for taking the model parameters as initial model parameters of the account detection model, taking a second training RPC sequence as the input of the account detection model, and taking the label of the second training RPC sequence as the output of the account detection model so as to train the account detection model.
According to a fifth aspect of embodiments herein, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of the embodiments.
According to a sixth aspect of the embodiments of the present specification, there is provided a computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any embodiment when executing the program.
By applying the scheme of the embodiment of the specification, model parameters of the unsupervised learning model are trained in advance through the first training RPC sequence, then the model parameters are used as initial model parameters to train the account detection model, and finally the account detection model is used for carrying out account safety prediction. Each RPC unit in the RPC sequence used in the unsupervised learning model training process is related, so that model parameters obtained by unsupervised training are used as initial values to be applied to the account detection model training process, and the trained account detection model obtains higher accuracy in the account safety prediction process.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the specification.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present specification and together with the description, serve to explain the principles of the specification.
Fig. 1 is a flowchart of an account detection method according to an embodiment of the present disclosure.
Fig. 2 is a schematic diagram of a dimension reduction process according to an embodiment of the present specification.
Fig. 3 is a flow chart of an unsupervised training process according to an embodiment of the present disclosure.
Fig. 4 is a flow chart of a supervised training process in one embodiment of the present description.
Fig. 5 is a flowchart of a training method of an account detection model according to an embodiment of the present disclosure.
Fig. 6 is a block diagram of an account detection apparatus according to an embodiment of the present specification.
Fig. 7 is a block diagram of an account detection model training apparatus according to an embodiment of the present specification.
FIG. 8 is a schematic diagram of a computer device for implementing methods of embodiments of the present description.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the specification, as detailed in the appended claims.
The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, these information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, the first information may also be referred to as second information, and similarly, the second information may also be referred to as first information, without departing from the scope of the present specification. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context.
As shown in fig. 1, an embodiment of the present specification provides an account detection method, which may include:
step S102: the method comprises the steps of obtaining an RPC sequence of an account to be detected, wherein the RPC sequence is used for recording a historical operation set generated by the account to be detected;
step S104: inputting the RPC sequence into a pre-trained account detection model to detect the security of the account to be detected; the account detection model takes model parameters of an unsupervised learning model which is trained in advance as initial model parameters in the training process, and the unsupervised learning model is trained in advance through a first training RPC sequence.
For step S102, a Remote Procedure Call (RPC) is a protocol for requesting a service from a program on a remote computer over a network, without needing to understand the underlying network technology. An RPC sequence includes a plurality of RPC units, each of which is usually a specific character-string encoding that represents a specific operation.
The account to be detected may be an account of an APP (for example, an Alipay account or a DingTalk account), an account of a network platform (for example, a microblog account), an account of a website, or another type of account. Taking a shopping website or shopping-APP account as an example, in an account detection scenario the RPC sequence may record the set of historical operations generated by the account within a period of time. A real-world RPC sequence might be { login, modify password, modify account name, check in, search for goods, purchase goods }. Of course, those skilled in the art will appreciate that this RPC sequence is only an example, and the embodiments of this specification are not limited thereto. The security of an account can be analyzed through its RPC sequence.
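The representation described above can be sketched in code. This is a minimal illustration assuming hypothetical unit names; the actual string encodings used in practice are not given by this specification.

```python
# Hedged sketch: an account's historical operations represented as an RPC
# sequence of string-encoded units. The unit names are illustrative only.
rpc_sequence = [
    "login",
    "modify_password",
    "modify_account_name",
    "check_in",
    "search_goods",
    "purchase_goods",
]

# Give each distinct RPC unit an integer id so it can later be embedded.
vocab = {unit: idx for idx, unit in enumerate(sorted(set(rpc_sequence)))}
encoded = [vocab[u] for u in rpc_sequence]  # a list of 6 ids
```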
In the traditional approach, different RPC units are manually classified according to the specific meaning of each unit in the RPC sequence, summarizing knowledge from a business perspective. As the variety of RPC units keeps growing, such manual summarization consumes a great deal of time; moreover, manually summarized knowledge cannot deeply characterize the intrinsic properties of the RPC units.
For ease of understanding, consider a risk-control scenario with the following RPC sequence: "… login; modify password; return verification error; modify password; return verification error; modify password; return verification error; modify password; return verification error; …". Here, the risk-control system may detect an account anomaly. The traditional method is to manually summarize such specific patterns, but the number of RPC sequences keeps growing and new patterns emerge continuously, so manual summarization can hardly cover them all. To address this, a classification model from machine learning can be used, treating each distinct RPC unit as a feature; the drawback of this method is that it cannot characterize the intrinsic relationships between RPC units, since different RPC units are simply treated as different features.
To characterize the internal relationships between different RPC units, in step S104 the RPC sequence can be input into a pre-trained account detection model. This model is trained using, as initial model parameters, the model parameters of an unsupervised learning model pre-trained on the first training RPC sequences. Each first training RPC sequence carries a certain meaning; that is, there are relationships between the RPC units within it. Training the unsupervised learning model on the first training RPC sequences exploits exactly these internal relationships, so applying the resulting parameters as initial model parameters in the account detection model's training allows the trained account detection model to capture the internal relationships between RPC units, making the detection result more accurate.
When training the account detection model, the model parameters of the unsupervised learning model can be used as the initial model parameters of the account detection model, the second training RPC sequence is used as the input of the account detection model, and the label of the second training RPC sequence is used as the output of the account detection model to train the account detection model.
The second training RPC sequence is labeled training data, and the label indicates whether the account that generated the sequence is at risk, i.e., whether the account's security is high or low. If the first training RPC sequences include both labeled and unlabeled sequences, the unsupervised learning model can be trained on all of the first training RPC sequences (ignoring the labels), and the labeled sequences among them can then be used as the second training RPC sequences for training the account detection model.
Before training the unsupervised learning model, each RPC unit in the first training RPC sequence may be converted into a vector in a common vector space with a fixed dimension, for example 100 dimensions. The vectors corresponding to one RPC sequence may then be input into the unsupervised learning model as one piece of training data, and the unsupervised learning model trained toward a preset optimization target. One optional optimization target is minimization of a loss function; of course, other conditions may also be adopted as the optimization target, which the embodiments of this specification do not limit.
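As a rough sketch of this conversion step, the following maps RPC unit ids to fixed-dimension vectors (100 dimensions, matching the example above). The random table stands in for the trainable embedding; the names and values are assumptions for illustration, not the actual model.

```python
import numpy as np

# Stand-in embedding table: one 100-dim vector per RPC unit id.
rng = np.random.default_rng(0)
vocab_size, dim = 50, 100
embedding_table = rng.normal(0.0, 0.1, size=(vocab_size, dim))

def embed(rpc_ids):
    """Look up the vector for each RPC unit id in a sequence."""
    return embedding_table[rpc_ids]

# One RPC sequence becomes one piece of training data: a (length, 100) array.
seq = [3, 7, 7, 12]
vectors = embed(seq)  # shape (4, 100); repeated units share a vector
```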
The vector can be directly used in tasks such as intention recognition, commodity recommendation and the like. Intent recognition is the recognition of the purpose of a user to perform an operation. For example, the purpose of performing a "modify password" operation may be to modify the original password to a new password; the purpose of performing a "search" operation may be to obtain information of some object (e.g., a commodity); the intent to perform the "buy" operation is to purchase the item. The item recommendation may or may not be based on the intention recognition, for example, the user performs an operation of purchasing running shoes, and the sports-related item may be recommended to the user.
The vectors corresponding to the RPC units may also be reduced in dimension and mapped into a common two-dimensional space to obtain a planar visualization of each vector. The distance between two vectors in the two-dimensional space characterizes their similarity, allowing business personnel to analyze the data intuitively. FIG. 2 shows a planar visualization of one embodiment, in which RPC_1 to RPC_9 are the two-dimensional vectors corresponding to different RPC units, and the two circles are partial enlargements. In the circle at the top left, the vectors for RPC_1 and RPC_5 lie close together, meaning their similarity is relatively high; likewise, the vectors for RPC_2, RPC_3, and RPC_4 are highly similar to one another but have low similarity to the vectors for RPC_1 and RPC_5. The relationships among the vectors in the bottom-right circle are analogous and are not described further here.
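The dimension-reduction step can be sketched with a simple PCA projection to two dimensions. The specification does not name a particular technique, so PCA here is an assumption; t-SNE or similar would also fit the description.

```python
import numpy as np

# Stand-ins for the trained vectors of RPC_1 .. RPC_9 (100 dims each).
rng = np.random.default_rng(1)
vectors = rng.normal(size=(9, 100))

# PCA via SVD: project onto the top-2 principal directions.
centered = vectors - vectors.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
points_2d = centered @ vt[:2].T  # shape (9, 2): one point per RPC unit

def closer(i, j, k):
    """True if unit i lies nearer to unit j than to unit k in the plot,
    i.e. i is more similar to j under the distance-as-similarity reading."""
    return (np.linalg.norm(points_2d[i] - points_2d[j])
            < np.linalg.norm(points_2d[i] - points_2d[k]))
```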
It should be noted that the model parameters of the trained unsupervised learning model include not only the parameters inside the model but also its input parameters (i.e., the vectors generated from the RPC units). That is, while training the unsupervised learning model, both the internal parameters and the vectors themselves can be adjusted. After the unsupervised training is complete, the adjusted vectors and the internal parameters together serve as the initial model parameters for training the account detection model.
During the process of training the unsupervised learning model, the first training RPC sequence may be input into the unsupervised learning model; randomly hiding at least one RPC unit in the first training RPC sequence, and predicting the hidden at least one RPC unit through the unsupervised learning model; and training the unsupervised learning model according to the prediction result.
For example, assuming the first training RPC sequence is { RPC_1, RPC_2, RPC_3, ……, RPC_N }, RPC_3 can be hidden and the remaining RPC units input into the unsupervised learning model, which then predicts the hidden unit. In the earlier risk-control example, the unsupervised learning model might predict whether the hidden RPC_3 is "login" or "modify password"; that is, it outputs the probability of each and concludes "login" if that probability is greater. In practice, two RPC units (for example, RPC_1 and RPC_3) or more than two may also be hidden at random, which the embodiments of this specification do not limit.
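The random-hiding step described above can be sketched as a small helper that masks units and records the prediction targets. The mask token and function name are hypothetical, introduced only for illustration.

```python
import random

MASK = "<mask>"

def hide_units(seq, n_hidden=1, seed=None):
    """Randomly replace n_hidden RPC units with a mask token.
    Returns the masked sequence plus a {position: original unit} map,
    which serves as the prediction target for unsupervised training."""
    rng = random.Random(seed)
    positions = rng.sample(range(len(seq)), n_hidden)
    masked = list(seq)
    targets = {}
    for p in positions:
        targets[p] = masked[p]
        masked[p] = MASK
    return masked, targets

seq = ["login", "modify_password", "return_error", "modify_password"]
masked, targets = hide_units(seq, n_hidden=1, seed=42)
```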
In one embodiment, the unsupervised learning model is a machine learning model composed of several LSTM (Long Short-Term Memory network) layers. In another embodiment, the account detection model is likewise a machine learning model composed of several LSTM layers. Each LSTM layer comprises a plurality of LSTM units, and each LSTM unit takes as input the vector corresponding to one RPC unit. As shown in FIG. 3, the machine learning model has two LSTM layers: the lowest layer is the RPC sequence input layer, the middle two layers are the LSTM layers, and the top layer is the output layer. In this embodiment, each RPC sequence has N RPC units, and each RPC unit is fed into one LSTM unit. A large number of RPC sequences can be collected in advance and then learned in the manner of FIG. 3: one RPC unit (RPC_3 in the figure) is "hidden" at random and predicted from the other RPC units in the sequence. After many such predictions, the model parameters of the machine learning model are finally obtained according to the optimization target.
As shown in fig. 4, the parameter values obtained from unsupervised learning are used as initial values, and labels are then added for supervised learning (as shown in the last column). In this phase, no RPC unit is "hidden"; instead, the entire RPC sequence is used to predict the label, i.e., whether the account is abnormal. Once the supervised model is trained, the RPC sequence corresponding to an account's operations over a period of time can be input, and whether the account's behavior is abnormal can be predicted in the same manner as in fig. 4.
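The two-phase flow, in which unsupervised pretraining hands its parameters to supervised fine-tuning, can be sketched structurally as follows. The Model class and both training functions are stubs standing in for the LSTM networks of FIGS. 3 and 4, not the actual implementation.

```python
class Model:
    """Stub for the LSTM model; params covers both the RPC-unit vectors
    and the internal weights, per the note above."""
    def __init__(self, params=None):
        self.params = dict(params) if params else {"embeddings": 0.0, "lstm": 0.0}

def pretrain(model, unlabeled_sequences):
    # Phase 1 (FIG. 3): adjust parameters by predicting hidden RPC units.
    # The update is stubbed; a real trainer would minimise a loss here.
    model.params = {k: v + 1.0 for k, v in model.params.items()}
    return model.params

def finetune(params, labeled_sequences):
    # Phase 2 (FIG. 4): start from the pretrained params, predict labels.
    return Model(params)

pretrained = pretrain(Model(), unlabeled_sequences=[])
detector = finetune(pretrained, labeled_sequences=[])  # ready for prediction
```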
The traditional method performs only the supervised second step, and therefore does not make good use of the inherent relationships between the RPC units in an RPC sequence. The present method makes full use of a large number of RPC sequences: unsupervised training yields the initial values of the neural network, and starting supervised learning from good initial values readily produces a better result and improves prediction accuracy. Viewed as a whole, this semi-supervised approach makes full use of abundant RPC sequences and limited labeled information.
Fig. 5 is a flowchart of a training method of an account detection model according to an embodiment of the present disclosure. The method may comprise:
step S502: training an unsupervised learning model through a first training RPC sequence to obtain model parameters of the unsupervised learning model;
step S504: and taking the model parameters as initial model parameters of the account detection model, taking a second training RPC sequence as the input of the account detection model, and taking a label of the second training RPC sequence as the output of the account detection model so as to train the account detection model.
In the embodiments of this specification, the account detection model is trained using, as initial model parameters, the model parameters of an unsupervised learning model trained on the first training RPC sequences. Each first training RPC sequence carries a certain meaning; that is, there are relationships between the RPC units within it. Training the unsupervised learning model on the first training RPC sequences exploits exactly these internal relationships, so applying the resulting parameters as initial model parameters in the account detection model's training allows the trained account detection model to capture the internal relationships between RPC units, making the detection result more accurate.
The second training RPC sequence is labeled training data, and the label indicates whether the account that generated the sequence is at risk, i.e., whether the account's security is high or low. If the first training RPC sequences include both labeled and unlabeled sequences, the unsupervised learning model can be trained on all of the first training RPC sequences (ignoring the labels), and the labeled sequences among them can then be used as the second training RPC sequences for training the account detection model.
In one embodiment, the step of training the unsupervised learning model by the first training RPC sequence comprises: inputting the first training RPC sequence into the unsupervised learning model; randomly hiding at least one RPC unit in the first training RPC sequence, and predicting the hidden at least one RPC unit through the unsupervised learning model; and training the unsupervised learning model according to the prediction result.
Before training the unsupervised learning model, each RPC unit in the first training RPC sequence may be converted into a vector in a common vector space with a fixed dimension, for example 100 dimensions. The vectors corresponding to one RPC sequence may then be input into the unsupervised learning model as one piece of training data, and the unsupervised learning model trained toward a preset optimization target. One optional optimization target is minimization of a loss function; of course, other conditions may also be adopted as the optimization target, which the embodiments of this specification do not limit.
The vector can be directly used in tasks such as intention recognition, commodity recommendation and the like. Intent recognition is the recognition of the purpose of a user to perform an operation. For example, the purpose of performing a "modify password" operation may be to modify the original password to a new password; the purpose of performing a "search" operation may be to obtain information of some object (e.g., a commodity); the intent to perform the "buy" operation is to purchase the item. The item recommendation may or may not be based on the intention recognition, for example, the user performs an operation of purchasing running shoes, and the sports-related item may be recommended to the user.
The vectors corresponding to the RPC units may also be reduced in dimension and mapped into a common two-dimensional space to obtain a planar visualization of each vector. The distance between two vectors in the two-dimensional space characterizes their similarity, allowing business personnel to analyze the data intuitively.
In one embodiment, the unsupervised learning model is a machine learning model composed of several layers of LSTM. In another embodiment, the account detection model is a machine learning model made up of several layers of LSTM.
The traditional method performs only the supervised second step, and therefore does not make good use of the inherent relationships between the RPC units in an RPC sequence. The present method makes full use of a large number of RPC sequences: unsupervised training yields the initial values of the neural network, and starting supervised learning from good initial values readily produces a better result and improves prediction accuracy. Viewed as a whole, this semi-supervised approach makes full use of abundant RPC sequences and limited labeled information.
As shown in fig. 6, a block diagram of an account detection apparatus according to an embodiment of the present disclosure may include:
an obtaining module 602, configured to obtain an RPC sequence of an account to be detected, where the RPC sequence is used to record a historical operation set generated by the account to be detected;
a detection module 604, configured to input the RPC sequence into a pre-trained account detection model to detect the security of the account to be detected; the account detection model takes model parameters of an unsupervised learning model which is trained in advance as initial model parameters in the training process, and the unsupervised learning model is trained in advance through a first training RPC sequence.
The specific details of the implementation process of the functions and actions of each module in the device are found in the implementation process of the corresponding step in the account detection method, and are not described herein again.
Fig. 7 is a block diagram of an account detection model training apparatus according to an embodiment of the present disclosure, where the apparatus may include:
a first training module 702, configured to train an unsupervised learning model through a first training RPC sequence, to obtain model parameters of the unsupervised learning model;
a second training module 704, configured to use the model parameters as initial model parameters of the account detection model, use a second training RPC sequence as an input of the account detection model, and use a label of the second training RPC sequence as an output of the account detection model, so as to train the account detection model.
The specific implementation details of the functions and roles of each module in the apparatus can be found in the implementation of the corresponding steps in the account detection model training method, and are not described again here.
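The disclosure specifies only that the pretrained unsupervised model's parameters serve as the detection model's initial values. A minimal sketch of that hand-off, with parameter names and values invented purely for illustration:

```python
# Hypothetical parameters produced by the unsupervised pretraining stage.
pretrained = {"lstm.W": [[0.1, 0.2]], "lstm.U": [[0.3]]}

def init_from_pretrained(detector_params, pretrained_params):
    """Use the unsupervised model's parameters as the detector's initial
    values; parameters without a pretrained counterpart (e.g. the new
    classification head) keep their own initialization."""
    out = dict(detector_params)
    for name, value in pretrained_params.items():
        if name in out:
            out[name] = value
    return out

# Detector has the same encoder layout plus a fresh classifier head.
detector = {"lstm.W": [[0.0, 0.0]], "lstm.U": [[0.0]], "classifier.w": [0.0]}
detector = init_from_pretrained(detector, pretrained)
```

After this initialization, supervised fine-tuning on the second training RPC sequence and its labels would update all parameters, including the freshly initialized classifier head.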
For the apparatus embodiments, since they substantially correspond to the method embodiments, reference may be made to the relevant parts of the method-embodiment descriptions. The apparatus embodiments described above are merely illustrative: the modules described as separate parts may or may not be physically separate, and the parts shown as modules may or may not be physical modules; they may be located in one place or distributed over a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in this specification. One of ordinary skill in the art can understand and implement it without inventive effort.
The apparatus embodiments of the present specification can be applied to a computer device, such as a server or a terminal device. The apparatus embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking a software implementation as an example, the apparatus, as a logical device, is formed by the processor of the computer device in which it is located reading the corresponding computer program instructions from non-volatile memory into memory and running them. In terms of hardware, fig. 8 shows a hardware structure diagram of a computer device in which the apparatus of this specification is located; in addition to the processor 802, the memory 804, the network interface 806, and the non-volatile memory 808 shown in fig. 8, the server or electronic device in which the apparatus is located may also include other hardware according to the actual functions of the computer device, which is not described again here.
Accordingly, the embodiments of the present specification also provide a computer storage medium, in which a program is stored, and the program, when executed by a processor, implements the method in any of the above embodiments.
Accordingly, the embodiments of the present specification also provide a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the program, the method in any of the above embodiments is implemented.
Embodiments of the present description may take the form of a computer program product embodied on one or more storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having program code embodied therein. Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of the storage medium of the computer include, but are not limited to: phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technologies, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium, may be used to store information that may be accessed by a computing device.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
The above description is only exemplary of the present disclosure and should not be taken as limiting the disclosure, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (11)

1. An account detection method, the method comprising:
the method comprises the steps of obtaining an RPC sequence of an account to be detected, wherein the RPC sequence is used for recording a historical operation set generated by the account to be detected;
inputting the RPC sequence into a pre-trained account detection model to detect the security of the account to be detected; wherein, during training, the account detection model uses as its initial model parameters the model parameters of an unsupervised learning model that was trained in advance through a first training RPC sequence.
2. The method of claim 1, wherein the account detection model is trained as follows:
using the model parameters of the unsupervised learning model as initial model parameters of the account detection model, using a second training RPC sequence as the input of the account detection model, and using the label of the second training RPC sequence as the target output of the account detection model, to train the account detection model.
3. The method of claim 2, wherein the unsupervised learning model is trained as follows:
inputting the first training RPC sequence into the unsupervised learning model; randomly hiding at least one RPC unit in the first training RPC sequence, and predicting the hidden at least one RPC unit through the unsupervised learning model;
and training the unsupervised learning model according to the prediction result.
4. The method of any of claims 1 to 3, wherein the unsupervised learning model and/or the account detection model is a machine learning model composed of several LSTM layers.
5. A method of training an account detection model, the method comprising:
training an unsupervised learning model through a first training RPC sequence to obtain model parameters of the unsupervised learning model;
and using the model parameters as initial model parameters of the account detection model, using a second training RPC sequence as the input of the account detection model, and using the label of the second training RPC sequence as the target output of the account detection model, to train the account detection model.
6. The method of claim 5, wherein the step of training the unsupervised learning model through a first training RPC sequence comprises:
inputting the first training RPC sequence into the unsupervised learning model; randomly hiding at least one RPC unit in the first training RPC sequence, and predicting the hidden at least one RPC unit through the unsupervised learning model;
and training the unsupervised learning model according to the prediction result.
7. The method of claim 5 or 6, wherein the unsupervised learning model and/or the account detection model is a machine learning model composed of several LSTM layers.
8. An account detection apparatus, the apparatus comprising:
the system comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring an RPC sequence of an account to be detected, and the RPC sequence is used for recording a historical operation set generated by the account to be detected;
the detection module is used for inputting the RPC sequence into a pre-trained account detection model so as to detect the security of the account to be detected; the account detection model takes model parameters of an unsupervised learning model which is trained in advance as initial model parameters in the training process, and the unsupervised learning model is trained in advance through a first training RPC sequence.
9. An account detection model training apparatus, the apparatus comprising:
the first training module is used for training an unsupervised learning model through a first training RPC sequence to obtain model parameters of the unsupervised learning model;
and the second training module is used for taking the model parameters as initial model parameters of the account detection model, taking a second training RPC sequence as the input of the account detection model, and taking the label of the second training RPC sequence as the output of the account detection model so as to train the account detection model.
10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1 to 7.
11. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 7 when executing the program.
CN201911089114.7A 2019-11-08 2019-11-08 Account detection method and device and account detection model training method and device Active CN110990164B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911089114.7A CN110990164B (en) 2019-11-08 2019-11-08 Account detection method and device and account detection model training method and device


Publications (2)

Publication Number Publication Date
CN110990164A true CN110990164A (en) 2020-04-10
CN110990164B CN110990164B (en) 2022-05-24

Family

ID=70083585


Country Status (1)

Country Link
CN (1) CN110990164B (en)


Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108418825A (en) * 2018-03-16 2018-08-17 阿里巴巴集团控股有限公司 Risk model training, rubbish account detection method, device and equipment
CN108512827A (en) * 2018-02-09 2018-09-07 世纪龙信息网络有限责任公司 The identification of abnormal login and method for building up, the device of supervised learning model
CN108681490A (en) * 2018-03-15 2018-10-19 阿里巴巴集团控股有限公司 For the vector processing method, device and equipment of RPC information
CN110009174A (en) * 2018-12-13 2019-07-12 阿里巴巴集团控股有限公司 Risk identification model training method, device and server


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113011884A (en) * 2021-01-29 2021-06-22 腾讯科技(深圳)有限公司 Account feature extraction method, device and equipment and readable storage medium
CN113011884B (en) * 2021-01-29 2023-08-04 腾讯科技(深圳)有限公司 Account feature extraction method, device, equipment and readable storage medium

Also Published As

Publication number Publication date
CN110990164B (en) 2022-05-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant