CN107872436B - Account identification method, device and system - Google Patents

Info

Publication number
CN107872436B
Authority
CN
China
Prior art keywords
account
user
identified
features
user behavior
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610857050.0A
Other languages
Chinese (zh)
Other versions
CN107872436A (en)
Inventor
柯力
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201610857050.0A priority Critical patent/CN107872436B/en
Publication of CN107872436A publication Critical patent/CN107872436A/en
Application granted granted Critical
Publication of CN107872436B publication Critical patent/CN107872436B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/316 User authentication by observing the pattern of computer usage, e.g. typical user behaviour

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • General Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Social Psychology (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses an account identification method, device and system for identifying the association between an account to be identified and a user, wherein a plurality of features can be extracted from the user behavior information of the account to be identified within a preset time length. The account identification method comprises the following steps: determining the features to be judged of the account to be identified according to the user behavior information to be judged of that account; comparing the features to be judged with each of the plurality of features to obtain a plurality of comparison results; and identifying the association between the account to be identified and the user according to the comparison results.

Description

Account identification method, device and system
Technical Field
The invention relates to the technical field of network data processing, in particular to an account identification method, device and system.
Background
With the rapid development of internet technology, people's lives increasingly depend on the internet. A user may register an account with an internet platform to use the services that platform provides. At present, once a user has registered an account on an internet platform and submitted correct identity card information (for example, the identity card number, name, identity card address, and user photo), the account is considered real-name authenticated. However, the resale and theft of accounts is currently a serious problem. In particular, some lawbreakers purchase other people's identity information, or even accounts that have already passed real-name authentication, and use them for illegal activities such as online fraud. Because the crime is committed under another person's name, the person actually responsible is difficult to locate afterwards, which allows online crime to run rampant. In addition, when an account is stolen, the internet platform cannot detect the theft in time, which may cause property loss to the user.
In the related art, accounts are monitored by means of remote-login verification; that is, when the location from which a user logs in to an account changes, the account user is prompted to verify their identity.
In summary, although the related art includes remote-login verification, it lacks a method for continuously and accurately identifying the association between an account and the account user.
Disclosure of Invention
The following is a summary of the subject matter described in detail herein. This summary is not intended to limit the scope of the claims.
The embodiment of the application provides an account identification method, device and system, which can continuously and accurately identify the relevance between an account and a user so as to improve the network security.
The embodiment of the application provides an account identification method, which is used for identifying the relevance between an account to be identified and a user, wherein a plurality of characteristics can be extracted from user behavior information of the account to be identified in a preset time, and the account identification method comprises the following steps: determining the characteristics to be judged of the account to be identified according to the user behavior information to be judged of the account to be identified; respectively comparing the characteristics to be judged of the account to be identified with the plurality of characteristics to obtain a plurality of comparison results; and identifying the relevance between the account number to be identified and the user according to a plurality of comparison results.
Identifying the association between the account to be identified and the user according to the plurality of comparison results comprises:
identifying the association between the account to be identified and the user by calculating the plurality of comparison results using a user identification model; or,
identifying the association between the account to be identified and the user by judging whether the plurality of comparison results meet a preset condition.
The account identification method further comprises the following steps: extracting a plurality of characteristics from the user behavior information of the account to be identified within a preset time length through one of the following modes:
extracting a plurality of user habit features from user behavior information of an account to be identified within a preset time;
and extracting user relationship characteristics and at least one user habit characteristic from the user behavior information of the account to be identified within a preset time.
The user behavior information comprises a plurality of user behavior characteristics, and each user behavior characteristic corresponds to one or more characteristic values;
the method comprises the following steps of extracting user habit characteristics from user behavior information of an account to be identified within a preset time length in the following mode:
determining one or more characteristic values corresponding to the user behavior characteristics from the user behavior information within the preset time length;
respectively determining a characteristic value meeting a first preset condition in the one or more characteristic values aiming at each user behavior characteristic, and determining the characteristic value meeting the first preset condition as a user habit characteristic;
wherein the first preset condition comprises at least one of: the largest number of uses; the longest duration of use; the largest weighted sum of the number of uses and the duration of use.
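The selection of a user habit feature described above can be sketched as follows. This is a minimal illustration only: the function name `habit_feature`, the `(feature_value, duration)` event format, and the weights `w_count` and `w_duration` are assumptions introduced here and are not specified by the application.

```python
from collections import defaultdict

def habit_feature(events, mode="count", w_count=1.0, w_duration=1.0):
    """Pick the habit feature value for ONE user behavior feature.

    events: iterable of (feature_value, duration) pairs (hypothetical format).
    mode:   "count"    -> value used the most times
            "duration" -> value used for the longest total duration
            "weighted" -> value with the largest weighted sum of both
    """
    counts = defaultdict(int)
    durations = defaultdict(float)
    for value, duration in events:
        counts[value] += 1
        durations[value] += duration
    if not counts:
        return None  # no behavior recorded in the preset time length

    if mode == "count":
        score = lambda v: counts[v]
    elif mode == "duration":
        score = lambda v: durations[v]
    elif mode == "weighted":
        score = lambda v: w_count * counts[v] + w_duration * durations[v]
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Ties are resolved in favor of the value seen first.
    return max(counts, key=score)

# Example: shipping addresses used within the preset time length.
addresses = [("A1", 2.0), ("A2", 10.0), ("A1", 1.0), ("A3", 0.5)]
most_used = habit_feature(addresses)                     # by number of uses
longest_used = habit_feature(addresses, mode="duration")  # by total duration
```

In this example the two conditions select different values, which is why the choice of first preset condition matters for the resulting habit feature.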
The method for extracting the user relationship characteristics from the user behavior information of the account to be identified in the preset time comprises the following steps:
respectively determining a first relation score and a second relation score according to the user behavior information within the preset time length; wherein the first relationship score comprises a relationship score between the account to be identified and each first associated feature, the first associated feature being an associated feature associated with the account to be identified; the second relation score comprises a relation score between a relation account and the associated first associated characteristic, and the relation account refers to an account associated with any first associated characteristic except the account to be identified;
determining a third relation score between the account to be identified and each relation account according to the first relation score and the second relation score;
determining the user relationship characteristics of the account to be identified according to the third relation score;
the association characteristics refer to characteristic values of user behavior characteristics capable of constructing relationships among different accounts.
Wherein the user relationship features include: the relation accounts meeting a second preset condition, wherein the second preset condition comprises: the third relation score between the relation account and the account to be identified is greater than or equal to a first threshold, or the third relation score ranks among the top N when the scores are sorted from high to low, wherein N is a positive integer.
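The relation-score steps above can be illustrated with the following sketch. The passage does not state how the third relation score is derived from the first and second relation scores, so the combination rule used here (summing, over shared association features, the product of the two scores) is purely an assumption for illustration, as are the function names and dictionary layouts.

```python
def third_scores(first, second):
    """Combine first and second relation scores into third relation scores.

    first:  {association_feature: score} for the account to be identified.
    second: {relation_account: {association_feature: score}}.
    ASSUMPTION: third score = sum of first * second over shared features.
    """
    result = {}
    for account, feature_scores in second.items():
        total = sum(first[f] * s for f, s in feature_scores.items() if f in first)
        if total > 0:
            result[account] = total
    return result

def relationship_feature(third, threshold=None, top_n=None):
    """Select relation accounts meeting the second preset condition."""
    if threshold is not None:
        # Condition 1: score greater than or equal to the first threshold.
        return {a for a, s in third.items() if s >= threshold}
    # Condition 2: score ranks among the top N, sorted from high to low.
    ranked = sorted(third, key=third.get, reverse=True)
    return set(ranked[:top_n])

first = {"phone1": 0.8, "deviceX": 0.5}
second = {"B": {"phone1": 0.5}, "C": {"deviceX": 0.2, "phone1": 0.5}}
scores = third_scores(first, second)
by_threshold = relationship_feature(scores, threshold=0.45)
by_rank = relationship_feature(scores, top_n=1)
```

Under this assumed rule, account C shares two association features with the account to be identified and therefore outranks account B.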
The step of comparing the features to be determined of the account to be identified with the plurality of features respectively to obtain a plurality of comparison results includes:
when the features to be judged comprise the user behavior features to be judged and the features comprise the user habit features corresponding to the user behavior features to be judged, comparing whether the feature values of the user behavior features to be judged are consistent with the user habit features or not to obtain a comparison result;
and when the plurality of features comprise user relationship features and the features to be judged comprise the user relationship features to be judged, calculating the Jaccard distance between the user relationship features to be judged and the user relationship features comprised in the plurality of features, and comparing the Jaccard distance with a second threshold value to obtain a comparison result.
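The two kinds of comparison described above can be sketched as follows. The Jaccard distance between two sets A and B is 1 − |A ∩ B| / |A ∪ B|. The encoding of a comparison result as 1 (consistent) or 0 (inconsistent), and the rule that a distance at or below the second threshold counts as consistent, are assumptions made for this example; the passage only states that the distance is compared with the second threshold.

```python
def compare_habit(observed_value, habit_value):
    """Return 1 if the observed feature value matches the habit feature, else 0."""
    return 1 if observed_value == habit_value else 0

def jaccard_distance(a, b):
    """Jaccard distance between two collections treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # two empty sets are treated as identical
    return 1.0 - len(a & b) / len(union)

def compare_relationship(observed, reference, second_threshold):
    """ASSUMPTION: distance <= threshold is treated as consistent (1)."""
    return 1 if jaccard_distance(observed, reference) <= second_threshold else 0

# Relation accounts observed now versus those extracted as reference features.
distance = jaccard_distance({"B", "C"}, {"C", "D"})
result = compare_relationship({"B", "C"}, {"C", "D"}, second_threshold=0.5)
```

Each comparison yields one entry in the plurality of comparison results that is later passed to the identification step.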
Before the identifying the association between the account to be identified and the user by calculating the comparison results by using a user identification model, the account identification method further includes: obtaining the user identification model by:
selecting sample data, wherein the sample data comprises data of accounts whose user is known to have changed; and training a machine learning algorithm model by using the sample data to obtain the user identification model.
Wherein the calculating the comparison results by using a user identification model to identify the association between the account to be identified and the user comprises:
calculating the comparison results by using a user identification model to obtain a probability value that the user of the account to be identified does not change; when the probability value is larger than a third threshold value, identifying that the user of the account to be identified is unchanged; and when the probability value is smaller than or equal to a third threshold value, identifying that the user of the account to be identified changes.
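The thresholding step above can be sketched as follows. The logistic scoring function stands in for the trained user identification model and is an assumption for illustration only; the passage does not specify the model type, and the weights, bias, and function names here are hypothetical.

```python
import math

def unchanged_probability(comparison_results, weights, bias):
    """Stand-in for the user identification model (hypothetical logistic form).

    Returns the probability that the user of the account has NOT changed.
    """
    z = bias + sum(w * r for w, r in zip(weights, comparison_results))
    return 1.0 / (1.0 + math.exp(-z))

def identify(comparison_results, weights, bias, third_threshold):
    """Apply the third threshold exactly as stated in the text."""
    p = unchanged_probability(comparison_results, weights, bias)
    # Strictly greater than the threshold -> unchanged; otherwise changed.
    return "unchanged" if p > third_threshold else "changed"

# All comparisons consistent versus all comparisons inconsistent.
consistent = identify([1, 1, 1], [1.0, 1.0, 1.0], -1.5, third_threshold=0.5)
inconsistent = identify([0, 0, 0], [1.0, 1.0, 1.0], -1.5, third_threshold=0.5)
```

Note that a probability exactly equal to the third threshold is classified as a changed user, matching the "smaller than or equal to" wording above.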
The method for extracting a plurality of features from the user behavior information of the account to be identified in the preset time comprises the following steps:
extracting a plurality of features from the user behavior information within a preset time length at each information acquisition time from the registration time of the account to be identified;
the preset time length refers to a time length between the registration time of the account to be identified and the latest information acquisition time, or the preset time length refers to an interval time length between adjacent information acquisition times.
The embodiment of the present application further provides an account identification apparatus, configured to identify a relationship between an account to be identified and a user, where the account identification apparatus includes:
the first acquisition module is used for extracting a plurality of characteristics from the user behavior information of the account to be identified within a preset time;
the second acquisition module is used for determining the characteristics to be judged of the account to be identified according to the user behavior information to be judged of the account to be identified;
the comparison module is used for respectively comparing the features to be judged of the account to be identified with the plurality of features to obtain a plurality of comparison results;
and the identification module is used for identifying the relevance between the account number to be identified and the user according to the comparison results.
The identification module is used for identifying the relevance between the account to be identified and the user according to the comparison results in the following modes:
identifying an association between the account to be identified and a user by calculating the comparison results using a user identification model; or,
and identifying the relevance between the account to be identified and the user by judging whether the comparison results meet the preset conditions.
The first obtaining module is used for extracting a plurality of characteristics from the user behavior information of the account to be identified within a preset time length in one of the following manners:
extracting a plurality of user habit features from user behavior information of an account to be identified within a preset time;
and extracting user relationship characteristics and at least one user habit characteristic from the user behavior information of the account to be identified within a preset time.
The user behavior information comprises a plurality of user behavior characteristics, and each user behavior characteristic corresponds to one or more characteristic values;
the first acquisition module is used for extracting user habit characteristics from user behavior information of the account to be identified within a preset time length in the following mode:
determining one or more characteristic values corresponding to the user behavior characteristics from the user behavior information within the preset time length;
respectively determining a characteristic value meeting a first preset condition in the one or more characteristic values aiming at each user behavior characteristic, and determining the characteristic value meeting the first preset condition as a user habit characteristic;
wherein the first preset condition comprises at least one of: the largest number of uses; the longest duration of use; the largest weighted sum of the number of uses and the duration of use.
The first obtaining module is used for extracting user relationship characteristics from user behavior information of the account to be identified within a preset time length in the following mode:
respectively determining a first relation score and a second relation score according to the user behavior information within the preset time length; wherein the first relationship score comprises a relationship score between the account to be identified and each first associated feature, the first associated feature being an associated feature associated with the account to be identified; the second relation score comprises a relation score between a relation account and the associated first associated characteristic, and the relation account refers to an account associated with any first associated characteristic except the account to be identified;
determining a third relation score between the account to be identified and each relation account according to the first relation score and the second relation score;
determining the user relationship characteristics of the account to be identified according to the third relation score;
the association characteristics refer to characteristic values of user behavior characteristics capable of constructing relationships among different accounts.
Wherein the user relationship features include: the relation accounts meeting a second preset condition, wherein the second preset condition comprises: the third relation score between the relation account and the account to be identified is greater than or equal to a first threshold, or the third relation score ranks among the top N when the scores are sorted from high to low, wherein N is a positive integer.
The comparison module is used for comparing the features to be judged with the plurality of features respectively in the following modes to obtain a plurality of comparison results:
when the features to be judged comprise the user behavior features to be judged and the features comprise the user habit features corresponding to the user behavior features to be judged, comparing whether the feature values of the user behavior features to be judged are consistent with the user habit features or not to obtain a comparison result;
and when the plurality of features comprise user relationship features and the features to be judged comprise the user relationship features to be judged, calculating the Jaccard distance between the user relationship features to be judged and the user relationship features comprised in the plurality of features, and comparing the Jaccard distance with a second threshold value to obtain a comparison result.
Wherein the account identification apparatus further comprises: a model building module, configured to obtain the user identification model through the following steps: selecting sample data, wherein the sample data comprises data of accounts whose user is known to have changed; and training a machine learning algorithm model by using the sample data to obtain the user identification model.
The identification module is used for calculating the comparison results by using a user identification model to identify the relevance between the account to be identified and the user by the following method:
calculating the comparison results by using a user identification model to obtain a probability value that the user of the account to be identified does not change; when the probability value is larger than a third threshold value, identifying that the user of the account to be identified is unchanged; and when the probability value is smaller than or equal to a third threshold value, identifying that the user of the account to be identified changes.
The first acquisition module is used for extracting a plurality of characteristics from the user behavior information of the account to be identified within a preset time length in the following mode:
extracting a plurality of features from the user behavior information within a preset time length at each information acquisition time from the registration time of the account to be identified;
the preset time length refers to a time length between the registration time of the account to be identified and the latest information acquisition time, or the preset time length refers to an interval time length between adjacent information acquisition times.
An embodiment of the present application further provides an account identification system, configured to identify a relationship between an account to be identified and a user, where the account identification system includes: a first device and a second device;
the first device is used for extracting a plurality of features from the user behavior information of the account to be identified within a preset time length, determining the features to be judged of the account to be identified according to the user behavior information to be judged of the account to be identified, respectively comparing the features to be judged with the features to obtain a plurality of comparison results, and sending the comparison results to the second device;
and the second device is used for receiving the comparison results sent by the first device and identifying the relevance between the account to be identified and the user according to the comparison results.
The second device is used for identifying the relevance between the account to be identified and the user according to the comparison results in the following way:
identifying an association between the account to be identified and a user by calculating the comparison results using a user identification model; or,
and identifying the relevance between the account to be identified and the user by judging whether the comparison results meet the preset conditions.
Wherein the first device is further configured to obtain the user identification model by: selecting sample data, wherein the sample data comprises data of accounts whose user is known to have changed; training a machine learning algorithm model by using the sample data to obtain the user identification model;
the first device is further configured to send the obtained user identification model to the second device.
An embodiment of the present application further provides a data processing electronic device for account identification, including: a memory and a processor; the memory is used for storing a program for account identification, and when the program for account identification is read and executed by the processor, the program performs the following operations: determining the characteristics to be judged of the account to be identified according to the user behavior information to be judged of the account to be identified; respectively comparing the characteristics to be judged of the account to be identified with a plurality of characteristics to obtain a plurality of comparison results; and identifying the relevance between the account to be identified and the user according to the comparison results, wherein the characteristics are extracted from the user behavior information of the account to be identified in a preset time.
In the embodiment of the application, a plurality of features can be extracted from the user behavior information of the account to be identified within a preset time length; the features to be judged of the account to be identified are determined according to the user behavior information to be judged of that account; the features to be judged are compared with each of the plurality of features to obtain a plurality of comparison results; and the association between the account to be identified and the user is identified according to the comparison results. By relying on big data mining to analyze multiple feature dimensions of a user's online behavior in depth, the present application effectively identifies the association between an account and the account user, for example by monitoring whether the account user has changed, thereby improving network security.
Other aspects will be apparent upon reading and understanding the attached drawings and detailed description.
Drawings
Fig. 1 is a flowchart of an account identification method according to an embodiment of the present application;
Fig. 2 is a diagram illustrating an example of a relationship between an account and an associated feature in an embodiment of the present application;
Fig. 3 is a diagram illustrating a relationship between accounts according to an embodiment of the present application;
Fig. 4 is a schematic diagram of an account identification apparatus according to a third embodiment of the present application;
Fig. 5 is a schematic diagram of an account identification system according to a fourth embodiment of the present application.
Detailed Description
The embodiments of the present application will be described in detail below with reference to the accompanying drawings, and it should be understood that the embodiments described below are only for illustrating and explaining the present application and are not intended to limit the present application.
It should be noted that the terms "first," "second," and the like in the description and claims of this application and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
It should be noted that, if not conflicting, the embodiments and features of the embodiments may be combined with each other and are within the scope of protection of the present application. Additionally, while a logical order is shown in the flow diagrams, in some cases, the steps shown or described may be performed in an order different than here.
Definition of terms:
User behavior: an operation performed by the user of an account using that account, for example, account registration, identity authentication, web browsing, account modification, account login, and the like.
User behavior information: information generated by an operation performed by the user of an account using that account, for example, account registration behavior information (hereinafter, account registration information), identity authentication behavior information (hereinafter, identity authentication information), web browsing behavior information (hereinafter, web browsing information), account modification behavior information (hereinafter, account modification information), account login behavior information (hereinafter, account login information), and the like.
User behavior feature: data reflecting an indicator related to user behavior, such as the device used by the account to perform user behavior, the mobile phone number used, the delivery address used, and the like.
Feature value corresponding to a user behavior feature: a specific value of that feature; for example, when the user behavior feature is the mobile phone number used by the user, the corresponding feature values may be mobile phone number 1, mobile phone number 2, and so on, that is, specific mobile phone numbers.
User habit feature: the feature value that meets a corresponding condition among the one or more feature values corresponding to a user behavior feature; for example, when the shipping addresses used by the user include shipping addresses A1, A2, and A3, and shipping address A1 is used the most times, the user habit feature may be: the usual shipping address is shipping address A1.
Association feature: a feature value of a user behavior feature that can establish a relationship between different accounts; for example, if account A1 performed an action using mobile phone number 1 and account A2 also performed an action using mobile phone number 1, the association feature between account A1 and account A2 may be mobile phone number 1.
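The notion of a feature value linking different accounts, as defined above, can be illustrated with a short sketch. The `(account, feature_value)` record format and the function name `related_accounts` are assumptions introduced for this example.

```python
from collections import defaultdict

def related_accounts(records, target):
    """Find accounts that share at least one feature value with `target`.

    records: iterable of (account, feature_value) pairs (hypothetical format).
    Returns {other_account: set of shared feature values}.
    """
    accounts_by_feature = defaultdict(set)
    for account, feature_value in records:
        accounts_by_feature[feature_value].add(account)

    related = defaultdict(set)
    for feature_value, accounts in accounts_by_feature.items():
        if target in accounts:
            for other in accounts - {target}:
                related[other].add(feature_value)
    return dict(related)

# Accounts A1 and A2 both used mobile phone number 1; A3 used a different number.
records = [("A1", "phone1"), ("A2", "phone1"), ("A3", "phone2")]
links = related_accounts(records, "A1")
```

Here mobile phone number 1 is the feature value that connects accounts A1 and A2, while A3 has no link to A1.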
Embodiment one
Fig. 1 is a flowchart of an account identification method according to an embodiment of the present application. The present embodiment provides an account identification method for identifying the association between an account to be identified and the account user, for example, identifying whether the user of the account to be identified has changed. The account identification method provided by the embodiment can be applied to a server-side computing device (e.g., a server) or a virtual machine running on the server-side computing device. The following description takes application to a server-side computing device as an example.
In the embodiment, a plurality of features can be extracted from the user behavior information of the account to be identified within the preset time length. The plurality of features may comprise a plurality of user habit features or a user relationship feature and at least one user habit feature. The plurality of features may be used as reference features for a subsequent account identification process. In addition, each time account identification is performed, a plurality of features may be extracted from the user behavior information of the account to be identified within a predetermined time period, or a plurality of features extracted previously may be used. This is not limited by the present application. The following description will take as an example a process of extracting a plurality of features from the user behavior information of the account to be identified within a predetermined time period, and then identifying the account by using the plurality of features.
As shown in fig. 1, the account identification method provided in this embodiment may include the following steps:
step 101: a plurality of features are extracted from user behavior information of the account to be identified within a preset time length.
In this embodiment, after a user registers an account on an internet platform (e.g., an e-commerce platform or a social platform such as QQ), a background server of the internet platform records the operations performed by the user through the account (i.e., the user behavior information), and stores the user behavior information of each account in a database in chronological order, for example, as logs.
In this embodiment, step 101 may include: and extracting a plurality of features from the user behavior information within a preset time length at each information acquisition time from the registration time of the account to be identified. The preset time length may refer to a time length between the registration time of the account to be identified and the latest information acquisition time. In this embodiment, the information obtaining time may be set periodically, that is, from the registration time of the account to be identified, a plurality of features may be extracted from the user behavior information within a predetermined time period periodically.
Alternatively, from the registration time of the account to be identified, a plurality of features may be extracted from the user behavior information within a time period between the registration time and the latest fixed time at a fixed time of day (i.e., the aforementioned information acquisition time is a fixed setting or a periodic setting). Alternatively, the information acquisition time may be determined according to instruction triggering, that is, when a triggering instruction is received from the registration time of the account to be identified, a plurality of features are extracted from the user behavior information within a predetermined time length (that is, the time length between the registration time and the instruction triggering time). However, this is not limited in this application.
In this embodiment, regardless of the manner in which the information acquisition time is triggered, the plurality of features are extracted from the user behavior information occurring from the registration time of the account to be identified to the latest information acquisition time. However, in other implementations, the predetermined length of time may refer to a length of time between adjacent information acquisition times. That is, a plurality of features are extracted from the user behavior information during the latest information acquisition time and the previous information acquisition time.
For example, from the registration time T0 of the account to be identified, at the time T1, a plurality of features may be extracted from the user behavior information within the time T0 to T1, then at the time T2, a plurality of features may be extracted from the user behavior information within the time T0 to T2, then at the time T3, a plurality of features may be extracted from the user behavior information within the time T0 to T3, where T3-T2 equals T2-T1 equals T1-T0; alternatively, from the registration time T0 of the account to be identified, at time T1, a plurality of features may be extracted from the user behavior information within the time T0 to T1, then, at time T2, a plurality of features may be extracted from the user behavior information within the time T1 to T2, and then, at time T3, a plurality of features may be extracted from the user behavior information within the time T2 to T3.
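The two windowing variants in the example above (cumulative from T0, or between adjacent acquisition times) can be sketched as follows. This is only an illustration: the function name `extraction_windows` and the `cumulative` flag are hypothetical and do not appear in the application.

```python
from typing import List, Tuple


def extraction_windows(
    t0: float, acquisition_times: List[float], cumulative: bool = True
) -> List[Tuple[float, float]]:
    """Return the (start, end) window used at each information acquisition time.

    cumulative=True  -> every window starts at the registration time t0.
    cumulative=False -> every window starts at the previous acquisition time.
    """
    windows = []
    previous = t0
    for t in sorted(acquisition_times):
        start = t0 if cumulative else previous
        windows.append((start, t))
        previous = t
    return windows
```

With T0 = 0 and acquisition times 10, 20 and 30, the first variant yields windows that all start at 0, and the second yields three consecutive windows of equal length.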
In this embodiment, the user behavior information may include one or more of the following items: account registration information, identity authentication information, account login information, network browsing information, transaction information, account modification information and chat information:
the account registration information may include one or more of the following items: account registration time, a mobile phone number, a mailbox address, an account name, an account password, network information (e.g., an IP address and a WIFI address) used for account registration, and geographic location information (e.g., GPS (Global Positioning System) information) used for account registration;
the identity authentication information may include one or more of: authentication time, identification card number, name, identification card address, device information used for identity authentication, network information used for identity authentication, geographic location information used for identity authentication, and real-person biometric information (e.g., fingerprint and face information);
the account login information may include one or more of: login time, device information used for account login, network information used for account login, geographic location information at the time of account login, and real-person biometric information;
the network browsing information may include one or more of: browsing web page address, browsing time, browsing duration, collected web address, equipment information used for network browsing, network information used for network browsing, and geographical location information when network browsing is performed;
the transaction information may include one or more of: purchase information, sale information, and the device information, network information and geographic location information used for the network transaction; wherein the purchase information includes delivery information, and the delivery information includes: recipient name, recipient mobile phone number and shipping address;
the account modification information may include one or more of: account modification time, account modification content (such as passwords, account names, password hint questions and the like), and the device information, network information and geographic location information used for account modification;
the chat information may include one or more of: chat objects, chat times, device information used to conduct chats, network information, and geographic location information.
The user behavior information is only an example, and the present application does not limit this.
In this embodiment, step 101 may include one of the following:
extracting a plurality of user habit features from the user behavior information of the account to be identified within the preset time length;
extracting a user relationship feature and at least one user habit feature from the user behavior information of the account to be identified within the preset time length.
In this embodiment, the user behavior information includes a plurality of user behavior characteristics, and each user behavior characteristic corresponds to one or more characteristic values;
the user habit characteristics can be extracted from the user behavior information of the account to be identified within the preset time length in the following mode:
determining one or more characteristic values corresponding to the user behavior characteristics from the user behavior information within the preset time length;
respectively determining a characteristic value meeting a first preset condition in the one or more characteristic values aiming at each user behavior characteristic, and determining the characteristic value meeting the first preset condition as a user habit characteristic; wherein, a user habit characteristic can be correspondingly determined according to a user behavior characteristic;
wherein the first preset condition comprises at least one of: the largest number of uses; the longest duration of use; the largest weighted sum of the number of uses and the duration of use.
In this embodiment, once the user behaviors of each account are arranged along a time axis, more and more user behaviors are recorded as the account's time in use increases, so that user habit features can be abstracted from the user behavior information.
Here, assuming that the matrix of user habit features is H, H may be represented as follows:
$H = \langle H_1, H_2, \ldots, H_m \rangle$;
The above formula indicates that the user has m habit features, such as habitually used devices, frequently linked web pages, frequently visited geographic locations, commonly used shipping addresses, and commonly used mobile phone numbers, where m is an integer greater than or equal to 1.
Assume that a habit feature $H_m$ in the above matrix H represents the commonly used shipping address. Since a user may have several shipping addresses, the shipping address with the largest number of uses is taken as the commonly used shipping address, and the calculation formula is as follows:
$H_m = \mathrm{MAX}(x_1, x_2, \ldots, x_n)$;
In the above formula, there are n shipping addresses, and $x_n$ denotes the number of uses of the nth shipping address, i.e., $x_n = f_n$, where n is a positive integer.
In addition, assume that a habit feature $H_m$ in the matrix H represents the habitually used device. For a device, the duration of use is a better reference than the number of uses, so the device with the longest duration of use is taken as the habitually used device, and the calculation formula is as follows:
$H_m = \mathrm{MAX}(x_1, x_2, \ldots, x_n)$;
In the above formula, n devices are used, and $x_n$ denotes the duration of use of the nth device, i.e., $x_n = t_n$, where n is a positive integer.
For example, the user behavior features include the shipping address adopted by the user and the device used by the user to log in to the account. The shipping address adopted by the user corresponds to, for example, the following feature values: shipping address A1, shipping address A2, shipping address A3. The device used by the user to log in to the account corresponds to, for example, the following feature values: device B1, device B2, device B3. Here, when determining the commonly used shipping address of the user, the shipping address with the largest number of uses among the three shipping addresses within the preset time length is taken as the commonly used shipping address, for example, shipping address A1; when determining the habitually used device, the device with the longest duration of use among the three devices within the preset time length is taken as the habitually used device, for example, device B1.
Similarly, the common receiving names, the common mobile phone numbers and other user habit characteristics can be determined in the same determining mode as the common receiving addresses. The habit features of the user, such as frequently linked web pages, frequently visited geographical locations, etc., can be determined in the same way as the habitually used devices.
However, this is not limited in this application. In other embodiments, the user habit feature can be determined by jointly considering the number of uses and the duration of use of the one or more feature values corresponding to a user behavior feature. For example, a habit feature $H_m$ in the matrix H may be calculated as follows:
$H_m = \mathrm{MAX}(x_1, x_2, \ldots, x_n)$;
In the above formula, one user behavior feature corresponds to n feature values, and $x_n$ denotes the weighted sum of the number of uses and the duration of use of the nth feature value, i.e., $x_n = a \times f_n + b \times t_n$, where $f_n$ denotes the number of uses and $t_n$ denotes the duration of use; a and b are weights that can be preset as needed; n is a positive integer.
In some embodiments, one or more of the above ways may be selected to determine the user habit feature based on the attributes of the user habit feature. However, this is not limited in this application.
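The three selection rules above (most uses, longest duration, largest weighted sum) reduce to one maximization over $a \times f_n + b \times t_n$ with different weights. A minimal sketch follows; the function name `habit_feature` and the sample usage figures are illustrative assumptions, not values from the application.

```python
from typing import Dict, Tuple


def habit_feature(
    usage: Dict[str, Tuple[int, float]], a: float = 1.0, b: float = 0.0
) -> str:
    """Pick the feature value that maximizes a * f + b * t.

    usage maps each feature value to (number of uses f, duration of use t).
    a=1, b=0 selects by number of uses; a=0, b=1 selects by duration of use;
    any other weights give the weighted-sum variant.
    """
    return max(usage, key=lambda v: a * usage[v][0] + b * usage[v][1])


# Hypothetical usage statistics for one account within the preset time length.
addresses = {"A1": (5, 0.0), "A2": (2, 0.0), "A3": (1, 0.0)}
devices = {"B1": (3, 40.0), "B2": (10, 5.0), "B3": (1, 1.0)}

common_address = habit_feature(addresses)            # by number of uses
habitual_device = habit_feature(devices, a=0, b=1)   # by duration of use
```

In this made-up data, device B2 is used more often but B1 is used for longer, so the duration rule and the count rule pick different devices.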
In this embodiment, the extracting the user relationship feature from the user behavior information of the account to be identified within the predetermined time includes:
respectively determining a first relation score and a second relation score according to the user behavior information within the preset time length; wherein the first relationship score comprises a relationship score between the account to be identified and each first associated feature, the first associated feature being an associated feature associated with the account to be identified; the second relation score comprises a relation score between a relation account and the associated first associated characteristic, and the relation account refers to an account associated with any first associated characteristic except the account to be identified;
determining a third relation score between the account to be identified and each relation account according to the first relation score and the second relation score;
determining the user relationship characteristics of the account to be identified according to the third relationship value;
the association characteristics refer to characteristic values of user behavior characteristics capable of constructing relationships among different accounts.
In this embodiment, the user relationship feature may include: the relationship account number meeting a second preset condition, wherein the second preset condition comprises: and a third relation score between the relation account and the account to be identified is greater than or equal to a first threshold, or the third relation score belongs to the first N in the sequence from high to low, wherein N is a positive integer. The first threshold and the N value may be set according to actual conditions, which is not limited in the present application.
In this embodiment, when a plurality of accounts log on the same device and have user behavior, the accounts can be regarded as having a certain relationship; similarly, several accounts having the same WIFI address, shipping address, etc. may also be considered to be related. As can be seen from the above-listed relationships, the relationships between accounts are linked together by various association features, i.e., the relationships between accounts are constructed by the association features. In this embodiment, the user behavior characteristics that can be used as the association characteristics may include one or more of the following: WIFI address, delivery address, mailbox address, cell-phone number.
To determine the relationship between accounts, the relationship between each account and the associated features is determined first. In this embodiment, when describing the relationship score between an account and an associated feature, a value weight is set according to corresponding rules or experience, that is, each operation of the account on the associated feature has a value weight W. For example, a purchase made by an account on an associated feature may be more valuable than an add-to-cart operation. The rule for setting the value weight may be determined according to actual conditions, and the present application does not limit this.
In this embodiment, the relationship score between the account and the associated feature may be determined according to the following manner:
calculating a first sum of value weights of operation behaviors of an account on a correlation characteristic, and calculating a second sum of value weights of all operation behaviors on a user behavior characteristic to which the correlation characteristic (namely the characteristic value of the user behavior characteristic) belongs;
calculating the result of dividing the first sum value by the second sum value, and multiplying the result by the maximum value of all value weights corresponding to the user behavior characteristics to which the associated characteristics belong;
taking the square root (the one-half power) of the multiplication result as the relationship score between the account and the associated feature.
For example, as shown in fig. 2, the account a1 has had corresponding user behaviors on three mobile phone numbers (association features M1, M2, M3) within a predetermined time period. Wherein, the value weight of the purchasing behavior is W1, the value weight of the password modifying behavior is W2, and the value weight of the mailbox binding behavior is W3.
The calculation formula of the relationship score between the account A1 and the associated feature M1 (namely, the mobile phone number 1) is given as follows:
Figure BDA0001121621980000151
From the above equation, it can be seen that the more actions a user performs on a given associated feature, and the greater the value weight of those actions, the higher the relationship score. The MAX term in the above equation moderates the relationship score: the ratio term can equal 1, for example in the extreme case where the account has only one associated feature and a single action, but $R(A_1, M_1)$ in that case should not necessarily be the maximum; it should instead depend on the value weight of the action that the account performed on the associated feature.
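The three calculation steps listed above (ratio of weight sums, multiplication by the maximum value weight, square root) can be sketched as follows. The formula image itself is not reproduced in this text, so the sketch rests on two labeled assumptions: the second sum runs over all of the account's operations on the same user behavior feature (for example, all three mobile phone numbers), and the maximum is taken over the value weights of those same operations. The function name and the sample weights are hypothetical.

```python
import math
from typing import List, Tuple


def relation_score(operations: List[Tuple[str, float]], feature: str) -> float:
    """Relationship score between one account and one associated feature.

    operations lists (associated feature value, value weight) for every action
    the account performed on one user behavior feature, e.g. phone numbers.
    Assumption: the denominator and the MAX term both range over this list.
    """
    first_sum = sum(w for m, w in operations if m == feature)
    second_sum = sum(w for _, w in operations)
    if second_sum == 0:
        return 0.0  # no weighted actions recorded for this behavior feature
    max_weight = max(w for _, w in operations)
    return math.sqrt(first_sum / second_sum * max_weight)


# Hypothetical actions of account A1 on phone numbers M1, M2, M3.
ops = [("M1", 3.0), ("M1", 1.0), ("M2", 2.0), ("M3", 2.0)]
score_m1 = relation_score(ops, "M1")  # sqrt((4 / 8) * 3)
```

In the extreme case of a single action on a single associated feature, the ratio is 1 and the score is the square root of that action's value weight, which matches the remark about the MAX term.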
After determining the relationship scores (i.e., the first relationship score and the second relationship score) for each account and associated feature, the full number of accounts are associated by the associated feature. Several definitions are explained below:
$M(A_i)$ denotes the set of all associated features of account $A_i$, i.e.,
$M(A_i) = \{M_1, M_2, \ldots, M_n\}$;
$U(M_k)$ denotes the set of all accounts associated with the associated feature $M_k$, i.e.,
$U(M_k) = \{A_1, A_2, \ldots, A_n\}$;
The relationship score between accounts $A_i$ and $A_j$ (i.e., the aforementioned third relationship score) may be calculated according to the following equation:
Figure BDA0001121621980000161
The log function has a moderating effect, so that the overall value of the term does not drop sharply when the number of terms in the denominator is large.
The relationship score between each account and each of its relationship accounts (i.e., the third relationship score described above) can be calculated by the above equation. For an account to be identified, all the calculated third relationship scores between it and its relationship accounts are sorted in descending order, and the first N relationship accounts are taken as the relationship list of the account to be identified (namely, the user relationship feature). N is a positive integer whose value is determined according to actual needs, which is not limited in the present application.
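Given third relationship scores that have already been calculated, building the relationship list is a sort followed by a cut. The sketch below supports both forms of the second preset condition (first N, or score greater than or equal to the first threshold). The function name, the parameter names and the sample scores are hypothetical, and applying both conditions together when both are supplied is an assumption of this sketch rather than something the application states.

```python
from typing import Dict, List, Optional


def relationship_list(
    scores: Dict[str, float],
    top_n: Optional[int] = None,
    first_threshold: Optional[float] = None,
) -> List[str]:
    """Return relationship accounts ordered by third relationship score.

    scores maps each relationship account to its third relationship score
    with the account to be identified.
    """
    ranked = sorted(scores, key=lambda acc: scores[acc], reverse=True)
    if first_threshold is not None:
        ranked = [acc for acc in ranked if scores[acc] >= first_threshold]
    if top_n is not None:
        ranked = ranked[:top_n]
    return ranked


# Hypothetical third relationship scores for the account to be identified.
third_scores = {"A2": 0.9, "A3": 0.4, "A4": 0.7}
top_two = relationship_list(third_scores, top_n=2)
```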
An example is briefly described below. As shown in FIG. 3, account A1 and A2 are associated by association features M1 and M2, account A1 is also associated with association feature M3, and account A3 is associated with association features M1 and M4. According to the calculation formula of the relationship scores between the account numbers, the relationship scores of the account numbers A1 and A2 can be calculated according to the following formula:
Figure BDA0001121621980000162
In this embodiment, the user relationship feature of an account can assist in judging who the user behind the account is; if the user relationship feature changes suddenly, the user of the account has likely changed.
Step 102: determine the features to be judged of the account to be identified according to the user behavior information to be judged of the account to be identified.
In this embodiment, the features to be determined may include: the user relationship characteristics to be judged and at least one user behavior characteristic; alternatively, a plurality of user behavior characteristics to be judged may be included.
In this embodiment, the feature to be determined may be determined according to any one of the following manners:
determining the user behavior information to be judged according to a preset period; in this case, for an account to be identified, at each period time, the user behavior features to be judged are obtained from the user behavior information closest to that period time; the user relationship feature to be judged (such as a user relationship list) is determined from the user behavior information between the registration time of the account to be identified and the current period time, or from the user behavior information within a period of time before the current period time;
determining the user behavior information to be judged upon instruction triggering; in this case, for an account to be identified, at the instruction triggering time, the user behavior features to be judged are obtained from the user behavior information closest to the instruction triggering time; the user relationship feature to be judged is determined from the user behavior information between the registration time of the account to be identified and the instruction triggering time, or from the user behavior information within a period of time before the instruction triggering time;
determining the user behavior information to be judged according to the time at which the latest user behavior of the account to be identified occurred; in this case, for an account to be identified, the user behavior features to be judged are obtained from the most recently generated user behavior information; the user relationship feature to be judged is determined from the user behavior information between the registration time of the account to be identified and the time at which the latest user behavior occurred, or from the user behavior information within a period of time before that time;
determining the user behavior information to be judged according to the time at which a specific user behavior of the account to be identified occurred; for example, the user behavior features to be judged are determined from the user behavior information at the time when the login address of the account to be identified changes; the user relationship feature to be judged is determined from the user behavior information between the registration time of the account to be identified and the time when the login address changes, or from the user behavior information within a period of time before the time when the login address changes.
However, the present application is not limited to this, and in other embodiments, other conditions may be set as needed to determine the features to be determined.
Step 103: compare the features to be judged of the account to be identified with the plurality of features respectively to obtain a plurality of comparison results.
The plurality of features used for comparison in this step are the plurality of features obtained in step 101. In some implementations, after the account identification, when the account identification is performed again, the multiple features obtained in step 101 may be directly used, and it is not necessary to repeat step 101 to extract multiple features again.
In this embodiment, the step may include:
when the features to be judged comprise a user behavior feature to be judged and the plurality of features comprise the user habit feature corresponding to that user behavior feature, comparing whether the feature value of the user behavior feature to be judged is consistent with the user habit feature, to obtain a comparison result;
when the plurality of features comprise a user relationship feature and the features to be judged comprise a user relationship feature to be judged, calculating the Jaccard distance between the user relationship feature to be judged and the user relationship feature comprised in the plurality of features, and comparing the Jaccard distance with a second threshold to obtain a comparison result.
When the plurality of features comprise the user habit feature corresponding to a user behavior feature to be judged: if the feature value of the user behavior feature to be judged is consistent with the user habit feature, the comparison result corresponding to the user habit feature is determined to be a first value; if it is inconsistent, the comparison result corresponding to the user habit feature is determined to be a second value;
when the plurality of features comprise a user relationship feature and the features to be judged comprise a user relationship feature to be judged, the Jaccard distance between the user relationship feature to be judged and the user relationship feature comprised in the plurality of features is calculated; if the Jaccard distance is smaller than the second threshold, the comparison result corresponding to the user relationship feature is determined to be a third value, and if the Jaccard distance is greater than or equal to the second threshold, the comparison result corresponding to the user relationship feature is determined to be a fourth value.
When the user habit features included in the plurality of features do not have corresponding user behavior features to be judged, it may be determined that the comparison result corresponding to the user habit features is null.
In this embodiment, the first value is 1, which indicates normal, and the second value is 0, which indicates abnormal. The third value is 0 indicating abnormality, and the fourth value is 1 indicating normality. That is, the comparison result obtained is of the boolean type.
For example, for the user habit features, the latest user behavior features are matched against the corresponding user habit features; if a latest user behavior feature matches the user habit feature, the comparison result of that user habit feature is determined to be 1, and otherwise it is determined to be 0. For example, if a user usually makes purchases in Hangzhou and a purchase suddenly occurs in Beijing, this does not match the user's habit, so the user's purchase habit feature is abnormal and the comparison result of the purchase habit is 0. If a user habit feature has no counterpart among the latest user behavior features, the comparison result corresponding to that user habit feature is set to null. For example, if the latest user behavior features do not include shipping address information, the comparison result corresponding to the user habit feature that represents the commonly used shipping address may be set to null.
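The habit comparison described above yields 1 on a match, 0 on a mismatch, and null when the latest behavior carries no value for that habit. A minimal sketch, with null represented by Python's `None`; the function name and the dictionary keys are hypothetical.

```python
from typing import Dict, Optional


def compare_habits(
    habits: Dict[str, str], observed: Dict[str, str]
) -> Dict[str, Optional[int]]:
    """Compare each user habit feature with the latest observed feature value.

    Returns 1 (normal) on a match, 0 (abnormal) on a mismatch, and None (null)
    when the latest user behavior has no value for that habit feature.
    """
    results: Dict[str, Optional[int]] = {}
    for name, habitual_value in habits.items():
        if name not in observed:
            results[name] = None
        else:
            results[name] = 1 if observed[name] == habitual_value else 0
    return results


# Habit features from step 101 versus the latest behavior (illustrative values).
habits = {"purchase_city": "Hangzhou", "device": "B1", "shipping_address": "A1"}
latest = {"purchase_city": "Beijing", "device": "B1"}
comparison = compare_habits(habits, latest)
```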
In practical applications, the number of the user habit features extracted in step 101 may be multiple, and the number of the user behavior features to be determined obtained in this step may be less than or equal to the number of the user habit features extracted in step 101. In this embodiment, the number of comparison results is determined according to the number of user behavior features and user relationship features included in the features to be determined (e.g., determined according to the latest user behavior information) obtained in this step, and is, for example, less than or equal to the number of features extracted in step 101.
For example, for the user relationship features, the Jaccard distance between the latest relationship list (denoted as the first set) and the relationship list determined in step 101 (denoted as the second set) is calculated. If the number of accounts in the intersection of the first set and the second set is called the intersection number, and the number of accounts in the union of the first set and the second set is called the union number, then the Jaccard distance as used here equals the intersection number divided by the union number (this ratio is also known as the Jaccard similarity coefficient, so a larger value means the two lists overlap more). If the Jaccard distance is smaller than the second threshold, it can be determined that the user relationship feature of the account to be identified is abnormal, and the corresponding comparison result can be 0; if the Jaccard distance is greater than or equal to the second threshold, it can be determined that the user relationship feature of the account to be identified is normal, and the corresponding comparison result can be 1. The second threshold can be set according to actual conditions.
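The relationship-list comparison can be sketched as follows, using the ratio exactly as the text defines it (intersection number divided by union number). The function names are hypothetical, and treating two empty lists as identical is an assumption of this sketch, since the text does not cover that case.

```python
from typing import Iterable


def jaccard_ratio(first: Iterable[str], second: Iterable[str]) -> float:
    """Intersection number divided by union number of two relationship lists."""
    a, b = set(first), set(second)
    union = a | b
    if not union:
        return 1.0  # assumption: two empty lists count as identical
    return len(a & b) / len(union)


def compare_relationship(
    latest: Iterable[str], reference: Iterable[str], second_threshold: float
) -> int:
    """Return 0 (abnormal) if the ratio is below the threshold, else 1 (normal)."""
    return 0 if jaccard_ratio(latest, reference) < second_threshold else 1


# Two of four distinct accounts are shared, so the ratio is 2 / 4.
ratio = jaccard_ratio(["A2", "A3", "A4"], ["A2", "A3", "A5"])
```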
After the comparison result corresponding to each user habit feature and the comparison result corresponding to the user relationship feature are obtained, all the calculated comparison results may be combined into a feature vector X.
Step 104: identify the association between the account to be identified and the user according to the plurality of comparison results.
In this embodiment, the association between the account to be identified and the user is identified by calculating the plurality of comparison results using a user identification model.
In this embodiment, before step 104, the method further includes: obtaining the user identification model by:
selecting sample data, wherein the sample data comprises data of accounts whose users are known to have changed; and training a machine learning algorithm model with the sample data to obtain the user identification model.
In this embodiment, a user identification model is obtained through model training, and then, in each account identification process, the feature vector obtained in step 103 may be directly put into the user identification model for calculation, so as to identify the association between the account and the user.
In this embodiment, manually labeled samples are accumulated from existing identity-impersonation cases and are used for model training. The trained user identification model is then used for online prediction over all users, so as to obtain an authenticity score for the identity of each account.
In this embodiment, the machine learning algorithm may include any one of the following: a logistic regression algorithm, a random forest algorithm, a GBDT (Gradient Boosting Decision Tree) algorithm, a decision tree algorithm, an SVM (Support Vector Machine) algorithm, and a neural network algorithm.
In the present embodiment, a logistic regression algorithm is taken as an example to construct the user identification model. And solving the optimal weight of the feature vector through a logistic regression algorithm to obtain an approximate optimal solution of the non-convex optimization problem.
The explanation of the construction of the user identification model based on the logistic regression algorithm is as follows:
First, a sigmoid function is given, which smoothly maps the model output to a value between 0 and 1:
Figure BDA0001121621980000201
Second, a feature vector and a weight vector are given. It is assumed here that the feature vector is n-dimensional, i.e., the feature vector is $X = (x_1, x_2, \ldots, x_n)$. Because supervised learning is performed with manually labeled samples, each sample is audited and then manually labeled as to whether the account is still used by the real person (that is, whether the user of the account has changed). Denote the audited result by y: if the account has been bought or sold (i.e., the user of the account has changed), the result is 0; if no buying or selling has occurred (i.e., the user of the account has not changed), the result is 1; that is, $y \in \{0, 1\}$. The feature weight vector is thus given as follows:
$\Theta^T X = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$
Third, a prediction function is constructed. The feature weight vector is the result of weighting the feature vector, and substituting it into the sigmoid function gives the following result:
Figure BDA0001121621980000202
The result $h_\theta(X)$ output by the above formula represents the probability that the user of the account has not changed, that is, the probability that the result is 1;
Fourth, a loss function is constructed. Assuming a total of m samples, the loss function of logistic regression is as follows:
Figure BDA0001121621980000211
The goal is to find an optimal set of Θ that minimizes the value of the loss function. Θ is optimized by gradient descent to minimize the loss function, and the update function is given directly as follows:
Figure BDA0001121621980000212
Here, α in the update function represents the model training rate, i.e., the learning rate. The smaller this value, the closer the result can get to the optimal solution, but the slower the model trains, and vice versa. This value may be determined according to the specific situation; in this embodiment, it is, for example, 0.01;
The symbol shown in Figure BDA0001121621980000213 represents the jth feature in the ith sample, and Θ is the key result obtained by model training.
After the user identification model is trained, the feature vector obtained in step 103 can be weighted directly using the training result Θ, and the prediction result $h_\theta(X)$ is then calculated; the obtained prediction result is the probability value that the user of the account to be identified has not changed.
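The four construction steps (sigmoid, weighted feature vector, prediction function, gradient-descent update) can be sketched in plain Python as follows. Because the formula images are not reproduced in this text, the sketch assumes the standard logistic regression cross-entropy loss and averages the gradient over the m samples; the function names and the toy samples are hypothetical, and null comparison results are assumed to have been replaced by numbers beforehand.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Map a real number to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(theta: List[float], x: List[float]) -> float:
    """h_theta(X) = sigmoid(theta_0 + theta_1 * x_1 + ... + theta_n * x_n)."""
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return sigmoid(z)


def train(
    samples: List[List[float]],
    labels: List[int],
    rate: float = 0.01,
    epochs: int = 5000,
) -> List[float]:
    """Batch gradient descent on the (assumed) cross-entropy loss."""
    m, n = len(samples), len(samples[0])
    theta = [0.0] * (n + 1)
    for _ in range(epochs):
        # Prediction error h_theta(x_i) - y_i for every sample.
        errors = [predict(theta, x) - y for x, y in zip(samples, labels)]
        grad0 = sum(errors) / m
        grads = [sum(e * x[j] for e, x in zip(errors, samples)) / m for j in range(n)]
        theta[0] -= rate * grad0
        for j in range(n):
            theta[j + 1] -= rate * grads[j]
    return theta


# Toy comparison-result vectors: label 1 = user unchanged, 0 = user changed.
toy_samples = [[1.0, 1.0], [1.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
toy_labels = [1, 1, 0, 0]
theta = train(toy_samples, toy_labels, rate=0.1, epochs=2000)
```

After training on this toy data, `predict(theta, [1.0, 1.0])` lies above 0.5 and `predict(theta, [0.0, 0.0])` lies below it, reflecting that matching comparison results indicate an unchanged user.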
In this embodiment, step 104 may include:
calculating the comparison results by using a user identification model to obtain a probability value that the user of the account to be identified does not change; when the probability value is larger than a third threshold value, identifying that the user of the account to be identified is unchanged; and when the probability value is smaller than or equal to a third threshold value, identifying that the user of the account to be identified changes. Wherein, the third threshold value can be set according to actual conditions.
In some implementations, the comparing process of the probability value and the third threshold may also be put into the user identification model, that is, after the multiple comparison results are put into the user identification model, the result of whether the user of the account to be identified changes may be directly obtained.
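Both variants of step 104 (returning a probability, or folding the threshold comparison into the model) can be illustrated with a small wrapper. This is a sketch under assumptions: the class name `UserIdentificationModel`, the weight values and the threshold of 0.5 are all hypothetical.

```python
import math


class UserIdentificationModel:
    """Wraps trained weights and, optionally, the third-threshold decision."""

    def __init__(self, theta, third_threshold=0.5):
        self.theta = list(theta)            # theta[0] is the bias term
        self.third_threshold = third_threshold

    def probability_unchanged(self, comparison_results):
        """Probability that the user of the account has not changed."""
        z = self.theta[0] + sum(
            t * c for t, c in zip(self.theta[1:], comparison_results)
        )
        return 1.0 / (1.0 + math.exp(-z))

    def user_changed(self, comparison_results):
        """Decision folded into the model: changed if p <= third threshold."""
        p = self.probability_unchanged(comparison_results)
        return p <= self.third_threshold


# Hypothetical weights, for illustration only.
model = UserIdentificationModel(theta=[-2.0, 1.5, 1.0, 1.5], third_threshold=0.5)
```

A caller that wants to apply its own threshold can use `probability_unchanged`; a caller that only needs the verdict can use `user_changed`.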
In this embodiment, when it is recognized that the user of the account has changed, the user may be prompted to perform real-person authentication, for example, by requesting verification of real-person biometric information such as the user's fingerprint or face. The user is allowed to continue operating the account only after passing the real-person authentication.
In this embodiment, based on the big data of the internet platform, the user habit features are abstracted, the user relationship features are extracted, and when the user behavior of the account is monitored subsequently, the association between the account and the user is identified through calculation of the user identification model based on the user habit features or the user habit features and the user relationship features.
Two scenarios are described below by way of illustration:
Scenario one
After a user registers an account on an internet platform and completes real-name authentication, the user transfers the account to another person without ever using it for web browsing or shopping, and the other person then uses the account. In this case, after the user registers the account and completes real-name authentication, the internet platform may determine habit features (for example, a common mobile phone number, a common geographic location and a name) and a user relationship feature (for example, a user relationship list) from the registration information and the real-name authentication information. The internet platform then monitors use of the account in real time; when use of the account is detected (for example, another person uses the account to browse web pages), the latest user behavior information of the account may be determined (for example, the device information currently in use, the current geographic location information and the current user relationship list). The information recorded after registration is compared with the latest user behavior information to obtain a plurality of comparison results, the comparison results are calculated with the user identification model, and it is judged whether the user of the account has changed. Because the user has actually changed, the device information, the geographic location information, the user relationship list and the like may have changed, and the change of user can be recognized from these changes. After recognizing that the user of the account has changed, the internet platform prompts the user of the account to perform identity verification such as liveness detection or face recognition, so as to ensure that the user of the account is the person whose identity was authenticated.
Scenario two
After a user registers an account on an internet platform, the user logs in to the account in one place to browse web pages and logs in to the account in another place to shop. The internet platform can check whether the user has changed each time the account is used. When the user logs in to the account in another place to shop, the geographic location information changes, but because the user relationship feature and the other habit features of the account have not changed, the internet platform can recognize that the user of the account has not changed and therefore does not prompt the user to perform identity verification.
In summary, in this embodiment, the user habit features, or the user habit features and the user relationship feature, are abstracted from the behavior traces in e-commerce data. By combining a plurality of user habit features, or a plurality of user habit features and the user relationship feature, the association between the account and the user (for example, whether the account has been bought, sold or stolen) can be identified continuously and accurately. Once it is detected that the user of the account has changed, the user is required to perform identity verification such as liveness detection or face recognition to ensure that the user of the account is the person whose identity was authenticated, thereby improving network security.
Example two
The present embodiment provides an account identification method for identifying the association between an account to be identified and a user, for example, identifying whether the user of the account to be identified has changed.
The difference between the account identification method provided by the embodiment and the first embodiment is as follows: in this embodiment, the association between the account to be identified and the user is identified by determining whether the comparison results satisfy a predetermined condition.
In this embodiment, identifying the relationship between the account to be identified and the user by determining whether the comparison results satisfy the predetermined condition may include:
respectively calculating products of each comparison result and the corresponding weight, and calculating sum values of all the products;
when the sum is larger than a fourth threshold value, identifying that the user of the account to be identified is not changed;
and when the sum is less than or equal to a fourth threshold value, identifying that the user of the account to be identified changes.
In this embodiment, each feature extracted from the user behavior information of the account to be identified within the predetermined time period corresponds to a weight, and the weight may be calculated by a machine learning algorithm.
The machine learning algorithm may include any one of: a logistic regression algorithm, a random forest algorithm, a GBDT (Gradient Boosting Decision Tree) algorithm, a decision tree algorithm, an SVM (Support Vector Machine) algorithm, and a neural network algorithm. For example, for the process of calculating the weights with the logistic regression algorithm, refer to the first embodiment, where Θ is the resulting optimal weight vector.
In this embodiment, after obtaining the comparison result corresponding to each user habit feature and the comparison result corresponding to the user relationship feature, all the calculated comparison results may be combined into a feature vector X, and a feature weight vector is calculated according to the feature vector X and the optimal weight vector Θ:
$\Theta^T X = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$
and identifying the association between the account to be identified and the user according to the value of the feature weight vector $\Theta^T X$. For example, when the value of the feature weight vector is greater than a fourth threshold, the user of the account to be identified is identified as unchanged; when the value is less than or equal to the fourth threshold, the user of the account to be identified is identified as changed. The fourth threshold may be set according to the actual situation, which is not limited in this application.
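The weighted-sum rule of this embodiment can be sketched as follows. The function names, the weights and the fourth threshold are hypothetical values chosen for illustration; in practice the weights would come from a trained model such as the Θ of the first embodiment.

```python
def weighted_sum(comparison_results, weights, bias=0.0):
    """Compute theta_0 + sum(theta_i * x_i) over the comparison results."""
    if len(comparison_results) != len(weights):
        raise ValueError("each comparison result needs exactly one weight")
    return bias + sum(w * c for w, c in zip(weights, comparison_results))


def user_unchanged(comparison_results, weights, fourth_threshold, bias=0.0):
    """True if the sum exceeds the fourth threshold (user unchanged)."""
    return weighted_sum(comparison_results, weights, bias) > fourth_threshold


# Hypothetical weights for three features: device, location, relationship list.
weights = [0.4, 0.3, 0.3]
fourth_threshold = 0.6
```

With these illustrative numbers, a mismatch in a single low-weight feature still leaves the sum above the threshold, whereas a match in only one feature does not.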
However, the present embodiment is not limited thereto. In other embodiments, it may instead be provided that the user of the account to be identified is identified as unchanged when the number of comparison results equal to a predetermined value is greater than or equal to a fifth threshold, and as changed when that number is less than the fifth threshold. Based on the description of the comparison results in the first embodiment, when the comparison results are of Boolean type, the predetermined value may be 1 (indicating normal). That is, when the number of normal features among the features to be judged is greater than or equal to the fifth threshold, the user of the account to be identified is identified as unchanged. The fifth threshold may be set according to actual conditions.
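The count-based alternative amounts to counting the comparison results that equal the predetermined value. A minimal sketch, with a hypothetical function name and example values:

```python
def user_unchanged_by_count(comparison_results, fifth_threshold,
                            predetermined_value=1):
    """True if enough comparison results equal the predetermined value.

    With Boolean-type results, predetermined_value = 1 means "normal".
    """
    normal = sum(1 for r in comparison_results if r == predetermined_value)
    return normal >= fifth_threshold
```

For example, with a fifth threshold of 3, the results `[1, 1, 0, 1]` contain three normal features and the user is treated as unchanged, while `[1, 0, 0, 1]` contain only two and the user is treated as changed.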
In addition, other descriptions related to the present embodiment can refer to the description of the first embodiment, and thus are not repeated herein.
Example three
As shown in fig. 4, the present embodiment provides an account identification apparatus for identifying a relationship between an account to be identified and a user, the account identification apparatus including:
the first acquisition module is used for extracting a plurality of characteristics from the user behavior information of the account to be identified within a preset time;
the second acquisition module is used for determining the characteristics to be judged of the account to be identified according to the user behavior information to be judged of the account to be identified;
the comparison module is used for respectively comparing the features to be judged of the account to be identified with the plurality of features to obtain a plurality of comparison results;
and the identification module is used for identifying the relevance between the account number to be identified and the user according to the comparison results.
In this embodiment, the first obtaining module may extract a plurality of features from the user behavior information of the account to be identified within a predetermined time period by one of the following methods:
extracting a plurality of user habit features from user behavior information of an account to be identified within a preset time;
and extracting user relationship characteristics and at least one user habit characteristic from the user behavior information of the account to be identified within a preset time.
The user behavior information comprises a plurality of user behavior characteristics, and each user behavior characteristic corresponds to one or more characteristic values;
the first acquisition module can extract user habit characteristics from user behavior information of the account to be identified within a preset time period in the following way:
determining one or more characteristic values corresponding to the user behavior characteristics from the user behavior information within the preset time length;
respectively determining a characteristic value meeting a first preset condition in the one or more characteristic values aiming at each user behavior characteristic, and determining the characteristic value meeting the first preset condition as a user habit characteristic;
wherein the first preset condition comprises at least one of: the largest number of uses; the longest duration of use; the largest weighted sum of the number of uses and the duration of use.
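The extraction of a user habit feature for one user behavior feature can be sketched as follows. The record layout (pairs of feature value and usage duration), the function name `habit_feature` and the weights are assumptions made for illustration; the text does not prescribe a data format.

```python
from collections import defaultdict


def habit_feature(records, count_weight=1.0, duration_weight=0.0):
    """Pick the feature value that best satisfies the first preset condition.

    records: iterable of (feature_value, duration) usage events for ONE
    user behavior feature within the preset duration.
    count_weight=1, duration_weight=0  -> largest number of uses
    count_weight=0, duration_weight=1  -> longest duration of use
    both non-zero                      -> largest weighted sum
    """
    counts = defaultdict(int)
    durations = defaultdict(float)
    for value, duration in records:
        counts[value] += 1
        durations[value] += duration
    if not counts:
        raise ValueError("no usage records in the preset duration")
    return max(
        counts,
        key=lambda v: count_weight * counts[v] + duration_weight * durations[v],
    )


# Made-up device usage: (device id, seconds of use).
device_records = [("phone-A", 600), ("phone-A", 300), ("laptop-B", 7200)]
```

In this made-up data, `phone-A` is used most often while `laptop-B` is used longest, so the choice of condition changes which value becomes the habit feature.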
In this embodiment, the first obtaining module may extract the user relationship feature from the user behavior information of the account to be identified within the predetermined time period in the following manner:
respectively determining a first relation score and a second relation score according to the user behavior information within the preset time length; wherein the first relationship score comprises a relationship score between the account to be identified and each first associated feature, the first associated feature being an associated feature associated with the account to be identified; the second relation score comprises a relation score between a relation account and the associated first associated characteristic, and the relation account refers to an account associated with any first associated characteristic except the account to be identified;
determining a third relation score between the account to be identified and each relation account according to the first relation score and the second relation score;
determining the user relationship characteristics of the account to be identified according to the third relation score;
the association characteristics refer to characteristic values of user behavior characteristics capable of constructing relationships among different accounts.
In this embodiment, the user relationship features include: the relation accounts that satisfy a second preset condition, where the second preset condition includes: the third relation score between the relation account and the account to be identified is greater than or equal to a first threshold, or the third relation score ranks among the top N when the third relation scores are sorted from high to low, where N is a positive integer.
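The selection of relation accounts can be sketched as follows. The combination rule used here for the third relation score (summing, over shared associated features, the product of the first and second relation scores) is an assumption made purely for illustration, since this passage does not state the formula; the function names and all scores are likewise hypothetical.

```python
def third_relation_scores(first_scores, second_scores):
    """Combine first and second relation scores per relation account.

    first_scores:  {associated_feature: score} for the account to be identified
    second_scores: {relation_account: {associated_feature: score}}
    ASSUMPTION: third score = sum of first * second over shared features.
    """
    return {
        account: sum(
            first_scores[f] * s for f, s in feats.items() if f in first_scores
        )
        for account, feats in second_scores.items()
    }


def relationship_feature(third_scores, first_threshold=None, top_n=None):
    """Apply the second preset condition: score >= threshold, or top N."""
    if first_threshold is not None:
        return {a for a, s in third_scores.items() if s >= first_threshold}
    if top_n is None:
        raise ValueError("give either first_threshold or top_n")
    ranked = sorted(third_scores, key=third_scores.get, reverse=True)
    return set(ranked[:top_n])


# Made-up scores: associated features are a shared device and a shared Wi-Fi.
first = {"device-1": 0.9, "wifi-7": 0.5}
second = {
    "acct-B": {"device-1": 0.8},
    "acct-C": {"wifi-7": 0.4},
    "acct-D": {"device-1": 0.1, "wifi-7": 0.2},
}
third = third_relation_scores(first, second)
```

The resulting set of relation accounts plays the role of the user relationship list that is later compared against the list observed at identification time.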
In this embodiment, the comparing module may be configured to compare the feature to be determined with the plurality of features respectively in the following manner to obtain a plurality of comparison results:
when the features to be judged comprise the user behavior features to be judged and the features comprise the user habit features corresponding to the user behavior features to be judged, comparing whether the feature values of the user behavior features to be judged are consistent with the user habit features or not to obtain a comparison result;
and when the plurality of features comprise user relationship features and the features to be judged comprise the user relationship features to be judged, calculating the Jaccard distance between the user relationship features to be judged and the user relationship features comprised in the plurality of features, and comparing the Jaccard distance with a second threshold value to obtain a comparison result.
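The two kinds of comparison can be sketched as follows. The Jaccard distance is the standard one (one minus the ratio of intersection size to union size); treating a distance at or below the second threshold as "normal" (1) is an assumption about the direction of the comparison, and the function names and account identifiers are hypothetical.

```python
def compare_habit(value_to_judge, habit_value):
    """Boolean-type result: 1 if the observed value matches the habit feature."""
    return 1 if value_to_judge == habit_value else 0


def jaccard_distance(a, b):
    """1 - |A ∩ B| / |A ∪ B|; defined as 0.0 when both sets are empty."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def compare_relationship(list_to_judge, stored_list, second_threshold):
    """ASSUMPTION: a distance <= second threshold counts as normal (1)."""
    distance = jaccard_distance(list_to_judge, stored_list)
    return 1 if distance <= second_threshold else 0


stored = {"acct-B", "acct-C", "acct-D"}    # relationship list from history
current = {"acct-B", "acct-C", "acct-E"}   # relationship list observed now
```

Here the two lists share two of four distinct accounts, so the distance is 0.5; whether that counts as normal depends entirely on the chosen second threshold.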
In this embodiment, the identification module may be configured to identify the association between the account to be identified and the user according to the comparison results in the following manner:
identifying an association between the account to be identified and a user by calculating the comparison results using a user identification model; or,
and identifying the relevance between the account to be identified and the user by judging whether the comparison results meet the preset conditions.
In this embodiment, the account identification apparatus may further include: a model building module, configured to obtain the user identification model through the following steps: selecting sample data, wherein the sample data includes data of accounts whose user is known to have changed; and training a machine learning algorithm model with the sample data to obtain the user identification model.
In this embodiment, the identification module may be configured to identify the association between the account to be identified and the user by calculating the comparison results using a user identification model in the following manner:
calculating the comparison results by using a user identification model to obtain a probability value that the user of the account to be identified does not change; when the probability value is larger than a third threshold value, identifying that the user of the account to be identified is unchanged; and when the probability value is smaller than or equal to a third threshold value, identifying that the user of the account to be identified changes.
In this embodiment, the first obtaining module may be configured to extract a plurality of features from the user behavior information of the account to be identified within the predetermined time period in the following manner:
extracting a plurality of features from the user behavior information within a preset time length at each information acquisition time from the registration time of the account to be identified;
the preset time length refers to a time length between the registration time of the account to be identified and the latest information acquisition time, or the preset time length refers to an interval time length between adjacent information acquisition times.
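The two readings of the preset duration can be sketched as follows. Times are plain numbers (for example, days since registration); the function names, the `mode` strings and the half-open interval convention (start excluded, end included) are assumptions made for illustration.

```python
def preset_window(registration_time, acquisition_times, index,
                  mode="cumulative"):
    """Return (start, end) of the preset duration for one acquisition time.

    mode "cumulative": from the registration time to this acquisition time.
    mode "interval":   from the previous acquisition time to this one
                       (from the registration time for the first one).
    """
    end = acquisition_times[index]
    if mode == "cumulative":
        start = registration_time
    elif mode == "interval":
        start = acquisition_times[index - 1] if index > 0 else registration_time
    else:
        raise ValueError("mode must be 'cumulative' or 'interval'")
    return start, end


def behavior_in_window(events, start, end):
    """Select behavior records with start < time <= end (an assumption)."""
    return [info for t, info in events if start < t <= end]


# Made-up behavior log: (time, description).
events = [(3, "login from phone-A"), (12, "browse"), (25, "purchase")]
```

Under the cumulative reading the features are re-extracted from the whole history at every acquisition time, while under the interval reading only the most recent slice is used.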
The processing flow of the account identification apparatus provided in this embodiment may refer to the method described in the first embodiment, and therefore, the description thereof is omitted here.
Example four
As shown in fig. 5, the present embodiment provides an account identification system for identifying a relationship between an account to be identified and a user, the account identification system including: a first device and a second device;
the first device is used for extracting a plurality of characteristics from the user behavior information of the account to be identified within a preset time length, determining the characteristics to be judged of the account to be identified according to the user behavior information to be judged of the account to be identified, respectively comparing the characteristics to be judged with the plurality of characteristics to obtain a plurality of comparison results, and sending the plurality of comparison results to the second device;
and the second device is used for receiving the comparison results sent by the first device and identifying the relevance between the account to be identified and the user according to the comparison results.
In this embodiment, the second device is configured to identify the association between the account to be identified and the user according to the comparison results in the following manner:
identifying an association between the account to be identified and a user by calculating the comparison results using a user identification model; or,
and identifying the relevance between the account to be identified and the user by judging whether the comparison results meet the preset conditions.
In this embodiment, the first device is further configured to obtain the user identification model by: selecting sample data, wherein the sample data includes data of accounts whose user is known to have changed; training a machine learning algorithm model with the sample data to obtain the user identification model;
the first device is further configured to send the obtained user identification model to the second device.
In this embodiment, the first device may be a server-side computing device or a virtual machine running on the server-side computing device, and the second device may be a client computing device or a server-side computing device different from the first device.
The difference between this embodiment and the first embodiment is that, in this embodiment, the determination of the comparison results and the training of the user identification model are executed by an entity different from the one that executes the account identification process.
In this embodiment, for example, the server-side computing device extracts a plurality of features from the user behavior information of the account to be identified within a predetermined time period, determines the features to be determined of the account to be identified according to the user behavior information to be determined of the account to be identified, compares the features to be determined with the plurality of features respectively to obtain a plurality of comparison results, and sends the plurality of comparison results to the client-side computing device in real time; the server-side computing equipment performs model training by using the selected sample data to obtain a user identification model, and sends the trained user identification model to the client-side computing equipment; and the client computing equipment calculates the received comparison results by using a user identification model to obtain a probability value that the user of the account to be identified does not change, and identifies whether the user of the account to be identified changes or not according to the probability value. The specific implementation of the related process in this embodiment can be referred to the description of the first embodiment, and therefore, the description thereof is omitted here.
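The division of work between the two devices can be sketched as follows. The JSON message layout, the `type` field, the function names and the class `SecondDevice` are all hypothetical; the text does not specify a transport or message format, and a logistic-regression model is assumed as in the first embodiment.

```python
import json
import math


def encode_model(theta):
    """First device: serialize the trained weights for the second device."""
    return json.dumps({"type": "model", "theta": list(theta)})


def encode_results(account_id, comparison_results):
    """First device: serialize the comparison results for one account."""
    return json.dumps({
        "type": "results",
        "account": account_id,
        "comparison_results": list(comparison_results),
    })


class SecondDevice:
    """Receives the model once, then identifies accounts from results."""

    def __init__(self, third_threshold=0.5):
        self.theta = None
        self.third_threshold = third_threshold

    def receive(self, message):
        msg = json.loads(message)
        if msg["type"] == "model":
            self.theta = msg["theta"]
            return None
        if self.theta is None:
            raise RuntimeError("no user identification model received yet")
        x = msg["comparison_results"]
        z = self.theta[0] + sum(t * c for t, c in zip(self.theta[1:], x))
        p = 1.0 / (1.0 + math.exp(-z))
        return {
            "account": msg["account"],
            "probability_unchanged": p,
            "user_changed": p <= self.third_threshold,
        }
```

Sending the model separately from the comparison results means the second device only needs a new model message when the first device retrains.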
Example five
An embodiment of the present invention provides a data processing electronic device for account identification, including: a memory and a processor; the memory is used for storing a program for account identification, and when the program for account identification is read and executed by the processor, the program performs the following operations: determining the characteristics to be judged of the account to be identified according to the user behavior information to be judged of the account to be identified; respectively comparing the features to be judged with the plurality of features to obtain a plurality of comparison results; and identifying the relevance between the account to be identified and the user according to the comparison results, wherein the characteristics are extracted from the user behavior information of the account to be identified in a preset time.
In addition, the specific operations performed by the processor may refer to the first embodiment and the second embodiment, and therefore, the detailed description thereof is omitted here.
In addition, the embodiment of the invention also provides a computer-readable storage medium, which stores computer-executable instructions, and the computer-executable instructions, when executed by a processor, implement the account identification method.
It will be understood by those skilled in the art that all or part of the steps of the above methods may be implemented by a program instructing associated hardware (e.g., a processor) to perform the steps, and the program may be stored in a computer readable storage medium, such as a read only memory, a magnetic or optical disk, and the like. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits. Accordingly, the modules/units in the above embodiments may be implemented in hardware, for example, by an integrated circuit, or may be implemented in software, for example, by a processor executing programs/instructions stored in a memory to implement the corresponding functions. The present application is not limited to any specific form of hardware or software combination.
The foregoing shows and describes the general principles and features of the present application, together with its advantages. The present application is not limited to the above embodiments; the embodiments and the description merely illustrate the principles of the application, and various changes and modifications may be made without departing from the spirit and scope of the application, all of which fall within the scope claimed by the application.

Claims (13)

1. An account identification method for identifying the relevance between an account to be identified and a user, wherein a plurality of characteristics can be extracted from user behavior information of the account to be identified in a preset time, the account identification method comprises the following steps:
determining the characteristics to be judged of the account to be identified according to the user behavior information to be judged of the account to be identified;
respectively comparing the features to be judged of the account to be identified with the plurality of features to obtain a plurality of comparison results;
according to the comparison results, identifying the relevance between the account number to be identified and the user;
wherein the plurality of features are extracted from the user behavior information of the account to be identified within the preset duration in the following manner:
extracting user relationship characteristics and at least one user habit characteristic from user behavior information of an account to be identified within a preset time;
the method for extracting the user relationship characteristics from the user behavior information of the account to be identified in the preset time comprises the following steps:
respectively determining a first relation score and a second relation score according to the user behavior information within the preset time length; wherein the first relationship score comprises a relationship score between the account to be identified and each first associated feature, the first associated feature being an associated feature associated with the account to be identified; the second relation score comprises a relation score between a relation account and the associated first associated characteristic, and the relation account refers to an account associated with any first associated characteristic except the account to be identified;
determining a third relation score between the account to be identified and each relation account according to the first relation score and the second relation score;
determining the user relationship characteristics of the account to be identified according to the third relation score;
the association characteristics refer to characteristic values of user behavior characteristics capable of constructing relationships among different accounts.
2. The method of claim 1, wherein identifying the association between the account to be identified and the user according to the plurality of comparison results comprises:
identifying an association between the account to be identified and a user by calculating the comparison results using a user identification model; or,
and identifying the relevance between the account to be identified and the user by judging whether the comparison results meet the preset conditions.
3. The method of claim 1, wherein the user behavior information comprises a plurality of user behavior characteristics, each user behavior characteristic corresponding to one or more characteristic values;
wherein the user habit features are extracted from the user behavior information of the account to be identified within the preset duration in the following manner:
determining one or more characteristic values corresponding to the user behavior characteristics from the user behavior information within the preset time length;
respectively determining a characteristic value meeting a first preset condition in the one or more characteristic values aiming at each user behavior characteristic, and determining the characteristic value meeting the first preset condition as a user habit characteristic;
wherein the first preset condition comprises at least one of: the largest number of uses; the longest duration of use; the largest weighted sum of the number of uses and the duration of use.
4. The method of claim 1, wherein the user relationship features comprise: the relation accounts that satisfy a second preset condition, wherein the second preset condition comprises: the third relation score between the relation account and the account to be identified is greater than or equal to a first threshold, or the third relation score ranks among the top N when the third relation scores are sorted from high to low, wherein N is a positive integer.
5. The method according to claim 1, wherein the comparing the features to be determined of the account to be identified with the plurality of features respectively to obtain a plurality of comparison results comprises:
when the features to be judged comprise the user behavior features to be judged and the features comprise the user habit features corresponding to the user behavior features to be judged, comparing whether the feature values of the user behavior features to be judged are consistent with the user habit features or not to obtain a comparison result;
and when the plurality of features comprise user relationship features and the features to be judged comprise the user relationship features to be judged, calculating the Jaccard distance between the user relationship features to be judged and the user relationship features comprised in the plurality of features, and comparing the Jaccard distance with a second threshold value to obtain a comparison result.
6. The method of claim 2, wherein before identifying the association between the account to be identified and the user by computing the plurality of comparison results using a user identification model, the method further comprises: obtaining the user identification model by:
selecting sample data, wherein the sample data includes data of accounts whose user is known to have changed;
and training a machine learning algorithm model by using the sample data to obtain the user identification model.
7. The method of claim 2, wherein the calculating the comparison results using a user identification model to identify the association between the account to be identified and the user comprises:
calculating the comparison results by using a user identification model to obtain a probability value that the user of the account to be identified does not change;
when the probability value is larger than a third threshold value, identifying that the user of the account to be identified is unchanged;
and when the probability value is smaller than or equal to a third threshold value, identifying that the user of the account to be identified changes.
8. The method of claim 1, wherein the extracting a plurality of features from the user behavior information of the account to be identified within a predetermined time period comprises:
extracting a plurality of features from the user behavior information within a preset time length at each information acquisition time from the registration time of the account to be identified;
the preset time length refers to a time length between the registration time of the account to be identified and the latest information acquisition time, or the preset time length refers to an interval time length between adjacent information acquisition times.
9. An account identification apparatus for identifying an association between an account to be identified and a user, the account identification apparatus comprising:
a first acquisition module, configured to extract a plurality of features from user behavior information of the account to be identified within a preset time length;
a second acquisition module, configured to determine features to be judged of the account to be identified according to user behavior information to be judged of the account to be identified;
a comparison module, configured to compare the features to be judged of the account to be identified with each of the plurality of features to obtain a plurality of comparison results;
an identification module, configured to identify the association between the account to be identified and the user according to the plurality of comparison results;
wherein the first acquisition module extracts the plurality of features from the user behavior information of the account to be identified within the preset time length in the following manner:
extracting a user relationship feature and at least one user habit feature from the user behavior information of the account to be identified within the preset time length;
wherein extracting the user relationship feature from the user behavior information of the account to be identified within the preset time length comprises:
determining a first relation score and a second relation score according to the user behavior information within the preset time length; wherein the first relation score comprises a relation score between the account to be identified and each first associated feature, a first associated feature being an associated feature that is associated with the account to be identified; the second relation score comprises a relation score between each relation account and the first associated feature with which it is associated, a relation account being any account, other than the account to be identified, that is associated with any first associated feature;
determining a third relation score between the account to be identified and each relation account according to the first relation score and the second relation score;
determining the user relationship feature of the account to be identified according to the third relation score;
wherein the associated features are feature values of user behavior features that can be used to construct relationships among different accounts.
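The relation-score steps of claim 9 can be sketched as follows. The claim does not say how the first and second relation scores are combined. The rule used here (multiply the two scores and sum over shared associated features) is an assumption made only for illustration, as are the function names, the `top_k` parameter, and the example feature values such as `dev1` and `ip1`.

```python
from collections import defaultdict

def third_relation_scores(target, scores):
    """scores maps (account, associated_feature) -> relation score.

    First relation scores  : entries whose account is the target account.
    Second relation scores : entries of other accounts on the target's features.
    """
    first = {feature: s for (account, feature), s in scores.items() if account == target}
    third = defaultdict(float)
    for (account, feature), second in scores.items():
        if account != target and feature in first:
            # Assumed combination rule: product of the two scores,
            # accumulated over every associated feature the accounts share.
            third[account] += first[feature] * second
    return dict(third)

def user_relationship_feature(third, top_k=3):
    """Assumed representation: the top_k relation accounts by third score."""
    return sorted(third, key=lambda account: (-third[account], account))[:top_k]

scores = {
    ("A", "dev1"): 0.8, ("A", "ip1"): 0.5,   # account to be identified
    ("B", "dev1"): 0.5,                      # shares a device with A
    ("C", "ip1"): 0.4, ("C", "dev1"): 0.5,   # shares a device and an IP with A
    ("D", "ip9"): 1.0,                       # shares nothing with A
}
third = third_relation_scores("A", scores)
closest = user_relationship_feature(third, top_k=1)
```

Account `D` never appears in the result because it is not associated with any first associated feature, which matches the claim's definition of a relation account.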
10. The apparatus of claim 9, wherein the identification module is configured to identify the association between the account to be identified and the user according to the plurality of comparison results by:
identifying the association between the account to be identified and the user by evaluating the plurality of comparison results with a user identification model; or
identifying the association between the account to be identified and the user by judging whether the plurality of comparison results meet a preset condition.
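The two alternatives in claim 10 can be contrasted in a small sketch. Neither the model form nor the preset condition is specified in the claim. The logistic scoring, the `threshold` and `min_matches` parameters, and both function names are assumptions chosen for illustration.

```python
import math

def identify_by_model(comparison_results, weights, bias):
    """Alternative 1: feed the comparison results to a model.

    Assumed model: a logistic function over a weighted sum, returning an
    estimated probability that the account still belongs to the same user.
    """
    z = bias + sum(w * r for w, r in zip(weights, comparison_results))
    return 1.0 / (1.0 + math.exp(-z))

def identify_by_preset_condition(comparison_results, threshold=0.6, min_matches=2):
    """Alternative 2: check a preset condition.

    Assumed condition: at least min_matches comparison results reach threshold.
    """
    matches = sum(1 for r in comparison_results if r >= threshold)
    return matches >= min_matches

probability = identify_by_model([0.0, 0.0], weights=[1.0, 1.0], bias=0.0)
same_user = identify_by_preset_condition([0.9, 0.7, 0.1])
changed_user = identify_by_preset_condition([0.9, 0.2, 0.1])
```

The rule-based route needs no training data, while the model route can weigh features differently once labelled samples are available.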
11. An account identification system for identifying an association between an account to be identified and a user, the account identification system comprising: a first device and a second device;
the first device is configured to extract a plurality of features from user behavior information of the account to be identified within a preset time length, determine features to be judged of the account to be identified according to user behavior information to be judged of the account to be identified, compare the features to be judged with each of the plurality of features to obtain a plurality of comparison results, and send the plurality of comparison results to the second device;
the second device is configured to receive the plurality of comparison results sent by the first device and identify the association between the account to be identified and the user according to the plurality of comparison results;
wherein the first device extracts the plurality of features from the user behavior information of the account to be identified within the preset time length in the following manner:
extracting a user relationship feature and at least one user habit feature from the user behavior information of the account to be identified within the preset time length;
wherein extracting the user relationship feature from the user behavior information of the account to be identified within the preset time length comprises:
determining a first relation score and a second relation score according to the user behavior information within the preset time length; wherein the first relation score comprises a relation score between the account to be identified and each first associated feature, a first associated feature being an associated feature that is associated with the account to be identified; the second relation score comprises a relation score between each relation account and the first associated feature with which it is associated, a relation account being any account, other than the account to be identified, that is associated with any first associated feature;
determining a third relation score between the account to be identified and each relation account according to the first relation score and the second relation score;
determining the user relationship feature of the account to be identified according to the third relation score;
wherein the associated features are feature values of user behavior features that can be used to construct relationships among different accounts.
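The division of work between the two devices in claim 11 can be sketched as a pair of functions that exchange a message. The claim fixes neither the comparison measure nor the transport format. Jaccard similarity, the JSON payload, the field names, and the "every result reaches the threshold" condition are all assumptions made for illustration.

```python
import json

def jaccard(a, b):
    """Assumed comparison measure: overlap of two feature-value sets."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 1.0

def first_device_message(account_id, historical_features, features_to_judge):
    """First device: compare each historical feature with the feature to be judged."""
    results = {
        name: jaccard(values, features_to_judge.get(name, ()))
        for name, values in historical_features.items()
    }
    return json.dumps({"account": account_id, "comparison_results": results})

def second_device_identify(message, threshold=0.5):
    """Second device: receive the comparison results and apply a preset condition."""
    payload = json.loads(message)
    results = payload["comparison_results"].values()
    # Assumed preset condition: every comparison result reaches the threshold.
    return payload["account"], all(r >= threshold for r in results)

history = {"relationship": ["B", "C"], "login_city": ["Hangzhou"]}
similar = {"relationship": ["B", "C", "D"], "login_city": ["Hangzhou"]}
different = {"relationship": ["X"], "login_city": ["Beijing"]}

kept = second_device_identify(first_device_message("A", history, similar))
changed = second_device_identify(first_device_message("A", history, different))
```

Only the comparison results cross the device boundary in this sketch, so the second device never handles the raw behavior information.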
12. The system of claim 11, wherein the second device is configured to identify the association between the account to be identified and the user according to the plurality of comparison results by:
identifying the association between the account to be identified and the user by evaluating the plurality of comparison results with a user identification model; or
identifying the association between the account to be identified and the user by judging whether the plurality of comparison results meet a preset condition.
13. The system of claim 12, wherein the first device is further configured to obtain the user identification model by: selecting sample data, wherein the sample data comprises known data on accounts whose user has changed; and training a machine learning model with the sample data to obtain the user identification model;
the first device is further configured to send the obtained user identification model to the second device.
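The training step of claim 13 can be sketched with a small hand-written logistic regression. The claim names no specific algorithm, so logistic regression, the label convention (1 for an unchanged user, 0 for a changed user), the hyperparameters, and the toy sample values are all assumptions for illustration.

```python
import math

def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_user_identification_model(samples, labels, epochs=500, lr=0.5):
    """Fit weights and a bias by stochastic gradient descent on the log loss.

    samples: one list of comparison results per labelled account.
    labels : 1 if the account stayed with the same user, 0 if the user changed.
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            p = _sigmoid(bias + sum(w * v for w, v in zip(weights, x)))
            error = p - y
            weights = [w - lr * error * v for w, v in zip(weights, x)]
            bias -= lr * error
    return weights, bias

def predict_same_user(model, comparison_results):
    """Estimated probability that the account still belongs to the same user."""
    weights, bias = model
    return _sigmoid(bias + sum(w * v for w, v in zip(weights, comparison_results)))

samples = [[0.9, 0.8], [0.8, 0.9], [0.1, 0.2], [0.2, 0.1]]
labels = [1, 1, 0, 0]
model = train_user_identification_model(samples, labels)
```

The returned `(weights, bias)` pair is plain data, so the first device could serialize it and send it to the second device as the claim describes.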
CN201610857050.0A 2016-09-27 2016-09-27 Account identification method, device and system Active CN107872436B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610857050.0A CN107872436B (en) 2016-09-27 2016-09-27 Account identification method, device and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610857050.0A CN107872436B (en) 2016-09-27 2016-09-27 Account identification method, device and system

Publications (2)

Publication Number Publication Date
CN107872436A CN107872436A (en) 2018-04-03
CN107872436B CN107872436B (en) 2020-11-24

Family

ID=61761075

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610857050.0A Active CN107872436B (en) 2016-09-27 2016-09-27 Account identification method, device and system

Country Status (1)

Country Link
CN (1) CN107872436B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108961034A (en) * 2018-05-25 2018-12-07 中国建设银行股份有限公司 User-behavior-based authentication system, method, and storage medium
CN108875327A (en) 2018-05-28 2018-11-23 阿里巴巴集团控股有限公司 Identity verification method and apparatus
CN108898418B (en) * 2018-05-31 2023-06-23 康键信息技术(深圳)有限公司 User account detection method, device, computer equipment and storage medium
CN109190874A (en) * 2018-07-12 2019-01-11 阿里巴巴集团控股有限公司 Method and device for implementing a multi-branch business process
CN109242470A (en) * 2018-08-14 2019-01-18 阿里巴巴集团控股有限公司 Personal identification method, device, equipment and computer readable storage medium
CN110046783B (en) * 2018-12-13 2023-04-28 创新先进技术有限公司 Method and device for identifying fraudulent account, electronic equipment and storage medium
CN110008980B (en) * 2019-01-02 2024-01-19 创新先进技术有限公司 Identification model generation method, identification device, identification equipment and storage medium
CN110399538B (en) * 2019-07-26 2023-11-07 腾讯科技(深圳)有限公司 User account identification method and related equipment
CN110532957B (en) * 2019-08-30 2021-05-07 北京市商汤科技开发有限公司 Face recognition method and device, electronic equipment and storage medium
CN112667982A (en) * 2019-10-16 2021-04-16 吴昌宇 Fingerprint and face identification monitoring system
CN110995655B (en) * 2019-11-06 2022-08-23 国网浙江武义县供电有限公司 Method and device for monitoring corresponding relation between personnel and equipment behaviors
CN112380561A (en) * 2020-05-10 2021-02-19 蔡萍萍 Data encryption method and data encryption system based on E-commerce live broadcast platform
CN111708995A (en) * 2020-06-12 2020-09-25 中国建设银行股份有限公司 Service processing method, device and equipment
CN111652626B (en) * 2020-06-18 2023-03-24 支付宝(杭州)信息技术有限公司 Method and device for realizing service
CN111784468B (en) * 2020-07-01 2022-11-18 支付宝(杭州)信息技术有限公司 Account association method and device and electronic equipment
CN111784355B (en) * 2020-07-17 2023-03-10 支付宝(杭州)信息技术有限公司 Transaction security verification method and device based on edge calculation
CN112115455B (en) * 2020-09-28 2023-10-24 中国银行股份有限公司 Method, device, server and medium for setting association relation of multiple user accounts
CN114531252B (en) * 2020-10-30 2024-05-10 中国电信股份有限公司 Security audit method and security audit device for account log
CN113011889B (en) * 2021-03-10 2023-09-15 腾讯科技(深圳)有限公司 Account anomaly identification method, system, device, equipment and medium
FR3138955A1 (en) * 2022-08-19 2024-02-23 Worldline Behavioral biometric authentication method and device
CN116451201A (en) * 2023-03-14 2023-07-18 电子科技大学 Mobile communication identity authentication method and system based on artificial intelligence

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544289A (en) * 2013-10-28 2014-01-29 公安部第三研究所 Feature extraction achieving method based on deploy and control data mining
CN104426884A (en) * 2013-09-03 2015-03-18 深圳市腾讯计算机系统有限公司 Method for authenticating identity and device for authenticating identity
CN104574192A (en) * 2013-10-25 2015-04-29 华为技术有限公司 Method and device for identifying same user from multiple social networks
CN104660594A (en) * 2015-02-09 2015-05-27 中国科学院信息工程研究所 Method for identifying virtual malicious nodes and virtual malicious node network in social networks
CN104811428A (en) * 2014-01-28 2015-07-29 阿里巴巴集团控股有限公司 Method, device and system for verifying client identity by social relation data
CN104917643A (en) * 2014-03-11 2015-09-16 腾讯科技(深圳)有限公司 Abnormal account detection method and device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7953814B1 (en) * 2005-02-28 2011-05-31 Mcafee, Inc. Stopping and remediating outbound messaging abuse
US8769684B2 (en) * 2008-12-02 2014-07-01 The Trustees Of Columbia University In The City Of New York Methods, systems, and media for masquerade attack detection by monitoring computer user behavior

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"Identifying user behavior on Twitter based on multi-scale entropy";S. He, H. Wang and Z. H. Jiang;《Proceedings 2014 IEEE International Conference on Security, Pattern Analysis, and Cybernetics (SPAC), Wuhan, 2014》;20141231;第381-384页 *
"社交网络账号的马甲关系辨识方法";樊茜,许洪波,梁英;《中文信息学报》;20141130;第28卷(第6期);第162-167页 *

Also Published As

Publication number Publication date
CN107872436A (en) 2018-04-03

Similar Documents

Publication Publication Date Title
CN107872436B (en) Account identification method, device and system
CN110399925B (en) Account risk identification method, device and storage medium
Halvaiee et al. A novel model for credit card fraud detection using Artificial Immune Systems
CN105590055B (en) Method and device for identifying user credible behaviors in network interaction system
CN111784348B (en) Account risk identification method and device
US20140351109A1 (en) Method and apparatus for automatically identifying a fraudulent order
CN112837069B (en) Block chain and big data based secure payment method and cloud platform system
CN108932582B (en) Risk information determination method and device, computer equipment and storage medium
CN111291015B (en) User behavior anomaly detection method and device
CN112231570B (en) Recommendation system shilling attack detection method, device, equipment and storage medium
CN109726556A (en) Near-line clustering and propagation of entity attributes in anti-abuse infrastructures
CN108961019B (en) User account detection method and device
CN112801670B (en) Risk assessment method and device for payment operation
CN110659807B (en) Risk user identification method and device based on link
US20170140023A1 (en) Techniques for Determining Whether to Associate New User Information with an Existing User
CN110570188A (en) Method and system for processing transaction requests
Gabryel et al. Browser fingerprint coding methods increasing the effectiveness of user identification in the web traffic
CN114693192A (en) Risk control decision method and device, computer equipment and storage medium
CN113468520A (en) Data intrusion detection method applied to block chain service and big data server
CN112749973A (en) Authority management method and device and computer readable storage medium
CN112488163A (en) Abnormal account identification method and device, computer equipment and storage medium
CN112347457A (en) Abnormal account detection method and device, computer equipment and storage medium
US20210182710A1 (en) Method and system of user identification by a sequence of opened user interface windows
RU2745362C1 (en) System and method of generating individual content for service user
CN114386488A (en) User category identification method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant