WO2016054988A1 - Method and apparatus for identifying smart device users - Google Patents

Method and apparatus for identifying smart device users

Publication number
WO2016054988A1
Authority
WO
WIPO (PCT)
Prior art keywords
variable
smart device
behavior data
network behavior
user
Prior art date
Application number
PCT/CN2015/091226
Other languages
English (en)
French (fr)
Inventor
沈进东
Original Assignee
阿里巴巴集团控股有限公司
沈进东
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 阿里巴巴集团控股有限公司, 沈进东
Publication of WO2016054988A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • The present application relates to the field of data processing technologies, and in particular to a method and apparatus for identifying a user of a smart device.
  • A user can communicate over a network through a client device. Client devices include non-smart devices, such as a personal computer (PC), and smart devices, such as a smartphone or a tablet computer.
  • In the related art, a smart device user is identified mainly from existing user access logs that contain the smart device model: the user is identified according to the smart device model recorded in the log.
  • However, when a smart device user does not use the smart device for network access, no smart device information is recorded in the user access log. In that case the user cannot be identified in the above manner, so its recognition effect is unsatisfactory.
  • The present application aims to solve at least one of the technical problems in the related art to some extent.
  • To this end, an object of the present application is to propose a method for identifying a user of a smart device, which can improve the recognition effect for smart device users.
  • Another object of the present application is to propose an apparatus for identifying a user of a smart device.
  • To achieve the above object, a method for identifying a user of a smart device proposed by an embodiment of the first aspect of the present application includes: extracting network behavior data of a user and determining a feature variable in the network behavior data; acquiring a first variable value and a second variable value, the first variable value comprising the variable value of the feature variable for the device user to be detected, and the second variable value comprising the variable value of the feature variable for predetermined positive samples; calculating distance information between the first variable value and the second variable value; and identifying the smart device user according to the distance information.
  • The method proposed by the embodiment of the first aspect extracts the user's network behavior data, determines a feature variable in that data, calculates distance information between the feature variable's value for the device user to be detected and its value for the predetermined positive samples, and identifies the smart device user according to the distance information. Smart device users can thus be identified from their network behavior data. Because this embodiment does not depend on smart device information in the user access log, a smart device user can still be identified when the log contains no such information, which improves the recognition effect.
  • To achieve the above object, an apparatus for identifying a user of a smart device proposed by an embodiment of the second aspect of the present application includes: a determining module, configured to extract network behavior data of the user and determine a feature variable in the network behavior data; an obtaining module, configured to acquire a first variable value and a second variable value, the first variable value comprising the variable value of the feature variable for the device user to be detected, and the second variable value comprising the variable value of the feature variable for predetermined positive samples; a calculation module, configured to calculate distance information between the first variable value and the second variable value; and an identification module, configured to identify the smart device user according to the distance information.
  • The apparatus proposed by the embodiment of the second aspect extracts the user's network behavior data, determines a feature variable in that data, calculates distance information between the feature variable's value for the device user to be detected and its value for the predetermined positive samples, and identifies the smart device user according to the distance information. Smart device users can thus be identified from their network behavior data. Because this embodiment does not depend on smart device information in the user access log, a smart device user can still be identified when the log contains no such information, which improves the recognition effect.
  • FIG. 1 is a schematic flowchart of a method for identifying a user of a smart device according to an embodiment of the present application
  • FIG. 2 is a schematic flowchart of a method for identifying a user of a smart device according to another embodiment of the present application
  • FIG. 3 is a schematic structural diagram of an apparatus for identifying a user of a smart device according to another embodiment of the present application.
  • FIG. 4 is a schematic structural diagram of an apparatus for identifying a user of a smart device according to another embodiment of the present application.
  • FIG. 1 is a schematic flowchart of a method for identifying a user of a smart device according to an embodiment of the present application, where the method includes:
  • S11: Extract the network behavior data of the user, and determine a feature variable in the network behavior data.
  • The user's network behavior data includes one or more of: website login data, for example login time and login location; access record data, for example product information browsed by the user; recharge data; payment behavior data; transaction data; cash withdrawal data; registered mobile phone number; shopping behavior data; and so on.
  • The feature variables are a preset number of variables extracted from the user's network behavior data, namely variables that differ greatly between the positive samples and the negative samples.
  • Specifically, extracting the network behavior data of the user and determining the feature variable in the network behavior data includes:
  • selecting the positive samples and negative samples, and acquiring the network behavior data of the positive samples and of the negative samples, where a positive sample is a known smart device user and a negative sample is a known non-smart device user;
  • performing a differentiation calculation on the network behavior data of the positive samples and of the negative samples to obtain a differentiated score for each variable in the network behavior data;
  • determining the feature variable according to the differentiated score.
  • Because the feature variables are the variables that differ most between the positive samples and the negative samples, the variables can be sorted by differentiated score in descending order, and the preset number of variables at the top of the sorted list selected as the feature variables.
  • S12: Acquire a first variable value and a second variable value, the first variable value comprising the variable value of the feature variable for the device user to be detected, and the second variable value comprising the variable value of the feature variable for the predetermined positive samples.
  • The first variable value may be obtained from the user access log of the device user to be detected.
  • The second variable value can be obtained from the user access logs of the positive samples.
  • Optionally, the second variable value may be obtained from the user access logs used when determining the positive samples. For example, if the positive samples were determined from data covering the beginning of 2012 to the end of 2013, the second variable value, such as the value of the login time, can be obtained from the positive samples' user access logs for that period.
  • Alternatively, because access behavior on a smart device or a non-smart device is consistent across time periods, the second variable value can also be obtained from user access logs of other periods, for example from existing 2014 user access logs: if smartphone logins from the beginning of 2012 to the end of 2013 usually occurred in the morning, smartphone logins in 2014 usually still occur in the morning.
  • The predetermined positive samples are a preset number of smart device users selected from known smart device users.
  • The known smart device users may be smart device users within a preset time period, and the selection may be random.
  • Optionally, smart device users and non-smart device users may be determined according to historical data acquired by the server in advance; a preset number of users randomly selected from the smart device users are taken as positive samples, and a preset number of users randomly selected from the non-smart device users are taken as negative samples.
  • For example, if the smart device is a smartphone, users who had a smartphone access record between the beginning of 2012 and the end of 2013 can be selected. These users are known smart device users, and 500,000 of them can be randomly selected as positive samples.
  • Likewise, users who had no smartphone access record before the end of 2013 but had one after the end of 2013 can be screened out and regarded as non-smart device users as of the end of 2013. From these non-smart device users, 500,000 are randomly selected as negative samples.
  • S13: Calculate distance information between the first variable value and the second variable value.
  • Optionally, a center value may be determined from the feature variable values of the positive samples, and a distance value between the first variable value and the center value calculated.
  • For example, suppose the device user to be detected is X, the positive samples are Y1, Y2, and Y3, and the feature variables are A, B, and C. The center point of Y1(A, B, C), Y2(A, B, C), and Y3(A, B, C) can be calculated; call it O(A, B, C). The spatial distance between X(A, B, C) and O(A, B, C) is then calculated to obtain the distance value.
  • S14: Identify the smart device user according to the distance information.
  • Optionally, the step may include: taking the distance value as a score value; normalizing the score value to obtain a normalized score value; and, when the normalized score value is greater than a preset threshold, determining that the device user to be detected is a smart device user.
  • The preset threshold is determined according to specific business requirements. For example, if the score value is normalized to the range 0-10 and the preset threshold is 6, a user whose score value is greater than 6 can be determined to be a smart device user.
  • This embodiment extracts the user's network behavior data, determines a feature variable in that data, calculates distance information between the feature variable's value for the device user to be detected and its value for the predetermined positive samples, and identifies the smart device user according to the distance information. Smart device users can thus be identified from their network behavior data. Because this embodiment does not depend on smart device information in the user access log, a smart device user can still be identified when the log contains no such information, which improves the recognition effect.
  • FIG. 2 is a schematic flowchart of a method for identifying a user of a smart device according to another embodiment of the present application, where the method includes:
  • S21: Select positive samples and negative samples.
  • A positive sample is a known smart device user, and a negative sample is a known non-smart device user.
  • Smart device users and non-smart device users may be determined according to historical data acquired by the server in advance. Specifically, the historical data may include device information and access time information.
  • A user whose access time information falls within a preset time period before a preset time point, and whose device information is smart device information, is determined to be a smart device user. A user whose device information is smart device information after the preset time point, but is non-smart device information before the preset time point, is determined to be a non-smart device user.
  • For example, the preset time point is the beginning of 2014 and the preset time period is from the beginning of 2012 to the end of 2013. Users who had smart device access records between the beginning of 2012 and the end of 2013 can be regarded as smart device users, while users who had no smart device access before 2014 and had a smart device access record after 2014 can be regarded as non-smart device users.
  • A preset number of users randomly selected from the smart device users are taken as positive samples, and a preset number of users randomly selected from the non-smart device users are taken as negative samples.
  • The preset number is, for example, 500,000.
  • S22: Acquire the network behavior data of the positive samples and the network behavior data of the negative samples.
  • The network behavior data of the positive samples and of the negative samples within the preset time period before the preset time point may be acquired.
  • Acquiring the network behavior data may include selecting a preset number of network behavior variables and then obtaining the data of the selected variables. The network behavior variables may include website login, access records, recharge, payment, transactions, withdrawal, registered mobile phone number, shopping behavior, and so on.
  • The preset number is, for example, 130: 130 variables can be selected from the above network behavior variables, and the data of those variables obtained as the network behavior data.
  • The variables can be selected randomly, or the most relevant variables can be selected according to a preset correlation algorithm.
  • S23: Determine the feature variables according to the network behavior data of the positive samples and the network behavior data of the negative samples.
  • A differentiation calculation may be performed on the network behavior data of the positive samples and of the negative samples to obtain a differentiated score for each variable in the network behavior data, and the feature variables determined according to the differentiated scores.
  • The algorithm used in the differentiation calculation may include a Population Stability Index (PSI) algorithm and/or an Effective Size (ES) algorithm.
  • The PSI algorithm and/or the ES algorithm can be executed in a Sequence Retrieval System (SRS).
  • Determining the feature variables according to the differentiated scores includes:
  • calculating a differentiated score for each variable according to the PSI algorithm, and selecting a first group of variables in descending order of differentiated score;
  • calculating a differentiated score for each variable according to the ES algorithm, and selecting a second group of variables in descending order of differentiated score;
  • selecting a preset number of variables from the first group of variables and the second group of variables, and determining them as the feature variables.
  • For example, according to the PSI algorithm, 30 variables with larger differences can be obtained as the first group of variables; according to the ES algorithm, 30 variables with larger differences are selected from the variables with ES > 20 as the second group of variables. Then 20 variables that appear in both the first group and the second group are selected. When fewer than 20 variables are repeated across the two groups, variables with larger differences can be added according to their difference values, so that 20 feature variables are finally obtained.
  • S24: Acquire a first variable value and a second variable value, the first variable value comprising the variable value of the feature variable for the device user to be detected, and the second variable value comprising the variable value of the feature variable for the predetermined positive samples.
  • The variable values of the feature variables can be obtained from the user access log of the device user to be detected, giving the first variable value, and from the user access logs of the positive samples, giving the second variable value.
  • Optionally, a center value may be determined from the feature variable values of the positive samples, and a distance value between the first variable value and the center value calculated.
  • For example, suppose the device user to be detected is X, the positive samples are Y1, Y2, and Y3, and the feature variables are A, B, and C. The center point of Y1(A, B, C), Y2(A, B, C), and Y3(A, B, C) can be calculated; call it O(A, B, C). The spatial distance between X(A, B, C) and O(A, B, C) is then calculated and taken as the distance value.
  • Optionally, the distance value may be taken as a score value; the score value is normalized to obtain a normalized score value; and when the normalized score value is greater than a preset threshold, the device user to be detected is determined to be a smart device user.
  • The preset threshold is determined according to specific business requirements. For example, if the score value is normalized to the range 0-10 and the preset threshold is 6, a device user to be detected whose score is greater than 6 can be determined to be a smart device user.
  • In a specific implementation, the SAS language (a language dedicated to data processing and statistical computing) or Hadoop-based Hive SQL (hiveSql) can be used.
  • This embodiment extracts the user's network behavior data, determines a feature variable in that data, calculates distance information between the feature variable's value for the device user to be detected and its value for the predetermined positive samples, and identifies the smart device user according to the distance information. Smart device users can thus be identified from their network behavior data. Because this embodiment does not depend on smart device information in the user access log, a smart device user can still be identified when the log contains no such information, which improves the recognition effect.
  • In addition, different algorithms can be combined in the differentiation calculation, so that more accurate feature variables can be determined and the recognition effect further improved.
  • FIG. 3 is a schematic structural diagram of an apparatus for identifying a smart device user according to another embodiment of the present application.
  • the apparatus 30 includes a determining module 31, an obtaining module 32, a calculating module 33, and an identifying module 34.
  • The determining module 31 is configured to extract network behavior data of the user and determine a feature variable in the network behavior data.
  • The user's network behavior data includes one or more of: website login data, for example login time and login location; access record data, for example product information browsed by the user; recharge data; payment behavior data; transaction data; cash withdrawal data; registered mobile phone number; shopping behavior data; and so on.
  • The feature variables are a preset number of variables extracted from the user's network behavior data, namely variables that differ greatly between the positive samples and the negative samples.
  • the determining module 31 includes:
  • a first unit 311, configured to select the positive samples and the negative samples, and acquire the network behavior data of the positive samples and of the negative samples, where a positive sample is a known smart device user and a negative sample is a known non-smart device user.
  • The network behavior data of the positive samples and of the negative samples within the preset time period before the preset time point may be acquired.
  • Acquiring the network behavior data may include selecting a preset number of network behavior variables and then obtaining the data of the selected variables. The network behavior variables may include website login, access records, recharge, payment, transactions, withdrawal, registered mobile phone number, shopping behavior, and so on.
  • The preset number is, for example, 130: 130 variables can be selected from the above network behavior variables, and the data of those variables obtained as the network behavior data.
  • The variables can be selected randomly, or the most relevant variables can be selected according to a preset correlation algorithm.
  • The first unit 311 is specifically configured to determine smart device users and non-smart device users according to historical data acquired by the server in advance, to take a preset number of users randomly selected from the smart device users as positive samples, and to take a preset number of users randomly selected from the non-smart device users as negative samples.
  • The preset number is, for example, 500,000.
  • The first unit 311 is further configured to: obtain device information and access time information from the historical data acquired in advance by the server; determine as a smart device user any user whose access time information falls within a preset time period before the preset time point and whose device information is smart device information; and determine as a non-smart device user any user whose device information is smart device information after the preset time point but is non-smart device information before the preset time point.
  • For example, the preset time point is the beginning of 2014 and the preset time period is from the beginning of 2012 to the end of 2013. Users who had smart device access records between the beginning of 2012 and the end of 2013 can be regarded as smart device users, while users who had no smart device access before 2014 and had a smart device access record after 2014 can be regarded as non-smart device users.
  • The first unit 311 is further configured to acquire the network behavior data of the positive samples and of the negative samples within the preset time period before the preset time point.
  • a second unit 312, configured to perform a differentiation calculation on the network behavior data of the positive samples and of the negative samples, and obtain a differentiated score for each variable in the network behavior data.
  • The algorithm used in the differentiation calculation may include a Population Stability Index (PSI) algorithm and/or an Effective Size (ES) algorithm.
  • The PSI algorithm and/or the ES algorithm can be executed in a Sequence Retrieval System (SRS).
  • The second unit 312 is specifically configured to perform the differentiation calculation on the network behavior data of the positive samples and of the negative samples by using the PSI algorithm and/or the ES algorithm.
  • a third unit 313, configured to determine the feature variables according to the differentiated scores.
  • Because the feature variables are the variables that differ most between the positive samples and the negative samples, the variables can be sorted by differentiated score in descending order, and the preset number of variables at the top of the sorted list selected as the feature variables.
  • The third unit 313 is specifically configured to: calculate a differentiated score for each variable according to the PSI algorithm, and select a first group of variables in descending order of differentiated score; calculate a differentiated score for each variable according to the ES algorithm, and select a second group of variables in descending order of differentiated score; and select a preset number of variables from the first group of variables and the second group of variables and determine them as the feature variables.
  • For example, according to the PSI algorithm, 30 variables with larger differences can be obtained as the first group of variables; according to the ES algorithm, 30 variables with larger differences are selected from the variables with ES > 20 as the second group of variables. Then 20 variables that appear in both the first group and the second group are selected. When fewer than 20 variables are repeated across the two groups, variables with larger differences can be added according to their difference values, so that 20 feature variables are finally obtained.
  • The obtaining module 32 is configured to acquire a first variable value and a second variable value, the first variable value comprising the variable value of the feature variable for the device user to be detected, and the second variable value comprising the variable value of the feature variable for the predetermined positive samples.
  • The variable values of the feature variables can be obtained from the user access log of the device user to be detected, giving the first variable value, and from the user access logs of the positive samples, giving the second variable value.
  • The predetermined positive samples are a preset number of smart device users selected from known smart device users.
  • The known smart device users may be smart device users within a preset time period, and the selection may be random.
  • Optionally, smart device users and non-smart device users may be determined according to historical data acquired by the server in advance; a preset number of users randomly selected from the smart device users are taken as positive samples, and a preset number of users randomly selected from the non-smart device users are taken as negative samples.
  • For example, if the smart device is a smartphone, users who had a smartphone access record between the beginning of 2012 and the end of 2013 can be selected. These users are known smart device users, and 500,000 of them can be randomly selected as positive samples.
  • Likewise, users who had no smartphone access record before the end of 2013 but had one after the end of 2013 can be screened out and regarded as non-smart device users as of the end of 2013. From these non-smart device users, 500,000 are randomly selected as negative samples.
  • The calculation module 33 is configured to calculate distance information between the first variable value and the second variable value.
  • The calculation module 33 is specifically configured to determine a center value from the feature variable values of the positive samples, and calculate a distance value between the first variable value and the center value.
  • For example, suppose the device user to be detected is X, the positive samples are Y1, Y2, and Y3, and the feature variables are A, B, and C. The center point of Y1(A, B, C), Y2(A, B, C), and Y3(A, B, C) can be calculated; call it O(A, B, C). The spatial distance between X(A, B, C) and O(A, B, C) is then calculated to obtain the distance value.
  • The calculation module 33 is further configured to determine the center value by applying a minimum distance algorithm to the feature variable values of the positive samples.
  • The identification module 34 is configured to identify the smart device user according to the distance information.
  • The identification module 34 is specifically configured to take the distance value as a score value, normalize the score value to obtain a normalized score value, and, when the normalized score value is greater than the preset threshold, determine that the device user to be detected is a smart device user.
  • The preset threshold is determined according to specific business requirements. For example, if the score value is normalized to the range 0-10 and the preset threshold is 6, a user whose score value is greater than 6 can be determined to be a smart device user.
  • This embodiment extracts the user's network behavior data, determines a feature variable in that data, calculates distance information between the feature variable's value for the device user to be detected and its value for the predetermined positive samples, and identifies the smart device user according to the distance information. Smart device users can thus be identified from their network behavior data. Because this embodiment does not depend on smart device information in the user access log, a smart device user can still be identified when the log contains no such information, which improves the recognition effect.
  • Portions of the application can be implemented in hardware, software, firmware, or a combination thereof.
  • Multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system.
  • For example, if implemented in hardware, as in another embodiment, any one or a combination of the following techniques well known in the art can be used: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and so on.
  • Each functional unit in each embodiment of the present application may be integrated into one processing module, or each unit may exist physically separately, or two or more units may be integrated into one module.
  • The above integrated modules can be implemented in the form of hardware or in the form of software functional modules.
  • The integrated modules, if implemented in the form of software functional modules and sold or used as stand-alone products, may also be stored in a computer-readable storage medium.
  • The above-mentioned storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.
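
The feature-selection example in the embodiments above (a PSI-ranked group and an ES-ranked group of 30 variables each, keeping the variables that appear in both and topping up to 20) can be sketched roughly as follows. This is a minimal illustration, not the applicant's implementation. The function name `select_features`, its parameters, and the variable names in the usage note are hypothetical. The source does not say which score is used to rank the top-up candidates, so ranking them by PSI score is an assumption.

```python
def select_features(psi_scores, es_scores, group_size=30, target=20, es_min=20.0):
    """Pick feature variables from two score dictionaries {variable: score}."""

    def top(scores, n):
        # Variables sorted by score, largest first, truncated to n.
        return sorted(scores, key=scores.get, reverse=True)[:n]

    # First group: highest PSI scores.
    first_group = top(psi_scores, group_size)

    # Second group: highest ES scores among variables with ES > es_min.
    eligible_es = {v: s for v, s in es_scores.items() if s > es_min}
    second_group = top(eligible_es, group_size)

    # Variables that appear in both groups, in PSI order.
    second_set = set(second_group)
    selected = [v for v in first_group if v in second_set][:target]

    # Top up from the remaining candidates when too few variables repeat.
    # Assumption: leftovers are ranked by PSI score, ties broken by name.
    if len(selected) < target:
        chosen = set(selected)
        leftovers = [v for v in set(first_group) | second_set if v not in chosen]
        leftovers.sort(key=lambda v: (-psi_scores.get(v, 0.0), v))
        selected.extend(leftovers[: target - len(selected)])
    return selected
```

With the source's example numbers the call would be `select_features(psi_scores, es_scores)`, which uses groups of 30, the ES > 20 filter, and a target of 20 feature variables.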

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Telephonic Communication Services (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

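A method and apparatus for identifying smart device users. The method comprises: extracting network behavior data of a user and determining feature variables in the network behavior data (S11); acquiring a first variable value and a second variable value, the first variable value comprising the variable values of the feature variables for the device user to be detected, and the second variable value comprising the variable values of the feature variables for predetermined positive samples (S12); calculating distance information between the first variable value and the second variable value (S13); and identifying smart device users according to the distance information (S14). The method can improve the recognition effect for smart device users.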

Description

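Method and apparatus for identifying a smart device user
Technical Field
The present application relates to the field of data processing technologies, and in particular to a method and apparatus for identifying a smart device user.
Background
A user can perform network communication through a client device. Client devices include non-smart devices, such as personal computers (PCs), and smart devices, such as smartphones and tablet computers. For business reasons, it is sometimes necessary to determine whether a user is a smart device user or a non-smart device user so that the user can be guided toward suitable services; for example, a smart device user can be guided to top up phone credit or be targeted with SMS marketing.
In the related art, smart device users are identified mainly from existing user access logs that contain smart device models: a user is identified as a smart device user according to the smart device model recorded in the user access log.
However, when a smart device user does not use the smart device for network access, no smart device information is recorded in the user access log. In that case the above approach cannot identify the smart device user, so its recognition effect is unsatisfactory.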
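Summary
The present application aims to solve, at least to some extent, one of the technical problems in the related art.
To this end, one object of the present application is to provide a method for identifying a smart device user that can improve the effect of identifying smart device users.
Another object of the present application is to provide an apparatus for identifying a smart device user.
To achieve the above objects, the method for identifying a smart device user provided by an embodiment of the first aspect of the present application includes: extracting a user's network behavior data and determining feature variables in the network behavior data; obtaining a first variable value and a second variable value, the first variable value including the values of the feature variables for a device user to be detected, and the second variable value including the values of the feature variables for predetermined positive samples; calculating distance information between the first variable value and the second variable value; and identifying a smart device user according to the distance information.
The method provided by the embodiment of the first aspect extracts the user's network behavior data, determines the feature variables in it, calculates the distance information between the feature variable values of the device user to be detected and those of the predetermined positive samples, and identifies smart device users according to that distance information. Smart device users are thus identified from network behavior data. Because this embodiment does not rely on smart device information in the user access log, smart device users can still be identified when the log contains no such information, which improves the recognition effect.
To achieve the above objects, the apparatus for identifying a smart device user provided by an embodiment of the second aspect of the present application includes: a determining module, configured to extract a user's network behavior data and determine feature variables in the network behavior data; an obtaining module, configured to obtain a first variable value and a second variable value, the first variable value including the values of the feature variables for a device user to be detected, and the second variable value including the values of the feature variables for predetermined positive samples; a calculating module, configured to calculate distance information between the first variable value and the second variable value; and an identifying module, configured to identify a smart device user according to the distance information.
The apparatus provided by the embodiment of the second aspect extracts the user's network behavior data, determines the feature variables in it, calculates the distance information between the feature variable values of the device user to be detected and those of the predetermined positive samples, and identifies smart device users according to that distance information. Smart device users are thus identified from network behavior data. Because this embodiment does not rely on smart device information in the user access log, smart device users can still be identified when the log contains no such information, which improves the recognition effect.
Additional aspects and advantages of the present application will be set forth in part in the description that follows, and in part will become apparent from that description or be learned through practice of the present application.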
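Brief Description of the Drawings
The above and/or additional aspects and advantages of the present application will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a schematic flowchart of a method for identifying a smart device user according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of a method for identifying a smart device user according to another embodiment of the present application;
FIG. 3 is a schematic structural diagram of an apparatus for identifying a smart device user according to another embodiment of the present application;
FIG. 4 is a schematic structural diagram of an apparatus for identifying a smart device user according to another embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which identical or similar reference numerals denote identical or similar elements, or elements with identical or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present application and are not to be construed as limiting it. On the contrary, the embodiments of the present application include all changes, modifications, and equivalents falling within the spirit and scope of the appended claims.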
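FIG. 1 is a schematic flowchart of a method for identifying a smart device user according to an embodiment of the present application. The method includes:
S11: Extract the user's network behavior data and determine the feature variables in the network behavior data.
The user's network behavior data includes one or more of: website login data, such as login time and login location; access record data, such as information on products the user browsed; top-up data; payment behavior data; transaction data; withdrawal data; the registered mobile phone number; and shopping behavior data.
Feature variables are a preset number of variables extracted from the user's network behavior data; they are the variables on which positive samples and negative samples differ substantially.
Specifically, extracting the user's network behavior data and determining the feature variables in the network behavior data includes:
selecting the positive samples and negative samples, and obtaining the network behavior data of the positive samples and of the negative samples, where the positive samples are known smart device users and the negative samples are known non-smart device users;
performing a differentiation calculation on the network behavior data of the positive samples and of the negative samples to obtain a differentiation score for each variable in the network behavior data; and
determining the feature variables according to the differentiation scores.
Feature variables are the variables on which positive and negative samples differ substantially. For example, the variables can be sorted by differentiation score in descending order, and a preset number of variables can then be selected in sequence from the sorted variables as the feature variables.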
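The ranking step described above can be sketched as follows. This is a minimal illustration, assuming the differentiation scores have already been computed and are held in a dictionary; the function name `select_feature_variables` and the variable names and scores in the example are hypothetical and do not come from the application.

```python
def select_feature_variables(scores, k):
    """Return the k variable names with the highest differentiation scores."""
    # Sort (name, score) pairs by score, largest first.
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Hypothetical scores, for illustration only.
example_scores = {
    "login_hour": 0.42,
    "recharge_count": 0.35,
    "withdraw_count": 0.03,
    "pay_count": 0.18,
}
top_two = select_feature_variables(example_scores, 2)
print(top_two)
```

The preset number `k` is left as a parameter because the text only says it is preset.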
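S12: Obtain a first variable value and a second variable value, where the first variable value includes the values of the feature variables for the device user to be detected, and the second variable value includes the values of the feature variables for the predetermined positive samples.
The first variable value can be obtained from the user access log of the device user to be detected;
the second variable value can be obtained from the user access logs of the positive samples.
Specifically, for the positive samples, the second variable value may be obtained from the user access logs that were used when the positive samples were determined. For example, if the positive samples were determined from the period from early 2012 to the end of 2013, the second variable value, such as the value of the login time variable, can then be obtained from the positive samples' user access logs for that same period. It will be understood that, because a smart device or non-smart device exhibits consistent access behavior across different time periods, the second variable value may also be obtained from user access logs for other periods, for example from existing 2014 logs: if a smartphone's login time from early 2012 to the end of 2013 was usually in the morning, its login time in 2014 will usually still be in the morning.
The predetermined positive samples are a preset number of smart device users selected from known smart device users. The known smart device users may be the smart device users within a preset time period, and the selection may be random. Specifically, smart device users and non-smart device users can be determined from historical data obtained in advance by the server; a preset number of users randomly selected from the smart device users are taken as positive samples, and the same preset number of users randomly selected from the non-smart device users are taken as negative samples.
Taking a smartphone as the smart device, for example, users with smartphone access records between early 2012 and the end of 2013 can be filtered out of the user access logs. These users are known smart device users, and 500,000 of them can then be randomly picked as positive samples.
Likewise, users who had no smartphone access records before the end of 2013 but did have them after the end of 2013 can be filtered out of the user access logs. These users can be regarded as non-smart device users before the end of 2013, and 500,000 of them can then be randomly selected as negative samples.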
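S13: Calculate the distance information between the first variable value and the second variable value.
Specifically, a center value can be determined from the feature variable values of each positive sample, and the distance value between the first variable value and the center value can then be calculated.
For example, suppose the device user to be detected is X, the positive samples are Y1, Y2, and Y3, and the feature variables are A, B, and C. The center point of Y1(A,B,C), Y2(A,B,C), and Y3(A,B,C) can be calculated; assuming the center point is O(A,B,C), the spatial distance between X(A,B,C) and O(A,B,C) is then calculated to obtain the distance value.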
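The X / Y1, Y2, Y3 example can be worked through in a few lines. This sketch makes two assumptions that the text does not fix: the center point O is taken as the arithmetic mean of the positive samples, and the "spatial distance" is taken as the Euclidean distance. The numeric values are invented for illustration.

```python
import math


def centroid(points):
    """Arithmetic mean of equal-length vectors (assumed definition of the center point)."""
    count = len(points)
    return [sum(column) / count for column in zip(*points)]


def euclidean(a, b):
    """Euclidean distance (assumed meaning of 'spatial distance')."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical values of feature variables (A, B, C).
y1, y2, y3 = [1.0, 2.0, 3.0], [3.0, 2.0, 1.0], [2.0, 5.0, 2.0]
x = [2.0, 3.0, 6.0]

o = centroid([y1, y2, y3])   # center point O(A, B, C)
distance = euclidean(x, o)   # distance value between X and O
print(o, distance)
```

In practice the feature variables would probably need to be put on comparable scales before a distance is taken, but the text does not say how, so the sketch leaves the values as they are.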
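S14: Identify a smart device user according to the distance information.
Specifically, this step may include:
taking the distance value as a score value;
normalizing the score value to obtain a normalized score value; and
when the normalized score value is greater than a preset threshold, determining that the device user to be detected is a smart device user.
The preset threshold is determined according to specific business requirements. For example, if the score values are normalized to the range 0-10 and the preset threshold is 6, a device user whose score value is greater than 6 can be determined to be a smart device user.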
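The scoring step can be sketched as below. The text does not name a normalization method, so min-max scaling over a batch of users onto 0-10 is an assumption, as are the function names and the raw scores. The comparison follows the text literally: the distance value is used as the score, and a normalized score above the threshold marks a smart device user.

```python
def normalize_scores(scores, low=0.0, high=10.0):
    """Min-max scale a batch of scores onto [low, high] (assumed normalization)."""
    smallest, largest = min(scores), max(scores)
    if largest == smallest:
        # All scores are equal, so there is no spread to scale.
        return [low for _ in scores]
    scale = (high - low) / (largest - smallest)
    return [low + (score - smallest) * scale for score in scores]


def is_smart_device_user(normalized_score, threshold=6.0):
    """Apply the preset threshold as stated in the text."""
    return normalized_score > threshold


# Hypothetical distance values for four device users to be detected.
raw_scores = [1.0, 2.5, 4.0, 5.0]
normalized = normalize_scores(raw_scores)
flags = [is_smart_device_user(score) for score in normalized]
print(normalized, flags)
```

The threshold of 6 is the example value from the text; it would be tuned to the business requirement.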
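This embodiment extracts the user's network behavior data, determines the feature variables in it, calculates the distance information between the feature variable values of the device user to be detected and those of the predetermined positive samples, and identifies smart device users according to that distance information. Smart device users are thus identified from network behavior data. Because this embodiment does not rely on smart device information in the user access log, smart device users can still be identified when the log contains no such information, which improves the recognition effect.
FIG. 2 is a schematic flowchart of a method for identifying a smart device user according to another embodiment of the present application. The method includes:
S21: Obtain positive samples and negative samples.
Positive samples are known smart device users, and negative samples are known non-smart device users.
Smart device users and non-smart device users can be determined from historical data obtained in advance by the server. Specifically, the historical data may include device information and access time information.
A user whose access time information falls within a preset time period before a preset time point, and whose device information is smart device information, can be determined to be a smart device user. A user whose device information is smart device information after the preset time point, and non-smart device information before the preset time point, can be determined to be a non-smart device user.
For example, with the preset time point at the start of 2014 and the preset time period running from early 2012 to the end of 2013, users who had smart device access records between early 2012 and the end of 2013 can be regarded as smart device users, while users who had no smart device access records before 2014 but did have them after 2014 can be regarded as non-smart device users.
A preset number of users can then be randomly selected from the smart device users as positive samples, and a preset number randomly selected from the non-smart device users as negative samples. The preset number is, for example, 500,000.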
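The labeling rule in S21 can be sketched as follows, under the assumption that the historical data is available as `(user_id, access_date, is_smart_device)` records. That record layout, the function names, and the sample records are hypothetical; the dates mirror the 2012-2013 example in the text.

```python
import random
from datetime import date


def label_users(records, period_start, cutoff):
    """Split users into smart and non-smart device users.

    records: iterable of (user_id, access_date, is_smart_device) tuples.
    """
    smart_in_period, smart_before, smart_after, other_before = set(), set(), set(), set()
    for user_id, day, is_smart in records:
        if is_smart and day >= cutoff:
            smart_after.add(user_id)
        elif is_smart:
            smart_before.add(user_id)
            if day >= period_start:
                smart_in_period.add(user_id)
        elif day < cutoff:
            other_before.add(user_id)
    # Non-smart users: only non-smart access before the cutoff, smart access after it.
    non_smart = (smart_after & other_before) - smart_before
    return smart_in_period, non_smart


def draw_samples(users, count, seed=0):
    """Randomly pick `count` users; sorting first keeps the draw reproducible."""
    return random.Random(seed).sample(sorted(users), count)


records = [
    ("u1", date(2012, 5, 1), True),
    ("u2", date(2013, 3, 1), False),
    ("u2", date(2014, 2, 1), True),
    ("u3", date(2013, 7, 1), False),
]
smart_users, non_smart_users = label_users(records, date(2012, 1, 1), date(2014, 1, 1))
positive_samples = draw_samples(smart_users, 1)
negative_samples = draw_samples(non_smart_users, 1)
print(positive_samples, negative_samples)
```

User `u3` ends up in neither group because there is no later smart device record to confirm the label, which matches the rule as written.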
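S22: Obtain the network behavior data of the positive samples and of the negative samples.
Specifically, the network behavior data of the positive samples and of the negative samples within the preset time period before the preset time point can be obtained.
For example, the positive samples' network behavior data from early 2012 to the end of 2013 and the negative samples' network behavior data from early 2012 to the end of 2013 are obtained.
Obtaining the network behavior data may include selecting a preset number of network behavior variables and then obtaining the data of the selected variables. Network behavior variables may include website login, access records, top-up, payment, transaction, withdrawal, registered mobile phone number, shopping behavior, and so on.
The preset number is, for example, 130; in that case 130 variables can be selected from the above network behavior variables, and the data of those variables obtained as the network behavior data. The variables may be selected at random, or the most relevant variables may be selected according to a preset correlation algorithm.
S23: Determine the feature variables according to the network behavior data of the positive samples and of the negative samples.
A differentiation calculation can be performed on the network behavior data of the positive samples and of the negative samples to obtain a differentiation score for each variable in the network behavior data, and the feature variables can then be determined according to the differentiation scores.
The algorithms used for the differentiation calculation may include the Population Stability Index (PSI) algorithm and/or the effective distance (Effective Size, ES) algorithm. The PSI algorithm and/or the ES algorithm may be executed in an information retrieval system (Sequence Retrieval System, SRS).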
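As an illustration of a per-variable differentiation score, PSI is commonly computed over binned proportions as the sum of (p_i - q_i) * ln(p_i / q_i), where p_i and q_i are the shares of positive and negative samples falling in bin i. The sketch below assumes the variable has already been binned into counts; the binning, the `eps` floor for empty bins, and the example counts are assumptions. The ES algorithm is not sketched because the text does not define its formula.

```python
import math


def psi(pos_counts, neg_counts, eps=1e-6):
    """Population Stability Index between two binned distributions.

    pos_counts / neg_counts: per-bin counts for positive and negative samples.
    eps: floor applied to empty bins so the logarithm stays defined (assumption).
    """
    pos_total = sum(pos_counts)
    neg_total = sum(neg_counts)
    total = 0.0
    for pos_count, neg_count in zip(pos_counts, neg_counts):
        p = max(pos_count / pos_total, eps)
        q = max(neg_count / neg_total, eps)
        total += (p - q) * math.log(p / q)
    return total


# Hypothetical login-hour histogram with two bins (morning, evening).
same = psi([50, 50], [50, 50])
different = psi([80, 20], [20, 80])
print(same, different)
```

A variable whose distribution is identical in both groups scores 0, and the score grows as the two distributions diverge, which is what makes it usable for ranking variables.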
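Specifically, when both the PSI algorithm and the ES algorithm are used for the differentiation calculation, determining the feature variables according to the differentiation scores includes:
calculating the differentiation score of each variable according to the PSI algorithm, and selecting a first group of variables in descending order of differentiation score;
calculating the differentiation score of each variable according to the ES algorithm, and selecting a second group of variables in descending order of differentiation score; and
selecting a preset number of variables from the first group and the second group as the feature variables, taking repeated variables first and then selecting by differentiation score.
For example, the 30 variables with the largest differences according to the PSI algorithm can be taken as the first group, and, according to the ES algorithm, the 30 variables with the largest differences among those with ES > 20 can be taken as the second group. Then 20 variables that appear in both groups are selected; when fewer than 20 variables appear in both groups, variables with larger differences are selected according to their difference values to make up the total, finally yielding 20 feature variables.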
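The two-group selection can be sketched as below. Taking the repeated variables first follows the text; how the remaining slots are filled is not spelled out, so this sketch assumes the leftover variables are ordered by their rank within their own group, with the PSI group winning ties. All names and scores are hypothetical.

```python
def top_variables(scores, size, minimum=None):
    """Names of the `size` highest-scoring variables, optionally above a minimum."""
    eligible = {name: s for name, s in scores.items() if minimum is None or s > minimum}
    return sorted(eligible, key=eligible.get, reverse=True)[:size]


def select_combined(psi_scores, es_scores, group_size, k, es_minimum=None):
    """Pick k feature variables: repeated variables first, then by within-group rank."""
    first = top_variables(psi_scores, group_size)
    second = top_variables(es_scores, group_size, es_minimum)
    second_set = set(second)
    chosen = [name for name in first if name in second_set][:k]
    # Assumed fill rule: best within-group position first, PSI group first on ties.
    rank = {}
    for group in (first, second):
        for position, name in enumerate(group):
            rank.setdefault(name, position)
    leftovers = sorted((n for n in rank if n not in chosen), key=lambda n: rank[n])
    return chosen + leftovers[: k - len(chosen)]


psi_scores = {"a": 0.9, "b": 0.7, "c": 0.5, "d": 0.1}
es_scores = {"a": 30, "c": 45, "d": 25, "e": 60, "b": 10}
features = select_combined(psi_scores, es_scores, group_size=3, k=3, es_minimum=20)
print(features)
```

With the values from the text, the call would use `group_size=30`, `k=20`, and `es_minimum=20`.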
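S24: Obtain a first variable value and a second variable value, where the first variable value includes the values of the feature variables for the device user to be detected, and the second variable value includes the values of the feature variables for the predetermined positive samples.
After the feature variables are determined, their values can be obtained from the user access log of the device user to be detected, giving the first variable value, and from the user access logs of the positive samples, giving the second variable value.
S25: Calculate the distance information between the first variable value and the second variable value.
Specifically, a center value can be determined from the feature variable values of each positive sample, and the distance value between the first variable value and the center value can then be calculated.
For example, suppose the device user to be detected is X, the positive samples are Y1, Y2, and Y3, and the feature variables are A, B, and C. The center point of Y1(A,B,C), Y2(A,B,C), and Y3(A,B,C) can be calculated; assuming the center point is O(A,B,C), the spatial distance between X(A,B,C) and O(A,B,C) is then calculated to determine the distance value.
S26: Identify a smart device user according to the distance information.
The distance value can be taken as the score value; the score value is normalized to obtain a normalized score value; and when the normalized score value is greater than a preset threshold, the device user to be detected is determined to be a smart device user. The preset threshold is determined according to specific business requirements.
For example, after the score value is obtained from the distance value, the score value is normalized to the range 0-10 and the preset threshold is, for example, 6; if the score value of the device user to be detected is greater than 6, the device user to be detected can be determined to be a smart device user.
The present invention can be implemented using a dedicated data processing and statistical computing language (the SAS language) and Hadoop-based hiveSql.
This embodiment extracts the user's network behavior data, determines the feature variables in it, calculates the distance information between the feature variable values of the device user to be detected and those of the predetermined positive samples, and identifies smart device users according to that distance information. Smart device users are thus identified from network behavior data. Because this embodiment does not rely on smart device information in the user access log, smart device users can still be identified when the log contains no such information, which improves the recognition effect. In addition, this embodiment can combine different algorithms in the differentiation calculation, which allows more accurate feature variables to be determined and further improves the recognition effect.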
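FIG. 3 is a schematic structural diagram of an apparatus for identifying a smart device user according to another embodiment of the present application. The apparatus 30 includes a determining module 31, an obtaining module 32, a calculating module 33, and an identifying module 34.
The determining module 31 is configured to extract the user's network behavior data and determine the feature variables in the network behavior data.
The user's network behavior data includes one or more of: website login data, such as login time and login location; access record data, such as information on products the user browsed; top-up data; payment behavior data; transaction data; withdrawal data; the registered mobile phone number; and shopping behavior data.
Feature variables are a preset number of variables extracted from the user's network behavior data; they are the variables on which positive samples and negative samples differ substantially.
In one embodiment, referring to FIG. 4, the determining module 31 includes:
a first unit 311, configured to select the positive samples and negative samples and to obtain the network behavior data of the positive samples and of the negative samples, where the positive samples are known smart device users and the negative samples are known non-smart device users.
Specifically, the network behavior data of the positive samples and of the negative samples within the preset time period before the preset time point can be obtained.
For example, the positive samples' network behavior data from early 2012 to the end of 2013 and the negative samples' network behavior data from early 2012 to the end of 2013 are obtained.
Obtaining the network behavior data may include selecting a preset number of network behavior variables and then obtaining the data of the selected variables. Network behavior variables may include website login, access records, top-up, payment, transaction, withdrawal, registered mobile phone number, shopping behavior, and so on.
The preset number is, for example, 130; in that case 130 variables can be selected from the above network behavior variables, and the data of those variables obtained as the network behavior data. The variables may be selected at random, or the most relevant variables may be selected according to a preset correlation algorithm.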
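In another embodiment, the first unit 311 is specifically configured to determine smart device users and non-smart device users from historical data obtained in advance by the server;
and to randomly select a preset number of users from the smart device users as positive samples and the same preset number of users from the non-smart device users as negative samples.
The preset number is, for example, 500,000.
In another embodiment, the first unit 311 is further specifically configured to obtain device information and access time information from the historical data obtained in advance by the server; to determine as a smart device user a user whose access time information falls within a preset time period before a preset time point and whose device information is smart device information; and to determine as a non-smart device user a user whose device information is smart device information after the preset time point and non-smart device information before the preset time point.
For example, with the preset time point at the start of 2014 and the preset time period running from early 2012 to the end of 2013, users who had smart device access records between early 2012 and the end of 2013 can be regarded as smart device users, while users who had no smart device access records before 2014 but did have them after 2014 can be regarded as non-smart device users.
In another embodiment, the first unit 311 is further specifically configured to obtain the network behavior data of the positive samples and of the negative samples within the preset time period before the preset time point.
a second unit 312, configured to perform a differentiation calculation on the network behavior data of the positive samples and of the negative samples to obtain a differentiation score for each variable in the network behavior data.
The algorithms used for the differentiation calculation may include the Population Stability Index (PSI) algorithm and/or the effective distance (Effective Size, ES) algorithm. The PSI algorithm and/or the ES algorithm may be executed in an information retrieval system (Sequence Retrieval System, SRS).
In another embodiment, the second unit 312 is specifically configured to use the PSI algorithm and/or the ES algorithm to perform the differentiation calculation on the network behavior data of the positive samples and of the negative samples.
a third unit 313, configured to determine the feature variables according to the differentiation scores.
Feature variables are the variables on which positive and negative samples differ substantially. For example, the variables can be sorted by differentiation score in descending order, and a preset number of variables can then be selected in sequence from the sorted variables as the feature variables.
In another embodiment, when both the PSI algorithm and the ES algorithm are used for the differentiation calculation, the third unit 313 is specifically configured to calculate the differentiation score of each variable according to the PSI algorithm and select a first group of variables in descending order of differentiation score;
to calculate the differentiation score of each variable according to the ES algorithm and select a second group of variables in descending order of differentiation score; and
to select a preset number of variables from the first group and the second group as the feature variables, taking repeated variables first and then selecting by differentiation score.
For example, the 30 variables with the largest differences according to the PSI algorithm can be taken as the first group, and, according to the ES algorithm, the 30 variables with the largest differences among those with ES > 20 can be taken as the second group. Then 20 variables that appear in both groups are selected; when fewer than 20 variables appear in both groups, variables with larger differences are selected according to their difference values to make up the total, finally yielding 20 feature variables.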
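The obtaining module 32 is configured to obtain a first variable value and a second variable value, where the first variable value includes the values of the feature variables for the device user to be detected, and the second variable value includes the values of the feature variables for the predetermined positive samples.
After the feature variables are determined, their values can be obtained from the user access log of the device user to be detected, giving the first variable value, and from the user access logs of the positive samples, giving the second variable value.
The predetermined positive samples are a preset number of smart device users selected from known smart device users. The known smart device users may be the smart device users within a preset time period, and the selection may be random. Specifically, smart device users and non-smart device users can be determined from historical data obtained in advance by the server; a preset number of users randomly selected from the smart device users are taken as positive samples, and the same preset number of users randomly selected from the non-smart device users are taken as negative samples.
Taking a smartphone as the smart device, for example, users with smartphone access records between early 2012 and the end of 2013 can be filtered out of the user access logs. These users are known smart device users, and 500,000 of them can then be randomly picked as positive samples.
Likewise, users who had no smartphone access records before the end of 2013 but did have them after the end of 2013 can be filtered out of the user access logs. These users can be regarded as non-smart device users before the end of 2013, and 500,000 of them can then be randomly selected as negative samples.
The calculating module 33 is configured to calculate the distance information between the first variable value and the second variable value.
In another embodiment, the calculating module 33 is specifically configured to determine a center value from the feature variable values of each positive sample and to calculate the distance value between the first variable value and the center value.
For example, suppose the device user to be detected is X, the positive samples are Y1, Y2, and Y3, and the feature variables are A, B, and C. The center point of Y1(A,B,C), Y2(A,B,C), and Y3(A,B,C) can be calculated; assuming the center point is O(A,B,C), the spatial distance between X(A,B,C) and O(A,B,C) is then calculated to obtain the distance value.
In another embodiment, the calculating module 33 is further specifically configured to apply a minimum distance algorithm to the feature variable values of each positive sample to determine the center value.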
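The text does not define the minimum distance algorithm. One possible reading, sketched here purely as an assumption, is to take as the center value the positive sample whose total Euclidean distance to all positive samples is smallest (a medoid). The function names and sample points are hypothetical.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def min_distance_center(points):
    """Return the point with the smallest summed distance to all points (assumed reading)."""
    return min(points, key=lambda candidate: sum(euclidean(candidate, p) for p in points))


# Hypothetical feature vectors of five positive samples; the last one is an outlier.
positives = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (10.0, 0.0)]
center = min_distance_center(positives)
print(center)
```

Compared with an arithmetic mean, a center chosen this way is always an actual sample and is pulled less by outliers; whether that is what the application intends is not stated.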
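The identifying module 34 is configured to identify a smart device user according to the distance information.
In another embodiment, the identifying module 34 is specifically configured to take the distance value as a score value; to normalize the score value to obtain a normalized score value; and, when the normalized score value is greater than a preset threshold, to determine that the device user to be detected is a smart device user.
The preset threshold is determined according to specific business requirements. For example, if the score values are normalized to the range 0-10 and the preset threshold is 6, a device user whose score value is greater than 6 can be determined to be a smart device user.
This embodiment extracts the user's network behavior data, determines the feature variables in it, calculates the distance information between the feature variable values of the device user to be detected and those of the predetermined positive samples, and identifies smart device users according to that distance information. Smart device users are thus identified from network behavior data. Because this embodiment does not rely on smart device information in the user access log, smart device users can still be identified when the log contains no such information, which improves the recognition effect.
It should be noted that, in the description of the present application, the terms "first", "second", and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, unless otherwise stated, "a plurality of" means two or more.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment, or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of the present application includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as will be understood by those skilled in the art to which the embodiments of the present application pertain.
It should be understood that portions of the present application can be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they can be implemented by any one or a combination of the following techniques well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits with suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and so on.
Those of ordinary skill in the art will understand that all or part of the steps carried by the methods of the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module can be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and sold or used as a stand-alone product, it may also be stored in a computer-readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disk, or the like.
In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "an example", "a specific example", or "some examples" means that a specific feature, structure, material, or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present application. In this specification, such schematic uses of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials, or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it will be understood that the above embodiments are exemplary and are not to be construed as limiting the present application. Those of ordinary skill in the art can make changes, modifications, substitutions, and variations to the above embodiments within the scope of the present application.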

Claims (13)

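  1. A method for identifying a smart device user, characterized by comprising:
    extracting a user's network behavior data and determining feature variables in the network behavior data;
    obtaining a first variable value and a second variable value, the first variable value including the values of the feature variables for a device user to be detected, and the second variable value including the values of the feature variables for predetermined positive samples;
    calculating distance information between the first variable value and the second variable value; and
    identifying a smart device user according to the distance information.
  2. The method according to claim 1, characterized in that extracting the user's network behavior data and determining the feature variables in the network behavior data comprises:
    selecting the positive samples and negative samples, and obtaining the network behavior data of the positive samples and of the negative samples, the positive samples being known smart device users and the negative samples being known non-smart device users;
    performing a differentiation calculation on the network behavior data of the positive samples and of the negative samples to obtain a differentiation score for each variable in the network behavior data; and
    determining the feature variables according to the differentiation scores.
  3. The method according to claim 2, characterized in that determining the feature variables according to the differentiation scores comprises:
    sorting the variables by differentiation score from high to low; and
    selecting, in sequence from the sorted variables, a preset number of variables as the feature variables.
  4. The method according to claim 2, characterized in that performing the differentiation calculation on the network behavior data of the positive samples and of the negative samples comprises:
    performing the differentiation calculation on the network behavior data of the positive samples and of the negative samples using a PSI algorithm and/or an ES algorithm.
  5. The method according to claim 4, characterized in that, when the PSI algorithm and the ES algorithm are both used for the differentiation calculation, determining the feature variables according to the differentiation scores comprises:
    calculating the differentiation score of each variable according to the PSI algorithm, and selecting a first group of variables in descending order of differentiation score;
    calculating the differentiation score of each variable according to the ES algorithm, and selecting a second group of variables in descending order of differentiation score; and
    selecting a preset number of variables from the first group of variables and the second group of variables as the feature variables, taking repeated variables first and then selecting by differentiation score.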
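  6. The method according to claim 2, characterized in that selecting the positive samples and negative samples comprises:
    determining smart device users and non-smart device users from historical data obtained in advance by a server; and
    randomly selecting a preset number of users from the smart device users as positive samples, and randomly selecting the preset number of users from the non-smart device users as negative samples.
  7. The method according to claim 6, characterized in that determining smart device users and non-smart device users from the historical data obtained in advance by the server comprises:
    obtaining device information and access time information from the historical data obtained in advance by the server;
    determining as a smart device user a user whose access time information falls within a preset time period before a preset time point and whose device information is smart device information; and
    determining as a non-smart device user a user whose device information is smart device information after the preset time point and non-smart device information before the preset time point.
  8. The method according to claim 7, characterized in that obtaining the network behavior data of the positive samples and of the negative samples comprises:
    obtaining the network behavior data of the positive samples and of the negative samples within the preset time period before the preset time point.
  9. The method according to claim 1, characterized in that calculating the distance information between the first variable value and the second variable value comprises:
    determining a center value from the feature variable values of each positive sample; and
    calculating a distance value between the first variable value and the center value.
  10. The method according to claim 9, characterized in that determining the center value from the feature variable values of each positive sample comprises:
    applying a minimum distance algorithm to the feature variable values of each positive sample to determine the center value.
  11. The method according to claim 1, characterized in that identifying a smart device user according to the distance information comprises:
    taking the distance value as a score value;
    normalizing the score value to obtain a normalized score value; and
    when the normalized score value is greater than a preset threshold, determining that the device user to be detected is a smart device user.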
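  12. An apparatus for identifying a smart device user, characterized by comprising:
    a determining module, configured to extract a user's network behavior data and determine feature variables in the network behavior data;
    an obtaining module, configured to obtain a first variable value and a second variable value, the first variable value including the values of the feature variables for a device user to be detected, and the second variable value including the values of the feature variables for predetermined positive samples;
    a calculating module, configured to calculate distance information between the first variable value and the second variable value; and
    an identifying module, configured to identify a smart device user according to the distance information.
  13. The apparatus according to claim 12, characterized in that the determining module comprises:
    a first unit, configured to select the positive samples and negative samples and to obtain the network behavior data of the positive samples and of the negative samples, the positive samples being known smart device users and the negative samples being known non-smart device users;
    a second unit, configured to perform a differentiation calculation on the network behavior data of the positive samples and of the negative samples to obtain a differentiation score for each variable in the network behavior data; and
    a third unit, configured to determine the feature variables according to the differentiation scores.
PCT/CN2015/091226 2014-10-09 2015-09-30 Method and apparatus for identifying a smart device user WO2016054988A1 (zh)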

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410528152.9 2014-10-09
CN201410528152.9A CN105573999B (zh) 2014-10-09 2014-10-09 Method and apparatus for identifying a smart device user

Publications (1)

Publication Number Publication Date
WO2016054988A1 true WO2016054988A1 (zh) 2016-04-14

Family

ID=55652584

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/091226 WO2016054988A1 (zh) 2014-10-09 2015-09-30 Method and apparatus for identifying a smart device user

Country Status (3)

Country Link
CN (1) CN105573999B (zh)
HK (1) HK1223712A1 (zh)
WO (1) WO2016054988A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019218927A1 (zh) * 2018-05-14 2019-11-21 新华三信息安全技术有限公司 Abnormal user identification

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
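CN106709318B (zh) * 2017-01-24 2019-05-03 腾云天宇科技(北京)有限公司 Method, apparatus, and computing device for identifying the uniqueness of a user device
CN112507041B (zh) * 2021-01-29 2021-07-06 北京明略昭辉科技有限公司 Device model identification method and apparatus, electronic device, and storage medium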

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
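JP2011198170A (ja) * 2010-03-23 2011-10-06 Oki Software Co Ltd User identification system, user identification server, portable device, user identification program, and portable device program
CN102647508A (zh) * 2011-12-15 2012-08-22 中兴通讯股份有限公司 Mobile terminal and user identity recognition method
US20140040068A1 (en) * 2011-04-15 2014-02-06 Saravanan MOHAN Service Recommender System For Mobile Users
CN103761296A (zh) * 2014-01-20 2014-04-30 北京集奥聚合科技有限公司 Method and system for analyzing the network behavior of mobile terminal users
CN103955637A (zh) * 2014-04-09 2014-07-30 可牛网络技术(北京)有限公司 Method and apparatus for identifying the identity of a mobile terminal user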

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019218927A1 (zh) * 2018-05-14 2019-11-21 新华三信息安全技术有限公司 Abnormal user identification
US11671434B2 (en) 2018-05-14 2023-06-06 New H3C Security Technologies Co., Ltd. Abnormal user identification

Also Published As

Publication number Publication date
CN105573999B (zh) 2019-02-26
HK1223712A1 (zh) 2017-08-04
CN105573999A (zh) 2016-05-11

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15849573

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15849573

Country of ref document: EP

Kind code of ref document: A1