CN107622198B - Method, apparatus, and computer-readable storage medium for implementing device fingerprinting - Google Patents

Info

Publication number
CN107622198B
CN107622198B CN201710562300.2A CN201710562300A CN107622198B CN 107622198 B CN107622198 B CN 107622198B CN 201710562300 A CN201710562300 A CN 201710562300A CN 107622198 B CN107622198 B CN 107622198B
Authority
CN
China
Prior art keywords
data
array
user equipment
training
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201710562300.2A
Other languages
Chinese (zh)
Other versions
CN107622198A (en)
Inventor
王海君
徐翎
陈平
吴羿辰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Dianrong Information Technology Co ltd
Original Assignee
Shanghai Dianrong Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Dianrong Information Technology Co ltd
Priority to CN201710562300.2A
Publication of CN107622198A
Application granted
Publication of CN107622198B
Expired - Fee Related
Anticipated expiration

Landscapes

  • Collating Specific Patterns (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The present disclosure relates to methods, apparatuses, and computer-readable storage media for implementing device fingerprinting. The method comprises the following steps: receiving device data about a user device and operation data for operating on the user device; and identifying the device fingerprint using a logistic regression model based on the device data and the operational data, comprising: comparing a training array containing the device data and the operational data with a pre-stored reference array for the user device; and calculating the probability that the training array and the reference array come from the same user equipment according to the comparison result and by utilizing the logistic regression model.

Description

Method, apparatus, and computer-readable storage medium for implementing device fingerprinting
Technical Field
Embodiments of the present disclosure relate generally to the field of device fingerprinting and, more particularly, to a method, apparatus, and computer-readable storage medium for implementing device fingerprinting.
Background
A device fingerprint refers to a device characteristic or unique device identifier that can be used to uniquely identify a particular device (e.g., a mobile phone, laptop, desktop, or tablet). A device fingerprint includes device identifiers that are inherent, hard to tamper with, and unique.
Application scenarios of device fingerprints include: (1) behavior tracking: user behavior tracking is mainly business-related; for example, a shopping website collects a user's device information and recommends relevant goods to the user according to the device fingerprint information; (2) advertisement promotion: devices are recorded in combination with a user's search records, browsing records, and the like, so that advertisements can be pushed in a targeted manner; (3) anti-fraud: device fingerprints play an important role in anti-fraud risk control, and device fingerprinting technology can provide security guarantees for related services; for example, currently high-risk abnormal behaviors such as spam registration, account theft, credential stuffing, and logins from unusual locations can be effectively controlled through device fingerprint identification.
Device fingerprinting is generally implemented by reading a universally unique identifier (UUID) through an interface, but this approach has the following disadvantages: (1) reading requires device permission, and if the user does not grant it, the server side cannot distinguish devices uniquely; (2) the same device may generate many different UUIDs, so fraud problems may exist.
Since device fingerprinting plays an important role in fields such as anti-fraud, there is a need to find a technique that enables reliable device fingerprinting.
Disclosure of Invention
Embodiments of the present disclosure provide a method, apparatus, and computer-readable storage medium for enabling device fingerprinting to address, at least in part, the above-mentioned and other potential problems of the prior art.
In a first aspect of the disclosure, a method for enabling device fingerprinting is provided. The method comprises the following steps: receiving device data about a user device and operation data for operating on the user device; and identifying a device fingerprint based on the device data and the operational data and using a logistic regression model, comprising: comparing a training array containing device data and operational data with a pre-stored reference array for the user device; and calculating the probability that the training array and the reference array come from the same user equipment according to the comparison result and by using a logistic regression model.
In a second aspect of the disclosure, an apparatus for enabling device fingerprinting is provided. The apparatus includes: a processor; and a memory coupled to the processor and storing instructions that, when executed by the processor, cause the processor to perform the following acts: receiving device data about a user device and operation data for operating on the user device; and identifying a device fingerprint based on the device data and the operational data and using a logistic regression model, comprising: comparing a training array containing device data and operational data with a pre-stored reference array for the user device; and calculating the probability that the training array and the reference array come from the same user equipment according to the comparison result and by using a logistic regression model.
In a third aspect of the present disclosure, there is provided a computer readable storage medium having computer readable program instructions stored thereon for performing the method according to the first aspect of the present disclosure.
Drawings
Embodiments of the present disclosure will now be described, by way of example only, with reference to the accompanying schematic drawings in which like reference symbols indicate like or similar elements, and in which:
FIG. 1 illustrates an environment in which a method, apparatus, and computer-readable storage medium for implementing device fingerprinting of embodiments of the present disclosure apply;
FIG. 2 shows a flow diagram of a method for implementing device fingerprinting, in accordance with an embodiment of the present disclosure;
FIG. 3 shows an example flow diagram of step 104 of FIG. 2 in accordance with an embodiment of the present disclosure;
FIG. 4 illustrates the form of a logistic regression function in accordance with an embodiment of the present disclosure; and
FIG. 5 shows a schematic structural diagram of an apparatus for implementing device fingerprints according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment". Relevant definitions for other terms will be given in the following description.
FIG. 1 illustrates an environment in which the method, apparatus, and computer-readable storage medium for implementing device fingerprinting of the present disclosure apply, including one or more user devices (which may be referred to as "front ends") and a server (which may be referred to as the "server side"). Information gathered by the user device is transmitted to the server and identified at the server to implement the device fingerprinting technique.
For example, the present disclosure includes two main aspects: (1) Front-end data acquisition: after an App or a Web page is opened, the front end collects various device data (for example, data that requires user authorization and data that does not) and then transmits the collected data to the server (the data may, for example, be encrypted to prevent leakage of user equipment information); for the same device, each acquisition should yield the same data as far as possible. (2) Multidimensional matching on big data at the server side: the server generates a device fingerprint from the data acquired by the front end and stores it in a database. When a key API is accessed, the server judges through multidimensional fuzzy matching whether the data comes from the same device, so that potential risks are identified in time. Here, an API is an interface through which user equipment accesses server data; the data can be classified into different levels, and account data, for example, is key data that must not be leaked, so APIs related to account login and registration are all key APIs.
FIG. 2 shows a flow diagram of a method 100 for implementing device fingerprinting in accordance with an embodiment of the present disclosure.
The method 100 begins with step 102 of receiving device data regarding a user device and operational data for operating on the user device.
For example, when a user performs a particular operation (e.g., accessing a login API), the upload mechanism of the device fingerprint SDK may be triggered. The device fingerprint information for the operation (e.g., the different data collected on different clients) is uploaded together with a token representing the current operation. The user carries the same token when sending the request to the mainapp (i.e., the back-end service server). The mainapp then requests the device fingerprint server to verify the authenticity of the received token. If the authenticity check passes, the normal flow proceeds (e.g., the token can be uploaded to the mainapp along with an API request, such as a login API request, and its authenticity can be checked on the fingerprint server, e.g., a security server).
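The token flow described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the class `FingerprintServer`, the function `mainapp_handle_login`, and the in-memory dictionary are hypothetical names introduced here, and real transport, encryption, and token expiry are omitted.

```python
import secrets


class FingerprintServer:
    """Hypothetical stand-in for the device fingerprint (security) server."""

    def __init__(self):
        # Maps each operation token to the fingerprint data uploaded with it.
        self._uploads = {}

    def upload(self, token, fingerprint_data):
        # Called by the SDK: store the collected data under the operation token.
        self._uploads[token] = fingerprint_data

    def verify(self, token):
        # Called by the mainapp: a token is treated as authentic only if the
        # SDK previously uploaded fingerprint data under it.
        return token in self._uploads


def mainapp_handle_login(token, fingerprint_server):
    # The mainapp forwards the token it received to the fingerprint server.
    if not fingerprint_server.verify(token):
        return "rejected"
    return "normal flow"


server = FingerprintServer()
token = secrets.token_hex(16)            # token representing the current operation
server.upload(token, {"os": "Android"})  # SDK upload triggered by the login operation
result_ok = mainapp_handle_login(token, server)
result_bad = mainapp_handle_login("forged-token", server)
```

In this sketch a request carrying a token that was never uploaded by the SDK is rejected, while a request carrying the uploaded token continues with the normal flow.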
As an example, device data about the user equipment and operation data for operating on the user equipment may be acquired in different ways for Android and iOS, which are specifically described as follows.
When Android is adopted, the collected data can include: (1) TOKEN data and FHASH data; (2) basic data: for example, IMEI, AD_ID, ANDROID_ID, Version, etc.; (3) Wi-Fi data: for example, SSID, BSSID, MacAddress, etc.; (4) network data: for example, UserAgent, SimOperatorName, NetworkType, PhoneType, ActiveNetworkInfo, PhoneNumber, etc.; (5) mal data: for example, ROOT, Emulator, HOOK, MalFrame, device, OS, etc.
Here, in order to ensure that the content of the UUID read from the file system is unchanged after the App is uninstalled and reinstalled, the following scheme for guaranteeing uniqueness across App installation and uninstallation may be employed: a random character string is generated as the UUID and stored in the file system through the NDK interface, after which only read operations are performed, so that the content of the UUID read from the file system is unchanged after the App is uninstalled and reinstalled. For example, a UUID generated by the Android system (which is regenerated on each installation) may be stored in the file system of the device, so that the data still exists in the file system after the App is uninstalled from the device. If the App is subsequently installed again, it checks whether the UUID exists and, if so, reads it directly instead of generating a new one, so that the server knows that the data comes from the same device.
When iOS is employed, the collected data may include: (1) platform data: for example, type, version, language, etc.; (2) basic data: for example, deviceModel, idfa, idfv, etc.; (3) sdkVersion data; (4) appVersion data; (5) identifier data; (6) mal data: for example, emulator, jailbreak, etc.; (7) network data: for example, type, telco, etc.; (8) ua data; (9) fhsh data; (10) device data: for example, mobile, etc.; (11) OS data: for example, iOS, etc.
Here, in order to ensure that the content of the UUID is unchanged after the App is uninstalled and reinstalled, the following scheme for guaranteeing uniqueness across App installation and uninstallation may be employed: a random character string is generated as the UUID and stored in the Keychain data, after which only read operations are performed, so that the content of the Keychain data is unchanged after the App is uninstalled and reinstalled. For example, a UUID generated by the iOS system (which is regenerated on each installation) may be saved on the device, ensuring that the data still exists after the App is uninstalled from the device. If the App is subsequently installed again, it checks whether the UUID exists and, if so, reads it directly instead of generating a new one, so that the server knows that the data comes from the same device. Note that on iOS the data is stored via the Keychain.
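The read-if-present, generate-once logic of this scheme can be sketched as follows. The sketch is written in Python purely to show the control flow; the function name `load_or_create_device_uuid` and the file name `device_uuid.txt` are hypothetical, and a temporary directory stands in for a storage location that survives uninstalling the App (the text describes writing through the NDK interface on Android).

```python
import os
import tempfile
import uuid


def load_or_create_device_uuid(path):
    """Return the UUID stored at `path`, generating and saving one only if absent."""
    if os.path.exists(path):
        # A previous installation already wrote the identifier: read it only.
        with open(path, "r", encoding="utf-8") as f:
            return f.read().strip()
    # First installation on this device: generate a random identifier once.
    new_id = str(uuid.uuid4())
    with open(path, "w", encoding="utf-8") as f:
        f.write(new_id)
    return new_id


# The temporary directory is an assumption standing in for persistent storage.
storage_dir = tempfile.mkdtemp()
uuid_path = os.path.join(storage_dir, "device_uuid.txt")

first_install = load_or_create_device_uuid(uuid_path)
after_reinstall = load_or_create_device_uuid(uuid_path)
```

Because the second call finds the stored value and only reads it, both calls return the same identifier, which is what lets the server treat the two installations as the same device.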
According to an embodiment of the present disclosure, device data may be collected at a front end, the front end including an Android front end, an iOS front end, and a Web front end; the device data comprises at least one of: hardware configuration data of the user equipment, including the type of the user equipment and the CPU model of the user equipment; software configuration data of the user equipment, including the operating system type, operating system version, and browser setting data of the user equipment; and network configuration data of the user equipment, including the network type, network service provider, SIM card number, and MAC address of the user equipment.
According to an embodiment of the present disclosure, the operational data may comprise at least one of: first history information on the web pages browsed by the user on the user equipment, and second history information on the applications installed by the user on the user equipment.
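One possible way to hold the device data categories listed above is a flat record whose fields are later treated as the dimensions of an array. The following is only a sketch: the class `DeviceData`, its field names, the helper `to_dimensions`, and all sample values are hypothetical and are not taken from the disclosure.

```python
from dataclasses import dataclass, asdict


@dataclass
class DeviceData:
    # Hardware configuration data
    device_type: str
    cpu_model: str
    # Software configuration data
    os_type: str
    os_version: str
    browser_settings: str
    # Network configuration data
    network_type: str
    network_provider: str
    sim_number: str
    mac_address: str


def to_dimensions(data):
    """Flatten a record into an ordered list of dimension values."""
    # asdict keeps the field declaration order, so positions are stable.
    return list(asdict(data).values())


sample = DeviceData(
    device_type="phone",
    cpu_model="example-cpu",
    os_type="Android",
    os_version="8.0",
    browser_settings="default",
    network_type="wifi",
    network_provider="example-carrier",
    sim_number="0000",
    mac_address="aa:bb:cc:dd:ee:ff",
)
dims = to_dimensions(sample)
```

Keeping the field order fixed means that position i always refers to the same dimension, which is what a dimension-by-dimension comparison against a stored reference relies on.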
With continued reference to FIG. 2, the method 100 continues to step 104 by identifying a device fingerprint based on the device data and the operational data and using a logistic regression model.
An example flow of step 104 is shown in FIG. 3, wherein step 104 may further include: in step 106, comparing a training array containing device data and operational data with a pre-stored reference array for the user device; and in step 108, calculating the probability that the training array and the reference array come from the same user equipment according to the comparison result and by using a logistic regression model.
In step 106, comparing a training array containing device data and operational data to a pre-stored reference array for the user device includes, as an example: when the data of a given dimension of the training array is the same as the data of the corresponding dimension of the reference array, setting the value of that dimension in the training array to 1; and when they differ, setting the value of that dimension in the training array to 0.
For example, all data acquired at one time may be taken as an array of the form [0,0,0,0,0,0,0,0,1,0,0,0,0], where each element indicates whether the currently acquired dimension is the same as the standard dimension: 1 if they are the same and 0 if they are not. The standard dimension is a reference dimension; for example, it may be obtained by finding all data related to the current acquisition among the previously stored data and processing that data to derive the standard for the device fingerprint. The final model is thus used to calculate the following: when n dimensions are the same as the standard dimension, m dimensions are different from it, and m + n is the total number of acquired dimensions, the probabilities that this set of data belongs to class 0 (the device fingerprint is not repeated) and class 1 (the device fingerprint is repeated). These probabilities may be used, for example, to determine whether a device fingerprint is repeating a specific action such as registration or login.
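The dimension-by-dimension comparison of step 106 can be sketched as follows. The function name `compare_to_reference` and the five sample dimensions are assumptions made for illustration; the disclosure does not fix the number or order of dimensions.

```python
def compare_to_reference(collected, reference):
    """Return a 0/1 list: 1 where a dimension matches the reference, else 0."""
    if len(collected) != len(reference):
        raise ValueError("collected and reference must have the same number of dimensions")
    return [1 if c == r else 0 for c, r in zip(collected, reference)]


# Hypothetical reference (standard) dimensions stored for a device.
reference = ["phone", "Android", "8.0", "wifi", "aa:bb:cc:dd:ee:ff"]
# Hypothetical dimensions collected during the current operation.
collected = ["phone", "Android", "8.1", "4g", "aa:bb:cc:dd:ee:ff"]

training_array = compare_to_reference(collected, reference)
n_same = sum(training_array)                # dimensions equal to the standard
m_diff = len(training_array) - n_same       # dimensions that differ
```

Here `training_array` is `[1, 1, 0, 0, 1]`, so n = 3 dimensions match, m = 2 differ, and m + n = 5 is the total number of acquired dimensions.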
In step 108, calculating the probability that the training array and the reference array are from the same user equipment using a logistic regression model according to the result of the comparison includes, as an example: using the formula $P(y=1 \mid x;\theta) = h_\theta(x)$ to calculate the probability that the training array and the reference array come from the same user equipment; wherein $P(y=1 \mid x;\theta)$ represents the probability that the training array and the reference array come from the same user equipment, and $x$ is the training array.
Alternatively, the formula $P(y=0 \mid x;\theta) = 1 - h_\theta(x)$ may be used; wherein $P(y=0 \mid x;\theta)$ represents the probability that the training array and the reference array are from different user equipments (i.e., it reflects from the opposite angle the probability that the training array and the reference array come from the same user equipment), and $x$ is the training array.
Here, $h_\theta(x)$ is a prediction function of the form
$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$
wherein $\theta^T x$ is the boundary function when the boundary is linear, i.e.
$$\theta^T x = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \theta_0 + \sum_{i=1}^{n} \theta_i x_i$$
where $\theta_0, \theta_1, \ldots, \theta_n$ are the regression coefficients (e.g., approximations can be calculated through a large number of verifications), and $g(\theta^T x)$ is a logistic regression function of the form
$$g(z) = \frac{1}{1 + e^{-z}}, \quad z = \theta^T x$$
For example, the mathematical model used by the server is logistic regression, which is used for supervised machine learning. The logistic regression function
$$g(z) = \frac{1}{1 + e^{-z}}$$
represents the form of the 0/1 classification, and its functional form is shown in FIG. 4.
In the case of a linear boundary, the boundary function is
$$\theta^T x = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \theta_0 + \sum_{i=1}^{n} \theta_i x_i$$
The corresponding prediction function is
$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$
The value of the function $h_\theta(x)$ has a special meaning: it indicates the probability that the result takes the value 1. The probabilities that the classification result for input $x$ is class 1 and class 0, respectively, are therefore:
$$P(y=1 \mid x;\theta) = h_\theta(x), \qquad P(y=0 \mid x;\theta) = 1 - h_\theta(x)$$
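As a worked illustration of the prediction function, the following sketch evaluates the probability of class 1 and class 0 for one 0/1 training array. The coefficient values in `theta` are made up for the example (in practice they would come from training), and the function names `sigmoid` and `predict_same_device` are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function g(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_same_device(theta, x):
    """Evaluate the prediction function for training array x.

    theta[0] is the intercept; theta[1:] pair with the entries of x.
    """
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return sigmoid(z)


# Made-up regression coefficients: intercept -2, weight 1 per dimension.
theta = [-2.0, 1.0, 1.0, 1.0, 1.0, 1.0]
x = [1, 1, 0, 0, 1]  # three of five dimensions match the reference

p_same = predict_same_device(theta, x)   # P(y = 1 | x; theta)
p_diff = 1.0 - p_same                    # P(y = 0 | x; theta)
```

With these assumed coefficients the linear boundary function evaluates to -2 + 3 = 1, so `p_same` is g(1), approximately 0.731, and `p_diff` is approximately 0.269.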
For step 104, for example, the built-in logistic regression class of the scikit-learn library for Python can be used to build the logistic regression model, and supervised training on data is performed to determine approximate values of the undetermined variables in the function. After a prediction result is calculated, the corresponding data can be stored and later used as a training set for further training, in order to achieve more accurate prediction.
Accordingly, as an example, as shown in FIG. 3, identifying a device fingerprint based on the device data and the operational data and using a logistic regression model may further include step 110 after step 108. In step 110, the logistic regression model may be trained using the training array to obtain updated values of the regression coefficients. This may yield more accurate regression coefficients, so that the predicted probability of coming from the same user equipment obtained with the logistic regression model becomes more accurate.
As shown in FIG. 5, an embodiment of the present disclosure provides an apparatus 200 for implementing device fingerprinting. The apparatus 200 includes: a processor 202; and a memory 204 coupled to the processor 202 and storing instructions that, when executed by the processor 202, cause the processor 202 to perform the following acts: receiving device data about a user device and operation data for operating on the user device; and identifying a device fingerprint based on the device data and the operational data and using a logistic regression model, comprising: comparing a training array containing device data and operational data with a pre-stored reference array for the user device; and calculating the probability that the training array and the reference array come from the same user equipment according to the comparison result and by using a logistic regression model.
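The training of step 110 can be illustrated with a small, dependency-free sketch. It uses plain batch gradient descent on the log loss as a stand-in for a library logistic regression class; the optimizer, the learning rate, the epoch count, the function names, and the six labelled sample arrays are all assumptions made for illustration and are not specified by the disclosure.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(theta, x):
    # theta[0] is the intercept; theta[1:] pair with the entries of x.
    return sigmoid(theta[0] + sum(t * xi for t, xi in zip(theta[1:], x)))


def train(samples, labels, lr=0.5, epochs=2000):
    """Fit regression coefficients by batch gradient descent on the log loss."""
    n = len(samples[0])
    theta = [0.0] * (n + 1)
    for _ in range(epochs):
        grad = [0.0] * (n + 1)
        for x, y in zip(samples, labels):
            err = predict(theta, x) - y      # gradient of the log loss w.r.t. z
            grad[0] += err
            for j, xj in enumerate(x):
                grad[j + 1] += err * xj
        # Average the gradient over the batch and take one descent step.
        theta = [t - lr * g / len(samples) for t, g in zip(theta, grad)]
    return theta


# Made-up 0/1 comparison arrays with labels (1 = same device, 0 = different).
samples = [[1, 1, 1, 1], [1, 1, 1, 0], [1, 0, 1, 1],
           [0, 0, 0, 1], [0, 1, 0, 0], [0, 0, 0, 0]]
labels = [1, 1, 1, 0, 0, 0]

theta = train(samples, labels)
p_match = predict(theta, [1, 1, 1, 1])     # all dimensions match
p_mismatch = predict(theta, [0, 0, 0, 0])  # no dimension matches
```

After training on this toy set, an array in which every dimension matches receives a high same-device probability and an array in which none matches receives a low one. Appending newly labelled arrays to `samples` and calling `train` again corresponds to obtaining updated values of the regression coefficients.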
According to an embodiment of the present disclosure, comparing a training array containing device data and operational data with a pre-stored reference array for a user device includes: when the data of a given dimension of the training array is the same as the data of the corresponding dimension of the reference array, setting the value of that dimension in the training array to 1; and when they differ, setting the value of that dimension in the training array to 0.
According to an embodiment of the present disclosure, calculating the probability that the training array and the reference array are from the same user equipment according to the comparison result and by using a logistic regression model comprises: using the formula $P(y=1 \mid x;\theta) = h_\theta(x)$ to calculate the probability that the training array and the reference array come from the same user equipment; wherein $P(y=1 \mid x;\theta)$ represents the probability that the training array and the reference array come from the same user equipment, and $x$ is the training array.
Alternatively, the formula $P(y=0 \mid x;\theta) = 1 - h_\theta(x)$ may be used; wherein $P(y=0 \mid x;\theta)$ represents the probability that the training array and the reference array are from different user equipments (i.e., it reflects from the opposite angle the probability that the training array and the reference array come from the same user equipment), and $x$ is the training array.
Here, $h_\theta(x)$ is a prediction function of the form
$$h_\theta(x) = g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$
wherein $\theta^T x$ is the boundary function when the boundary is linear, i.e.
$$\theta^T x = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n = \theta_0 + \sum_{i=1}^{n} \theta_i x_i$$
where $\theta_0, \theta_1, \ldots, \theta_n$ are the regression coefficients, and $g(\theta^T x)$ is a logistic regression function of the form
$$g(z) = \frac{1}{1 + e^{-z}}, \quad z = \theta^T x$$
According to an embodiment of the present disclosure, identifying a device fingerprint based on the device data and the operational data and using a logistic regression model further comprises: the logistic regression model is trained using the training array to obtain updated values of the regression coefficients.
According to an embodiment of the present disclosure, device data is collected at a front end, wherein the front end comprises an Android front end, an iOS front end, and a Web front end; the device data comprises at least one of: hardware configuration data of the user equipment, including the type of the user equipment and the CPU model of the user equipment; software configuration data of the user equipment, including the operating system type, operating system version, and browser setting data of the user equipment; and network configuration data of the user equipment, including the network type, network service provider, SIM card number, and MAC address of the user equipment.
According to an embodiment of the disclosure, the operational data comprises at least one of: first history information on the web pages browsed by the user on the user equipment, and second history information on the applications installed by the user on the user equipment.
Embodiments of the present disclosure also provide a computer-readable storage medium having computer-readable program instructions stored thereon for performing the method as described in fig. 2.
The present disclosure may be embodied as a system, method, and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied thereon for carrying out various aspects of the present disclosure.
The methods and functions described in this disclosure may be performed, at least in part, by one or more hardware logic components. By way of example, and not limitation, illustrative types of hardware logic components that may be used include Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), system on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device via a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present disclosure may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, the electronic circuitry that can execute the computer-readable program instructions implements aspects of the present disclosure by utilizing the state information of the computer-readable program instructions to personalize the electronic circuitry, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA).
Various aspects of the present disclosure are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the disclosure. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate implementations can also be implemented in combination in a single implementation. Conversely, various features that are described in the context of a single implementation can also be implemented in multiple implementations separately or in any suitable subcombination.
Some example implementations of the present disclosure are listed below.
The present disclosure may be implemented as a method for implementing device fingerprinting. The method comprises: receiving device data about a user device and operational data about operations performed on the user device; and identifying a device fingerprint based on the device data and the operational data using a logistic regression model, which comprises: comparing a training array containing the device data and the operational data with a pre-stored reference array for the user device; and calculating, according to the result of the comparison and using the logistic regression model, the probability that the training array and the reference array come from the same user device.
In some embodiments, comparing the training array containing the device data and the operational data with the pre-stored reference array for the user device comprises: setting the value of a dimension in the training array to 1 when the data in that dimension of the training array is the same as the data in the corresponding dimension of the reference array, and setting the value of that dimension in the training array to 0 when the two differ.
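The dimension-by-dimension comparison described here can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `compare_arrays` and the four example dimensions are assumptions chosen for the example.

```python
from typing import List, Sequence


def compare_arrays(training: Sequence[object], reference: Sequence[object]) -> List[int]:
    """Return 1 for each dimension whose values match and 0 otherwise."""
    if len(training) != len(reference):
        raise ValueError("training and reference arrays must have the same number of dimensions")
    return [1 if t == r else 0 for t, r in zip(training, reference)]


# Hypothetical dimensions: device type, CPU model, OS version, network type.
training = ["phone", "ARMv8", "Android 7.1", "wifi"]
reference = ["phone", "ARMv8", "Android 7.0", "wifi"]

features = compare_arrays(training, reference)
print(features)  # [1, 1, 0, 1]
```

The resulting 0/1 vector is what the logistic regression model takes as its input x.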
In some embodiments, calculating the probability that the training array and the reference array are from the same user device, according to the result of the comparison and using the logistic regression model, comprises: calculating that probability using the formula $P(y = 1 \mid x; \theta) = h_\theta(x)$, where $P(y = 1 \mid x; \theta)$ represents the probability that the training array and the reference array come from the same user device, and $x$ is the training array. Here $h_\theta(x)$ is a prediction function of the form
$$h_\theta(x) = g(\theta^T x)$$
where $\theta^T x$ is the boundary function when the boundary is linear, i.e.
$$\theta^T x = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n$$
where $\theta_0, \theta_1, \ldots, \theta_n$ are regression coefficients, and $g(\theta^T x)$ is a logistic regression function of the form
$$g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$
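As a rough sketch of this probability calculation, the code below applies the standard logistic (sigmoid) function to a linear boundary function. The coefficient values are made up for illustration, and the helper names `sigmoid` and `same_device_probability` are assumptions, not names from the disclosure.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic function g(z) = 1 / (1 + e^(-z)), written to avoid overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def same_device_probability(theta: Sequence[float], x: Sequence[int]) -> float:
    """Compute h_theta(x) = g(theta_0 + theta_1*x_1 + ... + theta_n*x_n).

    theta[0] is the constant term; theta[1:] pairs with the dimensions of x.
    """
    if len(theta) != len(x) + 1:
        raise ValueError("theta must have exactly one more entry than x")
    z = theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))
    return sigmoid(z)


# Illustrative (made-up) regression coefficients and a 0/1 comparison vector.
theta = [-2.0, 1.5, 1.0, 0.5, 2.0]
x = [1, 1, 0, 1]

p = same_device_probability(theta, x)
print(f"{p:.3f}")  # 0.924
```

Here the boundary function evaluates to -2.0 + 1.5 + 1.0 + 2.0 = 2.5, and g(2.5) is roughly 0.924, so the two arrays would be judged likely to come from the same device under these assumed coefficients.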
In some embodiments, identifying the device fingerprint based on the device data and the operational data using the logistic regression model further comprises: training the logistic regression model using the training array to obtain updated values of the regression coefficients.
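The disclosure does not state which optimization procedure is used to update the regression coefficients. The sketch below assumes plain batch gradient descent on the logistic loss, with hypothetical labelled samples (label 1 for "same device", 0 for "different device"); the function names, learning rate, and epoch count are all illustrative choices.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict(theta: Sequence[float], x: Sequence[int]) -> float:
    """Probability that comparison vector x comes from the same device."""
    return sigmoid(theta[0] + sum(t * xi for t, xi in zip(theta[1:], x)))


def train(samples: Sequence[Tuple[Sequence[int], int]], n_features: int,
          lr: float = 0.5, epochs: int = 2000) -> List[float]:
    """Assumed training procedure: batch gradient descent on the log-loss."""
    theta = [0.0] * (n_features + 1)  # theta[0] is the constant term
    m = len(samples)
    for _ in range(epochs):
        grad = [0.0] * len(theta)
        for x, y in samples:
            err = predict(theta, x) - y
            grad[0] += err
            for j, xi in enumerate(x, start=1):
                grad[j] += err * xi
        theta = [t - lr * g / m for t, g in zip(theta, grad)]
    return theta


# Hypothetical labelled comparison vectors (1 = same device, 0 = different).
samples = [
    ([1, 1, 1, 1], 1), ([1, 1, 0, 1], 1), ([1, 0, 1, 1], 1),
    ([0, 0, 0, 1], 0), ([1, 0, 0, 0], 0), ([0, 0, 1, 0], 0),
]

theta = train(samples, n_features=4)
print(predict(theta, [1, 1, 1, 1]) > 0.5)  # True
print(predict(theta, [0, 0, 0, 1]) < 0.5)  # True
```

In practice a library implementation of logistic regression would normally replace this loop; the hand-written version is shown only to make the coefficient update explicit.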
In some embodiments, the device data is collected at a front end, the front end comprising an Android front end, an iOS front end, and a Web front end. The device data comprises at least one of: hardware configuration data of the user device, including the type of the user device and the CPU model of the user device; software configuration data of the user device, including the operating system type, the operating system version, and browser setting data of the user device; and network configuration data of the user device, including the network type, network service provider, SIM card number, and MAC address of the user device.
In some embodiments, the operational data includes at least one of: first history information about webpages the user has browsed on the user device, and second history information about applications the user has installed on the user device.
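One possible way to flatten the device data and operational data into a single array is sketched below. The field names, the selection of fields, and the idea of reducing each history to a SHA-256 digest are assumptions made for illustration; the disclosure does not prescribe an encoding.

```python
import hashlib
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class DeviceData:
    """Hypothetical subset of the hardware, software, and network fields."""
    device_type: str
    cpu_model: str
    os_type: str
    os_version: str
    network_type: str
    mac_address: str


@dataclass(frozen=True)
class OperationData:
    """Hypothetical browsing history and installed-application history."""
    visited_pages: Tuple[str, ...] = ()
    installed_apps: Tuple[str, ...] = ()


def history_digest(items: Iterable[str]) -> str:
    """Order-independent digest of a history, so it fits in one dimension."""
    joined = "\n".join(sorted(set(items)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def build_array(device: DeviceData, ops: OperationData) -> List[str]:
    """Flatten device data and operational data into one array."""
    return [
        device.device_type, device.cpu_model, device.os_type,
        device.os_version, device.network_type, device.mac_address,
        history_digest(ops.visited_pages), history_digest(ops.installed_apps),
    ]


device = DeviceData("phone", "ARMv8", "Android", "7.1", "wifi", "00:11:22:33:44:55")
ops = OperationData(
    visited_pages=("https://example.com/a", "https://example.com/b"),
    installed_apps=("app.one", "app.two"),
)

training_array = build_array(device, ops)
print(len(training_array))  # 8
```

An exact-match digest treats any change in history as a mismatch in that dimension; a real system might prefer a softer similarity measure, which is outside what this sketch covers.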
The present disclosure may also be embodied as an apparatus for implementing device fingerprinting. The apparatus includes: a processor; and a memory coupled to the processor and storing instructions that, when executed by the processor, cause the processor to: receive device data about a user device and operational data about operations performed on the user device; and identify a device fingerprint based on the device data and the operational data using a logistic regression model, which comprises: comparing a training array containing the device data and the operational data with a pre-stored reference array for the user device; and calculating, according to the result of the comparison and using the logistic regression model, the probability that the training array and the reference array come from the same user device.
In some embodiments, comparing the training array containing the device data and the operational data with the pre-stored reference array for the user device comprises: setting the value of a dimension in the training array to 1 when the data in that dimension of the training array is the same as the data in the corresponding dimension of the reference array, and setting the value of that dimension in the training array to 0 when the two differ.
In some embodiments, calculating the probability that the training array and the reference array are from the same user device, according to the result of the comparison and using the logistic regression model, comprises: calculating that probability using the formula $P(y = 1 \mid x; \theta) = h_\theta(x)$, where $P(y = 1 \mid x; \theta)$ represents the probability that the training array and the reference array come from the same user device, and $x$ is the training array. Here $h_\theta(x)$ is a prediction function of the form
$$h_\theta(x) = g(\theta^T x)$$
where $\theta^T x$ is the boundary function when the boundary is linear, i.e.
$$\theta^T x = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n$$
where $\theta_0, \theta_1, \ldots, \theta_n$ are regression coefficients, and $g(\theta^T x)$ is a logistic regression function of the form
$$g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$
In some embodiments, identifying the device fingerprint based on the device data and the operational data using the logistic regression model further comprises: training the logistic regression model using the training array to obtain updated values of the regression coefficients.
In some embodiments, the device data is collected at a front end, the front end comprising an Android front end, an iOS front end, and a Web front end. The device data comprises at least one of: hardware configuration data of the user device, including the type of the user device and the CPU model of the user device; software configuration data of the user device, including the operating system type, the operating system version, and browser setting data of the user device; and network configuration data of the user device, including the network type, network service provider, SIM card number, and MAC address of the user device.
In some embodiments, the operational data includes at least one of: first history information about webpages the user has browsed on the user device, and second history information about applications the user has installed on the user device.
The present disclosure may also be embodied as a computer-readable storage medium having computer-readable program instructions stored thereon for performing the method described above.
Many modifications and other embodiments of the disclosure set forth herein will come to mind to one skilled in the art to which this disclosure pertains having the benefit of the teachings presented in the foregoing descriptions and the associated drawings. Therefore, it is to be understood that the embodiments of the disclosure are not to be limited to the specific embodiments disclosed and that modifications and other embodiments are intended to be included within the scope of the disclosure. Moreover, while the above description and the related figures describe example embodiments in the context of certain example combinations of components and/or functions, it should be appreciated that different combinations of components and/or functions may be provided by alternative embodiments without departing from the scope of the present disclosure. In this regard, for example, other combinations of components and/or functions than those explicitly described above are also contemplated as within the scope of the present disclosure. Although specific terms are employed herein, they are used in a generic and descriptive sense only and not for purposes of limitation.

Claims (9)

1. A method for enabling device fingerprinting, comprising:
receiving device data about a user device and operation data for operating on the user device; and
identifying the device fingerprint using a logistic regression model based on the device data and the operational data, comprising:
comparing a training array containing the device data and the operational data with a pre-stored reference array for the user device;
calculating, according to the result of the comparison and by using the logistic regression model, the probability that the training array and the reference array come from the same user device; and
training the logistic regression model using the training array to obtain updated values of regression coefficients;
wherein the operational data comprises at least one of: first history information about webpages browsed by the user on the user device, and second history information about applications installed by the user on the user device.
2. The method of claim 1, wherein the comparing the training array containing the device data and the operational data to a pre-stored reference array for the user device comprises:
setting the value of the corresponding dimension in the training array to 1 when the data of the corresponding dimension of the training array and the reference array are the same, and
setting the value of the corresponding dimension in the training array to 0 when the data of the corresponding dimension of the training array and the reference array are different.
3. The method of claim 2, wherein the calculating the probability that the training array and the reference array are from the same user device using a logistic regression model according to the result of the comparison comprises:
using the formula $P(y = 1 \mid x; \theta) = h_\theta(x)$ to calculate the probability that the training array and the reference array are from the same user device;
wherein $P(y = 1 \mid x; \theta)$ represents the probability that the training array and the reference array are from the same user device; $x$ is the training array;
wherein $h_\theta(x)$ is a prediction function of the form
$$h_\theta(x) = g(\theta^T x)$$
wherein $\theta^T x$ is a boundary function when the boundary is linear, i.e.
$$\theta^T x = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n$$
wherein $\theta_1, \ldots, \theta_n$ are regression coefficients;
wherein $g(\theta^T x)$ is a logistic regression function of the form
$$g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$
4. The method of claim 1, wherein the device data is collected at a front end, the front end comprising an Android front end, an iOS front end, and a Web front end;
the device data comprises at least one of:
hardware configuration data of the user device, comprising the type of the user device and the CPU model of the user device;
software configuration data of the user device, comprising an operating system type, an operating system version, and browser setting data of the user device; and
network configuration data of the user device, comprising the network type, the network service provider, the SIM card number, and the MAC address of the user device.
5. An apparatus for enabling device fingerprinting, comprising:
a processor;
a memory coupled to the processor and storing instructions that, when executed by the processor, cause the processor to:
receiving device data about a user device and operation data for operating on the user device; and
identifying the device fingerprint using a logistic regression model based on the device data and the operational data, comprising:
comparing a training array containing the device data and the operational data with a pre-stored reference array for the user device;
calculating, according to the result of the comparison and by using the logistic regression model, the probability that the training array and the reference array come from the same user device; and
training the logistic regression model using the training array to obtain updated values of regression coefficients;
wherein the operational data comprises at least one of: first history information about webpages browsed by the user on the user device, and second history information about applications installed by the user on the user device.
6. The apparatus of claim 5, wherein the comparing the training array containing the device data and the operational data to a pre-stored reference array for the user device comprises:
setting the value of the corresponding dimension in the training array to 1 when the data of the corresponding dimension of the training array and the reference array are the same, and
setting the value of the corresponding dimension in the training array to 0 when the data of the corresponding dimension of the training array and the reference array are different.
7. The apparatus of claim 6, wherein the calculating the probability that the training array and the reference array are from the same user device using a logistic regression model according to the result of the comparison comprises:
using the formula $P(y = 1 \mid x; \theta) = h_\theta(x)$ to calculate the probability that the training array and the reference array are from the same user device;
wherein $P(y = 1 \mid x; \theta)$ represents the probability that the training array and the reference array are from the same user device; $x$ is the training array;
wherein $h_\theta(x)$ is a prediction function of the form
$$h_\theta(x) = g(\theta^T x)$$
wherein $\theta^T x$ is a boundary function when the boundary is linear, i.e.
$$\theta^T x = \theta_0 + \theta_1 x_1 + \cdots + \theta_n x_n$$
wherein $\theta_1, \ldots, \theta_n$ are regression coefficients;
wherein $g(\theta^T x)$ is a logistic regression function of the form
$$g(\theta^T x) = \frac{1}{1 + e^{-\theta^T x}}$$
8. The apparatus of claim 5, wherein the device data is collected at a front end, the front end comprising an Android front end, an iOS front end, and a Web front end;
the device data comprises at least one of:
hardware configuration data of the user device, comprising the type of the user device and the CPU model of the user device;
software configuration data of the user device, comprising an operating system type, an operating system version, and browser setting data of the user device; and
network configuration data of the user device, comprising the network type, the network service provider, the SIM card number, and the MAC address of the user device.
9. A computer-readable storage medium having computer-readable program instructions stored thereon for performing the method of any of claims 1-4.
CN201710562300.2A 2017-07-11 2017-07-11 Method, apparatus, and computer-readable storage medium for implementing device fingerprinting Expired - Fee Related CN107622198B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710562300.2A CN107622198B (en) 2017-07-11 2017-07-11 Method, apparatus, and computer-readable storage medium for implementing device fingerprinting

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710562300.2A CN107622198B (en) 2017-07-11 2017-07-11 Method, apparatus, and computer-readable storage medium for implementing device fingerprinting

Publications (2)

Publication Number Publication Date
CN107622198A CN107622198A (en) 2018-01-23
CN107622198B true CN107622198B (en) 2020-08-25

Family

ID=61087070

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710562300.2A Expired - Fee Related CN107622198B (en) 2017-07-11 2017-07-11 Method, apparatus, and computer-readable storage medium for implementing device fingerprinting

Country Status (1)

Country Link
CN (1) CN107622198B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108681667A (en) * 2018-04-02 2018-10-19 阿里巴巴集团控股有限公司 A kind of unit type recognition methods, device and processing equipment
CN109446791A (en) * 2018-11-23 2019-03-08 杭州优行科技有限公司 New equipment recognition methods, device, server and computer readable storage medium
CN109657447B (en) * 2018-11-28 2023-03-14 腾讯科技(深圳)有限公司 Equipment fingerprint generation method and device
CN109766678B (en) * 2018-12-12 2020-11-03 同济大学 Fingerprint identification authentication method, system, medium and equipment for mobile terminal equipment
US11556823B2 (en) * 2018-12-17 2023-01-17 Microsoft Technology Licensing, Llc Facilitating device fingerprinting through assignment of fuzzy device identifiers
RU2724783C1 (en) * 2018-12-28 2020-06-25 Акционерное общество "Лаборатория Касперского" Candidate fingerprint matching and comparison system and method
CN112861112A (en) * 2021-02-08 2021-05-28 北京顶象技术有限公司 Method and device for preventing equipment fingerprint identification fraud
CN114783007B (en) * 2022-06-22 2022-09-27 成都新希望金融信息有限公司 Equipment fingerprint identification method and device and electronic equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105224623A (en) * 2015-09-22 2016-01-06 北京百度网讯科技有限公司 The training method of data model and device
CN105989373A (en) * 2015-02-15 2016-10-05 阿里巴巴集团控股有限公司 Method and apparatus for obtaining equipment fingerprint by training model
CN106709318A (en) * 2017-01-24 2017-05-24 腾云天宇科技(北京)有限公司 Recognition method, device and calculation equipment for user equipment uniqueness
CN106776873A (en) * 2016-11-29 2017-05-31 珠海市魅族科技有限公司 A kind of recommendation results generation method and device

Also Published As

Publication number Publication date
CN107622198A (en) 2018-01-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
Granted publication date: 20200825
Termination date: 20210711