CN107392121B - Self-adaptive equipment identification method and system based on fingerprint identification - Google Patents

Publication number
CN107392121B
Authority
CN
China
Prior art keywords
module
information
equipment
sample
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710548621.7A
Other languages
Chinese (zh)
Other versions
CN107392121A (en
Inventor
蒋昌俊
闫春钢
丁志军
张亚英
周婉
王松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Original Assignee
Tongji University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University
Priority to CN201710548621.7A
Publication of CN107392121A
Application granted
Publication of CN107392121B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/12 Fingerprints or palmprints
    • G06V40/1365 Matching; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Storage Device Security (AREA)
  • Collating Specific Patterns (AREA)

Abstract

An adaptive device identification method and system based on device fingerprint identification, comprising: acquiring user login information, collecting fingerprint login data of a user and user equipment data of a mobile terminal and a browser terminal in real time, and storing the fingerprint login data and the user equipment data as device record information; screening the fingerprint login data and extracting login feature information as sample feature information; digitizing the sample feature information to obtain a hash feature value, normalizing the hash feature value into a sample feature value, and converting the sample feature value into a multidimensional feature vector; taking the multidimensional feature vector of the sample as input and determining the K value of a clustering algorithm according to a preset similarity measurement function; determining and storing a cluster center according to the clustering algorithm; and comparing the Hamming distance between the device information and the cluster center with a trusted threshold, and identifying a new login device according to the comparison result.

Description

Self-adaptive equipment identification method and system based on fingerprint identification
Technical Field
The present invention relates to a fingerprint identification-based identification system, and in particular, to a fingerprint identification-based adaptive device identification method and system.
Background
With the rapid development of the Internet, the network has gradually become a second living space for people. Because the identity of an Internet user, and the reputation attached to that identity, cannot be verified, the expansion of Internet business is greatly hindered, and this uncertainty of identity breeds various kinds of online fraud. Device fingerprinting is a technology for identifying devices in a network and is widely applied in anti-fraud risk control, security authentication, user behavior tracking, access control and similar fields. A device fingerprint is a set of device characteristics, or a unique device identifier, that can be used to uniquely identify a device. It makes it possible to analyze the behavior trail of an Internet fraudster more accurately, to identify risks and give early warning from faint traces, and to accurately track and locate the user who gave rise to a risk together with all associated users.
Most existing device fingerprinting technologies depend entirely on explicit identifiers, such as CPU serial numbers, MAC addresses and IMEIs. Explicit identifiers have the following three problems. First, explicit identifiers such as the CPU serial number can be set by the hardware manufacturer and are not completely reliable. Second, an explicit identifier such as the MAC address can take several values on the same device and therefore cannot represent the device. Third, collecting some explicit identifiers (such as the IMEI) depends on sensitive permissions, which can lead to permission abuse, leakage of user privacy and similar problems. In recent years, researchers have tried to introduce implicit identifiers, such as the browser type and browser language of a device, and to combine them into a device fingerprint with which the device terminal is identified. This work overcomes, to some extent, the limitation of device fingerprinting when explicit identifiers are unreliable, and improves identification accuracy to some extent. However, when the trusted fingerprint library is built, these schemes still select a particular explicit identifier as the device ID that uniquely identifies the user device, so the modeling process of device identification does not remove the dependence on explicit identifiers. As users become more aware of privacy protection, complete explicit identifier information is increasingly difficult to obtain, and when the explicit identifier of a device is duplicated, missing or forged, the above schemes cannot accurately build the trusted fingerprint library of the device.
Excessive dependence on explicit identifiers makes such systems unreliable. How to remove the bottleneck that unreliable explicit identifiers impose on device identification is therefore a research subject of great theoretical significance and application value in the Internet age of electronic commerce.
In summary, the conventional technology suffers from low identification accuracy, low identification security and low reliability, is prone to permission abuse, and depends excessively on explicit identifiers.
Disclosure of Invention
In view of the above drawbacks of the prior art, the present invention aims to provide a fingerprint identification-based adaptive device identification method and system, which solve the technical problems of low identification accuracy, low identification security and low reliability in the prior art.
To achieve the above and other related objects, the present invention provides an adaptive device identification method based on device fingerprint identification, including: acquiring user login information, collecting fingerprint login data of a user and user equipment data of a mobile terminal and a browser terminal in real time, and storing the fingerprint login data and the user equipment data as device record information; screening the fingerprint login data and extracting login feature information as sample feature information; digitizing the sample feature information to obtain a hash feature value, normalizing the hash feature value into a sample feature value, and converting the sample feature value into a multidimensional feature vector; taking the multidimensional feature vector of the sample as input and determining the K value of a clustering algorithm according to a preset similarity measurement function; determining and storing a cluster center according to the clustering algorithm; and comparing the Hamming distance between the device information and the cluster center with a trusted threshold, and identifying a new login device according to the comparison result.
In one embodiment of the present invention, acquiring user login information, collecting fingerprint login data of a user and user equipment data of a mobile terminal and a browser terminal in real time, and storing the fingerprint login data and the user equipment data as equipment record information, includes: acquiring fingerprint data of a current user, and acquiring equipment record information corresponding to the fingerprint data; acquiring an identifier according to the fingerprint data; acquiring an original sample according to the identifier; storing the identifier, serializing the identifier into an identifier string; analyzing sample characteristics according to the identifier character string, and storing the sample characteristics; acquiring equipment record information according to fingerprint data of a user; and according to the sample characteristics, summarizing all the equipment record information of the user into an original training data set.
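The collection steps above can be pictured with a minimal sketch. The patent does not state a serialization format, so JSON with sorted keys is an assumption here, and the function names (`serialize_identifiers`, `parse_sample_features`, `build_training_set`) and the sample field names are hypothetical.

```python
import json


def serialize_identifiers(identifiers: dict) -> str:
    """Serialize collected identifiers into one identifier string.

    JSON with sorted keys is an assumed format, chosen so that the same
    identifiers always produce the same string.
    """
    return json.dumps(identifiers, sort_keys=True, ensure_ascii=False)


def parse_sample_features(identifier_string: str) -> dict:
    """Parse an identifier string back into named sample features."""
    return json.loads(identifier_string)


def build_training_set(login_events):
    """Group every device record by user to form the original training set.

    login_events is an iterable of (user_id, identifiers) pairs; this shape
    is hypothetical and only serves the illustration.
    """
    dataset = {}
    for user_id, identifiers in login_events:
        identifier_string = serialize_identifiers(identifiers)
        features = parse_sample_features(identifier_string)
        dataset.setdefault(user_id, []).append(features)
    return dataset
```

Sorting the keys keeps the identifier string stable across logins, so two records with the same identifiers compare equal as strings.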
In one embodiment of the present invention, screening the fingerprint login data and extracting device login feature information as sample feature information includes: acquiring the implicit identifiers in the identifier string; screening the implicit identifiers according to the information gain principle to obtain related identifiers; and selecting the sample feature information according to the related identifiers.
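Screening by information gain can be sketched as follows. Information gain needs a class label for each record; the patent does not say where that label comes from, so the `labels` argument (for example, a device label known for a small labelled sample) is an assumption, as are the function names and the `min_gain` cut-off.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def information_gain(records, labels, attribute):
    """Entropy reduction obtained by splitting the labels on one attribute."""
    groups = defaultdict(list)
    for record, label in zip(records, labels):
        groups[record.get(attribute)].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def select_identifiers(records, labels, min_gain=0.1):
    """Keep the implicit identifiers whose gain reaches min_gain (assumed cut-off)."""
    attributes = sorted({key for record in records for key in record})
    return [a for a in attributes if information_gain(records, labels, a) >= min_gain]
```

An identifier that takes the same value on every record carries no information about the device and is dropped, while one that separates the labelled devices is kept.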
In one embodiment of the present invention, digitizing sample feature information to obtain a hash feature value, normalizing the hash feature value to be attribute data, and converting the sample feature value to a multidimensional feature vector includes: mapping the character string attribute value in the sample characteristic information into an integer interval of a specific bit number by adopting a hash method to obtain a hash characteristic value; normalizing the hash characteristic value as attribute data in a preset characteristic interval; and converting the sample characteristic information into a multidimensional characteristic vector according to the attribute data.
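As an illustration of the digitizing step, the sketch below hashes each string attribute into a fixed-width integer and scales it into a feature interval. The hash function (SHA-256 truncated to 32 bits), the interval [0, 1] and all names are assumptions; the patent only specifies "a hash method", "a specific bit number" and "a preset feature interval".

```python
import hashlib

HASH_BITS = 32  # assumed bit width of the integer interval


def hash_feature(value: str, bits: int = HASH_BITS) -> int:
    """Map a string attribute value to an integer in [0, 2**bits - 1]."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


def normalize(hashed: int, bits: int = HASH_BITS) -> float:
    """Scale a hash feature value into the assumed feature interval [0, 1]."""
    return hashed / ((1 << bits) - 1)


def to_feature_vector(sample: dict, attributes: list) -> list:
    """Convert sample feature information into a multidimensional feature vector.

    Attributes are read in a fixed order so that every record yields a vector
    with the same layout; a missing attribute is treated as an empty string.
    """
    return [normalize(hash_feature(str(sample.get(a, "")))) for a in attributes]
```

Because the hash is deterministic, equal attribute values always land on the same coordinate value, which is what a position-by-position comparison such as the Hamming distance relies on.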
In one embodiment of the present invention, taking the multidimensional feature vector of the sample as input and determining the K value of the clustering algorithm according to the preset similarity measurement function includes: acquiring all device record information corresponding to a user according to the multidimensional feature information; constructing a weighted undirected graph of the user, with each piece of device record information as a vertex, the lines connecting the vertices as edges, and the Hamming distance as the edge weight; inputting the multidimensional feature vector and a preset threshold; randomly setting any piece of device record information in the weighted undirected graph as an initial cluster center; traversing the device record information and judging whether the Hamming distance between the initial cluster center and the non-center device information is smaller than the preset threshold; if yes, putting the current device record information into a close-range set and increasing the cluster center count by 1; if not, putting the current device record information back into the original set for further traversal; obtaining the K value from the cluster center count; and re-clustering the device data of each user with an adaptive clustering algorithm, determining new cluster centers until convergence, and calculating a trusted threshold.
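One possible reading of the K value step is sketched below. It is an interpretation rather than the patented procedure: the count is incremented once per selected center, each center absorbs every remaining record closer than the threshold, and records that are not absorbed stay in the set for the next round. The Hamming distance is taken as the number of vector positions that differ; `estimate_k` and `seed` are hypothetical names.

```python
import random


def hamming(a, b):
    """Number of positions at which two equal-length feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def estimate_k(vectors, threshold, seed=0):
    """Estimate K by repeatedly picking a random center and removing its close records.

    Records within `threshold` of the chosen center form its close-range set;
    all other records are put back and traversed again with a new center.
    """
    rng = random.Random(seed)  # seeded only to make the sketch reproducible
    remaining = list(vectors)
    k = 0
    while remaining:
        center = remaining.pop(rng.randrange(len(remaining)))
        k += 1
        remaining = [v for v in remaining if hamming(center, v) >= threshold]
    return k
```

When the records of a user fall into tight, well-separated groups, the result does not depend on which record is drawn as center, which is why a random initial choice is workable here.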
In one embodiment of the present invention, determining and storing cluster centers according to the clustering algorithm includes: obtaining the K value and initializing the cluster centers; constructing a weighted undirected graph of the user, with each piece of device record information as a vertex, the lines connecting the vertices as edges, and the Hamming distance as the edge weight; clustering the objects in the data set according to the Hamming distance between each sample point and each center point; and calculating the Hamming distance between every two sample points in each cluster, and taking the sample point with the smallest sum of distances to the other records as the new cluster center.
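The center update described above resembles a K-medoids iteration, and a minimal sketch under that assumption follows. The initial centers are passed in by the caller, the stopping rule (centers unchanged, or `max_iter` rounds) is an assumption, and the function names are hypothetical.

```python
def hamming(a, b):
    """Number of positions at which two equal-length feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(vectors, centers):
    """Put every sample point into the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for v in vectors:
        nearest = min(range(len(centers)), key=lambda i: hamming(v, centers[i]))
        clusters[nearest].append(v)
    return clusters


def medoid(cluster):
    """Sample point with the smallest sum of distances to the other records."""
    return min(cluster, key=lambda c: sum(hamming(c, other) for other in cluster))


def cluster_devices(vectors, initial_centers, max_iter=100):
    """Alternate assignment and center update until the centers stop changing."""
    centers = list(initial_centers)
    for _ in range(max_iter):
        clusters = assign(vectors, centers)
        # An empty cluster keeps its previous center.
        new_centers = [medoid(c) if c else centers[i] for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(vectors, centers)
```

Each center is always an actual device record rather than an average, which keeps the Hamming distance to a center meaningful for categorical attributes.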
In one embodiment of the present invention, comparing the hamming distance between the device information and the cluster center with a trusted threshold, and identifying a new login device according to a comparison result of the hamming distance and the trusted threshold includes: carrying out data preprocessing and feature extraction on the equipment record information; extracting a cluster center and a trusted threshold corresponding to a user; calculating the Hamming distance between the recorded information of the equipment and the center of each cluster, and judging whether all Hamming distances of the recorded information of the equipment are larger than a credible threshold; if yes, judging that the equipment corresponding to the equipment record information is not trusted; if not, judging that the equipment record information corresponds to the equipment as the trusted equipment, and updating the cluster center of the user.
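The decision rule above can be written as a short sketch. It assumes the new record has already been preprocessed into a feature vector of the same layout as the stored cluster centers; the function name and return shape are hypothetical, and the cluster center update for trusted devices is left out.

```python
def hamming(a, b):
    """Number of positions at which two equal-length feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def identify_device(vector, centers, trusted_threshold):
    """Return (trusted, index of nearest center or None).

    The device is untrusted only when its distance to every cluster center
    exceeds the trusted threshold; otherwise it is trusted and the nearest
    center is reported so that the caller can update that cluster.
    """
    distances = [hamming(vector, c) for c in centers]
    if all(d > trusted_threshold for d in distances):
        return False, None
    nearest = min(range(len(centers)), key=distances.__getitem__)
    return True, nearest
```

A user with no stored cluster centers yields an untrusted result, since there is no trusted record to compare against.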
In one embodiment of the present invention, an adaptive device identification system based on device fingerprint identification includes: a login information acquisition module, a feature extraction module, a vector acquisition module, a K value calculation module, a cluster center determination module and a device authentication module; the login information acquisition module is used for acquiring user login information, collecting fingerprint login data of a user and user equipment data of a mobile terminal and a browser terminal in real time, and storing the fingerprint login data and the user equipment data as device record information; the feature extraction module is used for screening the fingerprint login data and extracting device login feature information as sample feature information, and the feature extraction module is connected with the login information acquisition module; the vector acquisition module is used for digitizing the sample feature information to obtain a hash feature value, normalizing the hash feature value into a sample feature value, and converting the sample feature value into a multidimensional feature vector, and the vector acquisition module is connected with the feature extraction module; the K value calculation module is used for taking the multidimensional feature vector of the sample as input and determining the K value of a clustering algorithm according to a preset similarity measurement function, and the K value calculation module is connected with the vector acquisition module; the cluster center determination module is used for determining and storing a cluster center according to the clustering algorithm, and the cluster center determination module is connected with the K value calculation module; the device authentication module is used for comparing the Hamming distance between the device information and the cluster center with a trusted threshold and identifying a new login device according to the comparison result, and the device authentication module is connected with the cluster center determination module.
In an embodiment of the present invention, the login information obtaining module includes: the fingerprint device comprises a fingerprint device information module, a fingerprint identifier acquisition module, an original sample module, a character string module, a sample characteristic analysis module, a device information extraction module and an original data set module; the fingerprint equipment information module is used for acquiring fingerprint data of a current user and collecting equipment record information corresponding to the fingerprint data; the fingerprint identifier acquisition module is used for acquiring an identifier according to fingerprint data, and is connected with the fingerprint equipment information module; the original sample module is used for acquiring an original sample according to the identifier, and is connected with the fingerprint identifier acquisition module; the character string module is used for storing the identifier, serializing the identifier into an identifier character string, and is connected with the fingerprint identifier acquisition module; the sample feature analysis module is used for analyzing sample features according to the identifier character strings, storing the sample features and connecting the sample feature analysis module with the original sample module; the device information extraction module is used for acquiring device record information according to fingerprint data of a user, and is connected with the fingerprint device information module; and the original data set module is used for summarizing all the equipment record information of the user into an original training data set according to the sample characteristics, and is connected with the sample characteristic analysis module.
In an embodiment of the present invention, the feature extraction module includes: the system comprises a hidden identifier acquisition module, a related identifier acquisition module and a sample characteristic acquisition module; the implicit identifier acquisition module is used for acquiring the implicit identifier in the identifier character string; the related identifier acquisition module is used for screening the hidden identifier according to the information gain principle to obtain a related identifier, and the related identifier acquisition module is connected with the hidden identifier acquisition module; and the sample characteristic acquisition module is used for selecting sample characteristic information according to the related identifier, and is connected with the related identifier acquisition module.
In an embodiment of the present invention, the vector obtaining module includes: the device comprises a hash eigenvalue module, an eigenvalue normalization module and a vector conversion module; the hash characteristic value module is used for mapping the character string attribute value in the sample characteristic information into an integer interval of a specific bit number by adopting a hash method to obtain a hash characteristic value; the characteristic value normalization module is used for normalizing the hash characteristic value to be attribute data in a preset characteristic interval, and is connected with the hash characteristic value module; the vector conversion module is used for converting the sample characteristic information into a multidimensional characteristic vector according to the attribute data, and the vector conversion module is connected with the characteristic value normalization module.
In an embodiment of the present invention, the K value calculation module includes: a device record information extraction module, an undirected graph construction module, a data input module, a cluster center initialization module, a distance judgment module, a K value accumulation module, a cycle traversal module, a K value acquisition module and a cluster center update module; the device record information extraction module is used for acquiring all device record information corresponding to the user according to the multidimensional feature information; the undirected graph construction module is used for constructing a weighted undirected graph of the user, with each piece of device record information as a vertex, the lines connecting the vertices as edges, and the Hamming distance as the edge weight, and the undirected graph construction module is connected with the device record information extraction module; the data input module is used for inputting the multidimensional feature vector and a preset threshold, and the data input module is connected with the device record information extraction module; the cluster center initialization module is used for randomly setting any piece of device record information in the weighted undirected graph as an initial cluster center, and the cluster center initialization module is connected with the undirected graph construction module; the distance judgment module is used for traversing the device record information and judging whether the Hamming distance between the initial cluster center and the non-center device information is smaller than the preset threshold, and the distance judgment module is connected with the cluster center initialization module; the K value accumulation module is used for putting the current device record information into the close-range set and increasing the cluster center count by 1 when the Hamming distance between the initial cluster center and the non-center device information is smaller than the preset threshold, and the K value accumulation module is connected with the distance judgment module; the cycle traversal module is used for putting the current device record information back into the original set for further traversal when the Hamming distance between the initial cluster center and the non-center device information is not smaller than the preset threshold, and the cycle traversal module is connected with the distance judgment module; the K value acquisition module is used for obtaining the K value from the cluster center count, and the K value acquisition module is connected with the K value accumulation module; and the cluster center update module is used for re-clustering the device data of each user with an adaptive clustering algorithm, determining new cluster centers until convergence, and calculating a trusted threshold, and the cluster center update module is connected with the K value acquisition module.
In one embodiment of the present invention, the cluster center determination module includes: a clustering initial module, an undirected graph module, an object clustering module and a cluster center module; the clustering initial module is used for obtaining the K value and initializing the cluster centers; the undirected graph module is used for constructing a weighted undirected graph of the user, with each piece of device record information as a vertex, the lines connecting the vertices as edges, and the Hamming distance as the edge weight, and the undirected graph module is connected with the clustering initial module; the object clustering module is used for clustering the objects in the data set according to the Hamming distance between each sample point and each center point, and the object clustering module is connected with the undirected graph module; and the cluster center module is used for calculating the Hamming distance between every two sample points in each cluster and taking the sample point with the smallest sum of distances to the other records as the new cluster center, and the cluster center module is connected with the object clustering module.
In one embodiment of the present invention, a device authentication module includes: the device comprises a device information extraction module, a user information extraction module, a device judgment module, an illegal device judgment module and a legal device judgment module; the device information extraction module is used for carrying out data preprocessing and feature extraction on the device record information; the user information extraction module is used for extracting a cluster center and a credible threshold value corresponding to a user; the device judging module is used for calculating the Hamming distance between the device record information and the center of each cluster, judging whether all Hamming distances of the device record information are larger than a trusted threshold value or not, and is connected with the device information extracting module; the illegal equipment judging module is used for judging that equipment corresponding to the equipment record information is not credible when all Hamming distances of the equipment record information are larger than a credible threshold value, and is connected with the equipment judging module; and the legal equipment judging module is used for judging that the equipment recorded information is corresponding to the equipment as the trusted equipment when all Hamming distances of the equipment recorded information are not more than the trusted threshold value, updating the cluster center of the user, and connecting the legal equipment judging module with the equipment judging module.
As described above, the adaptive device identification method and system based on fingerprint identification provided by the present invention have the following beneficial effects:
The present invention provides an adaptive device identification method and system based on fingerprint identification, which take an empirical threshold as the reference value of the trusted distance, perform data analysis on the historical login information of user equipment, and train a corresponding adaptive clustering model for the device data of each user. New login data can thereby be identified and judged as coming from a trusted device or not, which solves the technical problems of low identification accuracy, low identification security and low reliability in the prior art.
Drawings
Fig. 1 is a schematic diagram of the steps of the adaptive device identification method based on device fingerprint identification according to the present invention.
Fig. 2 is a schematic diagram of the device record information acquisition steps according to the present invention.
Fig. 3 is a schematic diagram of the sample feature information acquisition steps according to the present invention.
Fig. 4 is a schematic diagram of the feature information digitizing steps according to the present invention.
Fig. 5 is a schematic diagram of the K value determination steps according to the present invention.
Fig. 6 is a schematic diagram of the cluster center determination steps according to the present invention.
Fig. 7 is a schematic diagram of the new login device identification steps according to the present invention.
Fig. 8 is a schematic diagram of the modules of the adaptive device identification system based on device fingerprint identification according to the present invention.
Fig. 9 is a schematic diagram of the login information acquisition module according to the present invention.
Fig. 10 is a schematic diagram of the feature extraction module according to the present invention.
Fig. 11 is a schematic diagram of the vector acquisition module according to the present invention.
Fig. 12 is a schematic diagram of the K value calculation module according to the present invention.
Fig. 13 is a schematic diagram of the cluster center determination module according to the present invention.
Fig. 14 is a schematic diagram of the device authentication module according to the present invention.
Description of element reference numerals
1 self-adaptive device identification system based on device fingerprint identification
11 login information acquisition module
12 feature extraction module
13 vector acquisition module
14 K value calculation module
15 cluster center determining module
16 equipment authentication module
111 fingerprint equipment information module
112 fingerprint identifier acquisition module
113 raw sample module
114 character string module
115 sample feature analysis module
116 equipment information extraction module
117 raw data set module
121 hidden identifier acquisition module
122 related identifier acquisition module
123 sample feature acquisition module
131 hash eigenvalue module
132 eigenvalue normalization module
133 vector conversion module
141 equipment record information extraction module
142 undirected graph construction module
143 data input module
144 cluster center initialization module
145 distance judging module
146 K value accumulation module
147 circulation traversing module
148 K value acquisition module
149 cluster center updating module
151 clustering initial module
152 undirected graph module
153 object clustering module
154 cluster center module
161 equipment information extraction module
162 user information extraction module
163 equipment judging module
164 illegal equipment judging module
165 legal device decision module
Description of step reference numerals
Fig. 1: S1 to S6
Fig. 2: S11 to S17
Fig. 3: S21 to S23
Fig. 4: S31 to S33
Fig. 5: S41 to S49
Fig. 6: S51 to S54
Fig. 7: S61 to S65
Detailed Description
Further advantages and effects of the present invention will become apparent to those skilled in the art from the disclosure of the present invention, which is described by the following specific examples.
Referring to Figs. 1 to 11, it should be understood that the structures shown in the drawings are provided only to help those skilled in the art understand and read the disclosure and are not intended to limit the scope in which the present invention can be practiced; any structural modification, change of proportion or adjustment of size that does not affect the efficacy or the achievable purpose of the present invention still falls within its scope. Likewise, terms such as "upper," "lower," "left," "right," "middle," and "a" used in this specification are for convenience of description only and are not intended to limit the scope of the invention; changes or adjustments of the relative relationships they describe, without material alteration of the technical content, are also regarded as within the scope in which the invention may be practiced.
Fig. 1 is a schematic diagram of the steps of the adaptive device identification method based on device fingerprint identification according to the present invention. As shown in fig. 1, the method includes:
S1, acquiring user login information: collecting, in real time, the user's fingerprint login data and the user device data of the mobile terminal and the browser terminal, and storing both as device record information. Data analysis is first performed on the historical login information of the user's devices, so that both historical data and device login data collected in real time are obtained;
S2, screening the fingerprint login data and extracting login feature information as sample feature information;
S3, digitizing the sample feature information to obtain hash feature values, normalizing the hash feature values to obtain sample feature values, and converting the sample feature values into a multidimensional feature vector. Information that reflects the characteristics of the terminal device is extracted as sample features; the extracted features are preprocessed, text-type feature data are converted into numerical values, and the values are vectorized so that they are suitable for a clustering algorithm;
S4, taking the multidimensional feature vectors of the samples as input, determining the K value of the clustering algorithm according to a preset similarity measurement function, and selecting a clustering algorithm suited to the characteristics of the user's device fingerprint data to partition the data;
S5, determining cluster centers according to the clustering algorithm and storing the cluster centers of the resulting clusters in a trusted fingerprint database for use in identifying newly logged-in devices. An empirical threshold serves as the reference value of the trusted distance; data analysis is performed on the historical login information of the user's devices, and the device data of each user are trained into a corresponding adaptive clustering model;
S6, comparing the Hamming distance between the device information and the cluster centers with a trusted threshold, and identifying the newly logged-in device according to the comparison result. If the device is identified as a legitimate device, such as a portable computer or a mobile phone, the login access operation is allowed to continue; if it is identified as illegitimate, an authentication interface is forcibly popped up to block the device login.
Fig. 2 is a schematic diagram of the device record information acquisition step of the present invention. As shown in fig. 2, step S1, acquiring user login information, collecting in real time the user's fingerprint login data and the user device data of the mobile terminal and the browser terminal, and storing both as device record information, includes:
S11, acquiring the fingerprint data of the current user and collecting the device record information corresponding to the fingerprint data; when a device logs in, the data of the mobile terminal and the browser terminal are collected through the device fingerprint data collection module;
S12, acquiring identifiers according to the fingerprint data and storing the <key, value> pairs of all identifiers in a HashMap object; after acquisition is complete, serializing the content of the HashMap object into a JSON-format string, parsing from that string the information that reflects the characteristics of the terminal device as sample features, and uploading the data to a server by HTTP POST;
S13, acquiring original samples according to the identifiers, obtaining all device records of the user from the historical fingerprint database according to the user ID of the login device as the original training data set;
S14, storing the identifiers and serializing them into an identifier string; when the user sends a device login request, the newly logged-in device data are collected according to the above steps and stored in the historical fingerprint database;
S15, parsing the sample features from the identifier string and storing them. Because a device fingerprint contains many implicit identifiers, the collected implicit identifiers need to be screened, that is, feature selection is performed. An effective feature selection algorithm not only removes irrelevant or redundant features but also reduces the computational complexity and improves the recognition efficiency;
S16, acquiring device record information according to the fingerprint data of the user; the browser terminal acquires device hardware information through a browser plug-in;
S17, aggregating, according to the sample features, all device record information of the user into an original training data set. A complete HTTP response header is obtained as an original sample by sending an HTTP GET request; when the user sends a device login request, the newly logged-in device data are collected according to the above steps and stored in the historical fingerprint database. Finally, all device records of the user are obtained from the historical fingerprint database according to the user ID of the login device and used as the original training data set.
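The serialization and upload described in steps S12 to S15 can be sketched as follows. This is a minimal illustration only: the patent names a HashMap object, for which a Python dict stands in here, and the identifier names, the endpoint URL, and all function names are hypothetical. The request is built but deliberately not sent.

```python
import json
from urllib import request


def serialize_identifiers(identifiers: dict) -> str:
    """Serialize the collected <key, value> identifier pairs into a JSON string."""
    return json.dumps(identifiers, sort_keys=True, ensure_ascii=False)


def build_upload_request(url: str, payload: str) -> request.Request:
    """Build (but do not send) an HTTP POST request carrying the JSON payload."""
    return request.Request(
        url,
        data=payload.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_sample_features(payload: str, feature_keys: list) -> dict:
    """Server side: parse the JSON string and keep only the chosen feature keys."""
    record = json.loads(payload)
    return {key: record.get(key) for key in feature_keys}


# Hypothetical identifiers; the real identifier set is not listed in the text.
identifiers = {"user_id": "u1", "os_version": "7.1.1",
               "screen": "1080x1920", "cpu": "arm64"}
payload = serialize_identifiers(identifiers)
req = build_upload_request("http://example.invalid/fingerprint", payload)
features = parse_sample_features(payload, ["os_version", "screen", "cpu"])
```

Sorting the keys during serialization is a choice made for this sketch so that the same identifiers always produce the same string; the source does not state a key order.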
Fig. 3 is a schematic diagram of the sample feature information acquisition step. As shown in fig. 3, step S2, screening the fingerprint login data and extracting device login feature information as sample feature information, includes:
S21, acquiring the implicit identifiers in the identifier string. The mobile terminal acquires all explicit and implicit identifiers by calling system APIs and executing Linux shell commands. Because a device fingerprint contains many implicit identifiers, the collected implicit identifiers need to be screened, that is, feature selection is performed. An effective feature selection algorithm not only removes irrelevant or redundant features but also reduces the computational complexity and improves the recognition efficiency;
S22, screening the implicit identifiers according to the information gain principle to obtain relevant identifiers; implicit identifiers with high information entropy and few changes are considered helpful for identifying devices;
S23, selecting the sample feature information according to the relevant identifiers. The feature selection algorithm adopted by the invention is based mainly on the information gain principle combined with statistical methods, and performs feature selection on the original data set and on the fingerprint of the newly logged-in device. The basic principle is to select relevant features without losing important ones, to remove useless features, and to reduce redundant features.
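The screening rule in step S22 (high information entropy, few changes) can be illustrated with the sketch below. The source does not give the exact formula or thresholds, so the way entropy and change counts are computed and combined here, the parameters `min_entropy` and `max_changes`, and the sample identifiers are all assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of the observed values of one identifier."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def change_count(values):
    """Number of times consecutive observations on one device differ."""
    return sum(1 for a, b in zip(values, values[1:]) if a != b)


def select_identifiers(records_by_device, min_entropy, max_changes):
    """Keep identifiers whose entropy over all records is high and whose
    total number of changes within each device's login history is small."""
    names = sorted({name for recs in records_by_device.values()
                    for rec in recs for name in rec})
    selected = []
    for name in names:
        all_values = [rec.get(name)
                      for recs in records_by_device.values() for rec in recs]
        changes = sum(change_count([rec.get(name) for rec in recs])
                      for recs in records_by_device.values())
        if entropy(all_values) >= min_entropy and changes <= max_changes:
            selected.append(name)
    return selected


# Hypothetical login histories of two devices, in chronological order.
history = {
    "A": [{"model": "X1", "battery": "90", "locale": "zh"},
          {"model": "X1", "battery": "55", "locale": "zh"},
          {"model": "X1", "battery": "20", "locale": "zh"}],
    "B": [{"model": "P9", "battery": "80", "locale": "zh"},
          {"model": "P9", "battery": "30", "locale": "zh"}],
}
selected = select_identifiers(history, min_entropy=0.5, max_changes=0)
```

In this toy data, "battery" varies on every login and "locale" never distinguishes the two devices, so only "model" passes both conditions.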
Fig. 4 is a schematic diagram of the sample feature information digitizing step of the present invention. As shown in fig. 4, step S3, digitizing the sample feature information to obtain hash feature values, normalizing the hash feature values into attribute data, and converting the sample feature values into a multidimensional feature vector, includes:
S31, mapping the string attribute values in the sample feature information into an integer interval of a specific number of bits by a hash method to obtain hash feature values. The data preprocessing module mainly digitizes and normalizes the extracted sample features. Because no "order" relation exists among device fingerprint attribute values, the attribute values must be quantized into discrete values in one-to-one correspondence and cannot be quantized into continuous values based on a space model;
S32, normalizing the hash feature values into attribute data within a preset feature interval. The purpose of normalization is to limit each attribute value to the interval [0, 1], which improves both the execution efficiency and the precision of the algorithm;
S33, converting the sample feature information into a multidimensional feature vector according to the attribute data, and storing the preprocessed sample data in a database as the input of the adaptive clustering module.
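Steps S31 to S33 can be sketched as below. The source does not name the hash function or the bit width, so MD5, the 16-bit interval, and the feature names are assumptions chosen only to make the example concrete.

```python
import hashlib

HASH_BITS = 16  # assumed width of the integer interval; not specified in the text


def hash_attribute(value: str, bits: int = HASH_BITS) -> int:
    """Map a string attribute value to an integer in [0, 2**bits - 1]."""
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)


def normalize(hashed: int, bits: int = HASH_BITS) -> float:
    """Scale a hashed value into the interval [0, 1]."""
    return hashed / ((1 << bits) - 1)


def to_feature_vector(sample: dict, feature_order: list) -> list:
    """Convert selected sample features into a fixed-order numeric vector."""
    return [normalize(hash_attribute(str(sample.get(name, ""))))
            for name in feature_order]


# Hypothetical sample features and ordering.
order = ["os_version", "screen", "cpu"]
sample = {"os_version": "7.1.1", "screen": "1080x1920", "cpu": "arm64"}
vector = to_feature_vector(sample, order)
```

Equal strings always map to equal values, which is what the Hamming-style comparison in later steps relies on; different strings can collide within a small interval, so a real deployment would need a bit width large enough to make that unlikely.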
Fig. 5 is a schematic diagram of the K value determining step of the present invention. As shown in fig. 5, step S4, taking the multidimensional feature vectors of the samples as input and determining the K value of the clustering algorithm according to a preset similarity measurement function, includes:
S41, acquiring all device record information corresponding to the user according to the multidimensional feature information. In a clustering algorithm, the similarity calculation between two samples is very important, and the quality of the similarity measurement function determines the final clustering effect. Under the vector space model, the similarity between samples can be expressed by a distance between vectors, such as the Euclidean distance or the Mahalanobis distance. However, the values of device fingerprint attributes are discrete and have no "order" relation among them, so these distances cannot accurately reflect the similarity of device fingerprints, whereas the Hamming distance can measure the distance between discrete values. The invention therefore selects the Hamming distance function as the similarity measurement function. The Hamming distance between two equal-length character strings s1 and s2 is defined as the minimum number of substitutions needed to change one string into the other. Borrowing this idea for the device fingerprint application, the Hamming distance between a feature vector xi and a cluster center cj is defined as the number of features of xi that do not match the corresponding features of cj;
S42, constructing a weighted undirected graph of the user, with each piece of device record information as a vertex, the connection between records as an edge, and the Hamming distance as the edge weight; solving for the cluster center then becomes the center point problem of this weighted undirected graph. To simplify the calculation of the center point, the invention takes, in each cluster, the record with the smallest sum of Hamming distances to the other records as the cluster center;
S43, inputting the multidimensional feature vectors and a preset threshold. The input of the adaptive clustering algorithm is the preprocessed historical login data read from the database. First, the empirical threshold Z is taken as the reference value, that is, the trusted distance is the empirical threshold Z, and the N pieces of historical login data of a user whose explicit identifier is empty are denoted Set0;
S44, randomly setting any piece of device record information in the weighted undirected graph as the initial cluster center; this initializes the cluster center, which is updated continuously during the subsequent traversal of the weighted undirected graph;
S45, traversing the device record information and judging whether the Hamming distance between the initial cluster center and each non-center piece of device information is smaller than the preset threshold; the device data of each user are then re-clustered with the adaptive clustering algorithm, new cluster centers being determined until convergence, and the trusted threshold is calculated. Finally, the cluster centers and trusted thresholds of the resulting clusters are stored in the trusted fingerprint database for identifying newly logged-in devices. The K value determining algorithm is as follows:
Input: N pieces of historical login data of a user whose explicit identifier is empty, denoted Set0; the empirical trusted-distance threshold Z.
Output: the number of devices K of the user and the initial cluster center R1 of each class.
Figure SMS_1
The value of i is K, and the R1 selected each time is the initial cluster center of the corresponding class;
S46, if yes, putting the current device record information into the close-range set S1 and adding 1 to the cluster center count, that is, adding 1 to the variable representing the K value;
S47, if not, putting the current device record information Ri back into the original set S2 to be traversed again;
S48, obtaining the K value from the cluster center count, the K value being the number K of devices of the user;
S49, re-clustering the device data of each user with the adaptive clustering algorithm, determining new cluster centers until convergence, and calculating the trusted threshold.
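The feature-vector Hamming distance of step S41 and the traversal of steps S43 to S48 can be sketched as follows. The original algorithm listing is only available as a figure, so this is a reading of the surrounding text rather than a transcription: one center is counted per round, the first remaining record is used as the center instead of a random one (to keep the example deterministic), and the function names are hypothetical.

```python
def hamming(x, c):
    """Number of positions where feature vector x and center c do not match."""
    if len(x) != len(c):
        raise ValueError("feature vectors must have equal length")
    return sum(1 for a, b in zip(x, c) if a != b)


def estimate_k(records, z):
    """Estimate the number of devices K and the initial cluster centers.

    Each round picks one remaining record as a center, moves every record
    closer than z into that center's close-range set, and keeps the rest
    for the next round. K is the number of rounds.
    """
    remaining = list(records)
    centers = []
    while remaining:
        center = remaining[0]  # assumption: first record instead of a random one
        centers.append(center)
        remaining = [r for r in remaining[1:] if hamming(center, r) >= z]
    return len(centers), centers


# Hypothetical preprocessed login records (discrete feature values).
set0 = [(1, 2, 3, 4), (1, 2, 3, 5), (9, 8, 7, 6), (9, 8, 7, 4), (1, 2, 0, 4)]
k, initial_centers = estimate_k(set0, z=2)
```

With the threshold Z = 2, the five records fall into two groups, so the sketch reports K = 2 with one initial center per group.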
Fig. 6 is a schematic diagram of the cluster center determining step of the present invention. As shown in fig. 6, step S5, determining and storing the cluster centers according to the clustering algorithm, includes:
S51, obtaining the K value and initializing the cluster centers: assuming that the N pieces of historical login data of a user whose explicit identifier is empty fall into K classes, the K value is determined by the K value determining algorithm and the cluster centers are initialized;
S52, constructing a weighted undirected graph of the user, with each piece of device record information as a vertex, the connection between vertices as an edge, and the Hamming distance as the edge weight;
S53, clustering the objects in the data set according to the Hamming distance between each sample point and each center point: the Hamming distances between each sample point Xi and the k center points are calculated, Xi is assigned to the nearest cluster, and the cluster label of Xi is recorded as Ci; if a recorded cluster label changes, the flag is set to changed=1, and if none changes, to changed=0;
S54, calculating the Hamming distance between every two sample points in each cluster and taking the sample point with the smallest sum of distances to the other records as the new cluster center.
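Steps S53 and S54 amount to a K-center (medoid) iteration under the Hamming distance. The sketch below is one possible reading: the loop that repeats until changed=0, the iteration cap `max_iter`, and the function names are assumptions, and ties are broken by taking the first candidate.

```python
def hamming(x, c):
    """Number of positions where two equal-length feature vectors differ."""
    return sum(1 for a, b in zip(x, c) if a != b)


def assign(points, centers):
    """Label each point with the index of its nearest center (S53)."""
    return [min(range(len(centers)), key=lambda j: hamming(p, centers[j]))
            for p in points]


def medoid(cluster):
    """Record with the smallest sum of Hamming distances to the others (S54)."""
    return min(cluster, key=lambda p: sum(hamming(p, q) for q in cluster))


def k_center_clustering(points, initial_centers, max_iter=100):
    """Alternate assignment and center update until no label changes."""
    centers = list(initial_centers)
    labels = [None] * len(points)
    for _ in range(max_iter):
        new_labels = assign(points, centers)
        changed = 1 if new_labels != labels else 0
        labels = new_labels
        for j in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = medoid(members)
        if changed == 0:
            break
    return centers, labels


# Hypothetical records and initial centers.
points = [(1, 2, 3, 4), (1, 2, 3, 5), (1, 2, 0, 4), (9, 8, 7, 6), (9, 8, 7, 4)]
centers, labels = k_center_clustering(points, [(1, 2, 3, 5), (9, 8, 7, 4)])
```

Because each center is always an actual record, no averaging of hashed values is needed, which matches the text's point that the attribute values carry no order relation.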
Fig. 7 is a schematic diagram of the newly logged-in device identification step of the present invention. As shown in fig. 7, step S6, comparing the Hamming distance between the device information and the cluster centers with a trusted threshold and identifying the newly logged-in device according to the comparison result, includes:
S61, performing data preprocessing and feature extraction on the device record information. When the user sends a login request, the device fingerprint in the request is matched against the trusted fingerprints stored in the database to judge the credibility of the authentication; the device information device_fp of the user's real-time login is taken as input and stored in the historical fingerprint database Login_DB;
S62, extracting the cluster centers and the trusted threshold corresponding to the user: data preprocessing and feature extraction are performed on the newly logged-in device information device_fp to obtain the input vector X of the adaptive clustering algorithm, and the cluster centers C and the trusted threshold Z corresponding to the user are read from the trusted fingerprint database Cred_DB;
S63, calculating the Hamming distance dist(X, C) between the device information X and each cluster center, and judging whether all of these Hamming distances are larger than the trusted threshold; a device fingerprint whose distance to every cluster center exceeds the trusted distance threshold is regarded as untrusted, and otherwise as trusted;
S64, if so, that is, if dist(X, C) > Z holds for every cluster center, judging that the device corresponding to the device record information is not trusted and outputting 0;
S65, if not, judging that the device corresponding to the device record information is a trusted device, assigning X to the class with the smallest distance, updating the cluster centers of the user, and outputting 1.
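The decision in steps S63 to S65 can be sketched as follows. The function name `identify_device` and the returned nearest-cluster index are hypothetical additions; the subsequent update of the cluster centers is left to the caller and is not shown.

```python
def hamming(x, c):
    """Number of positions where two equal-length feature vectors differ."""
    return sum(1 for a, b in zip(x, c) if a != b)


def identify_device(x, centers, z):
    """Return (1, index of nearest cluster) for a trusted device,
    or (0, None) when dist(x, c) > z holds for every cluster center."""
    distances = [hamming(x, c) for c in centers]
    if all(d > z for d in distances):
        return 0, None
    nearest = min(range(len(centers)), key=lambda j: distances[j])
    return 1, nearest


# Hypothetical cluster centers and trusted threshold for one user.
cred_centers = [(1, 2, 3, 4), (9, 8, 7, 6)]
trusted = identify_device((1, 2, 3, 9), cred_centers, z=1)
untrusted = identify_device((5, 5, 5, 5), cred_centers, z=1)
```

The first fingerprint differs from the first center in a single feature and is accepted into that cluster; the second differs from both centers in every feature and is rejected, which in the described method would trigger the authentication interface.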
Fig. 8 is a schematic diagram of the adaptive device identification system based on device fingerprint identification according to the present invention. As shown in fig. 8, the adaptive device identification system 1 based on device fingerprint identification includes: a login information acquisition module 11, a feature extraction module 12, a vector acquisition module 13, a K value calculation module 14, a cluster center determination module 15, and a device authentication module 16. The login information acquisition module 11 is configured to acquire user login information, collect in real time the user's fingerprint login data and the user device data of the mobile terminal and the browser terminal, and store both as device record information; data analysis is first performed on the historical login information of the user's devices, so that both historical data and device login data collected in real time are obtained. The feature extraction module 12 is configured to screen the fingerprint login data and extract login feature information as sample feature information, and is connected with the login information acquisition module 11. The vector acquisition module 13 is configured to digitize the sample feature information to obtain hash feature values, normalize the hash feature values to obtain sample feature values, and convert the sample feature values into a multidimensional feature vector; information that reflects the characteristics of the terminal device is extracted as sample features, the extracted features are preprocessed, text-type feature data are converted into numerical values, and the values are vectorized so that they are suitable for a clustering algorithm; the vector acquisition module 13 is connected with the feature extraction module 12. The K value calculation module 14 is configured to take the multidimensional feature vectors of the samples as input, determine the K value of the clustering algorithm according to a preset similarity measurement function, and select a clustering algorithm suited to the characteristics of the user's device fingerprint data to partition the data; the K value calculation module 14 is connected with the vector acquisition module 13. The cluster center determination module 15 is configured to determine and store cluster centers according to the clustering algorithm, storing the cluster centers of the resulting clusters in a trusted fingerprint database for use in identifying newly logged-in devices; an empirical threshold serves as the reference value of the trusted distance, data analysis is performed on the historical login information of the user's devices, and the device data of each user are trained into a corresponding adaptive clustering model; the cluster center determination module 15 is connected with the K value calculation module 14. The device authentication module 16 is configured to compare the Hamming distance between the device information and the cluster centers with a trusted threshold and identify the newly logged-in device according to the comparison result; if the device is identified as a legitimate device, such as a portable computer or a mobile phone, the login access operation is allowed to continue, and if it is identified as illegitimate, an authentication interface is forcibly popped up to block the device login; the device authentication module 16 is connected with the cluster center determination module 15.
Fig. 9 is a schematic diagram of the login information acquisition module of the present invention. As shown in fig. 9, the login information acquisition module 11 includes: a fingerprint device information module 111, a fingerprint identifier acquisition module 112, an original sample module 113, a character string module 114, a sample feature analysis module 115, a device information extraction module 116, and an original data set module 117. The fingerprint device information module 111 is configured to acquire the fingerprint data of the current user and collect the device record information corresponding to the fingerprint data; when a device logs in, the data of the mobile terminal and the browser terminal are collected through the device fingerprint data collection module. The fingerprint identifier acquisition module 112 is configured to acquire identifiers according to the fingerprint data and store the <key, value> pairs of all identifiers in a HashMap object; after acquisition is complete, the content of the HashMap object is serialized into a JSON-format string, the information that reflects the characteristics of the terminal device is parsed from that string as sample features, and the data are uploaded to a server by HTTP POST; the fingerprint identifier acquisition module 112 is connected with the fingerprint device information module 111. The original sample module 113 is configured to acquire original samples according to the identifiers, obtaining all device records of the user from the historical fingerprint database according to the user ID of the login device as the original training data set; the original sample module 113 is connected with the fingerprint identifier acquisition module 112. The character string module 114 is configured to store the identifiers and serialize them into an identifier string; when the user sends a device login request, the newly logged-in device data are collected according to the above steps and stored in the historical fingerprint database; the character string module 114 is connected with the fingerprint identifier acquisition module 112. The sample feature analysis module 115 is configured to parse the sample features from the identifier string and store them; because a device fingerprint contains many implicit identifiers, the collected implicit identifiers need to be screened, that is, feature selection is performed. An effective feature selection algorithm not only removes irrelevant or redundant features but also reduces the computational complexity and improves the recognition efficiency; the sample feature analysis module 115 is connected with the original sample module 113. The device information extraction module 116 is configured to acquire device record information according to the fingerprint data of the user, the browser terminal acquiring device hardware information through a browser plug-in; the device information extraction module 116 is connected with the fingerprint device information module 111. The original data set module 117 is configured to aggregate, according to the sample features, all device record information of the user into an original training data set; a complete HTTP response header is obtained as an original sample by sending an HTTP GET request, and when the user sends a device login request, the newly logged-in device data are collected according to the above steps and stored in the historical fingerprint database. Finally, all device records of the user are obtained from the historical fingerprint database according to the user ID of the login device and used as the original training data set; the original data set module 117 is connected with the sample feature analysis module 115.
Fig. 10 is a schematic diagram of the feature extraction module of the present invention. As shown in fig. 10, the feature extraction module 12 includes: an implicit identifier acquisition module 121, a relevant identifier acquisition module 122, and a sample feature acquisition module 123. The implicit identifier acquisition module 121 is configured to acquire the implicit identifiers in the identifier string; the mobile terminal acquires all explicit and implicit identifiers by calling system APIs and executing Linux shell commands, and because a device fingerprint contains many implicit identifiers, the collected implicit identifiers need to be screened, that is, feature selection is performed. An effective feature selection algorithm not only removes irrelevant or redundant features but also reduces the computational complexity and improves the recognition efficiency. The relevant identifier acquisition module 122 is configured to screen the implicit identifiers according to the information gain principle to obtain relevant identifiers; implicit identifiers with high information entropy and few changes are considered helpful for identifying devices; the relevant identifier acquisition module 122 is connected with the implicit identifier acquisition module 121. The sample feature acquisition module 123 is configured to select the sample feature information according to the relevant identifiers; the feature selection algorithm adopted by the invention is based mainly on the information gain principle combined with statistical methods, and performs feature selection on the original data set and on the fingerprint of the newly logged-in device; the basic principle of feature selection is to select "relevant features" without losing important features, to remove "useless features", and to reduce "redundant features"; the sample feature acquisition module 123 is connected with the relevant identifier acquisition module 122.
Fig. 11 is a schematic diagram of the vector acquisition module of the present invention. As shown in fig. 11, the vector acquisition module 13 includes: a hash feature value module 131, a feature value normalization module 132, and a vector conversion module 133. The hash feature value module 131 is configured to map the string attribute values in the sample feature information into an integer interval of a specific number of bits by a hash method to obtain hash feature values; the data preprocessing module mainly digitizes and normalizes the extracted sample features. Because no "order" relation exists among device fingerprint attribute values, the attribute values must be quantized into discrete values in one-to-one correspondence and cannot be quantized into continuous values based on a space model. The feature value normalization module 132 is configured to normalize the hash feature values into attribute data within a preset feature interval; the purpose of normalization is to limit each attribute value to the interval [0, 1], which improves both the execution efficiency and the precision of the algorithm; the feature value normalization module 132 is connected with the hash feature value module 131. The vector conversion module 133 is configured to convert the sample feature information into a multidimensional feature vector according to the attribute data and to store the preprocessed sample data in a database as the input of the adaptive clustering module; the vector conversion module 133 is connected with the feature value normalization module 132.
Fig. 12 is a schematic diagram of the K value calculation module of the present invention. As shown in fig. 12, the K value calculation module 14 includes: a device record information extraction module 141, an undirected graph construction module 142, a data input module 143, a cluster center initialization module 144, a distance judging module 145, a K value accumulation module 146, a loop traversal module 147, a K value acquisition module 148, and a cluster center updating module 149. The device record information extraction module 141 is configured to acquire all device record information corresponding to the user according to the multidimensional feature information. In a clustering algorithm, the similarity calculation between two samples is very important, and the quality of the similarity measurement function determines the final clustering effect. Under the vector space model, the similarity between samples can be expressed by a distance between vectors, such as the Euclidean distance or the Mahalanobis distance. However, the values of device fingerprint attributes are discrete and have no "order" relation among them, so these distances cannot accurately reflect the similarity of device fingerprints, whereas the Hamming distance can measure the distance between discrete values. The invention therefore selects the Hamming distance function as the similarity measurement function. The Hamming distance between two equal-length character strings s1 and s2 is defined as the minimum number of substitutions needed to change one string into the other. Borrowing this idea for the device fingerprint application, the Hamming distance between a feature vector xi and a cluster center cj is defined as the number of features of xi that do not match the corresponding features of cj. The undirected graph construction module 142 is configured to construct a weighted undirected graph of the user, with each piece of device record information as a vertex, the connection between records as an edge, and the Hamming distance as the edge weight; solving for the cluster center then becomes the center point problem of this weighted undirected graph. To simplify the calculation of the center point, the record in each cluster with the smallest sum of Hamming distances to the other records is taken as the cluster center; the undirected graph construction module 142 is connected with the device record information extraction module 141. The data input module 143 is configured to input the multidimensional feature vectors and a preset threshold; the input of the adaptive clustering algorithm is the preprocessed historical login data read from the database. First, the empirical threshold Z is taken as the reference value, and the N pieces of historical login data of a user whose explicit identifier is empty are denoted Set0; the data input module 143 is connected with the device record information extraction module 141. The cluster center initialization module 144 is configured to randomly set any piece of device record information in the weighted undirected graph as the initial cluster center, thereby initializing the cluster center that is updated continuously during the subsequent traversal of the weighted undirected graph; the cluster center initialization module 144 is connected with the undirected graph construction module 142. The distance judging module 145 is configured to traverse the device record information and judge whether the Hamming distance between the initial cluster center and each non-center piece of device information is smaller than the preset threshold; the device data of each user are then re-clustered with the adaptive clustering algorithm, new cluster centers being determined until convergence, and the trusted threshold is calculated. Finally, the cluster centers and trusted thresholds of the resulting clusters are stored in the trusted fingerprint database for identifying newly logged-in devices; the distance judging module 145 is connected with the cluster center initialization module 144. The K value accumulation module 146 is configured to, if yes, put the current device record information into the close-range set S1 and add 1 to the cluster center count, that is, add 1 to the variable representing the K value; the K value accumulation module 146 is connected with the distance judging module 145. The loop traversal module 147 is configured to, if not, put the current device record information Ri back into the original set S2 to be traversed again; the loop traversal module 147 is connected with the distance judging module 145. The K value acquisition module 148 is configured to obtain the K value from the cluster center count, the K value being the number K of devices of the user; the K value acquisition module 148 is connected with the K value accumulation module 146. The cluster center updating module 149 is configured to re-cluster the device data of each user with the adaptive clustering algorithm, determine new cluster centers until convergence, and calculate the trusted threshold; both the adaptive clustering K-center algorithm and the device fingerprint identification judgment algorithm of the present invention operate on the data of a single user, and all users are processed in the same way; the cluster center updating module 149 is connected with the K value acquisition module 148.
Referring to fig. 13, which is a schematic diagram of the cluster center determining module of the present invention, the cluster center determining module 15 includes: a cluster initialization module 151, an undirected graph module 152, an object clustering module 153, and a cluster center module 154. The cluster initialization module 151 is configured to obtain the K value and initialize the cluster centers: assuming that the N historical login records of a user whose explicit identifier is empty fall into K classes, the K value is determined by the K value determining algorithm and the cluster centers are initialized. The undirected graph module 152 is configured to construct the weighted undirected graph of the user, with each piece of device record information as a vertex, the lines connecting the vertices as edges, and the Hamming distance as the edge weight; the undirected graph module 152 is connected to the cluster initialization module 151. The object clustering module 153 is configured to cluster the objects in the dataset according to the Hamming distance between each sample point and each center point: the Hamming distances between each sample point Xi and the K center points are calculated, Xi is assigned to the nearest cluster, and the cluster label of Xi is recorded as Ci; if any recorded cluster label changes, the flag changed=1 is set, and otherwise the flag changed=0 is set; the object clustering module 153 is connected to the undirected graph module 152. The cluster center module 154 is configured to calculate the Hamming distance between every two sample points in each cluster and take the sample point with the smallest sum of distances to the other records as the new cluster center; the cluster center module 154 is connected to the object clustering module 153.
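The assignment and center update steps of modules 153 and 154 resemble a K-medoids style iteration, which can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function names, the tuple encoding of samples, and the `max_iter` safeguard are hypothetical, ties are broken by taking the first candidate, and the calculation of the trusted threshold is left out because this passage does not specify it.

```python
from typing import List, Sequence, Tuple

Vector = Tuple[int, ...]


def hamming(a: Sequence, b: Sequence) -> int:
    """Number of positions at which two equal-length feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def assign(samples: List[Vector], centers: List[Vector]) -> List[int]:
    """Label each sample Xi with the index Ci of its nearest cluster center."""
    return [min(range(len(centers)), key=lambda j: hamming(x, centers[j]))
            for x in samples]


def update_centers(samples: List[Vector], labels: List[int],
                   centers: List[Vector]) -> List[Vector]:
    """Pick, per cluster, the member with the smallest sum of distances."""
    new_centers = []
    for j, old in enumerate(centers):
        members = [x for x, c in zip(samples, labels) if c == j]
        if not members:            # keep the old center for an empty cluster
            new_centers.append(old)
            continue
        new_centers.append(
            min(members, key=lambda m: sum(hamming(m, o) for o in members)))
    return new_centers


def k_center_cluster(samples: List[Vector], centers: List[Vector],
                     max_iter: int = 100) -> Tuple[List[Vector], List[int]]:
    """Iterate assignment and center update until no label changes."""
    centers = list(centers)
    labels: List[int] = []
    for _ in range(max_iter):
        new_labels = assign(samples, centers)
        changed = new_labels != labels   # corresponds to the flag "changed"
        labels = new_labels
        if not changed:
            break
        centers = update_centers(samples, labels, centers)
    return centers, labels


samples = [(1, 2, 3, 4), (1, 2, 3, 5), (1, 2, 0, 4), (9, 8, 7, 6), (9, 8, 7, 0)]
centers, labels = k_center_cluster(samples, [(1, 2, 3, 5), (9, 8, 7, 0)])
print(labels)  # [0, 0, 0, 1, 1]
```

Because each center is always an actual record rather than a mean, the center remains a valid device fingerprint, which matches the motivation given in the summary for avoiding mean-based centers.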
Referring to fig. 14, which is a schematic diagram of the device authentication module of the present invention, the device authentication module 16 includes: a device information extraction module 161, a user information extraction module 162, a device judging module 163, an illegal device judging module 164, and a legal device judging module 165. The device information extraction module 161 is configured to perform data preprocessing and feature extraction on the device record information: when a user sends a login request, the device fingerprint in the request is matched against the trusted fingerprints stored in the database to determine the reliability of the authentication; the device information device_fp of the user's real-time login is input and stored in the historical fingerprint database logic_db. The user information extraction module 162 is configured to extract the cluster centers and the trusted threshold corresponding to the user: data preprocessing and feature extraction are performed on the newly logged-in device information device_fp to obtain the input vector X of the adaptive clustering algorithm, and the cluster centers C and the trusted threshold Z corresponding to the user are read from the trusted fingerprint database cred_db. The device judging module 163 is configured to calculate the Hamming distance dist(X, C) between the device information X and each cluster center and judge whether all of these Hamming distances are greater than the trusted threshold; a device fingerprint whose distance to at least one cluster center does not exceed the trusted threshold is considered trusted, and otherwise it is considered untrusted; the device judging module 163 is connected to the device information extraction module 161. The illegal device judging module 164 is configured to judge that the device corresponding to the device record information is not trusted when all Hamming distances of the device record information are greater than the trusted threshold, that is, if dist(X, C) > Z holds for every cluster center, the device is untrusted and 0 is output; the illegal device judging module 164 is connected to the device judging module 163. The legal device judging module 165 is configured to judge, otherwise, that the device corresponding to the device record information is a trusted device: X is assigned to the class with the smallest distance value, the cluster center of the user is updated, and 1 is output; the legal device judging module 165 is connected to the device judging module 163.
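The decision rule of modules 163 to 165 can be sketched in Python as follows. This is an illustrative sketch only: the function name `authenticate` and its return convention (the output flag plus the index of the nearest cluster) are assumptions, and the subsequent update of the user's cluster center is omitted.

```python
from typing import Optional, Sequence, Tuple


def hamming(a: Sequence, b: Sequence) -> int:
    """Number of positions at which two equal-length feature vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def authenticate(x: Sequence, centers: Sequence[Sequence],
                 z: int) -> Tuple[int, Optional[int]]:
    """Return (1, nearest cluster index) if trusted, else (0, None)."""
    if not centers:
        raise ValueError("at least one cluster center is required")
    distances = [hamming(x, c) for c in centers]
    nearest = min(range(len(centers)), key=distances.__getitem__)
    # If even the nearest center is farther than Z, then dist(X, C) > Z holds
    # for every cluster center and the device is untrusted.
    if distances[nearest] > z:
        return 0, None
    # Otherwise X belongs to the class with the smallest distance value.
    return 1, nearest


centers = [(1, 2, 3, 4), (9, 8, 7, 6)]
print(authenticate((1, 2, 3, 9), centers, z=1))  # (1, 0)
print(authenticate((5, 5, 5, 5), centers, z=1))  # (0, None)
```

Checking only the nearest center is equivalent to checking all centers, since the minimum distance exceeds Z exactly when every distance does.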
In summary, the invention provides a self-adaptive device identification method and system based on fingerprint identification. For the case in which the explicit identifier is missing, it proposes an adaptive clustering K-center algorithm, so that different devices of the same user can be effectively distinguished even when the explicit identifier is missing, duplicated, or forged. Because the feature values of a device fingerprint are unordered, the clustering algorithm uses the Hamming distance, instead of coordinate-based distance measures such as the common Euclidean distance and the Mahalanobis distance, to calculate the similarity between samples. The invention departs from traditional cluster center determination schemes such as mean calculation and selects the most representative record in the device fingerprint login data as the cluster center, which preserves the original features of the data and fits the practical application scenario of device fingerprint identification. An empirical threshold is taken as the reference value of the trusted distance, data analysis is performed on the historical login information of the user's devices, and a corresponding adaptive clustering model is trained for the device data of each user. New login data are then identified to judge whether they come from a trusted device. First, historical data and device login data collected in real time are obtained, and information that reflects the characteristics of the terminal device is extracted as sample features; the extracted sample features are preprocessed, text-type feature data are converted into numerical values, and the feature values are vectorized so that they are suitable for a clustering algorithm.
The multidimensional feature vector of each sample is then used as input, a clustering algorithm selected according to the characteristics of the user's device fingerprint data divides the data, and the cluster centers of the resulting clusters are stored in the trusted fingerprint database for identifying new login devices. This solves the technical problems of low identification accuracy, low identification security, and low reliability in the conventional technology, and the invention has high commercial value and practicability.
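The preprocessing step mentioned above (text-type features converted to numbers and then vectorized) can be sketched in Python as follows. The sketch rests on several assumptions that are not in the original text: SHA-256 is used as the hash method, the integer interval is 32 bits wide, the preset feature interval is [0, 1], and the field names in `FIELDS` are hypothetical examples of fingerprint attributes.

```python
import hashlib
from typing import Dict, List, Sequence

# Hypothetical fingerprint attributes; the real feature set is chosen by the
# screening step described in the claims.
FIELDS = ("user_agent", "screen", "timezone", "language")


def hash_feature(value: str, bits: int = 32) -> int:
    """Map a string attribute value into the integer interval [0, 2**bits)."""
    if not 1 <= bits <= 64:
        raise ValueError("bits must be between 1 and 64")
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % (1 << bits)


def normalize(h: int, bits: int = 32,
              low: float = 0.0, high: float = 1.0) -> float:
    """Scale a hash value linearly into the preset interval [low, high]."""
    return low + (high - low) * h / ((1 << bits) - 1)


def vectorize(record: Dict[str, str], fields: Sequence[str] = FIELDS,
              bits: int = 32) -> List[float]:
    """Turn one device record into a multidimensional feature vector."""
    # Missing attributes are hashed as the empty string.
    return [normalize(hash_feature(record.get(f, ""), bits), bits)
            for f in fields]


record = {"user_agent": "Mozilla/5.0", "screen": "1920x1080",
          "timezone": "UTC+8", "language": "zh-CN"}
vector = vectorize(record)
print(len(vector))  # 4
```

Hashed values carry no meaningful order, so the magnitude of the difference between two components says nothing about similarity; only equality or inequality per component is informative, which is consistent with the choice of the Hamming distance described above.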

Claims (10)

1. An adaptive device identification method based on device fingerprint identification is characterized by comprising the following steps:
acquiring user login information, acquiring fingerprint login data of a user and user equipment data of a mobile terminal and a browser terminal in real time, and storing the fingerprint login data and the user equipment data as equipment record information;
screening the fingerprint login data, and extracting login feature information as sample feature information; digitizing the sample characteristic information to obtain a hash characteristic value, normalizing the hash characteristic value to be a sample characteristic value, and converting the sample characteristic value into a multidimensional characteristic vector; the step of digitizing the sample characteristic information to obtain a hash characteristic value, the step of normalizing the hash characteristic value into attribute data and converting the sample characteristic value into a multidimensional characteristic vector comprises the following steps: mapping the character string attribute value in the sample characteristic information into an integer interval of a specific bit number by adopting a hash method to obtain a hash characteristic value; normalizing the hash characteristic value to be the attribute data in a preset characteristic interval; converting the sample characteristic information into the multidimensional characteristic vector according to the attribute data;
taking the multidimensional feature vector of the sample as input, and determining the K value of a clustering algorithm according to a preset similarity measurement function; the step of determining the K value of the clustering algorithm by taking the multidimensional feature vector of the sample as input according to a preset similarity measurement function comprises the following steps: acquiring all the equipment record information corresponding to the user according to the multidimensional characteristic information; constructing a weighted undirected graph of the user by taking each piece of equipment record information as a vertex, the connecting lines between vertices as edges, and the Hamming distance as the edge weight; inputting the multidimensional feature vector and a preset threshold; randomly setting any one piece of the equipment record information in the weighted undirected graph as an initial cluster center; traversing the equipment record information, and judging whether the Hamming distance between the initial cluster center and the non-center equipment information is smaller than the preset threshold; if yes, putting the current equipment record information into a close-range set, and increasing the cluster center count by 1; if not, putting the current equipment record information back into the original set for traversal; obtaining the K value according to the cluster center count; re-clustering the equipment data of each user by using a self-adaptive clustering algorithm, determining a new cluster center until convergence, and calculating a credible threshold;
determining a cluster center according to the clustering algorithm and storing the cluster center;
and comparing the Hamming distance between the equipment information and the cluster center with a trusted threshold, and identifying the new login equipment according to the comparison result of the Hamming distance and the trusted threshold.
2. The method according to claim 1, wherein the acquiring the user login information, collecting the fingerprint login data of the user and the user device data of the mobile terminal and the browser terminal in real time and storing the fingerprint login data and the user device data as device record information includes:
acquiring fingerprint data of a current user, and acquiring the equipment record information corresponding to the fingerprint data;
acquiring an identifier according to the fingerprint data;
acquiring an original sample according to the identifier;
storing the identifier, and serializing the identifier into an identifier character string;
analyzing sample characteristics according to the identifier character string, and storing the sample characteristics;
acquiring the equipment record information according to the fingerprint data of the user;
and according to the sample characteristics, summarizing all the equipment record information of the user into an original training data set.
3. The method of claim 2, wherein said screening the fingerprint login data to extract device login feature information as sample feature information comprises:
acquiring implicit identifiers in the identifier character string;
screening the implicit identifiers according to an information gain principle to obtain related identifiers;
and selecting sample characteristic information according to the related identifiers.
4. The method of claim 1, wherein said determining cluster centers and saving according to said clustering algorithm comprises:
acquiring the K value and initializing a cluster center;
constructing a weighted undirected graph of the user by taking each piece of equipment record information as a vertex, the connecting lines between vertices as edges, and the Hamming distance as the edge weight;
clustering objects in the dataset according to the Hamming distance between each sample point and each center point;
and calculating the Hamming distance between every two sample points in each cluster, and taking the sample point with the smallest sum of distances to the other records as a new cluster center.
5. The method of claim 4, wherein said comparing the hamming distance of the device information to the cluster center to a trusted threshold, and identifying a new login device based on the comparison of the hamming distance to the trusted threshold, comprises:
carrying out data preprocessing and feature extraction on the equipment record information;
extracting the cluster center and the credible threshold corresponding to the user;
calculating the Hamming distance between the equipment record information and each cluster center, and judging whether all the Hamming distances of the equipment record information are larger than the credible threshold;
if yes, judging that the equipment corresponding to the equipment record information is not trusted;
if not, judging that the equipment corresponding to the equipment record information is trusted equipment, and updating the cluster center of the user.
6. An adaptive device identification system based on device fingerprint identification, comprising: a login information acquisition module, a feature extraction module, a vector acquisition module, a K value calculation module, a cluster center determination module and an equipment authentication module;
the login information acquisition module is used for acquiring user login information, acquiring fingerprint login data of the user and user equipment data of the mobile terminal and the browser terminal in real time and storing the fingerprint login data and the user equipment data as equipment record information;
the characteristic extraction module is used for screening the fingerprint login data and extracting equipment login characteristic information as sample characteristic information;
the vector acquisition module is used for digitizing the sample characteristic information to obtain a hash characteristic value, normalizing the hash characteristic value to be a sample characteristic value and converting the sample characteristic value into a multidimensional characteristic vector;
the K value calculation module is used for taking the multidimensional feature vector of the sample as input and determining the K value of a clustering algorithm according to a preset similarity measurement function;
the cluster center determining module is used for determining a cluster center according to the clustering algorithm and storing the cluster center;
the device authentication module is used for comparing the Hamming distance between the device information and the cluster center with a trusted threshold value, and identifying new login devices according to the comparison result of the Hamming distance and the trusted threshold value;
the vector acquisition module includes: a hash characteristic value module, a characteristic value normalization module and a vector conversion module; the hash characteristic value module is used for mapping the character string attribute value in the sample characteristic information into an integer interval of a specific bit number by adopting a hash method to obtain a hash characteristic value;
the characteristic value normalization module is used for normalizing the hash characteristic value into attribute data in a preset characteristic interval; the vector conversion module is used for converting the sample characteristic information into the multidimensional characteristic vector according to the attribute data;
the K value calculation module comprises: a device record information extraction module, an undirected graph construction module, a data input module, a cluster center initialization module, a distance judgment module, a K value accumulation module, a cycle traversal module, a K value acquisition module and a cluster center update module;
the equipment record information extraction module is used for acquiring all the equipment record information corresponding to the user according to the multidimensional characteristic information;
the undirected graph construction module is used for constructing a weighted undirected graph of the user by taking each piece of equipment record information as a vertex, the connecting lines between vertices as edges, and the Hamming distance as the edge weight;
the data input module is used for inputting the multidimensional feature vector and a preset threshold value;
the cluster center initializing module is used for randomly setting any one of the equipment record information in the weighted undirected graph as an initial cluster center;
the distance judging module is used for traversing the equipment record information and judging whether the Hamming distance between the initial cluster center and the non-center equipment information is smaller than the preset threshold value or not;
the K value accumulation module is used for putting the current equipment record information into a close-range set when the Hamming distance between the initial cluster center and the non-center equipment information is smaller than the preset threshold value, and adding 1 to the cluster center count;
the circulation traversing module is used for putting the current equipment record information back to the original set for traversing when the Hamming distance between the initial cluster center and the non-center equipment information is not smaller than the preset threshold value;
the K value acquisition module is used for obtaining the K value according to the cluster center count;
and the cluster center updating module is used for re-clustering the equipment data of each user by using a self-adaptive clustering algorithm, determining a new cluster center until convergence, and calculating a credibility threshold.
7. The system of claim 6, wherein the login information acquisition module comprises: a fingerprint device information module, a fingerprint identifier acquisition module, an original sample module, a character string module, a sample characteristic analysis module, a device information extraction module and an original data set module;
the fingerprint equipment information module is used for acquiring fingerprint data of a current user and collecting the equipment record information corresponding to the fingerprint data;
the fingerprint identifier acquisition module is used for acquiring an identifier according to the fingerprint data;
the original sample module is used for acquiring an original sample according to the identifier;
the character string module is used for storing the identifier and serializing the identifier into an identifier character string;
the sample characteristic analysis module is used for analyzing sample characteristics according to the identifier character string and storing the sample characteristics;
the device information extraction module is used for acquiring the device record information according to the fingerprint data of the user;
and the original data set module is used for summarizing all the equipment record information of the user into an original training data set according to the sample characteristics.
8. The system of claim 7, wherein the feature extraction module comprises: a hidden identifier acquisition module, a related identifier acquisition module and a sample characteristic acquisition module;
the hidden identifier acquisition module is used for acquiring the hidden identifier in the identifier character string;
the related identifier acquisition module is used for screening the hidden identifier according to the information gain principle to obtain a related identifier;
the sample characteristic acquisition module is used for selecting sample characteristic information according to the related identifier.
9. The system of claim 6, wherein the cluster center determination module comprises: a clustering initial module, an undirected graph module, an object clustering module and a cluster center module;
the cluster initial module is used for acquiring the K value and initializing a cluster center;
the undirected graph module is used for constructing a weighted undirected graph of the user by taking each piece of equipment record information as a vertex, the connecting lines between vertices as edges, and the Hamming distance as the edge weight;
the object clustering module is used for clustering objects in the data set according to the Hamming distance between each sample point and each center point;
and the cluster center module is used for calculating the Hamming distance between every two sample points in each cluster, and taking the sample point with the smallest sum of distances to the other records as a new cluster center.
10. The system of claim 6 or 9, wherein the device authentication module comprises: a device information extraction module, a user information extraction module, a device judgment module, an illegal device judgment module and a legal device judgment module;
the device information extraction module is used for carrying out data preprocessing and feature extraction on the device record information;
the user information extraction module is used for extracting the cluster center and the credible threshold value corresponding to the user;
the device judging module is used for calculating the Hamming distance between the device record information and each cluster center and judging whether all the Hamming distances of the device record information are larger than the credible threshold value;
the illegal device judging module is used for judging that the device corresponding to the device record information is not trusted when all the Hamming distances of the device record information are larger than the trusted threshold;
and the legal equipment judging module is used for judging that the equipment corresponding to the equipment record information is trusted equipment and updating the cluster center of the user when the Hamming distances of the equipment record information are not all larger than the trusted threshold.
CN201710548621.7A 2017-07-06 2017-07-06 Self-adaptive equipment identification method and system based on fingerprint identification Active CN107392121B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710548621.7A CN107392121B (en) 2017-07-06 2017-07-06 Self-adaptive equipment identification method and system based on fingerprint identification

Publications (2)

Publication Number Publication Date
CN107392121A CN107392121A (en) 2017-11-24
CN107392121B true CN107392121B (en) 2023-05-09

Family

ID=60335631

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710548621.7A Active CN107392121B (en) 2017-07-06 2017-07-06 Self-adaptive equipment identification method and system based on fingerprint identification

Country Status (1)

Country Link
CN (1) CN107392121B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101674184A (en) * 2009-10-19 2010-03-17 北京微通新成网络科技有限公司 Identity recognition method based on user keystroke characteristic
CN105279405A (en) * 2015-10-28 2016-01-27 同济大学 Keypress behavior pattern construction and analysis system of touch screen user and identity recognition method thereof
CN106446148A (en) * 2016-09-21 2017-02-22 中国运载火箭技术研究院 Cluster-based text duplicate checking method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9135320B2 (en) * 2012-06-13 2015-09-15 Opera Solutions, Llc System and method for data anonymization using hierarchical data clustering and perturbation
CN104602183A (en) * 2014-04-22 2015-05-06 腾讯科技(深圳)有限公司 Group positioning method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
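Optimizing process-model-oriented semantic Web service discovery using service clustering; Sun Ping et al.; Chinese Journal of Computers (No. 08); 52-65 *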

Also Published As

Publication number Publication date
CN107392121A (en) 2017-11-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant