CN109376277B - Method and device for determining equipment fingerprint homology - Google Patents

Publication number
CN109376277B
Authority
CN
China
Prior art keywords
hash value
similarity
similarity hash
fingerprint
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811406605.5A
Other languages
Chinese (zh)
Other versions
CN109376277A (en)
Inventor
陈海龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JD Digital Technology Holdings Co Ltd
Jingdong Technology Holding Co Ltd
Original Assignee
JD Digital Technology Holdings Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by JD Digital Technology Holdings Co Ltd
Priority to CN201811406605.5A
Publication of CN109376277A
Application granted
Publication of CN109376277B
Legal status: Active
Anticipated expiration

Landscapes

  • Collating Specific Patterns (AREA)
  • Storage Device Security (AREA)

Abstract

The invention discloses a method and a device for determining device fingerprint homology, and relates to the technical field of the internet. One embodiment of the method comprises: acquiring a first device fingerprint and a second device fingerprint; determining a first similarity hash value according to the first device fingerprint; determining a second similarity hash value according to the second device fingerprint; determining a similarity between the first similarity hash value and the second similarity hash value; and determining that the first device fingerprint and the second device fingerprint are homologous when the similarity is less than a preset threshold. In this embodiment, a similarity hash algorithm is used to calculate the similarity hash value of each device fingerprint, and whether different device fingerprints are homologous is determined from the similarity between their similarity hash values, so that device aggregation can be found in massive collected device fingerprint data and abnormal devices or abnormal user operation behaviors can be discovered.

Description

Method and device for determining equipment fingerprint homology
Technical Field
The invention relates to the technical field of the internet, and in particular to a method and a device for determining device fingerprint homology.
Background
With the continuous development of internet technology, more and more transactions are moving from offline to online. Because the identities of both parties to an internet transaction are concealed, fraud groups see an opportunity and carry out various fraudulent or attacking behaviors by analyzing the business and technical vulnerabilities of large internet companies. For example, the same user, or a large number of users, uses the same device to send a large number of requests to the server; this is referred to as "device aggregation".
Concentrated use of the same device occurs in behaviors such as batch registration, credential-stuffing login, and batch ordering. In theory, if each device could be assigned a unique device id, such aggregation anomalies could be discovered by device id. However, the characteristics of JS device fingerprints make it impossible to assign a unique id to every device, so device aggregation has to be discovered through similarity.
To counter such online fraud, internet companies establish risk-control measures to block black-market users and protect the funds and property of normal users. The core aim of risk control is to guard against the person behind an account, but the nature of the internet means that a user's identity cannot be uniquely determined from online behavior alone.
In JS device fingerprint technology (a device fingerprint is a device feature, or a unique device identifier, that can be used to uniquely identify a device), JavaScript code is embedded in a front-end page. When a user accesses the page with a browser, the JavaScript code collects various information about the user's device and reports it to a server once collection is complete. The server assigns a unique id to the current device according to the collected information, and when the user accesses the same page again, the server returns the same id according to the collected information. This method can guarantee the uniqueness of the generated device id, but often cannot guarantee its stability.
Since JavaScript is a front-end technology, the data collected by code deployed at the front end is easily tampered with. After tampered device metrics are reported to the server, the server may fail to find the device id corresponding to the current device information. This problem can be mitigated by adding a cache at the client and using some strongly unique metric to identify the device at the server, but the device id can no longer be tracked once the user clears the cache and modifies the metric that is strongly relied upon.
As black-market techniques continue to improve, a malicious user can easily modify metrics such as the IP (Internet Protocol) address and the user agent with various tools, thereby hiding their own device, so that the device fingerprint fails to identify the device.
In risk control, it is common to determine whether a user on a device is risky through device aggregation, because a risky user performs a large number of similar operations on the same device. If the device id changes, aggregation analysis cannot be carried out.
Disclosure of Invention
In view of this, embodiments of the present invention provide a method and an apparatus for determining device fingerprint homology, and a method and an apparatus for discovering abnormal devices, which can effectively find device aggregation in massive collected device fingerprint data, so as to discover abnormal devices or abnormal user operation behaviors and provide a basis for risk-control decisions.
To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method of determining device fingerprint homology, including: acquiring a first device fingerprint and a second device fingerprint; determining a first similarity hash value according to the first device fingerprint; determining a second similarity hash value according to the second device fingerprint; determining a similarity between the first similarity hash value and the second similarity hash value; determining that the first device fingerprint and the second device fingerprint are homologous when the similarity is less than a preset threshold.
Optionally, the determining the first similarity hash value according to the first device fingerprint includes:
performing word segmentation processing on the first device fingerprint to obtain a plurality of keywords; performing hash calculation on the plurality of keywords to obtain a plurality of keyword hash values; and determining the first similarity hash value according to the plurality of keyword hash values.
Optionally, the determining the first similarity hash value according to the plurality of keyword hash values includes: performing a weighted summation of the plurality of keyword hash values to determine the first similarity hash value; and the weight of each keyword hash value is determined according to the inverse text frequency index of the keyword.
Optionally, the first device fingerprint and the second device fingerprint respectively include at least one of: operating system name, operating system version, font list, and plug-in list of the corresponding device.
In order to achieve the above object, according to another aspect of the embodiments of the present invention, there is provided an abnormal device discovery method, including: determining the device fingerprint of the current user, calculating the similarity hash value of that device fingerprint, and taking it as the current similarity hash value data; calculating a plurality of similarity hash values of a plurality of device fingerprints in a device fingerprint library; querying the similarity hash value library and determining the number of similarity hash value data whose similarity to the current similarity hash value data is smaller than a preset threshold; and when the number is larger than a set value, determining that the device of the current user is an abnormal device.
Optionally, the method further includes: dividing the similarity hash value into a plurality of sub-parts; and storing each of the sub-parts as an index in association with the corresponding similarity hash value.
Optionally, the current similarity hash value data includes the current similarity hash value and a plurality of sub-portions corresponding to the current similarity hash value;
the querying of the similarity hash value library and determining of the number of similarity hash value data whose similarity to the current similarity hash value data is smaller than a preset threshold includes: taking any one of the sub-parts of the current similarity hash value as an index, querying the similarity hash value library for sub-parts identical to the index, and taking the similarity hash values corresponding to those sub-parts as candidate hash values; calculating the similarity between each candidate hash value and the current similarity hash value to obtain a plurality of similarities; and determining, from the plurality of similarities, the number of similarities smaller than the preset threshold.
To achieve the above object, according to still another aspect of embodiments of the present invention, there is provided an apparatus for determining device fingerprint homology, including: the information acquisition module is used for acquiring a first device fingerprint and a second device fingerprint; a first hash value determination module, configured to determine a first similarity hash value according to the first device fingerprint; the second hash value determining module is used for determining a second similarity hash value according to the second device fingerprint; a similarity calculation module for determining a similarity between the first similarity hash value and the second similarity hash value; and the homology determining module is used for determining that the first device fingerprint and the second device fingerprint are homologous when the similarity is smaller than a preset threshold value.
Optionally, the first hash value determining module includes: a word segmentation processing unit, configured to perform word segmentation on the first device fingerprint to obtain a plurality of keywords; a keyword hash value determining unit, configured to perform hash calculation on the plurality of keywords to obtain a plurality of keyword hash values; and a first hash value determining unit, configured to determine the first similarity hash value according to the plurality of keyword hash values.
Optionally, the determining the first similarity hash value according to the plurality of keyword hash values includes: performing a weighted summation of the plurality of keyword hash values to determine the first similarity hash value; and the weight of each keyword hash value is determined according to the inverse text frequency index of the keyword.
Optionally, the first device fingerprint and the second device fingerprint respectively include at least one of: operating system name, operating system version, font list, and plug-in list of the corresponding device.
To achieve the above object, according to another aspect of an embodiment of the present invention, there is provided an abnormal device discovery apparatus including: a current hash value determining module, configured to acquire the device fingerprint of the current user, calculate the similarity hash value of that device fingerprint, and take it as the current similarity hash value data; a computing module, configured to compute a plurality of similarity hash values for a plurality of device fingerprints in a device fingerprint library; a query determining module, configured to query the similarity hash value library and determine the number of similarity hash value data whose similarity to the current similarity hash value data is smaller than a preset threshold; and an abnormal device determining module, configured to determine that the device of the current user is an abnormal device when the number is larger than a set value.
Optionally, the computing module is further configured to: divide the similarity hash value into a plurality of sub-parts; and store each of the sub-parts as an index in association with the corresponding similarity hash value.
Optionally, the current similarity hash value data includes the current similarity hash value and a plurality of sub-portions corresponding to the current similarity hash value;
the query determining module is further configured to: take any one of the sub-parts of the current similarity hash value as an index, query the similarity hash value library for sub-parts identical to the index, and take the similarity hash values corresponding to those sub-parts as candidate hash values; calculate the similarity between each candidate hash value and the current similarity hash value to obtain a plurality of similarities; and determine, from the plurality of similarities, the number of similarities smaller than the preset threshold.
According to still another aspect of an embodiment of the present invention, there is provided an electronic apparatus including: one or more processors; storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out a method according to any one of the above-described embodiments of the invention.
According to a further aspect of embodiments of the present invention, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, performs the method of any of the above-described embodiments of the present invention.
The above embodiments have at least the following advantages or benefits: a similarity hash algorithm is used to calculate the similarity hash value of each device fingerprint, and whether different device fingerprints are homologous (i.e., whether they come from operations on the same device) is determined from the similarity (for example, the Hamming distance) between their similarity hash values, so that device aggregation can be effectively found in massive collected device fingerprint data, abnormal devices or abnormal user operation behaviors can be discovered, and a basis is provided for risk-control decisions.
Further effects of the above optional implementations are described below in connection with the specific embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a flow chart of one embodiment of a method of determining device fingerprint homology of the present invention;
FIG. 2 is a diagram illustrating an embodiment of computing a similarity hash value according to the present invention;
FIG. 3 is a flowchart of an embodiment of an abnormal device discovery method according to the present invention;
FIG. 4 is a diagram illustrating partitioning and storing similarity hash values in the present invention;
FIG. 5 is a diagram illustrating an embodiment of an apparatus for determining device fingerprint homology according to the present invention;
FIG. 6 is a diagram illustrating a first hash value determination module according to an embodiment of the present invention;
FIG. 7 is a diagram illustrating an abnormal device discovery apparatus according to an embodiment of the present invention;
FIG. 8 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;
FIG. 9 is a schematic structural diagram of a computer system suitable for implementing a terminal device or a server according to an embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The embodiment of the invention analyzes device similarity from the collected JS device fingerprint metrics; if a user performs a large number of similar operations on similar devices, device aggregation is revealed. To address the poor stability of JS device fingerprints, the embodiment of the invention provides a method for identifying risky devices using the similarity of the collected device metrics.
Fig. 1 is a flowchart of an embodiment of a method for determining device fingerprint homology according to the present invention, which can be applied to search from massive device fingerprint data to find device aggregations, and further identify risk devices.
As shown in fig. 1, the method includes:
Step S101: acquiring a first device fingerprint and a second device fingerprint.
For example, a device fingerprint (a device feature, or a unique device identifier, that can be used to uniquely identify a device) can be obtained using JS device fingerprint technology: JavaScript code is embedded in a front-end page; when a user accesses the page with a browser, the code collects various information about the user's device and reports it to a server once collection is complete.
Wherein the first device fingerprint and the second device fingerprint each comprise at least one of: operating system name, operating system version, font list, and plug-in list of the corresponding device.
In an alternative embodiment, the first device fingerprint and the second device fingerprint each include hundreds of device metrics for the respective device, including an operating system name, an operating system version, a font list, a plug-in list, and the like.
Step S102: determining a first similarity hash value according to the first device fingerprint;
illustratively, determining the first similarity hash value from the first device fingerprint comprises:
performing word segmentation processing on the first device fingerprint to obtain a plurality of keywords;
performing hash calculation on the plurality of keywords to obtain a plurality of keyword hash values;
and determining the first similarity hash value according to the plurality of keyword hash values.
Step S103: determining a second similarity hash value according to the second device fingerprint;
illustratively, determining the second similarity hash value from the second device fingerprint comprises: performing word segmentation processing on the second device fingerprint to obtain a plurality of keywords;
performing hash calculation on the plurality of keywords to obtain a plurality of keyword hash values;
and determining the second similarity hash value according to the plurality of keyword hash values.
Step S104: determining a similarity between the first similarity hash value and the second similarity hash value.
For example, the similarity between the first similarity hash value and the second similarity hash value may be represented by the Hamming distance between them. In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding characters differ; in other words, it is the number of characters that must be replaced to convert one string into the other. A smaller Hamming distance therefore indicates more similar hash values.
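As a minimal sketch of this measure, the Hamming distance between two similarity hash values held as integers can be computed by XOR-ing them and counting the set bits. The function name `hamming_distance` is a hypothetical helper introduced here for illustration; it is not named in the source.

```python
def hamming_distance(a: int, b: int) -> int:
    """Count the bit positions at which two equal-width hash values differ."""
    # XOR leaves a 1 exactly where the two values disagree.
    return bin(a ^ b).count("1")


# Two 8-bit values that differ in exactly two bit positions.
print(hamming_distance(0b10110100, 0b10010110))  # prints 2
```

Representing the hash as an integer rather than a character string keeps the comparison to a single XOR and a bit count.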
Step S105: determining that the first device fingerprint and the second device fingerprint are homologous when the similarity is less than a preset threshold. The preset threshold may be flexibly set according to an application scenario, and the present invention is not limited herein.
The method of the embodiment of the invention calculates the device fingerprints through the similarity hash algorithm to obtain the corresponding similarity hash values, and determines whether the different device fingerprints are homologous according to the similarity between the similarity hash values corresponding to the different device fingerprints, so that the method can be used for searching and finding the device aggregation from the acquired massive device fingerprint data to find abnormal devices or abnormal user operation behaviors.
In an alternative embodiment, the determining the first similarity hash value from the plurality of keyword hash values comprises: performing a weighted summation of the plurality of keyword hash values to determine the first similarity hash value; and the weight of each keyword hash value is determined according to the inverse text frequency index of the keyword.
In some embodiments of the present invention, a similarity hash (SimHash) algorithm is used to calculate the similarity between device fingerprints, thereby determining device aggregation and improving the stability of the device fingerprints. SimHash converts a device fingerprint into a 64-bit hash value that reflects the similarity of device fingerprints, so the similarity of two device fingerprints can be judged by calculating the Hamming distance between their hash values.
Fig. 2 is a schematic diagram of an embodiment of calculating a similarity hash value according to the present invention, which is divided into 5 steps, i.e., word segmentation, hash, weighting, merging, and dimension reduction. The following describes in detail an embodiment in the device fingerprint similarity algorithm.
Word segmentation: the collected device fingerprint metrics are split into keywords, each keyword being the value of one device fingerprint metric item. Because the metrics differ in form, different splitting schemes may be used for different metrics. For a simple metric such as the operating system name or operating system version, the value of the field can be used directly as a keyword. Some metrics are reported as lists, such as the font list and the plug-in list; each single value in the list is split out as a keyword. Other metrics are long and carry rich information; for example, the browser name and browser version can be parsed from the user agent. Such complex metrics account for a small share of the collected JS device metrics and require customized splitting.
All the metrics together constitute the set of device fingerprint keywords. Let the split keywords be:
$$f_1, f_2, \ldots, f_N$$
where N is the number of keywords; the original information is thus represented as a vector of keywords.
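The splitting rules for simple and list-valued metrics can be sketched as follows. This is an illustration under assumptions: the function name `split_keywords`, the metric names in the usage example, and the choice to prefix each keyword with its metric name (so that equal values from different metrics stay distinct) are not taken from the source. Customized parsing of complex metrics such as the user agent is left out.

```python
from typing import Any, Dict, List


def split_keywords(fingerprint: Dict[str, Any]) -> List[str]:
    """Split a device fingerprint (metric name -> value) into keywords."""
    keywords: List[str] = []
    for metric, value in fingerprint.items():
        if isinstance(value, (list, tuple)):
            # List-valued metric (fonts, plug-ins): one keyword per entry.
            keywords.extend(f"{metric}={item}" for item in value)
        else:
            # Simple metric (OS name, OS version): the value is the keyword.
            keywords.append(f"{metric}={value}")
    return keywords


fingerprint = {"os_name": "Windows", "os_version": "10", "fonts": ["Arial", "SimSun"]}
print(split_keywords(fingerprint))
```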
Hashing: after the keyword vector is obtained, each keyword is hashed; this is the second step of calculating the similarity hash value. Any hash algorithm can be used in this application, for example one of the five algorithms of the SHA (Secure Hash Algorithm) family, namely SHA-1, SHA-224, SHA-256, SHA-384, and SHA-512, or Murmur hash, which is used in the implementation of the present application. The selected hash algorithm converts each keyword in the keyword vector into a 64-bit hash value (another width of 2^n bits, such as 32 bits or 128 bits, may also be used). The conversion formula is:
$$\mathrm{Hash}(f_i) = b_{i1} b_{i2} \ldots b_{i64}, \quad i = 1, 2, \ldots, N$$
where each $b_{im}$ is 0 or 1 and m ranges from 1 to 64. The original keyword vector of length N is thus converted into a vector of N 64-bit hash values.
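A minimal sketch of the hashing step follows. Because Murmur hash is not in the Python standard library, this example substitutes SHA-256 truncated to its first 64 bits, which the text allows since any hash algorithm may be used. The helper names `hash64` and `bits` are hypothetical.

```python
import hashlib
from typing import List


def hash64(keyword: str) -> int:
    """Map a keyword to a 64-bit integer (first 8 bytes of its SHA-256 digest)."""
    digest = hashlib.sha256(keyword.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def bits(value: int, width: int = 64) -> List[int]:
    """Expand a hash value into its bits, most significant bit first."""
    return [(value >> (width - 1 - m)) & 1 for m in range(width)]


keyword_hash = hash64("fonts=Arial")
print(f"{keyword_hash:064b}")
```

Truncating a cryptographic digest is only one way to obtain 64 well-mixed bits; a dedicated 64-bit hash would serve equally well here.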
Weighting: different keywords in the collected device fingerprint information discriminate between devices to different degrees, so a different weight should be assigned to each keyword. Because a metric value is generally not repeated within the metrics collected for one device fingerprint, the inverse text frequency index (inverse document frequency, IDF) is used as the keyword weight. The IDF formula is:
$$\omega_i = \log \frac{D}{k_i}$$
where D is the total number of new devices that accessed the server over a period of time, and $k_i$ is the number of those D devices whose collected metrics contain the keyword $f_i$. The logarithm is base 10, or any integer base greater than 1. Calculating the weight of each keyword gives the weight vector:
$$\omega_1, \omega_2, \ldots, \omega_N$$
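The weight calculation can be sketched as below, assuming the common IDF form log(D / k_i), where D is the number of devices observed and k_i is the number of devices whose keywords include f_i. The function name `idf_weights` and the sample keywords are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, List


def idf_weights(device_keywords: Iterable[List[str]], base: float = 10.0) -> Dict[str, float]:
    """Compute an IDF weight per keyword from the keyword lists of D devices."""
    # A keyword counts once per device, however often it appears there.
    devices = [set(keywords) for keywords in device_keywords]
    total_devices = len(devices)
    device_counts = Counter(kw for keywords in devices for kw in keywords)
    return {kw: math.log(total_devices / k, base) for kw, k in device_counts.items()}


weights = idf_weights([
    ["os_name=Windows", "fonts=Arial"],
    ["os_name=Windows", "fonts=SimSun"],
])
print(weights)
```

A keyword present on every device receives weight 0 under this form, so it contributes nothing to telling devices apart.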
Let
$$g(b) = \begin{cases} 1, & b = 1 \\ -1, & b = 0 \end{cases}$$
Then, for the i-th keyword $f_i$, weighting converts its 64-bit hash value into a 64-dimensional vector:
$$h_i = \bigl(\omega_i\, g(b_{i1}),\ \omega_i\, g(b_{i2}),\ \ldots,\ \omega_i\, g(b_{i64})\bigr), \quad i = 1, 2, \ldots, N$$
Merging: applying the weighting operation to the N keywords yields a weighted vector for each keyword. The weighted hash vectors of all keywords are then added together to obtain the weighted hash vector sum of the device:
$$H = \sum_{i=1}^{N} h_i$$
Dimension reduction: the weighted hash vector obtained by weighting and merging the hash values of the N keywords is converted into a 64-bit hash value. For example, if the weighted hash vector is (10, -20, 100, 50, …, 1, 34, 23), dimension reduction gives (1, 0, 1, 1, …, 1, 1, 1) (1 for a component greater than 0, 0 for a component less than or equal to 0). Writing the merged vector as $H = (H_1, H_2, \ldots, H_{64})$, the similarity hash value of the device fingerprint is obtained by applying the following function to each component:
$$s(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases}$$
The expression of the final similarity hash value is:
$$\mathrm{SimHash} = s(H_1)\, s(H_2) \ldots s(H_{64})$$
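The hashing, weighting, merging, and dimension-reduction steps can be put together in one short sketch. It rests on several assumptions: SHA-256 truncated to 64 bits stands in for the hash function, each hash bit contributes +weight when it is 1 and -weight when it is 0, and keywords without a known weight default to 1.0. The names `hash64` and `simhash` are hypothetical.

```python
import hashlib
from typing import Dict, List


def hash64(keyword: str) -> int:
    """64-bit keyword hash (first 8 bytes of SHA-256, an assumed choice)."""
    return int.from_bytes(hashlib.sha256(keyword.encode("utf-8")).digest()[:8], "big")


def simhash(keywords: List[str], weights: Dict[str, float], width: int = 64) -> int:
    """Compute a similarity hash value from keywords and their weights."""
    totals = [0.0] * width
    for keyword in keywords:
        value = hash64(keyword)
        weight = weights.get(keyword, 1.0)  # default weight is an assumption
        for m in range(width):
            bit = (value >> (width - 1 - m)) & 1
            # Weighting and merging: add +weight for a 1 bit, -weight for a 0 bit.
            totals[m] += weight if bit else -weight
    result = 0
    for total in totals:
        # Dimension reduction: positive sums become 1, all others become 0.
        result = (result << 1) | (1 if total > 0 else 0)
    return result


print(f"{simhash(['os_name=Windows', 'fonts=Arial'], {'fonts=Arial': 2.0}):016x}")
```

With a single keyword of positive weight, the result equals that keyword's own hash, which is a convenient sanity check on the sign convention.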
Fig. 3 is a flowchart of an embodiment of an abnormal device discovery method according to the present invention. The method includes:
Step S201: acquiring the device fingerprint of the current user, calculating the similarity hash value of that device fingerprint, and taking it as the current similarity hash value data. The similarity hash value of the current user's device fingerprint can be calculated according to the method shown in fig. 1.
Step S202: device fingerprints of a plurality of devices are collected within a set time and stored as a device fingerprint library.
By setting the time limit, if abnormal behaviors such as too frequent accesses, registrations, orders, and the like from the same device exist in a short time, it can be determined that the device is an abnormal device and the corresponding current operation behavior on the device is an abnormal behavior.
The set time may be several seconds, several minutes, or several hours, and is not limited by the present invention.
Step S203: and calculating a plurality of similarity hash values of a plurality of device fingerprints in the device fingerprint database, and storing the similarity hash values as a similarity hash value database according to a preset format.
Specifically, each of the plurality of similarity hash values is processed as follows:
dividing the similarity hash value into a plurality of sub-parts;
and storing each of the sub-parts as an index in association with the corresponding similarity hash value.
In the similarity hash value library, the data obtained above may be stored in the form of a table, for example, as shown in fig. 4.
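A minimal in-memory sketch of this storage scheme follows, assuming 64-bit similarity hash values split into four 16-bit sub-parts (the split described later in this document). Keying each entry by the pair (position, sub-part) is an assumption made here so that equal bit patterns at different positions do not collide. The names `split_segments` and `build_index` are hypothetical, and a real deployment would likely use a database table rather than a dictionary.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Set, Tuple

SEGMENTS = 4       # number of sub-parts per hash value
SEGMENT_BITS = 16  # width of each sub-part in bits


def split_segments(value: int) -> List[int]:
    """Split a 64-bit value into four 16-bit sub-parts, most significant first."""
    mask = (1 << SEGMENT_BITS) - 1
    return [(value >> (SEGMENT_BITS * (SEGMENTS - 1 - i))) & mask for i in range(SEGMENTS)]


def build_index(hashes: Iterable[int]) -> DefaultDict[Tuple[int, int], Set[int]]:
    """Store every full hash value under each of its sub-parts."""
    index: DefaultDict[Tuple[int, int], Set[int]] = defaultdict(set)
    for value in hashes:
        for position, segment in enumerate(split_segments(value)):
            index[(position, segment)].add(value)
    return index


index = build_index([0x0123456789ABCDEF])
print(sorted(index))
```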
Step S204: querying the similarity hash value library and determining the number of similarity hash value data whose similarity to the current similarity hash value data is smaller than a preset threshold.
Specifically, the current similarity hash value data includes the current similarity hash value and a plurality of sub-parts corresponding to the current similarity hash value.
In this case, querying the similarity hash value library and determining the number of similarity hash value data whose similarity to the current similarity hash value data is smaller than the preset threshold includes:
taking any one of the sub-parts of the current similarity hash value as an index, querying the similarity hash value library for sub-parts identical to the index, and taking the similarity hash values corresponding to those sub-parts as candidate hash values;
calculating the similarity between each candidate hash value and the current similarity hash value to obtain a plurality of similarities (for example, the Hamming distance between each candidate hash value and the current similarity hash value may be calculated);
and determining, from the plurality of similarities, the number of similarities smaller than the preset threshold.
In this embodiment, the similarity hash value is divided into a plurality of sub-parts. When querying for similarity hash value data whose similarity is smaller than the preset threshold, any one of the sub-parts is used as an index, sub-parts identical to the index are looked up in the similarity hash value library, and the corresponding similarity hash values are taken as candidate hash values. This provides a fast search method that supports real-time calculation and query even when the data volume is very large.
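The query in step S204 can be sketched as follows, assuming the library is a dictionary that maps (position, sub-part) to the set of full similarity hash values stored under it, and that similarity is measured as Hamming distance. The function names, the dictionary layout, and the 4 x 16-bit split are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple


def split_segments(value: int, segments: int = 4, segment_bits: int = 16) -> List[int]:
    """Split a hash value into equal-width sub-parts, most significant first."""
    mask = (1 << segment_bits) - 1
    return [(value >> (segment_bits * (segments - 1 - i))) & mask for i in range(segments)]


def count_similar(current: int, index: Dict[Tuple[int, int], Set[int]], threshold: int) -> int:
    """Count stored hash values whose Hamming distance to `current` is below `threshold`."""
    candidates: Set[int] = set()
    for position, segment in enumerate(split_segments(current)):
        # Only values sharing at least one sub-part become candidates.
        candidates |= index.get((position, segment), set())
    # Exact check on the reduced candidate set.
    return sum(1 for c in candidates if bin(c ^ current).count("1") < threshold)


current = 0x0123456789ABCDEF
index = {(0, 0x0123): {current ^ 0b101}}
print(count_similar(current, index, threshold=4))  # prints 1
```

Comparing the returned count with the set value then decides whether the current device is treated as abnormal.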
Step S205: when the number is larger than a set value, determining that the device of the current user is an abnormal device. The set value can be chosen flexibly according to the application scenario, and the invention is not limited in this respect.
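The counting and decision logic of steps S204 and S205 can be sketched as follows. This is only a minimal illustration: the function names, the default threshold of 4 (which corresponds to a Hamming distance of at most 3), and the default set value of 2 are assumptions chosen for the example and are not specified by the embodiment.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two integer hash values differ."""
    return bin(a ^ b).count("1")


def count_similar(current_hash, candidate_hashes, threshold):
    """Step S204 (sketch): count candidates whose distance is below the threshold."""
    return sum(
        1 for c in candidate_hashes if hamming_distance(current_hash, c) < threshold
    )


def is_abnormal_device(current_hash, candidate_hashes, threshold=4, set_value=2):
    """Step S205 (sketch): abnormal when the count exceeds the set value.

    threshold and set_value are illustrative defaults, not values from the patent.
    """
    return count_similar(current_hash, candidate_hashes, threshold) > set_value
```

Here the similarity is taken to be the Hamming distance, so a smaller value means more similar hash values, which matches the "smaller than a preset threshold" wording of the step.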
According to the method and the device, the corresponding similarity hash value of the device fingerprint is calculated by adopting a similarity hash algorithm, and whether the different device fingerprints are homologous or not is determined according to the similarity between the similarity hash values corresponding to the different device fingerprints, so that the method and the device can be used for searching and finding the aggregation of the devices from the acquired massive device fingerprint data to find abnormal devices or abnormal user operation behaviors.
Calculating the similarity hash of each device fingerprint addresses the problem of measuring device similarity, and the method provided by the invention further addresses how to perform similarity search over massive data. A naive search for devices similar to the current similarity hash value has time complexity O(n), where n is the amount of input data (the function in parentheses after "O" expresses how the time or space consumption of an algorithm grows with the amount of data), because the current hash value must be compared with every stored hash value.
The fast search for similarity hashes in the embodiment of the present invention is therefore based on the pigeonhole principle (drawer principle): if n+1 elements are put into n sets, at least one set must contain at least two elements. Our goal is to find devices whose similarity hash values are within a Hamming distance of 3 of the current device's similarity hash value. We can split the 64-bit hash into 4 segments of 16 bits each. By the pigeonhole principle, if two similarity hash values differ in at most 3 bits, the differing bits fall in at most 3 of the 4 segments, so at least one segment must be identical.
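The segment argument above can be checked with a short sketch. It assumes hash values are held as 64-bit integers and that segment 0 is the most significant 16 bits; the helper names are hypothetical.

```python
import random


def split_segments(h, n_segments=4, seg_bits=16):
    """Split a 64-bit hash into 16-bit segments, most significant segment first."""
    mask = (1 << seg_bits) - 1
    return [(h >> (seg_bits * (n_segments - 1 - i))) & mask for i in range(n_segments)]


def shares_segment(a, b):
    """True if a and b have an identical segment at the same position."""
    return any(x == y for x, y in zip(split_segments(a), split_segments(b)))


def check_pigeonhole(trials=1000, seed=0):
    """Flip at most 3 random bits of random hashes; a shared segment must remain."""
    rng = random.Random(seed)
    for _ in range(trials):
        base = rng.getrandbits(64)
        other = base
        for pos in rng.sample(range(64), rng.randint(0, 3)):
            other ^= 1 << pos
        if not shares_segment(base, other):
            return False
    return True
```

The guarantee holds only up to a distance of 3 with 4 segments: two values that differ in one bit of every segment (distance 4) may share no segment at all.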
Based on this theory, for each device we acquire the device fingerprint, calculate its similarity hash value, divide the similarity hash value into 4 segments, and store the complete similarity hash value under each segment used as an index, as shown in fig. 4, so that the similarity hash value of each device is stored in 4 copies. At query time, the similarity hash value to be queried (namely, the similarity hash value of the device fingerprint of the current user) is likewise divided into 4 segments; the 16 bits of each segment are used as a key to retrieve from the database all similarity hash values stored under that segment, which yields the candidate similarity hash values.
The candidate values are then traversed, the distance between each candidate and the similarity hash value to be queried is calculated, and the similarity hash values whose actual distance is within 3 are found. If the hash values are sufficiently uniform, the search for similar devices changes from a global search into a search over only about 4/2^16 ≈ 1/16000 of the original data, which greatly reduces the search range. If the search range is still too large after being reduced by a factor of about 16000, the method can be applied recursively to the remaining 48 bits, which are again divided into segments for segmented storage and search. The method accelerates search by trading space for time, so that real-time calculation and query remain possible even when the data volume is very large, for example billions of records.
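The storage and query scheme described above can be sketched with an in-memory dictionary standing in for the similarity hash value library. The class name `SegmentIndex`, the use of a (position, segment value) pair as the key, and the in-memory storage are assumptions made for illustration; the embodiment only specifies that each segment is used as an index to the complete hash value.

```python
from collections import defaultdict

SEGMENTS = 4
SEG_BITS = 16
MASK = (1 << SEG_BITS) - 1


def hamming_distance(a, b):
    return bin(a ^ b).count("1")


def segments(h):
    """16-bit segments of a 64-bit hash, most significant segment first."""
    return [(h >> (SEG_BITS * (SEGMENTS - 1 - i))) & MASK for i in range(SEGMENTS)]


class SegmentIndex:
    """Hypothetical in-memory stand-in for the similarity hash value library."""

    def __init__(self):
        self._buckets = defaultdict(set)

    def add(self, h):
        # Each hash is stored once per segment, i.e. in 4 copies.
        for pos, seg in enumerate(segments(h)):
            self._buckets[(pos, seg)].add(h)

    def candidates(self, h):
        # Merge the hashes that share at least one segment with h.
        found = set()
        for pos, seg in enumerate(segments(h)):
            found |= self._buckets.get((pos, seg), set())
        return found

    def query(self, h, max_distance=3):
        # Only candidates are compared, not the whole library.
        return {c for c in self.candidates(h) if hamming_distance(h, c) <= max_distance}
```

A candidate that shares a segment is not necessarily similar, which is why the final Hamming distance check on the candidates is still required.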
Illustratively, assume that there are 3 hash values: a1 = (a11, a12, a13, a14); a2 = (a21, a22, a23, a24); a3 = (a31, a32, a33, a34);
Without splitting the hash values, determining the devices similar to a1 requires calculating the Hamming distances between a1 and a2 and between a1 and a3, respectively.
With splitting, it is first judged whether a21 and a31 are the same as a11; if both are the same as a11, a2 and a3 are both taken as candidate values, and the Hamming distances between a1 and a2 and between a1 and a3 are then calculated.
In practice, the task in embodiments of the present invention is a lookup problem rather than a pairwise comparison problem.
Without splitting, if the database holds 100 million records and the records similar to a1 must be found, every record has to be compared with a1 one by one until the similar records a2 and a3 are found. The cost is a scan of all the data in the database and 100 million Hamming distance calculations.
With splitting, the query first retrieves the records whose first 16 bits are the same as those of a1, that is, it queries by a11, which returns about 10^8/2^16 ≈ 1500 records, including a2 and a3. Querying by a12 then returns about another 1500 records. Proceeding in the same way gives the query results for all four segments, and merging them yields about 6000 candidate records. The Hamming distance between a1 and each of these roughly six thousand records is then calculated, and the final result is again a2 and a3. Compared with the original one hundred million Hamming distance calculations, the amount of computation is reduced by a factor of more than ten thousand.
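The figures in this example can be reproduced with a few lines of arithmetic. The sketch assumes perfectly uniform hash values; the function name is hypothetical, and the merged total is an upper bound because the four result sets may overlap.

```python
def expected_candidates(total_records, n_segments=4, seg_bits=16):
    """Expected candidates per segment query and in total, for uniform hashes."""
    per_segment = total_records / 2 ** seg_bits
    return per_segment, per_segment * n_segments


per_segment, total = expected_candidates(10 ** 8)
reduction = 10 ** 8 / total

print(round(per_segment))  # about 1526 records per 16-bit segment query
print(round(total))        # about 6104 candidate records after merging
print(reduction)           # 16384.0 times fewer Hamming distance calculations
```

The exact values (about 1526 and 6104) are consistent with the rounded figures of 1500 and 6000 used in the text.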
Ideally, 0 and 1 occur with equal probability at each bit of all 64-bit hashes, in which case the performance of the method for searching similar devices reaches its theoretical maximum. In the worst case, the first 48 bits of the hash values generated by all devices are the same; the segmentation method then cannot accelerate the calculation, because a query on any of the first three segments returns all the data and the candidate set is the entire database. Normally, the hash algorithm makes 0 and 1 roughly equally likely at each bit.
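The worst case described above can be illustrated by measuring how concentrated the stored hash values are in each segment. This is a diagnostic sketch only; the function name and the sample data are hypothetical.

```python
from collections import Counter


def largest_bucket_fraction(hashes, position, n_segments=4, seg_bits=16):
    """Fraction of hashes that fall in the most populated bucket of one segment."""
    shift = seg_bits * (n_segments - 1 - position)
    mask = (1 << seg_bits) - 1
    counts = Counter((h >> shift) & mask for h in hashes)
    return max(counts.values()) / len(hashes)


# Degenerate data set: the first 48 bits are identical, only the last 16 bits vary.
skewed = [(0xABCDEF012345 << 16) | i for i in range(1000)]
```

For this data set a query on any of the first three segments returns every record, so only the last segment narrows the search; a fraction close to 1.0 in a segment signals that the segment provides no acceleration.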
In practical application, the search is accelerated using the 4-segment scheme, and all devices whose similarity hash values are within a distance of 3 of the current similarity hash value can be returned in real time. The devices similar to the currently accessing device can thus be queried efficiently, which provides a basis for risk control decisions.
Fig. 5 is a schematic diagram of main blocks of an apparatus 500 for determining device fingerprint homology according to an embodiment of the present invention, the apparatus 500 including:
an information obtaining module 510, configured to obtain a first device fingerprint and a second device fingerprint;
a first hash value determination module 520, configured to determine a first similarity hash value according to the first device fingerprint;
a second hash value determination module 530, configured to determine a second similarity hash value according to the second device fingerprint;
a similarity calculation module 540, configured to determine a similarity between the first similarity hash value and the second similarity hash value;
a homology determining module 550, configured to determine that the first device fingerprint and the second device fingerprint are homologous when the similarity is smaller than a preset threshold.
As shown in fig. 6, in an alternative embodiment, the first hash value determining module 520 includes:
a word segmentation processing unit 521, configured to perform word segmentation processing on the first device fingerprint to obtain a plurality of keywords;
a word hash value determining unit 522, configured to perform a hash calculation on the multiple keywords to obtain multiple keyword hash values;
a first hash value determining unit 523 configured to determine the first similarity hash value according to the plurality of keyword hash values.
In an optional embodiment, the first hash value determination unit is further configured to: perform a weighted summation of the plurality of keyword hash values to determine the first similarity hash value, where the weight of each keyword hash value is determined according to the inverse document frequency (IDF) of the keyword.
In an optional embodiment, the first device fingerprint and the second device fingerprint each comprise at least one of: operating system name, operating system version, font list, and plug-in list of the corresponding device.
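The units above (word segmentation, keyword hashing, weighted summation) can be sketched as follows. This is a minimal sketch that assumes the conventional SimHash construction, in which each keyword hash adds its weight to a bit position where the bit is 1 and subtracts it where the bit is 0. The field names, the `field:value` keyword format, the use of BLAKE2b as the keyword hash, the default weight of 1.0, and the default threshold of 4 are all assumptions for illustration and are not taken from the embodiment.

```python
import hashlib

HASH_BITS = 64


def keyword_hash(keyword):
    """64-bit hash of one keyword (BLAKE2b is an illustrative choice)."""
    digest = hashlib.blake2b(keyword.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def tokenize(fingerprint):
    """Turn a fingerprint dict into keywords such as 'fonts:Roboto'."""
    tokens = []
    for field, value in fingerprint.items():
        values = value if isinstance(value, (list, tuple)) else [value]
        tokens.extend(f"{field}:{v}" for v in values)
    return tokens


def simhash(keywords, idf=None):
    """Weighted per-bit vote over the keyword hashes; idf maps keyword -> weight."""
    idf = idf or {}
    totals = [0.0] * HASH_BITS
    for kw in keywords:
        weight = idf.get(kw, 1.0)  # default weight is an assumption
        h = keyword_hash(kw)
        for bit in range(HASH_BITS):
            totals[bit] += weight if (h >> bit) & 1 else -weight
    result = 0
    for bit, total in enumerate(totals):
        if total > 0:
            result |= 1 << bit
    return result


def is_homologous(fp_a, fp_b, threshold=4, idf=None):
    """Homologous when the Hamming distance of the two hashes is below the threshold."""
    distance = bin(simhash(tokenize(fp_a), idf) ^ simhash(tokenize(fp_b), idf)).count("1")
    return distance < threshold
```

Because every keyword contributes to every bit position, fingerprints that share most of their keywords tend to produce hash values that differ in only a few bits, which is the property the homology test relies on.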
Fig. 7 is a schematic diagram of main blocks of an abnormal device discovery apparatus 700 according to an embodiment of the present invention, where the apparatus 700 includes:
a current hash value determining module 710, configured to obtain a device fingerprint of a current user, calculate a similarity hash value of the device fingerprint of the current user, and use the similarity hash value of the device fingerprint of the current user as the current similarity hash value data;
a calculating module 720, configured to calculate a plurality of similarity hash values of a plurality of device fingerprints in the device fingerprint library;
a query determining module 730, configured to query the similarity hash value library and determine the number of similarity hash value data entries whose similarity to the current similarity hash value data is smaller than a preset threshold;
and an abnormal device determining module 740, configured to determine that the device of the current user is an abnormal device when the number is greater than the set value.
In an alternative embodiment, the calculation module 720 is further configured to: divide the similarity hash value into a plurality of sub-parts; and store each of the sub-parts as an index in association with the corresponding similarity hash value.
In an optional embodiment, the current similarity hash value data includes the current similarity hash value and a plurality of subsections corresponding to the current similarity hash value; in this case, the query determining module 730 is further configured to:
taking any one of the subsections of the current similarity hash value as an index, inquiring the subsection which is the same as the index in a similarity hash value library, and taking the similarity hash value which corresponds to the subsection which is the same as the index as a candidate hash value;
respectively calculating the similarity between each similarity hash value in the candidate hash values and the current similarity hash value to obtain a plurality of similarities;
and determining the number of the similarity degrees smaller than a preset threshold value from the plurality of similarity degrees.
The device can execute the method provided by the embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method. For technical details that are not described in detail in this embodiment, reference may be made to the method provided by the embodiment of the present invention.
Fig. 8 shows an exemplary system architecture 800 of a method for determining device fingerprint homology or an apparatus for determining device fingerprint homology to which embodiments of the present invention may be applied.
As shown in fig. 8, the system architecture 800 may include terminal devices 801, 802, 803, a network 804, and a server 805. The network 804 serves to provide a medium for communication links between the terminal devices 801, 802, 803 and the server 805. Network 804 may include various types of connections, such as wired or wireless communication links or fiber optic cables.
A user may use the terminal devices 801, 802, 803 to interact with a server 805 over a network 804 to receive or send messages or the like. Various communication client applications may be installed on the terminal devices 801, 802, 803.
The terminal devices 801, 802, 803 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 805 may be a server that provides various services, such as a background management server that supports shopping websites browsed by users using the terminal devices 801, 802, 803. The background management server may analyze and perform other processing on the received data such as the product information query request, and feed back a processing result (e.g., target push information and product information) to the terminal device.
It should be noted that the method for determining device fingerprint homology provided by the embodiment of the present invention is generally performed by the server 805, and accordingly, the apparatus for determining device fingerprint homology is generally disposed in the server 805.
It should be understood that the number of terminal devices, networks, and servers in fig. 8 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 9, shown is a block diagram of a computer system 900 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 9 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 9, the computer system 900 includes a Central Processing Unit (CPU)901 that can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the system 900 are also stored. The CPU 901, ROM 902, and RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to bus 904.
The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output section 907 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), a speaker, and the like; a storage section 908 including a hard disk and the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. A drive 910 is also connected to the I/O interface 905 as necessary. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 910 as necessary, so that a computer program read therefrom is installed into the storage section 908 as necessary.
In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 909, and/or installed from the removable medium 911. The above-described functions defined in the system of the present invention are executed when the computer program is executed by a Central Processing Unit (CPU) 901.
It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor includes a sending module, an obtaining module, a determining module, and a first processing module. The names of these modules do not in some cases constitute a limitation on the unit itself, and for example, the sending module may also be described as a "module that sends a picture acquisition request to a connected server".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer readable medium carries one or more programs which, when executed by a device, cause the device to perform:
acquiring a first device fingerprint and a second device fingerprint;
determining a first similarity hash value according to the first device fingerprint;
determining a second similarity hash value according to the second device fingerprint;
determining a similarity between the first similarity hash value and the second similarity hash value;
determining that the first device fingerprint and the second device fingerprint are homologous when the similarity is less than a preset threshold.
According to the technical scheme, the similarity hash value corresponding to each device fingerprint is calculated using a similarity hash algorithm, and whether different device fingerprints are homologous (namely, whether they result from operations on the same device) is determined according to the similarity (such as the Hamming distance) between the corresponding similarity hash values. Device aggregation can thus be searched for and found effectively in the acquired massive device fingerprint data, so as to find abnormal devices or abnormal user operation behaviors and to provide a basis for risk control decisions.
The above-described embodiments should not be construed as limiting the scope of the invention. Those skilled in the art will appreciate that various modifications, combinations, sub-combinations, and substitutions can occur, depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (2)

1. An abnormal device discovery method, comprising:
determining the equipment fingerprint of the current user, calculating the similarity hash value of the equipment fingerprint of the current user, and taking the similarity hash value of the equipment fingerprint of the current user as the current similarity hash value data;
calculating a plurality of similarity hash values of a plurality of device fingerprints in a device fingerprint library;
inquiring and determining the number of the similarity hash value data of which the similarity between the current similarity hash value data and the similarity hash value database is smaller than a preset threshold value;
when the number is larger than a set value, determining that the equipment of the current user is abnormal equipment;
the method further comprises the following steps: dividing the similarity hash value into a plurality of subdivisions; storing each part in the plurality of sub parts as an index in association with the corresponding similarity hash value respectively;
the current similarity hash value data comprises a similarity hash value of the device fingerprint of the current user and a plurality of subsections corresponding to the similarity hash value of the device fingerprint of the current user;
the querying and determining the number of the similarity hash value data of which the similarity value with the similarity hash value data of the device fingerprint of the current user in the similarity hash value library is smaller than a preset threshold value includes:
taking any one of the subsections of the current similarity hash value as an index, inquiring the subsection which is the same as the index in a similarity hash value library, and taking the similarity hash value which corresponds to the subsection which is the same as the index as a candidate hash value;
respectively calculating the similarity between each similarity hash value in the candidate hash values and the similarity hash value of the equipment fingerprint of the current user to obtain a plurality of similarities;
and determining the number of the similarity degrees smaller than a preset threshold value from the plurality of similarity degrees.
2. An abnormal device discovery apparatus, comprising:
the current hash value determining module is used for determining the equipment fingerprint of the current user, calculating the similarity hash value of the equipment fingerprint of the current user, and taking the current similarity hash value of the equipment fingerprint of the current user as current similarity hash value data;
a computing module to compute a plurality of similarity hash values for a plurality of device fingerprints in a device fingerprint repository;
the query determining module is used for querying and determining the number of the similarity hash value data of which the similarity with the current similarity hash value data is smaller than a preset threshold value in the similarity hash value database;
the abnormal equipment determining module is used for determining that the equipment of the current user is abnormal equipment when the number is larger than a set value;
the calculation module comprises: dividing the similarity hash value into a plurality of subdivisions; storing each part in the plurality of sub parts as an index in association with the corresponding similarity hash value respectively;
the current similarity hash value data comprises a similarity hash value of the device fingerprint of the current user and a plurality of subsections corresponding to the similarity hash value of the device fingerprint of the current user;
the query determination module is further to:
any one of a plurality of subsections of the similarity hash value of the device fingerprint of the current user is used as an index, the subsection which is the same as the index is inquired in a similarity hash value library, and the similarity hash value corresponding to the subsection which is the same as the index is used as a candidate hash value;
respectively calculating the similarity between each similarity hash value in the candidate hash values and the similarity hash value of the equipment fingerprint of the current user to obtain a plurality of similarities;
and determining the number of the similarity degrees smaller than a preset threshold value from the plurality of similarity degrees.
CN201811406605.5A 2018-11-23 2018-11-23 Method and device for determining equipment fingerprint homology Active CN109376277B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811406605.5A CN109376277B (en) 2018-11-23 2018-11-23 Method and device for determining equipment fingerprint homology

Publications (2)

Publication Number Publication Date
CN109376277A CN109376277A (en) 2019-02-22
CN109376277B true CN109376277B (en) 2020-11-20

Family

ID=65383428

Country Status (1)

Country Link
CN (1) CN109376277B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 101111 Room 221, 2nd Floor, Block C, 18 Kechuang 11th Street, Beijing Economic and Technological Development Zone

Applicant after: JINGDONG DIGITAL TECHNOLOGY HOLDINGS Co.,Ltd.

Address before: 101111 Room 221, 2nd Floor, Block C, 18 Kechuang 11th Street, Beijing Economic and Technological Development Zone

Applicant before: BEIJING JINGDONG FINANCIAL TECHNOLOGY HOLDING Co.,Ltd.

GR01 Patent grant
CP01 Change in the name or title of a patent holder

Address after: 101111 Room 221, 2nd Floor, Block C, 18 Kechuang 11th Street, Beijing Economic and Technological Development Zone

Patentee after: Jingdong Technology Holding Co.,Ltd.

Address before: 101111 Room 221, 2nd Floor, Block C, 18 Kechuang 11th Street, Beijing Economic and Technological Development Zone

Patentee before: Jingdong Digital Technology Holding Co.,Ltd.

Address after: 101111 Room 221, 2nd Floor, Block C, 18 Kechuang 11th Street, Beijing Economic and Technological Development Zone

Patentee after: Jingdong Digital Technology Holding Co.,Ltd.

Address before: 101111 Room 221, 2nd Floor, Block C, 18 Kechuang 11th Street, Beijing Economic and Technological Development Zone

Patentee before: JINGDONG DIGITAL TECHNOLOGY HOLDINGS Co.,Ltd.