CN107423613B - Method and device for determining device fingerprint according to similarity and server - Google Patents

Info

Publication number
CN107423613B
Authority
CN
China
Prior art keywords
attribute
similarity
equipment
weight
data records
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710575930.3A
Other languages
Chinese (zh)
Other versions
CN107423613A (en)
Inventor
汪德嘉
宋银平
葛彦霆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
JIANGSU PAY EGIS TECHNOLOGY Co.,Ltd.
JIANGSU TONGFUDUN INFORMATION SECURITY TECHNOLOGY Co.,Ltd.
Original Assignee
Jiangsu Pay Egis Technology Co ltd
Jiangsu Tongfudun Information Security Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Jiangsu Pay Egis Technology Co ltd and Jiangsu Tongfudun Information Security Technology Co ltd
Publication of CN107423613A
Application granted
Publication of CN107423613B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30: Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/44: Program or device authentication
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/22: Matching criteria, e.g. proximity measures

Abstract

The invention discloses a method, a device, a server and a computer storage medium for determining a device fingerprint according to similarity. In the method provided by the embodiment of the invention, the first weight value of each attribute is calculated from the device attribute data records by using a characteristic weight calculation algorithm or a weight model algorithm, instead of being set according to human experience. This avoids dependence on human experience, effectively improves the accuracy of the device similarity calculation, and thereby improves the accuracy of judging whether two devices are the same device. Verifying the calculated weight values can further improve the accuracy of the device similarity calculation.

Description

Method and device for determining device fingerprint according to similarity and server
Technical Field
The invention relates to the technical field of computers, in particular to a method, a device, a server and a computer storage medium for determining device fingerprints according to similarity.
Background
At present, many websites want to analyze user behavior based on user information for anti-fraud or precision-marketing purposes. Banks, e-commerce platforms and other parties involved in transactions in particular need to determine whether the device a user logs in from has changed. This can be addressed with device fingerprint technology, in which a device fingerprint is a unique, non-repeating device identifier generated from various attribute information of the device; it serves as the "identity card" of the device in virtual space.
The core of the device fingerprint technology is the judgment of the device similarity, and the device similarity is determined by adopting the following two methods in the prior art:
First, the Delphi method (expert scoring): several evaluation items are selected according to the specific requirements of the evaluation object, and an evaluation standard is formulated for those items. A number of representative experts are then engaged to score each item against the standard based on their own experience, and the scores are collected. A commonly used scoring method is additive evaluation: the scores of the individual index items are summed, and the evaluation result is expressed as a total score. It is typically used when the relationships between indexes are simple.
Although this method is simple and intuitive, it depends on how familiar the experts are with the indexes and how well they understand them. For a newly launched device fingerprint service, the weights are generally determined jointly by business experts and technical experts; however, relying entirely on expert evaluation is not necessarily appropriate for device fingerprints, and it is time-consuming and labor-intensive.
Second, the analytic hierarchy process: the indexes are compared in pairs to construct a matrix, each row is summed to obtain a characteristic vector, and the vector is normalized. Finally, the proportions are adjusted according to the result of a consistency check, and the weights are determined once the consistency check is passed. However, the pairwise comparison results in the initially constructed matrix are still assessed empirically.
Both methods depend heavily on human experience, which reduces the accuracy and rigor of the judgment.
Disclosure of Invention
In view of the above, the present invention has been developed to provide a method of determining a device fingerprint in dependence on similarity, an apparatus for determining a device fingerprint in dependence on similarity, a server and a computer storage medium that overcome or at least partially address the above-mentioned problems.
According to an aspect of the invention, there is provided a method of determining a device fingerprint from similarity, the method comprising:
extracting a plurality of device attribute data records belonging to the same operating system type in a device fingerprint library, wherein each device attribute data record consists of a device fingerprint and a plurality of attribute information corresponding to the device fingerprint, and each attribute information consists of an attribute and an attribute value corresponding to the attribute;
calculating a first weight value of each attribute by using a characteristic weight calculation algorithm or a weight model algorithm according to a plurality of equipment attribute data records;
judging whether the calculated first weight value of each attribute meets a first preset condition or not, and if so, assigning the weight of the attribute as the first weight value;
acquiring attribute information of the equipment to be confirmed, and calculating the equipment similarity between the equipment to be confirmed and each equipment in the equipment library according to the weight corresponding to the attribute of the equipment to be confirmed, the attribute value corresponding to the attribute and an attribute similarity algorithm;
judging whether the similarity of each device is greater than or equal to a preset threshold value or not;
and if at least one device similarity is larger than or equal to a preset threshold, assigning the device fingerprint of the device with the highest device similarity in the at least one device similarity larger than or equal to the preset threshold to the device to be confirmed.
According to another aspect of the present invention, there is provided an apparatus for determining a fingerprint of a device based on similarity, the apparatus comprising:
the extraction module is used for extracting a plurality of device attribute data records belonging to the same operating system type in a device fingerprint library, wherein each device attribute data record consists of a device fingerprint and a plurality of attribute information corresponding to the device fingerprint, and each attribute information consists of an attribute and an attribute value corresponding to the attribute;
the first calculation module is used for calculating a first weight value of each attribute by using a characteristic weight calculation algorithm or a weight model algorithm according to a plurality of equipment attribute data records;
the first judgment module is used for judging whether the calculated first weight value of each attribute meets a first preset condition or not;
the first assignment module is used for assigning the weight of each attribute to be a first weight value if the first weight value of each attribute accords with a first preset condition;
the acquisition module is used for acquiring attribute information of the equipment to be confirmed;
the second calculation module is used for calculating the equipment similarity between the equipment to be confirmed and each equipment in the equipment library according to the weight corresponding to the attribute of the equipment to be confirmed, the attribute value corresponding to the attribute and the attribute similarity algorithm;
the second judging module is used for judging whether the similarity of each device is greater than or equal to a preset threshold value or not;
and the second assignment module is used for assigning the device fingerprint of the device with the highest device similarity in the at least one device similarity which is greater than or equal to the preset threshold value to the device to be confirmed if at least one device similarity is greater than or equal to the preset threshold value.
According to still another aspect of the present invention, there is provided an electronic device/terminal/server comprising: the system comprises a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface complete mutual communication through the communication bus;
the memory is used for storing at least one executable instruction, and the executable instruction enables the processor to execute the operation corresponding to the method for determining the device fingerprint according to the similarity.
According to a further aspect of the present invention, there is provided a computer storage medium having at least one executable instruction stored therein, the executable instruction causing a processor to perform operations corresponding to the method for determining a device fingerprint according to similarity as described above.
According to the scheme provided by the invention, the first weight value of each attribute is calculated from the device attribute data records by using a characteristic weight calculation algorithm or a weight model algorithm, instead of being set according to human experience. This avoids dependence on human experience, effectively improves the accuracy of the device similarity calculation, and thereby improves the accuracy of judging whether two devices are the same device. Verifying the calculated weight values can further improve the accuracy of the device similarity calculation. In addition, the weight value of each attribute is calculated using only device attribute information belonging to the same operating system, which reduces the amount of calculation and improves the accuracy of the device similarity calculation.
The above description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a flow chart of a method for determining device fingerprints based on similarity according to a first embodiment of the present invention;
FIG. 2 is a flow chart of a method for determining device fingerprints based on similarity according to a second embodiment of the invention;
FIG. 3 is a flow chart of a method for determining device fingerprints based on similarity according to a third embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an apparatus for determining device fingerprints according to similarity according to a fourth embodiment of the present invention;
FIG. 5 is a schematic diagram of an apparatus for determining device fingerprints according to similarity according to a fifth embodiment of the present invention;
FIG. 6 is a schematic diagram of an apparatus for determining device fingerprints according to similarity according to a sixth embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a server according to an eighth embodiment of the present invention.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Example one
Fig. 1 is a flow chart illustrating a method for determining device fingerprints according to similarity according to a first embodiment of the present invention. As shown in fig. 1, the method comprises the steps of:
Step S100, extracting a plurality of device attribute data records belonging to the same operating system type in the device fingerprint database.
Each device attribute data record is composed of a device fingerprint and a plurality of attribute information corresponding to the device fingerprint, and each attribute information is composed of an attribute and an attribute value corresponding to the attribute.
In the embodiment of the present invention, device similarity is computed between devices that have the same operating system type. At present, device operating system types can be roughly divided into the iOS operating system, the Android operating system, the Windows operating system and the Windows Phone operating system. For each operating system type, the device fingerprint library stores a large number of device fingerprints together with the attribute information corresponding to each of them. One device fingerprint and its corresponding attribute information are treated as one device attribute data record, and each piece of attribute information consists of an attribute and the attribute value corresponding to that attribute. For each operating system type, a plurality of device attribute data records of that type can be extracted from the device fingerprint library. For example, 10000 device attribute data records may be extracted for each of the iOS, Android, Windows and Windows Phone operating systems; this is only an example and is not limiting.
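The extraction in step S100 can be illustrated with a minimal Python sketch. The record layout (a dict with `fingerprint`, `os_type` and `attributes` keys) and the function name `extract_by_os` are hypothetical choices made for illustration; the text does not prescribe any concrete representation of the device fingerprint library.

```python
from typing import List


def extract_by_os(records: List[dict], os_type: str, limit: int) -> List[dict]:
    """Return up to `limit` records whose operating system type equals `os_type`."""
    selected = [rec for rec in records if rec["os_type"] == os_type]
    return selected[:limit]


# Hypothetical library contents: one fingerprint plus attribute/value pairs per record.
library = [
    {"fingerprint": "fp-1", "os_type": "iOS", "attributes": {"model": "A"}},
    {"fingerprint": "fp-2", "os_type": "Android", "attributes": {"model": "B"}},
    {"fingerprint": "fp-3", "os_type": "iOS", "attributes": {"model": "C"}},
]
ios_records = extract_by_os(library, "iOS", limit=10000)
```

Grouping by operating system type before any weights are computed keeps each weight calculation restricted to comparable devices, which is the reason the text gives for the reduced amount of calculation.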
Step S101, calculating a first weight value of each attribute by using a characteristic weight calculation algorithm or a weight model algorithm according to a plurality of equipment attribute data records.
After a plurality of device attribute data records belonging to the same operating system type have been extracted, a weight is set for each attribute. The weight of an attribute represents its relative importance in the device similarity calculation. Specifically, a feature weight calculation algorithm or a weight model algorithm can be used to calculate the first weight value of each attribute.
Step S102, determining whether the calculated first weight values of the attributes meet a first preset condition, if yes, executing step S103.
After the first weight values of the attributes have been calculated, it is necessary to judge whether they meet a first preset condition. The main purpose of this judgment is to determine whether the calculated first weight values are actually an improvement.
Step S103, assigning the weight of the attribute as a first weight value.
And under the condition that the first weight value of each attribute is judged to accord with the first preset condition, assigning the weight of the attribute as the first weight value.
Step S104, obtaining attribute information of the equipment to be confirmed, and calculating the equipment similarity between the equipment to be confirmed and each equipment in the equipment library according to the weight corresponding to the attribute of the equipment to be confirmed, the attribute value corresponding to the attribute and the attribute similarity algorithm.
When a device login or access is detected, the attribute information of the device to be confirmed is obtained. The device similarity between the device to be confirmed and each device in the device library is then calculated according to the weight corresponding to each attribute of the device to be confirmed, the attribute value corresponding to the attribute and the attribute similarity algorithm.
Step S105, determining whether at least one device similarity is greater than or equal to a preset threshold, if yes, executing step S106.
It is judged whether each calculated device similarity is greater than or equal to a preset threshold. If at least one device similarity is greater than or equal to the preset threshold, the device to be confirmed is similar to at least one device in the device library; if every device similarity is less than the preset threshold, the device to be confirmed is not similar to any device in the device library. The preset threshold can be set according to practical experience and is not described in detail here.
Step S106, assigning the device fingerprint of the device with the highest device similarity, among the device similarities that are greater than or equal to the preset threshold, to the device to be confirmed.
If at least one device similarity is greater than or equal to the preset threshold, the device to be confirmed is similar to at least one device in the device library. Among the devices whose similarity reaches the threshold, the one with the highest device similarity can be determined to be the same device as the device to be confirmed, and its device fingerprint can therefore be assigned to the device to be confirmed.
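Steps S105 and S106 amount to a thresholded arg-max, which the following Python sketch illustrates. The function name `assign_fingerprint` and the choice to return `None` when no similarity reaches the threshold are assumptions for illustration; this embodiment does not say what happens in that case.

```python
from typing import Dict, Optional


def assign_fingerprint(similarities: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the fingerprint with the highest similarity that is >= threshold, else None."""
    candidates = {fp: s for fp, s in similarities.items() if s >= threshold}
    if not candidates:
        return None  # assumption: no library device is similar enough
    return max(candidates, key=candidates.get)


# Hypothetical similarities between the device to be confirmed and three library devices.
scores = {"fp-1": 0.62, "fp-2": 0.91, "fp-3": 0.88}
best = assign_fingerprint(scores, threshold=0.85)
```

Here both `fp-2` and `fp-3` reach the threshold, and the fingerprint of the more similar one is assigned.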
According to the method provided by the embodiment of the invention, the first weight value of each attribute is calculated from the device attribute data records by using the characteristic weight calculation algorithm or the weight model algorithm, instead of being set according to human experience. This avoids dependence on human experience, effectively improves the accuracy of the device similarity calculation, and thereby improves the accuracy of judging whether two devices are the same device. Verifying the calculated weight values can further improve the accuracy of the device similarity calculation. In addition, the weight value of each attribute is calculated using only device attribute information belonging to the same operating system, which reduces the amount of calculation and improves the accuracy of the device similarity calculation.
Example two
Fig. 2 is a flow chart of a method for determining device fingerprints according to similarity according to a second embodiment of the present invention. As shown in fig. 2, the method comprises the steps of:
Step S200, extracting a plurality of device attribute data records belonging to the same operating system type in the device fingerprint database.
Each device attribute data record is composed of a device fingerprint and a plurality of attribute information corresponding to the device fingerprint, and each attribute information is composed of an attribute and an attribute value corresponding to the attribute.
Specifically, the attribute information includes: hardware attribute information, software attribute information, and/or behavior attribute information; wherein the hardware attribute information comprises one or more of the following information: MAC address, brand, model, IMEI, serial number; the software attribute information includes one or more of the following information: OS type, system settings, network settings, protocol fingerprint, browser attributes, geographic location; the behavior attribute information includes one or more of the following information: access frequency, access time and operation track.
The operating system types include: an iOS operating system, an Android operating system, a Windows operating system, and/or a Windows Phone operating system.
Step S201, a plurality of device attribute data records are preprocessed to obtain preprocessed device attribute data records.
To calculate the weight values and the device similarity more effectively, the extracted device attribute data records also need to be preprocessed. Specifically, they may be preprocessed by the following methods:
Method one: deleting the device attribute data records in which the number of pieces of attribute information is less than a preset threshold.
Each device fingerprint normally corresponds to many pieces of attribute information. A device fingerprint with little attribute information can be regarded as an abnormal device fingerprint, so the device attribute data records in which the number of pieces of attribute information is less than the preset threshold are deleted.
Method two: matching the device fingerprints corresponding to the plurality of device attribute data records against the device fingerprints in a blacklist, and deleting every device attribute data record whose device fingerprint matches one in the blacklist.
Specifically, the device fingerprint library may contain some abnormal device fingerprints, for example device fingerprints of devices that logged in using stolen information, and the corresponding device attribute data records need to be deleted. To this end, a blacklist storing the abnormal device fingerprints may be set up. After the plurality of device attribute data records are extracted, their device fingerprints are matched against the device fingerprints in the blacklist, and every record whose device fingerprint matches is deleted.
Of course, the two methods may also be combined: after the device attribute data records in which the number of pieces of attribute information is less than the preset threshold are deleted, the device fingerprints of the remaining records are matched against the device fingerprints in the blacklist, and every record whose device fingerprint matches is deleted.
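The combined preprocessing can be sketched in Python as follows. The record layout and the names `preprocess`, `min_attributes` and `blacklist` are hypothetical, chosen only to illustrate the two deletion rules applied in the order described above.

```python
from typing import Iterable, List, Set


def preprocess(records: Iterable[dict], min_attributes: int, blacklist: Set[str]) -> List[dict]:
    """Apply method one (sparse-record removal), then method two (blacklist removal)."""
    kept = []
    for rec in records:
        if len(rec["attributes"]) < min_attributes:
            continue  # method one: too little attribute information
        if rec["fingerprint"] in blacklist:
            continue  # method two: fingerprint appears in the blacklist
        kept.append(rec)
    return kept


records = [
    {"fingerprint": "fp-1", "attributes": {"brand": "X", "model": "1", "mac": "aa"}},
    {"fingerprint": "fp-2", "attributes": {"brand": "Y"}},
    {"fingerprint": "fp-3", "attributes": {"brand": "Z", "model": "3", "mac": "cc"}},
]
clean = preprocess(records, min_attributes=2, blacklist={"fp-3"})
```

In this example `fp-2` is dropped for having too few attributes and `fp-3` for being blacklisted, leaving only `fp-1`.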
Step S202, according to the preprocessed device attribute data records, calculating a first weight value of each attribute by using the following formula (1):
$$W(A) = S(T) - \sum_{v \in Values(A)} \frac{|T_v|}{|T|} \, S(T_v) \qquad (1)$$
wherein $W(A)$ represents the first weight value of the attribute A, $T$ represents a set of device attribute data records, $|T|$ represents the number of device attribute data records contained in the set $T$, $Values(A)$ represents the set of all attribute values of the attribute A, $T_v$ is the subset of device attribute data records in the set $T$ for which the attribute value of the attribute A is $v$, $|T_v|$ represents the number of device attribute data records contained in the set $T_v$, $S(T)$ represents the entropy of $T$, and $S(T_v)$ represents the entropy of $T_v$;
$$S(T) = -\sum_{i=1}^{c} p_i \log_2 p_i$$
wherein $c$ represents the number of attribute categories in the set $T$, and $p_i$ represents the proportion of the device attribute data records of the i-th attribute category in the set $T$ to the total device attribute data records in the set $T$;
$$S(T_v) = -\sum_{i=1}^{c_v} p_{vi} \log_2 p_{vi}$$
wherein $c_v$ represents the number of attribute categories in the set $T_v$ when the attribute value of the attribute A is $v$, and $p_{vi}$ represents the proportion of the device attribute data records of the i-th attribute category in the set $T_v$ to the total device attribute data records in the set $T_v$.
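The quantities defined for formula (1) match the usual information gain measure: the entropy of T minus the size-weighted entropy of each subset T_v. The Python sketch below computes it under three assumptions that the text does not confirm: the category of a record is taken to be its device fingerprint, the logarithm is base 2, and records are dicts with hypothetical `fingerprint` and `attributes` keys.

```python
import math
from collections import Counter
from typing import Hashable, List


def entropy(labels: List[Hashable]) -> float:
    """Shannon entropy (base 2) of a list of category labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(records: List[dict], attribute: str) -> float:
    """S(T) minus the |T_v|/|T|-weighted entropy of each subset T_v."""
    labels = [rec["fingerprint"] for rec in records]  # assumed category label
    subsets = {}
    for rec in records:
        value = rec["attributes"].get(attribute)  # missing attribute counts as None
        subsets.setdefault(value, []).append(rec["fingerprint"])
    remainder = sum(len(sub) / len(records) * entropy(sub) for sub in subsets.values())
    return entropy(labels) - remainder


records = [
    {"fingerprint": "A", "attributes": {"model": "m1", "os": "Android"}},
    {"fingerprint": "A", "attributes": {"model": "m1", "os": "Android"}},
    {"fingerprint": "B", "attributes": {"model": "m2", "os": "Android"}},
    {"fingerprint": "B", "attributes": {"model": "m2", "os": "Android"}},
]
gain_model = information_gain(records, "model")
gain_os = information_gain(records, "os")
```

In this toy data `model` separates the two fingerprints perfectly and so receives a high weight, while `os` is identical for every record and contributes nothing.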
Thus, through this step, the first weight value of each attribute can be calculated. In addition, in the embodiment of the present invention, the device fingerprints may be treated as all having the same set of attributes; for an attribute that a device fingerprint does not have, the corresponding attribute value is taken to be null. For example, for device fingerprint 1: attribute A, attribute value 1; attribute B, attribute value 1; attribute C, attribute value null; attribute D, attribute value 1.
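A minimal Python sketch of this null-filling convention follows. The function name `align_attributes` is hypothetical, and Python's `None` stands in for the null attribute value.

```python
from typing import Dict, List, Optional


def align_attributes(records: List[Dict[str, str]]) -> List[Dict[str, Optional[str]]]:
    """Give every record the same attribute keys, using None for missing ones."""
    all_keys = sorted({key for rec in records for key in rec})
    return [{key: rec.get(key) for key in all_keys} for rec in records]


aligned = align_attributes([
    {"A": "1", "B": "1", "D": "1"},  # device fingerprint 1 has no attribute C
    {"A": "2", "C": "3"},
])
```

After alignment the first record reads A=1, B=1, C=None, D=1, mirroring the example in the text.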
After the first weight values of the attributes are calculated, they need to be verified. A specific verification method is shown in steps S203 to S207:
Step S203, analyzing the plurality of device attribute data records to obtain a first sample and a second sample.
The first sample is the device attribute data records of the same device at different moments, and the second sample is the device attribute data records of different devices.
The device attribute data records of the same device at different moments may differ; for example, the attribute values of behavior attributes such as access frequency, access time and operation track may change. The device attribute data records of different devices may differ significantly.
As is known, the higher the device similarity, the more similar two devices are. In the embodiment of the present invention, the plurality of device attribute data records are divided into device attribute data records of the same device at different moments and device attribute data records of different devices, so that the device similarity of the same device and the device similarity between different devices can be used to verify whether the first weight value is better than the initial weight value.
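One way to build the two samples is to pair up records and sort the pairs by whether they share a fingerprint, as in the Python sketch below. This is only an illustration under stated assumptions: records of the same device are assumed to carry the same `fingerprint` value, and the `seen_at` field and function name `split_pairs` are hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def split_pairs(records: List[dict]) -> Tuple[list, list]:
    """Return (same-device pairs, different-device pairs) over all record pairs."""
    same, different = [], []
    for a, b in combinations(records, 2):
        if a["fingerprint"] == b["fingerprint"]:
            same.append((a, b))  # first sample: one device at different moments
        else:
            different.append((a, b))  # second sample: two different devices
    return same, different


records = [
    {"fingerprint": "A", "seen_at": 1, "attributes": {"model": "m1"}},
    {"fingerprint": "A", "seen_at": 2, "attributes": {"model": "m1"}},
    {"fingerprint": "B", "seen_at": 1, "attributes": {"model": "m2"}},
]
first_sample, second_sample = split_pairs(records)
```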
Step S204, calculating the first device similarity of the same device and the second device similarity between different devices according to the first weight value.
Specifically, after the first weight value is calculated, the first device similarity of the same device and the second device similarity between different devices need to be calculated according to the first weight value, and here, the calculation may be performed by using an initially set attribute similarity algorithm.
In a preferred embodiment of the present invention, the attribute information may further be filtered according to the first weight values. For example, the calculated first weight values are sorted from small to large, the attribute information whose first weight value is smaller than a preset weight value is filtered out, and the attribute information whose first weight value is greater than or equal to the preset weight value is retained. The first device similarity of the same device and the second device similarity between different devices are then calculated according to the first weight values of the attributes in the retained attribute information. Filtering by first weight value reduces the number of pieces of attribute information that participate in the device similarity calculation and thus speeds up the calculation.
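The filtering step can be sketched in Python as follows. The function name `filter_attributes`, the attribute names and the weight values are hypothetical and serve only to show the sort-then-threshold logic.

```python
from typing import Dict


def filter_attributes(weights: Dict[str, float], min_weight: float) -> Dict[str, float]:
    """Keep attributes whose weight is >= min_weight, ordered from small to large."""
    ordered = sorted(weights.items(), key=lambda item: item[1])
    return {name: w for name, w in ordered if w >= min_weight}


first_weights = {"mac": 0.9, "timezone": 0.05, "model": 0.4}
retained = filter_attributes(first_weights, min_weight=0.1)
```

Only `model` and `mac` remain, so the later similarity calculation iterates over two attributes instead of three.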
Specifically, the attribute similarity may be calculated according to an attribute value corresponding to the attribute and an attribute similarity algorithm, and then the following formula is used to calculate the first device similarity of the same device and the second device similarity between different devices, respectively:
$$S_d = \sum_{i=1}^{N} W_i \cdot S_{a_i}$$
wherein $S_d$ represents the device similarity, $S_{a_i}$ represents the attribute similarity of the i-th attribute category, $W_i$ represents the weight of the i-th attribute category, and $N$ represents the number of attribute categories participating in the device similarity calculation.
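The Python sketch below illustrates a weighted combination of per-attribute similarities into one device similarity. It assumes a plain weighted sum and uses exact matching as a placeholder attribute similarity; the text leaves the actual attribute similarity algorithm open, and all names and weights here are hypothetical.

```python
from typing import Dict, Optional


def attribute_similarity(a: Optional[str], b: Optional[str]) -> float:
    """Placeholder attribute similarity: 1.0 on exact match of non-null values, else 0.0."""
    return 1.0 if a is not None and a == b else 0.0


def device_similarity(dev_a: Dict[str, str], dev_b: Dict[str, str],
                      weights: Dict[str, float]) -> float:
    """Weighted sum of attribute similarities over the attributes listed in `weights`."""
    return sum(
        w * attribute_similarity(dev_a.get(name), dev_b.get(name))
        for name, w in weights.items()
    )


weights = {"mac": 0.5, "model": 0.3, "timezone": 0.2}
dev_a = {"mac": "aa:bb", "model": "X1", "timezone": "UTC+8"}
dev_b = {"mac": "aa:bb", "model": "X1", "timezone": "UTC+9"}
similarity = device_similarity(dev_a, dev_b, weights)
```

With weights that sum to 1, the result stays in the range 0 to 1, which makes it easier to compare against a fixed threshold; whether the weights are normalized in this way is not stated in the text.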
Step S205, judging whether the first device similarity of the same device and the second device similarity between different devices meet a first preset condition, if so, executing step S206; if not, go to step S207.
After the first device similarity of the same device and the second device similarity between different devices are obtained through calculation, it is necessary to determine whether they satisfy a first preset condition. The first preset condition is specifically: the first device similarity of the same device is greater than the initial device similarity of the same device, and the second device similarity between different devices is less than the initial device similarity between different devices. The initial device similarity of the same device and the initial device similarity between different devices are calculated according to the initial weight values of the attributes.
If the first device similarity of the same device and the second device similarity between different devices meet the first preset condition, the first weight values make the same device appear more similar to itself and different devices appear less similar to each other than the initial weight values do, which indicates that the first weight values are better than the initial weight values. If the first preset condition is not met, the first weight values do not improve on the initial weight values in this respect, and the initial weight values are regarded as the better choice.
In step S206, the weight of the attribute is assigned as the first weight value.
If the first device similarity of the same device and the second device similarity between different devices meet a first preset condition, the weight of the attribute can be assigned as a first weight value.
In step S207, the weight of the attribute is assigned as the initial weight value.
If the first device similarity of the same device and the second device similarity between different devices do not meet a first preset condition, the weight of the attribute can be assigned as an initial weight value.
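The verification in steps S205 to S207 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the function name `select_weights` and its parameters are hypothetical, and the four similarity values are assumed to have been computed already as single numbers (for example, as averages over the sample pairs).

```python
from typing import Dict


def select_weights(
    first_weights: Dict[str, float],
    initial_weights: Dict[str, float],
    first_same: float,
    second_diff: float,
    initial_same: float,
    initial_diff: float,
) -> Dict[str, float]:
    """Return the first weights only if they pass the first preset condition."""
    # First preset condition: the same-device similarity rises AND the
    # different-device similarity falls, both relative to the initial weights.
    condition_met = first_same > initial_same and second_diff < initial_diff
    # Step S206 keeps the first weights; step S207 falls back to the initial ones.
    return first_weights if condition_met else initial_weights
```

Both parts of the condition must hold; an improvement on the same-device side alone is not enough to replace the initial weights.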
Step S208, obtaining attribute information of the equipment to be confirmed, and determining attribute information of the equipment to be confirmed participating in similarity calculation of each equipment and weight corresponding to the attribute.
When a device login or access is detected, the attribute information of the device to be confirmed is obtained. Attributes whose weight is smaller than the preset weight value are filtered out and do not participate in the device similarity calculation.
Step S209, calculating the device similarity between the device to be confirmed and each device in the device library according to the weight corresponding to the attribute, the attribute value corresponding to the attribute, and the attribute similarity calculation method in the attribute information of the determined device to be confirmed participating in the similarity calculation of each device.
After the attribute information participating in the similarity calculation and the corresponding weights are determined, the device similarity between the device to be confirmed and each device in the device library is calculated from the weight corresponding to each attribute, the attribute value corresponding to each attribute, and the attribute similarity algorithm. The attribute similarity algorithm here may be the initial attribute similarity algorithm.
Specifically, the following method may be adopted to calculate the device similarity between the device to be confirmed and each device in the device library: calculating attribute similarity according to the attribute value corresponding to the attribute and an attribute similarity algorithm, and then calculating the equipment similarity between the equipment to be confirmed and each equipment in the equipment library by using the following formula (2):
[Formula (2) is given as image BDA0001350951960000081 in the original publication.]
where S_d represents the device similarity, the symbol given as image BDA0001350951960000082 represents the attribute similarity of the i-th attribute category, W_i represents the weight of the i-th attribute category, and N represents the number of attribute categories participating in the device similarity calculation.
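Formula (2) survives only as an image reference in this text. Based on the symbol definitions given for it, one plausible reading is a plain weighted sum of the attribute similarities. The symbol S_{a_i} for the attribute similarity of the i-th attribute category and the absence of any normalising factor are assumptions, not taken from the source:

```latex
S_d = \sum_{i=1}^{N} W_i \, S_{a_i}
```

Under this reading, if the weights sum to 1 and every attribute similarity lies in [0, 1], then S_d also lies in [0, 1], which makes comparison against a fixed preset threshold straightforward.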
Step S210, determining whether at least one device similarity is greater than or equal to a preset threshold, if yes, executing step S211; if not, go to step S212.
Each calculated device similarity is compared with a preset threshold. If at least one device similarity is greater than or equal to the preset threshold, the device to be confirmed is similar to at least one device in the device library; if every device similarity is smaller than the preset threshold, the device to be confirmed is not similar to any device in the device library. The preset threshold can be set according to practical experience and is not specified here.
Step S211, assigning the device fingerprint of the device with the highest device similarity among the at least one device similarity greater than or equal to the preset threshold to the device to be confirmed.
If at least one device similarity is larger than or equal to the preset threshold, the device to be confirmed is similar to at least one device in the device library, so that the device with the highest device similarity in the at least one device similarity larger than or equal to the preset threshold and the device to be confirmed can be determined to be the same device, and the device fingerprint of the device with the highest device similarity can be assigned to the device to be confirmed.
Step S212, calculate the device fingerprint according to the attribute information of the new device, and assign the device fingerprint to the new device.
If every device similarity is smaller than the preset threshold, the device to be confirmed is not similar to any device in the device library and is treated as a new device. A device fingerprint is therefore calculated from the attribute information of the new device and assigned to it. The device fingerprint may specifically include the following information: attribute coding (8 bits), timestamp coding (14 bits), and check code (2 bits).
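The decision in steps S210 to S212 can be sketched as below. The function name `resolve_fingerprint`, the dictionary layout, and the `make_new_fingerprint` callback are hypothetical; how a new fingerprint is actually encoded is left to the caller, since the text only lists its components.

```python
from typing import Callable, Dict, Tuple


def resolve_fingerprint(
    similarities: Dict[str, float],
    threshold: float,
    make_new_fingerprint: Callable[[], str],
) -> Tuple[str, bool]:
    """Return (fingerprint, is_existing) for the device to be confirmed.

    `similarities` maps each library fingerprint to its device similarity.
    """
    # Step S210: keep only library devices whose similarity reaches the threshold.
    candidates = {fp: s for fp, s in similarities.items() if s >= threshold}
    if candidates:
        # Step S211: reuse the fingerprint of the most similar device.
        best = max(candidates, key=candidates.get)
        return best, True
    # Step S212: nothing is similar enough, so treat it as a new device.
    return make_new_fingerprint(), False
```

Note that the comparison is inclusive, so a similarity exactly equal to the threshold counts as a match.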
According to the method provided by the embodiment of the invention, the weight value of each attribute is calculated from the device attribute data records by using the feature weight calculation algorithm instead of being set according to human experience. This avoids dependence on human experience, effectively improves the accuracy of the device similarity calculation, and improves the accuracy of judging whether two devices are the same device. Verifying the calculated weight values further improves the accuracy of the device similarity calculation. In addition, the weight value of each attribute is calculated from device attribute information belonging to the same operating system, which reduces the amount of calculation and improves the accuracy of the device similarity calculation.
Example three
Fig. 3 is a flow chart of a method for determining device fingerprints according to similarity according to a third embodiment of the present invention. As shown in fig. 3, the method comprises the steps of:
Step S300, extracting a plurality of device attribute data records belonging to the same operating system type in the device fingerprint library.
Each device attribute data record is composed of a device fingerprint and a plurality of attribute information corresponding to the device fingerprint, and each attribute information is composed of an attribute and an attribute value corresponding to the attribute.
Specifically, the attribute information includes: hardware attribute information, software attribute information, and/or behavior attribute information; wherein the hardware attribute information comprises one or more of the following information: MAC address, brand, model, IMEI, serial number; the software attribute information includes one or more of the following information: OS type, system settings, network settings, protocol fingerprint, browser attributes, geographic location; the behavior attribute information includes one or more of the following information: access frequency, access time and operation track.
The operating system types include: an iOS operating system, an Android operating system, a Windows operating system, and/or a Windows Phone operating system.
Step S301, a plurality of device attribute data records are preprocessed, and the preprocessed device attribute data records are obtained.
In order to more effectively calculate the weight value of the weight and the device similarity, after the plurality of device attribute data records are extracted, the plurality of device attribute data records also need to be preprocessed, and specifically, the plurality of device attribute data records may be preprocessed by the following method:
Method one: delete the device attribute data records whose number of attribute information items is less than a preset threshold.
Each device fingerprint normally corresponds to many items of attribute information. A device fingerprint with only a few items of attribute information can be regarded as an abnormal device fingerprint, so the device attribute data records whose number of attribute information items is less than the preset threshold are deleted.
Method two: match the device fingerprints corresponding to the plurality of device attribute data records against the device fingerprints in a blacklist; if a match is found, delete the matching device attribute data records.
Specifically, some abnormal device fingerprints may be recorded in the device fingerprint library, for example, device fingerprints corresponding to devices that logged in using stolen information, and such device attribute data records need to be deleted. More specifically, a blacklist storing the abnormal device fingerprints may be set up. After the plurality of device attribute data records are extracted, their device fingerprints are matched against the device fingerprints in the blacklist, and any device attribute data record whose device fingerprint matches is deleted.
Of course, the two methods may also be combined: after the device attribute data records whose number of attribute information items is less than the preset threshold are deleted, the device fingerprints of the remaining device attribute data records are matched against the device fingerprints in the blacklist, and any matching device attribute data records are deleted.
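The two preprocessing methods, applied one after the other, can be sketched as follows. The record layout (a dictionary with `fingerprint` and `attributes` keys) and the function name `preprocess` are hypothetical choices made for illustration only.

```python
from typing import Any, Dict, Iterable, List, Set

# Hypothetical record layout: {"fingerprint": str, "attributes": {name: value}}
Record = Dict[str, Any]


def preprocess(
    records: Iterable[Record], min_attributes: int, blacklist: Set[str]
) -> List[Record]:
    """Drop sparse records first, then records with blacklisted fingerprints."""
    kept = []
    for record in records:
        # First method: too few attribute entries marks the record as abnormal.
        if len(record["attributes"]) < min_attributes:
            continue
        # Second method: the fingerprint appears in the blacklist.
        if record["fingerprint"] in blacklist:
            continue
        kept.append(record)
    return kept
```

Checking the attribute count first means the blacklist lookup only runs on the records that survive the first filter, which mirrors the combined order described in the text.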
In the present embodiment, the first weight value is calculated by using a weight model algorithm, and specifically, the method described in step S302 to step S305 may be adopted:
Step S302, analyzing the preprocessed device attribute data records to obtain a first sample and a second sample.
The first sample is the device attribute data records of the same device at different moments, and the second sample is the device attribute data records of different devices.
The device attribute data records of the same device at different times may differ; for example, the attribute values of attributes in the behavior attribute information, such as access frequency, access time, and operation track, may change. The device attribute data records of different devices may differ significantly.
Step S303, calculating the attribute similarity of the same equipment and the attribute similarity between different equipment by using an initial attribute similarity algorithm.
The initial attribute similarity algorithm is set empirically, and includes: the equality judgment similarity algorithm, the cosine similarity algorithm, the shortest edit distance similarity algorithm, and the longest common substring similarity algorithm, and those skilled in the art can set the initial attribute similarity algorithm as needed, for example, set the initial attribute similarity algorithm as the shortest edit distance similarity algorithm.
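Two of the listed algorithms can be sketched as follows for string-valued attributes. The text does not say how an edit distance is turned into a similarity, so dividing by the length of the longer string is an assumption, and the function names are hypothetical.

```python
def equality_similarity(a: str, b: str) -> float:
    """Equality judgment: 1.0 for identical values, 0.0 otherwise."""
    return 1.0 if a == b else 0.0


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ch_a
                    current[j - 1] + 1,      # insert ch_b
                    previous[j - 1] + cost,  # substitute or keep
                )
            )
        previous = current
    return previous[-1]


def edit_distance_similarity(a: str, b: str) -> float:
    """Assumed normalisation: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty values are treated as identical
    return 1.0 - edit_distance(a, b) / longest
```

Equality judgment suits identifiers that either match or do not, while the edit-distance variant tolerates small changes such as a version suffix in a longer string.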
Step S304, calculating the average value of the attribute similarity of the same equipment and the attribute similarity between different equipment as the attribute similarity of each attribute.
Step S303 yields multiple attribute similarities for each attribute, both for the same device and between different devices. Here, the average of these attribute similarities is calculated and used as the attribute similarity of that attribute.
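The averaging in step S304 can be sketched as below. The input layout (one dictionary of attribute similarities per compared pair of records) and the function name are hypothetical, and whether same-device and different-device pairs are pooled or averaged separately is left to the caller, since the text does not settle it.

```python
from statistics import mean
from typing import Dict, List


def average_attribute_similarity(
    pair_similarities: List[Dict[str, float]]
) -> Dict[str, float]:
    """Average each attribute's similarity over all pairs that report it."""
    collected: Dict[str, List[float]] = {}
    for pair in pair_similarities:
        for attribute, value in pair.items():
            collected.setdefault(attribute, []).append(value)
    return {attribute: mean(values) for attribute, values in collected.items()}
```

An attribute missing from some pairs is averaged only over the pairs where it is present, rather than being counted as zero.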
Step S305, inputting the attribute similarity of each attribute into a weight model to obtain a first weight value of each attribute.
The weight model is obtained by training according to the attribute similarity of a large number of sample devices and the weight values corresponding to the attributes, and is a model related to the attribute similarity and the weight values, so that after the attribute similarity of each attribute is obtained through calculation, the attribute similarity of each attribute is input into the weight model, and the first weight value of each attribute can be obtained.
After the first weight values of the attributes are calculated, the first weight values need to be verified to verify whether the device similarity calculated by using the first weight values is higher, and a specific verification method may be as shown in step S306 to step S309:
Step S306, calculating a first device similarity of the same device and a second device similarity between different devices according to the first weight value.
Specifically, after the first weight value is calculated, the first device similarity of the same device and the second device similarity between different devices need to be calculated according to the first weight value, and here, the calculation may be performed by using an initially set attribute similarity algorithm.
In a preferred embodiment of the present invention, the attribute information may further be screened according to the first weight values. For example, the calculated first weight values are sorted in ascending order, the attribute information whose first weight value is smaller than a preset weight value is filtered out, and the attribute information whose first weight value is greater than or equal to the preset weight value is retained. The first device similarity of the same device and the second device similarity between different devices are then calculated according to the first weight values corresponding to the attributes in the retained attribute information. Screening by the first weight value reduces the number of attributes participating in the device similarity calculation and thus speeds up the calculation.
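A minimal sketch of this screening, followed by the device similarity calculation, is shown below. The function names are hypothetical, and the weighted-sum form of the device similarity is an assumption inferred from the symbol definitions, because the formula itself is only available as an image.

```python
from typing import Dict


def filter_by_weight(weights: Dict[str, float], min_weight: float) -> Dict[str, float]:
    """Keep only attributes whose weight reaches the preset weight value."""
    return {name: w for name, w in weights.items() if w >= min_weight}


def device_similarity(
    attribute_similarities: Dict[str, float], weights: Dict[str, float]
) -> float:
    """Assumed form: sum of weight * attribute similarity over kept attributes."""
    return sum(
        w * attribute_similarities[name]
        for name, w in weights.items()
        if name in attribute_similarities
    )
```

For example, with weights `{"mac": 0.5, "model": 0.25, "tz": 0.05}` and a preset weight value of 0.1, the `tz` attribute is dropped and only two products are summed.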
Specifically, the attribute similarity of the attributes may be calculated by using an attribute similarity algorithm, and then the first device similarity of the same device and the second device similarity between different devices may be calculated by using the following formula:
[The formula is given as image BDA0001350951960000101 in the original publication.]
where S_d represents the device similarity, the symbol given as image BDA0001350951960000102 represents the attribute similarity of the i-th attribute category, W_i represents the weight of the i-th attribute category, and N represents the number of attribute categories participating in the device similarity calculation.
Step S307, judging whether the first equipment similarity of the same equipment and the second equipment similarity between different equipment meet a first preset condition, if so, executing step S308; if not, go to step S309.
After the first device similarity of the same device and the second device similarity between different devices are calculated, it is determined whether they satisfy a first preset condition. The first preset condition is specifically: the first device similarity of the same device is greater than the initial device similarity of the same device, and the second device similarity between different devices is less than the initial device similarity between different devices. The initial device similarity of the same device and the initial device similarity between different devices are calculated from the initial weight values of the attributes.
If the first preset condition is satisfied, the first weight values make the records of the same device more similar and the records of different devices less similar than the initial weight values do, which indicates that the first weight values are better than the initial weight values. If the first preset condition is not satisfied, the first weight values bring no such improvement, and the initial weight values are regarded as better than the first weight values.
In step S308, the weight of the attribute is assigned as the first weight value.
If the first device similarity of the same device and the second device similarity between different devices meet a first preset condition, the weight of the attribute can be assigned as a first weight value.
In step S309, the weight of the attribute is assigned as the initial weight value.
If the first device similarity of the same device and the second device similarity between different devices do not meet a first preset condition, the weight of the attribute can be assigned as an initial weight value.
In order to obtain the optimal device similarity, the embodiment of the present invention may recalculate the attribute similarity of the attribute by changing the attribute similarity algorithm, and calculate the weight value according to the recalculated attribute similarity, specifically, refer to step S310 to step S312:
Step S310, changing the attribute similarity algorithm, and recalculating the attribute similarity of the same device and the attribute similarity between different devices by using the changed attribute similarity algorithm.
Assuming that the initial attribute similarity algorithm is the shortest edit distance similarity algorithm, the attribute similarity algorithm may be changed for individual attributes. For example, for the attribute CPU, the attribute similarity algorithm may be changed to the equality judgment similarity algorithm; for the attribute hostname, it may be changed to the cosine similarity algorithm. These are merely examples and have no limiting effect.
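Changing the algorithm per attribute can be sketched as a lookup table with a default. The table contents mirror the CPU and hostname example; the attribute keys, the function names, and the choice of character-count vectors for the cosine similarity are all assumptions made for illustration.

```python
import math
from collections import Counter
from typing import Callable, Dict

Similarity = Callable[[str, str], float]


def equality_similarity(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


def cosine_similarity(a: str, b: str) -> float:
    """Cosine of the angle between the character-count vectors of a and b."""
    if not a or not b:
        return 1.0 if a == b else 0.0
    counts_a, counts_b = Counter(a), Counter(b)
    dot = sum(counts_a[ch] * counts_b[ch] for ch in counts_a)
    norm_sq_a = sum(n * n for n in counts_a.values())
    norm_sq_b = sum(n * n for n in counts_b.values())
    return dot / math.sqrt(norm_sq_a * norm_sq_b)


# Hypothetical per-attribute overrides; unlisted attributes use the default.
OVERRIDES: Dict[str, Similarity] = {
    "cpu": equality_similarity,
    "hostname": cosine_similarity,
}


def pick_algorithm(attribute: str, default: Similarity) -> Similarity:
    """Return the override for this attribute, or the initial algorithm."""
    return OVERRIDES.get(attribute, default)
```

Keeping the initial algorithm as an explicit default makes it easy to revert an attribute if the second preset condition is not met after the change.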
Step S311, calculating an average of the attribute similarity of the same device and the attribute similarity between different devices after recalculation as the attribute similarity of each attribute.
Step S312, the attribute similarity of each attribute is used as a matching input item and input into the weight model, so as to obtain a second weight value of each attribute.
Steps S310 to S312 are similar to steps S303 to S305, and are not described herein again.
In order to determine whether the second weight value is better than the first weight value, the second weight value needs to be verified, and a specific verification method may be seen in steps S313 to S315:
Step S313, calculating a third device similarity of the same device and a fourth device similarity between different devices according to the second weight values of the attributes.
Step S314, determining whether a third device similarity of the same device and a fourth device similarity between different devices satisfy a second preset condition, if yes, executing step S315; if not, go to step S316.
The second preset condition is specifically as follows: the third device similarity of the same device is greater than the first device similarity of the same device, and the fourth device similarity between different devices is less than the second device similarity between different devices.
Steps S313 to S314 are similar to steps S306 to S307, and are not described herein again.
Step S315, assigns the weight of the attribute to a second weight value, and determines an attribute similarity algorithm for each attribute.
If the third device similarity of the same device and the fourth device similarity between different devices satisfy the second preset condition, the weight of the attribute is assigned as the second weight value, and the changed attribute similarity algorithm is determined to be the attribute similarity algorithm of each attribute.
Steps S310 to S315 are optional steps.
Step S316, obtaining attribute information of the device to be confirmed, and determining attribute information of the device to be confirmed participating in similarity calculation of each device, and a weight corresponding to the attribute.
When a device login or access is detected, the attribute information of the device to be confirmed is obtained. Attributes whose weight is smaller than the preset weight value are filtered out and do not participate in the device similarity calculation.
Step S317, calculating the device similarity between the device to be confirmed and each device in the device library according to the weight corresponding to the attribute, the attribute value corresponding to the attribute, and the attribute similarity calculation method in the attribute information of the determined device to be confirmed participating in the similarity calculation of each device.
After the attribute information participating in the similarity calculation and the corresponding weights are determined, the device similarity between the device to be confirmed and each device in the device library is calculated from the weight corresponding to each attribute, the attribute value corresponding to each attribute, and the attribute similarity algorithm. The attribute similarity algorithm here may be the initial attribute similarity algorithm or the changed attribute similarity algorithm.
Specifically, the following method may be adopted to calculate the device similarity between the device to be confirmed and each device in the device library: calculating attribute similarity according to the attribute value corresponding to the attribute and an attribute similarity algorithm, and then calculating the equipment similarity between the equipment to be confirmed and each equipment in the equipment library by using the following formula (2):
[Formula (2) is given as image BDA0001350951960000121 in the original publication.]
where S_d represents the device similarity, the symbol given as image BDA0001350951960000122 represents the attribute similarity of the i-th attribute category, W_i represents the weight of the i-th attribute category, and N represents the number of attribute categories participating in the device similarity calculation.
Step S318, determining whether at least one device similarity is greater than or equal to a preset threshold, if yes, executing step S319; if not, go to step S320.
Each calculated device similarity is compared with a preset threshold. If at least one device similarity is greater than or equal to the preset threshold, the device to be confirmed is similar to at least one device in the device library; if every device similarity is smaller than the preset threshold, the device to be confirmed is not similar to any device in the device library. The preset threshold can be set according to practical experience and is not specified here.
In step S319, the device fingerprint of the device with the highest device similarity among the at least one device similarity greater than or equal to the preset threshold is assigned to the device to be confirmed.
If at least one device similarity is larger than or equal to the preset threshold, the device to be confirmed is similar to at least one device in the device library, so that the device with the highest device similarity in the at least one device similarity larger than or equal to the preset threshold and the device to be confirmed can be determined to be the same device, and the device fingerprint of the device with the highest device similarity can be assigned to the device to be confirmed.
Step S320, calculating a device fingerprint according to the attribute information of the new device, and assigning the device fingerprint to the new device.
If every device similarity is smaller than the preset threshold, the device to be confirmed is not similar to any device in the device library and is treated as a new device. A device fingerprint is therefore calculated from the attribute information of the new device and assigned to it. The device fingerprint may specifically include the following information: attribute coding (8 bits), timestamp coding (14 bits), and check code (2 bits).
According to the method provided by the embodiment of the invention, the weight value of each attribute is calculated from the device attribute data records by using the weight model instead of being set according to human experience. This avoids dependence on human experience, effectively improves the accuracy of the device similarity calculation, and improves the accuracy of judging whether two devices are the same device. Verifying the calculated weight values and changing the attribute similarity algorithm further improves the accuracy of the device similarity calculation. In addition, the weight value of each attribute is calculated from device attribute information belonging to the same operating system, which also helps reduce the amount of calculation and improve the accuracy of the device similarity calculation.
Example four
Fig. 4 is a schematic structural diagram of an apparatus for determining device fingerprints according to similarity according to a fourth embodiment of the present invention. As shown in Fig. 4, the apparatus includes: an extraction module 400, a first calculation module 401, a first judgment module 402, a first assignment module 403, an acquisition module 404, a second calculation module 405, a second judgment module 406, and a second assignment module 407.
An extracting module 400, configured to extract multiple device attribute data records belonging to the same operating system type in the device fingerprint library. Each device attribute data record is composed of a device fingerprint and a plurality of attribute information corresponding to the device fingerprint, and each attribute information is composed of an attribute and an attribute value corresponding to the attribute.
The first calculating module 401 is configured to calculate a first weight value of each attribute by using a feature weight calculating algorithm or a weight model algorithm according to a plurality of device attribute data records.
A first determining module 402, configured to determine whether the calculated first weight value of each attribute meets a first preset condition.
The first assigning module 403 is configured to assign the weight of each attribute to a first weight value if the first weight value of each attribute meets a first preset condition.
An obtaining module 404, configured to obtain attribute information of a device to be confirmed.
The second calculating module 405 is configured to calculate the device similarity between the device to be confirmed and each device in the device library according to the weight corresponding to the attribute of the device to be confirmed, the attribute value corresponding to the attribute, and the attribute similarity algorithm.
And a second determining module 406, configured to determine whether each device similarity is greater than or equal to a preset threshold.
A second assigning module 407, configured to assign, to the device to be confirmed, the device fingerprint of the device corresponding to the highest device similarity that is greater than or equal to the preset threshold if at least one device similarity is greater than or equal to the preset threshold.
According to the device provided by the embodiment of the invention, the first weight value of each attribute is calculated by using the characteristic weight calculation algorithm or the weight model algorithm according to the equipment attribute data record instead of setting the weight value of the weight according to the experience of people, so that the dependence on the experience of people is avoided, the accuracy of calculating the similarity of the equipment is effectively improved, and the accuracy of judging whether the equipment is the same equipment is further improved; by verifying the calculated weight value, the accuracy of the similarity calculation of the equipment can be further improved. In addition, the weight value of each attribute is calculated by using the device attribute information belonging to the same operating system, so that the calculation amount is reduced, and the accuracy of device similarity calculation is improved.
Example five
Fig. 5 is a schematic structural diagram of an apparatus for determining device fingerprints according to similarity according to a fifth embodiment of the present invention. As shown in Fig. 5, the apparatus includes: an extraction module 500, a preprocessing module 501, a first calculation module 502, a first judgment module 503, a first assignment module 504, an acquisition module 505, a second calculation module 506, a second judgment module 507, a second assignment module 508, and a third calculation module 509.
The extracting module 500 is configured to extract multiple device attribute data records belonging to the same operating system type in the device fingerprint library.
Each device attribute data record is composed of a device fingerprint and a plurality of attribute information corresponding to the device fingerprint, and each attribute information is composed of an attribute and an attribute value corresponding to the attribute.
Specifically, the attribute information includes: hardware attribute information, software attribute information, and/or behavior attribute information; wherein the hardware attribute information comprises one or more of the following information: MAC address, brand, model, IMEI, serial number; the software attribute information includes one or more of the following information: OS type, system settings, network settings, protocol fingerprint, browser attributes, geographic location; the behavior attribute information includes one or more of the following information: access frequency, access time and operation track.
The operating system types include: an iOS operating system, an Android operating system, a Windows operating system, and/or a Windows Phone operating system.
The preprocessing module 501 is configured to preprocess the multiple device attribute data records to obtain preprocessed device attribute data records.
In a preferred embodiment of the present invention, the preprocessing module 501 is further configured to: and deleting the device attribute data records of which the number of the attribute information is less than a preset threshold value.
In addition, the preprocessing module 501 is further configured to: matching the device fingerprints corresponding to the plurality of device attribute data records with the device fingerprints in the blacklist respectively; and if so, deleting the device attribute data records matched with the device fingerprints in the blacklist.
A first calculating module 502, configured to calculate, according to the preprocessed device attribute data record, a first weight value of each attribute by using the following formula (1):
$$\mathrm{Gain}(T, A) = S(T) - \sum_{v \in \mathrm{Values}(A)} \frac{|T_v|}{|T|}\, S(T_v) \qquad (1)$$
where $T$ denotes the set of device attribute data records, $|T|$ denotes the number of device attribute data records contained in $T$, $\mathrm{Values}(A)$ denotes the set of all attribute values of attribute $A$, $T_v$ is the subset of device attribute data records in $T$ whose value of attribute $A$ is $v$, $|T_v|$ denotes the number of device attribute data records contained in $T_v$, $S(T)$ denotes the entropy of $T$, and $S(T_v)$ denotes the entropy of $T_v$;
$$S(T) = -\sum_{i=1}^{c} p_i \log p_i$$
where $c$ denotes the number of attribute categories in $T$, and $p_i$ denotes the proportion of the device attribute data records of the $i$-th attribute category in $T$ to the total device attribute data records in $T$;
$$S(T_v) = -\sum_{i=1}^{c_v} p_{vi} \log p_{vi}$$
where $c_v$ denotes the number of attribute categories in $T_v$ when the attribute value of attribute $A$ is $v$, and $p_{vi}$ denotes the proportion of the device attribute data records of the $i$-th attribute category in $T_v$ to the total device attribute data records in $T_v$.
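The weight calculation defined above (the entropy of T minus the size-weighted entropy of each subset T_v) can be sketched as follows. The sketch rests on assumptions that the text does not confirm: the "attribute category" of a record is taken to be its device fingerprint, the logarithm is taken to base 2, and the record layout and the function names `entropy` and `information_gain` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy of a list of category labels (base-2 logarithm assumed)."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(records, attribute, label_key="fingerprint"):
    """Entropy of T minus the |T_v|/|T|-weighted entropy of each subset T_v."""
    labels = [r[label_key] for r in records]
    # Group the category labels by the value v that each record has for the attribute.
    subsets = defaultdict(list)
    for r in records:
        subsets[r["attributes"].get(attribute)].append(r[label_key])
    weighted = sum(len(s) / len(records) * entropy(s) for s in subsets.values())
    return entropy(labels) - weighted
```

Under these assumptions, an attribute whose values split the records cleanly by device receives a high weight, while an attribute with the same value on every record receives a weight of zero.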
A first determining module 503, configured to analyze the preprocessed device attribute data records to obtain a first sample and a second sample, where the first sample is a device attribute data record of the same device at different times, and the second sample is a device attribute data record of different devices;
calculate a first device similarity of the same device and a second device similarity between different devices according to the first weight value (in a preferred embodiment of the present invention, the first determining module 503 may first screen the attribute information to retain only the attribute information whose first weight value is greater than or equal to a preset weight value, and then calculate the two device similarities from the first weight values corresponding to the attributes in the screened attribute information);
and determine whether the first device similarity of the same device and the second device similarity between different devices satisfy a first preset condition.
The first preset condition is specifically as follows: the first device similarity of the same device is greater than the initial device similarity of the same device, and the second device similarity between different devices is less than the initial device similarity between different devices, where the initial device similarity of the same device and the initial device similarity between different devices are calculated according to the initial weight values of the attributes.
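The first preset condition amounts to two strict comparisons, which can be sketched as follows. The function and parameter names are hypothetical and chosen only for illustration.

```python
def meets_first_preset_condition(first_same: float, second_diff: float,
                                 initial_same: float, initial_diff: float) -> bool:
    """True when the first weight values separate devices better than the initial weights.

    first_same / second_diff: similarities computed with the first weight values.
    initial_same / initial_diff: similarities computed with the initial weight values.
    """
    return first_same > initial_same and second_diff < initial_diff
```

For example, `meets_first_preset_condition(0.95, 0.20, 0.90, 0.35)` holds, because records of the same device have become more similar and records of different devices less similar than under the initial weights.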
The first assigning module 504 is configured to assign the weight of the attribute as a first weight value if the first device similarity of the same device and the second device similarity between different devices meet a first preset condition.
An obtaining module 505 is configured to obtain attribute information of a device to be confirmed.
A second calculating module 506, configured to determine the attribute information of the device to be confirmed that participates in each device similarity calculation, and the weights corresponding to the attributes;
and to calculate the device similarity between the device to be confirmed and each device in the device library according to the weights corresponding to the attributes, the attribute values corresponding to the attributes, and the attribute similarity algorithm, for the attribute information that participates in each device similarity calculation.
In a preferred embodiment of the present invention, the second calculation module 506 is further configured to: calculate the attribute similarity according to the attribute value corresponding to each attribute and the attribute similarity algorithm;
and calculate the device similarity between the device to be confirmed and each device in the device library by using the following formula (2):
$$S_d = \sum_{i=1}^{N} W_i \, S_{a_i} \qquad (2)$$
where $S_d$ denotes the device similarity, $S_{a_i}$ denotes the attribute similarity of the $i$-th attribute category, $W_i$ denotes the weight of the $i$-th attribute category, and $N$ denotes the number of attribute categories participating in the device similarity calculation.
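A minimal sketch of the device similarity calculation follows, assuming that formula (2) combines the per-attribute similarities as a plain weighted sum, which the extracted text does not confirm. The exact-match function `attribute_similarity` is a stand-in for whichever attribute similarity algorithm is actually used, and all names are hypothetical.

```python
def attribute_similarity(a, b) -> float:
    """Stand-in attribute similarity: 1.0 for identical, non-missing values, else 0.0."""
    return 1.0 if a is not None and a == b else 0.0


def device_similarity(candidate: dict, known: dict, weights: dict) -> float:
    """Weighted sum over the attributes participating in the calculation (assumed form)."""
    return sum(
        weight * attribute_similarity(candidate.get(attr), known.get(attr))
        for attr, weight in weights.items()
    )
```

Only the attributes listed in `weights` take part, which mirrors the step of first determining which attribute information participates in the calculation.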
A second determining module 507, configured to determine whether each device similarity is greater than or equal to a preset threshold;
A second assigning module 508, configured to assign, to the device to be confirmed, the device fingerprint of the device corresponding to the highest device similarity that is greater than or equal to the preset threshold if at least one device similarity is greater than or equal to the preset threshold.
A third calculating module 509, configured to calculate a device fingerprint according to the attribute information of the device to be confirmed and assign the device fingerprint to the device to be confirmed if every device similarity is smaller than the preset threshold.
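The decision made by the second determining, second assigning and third calculating modules can be sketched as follows. The text does not say how a new device fingerprint is calculated from the attribute information, so the SHA-256 hash of the sorted attributes used here is purely an assumption, as are the function names.

```python
import hashlib
import json


def compute_fingerprint(attributes: dict) -> str:
    """Hypothetical fingerprint: SHA-256 over a canonical JSON form of the attributes."""
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def resolve_fingerprint(attributes: dict, similarities: dict, threshold: float) -> str:
    """similarities maps each known device fingerprint to its device similarity."""
    if similarities:
        best_fp, best_score = max(similarities.items(), key=lambda item: item[1])
        # Reuse the fingerprint of the most similar device if it reaches the threshold.
        if best_score >= threshold:
            return best_fp
    # Every similarity is below the threshold: treat the device as new.
    return compute_fingerprint(attributes)
```

Taking the maximum first and comparing it with the threshold is equivalent to filtering by the threshold and then taking the highest remaining similarity.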
In the apparatus provided by this embodiment of the present invention, the weight values are calculated by a feature weight calculation algorithm rather than set according to human experience, which avoids dependence on human experience, effectively improves the accuracy of the device similarity calculation, and thereby improves the accuracy of determining whether two devices are the same device; verifying the calculated weight values further improves the accuracy of the device similarity calculation. In addition, calculating the weight value of each attribute from device attribute information belonging to the same operating system helps to reduce the amount of calculation and to improve the accuracy of the device similarity calculation.
Example six
Fig. 6 is a schematic structural diagram of an apparatus for determining device fingerprints according to similarity according to a sixth embodiment of the present invention. As shown in fig. 6, the apparatus includes: an extraction module 600, a preprocessing module 601, a first calculation module 602, a first judgment module 603, a first assignment module 604, a modification module 605, a fourth calculation module 606, a fifth calculation module 607, an input module 608, a sixth calculation module 609, a third judgment module 610, a determination module 611, an acquisition module 612, a second calculation module 613, a second judgment module 614, a second assignment module 615 and a third calculation module 616.
The extracting module 600 is configured to extract multiple device attribute data records belonging to the same operating system type in the device fingerprint library.
Each device attribute data record is composed of a device fingerprint and a plurality of attribute information corresponding to the device fingerprint, and each attribute information is composed of an attribute and an attribute value corresponding to the attribute.
Specifically, the attribute information includes: hardware attribute information, software attribute information, and/or behavior attribute information;
wherein the hardware attribute information comprises one or more of the following information: MAC address, brand, model, IMEI, serial number; the software attribute information includes one or more of the following information: OS type, system settings, network settings, protocol fingerprint, browser attributes, geographic location; the behavior attribute information includes one or more of the following information: frequency of access, time of access, and trajectory of operation.
The operating system types include: an iOS operating system, an Android operating system, a Windows operating system, and/or a Windows Phone operating system.
The preprocessing module 601 is configured to preprocess the multiple device attribute data records to obtain preprocessed device attribute data records.
In a preferred embodiment of the present invention, the preprocessing module 601 is further configured to delete device attribute data records that contain fewer items of attribute information than a preset threshold value.
In addition, the preprocessing module 601 is further configured to match the device fingerprints corresponding to the plurality of device attribute data records against the device fingerprints in a blacklist and, if a match is found, to delete the device attribute data records whose device fingerprints match the blacklist.
A first calculating module 602, configured to analyze the preprocessed device attribute data records to obtain a first sample and a second sample, where the first sample is a device attribute data record of the same device at different times, and the second sample is a device attribute data record of different devices;
calculate the attribute similarity of the same device and the attribute similarity between different devices by using an initial attribute similarity algorithm;
calculate the average of the attribute similarity of the same device and the attribute similarity between different devices as the attribute similarity of each attribute;
and input the attribute similarity of each attribute into a weight model to obtain a first weight value of each attribute, where the weight model is trained on the attribute similarities of a large number of sample devices and the weight values corresponding to the attributes.
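The averaging step that produces the input to the weight model can be sketched as follows. The exact-match function stands in for the initial attribute similarity algorithm, the samples are represented as pairs of attribute maps, and all names are hypothetical; the trained weight model itself is not specified in the text and is therefore not sketched.

```python
from statistics import mean


def exact_match(a, b) -> float:
    """Stand-in for the initial attribute similarity algorithm."""
    return 1.0 if a is not None and a == b else 0.0


def averaged_attribute_similarity(same_pairs, diff_pairs, attribute, similarity=exact_match):
    """Average of the same-device and different-device similarity for one attribute.

    same_pairs: pairs of attribute maps from the same device at different times.
    diff_pairs: pairs of attribute maps from different devices.
    Both lists must be non-empty; statistics.mean raises StatisticsError otherwise.
    """
    same = mean(similarity(x.get(attribute), y.get(attribute)) for x, y in same_pairs)
    diff = mean(similarity(x.get(attribute), y.get(attribute)) for x, y in diff_pairs)
    return (same + diff) / 2
```

The resulting value per attribute is what would be passed to the weight model to obtain the first weight value.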
The first determining module 603 is configured to analyze the preprocessed device attribute data records to obtain a first sample and a second sample, where the first sample is a device attribute data record of the same device at different times, and the second sample is a device attribute data record of different devices;
calculate a first device similarity of the same device and a second device similarity between different devices according to the first weight value (in a preferred embodiment of the present invention, the first determining module 603 may first screen the attribute information to retain only the attribute information whose first weight value is greater than or equal to a preset weight value, and then calculate the two device similarities from the first weight values corresponding to the attributes in the screened attribute information);
and determine whether the first device similarity of the same device and the second device similarity between different devices satisfy a first preset condition.
The first preset condition is specifically as follows: the first device similarity of the same device is greater than the initial device similarity of the same device, and the second device similarity between different devices is less than the initial device similarity between different devices, where the initial device similarity of the same device and the initial device similarity between different devices are calculated according to the initial weight values of the attributes.
The first assigning module 604 is configured to assign the weight of the attribute to a first weight value if the first device similarity of the same device and the second device similarity between different devices meet a first preset condition.
A change module 605 for changing the attribute similarity algorithm.
A fourth calculating module 606, configured to recalculate the attribute similarity of the same device and the attribute similarity between different devices by using the changed attribute similarity algorithm.
A fifth calculating module 607, configured to calculate an average value of the attribute similarity of the same device and the attribute similarity between different devices after recalculation as the attribute similarity of each attribute.
The input module 608 is configured to input the attribute similarity of each attribute as a matching input item into the weight model, so as to obtain a second weight value of each attribute.
A sixth calculating module 609, configured to calculate a third device similarity of the same device and a fourth device similarity between different devices according to the second weight values of the attributes.
The third determining module 610 is configured to determine whether a third device similarity of the same device and a fourth device similarity between different devices meet a second preset condition.
The first assigning module 604 is further configured to assign the weight of the attribute as the second weight value if the third device similarity of the same device and the fourth device similarity between different devices satisfy the second preset condition.
The determining module 611 is configured to determine the attribute similarity algorithm of each attribute if a third device similarity of the same device and a fourth device similarity between different devices satisfy a second preset condition.
The obtaining module 612 is configured to obtain attribute information of a device to be confirmed.
The second calculating module 613 is configured to determine the attribute information of the device to be confirmed that participates in each device similarity calculation, and the weights corresponding to the attributes;
and to calculate the device similarity between the device to be confirmed and each device in the device library according to the weights corresponding to the attributes, the attribute values corresponding to the attributes, and the attribute similarity algorithm, for the attribute information that participates in each device similarity calculation.
In a preferred embodiment of the present invention, the second calculating module 613 is further configured to: calculate the attribute similarity according to the attribute value corresponding to each attribute and the attribute similarity algorithm;
and calculate the device similarity between the device to be confirmed and each device in the device library by using the following formula (2):
$$S_d = \sum_{i=1}^{N} W_i \, S_{a_i} \qquad (2)$$
where $S_d$ denotes the device similarity, $S_{a_i}$ denotes the attribute similarity of the $i$-th attribute category, $W_i$ denotes the weight of the $i$-th attribute category, and $N$ denotes the number of attribute categories participating in the device similarity calculation.
The second determining module 614 is configured to determine whether each device similarity is greater than or equal to a preset threshold.
The second assigning module 615 is configured to assign, to the device to be confirmed, the device fingerprint of the device corresponding to the highest device similarity that is greater than or equal to the preset threshold if at least one device similarity is greater than or equal to the preset threshold.
A third calculating module 616, configured to calculate a device fingerprint according to the attribute information of the device to be confirmed and assign the device fingerprint to the device to be confirmed if every device similarity is smaller than the preset threshold.
In the apparatus provided by this embodiment of the present invention, the weight values are calculated by a weight model rather than set according to human experience, which avoids dependence on human experience, effectively improves the accuracy of the device similarity calculation, and thereby improves the accuracy of determining whether two devices are the same device; verifying the calculated weight values and changing the attribute similarity algorithm further improve the accuracy of the device similarity calculation. In addition, calculating the weight value of each attribute from device attribute information belonging to the same operating system reduces the amount of calculation and improves the accuracy of the device similarity calculation.
Example seven
The embodiment of the present application provides a non-volatile computer storage medium, where the computer storage medium stores at least one executable instruction, and the computer executable instruction may execute the method for determining the device fingerprint according to the similarity in any of the above method embodiments.
Example eight
Fig. 7 is a schematic structural diagram of a server according to an eighth embodiment of the present invention. The specific embodiments of the present invention do not limit the specific implementation of the server.
As shown in fig. 7, the server may include: a processor 702, a communication interface 704, a memory 706, and a communication bus 708.
Wherein:
the processor 702, communication interface 704, and memory 706 communicate with each other via a communication bus 708.
A communication interface 704 for communicating with network elements of other devices, such as clients or other servers.
The processor 702, configured to execute the program 710, may specifically perform the relevant steps in the above method embodiment for determining a device fingerprint according to similarity.
In particular, the program 710 may include program code that includes computer operating instructions.
The processor 702 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement an embodiment of the invention. The server comprises one or more processors, which may be of the same type, such as one or more CPUs, or of different types, such as one or more CPUs and one or more ASICs.
The memory 706 is used for storing a first data set, a second data set and a program 710. The memory 706 may comprise high-speed RAM, and may also include non-volatile memory, such as at least one disk memory.
The program 710 may be specifically configured to cause the processor 702 to execute the methods in the first to third embodiments.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, the present invention is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the teachings of the present invention as described herein, and any descriptions of specific languages are provided above to disclose the best mode of the invention.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the invention, various features of the invention are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various inventive aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that the invention as claimed requires more features than are expressly recited in each claim. Rather, as the following claims reflect, inventive aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this invention.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise.
Furthermore, those skilled in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the following claims, any of the claimed embodiments may be used in any combination.
It should be noted that the above-mentioned embodiments illustrate rather than limit the invention, and that those skilled in the art will be able to design alternative embodiments without departing from the scope of the appended claims. In the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering. These words may be interpreted as names.

Claims (56)

1. A method for determining a device fingerprint from similarity, the method comprising:
extracting a plurality of device attribute data records belonging to the same operating system type in a device fingerprint library, wherein each device attribute data record consists of a device fingerprint and a plurality of attribute information corresponding to the device fingerprint, and each attribute information consists of an attribute and an attribute value corresponding to the attribute;
calculating a first weight value of each attribute according to the plurality of device attribute data records by using the following formula (1):
$$\mathrm{Gain}(T, A) = S(T) - \sum_{v \in \mathrm{Values}(A)} \frac{|T_v|}{|T|}\, S(T_v) \qquad (1)$$
wherein $T$ denotes the set of device attribute data records, $|T|$ denotes the number of device attribute data records contained in $T$, $\mathrm{Values}(A)$ denotes the set of all attribute values of attribute $A$, $T_v$ is the subset of device attribute data records in $T$ whose value of attribute $A$ is $v$, $|T_v|$ denotes the number of device attribute data records contained in $T_v$, $S(T)$ denotes the entropy of $T$, and $S(T_v)$ denotes the entropy of $T_v$;
$$S(T) = -\sum_{i=1}^{c} p_i \log p_i$$
wherein $c$ denotes the number of attribute categories in $T$, and $p_i$ denotes the proportion of the device attribute data records of the $i$-th attribute category in $T$ to the total device attribute data records in $T$;
$$S(T_v) = -\sum_{i=1}^{c_v} p_{vi} \log p_{vi}$$
wherein $c_v$ denotes the number of attribute categories in $T_v$ when the attribute value of attribute $A$ is $v$, and $p_{vi}$ denotes the proportion of the device attribute data records of the $i$-th attribute category in $T_v$ to the total device attribute data records in $T_v$;
judging whether the calculated first weight value of each attribute meets a first preset condition or not, and if so, assigning the weight of the attribute as the first weight value;
acquiring attribute information of a device to be confirmed, and calculating the device similarity between the device to be confirmed and each device in the device library according to the weight corresponding to each attribute of the device to be confirmed, the attribute value corresponding to the attribute, and an attribute similarity algorithm;
judging whether each device similarity is greater than or equal to a preset threshold;
and if at least one device similarity is greater than or equal to the preset threshold, assigning to the device to be confirmed the device fingerprint of the device having the highest device similarity among the device similarities that are greater than or equal to the preset threshold.
2. The method of claim 1, further comprising: if every device similarity is smaller than the preset threshold, calculating a device fingerprint according to the attribute information of the device to be confirmed, and assigning the device fingerprint to the device to be confirmed.
3. The method according to claim 1 or 2, wherein the determining whether the calculated first weight value of each attribute meets a first preset condition further comprises:
analyzing the plurality of device attribute data records to obtain a first sample and a second sample, wherein the first sample consists of device attribute data records of the same device at different times, and the second sample consists of device attribute data records of different devices;
calculating a first device similarity of the same device and a second device similarity between different devices according to the first weight value;
judging whether the first device similarity of the same device and the second device similarity between different devices meet the first preset condition;
and if so, assigning the weight of the attribute as the first weight value.
4. The method of claim 3, wherein calculating a first device similarity for the same device and a second device similarity between different devices according to the first weight value further comprises:
screening the attribute information according to the first weight value to obtain the attribute information of which the first weight value is greater than or equal to a preset weight value;
and calculating the first device similarity of the same device and the second device similarity between different devices according to the first weight values corresponding to the attributes in the screened attribute information.
5. The method according to claim 3, wherein the first preset condition is specifically: the first device similarity of the same device is greater than the initial device similarity of the same device, and the second device similarity between different devices is less than the initial device similarity between different devices;
wherein the initial device similarity of the same device and the initial device similarity between different devices are calculated according to the initial weight values of the attributes.
6. The method according to claim 1 or 2, wherein the calculating the device similarity of the device to be confirmed and each device in the device library according to the weight corresponding to the attribute of the device to be confirmed, the attribute value corresponding to the attribute, and the attribute similarity algorithm further comprises:
determining the attribute information of the device to be confirmed that participates in each device similarity calculation, and the weights corresponding to the attributes;
and calculating the device similarity between the device to be confirmed and each device in the device library according to the weights corresponding to the attributes, the attribute values corresponding to the attributes, and the attribute similarity algorithm, for the determined attribute information that participates in each device similarity calculation.
7. The method according to claim 6, wherein the calculating the device similarity between the device to be confirmed and each device in the device library according to the weight corresponding to the attribute, the attribute value corresponding to the attribute, and the attribute similarity calculation method in the attribute information of the determined device to be confirmed participating in the similarity calculation of each device further comprises:
calculating the attribute similarity according to the attribute value corresponding to each attribute and the attribute similarity algorithm;
calculating the device similarity between the device to be confirmed and each device in the device library by using the following formula (2):
$$S_d = \sum_{i=1}^{N} W_i \, S_{a_i} \qquad (2)$$
wherein $S_d$ denotes the device similarity, $S_{a_i}$ denotes the attribute similarity of the $i$-th attribute category, $W_i$ denotes the weight of the $i$-th attribute category, and $N$ denotes the number of attribute categories participating in the device similarity calculation.
8. The method according to claim 1 or 2, wherein before calculating the first weight value of each attribute according to the plurality of device attribute data records using the following formula (1), the method further comprises:
preprocessing the plurality of device attribute data records to obtain preprocessed device attribute data records;
the calculating the first weight value of each attribute by using a feature weight calculation algorithm or a weight model algorithm according to the plurality of device attribute data records further includes:
calculating the first weight value of each attribute by using the feature weight calculation algorithm or the weight model algorithm according to the preprocessed device attribute data records.
9. The method of claim 8, wherein preprocessing the plurality of device attribute data records further comprises: and deleting the device attribute data records of which the number of the attribute information is less than a preset threshold value.
10. The method of claim 8, wherein preprocessing the plurality of device attribute data records further comprises:
matching the device fingerprints corresponding to the plurality of device attribute data records with the device fingerprints in the blacklist respectively;
and if a match is found, deleting the device attribute data records whose device fingerprints match the device fingerprints in the blacklist.
11. The method according to claim 1 or 2, wherein the attribute information comprises: hardware attribute information, software attribute information, and/or behavior attribute information;
wherein the hardware attribute information comprises one or more of the following information: MAC address, brand, model, IMEI, serial number;
the software attribute information includes one or more of the following information: OS type, system settings, network settings, protocol fingerprint, browser attributes, geographic location;
the behavior attribute information includes one or more of the following information: access frequency, access time and operation track.
12. The method of claim 1 or 2, wherein the operating system type comprises: an iOS operating system, an Android operating system, a Windows operating system, and/or a Windows Phone operating system.
13. A method for determining a device fingerprint from similarity, the method comprising:
extracting a plurality of device attribute data records belonging to the same operating system type in a device fingerprint library, wherein each device attribute data record consists of a device fingerprint and a plurality of attribute information corresponding to the device fingerprint, and each attribute information consists of an attribute and an attribute value corresponding to the attribute;
analyzing a plurality of equipment attribute data records to obtain a first sample and a second sample, wherein the first sample is the equipment attribute data record of the same equipment at different moments, and the second sample is the equipment attribute data record of different equipment;
calculating the attribute similarity of the same equipment and the attribute similarity between different equipment by using an initial attribute similarity algorithm;
calculating the average value of the attribute similarity of the same device and the attribute similarity between different devices, and taking the average value as the attribute similarity of each attribute;
inputting the attribute similarity of each attribute into a weight model to obtain a first weight value of each attribute, wherein the weight model is obtained by training according to the attribute similarity of a large number of sample devices and the weight values corresponding to the attributes;
judging whether the calculated first weight value of each attribute meets a first preset condition or not, and if so, assigning the weight of the attribute as the first weight value;
acquiring attribute information of the equipment to be confirmed, and calculating the equipment similarity of the equipment to be confirmed and each equipment in the equipment library according to the weight corresponding to the attribute of the equipment to be confirmed, the attribute value corresponding to the attribute and an attribute similarity algorithm;
judging whether the similarity of each device is greater than or equal to a preset threshold value or not;
and if at least one device similarity is larger than or equal to a preset threshold, assigning the device fingerprint of the device with the highest device similarity in the at least one device similarity larger than or equal to the preset threshold to the device to be confirmed.
14. The method of claim 13, further comprising: if every device similarity is smaller than the preset threshold, calculating a device fingerprint according to the attribute information of the device to be confirmed, and assigning that device fingerprint to the device to be confirmed.
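The matching decision of claims 13 and 14 can be sketched as follows. This is a minimal illustration, not the patented implementation: the toy similarity function `fraction_equal`, the function name `assign_fingerprint`, and the use of a SHA-256 hash of the sorted attributes as the newly calculated fingerprint are all assumptions; the claims do not specify how a new fingerprint is derived.

```python
import hashlib
from typing import Callable, Dict, Optional

Attributes = Dict[str, str]


def fraction_equal(a: Attributes, b: Attributes) -> float:
    """Toy similarity (assumption): share of attributes with equal values."""
    keys = set(a) | set(b)
    if not keys:
        return 0.0
    return sum(1 for k in keys if k in a and k in b and a[k] == b[k]) / len(keys)


def assign_fingerprint(candidate: Attributes,
                       library: Dict[str, Attributes],
                       similarity: Callable[[Attributes, Attributes], float],
                       threshold: float) -> str:
    """Reuse the best-matching fingerprint, or derive a new one."""
    best_fingerprint: Optional[str] = None
    best_score = 0.0
    for fingerprint, known in library.items():
        score = similarity(candidate, known)
        # Only devices at or above the threshold are eligible (claim 13).
        if score >= threshold and (best_fingerprint is None or score > best_score):
            best_fingerprint, best_score = fingerprint, score
    if best_fingerprint is not None:
        return best_fingerprint
    # No device reached the threshold (claim 14): derive a new fingerprint.
    # Hashing the sorted attributes is an illustrative choice only.
    canonical = repr(sorted(candidate.items()))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


library = {
    "fp-1": {"brand": "X", "model": "M1"},
    "fp-2": {"brand": "X", "model": "M2"},
}
matched = assign_fingerprint({"brand": "X", "model": "M2"}, library, fraction_equal, 0.8)
created = assign_fingerprint({"brand": "Y", "model": "Z"}, library, fraction_equal, 0.8)
```

In this example `matched` is the existing fingerprint `fp-2`, while `created` is a freshly derived value because no library device reaches the threshold.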
15. The method according to claim 13 or 14, wherein the determining whether the calculated first weight value of each attribute meets a first preset condition further comprises:
analyzing a plurality of equipment attribute data records to obtain a first sample and a second sample, wherein the first sample is the equipment attribute data record of the same equipment at different moments, and the second sample is the equipment attribute data record of different equipment;
calculating a first device similarity of the same device and a second device similarity between different devices according to the first weight value;
judging whether the first equipment similarity of the same equipment and the second equipment similarity between different equipment meet a first preset condition or not;
and if so, assigning the weight of the attribute as a first weight value.
16. The method of claim 15, wherein calculating a first device similarity for the same device and a second device similarity between different devices according to the first weight value further comprises:
screening the attribute information according to the first weight value to obtain the attribute information of which the first weight value is greater than or equal to a preset weight value;
and calculating the first equipment similarity of the same equipment and the second equipment similarity between different equipment according to the first weight value corresponding to the attribute in the screened attribute information.
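The screening step of claim 16 amounts to keeping only the attributes whose first weight value reaches a preset weight value before the device similarities are calculated. A minimal sketch, in which the function name `screen_attributes` and the example weights are hypothetical:

```python
from typing import Dict


def screen_attributes(weights: Dict[str, float], min_weight: float) -> Dict[str, float]:
    """Keep attributes whose weight is greater than or equal to min_weight."""
    return {name: w for name, w in weights.items() if w >= min_weight}


first_weights = {"mac": 0.45, "model": 0.30, "access_time": 0.05}
screened = screen_attributes(first_weights, min_weight=0.10)
```

With these example values, `access_time` is dropped and only `mac` and `model` take part in the subsequent similarity calculation.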
17. The method according to claim 15, wherein the first preset condition is specifically: the first device similarity of the same device is greater than the initial device similarity of the same device, and the second device similarity between different devices is less than the initial device similarity between different devices;
wherein the initial device similarity of the same device and the initial device similarity between different devices are calculated according to the initial weight values of the attributes.
18. The method of claim 13, wherein before obtaining attribute information of a device to be validated, the method further comprises:
changing an attribute similarity algorithm, and recalculating the attribute similarity of the same equipment and the attribute similarity between different equipment by using the changed attribute similarity algorithm;
calculating the average value of the recalculated attribute similarity of the same device and the recalculated attribute similarity between different devices, and taking the average value as the attribute similarity of each attribute;
inputting the attribute similarity of each attribute as a matching input item into the weight model to obtain a second weight value of each attribute;
calculating third equipment similarity of the same equipment and fourth equipment similarity between different equipment according to the second weight values of the attributes;
judging whether the third equipment similarity of the same equipment and the fourth equipment similarity between different equipment meet a second preset condition or not;
and if so, assigning the weight of the attribute as a second weight value, and determining the attribute similarity algorithm of each attribute.
19. The method according to claim 18, wherein the second preset condition is specifically: the third device similarity of the same device is greater than the first device similarity of the same device, and the fourth device similarity between different devices is less than the second device similarity between different devices.
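The first preset condition (claim 17) and the second preset condition (claim 19) share one shape: a new set of weights is accepted only if it raises the similarity measured for the same device and lowers the similarity measured between different devices, relative to the previous round. The sketch below illustrates that check. The names `improves_separation` and `choose_weights` are hypothetical, and comparing each candidate against the most recently accepted round is a simplifying assumption, since the claims compare the second round against the first.

```python
from typing import Dict, List, Tuple

# (weights, same-device similarity, different-device similarity)
Candidate = Tuple[Dict[str, float], float, float]


def improves_separation(same_new: float, diff_new: float,
                        same_prev: float, diff_prev: float) -> bool:
    """True when same-device similarity rises and cross-device similarity falls."""
    return same_new > same_prev and diff_new < diff_prev


def choose_weights(baseline: Candidate, candidates: List[Candidate]) -> Dict[str, float]:
    """Walk the candidate rounds in order, keeping each one that improves separation."""
    current = baseline
    for candidate in candidates:
        if improves_separation(candidate[1], candidate[2], current[1], current[2]):
            current = candidate
    return current[0]


initial = ({"mac": 0.5, "model": 0.5}, 0.80, 0.40)
first = ({"mac": 0.7, "model": 0.3}, 0.90, 0.30)
second = ({"mac": 0.8, "model": 0.2}, 0.85, 0.25)
chosen = choose_weights(initial, [first, second])
```

With these made-up numbers the first weight values are accepted, and the second weight values are rejected because the same-device similarity drops from 0.90 to 0.85.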
20. The method according to claim 13 or 14, wherein the calculating the device similarity of the device to be confirmed and each device in the device library according to the weight corresponding to the attribute of the device to be confirmed, the attribute value corresponding to the attribute, and the attribute similarity algorithm further comprises:
determining attribute information of the equipment to be confirmed participating in similarity calculation of each equipment and weight corresponding to the attribute;
and calculating the equipment similarity between the equipment to be confirmed and each equipment in the equipment library according to the weight corresponding to the attribute, the attribute value corresponding to the attribute and the attribute similarity calculation method in the attribute information of the equipment to be confirmed participating in the similarity calculation of each equipment.
21. The method according to claim 20, wherein the calculating the device similarity between the device to be confirmed and each device in the device library according to the weight corresponding to the attribute, the attribute value corresponding to the attribute, and the attribute similarity calculation method in the attribute information of the determined device to be confirmed participating in the similarity calculation of each device further comprises:
calculating attribute similarity according to the attribute value corresponding to the attribute and an attribute similarity algorithm;
calculating the device similarity of the device to be confirmed and each device in the device library by using the following formula (2):
Figure FDA0002384309040000051
wherein $S_d$ represents the device similarity, the symbol shown in
Figure FDA0002384309040000052
represents the attribute similarity of the i-th attribute category, $W_i$ represents the weight of the i-th attribute category, and $N$ represents the number of attribute categories participating in the device similarity calculation.
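The image of formula (2) is not reproduced in this text. Its symbol definitions (a device similarity, per-category attribute similarities, per-category weights, and a count N of participating categories) suggest a weighted combination of attribute similarities. The sketch below assumes a plain weighted sum over the attributes present on both devices; whether the original formula also normalizes by the total weight is unknown. The exact-match comparison and all names are hypothetical.

```python
from typing import Callable, Dict, Optional

Attributes = Dict[str, str]
Comparator = Callable[[str, str], float]


def exact_match(a: str, b: str) -> float:
    """Simplest attribute similarity algorithm (assumption): 1.0 if equal, else 0.0."""
    return 1.0 if a == b else 0.0


def device_similarity(candidate: Attributes, known: Attributes,
                      weights: Dict[str, float],
                      algorithms: Optional[Dict[str, Comparator]] = None) -> float:
    """Assumed reading of formula (2): sum of weight * attribute similarity."""
    algorithms = algorithms or {}
    total = 0.0
    for attribute, weight in weights.items():
        # Only attributes present on both devices participate in the calculation.
        if attribute not in candidate or attribute not in known:
            continue
        compare = algorithms.get(attribute, exact_match)
        total += weight * compare(candidate[attribute], known[attribute])
    return total


weights = {"mac": 0.5, "model": 0.3, "os": 0.2}
score = device_similarity(
    {"mac": "aa", "model": "M1", "os": "10"},
    {"mac": "aa", "model": "M2", "os": "10"},
    weights,
)
```

Passing a per-attribute `algorithms` mapping mirrors the idea that each attribute may use its own attribute similarity algorithm, as in claim 18.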
22. The method according to claim 13 or 14, wherein before calculating the first weight value of each attribute using a feature weight calculation algorithm or a weight model algorithm from a plurality of device attribute data records, the method further comprises:
preprocessing a plurality of equipment attribute data records to obtain preprocessed equipment attribute data records;
the calculating the first weight value of each attribute by using a feature weight calculation algorithm or a weight model algorithm according to the plurality of device attribute data records further includes:
and calculating a first weight value of each attribute by using a characteristic weight calculation algorithm or a weight model algorithm according to the preprocessed equipment attribute data records.
23. The method of claim 22, wherein preprocessing the plurality of device attribute data records further comprises: and deleting the device attribute data records of which the number of the attribute information is less than a preset threshold value.
24. The method of claim 22, wherein preprocessing the plurality of device attribute data records further comprises:
matching the device fingerprints corresponding to the plurality of device attribute data records with the device fingerprints in the blacklist respectively;
and if a match is found, deleting the device attribute data records whose device fingerprints match device fingerprints in the blacklist.
25. The method according to claim 13 or 14, wherein the attribute information comprises: hardware attribute information, software attribute information, and/or behavior attribute information;
wherein the hardware attribute information comprises one or more of the following information: MAC address, brand, model, IMEI, serial number;
the software attribute information includes one or more of the following information: OS type, system settings, network settings, protocol fingerprint, browser attributes, geographic location;
the behavior attribute information includes one or more of the following information: access frequency, access time and operation track.
26. The method of claim 13 or 14, wherein the operating system type comprises: an iOS operating system, an Android operating system, a Windows operating system, and/or a Windows Phone operating system.
27. An apparatus for determining a fingerprint of a device based on similarity, the apparatus comprising:
an extraction module, configured to extract a plurality of device attribute data records belonging to the same operating system type from a device fingerprint library, wherein each device attribute data record consists of a device fingerprint and a plurality of pieces of attribute information corresponding to the device fingerprint, and each piece of attribute information consists of an attribute and an attribute value corresponding to the attribute;
a first calculating module, configured to calculate a first weight value of each attribute according to the multiple device attribute data records by using the following formula (1):
Figure FDA0002384309040000053
wherein $T$ represents the set of device attribute data records, $|T|$ represents the number of device attribute data records contained in $T$, $Values(A)$ represents the set of all attribute values of attribute $A$, $T_v$ is the subset of device attribute data records in $T$ whose value of attribute $A$ is $v$, $|T_v|$ represents the number of device attribute data records contained in $T_v$, $S(T)$ represents the entropy of $T$, and $S(T_v)$ represents the entropy of $T_v$;
Figure FDA0002384309040000061
wherein $c$ represents the number of attribute categories in $T$, and $p_i$ represents the proportion of the device attribute data records of the i-th attribute category in $T$ to the total device attribute data records in $T$;
Figure FDA0002384309040000062
wherein $c_v$ represents the number of attribute categories in $T_v$ when the attribute value of attribute $A$ is $v$, and $p_{vi}$ represents the proportion of the device attribute data records of the i-th attribute category in $T_v$ to the total device attribute data records in $T_v$;
the first judgment module is used for judging whether the calculated first weight value of each attribute meets a first preset condition or not;
the first assignment module is used for assigning the weight of each attribute to be a first weight value if the first weight value of each attribute accords with a first preset condition;
the acquisition module is used for acquiring attribute information of the equipment to be confirmed;
the second calculation module is used for calculating the equipment similarity between the equipment to be confirmed and each equipment in the equipment library according to the weight corresponding to the attribute of the equipment to be confirmed, the attribute value corresponding to the attribute and an attribute similarity calculation method;
the second judging module is used for judging whether the similarity of each device is greater than or equal to a preset threshold value or not;
and the second assignment module is used for assigning the device fingerprint of the device with the highest device similarity in the at least one device similarity which is greater than or equal to the preset threshold value to the device to be confirmed if at least one device similarity is greater than or equal to the preset threshold value.
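The symbol definitions given for formula (1) in claim 27 read like the standard information gain: the entropy of T minus the size-weighted entropies of the subsets T_v, one per value v of attribute A. Because the formula images are not reproduced in this text, that reading is an assumption. The sketch below follows it, additionally assuming base-2 logarithms and assuming that the "attribute category" of a record is its device fingerprint; the record layout and function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List


def entropy(labels: List[str]) -> float:
    """Shannon entropy of a list of category labels (base 2 assumed)."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(records: List[Dict], attribute: str,
                     label_key: str = "fingerprint") -> float:
    """Assumed formula (1): S(T) minus the sum over v of |T_v|/|T| * S(T_v)."""
    if not records:
        return 0.0
    base = entropy([r[label_key] for r in records])
    # Partition T into the subsets T_v by the value of the chosen attribute.
    subsets: Dict[object, List[str]] = {}
    for r in records:
        subsets.setdefault(r["attributes"].get(attribute), []).append(r[label_key])
    remainder = sum(len(s) / len(records) * entropy(s) for s in subsets.values())
    return base - remainder


records = [
    {"fingerprint": "A", "attributes": {"mac": "m1", "os": "10"}},
    {"fingerprint": "A", "attributes": {"mac": "m1", "os": "10"}},
    {"fingerprint": "B", "attributes": {"mac": "m2", "os": "10"}},
    {"fingerprint": "B", "attributes": {"mac": "m2", "os": "10"}},
]
mac_weight = information_gain(records, "mac")
os_weight = information_gain(records, "os")
```

In this toy data set the MAC address separates the two devices completely and receives the larger weight, while the OS version is identical everywhere and contributes nothing.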
28. The apparatus of claim 27, further comprising: and the third calculating module is used for calculating the device fingerprint according to the attribute information of the device to be confirmed and assigning the device fingerprint to the device to be confirmed if each similarity is smaller than the preset threshold value.
29. The apparatus according to claim 27 or 28, wherein the first determining module is further configured to: analyzing a plurality of equipment attribute data records to obtain a first sample and a second sample, wherein the first sample is the equipment attribute data record of the same equipment at different moments, and the second sample is the equipment attribute data record of different equipment;
calculating a first device similarity of the same device and a second device similarity between different devices according to the first weight value;
judging whether the first equipment similarity of the same equipment and the second equipment similarity between different equipment meet a first preset condition or not;
the first assignment module is further configured to: if the first device similarity of the same device and the second device similarity between different devices meet the first preset condition, assign the weight of the attribute as the first weight value.
30. The apparatus of claim 29, wherein the first determining module is further configured to: screening the attribute information according to the first weight value to obtain the attribute information of which the first weight value is greater than or equal to a preset weight value;
and calculating the first equipment similarity of the same equipment and the second equipment similarity between different equipment according to the first weight value corresponding to the attribute in the screened attribute information.
31. The apparatus according to claim 29, wherein the first preset condition is specifically: the first device similarity of the same device is greater than the initial device similarity of the same device, and the second device similarity between different devices is less than the initial device similarity between different devices;
wherein the initial device similarity of the same device and the initial device similarity between different devices are calculated according to the initial weight values of the attributes.
32. The apparatus of claim 27 or 28, wherein the second computing module is further configured to: determining attribute information of the equipment to be confirmed participating in similarity calculation of each equipment and weight corresponding to the attribute;
and calculating the equipment similarity between the equipment to be confirmed and each equipment in the equipment library according to the weight corresponding to the attribute, the attribute value corresponding to the attribute and the attribute similarity calculation method in the attribute information of the equipment to be confirmed participating in the similarity calculation of each equipment.
33. The apparatus of claim 32, wherein the second computing module is further configured to: calculating attribute similarity according to the attribute value corresponding to the attribute and an attribute similarity algorithm;
calculating the device similarity of the device to be confirmed and each device in the device library by using the following formula (2):
Figure FDA0002384309040000071
wherein $S_d$ represents the device similarity, the symbol shown in
Figure FDA0002384309040000072
represents the attribute similarity of the i-th attribute category, $W_i$ represents the weight of the i-th attribute category, and $N$ represents the number of attribute categories participating in the device similarity calculation.
34. The apparatus of claim 27 or 28, further comprising: the preprocessing module is used for preprocessing the plurality of equipment attribute data records to obtain preprocessed equipment attribute data records;
the first calculating module is further configured to: calculate a first weight value of each attribute by using a feature weight calculation algorithm or a weight model algorithm according to the preprocessed device attribute data records.
35. The apparatus of claim 34, wherein the preprocessing module is further configured to: and deleting the device attribute data records of which the number of the attribute information is less than a preset threshold value.
36. The apparatus of claim 34, wherein the preprocessing module is further configured to: matching the device fingerprints corresponding to the plurality of device attribute data records with the device fingerprints in the blacklist respectively;
and if a match is found, deleting the device attribute data records whose device fingerprints match device fingerprints in the blacklist.
37. The apparatus according to claim 27 or 28, wherein the attribute information comprises: hardware attribute information, software attribute information, and/or behavior attribute information;
wherein the hardware attribute information comprises one or more of the following information: MAC address, brand, model, IMEI, serial number;
the software attribute information includes one or more of the following information: OS type, system settings, network settings, protocol fingerprint, browser attributes, geographic location;
the behavior attribute information includes one or more of the following information: access frequency, access time and operation track.
38. The apparatus of claim 27 or 28, wherein the operating system type comprises: an iOS operating system, an Android operating system, a Windows operating system, and/or a Windows Phone operating system.
39. An apparatus for determining a fingerprint of a device based on similarity, the apparatus comprising:
an extraction module, configured to extract a plurality of device attribute data records belonging to the same operating system type from a device fingerprint library, wherein each device attribute data record consists of a device fingerprint and a plurality of pieces of attribute information corresponding to the device fingerprint, and each piece of attribute information consists of an attribute and an attribute value corresponding to the attribute;
a first calculation module, configured to: analyze the plurality of device attribute data records to obtain a first sample and a second sample, wherein the first sample is the device attribute data records of the same device at different moments, and the second sample is the device attribute data records of different devices; calculate the attribute similarity of the same device and the attribute similarity between different devices by using an initial attribute similarity algorithm; calculate the average value of the attribute similarity of the same device and the attribute similarity between different devices, and take the average value as the attribute similarity of each attribute; and input the attribute similarity of each attribute into a weight model to obtain a first weight value of each attribute, wherein the weight model is obtained by training according to the attribute similarities of a large number of sample devices and the weight values corresponding to the attributes;
the first judgment module is used for judging whether the calculated first weight value of each attribute meets a first preset condition or not;
the first assignment module is used for assigning the weight of each attribute to be a first weight value if the first weight value of each attribute accords with a first preset condition;
the acquisition module is used for acquiring attribute information of the equipment to be confirmed;
the second calculation module is used for calculating the equipment similarity between the equipment to be confirmed and each equipment in the equipment library according to the weight corresponding to the attribute of the equipment to be confirmed, the attribute value corresponding to the attribute and an attribute similarity calculation method;
the second judging module is used for judging whether the similarity of each device is greater than or equal to a preset threshold value or not;
and the second assignment module is used for assigning the device fingerprint of the device with the highest device similarity in the at least one device similarity which is greater than or equal to the preset threshold value to the device to be confirmed if at least one device similarity is greater than or equal to the preset threshold value.
40. The apparatus of claim 39, further comprising: and the third calculating module is used for calculating the device fingerprint according to the attribute information of the device to be confirmed and assigning the device fingerprint to the device to be confirmed if each similarity is smaller than the preset threshold value.
41. The apparatus according to claim 39 or 40, wherein the first determining module is further configured to: analyzing a plurality of equipment attribute data records to obtain a first sample and a second sample, wherein the first sample is the equipment attribute data record of the same equipment at different moments, and the second sample is the equipment attribute data record of different equipment;
calculating a first device similarity of the same device and a second device similarity between different devices according to the first weight value;
judging whether the first equipment similarity of the same equipment and the second equipment similarity between different equipment meet a first preset condition or not;
the first assignment module is further configured to: if the first device similarity of the same device and the second device similarity between different devices meet the first preset condition, assign the weight of the attribute as the first weight value.
42. The apparatus of claim 41, wherein the first determining module is further configured to: screening the attribute information according to the first weight value to obtain the attribute information of which the first weight value is greater than or equal to a preset weight value;
and calculating the first equipment similarity of the same equipment and the second equipment similarity between different equipment according to the first weight value corresponding to the attribute in the screened attribute information.
43. The apparatus according to claim 41, wherein the first preset condition is specifically: the first device similarity of the same device is greater than the initial device similarity of the same device, and the second device similarity between different devices is less than the initial device similarity between different devices;
wherein the initial device similarity of the same device and the initial device similarity between different devices are calculated according to the initial weight values of the attributes.
44. The apparatus of claim 39, further comprising:
a modification module for modifying the attribute similarity algorithm;
the fourth calculation module is used for recalculating the attribute similarity of the same equipment and the attribute similarity between different equipment by using the changed attribute similarity algorithm;
a fifth calculation module, configured to calculate the average value of the recalculated attribute similarity of the same device and the recalculated attribute similarity between different devices, and take the average value as the attribute similarity of each attribute;
the input module is used for inputting the attribute similarity of each attribute as a matching input item into the weight model to obtain a second weight value of each attribute;
a sixth calculating module, configured to calculate, according to the second weight values of the attributes, a third device similarity of the same device and a fourth device similarity between different devices;
the third judging module is used for judging whether the third equipment similarity of the same equipment and the fourth equipment similarity between different equipment meet a second preset condition or not;
the first assignment module is further configured to: if the third device similarity of the same device and the fourth device similarity between different devices meet the second preset condition, assign the weight of the attribute as the second weight value;
and the determining module is used for determining the attribute similarity calculation method of each attribute if the third equipment similarity of the same equipment and the fourth equipment similarity between different equipment meet a second preset condition.
45. The apparatus according to claim 44, wherein the second predetermined condition is specifically: the third device similarity of the same device is greater than the first device similarity of the same device, and the fourth device similarity between different devices is less than the second device similarity between different devices.
46. The apparatus of claim 39 or 40, wherein the second computing module is further configured to: determining attribute information of the equipment to be confirmed participating in similarity calculation of each equipment and weight corresponding to the attribute;
and calculating the equipment similarity between the equipment to be confirmed and each equipment in the equipment library according to the weight corresponding to the attribute, the attribute value corresponding to the attribute and the attribute similarity calculation method in the attribute information of the equipment to be confirmed participating in the similarity calculation of each equipment.
47. The apparatus of claim 46, wherein the second computing module is further configured to: calculating attribute similarity according to the attribute value corresponding to the attribute and an attribute similarity algorithm;
calculating the device similarity of the device to be confirmed and each device in the device library by using the following formula (2):
Figure FDA0002384309040000091
wherein $S_d$ represents the device similarity, the symbol shown in
Figure FDA0002384309040000092
represents the attribute similarity of the i-th attribute category, $W_i$ represents the weight of the i-th attribute category, and $N$ represents the number of attribute categories participating in the device similarity calculation.
48. The apparatus of claim 39 or 40, further comprising: the preprocessing module is used for preprocessing the plurality of equipment attribute data records to obtain preprocessed equipment attribute data records;
the first calculation module is further configured to: calculate a first weight value of each attribute by using a feature weight calculation algorithm or a weight model algorithm according to the preprocessed device attribute data records.
49. The apparatus of claim 48, wherein the preprocessing module is further configured to: and deleting the device attribute data records of which the number of the attribute information is less than a preset threshold value.
50. The apparatus of claim 48, wherein the preprocessing module is further configured to: matching the device fingerprints corresponding to the plurality of device attribute data records with the device fingerprints in the blacklist respectively;
and if a match is found, deleting the device attribute data records whose device fingerprints match device fingerprints in the blacklist.
51. The apparatus according to claim 39 or 40, wherein the attribute information comprises: hardware attribute information, software attribute information, and/or behavior attribute information;
wherein the hardware attribute information comprises one or more of the following information: MAC address, brand, model, IMEI, serial number;
the software attribute information includes one or more of the following information: OS type, system settings, network settings, protocol fingerprint, browser attributes, geographic location;
the behavior attribute information includes one or more of the following information: access frequency, access time and operation track.
52. The apparatus of claim 39 or 40, wherein the operating system type comprises: an iOS operating system, an Android operating system, a Windows operating system, and/or a Windows Phone operating system.
53. A server, characterized in that the server comprises: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the method of determining a device fingerprint according to similarity as claimed in any one of claims 1-12.
54. A computer storage medium having stored thereon at least one executable instruction for causing a processor to perform operations corresponding to the method of determining a device fingerprint according to similarity as claimed in any one of claims 1-12.
55. A server, characterized in that the server comprises: a processor, a memory, a communication interface and a communication bus, wherein the processor, the memory and the communication interface communicate with one another through the communication bus;
the memory is configured to store at least one executable instruction that causes the processor to perform operations corresponding to the method of determining a device fingerprint according to similarity as claimed in any one of claims 13-26.
56. A computer storage medium having stored thereon at least one executable instruction for causing a processor to perform operations corresponding to the method of determining a device fingerprint according to similarity as claimed in any one of claims 13-26.
CN201710575930.3A 2017-06-29 2017-07-14 Method and device for determining device fingerprint according to similarity and server Active CN107423613B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2017105143414 2017-06-29
CN201710514341 2017-06-29

Publications (2)

Publication Number Publication Date
CN107423613A (en) 2017-12-01
CN107423613B (en) 2020-08-04

Family

ID=60426534

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710575930.3A Active CN107423613B (en) 2017-06-29 2017-07-14 Method and device for determining device fingerprint according to similarity and server

Country Status (1)

Country Link
CN (1) CN107423613B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108833384B (en) * 2018-05-31 2021-03-12 奇安信科技集团股份有限公司 Method and system for identifying counterfeit electronic devices
CN109446791A (en) * 2018-11-23 2019-03-08 杭州优行科技有限公司 New equipment recognition methods, device, server and computer readable storage medium
CN109376277B (en) * 2018-11-23 2020-11-20 京东数字科技控股有限公司 Method and device for determining equipment fingerprint homology
CN111291019B (en) * 2018-12-07 2023-09-29 中国移动通信集团陕西有限公司 Similarity discrimination method and device for data model
CN109800560B (en) * 2018-12-19 2021-06-11 同盾控股有限公司 Equipment identification method and device
CN112100604B (en) * 2019-06-17 2024-04-05 北京达佳互联信息技术有限公司 Terminal equipment information processing method and device
CN110837635A (en) * 2019-11-07 2020-02-25 深圳乐信软件技术有限公司 Method, device, equipment and storage medium for equipment verification
CN111241524A (en) * 2020-01-18 2020-06-05 苏州浪潮智能科技有限公司 Method and system for judging uniqueness of equipment
CN111478986B (en) * 2020-06-22 2020-09-25 腾讯科技(深圳)有限公司 Method, device and equipment for generating equipment fingerprint and storage medium
CN111814909B (en) * 2020-08-06 2021-07-06 广州蜜妆信息科技有限公司 Information processing method based on network live broadcast and online e-commerce delivery and cloud server
CN113989859B (en) * 2021-12-28 2022-05-06 江苏苏宁银行股份有限公司 Fingerprint similarity identification method and device for anti-flashing equipment
CN117041983B (en) * 2023-10-08 2024-02-06 中邮消费金融有限公司 Mobile terminal equipment fingerprint generation method capable of dynamically adjusting parameters

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103166917B (en) * 2011-12-12 2016-02-10 阿里巴巴集团控股有限公司 Network equipment personal identification method and system
CN105989373B (en) * 2015-02-15 2019-07-23 阿里巴巴集团控股有限公司 The acquisition device-fingerprint method and device realized using training pattern

Also Published As

Publication number Publication date
CN107423613A (en) 2017-12-01

Similar Documents

Publication Publication Date Title
CN107423613B (en) Method and device for determining device fingerprint according to similarity and server
CN107066616B (en) Account processing method and device and electronic equipment
CN111090807B (en) Knowledge graph-based user identification method and device
CN106803039B (en) A kind of homologous determination method and device of malicious file
CN109858919B (en) Abnormal account number determining method and device, and online ordering method and device
CN109919781A (en) Case recognition methods, electronic device and computer readable storage medium are cheated by clique
CN111260220B (en) Group control equipment identification method and device, electronic equipment and storage medium
CN110503566B (en) Wind control model building method and device, computer equipment and storage medium
CN107818491A (en) Electronic installation, Products Show method and storage medium based on user's Internet data
CN106095939B (en) The acquisition methods and device of account authority
CN110675252A (en) Risk assessment method and device, electronic equipment and storage medium
CN111242318A (en) Business model training method and device based on heterogeneous feature library
CN112749973A (en) Authority management method and device and computer readable storage medium
CN115222443A (en) Client group division method, device, equipment and storage medium
CN107885754B (en) Method and device for extracting credit variable from transaction data based on LDA model
CN111311276B (en) Identification method and device for abnormal user group and readable storage medium
CN117376228A (en) Network security testing tool determining method and device
CN116151965A (en) Risk feature extraction method and device, electronic equipment and storage medium
CN112241820A (en) Risk identification method and device for key nodes in fund flow and computing equipment
CN113326405B (en) Park entrance recommendation method and system based on BIM technology
CN109587248A (en) User identification method, device, server and storage medium
CN114817518A (en) License handling method, system and medium based on big data archive identification
CN113688206A (en) Text recognition-based trend analysis method, device, equipment and medium
CN107977413A (en) Feature selection approach, device, computer equipment and the storage medium of user data
CN110570301B (en) Risk identification method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20200701

Address after: 4f, building C2, Suzhou 2.5 Industrial Park, 88 Dongchang Road, Suzhou Industrial Park, Jiangsu Province

Applicant after: JIANGSU TONGFUDUN INFORMATION SECURITY TECHNOLOGY Co.,Ltd.

Applicant after: JIANGSU PAY EGIS TECHNOLOGY Co.,Ltd.

Address before: Suzhou City, Jiangsu province 215021 East Road, Suzhou Industrial Park, No. 88 Suzhou 2.5 Industrial Park C2 building room 3F-301

Applicant before: JIANGSU TONGFUDUN INFORMATION SECURITY TECHNOLOGY Co.,Ltd.

GR01 Patent grant