CN110874465B - Mobile equipment entity identification method and device based on semi-supervised learning algorithm - Google Patents

Info

Publication number
CN110874465B
Authority
CN
China
Prior art keywords
identifiers
mobile equipment
mobile device
nodes
mobile
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811011479.3A
Other languages
Chinese (zh)
Other versions
CN110874465A (en
Inventor
王灿
沈鑫
冼伟钊
杨红霞
王中要
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN201811011479.3A priority Critical patent/CN110874465B/en
Publication of CN110874465A publication Critical patent/CN110874465A/en
Application granted granted Critical
Publication of CN110874465B publication Critical patent/CN110874465B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/44Program or device authentication
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/513Sparse representations

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Mobile Radio Communication Systems (AREA)

Abstract

The application discloses a mobile device entity identification method and device based on a semi-supervised learning algorithm. The method comprises the following steps: determining the attribute features of nodes in a co-occurrence sparse graph of mobile device identifiers and the marked data used for mobile device entity identification; performing iterative operations with the loss function of a semi-supervised learning algorithm on those attribute features and marked data to determine the feature of the mobile device center to which each identifier belongs and the feature of each mobile device center; and determining the mobile device to which a plurality of identifiers uniquely correspond by judging whether the features of the mobile device centers to which the identifiers belong are the same and by comparing the similarity between those features and the feature of each device center. Because a small amount of marked data is combined with the loss function of a semi-supervised learning algorithm, the accuracy of mobile device entity identification is improved.

Description

Mobile equipment entity identification method and device based on semi-supervised learning algorithm
Technical Field
The application relates to the field of mobile equipment entity identification, in particular to a method and a device for identifying a mobile equipment entity based on a semi-supervised learning algorithm, electronic equipment and storage equipment.
Background
With the development of artificial intelligence, machine learning has gradually become a basic supporting and service technology, and different fields place different demands on it. Machine learning refers to analyzing data with algorithms, building models that learn from that data, and then using those models for predictive analysis. In the field of mobile device identification, problems such as system reinstallation, device replacement, imitation phones, and emulator attacks are frequently encountered, and they often cause part of a mobile device's data to be lost. To recall the lost data, the mobile device must first be identified, but commonly used entity identification algorithms based on mobile device identifiers are often computationally complex. In addition, with the popularization of the mobile internet, data is growing geometrically; this massive data is not only diverse in structure but also highly dynamic, so acquiring a large amount of marked data for iterative training of a mobile device entity identification algorithm consumes considerable manpower and time. The conventional approach of training such an algorithm on a large amount of marked data is therefore no longer applicable.
To address these problems, existing solutions in the art generally perform unsupervised training, without any marked data, on a unique identifier that is randomly generated from hardware and system information when an app is installed, and thereby implement mobile device entity identification.
Disclosure of Invention
The application provides a mobile equipment entity identification method and device based on a semi-supervised learning algorithm, and aims to solve the problems that in the prior art, the mobile equipment entity identification process is complicated and the accuracy is low. The application further provides electronic equipment and storage equipment for mobile equipment entity identification based on the semi-supervised learning algorithm.
The application provides a mobile equipment entity identification method based on a semi-supervised learning algorithm, which comprises the following steps:
determining attribute features of nodes in a co-occurrence sparse graph of identifiers of the mobile devices;
determining marked data for the mobile device entity identification, wherein a quantity of the marked data does not exceed a first quantity threshold;
determining the characteristics of the mobile equipment center to which the identifier belongs and the characteristics of the mobile equipment center according to the attribute characteristics of the nodes in the co-occurrence relation sparse graph and the marked data and by using a loss function of a semi-supervised learning algorithm;
if the characteristics of the mobile equipment centers of a plurality of identifiers are the same, determining the plurality of identifiers to be a plurality of identifiers of the same mobile equipment;
and determining the mobile equipment uniquely corresponding to the identifiers according to the similarity between the characteristics of the mobile equipment center to which the identifiers belong and the characteristics of each equipment center.
Optionally, the co-occurrence sparse graph of the identifiers of the mobile devices is obtained by taking each set of identifier information of a mobile device as a node, connecting nodes that contain the same identifier, and deleting the connections among nodes that share an identifier when the number of such nodes reaches or exceeds a preset node number threshold.
Optionally, the determining, according to the attribute features of the nodes in the co-occurrence sparse graph and the marked data, the features of the mobile device center to which the identifier belongs and the features of the mobile device center by using a loss function of a semi-supervised learning algorithm includes:
establishing a loss function of a semi-supervised learning algorithm;
and performing iterative optimization algorithm training by taking the attribute characteristics of the nodes in the co-occurrence relation sparse graph and the marked data as parameters of a loss function of the semi-supervised learning algorithm to obtain the characteristics of the mobile equipment center to which each identifier belongs and the characteristics of the mobile equipment center.
Optionally, the set of identifier information of the mobile device includes software and hardware identifier information corresponding to the mobile device.
Optionally, the software and hardware identifier information corresponding to the mobile device specifically includes at least one of the following identifier information:
an equipment identity, IMEI, for uniquely identifying the mobile equipment;
a subscriber identity IMSI for uniquely identifying mobile subscriber information corresponding to the mobile device;
an advertisement Identifier (IDFA) for tracking the mobile device operation information;
a software identifier UTDID for uniquely identifying the mobile device.
Optionally, the set of identifier information of the mobile device uniquely represents a real physical mobile device.
Optionally, the attribute feature of a node in the co-occurrence sparse graph is a feature of a set of identifier information of the mobile device.
Optionally, if the features of the mobile device centers to which the multiple identifiers belong are the same, determining that the multiple identifiers are multiple identifiers of the same mobile device specifically includes:
determining the similarity of the attribute characteristics of any two nodes of the co-occurrence relationship sparse graph;
determining that two identifier sets respectively corresponding to the two nodes belong to the same mobile device according to the similarity meeting a preset similarity threshold;
and determining that a plurality of identifiers contained in the identifier sets are a plurality of identifiers of the same mobile device according to the fact that the two identifier sets respectively corresponding to the two nodes belong to the same mobile device.
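The three sub-steps above can be illustrated with a minimal sketch. It assumes that each node is represented as a set of identifier strings and that pairwise similarities have already been computed; the function name `group_identifier_sets`, the `(i, j, similarity)` edge format, and the use of union-find to merge pairs transitively are illustrative assumptions, not details given in the application.

```python
def group_identifier_sets(nodes, weighted_edges, threshold):
    """Group identifier sets whose pairwise similarity meets a threshold.

    nodes: list of sets of identifier strings (one set per graph node).
    weighted_edges: iterable of (i, j, similarity) tuples (assumed format).
    Returns a list of merged identifier sets, one per inferred device.
    """
    parent = list(range(len(nodes)))

    def find(i):
        # Find the group representative, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j, similarity in weighted_edges:
        if similarity >= threshold:  # the two nodes are treated as one device
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    # All identifiers in a merged group are identifiers of the same device.
    groups = {}
    for node_id, identifiers in enumerate(nodes):
        groups.setdefault(find(node_id), set()).update(identifiers)
    return list(groups.values())
```

Merging transitively is a design choice of this sketch: if node A matches B and B matches C, all three are placed on one device even if A and C were never compared directly.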
Correspondingly, the present application also provides a mobile device entity identification apparatus based on semi-supervised learning algorithm, including:
a first obtaining unit, configured to determine attribute features of nodes in a co-occurrence sparse graph of identifiers of the mobile device;
a second obtaining unit, configured to determine marked data for the mobile device entity identification, wherein the amount of marked data does not exceed a first amount threshold;
a calculation unit, configured to determine, according to the attribute features of the nodes in the co-occurrence sparse graph and the marked data, the feature of the mobile device center to which each identifier belongs and the feature of each mobile device center by using a loss function of a semi-supervised learning algorithm;
a first determining unit, configured to determine a plurality of identifiers to be a plurality of identifiers of the same mobile device if the features of the mobile device centers to which the plurality of identifiers belong are the same;
a second determining unit, configured to determine, according to a similarity between a feature of a mobile device center to which the multiple identifiers belong and a feature of each device center, the mobile device to which the multiple identifiers uniquely correspond.
Correspondingly, the present application also provides an electronic device, comprising:
a processor; and
a memory for storing a program of a mobile device entity identification method based on a semi-supervised learning algorithm, wherein after the device is powered on and the processor runs the program, the following steps are performed:
determining attribute features of nodes in a co-occurrence sparse graph of identifiers of the mobile devices;
determining labeled data for the mobile device entity identification, wherein a quantity of the labeled data is equal to or less than a first quantity threshold;
determining the characteristics of the mobile equipment center to which the identifier belongs and the characteristics of the mobile equipment center according to the attribute characteristics of the nodes in the co-occurrence relation sparse graph and the marked data and by using a loss function of a semi-supervised learning algorithm;
if the characteristics of the mobile equipment centers of a plurality of identifiers are the same, determining the plurality of identifiers to be a plurality of identifiers of the same mobile equipment;
and determining the mobile equipment uniquely corresponding to the identifiers according to the similarity between the characteristics of the mobile equipment center to which the identifiers belong and the characteristics of each equipment center.
Accordingly, the present application provides a storage device storing a program of a mobile device entity identification method based on a semi-supervised learning algorithm, the program being executed by a processor and performing the following steps:
determining attribute features of nodes in a co-occurrence sparse graph of identifiers of the mobile devices;
determining labeled data for the mobile device entity identification, wherein a quantity of the labeled data is equal to or less than a first quantity threshold;
determining the characteristics of the mobile equipment center to which the identifier belongs and the characteristics of the mobile equipment center according to the attribute characteristics of the nodes in the co-occurrence relation sparse graph and the marked data and by using a loss function of a semi-supervised learning algorithm;
if the characteristics of the mobile equipment centers of a plurality of identifiers are the same, determining the plurality of identifiers to be a plurality of identifiers of the same mobile equipment;
and determining the mobile equipment uniquely corresponding to the identifiers according to the similarity between the characteristics of the mobile equipment center to which the identifiers belong and the characteristics of each equipment center.
Compared with the prior art, the method has the following advantages:
The application provides a mobile device entity identification method based on a semi-supervised learning algorithm. Using the attribute features of nodes in the co-occurrence sparse graph of mobile device identifiers and the marked data used for mobile device entity identification, the method performs iterative operations with the loss function of a semi-supervised learning algorithm to determine the feature of the mobile device center to which each identifier belongs and the feature of each mobile device center. It then determines the mobile device to which a plurality of identifiers uniquely correspond by judging whether the features of the mobile device centers to which the identifiers belong are the same and by comparing the similarity between those features and the feature of each device center. Because the data model is built by combining a small amount of marked data with the loss function of a semi-supervised learning algorithm, the method improves the generalization performance of the classifier with only a small amount of marked data and improves the accuracy of mobile device entity identification.
Drawings
Fig. 1 is a flowchart of a mobile device entity identification method based on a semi-supervised learning algorithm according to an embodiment of the present application;
Fig. 2 is a schematic diagram of a mobile device entity identification apparatus based on a semi-supervised learning algorithm according to an embodiment of the present application;
Fig. 3 is a schematic diagram of an electronic device for mobile device entity identification based on a semi-supervised learning algorithm according to an embodiment of the present application;
Fig. 4 is an identification flowchart of a mobile device entity identification method based on a semi-supervised learning algorithm according to an embodiment of the present application;
Fig. 5 is a structural diagram of a co-occurrence sparse graph according to an embodiment of the present application.
Detailed Description
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present application. However, the application can be implemented in many forms other than those described herein, and those skilled in the art can make similar generalizations without departing from the spirit and scope of the present application; the application is therefore not limited to the specific embodiments disclosed below.
In order to make those skilled in the art better understand the solution of the present application, the following describes an embodiment of the mobile device entity identification method based on a semi-supervised learning algorithm provided in the present application in detail. In addition, in the following description, detailed explanation will be made separately for each step of the present method. Please refer to fig. 1, which is a flowchart illustrating a method for identifying a mobile device entity based on a semi-supervised learning algorithm according to an embodiment of the present application.
Step S101, determining the attribute features of the nodes in the co-occurrence sparse graph of the identifiers of the mobile device.
In this embodiment, the co-occurrence sparse graph is built by taking each set of identifier information of a mobile device as a node, connecting nodes that contain the same identifier, and deleting the connections among nodes that share an identifier once the number of such nodes reaches or exceeds a preset node number threshold. Here the preset node number threshold is 1000. To reduce time complexity, when the number of nodes containing any one of the four identifiers IMEI, IMSI, IDFA, and UTDID reaches or exceeds 1000, the nodes containing that identifier are not connected in that pass, and any connections already made among nodes containing that identifier are deleted, which yields the co-occurrence sparse graph of mobile device identifiers. The preset node number threshold is not limited to this value and may be set in advance according to the specific situation.
In this embodiment, the attribute feature of the node in the co-occurrence sparse graph is a feature of a set of identifier information of the mobile device. If a sparse graph of the co-occurrence relationship of the mobile device identifiers is to be constructed, all software and hardware mobile device identifiers need to be extracted from all the mobile device access logs, and a set of mobile device identifier information recorded in each access log is used as a node in the graph. The set of mobile device identifier information is an identifier set for uniquely representing a physical mobile device, and includes a hardware identifier and a software identifier, and specifically, the set of mobile device identifier information includes at least one identifier selected from a device identifier IMEI for uniquely identifying a mobile device, a subscriber identifier IMSI for uniquely identifying mobile subscriber information corresponding to a mobile device, an advertisement identifier IDFA for tracking operation information of a mobile device, and a software identifier UTDID for uniquely identifying a mobile device. And taking the set of the identifier information of the mobile equipment as the nodes of the sparse graph of the co-occurrence relationship of the identifiers of the mobile equipment, namely, each node in the sparse graph of the co-occurrence relationship of the identifiers of the mobile equipment represents a set of the identifier information of the mobile equipment.
It should be noted that the same identifier mentioned above may be at least one identifier selected from an equipment identity IMEI for uniquely identifying the first mobile equipment, a subscriber identity IMSI for uniquely identifying mobile subscriber information corresponding to the first mobile equipment, an advertisement identifier IDFA for tracking operation information of the first mobile equipment, and a software identifier UTDID for uniquely identifying the first mobile equipment.
In this embodiment, for each identifier, all sets of mobile device identifier information (that is, all nodes in the graph) are traversed to find every node containing that identifier, and those nodes are connected. The embodiment mainly extracts four identifiers: IMEI, IMSI, IDFA, and UTDID. If one identifier appears in two different nodes, the two nodes are joined by an edge, and the nodes found are connected pairwise in turn to form the graph. Please refer to Fig. 5, which is a structural diagram of a co-occurrence sparse graph according to an embodiment of the present application. When one identifier appears in a large number of nodes, that is, when too many nodes are associated through it, no edges are added for it and any connections already made through it are deleted, which keeps the graph sparse and reduces time complexity. The IMEI is the identity of the mobile equipment, while the IMSI identifies the mobile subscriber, is stored in the SIM card, and can be used to distinguish valid subscriber information.
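The graph construction described for step S101 can be sketched roughly as follows. This is a minimal illustration, assuming each access-log record has already been parsed into a dictionary keyed by identifier type; the function name `build_cooccurrence_graph` and the record layout are hypothetical. The sketch only skips the edges contributed by an over-frequent identifier; whether edges that two such nodes share through a different identifier should also be dropped is not specified in the text.

```python
from collections import defaultdict
from itertools import combinations


def build_cooccurrence_graph(nodes, max_nodes_per_identifier=1000):
    """Return undirected edges (i, j), i < j, between nodes sharing an identifier.

    nodes: list of dicts mapping an identifier type ("IMEI", "IMSI",
    "IDFA", "UTDID") to its value; each dict is one graph node.
    An identifier found in max_nodes_per_identifier or more nodes
    contributes no edges, which keeps the graph sparse.
    """
    # Inverted index: (identifier type, value) -> nodes containing it.
    index = defaultdict(list)
    for node_id, identifiers in enumerate(nodes):
        for id_type, value in identifiers.items():
            if value:  # ignore missing or empty identifiers
                index[(id_type, value)].append(node_id)

    edges = set()
    for members in index.values():
        if len(members) >= max_nodes_per_identifier:
            continue  # hub identifier: connect nothing through it
        # Connect the nodes that share this identifier pairwise.
        for i, j in combinations(members, 2):
            edges.add((i, j))
    return edges
```

Building the inverted index first means the threshold is checked before any edge is created, so nothing has to be deleted afterwards; the result is the same as connecting and then deleting.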
Step S102, determining marked data for the mobile device entity identification, wherein the amount of the marked data does not exceed a first amount threshold.
In machine learning, unmarked data is easy to obtain, while marked data is difficult to obtain because marking data usually consumes considerable labor and time. An unsupervised learning algorithm is a clustering algorithm that uses no marked data; although it needs no marked data, the model it yields is not accurate enough for mobile device identification. The embodiment of the application therefore builds a data model with a semi-supervised learning algorithm that combines a small amount of marked data with a large amount of unmarked data to identify mobile device entities. This improves the generalization performance of the classifier with only a small amount of marked data and avoids the time and effort of marking a large data set. During use of a mobile device, the identifier set in each access record corresponds to a unique real physical device, so some marked data is necessary in the identification process as the basis of the semi-supervised learning algorithm. The first quantity threshold is the quantity of marked data that is sufficient to meet the required accuracy of mobile device entity identification. Marked data means that the true values of some existing data have been established by technical means inside an enterprise and are used as marks; that is, for a small number of pairs of identifier sets, it is known whether the two sets in each pair belong to the same device.
Step S103, determining the feature of the mobile device center to which each identifier belongs and the feature of each mobile device center by using a loss function of a semi-supervised learning algorithm according to the attribute features of the nodes in the co-occurrence sparse graph and the marked data.
In this embodiment, mobile device entity identification is performed by constructing the co-occurrence sparse graph of mobile device identifiers, inputting a small amount of marked data into the loss function of a new semi-supervised learning algorithm, and combining the structural features of the co-occurrence sparse graph with the attribute features of its nodes. Because the graph is sparse, the number of edges in the co-occurrence sparse graph is close to the number of nodes, and each iteration of the semi-supervised learning algorithm runs over the edges of the graph, which keeps the time complexity low.
Therefore, in this embodiment, to obtain the feature of the mobile device center to which each identifier belongs and the feature of each mobile device center, a loss function of the semi-supervised learning algorithm is first established, and the attribute features of the nodes in the co-occurrence sparse graph and a small amount of marked data are input as parameters into that loss function for iterative optimization training. The attribute feature of a node in the co-occurrence sparse graph is the feature of a set of identifier information of the mobile device. The expression of the loss function of the semi-supervised learning algorithm is as follows:
[Equation image BDA0001785206870000081: loss function of the semi-supervised learning algorithm]
where λ denotes a penalty parameter, x_i is the feature of the identifier set of each mobile device, C_{s_i} is the feature of the mobile device center to which each identifier belongs, the symbol shown in [image BDA0001785206870000082] is the marked data, i and j are the indices of any two nodes of the co-occurrence sparse graph, w_ij is the similarity of the attribute features of those two nodes, and L is the loss function optimized by the algorithm.
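The loss expression itself is an image that did not survive in this text, so the following is only a sketch of one plausible form built from the symbols defined above, not the formula of the application. It assumes a fit term that pulls each center feature toward its node feature x_i, plus a λ-weighted term that pulls the center features of connected nodes together in proportion to w_ij, with marked pairs overriding w_ij. The function names, the squared Euclidean distance, and the way marked data enters are all assumptions.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def assumed_loss(features, centers, edges, labels, lam):
    """Evaluate an ASSUMED loss of the form

        sum_i ||x_i - c_i||^2 + lam * sum_(i,j) w_ij * ||c_i - c_j||^2

    features: list of node feature vectors x_i.
    centers:  list of center feature vectors c_i, one per node.
    edges:    dict mapping (i, j) to the similarity w_ij.
    labels:   dict mapping (i, j) to 1.0 (same device) or 0.0 (different);
              a marked pair replaces the computed similarity (assumption).
    lam:      penalty parameter (lambda).
    """
    fit = sum(squared_distance(x, c) for x, c in zip(features, centers))
    smooth = 0.0
    for (i, j), w in edges.items():
        w = labels.get((i, j), w)  # marked data takes precedence
        smooth += w * squared_distance(centers[i], centers[j])
    return fit + lam * smooth
```

Under this assumed form, the sum over edges is the only coupling between nodes, which is consistent with the statement that each iteration runs over the edges of the sparse graph.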
In this embodiment, the feature of the mobile device center to which each identifier belongs and the feature of each mobile device center are obtained as follows. According to the similarity of the attribute features of any two nodes of the co-occurrence sparse graph, the value of y_ij for each edge is determined with an iterative formula, where y_ij takes the value 1 or 0 to indicate whether the two identifier sets belong to the same device. The attribute features of the nodes in the co-occurrence sparse graph and the marked data are then input as parameters into the derivation formula of the loss function of the semi-supervised learning algorithm for iterative optimization training, which yields the feature of the mobile device center to which each identifier belongs and the feature of each mobile device center. The expression of the iterative formula is as follows:
[Equation image BDA0001785206870000083: iterative formula for y_ij]
where w_ij is the similarity of the attribute features of any two nodes of the co-occurrence sparse graph, y_ij takes the value 1 or 0 to indicate whether the two identifier sets belong to the same device, and δ is a preset threshold used to decide whether the two devices are the same.
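Since the iterative formula is only available as an image, the thresholding role of δ can at best be illustrated under assumptions. The sketch below assumes that w_ij is the cosine similarity of the two node feature vectors and that y_ij is 1 when w_ij is at least δ and 0 otherwise; both the similarity measure and the direction of the comparison are assumptions, and the function names are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def edge_indicators(features, edges, delta):
    """Return {(i, j): y_ij} with y_ij = 1 if w_ij >= delta else 0 (assumed rule)."""
    indicators = {}
    for i, j in edges:
        w_ij = cosine_similarity(features[i], features[j])
        indicators[(i, j)] = 1 if w_ij >= delta else 0
    return indicators
```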
The expression of the derivation formula is as follows:
[Equation image BDA0001785206870000084: derivation formula of the loss function]
where x_i is the feature of the identifier set of each mobile device, C_{s_i} is the feature of the mobile device center to which each identifier belongs, i and j are the indices of any two nodes of the co-occurrence sparse graph, w_ij is the similarity of the attribute features of those two nodes, L is the loss function optimized by the algorithm, and λ denotes a penalty parameter.
Step S104, if the characteristics of the mobile equipment centers of the plurality of identifiers are the same, determining that the plurality of identifiers are the plurality of identifiers of the same mobile equipment.
Step S105, determining the mobile devices uniquely corresponding to the multiple identifiers according to the similarity between the features of the mobile device centers to which the multiple identifiers belong and the features of each device center.
Parallelized iterative optimization training is performed on the loss function of the proposed semi-supervised learning algorithm to solve for the feature of each mobile device center. The feature of the mobile device center to which each identifier belongs is then compared with the feature of each mobile device center, and the identifiers contained in each node are assigned to the single mobile device whose center feature is most similar, which gives the final result of mobile device entity identification based on the co-occurrence sparse graph of mobile device identifiers.
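The final comparison in step S105 amounts to a nearest-center assignment. The sketch below assumes that similarity is measured as the negative squared Euclidean distance between feature vectors (the application does not name a measure) and that the device center features have already been solved for; `assign_to_devices` is a hypothetical name.

```python
def assign_to_devices(node_centers, device_centers):
    """Map each node to the index of the most similar device center.

    node_centers:   the center feature obtained for each node, i.e. for
                    the identifier set the node represents.
    device_centers: the feature of each mobile device center.
    Closeness is measured with squared Euclidean distance (assumption).
    """
    assignment = []
    for center in node_centers:
        distances = [
            sum((a - b) ** 2 for a, b in zip(center, device))
            for device in device_centers
        ]
        # The smallest distance corresponds to the highest similarity.
        assignment.append(distances.index(min(distances)))
    return assignment
```

Nodes that receive the same index are treated as identifier sets of the same physical device, which matches the outcome of steps S104 and S105 taken together.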
Please refer to fig. 4, which is a flowchart illustrating an identification method for identifying a mobile device entity based on a semi-supervised learning algorithm according to an embodiment of the present application
In this embodiment, mobile device entity identification is performed by constructing a sparse co-occurrence graph of mobile device identifiers and applying a new semi-supervised learning algorithm that combines the structure of that graph with the attribute features of its nodes, which yields the entity identification result. The scheme can, to a certain extent, eliminate the influence of various kinds of identifier anomalies, and it greatly improves the identification precision of mobile devices. The anomalies that the embodiment can handle include, but are not limited to, loss of mobile device data caused by dual-SIM dual-standby, system reinstallation, phone replacement, and emulated-phone attacks. In the dual-SIM dual-standby case, the four combinations of IMEI and IMSI form a strong association in the sparse co-occurrence graph of mobile device identifiers. After a system reinstallation, all software identifiers associated with the device can be recalled through hardware identifiers such as the IMEI and the IMSI. After a phone replacement, all backup data of the mobile device corresponding to the IMSI can be recalled by using the IMSI and the access attributes. The huge connectivity caused by emulated phones and simulators is dealt with when the sparse co-occurrence graph is constructed. The semi-supervised learning algorithm provided by the embodiment also supports parallel processing of data, which facilitates large-scale data processing.
Without the sparse co-occurrence graph of mobile device identifiers, the complexity of the entity identification algorithm would be too high, parallel computation would be difficult, and large-scale data could not be processed.
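To make the dual-SIM dual-standby case concrete, the following minimal sketch groups access records that share at least one identifier. It is an illustration only: the record layout, the identifier values, and the function name `connected_components` are assumptions, and the union-find grouping stands in for the association the graph provides, not for the learning algorithm of the embodiment.

```python
from itertools import product


def connected_components(records):
    """Group record indices that share at least one (type, value) identifier."""
    parent = list(range(len(records)))

    def find(i):
        # Follow parent links to the root, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    first_seen = {}  # (identifier type, value) -> first record index holding it
    for idx, record in enumerate(records):
        for key in record.items():
            if key in first_seen:
                parent[find(idx)] = find(first_seen[key])
            else:
                first_seen[key] = idx

    groups = {}
    for idx in range(len(records)):
        groups.setdefault(find(idx), []).append(idx)
    return sorted(groups.values())


# Hypothetical dual-SIM phone: two IMEIs and two IMSIs seen in four combinations.
dual_sim = [
    {"IMEI": imei, "IMSI": imsi}
    for imei, imsi in product(["imei-A1", "imei-A2"], ["imsi-A1", "imsi-A2"])
]
other_phone = [{"IMEI": "imei-B", "IMSI": "imsi-B"}]

components = connected_components(dual_sim + other_phone)
print(components)
```

The four dual-SIM records end up in one group because every pair of them is linked through a shared IMEI or IMSI, while the unrelated record stays on its own.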
The application provides a mobile device entity identification method based on a semi-supervised learning algorithm. Using the attribute features of the nodes in the obtained sparse co-occurrence graph of mobile device identifiers, together with labeled data for mobile device entity identification, the method iterates on the loss function of the semi-supervised learning algorithm to obtain the features of the mobile device center to which each identifier belongs and the features of each mobile device center. The mobile device uniquely corresponding to a plurality of identifiers is then determined by checking whether the features of the mobile device centers to which the identifiers belong are the same, and by the similarity between those features and the features of each device center. Exploiting the structure of the sparse co-occurrence graph and the node features reduces the time complexity of the algorithm and so speeds up the iterative optimization of the semi-supervised learning algorithm, while adding a small amount of labeled data to the algorithm improves the identification precision of mobile devices.
Corresponding to the above method for identifying a mobile device entity based on a semi-supervised learning algorithm, an embodiment of the present application further provides an apparatus for identifying a mobile device entity based on a semi-supervised learning algorithm. Please refer to fig. 2, which is a schematic diagram of the apparatus provided in the embodiment of the present application.
A first obtaining unit 201, configured to determine attribute features of nodes in the sparse co-occurrence graph of mobile device identifiers.
In this embodiment, the attribute feature of a node in the sparse co-occurrence graph is the feature of a set of identifier information of a mobile device. The graph is built by taking each set of identifier information as a node, connecting nodes that contain the same identifier, and deleting the connections among nodes that contain the same identifier whenever the number of such nodes reaches or exceeds a preset node number threshold. In this embodiment the preset node number threshold is 1000. To reduce time complexity, when the number of nodes containing any one of the four identifiers IMEI, IMSI, IDFA, and UTDID reaches or exceeds 1000, the nodes containing that identifier are not connected in the current sub-loop, and the connections already made between nodes containing that identifier are deleted, which yields the constructed sparse co-occurrence graph of mobile device identifiers. The preset node number threshold is not limited to the value disclosed above and may be set in advance according to the specific situation.
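The construction rule described above can be sketched as follows. This is a minimal illustration under assumptions: nodes are represented as dictionaries keyed by identifier type, and the function name `build_sparse_graph` and its parameter `max_nodes_per_identifier` are hypothetical. Instead of connecting nodes and deleting the connections afterwards, the sketch simply skips any identifier whose node count reaches the threshold, which gives the same edge set.

```python
from collections import defaultdict
from itertools import combinations

ID_TYPES = ("IMEI", "IMSI", "IDFA", "UTDID")


def build_sparse_graph(nodes, max_nodes_per_identifier=1000):
    """Return the set of edges (i, j), i < j, between nodes sharing an identifier.

    Identifiers held by max_nodes_per_identifier or more nodes are treated as
    hubs (for example emulator defaults) and contribute no edges.
    """
    holders = defaultdict(list)  # (type, value) -> indices of nodes holding it
    for idx, node in enumerate(nodes):
        for id_type in ID_TYPES:
            value = node.get(id_type)
            if value:
                holders[(id_type, value)].append(idx)

    edges = set()
    for node_ids in holders.values():
        if len(node_ids) >= max_nodes_per_identifier:
            continue  # hub identifier: leave its nodes unconnected
        edges.update(combinations(node_ids, 2))
    return edges


# Hypothetical records; a tiny threshold is used so the pruning is visible.
nodes = [
    {"IMEI": "a", "IMSI": "x"},
    {"IMEI": "a", "IMSI": "y"},
    {"IMEI": "b", "IMSI": "y"},
    {"IMEI": "c", "UTDID": "emu"},
    {"IMEI": "d", "UTDID": "emu"},
    {"IMEI": "e", "UTDID": "emu"},
]
print(sorted(build_sparse_graph(nodes, max_nodes_per_identifier=3)))
```

With the threshold set to 3, the identifier shared by the last three nodes is treated as a hub and produces no edges, so only the links through the shared IMEI and IMSI of the first three nodes remain.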
A second obtaining unit 202, configured to determine labeled data for the mobile device entity identification, wherein the amount of labeled data does not exceed a first quantity threshold.
In machine learning, unlabeled data is easy to obtain, whereas labeled data is hard to obtain because labeling usually costs considerable labor and time. An unsupervised learning algorithm is a clustering-type algorithm that uses no labeled data; although it needs no labels, the resulting model is not accurate enough for mobile device identification. The embodiment of the application therefore builds its data model with a semi-supervised learning algorithm that combines a small amount of labeled data with a large amount of unlabeled data to identify the mobile device entity. The advantage is that the generalization performance of the classifier is improved with only a small amount of labeled data, without requiring much time or many labels. During the use of a mobile device, the set of mobile device identifiers in one access record corresponds to a unique mobile device, and some labeled data is necessary in the identification process as the basis of the semi-supervised learning algorithm. The first quantity threshold is the quantity of labeled data sufficient to meet the required accuracy of mobile device entity identification.
A calculation unit 203, configured to determine, according to the attribute features of the nodes in the sparse co-occurrence graph and the labeled data, and by using the loss function of the semi-supervised learning algorithm, the features of the mobile device center to which each identifier belongs and the features of each mobile device center.
In this embodiment, mobile device entity identification is performed by constructing the sparse co-occurrence graph of mobile device identifiers, feeding a small amount of labeled data into the loss function of a new semi-supervised learning algorithm, and combining the structural features of the graph with the attribute features of its nodes. Because the graph is sparse, its number of edges is of the same order as its number of nodes, and each iteration of the semi-supervised learning algorithm runs over the edges of the graph, which keeps the time complexity low.
Therefore, in this embodiment, to obtain the features of the mobile device center to which each identifier belongs and the features of each mobile device center, the loss function of the semi-supervised learning algorithm is established first. The attribute features of the nodes in the sparse co-occurrence graph and a small amount of labeled data are then input as parameters into that loss function, and iterative optimization training is performed to obtain the features of the mobile device center to which each identifier belongs and the features of each mobile device center.
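The shape of such an edge-driven iteration can be sketched with a generic graph-regularized objective. The loss actually used by the embodiment is not reproduced in this passage, so everything below is an assumption made for illustration: the objective, the weights `lam` and `mu`, the representation of labeled data as pairs of nodes known to belong to the same device, and the function names.

```python
import numpy as np


def loss_and_grad(z, x, edges, labeled_pairs, lam=1.0, mu=5.0):
    """Assumed stand-in objective, not the loss of the embodiment:

    sum_i |z_i - x_i|^2 + lam * sum_{(i,j) in edges} |z_i - z_j|^2
                        + mu  * sum_{(i,j) in labeled_pairs} |z_i - z_j|^2
    """
    grad = 2.0 * (z - x)
    loss = float(np.sum((z - x) ** 2))
    for pairs, weight in ((edges, lam), (labeled_pairs, mu)):
        if len(pairs) == 0:
            continue
        i, j = np.asarray(list(pairs), dtype=int).T
        diff = z[i] - z[j]
        loss += weight * float(np.sum(diff ** 2))
        # Unbuffered scatter-add, so nodes appearing in several pairs accumulate.
        np.add.at(grad, i, 2.0 * weight * diff)
        np.add.at(grad, j, -2.0 * weight * diff)
    return loss, grad


def fit(x, edges, labeled_pairs, steps=500, lr=0.02):
    """Plain gradient descent; each step costs time linear in the number of pairs."""
    z = x.copy()
    for _ in range(steps):
        _, grad = loss_and_grad(z, x, edges, labeled_pairs)
        z -= lr * grad
    return z


# Hypothetical attribute features for four nodes.
x = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [9.0, 9.0]])
edges = [(0, 1)]          # co-occurrence edge from the sparse graph
labeled_pairs = [(1, 2)]  # labeled: nodes 1 and 2 are the same device
z = fit(x, edges, labeled_pairs)
print(np.round(z, 3))
```

Each pass touches every edge and labeled pair once, so on a sparse graph the cost per iteration grows roughly with the number of nodes. The fixed step size is only suitable for small examples like this one; a graph with high-degree nodes would need a smaller step or a different optimizer.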
A first determining unit 204, configured to determine that the multiple identifiers are multiple identifiers of the same mobile device if the features of the mobile device centers to which the multiple identifiers belong are the same.
A second determining unit 205, configured to determine, according to a similarity between a feature of a mobile device center to which the multiple identifiers belong and a feature of each device center, the mobile device to which the multiple identifiers uniquely correspond.
In this embodiment, parallelized iterative optimization training is performed on the loss function of the new semi-supervised learning algorithm, and the features of each mobile device center are obtained by solving it. The features of the mobile device center to which each identifier belongs are then compared with the features of each mobile device center, and the identifiers contained in each node are assigned to the single mobile device whose features are most similar. This determines the final result of mobile device entity identification based on the sparse co-occurrence graph of mobile device identifiers.
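The comparison step can be sketched as a nearest-center assignment. The text does not state which similarity measure is used, so cosine similarity is an assumption here, as are the function names and the example data.

```python
import numpy as np


def assign_devices(belonging_features, center_features):
    """Return, for each node, the index of the most similar device center.

    Cosine similarity is an assumed choice; any similarity measure could be
    substituted.
    """
    eps = 1e-12  # guards against division by zero for all-zero rows
    a = belonging_features / np.maximum(
        np.linalg.norm(belonging_features, axis=1, keepdims=True), eps)
    b = center_features / np.maximum(
        np.linalg.norm(center_features, axis=1, keepdims=True), eps)
    return (a @ b.T).argmax(axis=1)


def identifiers_per_device(nodes, device_of_node):
    """Merge the identifiers of all nodes assigned to the same device."""
    devices = {}
    for node, device in zip(nodes, device_of_node):
        devices.setdefault(int(device), set()).update(node.items())
    return devices


# Hypothetical learned features: three nodes, two device centers.
belonging = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 2.0]])
centers = np.array([[1.0, 0.0], [0.0, 1.0]])
nodes = [{"IMEI": "a"}, {"IMSI": "x"}, {"IMEI": "b"}]

assignment = assign_devices(belonging, centers)
print(identifiers_per_device(nodes, assignment))
```

Because every node is mapped to exactly one center, each identifier set ends up attached to a single device, which matches the requirement that the identifiers correspond to a unique mobile device.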
Corresponding to the above mobile device entity identification method based on a semi-supervised learning algorithm, an embodiment of the present application further provides an electronic device. Please refer to fig. 3, which is a schematic diagram of the electronic device for mobile device entity identification based on a semi-supervised learning algorithm provided in the embodiment of the present application.
The electronic device for the mobile device entity identification method based on the semi-supervised learning algorithm comprises:
a processor; and
a memory for storing a program of the mobile device entity identification method based on a semi-supervised learning algorithm, wherein, after the device is powered on and the processor runs the program, the following steps are performed:
determining attribute features of nodes in a co-occurrence sparse graph of identifiers of the mobile devices;
determining labeled data for the mobile device entity identification, wherein a quantity of the labeled data is equal to or less than a first quantity threshold;
determining, according to the attribute features of the nodes in the sparse co-occurrence graph and the labeled data, and by using a loss function of a semi-supervised learning algorithm, the features of the mobile device center to which each identifier belongs and the features of each mobile device center;
if the characteristics of the mobile equipment centers of a plurality of identifiers are the same, determining the plurality of identifiers to be a plurality of identifiers of the same mobile equipment;
and determining the mobile equipment uniquely corresponding to the identifiers according to the similarity between the characteristics of the mobile equipment center to which the identifiers belong and the characteristics of each equipment center.
It should be noted that, for the detailed description of the electronic device provided in the embodiment of the present application, reference may be made to the related description of the mobile device entity identification method based on the semi-supervised learning algorithm provided in the embodiment of the present application, and details are not repeated here.
Although the present application has been described with reference to preferred embodiments, these embodiments are not intended to limit it. Those skilled in the art can make variations and modifications without departing from the spirit and scope of the present application, so the scope of protection of the present application is determined by the claims that follow.

Claims (10)

1. A mobile equipment entity identification method based on a semi-supervised learning algorithm is characterized by comprising the following steps:
determining attribute features of nodes in a co-occurrence sparse graph of the mobile device identifiers;
determining labeled data for the mobile device entity identification, wherein a quantity of the labeled data does not exceed a first quantity threshold;
determining, according to the attribute features of the nodes in the sparse co-occurrence graph and the labeled data, and by using a loss function of a semi-supervised learning algorithm, the features of the mobile device center to which each identifier belongs and the features of each mobile device center;
if the characteristics of the mobile equipment centers of a plurality of identifiers are the same, determining the plurality of identifiers to be a plurality of identifiers of the same mobile equipment;
determining the mobile equipment uniquely corresponding to the identifiers according to the similarity between the characteristics of the mobile equipment centers to which the identifiers belong and the characteristics of each mobile equipment center;
wherein the sparse co-occurrence graph of the mobile device identifiers is obtained by taking sets of mobile device identifier information as nodes, connecting nodes that contain the same identifier, and deleting the connections among nodes containing the same identifier when the number of those nodes reaches or exceeds a preset node number threshold.
2. The semi-supervised learning algorithm based mobile device entity identification method according to claim 1, wherein determining, according to the attribute features of the nodes in the sparse co-occurrence graph and the labeled data, and by using a loss function of the semi-supervised learning algorithm, the features of the mobile device center to which each identifier belongs and the features of each mobile device center comprises:
establishing a loss function of a semi-supervised learning algorithm;
and performing iterative optimization training by taking the attribute features of the nodes in the sparse co-occurrence graph and the labeled data as parameters of the loss function of the semi-supervised learning algorithm, to obtain the features of the mobile device center to which each identifier belongs and the features of each mobile device center.
3. The semi-supervised learning algorithm based mobile device entity identification method of claim 1, wherein the set of identifier information of the mobile device contains software and hardware identifier information corresponding to the mobile device.
4. The semi-supervised learning algorithm based mobile device entity identification method according to claim 3, wherein the software and hardware identifier information corresponding to the mobile device specifically includes at least one of the following identifier information:
an equipment identity, IMEI, for uniquely identifying the mobile equipment;
a subscriber identity IMSI for uniquely identifying mobile subscriber information corresponding to the mobile device;
an advertisement Identifier (IDFA) for tracking the mobile device operation information;
a software identifier UTDID for uniquely identifying the mobile device.
5. The semi-supervised learning algorithm based mobile device entity identification method of claim 1, wherein the set of identifier information of the mobile device uniquely represents a real physical mobile device.
6. The semi-supervised learning algorithm based mobile device entity identification method of claim 1, wherein the attribute features of the nodes in the sparse co-occurrence graph are features of a set of identifier information of a mobile device.
7. The method according to claim 2, wherein if the characteristics of the mobile device centers to which the identifiers belong are the same, determining that the identifiers are identifiers of the same mobile device specifically comprises:
determining the similarity of the attribute characteristics of any two nodes of the co-occurrence relationship sparse graph;
determining, when the similarity meets a preset similarity threshold, that the two identifier sets respectively corresponding to the two nodes belong to the same mobile device;
and determining, given that the two identifier sets respectively corresponding to the two nodes belong to the same mobile device, that the identifiers contained in those identifier sets are a plurality of identifiers of the same mobile device.
8. A mobile device entity recognition apparatus based on semi-supervised learning algorithm, comprising:
a first obtaining unit, configured to determine attribute features of the nodes in the access records in the sparse co-occurrence graph of the mobile device identifiers;
a second obtaining unit, configured to determine labeled data for the mobile device entity identification, wherein the amount of labeled data does not exceed a first quantity threshold;
a computing unit, configured to determine, according to the attribute features of the nodes in the access records in the sparse co-occurrence graph and the labeled data, and by using a loss function of a semi-supervised learning algorithm, the features of the mobile device center to which each identifier belongs and the features of each mobile device center;
the first determining unit is used for determining a plurality of identifiers as a plurality of identifiers of the same mobile equipment if the characteristics of the mobile equipment centers to which the plurality of identifiers belong are the same;
a second determining unit, configured to determine, according to a similarity between a feature of a mobile device center to which the plurality of identifiers belong and a feature of each device center, the mobile device to which the plurality of identifiers uniquely correspond;
wherein the sparse co-occurrence graph of the mobile device identifiers is obtained by taking sets of mobile device identifier information as nodes, connecting nodes that contain the same identifier, and deleting the connections among nodes containing the same identifier when the number of those nodes reaches or exceeds a preset node number threshold.
9. An electronic device, comprising:
a processor; and
a memory for storing a program of a mobile device entity identification method based on a semi-supervised learning algorithm, wherein, after the device is powered on and the processor runs the program, the following steps are performed:
determining attribute features of nodes in a co-occurrence sparse graph of identifiers of the mobile devices;
determining labeled data for the mobile device entity identification, wherein a quantity of the labeled data is equal to or less than a first quantity threshold;
determining, according to the attribute features of the nodes in the sparse co-occurrence graph and the labeled data, and by using a loss function of a semi-supervised learning algorithm, the features of the mobile device center to which each identifier belongs and the features of each mobile device center;
if the characteristics of the mobile equipment centers of a plurality of identifiers are the same, determining the plurality of identifiers to be a plurality of identifiers of the same mobile equipment;
determining the mobile equipment uniquely corresponding to the identifiers according to the similarity between the characteristics of the mobile equipment center to which the identifiers belong and the characteristics of each equipment center;
wherein the sparse co-occurrence graph of the mobile device identifiers is obtained by taking sets of mobile device identifier information as nodes, connecting nodes that contain the same identifier, and deleting the connections among nodes containing the same identifier when the number of those nodes reaches or exceeds a preset node number threshold.
10. A memory device storing a program for a mobile device entity identification method based on a semi-supervised learning algorithm, the program being executed by a processor to perform the steps of:
determining attribute features of nodes in a co-occurrence sparse graph of identifiers of the mobile devices;
determining labeled data for the mobile device entity identification, wherein a quantity of the labeled data is equal to or less than a first quantity threshold;
determining, according to the attribute features of the nodes in the sparse co-occurrence graph and the labeled data, and by using a loss function of a semi-supervised learning algorithm, the features of the mobile device center to which each identifier belongs and the features of each mobile device center;
if the characteristics of the mobile equipment centers of a plurality of identifiers are the same, determining the plurality of identifiers to be a plurality of identifiers of the same mobile equipment;
determining the mobile equipment uniquely corresponding to the identifiers according to the similarity between the characteristics of the mobile equipment center to which the identifiers belong and the characteristics of each equipment center;
wherein the sparse co-occurrence graph of the mobile device identifiers is obtained by taking sets of mobile device identifier information as nodes, connecting nodes that contain the same identifier, and deleting the connections among nodes containing the same identifier when the number of those nodes reaches or exceeds a preset node number threshold.
CN201811011479.3A 2018-08-31 2018-08-31 Mobile equipment entity identification method and device based on semi-supervised learning algorithm Active CN110874465B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811011479.3A CN110874465B (en) 2018-08-31 2018-08-31 Mobile equipment entity identification method and device based on semi-supervised learning algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811011479.3A CN110874465B (en) 2018-08-31 2018-08-31 Mobile equipment entity identification method and device based on semi-supervised learning algorithm

Publications (2)

Publication Number Publication Date
CN110874465A CN110874465A (en) 2020-03-10
CN110874465B true CN110874465B (en) 2022-01-28

Family

ID=69715791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811011479.3A Active CN110874465B (en) 2018-08-31 2018-08-31 Mobile equipment entity identification method and device based on semi-supervised learning algorithm

Country Status (1)

Country Link
CN (1) CN110874465B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113254318B (en) * 2021-07-06 2021-10-22 北京达佳互联信息技术有限公司 Method and device for determining equipment identification information, electronic equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101196933A (en) * 2008-01-09 2008-06-11 王珊 Method and device for using connection table to compress data diagram
CN102096825A (en) * 2011-03-23 2011-06-15 西安电子科技大学 Graph-based semi-supervised high-spectral remote sensing image classification method
CN105160351A (en) * 2015-08-12 2015-12-16 西安电子科技大学 Semi-monitoring high-spectral classification method based on anchor point sparse graph
CN108460326A (en) * 2018-01-10 2018-08-28 华中科技大学 A kind of high spectrum image semisupervised classification method based on sparse expression figure

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103164428B (en) * 2011-12-13 2016-01-20 富士通株式会社 Determine the method and apparatus of the correlativity of microblogging and given entity
US9424307B2 (en) * 2012-10-11 2016-08-23 Scott E. Lilienthal Multivariate data analysis method
US9292797B2 (en) * 2012-12-14 2016-03-22 International Business Machines Corporation Semi-supervised data integration model for named entity classification
GB2532194A (en) * 2014-11-04 2016-05-18 Nokia Technologies Oy A method and an apparatus for automatic segmentation of an object
CN105303198B (en) * 2015-11-17 2018-08-17 福州大学 A kind of remote sensing image semisupervised classification method learnt from fixed step size
CN107168946A (en) * 2017-04-14 2017-09-15 北京化工大学 A kind of name entity recognition method of medical text data

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FAST SEMI–SUPERVISED CLASSIFICATION BASED ON PARALLEL AUCTION GRAPH FOR POLARIMETRIC SAR DATA;Hongying Liu 等;《2016 IEEE International Geoscience and Remote Sensing Symposium》;20160715;第1528-1531页 *
Feature Extraction of Hyperspectral Images With Semi-supervised Sparse Graph Learning;Renbo Luo 等;《2018 Fifth International Workshop on Earth Observation and Remote Sensing Applications》;20180620;第1-4页 *
基于稀疏图的半监督学习方法研究;王秀秀;《中国优秀硕士学位论文全文数据库 信息科技辑》;20131215(第S2期);第I140-119页 *

Also Published As

Publication number Publication date
CN110874465A (en) 2020-03-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information

Inventor after: Wang Can

Inventor after: Shen Xin

Inventor after: Wei Zhao Xian

Inventor after: Yang Hongxia

Inventor after: Wang Zhongyao

Inventor before: Shen Xin

Inventor before: Wei Zhao Xian

Inventor before: Yang Hongxia

Inventor before: Wang Zhongyao

GR01 Patent grant