CN114880407B - User intelligent identification method and system based on strong and weak relation network - Google Patents

User intelligent identification method and system based on strong and weak relation network

Info

Publication number
CN114880407B
Authority
CN
China
Prior art keywords
user
identification
relation
head portrait
data
Legal status
Active
Application number
CN202210601877.0A
Other languages
Chinese (zh)
Other versions
CN114880407A (en)
Inventor
张福明
王晓霞
李畅
於伟
Current Assignee
Shanghai Jiufangyun Intelligent Technology Co ltd
Original Assignee
Shanghai Jiufangyun Intelligent Technology Co ltd
Application filed by Shanghai Jiufangyun Intelligent Technology Co ltd
Priority to CN202210601877.0A
Publication of CN114880407A
Application granted
Publication of CN114880407B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/22 Matching criteria, e.g. proximity measures
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a user intelligent identification method and system based on a strong and weak relation network. Collected event-tracking (buried-point) data of users at each service end are processed to extract user nodes and relations and to generate the vertex set and edge set of a graph structure; node relations carrying user-node relation weights are obtained; the maximum connected subgraph is computed as the candidate set of user identification results; and a user identification model fusing avatars (head portraits) and nicknames is trained and used for prediction to obtain the final user identification result set. By computing strong- and weak-relation subgraphs in the user relationship network, the invention addresses erroneous association of user relations. It also handles user data in the relationship graph that has expired or become abnormal, avoiding wasted resources and erroneous relation calculations, and by adding avatar and nickname similarity calculation it improves the accuracy, applicability and generality of user identification.

Description

User intelligent identification method and system based on strong and weak relation network
Technical Field
The invention relates to the field of artificial intelligence, in particular to a user intelligent identification method and system based on a strong-weak relation network.
Background
In the internet era, the same real-world user may access services on different system ends through various devices. The user's information is then isolated across multiple records, which hinders analysis and modeling of the user and prevents the user's data from being fully used to optimize the user experience.
In the prior art, user identification is performed with a graph database that is stored and updated in real time: the user ID data to be processed are mapped into subgraphs of the graph database, and each ID relationship is treated as an undirected edge, producing several connected-component subgraphs of the user ID relationship graph. Each connected subgraph represents one real user, and the IDs on it are treated as equivalent, different IDs of the same user.
However, most related technologies deal only with user identifiers (IDs) and give every ID relationship the same weight, so they directly run traversal queries over the ID-relationship subgraphs of the graph structure. User associations carry no strong or weak weighting, and some user IDs are invalid or abnormal, which clearly reduces accuracy and generality. In real applications, user association identification tolerates very few errors, so high accuracy is critical. For more complete user analysis, efficient and accurate user data association is therefore a technical problem the field needs to solve first.
In recent years, the rapid development and maturing application of graph computing and artificial intelligence have made accurate user identification possible. By applying and improving these frontier technologies, user identification can be performed on a strong and weak relation network, improving the accuracy of user association mapping and effectively solving the problem of connecting user data across sources.
Patent document CN108491424A discloses a user ID association method and apparatus. The method includes: acquiring the user ID history log of a target object within a preset period; determining association features between different user IDs based on that log; building a user ID mapping relation list from the log and checking, for each user ID, whether it has a one-to-one mapping with the corresponding user IDs of other types; if so, setting the confidence between the user ID and the corresponding other-type user ID to 1; otherwise, determining that confidence value with an association analysis algorithm based on the association features between the two IDs.
The inventive concept of CN108491424A is to acquire multiple pieces of user ID data, establish association relations among them, and judge the confidence of each association. The confidence is set by running association analysis on the different association features in two user ID mapping relation lists, where the two IDs are homogeneous: both are IDs of the same type (user IDs), which serve only as a user's identifier and have no other practical meaning. The drawbacks are low timeliness and weak flexibility: the confidence cannot be updated in time when an association feature expires, and the heterogeneity of the user ID network is not considered. As a result, the ID association mapping cannot be updated in real time, data utilization is low, and accuracy is low, to the point that the mapping may not work at all.
The present invention makes full use of user-related data, including heterogeneous IDs, that is, IDs of different types. It generates the user ID relationships as maximum connected subgraphs divided into strong and weak relations, adds a dynamic decay factor, flexibly generates the user mapping relation through subgraph calculation, and further predicts that mapping using the users' avatar and nickname data to improve accuracy. The IDs in the invention are IDs in the network, such as the ID card number, the mobile phone number and the WeChat ID. From these heterogeneous IDs the invention can build an intuitive and efficient user ID network graph structure, make full use of rich and diverse user data, and apply advanced graph algorithms to meet the timeliness, flexibility and accuracy requirements of user ID mapping. The user ID, ID card number, mobile phone number, WeChat ID and so on contained in the graph structure are heterogeneous IDs, which can be understood as a user's IDs on different platforms.
Disclosure of Invention
In view of the defects in the prior art, the object of the invention is to provide a user intelligent identification method and system based on a strong-weak relation network.
The invention provides a user intelligent identification method based on a strong-weak relation network, which comprises the following steps:
Step S1: process the collected tracking (buried-point) data of users at each service end, extract user nodes and relations, and generate the vertex set and edge set of the graph structure;
Step S2: obtain node relations carrying user-node relation weights from the vertex set and the edge set;
Step S3: obtain the maximum connected subgraph from the weighted node relations as the candidate set of user identification results;
Step S4: train on the candidate set of user identification results to obtain a user identification model fusing avatars and nicknames, and predict with that model to obtain the final user identification result set.
Preferably, step S2 includes:
Step S2.1: traverse the node pairs and read the two nodes of each pair, giving the node relation $\langle V_x, V_y \rangle$, where $V_x$ and $V_y$ are the two nodes of the pair;
Step S2.2: acquire the relevant attribute parameters of each node pair:
$\{\langle V_x, V_y \rangle : \text{property}_1, \text{property}_2, \text{property}_3, \ldots\}$, where $\text{property}_i$ denotes the $i$-th attribute;
Step S2.3: calculate the weight $\text{weight}_{xy}$ of each node pair as follows:
$$\text{weight}_{xy} = \sum_i \text{property}_i \times \text{factor}_i$$
where $\text{factor}_i$ is the importance factor of the $i$-th attribute. Finally, each node relation is saved as a triplet $\langle V_x, \text{weight}_{xy}, V_y \rangle$.
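To make the weight formula concrete, here is a purely illustrative calculation with hypothetical numbers (the patent gives no attribute values or factors): a node pair with two attributes, $\text{property}_1 = 0.8$ and $\text{property}_2 = 0.5$, and importance factors $\text{factor}_1 = 0.6$ and $\text{factor}_2 = 0.4$:
$$\text{weight}_{xy} = 0.8 \times 0.6 + 0.5 \times 0.4 = 0.48 + 0.20 = 0.68$$
so this pair would be saved as the triplet $\langle V_x, 0.68, V_y \rangle$.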
Preferably, step S3 includes:
Step S3.1: build all node relations carrying user-node relation weights into a weighted graph model;
Step S3.2: compute the maximum connected subgraph of the graph model;
step S3.2 includes:
Step S3.2.1: access the current vertex $V_1$ and traverse in turn the first-layer vertices $V_i$ directly associated with it, calculating for each $W = \text{weight}_{1i} \times \varepsilon_i$, where $\varepsilon_i$ is the dynamic decay factor corresponding to $\text{weight}_{1i}$, and $\text{weight}_{1i}$ is the relation weight of the node pair formed by vertex $V_1$ and the $i$-th vertex;
Step S3.2.2: set $\mu$ as the threshold, calculated by the following formula:
where $N$ is the number of all nodes in the graph model;
Step S3.2.3: if $W \ge \mu$, add the node $V_i$ to the queue Queue and to the maximum connected subgraph result set; if $W < \mu$, discard $V_i$. Take the next vertex as the current vertex and return to step S3.2.1;
steps S3.2.1, S3.2.2 and S3.2.3 are executed in a loop, traversing each layer of nodes in turn, to obtain the maximum connected subgraph, i.e. the candidate set of user identification results.
Preferably, step S4 includes:
Step S4.1: randomly select part of the data in the candidate set of user identification results as model training samples, and acquire the avatar data and nickname data of the users they contain; calculate, respectively, the avatar similarity and the nickname exact-match value between users;
step S4.1 includes:
Step S4.1.1: apply resolution change and rotation to the user avatar data in the training samples to obtain the transformed avatar data $\text{profile}_1$ and $\text{profile}_2$, and set the user nicknames corresponding to the transformed avatar data as $\text{name}_1$ and $\text{name}_2$ respectively, as model input data;
Step S4.1.2: calculate avatar similarity using ResNet with contrastive learning: throughout training of the recognition model, the first 40 layers of the ResNet network are frozen while the parameters of the later layers are continuously optimized; each avatar image is finally encoded as a 2048-dimensional vector, and the similarity value is calculated by cosine similarity. Nickname similarity is calculated by rule-based exact matching;
Step S4.2: use the similarity feature values as input data of the GBDT algorithm layer to predict whether two user identifiers in the candidate set that may belong to the same user actually do, obtaining a trained user identification model fusing avatars and nicknames; the similarity feature values comprise the avatar similarity feature value and the nickname similarity feature value;
Step S4.3: predict with the trained user identification model fusing avatars and nicknames and check for prediction-result conflicts. If a conflict exists, mark the two user identifiers concerned as abnormally identified users and add them to the abnormal user set; otherwise, add them to the final user identification result set. A prediction-result conflict means that two user identifiers in the candidate set that may belong to the same user are identified as not being the same user;
Step S4.4: manually audit the abnormal user set, correct wrongly identified users, add them to the user identification result set and the training set, and continuously optimize the iterative prediction model.
Preferably, user relation associations are analyzed by computing subgraphs based on strong and weak relations in the user relationship network; during the graph calculation, a dynamic decay factor of the node relation is added to identify and eliminate expired or abnormal (invalid) user data in the user relationship graph; the user IDs include heterogeneous IDs.
The invention provides a user intelligent identification system based on a strong-weak relation network, comprising:
Module M1: processes the collected tracking (buried-point) data of users at each service end, extracts user nodes and relations, and generates the vertex set and edge set of the graph structure;
Module M2: obtains node relations carrying user-node relation weights from the vertex set and the edge set;
Module M3: obtains the maximum connected subgraph from the weighted node relations as the candidate set of user identification results;
Module M4: trains on the candidate set of user identification results to obtain a user identification model fusing avatars and nicknames, and predicts with that model to obtain the final user identification result set.
Preferably, module M2 comprises:
Module M2.1: traverses the node pairs and reads the two nodes of each pair, giving the node relation $\langle V_x, V_y \rangle$, where $V_x$ and $V_y$ are the two nodes of the pair;
Module M2.2: acquires the relevant attribute parameters of each node pair:
$\{\langle V_x, V_y \rangle : \text{property}_1, \text{property}_2, \text{property}_3, \ldots\}$, where $\text{property}_i$ denotes the $i$-th attribute;
Module M2.3: calculates the weight $\text{weight}_{xy}$ of each node pair as follows:
$$\text{weight}_{xy} = \sum_i \text{property}_i \times \text{factor}_i$$
where $\text{factor}_i$ is the importance factor of the $i$-th attribute. Finally, each node relation is saved as a triplet $\langle V_x, \text{weight}_{xy}, V_y \rangle$.
Preferably, module M3 comprises:
Module M3.1: builds all node relations carrying user-node relation weights into a weighted graph model;
Module M3.2: computes the maximum connected subgraph of the graph model;
Module M3.2 comprises:
Module M3.2.1: accesses the current vertex $V_1$ and traverses in turn the first-layer vertices $V_i$ directly associated with it, calculating for each $W = \text{weight}_{1i} \times \varepsilon_i$, where $\varepsilon_i$ is the dynamic decay factor corresponding to $\text{weight}_{1i}$, and $\text{weight}_{1i}$ is the relation weight of the node pair formed by vertex $V_1$ and the $i$-th vertex;
Module M3.2.2: sets $\mu$ as the threshold, calculated by the following formula:
where $N$ is the number of all nodes in the graph model;
Module M3.2.3: if $W \ge \mu$, adds the node $V_i$ to the queue Queue and to the maximum connected subgraph result set; if $W < \mu$, discards $V_i$. It then takes the next vertex as the current vertex and returns to module M3.2.1;
modules M3.2.1, M3.2.2 and M3.2.3 are executed in a loop, traversing each layer of nodes in turn, to obtain the maximum connected subgraph, i.e. the candidate set of user identification results.
Preferably, module M4 comprises:
Module M4.1: randomly selects part of the data in the candidate set of user identification results as model training samples, and acquires the avatar data and nickname data of the users they contain; calculates, respectively, the avatar similarity and the nickname exact-match value between users;
Module M4.1 comprises:
Module M4.1.1: applies resolution change and rotation to the user avatar data in the training samples to obtain the transformed avatar data $\text{profile}_1$ and $\text{profile}_2$, and sets the user nicknames corresponding to the transformed avatar data as $\text{name}_1$ and $\text{name}_2$ respectively, as model input data;
Module M4.1.2: calculates avatar similarity using ResNet with contrastive learning: throughout training of the recognition model, the first 40 layers of the ResNet network are frozen while the parameters of the later layers are continuously optimized; each avatar image is finally encoded as a 2048-dimensional vector, and the similarity value is calculated by cosine similarity. Nickname similarity is calculated by rule-based exact matching;
Module M4.2: uses the similarity feature values as input data of the GBDT algorithm layer to predict whether two user identifiers in the candidate set that may belong to the same user actually do, obtaining a trained user identification model fusing avatars and nicknames; the similarity feature values comprise the avatar similarity feature value and the nickname similarity feature value;
Module M4.3: predicts with the trained user identification model fusing avatars and nicknames and checks for prediction-result conflicts. If a conflict exists, it marks the two user identifiers concerned as abnormally identified users and adds them to the abnormal user set; otherwise, it adds them to the final user identification result set. A prediction-result conflict means that two user identifiers in the candidate set that may belong to the same user are identified as not being the same user;
Module M4.4: supports manual auditing of the abnormal user set, corrects wrongly identified users, adds them to the user identification result set and the training set, and continuously optimizes the iterative prediction model.
Preferably, user relation associations are analyzed by computing subgraphs based on strong and weak relations in the user relationship network; during the graph calculation, a dynamic decay factor of the node relation is added to identify and eliminate expired or abnormal (invalid) user data in the user relationship graph; the user IDs include heterogeneous IDs.
Compared with the prior art, the invention has the following beneficial effects:
1. The invention uses node relations carrying user-node relation weights and computes strong- and weak-relation subgraphs in the user relationship network, which addresses erroneous association of user relations and improves the accuracy of user identification.
2. During graph calculation, the method adds a dynamic decay factor to node relations, which handles user data in the relationship graph that has expired or become abnormal and avoids the wasted resources and erroneous relation calculations such data would otherwise cause.
3. The invention adds a further identification basis, avatar and nickname similarity calculation, refines the user identification candidate set with a trained high-precision user identification prediction model, and sends conflicting user mappings for manual audit and intervention, ensuring the accuracy, applicability and generality of user identification.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, given with reference to the accompanying drawings in which:
FIG. 1 is a flow chart of the steps of the present invention.
Detailed Description
The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the present invention, but are not intended to limit the invention in any way. It should be noted that variations and modifications could be made by those skilled in the art without departing from the inventive concept. These are all within the scope of the present invention.
The invention provides a user identification method based on a strong-weak relation network. It makes full use of user-related data to generate user relations as maximum connected subgraphs divided into strong and weak relations, adds a dynamic decay factor, generates the user mapping relation through subgraph calculation, and further predicts that mapping using the users' avatar and nickname data, addressing the fragility and low accuracy of user identification in the prior art. The main steps of the invention are:
1) Acquire the relevant tracking (buried-point) data of users.
2) Apply rule-based preprocessing such as cleaning, filtering and de-duplication to the raw dirty data.
3) Extract the nodes and relations in the user data to generate the vertex and edge sets.
4) Calculate the relation weight of each node pair from its related attributes and generate node relation triplets.
5) Build all weighted node relations into a weighted graph model.
6) Compute the maximum connected subgraphs with a breadth-first search algorithm based on a dynamic decay factor, generating the candidate set of user identification results.
7) Select part of the data in the candidate set, acquire the users' avatar and nickname data, and apply operations such as resolution change and rotation to the avatar images to form the raw training data of the user identification model.
8) Encode the avatar images as 2048-dimensional vectors and compute similarities to obtain avatar similarity feature values; match nicknames exactly by rules to obtain matching feature values.
9) Train and predict with the GBDT algorithm to generate the user identification result set.
10) Check whether the predicted results conflict; if so, mark the users concerned as abnormal and add them to the abnormal user set; otherwise, add the results to the final user identification result set.
11) Manually audit the abnormal user set, correct wrongly identified users, add them to the user identification result set and the training set, and continuously optimize the iterative prediction model.
The present invention will be described in more detail below.
The intelligent user identification method based on the strong-weak relation network provided by the invention comprises the following steps:
Step S1: data acquisition and cleaning. This part mainly comprises:
Step S1.1: collect the tracking (buried-point) data of users at each service end; the user IDs include heterogeneous IDs.
Step S1.2: apply rule-based preprocessing such as cleaning, filtering and de-duplication to the raw dirty data.
Step S1.3: extract the corresponding user nodes and relations to generate the vertex set Vertex1 and edge set Edge1 of the graph structure.
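Step S1 does not specify the event schema or the cleaning rules, so the following is only a minimal Python sketch under stated assumptions: each tracking event is a dict, the heterogeneous ID fields are named `account_id`, `phone`, `wechat_id` and `id_card` (hypothetical names), cleaning means trimming values and dropping empty ones, de-duplication removes events with identical ID sets, and every pair of IDs seen in the same event becomes an undirected edge. A vertex is a typed ID, so a phone number and an account ID with the same string value stay distinct.

```python
from itertools import combinations

# Hypothetical names for the heterogeneous ID fields carried by a tracking event.
ID_FIELDS = ("account_id", "phone", "wechat_id", "id_card")


def clean_events(raw_events):
    """Rule-based cleaning: trim values, drop empty IDs, de-duplicate identical events."""
    seen, cleaned = set(), []
    for event in raw_events:
        ids = {}
        for field in ID_FIELDS:
            value = str(event.get(field) or "").strip()
            if value:
                ids[field] = value
        key = tuple(sorted(ids.items()))
        # An event needs at least two IDs to contribute a relation (edge).
        if len(ids) >= 2 and key not in seen:
            seen.add(key)
            cleaned.append(ids)
    return cleaned


def build_graph(events):
    """Return (vertices, edges); a vertex is a typed ID such as ('phone', '138...')."""
    vertices, edges = set(), set()
    for ids in events:
        nodes = sorted(ids.items())
        vertices.update(nodes)
        # Every pair of IDs in one event is an undirected edge with sorted endpoints.
        edges.update(combinations(nodes, 2))
    return vertices, edges
```

Using typed tuples as vertices keeps the ID type attached to each node, which is what makes the graph heterogeneous rather than a graph of bare strings.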
Step S2: user node relation weight calculation. This part mainly comprises:
Step S2.1: traverse the node pairs and read the two nodes of each pair, giving the node relation $\langle V_x, V_y \rangle$, where $V_x$ and $V_y$ are the two nodes of the pair;
Step S2.2: acquire the relevant attribute parameters of each node pair:
$\{\langle V_x, V_y \rangle : \text{property}_1, \text{property}_2, \text{property}_3, \ldots\}$, where $\text{property}_i$ denotes the $i$-th attribute.
Step S2.3: calculate the relation weight of each node pair as follows:
$$\text{weight}_{xy} = \sum_i \text{property}_i \times \text{factor}_i$$
where $\text{factor}_i$ is the importance factor of the $i$-th attribute. Finally, each node relation is saved as a triplet $\langle V_x, \text{weight}_{xy}, V_y \rangle$.
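Steps S2.1 to S2.3 can be sketched in Python as follows. This assumes, hypothetically, that the attribute values of each node pair have already been extracted as numbers and that one importance factor applies per attribute position across all pairs; the input layout and function names are illustrative, not taken from the patent.

```python
def relation_weight(properties, factors):
    """weight_xy = sum_i property_i * factor_i."""
    if len(properties) != len(factors):
        raise ValueError("each attribute needs exactly one importance factor")
    return sum(p * f for p, f in zip(properties, factors))


def to_triplets(pair_properties, factors):
    """Turn {(v_x, v_y): [property_1, property_2, ...]} into [(v_x, weight_xy, v_y), ...]."""
    return [
        (vx, relation_weight(properties, factors), vy)
        for (vx, vy), properties in pair_properties.items()
    ]
```

For example, `to_triplets({("a", "b"): [0.8, 0.5]}, [0.6, 0.4])` yields a single triplet whose weight is approximately 0.68.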
Step S3: maximum connected subgraph calculation. This part mainly comprises:
Step S3.1: build all the weighted node relations obtained in step S2.3 into a weighted graph model.
Step S3.2: compute the maximum connected subgraph with a breadth-first search algorithm based on a dynamic decay factor, as follows:
Step S3.2.1: access the current vertex $V_1$ and traverse in turn the first-layer vertices $V_i$ directly associated with it, calculating for each $W = \text{weight}_{1i} \times \varepsilon_i$, where $\varepsilon_i$ is the dynamic decay factor corresponding to $\text{weight}_{1i}$, and $\text{weight}_{1i}$ is the relation weight of the node pair formed by vertex $V_1$ and the $i$-th vertex.
Step S3.2.2: set $\mu$ as the threshold, calculated by the following formula:
where $N$ is the number of all nodes in the graph model.
Step S3.2.3: if $W \ge \mu$, add the node $V_i$ to the queue Queue and to the maximum connected subgraph result set; if $W < \mu$, discard $V_i$. Take the next vertex as the current vertex and return to step S3.2.1.
Steps S3.2.1, S3.2.2 and S3.2.3 are executed in a loop, traversing each layer of nodes in turn, to obtain the maximum connected subgraph, i.e. the candidate set of user identification results.
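The text does not reproduce the formula for μ or say how the decay factor ε is obtained, so the sketch below takes μ as a parameter and accepts an optional per-edge decay factor that defaults to 1.0 (no decay). The `time_decay` half-life helper is a hypothetical example of a decay factor, not the patent's definition. Under this reading, the search is an ordinary breadth-first traversal that only expands along edges whose decayed weight W = weight × ε reaches μ, and each resulting connected group with more than one node is one candidate group of IDs.

```python
from collections import defaultdict, deque


def time_decay(age_days, half_life_days=30.0):
    """Hypothetical decay factor: a relation loses half its strength every half_life_days."""
    return 0.5 ** (age_days / half_life_days)


def strong_relation_groups(triplets, mu, decay=None):
    """BFS over edges whose decayed weight W = weight_xy * epsilon_xy is at least mu.

    triplets: iterable of (v_x, weight_xy, v_y); decay: optional {(v_x, v_y): epsilon}.
    Returns the connected groups of two or more vertices (the candidate set).
    """
    decay = decay or {}
    adjacency = defaultdict(list)
    for vx, weight, vy in triplets:
        epsilon = decay.get((vx, vy), decay.get((vy, vx), 1.0))
        w = weight * epsilon
        adjacency[vx].append((vy, w))  # undirected: store both directions
        adjacency[vy].append((vx, w))

    visited, groups = set(), []
    for start in list(adjacency):
        if start in visited:
            continue
        visited.add(start)
        queue, group = deque([start]), [start]
        while queue:
            current = queue.popleft()
            for neighbour, w in adjacency[current]:
                # Strong relation: keep the neighbour and expand from it later.
                # Weak relation (w < mu): the edge is ignored, as in step S3.2.3.
                if w >= mu and neighbour not in visited:
                    visited.add(neighbour)
                    queue.append(neighbour)
                    group.append(neighbour)
        if len(group) > 1:
            groups.append(group)
    return groups
```

A node reached only through weak edges is not absorbed into a group; it can still join a group later if some other node reaches it through a strong edge.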
Step S4: user identification model fusing avatars and nicknames.
Step S4.1: from the candidate set of user identification results computed by the maximum connected subgraph algorithm in step S3, randomly select part of the data as model training samples and acquire the avatar and nickname data of the users they contain. Then calculate, respectively, the avatar similarity and the nickname exact-match value between users.
Step S4.1.1: apply resolution change and rotation to the user avatar data in the training samples to obtain the transformed avatar data $\text{profile}_1$ and $\text{profile}_2$, and set the corresponding user nicknames as $\text{name}_1$ and $\text{name}_2$ respectively, as model input data.
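Step S4.1.1 only names the operations (resolution change and rotation). A real pipeline would use an image library; to keep the idea self-contained, the sketch below treats an avatar as a 2-D grid of pixel values (a list of rows) and assumes, hypothetically, that `profile_1` is a resampled copy and `profile_2` a copy rotated 90 degrees clockwise, each paired with the user's nickname.

```python
def change_resolution(image, new_h, new_w):
    """Nearest-neighbour resampling of a 2-D pixel grid given as a list of rows."""
    old_h, old_w = len(image), len(image[0])
    return [
        [image[r * old_h // new_h][c * old_w // new_w] for c in range(new_w)]
        for r in range(new_h)
    ]


def rotate_90(image):
    """Rotate a 2-D pixel grid 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]


def make_training_views(avatar, nickname, new_h, new_w):
    """Hypothetical pairing: profile_1 is resampled, profile_2 is rotated; both keep the nickname."""
    profile_1 = change_resolution(avatar, new_h, new_w)
    profile_2 = rotate_90(avatar)
    return (profile_1, nickname), (profile_2, nickname)
```

For instance, upsampling `[[1, 2], [3, 4]]` to 4×4 repeats each pixel in a 2×2 block, and rotating it gives `[[3, 1], [4, 2]]`.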
Step S4.1.2: the second layer of the model has two parts: avatar similarity calculation and nickname rule matching. Avatar similarity is calculated with ResNet and contrastive learning: throughout training of the recognition model, the avatar similarity part freezes the first 40 layers of the ResNet network while the parameters of the later layers are continuously optimized; each avatar image is finally encoded as a 2048-dimensional vector, and the similarity value is calculated by cosine similarity. Nickname similarity is calculated by rule-based exact matching.
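The ResNet encoder itself is not shown here. Assuming each avatar has already been encoded as an embedding vector (2048-dimensional in the patent), the sketch below computes the two similarity features of step S4.1.2: cosine similarity between the avatar vectors and a rule-based exact nickname match. The normalisation applied before matching (removing whitespace and case-folding) is an assumed rule, not one stated in the source.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    if len(u) != len(v):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def normalise_nickname(name):
    """Assumed matching rule: ignore whitespace and letter case."""
    return "".join(name.split()).casefold()


def nickname_match(name_1, name_2):
    """Rule-based exact match: 1 if the normalised nicknames are identical, else 0."""
    return int(normalise_nickname(name_1) == normalise_nickname(name_2))


def pair_features(embedding_1, embedding_2, name_1, name_2):
    """Feature vector [avatar_similarity, nickname_match] for one candidate pair."""
    return [cosine_similarity(embedding_1, embedding_2), nickname_match(name_1, name_2)]
```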
Step S4.2: the similarity feature values calculated in step S4.1.2 are used as input data of the GBDT algorithm layer to predict whether the two users concerned are the same user.
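Step S4.2 feeds the two similarity features into a GBDT (gradient-boosted decision tree) layer. A production system would more likely use an existing implementation such as scikit-learn's `GradientBoostingClassifier`; the dependency-free sketch below only illustrates the technique, boosting depth-1 trees (stumps) on the log-loss gradient with simple gradient-step leaf values. The feature layout `[avatar_similarity, nickname_match]` and the hyperparameters are assumptions.

```python
import math


def _sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def _fit_stump(X, residuals):
    """Pick the (feature, threshold) split whose two leaf means best fit the residuals."""
    best = None
    for f in range(len(X[0])):
        for threshold in sorted({row[f] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[f] <= threshold]
            right = [r for row, r in zip(X, residuals) if row[f] > threshold]
            if not left or not right:
                continue
            left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
            error = sum((r - left_mean) ** 2 for r in left) + sum((r - right_mean) ** 2 for r in right)
            if best is None or error < best[0]:
                best = (error, f, threshold, left_mean, right_mean)
    if best is None:  # no valid split (all rows identical): constant update
        mean = sum(residuals) / len(residuals)
        return 0, math.inf, mean, mean
    return best[1:]


class TinyGBDT:
    """Gradient-boosted decision stumps for binary classification (log loss)."""

    def __init__(self, n_estimators=50, learning_rate=0.1):
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate
        self.base_score = 0.0
        self.stumps = []

    def _raw_score(self, row):
        score = self.base_score
        for f, threshold, left_value, right_value in self.stumps:
            score += self.learning_rate * (left_value if row[f] <= threshold else right_value)
        return score

    def fit(self, X, y):
        p = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
        self.base_score = math.log(p / (1 - p))  # start from the prior log-odds
        self.stumps = []
        for _ in range(self.n_estimators):
            # Negative gradient of the log loss w.r.t. the raw score is y - sigmoid(score).
            residuals = [yi - _sigmoid(self._raw_score(row)) for row, yi in zip(X, y)]
            self.stumps.append(_fit_stump(X, residuals))
        return self

    def predict_proba(self, row):
        return _sigmoid(self._raw_score(row))

    def predict(self, row, threshold=0.5):
        return int(self.predict_proba(row) >= threshold)
```

Each row is one candidate pair's feature vector, and the label is 1 when the pair is known to be the same user.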
Step S4.3: predictions are made with the trained user identification model that fuses avatars and nicknames. If a prediction conflict occurs, the users involved are marked as abnormal identification users and added to the abnormal identification user set; otherwise, they are added to the final user identification result set. For example, users A and B in the user candidate set may be the same person. If the model then predicts that A and B are not the same person, the prediction results conflict. A and B are both marked as abnormal users and added to the abnormal identification user set.
Step S4.4: the abnormal user set is audited manually. Misidentified users are corrected and added to the user identification result set and the training set, so that the prediction model is continuously and iteratively optimized.
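The routing rule of step S4.3 can be sketched as follows. A conflict is a candidate pair that the graph stage proposed as possibly the same user but that the model rejects. `predict_same` is a hypothetical callable standing in for the trained model. Collecting abnormal users as a set is a design choice, so that a user involved in several conflicts is queued for manual audit only once.

```python
def split_by_conflict(candidate_pairs, predict_same):
    """Route candidate pairs as in step S4.3.

    candidate_pairs: (user_a, user_b) pairs flagged by the graph stage as
    possibly the same user. predict_same: hypothetical callable wrapping
    the trained model; returns True when it judges the pair to be one user."""
    final_results, abnormal_users = [], set()
    for user_a, user_b in candidate_pairs:
        if predict_same(user_a, user_b):
            final_results.append((user_a, user_b))
        else:
            # Conflict: graph says "possibly same", model says "not same".
            abnormal_users.update((user_a, user_b))
    return final_results, abnormal_users
```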
The invention also provides a user intelligent recognition system based on the strong-weak relation network, which comprises:
Module M1: process the collected event-tracking data of the users of each service platform, extract user nodes and relations, and generate the vertex set and edge set of the graph structure;
Module M2: obtain node relations carrying user node relation weights from the vertex set and edge set;
Module M3: obtain a maximum connected subgraph from the weighted node relations as the user identification result candidate set;
Module M4: train, on the user identification result candidate set, a user identification model that fuses avatars and nicknames, and use this model to predict the final user identification result set.
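The text does not specify the format of the tracking data used by module M1. The following is a minimal sketch assuming each tracking event is a dictionary mapping an ID type (hypothetical names such as `device_id` or `phone`) to the value observed in that event. Under this assumption, IDs that co-occur in one event are linked, and the edge value counts co-occurrences. Prefixing each value with its ID type keeps heterogeneous IDs distinct as vertices.

```python
from itertools import combinations


def build_graph(events):
    """Derive the vertex set and edge set from tracking events.

    Assumption: each event is a dict {id_type: value}; IDs that co-occur in
    one event are linked, and the edge value counts how often they co-occur."""
    vertices, edges = set(), {}
    for event in events:
        # Prefix each value with its ID type so heterogeneous IDs stay distinct;
        # empty or missing values are skipped.
        ids = sorted(f"{id_type}:{value}" for id_type, value in event.items() if value)
        vertices.update(ids)
        for u, v in combinations(ids, 2):  # ids are sorted, so (u, v) is canonical
            edges[(u, v)] = edges.get((u, v), 0) + 1
    return vertices, edges
```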
The module M2 includes:
Module M2.1: traverse every node pair and read its two nodes; the node relation is written $\langle V_x, V_y \rangle$, where $V_x$ and $V_y$ are the two nodes of the pair;
Module M2.2: obtain the relevant attribute parameters of each node pair, $\{\langle V_x, V_y \rangle: \mathrm{property}_1, \mathrm{property}_2, \mathrm{property}_3, \ldots\}$, where $\mathrm{property}_i$ denotes the $i$-th attribute;
Module M2.3: compute the weight $\mathrm{weight}_{xy}$ of each node pair as follows:
$$\mathrm{weight}_{xy} = \sum_{i} \mathrm{property}_i \times \mathrm{factor}_i$$
where $\mathrm{factor}_i$ is the importance factor of the $i$-th attribute. Each node relation is finally saved as the triple $\langle V_x, \mathrm{weight}_{xy}, V_y \rangle$.
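Module M2.3 is a weighted sum followed by a change of representation. The sketch below shows both steps. The number of attributes, their meaning (for instance a co-occurrence count), and the factor values used in the test are illustrative assumptions.

```python
def relation_weight(properties, factors):
    """weight_xy = sum_i property_i * factor_i (Module M2.3)."""
    if len(properties) != len(factors):
        raise ValueError("each property needs exactly one importance factor")
    return sum(p * f for p, f in zip(properties, factors))


def to_triples(pair_properties, factors):
    """Turn {(Vx, Vy): [property_1, ...]} into <Vx, weight_xy, Vy> triples."""
    return [(vx, relation_weight(props, factors), vy)
            for (vx, vy), props in pair_properties.items()]
```

With properties `[2, 1, 0]` and factors `[0.5, 0.3, 0.2]`, the weight is 2 × 0.5 + 1 × 0.3 + 0 × 0.2 = 1.3.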
Preferably, the module M3 comprises:
Module M3.1: assemble all node relations carrying user node relation weights into a weighted graph model;
Module M3.2: compute the maximum connected subgraph of the graph model;
The module M3.2 comprises:
Module M3.2.1: visit the current vertex V1 and traverse, in order, the first-layer vertices V1, V2, V3, ... directly associated with it, computing for each $W = \mathrm{weight}_{1i} \times \varepsilon_i$, where $\varepsilon_i$ is the dynamic decay factor corresponding to $\mathrm{weight}_{1i}$, and $\mathrm{weight}_{1i}$ is the weight of the relation between the node pair formed by vertex V1 and the $i$-th vertex;
Module M3.2.2: set μ as the threshold, calculated by a formula in which N is the total number of nodes in the graph model;
Module M3.2.3: if W ≥ μ, add node V1 to the Queue and to the maximum connected subgraph result set; if W < μ, discard node V1. Then take the next vertex as the current vertex and return to module M3.2.1;
Modules M3.2.1, M3.2.2 and M3.2.3 are triggered in a loop, traversing the nodes layer by layer, to obtain the maximum connected subgraph, i.e. the user identification result candidate set.
Preferably, the module M4 comprises:
Module M4.1: randomly select part of the user identification result candidate set as model training samples and obtain the avatar and nickname data of the users they contain; compute the avatar similarity and the nickname exact-match value between users;
Module M4.1 comprises:
Module M4.1.1: apply resolution changes and rotations to the user avatar data in the model training samples to obtain the transformed avatar data profile_1 and profile_2, and set the corresponding user nicknames as name_1 and name_2; these serve as the model input data;
Module M4.1.2: compute avatar similarity using ResNet with contrastive learning. Throughout training of the recognition model, the first 40 layers of the ResNet network are frozen, while the parameters of the later layers are continuously optimized. Each avatar image is finally encoded as a 2048-dimensional vector, and the similarity value is computed by cosine similarity. Nickname similarity is computed by rule-based exact matching;
Module M4.2: use the similarity feature values as input to the GBDT algorithm layer to predict whether two user identifiers in the user candidate set that may belong to the same user actually do, thereby obtaining a trained user identification model that fuses avatars and nicknames; the similarity feature values comprise an avatar similarity feature value and a nickname similarity feature value;
Module M4.3: predict with the trained user identification model that fuses avatars and nicknames and judge whether a prediction conflict exists. If so, mark the two users whose results conflict as abnormal identification users and add them to the abnormal identification user set; otherwise, add them to the final user identification result set. A prediction conflict means that two user identifiers in the user candidate set that may belong to the same user are identified as not being the same user;
Module M4.4: manually audit the abnormal user set, correct misidentified users, add them to the user identification result set and the training set, and iteratively optimize the prediction model.
The association between users is analyzed by computing subgraphs based on the strong and weak relations in the user relationship network. During the graph computation, a dynamic decay factor is applied to node relations to identify and eliminate cases where part of the user data in the user relationship graph has expired or become abnormal. The user IDs involved are heterogeneous.
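The text does not state the form of the dynamic decay factor ε. One plausible choice, assumed here purely for illustration, is exponential decay based on how long ago a relation was last observed. A relation whose decayed weight falls below a floor is then treated as expired. The half-life, the floor, and the `ages` lookup (days since each pair was last seen) are hypothetical parameters.

```python
def decay_factor(age_days, half_life_days=30.0):
    """Hypothetical dynamic decay factor epsilon: 1.0 for a relation seen
    today, halving every `half_life_days` days since it was last observed."""
    if age_days < 0:
        raise ValueError("age_days must be non-negative")
    return 0.5 ** (age_days / half_life_days)


def prune_expired(triples, ages, half_life_days=30.0, floor=0.1):
    """Apply the decay to each <Vx, weight, Vy> triple and drop relations
    whose decayed weight falls below `floor` (treated as expired)."""
    kept = []
    for vx, weight, vy in triples:
        decayed = weight * decay_factor(ages[(vx, vy)], half_life_days)
        if decayed >= floor:
            kept.append((vx, decayed, vy))
    return kept
```

A relation last seen 120 days ago keeps 0.5⁴ = 6.25% of its weight under a 30-day half-life. With a unit weight it therefore falls below a floor of 0.1 and is pruned.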
Those skilled in the art will appreciate that, besides implementing the systems, apparatus and their respective modules provided herein purely as computer-readable program code, the method steps can be logically programmed so that the same systems, apparatus and modules are realized as logic gates, switches, application-specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. The system, apparatus and modules provided by the invention may therefore be regarded as a hardware component, and the modules within it for implementing various programs may be regarded as structures within that hardware component; modules for implementing various functions may be regarded either as software programs implementing the method or as structures within the hardware component.
The foregoing describes specific embodiments of the present application. It is to be understood that the application is not limited to the particular embodiments described above, and that various changes or modifications may be made by those skilled in the art within the scope of the appended claims without affecting the spirit of the application. The embodiments of the application and the features of the embodiments may be combined with each other arbitrarily without conflict.

Claims (4)

1. The intelligent user identification method based on the strong-weak relation network, characterized by comprising the following steps:
Step S1: process the collected event-tracking data of the users of each service platform, extract user nodes and relations, and generate the vertex set and edge set of the graph structure;
Step S2: obtain node relations carrying user node relation weights from the vertex set and edge set;
Step S3: obtain a maximum connected subgraph from the weighted node relations as the user identification result candidate set;
Step S4: train, on the user identification result candidate set, a user identification model that fuses avatars and nicknames, and use this model to predict the final user identification result set;
The step S4 includes:
Step S4.1: randomly select part of the user identification result candidate set as model training samples and obtain the avatar and nickname data of the users they contain; compute the avatar similarity and the nickname exact-match value between users;
Step S4.1 includes:
Step S4.1.1: apply resolution changes and rotations to the user avatar data in the model training samples to obtain the transformed avatar data profile_1 and profile_2, and set the corresponding user nicknames as name_1 and name_2; these serve as the model input data;
Step S4.1.2: compute avatar similarity using ResNet with contrastive learning. Throughout training of the recognition model, the first 40 layers of the ResNet network are frozen, while the parameters of the later layers are continuously optimized. Each avatar image is finally encoded as a 2048-dimensional vector, and the similarity value is computed by cosine similarity. Nickname similarity is computed by rule-based exact matching;
Step S4.2: use the similarity feature values as input to the GBDT algorithm layer to predict whether two user identifiers in the user candidate set that may belong to the same user actually do, thereby obtaining a trained user identification model that fuses avatars and nicknames; the similarity feature values comprise an avatar similarity feature value and a nickname similarity feature value;
Step S4.3: predict with the trained user identification model that fuses avatars and nicknames and judge whether a prediction conflict exists. If so, mark the two users whose results conflict as abnormal identification users and add them to the abnormal identification user set; otherwise, add them to the final user identification result set. A prediction conflict means that two user identifiers in the user candidate set that may belong to the same user are identified as not being the same user;
Step S4.4: manually audit the abnormal user set, correct misidentified users, add them to the user identification result set and the training set, and iteratively optimize the prediction model.
2. The intelligent user identification method based on the strong and weak relation network according to claim 1, wherein user associations are analyzed by computing subgraphs based on the strong and weak relations in the user relationship network; during the graph computation, a dynamic decay factor is applied to node relations to identify and eliminate cases where part of the user data in the user relationship graph has expired or become abnormal; and the user IDs are heterogeneous, i.e. of different types.
3. A user intelligent recognition system based on a strong-weak relation network is characterized by comprising:
Module M1: process the collected event-tracking data of the users of each service platform, extract user nodes and relations, and generate the vertex set and edge set of the graph structure;
Module M2: obtain node relations carrying user node relation weights from the vertex set and edge set;
Module M3: obtain a maximum connected subgraph from the weighted node relations as the user identification result candidate set;
Module M4: train, on the user identification result candidate set, a user identification model that fuses avatars and nicknames, and use this model to predict the final user identification result set;
Module M4 includes:
Module M4.1: randomly select part of the user identification result candidate set as model training samples and obtain the avatar and nickname data of the users they contain; compute the avatar similarity and the nickname exact-match value between users;
Module M4.1 comprises:
Module M4.1.1: apply resolution changes and rotations to the user avatar data in the model training samples to obtain the transformed avatar data profile_1 and profile_2, and set the corresponding user nicknames as name_1 and name_2; these serve as the model input data;
Module M4.1.2: compute avatar similarity using ResNet with contrastive learning. Throughout training of the recognition model, the first 40 layers of the ResNet network are frozen, while the parameters of the later layers are continuously optimized. Each avatar image is finally encoded as a 2048-dimensional vector, and the similarity value is computed by cosine similarity. Nickname similarity is computed by rule-based exact matching;
Module M4.2: use the similarity feature values as input to the GBDT algorithm layer to predict whether two user identifiers in the user candidate set that may belong to the same user actually do, thereby obtaining a trained user identification model that fuses avatars and nicknames; the similarity feature values comprise an avatar similarity feature value and a nickname similarity feature value;
Module M4.3: predict with the trained user identification model that fuses avatars and nicknames and judge whether a prediction conflict exists. If so, mark the two users whose results conflict as abnormal identification users and add them to the abnormal identification user set; otherwise, add them to the final user identification result set. A prediction conflict means that two user identifiers in the user candidate set that may belong to the same user are identified as not being the same user;
Module M4.4: manually audit the abnormal user set, correct misidentified users, add them to the user identification result set and the training set, and iteratively optimize the prediction model.
4. The intelligent user recognition system based on the strong and weak relation network according to claim 3, wherein user associations are analyzed by computing subgraphs based on the strong and weak relations in the user relationship network; during the graph computation, a dynamic decay factor is applied to node relations to identify and eliminate cases where part of the user data in the user relationship graph has expired or become abnormal; and the user IDs are heterogeneous, i.e. of different types.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210601877.0A CN114880407B (en) 2022-05-30 2022-05-30 User intelligent identification method and system based on strong and weak relation network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210601877.0A CN114880407B (en) 2022-05-30 2022-05-30 User intelligent identification method and system based on strong and weak relation network

Publications (2)

Publication Number Publication Date
CN114880407A CN114880407A (en) 2022-08-09
CN114880407B true CN114880407B (en) 2024-06-21

Family

ID=82679769

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210601877.0A Active CN114880407B (en) 2022-05-30 2022-05-30 User intelligent identification method and system based on strong and weak relation network

Country Status (1)

Country Link
CN (1) CN114880407B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116501726B (en) * 2023-06-20 2023-09-29 中国人寿保险股份有限公司上海数据中心 Information creation cloud platform data operation system based on GraphX graph calculation

Family Cites Families (18)

Publication number Priority date Publication date Assignee Title
US20140279906A1 (en) * 2013-03-13 2014-09-18 Bart Michael Peintner Apparatus, system and method for multiple source disambiguation of social media communications
KR102417682B1 (en) * 2015-09-09 2022-07-07 삼성전자주식회사 Method and apparatus for managing nick name using a voice recognition
US10607065B2 (en) * 2018-05-03 2020-03-31 Adobe Inc. Generation of parameterized avatars
US20200097601A1 (en) * 2018-09-26 2020-03-26 Accenture Global Solutions Limited Identification of an entity representation in unstructured data
CN110224859B (en) * 2019-05-16 2022-04-01 蚂蚁智安安全技术(上海)有限公司 Method and system for identifying a group
CN110297738A (en) * 2019-05-21 2019-10-01 深圳壹账通智能科技有限公司 Monitoring method, device, equipment and the storage medium of system service
CN110347959B (en) * 2019-06-27 2022-11-01 杭州数跑科技有限公司 Anonymous user identification method, device, computer equipment and storage medium
CN111192153B (en) * 2019-12-19 2023-08-29 浙江大搜车软件技术有限公司 Crowd relation network construction method, device, computer equipment and storage medium
CN111199208A (en) * 2019-12-31 2020-05-26 上海昌投网络科技有限公司 Head portrait gender identification method and system based on deep learning framework
US20210294830A1 (en) * 2020-03-19 2021-09-23 The Trustees Of Indiana University Machine learning approaches to identify nicknames from a statewide health information exchange
CN111639700A (en) * 2020-05-28 2020-09-08 深圳壹账通智能科技有限公司 Target similarity recognition method and device, computer equipment and readable storage medium
CN113536870A (en) * 2020-07-09 2021-10-22 腾讯科技(深圳)有限公司 Abnormal head portrait identification method and device
CN111708823B (en) * 2020-08-18 2021-05-18 腾讯科技(深圳)有限公司 Abnormal social account identification method and device, computer equipment and storage medium
CN112433991A (en) * 2020-11-20 2021-03-02 苏宁金融科技(南京)有限公司 Problem positioning method and device
CN112685580A (en) * 2020-12-25 2021-04-20 公安部第三研究所 Social network head portrait comparison distributed detection system, method and device based on deep learning, processor and storage medium thereof
CN112734466A (en) * 2020-12-31 2021-04-30 联想(北京)有限公司 Method and device for processing associated information and storage medium
CN113222775B (en) * 2021-05-28 2022-08-05 北京理工大学 User identity correlation method integrating multi-mode information and weight tensor
CN114418781A (en) * 2021-12-31 2022-04-29 北京五八信息技术有限公司 Data processing method and device, electronic equipment and storage medium

Non-Patent Citations (2)

Title
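A service redirection method integrating user profiles and content; 熊伟, 杭波, 李兵, 吴钊, 谷琼; 小型微型计算机系统 (Journal of Chinese Computer Systems); 2017-12-15 (No. 12); pp. 140-143 *
A joint model for entity alias extraction in tourism scenarios; 杨一帆, 陈文亮; 中文信息学报 (Journal of Chinese Information Processing); 2020-06-15 (No. 06); pp. 59-67 *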

Also Published As

Publication number Publication date
CN114880407A (en) 2022-08-09

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant