CN113807978A - Hidden community attribute acquisition method and system based on graph attention neural network - Google Patents

Hidden community attribute acquisition method and system based on graph attention neural network

Info

Publication number
CN113807978A
Authority
CN
China
Prior art keywords
social
user
target user
users
attention
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111047006.0A
Other languages
Chinese (zh)
Inventor
张毅
曹万华
刘俊涛
饶子昀
王元斌
王军伟
周莹
王振杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
709th Research Institute of CSIC
Original Assignee
709th Research Institute of CSIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 709th Research Institute of CSIC
Priority to CN202111047006.0A
Publication of CN113807978A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9536 Search customisation based on social or collaborative filtering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Abstract

The invention discloses a hidden community attribute acquisition method based on a graph attention neural network, comprising the following steps: learning all words in the vocabulary of the users' social media data with a Word2vec word-vector model to obtain an embedded representation vector for every word; normalizing and weighting the embedded representation vectors of the vocabulary and obtaining an embedded representation of each target user through a target embedding layer based on a feed-forward fully connected network; generating embedded representations of the target user's neighbor users from the user social network and social activity information, and computing a weighted social matrix from them; and training a social-popularity-weighted graph attention neural network on the weighted social matrix, then using that network and the embedded representation of the target user to generate the target user's hidden community attribute classification result. The invention also provides a corresponding hidden community attribute acquisition system based on a graph attention neural network.

Description

Hidden community attribute acquisition method and system based on graph attention neural network
Technical Field
The invention belongs to the technical field of attribute mining, and particularly relates to a hidden community attribute acquisition method and system based on a graph attention neural network.
Background
User research in cyberspace is one of the important tasks in Internet personalized recommendation. As social networks continue to grow, user attribute information in cyberspace has become sparse, fragmented, and heterogeneous, which makes certain hidden community attributes of a target user very difficult to obtain and hinders further analysis and recommendation. Mining users' latent or hidden community attribute information on a social network platform by effective technical means is therefore crucial to analyzing the latent preferences that target users show toward content to be recommended.
In user attribute analysis, most attribute features used in existing research are lightweight features that do not carry rich enough information, and the embedded representation of target users' attributes has not been explored sufficiently, so a large amount of information remains to be mined. In addition, the attribute classifications produced by existing research are still too simple, usually targeting plain categories such as gender or educational background; the fusion of information from multiple kinds of media content has barely been explored, and there is still considerable room to explore methods for characterizing target users' content behavior.
Disclosure of Invention
In view of the above defects or improvement needs of the prior art, the invention provides a hidden community attribute acquisition method based on a graph attention neural network. The method fuses the text content published by target users with the social information among target users to obtain hidden community attribute information, and it also mines the interaction between the content a target produces and the targets in the social network. For example, two users who share the same preferences and follow each other are likely to like or forward each other's media data, forming a close connection between the targets. The invention achieves a unified representation of cross-space target attribute information and mines the hidden community attribute information of target users in cyberspace more efficiently, which facilitates further analysis and recommendation.
To achieve the above object, according to one aspect of the present invention, there is provided a hidden community attribute acquisition method based on a graph attention neural network, including:
step S1: learning all words in the vocabulary of the users' social media data with a Word2vec word-vector model to obtain embedded representation vectors of all words;
step S2: normalizing and weighting the embedded representation vectors of the vocabulary of the users' social media data, and obtaining an embedded representation of the target user through a target embedding layer based on a feed-forward fully connected network;
step S3: generating embedded representations of the target user's neighbor users based on the user social network and social activity information, and computing a weighted social matrix from the embedded representations of the neighbor users;
step S4: training a social-popularity-weighted graph attention neural network on the weighted social matrix, and generating the hidden community attribute classification result of the target user using the graph attention neural network and the embedded representation of the target user.
In an embodiment of the present invention, step S1 includes: the vocabulary formed from the social media data set of all users is denoted
$C = (c_1, c_2, \dots, c_f) \in \mathbb{R}^{f \times f}$,
where $c_i \in \mathbb{R}^{f}$ is the one-hot code of the i-th word in the vocabulary and $f = |C|$ is the number of distinct words in the vocabulary. For the words in C, a Word2vec network learns the embedded representation vectors
$E_w = (w_1, w_2, \dots, w_f) \in \mathbb{R}^{f \times k}$,
which vectorize the words in the vocabulary according to the text content set, where each $w_i$ is the word embedding of the i-th word and k is the vector dimension of each word after network learning.
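As a minimal sketch of the objects in step S1, the Python below builds the one-hot matrix C for a toy vocabulary and selects a k-dimensional vector per word by multiplying each one-hot row with an embedding table. The table is filled with random numbers purely as a stand-in for a trained Word2vec model; the names `build_one_hot`, `embed`, and the toy vocabulary are illustrative assumptions, not part of the patent.

```python
import random

def build_one_hot(vocab):
    """Return the f x f one-hot matrix C, one row per vocabulary word."""
    f = len(vocab)
    return [[1.0 if i == j else 0.0 for j in range(f)] for i in range(f)]

def embed(one_hot_row, table):
    """Multiply a one-hot row (length f) by an f x k table to select one embedding."""
    k = len(table[0])
    return [sum(one_hot_row[i] * table[i][d] for i in range(len(table)))
            for d in range(k)]

vocab = ["wu", "buy", "computer", "wang", "like"]  # hypothetical toy vocabulary
k = 3                                              # embedding dimension
rng = random.Random(0)
# Stand-in for trained Word2vec weights (assumption: random values, not learned).
table = [[rng.uniform(-1, 1) for _ in range(k)] for _ in vocab]

C = build_one_hot(vocab)                # f x f
E_w = [embed(row, table) for row in C]  # f x k matrix of word embeddings
```

Because each row of C has a single 1, the product simply picks out the matching row of the table, which is why embedding layers are usually implemented as lookups.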
In an embodiment of the present invention, step S2 includes: after the embedded representation vectors $E_w$ of all words are obtained, the word embeddings must be converted into target user embeddings, because the basic object of hidden community attribute analysis is the user and the subsequent graph attention neural network takes target users as its objects. To this end, the target embedding layer based on a feed-forward fully connected network converts the word embeddings $E_w$ into the embedded representations of the target users through full connection:
$E_u = (u_1', u_2', \dots, u_n') \in \mathbb{R}^{n \times k}$,
where n is the number of target users and $u_i' \in \mathbb{R}^{k}$ is the embedded representation of the i-th target user.
In an embodiment of the present invention, step S2 specifically includes: for a target user $u^*$, all words that co-occur with $u^*$ in the same social media content form the user's representation word set, denoted $\{u^*\} = \{w_1', w_2', \dots, w_{f'}'\}$, where each $w_i'$ belongs to $E_w$; for a given target user $u^*$, the number $f'$ of elements in the representation word set does not exceed the size of $E_w$. The positions of these words and their text distances from the target user are recorded. The distances between each word in $\{u^*\}$ and the target user in the text data are normalized and used to assign the weights $\{p_1, \dots, p_{f'}\}$, giving $u^{*\prime} = p_1 w_1' + p_2 w_2' + \dots + p_{f'} w_{f'}'$ as the embedded representation of the target user.
In an embodiment of the present invention, step S3 includes:
converting the various social relationships of target user $u^*$ into weight parameters;
measuring the social popularity between target user $u^*$ and its neighbor users, where the social popularity coefficient $h_{i,p}$ is computed from the number of social interactions and the categories of social behavior between target user $u^*$ and its neighbor users;
repeating the above process for each user to obtain the adjacency weight between any two users, with $h_{i,p} m_{i,p} = 0$ for users without a social relationship;
recording the adjacency weights among all n users as a weighted social matrix $A \in \mathbb{R}^{n \times n}$, where $h_{r,j} m_{r,j} \in A$ is the social weight parameter between user r and user j.
In one embodiment of the invention, converting the various social relationships of target user $u^*$ into weight parameters specifically means: record the neighbor users that have a social relationship with target user $u^*$, that is, users linked to it by social behaviors such as following, liking, forwarding, and commenting, and denote the set of neighbor users of $u^*$ as $[u^*]$. For the neighbor users $u_{i1}, \dots, u_{ik} \in [u^*]$, take the embedded representation $u_i'$ of target user $u^*$ and the embedded representations $(u_{i1}', u_{i2}', \dots, u_{ik}')$ of these neighbors, and compute the cosine similarity between $u^*$ and each neighbor user $u_{ip}'$ in $[u^*]$:
$m_{i,p} = \dfrac{u_i' \cdot u_{ip}'}{\lVert u_i' \rVert \, \lVert u_{ip}' \rVert}$
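The cosine similarity between a target user and each neighbor can be sketched in a few lines of plain Python. The vectors and the neighbor keys `p1`, `p2` below are hypothetical, and returning 0 for a zero vector is an assumption made here to avoid division by zero; the patent does not address that case.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # assumption: a zero vector is treated as having no similarity
    return dot / (norm_u * norm_v)

u_target = [1.0, 0.0, 1.0]            # hypothetical embedding of the target user
neighbors = {"p1": [1.0, 0.0, 1.0],   # hypothetical neighbor embeddings
             "p2": [0.0, 1.0, 0.0]}
m = {p: cosine_similarity(u_target, v) for p, v in neighbors.items()}
```

A neighbor with the same embedding direction scores close to 1, and an orthogonal one scores 0.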
In one embodiment of the invention, measuring the social popularity between target user $u^*$ and its neighbor users specifically means: let $h_{i,p} = \sum_{\text{action}} h(\text{action}_{i,p}) \cdot \text{mount}_{\text{action},i,p}$, where h is a weighting value that fuses the number of social interactions, mount, with the social behavior category, action. Specifically, $h(\text{action}_{i,p})$ is the weight of the social behavior category action between target user $u^*$ and each of its neighbor users $u_{ip}'$, and $\text{mount}_{\text{action},i,p}$ records the number of times that category of social behavior occurred. This yields the adjacency weights $\{h_{i,p} m_{i,p}\}$ between target user $u^*$ and all its neighbor users in $[u^*]$.
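The popularity sum and the resulting adjacency weight can be sketched as follows. The per-category weights in `ACTION_WEIGHTS` are placeholders chosen only so that they sum to 1; they are not the patent's own assignment, and the function names and interaction counts are likewise hypothetical.

```python
# Hypothetical per-category weights h(action); placeholder values that sum to 1,
# not the assignment used in the patent.
ACTION_WEIGHTS = {"follow": 0.4, "comment": 0.3, "like": 0.2, "forward": 0.1}

def social_popularity(counts, weights=ACTION_WEIGHTS):
    """Return h = sum over action categories of weight(action) * count(action)."""
    return sum(weights[action] * n for action, n in counts.items())

def adjacency_weight(counts, cosine_sim):
    """Return the product h * m used as the edge weight between two users."""
    return social_popularity(counts) * cosine_sim

counts = {"follow": 1, "like": 3, "forward": 2}  # hypothetical interaction counts
h = social_popularity(counts)       # 0.4*1 + 0.2*3 + 0.1*2
w = adjacency_weight(counts, 0.5)   # popularity scaled by a cosine similarity of 0.5
```

Multiplying by the cosine similarity means that frequent interaction between users with dissimilar embeddings still produces a small edge weight.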
In an embodiment of the present invention, the training of the graph attention neural network in step S4 includes:
calculating the similarity coefficient $e_{ik}$ between target user $u^*$ and its neighbor user k: $e_{ik} = h_{i,k} m_{i,k}\, a(W u_i', W u_k')$, where a and W are parameters to be learned, W is a shared parameter matrix in the neural network, a is the attention calculation function, and $h_{i,k} m_{i,k}$ is the social weight parameter between target user $u^*$ and its neighbor user k; applying softmax to $e_{ik}$ gives the attention coefficient $\alpha_{ij}$ between target user $u^*$ and any user j;
after a and W have been learned through training, weighting and summing the embedded representations $E_u$ of all users with the attention coefficients $\alpha_{ij}$ computed from a and W to obtain the fused embedded representation of target user $u^*$:
$h^* = \sum_{u_j' \in E_u} \alpha_{ij}\, u_j'$,
where $u_j'$ ranges over $E_u$;
training a forward propagation network whose inputs are the fused embedded representation $h^*$ of target user $u^*$ and the community attribute vector $(s_1, s_2, \dots, s_t)$ of target user $u^*$. The community attribute vector is obtained from known information, t is the total number of community attribute types, and $s_w = 1$ when target user $u^*$ has community attribute w and 0 otherwise.
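The forward computation of the attention step can be sketched in plain Python under stated assumptions. The text does not give the form of the attention function a, so the sketch borrows the common graph attention choice of a LeakyReLU applied to a learned vector dotted with the concatenation of the two projected embeddings. The values of `U`, `A`, `W`, and `a_vec` are hypothetical stand-ins for trained quantities, and the training loop itself is not shown.

```python
import math

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def leaky_relu(z, slope=0.2):
    return z if z > 0 else slope * z

def attention_scores(i, neighbors, U, A, W, a_vec):
    """e_ik = A[i][k] * a(W u_i, W u_k) for every neighbor k of user i."""
    wi = matvec(W, U[i])
    scores = {}
    for k in neighbors:
        wk = matvec(W, U[k])
        # Assumed form of a(.,.): LeakyReLU of a_vec dotted with [W u_i ; W u_k].
        raw = sum(p * q for p, q in zip(a_vec, wi + wk))
        scores[k] = A[i][k] * leaky_relu(raw)
    return scores

def softmax(scores):
    """Normalize a dict of scores into coefficients that sum to 1."""
    top = max(scores.values())
    exps = {k: math.exp(v - top) for k, v in scores.items()}
    total = sum(exps.values())
    return {k: v / total for k, v in exps.items()}

def fuse(alpha, U):
    """h* = sum_j alpha[j] * U[j], the attention-weighted sum of user embeddings."""
    dim = len(U[0])
    return [sum(alpha[j] * U[j][d] for j in alpha) for d in range(dim)]

U = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]                  # hypothetical user embeddings
A = [[0.0, 0.6, 1.2], [0.6, 0.0, 0.0], [1.2, 0.0, 0.0]]   # hypothetical h*m weights
W = [[1.0, 0.0], [0.0, 1.0]]       # identity stands in for the learned matrix W
a_vec = [0.5, 0.5, 0.5, 0.5]       # stands in for the learned attention vector

alpha = softmax(attention_scores(0, [1, 2], U, A, W, a_vec))
h_star = fuse(alpha, U)
```

In this toy setup user 2 has both the larger social weight and the larger raw attention score, so it receives the larger coefficient in the fused representation of user 0.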
In one embodiment of the present invention, the actions fall into four categories (following, commenting, liking, and forwarding), each with a normalized weight h(action); the values 0.4, 0.25, and 0.1 are given, though which category each value belongs to cannot be read from this text.
According to another aspect of the present invention, there is also provided a hidden community attribute acquisition system based on a graph attention neural network, including an embedded representation vector generation module, a target user embedded representation generation module, a weighted social matrix generation module, and a target user attribute classification module, where:
the embedded representation vector generation module is used for learning all words in the vocabulary of the users' social media data with a Word2vec word-vector model to obtain embedded representation vectors of all words;
the target user embedded representation generation module is used for normalizing and weighting the embedded representation vectors of the vocabulary of the users' social media data and obtaining the embedded representation of the target user through a target embedding layer based on a feed-forward fully connected network;
the weighted social matrix generation module is used for generating embedded representations of the target user's neighbor users based on the user social network and social activity information, and computing a weighted social matrix from the embedded representations of the neighbor users;
the target user attribute classification module is used for training a social-popularity-weighted graph attention neural network on the weighted social matrix, and generating the hidden community attribute classification result of the target user using the graph attention neural network and the embedded representation of the target user.
In general, compared with the prior art, the technical solution of the invention has the following beneficial effects:
(1) text features are extracted from users' social media data and social relationships, so the community attribute features contained in users' social output and online social interaction are fully retained; compared with the traditional approach of analyzing users' personal network information, which is hard to verify, and intricate social networks, the method mines user attribute information more accurately and efficiently;
(2) a graph attention neural network is trained to build the mining model for users' hidden community attributes, fusing social text information with interaction information; compared with traditional methods that focus on analyzing the user's social network, it describes users' hidden community attributes more comprehensively and alleviates the data sparsity problem;
(3) the social relationship weight between users is computed by combining social popularity with the embedded representations of neighbor users, so that both social behavior and individual user characteristics enter the weight; this goes beyond considering only whether a social relationship exists and helps mine users' social relationship attributes as comprehensively as possible.
Drawings
Fig. 1 is a flowchart of the hidden community attribute acquisition method based on a graph attention neural network according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in FIG. 1, the invention discloses a hidden community attribute acquisition method based on a graph attention neural network, which includes the following steps:
Step S1: learning all words in the vocabulary of the users' social media data with a Word2vec word-vector model to obtain embedded representation vectors of all words;
Specifically, step S1 includes: the vocabulary formed from the social media data set of all users is denoted
$C = (c_1, c_2, \dots, c_f) \in \mathbb{R}^{f \times f}$,
where $c_i \in \mathbb{R}^{f}$ is the one-hot code of the i-th word in the vocabulary and $f = |C|$ is the number of distinct words in the vocabulary. For the words in C, a Word2vec network learns the embedded representation vectors
$E_w = (w_1, w_2, \dots, w_f) \in \mathbb{R}^{f \times k}$,
which vectorize the words in the vocabulary according to the text content set, where each $w_i$ is the word embedding of the i-th word and k is the vector dimension of each word after network learning.
Step S2: normalizing and weighting the embedded representation vectors of the vocabulary of the users' social media data, and obtaining an embedded representation of the target user through a target embedding layer based on a feed-forward fully connected network;
Specifically, step S2 includes: after the embedded representation vectors $E_w$ of all words are obtained, the word embeddings must be converted into target user embeddings, because the basic object of hidden community attribute analysis is the user and the subsequent graph attention neural network takes target users as its objects. To this end, the target embedding layer based on a feed-forward fully connected network converts the word embeddings $E_w$ into the embedded representations of the target users through full connection:
$E_u = (u_1', u_2', \dots, u_n') \in \mathbb{R}^{n \times k}$,
where n is the number of target users and $u_i' \in \mathbb{R}^{k}$ is the embedded representation of the i-th target user.
In particular, for a target user $u^*$, all words that co-occur with $u^*$ in the same social media content form the user's representation word set, denoted $\{u^*\} = \{w_1', w_2', \dots, w_{f'}'\}$, where each $w_i'$ belongs to $E_w$; for a given target user $u^*$, the number $f'$ of elements in the representation word set does not exceed the size of $E_w$. The positions of these words and their text distances from the target user are recorded (measured in words; for example, in "Zhang San buys computer", the distance between "buys" and "Zhang San" is 1, and the distance between "computer" and "Zhang San" is 2). The distances between each word in $\{u^*\}$ and the target user in the text data are normalized and used to assign the weights $\{p_1, \dots, p_{f'}\}$, giving $u^{*\prime} = p_1 w_1' + p_2 w_2' + \dots + p_{f'} w_{f'}'$ as the embedded representation of the target user.
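The distance weighting can be sketched as follows, assuming the weights are proportional to the inverse of the text distance and normalized to sum to 1. That normalization rule is an assumption, since the text only says the distances are normalized, and the word vectors and function names are hypothetical.

```python
def inverse_distance_weights(distances):
    """Normalize 1/d over the co-occurring words so the weights sum to 1."""
    inv = [1.0 / d for d in distances]
    total = sum(inv)
    return [x / total for x in inv]

def weighted_sum(vectors, weights):
    """Return sum_i weights[i] * vectors[i] as a single vector."""
    dim = len(vectors[0])
    return [sum(p * vec[d] for p, vec in zip(weights, vectors)) for d in range(dim)]

# "Zhang San buys computer": "buys" is 1 word away, "computer" is 2 words away.
distances = [1, 2]
p = inverse_distance_weights(distances)   # closer words get larger weights
word_vecs = [[0.3, 0.0], [0.0, 0.9]]      # hypothetical embeddings of the two words
u_star = weighted_sum(word_vecs, p)       # embedded representation of the user
```

With distances 1 and 2 the weights come out as 2/3 and 1/3, so the nearer word contributes twice as much to the user embedding.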
Step S3: generating embedded representations of the target user's neighbor users based on the user social network and social activity information, and computing a weighted social matrix from the embedded representations of the neighbor users;
Specifically, step S3 includes: converting the various social relationships of target user $u^*$ into weight parameters. First, record the neighbor users that have a social relationship with target user $u^*$, that is, users linked to it by social behaviors such as following, liking, forwarding, and commenting, and denote the set of neighbor users of $u^*$ as $[u^*]$. For the neighbor users $u_{i1}, \dots, u_{ik} \in [u^*]$, take the embedded representation $u_i'$ of target user $u^*$ and the embedded representations $(u_{i1}', u_{i2}', \dots, u_{ik}')$ of these neighbors, and compute the cosine similarity between $u^*$ and each neighbor user $u_{ip}'$ in $[u^*]$:
$m_{i,p} = \dfrac{u_i' \cdot u_{ip}'}{\lVert u_i' \rVert \, \lVert u_{ip}' \rVert}$
Next, measure the social popularity between target user $u^*$ and its neighbor users. The social popularity coefficient $h_{i,p}$ is computed from the number of social interactions and the categories of social behavior between target user $u^*$ and its neighbor users. Specifically, let $h_{i,p} = \sum_{\text{action}} h(\text{action}_{i,p}) \cdot \text{mount}_{\text{action},i,p}$, where h is a weighting value that fuses the number of social interactions, mount, with the social behavior category, action, and $h(\text{action}_{i,p})$ is the weight of the social behavior category action between target user $u^*$ and each of its neighbor users $u_{ip}'$. The actions fall into four categories (following, commenting, liking, and forwarding), each with a normalized weight h(action); the values 0.4, 0.25, and 0.1 are given, though which category each value belongs to cannot be read from this text. $\text{mount}_{\text{action},i,p}$ records the number of times the corresponding category of social behavior occurred. This yields the adjacency weights $\{h_{i,p} m_{i,p}\}$ between target user $u^*$ and all its neighbor users in $[u^*]$. Repeating the process for each user gives the adjacency weight between any two users, with $h_{i,p} m_{i,p} = 0$ for users without a social relationship. The adjacency weights among all n users are recorded as a weighted social matrix $A \in \mathbb{R}^{n \times n}$, where $h_{r,j} m_{r,j} \in A$ is the social weight parameter between user r and user j.
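Assembling the matrix A from per-pair values can be sketched as below. The `edges` dictionary, which maps an ordered user pair to a hypothetical (popularity, cosine similarity) tuple, and the function name are illustrative assumptions; pairs that never interact are simply left at 0.

```python
def weighted_social_matrix(n, edges):
    """Build the n x n matrix A with A[r][j] = h * m; absent pairs stay 0."""
    A = [[0.0] * n for _ in range(n)]
    for (r, j), (h_rj, m_rj) in edges.items():
        A[r][j] = h_rj * m_rj
    return A

# Hypothetical (popularity h, cosine similarity m) values for interacting pairs.
edges = {(0, 1): (1.2, 0.5), (1, 0): (0.4, 0.5), (0, 2): (2.0, 0.25)}
A = weighted_social_matrix(3, edges)
```

In this sketch the matrix need not be symmetric: cosine similarity is symmetric, but the interaction counts behind the popularity term can differ by direction, and the text does not say whether the two directions are merged.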
Step S4: training a social-popularity-weighted graph attention neural network on the weighted social matrix, and generating the hidden community attribute classification result of the target user using the graph attention neural network and the embedded representation of the target user.
Specifically, the training of the graph attention neural network in step S4 includes:
First, the similarity coefficient $e_{ik}$ between target user $u^*$ and its neighbor user k is calculated:
$e_{ik} = h_{i,k} m_{i,k}\, a(W u_i', W u_k')$
where a and W are parameters to be learned, W is a shared parameter matrix in the neural network, a is the attention calculation function, and $h_{i,k} m_{i,k}$ is the social weight parameter between target user $u^*$ and its neighbor user k.
Applying softmax to $e_{ik}$ gives the attention coefficient $\alpha_{ij}$ between target user $u^*$ and any user j.
Second, after a and W have been learned through training, the embedded representations $E_u$ of all users are weighted and summed with the attention coefficients $\alpha_{ij}$ computed from a and W to obtain the fused embedded representation of target user $u^*$:
$h^* = \sum_{u_j' \in E_u} \alpha_{ij}\, u_j'$,
where $u_j'$ ranges over $E_u$.
Finally, a forward propagation network is trained whose inputs are the fused embedded representation $h^*$ of target user $u^*$ and the community attribute vector $(s_1, s_2, \dots, s_t)$ of target user $u^*$. The community attribute vector is derived from known information: t is the total number of community attribute categories, and $s_w = 1$ when target user $u^*$ has community attribute w and 0 otherwise. The trained mapping from the fused embedded representation $h^*$ to the community attribute vector $(s_1, s_2, \dots, s_t)$ is then used to obtain the missing community attributes of any target user.
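The community attribute vector is a multi-hot indicator over the t attribute categories. A minimal sketch, with hypothetical category names and a hypothetical helper `attribute_vector`:

```python
def attribute_vector(known, categories):
    """Return (s_1..s_t): 1 where the user is known to have the attribute, else 0."""
    return [1 if c in known else 0 for c in categories]

categories = ["sports", "music", "gaming", "travel"]   # hypothetical, t = 4
s = attribute_vector({"music", "travel"}, categories)
```

Vectors of this shape serve as training targets for users whose attributes are known; for other users the trained network predicts the entries that are missing.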
The process of the invention is illustrated below with a specific embodiment:
(1) Learning embedded representations of words with a Word2vec network:
The vocabulary formed from the social media data set of all users is denoted $C = (c_1, c_2, \dots, c_f) \in \mathbb{R}^{f \times f}$, where $c_i \in \mathbb{R}^{f}$ is the one-hot code of the i-th word and $f = |C|$ is the number of distinct words in the vocabulary. For the words in C, a Word2vec network learns the embedded representations
$E_w = (w_1, w_2, \dots, w_f)$,
which vectorize the words in the vocabulary according to the text content set, where each $w_i$ is the Word2vec embedded representation of the i-th word.
Take "Wu buys computer" and "Wang likes Wu" as an example. The vocabulary is [Wu, buy, computer, Wang, like], and f = 5. Applying the Word2vec network gives the embedded representation of each word in the media data, $(w_1, w_2, w_3, w_4, w_5)$.
(2) Obtaining the target user embedded representations with the target embedding layer based on a feed-forward fully connected network:
After the embedded representations $(w_1, w_2, w_3, w_4, w_5)$ of the words are obtained as in claim 2, each target user, Wu and Wang, forms a representation word set from all the words that co-occur with that user in the same event text, so that the core information characterizing the target user's hidden community attributes is retained as far as possible while redundant information is removed as much as possible. For example, "Wu" corresponds to $\{u_1\} = \{w_1', w_2', w_3'\}$ = {Wu, buy, computer}, and "Wang" corresponds to $\{u_2\} = \{w_4', w_5', w_1'\}$ = {Wang, like, Wu}. The text distances of "buy" and "computer" from "Wu" are 1 and 2, and the text distances of "like" and "Wu" from "Wang" are 1 and 2. After inverse normalization these distances give different weights: target user "Wu" is represented as $u_1' = w_1' + 0.66 w_2' + 0.33 w_3'$, and target user "Wang" as $u_2' = w_4' + 0.66 w_5' + 0.33 w_1'$. The target user embedded representations are therefore $E_u = (u_1', u_2')$.
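The two weights in this example are consistent with normalizing the inverse distances. That reading is an assumption, since the text does not spell out the normalization rule; under it, the arithmetic for distances 1 and 2 is:

```latex
p_i = \frac{1/d_i}{\sum_{j} 1/d_j}, \qquad
d = (1, 2) \;\Rightarrow\;
p = \left( \frac{1}{1 + \tfrac{1}{2}},\; \frac{\tfrac{1}{2}}{1 + \tfrac{1}{2}} \right)
  = \left( \tfrac{2}{3},\; \tfrac{1}{3} \right) \approx (0.67,\; 0.33)
```

The value 0.66 in the text matches 2/3 truncated to two decimals. The user's own word keeps a weight of 1 in the example and does not take part in this normalization.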
(3) Generating the weighted social matrix from the user social network and social activity information:
The various social relationships between target users are converted into weight parameters. First, record the neighbor users with which each user has a social relationship, that is, users linked by social behaviors such as following, liking, forwarding, and commenting. Denote the neighbor user set of target user "Wu" as $[u_1]$. For the neighbor users $u_{i1}, \dots, u_{ik} \in [u_1]$, take their embedded representations $(u_{i1}', u_{i2}', \dots, u_{ik}')$ and compute the cosine similarity between target user "Wu" and each user $u_{ip}$ in $[u_1]$:
$m_{i,p} = \dfrac{u_i' \cdot u_{ip}'}{\lVert u_i' \rVert \, \lVert u_{ip}' \rVert}$
Next, the number of social interactions and the social behavior categories between target user "Wu" and its neighbor users are measured as social popularity: let $h_{i,k} = \sum h(\text{action}) \cdot \text{mount}$, where h is a weighting value that fuses the interaction count mount with the social behavior category action. With the weight assignment of claim 4, this yields the adjacency weights $\{h_{i,p} m_{i,p}\}$ between target user "Wu" and all its neighbor users $u_{i1}, \dots, u_{ik} \in [u_1]$. Repeating this process gives the adjacency weight between any two users, with $h_{i,k} m_{i,k} = 0$ for users without a social relationship. These weights are recorded as the adjacency matrix $A \in \mathbb{R}^{n \times n}$, where $h_{i,j} m_{i,j} \in A$ is the social weight parameter between user i and user j.
(4) Training the social-popularity-weighted graph attention neural network and generating the hidden community attribute classification result of the target user:
The graph attention neural network is trained to fuse the text content information with the target users' social information. First, the similarity coefficient $e_{ik}$ between user i and neighbor user k is calculated:
$e_{ik} = h_{i,k} m_{i,k}\, a(W u_i', W u_k')$, where a and W are parameters to be learned, W is a shared parameter matrix in the neural network, and a is the attention calculation function. Applying softmax to $e_{ik}$ gives the attention coefficient $\alpha_{ij}$ between user i and any user j. After a and W have been learned through training, the user embedded representations $E_u$ are weighted and summed with the attention coefficients computed from a and W to obtain the fused embedded representation $h^*$ of target user "Wu".
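Written out, the softmax normalization and the weighted sum described here take the following form. Restricting the normalizing sum to the neighbor set $[u_i]$ of user i is an assumption made here; the text only says softmax is applied to $e_{ik}$.

```latex
\alpha_{ij} = \frac{\exp(e_{ij})}{\sum_{k \in [u_i]} \exp(e_{ik})},
\qquad
h^{*} = \sum_{j} \alpha_{ij}\, u_j'
```

Because $e_{ik}$ already carries the factor $h_{i,k} m_{i,k}$, users who interact more often and have more similar embeddings receive larger attention coefficients.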
Finally, a forward propagation network is trained whose inputs are the fused embedded representation $h^*$ of target user $u^*$ and the community attribute vector $(s_1, s_2, \dots, s_t)$ of target user $u^*$, where t is the total number of community attribute categories and $s_w = 1$ when target user $u^*$ has community attribute w and 0 otherwise. The trained mapping from the fused embedded representation $h^*$ to the community attribute vector $(s_1, s_2, \dots, s_t)$ is then used to obtain the community attributes that are missing for target user "Wu".
Further, the invention also provides a hidden community attribute acquisition system based on an attention-seeking neural network, which comprises an embedded representation vector generation module, a target user embedded representation generation module, a weighted social matrix generation module and a target user attribute classification module, wherein:
the embedded representation vector generation module is used for learning all words in the user social media data vocabulary library through a Word2vec word vector network to obtain the embedded representation vectors of all words;
the target user embedded representation generation module is used for applying normalized weighting to the embedded representation vectors of the words in the user social media data and obtaining the embedded representation of the target user based on a target embedding layer of a feed-forward fully connected network;
the weighted social matrix generation module is used for generating the embedded representations of the neighbor users of the target user based on the user social network and social activity information, and calculating a weighted social matrix according to the embedded representations of the neighbor users;
the target user attribute classification module is used for training a social-popularity-weighted attention-seeking neural network according to the weighted social matrix, and generating the hidden community attribute classification result of the target user by using the attention-seeking neural network and the embedded representation of the target user.
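The normalized weighting performed by the target user embedded representation generation module can be sketched as follows. The document states that word-to-user distances are normalized into weights but does not give the weighting function, so this sketch assumes that closer words receive larger weights through an inverse-distance rule; `user_embedding` and its arguments are hypothetical names, and the fully connected embedding layer is omitted.

```python
import numpy as np

def user_embedding(word_vectors, distances):
    """Combine the word vectors that co-occur with a user into one vector.

    word_vectors : (f', k) Word2vec vectors of the co-occurring words
    distances    : (f',) text distance between each word and the user mention
    """
    word_vectors = np.asarray(word_vectors, dtype=float)
    distances = np.asarray(distances, dtype=float)
    inverse = 1.0 / (1.0 + distances)   # ASSUMPTION: closer words weigh more
    p = inverse / inverse.sum()         # normalized weights p_1 .. p_f'
    return p @ word_vectors             # u*' = p_1 w_1' + ... + p_f' w_f'

# Two words, the first adjacent to the user mention, the second one token away.
print(user_embedding([[3.0, 0.0], [0.0, 3.0]], [0, 1]))
```

Any other monotone decreasing function of distance could replace the inverse-distance rule without changing the structure of the computation.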
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A hidden community attribute acquisition method based on an attention-seeking neural network is characterized by comprising the following steps:
step S1: learning all words in the user social media data vocabulary library through a Word2vec word vector network to obtain the embedded representation vectors of all words;
step S2: applying normalized weighting to the embedded representation vectors of the words in the user social media data, and obtaining the embedded representation of the target user based on a target embedding layer of a feed-forward fully connected network;
step S3: generating the embedded representations of the neighbor users of the target user based on the user social network and social activity information, and calculating a weighted social matrix according to the embedded representations of the neighbor users;
step S4: training a social-popularity-weighted attention-seeking neural network according to the weighted social matrix, and generating the hidden community attribute classification result of the target user by using the attention-seeking neural network and the embedded representation of the target user.
2. The method for obtaining hidden community attributes based on attention-seeking neural network as claimed in claim 1, wherein the step S1 comprises:
the user social media data vocabulary library formed from the social media data set containing all users is denoted
Figure FDA0003250087280000011
where $c_i \in R^f$ is the one-hot code of the $i$-th word in the vocabulary library and $f = |C|$ is the number of distinct words in the vocabulary library; the embedded representation vectors of the words in $C$ are learned through the Word2vec network:
Figure FDA0003250087280000012
The words in the vocabulary library are thereby vectorized from the collection of textual content, where each $w_i$ is the embedded representation of the $i$-th word and $k$ is the vector dimension of each word after network learning.
3. The method for obtaining hidden community attributes based on attention-seeking neural network as claimed in claim 1 or 2, wherein the step S2 comprises: after the embedded representation vectors $E_w$ of all words are obtained, because the basic object of hidden community attribute analysis is the user, the word embedded representations need to be converted into target user embedded representations so that the subsequent attention-seeking neural network can analyze with the target user as the object; therefore, the target embedding layer based on the feed-forward fully connected network converts the word embedded representations $E_w$ of all words, through full connection, into the embedded representations of the target users
Figure FDA0003250087280000021
Figure FDA0003250087280000022
where $n$ is the number of target users and $u_i' \in R^k$ is the embedded representation of the $i$-th target user.
4. The method for obtaining hidden community attributes based on attention-seeking neural network as claimed in claim 3, wherein the step S2 is specifically as follows: for a target user $u^*$, all words that co-occur with the target user $u^*$ in the same social media content form the user's characterization word set, denoted $\{u^*\} = \{w_1', w_2', \dots, w_{f'}'\}$, where each $w_i'$ belongs to $E_w$; for a target user $u^*$, the number $f'$ of elements in the characterization word set does not exceed the number of elements in $E_w$; the positions of these words in the text and their distances from the target user are recorded; the distances between all words in $\{u^*\}$ and the target user in the text data are normalized and used to assign different weights $\{p_1, \dots, p_{f'}\}$, forming $u^{*\prime} = p_1 w_1' + p_2 w_2' + \dots + p_{f'} w_{f'}'$ as the embedded representation of the target user.
5. The method for obtaining hidden community attributes based on attention-seeking neural network as claimed in claim 1 or 2, wherein the step S3 comprises:
converting the various social relationships of the target user $u^*$ into weight parameters;
measuring the social popularity of the target user $u^*$ among its neighbor users, wherein the social popularity coefficient $m_{i,p}$ is calculated from the number of social interactions and the social behavior categories between the target user $u^*$ and its neighbor users;
repeating the above process for each user to obtain the adjacency weight between any two users, wherein $h_{i,p}\, m_{i,p} = 0$ between users without a social relationship;
recording the adjacency weights among all $n$ users as a weighted social matrix $A \in R^{n \times n}$, where $h_{r,j}\, m_{r,j} \in A$ is the social weight parameter between user $r$ and user $j$.
6. The hidden community attribute acquisition method based on an attention-seeking neural network as claimed in claim 5, wherein converting the various social relationships of the target user $u^*$ into weight parameters is specifically:
recording the neighbor users with which the target user $u^*$ has social relationships, these users having social behaviors with the target user such as following, liking, forwarding and commenting; denoting the set of neighbor users of the target user $u^*$ as $[u^*]$, with each neighbor user $u_{i1}, \dots, u_{ik} \in [u^*]$; obtaining the embedded representation $u_i'$ of the target user $u^*$ and the embedded representations $(u_{i1}', u_{i2}', \dots, u_{ik}')$ of these neighbor users; and calculating the cosine similarity between the user $u^*$ and each neighbor user $u_{ip}'$ in the neighbor user set $[u^*]$:
Figure FDA0003250087280000031
7. The hidden community attribute acquisition method based on an attention-seeking neural network as claimed in claim 6, wherein measuring the social popularity of the target user $u^*$ among its neighbor users is specifically:
let $h_{i,p} = \sum h(\mathrm{action}_{i,p}) \cdot \mathrm{mount}_{\mathrm{action},i,p}$, where $h$ is a weighting value that fuses the number of social interactions $\mathrm{mount}$ with the social behavior category $\mathrm{action}$ between users; specifically, $h(\mathrm{action}_{i,p})$ is the weight of the social behavior category $\mathrm{action}$ between the target user $u^*$ and each of its neighbor users $u_{ip}'$, and $\mathrm{mount}_{\mathrm{action},i,p}$ records the number of occurrences of the corresponding social behavior category; the adjacency weights $\{h_{i,p}\, m_{i,p}\}$ between the target user $u^*$ and all of its neighbor users $[u^*]$ are thereby calculated.
8. The method for obtaining hidden community attributes based on attention-seeking neural network as claimed in claim 1 or 2, wherein the training of the attention-seeking neural network in the step S4 comprises:
calculating the similarity coefficient $e_{ik}$ between the target user $u^*$ and its neighbor user $k$: $e_{ik} = h_{i,k}\, m_{i,k}\, a(W u_i', W u_k')$, where $a$ and $W$ are to be learned, $W$ is a shared parameter matrix in the neural network, $a$ is the attention calculation function, and $h_{i,k}\, m_{i,k}$ is the social weight parameter between the target user $u^*$ and its neighbor user $k$; applying softmax to $e_{ik}$ yields the attention coefficient $\alpha_{ij}$ between the target user $u^*$ and any user $j$;
after $a$ and $W$ have been learned, weighting and summing the embedded representations $E_u$ of all users according to the attention coefficients $\alpha_{ij}$ calculated from $a$ and $W$, to obtain the fusion embedded representation of the target user $u^*$
Figure FDA0003250087280000032
where $u_j'$ traverses $E_u$;
training the forward propagation network with, as inputs, the fusion embedded representation $h^*$ of the target user $u^*$ and the community attribute vector $(s_1, s_2, \dots, s_t)$ of the target user $u^*$, the community attribute vector being obtained from known information, where $t$ is the total number of community attribute categories; when the target user $u^*$ has community attribute $w$, $s_w = 1$, and otherwise $s_w = 0$.
9. The hidden community attribute acquisition method based on an attention-seeking neural network as claimed in claim 8, wherein the actions are follow, comment, like and forward, respectively, and the normalized weights are h(action) = 0.4, h(action) = 0.25 and h(action) = 0.1.
10. A hidden community attribute acquisition system based on an attention-seeking neural network, characterized by comprising an embedded representation vector generation module, a target user embedded representation generation module, a weighted social matrix generation module and a target user attribute classification module, wherein:
the embedded representation vector generation module is used for learning all words in the user social media data vocabulary library through a Word2vec word vector network to obtain the embedded representation vectors of all words;
the target user embedded representation generation module is used for applying normalized weighting to the embedded representation vectors of the words in the user social media data and obtaining the embedded representation of the target user based on a target embedding layer of a feed-forward fully connected network;
the weighted social matrix generation module is used for generating the embedded representations of the neighbor users of the target user based on the user social network and social activity information, and calculating a weighted social matrix according to the embedded representations of the neighbor users;
the target user attribute classification module is used for training a social-popularity-weighted attention-seeking neural network according to the weighted social matrix, and generating the hidden community attribute classification result of the target user by using the attention-seeking neural network and the embedded representation of the target user.
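The adjacency weights described in claims 5 to 7 can be illustrated with a short NumPy sketch. It assumes that $m_{i,p}$ is the cosine similarity of claim 6 and that $h_{i,p}$ is the action-weighted interaction count of claim 7; the function names, the `interactions` data layout, and the action weights in the example are hypothetical and are not the values of claim 9.

```python
import numpy as np

def cosine(x, y):
    """Cosine similarity of two non-zero vectors."""
    return float(x @ y / (np.linalg.norm(x) * np.linalg.norm(y)))

def weighted_social_matrix(U, interactions, action_weights):
    """Build A with A[i, p] = h_{i,p} * m_{i,p}.

    U              : (n, k) user embedded representations (assumed non-zero)
    interactions   : {(i, p): {action: count}} for user pairs that interact
    action_weights : {action: h(action)}, illustrative values only
    """
    n = len(U)
    A = np.zeros((n, n))            # pairs without a social relationship stay 0
    for (i, p), counts in interactions.items():
        # h_{i,p} = sum over actions of h(action) * number of occurrences
        h = sum(action_weights[action] * c for action, c in counts.items())
        A[i, p] = h * cosine(U[i], U[p])
    return A

U = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
interactions = {(0, 1): {"follow": 1, "comment": 2}}
weights = {"follow": 0.5, "comment": 0.25}   # hypothetical weights
print(weighted_social_matrix(U, interactions, weights))
```

The matrix is filled only for the directed pairs listed in `interactions`, so it is not forced to be symmetric; whether $A$ should be symmetric is not stated in the claims.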

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111047006.0A CN113807978A (en) 2021-09-07 2021-09-07 Hidden community attribute acquisition method and system based on attention-seeking neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111047006.0A CN113807978A (en) 2021-09-07 2021-09-07 Hidden community attribute acquisition method and system based on attention-seeking neural network

Publications (1)

Publication Number Publication Date
CN113807978A true CN113807978A (en) 2021-12-17

Family

ID=78940536

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111047006.0A Pending CN113807978A (en) 2021-09-07 2021-09-07 Hidden community attribute acquisition method and system based on attention-seeking neural network

Country Status (1)

Country Link
CN (1) CN113807978A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116127204A (en) * 2023-04-17 2023-05-16 中国科学技术大学 Multi-view user portrayal method, multi-view user portrayal system, apparatus, and medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP2426634A1 (en) * 2010-09-03 2012-03-07 Blueconomics Business Solutions GmbH Computer-implemented method and system for processing and monitoring business-to -business relationships
US20120311030A1 (en) * 2011-05-31 2012-12-06 International Business Machines Corporation Inferring User Interests Using Social Network Correlation and Attribute Correlation
US20160379132A1 (en) * 2015-06-23 2016-12-29 Adobe Systems Incorporated Collaborative feature learning from social media
CN108090607A (en) * 2017-12-13 2018-05-29 中山大学 A kind of social media user's ascribed characteristics of population Forecasting Methodology based on the fusion of multi-model storehouse
CN108492200A (en) * 2018-02-07 2018-09-04 中国科学院信息工程研究所 A kind of user property estimating method and device based on convolutional neural networks
CN110781406A (en) * 2019-10-14 2020-02-11 西安交通大学 Social network user multi-attribute inference method based on variational automatic encoder
US10685183B1 (en) * 2018-01-04 2020-06-16 Facebook, Inc. Consumer insights analysis using word embeddings
CN112084335A (en) * 2020-09-09 2020-12-15 电子科技大学 Social media user account classification method based on information fusion

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
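宋巍; 谢兴波; 刘丽珍; 王函石: "A survey of research on inferring users' hidden attributes", Journal of Chinese Computer Systems (小型微型计算机系统), no. 02 *
琚春华; 陈彦; 鲍福光: "Asymmetric user relationship strength calculation incorporating network structure and social habits", Systems Engineering - Theory & Practice (系统工程理论与实践), no. 08 *
董祥祥; 梁英; 谢小杰: "A social network user representation learning method fusing multiple types of information", Journal of Chongqing University of Technology (Natural Science) (重庆理工大学学报(自然科学)), no. 05 *
蔡崇超; 许华虎: "Research on recommender systems based on social networks", Software Guide (软件导刊), no. 01 *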

Similar Documents

Publication Publication Date Title
CN109492157B (en) News recommendation method and theme characterization method based on RNN and attention mechanism
Lu et al. Why I like it: multi-task learning for recommendation and explanation
Anastasopoulos et al. Machine learning for public administration research, with application to organizational reputation
Zhou et al. Deep learning based fusion approach for hate speech detection
Wang et al. Collaborative deep learning for recommender systems
CN112131350B (en) Text label determining method, device, terminal and readable storage medium
Krešňáková et al. Deep learning methods for Fake News detection
CN113505204B (en) Recall model training method, search recall device and computer equipment
CN111783474A (en) Comment text viewpoint information processing method and device and storage medium
Ma et al. A deep-learning based citation count prediction model with paper metadata semantic features
CN110096575B (en) Psychological portrait method facing microblog user
CN111382190B (en) Object recommendation method and device based on intelligence and storage medium
CN113191154B (en) Semantic analysis method, system and storage medium based on multi-modal graph neural network
CN114238573B (en) Text countercheck sample-based information pushing method and device
Rabeya et al. Sentiment analysis of bangla song review-a lexicon based backtracking approach
CN109062902A (en) A kind of text semantic expression and device
CN111368082A (en) Emotion analysis method for domain adaptive word embedding based on hierarchical network
Tao et al. Log2intent: Towards interpretable user modeling via recurrent semantics memory unit
Ikawati et al. Student behavior analysis to predict learning styles based felder silverman model using ensemble tree method
Liu et al. Age inference using a hierarchical attention neural network
CN113807978A (en) Hidden community attribute acquisition method and system based on attention-seeking neural network
Abdi et al. Using an auxiliary dataset to improve emotion estimation in users’ opinions
Chou et al. Rating prediction based on merge-CNN and concise attention review mining
Uttarwar et al. Artificial intelligence based system for preliminary rounds of recruitment process
Mavaie et al. Hybrid deep learning approach to improve classification of low-volume high-dimensional data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination