CN113807978A - Hidden community attribute acquisition method and system based on attention-seeking neural network
- Publication number
- CN113807978A (application CN202111047006.0A)
- Authority
- CN
- China
- Prior art keywords
- social
- user
- target user
- users
- attention
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
- G06Q50/01—Social networking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
- G06F16/9536—Search customisation based on social or collaborative filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The invention discloses a hidden community attribute acquisition method based on an attention-seeking neural network, comprising the following steps: learning all words in the vocabulary library of user social media data with a Word2vec word-vector model to obtain an embedded representation vector for each word; applying normalized weighting to the embedded representation vectors of the user social media data vocabulary and obtaining the embedded representation of a target user through a target embedding layer based on a feed-forward fully connected network; generating embedded representations of the neighbor users of the target user based on the user social network and social activity information, and calculating a weighted social matrix from the embedded representations of the neighbor users; and training a social-popularity-weighted attention-seeking neural network according to the weighted social matrix, and generating the hidden community attribute classification result of the target user using the attention-seeking neural network and the embedded representation of the target user. The invention also provides a corresponding hidden community attribute acquisition system based on the attention-seeking neural network.
Description
Technical Field
The invention belongs to the technical field of attribute mining, and particularly relates to a hidden community attribute acquisition method and system based on an attention-seeking neural network.
Background
User research in cyberspace is an important task in the field of personalized Internet recommendation. As social networks continue to grow, user attribute information in cyberspace has become sparse, fragmented, and heterogeneous, which makes it very difficult to acquire certain hidden community attribute information of a target user and hinders further analysis and recommendation. Mining users' latent or hidden community attribute information on a social network platform through effective technical means is therefore crucial to analyzing the latent preferences that target users show toward the content to be recommended.
For user attribute cognitive analysis, most attribute features adopted in existing research are lightweight features that do not carry rich enough information, and the embedded representation of target user attributes has not been explored sufficiently, so a large amount of information remains to be mined. In addition, the attribute classifications of target users produced by existing research are still too simple, usually targeting simple categories such as gender and educational background; the fusion of multiple kinds of media content information has barely been explored, and there is still considerable room to explore methods for characterizing the content behavior of target users.
Disclosure of Invention
In view of the defects of the prior art or the need to improve it, the invention provides a hidden community attribute acquisition method based on an attention-seeking neural network. The method fuses the text content published by target users with the social information among target users to acquire hidden community attribute information, and also mines the interactions between the content a target produces and other targets in the social network. For example, two users who share the same preferences and follow each other are likely to like or forward the media data each other produces, forming a close connection between the targets. The invention achieves a unified representation of cross-space target attribute information, mines the hidden community attribute information of target users in cyberspace more efficiently, and facilitates further analysis and recommendation.
To achieve the above object, according to an aspect of the present invention, there is provided a hidden community attribute obtaining method based on an attention-seeking neural network, including:
step S1: learning all words in the vocabulary library of user social media data with a Word2vec word-vector model to obtain an embedded representation vector for each word;
step S2: applying normalized weighting to the embedded representation vectors of the user social media data vocabulary and obtaining the embedded representation of a target user through a target embedding layer based on a feed-forward fully connected network;
step S3: generating embedded representations of the neighbor users of the target user based on the user social network and social activity information, and calculating a weighted social matrix from the embedded representations of the neighbor users;
step S4: training a social-popularity-weighted attention-seeking neural network according to the weighted social matrix, and generating the hidden community attribute classification result of the target user using the attention-seeking neural network and the embedded representation of the target user.
In an embodiment of the present invention, step S1 includes: the vocabulary library of user social media data, formed from the social media data set containing all users, is denoted $C = (c_1, c_2, \dots, c_f) \in R^{f \times f}$, where $c_i \in R^f$ is the one-hot code of the i-th word in the vocabulary library and $f = |C|$ is the number of distinct words in the vocabulary library. For the words in C, the Word2vec network learns the embedded representation vectors $E_w = (w_1, w_2, \dots, w_f) \in R^{f \times k}$, vectorizing the words in the vocabulary library according to the text content set, where each $w_i$ is the word embedding of the i-th word and k is the vector dimension of each word after network learning.
In an embodiment of the present invention, step S2 includes: after the embedded representation vectors $E_w$ of all words are obtained, the word embeddings must be converted into target user embeddings, because the basic object of hidden community attribute analysis is the user and the subsequent attention-seeking neural network analyzes target users. To this end, a target embedding layer based on a feed-forward fully connected network converts the word embeddings $E_w$, in fully connected form, into the embedded representations of the target users $E_u = (u_1', u_2', \dots, u_n') \in R^{n \times k}$, where n is the number of target users and $u_i' \in R^k$ is the embedded representation of the i-th target user.
In an embodiment of the present invention, step S2 specifically includes: for a target user $u^*$, all words that co-occur with $u^*$ in the same social media content form the user's characterization word set, denoted $\{u^*\} = \{w_1', w_2', \dots, w_{f'}'\}$, where each $w_i'$ belongs to $E_w$; the number $f'$ of elements in the characterization word set of $u^*$ does not exceed the size of $E_w$. The positions of these words and their text distances from the target user are recorded; the distances between all words in $\{u^*\}$ and the target user in the text data are normalized and the words are given different weights $\{p_1, \dots, p_{f'}\}$, forming $u^{*\prime} = p_1 w_1' + p_2 w_2' + \dots + p_{f'} w_{f'}'$ as the embedded representation of the target user.
In an embodiment of the present invention, the step S3 includes:
converting the various social relationships of target user $u^*$ into weight parameters;
measuring the social popularity between target user $u^*$ and its neighbor users, where the social popularity coefficient $h_{i,p}$ is calculated from the number of social interactions and the social behavior categories between $u^*$ and each neighbor user;
repeating the above process for each user to obtain the adjacency weight between any two users, with $h_{i,p} m_{i,p} = 0$ between users who have no social relationship;
recording the adjacency weights among all n users as the weighted social matrix $A \in R^{n \times n}$, where $h_{r,j} m_{r,j} \in A$ is the social weight parameter between user r and user j.
In one embodiment of the invention, converting the various social relationships of target user $u^*$ into weight parameters specifically includes: recording the neighbor users that have social relationships with target user $u^*$, i.e., users with whom there are social behaviors such as following, liking, forwarding, and commenting; denoting the neighbor user set of $u^*$ as $[u^*]$; and, for the neighbor users $u_{i1}, \dots, u_{ik} \in [u^*]$, obtaining the embedded representation $u_i'$ of target user $u^*$ and the embedded representations $(u_{i1}', u_{i2}', \dots, u_{ik}')$ of these neighbor users, and calculating the cosine similarity between $u^*$ and each neighbor user $u_{ip}$ in $[u^*]$: $m_{i,p} = \cos(u_i', u_{ip}') = \frac{u_i' \cdot u_{ip}'}{\|u_i'\| \, \|u_{ip}'\|}$.
in one embodiment of the invention, the metric target user u*And the social popularity among the neighbor users thereof, specifically: remember hi,p=∑h(actioni,p)·mountaciton,i,pH is a weighted value, and fusing the social times mount and the social behavior type action between the users, specifically h (action)i,p) Representing a target user u*And each of its neighbor users uip' weight of social action category action, mounti,pRecording the times of corresponding social behavior action types, and calculating to obtain the target user u*And all its neighbor users u*]The adjacency weight between { h }i,pmi,p}。
In an embodiment of the present invention, the training of the attention-seeking neural network in step S4 includes:
calculating the similarity coefficient $e_{ik}$ between target user $u^*$ and its neighbor user k: $e_{ik} = h_{i,k} m_{i,k} \, a(W u_i', W u_k')$, where a and W are parameters to be learned, W is a shared parameter matrix in the neural network, a is the attention calculation function, and $h_{i,k} m_{i,k}$ is the social weight parameter between target user $u^*$ and its neighbor user k; applying softmax to $e_{ik}$ yields the attention coefficient $\alpha_{ij}$ between target user $u^*$ and any user j;
after a and W have been learned through training, performing a weighted summation of the embedded representations $E_u$ of all users using the attention coefficients $\alpha_{ij}$ calculated from a and W, to obtain the fusion embedded representation of target user $u^*$: $h^* = \sum_j \alpha_{ij} u_j'$, where $u_j'$ ranges over $E_u$;
training a forward propagation network, with the fusion embedded representation $h^*$ of target user $u^*$ and the community attribute vector $(s_1, s_2, \dots, s_t)$ of target user $u^*$ as input; the community attribute vector is obtained from known information, t is the total number of community attribute categories, and $s_w = 1$ when target user $u^*$ has community attribute w and $s_w = 0$ otherwise.
In one embodiment of the present invention, the social behavior categories action are attention (following), comment, like, and forward, with the normalized weights h(action) taking the values 0.4, 0.25, and 0.1.
According to another aspect of the present invention, there is also provided a hidden community attribute acquisition system based on an attention-seeking neural network, including an embedded representation vector generation module, a target user embedded representation generation module, a weighted social matrix generation module, and a target user attribute classification module, wherein:
the embedded representation vector generation module is used for learning all words in the vocabulary library of user social media data with a Word2vec word-vector model to obtain an embedded representation vector for each word;
the target user embedded representation generation module is used for applying normalized weighting to the embedded representation vectors of the user social media data vocabulary and obtaining the embedded representation of a target user through a target embedding layer based on a feed-forward fully connected network;
the weighted social matrix generation module is used for generating embedded representations of the neighbor users of the target user based on the user social network and social activity information, and calculating a weighted social matrix from the embedded representations of the neighbor users;
the target user attribute classification module is used for training a social-popularity-weighted attention-seeking neural network according to the weighted social matrix, and generating the hidden community attribute classification result of the target user using the attention-seeking neural network and the embedded representation of the target user.
Generally, compared with the prior art, the technical scheme of the invention has the following beneficial effects:
(1) text features are extracted from the users' social media data and social relationships, so the users' social output content and the community attribute features contained in online social interaction are fully retained; compared with the traditional approach of analyzing users' personal network information, whose authenticity is hard to verify, and intricate social networks, the method mines user attribute information more accurately and efficiently;
(2) an attention-seeking neural network is trained to build a model for mining users' hidden community attributes; by fusing social text information with interaction information, the model describes users' hidden community attributes more comprehensively than traditional methods that focus on analyzing the users' social network, and thus alleviates the data sparsity problem;
(3) the social relationship weight between users is calculated by combining social popularity with the embedded representations of neighbor users, so that both the social behavior and the individual characteristics of users are factored into the social relationship weight; this breaks through the limitation of considering only existing social relationships and helps mine users' social relationship attributes as comprehensively as possible.
Drawings
Fig. 1 is a flowchart illustrating a hidden community attribute obtaining method based on an attention-seeking neural network according to the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in FIG. 1, the invention discloses a hidden community attribute acquisition method based on an attention-seeking neural network, which comprises the following steps:
Step S1: learning all words in the vocabulary library of user social media data with a Word2vec word-vector model to obtain an embedded representation vector for each word;
Specifically, step S1 includes: the vocabulary library of user social media data, formed from the social media data set containing all users, is denoted $C = (c_1, c_2, \dots, c_f) \in R^{f \times f}$, where $c_i \in R^f$ is the one-hot code of the i-th word in the vocabulary library and $f = |C|$ is the number of distinct words in the vocabulary library. For the words in C, the Word2vec network learns the embedded representation vectors $E_w = (w_1, w_2, \dots, w_f) \in R^{f \times k}$, vectorizing the words in the vocabulary library according to the text content set. Each $w_i$ is the word embedding of the i-th word, and k is the vector dimension of each word after network learning.
Step S2: applying normalized weighting to the embedded representation vectors of the user social media data vocabulary and obtaining the embedded representation of a target user through a target embedding layer based on a feed-forward fully connected network;
Specifically, step S2 includes: after the embedded representation vectors $E_w$ of all words are obtained, the word embeddings must be converted into target user embeddings, because the basic object of hidden community attribute analysis is the user and the subsequent attention-seeking neural network analyzes target users. To this end, a target embedding layer based on a feed-forward fully connected network converts the word embeddings $E_w$, in fully connected form, into the embedded representations of the target users $E_u = (u_1', u_2', \dots, u_n') \in R^{n \times k}$, where n is the number of target users and $u_i' \in R^k$ is the embedded representation of the i-th target user.
In particular, for a target user $u^*$, all words that co-occur with $u^*$ in the same social media content form the user's characterization word set, denoted $\{u^*\} = \{w_1', w_2', \dots, w_{f'}'\}$, where each $w_i'$ belongs to $E_w$; the number $f'$ of elements in the characterization word set of $u^*$ does not exceed the size of $E_w$. The positions of these words and their text distances from the target user are recorded, with distance counted in words; for example, in "Zhang San buys computer", the distance between "buys" and "Zhang San" is 1 and the distance between "computer" and "Zhang San" is 2. The distances between all words in $\{u^*\}$ and the target user in the text data are normalized and the words are given different weights $\{p_1, \dots, p_{f'}\}$, forming $u^{*\prime} = p_1 w_1' + p_2 w_2' + \dots + p_{f'} w_{f'}'$ as the embedded representation of the target user.
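The distance-weighted sum described above can be sketched in a few lines of Python. The text does not state how distances are normalized into weights; the linear rule used here, p = (D + 1 - d) / (D + 1) with D the largest distance in the set, is an assumption, and the two-dimensional word vectors are hypothetical toy values rather than Word2vec output.

```python
from typing import List, Sequence


def distance_weights(distances: Sequence[int]) -> List[float]:
    """Turn text distances into weights that shrink as distance grows.

    Assumed rule (not given in the source): p = (D + 1 - d) / (D + 1),
    where D is the largest distance in the characterization word set.
    """
    d_max = max(distances)
    return [(d_max + 1 - d) / (d_max + 1) for d in distances]


def user_embedding(word_vectors: Sequence[Sequence[float]],
                   distances: Sequence[int]) -> List[float]:
    """Compute u*' = p_1 w_1' + ... + p_f' w_f' as a weighted sum of word vectors."""
    weights = distance_weights(distances)
    dim = len(word_vectors[0])
    return [sum(p * vec[j] for p, vec in zip(weights, word_vectors))
            for j in range(dim)]


# Hypothetical toy vectors for the words of "Zhang San buys computer".
vectors = {"Zhang San": [1.0, 0.0], "buys": [0.0, 3.0], "computer": [3.0, 3.0]}
words = ["Zhang San", "buys", "computer"]
distances = [0, 1, 2]  # the user's own name sits at distance 0

weights = distance_weights(distances)
u_star = user_embedding([vectors[w] for w in words], distances)
```

Words nearer to the user's mention contribute more to the user embedding; any other monotonically decreasing normalization could be substituted for `distance_weights` without changing the rest of the sketch.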
Step S3: generating embedded representations of the neighbor users of the target user based on the user social network and social activity information, and calculating a weighted social matrix from the embedded representations of the neighbor users;
Specifically, step S3 includes: converting the various social relationships of target user $u^*$ into weight parameters. First, the neighbor users that have social relationships with target user $u^*$ are recorded, i.e., users with whom there are social behaviors such as following, liking, forwarding, and commenting. The neighbor user set of $u^*$ is denoted $[u^*]$. For the neighbor users $u_{i1}, \dots, u_{ik} \in [u^*]$, the embedded representation $u_i'$ of target user $u^*$ and the embedded representations $(u_{i1}', u_{i2}', \dots, u_{ik}')$ of these neighbor users are obtained, and the cosine similarity between $u^*$ and each neighbor user $u_{ip}$ in $[u^*]$ is calculated: $m_{i,p} = \cos(u_i', u_{ip}') = \frac{u_i' \cdot u_{ip}'}{\|u_i'\| \, \|u_{ip}'\|}$.
Next, the social popularity between target user $u^*$ and its neighbor users is measured; the social popularity coefficient $h_{i,p}$ is calculated from the number of social interactions and the social behavior categories between $u^*$ and each neighbor user. Specifically, let $h_{i,p} = \sum_{action} h(action_{i,p}) \cdot mount_{action,i,p}$, where h is a weighting value that fuses the number of social interactions mount and the social behavior category action between users; $h(action_{i,p})$ is the weight of social behavior category action between target user $u^*$ and its neighbor user $u_{ip}$. The social behavior categories action are attention (following), comment, like, and forward, with the normalized weights h(action) taking the values 0.4, 0.25, and 0.1. $mount_{action,i,p}$ records the number of occurrences of the corresponding social behavior category, and the adjacency weights $\{h_{i,p} m_{i,p}\}$ between target user $u^*$ and all its neighbor users $[u^*]$ are thereby obtained. Repeating this process for each user gives the adjacency weight between any two users, with $h_{i,p} m_{i,p} = 0$ between users who have no social relationship. The adjacency weights among all n users are recorded as the weighted social matrix $A \in R^{n \times n}$, where $h_{r,j} m_{r,j} \in A$ is the social weight parameter between user r and user j.
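As a rough illustration of step S3, the sketch below builds a weighted social matrix whose entries are the product of a social popularity term and a cosine similarity term. The assignment of weight values to behavior categories in `ACTION_WEIGHTS`, the function names, and the toy embeddings and interaction counts are all assumptions made for this example; treating the matrix as directed is also an assumption, since the text does not say whether A is symmetric.

```python
import math
from typing import Dict, List, Tuple

# Assumed mapping: the text gives the values 0.4, 0.25 and 0.1 for four
# categories, but which category receives which value is a guess here.
ACTION_WEIGHTS = {"follow": 0.4, "comment": 0.25, "like": 0.25, "forward": 0.1}


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity m between two user embeddings (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def popularity(counts: Dict[str, int]) -> float:
    """Social popularity h = sum over categories of h(action) * mount(action)."""
    return sum(ACTION_WEIGHTS[action] * n for action, n in counts.items())


def weighted_social_matrix(
    embeddings: List[List[float]],
    interactions: Dict[Tuple[int, int], Dict[str, int]],
) -> List[List[float]]:
    """Fill A[i][p] = h_{i,p} * m_{i,p}; pairs with no interactions stay 0."""
    n = len(embeddings)
    matrix = [[0.0] * n for _ in range(n)]
    for (i, p), counts in interactions.items():
        matrix[i][p] = popularity(counts) * cosine(embeddings[i], embeddings[p])
    return matrix


# Toy data: three users, with interaction counts keyed by (source, target).
embeddings = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
interactions = {
    (0, 1): {"follow": 1, "like": 2},  # user 0 follows user 1 and liked twice
    (1, 0): {"forward": 3},            # user 1 forwarded user 0 three times
}
A = weighted_social_matrix(embeddings, interactions)
```

Users 0 and 2 never interact, so `A[0][2]` stays zero, matching the rule that users without a social relationship have adjacency weight 0.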
Step S4: training a social-popularity-weighted attention-seeking neural network according to the weighted social matrix, and generating the hidden community attribute classification result of the target user using the attention-seeking neural network and the embedded representation of the target user.
Specifically, training the attention-seeking neural network in step S4 includes:
First, the similarity coefficient $e_{ik}$ between target user $u^*$ and its neighbor user k is calculated:
$e_{ik} = h_{i,k} m_{i,k} \, a(W u_i', W u_k')$
where a and W are parameters to be learned, W is a shared parameter matrix in the neural network, a is the attention calculation function, and $h_{i,k} m_{i,k}$ is the social weight parameter between target user $u^*$ and its neighbor user k.
Applying softmax to $e_{ik}$ yields the attention coefficient $\alpha_{ij}$ between target user $u^*$ and any user j.
Second, after a and W have been learned through training, the embedded representations $E_u$ of all users are summed with weights given by the attention coefficients $\alpha_{ij}$ calculated from a and W, yielding the fusion embedded representation of target user $u^*$: $h^* = \sum_j \alpha_{ij} u_j'$, where $u_j'$ ranges over $E_u$.
Finally, the forward propagation network is trained, with the fusion embedded representation $h^*$ of target user $u^*$ and the community attribute vector $(s_1, s_2, \dots, s_t)$ of target user $u^*$ as input. The community attribute vector is derived from known information: t is the total number of community attribute categories, and $s_w = 1$ when target user $u^*$ has community attribute w and $s_w = 0$ otherwise. The trained mapping between the fusion embedded representation $h^*$ and the community attribute vector $(s_1, s_2, \dots, s_t)$ then yields the community attributes that are missing for any target user.
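The attention and fusion computations of step S4 can be sketched as follows. Several choices are assumptions rather than details from the text: the attention function a is replaced by a plain dot product, W is fixed to the identity instead of being learned, and the softmax is taken only over users with a non-zero social weight, as in a graph attention layer. All names and toy values are hypothetical.

```python
import math
from typing import Callable, Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(W: Matrix, x: Vector) -> Vector:
    """Multiply the shared parameter matrix W by an embedding x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def dot(x: Vector, y: Vector) -> float:
    """Stand-in for the learned attention function a (an assumption)."""
    return sum(a * b for a, b in zip(x, y))


def attention_coefficients(i: int, E_u: List[Vector], A: Matrix, W: Matrix,
                           a: Callable[[Vector, Vector], float]) -> Dict[int, float]:
    """Compute e_ik = A[i][k] * a(W u_i, W u_k), then softmax over neighbors k."""
    h_i = matvec(W, E_u[i])
    e = {k: A[i][k] * a(h_i, matvec(W, E_u[k]))
         for k in range(len(E_u)) if A[i][k] != 0.0}
    if not e:  # a user with no neighbors has no attention coefficients
        return {}
    peak = max(e.values())  # subtract the maximum for numerical stability
    exp_e = {k: math.exp(v - peak) for k, v in e.items()}
    total = sum(exp_e.values())
    return {k: v / total for k, v in exp_e.items()}


def fused_embedding(alpha: Dict[int, float], E_u: List[Vector]) -> Vector:
    """h* = sum_j alpha_ij * u_j' over the users that received attention."""
    dim = len(E_u[0])
    return [sum(alpha[j] * E_u[j][d] for j in alpha) for d in range(dim)]


# Toy inputs: three users, an untrained identity W, a small social matrix.
E_u = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
A = [[0.0, 0.5, 0.5], [0.5, 0.0, 0.0], [0.5, 0.0, 0.0]]
W = [[1.0, 0.0], [0.0, 1.0]]

alpha = attention_coefficients(0, E_u, A, W, dot)
h_star = fused_embedding(alpha, E_u)
```

In a trained model W and a would be learned by backpropagation, and `h_star` would then be passed to the forward propagation network that predicts the community attribute vector.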
The process of the invention is illustrated below with reference to a specific embodiment:
(1) Learning the embedded representations of words with the Word2vec network:
The vocabulary library formed from the social media data set containing all users is denoted $C = (c_1, c_2, \dots, c_f) \in R^{f \times f}$, where $c_i \in R^f$ is the one-hot code of the i-th word in the vocabulary library and $f = |C|$ is the number of distinct words in the vocabulary library. For the words in C, the Word2vec network learns the embedded representations $E_w = (w_1, w_2, \dots, w_f)$, vectorizing the words in the vocabulary library according to the text content set. Each $w_i$ is the Word2vec embedded representation of the i-th word.
Take the texts "Wu buys computer" and "Wang likes Wu" as an example: the vocabulary library is [Wu, buy, computer, Wang, like] and f = 5. Applying the Word2vec network yields the embedded representation of each word in the media data, $(w_1, w_2, w_3, w_4, w_5)$.
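The vocabulary library and its one-hot codes for this example can be built as in the following sketch. It assumes the two texts are already tokenized and uses English glosses of the words; the Word2vec training that turns each one-hot code into a k-dimensional vector is not shown, and the function names are hypothetical.

```python
from typing import List


def build_vocabulary(texts: List[List[str]]) -> List[str]:
    """Collect distinct words in order of first appearance."""
    vocabulary: List[str] = []
    for tokens in texts:
        for token in tokens:
            if token not in vocabulary:
                vocabulary.append(token)
    return vocabulary


def one_hot_matrix(vocabulary: List[str]) -> List[List[int]]:
    """Return the f x f matrix C whose i-th row is the one-hot code c_i."""
    f = len(vocabulary)
    return [[1 if row == col else 0 for col in range(f)] for row in range(f)]


# Pre-tokenized glosses of the two example texts.
texts = [["Wu", "buy", "computer"], ["Wang", "like", "Wu"]]
vocabulary = build_vocabulary(texts)
C = one_hot_matrix(vocabulary)
f = len(vocabulary)
```

The repeated word "Wu" is counted once, which is why six tokens produce a vocabulary of size f = 5.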
(2) Obtaining the target user embedded representations through the target embedding layer based on a feed-forward fully connected network:
After the embedded representation $(w_1, w_2, w_3, w_4, w_5)$ of each word is obtained according to claim 2, each target user, "Wu" and "Wang", together with all words that co-occur with that user in the same event text, forms the user's characterization word set, so that the core information in the data that characterizes the target user's hidden community attributes is retained as far as possible while redundant information is eliminated to the greatest extent. For example, "Wu" corresponds to $\{u_1\} = \{w_1', w_2', w_3'\}$ = {Wu, buy, computer}, and "Wang" corresponds to $\{u_2\} = \{w_4', w_5', w_1'\}$ = {Wang, like, Wu}. The text distances of "buy" and "computer" from "Wu" are 1 and 2, and the text distances of "like" and "Wu" from "Wang" are 1 and 2. After inverse normalization, different weights are assigned: target user "Wu" is represented as $u_1' = w_1' + 0.66 w_2' + 0.33 w_3'$ and target user "Wang" as $u_2' = w_4' + 0.66 w_5' + 0.33 w_1'$. The target user embedded representations are therefore $E_u = (u_1', u_2')$.
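The weights 1, 0.66, and 0.33 in this example are consistent with a linear inverse normalization of the text distance d by D + 1, where D is the largest distance in the characterization word set. This rule is an inference from the numbers and is not a formula given in the text:

```latex
p(d) = \frac{D + 1 - d}{D + 1}, \qquad D = 2:\quad
p(0) = \frac{3}{3} = 1, \quad
p(1) = \frac{2}{3} \approx 0.66, \quad
p(2) = \frac{1}{3} \approx 0.33
```

Under this reading, the user's own name (distance 0) always receives weight 1 and the farthest co-occurring word receives the smallest non-zero weight.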
(3) Generating a weighted social matrix based on the user social network and the social activity information:
The various social relationships between target users are converted into weight parameters. First, the neighbor users with whom each user has social relationships are recorded, i.e., users with whom there are social behaviors such as following, liking, forwarding, and commenting. The neighbor user set of target user "Wu" is denoted $[u_1]$. For the neighbor users $u_{i1}, \dots, u_{ik} \in [u_1]$, their embedded representations $(u_{i1}', u_{i2}', \dots, u_{ik}')$ are obtained, and the cosine similarity between target user "Wu" and each user $u_{ip}$ in the neighbor user set $[u_1]$ is calculated: $m_{i,p} = \cos(u_i', u_{ip}') = \frac{u_i' \cdot u_{ip}'}{\|u_i'\| \, \|u_{ip}'\|}$.
Next, the number of social interactions and the social behavior categories between target user "Wu" and its neighbor users are measured as social popularity: let $h_{i,k} = \sum h(action) \cdot mount$, where h is a weighting value that fuses the number of social interactions mount and the social behavior category action between users. According to the weight assignment of claim 4, the adjacency weights $\{h_{i,p} m_{i,p}\}$ between target user "Wu" and all its neighbor users $u_{i1}, \dots, u_{ik} \in [u_1]$ are obtained. Repeating this process gives the adjacency weight between any two users, with $h_{i,k} m_{i,k} = 0$ between users who have no social relationship. The weights are recorded as the adjacency matrix $A \in R^{n \times n}$, where $h_{i,j} m_{i,j} \in A$ is the social weight parameter between user i and user j.
(4) Training a social popularity weighted attention-seeking neural network, and generating a hidden community attribute classification result of a target user:
The attention-seeking neural network is trained to fuse the text content information with the social information of the target user. First, the similarity coefficient $e_{ik}$ between user i and neighbor user k is calculated:
$e_{ik} = h_{i,k} m_{i,k} \, a(W u_i', W u_k')$, where a and W are parameters to be learned, W is a shared parameter matrix in the neural network, and a is the attention calculation function. Applying softmax to $e_{ik}$ yields the attention coefficient $\alpha_{ij}$ between user i and any user j. After a and W have been learned through training, the user embedded representations $E_u$ are summed with weights given by the attention coefficients calculated from a and W, yielding the fusion embedded representation $h^*$ of target user "Wu".
Finally, the forward propagation network is trained, with the fusion embedded representation $h^*$ of target user $u^*$ and the community attribute vector $(s_1, s_2, \dots, s_t)$ of target user $u^*$ as input, where t is the total number of community attribute categories, and $s_w = 1$ when target user $u^*$ has community attribute w and $s_w = 0$ otherwise. The trained mapping between the fusion embedded representation $h^*$ and the community attribute vector $(s_1, s_2, \dots, s_t)$ then yields the community attributes that are missing for target user "Wu".
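The community attribute vector used as the training label can be encoded as a multi-hot list, as in the short sketch below. The function name and the 1-based attribute indices are illustrative assumptions; the text only specifies that $s_w$ is set when the user has attribute w and is 0 otherwise.

```python
from typing import List, Set


def community_attribute_vector(known: Set[int], t: int) -> List[int]:
    """Encode (s_1, ..., s_t): s_w = 1 if the user has attribute w, else 0.

    Attribute indices are taken to be 1-based, following the s_1..s_t notation.
    """
    if any(w < 1 or w > t for w in known):
        raise ValueError("attribute index outside 1..t")
    return [1 if w in known else 0 for w in range(1, t + 1)]


# A user known to have attributes 2 and 5 out of t = 6 categories.
s = community_attribute_vector({2, 5}, 6)
```

Because a user may belong to several communities at once, the label is multi-hot rather than one-hot, which suggests a per-attribute output in the forward propagation network; that design detail is not specified in the text.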
Further, the invention also provides a hidden community attribute acquisition system based on an attention-seeking neural network, which comprises an embedded representation vector generation module, a target user embedded representation generation module, a weighted social matrix generation module, and a target user attribute classification module, wherein:
the embedded representation vector generation module is used for learning all words in the vocabulary library of user social media data with a Word2vec word-vector model to obtain an embedded representation vector for each word;
the target user embedded representation generation module is used for applying normalized weighting to the embedded representation vectors of the user social media data vocabulary and obtaining the embedded representation of a target user through a target embedding layer based on a feed-forward fully connected network;
the weighted social matrix generation module is used for generating embedded representations of the neighbor users of the target user based on the user social network and social activity information, and calculating a weighted social matrix from the embedded representations of the neighbor users;
the target user attribute classification module is used for training a social-popularity-weighted attention-seeking neural network according to the weighted social matrix, and generating the hidden community attribute classification result of the target user using the attention-seeking neural network and the embedded representation of the target user.
It will be understood by those skilled in the art that the foregoing is only a preferred embodiment of the present invention, and is not intended to limit the invention, and that any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the scope of the present invention.
Claims (10)
1. A hidden community attribute acquisition method based on an attention-seeking neural network is characterized by comprising the following steps:
step S1: learning all words in the user social media data vocabulary library through a Word2vec word vector model network to obtain the embedded representation vectors of all words;
step S2: applying normalized weighting to the embedded representation vectors of the user social media data vocabulary and obtaining the embedded representation of the target user through a target embedding layer based on a forward fully connected network;
step S3: generating the embedded representations of the neighbor users of the target user based on the user social network and the social activity information, and calculating a weighted social matrix according to the embedded representations of the neighbor users;
step S4: training a social-popularity-weighted attention-seeking neural network according to the weighted social matrix, and generating the hidden community attribute classification result of the target user by using the attention-seeking neural network and the embedded representation of the target user.
2. The method for obtaining hidden community attributes based on attention-seeking neural network as claimed in claim 1, wherein the step S1 comprises:
the user social media data vocabulary library formed from the social media data set containing all users is denoted $C$, where $c_i \in \mathbb{R}^f$ is the one-hot code of the $i$-th word in the vocabulary library and $f = |C|$ is the number of distinct words in the vocabulary library; the embedded representation vectors $E_w$ of the words in $C$ are learned through the Word2vec network, which vectorizes the words in the vocabulary library from the collection of textual content, where each $w_i$ is the word embedded representation of the $i$-th word and $k$ is the vector dimension of each word after network learning.
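By way of illustration only, the relationship between the one-hot code $c_i$ and the word embedding $w_i$ can be sketched as a row lookup in an $f \times k$ matrix. The vocabulary and the embedding values below are invented for this sketch and are not produced by Word2vec training; the function names are likewise assumptions.

```python
def one_hot(i, f):
    """Return the one-hot code c_i of length f for the word at index i."""
    c = [0.0] * f
    c[i] = 1.0
    return c


def lookup(c, E):
    """Multiply a one-hot row vector c by the f x k matrix E.

    The result is the row of E selected by the single 1 in c.
    """
    k = len(E[0])
    return [sum(c[r] * E[r][d] for r in range(len(E))) for d in range(k)]


# Hypothetical vocabulary with f = 3 words and k = 2 dimensions.
vocabulary = ["graph", "user", "community"]
f = len(vocabulary)
E_w = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
w_user = lookup(one_hot(vocabulary.index("user"), f), E_w)
```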
3. The method for obtaining hidden community attributes based on attention-seeking neural network as claimed in claim 1 or 2, wherein the step S2 comprises: after the embedded representation vectors $E_w$ of all words are obtained, because the basic object of hidden community attribute analysis is the user, the word embedded representations need to be converted into target user embedded representations so that the subsequent attention-seeking neural network can analyze with the target user as the object; therefore, the target embedding layer based on the forward fully connected network converts the word embedded representations $E_w$ of all words, through a fully connected form, into the embedded representations $E_u$ of the target users, where $n$ is the number of target users and $u_i' \in \mathbb{R}^k$ is the embedded representation of the $i$-th target user.
4. The method for obtaining hidden community attributes based on attention-seeking neural network as claimed in claim 3, wherein the step S2 is specifically as follows: for a target user $u^*$, all words that co-occur with the target user $u^*$ in the same social media content form the characterization word set of that user, denoted $\{u^*\} = \{w_1', w_2', \dots, w_{f'}'\}$, where each $w_i'$ belongs to $E_w$; for a target user $u^*$, the number $f'$ of elements in its characterization word set does not exceed the size of $E_w$; the positions of these words and their distances from the target user in the text are recorded; the distances between all words in $\{u^*\}$ and the target user in the text data are normalized and then assigned different weights $\{p_1, \dots, p_{f'}\}$, forming $u^{*\prime} = p_1 w_1' + p_2 w_2' + \dots + p_{f'} w_{f'}'$ as the embedded representation of the target user.
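By way of illustration only, the weighted sum $u^{*\prime} = p_1 w_1' + \dots + p_{f'} w_{f'}'$ might be sketched as follows. The claim does not state how a normalized distance is mapped to a weight; the inverse-distance weighting used here, the function name `user_embedding` and the sample values are assumptions.

```python
def user_embedding(word_vectors, distances):
    """Return (p, u) where u = sum_j p[j] * word_vectors[j].

    Assumption: weights are inverse distances normalized to sum to 1,
    so words closer to the user mention contribute more.
    """
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be positive")
    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    p = [x / total for x in inverse]
    k = len(word_vectors[0])
    u = [sum(p_j * w[d] for p_j, w in zip(p, word_vectors)) for d in range(k)]
    return p, u


# Hypothetical word vectors at distances 1 and 3 from the user mention.
p, u_star = user_embedding([[1.0, 0.0], [0.0, 1.0]], [1, 3])
```

With these sample distances the nearer word receives three times the weight of the farther one.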
5. The method for obtaining hidden community attributes based on attention-seeking neural network as claimed in claim 1 or 2, wherein the step S3 comprises:
converting the various social relationships of the target user $u^*$ into weight parameters;
measuring the social popularity between the target user $u^*$ and its neighbor users, where the social popularity coefficient $m_{i,p}$ is calculated from the number of social interactions and the social behavior categories between the target user $u^*$ and its neighbor users;
repeating the above process for each user to obtain the adjacency weight between any two users, with $h_{i,p} m_{i,p} = 0$ for users that have no social relationship;
recording the adjacency weights among all $n$ users as a weighted social matrix $A \in \mathbb{R}^{n \times n}$, where $h_{r,j} m_{r,j} \in A$ represents the social weight parameter between user $r$ and user $j$.
6. The method for obtaining hidden community attributes based on attention-seeking neural network as claimed in claim 5, wherein converting the various social relationships of the target user $u^*$ into weight parameters is specifically:
recording the neighbor users that have a social relationship with the target user $u^*$, namely users with which there are social behaviors such as following, liking, forwarding and commenting; denoting the set of neighbor users of the target user $u^*$ as $[u^*]$; and, for each neighbor user $u_{i1}, \dots, u_{ik} \in [u^*]$, obtaining the embedded representation $u_i'$ of the target user $u^*$ and the embedded representations $(u_{i1}', u_{i2}', \dots, u_{ik}')$ of these neighbor users, and calculating the cosine similarity between the target user $u^*$ and each neighbor user $u_{ip}'$ in the neighbor user set $[u^*]$.
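By way of illustration only, the cosine similarity between the embedding of the target user and the embedding of each neighbor user can be computed with the standard formula, dot product divided by the product of the vector norms. The function names and sample vectors are assumptions, as is the choice to return 0 when either vector is all zeros.

```python
import math


def cosine_similarity(u, v):
    """Return cos(u, v) = (u . v) / (|u| * |v|), or 0.0 for a zero vector."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    if norm == 0.0:
        return 0.0
    return dot / norm


def neighbor_similarities(target, neighbors):
    """Map each neighbor id to its cosine similarity with the target."""
    return {name: cosine_similarity(target, vec) for name, vec in neighbors.items()}


# Hypothetical embeddings for a target user and two neighbors.
sims = neighbor_similarities(
    [1.0, 0.0],
    {"u_i1": [1.0, 1.0], "u_i2": [0.0, 1.0]},
)
```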
7. The method for obtaining hidden community attributes based on attention-seeking neural network as claimed in claim 6, wherein measuring the social popularity between the target user $u^*$ and its neighbor users is specifically:
recording $h_{i,p} = \sum h(\text{action}_{i,p}) \cdot \text{mount}_{\text{action},i,p}$, where $h$ is a weight value that fuses the number of social interactions (mount) between users with the social behavior category (action); specifically, $h(\text{action}_{i,p})$ is the weight of the social behavior category action between the target user $u^*$ and each of its neighbor users $u_{ip}'$, and $\text{mount}_{\text{action},i,p}$ records the number of occurrences of the corresponding social behavior category; and calculating the adjacency weights $\{h_{i,p} m_{i,p}\}$ between the target user $u^*$ and all of its neighbor users in $[u^*]$.
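By way of illustration only, the sum $h_{i,p} = \sum h(\text{action}) \cdot \text{mount}$ might be sketched as follows. The weight table `ACTION_WEIGHTS` holds placeholder values chosen for this sketch, not the weights stated elsewhere in the claims, and the function name and interaction counts are likewise assumptions.

```python
# Placeholder per-category weights h(action); illustrative values only.
ACTION_WEIGHTS = {
    "follow": 0.5,
    "comment": 0.25,
    "like": 0.125,
    "forward": 0.125,
}


def social_heat(action_counts, action_weights=ACTION_WEIGHTS):
    """Return the sum over actions of h(action) * mount(action).

    action_counts maps a social behavior category to the number of
    times it occurred between the target user and one neighbor.
    An unknown category raises KeyError.
    """
    return sum(
        action_weights[action] * count
        for action, count in action_counts.items()
    )


# Hypothetical counts between the target user and one neighbor.
heat = social_heat({"follow": 1, "comment": 2, "like": 4})
```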
8. The method for obtaining hidden community attributes based on attention-seeking neural network as claimed in claim 1 or 2, wherein the training of the attention-seeking neural network in the step S4 comprises:
calculating the similarity coefficient $e_{ik}$ between the target user $u^*$ and its neighbor user $k$: $e_{ik} = h_{i,k} m_{i,k}\, a(W u_i', W u_k')$, where $a$ and $W$ are parameters to be learned, $W$ is a shared parameter matrix in the neural network, $a$ is the attention calculation function, and $h_{i,k} m_{i,k}$ represents the social weight parameter between the target user $u^*$ and its neighbor user $k$; and applying softmax to $e_{ik}$ to obtain the attention coefficient $\alpha_{ij}$ between the target user $u^*$ and any user $j$;
after $a$ and $W$ have been learned through training, weighting and summing the embedded representations $E_u$ of all users according to the attention coefficients $\alpha_{ij}$ calculated from $a$ and $W$ to obtain the fused embedded representation $h^*$ of the target user $u^*$, where $u_j'$ ranges over $E_u$;
training the forward propagation network, taking as input the fused embedded representation $h^*$ of the target user $u^*$ and the community attribute vector $(s_1, s_2, \dots, s_t)$ of the target user $u^*$, where the community attribute vector is obtained from known information and $t$ is the total number of community attribute categories; $s_w = 1$ when the target user $u^*$ has community attribute $w$, and $s_w = 0$ otherwise.
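By way of illustration only, the softmax step recited above can be written out explicitly. The normalization below is the standard softmax; taking the sum over the neighbor set $[u^*]$ is an assumption, since the text does not state the range of the normalization.

```latex
e_{ik} = h_{i,k}\, m_{i,k}\; a\!\left(W u_i',\, W u_k'\right),
\qquad
\alpha_{ij} = \frac{\exp\left(e_{ij}\right)}{\sum_{k \in [u^*]} \exp\left(e_{ik}\right)}
```

Under this normalization the coefficients $\alpha_{ij}$ are positive and sum to 1 over the neighbor set.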
9. The hidden community attribute acquisition method based on attention-seeking neural network as claimed in claim 8, wherein the social behavior categories (action) are follow, comment, like and forward, and the normalized weights are h(action) = 0.4, h(action) = 0.25 and h(action) = 0.1.
10. A hidden community attribute acquisition system based on an attention-seeking neural network is characterized by comprising an embedded representation vector generation module, a target user embedded representation generation module, a weighted social matrix generation module and a target user attribute classification module, wherein:
the embedded representation vector generation module is used for learning all words in the user social media data vocabulary library through a Word2vec word vector model network to obtain the embedded representation vectors of all words;
the target user embedded representation generation module is used for applying normalized weighting to the embedded representation vectors of the user social media data vocabulary and obtaining the embedded representation of the target user through a target embedding layer based on a forward fully connected network;
the weighted social matrix generation module is used for generating the embedded representations of the neighbor users of the target user based on the user social network and social activity information, and calculating the weighted social matrix according to the embedded representations of the neighbor users;
the target user attribute classification module is used for training a social-popularity-weighted attention-seeking neural network according to the weighted social matrix, and generating the hidden community attribute classification result of the target user by using the attention-seeking neural network and the embedded representation of the target user.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111047006.0A CN113807978A (en) | 2021-09-07 | 2021-09-07 | Hidden community attribute acquisition method and system based on attention-seeking neural network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113807978A (en) | 2021-12-17 |
Family
ID=78940536
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111047006.0A Pending CN113807978A (en) | 2021-09-07 | 2021-09-07 | Hidden community attribute acquisition method and system based on attention-seeking neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113807978A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116127204A (en) * | 2023-04-17 | 2023-05-16 | 中国科学技术大学 | Multi-view user portrayal method, multi-view user portrayal system, apparatus, and medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2426634A1 (en) * | 2010-09-03 | 2012-03-07 | Blueconomics Business Solutions GmbH | Computer-implemented method and system for processing and monitoring business-to -business relationships |
US20120311030A1 (en) * | 2011-05-31 | 2012-12-06 | International Business Machines Corporation | Inferring User Interests Using Social Network Correlation and Attribute Correlation |
US20160379132A1 (en) * | 2015-06-23 | 2016-12-29 | Adobe Systems Incorporated | Collaborative feature learning from social media |
CN108090607A (en) * | 2017-12-13 | 2018-05-29 | 中山大学 | A kind of social media user's ascribed characteristics of population Forecasting Methodology based on the fusion of multi-model storehouse |
CN108492200A (en) * | 2018-02-07 | 2018-09-04 | 中国科学院信息工程研究所 | A kind of user property estimating method and device based on convolutional neural networks |
CN110781406A (en) * | 2019-10-14 | 2020-02-11 | 西安交通大学 | Social network user multi-attribute inference method based on variational automatic encoder |
US10685183B1 (en) * | 2018-01-04 | 2020-06-16 | Facebook, Inc. | Consumer insights analysis using word embeddings |
CN112084335A (en) * | 2020-09-09 | 2020-12-15 | 电子科技大学 | Social media user account classification method based on information fusion |
Non-Patent Citations (4)
Title |
---|
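Song Wei; Xie Xingbo; Liu Lizhen; Wang Hanshi: "A survey of research on user hidden attribute inference", Journal of Chinese Computer Systems, no. 02 *
Ju Chunhua; Chen Yan; Bao Fuguang: "Asymmetric user relationship strength calculation incorporating network structure and social habits", Systems Engineering - Theory & Practice, no. 08 *
Dong Xiangxiang; Liang Ying; Xie Xiaojie: "A social network user representation learning method fusing multiple types of information", Journal of Chongqing University of Technology (Natural Science), no. 05 *
Cai Chongchao; Xu Huahu: "Research on recommender systems based on social networks", Software Guide, no. 01 *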
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||