CN113537593A - Method and device for predicting voting tendency of agenda - Google Patents

Publication number
CN113537593A
Authority
CN
China
Prior art keywords
agenda
node
voting
topic
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110803621.3A
Other languages
Chinese (zh)
Inventor
魏忠钰
牟馨忆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Zhejiang Lab
Original Assignee
Fudan University
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University and Zhejiang Lab
Priority to CN202110803621.3A
Publication of CN113537593A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/04: Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10: Services
    • G06Q50/26: Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Strategic Management (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Economics (AREA)
  • Molecular Biology (AREA)
  • Human Resources & Organizations (AREA)
  • Development Economics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for predicting the voting tendency of legislators, relating to the technical field of political voting. The method comprises: establishing legislator nodes from the basic information of each legislator, and establishing statement nodes from the statements the legislator has posted on Twitter; establishing relationships between the nodes; obtaining initial representations of the nodes; performing heterogeneous graph convolution on the keyword-based statement network; performing heterogeneous graph convolution on the hashtag-based statement network; initializing bill representations from bill text using a long short-term memory (LSTM) network; and updating the representations of the legislator and statement nodes with a heterogeneous graph convolutional neural network, training jointly with a triplet loss function to learn the node representations in the heterogeneous graph and the bill representations, and measuring a legislator's voting preference for a bill by the distance between the legislator and the bill, so as to predict how the legislator will vote on the bill. The method and device improve roll call vote prediction and are applicable to newly elected legislators who have no voting record.

Description

Method and device for predicting the voting tendency of legislators
Technical Field
The invention relates to the technical field of political voting, and in particular to a method and a device for predicting the voting tendency of legislators.
Background
The goal of roll call vote prediction is to estimate legislators' likely attitudes toward newly introduced bills from their voting history. Because legislators' political preferences and cultural backgrounds strongly influence their positions and demands, learning legislator representations from roll call voting data has become an effective tool for predicting how they will vote.
Previous research has advanced roll call vote prediction mainly in two ways. On the one hand, the rich text of bills is used to add features for classification. On the other hand, relationships are established among legislators who share voting, sponsorship, or donation behavior, and integrating legislators with similar political backgrounds greatly improves roll call prediction.
However, when a legislator votes on bills for the first time, no prior record of that legislator is available, which makes it difficult to obtain a representation that carries the contextual information of past interactions. This is the so-called cold start problem. It is especially common in politics, where the end of a term usually means a change in the membership of the chamber or party. For example, more than 10% of the legislators in a given data set are newly elected. Furthermore, a vote only shows the outcome of a legislator's decision, while what the legislator says publicly on social platforms contains clues to that final choice. It is therefore valuable to explore the reasons behind legislators' opinions and final decisions, which also helps explain the legislative process.
Disclosure of Invention
In order to overcome the above drawbacks of the prior art, embodiments of the present invention provide a method and a device for predicting the voting tendency of legislators, which improve roll call vote prediction and are suitable for newly elected legislators.
The specific technical scheme of the embodiment of the invention is as follows:
a method for predicting the voting tendency of a legislator, the method comprising:
establishing a legislator node according to basic information of the legislator, and establishing statement nodes according to semantic information in the statements posted by the legislator;
establishing relationships between the nodes;
obtaining initial representations of the nodes;
performing heterogeneous graph convolution on the keyword-based statement network;
performing heterogeneous graph convolution on the hashtag-based statement network;
initializing bill representations from bill text using a long short-term memory network;
and updating the representations of the legislator and statement nodes with a heterogeneous graph convolutional neural network, training jointly with a triplet loss function to learn the node representations in the heterogeneous graph and the bill representations, and measuring the legislator's voting preference for a bill by the distance between the legislator and the bill, so as to predict how the legislator will vote on the bill.
Preferably, the basic information of the legislator includes member ID, home state, and political party; the statements posted by the legislator comprise the text of the legislator's posts on Twitter; and the statement nodes include at least one of keywords and hashtags (topic tags).
Preferably, a heterogeneous graph model is constructed, which comprises the nodes, the relationships established among the nodes, and the initialization of the nodes. The heterogeneous graph built on the keyword-based statement network has three relationships, R1, R2 and R3, and the graph built on the hashtag-based statement network has four, R1, R2, R3 and R4, wherein: R1 represents co-sponsorship between legislator nodes, weighted by the number of bills two legislators have co-sponsored within a preset time; R2 represents co-occurrence between statement nodes, weighted by the number of times two keywords or hashtags co-occur; R3 represents the relationship between a legislator node and a statement node, weighted by the number of times the legislator mentions the keyword or hashtag; and R4 represents the sentiment of the legislator's tweets under a given hashtag.
Preferably, in the step of obtaining initial representations of the nodes, specifically: the initial representation of a legislator is obtained from the legislator's basic information by concatenating the ID, state and political party information according to the following formula:
Figure BDA0003165268110000021
where $X_{ID}(i)$ represents the ID of legislator i, $X_{State}(i)$ represents the state to which legislator i belongs, and $X_{Party}(i)$ represents the political party to which legislator i belongs;
for the initial representation of a keyword node, the GloVe word vector is used;
for a hashtag node, the average of the GloVe word vectors of a preset number of high-frequency words in the tweets carrying the hashtag is used.
Preferably, in the step of heterogeneous graph convolution on the keyword-based statement network, the representations of the legislators and the keywords are updated using the following formulas:
Figure BDA0003165268110000031
Figure BDA0003165268110000032
where σ denotes the activation function, here the Sigmoid function; $W_1^{(l)}$ and the matrix shown in Figure BDA0003165268110000033 are the weight matrices of the l-th hidden layer; $X^{(l)}$ and $Y^{(l)}$ are the node representations at layer l; $\lambda_1$ and $\lambda_2$ are weighting hyperparameters; the matrices shown in Figure BDA0003165268110000034 and Figure BDA0003165268110000035 are the normalized adjacency matrices of the legislator network and the statement network, respectively; the matrices shown in Figure BDA0003165268110000036 and Figure BDA0003165268110000037 are the normalized adjacency matrices of the edges from keywords to legislators and from legislators to keywords, respectively; and $X^{(l+1)}$ and $Y^{(l+1)}$ are the node representations at layer l+1.
Preferably, in the step of heterogeneous graph convolution on the hashtag-based statement network, the graph convolutional neural network is generalized to handle different relationships between any pair of nodes, using a separate weight matrix and normalization factor for each relationship type, as follows:
Figure BDA0003165268110000038
where the set shown in Figure BDA0003165268110000039 is the set of neighbors of node i under relationship type r; $c_{i,r}$ is a normalization factor, usually set to the value shown in Figure BDA00031652681100000310; R denotes the set of relationship types; $h_i$ and $h_j$ denote the hidden states of nodes i and j; and $h_i^{(l)}$ denotes the hidden state of node i at layer l.
Preferably, the graph convolution operation on the hashtag-based network is expressed as follows:
Figure BDA0003165268110000041
Figure BDA0003165268110000042
where the matrices shown in Figure BDA0003165268110000043 and Figure BDA0003165268110000044 are weight matrices; the quantities shown in Figure BDA0003165268110000045 and Figure BDA0003165268110000046 are normalization factors; $N_i$ denotes the set of neighbors of node i; the set shown in Figure BDA0003165268110000047 is the set of neighbors of node i under relationship type r; $x_i$, $x_j$, $x_k$ denote the hidden states of legislator nodes i, j, k; $y_i$, $y_j$, $y_k$ denote the hidden states of statement nodes i, j, k; and l denotes the l-th layer.
Preferably, in the step of initializing bill representations from bill text using a long short-term memory network, the title, description and summary of a bill are the directly available text, and this text is encoded with the long short-term memory network to obtain the bill's initial representation:
$X_{lgn}(i) = \mathrm{LSTM}(t_i)$
where $t_i$ denotes the text of bill i and LSTM denotes the long short-term memory network.
Preferably, the step of updating the representations of the legislator and statement nodes with the heterogeneous graph convolutional neural network, training jointly with a triplet loss function to learn the node representations in the heterogeneous graph and the bill representations, and measuring a legislator's voting preference for a bill by the distance between the legislator and the bill so as to predict the legislator's vote, specifically comprises:
after the initial representations are obtained, the representations of the legislators and of the keywords or hashtags are first updated by the heterogeneous graph convolutional neural network; the representations of the legislators and the bills are then learned jointly with a triplet loss function. Specifically, a batch of triplets is sampled, each consisting of a target bill a and a pair of legislators p and n whose votes satisfy VOTE(n, a) < VOTE(p, a), with votes ordered as NO < NOT VOTE < YES. The loss function is expressed as:
L = max(d(a, p) - d(a, n) + margin, 0);
the legislators are ranked by their distance from the bill, and their votes are predicted according to the proportion of each vote type cast on the bill.
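The triplet construction and loss above can be sketched in a few lines of Python. This is a minimal illustration rather than the patented implementation: the Euclidean distance, the margin value of 1.0, the toy two-dimensional embeddings, and the helper names `distance`, `triplet_loss` and `valid_triplets` are all assumptions made for the example.

```python
import math


def distance(u, v):
    """Euclidean distance; the choice of metric is an assumption."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(d(a, p) - d(a, n) + margin, 0) for one (bill, p, n) triplet."""
    return max(distance(anchor, positive) - distance(anchor, negative) + margin, 0.0)


# Vote ordering stated in the text: NO < NOT VOTE < YES.
VOTE_RANK = {"NO": 0, "NOT VOTE": 1, "YES": 2}


def valid_triplets(votes):
    """Yield (p, n) legislator pairs with VOTE(n, a) < VOTE(p, a) for one bill."""
    for p, vote_p in votes.items():
        for n, vote_n in votes.items():
            if VOTE_RANK[vote_n] < VOTE_RANK[vote_p]:
                yield p, n


# Toy data: one bill embedding and three legislator embeddings (made up).
bill = [0.0, 0.0]
emb = {"alice": [0.0, 1.0], "bob": [3.0, 4.0], "carol": [0.0, 1.5]}
votes = {"alice": "YES", "bob": "NO", "carol": "NOT VOTE"}

pairs = sorted(valid_triplets(votes))
losses = {(p, n): triplet_loss(bill, emb[p], emb[n]) for p, n in pairs}
```

Only the pair (alice, carol) incurs a loss here, because carol, who did not vote, sits less than one margin farther from the bill than alice, who voted YES.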
A device for predicting the voting tendency of legislators, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, performs the steps of the method for predicting the voting tendency of a legislator according to any one of claims 1 to 9.
The technical scheme of the invention has the following remarkable beneficial effects:
First, the present application is the first scheme to combine legislators' historical statements with their voting records, using both to represent legislators and define the relationships between them, thereby greatly improving the accuracy of roll call vote prediction.
Second, a heterogeneous graph is constructed from the co-sponsorship relationships and the language similarity of legislators, and a heterogeneous graph convolution model is proposed to learn legislator representations effectively.
Third, further analysis shows that the heterogeneous graph, which includes the statement network formed by statement nodes and legislator nodes and therefore carries information about legislators' statements, provides a more specific representation of each legislator and alleviates the cold start problem to some extent.
Specific embodiments of the present invention are disclosed in detail with reference to the following description and drawings, indicating the manner in which the principles of the invention may be employed. It should be understood that the embodiments of the invention are not so limited in scope. The embodiments of the invention include many variations, modifications and equivalents within the spirit and scope of the appended claims. Features that are described and/or illustrated with respect to one embodiment may be used in the same way or in a similar way in one or more other embodiments, in combination with or instead of the features of the other embodiments.
Drawings
The drawings described herein are for illustration purposes only and are not intended to limit the scope of the present disclosure in any way. In addition, the shapes, proportions, and the like of the components in the drawings are merely schematic aids to understanding the present invention and do not specifically limit the shapes, proportions, and the like of the components of the invention. Those skilled in the art, having the benefit of the teachings of this invention, may choose among the various possible shapes and proportions to implement the invention as the case requires.
FIG. 1 is a flowchart of the steps of a method for predicting the voting tendency of legislators in accordance with an embodiment of the present invention;
FIG. 2 shows the overall architecture of the method for predicting the voting tendency of legislators in an embodiment of the present invention.
Detailed Description
The details of the present invention can be more clearly understood in conjunction with the accompanying drawings and the description of the embodiments of the present invention. However, the specific embodiments of the present invention described herein are for the purpose of illustration only and are not to be construed as limiting the invention in any way. Any possible variations based on the present invention may be conceived by the skilled person in the light of the teachings of the present invention, and these should be considered to fall within the scope of the present invention. It will be understood that when an element is referred to as being "disposed on" another element, it can be directly on the other element or intervening elements may also be present. When an element is referred to as being "connected" to another element, it can be directly connected to the other element or intervening elements may also be present. The terms "mounted," "connected," and "connected" are to be construed broadly and may include, for example, mechanical or electrical connections, communications between two elements, direct connections, indirect connections through intermediaries, and the like. The terms "vertical," "horizontal," "upper," "lower," "left," "right," and the like as used herein are for illustrative purposes only and do not denote a unique embodiment.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
In this application, the applicant collects legislators' historical statements as an important supplement for describing their political views, and adds hubs that connect legislators with similar emphases and positions. For example, when discussing whether partial abortion should be prohibited, supporters emphasize the protection of life, while opponents emphasize freedom of choice. Differences in vocabulary therefore not only convey speakers' ideological beliefs but also allow speakers to be divided into groups. With the statement network added, the group to which even a newly elected legislator belongs can be identified clearly. The statement network is formed by treating the keywords or hashtags (topic tags) in Twitter text as nodes and establishing relationships between them.
First, a data set is constructed that includes, for example, the historical voting records of actual legislators from 1993 to 2018. Each instance includes background information about the legislator, the content of the bill, and the legislator's final vote on the bill, that is, in favor, against, or abstaining. The data set can then be extended with the Twitter statement text collected for each legislator. Since Twitter became popular among legislators around 2011, only the 907,844 roll call records after 2011 are kept, involving 978 legislators and 2,189 bills.
To collect legislators' public statements, the personal data of all legislators may be retrieved from the website of the United States Congress. The Twitter account shown on each legislator's homepage is then extracted. For a legislator whose homepage lists no Twitter account, the legislator's name may be searched manually on Twitter and the account identified by checking its verification information and profile. The extended data set thus contains 735 member accounts, including 79 members who have no co-sponsorship relationship with anyone else in the data set. Finally, the twitterscraper library can be used to crawl all tweets from these accounts to form the extended data set.
Statistics on the extended roll call corpus, covering the number of legislators voting in roll calls each year and the number who had posted at least one tweet before that year, show that the number of legislators taking part in votes stays relatively constant, while more and more legislators share their opinions on Twitter, since social platforms are an important medium for winning support and followers. Overall, each legislator posted an average of 2,601 tweets, and over 51.2% of legislators have more than 2,000 tweets. More surprisingly, a small number of legislators have posted more than 10,000 tweets.
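As an illustration of what one instance of such a data set might look like, the sketch below models a single roll call record in Python. The class and field names (`RollCallRecord`, `member_id`, `bill_text`, and so on) and the sample values are hypothetical; the source only states that each instance carries the legislator's background, the bill content, and the final vote.

```python
from dataclasses import dataclass
from enum import IntEnum


class Vote(IntEnum):
    """Vote outcomes, ordered NO < NOT_VOTE < YES as in the triplet sampling rule."""
    NO = 0
    NOT_VOTE = 1
    YES = 2


@dataclass(frozen=True)
class RollCallRecord:
    # All field names are illustrative, not taken from the source data set.
    member_id: str   # legislator identifier
    state: str       # state the legislator represents
    party: str       # political party
    bill_id: str     # identifier of the bill voted on
    bill_text: str   # title, description, or summary of the bill
    vote: Vote       # final vote cast


record = RollCallRecord("M001", "NY", "D", "B042", "A bill on school funding", Vote.YES)
```

Using an ordered enumeration for the vote keeps later comparisons between votes a plain integer comparison.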
In order to improve roll call vote prediction and to handle newly elected legislators, the present application proposes a method for predicting the voting tendency of legislators. FIG. 1 is a flowchart of the method in an embodiment of the present invention. As shown in FIG. 1, the method may include the following steps:
S101: establishing a legislator node according to basic information of the legislator, and establishing statement nodes according to semantic information in the statements posted by the legislator.
FIG. 2 shows the overall architecture of the method in an embodiment of the present invention. As shown in FIG. 2, a heterogeneous graph model is constructed, which comprises the nodes, the relationships established among the nodes, and the initialization of the nodes.
The model presented in this application involves two types of nodes: legislator nodes and statement nodes. A legislator node is established from the basic information of the legislator, which may include member ID, home state, political party, and so on. Statement nodes are established from semantic information in the statements posted by the legislator, for example the legislator's posts in the Twitter data.
The statement nodes may be keywords or hashtags (topic tags). When keywords are used as statement nodes, the K most frequent words can be extracted from a set of filtered tweets. Similarly, when hashtags are used as nodes, the K most frequent hashtags may be retained.
These two node types are chosen because they differ in representational power and in the role they play in building relationships. Using keywords as nodes is the simpler approach, while hashtags are a more concise and explicit form of expression and can reveal more about a legislator's views. In this step, the tweets containing keywords related to legislation are selected, and all hashtags in those tweets are collected. On the one hand, a statement network built from hashtags focuses more on specific topics. On the other hand, the hashtag network can reflect more complex relationships with legislators, because it is possible to compute not only how many times a legislator mentions a hashtag but also, with an automated toolkit, the sentiment score of what the legislator says under it.
To build a more representative statement network, keywords related to legislation may be used as a filter so that only hashtags related to legislation are kept. Because text on Twitter is varied and may include content unrelated to politics, keywords may first be extracted from the bill text, and only the tweets containing these bill keywords are retained. Furthermore, some bill-related keywords, such as "congress" and "united states," are used widely by all legislators and are therefore meaningless to the model in the present application, because they do not distinguish legislators well. These words can be deleted manually when the model is built. When hashtags are used as statement nodes, the K most common hashtags may similarly be retained.
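The filtering and top-K selection described above can be sketched as follows. This is a minimal illustration under stated assumptions: the keyword set, the manually removed generic words, the sample tweets, and the helper names `filter_tweets` and `top_k_hashtags` are invented for the example, and the simple regular-expression tokenizer stands in for whatever preprocessing is actually used.

```python
import re
from collections import Counter

HASHTAG = re.compile(r"#(\w+)")

# Hypothetical inputs: keywords extracted from bill text, and words removed by hand
# because every legislator uses them.
bill_keywords = {"health", "tax", "immigration", "congress"}
too_generic = {"congress"}


def tokens(text):
    """Lowercase word tokens of a tweet (hashtag symbols are dropped)."""
    return re.findall(r"[a-z']+", text.lower())


def filter_tweets(tweets, keywords):
    """Keep only tweets that mention at least one bill keyword."""
    return [t for t in tweets if keywords & set(tokens(t))]


def top_k_hashtags(tweets, k):
    """Return the k most frequent hashtags (lowercased) in the tweets."""
    counts = Counter(tag.lower() for t in tweets for tag in HASHTAG.findall(t))
    return [tag for tag, _ in counts.most_common(k)]


tweets = [
    "Proud to back the health bill #ACA #health",
    "Great game last night! #sports",
    "Tax reform must help families #TaxReform #ACA",
]
kept = filter_tweets(tweets, bill_keywords - too_generic)
statement_nodes = top_k_hashtags(kept, k=2)
```

The sports tweet is dropped because it contains no bill keyword, so its hashtag never becomes a candidate statement node.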
S102: establishing relationships between the nodes.
The heterogeneous graph model can include four relationship types:
1. R1: co-sponsorship of bills between legislator nodes. The weight is the number of bills the two legislators have co-sponsored over a period of time (for example, the last four years).
2. R2: co-occurrence of statement nodes. This relationship is considered because two keywords or hashtags that appear together are semantically associated. The weight of R2 can be defined as the number of times the two nodes appear in the same tweet.
3. R3: the relationship between legislator nodes and statement nodes, which forms a bipartite graph. The weight is defined as the number of times the legislator mentions the target keyword or hashtag.
4. R4: the legislator's sentiment toward a hashtag. The number of times a legislator mentions a particular hashtag may reflect interest in the topic, while sentiment quantitatively conveys the legislator's position on a subject or event. The weight of R4 is therefore defined as the average sentiment score of the legislator's tweets carrying that hashtag. Here, a Python library named TextBlob can be used for fast sentiment analysis.
The heterogeneous graph built on the keyword-based statement network thus has three relationships, R1, R2 and R3, and the graph built on the hashtag-based statement network has four, R1, R2, R3 and R4. Keywords and hashtags serve as hubs that connect legislators in the heterogeneous graph.
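The count-based weights of R1, R2 and R3 can be computed with simple counters. The sketch below is illustrative only: the sponsor lists and tweets are toy data, the variable names `r1`, `r2`, `r3` are chosen for the example, and the time-window restriction on co-sponsorship mentioned above is omitted for brevity.

```python
from collections import Counter
from itertools import combinations

# Toy inputs (hypothetical): sponsors per bill, and statement nodes found in each tweet.
bill_sponsors = {"B1": ["alice", "bob"], "B2": ["alice", "bob", "carol"]}
tweets = [("alice", ["aca", "health"]), ("alice", ["aca"]), ("bob", ["tax", "aca"])]

r1 = Counter()  # legislator-legislator: number of co-sponsored bills
for sponsors in bill_sponsors.values():
    for pair in combinations(sorted(set(sponsors)), 2):
        r1[pair] += 1

r2 = Counter()  # node-node: number of tweets in which both nodes appear
r3 = Counter()  # legislator-node: number of times the legislator mentions the node
for author, nodes in tweets:
    for pair in combinations(sorted(set(nodes)), 2):
        r2[pair] += 1
    for node in nodes:
        r3[(author, node)] += 1
```

Sorting each pair before counting makes the undirected edges (alice, bob) and (bob, alice) share one key.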
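For the hashtag graph, the R4 weight is an average sentiment score per legislator and hashtag. The sketch below shows that averaging with a pluggable scorer. The word-list scorer `toy_polarity` is a deliberately crude stand-in so the example runs without third-party packages; with TextBlob installed, one could instead pass `lambda t: TextBlob(t).sentiment.polarity`. The function and variable names are assumptions made for the example.

```python
from collections import defaultdict


def toy_polarity(text):
    """Stand-in scorer in [-1, 1]; not a real sentiment model."""
    positive = {"support", "proud", "great"}
    negative = {"oppose", "bad", "harmful"}
    words = text.lower().split()
    score = sum(w in positive for w in words) - sum(w in negative for w in words)
    return max(-1.0, min(1.0, score / 2))


def r4_weights(tagged_tweets, polarity=toy_polarity):
    """Average sentiment of each legislator's tweets per hashtag."""
    scores = defaultdict(list)
    for author, hashtag, text in tagged_tweets:
        scores[(author, hashtag)].append(polarity(text))
    return {key: sum(vals) / len(vals) for key, vals in scores.items()}


tagged = [
    ("alice", "aca", "proud to support this"),
    ("alice", "aca", "great news"),
    ("bob", "aca", "a harmful and bad law"),
]
weights = r4_weights(tagged)
```

Two legislators who mention the same hashtag equally often can still end up with opposite R4 weights, which is the extra signal the hashtag graph adds over the keyword graph.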
S103: initializing the nodes.
Because the nodes in the heterogeneous graph serve different functions, each type is initialized in its own way, as follows.
Legislator: the basic information of the legislator is used to obtain the initial representation. Specifically, the initial representation of a legislator is obtained by concatenating the member ID, home state and political party information according to the following formula:
Figure BDA0003165268110000091
where $X_{ID}(i)$ represents the ID of legislator i, $X_{State}(i)$ represents the state to which legislator i belongs, and $X_{Party}(i)$ represents the political party to which legislator i belongs.
Keyword: the GloVe word vector is used as the initial representation of a keyword.
Hashtag: a hashtag carries more information than a single word. The initial representation of a hashtag node is therefore the average of the GloVe word vectors of a preset number of the most common words in the tweets carrying the hashtag; the number may be, for example, 20, 30, or 50.
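The two initialization rules can be illustrated as follows. This is a sketch under assumptions: the one-hot encoding of ID, state and party is chosen only for illustration (the text specifies concatenation but not how each part is encoded), the two-dimensional vectors stand in for pretrained GloVe embeddings, and the function names are hypothetical.

```python
def one_hot(value, vocabulary):
    """One-hot encode value over an ordered vocabulary."""
    return [1.0 if value == v else 0.0 for v in vocabulary]


def init_legislator(member_id, state, party, ids, states, parties):
    """Concatenate the ID, state, and party encodings into one vector."""
    return one_hot(member_id, ids) + one_hot(state, states) + one_hot(party, parties)


def init_hashtag(frequent_words, word_vectors):
    """Average the vectors of the hashtag's most frequent words.

    Words without a vector are skipped; at least one word must be covered.
    """
    vecs = [word_vectors[w] for w in frequent_words if w in word_vectors]
    if not vecs:
        raise ValueError("no word vector available for this hashtag")
    dim = len(vecs[0])
    return [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]


# Tiny made-up 2-d vectors standing in for pretrained GloVe embeddings.
glove = {"health": [1.0, 0.0], "care": [0.0, 1.0], "tax": [0.5, 0.5]}

x_alice = init_legislator("alice", "NY", "D", ["alice", "bob"], ["CA", "NY"], ["D", "R"])
y_aca = init_hashtag(["health", "care", "reform"], glove)
```

In practice the number of frequent words per hashtag would be the preset value mentioned above (for example 20, 30, or 50) rather than three.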
S104: performing heterogeneous graph convolution on the keyword-based statement network.
A graph convolutional network learns the representation of each node in a graph by passing and aggregating attributes from the node's neighbors. Because the keyword-based statement network contains different types of nodes and edges, it needs its own scheme for passing and aggregating features. The personalized PageRank layer may be represented as follows:
Figure BDA0003165268110000092
Figure BDA0003165268110000093
where $X^{(l)}$ and $Y^{(l)}$ are the node representations at layer l; $\lambda_1$ and $\lambda_2$ are weighting hyperparameters; the matrices shown in Figure BDA0003165268110000094 and Figure BDA0003165268110000095 are the normalized adjacency matrices of the legislator network and the statement network, respectively; the matrices shown in Figure BDA0003165268110000096 and Figure BDA0003165268110000097 are the normalized adjacency matrices of the edges from keywords to legislators and from legislators to keywords, respectively; and $X^{(l+1)}$ and $Y^{(l+1)}$ are the node representations at layer l+1.
The above process attends only to the user and content information, and assumes that the adjacency matrix and the personalization matrix carry the same weight. The representations of the legislators and the keywords are therefore updated further, with different weights given to the two adjacency matrices, as follows:
Figure BDA0003165268110000101
Figure BDA0003165268110000102
where σ denotes an activation function, which may be, for example, the Sigmoid function; $W_1^{(l)}$ and the matrix shown in Figure BDA0003165268110000103 are the weight matrices of the l-th hidden layer.
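The formula images for this step are not reproduced in the text. Purely as an illustration, one update rule consistent with the symbol definitions above (hyperparameters $\lambda_1$, $\lambda_2$ mixing a within-type normalized adjacency with a cross-type one, followed by per-layer weight matrices and the activation σ) could be written as below. The matrix names $\hat{A}_X$, $\hat{A}_Y$, $\hat{A}_{Y \to X}$, $\hat{A}_{X \to Y}$, the assignment of X to legislators and Y to keywords, the name $W_2^{(l)}$, and the placement of the weight matrices are all assumptions, not a transcription of the patent's equations.

```latex
\begin{align}
X^{(l+1)} &= \sigma\!\left( (1-\lambda_1)\,\hat{A}_X\, X^{(l)} W_1^{(l)}
            + \lambda_1\,\hat{A}_{Y \to X}\, Y^{(l)} W_2^{(l)} \right) \\
Y^{(l+1)} &= \sigma\!\left( (1-\lambda_2)\,\hat{A}_Y\, Y^{(l)} W_2^{(l)}
            + \lambda_2\,\hat{A}_{X \to Y}\, X^{(l)} W_1^{(l)} \right)
\end{align}
```

Read this way, each legislator's new representation blends what it aggregates from co-sponsoring legislators with what it aggregates from the keywords it mentions, and $\lambda_1$ sets the balance between the two sources.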
S105: and (3) carrying out word network heterogeneous graph convolution based on the topic labels.
For topic-tag based talk networks, there are two relationships between each "Agents-topic-tag" pair. Generalizing a conventional graph convolution neural network to process different relationships between any pair of nodes, and using different weight matrices and normalization factors for each relationship type, the specific process is as follows:
Figure BDA0003165268110000104
where the set shown in Figure BDA0003165268110000105 is the set of neighbors of node $i$ under relation type $r$; $c_{i,r}$ is a normalization factor, usually set to the value shown in Figure BDA0003165268110000106; $R$ is the set of relation types; $h_i$ and $h_j$ are the hidden states of nodes $i$ and $j$; and $h_i^{(l)}$ is the hidden state of node $i$ at layer $l$.
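For orientation, the commonly cited relational graph convolution layer that matches these symbol definitions is written out below. This is an assumption about the equation shown in the figure, not a transcription of it: the self-connection term with $W_0^{(l)}$ and the choice $c_{i,r} = |N_i^r|$ come from the standard formulation and may differ from the patent's image.

```latex
h_i^{(l+1)} = \sigma\!\left( \sum_{r \in R} \sum_{j \in N_i^{r}} \frac{1}{c_{i,r}} \, W_r^{(l)} h_j^{(l)} + W_0^{(l)} h_i^{(l)} \right),
\qquad c_{i,r} = \lvert N_i^{r} \rvert
```

Here $N_i^{r}$ is the set of neighbors of node $i$ under relation $r$, and $W_r^{(l)}$ is the weight matrix for relation $r$ at layer $l$.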
In this case, the graph convolution operation based on topic tags can be expressed as follows:
Figure BDA0003165268110000107
Figure BDA0003165268110000108
where the symbols shown in Figure BDA0003165268110000111 and Figure BDA0003165268110000112 are weight matrices; those shown in Figure BDA0003165268110000113 and Figure BDA0003165268110000114 are normalization factors; $N_i$ is the set of neighbors of node $i$; the set shown in Figure BDA0003165268110000115 is the set of neighbors of node $i$ under relation type $r$; $x_i$, $x_j$ and $x_k$ are the hidden states of legislator nodes $i$, $j$ and $k$; $y_i$, $y_j$ and $y_k$ are the hidden states of speech nodes $i$, $j$ and $k$; and $l$ denotes the $l$-th layer.
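The per-relation aggregation described above can be sketched as follows. This is an illustration under assumptions, not the patent's exact update: each relation type gets its own weight matrix, messages are averaged over the neighbors of that type (normalization factor equal to the neighbor count), and a ReLU is applied at the end. The function names, node identifiers and numbers are hypothetical.

```python
def matvec(w, v):
    """Multiply matrix w (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in w]

def relational_update(neighbors_by_rel, hidden, weights):
    """Aggregate neighbor states per relation type.

    neighbors_by_rel: {relation: [neighbor ids]}
    hidden:           {node id: hidden-state vector}
    weights:          {relation: weight matrix}
    """
    dim = len(next(iter(weights.values())))
    total = [0.0] * dim
    for rel, neighbors in neighbors_by_rel.items():
        if not neighbors:
            continue
        c = len(neighbors)  # assumed normalization factor: number of type-r neighbors
        for j in neighbors:
            msg = matvec(weights[rel], hidden[j])
            total = [t + m / c for t, m in zip(total, msg)]
    return [max(0.0, t) for t in total]  # ReLU chosen for the sketch

# Toy example for one legislator node with R1, R3 and R4 neighbors.
hidden = {"L2": [1.0, 0.0], "#tagA": [0.0, 2.0], "#tagB": [2.0, 0.0]}
neighbors = {"R1": ["L2"], "R3": ["#tagA", "#tagB"], "R4": ["#tagA"]}
weights = {
    "R1": [[1.0, 0.0], [0.0, 1.0]],
    "R3": [[1.0, 0.0], [0.0, 1.0]],
    "R4": [[-1.0, 0.0], [0.0, -1.0]],  # e.g. a sentiment relation with its own weights
}
new_state = relational_update(neighbors, hidden, weights)
```

Because R3 and R4 both connect the same pair of nodes, keeping a separate matrix per relation is what allows the mention count and the sentiment signal to influence the legislator's representation differently.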
S106: Initialize the text information of issues using a long short-term memory network.
For an issue, the title, description, abstract and similar fields are directly available text information; a long short-term memory network (LSTM) is therefore used to encode this text into an initial representation:
$X_{lgn}(i) = \mathrm{LSTM}(t_i)$
where $t_i$ denotes the text information of issue $i$, and LSTM denotes the long short-term memory network.
S107: Update the representations of the legislator and speech nodes using the heterogeneous graph convolutional neural network, perform joint training with a triplet loss function to learn the node representations in the heterogeneous graph and the issue representations, and measure each legislator's voting preference for an issue by the distance between the two, so as to predict the legislator's voting tendency on the issue.
After the initial representations are obtained, the initial representation of the legislator, $X_{kgt}(i)$, and the representations of the keywords or topic tags are first updated by the heterogeneous graph convolutional neural network (HGCN). Then, starting from the initial representations of the legislators, keywords, topic tags and issues, the representations of legislators and issues are jointly learned through a triplet loss function. Specifically, a batch of triplets is sampled, each consisting of a target issue a and a pair of legislators p and n whose votes satisfy vote(n, a) < vote(p, a), with votes ordered as NO < NOT VOTE < YES. The triplet loss shortens the distance between the target issue a and the positive sample p and pushes the negative sample n away from a, so that n ends up farther from a than p is by at least the margin. The loss function can thus be expressed as:
L=max(d(a,p)-d(a,n)+margin,0)
A legislator's voting preference for an issue can therefore be measured by the distance between them. Legislators may be ranked by their distance to the issue, and their choices predicted according to the proportion of each vote type cast on that issue. In the method of the present application, the voting rate is treated as a given input.
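The triplet loss L = max(d(a, p) - d(a, n) + margin, 0) and the sampling constraint can be sketched as follows. The text does not specify the distance d or the margin value, so Euclidean distance and a margin of 1.0 are assumptions here; the function names and toy embeddings are hypothetical.

```python
import math
import random

# Ordering of votes used to pick positive and negative samples.
VOTE_RANK = {"NO": 0, "NOT VOTE": 1, "YES": 2}

def distance(u, v):
    """Euclidean distance (an assumed choice for d)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """L = max(d(a, p) - d(a, n) + margin, 0)."""
    return max(distance(anchor, positive) - distance(anchor, negative) + margin, 0.0)

def sample_triplet(votes, rng):
    """Pick (p, n) for one issue such that vote(n) ranks below vote(p).

    votes: {legislator id: vote string} for the target issue.
    """
    pairs = [(p, n) for p in votes for n in votes
             if VOTE_RANK[votes[n]] < VOTE_RANK[votes[p]]]
    return rng.choice(pairs)

# Toy example: one issue embedding and three legislator embeddings.
issue = [0.0, 0.0]
emb = {"A": [1.0, 0.0], "B": [3.0, 4.0], "C": [0.0, 2.0]}
votes = {"A": "YES", "B": "NO", "C": "NOT VOTE"}

satisfied = triplet_loss(issue, emb["A"], emb["B"])  # A is already much closer than B
violated = triplet_loss(issue, emb["B"], emb["A"])   # roles swapped, so the loss is positive
p, n = sample_triplet(votes, random.Random(0))
```

A zero loss means the triplet already respects the margin and contributes no gradient; only violated triplets move the embeddings during training.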
After the initial representations of the legislators and the speech nodes are obtained by concatenating the legislators' basic information and by averaging word vectors, the heterogeneous graph convolutional neural network (HGCN) propagates and aggregates neighbor information to produce updated representations of the legislator and speech nodes. Training with the triplet loss then projects legislators and issues into the same vector space. Specifically, at each training iteration a batch of triplets (a, p, n) is sampled, where a is an issue and p and n are legislators who voted on it, such that the vote of n on a ranks below the vote of p on a, i.e. vote(n, a) < vote(p, a), with votes ordered as NO < NOT VOTE < YES. The representations (for an issue, the vector output by the LSTM; for a legislator, the vector output by the HGCN) are fed into the triplet loss function L given above, and the parameters of the neural networks are updated by backpropagation, which in turn updates the representations of legislators and issues. The aim is to bring the representation of an issue closer to those of the legislators who support it and farther from those of the legislators who oppose it. Distance thus measures a legislator's preference for an issue, and at prediction time legislators can be ranked by distance and their choices predicted according to the voting rate.
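The ranking-based prediction described above can be sketched as follows. The text states only that legislators are ranked by distance and that the voting rate is a given input; assigning YES to the closest legislators, NOT VOTE to the next group and NO to the rest is an assumption that follows the ordering NO < NOT VOTE < YES. The function name, the rounding rule and the sample distances are hypothetical.

```python
def predict_votes(distances, yes_rate, not_vote_rate):
    """Assign votes by rank of distance to the issue.

    distances:     {legislator id: distance to the issue embedding}
    yes_rate:      given proportion of YES votes on this issue
    not_vote_rate: given proportion of NOT VOTE on this issue
    """
    ranked = sorted(distances, key=distances.get)  # closest legislator first
    n_yes = round(yes_rate * len(ranked))
    n_not = round(not_vote_rate * len(ranked))
    predictions = {}
    for idx, legislator in enumerate(ranked):
        if idx < n_yes:
            predictions[legislator] = "YES"
        elif idx < n_yes + n_not:
            predictions[legislator] = "NOT VOTE"
        else:
            predictions[legislator] = "NO"
    return predictions

# Toy example: four legislators, half voting YES and a quarter not voting.
dists = {"A": 0.5, "B": 2.0, "C": 1.0, "D": 3.0}
preds = predict_votes(dists, yes_rate=0.5, not_vote_rate=0.25)
```

Because the voting rate fixes how many legislators fall into each group, the learned distances only need to order legislators correctly; they do not need a calibrated threshold.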
Unlike the prior art, the present method constructs a legislator speech network based on the keywords and topic tags in legislators' statements and combines it with the existing legislator co-sponsorship network, so that a bipartite graph is established between legislators and words. That is, the speech network and the original co-sponsorship network can be regarded as two subgraphs, and relations R3 and R4 are defined to link the two types of nodes together, thereby forming the bipartite graph. The present application then uses a heterogeneous graph convolutional neural network (HGCN) to update the representations of both legislators and words. After the issues are encoded with a long short-term memory network, the triplet loss function is applied to jointly train the legislator and issue representations.
The application has the following main beneficial effects:
First, the scheme incorporates legislators' historical statements when representing legislators and defining the relations between them, which improves the accuracy of roll call vote prediction.
Second, the method constructs a heterogeneous graph from the co-sponsorship relations and the speech similarity of legislators, and provides a heterogeneous graph convolution model to learn legislator representations effectively.
Third, further analysis shows that the speech network (the network of speech nodes and legislator nodes, i.e. a heterogeneous graph that contains legislators' speech information) provides a more specific representation of each legislator and alleviates the cold start problem to some extent.
The application also discloses a device for predicting the voting tendency of a legislator, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, performs the steps of the method of predicting the voting tendency of a legislator described in any of the above.
All articles and references disclosed, including patent applications and publications, are hereby incorporated by reference for all purposes. The term "consisting essentially of …" describing a combination shall include the identified element, ingredient, component or step as well as other elements, ingredients, components or steps that do not materially affect the basic novel characteristics of the combination. The use of the terms "comprising" or "including" to describe combinations of elements, components, or steps herein also contemplates embodiments that consist essentially of such elements, components, or steps. By using the term "may" herein, it is intended to indicate that any of the described attributes that "may" include are optional. A plurality of elements, components, parts or steps can be provided by a single integrated element, component, part or step. Alternatively, a single integrated element, component, part or step may be divided into separate plural elements, components, parts or steps. The disclosure of "a" or "an" to describe an element, ingredient, component or step is not intended to foreclose other elements, ingredients, components or steps.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims (10)

1. A method of predicting the voting tendency of a legislator, the method comprising:
establishing legislator nodes according to basic information of legislators, and establishing speech nodes according to semantic information of the statements published by the legislators;
establishing relationships between the nodes;
acquiring initial representations of the nodes;
performing heterogeneous graph convolution on the keyword-based speech network;
performing heterogeneous graph convolution on the topic-tag-based speech network;
initializing the text information of issues using a long short-term memory network;
and updating the representations of the legislator and speech nodes using a heterogeneous graph convolutional neural network, performing joint training with a triplet loss function to learn the node representations in the heterogeneous graph and the issue representations, and measuring each legislator's voting preference for an issue by the distance between the legislator and the issue, so as to predict the legislator's voting tendency on the issue.
2. The method of predicting the voting tendency of a legislator according to claim 1, wherein the basic information of a legislator includes a member ID, home state and political party; the statements published by the legislator comprise the text of the legislator's posts on Twitter; and the speech nodes comprise at least one of the keywords and the topic tags in the statement text.
3. The method of predicting the voting tendency of a legislator according to claim 2, wherein a heterogeneous graph model is constructed, comprising the nodes of the heterogeneous graph, the establishment of the relationships between the nodes, and the initialization of the nodes; the heterogeneous graph of the keyword-based speech network contains three relations, R1, R2 and R3, and the heterogeneous graph of the topic-tag-based speech network contains four relations, R1, R2, R3 and R4; wherein R1 represents the co-sponsorship relation between legislator nodes, weighted by the number of issues co-sponsored by two legislators within a preset time; R2 represents the co-occurrence relation between speech nodes, weighted by the number of co-occurrences of two keywords or topic tags; R3 represents the relation between a legislator node and a speech node, weighted by the number of times the legislator mentions the keyword or topic tag; and R4 represents the sentiment of the legislator's tweets under a given topic tag.
4. The method of predicting the voting tendency of a legislator according to claim 2, wherein the step of acquiring the initial representations of the nodes comprises: encoding the basic information of each legislator, and concatenating the legislator's ID, state and political party information to obtain the initial representation of the legislator by the following formula:
Figure FDA0003165268100000021
wherein $X_{ID}(i)$ represents the ID of legislator $i$, $X_{State}(i)$ represents the state to which legislator $i$ belongs, and $X_{Party}(i)$ represents the political party to which legislator $i$ belongs;
for the initial representation of a keyword, the GloVe word vector is used;
for a topic tag node, the average of the GloVe word vectors of a preset number of high-frequency words in the tweets carrying that topic tag is used.
5. The method of predicting the voting tendency of a legislator according to claim 2, wherein in the step of heterogeneous graph convolution on the keyword-based speech network, the representations of the legislators and the keywords are updated using the following formulas:
Figure FDA0003165268100000022
Figure FDA0003165268100000023
wherein $\sigma$ denotes the activation function, a Sigmoid function; $W_1^{(l)}$ and $W_2^{(l)}$ are the weight matrices of the $l$-th hidden layer; $X^{(l)}$ and $Y^{(l)}$ are the node representations at layer $l$; $\lambda_1$ and $\lambda_2$ are weighting hyperparameters; the matrices shown in Figure FDA0003165268100000031 and Figure FDA0003165268100000032 are the normalized adjacency matrices of the legislator network and the speech network, respectively; the matrices shown in Figure FDA0003165268100000033 and Figure FDA0003165268100000034 are the normalized adjacency matrices for edges from keywords to legislators and from legislators to keywords, respectively; and $X^{(l+1)}$ and $Y^{(l+1)}$ are the node representations at layer $l+1$.
6. The method of predicting the voting tendency of a legislator according to claim 5, wherein in the step of heterogeneous graph convolution on the topic-tag-based speech network, a conventional graph convolutional neural network is generalized to handle different relations between any pair of nodes, with a separate weight matrix and normalization factor for each relation type, as follows:
Figure FDA0003165268100000035
wherein the set shown in Figure FDA0003165268100000036 is the set of neighbors of node $i$ under relation type $r$; $c_{i,r}$ is a normalization factor, usually set to the value shown in Figure FDA0003165268100000037; $R$ is the set of relation types; $h_i$ and $h_j$ are the hidden states of nodes $i$ and $j$; and $h_i^{(l)}$ is the hidden state of node $i$ at layer $l$.
7. The method of predicting the voting tendency of a legislator according to claim 6, wherein the graph convolution operation based on topic tags is expressed as follows:
Figure FDA0003165268100000041
Figure FDA0003165268100000042
wherein the symbols shown in Figure FDA0003165268100000043 and Figure FDA0003165268100000044 are weight matrices; those shown in Figure FDA0003165268100000045 and Figure FDA0003165268100000046 are normalization factors; $N_i$ is the set of neighbors of node $i$; the set shown in Figure FDA0003165268100000047 is the set of neighbors of node $i$ under relation type $r$; $x_i$, $x_j$ and $x_k$ are the hidden states of legislator nodes $i$, $j$ and $k$; $y_i$, $y_j$ and $y_k$ are the hidden states of speech nodes $i$, $j$ and $k$; and $l$ denotes the $l$-th layer.
8. The method of predicting the voting tendency of a legislator according to claim 7, wherein in the step of initializing the text information of issues using a long short-term memory network, the title, description and abstract of an issue are the directly available text information, and the text information of the issue is encoded using the long short-term memory network to obtain its initial representation:
$X_{lgn}(i) = \mathrm{LSTM}(t_i)$
wherein $t_i$ denotes the text information of issue $i$, and LSTM denotes the long short-term memory network.
9. The method of predicting the voting tendency of a legislator according to claim 8, wherein the step of updating the representations of the legislator and speech nodes using the heterogeneous graph convolutional neural network, performing joint training with a triplet loss function to learn the node representations in the heterogeneous graph and the issue representations, and measuring each legislator's voting preference for an issue by the distance between the legislator and the issue so as to predict the legislator's voting tendency on the issue specifically comprises:
after the initial representations are obtained, first updating the representations of the legislators and of the keywords or topic tags through the heterogeneous graph convolutional neural network; then jointly learning the representations of legislators and issues with a triplet loss function, specifically by sampling a batch of triplets, each consisting of a target issue a and a pair of legislators p and n whose votes satisfy vote(n, a) < vote(p, a), with votes ordered as NO < NOT VOTE < YES, the loss function being expressed as:
L=max(d(a,p)-d(a,n)+margin,0);
ranking the legislators by their distance to the issue, and predicting their choices according to the proportion of each vote type cast on the issue.
10. A device for predicting the voting tendency of a legislator, comprising a memory and a processor, the memory storing a computer program that, when executed by the processor, performs the steps of the method of predicting the voting tendency of a legislator according to any one of claims 1 to 9.
CN202110803621.3A 2021-07-15 2021-07-15 Method and device for predicting voting tendency of agenda Pending CN113537593A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110803621.3A CN113537593A (en) 2021-07-15 2021-07-15 Method and device for predicting voting tendency of agenda

Publications (1)

Publication Number Publication Date
CN113537593A true CN113537593A (en) 2021-10-22

Family

ID=78128267

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110803621.3A Pending CN113537593A (en) 2021-07-15 2021-07-15 Method and device for predicting voting tendency of agenda

Country Status (1)

Country Link
CN (1) CN113537593A (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH0757015A (en) * 1993-08-13 1995-03-03 Center For Polytical Pub Relations:The Electronic voting system
US20110136559A1 (en) * 2009-12-09 2011-06-09 Glyn Mason Ottofy Political Persuasion Rating System, Politico: Liberal, Conservative, Evil Ranking
CN102308316A (en) * 2009-02-04 2012-01-04 有限公司呢哦派豆 System and method for collecting intention automatically
CN106796708A (en) * 2014-01-21 2017-05-31 申哲雨 Electronic voting system and method
CN109964446A (en) * 2018-06-08 2019-07-02 北京大学深圳研究生院 A kind of common recognition method based on ballot
CN110263164A (en) * 2019-06-13 2019-09-20 南京邮电大学 A kind of Sentiment orientation analysis method based on Model Fusion
CN111325326A (en) * 2020-02-21 2020-06-23 北京工业大学 Link prediction method based on heterogeneous network representation learning
CN112148875A (en) * 2020-08-03 2020-12-29 杭州中科睿鉴科技有限公司 Dispute detection method based on graph convolution neural network integration content and structure information
CN112398949A (en) * 2020-11-26 2021-02-23 卓尔智联(武汉)研究院有限公司 Transaction confirmation method, system, device and computer equipment
WO2021042543A1 (en) * 2019-09-04 2021-03-11 平安科技(深圳)有限公司 Multi-round dialogue semantic analysis method and system based on long short-term memory network
CN112862092A (en) * 2021-01-26 2021-05-28 中山大学 Training method, device, equipment and medium for heterogeneous graph convolution network

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YUQIAO YANG ET AL.: "Joint Representation Learning of Legislator and Legislation for Roll Call Prediction", TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-20), pages 1424 - 1429 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114638195A (en) * 2022-01-21 2022-06-17 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Multi-task learning-based position detection method
CN114638195B (en) * 2022-01-21 2022-11-18 哈尔滨工业大学(深圳)(哈尔滨工业大学深圳科技创新研究院) Multi-task learning-based ground detection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination