CN115098634A - Semantic dependency relationship fusion feature-based public opinion text sentiment analysis method - Google Patents

Semantic dependency relationship fusion feature-based public opinion text sentiment analysis method

Info

Publication number
CN115098634A
Authority
CN
China
Prior art keywords
word
text
feature
vector
analyzed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210744752.3A
Other languages
Chinese (zh)
Other versions
CN115098634B (en)
Inventor
李雨佟
周尚波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University
Priority to CN202210744752.3A
Publication of CN115098634A
Application granted
Publication of CN115098634B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a public opinion text sentiment analysis method based on semantic dependency relationship fusion features. The method performs feature coding on the public opinion text at both character granularity and feature word granularity, so as to extract finer-grained sentiment expression information from the text, and uses dependency syntactic analysis to extract dependency relationship information among the feature words, so as to mine and reflect the deeper correlations among fine-grained information in the text. This information is then fused into a dependency relationship fusion feature vector of the public opinion text, on which a public opinion text sentiment analysis model performs sentiment classification prediction. The sentiment tendency features contained in the public opinion text can thus be decomposed and conveyed in more dimensions, in more detail and at greater depth, which yields more accurate sentiment classification predictions and improves the accuracy of public opinion text sentiment analysis.

Description

Semantic dependency relationship fusion feature-based public opinion text emotion analysis method
Technical Field
The invention relates to the technical field of text information big data processing, in particular to a public sentiment text emotion analysis method based on semantic dependency relationship fusion characteristics.
Background
With the advent of the "Internet Plus" era, the volume of electronic text generated on the internet grows massively every day. How to effectively manage, mine and identify the information features of these texts has become a focus of attention. In fields such as forum information, product market research and financial markets in particular, the public opinion sentiment expressed in text data retrieved through big data reflects the state and the trend of the corresponding field, and so influences how much attention users, field participants and supervisors pay to field events, product conditions and financial enterprises. Electronic texts that carry such sentiment information are referred to as public opinion texts for short, and analyzing their sentiment tendency is an important technical means of addressing the above problems.
Traditional public opinion sentiment analysis methods are mainly based on sentiment dictionaries and machine learning algorithms. A sentiment dictionary determines the sentiment polarity of a text from the number and proportion of positive and negative sentiment words it contains; the machine learning methods mainly include support vector machines, naive Bayes and the k-nearest neighbors algorithm. However, traditional machine learning algorithms require manual feature extraction, which consumes a large amount of manpower and material resources, and the manually extracted features are to some extent subjective. They cannot fully integrate the semantic information and the multi-scale information of a text, which greatly affects the accuracy of public opinion analysis.
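As a minimal sketch of the dictionary-based approach described above, the snippet below decides the polarity of a tokenized text from the counts of positive and negative words. The tiny lexicon and the function name `lexicon_polarity` are illustrative assumptions and do not come from the patent.

```python
# Hypothetical toy lexicon; a real sentiment dictionary would be far larger.
POSITIVE = {"good", "excellent", "reliable"}
NEGATIVE = {"bad", "poor", "unreliable"}

def lexicon_polarity(tokens):
    """Classify a token list by comparing positive and negative word counts."""
    pos = sum(1 for t in tokens if t in POSITIVE)
    neg = sum(1 for t in tokens if t in NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

Because the decision uses only word counts, negation and word order are ignored, which is one reason such methods struggle with fine-grained sentiment.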
In recent years, with the rise of deep learning, artificial neural networks have become able to extract text features automatically, and deep-learning-based methods have also been applied to public opinion text analysis. Common choices are convolutional neural networks (CNN), long short-term memory networks (LSTM, a type of recurrent neural network, RNN) and Transformer networks. LSTM, a well-known mainstream method for text analysis, alleviates the vanishing-gradient problem of neural networks by adding a gating mechanism, so that the network can attend to the context of a sentence and effectively extract information from the whole text; however, it extracts local key information insufficiently and still suffers from gradient explosion.
With the popularity of the fine-tuning paradigm based on pre-trained models, large pre-trained models such as BERT and GloVe have also been applied to public opinion analysis. Both the Transformer network and BERT are based on the multi-head attention mechanism and have been very successful at extracting features from Euclidean-space data; compared with LSTM, they attend better to the context of a sentence in both directions and learn the semantic information of the words in a sentence better. However, much public opinion text data is generated from non-Euclidean space, and a core assumption of conventional deep learning methods is that data samples are independent of one another, so these methods cannot handle cases in which the data samples are correlated.
At present, public opinion analysis mainly follows a pipeline in which an entity extraction task is performed first and a sentiment analysis task second. The sentiment analysis task mainly classifies coarse-grained, sentence-level sentiment. The sentiment tendency of finer-grained information in the text is not analyzed sufficiently, and the guiding effect that deeper correlations among fine-grained information have on sentiment polarity is not mined in depth, so the accuracy of public opinion text sentiment analysis remains insufficient.
In summary, how to add linguistic prior knowledge to public opinion text analysis and strengthen the correlation among multiple tasks, so as to improve the accuracy of public opinion text analysis, is a problem that those skilled in the art need to solve.
Disclosure of Invention
In view of the above disadvantages of the prior art, an object of the present invention is to provide a public opinion text sentiment analysis method based on semantic dependency relationship fusion features, so as to analyze and predict the sentiment tendency of public opinion texts more accurately.
In order to solve the technical problems, the invention adopts the following technical scheme:
A public opinion text sentiment analysis method based on semantic dependency relationship fusion features comprises the following steps:
S1: acquiring a public opinion text to be analyzed;
S2: performing word-granularity word vector coding on the public opinion text to be analyzed, to obtain a word-granularity coding vector of the public opinion text to be analyzed;
S3: performing word segmentation and dependency syntactic analysis on the public opinion text to be analyzed, to obtain the segmented feature words and the dependency relationship information among the feature words, and then performing word embedding joint coding, to obtain a word embedding joint coding vector that carries the feature words of the public opinion text to be analyzed together with their dependency relationship information;
S4: splicing and fusing the word-granularity coding vector and the word embedding joint coding vector of the public opinion text to be analyzed, and taking the result as the dependency relationship fusion feature vector of the public opinion text to be analyzed;
S5: inputting the dependency relationship fusion feature vector of the public opinion text to be analyzed into a pre-trained public opinion text sentiment analysis model, to obtain an emotion classification prediction result for the public opinion text to be analyzed.
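The data flow of steps S1 to S5 can be sketched as a plain function composition. This is only an illustration of the order of operations: the function name `analyze` and the four callables passed to it are hypothetical stand-ins for the encoders, the fusion step and the trained model, none of which is specified at this level in the text.

```python
def analyze(text, encode_words, encode_dependencies, fuse, model):
    """Run steps S2 to S5 on a text already acquired in step S1."""
    h_bert = encode_words(text)           # S2: word-granularity coding vector
    h_graph = encode_dependencies(text)   # S3: word embedding joint coding vector
    fused = fuse(h_bert, h_graph)         # S4: dependency relationship fusion feature vector
    return model(fused)                   # S5: emotion classification prediction
```

Passing the stages in as arguments keeps the sketch independent of any particular encoder or classifier.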
In the above public opinion text sentiment analysis method based on semantic dependency relationship fusion features, as a preferred scheme, step S1 further includes preprocessing the public opinion text to be analyzed, where the preprocessing includes one or more of: correcting wrongly written characters, correcting wrong symbols, correcting grammatical errors, and unifying the expression of synonyms.
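Two of the preprocessing operations named above, symbol correction and synonym unification, can be sketched with the Python standard library. The synonym table and the function name `preprocess` are assumptions made for illustration; the patent does not say how these corrections are implemented, and typo or grammar correction is not attempted here.

```python
import unicodedata

# Hypothetical synonym table mapping variant expressions to one canonical form.
SYNONYMS = {"cellphone": "mobile phone", "handset": "mobile phone"}

def preprocess(text):
    """Normalize symbols, unify synonyms and collapse repeated whitespace."""
    # NFKC maps full-width punctuation and letters to their standard forms.
    text = unicodedata.normalize("NFKC", text)
    for variant, canonical in SYNONYMS.items():
        text = text.replace(variant, canonical)
    return " ".join(text.split())
```

For example, `preprocess("The  handset is great！")` yields a string with a single space between words, the canonical synonym, and an ASCII exclamation mark.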
In the above public opinion text sentiment analysis method based on semantic dependency relationship fusion features, as a preferred scheme, step S2 specifically includes: decomposing the public opinion text to be analyzed at word granularity, and encoding each word obtained by the decomposition as a word vector with a BERT model, to obtain the word-granularity coding vector of the public opinion text to be analyzed:
[Formula image BDA0003716580250000021]
In the formula, [symbol image BDA0003716580250000022] denotes the word-granularity coding vector H_BERT of the public opinion text to be analyzed; [symbol image BDA0003716580250000023] denotes the coding vector of the words of the text in the m-th hidden-layer dimension, m ∈ {1, 2, …, M}; M denotes the coding dimension used when encoding word vectors with the BERT model; and n denotes the number of words contained in the public opinion text to be analyzed.
In the above public opinion text sentiment analysis method based on semantic dependency relationship fusion features, as a preferred scheme, step S3 specifically includes the following steps:
S301: performing word segmentation and dependency syntactic analysis on the public opinion text to be analyzed, to obtain the segmented feature words and the dependency relationship information among the feature words;
S302: determining the part of speech of each feature word in the public opinion text to be analyzed; taking the coding dimension M of the word-granularity coding vector of the public opinion text to be analyzed as the coding dimension of the word embedding coding; and performing word embedding coding on the part-of-speech information of each feature word and on its corresponding dependency relationship information, to obtain the feature word coding vector and the dependency coding vector of the public opinion text to be analyzed:
[Formula image BDA0003716580250000031]
In the formula, W_p and W_d ([symbol image BDA0003716580250000032]) are respectively the feature word coding vector and the dependency coding vector of the public opinion text to be analyzed; [symbol image BDA0003716580250000033] is the word embedding coding vector, in the m-th coding dimension, of the part-of-speech information of each feature word; [symbol image BDA0003716580250000034] is the word embedding coding vector, in the m-th coding dimension, of the dependency relationship corresponding to each feature word; m ∈ {1, 2, …, M}; M denotes the coding dimension of the word embedding coding; and d_len denotes the total number of feature words obtained by segmenting the public opinion text to be analyzed;
S303: combining the feature word coding vector W_p and the dependency coding vector W_d of the public opinion text to be analyzed into a combined matrix W_pd, and inputting W_pd into a preset relational graph attention coding network for coding, to obtain the corresponding joint coding matrix:
[Formula image BDA0003716580250000035]
In the formula, [symbol image BDA0003716580250000036] denotes the resulting joint coding matrix, and [symbol image BDA0003716580250000037] denotes the joint coding vector, in the m-th coding dimension, of the part of speech of each feature word in the public opinion text to be analyzed and its corresponding dependency relationship;
the relational graph attention coding network comprises a multi-head attention layer, a linear layer and a point-wise convolution layer connected in sequence;
S304: pooling the joint coding matrix [symbol image BDA0003716580250000038] to obtain the word embedding joint coding vector H_graph of the public opinion text to be analyzed:
H_graph = {h_{g,1}, h_{g,2}, …, h_{g,m}, …, h_{g,M}};
where each component h_{g,m} ([symbol image BDA0003716580250000039]) of the word embedding joint coding vector is the pooled value of [symbol image BDA00037165802500000310], the joint coding vector, in the m-th coding dimension, of the part of speech of each feature word in the public opinion text to be analyzed and its corresponding dependency relationship.
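The shapes involved in steps S302 to S304 can be illustrated with a small pure-Python sketch. Several things here are assumptions rather than the patent's method: the embeddings are random lookup tables instead of trained ones, the tag sets are hypothetical, W_p and W_d are combined by element-wise addition, the relational graph attention coding network of S303 is skipped entirely, and mean pooling is used because the text does not state which pooling is applied.

```python
import random

M = 4  # coding dimension shared with the word-granularity encoder (kept small here)

def make_embedding(labels, dim, seed):
    """Random lookup table label -> vector, standing in for a trained embedding."""
    rng = random.Random(seed)
    return {label: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for label in labels}

# Hypothetical part-of-speech and dependency label sets.
POS_EMB = make_embedding(["n", "v", "a"], M, seed=0)
DEP_EMB = make_embedding(["SBV", "HED", "VOB"], M, seed=1)

def combine(pos_tags, dep_labels):
    """Build a d_len x M matrix W_pd from W_p and W_d (element-wise sum, an assumption)."""
    w_p = [POS_EMB[p] for p in pos_tags]
    w_d = [DEP_EMB[d] for d in dep_labels]
    return [[a + b for a, b in zip(row_p, row_d)] for row_p, row_d in zip(w_p, w_d)]

def mean_pool(matrix):
    """Pool a d_len x M matrix over the feature-word axis into a length-M vector."""
    d_len = len(matrix)
    return [sum(row[m] for row in matrix) / d_len for m in range(len(matrix[0]))]

w_pd = combine(["n", "v", "n"], ["SBV", "HED", "VOB"])  # d_len = 3 feature words
h_graph = mean_pool(w_pd)                                # one value per coding dimension
```

Whatever the number of feature words d_len, pooling leaves a vector of length M, which is what allows it to be spliced onto the word-granularity coding in step S4.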
In the above public opinion text sentiment analysis method based on semantic dependency relationship fusion features, as a preferred scheme, in step S4 the dependency relationship fusion feature vector obtained by splicing and fusing the word-granularity coding vector and the word embedding joint coding vector of the public opinion text to be analyzed is:
[Formula image BDA00037165802500000311]
In the formula, [symbol image BDA0003716580250000041] is the dependency relationship fusion feature vector of the public opinion text to be analyzed; [symbol image BDA0003716580250000042] denotes the word-granularity coding vector of the public opinion text to be analyzed; M denotes the coding dimension of the word-granularity coding vector H_BERT; n denotes the number of words contained in the public opinion text to be analyzed; the symbol [symbol image BDA0003716580250000047] denotes the element-wise product; and g denotes the splicing fusion function of the word-granularity coding vector and the word embedding joint coding vector of the public opinion text to be analyzed, given by:
g = σ(W_g [H_BERT : H_graph] + b_g);
where σ(·) denotes the sigmoid activation function; W_g and b_g denote the weight matrix and the bias vector of the sigmoid activation, respectively; and [symbol image BDA0003716580250000043] denotes the concatenation vector, of dimension [symbol image BDA0003716580250000046], formed by splicing the word embedding joint coding vector [symbol image BDA0003716580250000044] onto each of the n word dimensions of the word-granularity coding vector [symbol image BDA0003716580250000045].
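The gate g = σ(W_g [H_BERT : H_graph] + b_g) can be sketched in pure Python as follows. The gate computation follows the formula in the text; the way the gate is then applied, g ⊙ H_BERT + (1 − g) ⊙ H_graph, is a common gated-fusion form and is an assumption here, since the fusion formula itself survives only as an image. The function names and the M × 2M shape chosen for W_g are likewise illustrative.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_fusion(h_bert, h_graph, w_g, b_g):
    """Fuse an n x M matrix h_bert with a length-M vector h_graph.

    w_g is an M x 2M weight matrix and b_g a length-M bias vector.
    """
    fused = []
    for row in h_bert:
        concat = row + h_graph  # [H_BERT : H_graph] for one word, length 2M
        gate = [sigmoid(sum(w * x for w, x in zip(w_row, concat)) + b)
                for w_row, b in zip(w_g, b_g)]
        # Assumed mixing rule: the gate weighs the two sources element by element.
        fused.append([g * hb + (1.0 - g) * hg
                      for g, hb, hg in zip(gate, row, h_graph)])
    return fused
```

With all-zero weights and biases every gate value is 0.5, so the fused row is simply the average of the two inputs; trained weights would shift that balance per dimension.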
In the above public opinion text sentiment analysis method based on semantic dependency relationship fusion features, as a preferred scheme, the public opinion text sentiment analysis model in step S5 includes an entity feature extraction network layer, an aspect feature extraction network layer, an emotional tendency feature extraction network layer, a dependency feature matrix fusion network layer, a fully connected layer and a classifier network layer;
the entity feature extraction network layer is used for extracting an entity feature vector X_E from the input dependency relationship fusion feature vector, outputting it to the dependency feature matrix fusion network layer, and also outputting it to the classifier network layer through the fully connected layer;
the aspect feature extraction network layer is used for extracting an aspect feature vector X_A from the input dependency relationship fusion feature vector, outputting it to the dependency feature matrix fusion network layer, and also outputting it to the classifier network layer through the fully connected layer;
the emotional tendency feature extraction network layer is used for extracting an emotional tendency feature vector X_SC from the input dependency relationship fusion feature vector and outputting it to the dependency feature matrix fusion network layer;
the dependency feature matrix fusion network layer is used for extracting, based on the entity feature vector X_E and the aspect feature vector X_A, the relation feature matrix X_A2E of aspect features and entity features, and then fusing the relation feature matrix X_A2E with the emotional tendency feature vector X_SC to obtain the emotion and dependency relationship fusion feature matrix X_S, which is output to the classifier network layer through the fully connected layer;
the classifier network layer is used for taking the entity feature vector X_E, the aspect feature vector X_A and the emotion and dependency relationship fusion feature matrix X_S of the public opinion text to be analyzed as classification features, performing emotion classification on the public opinion text to be analyzed, and outputting its emotion classification prediction result.
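The wiring between the layers described above can be written out as a short sketch. Each layer is passed in as a callable, so the code shows only which output feeds which input; the function name `sentiment_model` and the use of a single shared `dense` callable for the fully connected layer are assumptions for illustration.

```python
def sentiment_model(fused, entity_net, aspect_net, tendency_net,
                    fusion_net, dense, classifier):
    """Route the fusion feature vector through the layers of the model."""
    x_e = entity_net(fused)            # entity feature vector X_E
    x_a = aspect_net(fused)            # aspect feature vector X_A
    x_sc = tendency_net(fused)         # emotional tendency feature vector X_SC
    x_s = fusion_net(x_e, x_a, x_sc)   # fusion layer builds X_A2E, then X_S
    # X_E, X_A and X_S each reach the classifier through the fully connected layer.
    return classifier(dense(x_e), dense(x_a), dense(x_s))
```

Note that X_SC reaches the classifier only indirectly, through the fusion feature matrix X_S.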
In the above public opinion text sentiment analysis method based on semantic dependency relationship fusion features, as a preferred scheme, the entity feature extraction network layer is a one-dimensional convolutional network trained in advance on entity word features by means of an entity dictionary;
the aspect feature extraction network layer is a one-dimensional convolutional network trained in advance on aspect features by means of the aspect feature annotations of the entity words in the entity dictionary;
the emotional tendency feature extraction network layer is a long short-term memory (LSTM) network trained in advance on emotional tendency categories by means of the emotional tendency annotations of the entity words in the entity dictionary;
the classifier network layer is a classifier model trained in advance for text emotion classification prediction.
In the above public opinion text sentiment analysis method based on semantic dependency relationship fusion features, as a preferred scheme, the relation feature matrix X_A2E of aspect features and entity features extracted by the dependency feature matrix fusion network layer is:
[Formula image BDA0003716580250000051]
where [symbol image BDA0003716580250000052] is the relation feature, in the relation feature matrix X_A2E, between the aspect feature and the entity feature corresponding to the i-th feature word, i ∈ {1, 2, …, d_len}; d_len denotes the total number of feature words obtained by segmenting the public opinion text to be analyzed; and:
[Formula image BDA0003716580250000053]
where [symbol image BDA0003716580250000054] is the aspect feature corresponding to the i-th feature word in the aspect feature vector X_A; [symbol image BDA0003716580250000055] is the aspect-oriented attention matrix between the i-th feature word and the j-th feature word, j ∈ {1, 2, …, d_len}, with:
[Formula image BDA0003716580250000056]
[symbol image BDA0003716580250000057] is the aspect-oriented relation between the i-th feature word and the j-th feature word:
[Formula image BDA0003716580250000058]
[symbol image BDA0003716580250000059] is the entity feature corresponding to the j-th feature word in the entity feature vector X_E; and T is the transpose symbol.
In the above public opinion text sentiment analysis method based on semantic dependency relationship fusion features, as a preferred scheme, the emotion and dependency relationship fusion feature matrix X_S obtained by the fusion processing of the dependency feature matrix fusion network layer is:
[Formula image BDA00037165802500000510]
where [symbol image BDA00037165802500000511] is the dependency relationship fusion feature corresponding to the i-th feature word in the fusion feature matrix X_S, i ∈ {1, 2, …, d_len}; d_len denotes the total number of feature words obtained by segmenting the public opinion text to be analyzed; [symbol image BDA00037165802500000512] denotes the concatenation vector, of dimension [symbol image BDA00037165802500000515], formed by splicing, along the feature word dimension, the aspect-oriented attention matrix [symbol image BDA00037165802500000513] between the i-th and the j-th feature word with the dependency relationship fusion feature [symbol image BDA00037165802500000514] corresponding to the i-th feature word; and:
[Formula image BDA00037165802500000516]
where [symbol image BDA00037165802500000517] is the emotional tendency feature corresponding to the i-th feature word in the emotional tendency feature vector X_SC; [symbol image BDA00037165802500000518] is the emotional-tendency-oriented attention matrix between the i-th feature word and the j-th feature word, j ∈ {1, 2, …, d_len}, with:
[Formula image BDA0003716580250000061]
[symbol image BDA0003716580250000062] is the emotional-tendency-oriented self-attention between the i-th feature word and the j-th feature word:
[Formula image BDA0003716580250000063]
h_i is the self-attention of the i-th feature word; |i−j| denotes the dependency distance between the i-th feature word and the j-th feature word in the public opinion text to be analyzed; and T is the transpose symbol.
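The attention formulas themselves survive only as images, so the following is a generic sketch of one way a distance term |i−j| can enter an attention weight, not a reconstruction of the patent's formula. It assumes dot-product scores divided by (|i−j| + 1) and a softmax over j; both choices, and the function name `distance_aware_attention`, are illustrative.

```python
import math

def distance_aware_attention(h):
    """Return a d_len x d_len matrix of attention weights for feature vectors h.

    Assumed scoring rule: score(i, j) = (h_i . h_j) / (|i - j| + 1),
    normalized over j with a numerically stable softmax.
    """
    n = len(h)
    weights = []
    for i in range(n):
        scores = [sum(a * b for a, b in zip(h[i], h[j])) / (abs(i - j) + 1)
                  for j in range(n)]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

Under this assumed rule, feature words that are equally similar receive less weight the farther apart they are, which is the qualitative effect a distance term is meant to provide.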
In the above public opinion text sentiment analysis method based on semantic dependency relationship fusion features, as a preferred scheme, the public opinion text sentiment analysis model is trained and optimized by stochastic gradient descent, and the loss function used for the training optimization is:
[Formula image BDA0003716580250000064]
where J denotes the total number of label categories annotated on the sample public opinion texts; N denotes the number of subtasks to be trained in the public opinion text sentiment analysis model; [symbol image BDA0003716580250000065] denote, respectively, the true label and the predicted label of a sample public opinion text on the j-th label category of the i-th subtask to be trained; μ is a hyperparameter of the loss function; θ denotes the weight parameters of the public opinion text sentiment analysis model; and ‖·‖_2 denotes the L2 norm.
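The ingredients listed above (N subtasks, J label categories, true and predicted labels, a hyperparameter μ and an L2 norm of the weights θ) are consistent with a multi-task cross-entropy loss plus L2 regularization. The sketch below implements that form as an assumption; the exact formula is only available as an image, so the sign conventions, the squared norm and the function name `multitask_loss` are illustrative.

```python
import math

def multitask_loss(y_true, y_pred, theta, mu):
    """Cross-entropy summed over N subtasks and J labels, plus mu * ||theta||_2^2.

    y_true and y_pred are N x J nested lists; predicted values must be > 0.
    """
    cross_entropy = 0.0
    for task_true, task_pred in zip(y_true, y_pred):   # i-th subtask
        for y, p in zip(task_true, task_pred):         # j-th label category
            cross_entropy -= y * math.log(p)
    l2_penalty = sum(t * t for t in theta)             # squared L2 norm of the weights
    return cross_entropy + mu * l2_penalty
```

With one subtask, true labels [1, 0], predictions [0.5, 0.5], weights [1.0, 2.0] and μ = 0.1, the cross-entropy term is ln 2 and the penalty is 0.5.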
Compared with the prior art, the invention has the following beneficial effects:
1. The public opinion text sentiment analysis method based on semantic dependency relationship fusion features performs feature coding on the public opinion text at word granularity and at feature word granularity, so as to extract finer-grained sentiment expression information from the text. It extracts dependency relationship information among the feature words by dependency syntactic analysis, so as to mine and reflect the deeper correlations among fine-grained information in the text, and it fuses this information into a dependency relationship fusion feature vector, on which the public opinion text sentiment analysis model performs sentiment classification prediction. The sentiment tendency features contained in the text can thus be decomposed and conveyed in more dimensions, in more detail and at greater depth, so that a more accurate sentiment analysis result is obtained.
2. The method decomposes and conveys the sentiment tendency features contained in the public opinion text along several dimensions (character granularity, feature word granularity, and the dependency relationships among feature words), and passes this information layer by layer into the dependency relationship fusion feature vector of the text. After the entity feature extraction network layer, the aspect feature extraction network layer and the emotional tendency feature extraction network layer of the sentiment analysis model have mined their respective features, the dependency feature matrix fusion network layer fuses them into the emotion and dependency relationship fusion feature matrix. This fully reflects the influence of character granularity, feature word granularity and the dependency relationships among feature words on the sentiment tendency of the text. Performing sentiment analysis with these features yields more accurate emotion classification predictions and further improves the accuracy of public opinion text sentiment analysis.
3. The method helps to better support the subsequent discrimination of sentiment categories of public opinion texts and its application to the analysis of public opinion trends.
Drawings
FIG. 1 is a flow chart of the public opinion text sentiment analysis method based on semantic dependency relationship fusion features according to the invention;
FIG. 2 is an exemplary diagram of a public opinion text semantic dependency tree used in the public opinion text sentiment analysis method of the invention;
FIG. 3 is an exemplary diagram of the network structure of the public opinion text sentiment analysis model used in the public opinion text sentiment analysis method of the invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following description is made by way of specific embodiments with reference to the accompanying drawings.
The invention provides a public opinion text sentiment analysis method based on semantic dependency relationship fusion features. As shown in FIG. 1, the method comprises the following steps:
S1: acquiring a public opinion text to be analyzed;
S2: performing word-granularity word vector coding on the public opinion text to be analyzed, to obtain a word-granularity coding vector of the public opinion text to be analyzed;
S3: performing word segmentation and dependency syntactic analysis on the public opinion text to be analyzed, to obtain the segmented feature words and the dependency relationship information among the feature words, and then performing word embedding joint coding, to obtain a word embedding joint coding vector that carries the feature words of the public opinion text to be analyzed together with their dependency relationship information;
S4: splicing and fusing the word-granularity coding vector and the word embedding joint coding vector of the public opinion text to be analyzed, and taking the result as the dependency relationship fusion feature vector of the public opinion text to be analyzed;
S5: inputting the dependency relationship fusion feature vector of the public opinion text to be analyzed into a pre-trained public opinion text sentiment analysis model, to obtain an emotion classification prediction result for the public opinion text to be analyzed.
It should be noted that the public opinion text sentiment analysis method based on semantic dependency relationship fusion features of the present invention can be implemented as software code or as a software service by programming, and can then be run on a server or a computer.
The method performs feature coding on the public opinion text at word granularity and at feature word granularity, so as to extract finer-grained sentiment expression information from the text. It extracts dependency relationship information among the feature words by dependency syntactic analysis, so as to mine and reflect the deeper correlations among fine-grained information in the text, and it fuses this information into a dependency relationship fusion feature vector, on which the public opinion text sentiment analysis model performs sentiment classification prediction. The sentiment tendency features contained in the text can thus be decomposed and conveyed in more dimensions, in more detail and at greater depth, so that a more accurate sentiment analysis result is obtained.
In step S1 of the public opinion text sentiment analysis method of the present invention, the public opinion text to be analyzed is the public opinion text information on which analysis is to be performed. In a specific implementation, after the public opinion text to be analyzed has been obtained, it can be preprocessed, where the preprocessing comprises one or more of typo correction, symbol error correction, grammatical error correction and synonym expression normalization. Typos, wrong symbols, grammatical errors and differing expressions of synonyms in a public opinion text all hinder its subsequent word segmentation, which affects the decomposition of the words and the subsequent sentiment judgment; correcting these cases through preprocessing reduces their influence on the accuracy of the sentiment analysis.
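The preprocessing described above can be illustrated with a minimal sketch. The correction tables `TYPO_FIXES` and `SYNONYM_CANON`, the English sample words and the function names are all hypothetical; a real system would use curated, domain-specific resources for the language of the texts, and grammatical correction is not covered here.

```python
import re

# Hypothetical correction tables; in practice they would be curated per domain.
TYPO_FIXES = {"recieve": "receive", "servce": "service"}
SYNONYM_CANON = {"cellphone": "mobile phone", "handset": "mobile phone"}


def replace_terms(text, table):
    """Replace every key of `table` found in `text` by its value."""
    for wrong, right in table.items():
        text = text.replace(wrong, right)
    return text


def fix_symbols(text):
    """Collapse repeated punctuation and remove stray spaces before it."""
    text = re.sub(r"([!?.,;:])\1+", r"\1", text)  # "!!" -> "!"
    text = re.sub(r"\s+([!?.,;:])", r"\1", text)  # "word !" -> "word!"
    return re.sub(r"\s{2,}", " ", text).strip()


def preprocess(text):
    """Typo correction, symbol correction, then synonym normalization."""
    text = replace_terms(text, TYPO_FIXES)
    text = fix_symbols(text)
    return replace_terms(text, SYNONYM_CANON)


cleaned = preprocess("I did not recieve my handset  !!")
```

Running the corrections before segmentation keeps the later steps from having to cope with several spellings of the same term.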
Step S2 of the public opinion text sentiment analysis method is specifically as follows:
word granularity decomposition is performed on the public opinion text to be analyzed, and a BERT model is used to perform word vector coding on each word obtained by the decomposition, giving the word granularity coding vector of the public opinion text to be analyzed:
$$H_{BERT} = \{h_{b,1}, h_{b,2}, \ldots, h_{b,m}, \ldots, h_{b,M}\}$$
where $H_{BERT}$ denotes the word granularity coding vector of the public opinion text to be analyzed; $h_{b,m}$ denotes the coding vector of the words of the public opinion text to be analyzed in the m-th hidden-layer dimension, $m \in \{1,2,\ldots,M\}$; M denotes the coding dimension used when the BERT model performs word vector coding; and n denotes the number of words contained in the public opinion text to be analyzed.
In a specific implementation, a custom word list or word dictionary can be built from large volumes of public opinion text in the same application field, and the parameters of a pre-trained BERT model can be fine-tuned accordingly, so that the BERT model can be used to perform word vector coding on each word of the public opinion text. Setting the hidden size of the BERT model (for example, to M) sets the coding dimension of the word-granularity word vector coding (correspondingly, M dimensions).
Step S3 of the public opinion text sentiment analysis method specifically comprises the following steps:
S301: Word segmentation and dependency syntactic analysis are performed on the public opinion text to be analyzed to obtain its feature words and the dependency relationship information among the feature words.
In a specific implementation, word segmentation and dependency parsing can be carried out with existing natural language processing tools. For example, the custom-dictionary add function of the HanLP toolkit (a Chinese language processing toolkit) can be used to introduce a custom feature dictionary, and the JClass module of HanLP can be used to load a dependency parsing model, so that the segmented words of each sentence in the text and the dependency relationships among them are obtained directly by computation.
Dependency parsing (DP) reveals the syntactic structure of a linguistic unit by analyzing the dependencies between the components within it. Intuitively, it identifies grammatical components of a sentence such as subject, predicate and object, and attributive, adverbial and complement, and analyzes the relationships among these components. Dependency grammar holds that the verb in the predicate is the core of a sentence, and that all other components are connected to this verb directly or indirectly. Words that take part in subject-predicate and verb-object dependency relationships are therefore usually important to the semantic expression of the sentence. To some extent, the more dependency relationships a word participates in, the more important its role in the sentence.
In a specific implementation, to make it easier to record the feature words obtained by segmenting the public opinion text and the dependency relationship information among them, a semantic dependency tree of the public opinion text can be constructed. The feature words of the public opinion text serve as the layer nodes of the semantic dependency tree, and the dependency relationships among the feature words are represented as connections between the corresponding nodes, forming a tree-structured relationship graph, which is the semantic dependency tree of the public opinion text. The semantic dependency tree can be generated with existing natural language processing tools such as StanfordNLP.
For example, for the public opinion text "Root peer Science and Technology formally passed the trusted cloud evaluation and assessment initiated by the China Academy of Information and Communications Technology, and smoothly obtained the 'trusted cloud enterprise-level SaaS service evaluation certification'", the semantic dependency tree constructed after word segmentation and dependency syntactic analysis is shown in Fig. 2.
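A semantic dependency tree of this kind can be held in a very small data structure. The sketch below is illustrative only: the `Arc` class, the three-word toy sentence and the relation labels are hypothetical and do not come from any particular parser. It stores the arcs, groups dependents under their head, and counts how many dependency relationships each word takes part in, which is the importance signal mentioned above.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Arc:
    head: int       # index of the governing word (0 is an artificial root)
    dependent: int  # index of the dependent word
    relation: str   # dependency label (hypothetical label set)


# Hypothetical parse of the toy sentence "company passed evaluation".
words = ["ROOT", "company", "passed", "evaluation"]
arcs = [Arc(0, 2, "root"), Arc(2, 1, "subject"), Arc(2, 3, "object")]


def build_tree(arcs):
    """Map each head index to its list of (dependent, relation) pairs."""
    children = defaultdict(list)
    for arc in arcs:
        children[arc.head].append((arc.dependent, arc.relation))
    return dict(children)


def relation_counts(arcs, n_words):
    """Count, for every word, the arcs it takes part in as head or dependent."""
    counts = [0] * n_words
    for arc in arcs:
        counts[arc.head] += 1
        counts[arc.dependent] += 1
    return counts


tree = build_tree(arcs)
counts = relation_counts(arcs, len(words))
```

In this toy parse the verb "passed" participates in the most relationships, matching the view that the predicate verb is the core of the sentence.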
S302: The part of speech of each feature word in the public opinion text to be analyzed is determined. Taking the coding dimension M of the word granularity coding vector of the public opinion text to be analyzed as the coding dimension of the word embedding coding, word embedding coding is performed separately on the part-of-speech information of each feature word and on its corresponding dependency relationship information, giving the feature word coding vector and the dependency relationship coding vector of the public opinion text to be analyzed:
$$W_p = \{w_{p,1}, w_{p,2}, \ldots, w_{p,m}, \ldots, w_{p,M}\}, \qquad W_d = \{w_{d,1}, w_{d,2}, \ldots, w_{d,m}, \ldots, w_{d,M}\}$$
where $W_p$ and $W_d$ are respectively the feature word coding vector and the dependency relationship coding vector of the public opinion text to be analyzed; $w_{p,m}$ is the word embedding coding vector, in the m-th coding dimension, of the part-of-speech information of the feature words in the public opinion text to be analyzed; $w_{d,m}$ is the word embedding coding vector, in the m-th coding dimension, of the dependency relationships corresponding to the feature words; $m \in \{1,2,\ldots,M\}$; M denotes the coding dimension of the word embedding coding; and $d_{len}$ denotes the total number of feature words obtained by word segmentation of the public opinion text to be analyzed.
In a specific implementation, after the $d_{len}$ feature words of the public opinion text to be analyzed have been obtained by word segmentation, the part of speech of each feature word can be determined from a custom feature dictionary built by big-data analysis of public opinion texts in the same application field. Here the part of speech refers to the emotional tendency expressed by the feature word, and can be classified as positive emotional tendency, negative emotional tendency, neutral, and so on. A word embedding coding tool, such as an Embedding module in a natural language processing library, can then be used to perform word embedding coding on the part-of-speech information of each feature word and on the dependency relationship information corresponding to each feature word.
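As a rough illustration of step S302, the sketch below looks up one M-dimensional embedding per feature word for its part-of-speech (emotional tendency) tag and one for its dependency label. Everything here is assumed for illustration: the label sets, the randomly initialised tables (a trained Embedding module would supply real weights), and the reading of the "sum matrix" used in the next step as an element-wise sum.

```python
import numpy as np

rng = np.random.default_rng(0)
M = 8  # coding dimension, assumed equal to the BERT hidden size

# Hypothetical label vocabularies.
POS_TAGS = ["positive", "negative", "neutral"]
DEP_LABELS = ["root", "subject", "object"]

# Random tables stand in for trained embedding weights.
pos_table = rng.normal(size=(len(POS_TAGS), M))
dep_table = rng.normal(size=(len(DEP_LABELS), M))


def embed(labels, vocab, table):
    """Return one embedding row per label, shape (len(labels), M)."""
    ids = [vocab.index(label) for label in labels]
    return table[ids]


# One tag and one dependency label per feature word (d_len = 3 here).
word_pos = ["neutral", "positive", "neutral"]
word_dep = ["subject", "root", "object"]

W_p = embed(word_pos, POS_TAGS, pos_table)  # feature word coding, (d_len, M)
W_d = embed(word_dep, DEP_LABELS, dep_table)  # dependency coding, (d_len, M)
W_pd = W_p + W_d  # assumed element-wise sum
```

Using the same dimension M for both tables is what allows the two codings to be combined position by position.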
S303: The feature word coding vector $W_p$ and the dependency relationship coding vector $W_d$ of the public opinion text to be analyzed are combined into a sum matrix $W_{pd}$, which is input into a preset relational graph attention coding network for coding to obtain the corresponding joint coding matrix:
Figure BDA0003716580250000101
in the formula,
Figure BDA0003716580250000102
representing the resulting joint coding matrix;
Figure BDA0003716580250000103
represents the joint coding vector, in the m-th coding dimension, of the part of speech of each feature word in the public opinion text to be analyzed and its corresponding dependency relationship.
In a specific implementation, the relational graph attention coding network can be designed as a multilayer network comprising a multi-head attention mechanism layer, a connection layer and a point-by-point convolution layer connected in sequence. The multi-head attention mechanism, the point-by-point convolution layer and the like are network layer structures commonly used in prior-art neural network models.
The multi-head attention mechanism layer can be described as:
Figure BDA0003716580250000104
wherein,
Figure BDA0003716580250000105
represents the output vector, at the l-th attention layer, of the i-th layer node in the semantic dependency tree (corresponding to the matrix coding vector of the i-th feature word),
Figure BDA0003716580250000106
represents the output vector, at the (l-1)-th attention layer, of the j-th adjacent layer node (corresponding to the matrix coding vector of the j-th feature word) that has a dependency connection in the semantic dependency tree with
Figure BDA0003716580250000107
(that is, the two nodes depend on each other); N(i) represents the total number of adjacent layer nodes, at the l-th attention layer, of the i-th layer node in the semantic dependency tree
Figure BDA0003716580250000108
; || denotes the vector concatenation operator,
Figure BDA0003716580250000109
represents the parameter matrix of the z-th attention head at the l-th attention layer of the multi-head attention mechanism, where $z \in \{1,2,\ldots,Z\}$ and Z denotes the total number of attention heads in the multi-head attention mechanism; σ(·) denotes the sigmoid activation function;
Figure BDA00037165802500001010
represents the degree of dependence of the output vector, at the l-th attention layer, of the i-th layer node in the semantic dependency tree on its adjacent layer node
Figure BDA00037165802500001011
, computed as:
Figure BDA00037165802500001012
wherein,
Figure BDA00037165802500001013
corresponds to the z-th attention head of the l-th attention layer in the multi-head attention mechanism; T is the transpose symbol.
The connection layer concatenates the output vectors of the layer nodes of the semantic dependency tree at the l-th attention layer to form the joint output vector $h^{l}$ of the layer nodes at the l-th attention layer; in this way the joint output vector of each attention layer in the multi-head attention mechanism is obtained.
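To make the data flow of a multi-head graph attention layer concrete, here is a minimal NumPy sketch. It follows the widely used graph attention formulation (per-head projection, softmax-normalised scores over dependency neighbours, sigmoid activation, concatenation of heads) and is not claimed to reproduce the exact equations of this method. The scoring vector `a`, the LeakyReLU on the scores and all shapes are assumptions.

```python
import numpy as np


def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)


def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def graph_attention_layer(h_prev, neighbours, W, a):
    """One attention layer over the dependency tree.

    h_prev:     (d_len, M) node vectors from the previous layer
    neighbours: neighbours[i] lists the nodes that share an arc with node i
    W:          (Z, M, F) per-head projection matrices
    a:          (Z, 2F) per-head scoring vectors (assumed form)
    """
    heads = []
    for W_z, a_z in zip(W, a):
        proj = h_prev @ W_z  # (d_len, F)
        out = np.zeros_like(proj)
        for i, nbrs in enumerate(neighbours):
            pairs = np.array([np.concatenate([proj[i], proj[j]]) for j in nbrs])
            alpha = softmax(leaky_relu(pairs @ a_z))  # dependence weights
            out[i] = sigmoid(alpha @ proj[nbrs])      # weighted neighbour sum
        heads.append(out)
    return np.concatenate(heads, axis=1)  # concatenate heads: (d_len, Z * F)


rng = np.random.default_rng(0)
d_len, M, F, Z = 3, 4, 2, 2
h0 = rng.normal(size=(d_len, M))
neighbours = [[1], [0, 2], [1]]  # toy tree: word 1 governs words 0 and 2
W = rng.normal(size=(Z, M, F))
a = rng.normal(size=(Z, 2 * F))
h1 = graph_attention_layer(h0, neighbours, W, a)
```

Because attention is restricted to `neighbours`, each word is updated only from the words it is syntactically linked to, rather than from every word in the sentence.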
The point-by-point convolution layer can be described as:
$$\mathrm{PCT}(h^{l}) = \delta(h^{l} * W_{P1} + b_{P1}) * W_{P2} + b_{P2}$$
where δ(·) denotes the ReLU activation function; the symbol * denotes the one-dimensional convolution operation; $W_{P1}$ and $W_{P2}$ are the convolution kernel weights of the two one-dimensional convolution operations; $b_{P1}$ and $b_{P2}$ are the convolution kernel biases of the two one-dimensional convolution operations; and $\mathrm{PCT}(h^{l})$ denotes the point-by-point convolution vector of the joint output vector $h^{l}$. The matrix formed by the set of point-by-point convolution vectors of all attention layers gives the joint coding matrix
Figure BDA0003716580250000111
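A one-dimensional convolution with kernel size 1 applies the same linear map at every position, so the PCT expression can be sketched with two matrix products. The kernel size of 1 is an assumption suggested by the term "point-by-point"; the shapes and sample values are illustrative.

```python
import numpy as np


def pointwise_conv(h, W1, b1, W2, b2):
    """PCT(h) = ReLU(h * W1 + b1) * W2 + b2 with kernel size 1.

    h: (positions, channels_in); each row is transformed independently.
    """
    hidden = np.maximum(h @ W1 + b1, 0.0)  # first convolution + ReLU
    return hidden @ W2 + b2                # second convolution


h = np.array([[1.0, -1.0], [2.0, 0.5]])
W1, b1 = np.eye(2), np.zeros(2)
W2, b2 = np.array([[2.0], [3.0]]), np.array([0.5])
out = pointwise_conv(h, W1, b1, W2, b2)
```

Since no position mixes with another, reordering the rows of `h` simply reorders the rows of the output.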
S304: Pooling is performed on the joint coding matrix along the coding dimension to obtain the word embedding joint coding vector $H_{graph}$ of the public opinion text to be analyzed:
$$H_{graph} = \{h_{g,1}, h_{g,2}, \ldots, h_{g,m}, \ldots, h_{g,M}\}$$
where the element $h_{g,m}$ of the word embedding joint coding vector is the pooled value of the joint coding vector, in the m-th coding dimension, of the parts of speech of the feature words in the public opinion text to be analyzed and their corresponding dependency relationships.
In step S4 of the public opinion text sentiment analysis method of the present invention, the dependency relationship fusion feature vector obtained by splicing and fusing the word granularity coding vector and the word embedding joint coding vector of the public opinion text to be analyzed is:
Figure BDA0003716580250000115
in the formula,
Figure BDA0003716580250000116
is the dependency relationship fusion feature vector of the public opinion text to be analyzed,
Figure BDA0003716580250000117
denotes the word granularity coding vector of the public opinion text to be analyzed, M denotes the coding dimension of the word granularity coding vector $H_{BERT}$, and n denotes the number of words contained in the public opinion text to be analyzed; the symbol
Figure BDA0003716580250000118
denotes the element-wise product operation; g denotes the splicing fusion function of the word granularity coding vector and the word embedding joint coding vector of the public opinion text to be analyzed:
$$g = \sigma(W_g [H_{BERT} : H_{graph}] + b_g)$$
where σ(·) denotes the sigmoid activation function, $W_g$ and $b_g$ respectively denote the weight matrix and the bias vector of the sigmoid activation, and $[H_{BERT} : H_{graph}]$ denotes the concatenation vector obtained by splicing the word embedding joint coding vector $H_{graph}$ onto each of the n word dimensions of the word granularity coding vector $H_{BERT}$.
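The pooling of step S304 and the gate g of step S4 can be sketched together. Two points are assumptions rather than statements of the method: mean pooling is used because the pooling type is not specified, and the gated combination g ⊙ H_BERT + (1 − g) ⊙ H_graph is one common way to use such a gate, chosen here only for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def fuse(H_bert, joint_matrix, W_g, b_g):
    """Gate-based fusion of word-level and graph-level encodings.

    H_bert:       (n, M) word granularity encoding
    joint_matrix: (d_len, M) joint coding matrix from the graph network
    W_g, b_g:     (2M, M) weight matrix and (M,) bias of the gate
    """
    n, _ = H_bert.shape
    H_graph = joint_matrix.mean(axis=0)           # assumed mean pooling -> (M,)
    tiled = np.tile(H_graph, (n, 1))              # one copy per word: (n, M)
    concat = np.concatenate([H_bert, tiled], axis=1)  # [H_BERT : H_graph]
    g = sigmoid(concat @ W_g + b_g)               # gate values in (0, 1)
    return g * H_bert + (1.0 - g) * tiled         # assumed combination


H_bert = np.arange(6, dtype=float).reshape(3, 2)
joint = np.array([[1.0, 2.0], [3.0, 4.0]])
fused = fuse(H_bert, joint, np.zeros((4, 2)), np.zeros(2))
```

With zero weights the gate is 0.5 everywhere, so the result is the plain average of the two encodings, which makes a convenient sanity check.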
In step S5 of the public opinion text sentiment analysis method, emotion classification prediction is performed on the public opinion text to be analyzed by the public opinion text sentiment analysis model. Specifically, the public opinion text sentiment analysis model comprises an entity feature extraction network layer, an aspect feature extraction network layer, an emotional tendency feature extraction network layer, a dependency feature matrix fusion network layer, a fully connected layer and a classifier network layer. The entity feature extraction network layer extracts the entity feature vector $X_E$ from the input dependency relationship fusion feature vector and outputs it both to the dependency feature matrix fusion network layer and, through the fully connected layer, to the classifier network layer. The aspect feature extraction network layer extracts the aspect feature vector $X_A$ from the input dependency relationship fusion feature vector and likewise outputs it both to the dependency feature matrix fusion network layer and, through the fully connected layer, to the classifier network layer. The emotional tendency feature extraction network layer extracts the emotional tendency feature vector $X_{SC}$ from the input dependency relationship fusion feature vector and outputs it to the dependency feature matrix fusion network layer. The dependency feature matrix fusion network layer extracts the relation feature matrix $X_{A2E}$ of aspect features and entity features based on the entity feature vector $X_E$ and the aspect feature vector $X_A$, and then fuses $X_{A2E}$ with the emotional tendency feature vector $X_{SC}$ to obtain the emotion and dependency relationship fusion feature matrix $X_S$, which is output to the classifier network layer through the fully connected layer. The classifier network layer takes the entity feature vector $X_E$, the aspect feature vector $X_A$ and the emotion and dependency relationship fusion feature matrix $X_S$ of the public opinion text to be analyzed as classification features, performs emotion classification on the public opinion text to be analyzed, and outputs its emotion classification prediction result.
In a specific implementation, the entity feature extraction network layer can be a one-dimensional convolution network trained in advance on entity word features with the aid of an entity dictionary; the aspect feature extraction network layer can be a one-dimensional convolution network trained in advance on aspect features with the aid of the aspect feature annotations of the entity words in the entity dictionary; the emotional tendency feature extraction network layer can be a long short-term memory (LSTM) neural network trained in advance on emotional tendency categories with the aid of the emotional tendency annotations of the entity words in the entity dictionary; and the classifier network layer can be a classifier model trained in advance for text emotion classification prediction.
In a specific implementation, the dependency relationship fusion feature vector of the public opinion text to be analyzed that is input into the public opinion text sentiment analysis model is
Figure BDA0003716580250000121
The one-dimensional convolution network of the entity feature extraction network layer performs entity extraction on it to obtain the entity feature vector
Figure BDA0003716580250000122
The one-dimensional convolution network of the aspect feature extraction network layer performs aspect extraction on it to obtain the aspect feature vector
Figure BDA0003716580250000123
Similarly, the long short-term memory neural network of the emotional tendency feature extraction network layer extracts the emotional tendency feature vector
Figure BDA0003716580250000124
Then, based on the entity feature vector $X_E$ and the aspect feature vector $X_A$, the dependency feature matrix fusion network layer extracts the relation feature matrix $X_{A2E}$ of aspect features and entity features:
Figure BDA0003716580250000125
Wherein,
Figure BDA0003716580250000126
is the relation feature, in the relation feature matrix $X_{A2E}$, of the aspect feature and entity feature corresponding to the i-th feature word, $i \in \{1,2,\ldots,d_{len}\}$, where $d_{len}$ denotes the total number of feature words obtained by word segmentation of the public opinion text to be analyzed; and:
Figure BDA0003716580250000127
wherein,
Figure BDA0003716580250000128
is the aspect feature corresponding to the i-th feature word in the aspect feature vector $X_A$;
Figure BDA0003716580250000129
is the aspect-oriented attention matrix between the i-th feature word and the j-th feature word, $j \in \{1,2,\ldots,d_{len}\}$, and:
Figure BDA00037165802500001210
Figure BDA0003716580250000131
is the aspect-oriented relationship between the i-th feature word and the j-th feature word:
Figure BDA0003716580250000132
Figure BDA0003716580250000133
is the entity feature corresponding to the j-th feature word in the entity feature vector $X_E$; T is the transpose symbol.
Then, the dependency feature matrix fusion network layer fuses the relation feature matrix $X_{A2E}$ with the emotional tendency feature vector $X_{SC}$ to obtain the emotion and dependency relationship fusion feature matrix $X_S$:
Figure BDA0003716580250000134
Wherein,
Figure BDA0003716580250000135
is the dependency relationship fusion feature corresponding to the i-th feature word in the dependency relationship fusion feature matrix $X_S$, $i \in \{1,2,\ldots,d_{len}\}$, where $d_{len}$ denotes the total number of feature words obtained by word segmentation of the public opinion text to be analyzed;
Figure BDA0003716580250000136
denotes the concatenation vector formed by splicing, along the feature word dimension, the aspect-oriented attention matrix between the i-th feature word and the j-th feature word
Figure BDA0003716580250000137
and the dependency relationship fusion feature corresponding to the i-th feature word
Figure BDA0003716580250000138
the concatenation vector having dimension
Figure BDA0003716580250000139
; and:
Figure BDA00037165802500001310
wherein,
Figure BDA00037165802500001311
is the emotional tendency feature corresponding to the i-th feature word in the emotional tendency feature vector $X_{SC}$;
Figure BDA00037165802500001312
is the emotional-tendency-oriented attention matrix between the i-th feature word and the j-th feature word, $j \in \{1,2,\ldots,d_{len}\}$, and:
Figure BDA00037165802500001313
Figure BDA00037165802500001314
is the emotional-tendency-oriented self-attention mechanism between the i-th feature word and the j-th feature word:
Figure BDA00037165802500001315
$h_i$ is the self-attention mechanism of the i-th feature word; $|i-j|$ denotes the dependency relationship distance between the i-th feature word and the j-th feature word in the public opinion text to be analyzed; T is the transpose symbol.
Finally, the classifier network layer takes the entity feature vector $X_E$, the aspect feature vector $X_A$ and the emotion and dependency relationship fusion feature matrix $X_S$ of the public opinion text to be analyzed as classification features, performs emotion classification on the public opinion text to be analyzed, and outputs its emotion classification prediction result.
Fig. 3 shows an example of a network structure of a public opinion text emotion analysis model applied in the public opinion text emotion analysis method of the present invention.
As the above process shows, the public opinion text sentiment analysis method of the invention decomposes the emotional tendency features contained in the public opinion text across more dimensions, in more detail and at greater depth, namely word granularity, feature word granularity and the dependency relationships among feature words, and passes this emotional tendency feature information layer by layer into the dependency relationship fusion feature vector of the public opinion text. After the entity feature extraction network layer, the aspect feature extraction network layer and the emotional tendency feature extraction network layer of the public opinion text sentiment analysis model have mined their respective features, the dependency feature matrix fusion network layer fuses them into the emotion and dependency relationship fusion feature matrix, which fully reflects the influence of word granularity, feature word granularity and the dependency relationships among feature words on the emotional tendency of the public opinion text. Performing sentiment analysis on the public opinion text with these features therefore yields a more accurate emotion classification prediction result and further improves the accuracy of public opinion text sentiment analysis.
In a specific implementation, the public opinion text sentiment analysis model used by the invention can be obtained by training through the following steps:
Step 1: A sample public opinion text data set is acquired from a public opinion text database, the data set comprising a plurality of labeled sample public opinion texts.
In a specific implementation, a self-built dictionary, a word segmentation tool and the like can be used to segment the sample public opinion text data acquired from an existing public opinion text database, and numeric characters, punctuation marks and other unusual characters in the sample public opinion texts can be removed before segmentation. The sample public opinion texts are then labeled; the labels comprise an entity word label, an aspect word label and an emotional tendency label for each feature word, together with an overall emotion classification label for the sample public opinion text. The entity word label, aspect word label and emotional tendency label of a feature word can be annotated in the BIO labeling format, and the emotion classification label of the sample public opinion text can be annotated with a joint representation of {B, I, O} and {POS, NEU, NEG}.
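The BIO format and its combination with polarity labels can be illustrated as follows. The helper `bio_tags`, the `B-POS`/`I-POS` tag spelling and the toy sentence are assumptions made for this sketch; the text only states that {B, I, O} and the polarity set are used jointly.

```python
def bio_tags(tokens, spans):
    """Convert labelled spans into one BIO tag per token.

    spans: list of (start, end_exclusive, label) tuples over token indices.
    """
    tags = ["O"] * len(tokens)
    for start, end, label in spans:
        tags[start] = f"B-{label}"       # first token of the span
        for k in range(start + 1, end):
            tags[k] = f"I-{label}"       # remaining tokens of the span
    return tags


tokens = ["the", "cloud", "service", "is", "reliable"]
# Hypothetical annotation: "cloud service" is marked with positive polarity.
tags = bio_tags(tokens, [(1, 3, "POS")])
```

Tokens outside every span keep the tag `O`, so the tag sequence always has the same length as the token sequence.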
Step 2: The dependency relationship fusion feature vector of each sample public opinion text in the sample public opinion text data set is extracted, using steps S2 to S4 of the public opinion text sentiment analysis method, namely:
word-granularity word vector coding is performed on the sample public opinion text to obtain its word granularity coding vector;
word segmentation and dependency syntactic analysis are performed on the sample public opinion text to obtain its feature words and the dependency relationship information among the feature words, and word embedding joint coding is then performed to obtain a word embedding joint coding vector that carries the dependency relationship information of the feature words of the sample public opinion text;
the word granularity coding vector and the word embedding joint coding vector of the sample public opinion text are spliced and fused to form the dependency relationship fusion feature vector of the sample public opinion text.
Step 3: Training samples and test samples are selected from the data set to form a training sample set and a test sample set respectively.
In a specific implementation, the ratio of the number of training sample texts to the number of test sample texts can be set to 8:2, and the samples can be selected at random.
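A random 8:2 split can be sketched in a few lines. The function name `split_dataset` and the fixed seed are illustrative choices, the seed being there only to make the split reproducible.

```python
import random


def split_dataset(samples, train_ratio=0.8, seed=0):
    """Shuffle a copy of `samples` and cut it into train and test parts."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


train_set, test_set = split_dataset(range(10))
```

Shuffling a copy leaves the original data set untouched, so the same call can be repeated with a different seed to obtain another split.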
Step 4: The dependency relationship fusion feature vector of each sample public opinion text in the training sample set is used as the input of the public opinion text sentiment analysis model, and the label of each sample public opinion text in the training sample set is used as the output verification label, to carry out emotion classification prediction training of the model and thereby adjust its emotion classification prediction parameters.
Step 5: The dependency relationship fusion feature vector of each sample public opinion text in the test sample set is input into the public opinion text sentiment analysis model for emotion classification prediction; the label of each sample public opinion text in the test sample set is used as the output verification label against which the emotion classification prediction results of the model are compared and verified, and the emotion classification prediction performance of the model is evaluated.
In a specific implementation, the entity word label, aspect word label and emotional tendency label of the feature words, among the labels annotated on the sample public opinion texts, are used as the output verification labels of the entity feature extraction network layer, the aspect feature extraction network layer and the emotional tendency feature extraction network layer of the public opinion text sentiment analysis model respectively, and the emotion classification label of the sample public opinion text is used as the output verification label of the classifier network layer of the model, for training and for comparing and verifying the results.
Step 6: If the emotion classification prediction performance of the public opinion text sentiment analysis model does not reach the preset target, the process returns to step 4; if it reaches the preset target, training ends and the trained public opinion text sentiment analysis model is obtained.
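The control flow of steps 4 to 6 amounts to a train-evaluate loop that stops once the target is met. The sketch below abstracts training and evaluation into two callables, `train_step` and `evaluate`, which are hypothetical placeholders; the `max_rounds` cap is an added safeguard that the text does not mention.

```python
def train_until_target(train_step, evaluate, target, max_rounds=100):
    """Repeat step 4 (train) and step 5 (evaluate) until the target is met.

    Returns the number of rounds run and the last evaluation score.
    """
    score = None
    for round_no in range(1, max_rounds + 1):
        train_step()          # step 4: adjust the model parameters
        score = evaluate()    # step 5: measure performance on the test set
        if score >= target:   # step 6: stop once the preset target is reached
            return round_no, score
    return max_rounds, score


# Toy stand-in for a model whose score improves by 10 points per round.
state = {"score": 50}


def fake_train():
    state["score"] += 10


rounds, final_score = train_until_target(fake_train, lambda: state["score"], target=80)
```

Separating the loop from the model makes the stopping rule easy to test without any real training.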
The public opinion text sentiment analysis model can be trained and optimized by stochastic gradient descent, with the following loss function:
Figure BDA0003716580250000151
J denotes the total number of label categories annotated on the sample public opinion texts, and N denotes the number of subtasks to be trained in the public opinion text sentiment analysis model;
Figure BDA0003716580250000152
respectively denote the true label and the predicted label of a sample public opinion text on the j-th label category of the i-th subtask to be trained; μ is a hyperparameter of the loss function, and θ denotes the weight parameters of the public opinion text sentiment analysis model; $\|\cdot\|_2$ denotes the L2 norm operation.
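A loss built from the quantities defined above can be sketched as a cross-entropy term summed over the N subtasks and J label categories plus an L2 penalty weighted by μ. The cross-entropy form and the squared norm are assumptions; the symbol definitions alone do not fix the exact expression.

```python
import numpy as np


def multitask_loss(y_true, y_pred, theta, mu=1e-3, eps=1e-12):
    """Assumed form: -sum_i sum_j y_ij * log(p_ij) + mu * ||theta||_2^2.

    y_true, y_pred: (N, J) true labels and predicted probabilities
    theta:          flat array of model weights
    """
    cross_entropy = -np.sum(y_true * np.log(y_pred + eps))
    l2_penalty = mu * np.sum(theta ** 2)
    return cross_entropy + l2_penalty


y_true = np.array([[1.0, 0.0], [0.0, 1.0]])   # N = 2 subtasks, J = 2 labels
y_pred = np.array([[0.5, 0.5], [0.25, 0.75]])
theta = np.array([1.0, 2.0])
loss = multitask_loss(y_true, y_pred, theta, mu=0.1)
```

The small constant `eps` only guards against taking the logarithm of zero and does not change the result noticeably.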
In conclusion, the method combines the advantages of semantic dependency trees and multi-task joint training: it uses a graph attention network to extract the non-Euclidean spatial features of the text and uses BERT to extract the global features of the text, and fuses the two, which helps in better understanding the semantic information of the text. Compared with GAT, BERT and other BERT-based models, the method can obtain better sentiment analysis accuracy, and better supports subsequent applications such as distinguishing the emotion types of public opinion texts and analyzing public opinion trends.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present invention and not to limit them. Those skilled in the art should understand that modifications or equivalent substitutions can be made to the technical solutions of the present invention without departing from their spirit and scope, and all such modifications and substitutions shall be covered by the claims of the present invention.

Claims (10)

1. A public opinion text sentiment analysis method based on semantic dependency relationship fusion features, characterized by comprising the following steps:
S1: acquiring a public opinion text to be analyzed;
S2: performing word-granularity word vector coding on the public opinion text to be analyzed to obtain a word granularity coding vector of the public opinion text to be analyzed;
S3: performing word segmentation and dependency syntactic analysis on the public opinion text to be analyzed to obtain its feature words and the dependency relationship information among the feature words, and then performing word embedding joint coding to obtain a word embedding joint coding vector that carries the dependency relationship information of the feature words of the public opinion text to be analyzed;
S4: splicing and fusing the word granularity coding vector and the word embedding joint coding vector of the public opinion text to be analyzed to form the dependency relationship fusion feature vector of the public opinion text to be analyzed;
S5: inputting the dependency relationship fusion feature vector of the public opinion text to be analyzed into a pre-trained public opinion text sentiment analysis model to obtain an emotion classification prediction result for the public opinion text to be analyzed.
2. The public opinion text sentiment analysis method based on semantic dependency relationship fusion features according to claim 1, wherein step S1 further comprises preprocessing the public opinion text to be analyzed, the preprocessing comprising one or more of typo correction, symbol error correction, grammatical error correction and synonym expression normalization.
3. The public opinion text emotion analysis method based on semantic dependency relationship fusion features according to claim 1, wherein step S2 specifically comprises: decomposing the public opinion text to be analyzed at word granularity, and performing word vector coding on each word obtained by the decomposition using a BERT model to obtain the word granularity coding vector of the public opinion text to be analyzed:
Figure FDA0003716580240000011
in the formula,
Figure FDA0003716580240000012
denotes the word granularity coding vector of the public opinion text to be analyzed,
Figure FDA0003716580240000013
denotes the coding vector of the words of the public opinion text to be analyzed in the m-th hidden layer dimension, where m ∈ {1, 2, …, M} and M denotes the coding dimension used when the BERT model performs word vector coding; n denotes the number of words contained in the public opinion text to be analyzed.
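The layout of the word granularity coding vector in claim 3 (n decomposed units, each with M hidden dimensions) can be sketched as follows. This is only a shape illustration under stated assumptions: the seeded random lookup table stands in for a pretrained BERT encoder, so unlike BERT it ignores context, and the names `encode_tokens` and `hidden_dimension` are hypothetical.

```python
import random


def encode_tokens(tokens, M=8, seed=0):
    """Return an n x M matrix with one M-dimensional vector per token.

    Stand-in for a BERT encoder (assumption): vectors come from a seeded
    random table, so equal tokens share a vector regardless of context.
    """
    rng = random.Random(seed)
    table = {}
    H = []
    for tok in tokens:
        if tok not in table:
            table[tok] = [rng.uniform(-1.0, 1.0) for _ in range(M)]
        H.append(list(table[tok]))
    return H


def hidden_dimension(H, m):
    """Values of all n tokens in hidden dimension m (1-based, as in the claim)."""
    return [row[m - 1] for row in H]
```

For example, `encode_tokens(list("舆情分析"), M=4)` yields a 4 x 4 matrix, and `hidden_dimension(H, 1)` returns the four values of the first hidden dimension.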
4. The public opinion text emotion analysis method based on semantic dependency relationship fusion features according to claim 1, wherein step S3 specifically comprises the following steps:
S301: performing word segmentation and dependency syntactic analysis on the public opinion text to be analyzed to obtain the feature words produced by the segmentation and the dependency relationship information among the feature words;
S302: determining the part of speech of each feature word in the public opinion text to be analyzed; taking the coding dimension M of the word granularity coding vector of the public opinion text to be analyzed as the coding dimension of the word embedding coding; and performing word embedding coding separately on the part-of-speech information of each feature word and on its corresponding dependency relationship information, to obtain the feature word coding vector and the dependency relationship coding vector of the public opinion text to be analyzed:
Figure FDA0003716580240000021
in the formula,
Figure FDA0003716580240000022
denote, respectively, the feature word coding vector and the dependency relationship coding vector of the public opinion text to be analyzed,
Figure FDA0003716580240000023
denotes the word embedding coding vector, in the m-th coding dimension, of the part-of-speech information of each feature word in the public opinion text to be analyzed,
Figure FDA0003716580240000024
denotes the word embedding coding vector, in the m-th coding dimension, of the dependency relationship corresponding to each feature word in the public opinion text to be analyzed, where m ∈ {1, 2, …, M}, M denotes the coding dimension of the word embedding coding, and $d_{len}$ denotes the total number of feature words obtained by word segmentation of the public opinion text to be analyzed;
S303: combining the feature word coding vector $W_p$ and the dependency relationship coding vector $W_d$ of the public opinion text to be analyzed into a sum matrix $W_{pd}$, and inputting $W_{pd}$ into a preset relational graph attention coding network for coding to obtain the corresponding joint coding matrix:
Figure FDA0003716580240000025
in the formula,
Figure FDA0003716580240000026
denotes the resulting joint coding matrix;
Figure FDA0003716580240000027
denotes the joint coding vector, in the m-th coding dimension, of the part of speech of each feature word in the public opinion text to be analyzed and its corresponding dependency relationship;
the relational graph attention coding network comprises a multi-head attention mechanism layer, a linear layer and a point-wise convolution layer connected in sequence;
S304: performing pooling on the joint coding matrix
Figure FDA0003716580240000028
to obtain the word embedding joint coding vector $H_{graph}$ of the public opinion text to be analyzed:
$H_{graph} = \{h_{g,1}, h_{g,2}, \dots, h_{g,m}, \dots, h_{g,M}\}$;
wherein the element
Figure FDA0003716580240000029
of the word embedding joint coding vector is the pooled value of the joint coding vector
Figure FDA00037165802400000210
of the part of speech of each feature word in the public opinion text to be analyzed and its corresponding dependency relationship in the m-th coding dimension.
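Steps S302 to S304 can be walked through in a small pure-Python sketch. Several parts are assumptions made for illustration and are not taken from the claim: the embeddings come from seeded random tables, the sum matrix is formed by element-wise addition, the relational graph attention coding network is reduced to one single-head self-attention pass with identity projections (the linear and point-wise convolution layers are omitted), and mean pooling is used because the claim does not name the pooling type. All function names are hypothetical.

```python
import math
import random


def embed(tags, M, seed):
    """Look up an M-dimensional vector for each tag (toy random table)."""
    rng = random.Random(seed)
    table = {}
    out = []
    for tag in tags:
        if tag not in table:
            table[tag] = [rng.uniform(-1.0, 1.0) for _ in range(M)]
        out.append(list(table[tag]))
    return out


def softmax(xs):
    mx = max(xs)
    exps = [math.exp(x - mx) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(X):
    """Single-head scaled dot-product self-attention, identity projections."""
    M = len(X[0])
    out = []
    for q in X:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(M) for k in X]
        w = softmax(scores)
        out.append([sum(w[j] * X[j][m] for j in range(len(X))) for m in range(M)])
    return out


def graph_vector(pos_tags, dep_labels, M=8):
    """Return an M-dimensional vector playing the role of H_graph."""
    W_p = embed(pos_tags, M, seed=1)    # S302: part-of-speech embeddings
    W_d = embed(dep_labels, M, seed=2)  # S302: dependency embeddings
    # S303: sum matrix W_pd (element-wise addition is an assumption)
    W_pd = [[p + d for p, d in zip(rp, rd)] for rp, rd in zip(W_p, W_d)]
    H = self_attention(W_pd)            # S303: simplified encoder
    d_len = len(H)
    # S304: mean pooling over the d_len feature words, one value per dimension
    return [sum(H[i][m] for i in range(d_len)) / d_len for m in range(M)]
```

Calling `graph_vector(["n", "v", "n"], ["nsubj", "root", "obj"], M=8)` returns a list of 8 floats, matching the coding dimension M that S302 shares with the word granularity coding vector.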
5. The public opinion text emotion analysis method based on semantic dependency relationship fusion features according to claim 1, wherein in step S4, the dependency relationship fusion feature vector obtained by splicing and fusing the word granularity coding vector and the word embedding joint coding vector of the public opinion text to be analyzed is:
Figure FDA00037165802400000211
in the formula,
Figure FDA00037165802400000212
denotes the dependency relationship fusion feature vector of the public opinion text to be analyzed,
Figure FDA00037165802400000213
denotes the word granularity coding vector of the public opinion text to be analyzed, M denotes the coding dimension of the word granularity coding vector $H_{BERT}$, and n denotes the number of words contained in the public opinion text to be analyzed; the symbol
Figure FDA00037165802400000214
denotes element-wise multiplication; g denotes the splicing fusion function of the word granularity coding vector and the word embedding joint coding vector of the public opinion text to be analyzed, with:
$g = \sigma(W_g [H_{BERT} : H_{graph}] + b_g)$;
where $\sigma(\cdot)$ denotes the sigmoid activation function, and $W_g$ and $b_g$ denote the weight matrix and the bias vector of the sigmoid activation, respectively;
Figure FDA0003716580240000031
denotes the spliced vector obtained by splicing the word embedding joint coding vector
Figure FDA0003716580240000032
onto each of the n word dimensions of the word granularity coding vector
Figure FDA0003716580240000033
and has
Figure FDA0003716580240000034
as its dimension.
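The gate $g = \sigma(W_g [H_{BERT} : H_{graph}] + b_g)$ of claim 5 can be sketched in pure Python as below. The gate follows the formula given in the claim text; the way the gate combines the two inputs is not recoverable from this text (the formula is an image), so the convex combination `g * h_bert + (1 - g) * h_graph` used here is an assumption, as is the function name `gated_fusion`.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(H_bert, H_graph, W_g, b_g):
    """Fuse an n x M matrix H_bert with an M-vector H_graph.

    W_g is an M x 2M weight matrix and b_g an M-vector, matching the
    spliced input [H_BERT : H_graph] of length 2M per word.
    """
    fused = []
    for h in H_bert:
        concat = h + H_graph  # splice H_graph onto this word's vector
        gate = [
            sigmoid(sum(w * x for w, x in zip(row, concat)) + b)
            for row, b in zip(W_g, b_g)
        ]
        # Assumed combination rule: element-wise gate between the two sources.
        fused.append([g * hb + (1.0 - g) * hg
                      for g, hb, hg in zip(gate, h, H_graph)])
    return fused
```

With all-zero `W_g` and `b_g` the gate is 0.5 everywhere, so each fused row is the average of the word vector and `H_graph`, which gives a quick sanity check.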
6. The public opinion text emotion analysis method based on semantic dependency relationship fusion features according to claim 1, wherein the public opinion text emotion analysis model in step S5 comprises an entity feature extraction network layer, an aspect feature extraction network layer, an emotional tendency feature extraction network layer, a dependency feature matrix fusion network layer, a fully connected layer and a classifier network layer;
the entity feature extraction network layer is used for extracting an entity feature vector $X_E$ from the input dependency relationship fusion feature vector, outputting it to the dependency feature matrix fusion network layer, and outputting it to the classifier network layer through the fully connected layer;
the aspect feature extraction network layer is used for extracting an aspect feature vector $X_A$ from the input dependency relationship fusion feature vector, outputting it to the dependency feature matrix fusion network layer, and outputting it to the classifier network layer through the fully connected layer;
the emotional tendency feature extraction network layer is used for extracting an emotional tendency feature vector $X_{SC}$ from the input dependency relationship fusion feature vector and outputting it to the dependency feature matrix fusion network layer;
the dependency feature matrix fusion network layer is used for extracting, based on the entity feature vector $X_E$ and the aspect feature vector $X_A$, a relationship feature matrix $X_{A2E}$ of aspect features and entity features, and then fusing the relationship feature matrix $X_{A2E}$ with the emotional tendency feature vector $X_{SC}$ to obtain an emotion and dependency relationship fusion feature matrix $X_S$, which is output to the classifier network layer through the fully connected layer;
the classifier network layer is used for performing emotion classification on the public opinion text to be analyzed, taking the entity feature vector $X_E$, the aspect feature vector $X_A$ and the emotion and dependency relationship fusion feature matrix $X_S$ of the public opinion text to be analyzed as classification features, and outputting the emotion classification prediction result of the public opinion text to be analyzed.
7. The public opinion text emotion analysis method based on semantic dependency relationship fusion features according to claim 6, wherein the entity feature extraction network layer is a one-dimensional convolutional network pre-trained on entity word features by means of an entity dictionary;
the aspect feature extraction network layer is a one-dimensional convolutional network pre-trained on aspect features by means of the aspect feature labeling information of the entity words in the entity dictionary;
the emotional tendency feature extraction network layer is a long short-term memory (LSTM) neural network pre-trained on emotional tendency categories by means of the emotional tendency labeling information of the entity words in the entity dictionary;
the classifier network layer is a classifier model pre-trained for text emotion classification prediction.
8. The public opinion text emotion analysis method based on semantic dependency relationship fusion features according to claim 6, wherein the relationship feature matrix $X_{A2E}$ of aspect features and entity features extracted by the dependency feature matrix fusion network layer is:
Figure FDA0003716580240000035
wherein,
Figure FDA0003716580240000036
is the relationship feature of the aspect feature and the entity feature corresponding to the i-th feature word in the relationship feature matrix $X_{A2E}$, i ∈ {1, 2, …, $d_{len}$}, where $d_{len}$ denotes the total number of feature words obtained by word segmentation of the public opinion text to be analyzed; and:
Figure FDA0003716580240000041
wherein,
Figure FDA0003716580240000042
is the aspect feature corresponding to the i-th feature word in the aspect feature vector $X_A$;
Figure FDA0003716580240000043
is the aspect-oriented attention matrix between the i-th feature word and the j-th feature word, j ∈ {1, 2, …, $d_{len}$}, and:
Figure FDA0003716580240000044
Figure FDA0003716580240000045
is the aspect-oriented relationship between the i-th feature word and the j-th feature word:
Figure FDA0003716580240000046
Figure FDA0003716580240000047
is the entity feature corresponding to the j-th feature word in the entity feature vector $X_E$; T is the transpose symbol.
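The formulas of claim 8 are images and cannot be recovered from this text, but the surrounding definitions (a relationship score between the aspect feature of word i and the entity feature of word j, an attention matrix derived from it, and a transpose) are consistent with ordinary dot-product attention. The sketch below is therefore an assumption, not the claimed formula: scores are plain dot products, the attention weights are a softmax over j, and the output for word i is the weighted sum of entity features. The name `aspect_to_entity` is hypothetical.

```python
import math


def aspect_to_entity(X_A, X_E):
    """Return one relationship feature per feature word (rows of X_A2E).

    X_A: d_len aspect feature vectors; X_E: d_len entity feature vectors.
    """
    out = []
    for a in X_A:
        # Assumed relationship score between word i and every word j.
        scores = [sum(p * q for p, q in zip(a, e)) for e in X_E]
        mx = max(scores)
        exps = [math.exp(s - mx) for s in scores]
        total = sum(exps)
        alpha = [x / total for x in exps]  # attention weights over j
        dim = len(X_E[0])
        out.append([sum(alpha[j] * X_E[j][m] for j in range(len(X_E)))
                    for m in range(dim)])
    return out
```

When an aspect vector is all zeros, every score is equal and the output row is simply the mean of the entity features.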
9. The public opinion text emotion analysis method based on semantic dependency relationship fusion features according to claim 8, wherein the emotion and dependency relationship fusion feature matrix $X_S$ obtained by the fusion processing of the dependency feature matrix fusion network layer is:
Figure FDA0003716580240000048
wherein,
Figure FDA0003716580240000049
is the dependency relationship fusion feature corresponding to the i-th feature word in the emotion and dependency relationship fusion feature matrix $X_S$, i ∈ {1, 2, …, $d_{len}$}, where $d_{len}$ denotes the total number of feature words obtained by word segmentation of the public opinion text to be analyzed;
Figure FDA00037165802400000410
denotes the spliced vector formed by splicing, along the feature word dimension, the aspect-oriented attention matrix between the i-th feature word and the j-th feature word
Figure FDA00037165802400000411
with the dependency relationship fusion feature corresponding to the i-th feature word
Figure FDA00037165802400000412
and has
Figure FDA00037165802400000413
as its dimension; and:
Figure FDA00037165802400000414
wherein,
Figure FDA00037165802400000415
is the emotional tendency feature corresponding to the i-th feature word in the emotional tendency feature vector $X_{SC}$;
Figure FDA00037165802400000416
is the emotional-tendency-oriented attention matrix between the i-th feature word and the j-th feature word, j ∈ {1, 2, …, $d_{len}$}, and:
Figure FDA00037165802400000417
Figure FDA00037165802400000418
is the emotional-tendency-oriented self-attention between the i-th feature word and the j-th feature word:
Figure FDA00037165802400000419
$h_i$ is the self-attention of the i-th feature word; |i-j| denotes the dependency relationship distance between the i-th feature word and the j-th feature word in the public opinion text to be analyzed; T is the transpose symbol.
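Claim 9 states that the emotional-tendency-oriented self-attention between feature words i and j involves $h_i$, a transpose, and the dependency relationship distance |i-j|, but the exact formula is an image and is not recoverable here. The sketch below shows one way a distance term can enter an attention score; the damping rule `dot / (1 + |i - j|)` is purely an assumption chosen for illustration, the index difference stands in for the dependency distance, and `distance_aware_attention` is a hypothetical name.

```python
import math


def distance_aware_attention(H, X_SC):
    """Attend over sentiment features with distance-damped scores.

    H: per-word vectors used for scoring; X_SC: per-word emotional
    tendency features. Both have d_len rows.
    """
    n = len(H)
    dim = len(X_SC[0])
    out = []
    for i in range(n):
        # Assumed rule: dot product divided by (1 + |i - j|), so that
        # distant feature words contribute less to the score.
        scores = [
            sum(a * b for a, b in zip(H[i], H[j])) / (1 + abs(i - j))
            for j in range(n)
        ]
        mx = max(scores)
        exps = [math.exp(s - mx) for s in scores]
        total = sum(exps)
        out.append([sum(exps[j] / total * X_SC[j][m] for j in range(n))
                    for m in range(dim)])
    return out
```

A dependency parser would normally supply the distance as a path length in the dependency tree rather than a difference of positions.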
10. The public opinion text emotion analysis method based on semantic dependency relationship fusion features according to claim 1, wherein the public opinion text emotion analysis model is obtained through training and optimization by stochastic gradient descent, and the loss function used for the training and optimization is:
Figure FDA0003716580240000051
where J denotes the total number of label categories annotated on the sample public opinion texts, and N denotes the number of subtasks to be trained in the public opinion text emotion analysis model;
Figure FDA0003716580240000052
denote, respectively, the true label and the predicted label of a sample public opinion text on the j-th label category of the i-th subtask to be trained; μ is a hyperparameter of the loss function, and θ denotes the weight parameters of the public opinion text emotion analysis model; $\|\cdot\|_2$ denotes the L2 norm operation.
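The loss formula of claim 10 is an image, but its symbol list (N subtasks, J label categories, true and predicted labels, a hyperparameter μ, weights θ and an L2 norm) matches a multi-task cross-entropy with L2 regularization. The sketch below assumes that form, with the norm squared; both the form and the name `multitask_loss` are assumptions rather than the claimed expression.

```python
import math


def multitask_loss(y_true, y_pred, theta, mu=1e-3, eps=1e-12):
    """Cross-entropy over N subtasks and J labels plus mu * ||theta||_2^2.

    y_true, y_pred: N lists of J values each (one-hot labels and
    predicted probabilities); theta: flat list of model weights.
    """
    ce = 0.0
    for true_task, pred_task in zip(y_true, y_pred):
        for yt, yp in zip(true_task, pred_task):
            # eps keeps log() finite when a predicted probability is 0.
            ce -= yt * math.log(yp + eps)
    l2 = sum(w * w for w in theta)
    return ce + mu * l2
```

For a perfect prediction the cross-entropy term vanishes and only the regularization term remains, for example `multitask_loss([[1, 0]], [[1.0, 0.0]], [3.0, 4.0], mu=0.1)` is approximately 2.5.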
CN202210744752.3A 2022-06-27 2022-06-27 Public opinion text emotion analysis method based on semantic dependency relationship fusion characteristics Active CN115098634B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210744752.3A CN115098634B (en) 2022-06-27 2022-06-27 Public opinion text emotion analysis method based on semantic dependency relationship fusion characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210744752.3A CN115098634B (en) 2022-06-27 2022-06-27 Public opinion text emotion analysis method based on semantic dependency relationship fusion characteristics

Publications (2)

Publication Number Publication Date
CN115098634A true CN115098634A (en) 2022-09-23
CN115098634B CN115098634B (en) 2024-07-02

Family

ID=83295363

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210744752.3A Active CN115098634B (en) 2022-06-27 2022-06-27 Public opinion text emotion analysis method based on semantic dependency relationship fusion characteristics

Country Status (1)

Country Link
CN (1) CN115098634B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103646088A (en) * 2013-12-13 2014-03-19 合肥工业大学 Product comment fine-grained emotional element extraction method based on CRFs and SVM
CN109582764A (en) * 2018-11-09 2019-04-05 华南师范大学 Interaction attention sentiment analysis method based on interdependent syntax
US20200159832A1 (en) * 2018-11-15 2020-05-21 Fei CAI Device and text representation method applied to sentence embedding
CN109753566A (en) * 2019-01-09 2019-05-14 大连民族大学 The model training method of cross-cutting sentiment analysis based on convolutional neural networks
WO2022068314A1 (en) * 2020-09-29 2022-04-07 华为技术有限公司 Neural network training method, neural network compression method and related devices

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
JUNHUA MA et al.: "Integrating dependency tree into self-attention for sentence representation", IEEE International Conference on Acoustics, Speech and Signal Processing, 27 May 2022 (2022-05-27), pages 1 - 10 *
ZHANG YANGSEN; ZHENG JIA; LI JIAYUAN: "A word semantic relatedness computation model based on semantic relation graphs", Acta Automatica Sinica, vol. 44, no. 01, 15 January 2018 (2018-01-15), pages 87 - 98 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115759104A (en) * 2023-01-09 2023-03-07 山东大学 Financial field public opinion analysis method and system based on entity recognition
CN115759104B (en) * 2023-01-09 2023-09-22 山东大学 Financial domain public opinion analysis method and system based on entity identification
CN116522013A (en) * 2023-06-29 2023-08-01 乐麦信息技术(杭州)有限公司 Public opinion analysis method and system based on social network platform
CN116522013B (en) * 2023-06-29 2023-09-05 乐麦信息技术(杭州)有限公司 Public opinion analysis method and system based on social network platform
CN117648980A (en) * 2024-01-29 2024-03-05 数据空间研究院 Novel entity relationship joint extraction algorithm based on contradiction dispute data
CN117648980B (en) * 2024-01-29 2024-04-12 数据空间研究院 Novel entity relationship joint extraction method based on contradiction dispute data
CN118377909A (en) * 2024-06-21 2024-07-23 杭州贵禾科技有限公司 Customer label determining method and device based on call content and storage medium
CN118377909B (en) * 2024-06-21 2024-08-27 杭州贵禾科技有限公司 Customer label determining method and device based on call content and storage medium

Also Published As

Publication number Publication date
CN115098634B (en) 2024-07-02

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant