CN114297390A - Aspect category identification method and system under long-tail distribution scene - Google Patents
Aspect category identification method and system under long-tail distribution scene
- Publication number: CN114297390A
- Application number: CN202111681644.8A
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Abstract
The invention discloses an aspect category identification method and system for long-tail distribution scenarios, belonging to the technical field of natural language processing. The method is built on an aspect category identification system that attends to the long-tail distribution of the data: it first obtains fine-grained aspect feature vectors for sentences, providing additional contextual aspect-level semantic information; it then adds an attention mechanism, based on the long-tail distribution, that fuses this contextual aspect-level semantic information, strengthening the model's ability to capture the information most relevant to each aspect category. An improved distribution-balanced loss function is also proposed to address label co-occurrence and the dominance of negative classes in long-tail multi-label text classification, effectively improving aspect category identification on data with long-tail characteristics.
Description
Technical Field
The invention relates to an aspect category identification method and system for long-tail distribution scenarios, and belongs to the technical field of natural language processing.
Background
Aspect Category Detection (ACD), one of the important subtasks of aspect-level sentiment analysis, aims to detect which aspect categories from a predefined set are mentioned in a sentence. Aspect category identification is thus the foundational task of aspect-level sentiment analysis as a whole. Sentiment analysis is widely applied in many areas of daily life: for example, analyzing users' opinions on topics expressed in social media, restaurant reviews, and online shopping can give consumers a better purchasing experience and help merchants understand market demand.
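As a toy illustration of the task (not the patented method), ACD maps a sentence to the subset of a predefined aspect set it mentions; the keyword cues below are hypothetical stand-ins for a learned model:

```python
# Toy illustration of Aspect Category Detection (ACD): given a predefined
# aspect set, a sentence is mapped to the subset of aspects it mentions.
# Keyword matching is a hypothetical stand-in for the learned model.
ASPECTS = ["food", "service", "price", "ambience"]  # predefined aspect categories

def detect_aspects(sentence: str) -> list[str]:
    cues = {  # hypothetical cue words, for illustration only
        "food": ["pizza", "dish", "taste"],
        "service": ["waiter", "staff", "service"],
        "price": ["cheap", "expensive", "price"],
        "ambience": ["decor", "music", "atmosphere"],
    }
    s = sentence.lower()
    return [a for a in ASPECTS if any(c in s for c in cues[a])]

print(detect_aspects("The waiter was rude but the pizza tasted great."))
# ['food', 'service']
```

Because a sentence can mention several aspects at once, the learned version of this task is a multi-label classification problem, which is how the invention models it.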
However, in actual research, the aspect category distribution often presents the characteristics of unbalanced or even long-tail distribution, so that the model cannot sufficiently extract the features of the tail aspect category, which brings great challenges to the aspect category identification task.
Some existing work addresses this problem with classical machine learning or deep learning models. For example, Ghadery, E. et al. (Ghadery, E., et al., MNCN: A Multilingual Ngram-Based Convolutional Network for Aspect Category Detection in Online Reviews. 2019. 33: p. 6441-6448.) use multilingual word embeddings as the network input, extract features with a deep convolutional neural network, and then learn and identify the different aspect categories with separate fully connected layers. Hu, M. et al. (Hu, M., et al., CAN: Constrained Attention Networks for Multi-Aspect Sentiment Analysis. 2018.) introduce sparse and orthogonal regularization to compute attention weights for multiple aspects, so that the attention weights of different aspects focus on different parts of the sentence while each aspect's attention concentrates on only a few words. Movahedi, S. et al. (Movahedi, S., et al., Aspect Category Detection via Topic Attention Network. 2019.) propose a topic attention network that detects the aspect categories of a given sentence by attending to its different parts. Li, Y. et al. (Li, Y., et al., Multi-Instance Multi-Label Learning Networks for Aspect-Category Sentiment Analysis. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP). 2020.) propose a joint multi-instance multi-label learning model for aspect-level sentiment analysis, in which an attention-based ACD component generates salient attention weights for the different aspect categories.
However, data collected from real scenarios is often unbalanced and frequently exhibits a long-tail distribution: a few classes (the head classes) account for most of the data, while most classes (the tail classes) have very few samples. The prior-art methods above ignore this gap in sample counts when training the model. When the numbers of training samples differ too greatly across classes, the model cannot identify the sample-scarce classes well; the resulting class imbalance, and especially the long tail it produces, degrades the learning process and hence recognition performance.
Disclosure of Invention
The invention provides an aspect category identification method and system for long-tail distribution scenarios, aiming to solve the problem that, under a long-tail distribution, the large gap in training sample counts between classes prevents a model from identifying sample-scarce aspect categories well.
The first object of the invention is to provide an aspect category identification method under a long-tail distribution scenario, characterized in that the method performs aspect category identification on the N sentences of a data set D = {(S^1, y^1), (S^2, y^2), ..., (S^N, y^N)}, wherein S^l = {w_1, w_2, ..., w_n} is the l-th sentence in the data set D, consisting of n words, w_n represents the n-th word of the l-th sentence S^l, and y^l is the aspect category label corresponding to the l-th sentence S^l;
the method comprises the following steps:
Step 1: define m aspect categories in advance, denoted A = {a_1, a_2, ..., a_m}, where a_m is the word or phrase describing the m-th aspect;
Step 2: construct a word embedding matrix E_1 ∈ R^{|V|×d}; each word w_i is mapped by the word embedding matrix E_1 to an embedding e_i^w, where |V| is the vocabulary size of the data set D and d is the dimension of a word vector;
simultaneously construct an aspect category embedding matrix E_2 ∈ R^{m×d}; each aspect word a_i is mapped by the aspect category embedding matrix E_2 to an embedding e_i^a;
Step 3: input the text embedding vectors e^w and the aspect embedding vectors e^a into a long short-term memory network (LSTM) to obtain the network's output hidden states for the sentence, H^w ∈ R^{n×d} and H^a ∈ R^{m×d};
Step 4: input the hidden states H^w and H^a into the IAN-LoT mechanism to obtain a total aspect vector representation s fusing the long-tail distribution characteristics;
Step 5: input the total aspect vector s into an attention mechanism fusing contextual aspect-level semantic information;
taking the total aspect vector s and H^w as input, compute the fusion vector s̃, as shown in equation (1):
s̃ = H^w + W s  (1)
wherein W ∈ R^{n×1} is a learnable weight parameter blending each word with the aspect, and s̃ ∈ R^{n×d} is the vector representing the fused contextual aspect-level semantic information; s̃ is input to the attention mechanism to generate an attention weight vector for each predefined aspect category; for the j-th aspect category, as shown in equation (2):
α_j = softmax(β_j · tanh(s̃ W_j + b_j) u_j)  (2)
wherein W_j ∈ R^{d×d}, b_j ∈ R^d and u_j ∈ R^d are learnable parameters, β ∈ R^{1×m} represents the long-tail distribution characteristic learned in advance (the reciprocal of the effective number of samples of each class in the training set), and α_j ∈ R^n is the attention weight vector;
Step 6: use the vector v_j = α_j s̃ as the predicted sentence representation; for the j-th aspect category, as shown in equation (3):
ŷ_j = σ(v_j W_j + b_j)  (3)
wherein W_j ∈ R^{d×1}, b_j is a scalar, and ŷ_j is the prediction for the j-th aspect category; when the prediction for the j-th aspect category exceeds the classification threshold, the sentence is considered to contain the j-th aspect category.
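The equations referenced above appear only as images in the source, so the following NumPy sketch is an assumption: it reproduces steps 3-6 at the shape level, using random matrices in place of trained LSTM states and parameters, with the operation forms constrained by the stated dimensions of W, W_j, b_j, u_j, and β.

```python
import numpy as np

# Shape-level sketch of the recognition pipeline (assumed forms, not the
# patented formulas): n words, m aspects, hidden dimension d.
rng = np.random.default_rng(0)
n, m, d = 7, 5, 16

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

Hw = rng.normal(size=(n, d))          # stand-in for LSTM states of the words
Ha = rng.normal(size=(m, d))          # stand-in for LSTM states of the aspects
beta = 1.0 / rng.integers(1, 100, m)  # inverse effective sample counts (long tail)

# IAN-LoT: interaction attention fused with the long-tail prior
I = Hw @ Ha.T                         # (n, m) interaction attention scores
k = softmax(I, axis=1) * beta         # row-wise softmax scaled by beta
I_L = k.max(axis=0, keepdims=True)    # max pooling over words -> (1, m)
s = I_L @ Ha                          # total aspect vector, (1, d)

# Fusion + per-aspect attention + prediction
W = rng.normal(size=(n, 1))
s_tilde = Hw + W @ s                  # fusion vector, (n, d)
y_hat = np.empty(m)
for j in range(m):
    Wj, bj, uj = rng.normal(size=(d, d)), rng.normal(size=d), rng.normal(size=d)
    alpha = softmax(beta[j] * np.tanh(s_tilde @ Wj + bj) @ uj)   # (n,)
    v = alpha @ s_tilde                                          # (d,)
    wj, b0 = rng.normal(size=d), rng.normal()
    y_hat[j] = 1.0 / (1.0 + np.exp(-(v @ wj + b0)))              # sigmoid score

predicted = (y_hat > 0.5).astype(int)  # classification threshold of 0.5
print(y_hat.shape, predicted.shape)
```

Each aspect category gets its own attention vector over the n words, so the same sentence can be represented differently for each of the m independent sigmoid decisions, which is what makes the task multi-label.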
Optionally, the step of computing, in the IAN-LoT mechanism, the total aspect vector fusing the long-tail distribution characteristics comprises:
Step 41: for the input hidden states H^w and H^a, compute the interaction attention weight matrix I ∈ R^{n×m}, as shown in equation (4):
I = H^w (H^a)^T  (4)
Step 42: perform a softmax over each row of the interaction attention weight matrix, as shown in equation (5):
k_ij = exp(I_ij) / Σ_{j'=1}^{m} exp(I_ij')  (5)
wherein k_ij is the element in row i, column j of the matrix k ∈ R^{n×m}, k represents the attention weights of the text over the aspects, and I_ij is the element in row i, column j of the matrix I;
Step 43: then introduce the long-tail distribution characteristic of the data into the matrix k, as shown in equation (6):
k̃_ij = β_j k_ij  (6)
wherein k̃ is the text-to-aspect weight information with the long-tail distribution introduced, β ∈ R^{1×m} represents the long-tail distribution characteristic learned in advance (the reciprocal of the effective number of samples of each class in the training set), and m is the number of aspect categories;
Step 44: perform max pooling over k̃ to obtain the fine-grained text-to-aspect weight information I_L fused with the long-tail distribution characteristic, and multiply this weight information by the embedded vector representation of the aspect categories to obtain the final total aspect vector representation s, as shown in equation (7):
s = I_L E_2  (7)
wherein s ∈ R^{1×d}.
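The long-tail prior β is described only as "the reciprocal of the effective number of samples in the training set"; the patent does not show the formula. One common choice it may refer to is Cui et al.'s effective-number weighting, sketched here as an assumption:

```python
import numpy as np

# Hedged sketch: "effective number" of samples per class in the sense of
# Cui et al.'s class-balanced weighting, E_n = (1 - g**n) / (1 - g).
# The patent's exact beta formula is not shown, so this is an assumption.
def effective_number(counts, g=0.999):
    counts = np.asarray(counts, dtype=float)
    return (1.0 - g ** counts) / (1.0 - g)

counts = np.array([1200, 300, 45, 8])   # head -> tail aspect frequencies
beta = 1.0 / effective_number(counts)   # larger weight for tail aspects
print(beta)
```

Under this choice, tail aspects (few samples) receive a much larger β than head aspects, which is exactly the rebalancing direction the IAN-LoT scaling in equation (6) needs.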
Optionally, the method trains the recognition model with an improved A-DB loss function, which improves the computation of the rebalancing weight and the smoothing function; specifically:
first, ignoring label co-occurrence, let n_j denote the number of samples in the data set containing the j-th aspect category; the expected sampling frequency of the j-th aspect category is then P_j^C = 1/(m n_j); next, the instance-level sampling frequency P^I is estimated over the positive classes of each repeatedly-sampled instance, as shown in equation (8):
P_l^I = (1/m) Σ_{j=1}^{m} y_j^l / n_j  (8)
wherein y_j^l = 1 indicates that the l-th sentence contains the j-th aspect category a_j, and y_j^l = 0 that it does not;
the rebalancing weight is taken as the ratio of the two frequencies and passed through a smoothing function, as shown in equations (9) and (10), wherein γ is a coordination weight hyper-parameter;
to avoid over-suppression of minority classes caused by the dominance of negative labels, a negative-class suppression hyper-parameter λ and a class-specific bias τ_j are introduced, as shown in equation (11):
τ_j = η log(1/ρ_j − 1)  (11)
wherein ρ_j is the ratio of samples of the j-th category to the total number of samples, and η is a scale hyper-parameter;
the A-DB loss function is shown in equation (12):
Optionally, the classification threshold is 0.5.
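The improved A-DB rebalancing weight and smoothing function are not reproduced in the source text. As a hedged sketch, the following follows the original distribution-balanced loss formulation (class-level vs. instance-level sampling frequency, smoothed ratio); the patent's "improved" versions presumably modify these formulas:

```python
import numpy as np

# Hedged sketch of distribution-balanced rebalancing in the spirit of the
# A-DB loss. The exact improved weight/smoothing formulas are not shown in
# the source, so this follows the original DB-loss form as an assumption.
def rebalance_weights(Y, a=0.1, g=10.0, mu=0.2):
    """Y: (N, C) binary label matrix; returns per-example, per-class weights."""
    N, C = Y.shape
    n_j = Y.sum(axis=0)                      # samples containing class j
    P_C = 1.0 / (C * n_j)                    # class-level sampling frequency
    P_I = (Y / n_j).sum(axis=1) / C          # instance-level sampling frequency
    r = P_C[None, :] / P_I[:, None]          # raw rebalancing ratio
    return a + 1.0 / (1.0 + np.exp(-g * (r - mu)))  # smoothed weight

Y = np.array([[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 0, 0]])
w = rebalance_weights(Y)
print(w.shape)
```

Examples whose positive labels are all head classes get a high instance-level frequency and therefore a small weight; examples containing tail classes are up-weighted, counteracting the label co-occurrence bias described above.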
A second object of the present invention is to provide an aspect category identification system for long-tail distribution scenarios, wherein the system comprises: an input module, a text embedding module, an LSTM module, an IAN-LoT module, a fusion module, an attention mechanism module, and a prediction module;
the input module, text embedding module, LSTM module, IAN-LoT module, fusion module, attention mechanism module, and prediction module are connected in sequence;
the input module is used to input the predefined aspect category set and the text to be recognized; the text embedding module is used to construct the word embedding matrix and the aspect category embedding matrix, and to map the input predefined aspect category set and the text to be recognized to text embedding vectors and aspect embedding vectors; the LSTM module outputs the hidden states for the text embedding vectors and aspect embedding vectors; the IAN-LoT module obtains, from the hidden states, the total aspect vector fusing the long-tail distribution characteristics; the fusion module fuses the contextual aspect-level semantic information and generates the fusion vector; the attention mechanism module generates an attention weight vector for each predefined aspect category from the fusion vector; and the prediction module completes the classification and prediction of aspect category identification from the attention weight vectors.
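The module wiring described above can be sketched as a simple sequential pipeline; the stage names mirror the patent's modules, while the stages themselves are illustrative stand-ins:

```python
# Illustrative sketch of the sequentially connected modules; each stage is a
# callable applied in the stated order. The stages below are identity
# functions that record the order of execution, for illustration only.
from typing import Callable, List

def make_pipeline(stages: List[Callable]):
    def run(x):
        for stage in stages:
            x = stage(x)
        return x
    return run

trace = []
def stage(name):
    def f(x):
        trace.append(name)  # record which module ran
        return x
    return f

pipeline = make_pipeline([
    stage("input"), stage("text_embedding"), stage("lstm"),
    stage("ian_lot"), stage("fusion"), stage("attention"), stage("prediction"),
])
pipeline(None)
print(trace)
```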
Optionally, the working process by which the system performs aspect category identification on the N sentences of the data set D = {(S^1, y^1), (S^2, y^2), ..., (S^N, y^N)} comprises the following steps:
wherein S^l = {w_1, w_2, ..., w_n} is the l-th sentence in the data set D, consisting of n words, w_i represents the i-th word of the l-th sentence S^l, and y^l is the aspect category label corresponding to the l-th sentence S^l;
Step 1: define m aspect categories in advance, denoted A = {a_1, a_2, ..., a_m}, where a_m is the word or phrase describing the m-th aspect;
Step 2: construct a word embedding matrix E_1 ∈ R^{|V|×d}; each word w_i is mapped by the word embedding matrix E_1 to an embedding e_i^w, where |V| is the vocabulary size of the data set D and d is the dimension of a word vector;
simultaneously construct an aspect category embedding matrix E_2 ∈ R^{m×d}; each aspect word a_i is mapped by the aspect category embedding matrix E_2 to an embedding e_i^a;
Step 3: input the text embedding vectors e^w and the aspect embedding vectors e^a into a long short-term memory network (LSTM) to obtain the network's output hidden states for the sentence, H^w ∈ R^{n×d} and H^a ∈ R^{m×d};
Step 4: input the hidden states H^w and H^a into the IAN-LoT mechanism to obtain a total aspect vector representation s fusing the long-tail distribution characteristics;
Step 5: input the total aspect vector s into an attention mechanism fusing contextual aspect-level semantic information;
taking the total aspect vector s and H^w as input, compute the fusion vector s̃, as shown in equation (1):
s̃ = H^w + W s  (1)
wherein W ∈ R^{n×1} is a learnable weight parameter blending each word with the aspect, and s̃ ∈ R^{n×d} is the vector representing the fused contextual aspect-level semantic information; s̃ is input to the attention mechanism to generate an attention weight vector for each predefined aspect category; for the j-th aspect category, as shown in equation (2):
α_j = softmax(β_j · tanh(s̃ W_j + b_j) u_j)  (2)
wherein W_j ∈ R^{d×d}, b_j ∈ R^d and u_j ∈ R^d are learnable parameters, β ∈ R^{1×m} represents the long-tail distribution characteristic learned in advance (the reciprocal of the effective number of samples of each class in the training set), and α_j ∈ R^n is the attention weight vector;
Step 6: use the vector v_j = α_j s̃ as the predicted sentence representation; for the j-th aspect category, as shown in equation (3):
ŷ_j = σ(v_j W_j + b_j)  (3)
wherein W_j ∈ R^{d×1}, b_j is a scalar, and ŷ_j is the prediction for the j-th aspect category; when the prediction for the j-th aspect category exceeds the classification threshold, the sentence is considered to contain the j-th aspect category.
Optionally, the step of computing, in the IAN-LoT mechanism, the total aspect vector fusing the long-tail distribution characteristics comprises:
Step 41: for the input hidden states H^w and H^a, compute the interaction attention weight matrix I ∈ R^{n×m}, as shown in equation (4):
I = H^w (H^a)^T  (4)
Step 42: perform a softmax over each row of the interaction attention weight matrix, as shown in equation (5):
k_ij = exp(I_ij) / Σ_{j'=1}^{m} exp(I_ij')  (5)
wherein k_ij is the element in row i, column j of the matrix k ∈ R^{n×m}, k represents the attention weights of the text over the aspects, and I_ij is the element in row i, column j of the matrix I;
Step 43: then introduce the long-tail distribution characteristic of the data into the matrix k, as shown in equation (6):
k̃_ij = β_j k_ij  (6)
wherein k̃ is the text-to-aspect weight information with the long-tail distribution introduced, β ∈ R^{1×m} represents the long-tail distribution characteristic learned in advance (the reciprocal of the effective number of samples of each class in the training set), and m is the number of aspect categories;
Step 44: perform max pooling over k̃ to obtain the fine-grained text-to-aspect weight information I_L fused with the long-tail distribution characteristic, and multiply this weight information by the embedded vector representation of the aspect categories to obtain the final total aspect vector representation s, as shown in equation (7):
s = I_L E_2  (7)
wherein s ∈ R^{1×d}.
Optionally, the system trains the recognition model with an improved A-DB loss function, which improves the computation of the rebalancing weight and the smoothing function; specifically:
first, ignoring label co-occurrence, let n_j denote the number of samples in the data set containing the j-th aspect category; the expected sampling frequency of the j-th aspect category is then P_j^C = 1/(m n_j); next, the instance-level sampling frequency P^I is estimated over the positive classes of each repeatedly-sampled instance, as shown in equation (8):
P_l^I = (1/m) Σ_{j=1}^{m} y_j^l / n_j  (8)
wherein y_j^l = 1 indicates that the l-th sentence contains the j-th aspect category a_j, and y_j^l = 0 that it does not;
the rebalancing weight is taken as the ratio of the two frequencies and passed through a smoothing function, as shown in equations (9) and (10), wherein γ is a coordination weight hyper-parameter;
to avoid over-suppression of minority classes caused by the dominance of negative labels, a negative-class suppression hyper-parameter λ and a class-specific bias τ_j are introduced, as shown in equation (11):
τ_j = η log(1/ρ_j − 1)  (11)
wherein ρ_j is the ratio of samples of the j-th category to the total number of samples, and η is a scale hyper-parameter;
the A-DB loss function is shown in equation (12):
The invention has the following beneficial effects:
Aiming at aspect category identification on data with long-tail distribution, the invention models aspect category identification as a multi-label classification problem and proposes an identification model based on aspect vectors fused with the long-tail distribution:
1) an A-DB loss function suited to the multi-label long-tail problem is introduced to train the recognition model, so the method adapts effectively to data with long-tail characteristics and improves aspect category identification on such data;
2) an interactive attention module incorporating the data's long-tail distribution, the IAN-LoT mechanism, is introduced; it yields fine-grained aspect feature vectors for sentences, provides additional contextual aspect-level semantic information, and effectively improves recognition of tail categories;
3) after the fine-grained aspect feature vectors are obtained, the invention further provides an attention mechanism fusing contextual aspect-level semantic information; fusing the aspect vectors with the context information focuses attention on the correct aspect-relevant information, strengthens the model's ability to capture the information most relevant to each aspect category, and makes the model more effective on long-tail aspect category identification.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a model architecture diagram of the present invention.
Fig. 2 is a flow chart of the method of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The first embodiment is as follows:
The embodiment provides an aspect category identification method under a long-tail distribution scenario; the method performs aspect category identification on the N sentences of a data set D = {(S^1, y^1), (S^2, y^2), ..., (S^N, y^N)}, wherein S^l = {w_1, w_2, ..., w_n} is the l-th sentence in the data set D, consisting of n words, w_n represents the n-th word of the l-th sentence S^l, and y^l is the aspect category label corresponding to the l-th sentence S^l;
the method comprises the following steps:
Step 1: define m aspect categories in advance, denoted A = {a_1, a_2, ..., a_m}, where a_m is the word or phrase describing the m-th aspect;
Step 2: construct a word embedding matrix E_1 ∈ R^{|V|×d}; each word w_i is mapped by the word embedding matrix E_1 to an embedding e_i^w, where |V| is the vocabulary size of the data set D and d is the dimension of a word vector;
simultaneously construct an aspect category embedding matrix E_2 ∈ R^{m×d}; each aspect word a_i is mapped by the aspect category embedding matrix E_2 to an embedding e_i^a;
Step 3: input the text embedding vectors e^w and the aspect embedding vectors e^a into a long short-term memory network (LSTM) to obtain the network's output hidden states for the sentence, H^w ∈ R^{n×d} and H^a ∈ R^{m×d};
Step 4: input the hidden states H^w and H^a into the IAN-LoT mechanism to obtain a total aspect vector representation s fusing the long-tail distribution characteristics;
Step 5: input the total aspect vector s into an attention mechanism fusing contextual aspect-level semantic information;
taking the total aspect vector s and H^w as input, compute the fusion vector s̃, as shown in equation (1):
s̃ = H^w + W s  (1)
wherein W ∈ R^{n×1} is a learnable weight parameter blending each word with the aspect, and s̃ ∈ R^{n×d} is the vector representing the fused contextual aspect-level semantic information; s̃ is input to the attention mechanism to generate an attention weight vector for each predefined aspect category; for the j-th aspect category, as shown in equation (2):
α_j = softmax(β_j · tanh(s̃ W_j + b_j) u_j)  (2)
wherein W_j ∈ R^{d×d}, b_j ∈ R^d and u_j ∈ R^d are learnable parameters, β ∈ R^{1×m} represents the long-tail distribution characteristic learned in advance (the reciprocal of the effective number of samples of each class in the training set), and α_j ∈ R^n is the attention weight vector;
Step 6: use the vector v_j = α_j s̃ as the predicted sentence representation; for the j-th aspect category, as shown in equation (3):
ŷ_j = σ(v_j W_j + b_j)  (3)
wherein W_j ∈ R^{d×1}, b_j is a scalar, and ŷ_j is the prediction for the j-th aspect category; when the prediction for the j-th aspect category exceeds the classification threshold, the sentence is considered to contain the j-th aspect category.
Example two:
the method of the embodiment introduces an interactive attention module with the characteristic of data long-tail distribution, namely an IAN-LoT mechanism, and can obtain fine-grained aspect feature vectors of sentences and provide additional context aspect-level semantic information. After obtaining the fine-grained aspect feature vectors, the model also adds an attention mechanism based on long-tail distribution and fusing context-level semantic information, enhances the capability of the model to capture information most relevant to aspect categories, and is trained by using an improved multi-label classification loss function suitable for long-tail distribution.
This embodiment performs aspect category identification on the N sentences of a data set D = {(S^1, y^1), (S^2, y^2), ..., (S^N, y^N)}, wherein S^l = {w_1, w_2, ..., w_n} is the l-th sentence in the data set D, consisting of n words, w_n represents the n-th word of the l-th sentence S^l, and y^l is the aspect category label corresponding to the l-th sentence S^l;
the method comprises the following steps:
Step 1: define m aspect categories in advance, denoted A = {a_1, a_2, ..., a_m}, where a_m is the word or phrase describing the m-th aspect;
Step 2: construct a word embedding matrix E_1 ∈ R^{|V|×d}; each word w_i is mapped by the word embedding matrix E_1 to an embedding e_i^w, where |V| is the vocabulary size of the data set D and d is the dimension of a word vector;
simultaneously construct an aspect category embedding matrix E_2 ∈ R^{m×d}; each aspect word a_i is mapped by the aspect category embedding matrix E_2 to an embedding e_i^a;
Step 3: input the text embedding vectors e^w and the aspect embedding vectors e^a into a long short-term memory network (LSTM) to obtain the network's output hidden states for the sentence, H^w ∈ R^{n×d} and H^a ∈ R^{m×d};
Step 4: input the hidden states H^w and H^a into the IAN-LoT mechanism to obtain a total aspect vector representation s fusing the long-tail distribution characteristics;
Step 41: for the input hidden states H^w and H^a, compute the interaction attention weight matrix I ∈ R^{n×m}, as shown in equation (1):
I = H^w (H^a)^T  (1)
Step 42: perform a softmax over each row of the interaction attention weight matrix, as shown in equation (2):
k_ij = exp(I_ij) / Σ_{j'=1}^{m} exp(I_ij')  (2)
wherein k_ij is the element in row i, column j of the matrix k ∈ R^{n×m}, k represents the attention weights of the text over the aspects, and I_ij is the element in row i, column j of the matrix I;
Step 43: then introduce the long-tail distribution characteristic of the data into the matrix k, as shown in equation (3):
k̃_ij = β_j k_ij  (3)
wherein k̃ is the text-to-aspect weight information with the long-tail distribution introduced, β ∈ R^{1×m} represents the long-tail distribution characteristic learned in advance (the reciprocal of the effective number of samples of each class in the training set), and m is the number of aspect categories;
Step 44: perform max pooling over k̃ to obtain the fine-grained text-to-aspect weight information I_L fused with the long-tail distribution characteristic, and multiply this weight information by the embedded vector representation of the aspect categories to obtain the final total aspect vector representation s, as shown in equation (4):
s = I_L E_2  (4)
wherein s ∈ R^{1×d}.
Step 5: input the total aspect vector s into an attention mechanism fusing contextual aspect-level semantic information;
taking the total aspect vector s and H^w as input, compute the fusion vector s̃, as shown in equation (5):
s̃ = H^w + W s  (5)
wherein W ∈ R^{n×1} is a learnable weight parameter blending each word with the aspect, and s̃ ∈ R^{n×d} is the vector representing the fused contextual aspect-level semantic information; s̃ is input to the attention mechanism to generate an attention weight vector for each predefined aspect category; for the j-th aspect category, as shown in equation (6):
α_j = softmax(β_j · tanh(s̃ W_j + b_j) u_j)  (6)
wherein W_j ∈ R^{d×d}, b_j ∈ R^d and u_j ∈ R^d are learnable parameters, β ∈ R^{1×m} represents the long-tail distribution characteristic learned in advance (the reciprocal of the effective number of samples of each class in the training set), and α_j ∈ R^n is the attention weight vector;
Step 6: use the vector v_j = α_j s̃ as the predicted sentence representation; for the j-th aspect category, as shown in equation (7):
ŷ_j = σ(v_j W_j + b_j)  (7)
wherein W_j ∈ R^{d×1}, b_j is a scalar, and ŷ_j is the prediction for the j-th aspect category; when the prediction exceeds the classification threshold of 0.5, the sentence is considered to contain the j-th aspect category.
The method of this embodiment trains the recognition model with an improved A-DB loss function, which improves the computation of the rebalancing weight and the smoothing function; specifically:
first, ignoring label co-occurrence, let n_j denote the number of samples in the data set containing the j-th aspect category; the expected sampling frequency of the j-th aspect category is then P_j^C = 1/(m n_j); next, the instance-level sampling frequency P^I is estimated over the positive classes of each repeatedly-sampled instance, as shown in equation (8):
P_l^I = (1/m) Σ_{j=1}^{m} y_j^l / n_j  (8)
wherein y_j^l = 1 indicates that the l-th sentence contains the j-th aspect category a_j, and y_j^l = 0 that it does not;
the rebalancing weight is taken as the ratio of the two frequencies and passed through a smoothing function, as shown in equations (9) and (10), wherein γ is a coordination weight hyper-parameter;
to avoid over-suppression of minority classes caused by the dominance of negative labels, a negative-class suppression hyper-parameter λ and a class-specific bias τ_j are introduced, as shown in equation (11):
τ_j = η log(1/ρ_j − 1)  (11)
wherein ρ_j is the ratio of samples of the j-th category to the total number of samples, and η is a scale hyper-parameter;
the A-DB loss function is shown in equation (12):
Example three:
The embodiment provides an aspect category identification system under a long-tail distribution scenario, the system comprising: an input module, a text embedding module, an LSTM module, an IAN-LoT module, a fusion module, an attention mechanism module, and a prediction module;
the input module, text embedding module, LSTM module, IAN-LoT module, fusion module, attention mechanism module, and prediction module are connected in sequence;
the input module is used to input the predefined aspect category set and the text to be recognized; the text embedding module is used to construct the word embedding matrix and the aspect category embedding matrix, and to map the input predefined aspect category set and the text to be recognized to text embedding vectors and aspect embedding vectors; the LSTM module outputs the hidden states for the text embedding vectors and aspect embedding vectors; the IAN-LoT module obtains, from the hidden states, the total aspect vector fusing the long-tail distribution characteristics; the fusion module fuses the contextual aspect-level semantic information and generates the fusion vector; the attention mechanism module generates an attention weight vector for each predefined aspect category from the fusion vector; and the prediction module completes the classification and prediction of aspect category identification from the attention weight vectors.
For example, for the restaurant review "When we sat down, the waiter barely spoke in our direction and abruptly dropped our menu on the table", this sentence and a predefined set of aspect categories: food, staff, miscellaneous, place, service, menu, price and ambience are input into the model shown in fig. 1, and a set of results can be predicted with the model, wherein 1 means the sentence contains the corresponding aspect category and 0 means it does not, as shown in table 1:
Category | food | staff | miscellaneous | place | service | menu | price | ambience |
---|---|---|---|---|---|---|---|---|
label | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
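The 0/1 row in Table 1 would be produced by thresholding per-aspect sigmoid scores at 0.5 (the classification threshold named in claim 5). The scores below are hypothetical values for illustration only; only the trained model would produce real ones:

```python
aspects = ["food", "staff", "miscellaneous", "place",
           "service", "menu", "price", "ambience"]

# hypothetical sigmoid outputs for the example review
scores = {"food": 0.12, "staff": 0.91, "miscellaneous": 0.05, "place": 0.20,
          "service": 0.44, "menu": 0.87, "price": 0.03, "ambience": 0.08}

labels = {a: int(scores[a] > 0.5) for a in aspects}   # threshold of 0.5
row = [labels[a] for a in aspects]                    # -> [0, 1, 0, 0, 0, 1, 0, 0]
```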
to further illustrate the beneficial effects that the present invention can achieve, the following experiments were performed:
the present invention uses 6 baseline methods for comparison:
(1) Aspect category identification models:
TextCNN [34]: a relatively basic model that classifies text using a convolutional neural network;
LSTM [24]: trains an LSTM network and classifies using the last hidden state as the final representation;
SVR [35]: combines the word vectors of a sentence into a single vector as input and classifies with a machine learning classifier;
SCAN [36]: an aspect category sentiment analysis method based on a sentence constituent-aware network.
(2) Models jointly trained on the ACD and ACSA tasks:
AS-Capsules [37]: performs aspect category sentiment analysis by sharing components, exploiting the correlation between aspect categories and sentiments;
AC-MIMLLN [6]: a joint model for aspect sentiment analysis via multi-instance multi-label learning, in which an attention-based ACD module generates attention weights for the different aspect categories. For such joint models, we compare against their ACD predictions.
Table 2 compares the Macro F1 results on the ACD task between the method of the present invention and the comparison methods on the MAMS-LT and SemEval2014-LT data sets.
Table 2 data comparison results
In order to better study the long-tail distribution problem, the invention follows the method of (Wu, T. et al., Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets, ECCV 2020) for creating multi-label data sets that satisfy a long-tail distribution, and reconstructs the existing SemEval-2014 Task 4 data set (Pontiki, M. et al., SemEval-2014 Task 4: Aspect Based Sentiment Analysis, 2014) and the MAMS data set (Jiang, Q. et al., A Challenge Dataset and Effective Models for Aspect-Based Sentiment Analysis, 2019) into the long-tailed SemEval2014-LT and MAMS-LT data sets. Table 2 reports the Macro F1 scores of the baseline methods and the method of the invention on the MAMS-LT and SemEval2014-LT data sets; a higher score indicates a better classification effect.
From the experimental results, we can conclude the following:
First, the method of the present invention outperforms all baseline methods on both the MAMS-LT and SemEval2014-LT data sets, showing that it has a stronger aspect category detection capability on data sets with long-tail distribution characteristics.
Secondly, compared with the best-scoring baseline AS-Capsules, the method of the present invention is 2.28% and 1.92% higher on the two data sets respectively; the Macro F1 advantage is especially clear on MAMS-LT, demonstrating that the method detects aspect categories better for sentences containing multiple aspects.
Thirdly, the method is less effective on SemEval2014-LT than on MAMS-LT, possibly because most sentences in the former contain only one aspect category, which weakens the rebalancing weights designed for the label co-occurrence problem; on MAMS-LT, where every sentence contains two or more aspects, the improvement is significant.
Table 3 shows the per-aspect Macro F1 results of AS-Capsules and the method of the present invention on the SemEval2014-LT data set.
Table 3 comparison of AS-Capsules on SemEval2014-LT dataset with the results of the method of the invention
The SemEval2014-LT data set constructed by the method contains 1 head class, 2 middle classes and 2 tail classes. The results show that:
First, for the tail classes price and ambience, the Macro F1 scores of the method are 5.99% and 7.88% higher respectively, indicating a clearly improved detection effect on tail-class aspect categories. The prediction result for the tail class 'price' is even better than that for the head class 'food', whereas AS-Capsules scores 9.83% lower on the tail class 'price' than on the head class 'food'. This experimentally confirms the effectiveness of the long-tail treatment proposed by the present invention.
Secondly, for the head class 'food', the method of the present invention scores lower than the comparison method AS-Capsules. Two reasons are possible: first, the model favors the tail classes, assigning them more weight when fusing fine-grained aspect information, so the attention mechanism attends more to tail-class information; second, the rebalancing weights in the improved loss function reduce the weight of head classes, and the suppression grows with the number of head-class samples, which can weaken head-class prediction to some extent.
Some steps in the embodiments of the present invention may be implemented by software, and the corresponding software program may be stored in a readable storage medium, such as an optical disc or a hard disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (10)
1. An aspect category identification method under a long-tail distribution scene, characterized in that the method performs aspect category identification on the N sentences of a data set D, wherein Sl = {w1, w2, …, wn} is the l-th sentence in the data set D, consisting of n words, and wn represents the n-th word of the l-th sentence Sl; each sentence Sl has a corresponding aspect category label;
the method comprises the following steps:
Step 1: defining m aspect categories in advance, denoted A = {a1, a2, …, am}, wherein am is the word or phrase describing the m-th aspect;
Step 2: constructing a word embedding matrix E1 ∈ R^{|V|×d}, by which each word wi is mapped to a word embedding vector, where |V| is the vocabulary size of the data set D and d is the dimension of a word vector;
simultaneously constructing an aspect category embedding matrix E2 ∈ R^{m×d}, by which each aspect word ai is mapped to an aspect embedding vector;
Step 3: inputting the text embedding vector and the aspect embedding vector into a long short-term memory network (LSTM) to obtain the hidden states Hw and Ha output by the network for the sentence;
Step 4: inputting the hidden states Hw and Ha into the IAN-LoT mechanism to obtain a total aspect vector representation s fusing long-tail distribution characteristics;
Step 5: inputting the total aspect vector s into an attention mechanism fusing context aspect-level semantic information, and calculating a fusion vector;
Step 6: using the fusion vector as the predicted sentence representation; for the j-th aspect category, the prediction is as shown in equation (1):
2. The method of claim 1, wherein the calculation of the total aspect vector fusing long-tail distribution features in the IAN-LoT mechanism comprises:
Step 41: for the input hidden states Hw and Ha, calculating an interaction attention weight matrix I ∈ R^{n×m}, as shown in equation (2):
Step 42: performing a softmax calculation on each row of the interaction attention weight matrix, as shown in equation (3):
wherein kij is the element in row i and column j of the matrix k ∈ R^{n×m}, k represents the attention weights of the text over the aspects, and Iij is the element in row i and column j of the matrix I;
Step 43: introducing the long-tail distribution characteristic of the data into the matrix k, as shown in equation (4):
wherein the result is the weight information of the text for each aspect with the long-tail distribution introduced, β ∈ R^{1×m} represents the long-tail distribution characteristic learned in advance and is the reciprocal of the effective sample number in the training set, and m is the number of aspect categories;
Step 44: performing max pooling on the reweighted matrix to obtain fine-grained text-to-aspect weight information IL fused with the long-tail distribution characteristic; the weight information is then multiplied by the embedding vector representation of the aspect categories to obtain the final total aspect vector representation s, as shown in equation (5):
wherein s ∈ R^{1×d}.
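As an illustration outside the claim language, steps 41-44 can be sketched as follows. The equation bodies (2)-(5) are not reproduced in this excerpt, so a dot-product interaction attention is assumed where the formula is missing; `beta` is the pre-learned long-tail prior described in the claim:

```python
import numpy as np

def ian_lot(Hw, Ha, beta):
    """Sketch of the IAN-LoT steps: interaction attention (eq. 2, here
    assumed to be dot-product), row-wise softmax (eq. 3), long-tail
    reweighting by the prior beta (eq. 4), and max pooling followed by
    projection onto the aspect embeddings (eq. 5)."""
    I = Hw @ Ha.T                                    # (n, m) interaction matrix
    k = np.exp(I - I.max(axis=1, keepdims=True))
    k = k / k.sum(axis=1, keepdims=True)             # softmax over aspects per word
    kL = k * beta                                    # inject long-tail prior, beta in R^{1×m}
    I_L = kL.max(axis=0)                             # (m,) fine-grained weight per aspect
    return I_L @ Ha                                  # total aspect vector s in R^{1×d}
```

Multiplying by `beta` (the inverse effective sample numbers) before pooling is what lifts the attention mass assigned to tail aspect categories.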
3. The method of claim 2, wherein the calculation of the fusion vector fusing context aspect-level semantic information comprises:
taking the total aspect vector s and the text hidden state Hw as input, calculating the fusion vector as shown in equation (6):
wherein W ∈ R^{n×1} is a learnable weight parameter fusing each word with the aspects; the resulting vector, representing the fused context aspect-level semantic information, is input to the attention mechanism to generate an attention weight vector for each predefined aspect category; for the j-th aspect category, as shown in equation (7):
wherein Wj ∈ R^{d×d}, bj ∈ R^d and uj ∈ R^d are learnable parameters, β ∈ R^{1×m} represents the long-tail distribution characteristic learned in advance and is the reciprocal of the effective sample number in the training set, and αj ∈ R^n is the attention weight vector.
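As an illustration outside the claim language, equations (6)-(7) can be sketched as below. The exact combination in equation (6) is not given in this excerpt, so an additive blend weighted by W ∈ R^{n×1} is assumed; the tanh scoring follows the parameter shapes named in the claim:

```python
import numpy as np

def fuse_and_attend(Hw, s, W, Wj, bj, uj):
    """Sketch of eqs. (6)-(7): fuse each word's hidden state with the total
    aspect vector s (additive blend assumed), then score the fused words for
    aspect j with a tanh attention and normalize with softmax."""
    r = Hw + W * s                       # (n, d) fused representation, eq. (6) stand-in
    e = np.tanh(r @ Wj + bj) @ uj        # (n,) unnormalized attention scores
    e = np.exp(e - e.max())
    alpha_j = e / e.sum()                # attention weight vector alpha_j in R^n
    return r, alpha_j
```

Each predefined aspect category j has its own (Wj, bj, uj), so the model produces m separate attention distributions over the same fused word sequence.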
4. The method of claim 3, wherein the method trains the recognition model using an improved A-DB loss function, the improvements lying in the way the rebalancing weights are computed and in the smoothing function, comprising:
first, without considering label co-occurrence, the number of samples containing the j-th aspect category in the data set is counted; the expected value of the sampling frequency for the j-th aspect category follows from this count; the instance-level sampling frequency PI is then estimated over the positive classes contained in each example, as shown in equation (8):
wherein a label value of 1 indicates that the i-th sentence contains the j-th aspect category aj, and a label value of 0 indicates that it does not;
wherein γ is a coordination weight hyperparameter;
in order to avoid over-suppression of the minority classes caused by the dominance of negative labels, a negative-class suppression hyperparameter λ and a class-specific bias τj are introduced, as shown in equation (11):
where ρj is the ratio of the number of samples of the j-th category to the total number of samples, and η is a proportion hyperparameter;
the A-DB loss function is shown in equation (12):
5. The method of claim 1, wherein the classification threshold is 0.5.
6. An aspect category identification system under a long-tail distribution scene, the system comprising: an input module, a text embedding module, an LSTM module, an IAN-LoT module, a fusion module, an attention mechanism module and a prediction module;
the input module, the text embedding module, the LSTM module, the IAN-LoT module, the fusion module, the attention mechanism module and the prediction module are connected in sequence;
the input module is used for inputting a predefined aspect category combination and a text to be recognized; the text embedding module is used for constructing a word embedding matrix and an aspect category embedding matrix, and mapping the input predefined aspect category combination and text to be recognized to a text embedding vector and an aspect embedding vector; the LSTM module is used for outputting the hidden states of the text embedding vector and the aspect embedding vector; the IAN-LoT module is used for obtaining a total aspect vector fusing long-tail distribution characteristics according to the hidden states; the fusion module is used for fusing context aspect-level semantic information and generating a fusion vector; the attention mechanism module generates an attention weight vector for each predefined aspect category according to the fusion vector; and the prediction module is used for completing the classification and prediction of aspect category identification according to the attention weight vector.
7. The system of claim 6, wherein the system performs aspect category identification on the N sentences of a data set D, the working process comprising:
wherein Sl = {w1, w2, …, wn} is the l-th sentence in the data set D, consisting of n words, and wi represents the i-th word of the l-th sentence Sl; each sentence Sl has a corresponding aspect category label;
Step 1: defining m aspect categories in advance, denoted A = {a1, a2, …, am}, wherein am is the word or phrase describing the m-th aspect;
Step 2: constructing a word embedding matrix E1 ∈ R^{|V|×d}, by which each word wi is mapped to a word embedding vector, where |V| is the vocabulary size of the data set D and d is the dimension of a word vector;
simultaneously constructing an aspect category embedding matrix E2 ∈ R^{m×d}, by which each aspect word ai is mapped to an aspect embedding vector;
Step 3: inputting the text embedding vector and the aspect embedding vector into a long short-term memory network (LSTM) to obtain the hidden states Hw and Ha output by the network for the sentence;
Step 4: inputting the hidden states Hw and Ha into the IAN-LoT mechanism to obtain a total aspect vector representation s fusing long-tail distribution characteristics;
Step 5: inputting the total aspect vector s into an attention mechanism fusing context aspect-level semantic information, and calculating a fusion vector;
Step 6: using the fusion vector as the predicted sentence representation; for the j-th aspect category, the prediction is as shown in equation (1):
8. The system of claim 7, wherein the calculation of the total aspect vector fusing long-tail distribution features in the IAN-LoT mechanism comprises:
Step 41: for the input hidden states Hw and Ha, calculating an interaction attention weight matrix I ∈ R^{n×m}, as shown in equation (2):
Step 42: performing a softmax calculation on each row of the interaction attention weight matrix, as shown in equation (3):
wherein kij is the element in row i and column j of the matrix k ∈ R^{n×m}, k represents the attention weights of the text over the aspects, and Iij is the element in row i and column j of the matrix I;
Step 43: introducing the long-tail distribution characteristic of the data into the matrix k, as shown in equation (4):
wherein the result is the weight information of the text for each aspect with the long-tail distribution introduced, β ∈ R^{1×m} represents the long-tail distribution characteristic learned in advance and is the reciprocal of the effective sample number in the training set, and m is the number of aspect categories;
Step 44: performing max pooling on the reweighted matrix to obtain fine-grained text-to-aspect weight information IL fused with the long-tail distribution characteristic; the weight information is then multiplied by the embedding vector representation of the aspect categories to obtain the final total aspect vector representation s, as shown in equation (5):
wherein s ∈ R^{1×d}.
9. The system of claim 8, wherein the calculation of the vector representation fusing context aspect-level semantic information comprises:
taking the total aspect vector s and the text hidden state Hw as input, calculating the fusion vector as shown in equation (6):
wherein W ∈ R^{n×1} is a learnable weight parameter fusing each word with the aspects; the resulting vector, representing the fused context aspect-level semantic information, is input to the attention mechanism to generate an attention weight vector for each predefined aspect category; for the j-th aspect category, as shown in equation (7):
wherein Wj ∈ R^{d×d}, bj ∈ R^d and uj ∈ R^d are learnable parameters, β ∈ R^{1×m} represents the long-tail distribution characteristic learned in advance and is the reciprocal of the effective sample number in the training set, and αj ∈ R^n is the attention weight vector.
10. The system of claim 9, wherein the system trains the recognition model using an improved A-DB loss function, the improvements lying in the way the rebalancing weights are computed and in the smoothing function, comprising:
first, without considering label co-occurrence, the number of samples containing the j-th aspect category in the data set is counted; the expected value of the sampling frequency for the j-th aspect category follows from this count; the instance-level sampling frequency PI is then estimated over the positive classes contained in each example, as shown in equation (8):
wherein a label value of 1 indicates that the i-th sentence contains the j-th aspect category aj, and a label value of 0 indicates that it does not;
wherein γ is a coordination weight hyperparameter;
in order to avoid over-suppression of the minority classes caused by the dominance of negative labels, a negative-class suppression hyperparameter λ and a class-specific bias τj are introduced, as shown in equation (11):
where ρj is the ratio of the number of samples of the j-th category to the total number of samples, and η is a proportion hyperparameter;
the A-DB loss function is shown in equation (12):
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111681644.8A CN114297390B (en) | 2021-12-30 | 2021-12-30 | Aspect category identification method and system in long tail distribution scene |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114297390A true CN114297390A (en) | 2022-04-08 |
CN114297390B CN114297390B (en) | 2024-04-02 |
Family
ID=80975589
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115563284A (en) * | 2022-10-24 | 2023-01-03 | 重庆理工大学 | Deep multi-instance weak supervision text classification method based on semantics |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20190143415A (en) * | 2018-06-20 | 2019-12-30 | 강원대학교산학협력단 | Method of High-Performance Machine Reading Comprehension through Feature Selection |
CN111581981A (en) * | 2020-05-06 | 2020-08-25 | 西安交通大学 | Evaluation object strengthening and constraint label embedding based aspect category detection system and method |
CN112199504A (en) * | 2020-10-30 | 2021-01-08 | 福州大学 | Visual angle level text emotion classification method and system integrating external knowledge and interactive attention mechanism |
CN112686056A (en) * | 2021-03-22 | 2021-04-20 | 华南师范大学 | Emotion classification method |
CN113222059A (en) * | 2021-05-28 | 2021-08-06 | 北京理工大学 | Multi-label emotion classification method using cooperative neural network chain |
WO2021164199A1 (en) * | 2020-02-20 | 2021-08-26 | 齐鲁工业大学 | Multi-granularity fusion model-based intelligent semantic chinese sentence matching method, and device |
US11194972B1 (en) * | 2021-02-19 | 2021-12-07 | Institute Of Automation, Chinese Academy Of Sciences | Semantic sentiment analysis method fusing in-depth features and time sequence models |
Non-Patent Citations (3)
Title |
---|
王家乾; 龚子寒; 薛云; 庞士冠; 古东宏: "Targeted Sentiment Analysis Based on Hybrid Multi-Head Attention and Capsule Networks" (in Chinese), Journal of Chinese Information Processing, no. 05, 15 May 2020 (2020-05-15) *
邓立明; 魏晶晶; 吴运兵; 余小燕; 廖祥文: "Aspect-Level Sentiment Analysis Based on Knowledge Graphs and Recurrent Attention Networks" (in Chinese), Pattern Recognition and Artificial Intelligence, no. 06, 15 June 2020 (2020-06-15) *
黄露; 周恩国; 李岱峰: "A Text Representation Learning Model Fusing a Task-Specific Information Attention Mechanism" (in Chinese), Data Analysis and Knowledge Discovery, no. 09, 31 December 2020 (2020-12-31) *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||