CN116561325A - Multi-language fused media text emotion analysis method - Google Patents
- Publication number
- CN116561325A (application number CN202310826886.4A)
- Authority
- CN
- China
- Prior art keywords
- language
- encoder
- source
- data
- loss function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F16/353—Information retrieval of unstructured textual data; clustering; classification into predefined classes
- G06N3/045—Neural networks; combinations of networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/0475—Generative networks
- G06N3/048—Activation functions
- G06N3/0895—Weakly supervised learning, e.g. semi-supervised or self-supervised learning
- G06N3/094—Adversarial learning
- G06N3/096—Transfer learning
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a multi-language fused media text emotion analysis method, belonging to the technical field of data processing, which specifically comprises the following steps: a source domain language vector is used as input to obtain the output of a source language encoder; a language discriminator determines the difference between the output of a target language encoder and the output of the source language encoder, and a learning module and a bilinear module are adopted to correct the parameters of the target language encoder until the difference meets the requirement, yielding a trained target language encoder; the source domain language data and the translated target language data are subjected to data enhancement processing and used as input of a comprehensive encoder, which is constructed from the trained target language encoder and the source language encoder, to obtain an emotion classification result for the target language data, thereby better realizing emotion analysis of multi-language fused media text.
Description
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a multi-language fused media text emotion analysis method.
Background
Public opinion text data from different regions and different languages have important reference value for decision makers, and information in different languages can serve as mutually complementary research data, enabling decision makers to analyze how an event is perceived in different regions and adjust the corresponding strategy deployment accordingly. For this reason, research on cross-language emotion classification methods is particularly important.
In order to accurately evaluate the emotion tendency of cross-language text, invention patent CN115080734A, a cross-domain emotion classification method based on an attention mechanism and reinforcement learning, applies a random strategy for feature selection using the idea of reinforcement learning, performs policy optimization according to calculated delayed rewards, and uses the optimal emotion classification strategy to realize cross-domain emotion classification. However, the following technical problem exists:
in the reinforcement learning stage, the differences among languages are ignored; the target domain and the source domain are composed of different languages and therefore differ, and reinforcement learning that does not take these differences into account cannot accurately realize emotion recognition and classification.
Aiming at the technical problems, the invention provides a multi-language fused media text emotion analysis method.
Disclosure of Invention
The invention aims to provide a multi-language fused media text emotion analysis method.
In order to solve the technical problems, the invention provides a multi-language fused media text emotion analysis method, which is characterized by comprising the following steps:
s11, acquiring source domain language data, converting the source domain language data into source domain language vectors, and training on the source domain language vectors to obtain a source language encoder and a source language classifier;
s12, initializing a target language encoder based on the source language encoder, and taking a target language vector and a source domain language vector subjected to data enhancement as input of the target language encoder to obtain output of the target language encoder;
s13, taking the source domain language vector as input to obtain the output of a source language encoder, determining the difference between the output of the target language encoder and the output of the source language encoder through a language discriminator, and correcting the parameters of the target language encoder by adopting a learning module and a bilinear module until the difference meets the requirement, thereby obtaining the target language encoder after training is completed;
s14, carrying out data enhancement processing on the source domain language data and the translated target language data to serve as input of a comprehensive encoder, and constructing the comprehensive encoder by adopting the trained target language encoder and the source language encoder to obtain an emotion classification result of the target language data.
A further technical solution is that the source domain encoder is constructed based on an mBERT-S model, and the target domain encoder is constructed based on an mBERT-T model.
A further technical solution is that determining, by means of a language discriminator, a difference between the output of the target language encoder and the output of the source language encoder, comprising in particular:
obtaining the output of the target language encoder, taking the output of the target language encoder as the input of the language discriminator, and determining the probability that the input of the language discriminator is from the target language encoder through the language discriminator;
obtaining an output of the source language encoder, taking the output of the source language encoder as a source language input of the language discriminator, and determining a probability that the source language input of the language discriminator is from the source language encoder through the language discriminator;
and constructing a loss function through the probability of the target language encoder and the probability of the source language encoder, and determining the difference between the output of the target language encoder and the output of the source language encoder based on the loss function.
The further technical scheme is that the construction of the loss function is performed by the probability of the target language encoder and the probability of the source language encoder, and specifically comprises the following steps:
constructing a target language loss function of the target language encoder through the probability of the target language encoder;
constructing a source language loss function of the source language encoder through the probability of the source language encoder;
constructing a loss function through the target language loss function and the source language loss function, wherein the calculation formula of the loss function is as follows:
$$L_D = -\,\mathbb{E}_{x_s \sim X_s}\left[\log D\big(F_s(x_s)\big)\right] - \mathbb{E}_{x_t \sim X_t}\left[\log\Big(1 - D\big(F_t(x_t)\big)\Big)\right]$$
wherein $D$ is the language discriminator and $L_D$ is the discriminator loss function; $X_s$ is the source language text and $X_t$ is the target language text; $F_s$ is the source language feature extractor and $F_t$ is the target language feature extractor; $F_s(x_s)$ denotes the features extracted from a sample $x_s$ selected at random from $X_s$, and $F_t(x_t)$ denotes the features extracted from a sample $x_t$ selected at random from $X_t$; $D(F_s(x_s))$ is the probability determined by the discriminator that the input data come from the source language model, and $D(F_t(x_t))$ is the probability determined by the discriminator that the input data come from the target language model. Minimizing $L_D$ drives the discriminator toward its optimal state, i.e. $D(F_s(x_s))$ approaches 1 as closely as possible and $D(F_t(x_t))$ approaches 0 as closely as possible.
The further technical scheme is that the data enhancement processing is performed on the source domain language data and the translated target language data, and specifically includes:
performing data enhancement processing on the source domain language data and the translated target language data in a Code-switching and secondary Dropout mode to obtain the source domain language data after the data enhancement processing is completed and the translated target language data after the data enhancement processing is completed;
and constructing a model total loss function, and determining the difference between the source domain language data and translated target language data after the data enhancement processing is completed and the same source domain language data and translated target language data before the data enhancement processing.
The further technical scheme is that the model total loss function is composed of a contrast loss function and a mixed distance loss function, wherein the mixed distance loss function is determined according to a Euclidean distance function and a Manhattan distance function, and the calculation formula of the model total loss function is as follows:
$$L_{total} = L_{BT} + \alpha\, L_{dist}$$
wherein $L_{total}$ is the total loss function of the module, $\alpha$ is a manually defined weight parameter, $L_{dist}$ is the mixed distance loss function, and $L_{BT}$ is the Barlow Twins loss function; the specific formulas of $L_{BT}$ and $L_{dist}$ are as follows:
$$L_{BT} = \sum_i \big(1 - C_{ii}\big)^2 + \lambda \sum_i \sum_{j \neq i} C_{ij}^2$$
wherein $L_{BT}$ is the Barlow Twins loss function and $\lambda$ is a positive constant used to trade off the importance of the first and second terms of the loss: the invariance term drives the diagonal elements of $C$ toward 1, keeping the embeddings of different augmented versions of the same sample unchanged, and the redundancy reduction term drives the off-diagonal elements toward 0, reducing redundancy and decorrelating the components of the embedding vectors; $C$ is the cross-correlation matrix calculated along the batch dimension between the outputs of two identical networks, $C_{ii}$ denotes its diagonal elements, and $C_{ij}$ ($i \neq j$) its off-diagonal elements;
$$L_{dist} = \sqrt{\sum_{i=1}^{n}\big(a_i - b_i\big)^2} + \beta \sum_{i=1}^{n}\big|a_i - b_i\big|$$
wherein $L_{dist}$ is the mixed distance loss function, $a_i$ is the $i$-th dimension of a sample in feature space before data enhancement, $b_i$ is the $i$-th dimension of the same sample after data enhancement, $\beta$ is a manually set weight parameter, and $n$ is the total number of dimensions.
The further technical scheme is that the specific steps of determining the emotion classification result of the target language data are as follows:
s21, the source language text after data enhancement processing and the target language text obtained after translation are passed through the comprehensive encoder to obtain the embedded expression of the source language and the embedded expression of the target language, and a single-head paired interaction matrix is obtained through bilinear interaction mapping;
s22, through a bilinear pooling layer, a cross-language joint mapping is determined based on the single-head paired interaction matrix, and the cross-language joint mapping is corrected through sum pooling to obtain a dense feature mapping;
s23, through the dense feature mapping, a fully connected layer and a softmax operation are used to obtain the emotion classification probability.
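As a minimal illustration of step S23, the sketch below applies a fully connected layer and a softmax operation to a pooled dense feature to produce emotion class probabilities. This is an assumption for illustration only, not the patent's implementation; the function names and weight shapes are invented for the example.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def classify(dense_feature, W, b):
    # dense_feature: (f,) pooled cross-lingual dense feature map
    # W: (f, C) fully connected weights, b: (C,) bias
    # returns a length-C vector of emotion class probabilities
    return softmax(dense_feature @ W + b)
```

The output vector sums to 1, so its argmax can be read directly as the predicted emotion class.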
The further technical scheme is that the loss function of the comprehensive encoder is determined by adopting negative log likelihood loss, and the difference between the inputs of the comprehensive encoder is measured by KL divergence, wherein the calculation formula of the loss function of the comprehensive encoder is as follows:
$$L = L_{bp} + \gamma\, \mathrm{KL}\big(M \,\|\, N\big)$$
wherein $L$ is the overall loss function of the module, $L_{bp}$ is the bilinear pooling loss function, $\gamma$ is the weight parameter of the loss, $\mathrm{KL}(M \,\|\, N)$ is the KL divergence measuring the difference between the two distributions, $M$ is the probability distribution of the source domain language, and $N$ is the probability distribution of the target domain;
$$L_{bp} = -\sum_{i=1}^{K} \log p_i(y_i)$$
wherein $L_{bp}$ is the loss function of the bilinear pooling module, $y_i$ denotes the true emotion tag of the $i$-th sentence, $p_i(y_i)$ is the probability that the model outputs the correct emotion for the $i$-th sentence, and $K$ is the number of sentences.
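The comprehensive encoder loss above can be sketched numerically as follows. This is a hedged illustration under the assumption that M and N are discrete probability distributions and that the per-sentence probabilities of the correct label are available as an array; the function names and the epsilon guard are invented for the example.

```python
import numpy as np

def nll_loss(probs_correct):
    # negative log-likelihood: probs_correct[i] is the probability the
    # model assigns to the true emotion label of sentence i
    return float(-np.mean(np.log(np.asarray(probs_correct, dtype=float))))

def kl_divergence(m, n, eps=1e-12):
    # KL(M || N) between two discrete probability distributions
    m = np.asarray(m, dtype=float)
    n = np.asarray(n, dtype=float)
    return float(np.sum(m * np.log((m + eps) / (n + eps))))

def integrated_loss(probs_correct, m, n, gamma=0.5):
    # overall loss: bilinear-pooling NLL plus gamma-weighted KL term
    return nll_loss(probs_correct) + gamma * kl_divergence(m, n)
```

When the source and target distributions coincide, the KL term vanishes and only the classification term remains.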
In another aspect, the present invention provides a computer storage medium having a computer program stored thereon, which when executed in a computer causes the computer to perform a multi-lingual fused media text emotion analysis method as described above.
The invention has the beneficial effects that:
the invention carries out data enhancement processing based on traditional generation countermeasure type cross-language knowledge migration, and consists of two parts of comparison learning and a mixed distance formula, the module directly acts on a generator part, and the output of the same sample is compared with the output of a source language encoder and a target language encoder by the two encoders, thus the invention acts on the target language encoder simultaneously with generating a loss function of a countermeasure network, thereby helping the target language encoder to better obtain the knowledge of the source language encoder and pulling up the language characteristic distribution among different languages.
The source language field and the target language field are regarded as two different modes, a paired language interaction module taking bilinear attention as a core is designed, the module learns interaction expression of input source language and target language through double channels, and richer joint information is provided compared with a traditional single attention channel, so that the model learns similarity of positive semantics and negative semantics between the two languages, and the model is improved to finish cross-language emotion classification performance.
Additional features and advantages will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
FIG. 1 is a flow chart of a multi-lingual fused media text emotion analysis method according to embodiment 1.
Fig. 2 is a flowchart showing specific steps for determining the emotion classification result of target language data in embodiment 1.
Fig. 3 is a frame diagram of a computer storage medium in embodiment 2.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments can be embodied in many forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar structures, and thus detailed descriptions thereof will be omitted.
The terms "a," "an," "the," and "said" are used to indicate the presence of one or more elements/components/etc.; the terms "comprising" and "having" are intended to be inclusive and mean that there may be additional elements/components/etc. in addition to the listed elements/components/etc.
Example 1
In order to solve the above technical problems, as shown in fig. 1, the present invention provides a multi-language fused media text emotion analysis method, which is characterized by specifically comprising:
s11, acquiring source domain language data, converting the source domain language data into source domain language vectors, and training on the source domain language vectors to obtain a source language encoder and a source language classifier;
s12, initializing a target language encoder based on the source language encoder, and taking a target language vector and a source domain language vector subjected to data enhancement as input of the target language encoder to obtain output of the target language encoder;
the source domain encoder is constructed based on the mBERT-S model, and the target domain encoder is constructed based on the mBERT-T model.
It will be appreciated that determining, by the language discriminator, the difference between the output of the target language encoder and the output of the source language encoder comprises:
obtaining the output of the target language encoder, taking the output of the target language encoder as the input of the language discriminator, and determining the probability that the input of the language discriminator is from the target language encoder through the language discriminator;
it should be noted that the source domain and target domain mappings can be forced to agree by constraining all network layers. Learning such a symmetric transformation saves parameters in the model, but this constraint often degrades the optimization conditions and loses some domain-specific features, making performance less than ideal when one network processes data from two different domains. Another approach is to learn an asymmetric transformation, i.e. to constrain only a portion of the network layers, forcing only that portion to align. In order to enable the source domain and the target domain to extract more common features, the invention adopts the latter, i.e. all other constraints are cancelled and only a small number of hyperparameters are fixed.
Obtaining an output of the source language encoder, taking the output of the source language encoder as a source language input of the language discriminator, and determining a probability that the source language input of the language discriminator is from the source language encoder through the language discriminator;
and constructing a loss function through the probability of the target language encoder and the probability of the source language encoder, and determining the difference between the output of the target language encoder and the output of the source language encoder based on the loss function.
In theory the discriminator should reach a perfect state, but this is obviously difficult to achieve. The target language model can only be trained continuously to gradually acquire the language knowledge in the source language model, so that the discriminator cannot judge the source of the data. The essence of training is that the discriminator learns to determine the source of the data rather than how it is "spoofed".
Specifically, the construction of the loss function through the probability of the target language encoder and the probability of the source language encoder specifically includes:
constructing a target language loss function of the target language encoder through the probability of the target language encoder;
constructing a source language loss function of the source language encoder through the probability of the source language encoder;
constructing a loss function through the target language loss function and the source language loss function, wherein the calculation formula of the loss function is as follows:
$$L_D = -\,\mathbb{E}_{x_s \sim X_s}\left[\log D\big(F_s(x_s)\big)\right] - \mathbb{E}_{x_t \sim X_t}\left[\log\Big(1 - D\big(F_t(x_t)\big)\Big)\right]$$
wherein $D$ is the language discriminator and $L_D$ is the discriminator loss function; $X_s$ is the source language text and $X_t$ is the target language text; $F_s$ is the source language feature extractor and $F_t$ is the target language feature extractor; $F_s(x_s)$ denotes the features extracted from a sample $x_s$ selected at random from $X_s$, and $F_t(x_t)$ denotes the features extracted from a sample $x_t$ selected at random from $X_t$; $D(F_s(x_s))$ is the probability determined by the discriminator that the input data come from the source language model, and $D(F_t(x_t))$ is the probability determined by the discriminator that the input data come from the target language model. Minimizing $L_D$ drives the discriminator toward its optimal state, i.e. $D(F_s(x_s))$ approaches 1 as closely as possible and $D(F_t(x_t))$ approaches 0 as closely as possible.
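A numerical sketch of this discriminator objective, under the assumption that the discriminator's output probabilities for source-encoder and target-encoder features are already available as arrays; the epsilon guard is added only for numerical safety and is not part of the patent's formula.

```python
import numpy as np

def discriminator_loss(d_src, d_tgt, eps=1e-12):
    # d_src: D's probabilities that source-encoder features come from the source
    # d_tgt: D's probabilities that target-encoder features come from the source
    # standard adversarial objective: push d_src toward 1 and d_tgt toward 0
    d_src = np.asarray(d_src, dtype=float)
    d_tgt = np.asarray(d_tgt, dtype=float)
    return float(-np.mean(np.log(d_src + eps))
                 - np.mean(np.log(1.0 - d_tgt + eps)))
```

A perfect discriminator (d_src near 1, d_tgt near 0) drives this loss toward zero; confusion between the two encoders raises it.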
S13, taking the source domain language vector as input to obtain the output of a source language encoder, determining the difference between the output of the target language encoder and the output of the source language encoder through a language discriminator, and correcting the parameters of the target language encoder by adopting a learning module and a bilinear module until the difference meets the requirement, thereby obtaining the target language encoder after training is completed;
it should be noted that, performing data enhancement processing on the source domain language data and the translated target language data specifically includes:
performing data enhancement processing on the source domain language data and the translated target language data in a Code-switching and secondary Dropout mode to obtain the source domain language data after the data enhancement processing is completed and the translated target language data after the data enhancement processing is completed;
note the enhancement mode for constructing sample pairs: Code-switching. Its core principle is to replace part of the vocabulary in the source language text with the target language, mixing the texts so as to construct sample pairs. Sample pairs constructed on this principle further strengthen the cross-language model's ability to learn implicit features, but the method requires maintaining a particularly large bilingual emotion dictionary while training the model, which is unacceptable in the absence of large-scale manual labeling. The invention therefore adopts, as its data enhancement mode, the combined operation of Dropout and Code-switching, which is less dependent on external conditions: each element of a sample is set to zero with a certain probability to construct a positive sample pair, and during training the sample before data enhancement and the sample after data enhancement are compared, shortening the distance between the positive pair in feature space so as to learn the similar features within the samples and thereby improve the model effect.
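The two enhancement operations described above can be sketched roughly as follows. The bilingual dictionary here is a toy stand-in for the large bilingual emotion dictionary the text mentions, and the probabilities are illustrative parameters, not values given in the patent.

```python
import numpy as np

def code_switch(tokens, bilingual_dict, p=0.3, rng=None):
    # replace each source-language token with its target-language
    # counterpart with probability p (dictionary is a toy stand-in)
    rng = rng or np.random.default_rng(0)
    return [bilingual_dict[t] if (t in bilingual_dict and rng.random() < p) else t
            for t in tokens]

def feature_dropout(x, p=0.1, rng=None):
    # zero each element of the sample vector with probability p,
    # producing the second "view" of the positive sample pair
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= p
    return x * mask
```

Running a sample through `feature_dropout` twice with different seeds yields the two views compared by the contrastive loss.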
And constructing a model total loss function, and determining the difference between the source domain language data and translated target language data after the data enhancement processing is completed and the same source domain language data and translated target language data before the data enhancement processing.
In particular, the model total loss function is composed of a contrast loss function and a mixed distance loss function, wherein the mixed distance loss function is determined from a Euclidean distance function and a Manhattan distance function:
$$L_{total} = L_{BT} + \alpha\, L_{dist}$$
wherein $L_{total}$ is the total loss function of the module, $\alpha$ is a manually defined weight parameter, $L_{dist}$ is the mixed distance loss function, and $L_{BT}$ is the Barlow Twins loss function; the specific formulas of $L_{BT}$ and $L_{dist}$ are as follows:
$$L_{BT} = \sum_i \big(1 - C_{ii}\big)^2 + \lambda \sum_i \sum_{j \neq i} C_{ij}^2$$
wherein $L_{BT}$ is the Barlow Twins loss function and $\lambda$ is a positive constant used to trade off the importance of the first and second terms of the loss: the invariance term drives the diagonal elements of $C$ toward 1, keeping the embeddings of different augmented versions of the same sample unchanged, and the redundancy reduction term drives the off-diagonal elements toward 0, reducing redundancy and decorrelating the components of the embedding vectors; $C$ is the cross-correlation matrix calculated along the batch dimension between the outputs of two identical networks, $C_{ii}$ denotes its diagonal elements, and $C_{ij}$ ($i \neq j$) its off-diagonal elements;
$$L_{dist} = \sqrt{\sum_{i=1}^{n}\big(a_i - b_i\big)^2} + \beta \sum_{i=1}^{n}\big|a_i - b_i\big|$$
wherein $L_{dist}$ is the mixed distance loss function, $a_i$ is the $i$-th dimension of a sample in feature space before data enhancement, $b_i$ is the $i$-th dimension of the same sample after data enhancement, $\beta$ is a manually set weight parameter, and $n$ is the total number of dimensions.
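A hedged numerical sketch of the two loss terms and their combination, assuming batch-by-dimension embedding matrices whose columns are standardized along the batch before forming the cross-correlation matrix; the weight values and function names are illustrative, not the patent's.

```python
import numpy as np

def barlow_twins_loss(za, zb, lam=5e-3):
    # standardize each embedding dimension along the batch, then form the
    # cross-correlation matrix C between the two views (batch x dim inputs)
    za = (za - za.mean(axis=0)) / (za.std(axis=0) + 1e-12)
    zb = (zb - zb.mean(axis=0)) / (zb.std(axis=0) + 1e-12)
    c = za.T @ zb / za.shape[0]
    on_diag = np.sum((np.diagonal(c) - 1.0) ** 2)            # invariance term
    off_diag = np.sum(c ** 2) - np.sum(np.diagonal(c) ** 2)  # redundancy term
    return float(on_diag + lam * off_diag)

def mixed_distance_loss(a, b, beta=0.5):
    # Euclidean distance plus beta-weighted Manhattan distance between the
    # sample before (a) and after (b) data enhancement
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum(d ** 2)) + beta * np.sum(np.abs(d)))

def module_total_loss(za, zb, alpha=1.0, beta=0.5, lam=5e-3):
    # L_total = L_BT + alpha * L_dist
    return barlow_twins_loss(za, zb, lam) + alpha * mixed_distance_loss(
        za.ravel(), zb.ravel(), beta)
```

For identical views the mixed distance term vanishes and only the residual off-diagonal correlation of the Barlow Twins term remains.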
For contrastive learning, SimCLR and Barlow Twins are commonly used. SimCLR requires batch sizes that are orders of magnitude larger and relies heavily on negative sample pairs. Given that the cross-language emotion classification task differs from tasks in the computer vision field, its main purpose being to shorten the distance between the language features of the source language and the target language, and that the samples used are only source language and target language samples, it is difficult to construct negative sample pairs, and forcing negative sample pairs would be counterproductive for this task. In summary, the invention selects Barlow Twins as the more suitable contrastive learning method.
S14, carrying out data enhancement processing on the source domain language data and the translated target language data to serve as input of a comprehensive encoder, and constructing the comprehensive encoder by adopting the trained target language encoder and the source language encoder to obtain an emotion classification result of the target language data.
Specifically, as shown in fig. 2, the specific steps for determining the emotion classification result of the target language data are as follows:
S21, the source language text after data enhancement processing and the target language text obtained after translation are passed through the comprehensive encoder to obtain the embedded expression of the source language and the embedded expression of the target language, and a single-head pairwise interaction matrix is obtained through bilinear interaction mapping;
It should be further noted that the source language text after data enhancement processing and the target language text obtained after translation pass through the source language encoder $E_{s}$ and the target language encoder $E_{t}$; after encoding, the embedded expression $H_{s} \in \mathbb{R}^{M \times d}$ of the source language and the embedded expression $H_{t} \in \mathbb{R}^{N \times d}$ of the target language are obtained, where $M$ and $N$ are the lengths of the source language sentence and the target language sentence, respectively. The invention then uses these hidden representations to construct a bilinear interaction map to obtain a single-head pairwise interaction matrix $I \in \mathbb{R}^{M \times N}$. The construction of the interaction matrix $I$ is given by formula 4;
$$I = \left(\left(\mathbb{1}\,q^{\mathsf{T}}\right) \circ \sigma\left(H_{s} U\right)\right)\,\sigma\left(H_{t} V\right)^{\mathsf{T}} \tag{4}$$

Formula 4: construction of the single-head pairwise interaction matrix.
where $U$ is a learnable weight matrix for the source language domain representation, $V$ is a learnable weight matrix for the target language domain representation, $q$ is a learnable weight vector, $\mathbb{1}$ is a fixed all-ones vector, $\circ$ denotes the Hadamard product, and $\sigma$ is a nonlinear activation. Each element $I_{m,n}$ of the interaction matrix $I$ can be calculated by formula 5;
$$I_{m,n} = q^{\mathsf{T}}\left(\sigma\left(U^{\mathsf{T}} h_{s}^{m}\right) \circ \sigma\left(V^{\mathsf{T}} h_{t}^{n}\right)\right) \tag{5}$$

Formula 5: element-wise calculation of the interaction matrix, where $h_{s}^{m}$ and $h_{t}^{n}$ are the $m$-th source-language and $n$-th target-language token representations.
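Formulas 4 and 5 can be checked against each other with a small numerical sketch; the shapes, the sigmoid activation, and the variable names are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def interaction_matrix(H_s, H_t, U, V, q):
    # Formula 4: I = ((1 q^T) o sigma(H_s U)) sigma(H_t V)^T
    # H_s: (M, d) source embeddings, H_t: (N, d) target embeddings,
    # U, V: (d, K) learnable weight matrices, q: (K,) learnable vector.
    M = H_s.shape[0]
    ones = np.ones((M, 1))                         # fixed all-ones vector
    left = (ones @ q[None, :]) * sigmoid(H_s @ U)  # Hadamard product, (M, K)
    return left @ sigmoid(H_t @ V).T               # (M, N) interaction matrix
```

Each entry then agrees with the element-wise form of formula 5, $I_{m,n} = q^{\mathsf{T}}(\sigma(U^{\mathsf{T}} h_{s}^{m}) \circ \sigma(V^{\mathsf{T}} h_{t}^{n}))$, which the test below verifies numerically.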
S22, through a bilinear pooling layer, determining a cross-language joint mapping based on the single-head pairwise interaction matrix, and aggregating the cross-language joint mapping through a sum pool to obtain a dense feature map;
It should be further noted that, in order to obtain the cross-language joint mapping $f$, the invention introduces a bilinear pooling layer on top of the interaction matrix $I$. Specifically, the $k$-th element of $f$ can be obtained from formula 6;
$$f_{k} = \sigma\left(H_{s} u_{k}\right)^{\mathsf{T}}\, I\, \sigma\left(H_{t} v_{k}\right) \tag{6}$$

Formula 6: calculation of the cross-language joint mapping.
where $u_{k}$ and $v_{k}$ represent the $k$-th columns of the weight matrices $U$ and $V$, respectively. It should be noted that this layer has no new learnable parameters: the weight matrices $U$ and $V$ are shared with the previous interaction mapping layer to reduce the number of parameters and mitigate overfitting. Furthermore, on top of the cross-language joint mapping $f$, the invention adds a sum pool (Sum-pooling) to obtain a dense feature map $f'$.
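The bilinear pooling step of formula 6 and the subsequent sum pool can be sketched as follows, reusing $U$ and $V$ from the interaction layer; the non-overlapping window size of the sum pool is an assumption for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def joint_mapping(H_s, H_t, U, V, I):
    # Formula 6: f_k = sigma(H_s u_k)^T I sigma(H_t v_k), reusing the
    # k-th columns u_k, v_k of the shared weight matrices U and V,
    # so this layer adds no new learnable parameters.
    A = sigmoid(H_s @ U)  # (M, K)
    B = sigmoid(H_t @ V)  # (N, K)
    return np.array([A[:, k] @ I @ B[:, k] for k in range(U.shape[1])])

def sum_pool(f, s):
    # Sum-pooling with non-overlapping windows of size s (assumed here),
    # producing the dense feature map f'.
    return f.reshape(-1, s).sum(axis=1)
```

For example, sum-pooling the vector [1, 2, 3, 4] with window size 2 yields the dense map [3, 7].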
S23, obtaining the emotion classification probability from the dense feature map by using a fully connected layer and a softmax operation.
It should be noted that the loss function of the integrated encoder is determined using negative log-likelihood loss, and the difference between the inputs of the integrated encoder is measured by KL divergence. The loss function of the integrated encoder is calculated as follows:
$$\mathcal{L}_{total} = \mathcal{L}_{nll} + \mu\,\mathrm{KL}\left(M \,\|\, N\right);$$
where $\mathcal{L}_{total}$ is the overall loss function of the module, $\mathcal{L}_{nll}$ is the bilinear pooling loss function, $\mu$ is the loss weight parameter, and the KL divergence measures the difference between the two distributions, $M$ being the probability distribution of the source domain language and $N$ the probability distribution of the target domain;
$$\mathcal{L}_{nll} = -\sum_{i} \log p_{i}\left(y_{i}\right);$$
where $\mathcal{L}_{nll}$ is the loss function of the bilinear pooling module, $y_{i}$ indicates the true emotion tag of the $i$-th sentence, and $p_{i}(y_{i})$ is the probability that the model outputs the correct emotion of the sample for the $i$-th sentence.
Finally, the invention passes the feature map $f'$ through a fully connected layer and a softmax operation to obtain the emotion classification probability, as shown in formula 7:
$$P = \operatorname{softmax}\left(W f' + b\right) \tag{7}$$

Formula 7: emotion classification probability.
where $W$ represents a learnable weight matrix and $b$ represents the bias. In this section, the invention is trained using the negative log-likelihood loss (NLL), as shown in formula 8:
$$\mathcal{L}_{nll} = -\sum_{i} \log p_{i}\left(y_{i}\right) \tag{8}$$

Formula 8: negative log-likelihood loss.
where $y_{i}$ indicates the true emotion tag of the $i$-th sentence and $p_{i}(y_{i})$ is the output probability of the model for that tag.
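The classification head and NLL objective of formulas 7 and 8 can be sketched as follows; the shapes are illustrative:

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax along the last axis
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def classify(f_dense, W, b):
    # Formula 7: P = softmax(W f' + b) over the emotion classes
    return softmax(W @ f_dense + b)

def nll_loss(probs, labels):
    # Formula 8: negative log-likelihood of the true emotion tags;
    # probs: (num_sentences, num_classes), labels: true tag indices
    return -float(np.sum(np.log(probs[np.arange(len(labels)), labels])))
```

With zero weights the head outputs the uniform distribution, and two uniformly classified binary sentences give an NLL of 2·log 2.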
Furthermore, given the probability distribution p of the source domain language and the probability distribution q of the target domain, the present invention uses the KL divergence to measure the difference of the two distributions. The final objective optimization function of the present module is therefore shown in equation 9:
$$\mathcal{L}_{total} = \mathcal{L}_{nll} + \mu\,\mathrm{KL}\left(p \,\|\, q\right) \tag{9}$$

Formula 9: total loss function of the module.
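The KL-divergence term and the module objective of formula 9 can be sketched as follows; the weight name `mu` is an assumption:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) = sum_i p_i * log(p_i / q_i); eps guards against log(0)
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def module_total_loss(nll, p, q, mu=0.1):
    # Formula 9: L_total = L_nll + mu * KL(p || q)
    return nll + mu * kl_divergence(p, q)
```

Identical source and target distributions contribute zero divergence, so the total loss reduces to the NLL term alone.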
Example 2
As shown in fig. 2, the present invention provides a computer storage medium having a computer program stored thereon, which when executed in a computer, causes the computer to perform a multi-language fused media text emotion analysis method as described above.
The invention has the beneficial effects that:
The invention builds on traditional generative adversarial cross-language knowledge transfer by adding data enhancement processing, consisting of two parts: contrastive learning and a mixed distance formula. The module acts directly on the generator part, comparing the outputs produced by the source language encoder and the target language encoder for the same sample. It therefore acts on the target language encoder together with the loss function of the generative adversarial network, helping the target language encoder better acquire the knowledge of the source language encoder and drawing the language feature distributions of different languages closer together.
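The data enhancement referred to here uses Code-switching plus a secondary Dropout, per the claims. The Code-switching part can be sketched as below; the bilingual dictionary and replacement probability are illustrative assumptions (the secondary Dropout part would simply pass the same sample through a Dropout layer twice):

```python
import random

def code_switch(tokens, bilingual_dict, p=0.3, seed=None):
    # Replace each token that has an entry in a (hypothetical) bilingual
    # dictionary with its translation, with probability p, producing a
    # mixed-language augmented view of the sentence.
    rng = random.Random(seed)
    return [bilingual_dict[t] if t in bilingual_dict and rng.random() < p
            else t for t in tokens]
```

With `p=1.0` every dictionary word is switched; with `p=0.0` the sentence is returned unchanged.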
The source language field and the target language field are regarded as two different modalities, and a pairwise language interaction module with bilinear attention at its core is designed. The module learns the interaction representation of the input source language and target language through dual channels, providing richer joint information than a traditional single attention channel, so that the model learns the similarity of positive and negative semantics between the two languages, improving its cross-language emotion classification performance.
In embodiments of the present invention, the term "plurality" refers to two or more, unless explicitly defined otherwise. The terms "mounted," "connected," "secured," and the like are to be construed broadly, and may be, for example, fixedly attached, detachably attached, or integrally attached. The specific meaning of the above terms in the embodiments of the present invention will be understood by those of ordinary skill in the art according to specific circumstances.
In the description of the embodiments of the present invention, it should be understood that the directions or positional relationships indicated by the terms "upper", "lower", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience in describing the embodiments of the present invention and to simplify the description, and do not indicate or imply that the devices or units referred to must have a specific direction, be configured and operated in a specific direction, and thus should not be construed as limiting the embodiments of the present invention.
In the description of the present specification, the terms "one embodiment," "a preferred embodiment," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the embodiments of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above is only a preferred embodiment of the present invention and is not intended to limit the embodiment of the present invention, and various modifications and variations can be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the embodiments of the present invention should be included in the protection scope of the embodiments of the present invention.
Claims (9)
1. A multi-language fused media text emotion analysis method is characterized by comprising the following steps:
acquiring source domain language data, converting the source domain language data into source domain language vectors, and training the source domain language vectors to obtain a source language encoder and a source language classifier;
initializing a target language encoder based on the source language encoder, and taking a target language vector and a source domain language vector subjected to data enhancement as input of the target language encoder to obtain output of the target language encoder;
the source domain language vector is used as input to obtain the output of a source language encoder, the difference between the output of the target language encoder and the output of the source language encoder is determined through a language discriminator, and a learning module and a bilinear module are adopted to correct the parameters of the target language encoder until the difference meets the requirement, and the trained target language encoder is obtained;
and carrying out data enhancement processing on the source domain language data and the translated target language data to serve as input of a comprehensive encoder, and constructing the comprehensive encoder by adopting the trained target language encoder and the source language encoder to obtain an emotion classification result of the target language data.
2. The method of claim 1, wherein the source domain encoder is constructed based on a mBERT-S model and the target domain encoder is constructed based on a mBERT-T model.
3. The method of claim 1, wherein determining, by a language discriminator, the difference between the output of the target language encoder and the output of the source language encoder comprises:
obtaining the output of the target language encoder, taking the output of the target language encoder as the input of the language discriminator, and determining the probability that the input of the language discriminator is from the target language encoder through the language discriminator;
obtaining an output of the source language encoder, taking the output of the source language encoder as a source language input of the language discriminator, and determining a probability that the source language input of the language discriminator is from the source language encoder through the language discriminator;
and constructing a loss function through the probability of the target language encoder and the probability of the source language encoder, and determining the difference between the output of the target language encoder and the output of the source language encoder based on the loss function.
4. The method for emotion analysis of multilingual fused media text of claim 3, wherein constructing a loss function by probability of said target language encoder and probability of said source language encoder comprises:
constructing a target language loss function of the target language encoder through the probability of the target language encoder;
constructing a source language loss function of the source language encoder through the probability of the source language encoder;
constructing a loss function through the target language loss function and the source language loss function, wherein the calculation formula of the loss function is as follows:
$$\min_{F_{t}} \max_{D} \mathcal{L}_{D} = \mathbb{E}_{x_{s} \sim X_{s}}\left[\log D\left(F_{s}\left(x_{s}\right)\right)\right] + \mathbb{E}_{x_{t} \sim X_{t}}\left[\log\left(1 - D\left(F_{t}\left(x_{t}\right)\right)\right)\right];$$
where $D$ is the language discriminator, $\mathcal{L}_{D}$ is the discriminator loss function, $x_{s}$ is the source language text, $x_{t}$ is the target language text, $F_{s}$ is the source language feature extractor, and $F_{t}$ is the target language feature extractor. $F_{s}(x_{s})$ extracts features from a sample $x_{s}$ selected at random from the source language samples, and $F_{t}(x_{t})$ extracts features from a sample $x_{t}$ selected at random from the target language samples. $D(F_{s}(x_{s}))$ is the probability with which the discriminator determines that the input data is from the source language model, and $D(F_{t}(x_{t}))$ is the probability with which the discriminator determines that the input data is from the target language model. The optimization drives the state of the discriminator toward the optimum, i.e. $D(F_{s}(x_{s}))$ approaches 1 as much as possible and $D(F_{t}(x_{t}))$ approaches 0 as much as possible.
5. The method for emotion analysis of multilingual fused media text according to claim 1, wherein the data enhancement processing is performed on the source domain language data and the translated target language data, and specifically comprises:
performing data enhancement processing on the source domain language data and the translated target language data in a Code-switching and secondary Dropout mode to obtain the source domain language data after the data enhancement processing is completed and the translated target language data after the data enhancement processing is completed;
and constructing a model total loss function, and using it to determine the difference between the source domain language data and the translated target language data after the data enhancement processing is completed and the corresponding source domain language data and translated target language data before the data enhancement processing.
6. The method of claim 5, wherein the model total loss function is comprised of a contrast loss function and a hybrid distance loss function, wherein the hybrid distance loss function is determined from a euclidean distance function and a manhattan distance function, wherein the model total loss function is calculated as:
$$\mathcal{L}_{total} = \mathcal{L}_{BT} + \alpha\,\mathcal{L}_{dist};$$
where $\mathcal{L}_{total}$ refers to the total loss function of the module, $\alpha$ is a manually defined weight parameter, $\mathcal{L}_{dist}$ is the mixed distance loss function, and $\mathcal{L}_{BT}$ is the Barlow Twins loss function. The specific formulas of $\mathcal{L}_{BT}$ and $\mathcal{L}_{dist}$ are as follows:
$$\mathcal{L}_{BT} = \sum_{i=1}^{d}\left(1 - C_{ii}\right)^{2} + \lambda \sum_{i=1}^{d}\sum_{j \neq i} C_{ij}^{2};$$
where $\mathcal{L}_{BT}$ is the Barlow Twins loss function and $\lambda$ is a positive constant used to trade off the importance of the first and second terms of the loss. The invariance term drives the diagonal elements of $C$ toward 1, leaving the embeddings of different augmented versions of the same sample unchanged, while the redundancy-reduction term drives the off-diagonal elements toward 0 to reduce redundancy, decorrelating the different embedding vectors. $C$ is the cross-correlation matrix computed along the batch dimension between the outputs of two identical networks, where $d$ is the matrix dimension, $C_{ii}$ denotes the diagonal elements of the $d \times d$ matrix, and $C_{ij}$ ($i \neq j$) denotes its off-diagonal elements;
$$\mathcal{L}_{dist} = \beta\,\sqrt{\sum_{i=1}^{n}\left(x_{i} - y_{i}\right)^{2}} + \left(1 - \beta\right)\sum_{i=1}^{n}\left|x_{i} - y_{i}\right|;$$
where $\mathcal{L}_{dist}$ is the mixed distance loss function; $x$ is a sample before data enhancement, with $x_{i}$ the $i$-th dimension of the sample in feature space before enhancement; $y$ is the data-enhanced sample, with $y_{i}$ the $i$-th dimension of the enhanced sample; $\beta$ is a manually set weight parameter; and $n$ is the total number of dimensions.
7. The emotion analysis method of multi-language fused media text according to claim 1, wherein the specific steps of determining emotion classification result of the target language data are:
the method comprises the steps that the source language text after data enhancement processing and the target language text obtained after translation are passed through the comprehensive encoder to obtain the embedded expression of the source language and the embedded expression of the target language, and a single-head pairwise interaction matrix is obtained through bilinear interaction mapping;
determining cross-language joint mapping based on the single-head pair interaction matrix through a bilinear pooling layer, and correcting the cross-language joint mapping through a sum pool to obtain dense feature mapping;
and obtaining emotion classification probability by adopting the dense feature mapping and adopting a full connection layer and softmax operation.
8. The method of claim 1, wherein the loss function of the integrated encoder is determined using negative log-likelihood loss and the difference between the inputs of the integrated encoder is measured by KL divergence, wherein the loss function of the integrated encoder is calculated by the formula: $\mathcal{L}_{total} = \mathcal{L}_{nll} + \mu\,\mathrm{KL}\left(M \,\|\, N\right)$;
wherein the method comprises the steps ofFor the overall function of the module +.>For bilinear pooling loss function, +.>In order to be a weight parameter for the loss,measuring the difference of two distributions for DL divergence, M is the probability distribution of source domain language, N is the probability distribution of target domain>;
where $\mathcal{L}_{nll}$ is the loss function of the bilinear pooling module, $y_{i}$ indicates the true emotion tag of the $i$-th sentence, and $p_{i}(y_{i})$ is the probability that the model outputs the correct emotion of the sample for the $i$-th sentence.
9. A computer storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform a multi-lingual fused media text emotion analysis method according to any of claims 1-8.
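For illustration only (not part of the claims), the adversarial discriminator objective of claim 4 can be sketched numerically; the function below scores discriminator probability outputs rather than training anything:

```python
import numpy as np

def discriminator_objective(d_source, d_target, eps=1e-12):
    # E[log D(F_s(x_s))] + E[log(1 - D(F_t(x_t)))]: the discriminator is
    # trained to push its output toward 1 on source-encoder features and
    # toward 0 on target-encoder features; d_source and d_target are its
    # probability outputs on the two kinds of features.
    d_source = np.asarray(d_source, dtype=float)
    d_target = np.asarray(d_target, dtype=float)
    return float(np.mean(np.log(d_source + eps))
                 + np.mean(np.log(1.0 - d_target + eps)))
```

A near-perfect discriminator scores close to 0, while one that cannot tell the encoders apart (outputs of 0.5 everywhere) scores 2·log 0.5 ≈ −1.386, which is what the adversarially trained target encoder aims for.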
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310826886.4A CN116561325B (en) | 2023-07-07 | 2023-07-07 | Multi-language fused media text emotion analysis method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310826886.4A CN116561325B (en) | 2023-07-07 | 2023-07-07 | Multi-language fused media text emotion analysis method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116561325A true CN116561325A (en) | 2023-08-08 |
CN116561325B CN116561325B (en) | 2023-10-13 |
Family
ID=87500420
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310826886.4A Active CN116561325B (en) | 2023-07-07 | 2023-07-07 | Multi-language fused media text emotion analysis method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116561325B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117648410A (en) * | 2024-01-30 | 2024-03-05 | 中国标准化研究院 | Multi-language text data analysis system and method |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106326214A (en) * | 2016-08-29 | 2017-01-11 | 中译语通科技(北京)有限公司 | Method and device for cross-language emotion analysis based on transfer learning |
CN109325112A (en) * | 2018-06-27 | 2019-02-12 | 北京大学 | A kind of across language sentiment analysis method and apparatus based on emoji |
US20210390270A1 (en) * | 2020-06-16 | 2021-12-16 | Baidu Usa Llc | Cross-lingual unsupervised classification with multi-view transfer learning |
CN113901208A (en) * | 2021-09-15 | 2022-01-07 | 昆明理工大学 | Method for analyzing emotion tendentiousness of intermediate-crossing language comments blended with theme characteristics |
CN114238636A (en) * | 2021-12-14 | 2022-03-25 | 东南大学 | Translation matching-based cross-language attribute level emotion classification method |
CN115080734A (en) * | 2022-04-29 | 2022-09-20 | 石燕青 | Cross-domain emotion classification method based on attention mechanism and reinforcement learning |
CN115952787A (en) * | 2023-03-13 | 2023-04-11 | 北京澜舟科技有限公司 | Emotion analysis method, system and storage medium for specified target entity |
- 2023-07-07 CN CN202310826886.4A patent/CN116561325B/en active Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106326214A (en) * | 2016-08-29 | 2017-01-11 | 中译语通科技(北京)有限公司 | Method and device for cross-language emotion analysis based on transfer learning |
CN109325112A (en) * | 2018-06-27 | 2019-02-12 | 北京大学 | A kind of across language sentiment analysis method and apparatus based on emoji |
US20210390270A1 (en) * | 2020-06-16 | 2021-12-16 | Baidu Usa Llc | Cross-lingual unsupervised classification with multi-view transfer learning |
CN113901208A (en) * | 2021-09-15 | 2022-01-07 | 昆明理工大学 | Method for analyzing emotion tendentiousness of intermediate-crossing language comments blended with theme characteristics |
CN114238636A (en) * | 2021-12-14 | 2022-03-25 | 东南大学 | Translation matching-based cross-language attribute level emotion classification method |
CN115080734A (en) * | 2022-04-29 | 2022-09-20 | 石燕青 | Cross-domain emotion classification method based on attention mechanism and reinforcement learning |
CN115952787A (en) * | 2023-03-13 | 2023-04-11 | 北京澜舟科技有限公司 | Emotion analysis method, system and storage medium for specified target entity |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117648410A (en) * | 2024-01-30 | 2024-03-05 | 中国标准化研究院 | Multi-language text data analysis system and method |
CN117648410B (en) * | 2024-01-30 | 2024-05-14 | 中国标准化研究院 | Multi-language text data analysis system and method |
Also Published As
Publication number | Publication date |
---|---|
CN116561325B (en) | 2023-10-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109190131B (en) | Neural machine translation-based English word and case joint prediction method thereof | |
CN111160037B (en) | Fine-grained emotion analysis method supporting cross-language migration | |
CN108363743B (en) | Intelligent problem generation method and device and computer readable storage medium | |
WO2021155699A1 (en) | Global encoding method for automatic abstract of chinese long text | |
CN107305543B (en) | Method and device for classifying semantic relation of entity words | |
CN116561325B (en) | Multi-language fused media text emotion analysis method | |
CN113901831B (en) | Parallel sentence pair extraction method based on pre-training language model and bidirectional interaction attention | |
Yan et al. | Smarter Response with Proactive Suggestion: A New Generative Neural Conversation Paradigm. | |
Xiu et al. | A handwritten Chinese text recognizer applying multi-level multimodal fusion network | |
CN112507717A (en) | Medical field entity classification method fusing entity keyword features | |
Liu et al. | Cross-domain slot filling as machine reading comprehension: A new perspective | |
CN117113937A (en) | Electric power field reading and understanding method and system based on large-scale language model | |
CN116955644A (en) | Knowledge fusion method, system and storage medium based on knowledge graph | |
CN111428518B (en) | Low-frequency word translation method and device | |
CN114595700A (en) | Zero-pronoun and chapter information fused Hanyue neural machine translation method | |
CN116415587A (en) | Information processing apparatus and information processing method | |
CN117933258A (en) | Named entity identification method and system | |
Zhao et al. | Tibetan multi-dialect speech recognition using latent regression Bayesian network and end-to-end mode | |
CN107633259A (en) | A kind of cross-module state learning method represented based on sparse dictionary | |
CN117235256A (en) | Emotion analysis classification method under multi-class knowledge system | |
CN116680407A (en) | Knowledge graph construction method and device | |
CN113342982B (en) | Enterprise industry classification method integrating Roberta and external knowledge base | |
CN115659242A (en) | Multimode emotion classification method based on mode enhanced convolution graph | |
CN114880521A (en) | Video description method and medium based on vision and language semantic autonomous optimization alignment | |
CN111008283B (en) | Sequence labeling method and system based on composite boundary information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||