CN116561325A - Multi-language fused media text emotion analysis method - Google Patents
- Publication number
- CN116561325A (application number CN202310826886.4A)
- Authority
- CN
- China
- Prior art keywords
- language
- encoder
- source
- data
- loss function
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F16/353—Information retrieval of unstructured textual data; clustering; classification into predefined classes
- G06N3/045—Neural networks; combinations of networks
- G06N3/047—Probabilistic or stochastic networks
- G06N3/0475—Generative networks
- G06N3/048—Activation functions
- G06N3/0895—Weakly supervised learning, e.g. semi-supervised or self-supervised learning
- G06N3/094—Adversarial learning
- G06N3/096—Transfer learning
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a multi-language fused media text emotion analysis method, belonging to the technical field of data processing, which specifically comprises the following steps: a source domain language vector is used as input to obtain the output of a source language encoder; a language discriminator determines the difference between the output of a target language encoder and the output of the source language encoder, and a learning module and a bilinear module are adopted to correct the parameters of the target language encoder until the difference meets the requirement, yielding a trained target language encoder; the source domain language data and the translated target language data are subjected to data enhancement processing and used as input of a comprehensive encoder, which is constructed from the trained target language encoder and the source language encoder, to obtain an emotion classification result for the target language data, thereby better realizing emotion analysis of multi-language fused media text.
Description
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to a multi-language fused media text emotion analysis method.
Background
Public opinion text data from different regions and different languages have important reference value for decision makers, and information in different languages can serve as mutually complementary research data, enabling decision makers to analyze how an event is perceived in different regions and adjust the corresponding strategy deployment accordingly. For this reason, research on cross-language emotion classification methods is particularly important.
In order to accurately evaluate the emotion tendency of cross-language text, invention patent CN115080734A, a cross-domain emotion classification method based on an attention mechanism and reinforcement learning, applies a random strategy for feature selection using the idea of reinforcement learning, performs policy optimization according to calculated delayed rewards, and uses the optimal emotion classification strategy to realize cross-domain emotion classification. However, the following technical problem exists:
in the reinforcement learning stage, the differences among languages are ignored; the target domain and the source domain are composed of different languages and therefore differ, and reinforcement learning that does not take these differences into account cannot accurately realize emotion recognition and classification.
Aiming at the technical problems, the invention provides a multi-language fused media text emotion analysis method.
Disclosure of Invention
The invention aims to provide a multi-language fused media text emotion analysis method.
In order to solve the technical problems, the invention provides a multi-language fused media text emotion analysis method, which is characterized by comprising the following steps:
s11, acquiring source domain language data, converting the source domain language data into source domain language vectors, and training on the source domain language vectors to obtain a source language encoder and a source language classifier;
s12, initializing a target language encoder based on the source language encoder, and taking a target language vector and a source domain language vector subjected to data enhancement as input of the target language encoder to obtain output of the target language encoder;
s13, taking the source domain language vector as input to obtain the output of a source language encoder, determining the difference between the output of the target language encoder and the output of the source language encoder through a language discriminator, and correcting the parameters of the target language encoder by adopting a learning module and a bilinear module until the difference meets the requirement, thereby obtaining the target language encoder after training is completed;
s14, carrying out data enhancement processing on the source domain language data and the translated target language data to serve as input of a comprehensive encoder, and constructing the comprehensive encoder by adopting the trained target language encoder and the source language encoder to obtain an emotion classification result of the target language data.
A further technical solution is that the source domain encoder is constructed based on an mBERT-S model, and the target domain encoder is constructed based on an mBERT-T model.
A further technical solution is that determining, by means of a language discriminator, a difference between the output of the target language encoder and the output of the source language encoder, comprising in particular:
obtaining the output of the target language encoder, taking the output of the target language encoder as the input of the language discriminator, and determining the probability that the input of the language discriminator is from the target language encoder through the language discriminator;
obtaining an output of the source language encoder, taking the output of the source language encoder as a source language input of the language discriminator, and determining a probability that the source language input of the language discriminator is from the source language encoder through the language discriminator;
and constructing a loss function through the probability of the target language encoder and the probability of the source language encoder, and determining the difference between the output of the target language encoder and the output of the source language encoder based on the loss function.
The further technical scheme is that the construction of the loss function is performed by the probability of the target language encoder and the probability of the source language encoder, and specifically comprises the following steps:
constructing a target language loss function of the target language encoder through the probability of the target language encoder;
constructing a source language loss function of the source language encoder through the probability of the source language encoder;
constructing a loss function through the target language loss function and the source language loss function, wherein the calculation formula of the loss function is as follows:
$$L_D = -\,\mathbb{E}_{x_s \sim X_s}\left[\log D\big(F_s(x_s)\big)\right] - \mathbb{E}_{x_t \sim X_t}\left[\log\Big(1 - D\big(F_t(x_t)\big)\Big)\right]$$
wherein $D$ is the language discriminator and $L_D$ is the discriminator loss function; $X_s$ is the source language text and $X_t$ is the target language text; $F_s$ is the source language feature extractor and $F_t$ is the target language feature extractor; $F_s(x_s)$ denotes the features extracted from a sample $x_s$ selected at random from $X_s$, and $F_t(x_t)$ denotes the features extracted from a sample $x_t$ selected at random from $X_t$; $D(F_s(x_s))$ is the probability determined by the discriminator that the input data come from the source language model, and $D(F_t(x_t))$ is the probability determined by the discriminator that the input data come from the target language model. Minimizing $L_D$ drives the discriminator toward its optimal state, i.e. $D(F_s(x_s))$ approaches 1 as closely as possible and $D(F_t(x_t))$ approaches 0 as closely as possible.
The further technical scheme is that the data enhancement processing is performed on the source domain language data and the translated target language data, and specifically includes:
performing data enhancement processing on the source domain language data and the translated target language data in a Code-switching and secondary Dropout mode to obtain the source domain language data after the data enhancement processing is completed and the translated target language data after the data enhancement processing is completed;
and constructing a model total loss function, and determining the difference between the source domain language data and translated target language data after the data enhancement processing is completed and the same source domain language data and translated target language data before the data enhancement processing.
The further technical scheme is that the model total loss function is composed of a contrast loss function and a mixed distance loss function, wherein the mixed distance loss function is determined according to a Euclidean distance function and a Manhattan distance function, and the calculation formula of the model total loss function is as follows:
$$L_{total} = L_{BT} + \alpha\, L_{dist}$$
wherein $L_{total}$ is the total loss function of the module, $\alpha$ is a manually defined weight parameter, $L_{dist}$ is the mixed distance loss function, and $L_{BT}$ is the Barlow Twins loss function; the specific formulas of $L_{BT}$ and $L_{dist}$ are as follows:
$$L_{BT} = \sum_i \big(1 - C_{ii}\big)^2 + \lambda \sum_i \sum_{j \neq i} C_{ij}^2$$
wherein $L_{BT}$ is the Barlow Twins loss function and $\lambda$ is a positive constant used to trade off the importance of the first and second terms of the loss: the invariance term drives the diagonal elements of $C$ toward 1, keeping the embeddings of different augmented versions of the same sample unchanged, and the redundancy reduction term drives the off-diagonal elements toward 0, reducing redundancy and decorrelating the components of the embedding vectors; $C$ is the cross-correlation matrix calculated along the batch dimension between the outputs of two identical networks, $C_{ii}$ denotes its diagonal elements, and $C_{ij}$ ($i \neq j$) its off-diagonal elements;
$$L_{dist} = \sqrt{\sum_{i=1}^{n}\big(a_i - b_i\big)^2} + \beta \sum_{i=1}^{n}\big|a_i - b_i\big|$$
wherein $L_{dist}$ is the mixed distance loss function, $a_i$ is the $i$-th dimension of a sample in feature space before data enhancement, $b_i$ is the $i$-th dimension of the same sample after data enhancement, $\beta$ is a manually set weight parameter, and $n$ is the total number of dimensions.
The further technical scheme is that the specific steps of determining the emotion classification result of the target language data are as follows:
s21, the source language text after data enhancement processing and the target language text obtained after translation are passed through the comprehensive encoder to obtain the embedded expression of the source language and the embedded expression of the target language, and a single-head paired interaction matrix is obtained through bilinear interaction mapping;
s22, through a bilinear pooling layer, a cross-language joint mapping is determined based on the single-head paired interaction matrix, and the cross-language joint mapping is corrected through sum pooling to obtain a dense feature mapping;
s23, through the dense feature mapping, a fully connected layer and a softmax operation are used to obtain the emotion classification probability.
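As a minimal illustration of step S23, the sketch below applies a fully connected layer and a softmax operation to a pooled dense feature to produce emotion class probabilities. This is an assumption for illustration only, not the patent's implementation; the function names and weight shapes are invented for the example.

```python
import numpy as np

def softmax(x):
    # numerically stable softmax over the last axis
    e = np.exp(x - np.max(x, axis=-1, keepdims=True))
    return e / np.sum(e, axis=-1, keepdims=True)

def classify(dense_feature, W, b):
    # dense_feature: (f,) pooled cross-lingual dense feature map
    # W: (f, C) fully connected weights, b: (C,) bias
    # returns a length-C vector of emotion class probabilities
    return softmax(dense_feature @ W + b)
```

The output vector sums to 1, so its argmax can be read directly as the predicted emotion class.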
The further technical scheme is that the loss function of the comprehensive encoder is determined by adopting negative log likelihood loss, and the difference between the inputs of the comprehensive encoder is measured by KL divergence, wherein the calculation formula of the loss function of the comprehensive encoder is as follows:
$$L = L_{bp} + \gamma\, \mathrm{KL}\big(M \,\|\, N\big)$$
wherein $L$ is the overall loss function of the module, $L_{bp}$ is the bilinear pooling loss function, $\gamma$ is the weight parameter of the loss, $\mathrm{KL}(M \,\|\, N)$ is the KL divergence measuring the difference between the two distributions, $M$ is the probability distribution of the source domain language, and $N$ is the probability distribution of the target domain;
$$L_{bp} = -\sum_{i=1}^{K} \log p_i(y_i)$$
wherein $L_{bp}$ is the loss function of the bilinear pooling module, $y_i$ denotes the true emotion tag of the $i$-th sentence, $p_i(y_i)$ is the probability that the model outputs the correct emotion for the $i$-th sentence, and $K$ is the number of sentences.
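The comprehensive encoder loss above can be sketched numerically as follows. This is a hedged illustration under the assumption that M and N are discrete probability distributions and that the per-sentence probabilities of the correct label are available as an array; the function names and the epsilon guard are invented for the example.

```python
import numpy as np

def nll_loss(probs_correct):
    # negative log-likelihood: probs_correct[i] is the probability the
    # model assigns to the true emotion label of sentence i
    return float(-np.mean(np.log(np.asarray(probs_correct, dtype=float))))

def kl_divergence(m, n, eps=1e-12):
    # KL(M || N) between two discrete probability distributions
    m = np.asarray(m, dtype=float)
    n = np.asarray(n, dtype=float)
    return float(np.sum(m * np.log((m + eps) / (n + eps))))

def integrated_loss(probs_correct, m, n, gamma=0.5):
    # overall loss: bilinear-pooling NLL plus gamma-weighted KL term
    return nll_loss(probs_correct) + gamma * kl_divergence(m, n)
```

When the source and target distributions coincide, the KL term vanishes and only the classification term remains.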
In another aspect, the present invention provides a computer storage medium having a computer program stored thereon, which when executed in a computer causes the computer to perform a multi-lingual fused media text emotion analysis method as described above.
The invention has the beneficial effects that:
the invention carries out data enhancement processing based on traditional generation countermeasure type cross-language knowledge migration, and consists of two parts of comparison learning and a mixed distance formula, the module directly acts on a generator part, and the output of the same sample is compared with the output of a source language encoder and a target language encoder by the two encoders, thus the invention acts on the target language encoder simultaneously with generating a loss function of a countermeasure network, thereby helping the target language encoder to better obtain the knowledge of the source language encoder and pulling up the language characteristic distribution among different languages.
The source language field and the target language field are regarded as two different modes, a paired language interaction module taking bilinear attention as a core is designed, the module learns interaction expression of input source language and target language through double channels, and richer joint information is provided compared with a traditional single attention channel, so that the model learns similarity of positive semantics and negative semantics between the two languages, and the model is improved to finish cross-language emotion classification performance.
Additional features and advantages will be set forth in the description which follows, and in part will be apparent from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and drawings.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
The above and other features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings.
FIG. 1 is a flow chart of a multi-lingual fused media text emotion analysis method according to embodiment 1.
Fig. 2 is a flowchart showing specific steps for determining the emotion classification result of target language data in embodiment 1.
Fig. 3 is a frame diagram of a computer storage medium in embodiment 2.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments can be embodied in many forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The same reference numerals in the drawings denote the same or similar structures, and thus detailed descriptions thereof will be omitted.
The terms "a," "an," "the," and "said" are used to indicate the presence of one or more elements/components/etc.; the terms "comprising" and "having" are intended to be inclusive and mean that there may be additional elements/components/etc. in addition to the listed elements/components/etc.
Example 1
In order to solve the above technical problems, as shown in fig. 1, the present invention provides a multi-language fused media text emotion analysis method, which is characterized by specifically comprising:
s11, acquiring source domain language data, converting the source domain language data into source domain language vectors, and training on the source domain language vectors to obtain a source language encoder and a source language classifier;
s12, initializing a target language encoder based on the source language encoder, and taking a target language vector and a source domain language vector subjected to data enhancement as input of the target language encoder to obtain output of the target language encoder;
the source domain encoder is constructed based on the mBERT-S model, and the target domain encoder is constructed based on the mBERT-T model.
It will be appreciated that determining, by the language discriminator, the difference between the output of the target language encoder and the output of the source language encoder comprises:
obtaining the output of the target language encoder, taking the output of the target language encoder as the input of the language discriminator, and determining the probability that the input of the language discriminator is from the target language encoder through the language discriminator;
it should be noted that the source domain and target domain mappings can be forced to agree by constraining all network layers. Learning such a symmetric transformation saves parameters in the model, but this constraint often degrades the optimization conditions and loses some domain-specific features, making performance less than ideal when one network processes data from two different domains. Another approach is to learn an asymmetric transformation, i.e. to constrain only a portion of the network layers, forcing only that portion to align. In order to enable the source domain and the target domain to extract more common features, the invention adopts the latter, i.e. all other constraints are cancelled and only a small number of hyperparameters are fixed.
Obtaining an output of the source language encoder, taking the output of the source language encoder as a source language input of the language discriminator, and determining a probability that the source language input of the language discriminator is from the source language encoder through the language discriminator;
and constructing a loss function through the probability of the target language encoder and the probability of the source language encoder, and determining the difference between the output of the target language encoder and the output of the source language encoder based on the loss function.
In theory the discriminator should reach a perfect state, but this is obviously difficult to achieve. The target language model can only be trained continuously to gradually acquire the language knowledge in the source language model, so that the discriminator cannot judge the source of the data. The essence of training is that the discriminator learns to determine the source of the data rather than how it is "spoofed".
Specifically, the construction of the loss function through the probability of the target language encoder and the probability of the source language encoder specifically includes:
constructing a target language loss function of the target language encoder through the probability of the target language encoder;
constructing a source language loss function of the source language encoder through the probability of the source language encoder;
constructing a loss function through the target language loss function and the source language loss function, wherein the calculation formula of the loss function is as follows:
$$L_D = -\,\mathbb{E}_{x_s \sim X_s}\left[\log D\big(F_s(x_s)\big)\right] - \mathbb{E}_{x_t \sim X_t}\left[\log\Big(1 - D\big(F_t(x_t)\big)\Big)\right]$$
wherein $D$ is the language discriminator and $L_D$ is the discriminator loss function; $X_s$ is the source language text and $X_t$ is the target language text; $F_s$ is the source language feature extractor and $F_t$ is the target language feature extractor; $F_s(x_s)$ denotes the features extracted from a sample $x_s$ selected at random from $X_s$, and $F_t(x_t)$ denotes the features extracted from a sample $x_t$ selected at random from $X_t$; $D(F_s(x_s))$ is the probability determined by the discriminator that the input data come from the source language model, and $D(F_t(x_t))$ is the probability determined by the discriminator that the input data come from the target language model. Minimizing $L_D$ drives the discriminator toward its optimal state, i.e. $D(F_s(x_s))$ approaches 1 as closely as possible and $D(F_t(x_t))$ approaches 0 as closely as possible.
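A numerical sketch of this discriminator objective, under the assumption that the discriminator's output probabilities for source-encoder and target-encoder features are already available as arrays; the epsilon guard is added only for numerical safety and is not part of the patent's formula.

```python
import numpy as np

def discriminator_loss(d_src, d_tgt, eps=1e-12):
    # d_src: D's probabilities that source-encoder features come from the source
    # d_tgt: D's probabilities that target-encoder features come from the source
    # standard adversarial objective: push d_src toward 1 and d_tgt toward 0
    d_src = np.asarray(d_src, dtype=float)
    d_tgt = np.asarray(d_tgt, dtype=float)
    return float(-np.mean(np.log(d_src + eps))
                 - np.mean(np.log(1.0 - d_tgt + eps)))
```

A perfect discriminator (d_src near 1, d_tgt near 0) drives this loss toward zero; confusion between the two encoders raises it.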
S13, taking the source domain language vector as input to obtain the output of a source language encoder, determining the difference between the output of the target language encoder and the output of the source language encoder through a language discriminator, and correcting the parameters of the target language encoder by adopting a learning module and a bilinear module until the difference meets the requirement, thereby obtaining the target language encoder after training is completed;
it should be noted that, performing data enhancement processing on the source domain language data and the translated target language data specifically includes:
performing data enhancement processing on the source domain language data and the translated target language data in a Code-switching and secondary Dropout mode to obtain the source domain language data after the data enhancement processing is completed and the translated target language data after the data enhancement processing is completed;
note the enhancement mode for constructing sample pairs: Code-switching. Its core principle is to replace part of the vocabulary in the source language text with the target language, mixing the texts so as to construct sample pairs. Sample pairs constructed on this principle further strengthen the cross-language model's ability to learn implicit features, but the method requires maintaining a particularly large bilingual emotion dictionary while training the model, which is unacceptable in the absence of large-scale manual labeling. The invention therefore adopts, as its data enhancement mode, the combined operation of Dropout and Code-switching, which is less dependent on external conditions: each element of a sample is set to zero with a certain probability to construct a positive sample pair, and during training the sample before data enhancement and the sample after data enhancement are compared, shortening the distance between the positive pair in feature space so as to learn the similar features within the samples and thereby improve the model effect.
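The two enhancement operations described above can be sketched roughly as follows. The bilingual dictionary here is a toy stand-in for the large bilingual emotion dictionary the text mentions, and the probabilities are illustrative parameters, not values given in the patent.

```python
import numpy as np

def code_switch(tokens, bilingual_dict, p=0.3, rng=None):
    # replace each source-language token with its target-language
    # counterpart with probability p (dictionary is a toy stand-in)
    rng = rng or np.random.default_rng(0)
    return [bilingual_dict[t] if (t in bilingual_dict and rng.random() < p) else t
            for t in tokens]

def feature_dropout(x, p=0.1, rng=None):
    # zero each element of the sample vector with probability p,
    # producing the second "view" of the positive sample pair
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) >= p
    return x * mask
```

Running a sample through `feature_dropout` twice with different seeds yields the two views compared by the contrastive loss.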
And constructing a model total loss function, and determining the difference between the source domain language data and translated target language data after the data enhancement processing is completed and the same source domain language data and translated target language data before the data enhancement processing.
In particular, the model total loss function is composed of a contrast loss function and a mixed distance loss function, wherein the mixed distance loss function is determined from a Euclidean distance function and a Manhattan distance function:
$$L_{total} = L_{BT} + \alpha\, L_{dist}$$
wherein $L_{total}$ is the total loss function of the module, $\alpha$ is a manually defined weight parameter, $L_{dist}$ is the mixed distance loss function, and $L_{BT}$ is the Barlow Twins loss function; the specific formulas of $L_{BT}$ and $L_{dist}$ are as follows:
$$L_{BT} = \sum_i \big(1 - C_{ii}\big)^2 + \lambda \sum_i \sum_{j \neq i} C_{ij}^2$$
wherein $L_{BT}$ is the Barlow Twins loss function and $\lambda$ is a positive constant used to trade off the importance of the first and second terms of the loss: the invariance term drives the diagonal elements of $C$ toward 1, keeping the embeddings of different augmented versions of the same sample unchanged, and the redundancy reduction term drives the off-diagonal elements toward 0, reducing redundancy and decorrelating the components of the embedding vectors; $C$ is the cross-correlation matrix calculated along the batch dimension between the outputs of two identical networks, $C_{ii}$ denotes its diagonal elements, and $C_{ij}$ ($i \neq j$) its off-diagonal elements;
$$L_{dist} = \sqrt{\sum_{i=1}^{n}\big(a_i - b_i\big)^2} + \beta \sum_{i=1}^{n}\big|a_i - b_i\big|$$
wherein $L_{dist}$ is the mixed distance loss function, $a_i$ is the $i$-th dimension of a sample in feature space before data enhancement, $b_i$ is the $i$-th dimension of the same sample after data enhancement, $\beta$ is a manually set weight parameter, and $n$ is the total number of dimensions.
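A hedged numerical sketch of the two loss terms and their combination, assuming batch-by-dimension embedding matrices whose columns are standardized along the batch before forming the cross-correlation matrix; the weight values and function names are illustrative, not the patent's.

```python
import numpy as np

def barlow_twins_loss(za, zb, lam=5e-3):
    # standardize each embedding dimension along the batch, then form the
    # cross-correlation matrix C between the two views (batch x dim inputs)
    za = (za - za.mean(axis=0)) / (za.std(axis=0) + 1e-12)
    zb = (zb - zb.mean(axis=0)) / (zb.std(axis=0) + 1e-12)
    c = za.T @ zb / za.shape[0]
    on_diag = np.sum((np.diagonal(c) - 1.0) ** 2)            # invariance term
    off_diag = np.sum(c ** 2) - np.sum(np.diagonal(c) ** 2)  # redundancy term
    return float(on_diag + lam * off_diag)

def mixed_distance_loss(a, b, beta=0.5):
    # Euclidean distance plus beta-weighted Manhattan distance between the
    # sample before (a) and after (b) data enhancement
    d = np.asarray(a, dtype=float) - np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum(d ** 2)) + beta * np.sum(np.abs(d)))

def module_total_loss(za, zb, alpha=1.0, beta=0.5, lam=5e-3):
    # L_total = L_BT + alpha * L_dist
    return barlow_twins_loss(za, zb, lam) + alpha * mixed_distance_loss(
        za.ravel(), zb.ravel(), beta)
```

For identical views the mixed distance term vanishes and only the residual off-diagonal correlation of the Barlow Twins term remains.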
For contrastive learning, SimCLR and Barlow Twins are commonly used. SimCLR requires batch sizes that are orders of magnitude larger and relies heavily on negative sample pairs. Given that the cross-language emotion classification task differs from tasks in the computer vision field, its main purpose being to shorten the distance between the language features of the source language and the target language, and that the samples used are only source language and target language samples, it is difficult to construct negative sample pairs, and forcing negative sample pairs would be counterproductive for this task. In summary, the invention selects Barlow Twins as the more suitable contrastive learning method.
S14, carrying out data enhancement processing on the source domain language data and the translated target language data to serve as input of a comprehensive encoder, and constructing the comprehensive encoder by adopting the trained target language encoder and the source language encoder to obtain an emotion classification result of the target language data.
Specifically, as shown in fig. 2, the specific steps for determining the emotion classification result of the target language data are as follows:
S21, the source language text after data enhancement processing and the target language text obtained after translation are passed through the comprehensive encoder to obtain the embedded expression of the source language and the embedded expression of the target language, and a single-head pairwise interaction matrix is obtained through bilinear interaction mapping;
It should be further noted that the source language text after data enhancement processing and the target language text obtained after translation pass through the source language encoder $E_{s}$ and the target language encoder $E_{t}$; after encoding, the embedded expression $H_{s} \in \mathbb{R}^{M \times d}$ of the source language and the embedded expression $H_{t} \in \mathbb{R}^{N \times d}$ of the target language are obtained, where $M$ and $N$ are the lengths of the source language sentence and the target language sentence, respectively. The invention then uses these hidden representations to construct a bilinear interaction map to obtain a single-head pairwise interaction matrix $I \in \mathbb{R}^{M \times N}$. The construction of the interaction matrix $I$ is given by formula 4;
$$I = \left(\left(\mathbb{1}\,q^{\mathsf{T}}\right) \circ \sigma\left(H_{s} U\right)\right)\,\sigma\left(H_{t} V\right)^{\mathsf{T}} \tag{4}$$

Formula 4: construction of the single-head pairwise interaction matrix.
where $U$ is a learnable weight matrix for the source language domain representation, $V$ is a learnable weight matrix for the target language domain representation, $q$ is a learnable weight vector, $\mathbb{1}$ is a fixed all-ones vector, $\circ$ denotes the Hadamard product, and $\sigma$ is a nonlinear activation. Each element $I_{m,n}$ of the interaction matrix $I$ can be calculated by formula 5;
$$I_{m,n} = q^{\mathsf{T}}\left(\sigma\left(U^{\mathsf{T}} h_{s}^{m}\right) \circ \sigma\left(V^{\mathsf{T}} h_{t}^{n}\right)\right) \tag{5}$$

Formula 5: element-wise calculation of the interaction matrix, where $h_{s}^{m}$ and $h_{t}^{n}$ are the $m$-th source-language and $n$-th target-language token representations.
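Formulas 4 and 5 can be checked against each other with a small numerical sketch; the shapes, the sigmoid activation, and the variable names are illustrative assumptions:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def interaction_matrix(H_s, H_t, U, V, q):
    # Formula 4: I = ((1 q^T) o sigma(H_s U)) sigma(H_t V)^T
    # H_s: (M, d) source embeddings, H_t: (N, d) target embeddings,
    # U, V: (d, K) learnable weight matrices, q: (K,) learnable vector.
    M = H_s.shape[0]
    ones = np.ones((M, 1))                         # fixed all-ones vector
    left = (ones @ q[None, :]) * sigmoid(H_s @ U)  # Hadamard product, (M, K)
    return left @ sigmoid(H_t @ V).T               # (M, N) interaction matrix
```

Each entry then agrees with the element-wise form of formula 5, $I_{m,n} = q^{\mathsf{T}}(\sigma(U^{\mathsf{T}} h_{s}^{m}) \circ \sigma(V^{\mathsf{T}} h_{t}^{n}))$, which the test below verifies numerically.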
S22, through a bilinear pooling layer, determining a cross-language joint mapping based on the single-head pairwise interaction matrix, and aggregating the cross-language joint mapping through a sum pool to obtain a dense feature map;
It should be further noted that, in order to obtain the cross-language joint mapping $f$, the invention introduces a bilinear pooling layer on top of the interaction matrix $I$. Specifically, the $k$-th element of $f$ can be obtained from formula 6;
$$f_{k} = \sigma\left(H_{s} u_{k}\right)^{\mathsf{T}}\, I\, \sigma\left(H_{t} v_{k}\right) \tag{6}$$

Formula 6: calculation of the cross-language joint mapping.
where $u_{k}$ and $v_{k}$ represent the $k$-th columns of the weight matrices $U$ and $V$, respectively. It should be noted that this layer has no new learnable parameters: the weight matrices $U$ and $V$ are shared with the previous interaction mapping layer to reduce the number of parameters and mitigate overfitting. Furthermore, on top of the cross-language joint mapping $f$, the invention adds a sum pool (Sum-pooling) to obtain a dense feature map $f'$.
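The bilinear pooling step of formula 6 and the subsequent sum pool can be sketched as follows, reusing $U$ and $V$ from the interaction layer; the non-overlapping window size of the sum pool is an assumption for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def joint_mapping(H_s, H_t, U, V, I):
    # Formula 6: f_k = sigma(H_s u_k)^T I sigma(H_t v_k), reusing the
    # k-th columns u_k, v_k of the shared weight matrices U and V,
    # so this layer adds no new learnable parameters.
    A = sigmoid(H_s @ U)  # (M, K)
    B = sigmoid(H_t @ V)  # (N, K)
    return np.array([A[:, k] @ I @ B[:, k] for k in range(U.shape[1])])

def sum_pool(f, s):
    # Sum-pooling with non-overlapping windows of size s (assumed here),
    # producing the dense feature map f'.
    return f.reshape(-1, s).sum(axis=1)
```

For example, sum-pooling the vector [1, 2, 3, 4] with window size 2 yields the dense map [3, 7].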
S23, obtaining the emotion classification probability from the dense feature map by using a fully connected layer and a softmax operation.
It should be noted that the loss function of the integrated encoder is determined using negative log-likelihood loss, and the difference between the inputs of the integrated encoder is measured by KL divergence. The loss function of the integrated encoder is calculated as follows:
$$\mathcal{L}_{total} = \mathcal{L}_{nll} + \mu\,\mathrm{KL}\left(M \,\|\, N\right);$$
where $\mathcal{L}_{total}$ is the overall loss function of the module, $\mathcal{L}_{nll}$ is the bilinear pooling loss function, $\mu$ is the loss weight parameter, and the KL divergence measures the difference between the two distributions, $M$ being the probability distribution of the source domain language and $N$ the probability distribution of the target domain;
$$\mathcal{L}_{nll} = -\sum_{i} \log p_{i}\left(y_{i}\right);$$
where $\mathcal{L}_{nll}$ is the loss function of the bilinear pooling module, $y_{i}$ indicates the true emotion tag of the $i$-th sentence, and $p_{i}(y_{i})$ is the probability that the model outputs the correct emotion of the sample for the $i$-th sentence.
Finally, the invention passes the feature map $f'$ through a fully connected layer and a softmax operation to obtain the emotion classification probability, as shown in formula 7:
$$P = \operatorname{softmax}\left(W f' + b\right) \tag{7}$$

Formula 7: emotion classification probability.
where $W$ represents a learnable weight matrix and $b$ represents the bias. In this section, the invention is trained using the negative log-likelihood loss (NLL), as shown in formula 8:
$$\mathcal{L}_{nll} = -\sum_{i} \log p_{i}\left(y_{i}\right) \tag{8}$$

Formula 8: negative log-likelihood loss.
where $y_{i}$ indicates the true emotion tag of the $i$-th sentence and $p_{i}(y_{i})$ is the output probability of the model for that tag.
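The classification head and NLL objective of formulas 7 and 8 can be sketched as follows; the shapes are illustrative:

```python
import numpy as np

def softmax(logits):
    # numerically stable softmax along the last axis
    e = np.exp(logits - logits.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def classify(f_dense, W, b):
    # Formula 7: P = softmax(W f' + b) over the emotion classes
    return softmax(W @ f_dense + b)

def nll_loss(probs, labels):
    # Formula 8: negative log-likelihood of the true emotion tags;
    # probs: (num_sentences, num_classes), labels: true tag indices
    return -float(np.sum(np.log(probs[np.arange(len(labels)), labels])))
```

With zero weights the head outputs the uniform distribution, and two uniformly classified binary sentences give an NLL of 2·log 2.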
Furthermore, given the probability distribution p of the source domain language and the probability distribution q of the target domain, the present invention uses the KL divergence to measure the difference of the two distributions. The final objective optimization function of the present module is therefore shown in equation 9:
$$\mathcal{L}_{total} = \mathcal{L}_{nll} + \mu\,\mathrm{KL}\left(p \,\|\, q\right) \tag{9}$$

Formula 9: total loss function of the module.
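The KL-divergence term and the module objective of formula 9 can be sketched as follows; the weight name `mu` is an assumption:

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    # KL(p || q) = sum_i p_i * log(p_i / q_i); eps guards against log(0)
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def module_total_loss(nll, p, q, mu=0.1):
    # Formula 9: L_total = L_nll + mu * KL(p || q)
    return nll + mu * kl_divergence(p, q)
```

Identical source and target distributions contribute zero divergence, so the total loss reduces to the NLL term alone.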
Example 2
As shown in fig. 2, the present invention provides a computer storage medium having a computer program stored thereon, which when executed in a computer, causes the computer to perform a multi-language fused media text emotion analysis method as described above.
The invention has the beneficial effects that:
The invention builds on traditional generative adversarial cross-language knowledge transfer by adding data enhancement processing, consisting of two parts: contrastive learning and a mixed distance formula. The module acts directly on the generator part, comparing the outputs produced by the source language encoder and the target language encoder for the same sample. It therefore acts on the target language encoder together with the loss function of the generative adversarial network, helping the target language encoder better acquire the knowledge of the source language encoder and drawing the language feature distributions of different languages closer together.
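The data enhancement referred to here uses Code-switching plus a secondary Dropout, per the claims. The Code-switching part can be sketched as below; the bilingual dictionary and replacement probability are illustrative assumptions (the secondary Dropout part would simply pass the same sample through a Dropout layer twice):

```python
import random

def code_switch(tokens, bilingual_dict, p=0.3, seed=None):
    # Replace each token that has an entry in a (hypothetical) bilingual
    # dictionary with its translation, with probability p, producing a
    # mixed-language augmented view of the sentence.
    rng = random.Random(seed)
    return [bilingual_dict[t] if t in bilingual_dict and rng.random() < p
            else t for t in tokens]
```

With `p=1.0` every dictionary word is switched; with `p=0.0` the sentence is returned unchanged.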
The source language field and the target language field are regarded as two different modalities, and a pairwise language interaction module with bilinear attention at its core is designed. The module learns the interaction representation of the input source language and target language through dual channels, providing richer joint information than a traditional single attention channel, so that the model learns the similarity of positive and negative semantics between the two languages, improving its cross-language emotion classification performance.
In embodiments of the present invention, the term "plurality" refers to two or more, unless explicitly defined otherwise. The terms "mounted," "connected," "secured," and the like are to be construed broadly, and may be, for example, fixedly attached, detachably attached, or integrally attached. The specific meaning of the above terms in the embodiments of the present invention will be understood by those of ordinary skill in the art according to specific circumstances.
In the description of the embodiments of the present invention, it should be understood that the directions or positional relationships indicated by the terms "upper", "lower", etc. are based on the directions or positional relationships shown in the drawings, are merely for convenience in describing the embodiments of the present invention and to simplify the description, and do not indicate or imply that the devices or units referred to must have a specific direction, be configured and operated in a specific direction, and thus should not be construed as limiting the embodiments of the present invention.
In the description of the present specification, the terms "one embodiment," "a preferred embodiment," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the embodiments of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above is only a preferred embodiment of the present invention and is not intended to limit the embodiment of the present invention, and various modifications and variations can be made to the embodiment of the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the embodiments of the present invention should be included in the protection scope of the embodiments of the present invention.
Claims (9)
1. A multi-language fused media text emotion analysis method is characterized by comprising the following steps:
acquiring source domain language data, converting the source domain language data into source domain language vectors, and training the source domain language vectors to obtain a source language encoder and a source language classifier;
initializing a target language encoder based on the source language encoder, and taking a target language vector and a source domain language vector subjected to data enhancement as input of the target language encoder to obtain output of the target language encoder;
the source domain language vector is used as input to obtain the output of a source language encoder, the difference between the output of the target language encoder and the output of the source language encoder is determined through a language discriminator, and a learning module and a bilinear module are adopted to correct the parameters of the target language encoder until the difference meets the requirement, and the trained target language encoder is obtained;
and carrying out data enhancement processing on the source domain language data and the translated target language data to serve as input of a comprehensive encoder, and constructing the comprehensive encoder by adopting the trained target language encoder and the source language encoder to obtain an emotion classification result of the target language data.
2. The method of claim 1, wherein the source domain encoder is constructed based on a mBERT-S model and the target domain encoder is constructed based on a mBERT-T model.
3. The method of claim 1, wherein determining, by a language discriminator, the difference between the output of the target language encoder and the output of the source language encoder comprises:
obtaining the output of the target language encoder, taking the output of the target language encoder as the input of the language discriminator, and determining the probability that the input of the language discriminator is from the target language encoder through the language discriminator;
obtaining an output of the source language encoder, taking the output of the source language encoder as a source language input of the language discriminator, and determining a probability that the source language input of the language discriminator is from the source language encoder through the language discriminator;
and constructing a loss function through the probability of the target language encoder and the probability of the source language encoder, and determining the difference between the output of the target language encoder and the output of the source language encoder based on the loss function.
4. The method for emotion analysis of multilingual fused media text of claim 3, wherein constructing a loss function by probability of said target language encoder and probability of said source language encoder comprises:
constructing a target language loss function of the target language encoder through the probability of the target language encoder;
constructing a source language loss function of the source language encoder through the probability of the source language encoder;
constructing a loss function through the target language loss function and the source language loss function, wherein the calculation formula of the loss function is as follows:
$$\min_{F_{t}} \max_{D} \mathcal{L}_{D} = \mathbb{E}_{x_{s} \sim X_{s}}\left[\log D\left(F_{s}\left(x_{s}\right)\right)\right] + \mathbb{E}_{x_{t} \sim X_{t}}\left[\log\left(1 - D\left(F_{t}\left(x_{t}\right)\right)\right)\right];$$
where $D$ is the language discriminator, $\mathcal{L}_{D}$ is the discriminator loss function, $x_{s}$ is the source language text, $x_{t}$ is the target language text, $F_{s}$ is the source language feature extractor, and $F_{t}$ is the target language feature extractor. $F_{s}(x_{s})$ extracts features from a sample $x_{s}$ selected at random from the source language samples, and $F_{t}(x_{t})$ extracts features from a sample $x_{t}$ selected at random from the target language samples. $D(F_{s}(x_{s}))$ is the probability with which the discriminator determines that the input data is from the source language model, and $D(F_{t}(x_{t}))$ is the probability with which the discriminator determines that the input data is from the target language model. The optimization drives the state of the discriminator toward the optimum, i.e. $D(F_{s}(x_{s}))$ approaches 1 as much as possible and $D(F_{t}(x_{t}))$ approaches 0 as much as possible.
5. The method for emotion analysis of multilingual fused media text according to claim 1, wherein the data enhancement processing is performed on the source domain language data and the translated target language data, and specifically comprises:
performing data enhancement processing on the source domain language data and the translated target language data in a Code-switching and secondary Dropout mode to obtain the source domain language data after the data enhancement processing is completed and the translated target language data after the data enhancement processing is completed;
and constructing a model total loss function, and using it to determine the difference between the source domain language data and the translated target language data after the data enhancement processing is completed and the corresponding source domain language data and translated target language data before the data enhancement processing.
6. The method of claim 5, wherein the model total loss function is comprised of a contrast loss function and a hybrid distance loss function, wherein the hybrid distance loss function is determined from a euclidean distance function and a manhattan distance function, wherein the model total loss function is calculated as:
$$\mathcal{L}_{total} = \mathcal{L}_{BT} + \alpha\,\mathcal{L}_{dist};$$
where $\mathcal{L}_{total}$ refers to the total loss function of the module, $\alpha$ is a manually defined weight parameter, $\mathcal{L}_{dist}$ is the mixed distance loss function, and $\mathcal{L}_{BT}$ is the Barlow Twins loss function. The specific formulas of $\mathcal{L}_{BT}$ and $\mathcal{L}_{dist}$ are as follows:
$$\mathcal{L}_{BT} = \sum_{i=1}^{d}\left(1 - C_{ii}\right)^{2} + \lambda \sum_{i=1}^{d}\sum_{j \neq i} C_{ij}^{2};$$
where $\mathcal{L}_{BT}$ is the Barlow Twins loss function and $\lambda$ is a positive constant used to trade off the importance of the first and second terms of the loss. The invariance term drives the diagonal elements of $C$ toward 1, leaving the embeddings of different augmented versions of the same sample unchanged, while the redundancy-reduction term drives the off-diagonal elements toward 0 to reduce redundancy, decorrelating the different embedding vectors. $C$ is the cross-correlation matrix computed along the batch dimension between the outputs of two identical networks, where $d$ is the matrix dimension, $C_{ii}$ denotes the diagonal elements of the $d \times d$ matrix, and $C_{ij}$ ($i \neq j$) denotes its off-diagonal elements;
$$\mathcal{L}_{dist} = \beta\,\sqrt{\sum_{i=1}^{n}\left(x_{i} - y_{i}\right)^{2}} + \left(1 - \beta\right)\sum_{i=1}^{n}\left|x_{i} - y_{i}\right|;$$
where $\mathcal{L}_{dist}$ is the mixed distance loss function; $x$ is a sample before data enhancement, with $x_{i}$ the $i$-th dimension of the sample in feature space before enhancement; $y$ is the data-enhanced sample, with $y_{i}$ the $i$-th dimension of the enhanced sample; $\beta$ is a manually set weight parameter; and $n$ is the total number of dimensions.
7. The emotion analysis method of multi-language fused media text according to claim 1, wherein the specific steps of determining emotion classification result of the target language data are:
the method comprises the steps that the source language text after data enhancement processing and the target language text obtained after translation are passed through the comprehensive encoder to obtain the embedded expression of the source language and the embedded expression of the target language, and a single-head pairwise interaction matrix is obtained through bilinear interaction mapping;
determining cross-language joint mapping based on the single-head pair interaction matrix through a bilinear pooling layer, and correcting the cross-language joint mapping through a sum pool to obtain dense feature mapping;
and obtaining emotion classification probability by adopting the dense feature mapping and adopting a full connection layer and softmax operation.
8. The method of claim 1, wherein the loss function of the integrated encoder is determined using negative log-likelihood loss and the difference between the inputs of the integrated encoder is measured by KL divergence, wherein the loss function of the integrated encoder is calculated by the formula: $\mathcal{L}_{total} = \mathcal{L}_{nll} + \mu\,\mathrm{KL}\left(M \,\|\, N\right)$;
wherein the method comprises the steps ofFor the overall function of the module +.>For bilinear pooling loss function, +.>In order to be a weight parameter for the loss,measuring the difference of two distributions for DL divergence, M is the probability distribution of source domain language, N is the probability distribution of target domain>;
where $\mathcal{L}_{nll}$ is the loss function of the bilinear pooling module, $y_{i}$ indicates the true emotion tag of the $i$-th sentence, and $p_{i}(y_{i})$ is the probability that the model outputs the correct emotion of the sample for the $i$-th sentence.
9. A computer storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform a multi-lingual fused media text emotion analysis method according to any of claims 1-8.
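For illustration only (not part of the claims), the adversarial discriminator objective of claim 4 can be sketched numerically; the function below scores discriminator probability outputs rather than training anything:

```python
import numpy as np

def discriminator_objective(d_source, d_target, eps=1e-12):
    # E[log D(F_s(x_s))] + E[log(1 - D(F_t(x_t)))]: the discriminator is
    # trained to push its output toward 1 on source-encoder features and
    # toward 0 on target-encoder features; d_source and d_target are its
    # probability outputs on the two kinds of features.
    d_source = np.asarray(d_source, dtype=float)
    d_target = np.asarray(d_target, dtype=float)
    return float(np.mean(np.log(d_source + eps))
                 + np.mean(np.log(1.0 - d_target + eps)))
```

A near-perfect discriminator scores close to 0, while one that cannot tell the encoders apart (outputs of 0.5 everywhere) scores 2·log 0.5 ≈ −1.386, which is what the adversarially trained target encoder aims for.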
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310826886.4A CN116561325B (en) | 2023-07-07 | 2023-07-07 | Multi-language fused media text emotion analysis method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310826886.4A CN116561325B (en) | 2023-07-07 | 2023-07-07 | Multi-language fused media text emotion analysis method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116561325A true CN116561325A (en) | 2023-08-08 |
CN116561325B CN116561325B (en) | 2023-10-13 |
Family
ID=87500420
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310826886.4A Active CN116561325B (en) | 2023-07-07 | 2023-07-07 | Multi-language fused media text emotion analysis method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116561325B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117648410A (en) * | 2024-01-30 | 2024-03-05 | 中国标准化研究院 | Multi-language text data analysis system and method |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106326214A (en) * | 2016-08-29 | 2017-01-11 | 中译语通科技(北京)有限公司 | Method and device for cross-language emotion analysis based on transfer learning |
CN109325112A (en) * | 2018-06-27 | 2019-02-12 | 北京大学 | A kind of across language sentiment analysis method and apparatus based on emoji |
US20210390270A1 (en) * | 2020-06-16 | 2021-12-16 | Baidu Usa Llc | Cross-lingual unsupervised classification with multi-view transfer learning |
CN113901208A (en) * | 2021-09-15 | 2022-01-07 | 昆明理工大学 | Method for analyzing emotion tendentiousness of intermediate-crossing language comments blended with theme characteristics |
CN114238636A (en) * | 2021-12-14 | 2022-03-25 | 东南大学 | Translation matching-based cross-language attribute level emotion classification method |
CN115080734A (en) * | 2022-04-29 | 2022-09-20 | 石燕青 | Cross-domain emotion classification method based on attention mechanism and reinforcement learning |
CN115952787A (en) * | 2023-03-13 | 2023-04-11 | 北京澜舟科技有限公司 | Emotion analysis method, system and storage medium for specified target entity |
- 2023-07-07 CN CN202310826886.4A patent/CN116561325B/en active Active
Patent Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106326214A (en) * | 2016-08-29 | 2017-01-11 | 中译语通科技(北京)有限公司 | Method and device for cross-language emotion analysis based on transfer learning |
CN109325112A (en) * | 2018-06-27 | 2019-02-12 | 北京大学 | A kind of across language sentiment analysis method and apparatus based on emoji |
US20210390270A1 (en) * | 2020-06-16 | 2021-12-16 | Baidu Usa Llc | Cross-lingual unsupervised classification with multi-view transfer learning |
CN113901208A (en) * | 2021-09-15 | 2022-01-07 | 昆明理工大学 | Method for analyzing emotion tendentiousness of intermediate-crossing language comments blended with theme characteristics |
CN114238636A (en) * | 2021-12-14 | 2022-03-25 | 东南大学 | Translation matching-based cross-language attribute level emotion classification method |
CN115080734A (en) * | 2022-04-29 | 2022-09-20 | 石燕青 | Cross-domain emotion classification method based on attention mechanism and reinforcement learning |
CN115952787A (en) * | 2023-03-13 | 2023-04-11 | 北京澜舟科技有限公司 | Emotion analysis method, system and storage medium for specified target entity |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117648410A (en) * | 2024-01-30 | 2024-03-05 | 中国标准化研究院 | Multi-language text data analysis system and method |
CN117648410B (en) * | 2024-01-30 | 2024-05-14 | 中国标准化研究院 | Multi-language text data analysis system and method |
Also Published As
Publication number | Publication date |
---|---|
CN116561325B (en) | 2023-10-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109190131B (en) | Neural machine translation-based English word and case joint prediction method thereof | |
CN111160037B (en) | Fine-grained emotion analysis method supporting cross-language migration | |
CN108363743B (en) | Intelligent problem generation method and device and computer readable storage medium | |
WO2021155699A1 (en) | Global encoding method for automatic abstract of chinese long text | |
CN107305543B (en) | Method and device for classifying semantic relation of entity words | |
CN116561325B (en) | Multi-language fused media text emotion analysis method | |
CN113901831B (en) | Parallel sentence pair extraction method based on pre-training language model and bidirectional interaction attention | |
Yan et al. | Smarter Response with Proactive Suggestion: A New Generative Neural Conversation Paradigm. | |
Xiu et al. | A handwritten Chinese text recognizer applying multi-level multimodal fusion network | |
CN112507717A (en) | Medical field entity classification method fusing entity keyword features | |
Liu et al. | Cross-domain slot filling as machine reading comprehension: A new perspective | |
CN117113937A (en) | Electric power field reading and understanding method and system based on large-scale language model | |
CN116955644A (en) | Knowledge fusion method, system and storage medium based on knowledge graph | |
CN111428518B (en) | Low-frequency word translation method and device | |
CN114595700A (en) | Zero-pronoun and chapter information fused Hanyue neural machine translation method | |
CN116415587A (en) | Information processing apparatus and information processing method | |
CN117933258A (en) | Named entity identification method and system | |
Zhao et al. | Tibetan multi-dialect speech recognition using latent regression Bayesian network and end-to-end mode | |
CN107633259A (en) | A kind of cross-module state learning method represented based on sparse dictionary | |
CN117235256A (en) | Emotion analysis classification method under multi-class knowledge system | |
CN116680407A (en) | Knowledge graph construction method and device | |
CN113342982B (en) | Enterprise industry classification method integrating Roberta and external knowledge base | |
CN115659242A (en) | Multimode emotion classification method based on mode enhanced convolution graph | |
CN114880521A (en) | Video description method and medium based on vision and language semantic autonomous optimization alignment | |
CN111008283B (en) | Sequence labeling method and system based on composite boundary information |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||