CN115982395B - Emotion prediction method, medium and device for quantum-based media information - Google Patents

Info

Publication number: CN115982395B
Application number: CN202310267414.XA
Authority: CN (China)
Prior art keywords: information, generate, emotion, complex, target
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN115982395A
Inventors: 王磊, 蒋永余, 王俊艳, 王宇琪, 曹家, 罗引
Current and original assignee: Beijing Zhongke Wenge Technology Co ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list; the priority date is likewise an assumption and not a legal conclusion)
Events: application CN202310267414.XA filed by Beijing Zhongke Wenge Technology Co ltd; priority to CN202310267414.XA; publication of CN115982395A; application granted; publication of CN115982395B; status active; anticipated expiration

Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02D: Climate change mitigation technologies in information and communication technologies [ICT], i.e. information and communication technologies aiming at the reduction of their own energy use
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention relates to the field of multi-modal emotion prediction, and in particular to an emotion prediction method, medium and device for quantum-based media information. The method comprises the following steps: preprocessing the representation information of each modality to generate complex word vector sets A1 and A2 corresponding to the representation information of each modality; performing feature conversion processing on A1 and A2 to generate corresponding feature density matrix sets ρ_t and ρ_v; performing feature fusion processing on ρ_t and ρ_v to generate a fusion feature f_p; generating, according to f_p and the projection operators of a plurality of preset emotion types, a probability value of f_p for each preset emotion type; and taking the emotion type corresponding to the maximum value among P(e_1), P(e_2), …, P(e_w) as the emotion type of the target media information. By utilizing a quantum-theoretic model, the information interaction among different modalities can be captured more effectively, thereby improving the accuracy of predicting the emotion expressed by media information.

Description

Emotion prediction method, medium and device for quantum-based media information
Technical Field
The invention relates to the field of multi-mode emotion prediction, in particular to an emotion prediction method, medium and device for quantum-based media information.
Background
Media platforms increasingly use multiple media formats (e.g., text–image pairs, video) to jointly express news information. Multi-modal media content can convey information more accurately and intuitively than a single modality.
In the prior art, in order to accurately establish an association between an image and text, many multi-modal information analysis methods use deep neural networks to first encode the image and the text into dense representations and then learn to measure their similarity. For example, the entire image and the entire sentence are mapped into a common vector space, and the cosine similarity between the global representations is calculated. To improve the discriminative power of the unified embedding, strategies such as semantic concept learning and regional relation reasoning have been proposed, which enhance the visual features by fusing local regional semantics.
Existing multi-modal information analysis models based on neural components are black boxes with poor interpretability. The problem of multi-modal information processing is ultimately a problem of human cognition, yet methods based on classical probability theory often fail to effectively model the interactions between modalities and contextual information from a human cognitive perspective. Irony, for example, is a subtle form of expression in human language intended to convey criticism, humor or cynicism through exaggeration, metaphor and the like. What an ironic statement expresses literally tends to be the opposite of its actual meaning, so such expressions can reverse the polarity of emotion. As products of human subjective consciousness, emotional and ironic expressions are naturally closely related. However, current neural-network-based multi-modal information analysis models cannot analyze information from the perspective of human cognition, so the accuracy of prior-art predictions of the emotion expressed by media information is low.
Disclosure of Invention
Aiming at the technical problems, the invention adopts the following technical scheme:
according to one aspect of the present invention, there is provided an emotion prediction method for quantum-based media information, the method comprising the steps of:
acquiring representation information of any two modes of target media information;
preprocessing the representation information of each modality to generate complex word vector sets A1 and A2 corresponding to the representation information of each modality, wherein A1 = (A1_1, A1_2, …, A1_n, …, A1_f(A1)); A2 = (A2_1, A2_2, …, A2_q, …, A2_f(A2)); A1_n is the n-th complex word vector in A1; f(A1) is the total number of complex word vectors in A1; A2_q is the q-th complex word vector in A2; f(A2) is the total number of complex word vectors in A2; n ranges from 1 to f(A1) and q ranges from 1 to f(A2);
performing feature conversion processing on A1 and A2 respectively to generate corresponding feature density matrix sets ρ_t and ρ_v, wherein ρ_t = (ρ_t^1, ρ_t^2, …, ρ_t^β, …, ρ_t^g) and ρ_v = (ρ_v^1, ρ_v^2, …, ρ_v^j, …, ρ_v^g); ρ_t^β is the feature density matrix generated from A1 under the β-th preset initial parameter; ρ_v^j is the feature density matrix generated from A2 under the j-th preset initial parameter; β = 1, 2, …, g; j = 1, 2, …, g; g is the total number of categories of preset initial parameters;
performing feature fusion processing on ρ_t and ρ_v to generate the fusion feature f_p of the target media information, wherein f_p satisfies the following condition:

f_p = PCA( ρ_t^β + ρ_v^j + 2·(ρ_t^β · ρ_v^j)^(1/2) · cos φ_β,j )

wherein 2·(ρ_t^β · ρ_v^j)^(1/2) · cos φ_β,j is the interference term information; φ_β,j is the degree of association between the two modality information ρ_t^β and ρ_v^j; φ_β,j ∈ [−π, π]; PCA() denotes performing PCA processing on the matrix;
generating, according to f_p and the projection operators of a plurality of preset emotion types, probability values P(e_1), P(e_2), …, P(e_δ), …, P(e_w) of f_p for each preset emotion type, wherein P(e_δ) is the probability value of f_p for the δ-th preset emotion type; δ = 1, 2, …, w; w is the total number of preset emotion types; P(e_δ) satisfies the following condition:

P(e_δ) = tr( E_δ · f_p · (E_δ)^H )

wherein E_δ is the projection operator of the δ-th preset emotion type; tr() is the trace function, used to obtain the trace of a matrix; ()^H is the conjugate transpose of a matrix;
taking the emotion type corresponding to the maximum value among P(e_1), P(e_2), …, P(e_δ), …, P(e_w) as the emotion type of the target media information.
According to a second aspect of the present invention, there is provided a non-transitory computer readable storage medium storing a computer program which when executed by a processor implements an emotion prediction method for quantum-based media information as described above.
According to a third aspect of the present invention, there is provided an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the emotion prediction method for quantum-based media information as described above when executing the computer program.
The invention has at least the following beneficial effects:
the information of each mode can be vectorized by utilizing the quantum probability principle, and a complex word vector in a multi-dimensional superposition state is formed. Therefore, uncertainty in different modal languages of a human can be captured, and subtle expression forms in the human language can be effectively represented. Then, a sliding window is arranged to segment a complex number word vector set corresponding to the representation information of each mode, so that a plurality of quantum composite systems are formed, and then context information among single-mode information is captured through the quantum composite systems to generate a density matrix corresponding to the information of each mode. And further, carrying out feature fusion processing on the density matrixes corresponding to the two modes respectively, namely quantum interference calculation. Thus, nonlinear fusion of the multi-mode information features is realized through quantum interference calculation. And finally, describing the correlation between the fused multi-modal characteristics and different emotion classifications through quantum incompatibility measurement, and outputting a preset emotion type corresponding to a projection operator with highest correlation as a prediction result.
The general technical scheme of the invention mainly comprises: data preprocessing and vectorization of multi-modal features, construction of single-modal information feature representations, multi-modal information feature fusion, and multi-modal social media emotion type prediction, with quantum theory used to process the data at each of these steps. Quantum theory has been shown to resolve the paradoxes of classical probability theory in human cognitive modeling; by using a quantum-theoretic model, the information interaction between different modalities can be captured more effectively, so that the emotion types corresponding to ironic expressions can be identified more accurately, and the accuracy of predicting the emotion expressed by media information can be improved. Moreover, a model based on quantum theory has better interpretability.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of an emotion prediction method for quantum-based media information according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
According to one aspect of the present invention, as shown in fig. 1, there is provided an emotion prediction method for quantum-based media information, the method comprising the steps of:
S100: acquiring the representation information of any two modalities of the target media information. Preferably, the modalities of the representation information corresponding to the target media information include any two of text, image and audio.
The target media information in the invention can be the existing social media information, such as the information released by the existing short video platform. Typically, the media information contains information of a plurality of modalities. Such as corresponding image, audio, and text information. In this embodiment, the combined information of any two modes is taken as the input information of the subsequent processing. Such as images and text information.
S200: preprocessing the representation information of each modality to generate complex word vector sets A1 and A2 corresponding to the representation information of each modality, wherein A1 = (A1_1, A1_2, …, A1_n, …, A1_f(A1)) and A2 = (A2_1, A2_2, …, A2_q, …, A2_f(A2)). A1_n is the n-th complex word vector in A1; f(A1) is the total number of complex word vectors in A1. A2_q is the q-th complex word vector in A2; f(A2) is the total number of complex word vectors in A2. n ranges from 1 to f(A1), and q ranges from 1 to f(A2).
The pretreatment in this step comprises the following steps: denoising and segmenting the information of each single mode and vectorizing the characteristics of the mode information.
The denoising and segmentation process may change the information of each modality into an ordered set of multiple valid sub-information. Such as: the text needs to be subjected to word segmentation, word stopping, word list generation and other operations; the picture needs to extract a target detection frame of a plurality of images; the voice needs to be subjected to noise reduction, segmentation and the like. Finally, the information of each mode is correspondingly changed into an ordered set of word or image target detection frames or voice fragments.
The vectorization processing of the modal information features is to vectorize the sub-information in the ordered set of each modal information by utilizing quantum probability. Taking text information as an example, each word may be represented vectorized using an superposition of basis vectors over a multidimensional Hilbert space. Uncertainty in the different modal languages of a human can thus be captured by quantum probability.
S300: performing feature conversion processing on A1 and A2 respectively to generate corresponding feature density matrix sets ρ_t and ρ_v, wherein ρ_t = (ρ_t^1, ρ_t^2, …, ρ_t^β, …, ρ_t^g) and ρ_v = (ρ_v^1, ρ_v^2, …, ρ_v^j, …, ρ_v^g). ρ_t^β is the feature density matrix generated from A1 under the β-th preset initial parameter; ρ_v^j is the feature density matrix generated from A2 under the j-th preset initial parameter. β = 1, 2, …, g; j = 1, 2, …, g; g is the total number of categories of preset initial parameters.
Preferably, each preset initial parameter u_t^ξ is a shared trainable parameter that can be used in the feature representation of different texts.
In this step, contextual information among the single-modality information is captured through quantum composite systems, and the information fragments are computed with shared preset initial parameters, which effectively reduces the computational complexity.
S400: performing feature fusion processing on ρ_t and ρ_v to generate the fusion feature f_p of the target media information, wherein f_p satisfies the following condition:

f_p = PCA( ρ_t^β + ρ_v^j + 2·(ρ_t^β · ρ_v^j)^(1/2) · cos φ_β,j )

wherein 2·(ρ_t^β · ρ_v^j)^(1/2) · cos φ_β,j is the interference term information; φ_β,j is the degree of association between the two modality information ρ_t^β and ρ_v^j; φ_β,j ∈ [−π, π]; PCA() denotes performing PCA processing on the matrix.
To address the fact that current multi-modal information analysis models based on neural components cannot model the interaction of contextual information between modalities from the perspective of human cognition, the invention introduces the concept of quantum interference to perform the feature fusion between modalities.
In this step, the information of the two modalities is fused so that the finally generated fusion feature f_p of the target media information has richer and more accurate semantic features. Specifically, the nonlinear fusion of the features is performed using the principle of quantum interference: in quantum interference, the information of the two modalities can superpose and interfere, so the interference information is described by 2·(ρ_t^β · ρ_v^j)^(1/2) · cos φ_β,j. Meanwhile, since the term ρ_t^β + ρ_v^j + 2·(ρ_t^β · ρ_v^j)^(1/2) · cos φ_β,j carries an enormous number of parameters, PCA (Principal Component Analysis) processing is performed on it to reduce the number of parameters and mitigate the consumption of computing resources.
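As an editor's illustrative sketch (not part of the patent text), the interference fusion of this step can be mimicked in NumPy. The element-wise form of the interference term, the fixed phase value, and the SVD-based stand-in for the PCA() reduction are all assumptions of this sketch:

```python
import numpy as np

def interference_fuse(rho_t, rho_v, phi):
    """Fuse two modality feature density matrices with a quantum-style
    interference term: rho_t + rho_v + 2*sqrt(rho_t*rho_v)*cos(phi).
    rho_t, rho_v: (d, d) non-negative matrices; phi in [-pi, pi]."""
    return rho_t + rho_v + 2.0 * np.sqrt(rho_t * rho_v) * np.cos(phi)

def pca_reduce(mat, n_components):
    """Shrink the fused matrix to its top principal components via SVD,
    standing in for the PCA() step that cuts the parameter count."""
    centered = mat - mat.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
rho_t = rng.random((8, 8))          # stand-in text density matrix
rho_v = rng.random((8, 8))          # stand-in image density matrix
fused = interference_fuse(rho_t, rho_v, phi=0.3)
f_p = pca_reduce(fused, n_components=2)
```

Note that at φ = ±π/2 the cosine vanishes and the fusion degenerates to the classical sum ρ_t + ρ_v, which is the sense in which the extra term models interference.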
S500: generating, according to f_p and the projection operators of a plurality of preset emotion types, probability values P(e_1), P(e_2), …, P(e_δ), …, P(e_w) of f_p for each preset emotion type, wherein P(e_δ) is the probability value of f_p for the δ-th preset emotion type; δ = 1, 2, …, w; w is the total number of preset emotion types. P(e_δ) satisfies the following condition:

P(e_δ) = tr( E_δ · f_p · (E_δ)^H )

wherein E_δ is the projection operator of the δ-th preset emotion type; tr() is the trace function, used to obtain the trace of a matrix; ()^H is the conjugate transpose of a matrix.
Specifically, the projection operators of the various emotion types can be trained in advance according to actual use requirements. Emotion prediction is then performed through these emotion measurement operators, taking into account the correlation between different emotion classifications.
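A minimal sketch of this measurement step follows; the random stand-in projectors, the symmetric stand-in for f_p, and the normalisation of the trace scores into a distribution are assumptions of this illustration, not details taken from the patent:

```python
import numpy as np

def emotion_probabilities(f_p, projectors):
    """Score fused features against per-emotion operators via
    P(e_d) = tr(E_d @ f_p @ E_d^H), normalised into a distribution.
    The projectors here are random stand-ins for trained operators."""
    scores = np.array([np.trace(E @ f_p @ E.conj().T).real for E in projectors])
    scores = np.clip(scores, 0.0, None)
    return scores / scores.sum()

rng = np.random.default_rng(1)
m = rng.random((4, 4))
f_p = m @ m.T                                   # symmetric PSD stand-in for f_p
projectors = [rng.random((4, 4)) for _ in range(3)]
probs = emotion_probabilities(f_p, projectors)
predicted = int(np.argmax(probs))               # S600: pick the max-probability type
```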
S600: taking the emotion type corresponding to the maximum value among P(e_1), P(e_2), …, P(e_δ), …, P(e_w) as the emotion type of the target media information.
The invention takes a multi-modal social media emotion analysis task as an example, i.e., classifying multi-modal social media information into different emotion types such as anger, fear, surprise, trust, cynicism and happiness. In order to capture the subtle forms of expression of human language in the multi-modal information, quantum probability is first used to capture the uncertainty in the different modal languages of humans, the multi-modal information features are fused nonlinearly through quantum interference, the quantum composite systems capture the context among the modal information, and finally the correlation among different emotion classifications is described through quantum incompatibility measurement.
The invention trains the model using the cross-entropy loss function commonly used in multi-classification tasks; the specific loss function is:

L = −Σ_{i=1..Z} Q(y_i) · log P(e_i)

wherein Z is the number of emotion categories, Q(y_i) is the probability of the social media text being manually labelled as the i-th emotion category, and P(e_i) is the probability computed by the invention that the social media text belongs to the i-th emotion category. The goal of model training is to minimize the loss value L. The model updates its parameters through an Adam optimizer.
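This loss is the standard multi-class cross-entropy; a small self-contained check (with hypothetical label and prediction vectors, not data from the patent) shows that a confident correct prediction is penalised less than a confident wrong one:

```python
import numpy as np

def cross_entropy(q, p, eps=1e-12):
    """L = -sum_i Q(y_i) * log P(e_i) over Z emotion categories.
    q: human-labelled distribution (often one-hot); p: model distribution.
    Probabilities are clipped at eps to avoid log(0)."""
    return -np.sum(q * np.log(np.clip(p, eps, 1.0)))

q = np.array([0.0, 1.0, 0.0])            # label: second emotion category
p_good = np.array([0.05, 0.90, 0.05])    # confident, correct prediction
p_bad = np.array([0.60, 0.20, 0.20])     # confident, wrong prediction
loss_good = cross_entropy(q, p_good)
loss_bad = cross_entropy(q, p_bad)
```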
According to the invention, the information of each mode can be vectorized by utilizing the quantum probability principle, so that the complex number word vector in the multi-dimensional superposition state is formed. Therefore, uncertainty in different modal languages of a human can be captured, and subtle expression forms in the human language can be effectively represented. Then, a sliding window is arranged to segment a complex number word vector set corresponding to the representation information of each mode, so that a plurality of quantum composite systems are formed, and then context information among single-mode information is captured through the quantum composite systems to generate a density matrix corresponding to the information of each mode. And further, carrying out feature fusion processing on the density matrixes corresponding to the two modes respectively, namely quantum interference calculation. Thus, nonlinear fusion of the multi-mode information features is realized through quantum interference calculation. And finally, describing the correlation between the fused multi-modal characteristics and different emotion classifications through quantum incompatibility measurement, and outputting a preset emotion type corresponding to a projection operator with highest correlation as a prediction result.
The general technical scheme of the invention mainly comprises: data preprocessing and vectorization of multi-modal features, construction of single-modal information feature representations, multi-modal information feature fusion, and multi-modal social media emotion type prediction, with quantum theory used to process the data at each of these steps. Quantum theory has been shown to resolve the paradoxes of classical probability theory in human cognitive modeling; by using a quantum-theoretic model, the information interaction between different modalities can be captured more effectively, so that the emotion types corresponding to ironic expressions can be identified more accurately, and the accuracy of predicting the emotion expressed by media information can be improved. Moreover, a model based on quantum theory has better interpretability.
As a possible embodiment of the present invention, the feature conversion process includes the steps of:
S301: slidingly segmenting the target complex word vector set with a sliding window to generate a plurality of composite vector subsets C1, C2, …, Cx, …, C(m−k+1) corresponding to the target complex word vector set. Cx is the composite vector subset generated by the x-th segmentation of the target complex word vector set by the sliding window, x ∈ [1, m−k+1]. m−k+1 is the total number of composite vector subsets, m is the total length of the target complex word vector set, and k is the length of the sliding window. The target complex word vector set is A1 or A2.
S302: according to u_t^1, u_t^2, …, u_t^ξ, …, u_t^g, performing vector conversion on C1, C2, …, Cx, …, C(m−k+1) respectively to generate feature vector sets D1, D2, …, Dx, …, D(m−k+1) corresponding to each composite vector subset, wherein Dx = (Ψ_x^1, Ψ_x^2, …, Ψ_x^ξ, …, Ψ_x^g); Dx is the feature vector set corresponding to Cx; Ψ_x^ξ is the feature vector generated by performing vector conversion processing on Cx using u_t^ξ; u_t^ξ is the ξ-th category of preset initial parameter, ξ ∈ [1, g].
Ψ_x^ξ satisfies the following condition:

Ψ_x^ξ = u_t^ξ ⊗ c_t^x1 ⊗ c_t^x2 ⊗ … ⊗ c_t^xT ⊗ … ⊗ c_t^xf(Cx)

wherein c_t^xT is the T-th complex word vector in Cx, T ∈ [1, f(Cx)]; f(Cx) is the total number of complex word vectors in Cx. Preferably, the preset initial parameters are shared parameters.
Taking text information as an example, u_t^ξ may be the vector of a word representing a particular emotion, such as happy, sad, excited or panicked. Adding such a word to the text segment divided by the sliding window can, to a greater extent, influence the emotion type the segment expresses.
In this embodiment, the target complex word vector set can be divided into a plurality of composite vector subsets by setting a sliding window, i.e., a plurality of quantum composite systems can be generated. For example, with a sliding window of length k, a social media text of length m creates m−k+1 quantum composite systems. The feature vector corresponding to each composite vector subset is then generated by combining the contextual information within each sliding window. Meanwhile, to steer the emotional expression of the information in each sliding window, a preset initial parameter is added when generating the feature vector, which influences the direction of emotional expression of the information in the whole window. Since the initial state of a quantum composite system may depend on the preset initial parameters, the invention uses a plurality of preset initial parameters to represent the features of the social media text information.
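The windowing and tensor-product construction above can be sketched as follows; the tiny dimensions, the random complex word vectors, and the one-hot shared parameter u are all assumptions chosen to keep the Kronecker product small (its dimension grows as d^(k+1)):

```python
import numpy as np
from functools import reduce

def sliding_windows(vectors, k):
    """Split m ordered complex word vectors into the m-k+1 windows
    C1..C(m-k+1) that each form one quantum composite system."""
    return [vectors[x:x + k] for x in range(len(vectors) - k + 1)]

def composite_state(u, window):
    """Psi = u (x) c1 (x) ... (x) ck: Kronecker product of the shared
    preset parameter u with every word vector in the window; the result
    lives in a d**(k+1)-dimensional space, so d and k are kept tiny."""
    return reduce(np.kron, window, u)

rng = np.random.default_rng(2)
words = [rng.random(2) + 1j * rng.random(2) for _ in range(5)]  # m=5, d=2
windows = sliding_windows(words, k=3)                           # 5-3+1 = 3 windows
u = np.array([1.0 + 0j, 0.0 + 0j])       # hypothetical shared preset parameter
psi = composite_state(u, windows[0])     # one composite-system state
```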
S303: generating a feature density matrix set ρ = (ρ_1, ρ_2, …, ρ_ξ, …, ρ_g) of the target complex word vector set according to D1, D2, …, Dx, …, D(m−k+1), wherein ρ_ξ is the feature density matrix generated under the state of u_t^ξ. ρ_ξ satisfies the following condition:

ρ_ξ = Σ_{x=1..m−k+1} Px · Ψ_x^ξ · (Ψ_x^ξ)^H

wherein Px is the weight coefficient of Ψ_x^ξ in the above formula. It is a trainable parameter, and can also be set manually according to the actual use situation.
In this embodiment, feature vectors in the sliding windows are fused, so that context content of the whole information of a certain mode can be combined and represented. So that the finally generated feature density matrix has richer and more accurate semantic features. And in the combining process, feature vectors in sliding windows corresponding to the same preset initial parameters are fused. Thus, the feature density matrix corresponding to various emotion types can be obtained.
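A minimal sketch of this weighted combination follows; normalising the window states to unit length and fixing the weights Px to sum to 1 are assumptions of this illustration (in the patent Px is trainable):

```python
import numpy as np

def density_matrix(states, weights):
    """rho_xi = sum_x P_x |psi_x><psi_x| over the window states sharing
    one preset initial parameter. States are normalised here (an
    assumption) so that the result has unit trace when weights sum to 1."""
    states = [s / np.linalg.norm(s) for s in states]
    return sum(w * np.outer(s, s.conj()) for s, w in zip(states, weights))

rng = np.random.default_rng(3)
states = [rng.random(4) + 1j * rng.random(4) for _ in range(3)]  # 3 window states
weights = np.array([0.5, 0.3, 0.2])                              # P_x, summing to 1
rho = density_matrix(states, weights)    # Hermitian, unit-trace density matrix
```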
As a possible embodiment of the present invention, the preprocessing includes an information vectorization process for converting the representation information of each modality into a corresponding complex word vector. The information vectorization processing comprises the following steps:
S310: obtaining the basis vectors F1, F2, …, Fr, …, Fz of each piece of effective sub-information in the target representation information in the s-dimensional Hilbert space, wherein Fr = (Ψ_r^1, Ψ_r^2, …, Ψ_r^h, …, Ψ_r^s). Fr corresponds to the r-th piece of effective sub-information in the target representation information, and Ψ_r^1, Ψ_r^2, …, Ψ_r^h, …, Ψ_r^s are the vector values of Fr in each dimension of the Hilbert space.
S320: performing superposition processing on F1, F2, …, Fr, …, Fz to generate complex word vectors G1, G2, …, Gr, …, Gz corresponding to each piece of effective sub-information in the target representation information, wherein Gr is the complex word vector corresponding to Fr. Gr satisfies the following condition:

Gr = Σ_{h=1..s} z_r^h · Ψ_r^h, with z_r^h = η_r^h · e^(i·θ_r^h)

wherein z_r^h is the complex-valued weight corresponding to Gr in the h-th dimension of the Hilbert space and is a complex number; {η_r^h}_{h=1..s} are non-negative real numbers satisfying Σ_{h=1..s} (η_r^h)² = 1; θ_r^h is the complex phase corresponding to the real number η_r^h and satisfies θ_r^h ∈ [−π, π]. z_r^h can be set manually.
Taking text information as an example, assume there are s independent latent semantics in the text information of the social media information (a latent semantic represents a meaning type of a word; for example, "apple" can denote a fruit, a company name, or an electronic device, and thus has 3 independent latent semantics). The number of latent semantics contained in the union of the latent semantics of all words in the text information is the dimension of the Hilbert space corresponding to the text information. Each word of the text is then modeled as a quantum concept defined in the s-dimensional Hilbert space, where the latent semantics form a set of basis vectors (Ψ_t^1, Ψ_t^2, …, Ψ_t^s). Each word can be represented using a superposition of the basis vectors in the Hilbert space, i.e., a linear combination of the basis vectors with complex-valued weights, as in the condition satisfied by Gr above.
In this embodiment, the basis vectors may be defined in the one-hot form (0, …, 0, 1, 0, …). In a 3-dimensional Hilbert space, for example, the basis vectors may be (1,0,0), (0,1,0) and (0,0,1). The words in the final text information can thus be represented by s-dimensional complex word vectors.
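With one-hot basis vectors, the superposition reduces to an element-wise amplitude-times-phase vector; the "apple" example and its amplitude/phase values below are hypothetical:

```python
import numpy as np

def complex_word_vector(amplitudes, phases):
    """G = sum_h eta_h * exp(i*theta_h) * e_h over the one-hot basis of an
    s-dimensional Hilbert space; amplitudes are renormalised so that
    sum(eta_h**2) == 1, and phases lie in [-pi, pi]."""
    eta = np.asarray(amplitudes, dtype=float)
    eta = eta / np.linalg.norm(eta)               # enforce the unit constraint
    return eta * np.exp(1j * np.asarray(phases))  # one-hot basis: elementwise

# hypothetical 3 latent senses of "apple": fruit / company / device
g = complex_word_vector([0.6, 0.7, 0.38], [0.1, -0.5, 2.0])
```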
As a possible embodiment of the present invention, when the modality of the representation information is text, the preprocessing further includes:
S201: performing word segmentation processing on the representation information to generate a plurality of pieces of sub-representation information.
S202: removing the sub-representation information of the stop-word type from the plurality of pieces of sub-representation information.
S203: mapping the remaining sub-representation information against a preset dictionary to generate a plurality of pieces of effective sub-information corresponding to the representation information.
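Steps S201–S203 amount to tokenise, filter, and map; in this sketch the stop-word list, the preset dictionary, and the sample sentence are all hypothetical stand-ins:

```python
import re

STOPWORDS = {"the", "a", "is", "of"}     # tiny illustrative stop-word list
VOCAB = {"stock": 0, "market": 1, "crashes": 2, "again": 3}  # hypothetical preset dictionary

def preprocess_text(text):
    """S201-S203 sketch: segment words, drop stop words, then map the
    surviving tokens through the preset dictionary to an ordered list of
    valid sub-information (here, token ids)."""
    tokens = re.findall(r"[a-z]+", text.lower())      # S201: word segmentation
    kept = [t for t in tokens if t not in STOPWORDS]  # S202: stop-word removal
    return [VOCAB[t] for t in kept if t in VOCAB]     # S203: dictionary mapping

ids = preprocess_text("The stock market crashes again")
```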
As a possible embodiment of the present invention, when the modality of the representation information is an image, the preprocessing further includes:
s204: and performing target detection processing on the representation information to generate a plurality of target detection frames.
S205: and taking the image information selected by the multiple target detection frames as multiple effective sub-information corresponding to the representation information.
As a possible embodiment of the present invention, when the modality of the representation information is audio, the preprocessing further includes:
s206: the representation information is subjected to noise reduction processing to generate first representation information.
S207: and carrying out segmentation processing on the first representation information to generate a plurality of effective sub-information corresponding to the representation information.
Whether the modality of the representation information is text, image or audio, the corresponding denoising and segmentation processing reduces the noise in the input data and reduces the amount of computation in subsequent processing.
Embodiments of the present invention also provide a non-transitory computer-readable storage medium, which may be disposed in an electronic device to store at least one instruction or at least one program for implementing the method of the embodiments, the at least one instruction or the at least one program being loaded and executed by a processor to implement the method provided by the embodiments described above.
Embodiments of the present invention also provide an electronic device comprising a processor and the aforementioned non-transitory computer-readable storage medium.
Embodiments of the present invention also provide a computer program product comprising program code for causing an electronic device to carry out the steps of the method according to the various exemplary embodiments of the invention described in the present specification when the program product is run on the electronic device.
While certain specific embodiments of the invention have been described in detail by way of example, it will be appreciated by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the invention. Those skilled in the art will also appreciate that many modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims (10)

1. An emotion prediction method for quantum-based media information, which is characterized by comprising the following steps:
acquiring representation information of any two modalities of target media information;

preprocessing the representation information of each modality to generate complex word vector sets A1 and A2 corresponding to the representation information of each modality, wherein A1 = (A1_1, A1_2, …, A1_n, …, A1_f(A1)); A2 = (A2_1, A2_2, …, A2_q, …, A2_f(A2)); A1_n is the n-th complex word vector in A1; f(A1) is the total number of complex word vectors in A1; A2_q is the q-th complex word vector in A2; f(A2) is the total number of complex word vectors in A2; n ranges from 1 to f(A1) and q ranges from 1 to f(A2);

performing feature conversion processing on A1 and A2 respectively to generate corresponding feature density matrix sets ρ_t and ρ_v, wherein ρ_t = (ρ_t^1, ρ_t^2, …, ρ_t^β, …, ρ_t^g) and ρ_v = (ρ_v^1, ρ_v^2, …, ρ_v^j, …, ρ_v^g); ρ_t^β is the feature density matrix generated from A1 under the β-th preset initial parameter; ρ_v^j is the feature density matrix generated from A2 under the j-th preset initial parameter; β = 1, 2, …, g; j = 1, 2, …, g; g is the total number of categories of preset initial parameters;

performing feature fusion processing on ρ_t and ρ_v to generate a fusion feature f_p of the target media information, wherein f_p satisfies:

[fusion formula, presented as an image in the original publication]

wherein the interference term information satisfies:

[interference-term formula, presented as an image in the original publication]

ɸ_β,j represents the degree of association between ρ_t^β and ρ_v^j, i.e., between the two modality information; ɸ_β,j ∈ [-π, π]; and PCA() denotes performing principal component analysis (PCA) processing on a matrix;

generating probability values P(e_1), P(e_2), …, P(e_δ), …, P(e_w) of f_p under each preset emotion type according to f_p and the projection operators of a plurality of preset emotion types, wherein P(e_δ) is the probability value of f_p under the δ-th preset emotion type; δ = 1, 2, …, w; w is the total number of preset emotion types; and P(e_δ) satisfies:

P(e_δ) = Tr(E_δ · f_p · E_δ^H)

wherein E_δ is the projection operator of the δ-th preset emotion type; Tr() is the trace function of a matrix, used to obtain the trace of the matrix; and ()^H is the conjugate transpose function of a matrix; and

taking the emotion type corresponding to the maximum value among P(e_1), P(e_2), …, P(e_δ), …, P(e_w) as the emotion type of the target media information.
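The measurement step that closes claim 1 (a probability per preset emotion type obtained from a projection operator, then an argmax over the probabilities) can be sketched numerically. Everything below is an illustrative stand-in: the fusion matrix and projection operators are random, and P(e_δ) = Tr(E_δ f_p E_δ^H) is one plausible reading of the claim's trace and conjugate-transpose definitions, not necessarily the patent's exact formula:

```python
import numpy as np

rng = np.random.default_rng(0)
d, w = 4, 3  # feature dimension and number of preset emotion types (illustrative)

# A stand-in fusion feature: a Hermitian, unit-trace (density-matrix-like) matrix.
m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
f_p = m @ m.conj().T
f_p /= np.trace(f_p).real

# Stand-in projection operators: one rank-1 projector per emotion type,
# built from orthonormal columns of a random unitary matrix.
q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
E = [np.outer(q[:, k], q[:, k].conj()) for k in range(w)]

# P(e_delta) = Tr(E_delta f_p E_delta^H), one reading of the claim's
# trace / conjugate-transpose notation.
P = np.array([np.trace(Ek @ f_p @ Ek.conj().T).real for Ek in E])

# The emotion type of the target media information is the one with the
# maximum probability value.
predicted = int(np.argmax(P))
print(P, predicted)
```

Because f_p is positive semidefinite with unit trace and each E_δ is a projector, every probability lands in [0, 1], mirroring the quantum-measurement reading of the claim.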
2. The method according to claim 1, wherein the feature conversion processing comprises the following steps:

performing sliding segmentation on a target complex word vector set with a sliding window to generate a plurality of composite vector subsets C1, C2, …, Cx, …, C(m-k+1) corresponding to the target complex word vector set, wherein Cx is the composite vector subset generated by the x-th segmentation of the target complex word vector set by the sliding window, x ∈ [1, m-k+1]; m-k+1 is the total number of composite vector subsets; m is the total length of the target complex word vector set; and k is the total length of the sliding window;

performing vector conversion on C1, C2, …, Cx, …, C(m-k+1) respectively according to u_t^1, u_t^2, …, u_t^ξ, …, u_t^g to generate feature vector sets D1, D2, …, Dx, …, D(m-k+1) corresponding to each composite vector subset, wherein Dx = (Ψ_x^1, Ψ_x^2, …, Ψ_x^ξ, …, Ψ_x^g); Dx is the feature vector set corresponding to Cx; Ψ_x^ξ is the feature vector generated by performing vector conversion processing on Cx with u_t^ξ; and u_t^ξ is the preset initial parameter of the ξ-th category, ξ ∈ [1, g];

wherein Ψ_x^ξ satisfies:

Ψ_x^ξ = u_t^ξ ⊗ c_t^x1 ⊗ c_t^x2 ⊗ … ⊗ c_t^xT ⊗ … ⊗ c_t^xf(Cx)

wherein c_t^xT is the T-th complex word vector in Cx, T ∈ [1, f(Cx)]; and f(Cx) is the total number of complex word vectors in Cx; and

generating the feature density matrix set ρ = (ρ_1, ρ_2, …, ρ_ξ, …, ρ_g) of the target complex word vector set according to D1, D2, …, Dx, …, D(m-k+1), wherein ρ_ξ is the feature density matrix generated under u_t^ξ; ρ_ξ satisfies:

ρ_ξ = Σ_{x=1}^{m-k+1} P_x · Ψ_x^ξ (Ψ_x^ξ)^H

wherein P_x is the weight coefficient of Ψ_x^ξ.
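The final step of claim 2 builds a feature density matrix from weighted feature vectors. A minimal numerical sketch, assuming the standard mixed-state construction ρ = Σ_x P_x Ψ_x Ψ_x^H (consistent with the weight coefficients P_x named in the claim; the vectors and weights below are random illustrative values):

```python
import numpy as np

rng = np.random.default_rng(1)
n_subsets, dim = 5, 4  # m-k+1 composite vector subsets, feature dimension (illustrative)

# Stand-in feature vectors Psi_x (normalized) and weight coefficients P_x.
psis = rng.normal(size=(n_subsets, dim)) + 1j * rng.normal(size=(n_subsets, dim))
psis /= np.linalg.norm(psis, axis=1, keepdims=True)
weights = rng.random(n_subsets)
weights /= weights.sum()  # weights sum to 1 so rho has unit trace

# rho = sum_x P_x * Psi_x Psi_x^H : Hermitian, positive semidefinite, trace 1.
rho = sum(p * np.outer(psi, psi.conj()) for p, psi in zip(weights, psis))

print(np.allclose(rho, rho.conj().T), round(np.trace(rho).real, 6))
```

Normalizing the vectors and the weights is what makes ρ a valid density matrix; the claim itself does not state these normalizations, so they are an assumption of this sketch.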
3. The method of claim 1, wherein the preprocessing comprises an information vectorization process for converting the representation information of each modality into the corresponding complex word vectors; the information vectorization process comprises the following steps:

obtaining basis vectors F1, F2, …, Fr, …, Fz of each piece of effective sub-information in the target representation information in an s-dimensional Hilbert space, wherein Fr = (Ψ_r^1, Ψ_r^2, …, Ψ_r^h, …, Ψ_r^s); Fr is the basis vector of the r-th piece of effective sub-information in the target representation information; and Ψ_r^1, Ψ_r^2, …, Ψ_r^h, …, Ψ_r^s are the vector values of Fr in each dimension of the Hilbert space; and

performing superposition processing on F1, F2, …, Fr, …, Fz to generate complex word vectors G1, G2, …, Gr, …, Gz corresponding to each piece of effective sub-information in the target representation information, wherein Gr is the complex word vector corresponding to Fr; Gr satisfies:

Gr = Σ_{h=1}^{s} z_r^h Ψ_r^h

wherein z_r^h = η_r^h e^{i·θ_r^h} is the complex-valued weight corresponding to Gr in the h-th dimension of the Hilbert space; {η_r^h}_{h=1}^{s} are non-negative real numbers and satisfy

Σ_{h=1}^{s} (η_r^h)² = 1;

and θ_r^h is the complex phase corresponding to the real number η_r^h, satisfying θ_r^h ∈ [-π, π].
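Claim 3's complex-valued weights combine a non-negative amplitude η_r^h (squares summing to 1) with a phase θ_r^h ∈ [-π, π]. A small numerical sketch, with random illustrative amplitudes and phases and an identity matrix standing in for the Hilbert-space basis vectors:

```python
import numpy as np

rng = np.random.default_rng(2)
s = 6  # dimension of the Hilbert space (illustrative)

# Non-negative amplitudes eta with sum of squares equal to 1.
eta = rng.random(s)
eta /= np.linalg.norm(eta)

# Phases theta in [-pi, pi].
theta = rng.uniform(-np.pi, np.pi, size=s)

# Complex-valued weights z_h = eta_h * exp(i * theta_h).
z = eta * np.exp(1j * theta)

# A complex word vector as the superposition G = sum_h z_h * basis_h.
basis = np.eye(s)  # stand-in orthonormal basis vectors Psi_h
G = (z[:, None] * basis).sum(axis=0)

print(np.sum(eta**2), np.vdot(G, G).real)
```

With an orthonormal basis, the squared norm of G equals Σ_h (η_r^h)², so the unit-sum constraint on the amplitudes makes each complex word vector a unit-norm quantum-style state.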
4. The method of claim 1, wherein the modality of the representation information is any one of text, image, and audio.
5. The method of claim 4, wherein when the modality of the representation information is text, the preprocessing further comprises:

performing word segmentation processing on the representation information to generate a plurality of pieces of sub-representation information;

removing, from the plurality of pieces of sub-representation information, the sub-representation information whose type is a stop word; and

mapping the remaining sub-representation information against a preset dictionary to generate a plurality of effective sub-information corresponding to the to-be-processed representation information.
6. The method of claim 4, wherein when the modality of the representation information is an image, the preprocessing further comprises:

performing target detection processing on the representation information to generate a plurality of target detection frames; and

taking the image information selected by the plurality of target detection frames as the plurality of effective sub-information corresponding to the to-be-processed representation information.
7. The method of claim 4, wherein when the modality of the representation information is audio, the preprocessing further comprises:

performing noise reduction processing on the representation information to generate first to-be-processed representation information; and

performing segmentation processing on the first to-be-processed representation information to generate a plurality of effective sub-information corresponding to the to-be-processed representation information.
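Claim 7's audio branch (noise reduction, then segmentation into effective sub-information) can be sketched generically; the moving-average "noise reduction" and fixed-length windowing below are illustrative placeholders for whatever concrete methods an implementation would use:

```python
import numpy as np

def preprocess_audio(signal: np.ndarray, seg_len: int, smooth: int = 5) -> list[np.ndarray]:
    # "Noise reduction" placeholder: a simple moving-average filter
    # producing the first to-be-processed representation information.
    kernel = np.ones(smooth) / smooth
    denoised = np.convolve(signal, kernel, mode="same")
    # Segmentation: split the denoised signal into fixed-length pieces,
    # each one item of effective sub-information (any remainder is dropped).
    n = len(denoised) // seg_len
    return [denoised[i * seg_len:(i + 1) * seg_len] for i in range(n)]

segments = preprocess_audio(np.random.default_rng(3).normal(size=1000), seg_len=256)
print(len(segments), segments[0].shape)  # 3 (256,)
```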
8. The method of claim 1, wherein the predetermined initial parameter is a shared parameter.
9. A non-transitory computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements a method of emotion prediction for quantum-based media information as claimed in any one of claims 1 to 8.
10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the emotion prediction method for quantum-based media information according to any one of claims 1 to 8.
CN202310267414.XA 2023-03-20 2023-03-20 Emotion prediction method, medium and device for quantum-based media information Active CN115982395B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310267414.XA CN115982395B (en) 2023-03-20 2023-03-20 Emotion prediction method, medium and device for quantum-based media information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310267414.XA CN115982395B (en) 2023-03-20 2023-03-20 Emotion prediction method, medium and device for quantum-based media information

Publications (2)

Publication Number Publication Date
CN115982395A CN115982395A (en) 2023-04-18
CN115982395B true CN115982395B (en) 2023-05-23

Family

ID=85968586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310267414.XA Active CN115982395B (en) 2023-03-20 2023-03-20 Emotion prediction method, medium and device for quantum-based media information

Country Status (1)

Country Link
CN (1) CN115982395B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832663A (en) * 2017-09-30 2018-03-23 天津大学 A kind of multi-modal sentiment analysis method based on quantum theory
CN112418166A (en) * 2020-12-10 2021-02-26 南京理工大学 Emotion distribution learning method based on multi-mode information
CN115017900A (en) * 2022-04-24 2022-09-06 北京理工大学 Multi-mode multi-unbiased conversation emotion recognition method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005114557A2 (en) * 2004-05-13 2005-12-01 Proximex Multimodal high-dimensional data fusion for classification and identification

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832663A (en) * 2017-09-30 2018-03-23 天津大学 A kind of multi-modal sentiment analysis method based on quantum theory
CN112418166A (en) * 2020-12-10 2021-02-26 南京理工大学 Emotion distribution learning method based on multi-mode information
CN115017900A (en) * 2022-04-24 2022-09-06 北京理工大学 Multi-mode multi-unbiased conversation emotion recognition method

Also Published As

Publication number Publication date
CN115982395A (en) 2023-04-18

Similar Documents

Publication Publication Date Title
JP7193252B2 (en) Captioning image regions
Cho et al. Describing multimedia content using attention-based encoder-decoder networks
CN111079532A (en) Video content description method based on text self-encoder
CN116415654A (en) Data processing method and related equipment
CN112818861A (en) Emotion classification method and system based on multi-mode context semantic features
CN110795549B (en) Short text conversation method, device, equipment and storage medium
CN114387567B (en) Video data processing method and device, electronic equipment and storage medium
Halvardsson et al. Interpretation of swedish sign language using convolutional neural networks and transfer learning
CN114140885A (en) Emotion analysis model generation method and device, electronic equipment and storage medium
CN114443899A (en) Video classification method, device, equipment and medium
Shi et al. Multitask training with unlabeled data for end-to-end sign language fingerspelling recognition
Wang et al. WaveNet with cross-attention for audiovisual speech recognition
JP2020123329A (en) Allocation of relevance score of artificial neural network
Huang et al. Learning long-term temporal contexts using skip RNN for continuous emotion recognition
CN114169408A (en) Emotion classification method based on multi-mode attention mechanism
CN109117471B (en) Word relevancy calculation method and terminal
Jenny Li et al. Evaluating deep learning biases based on grey-box testing results
CN117421639A (en) Multi-mode data classification method, terminal equipment and storage medium
CN115982395B (en) Emotion prediction method, medium and device for quantum-based media information
Prasath Design of an integrated learning approach to assist real-time deaf application using voice recognition system
Ermatita et al. Sentiment Analysis of COVID-19 using Multimodal Fusion Neural Networks.
Joshi et al. Res-CNN-BiLSTM network for overcoming mental health disturbances caused due to cyberbullying through social media
CN115357712A (en) Aspect level emotion analysis method and device, electronic equipment and storage medium
CN115589446A (en) Meeting abstract generation method and system based on pre-training and prompting
CN114998041A (en) Method and device for training claim settlement prediction model, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant