CN115982395B - Emotion prediction method, medium and device for quantum-based media information - Google Patents

Info

Publication number: CN115982395B
Application number: CN202310267414.XA
Authority: CN (China)
Prior art keywords: information, generate, emotion, complex, target
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN115982395A
Inventors: 王磊, 蒋永余, 王俊艳, 王宇琪, 曹家, 罗引
Current and original assignee: Beijing Zhongke Wenge Technology Co ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list; the priority date is likewise an assumption and not a legal conclusion)
Events: application CN202310267414.XA filed by Beijing Zhongke Wenge Technology Co ltd; priority to CN202310267414.XA; publication of CN115982395A; application granted; publication of CN115982395B; status active; anticipated expiration

Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02D: Climate change mitigation technologies in information and communication technologies [ICT], i.e. information and communication technologies aiming at the reduction of their own energy use
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention relates to the field of multi-modal emotion prediction, and in particular to an emotion prediction method, medium and device for quantum-based media information. The method comprises the following steps: preprocessing the representation information of each modality to generate complex word vector sets A1 and A2 corresponding to the representation information of each modality; performing feature conversion processing on A1 and A2 to generate corresponding feature density matrix sets ρ_t and ρ_v; performing feature fusion processing on ρ_t and ρ_v to generate a fusion feature f_p; generating, according to f_p and the projection operators of a plurality of preset emotion types, a probability value of f_p for each preset emotion type; and taking the emotion type corresponding to the maximum value among P(e_1), P(e_2), …, P(e_w) as the emotion type of the target media information. By utilizing a quantum-theoretic model, the information interaction among different modalities can be captured more effectively, thereby improving the accuracy of predicting the emotion expressed by media information.

Description

Emotion prediction method, medium and device for quantum-based media information
Technical Field
The invention relates to the field of multi-mode emotion prediction, in particular to an emotion prediction method, medium and device for quantum-based media information.
Background
Media platforms increasingly use multiple media formats (e.g., text–image pairs, video) to jointly express news information. Multi-modal media content can convey information more accurately and intuitively than a single modality.
In the prior art, in order to accurately establish an association between an image and text, many multi-modal information analysis methods use deep neural networks to first encode the image and the text into dense representations and then learn to measure their similarity. For example, the entire image and the entire sentence are mapped into a common vector space, and the cosine similarity between the global representations is calculated. To improve the discriminative power of the unified embedding, strategies such as semantic concept learning and regional relation reasoning have been proposed, which enhance the visual features by fusing local regional semantics.
Existing multi-modal information analysis models based on neural components are black boxes with poor interpretability. The problem of multi-modal information processing is ultimately a problem of human cognition, yet methods based on classical probability theory often fail to effectively model the interactions between modalities and contextual information from a human cognitive perspective. Irony, for example, is a subtle form of expression in human language intended to convey criticism, humor or cynicism through exaggeration, metaphor and the like. What an ironic statement expresses literally tends to be the opposite of its actual meaning, so such expressions can reverse the polarity of emotion. As products of human subjective consciousness, emotional and ironic expressions are naturally closely related. However, current neural-network-based multi-modal information analysis models cannot analyze information from the perspective of human cognition, so the accuracy of prior-art predictions of the emotion expressed by media information is low.
Disclosure of Invention
Aiming at the technical problems, the invention adopts the following technical scheme:
according to one aspect of the present invention, there is provided an emotion prediction method for quantum-based media information, the method comprising the steps of:
acquiring representation information of any two modes of target media information;
preprocessing the representation information of each modality to generate complex word vector sets A1 and A2 corresponding to the representation information of each modality, wherein A1 = (A1_1, A1_2, …, A1_n, …, A1_f(A1)); A2 = (A2_1, A2_2, …, A2_q, …, A2_f(A2)); A1_n is the n-th complex word vector in A1; f(A1) is the total number of complex word vectors in A1; A2_q is the q-th complex word vector in A2; f(A2) is the total number of complex word vectors in A2; n ranges from 1 to f(A1) and q ranges from 1 to f(A2);
performing feature conversion processing on A1 and A2 respectively to generate corresponding feature density matrix sets ρ_t and ρ_v, wherein ρ_t = (ρ_t^1, ρ_t^2, …, ρ_t^β, …, ρ_t^g) and ρ_v = (ρ_v^1, ρ_v^2, …, ρ_v^j, …, ρ_v^g); ρ_t^β is the feature density matrix generated from A1 under the β-th preset initial parameter; ρ_v^j is the feature density matrix generated from A2 under the j-th preset initial parameter; β = 1, 2, …, g; j = 1, 2, …, g; g is the total number of categories of preset initial parameters;
performing feature fusion processing on ρ_t and ρ_v to generate the fusion feature f_p of the target media information, wherein f_p satisfies the following condition:

f_p = PCA( ρ_t^β + ρ_v^j + 2·(ρ_t^β · ρ_v^j)^(1/2) · cos φ_β,j )

wherein 2·(ρ_t^β · ρ_v^j)^(1/2) · cos φ_β,j is the interference term information; φ_β,j is the degree of association between the two modality information ρ_t^β and ρ_v^j; φ_β,j ∈ [−π, π]; PCA() denotes performing PCA processing on the matrix;
generating, according to f_p and the projection operators of a plurality of preset emotion types, probability values P(e_1), P(e_2), …, P(e_δ), …, P(e_w) of f_p for each preset emotion type, wherein P(e_δ) is the probability value of f_p for the δ-th preset emotion type; δ = 1, 2, …, w; w is the total number of preset emotion types; P(e_δ) satisfies the following condition:

P(e_δ) = tr( E_δ · f_p · (E_δ)^H )

wherein E_δ is the projection operator of the δ-th preset emotion type; tr() is the trace function, used to obtain the trace of a matrix; ()^H is the conjugate transpose of a matrix;
taking the emotion type corresponding to the maximum value among P(e_1), P(e_2), …, P(e_δ), …, P(e_w) as the emotion type of the target media information.
According to a second aspect of the present invention, there is provided a non-transitory computer readable storage medium storing a computer program which when executed by a processor implements an emotion prediction method for quantum-based media information as described above.
According to a third aspect of the present invention, there is provided an electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the emotion prediction method for quantum-based media information as described above when executing the computer program.
The invention has at least the following beneficial effects:
the information of each mode can be vectorized by utilizing the quantum probability principle, and a complex word vector in a multi-dimensional superposition state is formed. Therefore, uncertainty in different modal languages of a human can be captured, and subtle expression forms in the human language can be effectively represented. Then, a sliding window is arranged to segment a complex number word vector set corresponding to the representation information of each mode, so that a plurality of quantum composite systems are formed, and then context information among single-mode information is captured through the quantum composite systems to generate a density matrix corresponding to the information of each mode. And further, carrying out feature fusion processing on the density matrixes corresponding to the two modes respectively, namely quantum interference calculation. Thus, nonlinear fusion of the multi-mode information features is realized through quantum interference calculation. And finally, describing the correlation between the fused multi-modal characteristics and different emotion classifications through quantum incompatibility measurement, and outputting a preset emotion type corresponding to a projection operator with highest correlation as a prediction result.
The general technical scheme of the invention mainly comprises: data preprocessing and vectorization of multi-modal features, construction of single-modal information feature representations, multi-modal information feature fusion, and multi-modal social media emotion type prediction, with quantum theory used to process the data at each of these steps. Quantum theory has been shown to resolve the paradoxes of classical probability theory in human cognitive modeling; by using a quantum-theoretic model, the information interaction between different modalities can be captured more effectively, so that the emotion types corresponding to ironic expressions can be identified more accurately, and the accuracy of predicting the emotion expressed by media information can be improved. Moreover, a model based on quantum theory has better interpretability.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of an emotion prediction method for quantum-based media information according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
According to one aspect of the present invention, as shown in fig. 1, there is provided an emotion prediction method for quantum-based media information, the method comprising the steps of:
S100: acquiring the representation information of any two modalities of the target media information. Preferably, the modalities of the representation information corresponding to the target media information include any two of text, image and audio.
The target media information in the invention can be the existing social media information, such as the information released by the existing short video platform. Typically, the media information contains information of a plurality of modalities. Such as corresponding image, audio, and text information. In this embodiment, the combined information of any two modes is taken as the input information of the subsequent processing. Such as images and text information.
S200: preprocessing the representation information of each modality to generate complex word vector sets A1 and A2 corresponding to the representation information of each modality, wherein A1 = (A1_1, A1_2, …, A1_n, …, A1_f(A1)) and A2 = (A2_1, A2_2, …, A2_q, …, A2_f(A2)). A1_n is the n-th complex word vector in A1; f(A1) is the total number of complex word vectors in A1. A2_q is the q-th complex word vector in A2; f(A2) is the total number of complex word vectors in A2. n ranges from 1 to f(A1), and q ranges from 1 to f(A2).
The pretreatment in this step comprises the following steps: denoising and segmenting the information of each single mode and vectorizing the characteristics of the mode information.
The denoising and segmentation process may change the information of each modality into an ordered set of multiple valid sub-information. Such as: the text needs to be subjected to word segmentation, word stopping, word list generation and other operations; the picture needs to extract a target detection frame of a plurality of images; the voice needs to be subjected to noise reduction, segmentation and the like. Finally, the information of each mode is correspondingly changed into an ordered set of word or image target detection frames or voice fragments.
The vectorization processing of the modal information features is to vectorize the sub-information in the ordered set of each modal information by utilizing quantum probability. Taking text information as an example, each word may be represented vectorized using an superposition of basis vectors over a multidimensional Hilbert space. Uncertainty in the different modal languages of a human can thus be captured by quantum probability.
S300: performing feature conversion processing on A1 and A2 respectively to generate corresponding feature density matrix sets ρ_t and ρ_v, wherein ρ_t = (ρ_t^1, ρ_t^2, …, ρ_t^β, …, ρ_t^g) and ρ_v = (ρ_v^1, ρ_v^2, …, ρ_v^j, …, ρ_v^g). ρ_t^β is the feature density matrix generated from A1 under the β-th preset initial parameter; ρ_v^j is the feature density matrix generated from A2 under the j-th preset initial parameter. β = 1, 2, …, g; j = 1, 2, …, g; g is the total number of categories of preset initial parameters.
Preferably, each preset initial parameter u_t^ξ is a shared trainable parameter that can be used in the feature representation of different texts.
In this step, contextual information among the single-modality information is captured through quantum composite systems, and the information fragments are computed with shared preset initial parameters, which effectively reduces the computational complexity.
S400: performing feature fusion processing on ρ_t and ρ_v to generate the fusion feature f_p of the target media information, wherein f_p satisfies the following condition:

f_p = PCA( ρ_t^β + ρ_v^j + 2·(ρ_t^β · ρ_v^j)^(1/2) · cos φ_β,j )

wherein 2·(ρ_t^β · ρ_v^j)^(1/2) · cos φ_β,j is the interference term information; φ_β,j is the degree of association between the two modality information ρ_t^β and ρ_v^j; φ_β,j ∈ [−π, π]; PCA() denotes performing PCA processing on the matrix.
To address the fact that current multi-modal information analysis models based on neural components cannot model the interaction of contextual information between modalities from the perspective of human cognition, the invention introduces the concept of quantum interference to perform the feature fusion between modalities.
In this step, the information of the two modalities is fused so that the finally generated fusion feature f_p of the target media information has richer and more accurate semantic features. Specifically, the nonlinear fusion of the features is performed using the principle of quantum interference: in quantum interference, the information of the two modalities can superpose and interfere, so the interference information is described by 2·(ρ_t^β · ρ_v^j)^(1/2) · cos φ_β,j. Meanwhile, since the term ρ_t^β + ρ_v^j + 2·(ρ_t^β · ρ_v^j)^(1/2) · cos φ_β,j carries an enormous number of parameters, PCA (Principal Component Analysis) processing is performed on it to reduce the number of parameters and mitigate the consumption of computing resources.
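As an editor's illustrative sketch (not part of the patent text), the interference fusion of this step can be mimicked in NumPy. The element-wise form of the interference term, the fixed phase value, and the SVD-based stand-in for the PCA() reduction are all assumptions of this sketch:

```python
import numpy as np

def interference_fuse(rho_t, rho_v, phi):
    """Fuse two modality feature density matrices with a quantum-style
    interference term: rho_t + rho_v + 2*sqrt(rho_t*rho_v)*cos(phi).
    rho_t, rho_v: (d, d) non-negative matrices; phi in [-pi, pi]."""
    return rho_t + rho_v + 2.0 * np.sqrt(rho_t * rho_v) * np.cos(phi)

def pca_reduce(mat, n_components):
    """Shrink the fused matrix to its top principal components via SVD,
    standing in for the PCA() step that cuts the parameter count."""
    centered = mat - mat.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

rng = np.random.default_rng(0)
rho_t = rng.random((8, 8))          # stand-in text density matrix
rho_v = rng.random((8, 8))          # stand-in image density matrix
fused = interference_fuse(rho_t, rho_v, phi=0.3)
f_p = pca_reduce(fused, n_components=2)
```

Note that at φ = ±π/2 the cosine vanishes and the fusion degenerates to the classical sum ρ_t + ρ_v, which is the sense in which the extra term models interference.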
S500: generating, according to f_p and the projection operators of a plurality of preset emotion types, probability values P(e_1), P(e_2), …, P(e_δ), …, P(e_w) of f_p for each preset emotion type, wherein P(e_δ) is the probability value of f_p for the δ-th preset emotion type; δ = 1, 2, …, w; w is the total number of preset emotion types. P(e_δ) satisfies the following condition:

P(e_δ) = tr( E_δ · f_p · (E_δ)^H )

wherein E_δ is the projection operator of the δ-th preset emotion type; tr() is the trace function, used to obtain the trace of a matrix; ()^H is the conjugate transpose of a matrix.
Specifically, the projection operators of the various emotion types can be trained in advance according to actual use requirements. Emotion prediction is then performed through these emotion measurement operators, taking into account the correlation between different emotion classifications.
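A minimal sketch of this measurement step follows; the random stand-in projectors, the symmetric stand-in for f_p, and the normalisation of the trace scores into a distribution are assumptions of this illustration, not details taken from the patent:

```python
import numpy as np

def emotion_probabilities(f_p, projectors):
    """Score fused features against per-emotion operators via
    P(e_d) = tr(E_d @ f_p @ E_d^H), normalised into a distribution.
    The projectors here are random stand-ins for trained operators."""
    scores = np.array([np.trace(E @ f_p @ E.conj().T).real for E in projectors])
    scores = np.clip(scores, 0.0, None)
    return scores / scores.sum()

rng = np.random.default_rng(1)
m = rng.random((4, 4))
f_p = m @ m.T                                   # symmetric PSD stand-in for f_p
projectors = [rng.random((4, 4)) for _ in range(3)]
probs = emotion_probabilities(f_p, projectors)
predicted = int(np.argmax(probs))               # S600: pick the max-probability type
```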
S600: taking the emotion type corresponding to the maximum value among P(e_1), P(e_2), …, P(e_δ), …, P(e_w) as the emotion type of the target media information.
The invention takes a multi-modal social media emotion analysis task as an example, i.e., classifying multi-modal social media information into different emotion types such as anger, fear, surprise, trust, cynicism and happiness. In order to capture the subtle forms of expression of human language in the multi-modal information, quantum probability is first used to capture the uncertainty in the different modal languages of humans, the multi-modal information features are fused nonlinearly through quantum interference, the quantum composite systems capture the context among the modal information, and finally the correlation among different emotion classifications is described through quantum incompatibility measurement.
The invention trains the model using the cross-entropy loss function commonly used in multi-classification tasks; the specific loss function is:

L = −Σ_{i=1..Z} Q(y_i) · log P(e_i)

wherein Z is the number of emotion categories, Q(y_i) is the probability of the social media text being manually labelled as the i-th emotion category, and P(e_i) is the probability computed by the invention that the social media text belongs to the i-th emotion category. The goal of model training is to minimize the loss value L. The model updates its parameters through an Adam optimizer.
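This loss is the standard multi-class cross-entropy; a small self-contained check (with hypothetical label and prediction vectors, not data from the patent) shows that a confident correct prediction is penalised less than a confident wrong one:

```python
import numpy as np

def cross_entropy(q, p, eps=1e-12):
    """L = -sum_i Q(y_i) * log P(e_i) over Z emotion categories.
    q: human-labelled distribution (often one-hot); p: model distribution.
    Probabilities are clipped at eps to avoid log(0)."""
    return -np.sum(q * np.log(np.clip(p, eps, 1.0)))

q = np.array([0.0, 1.0, 0.0])            # label: second emotion category
p_good = np.array([0.05, 0.90, 0.05])    # confident, correct prediction
p_bad = np.array([0.60, 0.20, 0.20])     # confident, wrong prediction
loss_good = cross_entropy(q, p_good)
loss_bad = cross_entropy(q, p_bad)
```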
According to the invention, the information of each mode can be vectorized by utilizing the quantum probability principle, so that the complex number word vector in the multi-dimensional superposition state is formed. Therefore, uncertainty in different modal languages of a human can be captured, and subtle expression forms in the human language can be effectively represented. Then, a sliding window is arranged to segment a complex number word vector set corresponding to the representation information of each mode, so that a plurality of quantum composite systems are formed, and then context information among single-mode information is captured through the quantum composite systems to generate a density matrix corresponding to the information of each mode. And further, carrying out feature fusion processing on the density matrixes corresponding to the two modes respectively, namely quantum interference calculation. Thus, nonlinear fusion of the multi-mode information features is realized through quantum interference calculation. And finally, describing the correlation between the fused multi-modal characteristics and different emotion classifications through quantum incompatibility measurement, and outputting a preset emotion type corresponding to a projection operator with highest correlation as a prediction result.
The general technical scheme of the invention mainly comprises: data preprocessing and vectorization of multi-modal features, construction of single-modal information feature representations, multi-modal information feature fusion, and multi-modal social media emotion type prediction, with quantum theory used to process the data at each of these steps. Quantum theory has been shown to resolve the paradoxes of classical probability theory in human cognitive modeling; by using a quantum-theoretic model, the information interaction between different modalities can be captured more effectively, so that the emotion types corresponding to ironic expressions can be identified more accurately, and the accuracy of predicting the emotion expressed by media information can be improved. Moreover, a model based on quantum theory has better interpretability.
As a possible embodiment of the present invention, the feature conversion process includes the steps of:
S301: slidingly segmenting the target complex word vector set with a sliding window to generate a plurality of composite vector subsets C1, C2, …, Cx, …, C(m−k+1) corresponding to the target complex word vector set. Cx is the composite vector subset generated by the x-th segmentation of the target complex word vector set by the sliding window, x ∈ [1, m−k+1]. m−k+1 is the total number of composite vector subsets, m is the total length of the target complex word vector set, and k is the length of the sliding window. The target complex word vector set is A1 or A2.
S302: according to u_t^1, u_t^2, …, u_t^ξ, …, u_t^g, performing vector conversion on C1, C2, …, Cx, …, C(m−k+1) respectively to generate feature vector sets D1, D2, …, Dx, …, D(m−k+1) corresponding to each composite vector subset, wherein Dx = (Ψ_x^1, Ψ_x^2, …, Ψ_x^ξ, …, Ψ_x^g); Dx is the feature vector set corresponding to Cx; Ψ_x^ξ is the feature vector generated by performing vector conversion processing on Cx using u_t^ξ; u_t^ξ is the ξ-th category of preset initial parameter, ξ ∈ [1, g].
Ψ_x^ξ satisfies the following condition:

Ψ_x^ξ = u_t^ξ ⊗ c_t^x1 ⊗ c_t^x2 ⊗ … ⊗ c_t^xT ⊗ … ⊗ c_t^xf(Cx)

wherein c_t^xT is the T-th complex word vector in Cx, T ∈ [1, f(Cx)]; f(Cx) is the total number of complex word vectors in Cx. Preferably, the preset initial parameters are shared parameters.
Taking text information as an example, u_t^ξ may be the vector of a word representing a particular emotion, such as happy, sad, excited or panicked. Adding such a word to the text segment divided by the sliding window can, to a greater extent, influence the emotion type the segment expresses.
In this embodiment, the target complex word vector set can be divided into a plurality of composite vector subsets by setting a sliding window, i.e., a plurality of quantum composite systems can be generated. For example, with a sliding window of length k, a social media text of length m creates m−k+1 quantum composite systems. The feature vector corresponding to each composite vector subset is then generated by combining the contextual information within each sliding window. Meanwhile, to steer the emotional expression of the information in each sliding window, a preset initial parameter is added when generating the feature vector, which influences the direction of emotional expression of the information in the whole window. Since the initial state of a quantum composite system may depend on the preset initial parameters, the invention uses a plurality of preset initial parameters to represent the features of the social media text information.
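The windowing and tensor-product construction above can be sketched as follows; the tiny dimensions, the random complex word vectors, and the one-hot shared parameter u are all assumptions chosen to keep the Kronecker product small (its dimension grows as d^(k+1)):

```python
import numpy as np
from functools import reduce

def sliding_windows(vectors, k):
    """Split m ordered complex word vectors into the m-k+1 windows
    C1..C(m-k+1) that each form one quantum composite system."""
    return [vectors[x:x + k] for x in range(len(vectors) - k + 1)]

def composite_state(u, window):
    """Psi = u (x) c1 (x) ... (x) ck: Kronecker product of the shared
    preset parameter u with every word vector in the window; the result
    lives in a d**(k+1)-dimensional space, so d and k are kept tiny."""
    return reduce(np.kron, window, u)

rng = np.random.default_rng(2)
words = [rng.random(2) + 1j * rng.random(2) for _ in range(5)]  # m=5, d=2
windows = sliding_windows(words, k=3)                           # 5-3+1 = 3 windows
u = np.array([1.0 + 0j, 0.0 + 0j])       # hypothetical shared preset parameter
psi = composite_state(u, windows[0])     # one composite-system state
```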
S303: generating a feature density matrix set ρ = (ρ_1, ρ_2, …, ρ_ξ, …, ρ_g) of the target complex word vector set according to D1, D2, …, Dx, …, D(m−k+1), wherein ρ_ξ is the feature density matrix generated under the state of u_t^ξ. ρ_ξ satisfies the following condition:

ρ_ξ = Σ_{x=1..m−k+1} Px · Ψ_x^ξ · (Ψ_x^ξ)^H

wherein Px is the weight coefficient of Ψ_x^ξ in the above formula. It is a trainable parameter, and can also be set manually according to the actual use situation.
In this embodiment, feature vectors in the sliding windows are fused, so that context content of the whole information of a certain mode can be combined and represented. So that the finally generated feature density matrix has richer and more accurate semantic features. And in the combining process, feature vectors in sliding windows corresponding to the same preset initial parameters are fused. Thus, the feature density matrix corresponding to various emotion types can be obtained.
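A minimal sketch of this weighted combination follows; normalising the window states to unit length and fixing the weights Px to sum to 1 are assumptions of this illustration (in the patent Px is trainable):

```python
import numpy as np

def density_matrix(states, weights):
    """rho_xi = sum_x P_x |psi_x><psi_x| over the window states sharing
    one preset initial parameter. States are normalised here (an
    assumption) so that the result has unit trace when weights sum to 1."""
    states = [s / np.linalg.norm(s) for s in states]
    return sum(w * np.outer(s, s.conj()) for s, w in zip(states, weights))

rng = np.random.default_rng(3)
states = [rng.random(4) + 1j * rng.random(4) for _ in range(3)]  # 3 window states
weights = np.array([0.5, 0.3, 0.2])                              # P_x, summing to 1
rho = density_matrix(states, weights)    # Hermitian, unit-trace density matrix
```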
As a possible embodiment of the present invention, the preprocessing includes an information vectorization process for converting the representation information of each modality into a corresponding complex word vector. The information vectorization processing comprises the following steps:
S310: obtaining the basis vectors F1, F2, …, Fr, …, Fz of each piece of effective sub-information in the target representation information in the s-dimensional Hilbert space, wherein Fr = (Ψ_r^1, Ψ_r^2, …, Ψ_r^h, …, Ψ_r^s). Fr corresponds to the r-th piece of effective sub-information in the target representation information, and Ψ_r^1, Ψ_r^2, …, Ψ_r^h, …, Ψ_r^s are the vector values of Fr in each dimension of the Hilbert space.
S320: performing superposition processing on F1, F2, …, Fr, …, Fz to generate complex word vectors G1, G2, …, Gr, …, Gz corresponding to each piece of effective sub-information in the target representation information, wherein Gr is the complex word vector corresponding to Fr. Gr satisfies the following condition:

Gr = Σ_{h=1..s} z_r^h · Ψ_r^h, with z_r^h = η_r^h · e^(i·θ_r^h)

wherein z_r^h is the complex-valued weight corresponding to Gr in the h-th dimension of the Hilbert space and is a complex number; {η_r^h}_{h=1..s} are non-negative real numbers satisfying Σ_{h=1..s} (η_r^h)² = 1; θ_r^h is the complex phase corresponding to the real number η_r^h and satisfies θ_r^h ∈ [−π, π]. z_r^h can be set manually.
Taking text information as an example, assume there are s independent latent semantics in the text information of the social media information (a latent semantic represents a meaning type of a word; for example, "apple" can denote a fruit, a company name, or an electronic device, and thus has 3 independent latent semantics). The number of latent semantics contained in the union of the latent semantics of all words in the text information is the dimension of the Hilbert space corresponding to the text information. Each word of the text is then modeled as a quantum concept defined in the s-dimensional Hilbert space, where the latent semantics form a set of basis vectors (Ψ_t^1, Ψ_t^2, …, Ψ_t^s). Each word can be represented using a superposition of the basis vectors in the Hilbert space, i.e., a linear combination of the basis vectors with complex-valued weights, as in the condition satisfied by Gr above.
In this embodiment, the basis vectors may be defined in the one-hot form (0, …, 0, 1, 0, …). In a 3-dimensional Hilbert space, for example, the basis vectors may be (1,0,0), (0,1,0) and (0,0,1). The words in the final text information can thus be represented by s-dimensional complex word vectors.
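With one-hot basis vectors, the superposition reduces to an element-wise amplitude-times-phase vector; the "apple" example and its amplitude/phase values below are hypothetical:

```python
import numpy as np

def complex_word_vector(amplitudes, phases):
    """G = sum_h eta_h * exp(i*theta_h) * e_h over the one-hot basis of an
    s-dimensional Hilbert space; amplitudes are renormalised so that
    sum(eta_h**2) == 1, and phases lie in [-pi, pi]."""
    eta = np.asarray(amplitudes, dtype=float)
    eta = eta / np.linalg.norm(eta)               # enforce the unit constraint
    return eta * np.exp(1j * np.asarray(phases))  # one-hot basis: elementwise

# hypothetical 3 latent senses of "apple": fruit / company / device
g = complex_word_vector([0.6, 0.7, 0.38], [0.1, -0.5, 2.0])
```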
As a possible embodiment of the present invention, when the modality of the representation information is text, the preprocessing further includes:
S201: performing word segmentation processing on the representation information to generate a plurality of pieces of sub-representation information.
S202: removing the sub-representation information of the stop-word type from the plurality of pieces of sub-representation information.
S203: mapping the remaining sub-representation information against a preset dictionary to generate a plurality of pieces of effective sub-information corresponding to the representation information.
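Steps S201–S203 amount to tokenise, filter, and map; in this sketch the stop-word list, the preset dictionary, and the sample sentence are all hypothetical stand-ins:

```python
import re

STOPWORDS = {"the", "a", "is", "of"}     # tiny illustrative stop-word list
VOCAB = {"stock": 0, "market": 1, "crashes": 2, "again": 3}  # hypothetical preset dictionary

def preprocess_text(text):
    """S201-S203 sketch: segment words, drop stop words, then map the
    surviving tokens through the preset dictionary to an ordered list of
    valid sub-information (here, token ids)."""
    tokens = re.findall(r"[a-z]+", text.lower())      # S201: word segmentation
    kept = [t for t in tokens if t not in STOPWORDS]  # S202: stop-word removal
    return [VOCAB[t] for t in kept if t in VOCAB]     # S203: dictionary mapping

ids = preprocess_text("The stock market crashes again")
```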
As a possible embodiment of the present invention, when the modality of the representation information is an image, the preprocessing further includes:
s204: and performing target detection processing on the representation information to generate a plurality of target detection frames.
S205: and taking the image information selected by the multiple target detection frames as multiple effective sub-information corresponding to the representation information.
As a possible embodiment of the present invention, when the modality of the representation information is audio, the preprocessing further includes:
s206: the representation information is subjected to noise reduction processing to generate first representation information.
S207: and carrying out segmentation processing on the first representation information to generate a plurality of effective sub-information corresponding to the representation information.
Whether the modality of the representation information is text, image or audio, the corresponding denoising and segmentation processing reduces the noise in the input data and reduces the amount of computation in subsequent processing.
Embodiments of the present invention also provide a non-transitory computer-readable storage medium, which may be disposed in an electronic device to store at least one instruction or at least one program for implementing the method of the embodiments, the at least one instruction or the at least one program being loaded and executed by a processor to implement the method provided by the embodiments described above.
Embodiments of the present invention also provide an electronic device comprising a processor and the aforementioned non-transitory computer-readable storage medium.
Embodiments of the present invention also provide a computer program product comprising program code for causing an electronic device to carry out the steps of the method according to the various exemplary embodiments of the invention described in the present specification when the program product is run on the electronic device.
While certain specific embodiments of the invention have been described in detail by way of example, it will be appreciated by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the invention. Those skilled in the art will also appreciate that many modifications may be made to the embodiments without departing from the scope and spirit of the invention. The scope of the invention is defined by the appended claims.

Claims (10)

1. An emotion prediction method for quantum-based media information, which is characterized by comprising the following steps:
acquiring representation information of any two modalities of target media information;

preprocessing the representation information of each modality to generate complex word vector sets A1 and A2 corresponding to the representation information of each modality, wherein A1 = (A1_1, A1_2, …, A1_n, …, A1_f(A1)); A2 = (A2_1, A2_2, …, A2_q, …, A2_f(A2)); A1_n is the n-th complex word vector in A1; f(A1) is the total number of complex word vectors in A1; A2_q is the q-th complex word vector in A2; f(A2) is the total number of complex word vectors in A2; n ranges from 1 to f(A1) and q ranges from 1 to f(A2);

performing feature conversion processing on A1 and A2 respectively to generate corresponding feature density matrix sets ρ_t and ρ_v, wherein ρ_t = (ρ_t^1, ρ_t^2, …, ρ_t^β, …, ρ_t^g) and ρ_v = (ρ_v^1, ρ_v^2, …, ρ_v^j, …, ρ_v^g); ρ_t^β is the feature density matrix generated from A1 under the β-th preset initial parameter; ρ_v^j is the feature density matrix generated from A2 under the j-th preset initial parameter; β = 1, 2, …, g; j = 1, 2, …, g; g is the total number of categories of preset initial parameters;

performing feature fusion processing on ρ_t and ρ_v to generate a fusion feature f_p of the target media information, wherein f_p satisfies:

[fusion formula, presented as an image in the original publication]

wherein the interference term information satisfies:

[interference-term formula, presented as an image in the original publication]

ɸ_β,j represents the degree of association between ρ_t^β and ρ_v^j, i.e., between the two modality information; ɸ_β,j ∈ [-π, π]; and PCA() denotes performing principal component analysis (PCA) processing on a matrix;

generating probability values P(e_1), P(e_2), …, P(e_δ), …, P(e_w) of f_p under each preset emotion type according to f_p and the projection operators of a plurality of preset emotion types, wherein P(e_δ) is the probability value of f_p under the δ-th preset emotion type; δ = 1, 2, …, w; w is the total number of preset emotion types; and P(e_δ) satisfies:

P(e_δ) = Tr(E_δ · f_p · E_δ^H)

wherein E_δ is the projection operator of the δ-th preset emotion type; Tr() is the trace function of a matrix, used to obtain the trace of the matrix; and ()^H is the conjugate transpose function of a matrix; and

taking the emotion type corresponding to the maximum value among P(e_1), P(e_2), …, P(e_δ), …, P(e_w) as the emotion type of the target media information.
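The measurement step that closes claim 1 (a probability per preset emotion type obtained from a projection operator, then an argmax over the probabilities) can be sketched numerically. Everything below is an illustrative stand-in: the fusion matrix and projection operators are random, and P(e_δ) = Tr(E_δ f_p E_δ^H) is one plausible reading of the claim's trace and conjugate-transpose definitions, not necessarily the patent's exact formula:

```python
import numpy as np

rng = np.random.default_rng(0)
d, w = 4, 3  # feature dimension and number of preset emotion types (illustrative)

# A stand-in fusion feature: a Hermitian, unit-trace (density-matrix-like) matrix.
m = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
f_p = m @ m.conj().T
f_p /= np.trace(f_p).real

# Stand-in projection operators: one rank-1 projector per emotion type,
# built from orthonormal columns of a random unitary matrix.
q, _ = np.linalg.qr(rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
E = [np.outer(q[:, k], q[:, k].conj()) for k in range(w)]

# P(e_delta) = Tr(E_delta f_p E_delta^H), one reading of the claim's
# trace / conjugate-transpose notation.
P = np.array([np.trace(Ek @ f_p @ Ek.conj().T).real for Ek in E])

# The emotion type of the target media information is the one with the
# maximum probability value.
predicted = int(np.argmax(P))
print(P, predicted)
```

Because f_p is positive semidefinite with unit trace and each E_δ is a projector, every probability lands in [0, 1], mirroring the quantum-measurement reading of the claim.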
2. The method according to claim 1, wherein the feature conversion processing comprises the following steps:

performing sliding segmentation on a target complex word vector set with a sliding window to generate a plurality of composite vector subsets C1, C2, …, Cx, …, C(m-k+1) corresponding to the target complex word vector set, wherein Cx is the composite vector subset generated by the x-th segmentation of the target complex word vector set by the sliding window, x ∈ [1, m-k+1]; m-k+1 is the total number of composite vector subsets; m is the total length of the target complex word vector set; and k is the total length of the sliding window;

performing vector conversion on C1, C2, …, Cx, …, C(m-k+1) respectively according to u_t^1, u_t^2, …, u_t^ξ, …, u_t^g to generate feature vector sets D1, D2, …, Dx, …, D(m-k+1) corresponding to each composite vector subset, wherein Dx = (Ψ_x^1, Ψ_x^2, …, Ψ_x^ξ, …, Ψ_x^g); Dx is the feature vector set corresponding to Cx; Ψ_x^ξ is the feature vector generated by performing vector conversion processing on Cx with u_t^ξ; and u_t^ξ is the preset initial parameter of the ξ-th category, ξ ∈ [1, g];

wherein Ψ_x^ξ satisfies:

Ψ_x^ξ = u_t^ξ ⊗ c_t^x1 ⊗ c_t^x2 ⊗ … ⊗ c_t^xT ⊗ … ⊗ c_t^xf(Cx)

wherein c_t^xT is the T-th complex word vector in Cx, T ∈ [1, f(Cx)]; and f(Cx) is the total number of complex word vectors in Cx; and

generating the feature density matrix set ρ = (ρ_1, ρ_2, …, ρ_ξ, …, ρ_g) of the target complex word vector set according to D1, D2, …, Dx, …, D(m-k+1), wherein ρ_ξ is the feature density matrix generated under u_t^ξ; ρ_ξ satisfies:

ρ_ξ = Σ_{x=1}^{m-k+1} P_x · Ψ_x^ξ (Ψ_x^ξ)^H

wherein P_x is the weight coefficient of Ψ_x^ξ.
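The final step of claim 2 builds a feature density matrix from weighted feature vectors. A minimal numerical sketch, assuming the standard mixed-state construction ρ = Σ_x P_x Ψ_x Ψ_x^H (consistent with the weight coefficients P_x named in the claim; the vectors and weights below are random illustrative values):

```python
import numpy as np

rng = np.random.default_rng(1)
n_subsets, dim = 5, 4  # m-k+1 composite vector subsets, feature dimension (illustrative)

# Stand-in feature vectors Psi_x (normalized) and weight coefficients P_x.
psis = rng.normal(size=(n_subsets, dim)) + 1j * rng.normal(size=(n_subsets, dim))
psis /= np.linalg.norm(psis, axis=1, keepdims=True)
weights = rng.random(n_subsets)
weights /= weights.sum()  # weights sum to 1 so rho has unit trace

# rho = sum_x P_x * Psi_x Psi_x^H : Hermitian, positive semidefinite, trace 1.
rho = sum(p * np.outer(psi, psi.conj()) for p, psi in zip(weights, psis))

print(np.allclose(rho, rho.conj().T), round(np.trace(rho).real, 6))
```

Normalizing the vectors and the weights is what makes ρ a valid density matrix; the claim itself does not state these normalizations, so they are an assumption of this sketch.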
3. The method of claim 1, wherein the preprocessing comprises an information vectorization process for converting the representation information of each modality into the corresponding complex word vectors; the information vectorization process comprises the following steps:

obtaining basis vectors F1, F2, …, Fr, …, Fz of each piece of effective sub-information in the target representation information in an s-dimensional Hilbert space, wherein Fr = (Ψ_r^1, Ψ_r^2, …, Ψ_r^h, …, Ψ_r^s); Fr is the basis vector of the r-th piece of effective sub-information in the target representation information; and Ψ_r^1, Ψ_r^2, …, Ψ_r^h, …, Ψ_r^s are the vector values of Fr in each dimension of the Hilbert space; and

performing superposition processing on F1, F2, …, Fr, …, Fz to generate complex word vectors G1, G2, …, Gr, …, Gz corresponding to each piece of effective sub-information in the target representation information, wherein Gr is the complex word vector corresponding to Fr; Gr satisfies:

Gr = Σ_{h=1}^{s} z_r^h Ψ_r^h

wherein z_r^h = η_r^h e^{i·θ_r^h} is the complex-valued weight corresponding to Gr in the h-th dimension of the Hilbert space; {η_r^h}_{h=1}^{s} are non-negative real numbers and satisfy

Σ_{h=1}^{s} (η_r^h)² = 1;

and θ_r^h is the complex phase corresponding to the real number η_r^h, satisfying θ_r^h ∈ [-π, π].
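Claim 3's complex-valued weights combine a non-negative amplitude η_r^h (squares summing to 1) with a phase θ_r^h ∈ [-π, π]. A small numerical sketch, with random illustrative amplitudes and phases and an identity matrix standing in for the Hilbert-space basis vectors:

```python
import numpy as np

rng = np.random.default_rng(2)
s = 6  # dimension of the Hilbert space (illustrative)

# Non-negative amplitudes eta with sum of squares equal to 1.
eta = rng.random(s)
eta /= np.linalg.norm(eta)

# Phases theta in [-pi, pi].
theta = rng.uniform(-np.pi, np.pi, size=s)

# Complex-valued weights z_h = eta_h * exp(i * theta_h).
z = eta * np.exp(1j * theta)

# A complex word vector as the superposition G = sum_h z_h * basis_h.
basis = np.eye(s)  # stand-in orthonormal basis vectors Psi_h
G = (z[:, None] * basis).sum(axis=0)

print(np.sum(eta**2), np.vdot(G, G).real)
```

With an orthonormal basis, the squared norm of G equals Σ_h (η_r^h)², so the unit-sum constraint on the amplitudes makes each complex word vector a unit-norm quantum-style state.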
4. The method of claim 1, wherein the modality of the representation information is any one of text, image, and audio.
5. The method of claim 4, wherein when the modality of the representation information is text, the preprocessing further comprises:

performing word segmentation processing on the representation information to generate a plurality of pieces of sub-representation information;

removing, from the plurality of pieces of sub-representation information, the sub-representation information whose type is a stop word; and

mapping the remaining sub-representation information against a preset dictionary to generate a plurality of effective sub-information corresponding to the to-be-processed representation information.
6. The method of claim 4, wherein when the modality of the representation information is an image, the preprocessing further comprises:

performing target detection processing on the representation information to generate a plurality of target detection frames; and

taking the image information selected by the plurality of target detection frames as the plurality of effective sub-information corresponding to the to-be-processed representation information.
7. The method of claim 4, wherein when the modality of the representation information is audio, the preprocessing further comprises:

performing noise reduction processing on the representation information to generate first to-be-processed representation information; and

performing segmentation processing on the first to-be-processed representation information to generate a plurality of effective sub-information corresponding to the to-be-processed representation information.
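Claim 7's audio branch (noise reduction, then segmentation into effective sub-information) can be sketched generically; the moving-average "noise reduction" and fixed-length windowing below are illustrative placeholders for whatever concrete methods an implementation would use:

```python
import numpy as np

def preprocess_audio(signal: np.ndarray, seg_len: int, smooth: int = 5) -> list[np.ndarray]:
    # "Noise reduction" placeholder: a simple moving-average filter
    # producing the first to-be-processed representation information.
    kernel = np.ones(smooth) / smooth
    denoised = np.convolve(signal, kernel, mode="same")
    # Segmentation: split the denoised signal into fixed-length pieces,
    # each one item of effective sub-information (any remainder is dropped).
    n = len(denoised) // seg_len
    return [denoised[i * seg_len:(i + 1) * seg_len] for i in range(n)]

segments = preprocess_audio(np.random.default_rng(3).normal(size=1000), seg_len=256)
print(len(segments), segments[0].shape)  # 3 (256,)
```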
8. The method of claim 1, wherein the predetermined initial parameter is a shared parameter.
9. A non-transitory computer readable storage medium storing a computer program, wherein the computer program when executed by a processor implements a method of emotion prediction for quantum-based media information as claimed in any one of claims 1 to 8.
10. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the emotion prediction method for quantum-based media information according to any one of claims 1 to 8.
CN202310267414.XA 2023-03-20 2023-03-20 Emotion prediction method, medium and device for quantum-based media information Active CN115982395B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310267414.XA CN115982395B (en) 2023-03-20 2023-03-20 Emotion prediction method, medium and device for quantum-based media information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310267414.XA CN115982395B (en) 2023-03-20 2023-03-20 Emotion prediction method, medium and device for quantum-based media information

Publications (2)

Publication Number Publication Date
CN115982395A CN115982395A (en) 2023-04-18
CN115982395B true CN115982395B (en) 2023-05-23

Family

ID=85968586

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310267414.XA Active CN115982395B (en) 2023-03-20 2023-03-20 Emotion prediction method, medium and device for quantum-based media information

Country Status (1)

Country Link
CN (1) CN115982395B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832663A (en) * 2017-09-30 2018-03-23 天津大学 A kind of multi-modal sentiment analysis method based on quantum theory
CN112418166A (en) * 2020-12-10 2021-02-26 南京理工大学 Emotion distribution learning method based on multi-mode information
CN115017900A (en) * 2022-04-24 2022-09-06 北京理工大学 Multi-mode multi-unbiased conversation emotion recognition method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2005114557A2 (en) * 2004-05-13 2005-12-01 Proximex Multimodal high-dimensional data fusion for classification and identification

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107832663A (en) * 2017-09-30 2018-03-23 天津大学 A kind of multi-modal sentiment analysis method based on quantum theory
CN112418166A (en) * 2020-12-10 2021-02-26 南京理工大学 Emotion distribution learning method based on multi-mode information
CN115017900A (en) * 2022-04-24 2022-09-06 北京理工大学 Multi-mode multi-unbiased conversation emotion recognition method

Also Published As

Publication number Publication date
CN115982395A (en) 2023-04-18

Similar Documents

Publication Publication Date Title
JP7193252B2 (en) Captioning image regions
Cho et al. Describing multimedia content using attention-based encoder-decoder networks
CN111079532A (en) Video content description method based on text self-encoder
CN116415654A (en) Data processing method and related equipment
CN112818861A (en) Emotion classification method and system based on multi-mode context semantic features
CN110795549B (en) Short text conversation method, device, equipment and storage medium
CN114387567B (en) Video data processing method and device, electronic equipment and storage medium
Halvardsson et al. Interpretation of swedish sign language using convolutional neural networks and transfer learning
CN114140885A (en) Emotion analysis model generation method and device, electronic equipment and storage medium
CN114443899A (en) Video classification method, device, equipment and medium
Shi et al. Multitask training with unlabeled data for end-to-end sign language fingerspelling recognition
Wang et al. WaveNet with cross-attention for audiovisual speech recognition
JP2020123329A (en) Allocation of relevance score of artificial neural network
Huang et al. Learning long-term temporal contexts using skip RNN for continuous emotion recognition
CN114169408A (en) Emotion classification method based on multi-mode attention mechanism
CN109117471B (en) Word relevancy calculation method and terminal
Jenny Li et al. Evaluating deep learning biases based on grey-box testing results
CN117421639A (en) Multi-mode data classification method, terminal equipment and storage medium
CN115982395B (en) Emotion prediction method, medium and device for quantum-based media information
Prasath Design of an integrated learning approach to assist real-time deaf application using voice recognition system
Ermatita et al. Sentiment Analysis of COVID-19 using Multimodal Fusion Neural Networks.
Joshi et al. Res-CNN-BiLSTM network for overcoming mental health disturbances caused due to cyberbullying through social media
CN115357712A (en) Aspect level emotion analysis method and device, electronic equipment and storage medium
CN115589446A (en) Meeting abstract generation method and system based on pre-training and prompting
CN114998041A (en) Method and device for training claim settlement prediction model, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant