CN118450176A - Artificial intelligence based emotion recognition and video content matching system

Info

Publication number
CN118450176A
Authority
CN
China
Prior art keywords
emotion
video content
unit
feature
data
Prior art date
Legal status
Granted
Application number
CN202410905484.8A
Other languages
Chinese (zh)
Other versions
CN118450176B (en)
Inventor
刘琛良
张亦弛
吴红
肖仙
Current Assignee
Hunan Mango Rongchuang Technology Co ltd
Original Assignee
Hunan Mango Rongchuang Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hunan Mango Rongchuang Technology Co ltd filed Critical Hunan Mango Rongchuang Technology Co ltd
Priority to CN202410905484.8A
Publication of CN118450176A
Application granted
Publication of CN118450176B
Legal status: Active (current)
Anticipated expiration


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an artificial intelligence-based emotion recognition and video content matching system comprising an emotion recognition module, an emotion marking and classifying module, a video content analysis module, a video content matching construction module and an actual application module. The invention belongs to the technical field of data processing, and in particular relates to an emotion recognition and video content matching system based on artificial intelligence. The system adopts an emotion recognition and classification model and updates model parameters with a gradient descent algorithm to address inaccurate emotion recognition; it adopts an emotion marking model that calculates the relevance of emotion features and opinion features through tensors, calculates attention scores, and fuses the emotion and opinion features to obtain the final emotion information, improving the accuracy of emotion classification and accurately meeting user requirements; and it calculates a feature attribute set of the video content data, deletes redundant data, marks overlapping regions, and processes region matching tasks with a multithreading method, improving the efficiency of video content matching.

Description

Artificial intelligence based emotion recognition and video content matching system
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to an artificial intelligence-based emotion recognition and video content matching system.
Background
The emotion recognition and video content matching system combines emotion recognition technology with multimedia content analysis technology. It captures and analyzes emotional states in signals using deep learning models and understands the visual content of videos, so that a person's emotion can be aligned with the video picture content. It performs automatic emotion marking and content classification on large-scale media resources and, according to the user's emotional feedback, accurately pushes content that matches the user's emotional state, with the aim of helping the user find suitable video content more quickly and accurately and providing an audio-visual experience that matches the user's preferences and emotional needs. However, existing emotion recognition and video content matching systems have the following technical problems: emotion recognition is inaccurate, so the recommended video content does not match the user's actual emotional state, the user experience is degraded, and the system mistakenly pushes irrelevant information, causing information overload; emotion classification accuracy is low, so user requirements cannot be met well and target customer groups cannot be targeted accurately; and it is difficult to accurately identify and match video content in large-scale video data, wasting bandwidth and storage resources.
Disclosure of Invention
In view of this situation, and in order to overcome the defects of the prior art, the present invention provides an emotion recognition and video content matching system based on artificial intelligence. To address the technical problem that inaccurate emotion recognition causes the recommended video content to mismatch the user's real emotional state, degrades the user experience, and leads the system to mistakenly push irrelevant information and cause information overload, the system extracts features from the audio signal, constructs an emotion recognition classification model, performs model iteration, updates the model parameters with a gradient descent algorithm, and iterates this process until the model converges. To address the technical problems that emotion classification accuracy is low, user requirements cannot be met well, and target customer groups cannot be targeted accurately, the system constructs an emotion marking model, calculates the relevance of emotion features and opinion features through tensors, calculates attention scores, and fuses the emotion and opinion features to obtain the final emotion information, improving the accuracy of emotion classification. To address the technical problems that it is difficult to accurately identify and match video content in large-scale video data, wasting bandwidth and storage resources, the system calculates a feature attribute set of the video content data, deletes redundant data, calculates a loss function, marks overlapping regions, and processes region matching tasks with a multithreading method, improving the efficiency and real-time performance of the matching algorithm.
The invention provides an artificial intelligence-based emotion recognition and video content matching system, which comprises an emotion recognition module, an emotion marking and classifying module, a video content analysis module, a video content matching construction module and an actual application module;
The emotion recognition module specifically performs feature extraction on the audio signal, builds an emotion recognition classification model, performs model iteration, updates model parameters by adopting a gradient descent algorithm, and iterates the process until the model converges;
The emotion marking and classifying module is specifically used for constructing an emotion marking model, calculating the relevance of emotion characteristics and opinion characteristics through tensors, calculating attention scores, carrying out characteristic fusion on the emotion characteristics and opinion characteristics to obtain final emotion information, and improving the accuracy of emotion classification;
the video content analysis module is used for carrying out video analysis and classifying video contents;
The video content matching construction module is specifically configured to calculate a feature attribute set of the video content data, delete redundant data, calculate a loss function, mark overlapping regions, and process region matching tasks with a multithreading method, improving the efficiency and real-time performance of the matching algorithm;
The actual application module specifically combines emotion feedback and preference of the user with emotion recognition and video content matching results to accurately push video content conforming to the emotion state of the user, so that user experience is improved and user requirements are met.
Further, the emotion recognition module is provided with a feature extraction unit, an emotion recognition classification model construction unit and a model iteration unit, which specifically comprise the following contents:
A feature extraction unit for calculating a sum of squares of differences between the audio feature values and the average values, for describing distribution features of the audio signals;
An emotion recognition classification model construction unit, which improves the traditional feedforward neural network and constructs a feedforward neural network model based on the Chebyshev orthogonal polynomial family for the distribution feature data of the audio signal; in this model, a single hidden layer is used to reduce the complexity of the whole model, a function from the Chebyshev orthogonal polynomial family serves as the activation function of each neuron in the hidden layer and is used to fit the distribution features of the audio signal, and linear activation functions are adopted for the neurons in the other layers;
And the model iteration unit is used for calculating the activation value of each node in the neural network, propagating the gradient through a back propagation algorithm to obtain the gradient value of each parameter, updating the model parameter by adopting a gradient descent algorithm, and iterating the process until the model converges to realize emotion recognition.
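The patent gives no source code; the following is a minimal illustrative sketch only, assuming a single hidden layer whose kth neuron uses the Chebyshev polynomial T_k as its activation function, linear activations elsewhere, and training by back propagation with gradient descent. The layer sizes, the cross-entropy loss, the class count and the placeholder data are assumptions for illustration, not taken from the patent.

```python
# Illustrative sketch (not the patented implementation): feedforward network with
# Chebyshev-polynomial hidden activations, trained by gradient descent.
import torch
import torch.nn as nn

def chebyshev_activation(z: torch.Tensor) -> torch.Tensor:
    """Hidden neuron k applies the Chebyshev polynomial T_k to its pre-activation z_k.
    In practice the pre-activations are usually normalised to [-1, 1], where the
    polynomials are well behaved."""
    batch, hidden = z.shape
    t_prev, t_curr = torch.ones_like(z), z              # T_0(z), T_1(z)
    orders = [t_prev, t_curr]
    for _ in range(2, hidden):
        t_prev, t_curr = t_curr, 2 * z * t_curr - t_prev  # recurrence T_k = 2z*T_{k-1} - T_{k-2}
        orders.append(t_curr)
    stacked = torch.stack(orders, dim=-1)                # (batch, hidden, hidden): T_k(z_j)
    idx = torch.arange(hidden)
    return stacked[:, idx, idx]                          # neuron k keeps T_k(z_k)

class ChebyshevEmotionNet(nn.Module):
    def __init__(self, in_dim: int, hidden: int, n_classes: int):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)             # single hidden layer
        self.fc2 = nn.Linear(hidden, n_classes)          # linear activation at the output

    def forward(self, x):
        return self.fc2(chebyshev_activation(self.fc1(x)))

model = ChebyshevEmotionNet(in_dim=40, hidden=8, n_classes=4)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)       # gradient descent
loss_fn = nn.CrossEntropyLoss()
x = torch.randn(64, 40)                                   # placeholder audio distribution features
y = torch.randint(0, 4, (64,))                            # placeholder emotion labels
for step in range(200):                                   # iterate until (near-)convergence
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                                       # back propagation of gradients
    opt.step()                                            # parameter update
```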
Further, in the emotion marking and classifying module, an emotion marking model building unit, an attention score calculating unit and a feature fusion unit are arranged, and specifically include the following contents:
An emotion marking model construction unit, in which the input data of the model are defined as the user's emotion features P and opinion features O respectively, and the combination vector between the emotion features and the opinion features is calculated through a tensor operation using the following formula:
In the formula, the combination vector characterizes the degree of association between the ith feature vector in the emotion features P and the jth feature vector in the opinion features O; p_i denotes the ith feature vector of P and o_j the jth feature vector of O; G_p is a three-dimensional tensor whose size is K multiplied by 2d, where K is a hyper-parameter of G_p expressing the complex intrinsic correlation between the emotion features and the opinion features (the larger K, the more information is extracted and the higher the complexity) and 2d is the dimension of each feature vector; i is the index over emotion feature vectors, j is the index over opinion feature vectors, and tanh denotes the hyperbolic tangent function;
And the attention score calculating unit is used for calculating the attention score after obtaining the combined vector between the emotion characteristics and the opinion characteristics, wherein the following formula is used:
In the formula, the attention score indicates how much information of the jth feature vector in the opinion features O is captured by the ith feature vector in the emotion features P (the higher the score, the more information is captured), and the weight vector is used to measure the importance of each value of the combination vector;
the feature fusion unit fuses the emotion features and the opinion features to obtain final emotion information, and the formula is as follows:
In the formula, the final emotion information is obtained from S_p, the emotion feature vectors, and S_o, the opinion feature vectors; E_p is the matrix of attention scores, and softmax_r denotes a row-wise softmax function that applies a softmax operation to each row of the attention score matrix; the resulting weight vectors are used to weight and fuse S_o, improving the accuracy of emotion classification and opinion extraction.
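The patent does not give these three steps in executable form; the sketch below is an assumed NumPy rendering in which the combination vector is taken as a bilinear tanh interaction between p_i and o_j through a tensor G_p of shape K x 2d x 2d, the attention score as the dot product of the combination vector with a weight vector, softmax_r as a row-wise softmax, and the final fusion as S_p plus the attention-weighted S_o. All shapes, the random data, and the fusion operator are illustrative assumptions.

```python
# Illustrative sketch only: tensor-based combination vectors, attention scores,
# row-wise softmax and attention-weighted fusion of emotion/opinion features.
import numpy as np

rng = np.random.default_rng(0)
n_p, n_o, d2, K = 5, 7, 16, 8            # feature counts, vector dim (2d), tensor size K
P = rng.standard_normal((n_p, d2))       # emotion feature vectors p_i
O = rng.standard_normal((n_o, d2))       # opinion feature vectors o_j
G_p = rng.standard_normal((K, d2, d2)) * 0.1   # assumed bilinear tensor
v = rng.standard_normal(K)               # weight vector over the combination vector

# Combination vectors: tanh(p_i^T G_p o_j) -> shape (n_p, n_o, K)
g = np.tanh(np.einsum('ia,kab,jb->ijk', P, G_p, O))

# Attention scores: dot product with the weight vector -> matrix E_p of shape (n_p, n_o)
E_p = g @ v

# Row-wise softmax (softmax_r): each emotion feature attends over the opinion features
W = np.exp(E_p - E_p.max(axis=1, keepdims=True))
W /= W.sum(axis=1, keepdims=True)

# Weighted fusion: S_p combined with the attention-weighted opinion features S_o
S_p, S_o = P, O
fused = S_p + W @ S_o                    # final emotion information (assumed fusion operator)
print(fused.shape)                       # (n_p, 2d)
```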
Further, in the video content analysis module, a video analysis unit and a content classification unit are provided, which specifically include the following contents:
The video analysis unit analyzes the video and extracts basic information and characteristics of the video, including resolution, frame rate and encoding and decoding formats;
and the content classification unit classifies the video content, marks the video according to the video theme and the content characteristics, and is convenient to search and manage.
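As an illustrative sketch only (the patent does not name a specific library), the basic video properties listed above can be read with OpenCV; the file path and the returned field names are placeholders.

```python
# Illustrative sketch only: probing resolution, frame rate and codec of a video file.
import cv2

def probe_video(path: str) -> dict:
    cap = cv2.VideoCapture(path)
    if not cap.isOpened():
        raise IOError(f"cannot open {path}")
    fourcc = int(cap.get(cv2.CAP_PROP_FOURCC))
    codec = "".join(chr((fourcc >> (8 * i)) & 0xFF) for i in range(4))   # decode FOURCC
    info = {
        "width": int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
        "height": int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)),
        "fps": cap.get(cv2.CAP_PROP_FPS),
        "codec": codec,
    }
    cap.release()
    return info

# e.g. probe_video("clip.mp4") -> {'width': 1920, 'height': 1080, 'fps': 25.0, 'codec': 'avc1'}
```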
Further, the video content matching construction module is provided with a feature attribute set calculation unit, a redundant data deletion unit, a loss function calculation unit, an overlapping region marking unit and a parallel calculation unit, with the following specific contents:
The computing characteristic attribute set unit is used for collecting video content data from the video content analysis module, storing the video content data in the large-capacity data storage device, processing and transmitting the video content data as HLA distributed simulation data, and computing the characteristic attribute set of the video content data, wherein the formula is as follows:
In the formula, Q_i is the number of features of the video content data after HLA distributed data classification, s is the feature content of the video content data, n is the total number of features of the video content data, and c is a parameter of the HLA distributed data;
The redundant data deletion unit deletes non-feature attributes to reduce errors and improve collection speed, and calculates a removal criterion with the following formula:
In the formula, L represents the removal criterion used to determine whether data meeting the criterion should be deleted; the vector representation of the filtering request used during removal, together with its scalar representation q, identifies the characteristics of the HLA distributed data; and e represents an existing redundant data deletion request used to delete non-feature attributes;
and a loss function calculation unit for defining the input vector and the label value and calculating the loss function by using the following formula:
In the formula, LL represents the loss function; the input vector represents the HLA distributed simulation evaluation effect image features, and the label value represents the true class label corresponding to the HLA distributed simulation evaluation effect image; the class weight matrix is used to weight the loss calculation, T denotes the matrix transpose, b_j1 represents the error value of a class, m is the number of training samples, i1 is the index of a training sample, n is the number of training sample classes, and j1 is the index of a class; the training samples are a set of known HLA distributed simulation evaluation effect image features and their corresponding true class labels, and the evaluation effect image features are used for matching video content so that videos matching the user's emotion can be found quickly.
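The exact loss formula cannot be recovered from the text alone; as an assumed stand-in with the same ingredients (m samples, n classes, input image-feature vectors, true class labels, a class weight matrix W and a per-class term b), a plain softmax cross-entropy is sketched below. This is an illustration, not the patented formula.

```python
# Illustrative sketch only: a softmax cross-entropy loss with a class weight matrix,
# used here as a hedged stand-in for the loss described above.
import numpy as np

def weighted_softmax_loss(X, y, W, b):
    """X: (m, d) image features, y: (m,) integer class labels,
    W: (d, n) class weight matrix, b: (n,) per-class term."""
    logits = X @ W + b                                   # (m, n)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    m = X.shape[0]
    return -log_probs[np.arange(m), y].mean()            # averaged negative log-likelihood
```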
The overlapping region marking unit marks overlapping regions. An overlapping region indicates that repeated content exists between different parts during video content matching, and processing these regions improves the accuracy and efficiency of video content matching. The label of each region is marked so that every connected region has a unique label. Processing starts from the last overlapping row and proceeds one row at a time: for each pixel, it is checked whether the adjacent pixels belong to the same connected region; if so, the two regions are merged and their labels are updated to the same label, and processing then continues upward to the next row until all overlapping rows have been processed.
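A minimal sketch of the bottom-up, row-by-row label-merging procedure just described, using a union-find structure; the binary overlap mask, the initial labelling and the two-pixel neighbourhood used here are illustrative assumptions.

```python
# Illustrative sketch only: merge labels of connected overlapping pixels,
# scanning from the last row upward as described above.
import numpy as np

def merge_overlap_labels(mask: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """mask: (H, W) bool, True where content overlaps; labels: (H, W) int,
    initial per-region labels (0 = background). Returns merged labels."""
    parent = {}

    def find(a):
        while parent.setdefault(a, a) != a:
            parent[a] = parent[parent[a]]        # path compression
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra                      # merge the two regions' labels

    H, W = mask.shape
    for r in range(H - 1, -1, -1):               # scan from the last row upward
        for c in range(W):
            if not mask[r, c] or labels[r, c] == 0:
                continue
            # neighbours considered: pixel to the left and pixel in the row below
            for nr, nc in ((r, c - 1), (r + 1, c)):
                if 0 <= nr < H and 0 <= nc < W and mask[nr, nc] and labels[nr, nc] != 0:
                    union(labels[r, c], labels[nr, nc])

    out = labels.copy()
    nz = out != 0
    out[nz] = [find(int(l)) for l in out[nz]]    # rewrite every label to its root
    return out
```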
The parallel calculation unit divides the multiple matching tasks among different cores, each core being responsible for different region matching tasks; parallel calculation of the region matching degree is realized with a multithreading method, each core executing its region matching tasks in an independent thread; during the parallel calculation, repeated calculations are avoided through a reasonable data sharing and communication mechanism, which improves the computing capability for region matching, reduces the amount of redundant calculation, and improves the practical efficiency of video content matching.
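A minimal sketch, under assumptions, of distributing region matching tasks across worker threads with Python's concurrent.futures; the normalised-correlation matching degree and the placeholder regions are illustrative only.

```python
# Illustrative sketch only: multithreaded region matching.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def region_match_score(region: np.ndarray, template: np.ndarray) -> float:
    """A simple matching degree: normalised correlation between two equal-size patches."""
    a = (region - region.mean()) / (region.std() + 1e-8)
    b = (template - template.mean()) / (template.std() + 1e-8)
    return float((a * b).mean())

def parallel_region_matching(regions, template, workers=4):
    # each worker thread handles its own region matching tasks; results keep input order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: region_match_score(r, template), regions))

# example: regions = [np.random.rand(32, 32) for _ in range(16)]
#          scores = parallel_region_matching(regions, np.random.rand(32, 32))
```

Because CPython threads share the interpreter lock, this kind of thread-level parallelism mainly pays off when the per-region matching work runs in native code that releases it (NumPy, OpenCV); otherwise a process pool is the usual alternative.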
Further, in the practical application module, specifically, according to emotion feedback and preference of the user, the results of emotion recognition and video content matching are combined, so that video content conforming to the emotion state of the user is accurately pushed to the user, user experience is improved, and user requirements are met.
By adopting the scheme, the beneficial effects obtained by the invention are as follows:
(1) Aiming at the technical problems that inaccurate emotion recognition causes the recommended video content to mismatch the user's real emotional state, degrades the user experience, and leads the system to mistakenly push irrelevant information and cause information overload, the scheme extracts features from the audio signal, constructs an emotion recognition classification model, performs model iteration, updates the model parameters with a gradient descent algorithm, and iterates this process until the model converges;
(2) Aiming at the technical problems that emotion classification accuracy is low, user requirements cannot be met well, and target customer groups cannot be targeted accurately, the scheme constructs an emotion marking model, calculates the relevance of the user's emotion features and opinion features through tensors, calculates attention scores, and fuses the emotion and opinion features to obtain the final emotion information, improving the accuracy of emotion classification;
(3) Aiming at the technical problems that it is difficult to accurately identify and match video content in large-scale video data, wasting bandwidth and storage resources, the scheme calculates the feature attribute set of the video content data, deletes redundant data, calculates a loss function, marks overlapping regions, and processes region matching tasks with a multithreading method, improving the efficiency and real-time performance of the matching algorithm.
Drawings
FIG. 1 is a schematic diagram of an artificial intelligence based emotion recognition and video content matching system provided by the present invention;
FIG. 2 is a schematic diagram of an emotion recognition module;
FIG. 3 is a schematic diagram of an emotion marking and classifying module;
Fig. 4 is a schematic diagram of constructing a video content matching module.
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention; all other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the description of the present invention, it should be understood that the terms "upper," "lower," "front," "rear," "left," "right," "top," "bottom," "inner," "outer," and the like indicate orientation or positional relationships based on those shown in the drawings, merely to facilitate description of the invention and simplify the description, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus should not be construed as limiting the invention.
Referring to fig. 1, the emotion recognition and video content matching system based on artificial intelligence provided by the invention comprises an emotion recognition module, an emotion marking and classifying module, a video content analysis module, a video content matching module and an actual application module;
The emotion recognition module specifically performs feature extraction on the audio signal, builds an emotion recognition classification model, performs model iteration, updates model parameters by adopting a gradient descent algorithm, and iterates the process until the model converges;
The emotion marking and classifying module is specifically used for constructing an emotion marking model, calculating the relevance of emotion characteristics and opinion characteristics through tensors, calculating attention scores, carrying out characteristic fusion on the emotion characteristics and opinion characteristics to obtain final emotion information, and improving the accuracy of emotion classification;
the video content analysis module is used for carrying out video analysis and classifying video contents;
The video content matching construction module is specifically configured to calculate a feature attribute set of the video content data, delete redundant data, calculate a loss function, mark overlapping regions, and process region matching tasks with a multithreading method, improving the efficiency and real-time performance of the matching algorithm;
The actual application module specifically combines emotion feedback and preference of the user with emotion recognition and video content matching results to accurately push video content conforming to the emotion state of the user, so that user experience is improved and user requirements are met.
In the second embodiment, referring to fig. 1 and fig. 2, the emotion recognition module is provided with a feature extraction unit, an emotion recognition classification model construction unit and a model iteration unit, and specifically includes the following contents:
A feature extraction unit for calculating a sum of squares of differences between the audio feature values and the average values, for describing distribution features of the audio signals;
An emotion recognition classification model construction unit, which improves the traditional feedforward neural network and constructs a feedforward neural network model based on the Chebyshev orthogonal polynomial family for the distribution feature data of the audio signal; in this model, a single hidden layer is used to reduce the complexity of the whole model, a function from the Chebyshev orthogonal polynomial family serves as the activation function of each neuron in the hidden layer and is used to fit the distribution features of the audio signal, and linear activation functions are adopted for the neurons in the other layers;
And the model iteration unit is used for calculating the activation value of each node in the neural network, propagating the gradient through a back propagation algorithm to obtain the gradient value of each parameter, updating the model parameter by adopting a gradient descent algorithm, and iterating the process until the model converges to realize emotion recognition.
By executing the above operations, the scheme extracts features from the audio signal, constructs an emotion recognition classification model, performs model iteration, updates the model parameters with a gradient descent algorithm, and iterates this process until the model converges, thereby solving the technical problems that inaccurate emotion recognition causes the recommended video content to mismatch the user's true emotional state, degrades the user experience, and leads the system to mistakenly push irrelevant information and cause information overload.
Referring to fig. 1 and 3, in this embodiment, the emotion marking and classifying module is provided with an emotion marking model construction unit, an attention score calculation unit and a feature fusion unit, which specifically include the following contents:
An emotion marking model unit is constructed, input data of a definition model are emotion characteristics P and opinion characteristics O of a user respectively, and a combination vector between the emotion characteristics and opinion characteristics is calculated through tensor operation, wherein the following formula is used:
In the formula, the combination vector characterizes the degree of association between the ith feature vector in the emotion features P and the jth feature vector in the opinion features O; p_i denotes the ith feature vector of P and o_j the jth feature vector of O; G_p is a three-dimensional tensor whose size is K multiplied by 2d, where K is a hyper-parameter of G_p expressing the complex intrinsic correlation between the emotion features and the opinion features (the larger K, the more information is extracted and the higher the complexity) and 2d is the dimension of each feature vector; i is the index over emotion feature vectors, j is the index over opinion feature vectors, and tanh denotes the hyperbolic tangent function;
And the attention score calculating unit is used for calculating the attention score after obtaining the combined vector between the emotion characteristics and the opinion characteristics, wherein the following formula is used:
In the formula, the attention score indicates how much information of the jth feature vector in the opinion features O is captured by the ith feature vector in the emotion features P (the higher the score, the more information is captured), and the weight vector is used to measure the importance of each value of the combination vector;
the feature fusion unit fuses the emotion features and the opinion features to obtain final emotion information, and the formula is as follows:
In the formula, the final emotion information is obtained from S_p, the emotion feature vectors, and S_o, the opinion feature vectors; E_p is the matrix of attention scores, and softmax_r denotes a row-wise softmax function that applies a softmax operation to each row of the attention score matrix; the resulting weight vectors are used to weight and fuse S_o, improving the accuracy of emotion classification and opinion extraction.
By performing the above operations, the scheme constructs an emotion marking model, calculates the relevance of emotion features and opinion features through tensors, calculates attention scores, and fuses the emotion and opinion features to obtain the final emotion information, improving the accuracy of emotion classification and solving the technical problems that emotion classification accuracy is low, user requirements cannot be met well, and target customer groups cannot be targeted accurately.
In the fourth embodiment, referring to fig. 1, the video content analysis module is provided with a video analysis unit and a content classification unit, and the embodiment is based on the above embodiment, and specifically includes the following contents:
The video analysis unit analyzes the video and extracts basic information and characteristics of the video, including resolution, frame rate and encoding and decoding formats;
and the content classification unit classifies the video content, marks the video according to the video theme and the content characteristics, and is convenient to search and manage.
In the fifth embodiment, referring to fig. 1 and fig. 4, the video content matching construction module of this embodiment is provided with a feature attribute set calculation unit, a redundant data deletion unit, a loss function calculation unit, an overlapping region marking unit and a parallel calculation unit, which specifically include the following contents:
The computing characteristic attribute set unit is used for collecting video content data from the video content analysis module, storing the video content data in the large-capacity data storage device, processing and transmitting the video content data as HLA distributed simulation data, and computing the characteristic attribute set of the video content data, wherein the formula is as follows:
In the formula, Q_i is the number of features of the video content data after HLA distributed data classification, s is the feature content of the video content data, n is the total number of features of the video content data, and c is a parameter of the HLA distributed data;
The redundant data deletion unit deletes non-feature attributes to reduce errors and improve collection speed, and calculates a removal criterion with the following formula:
In the formula, L represents the removal criterion used to determine whether data meeting the criterion should be deleted; the vector representation of the filtering request used during removal, together with its scalar representation q, identifies the characteristics of the HLA distributed data; and e represents an existing redundant data deletion request used to delete non-feature attributes;
and a loss function calculation unit for defining the input vector and the label value and calculating the loss function by using the following formula:
In the formula, LL represents the loss function; the input vector represents the HLA distributed simulation evaluation effect image features, and the label value represents the true class label corresponding to the HLA distributed simulation evaluation effect image; the class weight matrix is used to weight the loss calculation, T denotes the matrix transpose, b_j1 represents the error value of a class, m is the number of training samples, i1 is the index of a training sample, n is the number of training sample classes, and j1 is the index of a class; the training samples are a set of known HLA distributed simulation evaluation effect image features and their corresponding true class labels, and the evaluation effect image features are used for matching video content so that videos matching the user's emotion can be found quickly;
The overlapping region marking unit marks overlapping regions. An overlapping region indicates that repeated content exists between different parts during video content matching, and processing these regions improves the accuracy and efficiency of video content matching. The label of each region is marked so that every connected region has a unique label. Processing starts from the last overlapping row and proceeds one row at a time: for each pixel, it is checked whether the adjacent pixels belong to the same connected region; if so, the two regions are merged and their labels are updated to the same label, and processing then continues upward to the next row until all overlapping rows have been processed.
The parallel calculation unit divides the multiple matching tasks among different cores, each core being responsible for different region matching tasks; parallel calculation of the region matching degree is realized with a multithreading method, each core executing its region matching tasks in an independent thread; during the parallel calculation, repeated calculations are avoided through a reasonable data sharing and communication mechanism, which improves the computing capability for region matching, reduces the amount of redundant calculation, and improves the practical efficiency of video content matching.
By performing the above operations, the scheme calculates the feature attribute set of the video content data, deletes redundant data, calculates a loss function, marks overlapping regions, and processes region matching tasks with a multithreading method, improving the efficiency and real-time performance of the matching algorithm and solving the technical problems that video content is difficult to identify and match accurately in large-scale video data, wasting bandwidth and storage resources.
In the sixth embodiment, referring to fig. 1, the embodiment is based on the above embodiment, and in the practical application module, specifically, according to the emotion feedback and preference of the user, the result of emotion recognition and video content matching is combined, so that the video content conforming to the emotion state of the user is accurately pushed to the user, thereby improving the user experience and meeting the user requirement.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made hereto without departing from the spirit and principles of the present invention.
The invention and its embodiments have been described above without limitation, and the actual construction is not limited to the embodiments shown in the drawings. In summary, if a person of ordinary skill in the art, enlightened by this disclosure, devises a structure or embodiment similar to this technical solution without creative effort and without departing from the gist of the present invention, it shall fall within the scope of protection of the present invention.

Claims (6)

1. The emotion recognition and video content matching system based on artificial intelligence is characterized in that: the system comprises an emotion recognition module, an emotion marking and classifying module, a video content analysis module, a video content matching building module and an actual application module;
The emotion recognition module specifically performs feature extraction on the audio signal, constructs an emotion recognition classification model, updates model parameters by adopting a gradient descent algorithm, and performs model iteration;
the emotion marking and classifying module is specifically used for constructing an emotion marking model, calculating the relevance of emotion characteristics and opinion characteristics through tensors, calculating attention scores, and carrying out characteristic fusion on the emotion characteristics and opinion characteristics to obtain final emotion information;
the video content analysis module is used for carrying out video analysis and classifying video contents;
the video content matching module is specifically used for calculating a video content data characteristic attribute set, deleting redundant data, calculating a loss function, marking an overlapping region and processing a region matching task by using a multithreading method;
the actual application module is used for accurately pushing the video content conforming to the emotion state of the user according to the emotion feedback and preference of the user and combining the emotion recognition and video content matching results.
2. The artificial intelligence based emotion recognition and video content matching system of claim 1, wherein: the emotion marking and classifying module is provided with an emotion marking model building unit, an attention score calculating unit and a feature fusion unit, and specifically comprises the following contents:
An emotion marking model unit is constructed, input data of a definition model are emotion characteristics P and opinion characteristics O of a user respectively, and a combination vector between the emotion characteristics and opinion characteristics is calculated through tensor operation, wherein the following formula is used:
In the formula, the combination vector characterizes the degree of association between the ith feature vector in the emotion features P and the jth feature vector in the opinion features O; p_i denotes the ith feature vector of P and o_j the jth feature vector of O; G_p is a three-dimensional tensor whose size is K multiplied by 2d, where K is a hyper-parameter of G_p expressing the complex intrinsic correlation between the emotion features and the opinion features (the larger K, the more information is extracted and the higher the complexity) and 2d is the dimension of each feature vector; i is the index over emotion feature vectors, j is the index over opinion feature vectors, and tanh denotes the hyperbolic tangent function;
And the attention score calculating unit is used for calculating the attention score after obtaining the combined vector between the emotion characteristics and the opinion characteristics, wherein the following formula is used:
In the formula, the attention score indicates how much information of the jth feature vector in the opinion features O is captured by the ith feature vector in the emotion features P (the higher the score, the more information is captured), and the weight vector is used to measure the importance of each value of the combination vector;
the feature fusion unit fuses the emotion features and the opinion features to obtain final emotion information, and the formula is as follows:
In the formula, the final emotion information is obtained from S_p, the emotion feature vectors, and S_o, the opinion feature vectors; E_p is the matrix of attention scores, and softmax_r denotes a row-wise softmax function that applies a softmax operation to each row of the attention score matrix; the resulting weight vectors are used to weight and fuse S_o.
3. The artificial intelligence based emotion recognition and video content matching system of claim 1, wherein: in the construction of the video content matching module, a calculation characteristic attribute set unit, a redundant data deleting unit, a calculation loss function unit, a mark overlapping area unit and a parallel calculation unit are arranged, and the specific contents are as follows:
The computing characteristic attribute set unit is used for collecting video content data from the video content analysis module, storing the video content data in the large-capacity data storage device, processing and transmitting the video content data as HLA distributed simulation data, and computing the characteristic attribute set of the video content data, wherein the formula is as follows:
In the formula, Q_i is the number of features of the video content data after HLA distributed data classification, s is the feature content of the video content data, n is the total number of features of the video content data, and c is a parameter of the HLA distributed data;
Deleting redundant data units, and calculating a removal reference, wherein the formula is as follows:
In the formula, L represents the removal criterion used to determine whether data meeting the criterion should be deleted; the vector representation of the filtering request used during removal, together with its scalar representation q, identifies the characteristics of the HLA distributed data; and e represents an existing redundant data deletion request used to delete non-feature attributes;
and a loss function calculation unit for defining the input vector and the label value and calculating the loss function by using the following formula:
In the formula, LL represents the loss function; the input vector represents the HLA distributed simulation evaluation effect image features, and the label value represents the true class label corresponding to the HLA distributed simulation evaluation effect image; the class weight matrix is used to weight the loss calculation, T denotes the matrix transpose, b_j1 represents the error value of a class, m is the number of training samples, i1 is the index of a training sample, n is the number of training sample classes, and j1 is the index of a class; the training samples are a set of known HLA distributed simulation evaluation effect image features and their corresponding true class labels;
the overlapping region marking unit marks overlapping regions; an overlapping part indicates that repeated content exists between different parts during video content matching; the overlapping regions are processed, the label of each region is marked so that every connected region has a unique label, and processing starts from the penultimate overlapping row and proceeds one row at a time until all overlapping rows have been processed;
The parallel calculation unit divides the multiple matching tasks among different cores, each core being responsible for different region matching tasks; parallel calculation of the region matching degree is realized with a multithreading method, each core executing its region matching tasks in an independent thread; during the parallel calculation, repeated calculations are avoided through a reasonable data sharing and communication mechanism.
4. The artificial intelligence based emotion recognition and video content matching system of claim 1, wherein: the emotion recognition module is provided with a feature extraction unit, an emotion recognition classification model construction unit and a model iteration unit, and specifically comprises the following contents:
A feature extraction unit for calculating a sum of squares of differences between the audio feature values and the average values, for describing distribution features of the audio signals;
constructing an emotion recognition classification model unit, improving a traditional feedforward neural network, constructing a forward neural network model based on Chebyshev orthogonal polynomial clusters aiming at the distribution characteristic data of the audio signals, wherein in the model, a single hidden layer is used for reducing the complexity of the whole model, a function in the Chebyshev orthogonal polynomial clusters is used for a stimulation function of each neuron in the hidden layer and is used for fitting the distribution characteristic of the audio signals, and a linear stimulation function is adopted for the stimulation functions of the neurons in other layers;
And the model iteration unit is used for calculating the activation value of each node in the neural network, propagating the gradient through a back propagation algorithm to obtain the gradient value of each parameter, updating the model parameter by adopting a gradient descent algorithm, and iterating the process until the model converges to realize emotion recognition.
5. The artificial intelligence based emotion recognition and video content matching system of claim 1, wherein: the video content analysis module is provided with a video analysis unit and a content classification unit, and specifically comprises the following contents:
The video analysis unit analyzes the video and extracts basic information and characteristics of the video;
And the content classification unit classifies the video content and marks the video according to the video theme and the content characteristics.
6. The artificial intelligence based emotion recognition and video content matching system of claim 1, wherein: in the practical application module, specifically, according to emotion feedback and preference of a user, the result of emotion recognition and video content matching is combined to accurately push video content conforming to the emotion state of the user.
CN202410905484.8A 2024-07-08 2024-07-08 Artificial intelligence based emotion recognition and video content matching system Active CN118450176B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410905484.8A CN118450176B (en) 2024-07-08 2024-07-08 Artificial intelligence based emotion recognition and video content matching system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410905484.8A CN118450176B (en) 2024-07-08 2024-07-08 Artificial intelligence based emotion recognition and video content matching system

Publications (2)

Publication Number Publication Date
CN118450176A true CN118450176A (en) 2024-08-06
CN118450176B CN118450176B (en) 2024-09-13

Family

ID=92320264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410905484.8A Active CN118450176B (en) 2024-07-08 2024-07-08 Artificial intelligence based emotion recognition and video content matching system

Country Status (1)

Country Link
CN (1) CN118450176B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019056721A1 (en) * 2017-09-21 2019-03-28 掌阅科技股份有限公司 Information pushing method, electronic device and computer storage medium
US20190102706A1 (en) * 2011-10-20 2019-04-04 Affectomatics Ltd. Affective response based recommendations
CN109844708A (en) * 2017-06-21 2019-06-04 微软技术许可有限责任公司 Recommend media content by chat robots
WO2021147084A1 (en) * 2020-01-23 2021-07-29 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for emotion recognition in user-generated video(ugv)
WO2021208719A1 (en) * 2020-11-19 2021-10-21 平安科技(深圳)有限公司 Voice-based emotion recognition method, apparatus and device, and storage medium
CN117493973A (en) * 2023-11-20 2024-02-02 安徽信息工程学院 Social media negative emotion recognition method based on generation type artificial intelligence
CN117688936A (en) * 2024-02-04 2024-03-12 江西农业大学 Low-rank multi-mode fusion emotion analysis method for graphic fusion
US20240119716A1 (en) * 2022-09-19 2024-04-11 Hangzhou Dianzi University Method for multimodal emotion classification based on modal space assimilation and contrastive learning
CN117995171A (en) * 2024-03-13 2024-05-07 广东金湾信息科技有限公司 Voice emotion recognition method and system
US20240169711A1 (en) * 2022-11-21 2024-05-23 Samsung Electronics Co., Ltd. Multi-modal understanding of emotions in video content
CN118296150A (en) * 2024-06-05 2024-07-05 北京理工大学 Comment emotion recognition method based on multi-countermeasure network improvement

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190102706A1 (en) * 2011-10-20 2019-04-04 Affectomatics Ltd. Affective response based recommendations
CN109844708A (en) * 2017-06-21 2019-06-04 微软技术许可有限责任公司 Recommend media content by chat robots
WO2019056721A1 (en) * 2017-09-21 2019-03-28 掌阅科技股份有限公司 Information pushing method, electronic device and computer storage medium
WO2021147084A1 (en) * 2020-01-23 2021-07-29 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for emotion recognition in user-generated video(ugv)
WO2021208719A1 (en) * 2020-11-19 2021-10-21 平安科技(深圳)有限公司 Voice-based emotion recognition method, apparatus and device, and storage medium
US20240119716A1 (en) * 2022-09-19 2024-04-11 Hangzhou Dianzi University Method for multimodal emotion classification based on modal space assimilation and contrastive learning
US20240169711A1 (en) * 2022-11-21 2024-05-23 Samsung Electronics Co., Ltd. Multi-modal understanding of emotions in video content
CN117493973A (en) * 2023-11-20 2024-02-02 安徽信息工程学院 Social media negative emotion recognition method based on generation type artificial intelligence
CN117688936A (en) * 2024-02-04 2024-03-12 江西农业大学 Low-rank multi-mode fusion emotion analysis method for graphic fusion
CN117995171A (en) * 2024-03-13 2024-05-07 广东金湾信息科技有限公司 Voice emotion recognition method and system
CN118296150A (en) * 2024-06-05 2024-07-05 北京理工大学 Comment emotion recognition method based on multi-countermeasure network improvement

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
赵冬梅; 李雅; 陶建华; 顾明亮: "Sentiment analysis model based on a collaborative filtering attention mechanism", Journal of Chinese Information Processing, no. 08, 15 August 2018 (2018-08-15) *

Also Published As

Publication number Publication date
CN118450176B (en) 2024-09-13

Similar Documents

Publication Publication Date Title
CN111462282B (en) Scene graph generation method
CN111104595B (en) Deep reinforcement learning interactive recommendation method and system based on text information
CN110674407B (en) Hybrid recommendation method based on graph convolution neural network
CN110175628A (en) A kind of compression algorithm based on automatic search with the neural networks pruning of knowledge distillation
CN111192270A (en) Point cloud semantic segmentation method based on point global context reasoning
CN107122809A (en) Neural network characteristics learning method based on image own coding
Sajid et al. Zoomcount: A zooming mechanism for crowd counting in static images
CN104217214A (en) Configurable convolutional neural network based red green blue-distance (RGB-D) figure behavior identification method
CN112308115B (en) Multi-label image deep learning classification method and equipment
CN109711422A (en) Image real time transfer, the method for building up of model, device, computer equipment and storage medium
CN112380453B (en) Article recommendation method and device, storage medium and equipment
CN114067385B (en) Cross-modal face retrieval hash method based on metric learning
CN112364747B (en) Target detection method under limited sample
De Menezes et al. Object recognition using convolutional neural networks
CN111524140B (en) Medical image semantic segmentation method based on CNN and random forest method
CN112381179A (en) Heterogeneous graph classification method based on double-layer attention mechanism
CN111949885A (en) Personalized recommendation method for scenic spots
CN113159067A (en) Fine-grained image identification method and device based on multi-grained local feature soft association aggregation
CN115098732B (en) Data processing method and related device
Qiang et al. Detection of citrus pests in double backbone network based on single shot multibox detector
CN115457332A (en) Image multi-label classification method based on graph convolution neural network and class activation mapping
CN115393631A (en) Hyperspectral image classification method based on Bayesian layer graph convolution neural network
CN113283400B (en) Skeleton action identification method based on selective hypergraph convolutional network
Wang et al. A high-accuracy genotype classification approach using time series imagery
Xu et al. Dilated convolution capsule network for apple leaf disease identification

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant