CN118450176A - Artificial intelligence based emotion recognition and video content matching system

Info

Publication number
CN118450176A
Authority
CN
China
Prior art keywords
emotion
video content
unit
feature
data
Prior art date
Legal status
Granted
Application number
CN202410905484.8A
Other languages
Chinese (zh)
Other versions
CN118450176B (en)
Inventor
刘琛良
张亦弛
吴红
肖仙
Current Assignee
Hunan Mango Rongchuang Technology Co ltd
Original Assignee
Hunan Mango Rongchuang Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hunan Mango Rongchuang Technology Co ltd filed Critical Hunan Mango Rongchuang Technology Co ltd
Priority to CN202410905484.8A
Publication of CN118450176A
Application granted
Publication of CN118450176B
Legal status: Active (current)
Anticipated expiration


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an artificial intelligence-based emotion recognition and video content matching system comprising an emotion recognition module, an emotion marking and classifying module, a video content analysis module, a video content matching construction module and an actual application module. The invention belongs to the technical field of data processing, and in particular relates to an emotion recognition and video content matching system based on artificial intelligence. The system adopts an emotion recognition and classification model and updates model parameters with a gradient descent algorithm to address inaccurate emotion recognition; it adopts an emotion marking model that calculates the relevance of emotion features and opinion features through tensors, calculates attention scores, and fuses the emotion and opinion features to obtain the final emotion information, improving the accuracy of emotion classification and accurately meeting user requirements; and it calculates a feature attribute set of the video content data, deletes redundant data, marks overlapping regions, and processes region matching tasks with a multithreading method, improving the efficiency of video content matching.

Description

Artificial intelligence based emotion recognition and video content matching system
Technical Field
The invention belongs to the technical field of data processing, and particularly relates to an artificial intelligence-based emotion recognition and video content matching system.
Background
The emotion recognition and video content matching system combines emotion recognition technology with multimedia content analysis technology. It captures and analyzes emotional states in signals using deep learning models and understands the visual content of videos, so that a person's emotion can be aligned with the video picture content. It performs automatic emotion marking and content classification on large-scale media resources and, according to the user's emotional feedback, accurately pushes content that matches the user's emotional state, with the aim of helping the user find suitable video content more quickly and accurately and providing an audio-visual experience that matches the user's preferences and emotional needs. However, existing emotion recognition and video content matching systems have the following technical problems: emotion recognition is inaccurate, so the recommended video content does not match the user's actual emotional state, the user experience is degraded, and the system mistakenly pushes irrelevant information, causing information overload; emotion classification accuracy is low, so user requirements cannot be met well and target customer groups cannot be targeted accurately; and it is difficult to accurately identify and match video content in large-scale video data, wasting bandwidth and storage resources.
Disclosure of Invention
In view of this situation, and in order to overcome the defects of the prior art, the present invention provides an emotion recognition and video content matching system based on artificial intelligence. To address the technical problem that inaccurate emotion recognition causes the recommended video content to mismatch the user's real emotional state, degrades the user experience, and leads the system to mistakenly push irrelevant information and cause information overload, the system extracts features from the audio signal, constructs an emotion recognition classification model, performs model iteration, updates the model parameters with a gradient descent algorithm, and iterates this process until the model converges. To address the technical problems that emotion classification accuracy is low, user requirements cannot be met well, and target customer groups cannot be targeted accurately, the system constructs an emotion marking model, calculates the relevance of emotion features and opinion features through tensors, calculates attention scores, and fuses the emotion and opinion features to obtain the final emotion information, improving the accuracy of emotion classification. To address the technical problems that it is difficult to accurately identify and match video content in large-scale video data, wasting bandwidth and storage resources, the system calculates a feature attribute set of the video content data, deletes redundant data, calculates a loss function, marks overlapping regions, and processes region matching tasks with a multithreading method, improving the efficiency and real-time performance of the matching algorithm.
The invention provides an artificial intelligence-based emotion recognition and video content matching system, which comprises an emotion recognition module, an emotion marking and classifying module, a video content analysis module, a video content matching construction module and an actual application module;
The emotion recognition module specifically performs feature extraction on the audio signal, builds an emotion recognition classification model, performs model iteration, updates model parameters by adopting a gradient descent algorithm, and iterates the process until the model converges;
The emotion marking and classifying module is specifically used for constructing an emotion marking model, calculating the relevance of emotion characteristics and opinion characteristics through tensors, calculating attention scores, carrying out characteristic fusion on the emotion characteristics and opinion characteristics to obtain final emotion information, and improving the accuracy of emotion classification;
the video content analysis module is used for carrying out video analysis and classifying video contents;
The video content matching construction module is specifically configured to calculate a feature attribute set of the video content data, delete redundant data, calculate a loss function, mark overlapping regions, and process region matching tasks with a multithreading method, improving the efficiency and real-time performance of the matching algorithm;
The actual application module specifically combines emotion feedback and preference of the user with emotion recognition and video content matching results to accurately push video content conforming to the emotion state of the user, so that user experience is improved and user requirements are met.
Further, the emotion recognition module is provided with a feature extraction unit, an emotion recognition classification model construction unit and a model iteration unit, which specifically comprise the following contents:
A feature extraction unit for calculating a sum of squares of differences between the audio feature values and the average values, for describing distribution features of the audio signals;
An emotion recognition classification model construction unit, which improves the traditional feedforward neural network and constructs a feedforward neural network model based on the Chebyshev orthogonal polynomial family for the distribution feature data of the audio signal; in this model, a single hidden layer is used to reduce the complexity of the whole model, a function from the Chebyshev orthogonal polynomial family serves as the activation function of each neuron in the hidden layer and is used to fit the distribution features of the audio signal, and linear activation functions are adopted for the neurons in the other layers;
And the model iteration unit is used for calculating the activation value of each node in the neural network, propagating the gradient through a back propagation algorithm to obtain the gradient value of each parameter, updating the model parameter by adopting a gradient descent algorithm, and iterating the process until the model converges to realize emotion recognition.
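The patent gives no source code; the following is a minimal illustrative sketch only, assuming a single hidden layer whose kth neuron uses the Chebyshev polynomial T_k as its activation function, linear activations elsewhere, and training by back propagation with gradient descent. The layer sizes, the cross-entropy loss, the class count and the placeholder data are assumptions for illustration, not taken from the patent.

```python
# Illustrative sketch (not the patented implementation): feedforward network with
# Chebyshev-polynomial hidden activations, trained by gradient descent.
import torch
import torch.nn as nn

def chebyshev_activation(z: torch.Tensor) -> torch.Tensor:
    """Hidden neuron k applies the Chebyshev polynomial T_k to its pre-activation z_k.
    In practice the pre-activations are usually normalised to [-1, 1], where the
    polynomials are well behaved."""
    batch, hidden = z.shape
    t_prev, t_curr = torch.ones_like(z), z              # T_0(z), T_1(z)
    orders = [t_prev, t_curr]
    for _ in range(2, hidden):
        t_prev, t_curr = t_curr, 2 * z * t_curr - t_prev  # recurrence T_k = 2z*T_{k-1} - T_{k-2}
        orders.append(t_curr)
    stacked = torch.stack(orders, dim=-1)                # (batch, hidden, hidden): T_k(z_j)
    idx = torch.arange(hidden)
    return stacked[:, idx, idx]                          # neuron k keeps T_k(z_k)

class ChebyshevEmotionNet(nn.Module):
    def __init__(self, in_dim: int, hidden: int, n_classes: int):
        super().__init__()
        self.fc1 = nn.Linear(in_dim, hidden)             # single hidden layer
        self.fc2 = nn.Linear(hidden, n_classes)          # linear activation at the output

    def forward(self, x):
        return self.fc2(chebyshev_activation(self.fc1(x)))

model = ChebyshevEmotionNet(in_dim=40, hidden=8, n_classes=4)
opt = torch.optim.SGD(model.parameters(), lr=1e-2)       # gradient descent
loss_fn = nn.CrossEntropyLoss()
x = torch.randn(64, 40)                                   # placeholder audio distribution features
y = torch.randint(0, 4, (64,))                            # placeholder emotion labels
for step in range(200):                                   # iterate until (near-)convergence
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()                                       # back propagation of gradients
    opt.step()                                            # parameter update
```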
Further, in the emotion marking and classifying module, an emotion marking model building unit, an attention score calculating unit and a feature fusion unit are arranged, and specifically include the following contents:
An emotion marking model construction unit, in which the input data of the model are defined as the user's emotion features P and opinion features O respectively, and the combination vector between the emotion features and the opinion features is calculated through a tensor operation using the following formula:
In the formula, the combination vector characterizes the degree of association between the ith feature vector in the emotion features P and the jth feature vector in the opinion features O; p_i denotes the ith feature vector of P and o_j the jth feature vector of O; G_p is a three-dimensional tensor whose size is K multiplied by 2d, where K is a hyper-parameter of G_p expressing the complex intrinsic correlation between the emotion features and the opinion features (the larger K, the more information is extracted and the higher the complexity) and 2d is the dimension of each feature vector; i is the index over emotion feature vectors, j is the index over opinion feature vectors, and tanh denotes the hyperbolic tangent function;
And the attention score calculating unit is used for calculating the attention score after obtaining the combined vector between the emotion characteristics and the opinion characteristics, wherein the following formula is used:
In the formula, the attention score indicates how much information of the jth feature vector in the opinion features O is captured by the ith feature vector in the emotion features P (the higher the score, the more information is captured), and the weight vector is used to measure the importance of each value of the combination vector;
the feature fusion unit fuses the emotion features and the opinion features to obtain final emotion information, and the formula is as follows:
In the formula, the final emotion information is obtained from S_p, the emotion feature vectors, and S_o, the opinion feature vectors; E_p is the matrix of attention scores, and softmax_r denotes a row-wise softmax function that applies a softmax operation to each row of the attention score matrix; the resulting weight vectors are used to weight and fuse S_o, improving the accuracy of emotion classification and opinion extraction.
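The patent does not give these three steps in executable form; the sketch below is an assumed NumPy rendering in which the combination vector is taken as a bilinear tanh interaction between p_i and o_j through a tensor G_p of shape K x 2d x 2d, the attention score as the dot product of the combination vector with a weight vector, softmax_r as a row-wise softmax, and the final fusion as S_p plus the attention-weighted S_o. All shapes, the random data, and the fusion operator are illustrative assumptions.

```python
# Illustrative sketch only: tensor-based combination vectors, attention scores,
# row-wise softmax and attention-weighted fusion of emotion/opinion features.
import numpy as np

rng = np.random.default_rng(0)
n_p, n_o, d2, K = 5, 7, 16, 8            # feature counts, vector dim (2d), tensor size K
P = rng.standard_normal((n_p, d2))       # emotion feature vectors p_i
O = rng.standard_normal((n_o, d2))       # opinion feature vectors o_j
G_p = rng.standard_normal((K, d2, d2)) * 0.1   # assumed bilinear tensor
v = rng.standard_normal(K)               # weight vector over the combination vector

# Combination vectors: tanh(p_i^T G_p o_j) -> shape (n_p, n_o, K)
g = np.tanh(np.einsum('ia,kab,jb->ijk', P, G_p, O))

# Attention scores: dot product with the weight vector -> matrix E_p of shape (n_p, n_o)
E_p = g @ v

# Row-wise softmax (softmax_r): each emotion feature attends over the opinion features
W = np.exp(E_p - E_p.max(axis=1, keepdims=True))
W /= W.sum(axis=1, keepdims=True)

# Weighted fusion: S_p combined with the attention-weighted opinion features S_o
S_p, S_o = P, O
fused = S_p + W @ S_o                    # final emotion information (assumed fusion operator)
print(fused.shape)                       # (n_p, 2d)
```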
Further, in the video content analysis module, a video analysis unit and a content classification unit are provided, which specifically include the following contents:
The video analysis unit analyzes the video and extracts basic information and characteristics of the video, including resolution, frame rate and encoding and decoding formats;
and the content classification unit classifies the video content, marks the video according to the video theme and the content characteristics, and is convenient to search and manage.
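As an illustrative sketch only (the patent does not name a specific library), the basic video properties listed above can be read with OpenCV; the file path and the returned field names are placeholders.

```python
# Illustrative sketch only: probing resolution, frame rate and codec of a video file.
import cv2

def probe_video(path: str) -> dict:
    cap = cv2.VideoCapture(path)
    if not cap.isOpened():
        raise IOError(f"cannot open {path}")
    fourcc = int(cap.get(cv2.CAP_PROP_FOURCC))
    codec = "".join(chr((fourcc >> (8 * i)) & 0xFF) for i in range(4))   # decode FOURCC
    info = {
        "width": int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)),
        "height": int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)),
        "fps": cap.get(cv2.CAP_PROP_FPS),
        "codec": codec,
    }
    cap.release()
    return info

# e.g. probe_video("clip.mp4") -> {'width': 1920, 'height': 1080, 'fps': 25.0, 'codec': 'avc1'}
```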
Further, the video content matching construction module is provided with a feature attribute set calculation unit, a redundant data deletion unit, a loss function calculation unit, an overlapping region marking unit and a parallel calculation unit, with the following specific contents:
The computing characteristic attribute set unit is used for collecting video content data from the video content analysis module, storing the video content data in the large-capacity data storage device, processing and transmitting the video content data as HLA distributed simulation data, and computing the characteristic attribute set of the video content data, wherein the formula is as follows:
In the formula, Q_i is the number of features of the video content data after HLA distributed data classification, s is the feature content of the video content data, n is the total number of features of the video content data, and c is a parameter of the HLA distributed data;
The redundant data deletion unit deletes non-feature attributes to reduce errors and improve collection speed, and calculates a removal criterion with the following formula:
In the formula, L represents the removal criterion used to determine whether data meeting the criterion should be deleted; the vector representation of the filtering request used during removal, together with its scalar representation q, identifies the characteristics of the HLA distributed data; and e represents an existing redundant data deletion request used to delete non-feature attributes;
and a loss function calculation unit for defining the input vector and the label value and calculating the loss function by using the following formula:
In the formula, LL represents the loss function; the input vector represents the HLA distributed simulation evaluation effect image features, and the label value represents the true class label corresponding to the HLA distributed simulation evaluation effect image; the class weight matrix is used to weight the loss calculation, T denotes the matrix transpose, b_j1 represents the error value of a class, m is the number of training samples, i1 is the index of a training sample, n is the number of training sample classes, and j1 is the index of a class; the training samples are a set of known HLA distributed simulation evaluation effect image features and their corresponding true class labels, and the evaluation effect image features are used for matching video content so that videos matching the user's emotion can be found quickly.
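The exact loss formula cannot be recovered from the text alone; as an assumed stand-in with the same ingredients (m samples, n classes, input image-feature vectors, true class labels, a class weight matrix W and a per-class term b), a plain softmax cross-entropy is sketched below. This is an illustration, not the patented formula.

```python
# Illustrative sketch only: a softmax cross-entropy loss with a class weight matrix,
# used here as a hedged stand-in for the loss described above.
import numpy as np

def weighted_softmax_loss(X, y, W, b):
    """X: (m, d) image features, y: (m,) integer class labels,
    W: (d, n) class weight matrix, b: (n,) per-class term."""
    logits = X @ W + b                                   # (m, n)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    m = X.shape[0]
    return -log_probs[np.arange(m), y].mean()            # averaged negative log-likelihood
```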
The overlapping region marking unit marks overlapping regions. An overlapping region indicates that repeated content exists between different parts during video content matching, and processing these regions improves the accuracy and efficiency of video content matching. The label of each region is marked so that every connected region has a unique label. Processing starts from the last overlapping row and proceeds one row at a time: for each pixel, it is checked whether the adjacent pixels belong to the same connected region; if so, the two regions are merged and their labels are updated to the same label, and processing then continues upward to the next row until all overlapping rows have been processed.
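A minimal sketch of the bottom-up, row-by-row label-merging procedure just described, using a union-find structure; the binary overlap mask, the initial labelling and the two-pixel neighbourhood used here are illustrative assumptions.

```python
# Illustrative sketch only: merge labels of connected overlapping pixels,
# scanning from the last row upward as described above.
import numpy as np

def merge_overlap_labels(mask: np.ndarray, labels: np.ndarray) -> np.ndarray:
    """mask: (H, W) bool, True where content overlaps; labels: (H, W) int,
    initial per-region labels (0 = background). Returns merged labels."""
    parent = {}

    def find(a):
        while parent.setdefault(a, a) != a:
            parent[a] = parent[parent[a]]        # path compression
            a = parent[a]
        return a

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra                      # merge the two regions' labels

    H, W = mask.shape
    for r in range(H - 1, -1, -1):               # scan from the last row upward
        for c in range(W):
            if not mask[r, c] or labels[r, c] == 0:
                continue
            # neighbours considered: pixel to the left and pixel in the row below
            for nr, nc in ((r, c - 1), (r + 1, c)):
                if 0 <= nr < H and 0 <= nc < W and mask[nr, nc] and labels[nr, nc] != 0:
                    union(labels[r, c], labels[nr, nc])

    out = labels.copy()
    nz = out != 0
    out[nz] = [find(int(l)) for l in out[nz]]    # rewrite every label to its root
    return out
```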
The parallel calculation unit divides the multiple matching tasks among different cores, each core being responsible for different region matching tasks; parallel calculation of the region matching degree is realized with a multithreading method, each core executing its region matching tasks in an independent thread; during the parallel calculation, repeated calculations are avoided through a reasonable data sharing and communication mechanism, which improves the computing capability for region matching, reduces the amount of redundant calculation, and improves the practical efficiency of video content matching.
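A minimal sketch, under assumptions, of distributing region matching tasks across worker threads with Python's concurrent.futures; the normalised-correlation matching degree and the placeholder regions are illustrative only.

```python
# Illustrative sketch only: multithreaded region matching.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def region_match_score(region: np.ndarray, template: np.ndarray) -> float:
    """A simple matching degree: normalised correlation between two equal-size patches."""
    a = (region - region.mean()) / (region.std() + 1e-8)
    b = (template - template.mean()) / (template.std() + 1e-8)
    return float((a * b).mean())

def parallel_region_matching(regions, template, workers=4):
    # each worker thread handles its own region matching tasks; results keep input order
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda r: region_match_score(r, template), regions))

# example: regions = [np.random.rand(32, 32) for _ in range(16)]
#          scores = parallel_region_matching(regions, np.random.rand(32, 32))
```

Because CPython threads share the interpreter lock, this kind of thread-level parallelism mainly pays off when the per-region matching work runs in native code that releases it (NumPy, OpenCV); otherwise a process pool is the usual alternative.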
Further, in the practical application module, specifically, according to emotion feedback and preference of the user, the results of emotion recognition and video content matching are combined, so that video content conforming to the emotion state of the user is accurately pushed to the user, user experience is improved, and user requirements are met.
By adopting the scheme, the beneficial effects obtained by the invention are as follows:
(1) Aiming at the technical problems that inaccurate emotion recognition causes the recommended video content to mismatch the user's real emotional state, degrades the user experience, and leads the system to mistakenly push irrelevant information and cause information overload, the scheme extracts features from the audio signal, constructs an emotion recognition classification model, performs model iteration, updates the model parameters with a gradient descent algorithm, and iterates this process until the model converges;
(2) Aiming at the technical problems that emotion classification accuracy is low, user requirements cannot be met well, and target customer groups cannot be targeted accurately, the scheme constructs an emotion marking model, calculates the relevance of the user's emotion features and opinion features through tensors, calculates attention scores, and fuses the emotion and opinion features to obtain the final emotion information, improving the accuracy of emotion classification;
(3) Aiming at the technical problems that it is difficult to accurately identify and match video content in large-scale video data, wasting bandwidth and storage resources, the scheme calculates the feature attribute set of the video content data, deletes redundant data, calculates a loss function, marks overlapping regions, and processes region matching tasks with a multithreading method, improving the efficiency and real-time performance of the matching algorithm.
Drawings
FIG. 1 is a schematic diagram of an artificial intelligence based emotion recognition and video content matching system provided by the present invention;
FIG. 2 is a schematic diagram of an emotion recognition module;
FIG. 3 is a schematic diagram of an emotion marking and classifying module;
Fig. 4 is a schematic diagram of constructing a video content matching module.
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are only some, but not all embodiments of the invention; all other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the description of the present invention, it should be understood that the terms "upper," "lower," "front," "rear," "left," "right," "top," "bottom," "inner," "outer," and the like indicate orientation or positional relationships based on those shown in the drawings, merely to facilitate description of the invention and simplify the description, and do not indicate or imply that the devices or elements referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus should not be construed as limiting the invention.
Referring to fig. 1, the emotion recognition and video content matching system based on artificial intelligence provided by the invention comprises an emotion recognition module, an emotion marking and classifying module, a video content analysis module, a video content matching module and an actual application module;
The emotion recognition module specifically performs feature extraction on the audio signal, builds an emotion recognition classification model, performs model iteration, updates model parameters by adopting a gradient descent algorithm, and iterates the process until the model converges;
The emotion marking and classifying module is specifically used for constructing an emotion marking model, calculating the relevance of emotion characteristics and opinion characteristics through tensors, calculating attention scores, carrying out characteristic fusion on the emotion characteristics and opinion characteristics to obtain final emotion information, and improving the accuracy of emotion classification;
the video content analysis module is used for carrying out video analysis and classifying video contents;
The video content matching construction module is specifically configured to calculate a feature attribute set of the video content data, delete redundant data, calculate a loss function, mark overlapping regions, and process region matching tasks with a multithreading method, improving the efficiency and real-time performance of the matching algorithm;
The actual application module specifically combines emotion feedback and preference of the user with emotion recognition and video content matching results to accurately push video content conforming to the emotion state of the user, so that user experience is improved and user requirements are met.
In the second embodiment, referring to fig. 1 and fig. 2, the emotion recognition module is provided with a feature extraction unit, an emotion recognition classification model construction unit and a model iteration unit, and specifically includes the following contents:
A feature extraction unit for calculating a sum of squares of differences between the audio feature values and the average values, for describing distribution features of the audio signals;
An emotion recognition classification model construction unit, which improves the traditional feedforward neural network and constructs a feedforward neural network model based on the Chebyshev orthogonal polynomial family for the distribution feature data of the audio signal; in this model, a single hidden layer is used to reduce the complexity of the whole model, a function from the Chebyshev orthogonal polynomial family serves as the activation function of each neuron in the hidden layer and is used to fit the distribution features of the audio signal, and linear activation functions are adopted for the neurons in the other layers;
And the model iteration unit is used for calculating the activation value of each node in the neural network, propagating the gradient through a back propagation algorithm to obtain the gradient value of each parameter, updating the model parameter by adopting a gradient descent algorithm, and iterating the process until the model converges to realize emotion recognition.
By executing the above operations, the scheme extracts features from the audio signal, constructs an emotion recognition classification model, performs model iteration, updates the model parameters with a gradient descent algorithm, and iterates this process until the model converges, thereby solving the technical problems that inaccurate emotion recognition causes the recommended video content to mismatch the user's true emotional state, degrades the user experience, and leads the system to mistakenly push irrelevant information and cause information overload.
Referring to fig. 1 and 3, in this embodiment, the emotion marking and classifying module is provided with an emotion marking model construction unit, an attention score calculation unit and a feature fusion unit, which specifically include the following contents:
An emotion marking model unit is constructed, input data of a definition model are emotion characteristics P and opinion characteristics O of a user respectively, and a combination vector between the emotion characteristics and opinion characteristics is calculated through tensor operation, wherein the following formula is used:
In the formula, the combination vector characterizes the degree of association between the ith feature vector in the emotion features P and the jth feature vector in the opinion features O; p_i denotes the ith feature vector of P and o_j the jth feature vector of O; G_p is a three-dimensional tensor whose size is K multiplied by 2d, where K is a hyper-parameter of G_p expressing the complex intrinsic correlation between the emotion features and the opinion features (the larger K, the more information is extracted and the higher the complexity) and 2d is the dimension of each feature vector; i is the index over emotion feature vectors, j is the index over opinion feature vectors, and tanh denotes the hyperbolic tangent function;
And the attention score calculating unit is used for calculating the attention score after obtaining the combined vector between the emotion characteristics and the opinion characteristics, wherein the following formula is used:
In the formula, the attention score indicates how much information of the jth feature vector in the opinion features O is captured by the ith feature vector in the emotion features P (the higher the score, the more information is captured), and the weight vector is used to measure the importance of each value of the combination vector;
the feature fusion unit fuses the emotion features and the opinion features to obtain final emotion information, and the formula is as follows:
In the formula, the final emotion information is obtained from S_p, the emotion feature vectors, and S_o, the opinion feature vectors; E_p is the matrix of attention scores, and softmax_r denotes a row-wise softmax function that applies a softmax operation to each row of the attention score matrix; the resulting weight vectors are used to weight and fuse S_o, improving the accuracy of emotion classification and opinion extraction.
By performing the above operations, the scheme constructs an emotion marking model, calculates the relevance of emotion features and opinion features through tensors, calculates attention scores, and fuses the emotion and opinion features to obtain the final emotion information, improving the accuracy of emotion classification and solving the technical problems that emotion classification accuracy is low, user requirements cannot be met well, and target customer groups cannot be targeted accurately.
In the fourth embodiment, referring to fig. 1, the video content analysis module is provided with a video analysis unit and a content classification unit, and the embodiment is based on the above embodiment, and specifically includes the following contents:
The video analysis unit analyzes the video and extracts basic information and characteristics of the video, including resolution, frame rate and encoding and decoding formats;
and the content classification unit classifies the video content, marks the video according to the video theme and the content characteristics, and is convenient to search and manage.
In the fifth embodiment, referring to fig. 1 and fig. 4, the video content matching construction module of this embodiment is provided with a feature attribute set calculation unit, a redundant data deletion unit, a loss function calculation unit, an overlapping region marking unit and a parallel calculation unit, which specifically include the following contents:
The computing characteristic attribute set unit is used for collecting video content data from the video content analysis module, storing the video content data in the large-capacity data storage device, processing and transmitting the video content data as HLA distributed simulation data, and computing the characteristic attribute set of the video content data, wherein the formula is as follows:
In the formula, Q_i is the number of features of the video content data after HLA distributed data classification, s is the feature content of the video content data, n is the total number of features of the video content data, and c is a parameter of the HLA distributed data;
The redundant data deletion unit deletes non-feature attributes to reduce errors and improve collection speed, and calculates a removal criterion with the following formula:
In the formula, L represents the removal criterion used to determine whether data meeting the criterion should be deleted; the vector representation of the filtering request used during removal, together with its scalar representation q, identifies the characteristics of the HLA distributed data; and e represents an existing redundant data deletion request used to delete non-feature attributes;
and a loss function calculation unit for defining the input vector and the label value and calculating the loss function by using the following formula:
In the formula, LL represents the loss function; the input vector represents the HLA distributed simulation evaluation effect image features, and the label value represents the true class label corresponding to the HLA distributed simulation evaluation effect image; the class weight matrix is used to weight the loss calculation, T denotes the matrix transpose, b_j1 represents the error value of a class, m is the number of training samples, i1 is the index of a training sample, n is the number of training sample classes, and j1 is the index of a class; the training samples are a set of known HLA distributed simulation evaluation effect image features and their corresponding true class labels, and the evaluation effect image features are used for matching video content so that videos matching the user's emotion can be found quickly;
The overlapping region marking unit marks overlapping regions. An overlapping region indicates that repeated content exists between different parts during video content matching, and processing these regions improves the accuracy and efficiency of video content matching. The label of each region is marked so that every connected region has a unique label. Processing starts from the last overlapping row and proceeds one row at a time: for each pixel, it is checked whether the adjacent pixels belong to the same connected region; if so, the two regions are merged and their labels are updated to the same label, and processing then continues upward to the next row until all overlapping rows have been processed.
The parallel calculation unit divides the multiple matching tasks among different cores, each core being responsible for different region matching tasks; parallel calculation of the region matching degree is realized with a multithreading method, each core executing its region matching tasks in an independent thread; during the parallel calculation, repeated calculations are avoided through a reasonable data sharing and communication mechanism, which improves the computing capability for region matching, reduces the amount of redundant calculation, and improves the practical efficiency of video content matching.
By performing the above operations, the scheme calculates the feature attribute set of the video content data, deletes redundant data, calculates a loss function, marks overlapping regions, and processes region matching tasks with a multithreading method, improving the efficiency and real-time performance of the matching algorithm and solving the technical problems that video content is difficult to identify and match accurately in large-scale video data, wasting bandwidth and storage resources.
In the sixth embodiment, referring to fig. 1, the embodiment is based on the above embodiment, and in the practical application module, specifically, according to the emotion feedback and preference of the user, the result of emotion recognition and video content matching is combined, so that the video content conforming to the emotion state of the user is accurately pushed to the user, thereby improving the user experience and meeting the user requirement.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made hereto without departing from the spirit and principles of the present invention.
The invention and its embodiments have been described above without limitation, and the actual construction is not limited to the embodiments shown in the drawings. In summary, if a person of ordinary skill in the art, enlightened by this disclosure, devises a structure or embodiment similar to this technical solution without creative effort and without departing from the gist of the present invention, it shall fall within the scope of protection of the present invention.

Claims (6)

1. The emotion recognition and video content matching system based on artificial intelligence is characterized in that: the system comprises an emotion recognition module, an emotion marking and classifying module, a video content analysis module, a video content matching building module and an actual application module;
The emotion recognition module specifically performs feature extraction on the audio signal, constructs an emotion recognition classification model, updates model parameters by adopting a gradient descent algorithm, and performs model iteration;
the emotion marking and classifying module is specifically used for constructing an emotion marking model, calculating the relevance of emotion characteristics and opinion characteristics through tensors, calculating attention scores, and carrying out characteristic fusion on the emotion characteristics and opinion characteristics to obtain final emotion information;
the video content analysis module is used for carrying out video analysis and classifying video contents;
the video content matching module is specifically used for calculating a video content data characteristic attribute set, deleting redundant data, calculating a loss function, marking an overlapping region and processing a region matching task by using a multithreading method;
the actual application module is used for accurately pushing the video content conforming to the emotion state of the user according to the emotion feedback and preference of the user and combining the emotion recognition and video content matching results.
2. The artificial intelligence based emotion recognition and video content matching system of claim 1, wherein: the emotion marking and classifying module is provided with an emotion marking model building unit, an attention score calculating unit and a feature fusion unit, and specifically comprises the following contents:
An emotion marking model unit is constructed, input data of a definition model are emotion characteristics P and opinion characteristics O of a user respectively, and a combination vector between the emotion characteristics and opinion characteristics is calculated through tensor operation, wherein the following formula is used:
In the formula, the combination vector characterizes the degree of association between the ith feature vector in the emotion features P and the jth feature vector in the opinion features O; p_i denotes the ith feature vector of P and o_j the jth feature vector of O; G_p is a three-dimensional tensor whose size is K multiplied by 2d, where K is a hyper-parameter of G_p expressing the complex intrinsic correlation between the emotion features and the opinion features (the larger K, the more information is extracted and the higher the complexity) and 2d is the dimension of each feature vector; i is the index over emotion feature vectors, j is the index over opinion feature vectors, and tanh denotes the hyperbolic tangent function;
And the attention score calculating unit is used for calculating the attention score after obtaining the combined vector between the emotion characteristics and the opinion characteristics, wherein the following formula is used:
In the formula, the attention score indicates how much information of the jth feature vector in the opinion features O is captured by the ith feature vector in the emotion features P (the higher the score, the more information is captured), and the weight vector is used to measure the importance of each value of the combination vector;
the feature fusion unit fuses the emotion features and the opinion features to obtain final emotion information, and the formula is as follows:
In the formula, the final emotion information is obtained from S_p, the emotion feature vectors, and S_o, the opinion feature vectors; E_p is the matrix of attention scores, and softmax_r denotes a row-wise softmax function that applies a softmax operation to each row of the attention score matrix; the resulting weight vectors are used to weight and fuse S_o.
3. The artificial intelligence based emotion recognition and video content matching system of claim 1, wherein: in the construction of the video content matching module, a calculation characteristic attribute set unit, a redundant data deleting unit, a calculation loss function unit, a mark overlapping area unit and a parallel calculation unit are arranged, and the specific contents are as follows:
The computing characteristic attribute set unit is used for collecting video content data from the video content analysis module, storing the video content data in the large-capacity data storage device, processing and transmitting the video content data as HLA distributed simulation data, and computing the characteristic attribute set of the video content data, wherein the formula is as follows:
In the formula, Q_i is the number of features of the video content data after HLA distributed data classification, s is the feature content of the video content data, n is the total number of features of the video content data, and c is a parameter of the HLA distributed data;
Deleting redundant data units, and calculating a removal reference, wherein the formula is as follows:
In the formula, L represents the removal criterion used to determine whether data meeting the criterion should be deleted; the vector representation of the filtering request used during removal, together with its scalar representation q, identifies the characteristics of the HLA distributed data; and e represents an existing redundant data deletion request used to delete non-feature attributes;
and a loss function calculation unit for defining the input vector and the label value and calculating the loss function by using the following formula:
In the formula, LL represents the loss function; the input vector represents the HLA distributed simulation evaluation effect image features, and the label value represents the true class label corresponding to the HLA distributed simulation evaluation effect image; the class weight matrix is used to weight the loss calculation, T denotes the matrix transpose, b_j1 represents the error value of a class, m is the number of training samples, i1 is the index of a training sample, n is the number of training sample classes, and j1 is the index of a class; the training samples are a set of known HLA distributed simulation evaluation effect image features and their corresponding true class labels;
the overlapping region marking unit marks overlapping regions; an overlapping part indicates that repeated content exists between different parts during video content matching; the overlapping regions are processed, the label of each region is marked so that every connected region has a unique label, and processing starts from the penultimate overlapping row and proceeds one row at a time until all overlapping rows have been processed;
The parallel calculation unit divides the multiple matching tasks among different cores, each core being responsible for different region matching tasks; parallel calculation of the region matching degree is realized with a multithreading method, each core executing its region matching tasks in an independent thread; during the parallel calculation, repeated calculations are avoided through a reasonable data sharing and communication mechanism.
4. The artificial intelligence based emotion recognition and video content matching system of claim 1, wherein: the emotion recognition module is provided with a feature extraction unit, an emotion recognition classification model construction unit and a model iteration unit, and specifically comprises the following contents:
A feature extraction unit for calculating a sum of squares of differences between the audio feature values and the average values, for describing distribution features of the audio signals;
constructing an emotion recognition classification model unit, improving a traditional feedforward neural network, constructing a forward neural network model based on Chebyshev orthogonal polynomial clusters aiming at the distribution characteristic data of the audio signals, wherein in the model, a single hidden layer is used for reducing the complexity of the whole model, a function in the Chebyshev orthogonal polynomial clusters is used for a stimulation function of each neuron in the hidden layer and is used for fitting the distribution characteristic of the audio signals, and a linear stimulation function is adopted for the stimulation functions of the neurons in other layers;
And the model iteration unit is used for calculating the activation value of each node in the neural network, propagating the gradient through a back propagation algorithm to obtain the gradient value of each parameter, updating the model parameter by adopting a gradient descent algorithm, and iterating the process until the model converges to realize emotion recognition.
5. The artificial intelligence based emotion recognition and video content matching system of claim 1, wherein: the video content analysis module is provided with a video analysis unit and a content classification unit, and specifically comprises the following contents:
The video analysis unit analyzes the video and extracts basic information and characteristics of the video;
And the content classification unit classifies the video content and marks the video according to the video theme and the content characteristics.
6. The artificial intelligence based emotion recognition and video content matching system of claim 1, wherein: in the practical application module, specifically, according to emotion feedback and preference of a user, the result of emotion recognition and video content matching is combined to accurately push video content conforming to the emotion state of the user.
CN202410905484.8A 2024-07-08 2024-07-08 Artificial intelligence based emotion recognition and video content matching system Active CN118450176B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410905484.8A CN118450176B (en) 2024-07-08 2024-07-08 Artificial intelligence based emotion recognition and video content matching system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410905484.8A CN118450176B (en) 2024-07-08 2024-07-08 Artificial intelligence based emotion recognition and video content matching system

Publications (2)

Publication Number Publication Date
CN118450176A true CN118450176A (en) 2024-08-06
CN118450176B CN118450176B (en) 2024-09-13

Family

ID=92320264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410905484.8A Active CN118450176B (en) 2024-07-08 2024-07-08 Artificial intelligence based emotion recognition and video content matching system

Country Status (1)

Country Link
CN (1) CN118450176B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019056721A1 (en) * 2017-09-21 2019-03-28 掌阅科技股份有限公司 Information pushing method, electronic device and computer storage medium
US20190102706A1 (en) * 2011-10-20 2019-04-04 Affectomatics Ltd. Affective response based recommendations
CN109844708A (en) * 2017-06-21 2019-06-04 微软技术许可有限责任公司 Recommend media content by chat robots
WO2021147084A1 (en) * 2020-01-23 2021-07-29 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for emotion recognition in user-generated video(ugv)
WO2021208719A1 (en) * 2020-11-19 2021-10-21 平安科技(深圳)有限公司 Voice-based emotion recognition method, apparatus and device, and storage medium
CN117493973A (en) * 2023-11-20 2024-02-02 安徽信息工程学院 Social media negative emotion recognition method based on generation type artificial intelligence
CN117688936A (en) * 2024-02-04 2024-03-12 江西农业大学 Low-rank multi-mode fusion emotion analysis method for graphic fusion
US20240119716A1 (en) * 2022-09-19 2024-04-11 Hangzhou Dianzi University Method for multimodal emotion classification based on modal space assimilation and contrastive learning
CN117995171A (en) * 2024-03-13 2024-05-07 广东金湾信息科技有限公司 Voice emotion recognition method and system
US20240169711A1 (en) * 2022-11-21 2024-05-23 Samsung Electronics Co., Ltd. Multi-modal understanding of emotions in video content
CN118296150A (en) * 2024-06-05 2024-07-05 北京理工大学 Comment emotion recognition method based on multi-countermeasure network improvement

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190102706A1 (en) * 2011-10-20 2019-04-04 Affectomatics Ltd. Affective response based recommendations
CN109844708A (en) * 2017-06-21 2019-06-04 微软技术许可有限责任公司 Recommend media content by chat robots
WO2019056721A1 (en) * 2017-09-21 2019-03-28 掌阅科技股份有限公司 Information pushing method, electronic device and computer storage medium
WO2021147084A1 (en) * 2020-01-23 2021-07-29 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for emotion recognition in user-generated video(ugv)
WO2021208719A1 (en) * 2020-11-19 2021-10-21 平安科技(深圳)有限公司 Voice-based emotion recognition method, apparatus and device, and storage medium
US20240119716A1 (en) * 2022-09-19 2024-04-11 Hangzhou Dianzi University Method for multimodal emotion classification based on modal space assimilation and contrastive learning
US20240169711A1 (en) * 2022-11-21 2024-05-23 Samsung Electronics Co., Ltd. Multi-modal understanding of emotions in video content
CN117493973A (en) * 2023-11-20 2024-02-02 安徽信息工程学院 Social media negative emotion recognition method based on generation type artificial intelligence
CN117688936A (en) * 2024-02-04 2024-03-12 江西农业大学 Low-rank multi-mode fusion emotion analysis method for graphic fusion
CN117995171A (en) * 2024-03-13 2024-05-07 广东金湾信息科技有限公司 Voice emotion recognition method and system
CN118296150A (en) * 2024-06-05 2024-07-05 北京理工大学 Comment emotion recognition method based on multi-countermeasure network improvement

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
赵冬梅; 李雅; 陶建华; 顾明亮: "Sentiment analysis model based on a collaborative filtering attention mechanism", Journal of Chinese Information Processing, no. 08, 15 August 2018 (2018-08-15) *

Also Published As

Publication number Publication date
CN118450176B (en) 2024-09-13

Similar Documents

Publication Publication Date Title
CN111462282B (en) Scene graph generation method
CN111104595B (en) Deep reinforcement learning interactive recommendation method and system based on text information
CN110674407B (en) Hybrid recommendation method based on graph convolution neural network
CN110175628A (en) A kind of compression algorithm based on automatic search with the neural networks pruning of knowledge distillation
CN111192270A (en) Point cloud semantic segmentation method based on point global context reasoning
CN107122809A (en) Neural network characteristics learning method based on image own coding
Sajid et al. Zoomcount: A zooming mechanism for crowd counting in static images
CN104217214A (en) Configurable convolutional neural network based red green blue-distance (RGB-D) figure behavior identification method
CN112308115B (en) Multi-label image deep learning classification method and equipment
CN109711422A (en) Image real time transfer, the method for building up of model, device, computer equipment and storage medium
CN112380453B (en) Article recommendation method and device, storage medium and equipment
CN114067385B (en) Cross-modal face retrieval hash method based on metric learning
CN112364747B (en) Target detection method under limited sample
De Menezes et al. Object recognition using convolutional neural networks
CN111524140B (en) Medical image semantic segmentation method based on CNN and random forest method
CN112381179A (en) Heterogeneous graph classification method based on double-layer attention mechanism
CN111949885A (en) Personalized recommendation method for scenic spots
CN113159067A (en) Fine-grained image identification method and device based on multi-grained local feature soft association aggregation
CN115098732B (en) Data processing method and related device
Qiang et al. Detection of citrus pests in double backbone network based on single shot multibox detector
CN115457332A (en) Image multi-label classification method based on graph convolution neural network and class activation mapping
CN115393631A (en) Hyperspectral image classification method based on Bayesian layer graph convolution neural network
CN113283400B (en) Skeleton action identification method based on selective hypergraph convolutional network
Wang et al. A high-accuracy genotype classification approach using time series imagery
Xu et al. Dilated convolution capsule network for apple leaf disease identification

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant