CN113392781A - Video emotion semantic analysis method based on graph neural network - Google Patents

Video emotion semantic analysis method based on graph neural network

Info

Publication number
CN113392781A
Authority
CN
China
Prior art keywords
emotion, character, video, graph, relationship
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110676126.0A
Other languages
Chinese (zh)
Inventor
孙善宝 (Sun Shanbao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Inspur Scientific Research Institute Co Ltd
Original Assignee
Shandong Inspur Scientific Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Inspur Scientific Research Institute Co Ltd filed Critical Shandong Inspur Scientific Research Institute Co Ltd
Priority to CN202110676126.0A priority Critical patent/CN113392781A/en
Priority to PCT/CN2021/112475 priority patent/WO2022262098A1/en
Publication of CN113392781A publication Critical patent/CN113392781A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Abstract

The invention provides a video emotion semantic analysis method based on a graph neural network. The method takes full account of the fact that the emotion of an individual character in a video is influenced by the related characters and objects that the character attends to. Deep learning is used to construct graph-structured association relations between the characters and objects in the video, emotion features are extracted with a classical 3D convolutional neural network, and a graph convolutional neural network combines the current character relationship graph structure with the character-object association graph structure, so that the real emotional state of a target character can be judged more accurately. By first rapidly modeling the relationships between characters and objects and then performing deep emotion analysis on the video, the method achieves better results in personalized scenarios such as investigation and interrogation, interviews, and face-to-face signing.

Description

Video emotion semantic analysis method based on graph neural network
Technical Field
The invention relates to a video emotion semantic analysis method based on a graph neural network, and belongs to the technical field of graph neural networks, emotion analysis and machine vision.
Background
With the rapid development of deep learning and the support of massive data and efficient computing power in the era of the internet and cloud computing, deep learning techniques represented by the CNN convolutional neural network have been used to train and construct large-scale neural networks resembling the structure of the human brain. Breakthrough progress has been made in computer vision, speech recognition, natural language understanding and other fields, which is expected to bring disruptive change to society as a whole, and deep learning has become an important development strategy for many countries.
Conventional convolutional neural networks have brought improvements in the text and image domains, but they can only process Euclidean-space data. The graph neural network (GNN) is a class of methods for performing deep learning on graph data and encompasses the various models that apply neural networks to graphs. A graph consists of a number of nodes and of edges connecting pairs of nodes, and is used to describe the relationships between different nodes. Graph data is a kind of non-Euclidean data and has gradually attracted attention because of its ubiquity. The graph convolutional network (GCN) is a type of neural network that applies convolution on graphs; as an important branch of graph neural networks, it has shown advantages in the field of computer vision.
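For orientation, a single graph convolution layer typically propagates node features with the rule H' = σ(D̂^(-1/2) Â D̂^(-1/2) H W), where Â is the adjacency matrix with added self-loops and D̂ is its degree matrix. The following is only a minimal illustrative sketch of such a layer in PyTorch; it is not the specific network of the invention, and the names and dimensions in it are assumptions for illustration.

```python
import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """Minimal graph convolution layer: H' = relu(D^-1/2 (A + I) D^-1/2 H W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, adj, feats):
        # adj: (N, N) adjacency matrix; feats: (N, in_dim) node features
        a_hat = adj + torch.eye(adj.size(0))             # add self-loops
        d_inv_sqrt = a_hat.sum(dim=1).pow(-0.5)          # D^-1/2 as a vector
        norm_adj = d_inv_sqrt.unsqueeze(1) * a_hat * d_inv_sqrt.unsqueeze(0)
        return torch.relu(self.linear(norm_adj @ feats))

# Example: 4 nodes (e.g. characters) with 16-dimensional features
adj = torch.tensor([[0., 1, 0, 0],
                    [1, 0, 1, 1],
                    [0, 1, 0, 0],
                    [0, 1, 0, 0]])
feats = torch.randn(4, 16)
print(GCNLayer(16, 8)(adj, feats).shape)   # torch.Size([4, 8])
```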
Video emotion analysis is an important research direction in video understanding. Discovering the emotional state of the characters in a scene through video analysis has important application value in human-computer interaction, interviews, medical diagnosis, robotics, investigation and interrogation, and other fields. Ekman and Friesen constructed a discrete classification model that defines six basic emotions: anger, disgust, fear, happiness, sadness and surprise; contempt was later added to the basic emotions. As service scenarios keep changing, practical applications need to analyze the deeper emotions of individuals in a video, look beneath the surface emotions of the characters, and discover their real emotional states in order to meet the personalized requirements of new scenarios. Under these circumstances, modeling the relationships between the characters and objects in a video, making effective use of a graph neural network, analyzing the emotional features of video characters in combination with a CNN convolutional neural network, and judging an individual's emotional state more accurately has become a problem to be solved urgently.
Disclosure of Invention
The invention aims to provide a video emotion semantic analysis method based on a graph neural network that can judge the real emotional state of a target character more accurately, with high processing efficiency and good timeliness.
In order to achieve the purpose, the invention is realized by the following technical scheme:
a) constructing graph structure association relations among the characters and the objects in the video,
b) people and objects in the video are identified through target detection,
c) in the video emotion analysis, emotion data in a video is extracted through a 3D-CNN three-dimensional convolution neural network based on the identified person,
d) judging the real emotional state of the target character through graph convolution operations, combined with the current character relationship graph structure and the character-object association relationship graph structure.
Preferably, the specific steps of step a are as follows:
step 101, designing the emotion types of characters, the relationship between the characters and objects according to the requirements of scenes in the target research field;
102, collecting a large amount of video data of scenes in the field to perform data annotation, training a target detection module ObjDet aiming at the types of interested persons and objects in an annotation data set based on a universal target detection model, and obtaining a target detection model;
step 103, carrying out character emotion category annotation on the video data, designing the network structures of the character emotion feature extractor PFExtract and the emotion Classifier, combining the two networks, and training them with the annotated data to obtain the PFExtract model and the Classifier model;
the character emotional feature extractor PFextract adopts 3D-CNN as a core network and is used for extracting the emotional features of characters in a video to form a feature vector;
the core of the emotion Classifier is a linear Classifier, and the emotion state classification is judged by using the characteristics formed by the character emotion characteristic extractor PFextract.
Preferably, the specific steps of step b are as follows:
step 201, according to the set character relationship, based on the characters obtained by video target detection and recognition, forming a character relationship diagram structure data set, training the character relationship generator PRGen, and obtaining a character relationship diagram structure generation model;
the character relationship graph structure is a graph structure (V, E) of the related characters appearing in the video, wherein V represents the characters and E represents the relationships between characters, and the character features are expressed as d-dimensional vectors;
the character relation generator PRGen is responsible for forming the identified characters into a character basic relation graph structure for describing the relation between main target characters;
step 202, according to the set relationship between the person and the object, based on the person and the object obtained by video target detection and recognition, forming a data set of a structure of a relationship diagram between the person and the object, and training the relation generator PAORGen between the person and the object to obtain a structure generation model of the relationship diagram between the person and the object;
the person-object relationship generator PAORGen is responsible for forming the identified persons and objects into a person-object basic relationship graph structure for describing the relationship between the main target person and the object of interest.
Preferably, the specific steps of step c are as follows:
301, training by combining the graph convolution emotion generator GCNGen and the emotion discriminator MDTR based on a character emotion feature vector, a character relation diagram structure and a character and object relation diagram structure extracted by a character emotion feature extractor PFExtract to obtain a graph convolution emotion generator model and an emotion discriminator model;
step 302, based on the emotional feature vectors generated by the graph convolution emotion generator GCNGen and the existing character relationship graph structure, the character relationship adjuster PRTuning adjusts the character relationship graph structure so that it conforms to the emotional features generated by the graph convolution emotion generator;
the character relationship adjuster PRTuning adjusts and updates the existing character relationship graph structure according to the character relationships identified in the current video segment, combined with the graph convolution emotion feature vector output for the preceding segment;
step 303, based on the emotional feature vectors generated by the graph convolution emotion generator GCNGen and the existing character-object relationship graph structure, the character-object relationship adjuster PAORTuning adjusts the character-object relationship graph structure so that it conforms to the emotional features generated by the graph convolution emotion generator;
the character-object relationship adjuster PAORTuning adjusts and updates the existing character-object relationship graph structure according to the relationships between characters and objects identified in the current video segment, combined with the graph convolution emotion feature vector output for the preceding segment;
and step 304, combining the models trained in the preceding steps for video emotion semantic analysis and judgment.
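The following sketch illustrates, under assumed dimensions and module names, how a graph convolution emotion generator can fuse the character emotion feature vectors with the two relationship graph structures and how a discriminator can map the result to an emotional state; it is a simplified stand-in for GCNGen and MDTR, not their actual trained architecture.

```python
import torch
import torch.nn as nn

def gcn_step(adj, feats, linear):
    """One symmetric-normalized graph convolution step."""
    a_hat = adj + torch.eye(adj.size(0))
    d = a_hat.sum(1).pow(-0.5)
    return torch.relu(linear((d.unsqueeze(1) * a_hat * d.unsqueeze(0)) @ feats))

class GCNGen(nn.Module):
    """Fuses character emotion features with the character graph and character-object graph."""
    def __init__(self, d=128):
        super().__init__()
        self.person_gc = nn.Linear(d, d)     # graph conv over the character relationship graph
        self.pao_gc = nn.Linear(d, d)        # graph conv over the character-object graph
        self.fuse = nn.Linear(3 * d, d)      # fuse both graph outputs with the raw features

    def forward(self, fev, person_adj, pao_feats, pao_adj, person_idx):
        h_person = gcn_step(person_adj, fev, self.person_gc)
        h_pao = gcn_step(pao_adj, pao_feats, self.pao_gc)[person_idx]   # keep character nodes
        return self.fuse(torch.cat([fev, h_person, h_pao], dim=1))      # emV per character

class MDTR(nn.Module):
    """Emotion discriminator: graph-convolved emotion vector -> emotional state."""
    def __init__(self, d=128, num_emotions=8):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d, 64), nn.ReLU(), nn.Linear(64, num_emotions))

    def forward(self, emv):
        return self.net(emv)

# Two characters and one object; character nodes occupy indices 0 and 1 of the joint graph
fev = torch.randn(2, 128)
person_adj = torch.tensor([[0., 1], [1, 0]])
pao_feats = torch.cat([fev, torch.randn(1, 128)])
pao_adj = torch.tensor([[0., 1, 1], [1, 0, 0], [1, 0, 0]])
emv = GCNGen()(fev, person_adj, pao_feats, pao_adj, person_idx=[0, 1])
print(MDTR()(emv).shape)  # torch.Size([2, 8])
```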
Preferably, the specific steps of step d are as follows:
step 401, segmenting the video, extracting the people and the objects in the video by using the target detection ObjDet module, forming a people set and an object set based on the recognition result, and forming a people basic relationship graph structure and a people and object basic relationship graph structure through the people relationship generator PRGen and the people and object relationship generator PAORGen;
step 402, pruning the two relationship graph structures formed in step 401, fine-tuning them according to prior knowledge, and selecting the characters and objects of interest as the initial relationship graph structures for video emotion semantic analysis;
step 403, using the target detection ObjDet module to perform target detection on the video again, acquiring the characters and objects of the video at a set time interval, and obtaining the emotional feature vectors feV of the characters appearing in the video segment through the character emotional feature extractor PFExtract;
step 404, inputting a character emotion feature vector feV set, a character relation graph structure and a character and object relation graph structure extracted by a character emotion feature extractor PFextract into the graph convolution emotion generator GCNGen to obtain an emotion feature vector emV subjected to graph convolution;
step 405, inputting the emotion feature vector subjected to graph convolution into the emotion discriminator MDTR, and outputting the emotion state of the character in the current video segment;
step 406, acquiring the next video segment and identifying the characters and objects in it; based on the graph convolution emotion feature vector emV output by the graph convolution emotion generator GCNGen for the previous video segment, inputting the characters and objects of the segment, the emotion feature vector emV and the character relationship graph structure into the character relationship adjuster PRTuning to update the character relationship graph structure, and inputting the characters and objects of the segment, the emotion feature vector emV and the character-object relationship graph structure into the character-object relationship adjuster PAORTuning to update the character-object relationship graph structure;
step 407, obtaining the emotional feature vectors feV of the characters appearing in the video segment through the character emotional feature extractor PFExtract, and returning to step 404;
step 408, repeating steps 401 to 407, and continuously outputting the emotional states of the video characters;
and 409, continuously collecting data in the process of judging the video emotion, and simultaneously feeding back the correctness of an output result for continuous optimization of the model.
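For clarity, the segment-by-segment judgment loop of steps 401 to 409 can be summarized as in the sketch below. All module functions in it are hypothetical stand-ins for the trained ObjDet, PFExtract, GCNGen, MDTR, PRTuning and PAORTuning models; only the control flow is intended to be illustrative.

```python
import torch

# Hypothetical stand-ins for the trained modules named in the method; the real models are
# obtained by the training steps described above.
def obj_det(segment):                       return ["p0", "p1"], ["document"]
def pf_extract(segment, people):            return torch.randn(len(people), 128)
def gcn_gen(fev, graphs):                   return fev
def mdtr(emv):                              return emv.argmax(dim=1)
def pr_tuning(graphs, emv, people):         return graphs
def paor_tuning(graphs, emv, people, objs): return graphs

def analyse(video_segments):
    """Segment-by-segment emotion judgment loop (simplified sketch of steps 401-409)."""
    graphs = {"person": None, "person_object": None}    # built by PRGen / PAORGen in practice
    emv = None
    for segment in video_segments:
        people, objects = obj_det(segment)              # detect characters and objects
        if emv is not None:                             # adjust both graphs with previous emV
            graphs = pr_tuning(graphs, emv, people)
            graphs = paor_tuning(graphs, emv, people, objects)
        fev = pf_extract(segment, people)               # per-character emotion features feV
        emv = gcn_gen(fev, graphs)                      # graph-convolved emotion vectors emV
        yield mdtr(emv)                                 # emotional state of each character

for states in analyse(["segment-1", "segment-2", "segment-3"]):
    print(states)
```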
The invention has the advantages that: the method takes full account of the fact that the emotion of an individual character in a video is influenced by the related characters and objects that the character attends to. Deep learning is used to construct graph-structured association relations between the characters and objects in the video, emotion features are extracted with a classical 3D convolutional neural network, and a graph convolutional neural network combines the current character relationship graph structure with the character-object association graph structure, so that the real emotional state of a target character can be judged more accurately. The temporal nature of video is fully considered: the 3D convolutional neural network makes video emotion analysis more accurate, and segmenting the video into sequences reduces complexity and improves processing efficiency. Compared with the traditional approach of judging the emotional state directly from video or image frames, the graph convolutional neural network introduces external knowledge and internal association factors, expresses the deeper emotional state in the video more comprehensively, and adapts better to real service scenarios. The timeliness of emotional state changes is also fully considered: as the video frames advance, the emotional states of the characters are output continuously, and the character relationship graph structure and the character-object relationship graph structure are continuously updated. Characters and objects are identified with a target detection algorithm, so target characters and objects can be located quickly and useless frames filtered out, which reduces the amount of 3D convolution computation needed for extracting emotional features and speeds up video processing. In addition, by first rapidly modeling the relationships between characters and objects and then performing deep emotion analysis on the video, better results can be achieved in personalized scenarios such as investigation and interrogation, interviews, and face-to-face signing.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention.
FIG. 1 is a schematic structural diagram of a video emotion semantic analysis model according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In one embodiment, as shown in fig. 1, graph structure association relations between the characters and objects in a video are constructed; the characters and objects in the video are identified through target detection; in video emotion analysis, emotion data in the video is extracted through a 3D-CNN three-dimensional convolutional neural network based on the identified characters; and the real emotional state of the target character is judged through graph convolution operations, combined with the current character relationship graph structure and the character-object association relationship graph structure. Wherein:
the character relationship graph structure is a graph structure (V, E) of the related characters appearing in the video, wherein V represents the characters and E represents the relationships between characters, and the character features are expressed as d-dimensional vectors;
the character-object relationship graph structure describes the relationships between characters and objects, and is likewise described with a graph structure;
the core of the target detection module ObjDet is a neural network; for video detection, target detection algorithms such as SSD or YOLO can be used to identify the characters and objects of interest in the video;
the character relationship generator PRGen is responsible for forming the identified characters into a basic character relationship graph structure describing the relationships between the main target characters;
the character-object relationship generator PAORGen is responsible for forming the identified characters and objects into a basic character-object relationship graph structure describing the relationships between the main target characters and the objects of interest;
the character emotional feature extractor PFExtract adopts a 3D-CNN as its core network and is used to extract the emotional features of the characters in the video to form a feature vector;
the core of the emotion Classifier is a linear classifier, which judges the emotional state category from the features produced by the character emotional feature extractor PFExtract;
the character relationship adjuster PRTuning adjusts and updates the existing character relationship graph structure according to the character relationships identified in the current video segment, combined with the graph convolution emotion feature vector output for the preceding segment;
the character-object relationship adjuster PAORTuning adjusts and updates the existing character-object relationship graph structure according to the relationships between characters and objects identified in the current video segment, combined with the graph convolution emotion feature vector output for the preceding segment;
the graph convolution emotion generator GCNGen comprises a graph convolution operation module for the character relationship graph structure, a graph convolution operation module for the character-object relationship graph structure, and a fusion module that combines the graph convolution results with the character emotion feature vectors generated by the character emotional feature extractor PFExtract, and it generates the graph convolution emotion feature vectors of all target characters;
the core of the emotion discriminator MDTR is a neural network, which judges the real emotional state of a character from the character emotion feature vector generated by the graph convolution emotion generator GCNGen.
The method provided by the invention will be described in detail with reference to specific examples.
Firstly, analyzing and judging video emotion semantics
The video emotion semantic analysis and judgment method comprises the following steps:
step 101, designing, according to the requirements of the scenes in the target research field, the emotion categories of the characters, such as calmness, joy, surprise, sadness, anger, disgust, fear and contempt, as well as the relationships between characters and between characters and objects;
102, collecting a large amount of video data of scenes in the field to perform data annotation, training a target detection module ObjDet aiming at the types of interested persons and objects in an annotation data set based on a universal target detection model, and obtaining a target detection model;
step 103, carrying out character emotion category annotation on the video data, designing the network structures of the character emotion feature extractor PFExtract and the emotion Classifier, combining the two networks, and training them with the annotated data to obtain the PFExtract model and the Classifier model;
104, according to the set character relationship, detecting and identifying the obtained characters based on the video target to form a character relationship diagram structure data set, and training the character relationship generator PRGen to obtain a character relationship diagram structure generation model;
105, according to the set relationship between the person and the object, detecting and identifying the obtained person and the object based on the video target, forming a data set of a structure of a relationship graph between the person and the object, and training the relation generator PAORGen between the person and the object to obtain a structure generation model of the relationship graph between the person and the object;
106, training by combining the graph convolution emotion generator GCNGen and the emotion discriminator MDTR based on a character emotion feature vector, a character relation graph structure and a character and object relation graph structure extracted by a character emotion feature extractor PFextract to obtain a graph convolution emotion generator model and an emotion discriminator model;
step 107, based on the emotional feature vectors generated by the graph convolution emotion generator GCNGen and the existing character relationship graph structure, the character relationship adjuster PRTuning adjusts the character relationship graph structure so that it conforms to the emotional features generated by the graph convolution emotion generator;
step 108, based on the emotional feature vectors generated by the graph convolution emotion generator GCNGen and the existing character-object relationship graph structure, the character-object relationship adjuster PAORTuning adjusts the character-object relationship graph structure so that it conforms to the emotional features generated by the graph convolution emotion generator;
step 109, combining the models formed by training in the steps 101 to 108 for video emotion semantic analysis and judgment;
step 110, segmenting the video, extracting the people and the objects in the video by using the target detection ObjDet module, forming a people set and an object set based on the identification result, and forming a people basic relationship graph structure and a people and object basic relationship graph structure through the people relationship generator PRGen and the people and object relationship generator PAORGen;
step 111, pruning the two relationship graph structures formed in step 110, fine-tuning them according to prior knowledge, and selecting the characters and objects of interest as the initial relationship graph structures for video emotion semantic analysis;
step 112, using the target detection ObjDet module to perform target detection on the video again, acquiring characters and objects of the video according to a set time interval, and acquiring emotional feature vectors feV of characters appearing in the video segment through the character emotional feature extractor PFextract;
113, inputting a character emotion feature vector feV set, a character relation graph structure and a character and object relation graph structure extracted by a character emotion feature extractor PFextract into the graph convolution emotion generator GCNGen to obtain an emotion feature vector emV subjected to graph convolution;
step 114, inputting the emotion feature vector subjected to graph convolution into the emotion discriminator MDTR, and outputting the emotion state of the person in the current video segment;
step 115, acquiring the next video segment and identifying the characters and objects of the current segment; based on the graph convolution emotion feature vector emV output by the graph convolution emotion generator GCNGen for the previous segment, inputting the characters and objects of the current segment, the emotion feature vector emV and the character relationship graph structure into the character relationship adjuster PRTuning to update the character relationship graph structure, and inputting the characters and objects of the current segment, the emotion feature vector emV and the character-object relationship graph structure into the character-object relationship adjuster PAORTuning to update the character-object relationship graph structure;
step 116, obtaining an emotional feature vector feV of a character appearing in the video clip through the character emotional feature extractor PFextract, and turning to step 113;
step 117, repeating the steps 110 to 116, and continuously outputting the emotion state of the video character;
and step 118, continuously collecting data in the process of judging the video emotion, and simultaneously feeding back the correctness of the output result for continuous optimization of the model.
The above-described embodiment is only one specific embodiment of the present invention, and general changes and substitutions by those skilled in the art within the technical scope of the present invention are included in the protection scope of the present invention.

Claims (5)

1. A video emotion semantic analysis method based on a graph neural network is characterized by comprising the following steps:
a) constructing graph structure association relations among the characters and the objects in the video,
b) people and objects in the video are identified through target detection,
c) in the video emotion analysis, emotion data in a video is extracted through a 3D-CNN three-dimensional convolution neural network based on the identified person,
d) judging the real emotional state of the target character through graph convolution operations, combined with the current character relationship graph structure and the character-object association relationship graph structure.
2. The method for analyzing video emotion semantics based on graph neural network according to claim 1, wherein the specific steps of the step a are as follows:
step 101, designing the emotion types of characters, the relationship between the characters and objects according to the requirements of scenes in the target research field;
102, collecting a large amount of video data of scenes in the field to perform data annotation, training a target detection module ObjDet aiming at the types of interested persons and objects in an annotation data set based on a universal target detection model, and obtaining a target detection model;
step 103, carrying out character emotion category annotation on the video data, designing the network structures of the character emotion feature extractor PFExtract and the emotion Classifier, combining the two networks, and training them with the annotated data to obtain the PFExtract model and the Classifier model;
the character emotional feature extractor PFextract adopts 3D-CNN as a core network and is used for extracting the emotional features of characters in a video to form a feature vector;
the core of the emotion Classifier is a linear Classifier, and the emotion state classification is judged by using the characteristics formed by the character emotion characteristic extractor PFextract.
3. The method for analyzing video emotion semantics based on graph neural network according to claim 2, wherein the specific steps of the step b are as follows:
step 201, according to the set character relationship, based on the characters obtained by video target detection and recognition, forming a character relationship diagram structure data set, training the character relationship generator PRGen, and obtaining a character relationship diagram structure generation model;
the character relationship graph structure is a graph structure (V, E) of the related characters appearing in the video, wherein V represents the characters and E represents the relationships between characters, and the character features are expressed as d-dimensional vectors;
the character relation generator PRGen is responsible for forming the identified characters into a character basic relation graph structure for describing the relation between main target characters;
step 202, according to the set relationship between the person and the object, based on the person and the object obtained by video target detection and recognition, forming a data set of a structure of a relationship diagram between the person and the object, and training the relation generator PAORGen between the person and the object to obtain a structure generation model of the relationship diagram between the person and the object;
the person-object relationship generator PAORGen is responsible for forming the identified persons and objects into a person-object basic relationship graph structure for describing the relationship between the main target person and the object of interest.
4. The method for video emotion semantic analysis based on graph neural network according to claim 3, characterized in that, the specific steps of the step c are as follows:
301, training by combining the graph convolution emotion generator GCNGen and the emotion discriminator MDTR based on a character emotion feature vector, a character relation diagram structure and a character and object relation diagram structure extracted by a character emotion feature extractor PFExtract to obtain a graph convolution emotion generator model and an emotion discriminator model;
step 302, based on the emotional feature vectors generated by the graph convolution emotion generator GCNGen and the existing character relationship graph structure, the character relationship adjuster PRTuning adjusts the character relationship graph structure so that it conforms to the emotional features generated by the graph convolution emotion generator;
the character relationship adjuster PRTuning adjusts and updates the existing character relationship graph structure according to the character relationships identified in the current video segment, combined with the graph convolution emotion feature vector output for the preceding segment;
step 303, based on the emotional feature vectors generated by the graph convolution emotion generator GCNGen and the existing character-object relationship graph structure, the character-object relationship adjuster PAORTuning adjusts the character-object relationship graph structure so that it conforms to the emotional features generated by the graph convolution emotion generator;
the character-object relationship adjuster PAORTuning adjusts and updates the existing character-object relationship graph structure according to the relationships between characters and objects identified in the current video segment, combined with the graph convolution emotion feature vector output for the preceding segment;
and step 304, combining the models trained in the preceding steps for video emotion semantic analysis and judgment.
5. The method for video emotion semantic analysis based on graph neural network according to claim 4, characterized in that, the specific steps of the step d are as follows:
step 401, segmenting the video, extracting the people and the objects in the video by using the target detection ObjDet module, forming a people set and an object set based on the recognition result, and forming a people basic relationship graph structure and a people and object basic relationship graph structure through the people relationship generator PRGen and the people and object relationship generator PAORGen;
step 402, pruning the two relationship graph structures formed in step 401, fine-tuning them according to prior knowledge, and selecting the characters and objects of interest as the initial relationship graph structures for video emotion semantic analysis;
step 403, using the target detection ObjDet module to perform target detection on the video again, acquiring the characters and objects of the video at a set time interval, and obtaining the emotional feature vectors feV of the characters appearing in the video segment through the character emotional feature extractor PFExtract;
step 404, inputting a character emotion feature vector feV set, a character relation graph structure and a character and object relation graph structure extracted by a character emotion feature extractor PFextract into the graph convolution emotion generator GCNGen to obtain an emotion feature vector emV subjected to graph convolution;
step 405, inputting the emotion feature vector subjected to graph convolution into the emotion discriminator MDTR, and outputting the emotion state of the character in the current video segment;
step 406, acquiring the next video segment and identifying the characters and objects in it; based on the graph convolution emotion feature vector emV output by the graph convolution emotion generator GCNGen for the previous video segment, inputting the characters and objects of the segment, the emotion feature vector emV and the character relationship graph structure into the character relationship adjuster PRTuning to update the character relationship graph structure, and inputting the characters and objects of the segment, the emotion feature vector emV and the character-object relationship graph structure into the character-object relationship adjuster PAORTuning to update the character-object relationship graph structure;
step 407, obtaining the emotional feature vectors feV of the characters appearing in the video segment through the character emotional feature extractor PFExtract, and returning to step 404;
step 408, repeating steps 401 to 407, and continuously outputting the emotional states of the video characters;
and 409, continuously collecting data in the process of judging the video emotion, and simultaneously feeding back the correctness of an output result for continuous optimization of the model.
CN202110676126.0A 2021-06-18 2021-06-18 Video emotion semantic analysis method based on graph neural network Pending CN113392781A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202110676126.0A CN113392781A (en) 2021-06-18 2021-06-18 Video emotion semantic analysis method based on graph neural network
PCT/CN2021/112475 WO2022262098A1 (en) 2021-06-18 2021-08-13 Video emotion semantic analysis method based on graph neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110676126.0A CN113392781A (en) 2021-06-18 2021-06-18 Video emotion semantic analysis method based on graph neural network

Publications (1)

Publication Number Publication Date
CN113392781A true CN113392781A (en) 2021-09-14

Family

ID=77621793

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110676126.0A Pending CN113392781A (en) 2021-06-18 2021-06-18 Video emotion semantic analysis method based on graph neural network

Country Status (2)

Country Link
CN (1) CN113392781A (en)
WO (1) WO2022262098A1 (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110378381A (en) * 2019-06-17 2019-10-25 华为技术有限公司 Object detecting method, device and computer storage medium
US20210000404A1 (en) * 2019-07-05 2021-01-07 The Penn State Research Foundation Systems and methods for automated recognition of bodily expression of emotion
US20210125067A1 (en) * 2019-10-29 2021-04-29 Kabushiki Kaisha Toshiba Information processing device, information processing method, and program
CN111310672A (en) * 2020-02-19 2020-06-19 广州数锐智能科技有限公司 Video emotion recognition method, device and medium based on time sequence multi-model fusion modeling
CN112182209A (en) * 2020-09-24 2021-01-05 东北大学 GCN-based cross-domain emotion analysis method under lifelong learning framework
CN112348075A (en) * 2020-11-02 2021-02-09 大连理工大学 Multi-mode emotion recognition method based on contextual attention neural network
CN112712127A (en) * 2021-01-07 2021-04-27 北京工业大学 Image emotion polarity classification method combined with graph convolution neural network
CN112733764A (en) * 2021-01-15 2021-04-30 天津大学 Method for recognizing video emotion information based on multiple modes
CN112818861A (en) * 2021-02-02 2021-05-18 南京邮电大学 Emotion classification method and system based on multi-mode context semantic features

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114154558A (en) * 2021-11-12 2022-03-08 山东浪潮科学研究院有限公司 Distributed energy power generation load prediction system and method based on graph neural network
CN114154558B (en) * 2021-11-12 2024-05-21 山东浪潮科学研究院有限公司 Distributed energy power generation load prediction system and method based on graph neural network
WO2023227141A1 (en) * 2022-05-25 2023-11-30 清华大学 Confrontation scene semantic analysis method and apparatus based on target-attribute-relationship
WO2023226755A1 (en) * 2022-05-26 2023-11-30 东南大学 Emotion recognition method based on human-object spatio-temporal interaction behavior

Also Published As

Publication number Publication date
WO2022262098A1 (en) 2022-12-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20210914