CN113297934B - Multi-mode video behavior analysis method for detecting Internet violence harmful scene - Google Patents

Multi-mode video behavior analysis method for detecting Internet violence harmful scene

Info

Publication number
CN113297934B
CN113297934B
Authority
CN
China
Prior art keywords
emotion
features
video
words
basic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110512224.0A
Other languages
Chinese (zh)
Other versions
CN113297934A (en)
Inventor
郭承禹
鲍泽民
潘进
王磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Computer Network and Information Security Management Center
Original Assignee
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by National Computer Network and Information Security Management Center filed Critical National Computer Network and Information Security Management Center
Priority to CN202110512224.0A priority Critical patent/CN113297934B/en
Publication of CN113297934A publication Critical patent/CN113297934A/en
Application granted granted Critical
Publication of CN113297934B publication Critical patent/CN113297934B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 - Facial expression recognition
    • G06V40/176 - Dynamic expression
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/044 - Recurrent networks, e.g. Hopfield networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/40 - Scenes; Scene-specific elements in video content
    • G06V20/41 - Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 - Movements or behaviour, e.g. gesture recognition
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a multi-modal video behavior analysis method for detecting Internet violence harmful scenes, which mainly comprises three stages: rapid detection and localization of persons in video scenes, video scene behavior discrimination, and qualitative assessment of the harmfulness of video scenes.

Description

Multi-mode video behavior analysis method for detecting Internet violence harmful scene
Technical Field
The invention belongs to the technical field of information security, and particularly relates to a multi-mode video behavior analysis method for detecting Internet violence harmful scenes.
Background
With the development of multimedia technology, emerging, fast and diverse forms of media appear in people's daily social activities. While these emerging media bring convenience to daily life, large amounts of negative information can also spread rapidly among people by means of fast-developing network technology and widely adopted mobile intelligent terminals. How to find such negative information in time and stop its propagation at the earliest stage is a common concern of new media and network supervision departments; doing so can prevent the public from being harmed by negative information and effectively purify the network ecology.
Among massive user-generated videos, the proportion of harmful violent videos is extremely low, and this unbalanced distribution of sample categories increases the difficulty of identifying them. Current active-discovery methods for harmful videos mainly crawl information such as audio and video scenes, topics, station logos and captions under certain limiting conditions; they return large volumes of data with mostly redundant content, which increases the workload of further manual review. Moreover, research on harmful videos has mainly focused on scenes such as pornography, while research on judging the harmfulness of violent content started relatively late.
Traditional violent-video detection methods mainly target the audio and image features of the video and use visual bag-of-words models and pooling techniques to construct optimized video content representations, but they remain limited to scene-level modal features of the video. Information at the high-level semantic layer is still difficult to capture, so content harmful to the public cannot be distinguished from film-and-television or educational program content. In addition, as a new characteristic and core function of interaction among users in new media, the comment information of a video can effectively assist in screening and judging its content. The invention therefore introduces the emotional features of the persons in the video and of the video comments, establishes a multi-task learning model with multi-modal feature fusion, and integrates all features so as to maximize the benefit of each subtask and of the overall task.
Disclosure of Invention
In view of the above, the invention provides a multi-mode video behavior analysis method for detecting Internet violence harmful scenes, which can quickly and accurately discover videos with harmful scenes from massive user-generated videos.
The technical scheme for realizing the invention is as follows:
a multimode video behavior analysis method for detecting Internet violence harmful scenes comprises the following steps:
step one, detecting person targets by using apparent features and the rotation-invariant features derived from them as feature descriptors;
step two, dividing the whole human body into n regions, sequentially recombining adjacent regions to generate human-body region detection templates of different scales, and training the templates of different scales with a CNN (convolutional neural network), where the inputs of the training process are person videos with different degrees of occlusion;
step three, performing human-body target detection, the detection process being expressed abstractly as follows:
the original video x is mapped to a feature matrix M through a feature mapping function k; a component detector g computes scoring parameters s, which record the probability, derived from the apparent features, that each component is present in the detection area; the layered CNN model f trained in step two computes the visibility parameters v of each human-body component in the scene and corrects the scoring parameters s; finally, a discriminant function in the CNN network judges whether a human-body target exists in the detection area and yields the detection result y;
step four, taking action features, scene features and emotion features as the input of an LSTM (Long Short-Term Memory) recurrent neural network and target behavior words as the output, training the LSTM model to make a preliminary judgment of the target behavior in the video; videos without harmful scenes are discarded, and step five is executed for videos containing harmful scenes;
step five, labeling the basic scores of the words in the basic emotion word library to form a basic emotion word dictionary, extracting the basic emotion words from the bullet-screen (danmaku) comments of the input video, and assigning them by looking up their basic scores in the basic emotion word dictionary;
step six, dividing the emotion categories of the basic emotion word dictionary into 7 dimensions (such as happiness, anger, fear and sadness) and computing the emotion score of each dimension independently; the emotion value of each bullet-screen comment is calculated with the following formula:
S = Σ_j a_j · Q(b_j × c_j, b) + Σ_i α_i + Σ_m β_m + Σ_l ε_l
where j takes values from 1 to J, J being the total number of emotion words; b_j is the basic emotion score of the j-th emotion word, obtained by direct lookup in the basic emotion word dictionary, with value range [0, 1]; c_j ∈ {1, -1} indicates whether emotion word j is negated, i.e. whether its emotion is reversed; b is the emotion score matrix of all emotion words, emoticons, homophone words and consecutive symbols in the bullet screen; the Q function is a cross-correlation function used to calculate the correlation between the current emotion word and the emotional tendency of the other emotion words b in the bullet screen; a_j is the weighting score of the degree adverbs before and after the j-th emotion word, with value range [0, N], where N can be chosen according to actual requirements and generally does not exceed 10; α_i, β_m and ε_l are the emotion parameters of three kinds of special bullet-screen elements, namely emoticons, homophone words, and consecutive punctuation or digit symbols, where i takes values from 1 to I, m from 1 to M and l from 1 to L, and I, M, L are the numbers of occurrences of the three kinds of special elements;
step seven, after the emotion value of each bullet-screen comment is calculated, performing outlier detection with the Isolation Forest method: all bullet-screen emotion values in the same time period are clustered, bullet screens with abnormal emotion values are removed, and the emotion values of the remaining normal bullet screens are summed to obtain the emotion parameter of the whole video; this parameter is a 7-dimensional emotion category vector, whose highest-scoring dimension is the overall emotional tendency of the video and whose value is the final emotion score; when emotions of fear or disgust appear for more than 1/4 of the video duration, the video is flagged and pushed for further review.
Further, in step one, YUV features and HOG features are selected when constructing the apparent features; when constructing the rotation-invariant features, a polar-coordinate representation is used to transform the image features from the Cartesian coordinate system to a polar coordinate system, preserving the spatial invariance of the features.
Further, n=10.
In step four, the action features are optical flow features and the scene features are DeCAF features; among the emotion features, PCA (Principal Component Analysis) features are used for global facial expression recognition, while facial action coding analysis features are used as the local features.
In the fifth step, the words which are not recorded in the basic emotion word dictionary are manually marked and then added into the basic emotion word dictionary.
The beneficial effects are that:
1. At present, detection of negative information on the Internet cannot be solved by traditional scene content detection or recognition methods, because judging whether a piece of Internet information has a negative influence on society involves complex judgment dimensions: most such information cannot be judged from shallow semantic features alone and is highly correlated with the emotions of both the sender and the audience. The method of the invention uses the scene information of the video on the one hand, and on the other hand builds high-level semantic information, such as the emotion conveyed by the video content and the true emotion expressed by the audience, to judge whether a video is a harmful violent video, with accuracy superior to traditional methods.
2. Traditional scene person-detection methods are poorly suited to complex scenes and are difficult to apply to finding harmful information in the massive videos of the Internet, where large amounts of missing information cause traditional methods to miss detections.
3. Aiming at these problems, the method proposes a bullet-screen emotion analysis method oriented to violent scenes for accurately finding the target scenes, which, compared with traditional methods, is better suited to discovering harmful scenes.
Drawings
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 is a schematic diagram of the human-body region splitting templates.
Detailed Description
The invention will now be described in detail by way of example with reference to the accompanying drawings.
The invention provides a multi-modal video behavior analysis method for finding Internet violence harmful scenes, which mainly comprises three stages: rapid detection and localization of persons in video scenes, video scene behavior discrimination, and qualitative assessment of the harmfulness of video scenes.
For the rapid detection of persons in video scenes, the invention provides a fast two-dimensional human-body detection method based on layered deep learning, addressing the complexity of person behavior in Internet videos.
When selecting feature descriptors, the method performs person target detection using the apparent features (YUV features and HOG features) and their rotation-invariant counterparts as descriptors; when constructing the invariant features, a polar-coordinate representation is used to transform the image features from the Cartesian coordinate system to a polar coordinate system, preserving the spatial invariance of the features.
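As an illustration only (the patent itself contains no code), the following sketch shows one way such descriptors could be computed with OpenCV; the patch sizes, the default HOG parameters and the function names are assumptions, not values taken from the patent.

    # Illustrative sketch: YUV + HOG apparent features and a polar-coordinate
    # ("rotation-invariant") representation. All sizes and parameters are assumed.
    import cv2
    import numpy as np

    def apparent_features(patch_bgr):
        """YUV colour statistics plus a HOG descriptor for one candidate region."""
        patch = cv2.resize(patch_bgr, (64, 128))                    # default HOG window size
        yuv = cv2.cvtColor(patch, cv2.COLOR_BGR2YUV)
        yuv_stats = np.concatenate([yuv.mean(axis=(0, 1)), yuv.std(axis=(0, 1))])
        gray = cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)
        hog_vec = cv2.HOGDescriptor().compute(gray).ravel()         # 3780-dim HOG descriptor
        return np.concatenate([yuv_stats, hog_vec])

    def polar_representation(patch_bgr, size=128):
        """Map the patch from Cartesian to polar coordinates, so that in-plane
        rotations become shifts along the angular axis."""
        patch = cv2.resize(patch_bgr, (size, size))
        center = (size / 2.0, size / 2.0)
        return cv2.warpPolar(patch, (size, size), center, size / 2.0,
                             cv2.WARP_POLAR_LINEAR)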
Based on these features, a convolutional neural network is selected to perform category calibration on the image features of the whole region. To avoid false and missed detections caused by occlusion of human body parts in complex scenes, the method divides the whole human body into 10 regions according to the human body structure and, as shown in Fig. 2, sequentially recombines adjacent regions to generate human-body region detection templates of different scales. By layering the region templates of different scales according to their containment relations, the human-body regions contained in each template are detected and analyzed layer by layer. Finally, context information is propagated through the mutual containment relations between the layer templates to correct misjudgments of the part detectors, which increases the detection rate for partially occluded human bodies: whether each part of the human body is visible in the scene is judged, and the apparent features obtained in the apparent model are corrected according to this visibility.
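Purely to illustrate the idea of recombining adjacent regions into multi-scale templates, a minimal sketch follows; the actual partition used by the patent is the one shown in Fig. 2, whereas a simple top-to-bottom ordering of the n regions is assumed here.

    # Illustrative sketch: recombine n adjacent body regions into detection
    # templates of increasing scale (one layer per scale). The concrete body
    # partition is an assumption; the patent uses the split shown in Fig. 2.
    def region_templates(n=10):
        """Each template is a tuple of adjacent region indices (0 = top ... n-1 = bottom)."""
        templates = []
        for scale in range(1, n + 1):            # number of regions covered by the template
            for start in range(n - scale + 1):
                templates.append(tuple(range(start, start + scale)))
        return templates

    # region_templates(10) yields 55 templates, from single regions up to the
    # whole body; templates of the same scale form one layer of the hierarchy.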
The general human detection process can be expressed abstractly as
M = k(x), s = g(M), y = ŷ(s),
where x is the input image to be detected, k is the feature mapping function, M is the feature map obtained after feature extraction and learning, g is the component detector, and s are the scoring parameters recording the probability, derived from the apparent features, that each component is present in the detection area; the discriminant function ŷ judges from the scoring parameters of the individual body parts whether a whole human-body target exists in the detection area and finally yields the detection result y.
In this process, a scoring parameter s_i ∈ s produced by the component detector represents the probability that a certain region of the apparent feature map M, obtained through the feature mapping function k, is detected as component i. However, directly using the scoring parameters to decide on the target can cause errors due to cluttered background, occlusion and similar factors, so a further parameter v is introduced to measure how likely each region of the human body is occluded in the original image. This parameter is defined as the visibility parameter, and the detection process is modified to
v = f(s), y = ŷ(s, v).
The objective function of the model can then be written with a probability distribution as
p(y, v | s) = p(y | v, s) · p(v | s),
where p(y | v, s) corresponds to the discriminant function ŷ and p(v | s) corresponds to the visibility-coefficient estimation function f. The discriminant function, which originally judged the probability of a detected target directly from the scoring parameters s, is thus corrected to depend on both the visibility parameters v and the scoring parameters s. The main problem in solving for the human-body detection result y therefore lies in computing the visibility parameter v for each layer of human-body region template in the layered model (MLMM) and its expected value; the mapping between s and v is described with a restricted Boltzmann machine and is not detailed further here. Because every template produces its own score, when a person in the scene is occluded or only partially visible, the template matching the visible parts scores higher than the other templates, so target persons in complex scenes can still be detected accurately.
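A minimal numerical sketch of this visibility correction is given below, purely for illustration: the trained RBM-based mapping f between s and v is replaced by an assumed logistic squashing, and the fusion rule and decision threshold are likewise assumptions rather than the patent's trained model.

    # Illustrative sketch of visibility-corrected part scoring. The real model
    # learns f (an RBM-based mapping) and the discriminant function; here both
    # are replaced by simple assumed stand-ins.
    import numpy as np

    def visibility(s):
        """Stand-in for f: estimate how likely each body part is visible
        from its raw part-detector score."""
        return 1.0 / (1.0 + np.exp(-4.0 * (s - 0.5)))       # assumed squashing

    def detect_person(part_scores, threshold=0.5):
        """Correct per-part scores by their estimated visibility and fuse them
        into a single detection decision y for the candidate region."""
        s = np.asarray(part_scores, dtype=float)
        v = visibility(s)
        fused = float(np.sum(v * s) / (np.sum(v) + 1e-8))    # visible parts dominate
        return fused > threshold, fused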
For video scene behavior discrimination, the method introduces the emotional features of the persons in the video and uses an LSTM (Long Short-Term Memory) recurrent neural network for scene recognition over three groups of features: low-level action features, scene features and the emotional features of the persons. The action features use optical flow, taking the trajectories between consecutive frames as motion features; the scene features use DeCAF features, detecting whether the target video contains specific scene objects associated with predefined violent scenes; the emotional features use global PCA (Principal Component Analysis) features of facial expressions together with local facial action coding analysis features. Taking these three groups of features as input and target behavior words as output, an LSTM model is trained to make a preliminary judgment of the target behavior: videos without harmful scenes are discarded, and the degree of harm is then assessed for videos containing harmful scenes.
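The patent does not specify network dimensions; the following PyTorch sketch only illustrates the overall shape of such an LSTM behavior classifier over fused per-frame features, with the feature dimension, hidden size and number of behavior words chosen arbitrarily.

    # Illustrative sketch: LSTM over per-frame fused features (optical flow +
    # DeCAF scene features + PCA / facial-action emotion features).
    # All dimensions below are assumptions, not values from the patent.
    import torch
    import torch.nn as nn

    class BehaviorLSTM(nn.Module):
        def __init__(self, feat_dim=512, hidden_dim=256, num_behavior_words=10):
            super().__init__()
            self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
            self.classifier = nn.Linear(hidden_dim, num_behavior_words)

        def forward(self, frame_features):
            # frame_features: (batch, num_frames, feat_dim), one fused vector per frame
            _, (h_n, _) = self.lstm(frame_features)
            return self.classifier(h_n[-1])          # logits over behavior words

    model = BehaviorLSTM()
    logits = model(torch.randn(2, 30, 512))          # e.g. 2 clips of 30 frames each

Videos whose predicted behavior word does not correspond to a harmful scene would be discarded at this point; the remaining videos proceed to the bullet-screen emotion analysis described next.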
For harmfulness assessment, the method provides a bullet-screen (danmaku) comment emotion evaluation method for violent scenes. For an input bullet-screen comment, punctuation is first removed with a word-filtering step, the Chinese words in the comment are extracted with a Chinese word lexicon, and background words that occur at high frequency but are unrelated to opinions about the video are then removed with a latent Dirichlet allocation model used for background removal. The remaining words are the basic emotion words, whose initial scores are assigned by lookup in the basic emotion word dictionary. Words not recorded in the dictionary are manually labeled and then added to the dictionary.
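As a rough illustration of this pre-processing pipeline (not the patent's implementation), the sketch below uses the jieba segmenter; the background-word set and the tiny dictionary are placeholders standing in for the LDA-based background removal and the full basic emotion word dictionary.

    # Illustrative sketch: strip punctuation, segment the Chinese bullet-screen
    # text, drop assumed background words, and look up base scores of the
    # remaining emotion words. Dictionary contents are placeholders.
    import re
    import jieba

    BACKGROUND_WORDS = {"视频", "主播"}              # assumed high-frequency background words
    BASE_EMOTION_DICT = {"害怕": 0.8, "恶心": 0.7}   # assumed base scores in [0, 1]

    def extract_emotion_words(danmaku_text):
        text = re.sub(r"[^\w]+", "", danmaku_text)   # remove punctuation and symbols
        words = [w for w in jieba.lcut(text) if w not in BACKGROUND_WORDS]
        return [(w, BASE_EMOTION_DICT[w]) for w in words if w in BASE_EMOTION_DICT]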
For the selection of emotion words, the emotion categories are divided into 7 dimensions (such as happiness, anger, fear and sadness) according to the HowNet emotion lexicon, and an emotion score is computed independently for each dimension. When computing the emotion value of each bullet-screen comment, the method judges the true emotion of the words with multi-dimensional features designed around the characteristics of network bullet screens: S = Σ_j a_j · Q(b_j × c_j, b) + Σ_i α_i + Σ_m β_m + Σ_l ε_l, where j takes values from 1 to J and J is the total number of emotion words; b_j is the basic emotion score of the j-th emotion word, obtained by direct lookup in the basic emotion word dictionary, with value range [0, 1]; c_j ∈ {1, -1} indicates whether emotion word j is negated, i.e. whether its emotion is reversed; b is the emotion score matrix of all emotion words, emoticons, homophone words and consecutive symbols in the bullet screen; the Q function is a cross-correlation function measuring the correlation between the current emotion word and the emotional tendency of the other emotion words b in the bullet screen, computed with chi-square and t-tests; the smaller the correlation, the more likely the bullet screen carries a reversed emotion, so the test statistic is used as a weight to reduce the emotion score of the word; a_j is the weighting score of the degree adverbs before and after the j-th emotion word, with value range [0, N], where N can be chosen according to actual requirements and generally does not exceed 10; α_i, β_m and ε_l are the emotion parameters of three kinds of special bullet-screen elements, namely emoticons, homophone words, and consecutive punctuation or digit symbols, where i takes values from 1 to I, m from 1 to M and l from 1 to L, and I, M, L are the numbers of occurrences of the three kinds of special elements.
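A direct reading of this scoring formula can be sketched as follows; the cross-correlation Q, which the patent computes from chi-square and t-test statistics, is represented here simply as a per-word weight q in [0, 1] assumed to have been computed beforehand.

    # Illustrative sketch of the per-bullet-screen emotion score
    #   S = Σ_j a_j * Q(b_j * c_j, b) + Σ_i α_i + Σ_m β_m + Σ_l ε_l
    # Each emotion word is a dict with keys: b (base score), c (+1/-1 negation),
    # a (degree-adverb weight), q (precomputed correlation weight standing in for Q).
    def danmaku_score(emotion_words, emoticon_params, homophone_params, symbol_params):
        s = sum(w["a"] * w["q"] * w["b"] * w["c"] for w in emotion_words)
        s += sum(emoticon_params)     # α_i : emoticon contributions
        s += sum(homophone_params)    # β_m : homophone-word contributions
        s += sum(symbol_params)       # ε_l : runs of punctuation / digit symbols
        return s

    # e.g. danmaku_score([{"b": 0.8, "c": -1, "a": 2.0, "q": 0.5}], [0.1], [], [0.05])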
To reduce the influence of individual opinions on the emotion of a video's overall bullet screen, after the emotion parameter of each bullet screen is computed, the method performs outlier detection with the Isolation Forest method: all bullet-screen emotion parameters in the same time period are clustered, abnormal bullet screens are removed from the emotion clusters to reduce their influence on the overall video emotion parameters, and the emotion values of the remaining normal bullet screens are then summed to obtain the emotion parameter of the whole program. This parameter is a 7-dimensional emotion category vector; the dimension with the highest score is the overall emotional tendency of the video and its value is the final emotion score. When emotions of fear or disgust appear for more than 1/4 of the video duration, the video is flagged and pushed for further review.
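A minimal sketch of this aggregation step with scikit-learn's IsolationForest is shown below; the contamination rate and the per-window grouping are assumptions, since the patent only states that abnormal bullet-screen emotion values within the same time period are removed before summation.

    # Illustrative sketch: remove outlier bullet-screen emotion vectors in one
    # time window with Isolation Forest, then sum the remaining ones.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    def window_emotion_vector(danmaku_vectors, contamination=0.1):
        """danmaku_vectors: (num_danmaku, 7) per-comment emotion scores for one
        time window; returns the window's 7-dimensional emotion parameter vector."""
        x = np.asarray(danmaku_vectors, dtype=float)
        labels = IsolationForest(contamination=contamination,
                                 random_state=0).fit_predict(x)   # -1 marks outliers
        return x[labels == 1].sum(axis=0)

    # Summing the window vectors gives the video-level vector; its arg-max
    # dimension is the overall emotional tendency and that value the final score.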
In summary, the above embodiments are only preferred embodiments of the present invention, and are not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (4)

1. A multi-modal video behavior analysis method for detecting Internet violence harmful scenes, characterized by comprising the following steps:
step one, detecting human-body targets by using apparent features and the rotation-invariant features derived from them as feature descriptors;
step two, dividing the whole human body into n regions, sequentially recombining adjacent regions to generate human-body region detection templates of different scales, and training the templates of different scales with a CNN (convolutional neural network), where the inputs of the training process are person videos with different degrees of occlusion;
step three, performing human-body target detection: the original video x is mapped to a feature matrix M through a feature mapping function k; a component detector g computes scoring parameters s, which record the probability, derived from the apparent features, that each component is present in the detection area; the layered CNN model f trained in step two computes the visibility parameters v of each human-body component in the scene and corrects the scoring parameters s; finally, a discriminant function in the CNN network judges whether a human-body target exists in the detection area and yields the detection result y;
step four, taking action features, scene features and emotion features as the input of an LSTM recurrent neural network and target behavior words as the output, training the LSTM model to make a preliminary judgment of the target behavior in the video; videos without harmful scenes are discarded, and step five is executed for videos containing harmful scenes; the action features are optical flow features, the scene features are DeCAF features, and among the emotion features, PCA features are used for global facial expression recognition while facial action coding analysis features are used for local features;
step five, labeling the basic scores of the words in the basic emotion word library to form a basic emotion word dictionary, extracting the basic emotion words from the bullet-screen comments of the input video, and assigning them by looking up their basic scores in the basic emotion word dictionary;
step six, dividing the emotion categories of the basic emotion word dictionary into 7 dimensions (such as happiness, anger, fear and sadness) and computing the emotion score of each dimension independently; the emotion value of each bullet-screen comment is calculated with the following formula:
S = Σ_j a_j · Q(b_j × c_j, b) + Σ_i α_i + Σ_m β_m + Σ_l ε_l
where j takes values from 1 to J, J being the total number of emotion words; b_j is the basic emotion score of the j-th emotion word, obtained by direct lookup in the basic emotion word dictionary, with value range [0, 1]; c_j ∈ {1, -1} indicates whether emotion word j is negated, i.e. whether its emotion is reversed; b is the emotion score matrix of all emotion words, emoticons, homophone words and consecutive symbols in the bullet screen; the Q function is a cross-correlation function used to calculate the correlation between the current emotion word and the emotional tendency of the other emotion words b in the bullet screen; a_j is the weighting score of the degree adverbs before and after the j-th emotion word, with value range [0, N], where N can be chosen according to actual requirements and generally does not exceed 10; α_i, β_m and ε_l are the emotion parameters of three kinds of special bullet-screen elements, namely emoticons, homophone words, and consecutive punctuation or digit symbols, where i takes values from 1 to I, m from 1 to M and l from 1 to L, and I, M, L are the numbers of occurrences of the three kinds of special elements;
step seven, after the emotion value of each bullet-screen comment is calculated, performing outlier detection with the Isolation Forest method: all bullet-screen emotion values in the same time period are clustered, bullet screens with abnormal emotion values are removed, and the emotion values of the remaining normal bullet screens are summed to obtain the emotion parameter of the whole video; this parameter is a 7-dimensional emotion category vector, whose highest-scoring dimension is the overall emotional tendency of the video and whose value is the final emotion score; when emotions of fear or disgust appear for more than 1/4 of the video duration, the video is flagged and pushed for further review.
2. The multi-modal video behavior analysis method for detecting Internet violence harmful scenes according to claim 1, wherein in step one, YUV features and HOG features are selected when constructing the apparent features; when constructing the rotation-invariant features, a polar-coordinate representation is used to transform the image features from the Cartesian coordinate system to a polar coordinate system, preserving the spatial invariance of the features.
3. The multi-modal video behavior analysis method for detecting Internet violence harmful scenes according to claim 1, wherein n = 10.
4. The multi-modal video behavior analysis method for detecting Internet violence harmful scenes according to claim 1, wherein in step five, words not recorded in the basic emotion word dictionary are manually labeled and then added to the basic emotion word dictionary.
CN202110512224.0A 2021-05-11 2021-05-11 Multi-mode video behavior analysis method for detecting Internet violence harmful scene Active CN113297934B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110512224.0A CN113297934B (en) 2021-05-11 2021-05-11 Multi-mode video behavior analysis method for detecting Internet violence harmful scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110512224.0A CN113297934B (en) 2021-05-11 2021-05-11 Multi-mode video behavior analysis method for detecting Internet violence harmful scene

Publications (2)

Publication Number Publication Date
CN113297934A CN113297934A (en) 2021-08-24
CN113297934B true CN113297934B (en) 2024-03-29

Family

ID=77321405

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110512224.0A Active CN113297934B (en) 2021-05-11 2021-05-11 Multi-mode video behavior analysis method for detecting Internet violence harmful scene

Country Status (1)

Country Link
CN (1) CN113297934B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117056560B (en) * 2023-10-12 2024-02-06 深圳市发掘科技有限公司 Automatic generation method and device of cloud menu and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015043075A1 (en) * 2013-09-29 2015-04-02 广东工业大学 Microblog-oriented emotional entity search system
CN105068988A (en) * 2015-07-21 2015-11-18 中国科学院自动化研究所 Multi-dimension multi-granularity emotion analysis method
CN108805089A (en) * 2018-06-14 2018-11-13 南京云思创智信息科技有限公司 Based on multi-modal Emotion identification method
CN110020437A (en) * 2019-04-11 2019-07-16 江南大学 The sentiment analysis and method for visualizing that a kind of video and barrage combine
WO2019184054A1 (en) * 2018-03-29 2019-10-03 网宿科技股份有限公司 Method and system for processing on-screen comment information
CN110851621A (en) * 2019-10-31 2020-02-28 中国科学院自动化研究所 Method, device and storage medium for predicting video wonderful level based on knowledge graph
CN111078944A (en) * 2018-10-18 2020-04-28 中国电信股份有限公司 Video content heat prediction method and device
WO2021004481A1 (en) * 2019-07-08 2021-01-14 华为技术有限公司 Media files recommending method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11205103B2 (en) * 2016-12-09 2021-12-21 The Research Foundation for the State University Semisupervised autoencoder for sentiment analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Multi-frame feature-fusion-based model for violence detection; Mujtaba Asad et al.; The Visual Computer; 2020-06-24; vol. 37; pp. 1415-1431 *

Also Published As

Publication number Publication date
CN113297934A (en) 2021-08-24

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant