CN112989950A - Violent video recognition system oriented to multi-mode feature semantic correlation features - Google Patents

Violent video recognition system oriented to multi-mode feature semantic correlation features

Info

Publication number
CN112989950A
CN112989950A (application CN202110185761.9A)
Authority
CN
China
Prior art keywords
video
module
violent
information
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110185761.9A
Other languages
Chinese (zh)
Inventor
张笑钦
李兵
胡卫明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wenzhou University
Original Assignee
Wenzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wenzhou University filed Critical Wenzhou University
Priority to CN202110185761.9A priority Critical patent/CN112989950A/en
Publication of CN112989950A publication Critical patent/CN112989950A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/49Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/698Control of cameras or camera modules for achieving an enlarged field of view, e.g. panoramic image capture

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Evolutionary Biology (AREA)
  • Child & Adolescent Psychology (AREA)
  • General Health & Medical Sciences (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychiatry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a violent video identification system oriented to multi-modal feature semantic association features, which comprises a video acquisition module, a video segmentation module, a video processing module and a violent video identification module connected in sequence. The system identifies violent videos through the semantic association of three modal features, namely visual, auditory and textual information, and offers the advantages of accurate and efficient identification.

Description

Violent video recognition system oriented to multi-mode feature semantic correlation features
Technical Field
The invention relates to the technical field of recognition, in particular to a violent video recognition system for multi-modal feature semantic association features.
Background
With the rapid development of online video technology, people are exposed to all kinds of videos, including violent ones. Such violent videos harm viewers' physical and mental health, so they must be searched for and detected, whether in online videos or in surveillance footage reviewed by staff, and this search generally requires video recognition. At present, recognizing a violent video typically requires preparing the video beforehand, the recognition approach is single-modal, the preparation of identification files and the identification environment is complex, and violent videos cannot be identified accurately, making their recognition difficult.
Therefore, a violent video recognition system oriented to multi-modal feature semantic association features is urgently needed to solve these problems.
Disclosure of Invention
In view of the above, the present invention provides a violent video recognition system oriented to the multi-modal feature semantic association feature, so as to solve the above technical problems.
To achieve the above objective, the invention provides the following technical scheme:
a violent video recognition system facing multi-modal feature semantic association features comprises the following components: the system comprises a video acquisition module, a video segmentation module, a video processing module and a violent video identification module, wherein the video acquisition module, the video segmentation module, the video processing module and the violent video identification module are sequentially connected;
the video acquisition module is used for acquiring video information, and the video information comprises violent video information and non-violent video information;
the video segmentation module is used for segmenting the video information obtained by the video acquisition module into a plurality of video shots according to a video segmentation technology, and extracting relevant visual features, audio features and text features from each video shot to obtain corresponding image information to be identified, audio information to be identified and text information to be identified;
the video processing module is used for performing video preprocessing on the plurality of segmented video shots, and comprises an image processing module, an audio processing module and a text processing module;
the violence video identification module is used for judging whether the video information processed by the video processing module belongs to violence video information or not, and comprises a violence audio judgment module, a violence image judgment module and a violence text judgment module.
Further, the violent video recognition system oriented to multi-modal feature semantic association features further comprises a common violent scene module for storing violent scene templates, and the common violent scene module is connected with the violent video recognition module.
Further, the common violent scene template comprises common violent scene audio characteristic information, common violent scene image characteristic information and common violent scene text characteristic information.
Further, the image processing module is configured to preprocess images in the video shot, and comprises an image deduplication module, an image gray scale calculation module and an image contrast enhancement module, which are connected in sequence; the image deduplication module removes overlapping information from the image information in the video shot, the gray scale calculation module calculates image gray values, and the image contrast enhancement module enhances the gray values of the image.
Further, the audio processing module is configured to preprocess the audio information in the video shot, and comprises a low-frequency filtering module for removing low-frequency components from that audio information.
Further, the text processing module is used for preprocessing the text information in the video shot, and comprises a text denoising module for removing redundant noise from that text information.
Further, the violent audio judgment module judges as follows: first, the processed audio feature information is fused with the common violent scene audio features to obtain processed fused audio feature information; second, a classifier compares the common violent scene audio feature information with the processed audio feature information, and any processed audio feature information that matches the common violent scene audio information is marked as violent audio information. The violent image judgment module and the violent text judgment module judge in the same way as the violent audio judgment module.
Further, the violent video information comprises at least one of marked violent audio characteristic information, marked violent image characteristic information and marked violent text characteristic information.
Further, the violent video identification system oriented to multi-modal feature semantic association features further comprises a timed start module, which starts the video system at scheduled times to perform violent video identification and is connected with the video acquisition module.
The above technical scheme shows that the invention has the following advantages:
1. The video is divided into a plurality of video shots by a video segmentation technique; each shot contains an image module to be identified, an audio module to be identified and a text module to be identified, and each shot is processed and identified individually, achieving accurate identification.
2. Features in each video shot are extracted by combining image, audio and text, and the multi-modal features are jointly identified, making violent video identification more accurate and the system more practical.
3. The timed start module allows the violent video identification system to be opened or closed at set times without manual operation, enabling intelligent start and shutdown.
In addition to the objects, features and advantages described above, other objects, features and advantages of the present invention are also provided. The present invention will be described in further detail below with reference to the drawings.
Drawings
In the drawings:
fig. 1 is a schematic structural diagram of a violent video recognition system oriented to multi-modal feature semantic association features.
FIG. 2 is a step diagram of a video segmentation technique in a violent video recognition system oriented to multi-modal feature semantic related features according to the present invention.
FIG. 3 is a step diagram of acquiring clear video shots in the violent video identification system oriented to multi-modal feature semantic association features.
FIG. 4 is a structural diagram of the components of a video shot in the violent video identification system facing the multi-modal characteristic semantic relation characteristic.
FIG. 5 is a schematic step diagram of a violent video feature recognition system oriented to the multi-modal feature semantic relation features.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The violent video identification system oriented to multi-modal feature semantic association features comprises a timed start module, a video acquisition module, a video segmentation module, a video processing module, a common violent scene module and a violent video identification module. The timed start module starts the identification system at scheduled times; the video segmentation module divides the video information received by the video acquisition module into video shots; the video processing module processes the received video shots; and the violent video identification module determines whether a processed video belongs to violent video by comparison against the common violent scene module. The timed start module, the video acquisition module, the video segmentation module, the video processing module and the violent video identification module are connected in sequence, and the common violent scene module is connected with the violent video identification module.
The timed start module is used to start the video acquisition module at fixed times and is connected with the video acquisition module.
Specifically, the timed start module comprises a timed starter that switches the violent video identification system on and off. When the set start time is reached, the video acquisition module is opened to acquire video information.
The video acquisition module may employ a plurality of panoramic cameras to acquire video data and audio data simultaneously; the acquired video information comprises violent video information and non-violent video information.
Specifically, when the timed start module opens and a video acquisition request is received, the panoramic cameras acquire video information. To ensure video clarity, the acquisition range of each panoramic camera is 5 m.
The video segmentation module is used for segmenting the video information obtained by the video acquisition module into a plurality of video shots according to a video segmentation technology.
Specifically, as shown in fig. 2, the implementation of video segmentation includes the following two steps:
The first step: judge the continuity of the acquired video information;
The second step: divide the video into a plurality of video shots according to the judgment result; if the video information is continuous it belongs to the current shot, and if it is discontinuous a new shot begins.
Generally, video information obtained from television, film or an on-scene panoramic camera is filmed with multiple cameras; video information within the same shot is normally continuous, while the video information between two shots is discontinuous, so most videos consist of a plurality of video shots.
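The two-step segmentation above can be sketched with a simple frame-difference rule. The patent does not say how continuity is judged, so the mean-absolute-difference measure and the threshold below are illustrative assumptions.

```python
import numpy as np

def detect_shot_boundaries(frames, threshold=30.0):
    """Split a frame sequence into shots: a mean absolute frame
    difference above `threshold` is treated as a discontinuity (cut)."""
    boundaries = [0]
    for i in range(1, len(frames)):
        diff = np.mean(np.abs(frames[i].astype(float) - frames[i - 1].astype(float)))
        if diff > threshold:
            boundaries.append(i)  # discontinuous: a new shot starts here
    # Convert boundary indices into (start, end) shot ranges.
    return [(boundaries[k],
             boundaries[k + 1] if k + 1 < len(boundaries) else len(frames))
            for k in range(len(boundaries))]
```

A sequence of five dark frames followed by five bright frames, for example, would be split into two shots at the jump.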
Preferably, the sharpness of the acquired video shots is judged and the shots with higher sharpness are extracted, as shown in fig. 3, by the following steps:
step 1: and intercepting video stream information from the video lens for acquiring the video information.
Specifically, a computer may be used to intercept a continuous video stream to obtain an image, and then analyze the image.
Step 2: judge whether the video stream is in YUV format; if so, execute step 3, and if not, execute step 4.
Step 3: analyze the image area of the captured video stream.
Specifically, overlays such as the location and time shown in the video stream are removed, and the rectangular area in the middle of the image is retained.
Step 4: calculate an evaluation function of the sharpness of the video stream.
Specifically, for the retained middle rectangular region, compute the sum of the gradients at sharp points and the sum of the gradients over all pixels; the sharpness evaluation function is the ratio of the former to the latter.
Step 5: judge whether the images within a specific time are clear.
Specifically, the specific time may be a unit time. Count the number of frames with abnormal sharpness within the unit time; if it exceeds a certain proportion of the total number of frames in that time, the acquired shot's sharpness is abnormal, and otherwise it is normal.
Step 6: acquire the video shot information with higher sharpness.
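Steps 4 through 6 can be sketched as follows. The gradient threshold for "sharp points", the per-frame score threshold, and the "certain proportion" of abnormal frames are all placeholders, since the patent leaves them unspecified.

```python
import numpy as np

def sharpness_score(gray, edge_thresh=20.0):
    """Step 4: ratio of the gradient magnitude summed over sharp
    (high-gradient) points to the gradient sum over all pixels."""
    gy, gx = np.gradient(gray.astype(float))
    mag = np.hypot(gx, gy)
    total = mag.sum()
    if total == 0:
        return 0.0                      # perfectly flat frame: no detail
    return mag[mag > edge_thresh].sum() / total

def shot_is_clear(frames, score_thresh=0.5, bad_ratio=0.3):
    """Steps 5-6: a shot is kept as clear unless more than `bad_ratio`
    of its frames score below `score_thresh`."""
    bad = sum(1 for f in frames if sharpness_score(f) < score_thresh)
    return bad / len(frames) <= bad_ratio
```

A frame with a strong step edge scores near 1.0 (all gradient energy sits at sharp points), while a flat frame scores 0.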
As shown in fig. 4, for each clear video shot, the relevant visual, audio and text features are extracted through a deep-learning neural network model to obtain the corresponding image information to be recognized, audio information to be recognized and text information to be recognized. Features are thus extracted from the video shot in three modalities: image, audio and text. Each video shot comprises an image module for storing image information, an audio module for storing audio information and a text module for storing text information.
The video processing module performs video preprocessing on the segmented video shots and comprises an image processing module, an audio processing module and a text processing module.
The image processing module comprises an image deduplication module, an image gray scale calculation module and an image contrast enhancement module; the deduplication module removes overlapping image information in the video shot, the gray scale calculation module calculates image gray values, and the contrast enhancement module enhances the gray values to improve the image's recognizability.
Specifically, the image information in each shot is de-overlapped by image area, keeping the larger-area image information; the range of gray values of the de-overlapped images is then computed by image binarization to find the minimum gray value min and the maximum gray value max; and the gray values are stretched to the interval [0, 255] to enhance the image's distinguishability. Preprocessing the image module in this way yields relatively clear, good-quality image information that can be compared conveniently with the image information in the common violent scene module.
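The min/max stretch to [0, 255] described above is a standard linear contrast stretch; a minimal sketch:

```python
import numpy as np

def stretch_contrast(gray):
    """Linearly stretch pixel values from [min, max] to [0, 255],
    as described for the image contrast enhancement module."""
    g = gray.astype(float)
    lo, hi = g.min(), g.max()
    if hi == lo:                        # flat image: nothing to stretch
        return np.zeros_like(gray)
    return ((g - lo) / (hi - lo) * 255.0).round().astype(np.uint8)
```

For an image with gray values in [50, 200], the darkest pixel maps to 0, the brightest to 255, and intermediate values are scaled proportionally.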
The audio processing module processes the audio information in the video shot and comprises a low-frequency filtering module that removes low-frequency components from that audio information, enhancing audio quality.
The text processing module processes the text information in the video shot and comprises a text denoising module that removes irrelevant noise from the text information.
The common violent scene module stores the violent video templates against which videos are identified, and is connected with the violent video identification module. The templates comprise common violent scene audio feature information, common violent scene image feature information and common violent scene text feature information.
The common violent scene audio feature information extracted includes: audio energy feature information, short-time average energy intensity, fundamental (pitch) frequency, audio energy entropy and other feature information.
Specifically, common violent scene audio information may be defined to include sounds such as screams, hoarse shouting and explosions. The audio extraction steps for common violent scenes are: extract the audio signals from the common violent scene template through a high-pass filter, convert the audio into a spectrogram, run a forward pass of a neural network to extract the violent audio information, and use it as a comparison template.
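The high-pass filtering and spectrogram conversion above can be sketched with plain FFTs; the cutoff frequency, frame length and hop are illustrative assumptions, and the neural network stage is omitted.

```python
import numpy as np

def high_pass(signal, sample_rate, cutoff_hz=300.0):
    """Crude FFT-domain high-pass: zero out bins below `cutoff_hz`.
    A production system would use a properly designed filter instead."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs < cutoff_hz] = 0.0
    return np.fft.irfft(spectrum, n=len(signal))

def spectrogram(signal, frame_len=256, hop=128):
    """Magnitude spectrogram from windowed short-time FFT frames."""
    window = np.hanning(frame_len)
    frames = [signal[i:i + frame_len] * window
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return np.abs(np.fft.rfft(np.array(frames), axis=1))
```

Filtering a mixture of a 100 Hz hum and a 1000 Hz tone, for instance, leaves the 1000 Hz component dominant.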
The common violent scene image feature information extracted includes: average motion intensity information, gore feature information, flame feature information and other feature information.
Specifically, common violent scene image information may be defined to include images of bleeding, knives, guns, explosions, fighting actions and the like. The image extraction steps for common violent scenes are: extract the image signals from the common violence template, extract the violent image information through a forward pass of a neural network, and use it as a comparison template.
The common violent scene text feature information extracted includes: sensitive word information, sensitive phrase information and the like.
Specifically, common violent scene text information may be defined to include words such as terror, violence, gore and blood. The extraction steps for common violent scene text are to extract text signals from the common violent scene templates and extract the common violent text features from the text information; these features may be extracted with a bag-of-words model.
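The bag-of-words extraction named above can be sketched as follows. The sensitive-word list is hypothetical: the patent only says such words exist, not which ones.

```python
from collections import Counter
import re

# Hypothetical sensitive-word list; the patent does not enumerate one.
SENSITIVE_WORDS = {"terror", "violence", "blood", "gore"}

def bag_of_words(text):
    """Token counts over a text: the bag-of-words model named in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)

def violent_text_score(text):
    """Fraction of tokens that are sensitive words."""
    counts = bag_of_words(text)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(counts[w] for w in SENSITIVE_WORDS) / total
```

A downstream judgment module could then threshold this score, or feed the full count vector to a classifier.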
The violent audio judgment module judges as follows: first, the processed audio feature information is fused with the common violent scene audio features to obtain the processed fused audio feature information and the common violent scene fused audio feature information; second, a classifier compares the two, and any processed audio feature information matching the common violent scene audio information is marked as violent audio information.
Specifically, if a video shot contains explosion audio, its audio information is compared with the common violent scene audio information; if an explosion sound is found, the shot contains audio features corresponding to common violent scene audio and therefore contains violent audio information, and it is marked as such; otherwise it is marked as non-violent audio feature information.
The violent image judgment module and the violent text judgment module judge in the same way as the violent audio judgment module.
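The patent names "a classifier" for the comparison step without specifying one. As an illustrative stand-in, a nearest-template match by cosine similarity can serve for any of the three modalities; the 0.8 threshold is an assumption.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two feature vectors."""
    na, nb = np.linalg.norm(a), np.linalg.norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return float(np.dot(a, b) / (na * nb))

def label_segment(feature, template_features, threshold=0.8):
    """Mark a processed feature vector as violent if it matches any
    common violent scene template closely enough."""
    best = max(cosine_similarity(feature, t) for t in template_features)
    return "violent" if best >= threshold else "non-violent"
```

A feature vector nearly aligned with a template is marked violent; one orthogonal to every template is not.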
The violent video information comprises at least one of the marked violent audio feature information, the marked violent image feature information and the marked violent text feature information.
Preferably, the violent video recognition system oriented to multi-modal feature semantic association features may further be provided with a violent video warning module connected with the violent video identification module. When the system identifies video information as violent, the warning module reminds the user that a violent video is present.
Specifically, the violent video warning module may use an audible and visual alarm to provide sound and light reminders. This module alerts staff to identified violent videos, making them more convenient and intuitive to review.
As shown in fig. 5, a violent video recognition system oriented to the multi-modal feature semantic association features is implemented as follows:
s1: acquiring video information, S2: segmenting the acquired video information into a plurality of video shots, S3: processing the video information in each video shot, S4: and comparing the processed video characteristic information with the characteristic information of the common violent scenes to determine whether the processed video characteristic information is violent video information.
Step S1: video information is obtained through video acquisition modules such as a panoramic camera and the like, and the video information comprises violent video information and non-violent video information.
Step S2: and segmenting the acquired video information into a plurality of pieces of video information by using a video segmentation technology, wherein the video shot information comprises audio information to be identified, image information to be identified and text information to be identified.
Step S3: and preprocessing the audio information to be identified, the image information to be identified and the text information to be identified to obtain corresponding audio processing information, image processing information and text processing information.
Step S4: and comparing the audio processing information, the image processing information and the text processing information in each video shot with the audio information, the image information and the text information in the common violent scene information, wherein at least one of the image processing information corresponding to the violent scene image characteristic information, the audio processing information corresponding to the violent scene audio characteristic information and the text processing information corresponding to the violent scene text characteristic information is violent video information.
The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (9)

1. A violent video recognition system oriented to multi-modal feature semantic related features, comprising: the system comprises a video acquisition module, a video segmentation module, a video processing module and a violent video identification module, wherein the video acquisition module, the video segmentation module, the video processing module and the violent video identification module are sequentially connected;
the video acquisition module is used for acquiring video information, and the video information comprises violent video information and non-violent video information;
the video segmentation module is used for segmenting the video information obtained by the video acquisition module into a plurality of video shots according to a video segmentation technology, extracting relevant visual features, audio features and text features from each video shot and obtaining corresponding image information to be identified, audio information to be identified and text information to be identified;
the video processing module is used for performing video preprocessing on the plurality of segmented video shots, and comprises an image processing module, an audio processing module and a text processing module;
the violence video identification module is used for judging whether the video information processed by the video processing module belongs to violence video information or not, and comprises a violence image judgment module, a violence audio judgment module and a violence text judgment module.
2. The violent video recognition system oriented to the multimodal feature semantic related features of claim 1, further comprising a common violent scene module for storing violent scene templates, wherein the common violent scene module is connected with the violent video recognition module.
3. The violent video recognition system oriented to the multimodal feature semantic related features of claim 2, wherein the common violent scene templates comprise common violent scene audio feature information, common violent scene image feature information and common violent scene text feature information.
4. The violent video recognition system oriented to multi-modal feature semantic correlation features of claim 1, wherein the image processing module is used for processing image information in the video shots and comprises an image deduplication module, an image gray scale calculation module and an image contrast enhancement module which are connected in sequence; the image deduplication module is used for removing duplicated image information in the video shots, the image gray scale calculation module is used for calculating image gray scale values, and the image contrast enhancement module is used for enhancing the contrast of the gray scale image.
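The three image-processing steps of claim 4 could be sketched as follows (function names, the deduplication threshold `eps`, and the choice of linear contrast stretching are assumptions for illustration; the claim does not specify the algorithms):

```python
import numpy as np

def dedup_frames(frames, eps=1.0):
    """Keep only frames that differ noticeably from the last kept frame."""
    kept = [frames[0]]
    for f in frames[1:]:
        if np.abs(f.astype(float) - kept[-1].astype(float)).mean() > eps:
            kept.append(f)
    return kept

def to_gray(rgb):
    """Weighted luminance grayscale (ITU-R BT.601 coefficients)."""
    return rgb[..., 0] * 0.299 + rgb[..., 1] * 0.587 + rgb[..., 2] * 0.114

def stretch_contrast(gray):
    """Linearly stretch gray values to span the full [0, 255] range."""
    lo, hi = float(gray.min()), float(gray.max())
    if hi == lo:
        return np.zeros_like(gray, dtype=float)
    return (gray - lo) * 255.0 / (hi - lo)
```

Histogram equalization is a common alternative to linear stretching for the contrast enhancement step.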
5. The violent video recognition system of claim 1, wherein the audio processing module is configured to process audio information in the video shots, and comprises a low-frequency filtering module configured to remove low-frequency components from the audio information in the video shots.
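One way to realize the low-frequency filtering of claim 5 is ideal high-pass filtering in the frequency domain (the 100 Hz cutoff and the FFT-masking approach are assumptions; a real system would more likely use a designed filter such as a Butterworth high-pass):

```python
import numpy as np

def remove_low_frequencies(signal, sample_rate, cutoff_hz=100.0):
    """Zero out spectral components below cutoff_hz (ideal high-pass).

    signal: 1-D array of audio samples; sample_rate in Hz.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    spectrum[freqs < cutoff_hz] = 0.0          # drop low-frequency bins
    return np.fft.irfft(spectrum, n=len(signal))
```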
6. The violent video recognition system of claim 1, wherein the text processing module is configured to pre-process text information in the video shots, and comprises a text denoising module configured to remove noise from the text information in the video shots.
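A minimal illustration of the text denoising step (the cleaning rules below are assumptions; the claim leaves the noise model unspecified):

```python
import re

def denoise_text(text):
    """Remove control characters and stray symbols, then collapse whitespace."""
    text = re.sub(r"[\x00-\x1f\x7f]", " ", text)  # control chars -> space
    text = re.sub(r"[^\w\s.,!?']", "", text)      # drop other stray symbols
    return re.sub(r"\s+", " ", text).strip()      # collapse runs of whitespace
```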
7. The violent video recognition system oriented to multi-modal feature semantic correlation features of claim 5, wherein the violent audio judgment module performs judgment as follows: firstly, the processed audio feature information is fused with the common violent scene audio features to obtain fused audio feature information; secondly, a classifier compares the fused audio feature information with the common violent scene audio feature information, and any processed audio feature information that matches the common violent scene audio information is marked as violent audio information; the violent image judgment module and the violent text judgment module use the same judgment method as the violent audio judgment module.
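The comparison against common violent scene templates in claim 7 could be sketched as nearest-template matching (cosine similarity and the 0.8 threshold are assumptions; the claim only requires a classifier-based comparison):

```python
import numpy as np

def label_against_templates(features, templates, threshold=0.8):
    """Mark shots whose features match a violent-scene template.

    features:  (n, d) array of processed shot features.
    templates: (m, d) array of violent-scene template features.
    Returns a boolean array; True marks a shot as violent.
    """
    def normalize(x):
        return x / np.linalg.norm(x, axis=1, keepdims=True)
    sims = normalize(features) @ normalize(templates).T  # (n, m) cosine sims
    return sims.max(axis=1) >= threshold                 # best template match
```

In practice the threshold comparison would be replaced by a trained classifier over the fused multi-modal features, as the claim describes.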
8. The violent video recognition system oriented to multi-modal feature semantic correlation features of claim 1, wherein the violent video information comprises at least one of tagged violent audio feature information, tagged violent image feature information and tagged violent text feature information.
9. The violent video recognition system oriented to multi-modal feature semantic correlation features of claim 1, further comprising a timed starting module connected with the video acquisition module, wherein the timed starting module is used for starting the system at scheduled times to perform violent video recognition.
CN202110185761.9A 2021-02-11 2021-02-11 Violent video recognition system oriented to multi-mode feature semantic correlation features Pending CN112989950A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110185761.9A CN112989950A (en) 2021-02-11 2021-02-11 Violent video recognition system oriented to multi-mode feature semantic correlation features

Publications (1)

Publication Number Publication Date
CN112989950A true CN112989950A (en) 2021-06-18

Family

ID=76393237

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113673364A (en) * 2021-07-28 2021-11-19 上海影谱科技有限公司 Video violence detection method and device based on deep neural network
CN114239570A (en) * 2021-12-02 2022-03-25 北京智美互联科技有限公司 Sensitive data identification method and system based on semantic analysis
CN114519828A (en) * 2022-01-17 2022-05-20 天津大学 Video detection method and system based on semantic analysis
CN114821385A (en) * 2022-03-08 2022-07-29 阿里巴巴(中国)有限公司 Multimedia information processing method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101834982A (en) * 2010-05-28 2010-09-15 上海交通大学 Hierarchical screening method of violent videos based on multiplex mode
CN103218608A (en) * 2013-04-19 2013-07-24 中国科学院自动化研究所 Network violent video identification method
US20170289624A1 (en) * 2016-04-01 2017-10-05 Samsung Electrônica da Amazônia Ltda. Multimodal and real-time method for filtering sensitive media
WO2019127659A1 (en) * 2017-12-30 2019-07-04 惠州学院 Method and system for identifying harmful video based on user id
WO2019127651A1 (en) * 2017-12-30 2019-07-04 惠州学院 Method and system thereof for identifying malicious video
CN112069884A (en) * 2020-07-28 2020-12-11 中国传媒大学 Violent video classification method, system and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination