CN112380999B - Detection system and method for induced bad behavior in a live broadcast process - Google Patents


Info

Publication number
CN112380999B
CN112380999B (application CN202011279463.8A)
Authority
CN
China
Prior art keywords
video
time
bad behavior
model
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011279463.8A
Other languages
Chinese (zh)
Other versions
CN112380999A (en)
Inventor
张斌
陈禹奇
刘思源
刘莹
Original Assignee
东北大学 (Northeastern University)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 东北大学 (Northeastern University)
Priority: CN202011279463.8A
Publication of CN112380999A
Application granted
Publication of CN112380999B
Legal status: Active

Classifications

    • G06V 20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/254: Fusion techniques of classification results, e.g. of results related to same input data
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods
    • G06T 7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T 7/269: Analysis of motion using gradient-based methods
    • G06V 20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V 20/49: Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G06T 2207/10016: Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a detection system and method for induced bad behavior in a live broadcast process. The detection system comprises a video set processing module, a spatial feature processing module, a temporal feature processing module and a fusion module.

Description

Detection system and method for induced bad behavior in a live broadcast process
Technical Field
The invention relates to the fields of computer-vision recognition and convolutional neural networks, and in particular to a system and method for detecting induced bad behavior in a live broadcast process, i.e. for detecting whether induced bad behavior is mixed into a continuous action sequence during live streaming.
Background
With the development of information technology and the spread of smart hardware, especially mobile smart terminals, smartphones and handheld computers have gradually become the first choice for office and entertainment devices. Online live broadcast platforms merge traditional offline social games, teahouses and talk-show theatres: an anchor who joins a platform can perform in real time and earn income from gifts given by the audience. According to published data, Chinese network live broadcast users number 433 million, accounting for 50.7% of all internet users, and head live broadcast platforms added more than 2 million new anchors in 2018. The live broadcast industry has formed a complete industrial chain combining software and hardware, and in July 2020 China's human-resources authority listed "live-stream salesperson" among the newly recognized occupations. The rapid growth in the number of network anchors and the rapid development of the platforms have brought a surge in broadcast counts and in the platforms' total broadcast duration.
A network anchor, whether to attract more viewing traffic to the live room or driven by personal behavioral habits, may perform induced bad actions during a broadcast, such as smoking, self-harm or verbal abuse; if adolescent viewers imitate such behavior, their physical and mental health can be seriously damaged. These actions are mixed into an ordinary continuous action sequence, vary in duration, and are difficult to spot. Facing live content of uneven quality, traditional small live broadcast platforms identify and review anchors' induced bad behavior through irregular patrols by platform administrators and reports from the live audience.
However, facing the rapid growth of broadcast counts and durations, this traditional recognition approach depends heavily on platform patrol administrators and aggravates the platform's operating burden. A manual auditing mechanism is also weak at recognizing fine details, so its accuracy on violations is low. In terms of efficiency, a reviewer must watch an entire video to identify induced bad behavior in it, and must repeatedly rewatch and judge ambiguous clips, which easily leads to low efficiency. In addition, for manually identified violating content, the platform offers anchors a complaint channel to guard against misjudgment, so situations can arise in which anchors and administrators exploit this process to evade penalties.
Therefore, a detection method with high recognition accuracy is needed to determine whether induced bad behavior occurs during a live broadcast.
Disclosure of Invention
The invention aims to solve the problems of low recognition speed and low accuracy in existing detection methods, and provides a detection system and method for induced bad behavior in a live broadcast process.
In order to achieve the above purpose, the invention is implemented according to the following technical scheme:
a detection system for induced adverse behavior in a live broadcast process, comprising:
the video set processing module, the output end of which is connected with the input end of the spatial feature processing module, and which processes the video set contents: obtaining the confirmed violation video cases of induced bad behavior stored in the live broadcast platform database, and capturing real-time live content to be stored as videos to be identified; segmenting the video cases confirmed as induced bad behavior and labelling each segmented video with its violation type; and dividing a long video to be identified into several short videos of equal length, naming the divided short videos in a unified format so that the continuity and readability of the segmented videos are preserved;
the spatial feature processing module, the output end of which is connected with the input ends of the temporal feature processing module and the fusion module respectively, and which captures RGB single-frame images from the processed short videos, extracts spatial features from the captured RGB single-frame images, inputs the spatial features into an induced-bad-behavior recognition model for spatial features, and outputs a prediction result;
the temporal feature processing module, the output end of which is connected with the input end of the fusion module, and which computes the instantaneous optical flow between two RGB single-frame images adjacent in time sequence and synthesizes an optical flow graph; it then extracts temporal features from the synthesized optical flow graph, inputs them into an induced-bad-behavior recognition model for temporal features, and outputs a prediction result;
and the fusion module, which fuses the prediction result of the recognition model for spatial features with the prediction result of the recognition model for temporal features to obtain data combining the spatial and temporal features, and classifies the fused data to obtain the prediction result of the segmented video. After all segmented videos have been predicted, the per-segment prediction results are fused to obtain the final prediction result, namely the recognition result of the long video obtained from the live broadcast server.
Further, the spatial feature processing module includes:
the RGB single-frame image intercepting sub-module, the input end of which is connected with the output end of the video set processing module and the output end of which is connected with the input ends of the spatial feature model processing sub-module and the temporal feature processing module respectively; the RGB single-frame image intercepting sub-module intercepts RGB single-frame images from single frames of the processed short videos;
the spatial feature model processing sub-module, the output end of which is connected with the input end of the fusion module; the spatial feature model processing sub-module extracts spatial features from the intercepted RGB single-frame images, inputs them into the induced-bad-behavior recognition model for spatial features, and outputs a prediction result.
Further, the time feature processing module includes:
the optical flow diagram synthesis sub-module, the input end of which is connected with the output end of the RGB single-frame image intercepting sub-module and the output end of which is connected with the input end of the time feature model processing sub-module; the optical flow diagram synthesis sub-module computes the instantaneous optical flow between two RGB single-frame images adjacent in time sequence intercepted from the video and synthesizes an optical flow graph;
the time feature model processing sub-module, the output end of which is connected with the input end of the fusion module; the time feature model processing sub-module extracts temporal features from the synthesized optical flow graph, inputs them into the induced-bad-behavior recognition model for temporal features, and outputs a prediction result.
Further, the fusion module includes:
the space-time feature fusion sub-module, the input end of which is connected with the output ends of the spatial feature model processing sub-module and the time feature model processing sub-module respectively; the space-time feature fusion sub-module fuses the prediction result of the recognition model for spatial features with the prediction result of the recognition model for temporal features to obtain data combining the spatial and temporal features, and classifies the fused data to obtain the prediction result of a segmented video;
the prediction result fusion sub-module, which, after all segmented videos have been predicted, fuses the prediction results of the segmented videos to obtain the final prediction result, namely the recognition result of the long video obtained from the live broadcast server.
Further, the original model of both the induced-bad-behavior recognition model for spatial features and the induced-bad-behavior recognition model for temporal features is the convolutional neural network ResNet152.
In addition, the invention also provides a method for detecting induced bad behavior in a live broadcast process, which uses the above detection system and comprises the following steps:
step 1: extracting and processing the violation video cases stored in the live broadcast platform video database, dividing a selected target video into a plurality of short videos containing the induced bad behavior, and recording the violation type label of the induced bad behavior;
step 2: processing the video to obtain an RGB single frame image and an optical flow image of the video;
step 3: training a model for identifying spatial features and a model for identifying time features by using space-time features in an RGB single-frame image and an optical flow image respectively to obtain an induced bad behavior identification model for the spatial features and an induced bad behavior identification model for the time features;
step 4: acquiring live video segments: obtaining the real-time live broadcast cache from the live broadcast platform server and cutting it into multiple live video segments of 2-3 seconds each;
step 5: repeating the content of the step 2 aiming at the live video segment cut in the step 4 to obtain an RGB single-frame image and an optical flow image of the live video segment;
step 6: randomly selecting an RGB single frame image obtained in the step 5, putting the RGB single frame image into the induction bad behavior recognition model aiming at the space characteristics obtained in the step 3, and outputting a prediction result;
step 7: putting the optical flow diagram obtained in the step 5 into the induced bad behavior recognition model aiming at the time characteristics obtained in the step 3, and outputting a prediction result;
step 8: fusing the data obtained in step 6 and step 7 by averaging the two prediction results, then outputting and judging the fused result to obtain the prediction result of the video segment;
step 9: fusing the prediction results of the multiple video segments divided from the long video; if the prediction result of at least one video segment is "bad behavior", judging that the current video to be identified contains induced bad behavior.
Further, the step 2 specifically includes:
step 2.1: obtaining RGB single-frame images of the video segments: performing frame extraction on the video segments and, according to the video frame rate, extracting all RGB single-frame images contained in the video;
step 2.2: performing optical flow information processing on the RGB single-frame images, and synthesizing an optical flow image through calculation between two adjacent single-frame images;
step 2.3: and processing the obtained RGB single frame image and optical flow image, and storing the same type of induced bad behaviors together.
Further, the step 3 specifically includes:
step 3.1: loading a convolutional neural network model ResNet152 pre-trained by an ImageNet training set;
step 3.2: training the ResNet152 model on the processed RGB single-frame images and the marked video labels, continuously adjusting the training parameters, updating the model to achieve the best recognition accuracy, and saving the obtained induced-bad-behavior recognition model for spatial features;
step 3.3: training the ResNet152 model on the processed optical flow graphs and the marked video labels, adjusting the training parameters during training, updating the model to achieve the best recognition accuracy, and saving the obtained induced-bad-behavior recognition model for temporal features.
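The training in steps 3.1-3.3 fine-tunes an ImageNet-pretrained ResNet152, which requires a deep-learning framework; as a self-contained illustration of the loop shape only (iterate over labelled samples, update parameters by gradient descent, and keep the parameters with the best accuracy), the sketch below substitutes a toy two-feature logistic regression for the network. Everything in it is a schematic assumption, not the patent's implementation.

```python
import math

# Schematic stand-in for the step 3 training loop: a toy logistic
# regression replaces ResNet152, but the loop shape is the same as
# described: iterate over labelled samples, update parameters by
# gradient descent on the loss, and keep the parameter set that
# achieves the best recognition accuracy.

def train(samples, labels, epochs=200, lr=0.5):
    w = [0.0, 0.0]
    b = 0.0
    best = (0.0, w[:], b)  # (accuracy, weights, bias)
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = w[0] * x[0] + w[1] * x[1] + b
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - y  # gradient of the log-loss with respect to z
            w[0] -= lr * g * x[0]
            w[1] -= lr * g * x[1]
            b -= lr * g
        # evaluate accuracy after each epoch and keep the best model
        acc = sum(
            (1.0 / (1.0 + math.exp(-(w[0] * x[0] + w[1] * x[1] + b))) >= 0.5)
            == (y == 1)
            for x, y in zip(samples, labels)) / len(samples)
        if acc > best[0]:
            best = (acc, w[:], b)
    return best
```

In the patent's setting the "samples" would be RGB frames (spatial stream) or optical flow graphs (temporal stream), and "keep the best" corresponds to saving the model checkpoint with the highest recognition accuracy.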
Compared with the prior art, the invention divides a long video into several short segments, fuses the space-time features, and fuses the recognition results of the segmented videos, ensuring that induced bad behavior mixed into a conventional continuous action sequence is recognized promptly and effectively, and greatly improving recognition accuracy in complex situations.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention;
FIG. 2 is a system block diagram according to an embodiment of the present invention;
FIG. 3 is a flowchart of a spatial stream training method according to an embodiment of the present invention;
fig. 4 is a flowchart of a time-stream training method according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the following examples in order to make the objects, technical solutions and advantages of the present invention more apparent. The specific embodiments described herein are for purposes of illustration only and are not intended to limit the invention.
In order to meet live broadcast platforms' supervision needs as broadcast counts grow, a detection service for induced bad behavior in live video is required that recognizes faster and more accurately than manual review, improving efficiency through automated tools and reducing dependence on human auditors. Judging whether live content contains induced bad behavior solely from the appearance features in RGB single-frame images yields low recognition accuracy. A video is a set of consecutive frames with temporal structure: besides the appearance features of each RGB frame, it provides additional temporal information, namely the motion of objects, which can be obtained from the optical flow information stored in optical flow graphs. The optical flow of an image can be divided into an X direction, containing the horizontal components of the displacement vector field, and a Y direction, containing the vertical components. The optical flow graphs store the motion information of the two directions separately, and both can be obtained by computation between two adjacent single-frame images.
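The adjacent-frame computation described above can be illustrated with a minimal, dependency-free block-matching estimator. The patent does not name a flow algorithm (production systems typically use dense methods such as Farneback or TV-L1), so this is a hypothetical stand-in; it does, however, produce the X and Y displacement components stored separately, as in the description.

```python
# Minimal optical-flow sketch between two adjacent grayscale frames
# (2-D lists of pixel intensities). For each block of the first frame
# it searches a small neighbourhood of the second frame for the best
# match, yielding per-block X (horizontal) and Y (vertical)
# displacement components, kept in two separate maps.

def block_matching_flow(prev, nxt, block=4, search=2):
    h, w = len(prev), len(prev[0])
    flow_x, flow_y = [], []
    for by in range(0, h - block + 1, block):
        row_x, row_y = [], []
        for bx in range(0, w - block + 1, block):
            best = (float("inf"), 0, 0)  # (SAD, dx, dy)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    if not (0 <= by + dy and by + dy + block <= h
                            and 0 <= bx + dx and bx + dx + block <= w):
                        continue
                    # sum of absolute differences between the block in
                    # `prev` and the displaced block in `nxt`
                    sad = sum(
                        abs(prev[by + y][bx + x] - nxt[by + dy + y][bx + dx + x])
                        for y in range(block) for x in range(block))
                    if sad < best[0]:
                        best = (sad, dx, dy)
            row_x.append(best[1])
            row_y.append(best[2])
        flow_x.append(row_x)
        flow_y.append(row_y)
    return flow_x, flow_y
```

For a bright region that shifts one pixel to the right between frames, the X map reports a displacement of 1 and the Y map reports 0 for the blocks covering that region.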
The space-time features are obtained by feeding the RGB single-frame images and the optical flow graphs into convolutional neural networks for feature extraction, yielding the spatial features of the RGB frames and the temporal features contained in the optical flow graphs. The space-time combination considers both the prediction obtained from the temporal features and the prediction obtained from the spatial features: using average fusion, the two prediction results are summed and averaged to obtain the fused data, which is compared with a preset value to output the final judgment.
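The average (late) fusion described above can be sketched as follows. The spatial-stream and temporal-stream models each output class scores for one segment; the scores are turned into probabilities, averaged, and the fused probability of the "bad behavior" class is compared with a preset value. The two-class layout, the class names and the 0.5 threshold are illustrative assumptions, not taken from the patent.

```python
import math

def softmax(scores):
    """Convert raw class scores to probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def fuse_and_judge(spatial_scores, temporal_scores, bad_index=1, threshold=0.5):
    """Average the two streams' probabilities and judge one segment."""
    p_spatial = softmax(spatial_scores)
    p_temporal = softmax(temporal_scores)
    fused = [(a + b) / 2 for a, b in zip(p_spatial, p_temporal)]
    label = "bad behavior" if fused[bad_index] >= threshold else "normal"
    return label, fused
```

A segment whose temporal stream strongly indicates a violating motion can tip the fused score past the threshold even when the spatial stream alone is uncertain, which is the point of combining the two features.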
The video to be identified may be long and may contain many actions; accurately and efficiently recognizing induced bad behavior within such a continuous action sequence is the key problem addressed by this embodiment.
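The long-video handling that the embodiment builds on, equal-length cutting with unified naming followed by an OR-style fusion of per-segment verdicts (step 9), can be sketched as follows; the naming pattern and the label strings are illustrative assumptions, not from the patent.

```python
# Cut a long video into equal-length segments with uniformly formatted,
# zero-padded names so segment order stays continuous and readable,
# then derive the whole-video verdict from the per-segment predictions.

def segment_names(video_id, total_seconds, segment_seconds=3):
    names = []
    start = 0
    index = 0
    while start < total_seconds:
        end = min(start + segment_seconds, total_seconds)
        names.append(f"{video_id}_seg{index:04d}_{start}s-{end}s")
        start = end
        index += 1
    return names

def video_verdict(segment_predictions):
    # OR-style fusion: one "bad behavior" segment flags the whole video
    return ("bad behavior"
            if any(p == "bad behavior" for p in segment_predictions)
            else "normal")
```

For example, an 8-second capture cut into 3-second segments yields three named segments, and the video is flagged as soon as any one of them is predicted as "bad behavior".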
Specifically, as shown in fig. 2, this embodiment describes in detail a detection system for induced bad behavior in a live broadcast process, whose modules cooperate to carry out the detection work; the detection system of this embodiment includes:
the video set processing module, the output end of which is connected with the input end of the spatial feature processing module, and which processes the video set contents: obtaining the confirmed violation video cases of induced bad behavior stored in the live broadcast platform database, and capturing real-time live content to be stored as videos to be identified; segmenting the video cases confirmed as induced bad behavior and labelling each segmented video with its violation type; and dividing a long video to be identified into several short videos of equal length, naming the divided short videos in a unified format so that the continuity and readability of the segmented videos are preserved;
the spatial feature processing module comprises:
the input end of the RGB single-frame image intercepting sub-module is connected with the output end of the video set processing module, and the output end of the RGB single-frame image intercepting sub-module is respectively connected with the input ends of the spatial feature model processing sub-module and the temporal feature processing module; the RGB single-frame image intercepting submodule is used for intercepting an RGB single-frame image of a video single frame of the processed short-duration video;
the output end of the spatial feature model processing sub-module is connected with the input end of the fusion module; the spatial feature model processing sub-module extracts spatial features from the intercepted RGB single-frame images, inputs the extracted spatial features into an induced-bad-behavior recognition model for spatial features, and outputs a prediction result;
the time feature processing module comprises:
the input end of the optical flow diagram synthesis sub-module is connected with the output end of the RGB single-frame image intercepting sub-module, and its output end is connected with the input end of the time feature model processing sub-module; the optical flow diagram synthesis sub-module computes the instantaneous optical flow between two RGB single-frame images adjacent in time sequence and synthesizes an optical flow graph;
the output end of the time feature model processing sub-module is connected with the input end of the fusion module; the time feature model processing sub-module extracts temporal features from the synthesized optical flow graph, inputs them into the induced-bad-behavior recognition model for temporal features, and outputs a prediction result;
the fusion module comprises:
the input end of the space-time characteristic fusion submodule is respectively connected with the output ends of the space characteristic model processing submodule and the time characteristic model processing submodule, and the space-time characteristic fusion submodule is used for fusing the obtained prediction result of the induction bad behavior recognition model aiming at the space characteristic with the prediction result of the induction bad behavior recognition model aiming at the time characteristic to obtain data fused with the space characteristic and the time characteristic, and the fused data is subjected to classification processing to obtain a prediction result of a segmented video;
the prediction result fusion submodule is used for carrying out fusion calculation on the prediction results of the plurality of segmented videos after the prediction of all the segmented videos is completed, so as to obtain a final prediction result, wherein the final prediction result is the identification result of the long-duration video obtained from the live broadcast server.
The method for detecting induced bad behavior in a live broadcast process using the above detection system is shown in fig. 1 and specifically comprises the following steps:
step 1: extracting and processing the violation video cases stored in the live broadcast platform video database, dividing a selected target video into a plurality of short videos containing the induced bad behavior, and recording the violation type label of the induced bad behavior;
step 2: processing the video to obtain an RGB single frame image and an optical flow image of the video:
step 2.1: obtaining RGB single-frame images of the video segments: performing frame extraction on the video segments and, according to the video frame rate, extracting all RGB single-frame images contained in the video;
step 2.2: performing optical flow information processing on the RGB single-frame images, and synthesizing an optical flow image through calculation between two adjacent single-frame images;
step 2.3: processing the obtained RGB single frame image and optical flow image, and storing the same type of induced bad behaviors together;
step 3: training a model for identifying spatial features and a model for identifying temporal features by using space-time features in an RGB single-frame image and an optical flow image respectively to obtain an induced bad behavior identification model for the spatial features and an induced bad behavior identification model for the temporal features, as shown in fig. 3 and 4:
step 3.1: loading a convolutional neural network model ResNet152 pre-trained by an ImageNet training set;
step 3.2: training the ResNet152 model in a targeted manner by using the processed RGB single frame image and the marked video label, continuously adjusting training parameters, updating the model to achieve the best model identification accuracy, and storing the obtained induced bad behavior identification model aiming at the spatial characteristics, as shown in figure 3;
step 3.3: training the ResNet152 model on the processed optical flow graphs and the marked video labels, adjusting the training parameters during training, updating the model to achieve the best recognition accuracy, and saving the obtained induced-bad-behavior recognition model for temporal features, as shown in fig. 4;
step 4: acquiring live video segments: obtaining the real-time live broadcast cache from the live broadcast platform server and cutting it into multiple live video segments of 2-3 seconds each;
step 5: repeating the content of the step 2 aiming at the live video segment cut in the step 4 to obtain an RGB single-frame image and an optical flow image of the live video segment;
step 6: randomly selecting an RGB single frame image obtained in the step 5, putting the RGB single frame image into the induction bad behavior recognition model aiming at the space characteristics obtained in the step 3, and outputting a prediction result;
step 7: putting the optical flow diagram obtained in the step 5 into the induced bad behavior recognition model aiming at the time characteristics obtained in the step 3, and outputting a prediction result;
step 8: fusing the data obtained in step 6 and step 7 by the average-value fusion method, then outputting and judging the fused result to obtain the prediction result of the video segment;
step 9: and fusing prediction results of the multiple video segments divided by the long-duration video, and if the prediction result of at least one video segment is ' bad behavior ', judging that the current video to be identified has inductivity bad behavior '.
In summary, the invention divides a long video into several short segments and, through spatio-temporal feature fusion and fusion of the recognition results of the segmented videos, ensures that inductive bad behavior mixed into an otherwise normal continuous action sequence can be recognized promptly and effectively, greatly improving recognition accuracy in complex conditions.
The technical scheme of the invention is not limited to the above specific embodiments; all technical modifications made according to the technical scheme of the invention fall within the protection scope of the invention.

Claims (6)

1. A detection system for induced adverse behavior in a live broadcast process, comprising:
the output end of the video set processing module is connected with the input end of the spatial feature processing module, and the video set processing module is used for processing video set contents, including obtaining the violating induced bad behavior video cases stored in the live broadcast platform database and capturing real-time live broadcast content to be stored as videos to be identified; segmenting the video cases confirmed to contain induced bad behavior and marking each segmented video with its violation-type label; dividing the long-duration video to be identified into multiple short-duration videos of equal length, and naming the divided short-duration videos in a unified format to ensure the continuity and readability of the multiple segmented videos;
the output end of the spatial feature processing module is respectively connected with the input ends of the temporal feature processing module and the fusion module, and the spatial feature processing module is used for intercepting RGB single-frame images from the processed short-duration video segments, extracting spatial features from the intercepted RGB single-frame images, inputting the spatial features into the induced bad behavior recognition model aiming at the spatial characteristics, and outputting a prediction result;
the output end of the time feature processing module is connected with the input end of the fusion module, and the time feature processing module is used for calculating the instantaneous optical flow information between two temporally adjacent intercepted RGB single-frame images, synthesizing an optical flow graph, extracting time features from the synthesized optical flow graph, inputting the time features into the induced bad behavior recognition model aiming at the time characteristics, and outputting a prediction result;
the fusion module is used for fusing the obtained prediction result of the induction bad behavior recognition model aiming at the spatial features with the prediction result of the induction bad behavior recognition model aiming at the time features to obtain data fused with the spatial features and the time features, classifying the fused data to obtain a prediction result of one segmented video, and carrying out fusion calculation on the prediction results of a plurality of segmented videos after the prediction of all the segmented videos is completed to obtain a final prediction result, wherein the final prediction result is a long-duration video recognition result obtained from a live broadcast server;
the original models of the induced bad behavior recognition model aiming at the spatial characteristics and the induced bad behavior recognition model aiming at the time characteristics are convolutional neural network model ResNet152; the specific processes of the induction bad behavior recognition model aiming at the spatial characteristics and the induction bad behavior recognition model aiming at the time characteristics are as follows:
step 3.1: loading a convolutional neural network model ResNet152 pre-trained by an ImageNet training set;
step 3.2: training a ResNet152 model in a targeted manner by using the processed RGB single frame image and the marked video tag, continuously adjusting training parameters, updating the model to achieve the best model identification accuracy, and storing the obtained induced bad behavior identification model aiming at the spatial characteristics;
step 3.3: the ResNet152 model is trained in a targeted mode by using the processed light flow graph and the processed video label, training parameters in the training process are adjusted, the model is updated to achieve the best model identification accuracy, and the obtained induced bad behavior identification model aiming at time characteristics is stored.
2. The system for detecting induced adverse behavior in a live broadcast procedure according to claim 1, wherein the spatial signature processing module comprises:
the input end of the RGB single-frame image intercepting sub-module is connected with the output end of the video set processing module, and the output end of the RGB single-frame image intercepting sub-module is respectively connected with the input ends of the spatial feature model processing sub-module and the temporal feature processing module; the RGB single-frame image intercepting sub-module is used for intercepting RGB single-frame images from the processed short-duration videos;
the output end of the spatial feature model processing sub-module is connected with the input end of the fusion module, and the spatial feature model processing sub-module is used for extracting spatial features from the intercepted RGB single-frame images, inputting them into the induced bad behavior recognition model aiming at the spatial characteristics, and outputting a prediction result.
3. The system for detecting induced adverse behavior in a live broadcast procedure according to claim 2, wherein the temporal feature processing module comprises:
the input end of the optical flow graph synthesizing sub-module is connected with the output end of the RGB single-frame image intercepting sub-module, the output end of the optical flow graph synthesizing sub-module is connected with the input end of the time feature model processing sub-module, and the optical flow graph synthesizing sub-module is used for calculating between two temporally adjacent intercepted RGB single-frame images to obtain instantaneous optical flow information and synthesize an optical flow graph;
the output end of the time feature model processing sub-module is connected with the input end of the fusion module, and the time feature model processing sub-module is used for extracting time features from the synthesized optical flow graph, inputting them into the induced bad behavior recognition model aiming at the time characteristics, and outputting a prediction result.
4. The system for detecting induced adverse behavior in a live procedure according to claim 3, wherein the fusion module comprises:
the input end of the space-time characteristic fusion submodule is respectively connected with the output ends of the space characteristic model processing submodule and the time characteristic model processing submodule, and the space-time characteristic fusion submodule is used for fusing the obtained prediction result of the induction bad behavior recognition model aiming at the space characteristic with the prediction result of the induction bad behavior recognition model aiming at the time characteristic to obtain data fused with the space characteristic and the time characteristic, and the fused data is subjected to classification processing to obtain a prediction result of a segmented video;
the prediction result fusion submodule is used for carrying out fusion calculation on the prediction results of the segmented videos after the prediction of all the segmented videos is completed, so as to obtain a final prediction result, wherein the final prediction result is a long-duration video identification result obtained from the live broadcast server.
5. A method for detecting induced bad behavior in a live broadcast process, characterized in that the method for detecting induced bad behavior in a live broadcast process by using the system for detecting induced bad behavior in a live broadcast process according to claim 4 comprises the following steps:
step 1: extracting and processing the violation video cases stored in the live broadcast platform video database, dividing the selected target videos into multiple short-duration videos containing the violating inductive behavior, and recording the violation-type label of each;
step 2: processing the video to obtain an RGB single frame image and an optical flow image of the video;
step 3: training a model for identifying spatial features and a model for identifying time features by using space-time features in an RGB single-frame image and an optical flow image respectively to obtain an induced bad behavior identification model for the spatial features and an induced bad behavior identification model for the time features; the original models of the induced bad behavior recognition model aiming at the spatial characteristics and the induced bad behavior recognition model aiming at the time characteristics are convolutional neural network model ResNet152; the specific processes of the induction bad behavior recognition model aiming at the spatial characteristics and the induction bad behavior recognition model aiming at the time characteristics are as follows:
step 3.1: loading a convolutional neural network model ResNet152 pre-trained by an ImageNet training set;
step 3.2: training a ResNet152 model in a targeted manner by using the processed RGB single frame image and the marked video tag, continuously adjusting training parameters, updating the model to achieve the best model identification accuracy, and storing the obtained induced bad behavior identification model aiming at the spatial characteristics;
step 3.3: training the ResNet152 model in a targeted manner by using the processed optical flow graphs and the processed video labels, adjusting the training parameters of the training process, updating the model to achieve the best recognition accuracy, and saving the obtained induced bad behavior recognition model aiming at the time characteristics;
step 4: acquiring live video segments, acquiring a real-time live broadcast buffer memory from a live broadcast platform server, and cutting the live broadcast buffer memory into live video segments with a multi-segment duration of 2-3 seconds;
step 5: repeating the content of the step 2 aiming at the live video segment cut in the step 4 to obtain an RGB single-frame image and an optical flow image of the live video segment;
step 6: randomly selecting an RGB single frame image obtained in the step 5, putting the RGB single frame image into the induction bad behavior recognition model aiming at the space characteristics obtained in the step 3, and outputting a prediction result;
step 7: putting the optical flow diagram obtained in the step 5 into the induced bad behavior recognition model aiming at the time characteristics obtained in the step 3, and outputting a prediction result;
step 8: carrying out data fusion on the data obtained in the step 6 and the step 7, fusing two results by an average value fusion method, outputting the fused results, and carrying out classification judgment on the fused results to obtain a prediction result of a certain video segment;
step 9: and fusing prediction results of the multiple video segments divided by the long-duration video, and if the prediction result of at least one video segment is ' bad behavior ', judging that the current video to be identified has inductivity bad behavior '.
6. The method for detecting induced adverse behavior in a live broadcast process according to claim 5, wherein the step 2 specifically comprises:
step 2.1: the method comprises the steps of obtaining an RGB single frame image of a video segment, extracting frames of the video segment, and extracting all RGB single frames contained in the video according to video frame rate characteristics;
step 2.2: performing optical flow information processing on the RGB single-frame images, and synthesizing an optical flow chart through calculation between two adjacent RGB single-frame images;
step 2.3: and processing the obtained RGB single frame images and the optical flow images, and storing the related RGB single frame images and the optical flow images of the same type of induced bad behaviors together.
CN202011279463.8A 2020-11-16 2020-11-16 Detection system and method for inductivity bad behavior in live broadcast process Active CN112380999B (en)
Publications (2)

Publication Number | Publication Date
CN112380999A (en) | 2021-02-19
CN112380999B (en) | 2023-08-01