CN112380999B - Detection system and method for inducible bad behavior in the live broadcast process - Google Patents
Detection system and method for inducible bad behavior in the live broadcast process
- Publication number: CN112380999B (application CN202011279463.8A / CN202011279463A)
- Authority
- CN
- China
- Prior art keywords
- video
- time
- bad behavior
- model
- module
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06V20/41 — Scenes; scene-specific elements in video content: higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
- G06F18/241 — Pattern recognition: classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/254 — Pattern recognition: fusion techniques of classification results, e.g. of results related to same input data
- G06N3/045 — Neural networks: combinations of networks
- G06N3/08 — Neural networks: learning methods
- G06T7/246 — Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
- G06T7/269 — Analysis of motion using gradient-based methods
- G06V20/46 — Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
- G06V20/49 — Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
- G06T2207/10016 — Image acquisition modality: video; image sequence
Abstract
The invention discloses a detection system and method for inducible bad behavior in the live broadcast process. The detection system comprises a video set processing module, a spatial feature processing module, a temporal feature processing module and a fusion module.
Description
Technical Field
The invention relates to the fields of computer-vision recognition and convolutional neural networks, and in particular to a system and method for detecting inducible bad behavior in the live broadcast process, i.e. for detecting whether inducible bad behaviors are mixed into an otherwise ordinary continuous action sequence during a live stream.
Background
With the development of information technology and the spread of intelligent hardware, especially mobile smart terminals, smartphones and handheld computers have gradually become the preferred office and entertainment devices. Online live-broadcast platforms have absorbed traditional offline social activities such as games, tea houses and talk-show theatres: a streamer who joins a platform can perform for the audience in real time and earn income from viewers' gifts. According to published figures, Chinese live-broadcast users number 433 million, 50.7% of all netizens, and in 2018 the leading platforms gained more than 2 million new streamers. The live-broadcast industry has formed a complete industrial chain combining software and hardware, and in July 2020 the human-resources authority recognized "live-stream salesperson" as one of the new occupations. The rapid growth in the number of streamers and the rapid development of the platforms have in turn driven a surge in broadcast sessions and in the platforms' total broadcast hours.
A streamer, whether to attract more viewing traffic to the broadcast room or out of personal habit, may perform inducible adverse actions on camera, such as smoking, self-harm or verbal abuse; if adolescent viewers imitate such behavior, their physical and mental health can be seriously harmed. These actions are interleaved with an ordinary continuous action sequence, vary in duration, and are therefore hard to spot. To police live content of mixed quality, traditional small platforms rely on irregular patrols by platform administrators and on viewer reports to identify and review a streamer's inducible bad behavior.
However, as broadcast sessions and durations grow rapidly, this traditional recognition approach depends heavily on patrol staff and aggravates the platform's operating burden. Manual auditing is also weak at recognizing fine details, so its accuracy on violations is low. In terms of efficiency, a reviewer must watch an entire video and repeatedly replay ambiguous segments, which is slow. Furthermore, because the platform offers streamers an appeal channel against manual misjudgments of violating content, streamers and administrators may collude so that violations slip through or penalties are evaded.
A detection method with high recognition accuracy is therefore needed to determine whether inducible bad behavior occurs during a live broadcast.
Disclosure of Invention
The invention aims to solve the low recognition speed and low accuracy of existing detection methods, and provides a detection system and method for inducible bad behavior in the live broadcast process.
In order to achieve the above purpose, the invention is implemented according to the following technical scheme:
a detection system for induced adverse behavior in a live broadcast process, comprising:
the output end of the video set processing module is connected with the input end of the spatial feature processing module, and the video set processing module is used for processing video set contents, including obtaining illegal induction bad behavior video cases stored in a live broadcast platform database, and capturing real-time live broadcast contents to be stored as videos to be identified; dividing the video cases confirmed to be the induced bad behaviors, and marking each segmented video according to the rule-breaking type labels; dividing the long-duration video to be identified into a plurality of sections of short-duration videos according to an equal-length method, naming the divided short-duration videos according to a unified format, and ensuring the continuity and readability of a plurality of sectional videos;
the output end of the spatial feature processing module is respectively connected with the input ends of the temporal feature processing module and the fusion module, and the spatial feature processing module is used for carrying out video single frame interception RGB single frame images on the processed short-duration video, extracting spatial features from the intercepted RGB single frame images, inputting the spatial features into an induction bad behavior recognition model aiming at the spatial features and outputting a prediction result;
the output end of the time feature processing module is connected with the input end of the fusion module, and the time feature processing module is used for intercepting calculation between two RGB single-frame images with adjacent time sequences, obtaining instantaneous optical flow information through calculation and synthesizing an optical flow graph; extracting time features from the synthesized optical flow diagram, inputting the time features into an induced bad behavior recognition model aiming at the time features, and outputting a prediction result;
and the fusion module is used for fusing the obtained prediction result of the induction bad behavior recognition model aiming at the spatial characteristics with the prediction result of the induction bad behavior recognition model aiming at the time characteristics to obtain data fused with the spatial characteristics and the time characteristics, and classifying the fused data to obtain the prediction result of the segmented video. After the prediction is completed on all the segmented videos, the prediction results of the segmented videos are fused and calculated to obtain a final prediction result, and the final prediction result is a long-duration video identification result obtained from the live broadcast server.
Further, the spatial feature processing module comprises:
an RGB single-frame capture sub-module, whose input is connected to the output of the video set processing module and whose outputs are connected to the inputs of the spatial feature model processing sub-module and of the temporal feature processing module; the RGB single-frame capture sub-module captures RGB single-frame images from the processed short videos;
and a spatial feature model processing sub-module, whose output is connected to the input of the fusion module; it extracts spatial features from the captured RGB frames, feeds them into the inducible-bad-behavior recognition model for spatial features, and outputs a prediction result.
Further, the temporal feature processing module comprises:
an optical-flow synthesis sub-module, whose input is connected to the output of the RGB single-frame capture sub-module and whose output is connected to the input of the temporal feature model processing sub-module; it computes the instantaneous optical flow between two captured RGB frames adjacent in time and synthesizes optical-flow images;
and a temporal feature model processing sub-module, whose output is connected to the input of the fusion module; it extracts temporal features from the synthesized optical-flow images, feeds them into the inducible-bad-behavior recognition model for temporal features, and outputs a prediction result.
Further, the fusion module comprises:
a spatio-temporal feature fusion sub-module, whose inputs are connected to the outputs of the spatial and temporal feature model processing sub-modules; it fuses the spatial-feature model's prediction with the temporal-feature model's prediction into data combining both feature types, and classifies the fused data to obtain the prediction for one segment;
and a prediction-result fusion sub-module, which, once all segments have been predicted, fuses the per-segment predictions to yield the final prediction, i.e. the recognition result for the long video obtained from the live-broadcast server.
Further, the original model of both the spatial-feature and the temporal-feature recognition models is the convolutional neural network ResNet152.
In addition, the invention provides a method for detecting inducible bad behavior in the live broadcast process, using the above detection system and comprising the following steps:
step 1: extract and process the violation video cases stored in the live-broadcast platform's video database, split each selected target video into several short videos containing the inducible violation, and record the violation type label;
step 2: process the videos to obtain their RGB single-frame images and optical-flow images;
step 3: train one model on the spatial features of the RGB frames and one on the temporal features of the optical-flow images, obtaining an inducible-bad-behavior recognition model for spatial features and one for temporal features;
step 4: acquire live video segments: fetch the real-time broadcast cache from the platform server and cut it into segments of 2-3 seconds each;
step 5: repeat step 2 on the segments cut in step 4 to obtain their RGB single-frame images and optical-flow images;
step 6: randomly select an RGB frame obtained in step 5, feed it into the spatial-feature recognition model obtained in step 3, and output a prediction;
step 7: feed the optical-flow images obtained in step 5 into the temporal-feature recognition model obtained in step 3 and output a prediction;
step 8: fuse the data obtained in steps 6 and 7 by averaging the two results, output and evaluate the fused result, and obtain the prediction for that segment;
step 9: fuse the predictions of all segments of the long video; if at least one segment is predicted as "bad behavior", judge that the video under examination contains inducible bad behavior.
Further, step 2 specifically comprises:
step 2.1: obtain the RGB single-frame images of each video segment by frame extraction, extracting every frame according to the video's frame-rate characteristics;
step 2.2: perform optical-flow processing on the RGB frames, synthesizing an optical-flow image from each pair of adjacent frames;
step 2.3: organize the resulting RGB frames and optical-flow images, storing samples of the same type of inducible bad behavior together.
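Steps 2.1-2.3 reduce to plain data handling once frames are decoded; a minimal sketch follows. The actual frame decoding would typically use a library such as OpenCV (not shown here), and the function names are illustrative, not part of the patent:

```python
from collections import defaultdict

def pair_adjacent_frames(frames):
    """Step 2.2 preparation: yield (frame_t, frame_t+1) pairs; each pair
    later produces one optical-flow image.  `frames` is any ordered
    sequence of decoded RGB frames (e.g. read with cv2.VideoCapture)."""
    return list(zip(frames, frames[1:]))

def group_by_label(samples):
    """Step 2.3: store samples of the same violation type together.
    `samples` is a list of (label, sample_id) tuples."""
    groups = defaultdict(list)
    for label, sample_id in samples:
        groups[label].append(sample_id)
    return dict(groups)
```

For a clip of N frames, `pair_adjacent_frames` yields N-1 pairs, matching the one-flow-image-per-adjacent-pair rule of step 2.2.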
Further, step 3 specifically comprises:
step 3.1: load the convolutional neural network ResNet152 pre-trained on the ImageNet training set;
step 3.2: fine-tune the ResNet152 model with the processed RGB frames and the marked video labels, continually adjusting the training parameters and updating the model until it reaches its best recognition accuracy, and save the resulting spatial-feature recognition model;
step 3.3: fine-tune the ResNet152 model with the processed optical-flow images and the video labels, adjusting the training parameters and updating the model until it reaches its best recognition accuracy, and save the resulting temporal-feature recognition model.
Compared with the prior art, the invention divides a long video into several short segments, fuses spatio-temporal features, and fuses the recognition results of the segments, ensuring that inducible bad behaviors mixed into an ordinary continuous action sequence are recognized promptly and effectively and greatly improving recognition accuracy in complex conditions.
Drawings
FIG. 1 is a flow chart of a method according to an embodiment of the present invention;
FIG. 2 is a system block diagram according to an embodiment of the present invention;
FIG. 3 is a flowchart of a spatial stream training method according to an embodiment of the present invention;
fig. 4 is a flowchart of a time-stream training method according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the following examples. The specific embodiments described here serve only to illustrate the invention and are not intended to limit it.
To meet the supervision needs of live-broadcast platforms facing ever more broadcasts, a detection service for inducible bad behavior in live video is required that recognizes faster and more accurately: recognition efficiency must be raised with automated tools and dependence on manual auditing reduced. Judging live content only from the appearance features of RGB single frames yields low recognition accuracy. A video is a set of consecutive frames with a temporal dimension: besides the appearance features of each RGB frame, it carries additional temporal information, namely the motion of objects, which can be captured by the optical-flow information stored in optical-flow images. An image's optical flow can be split into an X component, the horizontal part of the displacement vector field, and a Y component, the vertical part. The optical-flow images store the motion in these two directions separately, and both can be computed from each pair of adjacent single frames.
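The X/Y decomposition can be illustrated with a toy Lucas-Kanade estimator that recovers one global displacement vector from two synthetic frames. A production system would instead compute a dense flow field, e.g. with OpenCV's Farneback or TV-L1 implementations; everything below is a simplified sketch:

```python
import numpy as np

def global_flow(prev, nxt):
    """Least-squares Lucas-Kanade for a single global (u, v) displacement:
    minimise sum((Ix*u + Iy*v + It)**2) over all pixels."""
    Iy, Ix = np.gradient(prev.astype(float))     # vertical / horizontal gradients
    It = nxt.astype(float) - prev.astype(float)  # temporal difference
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    u, v = np.linalg.lstsq(A, -It.ravel(), rcond=None)[0]
    return u, v                                  # X and Y flow components

# Two synthetic frames: a Gaussian blob that moves 1 pixel to the right.
ys, xs = np.mgrid[0:64, 0:64]
frame = lambda cx: np.exp(-((xs - cx) ** 2 + (ys - 32) ** 2) / (2 * 3.0 ** 2))
u, v = global_flow(frame(30), frame(31))
# u should come out close to +1 (horizontal motion), v close to 0.
```

A dense method produces one (u, v) pair per pixel; storing the u map and the v map as two separate grayscale images gives exactly the X-direction and Y-direction optical-flow images the description refers to.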
The spatio-temporal features are obtained by feeding the RGB single frames and the optical-flow images into convolutional neural networks for feature extraction, yielding the spatial features of the RGB frames and the temporal features contained in the optical-flow images. Spatio-temporal fusion considers the prediction obtained from the temporal features together with the prediction obtained from the spatial features: with the average-fusion method, the two predictions are summed and averaged, the fused value is compared with a preset threshold, and the final judgment is output.
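The average-fusion rule just described amounts to a few lines; the 0.5 threshold below is only a placeholder for the patent's unspecified preset value:

```python
def fuse_scores(spatial_score, temporal_score, threshold=0.5):
    """Average-fuse the two streams' 'bad behavior' scores (sum, then
    halve) and compare the result with a preset threshold to obtain the
    final judgment for one segment."""
    fused = (spatial_score + temporal_score) / 2.0
    return fused, fused >= threshold

fused, is_bad = fuse_scores(0.9, 0.4)  # averages to 0.65, above the threshold
```

The same function works whether the streams emit calibrated probabilities or raw softmax scores, as long as both are on the same scale.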
A video to be identified may be long and contain many actions; recognizing inducible bad behavior accurately and efficiently within such a continuous action sequence is the key problem addressed by this embodiment.
Specifically, as shown in fig. 2, this embodiment describes the detection system for inducible bad behavior in the live broadcast process in detail; its modules cooperate to carry out the detection work. The detection system of this embodiment comprises:
a video set processing module, whose output is connected to the input of the spatial feature processing module, and which processes the video set content: it obtains confirmed video cases of inducible bad behavior stored in the live-broadcast platform database and captures real-time live content to be stored as videos to be identified; it segments the confirmed violation cases and labels each segment with its violation type; and it divides each long video to be identified into several equal-length short videos, naming the segments in a uniform format so that the sequence remains contiguous and readable;
the spatial feature processing module comprises:
the input end of the RGB single-frame image intercepting sub-module is connected with the output end of the video set processing module, and the output end of the RGB single-frame image intercepting sub-module is respectively connected with the input ends of the spatial feature model processing sub-module and the temporal feature processing module; the RGB single-frame image intercepting submodule is used for intercepting an RGB single-frame image of a video single frame of the processed short-duration video;
the output end of the spatial feature model processing sub-module is connected with the input end of the fusion module, and the spatial feature model processing sub-module is used for extracting spatial features from the intercepted RGB single-frame image, inputting the extracted spatial features into an induction bad behavior recognition model aiming at the spatial features and outputting a prediction result;
the time feature processing module comprises:
the input end of the optical flow diagram synthesis submodule is connected with the output end of the RGB single-frame diagram interception submodule, the output end of the optical flow diagram synthesis submodule is connected with the input end of the time characteristic model processing submodule, and the optical flow diagram synthesis submodule is used for calculating between two frames of RGB single-frame diagrams adjacent in time sequence to obtain an instantaneous optical flow information synthesis optical flow diagram;
the output end of the time feature model processing submodule is connected with the input end of the fusion module, and the time feature model processing submodule is used for extracting time features from the synthesized optical flow graph, inputting the time features into the induction bad behavior recognition model aiming at the time features and outputting a prediction result;
the fusion module comprises:
the input end of the space-time characteristic fusion submodule is respectively connected with the output ends of the space characteristic model processing submodule and the time characteristic model processing submodule, and the space-time characteristic fusion submodule is used for fusing the obtained prediction result of the induction bad behavior recognition model aiming at the space characteristic with the prediction result of the induction bad behavior recognition model aiming at the time characteristic to obtain data fused with the space characteristic and the time characteristic, and the fused data is subjected to classification processing to obtain a prediction result of a segmented video;
the prediction result fusion submodule is used for carrying out fusion calculation on the prediction results of the plurality of segmented videos after the prediction of all the segmented videos is completed, so as to obtain a final prediction result, wherein the final prediction result is the identification result of the long-duration video obtained from the live broadcast server.
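The equal-length division and uniform naming performed by the video set processing module can be sketched as follows; actual cutting of the media would be delegated to a tool such as ffmpeg, and the naming format, default duration and function name are illustrative assumptions:

```python
def plan_segments(total_seconds, seg_seconds=2.5, stream_id="room001"):
    """Plan equal-length segments (the method uses 2-3 s clips) with
    uniform, zero-padded ordered names so the pieces stay contiguous
    and readable.  Returns (name, start, end) tuples."""
    segments, start, index = [], 0.0, 0
    while start < total_seconds:
        end = min(start + seg_seconds, total_seconds)
        segments.append((f"{stream_id}_{index:05d}.mp4", start, end))
        start, index = end, index + 1
    return segments
```

For example, `plan_segments(10.0)` yields four contiguous 2.5-second segments whose names sort in playback order, which is what keeps the segmented sequence reconstructible.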
The method for detecting inducible bad behavior in the live broadcast process using the above detection system is shown in fig. 1 and specifically comprises the following steps:
step 1: extracting and processing the violation video cases stored in the live broadcast platform video database, selecting a target video to be divided into a plurality of short-time long videos containing the violation inducibility behavior, and recording the type label of the violation inducibility behavior;
step 2: processing the video to obtain an RGB single frame image and an optical flow image of the video:
step 2.1: the method comprises the steps of obtaining RGB single-frame images of video segments, extracting frames of the video segments, and extracting all RGB single-frame images contained in the video according to video frame rate characteristics;
step 2.2: performing optical flow information processing on the RGB single-frame images, and synthesizing an optical flow image through calculation between two adjacent single-frame images;
step 2.3: processing the obtained RGB single frame image and optical flow image, and storing the same type of induced bad behaviors together;
step 3: training a model for identifying spatial features and a model for identifying temporal features by using space-time features in an RGB single-frame image and an optical flow image respectively to obtain an induced bad behavior identification model for the spatial features and an induced bad behavior identification model for the temporal features, as shown in fig. 3 and 4:
step 3.1: loading a convolutional neural network model ResNet152 pre-trained by an ImageNet training set;
step 3.2: training the ResNet152 model in a targeted manner by using the processed RGB single frame image and the marked video label, continuously adjusting training parameters, updating the model to achieve the best model identification accuracy, and storing the obtained induced bad behavior identification model aiming at the spatial characteristics, as shown in figure 3;
step 3.3: training the ResNet152 model in a targeted manner by using the processed light flow graph and the processed video tag, adjusting training parameters of the training process, updating the model to achieve the best model identification accuracy, and storing the obtained induced bad behavior identification model aiming at the time characteristic, as shown in fig. 4;
step 4: acquiring live video segments: acquiring the real-time live broadcast cache from the live broadcast platform server, and cutting the cache into multiple live video segments, each 2-3 seconds in duration;
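The 2-3 second segmentation of step 4 amounts to simple frame arithmetic. A minimal sketch follows; the `split_cache` helper and its remainder-merging policy are assumptions, since the patent does not say how a trailing fragment shorter than one segment is handled.

```python
def split_cache(total_frames: int, fps: int, seg_seconds: float = 2.5):
    """Split a live-stream cache of total_frames frames into (start, end)
    frame ranges of roughly seg_seconds each (2-3 s in the patent).

    A trailing remainder shorter than half a segment is merged into the
    last segment so no content is dropped (an assumed policy).
    """
    seg_len = int(fps * seg_seconds)
    starts = list(range(0, total_frames, seg_len))
    segments = [(s, min(s + seg_len, total_frames)) for s in starts]
    if len(segments) > 1 and segments[-1][1] - segments[-1][0] < seg_len // 2:
        last = segments.pop()
        prev = segments.pop()
        segments.append((prev[0], last[1]))   # absorb the short tail
    return segments
```

For a 25-second cache at 30 fps this yields ten 75-frame segments; each segment then goes through step 2's frame extraction and optical-flow synthesis.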
step 5: repeating the content of the step 2 aiming at the live video segment cut in the step 4 to obtain an RGB single-frame image and an optical flow image of the live video segment;
step 6: randomly selecting an RGB single frame image obtained in the step 5, putting the RGB single frame image into the induction bad behavior recognition model aiming at the space characteristics obtained in the step 3, and outputting a prediction result;
step 7: putting the optical flow diagram obtained in the step 5 into the induced bad behavior recognition model aiming at the time characteristics obtained in the step 3, and outputting a prediction result;
step 8: performing data fusion on the results obtained in step 6 and step 7: fusing the two results by the average-value fusion method, outputting the fused result, judging the fused result, and obtaining the prediction result of the video segment;
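The average-value fusion of step 8, applied to the two streams' class-probability vectors, can be sketched as follows; the function name and example label set are illustrative, not from the patent.

```python
def mean_fusion(spatial_probs, temporal_probs, labels):
    """Average-value fusion of the spatial-stream and temporal-stream
    class-probability vectors; returns the fused vector and the label
    with the highest fused score (the segment's prediction)."""
    fused = [(s + t) / 2.0 for s, t in zip(spatial_probs, temporal_probs)]
    best = max(range(len(fused)), key=fused.__getitem__)
    return fused, labels[best]
```

For example, a spatial prediction of [0.7, 0.2, 0.1] and a temporal prediction of [0.4, 0.5, 0.1] fuse to [0.55, 0.35, 0.1], so the first label wins even though the temporal stream alone preferred the second.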
step 9: fusing the prediction results of the multiple video segments divided from the long-duration video; if the prediction result of at least one video segment is "bad behavior", it is judged that the current video to be identified contains inductive bad behavior.
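Step 9's decision rule, flagging the long video if any segment is flagged, is a simple OR over segment predictions; the helper name is illustrative.

```python
def video_verdict(segment_predictions):
    """Fuse per-segment predictions for one long video: the video is
    flagged as containing inductive bad behavior if at least one
    segment is predicted as "bad behavior" (step 9's rule)."""
    if any(p == "bad behavior" for p in segment_predictions):
        return "bad behavior"
    return "normal"
```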
In summary, the invention divides a long video into multiple short segments and, through spatio-temporal feature fusion and fusion of the recognition results of the segmented videos, ensures that inductive bad behavior mixed into conventional continuous action sequences can be identified promptly and effectively, greatly improving recognition accuracy in complex conditions.
The technical solution of the invention is not limited to the specific embodiments described above; all technical modifications made according to the technical solution of the invention fall within the protection scope of the invention.
Claims (6)
1. A detection system for induced adverse behavior in a live broadcast process, comprising:
the output end of the video set processing module is connected with the input end of the spatial feature processing module, and the video set processing module is used for processing video set contents, including obtaining the illegal induced bad behavior video cases stored in the live broadcast platform database, and capturing real-time live broadcast content to be stored as the video to be identified; dividing the video cases confirmed to contain induced bad behaviors into segments, and marking each segmented video with its violation type label; dividing the long-duration video to be identified into multiple short-duration video segments of equal length, and naming the divided short-duration videos in a unified format to ensure the continuity and readability of the multiple segmented videos;
the output end of the spatial feature processing module is respectively connected with the input ends of the temporal feature processing module and the fusion module, and the spatial feature processing module is used for intercepting RGB single-frame images from the processed short-duration video segments, extracting spatial features from the intercepted RGB single-frame images, inputting the spatial features into the induced bad behavior recognition model aiming at the spatial features, and outputting a prediction result;
the output end of the time feature processing module is connected with the input end of the fusion module, and the time feature processing module is used for calculating between two RGB single-frame images adjacent in the intercepted time sequence to obtain instantaneous optical flow information, synthesizing an optical flow graph, extracting time features from the synthesized optical flow graph, inputting the time features into the induced bad behavior recognition model aiming at the time features, and outputting a prediction result;
the fusion module is used for fusing the obtained prediction result of the induction bad behavior recognition model aiming at the spatial features with the prediction result of the induction bad behavior recognition model aiming at the time features to obtain data fused with the spatial features and the time features, classifying the fused data to obtain a prediction result of one segmented video, and carrying out fusion calculation on the prediction results of a plurality of segmented videos after the prediction of all the segmented videos is completed to obtain a final prediction result, wherein the final prediction result is a long-duration video recognition result obtained from a live broadcast server;
the original models of the induced bad behavior recognition model aiming at the spatial characteristics and the induced bad behavior recognition model aiming at the time characteristics are convolutional neural network model ResNet152; the specific processes of the induction bad behavior recognition model aiming at the spatial characteristics and the induction bad behavior recognition model aiming at the time characteristics are as follows:
step 3.1: loading a convolutional neural network model ResNet152 pre-trained by an ImageNet training set;
step 3.2: training a ResNet152 model in a targeted manner by using the processed RGB single frame image and the marked video tag, continuously adjusting training parameters, updating the model to achieve the best model identification accuracy, and storing the obtained induced bad behavior identification model aiming at the spatial characteristics;
step 3.3: the ResNet152 model is trained in a targeted manner by using the processed optical flow graph and the marked video label, the training parameters of the training process are adjusted, the model is updated to achieve the best model identification accuracy, and the obtained induced bad behavior identification model aiming at the time characteristics is stored.
2. The system for detecting induced adverse behavior in a live broadcast procedure according to claim 1, wherein the spatial signature processing module comprises:
the input end of the RGB single-frame image intercepting sub-module is connected with the output end of the video set processing module, and the output end of the RGB single-frame image intercepting sub-module is respectively connected with the input ends of the spatial feature model processing sub-module and the temporal feature processing module; the RGB single-frame image intercepting sub-module is used for intercepting RGB single-frame images from the processed short-duration video;
the output end of the spatial feature model processing sub-module is connected with the input end of the fusion module, and the spatial feature model processing sub-module is used for extracting spatial features from the intercepted RGB single-frame image, and then inputting the spatial features into the induction bad behavior recognition model aiming at the spatial features, and outputting a prediction result.
3. The system for detecting induced adverse behavior in a live broadcast procedure according to claim 2, wherein the temporal feature processing module comprises:
the input end of the optical flow diagram synthesizing sub-module is connected with the output end of the RGB single-frame diagram intercepting sub-module, the output end of the optical flow diagram synthesizing sub-module is connected with the input end of the time characteristic model processing sub-module, and the optical flow diagram synthesizing sub-module is used for calculating between two RGB single-frame diagrams adjacent in the intercepted time sequence, obtaining instantaneous optical flow information, and synthesizing an optical flow diagram;
the output end of the time feature model processing sub-module is connected with the input end of the fusion module, and the time feature model processing sub-module is used for extracting time features from the synthesized optical flow graph, and then inputting the time features into the induction bad behavior recognition model aiming at the time features, and outputting a prediction result.
4. The system for detecting induced adverse behavior in a live procedure according to claim 3, wherein the fusion module comprises:
the input end of the space-time characteristic fusion submodule is respectively connected with the output ends of the space characteristic model processing submodule and the time characteristic model processing submodule, and the space-time characteristic fusion submodule is used for fusing the obtained prediction result of the induction bad behavior recognition model aiming at the space characteristic with the prediction result of the induction bad behavior recognition model aiming at the time characteristic to obtain data fused with the space characteristic and the time characteristic, and the fused data is subjected to classification processing to obtain a prediction result of a segmented video;
the prediction result fusion submodule is used for carrying out fusion calculation on the prediction results of the segmented videos after the prediction of all the segmented videos is completed, so as to obtain a final prediction result, wherein the final prediction result is a long-duration video identification result obtained from the live broadcast server.
5. A method for detecting induced bad behavior in a live broadcast process, characterized in that the method for detecting induced bad behavior in a live broadcast process by using the system for detecting induced bad behavior in a live broadcast process according to claim 4 comprises the following steps:
step 1: extracting and processing the violation video cases stored in the live broadcast platform video database, selecting a target video and dividing it into a plurality of short-duration video segments containing the violating inductive behavior, and recording the type label of each violating inductive behavior;
step 2: processing the video to obtain an RGB single frame image and an optical flow image of the video;
step 3: training a model for identifying spatial features and a model for identifying time features by using space-time features in an RGB single-frame image and an optical flow image respectively to obtain an induced bad behavior identification model for the spatial features and an induced bad behavior identification model for the time features; the original models of the induced bad behavior recognition model aiming at the spatial characteristics and the induced bad behavior recognition model aiming at the time characteristics are convolutional neural network model ResNet152; the specific processes of the induction bad behavior recognition model aiming at the spatial characteristics and the induction bad behavior recognition model aiming at the time characteristics are as follows:
step 3.1: loading a convolutional neural network model ResNet152 pre-trained by an ImageNet training set;
step 3.2: training a ResNet152 model in a targeted manner by using the processed RGB single frame image and the marked video tag, continuously adjusting training parameters, updating the model to achieve the best model identification accuracy, and storing the obtained induced bad behavior identification model aiming at the spatial characteristics;
step 3.3: training the ResNet152 model in a targeted manner by using the processed optical flow graph and the marked video tag, adjusting the training parameters in the training process, updating the model to achieve the best model identification accuracy, and storing the obtained induced bad behavior identification model aiming at the time characteristic;
step 4: acquiring live video segments: acquiring the real-time live broadcast cache from the live broadcast platform server, and cutting the cache into multiple live video segments, each 2-3 seconds in duration;
step 5: repeating the content of the step 2 aiming at the live video segment cut in the step 4 to obtain an RGB single-frame image and an optical flow image of the live video segment;
step 6: randomly selecting an RGB single frame image obtained in the step 5, putting the RGB single frame image into the induction bad behavior recognition model aiming at the space characteristics obtained in the step 3, and outputting a prediction result;
step 7: putting the optical flow diagram obtained in the step 5 into the induced bad behavior recognition model aiming at the time characteristics obtained in the step 3, and outputting a prediction result;
step 8: performing data fusion on the results obtained in step 6 and step 7: fusing the two results by the average-value fusion method, outputting the fused result, performing classification judgment on the fused result, and obtaining the prediction result of the video segment;
step 9: fusing the prediction results of the multiple video segments divided from the long-duration video; if the prediction result of at least one video segment is "bad behavior", it is judged that the current video to be identified contains inductive bad behavior.
6. The method for detecting induced adverse behavior in a live broadcast process according to claim 5, wherein the step 2 specifically comprises:
step 2.1: obtaining the RGB single-frame images of the video segment: performing frame extraction on the video segment, and extracting all RGB single-frame images contained in the video according to the video frame rate;
step 2.2: performing optical flow processing on the RGB single-frame images, and synthesizing an optical flow graph by calculation between every two adjacent RGB single-frame images;
step 2.3: processing the obtained RGB single-frame images and optical flow images, and storing the RGB single-frame images and optical flow images related to the same type of induced bad behavior together.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011279463.8A CN112380999B (en) | 2020-11-16 | 2020-11-16 | Detection system and method for inductivity bad behavior in live broadcast process |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011279463.8A CN112380999B (en) | 2020-11-16 | 2020-11-16 | Detection system and method for inductivity bad behavior in live broadcast process |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112380999A CN112380999A (en) | 2021-02-19 |
CN112380999B true CN112380999B (en) | 2023-08-01 |
Family
ID=74585326
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011279463.8A Active CN112380999B (en) | 2020-11-16 | 2020-11-16 | Detection system and method for inductivity bad behavior in live broadcast process |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112380999B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113160273A (en) * | 2021-03-25 | 2021-07-23 | 常州工学院 | Intelligent monitoring video segmentation method based on multi-target tracking |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105550699A (en) * | 2015-12-08 | 2016-05-04 | 北京工业大学 | CNN-based video identification and classification method through time-space significant information fusion |
CN110909658A (en) * | 2019-11-19 | 2020-03-24 | 北京工商大学 | Method for recognizing human body behaviors in video based on double-current convolutional network |
CN110969066A (en) * | 2018-09-30 | 2020-04-07 | 北京金山云网络技术有限公司 | Live video identification method and device and electronic equipment |
CN111462183A (en) * | 2020-03-31 | 2020-07-28 | 山东大学 | Behavior identification method and system based on attention mechanism double-current network |
CN111709351A (en) * | 2020-06-11 | 2020-09-25 | 江南大学 | Three-branch network behavior identification method based on multipath space-time characteristic reinforcement fusion |
CN111783540A (en) * | 2020-06-01 | 2020-10-16 | 河海大学 | Method and system for recognizing human body behaviors in video |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105550699A (en) * | 2015-12-08 | 2016-05-04 | 北京工业大学 | CNN-based video identification and classification method through time-space significant information fusion |
CN110969066A (en) * | 2018-09-30 | 2020-04-07 | 北京金山云网络技术有限公司 | Live video identification method and device and electronic equipment |
CN110909658A (en) * | 2019-11-19 | 2020-03-24 | 北京工商大学 | Method for recognizing human body behaviors in video based on double-current convolutional network |
CN111462183A (en) * | 2020-03-31 | 2020-07-28 | 山东大学 | Behavior identification method and system based on attention mechanism double-current network |
CN111783540A (en) * | 2020-06-01 | 2020-10-16 | 河海大学 | Method and system for recognizing human body behaviors in video |
CN111709351A (en) * | 2020-06-11 | 2020-09-25 | 江南大学 | Three-branch network behavior identification method based on multipath space-time characteristic reinforcement fusion |
Non-Patent Citations (4)
Title |
---|
Deep Optical Flow Feature Fusion Based on 3D Convolutional Networks for Video Action Recognition;Tongwei Lu;《 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation》;1077-1080 * |
Human Activities Recognition from Videos Based on Compound Deep Neural Network;zhijian Liu等;《 The 10th International Conference on Computer Engineering and Networks 》;314-326 * |
Research on Video Behavior Recognition Methods Based on Deep Learning; Yang Bin; 《CNKI China Masters' Theses Full-text Database (Information Science and Technology)》(No. 12); I138-566 *
Video Behavior Recognition Algorithm Based on Deep Learning; Luo Weiyi; 《CNKI China Masters' Theses Full-text Database (Information Science and Technology)》(No. 7); I138-909 *
Also Published As
Publication number | Publication date |
---|---|
CN112380999A (en) | 2021-02-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
AU2016291690B2 (en) | Prediction of future views of video segments to optimize system resource utilization | |
US20170289624A1 (en) | Multimodal and real-time method for filtering sensitive media | |
CN110796098B (en) | Method, device, equipment and storage medium for training and auditing content auditing model | |
CN101535941B (en) | Method and device for adaptive video presentation | |
Janowski et al. | Quality assessment for a visual and automatic license plate recognition | |
Dou et al. | Edge computing-enabled deep learning for real-time video optimization in IIoT | |
CN113383362B (en) | User identification method and related product | |
CN111918130A (en) | Video cover determining method and device, electronic equipment and storage medium | |
CN109033476B (en) | Intelligent spatio-temporal data event analysis method based on event cue network | |
CN110692251B (en) | Method and system for combining digital video content | |
CN110856039A (en) | Video processing method and device and storage medium | |
CN115186303B (en) | Financial signature safety management method and system based on big data cloud platform | |
KR20160103557A (en) | Facilitating television based interaction with social networking tools | |
GB2550858A (en) | A method, an apparatus and a computer program product for video object segmentation | |
CN111553328A (en) | Video monitoring method, system and readable storage medium based on block chain technology and deep learning | |
CN112380999B (en) | Detection system and method for inductivity bad behavior in live broadcast process | |
CN110418148B (en) | Video generation method, video generation device and readable storage medium | |
CN111914649A (en) | Face recognition method and device, electronic equipment and storage medium | |
CN113920585A (en) | Behavior recognition method and device, equipment and storage medium | |
CN111565303B (en) | Video monitoring method, system and readable storage medium based on fog calculation and deep learning | |
CN114727093B (en) | Data analysis method and device, electronic equipment and computer storage medium | |
CN112560552A (en) | Video classification method and device | |
CN113596354B (en) | Image processing method, image processing device, computer equipment and storage medium | |
CN114842411A (en) | Group behavior identification method based on complementary space-time information modeling | |
CN114550300A (en) | Video data analysis method and device, electronic equipment and computer storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||