CN112380999A - System and method for detecting induced adverse behaviors in live broadcast process - Google Patents


Info

Publication number
CN112380999A
CN112380999A (application CN202011279463.8A)
Authority
CN
China
Prior art keywords
video
model
prediction result
live broadcast
spatial
Prior art date
Legal status
Granted
Application number
CN202011279463.8A
Other languages
Chinese (zh)
Other versions
CN112380999B (en)
Inventor
Zhang Bin (张斌)
Chen Yuqi (陈禹奇)
Liu Siyuan (刘思源)
Liu Ying (刘莹)
Current Assignee
Northeastern University China
Original Assignee
Northeastern University China
Priority date
Filing date
Publication date
Application filed by Northeastern University China
Priority claimed from CN202011279463.8A
Publication of CN112380999A
Application granted
Publication of CN112380999B
Legal status: Active
Anticipated expiration

Classifications

    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/254: Fusion techniques of classification results, e.g. of results related to same input data
    • G06N3/045: Combinations of networks
    • G06N3/08: Learning methods
    • G06T7/246: Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06T7/269: Analysis of motion using gradient-based methods
    • G06V20/46: Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G06V20/49: Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • G06T2207/10016: Video; Image sequence

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a detection system and method for induced adverse behaviors in a live broadcast process, comprising a video set processing module, a spatial feature processing module, a temporal feature processing module and a fusion module.

Description

System and method for detecting induced adverse behaviors in live broadcast process
Technical Field
The invention relates to the fields of computer vision recognition and convolutional neural networks, in particular to a detection system and method for induced adverse behaviors in a live broadcast process, used to detect whether induced adverse behaviors are mixed into the continuous action sequence of a live broadcast.
Background
With the development of information technology and the popularization of intelligent hardware, especially mobile intelligent terminals, smartphones and handheld computers have gradually become the preferred office and entertainment devices. Online live broadcast platforms have absorbed the roles of traditional offline talent agencies, teahouses and talk-show theaters: a host on such a platform can display talents in real time and earn income from gifts given by viewers. According to published data, the number of live streaming users in China has reached 433 million, accounting for 50.7% of all Internet users, and in 2018 alone a leading live broadcast platform gained more than 2 million new hosts. The live broadcast industry has formed a complete industry chain combining software and hardware, and in July 2020 the Ministry of Human Resources and Social Security listed "live-stream salesperson" as a new occupation. The dramatic growth in the number of network hosts and the rapid development of platforms have brought a surge in the number of live sessions and in the total live broadcast duration on each platform.
A host may exhibit induced adverse behaviors in the live broadcast room, such as smoking, self-harm or verbal abuse, which can seriously damage the physical and mental health of teenage viewers if imitated. These actions are mixed into ordinary continuous action sequences, last only a short time, and are therefore hard to spot. Faced with inappropriate live content, traditional small platforms rely on the untimely patrols of platform administrators and on viewer reporting to identify and review a host's induced adverse behaviors.
However, in the face of rapidly growing session counts and durations, this traditional identification approach depends heavily on platform patrol administrators and increases the platforms' operating burden. Manual review is also weak at recognizing fine details, so its accuracy on violations is low. As for efficiency, a reviewer must watch an entire video to identify the induced adverse behaviors in it, and must rewatch ambiguous segments to reach a judgment, which makes the process slow. In addition, because platforms offer hosts a complaint channel against manually identified violations (to guard against human misjudgment), hosts may dispute administrators' decisions and evade punishment.
Therefore, a detection method with high recognition accuracy is needed to determine whether induced adverse behaviors occur during a live broadcast.
Disclosure of Invention
The invention aims to solve the low recognition speed and low accuracy of existing detection methods, and provides a detection system and method for induced adverse behaviors in a live broadcast process.
In order to achieve the purpose, the invention is implemented according to the following technical scheme:
a detection system for induced adverse behavior in a live broadcast process, comprising:
the video set processing module is used for processing video set contents, which includes: acquiring video cases of violating induced adverse behaviors stored in the live broadcast platform database, and capturing real-time live content and storing it as videos to be identified; segmenting the video cases confirmed to contain induced adverse behaviors and labeling each segmented video with its violation type; and dividing a long video to be identified into several short segments of equal length, naming the segments in a uniform format to preserve the continuity and readability of the sequence;
the output end of the spatial feature processing module is respectively connected with the input ends of the temporal feature processing module and the fusion module, and the spatial feature processing module is used for performing video single-frame capture on the processed short-duration video, extracting spatial features from the captured RGB single-frame image, inputting the spatial features into an induced adverse behavior recognition model aiming at the spatial features, and outputting a prediction result;
the temporal feature processing module is used for computing the instantaneous optical flow between two temporally adjacent RGB single-frame images and synthesizing an optical flow map from it; extracting temporal features from the synthesized optical flow map, inputting them into the induced adverse behavior recognition model for temporal features, and outputting a prediction result;
and the fusion module is used for fusing the prediction result of the induced adverse behavior recognition model for spatial features with that of the model for temporal features, obtaining data that combines spatial and temporal features, and classifying the fused data to obtain the prediction result of a segmented video. After all segmented videos have been predicted, the segment-level prediction results are fused to obtain the final prediction result, i.e. the recognition result for the long video obtained from the live broadcast server.
Further, the spatial feature processing module includes:
the input end of the RGB single-frame image intercepting submodule is connected with the output end of the video set processing module, and the output end of the RGB single-frame image intercepting submodule is respectively connected with the input ends of the spatial feature model processing submodule and the temporal feature processing module; the RGB single-frame image intercepting submodule is used for carrying out video single-frame intercepting on the processed short-duration video to obtain an RGB single-frame image;
and the output end of the spatial feature model processing submodule is connected with the input end of the fusion module, and the spatial feature model processing submodule is used for extracting spatial features from the intercepted RGB single-frame image, inputting the spatial features into an induced bad behavior recognition model aiming at the spatial features and outputting a prediction result.
Further, the temporal feature processing module includes:
the input end of the optical flow map synthesis submodule is connected with the output end of the RGB single-frame image interception submodule, and its output end is connected with the input end of the temporal feature model processing submodule; the optical flow map synthesis submodule is used for computing the instantaneous optical flow between two temporally adjacent RGB single-frame images obtained by video interception and synthesizing an optical flow map;
and the output end of the temporal feature model processing submodule is connected with the input end of the fusion module; the temporal feature model processing submodule is used for extracting temporal features from the synthesized optical flow map, inputting them into the induced adverse behavior recognition model for temporal features, and outputting a prediction result.
Further, the fusion module includes:
the input end of the spatio-temporal feature fusion submodule is respectively connected with the output ends of the spatial feature model processing submodule and the temporal feature model processing submodule, the spatio-temporal feature fusion submodule is used for fusing the obtained induced adverse behavior recognition model prediction result aiming at the spatial feature with the induced adverse behavior recognition model prediction result aiming at the temporal feature to obtain data fusing the spatial feature and the temporal feature, and the fused data is subjected to classification processing to obtain a prediction result of a segmented video;
and the input end of the prediction result fusion sub-module is connected with the output end of the temporal-spatial feature fusion sub-module, and the prediction result fusion sub-module is used for performing fusion calculation on the prediction results of the plurality of segmented videos after the prediction of all the segmented videos is completed to obtain a final prediction result, wherein the final prediction result is the identification result of the long-term video acquired from the live broadcast server.
Further, the original model of both the induced adverse behavior recognition model for spatial features and the model for temporal features is the convolutional neural network ResNet-152.
In addition, the invention provides a detection method for induced adverse behaviors in a live broadcast process, which uses the above detection system to perform the detection and comprises the following steps:
step 1: extracting and processing violation video cases stored in a video database of a live broadcast platform, selecting a target video, dividing the target video into a plurality of sections of short-duration videos containing violation inducing behaviors, and recording type labels of the violation inducing behaviors;
step 2: processing the video to acquire an RGB single-frame image and an optical flow image of the video;
Step 3: using the spatio-temporal features in the RGB single-frame images and optical flow maps, train a model for recognizing spatial features and a model for recognizing temporal features respectively, obtaining an induced adverse behavior recognition model for spatial features and one for temporal features;
Step 4: acquire a live video clip by fetching the real-time live cache from the live broadcast platform server, and cut it into several live video segments 2-3 seconds long;
Step 5: for the live video segments cut in step 4, repeat step 2 to obtain their RGB single-frame images and optical flow maps;
Step 6: randomly select an RGB single-frame image obtained in step 5, feed it into the induced adverse behavior recognition model for spatial features obtained in step 3, and output a prediction result;
Step 7: feed the optical flow maps obtained in step 5 into the induced adverse behavior recognition model for temporal features obtained in step 3, and output a prediction result;
Step 8: fuse the data obtained in steps 6 and 7 by averaging the two results, output the fused result, and judge it to obtain the prediction result of the video segment;
Step 9: fuse the prediction results of the several video segments into which the long video was divided; if the prediction result of at least one segment is "bad behavior present", the video to be recognized is judged to contain induced adverse behavior.
Further, the step 2 specifically includes:
Step 2.1: acquire the RGB single-frame images of the video segments by extracting frames from each segment according to its frame rate, obtaining every RGB single-frame image contained in the video;
Step 2.2: perform optical flow processing on the RGB single-frame images, synthesizing optical flow maps by computation between adjacent single frames;
Step 2.3: organize the obtained RGB single-frame images and optical flow maps, storing those of the same induced adverse behavior type together.
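Step 2.1 extracts one RGB image per frame according to the frame rate. A minimal sketch of the underlying index arithmetic (the patent does not specify an implementation; real decoding would use a video library, which is omitted here):

```python
def frame_timestamps(fps: float, duration_s: float):
    """Return the timestamp (in seconds) of every frame in a clip,
    given its frame rate and duration -- step 2.1 extracts one RGB
    single-frame image per frame, so each timestamp maps to one image."""
    n_frames = int(duration_s * fps)
    return [i / fps for i in range(n_frames)]

# A 2-second clip at 25 fps yields 50 single-frame images,
# spaced 1/25 = 0.04 s apart.
stamps = frame_timestamps(fps=25.0, duration_s=2.0)
print(len(stamps))  # 50
```

Adjacent timestamps are exactly the pairs of frames between which step 2.2 computes optical flow.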
Further, the step 3 specifically includes:
Step 3.1: load the convolutional neural network model ResNet-152 pre-trained on the ImageNet training set;
Step 3.2: train the ResNet-152 model on the processed RGB single-frame images and their video labels, continually adjusting the training parameters and updating the model until the best recognition accuracy is reached, and save the resulting induced adverse behavior recognition model for spatial features;
Step 3.3: train the ResNet-152 model on the processed optical flow maps and video labels, adjusting the training parameters during training and updating the model until the best recognition accuracy is reached, and save the resulting induced adverse behavior recognition model for temporal features.
Compared with the prior art, the invention divides a long video into several short segments, fuses spatio-temporal features, and fuses the recognition results of the segmented videos, so that induced adverse behaviors mixed into ordinary continuous action sequences can be recognized effectively and in time, greatly improving recognition accuracy in complex conditions.
Drawings
FIG. 1 is a flow chart of a method provided by an embodiment of the present invention;
FIG. 2 is a block diagram of a system according to an embodiment of the present invention;
fig. 3 is a flowchart of a spatial stream training method according to an embodiment of the present invention;
fig. 4 is a flowchart of a time stream training method according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention is further described in detail with reference to the following embodiments. The specific embodiments described herein are merely illustrative of the invention and do not limit the invention.
To meet the supervision needs of online live broadcast platforms as the number of live sessions grows, and to build a detection service for induced adverse behaviors in live video with higher recognition speed and accuracy, recognition efficiency must be improved with informational tools and dependence on manual review reduced. Judging whether live content contains induced adverse behaviors from RGB single-frame images alone captures only the appearance features of individual frames and yields low recognition accuracy. A video is a set of consecutive frames with temporal characteristics: besides the appearance features in single RGB frames, it provides additional temporal information, namely the motion of objects, which can be obtained from the optical flow information stored in optical flow maps. The optical flow of an image can be split into an X direction and a Y direction, the X direction holding the horizontal component of each point's displacement vector field and the Y direction the vertical component. The optical flow maps store the motion information of the two directions separately, and the maps for both directions are obtained by computation between two adjacent single frames.
The spatio-temporal features are obtained by feeding the RGB single-frame images and the optical flow maps into convolutional neural networks for feature extraction, yielding the spatial features of the RGB single frames and the temporal features contained in the optical flow maps. Combining them considers both the prediction from the temporal features and the prediction from the spatial features: using mean-value fusion, the two prediction results are summed and averaged to obtain the fused data, which is compared with a preset threshold to output the final judgment.
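The mean-value fusion just described can be sketched as below. The class ordering and the 0.5 threshold are illustrative assumptions; the patent only says the fused score is compared with a preset value:

```python
def fuse_predictions(spatial_scores, temporal_scores, threshold=0.5):
    """Average-fuse the spatial-stream and temporal-stream class scores:
    sum the two prediction vectors element-wise, halve them, and compare
    the fused score of the flagged class against a preset threshold."""
    fused = [(s + t) / 2.0 for s, t in zip(spatial_scores, temporal_scores)]
    # Index 0 is assumed here to be the 'induced adverse behavior' class.
    return fused, fused[0] >= threshold

# Spatial stream is unsure (0.45) but the temporal stream is confident (0.85):
fused, flagged = fuse_predictions([0.45, 0.55], [0.85, 0.15])
# fused[0] == 0.65 >= 0.5, so the segment is flagged
```

This shows why the two streams complement each other: motion evidence from the optical flow can tip a borderline appearance-based score over the threshold.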
A video to be recognized can be very long and contain many actions; accurately and efficiently recognizing induced adverse behaviors within such a continuous action sequence, and thereby improving recognition accuracy, is the key concern of this embodiment.
Specifically, as shown in fig. 2, this embodiment describes in detail a detection system for induced adverse behaviors in a live broadcast process, whose modules cooperate to perform the detection; the detection system of this embodiment includes:
the video set processing module is used for processing video set contents, which includes: acquiring video cases of violating induced adverse behaviors stored in the live broadcast platform database, and capturing real-time live content and storing it as videos to be identified; segmenting the video cases confirmed to contain induced adverse behaviors and labeling each segmented video with its violation type; and dividing a long video to be identified into several short segments of equal length, naming the segments in a uniform format to preserve the continuity and readability of the sequence;
the spatial feature processing module comprises:
the input end of the RGB single-frame image intercepting submodule is connected with the output end of the video set processing module, and the output end of the RGB single-frame image intercepting submodule is respectively connected with the input ends of the spatial feature model processing submodule and the temporal feature processing module; the RGB single-frame image intercepting submodule is used for carrying out video single-frame intercepting on the processed short-duration video to obtain an RGB single-frame image;
the output end of the spatial feature model processing submodule is connected with the input end of the fusion module, and the spatial feature model processing submodule is used for extracting spatial features from the intercepted RGB single-frame image, inputting the spatial features into an induced bad behavior recognition model aiming at the spatial features and outputting a prediction result;
the time characteristic processing module comprises:
the input end of the optical flow map synthesis submodule is connected with the output end of the RGB single-frame image interception submodule, its output end is connected with the input end of the temporal feature model processing submodule, and the optical flow map synthesis submodule is used for computing the instantaneous optical flow between two temporally adjacent RGB single-frame images and synthesizing an optical flow map;
the temporal feature model processing submodule is used for extracting temporal features from the synthesized optical flow map, inputting them into the induced adverse behavior recognition model for temporal features, and outputting a prediction result;
the fusion module includes:
the input end of the spatio-temporal feature fusion submodule is respectively connected with the output ends of the spatial feature model processing submodule and the temporal feature model processing submodule, the spatio-temporal feature fusion submodule is used for fusing the obtained induced adverse behavior recognition model prediction result aiming at the spatial feature with the induced adverse behavior recognition model prediction result aiming at the temporal feature to obtain data fusing the spatial feature and the temporal feature, and the fused data is subjected to classification processing to obtain a prediction result of a segmented video;
and the input end of the prediction result fusion sub-module is connected with the output end of the temporal-spatial feature fusion sub-module, and the prediction result fusion sub-module is used for performing fusion calculation on the prediction results of the plurality of segmented videos after the prediction of all the segmented videos is completed to obtain a final prediction result, wherein the final prediction result is the identification result of the long-term video acquired from the live broadcast server.
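The equal-length segmentation performed by the video set processing module, together with its uniform naming requirement, can be sketched as follows. The segment length of 2.5 s sits in the patent's 2-3 s range, and the `seg_NNNN.mp4` naming pattern is a hypothetical example of a "uniform format":

```python
def split_equal_length(total_s: float, seg_s: float = 2.5):
    """Cut a long video into equal-length short segments (the patent's
    'equal-length method') and name them in a uniform, ordered format so
    the continuity and readability of the sequence are preserved."""
    bounds, names = [], []
    t, i = 0.0, 0
    while t < total_s:
        end = min(t + seg_s, total_s)          # last segment may be shorter
        bounds.append((t, end))
        names.append(f"seg_{i:04d}.mp4")       # uniform naming, hypothetical pattern
        t, i = end, i + 1
    return names, bounds

names, bounds = split_equal_length(10.0, seg_s=2.5)
# 4 segments: seg_0000.mp4 ... seg_0003.mp4, covering [0, 2.5) ... [7.5, 10.0)
```

The zero-padded index keeps lexicographic and temporal order identical, which is what makes the multi-segment sequence "readable" downstream.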
The detection method for induced adverse behaviors in a live broadcast process using the above detection system includes the following specific steps, as shown in fig. 1:
step 1: extracting and processing violation video cases stored in a video database of a live broadcast platform, selecting a target video, dividing the target video into a plurality of sections of short-duration videos containing violation inducing behaviors, and recording type labels of the violation inducing behaviors;
step 2: processing the video to obtain an RGB single-frame image and an optical flow image of the video:
Step 2.1: acquire the RGB single-frame images of the video segments by extracting frames from each segment according to its frame rate, obtaining every RGB single-frame image contained in the video;
Step 2.2: perform optical flow processing on the RGB single-frame images, synthesizing optical flow maps by computation between adjacent single frames;
Step 2.3: organize the obtained RGB single-frame images and optical flow maps, storing those of the same induced adverse behavior type together;
Step 3: using the spatio-temporal features in the RGB single-frame images and optical flow maps, train a model for recognizing spatial features and a model for recognizing temporal features respectively, obtaining an induced adverse behavior recognition model for spatial features and one for temporal features, as shown in fig. 3 and fig. 4:
Step 3.1: load the convolutional neural network model ResNet-152 pre-trained on the ImageNet training set;
Step 3.2: train the ResNet-152 model on the processed RGB single-frame images and their video labels, continually adjusting the training parameters and updating the model until the best recognition accuracy is reached, and save the resulting induced adverse behavior recognition model for spatial features, as shown in FIG. 3;
Step 3.3: train the ResNet-152 model on the processed optical flow maps and video labels, adjusting the training parameters during training and updating the model until the best recognition accuracy is reached, and save the resulting induced adverse behavior recognition model for temporal features, as shown in FIG. 4;
and 4, step 4: acquiring a live broadcast video clip, acquiring a real-time live broadcast cache from a live broadcast platform server, and cutting the live broadcast video clip into a plurality of sections of live broadcast video segments with the time length of 2-3 seconds;
and 5: aiming at the live video segment obtained by cutting in the step 4, repeating the content in the step 2 to obtain an RGB single-frame image and an optical flow image of the live video segment;
step 6: randomly selecting an RGB single-frame image obtained in the step 5, putting the RGB single-frame image into the induced adverse behavior recognition model aiming at the spatial characteristics obtained in the step 3, and outputting a prediction result;
and 7: putting the optical flow diagram obtained in the step 5 into the induced adverse behavior recognition model aiming at the time characteristics obtained in the step 3, and outputting a prediction result;
Step 8: fuse the data obtained in steps 6 and 7, combining the two results by the average-value fusion method; output the fused result and judge it to obtain the prediction result for the video segment;
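The average-value fusion of step 8 can be sketched as follows, assuming each stream outputs per-class probabilities (the class layout is an illustrative assumption):

```python
import numpy as np

def fuse_streams(p_spatial, p_temporal):
    """Average-fuse the per-class probabilities of the spatial and
    temporal streams, then classify by the highest fused score."""
    fused = (np.asarray(p_spatial, dtype=float)
             + np.asarray(p_temporal, dtype=float)) / 2.0
    return fused, int(np.argmax(fused))
```

For example, `fuse_streams([0.2, 0.8], [0.4, 0.6])` yields fused scores `[0.3, 0.7]` and predicts class 1.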
Step 9: fuse the prediction results of the multiple video segments cut from the long video; if at least one segment's prediction result is "bad behavior present", judge that the video to be recognized contains induced adverse behavior.
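Step 9's segment-level fusion is an OR over per-segment decisions; a minimal sketch, with the boolean encoding of segment results as an assumption:

```python
def video_contains_bad_behavior(segment_results):
    """Step-9 decision rule: the long video is flagged as containing
    induced adverse behavior if at least one of its segments was
    predicted as 'bad behavior present' (encoded here as True)."""
    return any(segment_results)
```

This rule deliberately trades precision for recall: a single flagged 2-3 second segment is enough to mark the whole cached clip for review.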
In summary, by dividing a long video into multiple short segments, applying spatio-temporal feature fusion, and fusing the recognition results of the segment videos, the invention ensures that inducing bad behaviors mixed into an otherwise normal, continuous action sequence are identified promptly and effectively, greatly improving recognition accuracy in complex scenarios.
The technical solution of the present invention is not limited to the specific embodiments above; all technical modifications made in accordance with the technical solution of the present invention fall within the protection scope of the present invention.

Claims (8)

1. A detection system for induced adverse behaviors in a live broadcast process, comprising:
the video set processing module is used for processing video set contents, including acquiring video cases of illegal inducing bad behavior stored in a live broadcast platform database, and capturing real-time live broadcast content and storing it as videos to be identified; segmenting the video cases confirmed as inducing bad behavior, and labeling each segmented video with its violation type label; dividing the long-duration video to be identified into multiple short-duration videos of equal length, and naming the divided short-duration videos in a uniform format to ensure the continuity and readability of the multiple videos;
the output end of the spatial feature processing module is respectively connected with the input ends of the temporal feature processing module and the fusion module, the spatial feature processing module is used for performing video single-frame capture on the processed short-duration video segments, extracting spatial features from the captured RGB single-frame images, inputting the spatial features into an induced adverse behavior recognition model aiming at the spatial features, and outputting a prediction result;
the temporal feature processing module is used for computing instantaneous optical flow information between adjacent RGB single frames of the intercepted time sequence, synthesizing an optical flow graph, extracting temporal features from the synthesized optical flow graph, inputting the temporal features into an induced adverse behavior recognition model for temporal features, and outputting a prediction result;
and the fusion module is used for fusing the obtained prediction result of the induced adverse behavior recognition model for spatial features with that of the induced adverse behavior recognition model for temporal features to obtain data fusing the spatial and temporal features; the fused data is classified to obtain the prediction result of a segmented video; after all segmented videos have been predicted, the prediction results of the multiple segmented videos are fused to obtain the final prediction result, which is the recognition result for the long-duration video obtained from the live broadcast server.
2. The system for detecting induced adverse behaviors in a live broadcast process according to claim 1, wherein the spatial feature processing module comprises:
the input end of the RGB single-frame image interception submodule is connected with the output end of the video set processing module, and its output end is respectively connected with the input ends of the spatial feature model processing submodule and the temporal feature processing module; the RGB single-frame image interception submodule is used for intercepting RGB single-frame images from the processed short-duration video;
and the output end of the spatial feature model processing submodule is connected with the input end of the fusion module, and the spatial feature model processing submodule is used for extracting spatial features from the intercepted RGB single-frame image, inputting the spatial features into an induced bad behavior recognition model aiming at the spatial features and outputting a prediction result.
3. The system for detecting induced adverse behaviors in a live broadcast process according to claim 2, wherein the temporal feature processing module comprises:
the input end of the optical flow graph synthesis submodule is connected with the output end of the RGB single-frame image interception submodule, and its output end is connected with the input end of the temporal feature model processing submodule; the optical flow graph synthesis submodule is used for computing instantaneous optical flow information between adjacent intercepted RGB single-frame images of the time sequence and synthesizing an optical flow graph;
and the output end of the temporal feature model processing submodule is connected with the input end of the fusion module; the temporal feature model processing submodule is used for extracting temporal features from the synthesized optical flow graph, inputting them into an induced adverse behavior recognition model for temporal features, and outputting a prediction result.
4. The system of claim 3, wherein the fusion module comprises:
the input end of the spatio-temporal feature fusion submodule is respectively connected with the output ends of the spatial feature model processing submodule and the temporal feature model processing submodule; the spatio-temporal feature fusion submodule is used for fusing the obtained prediction result of the induced adverse behavior recognition model for spatial features with that of the model for temporal features to obtain data fusing the spatial and temporal features, and the fused data is classified to obtain the prediction result of a segmented video;
and the input end of the prediction result fusion submodule is connected with the output end of the spatio-temporal feature fusion submodule; after all segmented videos have been predicted, the prediction result fusion submodule fuses the prediction results of the multiple segmented videos to obtain the final prediction result, which is the recognition result for the long-duration video obtained from the live broadcast server.
5. The system of claim 3, wherein the original model of both the induced adverse behavior recognition model for spatial features and the induced adverse behavior recognition model for temporal features is the convolutional neural network model ResNet152.
6. A method for detecting induced adverse behaviors in a live broadcast process, characterized in that the detection system for induced adverse behaviors in a live broadcast process as claimed in claim 5 is used to detect induced adverse behaviors in the live broadcast process, comprising the following steps:
step 1: extracting and processing violation video cases stored in a video database of a live broadcast platform, selecting a target video, dividing the target video into a plurality of sections of short-duration videos containing violation inducing behaviors, and recording type labels of the violation inducing behaviors;
step 2: processing the video to acquire an RGB single-frame image and an optical flow image of the video;
step 3: training a model for identifying spatial features and a model for identifying temporal features, using the spatio-temporal features in the RGB single-frame images and optical flow graphs respectively, to obtain an induced adverse behavior recognition model for spatial features and an induced adverse behavior recognition model for temporal features;
step 4: acquiring live video clips: obtaining the real-time live cache from the live broadcast platform server and cutting it into multiple live video segments of 2-3 seconds each;
step 5: for the live video segments cut in step 4, repeating step 2 to obtain their RGB single-frame images and optical flow graphs;
step 6: randomly selecting an RGB single-frame image obtained in step 5, feeding it into the spatial-feature recognition model obtained in step 3, and outputting a prediction result;
step 7: feeding the optical flow graph obtained in step 5 into the temporal-feature recognition model obtained in step 3, and outputting a prediction result;
step 8: fusing the data obtained in steps 6 and 7, combining the two results by the average-value fusion method, outputting the fused result, and classifying it to obtain the prediction result for the video segment;
step 9: fusing the prediction results of the multiple video segments cut from the long video; if at least one segment's prediction result is "bad behavior present", judging that the video to be recognized contains induced adverse behavior.
7. The method for detecting induced adverse behaviors in a live broadcast process according to claim 6, wherein step 2 specifically comprises:
step 2.1: acquiring RGB single-frame images of the video segments, extracting frames of the video segments, and extracting all RGB single frames contained in the video according to the frame rate characteristics of the video;
step 2.2: performing optical flow information processing on the RGB single-frame images, and synthesizing an optical flow graph through calculation between two adjacent RGB single-frame images;
step 2.3: and processing the obtained RGB single-frame image and the optical flow image, and storing the related RGB single-frame image and the optical flow image of the same type of induced adverse behaviors together.
8. The method for detecting induced adverse behaviors in a live broadcast process according to claim 6, wherein step 3 specifically comprises:
step 3.1: loading a convolutional neural network model ResNet152 pre-trained by an ImageNet training set;
step 3.2: performing targeted training of the ResNet152 model using the processed RGB single-frame images and the marked video labels, continuously adjusting the training parameters and updating the model until the best recognition accuracy is reached, and saving the resulting induced adverse behavior recognition model for spatial features;
step 3.3: performing targeted training of the ResNet152 model using the processed optical flow graphs and video labels, adjusting the training parameters during training and updating the model until the best recognition accuracy is reached, and saving the resulting induced adverse behavior recognition model for temporal features.
CN202011279463.8A 2020-11-16 2020-11-16 Detection system and method for inductivity bad behavior in live broadcast process Active CN112380999B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011279463.8A CN112380999B (en) 2020-11-16 2020-11-16 Detection system and method for inductivity bad behavior in live broadcast process


Publications (2)

Publication Number Publication Date
CN112380999A true CN112380999A (en) 2021-02-19
CN112380999B CN112380999B (en) 2023-08-01

Family

ID=74585326

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011279463.8A Active CN112380999B (en) 2020-11-16 2020-11-16 Detection system and method for inductivity bad behavior in live broadcast process

Country Status (1)

Country Link
CN (1) CN112380999B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113160273A (en) * 2021-03-25 2021-07-23 常州工学院 Intelligent monitoring video segmentation method based on multi-target tracking

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105550699A (en) * 2015-12-08 2016-05-04 北京工业大学 CNN-based video identification and classification method through time-space significant information fusion
CN110909658A (en) * 2019-11-19 2020-03-24 北京工商大学 Method for recognizing human body behaviors in video based on double-current convolutional network
CN110969066A (en) * 2018-09-30 2020-04-07 北京金山云网络技术有限公司 Live video identification method and device and electronic equipment
CN111462183A (en) * 2020-03-31 2020-07-28 山东大学 Behavior identification method and system based on attention mechanism double-current network
CN111709351A (en) * 2020-06-11 2020-09-25 江南大学 Three-branch network behavior identification method based on multipath space-time characteristic reinforcement fusion
CN111783540A (en) * 2020-06-01 2020-10-16 河海大学 Method and system for recognizing human body behaviors in video


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
TONGWEI LU: "Deep Optical Flow Feature Fusion Based on 3D Convolutional Networks for Video Action Recognition", 《 2018 IEEE SMARTWORLD, UBIQUITOUS INTELLIGENCE & COMPUTING, ADVANCED & TRUSTED COMPUTING, SCALABLE COMPUTING & COMMUNICATIONS, CLOUD & BIG DATA COMPUTING, INTERNET OF PEOPLE AND SMART CITY INNOVATION》, pages 1077 - 1080 *
ZHIJIAN LIU等: "Human Activities Recognition from Videos Based on Compound Deep Neural Network", 《 THE 10TH INTERNATIONAL CONFERENCE ON COMPUTER ENGINEERING AND NETWORKS 》, pages 314 - 326 *
YANG BIN: "Research on Video Behavior Recognition Methods Based on Deep Learning", CNKI China Master's Theses Full-text Database (Information Science and Technology), no. 12, pages 138-566 *
LUO WEIYI: "Video Behavior Recognition Algorithms Based on Deep Learning", CNKI China Master's Theses Full-text Database (Information Science and Technology), no. 7, pages 138-909 *


Also Published As

Publication number Publication date
CN112380999B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
AU2016291690B2 (en) Prediction of future views of video segments to optimize system resource utilization
CN106792100B (en) Video bullet screen display method and device
CN110381366B (en) Automatic event reporting method, system, server and storage medium
CN108353208B (en) Optimizing media fingerprint retention to improve system resource utilization
JP5795580B2 (en) Estimating and displaying social interests in time-based media
US10304458B1 (en) Systems and methods for transcribing videos using speaker identification
US20230336837A1 (en) Detection of common media segments
US10643074B1 (en) Automated video ratings
US20170019719A1 (en) Detection of Common Media Segments
Dou et al. Edge computing-enabled deep learning for real-time video optimization in IIoT
CN113382284B (en) Pirate video classification method and device
CN110856039A (en) Video processing method and device and storage medium
Heng et al. How to assess the quality of compressed surveillance videos using face recognition
CN112380999A (en) System and method for detecting induced adverse behaviors in live broadcast process
CN101339662B (en) Method and device for creating video frequency feature data
CN112287771A (en) Method, apparatus, server and medium for detecting video event
US10771828B2 (en) Content consensus management
CN112560552A (en) Video classification method and device
Bertini et al. Semantic adaptation of sport videos with user-centred performance analysis
CN114727093A (en) Data analysis method and device, electronic equipment and computer storage medium
EP3323244B1 (en) System and method for improving work load management in acr television monitoring system
CA3024183C (en) Generating synthetic frame features for sentinel frame matching
Zeng et al. Instant video summarization during shooting with mobile phone
Neri et al. Unsupervised video orchestration based on aesthetic features
Sanap et al. Quality assessment framework for video contextualisation of personal videos

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant