CN113705370B - Method and device for detecting illegal behaviors of live broadcasting room, electronic equipment and storage medium - Google Patents

Info

Publication number: CN113705370B
Application number: CN202110909967.1A
Authority: CN (China)
Prior art keywords: video, video frame, detected, frames, frame
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Other languages: Chinese (zh)
Other versions: CN113705370A
Inventors: 孙天艺, 孙想, 邓天生, 贠挺, 于天宝, 陈国庆, 林赛群
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202110909967.1A
Publication of CN113705370A
Application granted; publication of CN113705370B

Classifications

    • G06F18/214: Pattern recognition; analysing; design or setup of recognition systems or techniques; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/22: Pattern recognition; analysing; matching criteria, e.g. proximity measures
    • G06F18/253: Pattern recognition; analysing; fusion techniques of extracted features
    • G06N20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a method and an apparatus for detecting violations in a live broadcasting room, an electronic device and a storage medium, and relates to the technical field of internet mobile terminal applications. Frames are extracted from a video to be detected to obtain a plurality of video frames, which are sequentially input into a violation detection model to determine whether a violation exists in each frame; frames containing violations are given anomaly markers. If it is determined from the anomaly marker of a first video frame that a first violation exists, whether a second violation exists in a second video frame is confirmed according to the anomaly marker of the second video frame, where the second video frame is the video frame preceding the first video frame among the plurality of video frames. Finally, once the second violation is determined to exist in the second video frame, it is determined from the first violation and the second violation that a live-room violation exists in the video to be detected. This multi-frame fusion mechanism ensures the accuracy of detecting violations in the live broadcasting room.

Description

Method and device for detecting illegal behaviors of live broadcasting room, electronic equipment and storage medium
Technical Field
The present disclosure relates to the technical field of internet mobile terminal applications, in particular to the supervision of internet live broadcast platforms, and specifically to a method and apparatus for detecting violations in a live broadcasting room, an electronic device, and a storage medium.
Background
With the development of internet technology, live broadcasting in various scenarios has grown rapidly. However, various irregularities occur during live broadcasts on current platforms, such as smoking in a live broadcast scene, that is, an anchor smoking while streaming. Smoking is not only harmful to health but also sets a bad example, which is unfavorable to the healthy growth of teenagers.
At present, in order to standardize live broadcast behavior and build a good live broadcast environment, smoking is usually detected by recognizing a smoking gesture. However, because anchors have diverse skin tones and camera placement angles change during a live broadcast, the recognized gestures differ, and gestures similar to smoking are easily confused with it, so a gesture in a non-smoking state is often misjudged as a smoking gesture. The misjudgment rate of detecting smoking by gesture alone is therefore high.
Disclosure of Invention
The present disclosure provides a method, an apparatus, an electronic device, and a storage medium for detecting violations in a live broadcasting room.
According to an aspect of the present disclosure, there is provided a method for detecting live-room violations, including:
extracting frames from a video to be detected to obtain a plurality of video frames;
sequentially inputting the plurality of video frames into a violation detection model to determine whether a violation exists in each video frame, and applying an anomaly marker to frames in which a violation is found;
if it is determined from the anomaly marker of a first video frame that a first violation exists in the first video frame, confirming whether a second violation exists in a second video frame according to the anomaly marker of the second video frame, where the second video frame is the video frame preceding the first video frame among the plurality of video frames;
and if it is determined that the second violation exists in the second video frame, determining from the first violation and the second violation that a live-room violation exists in the video to be detected.
According to another aspect of the present disclosure, there is provided an apparatus for detecting live-room violations, including:
a first frame extraction module, configured to extract frames from a video to be detected to obtain a plurality of video frames;
a first input module, configured to sequentially input the plurality of video frames into a violation detection model to determine whether a violation exists in each video frame, and to apply an anomaly marker to frames in which a violation is found;
a first confirming module, configured to, when it is determined from the anomaly marker of a first video frame that a first violation exists in the first video frame, confirm whether a second violation exists in a second video frame according to the anomaly marker of the second video frame, where the second video frame is the video frame preceding the first video frame among the plurality of video frames;
and a first determining module, configured to, when it is determined that the second violation exists in the second video frame, determine from the first violation and the second violation that a live-room violation exists in the video to be detected.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the preceding aspect.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of the preceding aspect.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method as described in the preceding aspect.
According to the method, apparatus, electronic device, and storage medium for detecting live-room violations, frames are extracted from the video to be detected to obtain a plurality of video frames, which are sequentially input into the violation detection model to determine whether violations exist, and anomaly markers are applied to the violations. If it is determined from the anomaly marker of a first video frame that a first violation exists, whether a second violation exists in a second video frame is confirmed according to the anomaly marker of the second video frame, where the second video frame is the video frame preceding the first video frame among the plurality of video frames. Finally, once the second violation is determined to exist in the second video frame, the live-room violation in the video to be detected is determined from the first violation and the second violation. This multi-frame fusion mechanism ensures the accuracy of live-room violation detection.
It should be understood that the description of this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
The drawings are for a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
fig. 1 is a flow chart of a method for detecting illegal activities in a live broadcasting room according to an embodiment of the present disclosure;
fig. 2 is a frame diagram of another method for detecting a live-room violation provided by an embodiment of the present disclosure;
FIG. 3 is a flow chart of a method for training an offence detection model provided by an embodiment of the present disclosure;
fig. 4 is a schematic structural diagram of an apparatus for detecting live-room violations provided in an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of another apparatus for detecting live-room violations provided by an embodiment of the present disclosure;
fig. 6 is a schematic block diagram of an example electronic device 600 provided by an embodiment of the disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present disclosure to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
The following describes a method, an apparatus, an electronic device, and a storage medium for detecting a live broadcast room violation according to an embodiment of the present disclosure with reference to the accompanying drawings.
In the related art, live-room violations such as smoking are identified by detecting the gesture of the smoking action combined with features of the cigarette target. With this approach, the recognized gestures differ because of diverse skin tones, varying camera angles, and similar problems, and gestures resembling smoking are easily confused with it, so the misjudgment rate of detecting smoking by gesture is high.
To avoid such misjudgments of live-room violations in the related art, the present application adopts a multi-frame fusion mechanism when detecting live-room violations to ensure detection accuracy. On the basis of accurately detecting violations, this raises the standardization level of live broadcast platforms, improves overall live broadcast quality, and helps build a good live broadcast environment.
Fig. 1 is a flow chart of a method for detecting illegal activities in a live broadcasting room according to an embodiment of the present disclosure.
As shown in fig. 1, the method comprises the steps of:
and 101, performing frame extraction on the video to be detected to obtain a plurality of video frames.
As a first possible implementation manner, the video to be detected in the embodiments of the present application may be a live video, that is, data in a live broadcast room is monitored in real time to determine whether the video includes an illegal action.
As a second possible implementation manner, the video to be detected in the embodiments of the present application may also be a non-live video, such as live playback, or a video type with guiding effect.
The following embodiments will be described taking live video as an example, but it should be clear that this description is not intended to limit that the video to be detected can only be live video. The detection method is the same regardless of the video type, and the difference is the source and type of the video to be detected.
In practical application, the length of the video to be detected is not limited when the video to be detected is detected, for example, the video may be 45s video or 3min video, which is not limited in the embodiment of the present application.
The video to be detected is played by playing a single picture frame by frame, and the visual persistence characteristic of naked eyes is utilized to enable people to generate continuous animation illusion on the visual sense. The frame extraction in the embodiment of the application refers to extracting a single picture from the videos to be detected to obtain a video frame. The purpose of this step is to convert the video to be detected into an image (video frame) that can be processed by the offence detection model.
As a possible implementation manner of the embodiment of the present application, since the illegal activity is not video activity that changes at a high speed, when the video to be detected is frame-extracted, the frequency may be 1fps, or set to 2fps, in a specific implementation process, the frame-extracted frequency is not set too large, if the set too large would directly cause omission of the illegal activity, the specific setting of the frame-extracted frequency may be flexibly set according to different application scenarios.
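As an illustrative sketch only (not part of the original disclosure), the frame extraction of step 101 could be implemented as follows; OpenCV, the function name extract_frames, and its parameters are assumptions introduced for illustration:

import cv2

def extract_frames(video_path: str, sample_fps: float = 1.0):
    # Return frames sampled from the video at roughly sample_fps frames per second.
    cap = cv2.VideoCapture(video_path)
    native_fps = cap.get(cv2.CAP_PROP_FPS) or 25.0   # fall back if metadata is missing
    step = max(int(round(native_fps / sample_fps)), 1)
    frames, index = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)   # a single picture (BGR image) usable by the detection model
        index += 1
    cap.release()
    return frames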
Step 102: sequentially input the plurality of video frames into a violation detection model to determine whether a violation exists in each video frame, and apply an anomaly marker to the violations.
In one implementation of the embodiments of the present application, most object detection networks may serve as the violation detection model; in the implementation of the present application, an improved model is deployed based on YOLOv3 to fit the task of detecting violations in a live broadcast scene. It should be noted that this description is not intended to limit the specific kind of violation detection model; any model with a detection function falls within the protection scope of the present application.
The violations described herein include, but are not limited to, any one or any combination of the following: smoking behavior, advertising behavior, clothing exposure behavior, and soiling behavior. When the violation detection model processes the plurality of video frames, if a violation is detected in a video frame, the video frame is marked and a detection result carrying the anomaly marker is output.
Step 103: if it is determined from the anomaly marker of the first video frame that a first violation exists in the first video frame, confirm whether a second violation exists in the second video frame according to the anomaly marker of the second video frame, where the second video frame is the video frame preceding the first video frame among the plurality of video frames.
When determining whether violations exist in the video frames, the present application no longer relies on a single video frame but uses multi-frame fusion for the judgment, which prevents misjudgment of violations and thereby improves detection accuracy.
It should be noted that the terms "first video frame" and "second video frame" are only used to distinguish different video frames and do not indicate priority or execution order. The first video frame eventually becomes the second video frame: after the first video frame has been detected, it takes the place of the second video frame, the next new video frame is input into the violation detection model and becomes the new first video frame, and this process repeats until the whole video to be detected has been processed.
Step 104: if it is determined that the second violation exists in the second video frame, determine from the first violation and the second violation that a live-room violation exists in the video to be detected.
Multi-frame fusion here means fusing the results of the first video frame and the second video frame: when an anomaly is detected in the second video frame alone, the video is not directly judged to contain a violation, and an anomaly in the first video frame alone is likewise not sufficient. Only when both the first video frame and the second video frame are determined to carry anomaly markers can it be determined that a violation exists in the video to be detected.
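As an illustrative sketch only (not the patented implementation), the two-frame fusion rule of steps 103 and 104 could look as follows; the detect callable stands in for the violation detection model and is a placeholder:

def has_live_room_violation(frames, detect) -> bool:
    # Report a violation only when the current frame and the preceding frame are both flagged.
    prev_flagged = False
    for frame in frames:
        flagged = bool(detect(frame))   # True if the model applies an anomaly marker to this frame
        if flagged and prev_flagged:
            return True
        prev_flagged = flagged
    return False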
According to the live-room violation detection method of the present application, frames are extracted from the video to be detected to obtain a plurality of video frames, which are sequentially input into the violation detection model to determine whether violations exist, and anomaly markers are applied to the violations. If it is determined from the anomaly marker of the first video frame that a first violation exists, whether a second violation exists in the second video frame is confirmed according to its anomaly marker, where the second video frame is the video frame preceding the first video frame among the plurality of video frames. Finally, once the second violation is determined to exist in the second video frame, the live-room violation in the video to be detected is determined from the first violation and the second violation; the multi-frame fusion mechanism ensures the accuracy of live-room violation detection.
To further standardize live broadcast behavior and let the anchor clearly recognize his or her own violations, in a specific implementation the duration of the first video frame and the duration of the second video frame are merged to obtain a first start-stop duration of the violation in the video to be detected.
To illustrate this embodiment: when two consecutive frames (a first video frame and a second video frame) are both detected as violating by the violation detection model, the video clip corresponding to the two frames is recorded as containing a violation. If consecutive video frames are marked as violating, for example the three overlapping frame pairs [[1,2], [2,3], [3,4]], the merged first start-stop duration of the violation is [1,4]. If two non-adjacent pairs are detected as smoking, such as [[1,2], [5,6]], no merging is performed in this scenario and the detection result is output directly.
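The merging just described can be sketched as follows (an illustration only; the function name merge_spans is an assumption), reproducing the [[1,2], [2,3], [3,4]] to [1,4] example:

def merge_spans(pairs):
    # Merge overlapping or touching [start, end] frame pairs into start-stop spans.
    merged = []
    for start, end in sorted(pairs):
        if merged and start <= merged[-1][1]:     # overlaps or touches the previous span
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged

print(merge_spans([[1, 2], [2, 3], [3, 4]]))   # [[1, 4]]
print(merge_spans([[1, 2], [5, 6]]))           # [[1, 2], [5, 6]], non-adjacent pairs stay separate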
In a specific application, the video to be detected in a live broadcasting room can be long; directly detecting the whole video may take considerable processing time and occupy more processing resources. As shown in fig. 2, another method for detecting live-room violations is therefore provided, including:
Step 201: segment the video to be detected according to a preset video duration threshold to obtain video segments to be detected.
The video segments to be detected include a first video segment and a second video segment, the second video segment being the to-be-detected video segment adjacent to the first video segment.
By segmenting the video to be detected and detecting each segment separately, the detection of violations within each video segment is accelerated, and thus the detection of the whole video to be detected is accelerated. For example, if the video to be detected is a 20 s video and the preset video duration threshold used for segmentation is 10 s, the 20 s video is split into two 10 s video segments to be detected.
It should be noted that the preset video duration threshold is an empirically determined value, and the number of segments can also be set according to different application scenarios; the above example is not a specific limitation on the preset video duration threshold or the duration of the video to be detected.
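A minimal sketch of step 201 under the 20 s / 10 s example above (illustrative only; split_video and its parameters are assumptions):

def split_video(total_seconds: float, threshold_seconds: float = 10.0):
    # Split a video into (start, end) segments no longer than the preset duration threshold.
    segments, start = [], 0.0
    while start < total_seconds:
        end = min(start + threshold_seconds, total_seconds)
        segments.append((start, end))
        start = end
    return segments

print(split_video(20, 10))   # [(0.0, 10.0), (10.0, 20.0)]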
Step 202: sequentially extract frames from each video segment to be detected to obtain a plurality of video frames.
For the description of frame extraction, please refer to the detailed description of step 101, which is not repeated here.
Continuing the example from step 201: after the video to be detected is split into two 10 s video segments, frames are extracted from each segment at a frequency of 1 fps, yielding 10 video frames per segment.
The 10 video frames of the first video segment are sequentially input into the violation detection model to determine whether violations exist, and anomaly markers are applied to the violations. If it is determined from the anomaly marker of the first video frame that a first violation exists, whether a second violation exists in the second video frame is confirmed according to its anomaly marker, where the second video frame is the video frame preceding the first video frame among the plurality of video frames. If the second violation exists, it is determined from the first violation and the second violation that a live-room violation exists in the first video segment to be detected. The 10 video frames of the second video segment are then checked for violations in the same way.
Once the first video segment is confirmed to contain a violation, the video to be detected can be confirmed to contain a violation; the current video frame and the adjacent preceding video frame are merged, and the first start-stop duration of the violation in the video to be detected is calculated according to the frame extraction frequency of the video to be detected.
Step 203: if the target video frame extracted from the first video segment and the reference video frame extracted from the second video segment both carry anomaly markers of a violation, merge the target video frame and the reference video frame, where the target video frame is a first video frame of the first video segment and the reference video frame is a second video frame of the second video segment.
For a better understanding of this step, continuing the example from step 202: if a violation exists in the 8th to 10th of the 10 video frames extracted from the first video segment, and a violation exists in the 1st to 5th video frames extracted from the second video segment, the target video frame and the reference video frame are merged.
This example is given solely for ease of understanding and it should be understood that this manner of description is not intended to limit the specifics.
Step 204: calculate a second start-stop duration of the violation according to the duration of the target video frame and the duration of the reference video frame.
This approach improves detection efficiency while keeping the detection precision needed to standardize live broadcast behavior and reduce potential safety hazards. In addition, computing the second start-stop duration by merging makes the alarm period reported to the user more friendly.
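An illustrative sketch of steps 203 and 204 under the example above (function and parameter names are assumptions; a 1 fps sampling rate and 10-frame segments are taken from the running example):

def merge_across_segments(first_tail_frames, second_head_frames,
                          segment_length_frames=10, sample_fps=1.0):
    # Flagged frame indices (1-based) at the tail of the first segment and the head of the second
    # segment are combined into one span, then converted to seconds via the sampling rate.
    start_frame = first_tail_frames[0]                          # e.g. frame 8 of the first segment
    end_frame = segment_length_frames + second_head_frames[-1]  # e.g. frame 5 of the second segment -> 15
    start_s = (start_frame - 1) / sample_fps
    end_s = end_frame / sample_fps
    return start_s, end_s   # second start-stop duration of the violation, in seconds

print(merge_across_segments([8, 9, 10], [1, 2, 3, 4, 5]))   # (7.0, 15.0)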
Step 205: after confirming that a live-room violation exists in the video to be detected, determine the severity level of the violation according to the anomaly marker.
The anomaly marker in the embodiments of the present application both marks the violation and indicates its severity level; in a specific implementation, each anomaly marker corresponds to one severity level of the violation, and each severity level corresponds to one alarm prompt.
Step 206: issue the corresponding alarm prompt to the live broadcasting room according to the severity level of the violation.
The severity levels can be classified according to the specific category of the live broadcasting room, and levels can be determined separately for live rooms of different categories. This further standardizes live broadcast behavior, improves overall live broadcast quality, builds a good live broadcast environment, improves the viewing experience, and potentially helps increase user retention.
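By way of illustration only, the mapping in steps 205 and 206 could be as simple as a lookup table; the marker names, levels, and prompt texts below are invented placeholders, not values from the disclosure:

SEVERITY = {"advertising": 1, "smoking": 2, "clothing_exposure": 3, "soiling": 3}
ALARM = {1: "reminder sent to the host",
         2: "warning issued to the live broadcasting room",
         3: "live broadcasting room suspended pending review"}

def issue_alarm(anomaly_marker: str) -> str:
    # Map an anomaly marker to its severity level, then to the corresponding alarm prompt.
    level = SEVERITY.get(anomaly_marker, 1)
    return ALARM[level]

print(issue_alarm("smoking"))   # "warning issued to the live broadcasting room"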
The violation detection model plays the central role in detecting violations in the present application, so training the model to obtain an optimal model is important. An embodiment of the present application provides a method for training the violation detection model, as shown in fig. 3, including:
Step 301: acquire training video frame information, where the training video frame information includes sample video frames and sample target information of the sample video frames.
In practical applications, the violation detection model includes a convolution layer, a batch normalization layer, and a Leaky ReLU layer. The convolution layer extracts features from a sample video frame, the batch normalization layer fuses the features extracted by the convolution layer, and the Leaky ReLU layer detects and outputs the anomaly of the video frame based on the batch-normalized features.
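The convolution + batch normalization + Leaky ReLU building block can be sketched as follows; PyTorch is an assumption (the disclosure names no framework), and the block mirrors a YOLOv3-style unit rather than the patented model itself:

import torch.nn as nn

class ConvBNLeaky(nn.Module):
    # One convolution -> batch normalization -> Leaky ReLU unit, as used in YOLOv3-style backbones.
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size,
                              stride=stride, padding=kernel_size // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_channels)      # normalizes/fuses the extracted features
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))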
Step 302: input the sample video frames into the violation detection model to obtain the marked violations and predicted target information in the sample video frames.
The recall of training can be calculated from the sample video frames detected by the violation detection model and the marked video frames among the sample video frames; on the basis of the recall, the precision of training is confirmed according to the predicted target information.
During training, recall and precision are both indispensable and complement each other; only when recall and precision both reach the expected targets will the trained violation detection model produce more accurate results in subsequent detection. In addition, the violation detection model can to some extent directly replace manual review, reducing labor costs.
Step 303: train the violation detection model according to the difference between the sample target information and the predicted target information to obtain a trained violation detection model.
In a specific implementation, training the violation detection model is generally an iterative process: the network parameters of each layer are adjusted continuously so that the training result converges, completing the training of the violation detection model.
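A simplified training-loop sketch of step 303 (illustrative only; the loss, optimizer, and data loader are placeholders, and a real detection model would use a detection-specific loss rather than the mean squared error shown here):

import torch

def train(model, data_loader, epochs=10, lr=1e-3):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for frames, sample_targets in data_loader:   # sample video frames + sample target information
            predicted_targets = model(frames)         # predicted target information
            loss = torch.nn.functional.mse_loss(predicted_targets, sample_targets)
            optimizer.zero_grad()
            loss.backward()                           # adjust layer parameters by the difference
            optimizer.step()
    return model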
The above embodiments describe the method for detecting live-room violations in detail. The method can be applied to any of the following scenarios: the detection of smoking behavior, advertising behavior, clothing exposure behavior, and soiling behavior. An application scenario for detecting smoking in a live broadcasting room is given below: when detecting with the violation detection model, whether cigarettes and/or smoke exist in a video frame is detected, and if cigarettes and/or smoke exist, the video frame is given an anomaly marker. Adding the detection of cigarettes and/or smoke here avoids false detection of water mist from, for example, a humidifier, and improves the accuracy of detection in the live broadcasting room.
Fig. 4 is a schematic structural diagram of an apparatus for detecting live-room violations according to an embodiment of the present disclosure. As shown in fig. 4, the apparatus includes: a first frame extraction module 41, a first input module 42, a first confirming module 43, and a first determining module 44.
The first frame extraction module 41 is configured to extract frames from a video to be detected to obtain a plurality of video frames;
the first input module 42 is configured to sequentially input the plurality of video frames into a violation detection model to determine whether a violation exists in each video frame, and to apply an anomaly marker to the violations;
the first confirming module 43 is configured to confirm whether a second violation exists in a second video frame according to the anomaly marker of the second video frame after determining, according to the anomaly marker of a first video frame, that a first violation exists in the first video frame, where the second video frame is the video frame preceding the first video frame among the plurality of video frames;
the first determining module 44 is configured to determine, when it is determined that the second violation exists in the second video frame, that a live-room violation exists in the video to be detected according to the first violation and the second violation.
Further, in a possible implementation of this embodiment, as shown in fig. 5, the apparatus includes: a first frame extraction module 51, a first input module 52, a first confirming module 53, and a first determining module 54, which correspond to the first frame extraction module 41, the first input module 42, the first confirming module 43, and the first determining module 44 in fig. 4 and are not described again. The apparatus further includes:
a first merging module 55, configured to merge the duration of the first video frame and the duration of the second video frame to obtain a first start-stop duration of the violation in the video to be detected.
Further, in one possible implementation manner of this embodiment, as shown in fig. 5, the frame extracting module 51 further includes:
the segmentation unit 5101 is configured to segment the video to be detected according to a preset video duration threshold value, so as to obtain a video segment to be detected;
and the frame extracting unit 5102 is used for extracting frames from each video segment to be detected in sequence to obtain a plurality of video frames.
Further, in a possible implementation of this embodiment, as shown in fig. 5, the video segments to be detected include a first video segment and a second video segment, the second video segment being the to-be-detected video segment adjacent to the first video segment; the apparatus further includes:
a second merging module 56, configured to merge the target video frame and the reference video frame when the target video frame extracted from the first video segment and the reference video frame extracted from the second video segment both carry anomaly markers of a violation, where the target video frame is a first video frame of the first video segment and the reference video frame is a second video frame of the second video segment;
a calculating module 57, configured to calculate a second start-stop duration of the violation according to the duration of the target video frame and the duration of the reference video frame.
Further, in a possible implementation manner of this embodiment, as shown in fig. 5, the apparatus further includes:
a second determining module 58, configured to determine the severity level of the violation according to the anomaly marker when the video to be detected is confirmed to contain a live-room violation;
and the issuing module 59 is configured to issue a corresponding alarm prompt to the live broadcasting room according to the severity level of the offence.
Further, in a possible implementation manner of this embodiment, as shown in fig. 5, the apparatus further includes:
an obtaining module 510, configured to obtain training video frame information, where the training video frame information includes: sample video frames and sample target information for the sample video frames;
a second input module 511, configured to input the sample video frame into an offence detection model, so as to obtain marked offence and predicted target information in the sample video frame;
the training module 512 is configured to train the offence detection model according to the difference between the sample target information and the predicted target information, so as to obtain a trained offence detection model.
Further, in a possible implementation of this embodiment, the violations include any one or any combination of the following: smoking behavior, advertising behavior, clothing exposure behavior, and soiling behavior.
Further, in a possible implementation manner of this embodiment, when the offence is a smoking behavior, the input module is further configured to sequentially input the plurality of video frames into the offence detection model, so as to determine whether an abnormal sign of a cigarette and/or smoke exists in the video frames.
In the apparatus, frames are extracted from the video to be detected, the plurality of video frames are sequentially input into the violation detection model to determine whether violations exist, and anomaly markers are applied to the violations. If a first violation is determined to exist in the first video frame, whether a second violation exists in the second video frame is confirmed according to its anomaly marker, where the second video frame is the video frame preceding the first video frame among the plurality of video frames. Finally, once the second violation is determined to exist in the second video frame, it is determined from the first violation and the second violation that a live-room violation exists in the video to be detected.
The foregoing explanation of the method embodiment is also applicable to the apparatus of this embodiment, and the principle is the same, and this embodiment is not limited thereto.
According to embodiments of the present disclosure, the present disclosure also provides an electronic device, a readable storage medium and a computer program product.
Fig. 6 illustrates a schematic block diagram of an example electronic device 600 that may be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 6, the device 600 includes a computing unit 601 that can perform various appropriate actions and processes according to a computer program stored in a ROM (Read-Only Memory) 602 or a computer program loaded from a storage unit 608 into a RAM (Random Access Memory) 603. Various programs and data required for the operation of the device 600 may also be stored in the RAM 603. The computing unit 601, the ROM 602, and the RAM 603 are connected to each other by a bus 604. An I/O (Input/Output) interface 605 is also connected to the bus 604.
Various components in the device 600 are connected to the I/O interface 605, including: an input unit 606 such as a keyboard, mouse, etc.; an output unit 607 such as various types of displays, speakers, and the like; a storage unit 608, such as a magnetic disk, optical disk, or the like; and a communication unit 609 such as a network card, modem, wireless communication transceiver, etc. The communication unit 609 allows the device 600 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The computing unit 601 may be any of various general and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 601 include, but are not limited to, a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), various dedicated AI (Artificial Intelligence) computing chips, various computing units running machine learning model algorithms, a DSP (Digital Signal Processor), and any suitable processors, controllers, microcontrollers, and the like. The computing unit 601 performs the various methods and processes described above, such as the method for detecting live-room violations. For example, in some embodiments, the method for detecting live-room violations may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 608. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 600 via the ROM 602 and/or the communication unit 609. When the computer program is loaded into the RAM 603 and executed by the computing unit 601, one or more steps of the method described above may be performed. Alternatively, in other embodiments, the computing unit 601 may be configured to perform the aforementioned method for detecting live-room violations in any other suitable way (e.g., by means of firmware).
Various implementations of the systems and techniques described here may be implemented in digital electronic circuitry, integrated circuit systems, FPGAs (Field Programmable Gate Arrays), ASICs (Application-Specific Integrated Circuits), ASSPs (Application Specific Standard Products), SOCs (Systems On Chip), CPLDs (Complex Programmable Logic Devices), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs, where the one or more computer programs may be executed and/or interpreted on a programmable system including at least one programmable processor, which may be a special purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for carrying out methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, RAM, ROM, EPROM (Erasable Programmable Read-Only Memory) or flash memory, an optical fiber, a CD-ROM (Compact Disc Read-Only Memory), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., CRT (Cathode-Ray Tube) or LCD (Liquid Crystal Display ) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a background component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such background, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: LAN (Local Area Network ), WAN (Wide Area Network, wide area network), internet and blockchain networks.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server (also called a cloud computing server or cloud host), a host product in a cloud computing service system that overcomes the defects of high management difficulty and weak service expansibility found in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system or a server combined with a blockchain.
It should be noted that artificial intelligence is the discipline of using computers to simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and involves technologies at both the hardware and software level. Artificial intelligence hardware technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, and the like; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technology, and the like.
It should be appreciated that various forms of the flows shown above may be used to reorder, add, or delete steps. For example, the steps recited in the present disclosure may be performed in parallel or sequentially or in a different order, provided that the desired results of the technical solutions of the present disclosure are achieved, and are not limited herein.
The above detailed description should not be taken as limiting the scope of the present disclosure. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (14)

1. A method for detecting violations in a live broadcasting room, comprising:
extracting frames from a video to be detected to obtain a plurality of video frames;
sequentially inputting the plurality of video frames into a violation detection model to determine whether a violation exists in each video frame, and applying an anomaly marker to frames in which a violation is found;
if it is determined from the anomaly marker of a first video frame that a first violation exists in the first video frame, confirming whether a second violation exists in a second video frame according to the anomaly marker of the second video frame, wherein the second video frame is the video frame preceding the first video frame among the plurality of video frames;
if it is determined that the second violation exists in the second video frame, determining from the first violation and the second violation that a live-room violation exists in the video to be detected;
wherein the extracting frames from the video to be detected to obtain a plurality of video frames comprises:
segmenting the video to be detected according to a preset video duration threshold to obtain video segments to be detected;
sequentially extracting frames from each video segment to be detected to obtain a plurality of video frames;
wherein the video segments to be detected comprise a first video segment and a second video segment, the second video segment being the to-be-detected video segment adjacent to the first video segment; and the method further comprises:
if a target video frame extracted from the first video segment and a reference video frame extracted from the second video segment both carry anomaly markers of a violation, merging the target video frame and the reference video frame, wherein the target video frame is a first video frame of the first video segment and the reference video frame is a second video frame of the second video segment;
and calculating a second start-stop duration of the violation according to the duration of the target video frame and the duration of the reference video frame.
2. The method of detecting live room violations of claim 1, wherein the method further comprises:
merging the duration of the first video frame and the duration of the second video frame to obtain a first start-stop duration of the violation in the video to be detected.
3. The method of detecting live room violations of claim 1, wherein the method further comprises:
under the condition that the video to be detected has live broadcasting room violations, determining the severity level of the violations according to the anomaly marks;
and issuing a corresponding alarm prompt to the live broadcasting room according to the severity level of the illegal action.
4. The method of detecting live room violations of claim 1, wherein the method further comprises:
acquiring training video frame information, wherein the training video frame information comprises: sample video frames and sample target information for the sample video frames;
inputting the sample video frame into an offence detection model to obtain marked offence and predicted target information in the sample video frame;
and training the offence detection model according to the difference between the sample target information and the predicted target information to obtain a trained offence detection model.
5. The method of detecting live room violations according to any one of claims 1-4, wherein the violations comprise any one or any combination of the following: smoking behavior, advertising behavior, clothing exposure behavior, and soiling behavior.
6. The method of detecting live room violations according to claim 5, wherein when the violation is a smoking behavior, the sequentially inputting the plurality of video frames into the violation detection model to determine whether a violation exists in the video frames and applying an anomaly marker to the violation comprises:
sequentially inputting the plurality of video frames into the violation detection model to determine whether an anomaly marker of cigarettes and/or smoke exists in the video frames.
7. An apparatus for detecting live-room violations, comprising:
a first frame extraction module, configured to extract frames from a video to be detected to obtain a plurality of video frames;
a first input module, configured to sequentially input the plurality of video frames into a violation detection model to determine whether a violation exists in each video frame, and to apply an anomaly marker to frames in which a violation is found;
a first confirming module, configured to confirm whether a second violation exists in a second video frame according to the anomaly marker of the second video frame after determining, according to the anomaly marker of a first video frame, that a first violation exists in the first video frame, wherein the second video frame is the video frame preceding the first video frame among the plurality of video frames;
a first determining module, configured to determine, when it is determined that the second violation exists in the second video frame, that a live-room violation exists in the video to be detected according to the first violation and the second violation;
wherein the first frame extraction module further comprises:
a segmentation unit, configured to segment the video to be detected according to a preset video duration threshold to obtain video segments to be detected;
a frame extraction unit, configured to sequentially extract frames from each video segment to be detected to obtain a plurality of video frames;
wherein the video segments to be detected comprise a first video segment and a second video segment, the second video segment being the to-be-detected video segment adjacent to the first video segment; and the apparatus further comprises:
a second merging module, configured to merge a target video frame and a reference video frame when the target video frame extracted from the first video segment and the reference video frame extracted from the second video segment both carry anomaly markers of a violation, wherein the target video frame is a first video frame of the first video segment and the reference video frame is a second video frame of the second video segment;
and a calculating module, configured to calculate a second start-stop duration of the violation according to the duration of the target video frame and the duration of the reference video frame.
8. The apparatus for detecting live room violations of claim 7, wherein the apparatus further comprises:
a first merging module, configured to merge the duration of the first video frame and the duration of the second video frame to obtain a first start-stop duration of the violation in the video to be detected.
9. The apparatus for detecting live room violations of claim 7, wherein the apparatus further comprises:
the second determining module is used for determining the severity level of the illegal act according to the abnormal mark under the condition that the video to be detected has the illegal act in the live broadcasting room;
and the issuing module is used for issuing a corresponding alarm prompt to the live broadcasting room according to the severity level of the illegal action.
10. The apparatus for detecting live room violations of claim 7, wherein the apparatus further comprises:
the acquisition module is used for acquiring training video frame information, wherein the training video frame information comprises: sample video frames and sample target information for the sample video frames;
the second input module is used for inputting the sample video frame into the offence detection model so as to acquire marked offence and predicted target information in the sample video frame;
and the training module is used for training the offence detection model according to the difference between the sample target information and the predicted target information so as to obtain a trained offence detection model.
11. The apparatus for detecting live room violations according to any one of claims 7 to 10, wherein the violations comprise any one or any combination of the following: smoking behavior, advertising behavior, clothing exposure behavior, and soiling behavior.
12. The apparatus for detecting live room violations according to claim 11, wherein when the violation is a smoking behavior, the input module is further configured to sequentially input the plurality of video frames into the violation detection model to determine whether an anomaly marker of cigarettes and/or smoke exists in the video frames.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-6.
14. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-6.
CN202110909967.1A 2021-08-09 2021-08-09 Method and device for detecting illegal behaviors of live broadcasting room, electronic equipment and storage medium Active CN113705370B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110909967.1A CN113705370B (en) 2021-08-09 2021-08-09 Method and device for detecting illegal behaviors of live broadcasting room, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110909967.1A CN113705370B (en) 2021-08-09 2021-08-09 Method and device for detecting illegal behaviors of live broadcasting room, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113705370A CN113705370A (en) 2021-11-26
CN113705370B true CN113705370B (en) 2023-06-30

Family

ID=78651971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110909967.1A Active CN113705370B (en) 2021-08-09 2021-08-09 Method and device for detecting illegal behaviors of live broadcasting room, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113705370B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115412775B (en) * 2022-08-25 2024-01-09 北京达佳互联信息技术有限公司 Live broadcast prompting method, device, system, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020248386A1 (en) * 2019-06-14 2020-12-17 平安科技(深圳)有限公司 Video analysis method and apparatus, computer device and storage medium
CN112488021A (en) * 2020-12-10 2021-03-12 中国计量大学 Monitoring video-based garbage delivery violation detection method and system
CN112801062A (en) * 2021-04-07 2021-05-14 平安科技(深圳)有限公司 Live video identification method, device, equipment and medium
WO2021098657A1 (en) * 2019-11-18 2021-05-27 中国科学院深圳先进技术研究院 Video detection method and apparatus, terminal device, and readable storage medium
CN112995696A (en) * 2021-04-20 2021-06-18 共道网络科技有限公司 Live broadcast room violation detection method and device

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9226037B2 (en) * 2010-12-30 2015-12-29 Pelco, Inc. Inference engine for video analytics metadata-based event detection and forensic search
CN102831442A (en) * 2011-06-13 2012-12-19 索尼公司 Abnormal behavior detection method and equipment and method and equipment for generating abnormal behavior detection equipment
CN111310665A (en) * 2020-02-18 2020-06-19 深圳市商汤科技有限公司 Violation event detection method and device, electronic equipment and storage medium
CN111586356B (en) * 2020-05-08 2021-07-20 国家邮政局邮政业安全中心 Violation monitoring method, device and system, electronic equipment and storage medium
CN111797752A (en) * 2020-06-29 2020-10-20 广州市百果园信息技术有限公司 Illegal video detection method, device, equipment and storage medium
CN113052029A (en) * 2021-03-12 2021-06-29 天天惠民(北京)智能物流科技有限公司 Abnormal behavior supervision method and device based on action recognition and storage medium
CN113065444A (en) * 2021-03-26 2021-07-02 北京大米科技有限公司 Behavior detection method and device, readable storage medium and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020248386A1 (en) * 2019-06-14 2020-12-17 平安科技(深圳)有限公司 Video analysis method and apparatus, computer device and storage medium
WO2021098657A1 (en) * 2019-11-18 2021-05-27 中国科学院深圳先进技术研究院 Video detection method and apparatus, terminal device, and readable storage medium
CN112488021A (en) * 2020-12-10 2021-03-12 中国计量大学 Monitoring video-based garbage delivery violation detection method and system
CN112801062A (en) * 2021-04-07 2021-05-14 平安科技(深圳)有限公司 Live video identification method, device, equipment and medium
CN112995696A (en) * 2021-04-20 2021-06-18 共道网络科技有限公司 Live broadcast room violation detection method and device

Also Published As

Publication number Publication date
CN113705370A (en) 2021-11-26

Similar Documents

Publication Publication Date Title
CN113177469B (en) Training method and device of human attribute detection model, electronic equipment and medium
CN112380981A (en) Face key point detection method and device, storage medium and electronic equipment
CN109086780B (en) Method and device for detecting electrode plate burrs
CN113205037B (en) Event detection method, event detection device, electronic equipment and readable storage medium
CN114415628A (en) Automatic driving test method and device, electronic equipment and storage medium
CN109344864B (en) Image processing method and device for dense object
CN113012176B (en) Sample image processing method and device, electronic equipment and storage medium
CN111292327B (en) Machine room inspection method, device, equipment and storage medium
CN113705370B (en) Method and device for detecting illegal behaviors of live broadcasting room, electronic equipment and storage medium
CN115346171A (en) Power transmission line monitoring method, device, equipment and storage medium
CN112767935B (en) Awakening index monitoring method and device and electronic equipment
CN113361363A (en) Training method, device and equipment for face image recognition model and storage medium
CN115756256A (en) Information labeling method, system, electronic equipment and storage medium
CN115641360A (en) Battery detection method and device based on artificial intelligence and electronic equipment
CN114861321A (en) Problem scene extraction method, device, equipment and medium for traffic flow simulation
CN113642472A (en) Training method and action recognition method of discriminator model
CN113591569A (en) Obstacle detection method, obstacle detection device, electronic apparatus, and storage medium
CN111126771A (en) Safety inspector image recognition quality supervision and guarantee system and method based on regional attention prediction
CN115294536B (en) Violation detection method, device, equipment and storage medium based on artificial intelligence
CN114445711B (en) Image detection method, image detection device, electronic equipment and storage medium
CN114677691B (en) Text recognition method, device, electronic equipment and storage medium
CN113806361B (en) Method, device and storage medium for associating electronic monitoring equipment with road
CN107807572A (en) River course lock station machine room monitoring system
CN116884034A (en) Object identification method and device
KR20170006312A (en) Apparatus and method for inferencing health condition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant