CN113965772A - Live video processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113965772A
Authority
CN
China
Prior art keywords
live video
cluster
frame image
face information
video streams
Prior art date
Legal status
Pending
Application number
CN202111279813.5A
Other languages
Chinese (zh)
Inventor
刘洋
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority: CN202111279813.5A
Publication: CN113965772A
Legal status: Pending

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/2187 Live feed
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream, rendering scenes according to MPEG-4 scene graphs
    • H04N21/44213 Monitoring of end-user related data
    • H04N21/44218 Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
    • H04N21/44236 Monitoring of piracy processes or activities
    • H04N21/4788 Supplemental services communicating with other users, e.g. chatting

Abstract

The application discloses a live video processing method and apparatus, an electronic device, and a storage medium, relating to the field of image processing, and in particular to the fields of information flow and deep learning. The method is implemented by: acquiring a frame image of each live video stream in response to a processing request for a plurality of live video streams; extracting face information from each frame image to obtain the face information of each frame image; clustering the face information of the frame images to obtain at least one cluster; and identifying, according to the at least one cluster, whether the same anchor appears among the anchors corresponding to the live video streams. The method reduces the workload and labor cost of manual review and increases the coverage of violation review.

Description

Live video processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing, in particular to the fields of information flow and deep learning, and specifically to a live video processing method and apparatus, an electronic device, and a storage medium.
Background
With the popularization of 5G (fifth-generation mobile communication technology) and the upgrading of related infrastructure, video has become one of the main carriers of information exchange, and live video streaming has penetrated deeply into industries such as e-commerce and e-sports, bringing considerable economic benefits. Violation detection in live video is therefore in strong demand, and most related technologies rely on user reporting, manual review, and similar approaches.
However, for the one-person multicast violation in live video (the same anchor streaming on multiple accounts simultaneously), an effective detection and review method is currently lacking: since the entire network's data is involved, user reports are difficult to scale to high coverage, while manual review carries a heavy workload and high labor cost.
Disclosure of Invention
The application provides a live video processing method and device, electronic equipment and a storage medium.
According to an aspect of the present application, there is provided a live video processing method, including:
responding to processing requests of a plurality of live video streams, and acquiring a frame image of each live video stream;
extracting face information of each frame image to obtain the face information of each frame image;
clustering the face information of each frame image to obtain at least one cluster;
and identifying whether the same anchor exists in the anchors corresponding to the live broadcast video streams according to the at least one cluster.
According to another aspect of the present application, there is provided a live video processing apparatus including:
an acquisition module, configured to acquire a frame image of each live video stream in response to processing requests of a plurality of live video streams;
the extraction module is used for extracting the face information of each frame image to obtain the face information of each frame image;
the processing module is used for clustering the face information of each frame image to obtain at least one cluster;
and the identification module is used for identifying whether the same anchor exists in the anchors corresponding to the live video streams according to the at least one cluster.
According to a third aspect of the present application, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
According to a fourth aspect of the present application, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the aforementioned first aspect.
According to a fifth aspect of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, implements the method according to the preceding first aspect.
According to the technical scheme of the application, frame images of a plurality of live video streams are acquired, the face information in the frame images is clustered to obtain clusters, and whether the same anchor exists among the anchors of the live video streams is identified based on the clusters. That is, frame images are extracted from the live video streams, clustering is performed on the face information in the frame images, and the clustering result provides a judgment condition for reviewing one-person multicast violations, so that live video streams without such violations can be screened out. This realizes a machine review function for the one-person multicast violation scenario in live streaming (the live video of the same anchor being played on multiple accounts simultaneously, possibly captured from different angles), thereby reducing the workload of manual review, improving review efficiency, and lowering labor costs.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The accompanying drawings are included to provide a better understanding of the present solution and are not to be considered limiting of the present application, in which:
fig. 1 is a flowchart of a live video processing method according to an embodiment of the present application;
fig. 2 is a flowchart of another live video processing method according to an embodiment of the present application;
fig. 3 is a schematic flowchart of live video processing provided in an embodiment of the present application;
fig. 4 is a flowchart of another live video processing method according to an embodiment of the present application;
fig. 5 is a block diagram illustrating a structure of a live video processing apparatus according to an embodiment of the present disclosure;
fig. 6 is a block diagram of another live video processing apparatus according to an embodiment of the present disclosure;
fig. 7 is a block diagram of another live video processing apparatus according to an embodiment of the present application;
fig. 8 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered merely exemplary. Those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
It should be noted that, with the popularization of 5G (fifth-generation mobile communication technology) and the upgrading of related infrastructure, video has become one of the main carriers of information exchange, and live video streaming has been deeply integrated into industries such as e-commerce and e-sports, bringing considerable economic benefits. Violation detection in live video is in strong demand, yet current approaches mostly rely on user reporting and manual review. For the one-person multicast violation in live video, an effective detection and review method is lacking: because the entire network's data is involved, the coverage of user reports is limited, and manual review is difficult to implement and costly.
To address the above problems, the present application provides a live video processing method and apparatus, an electronic device, and a storage medium. Face information is obtained from the original video stream data, clustering is performed on the face information, and whether a one-person multicast violation exists is judged based on the clustering result. The live video processing method, apparatus, electronic device, and storage medium of the embodiments of the present application are described below with reference to the drawings.
Fig. 1 is a flowchart of a live video processing method according to an embodiment of the present application. It should be noted that the live video processing method according to the embodiment of the present application is applicable to the live video processing apparatus according to the embodiment of the present application, and the live video processing apparatus may be configured on an electronic device. As shown in fig. 1, the live video processing method includes the following steps:
step 101, in response to a processing request of a plurality of live video streams, acquiring a frame image of each live video stream.
In this embodiment of the present application, the processing request may be a violation detection request, used to detect whether a one-person multicast violation exists among the live video streams.
In some embodiments of the present application, a frame image may be understood as a key frame image in a live video stream, or as a frame image containing a human face in the live video stream. For example, when processing requests for a plurality of live video streams are received, one frame image may be extracted from each live video stream. Taking live video streams 1, 2, and 3 as an example, upon receiving processing requests for these three streams, a key frame image is extracted from each of live video stream 1, live video stream 2, and live video stream 3.
It should be noted that the live video processing method in the embodiment of the present application is applicable to a one-person multicast violation detection scenario for a large amount of live video data of the whole network. For example, a large number of live video streams may be acquired, and key frame images of each live video stream may be extracted upon receiving a processing request for the live video streams.
And 102, extracting the face information of each frame image to obtain the face information of each frame image.
It should be noted that many methods exist for extracting face information from a frame image; for example, face features may be extracted by HOG (Histogram of Oriented Gradients) or by a CNN (Convolutional Neural Network). Two examples of specific extraction methods are given below:
as an example, the feature extraction of the human face is performed by using the HOG, the image is firstly divided into small connected regions, the connected regions are cell units, direction histograms of gradients or edges of all pixel points in the cell units are collected, and the direction histograms are combined to form feature description.
As another example, CNN is composed of three structures of convolution, activation and pooling. The convolutional layer contains a plurality of convolutional kernels, the convolutional kernels scan the image to obtain output data called a feature map, and finally, the abstract representation of the image is obtained through steps of activation, pooling, operation and the like.
It should be noted that the above-mentioned manner of extracting the face information is only for facilitating understanding of how to extract the face information in the frame image by those skilled in the art, and cannot be taken as a specific limitation of the present application, that is, the present application may also adopt other means to extract the face information from the frame image, and details are not described herein again.
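As an illustration of the HOG extraction described above, the following minimal sketch (not part of the patent; the function name `hog_descriptor`, the cell size, and the bin count are illustrative assumptions) divides an image into cell units, collects a gradient-orientation histogram per cell, and concatenates the histograms into a single normalized descriptor:

```python
import numpy as np

def hog_descriptor(image, cell_size=8, bins=9):
    """Minimal HOG sketch: per-cell orientation histograms of gradients."""
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    # Unsigned gradient orientation in [0, 180) degrees
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0
    h, w = image.shape
    features = []
    for y in range(0, h - cell_size + 1, cell_size):
        for x in range(0, w - cell_size + 1, cell_size):
            cell_ori = orientation[y:y + cell_size, x:x + cell_size]
            cell_mag = magnitude[y:y + cell_size, x:x + cell_size]
            # Magnitude-weighted histogram of orientations for this cell unit
            hist, _ = np.histogram(cell_ori, bins=bins, range=(0, 180),
                                   weights=cell_mag)
            features.append(hist)
    vec = np.concatenate(features)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec  # L2-normalised descriptor

face_crop = np.random.rand(32, 32)  # stand-in for a detected face region
desc = hog_descriptor(face_crop)
print(desc.shape)  # 4x4 cells x 9 bins -> (144,)
```

In practice, a library implementation with block normalization (e.g., OpenCV's `HOGDescriptor`) would be used instead of this sketch.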
And 103, clustering the face information of each frame image to obtain at least one cluster.
Optionally, a clustering algorithm may be used to cluster the face information of each frame image to obtain at least one cluster.
In some embodiments of the present application, the clustering algorithm may be a partition-based method, such as the K-means clustering algorithm. Alternatively, it may be a hierarchy-based method, such as the BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) algorithm. Alternatively, it may be a density-based method, such as the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm or the mean shift clustering algorithm. Alternatively, the clustering algorithm may be grid-based or model-based. The present application is not particularly limited in this respect.
To make the clustering process clearer to those skilled in the art, the mean shift clustering algorithm is described in detail below as an example.
It should be noted that mean shift clustering uses a sliding-window algorithm to find dense regions of data points. It is a centroid-based algorithm that locates the center of each group/class by iteratively updating each candidate center to the mean of the points within its sliding window. Overlapping windows converging on the same mode are then merged, yielding a final set of center points and their corresponding groups.
For example, after the face information of each frame image is obtained, a sliding-window radius r is determined, and sliding begins with a circular window of radius r centered at a randomly selected point C. Mean shift resembles a hill-climbing algorithm: in each iteration the window moves toward a denser region until convergence. Each time the window slides to a new region, the mean of the points within the window is taken as the new center, and the number of points within the window is the density of the window. The window keeps moving toward denser regions and stops when no direction would bring more points into the window, that is, when the density within the circle no longer increases. This process produces multiple sliding windows; when windows overlap, the window containing the most points is retained, and the data points are then clustered according to the window in which they fall, yielding at least one cluster. Here a data point corresponds to one piece of face information, i.e., the face information of one face.
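The mean shift procedure described above can be sketched as follows (an illustrative implementation, not the patent's own code; a flat circular kernel and a simple mode-merging rule are assumed, and the 2-D points stand in for face feature vectors):

```python
import numpy as np

def mean_shift(points, radius, max_iter=50, tol=1e-4):
    """Flat-kernel mean shift: each candidate center climbs to the mode
    (densest region) of its circular sliding window of the given radius."""
    modes = points.astype(float).copy()
    for _ in range(max_iter):
        moved = False
        for i, m in enumerate(modes):
            # Points inside the current sliding window
            in_window = points[np.linalg.norm(points - m, axis=1) <= radius]
            new_m = in_window.mean(axis=0)  # window mean becomes the new center
            if np.linalg.norm(new_m - m) > tol:
                modes[i] = new_m
                moved = True
        if not moved:  # no window can move toward a denser region
            break
    # Merge modes that converged to (nearly) the same center -> cluster labels
    labels, centres = [], []
    for m in modes:
        for k, c in enumerate(centres):
            if np.linalg.norm(m - c) < radius / 2:
                labels.append(k)
                break
        else:
            centres.append(m)
            labels.append(len(centres) - 1)
    return np.array(labels), np.array(centres)

# Two well-separated blobs standing in for face-feature data points
rng = np.random.default_rng(0)
pts = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(5, 0.3, (20, 2))])
labels, centres = mean_shift(pts, radius=1.5)
print(len(centres))  # 2 clusters
```

Each resulting cluster would then gather the face information of one apparent person across the sampled frame images.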
And 104, identifying whether the same anchor exists in the anchors corresponding to the live video streams according to the at least one cluster.
Optionally, whether the same anchor exists among the anchors corresponding to the plurality of live video streams is identified based on the content of each cluster. In this way, by cross-analyzing a large amount of live video data, detection of one-person multicast violations can be realized through face clustering.
According to the live video processing method of the embodiments of the present application, frame images are extracted from the live video streams, which provides material for subsequent review; the face information in the frame images is clustered, and the clustering result provides a judgment condition for reviewing one-person multicast violations, so that live video streams without such violations are screened out. This realizes a machine review function for the one-person multicast violation scenario in live streaming (the live video of the same anchor being played on multiple accounts simultaneously, possibly captured from different angles), with high coverage and practicality, thereby reducing the workload of manual review, improving review efficiency, and lowering labor costs.
Fig. 2 is a flowchart of another live video processing method according to an embodiment of the present application. As shown in fig. 2, the live video processing method may include:
step 201, in response to a processing request of a plurality of live video streams, acquiring a frame image of each live video stream.
In the embodiment of the present application, step 201 may be implemented by using any one of the embodiments of the present application, which is not limited in this embodiment and is not described again.
Step 202, extracting the face information of each frame image to obtain the face information of each frame image.
In the embodiment of the present application, step 202 may be implemented by any one of the embodiments of the present application, which is not limited in this embodiment and is not described again.
And 203, clustering the face information of each frame image to obtain at least one cluster.
In the embodiment of the present application, step 203 may be implemented by using any one of the embodiments of the present application, which is not limited in this embodiment and is not described again.
And step 204, determining the number of the face information in each cluster.
It can be understood that, since a cluster is the result of clustering face information and each piece of face information corresponds to one face, the number of pieces of face information contained in each cluster can be counted.
Step 205, in response to at least one cluster containing a target cluster whose number of pieces of face information exceeds a preset threshold, determining that the same anchor exists among the anchors corresponding to the plurality of live video streams.
Optionally, the number of pieces of face information contained in each cluster may be compared with a preset threshold; if a target cluster whose count exceeds the preset threshold is found, the same anchor exists among the anchors corresponding to the plurality of live video streams. For example, if the number of pieces of face information in a certain cluster is greater than a threshold (e.g., 1), a one-person multicast violation may exist.
And step 206, sending the live broadcast video streams corresponding to the face information in the target cluster to a manual auditing terminal.
Optionally, when a possible one-person multicast violation is determined based on the number of pieces of face information in a cluster, the live video streams corresponding to the face information in that cluster may be sent to a manual review terminal.
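The counting and thresholding of steps 204-206 can be sketched as follows (illustrative code, not the patent's implementation; the function name, the stream identifiers, and the default threshold of 1 are assumptions):

```python
from collections import defaultdict

def flag_suspect_streams(cluster_labels, stream_ids, threshold=1):
    """Group streams by cluster label; a cluster holding more than
    `threshold` faces is a target cluster, suggesting the same anchor
    appears on multiple live video streams."""
    clusters = defaultdict(list)
    for label, stream in zip(cluster_labels, stream_ids):
        clusters[label].append(stream)
    # Keep only target clusters whose face count exceeds the threshold
    return {label: streams for label, streams in clusters.items()
            if len(streams) > threshold}

# One face per live stream; streams 1 and 2 were clustered together
suspects = flag_suspect_streams(cluster_labels=[0, 0, 1],
                                stream_ids=["stream1", "stream2", "stream3"])
print(suspects)  # {0: ['stream1', 'stream2']}
```

The streams in each returned target cluster would then be forwarded to the manual review terminal for confirmation.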
According to the live video processing method of the embodiments of the present application, counting the face information in each cluster identifies the face information in the live video streams and provides a judgment condition for reviewing one-person multicast violations; comparing the count against a preset threshold determines whether such a violation may exist. This realizes a machine review function for the one-person multicast violation scenario in live streaming (the live video of the same anchor being played on multiple accounts simultaneously, possibly captured from different angles), with high coverage and practicality, thereby reducing the workload of manual review, improving review efficiency, and lowering labor costs.
To further improve the detection result and reduce the cost of manual review, optionally, as shown in fig. 3 and fig. 4, the live video processing method may include:
step 401, in response to a processing request of a plurality of live video streams, acquiring a frame sequence of each of the plurality of live video streams.
A frame sequence represents the live video stream as an ordered series of frame images. For example, for a given live video stream, frame images can be extracted from the stream and combined in order to obtain its frame sequence.
Step 402, sampling a plurality of frame sequences for N times, and acquiring a frame image from each of the plurality of frame sequences each time; wherein N is an integer greater than or equal to 1.
For example, taking three live video streams, in response to a processing request for the live video streams, assume that the frame sequence of live video stream 1 includes frame images 11, 12, and 13; that of live video stream 2 includes frame images 21, 22, and 23; and that of live video stream 3 includes frame images 31, 32, and 33. Each sampling acquires one frame image from each frame sequence; in the third sampling, for instance, frame image 13 is acquired from the frame sequence of live video stream 1, frame image 23 from that of live video stream 2, and frame image 33 from that of live video stream 3.
Step 403, determining a frame image of each live video stream according to the frame image obtained by current sampling.
For example, again taking the three live video streams above, suppose the second sampling acquires frame image 12 from the frame sequence of live video stream 1, frame image 22 from that of live video stream 2, and frame image 32 from that of live video stream 3. The currently acquired frame images 12, 22, and 32 are then determined as the frame images of live video streams 1, 2, and 3 for this sampling.
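The sampling of steps 401-403 can be sketched as follows (an illustrative sketch, not the patent's code; the function name and the even spacing of samples across each frame sequence are assumptions):

```python
def sample_rounds(frame_sequences, n_rounds):
    """For each of `n_rounds` samplings, take one frame image from every
    live stream's frame sequence (here spread evenly over the sequence)."""
    rounds = []
    for i in range(n_rounds):
        batch = {}
        for stream_id, frames in frame_sequences.items():
            # Pick the i-th of n_rounds evenly spaced positions
            idx = i * len(frames) // n_rounds
            batch[stream_id] = frames[idx]
        rounds.append(batch)
    return rounds

# Frame sequences of the three example streams (frame images 11..33)
seqs = {"stream1": [11, 12, 13],
        "stream2": [21, 22, 23],
        "stream3": [31, 32, 33]}
for batch in sample_rounds(seqs, 3):
    print(batch)
# first round: {'stream1': 11, 'stream2': 21, 'stream3': 31}, and so on
```

Each batch then serves as the set of frame images for one round of face extraction and clustering.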
And step 404, extracting the face information of each frame image to obtain the face information of each frame image.
In the embodiment of the present application, step 404 may be implemented by using any one of the embodiments of the present application, which is not limited in this embodiment and is not described again.
Step 405, performing clustering processing on the face information of each frame image to obtain at least one cluster.
In the embodiment of the present application, step 405 may be implemented by using any one of the embodiments of the present application, which is not limited in this embodiment and is not described again.
Step 406, when the target clusters obtained from the face information of the sampled frame images are the same in every sampling, determining that the same anchor exists among the anchors corresponding to the plurality of live video streams.
In the embodiment of the present application, a target cluster is a cluster, among the at least one cluster, whose number of pieces of face information is greater than the preset threshold.
For example, as shown in fig. 3, in response to a processing request for N live video streams, the frame sequences of the N streams are obtained and sampled N times, with one frame image acquired from each frame sequence per sampling. The frame image of each live video stream is determined from the current sampling, face information is extracted from each frame image, and the face information is clustered to obtain at least one cluster. If a cluster exists whose number of pieces of face information exceeds the preset threshold, a one-person multicast violation may exist among the live video streams corresponding to that cluster. In fig. 3, each cluster after clustering is represented by a circle: the number 1 in a circle means the cluster contains 1 piece of face information, 2 means it contains 2, and 5 means it contains 5, so whether a one-person multicast violation may exist in the corresponding live video streams can be determined from these counts. When the target clusters obtained from the face information of the sampled frame images are the same in every sampling, it is determined that the same anchor exists among the anchors corresponding to the live video streams.
Taking three live video streams as an example, in response to a processing request for the live video streams, assume that the frame sequence of live video stream 1 includes frame images 11, 12 and 13, the frame sequence of live video stream 2 includes frame images 21, 22 and 23, and the frame sequence of live video stream 3 includes frame images 31, 32 and 33. In each sampling round, one frame image is acquired from each frame sequence. Taking the 3rd round as an example, frame image 13 is acquired from the frame sequence of live video stream 1, frame image 23 from that of live video stream 2, and frame image 33 from that of live video stream 3; the currently acquired frame images 13, 23 and 33 are then determined as the frame images of live video streams 1, 2 and 3 for this round. Assume that after face clustering is performed on frame images 13, 23 and 33, 2 clusters are obtained, where cluster 1 contains the face information in frame image 13 and the face information in frame image 23, and cluster 2 contains the face information in frame image 33. It can be seen that a one-person multicast behavior may exist in the live video streams corresponding to cluster 1. Each sampling round acquires one frame image from each of the frame sequences, the frame images acquired in that round are clustered once to obtain one identification result, three rounds of sampling yield three identification results, and the three results are aggregated to obtain a final result. The multiple live video streams corresponding to the face information in the flagged cluster are then pushed into a human review process for further confirmation by manual review.
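The sampling-and-aggregation flow above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: face extraction is replaced by precomputed embedding vectors, and a simple greedy cosine-distance grouping stands in for whatever face-clustering algorithm the method actually uses; all names and the 0.5 distance threshold are invented for the sketch.

```python
import numpy as np

def cluster_embeddings(embeddings, stream_ids, dist_thresh=0.5):
    """Greedy cosine clustering: each embedding joins the first cluster whose
    representative is within dist_thresh (cosine distance), otherwise it
    starts a new cluster. Returns {cluster_index: [stream_id, ...]}."""
    reps, members = [], []
    for emb, sid in zip(embeddings, stream_ids):
        unit = emb / np.linalg.norm(emb)
        for k, rep in enumerate(reps):
            if 1.0 - float(unit @ rep) < dist_thresh:  # cosine distance
                members[k].append(sid)
                break
        else:
            reps.append(unit)
            members.append([sid])
    return dict(enumerate(members))

def detect_one_person_multicast(rounds, stream_ids, threshold=1):
    """For every sampling round, cluster that round's face embeddings and
    record the stream sets of 'target clusters' (size > threshold). The same
    anchor is reported only when every round flags the same stream set."""
    flagged_per_round = []
    for embeddings in rounds:
        clusters = cluster_embeddings(embeddings, stream_ids)
        flagged_per_round.append(
            {frozenset(m) for m in clusters.values() if len(m) > threshold}
        )
    common = set.intersection(*flagged_per_round) if flagged_per_round else set()
    return [sorted(s) for s in common]

# Toy data: streams s1 and s2 carry the same (simulated) face, s3 a different one.
rng = np.random.default_rng(0)
anchor, other = rng.normal(size=128), rng.normal(size=128)
rounds = [
    [anchor + 0.01 * rng.normal(size=128),   # frame sampled from s1 in round i
     anchor + 0.01 * rng.normal(size=128),   # frame sampled from s2
     other + 0.01 * rng.normal(size=128)]    # frame sampled from s3
    for _ in range(3)                         # three sampling rounds
]
print(detect_one_person_multicast(rounds, ["s1", "s2", "s3"]))  # [['s1', 's2']]
```

Requiring the same target cluster in every round, as in step 406, makes the detector robust against a single coincidental match in one sampled frame; only streams flagged in all rounds would be pushed to manual review.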
According to the live video processing method of this embodiment, frame sequences are obtained and sampled multiple times to obtain frame images, which provides material for the subsequent audit of one-person multicast violations. The sampled frame images are more representative and better reflect the real situation of the corresponding live video streams, which guarantees the reliability of the audit result for one-person multicast violations.
In order to implement the above embodiment, the present application further provides a live video processing apparatus.
Fig. 5 is a block diagram of a live video processing apparatus according to an embodiment of the present application. As shown in fig. 5, the live video processing apparatus may include: an acquisition module 510, an extraction module 520, a processing module 530, and a recognition module 540.
The obtaining module 510 is configured to obtain a frame image of each live video stream in response to a processing request of multiple live video streams.
And an extracting module 520, configured to perform face information extraction on each frame image to obtain face information of each frame image.
The processing module 530 is configured to perform clustering processing on the face information of each frame image to obtain at least one cluster.
The identifying module 540 is configured to identify whether the same anchor exists in anchors corresponding to the multiple live video streams according to the at least one cluster.
The live video processing apparatus of this embodiment obtains frame images by performing frame extraction on the live video streams, which provides a basis for subsequent auditing. By clustering the face information in the frame images, the clustering result provides a judgment condition for auditing one-person multicast violations, and live video streams without such violations are screened out. This realizes a machine audit function for one-person multicast violation scenarios in live broadcasting (live video of the same anchor being played on multiple accounts simultaneously, possibly from different capture angles), with high coverage and feasibility. It can reduce the workload of manual auditing, thereby improving audit efficiency and reducing audit labor cost.
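The four-module split of fig. 5 can be pictured as the following skeleton. This is an illustrative sketch only: the class name and method names are invented for the example, and every body is a placeholder for the corresponding module's logic.

```python
class LiveVideoProcessor:
    """Skeleton mirroring the apparatus of fig. 5 (all bodies are stubs)."""

    def acquire(self, stream_requests):
        """Acquisition module 510: one frame image per live video stream."""
        raise NotImplementedError

    def extract(self, frames):
        """Extraction module 520: face information for each frame image."""
        raise NotImplementedError

    def cluster(self, face_infos):
        """Processing module 530: group face information into >= 1 cluster."""
        raise NotImplementedError

    def identify(self, clusters):
        """Recognition module 540: do several streams share the same anchor?"""
        raise NotImplementedError
```

The module boundaries follow the patent's description; chaining `acquire -> extract -> cluster -> identify` reproduces the method flow of the earlier embodiments.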
In some embodiments of the present application, as shown in fig. 6, which is a block diagram of a live video processing apparatus according to another embodiment of the present application, the live video processing apparatus may include: an obtaining module 610, an extracting module 620, a processing module 630, a determining module 640, a determining module 650, and a manual review module 660.
The obtaining module 610 is configured to obtain a frame image of each live video stream in response to a processing request of a plurality of live video streams.
And an extracting module 620, configured to perform face information extraction on each frame image, and obtain face information of each frame image.
The processing module 630 is configured to perform clustering processing on the face information of each frame image to obtain at least one cluster.
And the determining module 640 is configured to determine the number of the face information in each cluster.
The determining module 650 is configured to determine that the same anchor exists in the anchors corresponding to the multiple live video streams in response to the at least one cluster including a target cluster in which the number of face information items is greater than a preset threshold.
And the manual review module 660 is configured to send the live broadcast video streams corresponding to the plurality of face information in the target cluster to a manual review terminal.
The live video processing apparatus of this embodiment identifies the face information in the video streams by determining the number of face information items in each cluster, which provides a judgment condition for auditing one-person multicast violations. The number of face information items in each cluster is compared with a preset threshold to judge whether a one-person multicast violation is possible. This realizes a machine audit function for one-person multicast violation scenarios in live broadcasting (live video of the same anchor being played on multiple accounts simultaneously, possibly from different capture angles), with high coverage and practicability. It can reduce the workload of manual auditing, thereby improving audit efficiency and reducing audit labor cost.
Modules 610-630 in fig. 6 have the same functions and structures as modules 510-530 in fig. 5.
In some embodiments of the present application, as shown in fig. 7, which is a block diagram of a live video processing apparatus according to another embodiment of the present application, the acquiring module 710 in the live video processing apparatus includes: an acquisition unit 711, a sampling unit 712, a determination unit 713, an extraction unit 714, a clustering unit 715, and a judging unit 716.
The acquiring unit 711 is configured to acquire a frame sequence of each of the plurality of live video streams in response to a processing request of the plurality of live video streams.
A sampling unit 712, configured to sample the multiple frame sequences N times, and acquire a frame image from each of the multiple frame sequences each time; wherein N is an integer greater than or equal to 1.
A determining unit 713, configured to determine a frame image of each live video stream according to the frame image obtained by the current sampling.
The extracting unit 714 is configured to perform face information extraction on each frame image, and acquire face information of each frame image.
And the clustering unit 715 is configured to perform clustering processing on the face information of each frame image to obtain at least one cluster.
The judging unit 716 is configured to determine that the same anchor exists in the anchors corresponding to the multiple live video streams when the target clusters obtained based on the face information in the frame images of each sampling round are the same.
According to the live video processing apparatus of this embodiment, frame sequences are obtained and sampled multiple times to obtain frame images, which provides material for the subsequent audit of one-person multicast violations. The sampled frame images are more representative and better reflect the real situation of the corresponding live video streams, which guarantees the reliability of the audit result.
Modules 710-760 in fig. 7 have the same functions and structures as modules 610-660 in fig. 6.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 8 is a block diagram of an electronic device for the live video processing method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are meant as examples only, and are not meant to limit implementations of the present application described and/or claimed herein.
As shown in fig. 8, the electronic device includes: one or more processors 801, a memory 802, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). One processor 801 is taken as an example in fig. 8.
The memory 802 is a non-transitory computer readable storage medium as provided herein. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of live video processing provided herein. A non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform a method of live video processing as provided herein.
The memory 802, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the live video processing method in the embodiments of the present application (e.g., the acquisition module 510, extraction module 520, processing module 530, and recognition module 540 shown in fig. 5). The processor 801 executes various functional applications and data processing of the server by running the non-transitory software programs, instructions, and modules stored in the memory 802, thereby implementing the live video processing method in the above method embodiments.
The memory 802 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created from use of an electronic device for live video processing, and the like. Further, the memory 802 may include high speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, memory 802 optionally includes memory located remotely from processor 801, which may be connected to a live video processing electronic device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method for live video processing may further include: an input device 803 and an output device 804. The processor 801, the memory 802, the input device 803, and the output device 804 may be connected by a bus or other means, and are exemplified by a bus in fig. 8.
The input device 803 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for live video processing, such as an input device like a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer stick, one or more mouse buttons, a track ball, a joystick, etc. The output devices 804 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server (also called a cloud computing server or cloud host), a host product in the cloud computing service system that overcomes the difficult management and weak service scalability of traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
According to the technical scheme of the embodiment of the application, the frame image is obtained, the face information in the frame image is subjected to clustering processing, the auditing condition is provided for the judgment of one-person multicast illegal behavior, the live broadcast video stream without one-person multicast illegal behavior is screened out, and the workload of manual auditing is reduced, so that the aims of increasing the auditing efficiency of one-person multicast illegal behavior and reducing the labor cost of auditing are fulfilled.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, which is not limited herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (13)

1. A live video processing method includes:
responding to processing requests of a plurality of live video streams, and acquiring a frame image of each live video stream;
extracting face information of each frame image to obtain the face information of each frame image;
clustering the face information of each frame image to obtain at least one cluster;
and identifying whether the same anchor exists in the anchors corresponding to the live broadcast video streams according to the at least one cluster.
2. The method of claim 1, wherein the identifying whether the same anchor exists in the anchors corresponding to the plurality of live video streams according to the at least one cluster comprises:
determining the number of face information in each cluster;
and determining that the same anchor exists in anchors corresponding to the plurality of live broadcast video streams in response to the target cluster of which the number is greater than a preset threshold existing in the at least one cluster.
3. The method of claim 2, further comprising:
and sending the live broadcast video streams corresponding to the face information in the target cluster to a manual auditing terminal.
4. The method of claim 1, wherein said obtaining frame images of each of said live video streams comprises:
acquiring respective frame sequences of the plurality of live video streams;
sampling a plurality of frame sequences for N times, and acquiring a frame image from each of the plurality of frame sequences each time; wherein N is an integer greater than or equal to 1;
and determining the frame image of each live video stream according to the frame image obtained by current sampling.
5. The method of claim 4, wherein the identifying whether the same anchor exists in the anchors corresponding to the plurality of live video streams according to the at least one cluster comprises:
when the target clusters obtained based on the face information in the frame images sampled each time are the same, determining that the same anchor exists in the anchors corresponding to the plurality of live video streams;
the target cluster is a cluster in which the number of face information in the at least one cluster is greater than a preset threshold.
6. A live video processing apparatus comprising:
an acquisition module, configured to acquire a frame image of each live video stream in response to processing requests of a plurality of live video streams;
the extraction module is used for extracting the face information of each frame image to obtain the face information of each frame image;
the processing module is used for clustering the face information of each frame image to obtain at least one cluster;
and the identification module is used for identifying whether the same anchor exists in the anchors corresponding to the live video streams according to the at least one cluster.
7. The apparatus of claim 6, wherein the identification module is specifically configured to:
determining the number of face information in each cluster;
and determining that the same anchor exists in anchors corresponding to the plurality of live broadcast video streams in response to the target cluster of which the number is greater than a preset threshold existing in the at least one cluster.
8. The apparatus of claim 7, wherein the identification module is further specifically configured to:
and sending the live broadcast video streams corresponding to the face information in the target cluster to a manual auditing terminal.
9. The apparatus of claim 6, wherein the means for obtaining comprises:
an obtaining unit, configured to obtain a frame sequence of each of the plurality of live video streams;
the sampling unit is used for sampling the plurality of frame sequences for N times, and acquiring a frame image from each of the plurality of frame sequences each time; wherein N is an integer greater than or equal to 1;
and the determining unit is used for determining the frame image of each live video stream according to the frame image obtained by current sampling.
10. The apparatus of claim 9, wherein the identification module is specifically configured to:
when the target clusters obtained based on the face information in the frame images sampled each time are the same, determining that the same anchor exists in the anchors corresponding to the plurality of live video streams;
the target cluster is a cluster in which the number of face information in the at least one cluster is greater than a preset threshold.
11. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-5.
12. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-5.
13. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 5.
CN202111279813.5A 2021-10-29 2021-10-29 Live video processing method and device, electronic equipment and storage medium Pending CN113965772A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111279813.5A CN113965772A (en) 2021-10-29 2021-10-29 Live video processing method and device, electronic equipment and storage medium


Publications (1)

Publication Number Publication Date
CN113965772A true CN113965772A (en) 2022-01-21

Family

ID=79468649

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111279813.5A Pending CN113965772A (en) 2021-10-29 2021-10-29 Live video processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113965772A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114943936A (en) * 2022-06-17 2022-08-26 北京百度网讯科技有限公司 Target behavior identification method and device, electronic equipment and storage medium

Citations (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130148898A1 (en) * 2011-12-09 2013-06-13 Viewdle Inc. Clustering objects detected in video
CN104254019A (en) * 2013-06-28 2014-12-31 广州华多网络科技有限公司 Information push result detecting method and information push result detecting system
CN104573642A (en) * 2014-12-26 2015-04-29 小米科技有限责任公司 Face recognition method and device
CN106604133A (en) * 2016-12-20 2017-04-26 天脉聚源(北京)教育科技有限公司 Live streaming monitoring method and device
CN106604051A (en) * 2016-12-20 2017-04-26 广州华多网络科技有限公司 Live channel recommending method and device
US9892324B1 (en) * 2017-07-21 2018-02-13 Pccw Vuclip (Singapore) Pte. Ltd. Actor/person centric auto thumbnail
CN108427956A (en) * 2017-02-14 2018-08-21 腾讯科技(深圳)有限公司 A kind of clustering objects method and apparatus
CN108494796A (en) * 2018-04-11 2018-09-04 广州虎牙信息科技有限公司 Method for managing black list, device, equipment and storage medium
CN108875778A (en) * 2018-05-04 2018-11-23 北京旷视科技有限公司 Face cluster method, apparatus, system and storage medium
WO2019056938A1 (en) * 2017-09-20 2019-03-28 Oppo广东移动通信有限公司 Image processing method, and computer device, and computer-readable storage medium
CN110245679A (en) * 2019-05-08 2019-09-17 北京旷视科技有限公司 Image clustering method, device, electronic equipment and computer readable storage medium
CN110263528A (en) * 2019-06-21 2019-09-20 北京百度网讯科技有限公司 Removal repeats method, apparatus, equipment and the storage medium of account
CN110543584A (en) * 2018-05-29 2019-12-06 腾讯科技(深圳)有限公司 method, device, processing server and storage medium for establishing face index
CN110996114A (en) * 2019-12-13 2020-04-10 北京达佳互联信息技术有限公司 Live broadcast scheduling method and device, electronic equipment and storage medium
CN111160110A (en) * 2019-12-06 2020-05-15 北京工业大学 Method and device for identifying anchor based on face features and voice print features
CN111488491A (en) * 2020-06-24 2020-08-04 武汉斗鱼鱼乐网络科技有限公司 Method, system, medium and equipment for identifying target anchor
CN111586427A (en) * 2020-04-30 2020-08-25 广州华多网络科技有限公司 Anchor identification method and device for live broadcast platform, electronic equipment and storage medium
CN111914649A (en) * 2020-07-01 2020-11-10 珠海大横琴科技发展有限公司 Face recognition method and device, electronic equipment and storage medium
CN112101238A (en) * 2020-09-17 2020-12-18 浙江商汤科技开发有限公司 Clustering method and device, electronic equipment and storage medium
CN113313053A (en) * 2021-06-15 2021-08-27 北京百度网讯科技有限公司 Image processing method, apparatus, device, medium, and program product
WO2021175040A1 (en) * 2020-03-02 2021-09-10 Oppo广东移动通信有限公司 Video processing method and related device
US20210319227A1 (en) * 2020-04-12 2021-10-14 International Business Machines Corporation System and method for reducing resources costs in visual recognition of video based on static scene summary
US20210319226A1 (en) * 2020-04-14 2021-10-14 Nec Laboratories America, Inc. Face clustering in video streams
WO2021209042A1 (en) * 2020-04-16 2021-10-21 广州虎牙科技有限公司 Three-dimensional model driving method and apparatus, electronic device, and storage medium


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
YE CAI; HAIYANG GAN: "An Online Face Clustering Algorithm for Face Monitoring and Retrieval in Real-Time Videos", 2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), 30 December 2019 (2019-12-30) *
ZHOU JIANHUA; CHANG WEIDONG; LIU WANFANG: "Video Face Recognition Based on Distance-Density Clustering Fusion", Journal of Taiyuan Normal University (Natural Science Edition), no. 03, 25 September 2010 (2010-09-25) *
TANG PINGHAI: "An AI-Based Method for Financial Apps to Prevent Promotion Abuse", Electronic Test, no. 14, 15 July 2020 (2020-07-15) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114943936A (en) * 2022-06-17 2022-08-26 北京百度网讯科技有限公司 Target behavior identification method and device, electronic equipment and storage medium

Similar Documents

Publication Publication Date Title
US20210201161A1 (en) Method, apparatus, electronic device and readable storage medium for constructing key-point learning model
CN111612820A (en) Multi-target tracking method, and training method and device of feature extraction model
CN111832613B (en) Model training method and device, electronic equipment and storage medium
EP3852007B1 (en) Method, apparatus, electronic device, readable storage medium and program for classifying video
CN112507090A (en) Method, apparatus, device and storage medium for outputting information
CN111753701A (en) Violation detection method, device and equipment of application program and readable storage medium
CN111444819B (en) Cut frame determining method, network training method, device, equipment and storage medium
KR20220126264A (en) Video jitter detection method and device, electronic equipment and storage medium
CN112270745A (en) Image generation method, device, equipment and storage medium
CN113965772A (en) Live video processing method and device, electronic equipment and storage medium
CN113507630B (en) Method and device for stripping game video
CN110995687B (en) Cat pool equipment identification method, device, equipment and storage medium
CN111783644B (en) Detection method, detection device, detection equipment and computer storage medium
CN111949820B (en) Video associated interest point processing method and device and electronic equipment
CN110889392B (en) Method and device for processing face image
CN112560772A (en) Face recognition method, device, equipment and storage medium
CN113361303B (en) Temporary traffic sign board identification method, device and equipment
CN112561053A (en) Image processing method, training method and device of pre-training model and electronic equipment
CN112016521A (en) Video processing method and device
CN111696095A (en) Method and device for detecting surface defects of object
CN111967299B (en) Unmanned aerial vehicle inspection method, unmanned aerial vehicle inspection device, unmanned aerial vehicle inspection equipment and storage medium
CN112558810B (en) Method, apparatus, device and storage medium for detecting fingertip position
CN114245232A (en) Video abstract generation method and device, storage medium and electronic equipment
CN112381877A (en) Positioning fusion and indoor positioning method, device, equipment and medium
CN112528864A (en) Model generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination