CN113965772B - Live video processing method and device, electronic equipment and storage medium - Google Patents


Info

Publication number
CN113965772B
Authority
CN
China
Prior art keywords
live video
cluster
frame image
face information
frame
Prior art date
Legal status
Active
Application number
CN202111279813.5A
Other languages
Chinese (zh)
Other versions
CN113965772A (en)
Inventor
刘洋
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111279813.5A
Publication of CN113965772A
Application granted
Publication of CN113965772B
Status: Active


Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20: Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21: Server components or server architectures
    • H04N21/218: Source of audio or video content, e.g. local disk arrays
    • H04N21/2187: Live feed
    • H04N21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43: Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44: Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/442: Monitoring of processes or resources, e.g. detecting the failure of a recording device, monitoring the downstream bandwidth, the number of times a movie has been viewed, the storage space available from the internal hard disk
    • H04N21/44213: Monitoring of end-user related data
    • H04N21/44218: Detecting physical presence or behaviour of the user, e.g. using sensors to detect if the user is leaving the room or changes his face expression during a TV program
    • H04N21/44236: Monitoring of piracy processes or activities
    • H04N21/47: End-user applications
    • H04N21/478: Supplemental services, e.g. displaying phone caller identification, shopping application
    • H04N21/4788: Supplemental services communicating with other users, e.g. chatting

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Social Psychology (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Security & Cryptography (AREA)
  • Virology (AREA)
  • Image Analysis (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a live video processing method and apparatus, an electronic device, and a storage medium, relating to the field of image processing, and in particular to the fields of information flow and deep learning. The method is implemented as follows: in response to processing requests for a plurality of live video streams, a frame image of each live video stream is acquired; face information is extracted from each frame image; the face information of the frame images is clustered to obtain at least one cluster; and whether the same anchor appears among the anchors corresponding to the live video streams is identified according to the at least one cluster. The application reduces the workload and labor cost of manual auditing and increases the coverage of violation auditing.

Description

Live video processing method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of image processing, in particular to the fields of information flow and deep learning, and more particularly to a live video processing method, apparatus, electronic device, and storage medium.
Background
With the popularization of 5G (fifth-generation mobile communication technology) and the upgrading of related infrastructure, video has become one of the main carriers of information exchange, and live video streaming has spread widely into industries such as e-commerce and e-sports, bringing considerable economic benefits. To meet the strong demand for violation detection in live video, most related technologies rely on user reporting, human review, and similar approaches.
However, for the one-person multicast violation in live video (the same person streaming on multiple accounts at once), no effective detection and review method currently exists. Because the task involves network-wide data, user reporting is difficult to scale to high coverage, and the manual review workload is large, making labor costs high.
Disclosure of Invention
The application provides a live video processing method, a live video processing device, electronic equipment and a storage medium.
According to an aspect of the present application, there is provided a live video processing method, including:
in response to processing requests for a plurality of live video streams, acquiring a frame image of each live video stream;
extracting face information from each frame image to obtain the face information of each frame image;
clustering the face information of the frame images to obtain at least one cluster; and
identifying, according to the at least one cluster, whether the same anchor exists among the anchors corresponding to the plurality of live video streams.
According to another aspect of the present application, there is provided a live video processing apparatus, including:
an acquisition module, configured to acquire a frame image of each live video stream in response to processing requests for a plurality of live video streams;
an extraction module, configured to extract face information from each frame image to obtain the face information of each frame image;
a processing module, configured to cluster the face information of the frame images to obtain at least one cluster; and
an identification module, configured to identify, according to the at least one cluster, whether the same anchor exists among the anchors corresponding to the plurality of live video streams.
According to a third aspect of the present application, there is provided an electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
According to a fourth aspect of the present application, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the method of the first aspect.
According to a fifth aspect of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method according to the first aspect described above.
According to the technical solution of the application, frame images of a plurality of live video streams are acquired, the face information in the frame images is clustered to obtain clusters, and whether the same anchor exists among the anchors corresponding to the live video streams is identified based on the clusters. In other words, frame images are extracted from the live video streams and clustered by their face information; the clustering result provides a judgment condition for auditing one-person multicast violations and screens out live video streams without such violations. This realizes a machine-auditing function for the one-person multicast violation scenario in live broadcasting (live video of the same anchor played on multiple accounts at the same time, possibly captured from different angles), which reduces the workload of manual auditing, improves auditing efficiency, and lowers auditing labor costs.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
The accompanying drawings are included to provide a better understanding of the application, and are not to be construed as limiting the application, wherein:
fig. 1 is a flowchart of a live video processing method according to an embodiment of the present application;
fig. 2 is a flowchart of another live video processing method according to an embodiment of the present application;
fig. 3 is a schematic flow chart of live video processing according to an embodiment of the present application;
fig. 4 is a flowchart of another live video processing method according to an embodiment of the present application;
Fig. 5 is a block diagram of a live video processing apparatus according to an embodiment of the present application;
Fig. 6 is a block diagram of another live video processing apparatus according to an embodiment of the present application;
Fig. 7 is a block diagram of another live video processing apparatus according to an embodiment of the present application;
Fig. 8 is a block diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present application are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It should be noted that, with the popularization of 5G (fifth-generation mobile communication technology) and the upgrading of related infrastructure, video has become one of the main carriers of information exchange, and live video streaming has spread widely into industries such as e-commerce and e-sports, bringing considerable economic benefits. Most current approaches to violation detection in live video rely on user reporting, human review, and the like. There is as yet no effective method for detecting the one-person multicast violation; because the task involves network-wide data, user reporting has limited coverage, and manual review is difficult to implement and costly.
To address these problems, the application provides a live video processing method and apparatus, an electronic device, and a storage medium. Face information is obtained from the original video stream data, clustering is performed on that face information, and whether a one-person multicast violation exists is judged from the clustering result. The live video processing method, apparatus, electronic device, and storage medium of embodiments of the present application are described below with reference to the accompanying drawings.
Fig. 1 is a flowchart of a live video processing method according to an embodiment of the present application. It should be noted that, the live video processing method of the embodiment of the present application may be applied to the live video processing apparatus of the embodiment of the present application, and the live video processing apparatus may be configured on an electronic device. As shown in fig. 1, the live video processing method includes the following steps:
Step 101, responding to processing requests of a plurality of live video streams, and acquiring frame images of each live video stream.
In this embodiment of the present application, the processing request may be a violation detection request, which is used to detect whether there is a violation of one-person multicasting in the live video stream.
In some embodiments of the present application, the frame image may be understood as a key frame image in a live video stream, or a frame image containing a face. For example, when processing requests for a plurality of live video streams are received, one frame image may be extracted from each stream: given live video streams 1, 2, and 3, a key frame image is extracted from each of the three streams.
It should be noted that the live video processing method of the embodiment of the application is applicable to detecting one-person multicast violations across large amounts of network-wide live video data. For example, a large number of live video streams may be acquired, and upon receiving processing requests for these streams, a key frame image of each live video stream may be extracted.
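As a minimal sketch of the per-stream frame acquisition described above (the helper `extract_key_frame` is hypothetical, standing in for whatever decoder or frame grabber is actually used):

```python
# Sketch of step 101: on a processing request, collect one frame image
# per live video stream. extract_key_frame is a hypothetical helper
# standing in for an actual video decoder.
def acquire_frame_images(live_streams, extract_key_frame):
    """live_streams maps a stream id to its stream handle."""
    return {stream_id: extract_key_frame(stream)
            for stream_id, stream in live_streams.items()}
```

In the whole-network scenario this loop would run over a large batch of streams; decoding details and error handling are omitted here.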
Step 102, extracting face information of each frame image, and obtaining the face information of each frame image.
There are many methods for extracting face information from a frame image, for example, feature extraction by HOG (Histogram of Oriented Gradients) or by a CNN (convolutional neural network). Two examples of face-feature extraction are described below:
As an example, with HOG the image is first divided into small connected regions called cells; a histogram of the gradient or edge directions of the pixels in each cell is collected, and the combined histograms form the feature descriptor.
As another example, a CNN is built from three kinds of structures: convolutional layers, activations, and pooling. A convolutional layer contains multiple kernels that scan the image to produce output data called feature maps; activation, pooling, and further operations then yield an abstract representation of the image.
It should be noted that the above extraction methods are described only to aid understanding of how face information may be extracted from a frame image; the application is not limited to them, and face information may also be extracted by other means, which are not repeated here.
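To illustrate the HOG idea described above, a minimal cell-histogram extractor can be written directly in NumPy. The cell size, bin count, and unsigned-gradient convention here are illustrative choices, not values fixed by the application:

```python
import numpy as np

def hog_features(image, cell=8, bins=9):
    """Minimal HOG sketch: per-cell histograms of gradient directions,
    weighted by gradient magnitude (block normalization omitted)."""
    gy, gx = np.gradient(image.astype(float))        # pixel gradients
    mag = np.hypot(gx, gy)                           # gradient magnitude
    ang = np.rad2deg(np.arctan2(gy, gx)) % 180.0     # unsigned direction
    h, w = image.shape
    feats = []
    for i in range(0, h - cell + 1, cell):           # walk the cells
        for j in range(0, w - cell + 1, cell):
            m = mag[i:i + cell, j:j + cell].ravel()
            a = ang[i:i + cell, j:j + cell].ravel()
            hist, _ = np.histogram(a, bins=bins, range=(0, 180), weights=m)
            feats.append(hist)
    return np.concatenate(feats)                     # descriptor vector
```

A production system would add block normalization and run on detected face crops rather than whole frames.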
And 103, clustering the face information of each frame image to obtain at least one cluster.
Alternatively, a clustering algorithm may be used to perform clustering on the face information of each frame image to obtain at least one cluster.
In some embodiments of the application, the clustering algorithm may be a partition-based method, such as the K-means clustering algorithm; a hierarchy-based method, such as the BIRCH (Balanced Iterative Reducing and Clustering using Hierarchies) algorithm; a density-based method, such as the DBSCAN (Density-Based Spatial Clustering of Applications with Noise) algorithm or the mean shift clustering algorithm; a grid-based method; or a model-based method. The present application is not particularly limited in this respect.
To facilitate understanding of how the clustering process may be implemented, the mean shift clustering algorithm is described in detail below as an example.
It should be noted that mean shift clustering is a sliding-window algorithm for finding dense areas of data points. It is a centroid-based algorithm that locates the center point of each group/class by updating candidate center points to the mean of the points within the sliding window. Overlapping candidate windows are then de-duplicated to form the final set of center points and their corresponding groups.
For example, after the face information of each frame image is obtained, a sliding-window radius r is determined, and a circular sliding window of radius r starts sliding from a randomly selected center point C. Mean shift resembles a hill-climbing algorithm: in each iteration the window moves toward a region of higher density until convergence. Each time the window slides to a new region, the mean of the points inside it is taken as the new center point, and the number of points inside the window is taken as the density. The window keeps moving toward higher-density areas, recomputing its center and density, until no movement can bring more points into the window, i.e. until the density inside the circle no longer increases. This process produces multiple sliding windows; when windows overlap, the window containing the most points is kept, and the data points are then grouped according to the sliding window in which they fall, yielding a clustering result of at least one cluster. Here each data point corresponds to the face information of one face.
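The sliding-window procedure described above can be sketched as follows for Euclidean face-feature vectors. This is an illustrative toy implementation (one window per point, a fixed iteration count, and simple center merging); the bandwidth value is an assumption:

```python
import numpy as np

def mean_shift(points, bandwidth=1.0, iters=50):
    """Toy mean shift: shift one window per point, then merge windows."""
    points = np.asarray(points, dtype=float)
    centers = points.copy()
    for _ in range(iters):
        for i in range(len(centers)):
            # points inside the circular window of radius `bandwidth`
            d = np.linalg.norm(points - centers[i], axis=1)
            inside = points[d <= bandwidth]
            centers[i] = inside.mean(axis=0)   # shift to the window mean
    # merge windows that converged to (nearly) the same center
    labels = np.full(len(points), -1, dtype=int)
    merged = []
    for i, c in enumerate(centers):
        for k, u in enumerate(merged):
            if np.linalg.norm(c - u) < bandwidth / 2:
                labels[i] = k
                break
        else:
            merged.append(c)
            labels[i] = len(merged) - 1
    return labels
```

Each input row would be one face-feature vector; points whose windows converge to the same center receive the same cluster label.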
Step 104, identifying whether the same anchor exists in the anchors corresponding to the live video streams according to at least one cluster.
Optionally, based on the content of each cluster, whether the same anchor exists among the anchors corresponding to the live video streams is identified. In this way, one-person multicast violations can be detected by cross-analyzing large amounts of live video data on the basis of face clustering.
According to the live video processing method of the embodiment of the application, frame extraction from the live video streams yields the frame images that make subsequent auditing possible, and clustering the face information in those frame images provides a judgment condition for auditing one-person multicast violations, screening out live video streams without such violations. This realizes a machine-auditing function for the one-person multicast violation scenario in live broadcasting (live video of the same anchor played on multiple accounts at the same time, possibly captured from different angles), with high coverage and practical applicability; it reduces the workload of manual auditing, improves auditing efficiency, and lowers auditing labor costs.
Fig. 2 is a flowchart of another live video processing method according to an embodiment of the present application. As shown in fig. 2, the live video processing method may include:
In step 201, in response to a processing request of a plurality of live video streams, a frame image of each live video stream is acquired.
In the embodiment of the present application, step 201 may be implemented in any of the manners described in the embodiments of the present application; this is not limited here, and details are not repeated.
Step 202, extracting face information of each frame image, and obtaining the face information of each frame image.
In the embodiment of the present application, step 202 may be implemented in any of the manners described in the embodiments of the present application; this is not limited here, and details are not repeated.
And 203, clustering the face information of each frame image to obtain at least one cluster.
In the embodiment of the present application, step 203 may be implemented in any of the manners described in the embodiments of the present application; this is not limited here, and details are not repeated.
Step 204, determining the number of face information in each cluster.
Since each cluster results from clustering face information and each piece of face information corresponds to one face, the number of face-information items contained in each cluster can be counted.
Step 205, in response to the existence, in the at least one cluster, of a target cluster whose face-information count is greater than a preset threshold, determining that the same anchor exists among the anchors corresponding to the plurality of live video streams.
Optionally, the number of face-information items contained in each cluster can be compared with a preset threshold. If a target cluster whose count exceeds the threshold is found among the clusters, the same anchor exists among the anchors corresponding to the plurality of live video streams. For example, if the number of face-information items in a cluster is greater than a threshold (e.g., 1), it may be determined that a one-person multicast violation potentially exists.
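The count-and-compare check described in this step can be sketched as follows (the default threshold of 1 mirrors the example above):

```python
from collections import Counter

def find_target_clusters(labels, threshold=1):
    """Return cluster ids whose face-information count exceeds the
    preset threshold; such clusters may indicate that the same anchor
    appears in more than one live video stream."""
    counts = Counter(labels)
    return sorted(c for c, n in counts.items() if n > threshold)
```

For instance, with cluster labels `[0, 0, 1, 2]` (two faces fell into cluster 0), only cluster 0 is flagged as a target cluster.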
And 206, transmitting the live video streams corresponding to the face information in the target cluster to a manual auditing terminal.
Optionally, when the number of face-information items in a cluster indicates a possible one-person multicast violation, the live video streams corresponding to the face information in that cluster may be sent to the manual auditing terminal.
According to the live video processing method of the embodiment of the application, determining the number of face-information items in each cluster realizes identification of the face information in the live video streams and provides a judgment condition for auditing one-person multicast violations: the count in each cluster is compared with a preset threshold to judge whether a multicast violation exists. This realizes a machine-auditing function for the one-person multicast violation scenario in live broadcasting (live video of the same anchor played on multiple accounts at the same time, possibly captured from different angles), with high coverage and practical applicability; it reduces the workload of manual auditing, improves auditing efficiency, and lowers auditing labor costs.
To further improve the detection result and reduce the cost of manual auditing, optionally, as shown in fig. 3 and fig. 4, the live video processing method may include:
Step 401, obtaining respective frame sequences of the plurality of live video streams in response to processing requests of the plurality of live video streams.
A frame sequence represents a live video stream as an ordered series of frame images. For example, for a given live video stream, a plurality of frame images may be extracted and assembled in order into the stream's frame sequence.
Step 402, sampling a plurality of frame sequences N times, and collecting one frame image from each of the plurality of frame sequences; wherein N is an integer greater than or equal to 1.
For example, take three live video streams: in response to their processing request, assume the frame sequence of live video stream 1 contains frame images 11, 12, and 13, that of live video stream 2 contains frame images 21, 22, and 23, and that of live video stream 3 contains frame images 31, 32, and 33. With three samplings, one frame image is collected from each frame sequence per sampling; in one sampling, frame image 13 is collected from the frame sequence of live video stream 1, frame image 23 from that of live video stream 2, and frame image 33 from that of live video stream 3.
Step 403, determining the frame image of each live video stream according to the frame image obtained by current sampling.
Continuing the example of three live video streams, where the frame sequence of live video stream 1 contains frame images 11, 12, and 13, that of live video stream 2 contains frame images 21, 22, and 23, and that of live video stream 3 contains frame images 31, 32, and 33: one frame image is collected from each frame sequence per sampling. Taking one sampling as an example, frame image 12 is collected from the frame sequence of live video stream 1, frame image 22 from that of live video stream 2, and frame image 32 from that of live video stream 3; the currently collected frame images 12, 22, and 32 are then determined to be the frame images of live video streams 1, 2, and 3 for this sampling.
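The per-round sampling described in steps 401 through 403 can be sketched as follows; the dictionary layout of stream ids and frame sequences is an assumption made for illustration:

```python
def sample_frames(frame_sequences, round_index):
    """One sampling round: collect one frame image from each stream's
    frame sequence. The index wraps around so that a short sequence can
    still serve every round."""
    return {stream_id: seq[round_index % len(seq)]
            for stream_id, seq in frame_sequences.items()}
```

Calling this once per round yields the per-stream frame images to be fed to face extraction and clustering for that round.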
Step 404, extracting face information of each frame image, and obtaining face information of each frame image.
In the embodiment of the present application, step 404 may be implemented in any of the manners described in the embodiments of the present application; this is not limited here, and details are not repeated.
And 405, clustering the face information of each frame image to obtain at least one cluster.
In the embodiment of the present application, step 405 may be implemented in any of the manners described in the embodiments of the present application; this is not limited here, and details are not repeated.
Step 406, when the target clusters obtained based on the face information in the frame image sampled each time are the same, determining that the same anchor exists in the anchors corresponding to the live video streams.
In the embodiment of the application, the target cluster is a cluster, among the at least one cluster, whose face-information count is greater than the preset threshold.
For example, as shown in fig. 3, in response to a processing request for N live video streams, the frame sequences of the N live video streams are obtained and sampled N times, with one frame image collected from each frame sequence per sampling. The frame image of each live video stream is determined from the current sampling, face information is extracted from each frame image, and the face information is clustered to obtain at least one cluster. If a cluster whose face-information count exceeds the preset threshold exists, a one-person multicast behavior may exist in the live video streams corresponding to that cluster. In fig. 3, each cluster is represented by a circle, and the number inside a circle indicates how many face-information items the cluster contains (1, 2, or 5 in the illustration); whether a one-person multicast behavior exists in the corresponding live video streams can thus be judged from these counts. When the target cluster obtained from the face information of every sampling is the same, it is determined that the same anchor exists among the anchors corresponding to the plurality of live video streams.
Take three live video streams as an example. In response to a processing request for the live video streams, assume the frame sequence of live video stream 1 contains frame images 11, 12 and 13, that of live video stream 2 contains frame images 21, 22 and 23, and that of live video stream 3 contains frame images 31, 32 and 33. Consider the third sampling round, in which one frame image is collected from each of the frame sequences: frame image 13 from the frame sequence of live video stream 1, frame image 23 from that of live video stream 2, and frame image 33 from that of live video stream 3. The currently collected frame images 13, 23 and 33 are thus determined to be the frame images of live video streams 1, 2 and 3 for this round. Suppose face clustering on frame images 13, 23 and 33 yields 2 clusters, where cluster 1 contains the face information from frame images 13 and 23 and cluster 2 contains the face information from frame image 33; the live video streams corresponding to cluster 1 then possibly involve one-person multicasting. Each sampling round collects one frame image from every frame sequence and clusters the collected frames once, producing one identification result; three rounds of sampling therefore produce three identification results, and the final result is obtained by aggregating them. Identifying over multiple rounds improves auditing accuracy and prevents omissions.
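The multi-round decision in this example — one clustering per sampling round, with a final verdict only when every round yields the same target cluster — might be sketched as follows. The per-round clustering results are given directly as stream-id lists, and the function names are assumptions for illustration:

```python
# Hypothetical sketch of the multi-round decision: a one-person-multicast
# verdict is confirmed only when every sampling round produces the same
# target cluster, i.e. the same set of live streams sharing one face.
def round_target_streams(round_clusters, count_threshold=1):
    """For one sampling round, return the stream-id sets of clusters whose
    face count exceeds the preset threshold."""
    return {frozenset(c) for c in round_clusters if len(c) > count_threshold}

def confirm_same_anchor(rounds, count_threshold=1):
    """rounds: per-round clustering results, each a list of clusters, each
    cluster a list of stream ids that contributed a face. Only target-cluster
    stream sets seen in every round are confirmed."""
    per_round = [round_target_streams(r, count_threshold) for r in rounds]
    return set.intersection(*per_round) if per_round else set()

# Mirrors the example: in each of the 3 samplings, the faces from streams 1
# and 2 fall into one cluster while stream 3 stands alone.
rounds = [
    [["stream1", "stream2"], ["stream3"]],
    [["stream1", "stream2"], ["stream3"]],
    [["stream1", "stream2"], ["stream3"]],
]
print(confirm_same_anchor(rounds))  # the pair stream1/stream2 is confirmed
```

If any round fails to reproduce the same target cluster (e.g. the faces separate in one sampling), the intersection is empty and no verdict is issued, which is how repeated sampling guards against false positives.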
The live video streams corresponding to the face information in the target cluster are then pushed into a manual review flow for further confirmation by human reviewers.
According to the live video processing method provided by this embodiment of the application, acquiring the frame sequences and sampling them multiple times realizes the collection of frame images and provides material for the subsequent audit of one-person multicast violations. It also makes the frame images more representative, ensuring that they reflect the real situation of the corresponding live video streams and thus safeguarding the audit result.
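The sampling step summarized above — N rounds, each collecting one frame image from every stream's frame sequence — might be sketched as follows. Evenly spaced indices are an illustrative assumption; the patent only requires that each round collect one frame per stream:

```python
# Hypothetical sketch of N-round sampling: each round collects one frame
# from each live stream's frame sequence. Evenly spaced indices are an
# illustrative choice, not mandated by the patent.
def sample_rounds(frame_sequences, n_rounds):
    """frame_sequences: {stream_id: [frame, ...]}. Returns a list of rounds,
    each mapping stream_id -> the frame sampled in that round."""
    rounds = []
    for i in range(n_rounds):
        picked = {}
        for stream_id, frames in frame_sequences.items():
            # spread the n_rounds picks evenly over the sequence
            idx = min(i * len(frames) // n_rounds, len(frames) - 1)
            picked[stream_id] = frames[idx]
        rounds.append(picked)
    return rounds

# Mirrors the three-stream example: sequences of frames 11-13, 21-23, 31-33.
sequences = {
    "stream1": ["f11", "f12", "f13"],
    "stream2": ["f21", "f22", "f23"],
    "stream3": ["f31", "f32", "f33"],
}
rounds = sample_rounds(sequences, 3)
print(rounds[2])  # third round picks f13, f23, f33, as in the example
```

In practice the "frames" would be decoded images pulled from each live stream, and the spacing could equally be time-based (e.g. one frame every few seconds) rather than index-based.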
To implement the above embodiments, the present application further provides a live video processing apparatus.
Fig. 5 is a block diagram of a live video processing apparatus according to an embodiment of the present application. As shown in fig. 5, the live video processing apparatus may include: an acquisition module 510, an extraction module 520, a processing module 530, and an identification module 540.
The acquiring module 510 is configured to acquire a frame image of each live video stream in response to a processing request of a plurality of live video streams.
The extracting module 520 is configured to extract face information of each frame image, and obtain face information of each frame image.
And the processing module 530 is configured to perform clustering processing on the face information of each frame image, and obtain at least one cluster.
The identifying module 540 is configured to identify whether the same anchor exists in anchors corresponding to each of the plurality of live video streams according to at least one cluster.
With this live video processing apparatus, frame images are obtained by frame extraction from the live video streams, providing the conditions for subsequent auditing. Clustering the face information in the frame images provides a judgment criterion, based on the clustering result, for auditing one-person multicast violations, and live video streams without such violations are screened out. This realizes a machine-audit function for the one-person multicast violation scenario in live broadcasting (the live video of the same anchor played simultaneously on multiple accounts, possibly captured from different angles), with high coverage and practicality; it reduces the workload of manual auditing and thereby improves auditing efficiency and lowers the labor cost of auditing.
In some embodiments of the present application, as shown in fig. 6, fig. 6 is a block diagram of a live video processing apparatus according to another embodiment of the present application, where the live video processing apparatus may include: the system comprises an acquisition module 610, an extraction module 620, a processing module 630, a determination module 640, a judgment module 650 and a manual auditing module 660.
An acquiring module 610 is configured to acquire a frame image of each live video stream in response to a processing request of a plurality of live video streams.
The extracting module 620 is configured to extract face information of each frame image, and obtain face information of each frame image.
And the processing module 630 is configured to perform clustering processing on the face information of each frame image, and obtain at least one cluster.
A determining module 640, configured to determine the number of face information in each cluster.
The judging module 650 is configured to determine that the same anchor exists among the anchors corresponding to the plurality of live video streams in response to the existence of at least one target cluster whose face-information count is greater than the preset threshold.
And the manual auditing module 660 is used for transmitting the live video streams corresponding to the face information in the target cluster to the manual auditing terminal.
With the live video processing apparatus provided by this embodiment of the application, determining the number of face information in each cluster realizes the identification of face information in the video streams and provides a judgment criterion for auditing one-person multicast violations. Whether such a violation exists is judged by comparing the face-information count of each cluster with the preset threshold. This realizes a machine-audit function for the one-person multicast violation scenario in live broadcasting (the live video of the same anchor played simultaneously on multiple accounts, possibly captured from different angles), with high coverage and practicality; it reduces the workload of manual auditing and thereby improves auditing efficiency and lowers the labor cost of auditing.
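The judging and manual-audit routing described for modules 650 and 660 might be sketched as follows. The queue is a stand-in for the real audit-terminal interface, which the patent does not specify, and all names are illustrative assumptions:

```python
# Hypothetical sketch of the judging/manual-audit routing: count the faces in
# each cluster, and for every cluster above the preset threshold (a target
# cluster), forward the corresponding live streams to a manual review queue.
def route_for_manual_audit(clusters, count_threshold, audit_queue):
    """clusters: list of clusters, each a list of (stream_id, face_info)
    pairs. Returns True when at least one target cluster was found."""
    same_anchor_found = False
    for cluster in clusters:
        if len(cluster) > count_threshold:        # target cluster
            same_anchor_found = True
            streams = sorted({sid for sid, _ in cluster})
            audit_queue.append(streams)           # push for human confirmation
    return same_anchor_found

queue = []
clusters = [
    [("stream1", "face_a"), ("stream2", "face_a2")],  # likely the same anchor
    [("stream3", "face_b")],                          # a lone anchor
]
found = route_for_manual_audit(clusters, count_threshold=1, audit_queue=queue)
print(found, queue)  # True [['stream1', 'stream2']]
```

Only the flagged streams reach human reviewers, which is how the machine audit reduces manual workload while leaving the final verdict to a person.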
Modules 610-630 of fig. 6 have the same function and structure as modules 510-530 of fig. 5.
In some embodiments of the present application, as shown in fig. 7, fig. 7 is a block diagram of a live video processing apparatus according to another embodiment of the present application, where an obtaining module 710 of the live video processing apparatus includes: an acquisition unit 711, a sampling unit 712, a determination unit 713, an extraction unit 714, a clustering unit 715, and a judgment unit 716.
Wherein, the obtaining unit 711 is configured to obtain respective frame sequences of the plurality of live video streams in response to processing requests of the plurality of live video streams.
A sampling unit 712, configured to sample the plurality of frame sequences N times, and collect one frame image from each of the plurality of frame sequences at a time; wherein N is an integer greater than or equal to 1.
A determining unit 713 for determining a frame image of each live video stream according to the frame image obtained by the current sampling.
The extracting unit 714 is configured to extract face information of each frame image, and obtain face information of each frame image.
And the clustering unit 715 is used for clustering the face information of each frame image to obtain at least one cluster.
The determining unit 716 is configured to determine that the same anchor exists in anchors corresponding to each of the plurality of live video streams when target clusters obtained based on face information in the frame images sampled each time are the same.
With the live video processing apparatus provided by this embodiment of the application, acquiring the frame sequences and sampling them multiple times realizes the collection of frame images and provides material for the subsequent audit of one-person multicast violations. It also makes the frame images more representative, ensuring that they reflect the real situation of the corresponding live video streams and thus safeguarding the audit result.
Modules 710-760 of fig. 7 have the same function and structure as modules 610-660 of fig. 6.
The specific manner in which the various modules perform their operations in the apparatus of the above embodiments has been described in detail in the method embodiments and will not be repeated here.
According to an embodiment of the present application, the present application also provides an electronic device and a readable storage medium.
Fig. 8 is a block diagram of an electronic device for the live video processing method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown here, their connections and relationships, and their functions are exemplary only and are not meant to limit the implementations of the application described and/or claimed herein.
As shown in fig. 8, the electronic device includes: one or more processors 801, a memory 802, and interfaces for connecting the components, including high-speed and low-speed interfaces. The components are interconnected by different buses and may be mounted on a common motherboard or in other ways as required. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Likewise, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 801 is taken as an example in fig. 8.
Memory 802 is a non-transitory computer readable storage medium provided by the present application. The memory stores instructions executable by the at least one processor to cause the at least one processor to perform the method of live video processing provided by the present application. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of live video processing provided by the present application.
The memory 802, as a non-transitory computer-readable storage medium, is used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as the program instructions/modules corresponding to the live video processing method in the embodiments of the application (e.g., the acquisition module 510, the extraction module 520, the processing module 530, and the identification module 540 shown in fig. 5). By running the non-transitory software programs, instructions, and modules stored in the memory 802, the processor 801 executes the various functional applications and data processing of the server, i.e., implements the live video processing method in the above method embodiments.
Memory 802 may include a storage program area that may store an operating system, at least one application program required for functionality, and a storage data area; the storage data area may store data created from the use of the electronic device for live video processing, and the like. In addition, memory 802 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 802 may optionally include memory located remotely from processor 801, which may be connected to the live video processing electronics via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method for live video processing may further include: an input device 803 and an output device 804. The processor 801, memory 802, input devices 803, and output devices 804 may be connected by a bus or other means, for example in fig. 8.
The input device 803 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for live video processing, such as a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer stick, one or more mouse buttons, a track ball, a joystick, and the like. The output device 804 may include a display apparatus, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibration motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs executable and/or interpretable on a programmable system including at least one programmable processor, which may be special-purpose or general-purpose, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or a middleware component (e.g., an application server), or a front-end component (e.g., a user computer with a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The client-server relationship arises from computer programs running on the respective computers and having a client-server relationship with each other. The server may be a cloud server, also called a cloud computing server or cloud host — a host product in the cloud computing service system that overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and VPS ("Virtual Private Server") services. The server may also be a server of a distributed system or a server combined with a blockchain.
According to the technical solution of the embodiments of the application, frame images are obtained and the face information in them is clustered, providing audit criteria for judging one-person multicast violations and screening out the live video streams without such violations. This reduces the workload of manual auditing, thereby increasing the efficiency of auditing one-person multicast violations and lowering the labor cost of auditing; and because the manual workload is reduced, manual auditing can handle more live video streams suspected of one-person multicast violations, increasing the coverage of violation auditing.
It should be understood that steps may be reordered, added, or deleted in the various flows shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired results of the disclosed embodiments are achieved; no limitation is imposed herein.
The above specific embodiments do not limit the scope of protection of the present application. It should be clear to those skilled in the art that various modifications, combinations, sub-combinations, and substitutions are possible depending on design requirements and other factors. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present application shall fall within the scope of protection of the present application.

Claims (9)

1. A live video processing method, comprising:
Responding to processing requests of a plurality of live video streams, and acquiring a frame image of each live video stream, wherein the processing request is a violation detection request for detecting whether the live video streams have a one-person multicast violation behavior;
extracting face information of each frame image to obtain the face information of each frame image;
Clustering the face information of each frame image to obtain at least one cluster;
identifying whether the same anchor exists in the anchors corresponding to the live video streams according to the at least one cluster;
the acquiring the frame image of each live video stream includes:
Acquiring respective frame sequences of the plurality of live video streams;
Sampling a plurality of frame sequences for N times, and collecting one frame image from each of the plurality of frame sequences each time; wherein N is an integer greater than or equal to 1;
determining a frame image of each live video stream according to the frame image obtained by current sampling;
the identifying whether the same anchor exists in the anchors corresponding to the live video streams according to the at least one cluster includes:
when target clusters obtained based on face information in each sampled frame image are the same, determining that the same anchor exists in anchors corresponding to the live video streams;
The target cluster is a cluster with the number of face information larger than a preset threshold value in the at least one cluster.
2. The method of claim 1, further comprising:
Determining the number of face information in each cluster;
and determining the cluster with the number larger than a preset threshold value in the at least one cluster as a target cluster.
3. The method of claim 2, further comprising:
And transmitting the live video streams corresponding to the face information in the target cluster to a manual auditing terminal.
4. A live video processing apparatus, comprising:
The acquisition module is used for responding to processing requests of a plurality of live video streams, acquiring frame images of each live video stream, wherein the processing requests are violation detection requests and are used for detecting whether the live video streams have a violation behavior of one person for multicasting;
the extraction module is used for extracting the face information of each frame image and obtaining the face information of each frame image;
The processing module is used for carrying out clustering processing on the face information of each frame image to obtain at least one cluster;
the identification module is used for identifying whether the same anchor exists in the anchors corresponding to the live video streams according to the at least one cluster;
wherein, the acquisition module includes:
An obtaining unit, configured to obtain respective frame sequences of the plurality of live video streams;
The sampling unit is used for sampling a plurality of frame sequences for N times, and each time, one frame image is acquired from each of the plurality of frame sequences; wherein N is an integer greater than or equal to 1;
The determining unit is used for determining the frame image of each live video stream according to the frame image obtained by current sampling;
the identification module is specifically configured to:
when target clusters obtained based on face information in each sampled frame image are the same, determining that the same anchor exists in anchors corresponding to the live video streams;
The target cluster is a cluster with the number of face information larger than a preset threshold value in the at least one cluster.
5. The apparatus of claim 4, wherein the identification module is further to:
Determining the number of face information in each cluster;
and determining the cluster with the number larger than a preset threshold value in the at least one cluster as a target cluster.
6. The apparatus of claim 5, wherein the identification module is further specifically configured to:
And transmitting the live video streams corresponding to the face information in the target cluster to a manual auditing terminal.
7. An electronic device, comprising:
at least one processor; and
A memory communicatively coupled to the at least one processor; wherein,
The memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-3.
8. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the method of any one of claims 1-3.
9. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 3.
CN202111279813.5A 2021-10-29 2021-10-29 Live video processing method and device, electronic equipment and storage medium Active CN113965772B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111279813.5A CN113965772B (en) 2021-10-29 2021-10-29 Live video processing method and device, electronic equipment and storage medium


Publications (2)

Publication Number Publication Date
CN113965772A CN113965772A (en) 2022-01-21
CN113965772B true CN113965772B (en) 2024-05-10

Family

ID=79468649

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111279813.5A Active CN113965772B (en) 2021-10-29 2021-10-29 Live video processing method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113965772B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114943936B (en) * 2022-06-17 2023-06-20 北京百度网讯科技有限公司 Target behavior recognition method and device, electronic equipment and storage medium

Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104254019A (en) * 2013-06-28 2014-12-31 广州华多网络科技有限公司 Information push result detecting method and information push result detecting system
CN104573642A (en) * 2014-12-26 2015-04-29 小米科技有限责任公司 Face recognition method and device
CN106604133A (en) * 2016-12-20 2017-04-26 天脉聚源(北京)教育科技有限公司 Live streaming monitoring method and device
CN106604051A (en) * 2016-12-20 2017-04-26 广州华多网络科技有限公司 Live channel recommending method and device
US9892324B1 (en) * 2017-07-21 2018-02-13 Pccw Vuclip (Singapore) Pte. Ltd. Actor/person centric auto thumbnail
CN108427956A (en) * 2017-02-14 2018-08-21 腾讯科技(深圳)有限公司 A kind of clustering objects method and apparatus
CN108494796A (en) * 2018-04-11 2018-09-04 广州虎牙信息科技有限公司 Method for managing black list, device, equipment and storage medium
CN108875778A (en) * 2018-05-04 2018-11-23 北京旷视科技有限公司 Face cluster method, apparatus, system and storage medium
WO2019056938A1 (en) * 2017-09-20 2019-03-28 Oppo广东移动通信有限公司 Image processing method, and computer device, and computer-readable storage medium
CN110245679A (en) * 2019-05-08 2019-09-17 北京旷视科技有限公司 Image clustering method, device, electronic equipment and computer readable storage medium
CN110263528A (en) * 2019-06-21 2019-09-20 北京百度网讯科技有限公司 Removal repeats method, apparatus, equipment and the storage medium of account
CN110543584A (en) * 2018-05-29 2019-12-06 腾讯科技(深圳)有限公司 method, device, processing server and storage medium for establishing face index
CN110996114A (en) * 2019-12-13 2020-04-10 北京达佳互联信息技术有限公司 Live broadcast scheduling method and device, electronic equipment and storage medium
CN111160110A (en) * 2019-12-06 2020-05-15 北京工业大学 Method and device for identifying anchor based on face features and voice print features
CN111488491A (en) * 2020-06-24 2020-08-04 武汉斗鱼鱼乐网络科技有限公司 Method, system, medium and equipment for identifying target anchor
CN111586427A (en) * 2020-04-30 2020-08-25 广州华多网络科技有限公司 Anchor identification method and device for live broadcast platform, electronic equipment and storage medium
CN111914649A (en) * 2020-07-01 2020-11-10 珠海大横琴科技发展有限公司 Face recognition method and device, electronic equipment and storage medium
CN112101238A (en) * 2020-09-17 2020-12-18 浙江商汤科技开发有限公司 Clustering method and device, electronic equipment and storage medium
CN113313053A (en) * 2021-06-15 2021-08-27 北京百度网讯科技有限公司 Image processing method, apparatus, device, medium, and program product
WO2021175040A1 (en) * 2020-03-02 2021-09-10 Oppo广东移动通信有限公司 Video processing method and related device
WO2021209042A1 (en) * 2020-04-16 2021-10-21 广州虎牙科技有限公司 Three-dimensional model driving method and apparatus, electronic device, and storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130148898A1 (en) * 2011-12-09 2013-06-13 Viewdle Inc. Clustering objects detected in video
US11164005B1 (en) * 2020-04-12 2021-11-02 International Business Machines Corporation System and method for reducing resources costs in visual recognition of video based on static scene summary
US20210319226A1 (en) * 2020-04-14 2021-10-14 Nec Laboratories America, Inc. Face clustering in video streams

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
An Online Face Clustering Algorithm for Face Monitoring and Retrieval in Real-Time videos;Ye Cai; HaiYang Gan;《2019 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom)》;20191230;full text *
AI-based method for preventing promotion abuse in financial apps; Tang Pinghai; Electronic Test; 20200715 (No. 14); full text *
Video face recognition based on distance-density clustering fusion; Zhou Jianhua; Chang Weidong; Liu Wanfang; Journal of Taiyuan Normal University (Natural Science Edition); 20100925 (No. 03); full text *

Also Published As

Publication number Publication date
CN113965772A (en) 2022-01-21

Similar Documents

Publication Publication Date Title
CN111931591B (en) Method, device, electronic equipment and readable storage medium for constructing key point learning model
CN111753701B (en) Method, device, equipment and readable storage medium for detecting violation of application program
CN111612820A (en) Multi-target tracking method, and training method and device of feature extraction model
CN109726712A (en) Character recognition method, device and storage medium, server
CN110659600B (en) Object detection method, device and equipment
CN112561053B (en) Image processing method, training method and device of pre-training model and electronic equipment
CN110717933B (en) Post-processing method, device, equipment and medium for moving object missed detection
CN112381104A (en) Image identification method and device, computer equipment and storage medium
CN110968718A (en) Target detection model negative sample mining method and device and electronic equipment
CN111832613B (en) Model training method and device, electronic equipment and storage medium
CN112507090A (en) Method, apparatus, device and storage medium for outputting information
CN113012200B (en) Method and device for positioning moving object, electronic equipment and storage medium
CN111783639A (en) Image detection method and device, electronic equipment and readable storage medium
CN113965772B (en) Live video processing method and device, electronic equipment and storage medium
CN111444819B (en) Cut frame determining method, network training method, device, equipment and storage medium
CN114245232B (en) Video abstract generation method and device, storage medium and electronic equipment
CN111950345A (en) Camera identification method and device, electronic equipment and storage medium
CN112560772B (en) Face recognition method, device, equipment and storage medium
CN113507630B (en) Method and device for stripping game video
CN111949820B (en) Video associated interest point processing method and device and electronic equipment
CN113361303B (en) Temporary traffic sign board identification method, device and equipment
CN110889392B (en) Method and device for processing face image
CN111783600A (en) Face recognition model training method, device, equipment and medium
CN111967299B (en) Unmanned aerial vehicle inspection method, unmanned aerial vehicle inspection device, unmanned aerial vehicle inspection equipment and storage medium
CN111027195A (en) Simulation scene generation method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant