CN114519889A - Cover image detection method and device for live broadcast room, computer equipment and medium - Google Patents

Cover image detection method and device for live broadcast room, computer equipment and medium Download PDF

Info

Publication number
CN114519889A
CN114519889A (application CN202210168827.8A)
Authority
CN
China
Prior art keywords
face
cover image
detection
region
face detection
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210168827.8A
Other languages
Chinese (zh)
Inventor
王璞
陈增海
郑康元
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Cubesili Information Technology Co Ltd
Original Assignee
Guangzhou Cubesili Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Cubesili Information Technology Co Ltd filed Critical Guangzhou Cubesili Information Technology Co Ltd
Priority to CN202210168827.8A priority Critical patent/CN114519889A/en
Publication of CN114519889A publication Critical patent/CN114519889A/en
Pending legal-status Critical Current

Landscapes

  • Image Analysis (AREA)

Abstract

The present application relates to the technical field of webcast live streaming, and provides a cover image detection method and apparatus for a live broadcast room, a computer device, and a storage medium. The method includes: acquiring a cover image to be detected; performing face detection on the cover image to obtain a face detection result; if one or more face regions exist in the face detection result, acquiring the region size of each face region and the size of the cover image; if the occupation ratio of each face region in the cover image is smaller than a preset occupation-ratio threshold and the width-to-height ratio of each face region is within a preset ratio-threshold range, performing face contour key point detection on each face region to obtain a face contour key point detection result; and if the face contour key point detection results all meet a preset condition, taking the cover image as the target cover image, thereby improving cover image detection efficiency.

Description

Cover image detection method and device for live broadcast room, computer equipment and medium
Technical Field
The embodiments of the present application relate to the technical field of webcast live streaming, and in particular to a cover image detection method and apparatus for a live broadcast room, a computer device, and a storage medium.
Background
With the rapid development of internet technology, webcast live streaming has become an increasingly popular form of entertainment. When streaming, an anchor generally needs a live cover to showcase and recommend the live content so as to attract users to click and watch. Specifically, after a user opens a live platform, for example a live APP, the interface of the live APP presents a live listing; the content of each live broadcast in the listing is shown through the live cover the anchor has set in advance, and by clicking the corresponding live cover the user enters the live broadcast room and sees the actual live content.
Currently, live covers mainly use real-time live screenshots, or the picture most relevant to the live broadcast from a candidate set of historical screenshots, as the cover image. Cover images obtained in this way cannot guarantee cover quality, so live covers have to be manually audited one by one. However, faced with a large volume of live covers, manual auditing is costly and inefficient.
Disclosure of Invention
The embodiments of the present application provide a cover image detection method and apparatus for a live broadcast room, a computer device, and a storage medium, which can solve the technical problems of low live-cover quality and the high cost and low efficiency of manually auditing live covers. The technical scheme is as follows:
In a first aspect, an embodiment of the present application provides a cover image detection method for a live broadcast room, including:
acquiring a cover image to be detected;
carrying out face detection on the cover image to obtain a face detection result;
if one or more face regions exist in the face detection result, acquiring the region size of each face region and the size of the cover image;
if the proportion of each face region in the cover image is judged to be smaller than a preset proportion threshold value and the width-height proportion of each face region is within a preset proportion threshold value range according to the region size of each face region and the size of the cover image, face contour key point detection is carried out on each face region, and a face contour key point detection result of each face region is obtained;
and if the face contour key point detection result of each face region meets a preset condition, taking the cover image as a target cover image.
In a second aspect, an embodiment of the present application provides a cover image detection apparatus for a live broadcast room, including:
the cover image acquisition module is used for acquiring a cover image to be detected;
The face detection module is used for carrying out face detection on the cover image to obtain a face detection result;
the size acquisition module is used for acquiring the area size of each face area and the size of the cover image if one or more face areas exist in the face detection result;
the key point detection module is used for, if it is judged according to the region size of each face region and the size of the cover image that the occupation ratio of each face region in the cover image is smaller than a preset occupation-ratio threshold and the width-to-height ratio of each face region is within a preset ratio-threshold range, performing face contour key point detection on each face region to obtain a face contour key point detection result of each face region;
and the target cover image obtaining module is used for taking the cover image as the target cover image if the face contour key point detection result of each face region meets the preset condition.
In a third aspect, embodiments of the present application provide a computer device, including a processor, a memory, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the method according to the first aspect when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program, which when executed by a processor, performs the steps of the method according to the first aspect.
The method comprises: acquiring a cover image to be detected; performing face detection on the cover image to obtain a face detection result; if one or more face regions exist in the face detection result, acquiring the region size of each face region and the size of the cover image; if it is judged according to the region size of each face region and the size of the cover image that the occupation ratio of each face region in the cover image is smaller than a preset occupation-ratio threshold and the width-to-height ratio of each face region is within a preset ratio-threshold range, performing face contour key point detection on each face region to obtain a face contour key point detection result of each face region; and if the face contour key point detection result of each face region meets a preset condition, taking the cover image as the target cover image. According to the embodiments of the present application, face detection and face contour key point detection are performed in sequence on the cover image to be detected, and cover images with no face, an oversized face, an irregular face aspect ratio, or an incomplete face are screened out, so that high-quality cover image selection is achieved automatically, labor cost is reduced, and cover image detection efficiency is improved.
For a better understanding and implementation, the technical solutions of the present application are described in detail below with reference to the accompanying drawings.
Drawings
Fig. 1 is a schematic view of an application scenario of a cover image detection method in a live broadcast room according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a cover image detection method of a live broadcast room according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of face contour keypoint detection provided in an embodiment of the present application;
fig. 4 is a schematic flowchart of S21 to S25 in a cover image detection method of a live broadcast room according to an embodiment of the present application;
fig. 5 is a schematic flowchart of S31 to S32 in a cover image detection method of a live broadcast room according to an embodiment of the present application;
fig. 6 is a schematic flowchart of S311 to S312 in a cover image detection method for a live broadcast room according to an embodiment of the present application;
fig. 7 is a schematic flowchart of S51 to S52 in a cover image detection method of a live broadcast room according to an embodiment of the present application;
fig. 8 is a schematic flowchart of S61 to S62 in a cover image detection method of a live broadcast room according to an embodiment of the present application;
fig. 9 is a schematic structural diagram of a cover image detection apparatus of a live broadcast room according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of a computer device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
Referring to fig. 1, fig. 1 is a schematic view of an application scenario of a cover image detection method in a live broadcast room according to an embodiment of the present application, where the application scenario includes an anchor client 101, a server 102, and a viewer client 103, and the anchor client 101 and the viewer client 103 interact with each other through the server 102.
The anchor client 101 is the end that sends the webcast video, and is typically the client used by the anchor (i.e., the webcast anchor user) in the webcast.

The viewer client 103 is the end that receives and views the live video, and is typically the client used by a viewer watching the webcast (i.e., a live viewer user).
The hardware on which the anchor client 101 and the viewer client 103 run is essentially a computer device; specifically, as shown in fig. 1, it may be a smart phone, a smart interactive tablet, a personal computer, or a similar type of computer device. Both the anchor client 101 and the viewer client 103 may access the internet via known network access means to establish a data communication link with the server 102.
Server 102, acting as a business server, may be responsible for further connecting with related audio data servers, video streaming servers, and other servers providing related support, etc., to form a logically associated server cluster for serving related terminal devices, such as anchor client 101 and viewer client 103 shown in fig. 1.
In the embodiment of the present application, the anchor client 101 and the audience client 103 may join the same live broadcast room (i.e., a live broadcast channel), where the live broadcast room is a chat room implemented by means of an internet technology, and generally has an audio/video broadcast control function. The anchor user performs live broadcast in the live broadcast room through the anchor client 101, and the audience of the audience client 103 can log in the server 102 to enter the live broadcast room to watch live broadcast.
In the live broadcast room, interaction between the anchor and the audience can be realized through known online interaction modes such as voice, video, characters and the like, generally, the anchor user performs programs for the audience in the form of audio and video streams, and economic transaction behaviors can also be generated in the interaction process. Of course, the application form of the live broadcast room is not limited to online entertainment, and can also be popularized to other relevant scenes, such as: user pairing interaction scenarios, video conference scenarios, product recommendation sales scenarios, and any other scenario requiring similar interaction.
Specifically, the process of the viewer watching the live broadcast is as follows: the viewer can click to access a live application (e.g., YY) installed on the viewer client 103 and choose to enter any one of the live rooms, and the viewer client 103 is triggered to load a live room interface for the viewer, wherein the live room interface includes a plurality of interactive components, and the viewer can watch live in the live room by loading the interactive components and perform various online interactions.
When streaming, in order to let users quickly grasp the live content and choose content of interest to watch, a live cover is usually added to the live broadcast room. For example, the live video content an anchor presents in a live broadcast room can be displayed on the live platform in the form of a live cover, and a user can enter the corresponding live broadcast room to watch by clicking that cover.
However, an existing live cover is usually a frame captured from the live video content, and the quality of such a captured cover cannot be guaranteed; the result is a poor visual experience for users, weak appeal, poor click feedback, and a poor viewing effect for the live video content.
Therefore, the embodiments of the present application provide a cover image detection method for a live broadcast room, for which the anchor client may serve as the execution subject.
Referring to fig. 2, fig. 2 is a schematic flow chart of a cover image detection method for a live broadcast room according to a first embodiment of the present application, where the method includes the following steps:
s10: and acquiring a cover image to be detected.
In this embodiment of the application, the cover image to be detected may be a cover image uploaded by an anchor on the live platform, or a cover image automatically generated by the live platform, for example obtained from a real-time screenshot or a historical screenshot of the live broadcast taken by the platform.
S20: and carrying out face detection on the cover image to obtain a face detection result.
The face detection means that, for any given image, a certain strategy is adopted to search the image to determine whether it contains a face, and if so, to return the position, size, and posture of the face. In the embodiment of the application, the cover image to be detected is input into a face detection model to obtain a face detection result. The face detection result indicates the face regions included in the cover image, each face region being a rectangular area. For example, inputting a cover image to be detected into the face detection model yields k face regions, the coordinates of the k-th face region being expressed as (x1^k, y1^k, x2^k, y2^k), where x1^k and y1^k denote the abscissa and ordinate of the upper-left corner of the k-th face region, and x2^k and y2^k denote the abscissa and ordinate of the lower-right corner. If no face region exists in the face detection result, i.e., k = 0, the cover image to be detected is prompted as unqualified. If one or more face regions exist in the face detection result, i.e., k > 0, subsequent detection is performed on each face region.
The face detection model in the embodiment of the application is obtained by pre-training, and the training process is as follows: a plurality of video frame images containing clear faces are acquired; the faces in the video frame images are labeled with a labeling tool; the labeled images are divided into a training data set and a testing data set according to a certain proportion; the training data set is input into the face detection model for training; after training is finished, the testing data set is input for testing; and if the face recognition rate on the testing data set meets the preset requirement, the trained face detection model is obtained.
S30: and if one or more face areas exist in the face detection result, acquiring the area size of each face area and the size of the cover image.
In the embodiment of the present application, the size of a face region includes the width and height of the face region, and the size of the cover image includes the width and height of the cover image. Specifically, from the coordinate representation of the face region, x2^k - x1^k is the width of the k-th face region, and y2^k - y1^k is its height. Similarly, the width and height of the cover image may also be obtained.
S40: if it is judged according to the region size of each face region and the size of the cover image that the occupation ratio of each face region in the cover image is smaller than a preset occupation-ratio threshold and the width-to-height ratio of each face region is within a preset ratio-threshold range, performing face contour key point detection on each face region to obtain a face contour key point detection result of each face region.
The detection of the key points of the face contour refers to giving a face image and locating the position of the face contour region of the face, which can be specifically seen in fig. 3, which is a schematic diagram of the detection of the key points of the face contour. In the embodiment of the present application, after the area size of each face area and the size of the cover image are obtained, the proportion of each face area in the cover image is calculated, that is, the ratio of the width of each face area to the width of the cover image is calculated, and the ratio of the height of each face area to the height of the cover image is calculated. Meanwhile, the width-height ratio of the face area of each face area is calculated, namely the ratio of the width to the height of each face area is calculated.
The occupation ratio of each face region in the cover image is compared with a preset occupation-ratio threshold, and the width-to-height ratio of each face region is compared with a preset ratio-threshold range. For example, with a preset occupation-ratio threshold of 0.6, if the ratio of the width of a face region to the width of the cover image is greater than 0.6, and/or the ratio of the height of the face region to the height of the cover image is greater than 0.6, the occupation ratio of that face region in the cover image is determined to be too large. With a preset ratio-threshold range of 0.5 to 1, if the ratio of the width to the height of a face region is not within 0.5 to 1, the width-to-height ratio of that face region is judged to be irregular. If the occupation ratio of one or more face regions in the cover image is too large, or the width-to-height ratio of one or more face regions is not within the preset ratio-threshold range, the cover image to be detected is prompted as unqualified.
If the proportion of each face region in the cover image is smaller than a preset proportion threshold value and the width-height proportion of each face region is within a preset proportion threshold value range, carrying out face contour key point detection on each face region to obtain a face contour key point detection result of each face region so as to carry out integrity detection on subsequent faces according to the face contour key point detection result.
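The size and ratio screening described above can be sketched as follows. This is only an illustration, not the patented implementation; the function name is hypothetical, and the default thresholds (0.6 occupation ratio, 0.5 to 1 aspect range) are taken from the examples in this description.

```python
def passes_size_checks(face_box, cover_w, cover_h,
                       max_ratio=0.6, aspect_range=(0.5, 1.0)):
    """Return True when one face region passes the S40 screening.

    face_box is (x1, y1, x2, y2): the top-left and bottom-right
    corners of the face rectangle, as produced by face detection.
    """
    x1, y1, x2, y2 = face_box
    w, h = x2 - x1, y2 - y1
    # Occupation-ratio check: both the width share and the height
    # share of the cover must stay below the preset threshold.
    if w / cover_w > max_ratio or h / cover_h > max_ratio:
        return False
    # Aspect-ratio check: width/height must lie inside the range.
    lo, hi = aspect_range
    return lo <= w / h <= hi
```

A cover is rejected as soon as any detected face region fails either check; only covers whose faces all pass proceed to contour key point detection.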
S50: and if the face contour key point detection result of each face region meets a preset condition, taking the cover image as a target cover image.
In this embodiment of the application, the detection result of the face contour keypoints in each face region includes position information and confidence score of the face contour keypoints corresponding to each face region, and the preset condition includes that the positions of the face contour keypoints are in a cover image to be detected and the confidence score of the face contour keypoints is greater than a preset threshold. And if the detection result of the face contour key points of one or more face regions does not meet the preset condition, prompting that the cover image to be detected is unqualified. And if the face contour key point detection result of each face region meets a preset condition, taking the cover image as a target cover image. The target cover image is a cover image which is obtained by subjecting a cover image to be detected to face detection, virtual portrait detection and face integrity detection, and can be used as a high-quality cover image when a live video is released by a main broadcast. In an optional embodiment, the live broadcast platform performs real-time screenshot or historical screenshot on the anchor live broadcast to obtain a screenshot candidate set, where the screenshot candidate set includes a plurality of captured images. And taking the plurality of captured images as cover images to be detected, and carrying out the detection process one by one so as to obtain a plurality of target cover images.
The method comprises: acquiring a cover image to be detected; performing face detection on the cover image to obtain a face detection result; if one or more face regions exist in the face detection result, acquiring the region size of each face region and the size of the cover image; if it is judged according to the region size of each face region and the size of the cover image that the occupation ratio of each face region in the cover image is smaller than a preset occupation-ratio threshold and the width-to-height ratio of each face region is within a preset ratio-threshold range, performing face contour key point detection on each face region to obtain a face contour key point detection result of each face region; and if the face contour key point detection result of each face region meets a preset condition, taking the cover image as the target cover image. According to the embodiments of the present application, face detection and face contour key point detection are performed in sequence on the cover image to be detected, and cover images with no face, an oversized face, an irregular face aspect ratio, or an incomplete face are screened out, so that high-quality cover image selection is achieved automatically, labor cost is reduced, and cover image detection efficiency is improved.
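The overall S10-S50 flow can be expressed as a pipeline that rejects the cover as soon as any check fails. This is a minimal sketch, not the patented implementation: the callables detect_faces, face_ok, and keypoints_ok are hypothetical stand-ins for the face detection model, the size/ratio checks, and the contour key point condition described above.

```python
def select_target_cover(image, detect_faces, face_ok, keypoints_ok):
    """Sketch of the S10-S50 screening pipeline.

    detect_faces(image) -> list of face regions (empty if no face);
    face_ok(region)     -> True when the size/aspect checks pass;
    keypoints_ok(region)-> True when the contour key point result
                           meets the preset condition.
    Returns the image as the target cover image, or None when any
    check marks the cover as unqualified.
    """
    regions = detect_faces(image)
    if not regions:                        # no face at all
        return None
    if not all(face_ok(r) for r in regions):
        return None                        # face too large or irregular
    if not all(keypoints_ok(r) for r in regions):
        return None                        # face incomplete
    return image                           # qualifies as target cover
```

Applied to a screenshot candidate set, this function is simply called once per captured image, yielding the subset of target cover images.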
In an alternative embodiment, referring to fig. 4, step S20 includes steps S21-S25, which are as follows:
s21: and carrying out face detection on the cover image by using a first convolutional neural network to obtain a plurality of face detection frames comprising faces in the cover image and corresponding confidence score.
In the embodiment of the application, a first convolutional neural network is used to perform face detection on the cover image, obtaining a plurality of face detection frames and their corresponding confidence scores. In the process of face detection, a large number of face detection frames are generated around the same face position, and these frames may overlap with one another. The first convolutional neural network may be an FPN18 convolutional neural network; FPN (Feature Pyramid Network) addresses the multi-scale problem in object detection and, with essentially no increase in the computation of the original model, greatly improves detection performance through a simple change in network connections.
S22: sequencing the face detection frames according to the confidence score from high to low to obtain a face detection frame list;
S23: adding the face detection frame with the highest confidence score into an output list, and deleting the face detection frame with the highest confidence score from the face detection frame list;
s24: calculating the overlapping degree of the face detection frame with the highest confidence score and other face detection frames, and deleting the face detection frame with the overlapping degree larger than the threshold value of the overlapping degree in the face detection frame list;
s25: and calculating the overlapping degree of the face detection frame with the highest confidence score and other face detection frames left in the face detection frame list, deleting the face detection frame with the overlapping degree larger than the overlapping degree threshold value in the face detection frame list until the face detection frame list is empty, and obtaining a face detection result according to the output list. In the embodiment of the application, a plurality of face detection frames are screened, the screened face detection frames are used as face detection results, redundant face detection frames are removed, and therefore the face detection precision is improved.
In an alternative embodiment, referring to fig. 5, step S30 includes steps S31 to S32, which are as follows:
s31: if one or more face areas exist in the face detection result, performing virtual portrait detection on each face area to obtain a virtual portrait detection result of each face area;
S32: and if the virtual portrait detection result of each face area does not have a virtual portrait, acquiring the area size of each face area and the size of the cover image.
The virtual portrait refers to a non-real portrait, including cartoon characters, 3D virtual characters, and the like. In the embodiment of the application, if one or more face regions exist in the face detection result, virtual portrait detection is performed on each face region so as to recognize virtual portraits that the face detection model may have mistakenly identified as faces, obtaining a virtual portrait detection result for each face region. If a virtual portrait exists in the virtual portrait detection result of one or more face regions, the cover image to be detected is prompted as unqualified. If no virtual portrait exists in the virtual portrait detection result of any face region, the subsequent judgment of the face region occupation ratio and whether the width-to-height ratio is reasonable is performed.
In an alternative embodiment, referring to fig. 6, step S31 includes steps S311 to S312, which are as follows:
S311: and if one or more face areas exist in the face detection result, performing virtual portrait detection on each face area by using a second convolutional neural network to obtain a virtual portrait confidence score of each face area.
S312: and if the confidence score of the virtual portrait is smaller than a preset score, determining that the corresponding face region does not have the virtual portrait.
In the embodiment of the application, the virtual portrait detection is performed on each face region using the second convolutional neural network to obtain a virtual portrait confidence score for each face region, which can improve the accuracy and efficiency of the convolutional neural network model. The second convolutional neural network may be an efficientnet-b0 convolutional neural network. If the virtual portrait confidence score of a face region is greater than the preset score, that face region contains a virtual portrait, and when one or more face regions contain a virtual portrait, the cover image to be detected is prompted as unqualified. If the virtual portrait confidence score of every face region is smaller than the preset score, none of the face regions contains a virtual portrait, and the subsequent judgment of the face region occupation ratio and whether the width-to-height ratio is reasonable is performed, so that cover images to be detected containing virtual portraits are filtered out.
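The S311-S312 thresholding can be sketched as below. The classifier producing the virtual-portrait confidence score (for example an efficientnet-b0 network) is outside this sketch, and the preset score of 0.5 is an assumed value, not one given in the text.

```python
def is_real_face(virtual_score, preset_score=0.5):
    """S312: a face region whose virtual-portrait confidence score
    is below the preset score is treated as a real face."""
    return virtual_score < preset_score

def all_regions_real(virtual_scores, preset_score=0.5):
    """The cover is rejected as soon as any region looks virtual;
    only covers whose regions are all real proceed to the size and
    ratio judgment."""
    return all(is_real_face(s, preset_score) for s in virtual_scores)
```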
In an alternative embodiment, referring to fig. 7, step S50 includes steps S51 to S52, which are as follows:
s51: if the occupation ratio of each face region in the cover image is judged to be smaller than a preset occupation ratio threshold value and the width-height ratio of each face region is within a preset ratio threshold value range according to the region size of each face region and the size of the cover image, each face region is expanded outwards in the diagonal direction of the face region according to the preset proportion, and a corresponding target face region is obtained.
S52: inputting each target face area into a face contour key point detection model to obtain a face contour key point detection result of each target face area; and the detection result of the face contour key points is used for indicating the position and confidence degree score of the face contour key points in each target face region.
In the embodiment of the application, each face region is expanded outwards along its diagonal direction by a preset proportion to obtain the corresponding target face region. For example, if the pixel size of a face region is 300 × 200 and the preset proportion is 1.5, the pixel size of the expanded target face region is 450 × 300; that is, the width of the face region is expanded by 75 pixels on each of the left and right sides, and the height is expanded by 50 pixels on each of the top and bottom sides. Each target face region is input into a face contour key point detection model obtained by pre-training, yielding the positions and confidence scores of the face contour key points in each target face region. For each target face region there are n face contour key points with corresponding positions [(x0, y0), …, (xi, yi), …, (xn-1, yn-1)] and confidence scores [p0, …, pi, …, pn-1]. By expanding the face region to obtain the target face region and obtaining the face contour key points from the target face region, an incomplete face can be located for the subsequent face integrity detection.
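The symmetric expansion in the worked example can be sketched as follows; a minimal sketch, assuming boxes are given as (x, y, width, height) and clipped to the image bounds (the function name and the clipping behaviour are assumptions, not specified by the patent):

```python
def expand_face_region(x, y, w, h, img_w, img_h, scale=1.5):
    """Expand a face box (x, y, w, h) about its centre by `scale`
    along its diagonal, then clip the result to the image bounds."""
    new_w, new_h = w * scale, h * scale
    # shift the origin so the expansion is symmetric about the centre
    new_x = x - (new_w - w) / 2
    new_y = y - (new_h - h) / 2
    # clip to the cover image
    new_x = max(0, new_x)
    new_y = max(0, new_y)
    new_w = min(new_w, img_w - new_x)
    new_h = min(new_h, img_h - new_y)
    return new_x, new_y, new_w, new_h
```

With the patent's numbers (a 300 × 200 region, scale 1.5) this yields a 450 × 300 target region, i.e. 75 extra pixels on each side horizontally and 50 vertically, matching the example above.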
In an alternative embodiment, referring to fig. 8, step S60 includes steps S61 to S62, which are as follows:
S61: if the position of the face contour key points of each face region is inside the cover image and the confidence score of the face contour key points of each face region is greater than a preset threshold, the face contour key points are determined to be target contour key points.
S62: and if the number of the key points of the target contour is larger than a preset numerical value, taking the cover image as a target cover image.
In the embodiment of the application, each face region has n face contour key points, the width of the cover image is w, the height is h, and the preset threshold is α. For the i-th face contour key point, the position is (xi, yi) and the confidence score is pi. The i-th face contour key point lies inside the cover image and its confidence score exceeds the preset threshold when its abscissa xi satisfies 0 < xi < w, its ordinate yi satisfies 0 < yi < h, and pi > α; such a key point is determined to be a target contour key point. If the number of target contour key points in each face region is greater than a preset value, for example more than 20, it is judged that the cover image to be detected passes the face integrity detection, and the cover image to be detected is determined to be the target cover image. By determining the target contour key points and counting them, cover images to be detected that include an incomplete face are filtered out.
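The counting rule above can be sketched as follows; a minimal sketch, assuming key points as (x, y) tuples with parallel confidence scores (the function names, the 0.5 threshold and the 20-point default are illustrative, the last taken from the example in the text):

```python
def count_target_keypoints(keypoints, scores, w, h, alpha):
    """Count contour key points that lie strictly inside the w-by-h
    cover image and whose confidence score exceeds alpha."""
    return sum(
        1
        for (x, y), p in zip(keypoints, scores)
        if 0 < x < w and 0 < y < h and p > alpha
    )


def face_is_complete(keypoints, scores, w, h, alpha=0.5, min_points=20):
    """A face region passes integrity detection when more than
    `min_points` target contour key points are found."""
    return count_target_keypoints(keypoints, scores, w, h, alpha) > min_points
```

A key point on the image border (x = 0 or y = 0) is excluded, since the inequalities are strict, which is what allows cropped faces to be rejected.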
Referring to fig. 9, fig. 9 is a schematic structural diagram of a cover image detection apparatus of a live broadcast room according to the present application. An embodiment of the application provides a cover image detection apparatus 7 of a live broadcast room, which includes:
a cover image acquisition module 71, configured to acquire a cover image to be detected;
a face detection module 72, configured to perform face detection on the cover image to obtain a face detection result;
a size obtaining module 73, configured to, if one or more face regions exist in the face detection result, obtain a region size of each face region and a size of the cover image;
A key point detection module 74, configured to, if it is determined that the proportion of each face area in the cover image is smaller than a preset proportion threshold and the width-to-height ratio of each face area is within a preset proportion threshold range according to the area size of each face area and the size of the cover image, perform face contour key point detection on each face area to obtain a face contour key point detection result of each face area;
a target cover image obtaining module 75, configured to take the cover image as a target cover image if the face contour key point detection result of each face region meets a preset condition.
It should be noted that, when the cover image detection apparatus of the live broadcast room provided in the above embodiment executes the cover image detection method of the live broadcast room, the division of the above functional modules is merely used as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to complete all or part of the above described functions. In addition, the cover image detection device of the live broadcast room and the cover image detection method of the live broadcast room provided by the above embodiment belong to the same concept, and details of the implementation process are shown in the method embodiment, which are not described herein again.
Please refer to fig. 10, which is a schematic structural diagram of a computer device provided in the present application. As shown in fig. 10, the computer device 21 may include: a processor 210, a memory 211, and a computer program 212 stored in the memory 211 and operable on the processor 210, for example a program for cover image detection of a live broadcast room; the processor 210 implements the steps in the above embodiments when executing the computer program 212.
The processor 210 may include one or more processing cores. The processor 210 connects various parts of the computer device 21 via various interfaces and lines, and executes the functions of the computer device 21 and processes data by running or executing instructions, programs, code sets or instruction sets stored in the memory 211 and invoking data stored in the memory 211. Optionally, the processor 210 may be implemented in at least one hardware form among Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA) and Programmable Logic Array (PLA). The processor 210 may integrate one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like, wherein the CPU mainly handles the operating system, user interface, application programs and the like; the GPU is responsible for rendering and drawing the content to be displayed by the touch display screen; and the modem handles wireless communication. It is understood that the modem may also not be integrated into the processor 210 and may instead be implemented by a separate chip.
The memory 211 may include a Random Access Memory (RAM) or a Read-Only Memory (ROM). Optionally, the memory 211 includes a non-transitory computer-readable medium. The memory 211 may be used to store instructions, programs, code, code sets or instruction sets. The memory 211 may include a program storage area and a data storage area, wherein the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as touch instructions), instructions for implementing the above method embodiments, and the like; and the data storage area may store the data involved in the above method embodiments. Optionally, the memory 211 may also be at least one storage device located remotely from the processor 210.
The embodiment of the present application further provides a computer storage medium, where the computer storage medium may store a plurality of instructions, where the instructions are suitable for being loaded by a processor and executing the method steps of the foregoing embodiment, and a specific execution process may refer to specific descriptions of the foregoing embodiment, which is not described herein again.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules, so as to perform all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the above embodiments, the description of each embodiment has its own emphasis, and reference may be made to the related description of other embodiments for parts that are not described or recited in any embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the embodiments provided in the present invention, it should be understood that the disclosed apparatus/terminal device and method may be implemented in other ways. For example, the apparatus/terminal device embodiments described above are merely illustrative; the division into modules or units is only a division by logical function, and other divisions are possible in actual implementation, for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, devices or units, and may be electrical, mechanical or in other forms.
Units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow in the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium and used by a processor to implement the steps of the above-described embodiments of the method. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc.
The present invention is not limited to the above-described embodiments, and various modifications and variations of the present invention are intended to be included within the scope of the claims and the equivalent technology of the present invention if they do not depart from the spirit and scope of the present invention.

Claims (10)

1. A cover image detection method for a live broadcast room is characterized by comprising the following steps:
acquiring a cover image to be detected;
carrying out face detection on the cover image to obtain a face detection result;
if one or more face regions exist in the face detection result, acquiring the region size of each face region and the size of the cover image;
if the proportion of each face region in the cover image is judged to be smaller than a preset proportion threshold value and the width-height proportion of each face region is within a preset proportion threshold value range according to the region size of each face region and the size of the cover image, face contour key point detection is carried out on each face region, and a face contour key point detection result of each face region is obtained;
and if the face contour key point detection result of each face region meets a preset condition, taking the cover image as a target cover image.
2. The cover image detection method of a live broadcast room as claimed in claim 1, wherein:
if one or more face regions exist in the face detection result, the step of obtaining the region size of each face region and the size of the cover image comprises the following steps:
if one or more face areas exist in the face detection result, performing virtual portrait detection on each face area to obtain a virtual portrait detection result of each face area;
and if the virtual portrait detection result of each face area does not have a virtual portrait, acquiring the area size of each face area and the size of the cover image.
3. The cover image detection method of a live broadcast room according to claim 1, characterized in that:
the step of carrying out face detection on the cover image to obtain a face detection result comprises the following steps:
carrying out face detection on the cover image by using a first convolutional neural network to obtain a plurality of face detection frames comprising faces in the cover image and corresponding confidence scores;
sequencing the face detection frames according to the confidence score from high to low to obtain a face detection frame list;
Adding the face detection frame with the highest confidence score into an output list, and deleting the face detection frame with the highest confidence score from the face detection frame list;
calculating the degree of overlap between the face detection frame with the highest confidence score and other face detection frames, and deleting the face detection frames with the degree of overlap larger than the threshold value of the degree of overlap from the face detection frame list;
and calculating the overlapping degree of the face detection frame with the highest confidence score and other face detection frames left in the face detection frame list, deleting the face detection frame with the overlapping degree larger than the overlapping degree threshold value in the face detection frame list until the face detection frame list is empty, and obtaining a face detection result according to the output list.
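The deduplication procedure in claim 3 is standard non-maximum suppression over the confidence-sorted detection frame list; a minimal sketch, assuming boxes as (x1, y1, x2, y2) tuples and intersection-over-union as the overlap degree (the function names and the 0.5 default threshold are assumptions, not from the claims):

```python
def iou(a, b):
    """Overlap degree (intersection over union) of two boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, overlap_threshold=0.5):
    """Sort detection frames by confidence score, repeatedly move the
    highest-scoring frame to the output list, and delete frames whose
    overlap with it exceeds the threshold, until the list is empty."""
    remaining = sorted(zip(scores, boxes), key=lambda t: t[0], reverse=True)
    output = []
    while remaining:
        _, best = remaining.pop(0)
        output.append(best)
        remaining = [(s, b) for s, b in remaining
                     if iou(best, b) <= overlap_threshold]
    return output
```

The surviving frames in the output list form the face detection result, one frame per distinct face.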
4. The cover image detection method of a live broadcast room as claimed in claim 2, wherein:
if one or more face areas exist in the face detection result, performing virtual portrait detection on each face area to obtain a virtual portrait detection result of each face area, including:
if one or more face areas exist in the face detection result, performing virtual portrait detection on each face area by using a second convolutional neural network to obtain a virtual portrait confidence score of each face area;
And if the confidence score of the virtual portrait is smaller than a preset score, determining that the corresponding face region does not have the virtual portrait.
5. The cover image detection method of a live broadcast room as claimed in claim 1, wherein:
if the proportion of each face region in the cover image is judged to be smaller than a preset proportion threshold value and the width-height proportion of each face region is within a preset proportion threshold value range according to the region size of each face region and the size of the cover image, face contour key point detection is carried out on each face region, and a face contour key point detection result of each face region is obtained, the method comprises the following steps:
if the occupation ratio of each face region in the cover image is judged to be smaller than a preset occupation ratio threshold value and the width-height ratio of each face region is within a preset ratio threshold value range according to the region size of each face region and the size of the cover image, expanding each face region outwards along the diagonal direction of the face region by a preset ratio to obtain a corresponding target face region;
inputting each target face area into a face contour key point detection model to obtain a face contour key point detection result of each target face area; and the detection result of the face contour key points is used for indicating the position and confidence degree score of the face contour key points in each target face region.
6. The cover image detection method of a live broadcast room as claimed in claim 5, wherein:
if the face contour key point detection result of each face region meets a preset condition, the step of taking the cover image as a target cover image comprises the following steps:
if the position of the face contour key point of each face region is in the cover image and the confidence score of the face contour key point of each face region is greater than a preset threshold value, determining the face contour key point as a target contour key point;
and if the number of the key points of the target contour is larger than a preset numerical value, taking the cover image as a target cover image.
7. A cover image detection device of a live broadcast room is characterized by comprising:
the cover image acquisition module is used for acquiring a cover image to be detected;
the face detection module is used for carrying out face detection on the cover image to obtain a face detection result;
the size acquisition module is used for acquiring the area size of each face area and the size of the cover image if one or more face areas exist in the face detection result;
The key point detection module is used for detecting the face contour key points of each face area to obtain the face contour key point detection result of each face area if the proportion of each face area in the cover image is judged to be smaller than a preset proportion threshold value and the width-height ratio of each face area is within a preset proportion threshold value range according to the area size of each face area and the size of the cover image;
and the target cover image acquisition module is used for taking the cover image as a target cover image if the face contour key point detection result of each face region meets a preset condition.
8. The cover image detection device of a live broadcast room of claim 7, wherein the face detection module comprises:
a face detection frame obtaining unit, configured to perform face detection on the cover image by using a first convolutional neural network, so as to obtain a plurality of face detection frames including faces in the cover image and corresponding confidence scores;
a face detection frame list obtaining unit, configured to sort the face detection frames in order from high confidence score to low confidence score, so as to obtain a face detection frame list;
The output list adding unit is used for adding the face detection frame with the highest confidence score into an output list and deleting the face detection frame with the highest confidence score from the face detection frame list;
the overlap calculation unit is used for calculating the overlap degree of the face detection frame with the highest confidence score and other face detection frames and deleting the face detection frames with the overlap degree larger than the overlap degree threshold value from the face detection frame list;
and the face detection result obtaining unit is used for calculating the overlapping degree of the face detection frame with the highest confidence score and other face detection frames left in the face detection frame list, deleting the face detection frame with the overlapping degree larger than the overlapping degree threshold value in the face detection frame list until the face detection frame list is empty, and obtaining a face detection result according to the output list.
9. A computer device, comprising: processor, memory and computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 6 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN202210168827.8A 2022-02-23 2022-02-23 Cover image detection method and device for live broadcast room, computer equipment and medium Pending CN114519889A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210168827.8A CN114519889A (en) 2022-02-23 2022-02-23 Cover image detection method and device for live broadcast room, computer equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210168827.8A CN114519889A (en) 2022-02-23 2022-02-23 Cover image detection method and device for live broadcast room, computer equipment and medium

Publications (1)

Publication Number Publication Date
CN114519889A true CN114519889A (en) 2022-05-20

Family

ID=81599725

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210168827.8A Pending CN114519889A (en) 2022-02-23 2022-02-23 Cover image detection method and device for live broadcast room, computer equipment and medium

Country Status (1)

Country Link
CN (1) CN114519889A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114894337A (en) * 2022-07-11 2022-08-12 深圳市大树人工智能科技有限公司 Temperature measurement method and device for outdoor face recognition
CN115065876A (en) * 2022-08-18 2022-09-16 广州市千钧网络科技有限公司 Method and device for changing live title and cover and related product



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination