WO2023045183A1 - Image processing - Google Patents

Image processing

Info

Publication number
WO2023045183A1
WO2023045183A1 · PCT/CN2022/070905
Authority
WO
WIPO (PCT)
Prior art keywords
face
image frame
detection
frame
preset
Prior art date
Application number
PCT/CN2022/070905
Other languages
French (fr)
Chinese (zh)
Inventor
田茂清
刘建博
伊帅
Original Assignee
上海商汤智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023045183A1


Definitions

  • Embodiments of the present disclosure relate to the technical field of image processing, and in particular, to an image processing method and device.
  • In view of this, the embodiments of the present disclosure provide at least one image processing method and device.
  • In a first aspect, an image processing method is provided, comprising: acquiring an image frame to be processed; acquiring a scene type of the image frame, the scene type being determined according to a preliminary face detection result of the image frame; performing face detection on the image frame in a detection mode that matches the scene type of the image frame; and performing occlusion processing on the detected face according to a preset occlusion mode.
  • In a second aspect, an image processing device is provided, comprising: an image frame acquisition module configured to acquire an image frame to be processed; a scene type acquisition module configured to acquire the scene type of the image frame, the scene type being determined according to a preliminary face detection result of the image frame; a face detection module configured to perform face detection on the image frame in a detection mode that matches the scene type of the image frame; and an occlusion processing module configured to perform occlusion processing on the detected face according to a preset occlusion mode.
  • In a third aspect, an electronic device is provided, including a memory and a processor, the memory being used to store computer instructions executable on the processor, and the processor being used to implement the image processing method described in any embodiment of the present disclosure when executing the computer instructions.
  • In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the image processing method described in any embodiment of the present disclosure is implemented.
  • In a fifth aspect, a computer program product is provided, including a computer program/instruction; when the computer program/instruction is executed by a processor, the image processing method described in any embodiment of the present disclosure is implemented.
  • The image processing method provided by the embodiments of the present disclosure adaptively selects a processing method matching the scene type for image frames of different scene types, so that when occluding the faces in an image frame, a processing method that is more effective for that frame's scene is used. Faces can therefore be occluded more accurately, the efficiency of image processing is improved, and the time cost of manual operation is reduced.
  • Fig. 1 is a flowchart of an image processing method shown in at least one embodiment of the present disclosure
  • Fig. 2 is a flowchart of another image processing method shown in at least one embodiment of the present disclosure
  • Fig. 2A is a processing logic flowchart of an image processing method shown in at least one embodiment of the present disclosure
  • Fig. 3 is a block diagram of an image processing device shown in at least one embodiment of the present disclosure
  • Fig. 4 is a block diagram of another image processing device shown in at least one embodiment of the present disclosure.
  • Fig. 5 is a schematic diagram of a hardware structure of an electronic device according to at least one embodiment of the present disclosure.
  • Although the terms first, second, third, etc. may be used in this specification to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
  • Figure 1 is a flowchart of an image processing method shown in at least one embodiment of the present disclosure, including the following steps:
  • In step 102, an image frame to be processed is acquired, and the image frame includes at least one human face.
  • the image frame to be processed can be a photo, a screenshot, or a frame in a video.
  • This embodiment does not limit the specific manner of acquiring image frames.
  • For example, a Vlog video input by a user may be received, and a tool library such as FFmpeg or OpenCV may be used to deframe the video to obtain multiple image frames each including at least one human face.
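  • As an illustrative sketch (not part of the disclosure), the deframing step might look as follows in Python with OpenCV; the input file name is an assumption.

```python
# Minimal deframing sketch using OpenCV; "vlog.mp4" is an assumed input.
import cv2

def deframe(video_path):
    """Split a video into a list of BGR image frames, in time order."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()  # ok is False once the stream is exhausted
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames

frames = deframe("vlog.mp4")
```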
  • a photo taken by a camera and containing a human face may be received.
  • The processing in this embodiment may be occlusion processing, which is image processing that partially or completely hides the facial features of a person in an image frame, for example mosaic processing, sticker overlay, or Gaussian blur; two of these methods are sketched below.
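  • The following sketches show mosaic and Gaussian-blur occlusion of a rectangular region; the box format (x1, y1, x2, y2), block size, and kernel size are illustrative assumptions, not values from the disclosure.

```python
# Hedged sketches of two occlusion methods applied to a rectangular region.
import cv2

def mosaic(frame, box, block=12):
    """Pixelate the region box=(x1, y1, x2, y2) of a BGR frame in place."""
    x1, y1, x2, y2 = box
    roi = frame[y1:y2, x1:x2]
    h, w = roi.shape[:2]
    # Downscale, then upscale with nearest-neighbour interpolation to pixelate.
    small = cv2.resize(roi, (max(1, w // block), max(1, h // block)))
    frame[y1:y2, x1:x2] = cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)
    return frame

def gaussian_blur(frame, box, ksize=31):
    """Blur the region box=(x1, y1, x2, y2); ksize must be odd."""
    x1, y1, x2, y2 = box
    frame[y1:y2, x1:x2] = cv2.GaussianBlur(frame[y1:y2, x1:x2], (ksize, ksize), 0)
    return frame
```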
  • In step 104, the scene type of the image frame is acquired, and the scene type is determined according to the result of the preliminary face detection in the image frame.
  • In this embodiment, image frames are divided into different scene types, so that different image processing methods can be selected for image frames of different scene types.
  • The scene type is determined according to the preliminary detection result of the faces in the image frame, and the preliminary detection result may include a detection frame corresponding to each face.
  • The preliminary face detection result may be obtained by using a lightweight neural network to perform preliminary detection of the faces in the image frame.
  • For example, scene types may include single-person scenes and multi-person scenes.
  • According to the number of detection frames corresponding to the faces, that is, the number of faces, it may be determined whether the scene type of the image frame is a single-person scene or a multi-person scene.
  • For another example, scene types may include sparse scenes and dense scenes.
  • Whether the scene type of the image frame is a sparse scene or a dense scene may be determined according to the distribution of the detection frames corresponding to the faces. Exemplarily, if the number of detection frames concentrated in one area exceeds a preset number, the scene type of the image frame is judged to be a dense scene; or, if the image frame contains many detection frames but the number concentrated in any one area does not exceed the preset number, the scene type is judged to be a sparse scene.
  • For another example, scene types may include distant view and close view.
  • In general, because near objects appear larger and far objects smaller, the people in a distant-view image frame are far from the camera and a single face occupies a relatively small area, whereas the people in a close-view image frame are near the camera and a single face occupies a relatively large area.
  • In this embodiment, therefore, distant view and close view are distinguished according to the size of the face detection frames.
  • In this step, obtaining the scene type of the image frame may involve performing preliminary face detection on the faces in the acquired image frame to obtain a preliminary detection result and analyzing that result to obtain the scene type; alternatively, when the image frame to be processed is acquired, it may carry a scene-type label, and the scene type may be determined directly from that label.
  • Taking scene types including close view and distant view as an example, obtaining the scene type of the image frame may be the following processing:
  • performing preliminary face detection on the image frame to obtain a preliminary face detection result, the preliminary face detection result including at least one initial face detection frame;
  • in a case where the target size of a preset number of initial face detection frames among the at least one initial face detection frame is greater than a preset threshold, determining that the scene type of the image frame is a close view; or,
  • in a case where the target size of the preset number of initial face detection frames among the at least one initial face detection frame is smaller than or equal to the preset threshold, determining that the scene type of the image frame is a distant view.
  • In actual implementation, the preliminary face detection on the image frame can be performed by a pre-trained face detection model, referred to here as the first face detection model.
  • In one example, the first face detection model may be a small neural network model, that is, a model with relatively few layers, so that it processes faster.
  • The image frame is input into the first face detection model, which outputs the preliminary face detection result, including information such as the size and/or confidence score of each initial face detection frame among the at least one initial face detection frame; the confidence score indicates the probability that the image within an initial face detection frame is a face image, and the higher the confidence score, the more likely the image within that frame is a face image.
  • Exemplarily, a preset number of initial face detection frames may be selected from the initial face detection frames, and the target size of the selected frames determined.
  • How the detection frames are selected may be set by those skilled in the art according to actual needs, so that the judgment of the scene type of the image frame better fits the actual application scene.
  • Exemplarily, the initial face detection frames may be sorted by confidence score from high to low, and the five with the highest confidence scores selected; five is used here as an example, and other numbers may also be chosen.
  • The initial face detection frames may also be selected according to their positions in the image frame, for example by taking three frames at equal intervals from the leftmost to the rightmost, or by taking frames at fixed positions in the image frame.
  • They may also be selected according to size, for example by sorting the initial face detection frames from large to small and selecting four of them at random or according to a certain rule.
  • After the initial face detection frames are selected, the target size of the selected frames is judged.
  • the size of the selected initial face detection frame may be calculated first to obtain a calculation result.
  • the purpose of the calculation process is to obtain the distance of the face in the image frame according to the size of the selected initial face detection frame, so as to judge whether the image frame belongs to the distant view or the near view.
  • the calculation process can be methods such as taking the average, taking the median, taking the root mean square, or randomly selecting.
  • In this example, taking the five initial face detection frames with the highest confidence scores, the calculation may be to average the sizes of those five frames, the target size of the preset number of frames being represented by the average. If the average is greater than the preset threshold, the scene type of the image frame is determined to be a close view; otherwise it is determined to be a distant view. A sketch of this decision follows.
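  • A minimal sketch of the decision rule, under the assumptions that boxes are (x1, y1, x2, y2) tuples, that box area is the size measure, and that the threshold value is arbitrary:

```python
# Scene-type decision: average the sizes of the k highest-confidence boxes
# and compare against a preset threshold. k and the threshold are assumptions.
def scene_type(boxes, scores, k=5, threshold=40 * 40):
    """boxes: list of (x1, y1, x2, y2); scores: one confidence per box."""
    ranked = sorted(zip(scores, boxes), key=lambda sb: sb[0], reverse=True)[:k]
    sizes = [(x2 - x1) * (y2 - y1) for _, (x1, y1, x2, y2) in ranked]
    avg = sum(sizes) / len(sizes)  # assumes at least one detection box
    return "close" if avg > threshold else "distant"
```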
  • In step 106, face detection is performed on the image frame according to a detection method that matches the scene type of the image frame.
  • Different scene types match different detection methods; in this embodiment, detection methods matching the different scene types are preset, so that a suitable detection method can be chosen in a targeted way when detecting faces.
  • For example, when the scene types include single-person and multi-person scenes, the detection method matching the single-person scene may be: performing image segmentation on the image frame to obtain the face contour or head contour of the single person;
  • the detection method matching the multi-person scene may be: performing face detection on the image frame to obtain multiple rectangular detection frames.
  • For another example, when the scene types include sparse and dense scenes, the detection method matching the sparse scene may be: performing face detection on the image frame to obtain multiple detection frames; the detection method matching the dense scene may be: performing face detection on the image frame to obtain a dense area containing multiple faces.
  • For another example, when the scene types include close view and distant view, this step may include the following processing:
  • In response to the scene type of the image frame being a distant view, head key point extraction is performed on the image frame to obtain the key point coordinates of the heads in the image frame, so that subsequent occlusion processing can be based on the detected head key point coordinates.
  • In response to the scene type of the image frame being a close view, face detection is performed on the image frame to obtain the detection frames of the faces in the image frame and the corresponding face features, so that the face features can subsequently be compared accurately against the face features in a preset face library to confirm whether occlusion processing is required.
  • Practice shows that face detection technology can detect the face regions in an image and output a series of rectangular detection frames, but it performs poorly when the faces presented in the image are very small, as in distant views: computation is relatively slow and recognition inaccurate.
  • Head point localization technology can detect where heads appear in an image and output one key point at the center of each head; it performs better and localizes more accurately when the head regions in the image are small, as in distant views, but its localization is not accurate enough for scenes with large head regions, such as close views.
  • In actual implementation, when the scene type of the image frame is a distant view, the matching detection method may be: input the image frame into a pre-trained head point localization model, which performs head key point extraction and outputs the key point coordinates of each head.
  • A key point coordinate may be represented by two numbers; for example, with the coordinate origin at the lower-left corner of the image frame, a key point coordinate may be (18, 39), in units of pixels, where the key point may be the center point of a head.
  • the matching detection method can be: input the image frame into the pre-trained face detection model, and the face detection model here is called the second face detection model.
  • the image frame is input into the second face detection model, and the face detection result is output, and the face detection result includes a detection frame of each face in the image frame and corresponding face features.
  • Each detection frame is represented by detection frame coordinates describing the position of a rectangular frame in the image frame; frames of other shapes may also be used.
  • a rectangular frame is taken as an example for illustration, and the coordinates of the detection frame can be represented by four numbers.
  • the four numbers can be the coordinates of the upper left corner and the lower right corner of the rectangular frame.
  • the coordinates of the detection frame can be (23, 75), (57, 46), or written as (23, 75, 57, 46).
  • the four numbers can also be the coordinates of the lower left and upper right corners of the rectangle.
  • the coordinates of the detection frame may also use other representations, such as eight numbers, which is not limited in this embodiment.
  • the unit of coordinates may be pixels.
  • the second face detection model may be a large-scale neural network model, that is, the neural network model has a relatively large number of layers, so that the obtained face detection result is more accurate.
  • Relative to the second face detection model, the first face detection model is smaller in scale; that is, the first face detection model usually has fewer neural network layers than the second, and the computation of the second face detection model is correspondingly slower than that of the first.
  • For example, the first face detection model takes 10 ms to process an image frame, while the second face detection model takes 100 ms to process an image frame of the same specification.
  • In other examples, the first face detection model and the second face detection model need not be subject to the above-mentioned restrictions; those skilled in the art can select the required models according to actual needs, and the first and second face detection models can even be the same neural network model.
  • When the first face detection model and the second face detection model are the same model, step 106 can be omitted for close-view image frames, and the initial face detection frames determined in step 104 used directly for the processing of step 108.
  • In step 108, occlusion processing is performed on the detected faces according to a preset occlusion mode.
  • the preset occlusion mode may be a default occlusion mode, or may be set by a user.
  • a selection instruction for an occlusion mode may be received, and the selection instruction is used to determine the occlusion mode to be used from at least one candidate occlusion mode.
  • The candidate occlusion modes include at least one of the following: performing occlusion processing on all human faces, performing occlusion processing on human faces other than preset human faces, or performing occlusion processing on the preset human faces; a sketch of these modes as a configuration is given below.
  • the preset human face may refer to a pre-stored human face in a human face database.
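  • One way (an assumption for illustration, not the patent's API) to represent the candidate occlusion modes as a selectable configuration:

```python
# Illustrative enumeration of the three candidate occlusion modes.
from enum import Enum

class OcclusionMode(Enum):
    ALL_FACES = "occlude all faces"
    EXCEPT_PRESET = "occlude faces other than the preset faces"
    ONLY_PRESET = "occlude the preset faces"

mode = OcclusionMode.EXCEPT_PRESET  # e.g. taken from a user's selection instruction
```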
  • In response to the scene type of the image frame being a distant view, the faces in the image frame are occluded according to the preset occlusion mode based on the head key point coordinates.
  • The area to be occluded can be determined according to the key point coordinates.
  • For example, a circle can be determined with the head key point coordinates as its center, and the range within the circle occluded, for example by mosaicking the area inside the circle.
  • The radius length can be chosen according to the size of the image frame; for example, when the image frame is 1080p, the radius generally takes a value in the range of 20 to 30 pixels, or it may be set by those skilled in the art according to actual requirements.
  • Areas to be occluded of other shapes may also be determined according to the key point coordinates, for example rectangles, hexagons, or irregular shapes.
  • The key point coordinates may also be used as the center of a sticker graphic to perform sticker coverage. A sketch of the circular mosaic variant follows.
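  • A minimal sketch of the circular-mosaic variant, reusing the mosaic() helper sketched earlier; the 25-pixel radius is one value in the 20 to 30 px range suggested above for 1080p frames:

```python
# Mosaic a circular region centred on a head key point (distant-view case).
import numpy as np
import cv2

def occlude_keypoint(frame, center, radius=25, block=10):
    x, y = center
    h, w = frame.shape[:2]
    x1, y1 = max(0, x - radius), max(0, y - radius)
    x2, y2 = min(w, x + radius), min(h, y + radius)
    # Pixelate the bounding square, then keep the mosaic only inside the circle.
    mosaicked = frame.copy()
    mosaic(mosaicked, (x1, y1, x2, y2), block)  # helper from the earlier sketch
    mask = np.zeros((h, w), dtype=np.uint8)
    cv2.circle(mask, (x, y), radius, 255, thickness=-1)
    frame[mask == 255] = mosaicked[mask == 255]
    return frame
```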
  • In response to the scene type of the image frame being a close view, occlusion processing is performed on the faces in the image frame according to the preset occlusion mode based on the detection frames; after the detection frame of a face is obtained, the area to be occluded can be determined according to the detection frame.
  • The area within the detection frame may be occluded directly, or the detection frame may be scaled to determine the area to be occluded, or the detection frame may be deformed by other means to determine the area to be occluded.
  • For example, when the occlusion processing is to cover the face area with a sticker, the correspondence between the sticker graphic's coverage area and the detection frame can be preset, and the sticker applied according to that correspondence.
  • Similarly, when a face or head contour is detected, the area within the contour is occluded; when a dense area containing multiple faces is detected, the entire dense area is occluded. A sketch of detection-frame-driven occlusion follows.
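  • A minimal sketch of detection-frame-driven occlusion with optional scaling of the frame about its centre; the 1.2 scale factor is an assumption:

```python
# Scale the detection box about its centre, clip it to the frame, then mosaic it.
def occlude_box(frame, box, scale=1.2, block=12):
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = (x2 - x1) * scale / 2, (y2 - y1) * scale / 2
    h, w = frame.shape[:2]
    sx1, sy1 = max(0, int(cx - half_w)), max(0, int(cy - half_h))
    sx2, sy2 = min(w, int(cx + half_w)), min(h, int(cy + half_h))
    return mosaic(frame, (sx1, sy1, sx2, sy2), block)  # mosaic() as sketched earlier
```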
  • The image processing method provided by the embodiments of the present disclosure adaptively selects a processing method matching the scene type for image frames of different scene types, so that when occluding the faces in an image frame, a processing method that is more effective for that frame's scene is used. Faces can therefore be occluded more accurately, the efficiency of image processing is improved, and the time cost of manual operation is reduced.
  • In some embodiments, a face library can be preconfigured that contains the features of a plurality of pre-collected face images; when the scene type of the image frame is a close view, face detection is performed on the image frame and the face features corresponding to each detection frame are obtained.
  • In this case, performing occlusion processing on the faces in the image frame according to the preset occlusion mode may be the following processing: matching the face features of a face against the preset face library; when the preset occlusion mode is to occlude faces that match a reference face in the face library, performing occlusion processing on the face in the image frame in response to its face features matching a reference face in the library; or, when the preset occlusion mode is to occlude faces other than the reference faces in the face library, performing occlusion processing on the face in the image frame in response to its face features not matching any reference face in the library.
  • In actual implementation, the face image within the detection frame can be input into a pre-trained face recognition model, which extracts the face features from the face image; the extracted face features are then compared with the reference features in the face library.
  • If the library contains a reference feature whose similarity to the face features is sufficiently high (for example, the face image with the highest similarity), the face is determined to match the face library; otherwise, it can be considered that the face library contains no matching result for the face features, and the face is determined not to match the face library.
  • When the preset occlusion mode is to perform occlusion processing on the face images in the face library, the faces in the image frame that have matching results are occluded.
  • The preset occlusion mode may also be set to perform occlusion processing on face images outside the face library; in that case, the faces in the image frame for which there is no matching result are occluded. For example, when the face library contains only a reporter's face image, all people in the image frame other than the reporter need to be occluded. A sketch of this matching logic follows.
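  • A minimal sketch of the matching logic, assuming face features are fixed-length embeddings compared by cosine similarity; the 0.5 threshold, the library layout, and the OcclusionMode enum from the earlier sketch are all assumptions:

```python
# Match a face embedding against reference embeddings, then apply the mode.
import numpy as np

def matches_library(feature, library, threshold=0.5):
    """feature: 1-D embedding; library: non-empty iterable of reference embeddings."""
    f = feature / np.linalg.norm(feature)
    best = max(float(np.dot(f, r / np.linalg.norm(r))) for r in library)
    return best > threshold

def should_occlude(feature, library, mode):
    matched = matches_library(feature, library)
    if mode is OcclusionMode.ONLY_PRESET:    # occlude faces that are in the library
        return matched
    if mode is OcclusionMode.EXCEPT_PRESET:  # occlude faces outside the library
        return not matched
    return True                              # ALL_FACES: occlude everyone
```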
  • The image processing method provided by the embodiments of the present disclosure not only adaptively selects a processing method matching the scene type for image frames of different scene types, but also automatically performs face detection on the faces in the image frames, extracts face features, and matches the extracted features against the face library so as to occlude faces selectively. This allows faces to be occluded more accurately and flexibly, improves the efficiency of image processing, and reduces the time cost of manual operation.
  • Fig. 2 is a flow chart of another image processing method shown in at least one embodiment of the present disclosure.
  • The method can occlude the faces in a video according to a selected occlusion mode; steps that are the same as in the process of Fig. 1 are not described again in detail.
  • In step 202, a selection instruction for an occlusion mode is received, where the selection instruction is used to determine the occlusion mode to be used from at least one candidate occlusion mode.
  • The candidate occlusion modes can be set by those skilled in the art according to actual needs.
  • For example, the candidate occlusion modes include at least one of the following: performing occlusion processing on all faces, performing occlusion processing on faces outside the face library, or performing occlusion processing on the faces in the face library.
  • The candidate occlusion mode may also be: performing occlusion processing on the faces in the face library according to their attributes, and also performing occlusion processing on the faces outside the face library.
  • For example, a face library may be preconfigured that includes reference features of the face images of people whose faces need to be shown. In this case, the selected occlusion mode may be to perform occlusion processing on faces outside the face library.
  • For another example, a face library may be preconfigured that includes reference features of the face images of people who are prohibited from showing their faces. In this case, the selected occlusion mode may be to perform occlusion processing on the faces in the face library.
  • In the face library, each face image can also have an attribute indicating whether occlusion processing is required, and this attribute can be changed as needed. For example, if a person's face needs to be occluded in one video, the person's face image can be given the attribute of requiring occlusion processing; if in another video the person's face does not need to be occluded, the face image can be given the attribute of not requiring occlusion processing.
  • By flexibly selecting the occlusion mode and configuring the attributes of the face images in the face library, various practical needs can be met.
  • a default occlusion mode may also be set.
  • the same occlusion mode can be applied to different scene types, and different occlusion modes can also be applied.
  • In step 204, deframe processing is performed on the video to obtain at least one image frame to be processed.
  • For example, a to-be-processed video uploaded by a user may be received and disassembled into multiple image frames for subsequent frame-by-frame processing.
  • In step 206, the scene type of the image frame is acquired.
  • In step 208, face detection is performed on the image frame according to the detection method matching the scene type of the image frame, and the detected faces are occluded according to a preset occlusion mode.
  • In this example, the occlusion processing may be mosaic processing, and the occlusion mode may be: perform occlusion processing on the faces in the face library according to their attributes, and also perform occlusion processing on the faces outside the face library.
  • In response to the scene type of the image frame being a distant view, head key points are extracted from the image frame to obtain the head key point coordinates, the region to be occluded is determined according to the key point coordinates, and mosaic processing is applied to that region; in response to the scene type of the image frame being a close view, face detection is performed on the image frame, the detection frames of the faces are obtained, and the face features corresponding to each detection frame are extracted for matching against the face library.
  • In step 210, at least one image frame that has undergone occlusion processing is synthesized into a target video.
  • The multiple image frames processed in the above steps can be combined according to their time order in the initial video, and the synthesized target video output. It is also possible to manually verify the processed image frames, reprocess individual frames whose occlusion failed, and synthesize the video after verification. An end-to-end sketch of this pipeline follows.
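  • A minimal end-to-end sketch of steps 204 through 210, where detect_and_occlude is a hypothetical placeholder standing in for the scene-type-dependent detection and occlusion described above:

```python
# Deframe, process each frame in time order, and synthesize the target video.
import cv2

def process_video(src_path, dst_path, detect_and_occlude):
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        out.write(detect_and_occlude(frame))  # frame-by-frame occlusion
    cap.release()
    out.release()
```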
  • FIG. 2A shows the processing logic flow of the image processing method in this example.
  • The image processing method provided by the embodiments of the present disclosure can automatically and adaptively select a detection method matching the scene type for image frames of different scene types in a video, so that when detecting faces in image frames, a detection method that is more effective for each frame's scene is used. An occlusion mode can also be set, and occlusion applied or withheld according to the face library, which allows faces to be occluded more accurately, greatly improves the efficiency of video post-production, and reduces the manpower required.
  • Figure 3 is a block diagram of an image processing device shown in at least one embodiment of the present disclosure. The device includes: an image frame acquisition module 31, a scene type acquisition module 32, a face detection module 33, and an occlusion processing module 34.
  • the image frame acquisition module 31 is configured to acquire image frames to be processed.
  • the scene type acquiring module 32 is configured to acquire the scene type of the image frame, and the scene type is determined according to the result of the preliminary face detection in the image frame.
  • the face detection module 33 is configured to perform face detection on the image frame according to a detection method that matches the scene type of the image frame.
  • the occlusion processing module 34 is configured to perform occlusion processing on the detected faces according to a preset occlusion mode.
  • In some embodiments, the face detection module 33 is specifically configured to: in response to the scene type of the image frame being a distant view, extract head key points from the image frame to obtain the head key point coordinates in the image frame;
  • the occlusion processing module 34 is specifically configured to: perform occlusion processing on the detected human face according to a preset occlusion mode based on the key point coordinates of the human head.
  • In some embodiments, the face detection module 33 is specifically configured to: in response to the scene type of the image frame being a close view, perform face detection on the image frame to obtain the detection frame of the face detected in the image frame;
  • the occlusion processing module 34 is specifically configured to: perform occlusion processing on the human face in the image frame according to a preset occlusion mode based on the detection frame.
  • In some embodiments, the face detection performed on the image frame obtains the detection frame of the detected face and the face features corresponding to the detection frame, and the occlusion processing module 34, when performing occlusion processing according to the preset occlusion mode based on the detection frame of the face, is specifically configured to: match the face features of the face against a preset face library; when the preset occlusion mode is to perform occlusion processing on faces matching a reference face in the face library, perform occlusion processing on the face in the image frame in response to its face features matching a reference face in the library; or, when the preset occlusion mode is to perform occlusion processing on faces other than the reference faces in the face library, perform occlusion processing on the face in the image frame in response to its face features not matching any reference face in the library.
  • In some embodiments, when matching the face features of the face against the preset face library, the occlusion processing module 34 is specifically configured to: in response to determining that the size of the face's detection frame exceeds a preset value, extract the face features of the face image within the detection frame and match the face features against the preset face library.
  • the scene type includes near view and distant view;
  • the scene type acquisition module 32 is specifically configured to: perform preliminary face detection on the image frame to obtain a preliminary face detection result, the preliminary face detection result including at least one initial face detection frame; in a case where the target size of a preset number of initial face detection frames among the at least one initial face detection frame exceeds a preset threshold, determine that the scene type of the image frame is a close view; or, in a case where the target size of the preset number of initial face detection frames does not exceed the preset threshold, determine that the scene type of the image frame is a distant view.
  • In some embodiments, the image frame acquisition module 31 is specifically configured to: perform deframe processing on a video to obtain at least one image frame to be processed; the image frame acquisition module 31 is further configured to: synthesize at least one occlusion-processed image frame into a target video.
  • In some embodiments, the device further includes an occlusion mode selection module 30, configured to receive a selection instruction for the occlusion mode, the selection instruction being used to determine the occlusion mode to be used from at least one candidate occlusion mode; the candidate occlusion modes include at least one of the following: performing occlusion processing on all human faces, performing occlusion processing on human faces other than preset human faces, or performing occlusion processing on the preset human faces.
  • An embodiment of the present disclosure also provides an electronic device. As shown in Fig. 5, the device includes a memory and a processor; the memory is used to store computer instructions executable on the processor, and the processor is used to implement the image processing method described in any embodiment of the present disclosure when executing the computer instructions.
  • An embodiment of the present disclosure further provides a computer program product, which includes a computer program/instruction, and when the computer program/instruction is executed by a processor, implements the image processing method described in any embodiment of the present disclosure.
  • An embodiment of the present disclosure further provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the image processing method described in any embodiment of the present disclosure is implemented.
  • the storage medium may be a volatile or non-volatile computer-readable storage medium.
  • Since the device embodiment basically corresponds to the method embodiment, reference may be made to the description of the method embodiment for relevant parts.
  • The device embodiments described above are only illustrative; the modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical modules, that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in this specification. This can be understood and implemented by those skilled in the art without creative effort.

Landscapes

  • Image Processing (AREA)
  • Collating Specific Patterns (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure provide an image processing method and apparatus. The method comprises: obtaining an image frame to be processed; obtaining a scene type of said image frame, the scene type being determined according to an initial face detection result of said image frame; performing face detection on said image frame according to a detection mode matching the scene type of said image frame; and performing, according to a preset occlusion mode, occlusion processing on the detected face. With this method, faces can be occluded more accurately, image processing efficiency is improved, and the time cost of manual operation is reduced.

Description

Image Processing
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. CN202111108402X, filed with the China Patent Office on September 22, 2021, the entire contents of which are incorporated into this disclosure by reference.
Technical Field
Embodiments of the present disclosure relate to the technical field of image processing, and in particular to an image processing method and device.
Background
In recent years, as face recognition technology has become more widely used in fields such as security, payment, and device unlocking, people pay increasing attention to the privacy protection of face information. For example, in the post-production of TV programs, privacy protection often requires occluding some of the faces that appear on screen. Relying on manual methods to code and occlude faces involves a huge and tedious workload, and problems such as missed or incorrect occlusion frequently occur.
In related technologies, when deep learning algorithms are used to occlude faces in images, the processing algorithms are difficult to apply to images from a variety of scenes, and the results are not ideal.
Summary
In view of this, the embodiments of the present disclosure provide at least one image processing method and device.
Specifically, the embodiments of the present disclosure are achieved through the following technical solutions:
In a first aspect, an image processing method is provided, the method comprising: acquiring an image frame to be processed; acquiring a scene type of the image frame, the scene type being determined according to a preliminary face detection result of the image frame; performing face detection on the image frame in a detection mode that matches the scene type of the image frame; and performing occlusion processing on the detected face according to a preset occlusion mode.
In a second aspect, an image processing device is provided, the device comprising: an image frame acquisition module configured to acquire an image frame to be processed; a scene type acquisition module configured to acquire the scene type of the image frame, the scene type being determined according to a preliminary face detection result of the image frame; a face detection module configured to perform face detection on the image frame in a detection mode that matches the scene type of the image frame; and an occlusion processing module configured to perform occlusion processing on the detected face according to a preset occlusion mode.
In a third aspect, an electronic device is provided, the device including a memory and a processor, the memory being used to store computer instructions executable on the processor, and the processor being used to implement the image processing method described in any embodiment of the present disclosure when executing the computer instructions.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the image processing method described in any embodiment of the present disclosure is implemented.
In a fifth aspect, a computer program product is provided, which includes a computer program/instruction; when the computer program/instruction is executed by a processor, the image processing method described in any embodiment of the present disclosure is implemented.
The image processing method provided by the embodiments of the present disclosure adaptively selects a processing method matching the scene type for image frames of different scene types, so that when occluding the faces in an image frame, a processing method that is more effective for that frame's scene is used. Faces can therefore be occluded more accurately, the efficiency of image processing is improved, and the time cost of manual operation is reduced.
Brief Description of the Drawings
To describe the technical solutions in one or more embodiments of the present disclosure or in the related art more clearly, the accompanying drawings needed in the description of the embodiments or the related art are briefly introduced below. Evidently, the drawings described below are only some of the embodiments recorded in one or more embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an image processing method shown in at least one embodiment of the present disclosure;
Fig. 2 is a flowchart of another image processing method shown in at least one embodiment of the present disclosure;
Fig. 2A is a processing logic flowchart of an image processing method shown in at least one embodiment of the present disclosure;
Fig. 3 is a block diagram of an image processing device shown in at least one embodiment of the present disclosure;
Fig. 4 is a block diagram of another image processing device shown in at least one embodiment of the present disclosure;
Fig. 5 is a schematic diagram of the hardware structure of an electronic device according to at least one embodiment of the present disclosure.
Detailed Description
Exemplary embodiments will now be described in detail, with examples illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this specification; rather, they are merely examples of devices and methods consistent with some aspects of this specification as detailed in the appended claims.
The terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit the specification. The singular forms "a", "the", and "said" used in this specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this specification to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
As shown in Fig. 1, Fig. 1 is a flowchart of an image processing method shown in at least one embodiment of the present disclosure, including the following steps:
In step 102, an image frame to be processed is acquired, and the image frame includes at least one human face.
The image frame to be processed may be a photo, a screenshot, or a frame in a video.
This embodiment does not limit the specific manner of acquiring image frames.
For example, a Vlog video input by a user may be received, and a tool library such as FFmpeg or OpenCV may be used to deframe the video to obtain multiple image frames each including at least one human face.
For another example, a photo taken by a camera and containing a human face may be received.
The processing in this embodiment may be occlusion processing, which is image processing that partially or completely hides the facial features of a person in an image frame, for example mosaic processing, sticker overlay, or Gaussian blur.
In step 104, the scene type of the image frame is acquired, and the scene type is determined according to the result of the preliminary face detection in the image frame.
In this embodiment, image frames are divided into different scene types, so that different image processing methods can be selected for image frames of different scene types. The scene type is determined according to the preliminary face detection result of the image frame, and the preliminary detection result may include a detection frame corresponding to each face. The preliminary detection result may be obtained by using a lightweight neural network to perform preliminary detection of the faces in the image frame.
For example, scene types may include single-person scenes and multi-person scenes. Whether the scene type of the image frame is a single-person scene or a multi-person scene may be determined according to the number of detection frames corresponding to the faces, that is, the number of faces.
For another example, scene types may include sparse scenes and dense scenes. Whether the scene type of the image frame is a sparse scene or a dense scene may be determined according to the distribution of the detection frames corresponding to the faces. Exemplarily, if the number of detection frames concentrated in one area exceeds a preset number, the scene type of the image frame is judged to be a dense scene; or, if the image frame contains many detection frames but the number concentrated in any one area does not exceed the preset number, the scene type is judged to be a sparse scene.
For another example, scene types may include distant view and close view. In general, because near objects appear larger and far objects smaller, the people in a distant-view image frame are far from the camera and a single face occupies a relatively small area, whereas the people in a close-view image frame are near the camera and a single face occupies a relatively large area. In this embodiment, therefore, distant view and close view are distinguished according to the size of the face detection frames.
Taking scene types including close view and distant view as an example, acquiring the scene type of the image frame may be the following processing:
performing preliminary face detection on the image frame to obtain a preliminary face detection result, the preliminary face detection result including at least one initial face detection frame;
in a case where the target size of a preset number of initial face detection frames among the at least one initial face detection frame is greater than a preset threshold, determining that the scene type of the image frame is a close view; or,
in a case where the target size of the preset number of initial face detection frames among the at least one initial face detection frame is smaller than or equal to the preset threshold, determining that the scene type of the image frame is a distant view.
In actual implementation, the preliminary face detection on the image frame may be performed by a pre-trained face detection model, referred to here as the first face detection model. In one example, the first face detection model may be a small neural network model, that is, a model with relatively few layers, so that it processes faster. The image frame is input into the first face detection model, which outputs the preliminary face detection result, including information such as the size and/or confidence score of each initial face detection frame; the confidence score indicates the probability that the image within an initial face detection frame is a face image, and the higher the confidence score, the more likely the image within that frame is a face image.
Exemplarily, a preset number of initial face detection frames may be selected from the initial face detection frames, and the target size of the selected frames judged. How the detection frames are selected may be set by those skilled in the art according to actual needs, so that the judgment of the scene type better fits the actual application scene.
Exemplarily, the initial face detection frames may be sorted by confidence score from high to low and the five with the highest scores selected (five is used here as an example; other numbers may also be chosen). The frames may also be selected according to their positions in the image frame, for example by taking three frames at equal intervals from the leftmost to the rightmost, or by taking frames at fixed positions in the image frame. They may also be selected according to size, for example by sorting the frames from large to small and selecting four of them at random or according to a certain rule.
After the initial face detection frames are selected, the target size of the selected frames is judged. The sizes of the selected frames may first be subjected to a calculation to obtain a calculation result. The purpose of the calculation is to estimate, from the sizes of the selected frames, how far the faces in the image frame are, so as to judge whether the frame belongs to a distant view or a close view. The calculation may be taking the average, taking the median, taking the root mean square, random selection, or similar methods.
In this example, taking the five initial face detection frames with the highest confidence scores, the calculation may be to average their sizes, the target size of the preset number of frames being represented by this average. If the average is greater than the preset threshold, the scene type of the image frame is determined to be a close view; otherwise it is determined to be a distant view.
In step 106, face detection is performed on the image frame according to the detection method that matches the scene type of the image frame.
Different scene types are matched with different detection methods. In this embodiment, detection methods matching the different scene types are preset, so that a suitable detection method can be chosen in a more targeted way when detecting faces.
For example, when the scene types include a single-person scene and a multi-person scene, the detection method matching the single-person scene may be: perform image segmentation on the image frame to obtain the face contour or head contour of that person. The detection method matching the multi-person scene may be: perform face detection on the image frame to obtain multiple rectangular detection frames.
For another example, when the scene types include a sparse scene and a dense scene, the detection method matching the sparse scene may be: perform face detection on the image frame to obtain multiple detection frames; the detection method matching the dense scene may be: perform face detection on the image frame to obtain a dense region containing multiple faces.
For another example, when the scene types include a close view and a distant view, this step may include the following processing.
In response to the scene type of the image frame being a distant view, head keypoint extraction is performed on the image frame to obtain the keypoint coordinates of the heads in the image frame, so that occlusion processing can subsequently be performed based on the detected head keypoint coordinates.
In response to the scene type of the image frame being a close view, face detection is performed on the image frame to obtain the detection frame of each face in the image frame and the corresponding face features, so that the face features can subsequently be compared accurately against those in a preset face library to confirm whether occlusion processing is needed.
Practice shows that face detection can locate the face regions in an image and output a series of rectangular detection frames, but it performs poorly in distant views, where the faces in the image are very small: computation is slow and recognition inaccurate. Head keypoint localization, by contrast, detects where each head appears in the image and outputs one keypoint at the center of the head; it performs well and localizes accurately when the head regions are small, as in distant views, but its localization is not accurate enough for close views, where the head regions are large.
In actual implementation, when the scene type of the image frame is a distant view, the matching detection method may be: input the image frame into a pre-trained head keypoint localization model, which performs head keypoint extraction and outputs the keypoint coordinates of each head. The keypoint coordinates can be expressed as two numbers; for example, with the coordinate origin at the lower-left corner of the image frame, a keypoint coordinate may be (18, 39), in units of pixels. The keypoint may be the center point of the head.
In actual implementation, when the scene type of the image frame is a close view, the matching detection method may be: input the image frame into a pre-trained face detection model, referred to here as the second face detection model. The second face detection model outputs a face detection result that includes the detection frame of each face in the image frame and the corresponding face features. Each detection frame is expressed by detection frame coordinates, which describe the position of a rectangular frame in the image frame; frames of other shapes are also possible. This embodiment takes rectangular frames as an example, whose coordinates can be expressed with four numbers. For instance, the four numbers may be the coordinates of the upper-left and lower-right corners of the rectangle; with the coordinate origin at the lower-left corner of the image frame, the detection frame coordinates may be (23, 75), (57, 46), also written as (23, 75, 57, 46). The four numbers may instead be the coordinates of the lower-left and upper-right corners. In other examples, the detection frame coordinates may use other representations, such as eight numbers; this embodiment places no limit on this. The coordinate unit may be pixels.
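A minimal sketch of this scene-dependent dispatch is shown below. Here `head_model` and `face_model` stand in for the pre-trained head keypoint localization model and the second face detection model; their names and `predict` interfaces are assumptions for illustration, not real APIs.

```python
def detect_faces(frame, scene_type, head_model, face_model):
    """Dispatch to the detection method matching the scene type."""
    if scene_type == "far":
        # distant view: one (x, y) keypoint per head, in pixels
        return {"keypoints": head_model.predict(frame)}
    # close view: (x1, y1, x2, y2) frames plus one feature vector per face
    boxes, features = face_model.predict(frame)
    return {"boxes": boxes, "features": features}
```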
In one example, the second face detection model may be a large neural network model, i.e., one with a relatively large number of layers, so that the face detection result obtained is more accurate. Note that the second face detection model is large only relative to the small first face detection model: the second model typically has more neural network layers than the first, and is correspondingly slower in computation. Exemplarily, the first face detection model may take 10 ms to process one image frame, while the second face detection model takes 100 ms to process an image frame of the same specification.
In other examples, the first and second face detection models need not be constrained as above; those skilled in the art may select the models needed according to actual requirements, and the first and second face detection models may even be the same neural network model. In one example, when the first and second face detection models are the same model, step 106 can be omitted for close-view image frames, and the initial face detection frames determined in step 104 are used directly in the processing of step 108.
In step 108, occlusion processing is performed on the detected faces according to a preset occlusion mode.
In this step, the preset occlusion mode may be a default occlusion mode or may be set by the user. For example, a selection instruction for the occlusion mode may be received, the selection instruction being used to determine the occlusion mode to use from at least one candidate occlusion mode. The candidate occlusion modes include at least one of the following: performing occlusion processing on all faces, performing occlusion processing on faces other than preset faces, or performing occlusion processing on preset faces. In one example, a preset face may be a face stored in advance in a face library.
In addition, during occlusion processing, faces obtained by different detection methods may also be handled in different ways.
For example, based on the keypoint coordinates of a head, the face in the image frame is occluded according to the preset occlusion mode. After the head keypoint coordinates are detected, the region to be occluded can be determined from those coordinates.
Exemplarily, a circle may be determined with the head keypoint coordinates as its center, and the region within the circle is occluded, for example, by applying a mosaic to it. When determining the radius of the circle, a radius length corresponding to the size of the image frame can be chosen; for example, when the image frame is 1080p, a radius in the range of 20 to 30 pixels is typically used, and it may also be set by those skilled in the art according to actual needs.
In other examples, a region of another shape to be occluded, such as a rectangle, a hexagon, or an irregular shape, may also be determined from the keypoint coordinates. In addition, when the occlusion processing covers the face region with a sticker, the keypoint coordinates can be used as the center of the sticker graphic when placing it.
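A sketch of the circular mosaic described above, using OpenCV, is given below. The default radius of 25 pixels follows the 20 to 30 pixel guideline for 1080p frames stated above; the mosaic cell size is an assumed parameter.

```python
import cv2
import numpy as np

def mosaic_circle(frame, center, radius=25, cell=10):
    """Mosaic a circular region centered on a head keypoint, in place."""
    x, y = int(center[0]), int(center[1])
    h, w = frame.shape[:2]
    x1, y1 = max(x - radius, 0), max(y - radius, 0)
    x2, y2 = min(x + radius, w), min(y + radius, h)
    patch = frame[y1:y2, x1:x2]
    # pixelate: shrink, then enlarge with nearest-neighbour interpolation
    small = cv2.resize(patch, (max(1, (x2 - x1) // cell), max(1, (y2 - y1) // cell)))
    mosaic = cv2.resize(small, (x2 - x1, y2 - y1), interpolation=cv2.INTER_NEAREST)
    mask = np.zeros((h, w), dtype=np.uint8)
    cv2.circle(mask, (x, y), radius, 255, -1)  # filled circle as the region
    inside = mask[y1:y2, x1:x2] > 0
    patch[inside] = mosaic[inside]  # patch is a view, so frame is modified
    return frame
```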
For example, based on the detection frame of a face, the face in the image frame is occluded according to the preset occlusion mode. After the face detection frame is obtained, the region to be occluded can be determined from the detection frame.
In one example, the region within the detection frame may be occluded directly, or the detection frame may be scaled to determine the region to occlude, or the detection frame may be deformed by other means to determine that region. In addition, when the occlusion processing covers the face region with a sticker, the correspondence between the coverage area of the sticker graphic and the detection frame can be preset, and the sticker is placed according to that correspondence.
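Correspondingly, a detection-frame-based variant might look like the sketch below. The enlargement factor of 1.2 is an assumption illustrating the scaling mentioned above, not a value from the embodiment.

```python
import cv2

def mosaic_box(frame, box, scale=1.2, cell=10):
    """Mosaic the region of a (possibly enlarged) face detection frame."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w, half_h = (x2 - x1) * scale / 2.0, (y2 - y1) * scale / 2.0
    h, w = frame.shape[:2]
    x1, x2 = int(max(cx - half_w, 0)), int(min(cx + half_w, w))
    y1, y2 = int(max(cy - half_h, 0)), int(min(cy + half_h, h))
    patch = frame[y1:y2, x1:x2]
    # pixelate the enlarged box: shrink, then enlarge without smoothing
    small = cv2.resize(patch, (max(1, (x2 - x1) // cell), max(1, (y2 - y1) // cell)))
    frame[y1:y2, x1:x2] = cv2.resize(small, (x2 - x1, y2 - y1),
                                     interpolation=cv2.INTER_NEAREST)
    return frame
```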
For another example, when the face contour or head contour of a single person is obtained by detection, the region within the contour is occluded; when a dense region containing multiple faces is obtained, the entire dense region is occluded.
With the image processing method provided by the embodiments of the present disclosure, a processing method matched to the scene type is adaptively selected for image frames of different scene types, so that when occluding the faces in an image frame, the processing that is more effective for that frame can be applied to frames of different scenes. Faces can therefore be occluded more accurately, the efficiency of image processing is improved, and the time cost of manual operation is reduced.
In some scenarios, the faces appearing in an image frame often need to be distinguished before occlusion. For example, in television post-production, all faces except certain specific faces must be occluded; this is especially true for street interviews, where everyone on camera other than the reporter needs face occlusion.
A solution proposed in this embodiment is described below.
In one implementation, building on the above embodiments, a face library may be configured in advance, containing the features of multiple face images collected beforehand. When the scene type of the image frame is a close view, face detection on the image frame also yields the face features corresponding to each detection frame.
In the above embodiments, occluding the face in the image frame according to the preset occlusion mode based on the face detection frame may be the following processing.
For each face, the face features of that face are matched against the preset face library. When the preset occlusion mode is to occlude faces that match a reference face in the face library, occlusion processing is performed on the face in the image frame in response to its face features matching a reference face in the library; or, when the preset occlusion mode is to occlude faces other than the reference faces in the face library, occlusion processing is performed on the face in the image frame in response to its face features not matching any reference face in the library.
In actual implementation, the face image within the detection frame can be input into a pre-trained face recognition model, which extracts the face features of the face image; the extracted features are matched against the reference face images in the face library to judge whether a matching result exists for those features in the library. For example, if the highest similarity between the features of a reference face image in the library and the extracted face features reaches a similarity threshold, the face features can be considered to have a matching result in the library: the matching result is the face image whose reference features have the highest similarity to the extracted features, and the face is determined to match the face library. Otherwise, the face features can be considered to have no matching result in the library, and the face is determined not to match the face library.
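A sketch of such library matching is given below. The cosine similarity measure, the 0.6 threshold, and the dictionary layout of the library are assumptions, since the embodiment fixes neither the similarity function nor the threshold value.

```python
import numpy as np

def match_face(feature, library, threshold=0.6):
    """Return (person_id, similarity) of the best library match, or
    (None, similarity) when no reference face reaches the threshold.

    library -- dict mapping person id -> reference feature vector
    """
    feature = np.asarray(feature, dtype=float)
    feature = feature / np.linalg.norm(feature)
    best_id, best_sim = None, -1.0
    for person_id, ref in library.items():
        ref = np.asarray(ref, dtype=float)
        sim = float(feature @ (ref / np.linalg.norm(ref)))  # cosine similarity
        if sim > best_sim:
            best_id, best_sim = person_id, sim
    return (best_id, best_sim) if best_sim >= threshold else (None, best_sim)
```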
When the preset occlusion mode is to occlude the face images in the face library, occlusion processing is performed on the faces in the image frame for which a matching result exists.
In other examples, the preset occlusion mode may instead be set to occlude face images outside the face library, in which case occlusion processing is performed on the faces in the image frame for which no matching result exists. For example, when the face library contains only the reporter's face image, everyone on camera in the image frame other than the reporter needs occlusion.
In one example, when configuring the face library, each face image may also be given an attribute indicating whether occlusion is required. After determining that a face's features have a matching result in the library, the attribute is used to further verify whether that face needs occlusion. For example, in response to the face features having a matching result in the library whose attribute indicates that occlusion is required, the face in the image frame is occluded; if the attribute of the matching result indicates that occlusion is not required, the face in the image frame is not occluded. Other faces in the image frame for which no matching result exists may or may not be occluded according to the preset occlusion mode.
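Putting the mode and the per-face attribute together, the decision for a single close-view face might be sketched as follows. The mode names and the rule that the attribute takes precedence for matched faces follow the example above, but they are assumptions, not a fixed specification; `match_face` is the sketch from earlier.

```python
def should_occlude(feature, library, attrs, mode):
    """Decide whether one detected face needs occlusion.

    attrs -- dict mapping person id -> bool (True = occlusion required)
    mode  -- "all", "in_library", or "outside_library" (assumed names)
    """
    person_id, _ = match_face(feature, library)
    if mode == "all":
        return True
    if person_id is not None:
        # matched a library face: its attribute takes precedence
        return attrs.get(person_id, mode == "in_library")
    # unmatched face: occluded only under the "outside_library" mode
    return mode == "outside_library"
```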
It should be noted that, since the face regions in a distant view are relatively small, face recognition on them is costly, requiring more complex algorithms and more time; in this example, therefore, face recognition is performed only on close-view image frames to speed up processing. In other examples, where required, face recognition may also be performed on the faces in distant views to further judge whether occlusion is needed. In addition, for scene types other than close view and distant view, the method of this embodiment may likewise be used to match face features against the preset face library to judge whether occlusion processing is needed.
The image processing method provided by the embodiments of the present disclosure can not only adaptively select a processing method matched to the scene type for image frames of different scene types, but can also automatically perform face detection on the faces in an image frame, extract face features, and match the extracted features against the face library so as to occlude faces selectively. Faces can thus be occluded more accurately and more flexibly, the efficiency of image processing is improved, and the time cost of manual operation is reduced.
Fig. 2 is a flowchart of another image processing method shown in at least one embodiment of the present disclosure. The method can occlude the faces in a video according to a selected occlusion mode and may include the following steps, where steps identical to the flow of Fig. 1 will not be described again in detail.
In step 202, a selection instruction for the occlusion mode is received, where the selection instruction is used to determine the occlusion mode to use from at least one candidate occlusion mode.
The candidate occlusion modes can be set by those skilled in the art according to actual needs. Exemplarily, the candidate occlusion modes include at least one of the following: occluding all faces, occluding faces outside the face library, or occluding the faces in the face library. A candidate occlusion mode may also be: occluding the faces in the face library according to their attributes while also occluding faces outside the face library.
For example, a face library may be configured in advance that includes the reference features of the face images of people who are allowed to show their faces. In this case, the selected occlusion mode may be to occlude faces outside the face library.
For another example, a face library may be configured in advance that includes the reference features of the face images of people who are prohibited from showing their faces. In this case, the selected occlusion mode may be to occlude the faces in the face library.
In addition, besides the reference features of the face images, each face image in the preconfigured face library may also carry an attribute indicating whether occlusion is required, and this attribute can be changed as needed. For example, if a person's face needs to be occluded in one video, that person's face image can be set with the attribute requiring occlusion; if the same person's face does not need occlusion in another video, the face image can be set with the attribute not requiring occlusion. The occlusion mode can thus be flexibly selected and the attributes of the face images in the library flexibly configured to meet a variety of practical needs.
Besides receiving a selection instruction to determine the occlusion mode, a default occlusion mode may also be set. The same occlusion mode may be applied to different scene types, or different occlusion modes may be applied.
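One way to represent the candidate occlusion modes and the attribute-carrying configuration in code is sketched below; the enum members, the default mode, and the dataclass layout are purely illustrative assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum

class OcclusionMode(Enum):
    ALL = "all"                          # occlude every detected face
    IN_LIBRARY = "in_library"            # occlude faces matching the library
    OUTSIDE_LIBRARY = "outside_library"  # occlude faces not in the library

@dataclass
class OcclusionConfig:
    mode: OcclusionMode = OcclusionMode.OUTSIDE_LIBRARY  # assumed default
    # per-person override: person id -> occlusion required?
    attrs: dict = field(default_factory=dict)
```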
In step 204, the video is deframed to obtain at least one image frame to be processed.
For example, a video to be processed uploaded by a user may be received and disassembled into multiple image frames to facilitate subsequent frame-by-frame processing.
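For step 204, deframing can be done with OpenCV as sketched below; the frame rate is kept so that the target video can later be re-synthesized at the original speed in step 210.

```python
import cv2

def deframe(video_path):
    """Split a video into a list of frames (BGR arrays) plus its fps."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video
            break
        frames.append(frame)
    cap.release()
    return frames, fps
```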
In step 206, the scene type of the image frame is acquired.
In this embodiment, taking close view and distant view as the scene types, it can be judged whether the image frame is a distant view or a close view.
In step 208, the faces in the image frame are detected according to the detection method that matches the scene type of the image frame, and the detected faces are occluded according to the preset occlusion mode.
For example, the occlusion processing may be mosaic processing, and the occlusion mode may be: occlude the faces in the face library according to their attributes, and also occlude faces outside the face library. In this case, after the scene type of the image frame is determined: in response to the scene type being a distant view, head keypoint extraction is performed on the image frame to obtain the keypoint coordinates of the heads in the image frame, the region requiring occlusion is determined from the keypoint coordinates, and the mosaic algorithm is applied to that region; in response to the scene type being a close view, face detection is performed on the faces in the image frame to obtain their detection frames, the face features of the face images within the detection frames are extracted and matched against the face library, and, in response to a face's features having a matching result in the library whose attribute indicates that occlusion is required, the mosaic algorithm is applied within the detection frame where that face is located.
In step 210, the at least one image frame that has undergone occlusion processing is synthesized into a target video.
For example, the multiple image frames processed by the above steps can be combined according to the time order of the original video to be processed, and the synthesized target video is output. The processed image frames can also be checked manually, individual frames whose occlusion failed can be reprocessed, and the video is synthesized after verification.
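The re-synthesis of step 210 can then write the processed frames back in their original order; the mp4v codec and the output path below are illustrative assumptions.

```python
import cv2

def synthesize(frames, fps, out_path="target.mp4"):
    """Write occlusion-processed frames back out as the target video."""
    height, width = frames[0].shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (width, height))
    for frame in frames:  # frames are already in source order
        writer.write(frame)
    writer.release()
```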
For an intuitive understanding of the processing in the above steps, see Fig. 2A, which shows the processing logic flow of the image processing method in this example.
The image processing method provided by the embodiments of the present disclosure can automatically and adaptively select, for image frames of different scene types in a video, a detection method matched to the scene type, so that when detecting the faces in an image frame, the detection method more effective for that frame is used for frames of different scenes. An occlusion mode can also be set, and occlusion processing is performed or skipped by comparison against the face library, so that faces can be occluded more accurately, greatly improving the efficiency of video post-production and reducing manpower input.
As shown in Fig. 3, Fig. 3 is a block diagram of an image processing device shown in at least one embodiment of the present disclosure. The device includes: an image frame acquisition module 31, a scene type acquisition module 32, a face detection module 33, and an occlusion processing module 34.
The image frame acquisition module 31 is configured to acquire an image frame to be processed.
The scene type acquisition module 32 is configured to acquire the scene type of the image frame, the scene type being determined according to the preliminary face detection result in the image frame.
The face detection module 33 is configured to perform face detection on the image frame according to a detection method that matches the scene type of the image frame.
The occlusion processing module 34 is configured to perform occlusion processing on the detected faces according to a preset occlusion mode.
In one example, the face detection module 33 is specifically configured to: in response to the scene type of the image frame being a distant view, perform head keypoint extraction on the image frame to obtain the keypoint coordinates of the heads in the image frame.
The occlusion processing module 34 is specifically configured to: based on the keypoint coordinates of the head, perform occlusion processing on the detected face according to the preset occlusion mode.
In one example, the face detection module 33 is specifically configured to: in response to the scene type of the image frame being a close view, perform face detection on the image frame to obtain the detection frames of the faces detected in the image frame.
The occlusion processing module 34 is specifically configured to: based on the detection frame, perform occlusion processing on the face in the image frame according to the preset occlusion mode.
In one example, the face detection performed on the image frame obtains the detection frame of each face detected in the image frame and the face features corresponding to that detection frame. When performing occlusion processing on the face in the image frame according to the preset occlusion mode based on the face detection frame, the occlusion processing module 34 is specifically configured to: match the face features of the face against a preset face library; when the preset occlusion mode is to occlude faces that match a reference face in the face library, perform occlusion processing on the face in the image frame in response to its face features matching a reference face in the library; or, when the preset occlusion mode is to occlude faces other than the reference faces in the face library, perform occlusion processing on the face in the image frame in response to its face features not matching any reference face in the library.
In one example, when matching the face features of the face against the preset face library, the occlusion processing module 34 is specifically configured to: in response to determining that the size of the face detection frame exceeds a preset value, extract the face features of the face image within the detection frame of that face, and match those features against the preset face library.
In one example, the scene types include a close view and a distant view. The scene type acquisition module 32 is specifically configured to: perform a preliminary face detection on the image frame to obtain a preliminary face detection result, the result including at least one initial face detection frame; when the target size of a preset number of initial face detection frames among the at least one initial face detection frame exceeds a preset threshold, determine that the scene type of the image frame is a close view; or, when the target size of the preset number of initial face detection frames does not exceed the preset threshold, determine that the scene type of the image frame is a distant view.
In one example, the image frame acquisition module 31 is specifically configured to: deframe a video to obtain at least one image frame to be processed. The image frame acquisition module 31 is further configured to: synthesize the at least one occlusion-processed image frame into a target video.
As shown in Fig. 4, on the basis of the foregoing device embodiment, the device further includes an occlusion mode selection module 30, configured to receive a selection instruction for the occlusion mode, the selection instruction being used to determine the occlusion mode to use from at least one candidate occlusion mode; the candidate occlusion modes include at least one of the following: performing occlusion processing on all faces, performing occlusion processing on faces other than preset faces, and performing occlusion processing on preset faces.
For the implementation of the functions and roles of each module in the above device, see the implementation of the corresponding steps in the above method; details are not repeated here.
An embodiment of the present disclosure further provides an electronic device. As shown in Fig. 5, the electronic device includes a memory 51 and a processor 52; the memory 51 is configured to store computer instructions executable on the processor, and the processor 52 is configured to implement the image processing method of any embodiment of the present disclosure when executing the computer instructions.
An embodiment of the present disclosure further provides a computer program product, including a computer program/instructions which, when executed by a processor, implement the image processing method of any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the image processing method of any embodiment of the present disclosure is implemented. The storage medium may be a volatile or non-volatile computer-readable storage medium.
As the device embodiments basically correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant parts. The device embodiments described above are merely illustrative; the modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, i.e., they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this specification. Those of ordinary skill in the art can understand and implement it without creative effort.
The foregoing describes specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. Multitasking and parallel processing are also possible, or may be advantageous, in certain implementations.
Other embodiments of this specification will readily occur to those skilled in the art from consideration of the specification and practice of the invention applied for here. This specification is intended to cover any variations, uses, or adaptations of this specification that follow its general principles and include common knowledge or customary technical means in the technical field not claimed by this specification. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of this specification being indicated by the following claims.
It should be understood that this specification is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of this specification is limited only by the appended claims.
The above are merely preferred embodiments of this specification and are not intended to limit it; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this specification shall fall within its scope of protection.

Claims (13)

  1. An image processing method, characterized in that the method comprises:
    acquiring an image frame to be processed;
    acquiring a scene type of the image frame, the scene type being determined according to a preliminary face detection result of the image frame;
    performing face detection on the image frame according to a detection method that matches the scene type of the image frame;
    performing occlusion processing on the detected face according to a preset occlusion mode.
  2. The method according to claim 1, characterized in that performing face detection on the image frame according to a detection method that matches the scene type of the image frame comprises:
    in response to the scene type of the image frame being a distant view, performing head keypoint extraction on the image frame to obtain keypoint coordinates of a head in the image frame;
    performing occlusion processing on the detected face according to a preset occlusion mode comprises:
    based on the keypoint coordinates of the head, performing occlusion processing on the detected face according to the preset occlusion mode.
  3. The method according to claim 1, characterized in that performing face detection on the image frame according to a detection method that matches the scene type of the image frame comprises:
    in response to the scene type of the image frame being a close view, performing face detection on the image frame to obtain a detection frame of a face detected in the image frame;
    performing occlusion processing on the detected face according to a preset occlusion mode comprises:
    based on the detection frame of the face, performing occlusion processing on the face in the image frame according to the preset occlusion mode.
  4. The method according to claim 3, characterized in that, in response to the scene type of the image frame being a close view, performing face detection on the image frame to obtain the detection frame of the face detected in the image frame comprises:
    in response to the scene type of the image frame being a close view, performing face detection on the image frame to obtain the detection frame of the face detected in the image frame and face features corresponding to the detection frame of the face;
    based on the detection frame of the face, performing occlusion processing on the face in the image frame according to the preset occlusion mode comprises:
    matching the face features of the face against a preset face library;
    when the preset occlusion mode is to perform occlusion processing on a face matching a reference face in the face library, performing occlusion processing on the face in the image frame in response to the face features of the face matching a reference face in the face library; or
    when the preset occlusion mode is to perform occlusion processing on faces other than the reference faces in the face library, performing occlusion processing on the face in the image frame in response to the face features of the face not matching a reference face in the face library.
  5. The method according to claim 4, characterized in that matching the face features of the face against the preset face library comprises:
    determining whether the size of the detection frame of the face is greater than a preset value;
    in response to determining that the size of the detection frame of the face exceeds the preset value, extracting face features of the face image within the detection frame of the face, and matching the face features against the preset face library.
  6. The method according to any one of claims 1 to 5, characterized in that
    the scene types comprise a close view and a distant view;
    acquiring the scene type of the image frame comprises:
    performing a preliminary face detection on the image frame to obtain a preliminary face detection result, the preliminary face detection result comprising at least one initial face detection frame;
    when a target size of a preset number of initial face detection frames among the at least one initial face detection frame exceeds a preset threshold, determining that the scene type of the image frame is a close view; or,
    when the target size of the preset number of initial face detection frames among the at least one initial face detection frame does not exceed the preset threshold, determining that the scene type of the image frame is a distant view.
  7. The method according to claim 6, characterized in that the method further comprises:
    sorting the at least one initial face detection frame by confidence score in descending order;
    determining the N initial face detection frames with the highest confidence scores as the preset number of initial face detection frames, where N equals the preset number.
  8. The method according to any one of claims 1 to 7, characterized in that
    acquiring the image frame to be processed comprises:
    deframing a video to obtain at least one image frame to be processed;
    the image processing method further comprises:
    synthesizing at least one image frame that has undergone occlusion processing into a target video.
  9. The method according to any one of claims 1 to 8, characterized in that the method further comprises:
    receiving a selection instruction for an occlusion mode, the selection instruction being used to determine the occlusion mode to use from at least one candidate occlusion mode;
    the candidate occlusion mode comprising at least one of the following:
    performing occlusion processing on all faces, performing occlusion processing on faces other than preset faces, and performing occlusion processing on preset faces.
  10. An image processing device, characterized in that the device comprises:
    an image frame acquisition module, configured to acquire an image frame to be processed;
    a scene type acquisition module, configured to acquire a scene type of the image frame, the scene type being determined according to a preliminary face detection result in the image frame;
    a face detection module, configured to perform face detection on the image frame according to a detection method that matches the scene type of the image frame;
    an occlusion processing module, configured to perform occlusion processing on the detected face according to a preset occlusion mode.
  11. An electronic device, characterized in that the device comprises a memory and a processor, the memory being configured to store computer instructions executable on the processor, and the processor being configured to implement the method according to any one of claims 1 to 9 when executing the computer instructions.
  12. A computer program product, the product comprising a computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the method according to any one of claims 1 to 9.
  13. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method according to any one of claims 1 to 9 is implemented.
PCT/CN2022/070905 2021-09-22 2022-01-10 Image processing WO2023045183A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111108402.X 2021-09-22
CN202111108402.XA CN113837065A (en) 2021-09-22 2021-09-22 Image processing method and device

Publications (1)

Publication Number Publication Date
WO2023045183A1 true WO2023045183A1 (en) 2023-03-30

Family

ID=78960319

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/070905 WO2023045183A1 (en) 2021-09-22 2022-01-10 Image processing

Country Status (3)

Country Link
CN (1) CN113837065A (en)
TW (1) TW202314634A (en)
WO (1) WO2023045183A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113837065A (en) * 2021-09-22 2021-12-24 上海商汤智能科技有限公司 Image processing method and device
CN114333030A (en) * 2021-12-31 2022-04-12 科大讯飞股份有限公司 Image processing method, device, equipment and storage medium
CN114445711B (en) * 2022-01-29 2023-04-07 北京百度网讯科技有限公司 Image detection method, image detection device, electronic equipment and storage medium
CN115240265B (en) * 2022-09-23 2023-01-10 深圳市欧瑞博科技股份有限公司 User intelligent identification method, electronic equipment and storage medium


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150310274A1 (en) * 2014-04-25 2015-10-29 Xerox Corporation Method and system for automatically locating static occlusions
CN110119711A (en) * 2019-05-14 2019-08-13 北京奇艺世纪科技有限公司 A kind of method, apparatus and electronic equipment obtaining video data personage segment
CN111416950A (en) * 2020-03-26 2020-07-14 腾讯科技(深圳)有限公司 Video processing method and device, storage medium and electronic equipment
CN112016464A (en) * 2020-08-28 2020-12-01 中移(杭州)信息技术有限公司 Method and device for detecting face shielding, electronic equipment and storage medium
CN113837065A (en) * 2021-09-22 2021-12-24 上海商汤智能科技有限公司 Image processing method and device

Also Published As

Publication number Publication date
CN113837065A (en) 2021-12-24
TW202314634A (en) 2023-04-01
