WO2023045183A1 - Image processing - Google Patents

Image processing

Info

Publication number
WO2023045183A1
WO2023045183A1 · PCT/CN2022/070905
Authority
WO
WIPO (PCT)
Prior art keywords
face
image frame
detection
frame
preset
Prior art date
Application number
PCT/CN2022/070905
Other languages
French (fr)
Chinese (zh)
Inventor
田茂清
刘建博
伊帅
Original Assignee
上海商汤智能科技有限公司
Priority date
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023045183A1


Definitions

  • Embodiments of the present disclosure relate to the technical field of image processing, and in particular, to an image processing method and device.
  • In view of this, the embodiments of the present disclosure provide at least one image processing method and device.
  • In a first aspect, an image processing method is provided, comprising: acquiring an image frame to be processed; acquiring a scene type of the image frame, the scene type being determined according to a preliminary face detection result of the image frame; performing face detection on the image frame in a detection mode that matches the scene type of the image frame; and performing occlusion processing on the detected face according to a preset occlusion mode.
  • In a second aspect, an image processing device is provided, comprising: an image frame acquisition module configured to acquire an image frame to be processed; a scene type acquisition module configured to acquire the scene type of the image frame, the scene type being determined according to a preliminary face detection result of the image frame; a face detection module configured to perform face detection on the image frame in a detection mode that matches the scene type of the image frame; and an occlusion processing module configured to perform occlusion processing on the detected face according to a preset occlusion mode.
  • In a third aspect, an electronic device is provided, including a memory and a processor, the memory being used to store computer instructions executable on the processor, and the processor being used to implement the image processing method described in any embodiment of the present disclosure when executing the computer instructions.
  • In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the image processing method described in any embodiment of the present disclosure is implemented.
  • In a fifth aspect, a computer program product is provided, including a computer program/instruction; when the computer program/instruction is executed by a processor, the image processing method described in any embodiment of the present disclosure is implemented.
  • The image processing method provided by the embodiments of the present disclosure adaptively selects a processing method matching the scene type for image frames of different scene types, so that when occluding the faces in an image frame, a processing method that is more effective for that frame's scene is used. Faces can therefore be occluded more accurately, the efficiency of image processing is improved, and the time cost of manual operation is reduced.
  • Fig. 1 is a flowchart of an image processing method shown in at least one embodiment of the present disclosure
  • Fig. 2 is a flowchart of another image processing method shown in at least one embodiment of the present disclosure
  • Fig. 2A is a processing logic flowchart of an image processing method shown in at least one embodiment of the present disclosure
  • Fig. 3 is a block diagram of an image processing device shown in at least one embodiment of the present disclosure
  • Fig. 4 is a block diagram of another image processing device shown in at least one embodiment of the present disclosure.
  • Fig. 5 is a schematic diagram of a hardware structure of an electronic device according to at least one embodiment of the present disclosure.
  • Although the terms first, second, third, etc. may be used in this specification to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
  • Figure 1 is a flowchart of an image processing method shown in at least one embodiment of the present disclosure, including the following steps:
  • In step 102, an image frame to be processed is acquired, and the image frame includes at least one human face.
  • the image frame to be processed can be a photo, a screenshot, or a frame in a video.
  • This embodiment does not limit the specific manner of acquiring image frames.
  • For example, a Vlog video input by a user may be received, and a tool library such as FFmpeg or OpenCV may be used to deframe the video to obtain multiple image frames each including at least one human face.
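  • As an illustrative sketch (not part of the disclosure), the deframing step might look as follows in Python with OpenCV; the input file name is an assumption.

```python
# Minimal deframing sketch using OpenCV; "vlog.mp4" is an assumed input.
import cv2

def deframe(video_path):
    """Split a video into a list of BGR image frames, in time order."""
    cap = cv2.VideoCapture(video_path)
    frames = []
    while True:
        ok, frame = cap.read()  # ok is False once the stream is exhausted
        if not ok:
            break
        frames.append(frame)
    cap.release()
    return frames

frames = deframe("vlog.mp4")
```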
  • a photo taken by a camera and containing a human face may be received.
  • The processing in this embodiment may be occlusion processing, which is image processing that partially or completely hides the facial features of a person in an image frame, for example mosaic processing, sticker overlay, or Gaussian blur; two of these methods are sketched below.
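  • The following sketches show mosaic and Gaussian-blur occlusion of a rectangular region; the box format (x1, y1, x2, y2), block size, and kernel size are illustrative assumptions, not values from the disclosure.

```python
# Hedged sketches of two occlusion methods applied to a rectangular region.
import cv2

def mosaic(frame, box, block=12):
    """Pixelate the region box=(x1, y1, x2, y2) of a BGR frame in place."""
    x1, y1, x2, y2 = box
    roi = frame[y1:y2, x1:x2]
    h, w = roi.shape[:2]
    # Downscale, then upscale with nearest-neighbour interpolation to pixelate.
    small = cv2.resize(roi, (max(1, w // block), max(1, h // block)))
    frame[y1:y2, x1:x2] = cv2.resize(small, (w, h), interpolation=cv2.INTER_NEAREST)
    return frame

def gaussian_blur(frame, box, ksize=31):
    """Blur the region box=(x1, y1, x2, y2); ksize must be odd."""
    x1, y1, x2, y2 = box
    frame[y1:y2, x1:x2] = cv2.GaussianBlur(frame[y1:y2, x1:x2], (ksize, ksize), 0)
    return frame
```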
  • In step 104, the scene type of the image frame is acquired, and the scene type is determined according to the result of the preliminary face detection in the image frame.
  • In this embodiment, image frames are divided into different scene types, so that different image processing methods can be selected for image frames of different scene types.
  • The scene type is determined according to the preliminary detection result of the faces in the image frame, and the preliminary detection result may include a detection frame corresponding to each face.
  • The preliminary face detection result may be obtained by using a lightweight neural network to perform preliminary detection of the faces in the image frame.
  • For example, scene types may include single-person scenes and multi-person scenes.
  • According to the number of detection frames corresponding to the faces, that is, the number of faces, it may be determined whether the scene type of the image frame is a single-person scene or a multi-person scene.
  • For another example, scene types may include sparse scenes and dense scenes.
  • Whether the scene type of the image frame is a sparse scene or a dense scene may be determined according to the distribution of the detection frames corresponding to the faces. Exemplarily, if the number of detection frames concentrated in one area exceeds a preset number, the scene type of the image frame is judged to be a dense scene; or, if the image frame contains many detection frames but the number concentrated in any one area does not exceed the preset number, the scene type is judged to be a sparse scene.
  • For another example, scene types may include distant view and close view.
  • In general, because near objects appear larger and far objects smaller, the people in a distant-view image frame are far from the camera and a single face occupies a relatively small area, whereas the people in a close-view image frame are near the camera and a single face occupies a relatively large area.
  • In this embodiment, therefore, distant view and close view are distinguished according to the size of the face detection frames.
  • In this step, obtaining the scene type of the image frame may involve performing preliminary face detection on the faces in the acquired image frame to obtain a preliminary detection result and analyzing that result to obtain the scene type; alternatively, when the image frame to be processed is acquired, it may carry a scene-type label, and the scene type may be determined directly from that label.
  • Taking scene types including close view and distant view as an example, obtaining the scene type of the image frame may be the following processing:
  • performing preliminary face detection on the image frame to obtain a preliminary face detection result, the preliminary face detection result including at least one initial face detection frame;
  • in a case where the target size of a preset number of initial face detection frames among the at least one initial face detection frame is greater than a preset threshold, determining that the scene type of the image frame is a close view; or,
  • in a case where the target size of the preset number of initial face detection frames among the at least one initial face detection frame is smaller than or equal to the preset threshold, determining that the scene type of the image frame is a distant view.
  • In actual implementation, the preliminary face detection on the image frame can be performed by a pre-trained face detection model, referred to here as the first face detection model.
  • In one example, the first face detection model may be a small neural network model, that is, a model with relatively few layers, so that it processes faster.
  • The image frame is input into the first face detection model, which outputs the preliminary face detection result, including information such as the size and/or confidence score of each initial face detection frame among the at least one initial face detection frame; the confidence score indicates the probability that the image within an initial face detection frame is a face image, and the higher the confidence score, the more likely the image within that frame is a face image.
  • Exemplarily, a preset number of initial face detection frames may be selected from the initial face detection frames, and the target size of the selected frames determined.
  • How the detection frames are selected may be set by those skilled in the art according to actual needs, so that the judgment of the scene type of the image frame better fits the actual application scene.
  • Exemplarily, the initial face detection frames may be sorted by confidence score from high to low, and the five with the highest confidence scores selected; five is used here as an example, and other numbers may also be chosen.
  • The initial face detection frames may also be selected according to their positions in the image frame, for example by taking three frames at equal intervals from the leftmost to the rightmost, or by taking frames at fixed positions in the image frame.
  • They may also be selected according to size, for example by sorting the initial face detection frames from large to small and selecting four of them at random or according to a certain rule.
  • After the initial face detection frames are selected, the target size of the selected frames is judged.
  • the size of the selected initial face detection frame may be calculated first to obtain a calculation result.
  • the purpose of the calculation process is to obtain the distance of the face in the image frame according to the size of the selected initial face detection frame, so as to judge whether the image frame belongs to the distant view or the near view.
  • the calculation process can be methods such as taking the average, taking the median, taking the root mean square, or randomly selecting.
  • In this example, taking the five initial face detection frames with the highest confidence scores, the calculation may be to average the sizes of those five frames, the target size of the preset number of frames being represented by the average. If the average is greater than the preset threshold, the scene type of the image frame is determined to be a close view; otherwise it is determined to be a distant view. A sketch of this decision follows.
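  • A minimal sketch of the decision rule, under the assumptions that boxes are (x1, y1, x2, y2) tuples, that box area is the size measure, and that the threshold value is arbitrary:

```python
# Scene-type decision: average the sizes of the k highest-confidence boxes
# and compare against a preset threshold. k and the threshold are assumptions.
def scene_type(boxes, scores, k=5, threshold=40 * 40):
    """boxes: list of (x1, y1, x2, y2); scores: one confidence per box."""
    ranked = sorted(zip(scores, boxes), key=lambda sb: sb[0], reverse=True)[:k]
    sizes = [(x2 - x1) * (y2 - y1) for _, (x1, y1, x2, y2) in ranked]
    avg = sum(sizes) / len(sizes)  # assumes at least one detection box
    return "close" if avg > threshold else "distant"
```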
  • In step 106, face detection is performed on the image frame according to a detection method that matches the scene type of the image frame.
  • Different scene types match different detection methods; in this embodiment, detection methods matching the different scene types are preset, so that a suitable detection method can be chosen in a targeted way when detecting faces.
  • For example, when the scene types include single-person and multi-person scenes, the detection method matching the single-person scene may be: performing image segmentation on the image frame to obtain the face contour or head contour of the single person;
  • the detection method matching the multi-person scene may be: performing face detection on the image frame to obtain multiple rectangular detection frames.
  • For another example, when the scene types include sparse and dense scenes, the detection method matching the sparse scene may be: performing face detection on the image frame to obtain multiple detection frames; the detection method matching the dense scene may be: performing face detection on the image frame to obtain a dense area containing multiple faces.
  • For another example, when the scene types include close view and distant view, this step may include the following processing:
  • In response to the scene type of the image frame being a distant view, head key point extraction is performed on the image frame to obtain the key point coordinates of the heads in the image frame, so that subsequent occlusion processing can be based on the detected head key point coordinates.
  • In response to the scene type of the image frame being a close view, face detection is performed on the image frame to obtain the detection frames of the faces in the image frame and the corresponding face features, so that the face features can subsequently be compared accurately against the face features in a preset face library to confirm whether occlusion processing is required.
  • Practice shows that face detection technology can detect the face regions in an image and output a series of rectangular detection frames, but it performs poorly when the faces presented in the image are very small, as in distant views: computation is relatively slow and recognition inaccurate.
  • Head point localization technology can detect where heads appear in an image and output one key point at the center of each head; it performs better and localizes more accurately when the head regions in the image are small, as in distant views, but its localization is not accurate enough for scenes with large head regions, such as close views.
  • In actual implementation, when the scene type of the image frame is a distant view, the matching detection method may be: input the image frame into a pre-trained head point localization model, which performs head key point extraction and outputs the key point coordinates of each head.
  • A key point coordinate may be represented by two numbers; for example, with the coordinate origin at the lower-left corner of the image frame, a key point coordinate may be (18, 39), in units of pixels, where the key point may be the center point of a head.
  • the matching detection method can be: input the image frame into the pre-trained face detection model, and the face detection model here is called the second face detection model.
  • the image frame is input into the second face detection model, and the face detection result is output, and the face detection result includes a detection frame of each face in the image frame and corresponding face features.
  • Each detection frame is represented by detection frame coordinates describing the position of a rectangular frame in the image frame; frames of other shapes may also be used.
  • a rectangular frame is taken as an example for illustration, and the coordinates of the detection frame can be represented by four numbers.
  • the four numbers can be the coordinates of the upper left corner and the lower right corner of the rectangular frame.
  • the coordinates of the detection frame can be (23, 75), (57, 46), or written as (23, 75, 57, 46).
  • the four numbers can also be the coordinates of the lower left and upper right corners of the rectangle.
  • the coordinates of the detection frame may also use other representations, such as eight numbers, which is not limited in this embodiment.
  • the unit of coordinates may be pixels.
  • the second face detection model may be a large-scale neural network model, that is, the neural network model has a relatively large number of layers, so that the obtained face detection result is more accurate.
  • Relative to the second face detection model, the first face detection model is smaller in scale; that is, the first face detection model usually has fewer neural network layers than the second, and the computation of the second face detection model is correspondingly slower than that of the first.
  • For example, the first face detection model takes 10 ms to process an image frame, while the second face detection model takes 100 ms to process an image frame of the same specification.
  • In other examples, the first face detection model and the second face detection model need not be subject to the above-mentioned restrictions; those skilled in the art can select the required models according to actual needs, and the first and second face detection models can even be the same neural network model.
  • When the first face detection model and the second face detection model are the same model, step 106 can be omitted for close-view image frames, and the initial face detection frames determined in step 104 used directly for the processing of step 108.
  • In step 108, occlusion processing is performed on the detected faces according to a preset occlusion mode.
  • the preset occlusion mode may be a default occlusion mode, or may be set by a user.
  • a selection instruction for an occlusion mode may be received, and the selection instruction is used to determine the occlusion mode to be used from at least one candidate occlusion mode.
  • The candidate occlusion modes include at least one of the following: performing occlusion processing on all human faces, performing occlusion processing on human faces other than preset human faces, or performing occlusion processing on the preset human faces; a sketch of these modes as a configuration is given below.
  • the preset human face may refer to a pre-stored human face in a human face database.
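  • One way (an assumption for illustration, not the patent's API) to represent the candidate occlusion modes as a selectable configuration:

```python
# Illustrative enumeration of the three candidate occlusion modes.
from enum import Enum

class OcclusionMode(Enum):
    ALL_FACES = "occlude all faces"
    EXCEPT_PRESET = "occlude faces other than the preset faces"
    ONLY_PRESET = "occlude the preset faces"

mode = OcclusionMode.EXCEPT_PRESET  # e.g. taken from a user's selection instruction
```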
  • In response to the scene type of the image frame being a distant view, the faces in the image frame are occluded according to the preset occlusion mode based on the head key point coordinates.
  • The area to be occluded can be determined according to the key point coordinates.
  • For example, a circle can be determined with the head key point coordinates as its center, and the range within the circle occluded, for example by mosaicking the area inside the circle.
  • The radius length can be chosen according to the size of the image frame; for example, when the image frame is 1080p, the radius generally takes a value in the range of 20 to 30 pixels, or it may be set by those skilled in the art according to actual requirements.
  • Areas to be occluded of other shapes may also be determined according to the key point coordinates, for example rectangles, hexagons, or irregular shapes.
  • The key point coordinates may also be used as the center of a sticker graphic to perform sticker coverage. A sketch of the circular mosaic variant follows.
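  • A minimal sketch of the circular-mosaic variant, reusing the mosaic() helper sketched earlier; the 25-pixel radius is one value in the 20 to 30 px range suggested above for 1080p frames:

```python
# Mosaic a circular region centred on a head key point (distant-view case).
import numpy as np
import cv2

def occlude_keypoint(frame, center, radius=25, block=10):
    x, y = center
    h, w = frame.shape[:2]
    x1, y1 = max(0, x - radius), max(0, y - radius)
    x2, y2 = min(w, x + radius), min(h, y + radius)
    # Pixelate the bounding square, then keep the mosaic only inside the circle.
    mosaicked = frame.copy()
    mosaic(mosaicked, (x1, y1, x2, y2), block)  # helper from the earlier sketch
    mask = np.zeros((h, w), dtype=np.uint8)
    cv2.circle(mask, (x, y), radius, 255, thickness=-1)
    frame[mask == 255] = mosaicked[mask == 255]
    return frame
```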
  • In response to the scene type of the image frame being a close view, occlusion processing is performed on the faces in the image frame according to the preset occlusion mode based on the detection frames; after the detection frame of a face is obtained, the area to be occluded can be determined according to the detection frame.
  • The area within the detection frame may be occluded directly, or the detection frame may be scaled to determine the area to be occluded, or the detection frame may be deformed by other means to determine the area to be occluded.
  • For example, when the occlusion processing is to cover the face area with a sticker, the correspondence between the sticker graphic's coverage area and the detection frame can be preset, and the sticker applied according to that correspondence.
  • Similarly, when a face or head contour is detected, the area within the contour is occluded; when a dense area containing multiple faces is detected, the entire dense area is occluded. A sketch of detection-frame-driven occlusion follows.
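  • A minimal sketch of detection-frame-driven occlusion with optional scaling of the frame about its centre; the 1.2 scale factor is an assumption:

```python
# Scale the detection box about its centre, clip it to the frame, then mosaic it.
def occlude_box(frame, box, scale=1.2, block=12):
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    half_w, half_h = (x2 - x1) * scale / 2, (y2 - y1) * scale / 2
    h, w = frame.shape[:2]
    sx1, sy1 = max(0, int(cx - half_w)), max(0, int(cy - half_h))
    sx2, sy2 = min(w, int(cx + half_w)), min(h, int(cy + half_h))
    return mosaic(frame, (sx1, sy1, sx2, sy2), block)  # mosaic() as sketched earlier
```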
  • The image processing method provided by the embodiments of the present disclosure adaptively selects a processing method matching the scene type for image frames of different scene types, so that when occluding the faces in an image frame, a processing method that is more effective for that frame's scene is used. Faces can therefore be occluded more accurately, the efficiency of image processing is improved, and the time cost of manual operation is reduced.
  • In some embodiments, a face library can be preconfigured that contains the features of a plurality of pre-collected face images; when the scene type of the image frame is a close view, face detection is performed on the image frame and the face features corresponding to each detection frame are obtained.
  • In this case, performing occlusion processing on the faces in the image frame according to the preset occlusion mode may be the following processing: matching the face features of a face against the preset face library; when the preset occlusion mode is to occlude faces that match a reference face in the face library, performing occlusion processing on the face in the image frame in response to its face features matching a reference face in the library; or, when the preset occlusion mode is to occlude faces other than the reference faces in the face library, performing occlusion processing on the face in the image frame in response to its face features not matching any reference face in the library.
  • In actual implementation, the face image within the detection frame can be input into a pre-trained face recognition model, which extracts the face features from the face image; the extracted face features are then compared with the reference features in the face library.
  • If the library contains a reference feature whose similarity to the face features is sufficiently high (for example, the face image with the highest similarity), the face is determined to match the face library; otherwise, it can be considered that the face library contains no matching result for the face features, and the face is determined not to match the face library.
  • When the preset occlusion mode is to perform occlusion processing on the face images in the face library, the faces in the image frame that have matching results are occluded.
  • The preset occlusion mode may also be set to perform occlusion processing on face images outside the face library; in that case, the faces in the image frame for which there is no matching result are occluded. For example, when the face library contains only a reporter's face image, all people in the image frame other than the reporter need to be occluded. A sketch of this matching logic follows.
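  • A minimal sketch of the matching logic, assuming face features are fixed-length embeddings compared by cosine similarity; the 0.5 threshold, the library layout, and the OcclusionMode enum from the earlier sketch are all assumptions:

```python
# Match a face embedding against reference embeddings, then apply the mode.
import numpy as np

def matches_library(feature, library, threshold=0.5):
    """feature: 1-D embedding; library: non-empty iterable of reference embeddings."""
    f = feature / np.linalg.norm(feature)
    best = max(float(np.dot(f, r / np.linalg.norm(r))) for r in library)
    return best > threshold

def should_occlude(feature, library, mode):
    matched = matches_library(feature, library)
    if mode is OcclusionMode.ONLY_PRESET:    # occlude faces that are in the library
        return matched
    if mode is OcclusionMode.EXCEPT_PRESET:  # occlude faces outside the library
        return not matched
    return True                              # ALL_FACES: occlude everyone
```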
  • The image processing method provided by the embodiments of the present disclosure not only adaptively selects a processing method matching the scene type for image frames of different scene types, but also automatically performs face detection on the faces in the image frames, extracts face features, and matches the extracted features against the face library so as to occlude faces selectively. This allows faces to be occluded more accurately and flexibly, improves the efficiency of image processing, and reduces the time cost of manual operation.
  • Fig. 2 is a flow chart of another image processing method shown in at least one embodiment of the present disclosure.
  • The method can occlude the faces in a video according to a selected occlusion mode; steps that are the same as in the process of Fig. 1 are not described again in detail.
  • In step 202, a selection instruction for an occlusion mode is received, where the selection instruction is used to determine the occlusion mode to be used from at least one candidate occlusion mode.
  • The candidate occlusion modes can be set by those skilled in the art according to actual needs.
  • For example, the candidate occlusion modes include at least one of the following: performing occlusion processing on all faces, performing occlusion processing on faces outside the face library, or performing occlusion processing on the faces in the face library.
  • The candidate occlusion mode may also be: performing occlusion processing on the faces in the face library according to their attributes, and also performing occlusion processing on the faces outside the face library.
  • For example, a face library may be preconfigured that includes reference features of the face images of people whose faces need to be shown. In this case, the selected occlusion mode may be to perform occlusion processing on faces outside the face library.
  • For another example, a face library may be preconfigured that includes reference features of the face images of people who are prohibited from showing their faces. In this case, the selected occlusion mode may be to perform occlusion processing on the faces in the face library.
  • In the face library, each face image can also have an attribute indicating whether occlusion processing is required, and this attribute can be changed as needed. For example, if a person's face needs to be occluded in one video, the person's face image can be given the attribute of requiring occlusion processing; if in another video the person's face does not need to be occluded, the face image can be given the attribute of not requiring occlusion processing.
  • By flexibly selecting the occlusion mode and configuring the attributes of the face images in the face library, various practical needs can be met.
  • a default occlusion mode may also be set.
  • the same occlusion mode can be applied to different scene types, and different occlusion modes can also be applied.
  • In step 204, deframe processing is performed on the video to obtain at least one image frame to be processed.
  • For example, a to-be-processed video uploaded by a user may be received and disassembled into multiple image frames for subsequent frame-by-frame processing.
  • In step 206, the scene type of the image frame is acquired.
  • In step 208, face detection is performed on the image frame according to the detection method matching the scene type of the image frame, and the detected faces are occluded according to a preset occlusion mode.
  • In this example, the occlusion processing may be mosaic processing, and the occlusion mode may be: perform occlusion processing on the faces in the face library according to their attributes, and also perform occlusion processing on the faces outside the face library.
  • In response to the scene type of the image frame being a distant view, head key points are extracted from the image frame to obtain the head key point coordinates, the region to be occluded is determined according to the key point coordinates, and mosaic processing is applied to that region; in response to the scene type of the image frame being a close view, face detection is performed on the image frame, the detection frames of the faces are obtained, and the face features corresponding to each detection frame are extracted for matching against the face library.
  • In step 210, at least one image frame that has undergone occlusion processing is synthesized into a target video.
  • The multiple image frames processed in the above steps can be combined according to their time order in the initial video, and the synthesized target video output. It is also possible to manually verify the processed image frames, reprocess individual frames whose occlusion failed, and synthesize the video after verification. An end-to-end sketch of this pipeline follows.
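  • A minimal end-to-end sketch of steps 204 through 210, where detect_and_occlude is a hypothetical placeholder standing in for the scene-type-dependent detection and occlusion described above:

```python
# Deframe, process each frame in time order, and synthesize the target video.
import cv2

def process_video(src_path, dst_path, detect_and_occlude):
    cap = cv2.VideoCapture(src_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
    h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
    out = cv2.VideoWriter(dst_path, cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        out.write(detect_and_occlude(frame))  # frame-by-frame occlusion
    cap.release()
    out.release()
```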
  • FIG. 2A shows the processing logic flow of the image processing method in this example.
  • The image processing method provided by the embodiments of the present disclosure can automatically and adaptively select a detection method matching the scene type for image frames of different scene types in a video, so that when detecting faces in image frames, a detection method that is more effective for each frame's scene is used. An occlusion mode can also be set, and occlusion applied or withheld according to the face library, which allows faces to be occluded more accurately, greatly improves the efficiency of video post-production, and reduces the manpower required.
  • Figure 3 is a block diagram of an image processing device shown in at least one embodiment of the present disclosure. The device includes: an image frame acquisition module 31, a scene type acquisition module 32, a face detection module 33, and an occlusion processing module 34.
  • the image frame acquisition module 31 is configured to acquire image frames to be processed.
  • the scene type acquiring module 32 is configured to acquire the scene type of the image frame, and the scene type is determined according to the result of the preliminary face detection in the image frame.
  • the face detection module 33 is configured to perform face detection on the image frame according to a detection method that matches the scene type of the image frame.
  • the occlusion processing module 34 is configured to perform occlusion processing on the detected faces according to a preset occlusion mode.
  • In some embodiments, the face detection module 33 is specifically configured to: in response to the scene type of the image frame being a distant view, extract head key points from the image frame to obtain the head key point coordinates in the image frame;
  • the occlusion processing module 34 is specifically configured to: perform occlusion processing on the detected human face according to a preset occlusion mode based on the key point coordinates of the human head.
  • In some embodiments, the face detection module 33 is specifically configured to: in response to the scene type of the image frame being a close view, perform face detection on the image frame to obtain the detection frame of the face detected in the image frame;
  • the occlusion processing module 34 is specifically configured to: perform occlusion processing on the human face in the image frame according to a preset occlusion mode based on the detection frame.
  • In some embodiments, the face detection performed on the image frame obtains the detection frame of the detected face and the face features corresponding to the detection frame, and the occlusion processing module 34, when performing occlusion processing according to the preset occlusion mode based on the detection frame of the face, is specifically configured to: match the face features of the face against a preset face library; when the preset occlusion mode is to perform occlusion processing on faces matching a reference face in the face library, perform occlusion processing on the face in the image frame in response to its face features matching a reference face in the library; or, when the preset occlusion mode is to perform occlusion processing on faces other than the reference faces in the face library, perform occlusion processing on the face in the image frame in response to its face features not matching any reference face in the library.
  • In some embodiments, when matching the face features of the face against the preset face library, the occlusion processing module 34 is specifically configured to: in response to determining that the size of the face's detection frame exceeds a preset value, extract the face features of the face image within the detection frame and match the face features against the preset face library.
  • the scene type includes near view and distant view;
  • the scene type acquisition module 32 is specifically configured to: perform preliminary face detection on the image frame to obtain a preliminary face detection result, the preliminary face detection result including at least one initial face detection frame; in a case where the target size of a preset number of initial face detection frames among the at least one initial face detection frame exceeds a preset threshold, determine that the scene type of the image frame is a close view; or, in a case where the target size of the preset number of initial face detection frames does not exceed the preset threshold, determine that the scene type of the image frame is a distant view.
  • In some embodiments, the image frame acquisition module 31 is specifically configured to: perform deframe processing on a video to obtain at least one image frame to be processed; the image frame acquisition module 31 is further configured to: synthesize at least one occlusion-processed image frame into a target video.
  • In some embodiments, the device further includes an occlusion mode selection module 30, configured to receive a selection instruction for the occlusion mode, the selection instruction being used to determine the occlusion mode to be used from at least one candidate occlusion mode; the candidate occlusion modes include at least one of the following: performing occlusion processing on all human faces, performing occlusion processing on human faces other than preset human faces, or performing occlusion processing on the preset human faces.
  • An embodiment of the present disclosure also provides an electronic device. As shown in Fig. 5, the device includes a memory and a processor; the memory is used to store computer instructions executable on the processor, and the processor is used to implement the image processing method described in any embodiment of the present disclosure when executing the computer instructions.
  • An embodiment of the present disclosure further provides a computer program product, which includes a computer program/instruction, and when the computer program/instruction is executed by a processor, implements the image processing method described in any embodiment of the present disclosure.
  • An embodiment of the present disclosure further provides a computer-readable storage medium, on which a computer program is stored, and when the program is executed by a processor, the image processing method described in any embodiment of the present disclosure is implemented.
  • the storage medium may be a volatile or non-volatile computer-readable storage medium.
  • Since the device embodiment basically corresponds to the method embodiment, reference may be made to the description of the method embodiment for relevant parts.
  • The device embodiments described above are only illustrative; the modules described as separate components may or may not be physically separated, and the components shown as modules may or may not be physical modules, that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in this specification. This can be understood and implemented by those skilled in the art without creative effort.

Landscapes

  • Image Processing (AREA)
  • Collating Specific Patterns (AREA)
  • Image Analysis (AREA)

Abstract

Embodiments of the present disclosure provide an image processing method and apparatus. The method comprises: obtaining an image frame to be processed; obtaining a scene type of said image frame, the scene type being determined according to an initial face detection result of said image frame; performing face detection on said image frame according to a detection mode matching the scene type of said image frame; and performing, according to a preset occlusion mode, occlusion processing on the detected face. With this method, faces can be occluded more accurately, image processing efficiency is improved, and the time cost of manual operation is reduced.

Description

Image Processing
Cross-Reference to Related Applications
This application claims priority to Chinese patent application No. CN202111108402X, filed with the China Patent Office on September 22, 2021, the entire contents of which are incorporated into this disclosure by reference.
Technical Field
Embodiments of the present disclosure relate to the technical field of image processing, and in particular to an image processing method and device.
Background
In recent years, as face recognition technology has become more widely used in fields such as security, payment, and device unlocking, people pay increasing attention to the privacy protection of face information. For example, in the post-production of TV programs, privacy protection often requires occluding some of the faces that appear on screen. Relying on manual methods to code and occlude faces involves a huge and tedious workload, and problems such as missed or incorrect occlusion frequently occur.
In related technologies, when deep learning algorithms are used to occlude faces in images, the processing algorithms are difficult to apply to images from a variety of scenes, and the results are not ideal.
Summary
In view of this, the embodiments of the present disclosure provide at least one image processing method and device.
Specifically, the embodiments of the present disclosure are achieved through the following technical solutions:
In a first aspect, an image processing method is provided, the method comprising: acquiring an image frame to be processed; acquiring a scene type of the image frame, the scene type being determined according to a preliminary face detection result of the image frame; performing face detection on the image frame in a detection mode that matches the scene type of the image frame; and performing occlusion processing on the detected face according to a preset occlusion mode.
In a second aspect, an image processing device is provided, the device comprising: an image frame acquisition module configured to acquire an image frame to be processed; a scene type acquisition module configured to acquire the scene type of the image frame, the scene type being determined according to a preliminary face detection result of the image frame; a face detection module configured to perform face detection on the image frame in a detection mode that matches the scene type of the image frame; and an occlusion processing module configured to perform occlusion processing on the detected face according to a preset occlusion mode.
In a third aspect, an electronic device is provided, the device including a memory and a processor, the memory being used to store computer instructions executable on the processor, and the processor being used to implement the image processing method described in any embodiment of the present disclosure when executing the computer instructions.
In a fourth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the program is executed by a processor, the image processing method described in any embodiment of the present disclosure is implemented.
In a fifth aspect, a computer program product is provided, which includes a computer program/instruction; when the computer program/instruction is executed by a processor, the image processing method described in any embodiment of the present disclosure is implemented.
The image processing method provided by the embodiments of the present disclosure adaptively selects a processing method matching the scene type for image frames of different scene types, so that when occluding the faces in an image frame, a processing method that is more effective for that frame's scene is used. Faces can therefore be occluded more accurately, the efficiency of image processing is improved, and the time cost of manual operation is reduced.
Brief Description of the Drawings
To describe the technical solutions in one or more embodiments of the present disclosure or in the related art more clearly, the accompanying drawings needed in the description of the embodiments or the related art are briefly introduced below. Evidently, the drawings described below are only some of the embodiments recorded in one or more embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an image processing method shown in at least one embodiment of the present disclosure;
Fig. 2 is a flowchart of another image processing method shown in at least one embodiment of the present disclosure;
Fig. 2A is a processing logic flowchart of an image processing method shown in at least one embodiment of the present disclosure;
Fig. 3 is a block diagram of an image processing device shown in at least one embodiment of the present disclosure;
Fig. 4 is a block diagram of another image processing device shown in at least one embodiment of the present disclosure;
Fig. 5 is a schematic diagram of the hardware structure of an electronic device according to at least one embodiment of the present disclosure.
Detailed Description
Exemplary embodiments will now be described in detail, with examples illustrated in the accompanying drawings. Where the following description refers to the drawings, the same numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this specification; rather, they are merely examples of devices and methods consistent with some aspects of this specification as detailed in the appended claims.
The terms used in this specification are for the purpose of describing particular embodiments only and are not intended to limit the specification. The singular forms "a", "the", and "said" used in this specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and includes any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this specification to describe various information, the information should not be limited to these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
As shown in Fig. 1, Fig. 1 is a flowchart of an image processing method shown in at least one embodiment of the present disclosure, including the following steps:
In step 102, an image frame to be processed is acquired, and the image frame includes at least one human face.
The image frame to be processed may be a photo, a screenshot, or a frame in a video.
This embodiment does not limit the specific manner of acquiring image frames.
For example, a Vlog video input by a user may be received, and a tool library such as FFmpeg or OpenCV may be used to deframe the video to obtain multiple image frames each including at least one human face.
For another example, a photo taken by a camera and containing a human face may be received.
The processing in this embodiment may be occlusion processing, which is image processing that partially or completely hides the facial features of a person in an image frame, for example mosaic processing, sticker overlay, or Gaussian blur.
In step 104, the scene type of the image frame is acquired, and the scene type is determined according to the result of the preliminary face detection in the image frame.
In this embodiment, image frames are divided into different scene types, so that different image processing methods can be selected for image frames of different scene types. The scene type is determined according to the preliminary face detection result of the image frame, and the preliminary detection result may include a detection frame corresponding to each face. The preliminary detection result may be obtained by using a lightweight neural network to perform preliminary detection of the faces in the image frame.
For example, scene types may include single-person scenes and multi-person scenes. Whether the scene type of the image frame is a single-person scene or a multi-person scene may be determined according to the number of detection frames corresponding to the faces, that is, the number of faces.
For another example, scene types may include sparse scenes and dense scenes. Whether the scene type of the image frame is a sparse scene or a dense scene may be determined according to the distribution of the detection frames corresponding to the faces. Exemplarily, if the number of detection frames concentrated in one area exceeds a preset number, the scene type of the image frame is judged to be a dense scene; or, if the image frame contains many detection frames but the number concentrated in any one area does not exceed the preset number, the scene type is judged to be a sparse scene.
For another example, scene types may include distant view and close view. In general, because near objects appear larger and far objects smaller, the people in a distant-view image frame are far from the camera and a single face occupies a relatively small area, whereas the people in a close-view image frame are near the camera and a single face occupies a relatively large area. In this embodiment, therefore, distant view and close view are distinguished according to the size of the face detection frames.
Taking scene types including close view and distant view as an example, acquiring the scene type of the image frame may be the following processing:
performing preliminary face detection on the image frame to obtain a preliminary face detection result, the preliminary face detection result including at least one initial face detection frame;
in a case where the target size of a preset number of initial face detection frames among the at least one initial face detection frame is greater than a preset threshold, determining that the scene type of the image frame is a close view; or,
in a case where the target size of the preset number of initial face detection frames among the at least one initial face detection frame is smaller than or equal to the preset threshold, determining that the scene type of the image frame is a distant view.
In actual implementation, the preliminary face detection on the image frame may be performed by a pre-trained face detection model, referred to here as the first face detection model. In one example, the first face detection model may be a small neural network model, that is, a model with relatively few layers, so that it processes faster. The image frame is input into the first face detection model, which outputs the preliminary face detection result, including information such as the size and/or confidence score of each initial face detection frame; the confidence score indicates the probability that the image within an initial face detection frame is a face image, and the higher the confidence score, the more likely the image within that frame is a face image.
Exemplarily, a preset number of initial face detection frames may be selected from the initial face detection frames, and the target size of the selected frames judged. How the detection frames are selected may be set by those skilled in the art according to actual needs, so that the judgment of the scene type better fits the actual application scene.
Exemplarily, the initial face detection frames may be sorted by confidence score from high to low and the five with the highest scores selected (five is used here as an example; other numbers may also be chosen). The frames may also be selected according to their positions in the image frame, for example by taking three frames at equal intervals from the leftmost to the rightmost, or by taking frames at fixed positions in the image frame. They may also be selected according to size, for example by sorting the frames from large to small and selecting four of them at random or according to a certain rule.
After the initial face detection frames are selected, the target size of the selected frames is judged. The sizes of the selected frames may first be subjected to a calculation to obtain a calculation result. The purpose of the calculation is to estimate, from the sizes of the selected frames, how far the faces in the image frame are, so as to judge whether the frame belongs to a distant view or a close view. The calculation may be taking the average, taking the median, taking the root mean square, random selection, or similar methods.
In this example, taking the five initial face detection frames with the highest confidence scores, the calculation may be to average their sizes, the target size of the preset number of frames being represented by this average. If the average is greater than the preset threshold, the scene type of the image frame is determined to be a close view; otherwise it is determined to be a distant view.
In step 106, face detection is performed on the image frame according to the detection method that matches the scene type of the image frame.
Different scene types are matched with different detection methods. In this embodiment, detection methods matching the different scene types are preset, so that a suitable detection method can be chosen in a more targeted way when detecting faces.
For example, when the scene types include a single-person scene and a multi-person scene, the detection method matching the single-person scene may be: perform image segmentation on the image frame to obtain the face contour or head contour of that person. The detection method matching the multi-person scene may be: perform face detection on the image frame to obtain multiple rectangular detection frames.
For another example, when the scene types include a sparse scene and a dense scene, the detection method matching the sparse scene may be: perform face detection on the image frame to obtain multiple detection frames; the detection method matching the dense scene may be: perform face detection on the image frame to obtain a dense region containing multiple faces.
For another example, when the scene types include a close view and a distant view, this step may include the following processing.
In response to the scene type of the image frame being a distant view, head keypoint extraction is performed on the image frame to obtain the keypoint coordinates of the heads in the image frame, so that occlusion processing can subsequently be performed based on the detected head keypoint coordinates.
In response to the scene type of the image frame being a close view, face detection is performed on the image frame to obtain the detection frame of each face in the image frame and the corresponding face features, so that the face features can subsequently be compared accurately against those in a preset face library to confirm whether occlusion processing is needed.
Practice shows that face detection can locate the face regions in an image and output a series of rectangular detection frames, but it performs poorly in distant views, where the faces in the image are very small: computation is slow and recognition inaccurate. Head keypoint localization, by contrast, detects where each head appears in the image and outputs one keypoint at the center of the head; it performs well and localizes accurately when the head regions are small, as in distant views, but its localization is not accurate enough for close views, where the head regions are large.
In actual implementation, when the scene type of the image frame is a distant view, the matching detection method may be: input the image frame into a pre-trained head keypoint localization model, which performs head keypoint extraction and outputs the keypoint coordinates of each head. The keypoint coordinates can be expressed as two numbers; for example, with the coordinate origin at the lower-left corner of the image frame, a keypoint coordinate may be (18, 39), in units of pixels. The keypoint may be the center point of the head.
In actual implementation, when the scene type of the image frame is a close view, the matching detection method may be: input the image frame into a pre-trained face detection model, referred to here as the second face detection model. The second face detection model outputs a face detection result that includes the detection frame of each face in the image frame and the corresponding face features. Each detection frame is expressed by detection frame coordinates, which describe the position of a rectangular frame in the image frame; frames of other shapes are also possible. This embodiment takes rectangular frames as an example, whose coordinates can be expressed with four numbers. For instance, the four numbers may be the coordinates of the upper-left and lower-right corners of the rectangle; with the coordinate origin at the lower-left corner of the image frame, the detection frame coordinates may be (23, 75), (57, 46), also written as (23, 75, 57, 46). The four numbers may instead be the coordinates of the lower-left and upper-right corners. In other examples, the detection frame coordinates may use other representations, such as eight numbers; this embodiment places no limit on this. The coordinate unit may be pixels.
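A minimal sketch of this scene-dependent dispatch is shown below. Here `head_model` and `face_model` stand in for the pre-trained head keypoint localization model and the second face detection model; their names and `predict` interfaces are assumptions for illustration, not real APIs.

```python
def detect_faces(frame, scene_type, head_model, face_model):
    """Dispatch to the detection method matching the scene type."""
    if scene_type == "far":
        # distant view: one (x, y) keypoint per head, in pixels
        return {"keypoints": head_model.predict(frame)}
    # close view: (x1, y1, x2, y2) frames plus one feature vector per face
    boxes, features = face_model.predict(frame)
    return {"boxes": boxes, "features": features}
```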
In one example, the second face detection model may be a large neural network model, i.e., one with a relatively large number of layers, so that the face detection result obtained is more accurate. Note that the second face detection model is large only relative to the small first face detection model: the second model typically has more neural network layers than the first, and is correspondingly slower in computation. Exemplarily, the first face detection model may take 10 ms to process one image frame, while the second face detection model takes 100 ms to process an image frame of the same specification.
In other examples, the first and second face detection models need not be constrained as above; those skilled in the art may select the models needed according to actual requirements, and the first and second face detection models may even be the same neural network model. In one example, when the first and second face detection models are the same model, step 106 can be omitted for close-view image frames, and the initial face detection frames determined in step 104 are used directly in the processing of step 108.
In step 108, occlusion processing is performed on the detected faces according to a preset occlusion mode.
In this step, the preset occlusion mode may be a default occlusion mode or may be set by the user. For example, a selection instruction for the occlusion mode may be received, the selection instruction being used to determine the occlusion mode to use from at least one candidate occlusion mode. The candidate occlusion modes include at least one of the following: performing occlusion processing on all faces, performing occlusion processing on faces other than preset faces, or performing occlusion processing on preset faces. In one example, a preset face may be a face stored in advance in a face library.
In addition, during occlusion processing, faces obtained by different detection methods may also be handled in different ways.
For example, based on the keypoint coordinates of a head, the face in the image frame is occluded according to the preset occlusion mode. After the head keypoint coordinates are detected, the region to be occluded can be determined from those coordinates.
Exemplarily, a circle may be determined with the head keypoint coordinates as its center, and the region within the circle is occluded, for example, by applying a mosaic to it. When determining the radius of the circle, a radius length corresponding to the size of the image frame can be chosen; for example, when the image frame is 1080p, a radius in the range of 20 to 30 pixels is typically used, and it may also be set by those skilled in the art according to actual needs.
In other examples, a region of another shape to be occluded, such as a rectangle, a hexagon, or an irregular shape, may also be determined from the keypoint coordinates. In addition, when the occlusion processing covers the face region with a sticker, the keypoint coordinates can be used as the center of the sticker graphic when placing it.
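A sketch of the circular mosaic described above, using OpenCV, is given below. The default radius of 25 pixels follows the 20 to 30 pixel guideline for 1080p frames stated above; the mosaic cell size is an assumed parameter.

```python
import cv2
import numpy as np

def mosaic_circle(frame, center, radius=25, cell=10):
    """Mosaic a circular region centered on a head keypoint, in place."""
    x, y = int(center[0]), int(center[1])
    h, w = frame.shape[:2]
    x1, y1 = max(x - radius, 0), max(y - radius, 0)
    x2, y2 = min(x + radius, w), min(y + radius, h)
    patch = frame[y1:y2, x1:x2]
    # pixelate: shrink, then enlarge with nearest-neighbour interpolation
    small = cv2.resize(patch, (max(1, (x2 - x1) // cell), max(1, (y2 - y1) // cell)))
    mosaic = cv2.resize(small, (x2 - x1, y2 - y1), interpolation=cv2.INTER_NEAREST)
    mask = np.zeros((h, w), dtype=np.uint8)
    cv2.circle(mask, (x, y), radius, 255, -1)  # filled circle as the region
    inside = mask[y1:y2, x1:x2] > 0
    patch[inside] = mosaic[inside]  # patch is a view, so frame is modified
    return frame
```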
For example, based on the detection frame of a face, the face in the image frame is occluded according to the preset occlusion mode. After the face detection frame is obtained, the region to be occluded can be determined from the detection frame.
In one example, the region within the detection frame may be occluded directly, or the detection frame may be scaled to determine the region to occlude, or the detection frame may be deformed by other means to determine that region. In addition, when the occlusion processing covers the face region with a sticker, the correspondence between the coverage area of the sticker graphic and the detection frame can be preset, and the sticker is placed according to that correspondence.
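Correspondingly, a detection-frame-based variant might look like the sketch below. The enlargement factor of 1.2 is an assumption illustrating the scaling mentioned above, not a value from the embodiment.

```python
import cv2

def mosaic_box(frame, box, scale=1.2, cell=10):
    """Mosaic the region of a (possibly enlarged) face detection frame."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2.0, (y1 + y2) / 2.0
    half_w, half_h = (x2 - x1) * scale / 2.0, (y2 - y1) * scale / 2.0
    h, w = frame.shape[:2]
    x1, x2 = int(max(cx - half_w, 0)), int(min(cx + half_w, w))
    y1, y2 = int(max(cy - half_h, 0)), int(min(cy + half_h, h))
    patch = frame[y1:y2, x1:x2]
    # pixelate the enlarged box: shrink, then enlarge without smoothing
    small = cv2.resize(patch, (max(1, (x2 - x1) // cell), max(1, (y2 - y1) // cell)))
    frame[y1:y2, x1:x2] = cv2.resize(small, (x2 - x1, y2 - y1),
                                     interpolation=cv2.INTER_NEAREST)
    return frame
```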
For another example, when the face contour or head contour of a single person is obtained by detection, the region within the contour is occluded; when a dense region containing multiple faces is obtained, the entire dense region is occluded.
With the image processing method provided by the embodiments of the present disclosure, a processing method matched to the scene type is adaptively selected for image frames of different scene types, so that when occluding the faces in an image frame, the processing that is more effective for that frame can be applied to frames of different scenes. Faces can therefore be occluded more accurately, the efficiency of image processing is improved, and the time cost of manual operation is reduced.
In some scenarios, the faces appearing in an image frame often need to be distinguished before occlusion. For example, in television post-production, all faces except certain specific faces must be occluded; this is especially true for street interviews, where everyone on camera other than the reporter needs face occlusion.
A solution proposed in this embodiment is described below.
In one implementation, building on the above embodiments, a face library may be configured in advance, containing the features of multiple face images collected beforehand. When the scene type of the image frame is a close view, face detection on the image frame also yields the face features corresponding to each detection frame.
In the above embodiments, occluding the face in the image frame according to the preset occlusion mode based on the face detection frame may be the following processing.
For each face, the face features of that face are matched against the preset face library. When the preset occlusion mode is to occlude faces that match a reference face in the face library, occlusion processing is performed on the face in the image frame in response to its face features matching a reference face in the library; or, when the preset occlusion mode is to occlude faces other than the reference faces in the face library, occlusion processing is performed on the face in the image frame in response to its face features not matching any reference face in the library.
In actual implementation, the face image within the detection frame can be input into a pre-trained face recognition model, which extracts the face features of the face image; the extracted features are matched against the reference face images in the face library to judge whether a matching result exists for those features in the library. For example, if the highest similarity between the features of a reference face image in the library and the extracted face features reaches a similarity threshold, the face features can be considered to have a matching result in the library: the matching result is the face image whose reference features have the highest similarity to the extracted features, and the face is determined to match the face library. Otherwise, the face features can be considered to have no matching result in the library, and the face is determined not to match the face library.
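A sketch of such library matching is given below. The cosine similarity measure, the 0.6 threshold, and the dictionary layout of the library are assumptions, since the embodiment fixes neither the similarity function nor the threshold value.

```python
import numpy as np

def match_face(feature, library, threshold=0.6):
    """Return (person_id, similarity) of the best library match, or
    (None, similarity) when no reference face reaches the threshold.

    library -- dict mapping person id -> reference feature vector
    """
    feature = np.asarray(feature, dtype=float)
    feature = feature / np.linalg.norm(feature)
    best_id, best_sim = None, -1.0
    for person_id, ref in library.items():
        ref = np.asarray(ref, dtype=float)
        sim = float(feature @ (ref / np.linalg.norm(ref)))  # cosine similarity
        if sim > best_sim:
            best_id, best_sim = person_id, sim
    return (best_id, best_sim) if best_sim >= threshold else (None, best_sim)
```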
When the preset occlusion mode is to occlude the face images in the face library, occlusion processing is performed on the faces in the image frame for which a matching result exists.
In other examples, the preset occlusion mode may instead be set to occlude face images outside the face library, in which case occlusion processing is performed on the faces in the image frame for which no matching result exists. For example, when the face library contains only the reporter's face image, everyone on camera in the image frame other than the reporter needs occlusion.
In one example, when configuring the face library, each face image may also be given an attribute indicating whether occlusion is required. After determining that a face's features have a matching result in the library, the attribute is used to further verify whether that face needs occlusion. For example, in response to the face features having a matching result in the library whose attribute indicates that occlusion is required, the face in the image frame is occluded; if the attribute of the matching result indicates that occlusion is not required, the face in the image frame is not occluded. Other faces in the image frame for which no matching result exists may or may not be occluded according to the preset occlusion mode.
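Putting the mode and the per-face attribute together, the decision for a single close-view face might be sketched as follows. The mode names and the rule that the attribute takes precedence for matched faces follow the example above, but they are assumptions, not a fixed specification; `match_face` is the sketch from earlier.

```python
def should_occlude(feature, library, attrs, mode):
    """Decide whether one detected face needs occlusion.

    attrs -- dict mapping person id -> bool (True = occlusion required)
    mode  -- "all", "in_library", or "outside_library" (assumed names)
    """
    person_id, _ = match_face(feature, library)
    if mode == "all":
        return True
    if person_id is not None:
        # matched a library face: its attribute takes precedence
        return attrs.get(person_id, mode == "in_library")
    # unmatched face: occluded only under the "outside_library" mode
    return mode == "outside_library"
```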
It should be noted that, since the face regions in a distant view are relatively small, face recognition on them is costly, requiring more complex algorithms and more time; in this example, therefore, face recognition is performed only on close-view image frames to speed up processing. In other examples, where required, face recognition may also be performed on the faces in distant views to further judge whether occlusion is needed. In addition, for scene types other than close view and distant view, the method of this embodiment may likewise be used to match face features against the preset face library to judge whether occlusion processing is needed.
The image processing method provided by the embodiments of the present disclosure can not only adaptively select a processing method matched to the scene type for image frames of different scene types, but can also automatically perform face detection on the faces in an image frame, extract face features, and match the extracted features against the face library so as to occlude faces selectively. Faces can thus be occluded more accurately and more flexibly, the efficiency of image processing is improved, and the time cost of manual operation is reduced.
Fig. 2 is a flowchart of another image processing method shown in at least one embodiment of the present disclosure. The method can occlude the faces in a video according to a selected occlusion mode and may include the following steps, where steps identical to the flow of Fig. 1 will not be described again in detail.
In step 202, a selection instruction for the occlusion mode is received, where the selection instruction is used to determine the occlusion mode to use from at least one candidate occlusion mode.
The candidate occlusion modes can be set by those skilled in the art according to actual needs. Exemplarily, the candidate occlusion modes include at least one of the following: occluding all faces, occluding faces outside the face library, or occluding the faces in the face library. A candidate occlusion mode may also be: occluding the faces in the face library according to their attributes while also occluding faces outside the face library.
For example, a face library may be configured in advance that includes the reference features of the face images of people who are allowed to show their faces. In this case, the selected occlusion mode may be to occlude faces outside the face library.
For another example, a face library may be configured in advance that includes the reference features of the face images of people who are prohibited from showing their faces. In this case, the selected occlusion mode may be to occlude the faces in the face library.
In addition, besides the reference features of the face images, each face image in the preconfigured face library may also carry an attribute indicating whether occlusion is required, and this attribute can be changed as needed. For example, if a person's face needs to be occluded in one video, that person's face image can be set with the attribute requiring occlusion; if the same person's face does not need occlusion in another video, the face image can be set with the attribute not requiring occlusion. The occlusion mode can thus be flexibly selected and the attributes of the face images in the library flexibly configured to meet a variety of practical needs.
Besides receiving a selection instruction to determine the occlusion mode, a default occlusion mode may also be set. The same occlusion mode may be applied to different scene types, or different occlusion modes may be applied.
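One way to represent the candidate occlusion modes and the attribute-carrying configuration in code is sketched below; the enum members, the default mode, and the dataclass layout are purely illustrative assumptions.

```python
from dataclasses import dataclass, field
from enum import Enum

class OcclusionMode(Enum):
    ALL = "all"                          # occlude every detected face
    IN_LIBRARY = "in_library"            # occlude faces matching the library
    OUTSIDE_LIBRARY = "outside_library"  # occlude faces not in the library

@dataclass
class OcclusionConfig:
    mode: OcclusionMode = OcclusionMode.OUTSIDE_LIBRARY  # assumed default
    # per-person override: person id -> occlusion required?
    attrs: dict = field(default_factory=dict)
```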
In step 204, the video is deframed to obtain at least one image frame to be processed.
For example, a video to be processed uploaded by a user may be received and disassembled into multiple image frames to facilitate subsequent frame-by-frame processing.
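For step 204, deframing can be done with OpenCV as sketched below; the frame rate is kept so that the target video can later be re-synthesized at the original speed in step 210.

```python
import cv2

def deframe(video_path):
    """Split a video into a list of frames (BGR arrays) plus its fps."""
    cap = cv2.VideoCapture(video_path)
    fps = cap.get(cv2.CAP_PROP_FPS)
    frames = []
    while True:
        ok, frame = cap.read()
        if not ok:  # end of video
            break
        frames.append(frame)
    cap.release()
    return frames, fps
```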
In step 206, the scene type of the image frame is acquired.
In this embodiment, taking close view and distant view as the scene types, it can be judged whether the image frame is a distant view or a close view.
In step 208, the faces in the image frame are detected according to the detection method that matches the scene type of the image frame, and the detected faces are occluded according to the preset occlusion mode.
For example, the occlusion processing may be mosaic processing, and the occlusion mode may be: occlude the faces in the face library according to their attributes, and also occlude faces outside the face library. In this case, after the scene type of the image frame is determined: in response to the scene type being a distant view, head keypoint extraction is performed on the image frame to obtain the keypoint coordinates of the heads in the image frame, the region requiring occlusion is determined from the keypoint coordinates, and the mosaic algorithm is applied to that region; in response to the scene type being a close view, face detection is performed on the faces in the image frame to obtain their detection frames, the face features of the face images within the detection frames are extracted and matched against the face library, and, in response to a face's features having a matching result in the library whose attribute indicates that occlusion is required, the mosaic algorithm is applied within the detection frame where that face is located.
In step 210, the at least one image frame that has undergone occlusion processing is synthesized into a target video.
For example, the multiple image frames processed by the above steps can be combined according to the time order of the original video to be processed, and the synthesized target video is output. The processed image frames can also be checked manually, individual frames whose occlusion failed can be reprocessed, and the video is synthesized after verification.
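The re-synthesis of step 210 can then write the processed frames back in their original order; the mp4v codec and the output path below are illustrative assumptions.

```python
import cv2

def synthesize(frames, fps, out_path="target.mp4"):
    """Write occlusion-processed frames back out as the target video."""
    height, width = frames[0].shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (width, height))
    for frame in frames:  # frames are already in source order
        writer.write(frame)
    writer.release()
```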
For an intuitive understanding of the processing in the above steps, see Fig. 2A, which shows the processing logic flow of the image processing method in this example.
The image processing method provided by the embodiments of the present disclosure can automatically and adaptively select, for image frames of different scene types in a video, a detection method matched to the scene type, so that when detecting the faces in an image frame, the detection method more effective for that frame is used for frames of different scenes. An occlusion mode can also be set, and occlusion processing is performed or skipped by comparison against the face library, so that faces can be occluded more accurately, greatly improving the efficiency of video post-production and reducing manpower input.
As shown in Fig. 3, Fig. 3 is a block diagram of an image processing device shown in at least one embodiment of the present disclosure. The device includes: an image frame acquisition module 31, a scene type acquisition module 32, a face detection module 33, and an occlusion processing module 34.
The image frame acquisition module 31 is configured to acquire an image frame to be processed.
The scene type acquisition module 32 is configured to acquire the scene type of the image frame, the scene type being determined according to the preliminary face detection result in the image frame.
The face detection module 33 is configured to perform face detection on the image frame according to a detection method that matches the scene type of the image frame.
The occlusion processing module 34 is configured to perform occlusion processing on the detected faces according to a preset occlusion mode.
In one example, the face detection module 33 is specifically configured to: in response to the scene type of the image frame being a distant view, perform head keypoint extraction on the image frame to obtain the keypoint coordinates of the heads in the image frame.
The occlusion processing module 34 is specifically configured to: based on the keypoint coordinates of the head, perform occlusion processing on the detected face according to the preset occlusion mode.
In one example, the face detection module 33 is specifically configured to: in response to the scene type of the image frame being a close view, perform face detection on the image frame to obtain the detection frames of the faces detected in the image frame.
The occlusion processing module 34 is specifically configured to: based on the detection frame, perform occlusion processing on the face in the image frame according to the preset occlusion mode.
In one example, the face detection performed on the image frame obtains the detection frame of each face detected in the image frame and the face features corresponding to that detection frame. When performing occlusion processing on the face in the image frame according to the preset occlusion mode based on the face detection frame, the occlusion processing module 34 is specifically configured to: match the face features of the face against a preset face library; when the preset occlusion mode is to occlude faces that match a reference face in the face library, perform occlusion processing on the face in the image frame in response to its face features matching a reference face in the library; or, when the preset occlusion mode is to occlude faces other than the reference faces in the face library, perform occlusion processing on the face in the image frame in response to its face features not matching any reference face in the library.
In one example, when matching the face features of the face against the preset face library, the occlusion processing module 34 is specifically configured to: in response to determining that the size of the face detection frame exceeds a preset value, extract the face features of the face image within the detection frame of that face, and match those features against the preset face library.
In one example, the scene types include a close view and a distant view. The scene type acquisition module 32 is specifically configured to: perform a preliminary face detection on the image frame to obtain a preliminary face detection result, the result including at least one initial face detection frame; when the target size of a preset number of initial face detection frames among the at least one initial face detection frame exceeds a preset threshold, determine that the scene type of the image frame is a close view; or, when the target size of the preset number of initial face detection frames does not exceed the preset threshold, determine that the scene type of the image frame is a distant view.
In one example, the image frame acquisition module 31 is specifically configured to: deframe a video to obtain at least one image frame to be processed. The image frame acquisition module 31 is further configured to: synthesize the at least one occlusion-processed image frame into a target video.
As shown in Fig. 4, on the basis of the foregoing device embodiment, the device further includes an occlusion mode selection module 30, configured to receive a selection instruction for the occlusion mode, the selection instruction being used to determine the occlusion mode to use from at least one candidate occlusion mode; the candidate occlusion modes include at least one of the following: performing occlusion processing on all faces, performing occlusion processing on faces other than preset faces, and performing occlusion processing on preset faces.
For the implementation of the functions and roles of each module in the above device, see the implementation of the corresponding steps in the above method; details are not repeated here.
An embodiment of the present disclosure further provides an electronic device. As shown in Fig. 5, the electronic device includes a memory 51 and a processor 52; the memory 51 is configured to store computer instructions executable on the processor, and the processor 52 is configured to implement the image processing method of any embodiment of the present disclosure when executing the computer instructions.
An embodiment of the present disclosure further provides a computer program product, including a computer program/instructions which, when executed by a processor, implement the image processing method of any embodiment of the present disclosure.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the program is executed by a processor, the image processing method of any embodiment of the present disclosure is implemented. The storage medium may be a volatile or non-volatile computer-readable storage medium.
As the device embodiments basically correspond to the method embodiments, reference may be made to the description of the method embodiments for the relevant parts. The device embodiments described above are merely illustrative; the modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, i.e., they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this specification. Those of ordinary skill in the art can understand and implement it without creative effort.
The foregoing describes specific embodiments of this specification. Other embodiments are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. Multitasking and parallel processing are also possible, or may be advantageous, in certain implementations.
Other embodiments of this specification will readily occur to those skilled in the art from consideration of the specification and practice of the invention applied for here. This specification is intended to cover any variations, uses, or adaptations of this specification that follow its general principles and include common knowledge or customary technical means in the technical field not claimed by this specification. The specification and embodiments are to be regarded as exemplary only, with the true scope and spirit of this specification being indicated by the following claims.
It should be understood that this specification is not limited to the precise structures described above and shown in the accompanying drawings, and that various modifications and changes may be made without departing from its scope. The scope of this specification is limited only by the appended claims.
The above are merely preferred embodiments of this specification and are not intended to limit it; any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this specification shall fall within its scope of protection.

Claims (13)

  1. An image processing method, characterized in that the method comprises:
    acquiring an image frame to be processed;
    acquiring a scene type of the image frame, the scene type being determined according to a preliminary face detection result of the image frame;
    performing face detection on the image frame according to a detection method that matches the scene type of the image frame;
    performing occlusion processing on the detected face according to a preset occlusion mode.
  2. The method according to claim 1, characterized in that performing face detection on the image frame according to a detection method that matches the scene type of the image frame comprises:
    in response to the scene type of the image frame being a distant view, performing head keypoint extraction on the image frame to obtain keypoint coordinates of a head in the image frame;
    performing occlusion processing on the detected face according to a preset occlusion mode comprises:
    based on the keypoint coordinates of the head, performing occlusion processing on the detected face according to the preset occlusion mode.
  3. The method according to claim 1, characterized in that performing face detection on the image frame according to a detection method that matches the scene type of the image frame comprises:
    in response to the scene type of the image frame being a close view, performing face detection on the image frame to obtain a detection frame of a face detected in the image frame;
    performing occlusion processing on the detected face according to a preset occlusion mode comprises:
    based on the detection frame of the face, performing occlusion processing on the face in the image frame according to the preset occlusion mode.
  4. The method according to claim 3, characterized in that, in response to the scene type of the image frame being a close view, performing face detection on the image frame to obtain the detection frame of the face detected in the image frame comprises:
    in response to the scene type of the image frame being a close view, performing face detection on the image frame to obtain the detection frame of the face detected in the image frame and face features corresponding to the detection frame of the face;
    based on the detection frame of the face, performing occlusion processing on the face in the image frame according to the preset occlusion mode comprises:
    matching the face features of the face against a preset face library;
    when the preset occlusion mode is to perform occlusion processing on a face matching a reference face in the face library, performing occlusion processing on the face in the image frame in response to the face features of the face matching a reference face in the face library; or
    when the preset occlusion mode is to perform occlusion processing on faces other than the reference faces in the face library, performing occlusion processing on the face in the image frame in response to the face features of the face not matching a reference face in the face library.
  5. The method according to claim 4, characterized in that matching the face features of the face against the preset face library comprises:
    determining whether the size of the detection frame of the face is greater than a preset value;
    in response to determining that the size of the detection frame of the face exceeds the preset value, extracting face features of the face image within the detection frame of the face, and matching the face features against the preset face library.
  6. The method according to any one of claims 1 to 5, characterized in that
    the scene types comprise a close view and a distant view;
    acquiring the scene type of the image frame comprises:
    performing a preliminary face detection on the image frame to obtain a preliminary face detection result, the preliminary face detection result comprising at least one initial face detection frame;
    when a target size of a preset number of initial face detection frames among the at least one initial face detection frame exceeds a preset threshold, determining that the scene type of the image frame is a close view; or,
    when the target size of the preset number of initial face detection frames among the at least one initial face detection frame does not exceed the preset threshold, determining that the scene type of the image frame is a distant view.
  7. The method according to claim 6, characterized in that the method further comprises:
    sorting the at least one initial face detection frame by confidence score in descending order;
    determining the N initial face detection frames with the highest confidence scores as the preset number of initial face detection frames, where N equals the preset number.
  8. The method according to any one of claims 1 to 7, characterized in that
    acquiring the image frame to be processed comprises:
    deframing a video to obtain at least one image frame to be processed;
    the image processing method further comprises:
    synthesizing at least one image frame that has undergone occlusion processing into a target video.
  9. The method according to any one of claims 1 to 8, characterized in that the method further comprises:
    receiving a selection instruction for an occlusion mode, the selection instruction being used to determine the occlusion mode to use from at least one candidate occlusion mode;
    the candidate occlusion mode comprising at least one of the following:
    performing occlusion processing on all faces, performing occlusion processing on faces other than preset faces, and performing occlusion processing on preset faces.
  10. An image processing device, characterized in that the device comprises:
    an image frame acquisition module, configured to acquire an image frame to be processed;
    a scene type acquisition module, configured to acquire a scene type of the image frame, the scene type being determined according to a preliminary face detection result in the image frame;
    a face detection module, configured to perform face detection on the image frame according to a detection method that matches the scene type of the image frame;
    an occlusion processing module, configured to perform occlusion processing on the detected face according to a preset occlusion mode.
  11. An electronic device, characterized in that the device comprises a memory and a processor, the memory being configured to store computer instructions executable on the processor, and the processor being configured to implement the method according to any one of claims 1 to 9 when executing the computer instructions.
  12. A computer program product, the product comprising a computer program/instructions, characterized in that the computer program/instructions, when executed by a processor, implement the method according to any one of claims 1 to 9.
  13. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method according to any one of claims 1 to 9 is implemented.
PCT/CN2022/070905 2021-09-22 2022-01-10 Image processing WO2023045183A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111108402.X 2021-09-22
CN202111108402.XA CN113837065A (en) 2021-09-22 2021-09-22 Image processing method and device

Publications (1)

Publication Number Publication Date
WO2023045183A1 true WO2023045183A1 (en) 2023-03-30

Family

ID=78960319

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/070905 WO2023045183A1 (en) 2021-09-22 2022-01-10 Image processing

Country Status (3)

Country Link
CN (1) CN113837065A (en)
TW (1) TW202314634A (en)
WO (1) WO2023045183A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113837065A (en) * 2021-09-22 2021-12-24 上海商汤智能科技有限公司 Image processing method and device
CN114333030A (en) * 2021-12-31 2022-04-12 科大讯飞股份有限公司 Image processing method, device, equipment and storage medium
CN114445711B (en) * 2022-01-29 2023-04-07 北京百度网讯科技有限公司 Image detection method, image detection device, electronic equipment and storage medium
CN115240265B (en) * 2022-09-23 2023-01-10 深圳市欧瑞博科技股份有限公司 User intelligent identification method, electronic equipment and storage medium


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150310274A1 (en) * 2014-04-25 2015-10-29 Xerox Corporation Method and system for automatically locating static occlusions
CN110119711A (en) * 2019-05-14 2019-08-13 北京奇艺世纪科技有限公司 A kind of method, apparatus and electronic equipment obtaining video data personage segment
CN111416950A (en) * 2020-03-26 2020-07-14 腾讯科技(深圳)有限公司 Video processing method and device, storage medium and electronic equipment
CN112016464A (en) * 2020-08-28 2020-12-01 中移(杭州)信息技术有限公司 Method and device for detecting face shielding, electronic equipment and storage medium
CN113837065A (en) * 2021-09-22 2021-12-24 上海商汤智能科技有限公司 Image processing method and device

Also Published As

Publication number Publication date
CN113837065A (en) 2021-12-24
TW202314634A (en) 2023-04-01
