CN113850245A - Image processing method, image processing device, electronic equipment and storage medium - Google Patents
- Publication number
- CN113850245A CN113850245A CN202111440316.9A CN202111440316A CN113850245A CN 113850245 A CN113850245 A CN 113850245A CN 202111440316 A CN202111440316 A CN 202111440316A CN 113850245 A CN113850245 A CN 113850245A
- Authority
- CN
- China
- Prior art keywords
- target object
- image
- processed
- segmentation
- heat map
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
- G06N3/08—Learning methods
Abstract
The present disclosure relates to an image processing method, an apparatus, an electronic device, and a storage medium. The image processing method includes: acquiring an image to be processed, wherein the image to be processed comprises at least one object; detecting a target object among the at least one object in the image to be processed, and extracting key points of the target object; and segmenting the target object in the image to be processed with the key points of the target object as guidance information for the segmentation processing, to obtain a segmentation result of the target object. Because the detection and segmentation of the target object are decoupled into two independently running processes, image processing efficiency is improved; and because the key points of the target object guide its segmentation, segmentation precision is improved as well.
Description
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, an electronic device, and a storage medium.
Background
With the continuous development of artificial intelligence technology, image processing has grown both in variety and in quality of results. For example, instance segmentation is a common form of image processing that separates different individuals of the same class in an image so that each individual receives an independent segmentation label. However, image segmentation algorithms in the related art first identify the most strongly activated regions of the image to be processed, treat those regions as the instances to be segmented, and then perform instance segmentation on them directly; as a result, segmentation precision is low and the output rarely satisfies users.
Disclosure of Invention
The present disclosure provides an image processing method, apparatus, device, and storage medium to address these shortcomings of the related art.
According to a first aspect of embodiments of the present disclosure, there is provided an image processing method, including:
acquiring an image to be processed, wherein the image to be processed comprises at least one object;
detecting a target object in the at least one object in the image to be processed, and extracting key points of the target object;
and taking the key points of the target object as guide information of segmentation processing, and segmenting the target object in the image to be processed to obtain a segmentation result of the target object.
In one embodiment, the detecting the target object in the at least one object in the image to be processed and extracting the key point of the target object includes:
detecting at least two objects to be segmented in the image to be processed as target objects, and respectively extracting key points of each of the at least two target objects;
the method for segmenting the target object in the image to be processed by taking the key point of the target object as the guide information of segmentation processing to obtain the segmentation result of the target object comprises the following steps:
and respectively taking the key point of each target object as guide information of segmentation processing, and segmenting each target object in the image to be processed to obtain a segmentation result of each target object.
In one embodiment, the segmenting the target object in the image to be processed by using the key point of the target object as guidance information for segmentation processing to obtain a segmentation result of the target object includes:
determining a key point heat map of the target object according to key points of the target object, wherein the key point heat map is used for marking position coordinates of the key points;
and taking the key point heat map of the target object as guiding information of segmentation processing, and segmenting the target object in the image to be processed to obtain a segmentation result of the target object.
In one embodiment, the determining a keypoint heat map of the target object based on keypoints of the target object comprises:
determining a region to be segmented of the target object from the image to be processed according to the key point of the target object, wherein the region to be segmented of the target object is an image block of the target object;
mapping keypoints of the target object into the image block;
generating a keypoint heat map consistent with the image block size, wherein keypoints of the target object are marked in the keypoint heat map.
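The heat-map construction described above can be sketched as follows. The patent does not fix a rendering scheme, so the Gaussian peak with spread `sigma` used here is an illustrative assumption:

```python
import numpy as np

def keypoint_heatmap(keypoints, height, width, sigma=2.0):
    """Render keypoints (x, y), given in image-block coordinates, as
    Gaussian peaks on a single-channel heat map of the block's size.
    sigma is an assumed spread; the patent does not specify one."""
    ys, xs = np.mgrid[0:height, 0:width]
    heat = np.zeros((height, width), dtype=np.float32)
    for (x, y) in keypoints:
        g = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, g)  # where peaks overlap, keep the maximum
    return heat

# Two keypoints marked inside a 32x32 image block
hm = keypoint_heatmap([(8, 8), (24, 20)], 32, 32)
```

A heat map produced this way has the same size as the image block, with each key point marked by a peak of value 1 at its position coordinates.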
In one embodiment, the segmenting the target object in the image to be processed by using the keypoint heat map of the target object as guidance information of segmentation processing to obtain a segmentation result of the target object includes:
inputting the key point heat map and the image block into a pre-trained segmentation network model for segmentation processing, and outputting a segmentation result of the target object marked on the image block;
and mapping the segmentation result of the target object marked on the image block into the image to be processed according to the position of the region to be segmented of the target object in the image to be processed.
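A minimal sketch of the crop, segment, and map-back flow of this embodiment. The segmentation network is stubbed out with a placeholder callable, since the patent does not disclose a concrete architecture; the box format (top-left, bottom-right corners) follows the detection-frame description elsewhere in this disclosure:

```python
import numpy as np

def segment_in_crop(image, box, keypoint_heatmap, model):
    """Crop the region to be segmented, run the segmentation model on the
    (image block, heat map) pair, and map the block-level mask back into
    full-image coordinates at the region's position. `model` stands in
    for the pre-trained segmentation network model."""
    x1, y1, x2, y2 = box
    block = image[y1:y2, x1:x2]
    local_mask = model(block, keypoint_heatmap)   # mask sized like the block
    full_mask = np.zeros(image.shape[:2], dtype=np.uint8)
    full_mask[y1:y2, x1:x2] = local_mask          # paste back at the box position
    return full_mask

# Toy model: marks every pixel of the block as foreground.
toy_model = lambda block, heat: np.ones(block.shape[:2], dtype=np.uint8)
img = np.zeros((10, 10, 3), dtype=np.uint8)
mask = segment_in_crop(img, (2, 3, 6, 8), None, toy_model)
```

Only the cropped block and its heat map are fed to the model; the final mask is expressed in the coordinates of the image to be processed.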
In one embodiment, the determining a keypoint heat map of the target object based on keypoints of the target object comprises:
generating a key point heat map consistent with the size of the image to be processed, wherein key points of the target object are marked on the key point heat map;
the step of taking the key point heat map of the target object as guidance information for segmentation processing to segment the target object in the image to be processed to obtain a segmentation result of the target object includes:
inputting the image to be processed and the key point heat map into a pre-trained segmentation network model for segmentation processing, and outputting the segmentation result of the target object marked on the image to be processed.
In one embodiment, the determining a keypoint heat map of the target object based on keypoints of the target object comprises:
generating a global key point heat map according to the key points of the target object, wherein all the key points of the target object are marked on the global key point heat map; or,
generating a plurality of local keypoint heat maps with different position attributes according to the keypoints of the target object, wherein the position attributes are used for characterizing local position areas of the target object, and each local keypoint heat map is marked with the keypoints of the local position area of the target object.
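The two alternatives above — one global heat map versus several local heat maps keyed by position attribute — can be sketched as follows. Single hot pixels are used for brevity (a Gaussian rendering could be substituted), and the part names are illustrative:

```python
import numpy as np

def local_heatmaps(keypoints_by_part, height, width):
    """Build one heat map per position attribute (e.g. eyes / nose /
    mouth / contour for a face), each marking only that part's keypoints."""
    maps = {}
    for part, points in keypoints_by_part.items():
        m = np.zeros((height, width), dtype=np.float32)
        for (x, y) in points:
            m[y, x] = 1.0
        maps[part] = m
    return maps

parts = {"eyes": [(3, 4), (7, 4)], "mouth": [(5, 8)]}
maps = local_heatmaps(parts, 12, 12)
# The global heat map is the union of all local ones: every keypoint marked.
global_map = np.maximum.reduce(list(maps.values()))
```

Local heat maps let the segmentation network see which key points belong to which local position area, while the global map marks all key points of the target object at once.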
In one embodiment, further comprising:
determining the identification of the segmentation result of the target object, and performing at least one of the following processes on the segmentation result of the target object according to the identification: counting processing, positioning processing, content classification processing and special effect rendering processing.
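As one example of the "counting processing" named above, identified segmentation results can be counted directly from an instance mask. The mask encoding (0 for background, a distinct non-zero identifier per object) is an assumption consistent with the mask-image description later in this disclosure:

```python
def count_instances(instance_mask):
    """Count target objects in an instance mask where each object's pixels
    carry a distinct non-zero identifier and 0 denotes background."""
    ids = {v for row in instance_mask for v in row if v != 0}
    return len(ids)

mask = [
    [0, 1, 1, 0],
    [0, 1, 0, 2],
    [3, 0, 2, 2],
]
n = count_instances(mask)  # three distinct identifiers: 1, 2, 3
```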
According to a second aspect of the embodiments of the present disclosure, there is provided an image processing apparatus including:
the device comprises an acquisition module, a processing module and a processing module, wherein the acquisition module is used for acquiring an image to be processed, and the image to be processed comprises at least one object;
the detection module is used for detecting a target object in the at least one object in the image to be processed and extracting key points of the target object;
and the segmentation module is used for segmenting the target object in the image to be processed by taking the key point of the target object as guide information of segmentation processing to obtain a segmentation result of the target object.
In one embodiment, the image to be processed includes a plurality of objects, and the detection module is specifically configured to:
detecting at least two objects to be segmented in the image to be processed as target objects, and respectively extracting key points of each of the at least two target objects;
the segmentation module is specifically configured to:
and respectively taking the key point of each target object as guide information of segmentation processing, and segmenting each target object in the image to be processed to obtain a segmentation result of each target object.
In one embodiment, the segmentation module is specifically configured to:
determining a key point heat map of the target object according to the key points of the target object, wherein the key point heat map is used for marking the position coordinates of the key points of the target object;
and taking the key point heat map of the target object as guiding information of segmentation processing, and segmenting the target object in the image to be processed to obtain a segmentation result of the target object.
In an embodiment, the segmentation module, when determining the keypoint heat map of the target object according to the keypoints of the target object, is specifically configured to:
determining a region to be segmented of the target object from the image to be processed according to the key point of the target object, wherein the region to be segmented of the target object is an image block of the target object;
mapping keypoints of the target object into the image block;
generating a keypoint heat map consistent with the image block size, wherein keypoints of the target object are marked in the keypoint heat map.
In an embodiment, the segmentation module is configured to segment the target object in the image to be processed by using the keypoint heat map of the target object as guidance information for segmentation processing, and when obtaining a segmentation result of the target object, is specifically configured to:
inputting the key point heat map and the image block into a pre-trained segmentation network model for segmentation processing, and outputting a segmentation result of the target object marked on the image block;
and mapping the segmentation result of the target object marked on the image block into the image to be processed according to the position of the region to be segmented of the target object in the image to be processed.
In an embodiment, the segmentation module, when determining the keypoint heat map of the target object according to the keypoints of the target object, is specifically configured to:
generating a key point heat map consistent with the size of the image to be processed, wherein key points of the target object are marked on the key point heat map;
the segmentation module is configured to segment the target object in the image to be processed by using the key point heatmap of the target object as guidance information for segmentation processing, and when a segmentation result of the target object is obtained, is specifically configured to:
inputting the image to be processed and the key point heat map into a pre-trained segmentation network model for segmentation processing, and outputting the segmentation result of the target object marked on the image to be processed.
In an embodiment, the segmentation module, when determining the keypoint heat map of the target object according to the keypoints of the target object, is specifically configured to:
generating a global key point heat map according to the key points of the target object, wherein all the key points of the target object are marked on the global key point heat map; or,
generating a plurality of local keypoint heat maps with different position attributes according to the keypoints of the target object, wherein the position attributes are used for characterizing local position areas of the target object, and each local keypoint heat map is marked with the keypoints of the local position area of the target object.
In one embodiment, the apparatus further comprises a processing module configured to:
determining the identification of the segmentation result of the target object, and performing at least one of the following processes on the segmentation result of the target object according to the identification: counting processing, positioning processing, content classification processing and special effect rendering processing.
According to a third aspect of embodiments of the present disclosure, there is provided an electronic device comprising a memory for storing computer instructions executable on a processor, the processor being configured to implement the method of the first aspect when executing the computer instructions.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of the first aspect.
According to the above embodiments, an image to be processed is acquired, a target object among at least one object in the image is detected, key points of the target object are extracted, and finally the target object is segmented with those key points as guidance information for the segmentation processing, yielding a segmentation result for the target object. Whereas the related art directly performs instance segmentation on the highly activated image regions of the image to be processed, here the key points of the target object guide the segmentation; compared with a highly activated region, key points convey the target features to be segmented at a finer granularity, so segmentation precision improves. This matters especially when several objects to be segmented appear in the same image to be processed: because each object's own key-point information provides the guidance, the segmentation boundaries remain clear even when the objects lie close together, further improving precision. In addition, because detection and segmentation are decoupled into independently running processes, when processing a sequence of images or the frames of a video, the current image or frame can be segmented while the next one is being detected, improving the processing efficiency of the image or video.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
FIG. 1 is a flow chart illustrating an image processing method according to an embodiment of the present disclosure;
FIG. 2 is a schematic diagram of an image to be processed according to an embodiment of the disclosure;
fig. 3 is a schematic diagram illustrating a region to be segmented on an image to be processed according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of an image block shown in an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of a keypoint heat map shown in one embodiment of the present disclosure;
FIG. 6 is a schematic flow chart diagram illustrating determination of segmentation results using keypoint heat maps and image blocks according to an embodiment of the present disclosure;
FIG. 7 is a diagram illustrating a segmentation result displayed on an image to be processed according to an embodiment of the present disclosure;
fig. 8 is a schematic structural diagram of an image processing apparatus shown in an embodiment of the present disclosure;
fig. 9 is a schematic structural diagram of an electronic device shown in an embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the present disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present disclosure. The word "if" as used herein may be interpreted as "when", "upon", or "in response to a determination", depending on the context.
At least one embodiment of the present disclosure provides an image processing method, please refer to fig. 1, which illustrates a flow of the method, including steps S101 to S103.
The method can be used for performing instance segmentation processing on an image to be processed. Instance segmentation is an important problem in the field of computer vision that combines characteristics of image semantic segmentation and object detection: it assigns independent segmentation labels to different individuals of the same category in an image. For example, if an image contains multiple person instances, instance segmentation separates each person individually and returns pixel-level segmentation labels for each person. Instance segmentation has important applications in many fields, such as scene understanding for autonomous driving, image and video special-effect rendering, object counting, and discrimination and tracking.
Specifically, the method may segment a target object in the image to be processed to obtain a segmentation result of the target object. The target object may be any of at least one preset kind of object in the image to be processed, where a preset kind of object may be at least one of the following: human body, human head, human face, human hand, clothing, vehicle, and specific animals and plants. It can be understood that each preset-kind object among the at least one object in the image to be processed may be determined as the target object in turn, and target-object segmentation performed according to the method; alternatively, every object of the at least one preset kind in the image to be processed may be determined as a target object at the same time, and the method performed to segment each target object simultaneously.
The image to be processed may be an image captured by the image capturing device or a frame of a video recorded by the image capturing device. It can be understood that, in a case where each frame in a video recorded by an image capturing device is used as an image to be processed and processed by the method provided in the embodiment of the present application, the processing of the video is completed.
In addition, the method may be performed by an electronic device such as a terminal device or a server, where the terminal device may be a User Equipment (UE), a mobile device, a User terminal, a cellular phone, a cordless phone, a Personal Digital Assistant (PDA) handheld device, a computing device, a vehicle-mounted device, a wearable device, or the like, and the method may be implemented by a processor calling computer readable instructions stored in a memory. Alternatively, the method may be performed by a server, which may be a local server, a cloud server, or the like.
In step S101, a to-be-processed image is acquired, wherein the to-be-processed image includes at least one object.
The image to be processed may be an image shot by an image acquisition device, or a frame in a video recorded by the image acquisition device, where the image acquisition device may be an electronic device with an image acquisition function, such as a mobile phone and a camera. The at least one object is a preset kind of object, and may be at least one of the following: human body, human head, human face, human hand, clothes, vehicle, specific animals and plants and the like.
The method includes the steps that an image shot by an image acquisition device can be recognized, if the image contains at least one object (for example, if the type of the object is a human face, the image contains at least one human face), the image is obtained to serve as an image to be processed, and if the image does not contain the object, the image is not used as the image to be processed; correspondingly, the video recorded by the image acquisition equipment can be identified, if a certain frame of image contains an object, the frame of image is acquired as an image to be processed, and if the certain frame of image does not contain the object, the frame of image is not taken as the image to be processed.
In one possible scenario, a user records a video with a mobile phone; during recording, a face sometimes appears in the picture and sometimes does not. Frames in which a face appears are taken as images to be processed and undergo segmentation processing, while frames without a face do not. This ensures that faces in the video picture are segmented in real time, while reducing load and saving energy when no face is present, and it also avoids processing the video picture erroneously.
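The frame-selection logic described above reduces to filtering frames through a detector. A minimal sketch, with the detector stubbed out (a real face detector would take its place):

```python
def frames_to_process(frames, contains_object):
    """Keep only frames that contain at least one preset-kind object;
    `contains_object` stands in for a detector such as a face detector."""
    return [f for f in frames if contains_object(f)]

# Toy frames tagged with whether a face is present in the picture.
frames = [{"id": 0, "face": True}, {"id": 1, "face": False}, {"id": 2, "face": True}]
selected = frames_to_process(frames, lambda f: f["face"])
```

Frames that fail the check are never handed to the segmentation stage, which is what saves the load and energy described above.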
In step S102, a target object of the at least one object in the image to be processed is detected, and a key point of the target object is extracted.
The target object in the image to be processed may be detected by using a pre-trained target detection neural network, and the detection result is a position of the target object in the image to be processed, for example, the detection result may be a coordinate of a detection frame capable of surrounding the target object (for example, the position of the detection frame may be identified by an upper left corner coordinate and a lower right corner coordinate of a rectangular detection frame). If each object of at least one object in the image to be processed is detected in sequence, the detection result of the object needing to be detected currently can be output; if each object of at least one object in the image to be processed is detected at the same time, the detection result of each object can be output at the same time.
The position of the detection frame of the target object in the image to be processed is determined according to the detection result of the target object, and key points are extracted at the corresponding positions using a pre-trained target key point detection network. The number of key points to be extracted may be preset, and the key points of the target object extracted according to that number; the position attributes of the key points to be extracted may also be preset according to the type of the target object, where a position attribute characterizes a local position area of the target object. For example, the local position areas of a face may be the eyes, nose, mouth, and contour; key points are then extracted at the corresponding positions of the target object for each position attribute, and each extracted key point is labeled with its position attribute. For example, when extracting the key points of a face, the key points of the eyes, nose, mouth, contour, and so on may be extracted respectively.
The key points can be represented by their position coordinates on the image to be processed, that is, the information of the key points extracted in this step is the coordinate information of the key points on the image to be processed.
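Since key points are plain position coordinates on the image to be processed, mapping them into a detection frame's image block (as later embodiments require) is a translation by the frame's top-left corner. A small sketch, assuming the corner-coordinate box format described above:

```python
def to_block_coords(keypoints, box):
    """Translate keypoint coordinates from the full image into the image
    block cropped by the detection frame, given as top-left (x1, y1) and
    bottom-right (x2, y2) corner coordinates."""
    x1, y1, x2, y2 = box
    return [(x - x1, y - y1) for (x, y) in keypoints]

kps = [(15, 22), (18, 25)]            # coordinates on the image to be processed
local = to_block_coords(kps, (10, 20, 30, 40))
```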
In some embodiments, when the image to be processed includes a plurality of objects, at least two objects to be segmented in the image to be processed may be detected as target objects in this step, and the key points of each of the at least two target objects may be extracted respectively. Therefore, when instance segmentation is performed on at least two objects to be segmented in the image to be processed, the key point of each object is obtained in the step, and then the segmentation processing of each object is guided in the subsequent step (specifically, step S103).
In step S103, the key points of the target object are used as guidance information for segmentation processing, and the target object in the image to be processed is segmented to obtain a segmentation result of the target object.
The target object in the image to be processed may be segmented by using a pre-trained segmentation network model, for example, the image to be processed and the position information of the key point are input to the segmentation network model, so that the segmentation network model outputs the segmentation result of the target object. The segmentation result is used to indicate the position information of the target object in the image to be processed, for example, coordinate information of pixels contained in the target object in the image to be processed. For example, the segmentation result may be represented by a mask image used to mark the position of the pixel included in the target object.
If the plurality of objects in the image to be processed determine the segmentation result according to the method provided by the embodiment, the output segmentation result may be used to indicate the position information of the plurality of target objects in the image to be processed respectively. For example, the segmentation result may be represented by a mask image for labeling positions of pixels included in each of the plurality of target objects, and pixel values of positions of different target objects in the mask image may be set to be different, thereby distinguishing regions where different target objects are located.
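Merging per-object masks into one mask image with distinct pixel values per object, as described above, can be sketched as follows. The overwrite rule where objects overlap is an assumption; the patent does not specify a tie-break:

```python
import numpy as np

def merge_instance_masks(masks):
    """Combine per-object binary masks into one mask image in which each
    target object's pixels take a distinct value (1, 2, ...), so the
    regions of different target objects can be told apart. Later masks
    overwrite earlier ones where they overlap."""
    merged = np.zeros_like(masks[0], dtype=np.uint8)
    for i, m in enumerate(masks, start=1):
        merged[m > 0] = i
    return merged

a = np.array([[1, 1, 0], [0, 0, 0]], dtype=np.uint8)  # first object's mask
b = np.array([[0, 0, 0], [0, 1, 1]], dtype=np.uint8)  # second object's mask
inst = merge_instance_masks([a, b])
```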
In some embodiments, when the image to be processed includes a plurality of objects, each target object in the image to be processed may be segmented in this step with the key points of that target object as guidance information for the segmentation processing, so as to obtain a segmentation result for each target object. That is, step S103 may be performed once per target object. Thus, when instance segmentation is carried out on a plurality of target objects in the image to be processed, the key points of each target object guide its segmentation, so that even when several target objects are close to each other in the same image to be processed, each one can be accurately segmented. When the segmentation network model performs instance segmentation on a plurality of target objects, the guidance of the key points lets the model distinguish which object in the image to be segmented is the target object to be segmented and located, and output the segmentation result of that object. This largely avoids segmentation errors that arise when target objects are close to or overlap each other and the model cannot decide which object's segmentation result to output, and it also mitigates segmentation errors caused by the model's limited learning capacity when object postures are complex.
It is understood that step S102 and step S103 may be executed in parallel on multiple threads: while the target object segmentation of step S103 runs on the current frame image, the target object detection and key point extraction of step S102 may already run on the next frame image, improving the image processing speed. Moreover, the neural network executing step S102 and the neural network executing step S103 can be trained independently, making the training process more stable, the training effect better, and the cost lower.
After the segmentation result of the target object is determined, an identification may be assigned to it, for example a number or a name for the target object. At least one of the following processes may then be performed on the segmentation result of the target object according to its identification: counting processing, positioning processing, content classification processing, special effect rendering processing, and the like. For example, when the target object is a face, the segmentation result may be subjected to beautifying processing or have a special effect added; bullet-screen comments can be kept from blocking the face during video editing; and segments of a designated person can be captured during video playing. After each object in the image to be processed has been segmented and its segmentation result determined, the number of objects in the image to be processed can be counted.
In the embodiment of the application, because the two processes of target object detection and segmentation are decoupled and run independently, when a plurality of continuous images or the video frames of a video are processed, the next image or video frame can be detected while the current one is segmented. This improves the processing efficiency of images and video, and avoids the large computation amount and long computation time incurred when an instance segmentation algorithm in the related art performs the tasks of target detection/positioning and target segmentation simultaneously. In addition, whereas the related art performs instance segmentation directly on image regions with a high activation degree, here the key points of the target object guide the segmentation process. Compared with a high-activation image region, key points convey the target features to be segmented at a finer granularity, so the segmentation precision of the target object improves while computation amount and time consumption stay small. Key points are particularly helpful for segmenting objects with large posture variation; for example, they enhance the segmentation of non-rigid objects (such as the human body), achieving a better segmentation effect when the target structure is complex and the postures are varied.
According to some embodiments of the present disclosure, the target object in the image to be processed may be segmented with the key point of the target object as guidance information for the segmentation process, to obtain a segmentation result of the target object, as follows: firstly, determining a key point heat map of the target object according to the key points of the target object; and then, taking the key point heat map of the target object as guidance information of segmentation processing, and segmenting the target object in the image to be processed to obtain a segmentation result of the target object.
The key point heat map marks the position coordinates of the key points. The position coordinates can be marked through pixel differences in the image, for example through brightness differences: pixels at the positions of key points carry higher-brightness values, and pixels at other positions carry lower-brightness values. The marked coordinates are the coordinates of the key points on the image to which the heat map corresponds: a key point heat map determined from key points in the image to be processed marks their coordinates on the image to be processed, and a heat map determined from key points in the image block of the target object marks their coordinates on that image block. The key point heat map represents the positions of the key points clearly and accurately, which makes it all the more helpful for guiding the segmentation of the target object. When determining the heat map, the position information of a key point can be encoded from point-coordinate form into heat map form by a Gaussian encoding algorithm or an approximate Gaussian encoding algorithm. For example, the coordinates of the key point may serve as the center of a Gaussian circle in the heat map; the hyperparameter sigma, i.e., the radius of the Gaussian circle, may be determined according to the size of the target object; and the pixel parameter at the circle's center (e.g., the brightness value of the pixel) is largest and decreases along the radius, following a two-dimensional Gaussian distribution, down to a preset minimum value.
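The Gaussian encoding described above can be sketched as follows. This is a minimal NumPy sketch under the stated convention (peak value 1.0 at the key point, falling off with a two-dimensional Gaussian); the function name and the normalization are illustrative assumptions, not taken from the disclosure.

```python
import numpy as np

def keypoint_to_heatmap(x, y, height, width, sigma):
    """Encode a key point (x, y) as a 2D Gaussian heat map: the pixel at
    the key point position has the peak value 1.0, and values decrease
    along the radius following a two-dimensional Gaussian; sigma would
    be chosen according to the size of the target object."""
    xs = np.arange(width)[None, :]   # column coordinates, shape (1, W)
    ys = np.arange(height)[:, None]  # row coordinates, shape (H, 1)
    return np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2.0 * sigma ** 2))

hm = keypoint_to_heatmap(x=8, y=5, height=16, width=16, sigma=2.0)
```

A "preset minimum value" floor, as mentioned above, could be applied afterwards with `np.maximum(hm, min_value)`.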
The segmentation of the target object described above can be carried out in either of the following two cases.
In the first case, when determining the key point heat map of the target object according to the key points of the target object, a key point heat map with the size consistent with that of the image to be processed may be generated, wherein the key points of the target object are marked on the key point heat map; and when the key point heat map of the target object is used as guidance information for segmentation processing, the target object in the image to be processed is segmented to obtain a segmentation result of the target object, the image to be processed and the key point heat map can be input into a segmentation network model trained in advance for segmentation processing, and the segmentation result of the target object marked on the image to be processed is output.
In this case, the key points of the target object can be mapped directly onto the key point heat map, i.e., the positions of the key points on the image to be processed coincide with their positions on the heat map, and the segmentation result of the target object marked on the image to be processed is obtained directly, making this approach simple and convenient, with few steps.
In a second case, when determining the key point heat map of the target object according to its key points, a region to be segmented of the target object may first be determined from the image to be processed according to the key points, where the region to be segmented is an image block of the target object; next, the key points of the target object are mapped into the image block; and finally, a key point heat map consistent with the size of the image block is generated, with the key points of the target object marked in it. When this key point heat map is used as guidance information for segmentation processing and the target object in the image to be processed is segmented, the heat map and the image block may be input into a pre-trained segmentation network model for segmentation processing, which outputs the segmentation result of the target object marked on the image block; the segmentation result marked on the image block is then mapped into the image to be processed according to the position of the region to be segmented of the target object within the image to be processed.
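The coordinate bookkeeping in this second case — cropping the image block, shifting key points into block coordinates, and pasting the block-level mask back — is simple offset arithmetic. The sketch below is illustrative only; the function names and the `(x0, y0, x1, y1)` region convention are assumptions.

```python
import numpy as np

def crop_block(image, region):
    """Crop the target object's image block; region = (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = region
    return image[y0:y1, x0:x1]

def map_keypoints_to_block(keypoints, region):
    """Shift key points from image coordinates into block coordinates."""
    x0, y0 = region[0], region[1]
    return [(x - x0, y - y0) for x, y in keypoints]

def paste_block_mask(full_shape, block_mask, region):
    """Map the segmentation result marked on the image block back into
    the coordinate frame of the image to be processed."""
    x0, y0, x1, y1 = region
    full = np.zeros(full_shape, dtype=block_mask.dtype)
    full[y0:y1, x0:x1] = block_mask
    return full

img = np.arange(64, dtype=np.float32).reshape(8, 8)
region = (2, 1, 6, 4)
block = crop_block(img, region)
block_kps = map_keypoints_to_block([(3, 2)], region)
full = paste_block_mask((8, 8), np.ones(block.shape, dtype=np.uint8), region)
```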
Optionally, the region to be segmented is determined as follows. First, the center point of all the key points of the target object is determined. Then, the extreme key points in the up, down, left and right directions are taken. The point reached by extending the line from the center point to the leftmost key point by a preset multiple (for example, by 0.5 times) is taken as the left boundary point of the region to be segmented; likewise, extending the line to the rightmost key point gives the right boundary point, to the uppermost key point the upper boundary point, and to the bottommost key point the lower boundary point. Finally, the region to be segmented is determined from the upper, lower, left and right boundary points. It should be noted that the up, down, left and right directions here refer to directions on the image to be processed: the left edge of the image to be processed runs in the up-down direction, and the upper edge runs in the left-right direction.
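The boundary rule above can be expressed compactly: extending the center-to-extreme line by a multiple m means scaling the offset from the center by (1 + m). A minimal sketch, with an illustrative function name and a default multiple of 0.5 as in the example above:

```python
def region_to_be_segmented(keypoints, multiple=0.5):
    """Derive the region to be segmented from a target object's key points.
    The line from the key points' center to the extreme key point in each
    direction is extended by `multiple` (i.e. the offset is scaled by
    1 + multiple) to obtain the boundary in that direction."""
    xs = [p[0] for p in keypoints]
    ys = [p[1] for p in keypoints]
    cx, cy = sum(xs) / len(xs), sum(ys) / len(ys)  # center of key points
    scale = 1.0 + multiple
    left   = cx + (min(xs) - cx) * scale
    right  = cx + (max(xs) - cx) * scale
    top    = cy + (min(ys) - cy) * scale
    bottom = cy + (max(ys) - cy) * scale
    return left, top, right, bottom

box = region_to_be_segmented([(0, 0), (4, 0), (0, 4), (4, 4)])
```

In practice the result would also be clipped to the image bounds, a detail omitted here.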
Optionally, the detection frame determined by detecting the target object in step S102 is used as the to-be-segmented region of the target object.
In one example, instance segmentation is performed simultaneously on the four faces in the image to be processed shown in fig. 2. The four faces are first detected, yielding a detection frame for each. After the key points of each face are extracted, the region to be segmented of each face is determined from its key points; the result is shown in fig. 3. Then, taking the face on the left as an example, the picture within its region to be segmented is taken as an image block, as shown in fig. 4, and the key points of the face are mapped into the image block. Next, a key point heat map is generated, shown in fig. 5, consistent with the size of the image block of fig. 4 and marked with the key points of that image block. Then, as shown in fig. 6, the image block of fig. 4 and the key point heat map of fig. 5 are input into the segmentation network model, which outputs the segmentation result of the face. The segmentation results of the other faces are determined by the same process, and finally the segmentation result displayed on each face's image block is converted to the image to be processed; the result is shown in fig. 7.
In this case, the key point heat map is determined for the image block where the target object is located, the target object is segmented within that image block, and the segmentation result is converted back into the image to be processed. This reduces the computation load and improves computation efficiency, and segmenting within a specific image block can further improve the accuracy of the segmentation result.
In both cases, the key point heat map and the image to be segmented (i.e., the image to be processed or the image block of the target object) are input into the segmentation network model, which, under the guidance of the heat map, distinguishes which object in the image to be segmented is the target object to be segmented and located, and provides that object's segmentation result. Adding the key point heat map as an input largely avoids segmentation errors that arise when several target objects are close to or overlap each other and the model cannot decide which object's segmentation result to output, and it also mitigates segmentation errors caused by the model's limited learning capacity when the target's posture is complex.
In both cases, when determining the key point heat map of the target object, a global key point heat map may be generated according to the key points of the target object, with all of the target object's key points marked on it; the global key point heat map and the image to be segmented are then input into the segmentation network model together.
In both cases, when determining the key point heat map of the target object, a plurality of local key point heat maps with different position attributes may instead be generated according to the key points, where a position attribute characterizes a local position area of the target object; for a human face, the local position areas may be the eyes, nose, mouth, contour, and so on, and each local key point heat map is marked with the key points of its local position area. If the target object is a human face whose key points include eye key points, nose key points, mouth key points, ear key points, and so on, then an eye local heat map may be determined with the eye key points marked on it, a nose local heat map with the nose key points, a mouth local heat map with the mouth key points, an ear local heat map with the ear key points, and so on. All of the local key point heat maps and the image to be segmented are then input into the segmentation network model together. By determining multiple local heat maps with different position attributes and marking the corresponding key points on each, the segmentation network model can infer the posture of the target object from the attributes of the heat maps and the key points on them, thereby segmenting the target object more accurately and improving the accuracy of the segmentation result of the target object.
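One per-attribute heat map channel can be built as below. This is an illustrative sketch: single bright pixels stand in for the Gaussian encoding (which would apply per channel in the same way), and the attribute names and function name are assumptions.

```python
import numpy as np

def local_keypoint_heatmaps(keypoint_groups, height, width):
    """Render one local key point heat map per position attribute.
    `keypoint_groups` maps a position attribute (e.g. 'eyes', 'mouth')
    to the key points of that local position area of the target object."""
    channels = {}
    for attribute, points in keypoint_groups.items():
        hm = np.zeros((height, width), dtype=np.float32)
        for x, y in points:
            hm[y, x] = 1.0  # peak only; Gaussian encoding in practice
        channels[attribute] = hm
    return channels

# Hypothetical face key points grouped by local position area.
groups = {"eyes": [(3, 4), (9, 4)], "mouth": [(6, 9)]}
maps = local_keypoint_heatmaps(groups, height=12, width=12)
```

The channels can then be stacked along a new axis and concatenated with the image to be segmented as network input.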
In one application example of the present disclosure, a complete flow of the image processing method of the present application is provided, and the flow includes step 1 to step 8.
Firstly, step 1 is executed: acquiring an image to be processed, wherein the image to be processed is provided with a plurality of objects;
then step 2 is executed: detecting the positions of all target objects to be segmented in an image to be processed and the positions of key points of all the target objects;
then step 3 is executed: aiming at each target object, determining a region to be segmented of the target object according to a preset rule and the key point of the target object;
then step 4 is executed: extracting an image block of each target object from the image to be processed, namely an image area in the area to be segmented, and mapping the key point of each target object in the image to be processed into the image block of the target object respectively;
then step 5 is executed: encoding the position information of the key points from coordinate form into key point heat maps consistent with the sizes of the image blocks, through a Gaussian encoding algorithm or an approximate Gaussian encoding algorithm; all the key points of a target object may be encoded into one global key point heat map, or encoded separately into local key point heat maps with different position attributes, where a position attribute characterizes a local position area of the target object and each local heat map is marked with the key points of its local position area;
then step 6 is executed: for each target object: combining the image block and the key point heat map and inputting the image block and the key point heat map into a segmentation network model, segmenting the target object by the segmentation network model under the guidance of the key point heat map, and outputting the segmentation result of the target object marked on the image block;
then step 7 is performed: mapping the segmentation result marked on each target object's image block into the image to be processed according to the position of the image block on the image to be processed, thereby obtaining the segmentation result of each object to be segmented marked on the image to be processed; the segmentation result may be displayed through a mask image;
finally, step 8 may be performed: adding a number to the segmentation result of each object to be segmented, and performing counting processing, positioning processing, content classification processing, special effect rendering processing, and the like on the segmentation results of the objects according to the numbers. This completes the instance segmentation and subsequent processing of the image to be processed.
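Steps 1-7 above can be sketched end to end as follows. The detection and segmentation networks are replaced here by hypothetical stand-in functions (`detect_targets`, `segment_block`) so the control flow is runnable; everything about them, including the placeholder behavior, is an assumption for illustration only.

```python
import numpy as np

def detect_targets(image):
    """Stand-in for step 2: returns (region, keypoints) per target object."""
    h, w = image.shape[:2]
    return [((0, 0, w // 2, h // 2), [(2, 2), (5, 5)])]

def segment_block(block, heatmap):
    """Stand-in for the segmentation network of step 6."""
    return (heatmap > 0).astype(np.uint8)  # placeholder block-level mask

def instance_segment(image):
    """Detect targets and key points, crop each image block, encode the
    key points as a heat map of the block's size, segment under heat-map
    guidance, and map each result back into the image to be processed."""
    h, w = image.shape[:2]
    full_mask = np.zeros((h, w), dtype=np.uint8)
    for label, (region, keypoints) in enumerate(detect_targets(image), 1):
        x0, y0, x1, y1 = region                       # steps 3-4: block
        block = image[y0:y1, x0:x1]
        heatmap = np.zeros(block.shape[:2], dtype=np.float32)
        for x, y in keypoints:                        # step 5: encode
            heatmap[y - y0, x - x0] = 1.0  # peak only; Gaussian in practice
        block_mask = segment_block(block, heatmap)    # step 6: segment
        full_mask[y0:y1, x0:x1][block_mask > 0] = label  # step 7: map back
    return full_mask

mask = instance_segment(np.zeros((12, 12), dtype=np.float32))
```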
An embodiment of the present disclosure provides an image processing apparatus, referring to fig. 8, the apparatus including:
an obtaining module 801, configured to obtain an image to be processed, where the image to be processed includes at least one object;
a detecting module 802, configured to detect a target object in the at least one object in the image to be processed, and extract a key point of the target object;
a segmenting module 803, configured to segment the target object in the image to be processed by using the key point of the target object as guidance information for the segmentation processing, so as to obtain a segmentation result of the target object.
In some embodiments of the present disclosure, the image to be processed includes a plurality of objects, and the detection module is specifically configured to:
detecting at least two objects to be segmented in the image to be processed as target objects, and respectively extracting key points of each of the at least two target objects;
the segmentation module is specifically configured to:
and respectively taking the key point of each target object as guide information of segmentation processing, and segmenting each target object in the image to be processed to obtain a segmentation result of each target object.
In some embodiments of the present disclosure, the segmentation module is specifically configured to:
determining a key point heat map of the target object according to the key points of the target object, wherein the key point heat map is used for marking the position coordinates of the key points of the target object;
and taking the key point heat map of the target object as guiding information of segmentation processing, and segmenting the target object in the image to be processed to obtain a segmentation result of the target object.
In some embodiments of the present disclosure, the segmentation module, when determining the keypoint heat map of the target object according to the keypoints of the target object, is specifically configured to:
determining a region to be segmented of the target object from the image to be processed according to the key point of the target object, wherein the region to be segmented of the target object is an image block of the target object;
mapping keypoints of the target object into the image block;
generating a keypoint heat map consistent with the image block size, wherein keypoints of the target object are marked in the keypoint heat map.
In some embodiments of the present disclosure, the segmentation module is configured to segment the target object in the image to be processed by using the keypoint heat map of the target object as guidance information for segmentation processing, and when obtaining a segmentation result of the target object, the segmentation module is specifically configured to:
inputting the key point heat map and the image block into a pre-trained segmentation network model for segmentation processing, and outputting a segmentation result of the target object marked on the image block;
and mapping the segmentation result of the target object marked on the image block into the image to be processed according to the position of the region to be segmented of the target object in the image to be processed.
In some embodiments of the present disclosure, the segmentation module, when determining the keypoint heat map of the target object according to the keypoints of the target object, is specifically configured to:
generating a key point heat map consistent with the size of the image to be processed, wherein key points of the target object are marked on the key point heat map;
the segmentation module is configured to segment the target object in the image to be processed by using the key point heatmap of the target object as guidance information for segmentation processing, and when a segmentation result of the target object is obtained, is specifically configured to:
inputting the image to be processed and the key point heat map into a pre-trained segmentation network model for segmentation processing, and outputting the segmentation result of the target object marked on the image to be processed.
In some embodiments of the present disclosure, the segmentation module, when determining the keypoint heat map of the target object according to the keypoints of the target object, is specifically configured to:
generating a global key point heat map according to the key points of the target object, wherein all the key points of the target object are marked on the global key point heat map; or,
generating a plurality of local keypoint heat maps with different position attributes according to the keypoints of the target object, wherein the position attributes are used for characterizing local position areas of the target object, and each local keypoint heat map is marked with the keypoints of the local position area of the target object.
In some embodiments of the present disclosure, a processing module is further included for:
determining the identification of the segmentation result of the target object, and performing at least one of the following processes on the segmentation result of the target object according to the identification: counting processing, positioning processing, content classification processing and special effect rendering processing.
With regard to the apparatus in the above embodiments, the specific manner in which each module performs its operations has been described in detail in the embodiments of the method, and will not be elaborated here.
In a third aspect, at least one embodiment of the present disclosure provides an electronic device, please refer to fig. 9, which illustrates a structure of the electronic device, where the electronic device includes a memory for storing computer instructions executable on a processor, and the processor is configured to process an image based on the method according to any one of the first aspect when executing the computer instructions.
In a fourth aspect, at least one embodiment of the disclosure provides a computer readable storage medium having a computer program stored thereon, which when executed by a processor, performs the method of any of the first aspects.
The disclosure relates to the field of augmented reality. By acquiring image information of a target object in a real environment and detecting or identifying its relevant features, states, and attributes by means of various vision-related algorithms, an AR effect combining virtuality and reality, matched to the specific application, is obtained. For example, the target object may be a face, limb, gesture, or action associated with a human body; a marker or sign associated with an object; or a sand table, display area, or display item associated with a venue or place. The vision-related algorithms may involve visual localization, SLAM, three-dimensional reconstruction, image registration, background segmentation, key point extraction and tracking of objects, pose or depth detection of objects, and the like. The specific application may involve interactive scenarios related to real scenes or articles, such as navigation, explanation, reconstruction, and superimposed display of virtual effects, as well as special effect processing related to people, such as makeup beautification, limb beautification, special effect display, and virtual model display. The detection or identification of the target object's relevant features, states, and attributes can be realized through a convolutional neural network, i.e., a network model obtained by model training based on a deep learning framework.
In this disclosure, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. The term "plurality" means two or more unless expressly limited otherwise.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
Claims (18)
1. An image processing method, comprising:
acquiring an image to be processed, wherein the image to be processed comprises at least one object;
detecting a target object in the at least one object in the image to be processed, and extracting key points of the target object;
and taking the key points of the target object as guide information of segmentation processing, and segmenting the target object in the image to be processed to obtain a segmentation result of the target object.
2. The image processing method according to claim 1, wherein the image to be processed includes a plurality of objects, and the detecting a target object of the at least one object in the image to be processed and extracting key points of the target object includes:
detecting at least two objects to be segmented in the image to be processed as target objects, and respectively extracting key points of each of the at least two target objects;
the method for segmenting the target object in the image to be processed by taking the key point of the target object as the guide information of segmentation processing to obtain the segmentation result of the target object comprises the following steps:
and respectively taking the key point of each target object as guide information of segmentation processing, and segmenting each target object in the image to be processed to obtain a segmentation result of each target object.
3. The image processing method according to claim 1, wherein the segmenting the target object in the image to be processed by using the key point of the target object as guidance information for segmentation processing to obtain a segmentation result of the target object comprises:
determining a key point heat map of the target object according to the key points of the target object, wherein the key point heat map is used for marking the position coordinates of the key points of the target object;
and taking the key point heat map of the target object as guiding information of segmentation processing, and segmenting the target object in the image to be processed to obtain a segmentation result of the target object.
4. The method of image processing according to claim 3, wherein said determining a keypoint heat map of the target object from the keypoints of the target object comprises:
determining a region to be segmented of the target object from the image to be processed according to the key point of the target object, wherein the region to be segmented of the target object is an image block of the target object;
mapping keypoints of the target object into the image block;
generating a keypoint heat map consistent with the image block size, wherein keypoints of the target object are marked in the keypoint heat map.
5. The image processing method according to claim 4, wherein the segmenting the target object in the image to be processed by using the keypoint heat map of the target object as guidance information of segmentation processing to obtain a segmentation result of the target object comprises:
inputting the key point heat map and the image block into a pre-trained segmentation network model for segmentation processing, and outputting a segmentation result of the target object marked on the image block;
and mapping the segmentation result of the target object marked on the image block into the image to be processed according to the position of the region to be segmented of the target object in the image to be processed.
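The coordinate mapping in claim 5's second step can be sketched as below. The pre-trained segmentation network itself is not shown; `offset` stands for the top-left position of the region to be segmented within the image to be processed:

```python
import numpy as np

def map_back(full_shape, block_mask, offset):
    """Place the segmentation result predicted on the image block back into the
    full image, according to the position of the region to be segmented."""
    x0, y0 = offset
    h, w = block_mask.shape
    full = np.zeros(full_shape, dtype=block_mask.dtype)
    full[y0:y0 + h, x0:x0 + w] = block_mask
    return full
```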
6. The image processing method according to claim 3, wherein the determining a key point heat map of the target object according to the key points of the target object comprises:
generating a key point heat map consistent with the size of the image to be processed, wherein key points of the target object are marked on the key point heat map;
the step of taking the key point heat map of the target object as guidance information for segmentation processing to segment the target object in the image to be processed to obtain a segmentation result of the target object includes:
inputting the image to be processed and the key point heat map into a pre-trained segmentation network model for segmentation processing, and outputting the segmentation result of the target object marked on the image to be processed.
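For the full-image variant of claim 6, the image and the key point heat map are commonly fed to a network as stacked channels; a sketch, assuming CHW layout (an assumption, since the claim does not specify how the two inputs are combined):

```python
import numpy as np

def network_input(image_chw, heatmap):
    """Concatenate a full-image key point heat map with the image channels,
    forming the input of a (here hypothetical) pre-trained segmentation network."""
    assert image_chw.shape[1:] == heatmap.shape, "heat map must match image size"
    return np.concatenate([image_chw, heatmap[None]], axis=0)
```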
7. The image processing method according to any one of claims 3 to 6, wherein the determining a key point heat map of the target object according to the key points of the target object comprises:
generating a global key point heat map according to the key points of the target object, wherein all the key points of the target object are marked on the global key point heat map; or,
generating a plurality of local key point heat maps with different position attributes according to the key points of the target object, wherein the position attributes are used for characterizing local position areas of the target object, and each local key point heat map marks the key points of the corresponding local position area of the target object.
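The global/local alternatives of claim 7 can be sketched as follows, with the position attribute modeled as a per-key-point part label such as 'head' or 'torso' (the labels are illustrative assumptions, not terms fixed by the claim):

```python
import numpy as np

def heatmaps_by_part(shape, keypoints, parts, sigma=2.0):
    """Local variant: one heat map per position attribute,
    each marking only the key points of that local position area."""
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    maps = {}
    for (x, y), part in zip(keypoints, parts):
        blob = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2)).astype(np.float32)
        maps[part] = np.maximum(maps.get(part, np.zeros((h, w), np.float32)), blob)
    return maps

def global_heatmap(shape, keypoints, sigma=2.0):
    """Global variant: a single heat map marking all key points of the target object."""
    return heatmaps_by_part(shape, keypoints, ["all"] * len(keypoints), sigma)["all"]
```

The local variant gives the segmentation network a channel per body region, so it can tell, for example, hand key points from face key points; the global variant collapses them into one channel.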
8. The image processing method according to any one of claims 1 to 6, further comprising:
determining the identification of the segmentation result of the target object, and performing at least one of the following processes on the segmentation result of the target object according to the identification: counting processing, positioning processing, content classification processing and special effect rendering processing.
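Two of the post-processing options in claim 8, counting and positioning, can be sketched as below (content classification and special effect rendering are omitted; the identification here is simply the index of each segmentation result, one possible choice):

```python
import numpy as np

def postprocess(masks):
    """Assign an identification to each segmentation result, then count the
    objects and locate each one by the centroid of its mask."""
    located = {}
    for ident, mask in enumerate(masks):
        ys, xs = np.nonzero(mask)
        located[ident] = (float(xs.mean()), float(ys.mean()))  # (x, y) centroid
    return len(located), located
```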
9. An image processing apparatus characterized by comprising:
an acquisition module configured to acquire an image to be processed, wherein the image to be processed comprises at least one object;
a detection module configured to detect a target object among the at least one object in the image to be processed and to extract key points of the target object;
and a segmentation module configured to segment the target object in the image to be processed by taking the key points of the target object as guidance information for segmentation processing, to obtain a segmentation result of the target object.
10. The image processing apparatus according to claim 9, wherein the image to be processed includes a plurality of objects, and the detection module is specifically configured to:
detecting at least two objects to be segmented in the image to be processed as target objects, and respectively extracting key points of each of the at least two target objects;
the segmentation module is specifically configured to:
and taking the key points of each target object respectively as guidance information for segmentation processing, and segmenting each target object in the image to be processed to obtain a segmentation result of each target object.
11. The image processing apparatus according to claim 9, wherein the segmentation module is specifically configured to:
determining a key point heat map of the target object according to the key points of the target object, wherein the key point heat map is used for marking the position coordinates of the key points of the target object;
and taking the key point heat map of the target object as guidance information for segmentation processing, and segmenting the target object in the image to be processed to obtain a segmentation result of the target object.
12. The image processing apparatus according to claim 11, wherein the segmentation module, when determining the key point heat map of the target object based on the key points of the target object, is specifically configured to:
determining a region to be segmented of the target object from the image to be processed according to the key point of the target object, wherein the region to be segmented of the target object is an image block of the target object;
mapping the key points of the target object into the image block;
and generating a key point heat map consistent in size with the image block, wherein the key points of the target object are marked in the key point heat map.
13. The image processing apparatus according to claim 12, wherein the segmentation module is configured to segment the target object in the image to be processed by using the key point heat map of the target object as guidance information for segmentation processing, and when obtaining a segmentation result of the target object, is specifically configured to:
inputting the key point heat map and the image block into a pre-trained segmentation network model for segmentation processing, and outputting a segmentation result of the target object marked on the image block;
and mapping the segmentation result of the target object marked on the image block into the image to be processed according to the position of the region to be segmented of the target object in the image to be processed.
14. The image processing apparatus according to claim 11, wherein the segmentation module, when determining the key point heat map of the target object based on the key points of the target object, is specifically configured to:
generating a key point heat map consistent with the size of the image to be processed, wherein key points of the target object are marked on the key point heat map;
the segmentation module is configured to segment the target object in the image to be processed by using the key point heat map of the target object as guidance information for segmentation processing, and when a segmentation result of the target object is obtained, is specifically configured to:
inputting the image to be processed and the key point heat map into a pre-trained segmentation network model for segmentation processing, and outputting the segmentation result of the target object marked on the image to be processed.
15. The image processing apparatus according to any one of claims 11 to 14, wherein the segmentation module, when determining the key point heat map of the target object based on the key points of the target object, is specifically configured to:
generating a global key point heat map according to the key points of the target object, wherein all the key points of the target object are marked on the global key point heat map; or,
generating a plurality of local key point heat maps with different position attributes according to the key points of the target object, wherein the position attributes are used for characterizing local position areas of the target object, and each local key point heat map marks the key points of the corresponding local position area of the target object.
16. The image processing apparatus according to any one of claims 9 to 14, further comprising a processing module configured to:
determining the identification of the segmentation result of the target object, and performing at least one of the following processes on the segmentation result of the target object according to the identification: counting processing, positioning processing, content classification processing and special effect rendering processing.
17. An electronic device, comprising a memory and a processor, wherein the memory is configured to store computer instructions executable on the processor, and the processor is configured to implement the method of any one of claims 1 to 8 when executing the computer instructions.
18. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method of any one of claims 1 to 8.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111440316.9A CN113850245A (en) | 2021-11-30 | 2021-11-30 | Image processing method, image processing device, electronic equipment and storage medium |
PCT/CN2022/134844 WO2023098635A1 (en) | 2021-11-30 | 2022-11-29 | Image processing |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113850245A true CN113850245A (en) | 2021-12-28 |
Family
ID=78982532
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111440316.9A Pending CN113850245A (en) | 2021-11-30 | 2021-11-30 | Image processing method, image processing device, electronic equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN113850245A (en) |
WO (1) | WO2023098635A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023098635A1 (en) * | 2021-11-30 | 2023-06-08 | 上海商汤智能科技有限公司 | Image processing |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117788486A (en) * | 2023-12-26 | 2024-03-29 | 书行科技(北京)有限公司 | Image segmentation method, device, electronic equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111898642A (en) * | 2020-06-30 | 2020-11-06 | 北京市商汤科技开发有限公司 | Key point detection method and device, electronic equipment and storage medium |
CN111932568A (en) * | 2020-08-10 | 2020-11-13 | 北京金山云网络技术有限公司 | Human body image segmentation method, and training method and device of human body image segmentation model |
CN112837279A (en) * | 2021-01-25 | 2021-05-25 | 孙猛猛 | Tooth distortion detection pre-diagnosis system based on artificial intelligence |
CN113469040A (en) * | 2021-06-30 | 2021-10-01 | 北京市商汤科技开发有限公司 | Image processing method and device, computer equipment and storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111507991B (en) * | 2020-04-20 | 2023-03-21 | 西安邮电大学 | Method and device for segmenting remote sensing image of characteristic region |
CN112381837B (en) * | 2020-11-12 | 2024-09-20 | 联想(北京)有限公司 | Image processing method and electronic equipment |
CN113850245A (en) * | 2021-11-30 | 2021-12-28 | 北京市商汤科技开发有限公司 | Image processing method, image processing device, electronic equipment and storage medium |
- 2021-11-30: CN application CN202111440316.9A filed; status: active, pending
- 2022-11-29: PCT application PCT/CN2022/134844 filed; status: unknown
Also Published As
Publication number | Publication date |
---|---|
WO2023098635A1 (en) | 2023-06-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109359538B (en) | Training method of convolutional neural network, gesture recognition method, device and equipment | |
WO2020228389A1 (en) | Method and apparatus for creating facial model, electronic device, and computer-readable storage medium | |
CN108717531B (en) | Human body posture estimation method based on Faster R-CNN | |
Vazquez et al. | Virtual and real world adaptation for pedestrian detection | |
CN105404392B (en) | Virtual method of wearing and system based on monocular cam | |
CN103177269B (en) | For estimating the apparatus and method of object gesture | |
US9305206B2 (en) | Method for enhancing depth maps | |
CN110619638A (en) | Multi-mode fusion significance detection method based on convolution block attention module | |
WO2023098635A1 (en) | Image processing | |
CN110363047A (en) | Method, apparatus, electronic equipment and the storage medium of recognition of face | |
CN112184705A (en) | Human body acupuncture point identification, positioning and application system based on computer vision technology | |
US20220207266A1 (en) | Methods, devices, electronic apparatuses and storage media of image processing | |
CN111638784B (en) | Facial expression interaction method, interaction device and computer storage medium | |
CN111160291B (en) | Human eye detection method based on depth information and CNN | |
CN113348465B (en) | Method, device, equipment and storage medium for predicting relevance of objects in image | |
CN110413816A (en) | Colored sketches picture search | |
KR20220024494A (en) | Method and system for human monocular depth estimation | |
CN111368751A (en) | Image processing method, image processing device, storage medium and electronic equipment | |
CN110263605A (en) | Pedestrian's dress ornament color identification method and device based on two-dimension human body guise estimation | |
CN109725721A (en) | Human-eye positioning method and system for naked eye 3D display system | |
CN112699834A (en) | Traffic identification detection method and device, computer equipment and storage medium | |
Liu et al. | Stereo video object segmentation using stereoscopic foreground trajectories | |
CN116129473A (en) | Identity-guide-based combined learning clothing changing pedestrian re-identification method and system | |
Afif et al. | Vision-based tracking technology for augmented reality: a survey | |
CN110751163B (en) | Target positioning method and device, computer readable storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40061477 |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20211228 |