CN108694719B - Image output method and device

Info

Publication number
CN108694719B
Authority
CN
China
Prior art keywords
pixel
image
target image
point
human body
Prior art date
Legal status
Active
Application number
CN201710217139.5A
Other languages
Chinese (zh)
Other versions
CN108694719A (en)
Inventor
安山
陈宇
Current Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201710217139.5A priority Critical patent/CN108694719B/en
Publication of CN108694719A publication Critical patent/CN108694719A/en
Application granted granted Critical
Publication of CN108694719B publication Critical patent/CN108694719B/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20112Image segmentation details
    • G06T2207/20132Image cropping
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The application discloses an image output method and device. One embodiment of the method comprises: segmenting a target image based on a background point set and a foreground point set of the target image, and determining the pixel value of each pixel point in the generated first mask image; importing the pixel value of each pixel point of the target image into a pre-generated skin likelihood value detection model for matching to obtain a likelihood value for each pixel point, and determining which pixel points belong to the skin region; performing superpixel segmentation on the target image to generate a target image comprising superpixels; determining the pixel value of each pixel point in a superpixel based on the number of pixel points in the superpixel that belong to the skin region, so as to determine the pixel value of each pixel point in a second mask image of the target image; and outputting the human body image in the target image based on the pixel values of the pixel points in the first mask image and in the second mask image. The embodiment realizes a more accurate and reliable way of extracting human body images.

Description

Image output method and device
Technical Field
The present application relates to the field of computer technologies, specifically to the field of image processing technologies, and in particular to an image output method and apparatus.
Background
Accurately separating the foreground from images or videos (also known as "matting") is a key technique in image processing, video editing, and movie production; it has a history of over twenty years and has been widely studied and applied. In recent years, research has mainly focused on matting natural images (i.e., images with unconstrained backgrounds). Solving the matting problem on natural images requires extra information, usually obtained through interaction with users to construct constraint conditions; this approach needs considerable manual work and yields low efficiency and accuracy on complex images. Meanwhile, when an image segmentation algorithm is used to divide the foreground and background regions of an image, external factors such as illumination and viewing angle can make the matting result inaccurate.
Disclosure of Invention
It is an object of the present application to provide an improved image output method and apparatus to solve the technical problems mentioned in the background section above.
In a first aspect, an embodiment of the present application provides an image output method, including: segmenting a target image containing a human body image based on a background point set and a foreground point set of the target image, generating a first mask image comprising a foreground region and a background region, and determining a pixel value of each pixel point in the first mask image, wherein a background point is a pixel point belonging to the background of the target image, and a foreground point is a pixel point belonging to the foreground of the target image; importing the pixel value of each pixel point of the target image into a pre-generated skin likelihood value detection model for matching to obtain a likelihood value of each pixel point belonging to a skin region, and determining, based on the likelihood values, which pixel points belong to the skin region, wherein the skin likelihood value detection model represents the correspondence between pixel values and likelihood values; performing superpixel segmentation on the target image to generate a target image comprising superpixels; determining the pixel value of each pixel point in a superpixel based on the number of pixel points in the superpixel that belong to the skin region, so as to determine the pixel value of each pixel point in a pre-established second mask image of the target image; and outputting the human body image in the target image based on the pixel value of each pixel point in the first mask image and the pixel value of each pixel point in the second mask image.
In some embodiments, before segmenting the target image based on the background point set and the foreground point set of the target image including the human body image, the method further comprises: determining the contour of a human body image in a target image, acquiring at least one pixel point between the contour and the top edge of the target image, and taking the at least one pixel point as a background point set of the target image; and acquiring at least one pixel point on the face in the target image, and taking the at least one pixel point on the face as a foreground point set of the target image.
In some embodiments, determining the contour of the human body image in the target image comprises: detecting the edge of the human body image in the target image to generate an edge feature image containing the contour lines of the human body image; performing a closing operation on the edge feature image to convert discontinuous contour lines into continuous contour lines; determining the contour line of the target image after the closing operation; and filling the target image containing the closed contour line by adopting an image filling algorithm, and acquiring the contour of the human body image from the filled target image.
In some embodiments, obtaining at least one pixel point between the contour and the top edge of the target image, and using the at least one pixel point as the background point set of the target image, includes: determining the midpoint of a connecting line between the point in the contour closest to the top edge of the target image and any point on the top edge, and taking the set of points on the line segment that passes through the midpoint and is parallel to the top edge as the background point set of the target image.
In some embodiments, obtaining at least one pixel point on the face in the target image and using the at least one pixel point as the foreground point set of the target image includes: acquiring the pixel points of the two eyes on the face in the target image, and taking the set of points on the line connecting the two eye pixel points as the foreground point set of the target image.
In some embodiments, outputting the human body image in the target image comprises: generating a human body image to be output; constructing a rectangle with preset side length by taking each point on the outline of the human body image to be output as a center; smoothing each pixel point in each rectangle by using Gaussian filtering to obtain a smoothed human body image; and outputting the smoothed human body image.
In some embodiments, determining the pixel value of each pixel point in the superpixel based on the number of pixel points in the superpixel that belong to the skin region comprises: acquiring the total number of pixel points in the superpixel; determining the ratio of the number of pixel points in the superpixel that belong to the skin region to the total number, and determining whether the ratio is greater than a preset ratio threshold; and if so, setting the pixel value of each pixel point in the superpixel to the pixel value of a pixel point in the skin region.
In some embodiments, the pixel values of the pixel points in the foreground region of the first mask image are a preset value, and the pixel values of the pixel points in the skin region of the second mask image are the preset value; and outputting the human body image in the target image based on the pixel values of the pixel points in the first mask image and in the second mask image comprises: acquiring the pixel points whose pixel values in the first mask image are the preset value and the pixel points whose pixel values in the second mask image are the preset value, and generating a pixel point set of the human body image; extracting the region where the pixel points in the pixel point set are located from the target image as the human body image; and outputting the human body image.
In a second aspect, an embodiment of the present application provides an image output apparatus, including: a first determining unit configured to segment a target image containing a human body image based on a background point set and a foreground point set of the target image, generate a first mask image comprising a foreground region and a background region, and determine a pixel value of each pixel point in the first mask image, wherein a background point is a pixel point belonging to the background of the target image, and a foreground point is a pixel point belonging to the foreground of the target image; a second determining unit configured to import the pixel value of each pixel point of the target image into a pre-generated skin likelihood value detection model for matching to obtain a likelihood value of each pixel point belonging to a skin region, and determine, based on the likelihood values, which pixel points belong to the skin region, wherein the skin likelihood value detection model represents the correspondence between pixel values and likelihood values; a generating unit configured to perform superpixel segmentation on the target image and generate a target image comprising superpixels; a third determining unit configured to determine the pixel value of each pixel point in a superpixel based on the number of pixel points in the superpixel that belong to the skin region, so as to determine the pixel value of each pixel point in a pre-established second mask image of the target image; and an output unit configured to output the human body image in the target image based on the pixel value of each pixel point in the first mask image and the pixel value of each pixel point in the second mask image.
In some embodiments, the apparatus further comprises: the fourth determining unit is configured to determine the contour of the human body image in the target image, acquire at least one pixel point between the contour and the top edge of the target image, and use the at least one pixel point as a background point set of the target image; and the fifth determining unit is configured to acquire at least one pixel point on the face in the target image and take the at least one pixel point on the face as a foreground point set of the target image.
In some embodiments, the fourth determination unit comprises: the detection module is configured to detect the edge of the human body image in the target image and generate an edge feature image containing the contour line of the human body image; the closing operation module is configured to perform closing operation on the edge feature image so as to convert discontinuous contour lines in the edge feature image into continuous contour lines; a determining module configured to determine a contour line of the target image after the closing operation; and the filling module is configured to fill the target image containing the contour line after the closing operation by adopting an image filling algorithm and acquire the contour of the human body image in the filled target image.
In some embodiments, the fourth determining unit is further configured to: determine the midpoint of a connecting line between the point in the contour closest to the top edge of the target image and any point on the top edge, and take the set of points on the line segment that passes through the midpoint and is parallel to the top edge as the background point set of the target image.
In some embodiments, the fifth determining unit is further configured to: acquire the pixel points of the two eyes on the face in the target image, and take the set of points on the line connecting the two eye pixel points as the foreground point set of the target image.
In some embodiments, an output unit includes: the first generation module is configured to generate a human body image to be output; the construction module is configured to construct a rectangle with preset side length by taking each point on the outline of the human body image to be output as a center; the smoothing module is configured to perform smoothing on each pixel point in each rectangle by using Gaussian filtering to obtain a smoothed human body image; and the first output module is configured to output the smoothed human body image.
In some embodiments, the third determining unit comprises: an acquisition module configured to acquire the total number of pixel points in the superpixel; a determining module configured to determine the ratio of the number of pixel points in the superpixel that belong to the skin region to the total number, and determine whether the ratio is greater than a preset ratio threshold; and a setting module configured to set, if the ratio is greater than the preset ratio threshold, the pixel value of each pixel point in the superpixel to the pixel value of a pixel point in the skin region.
In some embodiments, the pixel values of the pixel points in the foreground region in the first mask image are preset values, and the pixel values of the pixel points in the skin region in the second mask image are preset values; and an output unit, further comprising: the second generation module is configured to acquire pixel points of which the pixel values in the first mask image are preset values and pixel points of which the pixel values in the second mask image are preset values, and generate a pixel point set of the human body image; the extraction module is configured to extract a region where the pixel points in the pixel point set are located from the target image, and the region is used as a human body image; and the second output module is configured to output the human body image.
In a third aspect, an embodiment of the present application further provides a server, including: one or more processors; the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors realize the image output method provided by the application.
In a fourth aspect, the present application further provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the image output method provided in the present application.
According to the image output method and device provided by the application, the pixel values of the pixel points in the first mask image are determined based on the foreground point set and the background point set of a target image containing a human body image; the pixel values of the pixel points in the second mask image are then determined through skin color detection, superpixel segmentation, and related algorithms; finally, the human body image in the target image is output based on the pixel values of the pixel points in the first mask image and in the second mask image, realizing a more accurate and reliable way of extracting human body images.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of an image output method according to the present application;
FIG. 3 is a flow chart of yet another embodiment of an image output method according to the present application;
fig. 4A is a schematic diagram of a target image including a human body image according to the image output method of the present application;
FIG. 4B is a schematic diagram of an edge feature image according to the image output method of the present application;
FIG. 4C is a schematic diagram of an edge feature image including continuous contour lines according to the image output method of the present application;
FIG. 4D is a diagram of an edge feature image including a maximum contour according to the image output method of the present application;
FIG. 4E is a schematic illustration of a target image after image fill according to the image output method of the present application;
FIG. 4F is a schematic diagram of a target image containing an outline of a human body image according to the image output method of the present application;
FIG. 4G is a schematic diagram of a target image containing a background point set according to the image output method of the present application;
FIG. 4H is a schematic diagram of a target image containing a set of foreground points according to the image output method of the present application;
fig. 4I is a schematic diagram of a first mask image according to the image output method of the present application;
FIG. 4J is a schematic diagram of a target image with skin regions and non-skin regions marked according to the image output method of the present application;
FIG. 4K is a schematic illustration of a target image including superpixels according to the image output method of the present application;
fig. 4L is a schematic diagram of a second mask image according to the image output method of the present application;
fig. 4M is a schematic diagram of a human body image to be output according to the image output method of the present application;
FIG. 4N is a schematic diagram of an image of a jagged edge according to the image output method of the present application;
FIG. 4O is a schematic illustration of an image of a smoothed edge according to the image output method of the present application;
FIG. 4P is a schematic diagram of a smoothed human body image according to the image output method of the present application;
FIG. 5 is a schematic structural diagram of one embodiment of an image output apparatus according to the present application;
FIG. 6 is a schematic block diagram of a computer system suitable for use in implementing a server according to embodiments of the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and embodiments. It is to be understood that the specific embodiments described herein merely illustrate the relevant invention and do not restrict it. It should be noted that, for convenience of description, only the portions relevant to the invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the image output method or image output apparatus of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 1011, 1012, a network 102, a server 103, and an information display apparatus 104. Network 102 serves, among other things, as a medium for providing communication links between terminal devices 1011, 1012 and server 103. Network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The server 103 may interact with the terminal apparatuses 1011, 1012 through the network 102 to transmit or receive target image information or the like; the server 103 may also interact with a local information display device 104 to output images and the like. Various client applications, such as a camera-type application, an image processing-type application, and the like, may be installed on the terminal devices 1011, 1012.
The terminal devices 1011, 1012 may be various electronic devices having a display screen and a camera and supporting information interaction, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 103 may be a server that provides various services, such as a background server that performs image processing on an acquired target image including a human body image. The background server may perform image segmentation, skin color detection, superpixel segmentation, and other processing on the acquired target image, and output a processing result (e.g., a human body image) to be presented on the terminal devices 1011 and 1012 or on the local information display apparatus 104.
The information display device 104 may be various electronic apparatuses having a display screen and locally interacting with the server 103, and may display an image output by the server 103.
It should be noted that the image output method provided in the embodiment of the present application is generally executed by the server 103, and accordingly, the image output apparatus is generally disposed in the server 103.
It should be understood that the number of terminal devices, networks, servers, and information display devices in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, servers, and information display devices, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of an image output method according to the present application is shown. The image output method comprises the following steps:
step 201, based on a background point set and a foreground point set of a target image including a human body image, segmenting the target image, generating a first mask image including a foreground region and a background region, and determining a pixel value of each pixel point in the first mask image.
In this embodiment, an electronic device (for example, a server shown in fig. 1) on which the image output method is executed may first obtain a background point set and a foreground point set of a target image including a human body image, where the human body image included in the target image may be a human face image, a human body upper body image including a skin region, or a human body whole body image including a skin region. Each image can be divided into foreground and background, the background points in the background point set are pixel points belonging to the background of the target image, and the foreground points in the foreground point set are pixel points belonging to the foreground of the target image. Then, based on the background point set and the foreground point set, an image segmentation algorithm may be used to segment the target image.
As an example, the image may be segmented using the Lazy Snapping algorithm, an interactive image segmentation method that proceeds as follows: color models of the foreground and of the background are built from the foreground point set and the background point set, and a Graph Cut algorithm performs energy optimization on a Markov Random Field established over the image to determine the classification of each pixel, i.e., whether it belongs to the foreground region or the background region, thereby partitioning the image into foreground and background. The Graph Cut algorithm decides whether a pixel belongs to the foreground or the background of the image by defining and minimizing an energy function. A Markov Random Field is a random field with the Markov property; the Markov property means that, for a sequence of random variables arranged in time order, the distribution at moment N+1 depends only on the value at moment N and is independent of the values of the random variables before moment N. A random field mainly contains two elements: positions (sites) and a phase space; when each position is randomly assigned a value from the phase space according to some distribution, the whole is called a random field.
In this embodiment, after the target image is segmented, a first mask image including a foreground region and a background region may be generated. In image processing, whether a pixel point is processed depends on whether its mask bit in the mask image is set; if the mask bit is set, the pixel point is not processed. In the first mask image, the pixel value of a pixel point in the background region may be set to 0, and the pixel value of a pixel point in the foreground region to 255. When the foreground region needs to be processed, the mask bits of the pixel points with pixel value 0 (i.e., the pixel points of the background region) can be set; when the background region needs to be processed, the mask bits of the pixel points with pixel value 255 (i.e., the pixel points of the foreground region) can be set. After the first mask image is generated, the electronic device may determine the pixel value of each pixel point in the first mask image according to whether the pixel point belongs to the foreground region or the background region.
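The patent leaves the segmentation algorithm open (Lazy Snapping is given only as an example), and Lazy Snapping has no stock implementation in common open-source libraries. The minimal Python sketch below therefore substitutes OpenCV's GrabCut, a related graph-cut segmentation, to show how seed points could yield a 0/255 first mask image; the function name and iteration count are illustrative assumptions.

```python
import cv2
import numpy as np

def first_mask(image_bgr, foreground_points, background_points, iterations=5):
    # Seed every pixel as "probably background", then pin the user seeds.
    mask = np.full(image_bgr.shape[:2], cv2.GC_PR_BGD, dtype=np.uint8)
    for x, y in background_points:
        mask[y, x] = cv2.GC_BGD          # definite background seed
    for x, y in foreground_points:
        mask[y, x] = cv2.GC_FGD          # definite foreground seed
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(image_bgr, mask, None, bgd_model, fgd_model,
                iterations, cv2.GC_INIT_WITH_MASK)
    # Foreground region -> 255, background region -> 0 (the first mask image).
    fg = (mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD)
    return np.where(fg, 255, 0).astype(np.uint8)
```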
Step 202, importing the pixel value of each pixel point of the target image into a pre-generated skin likelihood value detection model for matching to obtain a likelihood value of each pixel point belonging to a skin area, and determining the pixel point belonging to the skin area in each pixel point based on the likelihood value.
In this embodiment, the electronic device may obtain a skin likelihood value detection model from other electronic devices, or may pre-establish one; the skin likelihood value detection model represents the correspondence between the pixel value of a pixel point and the likelihood value of that pixel point belonging to a skin region. Establishing the model may include: obtaining a preset number of images containing human skin, with the skin regions annotated; obtaining the pixel values of all pixel points in those images; and training the skin likelihood value detection model from the pixel values of the pixel points and the likelihood values of the pixel points belonging to the skin regions, using big data analysis, machine learning, or similar algorithms.
In this embodiment, the electronic device may first obtain a pixel value of each pixel point of the target image, and then introduce the pixel value of each pixel point into the obtained or pre-generated skin likelihood value detection model for matching; then, the likelihood value of the pixel point belonging to the skin area can be output; then, the electronic device may mark the pixel points whose likelihood value is greater than 0 as pixel points belonging to a skin region, and mark the pixel points whose likelihood value is less than or equal to 0 as pixel points belonging to a non-skin region.
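The patent does not fix the internal form of the skin likelihood value detection model. The sketch below assumes a classic color-histogram likelihood-ratio model, which is consistent with the rule above (a likelihood value greater than 0 means skin); the function names and the bin count are illustrative assumptions.

```python
import numpy as np

BINS = 32  # histogram resolution per color channel (assumed)

def train_skin_model(skin_pixels, non_skin_pixels):
    # skin_pixels / non_skin_pixels: (N, 3) arrays of RGB values taken from
    # annotated training images.
    rng = [(0, 256)] * 3
    h_skin, _ = np.histogramdd(skin_pixels, bins=(BINS,) * 3, range=rng)
    h_non, _ = np.histogramdd(non_skin_pixels, bins=(BINS,) * 3, range=rng)
    p_skin = h_skin / max(h_skin.sum(), 1.0)
    p_non = h_non / max(h_non.sum(), 1.0)
    # Log-likelihood ratio: positive where a color is more likely to be skin.
    return np.log((p_skin + 1e-8) / (p_non + 1e-8))

def skin_map(image_rgb, model):
    idx = (image_rgb // (256 // BINS)).reshape(-1, 3)
    scores = model[idx[:, 0], idx[:, 1], idx[:, 2]]
    # Likelihood value > 0 -> skin region; <= 0 -> non-skin region.
    return (scores > 0).reshape(image_rgb.shape[:2])
```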
Step 203, performing superpixel segmentation on the target image to generate a target image comprising superpixels.
In this embodiment, the electronic device may perform superpixel segmentation on the target image and generate a target image including superpixels. As an example, the target image may be segmented using the Simple Linear Iterative Clustering (SLIC) method: the image is converted from its initial color space into the CIELAB color space (a color model theoretically covering all colors visible to the human eye), each pixel is represented as a 5-dimensional feature vector of its color components and XY coordinates, and a distance metric defined on these 5-dimensional vectors is used to locally cluster the pixels of the image.
A superpixel is an irregular block of adjacent pixels with similar texture, color, brightness, and other characteristics that carries a certain visual significance. Superpixel methods group pixels by feature similarity and use a small number of superpixels instead of a large number of pixels to express image features, greatly reducing the complexity of subsequent processing; they are therefore usually used as a preprocessing step for segmentation algorithms, and have been widely applied in computer vision tasks such as image segmentation, pose estimation, target tracking, and target recognition.
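A minimal sketch of this step, assuming scikit-image's slic function; the n_segments and compactness values are assumed tuning parameters (n_segments=255 mirrors the embodiment described later with fig. 4K).

```python
from skimage.segmentation import slic

def superpixel_labels(image_rgb, n_segments=255):
    # Returns an integer label per pixel; pixels sharing a label form one superpixel.
    return slic(image_rgb, n_segments=n_segments, compactness=10, start_label=0)
```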
And 204, determining the pixel value of each pixel point in the super pixels based on the number of the pixel points belonging to the skin area and contained in the super pixels so as to determine the pixel value of each pixel point in the second mask image of the pre-established target image.
In this embodiment, after the pixel points belonging to the skin region are determined in step 202 and the target image including superpixels is generated in step 203, a second mask image with the same size as the target image may be created; the second mask image includes the segmented superpixels. For each superpixel in the second mask image, the pixel value of each pixel point in the superpixel may be determined based on the number of pixel points in the superpixel that belong to the skin region, thereby determining the pixel value of each pixel point in the second mask image.
Step 205, outputting the human body image in the target image based on the pixel value of each pixel point in the first mask image and the pixel value of each pixel point in the second mask image.
In this embodiment, after the pixel value of each pixel point in the first mask image is determined in step 201 and the pixel value of each pixel point in the second mask image is determined in step 204, the foreground region in the first mask image and the skin region in the second mask image may be determined. The pixel points of the foreground region and of the skin region are then obtained to generate a pixel point set; the region where the pixel points in this set are located may be extracted from the target image and output as the human body image.
In some optional implementation manners of this embodiment, after generating the first mask image including the foreground region in step 201, the electronic device may set the pixel values of the pixel points of the foreground region in the first mask image to 255 and the pixel values of the pixel points of the background region to 0; after determining in step 204 whether the pixel value of the pixel points in each superpixel of the second mask image is the pixel value of a skin-region pixel point, the pixel values of the pixel points in the skin region of the second mask image may be set to 255 and those in the non-skin region to 0. The electronic device may obtain the pixel points with pixel value 255 in the first mask image and the pixel points with pixel value 255 in the second mask image, and generate a pixel point set of the human body image; the region where the pixel points in this set are located can then be extracted from the target image and output as the human body image. That is, if the pixel values of a pixel point in the first mask image and in the second mask image are both 0, the corresponding pixel point of the output image is white; otherwise, the pixel value of the pixel point is set to the pixel value of the corresponding pixel point in the target image.
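A sketch of the combination rule just described, assuming both masks use 255 as the preset value; the function name is illustrative.

```python
import cv2
import numpy as np

def output_body_image(target_bgr, first_mask, second_mask):
    # Union of the two pixel sets: a pixel that is foreground in the first
    # mask OR skin in the second mask belongs to the human body image.
    body = cv2.bitwise_or(first_mask, second_mask)
    out = np.full_like(target_bgr, 255)           # both masks 0 -> white
    out[body == 255] = target_bgr[body == 255]    # otherwise copy the target pixel
    return out
```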
According to the method provided by the embodiment of the application, the skin color detection, the super-pixel segmentation and the segmentation algorithm for the foreground and the background of the image are combined, so that a better segmentation result of the human body image can be obtained, and a more accurate and reliable extraction mode of the human body image is realized.
With further reference to fig. 3, a flow 300 of yet another embodiment of an image output method is shown. The flow 300 of the image output method comprises the following steps:
step 301, determining the contour of the human body image in the target image, obtaining at least one pixel point between the contour and the top edge of the target image, and using the at least one pixel point as a background point set of the target image.
In this embodiment, an electronic device (for example, a server shown in fig. 1) on which the image output method operates may first acquire a target image including a human body image, which is input by a user through a terminal or acquired locally, as shown in fig. 4A; then, determining the outline or edge of the human body image in the target image; and then, acquiring at least one pixel point between the contour and the top edge of the target image, and taking the acquired at least one pixel point as a background point set of the target image. Each image can be divided into foreground and background, and the background points in the background point set are pixel points belonging to the background of the target image.
In some optional implementations of the embodiment, the electronic device may first detect the edge of the human body image in the target image, determine the contour lines of the human body image, and generate an edge feature image containing those contour lines, as shown in fig. 4B. As an example, the Canny edge detection algorithm may be used: it is a first-order differential operator detection algorithm that adds non-maximum suppression and a double threshold on top of the first-order differential operator. Non-maximum suppression effectively suppresses multi-response edges and improves edge localization accuracy, and the double threshold effectively reduces the edge miss rate.
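A minimal sketch, assuming OpenCV's Canny implementation; the two hysteresis thresholds are assumed values, since the embodiment only states that a double threshold is used.

```python
import cv2

def edge_feature_image(gray):
    # 50/150 are illustrative low/high hysteresis thresholds.
    return cv2.Canny(gray, 50, 150)
```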
In some optional implementations of the embodiment, after the edge feature image including the contour lines is generated, a closing operation may be performed on the edge feature image to convert discontinuous contour lines into continuous contour lines, as shown in fig. 4C. The closing operation performs a dilation followed by an erosion; it makes contour lines smoother and generally eliminates narrow gaps, long thin gulfs, and small holes, and fills breaks in the contour lines. Dilation works as follows: each pixel of the image is scanned with a structuring element (typically 3 × 3 or 5 × 5), the structuring element is combined with the pixels it covers, and if any covered pixel is 1, the pixel value at the anchor is set to 1; otherwise it is set to 0. Erosion works as follows: each pixel of the image is scanned with the structuring element, and if all covered pixels are 1, the pixel value at the anchor is set to 1; otherwise it is set to 0.
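A sketch of the closing operation, assuming OpenCV; the 3 × 3 structuring element is one of the sizes mentioned above.

```python
import cv2
import numpy as np

def close_contour_lines(edges):
    # Dilation followed by erosion joins discontinuous contour lines.
    kernel = np.ones((3, 3), np.uint8)
    return cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)
```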
In some optional implementations of this embodiment, after the closing operation is performed on the edge feature image, a contour line in the target image after the closing operation is performed on the edge feature image may be determined, where the contour line is generally the largest contour line, that is, the contour line that includes the largest number of pixels in the image, as shown in fig. 4D.
In some optional implementations of this embodiment, after the contour line after the closing operation is determined, the target image containing that contour line may be filled using an image filling algorithm. Image filling is also referred to as planar region filling: given the boundary of a region, all pixel units within the boundary are modified to a specified color. As an example, a flood fill algorithm (also known as seed fill) may be used, which, given a point within a connected domain, starts from that point, finds all remaining points of the connected domain, and fills them with the specified color. In this implementation manner, the pixel point at the lower left corner of the target image containing the maximum contour line may be taken as the starting point; the image is filled, the color of the filled portion is marked as 0, and the color of the unfilled portion is marked as 255, as shown in fig. 4E, where the white area is the unfilled portion. Thereafter, an edge detection algorithm (e.g., the Canny edge detection algorithm) may be employed to obtain the contour of the human body image from the filled target image, as shown in fig. 4F.
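A sketch of the filling step, assuming OpenCV's floodFill; seeding at the lower-left corner and the 0/255 marking follow the embodiment, while the temporary fill value 128 is an implementation assumption.

```python
import cv2
import numpy as np

def fill_from_bottom_left(closed_edges):
    h, w = closed_edges.shape
    mask = np.zeros((h + 2, w + 2), np.uint8)     # floodFill requires a padded mask
    filled = closed_edges.copy()
    cv2.floodFill(filled, mask, (0, h - 1), 128)  # seed: lower-left corner
    # Filled (outside) portion -> 0, unfilled (inside the contour) -> 255.
    return np.where(filled == 128, 0, 255).astype(np.uint8)
```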
In some optional implementations of this embodiment, the electronic device may first determine a closest point in the contour of the human body image to the top edge of the target image, then determine a midpoint on a connecting line between the closest point and any point on the top edge of the target image, and finally take a point set on a line segment that passes through the midpoint and is parallel to the top edge of the target image as a background point set of the target image, as shown in fig. 4G, a point set on a white straight line is the background point set of the target image.
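A sketch of this construction, assuming the contour is available as a list of (x, y) points (e.g., from an edge-following step); names are illustrative.

```python
def background_point_set(contour_points, image_width):
    # y of the contour point closest to the top edge (y = 0 in image coordinates).
    top_y = min(y for _, y in contour_points)
    mid_y = top_y // 2   # midpoint between that point and the top edge
    # Points on the horizontal segment through the midpoint, parallel to the top edge.
    return [(x, mid_y) for x in range(image_width)]
```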
Step 302, at least one pixel point on the face in the target image is obtained, and the at least one pixel point on the face is used as a foreground point set of the target image.
In this embodiment, the electronic device may first identify a position of a face in a target image by using an open-source face detection system, then acquire at least one pixel point on the face, and then use the acquired at least one pixel point as a foreground point set of the target image. The at least one pixel point may be any pixel point on the face, for example, a pixel point on the mouth, a pixel point on the nose, a pixel point on the forehead, and the like.
In some optional implementation manners of this embodiment, the electronic device may obtain the pixel points of the two eyes on the human face and take the set of points on the line connecting them as the foreground point set of the target image; as shown in fig. 4H, the point set on the white straight line is the foreground point set of the target image.
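The patent does not name the open-source face detection system; the sketch below assumes OpenCV's stock Haar cascades and that one face and both eyes are detected.

```python
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_eye.xml")

def foreground_point_set(gray):
    x, y, w, h = face_cascade.detectMultiScale(gray, 1.1, 5)[0]     # first face found
    eyes = eye_cascade.detectMultiScale(gray[y:y + h, x:x + w], 1.1, 5)[:2]
    # Eye centers, converted back to full-image coordinates.
    (x1, y1), (x2, y2) = [(x + ex + ew // 2, y + ey + eh // 2)
                          for ex, ey, ew, eh in eyes]
    # All integer points on the segment joining the two eye centers.
    n = max(abs(x2 - x1), abs(y2 - y1), 1)
    return [(x1 + (x2 - x1) * i // n, y1 + (y2 - y1) * i // n)
            for i in range(n + 1)]
```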
Step 303, segmenting the target image based on the background point set and the foreground point set of the target image including the human body image, generating a first mask image including a foreground region and a background region, and determining a pixel value of each pixel point in the first mask image.
In this embodiment, after the background point set and the foreground point set of the target image are obtained in step 301 and step 302 respectively, the target image may be segmented using an image segmentation algorithm, for example the Lazy Snapping algorithm, based on the background point set and the foreground point set. A first mask image including a foreground region and a background region may then be generated; as shown in fig. 4I, the white region is the foreground region in the first mask image, and the black region is the background region. In image processing, whether a pixel point is processed depends on whether its mask bit in the mask image is set; if the mask bit is set, the pixel point is not processed. After the first mask image is generated, the electronic device may determine the pixel value of each pixel point in the first mask image according to whether the pixel point belongs to the foreground region or the background region.
Step 304, importing the pixel value of each pixel point of the target image into a pre-generated skin likelihood value detection model for matching to obtain the likelihood value of each pixel point belonging to the skin area, and determining the pixel point belonging to the skin area in each pixel point based on the likelihood value.
In this embodiment, the electronic device may obtain a skin likelihood value detection model from other electronic devices, or may pre-establish one; the skin likelihood value detection model represents the correspondence between the pixel value of a pixel point and the likelihood value of that pixel point belonging to a skin region. Establishing the model may include: obtaining a preset number of images containing human skin, with the skin regions annotated; obtaining the pixel values of all pixel points in those images; and training the skin likelihood value detection model from the pixel values of the pixel points and the likelihood values of the pixel points belonging to the skin regions, using big data analysis, machine learning, or similar algorithms.
In this embodiment, the electronic device may import the pixel value of each pixel of the target image into the obtained or pre-generated skin likelihood value detection model for matching; then, the likelihood value of the pixel point belonging to the skin area can be output; then, the electronic device may mark the pixel points with the likelihood value greater than 0 as pixel points belonging to a skin region, and mark the pixel points with the likelihood value less than or equal to 0 as pixel points belonging to a non-skin region, as shown in fig. 4J, a white region is a pixel point of a marked skin region, and a black region is a pixel point of a marked non-skin region.
Step 305, performing superpixel segmentation on the target image to generate a target image comprising superpixels.
In this embodiment, the electronic device may perform superpixel segmentation on the target image and generate a target image including superpixels; as shown in fig. 4K, the number of superpixels is set to 255, so the target image is segmented into 255 blocks. A superpixel is an irregular block of adjacent pixels with similar texture, color, brightness, and other characteristics that carries a certain visual significance. As an example, the SLIC method may be used: the image is converted into the CIELAB color space, each pixel is represented as a 5-dimensional feature vector of its color components and XY coordinates, and a distance metric defined on these vectors is used to locally cluster the pixel points of the image.
Step 306, determining the pixel value of each pixel point in the super-pixel based on the number of the pixel points belonging to the skin region contained in the super-pixel, so as to determine the pixel value of each pixel point in the second mask image of the pre-established target image.
In this embodiment, after the pixel points belonging to the skin region are determined in step 304 and the target image including superpixels is generated in step 305, a second mask image with the same size as the target image may be created; the second mask image includes the segmented superpixels. For each superpixel in the second mask image, the pixel value of each pixel point in the superpixel may be determined based on the number of pixel points in the superpixel that belong to the skin region, thereby determining the pixel value of each pixel point in the second mask image.
In some optional implementation manners of this embodiment, for each super pixel in the second mask image, the electronic device may first obtain the total number of pixel points in the super pixel, and obtain the number of pixel points belonging to a skin region included in the super pixel; then, the ratio of the number of the pixel points belonging to the skin area to the total number can be determined, and whether the ratio is greater than a preset ratio threshold value is determined; if the ratio is greater than the preset ratio threshold, the pixel values of the pixels in the super pixel may be set as the pixel values of the pixels in the skin area, as shown in fig. 4L, the white area is the skin area in the second mask image, and the black area is the non-skin area in the second mask image.
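A sketch of this per-superpixel vote, assuming the superpixel labels from the earlier segmentation step and a boolean skin map; the 0.5 value for the preset ratio threshold is an assumption.

```python
import numpy as np

def second_mask(labels, skin, ratio_threshold=0.5):
    # labels: per-pixel superpixel id; skin: boolean per-pixel skin map.
    mask = np.zeros(labels.shape, np.uint8)
    for sp in np.unique(labels):
        inside = labels == sp
        # Ratio of skin pixels to the total pixel count of this superpixel.
        if skin[inside].mean() > ratio_threshold:
            mask[inside] = 255   # whole superpixel marked as skin region
    return mask
```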
Step 307, generating a human body image to be output based on the pixel value of each pixel point in the first mask image and the pixel value of each pixel point in the second mask image.
In this embodiment, after generating the first mask image including the foreground region in step 303, the electronic device may set a pixel value of a pixel point of the foreground region in the first mask image to 255, and set a pixel value of a pixel point of the background region in the first mask image to 0; after determining whether the pixel value of the pixel point in each super pixel in the second mask image is the pixel value of the pixel point in the skin region in step 306, the pixel value of the pixel point in the skin region in the second mask image may be set to 255, and the pixel value of the pixel point in the non-skin region in the second mask image may be set to 0. The electronic device may obtain a pixel point with a pixel value of 255 in the first mask image and a pixel point with a pixel value of 255 in the second mask image, and generate a pixel point set of the human body image; then, an area where a pixel point in the pixel point set is located may be extracted from the target image, and the area is used as a human body image to be output, as shown in fig. 4M, a non-white area in the image is a human body image to be output.
And 308, constructing a rectangle with preset side length by taking each point on the contour of the human body image to be output as a center.
In this embodiment, since the edge of the human body image to be output generated in step 307 is rough, there may be a sawtooth-shaped edge, as shown in fig. 4N, and therefore, each pixel point on the contour of the human body image to be output may be smoothed. The electronic device may construct a rectangle (e.g., a 5 × 5 structural element) with a predetermined side length with each point on the contour of the human body image to be output as a center.
And 309, smoothing each pixel point in each rectangle by using Gaussian filtering to obtain a smoothed human body image.
In this embodiment, a Gaussian filter may be used to smooth each pixel point within each constructed rectangle to obtain a smoothed human body image, as shown in fig. 4O. Gaussian filtering is essentially low-pass filtering of a signal for the purpose of smoothing it. The specific operation is: each pixel in the image is scanned with a template (also called a convolution kernel or mask), and the value of the pixel at the template's center is replaced by the weighted average gray value of the pixels in the neighborhood determined by the template.
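A sketch of the rectangle-wise smoothing, assuming OpenCV; blurring the whole image once and copying back only the rectangles around contour points is an implementation convenience, not mandated by the embodiment.

```python
import cv2

def smooth_contour_edges(body_image, contour_points, side=5):
    # Blur the whole image once, then paste the blurred patch back only
    # inside the side x side rectangle centred on each contour point.
    blurred = cv2.GaussianBlur(body_image, (side, side), 0)
    out = body_image.copy()
    half = side // 2
    h, w = body_image.shape[:2]
    for x, y in contour_points:
        y0, y1 = max(y - half, 0), min(y + half + 1, h)
        x0, x1 = max(x - half, 0), min(x + half + 1, w)
        out[y0:y1, x0:x1] = blurred[y0:y1, x0:x1]
    return out
```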
And step 310, outputting the smoothed human body image.
In this embodiment, after the smoothing processing of each pixel point in step 309, the electronic device may output the smoothed human body image; the output image is as shown in fig. 4P. The electronic device can display the smoothed human body image on a local display screen, or send it to the terminal device for the terminal device to output.
As can be seen from fig. 3, compared with the embodiment corresponding to fig. 2, the flow 300 of the image output method in this embodiment highlights the steps of acquiring the background point set and the foreground point set of the target image, and performing smoothing processing on the edge of the human body image to be output. Therefore, the scheme described in this embodiment can automatically acquire the background point set and the foreground point set from the target image without manual annotation, and perform smoothing processing on the edge of the image before outputting the human body image.
With further reference to fig. 5, as an implementation of the methods shown in the above figures, the present application provides an embodiment of an image output apparatus, which corresponds to the embodiment of the method shown in fig. 2, and which is particularly applicable in various electronic devices.
As shown in fig. 5, the image output apparatus 500 of the present embodiment includes: a first determining unit 501, a second determining unit 502, a generating unit 503, a third determining unit 504, and an output unit 505. The first determining unit 501 is configured to segment a target image containing a human body image based on a background point set and a foreground point set of the target image, generate a first mask image comprising a foreground region and a background region, and determine the pixel value of each pixel point in the first mask image, where a background point is a pixel point belonging to the background of the target image and a foreground point is a pixel point belonging to the foreground of the target image. The second determining unit 502 is configured to import the pixel value of each pixel point of the target image into a pre-generated skin likelihood value detection model for matching to obtain the likelihood value of each pixel point belonging to a skin region, and determine, based on the likelihood values, which pixel points belong to the skin region, where the skin likelihood value detection model represents the correspondence between pixel values and likelihood values. The generating unit 503 is configured to perform superpixel segmentation on the target image and generate a target image including superpixels. The third determining unit 504 is configured to determine the pixel value of each pixel point in a superpixel based on the number of pixel points in the superpixel that belong to the skin region, so as to determine the pixel value of each pixel point in a pre-established second mask image of the target image. The output unit 505 is configured to output the human body image in the target image based on the pixel values of the pixel points in the first mask image and in the second mask image.
In this embodiment, the first determining unit 501 of the image output apparatus 500 may first acquire a background point set and a foreground point set of a target image including a human body image, where the human body image included in the target image may be a human face image, a human body upper body image including a skin region, or a human body whole body image including a skin region. Each image can be divided into foreground and background, the background points in the background point set are pixel points belonging to the background of the target image, and the foreground points in the foreground point set are pixel points belonging to the foreground of the target image. Then, based on the background point set and the foreground point set, an image segmentation algorithm may be used to segment the target image. After the target image is segmented, a first mask image including a foreground region and a background region may be generated. After generating the first mask image, the first determining unit 501 may determine the pixel value of each pixel in the first mask image according to whether each pixel belongs to a foreground region or a background region.
In this embodiment, the second determining unit 502 may first obtain a pixel value of each pixel point of the target image, and then introduce the pixel value of each pixel point into an obtained or pre-generated skin likelihood value detection model for matching; then, the likelihood value of the pixel point belonging to the skin area can be output; then, the second determining unit 502 may mark the pixel points whose likelihood value is greater than 0 as the pixel points belonging to the skin region, and mark the pixel points whose likelihood value is less than or equal to 0 as the pixel points belonging to the non-skin region.
In this embodiment, the generation unit 503 may perform superpixel segmentation on the target image and generate a target image including superpixels. The super pixel is an irregular pixel block which is formed by adjacent pixels with similar texture, color, brightness and other characteristics and has a certain visual significance.
In this embodiment, after the second determining unit 502 determines the pixels belonging to the skin region and the generating unit 503 generates the target image including the superpixels, the third determining unit 504 may create a second mask image having the same size as the target image, where the second mask image includes the split superpixels, and for each superpixel in the second mask image, the pixel value of each pixel in the superpixel may be determined based on the number of pixels belonging to the skin region included in the superpixel, and the pixel value of each pixel in the second mask image may be determined.
In this embodiment, after the first determining unit 501 determines the pixel value of each pixel in the first mask image, and after the third determining unit 504 determines the pixel value of each pixel in the second mask image, the output unit 505 may determine a foreground region in the first mask image and a skin region in the second mask image, then obtain a pixel point of the foreground region and a pixel point of the skin region, generate a pixel point set, extract a region where a pixel in the pixel point set is located from the target image, use the region as a human body image, and output the human body image.
In some optional implementations of the present embodiment, the image output apparatus 500 may further include a fourth determining unit 506 (not shown in the figure) and a fifth determining unit 507 (not shown in the figure). The fourth determining unit 506 may first obtain a target image containing a human body image, input by a user through a terminal or obtained locally; then determine the contour of the human body image in the target image; and then acquire at least one pixel point between the contour and the top edge of the target image, taking the acquired pixel points as the background point set of the target image. The fifth determining unit 507 may first identify the position of a face in the target image using an open-source face detection system, then obtain at least one pixel point on the face, and take the obtained pixel points as the foreground point set of the target image.
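As a sketch of the fifth determining unit 507, assuming OpenCV's Haar cascade stands in for the unnamed open-source face detection system and taking the face centre as a single seed point (any detector that yields a face rectangle would serve):

    import cv2

    def face_seed_points(target_bgr):
        # Detect the face position and return seed point(s) on the face.
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        gray = cv2.cvtColor(target_bgr, cv2.COLOR_BGR2GRAY)
        x, y, w, h = cascade.detectMultiScale(gray, 1.1, 5)[0]  # first face
        return [(x + w // 2, y + h // 2)]  # face centre as a foreground seed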
In some optional implementations of the present embodiment, the fourth determining unit 506 may include a detection module 5061 (not shown), a closing operation module 5062 (not shown), a determining module 5063 (not shown), and a filling module 5064 (not shown). The detection module 5061 may first detect the edge of the human body image in the target image, determine the contour lines in the human body image, and generate an edge feature image containing the contour lines. After the edge feature image is generated, the closing operation module 5062 may perform a closing operation on it to convert discontinuous contour lines into continuous ones. A closing operation is a dilation operation followed by an erosion operation on the image; it makes contour lines smoother and can generally close narrow breaks and long, thin gaps, eliminate small holes, and fill fractures in a contour line. After the closing operation, the determining module 5063 may determine the contour line in the target image, which is generally the largest contour line, i.e., the contour line containing the largest number of pixel points. The filling module 5064 may then fill the target image containing the closed contour line using an image filling algorithm (also called a planar region filling algorithm, where region filling means that, given the boundary of a region, all pixel units within the boundary are modified to a specified color), and obtain the contour of the human body image in the filled target image using an edge detection algorithm.
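A minimal sketch of this detect-close-fill pipeline, assuming Canny edge detection and a 15x15 structuring element (the thresholds and kernel size are assumed values):

    import cv2
    import numpy as np

    def body_contour(target_bgr):
        gray = cv2.cvtColor(target_bgr, cv2.COLOR_BGR2GRAY)
        edges = cv2.Canny(gray, 50, 150)          # edge feature image
        kernel = np.ones((15, 15), np.uint8)
        # Closing = dilation followed by erosion; bridges broken contour lines.
        closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)
        contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_NONE)
        largest = max(contours, key=len)          # contour with the most points
        filled = np.zeros_like(closed)
        cv2.drawContours(filled, [largest], -1, 255, cv2.FILLED)  # region fill
        # Edge detection on the filled image yields the human body contour.
        outline, _ = cv2.findContours(filled, cv2.RETR_EXTERNAL,
                                      cv2.CHAIN_APPROX_NONE)
        return outline[0].reshape(-1, 2)          # (x, y) contour points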
In some optional implementations of this embodiment, the fourth determining unit 506 may further determine the point on the contour of the human body image closest to the top edge of the target image, then determine the midpoint of the line connecting this closest point with a point on the top edge, and finally take the set of points on the line segment passing through this midpoint and parallel to the top edge as the background point set of the target image.
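For example, choosing the point on the top edge directly above the closest contour point (so that the connecting line is vertical and its midpoint lies at half the contour point's height, one possible reading), the background point set might be derived as:

    def background_seed_points(contour_pts, image_width):
        # Contour point nearest the top edge has the smallest y coordinate.
        top_y = min(p[1] for p in contour_pts)
        mid_y = top_y // 2  # midpoint of the vertical drop to the top edge
        # Scan line through the midpoint, parallel to the top edge.
        return [(x, mid_y) for x in range(image_width)]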
In some optional implementations of this embodiment, the fifth determining unit 507 may obtain the pixel points of the two eyes of the human face and take the set of points on the line connecting them as the foreground point set of the target image.
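A sketch of sampling that connecting line, with the two eye centres assumed to be known from the face detection step:

    import numpy as np

    def eye_line_points(left_eye, right_eye):
        # Sample every pixel point on the segment joining the two eye centres.
        (x1, y1), (x2, y2) = left_eye, right_eye
        n = int(max(abs(x2 - x1), abs(y2 - y1))) + 1
        xs = np.linspace(x1, x2, n).round().astype(int)
        ys = np.linspace(y1, y2, n).round().astype(int)
        return list(zip(xs, ys))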
In some optional implementations of this embodiment, the output unit 505 may further include: a first generation module 5051 (not shown), a construction module 5052 (not shown), a smoothing module 5053 (not shown), and a first output module 5054 (not shown). The first generation module 5051 may extract the region of the target image where the pixel points in the pixel point set are located, taking this region as the human body image to be output. To smooth the contour of this image, the construction module 5052 may construct a rectangle of preset side length centered on each point of the contour. The smoothing module 5053 may then smooth each pixel point within each constructed rectangle using Gaussian filtering to obtain a smoothed human body image, which the first output module 5054 may output.
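A sketch of this contour smoothing, in which the rectangle side length of 15 and the 5x5 Gaussian kernel are both assumed values:

    import cv2

    def smooth_contour(body_img, contour_pts, side=15):
        half = side // 2
        h, w = body_img.shape[:2]
        for x, y in contour_pts:
            # Rectangle of preset side length centred on each contour point,
            # clipped to the image borders.
            x0, y0 = max(x - half, 0), max(y - half, 0)
            x1, y1 = min(x + half + 1, w), min(y + half + 1, h)
            roi = body_img[y0:y1, x0:x1]
            body_img[y0:y1, x0:x1] = cv2.GaussianBlur(roi, (5, 5), 0)
        return body_img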
In some optional implementations of the present embodiment, the third determining unit 504 may include an obtaining module 5041 (not shown in the figure), a determining module 5042 (not shown in the figure), and a setting module 5043 (not shown in the figure). For each superpixel in the second mask image, the obtaining module 5041 may first obtain the total number of pixel points in the superpixel and the number of those pixel points that belong to the skin region; the determining module 5042 may then compute the ratio of the skin-region count to the total and determine whether it is greater than a preset ratio threshold; if it is, the setting module 5043 may set the pixel values of the pixel points in the superpixel to the pixel value of a pixel point in the skin region.
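A sketch of this per-superpixel vote, with the 0.5 ratio threshold an assumed preset and 255 standing for the skin-region pixel value, as in the optional implementation described next:

    import numpy as np

    def second_mask(labels, skin_map, ratio_threshold=0.5):
        # labels: superpixel label map; skin_map: boolean skin marks.
        mask = np.zeros(labels.shape, np.uint8)
        for sp in np.unique(labels):
            inside = labels == sp
            total = int(inside.sum())
            skin = int(np.logical_and(inside, skin_map).sum())
            if total > 0 and skin / total > ratio_threshold:
                mask[inside] = 255  # whole superpixel takes the skin value
        return mask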
In some optional implementations of this embodiment, the output unit 505 may further include: a second generation module 5055 (not shown), an extraction module 5056 (not shown), and a second output module 5057 (not shown). After generating the first mask image, the first determining unit 501 may set the pixel value of the foreground-region pixel points in the first mask image to 255; likewise, after the third determining unit 504 decides, for each superpixel in the second mask image, whether its pixel points take the skin-region pixel value, the pixel value of the skin-region pixel points in the second mask image may be set to 255. The second generation module 5055 may then obtain the pixel points with pixel value 255 in the first mask image and those with pixel value 255 in the second mask image to generate the pixel point set of the human body image; the extraction module 5056 may extract the region of the target image where these pixel points are located; and the second output module 5057 may take this region as the human body image and output it.
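A sketch of this final combination, with the output file name purely illustrative:

    import cv2

    def output_body_image(target_bgr, first_mask_img, second_mask_img,
                          path="human_body.png"):
        # A pixel belongs to the human body image when it is foreground in
        # the first mask or skin in the second mask (both marked 255).
        combined = cv2.bitwise_or(first_mask_img, second_mask_img)
        body = cv2.bitwise_and(target_bgr, target_bgr, mask=combined)
        cv2.imwrite(path, body)
        return body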
Referring now to FIG. 6, a block diagram of a computer system 600 suitable for implementing a server according to embodiments of the present application is shown. The server shown in fig. 6 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present application.
As shown in fig. 6, the computer system 600 includes a Central Processing Unit (CPU) 601 that can perform various appropriate actions and processes according to a program stored in a Read-Only Memory (ROM) 602 or a program loaded from a storage section 608 into a Random Access Memory (RAM) 603. The RAM 603 also stores various programs and data necessary for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to one another via a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605: an input section 606 including a keyboard, a mouse, and the like; an output section 607 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 608 including a hard disk and the like; and a communication section 609 including a network interface card such as a LAN card or a modem. The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read therefrom can be installed into the storage section 608 as needed.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, an embodiment of the present disclosure includes a computer program product comprising a computer program embodied on a computer-readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 609, and/or installed from the removable medium 611. When executed by the Central Processing Unit (CPU) 601, the computer program performs the above-described functions defined in the method of the present application.

It should be noted that the computer-readable medium mentioned in the present application may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer-readable signal medium, by contrast, may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electromagnetic signals, optical signals, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium, other than a computer-readable storage medium, that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wire, fiber-optic cable, RF, or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or by hardware. The described units may also be provided in a processor, which may, for example, be described as: a processor including a first determining unit, a second determining unit, a generating unit, a third determining unit, and an output unit. The names of these units do not, in some cases, constitute a limitation on the units themselves. For example, the generating unit may also be described as "a unit that performs superpixel segmentation on the target image to generate a target image including superpixels".
As another aspect, the present application also provides a computer-readable medium, which may be included in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer-readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: segment the target image based on a background point set and a foreground point set of a target image containing a human body image, generate a first mask image comprising a foreground region and a background region, and determine the pixel value of each pixel point in the first mask image, wherein a background point is a pixel point belonging to the background of the target image, and a foreground point is a pixel point belonging to the foreground of the target image; import the pixel value of each pixel point of the target image into a pre-generated skin likelihood value detection model for matching to obtain the likelihood value that each pixel point belongs to a skin region, and determine, based on the likelihood values, the pixel points belonging to the skin region, wherein the skin likelihood value detection model represents the correspondence between pixel values and likelihood values; perform superpixel segmentation on the target image to generate a target image comprising superpixels; determine the pixel value of each pixel point in a superpixel based on the number of pixel points belonging to the skin region contained in the superpixel, so as to determine the pixel value of each pixel point in a pre-established second mask image of the target image; and output the human body image in the target image based on the pixel values of the pixel points in the first mask image and the pixel values of the pixel points in the second mask image.
The foregoing description covers only the preferred embodiments of the present application and illustrates the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention is not limited to technical solutions formed by the specific combination of the above features, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the scope defined by the appended claims, for example, technical solutions formed by interchanging the above features with (but not limited to) features having similar functions disclosed in the present application.

Claims (16)

1. An image output method, characterized in that the method comprises:
segmenting a target image containing a human body image based on a background point set and a foreground point set of the target image, generating a first mask image comprising a foreground region and a background region, and determining the pixel value of each pixel point in the first mask image, wherein a background point is a pixel point belonging to the background of the target image, and a foreground point is a pixel point belonging to the foreground of the target image;
importing the pixel value of each pixel point of the target image into a pre-generated skin likelihood value detection model for matching to obtain the likelihood value of each pixel point belonging to a skin area, and determining the pixel point belonging to the skin area in each pixel point based on the likelihood value, wherein the skin likelihood value detection model is used for representing the corresponding relation between the pixel value and the likelihood value;
performing superpixel segmentation on the target image to generate a target image comprising superpixels;
determining the pixel value of each pixel point in a superpixel based on the number of pixel points belonging to the skin region contained in the superpixel, so as to determine the pixel value of each pixel point in a pre-established second mask image of the target image;
outputting the human body image in the target image based on the pixel values of the pixel points in the first mask image and the pixel values of the pixel points in the second mask image, including: acquiring pixel points of a foreground region in the first mask image and pixel points of a skin region in the second mask image, and generating a pixel point set of the human body image; extracting the region where the pixel points in the pixel point set are located from the target image, and taking the region as a human body image; and outputting the human body image.
2. The method according to claim 1, wherein before the segmenting the target image based on a background point set and a foreground point set of the target image including the human body image, the method further comprises:
determining the contour of a human body image in the target image, acquiring at least one pixel point between the contour and the top edge of the target image, and taking the at least one pixel point as a background point set of the target image;
and acquiring at least one pixel point on the face in the target image, and taking the at least one pixel point on the face as a foreground point set of the target image.
3. The method of claim 2, wherein the determining the contour of the human body image in the target image comprises:
detecting the edge of the human body image in the target image to generate an edge characteristic image containing the contour line of the human body image;
closing the edge feature image to convert discontinuous contour lines in the edge feature image into continuous contour lines;
determining the contour line of the target image after the closing operation;
and filling the target image containing the contour line after the closing operation by adopting an image filling algorithm, and acquiring the contour of the human body image in the filled target image.
4. The method according to claim 2 or 3, wherein the obtaining at least one pixel point between the contour and the top edge of the target image, and using the at least one pixel point as a background point set of the target image comprises:
and determining the midpoint of a connecting line of the point closest to the top edge of the target image in the contour and any point on the top edge, and taking a point set on a line segment which passes through the midpoint and is parallel to the top edge as a background point set of the target image.
5. The method of claim 2, wherein the obtaining at least one pixel point on a face in the target image, and using the at least one pixel point on the face as a foreground point set of the target image comprises:
and acquiring pixel points of two eyes on the face of the target image, and taking a point set on a connecting line of the pixel points of the two eyes as a foreground point set of the target image.
6. The method according to claim 1 or 2, wherein the outputting the human body image in the target image comprises:
generating a human body image to be output;
constructing a rectangle with preset side length by taking each point on the contour of the human body image to be output as a center;
smoothing each pixel point in each rectangle by using Gaussian filtering to obtain a smoothed human body image;
and outputting the smoothed human body image.
7. The method according to claim 1, wherein determining the pixel value of each pixel point in the superpixel based on the number of pixel points belonging to the skin region contained in the superpixel comprises:
acquiring the total number of pixel points in the super pixels;
determining the ratio of the number of pixel points belonging to the skin area and contained in the super pixels to the total number, and determining whether the ratio is greater than a preset ratio threshold value;
and if the ratio is greater than the preset ratio threshold, setting the pixel value of each pixel point in the superpixel to the pixel value of a pixel point in the skin region.
8. An image output apparatus, characterized in that the apparatus comprises:
the system comprises a first determining unit, a second determining unit and a third determining unit, wherein the first determining unit is configured to divide a target image based on a background point set and a foreground point set of the target image including a human body image, generate a first mask image including a foreground region and the background region, and determine pixel values of all pixel points in the first mask image, wherein background points are pixel points belonging to the background of the target image, and foreground points are pixel points belonging to the foreground of the target image;
the second determining unit is configured to import the pixel value of each pixel point of the target image into a pre-generated skin likelihood value detection model for matching to obtain a likelihood value that each pixel point belongs to a skin region, and determine the pixel point belonging to the skin region in each pixel point based on the likelihood value, wherein the skin likelihood value detection model is used for representing the corresponding relation between the pixel value and the likelihood value;
a generating unit configured to perform superpixel segmentation on the target image and generate a target image including superpixels;
a third determining unit, configured to determine the pixel value of each pixel point in a superpixel based on the number of pixel points belonging to the skin region contained in the superpixel, so as to determine the pixel value of each pixel point in a pre-established second mask image of the target image;
an output unit, configured to output a human body image in the target image based on the pixel values of the respective pixel points in the first mask image and the pixel values of the respective pixel points in the second mask image, including: the second generation module is configured to acquire pixel points of a foreground region in the first mask image and pixel points of a skin region in the second mask image, and generate a pixel point set of the human body image; the extraction module is configured to extract a region where a pixel point in the pixel point set is located from the target image, and the region is used as a human body image; and the second output module is configured to output the human body image.
9. The apparatus of claim 8, further comprising:
a fourth determining unit, configured to determine a contour of the human body image in the target image, obtain at least one pixel point between the contour and a top edge of the target image, and use the at least one pixel point as a background point set of the target image;
and the fifth determining unit is configured to acquire at least one pixel point on the face in the target image, and use the at least one pixel point on the face as a foreground point set of the target image.
10. The apparatus of claim 9, wherein the fourth determining unit comprises:
the detection module is configured to detect an edge of a human body image in the target image and generate an edge feature image containing a contour line of the human body image;
the closing operation module is configured to perform closing operation on the edge feature image so as to convert discontinuous contour lines in the edge feature image into continuous contour lines;
a determining module configured to determine a contour line of the target image after the closing operation;
and the filling module is configured to fill the target image containing the contour line after the closing operation by adopting an image filling algorithm and acquire the contour of the human body image in the filled target image.
11. The apparatus according to claim 9 or 10, wherein the fourth determining unit is further configured to:
and determining the midpoint of a connecting line of the point closest to the top edge of the target image in the contour and any point on the top edge, and taking a point set on a line segment which passes through the midpoint and is parallel to the top edge as a background point set of the target image.
12. The apparatus according to claim 9, wherein the fifth determining unit is further configured to:
and acquiring pixel points of two eyes on the face of the target image, and taking a point set on a connecting line of the pixel points of the two eyes as a foreground point set of the target image.
13. The apparatus of claim 8 or 9, wherein the output unit comprises:
the first generation module is configured to generate a human body image to be output;
the construction module is configured to construct a rectangle with preset side length by taking each point on the contour of the human body image to be output as a center;
the smoothing module is configured to perform smoothing on each pixel point in each rectangle by using Gaussian filtering to obtain a smoothed human body image;
and the first output module is configured to output the smoothed human body image.
14. The apparatus of claim 8, wherein the third determining unit comprises:
an obtaining module configured to obtain a total number of pixels in the super-pixel;
a determining module configured to determine a ratio of the number of pixel points belonging to the skin region and included in the super-pixel to the total number, and determine whether the ratio is greater than a preset ratio threshold;
and the setting module is configured to set the pixel value of each pixel point in the superpixel to the pixel value of a pixel point in the skin region if the ratio is greater than the preset ratio threshold.
15. A server, comprising:
one or more processors;
a storage device for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to any one of claims 1-7.
16. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN201710217139.5A 2017-04-05 2017-04-05 Image output method and device Active CN108694719B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710217139.5A CN108694719B (en) 2017-04-05 2017-04-05 Image output method and device

Publications (2)

Publication Number Publication Date
CN108694719A CN108694719A (en) 2018-10-23
CN108694719B true CN108694719B (en) 2020-11-03

Family

ID=63841957

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710217139.5A Active CN108694719B (en) 2017-04-05 2017-04-05 Image output method and device

Country Status (1)

Country Link
CN (1) CN108694719B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111179276B (en) * 2018-11-12 2024-02-06 北京京东尚科信息技术有限公司 Image processing method and device
CN111292334B (en) * 2018-12-10 2023-06-09 北京地平线机器人技术研发有限公司 Panoramic image segmentation method and device and electronic equipment
CN109754379A (en) * 2018-12-29 2019-05-14 北京金山安全软件有限公司 Image processing method and device
US20220130047A1 (en) * 2019-02-07 2022-04-28 Commonwealth Scientific And Industrial Research Organisation Diagnostic imaging for diabetic retinopathy
CN110458830B (en) 2019-03-08 2021-02-09 腾讯科技(深圳)有限公司 Image processing method, image processing apparatus, server, and storage medium
CN110827371B (en) * 2019-11-05 2023-04-28 厦门美图之家科技有限公司 Certificate generation method and device, electronic equipment and storage medium
CN112967301A (en) * 2021-04-08 2021-06-15 北京华捷艾米科技有限公司 Self-timer image matting method and device
CN114283224A (en) * 2021-11-02 2022-04-05 北京鸿合爱学教育科技有限公司 Three-color image generation method and related equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130322754A1 (en) * 2012-05-30 2013-12-05 Samsung Techwin Co., Ltd. Apparatus and method for extracting target, and recording medium storing program for performing the method
CN105184787A (en) * 2015-08-31 2015-12-23 广州市幸福网络技术有限公司 Identification camera capable of automatically carrying out portrait cutout and method thereof
CN105631455A (en) * 2014-10-27 2016-06-01 阿里巴巴集团控股有限公司 Image main body extraction method and system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105590312B (en) * 2014-11-12 2018-05-18 株式会社理光 Foreground image dividing method and device
CN106529432B (en) * 2016-11-01 2019-05-07 山东大学 A kind of hand region dividing method of depth integration conspicuousness detection and priori knowledge

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Luminance-adaptive skin color detection based on YCbCr space; Wang Jinting et al.; Computer Systems & Applications; 2007-12-31 (No. 6); pp. 99-102 *

Also Published As

Publication number Publication date
CN108694719A (en) 2018-10-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant