WO2022152116A1 - Image processing method, apparatus, device, storage medium and computer program product - Google Patents

Image processing method, apparatus, device, storage medium and computer program product

Info

Publication number
WO2022152116A1
WO2022152116A1 PCT/CN2022/071306 CN2022071306W WO2022152116A1 WO 2022152116 A1 WO2022152116 A1 WO 2022152116A1 CN 2022071306 W CN2022071306 W CN 2022071306W WO 2022152116 A1 WO2022152116 A1 WO 2022152116A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
area
ternary
target
region
Prior art date
Application number
PCT/CN2022/071306
Other languages
English (en)
French (fr)
Inventor
冯云龙
陈旭
邰颖
汪铖杰
李季檩
Original Assignee
腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Priority to EP22738995.4A priority Critical patent/EP4276754A1/en
Priority to JP2023524819A priority patent/JP2023546607A/ja
Publication of WO2022152116A1 publication Critical patent/WO2022152116A1/zh
Priority to US17/989,109 priority patent/US20230087489A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • G06T11/20Drawing from basic elements, e.g. lines or circles
    • G06T11/203Drawing of straight lines or curves
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/90Determination of colour characteristics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Definitions

  • the present application relates to the technical field of artificial intelligence, and in particular, to an image processing method, apparatus, device, storage medium and computer program product.
  • image matting is a widely used image processing technology, which specifically refers to separating the foreground area in the image from the background area in the image.
  • in the related art, segmentation is usually used to realize matting: each pixel in the image is classified to obtain block-level segmentation results of different categories, from which the foreground area in the image, such as a portrait area or a building area, is obtained.
  • Embodiments of the present application provide an image processing method, apparatus, device, storage medium, and computer program product.
  • an image processing method executed by a computer device, the method comprising:
  • the foreground area in the first image is the area where the target object is located in the original image
  • the second image is a segmented image of the first target area of the target object;
  • the third image is a segmented image of the second target area of the target object; the sub-areas of the foreground area include the first target area and the second target area;
  • a target ternary image is generated; the target ternary image includes the foreground area and a line drawing area, the line drawing area being obtained by drawing lines on the outline of the foreground area; different sub-regions of the foreground area correspond to different line widths;
  • the target object in the original image is cut out to obtain a target image including the target object.
  • an image processing device comprising:
  • the image segmentation module is used to perform image semantic segmentation on the original image to obtain a first image, a second image and a third image.
  • the foreground area in the first image is the area where the target object is located in the original image
  • the second image is the segmented image of the first target area of the target object
  • the third image is the segmented image of the second target area of the target object
  • the sub-area of the foreground area includes the first target area and the second target area
  • a ternary image generation module configured to generate a target ternary image based on the first image, the second image and the third image, where the target ternary image includes the foreground area and the line drawing area, and the line drawing area is obtained by drawing lines on the outline of the foreground area; different sub-areas of the foreground area correspond to different line widths;
  • the matting module is configured to perform matting processing on the target object in the original image based on the target ternary image to obtain a target image including the target object.
  • a computer device comprising one or more processors and a memory for storing at least one computer-readable instruction, the at least one computer-readable instruction being loaded and executed by the one or more processors to implement the image processing method in the embodiments of the present application.
  • one or more computer-readable storage media having stored therein at least one computer-readable instruction that is loaded and executed by one or more processors to implement the operations performed in the image processing method in the embodiments of the present application.
  • a computer program product comprising computer readable instructions stored in a computer readable storage medium.
  • One or more processors of the computer device read the computer-readable instructions from the computer-readable storage medium, and the one or more processors execute the computer-readable instructions, causing the computer device to perform the image processing method provided in the above embodiments.
  • FIG. 1 is a schematic structural diagram of a high-resolution network provided according to an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of an object context feature representation provided according to an embodiment of the present application.
  • FIG. 3 is a schematic diagram of an implementation environment of an image processing method provided according to an embodiment of the present application.
  • FIG. 5 is a flowchart of another image processing method provided according to an embodiment of the present application.
  • FIG. 6 is a schematic diagram of an image semantic segmentation result provided according to an embodiment of the present application.
  • FIG. 7 is a schematic diagram of a first ternary graph provided according to an embodiment of the present application.
  • FIG. 8 is a schematic diagram of a second ternary graph provided according to an embodiment of the present application.
  • FIG. 9 is a schematic diagram of a third ternary graph provided according to an embodiment of the present application.
  • FIG. 10 is a schematic diagram of a target ternary graph provided according to an embodiment of the present application.
  • FIG. 11 is a schematic diagram of a cutout model provided according to an embodiment of the present application.
  • FIG. 12 is a schematic diagram of a target image provided according to an embodiment of the present application.
  • FIG. 13 is a schematic diagram of an image processing method provided according to an embodiment of the present application.
  • FIG. 14 is a schematic structural diagram of an image processing apparatus provided according to an embodiment of the present application.
  • FIG. 15 is a schematic structural diagram of a terminal according to an embodiment of the present application.
  • the terms first and second are used to distinguish identical or similar items that have substantially the same function and role. It should be understood that there is no logical or timing dependency between the terms "first", "second" and "nth", and that they do not limit the number of items or the execution order. It will also be understood that, although the following description uses the terms first, second, etc. to describe various elements, these elements should not be limited by these terms.
  • first image could be referred to as a second image
  • second image could be referred to as a first image
  • Both the first image and the second image may be images, and in some cases, may be separate and distinct images.
  • at least one refers to one or more than one;
  • for example, at least one image may be any integer number of images greater than or equal to one, such as one image, two images, or three images;
  • multiple refers to two or more than two;
  • for example, multiple images may be any integer number of images greater than or equal to two, such as two images or three images.
  • the image processing solutions provided in the embodiments of the present application may use computer vision technology in artificial intelligence technology.
  • the semantic segmentation processing in this application uses computer vision technology.
  • a high-resolution network can be used to extract image feature information
  • an Object-Contextual Representations (OCR) technology can be used to calculate the semantic category of each pixel in the image.
  • High Resolution Network (HRNET) is a computational model used to obtain image feature information, and it maintains high-resolution representations throughout all operations.
  • HRNET starts with a set of high-resolution convolutions, then progressively adds lower-resolution convolution branches and connects them in parallel.
  • FIG. 1 is a schematic structural diagram of a high-resolution network provided by the present application. As shown in FIG. 1, the network keeps feature maps of different resolutions in parallel, with each resolution forming one branch; throughout the whole process, information is continuously exchanged among the parallel branches through multi-resolution fusion.
  • OCR is a computational model for characterizing the semantic categories of pixels in an image.
  • FIG. 2 is a schematic structural diagram of an object context feature representation provided by the present application. As shown in FIG. 2: first, a rough semantic segmentation result, that is, the soft object regions, is obtained through a middle layer of the backbone network; second, K groups of vectors, that is, the object region representations, are computed from the pixel representations output by a deep layer of the backbone network and the soft object regions, where K>1 and each vector is the feature representation corresponding to one semantic category; third, the relationship matrix between the pixel features and the object region feature representations is calculated; fourth, according to the values relating each pixel feature to the object region feature representations in the relationship matrix, the object region features are weighted and summed to obtain the object contextual feature representation, that is, the OCR; finally, based on the OCR and the pixel features, an augmented representation is obtained as a contextual information enhancement, and the enhanced feature representation is used to predict the semantic category of each pixel.
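  • as a minimal illustration of the computation just described, the following numpy sketch (an assumption of this description, not the patent's actual implementation) derives the object region representations, the relationship matrix, and the augmented representation from pixel features X and soft object regions M; the learned projection layers of the real OCR model are omitted.

```python
import numpy as np

def ocr_augmented_representation(X, M):
    """Sketch of OCR: X are pixel representations (HW, C), M are soft object regions (HW, K)."""
    # Object region representations: one feature vector per semantic category,
    # obtained by weighting the pixel features with the soft region scores.
    weights = M / (M.sum(axis=0, keepdims=True) + 1e-6)   # (HW, K)
    F = weights.T @ X                                      # (K, C)

    # Relationship matrix between each pixel feature and each region representation.
    logits = X @ F.T                                       # (HW, K)
    logits -= logits.max(axis=1, keepdims=True)
    rel = np.exp(logits)
    rel /= rel.sum(axis=1, keepdims=True)                  # softmax over regions

    # Object contextual representation: weighted sum of the region features per pixel.
    context = rel @ F                                      # (HW, C)

    # Augmented representation used to predict the semantic category of each pixel.
    return np.concatenate([X, context], axis=1)            # (HW, 2C)
```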
  • Semantic segmentation For an input image, the process of dividing pixels with the same semantics into the same part or region based on the semantic understanding of each pixel, so as to obtain several different semantic regions.
  • Foreground The subject in the image, such as a portrait in a portrait shot.
  • Image Matting An image processing technique that separates the foreground of an image from the background.
  • Trimap An image that contains three types of markers: foreground, background, and foreground-background mixed areas, and is usually used as the input of the matting model together with the original image. It should be noted that, in the following embodiments, the foreground/background mixed area is also referred to as a line drawing area.
  • Identification value A numerical value used to identify the color of a pixel in an image. For example, an identification value of 255 indicates that the RGB (Red-Green-Blue) color value of the pixel is (255, 255, 255), which is white; an identification value of 0 indicates that the RGB color value of the pixel is (0, 0, 0), which is black; and an identification value of 128 indicates that the RGB color value of the pixel is (128, 128, 128), which is gray.
  • Open Source Computer Vision Library (OpenCV) A cross-platform computer vision and machine learning software library that runs on a variety of operating systems. OpenCV can be used to develop real-time image processing, computer vision and pattern recognition programs.
  • findContours A function in OpenCV for detecting contours in images.
  • drawContours A function in OpenCV for drawing contours in an image.
  • Cutout model A computational model used to calculate the probability that each pixel in the original image belongs to the foreground based on the original image and the ternary map.
  • the matting models include the IndexNet model, the GCAMatting model, and the ContextNet model.
  • FIG. 3 is a schematic diagram of an implementation environment of an image processing method provided according to an embodiment of the present application.
  • the implementation environment includes: a terminal 301 and a server 302 .
  • the terminal 301 and the server 302 can be directly or indirectly connected through wired or wireless communication, which is not limited in this application.
  • the terminal 301 is a smart phone, a tablet computer, a notebook computer, a desktop computer, etc., but is not limited thereto.
  • the terminal 301 can have applications installed and running.
  • the application is a social application, an image processing application, a photographing application, or the like.
  • the terminal 301 is a terminal used by a user, and a social application program runs in the terminal 301 , and the user can extract the portrait in the picture through the social application program.
  • the server 302 may be an independent physical server, a server cluster or distributed system composed of multiple physical servers, or a cloud server providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communications, middleware services, domain name services, security services, Content Delivery Network (CDN), and big data and artificial intelligence platforms.
  • the server 302 is used to provide background services for the applications running on the terminal 301 .
  • the server 302 undertakes the main computing work and the terminal 301 undertakes the secondary computing work; or, the server 302 undertakes the secondary computing work and the terminal 301 undertakes the main computing work; or, the server 302 or the terminal 301 can individually undertake the computing work.
  • the terminal 301 generally refers to one of multiple terminals, and this embodiment only takes the terminal 301 as an example for illustration.
  • the number of the above-mentioned terminals 301 can be larger.
  • the number of the above-mentioned terminals 301 is tens or hundreds, or more, in this case, the implementation environment of the above-mentioned image processing method also includes other terminals.
  • the embodiments of the present application do not limit the number of terminals and device types.
  • the aforementioned wireless or wired networks use standard communication techniques and/or protocols.
  • the network is usually the Internet, but can be any network, including but not limited to a Local Area Network (LAN), a Metropolitan Area Network (MAN), a Wide Area Network (WAN), a mobile, wired or wireless network, a private network, or any combination of virtual private networks.
  • data exchanged over a network is represented using technologies and/or formats including HTML, Extensible Markup Language (XML), and the like.
  • conventional encryption techniques such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN) and Internet Protocol Security (IPsec) can also be used to encrypt all or some of the links.
  • custom and/or dedicated data communication techniques can also be used in place of or in addition to the data communication techniques described above.
  • FIG. 4 is a flowchart of an image processing method provided according to an embodiment of the present application.
  • the application to a computer device is taken as an example for description.
  • the computer equipment can be a terminal or a server, and the method includes the following steps:
  • 401: Perform image semantic segmentation on the original image to obtain a first image, a second image and a third image, where the foreground area in the first image is the area where the target object is located in the original image, the second image is a segmented image of the first target area of the target object, and the third image is a segmented image of the second target area of the target object.
  • the original image refers to an image that needs to be extracted.
  • the target object refers to the object in the original image that needs to be separated to generate the target image.
  • the first image, the second image and the third image are all segmented images; the first image is a segmented image obtained by segmenting the entire target object, so the foreground area in the first image contains all elements of the target object in the original image.
  • the second image is a segmented image obtained by segmenting the local part corresponding to the first target area of the target object, so the foreground area in the second image contains all elements in the first target area of the target object, and the regions other than the first target area belong to the background region in the second image.
  • similarly, the third image is a segmented image obtained by segmenting the local part corresponding to the second target area of the target object, so the foreground area in the third image contains all elements in the second target area of the target object, and the regions other than the second target area belong to the background area in the third image.
  • the target object may include at least one of a portrait of a person, an image of an animal, an image of a plant, and the like in the original image.
  • the foreground area of the target object in the first image is the area used to indicate the entire target object, while the first target area and the second target area are the areas of local parts of the target object; both the first target area and the second target area are sub-areas of the foreground area of the target object in the first image.
  • the first target area may be an area in the target object that needs to be refined.
  • the second target area may be an area of the target object that is related to the first target area and has a different matting refinement requirement than the first target area.
  • the matting refinement requirement of the second target area may be lower than the matting refinement requirement of the first target area.
  • the second target area contains less detail than the first target area, and therefore its matting refinement requirement is lower than that of the first target area.
  • the first target area and the second target area may be a hair area and a face area, respectively.
  • the first target area and the second target area may be a hair area and a head area, respectively.
  • the first target area and the second target area may be a leaf area and a branch area, respectively.
  • the target object may be a portrait of a real person, a portrait of a cartoon character, a portrait of an anime character, or the like.
  • the target ternary image includes a foreground area and a line drawing area, and the line drawing area is obtained by drawing lines on the outline of the foreground area; different sub-areas of the foreground area correspond to different line widths.
  • the sub-area refers to a partial area in the foreground area, including a first target area and a second target area.
  • matting processing refers to a process of separating the target object in the original image from the background area to obtain the target image.
  • a semantic segmentation method is used to obtain a plurality of segmented images containing different regions; further, according to these segmented images, lines of different widths are drawn on the outline of the foreground region to obtain a target ternary image, and finally a target image is generated based on the target ternary image.
  • in the target ternary image, because lines of different widths are drawn on the outline of the foreground area, targeted matting of different regions can be realized.
  • the matting accuracy of each region can also be guaranteed, so that a fine and natural matting result is finally obtained; in addition, the above-mentioned matting process is fully automated, which greatly improves the matting efficiency.
  • FIG. 4 is only the basic flow of the present application, and the solution provided by the present application will be further described below based on a specific implementation manner.
  • FIG. 5 is a flowchart of another image processing method provided according to an embodiment of the present application.
  • the first target area is a hair area; the second target area is a face area.
  • the application to a terminal is taken as an example for description. The method includes the following steps:
  • the terminal provides a cutout function
  • the user can perform a cutout operation on the terminal
  • the terminal obtains the original image in response to the cutout operation.
  • the original image is a local image stored on the terminal, or, the original image is an online image, and this embodiment of the present application does not limit the source of the original image.
  • an image processing interface for the original image is displayed on the terminal, and the image processing interface includes a cutout option, a cropping option, etc.; the user can select the cutout option, and the terminal obtains the original image in response to the selection operation.
  • alternatively, an image processing interface is displayed on the terminal, and the image processing interface includes a cutout option; the user can select the cutout option, and the terminal displays an image selection interface in response to the selection operation; the user clicks the image to be cut out to select the original image, and the terminal acquires the original image in response to the click operation.
  • the image segmentation model is used to calculate the semantic category of each pixel in the original image according to the input original image, so as to output at least one image of the original image.
  • the image segmentation model is an HRNET-OCR model, which is a computational model combining the HRNET model and the OCR model.
  • the calculation process of the HRNET-OCR model is as follows: first, the features of the original image are extracted through the HRNET model to obtain the feature information of the original image; secondly, the obtained feature information is input into the backbone network of the OCR model; thirdly, the semantic category of each pixel in the original image is calculated based on the OCR model; for example, the semantic categories include hair, nose, eyes, torso, clothing, buildings, and so on; finally, based on the semantic category of each pixel, at least one image of the original image is output.
  • the specific calculation process of the above HRNET-OCR model has been described in detail with reference to FIG. 1 and FIG. 2 , so it will not be repeated here.
  • At least one image of the original image may be output by adjusting some structures in the above HRNET-OCR model, and the embodiments of the present application do not limit the structure composition of the HRNET-OCR model.
  • the above image segmentation model may also be implemented by other network models, and the embodiment of the present application does not limit the type of the image semantic segmentation model.
  • the first image includes a foreground area where the target object is located in the original image
  • the second image is a segmented image of the target object's hair area
  • the third image is a segmented image of the target object's face area.
  • the terminal can obtain three segmented images, that is, the first image, the second image and the third image in this step.
  • each image includes two kinds of regions, and the two kinds of regions are respectively marked with different identification values.
  • for example, the first image includes a foreground region and a background region, wherein the identification value of each pixel in the foreground region is 255 and the identification value of each pixel in the background region is 0. It should be noted that, in practical applications, the developer can flexibly set the identification values according to requirements, which is not limited in this embodiment of the present application.
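  • as a hedged sketch of how such segmented images could be assembled from a per-pixel semantic label map using the identification values described above, the following Python snippet uses hypothetical label ids (HAIR_LABEL, FACE_LABEL, TORSO_LABEL) chosen purely for illustration; the categories output by the real segmentation model may differ.

```python
import numpy as np

# Hypothetical label ids, for illustration only.
HAIR_LABEL, FACE_LABEL, TORSO_LABEL = 1, 2, 3
PERSON_LABELS = [HAIR_LABEL, FACE_LABEL, TORSO_LABEL]

def masks_from_labels(label_map):
    """Turn a per-pixel semantic label map into the three segmented images,
    marking each region of interest with identification value 255 and the rest with 0."""
    first = np.where(np.isin(label_map, PERSON_LABELS), 255, 0).astype(np.uint8)   # whole person
    second = np.where(label_map == HAIR_LABEL, 255, 0).astype(np.uint8)            # hair area
    third = np.where(label_map == FACE_LABEL, 255, 0).astype(np.uint8)             # face area
    return first, second, third
```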
  • FIG. 6 is a schematic diagram of an image semantic segmentation result provided by an embodiment of the present application.
  • the original image is a portrait of a person
  • the image shown in (a) in FIG. 6 is the first image
  • the first image includes a foreground area 1 and a background area 2
  • the foreground area 1 contains all elements of the portrait of the person
  • the image shown in (b) in Figure 6 is the second image
  • the second image includes the hair area 3 and the background area 4
  • Figure 6 (c) shows the third image
  • the third image includes a face area 5 and a background area 6.
  • generate a first ternary image based on the first image and the second image, where the first ternary image includes a foreground area, a first line drawing sub-area, and a second line drawing sub-area.
  • the first line-drawing sub-region covers the outline of the hair region on the side of the hair region close to the background region in the first image
  • the second line-drawing sub-region covers the non-hair region in the foreground region
  • the non-hair region is the region other than the hair region in the foreground region.
  • the first line width is greater than the second line width
  • the first line width is used for drawing the first line drawing sub-region
  • the second line width is used for drawing the second line drawing sub-region.
  • the first ternary image further includes a background area, and the identification values of the first line drawing sub-area and the second line drawing sub-area are different from the identification value of the foreground area and the identification value of the background area.
  • the identification value of each pixel in the foreground area is 255
  • the identification value of each pixel in the background area is 0
  • the identification value of each pixel in the first line drawing sub-region and the second line drawing sub-region is 128.
  • the developer can flexibly set the identification value of the line drawing area according to requirements, which is not limited in this embodiment of the present application.
  • FIG. 7 is a schematic diagram of a first ternary graph provided by an embodiment of the present application.
  • the first ternary diagram includes a foreground area 7 , a background area 8 , a first line drawing sub-area 9 and a second line drawing sub-area 10 , wherein the first line drawing sub-area 9 is drawn according to the first line width, and the second line drawing sub-area 10 is drawn according to the second line width.
  • the complete contour line of the foreground area refers to the boundary line between the foreground area and the background area.
  • the terminal obtains the complete contour line of the foreground area in the first image through the contour detection algorithm.
  • the above-mentioned contour detection algorithm may be implemented by the findContours function, which is not limited in this embodiment of the present application.
  • the second ternary image includes a foreground area and a third line-drawing sub-area, and the third line-drawing sub-area covers the complete contour line of the foreground area.
  • the second line width is calculated from the dimensions of the original image. In some embodiments, the second line width can be calculated by the following formula (1):
  • S = min(width, height) / N    (1)
  • where S is the second line width; width and height are the width and height of the original image, respectively; min() is the minimum-value function, so min(width, height) selects the smaller of the width and height of the original image; and N is a default line size factor, for example, N may be 17, which is not limited in this embodiment of the present application.
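  • a minimal Python sketch of formula (1), assuming integer division and flooring the result to at least one pixel (the flooring is an added safeguard, not stated in the text):

```python
def second_line_width(width, height, n=17):
    """Formula (1): the band width scales with the smaller image dimension; n = 17 is the example default."""
    return max(1, min(width, height) // n)
```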
  • the terminal draws a line on the complete contour line according to the second line width through the contour drawing algorithm, and the identification value of the line is different from the identification value of the foreground area and the background area.
  • the above-mentioned contour drawing algorithm can be implemented by the drawContours function. For example, taking the identification value of the foreground area as 255 and the identification value of the background area as 0, the following formula (2) is used to draw a line on the complete contour line of the foreground area:
  • drawContours(segResult, contours, -1, Scalar(128, 128, 128), S)    (2)
  • where segResult is the first image; contours is the complete contour line of the foreground area detected by the findContours function; -1 indicates that all contour lines are operated on; Scalar is the identification value, and Scalar(128, 128, 128) indicates that the color values of the R, G and B channels are all set to 128; and S is the second line width.
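  • the following Python sketch shows one way formulas (2) and (3) could be realized with OpenCV's findContours and drawContours on a single-channel mask; it assumes OpenCV 4.x (where findContours returns two values) and is an illustration rather than the patent's exact implementation:

```python
import cv2

def draw_trimap_band(seg, band_width):
    """Trace the contour of a binary mask (foreground = 255, background = 0) with a
    band of identification value 128, as in formulas (2) and (3)."""
    trimap = seg.copy()
    contours, _ = cv2.findContours(seg, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
    # -1 operates on all detected contours; 128 marks the unknown (line drawing) band.
    cv2.drawContours(trimap, contours, -1, 128, band_width)
    return trimap

# Per the text: the whole-foreground outline uses width S (formula (2)),
# and the hair outline uses 3 * S (formula (3)).
# second_trimap = draw_trimap_band(first_image, S)
# third_trimap  = draw_trimap_band(second_image, 3 * S)
```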
  • the above method of drawing a line is to operate on the obtained complete contour line, that is, to cover the complete contour line.
  • for example, if the complete contour line obtained by the findContours function includes pixel points A1 to A10, then these pixel points are operated on to realize line drawing; that is, the third line drawing sub-region obtained by drawing lines covers both the foreground region and the background region in the first image.
  • FIG. 8 is a schematic diagram of a second ternary graph provided by an embodiment.
  • the left image in FIG. 8 shows the first image
  • the right image in FIG. 8 shows the second ternary image.
  • the second ternary image includes a foreground area 11, a background area 12 and a third line drawing sub-area 13.
  • the third line drawing sub-region 13 is drawn according to the second line width.
  • the complete outline of the hair region refers to the boundary between the hair region and the background region.
  • the terminal acquires the complete contour line of the hair region in the second image through the contour detection algorithm.
  • the above-mentioned contour detection algorithm may be implemented by the findContours function, which is not limited in this embodiment of the present application.
  • the third ternary image includes a hair region and a fourth line-drawing sub-region, and the fourth line-drawing sub-region covers the complete contour line of the hair region.
  • the width of the first line is M times the width of the second line, wherein M is greater than 1.
  • the width of the first line is three times the width of the second line; that is, when the width of the second line is S, the width of the first line is S*3, which is not limited in this embodiment of the present application.
  • segResultHair is the second image
  • contours is the complete contour line of the hair area detected by the findContours function
  • -1 indicates that all contour lines are operated
  • Scalar is the identification value
  • Scalar(128, 128, 128) indicates that the The color values of the R, G, and B channels in the RGB channel are all set to 128
  • S*3 is the first line width.
  • FIG. 9 is a schematic diagram of a third ternary graph provided by an embodiment.
  • the left image in FIG. 9 shows the second image, and the right image in FIG. 9 shows the third ternary image.
  • the third ternary image includes a foreground area 14, a background area 15, and a fourth line drawing sub-area 16.
  • the fourth line drawing sub-region 16 is drawn according to the first line width.
  • merging the second ternary graph and the third ternary graph means: taking the maximum identification value of the same position in the two ternary graphs as the identification value of the corresponding position in the first ternary graph.
  • Step A Obtain the first identification value of each pixel in the second ternary image, where the first identification value is used to identify the color of the pixel in the second ternary image.
  • Step B Obtain the second identification value of each pixel point in the third ternary diagram, where the second identification value is used to identify the color of the pixel point in the third ternary diagram.
  • Step C Generate a first ternary graph based on the magnitude relationship between the first identification value and the second identification value.
  • step C includes: comparing the first identification value of the pixel point at any position in the second ternary image with the second identification value of the pixel point at the same position in the third ternary image; the larger of the first identification value and the second identification value is used as the third identification value of the pixel at the same position in the first ternary image, and the third identification value is used to identify the color of the pixel in the first ternary image.
  • Pixel_result = Pixel_leftUp > Pixel_leftDown ? Pixel_leftUp : Pixel_leftDown    (4)
  • where Pixel_result is the first ternary image, that is, the right image in FIG. 7; Pixel_leftUp is the second ternary image, that is, the upper left image in FIG. 7; and Pixel_leftDown is the third ternary image, that is, the lower left image in FIG. 7.
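  • a one-line numpy sketch of the merge in formula (4), taking the element-wise maximum of the two ternary images:

```python
import numpy as np

def merge_trimaps(second_trimap, third_trimap):
    """Formula (4): keep the larger identification value at each pixel position."""
    return np.maximum(second_trimap, third_trimap)
```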
  • the first line drawing sub-region 9 corresponding to the hair region is wider than the second line drawing sub-region 10 corresponding to the other regions.
  • the target ternary image includes a foreground area and a line drawing area, and the line drawing area is obtained by drawing lines on the outline of the foreground area; different sub-areas of the foreground area correspond to different line widths.
  • the foreground region further includes a torso region of the target object, wherein, in the target ternary diagram, the line width corresponding to the hair region is larger than the line width corresponding to the torso region, and the line width corresponding to the torso region is larger than the face The width of the line corresponding to the area.
  • FIG. 10 is a schematic diagram of a target ternary graph provided by an embodiment.
  • the upper right picture in FIG. 10 shows the target ternary graph, which includes a foreground area 17 , a background area 18 and a line drawing area 19 .
  • the line width corresponding to the hair area is larger than the line width corresponding to the torso area
  • the line width corresponding to the torso area is larger than the line width corresponding to the face area.
  • the relationship between the line widths in the line drawing area 19 can continue to refer to the lower right figure in FIG. 10, which includes the line drawing areas 19a, 19b and 19c.
  • 19a represents the line width corresponding to the hair area, 19b represents the line width corresponding to the face area, and 19c represents the line width corresponding to the torso area. As shown in the figure, the line width 19a corresponding to the hair area is larger than the line width 19c corresponding to the torso area, and the line width 19c corresponding to the torso area is larger than the line width 19b corresponding to the face area.
  • the terminal determines the target overlapping area in the first ternary image based on the pixel position of the face region in the third image.
  • the figure includes the target overlapping area 20, which is the overlapping area of the face area in the third image and the second line drawing sub-area in the first ternary image.
  • in the first ternary image, the identification value of the target overlapping area is the identification value of the second line drawing sub-area;
  • the pixel points in the target overlapping area are then assigned the target identification value to generate the target ternary image. For example, taking the identification value of the face area as 255 and the identification value of the second line drawing sub-region as 128, the identification value of the target overlapping area in the first ternary image is originally 128; the pixel points in the target overlapping area are reassigned the identification value 255 to obtain the target ternary image.
  • the above steps 5051 and 5052 can be implemented by the following formula (5):
  • Pixel_trimap = (Pixel ∈ {Face}) ? 255 : Pixel_trimap    (5)
  • {Face} represents the face area
  • 255 represents the target identification value
  • Pixel_trimap is the target ternary image.
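  • a minimal sketch of formula (5), assuming the ternary image and the face mask are single-channel arrays carrying the identification values described above:

```python
import numpy as np

def protect_face(trimap, face_mask, target_value=255):
    """Formula (5): pixels inside the face area are forced back to the foreground
    identification value so the face never falls into the unknown band.
    `face_mask` is the third image (face = 255, elsewhere 0)."""
    out = trimap.copy()
    out[face_mask == 255] = target_value
    return out
```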
  • through the above steps 501 to 505, after acquiring the original image, the terminal automatically generates a target ternary image, and in the target ternary image, the line widths corresponding to different sub-areas of the foreground area are different.
  • the line drawing area in the target ternary image is the foreground and background mixed area.
  • the terminal automatically draws lines of different widths for the hair area and the areas other than the hair area, which ensures the matting range of complex areas such as the hair area and improves the matting accuracy of that part of the area; at the same time, the pixels belonging to the face area are assigned the same target identification value as the foreground area, which takes into account the protection of key areas in the portrait and avoids the loss of detail in the cutout.
  • the matting model is used to calculate the probability that each pixel in the original image belongs to the target image according to the input target ternary image and the original image, so as to output the transparency.
  • the transparency is calculated by the following formula (6):
  • I = αF + (1 − α)B    (6)
  • where I represents the original image; F represents the foreground, that is, the area that includes all the elements of the target object; B represents the background; and α is the transparency, which is used to represent the proportion of the foreground color in the original image.
  • formula (6) shows that the original image is composed of the foreground and the background superimposed according to a certain transparency.
  • the above-mentioned matting model may be an IndexNet matting model, or, the above-mentioned matting model may also be a GCAMatting matting model, or, the above-mentioned matting model may also be a ContextNet model, etc.
  • the specific type of the above cutout model is not limited.
  • FIG. 11 is a schematic diagram of a matting model provided by an embodiment of the present application.
  • the target ternary image and the original image are used as inputs to first obtain a rough Alpha, which is then refined to obtain the fine result, that is, the Alpha value of each pixel.
  • the matting process in step 508 refers to the process of separating the target object in the original image from the background based on the transparency of each pixel to obtain the target image including the target object.
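  • as an illustration of how the predicted transparency could be applied, the following sketch attaches the Alpha values to the original image as an alpha channel; exporting straight (unpremultiplied) alpha is an assumption here, not something specified in the text:

```python
import numpy as np

def cut_out(original_bgr, alpha):
    """Combine the original image with the per-pixel transparency (alpha in [0, 1])
    to produce a four-channel cutout of the target object."""
    alpha_u8 = (np.clip(alpha, 0.0, 1.0) * 255).astype(np.uint8)
    return np.dstack([original_bgr, alpha_u8])   # H x W x 4
```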
  • FIG. 12 is a schematic diagram of a target image provided by an embodiment of the present application.
  • the left picture in Fig. 12 is the original image
  • the upper right picture in Fig. 12 is the target image obtained by this method.
  • the lower right figure shows the target image obtained according to the image segmentation method in the related art.
  • although the image segmentation is accurate, the hair tips of the portrait are very rough, and there is a loss of detail on the face.
  • a semantic segmentation method is used to obtain a plurality of segmented images containing different regions; further, according to these segmented images, lines of different widths are drawn on the outline of the foreground region to obtain a target ternary image, and finally a target image is generated based on the target ternary image.
  • in the target ternary image, because lines of different widths are drawn on the outline of the foreground area, targeted matting of different regions can be realized.
  • the matting accuracy of each region can also be guaranteed, so that a fine and natural matting result is finally obtained; in addition, the above-mentioned matting process is fully automated, which greatly improves the matting efficiency.
  • an original image is obtained, and the original image is an image including a portrait of a person.
  • the first segmented image includes the foreground area, which contains all elements of the portrait; the second segmented image includes the hair area, because the edge lines between the human torso and the background are relatively clear, while the hair, owing to the characteristics of its shape, often blends into the background more seriously and therefore requires focused matting; the third segmented image includes the face area, which can also be understood as a protection area, since the face is an important part of the portrait and accidentally cutting into it would greatly affect the look and feel, so this part of the area needs to be protected from being damaged by the cutout.
  • the second ternary graph and the third ternary graph are merged to obtain the merged first ternary graph.
  • the target image is finally obtained, that is, the portrait of the person in the original image.
  • application scenarios of the image processing method provided by the embodiments of the present application include but are not limited to:
  • the terminal provides an emoticon package making function for portraits through an application, and the user performs an operation on the terminal to input the original image of the portrait that the user wants to extract.
  • the terminal adopts the image processing method provided in the embodiments of the present application to automatically extract the portrait of the person in the original image and display it on the terminal, so that the user can subsequently perform other image processing operations on the basis of the portrait to obtain the emoticon package the user wants.
  • the process of extracting a person's portrait by the terminal includes the following steps 1 to 8:
  • the terminal obtains the original image.
  • the terminal inputs the original image into the image segmentation model.
  • the terminal acquires the first image, the second image and the third image output by the image segmentation model.
  • the first image includes a foreground area where a portrait of a person is located in the original image
  • the second image includes a hair area of the portrait
  • the third image includes a face area of the portrait.
  • the terminal generates a first ternary image based on the first image and the second image, where the first ternary image includes a foreground area, a first line drawing sub-area, and a second line drawing sub-area.
  • the terminal generates a target ternary image based on the third image and the first ternary image.
  • the terminal inputs the target ternary image and the original image into the matting model.
  • the terminal obtains the transparency of the output of the matting model, and the transparency is used to represent the probability that the pixel belongs to the portrait of the person.
  • the terminal performs matting processing on the original image based on the transparency to obtain a target image including the portrait of the person. Subsequently, the user makes an emoticon package based on the target image.
  • the image processing method provided by the embodiment of the present application can realize the automatic extraction of the portrait of the person, and the effect of the extracted portrait of the person is fine and natural, which can meet the user's personalized needs for the production of expression packs.
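  • as a consolidated, hedged sketch of steps 1 to 8 above, the following Python function strings the pieces together; the `segment_model` and `matting_model` callables are hypothetical stand-ins for the HRNET-OCR segmentation model and the matting model, and the whole function is an illustration rather than the patent's actual implementation.

```python
import cv2
import numpy as np

def extract_portrait(original_bgr, segment_model, matting_model, n=17):
    """End-to-end sketch: segmentation -> target ternary image -> matting -> cutout."""
    # Steps 2-3: semantic segmentation into the three masks (255 = region, 0 = background).
    first, second, third = segment_model(original_bgr)     # person / hair / face masks

    # Step 4: first ternary image from the first and second images.
    s = max(1, min(original_bgr.shape[1], original_bgr.shape[0]) // n)   # formula (1)

    def band(mask, width):
        trimap = mask.copy()
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_NONE)
        cv2.drawContours(trimap, contours, -1, 128, width)               # formulas (2)/(3)
        return trimap

    first_trimap = np.maximum(band(first, s), band(second, 3 * s))       # formula (4)

    # Step 5: target ternary image; face pixels are protected as pure foreground (formula (5)).
    target_trimap = first_trimap.copy()
    target_trimap[third == 255] = 255

    # Steps 6-7: the matting model predicts per-pixel transparency in [0, 1].
    alpha = matting_model(original_bgr, target_trimap)

    # Step 8: cut out the portrait as a four-channel image.
    alpha_u8 = (np.clip(alpha, 0.0, 1.0) * 255).astype(np.uint8)
    return np.dstack([original_bgr, alpha_u8])
```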
  • the host may wish to hide the real background environment he is in, and then only display the host's portrait in the live broadcast, or add other virtual backgrounds based on the host's portrait.
  • the terminal provides a portrait mode during the live broadcast; when the host enables the portrait mode, the terminal acquires each frame of the original image captured by the camera in real time, uses the image processing method provided by the embodiments of the present application to extract the portrait of the anchor from each frame of the original image, and generates the live broadcast picture in real time for the live broadcast.
  • the specific process of extracting the portrait of the person by the terminal is similar to the above-mentioned scenario 1, so it is not repeated here.
  • since the image processing method provided by the embodiments of the present application realizes automatic portrait extraction, it can be directly applied to such scenes that require real-time extraction of a portrait of a person.
  • FIG. 14 is a schematic structural diagram of an image processing apparatus provided according to an embodiment of the present application.
  • the apparatus is used to execute the steps of the above-mentioned image processing method.
  • the apparatus includes: an image segmentation module 1401 , a ternary image generation module 1402 , and a matting module 1403 .
  • the image segmentation module 1401 is used to perform image semantic segmentation on the original image to obtain a first image, a second image and a third image, the first image includes the foreground area where the target object is located in the original image, and the second image includes the the hair region of the target object, the third image includes the face region of the target object;
  • the ternary image generation module 1402 is configured to generate a target ternary image based on the first image, the second image and the third image, where the target ternary image includes the foreground area and the line drawing area, and the line drawing area is obtained by drawing lines on the outline of the foreground area; different sub-areas of the foreground area correspond to different line widths;
  • the matting module 1403 is configured to perform matting processing on the original image based on the target ternary image to obtain a target image including the target object.
  • the foreground area further includes a torso area of the target object, wherein, in the target ternary diagram, the line width corresponding to the hair area is greater than the line width corresponding to the torso area, and the line corresponding to the torso area The width is larger than the line width corresponding to the face area.
  • the ternary graph generation module 1402 includes:
  • a first generating unit for generating a first ternary image based on the first image and the second image, the first ternary image including the foreground area, the first line drawing sub-region and the second line drawing sub-region;
  • the first line drawing sub-area covers the contour line of the side of the hair area close to the background area
  • the second line drawing sub-region covers the contour lines of the other areas, and the other areas are the areas in the foreground area other than the hair area
  • the first line width is greater than the second line width
  • the first line width is used to draw the first line drawing sub-area
  • the second line width is used to draw the second line drawing sub-area
  • the second generating unit is configured to generate the target ternary image based on the third image and the first ternary image.
  • the first generating unit is configured to: in the first image, obtain the complete contour line of the foreground area; draw a line on the complete contour line of the foreground area according to the second line width to obtain a second ternary image, wherein the second ternary image includes the foreground area and a third line-drawing sub-area, and the third line-drawing sub-area covers the complete contour line of the foreground area; in the second image, obtain the complete contour line of the hair area; draw a line on the complete contour line of the hair area according to the first line width to obtain a third ternary image, wherein the third ternary image includes the hair area and a fourth line-drawing sub-area, and the fourth line-drawing sub-area covers the complete contour line of the hair area; and merge the second ternary image and the third ternary image to obtain the first ternary image.
  • the first line width is M times the second line width, and M is greater than 1.
  • the first generating unit is further configured to: obtain a first identification value of each pixel in the second ternary diagram, where the first identification value is used to identify the pixel point in the second ternary diagram color; obtain the second identification value of each pixel in the third ternary diagram, and the second identification value is used to identify the color of the pixel in the third ternary diagram; based on the first identification value and the second identification The magnitude relationship between the values generates the first ternary map.
  • the first generating unit is further configured to: compare the first identification value of the pixel point at any position in the second ternary image with the second identification value of the pixel point at the same position in the third ternary image; and use the larger of the first identification value and the second identification value as the third identification value of the pixel point at the same position in the first ternary image, where the third identification value is used to identify the color of the pixel in the first ternary image.
  • the second generating unit is configured to: determine, based on the face region in the third image, a target overlapping region in the first ternary image, where the target overlapping region is the overlapping area of the face region and the second line drawing sub-region; and assign the target identification value to the pixel points of the target overlapping region to generate the target ternary image, where the target identification value is used to identify the color of the pixels in the face area.
  • the matting module 1403 is configured to: obtain, based on the target ternary image, the transparency of each pixel in the original image, where the transparency is used to represent the probability that the pixel belongs to the target object; and cut out the original image based on the transparency to obtain the target image.
  • the image segmentation module 1401 is further configured to: acquire the original image; input the original image into an image segmentation model, wherein the image segmentation model is used to calculate the semantic category of each pixel in the original image according to the input original image so as to output at least one image of the original image; and obtain the first image, the second image and the third image output by the image segmentation model.
  • the matting module 1403 is further configured to: input the target ternary image and the original image into a matting model, where the matting model is used to calculate, according to the input target ternary image and the original image, the probability that each pixel in the original image belongs to the target image so as to output the transparency; and obtain the transparency output by the matting model.
  • a semantic segmentation method is used to obtain a plurality of segmented images containing different regions; further, according to these segmented images, lines of different widths are drawn on the outline of the foreground region to obtain a target ternary image, and finally a target image is generated based on the target ternary image.
  • in the target ternary image, because lines of different widths are drawn on the outline of the foreground area, targeted matting of different regions can be realized.
  • the matting accuracy of each region can also be guaranteed, so that a fine and natural matting result is finally obtained; in addition, the above-mentioned matting process is fully automated, which greatly improves the matting efficiency.
  • when the image processing apparatus provided in the above-mentioned embodiments performs image processing, the division of the above-mentioned functional modules is only used as an example for illustration; that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above.
  • Each module in the above apparatus may be implemented in whole or in part by software, hardware and combinations thereof.
  • the above modules can be embedded in or independent of the processor in the computer device in the form of hardware, or stored in the memory in the computer device in the form of software, so that the processor can call and execute the operations corresponding to the above modules.
  • the image processing apparatus and the image processing method embodiments provided by the above embodiments belong to the same concept, and the specific implementation process thereof is detailed in the method embodiments, which will not be repeated here.
  • FIG. 15 shows a schematic structural diagram of a terminal 1500 provided by an exemplary embodiment of the present application.
  • the terminal 1500 may be a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop or a desktop computer.
  • Terminal 1500 may also be called user equipment, portable terminal, laptop terminal, desktop terminal, and the like by other names.
  • the terminal 1500 includes: one or more processors 1501 and a memory 1502 .
  • the processor 1501 may include one or more processing cores, such as a 4-core processor, an 8-core processor, and the like.
  • the processor 1501 may be implemented in at least one hardware form among DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array).
  • the processor 1501 may also include a main processor and a coprocessor.
  • the main processor is a processor used to process data in the awake state, also called a CPU (Central Processing Unit); the coprocessor is a low-power processor used to process data in the standby state.
  • the processor 1501 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content that needs to be displayed on the display screen.
  • the processor 1501 may further include an AI (Artificial Intelligence) processor, which is used to handle computing operations related to machine learning.
  • Memory 1502 may include one or more computer-readable storage media, which may be non-transitory. Memory 1502 may also include high-speed random access memory, as well as non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 1502 is used to store at least one computer-readable instruction, which is executed by the one or more processors 1501 to implement the image processing method provided by the method embodiments in this application.
  • the terminal 1500 may further include: a peripheral device interface 1503 and at least one peripheral device.
  • The one or more processors 1501, the memory 1502 and the peripheral device interface 1503 may be connected through a bus or a signal line.
  • Each peripheral device can be connected to the peripheral device interface 1503 through a bus, a signal line or a circuit board.
  • the peripheral device includes: at least one of a radio frequency circuit 1504 , a display screen 1505 , a camera assembly 1506 , an audio circuit 1507 , a positioning assembly 1508 and a power supply 1509 .
  • the peripheral device interface 1503 may be used to connect at least one peripheral device related to I/O (Input/Output) to the one or more processors 1501 and the memory 1502 .
  • the radio frequency circuit 1504 is used for receiving and transmitting RF (Radio Frequency) signals, also called electromagnetic signals.
  • the radio frequency circuit 1504 communicates with communication networks and other communication devices via electromagnetic signals.
  • the display screen 1505 is used for displaying a UI (User Interface).
  • the UI can include graphics, text, icons, video, and any combination thereof.
  • when the display screen 1505 is a touch display screen, the display screen 1505 also has the ability to acquire touch signals on or above the surface of the display screen 1505.
  • the touch signal may be input to one or more processors 1501 as a control signal for processing.
  • the display screen 1505 may also be used to provide virtual buttons and/or virtual keyboards, also referred to as soft buttons and/or soft keyboards.
  • in some embodiments, there may be one display screen 1505, arranged on the front panel of the terminal 1500; in other embodiments, there may be at least two display screens 1505, respectively arranged on different surfaces of the terminal 1500 or in a folded design; in still other embodiments, the display screen 1505 may be a flexible display screen disposed on a curved or folding surface of the terminal 1500. The display screen 1505 may even be set to a non-rectangular irregular shape, that is, a shaped screen.
  • the display screen 1505 may be made of materials such as an LCD (Liquid Crystal Display) or an OLED (Organic Light-Emitting Diode).
  • the camera assembly 1506 is used to capture images or video.
  • Audio circuitry 1507 may include a microphone and speakers.
  • the microphone is used to collect the sound waves of the user and the environment, convert the sound waves into electrical signals and input them to one or more processors 1501 for processing, or input them to the radio frequency circuit 1504 to realize voice communication.
  • Speakers are used to convert electrical signals from one or more processors 1501 or radio frequency circuits 1504 into sound waves.
  • the positioning component 1508 is used to locate the current geographic location of the terminal 1500 to implement navigation or LBS (Location Based Service).
  • the power supply 1509 is used to power various components in the terminal 1500 .
  • the terminal 1500 also includes one or more sensors 1510 .
  • the one or more sensors 1510 include, but are not limited to, an acceleration sensor 1511 , a gyro sensor 1512 , a pressure sensor 1513 , a fingerprint sensor 1514 , an optical sensor 1515 , and a proximity sensor 1516 .
  • the acceleration sensor 1511 can detect the magnitude of acceleration on the three coordinate axes of the coordinate system established by the terminal 1500 .
  • the gyroscope sensor 1512 can detect the body direction and rotation angle of the terminal 1500 , and the gyroscope sensor 1512 can cooperate with the acceleration sensor 1511 to collect 3D actions of the user on the terminal 1500 .
  • the pressure sensor 1513 may be disposed on the side frame of the terminal 1500 and/or the lower layer of the display screen 1505 .
  • the fingerprint sensor 1514 is used to collect the user's fingerprint, and the one or more processors 1501 identify the user's identity according to the fingerprints collected by the fingerprint sensor 1514, or the fingerprint sensor 1514 identifies the user's identity according to the collected fingerprints.
  • Optical sensor 1515 is used to collect ambient light intensity.
  • the one or more processors 1501 may control the display brightness of the display screen 1505 according to the ambient light intensity collected by the optical sensor 1515 .
  • the proximity sensor 1516, also called a distance sensor, is usually provided on the front panel of the terminal 1500.
  • the proximity sensor 1516 is used to collect the distance between the user and the front of the terminal 1500 .
  • those skilled in the art can understand that the structure shown in FIG. 15 does not constitute a limitation on the terminal 1500, which may include more or fewer components than shown, combine some components, or adopt a different component arrangement.
  • Embodiments of the present application further provide one or more computer-readable storage media, applied to a computer device, where at least one computer-readable instruction is stored in the computer-readable storage media, and the at least one computer-readable instruction is loaded and executed by one or more processors to implement the operations performed by the computer device in the image processing method of the above-described embodiments.
  • Embodiments of the present application further provide a computer-readable instruction product or a computer-readable instruction, which includes computer-readable instruction code stored in a computer-readable storage medium.
  • One or more processors of the computer device read the computer-readable instruction code from the computer-readable storage medium and execute it, so that the computer device performs the image processing method provided in the various optional implementations described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

An image processing method, comprising: when matting an original image, first using semantic segmentation to obtain a plurality of segmented images containing different regions; then, according to these segmented images, drawing lines of different widths on the outline of the foreground region so as to obtain a target ternary map; and finally generating a target image on the basis of the target ternary map. Because lines of different widths are drawn on the outline of the foreground region of the target ternary map, targeted matting of different regions can be achieved: for regions that require refined matting, the matting accuracy of those regions can be improved, while the matting accuracy of the other regions is still guaranteed, so that a fine and natural matting image is finally obtained.

Description

图像处理方法、装置、设备、存储介质及计算机程序产品
本申请要求于2021年01月18日提交中国专利局,申请号为2021100625671,申请名称为“图像处理方法、装置、设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能技术领域,特别涉及一种图像处理方法、装置、设备、存储介质及计算机程序产品。
背景技术
随着计算机技术的发展,图像处理越来越普遍。其中,抠图(Image Matting)是一种应用非常广泛的图像处理技术,具体指的是,将图像中的前景区域从图像中的背景区域中分离出来。
在相关技术中,通常采用分割(Segmentation)的方法实现抠图,具体是对图像中的每一个像素点进行分类,从而获取各个不同类别的块状分割结果,由此得到图像中的前景区域,比如人像区域,或者建筑区域等。
然而,采用上述方法,对于每个像素点都会给出一个固定的分类,容易造成前景区域的边缘较为粗糙,导致抠图效果较差。
发明内容
本申请实施例提供了一种图像处理方法、装置、设备、存储介质及计算机程序产品。
一方面,提供了一种图像处理方法,由计算机设备执行,该方法包括:
对原始图像进行图像语义分割,得到第一图像、第二图像以及第三图像,该第一图像中的前景区域是该原始图像中目标对象所在的区域,该第二图像是该目标对象的第一目标区域的分割图像,该第三图像是该目标对象的第二目标区域的分割图像;所述前景区域的子区域包括所述第一目标区域和所述第二目标区域;
基于该第一图像、该第二图像以及该第三图像,生成目标三元图,该目标三元图包括该前景区域和画线区域,该画线区域是通过在该前景区域的轮廓线上绘制线条得到的;该前景区域的不同子区域对应不同的线条宽度;
基于该目标三元图,对该原始图像中的该目标对象进行抠图处理,得到包括该目标对象的目标图像。
另一方面,提供了一种图像处理装置,该装置包括:
图像分割模块,用于对原始图像进行图像语义分割,得到第一图像、第二图像以及第三图像,该第一图像中的前景区域是该原始图像中目标对象所在的区域,该第二图像是该目标对象的第一目标区域的分割图像,该第三图像是该目标对象的第二目标区域的分割图像;该前景区域的子区域包括第一目标区域和第二目标区域;
三元图生成模块,用于基于该第一图像、该第二图像以及该第三图像,生成目标三元图,该目标三元图包括该前景区域和画线区域,该画线区域是通过在该前景区域的轮廓线上绘制线条得到的;该前景区域的不同子区域对应不同的线条宽度;
抠图模块,用于基于该目标三元图,对该原始图像中的该目标对象进行抠图处理,得到包括该目标对象的目标图像。
另一方面,提供了一种计算机设备,该计算机设备包括一个或多个处理器和存储器,该存储器用于存储至少一条计算机可读指令,该至少一条计算机可读指令由该一个或多个处理器加载并执行以实现本申请实施例中的图像处理方法中所执行的操作。
另一方面,提供了一个或多个计算机可读存储介质,该计算机可读存储介质中存储有至少一条计算机可读指令,该至少一条计算机可读指令由一个或多个处理器加载并执行以实现如本申请实施例中图像处理方法中所执行的操作。
另一方面,提供了一种计算机程序产品,该计算机程序产品包括计算机可读指令,该计算机可读指令存储在计算机可读存储介质中。计算机设备的一个或多个处理器从计算机可读存储介质读取该计算机可读指令,一个或多个处理器执行该计算机可读指令,使得该计算机设备执行上述各实施例中提供的图像处理方法。
附图说明
为了更清楚地说明本申请实施例中的技术方案,下面将对实施例描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动的前提下,还可以根据这些附图获得其他的附图。
图1是根据本申请实施例提供的一种高分辨网络的结构示意图;
图2是根据本申请实施例提供的一种物体上下文特征表示的结构示意图;
图3是根据本申请实施例提供的一种图像处理方法的实施环境示意图;
图4是根据本申请实施例提供的一种图像处理方法的流程图;
图5是根据本申请实施例提供的另一种图像处理方法的流程图;
图6是根据本申请实施例提供的一种图像语义分割结果的示意图;
图7是根据本申请实施例提供的一种第一三元图的示意图;
图8是根据本申请实施例提供的一种第二三元图的示意图;
图9是根据本申请实施例提供的一种第三三元图的示意图;
图10是根据本申请实施例提供的一种目标三元图的示意图;
图11是根据本申请实施例提供的一种抠图模型的示意图;
图12是根据本申请实施例提供的一种目标图像的示意图;
图13是根据本申请实施例提供的一种图像处理方法的示意图;
图14是根据本申请实施例提供的一种图像处理装置的结构示意图;
图15是根据本申请实施例提供的一种终端的结构示意图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施方式作进一步地详细描述。
这里将详细地对示例性实施例进行说明,其示例表示在附图中。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本申请相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本申请的一些方面相一致的装置和方法的例子。
本申请中术语“第一”及“第二”等字样用于对作用和功能基本相同的相同项或相似项进行区分,应理解,“第一”、“第二”、“第n”之间不具有逻辑或时序上的依赖关系,也不对数量和执行顺序进行限定。还应理解,尽管以下描述使用术语第一、第二等来描述各种元素,但这些元素不应受术语的限制。
这些术语只是用于将一个元素与另一个元素区别开。例如,在不脱离各种示例的范围的情况下,第一图像能够被称为第二图像,并且类似地,第二图像也能够被称为第一图像。第一图像和第二图像都可以是图像,并且在某些情况下,可以是单独且不同的图像。
其中,至少一个是指一个或一个以上,例如,至少一个图像可以是一个图像、两个图像、三个 图像等任意大于等于一的整数个图像。而多个是指两个或者两个以上,例如,多个图像可以是两个图像、三个图像等任意大于等于二的整数个图像。
本申请实施例提供的图像处理方案可能用到人工智能技术中的计算机视觉技术。比如,本申请中的语义分割处理就用到了计算机视觉技术。具体地,在本申请各实施例提供的图像处理方案中,可以使用高分辨网络提取图像特征信息,并使用物体上下文特征表示(Object-Contextual Representations,OCR)技术计算图像中各个像素的语义类别。
高分辨网络(High Resolution Network,HRNET)是一种用于获取图像特征信息的计算模型,在运算全部过程中都可以保持高分辨表征。HRNET始于一组高分辨率卷积,然后逐步添加低分辨率的卷积分支,并将它们以并行的方式连接起来。参考图1,图1是本申请提供的一种高分辨网络的结构示意图,如图1所示,该网络将不同分辨率的特征图谱(feature map)并联,各个分辨率分别一路,在整个过程中并行的运算组合间通过多分辨率融合不断地交换着信息。
OCR是一种用于表征图像中像素的语义类别的计算模型。
参考图2,图2是本申请提供的一种物体上下文特征表示的结构示意图,如图2所示:第一,通过主干网络的中间层得到一个粗略的语义分割结果,即软物体区域(Soft Object Regions);第二,通过主干网络的深层所输出的像素特征(Pixel Representation)和软物体区域计算得到K组向量,K>1,即物体区域表示(Object Region Representations),其中,每一个向量对应一个语义类别的特征表示;第三,计算像素特征与物体区域特征表示之间的关系矩阵;第四,根据每个像素的像素特征和物体区域特征表示在关系矩阵中的数值,将每个物体区域特征进行加权求和,得到物体的上下文特征表示,也即是OCR;最后,基于OCR与像素特征得到作为上下文信息增强的特征表示(Augmented Representation),该增强后的特征表示能够用于预测每个像素的语义类别。
下面简单介绍一下本申请实施例提供的图像处理方案可能用到的关键术语或缩略语。
语义分割(Semantic Segmentation):针对输入的图像,基于对每个像素的语义理解,将相同语义的像素分割为同一部分或区域,得到若干个不同语义区域的过程。
前景(Foreground):图像中的主体,如人像拍摄中的人像。
背景(Background):图像中主体所处的环境,如人像拍摄中人物所处的风景、道路、建筑等。
抠图(Image Matting):一种将图像的前景从背景中分离出来的图像处理技术。
三元图(Trimap):包含有前景、背景及前后景混合区域三种标记的图像,通常与原始图像一并作为抠图模型的输入。需要说明的是,在下述实施例中,前后景混合区域也称为画线区域。
标识值:一种用于标识图像中像素点颜色的数值。例如,一个像素点的标识值为255,表明该像素点的RGB(Red-Green-Blue,红绿蓝)颜色值为(255,255,255),表现为白色;又例如,一个像素点的标识值为0,表明该像素点的RGB颜色值为(0,0,0),表现为黑色;再例如,一个像素点的标识值为128,表明该像素点的RGB颜色值为(128,128,128),表现为灰色。
开源计算机视觉(Open Source Computer Vision Library,OpenCV):一种跨平台计算机视觉和机器学习软件库,可以运行在各种操作系统上。OpenCV可用于开发实时的图像处理、计算机视觉以及模式识别程序。
findContours:OpenCV中的一种用于在图像中检测轮廓的函数。
drawContours:OpenCV中的一种用于在图像中绘制轮廓的函数。
抠图模型:一种计算模型,用于根据原始图像和三元图,对原始图像中各个像素点属于前景的概率进行计算。例如,抠图模型有IndexNet模型、GCAMatting模型以及ContextNet模型等。
下面对本申请实施例提供的图像处理方法的实施环境进行介绍。
图3是根据本申请实施例提供的图像处理方法的实施环境示意图。该实施环境包括:终端301 和服务器302。
终端301和服务器302能够通过有线或无线通信方式进行直接或间接地连接,本申请在此不做限制。在一些实施例中,终端301是智能手机、平板电脑、笔记本电脑、台式计算机等,但并不局限于此。终端301能够安装和运行有应用程序。在一些实施例中,该应用程序是社交类应用程序、图像处理类应用程序或者拍摄类应用程序等。示意性地,终端301是用户使用的终端,终端301中运行有社交类应用程序,用户可通过该社交类应用程序对图片中的人像进行抠取。
服务器302能够是独立的物理服务器,也能够是多个物理服务器构成的服务器集群或者分布式系统,还能够是提供云服务、云数据库、云计算、云函数、云存储、网络服务、云通信、中间件服务、域名服务、安全服务、内容分发网络(Content Delivery Network,CDN)、以及大数据和人工智能平台等基础云计算服务的云服务器。服务器302用于为终端301运行的应用程序提供后台服务。
在一些实施例中,在图像处理的过程中,服务器302承担主要计算工作,终端301承担次要计算工作;或者,服务器302承担次要计算工作,终端301承担主要计算工作;或者,服务器302或终端301分别能够单独承担计算工作。
在一些实施例中,终端301泛指多个终端中的一个,本实施例仅以终端301来举例说明。本领域技术人员能够知晓,上述终端301的数量能够更多。比如上述终端301为几十个或几百个,或者更多数量,此时上述图像处理方法的实施环境还包括其他终端。本申请实施例对终端的数量和设备类型不加以限定。
在一些实施例中,上述的无线网络或有线网络使用标准通信技术和/或协议。网络通常为因特网,但也能够是任何网络,包括但不限于局域网(Local Area Network,LAN)、城域网(Metropolitan Area Network,MAN)、广域网(Wide Area Network,WAN)、移动、有线或者无线网络、专用网络或者虚拟专用网络的任何组合。在一些实施例中,使用包括HTML、可扩展标记语言(Extensible Markup Language,XML)等的技术和/或格式来代表通过网络交换的数据。此外还能够使用诸如安全套接字层(Secure Socket Layer,SSL)、传输层安全(Transport Layer Security,TLS)、虚拟专用网络(Virtual Private Network,VPN)、网际协议安全(Internet Protocol Security,IPsec)等常规加密技术来加密所有或者一些链路。在另一些实施例中,还能够使用定制和/或专用数据通信技术取代或者补充上述数据通信技术。
在本申请实施例中,提供了一种图像处理方法,能够满足各种场景下对目标对象抠取的需求。而且,针对包括目标对象的原始图像,能够自动从中抠取出该目标对象比较细节的第一目标区域和第二目标区域等局部区域,抠图效果精细且自然。图4是根据本申请实施例提供的一种图像处理方法的流程图。如图4所示,在本申请实施例中以应用于计算机设备为例进行说明。计算机设备可以是终端或服务器,该方法包括以下步骤:
401、对原始图像进行图像语义分割,得到第一图像、第二图像以及第三图像,该第一图像中的前景区域是原始图像中目标对象所在的区域,该第二图像是该目标对象的第一目标区域的分割图像,该第三图像是目标对象的第二目标区域的分割图像。
在本申请实施例中,原始图像是指需要进行图像抠取的图像。目标对象是指原始图像中的需要被分离出来生成目标图像的对象。
需要说明的是,第一图像、第二图像和第三图像实质上都属于分割图像,第一图像是对整个目标对象进行分割得到的分割图像,所以,第一图像中的前景区域,是原始图像中目标对象的所有元素。第二图像是对目标对象的第一目标区域这一局部部位进行分割得到的分割图像,所以,第二图像中的前景区域是目标对象的第一目标区域内的所有元素,除第一目标区域之外的区域在第二图像中都属于背景区域。同理,第三图像是对目标对象的第二目标区域这一局部部位进行分割得到的分割图像,所以,第三图像中的前景区域是目标对象的第二目标区域内的所有元素,除第二目标区域之外的区域在第三图像中都属于背景区域。
可以理解,目标对象可以包括原始图像中的人物肖像、动物图像和植物图像等中的至少一种。
可以理解,由于第一图像中目标对象的前景区域是用于示意整个目标对象的区域,而第一目标区域和第二目标区域是目标对象的局部部位的区域,所以,第一目标区域和第二目标区域皆为第一图像中目标对象的前景区域中的子区域。
在一个实施例中,第一目标区域可以是目标对象中需要精细化抠图的区域。第二目标区域可以是目标对象中与第一目标区域相关的、且抠图精细化要求不同于第一目标区域的区域。具体地,第二目标区域的抠图精细化要求可以低于第一目标区域的抠图精细化要求。比如,第二目标区域的细节信息低于第一目标区域的细节信息,因而,其抠图精细化要求低于第一目标区域。
在一个实施例中,若目标对象为人物肖像,第一目标区域和第二目标区域可以分别为头发区域和脸部区域。
在一个实施例中,若目标对象为动物图像,第一目标区域和第二目标区域可以分别为毛发区域和头部区域。
在一个实施例中,若目标对象为植物图像,第一目标区域和第二目标区域可以分别为树叶区域和树枝区域。
需要说明的是,目标对象可以是真人肖像,也可以是卡通人物肖像或动漫人物肖像等。
402、基于第一图像、第二图像以及第三图像,生成目标三元图,该目标三元图包括前景区域和画线区域,该画线区域是通过在前景区域的轮廓线上绘制线条得到的;该前景区域的不同子区域对应不同的线条宽度。
在本申请实施例中,子区域是指前景区域中的部分区域,包括第一目标区域和第二目标区域等。
403、基于该目标三元图,对该原始图像中的目标对象进行抠图处理,得到包括该目标对象的目标图像。
在本申请实施例中,抠图处理是指将原始图像中的目标对象从背景区域中分离出来,以得到目标图像的过程。
在本申请实施例中,在对原始图像进行抠图时,首先采用语义分割的方式得到多个包含有不同区域的分割图像,进一步地,根据这些分割图像,在前景区域的轮廓线上,采用不同宽度的线条进行绘制,以得到目标三元图,最终基于该目标三元图,生成目标图像。针对上述目标三元图,由于在前景区域的轮廓线上,采用了不同宽度的线条进行绘制,能够实现对不同区域的针对性抠图,对于需要进行精细化抠图的区域,能够提高这部分区域的抠图精度,同时还能够保证其他区域的抠图精度,使得最终得到效果精细且自然的抠图图像;另外,上述抠图过程还实现了全自动化,极大提高了抠图效率。
上述图4所示仅为本申请的基本流程,下面基于一种具体实施方式,来对本申请提供的方案进行进一步阐述。
图5是根据本申请实施例提供的另一种图像处理方法的流程图,本实施例中,第一目标区域为头发区域;所述第二目标区域为脸部区域。如图5所示,在本申请实施例中以应用于终端为例进行说明。该方法包括以下步骤:
501、获取原始图像。
在本申请实施例中,终端提供抠图功能,用户能够在终端上进行抠图操作,终端响应于该抠图操作,获取到该原始图像。在一些实施例中,该原始图像为终端上存储的本地图像,或,该原始图 像为在线图像,本申请实施例对于原始图像的来源不作限定。
在一些实施例中,终端上显示针对该原始图像的图像处理界面,该图像处理界面上包括抠图选项、剪裁选项等等,用户能够对该抠图选项进行选中操作,终端响应于该选中操作,获取到该原始图像。
在一些实施例中,终端上显示图像处理界面,该图像处理界面上包括抠图选项,用户能够对该抠图选项进行选中操作,终端响应于该选中操作,显示图像选择界面,用户能够通过对想要抠图的图像进行点击操作,选择原始图像进行抠图,终端响应于该点击操作,获取到该原始图像。
需要说明的是,本申请实施例对于终端获取原始图像的方式不作限定。
502、将该原始图像输入到图像分割模型中。
在本申请实施例中,图像分割模型用于根据输入的原始图像,对该原始图像中各个像素点的语义类别进行计算,以输出该原始图像的至少一个图像。
在一些实施例中,该图像分割模型为HRNET-OCR模型,是一种将HRNET模型与OCR模型结合后的计算模型。该HRNET-OCR模型的计算过程如下:首先,通过HRNET模型对该原始图像进行特征提取,获取到该原始图像的特征信息;其次,将获取到的特征信息输入到OCR模型的主干网络中;再次,基于OCR模型,计算得到该原始图像中各个像素点的语义类别;例如,语义类别有头发、鼻子、眼睛、躯干、衣物以及建筑物等等;最后,基于各个像素点的语义类别,输出该原始图像的至少一个图像。上述HRNET-OCR模型的具体计算过程已结合图1和图2进行了详细说明,故在此不再赘述。
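As an illustration of the per-pixel classification described above, the following C++ sketch turns a per-pixel class-score map into a label map by taking the arg-max over classes. It is only a sketch under assumptions: the embodiment does not specify the model's output format, so the code assumes the network scores have already been copied into one single-channel plane per semantic category and resized to the original image size.

```cpp
#include <vector>
#include <opencv2/opencv.hpp>

// scores[c] is a CV_32FC1 map holding the model's score for semantic class c.
cv::Mat argmaxLabels(const std::vector<cv::Mat>& scores) {
    CV_Assert(!scores.empty());
    const int rows = scores[0].rows, cols = scores[0].cols;
    cv::Mat labels(rows, cols, CV_8UC1, cv::Scalar(0));
    for (int y = 0; y < rows; ++y) {
        for (int x = 0; x < cols; ++x) {
            int best = 0;
            float bestScore = scores[0].at<float>(y, x);
            for (int c = 1; c < static_cast<int>(scores.size()); ++c) {
                float s = scores[c].at<float>(y, x);
                if (s > bestScore) { bestScore = s; best = c; }
            }
            labels.at<uchar>(y, x) = static_cast<uchar>(best);  // semantic category id
        }
    }
    return labels;
}
```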
需要说明的是,在实际应用中,可以通过对上述HRNET-OCR模型中的部分结构进行调整,以输出原始图像的至少一个图像,本申请实施例对于HRNET-OCR模型中的结构组成不作限定。在另一些实施例中,上述图像分割模型还可以通过其他网络模型来实现,本申请实施例对于图像语义分割模型的类型不作限定。
503、获取该图像分割模型输出的第一图像、第二图像以及第三图像。该第一图像包括该原始图像中目标对象所在的前景区域,该第二图像是该目标对象的头发区域的分割图像,该第三图像是该目标对象的脸部区域的分割图像。
在本申请实施例中,经过图像分割模型对原始图像中各个像素点进行分类,终端能够获取到三种分割图像,也即是本步骤中的第一图像、第二图像以及第三图像。其中,各个图像均包括两种区域,这两种区域分别通过不同的标识值来进行标记,例如,以第一图像为例,该第一图像包括前景区域和背景区域,其中,前景区域中各个像素点的标识值为255,而背景区域中各个像素点的标识值为0。需要说明的是,在实际应用中,开发人员能够根据需求对标识值进行灵活设置,本申请实施例对此不作限定。
示意性地,参考图6,图6是本申请实施例提供的一种图像语义分割结果的示意图。如图6所示,原始图像为人物肖像,图6中的(a)所示的图像即为第一图像,该第一图像包括前景区域1和背景区域2,该前景区域1包含有人物肖像的所有元素;图6中的(b)所示的图像即为第二图像,该第二图像包括头发区域3和背景区域4;图6中(c)图所示即为第三图像,该第三图像包括脸部区域5和背景区域6。
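The three segmentation images of step 503 can then be derived from such a label map with simple mask operations. The sketch below is illustrative only: the class ids and the set of classes that make up the whole portrait are assumptions, while the 255/0 identification values follow the convention described above.

```cpp
#include <opencv2/opencv.hpp>

struct SegImages {
    cv::Mat first;   // whole-portrait foreground: 255 inside the portrait, 0 for background
    cv::Mat second;  // hair region vs. background
    cv::Mat third;   // face region vs. background
};

// `labels` is the CV_8UC1 label map from the segmentation model; the ids below
// are hypothetical and would follow the model's own category list in practice.
SegImages splitSegmentation(const cv::Mat& labels) {
    const uchar HAIR = 1, FACE = 2, TORSO = 3, CLOTHES = 4;
    SegImages out;
    out.first  = (labels == HAIR) | (labels == FACE) |
                 (labels == TORSO) | (labels == CLOTHES);  // union of all portrait classes
    out.second = (labels == HAIR);
    out.third  = (labels == FACE);
    return out;
}
```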
504、基于该第一图像和该第二图像,生成第一三元图,该第一三元图包括前景区域、第一画线子区域和第二画线子区域。
在本申请实施例中,该第一画线子区域覆盖在头发区域靠近第一图像中的背景区域一侧的轮廓线上,该第二画线子区域覆盖在前景区域中的非头发区域的轮廓线上,该非头发区域为前景区域中除了头发区域之外的区域。第一线条宽度大于第二线条宽度,该第一线条宽度用于绘制第一画线子 区域,该第二线条宽度用于绘制第二画线子区域。
其中,该第一三元图还包括背景区域,且该第一画线子区域和第二画线子区域的标识值与前景区域的标识值不同,与该背景区域的标识值也不同。例如,前景区域中各个像素点的标识值为255,背景区域中各个像素点的标识值为0,第一画线子区域和第二画线子区域中各个像素点的标识值为128。
需要说明的是,在实际应用中,开发人员能够根据需求对画线区域的标识值进行灵活设置,本申请实施例对此不作限定。
图7是本申请实施例提供的一种第一三元图的示意图。如图7中右图所示,该第一三元图中包括前景区域7、背景区域8、第一画线子区域9以及第二画线子区域10,其中,第一画线子区域9是按照第一线条宽度进行绘制的,第二画线子区域10是按照第二线条宽度进行绘制的。
另外,需要说明的是,在本申请实施例中,不同图像中的前景区域之间存在区域范围上的差别,不同图像中的背景区域之间也存在区域范围上的差别,例如,参考图6中的背景区域2和背景区域4,两者的区域范围虽然存在明显差别,但均为背景区域。再例如,参考图6中的前景区域1和图7中的前景区域7,两者的区域范围虽然存在细微差别,但均为前景区域。
下面对本步骤中终端生成第一三元图的具体实现方式进行详细阐述,包括下述步骤5041至步骤5045:
5041、在第一图像中,获取前景区域的完整轮廓线。
其中,前景区域的完整轮廓线是指前景区域与背景区域之间的界线。终端基于获取到的第一图像,通过轮廓检测算法,在该第一图像中获取到前景区域的完整轮廓线。在一些实施例中,上述轮廓检测算法可以通过findContours函数实现,本申请实施例对此不作限定。
5042、按照第二线条宽度,在前景区域的完整轮廓线上绘制线条,得到第二三元图。
其中,第二三元图包括前景区域和第三画线子区域,该第三画线子区域覆盖在前景区域的完整轮廓线上。第二线条宽度是根据原始图像的尺寸计算得到的。在一些实施例中,第二线条宽度可以通过下述公式(1)计算得到:
S = min(width, height) / N    (1)
式中,S为第二线条宽度;width和height分别为原始图像的宽度和高度;min()是指最小值函数;min(width,height)表示从原始图像的宽度和高度中选取最小值;N为默认线条尺寸,例如,N可以为17,本申请实施例对此不作限定。
终端在获取到前景区域的完整轮廓线后,通过轮廓绘制算法,在该完整轮廓线上按照第二线条宽度绘制线条,且,线条的标识值与前景区域和背景区域的标识值均不同。在一些实施例中,上述轮廓绘制算法可以通过drawContours函数实现,例如,以前景区域的标识值为255,背景区域的标识值为0为例,通过下述公式(2)实现在前景区域的完整轮廓线上绘制线条:
cv::drawContours(segResult, contours, -1, Scalar(128, 128, 128), S)    (2)
式中,segResult为第一图像;contours为通过findContours函数检测到的前景区域的完整轮廓线,-1表示对所有的轮廓线进行操作,Scalar为标识值,Scalar(128,128,128)表示将RGB通道中R、G以及B通道的颜色值均设置为128;S为第二线条宽度。
需要说明的是,上述绘制线条的方式是对获取到的完整轮廓线进行操作,也即是覆盖在该完整轮廓线上。例如,通过findContours函数获取到的完整轮廓线包括像素点A1至A10,则对这些像素点进行操作,实现线条绘制。也即是,通过绘制线条得到的第三画线子区域既覆盖了第一图像中 的前景区域,又覆盖了第一图像中的背景区域。
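A minimal sketch of steps 5041–5042 with the OpenCV C++ API (the same helper, called with a width of S*3, also covers step 5044 below). It assumes the segmentation images use the 255/0 identification values above and that N takes the default value 17 from formula (1); file and variable names are placeholders.

```cpp
#include <algorithm>
#include <vector>
#include <opencv2/opencv.hpp>

// Draw a band of identification value 128, centred on the complete outline of
// the foreground in `segImage` (255 = foreground, 0 = background); this is the
// operation of steps 5042 and 5044.
static cv::Mat drawUnknownBand(const cv::Mat& segImage, int lineWidth) {
    cv::Mat trimap = segImage.clone();
    std::vector<std::vector<cv::Point>> contours;
    cv::findContours(segImage, contours, cv::RETR_EXTERNAL, cv::CHAIN_APPROX_NONE);
    cv::drawContours(trimap, contours, -1, cv::Scalar(128, 128, 128), lineWidth);
    return trimap;
}

int main() {
    cv::Mat segResult     = cv::imread("seg_result.png", cv::IMREAD_GRAYSCALE);      // first image
    cv::Mat segResultHair = cv::imread("seg_result_hair.png", cv::IMREAD_GRAYSCALE); // second image
    int S = std::max(1, std::min(segResult.cols, segResult.rows) / 17);              // formula (1), N = 17
    cv::Mat trimap2 = drawUnknownBand(segResult, S);          // second ternary map (step 5042)
    cv::Mat trimap3 = drawUnknownBand(segResultHair, 3 * S);  // third ternary map (step 5044), width S*3
    cv::imwrite("trimap2.png", trimap2);
    cv::imwrite("trimap3.png", trimap3);
    return 0;
}
```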
图8是一个实施例提供的一种第二三元图的示意图。图8中左图所示为第一图像,图8中右图所示为第二三元图,在第二三元图中包括前景区域11、背景区域12以及第三画线子区域13,该第三画线子区域13是按照第二线条宽度进行绘制的。
5043、在第二图像中,获取头发区域的完整轮廓线。
其中,头发区域的完整轮廓线是指头发区域与背景区域之间的界线。终端基于获取到的第二图像,通过轮廓检测算法,在该第二图像中获取到头发区域的完整轮廓线。在一些实施例中,上述轮廓检测算法可以通过findContours函数实现,本申请实施例对此不作限定。
5044、按照第一线条宽度,在头发区域的完整轮廓线上绘制线条,得到第三三元图。
其中,该第三三元图包括头发区域和第四画线子区域,该第四画线子区域覆盖在头发区域的完整轮廓线上。第一线条宽度为第二线条宽度的M倍,其中,M大于1。例如,第一线条宽度为第二线条宽度的3倍;也即是,当第二线条宽度为S时,第一线条宽度即为S*3,本申请实施例此不作限定。
需要说明的是,本步骤中在头发区域的完整轮廓线上绘制线条的方式与上述步骤5042中类似,故在此不再赘述,仅通过下述公式(3)进行举例说明:
cv::drawContours(segResultHair, contours, -1, Scalar(128, 128, 128), S*3)    (3)
式中,segResultHair为第二图像;contours为通过findContours函数检测到的头发区域的完整轮廓线,-1表示对所有的轮廓线进行操作,Scalar为标识值,Scalar(128,128,128)表示将RGB通道中R、G以及B通道的颜色值均设置为128;S*3为第一线条宽度。
图9是一个实施例提供的一种第三三元图的示意图。图9中左图所示为第二图像,图9中右图所示为第三三元图,在第三三元图中包括前景区域14、背景区域15以及第四画线子区域16,该第四画线子区域16是按照第一线条宽度进行绘制的。
5045、对第二三元图和第三三元图进行合并处理,得到第一三元图。
其中,对第二三元图和第三三元图进行合并处理是指:取两张三元图中相同位置的最大标识值,作为第一三元图中对应位置的标识值。
下面对本步骤的具体实现方式进行详细阐述,包括下述步骤A至步骤C:
步骤A:获取第二三元图中各个像素点的第一标识值,该第一标识值用于标识第二三元图中像素点的颜色。
步骤B:获取第三三元图中各个像素点的第二标识值,该第二标识值用于标识第三三元图中像素点的颜色。
步骤C:基于第一标识值和第二标识值之间的大小关系,生成第一三元图。
其中,步骤C的实现方式包括:将第二三元图中任意位置上像素点的第一标识值,与第三三元图中相同位置上像素点的第二标识值进行比较;将第一标识值和第二标识值中的最大者,作为第一三元图中相同位置上像素点的第三标识值,该第三标识值用于标识第一三元图中像素点的颜色。
示意性地,继续参考图7,终端在获取到第二三元图和第三三元图后,通过下述公式(4)实现对这两张三元图的合并处理,以得到第一三元图。公式(4)如下:
Pixel_result = Pixel_leftUp > Pixel_leftDown ? Pixel_leftUp : Pixel_leftDown    (4)
式中,Pixel_result为第一三元图,也即是图7中的右图;Pixel_leftUp为第二三元图,也即是图7中的左上图;Pixel_leftDown为第三三元图,也即是图7中的左下图。
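Formula (4) amounts to a per-pixel maximum of the two ternary maps, which OpenCV provides directly; a sketch of step 5045:

```cpp
#include <opencv2/opencv.hpp>

// Step 5045: merge the second and third ternary maps by taking, at every
// position, the larger of the two identification values (0, 128 or 255).
cv::Mat mergeTrimaps(const cv::Mat& trimap2, const cv::Mat& trimap3) {
    cv::Mat trimap1;
    cv::max(trimap2, trimap3, trimap1);   // per-pixel maximum, as in formula (4)
    return trimap1;
}
```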
如图7中右图所示,通过上述步骤5045生成的第一三元图中,与头发区域所对应的第一画线子区域9大于与其他区域所对应的第二画线子区域10,通过这种按照不同的线条宽度绘制得到不 同的画线区域的方式,在后续的抠图处理中,对于头发区域等这类较为复杂的抠图区域,能够提升抠图的精细程度,以提高这部分区域的抠图效果。
505、基于第三图像和第一三元图,生成目标三元图。
在本申请实施例中,目标三元图包括前景区域和画线区域,该画线区域是通过在前景区域的轮廓线上绘制线条得到的;该前景区域的不同子区域对应不同的线条宽度。在一些实施例中,该前景区域还包括目标对象的躯干区域,其中,在目标三元图中,头发区域对应的线条宽度大于躯干区域对应的线条宽度,该躯干区域对应的线条宽度大于脸部区域对应的线条宽度。
图10是一个实施例提供的一种目标三元图的示意图。图10中右上图所示即为目标三元图,该图中包括前景区域17、背景区域18以及画线区域19。在该画线区域19中,头发区域对应的线条宽度大于躯干区域对应的线条宽度,躯干区域对应的线条宽度大于脸部区域对应的线条宽度。示意性地,画线区域19中线条宽度的关系可继续参考图10中的右下图,图中包括画线区域19a,19b以及19c,图中,19a表示头发区域对应的线条宽度,19b表示脸部区域对应的线条宽度,19c表示躯干区域对应的线条宽度,如图所示,头发区域对应的线条宽度19a大于躯干区域对应的线条宽度19c,躯干区域对应的线条宽度19c大于脸部区域对应的线条宽度19b。
下面对本步骤中终端生成目标三元图的具体实现方式进行详细阐述,包括下述步骤5051至步骤5052:
5051、基于第三图像中的脸部区域,确定第一三元图的目标重叠区域,该目标重叠区域为脸部区域与第二画线子区域的重叠区域。
其中,终端在获取到第一三元图后,基于第三图像中脸部区域的像素点位置,在第一三元图中确定目标重叠区域。示意性地,继续参考图10,如图10中右下图所示,图中包括目标重叠区域20,这一区域即为第三图像中的脸部区域与第一三元图中的第二画线子区域19之间的重叠区域。
5052、以目标标识值对目标重叠区域的像素点进行赋值,生成目标三元图,该目标标识值用于标识脸部区域中像素点的颜色。
其中,在第一三元图中,目标重叠区域的标识值为第二画线子区域的标识值,在本步骤中,对第一三元图中的目标重叠区域的标识值进行变更,以目标标识值对这一区域的像素点进行赋值,以生成目标三元图。例如,以脸部区域的标识值为255,第二画线子区域的标识值为128为例,在第一三元图中,目标重叠区域的标识值原本为128,经过本步骤5052,对该目标重叠区域的像素点以255的标识值进行重新赋值,得到目标三元图。
在一些实施例中,上述步骤5051和步骤5052可通过下述公式(5)来实现:
Pixel = Pixel ∈ φ{Face} ? 255 : Pixel_trimap    (5)
式中,φ{Face}表示脸部区域;255表示目标标识值,Pixel_trimap为目标三元图。通过该公式(5),将第一三元图中属于脸部区域的像素点以目标标识值进行赋值,能够使得脸部区域不参与下述步骤506至步骤507中,利用抠图模型计算透明度的过程。
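Formula (5) can be implemented as a masked assignment; the sketch below assumes the third image (face region = 255, elsewhere 0) can be used directly as the mask:

```cpp
#include <opencv2/opencv.hpp>

// Steps 5051-5052: wherever the face mask (the third image) is non-zero, reset
// the first ternary map to the target identification value 255, so the face
// region stays marked as foreground and is excluded from the alpha computation.
cv::Mat protectFace(const cv::Mat& trimap1, const cv::Mat& faceMask) {
    cv::Mat targetTrimap = trimap1.clone();
    targetTrimap.setTo(cv::Scalar(255), faceMask);
    return targetTrimap;
}
```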
经过上述步骤501至步骤505,终端在获取到原始图像后,自动生成了目标三元图,且在该目标三元图中,前景区域的不同子区域对应的线条宽度不同。
需要说明的是,在实际应用中,三元图中前后景混合区域的标记将直接影响抠图效果的精细度,且,三元图中如果将前景区域标记为前后景混合区域,会造成抠图结果不准确。
而在本申请实施例中,目标三元图中的画线区域即为前后景混合区域,终端在自动生成目标三元图的过程中,按照不同的线条宽度对头发区域和除了头发区域以外的其他区域进行线条绘制,保证了头发区域等这类复杂区域的抠图范围,提高了这部分区域的抠图精度;同时,将属于脸部区域的像素点赋值为与前景区域相同的目标标识值,考虑到了对人物肖像中关键区域的保护,避免出现 抠图的细节损失。
506、将该目标三元图和该原始图像输入到抠图模型中。
在本申请实施例中,抠图模型用于根据输入的目标三元图和原始图像,对该原始图像中各个像素点属于目标图像的概率进行计算,以输出透明度。在一些实施例中,通过下述公式(6)来计算透明度:
I=α*F+(1-α)*B      (6)
式中,I表示原始图像;F表示前景,也即是包括了目标对象所有元素的区域;B表示背景;α为透明度,用于表示原始图像中前景颜色所占的比重。公式(6)表明,原始图像是前景与背景按照一定的透明度叠加组成的。
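Formula (6) also indicates how the predicted alpha can be used once the matting model has produced it. The sketch below composites the portrait onto a new background; treating the original pixels as the foreground term F is an approximation chosen here because the model described above outputs only the alpha values, and the file names are placeholders.

```cpp
#include <opencv2/opencv.hpp>

int main() {
    cv::Mat image  = cv::imread("original.png");                       // I in formula (6)
    cv::Mat alpha8 = cv::imread("alpha.png", cv::IMREAD_GRAYSCALE);    // alpha from the matting model
    cv::Mat newBg  = cv::imread("background.png");                     // B, same size as I

    cv::Mat I, B, a, a3;
    image.convertTo(I, CV_32FC3, 1.0 / 255.0);
    newBg.convertTo(B, CV_32FC3, 1.0 / 255.0);
    alpha8.convertTo(a, CV_32FC1, 1.0 / 255.0);
    cv::cvtColor(a, a3, cv::COLOR_GRAY2BGR);                           // replicate alpha to 3 channels

    cv::Mat oneMinusA;
    cv::subtract(cv::Scalar::all(1.0), a3, oneMinusA);                 // 1 - alpha
    cv::Mat composite = I.mul(a3) + B.mul(oneMinusA);                  // alpha*F + (1 - alpha)*B

    composite.convertTo(composite, CV_8UC3, 255.0);
    cv::imwrite("target.png", composite);
    return 0;
}
```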
在一些实施例中,上述抠图模型可以为IndexNet抠图模型,或者,上述抠图模型还可以为GCAMatting抠图模型,又或者,上述抠图模型还可以为ContextNet模型等等,本申请实施对于上述抠图模型的具体类型不做限定。
下面以IndexNet抠图模型为例,对本步骤进行示意性说明。示意性地,参考图11,图11是本申请实施例提供的一种抠图模型的示意图,如图11所示,将目标三元图和原始图像作为输入,得到粗糙的Alpha(也即是α)图,和Alpha预测损失;同时,用粗糙的前景与背景进行合成后与原始图像进行比较,以得到图像的组合损失,最终利用一个卷积层来优化得到精细的Alpha图,输出抠图的精细结果,也即是各个像素点的Alpha值。
507、获取该抠图模型输出的透明度,该透明度用于表征像素点属于目标对象的概率。
508、基于该透明度对该原始图像进行抠图处理,得到包括目标对象的目标图像。
在本申请实施例中,本步骤508中的抠图处理是指基于各个像素点的透明度,将原始图像中的目标对象从背景中分离出来,以得到包括目标图像的过程。示意性地,参考图12,图12是本申请实施例提供的一种目标图像的示意图。图12中左图所示即为原始图像,图12中右上图所示即为根据本方法得到的目标图像,图中,人物肖像的发梢飘逸,抠图精细,面部完整;而图12中右下图所示为根据相关技术中的图像分割方法得到的目标图像,图中,虽然图像分割准确,但是人物肖像的发梢十分粗糙,且面部存在细节损失。
在本申请实施例中,在对原始图像进行抠图时,首先采用语义分割的方式得到多个包含有不同区域的分割图像,进一步地,根据这些分割图像,在前景区域的轮廓线上,采用不同宽度的线条进行绘制,以得到目标三元图,最终基于该目标三元图,生成目标图像。针对上述目标三元图,由于在前景区域的轮廓线上,采用了不同宽度的线条进行绘制,能够实现对不同区域的针对性抠图,对于需要进行精细化抠图的区域,能够提高这部分区域的抠图精度,同时还能够保证其他区域的抠图精度,使得最终得到效果精细且自然的抠图图像;另外,上述抠图过程还实现了全自动化,极大提高了抠图效率。
下面结合图13对本申请实施例提供的图像处理方法进行示例性地简要总结。如图13所示,包括以下六个图像处理阶段:
第一、获取原始图像,原始图像为包括人物肖像的图像。
第二、基于HRNET-OCR模型对原始图像进行图像语义分割,根据分割结果得到三种分割图像,第一种分割图像包括前景区域,该前景区域包含了人物肖像的所有元素;第二种分割图像包括头发区域,由于人类的躯干与背景的边缘线条比较明确,而头发由于其形状的特性,往往与背景融合较严重,需着重抠图;第三种分割图像包括脸部区域,也可以理解为保护区域,人脸作为人物肖像的重要关注部位,如果被误抠误伤,会极其影响观感,需保护这部分区域不被抠图抠破。
第三、针对第一种分割图像,以基础尺寸在前景区域的轮廓线上绘制线条,得到第二三元图; 针对第二种分割图像,以三倍基础尺寸在头发区域的轮廓线上绘制线条,得到第三三元图;其中,所绘制的线条的标识值为128。
第四、对第二三元图和第三三元图进行合并处理,得到合并后的第一三元图。
第五、将第一三元图中的脸部区域重新设置为前景标记,也即是将第一三元图中的脸部区域的各个像素点重新赋值为255,得到目标三元图。
第六、基于目标三元图,最终得到目标图像,也即是原始图像中的人物肖像。
针对上述目标三元图,由于在前景区域的轮廓线上,采用了不同宽度的线条进行绘制,能够实现对不同区域的针对性抠图,对于需要进行精细化抠图的区域,能够提高这部分区域的抠图精度,同时还能够保证其他区域的抠图精度,使得最终得到效果精细且自然的抠图图像;另外,上述抠图过程还实现了全自动化,极大提高了抠图效率。
示意性地,本申请实施例提供的图像处理方法的应用场景包括但不限于:
场景一、表情包场景
随着表情包文化的流行,许多应用都添加了制作表情包的功能,便于用户通过制作表情包的方式,来表达自己的情绪和心情。在一些情景下,用户希望将图片中的人物肖像抠取出来,然后在人物肖像的基础上通过添加贴纸、文字或背景等,制作得到自己想要的表情包。
例如,终端通过应用提供人物肖像的表情包制作功能,用户通过在终端上执行操作,输入想要抠取出人物肖像的原始图像。终端在获取到原始图像后,采用本申请实施例提供的图像处理方法,将该原始图像中的人物肖像自动抠取出来,并在终端上进行显示,以供用户后续在该人物肖像的基础上进行其他图像处理操作,进而得到用户想要的表情包。示意性地,终端抠取人物肖像的过程包括以下步骤1至步骤8:
1、终端获取原始图像。
2、终端将该原始图像输入到图像分割模型中。
3、终端获取该图像分割模型输出的第一图像、第二图像以及第三图像。该第一图像包括该原始图像中人物肖像所在的前景区域,该第二图像包括该人物肖像的头发区域,该第三图像包括该人物肖像的脸部区域。
4、终端基于该第一图像和该第二图像,生成第一三元图,该第一三元图包括前景区域、第一画线子区域和第二画线子区域。
5、终端基于第三图像和第一三元图,生成目标三元图。
6、终端将该目标三元图和该原始图像输入到抠图模型中。
7、终端获取该抠图模型输出的透明度,该透明度用于表征像素点属于人物肖像的概率。
8、终端基于该透明度对该原始图像进行抠图处理,得到包括人物肖像的目标图像。后续,用户在该目标图像的基础上制作表情包。
通过本申请实施例提供的图像处理方法能够实现对人物肖像的自动抠取,且抠取出的人物肖像效果精细且自然,能够满足用户对于表情包制作的个性化需求。
场景二、直播场景
在一些直播情景下,主播为了保护个人隐私,会希望隐藏自己所处的真实背景环境,然后在直播画面中仅显示主播的人物肖像,或者在主播的人物肖像的基础上添加其他虚拟背景。
例如,终端提供直播过程中的人物肖像模式,主播通过开启该人物肖像模式,使得终端实时获取由摄像头拍摄到的每一帧原始图像,然后采用本申请实施例提供的图像处理方法,将每一帧原始图像中主播的人物肖像抠取出来,并实时生成直播画面进行直播。终端具体抠取人物肖像的过程与上述场景一类似,故在此不再赘述。
可见,本申请实施例提供的图像处理方法由于实现了自动化人像抠取,能够直接应用于这种需要实时抠取人物肖像的场景。
应该理解的是,虽然上述各实施例的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,这些步骤可以以其它的顺序执行。而且,上述各实施例的流程图中的至少一部分步骤可以包括多个步骤或者多个阶段,这些步骤或者阶段并不必然是在同一时刻执行完成,而是可以在不同的时刻执行,这些步骤或者阶段的执行顺序也不必然是依次进行,而是可以与其它步骤或者其它步骤中的步骤或者阶段的至少一部分轮流或者交替地执行。
图14是根据本申请实施例提供的一种图像处理装置的结构示意图。该装置用于执行上述图像处理方法执行时的步骤,参见图14,装置包括:图像分割模块1401、三元图生成模块1402以及抠图模块1403。
图像分割模块1401,用于对原始图像进行图像语义分割,得到第一图像、第二图像以及第三图像,该第一图像包括该原始图像中目标对象所在的前景区域,该第二图像包括该目标对象的头发区域,该第三图像包括该目标对象的脸部区域;
三元图生成模块1402,用于基于该第一图像、该第二图像以及该第三图像,生成目标三元图,该目标三元图包括该前景区域和画线区域,该画线区域是通过在该前景区域的轮廓线上绘制线条得到的;该前景区域的不同子区域对应不同的线条宽度;
抠图模块1403,用于基于该目标三元图,对该原始图像进行抠图处理,得到包括该目标对象的目标图像。
在一些实施例中,该前景区域还包括该目标对象的躯干区域,其中,在该目标三元图中,该头发区域对应的线条宽度大于该躯干区域对应的线条宽度,该躯干区域对应的线条宽度大于该脸部区域对应的线条宽度。
在一些实施例中,该三元图生成模块1402包括:
第一生成单元,用于基于该第一图像和该第二图像,生成第一三元图,该第一三元图包括该前景区域、第一画线子区域和第二画线子区域;
其中,该第一画线子区域覆盖在该头发区域靠近背景区域一侧的轮廓线上,该第二画线子区域覆盖在其他区域的轮廓线上,该其他区域为该前景区域中除了该头发区域之外的区域;第一线条宽度大于第二线条宽度,该第一线条宽度用于绘制该第一画线子区域,该第二线条宽度用于绘制该第二画线子区域;
第二生成单元,用于基于该第三图像和该第一三元图,生成该目标三元图。
在一些实施例中,该第一生成单元用于:在该第一图像中,获取该前景区域的完整轮廓线;按照该第二线条宽度,在该前景区域的完整轮廓线上绘制线条,得到第二三元图;其中,该第二三元图包括该前景区域和第三画线子区域,该第三画线子区域覆盖在该前景区域的完整轮廓线上;在该第二图像中,获取该头发区域的完整轮廓线;按照该第一线条宽度,在该头发区域的完整轮廓线上绘制线条,得到第三三元图;其中,该第三三元图包括该头发区域和第四画线子区域,该第四画线子区域覆盖在该头发区域的完整轮廓线上;对该第二三元图和该第三三元图进行合并处理,得到该第一三元图。
在一些实施例中,该第一线条宽度为该第二线条宽度的M倍,M大于1。
在一些实施例中,该第一生成单元还用于:获取该第二三元图中各个像素点的第一标识值,该第一标识值用于标识该第二三元图中像素点的颜色;获取该第三三元图中各个像素点的第二标识值,该第二标识值用于标识该第三三元图中像素点的颜色;基于该第一标识值和该第二标识值之 间的大小关系,生成该第一三元图。
在一些实施例中,该第一生成单元还用于:将该第二三元图中任意位置上像素点的第一标识值,与该第三三元图中相同位置上像素点的第二标识值进行比较;将该第一标识值和该第二标识值中的最大者,作为该第一三元图中相同位置上像素点的第三标识值,该第三标识值用于标识该第一三元图中像素点的颜色。
在一些实施例中,该第二生成单元用于:基于该第三图像中的该脸部区域,确定该第一三元图的目标重叠区域,该目标重叠区域为该脸部区域与该第二画线子区域的重叠区域;以目标标识值对该目标重叠区域的像素点进行赋值,生成该目标三元图,该目标标识值用于标识该脸部区域中像素点的颜色。
在一些实施例中,该抠图模块1403用于:基于该目标三元图,获取该原始图像中各个像素点的透明度,该透明度用于表征该像素点属于该目标对象的概率;基于该透明度对该原始图像进行抠图处理,得到该目标图像。
在一些实施例中,该图像分割模块1401还用于:获取该原始图像;将该原始图像输入到图像分割模型中,其中,该图像分割模型用于根据输入的该原始图像,对该原始图像中每个像素点的语义类别进行计算,以输出该原始图像的至少一个图像;获取该图像分割模型输出的该第一图像、该第二图像以及该第三图像。
在一些实施例中,该抠图模块1403还用于:将该目标三元图和该原始图像输入到抠图模型中,该抠图模型用于根据输入的该目标三元图和该原始图像,对该原始图像中每个像素点属于该目标图像的概率进行计算,以输出该透明度;获取该抠图模型输出的该透明度。
在本申请实施例中,在对原始图像进行抠图时,首先采用语义分割的方式得到多个包含有不同区域的分割图像,进一步地,根据这些分割图像,在前景区域的轮廓线上,采用不同宽度的线条进行绘制,以得到目标三元图,最终基于该目标三元图,生成目标图像。针对上述目标三元图,由于在前景区域的轮廓线上,采用了不同宽度的线条进行绘制,能够实现对不同区域的针对性抠图,对于需要进行精细化抠图的区域,能够提高这部分区域的抠图精度,同时还能够保证其他区域的抠图精度,使得最终得到效果精细且自然的抠图图像;另外,上述抠图过程还实现了全自动化,极大提高了抠图效率。
需要说明的是:上述实施例提供的图像处理装置在进行图像处理时,仅以上述各功能模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能模块完成,即,将装置的内部结构划分成不同的功能模块,以完成以上描述的全部或者部分功能。上述装置中的各个模块可全部或部分通过软件、硬件及其组合来实现。上述各模块可以硬件形式内嵌于或独立于计算机设备中的处理器中,也可以以软件形式存储于计算机设备中的存储器中,以便于处理器调用执行以上各个模块对应的操作。另外,上述实施例提供的图像处理装置与图像处理方法实施例属于同一构思,其具体实现过程详见方法实施例,这里不再赘述。
在示例性实施例中,还提供了一种计算机设备。以计算机设备为终端为例,图15示出了本申请一个示例性实施例提供的终端1500的结构示意图。该终端1500可以是:智能手机、平板电脑、MP3播放器(Moving Picture Experts Group Audio Layer III,动态影像专家压缩标准音频层面3)、MP4(Moving Picture Experts Group Audio Layer IV,动态影像专家压缩标准音频层面4)播放器、笔记本电脑或台式电脑。终端1500还可能被称为用户设备、便携式终端、膝上型终端、台式终端等其他名称。
通常,终端1500包括有:一个或多个处理器1501和存储器1502。
处理器1501可以包括一个或多个处理核心,比如4核心处理器、8核心处理器等。处理器1501 可以采用DSP(Digital Signal Processing,数字信号处理)、FPGA(Field-Programmable Gate Array,现场可编程门阵列)、PLA(Programmable Logic Array,可编程逻辑阵列)中的至少一种硬件形式来实现。处理器1501也可以包括主处理器和协处理器,主处理器是用于对在唤醒状态下的数据进行处理的处理器,也称CPU(Central Processing Unit,中央处理器);协处理器是用于对在待机状态下的数据进行处理的低功耗处理器。在一些实施例中,处理器1501可以集成有GPU(Graphics Processing Unit,图像处理器),GPU用于负责显示屏所需要显示的内容的渲染和绘制。一些实施例中,处理器1501还可以包括AI(Artificial Intelligence,人工智能)处理器,该AI处理器用于处理有关机器学习的计算操作。
存储器1502可以包括一个或多个计算机可读存储介质,该计算机可读存储介质可以是非暂态的。存储器1502还可包括高速随机存取存储器,以及非易失性存储器,比如一个或多个磁盘存储设备、闪存存储设备。在一些实施例中,存储器1502中的非暂态的计算机可读存储介质用于存储至少一个计算机可读指令,该至少一个计算机可读指令用于被一个或多个处理器1501所执行以实现本申请中方法实施例提供的图像处理方法。
在一些实施例中,终端1500还可以包括:外围设备接口1503和至少一个外围设备。一个或多个处理器1501、存储器1502和外围设备接口1503之间可以通过总线或信号线相连。各个外围设备可以通过总线、信号线或电路板与外围设备接口1503相连。具体地,外围设备包括:射频电路1504、显示屏1505、摄像头组件1506、音频电路1507、定位组件1508和电源1509中的至少一种。
外围设备接口1503可被用于将I/O(Input/Output,输入/输出)相关的至少一个外围设备连接到一个或多个处理器1501和存储器1502。
射频电路1504用于接收和发射RF(Radio Frequency,射频)信号,也称电磁信号。射频电路1504通过电磁信号与通信网络以及其他通信设备进行通信。
显示屏1505用于显示UI(User Interface,用户界面)。该UI可以包括图形、文本、图标、视频及其它们的任意组合。当显示屏1505是触摸显示屏时,显示屏1505还具有采集在显示屏1505的表面或表面上方的触摸信号的能力。该触摸信号可以作为控制信号输入至一个或多个处理器1501进行处理。此时,显示屏1505还可以用于提供虚拟按钮和/或虚拟键盘,也称软按钮和/或软键盘。在一些实施例中,显示屏1505可以为一个,设置在终端1500的前面板;在另一些实施例中,显示屏1505可以为至少两个,分别设置在终端1500的不同表面或呈折叠设计;在另一些实施例中,显示屏1505可以是柔性显示屏,设置在终端1500的弯曲表面上或折叠面上。甚至,显示屏1505还可以设置成非矩形的不规则图形,也即异形屏。显示屏1505可以采用LCD(Liquid Crystal Display,液晶显示屏)、OLED(Organic Light-Emitting Diode,有机发光二极管)等材质制备。
摄像头组件1506用于采集图像或视频。
音频电路1507可以包括麦克风和扬声器。麦克风用于采集用户及环境的声波,并将声波转换为电信号输入至一个或多个处理器1501进行处理,或者输入至射频电路1504以实现语音通信。扬声器用于将来自一个或多个处理器1501或射频电路1504的电信号转换为声波。
定位组件1508用于定位终端1500的当前地理位置,以实现导航或LBS(Location Based Service,基于位置的服务)。
电源1509用于为终端1500中的各个组件进行供电。
在一些实施例中,终端1500还包括有一个或多个传感器1510。该一个或多个传感器1510包括但不限于:加速度传感器1511、陀螺仪传感器1512、压力传感器1513、指纹传感器1514、光学传感器1515以及接近传感器1516。
加速度传感器1511可以检测以终端1500建立的坐标系的三个坐标轴上的加速度大小。
陀螺仪传感器1512可以检测终端1500的机体方向及转动角度,陀螺仪传感器1512可以与加速度传感器1511协同采集用户对终端1500的3D动作。
压力传感器1513可以设置在终端1500的侧边框和/或显示屏1505的下层。
指纹传感器1514用于采集用户的指纹,由一个或多个处理器1501根据指纹传感器1514采集到的指纹识别用户的身份,或者,由指纹传感器1514根据采集到的指纹识别用户的身份。
光学传感器1515用于采集环境光强度。在一个实施例中,一个或多个处理器1501可以根据光学传感器1515采集的环境光强度,控制显示屏1505的显示亮度。
接近传感器1516,也称距离传感器,通常设置在终端1500的前面板。接近传感器1516用于采集用户与终端1500的正面之间的距离。
本领域技术人员可以理解,图15中示出的结构并不构成对终端1500的限定,可以包括比图示更多或更少的组件,或者组合某些组件,或者采用不同的组件布置。
本申请实施例还提供了一个或多个计算机可读存储介质,该计算机可读存储介质应用于计算机设备,该计算机可读存储介质中存储有至少一条计算机可读指令,该至少一条计算机可读指令由一个或多个处理器加载并执行以实现上述实施例的图像处理方法中计算机设备所执行的操作。
本申请实施例还提供了一种计算机可读指令产品或计算机可读指令,该计算机可读指令产品或计算机可读指令包括计算机可读指令代码,该计算机可读指令代码存储在计算机可读存储介质中。计算机设备的一个或多个处理器从计算机可读存储介质读取该计算机可读指令代码,一个或多个处理器执行该计算机可读指令代码,使得该计算机设备执行上述各种可选实现方式中提供的图像处理方法。
本领域普通技术人员可以理解实现上述实施例的全部或部分步骤可以通过硬件来完成,也可以通过程序来指令相关的硬件完成,所述的程序可以存储于一个或多个计算机可读存储介质中,上述提到的存储介质可以是只读存储器,磁盘或光盘等。
以上所述仅为本申请的可选实施例,并不用以限制本申请,凡在本申请的精神和原则之内,所作的任何修改、等同替换、改进等,均应包含在本申请的保护范围之内。

Claims (17)

  1. 一种图像处理方法,其特征在于,由计算机设备执行,所述方法包括:
    对原始图像进行图像语义分割,得到第一图像、第二图像以及第三图像,所述第一图像中的前景区域是所述原始图像中目标对象所在的区域,所述第二图像是所述目标对象的第一目标区域的分割图像,所述第三图像是所述目标对象的第二目标区域的分割图像;所述前景区域的子区域包括所述第一目标区域和所述第二目标区域;
    基于所述第一图像、所述第二图像以及所述第三图像,生成目标三元图,所述目标三元图包括所述前景区域和画线区域,所述画线区域是通过在所述前景区域的轮廓线上绘制线条得到的;所述前景区域的不同子区域对应不同的线条宽度;
    基于所述目标三元图,对所述原始图像中的所述目标对象进行抠图处理,得到包括所述目标对象的目标图像。
  2. 根据权利要求1所述的方法,其特征在于,所述第一目标区域为头发区域;所述第二目标区域为脸部区域;所述前景区域还包括所述目标对象的躯干区域,其中,在所述目标三元图中,所述头发区域对应的线条宽度大于所述躯干区域对应的线条宽度,所述躯干区域对应的线条宽度大于所述脸部区域对应的线条宽度。
  3. 根据权利要求1所述的方法,其特征在于,所述第一目标区域为头发区域;所述第二目标区域为脸部区域;所述基于所述第一图像、所述第二图像以及所述第三图像,生成目标三元图,包括:
    基于所述第一图像和所述第二图像,生成第一三元图,所述第一三元图包括所述前景区域、第一画线子区域和第二画线子区域;
    其中,所述第一画线子区域覆盖在所述头发区域靠近所述第一图像中的背景区域一侧的轮廓线上,所述第二画线子区域覆盖在所述前景区域中的非头发区域的轮廓线上,所述非头发区域是所述前景区域中除了所述头发区域之外的区域;所述第一画线子区域是使用第一线条宽度绘制的,所述第二画线子区域是使用第二线条宽度绘制的;所述第一线条宽度大于第二线条宽度;
    基于所述第三图像和所述第一三元图,生成所述目标三元图。
  4. 根据权利要求3所述的方法,其特征在于,所述基于所述第一图像和所述第二图像,生成第一三元图,包括:
    在所述第一图像中,获取所述前景区域的完整轮廓线;
    按照所述第二线条宽度,在所述前景区域的完整轮廓线上绘制线条,得到第二三元图;其中,所述第二三元图包括所述前景区域和第三画线子区域,所述第三画线子区域覆盖在所述前景区域的完整轮廓线上;
    在所述第二图像中,获取所述头发区域的完整轮廓线;
    按照所述第一线条宽度,在所述头发区域的完整轮廓线上绘制线条,得到第三三元图;其中,所述第三三元图包括所述头发区域和第四画线子区域,所述第四画线子区域覆盖在所述头发区域的完整轮廓线上;
    对所述第二三元图和所述第三三元图进行合并处理,得到所述第一三元图。
  5. 根据权利要求3或4所述的方法,其特征在于,所述第一线条宽度为所述第二线条宽度的M倍,M大于1。
  6. 根据权利要求4所述的方法,其特征在于,所述对所述第二三元图和所述第三三元图进行合并处理,得到所述第一三元图,包括:
    获取所述第二三元图中各个像素点的第一标识值,所述第一标识值用于标识所述第二三元图中 像素点的颜色;
    获取所述第三三元图中各个像素点的第二标识值,所述第二标识值用于标识所述第三三元图中像素点的颜色;
    基于所述第一标识值和所述第二标识值之间的大小关系,生成所述第一三元图。
  7. 根据权利要求6所述的方法,其特征在于,所述基于所述第一标识值和所述第二标识值之间的大小关系,生成所述第一三元图,包括:
    将所述第二三元图中任意位置上像素点的第一标识值,与所述第三三元图中相同位置上像素点的第二标识值进行比较;
    将所述第一标识值和所述第二标识值中的最大者,作为所述第一三元图中相同位置上像素点的第三标识值,所述第三标识值用于标识所述第一三元图中像素点的颜色。
  8. 根据权利要求3所述的方法,其特征在于,所述基于所述第三图像和所述第一三元图,生成所述目标三元图,包括:
    基于所述第三图像中的所述脸部区域,确定所述第一三元图的目标重叠区域,所述目标重叠区域为所述脸部区域与所述第二画线子区域的重叠区域;
    以目标标识值对所述目标重叠区域的像素点进行赋值,生成所述目标三元图,所述目标标识值用于标识所述脸部区域中像素点的颜色。
  9. 根据权利要求1所述的方法,其特征在于,所述基于所述目标三元图,对所述原始图像中的所述目标对象进行抠图处理,得到包括所述目标对象的目标图像,包括:
    基于所述目标三元图,获取所述原始图像中各个像素点的透明度,所述透明度用于表征所述像素点属于所述目标对象的概率;
    基于所述透明度对所述原始图像进行抠图处理,得到所述目标图像。
  10. 根据权利要求9所述的方法,其特征在于,所述基于所述目标三元图,获取所述原始图像中各个像素点的透明度包括:
    将所述目标三元图和所述原始图像输入至抠图模型中,基于所述抠图模型根据所述目标三元图和所述原始图像,对所述原始图像中每个像素点属于目标图像的概率进行计算,以输出透明度;
    获取所述抠图模型输出的所述透明度。
  11. 一种图像处理装置,其特征在于,所述装置包括:
    图像分割模块,用于对原始图像进行图像语义分割,得到第一图像、第二图像以及第三图像,所述第一图像中的前景区域是所述原始图像中目标对象所在的区域,所述第二图像是所述目标对象的第一目标区域的分割图像,所述第三图像是所述目标对象的第二目标区域的分割图像;所述前景区域的子区域包括所述第一目标区域和所述第二目标区域;
    三元图生成模块,用于基于所述第一图像、所述第二图像以及所述第三图像,生成目标三元图,所述目标三元图包括所述前景区域和画线区域,所述画线区域是通过在所述前景区域的轮廓线上绘制线条得到的;所述前景区域的不同子区域对应不同的线条宽度;
    抠图模块,用于基于所述目标三元图,对所述原始图像中的所述目标对象进行抠图处理,得到包括所述目标对象的目标图像。
  12. 根据权利要求11所述的装置,其特征在于,所述第一目标区域为头发区域;所述第二目标区域为脸部区域;所述前景区域还包括所述目标对象的躯干区域,其中,在所述目标三元图中,所述头发区域对应的线条宽度大于所述躯干区域对应的线条宽度,所述躯干区域对应的线条宽度大于所述脸部区域对应的线条宽度。
  13. 根据权利要求12所述的装置,其特征在于,所述第一目标区域为头发区域;所述第二目标区域为脸部区域;所述三元图生成模块包括:
    第一生成单元,用于基于所述第一图像和所述第二图像,生成第一三元图,所述第一三元图包括所述前景区域、第一画线子区域和第二画线子区域;
    其中,所述第一画线子区域覆盖在所述头发区域靠近背景区域一侧的轮廓线上,所述第二画线子区域覆盖在其他区域的轮廓线上,所述其他区域为所述前景区域中除了所述头发区域之外的区域;第一线条宽度大于第二线条宽度,所述第一线条宽度用于绘制所述第一画线子区域,所述第二线条宽度用于绘制所述第二画线子区域;
    第二生成单元,用于基于所述第三图像和所述第一三元图,生成所述目标三元图。
  14. 根据权利要求13所述的装置,其特征在于,所述第一生成单元用于:
    在所述第一图像中,获取所述前景区域的完整轮廓线;
    按照所述第二线条宽度,在所述前景区域的完整轮廓线上绘制线条,得到第二三元图;其中,所述第二三元图包括所述前景区域和第三画线子区域,所述第三画线子区域覆盖在所述前景区域的完整轮廓线上;
    在所述第二图像中,获取所述头发区域的完整轮廓线;
    按照所述第一线条宽度,在所述头发区域的完整轮廓线上绘制线条,得到第三三元图;其中,所述第三三元图包括所述头发区域和第四画线子区域,所述第四画线子区域覆盖在所述头发区域的完整轮廓线上;
    对所述第二三元图和所述第三三元图进行合并处理,得到所述第一三元图。
  15. 一种计算机设备,其特征在于,所述计算机设备包括一个或多个处理器和存储器,所述存储器用于存储至少一条计算机可读指令,所述至少一条计算机可读指令由所述一个或多个处理器加载并执行权利要求1至10中任一项所述的图像处理方法。
  16. 一个或多个计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有至少一条计算机可读指令,所述至少一条计算机可读指令由一个或多个处理器加载并执行以实现权利要求1至10中任一项所述的图像处理方法。
  17. 一种计算机程序产品,包括计算机可读指令,其特征在于,该计算机可读指令被一个或多个处理器执行时实现权利要求1至10中任一项所述的图像处理方法。
PCT/CN2022/071306 2021-01-18 2022-01-11 图像处理方法、装置、设备、存储介质及计算机程序产品 WO2022152116A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP22738995.4A EP4276754A1 (en) 2021-01-18 2022-01-11 Image processing method and apparatus, device, storage medium, and computer program product
JP2023524819A JP2023546607A (ja) 2021-01-18 2022-01-11 画像処理の方法、装置、デバイス及びコンピュータプログラム
US17/989,109 US20230087489A1 (en) 2021-01-18 2022-11-17 Image processing method and apparatus, device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110062567.1 2021-01-18
CN202110062567.1A CN113570614A (zh) 2021-01-18 2021-01-18 图像处理方法、装置、设备及存储介质

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/989,109 Continuation US20230087489A1 (en) 2021-01-18 2022-11-17 Image processing method and apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022152116A1 true WO2022152116A1 (zh) 2022-07-21

Family

ID=78160954

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/071306 WO2022152116A1 (zh) 2021-01-18 2022-01-11 图像处理方法、装置、设备、存储介质及计算机程序产品

Country Status (5)

Country Link
US (1) US20230087489A1 (zh)
EP (1) EP4276754A1 (zh)
JP (1) JP2023546607A (zh)
CN (1) CN113570614A (zh)
WO (1) WO2022152116A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113570614A (zh) * 2021-01-18 2021-10-29 腾讯科技(深圳)有限公司 图像处理方法、装置、设备及存储介质
CN117651972A (zh) * 2022-07-04 2024-03-05 北京小米移动软件有限公司 图像处理方法、装置、终端设备、电子设备及存储介质
CN116843708B (zh) * 2023-08-30 2023-12-12 荣耀终端有限公司 图像处理方法和装置

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150213611A1 (en) * 2014-01-29 2015-07-30 Canon Kabushiki Kaisha Image processing apparatus that identifies image area, and image processing method
CN110751655A (zh) * 2019-09-16 2020-02-04 南京工程学院 一种基于语义分割和显著性分析的自动抠图方法
CN111383232A (zh) * 2018-12-29 2020-07-07 Tcl集团股份有限公司 抠图方法、装置、终端设备及计算机可读存储介质
CN113570614A (zh) * 2021-01-18 2021-10-29 腾讯科技(深圳)有限公司 图像处理方法、装置、设备及存储介质

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150213611A1 (en) * 2014-01-29 2015-07-30 Canon Kabushiki Kaisha Image processing apparatus that identifies image area, and image processing method
CN111383232A (zh) * 2018-12-29 2020-07-07 Tcl集团股份有限公司 抠图方法、装置、终端设备及计算机可读存储介质
CN110751655A (zh) * 2019-09-16 2020-02-04 南京工程学院 一种基于语义分割和显著性分析的自动抠图方法
CN113570614A (zh) * 2021-01-18 2021-10-29 腾讯科技(深圳)有限公司 图像处理方法、装置、设备及存储介质

Also Published As

Publication number Publication date
US20230087489A1 (en) 2023-03-23
EP4276754A1 (en) 2023-11-15
CN113570614A (zh) 2021-10-29
JP2023546607A (ja) 2023-11-06

Similar Documents

Publication Publication Date Title
WO2022152116A1 (zh) 图像处理方法、装置、设备、存储介质及计算机程序产品
JP7110502B2 (ja) 深度を利用した映像背景減算法
CN110070056B (zh) 图像处理方法、装置、存储介质及设备
CN108594997B (zh) 手势骨架构建方法、装置、设备及存储介质
CN111541907B (zh) 物品显示方法、装置、设备及存储介质
Liu et al. Real-time robust vision-based hand gesture recognition using stereo images
US9727775B2 (en) Method and system of curved object recognition using image matching for image processing
US11308655B2 (en) Image synthesis method and apparatus
CN110400304B (zh) 基于深度学习的物体检测方法、装置、设备及存储介质
WO2020169051A1 (zh) 一种全景视频数据处理的方法、终端以及存储介质
CN110570460B (zh) 目标跟踪方法、装置、计算机设备及计算机可读存储介质
CN111242090B (zh) 基于人工智能的人脸识别方法、装置、设备及介质
CN112749613B (zh) 视频数据处理方法、装置、计算机设备及存储介质
CN111652796A (zh) 图像处理方法、电子设备及计算机可读存储介质
CN112052186A (zh) 目标检测方法、装置、设备以及存储介质
CN112257552B (zh) 图像处理方法、装置、设备及存储介质
CN113570052B (zh) 图像处理方法、装置、电子设备及存储介质
CN111768356A (zh) 一种人脸图像融合方法、装置、电子设备及存储介质
CN111738914A (zh) 图像处理方法、装置、计算机设备及存储介质
CN111325220B (zh) 图像生成方法、装置、设备及存储介质
CN113822798B (zh) 生成对抗网络训练方法及装置、电子设备和存储介质
CN112818979B (zh) 文本识别方法、装置、设备及存储介质
CN113642359B (zh) 人脸图像生成方法、装置、电子设备及存储介质
WO2020155984A1 (zh) 人脸表情图像处理方法、装置和电子设备
CN111107264A (zh) 图像处理方法、装置、存储介质以及终端

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22738995

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2023524819

Country of ref document: JP

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2022738995

Country of ref document: EP

Effective date: 20230811