CN114140547B - Image generation method and device - Google Patents

Image generation method and device

Info

Publication number
CN114140547B
CN114140547B
Authority
CN
China
Prior art keywords
image
region
training
sub
gray value
Prior art date
Legal status
Active
Application number
CN202111485564.5A
Other languages
Chinese (zh)
Other versions
CN114140547A (en)
Inventor
彭昊天
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202111485564.5A
Publication of CN114140547A
Application granted
Publication of CN114140547B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 11/00 2D [Two Dimensional] image generation
    • G06T 5/00 Image enhancement or restoration
    • G06T 5/20 Image enhancement or restoration using local operators
    • G06T 5/30 Erosion or dilatation, e.g. thinning
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G06T 7/155 Segmentation; Edge detection involving morphological operators

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an image generation method and apparatus, relating to the field of computer technology, and in particular to artificial-intelligence-based virtual/augmented reality and image processing. The implementation scheme is as follows: a first image containing a target object is obtained; and a binarized mask image corresponding to the first image is generated based on the first image, wherein the mask image corresponds to a binarized matte image obtained by matting the first image for the target object, and a region of the mask image having a first one of the two gray values corresponds both to the region of the matte image having the first gray value and to the region of the first image occupied by the target object.

Description

Image generation method and device
Technical Field
The present disclosure relates to the field of computer technology, in particular to artificial-intelligence-based virtual/augmented reality and image processing technologies, and more particularly to an image generation method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
Background
Artificial intelligence is the discipline that studies how to make computers simulate certain human thought processes and intelligent behaviors (such as learning, reasoning, thinking, and planning), and it spans both hardware-level and software-level technologies. Artificial intelligence hardware technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, and big data processing; artificial intelligence software technologies mainly include computer vision, speech recognition, natural language processing, machine learning/deep learning, big data processing, knowledge graph technologies, and the like.
Artificial-intelligence-based image processing techniques have spread into many fields. These include artificial-intelligence-based virtual image generation, in which a target object is segmented from among multiple objects in an image so that the target object can be generated.
The approaches described in this section are not necessarily approaches that have been previously conceived or pursued. Unless otherwise indicated, it should not be assumed that any of the approaches described in this section qualify as prior art merely by virtue of their inclusion in this section. Similarly, unless otherwise indicated, the problems mentioned in this section should not be considered as having been acknowledged in any prior art.
Disclosure of Invention
The present disclosure provides an image generation method, an apparatus, an electronic device, a computer-readable storage medium, and a computer program product.
According to an aspect of the present disclosure, there is provided an image generation method including: obtaining a first image containing a target object; and generating a binarized mask image corresponding to the first image based on the first image, wherein the mask image corresponds to a binarized matte image obtained by matting the first image for the target object.
According to another aspect of the present disclosure, there is provided a method for training an image generation model, comprising: obtaining a training image containing a target object and a binarized matte image corresponding to the training image, wherein the matte image is obtained by matting the training image for the target object, a region having a first gray value in the matte image corresponds to a first region of the target object in the training image, and the first gray value is one of two gray values; and training the image generation model based on the training image and the matte image, so that the image generation model outputs, based on an input image, a generated image corresponding to a matte image of the input image.
According to another aspect of the present disclosure, there is provided an image generating apparatus including: a first acquisition unit configured to obtain a first image containing a target object; and a generating unit configured to generate a binarized mask image corresponding to the first image based on the first image, wherein the mask image corresponds to a binarized matte image obtained by matting the first image with respect to the target object.
According to another aspect of the present disclosure, there is provided an apparatus for training an image generation model, comprising: a training image obtaining unit configured to obtain a training image containing a target object and a binarized matte image corresponding to the training image, where the matte image is obtained by matting the training image for the target object, a region in the matte image having a first gray value corresponds to a first region of the target object in the training image, and the first gray value is one of two gray values; and a training unit configured to train the image generation model based on the training image and the matte image, so that the image generation model outputs, based on an input image, a generated image corresponding to a matte image of the input image.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to implement the method described above.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to implement the method described above.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program, wherein the computer program, when executed by a processor, implements the method described above.
According to one or more embodiments of the present disclosure, a mask image is obtained based on a first image containing a target object. Because the mask image corresponds to a binarized matte image obtained by matting the first image for the target object, and because the region of the mask image having a first one of the two gray values corresponds both to the region of the matte image having that gray value and to the region of the first image occupied by the target object, the generated mask image can serve as a mask of the target object, as would be obtained by segmenting the first image for the target object, while its precision matches that of the matte image, ensuring the accuracy of the obtained mask. In addition, compared with matting techniques, the image generation method of the present disclosure requires no manual interaction to specify foreground and background, so an image with a matting-level effect is generated with high processing efficiency.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate exemplary embodiments of the embodiments and, together with the description, serve to explain the exemplary implementations of the embodiments. The illustrated embodiments are for purposes of illustration only and do not limit the scope of the claims. Throughout the drawings, identical reference numbers designate similar, but not necessarily identical, elements.
FIG. 1 illustrates a schematic diagram of an exemplary system in which various methods described herein may be implemented, according to an embodiment of the present disclosure;
FIG. 2 shows a flow diagram of an image generation method according to an embodiment of the present disclosure;
FIG. 3 shows a flow diagram of a method for training an image generation model according to an embodiment of the present disclosure;
FIG. 4 shows a flowchart of a process of obtaining a training image containing a target object and a binarized matte image corresponding to the training image in a method for training an image generation model according to an embodiment of the present disclosure;
FIG. 5A shows a schematic diagram of a training image in a method for training an image generation model according to an embodiment of the present disclosure;
FIG. 5B shows a schematic diagram of a segmented image obtained based on a training image in a method for training an image generation model according to an embodiment of the present disclosure;
FIG. 5C shows a schematic diagram of a binarized processed image obtained based on a training image in a method for training an image generation model according to an embodiment of the present disclosure;
FIG. 5D shows a schematic diagram of a dilated image obtained based on a segmented image in a method for training an image generation model according to an embodiment of the present disclosure;
FIG. 5E shows a schematic diagram of an eroded image obtained based on a segmented image and a binarized image in a method for training an image generation model according to an embodiment of the present disclosure;
FIG. 5F shows a schematic diagram of a trimap in a method for training an image generation model according to an embodiment of the present disclosure;
FIG. 5G shows a schematic diagram of a matte image in a method for training an image generation model according to an embodiment of the present disclosure;
FIG. 6 shows a flowchart of a process of obtaining a trimap based on a segmented image and a binarized processed image in a method for training an image generation model according to an embodiment of the present disclosure;
FIG. 7 shows a flowchart of a process of obtaining a trimap based on a dilated image and a binarized processed image in a method for training an image generation model according to an embodiment of the present disclosure;
FIG. 8 shows a flow diagram of a process of obtaining a trimap based on intersection images and dilated images in a method for training an image generation model according to an embodiment of the present disclosure;
FIG. 9 shows a flowchart of a process for training an image generation model based on training images and matting images in a method for training an image generation model according to an embodiment of the disclosure;
FIG. 10 shows a block diagram of the structure of an image generation apparatus according to an embodiment of the present disclosure;
FIG. 11 shows a block diagram of an apparatus for training an image generation model according to an embodiment of the present disclosure; and
FIG. 12 illustrates a block diagram of an exemplary electronic device that can be used to implement embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, it will be recognized by those of ordinary skill in the art that various changes and modifications may be made to the embodiments described herein without departing from the scope of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
In the present disclosure, unless otherwise specified, the use of the terms "first", "second", etc. to describe various elements is not intended to limit the positional relationship, the timing relationship, or the importance relationship of the elements, and such terms are used only to distinguish one element from another. In some examples, a first element and a second element may refer to the same instance of the element, and in some cases, based on the context, they may also refer to different instances.
The terminology used in the description of the various described examples in this disclosure is for the purpose of describing particular examples only and is not intended to be limiting. Unless the context clearly indicates otherwise, if the number of elements is not specifically limited, the elements may be one or more. Furthermore, the term "and/or" as used in this disclosure is intended to encompass any and all possible combinations of the listed items.
Embodiments of the present disclosure will be described in detail below with reference to the accompanying drawings.
Fig. 1 illustrates a schematic diagram of an exemplary system 100 in which various methods and apparatus described herein may be implemented in accordance with embodiments of the present disclosure. Referring to fig. 1, the system 100 includes one or more client devices 101, 102, 103, 104, 105, and 106, a server 120, and one or more communication networks 110 coupling the one or more client devices to the server 120. Client devices 101, 102, 103, 104, 105, and 106 may be configured to execute one or more applications.
In embodiments of the present disclosure, the server 120 may run one or more services or software applications that enable the image generation method to be performed.
In some embodiments, the server 120 may also provide other services or software applications that may include non-virtual environments and virtual environments. In certain embodiments, these services may be provided as web-based services or cloud services, for example, provided to users of client devices 101, 102, 103, 104, 105, and/or 106 under a software as a service (SaaS) model.
In the configuration shown in fig. 1, server 120 may include one or more components that implement the functions performed by server 120. These components may include software components, hardware components, or a combination thereof, which may be executed by one or more processors. A user operating a client device 101, 102, 103, 104, 105, and/or 106 may, in turn, utilize one or more client applications to interact with the server 120 to take advantage of the services provided by these components. It should be understood that a variety of different system configurations are possible, which may differ from system 100. Accordingly, fig. 1 is one example of a system for implementing the various methods described herein and is not intended to be limiting.
The user may receive the generated image using client devices 101, 102, 103, 104, 105, and/or 106. The client device may provide an interface that enables a user of the client device to interact with the client device. The client device may also output information to the user via the interface. Although fig. 1 depicts only six client devices, those skilled in the art will appreciate that any number of client devices may be supported by the present disclosure.
Client devices 101, 102, 103, 104, 105, and/or 106 may include various types of computer devices, such as portable handheld devices, general purpose computers (such as personal computers and laptop computers), workstation computers, wearable devices, smart screen devices, self-service terminal devices, service robots, gaming systems, thin clients, various messaging devices, sensors or other sensing devices, and so forth. These computer devices may run various types and versions of software applications and operating systems, such as MICROSOFT Windows, APPLE iOS, UNIX-like operating systems, or Linux or Linux-like operating systems (e.g., GOOGLE Chrome OS); or include various mobile operating systems such as MICROSOFT Windows Mobile OS, iOS, Windows Phone, and Android. Portable handheld devices may include cellular telephones, smart phones, tablets, Personal Digital Assistants (PDAs), and the like. Wearable devices may include head-mounted displays (such as smart glasses) and other devices. The gaming system may include a variety of handheld gaming devices, Internet-enabled gaming devices, and the like. The client device is capable of executing a variety of different applications, such as various Internet-related applications, communication applications (e.g., email applications), and Short Message Service (SMS) applications, and may use a variety of communication protocols.
Network 110 may be any type of network known to those skilled in the art that may support data communications using any of a variety of available protocols, including but not limited to TCP/IP, SNA, IPX, etc. By way of example only, one or more networks 110 may be a Local Area Network (LAN), an ethernet-based network, a token ring, a Wide Area Network (WAN), the internet, a virtual network, a Virtual Private Network (VPN), an intranet, an extranet, a Public Switched Telephone Network (PSTN), an infrared network, a wireless network (e.g., bluetooth, WIFI), and/or any combination of these and/or other networks.
The server 120 may include one or more general purpose computers, special purpose server computers (e.g., PC (personal computer) servers, UNIX servers, mid-end servers), blade servers, mainframe computers, server clusters, or any other suitable arrangement and/or combination. The server 120 may include one or more virtual machines running a virtual operating system, or other computing architecture involving virtualization (e.g., one or more flexible pools of logical storage that may be virtualized to maintain virtual storage for the server). In various embodiments, the server 120 may run one or more services or software applications that provide the functionality described below.
The computing units in server 120 may run one or more operating systems including any of the operating systems described above, as well as any commercially available server operating systems. The server 120 may also run any of a variety of additional server applications and/or middle tier applications, including HTTP servers, FTP servers, CGI servers, JAVA servers, database servers, and the like.
In some implementations, the server 120 can include one or more applications to analyze and consolidate data feeds and/or event updates received from users of the client devices 101, 102, 103, 104, 105, and 106. Server 120 may also include one or more applications to display data feeds and/or real-time events via one or more display devices of client devices 101, 102, 103, 104, 105, and 106.
In some embodiments, the server 120 may be a server of a distributed system, or a server incorporating a blockchain. The server 120 may also be a cloud server, or a smart cloud computing server or smart cloud host with artificial intelligence technology. A cloud server is a host product in a cloud computing service system and overcomes the drawbacks of difficult management and weak service scalability in traditional physical host and Virtual Private Server (VPS) services.
The system 100 may also include one or more databases 130. In some embodiments, these databases may be used to store data and other information. For example, one or more of the databases 130 may be used to store information such as audio files and object files. The data store 130 may reside in various locations. For example, the data store used by the server 120 may be local to the server 120, or may be remote from the server 120 and may communicate with the server 120 via a network-based or dedicated connection. The data store 130 may be of different types. In certain embodiments, the data store used by the server 120 may be a database, such as a relational database. One or more of these databases may store, update, and retrieve data to and from the database in response to the command.
In some embodiments, one or more of the databases 130 may also be used by applications to store application data. The databases used by the application may be different types of databases, such as key-value stores, object stores, or conventional stores supported by a file system.
The system 100 of fig. 1 may be configured and operated in various ways to enable application of the various methods and apparatus described in accordance with the present disclosure.
Referring to fig. 2, an image generation method 200 according to some embodiments of the present disclosure includes:
step S210: obtaining a first image containing a target object; and
step S220: based on the first image, a binarized mask image corresponding to the first image is generated.
Wherein the mask image corresponds to a binarized matte image obtained by matting the first image with respect to the target object, and wherein a region of the mask image having a first gray value of the two gray values corresponds to a region of the matte image having the first gray value and corresponds to a corresponding region of the target object in the first image.
The mask image is obtained based on a first image containing a target object. The mask image corresponds to a binarized matte image obtained by matting the first image for the target object, and the region of the mask image having a first one of the two gray values corresponds both to the region of the matte image having that gray value and to the region of the first image occupied by the target object. The generated mask image can therefore serve as a mask of the target object, as would be obtained by segmenting the first image for the target object, while its precision matches that of the matte image, ensuring the accuracy of the obtained mask. In addition, compared with matting techniques, the image generation method of the present disclosure requires no manual interaction to specify foreground and background, so an image with a matting-level effect is generated with high processing efficiency.
In the related art, semantic segmentation is often used to segment a target object in an image. However, for target objects with high precision requirements and many fine details, semantic segmentation cannot achieve an ideal result because many blurred regions exist. For example, when segmenting hair in a face image, the background visible between hair strands is often included in the hair mask as if it were part of the hair. Meanwhile, to handle blurred regions that are difficult to segment accurately, matting techniques are often adopted in the related art. However, matting requires manual interactive guidance (e.g., specifying foreground and background regions in the image), so the process cannot be automated.
In the method according to the present disclosure, a binarized mask image is automatically generated based on a first image containing a target object. On the one hand, the mask image can be used as a mask obtained by segmenting the first image for the target object; on the other hand, it has a matting effect corresponding to the matte image of the first image. Therefore, even for target objects with high precision requirements and many fine details, a high-precision mask can be obtained, and it is obtained efficiently.
In some embodiments, the target object may be any object such as a human, an animal, a plant, and the like.
In some embodiments, the target object comprises at least one of: human hair or fur, animal hair, and wool.
For human hair or fur, animal hair, and wool, a mask obtained by semantic segmentation often fails to reach the desired precision, while obtaining a mask with matting techniques often takes a long time. With the method according to the present disclosure, the obtained mask has higher precision and is obtained efficiently.
It should be understood that describing the embodiments with human hair or fur, animal hair, and wool as examples is merely illustrative; those skilled in the art will understand that the method according to the present disclosure can be applied to the segmentation of any object to obtain the effects of the present disclosure, namely a high-precision mask obtained with high efficiency.
In some embodiments, generating, based on the first image, a binarized mask image corresponding to the first image comprises:
obtaining the mask image based on the first image using an image generation model corresponding to the target object, wherein the image generation model is obtained by training with a training image containing the target object and a binarized matte image corresponding to the training image,
wherein the matte image corresponding to the training image is obtained by matting the training image for the target object, and a region of that matte image having the first gray value corresponds to the region of the target object in the training image.
Using an image generation model to obtain the mask image based on the first image realizes the generation of the mask image.
In the related art, the segmentation mask of a target object is obtained by semantic segmentation, and the segmentation quality often depends on the training data, which usually consist of training images and annotation data labeling the target object in those images. For target objects with high precision requirements and many fine details, the amount of detail to be labeled in each training image is very large, and reliable annotation data are often difficult to obtain.
According to the method of the present disclosure, the training image and the binarized matte image corresponding to the training image are used as the training data of the image generation model, so obtaining the training data is simple and reliable, and obtaining the image generation model is correspondingly simple.
In some embodiments, the image generation model includes a generative adversarial network (GAN), a cycle-consistent generative network (e.g., CycleGAN), a conditional GAN applied to image-to-image translation, and the like.
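For illustration, the following is a minimal inference sketch of steps S210 and S220 using a trained image-to-image generator. The generator architecture, the TorchScript checkpoint name, the file names, and the 0.5 binarization threshold are illustrative assumptions rather than the disclosed implementation; any generator trained as described in the next section (e.g., a pix2pix-style conditional GAN) could stand in.

```python
# Hedged sketch: produce a binarized mask from a first image with a trained
# image-to-image generator. Model format, file names, and threshold are assumptions.
import cv2
import numpy as np
import torch

def load_generator(checkpoint_path: str):
    # Assumes the trained generator was exported to TorchScript; any nn.Module works the same way.
    model = torch.jit.load(checkpoint_path)
    model.eval()
    return model

def generate_mask(generator, image_bgr: np.ndarray) -> np.ndarray:
    """Steps S210/S220: first image in, binarized mask image (0/255) out."""
    rgb = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB).astype(np.float32) / 255.0
    x = torch.from_numpy(rgb).permute(2, 0, 1).unsqueeze(0)   # 1 x 3 x H x W
    with torch.no_grad():
        y = generator(x)                                       # assumed 1 x 1 x H x W in [0, 1]
    alpha = y.squeeze().cpu().numpy()
    # Binarize so that the first gray value (255) marks the target-object region.
    return np.where(alpha > 0.5, 255, 0).astype(np.uint8)

if __name__ == "__main__":
    gen = load_generator("hair_matte_generator.pt")   # hypothetical checkpoint
    first_image = cv2.imread("face.jpg")              # hypothetical first image
    cv2.imwrite("mask.png", generate_mask(gen, first_image))
```

Because the output is already binarized to the two gray values, it can be used directly as the mask of the target object without any manual foreground/background interaction.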
Referring to fig. 3, a method 300 for training an image generation model according to some embodiments of the present disclosure includes:
step S310: obtaining a training image containing a target object and a binarized matte image corresponding to the training image; and
step S320: training the image generation model based on the training image and the matte image.
In step S310, the matte image is obtained by matting the training image for the target object, and a region of the matte image having a first gray value of the two gray values corresponds to a first region of the target object in the training image. In step S320, the image generation model is caused to output, based on an input image, a generated image corresponding to a matte image of the input image.
By training the image generation model with training images and their corresponding matte images, the image generation model can obtain a mask image based on a first image containing a target object. The mask image corresponds to a binarized matte image obtained by matting the first image for the target object, and the region of the mask image having a first one of the two gray values corresponds both to the region of the matte image having that gray value and to the region of the first image occupied by the target object. The generated mask image can therefore serve as a mask of the target object, as would be obtained by segmenting the first image for the target object, while its precision matches that of the matte image, ensuring the accuracy of the obtained mask. In addition, compared with matting techniques, the image generation method of the present disclosure requires no manual interaction to specify foreground and background, so an image with a matting-level effect is generated with high processing efficiency.
In some embodiments, the training image may be any image that contains a target object. The target object may be any one of human hair or fur, animal hair, and wool.
In some embodiments, as shown in fig. 4, obtaining a training image containing a target object and a binarized matte image corresponding to the training image comprises:
step S410: performing segmentation processing on the training image to obtain a binarized segmented image;
step S420: performing binarization processing on the training image, based on the gray value corresponding to the target object in the training image, to obtain a binarized image;
step S430: obtaining a trimap based on the segmented image and the binarized image; and
step S440: performing matting processing on the training image, based on the region corresponding to the second gray value in the trimap, to obtain the matte image.
In step S410, the sub-region of the segmented image having the first gray value at least includes a region corresponding to the first region. In step S420, the sub-region of the binarized image having the first gray value at least includes a region corresponding to the first region. In step S430, the trimap has three gray values, which include the two gray values; its second gray value region, having a second gray value different from the two gray values, at least includes a region corresponding to a second region of the training image, where the segmented image sub-region includes a region corresponding to the second region and the binarized image sub-region does not include a region corresponding to the second region.
A trimap is obtained from the segmented image and the binarized image of the training image, and matting is then performed based on the trimap to obtain the matte image. The whole process of obtaining the matte image corresponding to the training image can therefore be completed automatically by a computer, without manual interaction to indicate the foreground and background regions of the image, which simplifies the process of obtaining the matte image corresponding to the training image and improves efficiency.
In some embodiments, the training image is segmented using a segmentation model. The segmentation model may, for example, comprise a segmentation neural network.
Referring to FIG. 5A, a schematic diagram of a training image 500 according to one embodiment of the present disclosure is shown. The training image 500 is a face image, and the target object is the hair in the face image. According to some embodiments, the training image 500 is segmented to obtain a binarized segmented image 510 as shown in FIG. 5B. As shown in FIG. 5B, the segmented image 510 includes a segmented image sub-region 511 having the first gray value (255). Because the segmentation model is not accurate enough in the blurred region at the hair edge, the segmented image sub-region 511 in the segmented image 510 is larger than the first region where the hair is located in the training image 500; that is, the segmented image sub-region 511 includes a region corresponding to the first region.
It should be noted that the training image 500 has been mosaicked for the purposes of description and privacy protection; this processing is not required in actual applications. It should also be noted that the face images in the embodiments of the present disclosure are not face images of a specific user and cannot reflect the personal information of any specific user; they come from a public data set.
In some embodiments, the process of binarizing the training images differs for different training images based on different gray value thresholds.
In some examples, the training image is binarized for hair in the face image based on the gray value of the hair in the training image.
With continued reference to fig. 5A, for the training image 500, binarization is performed using the gray value of the hair in the training image 500 as the gray value threshold. For example, if a gray value threshold of 80 is obtained based on the training image 500, positions in the training image 500 whose gray value is smaller than the threshold are set to 255, and positions whose gray value is larger than the threshold are set to 0, giving the binarized image 520 shown in fig. 5C. As shown in fig. 5C, the binarized image 520 includes a binarized image sub-region 521 having the first gray value (255). Because the gray values of the eyebrows, clothes, or other parts are similar to the gray values of the hair during binarization, the binarized image sub-region 521 includes, in addition to the region corresponding to the first region where the hair is located in the training image 500, regions corresponding to the eyebrows, clothes, or other parts whose gray values are close to those of the hair.
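A minimal sketch of the binarization in step S420, assuming the threshold of 80 quoted in the example; because darker pixels (the hair-like gray values) are mapped to 255 and lighter pixels to 0, this is an inverse binary threshold. In practice the threshold would be derived per image from the gray values of the hair.

```python
# Hedged sketch of step S420: inverse binary threshold; the value 80 follows the example above.
import cv2

training_image = cv2.imread("face.jpg", cv2.IMREAD_GRAYSCALE)   # hypothetical training image 500
# Gray values below the threshold become 255, gray values above it become 0,
# matching the description of the binarized image 520.
_, binarized = cv2.threshold(training_image, 80, 255, cv2.THRESH_BINARY_INV)
cv2.imwrite("binarized.png", binarized)
```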
Because the binarized image sub-region 521 in the binarized image 520 and the segmented image sub-region 511 in the segmented image 510 both include a region corresponding to the first region where the hair is located in the training image 500, the region of the training image 500 to which both sub-regions correspond is the region where the hair is located. For the other regions, matting can then be performed in a targeted manner, which reduces the portion of the training image 500 that needs to be matted and improves the efficiency of obtaining the matte image of the training image 500.
Meanwhile, according to the method of the present disclosure, the matte image of the training image is obtained by matting the blurred region of the segmented image of the training image, so the matte image remains accurate while the matting efficiency is improved.
In some embodiments, the part of the segmented image sub-region 511 outside the overlapping region is taken as the region corresponding to the second gray value region of the trimap that is to be matted, where the overlapping region is the region present in both the binarized image sub-region 521 and the segmented image sub-region 511.
In some embodiments, as shown in fig. 6, obtaining the trimap based on the segmented image and the binarized image includes:
step S610: performing dilation processing on the segmented image to obtain a dilated image, wherein a dilated image sub-region having the first gray value in the dilated image is larger than the region corresponding to the segmented image sub-region; and
step S620: obtaining the trimap based on the dilated image and the binarized image, the second gray value region including at least a region corresponding to a third region of the training image that includes the second region, the dilated image sub-region including a region corresponding to the third region, and the binarized image sub-region not including a region corresponding to the third region.
By dilating the segmented image, the boundary of the segmented image sub-region in the segmented image is expanded, giving a dilated image sub-region larger than the segmented image sub-region. This enlarges the area of the second gray value region to be matted in the trimap obtained from the dilated image, ensures that the second gray value region to be matted in the trimap is large enough to cover the blurred region, and thus ensures the accuracy of the obtained matte image.
According to some examples, the dilated image 530 shown in fig. 5D is obtained by applying dilation to the segmented image 510 in fig. 5B, where the dilated image sub-region 531 having the first gray value (255) in the dilated image 530 is larger than the segmented image sub-region 511 in the segmented image 510.
In some embodiments, the part of the dilated image sub-region 531 outside the overlapping region is taken as the region corresponding to the second gray value region of the trimap that is to be matted, where the overlapping region is the region present in both the binarized image sub-region 521 and the dilated image sub-region 531.
In some embodiments, as shown in fig. 7, obtaining the trimap image based on the dilated image and the binarized processed image comprises:
step S710: obtaining a binarized intersection image based on the segmented image and the binarized image, wherein an intersection image sub-region having the first gray value in the intersection image corresponds to a fourth region in the training image, and the segmented image sub-region and the intersection image sub-region both include a region corresponding to the fourth region; and
step S720: obtaining the trimap based on the intersection image and the dilated image, wherein a first gray value region having the first gray value in the trimap is not larger than the fourth region.
By obtaining an intersection image of the segmented image and the binarized image and deriving from it the first gray value region of the trimap that does not need matting, the first gray value region is kept no larger than the intersection image sub-region, which is contained in both the segmented image sub-region and the binarized image sub-region. This reduces the area of the first gray value region and correspondingly enlarges the area of the second gray value region to be matted in the trimap, further ensuring that the second gray value region of the trimap sufficiently covers the blurred region and that the obtained matte image is accurate.
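A minimal sketch of the dilation in step S610 and the intersection in step S710, assuming the segmented image 510 and the binarized image 520 are available as 0/255 images; the elliptical 15 x 15 structuring element is an illustrative choice rather than a value from the disclosure.

```python
# Hedged sketch of steps S610 (dilation) and S710 (intersection). Kernel size is an assumption.
import cv2

seg_mask = cv2.imread("segmented.png", cv2.IMREAD_GRAYSCALE)    # segmented image 510 (0/255)
binarized = cv2.imread("binarized.png", cv2.IMREAD_GRAYSCALE)   # binarized image 520 (0/255)

kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (15, 15))

# S610: dilate the segmentation so its foreground sub-region grows past the blurred hair edge.
dilated = cv2.dilate(seg_mask, kernel)

# S710: keep only pixels that are foreground in BOTH the segmentation and the binarized image;
# this intersection sub-region corresponds to the fourth region of the training image.
intersection = cv2.bitwise_and(seg_mask, binarized)

cv2.imwrite("dilated.png", dilated)
cv2.imwrite("intersection.png", intersection)
```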
In some embodiments, as shown in fig. 8, obtaining the trimap based on the intersection image and the dilated image comprises:
step S810: performing erosion processing on the intersection image to obtain an eroded image, wherein an eroded image sub-region having the first gray value in the eroded image is smaller than the region corresponding to the fourth region; and
step S820: generating the trimap based on the eroded image and the dilated image, wherein the first gray value region corresponds to the eroded image sub-region, the second gray value region corresponds to a first sub-region of the dilated image sub-region, and the first sub-region is distinct from a second sub-region of the dilated image sub-region that corresponds to the eroded image sub-region.
By eroding the intersection image, noise points in the intersection image that clearly do not correspond to the hair region are removed, their influence is avoided, and the accuracy of the obtained trimap is improved.
According to some examples, the eroded image 540 shown in fig. 5E is obtained by eroding the intersection image computed from the segmented image 510 in fig. 5B and the binarized image 520 in fig. 5C, where the eroded image sub-region 541 in the eroded image 540 is smaller than the region corresponding to the fourth region present in both the segmented image 510 and the binarized image 520.
According to some examples, a trimap 550 as shown in fig. 5F is obtained from the eroded image 540 and the dilated image 530. The first gray value region 551 in the trimap 550 corresponds to the eroded image sub-region 541 in the eroded image 540, and the second gray value region 552 in the trimap 550 corresponds to a first sub-region of the dilated image sub-region 531 that is distinct from the second sub-region of the dilated image sub-region 531 corresponding to the eroded image sub-region 541.
In some embodiments, a deep image matting technique is employed to matte the second gray value region of the trimap.
In one example, matting is performed based on the trimap 550, resulting in a matte image 560 as shown in FIG. 5G.
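A minimal sketch of steps S810/S820 and the subsequent matting of step S440, assuming the dilated and intersection masks from the previous sketch have been saved to disk. The erosion kernel size is an assumption, and closed-form matting from the pymatting package is used only as a stand-in for the deep image matting technique mentioned above; thresholding the alpha matte then yields the two-gray-value matte image used as a training label.

```python
# Hedged sketch of steps S810/S820 plus the matting of step S440. Kernel size is an assumption,
# and pymatting's closed-form matting stands in for the deep matting model mentioned in the text.
import cv2
import numpy as np
from pymatting import estimate_alpha_cf  # assumed available: pip install pymatting

training_image = cv2.imread("face.jpg")
dilated = cv2.imread("dilated.png", cv2.IMREAD_GRAYSCALE)        # from the previous sketch
intersection = cv2.imread("intersection.png", cv2.IMREAD_GRAYSCALE)

# S810: erode the intersection to remove noise pixels that clearly do not belong to the hair.
kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
eroded = cv2.erode(intersection, kernel)

# S820: assemble the trimap 550: definite foreground (255) from the eroded image,
# unknown band (128) where the dilated mask extends beyond it, background (0) elsewhere.
trimap = np.zeros_like(dilated)
trimap[dilated == 255] = 128
trimap[eroded == 255] = 255

# S440: matting, which resolves only the unknown (second gray value) region of the trimap.
image_f = cv2.cvtColor(training_image, cv2.COLOR_BGR2RGB).astype(np.float64) / 255.0
alpha = estimate_alpha_cf(image_f, trimap.astype(np.float64) / 255.0)

# Binarize the alpha matte to obtain the two-gray-value matte image 560 used as a label.
matte = np.where(alpha > 0.5, 255, 0).astype(np.uint8)
cv2.imwrite("trimap.png", trimap)
cv2.imwrite("matte.png", matte)
```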
In some embodiments, as shown in FIG. 9, training the image generation model based on the training images and the matte images comprises:
step S910: obtaining a first generated image corresponding to the training image based on the training image by using the image generation model;
step S920: inputting the first generated image and the matte image into a discriminator network to obtain a discrimination result; and
step S930: and adjusting parameters of the image generation model based on the judgment result.
In step S920, the determination result indicates the similarity between the first generated image and the cutout image.
Through the conditional constraint imposed by the discriminator network, the image generated by the image generation model is made consistent with the matte image corresponding to the training image, so the training result of the image generation model is more accurate.
In some examples, the process of training the image generation model with the training image and its corresponding matte image is performed iteratively, so that an image generation model with high accuracy is finally obtained.
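A minimal sketch of one training iteration for steps S910 to S930, assuming a pix2pix-style setup: the generator G maps the training image to a first generated image, the discriminator D scores (image, matte) pairs, and an L1 term pulls the generated image toward the matte image. The network definitions, the loss combination, and the weight of 100 are assumptions, not the disclosed configuration.

```python
# Hedged sketch of steps S910-S930: one pix2pix-style training step.
# G, D, the optimizers, and the loss weighting are illustrative assumptions.
import torch
import torch.nn.functional as F

def train_step(G, D, opt_g, opt_d, training_image, matte_image, l1_weight=100.0):
    """training_image: Bx3xHxW in [0, 1]; matte_image: Bx1xHxW binarized ground-truth matte."""
    # S910: obtain a first generated image (candidate mask) from the training image.
    fake_matte = G(training_image)

    # S920 (discriminator side): D judges how close generated images are to real matte images.
    opt_d.zero_grad()
    real_score = D(torch.cat([training_image, matte_image], dim=1))
    fake_score = D(torch.cat([training_image, fake_matte.detach()], dim=1))
    loss_d = (F.binary_cross_entropy_with_logits(real_score, torch.ones_like(real_score))
              + F.binary_cross_entropy_with_logits(fake_score, torch.zeros_like(fake_score)))
    loss_d.backward()
    opt_d.step()

    # S930: adjust the generator so its output is judged consistent with the matte image.
    opt_g.zero_grad()
    fake_score = D(torch.cat([training_image, fake_matte], dim=1))
    loss_g = (F.binary_cross_entropy_with_logits(fake_score, torch.ones_like(fake_score))
              + l1_weight * F.l1_loss(fake_matte, matte_image))
    loss_g.backward()
    opt_g.step()
    return loss_g.item(), loss_d.item()
```

Iterating this step over the training set, as described above, would yield the trained image generation model used in steps S210 and S220.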
In the technical solutions of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure, and other handling of the personal information of the users involved all comply with the relevant laws and regulations and do not violate public order and good customs.
According to another aspect of the present disclosure, there is also provided an image generating apparatus, as shown in fig. 10, the apparatus 1000 including: a first acquisition unit 1010 configured to obtain a first image containing a target object; and a generating unit 1020 configured to generate a binarized mask image corresponding to the first image based on the first image, wherein the mask image corresponds to a binarized matte image obtained by matting the first image with respect to the target object, and wherein a region having a first grayscale value of two grayscale values in the mask image corresponds to a region having the first grayscale value in the matte image and corresponds to a region of the target object corresponding to the first image.
In some embodiments, the generating unit 1020 includes: a model processing unit configured to obtain the mask image based on the first image by using an image generation model corresponding to the target object, wherein the image generation model is obtained by training with a second image containing the target object and a binarized matte image corresponding to the second image, the matte image corresponding to the second image is obtained by matting the second image for the target object, and a region having the first gray value in that matte image corresponds to the region of the target object in the second image.
In some embodiments, the target object comprises at least one of: human hair or fur, animal hair and wool.
According to another aspect of the present disclosure, there is also provided an apparatus for training an image generation model, as shown in fig. 11, the apparatus 1100 includes: a training image obtaining unit 1110 configured to obtain a training image including a target object and a matte image corresponding to the training image, where the matte image is a binarized image obtained by matting the training image with respect to the target object, and a region having a first gray value of two gray values in the matte image corresponds to a first region of the target object corresponding to the training image; and a training unit 1120 configured to train the image generation model based on the training images and the matting images so that the image generation model outputs generated images corresponding to the matting images of the input images based on the input images.
In some embodiments, the training image acquisition unit 1110 includes: an image segmentation unit configured to perform segmentation processing on the training image to obtain a binarized segmented image, wherein a segmented image sub-region having the first gray value in the segmented image at least includes a region corresponding to the first region; a binarization processing unit configured to perform binarization processing on the training image based on a grayscale value corresponding to the target object in the training image to obtain a binarization-processed image, wherein a binarization-processed image sub-region having the first grayscale value in the binarization-processed image at least includes a region corresponding to the first region; a trimap image processing unit configured to obtain a trimap image, based on the segmentation image and the binarization-processed image, a second gray value region of the trimap image having a second gray value different from the two gray values among three gray values including the two gray values at least including a region corresponding to a second region of the training image, the segmentation image sub-region including the region corresponding to the second region, and the binarization-processed image sub-region not including the region corresponding to the second region; and the matting unit is configured to perform matting processing on the training image based on a region corresponding to the second gray value in the three-segment image to obtain the matting image.
In some embodiments, the trimap image processing unit includes: an expansion processing unit configured to perform expansion processing on the segmented image to obtain an expanded image, wherein an expanded image subregion with the first gray value in the expanded image is larger than a region corresponding to the segmented image subregion; and a first processing unit configured to obtain the trimap image based on the dilated image and the binarized image, the second gray value region including at least a region corresponding to a third region including the second region in the training image, the dilated image sub-region including a region corresponding to the third region, and the binarized image sub-region not including a region corresponding to the third region.
In some embodiments, the first processing unit comprises: an intersection image acquisition unit configured to acquire a binarized intersection image based on the segmented image and the binarized processed image, an intersection image sub-region having the first gray value in the intersection image corresponding to a fourth region in the training image, the segmented image sub-region and the intersection image sub-region each including a region corresponding to the fourth region; and a second processing unit configured to obtain the trimap image based on the intersection image and the dilated image, wherein a first gray value region having the first gray value in the trimap image is not greater than the fourth region.
In some embodiments, the second processing unit comprises: the erosion processing unit is configured to perform erosion processing on the intersection image to obtain an erosion image, and a sub-area of the erosion image with the first gray value in the erosion image is smaller than an area corresponding to the fourth area; a trimap image generation unit configured to generate the trimap image based on the erosion image and the dilated image, the first grayscale region corresponding to the erosion image sub-region, the second grayscale region corresponding to a first sub-region of the dilated image sub-region, the first sub-region being different from a second sub-region of the dilated image sub-region corresponding to the erosion image sub-region.
In some embodiments, the training unit comprises: a generated image acquisition unit configured to acquire a first generated image corresponding to the training image based on the training image using the image generation model; a discrimination unit configured to input the first generated image and the matte image to a discriminator network to obtain a discrimination result indicating a similarity of the first generated image and the matte image; and an adjusting unit configured to adjust a parameter of the image generation model based on the discrimination result.
According to another aspect of the present disclosure, there is also provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores a computer program which, when executed by the at least one processor, implements a method according to the above.
According to another aspect of the present disclosure, there is also provided a non-transitory computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the method according to the above.
According to another aspect of the present disclosure, there is also provided a computer program product comprising a computer program, wherein the computer program realizes the method according to the above when executed by a processor.
Referring to fig. 12, a block diagram of an electronic device 1200, which may be a server or a client of the present disclosure and which is an example of a hardware device that may be applied to aspects of the present disclosure, will now be described. The electronic device is intended to represent various forms of digital electronic computer devices, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other suitable computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing devices, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 12, the electronic apparatus 1200 includes a computing unit 1201, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 1202 or a computer program loaded from a storage unit 1208 into a Random Access Memory (RAM) 1203. In the RAM 1203, various programs and data necessary for the operation of the electronic apparatus 1200 may also be stored. The computing unit 1201, the ROM 1202, and the RAM 1203 are connected to each other by a bus 1204. An input/output (I/O) interface 1205 is also connected to bus 1204.
A number of components in the electronic device 1200 are connected to the I/O interface 1205, including: an input unit 1206, an output unit 1207, a storage unit 1208, and a communication unit 1209. The input unit 1206 may be any type of device capable of inputting information to the electronic device 1200; it may receive input numeric or character information and generate key signal inputs related to user settings and/or function control of the electronic device, and may include, but is not limited to, a mouse, a keyboard, a touch screen, a track pad, a track ball, a joystick, a microphone, and/or a remote controller. The output unit 1207 may be any type of device capable of presenting information and may include, but is not limited to, a display, speakers, a video/audio output terminal, a vibrator, and/or a printer. The storage unit 1208 may include, but is not limited to, magnetic disks and optical disks. The communication unit 1209 allows the electronic device 1200 to exchange information/data with other devices via a computer network, such as the Internet, and/or various telecommunications networks, and may include, but is not limited to, a modem, a network card, an infrared communication device, a wireless communication transceiver, and/or a chipset, such as Bluetooth(TM) devices, 802.11 devices, WiFi devices, WiMax devices, cellular communication devices, and/or the like.
The computing unit 1201 may be a variety of general purpose and/or special purpose processing components having processing and computing capabilities. Some examples of the computing unit 1201 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 1201 performs the various methods and processes described above, such as the method 200. For example, in some embodiments, the method 200 may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 1208. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 1200 via the ROM 1202 and/or the communication unit 1209. One or more steps of the method 200 described above may be performed when the computer program is loaded into the RAM 1203 and executed by the computing unit 1201. Alternatively, in other embodiments, the computing unit 1201 may be configured to perform the method 200 in any other suitable manner (e.g., by way of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, a server of a distributed system, or a server combined with a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be performed in parallel, sequentially or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
While embodiments or examples of the present disclosure have been described with reference to the accompanying drawings, it is to be understood that the above-described methods, systems, and apparatuses are merely illustrative embodiments or examples, and that the scope of the invention is not limited by these embodiments or examples but only by the claims as granted and their equivalents. Various elements in the embodiments or examples may be omitted or may be replaced with equivalents thereof. Further, the steps may be performed in an order different from that described in the present disclosure. Further, various elements in the embodiments or examples may be combined in various ways. Importantly, as technology evolves, many of the elements described herein may be replaced with equivalent elements that appear after the present disclosure.

Claims (18)

1. A method for training an image generation model, comprising:
obtaining a training image containing a target object and a binarized matte image corresponding to the training image, wherein the matte image is obtained by matting the training image with respect to the target object, a region having a first gray value in the matte image corresponds to a first region corresponding to the target object in the training image, and the first gray value is one of two gray values; and
training the image generation model based on the training image and the matte image, so that the image generation model outputs, based on an input image, a generated image corresponding to a matte image of the input image;
wherein the obtaining of the training image containing the target object and the binarized matte image corresponding to the training image comprises:
performing segmentation processing on the training image to obtain a binarized segmented image, wherein a segmented image sub-region with the first gray value in the segmented image at least comprises a region corresponding to the first region;
performing binarization processing on the training image based on a gray value corresponding to the target object in the training image to obtain a binarization-processed image, wherein a binarization-processed image sub-region having the first gray value in the binarization-processed image at least comprises a region corresponding to the first region;
obtaining a trimap based on the segmented image and the binarization-processed image, a second gray value region in the trimap including at least a region corresponding to a second region in the training image, the second gray value region having a second gray value which is different from the two gray values among three gray values comprising the two gray values, the segmented image sub-region including a region corresponding to the second region, and the binarization-processed image sub-region not including a region corresponding to the second region; and
performing matting processing on the training image based on the region corresponding to the second gray value in the trimap to obtain the matte image.
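For illustration only and not as part of the claims, the data-preparation pipeline of claim 1 can be sketched roughly as follows in Python with OpenCV. Here `segment_foreground` and `alpha_matting` are hypothetical placeholders for any segmentation network and any trimap-based matting algorithm, the gray range for the target object is an assumed parameter, and the simple mask combination stands in for the refinement detailed in claims 2 to 4.

```python
import cv2
import numpy as np

def build_matte(training_image, object_gray_range=(140, 255)):
    """Rough sketch of claim 1: segmentation + binarization -> trimap -> binarized matte image."""
    gray = cv2.cvtColor(training_image, cv2.COLOR_BGR2GRAY)

    # Binarized segmented image: its first-gray-value (255) sub-region covers at least the target region.
    seg = segment_foreground(training_image)        # hypothetical segmentation network, returns a 0/255 mask

    # Binarization-processed image: threshold on the gray range typical of the target object (e.g. hair).
    binarized = cv2.inRange(gray, object_gray_range[0], object_gray_range[1])

    # Trimap: pixels covered by the segmented image but not by the binarization-processed image
    # receive the second gray value (128); pixels covered by both stay confident foreground (255).
    trimap = np.zeros_like(gray)
    trimap[(seg == 255) & (binarized == 255)] = 255
    trimap[(seg == 255) & (binarized == 0)] = 128

    # Matting restricted to the second-gray-value region, then binarizing the alpha, yields the matte image.
    alpha = alpha_matting(training_image, trimap)    # hypothetical matting routine, returns alpha in [0, 1]
    matte = (alpha > 0.5).astype(np.uint8) * 255
    return matte
```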
2. The method according to claim 1, wherein the obtaining the trimap based on the segmented image and the binarization-processed image comprises:
performing dilation processing on the segmented image to obtain a dilated image, wherein a dilated image sub-region having the first gray value in the dilated image is larger than a region corresponding to the segmented image sub-region; and
obtaining the trimap based on the dilated image and the binarization-processed image, the second gray value region including at least a region corresponding to a third region including the second region in the training image, the dilated image sub-region including a region corresponding to the third region, and the binarization-processed image sub-region not including a region corresponding to the third region.
3. The method according to claim 2, wherein the obtaining the trimap based on the dilated image and the binarization-processed image comprises:
obtaining a binarized intersection image based on the segmented image and the binarization-processed image, wherein an intersection image sub-region having the first gray value in the intersection image corresponds to a fourth region in the training image, and the segmented image sub-region and the intersection image sub-region both comprise a region corresponding to the fourth region; and
obtaining the trimap based on the intersection image and the dilated image, wherein a first gray value region having the first gray value in the trimap is not larger than the fourth region.
4. The method of claim 3, wherein the obtaining the trimap based on the intersection image and the dilated image comprises:
performing erosion processing on the intersection image to obtain an erosion image, wherein an erosion image sub-region having the first gray value in the erosion image is smaller than a region corresponding to the fourth region; and
generating the trimap based on the erosion image and the dilated image, the first gray value region corresponding to the erosion image sub-region, the second gray value region corresponding to a first sub-region of the dilated image sub-region, the first sub-region being distinct from a second sub-region of the dilated image sub-region, the second sub-region corresponding to the erosion image sub-region.
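Purely as an illustration of claims 2 to 4, and again not as part of the claims, the morphological refinement could look like the following sketch. The kernel shape and size are assumptions, and `seg` and `binarized` are the 0/255 masks from the previous sketch.

```python
import cv2
import numpy as np

def refine_trimap(seg, binarized, kernel_size=15):
    """Sketch of claims 2-4: dilate, intersect, erode, then compose the trimap."""
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))

    # Claim 2: dilated image -- its foreground sub-region is larger than the segmented-image sub-region.
    dilated = cv2.dilate(seg, kernel)

    # Claim 3: binarized intersection image of the segmented and binarization-processed images.
    intersection = cv2.bitwise_and(seg, binarized)

    # Claim 4: erosion image -- its foreground sub-region is smaller than the intersection sub-region.
    eroded = cv2.erode(intersection, kernel)

    # Compose the trimap: second gray value (128) over the dilated foreground,
    # then first gray value (255) over the eroded foreground; the background stays 0.
    trimap = np.zeros_like(seg)
    trimap[dilated == 255] = 128
    trimap[eroded == 255] = 255
    return trimap
```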
5. The method of claim 1, wherein the training the image generation model based on the training image and the matte image comprises:
obtaining a first generated image corresponding to the training image based on the training image by using the image generation model;
inputting the first generated image and the matte image to a discriminator network to obtain a discrimination result, the discrimination result indicating a similarity of the first generated image and the matte image; and
adjusting parameters of the image generation model based on the discrimination result.
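One possible reading of claim 5 is a standard adversarial update. The following PyTorch sketch is illustrative only; the network definitions, the optimizers, and the binary cross-entropy loss form are assumptions rather than part of the claimed method.

```python
import torch
import torch.nn.functional as F

def train_step(generator, discriminator, g_opt, d_opt, training_image, matte_image):
    """Sketch of claim 5: obtain a generated image, score it against the matte, adjust the generator."""
    generated = generator(training_image)            # first generated image for the training image

    # Discrimination result: how well the generated image passes for the real matte image.
    d_real = discriminator(matte_image)
    d_fake = discriminator(generated.detach())
    d_loss = (F.binary_cross_entropy_with_logits(d_real, torch.ones_like(d_real)) +
              F.binary_cross_entropy_with_logits(d_fake, torch.zeros_like(d_fake)))
    d_opt.zero_grad()
    d_loss.backward()
    d_opt.step()

    # Adjust the image generation model's parameters based on the discrimination result.
    g_loss = F.binary_cross_entropy_with_logits(discriminator(generated), torch.ones_like(d_fake))
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()
    return d_loss.item(), g_loss.item()
```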
6. An image generation method, comprising:
obtaining a first image containing a target object; and
generating a binarized mask image corresponding to the first image based on the first image, wherein,
the mask image corresponds to a binarized matte image obtained by matting the first image with respect to the target object, wherein a region of the mask image having a first gray value of the two gray values corresponds to a region of the matte image having the first gray value and corresponds to a region corresponding to the target object in the first image, and wherein the generating of the binarized mask image corresponding to the first image based on the first image comprises:
generating the mask image based on the first image by using an image generation model;
wherein the image generation model is trained by a method for training an image generation model according to any one of claims 1 to 5.
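As a usage illustration of claim 6 only, inference with a trained model might look like the sketch below. The tensor layout and the 0.5 binarization threshold are assumptions about an otherwise unspecified model interface.

```python
import numpy as np
import torch

def generate_mask(model, first_image_bgr):
    """Sketch of claim 6: run the trained image generation model and binarize its output into the mask image."""
    tensor = (torch.from_numpy(first_image_bgr)
              .permute(2, 0, 1).float().unsqueeze(0) / 255.0)   # HxWx3 uint8 -> 1x3xHxW float
    with torch.no_grad():
        prediction = model(tensor)                              # assumed to return a 1x1xHxW map in [0, 1]
    mask = (prediction.squeeze().cpu().numpy() > 0.5).astype(np.uint8) * 255
    return mask                                                  # binarized mask image: gray values 0 and 255
```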
7. The method of claim 6, wherein the generating a binarized mask image corresponding to the first image based on the first image comprises:
obtaining the mask image based on the first image using an image generation model corresponding to the target object, the image generation model being obtained by training using a second image containing the target object and a binarized matte image corresponding to the second image, wherein,
the matte image corresponding to the second image is obtained by matting the second image with respect to the target object.
8. The method of claim 6, wherein the target object comprises at least one of: human hair or fur, animal hair and wool.
9. An apparatus for training an image generation model, comprising:
a training image obtaining unit, configured to obtain a training image containing a target object and a matte image corresponding to the training image, wherein the matte image is a binarized image obtained by matting the training image with respect to the target object, a region having a first gray value in the matte image corresponds to a first region corresponding to the target object in the training image, and the first gray value is one of two gray values; and
a training unit configured to train the image generation model based on the training image and the matte image, so that the image generation model outputs, based on an input image, a generated image corresponding to a matte image of the input image;
wherein the training image obtaining unit comprises:
an image segmentation unit configured to perform segmentation processing on the training image to obtain a binarized segmented image, wherein a segmented image sub-region having the first gray value in the segmented image at least includes a region corresponding to the first region;
a binarization processing unit configured to perform binarization processing on the training image based on a grayscale value corresponding to the target object in the training image to obtain a binarization-processed image, wherein a binarization-processed image sub-region having the first grayscale value in the binarization-processed image at least includes a region corresponding to the first region;
a trimap processing unit configured to obtain a trimap based on the segmented image and the binarization-processed image, a second gray value region in the trimap including at least a region corresponding to a second region in the training image, the second gray value region having a second gray value which is different from the two gray values among three gray values comprising the two gray values, the segmented image sub-region including a region corresponding to the second region, and the binarization-processed image sub-region not including a region corresponding to the second region; and
a matting unit configured to perform matting processing on the training image based on a region corresponding to the second gray value in the trimap to obtain the matte image.
10. The apparatus of claim 9, wherein the trimap processing unit comprises:
a dilation processing unit configured to perform dilation processing on the segmented image to obtain a dilated image, wherein a dilated image sub-region having the first gray value in the dilated image is larger than a region corresponding to the segmented image sub-region; and
a first processing unit configured to obtain the trimap based on the dilated image and the binarization-processed image, the second gray value region including at least a region corresponding to a third region including the second region in the training image, the dilated image sub-region including a region corresponding to the third region, and the binarization-processed image sub-region not including a region corresponding to the third region.
11. The apparatus of claim 10, wherein the first processing unit comprises:
an intersection image acquisition unit configured to obtain a binarized intersection image based on the segmented image and the binarization-processed image, an intersection image sub-region having the first gray value in the intersection image corresponding to a fourth region in the training image, the segmented image sub-region and the intersection image sub-region each including a region corresponding to the fourth region; and
a second processing unit configured to obtain the trimap based on the intersection image and the dilated image, wherein a first gray value region having the first gray value in the trimap is not greater than the fourth region.
12. The apparatus of claim 11, wherein the second processing unit comprises:
an erosion processing unit configured to perform erosion processing on the intersection image to obtain an erosion image, wherein an erosion image sub-region having the first gray value in the erosion image is smaller than a region corresponding to the fourth region; and
a trimap generation unit configured to generate the trimap based on the erosion image and the dilated image, the first gray value region corresponding to the erosion image sub-region, the second gray value region corresponding to a first sub-region of the dilated image sub-region, the first sub-region being distinct from a second sub-region of the dilated image sub-region, the second sub-region corresponding to the erosion image sub-region.
13. The apparatus of claim 9, wherein the training unit comprises:
a generated image acquisition unit configured to obtain a first generated image corresponding to the training image based on the training image using the image generation model;
a discrimination unit configured to input the first generated image and the matte image to a discriminator network to obtain a discrimination result indicating a similarity of the first generated image and the matte image; and
an adjusting unit configured to adjust parameters of the image generation model based on the discrimination result.
14. An image generation apparatus comprising:
a first acquisition unit configured to obtain a first image containing a target object; and
a generating unit configured to generate a binarized mask image corresponding to the first image based on the first image, wherein,
the mask image corresponds to a binarized matte image obtained by matting the first image with respect to the target object, wherein a region of the mask image having a first gray value of the two gray values corresponds to a region of the matte image having the first gray value and corresponds to a region corresponding to the target object in the first image, and wherein the generating of the binarized mask image corresponding to the first image based on the first image comprises:
generating the mask image based on the first image by using an image generation model;
wherein the image generation model is trained by a method for training an image generation model according to any one of claims 1 to 5.
15. The apparatus of claim 14, wherein the generating unit comprises:
a model processing unit configured to obtain the mask image based on the first image using an image generation model corresponding to the target object, the image generation model being obtained by training with a second image including the target object and a binarized matte image corresponding to the second image, wherein,
the matte image corresponding to the second image is obtained by matting the second image with respect to the target object.
16. The apparatus of claim 14, wherein the target object comprises at least one of: human hair or fur, animal hair and wool.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the method of any one of claims 1-8.
CN202111485564.5A 2021-12-07 2021-12-07 Image generation method and device Active CN114140547B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111485564.5A CN114140547B (en) 2021-12-07 2021-12-07 Image generation method and device

Publications (2)

Publication Number Publication Date
CN114140547A CN114140547A (en) 2022-03-04
CN114140547B true CN114140547B (en) 2023-03-14

Family

ID=80384699

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111485564.5A Active CN114140547B (en) 2021-12-07 2021-12-07 Image generation method and device

Country Status (1)

Country Link
CN (1) CN114140547B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116630316B (en) * 2023-07-24 2023-09-26 山东舜云信息科技有限公司 Belt fatigue detection alarm method and alarm system based on video analysis

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2014071666A (en) * 2012-09-28 2014-04-21 Dainippon Printing Co Ltd Image processor, image processing method and program
CN108596913A (en) * 2018-03-28 2018-09-28 众安信息技术服务有限公司 A kind of stingy drawing method and device
CN109903257A (en) * 2019-03-08 2019-06-18 上海大学 A kind of virtual hair-dyeing method based on image, semantic segmentation
CN110008832A (en) * 2019-02-27 2019-07-12 西安电子科技大学 Based on deep learning character image automatic division method, information data processing terminal
CN110717919A (en) * 2019-10-15 2020-01-21 阿里巴巴(中国)有限公司 Image processing method, device, medium and computing equipment
CN111383232A (en) * 2018-12-29 2020-07-07 Tcl集团股份有限公司 Matting method, matting device, terminal equipment and computer-readable storage medium
CN112581481A (en) * 2020-12-30 2021-03-30 Oppo广东移动通信有限公司 Image processing method and device, electronic equipment and computer readable storage medium
CN112614060A (en) * 2020-12-09 2021-04-06 深圳数联天下智能科技有限公司 Method and device for rendering human face image hair, electronic equipment and medium
CN112967196A (en) * 2021-03-05 2021-06-15 北京百度网讯科技有限公司 Image restoration method and device, electronic device and medium
CN113052868A (en) * 2021-03-11 2021-06-29 奥比中光科技集团股份有限公司 Cutout model training and image cutout method and device
CN113052755A (en) * 2019-12-27 2021-06-29 杭州深绘智能科技有限公司 High-resolution image intelligent matting method based on deep learning

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111784712B (en) * 2020-07-17 2023-03-14 北京字节跳动网络技术有限公司 Image processing method, device, equipment and computer readable medium


Also Published As

Publication number Publication date
CN114140547A (en) 2022-03-04


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant