WO2019114571A1 - Image processing method and related apparatus - Google Patents

Image processing method and related apparatus (图像处理方法及相关装置)

Info

Publication number
WO2019114571A1
Authority
WO
WIPO (PCT)
Prior art keywords
foreground
mask
pixel
foreground object
background
Prior art date
Application number
PCT/CN2018/118644
Other languages
English (en)
French (fr)
Inventor
朱晓龙
黄凯宁
罗镜民
梅利健
黄生辉
郑永森
王一同
黄浩智
Original Assignee
腾讯科技(深圳)有限公司
Priority date: 2017-12-11
Filing date: 2018-11-30
Publication date: 2019-06-20
Application filed by 腾讯科技(深圳)有限公司 (Tencent Technology (Shenzhen) Company Limited)
Publication of WO2019114571A1
Priority to US16/671,747 (granted as US11200680B2)

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 — Image analysis
    • G06T 7/10 — Segmentation; Edge detection
    • G06T 7/194 — Segmentation; Edge detection involving foreground-background segmentation
    • G06T 7/11 — Region-based segmentation
    • G06T 2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 — Special algorithmic details
    • G06T 2207/20072 — Graph-based image processing
    • G06T 2207/20081 — Training; Learning
    • G06T 2207/20084 — Artificial neural networks [ANN]
    • G06T 2207/20112 — Image segmentation details
    • G06T 2207/20156 — Automatic seed setting

Definitions

  • The present invention relates to the field of computer technology, and in particular to image processing technology.
  • Image matting refers to a technique for separating foreground objects (such as people or vehicles) from existing natural images. It was first used in special-effects production for the film and television industry, where it has generated enormous commercial value.
  • Image matting was initially limited to fixed-background matting, for example separating the foreground object (a person) in a blue-background portrait from that blue background.
  • Fig. 1 shows an example of an image with a natural background, in which the natural background 11 contains curtains, a carpet, and so on. After matting is completed, the foreground object 12 (a woman) can be separated from the natural background 11 in the image, and the natural background 11 can be removed.
  • Natural-background matting has promising applications in many fields, and how to extract the foreground object is a hot topic in current research.
  • In view of this, the embodiments of the present invention provide an image processing method and related apparatus, which can finely extract a foreground object from an original image.
  • To achieve the above objective, the embodiments of the present invention provide the following technical solutions:
  • In a first aspect, an embodiment of the present invention provides an image processing method, applied to an image processing device, including:
  • acquiring an original image, the original image including a foreground object; performing foreground extraction through a deep neural network according to the original image, to obtain a foreground region; acquiring pixel points of the foreground object from the foreground region, and forming a mask according to the pixel points of the foreground object, where the mask includes a mask value corresponding to each pixel point of the foreground object; and extracting the foreground object from the original image according to the mask.
  • In a second aspect, an embodiment of the present invention provides an image processing apparatus, including:
  • an obtaining unit, configured to acquire an original image, where the original image includes a foreground object;
  • a foreground-background separation unit, configured to perform foreground extraction through a deep neural network according to the original image, to obtain a foreground region;
  • a segmentation unit, configured to acquire pixel points of the foreground object from the foreground region, and form a mask according to the pixel points of the foreground object, where the mask includes a mask value corresponding to each pixel point of the foreground object;
  • an extraction unit, configured to extract the foreground object from the original image according to the mask.
  • In a third aspect, an embodiment of the present invention provides an image processing device including at least a processor and a memory; the processor performs the above image processing method by executing a program stored in the memory and invoking other devices.
  • In a fourth aspect, an embodiment of the present invention further provides a storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor to perform the steps in any of the image processing methods provided by the embodiments of the present invention.
  • In a fifth aspect, an embodiment of the present invention further provides a computer program product comprising instructions that, when run on a computer, cause the computer to perform the steps in any of the image processing methods provided by the embodiments of the present invention.
  • It can be seen that, in the image processing method provided by the embodiments of the present invention, the image processing device first acquires an original image including a foreground object and performs foreground extraction on the original image through a deep neural network, thereby extracting a foreground region; it then acquires the pixel points of the foreground object from the foreground region and forms a mask according to those pixel points, the mask including a mask value corresponding to each pixel point of the foreground object; finally, it extracts the foreground object from the original image according to the mask. Extracting the foreground object with a mask formed from its own pixel points ensures that the foreground object extracted from the original image is finer.
  • FIG. 1 is a schematic diagram of an original image with a natural background according to an embodiment of the present invention
  • FIG. 2a is a schematic diagram of an application scenario provided by an embodiment of the present invention.
  • FIG. 2b is an exemplary structural diagram of an image processing apparatus according to an embodiment of the present invention.
  • FIG. 2c is an exemplary structural diagram of an image processing device according to an embodiment of the present invention.
  • FIG. 3 is an exemplary flowchart of an image processing method according to an embodiment of the present invention.
  • FIG. 4a is a schematic diagram of an image including a foreground area according to an embodiment of the present invention.
  • FIG. 4b is another schematic diagram of an image including a foreground area according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a mask according to an embodiment of the present invention.
  • FIG. 6 is an exemplary flowchart of an image processing method according to an embodiment of the present invention.
  • FIG. 7a is a schematic diagram of an extracted foreground object according to an embodiment of the present invention.
  • FIG. 7b is a schematic diagram of a result after gesture recognition according to an embodiment of the present invention.
  • FIG. 8 is an exemplary flowchart of an image processing method according to an embodiment of the present invention.
  • Embodiments of the present invention provide an image processing method and related devices (an image processing apparatus and an image processing device), which are suitable for scenarios that require extracting a person (the whole body, the upper half of the body, or even a body part such as the head), and can also be used to extract other foreground objects (such as a car).
  • The core idea of the present invention is: first acquire an original image including a foreground object; then perform foreground extraction on the original image through a deep neural network to obtain a foreground region; next, acquire the pixel points of the foreground object from the foreground region and form a mask according to those pixel points, the mask including a mask value corresponding to each pixel point of the foreground object; finally, extract the foreground object from the original image according to the mask.
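  • As a rough illustration of this flow, the following Python sketch wires the steps together. The deep neural network is represented by a hypothetical extract_foreground_region callable (not part of the patent), and the graph-cut refinement is approximated with OpenCV's GrabCut, a graph-cut-based segmenter; this is a minimal sketch under those assumptions, not the claimed implementation.

```python
import cv2
import numpy as np

def matting_pipeline(original_bgr, extract_foreground_region):
    """Sketch of the described flow: DNN foreground region -> graph cut -> mask -> extraction.

    extract_foreground_region is a hypothetical stand-in for the trained deep neural
    network; it must return an HxW boolean array, True where a pixel is a foreground point.
    """
    # Step 1: coarse foreground region from the deep neural network.
    region = extract_foreground_region(original_bgr)

    # Step 2: automatic seeds -- outside the region is background, inside is probable foreground.
    seeds = np.full(region.shape, cv2.GC_BGD, np.uint8)
    seeds[region] = cv2.GC_PR_FGD

    # Step 3: graph-cut style refinement (GrabCut used here as an illustrative substitute).
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(original_bgr, seeds, None, bgd_model, fgd_model, 5, cv2.GC_INIT_WITH_MASK)

    # Step 4: mask values -- 255 for foreground-object pixels, 0 for background pixels.
    mask = np.where(np.isin(seeds, (cv2.GC_FGD, cv2.GC_PR_FGD)), 255, 0).astype(np.uint8)

    # Step 5: extract the foreground object from the original image according to the mask.
    foreground = cv2.bitwise_and(original_bgr, original_bgr, mask=mask)
    return mask, foreground
```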
  • The above image processing apparatus can be applied to an image processing device in the form of software or hardware.
  • Specifically, the image processing device may be a terminal such as a desktop computer, a mobile terminal (for example, a smartphone), or a tablet such as an iPad, or it may be a server that provides an image processing service.
  • When applied to the image processing device in software form, the above image processing apparatus may be standalone software; of course, it may also serve as a subsystem (sub-component) of a larger system (such as an operating system) to provide image processing services.
  • When applied to the image processing device in hardware form, the image processing apparatus described above may, for example, be a controller/processor of a terminal or a server.
  • Fig. 2a shows an exemplary application scenario (client-server scenario) of the image processing apparatus described above.
  • The application scenario may include a web server 21, an image processing server 22, and, in addition, a model training server 23.
  • The web server 21 is the front end and is responsible for communicating with client browsers; the image processing server 22, the model training server 23, and so on form the back end. The image processing server 22 provides image processing services, and the model training server 23 can be used to train the image processing algorithm (e.g., the matting algorithm) used by the image processing server 22, or to provide image samples for training.
  • In this scenario, the image processing apparatus described above is deployed in the image processing server 22.
  • Of course, the application scenario of the image processing apparatus may also be the following: the user selects a photo or video, and image processing (e.g., matting) is performed on the selected photo or video on a terminal (e.g., a smartphone). In this case, the image processing apparatus needs to be deployed on the terminal.
  • An exemplary structure of the image processing apparatus is shown in FIG. 2b and includes an obtaining unit 201, a foreground-background separation unit 202, a segmentation unit 203, and an extraction unit 204.
  • FIG. 2c is a schematic diagram of a possible structure of the image processing device in the above embodiment, including:
  • a bus, a processor 1, a memory 2, a communication interface 3, an input device 4, and an output device 5. The processor 1, the memory 2, the communication interface 3, the input device 4, and the output device 5 are connected to each other through the bus. Specifically:
  • The bus may include a path for transferring information between the components of the computer system.
  • The processor 1 may be a general-purpose processor, such as a general-purpose central processing unit (CPU), a network processor (NP), or a microprocessor; it may also be an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of the program of the present invention. It may also be a digital signal processor (DSP), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
  • The memory 2 stores a program or script for executing the technical solutions of the present invention, and may also store an operating system and other key services.
  • Specifically, the program may include program code, and the program code includes computer operation instructions.
  • A script is usually saved as text (such as ASCII) and is interpreted or compiled only when called.
  • More specifically, the memory 2 may include read-only memory (ROM) or other types of static storage devices capable of storing static information and instructions, random access memory (RAM) or other types of dynamic storage devices capable of storing information and instructions, disk storage, flash memory, and so on.
  • Input device 4 may include means for receiving data and information input by a user, such as a keyboard, mouse, camera, voice input device, touch screen, and the like.
  • Output device 5 may include devices that allow output of information to the user, such as a display screen, speakers, and the like.
  • The communication interface 3 may use any transceiver-type apparatus to communicate with other devices or communication networks, such as Ethernet, a radio access network (RAN), or a wireless local area network (WLAN).
  • It should be understood that Figure 2c only shows a simplified design of the image processing device.
  • In practical applications, the image processing device may include any number of transmitters, receivers, processors, controllers, memories, communication interfaces, and so on, and all servers/intelligent terminals capable of implementing the present invention fall within the protection scope of the present invention.
  • the processor 1 can implement the image processing method provided by the following embodiments by executing the program stored in the memory 2 and calling other devices.
  • the functions of the respective units of the image processing apparatus shown in FIG. 2b can be realized by the processor 1 described above executing the program stored in the memory 2 and calling other devices.
  • The above image processing method and image processing apparatus can be used to perform matting on images (e.g., photos) and on videos. It should be noted that a video is composed of frames of images, so matting can be performed on each frame of the video, thereby completing the matting of the entire video.
  • Since the matting of a video is based on the matting of single images, the following introduces the technical solution by taking the matting of a single image as an example.
  • FIG. 3 illustrates an exemplary flow of an image processing method performed by an image processing device, which may include at least the following steps:
  • Part 300: Acquire the original image.
  • the original image acquired by the image processing device may include a foreground object.
  • Figure 1 shows an example diagram of an original image containing a natural background 11 and a foreground object 12 (a female).
  • An image processing device can acquire the original image in different ways, depending on the scenario.
  • Taking a scenario in which the image processing device is a terminal as an example, the user can start an image processing application (for example, matting software) on the terminal. The application interface can provide a photo/video button; when the user taps it, the terminal's capture device (such as a camera) is invoked to take a photo or video, the captured photo or video can be stored in the terminal's local gallery, and the image processing device can obtain the captured photo or video as the original image.
  • Alternatively, the user can select an already captured photo or video from the terminal's local gallery and launch the image processing application to perform matting on it; the image processing application then obtains the user-selected photo or video from the local gallery as the original image for subsequent matting.
  • More specifically, the image processing application can provide image processing services to the user as an image editing component (e.g., an image editing component under a "beautification" option).
  • As another example, in a scenario where the image processing device is a server, the client can send a photo or video to the server side, and the image processing apparatus deployed on the server side can obtain the photo or video provided by the client as the original image.
  • In one example, Part 300 may be performed by the aforementioned obtaining unit 201.
  • Part 301: Perform foreground extraction through a deep neural network according to the original image, to obtain a foreground region.
  • In one example, Part 301 may be performed by the foreground-background separation unit 202 described above.
  • In the original image, the portion outside the foreground region is the background region.
  • An image is generally composed of many pixels.
  • In a specific implementation, the image processing device can determine whether each pixel in the original image is a foreground point or a background point; if it is a background point, the pixel can be assigned a first pixel value (for example, 0) to mark it as background, and if it is a foreground point, it can be assigned a second pixel value (for example, 255) to mark it as foreground. Taking the original image in Fig. 1 as an example, the image containing the foreground region obtained in this way can be as shown in Fig. 4a.
  • Alternatively, the image processing device may assign a fixed pixel value (for example, 0 or 255) to a pixel when it is determined to be a background point, and keep the pixel value unchanged when the pixel is determined to be a foreground point.
  • In that case, the extracted image containing the foreground region can be as shown in FIG. 4b.
  • As described above, the foreground region is composed of foreground points.
  • As can be seen from Fig. 4a and Fig. 4b, the foreground region may contain both the foreground object of Fig. 1 and part of the background; the remaining background is removed in subsequent processing.
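  • As a small illustration of the two labeling conventions just described (the variable names are assumptions, not taken from the patent), given a boolean map of foreground points produced by the separation step, the Fig. 4a-style output replaces both classes with fixed values, while the Fig. 4b-style output only overwrites the background points:

```python
import numpy as np

def label_like_fig_4a(is_foreground_point):
    """is_foreground_point: HxW boolean array, True where a pixel was judged a foreground point."""
    # First pixel value (0) marks background points, second pixel value (255) marks foreground points.
    return np.where(is_foreground_point, 255, 0).astype(np.uint8)

def label_like_fig_4b(original, is_foreground_point):
    """original: HxW or HxWx3 uint8 image; background points get a fixed value, foreground keeps its own."""
    out = original.copy()
    out[~is_foreground_point] = 0
    return out
```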
  • In a specific implementation, the image processing device may adopt various algorithms or models to extract the foreground region, for example a Gaussian mixture background model or a deep neural network; those skilled in the art may select and design flexibly as needed, and details are not repeated here.
  • Part 302: Acquire the pixel points of the foreground object from the foreground region, and form a mask according to the acquired pixel points of the foreground object.
  • Compared with the coarse foreground-background separation of Part 301, Part 302 performs a finer segmentation.
  • The mask includes a mask value corresponding to each pixel point of the foreground object.
  • Part 302 may be performed by the aforementioned segmentation unit 203.
  • The image processing device can obtain the pixel points of the foreground object from the foreground region using various algorithms.
  • For example, a graph cut algorithm can be used to perform finer segmentation and separate the pixel points of the foreground object from the pixel points of the background within the foreground region.
  • In one example, a first pixel value (e.g., 0) may be assigned to the segmented pixel points that make up the background, and a second pixel value (e.g., 255) may be assigned to the segmented pixel points that make up the foreground object; both the first pixel value and the second pixel value are mask values.
  • Taking the original image in Fig. 1 as an example, the mask obtained by the graph cut algorithm can be as shown in FIG. 5.
  • In other examples, the image processing device may divide the foreground region into multiple sub-regions, segment each sub-region using the graph cut algorithm to obtain sub-segmentation results, and then merge the sub-segmentation results to form the mask.
  • Part 303: Extract the foreground object from the original image according to the mask.
  • It can be seen that, in the image processing method provided by the embodiments of the present invention, the image processing device first acquires an original image including a foreground object and performs foreground extraction on the original image through a deep neural network, thereby extracting a foreground region; it then acquires the pixel points of the foreground object from the foreground region and forms a mask according to those pixel points, the mask including a mask value corresponding to each pixel point of the foreground object; finally, it extracts the foreground object from the original image according to the mask.
  • In the following, the technical solution is described in more detail by taking a deep neural network combined with a graph cut algorithm as an example.
  • A deep neural network is composed of "neurons" arranged in a layered structure, and the weights and biases between them are obtained by training.
  • A deep neural network includes an input layer and an output layer, with several hidden layers sandwiched in between; each layer consists of multiple neurons, neurons within the same layer are not connected to each other, and the output of each layer serves as the input of the next layer.
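  • For readers unfamiliar with this structure, the following PyTorch sketch shows such a layered network; the layer sizes are arbitrary illustrative choices and are not taken from the patent. Each layer's output feeds the next layer, and there are no connections between neurons of the same layer.

```python
import torch.nn as nn

# Input layer -> hidden layers -> output layer; the weights and biases are learned during training.
network = nn.Sequential(
    nn.Linear(10_000, 256),  # input layer: its dimension fixes the vector length the network accepts
    nn.ReLU(),
    nn.Linear(256, 64),      # hidden layer
    nn.ReLU(),
    nn.Linear(64, 10_000),   # output layer, e.g. one value per pixel of a 100x100 map
)
```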
  • Of course, the deep neural network needs to be trained in advance; the trained deep neural network can serve as the image processing apparatus, or as a component of it, to complete the matting operation.
  • In the embodiments of the present invention, the foreground region may be extracted by the deep neural network, and the pixel points of the foreground object may be obtained from the foreground region by the graph cut algorithm.
  • In that case, the foreground-background separation unit 202 mentioned above may itself be a deep neural network or may invoke one, and the segmentation unit 203 may execute or invoke the graph cut algorithm.
  • In terms of implementation, the graph cut algorithm may be executed by a branch (or hidden layer) of the deep neural network; in that case the deep neural network implements both the function of the foreground-background separation unit 202 and the function of the segmentation unit 203.
  • Alternatively, the graph cut algorithm may be executed by a network or algorithm module outside the deep neural network, that is, the function of the foreground-background separation unit 202 is performed separately from that of the segmentation unit 203.
  • FIG. 6 shows an exemplary flow of an image processing method implemented by a deep neural network combined with a graph cut algorithm.
  • In this embodiment, the deep neural network may include a foreground-background separation layer, and the flow may include at least the following steps:
  • Part 600: The deep neural network obtains the original image.
  • Part 600 is similar to the aforementioned Part 300 and is not described again here.
  • Part 601: The deep neural network encodes the original image to obtain an original vector, and inputs the original vector to the foreground-background separation layer.
  • Encoding the original image yields a fixed-length vector, that is, a fixed feature representation.
  • It should be noted that in image processing an image is often represented as a vector; for example, a 1000×1000 image can be represented as a vector of length 1,000,000.
  • A deep neural network generally operates only on vectors of a specific length (the length is determined by the dimension of the deep neural network's input layer), so an image needs to be flattened into a vector (i.e., the image is represented by a vector) before it can be input to the deep neural network. Part 601 encodes the original image to obtain a vector of the length required by the deep neural network.
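  • A toy illustration of this flattening step (the sizes are only examples, not values fixed by the patent): an image is resized so that its flattened length matches the network's input-layer dimension, then unrolled into a 1-D vector.

```python
import cv2
import numpy as np

EXPECTED_SIDE = 1000  # example value; in practice determined by the network's input-layer dimension

def image_to_vector(gray_image):
    # Resize so the flattened length matches the input layer, then flatten row by row
    # into a 1-D vector of length EXPECTED_SIDE * EXPECTED_SIDE, scaled to [0, 1].
    resized = cv2.resize(gray_image, (EXPECTED_SIDE, EXPECTED_SIDE))
    return resized.astype(np.float32).reshape(-1) / 255.0
```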
  • Part 602: The deep neural network extracts the foreground region from the input original vector through the foreground-background separation layer.
  • In practice, taking hair as an example, if the hair is blown by the wind, the pixels between hair strands may be judged to be foreground points; therefore, after the foreground region is obtained, it still needs to be refined in subsequent processing.
  • In addition to the foreground region, the input layer or another layer of the deep neural network may also extract at least one of pixel-level features and abstract features from the original vector.
  • The pixel-level features may include color, shape, edge, and texture features.
  • The abstract features may include discriminative key features, for example whether a person is present, or whether the foreground of the image is a person or a dog.
  • Part 603: Pixels in the foreground region are used as the designated foreground pixels, and pixels located outside the foreground region are used as the designated background pixels.
  • It should be noted that the existing graph cut algorithm requires the user to specify part of the foreground and part of the background of the image through some form of interaction; that is, the user needs to specify some seed points of the foreground object and of the background. In many cases, the user cannot accurately specify the foreground seed points and background seed points, for example due to lack of experience.
  • In this embodiment, because the foreground region has already been obtained in Part 602, some pixels inside the foreground region can automatically be used as foreground seed points, and pixels outside the foreground region can be used as background seed points, as sketched below.
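  • A minimal sketch of this automatic seeding, using OpenCV's GrabCut label constants as one concrete encoding (an assumption; the patent does not prescribe these constants):

```python
import cv2
import numpy as np

def seeds_from_region(foreground_region):
    """foreground_region: HxW boolean array produced by the foreground-background separation layer."""
    seeds = np.full(foreground_region.shape, cv2.GC_BGD, np.uint8)  # background seed points
    seeds[foreground_region] = cv2.GC_PR_FGD                        # foreground seed points
    # Optionally, pixels deep inside the region (e.g. after an erosion) could be promoted to
    # hard foreground seeds (cv2.GC_FGD), leaving boundary pixels "probable" for refinement.
    return seeds
```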
  • Part 603 may be performed by the foreground-background separation unit 202, or by the segmentation unit 203 that executes the graph cut algorithm.
  • Part 604: Using the graph cut algorithm (based on the foreground seed points and the background seed points), segment the pixel points of the foreground object and the pixel points of the background from the foreground region, to obtain a segmentation result.
  • The foreground region obtained in Part 602 may contain the foreground object together with part of the background, so it still needs to be segmented in order to further separate the foreground object from the foreground region.
  • In this embodiment, the graph cut algorithm is applied twice, so the segmentation result obtained in Part 604 can be referred to as a preliminary segmentation result (or a preliminary mask); of course, in other embodiments the graph cut algorithm may be applied only once, in which case the subsequent Part 606 need not be performed.
  • The graph cut algorithm casts the image segmentation problem as a minimum-cut (min cut) problem on a graph: the vertices of the graph are divided into two disjoint subsets that correspond to the foreground pixel set and the background pixel set, respectively.
  • The foreground pixel set contains the pixel points of the foreground object, and the background pixel set contains the pixel points of the background, so that the segmentation of the image into foreground and background is completed.
  • The graph cut algorithm is a relatively mature algorithm, and how it uses the foreground seed points and background seed points to perform the segmentation is not described here.
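  • To make the min-cut step concrete, the following sketch runs a graph-cut-based segmenter (OpenCV's GrabCut, used as an illustrative substitute rather than the patent's exact formulation) on the seed labels from Part 603, and converts the resulting foreground/background pixel sets into the two mask values described earlier.

```python
import cv2
import numpy as np

def preliminary_mask(original_bgr, seeds, iterations=5):
    """seeds: HxW uint8 array of cv2.GC_* labels (see Part 603); updated in place by grabCut."""
    bgd_model = np.zeros((1, 65), np.float64)  # background GMM parameters, estimated internally
    fgd_model = np.zeros((1, 65), np.float64)  # foreground GMM parameters, estimated internally
    cv2.grabCut(original_bgr, seeds, None, bgd_model, fgd_model, iterations, cv2.GC_INIT_WITH_MASK)
    # Vertices assigned to the foreground subset -> 255, vertices in the background subset -> 0.
    foreground_set = np.isin(seeds, (cv2.GC_FGD, cv2.GC_PR_FGD))
    return np.where(foreground_set, 255, 0).astype(np.uint8)
```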
  • Afterwards, the graph cut algorithm can feed the preliminary segmentation result into the deep neural network; according to the foregoing description, the preliminary segmentation result input to the deep neural network is in vector form.
  • Of course, if the graph cut algorithm is executed by a hidden layer of the deep neural network, the preliminary segmentation result is likewise a vector, and that hidden layer can output the preliminary segmentation result to the next layer for decoding.
  • Part 605: The deep neural network decodes the preliminary segmentation result to obtain a decoding result.
  • As mentioned above, the preliminary segmentation result is a vector, and decoding restores this vector to an image.
  • To distinguish it from the decoding results in later embodiments, the decoding result obtained here can be referred to as a first decoding result.
  • In one example, if the deep neural network is a convolutional deep neural network, the vector can be deconvolved to obtain the first decoding result.
  • The size of the first decoding result is determined by the dimension of the deep neural network's output layer.
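  • As a hedged illustration of what "deconvolving the vector back to an image" can look like, the following PyTorch decoder maps a flattened segmentation vector to an image-shaped map with transposed convolutions; all layer sizes are invented for the example, since the patent does not specify the decoder at this level.

```python
import torch.nn as nn

class SegmentationDecoder(nn.Module):
    """Restores a flattened preliminary segmentation vector to an image-shaped map."""
    def __init__(self, vector_length=4096):
        super().__init__()
        self.fc = nn.Linear(vector_length, 64 * 16 * 16)  # vector -> 64 x 16 x 16 feature map
        self.deconv = nn.Sequential(
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),  # 16x16 -> 32x32
            nn.ReLU(),
            nn.ConvTranspose2d(32, 1, kernel_size=4, stride=2, padding=1),   # 32x32 -> 64x64
            nn.Sigmoid(),  # per-pixel foreground probability
        )

    def forward(self, segmentation_vector):          # shape: (batch, vector_length)
        x = self.fc(segmentation_vector).view(-1, 64, 16, 16)
        return self.deconv(x)                        # shape: (batch, 1, 64, 64)
```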
  • It should be understood that, in some possible implementations, the mask for extracting the foreground object may be formed directly from the first decoding result, i.e., the subsequent Part 606 need not be performed.
  • Part 606: Perform a second segmentation of the (first) decoding result using the graph cut algorithm, to obtain a secondary segmentation result.
  • During the second segmentation, pixel points from the foreground pixel set and the background pixel set obtained in the initial segmentation may be used as the foreground seed points and the background seed points, respectively.
  • In other embodiments of the present invention, before the second segmentation is performed, the first decoding result may be converted into the data form required by the graph cut algorithm.
  • The secondary segmentation result may be referred to as a secondary mask; the mask values can be set as described above and are not repeated here.
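  • One way to realize this reuse of the first-pass pixel sets as seeds for the second pass is sketched below, again with GrabCut labels as an assumed encoding; the decoded result is expected to have been converted to an 8-bit 3-channel image beforehand, as noted above.

```python
import cv2
import numpy as np

def secondary_mask(decoded_bgr, preliminary_mask, iterations=3):
    """decoded_bgr: first decoding result converted to 8-bit BGR; preliminary_mask: 0/255 mask."""
    # Pixels kept by the first pass become probable foreground seeds, the rest probable background.
    seeds = np.where(preliminary_mask > 0, cv2.GC_PR_FGD, cv2.GC_PR_BGD).astype(np.uint8)
    # An eroded core gives hard foreground seeds; far outside the first-pass mask gives hard background.
    core = cv2.erode(preliminary_mask, np.ones((11, 11), np.uint8))
    halo = cv2.dilate(preliminary_mask, np.ones((11, 11), np.uint8))
    seeds[core > 0] = cv2.GC_FGD
    seeds[halo == 0] = cv2.GC_BGD
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(decoded_bgr, seeds, None, bgd_model, fgd_model, iterations, cv2.GC_INIT_WITH_MASK)
    return np.where(np.isin(seeds, (cv2.GC_FGD, cv2.GC_PR_FGD)), 255, 0).astype(np.uint8)
```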
  • Part 607: The deep neural network outputs the final segmentation result.
  • The final segmentation result is also a mask.
  • Taking the original image in Fig. 1 as an example, the mask obtained with the technical solution provided in this embodiment can be as shown in FIG. 5; it can be seen that in this embodiment the hair strands are precisely separated from the background.
  • It should be noted that, in a specific implementation, the extracted foreground region may be divided into multiple sub-regions according to the features extracted in Part 602; different tasks then perform the processing of Parts 603-606 on the sub-regions to obtain multiple secondary sub-segmentation results, and in Part 607 the multiple secondary sub-segmentation results are fused together to obtain the final segmentation result, which is output.
  • More specifically, taking a foreground region that contains a person as an example, different body parts/shapes may be identified according to at least one of the pixel-level features and the abstract features, and the foreground region is divided into sub-regions according to the recognition result.
  • Different sub-regions contain different parts (for example a head sub-region, an arm sub-region, and a torso sub-region), or the different sub-regions may be considered to have different shapes; tasks dedicated to different parts (for example a head-processing task, an arm-processing task, and a torso-processing task) then perform Parts 603-606 on the corresponding sub-regions.
  • In one example, the deep neural network may have multiple interfaces (branches), one for each task, with each task responsible for a different function.
  • For example, multiple tasks may perform the preliminary portrait segmentation (Part 604), the decoding, and the secondary portrait segmentation (Part 606) on the foreground region, each task focusing on a different part: one task may handle the head best, another the arms, and another the torso. Different parts can therefore be processed by different tasks, and the segmentation results obtained by the multiple tasks are finally merged together to obtain the final segmentation result, which is output; a schematic fusion step is sketched below.
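  • A schematic way to express the per-part processing and fusion described above (the part-specific tasks are hypothetical callables; the patent does not fix their form):

```python
import numpy as np

def fuse_sub_segmentations(sub_masks):
    """sub_masks: list of 0/255 masks, one per sub-region task (head, arms, torso, ...)."""
    final = np.zeros_like(sub_masks[0])
    for mask in sub_masks:
        final = np.maximum(final, mask)  # a pixel is foreground if any task marks it as foreground
    return final

# Hypothetical usage: each task runs Parts 603-606 on its own sub-region, then the results are fused.
# final_segmentation = fuse_sub_segmentations([head_task(head_sub), arm_task(arm_sub), torso_task(torso_sub)])
```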
  • Of course, if the foreground region is not divided into multiple sub-regions for separate processing, the secondary segmentation result can be output directly as the final segmentation result.
  • In other embodiments, the final segmentation result may also be output by the graph cut algorithm.
  • Parts 604, 606, and 607 can all be performed by the aforementioned segmentation unit 203.
  • Part 608: Extract the foreground object from the original image using the final segmentation result.
  • It should be noted that if the size (resolution) of the output final segmentation result does not match the original image, the final segmentation result can be converted to a size corresponding to the original image.
  • Taking the original image in Fig. 1 as an example, the extracted foreground object can be as shown in FIG. 7a.
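  • A short sketch of this final extraction step, including the resolution adjustment mentioned above; OpenCV is used for resizing here as one possible choice, mirroring the described behavior rather than reproducing the patented implementation.

```python
import cv2

def extract_foreground(original_bgr, final_mask):
    """final_mask: 0/255 mask output by the network; it may be smaller than the original image."""
    h, w = original_bgr.shape[:2]
    if final_mask.shape[:2] != (h, w):
        # Convert the mask to the size of the original image (nearest-neighbour keeps it binary).
        final_mask = cv2.resize(final_mask, (w, h), interpolation=cv2.INTER_NEAREST)
    # Keep only the foreground-object pixels; background pixels become black.
    return cv2.bitwise_and(original_bgr, original_bgr, mask=final_mask)
```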
  • In other embodiments of the present invention, after the foreground object is extracted, gesture recognition may further be performed based on the foreground object, so that the person can be identified, analyzed, and tracked.
  • For example, see Fig. 7b, which shows the result after gesture recognition.
  • In Fig. 7b, the poses of the head and arms are marked with lines.
  • Part 608 may be performed by the extraction unit 204.
  • In other embodiments of the present invention, the functionality of the extraction unit 204 may also be implemented by a layer of the deep neural network.
  • In addition, in other embodiments of the present invention, the graph cut algorithm may also be performed after decoding; FIG. 8 shows an exemplary flow of this variant, which may include at least the following steps:
  • Parts 800-802 are the same as the aforementioned Parts 600-602 and are not described again here.
  • Part 803: The deep neural network decodes the foreground region to obtain a second decoding result. For the decoding itself, see the description of Part 605 above.
  • Part 804 is similar to the aforementioned Part 603 and is not described again here.
  • Part 805: Using the graph cut algorithm, segment the pixel points of the foreground object and the pixel points of the background from the foreground region corresponding to the second decoding result, to obtain a segmentation result; the segmentation result is used to form a mask in subsequent operations.
  • Part 805 is similar to the aforementioned Part 604 and is not described again here.
  • Part 806: Output the final segmentation result.
  • In a specific implementation, the extracted foreground region may be divided into multiple sub-regions according to the features extracted in Part 802; different tasks perform the processing of Part 805 on the sub-regions to obtain multiple sub-segmentation results, and then, in Part 806, the multiple sub-segmentation results are merged together to obtain the final segmentation result, which is output.
  • Of course, if the foreground region is not divided into multiple sub-regions, the segmentation result obtained in Part 805 can be output directly as the final segmentation result.
  • Part 806 is similar to the aforementioned Part 607 and is not described again here.
  • Part 807: Form a mask according to the final segmentation result, and extract the foreground object from the original image using the mask.
  • Part 807 is similar to the aforementioned Part 608 and is not described again here.
  • It should be noted that some existing image processing methods apply the graph cut algorithm directly to the original image. By contrast, the embodiments of the present invention first extract the foreground region and then use the graph cut algorithm to segment the foreground region; because the foreground region itself contains less background, a more efficient and finer segmentation can be performed.
  • In addition, the existing graph cut algorithm requires the user to specify part of the foreground and part of the background of the image through some form of interaction, that is, to specify some seed points of the foreground object and of the background; in many cases the user cannot precisely specify the foreground seed points and background seed points, for example due to lack of experience.
  • In the embodiments described here, because the foreground region is extracted first, some pixels inside the foreground region can automatically be used as foreground seed points and some pixels outside the foreground region as background seed points; besides streamlining the process, this also avoids the problem of users being unable to specify part of the foreground and part of the background.
  • In addition, some existing matting approaches use an adversarial learning network for matting, but such a network does not handle matting against complex backgrounds well, because its matting is not pixel-level.
  • The matting in this application is pixel-level and is therefore better suited to matting against complex backgrounds.
  • The following briefly describes the apparatus involved in the embodiments of the present invention. As mentioned above, the image processing apparatus includes an obtaining unit 201, a foreground-background separation unit 202, a segmentation unit 203, and an extraction unit 204.
  • In other embodiments of the present invention, in terms of acquiring the pixel points of the foreground object, the segmentation unit 203 in all of the foregoing embodiments may be specifically configured to:
  • segment the pixel points of the foreground object and the pixel points of the background from the foreground region using the graph cut algorithm, to obtain a segmentation result; decode the segmentation result to obtain a first decoding result, where the first decoding result is used to form the mask;
  • form the mask according to the first decoding result.
  • Alternatively, in terms of acquiring the pixel points of the foreground object, the segmentation unit 203 in all of the above embodiments may be specifically configured to:
  • decode the foreground region to obtain a second decoding result; segment the pixel points of the foreground object and the pixel points of the background from the second decoding result using the graph cut algorithm, to obtain a segmentation result, where the segmentation result is used to form the mask;
  • form the mask according to the segmentation result.
  • Before the pixel points of the foreground object and the pixel points of the background are segmented using the graph cut algorithm, the segmentation unit 203 may further be configured to:
  • use pixel points in the foreground region as the designated foreground pixel points, and use pixel points located outside the foreground region as the designated background pixel points.
  • In addition, in other embodiments of the present invention, the segmentation unit 203 may further divide the foreground region or the second decoding result into multiple sub-regions, segment each sub-region using the graph cut algorithm to obtain sub-segmentation results, and merge the sub-segmentation results to obtain the segmentation result, where the segmentation result is used to form the mask.
  • In the image processing apparatus provided by the embodiments of the present invention, the original image including the foreground object is first acquired, and foreground extraction is performed on the original image through the deep neural network to extract the foreground region; the pixel points of the foreground object are then obtained from the foreground region and a mask is formed according to them, the mask including a mask value corresponding to each pixel point of the foreground object; finally, the foreground object is extracted from the original image according to the mask. Extracting the foreground object with a mask formed from its own pixel points ensures that the foreground object extracted from the original image is finer.
  • Embodiments of the present invention also claim an image processing apparatus including at least a processor and a memory, the processor executing the image processing method described above by executing a program stored in a memory and calling other devices.
  • the embodiment of the present invention further provides a storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor to perform the steps in the image processing method provided by any embodiment of the present invention.
  • Embodiments of the present invention also claim a computer program product comprising instructions which, when executed on a computer, cause the computer to perform the steps of the image processing method provided by any of the embodiments of the present invention.
  • The steps of a method or algorithm described in connection with the embodiments disclosed herein can be implemented directly in hardware, in a software unit executed by a processor, or in a combination of the two.
  • The software unit can be placed in random access memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Abstract

An embodiment of the present invention provides an image processing method and a related apparatus. The method is applied to an image processing device and includes: acquiring an original image that includes a foreground object; performing foreground extraction through a deep neural network according to the original image, to obtain a foreground region; acquiring pixel points of the foreground object from the foreground region; forming a mask according to the pixel points of the foreground object, where the mask includes a mask value corresponding to each pixel point of the foreground object; and extracting the foreground object from the original image according to the mask. In the embodiments of the present invention, extracting the foreground object using a mask formed from the pixel points of the foreground object ensures that the foreground object extracted from the original image is finer.

Description

图像处理方法及相关装置
本申请要求于2017年12月11日提交中国专利局、申请号为2017113073818、申请名称为“图像处理方法及相关装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本发明涉及计算机技术领域,具体涉及图像处理技术。
背景技术
数学图像抠图是指把前景对象(例如人物、车辆等)从已有的自然图像中分离出来的一种技术,它最早被运用于影视业的特效制作中,为影视业赢得了巨大的商业价值。
数学图像抠图开始仅限于固定背景抠图,比如,将一幅蓝色背景的人像图中的前景对象(人物)与蓝色背景分离。
随着科技的日益发展,数学图像抠图逐渐渗入到人们的日常生活中,并从固定背景抠图发展到自然背景抠图。图1示例性得示出了一张有着自然背景的图像,其中,自然背景11中包含窗帘、地毯等,在完成抠图后,可从该图像中将前景对象12(一位女性)与自然背景11分离,将自然背景11去除。
自然背景抠图在众多领域中有着良好的应用前景,如何将前景对象提取出来,是目前研究的热门。
发明内容
有鉴于此,本发明实施例提供一种图像处理方法及相关装置,能够从原始图像中精细地提取出前景对象。
为实现上述目的,本发明实施例提供如下技术方案:
第一方面,本发明实施例提供了一种图像处理方法,应用于图像处理设备,包括:
获取原始图像,所述原始图像包括前景对象;
根据所述原始图像,通过深度神经网络进行前景提取,得到前景区域;
从所述前景区域中获取所述前景对象的像素点,并根据所述前景对象的像素点形成掩膜,其中,所述掩膜包括所述前景对象的各个像素点对应的掩膜值;
根据所述掩膜从所述原始图像中提取出所述前景对象。
第二方面,本发明实施例提供了一种图像处理装置,包括:
获取单元,用于获取原始图像,所述原始图像包括前景对象;
前景背景分离单元,用于根据所述原始图像,通过深度神经网络进行前景提取,得到前景区域;
分割单元,用于从所述前景区域中获取所述前景对象的像素点,并根据所述前景对象的像素点形成掩膜,所述掩膜包括所述前景对象的各个像素点对应的掩膜值;
提取单元,用于根据所述掩膜从所述原始图像中提取出所述前景对象。
第三方面,本发明实施例提供了一种图像处理设备,至少包括处理器和存储器;所述处理器通过执行所述存储器中存放的程序以及调用其他设备,执行上述的图像处理方法。
第四方面,本发明实施例还提供一种存储介质,所述存储介质存储有多条指令,所述指令适于处理器进行加载,以执行本发明实施例所提供的任一种图像处理方法中的步骤。
第五方面,本发明实施例还提供一种计算机程序产品,包括指令,当其在计算机上运行时,使得计算机执行本发明实施例所提供的任一种图像处理方法中的步骤。
可见,在本发明实施例提供的图像处理方法中,图像处理设备先获取包括有前景对象的原始图像,通过深度神经网络对该原始图像进行前景提取处理,从而提取出前景区域;再从前景区域中获取前景对象的像素点,并根据前景对象的像素点形成掩膜,该掩膜包括前景对象的各个像素点对应的掩膜值;进而,根据该掩膜从原始图像中提取前景对象。利用根据前景对象的像素点形成的掩膜提取前景对象,能够保证从原始图像中提取出的前景对象更精细。
附图说明
图1为本发明实施例提供的有着自然背景的原始图像示意图;
图2a为本发明实施例提供的应用场景示意图;
图2b为本发明实施例提供的图像处理装置的示例性结构图;
图2c为本发明实施例提供的图像处理设备的示例性结构图;
图3为本发明实施例提供的一种图像处理方法的示例性流程图;
图4a为本发明实施例提供的一种包含前景区域的图像示意图;
图4b为本发明实施例提供的另一种包含前景区域的图像示意图;
图5为本发明实施例提供的掩膜示意图;
图6为本发明实施例提供的一种图像处理方法的示例性流程图;
图7a为本发明实施例提供的提取出的前景对象的示意图;
图7b为本发明实施例提供的姿势识别后的结果示意图;
图8为本发明实施例提供的一种图像处理方法的示例性流程图。
具体实施方式
本发明实施例提供图像处理方法及相关装置(图像处理装置、图像处理设备),其适用于提取人物(全身、半身、乃至诸如人头等身体部位)的场景需求,也可用于提取其他前景对象(例如车)。
本发明的核心思想是:先获取包括有前景对象的原始图像;然后通过深度神经网络对该原始图像进行前景提取处理,得到前景区域;接着从前景区域中获取前景对象的像素点,根据前景对象的像素点形成掩膜,该掩膜包括前景对象的各个像素点对应的掩膜值;最终,根据上述掩膜从原始图像中提取前景对象。
在介绍完核心思想后,下面介绍本发明实施例所涉及的装置。
上述图像处理装置可以软件或硬件的形式应用于图像处理设备中。具体的,图像处理设备可为诸如台式机、移动终端(例如智能手机)、ipad等的终端,也可以是提供图像处理服务的服务器。
当以软件形式应用于图像处理设备中时,上述图像处理装置可为独立的软件。当然,也可作为大型系统(例如操作系统)的子系统(子组件),提供图像处理服务。
当以硬件形式应用于图像处理设备中时,上述图像处理装置示例性的可为终端或服务器的控制器/处理器。
图2a示出了上述图像处理装置的一种示例性应用场景(客户端-服务器场景)。在该应用场景中包括web服务器21、图像处理服务器22,此外,还可包括模型训练服务器23。
其中,web服务器21为前端(前台),负责与客户端浏览器通信,图像处理服务器22、模型训练服务器23等为后端,图像处理服务器22可提供图像处理服务,模型训练服务器23可用于训练图像处理服务器22使用的图像处理算法(例如抠图算法),或者,提供训练用的图像样本。
上述图像处理装置即部署在图像处理服务器22中。
当然,上述图像处理装置的应用场景也可以是:用户选定照片或视频,在终端(例如智能手机)上对选定的照片或视频进行图像处理(例如抠图)。此时,需在终端上部署图像处理装置。
下面介绍图像处理装置的内部结构,图像处理装置的一种示例性结构如图2b所示,包括:获取单元201、前景背景分离单元202、分割单元203和提取单元204。
本文后续将结合图像处理方法介绍各单元的功能。
图2c示出了上述实施例中图像处理设备的一种可能的结构示意图,包括:
总线、处理器1、存储器2、通信接口3、输入设备4和输出设备5。处理器1、存储器2、通信接口3、输入设备4和输出设备5通过总线相互连 接。其中:
总线可包括一通路,在计算机系统各个部件之间传送信息。
处理器1可以是通用处理器,例如通用中央处理器(CPU)、网络处理器(Network Processor,简称NP)、微处理器等,也可以是特定应用集成电路(application-specific integrated circuit,ASIC),或一个或多个用于控制本发明方案程序执行的集成电路。还可以是数字信号处理器(DSP)、现成可编程门阵列(FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。
存储器2中保存有执行本发明技术方案的程序或脚本,还可以保存有操作系统和其他关键业务。具体地,程序可以包括程序代码,程序代码包括计算机操作指令。脚本则通常以文本(如ASCII)保存,只在被调用时进行解释或编译。
更具体的,存储器2可以包括只读存储器(read-only memory,ROM)、可存储静态信息和指令的其他类型的静态存储设备、随机存取存储器(random access memory,RAM)、可存储信息和指令的其他类型的动态存储设备、磁盘存储器、flash等等。
输入设备4可包括接收用户输入的数据和信息的装置,例如键盘、鼠标、摄像头、语音输入装置、触摸屏等。
输出设备5可包括允许输出信息给用户的装置,例如显示屏、扬声器等。
通信接口3可包括使用任何收发器一类的装置,以便与其他设备或通信网络通信,如以太网,无线接入网(RAN),无线局域网(WLAN)等。
可以理解的是,图2c仅仅示出了图像处理设备的简化设计。在实际应用中,图像处理设备可以包含任意数量的发射器,接收器,处理器,控制器,存储器,通信接口等,而所有可以实现本发明的服务器/智能终端都在本发明的保护范围之内。
处理器1通过执行存储器2中所存放的程序以及调用其他设备,可实现下述实施例提供的图像处理方法。
此外,图2b所示的图像处理装置各单元的功能,可由前述的处理器1执行存储器2中所存放的程序以及调用其他设备实现。
上述图像处理方法和图像处理装置可用于对图像(例如相片)、视频进行抠图。需要说明的是,视频也是由一帧帧图像构成,因此可对构成视频的每一帧图像进行抠图,从而完成对整个视频的抠图处理。
由于对视频的抠图处理,也是基于对单幅图像的抠图处理,因此,本文后续将以对单幅图像进行抠图处理为例来介绍技术方案。
下面将基于上面所述的本发明涉及的共性方面,对本发明实施例进一步详细说明。
图3示出了由图像处理设备执行的图像处理方法的一种示例性流程,其至少可包括如下步骤:
300部分:获取原始图像。
图像处理设备所获取的原始图像可包括前景对象。
图1给出了原始图像的示例图,这张原始图像包含自然背景11和前景对象12(一位女性)。
在不同的场景下,图像处理设备获取原始图像的方式有多种。以图像处理设备为终端的应用场景为例,用户可在终端上启动图像处理应用程序(例如抠图软件),图像处理应用程序的界面可提供拍照/摄像按钮,用户点击拍照/摄像按钮来调用终端的拍摄装置(例如摄像头)来拍摄照片或视频,拍摄的照片或视频可存于终端本地图库,则图像处理设备可获取拍摄到的照片或视频作为原始图像进行抠图。
或者,用户也可在终端的本地图库中选择已拍摄的照片或视频,启动图像处理应用程序对选择的照片或视频进行抠图,则图像处理应用程序可从本地图库中获取用户选定的照片或视频作为原始图像进行后续的抠图处理。更具体的,图像处理应用程序可作为图像编辑组件(例如“美化”选项下的图像编辑组件),为用户提供图像处理服务。
再例如,在图像处理设备为服务器场景下,可由客户端向服务器侧发送照片或视频,则部署在服务器侧的图像处理装置可获取客户端提供的照片或视频作为原始图像。
在一个示例中,可由前述的获取单元201执行300部分。
301部分:根据上述原始图像,通过深度神经网络进行前景提取,得到前景区域。
在一个示例中,可由前述的前景背景分离单元202执行301部分。
上述原始图像中,前景区域之外的部分为背景区域。
图像一般是由多个像素点组成,在具体实现时,图像处理设备可判断原始图像中的每一像素点是前景点还是背景点,如果是背景点,可为其分配第一像素值(例如0),以将其标识为背景;如果是前景点,为其分配第二像素值(例如255),以将其标识为前景。则以图1所示原始图像为例,得到包含前景区域的图像可如图4a所示。
或者,图像处理设备也可以在判断出像素点为背景点时,为其分配固定的像素值(例如0或255),而在判断出像素点为前景点时,保持其像素值不变,则以图1所示原始图像为例,提取出的包含前景区域的图像可如图4b示。
通过上述叙述可知,前景区域由前景点组成。
由图4a和图4b可见,前景区域中可能包含图1中的前景对象和部分背景,后续要进一步去除背景。
在具体实现时,图像处理设备可采用多种算法或模型来提取前景区域,例如使用混合高斯背景模型、深度神经网络等来提取前景区域,本领域技术人员可根据需要进行灵活选择和设计,在此不作赘述。
302部分:从上述前景区域中获取前景对象的像素点,并根据获取的前景对象的像素点形成掩膜(mask)。
与301部分的前景背景分离相比,302部分进行了更为精细的分割处理。
上述掩膜可包括前景对象的各个像素点对应的掩膜值。
可由前述的分割单元203执行302部分。
图像处理设备可采用多种算法从上述前景区域中获取前景对象的像素点,例如,可使用graph cut算法进行更为精细的分割处理,从前景区域中分割出前景对象的像素点和背景的像素点。
在一个示例中,可为分割出的、组成背景的像素点分配第一像素值(例如0),而为分割出的、组成前景对象的像素点分配第二像素值(例如255),上述第一像素值和第二像素值均为掩膜值。
以图1所示的原始图像为例,采用graph cut算法得到的掩膜可如图5所示。
在其他示例中,图像处理设备还可将上述前景区域划分为多个子区域,对每一子区域使用graph cut算法进行分割,得到子分割结果;然后,再合并各子分割结果,最终形成掩膜。
303部分:根据上述掩膜从原始图像中提取出前景对象。
如何使用掩膜从原始图像中提取出前景对象可参见现有方式,在此不作赘述。
可见,在本发明实施例提供的图像处理方法中,图像处理设备先获取包括有前景对象的原始图像,通过深度神经网络对该原始图像进行前景提取处理,从而提取出前景区域;再从前景区域中获取前景对象的像素点,并根据前景对象的像素点形成掩膜,该掩膜包括前景对象的各个像素点对应的掩膜值;进而,根据该掩膜从原始图像中提取前景对象。利用根据前景对象的像素点形成的掩膜提取前景对象,能够保证从原始图像中提取出的前景对象更精细。
下面,将以深度神经网络结合graph cut算法为例,进行更为详细的介绍。
深度神经网络是由“神经元”按层级结构组成,其间的权重和偏移量都是可训练得到的。
深度神经网络包括输入层和输出层,在输入层和输出层中间夹着数层隐 藏层,每一层由多个神经元组成,同一层的神经元之间没有连接。每一层的输出将作为下一层的输入。
当然,深度神经网络事先需要经过训练,训练好的深度神经网络可作为图像处理装置或图像处理装置的组成部分,完成抠图操作。
在本发明实施例中,可由深度神经网络完成前景区域的提取,由graph cut算法从前景区域中获取前景对象的像素点。
则前述提及的前景背景分离单元202具体可为深度神经网络或可调用深度神经网络,而分割单元203则可执行或调用graph cut算法。
在实现上,可由深度神经网络的一个分支(或隐藏层)执行graph cut算法,此时,深度神经网络既可实现前景背景分离单元202的功能,又可实现分割单元203的功能;当然,也可由深度神经网络之外的网络或算法模块执行graph cut算法,即将前景背景分离单元202的功能与分割单元203的功能分开执行。
图6示出了深度神经网络结合graph cut算法实现的图像处理方法的一种示例性流程,在本实施例中,深度神经网络可包括前景背景分离层,其至少可包括如下步骤:
600部分:深度神经网络获取原始图像;
600部分与前述300部分相类似,在此不作赘述。
601部分:深度神经网络对上述原始图像进行编码,得到原始向量;并向前景背景分离层输入该原始向量;
将原始图像编码可得到一个固定的向量,也即得到固定的特征。
需要说明的是,在图像处理中往往会把图像表示为向量,比如一个1000×1000的图像,可以表示为一个1000000的向量。
深度神经网络一般只能对特定长度(长度由深度神经网络输入层的维度决定)的向量进行操作,因此图像需要展开成向量(也即用向量表征图像)才能输入给深度神经网络。601部分对原始图像进行编码,可得到符合深度神经网络要求长度的向量。
602部分:深度神经网络通过前景背景分离层,根据输入的原始向量提取出前景区域;
前景区域的相关介绍可参见前述301部分的记载,在此不作赘述。
在实际中,以头发为例,若头发被风吹起,发丝间的像素点可能被判断为前景点,因此,在得到前景区域后,后续还要对其进行精细化处理。
除前景区域外,深度神经网络的输入层或其他层(例如前景背景分离层)还可提取原始向量的像素级特征和抽象特征中的至少一种。
上述像素级特征可包括颜色、形状、边缘、纹理特征等。
而抽象特征可包括辨别关键特征,例如,是否有人、图像包含的前景是人还是狗等。
603部分:将前景区域中的像素点作为指定的前景像素点,将位于前景区域之外的像素点作为指定的背景像素点。
需要说明的是,现有的graph cut算法需要由用户以某种交互手段指定图像的部分前景与部分背景,也即,需要用户指定前景对象和背景的一些种子点,很多情况下,用户会因经验不足等多种原因,而无法精确指定前景种子点和背景种子点。
而在本实施例中,由于在602部分已经得到了前景区域,因此可自动将前景区域中的一些像素点作为前景种子点,将位于前景区域之外的像素点作为背景种子点。除了简化流程外,也可避免用户无法精确指定部分前景与部分背景的问题。
603部分可由前景背景分离单元202执行,也可由执行graph cut算法的分割单元203执行。
604部分:使用graph cut算法(根据前景种子点和背景种子点)从上述前景区域中分割出前景对象的像素点和背景的像素点,得到分割结果。
在602部分得到的前景区域中可能包含前景对象和一部分背景,因此,还需要对其进行分割,才能够从前景区域中进一步分离出前景对象。
在本实施例中,会使用graph cut算法进行两次分割,所以604部分得到的分割结果可称为初步分割结果(或称为初步掩膜)。掩膜值如何设置可参见前述记载,在此不作赘述。
当然,在本发明其他实施例中,也可只使用graph cut算法分割一次,则不再需要执行后续的606部分。
graph cut算法可把图像分割问题与图的最小割(min cut)问题相关联,将图的顶点划分为两个不相交的子集,这两个子集分别对应于前景像素集和背景像素集,其中,前景像素集中包括前景对象的像素点,背景像素集中包括背景的像素点,从而完成了图像的前景与背景的分割。graph cut算法是比较成熟的算法,对于graph cut算法如何使用前景种子点和背景种子点进行分割,在此不作赘述。
之后,graph cut算法可将初步分割结果输入深度神经网络,根据前述的记载,输入深度神经网络的初步分割结果为向量形式。
当然,若由深度神经网络的一层隐藏层执行graph cut算法,则上述初步分割结果也是向量,该隐藏层可将初步分割结果输出给下一层进行解码。
605部分:深度神经网络对上述初步分割结果进行解码,得到解码结果。
前述提及了初步分割结果是向量,解码会将初步分割结果这个向量还原成图像。
为了与后续实施例中的解码结果相区别,可将604部分得到的解码结果称为第一解码结果。
在一个示例中,若深度神经网络为卷积深度神经网络,则可对向量进行反卷积,得到第一解码结果。
第一解码结果的大小受深度神经网络输出层的维度决定。
应理解,在一些可能的实现方式中,可以直接根据该第一解码结果形成用于提取前景对象的掩膜,即无需执行后续606部分。
606部分:对(第一)解码结果使用graph cut算法进行二次分割,得到二次分割结果。
二次分割时,可采用在初次分割时得到的前景像素集和背景像素集中的像素点分别作为前景种子点和背景种子点。
在本发明其他实施例中,在进行二次分割前,还可将第一解码结果转换为graph cut算法需要的数据形式,再进行二次分割。二次分割结果可称为二次掩膜,掩膜值如何设置可参见前述记载,在此不作赘述。
607部分:深度神经网络输出最终分割结果。
最终分割结果也为掩膜,以图1所示原图为例,采用本实施例所提供的技术方案处理得到的掩膜可如图5所示,可见,本实施例将发丝与背景进行了精确分离。
需要说明的是,在具体实施时,可根据602部分提取的特征,将提取出的前景区域分为多个子区域,分别由不同的任务对子区域执行603-606部分的处理,得到多个二次子分割结果,然后,在607部分将多个二次子分割结果融合在一起,得到最终分割结果,并输出。
更具体的,以前景区域包含人物为例,可根据像素级特征和抽象特征中的至少一种,识别出不同的部位/形状(人头、手臂、躯干等),根据识别结果将前景区域分为子区域,不同子区域包含不同的部位(例如头部子区域、手臂子区域、躯干区域等),或也可认为不同子区域呈现不同的形状,然后,分别由处理不同部位的任务(例如人头处理任务、手臂处理任务、躯干处理任务)来分别对各子区域进行603-606部分的处理。
在一个示例中,深度神经网络可具有多个接口(分支),每个接口对应一项任务,每项任务负责的功能不同。例如,可由多项任务对前景区域分别进行初步人像分割(604部分)、解码、二次人像分割(606部分),每一任务侧重不同的部位,例如,任务A对人头的处理最优,有些任务对手臂的处理最优,有些任务对躯干的处理最优。因此,可分别由不同任务处理不同部位,最后再将多个任务得到的分割结果融合在一起,得到最终分割结果,并输出。
当然,若未将前景区域划分为多个子区域分别进行处理,则可将二次分割结果直接作为最终分割结果输出。
在其他实施例中,也可由graph cut算法输出最终分割结果。
604部分、606和607部分均可由前述的分割单元203执行。
608部分:使用最终分割结果从原始图像中提取出前景对象。
需要说明的是,若输出的最终分割结果的尺寸大小(分辨率)与原始图像不符,可对最终分割结果的尺寸大小进行转换,转换成与原始图像相符的尺寸大小。
以图1所示原图为例,提取出的前景对象可如图7a所示。
在本发明其他实施例中,在提取出前景对象后,还可进一步基于前景对象进行姿势识别,从而可对人身进行识别分析与追踪。例如,请参见图7b,即为姿势识别后的结果,在图7b中,对头部和手臂的姿势用线条进行了标示。
608部分可由提取单元204执行。在本发明其他实施例中,也可由深度神经网络的某一层完成提取单元204的功能。
此外,在本发明其他实施例中,graph cut算法也可在解码后进行,图8示出了一种示例性流程,其至少可包括如下步骤:
800-802部分与前述600-602部分相同,在此不作赘述。
803部分:深度神经网络对前景区域进行解码,得到第二解码结果;
解码相关介绍请参见前述605部分的记载,在此不作赘述。
804部分与前述的603部分相类似,在此不作赘述。
805部分:使用graph cut算法从第二解码结果对应的前景区域中分割出前景对象的像素点和背景的像素点,得到分割结果,该分割结果用于在后续操作中形成掩膜。
805部分与前述的604部分相类似,在此不作赘述。
806部分:输出最终分割结果。
在具体实施时,可根据802部分提取的特征,将提取出的前景区域分为多个子区域,分别由不同的任务对子区域执行805部分的处理,得到多个子分割结果,然后,在806部分将多个子分割结果合并在一起,得到最终分割结果,并输出。
当然,若未将前景区域划分为多个子区域,则可将805部分得到的分割结果作为最终分割结果直接输出。
806部分与前述的607部分相类似,在此不作赘述。
807部分:根据最终的分割结果形成掩膜,利用该掩膜从原始图像中提取出前景对象。
807部分与前述的608部分相类似,在此不作赘述。
需要说明的是,现有图像处理方法中,有直接使用graph cut算法对原始图像进行抠图处理的,本发明实施例与之相比,是先进行前景区域提取,再使用graph cut算法对前景区域进行分割。由于前景区域本身包含的背景较少,因此,可进行更为有效、更精细的分割。
此外,现有的graph cut算法需要由用户以某种交互手段指定图像的部分 前景与部分背景,也即,需要用户指定前景对象和背景的一些种子点,很多情况下,用户会因经验不足等多种原因,而无法精确指定前景种子点和背景种子点。
而在本实施例中,由于先行提取了前景区域,因此可自动将前景区域中的一些像素点作为前景种子点,将前景区域之外的一些像素点作为背景种子点。除了简化流程外,也可避免用户无法精确指定部分前景与部分背景的问题。
此外,现有抠图方式也有使用对抗学习网络进行抠图处理的,但对抗学习网络对于复杂背景并非能够很好地进行抠图,这是因为对抗学习网络的抠图并不是像素级的。而本申请中的抠图是像素级的,更加适用于复杂背景下的抠图。
下面再简单介绍本发明实施例所涉及的装置。
前述提及了图像处理装置包括:获取单元201、前景背景分离单元202、分割单元203和提取单元204。
在本发明其他实施例中,在获取上述前景对象的像素点的方面,上述所有实施例中的分割单元203可具体用于:
使用graph cut算法从上述前景区域中分割出前景对象的像素点和背景的像素点,得到分割结果;
对上述分割结果进行解码,得到第一解码结果;其中,第一解码结果用于形成掩膜;
根据该第一解码结果形成掩膜。
或者,在获取上述前景对象的像素点的方面,上述所有实施例中的中的分割单元203可具体用于:
对上述前景区域进行解码,得到第二解码结果;
使用graph cut算法从上述第二解码结果中分割出前景对象的像素点和背景的像素点,得到分割结果;该分割结果用于形成掩膜;
根据分割结果形成掩膜。
相关介绍请参见本文前述的记载,在此不作赘述。
而在使用graph cut算法分割出前景对象的像素点和背景的像素点之前,上述分割单元203还可用于:
将上述前景区域中的像素点作为指定的前景像素点;而将位于前景区域之外的像素点作为指定的背景像素点。
相关介绍请参见本文前述的记载,在此不作赘述。
此外,在本发明其他实施例中,上述分割单元203还可将前景区域或第二解码结果划分为多个子区域,对每一子区域使用所述graph cut算法进行分 割,得到子分割结果;合并各子分割结果,得到分割结果,其中,该分割结果用于形成掩膜。
在本发明实施例提供的图像处理装置中,先获取包括有前景对象的原始图像,通过深度神经网络对该原始图像进行前景提取处理,从而提取出前景区域;再从前景区域中获取前景对象的像素点,并根据前景对象的像素点形成掩膜,该掩膜包括前景对象的各个像素点对应的掩膜值;进而,根据该掩膜从原始图像中提取前景对象。利用根据前景对象的像素点形成的掩膜提取前景对象,能够保证从原始图像中提取出的前景对象更精细。
本发明实施例还要求保护图像处理设备,其至少包括处理器和存储器,该处理器通过执行存储器中存放的程序以及调用其他设备,执行上述的图像处理方法。
本发明实施例还要求保护一种存储介质,该存储介质存储有多条指令,所述指令适于处理器进行加载,以执行本发明任一实施例所提供的图像处理方法中的步骤。
本发明实施例还要求保护一种计算机程序产品,其包括指令,当其在计算机上运行时,使得计算机执行本发明任一实施例所提供的图像处理方法中的步骤。
本说明书中各个实施例采用递进的方式描述,每个实施例重点说明的都是与其他实施例的不同之处,各个实施例之间相同相似部分互相参见即可。对于实施例公开的装置而言,由于其与实施例公开的方法相对应,所以描述的比较简单,相关之处参见方法部分说明即可。
专业人员还可以进一步意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、计算机软件或者二者的结合来实现,为了清楚地说明硬件和软件的可互换性,在上述说明中已经按照功能一般性地描述了各示例的组成及步骤。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本发明的范围。
结合本文中所公开的实施例描述的方法或算法的步骤可以直接用硬件、处理器执行的软件单元,或者二者的结合来实施。软件单元可以置于随机存储器(RAM)、内存、只读存储器(ROM)、电可编程ROM、电可擦除可编程ROM、寄存器、硬盘、可移动磁盘、CD-ROM、或技术领域内所公知的任意其它形式的存储介质中。
对所公开的实施例的上述说明,使本领域专业技术人员能够实现或使用本发明。对这些实施例的多种修改对本领域的专业技术人员来说将是显而易 见的,本文中所定义的一般原理可以在不脱离本发明的精神或范围的情况下,在其它实施例中实现。因此,本发明将不会被限制于本文所示的这些实施例,而是要符合与本文所公开的原理和新颖特点相一致的最宽的范围。

Claims (15)

  1. An image processing method, applied to an image processing device, comprising:
    acquiring an original image, the original image comprising a foreground object;
    performing foreground extraction through a deep neural network according to the original image, to obtain a foreground region;
    acquiring pixel points of the foreground object from the foreground region, and
    forming a mask according to the pixel points of the foreground object, wherein the mask comprises a mask value corresponding to each pixel point of the foreground object; and
    extracting the foreground object from the original image according to the mask.
  2. The method according to claim 1, wherein the deep neural network comprises a foreground-background separation layer, and the performing foreground extraction through the deep neural network according to the original image to obtain a foreground region comprises:
    encoding the original image to obtain an original vector; and
    inputting the original vector to the foreground-background separation layer, and extracting the foreground region through the foreground-background separation layer.
  3. The method according to claim 2, wherein the acquiring pixel points of the foreground object from the foreground region and forming a mask according to the pixel points of the foreground object comprises:
    segmenting the pixel points of the foreground object and pixel points of a background from the foreground region using a graph cut algorithm, to obtain a segmentation result; decoding the segmentation result to obtain a first decoding result, the first decoding result being used to form the mask; and
    forming the mask according to the first decoding result.
  4. The method according to claim 2, wherein the acquiring pixel points of the foreground object from the foreground region and forming a mask according to the pixel points of the foreground object comprises:
    decoding the foreground region to obtain a second decoding result;
    segmenting the pixel points of the foreground object and pixel points of a background from the second decoding result using a graph cut algorithm, to obtain a segmentation result, the segmentation result being used to form the mask; and
    forming the mask according to the segmentation result.
  5. The method according to claim 3 or 4, further comprising, before the pixel points of the foreground object and the pixel points of the background are segmented using the graph cut algorithm:
    using pixel points located in the foreground region as designated foreground pixel points; and
    using pixel points located outside the foreground region as designated background pixel points, wherein the designated foreground pixel points and the designated background pixel points are used by the graph cut algorithm to segment the pixel points of the foreground object and the pixel points of the background.
  6. The method according to claim 1, wherein the acquiring pixel points of the foreground object from the foreground region and forming a mask according to the pixel points of the foreground object comprises:
    dividing the foreground region into a plurality of sub-regions;
    segmenting each sub-region using a graph cut algorithm to obtain sub-segmentation results; merging the sub-segmentation results to obtain a segmentation result, wherein the segmentation result is used to form the mask; and
    forming the mask according to the segmentation result.
  7. An image processing apparatus, comprising:
    an obtaining unit, configured to acquire an original image, the original image comprising a foreground object;
    a foreground-background separation unit, configured to perform foreground extraction through a deep neural network according to the original image, to obtain a foreground region;
    a segmentation unit, configured to acquire pixel points of the foreground object from the foreground region and form a mask according to the pixel points of the foreground object, the mask comprising a mask value corresponding to each pixel point of the foreground object; and
    an extraction unit, configured to extract the foreground object from the original image according to the mask.
  8. The apparatus according to claim 7, wherein the deep neural network is specifically configured to:
    encode the original image to obtain an original vector; and
    extract the foreground region according to the original vector.
  9. The apparatus according to claim 8, wherein the segmentation unit is specifically configured to:
    segment the pixel points of the foreground object and pixel points of a background from the foreground region using a graph cut algorithm, to obtain a segmentation result;
    decode the segmentation result to obtain a first decoding result, the first decoding result being used to form the mask; and
    form the mask according to the first decoding result.
  10. The apparatus according to claim 8, wherein the segmentation unit is specifically configured to:
    decode the foreground region to obtain a second decoding result;
    segment the pixel points of the foreground object and pixel points of a background from the second decoding result using a graph cut algorithm, to obtain a segmentation result, the segmentation result being used to form the mask; and
    form the mask according to the segmentation result.
  11. The apparatus according to claim 9 or 10, wherein the segmentation unit is further configured to:
    use pixel points in the foreground region as designated foreground pixel points; and
    use pixel points located outside the foreground region as designated background pixel points, wherein the designated foreground pixel points and the designated background pixel points are used by the graph cut algorithm to segment the pixel points of the foreground object and the pixel points of the background.
  12. The apparatus according to claim 7, wherein the segmentation unit is specifically configured to:
    divide the foreground region into a plurality of sub-regions;
    segment each sub-region using the graph cut algorithm to obtain sub-segmentation results;
    merge the sub-segmentation results to obtain a segmentation result, wherein the segmentation result is used to form the mask; and
    form the mask according to the segmentation result.
  13. An image processing device, comprising at least a processor and a memory, the processor being configured to invoke a program stored in the memory to perform the steps in the image processing method according to any one of claims 1 to 6.
  14. A storage medium, storing a plurality of instructions which, when loaded by a processor, are used to perform the steps in the image processing method according to any one of claims 1 to 6.
  15. A computer program product comprising instructions which, when run on a computer, cause the computer to perform the steps in the image processing method according to any one of claims 1 to 6.
PCT/CN2018/118644 2017-12-11 2018-11-30 图像处理方法及相关装置 WO2019114571A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/671,747 US11200680B2 (en) 2017-12-11 2019-11-01 Image processing method and apparatus

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711307381.8A CN109903291B (zh) 2017-12-11 2017-12-11 图像处理方法及相关装置
CN201711307381.8 2017-12-11

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/671,747 Continuation US11200680B2 (en) 2017-12-11 2019-11-01 Image processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2019114571A1 true WO2019114571A1 (zh) 2019-06-20

Family

ID=66819919

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/118644 WO2019114571A1 (zh) 2017-12-11 2018-11-30 图像处理方法及相关装置

Country Status (3)

Country Link
US (1) US11200680B2 (zh)
CN (1) CN109903291B (zh)
WO (1) WO2019114571A1 (zh)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111126394A (zh) * 2019-12-25 2020-05-08 上海肇观电子科技有限公司 文字识别方法、阅读辅助设备、电路和介质
CN113139566B (zh) * 2020-01-20 2024-03-12 北京达佳互联信息技术有限公司 图像生成模型的训练方法及装置、图像处理方法及装置
CN111652796A (zh) * 2020-05-13 2020-09-11 上海连尚网络科技有限公司 图像处理方法、电子设备及计算机可读存储介质
CN113838076A (zh) * 2020-06-24 2021-12-24 深圳市中兴微电子技术有限公司 目标图像中的对象轮廓的标注方法及装置、存储介质
CN112232217B (zh) * 2020-10-16 2022-08-02 怀化新大地电脑有限公司 手势识别系统
CN112839223B (zh) * 2020-12-23 2022-12-20 深圳酷派技术有限公司 图像压缩方法、装置、存储介质及电子设备
CN113538273B (zh) * 2021-07-13 2023-09-19 荣耀终端有限公司 图像处理方法及图像处理装置

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5781198A (en) * 1995-12-22 1998-07-14 Intel Corporation Method and apparatus for replacing a background portion of an image
US6651246B1 (en) * 1999-11-08 2003-11-18 International Business Machines Corporation Loop allocation for optimizing compilers
CN101588459A (zh) * 2009-06-26 2009-11-25 北京交通大学 一种视频抠像处理方法
US20110249190A1 (en) * 2010-04-09 2011-10-13 Nguyen Quang H Systems and methods for accurate user foreground video extraction
CN103745456A (zh) * 2013-12-23 2014-04-23 深圳先进技术研究院 一种图像分割方法及装置
CN104935832A (zh) * 2015-03-31 2015-09-23 浙江工商大学 针对带深度信息的视频抠像方法
CN105120185A (zh) * 2015-08-27 2015-12-02 新奥特(北京)视频技术有限公司 一种视频图像抠像方法与装置
CN105590309A (zh) * 2014-10-23 2016-05-18 株式会社理光 前景图像分割方法和装置
CN105631868A (zh) * 2015-12-25 2016-06-01 清华大学深圳研究生院 一种基于图像分类的深度信息提取方法
CN106303161A (zh) * 2015-06-24 2017-01-04 联想(北京)有限公司 一种图像处理方法及电子设备

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102750535B (zh) * 2012-04-01 2014-03-19 北京京东世纪贸易有限公司 自动提取图像前景的方法和系统
CN104463865A (zh) * 2014-12-05 2015-03-25 浙江大学 一种人像分割方法
CN105631880B (zh) * 2015-12-31 2019-03-22 百度在线网络技术(北京)有限公司 车道线分割方法和装置
CN106204597B (zh) * 2016-07-13 2019-01-11 西北工业大学 一种基于自步式弱监督学习的视频物体分割方法
US10032281B1 (en) * 2017-05-03 2018-07-24 Siemens Healthcare Gmbh Multi-scale deep reinforcement machine learning for N-dimensional segmentation in medical imaging
US10937169B2 (en) * 2018-12-18 2021-03-02 Qualcomm Incorporated Motion-assisted image segmentation and object detection


Also Published As

Publication number Publication date
CN109903291A (zh) 2019-06-18
US20200082542A1 (en) 2020-03-12
CN109903291B (zh) 2021-06-01
US11200680B2 (en) 2021-12-14

Similar Documents

Publication Publication Date Title
WO2019114571A1 (zh) 图像处理方法及相关装置
US11463631B2 (en) Method and apparatus for generating face image
CN107704838B (zh) 目标对象的属性识别方法及装置
US10789453B2 (en) Face reenactment
CN109173263B (zh) 一种图像数据处理方法和装置
US9414016B2 (en) System and methods for persona identification using combined probability maps
CN108182714B (zh) 图像处理方法及装置、存储介质
EP3108379B1 (en) Image editing techniques for a device
CN111553267B (zh) 图像处理方法、图像处理模型训练方法及设备
CN111275784B (zh) 生成图像的方法和装置
CN111787242A (zh) 用于虚拟试衣的方法和装置
CN111008935B (zh) 一种人脸图像增强方法、装置、系统及存储介质
US11978216B2 (en) Patch-based image matting using deep learning
CN112330527A (zh) 图像处理方法、装置、电子设备和介质
CN113822798B (zh) 生成对抗网络训练方法及装置、电子设备和存储介质
CN112750176A (zh) 一种图像处理方法、装置、电子设备及存储介质
US20240095886A1 (en) Image processing method, image generating method, apparatus, device, and medium
CN110910400A (zh) 图像处理方法、装置、存储介质及电子设备
CN113642359B (zh) 人脸图像生成方法、装置、电子设备及存储介质
US20220207917A1 (en) Facial expression image processing method and apparatus, and electronic device
CN111107264A (zh) 图像处理方法、装置、存储介质以及终端
CN111179287A (zh) 人像实例分割方法、装置、设备及存储介质
Kamat et al. MonVoix-An Android Application for hearing impaired people
CN111652792A (zh) 图像的局部处理、直播方法、装置、设备和存储介质
CN111489769B (zh) 图像处理方法、装置和硬件装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18887865

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18887865

Country of ref document: EP

Kind code of ref document: A1