CN112200817A - Sky region segmentation and special effect processing method, device and equipment based on image - Google Patents

Sky region segmentation and special effect processing method, device and equipment based on image

Info

Publication number
CN112200817A
CN112200817A
Authority
CN
China
Prior art keywords
image
sky
original image
convolution
segmentation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011104753.9A
Other languages
Chinese (zh)
Inventor
黄培根
朱鹏飞
王雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Huaduo Network Technology Co Ltd
Original Assignee
Guangzhou Huaduo Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Huaduo Network Technology Co Ltd filed Critical Guangzhou Huaduo Network Technology Co Ltd
Priority to CN202011104753.9A priority Critical patent/CN112200817A/en
Publication of CN112200817A publication Critical patent/CN112200817A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G06T 3/04
    • G06T 7/143 Segmentation; Edge detection involving probabilistic approaches, e.g. Markov random field [MRF] modelling
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20076 Probabilistic image processing
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]

Abstract

The embodiments of the application provide an image-based sky region segmentation and special effect processing method, device, and equipment, relating to the field of image processing. With this technical scheme, the sky region in an image is segmented accurately on the client with stable performance and high processing speed, improving the display effect of the image.

Description

Sky region segmentation and special effect processing method, device and equipment based on image
Technical Field
The present application relates to the field of image processing, and in particular to an image-based sky region segmentation and special effect processing method, device, and equipment, as well as a computer-readable storage medium.
Background
With the rapid development of artificial intelligence, and of deep learning in particular, semantic segmentation has become an important research topic with wide application scenarios. Sky segmentation means segmenting the sky region in an image and then applying post-processing to achieve weather transformation, such as sunny days, rainy days, or sunset, or to add special effects such as fireworks, streamer lights, and aurora.
In the related art, semantic segmentation based on deep learning is generally adopted: images are fed into a deep-learning network structure or convolutional neural network for training to obtain a sky segmentation model, which is then used for feature extraction, segmentation, detection, and recognition. However, current sky segmentation models occupy a large amount of memory, can only run in the cloud, and cannot be deployed on a user terminal to run in real time. Even those sky segmentation models that do run on user terminals segment inaccurately, so the segmentation effect is poor and the user experience suffers.
Disclosure of Invention
The present application aims to solve at least one of the technical drawbacks described above, in particular the problems of heavy computation and low segmentation accuracy.
In a first aspect, an embodiment of the present application provides an image-based sky region segmentation method, including the following steps:
acquiring an original image containing sky content characteristics;
inputting an original image into a sky segmentation network, performing semantic segmentation processing on the original image, extracting first image features of the original image with different receptive fields by using a first convolution unit of an encoder module of the sky segmentation network, inputting the first image features into a second convolution unit of a decoder module of the sky segmentation network for processing, restoring to obtain a first feature image consistent with the resolution of the input original image, and obtaining a probability image corresponding to the original image according to the first feature image; the sky segmentation network is a lightweight deep convolution neural network and is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image;
and segmenting the sky area of the original image according to the probability image.
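The final step above, segmenting the sky region from the probability image, typically reduces to binarizing each pixel's sky probability. A minimal sketch in Python (the 0.5 threshold is an assumption; the patent text does not specify a threshold value):

```python
def segment_sky(prob_image, threshold=0.5):
    """Binarize a per-pixel sky-probability map into a sky mask.

    prob_image: 2-D list of floats in [0, 1] (height x width).
    Returns a 2-D list where 1 marks sky pixels and 0 marks non-sky.
    """
    return [[1 if p >= threshold else 0 for p in row] for row in prob_image]

probs = [[0.9, 0.8, 0.2],
         [0.7, 0.4, 0.1]]
mask = segment_sky(probs)
# mask == [[1, 1, 0], [1, 0, 0]]
```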
In one embodiment, the encoder module includes a multi-layer first convolution element;
the step of extracting first image features of the original image having different receptive fields using a first convolution unit of an encoder module of the sky segmentation network comprises:
extracting first image features with different receptive fields of the original image layer by utilizing a plurality of layers of first convolution units of an encoder module of the sky segmentation network; wherein the resolution of the first image feature decreases from layer to layer.
In one embodiment, the decoder module comprises a plurality of layers of second convolution units; the second convolution units correspond to the first convolution units one by one;
the step of inputting the first image feature into a second convolution unit of a decoder module of the sky segmentation network for processing, and restoring to obtain a first feature image consistent with the resolution of the input original image includes:
inputting the first image features output by each first convolution unit of the encoder module into the corresponding second convolution unit of the decoder module via feature skip connections, and performing network operations layer by layer in combination with upsampling to extract second image features;
and restoring according to the second image characteristic to obtain a first characteristic image which is consistent with the resolution of the input image to be segmented.
In one embodiment, each of the first convolution units includes a plurality of stacked Conv + BN + ReLU network layers;
the operation of each second convolution unit comprises Conv + BN + ReLU + UpSample network layers, wherein each layer of second convolution unit increases the resolution of the output second image features layer by layer through a bilinear interpolation upsampling operation.
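The bilinear-interpolation upsampling step in each second convolution unit can be illustrated with a dependency-free sketch that doubles the resolution of a single-channel feature map (the align-corners convention here is an assumption for simplicity; deep-learning frameworks offer both conventions):

```python
def bilinear_upsample_2x(img):
    """Double the resolution of a 2-D feature map with bilinear interpolation
    (align_corners=True convention). img is a 2-D list of floats."""
    h, w = len(img), len(img[0])
    H, W = 2 * h, 2 * w
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            # map output coordinates back onto the input grid
            y = i * (h - 1) / (H - 1) if H > 1 else 0.0
            x = j * (w - 1) / (W - 1) if W > 1 else 0.0
            y0, x0 = int(y), int(x)
            y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
            dy, dx = y - y0, x - x0
            # weighted average of the four surrounding input pixels
            out[i][j] = (img[y0][x0] * (1 - dy) * (1 - dx)
                         + img[y0][x1] * (1 - dy) * dx
                         + img[y1][x0] * dy * (1 - dx)
                         + img[y1][x1] * dy * dx)
    return out
```

Applied once per decoder layer, this restores the resolution halved by each corresponding encoder layer.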
In one embodiment, the step of training the second feature images output by the second convolution units of at least two layers of the decoder module by using a loss function corresponding to the resolution of the second feature images comprises:
determining the resolution ratio of a second characteristic image output by a second convolution unit of at least two layers of the decoder module relative to the original image;
determining the type and weight value of a BCE loss function for training each convolution layer according to the resolution ratio;
and based on the weight value, carrying out constraint training on the sky segmentation network by utilizing the BCE loss function of the corresponding category.
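The constrained training described above, a weighted BCE term per supervised decoder resolution, can be sketched as follows. The specific weight values and the per-scale averaging are assumptions for illustration; the patent only states that the loss type and weight are chosen according to the resolution ratio:

```python
import math

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy between a predicted probability and a 0/1 target."""
    p = min(max(pred, eps), 1 - eps)  # clamp to avoid log(0)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def multiscale_bce_loss(preds_per_scale, targets_per_scale, weights):
    """Weighted sum of per-scale mean BCE losses, one term per supervised
    decoder layer. preds/targets are flat lists of pixel values per scale."""
    total = 0.0
    for preds, targets, w in zip(preds_per_scale, targets_per_scale, weights):
        scale_loss = sum(bce(p, t) for p, t in zip(preds, targets)) / len(preds)
        total += w * scale_loss
    return total
```

In training, the ground-truth mask would be downsampled to each supervised scale, and lower-resolution scales might plausibly receive smaller weights.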
In one embodiment, the step of determining the resolution ratio of the second feature image output by the second convolution units of at least two layers of the decoder module with respect to the original image comprises:
adjusting the characteristic channels of the second convolution units of at least two layers of the decoder module into output channels through 1 x 1 convolution, and outputting corresponding second characteristic images;
and calculating the resolution ratio of the second characteristic image relative to the original image based on the resolution of the second characteristic image.
In one embodiment, the step of obtaining a probability image corresponding to the original image according to the first feature image includes:
converting the first feature image by using a Sigmoid function to obtain a probability image corresponding to the feature image, wherein the first feature image is the second feature image output by the last layer of second convolution unit of the decoder module.
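The Sigmoid conversion mentioned here maps each raw decoder output (logit) into a probability in (0, 1). A minimal sketch:

```python
import math

def sigmoid(x):
    """Standard logistic function: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def to_probability_image(feature_image):
    """Convert a 2-D grid of decoder logits into per-pixel sky probabilities."""
    return [[sigmoid(v) for v in row] for row in feature_image]
```

A strongly positive logit becomes a probability near 1 (likely sky); a strongly negative one becomes a probability near 0.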
In a second aspect, an embodiment of the present application provides an image special effect processing method, including:
acquiring a target sky material selected by a user;
determining a sky region of an original image to be segmented, wherein the sky region is determined by inputting the original image into a sky segmentation network, performing semantic segmentation processing on the original image, extracting first image features of the original image, which have different receptive fields, by using a first convolution unit of an encoder module of the sky segmentation network, inputting the first image features into a second convolution unit of a decoder module of the sky segmentation network for processing, restoring to obtain a first feature image which is consistent with the resolution of the input original image, and obtaining a probability image corresponding to the original image according to the first feature image; the sky segmentation network is a lightweight deep convolution neural network and is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image;
and fusing the target sky material to the sky area to generate a target image.
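One plausible way to fuse the target sky material into the sky region is to use the probability image as a per-pixel blending weight (soft alpha matte). This soft blend is an assumption; the patent says only that the material is fused into the sky area:

```python
def blend_sky(original, material, prob):
    """Per-pixel blend: where sky probability is high, the target sky
    material dominates; elsewhere the original pixel is kept.
    All arguments are H x W grayscale grids; RGB works per channel."""
    return [[m * p + o * (1 - p)
             for o, m, p in zip(orow, mrow, prow)]
            for orow, mrow, prow in zip(original, material, prob)]
```

Blending with the soft probability rather than a hard binary mask tends to give smoother transitions along the skyline.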
In a third aspect, an embodiment of the present application provides an image-based sky region segmentation apparatus, including:
an original image acquisition module, configured to acquire an original image containing sky content features;
a probability image obtaining module, configured to input an original image into a sky segmentation network, perform semantic segmentation processing on the original image, extract first image features of the original image with different receptive fields by using a first convolution unit of an encoder module of the sky segmentation network, input the first image features into a second convolution unit of a decoder module of the sky segmentation network, perform processing on the first image features, restore the first image features to obtain a first feature image that is consistent with a resolution of the input original image, and obtain a probability image corresponding to the original image according to the first feature image; the sky segmentation network is a lightweight deep convolution neural network and is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image;
and the sky region segmentation module is used for segmenting the sky region of the original image according to the probability image.
In a fourth aspect, an embodiment of the present application provides an image special effect processing apparatus, including:
the sky material acquisition module is used for acquiring a target sky material selected by a user;
a sky region determining module, configured to determine a sky region to be segmented of the original image, where the sky region is determined by inputting the original image into a sky segmentation network, performing semantic segmentation processing on the original image, extracting first image features of the original image, which have different receptive fields, by using a first convolution unit of an encoder module of the sky segmentation network, inputting the first image features into a second convolution unit of a decoder module of the sky segmentation network, processing the first image features to obtain a first feature image that is consistent with a resolution of the input original image, and obtaining a probability image corresponding to the original image according to the first feature image; the sky segmentation network is a lightweight deep convolution neural network and is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image;
and the target image generation module is used for fusing the target sky material into the sky area to generate a target image.
In a fifth aspect, an embodiment of the present application provides an electronic device, which includes:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors to perform the image-based sky region segmentation method according to the first aspect or the image special effect processing method according to the second aspect.
In a sixth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the image-based sky region segmentation method of the first aspect or the image special effect processing method of the second aspect.
In the image-based sky region segmentation and special effect processing method, device, equipment, and computer-readable storage medium provided by the embodiments, an original image containing sky content features is acquired and input into a sky segmentation network for semantic segmentation processing: a first convolution unit of the encoder module extracts first image features of the original image with different receptive fields; the first image features are input into a second convolution unit of the decoder module for processing and restored into a first feature image consistent with the resolution of the input original image; a probability image corresponding to the original image is obtained from the first feature image; and the sky region of the original image is segmented according to the probability image. The sky segmentation network is a lightweight deep convolutional neural network, obtained by training the second feature images output by at least two layers of second convolution units of the decoder module with loss functions corresponding to the resolutions of those feature images, and is deployed on the client. The sky region in the image is thus segmented accurately on the client with stable performance and high processing speed, improving the display effect of the image and meeting user requirements.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic diagram of an application system framework involved in an image-based sky region segmentation process according to an embodiment of the present application;
fig. 2 is a flowchart of a sky region segmentation method based on an image according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of a sky segmentation network according to an embodiment of the present application;
fig. 4 is a flowchart of an image special effect processing method provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of an implementation of image special effects processing according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an image-based sky region segmentation apparatus according to an embodiment;
fig. 7 is a schematic structural diagram of an image special effect processing apparatus according to an embodiment.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any elements and all combinations of one or more of the associated listed items.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The following describes an application scenario related to an embodiment of the present application.
The embodiments of the application apply to scenarios in which an effect transformation is performed on the sky in an image: the sky region in the image is identified and segmented, and the original sky is replaced with a target sky pattern through weather transformation, color transformation, and the like.
For example, a neural network for identifying and segmenting a sky region of an image is deployed at a client, the sky region is determined through the neural network, a target sky material selected by a user is obtained, an original sky in the image is replaced by the target sky material, and then the image is displayed, so that the display effect of the image is improved, and the image is more attractive.
Based on this application scenario, the neural network needs to run on the client and segment the sky region accurately, so that the display effect of the image can be improved. Of course, the technical solution provided in the embodiments of the present application may also be applied to other scenarios, which are not listed here.
In order to better explain the technical solution of the present application, a certain application environment to which the present solution can be applied is shown below. Fig. 1 is a schematic diagram of an application system framework related to image-based sky region segmentation processing according to an embodiment of the present application, and as shown in fig. 1, the application system 10 includes a client 101 and a server 102, and a communication connection is established between the client 101 and the server 102 through a wired network or a wireless network.
The client 101 may be a portable device such as a smart phone, a smart camera, a palm computer, a tablet computer, an electronic book, and a notebook computer, which is not limited to the above, and may have functions such as photographing and image processing, so as to implement image-based sky region segmentation and special effect processing. Optionally, the client 101 has a touch screen, and a user may perform corresponding operations on the touch screen of the client 101 to implement functions such as sky segmentation, image processing, special effect synthesis, and the like. The client acquires the related images, performs related processing on the images, and sends the images to the server 102, so that the images are sent to other clients for display through the server 102.
The server 102 includes an electronic device, such as a background server provided for the client 101, and may be implemented by a stand-alone server or a server cluster composed of multiple servers. In one embodiment, the server may be an image sharing platform. After the user shoots an image, the image is processed correspondingly, for example by sky transformation, character beautification, or background replacement, and the processed image is uploaded to the server 102; the server 102 then pushes the image to other clients, so that other users can see the image produced by the user.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
In related image processing technology, a user sets parameters at the client, and a server with stronger performance identifies a designated area in the image, such as the sky region, transforms the pattern of the sky region according to the user's parameters, generates the processed image, and pushes it to each viewer.
In this process, the client and server must be connected over a network, and the neural network model is trained by relying on the high performance of the server; because of its large computation amount, the deep-learning-based neural network model of the related art cannot be deployed on the client and cannot perform model-based image processing offline. Some improved image processing methods can run on the client, but their segmentation accuracy is low, which results in a poor segmentation effect and degrades image processing.
The application provides a sky region segmentation and special effect processing method, a sky region segmentation and special effect processing device, sky region segmentation and special effect processing equipment and a computer readable storage medium, and aims to solve the technical problems in the prior art.
Fig. 2 is a flowchart of an image-based sky region segmentation method according to an embodiment of the present disclosure, which is applicable to an image-based sky region segmentation apparatus, such as a client. The following description will be given taking a mobile terminal as an example.
And S210, acquiring an original image containing sky content characteristics.
Generally, a user takes an external scene image including a sky area by using a mobile phone camera or the like. Sky content features refer to features representing the sky, such as clouds, color categories, color variations, and color regions.
In this embodiment, the original image including the sky content feature may be acquired from a local device such as a mobile phone camera, or may be acquired from a local storage device or an external storage device. Of course, the original image may not include sky content features, and if the original image not including sky content features is obtained, it may be deleted in subsequent processing, or subsequent sky segmentation or special effect processing is not required.
Optionally, an image format of the original image may be adjusted according to the sky segmentation scene characteristics.
In this embodiment, the sky-segmentation scene characteristic refers to a parameter characteristic that meets requirements of image processing, such as segmentation accuracy, display resolution, image size, and the like, required for fine image correction or small video cover creation.
For example, the image is cropped and its resolution adjusted so that its height and width become (384, 512), i.e., a height of 384 pixels and a width of 512 pixels.
And setting a preprocessing parameter of the image to be segmented according to the sky segmentation scene characteristic, and adjusting an image format of an original image in the image to be segmented according to the preprocessing parameter, such as adjusting the size, the resolution, the precision and the like.
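The resize part of this preprocessing can be sketched as follows. The 384 x 512 target matches the example in the text; the nearest-neighbor sampling is an assumption made to keep the sketch dependency-free, and a production pipeline would more likely use bilinear resampling:

```python
def resize_nearest(img, out_h=384, out_w=512):
    """Resize a 2-D image grid to the network input size using
    nearest-neighbor sampling. img is a 2-D list (H x W)."""
    h, w = len(img), len(img[0])
    return [[img[min(i * h // out_h, h - 1)][min(j * w // out_w, w - 1)]
             for j in range(out_w)]
            for i in range(out_h)]
```

Normalization of pixel values (e.g. scaling to [0, 1]) would typically follow before feeding the image to the network.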
S220, an original image is input into a sky segmentation network, semantic segmentation processing is carried out on the original image, first image features with different receptive fields of the original image are extracted by utilizing a first convolution unit of an encoder module of the sky segmentation network, the first image features are input into a second convolution unit of a decoder module of the sky segmentation network to be processed, a first feature image which is consistent with the input resolution of the original image is obtained through restoration, and a probability image corresponding to the original image is obtained according to the first feature image.
The sky segmentation network is a convolution neural network with a light-weight depth, and the sky segmentation network is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image.
Semantic segmentation of images refers to having a computer segment an image according to its semantics. In the image field, semantics refers to the content of an image and an understanding of what the picture means; segmentation means distinguishing the different objects in the picture at the pixel level and labeling each pixel of the original image. For example, white in an image frame may represent the sky region and black the non-sky region; of course, in other embodiments the mapping may be reversed, with black representing the sky region and white the non-sky region.
In this embodiment, the sky segmentation network is a pre-trained convolutional neural network adapted to image-based sky region segmentation scenarios; it is deployed on the client and can transform the pattern of the sky region in an image in real time. Compared with the high computation cost of related convolutional neural networks, the sky segmentation model of this embodiment adjusts its structure and parameters so that the computation amount is greatly reduced while the recognition accuracy is preserved. Generally, the more channels a neural network has, the larger its computation. The sky segmentation network compresses the number of channels to reduce computation; for example, the number of base channels is compressed from 64 to 16. Since sky segmentation operates on single images, its computation requirement is comparatively low, so the channel number is set to 16 and the parameter count is reduced appropriately, ensuring accuracy and a stable segmentation effect while reducing computation. Any loss of segmentation quality caused by channel compression can be compensated by other parameter or structure adjustments, guaranteeing segmentation accuracy.
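The effect of compressing the base channel count from 64 to 16 can be quantified with a simple parameter count for 3 x 3 convolutions (the two-layer stem below is an illustrative assumption, not the patent's exact architecture):

```python
def conv_params(in_ch, out_ch, k=3):
    """Parameter count of a k x k convolution layer (weights + biases)."""
    return in_ch * out_ch * k * k + out_ch

# First two 3x3 conv layers on an RGB input, base width 64 vs. compressed 16:
full = conv_params(3, 64) + conv_params(64, 64)   # 1792 + 36928 = 38720
lite = conv_params(3, 16) + conv_params(16, 16)   # 448 + 2320 = 2768
# the compressed stem needs roughly 14x fewer parameters
```

Since computation in a convolution scales with in_ch * out_ch, quartering the channel width cuts the dominant cost by roughly a factor of 16 in the deeper layers.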
In this embodiment, a first convolution unit of an encoder module of the sky segmentation network is used to extract first image features of the original image with different receptive fields, the first image features are input to a second convolution unit of a decoder module of the sky segmentation network for processing, a first feature image consistent with the resolution of the input original image is restored, and a probability image corresponding to the original image is obtained according to the first feature image.
Wherein the encoder module comprises a plurality of layers of first convolution units, and the decoder module comprises a plurality of layers of second convolution units that correspond to the first convolution units one to one. In this embodiment, the encoder module includes five layers of first convolution units and the decoder module includes five layers of second convolution units.
In an embodiment, the extracting, by the first convolution unit of the encoder module of the sky segmentation network in step S220, the first image feature of the original image with different receptive fields may include:
s2201, extracting first image features with different receptive fields of the original image layer by utilizing a multilayer first convolution unit of an encoder module of the sky segmentation network; wherein the resolution of the first image feature decreases from layer to layer.
In a convolutional neural network, the receptive field is defined as the size of the region of the input image onto which a pixel of the feature map output by each layer is mapped. In plain terms, a point on the feature image corresponds to an area of the input image. In this embodiment, first image features with different receptive fields are acquired to determine the semantic information and spatial information of the original image, improving the pixel-classification accuracy and edge-segmentation precision for the original image.
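The receptive field of a stack of layers can be computed with the standard recurrence rf += (k - 1) * jump, where jump is the product of the strides so far. The layer configuration below is a hypothetical example, not the patent's exact network:

```python
def receptive_field(layers):
    """Receptive field of the final layer given (kernel, stride) per layer."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump  # each layer widens the field by (k-1) input steps
        jump *= s             # stride multiplies the step between output pixels
    return rf

# two 3x3 convs, a stride-2 downsample, then two more 3x3 convs: a pattern
# like the stacked Conv+BN+ReLU encoder units described in the text
rf = receptive_field([(3, 1), (3, 1), (2, 2), (3, 1), (3, 1)])
# rf == 14: each output pixel sees a 14x14 region of the input
```

Deeper encoder layers thus see progressively larger input regions, which is why features from different depths carry different mixes of spatial detail and semantic context.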
Each first convolution unit comprises Conv + BN + ReLU network layers stacked multiple times; in the present embodiment, each first convolution unit includes two stacked Conv + BN + ReLU network layers.
In this embodiment, after processing by each layer of first convolution unit, the resolution of the output first image feature is halved relative to the resolution of the first image feature input to that layer.
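The layer-by-layer halving can be traced numerically. Using the (384, 512) input size mentioned elsewhere in this document and the five encoder layers of this embodiment:

```python
# Trace the first-image-feature resolution through the encoder, halving per layer.
# Input size (384, 512) is taken from this document; five layers per the embodiment.
def encoder_resolutions(h, w, num_layers):
    sizes = []
    for _ in range(num_layers):
        h, w = h // 2, w // 2  # each first convolution unit halves height and width
        sizes.append((h, w))
    return sizes

print(encoder_resolutions(384, 512, 5))
# [(192, 256), (96, 128), (48, 64), (24, 32), (12, 16)]
```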
In one embodiment, the decoder module includes a plurality of layers of second convolution units; the second convolution units are in one-to-one correspondence with the first convolution units. The operation of each second convolution unit comprises Conv + BN + ReLU + UpSample network layers, wherein each layer of second convolution unit increases the resolution of the output second image feature layer by layer through a bilinear interpolation upsampling operation, so that a feature image consistent with the resolution of the input original image, namely the first feature image, is restored layer by layer.
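As a pure-Python sketch of the bilinear interpolation upsampling used by each second convolution unit, the resampler below maps each output pixel back to fractional input coordinates and blends the four neighbors. The align-corners-style coordinate mapping is an implementation choice of this sketch, not something this document specifies:

```python
# Minimal bilinear upsampling of a single-channel 2D feature map (list of lists).
# Coordinate mapping (align-corners style) is an assumption of this sketch.
def bilinear_upsample(img, out_h, out_w):
    in_h, in_w = len(img), len(img[0])
    out = [[0.0] * out_w for _ in range(out_h)]
    for oy in range(out_h):
        for ox in range(out_w):
            # map output pixel back to (possibly fractional) input coordinates
            fy = oy * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
            fx = ox * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            y0, x0 = int(fy), int(fx)
            y1, x1 = min(y0 + 1, in_h - 1), min(x0 + 1, in_w - 1)
            wy, wx = fy - y0, fx - x0
            # blend the four surrounding input pixels
            top = img[y0][x0] * (1 - wx) + img[y0][x1] * wx
            bot = img[y1][x0] * (1 - wx) + img[y1][x1] * wx
            out[oy][ox] = top * (1 - wy) + bot * wy
    return out

up = bilinear_upsample([[0.0, 1.0], [1.0, 2.0]], 3, 3)
# corners are preserved; the center interpolates to 1.0
```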
In an embodiment, the step S220 of inputting the first image feature to a second convolution unit of a decoder module of the sky segmentation network for processing, and restoring to obtain a first feature image consistent with the resolution of the input original image includes the following steps:
S2202, inputting the first image features output by each first convolution unit of the encoder module into the corresponding second convolution unit of the decoder module by way of a feature skip connection, performing network operations layer by layer in combination with upsampling, and extracting the second image features.
The feature skip connection (skip connection) means that the first image feature output by a first convolution unit located in an intermediate layer (not the last layer) of the encoder module is input, in a skip manner, into the corresponding second convolution unit of the decoder module, where it is processed by convolution operations together with the second image feature output by the second convolution unit of the previous layer.
Fig. 3 is a schematic diagram illustrating the operation of a sky segmentation network according to an embodiment of the present invention. As shown in fig. 3, the first image feature output by the in_conv convolution unit of the Encoder module is input, in a skip manner, to the Decoder_5 convolution unit of the Decoder module; the first image feature output by the Encoder_1 convolution unit is input to the Decoder_4 convolution unit; the first image feature output by the Encoder_2 convolution unit is input to the Decoder_3 convolution unit; the first image feature output by the Encoder_3 convolution unit is input to the Decoder_2 convolution unit; and the first image feature output by the Encoder_4 convolution unit is input to the Decoder_1 convolution unit. In this way, the first image feature output by each first convolution unit of the encoder module is input, via a feature skip connection, to the corresponding second convolution unit of the decoder module.
Meanwhile, the Decoder_1 convolution unit of the Decoder module acquires the first image feature output by the Encoder_5 convolution unit; the Decoder_2 convolution unit acquires the second image feature output by the Decoder_1 convolution unit; the Decoder_3 convolution unit acquires the second image feature output by the Decoder_2 convolution unit; the Decoder_4 convolution unit acquires the second image feature output by the Decoder_3 convolution unit; and the Decoder_5 convolution unit acquires the second image feature output by the Decoder_4 convolution unit. Thereby, the first image features output by the first convolution units of the encoder module are input into the corresponding second convolution units of the decoder module, network operations are performed on them layer by layer, in combination with upsampling, together with the second image features output by the second convolution units of the decoder module, and the second image features are extracted.
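The wiring described for Fig. 3 can be summarized as a small lookup: each decoder unit fuses one skip feature from the encoder with the output of its predecessor (Encoder_5's output, in the case of Decoder_1). The sketch below only models the routing, not the convolutions themselves:

```python
# Skip-connection routing per the Fig. 3 description: encoder unit -> decoder unit.
SKIP = {
    "in_conv":   "Decoder_5",
    "Encoder_1": "Decoder_4",
    "Encoder_2": "Decoder_3",
    "Encoder_3": "Decoder_2",
    "Encoder_4": "Decoder_1",
}

def decoder_inputs(decoder_name):
    """Return the two feature sources fused inside a decoder unit:
    (skip feature from the encoder, output of the previous stage)."""
    skip_source = next(e for e, d in SKIP.items() if d == decoder_name)
    idx = int(decoder_name.split("_")[1])
    prev = "Encoder_5" if idx == 1 else f"Decoder_{idx - 1}"
    return (skip_source, prev)

print(decoder_inputs("Decoder_1"))  # ('Encoder_4', 'Encoder_5')
print(decoder_inputs("Decoder_5"))  # ('in_conv', 'Decoder_4')
```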
S2203, restoring according to the second image characteristics to obtain a first characteristic image which is consistent with the resolution of the input image to be segmented.
In this embodiment, the decoder module doubles the resolution of the second image feature layer by layer through multiple layers of second convolution units, so that when the resolution of the second image feature output by the last layer of second convolution units is consistent with the resolution of the input original image, the first feature image is obtained.
Further, a probability image corresponding to the original image is obtained according to the first feature image. The probability image is a single-channel image in which the value of each pixel is 0 or 1, and its size is the same as that of the input image to be segmented.
And S230, receiving a probability image output by the sky segmentation network, and segmenting the sky area of the original image according to the probability image.
The sky segmentation network performs semantic segmentation on the original image and outputs a corresponding probability image, in which the value of each pixel is 1 or 0; the sky segmentation area is then determined from these pixel values. For example, pixels with value 1 are rendered as white (gray-scale value 255), and the white area is the sky area.
Optionally, the sky region of the image may be segmented according to the probability image output by the sky segmentation network. In this embodiment, subsequent processing, such as guided filtering, may be performed on the probability image to optimize the segmentation accuracy of the sky region, or inter-frame mean smoothing may be performed to obtain a more accurate probability image, so as to further improve the segmentation effect of the sky region of the image.
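The inter-frame mean smoothing mentioned above can be sketched as averaging the probability image over a sliding window of recent frames, which suppresses flicker between consecutive video frames. The window size of 3 is an illustrative assumption:

```python
# Minimal inter-frame mean smoothing of probability images (2D lists in [0, 1]).
# The window size is an assumption; the document does not specify one.
from collections import deque

class FrameSmoother:
    def __init__(self, window=3):
        self.frames = deque(maxlen=window)  # keeps only the last `window` frames

    def smooth(self, prob_image):
        """Append the latest probability image and return the per-pixel mean
        over the frames currently in the window."""
        self.frames.append(prob_image)
        n = len(self.frames)
        h, w = len(prob_image), len(prob_image[0])
        return [[sum(f[y][x] for f in self.frames) / n for x in range(w)]
                for y in range(h)]
```

A spurious single-frame flip of one pixel is thus damped rather than passed straight through to the segmentation mask.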
In the image-based sky region segmentation method provided by this embodiment, an original image containing sky content features is obtained; the original image is input into a sky segmentation network, and semantic segmentation processing is performed on it to obtain a corresponding probability image; the probability image output by the sky segmentation network is received, and the sky region of the original image is segmented according to the probability image. Since the sky segmentation network is a convolutional neural network with a compressed number of channels, the sky region in the image is accurately segmented on the client with stable performance and a high processing speed, meeting the requirements of users.
In order to make the technical solution clearer and easier to understand, specific implementation processes and modes of a plurality of steps in the technical solution are described in detail below.
It should be noted that, in the related art, sky segmentation schemes are all processed on image input and generally involve a large amount of computation. If a neural network in the related art, such as a Unet network, is directly transplanted to a client, the processing speed of sky segmentation is slow, and segmentation may even fail to work at all, causing the client to freeze; hence the sky segmentation networks of the related art cannot be deployed to run on a client. In the present scheme, sky segmentation of the image is performed based on a lightweight sky segmentation network that can be deployed on a client, so that the sky area can be accurately segmented in real time, and freezing is avoided while the segmentation precision is guaranteed.
Based on this, the image-based sky region segmentation method provided by the present application further includes: and training to generate a sky segmentation network. In an embodiment, the sky segmentation network may be obtained by:
s300, compressing the number of basic channels of a Unet network model according to the sky segmentation scene characteristics and increasing the number of layers of the Unet network model, so that the number of the basic channels after modification is adapted to the characteristics of the sky segmentation scene, and training to obtain a lightweight sky segmentation network.
In this embodiment, the sky-segmented scene characteristic refers to a parameter characteristic that meets requirements such as segmentation accuracy, display resolution, and image size required by an image production platform, for example, fine-tuning of an artistic drawing, production of a short video or a small video.
In this embodiment, the convolutional neural network adopted by the sky segmentation network is built on the Unet network architecture. The Unet network architecture learns deep features by means of downsampling and convolutions of different degrees, restores the deep features to the size of the original image by means of upsampling, and outputs a probability image corresponding to the original image.
In an embodiment, the number of basic channels of the Unet network architecture is compressed from 64 to 16, and at the same time the 4 convolutional layers of the convolutional neural network are increased to 5 layers, which compensates for the loss of segmentation quality caused by compressing the number of basic channels and enhances the feature extraction capability of the sky segmentation network. In this way, the resolution of the feature images output by the sky segmentation network with these numbers of basic channels and convolutional layers is adapted to the characteristics of the sky segmentation scene, and a lightweight sky segmentation network deployable on a client is obtained.
In this embodiment, each convolutional layer of the sky segmentation network includes a plurality of encoder basic convolution modules and decoder basic convolution modules; the encoder basic convolution module comprises Conv + BN + ReLU network layers stacked twice, and the decoder basic convolution module comprises Conv + BN + ReLU + UpSample network layers. Compared with the encoder and decoder basic convolution modules of the related art, the modules provided by this embodiment add a BN operation, making training more stable and the segmentation effect better.
In this embodiment, the number of the encoder basic convolution modules and the number of the decoder basic convolution modules included in each convolution layer of the sky segmentation network may be different, so as to achieve different training effects.
Compared with the computation amount (FLOPs) of the original Unet network model, which is 46.3G, the computation amount of the sky segmentation network provided by this embodiment, which performs sky segmentation in real time, is 370M; the processing speed is increased by more than 160 times, the network can be deployed to run in real time on a low- to mid-range Android smartphone, and the segmentation accuracy and stability are better.
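A rough illustration of why channel compression dominates this saving (this is not the patent's exact FLOPs accounting): the multiply-accumulate count of a convolution layer scales with the product of input and output channels, so shrinking the base channel count from 64 to 16 cuts a single such layer's work by a factor of 16:

```python
# Back-of-envelope MAC count for one k x k convolution layer.
# The 3x3 kernel and (384, 512) map size are illustrative assumptions.
def conv_macs(c_in, c_out, k, h, w):
    """Multiply-accumulates for a k x k conv producing an h x w, c_out-channel map."""
    return c_in * c_out * k * k * h * w

full = conv_macs(64, 64, 3, 384, 512)  # base channels 64, as in the original Unet
lite = conv_macs(16, 16, 3, 384, 512)  # base channels 16, as in this embodiment
print(full // lite)  # 16: MACs scale with c_in * c_out, so (64*64)/(16*16) = 16
```

The whole-network ratio differs from this single-layer factor because the compressed network also adds a fifth layer and changes per-layer resolutions.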
In an embodiment, the obtaining a probability image corresponding to the original image according to the first feature image in step S220 may include the following steps:
s2204, converting the first characteristic image by using a Sigmoid function to obtain a probability image corresponding to the first characteristic image.
The first feature image is a single-channel image with the same size as the original image. In a single-channel image, commonly referred to as a gray-scale image, each pixel has only one value representing its intensity: the pixel values lie between 0 and 255, where 0 is black, 255 is white, and intermediate values are different levels of gray. The pixel value range of the first feature image is 0-255.
The value range of the Sigmoid function is between 0 and 1, and the function is symmetric about its midpoint. In this embodiment, the feature image with pixel value range 0-255 is converted by the Sigmoid function, and the value range of the resulting probability image is 0-1.
The Sigmoid function is:
f(x) = 1 / (1 + e^(-x))
wherein, x is the pixel value of the characteristic image, and f (x) is the value of the probability image corresponding to the pixel point of the characteristic image.
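Applied element-wise, the Sigmoid function maps each feature value to a probability; a minimal sketch of this conversion step (input values here are illustrative, not taken from a real feature image):

```python
# Element-wise Sigmoid conversion of a feature image into a probability image.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def to_probability_image(feature_image):
    """feature_image: 2D list of feature values; returns values in (0, 1)."""
    return [[sigmoid(x) for x in row] for row in feature_image]

# The symmetry noted in the text: f(-x) == 1 - f(x), with f(0) == 0.5.
print(sigmoid(0.0))  # 0.5
```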
In one embodiment, in order to optimize the segmentation effect of the sky segmentation network and reduce missed detections and false detections, a multi-scale loss constraint method is used to optimize model training during the training process. Specifically, training the second feature images output by at least two layers of second convolution units of the decoder module with loss functions corresponding to the resolutions of those second feature images may include the following steps:
S3101, determining the resolution ratio, with respect to the original image, of the second feature images output by at least two layers of second convolution units of the decoder module.
In this embodiment, the second feature image output by the second convolution unit of an upper layer of the decoder module serves as the input of the second convolution unit of the next layer. The resolution of the second feature image output by each layer's second convolution unit is half the resolution of the second feature image input to that layer (i.e., the second feature image output by the previous second convolution unit). For example, if the resolution of the input second feature image is (384,512), the resolution of the output second feature image is (192,256). In this embodiment, a 1 × 1 convolution may be used to adjust the feature channels to the output channel, so that the second feature image of each second convolution unit can be output on its own branch. During the prediction phase, the neural network does not need to compute these additional 1 × 1 convolutions, so no extra computation is added.
After the second convolution unit of the second layer, the resolution of the output second feature image is 1/4 of the resolution of the original image (namely 1/2 of the resolution of the second feature image of the previous layer), and so on. The resolutions of the second feature images output by at least two second convolution units of the decoder module are acquired, and the ratio of each of these resolutions to the resolution of the original image is determined.
S3102, determining the type and weight of the BCE (Binary Cross Entropy) loss function used to train the second convolution units of the at least two layers according to the resolution ratios.
In this embodiment, the type and weight of the BCE loss function that trains the second convolution unit of at least two layers are matched to the resolution of the feature image output by the convolution layer.
For example, on the branch whose output second feature image has a resolution ratio of 1/2, training is performed with the 1/2 BCE loss function; on the branch with resolution ratio 1/4, with the 1/4 BCE loss function; and on the branch with resolution ratio 1/8, with the 1/8 BCE loss function. The 1/2 BCE, 1/4 BCE and 1/8 BCE loss functions are distinct loss functions, one per scale.
Furthermore, the weight of the loss function on each output branch can be set, so that the trained result is more accurate and stable.
S3103, based on the weight values, utilizing the BCE loss functions of the corresponding types to carry out constraint training on the sky segmentation network.
In this embodiment, based on the weight of the loss function on each output branch, the final Loss in the training phase is calculated as:

Loss = α · BCE(ŷ_{1/1}, y_{1/1}) + β · BCE(ŷ_{1/2}, y_{1/2}) + θ · BCE(ŷ_{1/4}, y_{1/4}) + λ · BCE(ŷ_{1/8}, y_{1/8})

wherein ŷ denotes the predicted value of the network at the indicated scale, y denotes the corresponding true value, and the values of α, β, θ and λ are 10, 2, 1 and 1 respectively. Of course, in other application scenarios, α, β, θ and λ may take other values, which is not limited herein.
Fig. 4 is a flowchart of an image special effect processing method provided in an embodiment of the present application, where the image special effect processing method is applicable to an image special effect processing device, such as a client.
Specifically, as shown in fig. 4, the image special effect processing method may include the following steps:
and S410, acquiring a target sky material selected by a user.
In this embodiment, sky materials are downloaded to the client in advance. The user pops up a sky material panel by clicking a relevant button on the display interface of the client, and different sky materials are displayed through the panel; a sky material may be pre-configured by the system or customized by the user.
The user selects one or more of the sky materials on the panel as the target sky material for subsequent processing.
And S420, determining a sky area to be segmented of the original image to be segmented.
The sky region is determined by inputting the original image into a sky segmentation network, performing semantic segmentation processing on the original image, extracting first image features of the original image, which have different receptive fields, by using a first convolution unit of an encoder module of the sky segmentation network, inputting the first image features into a second convolution unit of a decoder module of the sky segmentation network for processing, restoring to obtain a first feature image which is consistent with the resolution of the input original image, and obtaining a probability image corresponding to the original image according to the first feature image; the sky segmentation network is a lightweight deep convolution neural network and is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image.
An original image containing sky content features is acquired. Generally, a user takes an external scene image including a sky area by using a mobile phone camera or the like. Sky content features refer to features representing the sky, such as clouds, color categories, color variations, and color regions.
In this embodiment, the original image including the sky content feature may be acquired from a local device such as a mobile phone camera, or may be acquired from a local storage device or an external storage device.
And adjusting the image format of the original image according to the sky segmentation scene characteristics. In this embodiment, the sky-segmentation scene characteristic refers to a parameter characteristic that meets requirements of image processing, such as segmentation accuracy, display resolution, image size, and the like, required for fine image correction or small video cover creation.
For example, the image is cropped and its resolution adjusted so that its height and width become (384,512), i.e., a height of 384 pixels and a width of 512 pixels.
And setting a preprocessing parameter of the image to be segmented according to the sky segmentation scene characteristic, and adjusting an image format of an original image in the image to be segmented according to the preprocessing parameter, such as adjusting the size, the resolution, the precision and the like.
Inputting an original image subjected to image format adjustment into a sky segmentation network, and performing semantic segmentation processing on the original image to obtain a probability image corresponding to the original image; wherein the sky segmentation network is a convolutional neural network that compresses a number of channels.
Semantic segmentation of images refers to having a computer segment an image according to its semantics. In the image field, semantics refers to the content of an image and an understanding of its meaning; segmentation means that different objects in the picture are separated from the perspective of pixels, with each pixel in the original image labeled. For example, white in an image frame may represent a sky area and black a non-sky area; of course, in other embodiments, other values may be used to distinguish sky from non-sky areas.
In this embodiment, the sky segmentation network is a pre-trained convolutional neural network adapted to image-based sky region segmentation scenes; it is deployed on the client and can transform the style of the sky region in an image in real time. To address the high computation amount of related convolutional neural networks, the sky segmentation model of this embodiment adjusts its structure and parameters so that the computation amount is greatly reduced while the recognition accuracy is guaranteed. Generally, the larger the number of channels of a neural network, the larger the amount of computation. The sky segmentation network compresses the number of channels to reduce the computation amount; for example, the number of basic channels is compressed from 64 to 16. Sky segmentation is based on image processing and its computational requirements are relatively low, so the number of channels is set to 16 and the parameter count is reduced appropriately, which lowers the computation amount while preserving the accuracy and stability of the segmentation. The loss in segmentation quality caused by channel compression can be compensated by other parameter or structure adjustments to ensure the segmentation precision.
In this embodiment, a first convolution unit of an encoder module of the sky segmentation network is used to extract first image features of the original image with different receptive fields, the first image features are input to a second convolution unit of a decoder module of the sky segmentation network for processing, a first feature image consistent with the resolution of the input original image is restored, and a probability image corresponding to the original image is obtained according to the first feature image.
Wherein the encoder module comprises a plurality of layers of first convolution units; the decoder module comprises a plurality of layers of second convolution units, and the second convolution units correspond to the first convolution units one to one. In this embodiment, the encoder module includes five layers of first convolution units, and the decoder module includes five layers of second convolution units.
In an embodiment, a plurality of layers of first convolution units of an encoder module of the sky segmentation network are utilized to extract first image features with different receptive fields of the original image layer by layer; wherein the resolution of the first image feature decreases from layer to layer.
In the convolutional neural network, the Receptive Field is defined as the size of the region of the input image that a pixel on the feature image (feature map) output by each layer of the convolutional neural network is mapped to. Put more plainly, a point on the feature image corresponds to an area on the input image. In this embodiment, the semantic information and the spatial information of the original image are determined by acquiring first image features with different receptive fields, so that the accuracy of pixel classification and the edge segmentation precision of the original image are improved.
Each first convolution unit comprises Conv + BN + ReLU network layers stacked multiple times; in the present embodiment, each first convolution unit includes two stacked Conv + BN + ReLU network layers.
In this embodiment, after processing by each layer of first convolution unit, the resolution of the output first image feature is halved relative to the resolution of the first image feature input to that layer.
In one embodiment, the decoder module includes a plurality of layers of second convolution units; the second convolution units are in one-to-one correspondence with the first convolution units. The operation of each second convolution unit comprises Conv + BN + ReLU + UpSample network layers, wherein each layer of second convolution unit increases the resolution of the output second image feature layer by layer through a bilinear interpolation upsampling operation, so that a feature image consistent with the resolution of the input original image, namely the first feature image, is restored layer by layer.
In an embodiment, the first image features output by each first convolution unit of the encoder module are input into the corresponding second convolution unit of the decoder module by way of a feature skip connection, and network operations are performed layer by layer in combination with upsampling to extract the second image features.
The feature skip connection (skip connection) means that the first image feature output by a first convolution unit located in an intermediate layer (not the last layer) of the encoder module is input, in a skip manner, into the corresponding second convolution unit of the decoder module, where it is processed by convolution operations together with the second image feature output by the second convolution unit of the previous layer.
Fig. 3 is a schematic diagram illustrating the operation of a sky segmentation network according to an embodiment of the present invention. As shown in fig. 3, the first image feature output by the in_conv convolution unit of the Encoder module is input, in a skip manner, to the Decoder_5 convolution unit of the Decoder module; the first image feature output by the Encoder_1 convolution unit is input to the Decoder_4 convolution unit; the first image feature output by the Encoder_2 convolution unit is input to the Decoder_3 convolution unit; the first image feature output by the Encoder_3 convolution unit is input to the Decoder_2 convolution unit; and the first image feature output by the Encoder_4 convolution unit is input to the Decoder_1 convolution unit. In this way, the first image feature output by each first convolution unit of the encoder module is input, via a feature skip connection, to the corresponding second convolution unit of the decoder module.
Meanwhile, the Decoder_1 convolution unit of the Decoder module acquires the first image feature output by the Encoder_5 convolution unit; the Decoder_2 convolution unit acquires the second image feature output by the Decoder_1 convolution unit; the Decoder_3 convolution unit acquires the second image feature output by the Decoder_2 convolution unit; the Decoder_4 convolution unit acquires the second image feature output by the Decoder_3 convolution unit; and the Decoder_5 convolution unit acquires the second image feature output by the Decoder_4 convolution unit. Thereby, the first image features output by the first convolution units of the encoder module are input into the corresponding second convolution units of the decoder module, network operations are performed on them layer by layer, in combination with upsampling, together with the second image features output by the second convolution units of the decoder module, and the second image features are extracted.
In an embodiment, a first feature image consistent with the resolution of the input image to be segmented is obtained through restoration according to the second image feature.
In this embodiment, the decoder module doubles the resolution of the second image feature layer by layer through multiple layers of second convolution units, so that when the resolution of the second image feature output by the last layer of second convolution units is consistent with the resolution of the input original image, the first feature image is obtained.
Further, a probability image corresponding to the original image is obtained according to the first feature image. The probability image is a single-channel image in which the value of each pixel is 0 or 1, and its size is the same as that of the input image to be segmented.
And the client receives the probability image output by the sky segmentation network, and segments the sky area of the original image according to the probability image.
The sky segmentation network performs semantic segmentation on the original image and outputs a corresponding probability image, in which the value of each pixel is 1 or 0; the sky segmentation area is then determined from these pixel values. For example, pixels with value 1 are rendered as white (gray-scale value 255), and the white area is the sky area.
Optionally, the sky region of the image may be segmented according to the probability image output by the sky segmentation network. In this embodiment, subsequent processing, such as guided filtering, may be performed on the probability image to optimize the segmentation accuracy of the sky region, or inter-frame mean smoothing may be performed to obtain a more accurate probability image, so as to further improve the segmentation effect of the sky region of the image.
And S430, fusing the target sky material to the sky area to generate a target image.
The target sky material is overlaid on the sky area of the original image; alternatively, the original sky pattern of the original image is deleted and replaced with the target sky material selected by the user, and the target sky material is fused into the original image to obtain the target image. The target image thus carries the sky special effect, for example, a cloudy sky turned into a clear and bright one.
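The fusion of step S430 can be sketched as per-pixel compositing driven by the probability image: wherever the probability image marks sky (value 1), take the target sky material's pixel; elsewhere keep the original image. Hard 0/1 masking is used here purely for illustration; a production implementation would typically blend softly along edges:

```python
# Per-pixel sky material fusion guided by a 0/1 probability image.
# Pixels are (R, G, B) tuples; images are lists of rows, all the same size.
def fuse_sky(original, material, prob_image):
    h, w = len(original), len(original[0])
    return [[material[y][x] if prob_image[y][x] == 1 else original[y][x]
             for x in range(w)] for y in range(h)]

orig = [[(10, 10, 10), (20, 20, 20)]]
mat = [[(200, 200, 255), (200, 200, 255)]]
mask = [[1, 0]]  # left pixel is sky, right pixel is not
print(fuse_sky(orig, mat, mask))  # [[(200, 200, 255), (20, 20, 20)]]
```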
In the image special effect processing method provided by this embodiment, a target sky material selected by a user is obtained; semantic segmentation processing is performed on the original image by a pre-trained lightweight sky segmentation network to obtain a corresponding probability image; the probability image output by the sky segmentation network is received, the sky area of the original image is determined from it, special effect processing is performed on the sky area with the target sky material, and a target image fused with the target sky material is output. Accurate segmentation of the sky area of the original image and real-time replacement of the sky material are thus realized on the client, meeting the requirements of users.
In order to more clearly illustrate the present solution, an implementation process of the present solution is exemplarily described below with reference to fig. 5. Fig. 5 is a schematic diagram of an implementation of image special effect processing according to an embodiment of the present application.
As shown in fig. 5, the input original image is acquired and fed into the pre-trained sky segmentation network for semantic segmentation. Encoding and decoding are performed by the convolution units of the encoding and decoding modules of the sky segmentation network: in the encoding process, layer-by-layer convolution encoding is performed by the convolution layers in_conv, Encoder_1, Encoder_2, Encoder_3, Encoder_4 and Encoder_5, and decoding is then performed by Decoder_1, Decoder_2, Decoder_3, Decoder_4 and Decoder_5 to obtain the probability image.
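The resolution schedule of such an encoder-decoder can be sketched as below. The assumption that each encoder stage halves and each decoder stage doubles the spatial resolution is an illustration, but it is consistent with the 1/8, 1/4, 1/2 and 1/1 ratios the embodiment gives for Decoder_2 through Decoder_5:

```python
def resolution_schedule(input_res, num_stages=5):
    """Spatial resolution after each encoder/decoder stage, assuming each
    encoder stage halves and each decoder stage doubles the resolution."""
    encoder = []
    res = input_res
    for _ in range(num_stages):
        res //= 2
        encoder.append(res)
    decoder = []
    for _ in range(num_stages):
        res *= 2
        decoder.append(res)
    return encoder, decoder

enc, dec = resolution_schedule(512)
# enc: [256, 128, 64, 32, 16], dec: [32, 64, 128, 256, 512]
# Decoder_2..Decoder_5 thus sit at 1/8, 1/4, 1/2 and 1/1 of the input.
```

This matches the skip-connection design described later: each decoder stage has an encoder counterpart at the same resolution whose features it can reuse.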
As shown in fig. 5, each Decoder_i (i being a positive integer, such as 1, 2, 3, 4 and 5) output branch is trained with a corresponding BCE loss function. If the resolution ratio of the second feature image output by the Decoder_2 branch relative to the original image is 1/8, a 1/8 BCE loss function is used for training; the second feature image output by the Decoder_3 branch has a resolution ratio of 1/4 relative to the original image, so a 1/4 BCE loss function is used; the second feature image output by the Decoder_4 branch has a resolution ratio of 1/2, so a 1/2 BCE loss function is used; and the second feature image output by the Decoder_5 branch has a resolution ratio of 1/1, so a 1/1 BCE loss function is used.
In the embodiment, different BCE loss function constraints are performed on output results of different convolutional layer output channels, which is beneficial to model training and learning of more detailed edge information.
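The multi-scale BCE constraint can be sketched in numpy as follows. The particular per-scale weights used here are an illustrative assumption; the patent only states that the loss function type and weight are chosen according to the resolution ratio:

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy averaged over all pixels."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def multiscale_bce(preds_by_scale, targets_by_scale, weights):
    """Weighted sum of BCE losses over decoder outputs at several scales.

    preds_by_scale / targets_by_scale: dict scale -> flat pixel array
    weights: dict scale -> loss weight (assumed, per-scale)
    """
    return sum(weights[s] * bce(preds_by_scale[s], targets_by_scale[s])
               for s in preds_by_scale)

# Perfect predictions at every scale give a (near) zero total loss
t = {0.125: np.array([1.0, 0.0]), 1.0: np.array([1.0, 0.0, 1.0, 0.0])}
p = {0.125: np.array([1.0, 0.0]), 1.0: np.array([1.0, 0.0, 1.0, 0.0])}
w = {0.125: 0.25, 1.0: 1.0}
loss = multiscale_bce(p, t, w)  # close to 0
```

Supervising the coarse 1/8 and 1/4 outputs as well as the full-resolution one is what the embodiment credits with helping the model learn finer edge detail.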
The above examples are merely used to assist in explaining the technical solutions of the present disclosure, and the drawings and specific flows related thereto do not constitute a limitation on the usage scenarios of the technical solutions of the present disclosure.
The following describes in detail related embodiments of an image-based sky region segmentation apparatus and an image special effect processing apparatus.
Fig. 6 is a schematic structural diagram of an image-based sky region segmentation apparatus according to an embodiment, which is executable by an image-based sky region segmentation device, such as a client.
Specifically, the image-based sky region segmentation apparatus 200 includes: an original image obtaining module 210, a probability image obtaining module 220, and a sky region segmentation module 230.
The original image obtaining module 210 is configured to obtain an original image containing sky content features; a probability image obtaining module 220, configured to input the original image with the adjusted image format into a sky segmentation network, perform semantic segmentation processing on the original image, extract first image features of the original image with different receptive fields by using a first convolution unit of an encoder module of the sky segmentation network, input the first image features into a second convolution unit of a decoder module of the sky segmentation network for processing, restore the first image features to obtain a first feature image that is consistent with a resolution of the input original image, and obtain a probability image corresponding to the original image according to the first feature image; the sky segmentation network is a lightweight deep convolution neural network and is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image; a sky region segmentation module 230 configured to segment a sky region of the original image according to the probability image.
The sky region segmentation device based on the image, which is provided by the embodiment, is deployed at the client, can realize accurate segmentation of the sky region of the image on the client, has stable performance and high processing speed, and meets the requirements of users.
In an embodiment, the encoder module comprises a plurality of layers of first convolution elements;
the probability image obtaining module 220 includes: a first image feature extraction unit, configured to extract, layer by layer, first image features of the original image with different receptive fields by using a multi-layer first convolution unit of an encoder module of the sky segmentation network; wherein the resolution of the first image feature decreases from layer to layer.
In one embodiment, the decoder module includes a plurality of layers of second convolution units; the second convolution units correspond to the first convolution units one by one;
the probability image obtaining module 220 includes: a second image feature extraction unit and a first feature image obtaining unit;
the second image feature extraction unit is used for inputting the first image features output by each first convolution unit of the encoder module into a second convolution unit corresponding to the corresponding decoder module in a feature jumping mode, and performing network operation layer by combining with an up-sampling mode to extract and obtain second image features; and the first characteristic image obtaining unit is used for obtaining a first characteristic image which is consistent with the resolution of the input image to be segmented according to the second image characteristic restoration.
In an embodiment, each of the first convolution units includes a plurality of stacked Conv + BN + ReLU network layers; the operation of each second convolution unit comprises Conv + BN + ReLU + UpSample network layers, wherein each layer of second convolution unit increases the resolution of the output second image features layer by layer through a bilinear interpolation upsampling operation.
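The bilinear upsampling step used by the second convolution units can be sketched as below. This is a simplified numpy version using the align-corners convention; the actual network would apply it per feature channel:

```python
import numpy as np

def bilinear_upsample(x, scale=2):
    """Bilinear upsampling of a 2-D feature map (align-corners style)."""
    h, w = x.shape
    out_h, out_w = h * scale, w * scale
    rows = np.linspace(0, h - 1, out_h)
    cols = np.linspace(0, w - 1, out_w)
    r0 = np.floor(rows).astype(int); r1 = np.minimum(r0 + 1, h - 1)
    c0 = np.floor(cols).astype(int); c1 = np.minimum(c0 + 1, w - 1)
    fr = (rows - r0)[:, None]       # fractional row offsets
    fc = (cols - c0)[None, :]       # fractional column offsets
    top = x[np.ix_(r0, c0)] * (1 - fc) + x[np.ix_(r0, c1)] * fc
    bot = x[np.ix_(r1, c0)] * (1 - fc) + x[np.ix_(r1, c1)] * fc
    return top * (1 - fr) + bot * fr

x = np.array([[0.0, 1.0],
              [1.0, 2.0]])
y = bilinear_upsample(x)   # 4x4 map; corner values are preserved
```

Unlike a transposed convolution, this upsampling has no learned parameters, which keeps the decoder lightweight.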
In an embodiment, the image-based sky region segmentation apparatus 200 further includes: and the segmentation network training module is used for training the second characteristic image output by the second convolution unit of at least two layers of the decoder module by using a loss function corresponding to the resolution of the second characteristic image.
In one embodiment, the split network training module comprises: the system comprises a resolution ratio unit, a weight value determining unit and a constraint training unit; the resolution ratio unit is used for determining the resolution ratio of the second characteristic image output by the second convolution unit of at least two layers of the decoder module relative to the original image; a weight value determining unit, configured to determine, according to the resolution ratio, a type and a weight value of a BCE loss function used for training the second convolution units of the at least two layers; and the constraint training unit is used for carrying out constraint training on the sky segmentation network by utilizing the BCE loss function of the corresponding category based on the weight value.
In one embodiment, the resolution ratio unit includes: an output channel adjusting subunit and a resolution ratio determining unit;
the output channel adjusting subunit is used for adjusting the characteristic channels of the second convolution units of at least two layers of the decoder module into output channels through 1 × 1 convolution and outputting corresponding second characteristic images; and the resolution ratio determining unit is used for calculating the resolution ratio of the second characteristic image relative to the original image based on the resolution of the second characteristic image.
In one embodiment, the probability image obtaining module 220 includes: and the probability image obtaining unit is used for converting the first characteristic image by using a Sigmoid function to obtain a probability image corresponding to the characteristic image.
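The Sigmoid conversion is an element-wise squashing of the feature (logit) values into (0, 1), which is what makes the result interpretable as a per-pixel sky probability. A minimal sketch:

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function mapping logits to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

# A feature (logit) map becomes a per-pixel sky probability map
logits = np.array([[-4.0, 0.0], [4.0, 10.0]])
prob = sigmoid(logits)
# large positive logits -> probability near 1 (sky),
# large negative logits -> probability near 0 (non-sky)
```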
The image-based sky region segmentation apparatus according to the embodiment of the present disclosure may perform the image-based sky region segmentation method according to the embodiment of the present disclosure, which achieves similar principles and has the same beneficial effects as the image special effect processing method, and thus, details thereof are not repeated herein.
Fig. 7 is a schematic structural diagram of an image special effect processing apparatus according to an embodiment, where the image special effect processing apparatus is executable on an image special effect processing device, such as a client.
Specifically, as shown in fig. 7, the image special effect processing apparatus 400 includes: a sky material acquisition module 410, a sky region determination module 420, and a target image generation module 430.
The sky material acquiring module 410 is configured to acquire a target sky material selected by a user;
a sky region determining module 420, configured to determine a sky region of an original image to be segmented, where the sky region is determined by inputting the original image into a sky segmentation network, performing semantic segmentation processing on the original image, extracting first image features of the original image, which have different receptive fields, by using a first convolution unit of an encoder module of the sky segmentation network, inputting the first image features into a second convolution unit of a decoder module of the sky segmentation network, processing the first image features, restoring to obtain a first feature image that is consistent with a resolution of the input original image, and obtaining a probability image corresponding to the original image according to the first feature image; the sky segmentation network is a lightweight deep convolution neural network and is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image;
and a target image generation module 430, configured to fuse the target sky material to the sky area, and generate a target image.
The image special effect processing device provided by the embodiment is deployed at a client, can realize the accurate segmentation of the sky area of an image and the synthesis processing of the sky special effect on the client, has stable performance and high processing speed, and meets the requirements of users.
The image special effect processing apparatus according to the embodiment of the present disclosure may execute the image special effect processing method provided by the embodiment of the present disclosure, which achieves similar principles and has the same beneficial effects as the image special effect processing method, and details are not repeated here.
An embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the image-based sky region segmentation method or the image special effect processing method in any of the above embodiments when executing the program.
The computer device provided by the above embodiment, when executing the image-based sky region segmentation method or the image special effect processing method provided by any of the above embodiments, has the corresponding functions and beneficial effects.
An embodiment of the present invention provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform a method for image-based sky region segmentation, including:
acquiring an original image containing sky content characteristics;
inputting an original image into a sky segmentation network, performing semantic segmentation processing on the original image, extracting first image features of the original image with different receptive fields by using a first convolution unit of an encoder module of the sky segmentation network, inputting the first image features into a second convolution unit of a decoder module of the sky segmentation network for processing, restoring to obtain a first feature image consistent with the resolution of the input original image, and obtaining a probability image corresponding to the original image according to the first feature image; the sky segmentation network is a lightweight deep convolution neural network and is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image;
and receiving a probability image output by the sky segmentation network, and segmenting the sky area of the original image according to the probability image.
The computer executable instructions, when executed by a computer processor, are further for performing an image special effects processing method comprising:
acquiring a target sky material selected by a user;
determining a sky area to be segmented of the original image, wherein the sky area is determined by inputting the original image into a sky segmentation network, performing semantic segmentation processing on the original image, extracting first image features of the original image, which have different receptive fields, by using a first convolution unit of an encoder module of the sky segmentation network, inputting the first image features into a second convolution unit of a decoder module of the sky segmentation network for processing, restoring to obtain a first feature image which is consistent with the resolution of the input original image, and obtaining a probability image corresponding to the original image according to the first feature image; the sky segmentation network is a lightweight deep convolution neural network and is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image;
and fusing the target sky material to the sky area to generate a target image.
Of course, in the storage medium containing computer-executable instructions provided by the embodiments of the present invention, the computer-executable instructions are not limited to the operations of the image-based sky region segmentation method or the image special effect processing method described above, and have the corresponding functions and advantages.
From the above description of the embodiments, it is obvious for those skilled in the art that the present invention can be implemented by software and necessary general hardware, and certainly, can also be implemented by hardware, but the former is a better embodiment in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH Memory (FLASH), a hard disk, or an optical disk of a computer, and includes instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the image-based sky region segmentation method or the image special effect processing method according to any embodiment of the present invention.
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the steps are not performed in a strict order and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed in sequence but may be performed in turn or alternately with other steps, or with at least a portion of the sub-steps or stages of other steps.
The foregoing is only a partial embodiment of the present application, and it should be noted that, for those skilled in the art, several modifications and decorations can be made without departing from the principle of the present application, and these modifications and decorations should also be regarded as the protection scope of the present application.

Claims (12)

1. A sky region segmentation method based on images is characterized by comprising the following steps:
acquiring an original image containing sky content characteristics;
inputting the original image into a sky segmentation network, performing semantic segmentation processing on the original image, extracting first image features with different receptive fields of the original image by using a first convolution unit of an encoder module of the sky segmentation network, inputting the first image features into a second convolution unit of a decoder module of the sky segmentation network for processing, restoring to obtain a first feature image consistent with the resolution of the input original image, and obtaining a probability image corresponding to the original image according to the first feature image; the sky segmentation network is a lightweight deep convolution neural network and is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image;
and segmenting the sky area of the original image according to the probability image.
2. The image-based sky region segmentation method of claim 1, wherein the encoder module includes a plurality of layers of a first convolution unit;
the step of extracting first image features of the original image having different receptive fields using a first convolution unit of an encoder module of the sky segmentation network comprises:
extracting first image features with different receptive fields of the original image layer by utilizing a plurality of layers of first convolution units of an encoder module of the sky segmentation network; wherein the resolution of the first image feature decreases from layer to layer.
3. The image-based sky region segmentation method of claim 2, wherein the decoder module includes a multi-layer second convolution unit; the second convolution units correspond to the first convolution units one by one;
the step of inputting the first image feature into a second convolution unit of a decoder module of the sky segmentation network for processing, and restoring to obtain a first feature image consistent with the resolution of the input original image includes:
inputting the first image features output by each first convolution unit of the encoder module into a second convolution unit corresponding to the corresponding decoder module by adopting a feature jumper connection mode, and performing network operation layer by combining with an up-sampling mode to extract and obtain second image features;
and restoring according to the second image characteristic to obtain a first characteristic image which is consistent with the resolution of the input image to be segmented.
4. The image-based sky region segmentation method of claim 3, wherein each of the first convolution units includes a plurality of stacked Conv + BN + ReLU network layers;
the operation of each second convolution unit comprises Conv + BN + ReLU + UpSample network layers, wherein each layer of second convolution unit increases the resolution of the output second image features layer by layer through a bilinear interpolation upsampling operation.
5. The method of claim 1, wherein the step of training a second feature image output by a second convolution unit of at least two layers of the decoder module with a loss function corresponding to a resolution of the second feature image comprises:
determining the resolution ratio of a second characteristic image output by a second convolution unit of at least two layers of the decoder module relative to the original image;
determining the type and weight value of a BCE loss function used for training the second convolution units of the at least two layers according to the resolution ratio;
and based on the weight value, carrying out constraint training on the sky segmentation network by utilizing the BCE loss function of the corresponding category.
6. The method of claim 5, wherein the step of determining a resolution ratio of a second characteristic image outputted by a second convolution unit of at least two layers of the decoder module with respect to the original image comprises:
adjusting the characteristic channels of the second convolution units of at least two layers of the decoder module into output channels through 1 x 1 convolution, and outputting corresponding second characteristic images;
and calculating the resolution ratio of the second characteristic image relative to the original image based on the resolution of the second characteristic image.
7. The method of claim 1, wherein the step of deriving a probability image corresponding to the original image according to the first feature image comprises:
converting the first characteristic image by using a Sigmoid function to obtain a probability image corresponding to the characteristic image; and the first characteristic image is a second characteristic image output by a last layer of second convolution unit of the decoder module.
8. An image special effect processing method is characterized by comprising the following steps:
acquiring a target sky material selected by a user;
determining a sky region of an original image to be segmented, wherein the sky region is determined by inputting the original image into a sky segmentation network, performing semantic segmentation processing on the original image, extracting first image features of the original image, which have different receptive fields, by using a first convolution unit of an encoder module of the sky segmentation network, inputting the first image features into a second convolution unit of a decoder module of the sky segmentation network for processing, restoring to obtain a first feature image which is consistent with the resolution of the input original image, and obtaining a probability image corresponding to the original image according to the first feature image; the sky segmentation network is a lightweight deep convolution neural network and is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image;
and fusing the target sky material to the sky area to generate a target image.
9. An image-based sky region segmentation apparatus, comprising:
the system comprises an original image acquisition module, a background image acquisition module and a background image acquisition module, wherein the original image acquisition module is used for acquiring an original image containing sky content characteristics;
a probability image obtaining module, configured to input the original image into a sky segmentation network, perform semantic segmentation processing on the original image, extract first image features of the original image with different receptive fields by using a first convolution unit of an encoder module of the sky segmentation network, input the first image features into a second convolution unit of a decoder module of the sky segmentation network for processing, restore the first image features to obtain a first feature image that is consistent with a resolution of the input original image, and obtain a probability image corresponding to the original image according to the first feature image; the sky segmentation network is a lightweight deep convolution neural network and is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image;
and the sky region segmentation module is used for segmenting the sky region of the original image according to the probability image.
10. An image special effect processing apparatus, comprising:
the sky material acquisition module is used for acquiring a target sky material selected by a user;
a sky region determining module, configured to determine a sky region of an original image to be segmented, where the sky region is determined by inputting the original image into a sky segmentation network, performing semantic segmentation processing on the original image, extracting first image features of the original image, which have different receptive fields, by using a first convolution unit of an encoder module of the sky segmentation network, inputting the first image features into a second convolution unit of a decoder module of the sky segmentation network, processing the first image features to obtain a first feature image that is consistent with a resolution of the input original image, and obtaining a probability image corresponding to the original image according to the first feature image; the sky segmentation network is a lightweight deep convolution neural network and is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image;
and the target image generation module is used for fusing the target sky material into the sky area to generate a target image.
11. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more applications configured to: perform the image-based sky region segmentation method of any one of claims 1-7 or the image special effect processing method of claim 8.
12. A computer-readable storage medium on which a computer program is stored, which, when being executed by a processor, implements the image-based sky region segmentation method of any one of claims 1 to 7 or the image special effect processing method of claim 8.
CN202011104753.9A 2020-10-15 2020-10-15 Sky region segmentation and special effect processing method, device and equipment based on image Pending CN112200817A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011104753.9A CN112200817A (en) 2020-10-15 2020-10-15 Sky region segmentation and special effect processing method, device and equipment based on image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011104753.9A CN112200817A (en) 2020-10-15 2020-10-15 Sky region segmentation and special effect processing method, device and equipment based on image

Publications (1)

Publication Number Publication Date
CN112200817A true CN112200817A (en) 2021-01-08

Family

ID=74009749

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011104753.9A Pending CN112200817A (en) 2020-10-15 2020-10-15 Sky region segmentation and special effect processing method, device and equipment based on image

Country Status (1)

Country Link
CN (1) CN112200817A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023016150A1 (en) * 2021-08-09 2023-02-16 北京字跳网络技术有限公司 Image processing method and apparatus, device, and storage medium
CN115660944A (en) * 2022-10-27 2023-01-31 深圳市大头兄弟科技有限公司 Dynamic method, device and equipment for static picture and storage medium
CN115660944B (en) * 2022-10-27 2023-06-30 深圳市闪剪智能科技有限公司 Method, device, equipment and storage medium for dynamic state of static picture

Similar Documents

Publication Publication Date Title
CN109493350B (en) Portrait segmentation method and device
CN111598776B (en) Image processing method, image processing device, storage medium and electronic apparatus
US20220222786A1 (en) Image processing method, smart device, and computer readable storage medium
CN112330574B (en) Portrait restoration method and device, electronic equipment and computer storage medium
CN110929569B (en) Face recognition method, device, equipment and storage medium
CN110717851A (en) Image processing method and device, neural network training method and storage medium
WO2021048607A1 (en) Motion deblurring using neural network architectures
CN112348747A (en) Image enhancement method, device and storage medium
CN113487618B (en) Portrait segmentation method, portrait segmentation device, electronic equipment and storage medium
CN111652830A (en) Image processing method and device, computer readable medium and terminal equipment
WO2023284401A1 (en) Image beautification processing method and apparatus, storage medium, and electronic device
CN110751649A (en) Video quality evaluation method and device, electronic equipment and storage medium
CN111833360B (en) Image processing method, device, equipment and computer readable storage medium
US20230262189A1 (en) Generating stylized images on mobile devices
CN112200818A (en) Image-based dressing area segmentation and dressing replacement method, device and equipment
CN112200817A (en) Sky region segmentation and special effect processing method, device and equipment based on image
CN112085768A (en) Optical flow information prediction method, optical flow information prediction device, electronic device, and storage medium
CN113034413A (en) Low-illumination image enhancement method based on multi-scale fusion residual error codec
CN115482529A (en) Method, equipment, storage medium and device for recognizing fruit image in near scene
US11887277B2 (en) Removing compression artifacts from digital images and videos utilizing generative machine-learning models
CN114693929A (en) Semantic segmentation method for RGB-D bimodal feature fusion
US20230298148A1 (en) Harmonizing composite images utilizing a transformer neural network
CN116882511A (en) Machine learning method and apparatus
CN112200816A (en) Method, device and equipment for segmenting region of video image and replacing hair
CN111383289A (en) Image processing method, image processing device, terminal equipment and computer readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination