CN112200817A - Sky region segmentation and special effect processing method, device and equipment based on image - Google Patents
- Publication number
- CN112200817A (application number CN202011104753.9A)
- Authority
- CN
- China
- Prior art keywords
- image
- sky
- original image
- convolution
- segmentation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06T7/11 — Region-based segmentation
- G06T3/04 — Context-preserving transformations, e.g. by using an importance map
- G06T7/143 — Segmentation; edge detection involving probabilistic approaches, e.g. Markov random field [MRF] modelling
- G06T2207/20076 — Probabilistic image processing
- G06T2207/20081 — Training; learning
- G06T2207/20084 — Artificial neural networks [ANN]
Abstract
The embodiments of the present application provide an image-based sky region segmentation and special effect processing method, device, and equipment, relating to the field of image processing. With this technical scheme, the sky region in an image is accurately segmented on the client with stable performance and high processing speed, improving the display effect of the image.
Description
Technical Field
The present application relates to the field of image processing, and in particular to an image-based sky region segmentation and special effect processing method, device, and equipment, as well as a computer-readable storage medium.
Background
With the rapid development of artificial intelligence, and deep learning in particular, semantic segmentation has become an important research topic with wide application scenarios. Sky segmentation means segmenting the sky region in an image and then applying post-processing to achieve weather transformations, such as sunny days, rainy days, and sunsets, or to add special effects such as fireworks, streamer lights, and auroras.
In the related art, semantic segmentation based on deep learning is generally adopted: images are fed into a deep learning network structure or convolutional neural network for training to obtain a sky segmentation model, which then performs feature extraction, segmentation, detection, and recognition. However, current sky segmentation models occupy a large amount of memory, can only run in the cloud, and cannot be deployed on a user terminal for real-time operation. Even where a sky segmentation model does run on the user terminal, the segmentation is inaccurate, so the segmentation effect is poor and the user experience suffers.
Disclosure of Invention
The present application aims to solve at least one of the above technical drawbacks, in particular the problems of heavy computation and low segmentation accuracy.
In a first aspect, an embodiment of the present application provides an image-based sky region segmentation method, including the following steps:
acquiring an original image containing sky content characteristics;
inputting the original image into a sky segmentation network and performing semantic segmentation processing on it: extracting first image features of the original image with different receptive fields using a first convolution unit of an encoder module of the sky segmentation network; inputting the first image features into a second convolution unit of a decoder module of the sky segmentation network for processing and restoring a first feature image whose resolution is consistent with that of the input original image; and obtaining a probability image corresponding to the original image from the first feature image; the sky segmentation network is a lightweight deep convolutional neural network, obtained by training the second feature images output by at least two layers of second convolution units of the decoder module with a loss function corresponding to the resolution of each second feature image;
and segmenting the sky area of the original image according to the probability image.
In one embodiment, the encoder module includes multiple layers of first convolution units;
the step of extracting first image features of the original image having different receptive fields using a first convolution unit of an encoder module of the sky segmentation network comprises:
extracting, layer by layer, first image features of the original image with different receptive fields using the multiple layers of first convolution units of the encoder module of the sky segmentation network; wherein the resolution of the first image features decreases layer by layer.
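To make the layer-by-layer resolution reduction concrete, the following sketch (not from the patent; the five-layer depth and the 384×512 input size are assumptions drawn from other passages of this description) computes the feature-map size after each stride-2 encoder layer:

```python
def encoder_resolutions(height, width, num_layers=5):
    """Return the (height, width) of the feature map after each encoder layer,
    assuming each layer halves the spatial resolution (stride-2 downsampling)."""
    sizes = []
    h, w = height, width
    for _ in range(num_layers):
        h, w = h // 2, w // 2  # each layer halves both dimensions
        sizes.append((h, w))
    return sizes

# For a hypothetical 384x512 input, five layers yield progressively coarser maps.
print(encoder_resolutions(384, 512))
```

Each successive entry corresponds to a feature with a larger receptive field but lower spatial detail, which is why the decoder later restores resolution via skip connections.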
In one embodiment, the decoder module comprises multiple layers of second convolution units; the second convolution units correspond one-to-one to the first convolution units;
the step of inputting the first image feature into a second convolution unit of a decoder module of the sky segmentation network for processing, and restoring to obtain a first feature image consistent with the resolution of the input original image includes:
inputting the first image features output by each first convolution unit of the encoder module into the corresponding second convolution unit of the decoder module through skip connections, and performing network operations layer by layer, combined with upsampling, to extract second image features;
and restoring, from the second image features, a first feature image whose resolution is consistent with that of the input image to be segmented.
In one embodiment, each of the first convolution units includes a plurality of stacked Conv + BN + ReLU network layers;
the operation of each second convolution unit comprises Conv + BN + ReLU + UpSample network layers, where each layer of second convolution units increases the resolution of the output second image features layer by layer through a bilinear interpolation upsampling operation.
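A minimal pure-Python sketch of the bilinear interpolation upsampling that the second convolution units rely on (an illustrative implementation, not the patent's own code; it uses the align-corners convention for simplicity):

```python
def bilinear_upsample(grid, out_h, out_w):
    """Upsample a 2-D list of floats to (out_h, out_w) by bilinear interpolation."""
    in_h, in_w = len(grid), len(grid[0])
    out = []
    for i in range(out_h):
        # Map each output coordinate back to input coordinates (align-corners style).
        y = i * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(y)
        y1 = min(y0 + 1, in_h - 1)
        fy = y - y0
        row = []
        for j in range(out_w):
            x = j * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(x)
            x1 = min(x0 + 1, in_w - 1)
            fx = x - x0
            # Interpolate horizontally on the two bracketing rows, then vertically.
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out
```

In a real deployment a framework primitive such as a bilinear resize op would be used; the point here is only that the operation is a cheap, parameter-free way for the decoder to grow resolution layer by layer.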
In one embodiment, the step of training the second feature images output by the second convolution units of at least two layers of the decoder module by using a loss function corresponding to the resolution of the second feature images comprises:
determining the resolution ratio of a second characteristic image output by a second convolution unit of at least two layers of the decoder module relative to the original image;
determining the type and weight value of the BCE (binary cross-entropy) loss function used to train each convolution layer according to the resolution ratio;
and based on the weight values, performing constraint training on the sky segmentation network using the BCE loss function of the corresponding type.
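The multi-resolution constraint training described above can be sketched as a weighted sum of per-scale BCE losses (a hypothetical illustration; the actual weight values and number of scales are not specified in this passage):

```python
import math

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy for one pixel; eps clamps predictions for stability."""
    p = min(max(pred, eps), 1 - eps)
    return -(target * math.log(p) + (1 - target) * math.log(1 - p))

def multiscale_bce(preds_by_scale, targets_by_scale, weights):
    """Weighted sum of mean BCE losses, one term per decoder output scale."""
    total = 0.0
    for w, preds, targets in zip(weights, preds_by_scale, targets_by_scale):
        losses = [bce(p, t) for p, t in zip(preds, targets)]
        total += w * sum(losses) / len(losses)
    return total
```

In practice each `targets_by_scale` entry would be the ground-truth mask downsampled to that decoder layer's resolution, so lower-resolution outputs are supervised with coarser masks.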
In one embodiment, the step of determining the resolution ratio of the second feature image output by the second convolution units of at least two layers of the decoder module with respect to the original image comprises:
adjusting the feature channels of the second convolution units of at least two layers of the decoder module to output channels through 1 × 1 convolution, and outputting the corresponding second feature images;
and calculating the resolution ratio of the second characteristic image relative to the original image based on the resolution of the second characteristic image.
In one embodiment, the step of obtaining a probability image corresponding to the original image according to the first feature image includes:
converting the first feature image with a Sigmoid function to obtain the probability image corresponding to the feature image; the first feature image is the second feature image output by the last layer of second convolution units of the decoder module.
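The Sigmoid conversion, together with the subsequent thresholding into a binary sky mask, can be sketched as follows (illustrative only; the 0.5 threshold is an assumption, not stated in the text):

```python
import math

def sigmoid(x):
    """Map a raw network output (logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def probability_mask(logits, threshold=0.5):
    """Convert decoder logits into per-pixel sky probabilities, then a binary mask.
    logits is a 2-D list; pixels at or above the threshold are marked sky (1)."""
    probs = [[sigmoid(v) for v in row] for row in logits]
    mask = [[1 if p >= threshold else 0 for p in row] for row in probs]
    return probs, mask
```

The probability image keeps a soft confidence per pixel, which is useful for smoothing the sky/non-sky boundary before the final segmentation.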
In a second aspect, an embodiment of the present application provides an image special effect processing method, including:
acquiring a target sky material selected by a user;
determining the sky region of the original image to be segmented, wherein the sky region is determined by inputting the original image into a sky segmentation network and performing semantic segmentation processing on it: extracting first image features of the original image with different receptive fields using a first convolution unit of an encoder module of the sky segmentation network; inputting the first image features into a second convolution unit of a decoder module of the sky segmentation network for processing and restoring a first feature image whose resolution is consistent with that of the input original image; and obtaining a probability image corresponding to the original image from the first feature image; the sky segmentation network is a lightweight deep convolutional neural network, obtained by training the second feature images output by at least two layers of second convolution units of the decoder module with a loss function corresponding to the resolution of each second feature image;
and fusing the target sky material to the sky area to generate a target image.
In a third aspect, an embodiment of the present application provides an image-based sky region segmentation apparatus, including:
an original image acquisition module, configured to acquire an original image containing sky content features;
a probability image obtaining module, configured to input the original image into a sky segmentation network and perform semantic segmentation processing on it: extract first image features of the original image with different receptive fields using a first convolution unit of an encoder module of the sky segmentation network; input the first image features into a second convolution unit of a decoder module of the sky segmentation network for processing and restore a first feature image whose resolution is consistent with that of the input original image; and obtain a probability image corresponding to the original image from the first feature image; the sky segmentation network is a lightweight deep convolutional neural network, obtained by training the second feature images output by at least two layers of second convolution units of the decoder module with a loss function corresponding to the resolution of each second feature image;
and the sky region segmentation module is used for segmenting the sky region of the original image according to the probability image.
In a fourth aspect, an embodiment of the present application provides an image special effect processing apparatus, including:
the sky material acquisition module is used for acquiring a target sky material selected by a user;
a sky region determining module, configured to determine the sky region of the original image to be segmented, wherein the sky region is determined by inputting the original image into a sky segmentation network and performing semantic segmentation processing on it: extracting first image features of the original image with different receptive fields using a first convolution unit of an encoder module of the sky segmentation network; inputting the first image features into a second convolution unit of a decoder module of the sky segmentation network for processing and restoring a first feature image whose resolution is consistent with that of the input original image; and obtaining a probability image corresponding to the original image from the first feature image; the sky segmentation network is a lightweight deep convolutional neural network, obtained by training the second feature images output by at least two layers of second convolution units of the decoder module with a loss function corresponding to the resolution of each second feature image;
and the target image generation module is used for fusing the target sky material into the sky area to generate a target image.
In a fifth aspect, an embodiment of the present application provides an electronic device, which includes:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors to perform the image-based sky region segmentation method of the first aspect or the image special effect processing method of the second aspect.
In a sixth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the image-based sky region segmentation method of the first aspect or the image special effect processing method of the second aspect.
In the image-based sky region segmentation and special effect processing method, device, equipment, and computer-readable storage medium provided by the embodiments, an original image containing sky content features is acquired and input into a sky segmentation network for semantic segmentation processing: first image features of the original image with different receptive fields are extracted by a first convolution unit of an encoder module of the sky segmentation network; the first image features are input into a second convolution unit of a decoder module of the sky segmentation network and restored into a first feature image consistent with the resolution of the input original image; a probability image corresponding to the original image is obtained from the first feature image; and the sky region of the original image is segmented according to the probability image. The sky segmentation network is a lightweight deep convolutional neural network, obtained by training the second feature images output by at least two layers of second convolution units of the decoder module with loss functions corresponding to the resolution of each second feature image, and is deployed at the client. The sky region in the image is thus accurately segmented at the client with stable performance and high processing speed, improving the display effect of the image and meeting users' requirements.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
fig. 1 is a schematic diagram of an application system framework involved in an image-based sky region segmentation process according to an embodiment of the present application;
fig. 2 is a flowchart of a sky region segmentation method based on an image according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of a sky segmentation network according to an embodiment of the present application;
fig. 4 is a flowchart of an image special effect processing method provided in an embodiment of the present application;
FIG. 5 is a schematic diagram of an implementation of image special effects processing according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an image-based sky region segmentation apparatus according to an embodiment;
fig. 7 is a schematic structural diagram of an image special effect processing apparatus according to an embodiment.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
It will be understood by those within the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The following describes an application scenario related to an embodiment of the present application.
The embodiments of the present application apply to scenarios in which an effect transformation is performed on the sky in an image: the sky region in the image is identified and segmented, and the original sky is replaced with a target sky pattern through weather transformation, color transformation, and the like.
For example, a neural network for identifying and segmenting a sky region of an image is deployed at a client, the sky region is determined through the neural network, a target sky material selected by a user is obtained, an original sky in the image is replaced by the target sky material, and then the image is displayed, so that the display effect of the image is improved, and the image is more attractive.
Based on this application scenario, the neural network must run on the client and segment the sky region accurately, so that the display effect of the image can be improved. Of course, the technical solution provided in the embodiments of the present application may also be applied to other scenarios, which are not listed here.
In order to better explain the technical solution of the present application, a certain application environment to which the present solution can be applied is shown below. Fig. 1 is a schematic diagram of an application system framework related to image-based sky region segmentation processing according to an embodiment of the present application, and as shown in fig. 1, the application system 10 includes a client 101 and a server 102, and a communication connection is established between the client 101 and the server 102 through a wired network or a wireless network.
The client 101 may be a portable device such as a smart phone, smart camera, palmtop computer, tablet computer, e-book reader, or notebook computer (though not limited to these), with functions such as photographing and image processing, so as to implement image-based sky region segmentation and special effect processing. Optionally, the client 101 has a touch screen on which a user may perform operations to implement functions such as sky segmentation, image processing, and special effect synthesis. The client acquires the relevant images, processes them, and sends them to the server 102, which forwards them to other clients for display.
The server 102 includes an electronic device, such as a background server serving the client 101, and may be implemented as a stand-alone server or a server cluster composed of multiple servers. In one embodiment, the server may be an image sharing platform. After the user captures an image, it is processed accordingly, e.g., sky transformation, portrait beautification, or background replacement; the processed image is uploaded to the server 102, which then pushes it to other clients so that other users can see the user's finished image.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments.
In related image processing technology, a user sets relevant parameters at the client so that a more powerful server identifies a designated area in an image, such as the sky region; after the sky region is pattern-transformed according to the user's parameters, the processed image is generated and pushed to each viewer.
This process requires the client and server to operate in a networked state, and the neural network model is trained relying on the high performance of the server; due to its heavy computation, the deep-learning-based neural network model of the related art cannot be deployed at the client, so image processing based on the model is impossible offline. Some improved image processing methods can be executed on the client, but their image segmentation accuracy is low, resulting in poor segmentation and impairing the image processing effect.
The application provides a sky region segmentation and special effect processing method, a sky region segmentation and special effect processing device, sky region segmentation and special effect processing equipment and a computer readable storage medium, and aims to solve the technical problems in the prior art.
Fig. 2 is a flowchart of an image-based sky region segmentation method according to an embodiment of the present disclosure, which is applicable to an image-based sky region segmentation apparatus, such as a client. The following description will be given taking a mobile terminal as an example.
And S210, acquiring an original image containing sky content characteristics.
Generally, a user captures an outdoor scene image including a sky region with a mobile phone camera or the like. Sky content features are features representing the sky, such as clouds, color categories, color variations, and color regions.
In this embodiment, the original image containing sky content features may be acquired from a local device such as a mobile phone camera, or from a local or external storage device. Of course, the original image may not contain sky content features; such an image may be deleted in subsequent processing, or the subsequent sky segmentation and special effect processing may simply be skipped.
Optionally, an image format of the original image may be adjusted according to the sky segmentation scene characteristics.
In this embodiment, the sky-segmentation scene characteristics refer to the parameter requirements of the image processing task, such as segmentation accuracy, display resolution, and image size, needed for fine image correction or creating a small video cover.
For example, the image is cropped and its resolution adjusted so that its height and width become (384, 512), i.e., 384 pixels high and 512 pixels wide.
Preprocessing parameters of the image to be segmented are set according to the sky-segmentation scene characteristics, and the image format of the original image is adjusted according to these parameters, e.g., its size, resolution, and precision.
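As one illustration of such preprocessing, the following sketch (a hypothetical helper, not from the patent) computes a center crop matching the 384×512 aspect ratio mentioned above, to be applied before resizing to the network's fixed input size:

```python
def crop_to_aspect(orig_h, orig_w, target_h=384, target_w=512):
    """Return (x0, y0, crop_w, crop_h): a center crop of the original image
    whose aspect ratio matches the target input size, so a subsequent resize
    to (target_h, target_w) does not distort the scene."""
    target_ratio = target_w / target_h
    if orig_w / orig_h > target_ratio:
        # Image is too wide: keep full height, crop the width.
        crop_h = orig_h
        crop_w = round(orig_h * target_ratio)
    else:
        # Image is too tall: keep full width, crop the height.
        crop_w = orig_w
        crop_h = round(orig_w / target_ratio)
    x0 = (orig_w - crop_w) // 2
    y0 = (orig_h - crop_h) // 2
    return x0, y0, crop_w, crop_h
```

For a 1080×1920 landscape photo this yields a centered 1440×1080 crop, which then resizes cleanly to 512×384.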
S220, the original image is input into a sky segmentation network and semantic segmentation processing is performed on it: first image features of the original image with different receptive fields are extracted by a first convolution unit of an encoder module of the sky segmentation network; the first image features are input into a second convolution unit of a decoder module of the sky segmentation network for processing and restored into a first feature image consistent with the resolution of the input original image; and a probability image corresponding to the original image is obtained from the first feature image.
The sky segmentation network is a lightweight deep convolutional neural network, obtained by training the second feature images output by at least two layers of second convolution units of the decoder module with a loss function corresponding to the resolution of each second feature image.
Semantic segmentation of images refers to having a computer segment an image according to its semantics. In the image field, semantics refers to the content of an image: understanding the meaning of a picture means separating the different objects in it at the pixel level and labeling every pixel of the original image. For example, white in an image frame may represent the sky region and black the non-sky region; of course, in other embodiments, the mapping may differ.
In this embodiment, the sky segmentation network is a pre-trained convolutional neural network adapted to image-based sky region segmentation scenarios; it is deployed on the client and can pattern-transform the sky region in an image in real time. In contrast to the high computational cost of related convolutional neural networks, the sky segmentation model of this embodiment adjusts its structure and parameters so that computation is greatly reduced while recognition accuracy is preserved. Generally, the more channels a neural network has, the more data computation it requires. The sky segmentation network compresses the number of channels to reduce computation; for example, the number of base channels is compressed from 64 to 16. Since sky segmentation operates on single images, its computational requirements are somewhat lower, so the number of channels is set to 16 and the parameter count is reduced appropriately, preserving the accuracy and stability of the segmentation while reducing computation. In this embodiment, any loss of segmentation quality caused by channel compression can be compensated by other parameter or structure adjustments, guaranteeing segmentation accuracy.
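The impact of compressing the base channel count from 64 to 16 can be illustrated with a back-of-the-envelope parameter count for a pair of 3×3 convolution layers (a rough sketch, not the patent's actual architecture):

```python
def conv_params(c_in, c_out, k=3, bias=True):
    """Parameter count of a single k x k convolution layer:
    each of c_out filters has c_in * k * k weights plus an optional bias."""
    return c_out * (c_in * k * k + (1 if bias else 0))

# First two layers of a hypothetical RGB-input network,
# at the original 64 base channels versus the compressed 16.
full = conv_params(3, 64) + conv_params(64, 64)
lite = conv_params(3, 16) + conv_params(16, 16)
print(full, lite)
```

Even for just two layers the parameter count drops by roughly an order of magnitude, and the savings compound through the rest of the network, which is what makes client-side deployment plausible.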
In this embodiment, a first convolution unit of an encoder module of the sky segmentation network is used to extract first image features of the original image with different receptive fields, the first image features are input to a second convolution unit of a decoder module of the sky segmentation network for processing, a first feature image consistent with the resolution of the input original image is restored, and a probability image corresponding to the original image is obtained according to the first feature image.
Wherein the encoder module comprises a plurality of layers of first convolution units, and the decoder module comprises a plurality of layers of second convolution units, the second convolution units corresponding to the first convolution units one by one. In this embodiment, the encoder module includes five layers of first convolution units, and the decoder module includes five layers of second convolution units.
In an embodiment, the extracting, by the first convolution unit of the encoder module of the sky segmentation network in step S220, the first image feature of the original image with different receptive fields may include:
s2201, extracting first image features with different receptive fields of the original image layer by utilizing a multilayer first convolution unit of an encoder module of the sky segmentation network; wherein the resolution of the first image feature decreases from layer to layer.
In a convolutional neural network, the Receptive Field is defined as the size of the area on the input image that a pixel on the feature image (feature map) output by each layer is mapped from. Put plainly, a point on the feature image corresponds to an area on the input image. In this embodiment, acquiring first image features with different receptive fields determines both the semantic information and the spatial information of the original image, thereby improving the pixel classification accuracy and the edge segmentation precision for the original image.
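The receptive-field growth described above can be sketched with the standard recurrence (a hypothetical illustration; the kernel sizes and strides below are assumptions, not the embodiment's exact configuration):

```python
# Sketch of receptive-field growth for stacked 3x3 convolutions with an
# interleaved stride-2 downsample (hypothetical layer list).
# Standard recurrence: rf += (kernel - 1) * jump;  jump *= stride.
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, input to output."""
    rf, jump = 1, 1
    for k, s in layers:
        rf += (k - 1) * jump
        jump *= s
    return rf

# Two 3x3 convs (one Conv+BN+ReLU pair stacked twice) see a 5x5 input area:
print(receptive_field([(3, 1), (3, 1)]))  # 5
# A stride-2 downsample followed by two more 3x3 convs widens it to 14x14:
print(receptive_field([(3, 1), (3, 1), (2, 2), (3, 1), (3, 1)]))  # 14
```

This is why deeper encoder layers capture semantic context while shallow layers retain spatial detail.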
Each first convolution unit comprises multiple stacked Conv + BN + ReLU network layers; in the present embodiment, each first convolution unit includes two stacked Conv + BN + ReLU network layers.
In this embodiment, after processing by each layer of first convolution units, the resolution of the output first image feature is halved relative to the resolution of the first image feature input to that layer.
In one embodiment, the decoder module includes a plurality of layers of second convolution units, and the second convolution units are in one-to-one correspondence with the first convolution units. The operation of each second convolution unit comprises Conv + BN + ReLU + UpSample network layers, wherein each layer of second convolution units increases the resolution of the output second image feature through a bilinear interpolation upsampling operation, so that a feature image consistent with the resolution of the input original image, namely the first feature image, is restored layer by layer.
In an embodiment, the step S220 of inputting the first image feature to a second convolution unit of a decoder module of the sky segmentation network for processing, and restoring to obtain a first feature image consistent with the resolution of the input original image includes the following steps:
S2202, inputting the first image feature output by each first convolution unit of the encoder module into the corresponding second convolution unit of the decoder module by means of feature skip connection, performing network operations layer by layer in combination with upsampling, and extracting the second image features.
The feature skip connection (skip connection) means that the first image feature output by a first convolution unit located at an intermediate layer (not the last layer) of the encoder module is input, in a skip manner, to the corresponding second convolution unit of the decoder module, where it is processed by convolution operations together with the second image feature output by the second convolution unit of the preceding layer.
Fig. 3 is a schematic diagram illustrating the operation of a sky segmentation network according to an embodiment of the present invention. As shown in Fig. 3, the first image feature output by the in_conv convolution unit of the Encoder module is input in a skip manner to the Decoder_5 convolution unit of the Decoder module; the first image feature output by the Encoder_1 convolution unit is input in a skip manner to the Decoder_4 convolution unit; the first image feature output by the Encoder_2 convolution unit is input in a skip manner to the Decoder_3 convolution unit; the first image feature output by the Encoder_3 convolution unit is input in a skip manner to the Decoder_2 convolution unit; and the first image feature output by the Encoder_4 convolution unit is input in a skip manner to the Decoder_1 convolution unit. In this way, the first image feature output by each first convolution unit of the encoder module is input, by feature skip connection, to the corresponding second convolution unit of the decoder module.
Meanwhile, the Decoder_1 convolution unit of the Decoder module acquires the first image feature output by the Encoder_5 convolution unit; the Decoder_2 convolution unit acquires the second image feature output by the Decoder_1 convolution unit; the Decoder_3 convolution unit acquires the second image feature output by the Decoder_2 convolution unit; the Decoder_4 convolution unit acquires the second image feature output by the Decoder_3 convolution unit; and the Decoder_5 convolution unit acquires the second image feature output by the Decoder_4 convolution unit. Thus the first image feature output by each first convolution unit of the encoder module is input to the corresponding second convolution unit of the decoder module, network operations are performed layer by layer on these features together with the second image features output by the second convolution units of the decoder module in combination with upsampling, and the second image features are extracted.
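The encoder-decoder data flow of Fig. 3 can be illustrated by a shape-only walk-through. This is a sketch under the assumption of a base channel number of 16, five levels, and a channel schedule of (16, 32, 64, 128, 256); only tensor shapes are tracked, not actual convolutions:

```python
# Shape-only walk-through of a five-level encoder/decoder with skip
# connections; the channel schedule is an assumption based on base width 16.
def unet_shapes(h, w, base=16, levels=5):
    skips, ch = [], base
    for _ in range(levels):                  # encoder path
        skips.append((ch, h, w))             # feature kept for skip connection
        h, w, ch = h // 2, w // 2, ch * 2    # downsample x2, widen channels
    for skip_ch, sh, sw in reversed(skips):  # decoder path
        h, w = h * 2, w * 2                  # bilinear-style upsample x2
        assert (h, w) == (sh, sw)            # skip feature must align spatially
        ch = skip_ch                         # after fusion, back to skip width
    return ch, h, w

print(unet_shapes(384, 512))  # (16, 384, 512): input resolution restored
```

The assertion inside the loop shows why each skip connection requires the upsampled decoder feature to match the resolution of the corresponding encoder feature.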
S2203, restoring according to the second image characteristics to obtain a first characteristic image which is consistent with the resolution of the input image to be segmented.
In this embodiment, the decoder module doubles the resolution of the second image feature layer by layer through multiple layers of second convolution units, so that when the resolution of the second image feature output by the last layer of second convolution units is consistent with the resolution of the input original image, the first feature image is obtained.
Further, a probability image corresponding to the original image is obtained according to the first feature image, wherein the probability image is an image formed by pixels with values of 0 or 1 and is a single-channel image. In the probability image, the value of each pixel is 0 or 1, and the size of the probability image is the same as that of the input image to be segmented.
And S230, receiving a probability image output by the sky segmentation network, and segmenting the sky area of the original image according to the probability image.
The sky segmentation network performs semantic segmentation on the original image and outputs a corresponding probability image, in which the value of each pixel is 1 or 0; the sky segmentation area is then determined according to the pixel values, for example, with pixel value 255 representing white, the white area being the sky area.
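The display mapping implied above can be sketched as follows; the 0.5 threshold and the 0/255 display scaling are assumptions for illustration:

```python
import numpy as np

# Threshold the probability image and scale to 0/255 so that white marks
# the sky region (the 0.5 threshold is an assumption).
prob = np.array([[0.9, 0.2],
                 [0.7, 0.1]])               # probability image
mask = (prob > 0.5).astype(np.uint8) * 255  # 255 = white = sky area
print(mask.tolist())  # [[255, 0], [255, 0]]
```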
Optionally, the sky region of the image may be segmented according to the probability image output by the sky segmentation network. In this embodiment, subsequent processing, such as guided filtering, may be performed on the probability image to optimize the segmentation accuracy of the sky region, or inter-frame mean smoothing may be performed to obtain a more accurate probability image, so as to further improve the segmentation effect of the sky region of the image.
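The inter-frame mean smoothing named above might be sketched as a running blend of successive probability images; the patent only names the technique, so the 0.5/0.5 blending weight is an assumption:

```python
import numpy as np

# Hypothetical inter-frame mean smoothing of successive probability images.
def smooth_masks(prob_frames, alpha=0.5):
    """Blend each probability image with the running average of earlier ones."""
    smoothed, prev = [], None
    for p in prob_frames:
        prev = p if prev is None else alpha * prev + (1.0 - alpha) * p
        smoothed.append(prev)
    return smoothed

frames = [np.zeros((2, 2)), np.ones((2, 2))]  # an abrupt 0 -> 1 flicker
out = smooth_masks(frames)
print(float(out[1][0, 0]))  # 0.5: the flicker is damped to a half step
```

Damping frame-to-frame jumps in this way reduces visible flicker at the sky boundary in video.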
In the image-based sky region segmentation method provided by this embodiment, an original image containing sky content features is acquired; the original image is input into the sky segmentation network and subjected to semantic segmentation processing to obtain a corresponding probability image; the probability image output by the sky segmentation network is received, and the sky region of the original image is segmented according to the probability image. Because the sky segmentation network is a convolutional neural network with a compressed number of channels, the sky region in the image is accurately segmented on the client with stable performance and a fast processing speed, meeting the requirements of users.
In order to make the technical solution clearer and easier to understand, specific implementation processes and modes of a plurality of steps in the technical solution are described in detail below.
It should be noted that, in the related art, sky segmentation implementation schemes are all processed based on image input and generally require a large amount of computation. If a neural network of the related art, such as the Unet network, were directly transplanted to a client, the processing speed of sky segmentation would be slow, and the client could even freeze and become unresponsive; therefore the sky segmentation networks of the related art cannot be deployed to run on a client. In this scheme, sky segmentation of an image is performed based on a lightweight sky segmentation network that can be deployed on a client and accurately segments the sky area in real time, avoiding stutter while guaranteeing segmentation precision.
Based on this, the image-based sky region segmentation method provided by the present application further includes: and training to generate a sky segmentation network. In an embodiment, the sky segmentation network may be obtained by:
s300, compressing the number of basic channels of a Unet network model according to the sky segmentation scene characteristics and increasing the number of layers of the Unet network model, so that the number of the basic channels after modification is adapted to the characteristics of the sky segmentation scene, and training to obtain a lightweight sky segmentation network.
In this embodiment, the sky segmentation scene characteristic refers to parameter characteristics that meet the requirements of the image production platform, such as segmentation accuracy, display resolution, and image size, for example for fine image retouching or the production of short videos or small videos.
In this embodiment, the convolutional neural network adopted by the sky segmentation network is built on the Unet network architecture. The Unet architecture learns deep features through successive downsampling and convolution, restores the deep features to the size of the original image through upsampling, and outputs a probability image corresponding to the original image.
In an embodiment, the number of basic channels of the Unet network architecture is compressed from 64 to 16, while the number of convolutional layers of the convolutional neural network is increased from 4 to 5. This makes up for the segmentation quality lost by compressing the basic channels and enhances the feature extraction capability of the sky segmentation network, so that the resolution of the feature images output by the network with this number of basic channels and convolutional layers is adapted to the characteristics of the sky segmentation scene, yielding a lightweight sky segmentation network that can be deployed on a client.
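The quadratic saving from compressing the basic channel number can be checked with simple arithmetic. This is a rough sketch for a single 3×3 convolution; bias terms and the full per-level channel schedule are ignored:

```python
# Rough weight-count comparison for one 3x3 convolution when the base
# channel number is compressed from 64 to 16 (bias terms ignored).
def conv_params(c_in, c_out, k=3):
    return c_in * c_out * k * k

orig = conv_params(64, 64)   # a base-width-64 layer of the original Unet
lite = conv_params(16, 16)   # the same layer at the compressed width 16
print(orig // lite)  # 16: a 4x channel compression gives a ~16x saving
```

Because both the input and output channel counts shrink, the parameter and computation savings grow quadratically with the compression factor.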
In this embodiment, each convolutional layer of the sky segmentation network includes a plurality of encoder basic convolution modules and decoder basic convolution modules; the encoder basic convolution module comprises Conv + BN + ReLU network layers stacked twice, and the decoder basic convolution module comprises Conv + BN + ReLU + UpSample network layers. Compared with the encoder basic convolution module and the decoder basic convolution module in the related art, the encoder basic convolution module and the decoder basic convolution module provided by the embodiment add BN operation to enable training to be more stable and the segmentation effect to be better.
In this embodiment, the number of the encoder basic convolution modules and the number of the decoder basic convolution modules included in each convolution layer of the sky segmentation network may be different, so as to achieve different training effects.
Compared with the computation amount (FLOPs) of the original Unet network model, which is 46.3G, the computation amount of the sky segmentation network provided by this embodiment, which can perform sky segmentation in real time, is 370M; the processing speed is increased by more than 160 times, the network can be deployed to run in real time on low- and mid-range Android smartphones, and the segmentation accuracy and stability are better.
In an embodiment, the obtaining a probability image corresponding to the original image according to the first feature image in step S220 may include the following steps:
s2204, converting the first characteristic image by using a Sigmoid function to obtain a probability image corresponding to the first characteristic image.
The first characteristic image is a single-channel image with the same size as the original image. In a single-channel image, commonly referred to as a gray-scale image, each pixel point can only have one value representing color, the pixel value of the single-channel image is between 0 and 255, 0 is black, 255 is white, and the intermediate value is gray of different levels. The pixel value range of the first characteristic image is 0-255.
The value range of the Sigmoid function is between 0 and 1, and the Sigmoid function has very good symmetry. In the embodiment, the feature image with the pixel value range of 0-255 is converted by using a Sigmoid function, and the value range of the probability image corresponding to the output feature image is 0-1.
The Sigmoid function is:

f(x) = 1 / (1 + e^(-x))

wherein x is the pixel value of the feature image, and f(x) is the value of the corresponding pixel in the probability image.
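A minimal sketch of the Sigmoid mapping and the symmetry noted above:

```python
import math

# The Sigmoid mapping used to convert feature values to probabilities.
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

print(sigmoid(0))  # 0.5, the midpoint
# Symmetry: f(x) + f(-x) = 1 for all x
print(abs(sigmoid(4) + sigmoid(-4) - 1.0) < 1e-9)  # True
```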
In one embodiment, in order to optimize the segmentation effect of the sky segmentation network and reduce missed detections and false detections, a multi-scale loss constraint method is used to optimize model training during the training process. Specifically, training the second feature images output by at least two layers of second convolution units of the decoder module with loss functions corresponding to the resolutions of the second feature images may include the following steps:
s3101, determining a resolution ratio of the second feature image outputted by the second convolution unit of at least two layers of the decoder module with respect to the original image.
In this embodiment, the second feature image output by an upper-layer second convolution unit of the decoder module serves as the input of the next-layer second convolution unit. The resolution of the second feature image output by each layer of second convolution units is half of the resolution of the second feature image input to that layer (i.e. the second feature image output by the previous second convolution unit). For example, if the resolution of the input second feature image is (384,512), the resolution of the output second feature image is (192,256). In the present embodiment, a 1 × 1 convolution may be used to adjust the feature channels to the output channel, so that the second feature image of each second convolution unit is output. During the prediction phase, the neural network does not need to compute these additional 1 × 1 convolutions, and therefore no extra computation is added.
After the second-layer second convolution unit, the resolution of the output second feature image is 1/4 of the resolution of the original image (i.e. 1/2 of the resolution of the previous layer's second feature image), and so on. The resolutions of the second feature images output by at least two second convolution units of the decoder module are acquired, and the ratio of each second feature image's resolution to the resolution of the original image is determined on that basis.
S3102, determining the type and weight value of the BCE (Binary Cross Entropy) loss function used to train the second convolution units of the at least two layers according to the resolution ratio.
In this embodiment, the type and weight of the BCE loss function that trains the second convolution unit of at least two layers are matched to the resolution of the feature image output by the convolution layer.
For example, on the branch of the second feature image output with the resolution ratio of 1/2, training is performed by the 1/2BCE loss function, on the branch of the second feature image output with the resolution ratio of 1/4, training is performed by the 1/4BCE loss function, and on the branch of the second feature image output with the resolution ratio of 1/8, training is performed by the 1/8BCE loss function. Among them, 1/2BCE loss function, 1/4BCE loss function, and 1/8BCE loss function are different kinds of loss functions.
Furthermore, the weight of the loss function on each output branch can be set, so that the trained result is more accurate and stable.
S3103, based on the weight values, utilizing the BCE loss functions of the corresponding types to carry out constraint training on the sky segmentation network.
In this embodiment, based on the weights of the loss functions on the output branches, the final Loss in the training phase is calculated as:

Loss = α · BCE(ŷ, y) + β · BCE(ŷ_(1/2), y_(1/2)) + θ · BCE(ŷ_(1/4), y_(1/4)) + λ · BCE(ŷ_(1/8), y_(1/8))

wherein ŷ denotes the predicted value of the network, y denotes the true value, the subscripts denote the resolution ratio of the output branch, and the values of α, β, θ and λ are 10, 2, 1 and 1, respectively. Of course, in other application scenarios, α, β, θ and λ may be other values, which is not limited herein.
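The weighted multi-scale loss can be sketched as follows; the BCE form and the per-branch downsampled ground truths are assumptions consistent with the description above, with weights defaulting to (10, 2, 1, 1) as in the text:

```python
import numpy as np

# Binary cross-entropy over a probability image (clipped for stability).
def bce(pred, target, eps=1e-7):
    pred = np.clip(pred, eps, 1.0 - eps)
    return float(-np.mean(target * np.log(pred)
                          + (1.0 - target) * np.log(1.0 - pred)))

# Weighted sum over the output branches (full, 1/2, 1/4, 1/8 resolution).
def multi_scale_loss(preds, targets, weights=(10, 2, 1, 1)):
    """preds/targets: per-branch probability images, full to 1/8 scale."""
    return sum(w * bce(p, t) for w, p, t in zip(weights, preds, targets))

y = np.array([[1.0, 0.0]])
perfect = [y.copy() for _ in range(4)]
print(round(multi_scale_loss(perfect, perfect), 4))  # 0.0 when prediction is perfect
```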
Fig. 4 is a flowchart of an image special effect processing method provided in an embodiment of the present application, where the image special effect processing method is applicable to an image special effect processing device, such as a client.
Specifically, as shown in fig. 4, the image special effect processing method may include the following steps:
and S410, acquiring a target sky material selected by a user.
In this embodiment, sky materials are downloaded to the client in advance. The user clicks a relevant button on the display interface of the client to pop up a sky material panel, through which different sky materials are displayed; the sky materials may be pre-configured by the system or customized by the user.
The user selects one or more of the sky materials on the panel as the target sky material for subsequent processing.
And S420, determining a sky area to be segmented of the original image to be segmented.
The sky region is determined by inputting the original image into a sky segmentation network, performing semantic segmentation processing on the original image, extracting first image features of the original image, which have different receptive fields, by using a first convolution unit of an encoder module of the sky segmentation network, inputting the first image features into a second convolution unit of a decoder module of the sky segmentation network for processing, restoring to obtain a first feature image which is consistent with the resolution of the input original image, and obtaining a probability image corresponding to the original image according to the first feature image; the sky segmentation network is a lightweight deep convolution neural network and is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image.
An original image containing sky content features is acquired. Generally, a user takes an external scene image including a sky area by using a mobile phone camera or the like. Sky content features refer to features representing the sky, such as clouds, color categories, color variations, and color regions.
In this embodiment, the original image including the sky content feature may be acquired from a local device such as a mobile phone camera, or may be acquired from a local storage device or an external storage device.
And adjusting the image format of the original image according to the sky segmentation scene characteristics. In this embodiment, the sky-segmentation scene characteristic refers to a parameter characteristic that meets requirements of image processing, such as segmentation accuracy, display resolution, image size, and the like, required for fine image correction or small video cover creation.
For example, the image is subjected to processing such as cropping and resolution adjustment, and the height and width of the image are adjusted to (384,512), i.e. a height of 384 pixels and a width of 512 pixels.
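The resizing step might be sketched as follows; nearest-neighbour indexing is used only for brevity, and a real client would typically use bilinear resampling:

```python
import numpy as np

# Nearest-neighbour resize to the (384, 512) working size; a sketch of the
# reshaping step only.
def resize_nearest(img, out_h, out_w):
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h   # source row for each output row
    cols = np.arange(out_w) * w // out_w   # source column for each output column
    return img[rows][:, cols]

frame = np.zeros((720, 1280, 3), dtype=np.uint8)  # e.g. a 720p camera frame
print(resize_nearest(frame, 384, 512).shape)  # (384, 512, 3)
```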
And setting a preprocessing parameter of the image to be segmented according to the sky segmentation scene characteristic, and adjusting an image format of an original image in the image to be segmented according to the preprocessing parameter, such as adjusting the size, the resolution, the precision and the like.
Inputting an original image subjected to image format adjustment into a sky segmentation network, and performing semantic segmentation processing on the original image to obtain a probability image corresponding to the original image; wherein the sky segmentation network is a convolutional neural network that compresses a number of channels.
Semantic segmentation of images (semantic segmentation) refers to having a computer segment according to the semantics of an image. In the image field, semantic meaning refers to the content of an image, and understanding the meaning of a picture, meaning that different objects in the picture are divided from the perspective of pixels, and labeling is performed on each pixel in an original image, for example, white in an image frame represents a sky area, and black represents a non-sky area. Of course, in other embodiments, white of the image frame may represent a sky area, black may represent a non-sky area, etc.
In this embodiment, the sky segmentation network is a convolutional neural network obtained through pre-training, can be adapted to an image-based sky region segmentation scene, is deployed on a client, and can perform pattern transformation on a sky region in an image in real time. For the high operand of the relevant convolution neural network, the sky segmentation model of this embodiment carries out structure adjustment and parameter adjustment to on the basis of guaranteeing the identification accuracy, the operand significantly reduces. Generally, the larger the number of channels of the neural network, the larger the amount of data computation. The sky segmentation network compresses the number of channels, so that the operation amount is reduced, for example, the number of basic channels is compressed from 64 to 16. Sky segmentation is based on image processing, and the operand requirement is a little lower, so the number of channels is set to 16, the parameter number is properly reduced, and the accuracy and the stability of the segmentation effect are ensured when the operand is reduced. In this embodiment, the segmentation effect caused by the compression of the number of channels can be compensated by other parameter adjustment or structure adjustment methods, so as to ensure the segmentation accuracy.
In this embodiment, a first convolution unit of an encoder module of the sky segmentation network is used to extract first image features of the original image with different receptive fields, the first image features are input to a second convolution unit of a decoder module of the sky segmentation network for processing, a first feature image consistent with the resolution of the input original image is restored, and a probability image corresponding to the original image is obtained according to the first feature image.
Wherein the encoder module comprises a plurality of layers of first convolution elements; the decoder module comprises a plurality of layers of second convolution units, and the second convolution units correspond to the first convolution units one by one. In this embodiment, the decoder module includes five layers of second convolution units, and the decoder module includes five layers of second convolution units.
In an embodiment, a plurality of layers of first convolution units of an encoder module of the sky segmentation network are utilized to extract first image features with different receptive fields of the original image layer by layer; wherein the resolution of the first image feature decreases from layer to layer.
In the convolutional neural network, the definition of a Receptive Field (Receptive Field) is the area size of a pixel point on a feature image (feature map) output by each layer of the convolutional neural network, which is mapped on an input image. The explanation for the restyle point is that a point on the feature image corresponds to an area on the input image. In the embodiment, the semantic information and the spatial information of the original image are determined by acquiring the first image features with different receptive fields, so that the accuracy of pixel classification and the edge segmentation precision of the original image are improved.
Each first convolution unit comprises a Conv + BN + ReLU network layer which is stacked for multiple times; in the present embodiment, the second convolution unit includes two stacked Conv + BN + ReLU network layers.
In this embodiment, each layer of the first convolution unit is processed, the resolution of the output first image feature is halved with respect to the resolution of the input first image feature of the layer.
In one embodiment, the decoder module includes a plurality of layers of second convolution units; the second convolution units are in one-to-one correspondence with the first convolution units. The operation of each second convolution unit comprises Conv + BN + ReLU + UpSample network layers, wherein each layer of second convolution unit increases the resolution of the output second image feature layer by layer through a line bilinear interpolation upsampling operation, so that a feature image consistent with the resolution of the input original image, namely the first feature image, is restored layer by layer.
In an embodiment, the first image features output by each first convolution unit of the encoder module are input into a second convolution unit corresponding to a corresponding decoder module by adopting a feature jumper connection mode, and network operation is performed layer by combining with an up-sampling mode to extract and obtain second image features.
The characteristic skip connection (skip connection) refers to that a first image characteristic output by a first convolution unit positioned in a middle layer (not a last layer) of an encoder module is input into a corresponding second convolution unit of a corresponding decoder module in a skip mode, and is subjected to convolution operation processing with a second image characteristic output by a second convolution unit positioned in a previous layer corresponding to the layer.
Fig. 3 is a schematic diagram illustrating an operation of a sky segmentation network according to an embodiment of the present invention, as shown in fig. 3, a first image feature output by an in _ conv convolution unit of an Encoder module is input to a Decoder _5 convolution unit corresponding to a Decoder module in a skip manner, a first image feature output by an Encoder _1 convolution unit of the Encoder module is input to a Decoder _4 convolution unit corresponding to the Decoder module in a skip manner, a first image feature output by an Encoder _2 convolution unit of the Encoder module is input to a Decoder _3 convolution unit corresponding to the Decoder module in a skip manner, a first image feature output by an Encoder _3 convolution unit of the Encoder module is input to a Decoder _2 convolution unit corresponding to the Decoder module in a skip manner, and a first image feature output by an Encoder _4 convolution unit of the Encoder module is input to a Decoder _1 convolution unit corresponding to the Decoder module in a skip manner, so as to implement a feature skip manner, and input a first image feature output by each first convolution unit of the Encoder module to a corresponding Decoder module And the module corresponds to a second convolution unit. 
Meanwhile, a Decoder _1 convolution unit corresponding to the Decoder module acquires a first image characteristic output by an Encoder _5 convolution unit, a Decoder _2 corresponding to the Decoder module acquires a second image characteristic output by the Decode _1 convolution unit, a Decoder _3 convolution unit corresponding to the Decoder module acquires a second image characteristic output by the Decode _2 convolution unit, a Decode _4 convolution unit corresponding to the Decoder module acquires a second image characteristic output by the Decode _3 convolution unit, and a Decode _5 corresponding to the Decoder module acquires a second image characteristic output by the Decode _4 convolution layer, thereby realizing that the first image characteristics output by each first convolution unit of the encoder module are input into the second convolution unit corresponding to the corresponding decoder module, and performing network operation layer by combining an upsampling mode with the second image features output by each second convolution unit of the decoder module, and extracting to obtain second image features.
In an embodiment, a first feature image consistent with the resolution of the input image to be segmented is obtained through restoration according to the second image feature.
In this embodiment, the decoder module doubles the resolution of the second image feature layer by layer through multiple layers of second convolution units, so that when the resolution of the second image feature output by the last layer of second convolution units is consistent with the resolution of the input original image, the first feature image is obtained.
Further, a probability image corresponding to the original image is obtained according to the first feature image, wherein the probability image is an image formed by pixels with values of 0 or 1 and is a single-channel image. In the probability image, the value of each pixel is 0 or 1, and the size of the probability image is the same as that of the input image to be segmented.
And the client receives the probability image output by the sky segmentation network, and segments the sky area of the original image according to the probability image.
The sky segmentation network performs semantic segmentation on the original image and outputs a corresponding probability image in which the value of each pixel is 1 or 0, and the sky segmentation area is determined according to the pixel values; for example, when the probability image is rendered as a grayscale image, pixels with value 1 correspond to white (a grayscale value of 255), and the white area is the sky area.
Optionally, the sky region of the image may be segmented according to the probability image output by the sky segmentation network. In this embodiment, subsequent processing, such as guided filtering, may be performed on the probability image to optimize the segmentation accuracy of the sky region, or inter-frame mean smoothing may be performed to obtain a more accurate probability image, so as to further improve the segmentation effect of the sky region of the image.
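As a sketch of the binarization step described above, assuming the network's per-pixel sky probabilities are available as a 2-D grid (the function name and the 0.5 threshold below are illustrative assumptions), the probability image can be rendered as the white-on-black mask mentioned earlier:

```python
def probability_to_mask(prob_image, threshold=0.5):
    """Binarize a single-channel probability image into a sky mask.

    Pixels whose sky probability meets the threshold become 255
    (white, i.e. sky); all other pixels become 0 (non-sky).
    """
    return [[255 if p >= threshold else 0 for p in row]
            for row in prob_image]
```

Post-processing such as guided filtering or inter-frame smoothing, as noted above, would be applied to the probabilities before this thresholding.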
And S430, fusing the target sky material to the sky area to generate a target image.
The target sky material is overlaid on the sky area of the original image; alternatively, the original sky content of the original image is deleted and replaced with the target sky material selected by the user, and the target sky material is fused into the original image to obtain the target image, so that the target image presents a sky special effect, for example, a cloudy sky is changed into a clear and bright sky.
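A minimal sketch of this mask-driven fusion, assuming single-channel images represented as 2-D lists of pixel values (the names below are illustrative, not part of the embodiment):

```python
def fuse_sky_material(original, material, mask):
    """Replace sky pixels of the original image with the target sky material.

    original, material: same-sized 2-D grids of pixel values.
    mask: 2-D grid where 1 marks a sky pixel and 0 a non-sky pixel.
    """
    h, w = len(original), len(original[0])
    return [[material[y][x] if mask[y][x] else original[y][x]
             for x in range(w)] for y in range(h)]
```

In practice a soft (feathered) mask and per-channel blending would give smoother boundaries; this hard replacement only illustrates the fusion step.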
According to the image special effect processing method provided by this embodiment, a target sky material selected by a user is obtained; semantic segmentation processing is performed on the original image by a pre-trained lightweight sky segmentation network to obtain a probability image corresponding to the original image; the probability image output by the sky segmentation network is received, the sky area of the original image is determined and segmented according to the probability image, special effect processing is performed on the sky area by using the target sky material, and a target image fused with the target sky material is output. Accurate segmentation of the sky area of the original image and real-time replacement of the sky material are thereby realized on the client, meeting the requirements of users.
In order to more clearly illustrate the present solution, an implementation process of the present solution is exemplarily described below with reference to fig. 5. Fig. 5 is a schematic diagram of an implementation of image special effect processing according to an embodiment of the present application.
As shown in fig. 5, an input original image is acquired and input into the pre-trained sky segmentation network for semantic segmentation, and encoding and decoding are performed by the convolution units of the encoder module and decoder module of the sky segmentation network. In the encoding process, layer-by-layer convolution encoding is performed through the in_conv, Encoder_1, Encoder_2, Encoder_3, Encoder_4, and Encoder_5 convolution layers; decoding is then performed through Decoder_1, Decoder_2, Decoder_3, Decoder_4, and Decoder_5 to obtain the probability image.
As shown in fig. 5, for each Decoder_i (i is a positive integer, such as 1, 2, 3, 4, and 5) output branch, the output result is trained with a corresponding BCE loss function. For example, if the resolution ratio of the second feature image output by the Decoder_2 branch to the original image is 1/8, a 1/8 BCE loss function is used for training; if the resolution ratio of the second feature image output by the Decoder_3 branch to the original image is 1/4, a 1/4 BCE loss function is used; if the resolution ratio of the second feature image output by the Decoder_4 branch to the original image is 1/2, a 1/2 BCE loss function is used; and if the resolution ratio of the second feature image output by the Decoder_5 branch to the original image is 1/1, a 1/1 BCE loss function is used.
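The multi-resolution BCE supervision can be sketched as follows; the per-branch weighting scheme and the function names are assumptions made for illustration, since the embodiment only specifies that each decoder branch is trained with a BCE loss matched to its resolution:

```python
import math

def bce_loss(pred, target, eps=1e-7):
    """Mean binary cross-entropy over flat lists of probabilities and labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1.0 - t) * math.log(1.0 - p))
    return total / len(pred)

def multi_scale_bce(outputs, targets, weights):
    """Weighted sum of BCE losses over the decoder output branches.

    outputs / targets: per-branch prediction and label lists, one pair per
    resolution (e.g. 1/8, 1/4, 1/2, 1/1 of the original image).
    weights: per-branch loss weights, e.g. larger for finer resolutions.
    """
    return sum(w * bce_loss(p, t)
               for w, (p, t) in zip(weights, zip(outputs, targets)))
```

During training, each branch's labels would be the ground-truth sky mask downsampled to that branch's resolution.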
In this embodiment, different BCE loss function constraints are applied to the output results of the different convolutional output branches, which is beneficial to model training and to learning more detailed edge information.
The above examples are merely used to assist in explaining the technical solutions of the present disclosure, and the drawings and specific flows related thereto do not constitute a limitation on the usage scenarios of the technical solutions of the present disclosure.
The following describes in detail related embodiments of an image-based sky region segmentation apparatus and an image special effect processing apparatus.
Fig. 6 is a schematic structural diagram of an image-based sky region segmentation apparatus according to an embodiment, which is executable by an image-based sky region segmentation device, such as a client.
Specifically, the image-based sky region segmentation apparatus 200 includes: an original image obtaining module 210, a probability image obtaining module 220, and a sky region segmentation module 230.
The original image obtaining module 210 is configured to obtain an original image containing sky content features; a probability image obtaining module 220, configured to input the original image with the adjusted image format into a sky segmentation network, perform semantic segmentation processing on the original image, extract first image features of the original image with different receptive fields by using a first convolution unit of an encoder module of the sky segmentation network, input the first image features into a second convolution unit of a decoder module of the sky segmentation network for processing, restore the first image features to obtain a first feature image that is consistent with a resolution of the input original image, and obtain a probability image corresponding to the original image according to the first feature image; the sky segmentation network is a lightweight deep convolution neural network and is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image; a sky region segmentation module 230 configured to segment a sky region of the original image according to the probability image.
The image-based sky region segmentation apparatus provided by this embodiment is deployed at the client, can realize accurate segmentation of the sky region of an image on the client, has stable performance and a high processing speed, and meets the requirements of users.
In an embodiment, the encoder module comprises a plurality of layers of first convolution elements;
the probability image obtaining module 220 includes: a first image feature extraction unit, configured to extract, layer by layer, first image features of the original image with different receptive fields by using a multi-layer first convolution unit of an encoder module of the sky segmentation network; wherein the resolution of the first image feature decreases from layer to layer.
In one embodiment, the decoder module includes a plurality of layers of second convolution units; the second convolution units correspond to the first convolution units one by one;
the probability image obtaining module 220 includes: a second image feature extraction unit and a first feature image obtaining unit;
the second image feature extraction unit is used for inputting the first image features output by each first convolution unit of the encoder module into a second convolution unit corresponding to the corresponding decoder module in a feature jumping mode, and performing network operation layer by combining with an up-sampling mode to extract and obtain second image features; and the first characteristic image obtaining unit is used for obtaining a first characteristic image which is consistent with the resolution of the input image to be segmented according to the second image characteristic restoration.
In an embodiment, each of the first convolution units includes a plurality of stacked Conv + BN + ReLU network layers; the operation of each second convolution unit comprises Conv + BN + ReLU + UpSample network layers, wherein each layer of second convolution unit increases the resolution of the output second image features layer by layer through a bilinear interpolation upsampling operation.
In an embodiment, the image-based sky region segmentation apparatus 200 further includes: and the segmentation network training module is used for training the second characteristic image output by the second convolution unit of at least two layers of the decoder module by using a loss function corresponding to the resolution of the second characteristic image.
In one embodiment, the split network training module comprises: the system comprises a resolution ratio unit, a weight value determining unit and a constraint training unit; the resolution ratio unit is used for determining the resolution ratio of the second characteristic image output by the second convolution unit of at least two layers of the decoder module relative to the original image; a weight value determining unit, configured to determine, according to the resolution ratio, a type and a weight value of a BCE loss function used for training the second convolution units of the at least two layers; and the constraint training unit is used for carrying out constraint training on the sky segmentation network by utilizing the BCE loss function of the corresponding category based on the weight value.
In one embodiment, the resolution ratio unit includes: an output channel adjusting subunit and a resolution ratio determining unit;
the output channel adjusting subunit is used for adjusting the characteristic channels of the second convolution units of at least two layers of the decoder module into output channels through 1 × 1 convolution and outputting corresponding second characteristic images; and the resolution ratio determining unit is used for calculating the resolution ratio of the second characteristic image relative to the original image based on the resolution of the second characteristic image.
In one embodiment, the probability image obtaining module 220 includes: and the probability image obtaining unit is used for converting the first characteristic image by using a Sigmoid function to obtain a probability image corresponding to the characteristic image.
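The Sigmoid conversion can be sketched as follows; the helper names and the grid representation are assumptions made for illustration:

```python
import math

def sigmoid(x):
    # maps any real-valued feature response into the open interval (0, 1)
    return 1.0 / (1.0 + math.exp(-x))

def to_probability_image(feature_image):
    """Map each raw feature value of the first feature image to a
    per-pixel sky probability."""
    return [[sigmoid(v) for v in row] for row in feature_image]
```

The resulting probabilities can then be binarized (e.g. at 0.5) to obtain the 0/1 probability image used for segmentation.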
The image-based sky region segmentation apparatus according to the embodiment of the present disclosure may perform the image-based sky region segmentation method provided by the embodiment of the present disclosure; the implementation principles are similar and the beneficial effects are the same, and details are not repeated herein.
Fig. 7 is a schematic structural diagram of an image special effect processing apparatus according to an embodiment, where the image special effect processing apparatus is executable on an image special effect processing device, such as a client.
Specifically, as shown in fig. 7, the image special effect processing apparatus 400 includes: a sky material acquisition module 410, a sky region determination module 420, and a target image generation module 430.
The sky material acquiring module 410 is configured to acquire a target sky material selected by a user;
a sky region determining module 420, configured to determine a sky region of an original image to be segmented, where the sky region is determined by inputting the original image into a sky segmentation network, performing semantic segmentation processing on the original image, extracting first image features of the original image, which have different receptive fields, by using a first convolution unit of an encoder module of the sky segmentation network, inputting the first image features into a second convolution unit of a decoder module of the sky segmentation network, processing the first image features, restoring to obtain a first feature image that is consistent with a resolution of the input original image, and obtaining a probability image corresponding to the original image according to the first feature image; the sky segmentation network is a lightweight deep convolution neural network and is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image;
and a target image generation module 430, configured to fuse the target sky material to the sky area, and generate a target image.
The image special effect processing device provided by the embodiment is deployed at a client, can realize the accurate segmentation of the sky area of an image and the synthesis processing of the sky special effect on the client, has stable performance and high processing speed, and meets the requirements of users.
The image special effect processing apparatus according to the embodiment of the present disclosure may execute the image special effect processing method provided by the embodiment of the present disclosure; the implementation principles are similar and the beneficial effects are the same, and details are not repeated here.
An embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the image-based sky region segmentation method or the image special effect processing method in any of the above embodiments when executing the program.
The electronic device provided by the above embodiment, when executing the image-based sky region segmentation method or the image special effect processing method provided by any of the above embodiments, has the corresponding functions and beneficial effects.
An embodiment of the present invention provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform a method for image-based sky region segmentation, including:
acquiring an original image containing sky content characteristics;
inputting an original image into a sky segmentation network, performing semantic segmentation processing on the original image, extracting first image features of the original image with different receptive fields by using a first convolution unit of an encoder module of the sky segmentation network, inputting the first image features into a second convolution unit of a decoder module of the sky segmentation network for processing, restoring to obtain a first feature image consistent with the resolution of the input original image, and obtaining a probability image corresponding to the original image according to the first feature image; the sky segmentation network is a lightweight deep convolution neural network and is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image;
and receiving a probability image output by the sky segmentation network, and segmenting the sky area of the original image according to the probability image.
The computer executable instructions, when executed by a computer processor, are further for performing an image special effects processing method comprising:
acquiring a target sky material selected by a user;
determining a sky area to be segmented of the original image, wherein the sky area is determined by inputting the original image into a sky segmentation network, performing semantic segmentation processing on the original image, extracting first image features of the original image, which have different receptive fields, by using a first convolution unit of an encoder module of the sky segmentation network, inputting the first image features into a second convolution unit of a decoder module of the sky segmentation network for processing, restoring to obtain a first feature image which is consistent with the resolution of the input original image, and obtaining a probability image corresponding to the original image according to the first feature image; the sky segmentation network is a lightweight deep convolution neural network and is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image;
and fusing the target sky material to the sky area to generate a target image.
Of course, the storage medium containing computer-executable instructions provided by the embodiments of the present invention is not limited to the operations of the image-based sky region segmentation method or the image special effect processing method described above, and likewise has the corresponding functions and advantages.
From the above description of the embodiments, it will be apparent to those skilled in the art that the present invention can be implemented by software plus necessary general-purpose hardware, and certainly can also be implemented by hardware, although the former is a preferable implementation in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which may be stored in a computer-readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a FLASH memory (FLASH), a hard disk, or an optical disk of a computer, and includes instructions for enabling a computer device (which may be a personal computer, a server, or a network device) to execute the image-based sky region segmentation method or the image special effect processing method according to any embodiment of the present invention.
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be executed at different times, and are not necessarily executed sequentially but may be executed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
The foregoing is only a partial embodiment of the present application, and it should be noted that, for those skilled in the art, several modifications and decorations can be made without departing from the principle of the present application, and these modifications and decorations should also be regarded as the protection scope of the present application.
Claims (12)
1. A sky region segmentation method based on images is characterized by comprising the following steps:
acquiring an original image containing sky content characteristics;
inputting the original image into a sky segmentation network, performing semantic segmentation processing on the original image, extracting first image features with different receptive fields of the original image by using a first convolution unit of an encoder module of the sky segmentation network, inputting the first image features into a second convolution unit of a decoder module of the sky segmentation network for processing, restoring to obtain a first feature image consistent with the resolution of the input original image, and obtaining a probability image corresponding to the original image according to the first feature image; the sky segmentation network is a lightweight deep convolution neural network and is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image;
and segmenting the sky area of the original image according to the probability image.
2. The image-based sky region segmentation method of claim 1, wherein the encoder module includes a plurality of layers of a first convolution unit;
the step of extracting first image features of the original image having different receptive fields using a first convolution unit of an encoder module of the sky segmentation network comprises:
extracting first image features with different receptive fields of the original image layer by utilizing a plurality of layers of first convolution units of an encoder module of the sky segmentation network; wherein the resolution of the first image feature decreases from layer to layer.
3. The image-based sky region segmentation method of claim 2, wherein the decoder module includes a multi-layer second convolution unit; the second convolution units correspond to the first convolution units one by one;
the step of inputting the first image feature into a second convolution unit of a decoder module of the sky segmentation network for processing, and restoring to obtain a first feature image consistent with the resolution of the input original image includes:
inputting the first image features output by each first convolution unit of the encoder module into the second convolution unit corresponding to the decoder module in a feature skip-connection mode, and performing network operations layer by layer in combination with an up-sampling mode to extract and obtain second image features;
and restoring according to the second image characteristic to obtain a first characteristic image which is consistent with the resolution of the input image to be segmented.
4. The image-based sky region segmentation method of claim 3, wherein each of the first convolution units includes a plurality of stacked Conv + BN + ReLU network layers;
the operation of each second convolution unit comprises Conv + BN + ReLU + UpSample network layers, wherein each layer of second convolution unit increases the resolution of the output second image features layer by layer through a line bilinear interpolation upsampling operation.
5. The method of claim 1, wherein the step of training a second feature image output by a second convolution unit of at least two layers of the decoder module with a loss function corresponding to a resolution of the second feature image comprises:
determining the resolution ratio of a second characteristic image output by a second convolution unit of at least two layers of the decoder module relative to the original image;
determining the type and weight value of a BCE loss function used for training the second convolution units of the at least two layers according to the resolution ratio;
and based on the weight value, carrying out constraint training on the sky segmentation network by utilizing the BCE loss function of the corresponding category.
6. The method of claim 5, wherein the step of determining a resolution ratio of a second characteristic image outputted by a second convolution unit of at least two layers of the decoder module with respect to the original image comprises:
adjusting the characteristic channels of the second convolution units of at least two layers of the decoder module into output channels through 1 x 1 convolution, and outputting corresponding second characteristic images;
and calculating the resolution ratio of the second characteristic image relative to the original image based on the resolution of the second characteristic image.
7. The method of claim 1, wherein the step of deriving a probability image corresponding to the original image according to the first feature image comprises:
converting the first characteristic image by using a Sigmoid function to obtain a probability image corresponding to the characteristic image; and the first characteristic image is a second characteristic image output by a last layer of second convolution unit of the decoder module.
8. An image special effect processing method is characterized by comprising the following steps:
acquiring a target sky material selected by a user;
determining a sky region of an original image to be segmented, wherein the sky region is determined by inputting the original image into a sky segmentation network, performing semantic segmentation processing on the original image, extracting first image features of the original image, which have different receptive fields, by using a first convolution unit of an encoder module of the sky segmentation network, inputting the first image features into a second convolution unit of a decoder module of the sky segmentation network for processing, restoring to obtain a first feature image which is consistent with the resolution of the input original image, and obtaining a probability image corresponding to the original image according to the first feature image; the sky segmentation network is a lightweight deep convolution neural network and is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image;
and fusing the target sky material to the sky area to generate a target image.
9. An image-based sky region segmentation apparatus, comprising:
the system comprises an original image acquisition module, a background image acquisition module and a background image acquisition module, wherein the original image acquisition module is used for acquiring an original image containing sky content characteristics;
a probability image obtaining module, configured to input the original image into a sky segmentation network, perform semantic segmentation processing on the original image, extract first image features of the original image with different receptive fields by using a first convolution unit of an encoder module of the sky segmentation network, input the first image features into a second convolution unit of a decoder module of the sky segmentation network for processing, restore the first image features to obtain a first feature image that is consistent with a resolution of the input original image, and obtain a probability image corresponding to the original image according to the first feature image; the sky segmentation network is a lightweight deep convolution neural network and is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image;
and the sky region segmentation module is used for segmenting the sky region of the original image according to the probability image.
10. An image special effect processing apparatus, comprising:
the sky material acquisition module is used for acquiring a target sky material selected by a user;
a sky region determining module, configured to determine a sky region of an original image to be segmented, where the sky region is determined by inputting the original image into a sky segmentation network, performing semantic segmentation processing on the original image, extracting first image features of the original image, which have different receptive fields, by using a first convolution unit of an encoder module of the sky segmentation network, inputting the first image features into a second convolution unit of a decoder module of the sky segmentation network, processing the first image features to obtain a first feature image that is consistent with a resolution of the input original image, and obtaining a probability image corresponding to the original image according to the first feature image; the sky segmentation network is a lightweight deep convolution neural network and is obtained by training a second characteristic image output by at least two layers of second convolution units of the decoder module by using a loss function corresponding to the resolution of the second characteristic image;
and the target image generation module is used for fusing the target sky material into the sky area to generate a target image.
11. An electronic device, comprising:
one or more processors;
a memory;
one or more applications, wherein the one or more applications are stored in the memory and configured to be executed by the one or more processors, the one or more programs configured to: performing the image-based sky region segmentation method of any one of claims 1-7 or the image special effect processing method of claim 8.
12. A computer-readable storage medium on which a computer program is stored, which, when being executed by a processor, implements the image-based sky region segmentation method of any one of claims 1 to 7 or the image special effect processing method of claim 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011104753.9A CN112200817A (en) | 2020-10-15 | 2020-10-15 | Sky region segmentation and special effect processing method, device and equipment based on image |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011104753.9A CN112200817A (en) | 2020-10-15 | 2020-10-15 | Sky region segmentation and special effect processing method, device and equipment based on image |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112200817A true CN112200817A (en) | 2021-01-08 |
Family
ID=74009749
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011104753.9A Pending CN112200817A (en) | 2020-10-15 | 2020-10-15 | Sky region segmentation and special effect processing method, device and equipment based on image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112200817A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114445313A (en) * | 2022-01-28 | 2022-05-06 | 北京百度网讯科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN115660944A (en) * | 2022-10-27 | 2023-01-31 | 深圳市大头兄弟科技有限公司 | Dynamic method, device and equipment for static picture and storage medium |
WO2023016150A1 (en) * | 2021-08-09 | 2023-02-16 | 北京字跳网络技术有限公司 | Image processing method and apparatus, device, and storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109345538A (en) * | 2018-08-30 | 2019-02-15 | 华南理工大学 | A kind of Segmentation Method of Retinal Blood Vessels based on convolutional neural networks |
CN110490884A (en) * | 2019-08-23 | 2019-11-22 | 北京工业大学 | A kind of lightweight network semantic segmentation method based on confrontation |
CN111311629A (en) * | 2020-02-21 | 2020-06-19 | 京东方科技集团股份有限公司 | Image processing method, image processing device and equipment |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109345538A (en) * | 2018-08-30 | 2019-02-15 | 华南理工大学 | Retinal blood vessel segmentation method based on convolutional neural networks |
CN110490884A (en) * | 2019-08-23 | 2019-11-22 | 北京工业大学 | Lightweight network semantic segmentation method based on adversarial learning |
CN111311629A (en) * | 2020-02-21 | 2020-06-19 | 京东方科技集团股份有限公司 | Image processing method, image processing device and equipment |
Non-Patent Citations (1)
Title |
---|
LIANG XIAO ET AL.: "Hybrid Connection Network for Semantic Segmentation", Tenth International Conference on Digital Image Processing, 31 December 2018 (2018-12-31), pages 1-7 * |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023016150A1 (en) * | 2021-08-09 | 2023-02-16 | 北京字跳网络技术有限公司 | Image processing method and apparatus, device, and storage medium |
CN114445313A (en) * | 2022-01-28 | 2022-05-06 | 北京百度网讯科技有限公司 | Image processing method and device, electronic equipment and storage medium |
CN115660944A (en) * | 2022-10-27 | 2023-01-31 | 深圳市大头兄弟科技有限公司 | Method, device, equipment and storage medium for animating a static picture |
CN115660944B (en) * | 2022-10-27 | 2023-06-30 | 深圳市闪剪智能科技有限公司 | Method, device, equipment and storage medium for animating a static picture |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109493350B (en) | Portrait segmentation method and device | |
CN111598776B (en) | Image processing method, image processing device, storage medium and electronic apparatus | |
CN112330574B (en) | Portrait restoration method and device, electronic equipment and computer storage medium | |
US20220222786A1 (en) | Image processing method, smart device, and computer readable storage medium | |
CN112200817A (en) | Sky region segmentation and special effect processing method, device and equipment based on image | |
CN112200818B (en) | Dressing region segmentation and dressing replacement method, device and equipment based on image | |
CN111833360B (en) | Image processing method, device, equipment and computer readable storage medium | |
CN113487618B (en) | Portrait segmentation method, portrait segmentation device, electronic equipment and storage medium | |
WO2023284401A1 (en) | Image beautification processing method and apparatus, storage medium, and electronic device | |
US11677897B2 (en) | Generating stylized images in real time on mobile devices | |
US11887277B2 (en) | Removing compression artifacts from digital images and videos utilizing generative machine-learning models | |
CN114693929A (en) | Semantic segmentation method for RGB-D bimodal feature fusion | |
US20240161240A1 (en) | Harmonizing composite images utilizing a semantic-guided transformer neural network | |
CN113034413A (en) | Low-illumination image enhancement method based on multi-scale fusion residual error codec | |
CN112085768A (en) | Optical flow information prediction method, optical flow information prediction device, electronic device, and storage medium | |
CN116882511A (en) | Machine learning method and apparatus | |
US12051225B2 (en) | Generating alpha mattes for digital images utilizing a transformer-based encoder-decoder | |
CN112200816A (en) | Method, device and equipment for segmenting region of video image and replacing hair | |
CN116824004A (en) | Icon generation method and device, storage medium and electronic equipment | |
US11856203B1 (en) | Neural face video compression using multiple views | |
US20230298148A1 (en) | Harmonizing composite images utilizing a transformer neural network | |
CN111383289A (en) | Image processing method, image processing device, terminal equipment and computer readable storage medium | |
Kang et al. | Lightweight Image Matting via Efficient Non-Local Guidance | |
Liu et al. | LightFuse: Lightweight CNN based Dual-exposure Fusion | |
CN118570054B (en) | Training method, related device and medium for image generation model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||