CN113079391A - Portrait image mixing processing method, equipment and computer readable storage medium - Google Patents
Portrait image mixing processing method, equipment and computer readable storage medium
- Publication number
- CN113079391A
- Authority
- CN
- China
- Prior art keywords
- layer
- decoding
- image
- portrait
- processing method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/439—Processing of audio elementary streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/40—Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
- H04N21/43—Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
- H04N21/44—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
- H04N21/4402—Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs involving reformatting operations of video signals for household redistribution, storage or real-time display
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a portrait image mixing processing method, a device, and a computer-readable storage medium, and belongs to the field of image processing. The method performs semantic parsing and portrait matting jointly using a neural network with one shared encoder and two parallel decoders, trains the model with a multi-task learning mechanism, and uses a different decoding path for each task at test time. The invention aims to solve the problem of repeated encoding and decoding in prior-art image matting. Because the two tasks share features, the parameter updates back-propagated from each task reinforce the other, which helps improve matting performance, reduce model size and memory usage, and speed up inference at test time.
Description
Technical Field
The present invention relates to the field of image processing, and in particular, to a method and apparatus for blending and processing a portrait image, and a computer-readable storage medium.
Background
Portrait matting and semantic parsing are two important tasks in image processing. Traditionally they are handled as two independent tasks, each requiring its own encoder and decoder when a neural network model is built. In practice, portrait matting first produces, from the original image, an image containing foreground, background and transition regions, and semantic parsing is then obtained by passing that result through a further encoder and decoder for semantic decoding. These two consecutive encoding-decoding passes place heavy demands on computing resources during training, and processing efficiency is relatively low.
Disclosure of Invention
To solve the problem of repeated encoding and decoding during portrait matting in the conventional technology, the invention reduces the high resource consumption and low efficiency of the algorithm by designing a network structure comprising one encoder and two parallel decoders connected to it.
In view of the above, the invention provides a portrait image mixing processing method that uses a network comprising two parallel decoders and a shared encoder. The two decoders are a portrait decoder, which decodes portrait information, and a semantic decoder, which decodes semantic information; the encoder encodes the input image and extracts features for both decoders.
Preferably, the processing method specifically includes the following steps:
S1: constructing and preprocessing a portrait image dataset:
after the training face images are acquired, all of them are cropped to a uniform size;
S2: constructing a neural network training model:
the neural network model to be trained comprises six parts, namely an input layer, an encoding layer, a first decoding layer, a second decoding layer, a first output layer and a second output layer; the encoding layer compresses the feature data from the input layer using a fully convolutional architecture and passes the result to the first decoding layer and the second decoding layer through two output heads, and each decoding layer decodes the encoded image through multiple layers of transposed convolution;
S3: optimizing the model:
the losses of the two output layers are computed on the images recovered from the compressed representation by forward propagation, the weight matrix of the encoder is then updated by backpropagation of each loss, and the optimization is repeated.
Specifically, the network training model in S2 is a parallel training model that uses one encoder and two parallel decoders.
Preferably, the alpha matte in the first decoding layer is set as a weighted combination of the background and foreground of the input layer image, namely:
$I_p = \alpha_p F_p + (1 - \alpha_p) B_p$
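As an illustration of the compositing relation above, the following minimal sketch blends a single pixel. The function name `composite` and the RGB values are hypothetical and chosen only for the example; they do not come from the patent.

```python
def composite(alpha, fg, bg):
    """Blend one pixel: I_p = alpha_p * F_p + (1 - alpha_p) * B_p."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))


# Hypothetical RGB values in [0, 255], for illustration only.
foreground = (200.0, 100.0, 0.0)
background = (0.0, 100.0, 200.0)

opaque = composite(1.0, foreground, background)  # alpha = 1: pure foreground
clear = composite(0.0, foreground, background)   # alpha = 0: pure background
half = composite(0.5, foreground, background)    # alpha = 0.5: even blend
```

Matting is the inverse problem: given only the composite image, the portrait decoder has to estimate the alpha value at every pixel.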
Preferably, the loss function L of the neural network is defined in terms of $L_s$ and $L_\alpha$ (the formula is not reproduced in this text),
where $\sigma_1$ and $\sigma_2$ are the respective weights of the two losses;
$L_s$ is the loss function of the semantic decoder (formula likewise not reproduced),
where $C$ is the number of semantic classes, $y_{nc}$ is the ground-truth value for class $c$, and $p_{nc}$ is the corresponding predicted value;
$L_\alpha$ is the loss function of the portrait decoder; after semantic decoding is complete, $L_\alpha$ is used to estimate the target value directly:
$L_\alpha = \lVert \alpha_g - \alpha_p \rVert_1$
where $\alpha_g$ and $\alpha_p$ denote the ground-truth and predicted values of the alpha matte, respectively.
Because multi-task learning is a complete end-to-end process, the loss function L is used to train the semantic decoder and the portrait decoder jointly, so that the weights of the different tasks are adjusted automatically.
An electronic device comprising a processor and a memory, the memory having machine executable instructions capable of being executed by the processor, the processor being capable of executing the machine executable instructions to implement the above method.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the above-mentioned method.
Compared with the prior art, the invention has the following beneficial effects: it provides a simple parallel network structure in which one encoder is connected to two parallel decoders for jointly training portrait matting and semantic parsing. During training the features are shared and the parameter updates back-propagated from the two tasks reinforce each other, which improves matting performance, reduces model size and memory usage, and speeds up inference at test time.
Drawings
FIG. 1 is a schematic diagram of a network structure according to the present invention
FIG. 2 is a diagram of a comparative network architecture
Detailed Description
To make the technical means, inventive features, objectives and effects of the invention easy to understand, the invention is further described below with reference to specific embodiments.
As shown in FIGS. 1-2, FIG. 1 is a schematic diagram of the network structure of the portrait image mixing processing method provided by the invention, which comprises two parallel decoders and a shared encoder. The two decoders are a portrait decoder, which decodes portrait information, and a semantic decoder, which decodes semantic information; the encoder encodes the input image and extracts features for both decoders;
First, after the training set (input images and output images) is acquired, the dataset is constructed and preprocessed, and all training face images are cropped to a uniform size;
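The cropping step can be sketched as follows. The patent does not say how the crop window is chosen, so the centre crop used here is an assumption, and `center_crop` and the toy images are hypothetical names and data for illustration only.

```python
def center_crop(image, size):
    """Return the central size x size window of a 2-D list of pixels.

    Centre cropping is an assumption; the source only states that all
    training images are cropped to a uniform size.
    """
    height, width = len(image), len(image[0])
    if size > height or size > width:
        raise ValueError("crop size exceeds image size")
    top = (height - size) // 2
    left = (width - size) // 2
    return [row[left:left + size] for row in image[top:top + size]]


# Two hypothetical training images of different sizes; each pixel stores
# its own (row, col) position so the crop window is easy to inspect.
img_a = [[(r, c) for c in range(6)] for r in range(4)]  # 4 rows x 6 cols
img_b = [[(r, c) for c in range(5)] for r in range(7)]  # 7 rows x 5 cols

batch = [center_crop(img, 4) for img in (img_a, img_b)]
shapes = [(len(img), len(img[0])) for img in batch]
```

After this step every image in `batch` has the same height and width, which is what allows the images to be stacked into one training batch.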
Second, a neural network training model is constructed, with one encoder and two parallel decoders used for training.
The neural network model to be trained comprises six parts, namely an input layer, an encoding layer, a first decoding layer, a second decoding layer, a first output layer and a second output layer. The encoding layer compresses the feature data from the input layer using a fully convolutional architecture and then outputs it to the first decoding layer and the second decoding layer through two output heads. Portrait matting differs from conventional semantic segmentation, although the two tasks overlap: in addition to classifying each pixel as foreground or background, it must estimate the pixel's opacity, that is, predict the alpha matte of the portrait in the image. The alpha matte of the first decoding layer is therefore set as a weighted combination of the background and foreground of the input layer image, namely:
$I_p = \alpha_p F_p + (1 - \alpha_p) B_p$
where $p$ denotes the pixel position and $\alpha_p \in [0, 1]$ denotes the foreground opacity value.
The two decoding layers each decode the encoded image through multiple layers of transposed convolution;
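To show how a stack of transposed convolutions can restore the resolution that the encoder compressed away, the sketch below tracks only the spatial size through both halves of the network. The size formulas are the standard ones for convolution and transposed convolution with dilation 1; the layer count, kernel sizes, strides and padding are assumptions for illustration, since the patent does not specify them.

```python
def conv_out(n, kernel, stride, padding):
    """Output length of a strided convolution (dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1


def transposed_conv_out(n, kernel, stride, padding, output_padding=0):
    """Output length of a transposed convolution (dilation 1)."""
    return (n - 1) * stride - 2 * padding + kernel + output_padding


# Assumed hyperparameters: three stride-2 layers on each side.
encoder_sizes = [256]
for _ in range(3):
    encoder_sizes.append(
        conv_out(encoder_sizes[-1], kernel=3, stride=2, padding=1))

decoder_sizes = [encoder_sizes[-1]]
for _ in range(3):
    decoder_sizes.append(
        transposed_conv_out(decoder_sizes[-1], kernel=4, stride=2, padding=1))
```

With these assumed settings each encoder layer halves the size and each decoder layer doubles it, so both decoders can return an output at the input resolution from the same encoded features.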
Third, the model is optimized: the losses of the two output layers are computed on the images recovered from the compressed representation by forward propagation, the weight matrix of the encoder is updated by backpropagation of each loss, and the optimization is repeated.
The loss function L is defined in terms of $L_s$ and $L_\alpha$ (the formula is not reproduced in this text),
where $\sigma_1$ and $\sigma_2$ are the respective weights of the two losses;
$L_s$ is the loss function of the semantic decoder (formula likewise not reproduced),
where $C$ is the number of semantic classes, $y_{nc}$ is the ground-truth value for class $c$, and $p_{nc}$ is the corresponding predicted value;
$L_\alpha$ is the loss function of the portrait decoder; after semantic decoding is complete, $L_\alpha$ is used to estimate the target value directly:
$L_\alpha = \lVert \alpha_g - \alpha_p \rVert_1$
where $\alpha_g$ and $\alpha_p$ denote the ground-truth and predicted values of the alpha matte, respectively.
Because multi-task learning is a complete end-to-end process, the loss function L is used to train the semantic decoder and the portrait decoder jointly, so that the weights of the different tasks are adjusted automatically.
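The loss terms can be sketched as below. Only the L1 form of $L_\alpha$ is given in the text. The cross-entropy form of $L_s$ and the plain weighted sum used for the total loss are assumptions made for this example, because the source does not reproduce those formulas; an uncertainty-based weighting would be another possible reading. All function names and numbers are hypothetical.

```python
import math


def alpha_l1_loss(alpha_true, alpha_pred):
    """L_alpha = || alpha_g - alpha_p ||_1, summed over all pixels."""
    return sum(abs(g - p) for g, p in zip(alpha_true, alpha_pred))


def cross_entropy(labels, probs):
    """ASSUMED form of L_s: mean over pixels n of -sum_c y_nc * log(p_nc)."""
    total = 0.0
    for y_n, p_n in zip(labels, probs):
        # Classes with y_nc == 0 contribute nothing and are skipped,
        # which also avoids evaluating log(0).
        total -= sum(y * math.log(p) for y, p in zip(y_n, p_n) if y > 0)
    return total / len(labels)


def total_loss(l_s, l_alpha, sigma1, sigma2):
    """ASSUMED combination: L = sigma1 * L_s + sigma2 * L_alpha."""
    return sigma1 * l_s + sigma2 * l_alpha


# Hypothetical data: 4 pixels and C = 2 semantic classes (one-hot labels).
alpha_g = [1.0, 1.0, 0.5, 0.0]
alpha_p = [0.9, 1.0, 0.25, 0.0]
y = [[1, 0], [1, 0], [0, 1], [0, 1]]
p = [[0.5, 0.5], [1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]

l_alpha = alpha_l1_loss(alpha_g, alpha_p)
l_s = cross_entropy(y, p)
loss = total_loss(l_s, l_alpha, sigma1=1.0, sigma2=1.0)
```

Both losses depend on the shared encoder's features, so a single backward pass through the combined loss updates the encoder with gradients from both tasks.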
In other words, during training the whole network is trained with a multi-task learning method so that it learns both tasks, and during testing one decoder is removed. Compared with training on a single parsing task alone, the multi-task learning mechanism greatly improves the final segmentation result.
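The train-with-two-heads, test-with-one-head pattern can be illustrated with a toy stand-in for the network. The class `TwoHeadModel`, its methods and the placeholder functions are hypothetical and are not real network layers. The patent does not say which decoder is removed at test time, so dropping the semantic head here is an assumption.

```python
class TwoHeadModel:
    """Toy stand-in: one shared encoder feeding named decoder heads."""

    def __init__(self, encoder, decoders):
        self.encoder = encoder
        self.decoders = dict(decoders)
        self.encoder_calls = 0

    def forward(self, image):
        self.encoder_calls += 1
        features = self.encoder(image)  # the input is encoded only once
        return {name: dec(features) for name, dec in self.decoders.items()}

    def drop_head(self, name):
        """Remove one decoder, as done before testing."""
        del self.decoders[name]


# Hypothetical placeholder functions standing in for learned layers.
model = TwoHeadModel(
    encoder=lambda image: [2 * v for v in image],
    decoders={
        "portrait": lambda f: [v + 1 for v in f],
        "semantic": lambda f: [v > 4 for v in f],
    },
)

train_out = model.forward([1, 2, 3])  # both heads run during training
model.drop_head("semantic")           # assumed choice of head to remove
test_out = model.forward([1, 2, 3])   # only the remaining head runs
```

Each forward pass calls the encoder once regardless of how many heads are attached, which is the source of the savings over running two separate encoder-decoder networks.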
An electronic device comprising a processor and a memory, the memory having machine executable instructions capable of being executed by the processor, the processor being capable of executing the machine executable instructions to implement the above method.
A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the above-mentioned method.
Compared with the processing method of the comparative network structure of FIG. 2:
In FIG. 2, the original RGB image is input into a T network comprising an encoder and a decoder, which outputs an image $T_g$ containing foreground, background and transition regions; $T_g$ is then processed further, so the comparative method needs two encoding and two decoding passes. In the method provided by the invention, once the encoder and the two parallel decoders have been trained, the same effect as the conventional method is achieved with only one encoding and decoding pass; the processing speed is doubled and resource usage is greatly reduced.
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that various changes in the embodiments and/or modifications of the invention can be made, and equivalents and modifications of some features of the invention can be made without departing from the spirit and scope of the invention.
Claims (6)
1. A portrait image mixing processing method, characterized by using a network comprising two parallel decoders and a shared encoder, wherein the two decoders are a portrait decoder and a semantic decoder respectively, the portrait decoder is used for decoding portrait information, the semantic decoder is used for decoding semantic information, and the encoder is used for encoding an input image and extracting features for the two decoders.
2. The portrait image mixing processing method according to claim 1, wherein the processing method specifically comprises the following steps:
S1: constructing and preprocessing a portrait image dataset:
after the training face images are acquired, all of them are cropped to a uniform size;
S2: constructing a neural network training model:
the neural network model to be trained comprises six parts, namely an input layer, an encoding layer, a first decoding layer, a second decoding layer, a first output layer and a second output layer; the encoding layer compresses the feature data from the input layer using a fully convolutional architecture and passes the result to the first decoding layer and the second decoding layer through two output heads, and each decoding layer decodes the encoded image through multiple layers of transposed convolution;
S3: optimizing the model:
the losses of the two output layers are computed on the images recovered from the compressed representation by forward propagation, the weight matrix of the encoder is updated by backpropagation of each loss, and the optimization is repeated.
3. The portrait image mixing processing method according to claim 2, wherein the alpha matte in the first decoding layer is set as a weighted combination of the background and the foreground of the input layer image, namely:
$I_p = \alpha_p F_p + (1 - \alpha_p) B_p$
5. An electronic device comprising a processor and a memory, the memory having machine executable instructions executable by the processor to perform the method of any one of claims 1 to 4.
6. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011644727.5A CN113079391A (en) | 2020-12-31 | 2020-12-31 | Portrait image mixing processing method, equipment and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011644727.5A CN113079391A (en) | 2020-12-31 | 2020-12-31 | Portrait image mixing processing method, equipment and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113079391A true CN113079391A (en) | 2021-07-06 |
Family
ID=76609339
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011644727.5A Pending CN113079391A (en) | 2020-12-31 | 2020-12-31 | Portrait image mixing processing method, equipment and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113079391A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113096001A (en) * | 2021-04-01 | 2021-07-09 | 咪咕文化科技有限公司 | Image processing method, electronic device and readable storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108985250A (en) * | 2018-07-27 | 2018-12-11 | 大连理工大学 | A kind of traffic scene analytic method based on multitask network |
CN110148081A (en) * | 2019-03-25 | 2019-08-20 | 腾讯科技(深圳)有限公司 | Training method, image processing method, device and the storage medium of image processing model |
CN110809784A (en) * | 2017-09-27 | 2020-02-18 | 谷歌有限责任公司 | End-to-end network model for high resolution image segmentation |
CN111353505A (en) * | 2020-05-25 | 2020-06-30 | 南京邮电大学 | Network model capable of realizing semantic segmentation and depth of field estimation jointly and training method |
Non-Patent Citations (1)
Title |
---|
SHAOFAN CAI ET AL.: ""Disentangled Image Matting"", 《2019 IEEE/CVF INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV)》, pages 8819 - 8826 * |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110517329B (en) | Deep learning image compression method based on semantic analysis | |
CN115049936A (en) | High-resolution remote sensing image-oriented boundary enhancement type semantic segmentation method | |
CN108921910B (en) | JPEG coding compressed image restoration method based on scalable convolutional neural network | |
CN109886391B (en) | Neural network compression method based on space forward and backward diagonal convolution | |
CN111340708A (en) | Method for rapidly generating high-resolution complete face image according to prior information | |
CN113792621B (en) | FPGA-based target detection accelerator design method | |
CN114863539A (en) | Portrait key point detection method and system based on feature fusion | |
CN113706545A (en) | Semi-supervised image segmentation method based on dual-branch nerve discrimination dimensionality reduction | |
CN114708297A (en) | Video target tracking method and device | |
CN116958534A (en) | Image processing method, training method of image processing model and related device | |
Wang et al. | Deep joint source-channel coding for multi-task network | |
CN113079391A (en) | Portrait image mixing processing method, equipment and computer readable storage medium | |
CN115797835A (en) | Non-supervision video target segmentation algorithm based on heterogeneous Transformer | |
CN113436198A (en) | Remote sensing image semantic segmentation method for collaborative image super-resolution reconstruction | |
WO2023174256A1 (en) | Data compression method and related device | |
CN113781376B (en) | High-definition face attribute editing method based on divide-and-congress | |
CN117078539A (en) | CNN-transducer-based local global interactive image restoration method | |
DE102018129135A1 (en) | USE OF REST VIDEO DATA RESULTING FROM A COMPRESSION OF ORIGINAL VIDEO DATA TO IMPROVE A DECOMPRESSION OF ORIGINAL VIDEO DATA | |
CN114283181B (en) | Dynamic texture migration method and system based on sample | |
CN117036368A (en) | Image data processing method, device, computer equipment and storage medium | |
CN111881794B (en) | Video behavior recognition method and system | |
WO2023075630A1 (en) | Adaptive deep-learning based probability prediction method for point cloud compression | |
CN111626917A (en) | Bidirectional image conversion system and method based on deep learning | |
CN117993480B (en) | AIGC federal learning method for designer style fusion and privacy protection | |
CN114881843B (en) | Fluid artistic control method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| CB02 | Change of applicant information | Address after: Building a, Cetus, Wuxi Software Park, No.18 Zhenze Road, Xinwu District, Wuxi City, Jiangsu Province, 214000. Applicant after: Wuxi Leqi Technology Co.,Ltd. Address before: Building a, Cetus, Wuxi Software Park, No.18 Zhenze Road, Xinwu District, Wuxi City, Jiangsu Province, 214000. Applicant before: Wuxi Le Chi Technology Co.,Ltd. |
| SE01 | Entry into force of request for substantive examination | |