CN113643305B - Portrait detection and segmentation method based on deep network context promotion - Google Patents

Portrait detection and segmentation method based on deep network context promotion

Info

Publication number
CN113643305B
Authority
CN
China
Legal status
Active
Application number
CN202110913353.0A
Other languages
Chinese (zh)
Other versions
CN113643305A (en)
Inventor
许赢月
王俊宇
高自立
Current Assignee
Zhuhai Fudan Innovation Research Institute
Original Assignee
Zhuhai Fudan Innovation Research Institute
Priority date
Filing date
Publication date
Application filed by Zhuhai Fudan Innovation Research Institute
Priority to CN202110913353.0A
Publication of CN113643305A
Application granted
Publication of CN113643305B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G06T 7/0002 Inspection of images, e.g. flaw detection
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20016 Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
    • G06T 2207/20081 Training; Learning
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30196 Human being; Person
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Quality & Reliability (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a portrait detection and segmentation method based on depth network context promotion. The method comprises: extracting L depth features of different scales from a portrait picture based on a depth network framework; performing feature fusion on the highest-scale depth feature over a plurality of pyramid scales through a pyramid pooling module to generate global prior information; promoting and fusing the context information of the depth features from high scale to low scale through fusion blocks to obtain the output features of each scale; and optimizing and training the output features of each scale respectively to complete portrait detection and segmentation. With this method, the context information of the depth network can be deeply mined from multiple scales, multiple spaces and multiple channels without additional knowledge, achieving refined portrait detection and segmentation of monocular images.

Description

Portrait detection and segmentation method based on deep network context promotion
Technical Field
The invention belongs to the technical field of image processing, and particularly relates to a portrait detection and segmentation method based on deep network context promotion.
Background
Portrait detection and segmentation is a special case of semantic segmentation and has a wide range of applications. For beautification applications, portrait detection is the basis of portrait stylization, depth-of-field blurring, matting and similar processing; for security and privacy applications, the detected background of a portrait picture can be blurred or replaced. Portrait detection on monocular images is particularly important in practice because, compared with the image pairs captured by dual cameras, monocular capture is far less restricted by lighting conditions and shooting distance.
The main challenges of deep-learning-based portrait detection are accurately locating the portrait and precisely segmenting the boundary between the portrait and the background; fine details of the portrait such as hair further increase the difficulty of edge segmentation. Existing deep-learning-based algorithms mainly rely on additional knowledge to achieve finer portrait detection and segmentation.
Some deep-learning-based algorithms locate the portrait more accurately by feeding additional knowledge to the deep network as extra input. For example, Automatic portrait segmentation for image stylization computes the portrait location and shape range as additional input channels of the depth network; other methods add a pose detector to generate human-body key-point maps as an additional input channel of the depth network. Such extra inputs help locate the portrait accurately, but they require additional computation and memory and do not help edge segmentation.
Other deep-learning-based algorithms use additional annotations (e.g., edge annotations) as extra knowledge for training the deep network. These edge annotations help the deep network refine edge details. In practice, however, edge annotation is expensive, and the edge annotations of most current datasets are derived from manual portrait labeling, so the fineness around edges is blurred. Edge annotations are therefore useful for summarizing the portrait shape but contribute little to accurate edge segmentation.
Therefore, how to perform refined portrait detection and segmentation without additional knowledge becomes a key problem of current research.
Disclosure of Invention
In view of the above problems, the present invention provides a method for detecting and segmenting a portrait based on context promotion of a depth network, which at least solves some of the above technical problems, and performs portrait detection and segmentation on a monocular image by deep mining of context information of the depth network from multiple scales, multiple spaces and multiple channels without additional knowledge.
The embodiment of the invention provides a portrait detection and segmentation method based on deep network context promotion, which comprises the following steps:
s1, extracting L depth features of different scales from a portrait picture based on a depth network framework;
s2, performing feature fusion on the highest-scale depth feature over a plurality of pyramid scales through a pyramid pooling module to generate global prior information;
s3, promoting and fusing the context information of the depth features from high scale to low scale through fusion blocks to obtain the output features of each scale;
and S4, optimizing and training the output features of each scale respectively to complete portrait detection and segmentation.
Further, the step S2 specifically includes:
s21, reducing the feature size of the depth feature through an average pooling layer to generate features with sizes of 1×1, 3×3 and 5×5 respectively;
s22, respectively carrying out dimension reduction on the features with the sizes of 1×1, 3×3 and 5×5 through a convolution layer with a convolution kernel of 1×1, so as to obtain three dimension-reduced features;
s23, upsampling the three dimension-reduced features through bilinear interpolation, and splicing the depth feature with the three upsampled features to obtain a first spliced feature;
s24, smoothing the first spliced feature through a convolution layer with a convolution kernel of 3×3 to obtain global prior information.
Further, the fusion block in the step S3 includes a channel lifting module, a space lifting module, and a scale lifting module.
Further, the step S3 specifically includes:
s31, lifting context information of depth features from a channel through a channel lifting module;
s32, lifting context information of the depth features from the space through a space lifting module;
s33, fusing context information of the depth features from multi-scale through a scale lifting module.
Further, the step S31 specifically includes:
s311, taking the depth features corresponding to the scales from l=1 to l=L-1 as initial features, and processing the initial features by adopting a convolution layer with a convolution kernel of 3×3 and a group number equal to the number of channels to obtain generated features;
s312, splicing the generated features and the initial features to obtain second spliced features;
s313, performing dimension reduction processing on the second spliced feature through a convolution layer with a convolution kernel of 1×1 and an output channel number equal to the number of input feature channels, so as to obtain a first output feature.
Further, the step S32 specifically includes:
s321, reducing the feature size of the first output feature through an average pooling layer with pooling kernel sizes of 2×2, 4×4 and 8×8 respectively, so as to generate features of 1/2, 1/4 and 1/8 of its size;
s322, smoothing the features with the sizes of 1/2, 1/4 and 1/8 respectively through a convolution layer with a convolution kernel of 3×3;
s323, upsampling the features subjected to the smoothing processing in the S322 through bilinear interpolation, and adding and fusing upsampling results;
s324, smoothing the features after the addition and fusion in S323 through a convolution layer with a convolution kernel of 3×3 to obtain a second output feature.
Further, the step S33 specifically includes:
s331, processing the second output feature through a convolution layer with a convolution kernel of 3×3, and upsampling the processed result through bilinear interpolation to obtain a third output feature;
s332, adding and fusing the second output feature and the third output feature;
s333, smoothing the features after the addition and fusion in S332 through a convolution layer with a convolution kernel of 3×3 to obtain a multi-scale feature fusion result.
Further, the step S4 specifically includes:
s41, respectively processing the output features of each scale through a convolution layer with a convolution kernel of 1×1 to generate portrait prediction maps;
s42, performing optimization training on each prediction map through a cross-entropy loss function;
s43, training a portrait detection and segmentation model on a large-scale portrait detection dataset;
s44, fine-tuning the model on a carefully selected small-scale portrait dataset with fine edge annotations to obtain a refined portrait detection model;
s45, detecting and segmenting the portrait.
Compared with the prior art, the portrait detection and segmentation method based on the deep network context promotion has the following beneficial effects:
According to the invention, without performing operations such as edge calibration or adding extra detection operators to the portrait pictures, the portrait can be accurately detected and segmented solely by deeply mining the context information of the depth network from multiple scales, multiple spaces and multiple channels, without depending on additional knowledge; this reduces the data annotation cost and better meets the requirements of industrial production and practical application.
Without using any additional knowledge, the method of the invention can outperform depth models that rely on such additional knowledge.
The invention can realize accurate detection and segmentation of portrait pictures, and the segmentation result can be used for subsequent applications such as matting, depth-of-field blurring, background replacement, sketching, stylization and cartoonization.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
fig. 1 is a frame diagram of a portrait detection and segmentation method based on context promotion of a depth network according to an embodiment of the present invention.
Fig. 2 is a pyramid pooling block diagram according to an embodiment of the present invention.
Fig. 3 is a diagram of a channel-lifting module according to an embodiment of the present invention.
Fig. 4 is a block diagram of a space lifting module according to an embodiment of the present invention.
Fig. 5 is a structural diagram of a scale lifting module according to an embodiment of the present invention.
Fig. 6 is a structural diagram of a portrait detection and segmentation method based on context promotion of a depth network according to an embodiment of the present invention.
Fig. 7 is a result diagram of labeling a portrait picture using an existing dataset.
Fig. 8 is an effect diagram of the image detection method according to the embodiment of the present invention in expanding applications.
Detailed Description
Exemplary embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
Referring to fig. 1, an embodiment of the present invention provides a portrait detection and segmentation method based on context promotion of a depth network, which specifically includes the following steps:
s1, extracting L depth features of different scales from a portrait picture based on a depth network framework;
s2, performing feature fusion on the highest-scale depth feature over a plurality of pyramid scales through a pyramid pooling module to generate global prior information;
s3, promoting and fusing the context information of the depth features from high scale to low scale through fusion blocks to obtain the output features of each scale;
and S4, optimizing and training the output features of each scale respectively to complete portrait detection and segmentation.
The above steps are described in detail below.
In the step S1, given an input portrait picture I, L depth features of different scales are extracted under a depth network framework. The depth network framework selected in the embodiment of the invention can be any of several popular depth network structures, and the convolution form adopted by the framework is retained. The set of extracted depth features is denoted as {f_l | l = 0, 1, ..., L-1}, where f_l is the feature at the l-th scale, l = 0 represents the highest scale of the depth network, and l = L-1 represents the lowest scale of the depth network.
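For illustration only, the following is a minimal PyTorch-style sketch of step S1, assuming a VGG-16 backbone whose conv1, conv2, conv4 and conv5 outputs serve as the multi-scale depth features (as in the VGG-16 example given later in this description); the class name MultiScaleExtractor, the exact layer split points and the torchvision API usage are assumptions made for this sketch, not part of the claimed method.

```python
import torch
import torch.nn as nn
from torchvision.models import vgg16  # assumes torchvision >= 0.13 for the weights=None argument

class MultiScaleExtractor(nn.Module):
    """Step S1 sketch: collect depth features of different scales from a VGG-16 backbone.
    The conv1, conv2, conv4 and conv5 outputs are kept, matching the example given later."""
    def __init__(self):
        super().__init__()
        features = vgg16(weights=None).features
        self.stages = nn.ModuleList([
            features[0:5],    # conv1 block (largest resolution, 64 channels)
            features[5:10],   # conv2 block (128 channels)
            features[10:24],  # conv3 + conv4 blocks (only the conv4 output is kept, 512 channels)
            features[24:31],  # conv5 block: the "highest scale", most abstract feature (512 channels)
        ])

    def forward(self, x):
        feats = []
        for stage in self.stages:
            x = stage(x)
            feats.append(x)
        # Reorder from the highest scale (l = 0) to the lowest scale (l = L-1).
        return feats[::-1]

if __name__ == "__main__":
    img = torch.randn(1, 3, 300, 400)            # a single 300x400 portrait picture
    feats = MultiScaleExtractor()(img)
    print([tuple(t.shape) for t in feats])       # L = 4 depth features of different scales
```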
In the step S2, the pyramid pooling module spatially aggregates the features over a plurality of pyramid regions to summarize the feature information of the whole scene. Feature fusion is carried out on the highest-scale depth feature over multiple pyramid scales through the pyramid pooling module, and the depth features at the remaining L-1 scales are then fused by fusion blocks according to the following formula:

F_l = B_l(f_l, F_{l-1}; ξ), l = 1, 2, ..., L-1

where F_l represents the output of the fusion block B_l at the l-th scale, F_{l-1} is the fused output of the previous (higher) scale, and ξ represents the model weights.
The modules embedded in the embodiment of the invention adopt depthwise separable convolution to reduce the parameter quantity and the computational complexity. Referring specifically to fig. 2, three pyramid scales can be adopted: firstly, the feature size of the depth feature is reduced through an average pooling layer to generate features with sizes of 1×1, 3×3 and 5×5; secondly, dimension reduction is respectively carried out on the features with the sizes of 1×1, 3×3 and 5×5 through a convolution layer with a convolution kernel of 1×1 to obtain three dimension-reduced features; then the three dimension-reduced features are upsampled through bilinear interpolation, and the depth feature is spliced with the three upsampled features to obtain a first spliced feature; finally, the first spliced feature is smoothed through a convolution layer with a convolution kernel of 3×3 to obtain the global prior information. The global prior information is transmitted step by step from high scale to low scale through the fusion blocks, so as to guide the overall portrait localization and ensure precise positioning of the portrait.
The global prior information F_0 at the highest scale l = 0 is calculated as follows:

F_0 = P(f_0; W, W_P)

where P(·) represents the pyramid pooling module, W represents the depth network framework weights, and W_P represents the pyramid pooling module weights.
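The pyramid pooling module P(·) described above (steps S21 to S24) can be sketched as follows; the per-branch channel count and the use of plain rather than depthwise-separable convolutions are simplifying assumptions made for readability.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PyramidPooling(nn.Module):
    """Pyramid pooling module P(.) following S21-S24: pool the highest-scale feature f_0 to
    1x1, 3x3 and 5x5, reduce dimensions with 1x1 convolutions, upsample, splice and smooth."""
    def __init__(self, in_ch, branch_ch=None):
        super().__init__()
        branch_ch = branch_ch or in_ch // 4          # per-branch channel count (illustrative)
        self.bins = (1, 3, 5)
        self.reduce = nn.ModuleList(
            nn.Conv2d(in_ch, branch_ch, kernel_size=1) for _ in self.bins
        )
        self.smooth = nn.Conv2d(in_ch + len(self.bins) * branch_ch, in_ch,
                                kernel_size=3, padding=1)

    def forward(self, f0):
        h, w = f0.shape[-2:]
        branches = [f0]
        for bin_size, reduce in zip(self.bins, self.reduce):
            p = F.adaptive_avg_pool2d(f0, bin_size)                    # S21: 1x1 / 3x3 / 5x5 pooling
            p = reduce(p)                                              # S22: 1x1 dimension reduction
            p = F.interpolate(p, size=(h, w), mode="bilinear",
                              align_corners=False)                     # S23: bilinear upsampling
            branches.append(p)
        spliced = torch.cat(branches, dim=1)                           # S23: first spliced feature
        return self.smooth(spliced)                                    # S24: global prior F_0
```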
In the step S3, the depth features are promoted and fused from a high scale to a low scale by a fusion block to obtain output features of each scale, wherein the fusion block comprises a channel promotion module, a space promotion module and a scale promotion module;
since there are many similar pairs of depth features extracted directly from portrait pictures, it is considered that there is information redundancy; redundancy of depth features on the channel can be improved through the channel lifting module, so that expressive force of the features is more abundant; when the channel lifting module is used for lifting the context information of the depth features from the channel, referring to fig. 3, firstly, the depth features corresponding to the scale l=1 to the scale l=l-1 are used as initial features, and the convolution layers with the convolution kernel of 3×3 and the number of groups equal to the number of channels are adopted to process the initial features to obtain generated features; secondly, splicing the generated features and the initial features to obtain second spliced features; finally, performing dimension reduction processing on the second spliced feature through a convolution layer with a convolution kernel of 1 multiplied by 1 and an output channel equal to the number of input feature channels, and outputting the second spliced feature to obtain a first output feature; the first output feature has a rich expressive force.
When the space lifting module is used to promote the context information of the depth features spatially, the pyramid pooling idea is used. Referring specifically to fig. 4, firstly, the feature size of the first output feature is reduced through an average pooling layer with pooling kernel sizes of 2×2, 4×4 and 8×8 respectively, generating features of 1/2, 1/4 and 1/8 of the original size; the features with the sizes of 1/2, 1/4 and 1/8 are respectively smoothed through a convolution layer with a convolution kernel of 3×3; then the smoothed features are upsampled through bilinear interpolation and the upsampling results are added and fused; finally, the added and fused features are smoothed through a convolution layer with a convolution kernel of 3×3 to obtain the second output feature. The feature quality of the second output feature obtained by the space lifting module is greatly improved.
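A corresponding sketch of the space lifting module (S321 to S324), under the same illustrative assumptions:

```python
import torch.nn as nn
import torch.nn.functional as F

class SpatialPromotion(nn.Module):
    """Space lifting sketch (S321-S324): pool to 1/2, 1/4 and 1/8 size, smooth each branch with a
    3x3 convolution, upsample back, add the branches and smooth again."""
    def __init__(self, channels):
        super().__init__()
        self.kernels = (2, 4, 8)                         # average-pooling kernel sizes
        self.branch_convs = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1) for _ in self.kernels
        )
        self.smooth = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        fused = 0
        for k, conv in zip(self.kernels, self.branch_convs):
            p = F.avg_pool2d(x, kernel_size=k, stride=k)          # S321: 1/2, 1/4, 1/8 features
            p = conv(p)                                           # S322: 3x3 smoothing convolution
            p = F.interpolate(p, size=(h, w), mode="bilinear",
                              align_corners=False)                # S323: bilinear upsampling
            fused = fused + p                                     # S323: addition fusion
        return self.smooth(fused)                                 # S324: second output feature
```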
When the scale lifting module is used to fuse the context information of the depth features across scales, referring specifically to fig. 5 and fig. 6, firstly, the second output feature is processed through a convolution layer with a convolution kernel of 3×3, and the processed result is upsampled through bilinear interpolation to obtain a third output feature; secondly, the second output feature and the third output feature are added and fused; finally, the added and fused features are smoothed through a convolution layer with a convolution kernel of 3×3 to obtain the multi-scale feature fusion result.
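A sketch of the scale lifting module (S331 to S333) follows; it assumes, in line with the overall high-to-low fusion flow, that the feature being convolved and upsampled is the fused output of the previous (higher) scale and that its channel count has already been aligned with the current scale.

```python
import torch.nn as nn
import torch.nn.functional as F

class ScalePromotion(nn.Module):
    """Scale lifting sketch (S331-S333): the fused feature from the previous (higher) scale is
    processed by a 3x3 convolution, bilinearly upsampled (third output feature), added to the
    current scale's second output feature, and smoothed by another 3x3 convolution."""
    def __init__(self, channels):
        super().__init__()
        self.pre = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.smooth = nn.Conv2d(channels, channels, kernel_size=3, padding=1)

    def forward(self, second_out, prev_fused):
        third_out = F.interpolate(self.pre(prev_fused), size=second_out.shape[-2:],
                                  mode="bilinear", align_corners=False)   # S331
        added = second_out + third_out                                    # S332: addition fusion
        return self.smooth(added)                                         # S333: multi-scale fusion result
```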
In this way, the context information of the depth features is promoted and fused step by step from high scale to low scale along the channel, spatial and scale dimensions, producing portrait detection and segmentation predictions that progress from global localization to local detail and from coarse to fine, with the high-accuracy portrait detection result finally obtained at the scale l=L-1.
In the step S4, the output features of each scale are optimized, specifically: the output features of each scale are processed through a convolution layer with a convolution kernel of 1×1 to generate portrait prediction maps, and each prediction map is optimized and trained through a cross-entropy loss function.

The output features of each scale are then trained on data. Most existing portrait detection datasets have relatively blurry edge labels; referring specifically to fig. 7, the magnified views in the last column show that the edge labeling error is large, so edge annotations derived from such labels are inaccurate and cannot guide the training of a refined model. Therefore, to train a refined portrait detection model, the invention trains in two stages: in the first stage, a highly robust and accurate portrait detection and segmentation model is trained on a large-scale portrait detection dataset, which provides a large number of portrait pictures and corresponding labels; in the second stage, the model is fine-tuned on a carefully selected small-scale portrait dataset with fine edge annotations, so that the judgement of the portrait edge pixels becomes more accurate.

The depth network framework in the embodiment of the invention can be any of several currently popular depth network structures. Taking VGG-16 as an example, the feature outputs of conv5, conv4, conv2 and conv1 can be used as f_l. The convolution form adopted by the depth network framework can be retained, and the modules embedded by the algorithm adopt depthwise separable convolution to reduce the parameter quantity and the computational complexity. During training, the parameters for VGG-16 are set as follows: the weight decay is 0.0005; the momentum is 0.9; the loss weight of each scale is 1; the batch size is 1; the optimizer is the Adam optimizer. In the first training stage, the initial learning rate is fixed at 1e-4; after 30 epochs of training, the learning rate is divided by 10 every 10 epochs, for 80 epochs in total. In the second training stage, the initial learning rate is fixed at 1e-5, and the learning rate is divided by 10 every 10 epochs, for 50 epochs in total.

Referring to fig. 8, the portrait detection and segmentation method based on depth network context promotion provided by the invention can accurately detect and segment portraits and realizes end-to-end portrait detection; when detecting an image with a resolution of 300×400, the detection speed can reach 57.21 FPS, and the segmentation result can be used for subsequent applications such as matting, depth-of-field blurring, background replacement, sketching, stylization and cartoonization.
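The multi-scale supervision and the two-stage optimizer settings described above can be sketched as follows; the dataset handling and the training loop itself are omitted, and the helper names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_heads(channels_per_scale):
    """S41: one 1x1 convolution per scale mapping the fused feature to a 2-class portrait map."""
    return nn.ModuleList(nn.Conv2d(c, 2, kernel_size=1) for c in channels_per_scale)

def multiscale_loss(fused_feats, mask, heads):
    """S42: supervise every scale with cross entropy; the per-scale loss weight is 1 as in the text."""
    criterion = nn.CrossEntropyLoss()
    total = 0.0
    for feat, head in zip(fused_feats, heads):
        pred = head(feat)                                   # portrait prediction map at this scale
        target = F.interpolate(mask.unsqueeze(1).float(), size=pred.shape[-2:],
                               mode="nearest").squeeze(1).long()
        total = total + criterion(pred, target)
    return total

def make_optimizer(model, stage):
    """Two-stage schedule from the description: stage 1 starts at 1e-4 (80 epochs, learning rate
    divided by 10 every 10 epochs after epoch 30); stage 2 fine-tuning starts at 1e-5 (50 epochs,
    divided by 10 every 10 epochs). Weight decay 0.0005; the stated momentum 0.9 corresponds to
    Adam's default beta1."""
    lr = 1e-4 if stage == 1 else 1e-5
    return torch.optim.Adam(model.parameters(), lr=lr, weight_decay=0.0005)
```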
The embodiment of the invention provides a portrait detection and segmentation method based on context promotion of a depth network. Referring to fig. 6, firstly, global prior information based on the highest-scale feature is computed through the pyramid pooling module on top of the depth network framework and used to guide the overall portrait localization; then, through the channel lifting module, the space lifting module and the scale lifting module, the global prior information is transmitted step by step from high scale to low scale to ensure accurate positioning of the portrait. The channel lifting module enriches the expressiveness of the features; the space lifting module improves the quality of the feature maps; the scale lifting module produces the multi-scale feature fusion result. Finally, the output features of each scale are respectively optimized and trained to realize a refined portrait detection model. Based on this method, accurate detection and segmentation of portrait pictures can be achieved.
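Putting the pieces together, the sketch below assembles the end-to-end flow F_0 = P(f_0) and F_l = B_l(f_l, F_{l-1}); it reuses the module sketches given earlier, and the 1×1 channel-alignment convolution between scales is an added assumption, since the description does not spell out how channel counts are matched across scales.

```python
import torch.nn as nn

class FusionBlock(nn.Module):
    """One fusion block B_l: channel lifting, space lifting, then scale lifting with the fused
    feature F_{l-1} from the previous (higher) scale; reuses the module sketches given earlier."""
    def __init__(self, channels, prev_channels):
        super().__init__()
        self.align = nn.Conv2d(prev_channels, channels, kernel_size=1)  # channel alignment (assumption)
        self.channel = ChannelPromotion(channels)
        self.spatial = SpatialPromotion(channels)
        self.scale = ScalePromotion(channels)

    def forward(self, f_l, prev_fused):
        x = self.channel(f_l)                                # first output feature
        x = self.spatial(x)                                  # second output feature
        return self.scale(x, self.align(prev_fused))         # F_l: multi-scale fusion result

class ContextPromotionNet(nn.Module):
    """End-to-end flow: backbone features f_l, global prior F_0 = P(f_0), then
    F_l = B_l(f_l, F_{l-1}) for l = 1 .. L-1, from high scale to low scale."""
    def __init__(self, channels_per_scale=(512, 512, 128, 64)):   # VGG-16 conv5/conv4/conv2/conv1
        super().__init__()
        self.backbone = MultiScaleExtractor()
        self.ppm = PyramidPooling(channels_per_scale[0])
        self.blocks = nn.ModuleList(
            FusionBlock(c, p) for c, p in zip(channels_per_scale[1:], channels_per_scale[:-1])
        )

    def forward(self, image):
        feats = self.backbone(image)            # f_0 (highest scale) ... f_{L-1} (lowest scale)
        fused = [self.ppm(feats[0])]            # F_0: global prior information
        for f_l, block in zip(feats[1:], self.blocks):
            fused.append(block(f_l, fused[-1]))
        return fused                            # per-scale outputs; fused[-1] drives the final prediction
```

Combined with make_heads, multiscale_loss and make_optimizer from the previous sketch, this yields the two-stage training procedure described above; all class and parameter names here are illustrative.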
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (3)

1. A portrait detection and segmentation method based on deep network context promotion is characterized by comprising the following steps:
s1, extracting L depth features of different scales from a portrait picture based on a depth network framework;
s2, performing feature fusion on the highest-scale depth feature over a plurality of pyramid scales through a pyramid pooling module to generate global prior information;
s3, promoting and fusing the context information of the depth features from high scale to low scale through fusion blocks to obtain the output features of each scale;
s4, optimizing and training the output features of each scale respectively to complete portrait detection and segmentation;
the fusion block in the step S3 includes a channel lifting module, a space lifting module and a scale lifting module;
the step S3 specifically comprises the following steps:
s31, lifting context information of the depth feature from a channel angle through a channel lifting module;
s32, lifting context information of the depth feature from a space angle through a space lifting module;
s33, fusing context information of the depth features from a multi-scale angle through a scale lifting module;
the step S31 specifically includes:
s311, taking the depth features corresponding to the scales from l=1 to l=L-1 as initial features, and processing the initial features by adopting a convolution layer with a convolution kernel of 3×3 and a group number equal to the number of channels to obtain generated features;
s312, splicing the generated features and the initial features to obtain second spliced features;
s313, performing dimension reduction processing on the second spliced feature through a convolution layer with a convolution kernel of 1×1 and an output channel number equal to the number of input feature channels, so as to obtain a first output feature;
the step S32 specifically includes:
s321, reducing the feature size of the first output feature through an average pooling layer, wherein the pooling kernel sizes are respectively 2×2, 4×4 and 8×8, generating features of 1/2, 1/4 and 1/8 of the first output feature respectively;
s322, smoothing the features with the sizes of 1/2, 1/4 and 1/8 respectively through a convolution layer with a convolution kernel of 3×3;
s323, upsampling the features subjected to the smoothing processing in the S322 through bilinear interpolation, and adding and fusing upsampling results;
s324, smoothing the features after the addition and fusion in S323 through a convolution layer with a convolution kernel of 3×3 to obtain a second output feature;
the step S33 specifically includes:
s331, processing the second output feature through a convolution layer with a convolution kernel of 3×3, and upsampling the processed result through bilinear interpolation to obtain a third output feature;
s332, adding and fusing the second output feature and the third output feature;
s333, smoothing the features after the addition and fusion in S332 through a convolution layer with a convolution kernel of 3×3 to obtain a multi-scale feature fusion result.
2. The method for detecting and segmenting portraits based on deep network context promotion as defined in claim 1, wherein S2 specifically comprises:
s21, reducing the feature size of the depth feature through an average pooling layer to generate features with sizes of 1×1, 3×3 and 5×5 respectively;
s22, respectively carrying out dimension reduction on the features with the sizes of 1×1, 3×3 and 5×5 through a convolution layer with a convolution kernel of 1×1, so as to obtain three dimension-reduced features;
s23, upsampling the three dimension-reduced features through bilinear interpolation, and splicing the depth feature with the three upsampled features to obtain a first spliced feature;
s24, smoothing the first spliced feature through a convolution layer with a convolution kernel of 3×3 to obtain global prior information.
3. The method for detecting and segmenting portraits based on deep network context promotion as claimed in claim 1, wherein said S4 specifically comprises:
s41, respectively processing the output features of each scale through a convolution layer with a convolution kernel of 1×1 to generate portrait prediction maps;
s42, performing optimization training on each prediction map through a cross-entropy loss function;
s43, training a portrait detection and segmentation model on a large-scale portrait detection dataset;
s44, fine-tuning the model on a carefully selected small-scale portrait dataset with fine edge annotations to obtain a refined portrait detection model;
s45, detecting and segmenting the portrait.
CN202110913353.0A 2021-08-10 2021-08-10 Portrait detection and segmentation method based on deep network context promotion Active CN113643305B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110913353.0A CN113643305B (en) 2021-08-10 2021-08-10 Portrait detection and segmentation method based on deep network context promotion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110913353.0A CN113643305B (en) 2021-08-10 2021-08-10 Portrait detection and segmentation method based on deep network context promotion

Publications (2)

Publication Number Publication Date
CN113643305A CN113643305A (en) 2021-11-12
CN113643305B (en) 2023-08-25

Family

ID=78420479

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110913353.0A Active CN113643305B (en) 2021-08-10 2021-08-10 Portrait detection and segmentation method based on deep network context promotion

Country Status (1)

Country Link
CN (1) CN113643305B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103413161A (en) * 2013-07-30 2013-11-27 复旦大学 Electronic tag capable of being switched into safe mode and switching method thereof
CN108062756A (en) * 2018-01-29 2018-05-22 重庆理工大学 Image, semantic dividing method based on the full convolutional network of depth and condition random field
CN111402129A (en) * 2020-02-21 2020-07-10 西安交通大学 Binocular stereo matching method based on joint up-sampling convolutional neural network
CN111681273A (en) * 2020-06-10 2020-09-18 创新奇智(青岛)科技有限公司 Image segmentation method and device, electronic equipment and readable storage medium
CN111724300A (en) * 2020-06-30 2020-09-29 珠海复旦创新研究院 Single picture background blurring method, device and equipment
CN112508868A (en) * 2020-11-23 2021-03-16 西安科锐盛创新科技有限公司 Intracranial blood vessel comprehensive image generation method
WO2021056808A1 (en) * 2019-09-26 2021-04-01 上海商汤智能科技有限公司 Image processing method and apparatus, electronic device, and storage medium
CN112801183A (en) * 2021-01-28 2021-05-14 哈尔滨理工大学 Multi-scale target detection method based on YOLO v3
CN112927209A (en) * 2021-03-05 2021-06-08 重庆邮电大学 CNN-based significance detection system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Structured Modeling of Joint Deep Feature and Prediction Refinement for Salient Object Detection; Yingyue Xue et al.; International Conference on Computer Vision; pp. 1-10 *

Also Published As

Publication number Publication date
CN113643305A (en) 2021-11-12

Similar Documents

Publication Publication Date Title
EP3540637B1 (en) Neural network model training method, device and storage medium for image processing
CN110782490B (en) Video depth map estimation method and device with space-time consistency
CN110738697A (en) Monocular depth estimation method based on deep learning
CN111461110A (en) Small target detection method based on multi-scale image and weighted fusion loss
JP2019032773A (en) Image processing apparatus, and image processing method
CN102156969B (en) Processing method for correcting deviation of image
CN112967341B (en) Indoor visual positioning method, system, equipment and storage medium based on live-action image
CN111861880B (en) Image super-fusion method based on regional information enhancement and block self-attention
CN113449735B (en) Semantic segmentation method and device for super-pixel segmentation
CN109087261A (en) Face antidote based on untethered acquisition scene
US20230237683A1 (en) Model generation method and apparatus based on multi-view panoramic image
CN112464798A (en) Text recognition method and device, electronic equipment and storage medium
CN111914756A (en) Video data processing method and device
CN111753670A (en) Human face overdividing method based on iterative cooperation of attention restoration and key point detection
CN116363750A (en) Human body posture prediction method, device, equipment and readable storage medium
CN116645598A (en) Remote sensing image semantic segmentation method based on channel attention feature fusion
CN113643305B (en) Portrait detection and segmentation method based on deep network context promotion
CN112446353A (en) Video image trace line detection method based on deep convolutional neural network
CN117333682A (en) Multi-view three-dimensional reconstruction method based on self-attention mechanism
CN112541506B (en) Text image correction method, device, equipment and medium
CN115330655A (en) Image fusion method and system based on self-attention mechanism
CN115330935A (en) Three-dimensional reconstruction method and system based on deep learning
CN112634331A (en) Optical flow prediction method and device
CN112017120A (en) Image synthesis method and device
CN111899284A (en) Plane target tracking method based on parameterized ESM network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant