CN116206114A - Portrait extraction method and device under complex background - Google Patents

Portrait extraction method and device under complex background

Info

Publication number
CN116206114A
CN116206114A
Authority
CN
China
Prior art keywords
resolution
portrait
blocks
characteristic
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310474489.5A
Other languages
Chinese (zh)
Other versions
CN116206114B (en)
Inventor
向雷
吕磊
黄德頔
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Yundun Technology Co ltd
Original Assignee
Chengdu Yundun Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Yundun Technology Co ltd filed Critical Chengdu Yundun Technology Co ltd
Priority to CN202310474489.5A priority Critical patent/CN116206114B/en
Publication of CN116206114A publication Critical patent/CN116206114A/en
Application granted granted Critical
Publication of CN116206114B publication Critical patent/CN116206114B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a method and a device for extracting a portrait from a complex background, relating to the technical field of picture processing. The invention can automatically and accurately extract a portrait from a complex background without requiring the user to extract it manually, and the picture to be extracted is not restricted by location or shooting position, so the application range is wide. The key points of the scheme are as follows: dividing the picture to be extracted into first non-overlapping blocks; mapping the first non-overlapping blocks to an arbitrary dimension to obtain dimension parameters; extracting the feature quantities in the dimension parameters; splicing blocks according to the feature quantities to obtain a first-resolution feature map; segmenting a portrait feature map according to the first-resolution feature map; dividing the blocks of the portrait feature map into second non-overlapping blocks; extracting the portrait feature quantities of the second non-overlapping blocks and fusing them with the multi-dimensional feature quantities obtained through skip connections; converting the portrait feature quantities into image output parameters; and outputting the portrait picture according to the image output parameters. The method is mainly used for portrait extraction.

Description

Portrait extraction method and device under complex background
Technical Field
The invention relates to the technical field of picture processing, in particular to a method and a device for extracting a portrait under a complex background.
Background
Portrait extraction, as the name suggests, extracts the portrait from a picture. Despite recent advances in science and technology, portrait extraction still suffers from many problems: inaccurate matting, in which the person cannot be separated cleanly from the background; rough edge detail; strong constraints on the subject's position when photographing; lack of face tilt correction; and so on.
Existing recognition methods based on geometric features also face the difficulty that no unified, high-quality feature extraction standard has been formed. Because adult facial appearance is highly variable, even facial images of the same person are difficult to express in a uniform pattern due to time, illumination, camera angle and the like, which makes feature extraction difficult.
Disclosure of Invention
The invention provides a method and a device for portrait extraction under a complex background. The method comprises: dividing the picture to be extracted into first non-overlapping blocks; mapping the first non-overlapping blocks to an arbitrary dimension to obtain dimension parameters; extracting the feature quantities in the dimension parameters; splicing blocks according to the feature quantities to obtain a first-resolution feature map; segmenting a portrait feature map according to the first-resolution feature map; dividing the blocks of the portrait feature map into second non-overlapping blocks; extracting the portrait feature quantities of the second non-overlapping blocks and fusing them with the multi-dimensional feature quantities obtained through skip connections; converting the portrait feature quantities into image output parameters; and outputting the portrait picture according to the image output parameters. Compared with the prior art, the invention can automatically and accurately extract a portrait from a complex background without requiring the user to extract it manually; the picture to be extracted is not restricted by location or shooting position, so the application range is wide.
In order to achieve the above purpose, the invention adopts the following technical scheme:
the first aspect of the present invention provides a method for extracting a portrait in a complex background, including:
and dividing the picture to be extracted into a first non-overlapping block.
Mapping the first non-overlapping block to any dimension to obtain a dimension parameter.
And extracting the characteristic quantity in the dimension parameter.
And splicing the characteristic quantities into blocks to obtain a first resolution characteristic diagram.
And dividing the portrait characteristic diagram according to the first resolution characteristic diagram.
Dividing the blocks of the portrait characteristic map into second non-overlapping blocks.
And extracting the portrait characteristic quantity of the second non-overlapping block, and fusing the portrait characteristic quantity with the multidimensional characteristic quantity obtained by a jump connection algorithm.
And converting the portrait characteristic quantity into an image output parameter.
And outputting the portrait picture according to the image output parameters.
Further, after splicing blocks according to the feature quantities to obtain the first-resolution feature map, the portrait extraction method under a complex background further comprises:
extracting the feature quantities in the dimension parameters;
splicing blocks according to the feature quantities to obtain a second-resolution feature map, the resolution of the second-resolution feature map being greater than that of the first-resolution feature map.
Further, after splicing blocks according to the feature quantities to obtain the second-resolution feature map, the method further comprises:
extracting the feature quantities in the dimension parameters;
splicing blocks according to the feature quantities to obtain a third-resolution feature map, the resolution of the third-resolution feature map being greater than that of the second-resolution feature map.
Further, in the portrait extraction method under a complex background, dividing the blocks of the portrait feature map into second non-overlapping blocks comprises:
dividing the blocks of the first-resolution feature map into blocks of a second-resolution feature map;
dividing the blocks of the second-resolution feature map into blocks of a third-resolution feature map;
and dividing the blocks of the third-resolution feature map into the second non-overlapping blocks.
A second aspect of the present invention provides a portrait extraction device under a complex background, comprising:
a first segmentation unit, configured to divide the picture to be extracted into first non-overlapping blocks;
a mapping unit, configured to map the first non-overlapping blocks to an arbitrary dimension to obtain dimension parameters;
a first extraction unit, configured to extract the feature quantities in the dimension parameters;
a first splicing unit, configured to splice blocks according to the feature quantities to obtain a first-resolution feature map;
a second segmentation unit, configured to segment a portrait feature map according to the first-resolution feature map;
a third segmentation unit, configured to divide the blocks of the portrait feature map into second non-overlapping blocks;
a second extraction unit, configured to extract the portrait feature quantities of the second non-overlapping blocks and fuse them with the multi-dimensional feature quantities obtained through skip connections;
a conversion unit, configured to convert the portrait feature quantities into image output parameters;
and an output unit, configured to output the portrait picture according to the image output parameters.
Further, the portrait extraction device under a complex background further comprises:
a third extraction unit, configured to extract the feature quantities in the dimension parameters;
a second splicing unit, configured to splice blocks according to the feature quantities to obtain a second-resolution feature map, the resolution of the second-resolution feature map being greater than that of the first-resolution feature map.
Further, the portrait extraction device under a complex background further comprises:
a fourth extraction unit, configured to extract the feature quantities in the dimension parameters;
a third splicing unit, configured to splice blocks according to the feature quantities to obtain a third-resolution feature map, the resolution of the third-resolution feature map being greater than that of the second-resolution feature map.
Further, in the portrait extraction device under a complex background, the third segmentation unit comprises:
a first segmentation module, configured to divide the blocks of the first-resolution feature map into blocks of a second-resolution feature map;
a second segmentation module, configured to divide the blocks of the second-resolution feature map into blocks of a third-resolution feature map;
and a third segmentation module, configured to divide the blocks of the third-resolution feature map into the second non-overlapping blocks.
The invention provides a method and a device for portrait extraction under a complex background. The method comprises: dividing the picture to be extracted into first non-overlapping blocks; mapping the first non-overlapping blocks to an arbitrary dimension to obtain dimension parameters; extracting the feature quantities in the dimension parameters; splicing blocks according to the feature quantities to obtain a first-resolution feature map; segmenting a portrait feature map according to the first-resolution feature map; dividing the blocks of the portrait feature map into second non-overlapping blocks; extracting the portrait feature quantities of the second non-overlapping blocks and fusing them with the multi-dimensional feature quantities obtained through skip connections; converting the portrait feature quantities into image output parameters; and outputting the portrait picture according to the image output parameters. Compared with the prior art, the invention can automatically and accurately extract a portrait from a complex background without requiring the user to extract it manually; the picture to be extracted is not restricted by location or shooting position, so the application range is wide.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are used in the description of the embodiments will be briefly described below, which are only for the purpose of illustrating the embodiments and are not to be construed as limiting the present invention.
FIG. 1 is a schematic flow chart of a method for extracting images under a complex background in an embodiment of the invention;
FIG. 2 is a schematic flow chart of a method for extracting images under another complex background according to an embodiment of the invention;
FIG. 3 is a schematic diagram of a configuration of a portrait extraction device under a complex background according to an embodiment of the present invention;
fig. 4 is a schematic diagram of a configuration of a portrait extraction device under another complex background according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used in the description of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention; the terms "comprising" and "having", and any variations thereof, as used in the specification, claims and the above description of the drawings, are intended to cover a non-exclusive inclusion.
In the description of embodiments of the present invention, the technical terms "first," "second," and the like are used merely to distinguish between different objects and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated, a particular order or a primary or secondary relationship. In the description of the embodiments of the present invention, the meaning of "plurality" is two or more unless explicitly defined otherwise.
In the description of the embodiments of the present invention, the term "and/or" is merely an association relationship describing an association object, and indicates that three relationships may exist, for example, a and/or B may indicate: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.
In the description of the embodiments of the present invention, the term "plurality" means two or more (including two), and similarly, "plural sets" means two or more (including two), and "plural sheets" means two or more (including two).
In the description of the embodiments of the present invention, the orientation or positional relationship indicated by the technical terms "center", "longitudinal", "transverse", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial", "circumferential", etc. are based on the orientation or positional relationship shown in the drawings, and are merely for convenience of describing the embodiments of the present invention and for simplifying the description, rather than indicating or implying that the apparatus or component to be referred to must have a specific orientation, be constructed and operated in a specific orientation, and therefore should not be construed as limiting the embodiments of the present invention.
In the description of the embodiments of the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "secured" and the like should be construed broadly and may be, for example, fixedly connected, detachably connected, or integrally formed; or may be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between the two components or interaction relationship between the two components. The specific meaning of the above terms in the examples of the present invention will be understood by those skilled in the art according to the specific circumstances.
Example 1
An embodiment of the invention provides a portrait extraction method under a complex background, as shown in fig. 1, comprising the following steps:
s1, dividing a picture to be extracted into a first non-overlapping block.
What needs to be explained here is: the embodiment of the invention does not limit the picture to be extracted, but the picture to be extracted necessarily contains the portrait.
Wherein, the non-overlapping blocks, as the name implies, are non-overlapping picture blocks, the size of the first non-overlapping block after division is not limited in the embodiments of the present invention.
S2, mapping the first non-overlapping blocks to an arbitrary dimension to obtain dimension parameters.
Here, dimension (also called dimensionality) is the number of independent parameters in mathematics; in the fields of physics and philosophy, it refers to the number of independent space-time coordinates.
S3, extracting the feature quantities in the dimension parameters.
S4, splicing blocks according to the feature quantities to obtain a first-resolution feature map.
S5, segmenting a portrait feature map according to the first-resolution feature map.
S6, dividing the blocks of the portrait feature map into second non-overlapping blocks.
S7, extracting the portrait feature quantities of the second non-overlapping blocks, and fusing them with the multi-dimensional feature quantities obtained through skip connections.
S8, converting the portrait feature quantities into image output parameters.
S9, outputting the portrait picture according to the image output parameters.
The present invention implements the above steps with FT-UNet, which consists of an encoder, a bottleneck, a decoder and skip connections. The basic unit of FT-UNet is the Focal Transformer block. In the encoder, to convert the input into a sequence embedding, the image is divided into non-overlapping blocks of size 4×4. The patch merging layer is responsible for downsampling and increasing the feature dimension, and the Focal Transformer blocks are responsible for learning the feature representations. The extracted context features are fused with the multi-scale features from the encoder via skip connections to compensate for the spatial information lost through downsampling. In the decoder, the patch expanding layer reshapes the feature map of the adjacent dimension into a larger feature map with twice the resolution, i.e. 2× upsampling. Finally, a patch expanding layer performs 4× upsampling to restore the feature map to the input resolution W×H, and a linear projection layer is applied to these upsampled features to produce the extraction output.
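To make this encoder-bottleneck-decoder flow concrete, the following is a minimal PyTorch sketch of a U-shaped network with patch embedding, patch merging, patch expanding and skip connections. All class names and sizes, and the use of a standard TransformerEncoderLayer in place of the Focal Transformer block, are illustrative assumptions, not the patented implementation.

```python
# Minimal sketch of the FT-UNet data flow described above (assumed names/sizes).
import torch
import torch.nn as nn

class PatchMerging(nn.Module):
    """Downsample 2x: concatenate each 2x2 neighbourhood, then reduce channels."""
    def __init__(self, dim):
        super().__init__()
        self.reduce = nn.Linear(4 * dim, 2 * dim)
    def forward(self, x):                       # x: (B, H, W, C)
        b, h, w, c = x.shape
        x = x.view(b, h // 2, 2, w // 2, 2, c).permute(0, 1, 3, 2, 4, 5)
        x = x.reshape(b, h // 2, w // 2, 4 * c)
        return self.reduce(x)                   # (B, H/2, W/2, 2C)

class PatchExpanding(nn.Module):
    """Upsample 2x: expand channels, then rearrange them into space."""
    def __init__(self, dim):
        super().__init__()
        self.expand = nn.Linear(dim, 2 * dim)
    def forward(self, x):                       # x: (B, H, W, C)
        b, h, w, c = x.shape
        x = self.expand(x).view(b, h, w, 2, 2, c // 2)
        return x.permute(0, 1, 3, 2, 4, 5).reshape(b, 2 * h, 2 * w, c // 2)

class FTUNet(nn.Module):
    def __init__(self, in_ch=3, dim=96, num_classes=1):
        super().__init__()
        self.patch_embed = nn.Conv2d(in_ch, dim, kernel_size=4, stride=4)
        blk = lambda d: nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.enc1, self.enc2 = blk(dim), blk(2 * dim)            # stand-ins for
        self.bottleneck = blk(4 * dim)                           # Focal Transformer
        self.down1, self.down2 = PatchMerging(dim), PatchMerging(2 * dim)
        self.up1, self.up2 = PatchExpanding(4 * dim), PatchExpanding(2 * dim)
        self.fuse1, self.fuse2 = nn.Linear(4 * dim, 2 * dim), nn.Linear(2 * dim, dim)
        self.dec1, self.dec2 = blk(2 * dim), blk(dim)
        self.head = nn.Conv2d(dim, num_classes, kernel_size=1)   # linear projection

    def run_block(self, blk, x):                # (B, H, W, C) -> tokens -> (B, H, W, C)
        b, h, w, c = x.shape
        return blk(x.reshape(b, h * w, c)).reshape(b, h, w, c)

    def forward(self, x):                       # x: (B, 3, H, W), H and W divisible by 16
        x = self.patch_embed(x).permute(0, 2, 3, 1)              # (B, H/4, W/4, C)
        s1 = self.run_block(self.enc1, x)
        s2 = self.run_block(self.enc2, self.down1(s1))
        y = self.run_block(self.bottleneck, self.down2(s2))
        # skip connections: concatenate encoder features, then fuse
        y = self.run_block(self.dec1, self.fuse1(torch.cat([self.up1(y), s2], -1)))
        y = self.run_block(self.dec2, self.fuse2(torch.cat([self.up2(y), s1], -1)))
        y = nn.functional.interpolate(y.permute(0, 3, 1, 2), scale_factor=4)  # 4x up
        return self.head(y)                     # (B, num_classes, H, W)
```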
The Focal Transformer structure comprises a patch partition layer, a linear embedding layer, Focal Transformer blocks and a patch merging layer. The patch partition layer performs block-wise dimension reduction, the linear embedding layer performs a linear transformation, and the patch merging layer performs downsampling. The core module is the Focal Transformer block, which contains window-based focal self-attention.
Wherein the focal self-attention model is as follows:
Assume an input feature map $x \in \mathbb{R}^{M \times N \times d}$, where $M \times N$ is the spatial dimension and $d$ is the feature dimension, and pool windows at all $L$ levels, where $L$ is the number of granularity levels at which focal self-attention extracts tokens. For focal level $l$, the input feature map $x$ is first divided into sub-windows of size $s_w^l \times s_w^l$, where $s_w^l$ denotes the size of the sub-window from which a summary token is obtained at level $l$. A linear layer $f_p^l$ then pools the sub-windows spatially:
$x^l = f_p^l(\hat{x}) \in \mathbb{R}^{\frac{M}{s_w^l} \times \frac{N}{s_w^l} \times d}, \qquad \hat{x} = \mathrm{Reshape}(x)$    (1)
where $x$ denotes the input feature map; $l$ the focal level; $M, N$ the spatial dimensions; $d$ the feature dimension; $s_w^l$ the size of the level-$l$ sub-window that yields a summary token; and $f_p^l$ a linear layer.
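As a sketch of formula (1): a reshape gathers each $s_w^l \times s_w^l$ sub-window, and a linear layer summarizes it into one token. The function name and tensor shapes below are illustrative assumptions for a single focal level.

```python
import torch
import torch.nn as nn

def pool_level(x, sw, f_p):
    """Formula (1): summarize each sw x sw sub-window into one token.
    x: (B, M, N, d) feature map; f_p: nn.Linear(sw*sw*d, d)."""
    b, m, n, d = x.shape
    x_hat = (x.view(b, m // sw, sw, n // sw, sw, d)        # split into sub-windows
              .permute(0, 1, 3, 2, 4, 5)                   # (B, M/sw, N/sw, sw, sw, d)
              .reshape(b, m // sw, n // sw, sw * sw * d))  # flatten each sub-window
    return f_p(x_hat)                                      # (B, M/sw, N/sw, d)

# Example: pool a 56x56, 96-d map with a level-l sub-window size of 4.
x = torch.randn(2, 56, 56, 96)
f_p = nn.Linear(4 * 4 * 96, 96)
print(pool_level(x, 4, f_p).shape)   # torch.Size([2, 14, 14, 96])
```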
When the pooled feature maps $\{x^l\}_{l=1}^{L}$ of all $L$ levels have been obtained, three linear projection layers $f_q$, $f_k$ and $f_v$ are used to compute the query at the first level and the keys and values at all levels:
$Q = f_q(x^1), \qquad K = \{K^l\}_{l=1}^{L} = f_k(\{x^1, \dots, x^L\}), \qquad V = \{V^l\}_{l=1}^{L} = f_v(\{x^1, \dots, x^L\})$    (2)
where $Q$, $K$ and $V$ are the query, key and value matrices, respectively; $x$ denotes the input feature map; and $f_q$, $f_k$ and $f_v$ are linear projection layers.
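Formula (2) amounts to three shared linear projections, with the query taken from the finest level only; a minimal sketch under assumed shapes:

```python
import torch
import torch.nn as nn

d = 96
f_q, f_k, f_v = nn.Linear(d, d), nn.Linear(d, d), nn.Linear(d, d)

# x_levels[0] is the finest (level-1) token map; coarser pooled maps follow.
x_levels = [torch.randn(2, 56 * 56, d), torch.randn(2, 14 * 14, d)]

Q = f_q(x_levels[0])                    # queries from the first level only
K = [f_k(x) for x in x_levels]          # keys for every level
V = [f_v(x) for x in x_levels]          # values for every level
```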
To perform focal self-attention, the surrounding tokens of each query token in the feature map are first extracted. For the queries $Q_i \in \mathbb{R}^{s_p \times s_p \times d}$ inside the $i$-th window, where $s_p$ denotes the window partition size and $s_r^l$ denotes the number of horizontal and vertical sub-windows in the level-$l$ attended region, $s_r^l \times s_r^l$ keys and values are extracted from $K^l$ and $V^l$, and the keys and values from all $L$ levels are then gathered into $K_i = \{K_i^1, \dots, K_i^L\} \in \mathbb{R}^{s \times d}$ and $V_i = \{V_i^1, \dots, V_i^L\} \in \mathbb{R}^{s \times d}$, where $s$ is the sum of the focal regions over all levels, i.e. $s = \sum_{l=1}^{L} (s_r^l)^2$. Finally, the focal self-attention for $Q_i$ is computed as:
$\mathrm{Attention}(Q_i, K_i, V_i) = \mathrm{SoftMax}\!\left(\frac{Q_i K_i^{T}}{\sqrt{d}} + B\right) V_i$    (3)
where $Q_i$, $K_i$ and $V_i$ denote the query, key and value matrices, respectively; $d$ is the vector dimension; $B$ is the relative position bias matrix; and SoftMax is the multi-class activation function.
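Formula (3) itself is scaled dot-product attention over the gathered keys and values plus a positional bias $B$; the per-window gathering is omitted here and the bias is a zero placeholder, so this is only an illustrative sketch:

```python
import math
import torch

def focal_attention(q_i, k_i, v_i, bias):
    """Formula (3) for one window: q_i (sp*sp, d), k_i/v_i (s, d), bias (sp*sp, s)."""
    d = q_i.shape[-1]
    scores = q_i @ k_i.transpose(-2, -1) / math.sqrt(d) + bias
    return torch.softmax(scores, dim=-1) @ v_i

sp, s, d = 7, 7 * 7 + 6 * 6, 96           # window size, total focal region, dim
out = focal_attention(torch.randn(sp * sp, d), torch.randn(s, d),
                      torch.randn(s, d), torch.zeros(sp * sp, s))
print(out.shape)                          # torch.Size([49, 96])
```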
The invention provides a portrait extraction method under a complex background, comprising: dividing the picture to be extracted into first non-overlapping blocks; mapping the first non-overlapping blocks to an arbitrary dimension to obtain dimension parameters; extracting the feature quantities in the dimension parameters; splicing blocks according to the feature quantities to obtain a first-resolution feature map; segmenting a portrait feature map according to the first-resolution feature map; dividing the blocks of the portrait feature map into second non-overlapping blocks; extracting the portrait feature quantities of the second non-overlapping blocks and fusing them with the multi-dimensional feature quantities obtained through skip connections; converting the portrait feature quantities into image output parameters; and outputting the portrait picture according to the image output parameters. Compared with the prior art, the invention can automatically and accurately extract a portrait from a complex background without requiring the user to extract it manually; the picture to be extracted is not restricted by location or shooting position, so the application range is wide.
Example 2
An embodiment of the invention provides a portrait extraction method under a complex background, as shown in fig. 2, comprising the following steps:
S201, dividing the picture to be extracted into first non-overlapping blocks.
Specifically, the picture to be extracted is divided into first non-overlapping blocks of equal size, for example non-overlapping blocks of 4×4 pixels. It should be noted that the embodiment of the invention does not limit the size of the non-overlapping blocks; an implementer can choose it according to the size of the picture to be extracted.
S202, mapping the first non-overlapping blocks to an arbitrary dimension to obtain dimension parameters.
Specifically, the first non-overlapping blocks are mapped to three dimensions to obtain their parameters; for example, non-overlapping blocks of size 4×4 are mapped to three dimensions, giving dimension parameters of 4×4×3, as in the sketch below.
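A sketch of this partition and its 4×4×3 dimension parameters, assuming a 224×224 RGB input (the sizes are illustrative):

```python
import torch

img = torch.randn(3, 224, 224)             # (C, H, W) picture to be extracted
p = 4                                      # first non-overlapping block size
c, h, w = img.shape
# Split into non-overlapping 4x4 blocks, each carrying 4*4*3 = 48 values.
blocks = (img.view(c, h // p, p, w // p, p)
             .permute(1, 3, 2, 4, 0)       # (H/4, W/4, 4, 4, 3): 4x4x3 per block
             .reshape(h // p * (w // p), p * p * c))
print(blocks.shape)                        # torch.Size([3136, 48])
```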
S203, extracting the feature quantities in the dimension parameters.
S204, splicing blocks according to the feature quantities to obtain a first-resolution feature map.
S205, extracting the feature quantities in the dimension parameters.
S206, splicing blocks according to the feature quantities to obtain a second-resolution feature map; the resolution of the second-resolution feature map is greater than that of the first-resolution feature map.
S207, extracting the feature quantities in the dimension parameters.
S208, splicing blocks according to the feature quantities to obtain a third-resolution feature map; the resolution of the third-resolution feature map is greater than that of the second-resolution feature map.
S209, segmenting a portrait feature map according to the first-resolution feature map.
S210, dividing the blocks of the portrait feature map into second non-overlapping blocks.
S2101, dividing the blocks of the first-resolution feature map into blocks of a second-resolution feature map.
S2102, dividing the blocks of the second-resolution feature map into blocks of a third-resolution feature map.
S2103, dividing the blocks of the third-resolution feature map into the second non-overlapping blocks.
When the raw data is insufficient, the data set is expanded through data augmentation. A common way to augment a data set is to collect new data, but in practice this is difficult. An alternative is to apply flipping, rotation, shearing and similar operations to the existing data, i.e., data augmentation, which is easy to implement and operate. Augmentation is performed by a pixel-to-pixel spatial transformation of the image, using the following coordinate transformation:
$(x, y) = T\{(v, w)\}$    (4)
where $(v, w)$ denotes the coordinates of a pixel in the original image and $(x, y)$ the coordinates in the transformed image.
Affine transformation is a common transformation, the general form of which is:
$[x \;\; y \;\; 1] = [v \;\; w \;\; 1]\,T = [v \;\; w \;\; 1] \begin{bmatrix} t_{11} & t_{12} & 0 \\ t_{21} & t_{22} & 0 \\ t_{31} & t_{32} & 1 \end{bmatrix}$    (5)
where $(v, w)$ denotes the coordinates of a pixel in the original image and $(x, y)$ the coordinates in the transformed image.
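A small numeric sketch of formulas (4)-(5), applying an example affine (rotation) matrix to a pixel coordinate; the matrix entries are arbitrary illustrative values:

```python
import numpy as np

theta = np.deg2rad(15)                     # example: a 15-degree rotation
T = np.array([[ np.cos(theta), np.sin(theta), 0],
              [-np.sin(theta), np.cos(theta), 0],
              [ 0,             0,             1]])

vw1 = np.array([10.0, 20.0, 1.0])          # (v, w, 1): pixel in the original image
x, y, _ = vw1 @ T                          # formula (5): [x y 1] = [v w 1] T
print(round(x, 2), round(y, 2))
```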
The embodiment of the invention provides a search-based data augmentation method, Auto-segment. The basic idea is to use reinforcement learning to find the best image transformation policies from the data itself, evaluating the quality of a particular policy directly on the data set of interest. In the invention, a search space is designed in which each policy consists of several sub-policies, and one sub-policy is randomly selected for each image in each batch. A sub-policy contains two operations, each of which is an image processing method such as translation, rotation or shearing, and each operation carries a probability and a magnitude that characterize how the operation is applied.
The search algorithm has two components: a controller, which is a recurrent neural network, and a training algorithm, the Proximal Policy Optimization (PPO) algorithm. At each step, the controller predicts a decision through its SoftMax output, and the resulting feature vector is fed into the next step as an embedding. In total, the controller makes 30 SoftMax predictions to generate 5 sub-policies, each with 2 operations, and each operation requires an operation type, a magnitude and a probability.
Training of the controller: the controller is trained with a reward signal that reflects how much a policy improves the generalization of a child model, a neural network trained as part of the search process. In the embodiment of the invention, the generalization of the child model is measured on a held-out validation set. The child model is trained on augmented data generated by applying the 5 sub-policies to the training set; for each mini-batch, one of the 5 sub-policies is chosen at random to augment the pictures. The child model's accuracy on the validation set is then used as the reward signal to train the recurrent neural network controller. On each data set, the controller samples 15,000 policies.
At the end of the search, the sub-policies of the 5 best policies are concatenated into a single policy with 25 sub-policies, and this final policy is used to train the model for each data set. A sketch of the sub-policy structure follows below.
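The policy structure described above can be sketched as plain data plus random sampling. The operation set, probabilities and magnitudes below are invented for illustration and are not the searched policy itself:

```python
import random
from PIL import Image, ImageOps

# One sub-policy = two (operation, probability, magnitude) triples.
OPS = {
    "rotate":    lambda im, m: im.rotate(m),                       # degrees
    "translate": lambda im, m: im.transform(im.size, Image.AFFINE,
                                            (1, 0, m, 0, 1, 0)),   # shift along x
    "mirror":    lambda im, m: ImageOps.mirror(im),
}

policy = [  # e.g. sub-policies found by the search (values invented here)
    [("rotate", 0.7, 10), ("mirror", 0.3, 0)],
    [("translate", 0.5, 8), ("rotate", 0.4, -5)],
]

def augment(im):
    """Randomly pick one sub-policy per image and apply its two operations."""
    sub = random.choice(policy)
    for name, prob, mag in sub:
        if random.random() < prob:
            im = OPS[name](im, mag)
    return im
```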
S211, extracting the portrait feature quantities of the second non-overlapping blocks, and fusing them with the multi-dimensional feature quantities obtained through skip connections.
S212, converting the portrait feature quantities into image output parameters.
S213, outputting the portrait picture according to the image output parameters.
The embodiment of the invention provides an improved structure segmentation loss function:
$L_{\mathrm{struct}} = w_{\mathrm{Dice}} L_{\mathrm{Dice}} + w_{\mathrm{Cross}} L_{\mathrm{Cross}}$    (6)
where $L_{\mathrm{struct}}$ denotes the structure segmentation loss; $L_{\mathrm{Dice}}$ the exponential logarithmic Dice loss and $w_{\mathrm{Dice}}$ its weight; and $L_{\mathrm{Cross}}$ the weighted exponential cross-entropy and $w_{\mathrm{Cross}}$ its weight.
The exponential logarithmic Dice loss ($L_{\mathrm{Dice}}$) and the weighted exponential cross-entropy ($L_{\mathrm{Cross}}$) are computed as:
$L_{\mathrm{Dice}} = \mathbf{E}_i\!\left[\left(-\ln \mathrm{Dice}_i\right)^{\gamma_{\mathrm{Dice}}}\right]$    (7)
where $i$ denotes a label and $\mathbf{E}_i[\,\cdot\,]$ the mean over $i$, with
$\mathrm{Dice}_i = \dfrac{2\left(\sum_{X} \delta_{il}(X)\, p_i(X)\right) + \epsilon}{\left(\sum_{X} \left(\delta_{il}(X) + p_i(X)\right)\right) + \epsilon}$    (8)
where $X$ denotes the pixel position; $i$ a label; $l$ the ground-truth label at $X$; $\delta_{il}(X)$ the Kronecker delta, which equals 1 when $i = l$ and 0 otherwise; $p_i(X)$ the SoftMax probability; and $\epsilon$ a pseudo count added for smoothing training samples with missing labels; and
$L_{\mathrm{Cross}} = \mathbf{E}_X\!\left[w_l \left(-\ln p_l(X)\right)^{\gamma_{\mathrm{Cross}}}\right]$    (9)
where $X$ denotes the pixel position, $l$ the ground-truth label at $X$, $\mathbf{E}_X[\,\cdot\,]$ the mean over $X$, and $p_l(X)$ the SoftMax probability of the true label.
When computing $\mathrm{Dice}_i$, $\delta_{il}(X)$ acts as the portion of pixel $X$ owned by label $i$, and $\epsilon$ is the pseudo count of missing labels used for additive smoothing of the training samples. The label weight $w_l = \left(\left(\sum_k f_k\right)/f_l\right)^{0.5}$, where $f_k$ is the frequency of label $k$, is used to reduce the influence of more frequently occurring labels. Introducing the exponents $\gamma_{\mathrm{Dice}}$ and $\gamma_{\mathrm{Cross}}$ further controls the nonlinearity of the loss; for simplicity, $\gamma_{\mathrm{Dice}} = \gamma_{\mathrm{Cross}} = \gamma$ is used here.
The loss function uses the label weights $w_l$ to balance the label frequencies, and this focal loss also balances simple and difficult samples. Combining the exponential and logarithmic conversions of the focal and Dice losses forces the network to focus on inaccurately predicted regions, yielding finer segmentation boundaries that match the true data distribution.
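A PyTorch sketch of the combined loss (6)-(9) as reconstructed above; the default weights, γ, smoothing constant and class-frequency tensor are illustrative assumptions:

```python
import torch

def exp_log_loss(probs, target, freq, w_dice=0.8, w_cross=0.2, gamma=0.3, eps=1.0):
    """probs: (B, C, H, W) SoftMax outputs; target: (B, H, W) integer labels;
    freq: (C,) label frequencies. Implements formulas (6)-(9) as reconstructed."""
    onehot = torch.nn.functional.one_hot(target, probs.shape[1])
    onehot = onehot.permute(0, 3, 1, 2).float()
    # (8) smoothed Dice per label i, then (7) exponential-logarithmic Dice loss
    inter = (onehot * probs).sum(dim=(0, 2, 3))
    denom = (onehot + probs).sum(dim=(0, 2, 3))
    dice_i = (2 * inter + eps) / (denom + eps)
    l_dice = ((-torch.log(dice_i)) ** gamma).mean()
    # (9) weighted exponential cross-entropy with w_l = (sum_k f_k / f_l)^0.5
    w = (freq.sum() / freq) ** 0.5
    p_true = probs.gather(1, target.unsqueeze(1)).squeeze(1).clamp_min(1e-7)
    l_cross = (w[target] * (-torch.log(p_true)) ** gamma).mean()
    return w_dice * l_dice + w_cross * l_cross     # (6)
```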
The invention provides a portrait extraction method under a complex background, comprising: dividing the picture to be extracted into first non-overlapping blocks; mapping the first non-overlapping blocks to an arbitrary dimension to obtain dimension parameters; extracting the feature quantities in the dimension parameters; splicing blocks according to the feature quantities to obtain a first-resolution feature map; segmenting a portrait feature map according to the first-resolution feature map; dividing the blocks of the portrait feature map into second non-overlapping blocks; extracting the portrait feature quantities of the second non-overlapping blocks and fusing them with the multi-dimensional feature quantities obtained through skip connections; converting the portrait feature quantities into image output parameters; and outputting the portrait picture according to the image output parameters. Compared with the prior art, the invention can automatically and accurately extract a portrait from a complex background without requiring the user to extract it manually; the picture to be extracted is not restricted by location or shooting position, so the application range is wide.
Example 3
An embodiment of the present invention provides a device for extracting a portrait in a complex background, as shown in fig. 3, including:
a first dividing unit 31, configured to divide the picture to be extracted into first non-overlapping blocks.
And a mapping unit 32, configured to map the first non-overlapping block to any dimension, so as to obtain a dimension parameter.
A first extraction unit 33, configured to extract a feature quantity in the dimension parameter.
And the first splicing unit 34 is used for splicing the feature quantities into blocks to obtain a first resolution feature map.
And a second segmentation unit 35, configured to segment a portrait feature map according to the first resolution feature map.
A third segmentation unit 36 for segmenting the blocks of the portrait feature map into second non-overlapping blocks.
A second extracting unit 37 for extracting the portrait characteristic amount of the second non-overlapping block, and fusing the portrait characteristic amount with the multi-dimensional characteristic amount obtained by the jump connection algorithm.
A conversion unit 38 for converting the portrait characteristic amount into an image output parameter.
An output unit 39 for outputting a portrait picture according to the image output parameter.
It should be noted that the detailed description of each part of this embodiment may refer to the corresponding parts of the other embodiments and will not be repeated here.
The invention provides a portrait extraction device under a complex background, which divides the picture to be extracted into first non-overlapping blocks; maps the first non-overlapping blocks to an arbitrary dimension to obtain dimension parameters; extracts the feature quantities in the dimension parameters; splices blocks according to the feature quantities to obtain a first-resolution feature map; segments a portrait feature map according to the first-resolution feature map; divides the blocks of the portrait feature map into second non-overlapping blocks; extracts the portrait feature quantities of the second non-overlapping blocks and fuses them with the multi-dimensional feature quantities obtained through skip connections; converts the portrait feature quantities into image output parameters; and outputs the portrait picture according to the image output parameters. Compared with the prior art, the invention can automatically and accurately extract a portrait from a complex background without requiring the user to extract it manually; the picture to be extracted is not restricted by location or shooting position, so the application range is wide.
Example 4
The embodiment of the invention provides a portrait extraction device under a complex background, as shown in fig. 4, comprising:
a first dividing unit 41 for dividing the picture to be extracted into first non-overlapping blocks.
And a mapping unit 42, configured to map the first non-overlapping block to any dimension, to obtain a dimension parameter.
A first extraction unit 43, configured to extract a feature quantity in the dimension parameter.
And a first stitching unit 44, configured to stitch the blocks according to the feature values to obtain a first resolution feature map.
A third extracting unit 45 for extracting the feature quantity in the dimension parameter.
And a second splicing unit 46, configured to splice the blocks according to the feature values, and obtain a second resolution feature map. The resolution of the second resolution feature map is greater than the resolution of the first resolution feature map.
A fourth extraction unit 47 for extracting feature amounts in the dimensional parameters.
A third splicing unit 48, configured to splice the features into blocks according to the feature values, so as to obtain a third resolution feature map; the resolution of the third resolution feature map is greater than the resolution of the second resolution feature map.
And a second segmentation unit 49, configured to segment the portrait feature map according to the first resolution feature map.
A third segmentation unit 410, configured to segment the block of the portrait characteristic map into a second non-overlapping block.
A first segmentation module 4101 for segmenting the block of the first resolution feature map into blocks of a second resolution feature map.
A second segmentation module 4102 for segmenting the block of the second resolution profile into blocks of a third resolution profile.
A third partitioning module 4103 for partitioning a block of the third resolution feature map into the second non-overlapping block.
A second extracting unit 411 for extracting the portrait feature quantity of the second non-overlapping block, and fusing the portrait feature quantity with the multi-dimensional feature quantity obtained by the jump connection algorithm.
A conversion unit 412 for converting the portrait characteristic amount into an image output parameter.
An output unit 413 for outputting a portrait picture according to the image output parameter.
It should be noted that the detailed description of each part of this embodiment may refer to the corresponding parts of the other embodiments and will not be repeated here.
The invention provides a portrait extraction device under a complex background, which divides the picture to be extracted into first non-overlapping blocks; maps the first non-overlapping blocks to an arbitrary dimension to obtain dimension parameters; extracts the feature quantities in the dimension parameters; splices blocks according to the feature quantities to obtain a first-resolution feature map; segments a portrait feature map according to the first-resolution feature map; divides the blocks of the portrait feature map into second non-overlapping blocks; extracts the portrait feature quantities of the second non-overlapping blocks and fuses them with the multi-dimensional feature quantities obtained through skip connections; converts the portrait feature quantities into image output parameters; and outputs the portrait picture according to the image output parameters. Compared with the prior art, the invention can automatically and accurately extract a portrait from a complex background without requiring the user to extract it manually; the picture to be extracted is not restricted by location or shooting position, so the application range is wide.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solution of the present invention, not to limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical schemes described in the foregoing embodiments may still be modified, or some or all of their technical features may be replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention and are intended to be included within the scope of the appended claims and the description. In particular, the technical features mentioned in the respective embodiments may be combined in any manner as long as there is no structural conflict. The present invention is not limited to the specific embodiments disclosed herein, but encompasses all technical solutions falling within the scope of the claims.

Claims (8)

1. A portrait extraction method under a complex background, characterized by comprising the following steps:
dividing the picture to be extracted into first non-overlapping blocks;
mapping the first non-overlapping blocks to an arbitrary dimension to obtain dimension parameters;
extracting the feature quantities in the dimension parameters;
splicing blocks according to the feature quantities to obtain a first-resolution feature map;
segmenting a portrait feature map according to the first-resolution feature map;
dividing the blocks of the portrait feature map into second non-overlapping blocks;
extracting the portrait feature quantities of the second non-overlapping blocks, and fusing them with the multi-dimensional feature quantities obtained through skip connections;
converting the portrait feature quantities into image output parameters;
and outputting the portrait picture according to the image output parameters.
2. The portrait extraction method under a complex background according to claim 1, further comprising, after splicing blocks according to the feature quantities to obtain the first-resolution feature map:
extracting the feature quantities in the dimension parameters;
splicing blocks according to the feature quantities to obtain a second-resolution feature map, the resolution of the second-resolution feature map being greater than that of the first-resolution feature map.
3. The portrait extraction method under a complex background according to claim 2, further comprising, after splicing blocks according to the feature quantities to obtain the second-resolution feature map:
extracting the feature quantities in the dimension parameters;
splicing blocks according to the feature quantities to obtain a third-resolution feature map, the resolution of the third-resolution feature map being greater than that of the second-resolution feature map.
4. The portrait extraction method under a complex background according to claim 1, wherein dividing the blocks of the portrait feature map into second non-overlapping blocks comprises:
dividing the blocks of the first-resolution feature map into blocks of a second-resolution feature map;
dividing the blocks of the second-resolution feature map into blocks of a third-resolution feature map;
and dividing the blocks of the third-resolution feature map into the second non-overlapping blocks.
5. A portrait extraction device under a complex background, characterized by comprising:
a first segmentation unit, configured to divide the picture to be extracted into first non-overlapping blocks;
a mapping unit, configured to map the first non-overlapping blocks to an arbitrary dimension to obtain dimension parameters;
a first extraction unit, configured to extract the feature quantities in the dimension parameters;
a first splicing unit, configured to splice blocks according to the feature quantities to obtain a first-resolution feature map;
a second segmentation unit, configured to segment a portrait feature map according to the first-resolution feature map;
a third segmentation unit, configured to divide the blocks of the portrait feature map into second non-overlapping blocks;
a second extraction unit, configured to extract the portrait feature quantities of the second non-overlapping blocks and fuse them with the multi-dimensional feature quantities obtained through skip connections;
a conversion unit, configured to convert the portrait feature quantities into image output parameters;
and an output unit, configured to output the portrait picture according to the image output parameters.
6. The portrait extraction device under a complex background according to claim 5, further comprising:
a third extraction unit, configured to extract the feature quantities in the dimension parameters;
a second splicing unit, configured to splice blocks according to the feature quantities to obtain a second-resolution feature map, the resolution of the second-resolution feature map being greater than that of the first-resolution feature map.
7. The portrait extraction device under a complex background according to claim 6, further comprising:
a fourth extraction unit, configured to extract the feature quantities in the dimension parameters;
a third splicing unit, configured to splice blocks according to the feature quantities to obtain a third-resolution feature map, the resolution of the third-resolution feature map being greater than that of the second-resolution feature map.
8. The portrait extraction device under a complex background according to claim 5, wherein the third segmentation unit comprises:
a first segmentation module, configured to divide the blocks of the first-resolution feature map into blocks of a second-resolution feature map;
a second segmentation module, configured to divide the blocks of the second-resolution feature map into blocks of a third-resolution feature map;
and a third segmentation module, configured to divide the blocks of the third-resolution feature map into the second non-overlapping blocks.
CN202310474489.5A 2023-04-28 2023-04-28 Portrait extraction method and device under complex background Active CN116206114B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310474489.5A CN116206114B (en) 2023-04-28 2023-04-28 Portrait extraction method and device under complex background

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310474489.5A CN116206114B (en) 2023-04-28 2023-04-28 Portrait extraction method and device under complex background

Publications (2)

Publication Number Publication Date
CN116206114A true CN116206114A (en) 2023-06-02
CN116206114B CN116206114B (en) 2023-08-01

Family

ID=86509785

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310474489.5A Active CN116206114B (en) 2023-04-28 2023-04-28 Portrait extraction method and device under complex background

Country Status (1)

Country Link
CN (1) CN116206114B (en)

Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020043296A1 (en) * 2018-08-30 2020-03-05 Huawei Technologies Co., Ltd. Device and method for separating a picture into foreground and background using deep learning
CN113191953A (en) * 2021-06-04 2021-07-30 山东财经大学 Transformer-based face image super-resolution method
WO2021169128A1 (en) * 2020-02-29 2021-09-02 平安科技(深圳)有限公司 Method and apparatus for recognizing and quantifying fundus retina vessel, and device and storage medium
US20210350168A1 (en) * 2019-03-01 2021-11-11 Huawei Technologies Co., Ltd. Image segmentation method and image processing apparatus
CN113870283A (en) * 2021-09-29 2021-12-31 深圳万兴软件有限公司 Image matting method and device, computer equipment and readable storage medium
CN114494296A (en) * 2022-01-27 2022-05-13 复旦大学 Brain glioma segmentation method and system based on fusion of Unet and Transformer
CN114511703A (en) * 2022-01-21 2022-05-17 苏州医智影科技有限公司 Migration learning method and system for fusing Swin Transformer and UNet and oriented to segmentation task
CN114565763A (en) * 2022-02-28 2022-05-31 北京百度网讯科技有限公司 Image segmentation method, apparatus, device, medium, and program product
CN114972746A (en) * 2022-04-13 2022-08-30 湖南大学 Medical image segmentation method based on multi-resolution overlapping attention mechanism
US20220319155A1 (en) * 2020-02-21 2022-10-06 Boe Technology Group Co., Ltd. Image Processing Method, Image Processing Apparatus, and Device
CN115330817A (en) * 2022-08-25 2022-11-11 上海健康医学院 Pneumothorax lesion segmentation method based on Swin-Unet and morphological processing and related equipment
CN115457043A (en) * 2022-03-23 2022-12-09 苏州迭代智能医疗科技有限公司 Image segmentation network based on overlapped self-attention deformer framework U-shaped network
CN115471470A (en) * 2022-09-14 2022-12-13 安徽大学 Esophageal cancer CT image segmentation method
CN115482382A (en) * 2022-09-17 2022-12-16 北京工业大学 Image semantic segmentation method based on Transformer architecture
CN115984560A (en) * 2022-12-26 2023-04-18 杭州电子科技大学 Image segmentation method based on CNN and Transformer
CN115994914A (en) * 2022-09-06 2023-04-21 中南民族大学 ATFormer architecture for medical image segmentation and corresponding method

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020043296A1 (en) * 2018-08-30 2020-03-05 Huawei Technologies Co., Ltd. Device and method for separating a picture into foreground and background using deep learning
US20210350168A1 (en) * 2019-03-01 2021-11-11 Huawei Technologies Co., Ltd. Image segmentation method and image processing apparatus
US20220319155A1 (en) * 2020-02-21 2022-10-06 Boe Technology Group Co., Ltd. Image Processing Method, Image Processing Apparatus, and Device
WO2021169128A1 (en) * 2020-02-29 2021-09-02 平安科技(深圳)有限公司 Method and apparatus for recognizing and quantifying fundus retina vessel, and device and storage medium
CN113191953A (en) * 2021-06-04 2021-07-30 山东财经大学 Transformer-based face image super-resolution method
CN113870283A (en) * 2021-09-29 2021-12-31 深圳万兴软件有限公司 Image matting method and device, computer equipment and readable storage medium
CN114511703A (en) * 2022-01-21 2022-05-17 苏州医智影科技有限公司 Migration learning method and system for fusing Swin Transformer and UNet and oriented to segmentation task
CN114494296A (en) * 2022-01-27 2022-05-13 复旦大学 Brain glioma segmentation method and system based on fusion of Unet and Transformer
CN114565763A (en) * 2022-02-28 2022-05-31 北京百度网讯科技有限公司 Image segmentation method, apparatus, device, medium, and program product
CN115457043A (en) * 2022-03-23 2022-12-09 苏州迭代智能医疗科技有限公司 Image segmentation network based on overlapped self-attention deformer framework U-shaped network
CN114972746A (en) * 2022-04-13 2022-08-30 湖南大学 Medical image segmentation method based on multi-resolution overlapping attention mechanism
CN115330817A (en) * 2022-08-25 2022-11-11 上海健康医学院 Pneumothorax lesion segmentation method based on Swin-Unet and morphological processing and related equipment
CN115994914A (en) * 2022-09-06 2023-04-21 中南民族大学 ATFormer architecture for medical image segmentation and corresponding method
CN115471470A (en) * 2022-09-14 2022-12-13 安徽大学 Esophageal cancer CT image segmentation method
CN115482382A (en) * 2022-09-17 2022-12-16 北京工业大学 Image semantic segmentation method based on Transformer architecture
CN115984560A (en) * 2022-12-26 2023-04-18 杭州电子科技大学 Image segmentation method based on CNN and Transformer

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
CHENGYIN LI等: "FocalUNETR: A Focal Transformer for Boundary-aware Segmentation of CT Images", 《ARXIV:2210.03189V1》, pages 1 - 13 *
JIANWEI YANG等: "Focal Self-attention for Local-Global Interactions in Vision Transformers", 《ARXIV:2107.00641V1》, pages 1 - 21 *
王恒: "基于深度学习的乳腺癌病理图像分类研究", 《中国优秀硕士学位论文全文数据库 医药卫生科技辑》, no. 2023, pages 072 - 1719 *
王欣: "基于深度学习的人像分割方法研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》, no. 2023, pages 138 - 1939 *
辛沐霖: "基于多尺度分析与数据均衡化的多角度肝内血管分割", 《中国优秀硕士学位论文全文数据库 医药卫生科技辑》, no. 2022, pages 064 - 12 *

Also Published As

Publication number Publication date
CN116206114B (en) 2023-08-01

Similar Documents

Publication Publication Date Title
CN111191663B (en) License plate number recognition method and device, electronic equipment and storage medium
CN113362329B (en) Method for training focus detection model and method for recognizing focus in image
CN113936256A (en) Image target detection method, device, equipment and storage medium
CN110717851A (en) Image processing method and device, neural network training method and storage medium
CN110555433A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN116645592B (en) Crack detection method based on image processing and storage medium
CN114758288A (en) Power distribution network engineering safety control detection method and device
CN105701489A (en) Novel digital extraction and identification method and system thereof
CN112836625A (en) Face living body detection method and device and electronic equipment
CN112036260A (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
CN113378897A (en) Neural network-based remote sensing image classification method, computing device and storage medium
CN113012177A (en) Three-dimensional point cloud segmentation method based on geometric feature extraction and edge perception coding
Chen et al. SARAS-net: scale and relation aware siamese network for change detection
CN116740422A (en) Remote sensing image classification method and device based on multi-mode attention fusion technology
CN116977674A (en) Image matching method, related device, storage medium and program product
CN115690797A (en) Character recognition method, device, equipment and storage medium
Pan et al. An adaptive multifeature method for semiautomatic road extraction from high-resolution stereo mapping satellite images
Li et al. Maskformer with improved encoder-decoder module for semantic segmentation of fine-resolution remote sensing images
CN116206114B (en) Portrait extraction method and device under complex background
Chen et al. Towards deep and efficient: A deep Siamese self-attention fully efficient convolutional network for change detection in VHR images
Chacon-Murguia et al. Moving object detection in video sequences based on a two-frame temporal information CNN
CN116310832A (en) Remote sensing image processing method, device, equipment, medium and product
CN115713624A (en) Self-adaptive fusion semantic segmentation method for enhancing multi-scale features of remote sensing image
Chen et al. Exploring efficient and effective generative adversarial network for thermal infrared image colorization
CN113486879A (en) Image area suggestion frame detection method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant