CN116206114A - Portrait extraction method and device under complex background - Google Patents
- Publication number: CN116206114A
- Application number: CN202310474489.5A
- Authority: CN (China)
- Prior art keywords: resolution, portrait, blocks, characteristic, feature map
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/26—Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a method and a device for extracting a portrait from a complex background, relating to the technical field of picture processing. The portrait can be extracted from a complex background automatically and accurately, without the user extracting it manually; the picture to be extracted is free from restrictions on location and photographing position, so the range of application is wide. The key points of the scheme are as follows: dividing the picture to be extracted into first non-overlapping blocks; mapping the first non-overlapping blocks to an arbitrary dimension to obtain a dimension parameter; extracting the feature quantities in the dimension parameter; splicing the blocks according to the feature quantities to obtain a first resolution feature map; dividing a portrait feature map according to the first resolution feature map; dividing the blocks of the portrait feature map into second non-overlapping blocks; extracting the portrait feature quantities of the second non-overlapping blocks, and fusing them with the multidimensional feature quantities obtained by a skip connection algorithm; converting the portrait feature quantities into image output parameters; and outputting the portrait picture according to the image output parameters. The method is mainly used for portrait extraction.
Description
Technical Field
The invention relates to the technical field of picture processing, in particular to a method and a device for extracting a portrait under a complex background.
Background
Portrait extraction, as the name suggests, extracts the human figure from a picture. Despite recent scientific and technological advances, the technology still suffers from several problems: inaccurate matting that cannot cleanly separate the figure from the background; rough edge detail; strict constraints on the subject's position during photographing; and a lack of face tilt correction.
Existing recognition methods based on geometric features also lack a unified, high-quality feature extraction standard. Because adult facial appearance is highly variable, even facial images of the same person are difficult to express in a uniform pattern owing to changes in time, illumination and camera angle, which makes feature extraction difficult.
Disclosure of Invention
The invention provides a method and a device for extracting a portrait under a complex background. The method comprises: dividing the picture to be extracted into first non-overlapping blocks; mapping the first non-overlapping blocks to an arbitrary dimension to obtain a dimension parameter; extracting the feature quantities in the dimension parameter; splicing the feature quantities into blocks to obtain a first resolution feature map; dividing a portrait feature map according to the first resolution feature map; dividing the blocks of the portrait feature map into second non-overlapping blocks; extracting the portrait feature quantities of the second non-overlapping blocks, and fusing them with the multidimensional feature quantities obtained by a skip connection algorithm; converting the portrait feature quantities into image output parameters; and outputting the portrait picture according to the image output parameters. Compared with the prior art, the invention extracts the portrait from a complex background automatically and accurately, without requiring the user to extract it manually; the picture to be extracted is free from restrictions on location and photographing position, so the range of application is wide.
In order to achieve the above purpose, the invention adopts the following technical scheme:
the first aspect of the present invention provides a method for extracting a portrait in a complex background, including:
and dividing the picture to be extracted into a first non-overlapping block.
Mapping the first non-overlapping block to any dimension to obtain a dimension parameter.
And extracting the characteristic quantity in the dimension parameter.
And splicing the characteristic quantities into blocks to obtain a first resolution characteristic diagram.
And dividing the portrait characteristic diagram according to the first resolution characteristic diagram.
Dividing the blocks of the portrait characteristic map into second non-overlapping blocks.
And extracting the portrait characteristic quantity of the second non-overlapping block, and fusing the portrait characteristic quantity with the multidimensional characteristic quantity obtained by a jump connection algorithm.
And converting the portrait characteristic quantity into an image output parameter.
And outputting the portrait picture according to the image output parameters.
Further, after splicing the feature quantities into blocks to obtain the first resolution feature map, the portrait extraction method under a complex background further comprises:
Extracting the feature quantities in the dimension parameter.
Splicing the feature quantities into blocks to obtain a second resolution feature map; the resolution of the second resolution feature map is greater than that of the first resolution feature map.
Further, after splicing the feature quantities into blocks to obtain the second resolution feature map, the portrait extraction method under a complex background further comprises:
Extracting the feature quantities in the dimension parameter.
Splicing the feature quantities into blocks to obtain a third resolution feature map; the resolution of the third resolution feature map is greater than that of the second resolution feature map.
Further, in the portrait extraction method under a complex background, dividing the blocks of the portrait feature map into second non-overlapping blocks comprises:
Dividing the blocks of the first resolution feature map into blocks of the second resolution feature map.
Dividing the blocks of the second resolution feature map into blocks of the third resolution feature map.
Dividing the blocks of the third resolution feature map into the second non-overlapping blocks.
A second aspect of the present invention provides a portrait extraction device under a complex background, comprising:
A first segmentation unit for segmenting the picture to be extracted into first non-overlapping blocks.
A mapping unit for mapping the first non-overlapping blocks to an arbitrary dimension to obtain a dimension parameter.
A first extraction unit for extracting the feature quantities in the dimension parameter.
A first splicing unit for splicing the feature quantities into blocks to obtain a first resolution feature map.
A second segmentation unit for segmenting a portrait feature map according to the first resolution feature map.
A third segmentation unit for segmenting the blocks of the portrait feature map into second non-overlapping blocks.
A second extraction unit for extracting the portrait feature quantities of the second non-overlapping blocks and fusing them with the multidimensional feature quantities obtained by a skip connection algorithm.
A conversion unit for converting the portrait feature quantities into image output parameters.
An output unit for outputting the portrait picture according to the image output parameters.
Further, the portrait extraction device under a complex background further comprises:
A third extraction unit for extracting the feature quantities in the dimension parameter.
A second splicing unit for splicing the feature quantities into blocks to obtain a second resolution feature map; the resolution of the second resolution feature map is greater than that of the first resolution feature map.
Further, the portrait extraction device under a complex background further comprises:
A fourth extraction unit for extracting the feature quantities in the dimension parameter.
A third splicing unit for splicing the feature quantities into blocks to obtain a third resolution feature map; the resolution of the third resolution feature map is greater than that of the second resolution feature map.
Further, in the portrait extraction device under a complex background, the third segmentation unit comprises:
A first segmentation module for segmenting the blocks of the first resolution feature map into blocks of the second resolution feature map.
A second segmentation module for segmenting the blocks of the second resolution feature map into blocks of the third resolution feature map.
A third segmentation module for segmenting the blocks of the third resolution feature map into the second non-overlapping blocks.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are used in the description of the embodiments will be briefly described below, which are only for the purpose of illustrating the embodiments and are not to be construed as limiting the present invention.
FIG. 1 is a schematic flow chart of a method for extracting a portrait under a complex background in an embodiment of the invention;
FIG. 2 is a schematic flow chart of another method for extracting a portrait under a complex background in an embodiment of the invention;
FIG. 3 is a schematic structural diagram of a portrait extraction device under a complex background in an embodiment of the invention;
FIG. 4 is a schematic structural diagram of another portrait extraction device under a complex background in an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs; the terminology used in the description of the invention herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention and the terms "comprising" and "having" and any variations thereof, as described in the specification and claims of the invention and the above description of the drawings, are intended to cover a non-exclusive inclusion.
In the description of embodiments of the present invention, the technical terms "first," "second," and the like are used merely to distinguish between different objects and are not to be construed as indicating or implying a relative importance or implicitly indicating the number of technical features indicated, a particular order or a primary or secondary relationship. In the description of the embodiments of the present invention, the meaning of "plurality" is two or more unless explicitly defined otherwise.
In the description of the embodiments of the present invention, the term "and/or" is merely an association relationship describing an association object, and indicates that three relationships may exist, for example, a and/or B may indicate: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" herein generally indicates that the front and rear associated objects are an "or" relationship.
In the description of the embodiments of the present invention, the term "plurality" means two or more (including two), and similarly, "plural sets" means two or more (including two), and "plural sheets" means two or more (including two).
In the description of the embodiments of the present invention, the orientation or positional relationship indicated by the technical terms "center", "longitudinal", "transverse", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial", "circumferential", etc. are based on the orientation or positional relationship shown in the drawings, and are merely for convenience of describing the embodiments of the present invention and for simplifying the description, rather than indicating or implying that the apparatus or component to be referred to must have a specific orientation, be constructed and operated in a specific orientation, and therefore should not be construed as limiting the embodiments of the present invention.
In the description of the embodiments of the present invention, unless explicitly specified and limited otherwise, the terms "mounted," "connected," "secured" and the like should be construed broadly and may be, for example, fixedly connected, detachably connected, or integrally formed; or may be mechanically or electrically connected; can be directly connected or indirectly connected through an intermediate medium, and can be communication between the two components or interaction relationship between the two components. The specific meaning of the above terms in the examples of the present invention will be understood by those skilled in the art according to the specific circumstances.
Example 1
The embodiment of the invention provides a method for extracting a human image under a complex background, which is shown in fig. 1 and comprises the following steps:
s1, dividing a picture to be extracted into a first non-overlapping block.
What needs to be explained here is: the embodiment of the invention does not limit the picture to be extracted, but the picture to be extracted necessarily contains the portrait.
Wherein, the non-overlapping blocks, as the name implies, are non-overlapping picture blocks, the size of the first non-overlapping block after division is not limited in the embodiments of the present invention.
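The non-overlapping split of S1 can be sketched as a minimal, hypothetical helper (the function name and the 4×4 default are illustrative only; the patent does not fix a block size):

```python
def partition_patches(height, width, patch=4):
    """Return the (row, col) origin of every non-overlapping patch x patch block.

    Hypothetical illustration of the non-overlapping division described above;
    the 4x4 default mirrors the example used later in the embodiment.
    """
    if height % patch or width % patch:
        raise ValueError("image must tile evenly into non-overlapping blocks")
    # step by `patch` in both axes so no two blocks share a pixel
    return [(r, c) for r in range(0, height, patch)
                   for c in range(0, width, patch)]

# an 8x8 picture splits into four non-overlapping 4x4 blocks
origins = partition_patches(8, 8)
```

Because the step equals the block size, adjacent blocks never overlap, which is the defining property used throughout the method.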
S2, mapping the first non-overlapping blocks to an arbitrary dimension to obtain a dimension parameter.
Here, dimension (also called dimensionality) is the number of independent parameters in mathematics; in the fields of physics and philosophy it refers to the number of independent space-time coordinates.
S3, extracting the feature quantities in the dimension parameter.
S4, splicing the feature quantities into blocks to obtain a first resolution feature map.
S5, dividing a portrait feature map according to the first resolution feature map.
S6, dividing the blocks of the portrait feature map into second non-overlapping blocks.
S7, extracting the portrait feature quantities of the second non-overlapping blocks, and fusing them with the multidimensional feature quantities obtained through a skip connection algorithm.
S8, converting the portrait feature quantities into image output parameters.
S9, outputting the portrait picture according to the image output parameters.
The present invention implements the above steps with FT-UNet, which consists of an encoder, a bottleneck, a decoder and skip connections; the basic unit of FT-UNet is the Focal Transformer block. In the encoder, to convert the input into a sequence embedding, the image is divided into non-overlapping blocks of size 4×4. The patch merging layer is responsible for downsampling and increasing the dimension, and the Focal Transformer is responsible for learning feature representations. The extracted context features are fused with the multi-scale features from the encoder via skip connections, to compensate for the loss of spatial information caused by downsampling. In the decoder, the patch expanding layer reshapes feature maps of adjacent dimensions into a larger feature map with twice the resolution, i.e. 2× upsampling. Finally, a 4× upsampling is performed using the patch expanding layer to restore the feature map to the input resolution W×H, and a linear projection layer is then applied to these upsampled features to output the extraction result.
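As a rough shape check of the encoder–decoder pipeline just described, the following sketch traces feature-map sizes. It assumes (this is an assumption for illustration, not stated in the patent) the common Swin-style convention that each patch-merging step halves the spatial resolution and doubles the channel count, and each patch-expanding step does the reverse:

```python
def ftunet_shapes(H, W, C=96, depth=3):
    """Trace (h, w, channels) through a UNet-style encoder/decoder.

    Hypothetical bookkeeping only: a 4x4 patch embedding, then `depth`
    patch-merging steps (downsample, double channels), then `depth`
    patch-expanding steps (upsample, halve channels) back to the
    embedding resolution. C=96 is an assumed embedding width.
    """
    shapes = [(H // 4, W // 4, C)]             # 4x4 patch embedding
    for _ in range(depth):                     # encoder: patch merging
        h, w, c = shapes[-1]
        shapes.append((h // 2, w // 2, c * 2))
    for _ in range(depth):                     # decoder: patch expanding
        h, w, c = shapes[-1]
        shapes.append((h * 2, w * 2, c // 2))
    return shapes
```

The symmetric shape lists at matching depths are what make the skip connections dimensionally compatible.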
The Focal Transformer structure comprises a patch partition layer, a linear embedding layer, Focal Transformer layers and a patch merging layer. The patch partition layer performs block-wise dimension reduction, the linear embedding layer performs a linear transformation, and the patch merging layer performs downsampling. The core module is the Focal Transformer block, which contains focal self-attention computed over windows.
Wherein the focal self-attention model is as follows:
Assume an input feature map $x \in \mathbb{R}^{M \times N \times d}$, where $M \times N$ is the spatial dimension and $d$ is the feature dimension. Windows are pooled at all $L$ levels, $L$ being the number of granularity levels from which focal self-attention extracts tokens. For focal level $l$, the input feature map $x$ is first divided into a grid of sub-windows of size $s_w^l \times s_w^l$, where $s_w^l$ denotes the size of the sub-window from which a summary token is obtained at level $l$. A linear layer $f_p^l$ then spatially merges the sub-windows:
$$x^l = f_p^l(\hat{x}) \in \mathbb{R}^{\frac{M}{s_w^l} \times \frac{N}{s_w^l} \times d}$$
where $x$ is the input feature map; $l$ is the focal level; $M \times N$ is the spatial dimension; $d$ is the feature dimension; $s_w^l$ is the size of the sub-window from which the summary token is obtained at level $l$; $\hat{x}$ is the sub-window partition of $x$; and $f_p^l$ is the linear layer.
When the pooled feature maps $\{x^l\}_{l=1}^{L}$ of all $L$ levels have been obtained, three linear projection layers $f_q$, $f_k$ and $f_v$ are used to compute the query of the first level and the keys and values of all levels:
$$Q = f_q(x^1), \quad K = \{K^l\}_{l=1}^{L} = f_k(\{x^1, \dots, x^L\}), \quad V = \{V^l\}_{l=1}^{L} = f_v(\{x^1, \dots, x^L\})$$
where $Q$, $K$ and $V$ are the query, key and value matrices respectively; $x^l$ is the pooled feature map at level $l$; and $f_q$, $f_k$ and $f_v$ are the linear projection layers.
To perform focal self-attention, the surrounding tokens of each query token in the feature map are first extracted. For the query $Q_i$ of the $i$-th window in the window partition, with $s_r^l$ denoting the number of horizontal and vertical sub-windows in the level-$l$ participation region, $s_r^l \times s_r^l$ keys and values are extracted from $K^l$ and $V^l$; the keys and values are then gathered from all $L$ levels to obtain $K_i = \{K_i^1, \dots, K_i^L\} \in \mathbb{R}^{s \times d}$ and $V_i = \{V_i^1, \dots, V_i^L\} \in \mathbb{R}^{s \times d}$, where $s$ is the sum of the focal regions of all levels, i.e. $s = \sum_{l=1}^{L} (s_r^l)^2$. The focal self-attention for the $i$-th window is then:
$$\mathrm{Attention}(Q_i, K_i, V_i) = \mathrm{SoftMax}\!\left(\frac{Q_i K_i^{\top}}{\sqrt{d}} + B\right) V_i$$
where $Q_i$, $K_i$ and $V_i$ are the query, key and value matrices respectively; $d$ is the vector dimension; $B$ is the bias matrix; and SoftMax is the multi-class activation function.
Example 2
The embodiment of the invention provides a method for extracting a human image under a complex background, which is shown in fig. 2 and comprises the following steps:
s201, dividing a picture to be extracted into first non-overlapping blocks.
Specifically, the picture to be extracted is divided into first non-overlapping blocks with the same size, for example: the picture to be extracted is partitioned into non-overlapping blocks of 4 x 4 size, and what needs to be explained here is: the size of the non-overlapping blocks is not limited in the embodiment of the invention, and an implementer can correspondingly determine according to the size of the picture to be extracted.
S202, mapping the first non-overlapping blocks to an arbitrary dimension to obtain a dimension parameter.
Specifically, each first non-overlapping block is mapped to three dimensions to obtain its parameter; for example, a non-overlapping block of size 4×4 mapped to three dimensions yields a dimension parameter of 4×4×3.
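The step from a flattened block to an embedding can be pictured as a single linear projection — a hypothetical sketch (the patent's linear embedding layer is learned; the weight matrix here is purely illustrative, and a learned layer would also carry a bias term):

```python
def linear_embed(patch_flat, weights):
    """Project a flattened patch (e.g. 4*4*3 = 48 values) to len(weights) dims.

    weights[j] is the list of coefficients producing output dimension j.
    Illustrative stand-in for a learned linear embedding layer.
    """
    return [sum(p * w for p, w in zip(patch_flat, col)) for col in weights]

# a 4x4x3 block flattens to 48 numbers; two output dims for illustration
patch = [1.0] * 48
W = [[1.0 / 48] * 48,   # averaging row
     [0.0] * 48]        # zero row
embedding = linear_embed(patch, W)
```

In a real network the 48-to-d projection is applied identically to every block, turning the grid of blocks into a sequence of d-dimensional tokens.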
S203, extracting the feature quantities in the dimension parameter.
S204, splicing the feature quantities into blocks to obtain a first resolution feature map.
S205, extracting the feature quantities in the dimension parameter.
S206, splicing the feature quantities into blocks to obtain a second resolution feature map; the resolution of the second resolution feature map is greater than that of the first resolution feature map.
S207, extracting the feature quantities in the dimension parameter.
S208, splicing the feature quantities into blocks to obtain a third resolution feature map; the resolution of the third resolution feature map is greater than that of the second resolution feature map.
S209, dividing a portrait feature map according to the first resolution feature map.
S210, dividing the blocks of the portrait feature map into second non-overlapping blocks.
S2101, dividing the blocks of the first resolution feature map into blocks of the second resolution feature map.
S2102, dividing the blocks of the second resolution feature map into blocks of the third resolution feature map.
S2103, dividing the blocks of the third resolution feature map into the second non-overlapping blocks.
When raw data are insufficient, the data set is expanded by data enhancement. A common way to augment a data set is to collect new data, but in practice this is difficult. Alternatively, the data set is subjected to operations such as flipping, rotation and shearing — i.e. data enhancement — to expand the data; this approach is easy to operate and convenient to implement. Data enhancement is performed by a pixel-wise spatial transformation of the image. The coordinate transformation is:
$$(x, y) = T\{(v, w)\}$$
where $(v, w)$ are the coordinates of a pixel in the original image and $(x, y)$ are the coordinates in the transformed image.
Affine transformation is a common transformation, whose general form is:
$$\begin{bmatrix} x & y & 1 \end{bmatrix} = \begin{bmatrix} v & w & 1 \end{bmatrix} A = \begin{bmatrix} v & w & 1 \end{bmatrix} \begin{bmatrix} a_{11} & a_{12} & 0 \\ a_{21} & a_{22} & 0 \\ a_{31} & a_{32} & 1 \end{bmatrix}$$
where $(v, w)$ are the coordinates of a pixel in the original image and $(x, y)$ are the coordinates in the transformed image.
The embodiment of the invention provides a data enhancement method Auto-segment based on search. The basic idea is to find the best image transformation strategy from the data itself using reinforcement learning and evaluate the quality of a particular strategy directly on some datasets. In the invention, a search space is designed, each policy is composed of a plurality of sub-policies, and one sub-policy is randomly selected for each image in each batch. The sub-strategy contains two operations, each of which is an image processing method such as translation, rotation or clipping, and for each of which there is a set of probabilities and magnitudes to characterize the nature of the use of this operation.
The search algorithm has two components: one part is the controller and the other part is the training algorithm PPO (Proximal Policy Optimization) algorithm. In each operation, the controller predicts the result output by SoftMax to generate a feature vector, and then takes the feature vector as an embedded vector of the next operation. The search algorithm has two components, one controller is a recurrent neural network and one training algorithm is an approximate policy optimization algorithm. Each step controller predicts a decision made by SoftMax. And then sent to the next step as an embedding. The controller predicts 30 SoftMax predictions for a total of 5 sub-policies, each with 2 operations, each requiring an operation type, intensity, probability.
Training of the controller: the controller is trained by a reward signal, namely how much a policy improves the generalization of a child model, a neural network trained as part of the search process. In the embodiment of the invention, the generalization ability of the child model is measured on a held-out validation set. A child model is trained on enhanced data generated by applying 5 sub-policies to the training set; for each mini-batch, one of the 5 sub-policies is chosen at random to enhance the images. The child model's accuracy on the validation set is then used as the feedback signal to train the recurrent neural network. On each dataset, the controller samples about 15,000 policies.
At the end of the search, the sub-policies of the 5 best policies are concatenated into a single policy. This final policy, with 25 sub-policies, is used to train the model for each dataset.
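A minimal sketch of the search loop described above. The controller is stubbed out as a uniform random sampler rather than a recurrent network trained with PPO, and the operation vocabulary and bin counts are illustrative; the sketch only shows how 30 decisions (5 sub-policies x 2 operations x 3 choices) form one policy and how validation accuracy acts as the reward:

```python
import random

OP_TYPES = ["translate", "rotate", "shear"]   # illustrative vocabulary
PROB_BINS = [i / 10 for i in range(11)]       # discretised probabilities
MAG_BINS = list(range(10))                    # discretised magnitudes

def sample_policy(rng):
    """Sample one policy: 5 sub-policies x 2 operations x 3 decisions
    = 30 decisions, matching the controller's 30 SoftMax predictions."""
    return [
        [(rng.choice(OP_TYPES), rng.choice(PROB_BINS), rng.choice(MAG_BINS))
         for _ in range(2)]                   # 2 operations per sub-policy
        for _ in range(5)                     # 5 sub-policies per policy
    ]

def search(num_samples, train_and_validate, rng=random):
    """Reward loop: sample policies, score each by child-model validation
    accuracy, keep the best. The PPO update of a real controller is omitted."""
    best_policy, best_reward = None, float("-inf")
    for _ in range(num_samples):              # the patent samples ~15,000
        policy = sample_policy(rng)
        reward = train_and_validate(policy)   # validation accuracy
        if reward > best_reward:
            best_policy, best_reward = policy, reward
    return best_policy, best_reward
```

In the full method, `train_and_validate` would train a child model on data enhanced by the candidate policy and return its validation accuracy; here it is a placeholder.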
S211, extracting the portrait characteristic quantity of the second non-overlapping block, and fusing the portrait characteristic quantity with the multidimensional characteristic quantity obtained through a jump connection algorithm.
S212, converting the portrait characteristic quantity into an image output parameter.
S213, outputting the portrait picture according to the image output parameters.
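The fusion in step S211, combining decoder features with the multi-dimensional encoder features delivered by the jump (skip) connections, might look like the following sketch. The U-Net-style channel concatenation and the nearest-neighbour upsampling are assumptions for illustration, not details stated by the patent:

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling of a (C, H, W) feature map."""
    return feat.repeat(2, axis=1).repeat(2, axis=2)

def skip_fuse(decoder_feat, encoder_feat):
    """Fuse a decoder feature map with the same-resolution encoder feature
    map delivered by the skip connection (channel-wise concatenation)."""
    assert decoder_feat.shape[1:] == encoder_feat.shape[1:]
    return np.concatenate([decoder_feat, encoder_feat], axis=0)

# Toy example: a low-resolution decoder map is upsampled and fused with
# the multi-dimensional encoder map saved at the matching resolution.
decoder = np.ones((4, 8, 8))          # (C, H, W) bottleneck features
encoder_skip = np.zeros((4, 16, 16))  # saved encoder features
fused = skip_fuse(upsample2x(decoder), encoder_skip)
```

A convolution or linear projection would normally follow the concatenation to merge the two feature sets before the next decoder stage.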
The embodiment of the invention provides an improved structure segmentation loss function:
$L_{seg} = w_{Dice}\,L_{Dice} + w_{Cross}\,L_{Cross}$

where $L_{seg}$ is the structure segmentation loss function; $w_{Dice}$ is the weight of $L_{Dice}$; $L_{Dice}$ is the exponential logarithmic Dice loss; $w_{Cross}$ is the weight of $L_{Cross}$; and $L_{Cross}$ is the weighted exponential cross entropy.
The exponential logarithmic Dice loss ($L_{Dice}$) and the weighted exponential cross entropy ($L_{Cross}$) are calculated as:

$L_{Dice} = \mathbf{E}_i\!\left[(-\ln(\mathrm{Dice}_i))^{\gamma_{Dice}}\right]$, with $\mathrm{Dice}_i = \dfrac{2\sum_{x}\delta_{il}(x)\,p_i(x) + \epsilon}{\sum_{x}\left(\delta_{il}(x) + p_i(x)\right) + \epsilon}$

$L_{Cross} = \mathbf{E}_x\!\left[w_l\,(-\ln(p_l(x)))^{\gamma_{Cross}}\right]$

where $x$ is the pixel position; $i$ is a label; $l$ is the real label at $x$; $\delta_{il}(x)$ is the Kronecker delta, equal to 1 when $i = l$ and 0 otherwise; $p_i(x)$ is the SoftMax probability of label $i$; $\epsilon$ is the pseudo count used for additive smoothing of labels missing from the training samples; $\mathbf{E}_i[\cdot]$ denotes the mean over $i$; and $\mathbf{E}_x[\cdot]$ denotes the mean over $x$.
When calculating $\mathrm{Dice}_i$, $\delta_{il}(x)\,p_i(x)$ serves as the portion of pixel $x$ owned by label $i$, and $\epsilon$ is the pseudo count that smooths labels missing from the training samples. The label weight is $w_l = \left(\sum_k f_k / f_l\right)^{0.5}$, where $f_k$ is the frequency of label $k$; this weight reduces the influence of more frequently occurring labels. The exponents $\gamma_{Dice}$ and $\gamma_{Cross}$ further control the nonlinearity of the loss. For simplicity, the invention uses $\gamma_{Dice} = \gamma_{Cross} = \gamma$ here.
The loss function uses the label weight $w_l$ to balance label frequencies, and this focal-style loss also balances simple and difficult samples. Combining exponential and logarithmic transformations of the Focal and Dice losses forces the network to focus on inaccurately predicted regions, yielding finer segmentation boundaries with an accurate data distribution.
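A NumPy sketch of the structure segmentation loss described above, assuming $\gamma_{Dice} = \gamma_{Cross} = \gamma = 1$ and equal loss weights (the patent does not state the weight values); the function and parameter names are hypothetical:

```python
import numpy as np

def exp_log_loss(probs, labels, eps=1.0, gamma=1.0,
                 w_dice=0.5, w_cross=0.5):
    """Exponential logarithmic segmentation loss sketch.

    probs  : (num_labels, num_pixels) SoftMax probabilities p_i(x)
    labels : (num_pixels,) real label l at each pixel x
    eps    : pseudo count for additive smoothing (the patent's epsilon)
    gamma  : shared exponent, gamma_Dice = gamma_Cross = gamma
    w_dice, w_cross : loss weights (illustrative values)
    """
    num_labels, num_pixels = probs.shape
    delta = np.zeros_like(probs)                  # Kronecker delta d_il(x)
    delta[labels, np.arange(num_pixels)] = 1.0

    # Exponential logarithmic Dice loss: mean over labels of (-ln Dice_i)^g
    dice = ((2.0 * (delta * probs).sum(axis=1) + eps)
            / ((delta + probs).sum(axis=1) + eps))
    l_dice = np.mean((-np.log(dice)) ** gamma)

    # Label weights w_l = ((sum_k f_k) / f_l)^0.5 from label frequencies
    freq = delta.sum(axis=1) + eps                # smoothed frequency f_l
    w_label = np.sqrt(freq.sum() / freq)

    # Weighted exponential cross entropy: mean over pixels of w_l(-ln p_l)^g
    p_true = probs[labels, np.arange(num_pixels)]
    l_cross = np.mean(w_label[labels] * (-np.log(p_true)) ** gamma)

    return w_dice * l_dice + w_cross * l_cross
```

As expected of a focal-style loss, a confident and correct prediction yields a smaller value than a uniform one, and the label weights damp the contribution of frequent labels.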
The invention provides a portrait extraction method under a complex background, comprising: dividing a picture to be extracted into first non-overlapping blocks; mapping the first non-overlapping blocks to an arbitrary dimension to obtain dimension parameters; extracting feature quantities from the dimension parameters; splicing the feature quantities into blocks to obtain a first resolution feature map; segmenting a portrait feature map according to the first resolution feature map; dividing the blocks of the portrait feature map into second non-overlapping blocks; extracting the portrait feature quantity of the second non-overlapping blocks and fusing it with the multi-dimensional feature quantities obtained through a jump connection algorithm; converting the portrait feature quantity into image output parameters; and outputting the portrait picture according to the image output parameters. Compared with the prior art, the invention requires no manual portrait extraction by the user and can automatically and accurately extract a portrait against a complex background; the picture to be extracted is not limited by location or shooting position, so the application range is wide.
Example 3
An embodiment of the present invention provides a device for extracting a portrait in a complex background, as shown in fig. 3, including:
a first dividing unit 31, configured to divide the picture to be extracted into first non-overlapping blocks.
And a mapping unit 32, configured to map the first non-overlapping block to any dimension, so as to obtain a dimension parameter.
A first extraction unit 33, configured to extract a feature quantity in the dimension parameter.
And the first splicing unit 34 is used for splicing the feature quantities into blocks to obtain a first resolution feature map.
And a second segmentation unit 35, configured to segment a portrait feature map according to the first resolution feature map.
A third segmentation unit 36 for segmenting the blocks of the portrait feature map into second non-overlapping blocks.
A second extracting unit 37 for extracting the portrait characteristic amount of the second non-overlapping block, and fusing the portrait characteristic amount with the multi-dimensional characteristic amount obtained by the jump connection algorithm.
A conversion unit 38 for converting the portrait characteristic amount into an image output parameter.
An output unit 39 for outputting a portrait picture according to the image output parameter.
It should be noted that the detailed description of each part of this embodiment may refer to the corresponding parts of other embodiments and will not be repeated here.
The invention provides a portrait extraction device under a complex background. The device divides a picture to be extracted into first non-overlapping blocks; maps the first non-overlapping blocks to an arbitrary dimension to obtain dimension parameters; extracts feature quantities from the dimension parameters; splices the feature quantities into blocks to obtain a first resolution feature map; segments a portrait feature map according to the first resolution feature map; divides the blocks of the portrait feature map into second non-overlapping blocks; extracts the portrait feature quantity of the second non-overlapping blocks and fuses it with the multi-dimensional feature quantities obtained through a jump connection algorithm; converts the portrait feature quantity into image output parameters; and outputs the portrait picture according to the image output parameters. Compared with the prior art, the invention requires no manual portrait extraction by the user and can automatically and accurately extract a portrait against a complex background; the picture to be extracted is not limited by location or shooting position, so the application range is wide.
Example 4
The embodiment of the invention provides a portrait extraction device under a complex background, as shown in fig. 4, comprising:
a first dividing unit 41 for dividing the picture to be extracted into first non-overlapping blocks.
And a mapping unit 42, configured to map the first non-overlapping block to any dimension, to obtain a dimension parameter.
A first extraction unit 43, configured to extract a feature quantity in the dimension parameter.
And a first splicing unit 44, configured to splice the feature quantities into blocks to obtain a first resolution feature map.
A third extracting unit 45 for extracting the feature quantity in the dimension parameter.
And a second splicing unit 46, configured to splice the feature quantities into blocks to obtain a second resolution feature map; the resolution of the second resolution feature map is greater than the resolution of the first resolution feature map.
A fourth extraction unit 47 for extracting feature amounts in the dimensional parameters.
A third splicing unit 48, configured to splice the feature quantities into blocks to obtain a third resolution feature map; the resolution of the third resolution feature map is greater than the resolution of the second resolution feature map.
And a second segmentation unit 49, configured to segment the portrait feature map according to the first resolution feature map.
A third segmentation unit 410, configured to segment the block of the portrait characteristic map into a second non-overlapping block.
A first segmentation module 4101 for segmenting the block of the first resolution feature map into blocks of a second resolution feature map.
A second segmentation module 4102 for segmenting the block of the second resolution profile into blocks of a third resolution profile.
A third partitioning module 4103 for partitioning a block of the third resolution feature map into the second non-overlapping block.
A second extracting unit 411 for extracting the portrait feature quantity of the second non-overlapping block, and fusing the portrait feature quantity with the multi-dimensional feature quantity obtained by the jump connection algorithm.
A conversion unit 412 for converting the portrait characteristic amount into an image output parameter.
An output unit 413 for outputting a portrait picture according to the image output parameter.
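The three segmentation modules 4101 to 4103 progressively raise resolution, each splitting the blocks of one feature map into the blocks of the next. A minimal shape-level sketch, where the 2x expansion factor and the pixel-shuffle-style rearrangement are illustrative assumptions rather than details given by the patent:

```python
import numpy as np

def expand_blocks(feat, factor=2):
    """Split each block of a (C, H, W) feature map into factor x factor
    sub-blocks, trading channels for spatial resolution (a patch-expanding
    step; the 2x factor is an illustrative assumption)."""
    c, h, w = feat.shape
    assert c % (factor * factor) == 0, "channels must split evenly"
    out = feat.reshape(c // (factor * factor), factor, factor, h, w)
    out = out.transpose(0, 3, 1, 4, 2)        # interleave the sub-blocks
    return out.reshape(c // (factor * factor), h * factor, w * factor)

# First-resolution map -> second -> third resolution feature maps
first = np.random.rand(16, 8, 8)    # (C, H, W)
second = expand_blocks(first)       # (4, 16, 16): resolution doubled
third = expand_blocks(second)       # (1, 32, 32): resolution doubled again
```

Each step preserves the total number of values while doubling height and width, matching the requirement that the second resolution exceed the first and the third exceed the second.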
It should be noted that the detailed description of each part of this embodiment may refer to the corresponding parts of other embodiments and will not be repeated here.
The invention provides a portrait extraction device under a complex background. The device divides a picture to be extracted into first non-overlapping blocks; maps the first non-overlapping blocks to an arbitrary dimension to obtain dimension parameters; extracts feature quantities from the dimension parameters; splices the feature quantities into blocks to obtain a first resolution feature map; segments a portrait feature map according to the first resolution feature map; divides the blocks of the portrait feature map into second non-overlapping blocks; extracts the portrait feature quantity of the second non-overlapping blocks and fuses it with the multi-dimensional feature quantities obtained through a jump connection algorithm; converts the portrait feature quantity into image output parameters; and outputs the portrait picture according to the image output parameters. Compared with the prior art, the invention requires no manual portrait extraction by the user and can automatically and accurately extract a portrait against a complex background; the picture to be extracted is not limited by location or shooting position, so the application range is wide.
Finally, it should be noted that the above embodiments merely illustrate the technical solution of the present invention and do not limit it. Although the invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art will understand that the technical schemes described in the foregoing embodiments may still be modified, or some or all of their technical features replaced by equivalents; such modifications and substitutions do not depart from the spirit of the invention and are intended to be included within the scope of the appended claims and description. In particular, the technical features mentioned in the respective embodiments may be combined in any manner as long as there is no structural conflict. The present invention is not limited to the specific embodiments disclosed herein, but encompasses all technical solutions falling within the scope of the claims.
Claims (8)
1. A method for extracting a portrait under a complex background, characterized by comprising the following steps:
dividing a picture to be extracted into a first non-overlapping block;
mapping the first non-overlapping block to any dimension to obtain a dimension parameter;
extracting characteristic quantities in the dimension parameters;
splicing the characteristic quantities into blocks to obtain a first resolution characteristic diagram;
dividing a portrait characteristic diagram according to the first resolution characteristic diagram;
dividing the blocks of the portrait characteristic map into second non-overlapping blocks;
extracting the portrait characteristic quantity of the second non-overlapping block, and fusing the portrait characteristic quantity with the multidimensional characteristic quantity obtained by a jump connection algorithm;
converting the portrait characteristic quantity into an image output parameter;
and outputting the portrait picture according to the image output parameters.
2. The method for extracting a portrait under a complex background according to claim 1, further comprising, after splicing the feature quantities into blocks to obtain the first resolution feature map:
extracting characteristic quantities in the dimension parameters;
splicing the characteristic quantities into blocks to obtain a second resolution characteristic diagram; the resolution of the second resolution feature map is greater than the resolution of the first resolution feature map.
3. The method for extracting a portrait under a complex background according to claim 2, further comprising, after splicing the feature quantities into blocks to obtain the second resolution feature map:
extracting characteristic quantities in the dimension parameters;
splicing the characteristic quantities into blocks to obtain a third resolution characteristic diagram; the resolution of the third resolution feature map is greater than the resolution of the second resolution feature map.
4. The method for extracting a portrait under a complex background according to claim 1, wherein dividing the blocks of the portrait feature map into second non-overlapping blocks comprises:
dividing the block of the first resolution feature map into blocks of a second resolution feature map;
dividing the blocks of the second resolution feature map into blocks of a third resolution feature map;
the block of the third resolution feature map is partitioned into the second non-overlapping blocks.
5. A portrait extraction device under a complex background, comprising:
the first segmentation unit is used for segmenting the picture to be extracted into first non-overlapping blocks;
the mapping unit is used for mapping the first non-overlapping block to any dimension to obtain a dimension parameter;
a first extraction unit for extracting feature quantities in the dimension parameters;
the first splicing unit is used for splicing the characteristic quantities into blocks to obtain a first resolution characteristic diagram;
the second segmentation unit is used for segmenting the portrait characteristic diagram according to the first resolution characteristic diagram;
a third dividing unit for dividing the blocks of the portrait characteristic map into second non-overlapping blocks;
a second extraction unit for extracting the portrait characteristic quantity of the second non-overlapping block, and fusing the portrait characteristic quantity with the multidimensional characteristic quantity obtained by the jump connection algorithm;
a conversion unit for converting the portrait characteristic quantity into an image output parameter;
and the output unit is used for outputting the portrait picture according to the image output parameters.
6. The portrait extraction device under a complex background according to claim 5, further comprising:
a third extraction unit for extracting feature quantities in the dimension parameters;
the second splicing unit is used for splicing the characteristic quantities into blocks to obtain a second resolution characteristic diagram; the resolution of the second resolution feature map is greater than the resolution of the first resolution feature map.
7. The portrait extraction device under a complex background according to claim 6, further comprising:
a fourth extraction unit for extracting feature quantities in the dimension parameters;
the third splicing unit is used for splicing the characteristic quantities into blocks to obtain a third resolution characteristic diagram; the resolution of the third resolution feature map is greater than the resolution of the second resolution feature map.
8. The portrait extraction device under a complex background according to claim 5, wherein the third segmentation unit comprises:
a first segmentation module for segmenting the block of the first resolution feature map into blocks of a second resolution feature map;
a second segmentation module for segmenting the blocks of the second resolution feature map into blocks of a third resolution feature map;
and a third segmentation module, configured to segment the block of the third resolution feature map into the second non-overlapping block.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310474489.5A CN116206114B (en) | 2023-04-28 | 2023-04-28 | Portrait extraction method and device under complex background |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116206114A true CN116206114A (en) | 2023-06-02 |
CN116206114B CN116206114B (en) | 2023-08-01 |
Family
ID=86509785
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310474489.5A Active CN116206114B (en) | 2023-04-28 | 2023-04-28 | Portrait extraction method and device under complex background |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116206114B (en) |
Citations (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020043296A1 (en) * | 2018-08-30 | 2020-03-05 | Huawei Technologies Co., Ltd. | Device and method for separating a picture into foreground and background using deep learning |
CN113191953A (en) * | 2021-06-04 | 2021-07-30 | 山东财经大学 | Transformer-based face image super-resolution method |
WO2021169128A1 (en) * | 2020-02-29 | 2021-09-02 | 平安科技(深圳)有限公司 | Method and apparatus for recognizing and quantifying fundus retina vessel, and device and storage medium |
US20210350168A1 (en) * | 2019-03-01 | 2021-11-11 | Huawei Technologies Co., Ltd. | Image segmentation method and image processing apparatus |
CN113870283A (en) * | 2021-09-29 | 2021-12-31 | 深圳万兴软件有限公司 | Image matting method and device, computer equipment and readable storage medium |
CN114494296A (en) * | 2022-01-27 | 2022-05-13 | 复旦大学 | Brain glioma segmentation method and system based on fusion of Unet and Transformer |
CN114511703A (en) * | 2022-01-21 | 2022-05-17 | 苏州医智影科技有限公司 | Migration learning method and system for fusing Swin Transformer and UNet and oriented to segmentation task |
CN114565763A (en) * | 2022-02-28 | 2022-05-31 | 北京百度网讯科技有限公司 | Image segmentation method, apparatus, device, medium, and program product |
CN114972746A (en) * | 2022-04-13 | 2022-08-30 | 湖南大学 | Medical image segmentation method based on multi-resolution overlapping attention mechanism |
US20220319155A1 (en) * | 2020-02-21 | 2022-10-06 | Boe Technology Group Co., Ltd. | Image Processing Method, Image Processing Apparatus, and Device |
CN115330817A (en) * | 2022-08-25 | 2022-11-11 | 上海健康医学院 | Pneumothorax lesion segmentation method based on Swin-Unet and morphological processing and related equipment |
CN115457043A (en) * | 2022-03-23 | 2022-12-09 | 苏州迭代智能医疗科技有限公司 | Image segmentation network based on overlapped self-attention deformer framework U-shaped network |
CN115471470A (en) * | 2022-09-14 | 2022-12-13 | 安徽大学 | Esophageal cancer CT image segmentation method |
CN115482382A (en) * | 2022-09-17 | 2022-12-16 | 北京工业大学 | Image semantic segmentation method based on Transformer architecture |
CN115984560A (en) * | 2022-12-26 | 2023-04-18 | 杭州电子科技大学 | Image segmentation method based on CNN and Transformer |
CN115994914A (en) * | 2022-09-06 | 2023-04-21 | 中南民族大学 | ATFormer architecture for medical image segmentation and corresponding method |
Non-Patent Citations (5)
Title |
---|
CHENGYIN LI et al.: "FocalUNETR: A Focal Transformer for Boundary-aware Segmentation of CT Images", arXiv:2210.03189v1, pages 1 - 13 *
JIANWEI YANG et al.: "Focal Self-attention for Local-Global Interactions in Vision Transformers", arXiv:2107.00641v1, pages 1 - 21 *
WANG Heng: "Research on Breast Cancer Pathological Image Classification Based on Deep Learning", China Master's Theses Full-text Database, Medicine and Health Sciences, no. 2023, pages 072 - 1719 *
WANG Xin: "Research on Portrait Segmentation Methods Based on Deep Learning", China Master's Theses Full-text Database, Information Science and Technology, no. 2023, pages 138 - 1939 *
XIN Mulin: "Multi-angle Intrahepatic Vessel Segmentation Based on Multi-scale Analysis and Data Equalization", China Master's Theses Full-text Database, Medicine and Health Sciences, no. 2022, pages 064 - 12 *
Also Published As
Publication number | Publication date |
---|---|
CN116206114B (en) | 2023-08-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111191663B (en) | License plate number recognition method and device, electronic equipment and storage medium | |
CN113362329B (en) | Method for training focus detection model and method for recognizing focus in image | |
CN113936256A (en) | Image target detection method, device, equipment and storage medium | |
CN110717851A (en) | Image processing method and device, neural network training method and storage medium | |
CN110555433A (en) | Image processing method, image processing device, electronic equipment and computer readable storage medium | |
CN116645592B (en) | Crack detection method based on image processing and storage medium | |
CN114758288A (en) | Power distribution network engineering safety control detection method and device | |
CN105701489A (en) | Novel digital extraction and identification method and system thereof | |
CN112836625A (en) | Face living body detection method and device and electronic equipment | |
CN112036260A (en) | Expression recognition method and system for multi-scale sub-block aggregation in natural environment | |
CN113378897A (en) | Neural network-based remote sensing image classification method, computing device and storage medium | |
CN113012177A (en) | Three-dimensional point cloud segmentation method based on geometric feature extraction and edge perception coding | |
Chen et al. | SARAS-net: scale and relation aware siamese network for change detection | |
CN116740422A (en) | Remote sensing image classification method and device based on multi-mode attention fusion technology | |
CN116977674A (en) | Image matching method, related device, storage medium and program product | |
CN115690797A (en) | Character recognition method, device, equipment and storage medium | |
Pan et al. | An adaptive multifeature method for semiautomatic road extraction from high-resolution stereo mapping satellite images | |
Li et al. | Maskformer with improved encoder-decoder module for semantic segmentation of fine-resolution remote sensing images | |
CN116206114B (en) | Portrait extraction method and device under complex background | |
Chen et al. | Towards deep and efficient: A deep Siamese self-attention fully efficient convolutional network for change detection in VHR images | |
Chacon-Murguia et al. | Moving object detection in video sequences based on a two-frame temporal information CNN | |
CN116310832A (en) | Remote sensing image processing method, device, equipment, medium and product | |
CN115713624A (en) | Self-adaptive fusion semantic segmentation method for enhancing multi-scale features of remote sensing image | |
Chen et al. | Exploring efficient and effective generative adversarial network for thermal infrared image colorization | |
CN113486879A (en) | Image area suggestion frame detection method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |