CN111639641A - Clothing area acquisition method and device - Google Patents

Clothing area acquisition method and device

Info

Publication number
CN111639641A
CN111639641A (application CN202010365522.7A)
Authority
CN
China
Prior art keywords
skin
region
area
hand
clothing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010365522.7A
Other languages
Chinese (zh)
Other versions
CN111639641B (en)
Inventor
黄磊
魏志强
张文锋
张明林
魏冠群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China
Priority to CN202010365522.7A
Publication of CN111639641A
Application granted
Publication of CN111639641B
Legal status: Active
Anticipated expiration

Classifications

    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio
    • G06N3/045 Neural networks; Combinations of networks
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V10/30 Noise filtering
    • G06V10/50 Extraction of image or video features by performing operations within image blocks or by using histograms, e.g. histogram of oriented gradients [HoG], or by projection analysis
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Abstract

The invention discloses a clothing region acquisition method and device. The method comprises the following steps: step 1, locating the hand positions; step 2, initial clothing region positioning: after the hand regions are obtained, the initial clothing region is determined proportionally in the area between and below the two hands; step 3, precise clothing region acquisition: after the initial clothing region is obtained, the precise clothing region is obtained by image segmentation. The method is suitable for acquiring a clothing region that is not worn on a human body, meeting the needs of smart home and smart retail applications. Furthermore, the invention proposes using the hand regions as the reference for clothing region acquisition, which is more adaptive than acquiring the region from the visual characteristics of the garment alone.

Description

Clothing area acquisition method and device
Technical Field
The invention belongs to the technical field of computer vision, relates to image processing and automatic identification technology, and particularly relates to a clothing region acquisition method and device.
Background
The automatic acquisition of the clothing region plays an important role in many application fields, such as person re-identification in intelligent surveillance, image processing and retouching, and dressing-trend analysis. All of these applications extract and analyze the clothing region on a human body, and the means of obtaining the clothing region fall mainly into two categories: methods based on face detection and methods based on human body pose estimation.
The face-detection-based method first obtains the face position and area through face detection, and then, according to the structural characteristics of the human body, designates a region in a certain proportional range below the face as the clothing region. The pose-estimation-based method first obtains the human pose through pose estimation; the result contains the main joint points of the body, such as the shoulder and waist joints, and the clothing region is determined from the positions of these joint points.
These methods are effective for locating and acquiring a garment region worn on a human body, but they fail when the garment is not worn on a body, and such application scenes exist in the smart home and smart retail fields.
In smart homes, an intelligent washing machine first obtains the clothing region from video collected by its camera and then analyzes the style, material and so on of the garment, which at that moment is not worn on a human body. Similarly, in smart retail, when garments are purchased and paid for at an automatic checkout counter, the purchased clothes must be captured by a camera to identify their brand, model and so on, and again the garments are not worn on a human body. The existing techniques based on face detection and human pose estimation are unsuitable for these two occasions.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a clothing region acquisition method and device that do not use the face region as a reference. By locating the clothing region from the position of the human hands, they can effectively acquire a clothing region that is not worn on a human body, and can be widely applied in fields such as smart home and smart retail, for example intelligent washing machines and automatic identification of retail articles.
In order to solve the technical problems, the invention adopts the technical scheme that:
a garment region acquisition method comprises the following steps:
step 1, positioning the hand position: the method comprises a human hand positioning method based on skin color and/or a human hand positioning method based on visual characteristics;
step 2, initial clothing region positioning: after the hand regions are obtained, the initial clothing region is determined proportionally in the area between and below the two hands;
step 3, obtaining a precise clothing region: and after the initial clothing region is obtained, obtaining an accurate clothing region by means of image segmentation.
Further, in step 2, after the hand regions are obtained in step 1, the circumscribed circle of each hand region is obtained, giving the diameter h1 of the circle circumscribing the left-hand region, the diameter h2 of the circle circumscribing the right-hand region, and the distance h3 between the centers of the two circumscribed circles; the initial clothing region is a rectangle whose upper edge lies at distance h4 from the line connecting the two circle centers, with width h5 and height h6, where h4 ranges over (0, h1+h2), h5 over (h3/2, 2×h3), and h6 over (h3/3, 4×h3).
Preferably, h4 takes the value (h1+h2)/2, h5 takes the value h3, and h6 takes the value h3.
Further, the human hand positioning method based on skin color in step 1 comprises the following steps:
(1) firstly, obtaining a skin color area in an image/video frame by a skin color detection method: constructing a skin histogram model and a non-skin histogram model, and judging the probability of any pixel belonging to the skin class by adopting Bayesian decision according to the histogram model to obtain a skin color area in an image/video frame;
(2) then, morphologically processing the obtained skin color area to remove a noise area;
(3) then, analyzing the area of each skin-color connected region in the image, and setting area thresholds according to the actual application to eliminate regions that are too large or too small;
(4) finally, among the remaining skin-color regions, when there exist exactly two regions whose area difference is smaller than a threshold, they are regarded as the hand regions, the threshold ranging from 5% to 30%.
Preferably, the step (1) in the step 1 is implemented by the following steps:
firstly, a skin histogram model and a non-skin histogram model are constructed, and the specific steps are as follows:
1) an image data set is constructed comprising two categories: images containing human skin and images completely free of any skin;
2) preprocessing a data set, and normalizing all images in the data set to be uniform in size;
3) manually segmenting the skin areas in the images containing human skin;
4) constructing a skin histogram model, reading R, G, B values corresponding to each pixel in the skin images row by row and column by column, and counting color values corresponding to the pixel points of all the skin images to obtain the skin histogram model;
5) constructing a non-skin histogram model by adopting the same method as the step 4);
secondly, using a histogram model to detect skin color, and specifically comprising the following steps:
1) after the skin histogram model and the non-skin histogram model are constructed, the relevant probabilities can be calculated for any given pixel from its color value, with the following formulas:
P(RGB|SKIN)=S[RGB]/SN
P(RGB|NONSKIN)=N[RGB]/NN
wherein S[RGB] is the count in the bin of the skin histogram model corresponding to the RGB color value, N[RGB] is the count in the corresponding bin of the non-skin histogram model, and SN and NN are the total numbers of pixels in the skin and non-skin histogram models respectively.
2) The probability that any pixel belongs to a skin pixel is judged by adopting Bayesian decision, and the method specifically comprises the following steps:
a. calculating the prior probability P (SKIN) of any pixel belonging to the skin class according to the following calculation formula:
P(SKIN)=SN/(SN+NN)
b. calculating the prior probability P (NONSKIN) that any pixel belongs to the non-skin class according to the following formula:
P(NONSKIN)=1-P(SKIN)
c. according to steps 1), a and b, calculating the probability that a given pixel belongs to the skin class by Bayesian decision:
P(SKIN|RGB)=P(RGB|SKIN)P(SKIN)/[P(RGB|SKIN)P(SKIN)+P(RGB|NONSKIN)P(NONSKIN)]
d. setting a threshold T; if the probability obtained in step c is greater than T, the pixel is judged to be a skin-color pixel, otherwise a non-skin-color pixel.
Preferably, the step (3) in the step 1 is implemented as follows:
taking the result of the step (2) as input, and obtaining the outline of the skin color area;
traversing the contours to obtain the area of each contour;
setting a minimum area threshold value and a maximum area threshold value, and screening out regions with areas larger than the minimum area threshold value and smaller than the maximum area threshold value as human hand regions;
and finally, setting an aspect-ratio constraint for the hand region in the image to further screen the hand regions.
Further, the method for positioning human hand based on visual features in step 1 comprises the following steps:
(1) marking a hand area of an image in an application scene to obtain a training set;
(2) inputting the training set into a convolutional neural network to obtain a feature map, wherein the convolutional neural network adopts an SE-ResNet-101 network;
(3) taking the feature map obtained in step (2) as input, and selecting target regions on the feature map with the region proposal network;
(4) inputting the feature map obtained in step (2) and the candidate region boxes obtained in step (3) into a region-of-interest (RoI) pooling layer, and using it to extract the features of each target box on the feature map;
(5) after the RoI pooling layer, a fixed-size feature map is obtained, and fully connected layers are then used to judge the hand category and predict the target box;
(6) after the visual-feature-based hand detector is obtained, it is used for hand detection in the corresponding application scene, predicting the top-left position (x, y) and the width and height (w, h) of the target box; if and only if two hands appear simultaneously does the method proceed to the next step, initial clothing region positioning.
Further, in step 3, after the initial clothing region is obtained, the precise clothing region is obtained by segmentation with the GrabCut or GraphCut foreground/background segmentation method; the specific steps are as follows:
(1) taking the clothing region obtained in the step 2 as a foreground seed point of a GrabCut or GraphCut method;
(2) setting background region seed points, and setting a two-hand region, a region above two hands, a left-hand region and a right-hand region as background region seed points;
(3) executing GrabCut or GraphCut algorithm to obtain a clothing foreground mask image;
(4) performing an AND operation between the foreground mask image obtained in step (3) and the original image to obtain the precise garment foreground image.
The invention also provides a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method steps as set forth above.
The present invention also provides a clothing region acquisition apparatus, including:
the hand positioning module is used for positioning the hand position;
the initial clothing region positioning module is used for acquiring an initial clothing region;
and the precise clothing region acquisition module is used for acquiring the precise clothing region.
Compared with the prior art, the invention has the advantages that:
(1) The invention targets the acquisition of a clothing region that is not worn on a human body and uses the hand regions as the basis for clothing region acquisition, which better meets the needs of smart home, smart retail and similar fields, such as intelligent washing machines and automatic identification of retail articles.
(2) By optimizing the hand positioning method, the invention accurately determines the hand regions and thereby identifies the clothing region; it can effectively acquire a clothing region that is not worn on a human body, and compared with acquiring the clothing region only from the visual characteristics of the garment, it is more adaptive and has a wider field of application.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.
FIG. 1 is a schematic view of an initial garment region acquisition of the present invention;
FIG. 2 is a flow chart of a method of the present invention;
FIG. 3 is a schematic diagram of the embedding manner of the SE module of embodiment 1 in a ResNet-101 network;
fig. 4 is a diagram showing a structure of a regional candidate network according to embodiment 1.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
Example 1
As shown in fig. 2, a clothing region acquiring method includes the following steps:
step 1, positioning the hand position
In smart home and smart retail, clothing images or videos are collected through a camera, and in order to better obtain information such as style of clothing, the clothing is generally unfolded in front of the camera when being shot, as shown in fig. 1. In this case, the human hand is a very good reference, so in the method proposed in this patent, the position of the human hand is first located.
Specifically, the method for positioning the human hand position comprises a human hand positioning method based on skin color and/or a human hand positioning method based on visual characteristics.
A human hand positioning method based on skin color comprises the following steps:
(1) firstly, obtaining a skin color area in an image/video frame by a skin color detection method: and constructing a skin histogram model and a non-skin histogram model, and judging the probability of any pixel belonging to the skin class by adopting Bayesian decision according to the histogram models to obtain a skin color area in the image/video frame.
The specific implementation steps are as follows:
firstly, a skin histogram model and a non-skin histogram model are constructed, and the specific steps are as follows:
1) an image data set is constructed comprising two categories: images containing human skin and images completely free of any skin;
2) preprocessing a data set, and normalizing all images in the data set to be uniform in size;
3) manually segmenting the skin areas in the images containing human skin;
4) constructing a skin histogram model, reading R, G, B values corresponding to each pixel in the skin images row by row and column by column, and counting color values corresponding to the pixel points of all the skin images to obtain the skin histogram model;
5) constructing a non-skin histogram model by adopting the same method as the step 4);
secondly, using a histogram model to detect skin color, and specifically comprising the following steps:
1) after the skin histogram model and the non-skin histogram model are constructed, the relevant probabilities can be calculated for any given pixel from its color value, with the following formulas:
P(RGB|SKIN)=S[RGB]/SN
P(RGB|NONSKIN)=N[RGB]/NN
wherein S[RGB] is the count in the bin of the skin histogram model corresponding to the RGB color value, N[RGB] is the count in the corresponding bin of the non-skin histogram model, and SN and NN are the total numbers of pixels in the skin and non-skin histogram models respectively.
2) The probability that any pixel belongs to a skin pixel is judged by adopting Bayesian decision, and the method specifically comprises the following steps:
a. calculating the prior probability P (SKIN) of any pixel belonging to the skin class according to the following calculation formula:
P(SKIN)=SN/(SN+NN)
b. calculating the prior probability P (NONSKIN) that any pixel belongs to the non-skin class according to the following formula:
P(NONSKIN)=1-P(SKIN)
c. according to the steps 1), a and b, calculating the probability that the given pixel belongs to the skin class by adopting Bayesian decision:
P(SKIN|RGB)=P(RGB|SKIN)P(SKIN)/[P(RGB|SKIN)P(SKIN)+P(RGB|NONSKIN)P(NONSKIN)]
d. setting a threshold T; if the probability obtained in step c is greater than T, the pixel is judged to be a skin-color pixel, otherwise a non-skin-color pixel.
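A minimal NumPy sketch of this Bayesian skin-color classifier follows; the 32-bin RGB quantization and the threshold T = 0.4 are illustrative assumptions, not values from the patent.

    import numpy as np

    def build_histogram(pixels, bins=32):
        # pixels: (N, 3) uint8 RGB values from the (non-)skin training images
        idx = (pixels // (256 // bins)).astype(int)
        hist = np.zeros((bins, bins, bins))
        np.add.at(hist, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
        return hist

    def skin_mask(img, skin_hist, nonskin_hist, T=0.4, bins=32):
        # per-pixel P(SKIN|RGB) by Bayes' rule, thresholded at T
        SN, NN = skin_hist.sum(), nonskin_hist.sum()
        p_skin, p_non = SN / (SN + NN), NN / (SN + NN)   # priors
        idx = (img // (256 // bins)).astype(int)
        p_rgb_skin = skin_hist[idx[..., 0], idx[..., 1], idx[..., 2]] / SN
        p_rgb_non = nonskin_hist[idx[..., 0], idx[..., 1], idx[..., 2]] / NN
        post = p_rgb_skin * p_skin / (
            p_rgb_skin * p_skin + p_rgb_non * p_non + 1e-10)
        return (post > T).astype(np.uint8) * 255  # binary skin-color mask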
(2) According to the skin-color region obtained in step (1), performing morphological processing such as erosion and dilation on it to remove noise regions.
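A one-step OpenCV sketch of this morphological clean-up; the 5×5 elliptical kernel is an illustrative assumption, and mask is the binary output of the skin detector above.

    import cv2

    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    # opening (erosion then dilation) removes small noise blobs;
    # closing (dilation then erosion) fills small holes inside skin regions
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)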
(3) Then, analyzing the area of each skin-color connected region in the image, and setting area thresholds according to the actual application to eliminate regions that are too large or too small. The specific implementation steps are as follows:
taking the result of the step (2) as input, and obtaining the outline of the skin color area;
traversing the contours to obtain the area of each contour;
setting a minimum area threshold value and a maximum area threshold value, and screening out regions with areas larger than the minimum area threshold value and smaller than the maximum area threshold value as human hand regions;
and finally, setting an aspect-ratio constraint for the hand region in the image to further screen the hand regions, as sketched below.
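A hedged OpenCV sketch of the four screening steps above; the area and aspect-ratio thresholds are application-dependent assumptions.

    import cv2

    def candidate_hand_regions(mask, min_area=500, max_area=50000, max_ratio=3.0):
        # keep skin contours whose area and aspect ratio are plausible for a hand
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        hands = []
        for c in contours:
            area = cv2.contourArea(c)
            if not (min_area < area < max_area):
                continue                      # too small or too large
            _, _, w, h = cv2.boundingRect(c)
            if max(w, h) / max(min(w, h), 1) <= max_ratio:
                hands.append(c)               # aspect ratio also plausible
        return hands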
(4) Finally, hand region determination: among the remaining skin-color regions, when there are exactly two regions of similar area (the difference between the two areas smaller than a threshold, range 5%-30%, preferably 15%) at close to the same vertical position, they are regarded as the hand regions.
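A sketch of this pairing rule under the preferred 15% area threshold; the vertical-position tolerance is an illustrative assumption.

    import cv2

    def pick_hand_pair(contours, area_tol=0.15, y_tol=0.2):
        # find exactly one pair of regions with similar area at a similar height
        pairs = []
        for i in range(len(contours)):
            for j in range(i + 1, len(contours)):
                a1, a2 = cv2.contourArea(contours[i]), cv2.contourArea(contours[j])
                _, y1, _, h1 = cv2.boundingRect(contours[i])
                _, y2, _, h2 = cv2.boundingRect(contours[j])
                if (abs(a1 - a2) / max(a1, a2) < area_tol
                        and abs(y1 - y2) < y_tol * max(h1, h2)):
                    pairs.append((contours[i], contours[j]))
        return pairs[0] if len(pairs) == 1 else None  # hands iff exactly one pair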
A human hand positioning method based on visual features comprises the following steps:
(1) marking a hand area of an image in an application scene to obtain a training set;
(2) Inputting the training set into a convolutional neural network to obtain a feature map; the network adopts SE-ResNet-101 for feature extraction, which adds an SE (squeeze-and-excitation) module to the ResNet-101 backbone, with the embedding manner shown in FIG. 3.
(3) Taking the feature map obtained in step (2) as input, target regions are selected on the feature map by the region proposal network, whose structure is shown in FIG. 4.
The method mainly comprises the following steps:
firstly, a convolution with a 3×3 kernel is applied to the convolutional feature map in the region proposal network, obtaining a 512-dimensional vector at each position of the feature map;
secondly, the region proposal network sets k anchors of fixed aspect ratio and size at each pixel position, i.e. each point in the feature map generates k hand candidate regions on the original image; to better suit hand detection, three aspect ratios (1:1, 1:2, 1:3) and four scales (16, 32, 64, 128) are designed, giving k = 12 (see the sketch after the loss formulas below);
thirdly, the feature vector obtained at each position is followed by two branches: one branch is a classification layer performing hand/non-hand discrimination for each region; the other is a regression layer using k regression models to regress the position and size of the rectangular box;
fourthly, calculating the classification loss and the regression loss of the region proposal network, and optimizing the model by gradient descent until the change in the loss is smaller than a set threshold. The classification loss is the cross-entropy function:

Lcls = -(1/N) Σi [ pi* log(pi) + (1 - pi*) log(1 - pi) ]

where pi is the predicted probability that the rectangular box corresponding to anchor i contains the target, and pi* is the label of the anchor: pi* = 1 when the anchor belongs to the positive samples, and pi* = 0 when it belongs to the negative samples.
The regression loss is calculated with the smooth L1 loss function:

Lreg = Σi pi* · smoothL1(ti - ti*)

where smoothL1(x) = 0.5x² if |x| < 1, and |x| - 0.5 otherwise. Both ti* and ti are four-element vectors containing the top-left abscissa and ordinate and the width and height: ti* represents the offset of the real target box relative to the anchor, and ti is the network prediction, representing the offset of the predicted box relative to the anchor.
ti* and ti are calculated as follows:

tx = (x - xa)/wa,  ty = (y - ya)/ha,  tw = log(w/wa),  th = log(h/ha)
tx* = (x* - xa)/wa,  ty* = (y* - ya)/ha,  tw* = log(w*/wa),  th* = log(h*/ha)

where x*, y*, w*, h* are the top-left abscissa and ordinate and the width and height of the real target box, xa, ya, wa, ha are those of the anchor, and x, y, w, h are those of the target box predicted by the network.
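An illustrative NumPy sketch of the anchor shapes and the two losses above; this is a reimplementation under stated assumptions, not the patent's code. Anchors passed to the encoder are assumed to already be full (x, y, w, h) boxes placed on the image, and placement on the feature-map grid is omitted.

    import numpy as np

    def make_anchor_shapes(ratios=(1.0, 2.0, 3.0), scales=(16, 32, 64, 128)):
        # k = 3 ratios x 4 scales = 12 (width, height) pairs per position
        return np.array([(s * np.sqrt(1.0 / r), s * np.sqrt(r))
                         for r in ratios for s in scales])

    def encode_offsets(boxes, anchors):
        # boxes, anchors: (N, 4) arrays in (x, y, w, h) top-left form
        tx = (boxes[:, 0] - anchors[:, 0]) / anchors[:, 2]
        ty = (boxes[:, 1] - anchors[:, 1]) / anchors[:, 3]
        tw = np.log(boxes[:, 2] / anchors[:, 2])
        th = np.log(boxes[:, 3] / anchors[:, 3])
        return np.stack([tx, ty, tw, th], axis=1)

    def smooth_l1(x):
        ax = np.abs(x)
        return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

    def rpn_losses(p, p_star, t, t_star):
        # p: predicted object probabilities, p_star: 0/1 anchor labels,
        # t, t_star: predicted and ground-truth offsets, each (N, 4)
        eps = 1e-7
        cls = -np.mean(p_star * np.log(p + eps)
                       + (1 - p_star) * np.log(1 - p + eps))
        reg = np.sum(p_star[:, None] * smooth_l1(t - t_star))  # positives only
        return cls, reg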
(4) Inputting the feature map obtained in step (2) and the candidate region boxes obtained in step (3) into a region-of-interest (RoI) pooling layer, and using it to extract the features of each target box on the feature map, specifically:
firstly, mapping the target box obtained in step (3) onto the feature map from step (2) to obtain the target box on the feature map, i.e. the region of interest;
secondly, dividing the mapped target box into k×k units, where k is the output dimension;
thirdly, applying a max pooling operation within each unit.
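A minimal NumPy sketch of this RoI max pooling, assuming the box has already been mapped to integer feature-map coordinates (channels-last layout) and spans at least k pixels in each direction.

    import numpy as np

    def roi_max_pool(feature_map, box, k=7):
        # box = (x, y, w, h) on the feature map; output is k x k x channels
        x, y, w, h = box
        roi = feature_map[y:y + h, x:x + w]          # region of interest
        ys = np.linspace(0, h, k + 1, dtype=int)     # cell boundaries
        xs = np.linspace(0, w, k + 1, dtype=int)
        out = np.empty((k, k, feature_map.shape[2]))
        for i in range(k):
            for j in range(k):
                cell = roi[ys[i]:max(ys[i + 1], ys[i] + 1),
                           xs[j]:max(xs[j + 1], xs[j] + 1)]
                out[i, j] = cell.max(axis=(0, 1))    # max pool each cell
        return out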
(5) After the RoI pooling layer, a fixed-size feature map is obtained, and fully connected layers are then used to judge the hand category and predict the target box: the pooled features first pass through fully connected layers, after which two fully connected branches follow, the first classifying by computing the probability of each category with a SoftMax function and the second predicting the position of the target box.
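A minimal sketch of this two-branch head, assuming a PyTorch implementation; the 1024-unit layer widths are illustrative assumptions, not taken from the patent.

    import torch
    import torch.nn as nn

    class DetectionHead(nn.Module):
        def __init__(self, in_features, num_classes=2):  # hand / background
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(in_features, 1024), nn.ReLU(),
                nn.Linear(1024, 1024), nn.ReLU())
            self.cls_branch = nn.Linear(1024, num_classes)  # SoftMax class scores
            self.reg_branch = nn.Linear(1024, 4)            # target-box offsets

        def forward(self, roi_feats):
            # roi_feats: (N, k, k, C) pooled features; in_features must be k*k*C
            x = self.fc(roi_feats.flatten(1))
            return torch.softmax(self.cls_branch(x), dim=1), self.reg_branch(x)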
(6) After the visual-feature-based hand detector is obtained, it is used for hand detection in the corresponding application scene, predicting the top-left position (x, y) and the width and height (w, h) of the target box; if and only if two hands appear simultaneously does the method proceed to the next step, initial clothing region positioning.
Step 2, initial clothing area positioning
After the hand regions are obtained, the initial position of the clothing region is obtained proportionally in the area between and below the two hands, as shown in FIG. 1, where h1 is the diameter of the circle circumscribing the left-hand region, h2 is the diameter of the circle circumscribing the right-hand region, h3 is the distance between the centers of the two circumscribed circles, and the dashed box is the obtained initial clothing region: a rectangle whose upper edge lies at distance h4 from the line connecting the two circle centers, with width h5 and height h6. h1-h6 are calculated as follows:
after the hand area is obtained in the step 1, the circumscribed circle of the hand area is obtained, and then the diameter h1 of the circumscribed circle of the left hand area, the diameter h2 of the circumscribed circle of the right hand area and the distance h3 between the centers of the circumscribed circles of the left hand area and the right hand area are obtained.
The value of h4 is in (0, h1+h2), and is generally (h1+h2)/2.
The value of h5 is in (h3/2, 2×h3), and is generally h3.
The value of h6 is in (h3/3, 4×h3), and is generally h3.
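With OpenCV these quantities can be derived directly from the two hand contours. A hedged sketch using the preferred values h4 = (h1+h2)/2 and h5 = h6 = h3, and assuming the line connecting the hand centers is roughly horizontal:

    import cv2
    import numpy as np

    def initial_clothing_region(left_contour, right_contour):
        # circumscribed circles of the two hand regions
        (x1, y1), r1 = cv2.minEnclosingCircle(left_contour)
        (x2, y2), r2 = cv2.minEnclosingCircle(right_contour)
        h1, h2 = 2 * r1, 2 * r2                   # circle diameters
        h3 = float(np.hypot(x2 - x1, y2 - y1))    # distance between centers
        h4, h5, h6 = (h1 + h2) / 2, h3, h3        # preferred values
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2     # midpoint of the center line
        top = cy + h4                             # image y axis points downward
        return int(cx - h5 / 2), int(top), int(h5), int(h6)  # (x, y, w, h)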
Step 3, obtaining accurate clothing region
After the initial clothing region is obtained, the precise clothing region is obtained by segmentation with a foreground/background segmentation method such as GrabCut (or GraphCut); the specific steps are as follows:
(1) taking the clothing region obtained in the step 2 as a foreground seed point of the GrabCut method;
(2) setting background region seed points, and setting a two-hand region, a region above two hands, a left-hand region and a right-hand region as background region seed points;
(3) executing the GrabCut algorithm to obtain the clothing foreground mask image;
(4) performing an AND operation between the foreground mask image obtained in step (3) and the original image to obtain the precise garment foreground image.
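A minimal OpenCV sketch of steps (1)-(4) using mask-initialised GrabCut; the 5 iterations and the exact background seeding (the hand boxes plus the area above them) are illustrative simplifications of the seeding described above.

    import cv2
    import numpy as np

    def refine_clothing_region(img, rect, hand_boxes):
        # rect and hand_boxes are (x, y, w, h) tuples in image coordinates
        mask = np.full(img.shape[:2], cv2.GC_PR_BGD, np.uint8)  # probable bg
        x, y, w, h = rect
        mask[y:y + h, x:x + w] = cv2.GC_PR_FGD                  # clothing seeds
        for hx, hy, hw, hh in hand_boxes:
            mask[:hy + hh, hx:hx + hw] = cv2.GC_BGD             # hands + above
        bgd = np.zeros((1, 65), np.float64)
        fgd = np.zeros((1, 65), np.float64)
        cv2.grabCut(img, mask, None, bgd, fgd, 5, cv2.GC_INIT_WITH_MASK)
        fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0)
        # AND the foreground mask with the original image (step (4))
        return cv2.bitwise_and(img, img, mask=fg.astype(np.uint8))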
Example 2
This embodiment provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the clothing region acquiring method according to embodiment 1, which are not described herein again.
Example 3
The present embodiment provides a clothing region acquiring apparatus, including:
and the hand positioning module is used for positioning the hand position. The method for positioning the hand position comprises two methods, one is a human hand positioning method based on skin color, and the other is a human hand positioning method based on visual characteristics.
And the initial clothing region positioning module is used for acquiring the initial clothing region. After the hand area is obtained, the initial position of the clothing area is obtained according to a certain proportion in the area between two hands and below the two hands.
And the precise clothing region acquisition module is used for acquiring the precise clothing region.
The specific implementation method for implementing the functions of each module of the device can be referred to as embodiment 1, and details are not described here.
The same or similar parts among the various embodiments of the present description may be referred to each other, and each embodiment is described with emphasis on differences from the other embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus a necessary hardware platform, and certainly may also be implemented entirely in hardware, but in many cases the former is the better embodiment. Based on this understanding, the part of the technical solution of the present invention that contributes over the prior art can be embodied in the form of a software product, which can be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disk, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods of the embodiments, or of some parts of the embodiments, of the present invention.
It is understood that the above description is not intended to limit the present invention, and the present invention is not limited to the above examples, and those skilled in the art should understand that they can make various changes, modifications, additions and substitutions within the spirit and scope of the present invention.

Claims (10)

1. A garment region acquisition method is characterized by comprising the following steps:
step 1, positioning the hand position: the method comprises a human hand positioning method based on skin color and/or a human hand positioning method based on visual characteristics;
step 2, initial clothing region positioning: after the hand regions are obtained, the initial clothing region is determined proportionally in the area between and below the two hands;
step 3, obtaining a precise clothing region: and after the initial clothing region is obtained, obtaining an accurate clothing region by means of image segmentation.
2. The clothing region acquisition method according to claim 1, wherein in step 2, after the hand regions are obtained in step 1, the circumscribed circle of each hand region is obtained, giving the diameter h1 of the circle circumscribing the left-hand region, the diameter h2 of the circle circumscribing the right-hand region, and the distance h3 between the centers of the two circumscribed circles; the initial clothing region is a rectangle whose upper edge lies at distance h4 from the line connecting the two circle centers, with width h5 and height h6, where h4 ranges over (0, h1+h2), h5 over (h3/2, 2×h3), and h6 over (h3/3, 4×h3).
3. The clothing region acquisition method according to claim 2, wherein h4 takes the value (h1+h2)/2, h5 takes the value h3, and h6 takes the value h3.
4. The clothing region acquisition method according to any one of claims 1 to 3, wherein the skin color based human hand positioning method in step 1 comprises the following steps:
(1) firstly, obtaining a skin color area in an image/video frame by a skin color detection method: constructing a skin histogram model and a non-skin histogram model, and judging the probability of any pixel belonging to the skin class by adopting Bayesian decision according to the histogram model to obtain a skin color area in an image/video frame;
(2) then, morphologically processing the obtained skin color area to remove a noise area;
(3) then, analyzing the area of each skin-color connected region in the image, and setting area thresholds according to the actual application to eliminate regions that are too large or too small;
(4) finally, among the remaining skin-color regions, when there exist exactly two regions whose area difference is smaller than a threshold, they are regarded as the hand regions, the threshold ranging from 5% to 30%.
5. The clothing region acquisition method according to claim 4, wherein the step (1) in the step 1 is implemented by the following steps:
firstly, a skin histogram model and a non-skin histogram model are constructed, and the specific steps are as follows:
1) an image data set is constructed comprising two categories: images containing human skin and images completely free of any skin;
2) preprocessing a data set, and normalizing all images in the data set to be uniform in size;
3) manually segmenting the skin areas in the images containing human skin;
4) constructing a skin histogram model, reading R, G, B values corresponding to each pixel in the skin images row by row and column by column, and counting color values corresponding to the pixel points of all the skin images to obtain the skin histogram model;
5) constructing a non-skin histogram model by adopting the same method as the step 4);
secondly, using a histogram model to detect skin color, and specifically comprising the following steps:
1) after the skin histogram model and the non-skin histogram model are constructed, the relevant probabilities can be calculated for any given pixel from its RGB color value, with the following formulas:
P(RGB|SKIN)=S[RGB]/SN
P(RGB|NONSKIN)=N[RGB]/NN
wherein S[RGB] is the count in the bin of the skin histogram model corresponding to the RGB color value, N[RGB] is the count in the corresponding bin of the non-skin histogram model, and SN and NN are the total numbers of pixels in the skin and non-skin histogram models respectively.
2) The probability that any pixel belongs to a skin pixel is judged by adopting Bayesian decision, and the method specifically comprises the following steps:
a. calculating the prior probability P (SKIN) of any pixel belonging to the skin class according to the following calculation formula:
P(SKIN)=SN/(SN+NN)
b. calculating the prior probability P (NONSKIN) that any pixel belongs to the non-skin class according to the following formula:
P(NONSKIN)=1-P(SKIN)
c. according to steps 1), a and b, calculating the probability that a given pixel belongs to the skin class by Bayesian decision:
P(SKIN|RGB)=P(RGB|SKIN)P(SKIN)/[P(RGB|SKIN)P(SKIN)+P(RGB|NONSKIN)P(NONSKIN)]
d. setting a threshold T; if the probability obtained in step c is greater than T, the pixel is judged to be a skin-color pixel, otherwise a non-skin-color pixel.
6. The clothing region acquisition method according to claim 4, wherein the step (3) in the step 1 is implemented as follows:
taking the result of the step (2) as input, and obtaining the outline of the skin color area;
traversing the contours to obtain the area of each contour;
setting a minimum area threshold value and a maximum area threshold value, and screening out regions with areas larger than the minimum area threshold value and smaller than the maximum area threshold value as human hand regions;
and finally, setting an aspect-ratio constraint for the hand region in the image to further screen the hand regions.
7. The garment region acquisition method according to any one of claims 1-3, wherein the visual feature-based human hand positioning method in step 1 comprises the following steps:
(1) marking a hand area of an image in an application scene to obtain a training set;
(2) inputting the training set into a convolutional neural network to obtain a feature map, wherein the convolutional neural network adopts an SE-ResNet-101 network;
(3) taking the feature map obtained in step (2) as input, and selecting target regions on the feature map with the region proposal network;
(4) inputting the feature map obtained in step (2) and the candidate region boxes obtained in step (3) into a region-of-interest (RoI) pooling layer, and using it to extract the features of each target box on the feature map;
(5) after the RoI pooling layer, a fixed-size feature map is obtained, and fully connected layers are then used to judge the hand category and predict the target box;
(6) after the visual-feature-based hand detector is obtained, it is used for hand detection in the corresponding application scene, predicting the top-left position (x, y) and the width and height (w, h) of the target box; if and only if two hands appear simultaneously does the method proceed to the next step, initial clothing region positioning.
8. The clothing region acquisition method according to any one of claims 1 to 3, wherein in step 3, after the initial clothing region is obtained, the precise clothing region is obtained by segmentation with the GrabCut or GraphCut foreground/background segmentation method; the specific steps are as follows:
(1) taking the clothing region obtained in the step 2 as a foreground seed point of a GrabCut or GraphCut method;
(2) setting background region seed points, and setting a two-hand region, a region above two hands, a left-hand region and a right-hand region as background region seed points;
(3) executing GrabCut or GraphCut algorithm to obtain a clothing foreground mask image;
(4) performing an AND operation between the foreground mask image obtained in step (3) and the original image to obtain the precise garment foreground image.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method steps of any one of claims 1 to 3.
10. A garment region acquisition device, comprising:
the hand positioning module is used for positioning the hand position;
the initial clothing region positioning module is used for acquiring an initial clothing region;
and the precise clothing region acquisition module is used for acquiring the precise clothing region.
CN202010365522.7A (priority and filing date 2020-04-30): Method and device for acquiring clothing region not worn on human body. Active; granted as CN111639641B.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010365522.7A CN111639641B (en) 2020-04-30 2020-04-30 Method and device for acquiring clothing region not worn on human body

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010365522.7A CN111639641B (en) 2020-04-30 2020-04-30 Method and device for acquiring clothing region not worn on human body

Publications (2)

Publication Number Publication Date
CN111639641A 2020-09-08
CN111639641B 2022-05-03

Family

ID=72330870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010365522.7A Active CN111639641B (en) 2020-04-30 2020-04-30 Method and device for acquiring clothing region not worn on human body

Country Status (1)

Country Link
CN (1) CN111639641B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102324019A (en) * 2011-08-12 2012-01-18 浙江大学 Method and system for automatically extracting gesture candidate region in video sequence
CN103106386A (en) * 2011-11-10 2013-05-15 华为技术有限公司 Dynamic self-adaption skin color segmentation method and device
CN103995595A (en) * 2014-05-28 2014-08-20 重庆大学 Game somatosensory control method based on hand gestures
CN104167006A (en) * 2014-07-10 2014-11-26 华南理工大学 Gesture tracking method of any hand shape
CN106909884A (en) * 2017-01-17 2017-06-30 北京航空航天大学 A kind of hand region detection method and device based on hierarchy and deformable part sub-model
CN107818489A (en) * 2017-09-08 2018-03-20 中山大学 A kind of more people's costume retrieval methods based on dressing parsing and human testing
CN109255312A (en) * 2018-08-30 2019-01-22 罗普特(厦门)科技集团有限公司 A kind of abnormal dressing detection method and device based on appearance features
CN110569722A (en) * 2019-08-01 2019-12-13 江苏濠汉信息技术有限公司 Visual analysis-based constructor dressing standard detection method and device

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
LEI HUANG et al.: "Robust skin detection in real-world images", J. Vis. Commun. Image R. *
ZHIQIANG WEI et al.: "Inferring intrinsic correlation between clothing style and wearers' personality", Springer *
张凯丽: "Research on clothing attribute recognition and key-point localization algorithms based on deep learning" (in Chinese), China Master's Theses Full-text Database, Information Science and Technology *
张萌岩: "Research on clothing image attribute label recognition and key-point localization based on deep learning" (in Chinese), China Master's Theses Full-text Database, Engineering Science and Technology I *
桂琳 et al.: "A clothing image description method fusing multi-level features" (in Chinese), Journal of Ocean University of China *

Also Published As

Publication number Publication date
CN111639641B (en) 2022-05-03

Similar Documents

Publication Publication Date Title
CN115601374B (en) Chromosome image segmentation method
US9639748B2 (en) Method for detecting persons using 1D depths and 2D texture
CN113781402A (en) Method and device for detecting chip surface scratch defects and computer equipment
CN111652317B (en) Super-parameter image segmentation method based on Bayes deep learning
CN105740945A (en) People counting method based on video analysis
Zhang et al. Knowledge-based eye detection for human face recognition
CN108648211A (en) A kind of small target detecting method, device, equipment and medium based on deep learning
CN110298297A (en) Flame identification method and device
CN112734761B (en) Industrial product image boundary contour extraction method
CN113449606B (en) Target object identification method and device, computer equipment and storage medium
CN113379789B (en) Moving target tracking method in complex environment
Kim Image enhancement using patch-based principal energy analysis
CN107886060A (en) Pedestrian's automatic detection and tracking based on video
US11080861B2 (en) Scene segmentation using model subtraction
CN111639641B (en) Method and device for acquiring clothing region not worn on human body
CN108765384B (en) Significance detection method for joint manifold sequencing and improved convex hull
Feng et al. Image shadow detection and removal based on region matching of intelligent computing
CN115482569A (en) Target passenger flow statistical method, electronic device and computer readable storage medium
CN117152787A (en) Character clothing recognition method, device, equipment and readable storage medium
CN114155273A (en) Video image single-target tracking method combined with historical track information
CN113139946A (en) Shirt stain positioning device based on vision
CN113658223A (en) Multi-pedestrian detection and tracking method and system based on deep learning
CN117037049B (en) Image content detection method and system based on YOLOv5 deep learning
CN109711445A (en) The similar method of weighting of intelligence in the super-pixel of target following classifier on-line training sample
Kerdvibulvech Hybrid model of human hand motion for cybernetics application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant