CN111639641A - Clothing area acquisition method and device - Google Patents

Clothing area acquisition method and device

Info

Publication number
CN111639641A
CN111639641A (application CN202010365522.7A)
Authority
CN
China
Prior art keywords
skin
region
area
hand
clothing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010365522.7A
Other languages
Chinese (zh)
Other versions
CN111639641B (en)
Inventor
黄磊
魏志强
张文锋
张明林
魏冠群
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ocean University of China
Original Assignee
Ocean University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ocean University of China
Priority to CN202010365522.7A
Publication of CN111639641A
Application granted
Publication of CN111639641B
Legal status: Active
Anticipated expiration

Classifications

    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio
    • G06N3/045 Neural networks; Combinations of networks
    • G06V10/267 Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds
    • G06V10/30 Noise filtering
    • G06V10/50 Extraction of image or video features by performing operations within image blocks or by using histograms, e.g. histogram of oriented gradients [HoG], or by projection analysis
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Abstract

The invention discloses a clothing region acquisition method and device. The method comprises the following steps: step 1, locating the hand positions; step 2, initial clothing region positioning: after the hand regions are obtained, the initial clothing region is determined proportionally in the area between and below the two hands; step 3, precise clothing region acquisition: after the initial clothing region is obtained, the precise clothing region is obtained by image segmentation. The method is suitable for acquiring a clothing region that is not worn on a human body, meeting the needs of smart home and smart retail applications. Furthermore, the invention proposes using the hand regions as the reference for clothing region acquisition, which is more adaptive than acquiring the region from the visual characteristics of the garment alone.

Description

Clothing area acquisition method and device
Technical Field
The invention belongs to the technical field of computer vision, relates to image processing and automatic identification technology, and particularly relates to a clothing region acquisition method and device.
Background
The automatic acquisition of the clothing region plays an important role in many application fields, such as person re-identification in intelligent surveillance, image processing and retouching, and dressing-trend analysis. All of these applications extract and analyze the clothing region on a human body, and the means of obtaining the clothing region fall mainly into two categories: methods based on face detection and methods based on human body pose estimation.
The face-detection-based method first obtains the face position and area through face detection, and then, according to the structural characteristics of the human body, designates a region in a certain proportional range below the face as the clothing region. The pose-estimation-based method first obtains the human pose through pose estimation; the result contains the main joint points of the body, such as the shoulder and waist joints, and the clothing region is determined from the positions of these joint points.
These methods are effective for locating and acquiring a garment region worn on a human body, but they fail when the garment is not worn on a body, and such application scenes exist in the smart home and smart retail fields.
In smart homes, an intelligent washing machine first obtains the clothing region from video collected by its camera and then analyzes the style, material and so on of the garment, which at that moment is not worn on a human body. Similarly, in smart retail, when garments are purchased and paid for at an automatic checkout counter, the purchased clothes must be captured by a camera to identify their brand, model and so on, and again the garments are not worn on a human body. The existing techniques based on face detection and human pose estimation are unsuitable for these two occasions.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides a clothing region acquisition method and device that do not use the face region as a reference. By locating the clothing region from the position of the human hands, they can effectively acquire a clothing region that is not worn on a human body, and can be widely applied in fields such as smart home and smart retail, for example intelligent washing machines and automatic identification of retail articles.
In order to solve the technical problems, the invention adopts the technical scheme that:
a garment region acquisition method comprises the following steps:
step 1, positioning the hand position: the method comprises a human hand positioning method based on skin color and/or a human hand positioning method based on visual characteristics;
step 2, initial clothing region positioning: after the hand regions are obtained, the initial clothing region is determined proportionally in the area between and below the two hands;
step 3, obtaining a precise clothing region: and after the initial clothing region is obtained, obtaining an accurate clothing region by means of image segmentation.
Further, in step 2, after the hand regions are obtained in step 1, the circumscribed circle of each hand region is obtained, giving the diameter h1 of the circle circumscribing the left-hand region, the diameter h2 of the circle circumscribing the right-hand region, and the distance h3 between the centers of the two circumscribed circles; the initial clothing region is a rectangle whose upper edge lies at distance h4 from the line connecting the two circle centers, with width h5 and height h6, where h4 ranges over (0, h1+h2), h5 over (h3/2, 2×h3), and h6 over (h3/3, 4×h3).
Preferably, h4 takes the value (h1+h2)/2, h5 takes the value h3, and h6 takes the value h3.
Further, the human hand positioning method based on skin color in step 1 comprises the following steps:
(1) firstly, obtaining a skin color area in an image/video frame by a skin color detection method: constructing a skin histogram model and a non-skin histogram model, and judging the probability of any pixel belonging to the skin class by adopting Bayesian decision according to the histogram model to obtain a skin color area in an image/video frame;
(2) then, morphologically processing the obtained skin color area to remove a noise area;
(3) then, analyzing the area of each skin-color connected region in the image, and setting area thresholds according to the actual application to eliminate regions that are too large or too small;
(4) finally, among the remaining skin-color regions, when there exist exactly two regions whose area difference is smaller than a threshold, they are regarded as the hand regions, the threshold ranging from 5% to 30%.
Preferably, the step (1) in the step 1 is implemented by the following steps:
firstly, a skin histogram model and a non-skin histogram model are constructed, and the specific steps are as follows:
1) an image data set is constructed comprising two categories: images containing human skin and images completely free of any skin;
2) preprocessing a data set, and normalizing all images in the data set to be uniform in size;
3) manually segmenting the skin areas in the images containing human skin;
4) constructing a skin histogram model, reading R, G, B values corresponding to each pixel in the skin images row by row and column by column, and counting color values corresponding to the pixel points of all the skin images to obtain the skin histogram model;
5) constructing a non-skin histogram model by adopting the same method as the step 4);
secondly, using a histogram model to detect skin color, and specifically comprising the following steps:
1) after the skin histogram model and the non-skin histogram model are constructed, the relevant probabilities can be calculated for any given pixel from its color value, with the following formulas:
P(RGB|SKIN)=S[RGB]/SN
P(RGB|NONSKIN)=N[RGB]/NN
wherein S[RGB] is the count in the bin of the skin histogram model corresponding to the RGB color value, N[RGB] is the count in the corresponding bin of the non-skin histogram model, and SN and NN are the total numbers of pixels in the skin and non-skin histogram models respectively.
2) The probability that any pixel belongs to a skin pixel is judged by adopting Bayesian decision, and the method specifically comprises the following steps:
a. calculating the prior probability P (SKIN) of any pixel belonging to the skin class according to the following calculation formula:
P(SKIN)=SN/(SN+NN)
b. calculating the prior probability P (NONSKIN) that any pixel belongs to the non-skin class according to the following formula:
P(NONSKIN)=1-P(SKIN)
c. according to steps 1), a and b, calculating the probability that a given pixel belongs to the skin class by Bayesian decision:
P(SKIN|RGB)=P(RGB|SKIN)P(SKIN)/[P(RGB|SKIN)P(SKIN)+P(RGB|NONSKIN)P(NONSKIN)]
d. setting a threshold T; if the probability obtained in step c is greater than T, the pixel is judged to be a skin-color pixel, otherwise a non-skin-color pixel.
Preferably, the step (3) in the step 1 is implemented as follows:
taking the result of the step (2) as input, and obtaining the outline of the skin color area;
traversing the contours to obtain the area of each contour;
setting a minimum area threshold value and a maximum area threshold value, and screening out regions with areas larger than the minimum area threshold value and smaller than the maximum area threshold value as human hand regions;
and finally, setting an aspect-ratio constraint for the hand region in the image to further screen the hand regions.
Further, the method for positioning human hand based on visual features in step 1 comprises the following steps:
(1) marking a hand area of an image in an application scene to obtain a training set;
(2) inputting the training set into a convolutional neural network to obtain a feature map, wherein the convolutional neural network adopts an SE-ResNet-101 network;
(3) taking the feature map obtained in step (2) as input, and selecting target regions on the feature map with the region proposal network;
(4) inputting the feature map obtained in step (2) and the candidate region boxes obtained in step (3) into a region-of-interest (RoI) pooling layer, and using it to extract the features of each target box on the feature map;
(5) after the RoI pooling layer, a fixed-size feature map is obtained, and fully connected layers are then used to judge the hand category and predict the target box;
(6) after the visual-feature-based hand detector is obtained, it is used for hand detection in the corresponding application scene, predicting the top-left position (x, y) and the width and height (w, h) of the target box; if and only if two hands appear simultaneously does the method proceed to the next step, initial clothing region positioning.
Further, in step 3, after the initial clothing region is obtained, the precise clothing region is obtained by segmentation with the GrabCut or GraphCut foreground/background segmentation method; the specific steps are as follows:
(1) taking the clothing region obtained in the step 2 as a foreground seed point of a GrabCut or GraphCut method;
(2) setting background region seed points, and setting a two-hand region, a region above two hands, a left-hand region and a right-hand region as background region seed points;
(3) executing GrabCut or GraphCut algorithm to obtain a clothing foreground mask image;
(4) performing an AND operation between the foreground mask image obtained in step (3) and the original image to obtain the precise garment foreground image.
The invention also provides a computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method steps as set forth above.
The present invention also provides a clothing region acquisition apparatus, including:
the hand positioning module is used for positioning the hand position;
the initial clothing region positioning module is used for acquiring an initial clothing region;
and the precise clothing region acquisition module is used for acquiring the precise clothing region.
Compared with the prior art, the invention has the advantages that:
(1) The invention targets the acquisition of a clothing region that is not worn on a human body and uses the hand regions as the basis for clothing region acquisition, which better meets the needs of smart home, smart retail and similar fields, such as intelligent washing machines and automatic identification of retail articles.
(2) By optimizing the hand positioning method, the invention accurately determines the hand regions and thereby identifies the clothing region; it can effectively acquire a clothing region that is not worn on a human body, and compared with acquiring the clothing region only from the visual characteristics of the garment, it is more adaptive and has a wider field of application.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive labor.
FIG. 1 is a schematic view of an initial garment region acquisition of the present invention;
FIG. 2 is a flow chart of a method of the present invention;
FIG. 3 is a schematic diagram of the embedding manner of the SE module of embodiment 1 in a ResNet-101 network;
fig. 4 is a diagram showing a structure of a regional candidate network according to embodiment 1.
Detailed Description
The invention is further described with reference to the following figures and specific embodiments.
Example 1
As shown in fig. 2, a clothing region acquiring method includes the following steps:
step 1, positioning the hand position
In smart home and smart retail, clothing images or videos are collected through a camera, and in order to better obtain information such as style of clothing, the clothing is generally unfolded in front of the camera when being shot, as shown in fig. 1. In this case, the human hand is a very good reference, so in the method proposed in this patent, the position of the human hand is first located.
Specifically, the method for positioning the human hand position comprises a human hand positioning method based on skin color and/or a human hand positioning method based on visual characteristics.
A human hand positioning method based on skin color comprises the following steps:
(1) firstly, obtaining a skin color area in an image/video frame by a skin color detection method: and constructing a skin histogram model and a non-skin histogram model, and judging the probability of any pixel belonging to the skin class by adopting Bayesian decision according to the histogram models to obtain a skin color area in the image/video frame.
The specific implementation steps are as follows:
firstly, a skin histogram model and a non-skin histogram model are constructed, and the specific steps are as follows:
1) an image data set is constructed comprising two categories: images containing human skin and images completely free of any skin;
2) preprocessing a data set, and normalizing all images in the data set to be uniform in size;
3) manually segmenting the skin areas in the images containing human skin;
4) constructing a skin histogram model, reading R, G, B values corresponding to each pixel in the skin images row by row and column by column, and counting color values corresponding to the pixel points of all the skin images to obtain the skin histogram model;
5) constructing a non-skin histogram model by adopting the same method as the step 4);
secondly, using a histogram model to detect skin color, and specifically comprising the following steps:
1) after the skin histogram model and the non-skin histogram model are constructed, the relevant probabilities can be calculated for any given pixel from its color value, with the following formulas:
P(RGB|SKIN)=S[RGB]/SN
P(RGB|NONSKIN)=N[RGB]/NN
wherein S[RGB] is the count in the bin of the skin histogram model corresponding to the RGB color value, N[RGB] is the count in the corresponding bin of the non-skin histogram model, and SN and NN are the total numbers of pixels in the skin and non-skin histogram models respectively.
2) The probability that any pixel belongs to a skin pixel is judged by adopting Bayesian decision, and the method specifically comprises the following steps:
a. calculating the prior probability P (SKIN) of any pixel belonging to the skin class according to the following calculation formula:
P(SKIN)=SN/(SN+NN)
b. calculating the prior probability P (NONSKIN) that any pixel belongs to the non-skin class according to the following formula:
P(NONSKIN)=1-P(SKIN)
c. according to the steps 1), a and b, calculating the probability that the given pixel belongs to the skin class by adopting Bayesian decision:
P(SKIN|RGB)=P(RGB|SKIN)P(SKIN)/[P(RGB|SKIN)P(SKIN)+P(RGB|NONSKIN)P(NONSKIN)]
d. setting a threshold T; if the probability obtained in step c is greater than T, the pixel is judged to be a skin-color pixel, otherwise a non-skin-color pixel.
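A minimal NumPy sketch of this Bayesian skin-color classifier follows; the 32-bin RGB quantization and the threshold T = 0.4 are illustrative assumptions, not values from the patent.

    import numpy as np

    def build_histogram(pixels, bins=32):
        # pixels: (N, 3) uint8 RGB values from the (non-)skin training images
        idx = (pixels // (256 // bins)).astype(int)
        hist = np.zeros((bins, bins, bins))
        np.add.at(hist, (idx[:, 0], idx[:, 1], idx[:, 2]), 1)
        return hist

    def skin_mask(img, skin_hist, nonskin_hist, T=0.4, bins=32):
        # per-pixel P(SKIN|RGB) by Bayes' rule, thresholded at T
        SN, NN = skin_hist.sum(), nonskin_hist.sum()
        p_skin, p_non = SN / (SN + NN), NN / (SN + NN)   # priors
        idx = (img // (256 // bins)).astype(int)
        p_rgb_skin = skin_hist[idx[..., 0], idx[..., 1], idx[..., 2]] / SN
        p_rgb_non = nonskin_hist[idx[..., 0], idx[..., 1], idx[..., 2]] / NN
        post = p_rgb_skin * p_skin / (
            p_rgb_skin * p_skin + p_rgb_non * p_non + 1e-10)
        return (post > T).astype(np.uint8) * 255  # binary skin-color mask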
(2) According to the skin-color region obtained in step (1), performing morphological processing such as erosion and dilation on it to remove noise regions.
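A one-step OpenCV sketch of this morphological clean-up; the 5×5 elliptical kernel is an illustrative assumption, and mask is the binary output of the skin detector above.

    import cv2

    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    # opening (erosion then dilation) removes small noise blobs;
    # closing (dilation then erosion) fills small holes inside skin regions
    mask = cv2.morphologyEx(mask, cv2.MORPH_OPEN, kernel)
    mask = cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)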
(3) Then, analyzing the area of each skin-color connected region in the image, and setting area thresholds according to the actual application to eliminate regions that are too large or too small. The specific implementation steps are as follows:
taking the result of the step (2) as input, and obtaining the outline of the skin color area;
traversing the contours to obtain the area of each contour;
setting a minimum area threshold value and a maximum area threshold value, and screening out regions with areas larger than the minimum area threshold value and smaller than the maximum area threshold value as human hand regions;
and finally, setting an aspect-ratio constraint for the hand region in the image to further screen the hand regions, as sketched below.
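A hedged OpenCV sketch of the four screening steps above; the area and aspect-ratio thresholds are application-dependent assumptions.

    import cv2

    def candidate_hand_regions(mask, min_area=500, max_area=50000, max_ratio=3.0):
        # keep skin contours whose area and aspect ratio are plausible for a hand
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL,
                                       cv2.CHAIN_APPROX_SIMPLE)
        hands = []
        for c in contours:
            area = cv2.contourArea(c)
            if not (min_area < area < max_area):
                continue                      # too small or too large
            _, _, w, h = cv2.boundingRect(c)
            if max(w, h) / max(min(w, h), 1) <= max_ratio:
                hands.append(c)               # aspect ratio also plausible
        return hands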
(4) Finally, hand region determination: among the remaining skin-color regions, when there are exactly two regions of similar area (the difference between the two areas smaller than a threshold, range 5%-30%, preferably 15%) at close to the same vertical position, they are regarded as the hand regions.
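A sketch of this pairing rule under the preferred 15% area threshold; the vertical-position tolerance is an illustrative assumption.

    import cv2

    def pick_hand_pair(contours, area_tol=0.15, y_tol=0.2):
        # find exactly one pair of regions with similar area at a similar height
        pairs = []
        for i in range(len(contours)):
            for j in range(i + 1, len(contours)):
                a1, a2 = cv2.contourArea(contours[i]), cv2.contourArea(contours[j])
                _, y1, _, h1 = cv2.boundingRect(contours[i])
                _, y2, _, h2 = cv2.boundingRect(contours[j])
                if (abs(a1 - a2) / max(a1, a2) < area_tol
                        and abs(y1 - y2) < y_tol * max(h1, h2)):
                    pairs.append((contours[i], contours[j]))
        return pairs[0] if len(pairs) == 1 else None  # hands iff exactly one pair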
A human hand positioning method based on visual features comprises the following steps:
(1) marking a hand area of an image in an application scene to obtain a training set;
(2) Inputting the training set into a convolutional neural network to obtain a feature map; the network adopts SE-ResNet-101 for feature extraction, which adds an SE (squeeze-and-excitation) module to the ResNet-101 backbone, with the embedding manner shown in FIG. 3.
(3) Taking the feature map obtained in step (2) as input, target regions are selected on the feature map by the region proposal network, whose structure is shown in FIG. 4.
The method mainly comprises the following steps:
firstly, a convolution with a 3×3 kernel is applied to the convolutional feature map in the region proposal network, obtaining a 512-dimensional vector at each position of the feature map;
secondly, the region proposal network sets k anchors of fixed aspect ratio and size at each pixel position, i.e. each point in the feature map generates k hand candidate regions on the original image; to better suit hand detection, three aspect ratios (1:1, 1:2, 1:3) and four scales (16, 32, 64, 128) are designed, giving k = 12 (see the sketch after the loss formulas below);
thirdly, the feature vector obtained at each position is followed by two branches: one branch is a classification layer performing hand/non-hand discrimination for each region; the other is a regression layer using k regression models to regress the position and size of the rectangular box;
fourthly, calculating the classification loss and the regression loss of the region proposal network, and optimizing the model by gradient descent until the change in the loss is smaller than a set threshold. The classification loss is the cross-entropy function:

Lcls = -(1/N) Σi [ pi* log(pi) + (1 - pi*) log(1 - pi) ]

where pi is the predicted probability that the rectangular box corresponding to anchor i contains the target, and pi* is the label of the anchor: pi* = 1 when the anchor belongs to the positive samples, and pi* = 0 when it belongs to the negative samples.
The regression loss is calculated with the smooth L1 loss function:

Lreg = Σi pi* · smoothL1(ti - ti*)

where smoothL1(x) = 0.5x² if |x| < 1, and |x| - 0.5 otherwise. Both ti* and ti are four-element vectors containing the top-left abscissa and ordinate and the width and height: ti* represents the offset of the real target box relative to the anchor, and ti is the network prediction, representing the offset of the predicted box relative to the anchor.
ti* and ti are calculated as follows:

tx = (x - xa)/wa,  ty = (y - ya)/ha,  tw = log(w/wa),  th = log(h/ha)
tx* = (x* - xa)/wa,  ty* = (y* - ya)/ha,  tw* = log(w*/wa),  th* = log(h*/ha)

where x*, y*, w*, h* are the top-left abscissa and ordinate and the width and height of the real target box, xa, ya, wa, ha are those of the anchor, and x, y, w, h are those of the target box predicted by the network.
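An illustrative NumPy sketch of the anchor shapes and the two losses above; this is a reimplementation under stated assumptions, not the patent's code. Anchors passed to the encoder are assumed to already be full (x, y, w, h) boxes placed on the image, and placement on the feature-map grid is omitted.

    import numpy as np

    def make_anchor_shapes(ratios=(1.0, 2.0, 3.0), scales=(16, 32, 64, 128)):
        # k = 3 ratios x 4 scales = 12 (width, height) pairs per position
        return np.array([(s * np.sqrt(1.0 / r), s * np.sqrt(r))
                         for r in ratios for s in scales])

    def encode_offsets(boxes, anchors):
        # boxes, anchors: (N, 4) arrays in (x, y, w, h) top-left form
        tx = (boxes[:, 0] - anchors[:, 0]) / anchors[:, 2]
        ty = (boxes[:, 1] - anchors[:, 1]) / anchors[:, 3]
        tw = np.log(boxes[:, 2] / anchors[:, 2])
        th = np.log(boxes[:, 3] / anchors[:, 3])
        return np.stack([tx, ty, tw, th], axis=1)

    def smooth_l1(x):
        ax = np.abs(x)
        return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

    def rpn_losses(p, p_star, t, t_star):
        # p: predicted object probabilities, p_star: 0/1 anchor labels,
        # t, t_star: predicted and ground-truth offsets, each (N, 4)
        eps = 1e-7
        cls = -np.mean(p_star * np.log(p + eps)
                       + (1 - p_star) * np.log(1 - p + eps))
        reg = np.sum(p_star[:, None] * smooth_l1(t - t_star))  # positives only
        return cls, reg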
(4) Inputting the feature map obtained in step (2) and the candidate region boxes obtained in step (3) into a region-of-interest (RoI) pooling layer, and using it to extract the features of each target box on the feature map, specifically:
firstly, mapping the target box obtained in step (3) onto the feature map from step (2) to obtain the target box on the feature map, i.e. the region of interest;
secondly, dividing the mapped target box into k×k units, where k is the output dimension;
thirdly, applying a max pooling operation within each unit.
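A minimal NumPy sketch of this RoI max pooling, assuming the box has already been mapped to integer feature-map coordinates (channels-last layout) and spans at least k pixels in each direction.

    import numpy as np

    def roi_max_pool(feature_map, box, k=7):
        # box = (x, y, w, h) on the feature map; output is k x k x channels
        x, y, w, h = box
        roi = feature_map[y:y + h, x:x + w]          # region of interest
        ys = np.linspace(0, h, k + 1, dtype=int)     # cell boundaries
        xs = np.linspace(0, w, k + 1, dtype=int)
        out = np.empty((k, k, feature_map.shape[2]))
        for i in range(k):
            for j in range(k):
                cell = roi[ys[i]:max(ys[i + 1], ys[i] + 1),
                           xs[j]:max(xs[j + 1], xs[j] + 1)]
                out[i, j] = cell.max(axis=(0, 1))    # max pool each cell
        return out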
(5) After the RoI pooling layer, a fixed-size feature map is obtained, and fully connected layers are then used to judge the hand category and predict the target box: the pooled features first pass through fully connected layers, after which two fully connected branches follow, the first classifying by computing the probability of each category with a SoftMax function and the second predicting the position of the target box.
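A minimal sketch of this two-branch head, assuming a PyTorch implementation; the 1024-unit layer widths are illustrative assumptions, not taken from the patent.

    import torch
    import torch.nn as nn

    class DetectionHead(nn.Module):
        def __init__(self, in_features, num_classes=2):  # hand / background
            super().__init__()
            self.fc = nn.Sequential(
                nn.Linear(in_features, 1024), nn.ReLU(),
                nn.Linear(1024, 1024), nn.ReLU())
            self.cls_branch = nn.Linear(1024, num_classes)  # SoftMax class scores
            self.reg_branch = nn.Linear(1024, 4)            # target-box offsets

        def forward(self, roi_feats):
            # roi_feats: (N, k, k, C) pooled features; in_features must be k*k*C
            x = self.fc(roi_feats.flatten(1))
            return torch.softmax(self.cls_branch(x), dim=1), self.reg_branch(x)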
(6) After the visual-feature-based hand detector is obtained, it is used for hand detection in the corresponding application scene, predicting the top-left position (x, y) and the width and height (w, h) of the target box; if and only if two hands appear simultaneously does the method proceed to the next step, initial clothing region positioning.
Step 2, initial clothing area positioning
After the hand regions are obtained, the initial position of the clothing region is obtained proportionally in the area between and below the two hands, as shown in FIG. 1, where h1 is the diameter of the circle circumscribing the left-hand region, h2 is the diameter of the circle circumscribing the right-hand region, h3 is the distance between the centers of the two circumscribed circles, and the dashed box is the obtained initial clothing region: a rectangle whose upper edge lies at distance h4 from the line connecting the two circle centers, with width h5 and height h6. h1-h6 are calculated as follows:
after the hand area is obtained in the step 1, the circumscribed circle of the hand area is obtained, and then the diameter h1 of the circumscribed circle of the left hand area, the diameter h2 of the circumscribed circle of the right hand area and the distance h3 between the centers of the circumscribed circles of the left hand area and the right hand area are obtained.
The value of h4 is in (0, h1+h2), and is generally (h1+h2)/2.
The value of h5 is in (h3/2, 2×h3), and is generally h3.
The value of h6 is in (h3/3, 4×h3), and is generally h3.
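With OpenCV these quantities can be derived directly from the two hand contours. A hedged sketch using the preferred values h4 = (h1+h2)/2 and h5 = h6 = h3, and assuming the line connecting the hand centers is roughly horizontal:

    import cv2
    import numpy as np

    def initial_clothing_region(left_contour, right_contour):
        # circumscribed circles of the two hand regions
        (x1, y1), r1 = cv2.minEnclosingCircle(left_contour)
        (x2, y2), r2 = cv2.minEnclosingCircle(right_contour)
        h1, h2 = 2 * r1, 2 * r2                   # circle diameters
        h3 = float(np.hypot(x2 - x1, y2 - y1))    # distance between centers
        h4, h5, h6 = (h1 + h2) / 2, h3, h3        # preferred values
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2     # midpoint of the center line
        top = cy + h4                             # image y axis points downward
        return int(cx - h5 / 2), int(top), int(h5), int(h6)  # (x, y, w, h)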
Step 3, obtaining accurate clothing region
After the initial clothing region is obtained, the precise clothing region is obtained by segmentation with a foreground/background segmentation method such as GrabCut (or GraphCut); the specific steps are as follows:
(1) taking the clothing region obtained in the step 2 as a foreground seed point of the GrabCut method;
(2) setting background region seed points, and setting a two-hand region, a region above two hands, a left-hand region and a right-hand region as background region seed points;
(3) executing the GrabCut algorithm to obtain the clothing foreground mask image;
(4) performing an AND operation between the foreground mask image obtained in step (3) and the original image to obtain the precise garment foreground image.
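A minimal OpenCV sketch of steps (1)-(4) using mask-initialised GrabCut; the 5 iterations and the exact background seeding (the hand boxes plus the area above them) are illustrative simplifications of the seeding described above.

    import cv2
    import numpy as np

    def refine_clothing_region(img, rect, hand_boxes):
        # rect and hand_boxes are (x, y, w, h) tuples in image coordinates
        mask = np.full(img.shape[:2], cv2.GC_PR_BGD, np.uint8)  # probable bg
        x, y, w, h = rect
        mask[y:y + h, x:x + w] = cv2.GC_PR_FGD                  # clothing seeds
        for hx, hy, hw, hh in hand_boxes:
            mask[:hy + hh, hx:hx + hw] = cv2.GC_BGD             # hands + above
        bgd = np.zeros((1, 65), np.float64)
        fgd = np.zeros((1, 65), np.float64)
        cv2.grabCut(img, mask, None, bgd, fgd, 5, cv2.GC_INIT_WITH_MASK)
        fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 255, 0)
        # AND the foreground mask with the original image (step (4))
        return cv2.bitwise_and(img, img, mask=fg.astype(np.uint8))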
Example 2
This embodiment provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the clothing region acquiring method according to embodiment 1, which are not described herein again.
Example 3
The present embodiment provides a clothing region acquiring apparatus, including:
and the hand positioning module is used for positioning the hand position. The method for positioning the hand position comprises two methods, one is a human hand positioning method based on skin color, and the other is a human hand positioning method based on visual characteristics.
And the initial clothing region positioning module is used for acquiring the initial clothing region. After the hand area is obtained, the initial position of the clothing area is obtained according to a certain proportion in the area between two hands and below the two hands.
And the precise clothing region acquisition module is used for acquiring the precise clothing region.
The specific implementation method for implementing the functions of each module of the device can be referred to as embodiment 1, and details are not described here.
The same or similar parts among the various embodiments of the present description may be referred to each other, and each embodiment is described with emphasis on differences from the other embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the present invention may be implemented by software plus a necessary hardware platform, and certainly may also be implemented entirely in hardware, but in many cases the former is the better embodiment. Based on this understanding, the part of the technical solution of the present invention that contributes over the prior art can be embodied in the form of a software product, which can be stored in a storage medium, such as a ROM/RAM, a magnetic disk or an optical disk, and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the methods of the embodiments, or of some parts of the embodiments, of the present invention.
It is understood that the above description is not intended to limit the present invention, and the present invention is not limited to the above examples, and those skilled in the art should understand that they can make various changes, modifications, additions and substitutions within the spirit and scope of the present invention.

Claims (10)

1. A garment region acquisition method is characterized by comprising the following steps:
step 1, positioning the hand position: the method comprises a human hand positioning method based on skin color and/or a human hand positioning method based on visual characteristics;
step 2, initial clothing region positioning: after the hand regions are obtained, the initial clothing region is determined proportionally in the area between and below the two hands;
step 3, obtaining a precise clothing region: and after the initial clothing region is obtained, obtaining an accurate clothing region by means of image segmentation.
2. The clothing region acquisition method according to claim 1, wherein in step 2, after the hand regions are obtained in step 1, the circumscribed circle of each hand region is obtained, giving the diameter h1 of the circle circumscribing the left-hand region, the diameter h2 of the circle circumscribing the right-hand region, and the distance h3 between the centers of the two circumscribed circles; the initial clothing region is a rectangle whose upper edge lies at distance h4 from the line connecting the two circle centers, with width h5 and height h6, where h4 ranges over (0, h1+h2), h5 over (h3/2, 2×h3), and h6 over (h3/3, 4×h3).
3. The clothing region acquisition method according to claim 2, wherein h4 takes the value (h1+h2)/2, h5 takes the value h3, and h6 takes the value h3.
4. The clothing region acquisition method according to any one of claims 1 to 3, wherein the skin color based human hand positioning method in step 1 comprises the following steps:
(1) firstly, obtaining a skin color area in an image/video frame by a skin color detection method: constructing a skin histogram model and a non-skin histogram model, and judging the probability of any pixel belonging to the skin class by adopting Bayesian decision according to the histogram model to obtain a skin color area in an image/video frame;
(2) then, morphologically processing the obtained skin color area to remove a noise area;
(3) then, analyzing the area of each skin-color connected region in the image, and setting area thresholds according to the actual application to eliminate regions that are too large or too small;
(4) finally, among the remaining skin-color regions, when there exist exactly two regions whose area difference is smaller than a threshold, they are regarded as the hand regions, the threshold ranging from 5% to 30%.
5. The clothing region acquisition method according to claim 4, wherein the step (1) in the step 1 is implemented by the following steps:
firstly, a skin histogram model and a non-skin histogram model are constructed, and the specific steps are as follows:
1) an image data set is constructed comprising two categories: images containing human skin and images completely free of any skin;
2) preprocessing a data set, and normalizing all images in the data set to be uniform in size;
3) manually segmenting the skin areas in the images containing human skin;
4) constructing a skin histogram model, reading R, G, B values corresponding to each pixel in the skin images row by row and column by column, and counting color values corresponding to the pixel points of all the skin images to obtain the skin histogram model;
5) constructing a non-skin histogram model by adopting the same method as the step 4);
secondly, using a histogram model to detect skin color, and specifically comprising the following steps:
1) after the skin histogram model and the non-skin histogram model are constructed, the relevant probabilities can be calculated for any given pixel from its RGB color value, with the following formulas:
P(RGB|SKIN)=S[RGB]/SN
P(RGB|NONSKIN)=N[RGB]/NN
wherein S[RGB] is the count in the bin of the skin histogram model corresponding to the RGB color value, N[RGB] is the count in the corresponding bin of the non-skin histogram model, and SN and NN are the total numbers of pixels in the skin and non-skin histogram models respectively.
2) The probability that any pixel belongs to a skin pixel is judged by adopting Bayesian decision, and the method specifically comprises the following steps:
a. calculating the prior probability P (SKIN) of any pixel belonging to the skin class according to the following calculation formula:
P(SKIN)=SN/(SN+NN)
b. calculating the prior probability P (NONSKIN) that any pixel belongs to the non-skin class according to the following formula:
P(NONSKIN)=1-P(SKIN)
c. according to steps 1), a and b, calculating the probability that a given pixel belongs to the skin class by Bayesian decision:
P(SKIN|RGB)=P(RGB|SKIN)P(SKIN)/[P(RGB|SKIN)P(SKIN)+P(RGB|NONSKIN)P(NONSKIN)]
d. setting a threshold T; if the probability obtained in step c is greater than T, the pixel is judged to be a skin-color pixel, otherwise a non-skin-color pixel.
6. The clothing region acquisition method according to claim 4, wherein the step (3) in the step 1 is implemented as follows:
taking the result of the step (2) as input, and obtaining the outline of the skin color area;
traversing the contours to obtain the area of each contour;
setting a minimum area threshold value and a maximum area threshold value, and screening out regions with areas larger than the minimum area threshold value and smaller than the maximum area threshold value as human hand regions;
and finally, setting an aspect-ratio constraint for the hand region in the image to further screen the hand regions.
7. The garment region acquisition method according to any one of claims 1-3, wherein the visual feature-based human hand positioning method in step 1 comprises the following steps:
(1) marking a hand area of an image in an application scene to obtain a training set;
(2) inputting the training set into a convolutional neural network to obtain a feature map, wherein the convolutional neural network adopts an SE-ResNet-101 network;
(3) taking the feature map obtained in step (2) as input, and selecting target regions on the feature map with the region proposal network;
(4) inputting the feature map obtained in step (2) and the candidate region boxes obtained in step (3) into a region-of-interest (RoI) pooling layer, and using it to extract the features of each target box on the feature map;
(5) after the RoI pooling layer, a fixed-size feature map is obtained, and fully connected layers are then used to judge the hand category and predict the target box;
(6) after the visual-feature-based hand detector is obtained, it is used for hand detection in the corresponding application scene, predicting the top-left position (x, y) and the width and height (w, h) of the target box; if and only if two hands appear simultaneously does the method proceed to the next step, initial clothing region positioning.
8. The clothing region acquisition method according to any one of claims 1 to 3, wherein in step 3, after the initial clothing region is obtained, the precise clothing region is obtained by segmentation with the GrabCut or GraphCut foreground/background segmentation method; the specific steps are as follows:
(1) taking the clothing region obtained in the step 2 as a foreground seed point of a GrabCut or GraphCut method;
(2) setting background region seed points, and setting a two-hand region, a region above two hands, a left-hand region and a right-hand region as background region seed points;
(3) executing GrabCut or GraphCut algorithm to obtain a clothing foreground mask image;
(4) performing an AND operation between the foreground mask image obtained in step (3) and the original image to obtain the precise garment foreground image.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method steps of any one of claims 1 to 3.
10. A garment region acquisition device, comprising:
the hand positioning module is used for positioning the hand position;
the initial clothing region positioning module is used for acquiring an initial clothing region;
and the precise clothing region acquisition module is used for acquiring the precise clothing region.
CN202010365522.7A (priority and filing date 2020-04-30): Method and device for acquiring clothing region not worn on human body. Active; granted as CN111639641B.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010365522.7A CN111639641B (en) 2020-04-30 2020-04-30 Method and device for acquiring clothing region not worn on human body

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010365522.7A CN111639641B (en) 2020-04-30 2020-04-30 Method and device for acquiring clothing region not worn on human body

Publications (2)

Publication Number Publication Date
CN111639641A 2020-09-08
CN111639641B 2022-05-03

Family

ID=72330870

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010365522.7A Active CN111639641B (en) 2020-04-30 2020-04-30 Method and device for acquiring clothing region not worn on human body

Country Status (1)

Country Link
CN (1) CN111639641B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102324019A (en) * 2011-08-12 2012-01-18 浙江大学 Method and system for automatically extracting gesture candidate region in video sequence
CN103106386A (en) * 2011-11-10 2013-05-15 华为技术有限公司 Dynamic self-adaption skin color segmentation method and device
CN103995595A (en) * 2014-05-28 2014-08-20 重庆大学 Game somatosensory control method based on hand gestures
CN104167006A (en) * 2014-07-10 2014-11-26 华南理工大学 Gesture tracking method of any hand shape
CN106909884A (en) * 2017-01-17 2017-06-30 北京航空航天大学 A kind of hand region detection method and device based on hierarchy and deformable part sub-model
CN107818489A (en) * 2017-09-08 2018-03-20 中山大学 A kind of more people's costume retrieval methods based on dressing parsing and human testing
CN109255312A (en) * 2018-08-30 2019-01-22 罗普特(厦门)科技集团有限公司 A kind of abnormal dressing detection method and device based on appearance features
CN110569722A (en) * 2019-08-01 2019-12-13 江苏濠汉信息技术有限公司 Visual analysis-based constructor dressing standard detection method and device

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
LEI HUANG et al.: "Robust skin detection in real-world images", J. Vis. Commun. Image R. *
ZHIQIANG WEI et al.: "Inferring intrinsic correlation between clothing style and wearers' personality", Springer *
张凯丽: "Research on clothing attribute recognition and key-point localization algorithms based on deep learning" (in Chinese), China Master's Theses Full-text Database, Information Science and Technology *
张萌岩: "Research on clothing image attribute label recognition and key-point localization based on deep learning" (in Chinese), China Master's Theses Full-text Database, Engineering Science and Technology I *
桂琳 et al.: "A clothing image description method fusing multi-level features" (in Chinese), Journal of Ocean University of China *

Also Published As

Publication number Publication date
CN111639641B (en) 2022-05-03

Similar Documents

Publication Publication Date Title
CN115601374B (en) Chromosome image segmentation method
US9639748B2 (en) Method for detecting persons using 1D depths and 2D texture
CN113781402A (en) Method and device for detecting chip surface scratch defects and computer equipment
CN111652317B (en) Super-parameter image segmentation method based on Bayes deep learning
CN105740945A (en) People counting method based on video analysis
Zhang et al. Knowledge-based eye detection for human face recognition
CN108648211A (en) A kind of small target detecting method, device, equipment and medium based on deep learning
CN110298297A (en) Flame identification method and device
CN112734761B (en) Industrial product image boundary contour extraction method
CN113449606B (en) Target object identification method and device, computer equipment and storage medium
CN113379789B (en) Moving target tracking method in complex environment
Kim Image enhancement using patch-based principal energy analysis
CN107886060A (en) Pedestrian's automatic detection and tracking based on video
US11080861B2 (en) Scene segmentation using model subtraction
CN111639641B (en) Method and device for acquiring clothing region not worn on human body
CN108765384B (en) Significance detection method for joint manifold sequencing and improved convex hull
Feng et al. Image shadow detection and removal based on region matching of intelligent computing
CN115482569A (en) Target passenger flow statistical method, electronic device and computer readable storage medium
CN117152787A (en) Character clothing recognition method, device, equipment and readable storage medium
CN114155273A (en) Video image single-target tracking method combined with historical track information
CN113139946A (en) Shirt stain positioning device based on vision
CN113658223A (en) Multi-pedestrian detection and tracking method and system based on deep learning
CN117037049B (en) Image content detection method and system based on YOLOv5 deep learning
CN109711445A (en) The similar method of weighting of intelligence in the super-pixel of target following classifier on-line training sample
Kerdvibulvech Hybrid model of human hand motion for cybernetics application

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant