CN111369686A - AR imaging virtual shoe fitting method and device capable of processing local shielding objects - Google Patents

AR imaging virtual shoe fitting method and device capable of processing local shielding objects

Info

Publication number
CN111369686A
CN111369686A CN202010138425.4A
Authority
CN
China
Prior art keywords
target
image
shoe
foot
try
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010138425.4A
Other languages
Chinese (zh)
Inventor
李汪洋 (Li Wangyang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Huotui Technology Co ltd
Original Assignee
Zugou Technology Hangzhou Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zugou Technology Hangzhou Co ltd filed Critical Zugou Technology Hangzhou Co ltd
Priority to CN202010138425.4A
Publication of CN111369686A
Legal status: Pending

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 — Manipulating 3D models or images for computer graphics
    • G06T19/006 — Mixed reality
    • G06T15/00 — 3D [Three Dimensional] image rendering
    • G06T15/005 — General purpose rendering architectures
    • G06T7/00 — Image analysis
    • G06T7/10 — Segmentation; Edge detection
    • G06T7/11 — Region-based segmentation
    • G06T2207/00 — Indexing scheme for image analysis or image enhancement
    • G06T2207/20 — Special algorithmic details
    • G06T2207/20084 — Artificial neural networks [ANN]
    • G06T2207/30 — Subject of image; Context of image processing
    • G06T2207/30196 — Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses an AR imaging virtual shoe fitting method capable of processing local occluders, and a device for realizing the method. The method mainly comprises the following steps: step 1, segmenting and identifying, through a Mask R-CNN neural network, an ankle target, an instep or vamp target and an occluder target in a foot image, together with the occlusion relation among them; step 2, calculating the predicted 6D pose of the instep or vamp target by using the PVnet algorithm; step 3, generating a try-on shoe image corresponding to the predicted 6D pose based on the 3D model of the try-on shoe, covering the instep or vamp target in the foot area image with it, and maintaining the occlusion relation. The scheme can effectively handle the case where a foreign object unexpectedly enters the captured image and occludes the foot, can still display the AR three-dimensional shoe model completely and accurately on the foot in the captured image, and can faithfully restore the occlusion by trousers and the ankle, improving the user's try-on experience.

Description

AR imaging virtual shoe fitting method and device capable of processing local shielding objects
Technical Field
The invention relates to the technical field of AR (augmented reality), in particular to an AR imaging virtual shoe fitting method and device.
Background
At present, online shopping is a general trend: consumers can learn about all kinds of goods and pick out their favorites without leaving home, and payment is convenient and rapid. But problems remain: (1) the traditional e-commerce mode of presenting shoes through pictures and videos is not user-centered and cannot shift the display of the goods to follow the user's point of attention. (2) Users cannot reliably judge whether the style of shoes bought online suits them, which causes returns and reduces the conversion rate.
By applying AR technology, a user can point a camera at his or her feet anytime and anywhere to overlay an AR three-dimensional shoe model on the captured image, try on various shoes, and inspect the goods through a full 360 degrees by moving and rotating, checking the details at the same time. However, the existing virtual shoe fitting technology needs an image of the whole foot to accurately identify the foot position. When a foot is covered by trousers, the feet are crossed, or another person or object blocks the foot so that only a small part such as the heel is exposed, the foot cannot be recognized; as a result, the shoe renders poorly when the user rotates the foot to observe the overall try-on effect, or the virtual shoe fitting system cannot work at all in the presence of an occluder.
Meanwhile, when trying on shoes, the user also wants to see how the shoes fit with other parts such as trousers and ankles; simply erasing the occlusion produced by these parts harms the try-on experience. The ideal effect is that when the AR three-dimensional image of the try-on shoe is added to the camera image, the occlusion by the trousers and ankles is faithfully restored.
Disclosure of Invention
The invention aims to overcome the problem that virtual shoe fitting methods and systems in the prior art cannot handle a foot being occluded by a foreign object in the captured image, which makes practical operation inconvenient, and provides an AR imaging virtual shoe fitting method and device capable of processing local occluders.
In order to achieve the purpose, the invention adopts the following technical scheme:
The invention provides an AR imaging virtual shoe fitting method capable of processing local occluders, which comprises the following steps:
step 1, calling a camera to capture an image of the foot region, and segmenting and identifying, through a Mask R-CNN neural network, an ankle target, an instep or vamp target and an occluder target in the foot image, together with the occlusion relation among them;
step 2, calculating the predicted 6D pose of the instep or vamp target by using the PVnet algorithm;
step 3, generating a try-on shoe image corresponding to the predicted 6D pose based on the 3D model of the try-on shoe, covering the instep or vamp target in the foot area image with the try-on shoe image while maintaining the occlusion relation, and displaying the rendered AR try-on effect image on the user terminal.
Preferably, the step 2 specifically comprises the following steps:
step 201, down-sampling the obtained foot area image through a ResNet network, then up-sampling the down-sampled features to obtain a semantic segmentation map of the instep or vamp target in the foot area image and a vector field pointing to the 2D key points of the instep or vamp target;
step 202, calculating the voting score of each pixel for each 2D key point according to the vector from the pixel to the key point;
step 203, calculating the 6D pose of the instep or vamp target with a PnP algorithm, using the mean and covariance of the voting scores of all the 2D key points.
Preferably, the 2D keypoints are determined by using a farthest point sampling algorithm.
Preferably, in step 202, the vector from each pixel to a 2D key point is computed as:

$$v_k(p) = \frac{x_k - p}{\lVert x_k - p \rVert_2}$$

wherein $p$ denotes a pixel and $x_k$ denotes the $k$-th 2D key point;

the voting score of each possible key point is computed as:

$$w_{k,i} = \sum_{p \in O} \mathbb{I}\!\left(\frac{(h_{k,i} - p)^{T}}{\lVert h_{k,i} - p \rVert_2}\, v_k(p) \ge \theta\right)$$

wherein $h_{k,i}$ is the $i$-th possible position (hypothesis) of key point $k$, generated from the vectors of two pixels; $p$ ranges over the set $O$ of pixels of the target; $\mathbb{I}$ is the indicator function and $\theta$ a similarity threshold.
Preferably, the step 203 specifically includes the following steps:
calculating the mean of all hypotheses for each 2D key point, weighted by their voting scores:

$$\mu_k = \frac{\sum_{i} w_{k,i}\, h_{k,i}}{\sum_{i} w_{k,i}}$$

wherein $h_{k,i}$ is a possible key point and $w_{k,i}$ is its voting score;

calculating the covariance of the hypotheses for each 2D key point in the same weighted manner:

$$\Sigma_k = \frac{\sum_{i} w_{k,i}\,(h_{k,i} - \mu_k)(h_{k,i} - \mu_k)^{T}}{\sum_{i} w_{k,i}}$$

and calculating the 6D pose with the PnP algorithm by minimizing the Mahalanobis distance:

$$\min_{R,t} \sum_{k=1}^{K} (\tilde{x}_k - \mu_k)^{T}\, \Sigma_k^{-1}\, (\tilde{x}_k - \mu_k), \qquad \tilde{x}_k = \pi(R\,X_k + t)$$

wherein $X_k$ is the coordinate of the $k$-th 3D key point and $\tilde{x}_k$ is its 2D projection under the pose $(R, t)$;

thereby the mapping relation between the 2D key points in the foot image and the 3D key points of the preset foot object is obtained, and with it the 6D pose of the user's foot in the foot image.
Preferably, the foot area image includes the current frame of the user's foot area image acquired by the user's camera.
Preferably, the step 3 specifically comprises the following steps:
step 301, loading the 3D model of the try-on shoe over the network or from local storage;
step 302, acquiring the try-on shoe image of the 3D model of the try-on shoe posed at the predicted 6D pose of the instep or vamp target;
step 303, covering the position of the instep or vamp target in the foot area image with the try-on shoe image, letting the ankle target and the occluder target occlude the try-on shoe image at the corresponding positions according to the occlusion relation among the ankle target, the instep or vamp target and the occluder target, and displaying the rendered image on the user terminal.
The invention further provides an AR imaging virtual shoe fitting device capable of processing local occluders, which is used for realizing the above method and comprises:
a segmentation module for calling a camera to capture an image of the foot region, and segmenting and identifying, through a Mask R-CNN neural network, an ankle target, an instep or vamp target and an occluder target in the foot image, together with the occlusion relation among them;
a pose prediction module for calculating the predicted 6D pose of the instep or vamp target using the PVnet algorithm;
a rendering module for generating a try-on shoe image corresponding to the predicted 6D pose based on the 3D model of the try-on shoe, covering the instep or vamp target in the foot area image with it while maintaining the occlusion relation, and displaying the rendered AR try-on effect image on the user terminal.
Preferably, the pose prediction module includes:
a sampling unit for down-sampling the obtained foot area image through a ResNet network, then up-sampling the down-sampled features to obtain a semantic segmentation map of the instep or vamp target in the foot area image and a vector field pointing to the 2D key points of the instep or vamp target;
a voting unit for calculating the voting score of each pixel for each 2D key point according to the vector from the pixel to the key point;
a pose unit for calculating the 6D pose of the instep or vamp target with a PnP algorithm, using the mean and covariance of the voting scores of all the 2D key points.
Preferably, the rendering module includes:
a loading unit for loading the 3D model of the try-on shoe over the network or from local storage;
an image unit for acquiring the try-on shoe image of the 3D model of the try-on shoe posed at the predicted 6D pose of the instep or vamp target;
a rendering unit for covering the position of the instep or vamp target in the foot area image with the try-on shoe image, letting the ankle target and the occluder target occlude the try-on shoe image at the corresponding positions according to the occlusion relation among the ankle target, the instep or vamp target and the occluder target, and displaying the rendered image on the user terminal.
When the user's foot is occluded by an unexpected article (such as a trouser leg), the method can eliminate the interference of unrelated objects: key points are detected using local information of the visible part of the instep or vamp target, and each pixel predicts a direction vector pointing to the key points of the object, from which the pose of the user's foot is determined. The 3D model image of the try-on shoe pre-stored on the server is then superimposed at the foot position in the image, achieving the virtual try-on effect. Meanwhile, according to the occlusion relation among the ankle target, the instep or vamp target and the occluder target, the ankle target and the occluder target occlude the try-on shoe image at the corresponding positions, so that the occlusion by the trousers and the ankle is faithfully restored.
Drawings
FIG. 1 is a first flowchart of the AR imaging virtual shoe fitting method capable of processing local occluders according to the present invention.
FIG. 2 is a second flowchart of the AR imaging virtual shoe fitting method capable of processing local occluders according to the present invention.
FIG. 3 is a third flowchart of the AR imaging virtual shoe fitting method capable of processing local occluders according to the present invention.
FIG. 4 is a functional block diagram of the AR imaging virtual shoe fitting apparatus capable of processing local occluders according to the present invention.
FIG. 5 is another functional block diagram of the AR imaging virtual shoe fitting apparatus capable of processing local occluders according to the present invention.
Detailed Description
The invention is further described with reference to the following figures and detailed description.
As shown in Fig. 1, the present application provides an AR imaging virtual shoe fitting method capable of processing local occluders, the method comprising the following steps:
Step 1, calling a camera to capture an image of the foot region, and segmenting and identifying, through a Mask R-CNN neural network, an ankle target, an instep or vamp target and an occluder target in the foot image, together with the occlusion relation among them.
The foot area image may be a still picture of the user's foot area taken by the user's camera, or the current frame of a live image of the user's foot area; the method can therefore perform AR virtual shoe fitting both on photos and on real-time video. The Mask R-CNN neural network segments and identifies the ankle target, i.e. the recognized ankle of the user; the instep or vamp target, which is the shoe when the user is wearing shoes (so that the try-on shoe image can be rendered at the position of the shoe) or the instep when the user is barefoot (so that the try-on shoe image can be rendered at the position of the instep); and the occluder target, which includes a trouser leg or any other accidental occluder. Meanwhile, the Mask R-CNN neural network identifies the occlusion relation among the ankle target, the instep or vamp target and the occluder target, i.e. from inside to outside: ankle, vamp, trouser leg and accidental occluder.
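For concreteness, the segmentation step can be sketched with torchvision's Mask R-CNN. This is a minimal inference sketch, assuming a model fine-tuned on three custom classes; the class ids (1 = ankle, 2 = instep/vamp, 3 = occluder), the 0.7 score threshold and the weight file are illustrative assumptions, not details given by the patent:

```python
# Minimal sketch: Mask R-CNN segmentation of ankle / instep-or-vamp / occluder.
# Class ids, threshold and weight file are assumptions for illustration.
import torch
from torchvision.models.detection import maskrcnn_resnet50_fpn

NUM_CLASSES = 4  # background + ankle + instep/vamp + occluder
model = maskrcnn_resnet50_fpn(num_classes=NUM_CLASSES)
# model.load_state_dict(torch.load("foot_maskrcnn.pth"))  # hypothetical weights
model.eval()

@torch.no_grad()
def segment_foot(frame):
    """frame: float tensor, CxHxW in [0, 1]; returns a per-pixel label map."""
    out = model([frame])[0]
    keep = out["scores"] > 0.7                 # drop low-confidence detections
    masks = out["masks"][keep, 0] > 0.5        # binary instance masks, (M, H, W)
    labels = out["labels"][keep]
    # Occlusion relation, inside to outside: ankle (1) < vamp (2) < occluder (3);
    # painting inner classes first lets the outermost class win contested pixels.
    layered = torch.zeros(frame.shape[1:], dtype=torch.long)
    for i in torch.argsort(labels):
        layered[masks[i]] = labels[i]
    return layered
```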
Step 2, calculating the predicted 6D pose of the instep or vamp target by using the PVnet algorithm.
Step 3, generating a try-on shoe image corresponding to the predicted 6D pose based on the 3D model of the try-on shoe, covering the instep or vamp target in the foot area image with the try-on shoe image while maintaining the occlusion relation, and displaying the rendered AR try-on effect image on the user terminal.
As shown in Fig. 2, preferably, the step 2 specifically includes the following steps:
Step 201, down-sampling the obtained foot area image through a ResNet network, then up-sampling the down-sampled features to obtain a semantic segmentation map of the instep or vamp target in the foot area image and a vector field pointing to the 2D key points of the instep or vamp target.
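A minimal sketch of such a network is given below, assuming a ResNet-18 backbone, K = 8 key points and a single bilinear up-sampling step (the patent names only "ResNet"; the backbone depth, K, and the absence of skip connections are assumptions):

```python
# Minimal sketch of the step-201 network: down-sample with a ResNet encoder,
# up-sample back, and predict a segmentation mask plus a 2K-channel vector
# field (two components per key point for every pixel).
import torch
import torch.nn as nn
import torchvision

K = 8  # number of 2D key points (assumed)

class SegVectorNet(nn.Module):
    def __init__(self):
        super().__init__()
        backbone = torchvision.models.resnet18(weights=None)
        self.encoder = nn.Sequential(*list(backbone.children())[:-2])  # drop pool/fc
        self.head = nn.Conv2d(512, 1 + 2 * K, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[2:]
        out = self.head(self.encoder(x))          # down-sampled to H/32 x W/32
        out = nn.functional.interpolate(           # up-sample back to input size
            out, size=(h, w), mode="bilinear", align_corners=False)
        seg = torch.sigmoid(out[:, :1])             # instep/vamp segmentation map
        vectors = out[:, 1:]                        # per-pixel key-point vectors
        return seg, vectors                         # normalize vectors before voting

net = SegVectorNet()
seg, vectors = net(torch.randn(1, 3, 256, 256))
```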
The 2D key points are determined offline by a farthest point sampling algorithm.
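A small numpy sketch of farthest point sampling, run over the vertices of the target's 3D model to pick the K key points (`vertices` as an (N, 3) array is an assumed input):

```python
# Farthest point sampling sketch: greedily pick points that maximize the
# distance to the already-chosen set, spreading key points over the model.
import numpy as np

def farthest_point_sampling(vertices, k, seed=0):
    rng = np.random.default_rng(seed)
    chosen = [int(rng.integers(len(vertices)))]     # arbitrary first point
    dist = np.linalg.norm(vertices - vertices[chosen[0]], axis=1)
    for _ in range(k - 1):
        nxt = int(dist.argmax())                    # farthest from chosen set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(vertices - vertices[nxt], axis=1))
    return vertices[chosen]
```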
Step 202, calculating the voting score of each pixel for each 2D key point according to the vector from the pixel to the key point.
In step 202, the vector from each pixel to a 2D key point is computed as:

$$v_k(p) = \frac{x_k - p}{\lVert x_k - p \rVert_2}$$

wherein $p$ denotes a pixel and $x_k$ denotes the $k$-th 2D key point;

the voting score of each possible key point is computed as:

$$w_{k,i} = \sum_{p \in O} \mathbb{I}\!\left(\frac{(h_{k,i} - p)^{T}}{\lVert h_{k,i} - p \rVert_2}\, v_k(p) \ge \theta\right)$$

wherein $h_{k,i}$ is the $i$-th possible position (hypothesis) of key point $k$, generated from the vectors of two pixels; $p$ ranges over the set $O$ of pixels of the target; $\mathbb{I}$ is the indicator function and $\theta$ a similarity threshold.
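In code, the voting can be sketched as a RANSAC-style procedure: key-point hypotheses $h_{k,i}$ come from intersecting the predicted vectors of random pixel pairs, and every foreground pixel votes for each hypothesis whose direction agrees with its own prediction. The hypothesis count and the cosine threshold below are assumptions:

```python
# Voting sketch for one key point. `pixels` is an (N, 2) array of foreground
# pixel coordinates and `dirs` the (N, 2) unit vectors predicted for them.
import numpy as np

def line_intersect(p1, v1, p2, v2):
    # solve p1 + t1*v1 = p2 + t2*v2 for the intersection of the two 2D rays
    a = np.array([[v1[0], -v2[0]], [v1[1], -v2[1]]])
    t = np.linalg.solve(a, p2 - p1)
    return p1 + t[0] * v1

def vote_keypoint(pixels, dirs, n_hyp=128, cos_thresh=0.99, seed=0):
    rng = np.random.default_rng(seed)
    hyps, scores = [], []
    for _ in range(n_hyp):
        i, j = rng.choice(len(pixels), size=2, replace=False)
        try:
            h = line_intersect(pixels[i], dirs[i], pixels[j], dirs[j])
        except np.linalg.LinAlgError:
            continue                                # parallel rays: no hypothesis
        to_h = h - pixels                           # vectors from pixels to h
        to_h /= np.linalg.norm(to_h, axis=1, keepdims=True) + 1e-8
        votes = np.sum(np.sum(to_h * dirs, axis=1) >= cos_thresh)
        hyps.append(h)
        scores.append(votes)
    return np.array(hyps), np.array(scores, dtype=float)
```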
Step 203, calculating the 6D pose of the instep or vamp target with a PnP algorithm, using the mean and covariance of the voting scores of all the 2D key points.
Said step 203 further comprises the steps of:
calculating the mean of all hypotheses for each 2D key point, weighted by their voting scores:

$$\mu_k = \frac{\sum_{i} w_{k,i}\, h_{k,i}}{\sum_{i} w_{k,i}}$$

wherein $h_{k,i}$ is a possible key point and $w_{k,i}$ is its voting score;

calculating the covariance of the hypotheses for each 2D key point in the same weighted manner:

$$\Sigma_k = \frac{\sum_{i} w_{k,i}\,(h_{k,i} - \mu_k)(h_{k,i} - \mu_k)^{T}}{\sum_{i} w_{k,i}}$$

and calculating the 6D pose with the PnP algorithm by minimizing the Mahalanobis distance:

$$\min_{R,t} \sum_{k=1}^{K} (\tilde{x}_k - \mu_k)^{T}\, \Sigma_k^{-1}\, (\tilde{x}_k - \mu_k), \qquad \tilde{x}_k = \pi(R\,X_k + t)$$

wherein $X_k$ is the coordinate of the $k$-th 3D key point and $\tilde{x}_k$ is its 2D projection under the pose $(R, t)$.

The mapping relation between the 2D key points in the foot image and the 3D key points of the preset foot object is thus obtained, and with it the 6D pose of the user's foot in the foot image.
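The step can be sketched as follows; note that OpenCV's `solvePnP` minimizes a plain reprojection error rather than the Mahalanobis distance above, so the sketch is an approximation (a faithful solver would weight each residual by $\Sigma_k^{-1}$):

```python
# Sketch of step 203: score-weighted mean and covariance per key point, then a
# 6D pose from PnP on the mean key-point locations (an approximation of the
# Mahalanobis-distance objective above).
import numpy as np
import cv2

def keypoint_stats(hyps, scores):
    mu = (scores[:, None] * hyps).sum(0) / scores.sum()
    d = hyps - mu
    cov = (scores[:, None, None] * d[:, :, None] * d[:, None, :]).sum(0) / scores.sum()
    return mu, cov

def solve_pose(points_3d, means_2d, camera_matrix):
    # points_3d: (K, 3) model key points; means_2d: (K, 2) voted key-point means
    ok, rvec, tvec = cv2.solvePnP(
        points_3d.astype(np.float64), means_2d.astype(np.float64),
        camera_matrix, None, flags=cv2.SOLVEPNP_EPNP)
    return rvec, tvec  # 6D pose: rotation (Rodrigues vector) + translation
```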
In the prior art, identifying the 6D pose of an object in space requires a complete image of the object, and performance degrades when the object is partially occluded. In virtual try-on, however, the vamp or instep may be occluded by trouser legs or other objects, making the pose identification inaccurate and the try-on effect deviate. With the technical scheme of step 2, even when the instep or vamp target is partially occluded by a trouser leg or another accidental occluder, the instep or vamp can still be identified and its 6D pose determined, providing an accurate basis for generating the try-on shoe image according to the 6D pose and rendering the try-on effect faithfully.
As shown in Fig. 3, preferably, the step 3 specifically includes the following steps:
Step 301, loading the 3D model of the try-on shoe over the network or from local storage;
Step 302, acquiring the try-on shoe image of the 3D model of the try-on shoe posed at the predicted 6D pose of the instep or vamp target;
Step 303, covering the position of the instep or vamp target in the foot area image with the try-on shoe image, letting the ankle target and the occluder target occlude the try-on shoe image at the corresponding positions according to the occlusion relation among the ankle target, the instep or vamp target and the occluder target, and displaying the rendered image on the user terminal.
The technical scheme of step 3 uses the occlusion relation identified in step 1 among the ankle target, the instep or vamp target and the occluder target: when the try-on shoe image is rendered over the instep or vamp target, the occlusion of the instep or vamp by trouser legs and other occluders, and the occlusion of the ankle target by the vamp, are both maintained. The try-on effect is thus simulated more realistically, and the user can see how the try-on shoes match his or her socks or trouser legs.
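A compositing sketch under these rules (the array names and layouts are assumptions: images are HxWx3 uint8 arrays, masks are HxW booleans produced in step 1):

```python
# Occlusion-aware compositing sketch: the shoe render covers the instep/vamp
# pixels, then occluder pixels (trouser leg, etc.) are restored on top.
import numpy as np

def composite(frame, shoe_render, shoe_alpha, vamp_mask, occluder_mask):
    out = frame.copy()
    paint = vamp_mask & (shoe_alpha > 0)        # shoe replaces the vamp region only
    out[paint] = shoe_render[paint]
    out[occluder_mask] = frame[occluder_mask]   # occluders stay in front of the shoe
    return out
```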
As shown in Fig. 4, the present application provides an AR imaging virtual shoe fitting device capable of processing local occluders, which is used for implementing the above method and comprises:
a segmentation module 1 for calling a camera to capture an image of the foot region, and segmenting and identifying, through a Mask R-CNN neural network, an ankle target, an instep or vamp target and an occluder target in the foot image, together with the occlusion relation among them;
a pose prediction module 2 for calculating the predicted 6D pose of the instep or vamp target using the PVnet algorithm;
a rendering module 3 for generating a try-on shoe image corresponding to the predicted 6D pose based on the 3D model of the try-on shoe, covering the instep or vamp target in the foot area image with it while maintaining the occlusion relation, and displaying the rendered AR try-on effect image on the user terminal.
As shown in Fig. 5, preferably, the pose prediction module 2 comprises:
a sampling unit 201 for down-sampling the obtained foot area image through a ResNet network, then up-sampling the down-sampled features to obtain a semantic segmentation map of the instep or vamp target in the foot area image and a vector field pointing to the 2D key points of the instep or vamp target;
a voting unit 202 for calculating the voting score of each pixel for each 2D key point according to the vector from the pixel to the key point;
a pose unit 203 for calculating the 6D pose of the instep or vamp target with a PnP algorithm, using the mean and covariance of the voting scores of all the 2D key points.
The rendering module 3 comprises:
a loading unit 301 for loading the 3D model of the try-on shoe over the network or from local storage;
an image unit 302 for acquiring the try-on shoe image of the 3D model of the try-on shoe posed at the predicted 6D pose of the instep or vamp target;
a rendering unit 303 for covering the position of the instep or vamp target in the foot area image with the try-on shoe image, letting the ankle target and the occluder target occlude the try-on shoe image at the corresponding positions according to the occlusion relation among the ankle target, the instep or vamp target and the occluder target, and displaying the rendered image on the user terminal.
When the user's foot is occluded by an unexpected article (such as a trouser leg), the method eliminates the interference of unrelated objects: key points are detected using local information of the visible part of the object, each pixel predicts a direction vector pointing to the key points of the object, the pose of the user's foot is determined from these votes, and the 3D model image of the try-on shoe pre-stored on the server is superimposed at the foot position in the image, achieving the virtual try-on effect.
The key-point locating method has three advantages over existing key-point detection methods. First, existing methods predict each key point only once, whereas here every pixel of the visible part of the object casts a prediction for the key points, which greatly improves the robustness of the model. Second, the direction-vector-field representation of the object key points makes full use of the properties of the user's foot: when only part of the foot is exposed, the directions toward the rest of the object can still be estimated, and this representation helps the network learn the structural properties of the foot. Third, existing methods can only represent key points inside the picture, while the direction vector field can locate key points outside the picture, so the 6D pose of a truncated foot can still be detected.
Meanwhile, the scheme applies the Mask R-CNN neural network, an image segmentation model based on a deep-learning convolutional neural network, which can segment and identify the ankle target, the instep or vamp target and the occluder target in the foot image, where the occluder target includes trouser legs and other accidental occluders. According to the occlusion relation among the ankle target, the instep or vamp target and the occluder target, the ankle target and the occluder target occlude the try-on shoe image at the corresponding positions, so that the occlusion by the trousers and the ankle is faithfully restored.

Claims (10)

1. An AR imaging virtual shoe fitting method capable of processing local occluders, characterized by comprising the following steps:
step 1, calling a camera to capture an image of the foot region, and segmenting and identifying, through a Mask R-CNN neural network, an ankle target, an instep or vamp target and an occluder target in the foot image, together with the occlusion relation among them;
step 2, calculating the predicted 6D pose of the instep or vamp target by using the PVnet algorithm;
step 3, generating a try-on shoe image corresponding to the predicted 6D pose based on the 3D model of the try-on shoe, covering the instep or vamp target in the foot area image with the try-on shoe image while maintaining the occlusion relation, and displaying the rendered AR try-on effect image on the user terminal.
2. The AR imaging virtual shoe fitting method capable of processing local occluders according to claim 1, characterized in that the step 2 specifically comprises the following steps:
step 201, down-sampling the obtained foot area image through a ResNet network, then up-sampling the down-sampled features to obtain a semantic segmentation map of the instep or vamp target in the foot area image and a vector field pointing to the 2D key points of the instep or vamp target;
step 202, calculating the voting score of each pixel for each 2D key point according to the vector from the pixel to the key point;
step 203, calculating the 6D pose of the instep or vamp target with a PnP algorithm, using the mean and covariance of the voting scores of all the 2D key points.
3. The AR imaging virtual shoe fitting method capable of processing local occluders according to claim 2, characterized in that the 2D key points are determined by a farthest point sampling algorithm.
4. The AR imaging virtual shoe fitting method capable of processing local occluders according to claim 2, characterized in that,
in step 202, the vector from each pixel to a 2D key point is computed as:

$$v_k(p) = \frac{x_k - p}{\lVert x_k - p \rVert_2}$$

wherein $p$ denotes a pixel and $x_k$ denotes the $k$-th 2D key point;

the voting score of each possible key point is computed as:

$$w_{k,i} = \sum_{p \in O} \mathbb{I}\!\left(\frac{(h_{k,i} - p)^{T}}{\lVert h_{k,i} - p \rVert_2}\, v_k(p) \ge \theta\right)$$

wherein $h_{k,i}$ is the $i$-th possible position (hypothesis) of key point $k$, generated from the vectors of two pixels; $p$ ranges over the set $O$ of pixels of the target; $\mathbb{I}$ is the indicator function and $\theta$ a similarity threshold.
5. The AR imaging virtual shoe fitting method capable of processing local occluders according to claim 2, characterized in that the step 203 comprises the following steps:
calculating the mean of all hypotheses for each 2D key point, weighted by their voting scores:

$$\mu_k = \frac{\sum_{i} w_{k,i}\, h_{k,i}}{\sum_{i} w_{k,i}}$$

wherein $h_{k,i}$ is a possible key point and $w_{k,i}$ is its voting score;

calculating the covariance of the hypotheses for each 2D key point in the same weighted manner:

$$\Sigma_k = \frac{\sum_{i} w_{k,i}\,(h_{k,i} - \mu_k)(h_{k,i} - \mu_k)^{T}}{\sum_{i} w_{k,i}}$$

and calculating the 6D pose with the PnP algorithm by minimizing the Mahalanobis distance:

$$\min_{R,t} \sum_{k=1}^{K} (\tilde{x}_k - \mu_k)^{T}\, \Sigma_k^{-1}\, (\tilde{x}_k - \mu_k), \qquad \tilde{x}_k = \pi(R\,X_k + t)$$

wherein $X_k$ is the coordinate of the $k$-th 3D key point and $\tilde{x}_k$ is its 2D projection under the pose $(R, t)$;

thereby the mapping relation between the 2D key points in the foot image and the 3D key points of the preset foot object is obtained, and with it the 6D pose of the user's foot in the foot image.
6. The AR imaging virtual shoe fitting method capable of processing local occluders according to claim 1, characterized in that the foot area image includes the current frame of the user's foot area image acquired by the user's camera.
7. The AR imaging virtual shoe fitting method capable of processing local occluders according to claim 1, characterized in that the step 3 comprises the following steps:
step 301, loading the 3D model of the try-on shoe over the network or from local storage;
step 302, acquiring the try-on shoe image of the 3D model of the try-on shoe posed at the predicted 6D pose of the instep or vamp target;
step 303, covering the position of the instep or vamp target in the foot area image with the try-on shoe image, letting the ankle target and the occluder target occlude the try-on shoe image at the corresponding positions according to the occlusion relation among the ankle target, the instep or vamp target and the occluder target, and displaying the rendered image on the user terminal.
8. An AR imaging virtual shoe fitting device capable of processing local occluders, characterized in that the device comprises:
a segmentation module for calling a camera to capture an image of the foot region, and segmenting and identifying, through a Mask R-CNN neural network, an ankle target, an instep or vamp target and an occluder target in the foot image, together with the occlusion relation among them;
a pose prediction module for calculating the predicted 6D pose of the instep or vamp target using the PVnet algorithm;
a rendering module for generating a try-on shoe image corresponding to the predicted 6D pose based on the 3D model of the try-on shoe, covering the instep or vamp target in the foot area image with it while maintaining the occlusion relation, and displaying the rendered AR try-on effect image on the user terminal.
9. The AR imaging virtual shoe fitting device capable of processing local occluders according to claim 8, characterized in that the pose prediction module comprises:
a sampling unit for down-sampling the obtained foot area image through a ResNet network, then up-sampling the down-sampled features to obtain a semantic segmentation map of the instep or vamp target in the foot area image and a vector field pointing to the 2D key points of the instep or vamp target;
a voting unit for calculating the voting score of each pixel for each 2D key point according to the vector from the pixel to the key point;
a pose unit for calculating the 6D pose of the instep or vamp target with a PnP algorithm, using the mean and covariance of the voting scores of all the 2D key points.
10. The AR imaging virtual shoe fitting device capable of processing local occluders according to claim 8, characterized in that the rendering module comprises:
a loading unit for loading the 3D model of the try-on shoe over the network or from local storage;
an image unit for acquiring the try-on shoe image of the 3D model of the try-on shoe posed at the predicted 6D pose of the instep or vamp target;
a rendering unit for covering the position of the instep or vamp target in the foot area image with the try-on shoe image, letting the ankle target and the occluder target occlude the try-on shoe image at the corresponding positions according to the occlusion relation among the ankle target, the instep or vamp target and the occluder target, and displaying the rendered image on the user terminal.
CN202010138425.4A 2020-03-03 2020-03-03 AR imaging virtual shoe fitting method and device capable of processing local shielding objects Pending CN111369686A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010138425.4A CN111369686A (en) 2020-03-03 2020-03-03 AR imaging virtual shoe fitting method and device capable of processing local shielding objects

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010138425.4A CN111369686A (en) 2020-03-03 2020-03-03 AR imaging virtual shoe fitting method and device capable of processing local shielding objects

Publications (1)

Publication Number Publication Date
CN111369686A true CN111369686A (en) 2020-07-03

Family

ID=71211674

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010138425.4A Pending CN111369686A (en) 2020-03-03 2020-03-03 AR imaging virtual shoe fitting method and device capable of processing local shielding objects

Country Status (1)

Country Link
CN (1) CN111369686A (en)


Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170076362A1 (en) * 2015-09-11 2017-03-16 Zachary Joyner Future Kicks
JP2017146909A (en) * 2016-02-19 2017-08-24 株式会社アスコン Try-on simulation system and program
US20170323374A1 (en) * 2016-05-06 2017-11-09 Seok Hyun Park Augmented reality image analysis methods for the virtual fashion items worn
US20200000180A1 (en) * 2017-02-18 2020-01-02 Digital Animal Interactive Inc. System, method, and apparatus for modelling feet and selecting footwear
US20190034996A1 (en) * 2017-07-27 2019-01-31 ELSE CORP S.r.I. Shoe last selection method, based on virtual fitting simulation and customer feedback
US20190180410A1 (en) * 2017-12-11 2019-06-13 Youspace, Inc. Object separation for scanned assets
US20200065991A1 (en) * 2018-08-21 2020-02-27 National Tsing Hua University Method and system of virtual footwear try-on with improved occlusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CLAUDIA REDAELLI ET AL.: "Shoe customers' behaviour with new technologies: the Magic Mirror case" *
P. EISERT ET AL.: "3-D Tracking of Shoes for Virtual Mirror Applications" *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111985995A (en) * 2020-08-14 2020-11-24 足购科技(杭州)有限公司 WeChat applet-based shoe virtual fitting method and device
CN111882408A (en) * 2020-09-27 2020-11-03 北京达佳互联信息技术有限公司 Virtual trial method and device, electronic equipment and storage equipment
CN111882408B (en) * 2020-09-27 2021-07-09 北京达佳互联信息技术有限公司 Virtual trial method and device, electronic equipment and storage medium
CN112330784A (en) * 2020-10-21 2021-02-05 北京沃东天骏信息技术有限公司 Virtual image generation method and device
WO2022083389A1 (en) * 2020-10-21 2022-04-28 北京沃东天骏信息技术有限公司 Virtual image generation method and apparatus
CN112562063A (en) * 2020-12-08 2021-03-26 北京百度网讯科技有限公司 Method, device, equipment and storage medium for carrying out three-dimensional attempt on object
WO2022171020A1 (en) * 2021-02-10 2022-08-18 北京字跳网络技术有限公司 Image display method and apparatus, device, and medium
WO2022188708A1 (en) * 2021-03-11 2022-09-15 北京字跳网络技术有限公司 Shoe try-on method and apparatus based on augmented reality, and electronic device
CN113763440A (en) * 2021-04-26 2021-12-07 腾讯科技(深圳)有限公司 Image processing method, device, equipment and storage medium
CN114445601A (en) * 2022-04-08 2022-05-06 北京大甜绵白糖科技有限公司 Image processing method, device, equipment and storage medium
CN115174985A (en) * 2022-08-05 2022-10-11 北京字跳网络技术有限公司 Special effect display method, device, equipment and storage medium
CN115174985B (en) * 2022-08-05 2024-01-30 北京字跳网络技术有限公司 Special effect display method, device, equipment and storage medium

Similar Documents

Publication Publication Date Title
CN111369686A (en) AR imaging virtual shoe fitting method and device capable of processing local shielding objects
US10621779B1 (en) Artificial intelligence based generation and analysis of 3D models
US10757373B2 (en) Method and system for providing at least one image captured by a scene camera of a vehicle
US8036416B2 (en) Method and apparatus for augmenting a mirror with information related to the mirrored contents and motion
US6188777B1 (en) Method and apparatus for personnel detection and tracking
CN105404888B (en) The conspicuousness object detection method of color combining and depth information
JP2015181042A (en) detection and tracking of moving objects
JP5439787B2 (en) Camera device
JPH11259660A (en) Three-dimensional operation restoration system
US20220284492A1 (en) Methods of capturing images and making garments
JP7162750B2 (en) Image processing device, image processing method, and program
JP7499280B2 (en) Method and system for monocular depth estimation of a person - Patents.com
CN110751716B (en) Virtual shoe test method based on single-view RGBD sensor
Hayashi et al. Occlusion detection of real objects using contour based stereo matching
GB2509783A (en) System and method for foot tracking
Wientapper et al. Composing the feature map retrieval process for robust and ready-to-use monocular tracking
Rim et al. Real-time human pose estimation using RGB-D images and deep learning
Le et al. Overlay upper clothing textures to still images based on human pose estimation
CN115345927A (en) Exhibit guide method and related device, mobile terminal and storage medium
CN114445601A (en) Image processing method, device, equipment and storage medium
CN111597893B (en) Pedestrian image matching method and device, storage medium and terminal
EP4160533A1 (en) Estimation program, estimation method, and estimation device
Malik et al. Private Try-On Using Marker Based Augmented Reality
Sundari et al. Development of Apparel 360°-An AR based Virtual Trial Room
Ning et al. Markerless client-server augmented reality system with natural features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220831

Address after: Room 308-2, Building 1, Flying Bird Business Center, Wuchang Street, Yuhang District, Hangzhou City, Zhejiang Province, 310000

Applicant after: Hangzhou Huotui Technology Co.,Ltd.

Address before: 310000 Yuhang District, Hangzhou City, Zhejiang Province

Applicant before: Zugou Technology (Hangzhou) Co.,Ltd.

AD01 Patent right deemed abandoned

Effective date of abandoning: 20231229