WO2024077809A1 - Three-dimensional reconstruction and commodity information processing method, apparatus, device and storage medium

Publication number
WO2024077809A1
Authority
WO
WIPO (PCT)
Prior art keywords
target
feature
dimensional model
image
frame
Application number
PCT/CN2023/071989
Other languages
English (en)
French (fr)
Inventor
俞洪蕴
陈志文
吕承飞
Original Assignee
阿里巴巴(中国)有限公司
Application filed by 阿里巴巴(中国)有限公司
Publication of WO2024077809A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00: Commerce
    • G06Q30/06: Buying, selling or leasing transactions
    • G06Q30/0601: Electronic shopping [e-shopping]
    • G06Q30/0641: Shopping interfaces
    • G06Q30/0643: Graphical representation of items or shoppers
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806: Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features

Definitions

  • the present application relates to the field of Internet technology, and in particular to a three-dimensional reconstruction and commodity information processing method, device, equipment and storage medium.
  • Multiple aspects of the present application provide a three-dimensional reconstruction and product information processing method, device, equipment and storage medium for high-precision three-dimensional reconstruction of a target object, so that suitable products can be selected for the target object based on the reconstructed three-dimensional model, thereby providing conditions for solving the existing return and exchange problem.
  • An embodiment of the present application provides a 3D reconstruction method, including: acquiring multiple frames of images including a target object, and three-dimensional model description information corresponding to the target object; inputting the multiple frames of images into a feature extraction network for feature extraction to obtain feature vectors of the multiple frames of images, and splicing the feature vectors of the multiple frames of images to obtain a target spliced feature vector; inputting the target spliced feature vector into a parameter regression network, and predicting multiple control parameter sets for model control according to the three-dimensional model description information, the multiple control parameters including posture control parameters and shape control parameters; and masking an initial three-dimensional model of the target object according to the posture control parameters and the shape control parameters to obtain a target three-dimensional model of the target object, wherein the initial three-dimensional model is obtained according to the three-dimensional model description information.
  • the embodiment of the present application also provides a 3D reconstruction device, comprising: an image acquisition unit, for acquiring multiple frames of images of a target object, and 3D model description information corresponding to the target object; a feature extraction unit, for inputting the multiple frames of images into a feature extraction network to perform feature extraction and obtain feature vectors of the multiple frame images; a vector splicing unit, used to splice the feature vectors of the multiple frame images to obtain a target splicing feature vector; a parameter regression unit, used to input the target splicing feature vector into the parameter regression network and predict multiple control parameters for model control according to the three-dimensional model description information, the multiple control parameters including posture control parameters and shape control parameters; a mask processing unit, used to perform mask processing on the initial three-dimensional model of the target object according to the posture control parameters and the shape control parameters to obtain a target three-dimensional model of the target object, the initial three-dimensional model being obtained according to the three-dimensional model description information.
  • An embodiment of the present application also provides a computer device, including: a memory and a processor; the memory is used to store a computer program, and the processor is coupled to the memory and is used to execute the computer program to implement the steps in the three-dimensional reconstruction method provided in the embodiment of the present application.
  • the embodiment of the present application also provides a computer-readable storage medium storing a computer program.
  • the processor is caused to execute the steps in the three-dimensional reconstruction method provided in the embodiment of the present application.
  • the embodiment of the present application also provides a commodity information processing method, comprising: obtaining a plurality of image frames including a tried-on object, and three-dimensional model description information corresponding to the tried-on object; inputting the plurality of image frames into a feature extraction network for feature extraction to obtain feature vectors of the plurality of image frames, and splicing the feature vectors of the plurality of image frames to obtain a target spliced feature vector; inputting the target spliced feature vector into a parameter regression network, and predicting a plurality of control parameters for model control according to the three-dimensional model description information, the plurality of control parameters including posture control parameters and shape control parameters; masking an initial three-dimensional model of the tried-on object according to the posture control parameters and the shape control parameters to obtain a target three-dimensional model of the tried-on object, the initial three-dimensional model being obtained according to the three-dimensional model description information; and providing the tried-on object with target commodity information adapted thereto according to the target three-dimensional model.
  • a new 3D reconstruction network architecture is used to perform 3D reconstruction for the target object.
  • the 3D reconstruction network architecture includes a feature extraction network for extracting features from multiple frames of images containing the target object, a vector splicing network for splicing feature vectors of multiple frames of images, a parameter regression network for predicting model parameters based on the number of parameters in the 3D model description information, and a mask processing network for performing mask processing based on the predicted control parameters.
  • With this 3D reconstruction network architecture, not only can end-to-end 3D reconstruction be achieved, but the accuracy of 3D reconstruction can also be improved.
  • After obtaining a high-precision 3D reconstruction model of the target object, suitable goods can be selected for the target object based on the 3D reconstruction model, thereby solving the problem of returns and exchanges caused by inappropriate purchases.
  • FIG1 is a model structure diagram of a three-dimensional reconstruction network provided in an embodiment of the present application.
  • FIG2 is a flow chart of a three-dimensional reconstruction method provided in an embodiment of the present application.
  • FIG3 is a model structure diagram of another 3D reconstruction network provided in an embodiment of the present application.
  • FIG4 is a model structure diagram of a feature extraction network provided in an embodiment of the present application.
  • FIG5 is a model structure diagram of a feature extraction module in a feature extraction network provided in an embodiment of the present application.
  • FIG6 is a model structure diagram of a downsampling submodule provided in an embodiment of the present application.
  • FIG7 is a flow chart of a method for processing commodity information provided in an embodiment of the present application.
  • FIG8 is a schematic structural diagram of a three-dimensional reconstruction device provided in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of the structure of a computer device provided in an embodiment of the present application.
  • “at least one” refers to one or more, and “more than one” refers to two or more.
  • “And/or” describes the association relationship between associated objects, indicating that three relationships may exist.
  • A and/or B may represent: A exists alone, A and B exist at the same time, or B exists alone, where A and B may be singular or plural.
  • the character “/” generally indicates that the related objects before and after are in an "or” relationship.
  • “first”, “second”, “third”, “fourth”, “fifth” and “sixth” are only used to distinguish the contents of different objects and have no other special meanings.
  • the embodiment of the present application provides a three-dimensional reconstruction and commodity information processing method, device, equipment and storage medium.
  • the initial three-dimensional model of the target object is created using the three-dimensional model description information corresponding to the target object, and the target object is three-dimensionally reconstructed from multiple images including the target object using a new three-dimensional reconstruction network architecture.
  • the feature vectors of each of the multiple images are extracted, and the feature vectors of each of the multiple images are spliced, and the posture control parameters and shape control parameters used for model control are predicted based on the spliced feature vectors, and the initial three-dimensional model of the target object is masked according to the posture control parameters and shape control parameters to obtain the target three-dimensional model of the target object.
  • this three-dimensional reconstruction method greatly improves the accuracy of the three-dimensional model.
  • the higher the accuracy of the three-dimensional model, the stronger the sense of reality of the three-dimensional model, and the more truly it can express the target object in the real world, thereby effectively expanding the application scope of the three-dimensional model and improving the application effect of the three-dimensional model.
  • In the commodity selection scene, it is possible to select and purchase a commodity suitable for the target object based on the three-dimensionally reconstructed model, which provides conditions for solving the existing return and exchange problem.
  • FIG1 is a model structure diagram of a three-dimensional reconstruction network provided in an embodiment of the present application.
  • the entire three-dimensional reconstruction network may include: a feature extraction network, a vector splicing network, a parameter regression network, and a mask processing network.
  • the target object may be any object that needs to be three-dimensionally reconstructed.
  • the target object may be, for example, a body part such as a foot object, a hand object, a head object, an elbow object, or a leg object on a human body, or various animals and plants in nature, or three-dimensional space scenes such as real houses and mountains, etc., without limitation.
  • an image acquisition device may be used to capture a video of the target object to obtain a video stream including the target object, as shown in 1 in FIG1 , and then multiple consecutive frames of images including the target object in the video stream are sequentially input into the three-dimensional reconstruction network, as shown in 2 and 3 in FIG1 , and then the target object may be reconstructed in three dimensions.
  • the feature extraction network extracts features from each frame of the image in turn, and extracts the feature vector of each frame of the image.
  • the feature vectors of multiple frames of images are sequentially spliced in the order of image acquisition time from early to late using the vector splicing network to obtain the target splicing feature vector; then, as shown in 6 and 7 in Figure 1, the target splicing feature vector is predicted and processed using the parameter regression network to obtain multiple control parameters for model control, and the multiple control parameters may include posture control parameters and shape control parameters.
  • the initial three-dimensional model of the target object obtained based on the three-dimensional model description information is masked using the mask processing network, and the target three-dimensional model of the target object can be output, thus completing the entire three-dimensional reconstruction task.
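  • The following is a minimal, illustrative sketch (not code from the application) of the forward pass through the four networks described above; the function and parameter names are assumptions made purely for illustration.

```python
# Illustrative sketch of the FIG. 1 pipeline: feature extraction -> vector splicing ->
# parameter regression -> mask processing. The four networks are passed in as callables.
def reconstruct(frames, initial_model, feature_net, splice_net, regress_net, mask_net):
    # 1) feature extraction network: one feature vector per frame containing the target
    feature_vectors = [feature_net(frame) for frame in frames]
    # 2) vector splicing network: splice in image-acquisition-time order (early to late)
    target_vector = splice_net(feature_vectors)
    # 3) parameter regression network: predict posture + shape control parameters
    posture_params, shape_params = regress_net(target_vector)
    # 4) mask processing network: apply the control parameters to the initial 3D model
    return mask_net(initial_model, posture_params, shape_params)
```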
  • the entire 3D reconstruction network can be deployed on a terminal device or on a server, or part of the entire 3D reconstruction network can be deployed on a terminal device and part of the network can be deployed on a server, without limitation.
  • the terminal device includes, but is not limited to, a mobile phone, a tablet computer, a laptop computer, a wearable device, and a vehicle-mounted device.
  • the server includes, but is not limited to, a single server or a distributed server cluster consisting of multiple servers.
  • The model structure of the 3D reconstruction network in FIG1 is merely schematic.
  • the feature extraction network may also be provided with a splicing processing function, so that the 3D reconstruction network does not need to include a dedicated vector splicing network.
  • the parameter regression network may also be provided with a mask processing function, so that the 3D reconstruction network does not need to include a dedicated mask processing network. Any neural network architecture having the above-mentioned feature extraction, vector splicing, parameter regression, and mask processing capabilities is applicable to the embodiments of the present application.
  • FIG2 is a flow chart of a 3D reconstruction method provided in an embodiment of the present application. Referring to FIG2 , the method may include the following steps:
  • Masking is performed on the initial three-dimensional model of the target object according to the posture control parameters and the shape control parameters to obtain a target three-dimensional model of the target object.
  • the initial three-dimensional model is obtained according to the three-dimensional model description information.
  • the three-dimensional model description information corresponding to the target object is prepared in advance.
  • the three-dimensional model description information corresponding to the target object can be determined based on the SMPL (Skinned Multi-Person Linear Model) model, which is a skinned, vertex-based three-dimensional human body model that can accurately represent different shapes and poses of the human body.
  • the 3D model description information describes the number of vertices that the 3D model of the target object needs to contain, the position information of each vertex, and the number of parameters used to control the model. Based on the position information of each vertex, an initial 3D model of the target object can be constructed.
  • For example, if the target object is a foot, a 3D model constructed with 1600 vertices can be used as the initial 3D model of the foot.
  • the 1600 vertices are only examples and are not limited thereto.
  • the number of vertices required for the three-dimensional model can be flexibly selected according to the model accuracy.
  • the number of parameters used for model control is not limited, and can also be flexibly set according to the accuracy and complexity of the model control.
  • the multiple control parameters used for model control may include attitude control parameters and shape control parameters.
  • the attitude control parameters are used to control the attitude of the three-dimensional model
  • the shape control parameters are used to control the shape of the three-dimensional model.
  • the attitude control parameters may include three attitude angles, namely, roll angle, pitch angle, and yaw angle, and the attitude of the three-dimensional model is controlled by the three attitude angles.
  • the shape control parameters vary depending on the target object. The change of any shape parameter may cause the shape of one or more parts of the target object to change.
  • For a foot object, the shape control parameters include, for example, 10 shape parameters, and the 10 shape parameters can control the size of the toes, the fatness of the foot, the longitudinal and lateral stretching, the arch curvature, etc.
  • For a head object, the shape control parameters include, for example, 8 shape parameters, and the 8 shape parameters can control the size of the mouth, the height of the bridge of the nose, the distance between the eyes, the width of the forehead, etc.
  • For a house, the shape control parameters include, for example, 30 shape parameters, and the 30 shape parameters can control the storey height, the house size, the exterior wall structure, and the like.
  • a 3D reconstruction network is used to reconstruct the target 3D model of the target object.
  • multiple frames of images including the target object can be obtained, and the multiple frames of images can be input into the three-dimensional reconstruction network for three-dimensional reconstruction.
  • the number of the multiple frames of images may be, for example, 3 frames, 4 frames, 5 frames, etc.
  • video capture can be performed on the target object in advance to obtain a video stream, and the video stream can be saved locally.
  • multiple frames of images including the target object can be obtained from the locally saved video stream.
  • video capture can also be performed on the target object in real time to obtain a video stream, and multiple frames of images including the target object can be obtained from the real-time captured video stream, and there is no limit to this.
  • the frame image can be directly input into the feature extraction network in the three-dimensional reconstruction network for feature extraction.
  • each frame of the multi-frame images is sequentially used as the current frame image, and the current frame image can be directly input into the feature extraction network for feature extraction.
  • the feature vector of each frame image extracted can be saved.
  • the current frame image can be input into the feature extraction network in the three-dimensional reconstruction network for feature extraction, and the feature vectors of the other several frames of historical images before the current frame image can be directly obtained from the corresponding storage space, but it is not limited to this.
  • Alternatively, the current frame image and the previous several frames of historical images are input into the feature extraction network for feature extraction at the same time.
  • Since the current frame image includes not only the target object but also the surrounding environment where the target object is located during image acquisition, in order to improve the accuracy of feature extraction, the current frame image can be cropped and the cropped image can be subjected to feature extraction. Therefore, when the current frame image is input into the feature extraction network for feature extraction to obtain the feature vector of the current frame image, the image position of the target object in the current frame image can be detected, and the local image where the target object is located can be cropped from the current frame image according to the image position; the local image is input into the feature extraction network for feature extraction to obtain the feature vector of the current frame image.
  • the category and position of the target object in the image can be detected by the object detection algorithm.
  • the current frame image is preprocessed in sequence, and the preprocessing includes at least one of image scaling and normalization processing; the preprocessed image is input into the target detection network for target detection to obtain the image position of the target object in the preprocessed image.
  • the foot is photographed continuously to obtain 4 frames of original images.
  • the 4 frames of original images are scaled to 160 pixels in height and 90 pixels in width, and the scaled 4 frames of images are normalized by the Z-Score (standard score) method.
  • the 4 frames of normalized images are input into a real-time foot target detection network for foot detection to obtain the image position of the foot.
  • 4 foot images are cropped from the 4 original images, and the size of the 4 foot images is 128*128 pixels.
  • the foot images with a size of 128*128 pixels can be input into the feature extraction network for feature extraction.
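  • A minimal sketch of this preprocessing step is shown below, assuming an OpenCV/NumPy pipeline; the function names and the way the detected target position is passed in (as a box center in original-image coordinates) are illustrative assumptions.

```python
import cv2
import numpy as np

def preprocess_for_detection(frame):
    """Scale to 160 (height) x 90 (width) pixels and apply Z-Score normalization."""
    small = cv2.resize(frame, (90, 160)).astype(np.float32)  # cv2 expects (width, height)
    return (small - small.mean()) / (small.std() + 1e-6)     # Z-Score normalization

def crop_target(frame, center_xy, size=128):
    """Crop a size x size local image around the detected target position."""
    h, w = frame.shape[:2]
    cx, cy = center_xy
    x0 = int(np.clip(cx - size // 2, 0, w - size))
    y0 = int(np.clip(cy - size // 2, 0, h - size))
    return frame[y0:y0 + size, x0:x0 + size]
```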
  • the feature extraction network can be used to extract features from each frame of multiple images to obtain a feature vector for each frame.
  • the feature vectors corresponding to each of the multiple images are vector-joined to obtain a target joint feature vector.
  • images 1, 2, 3, and 4 are respectively subjected to feature extraction using the feature extraction network to obtain their respective corresponding 128-dimensional feature vectors; the vector joint network is used to join the four 128-dimensional feature vectors to obtain a 512-dimensional feature vector.
  • the 512-dimensional feature vector is the target joint feature vector.
  • a feature extraction network can be used to extract features from each frame of an image, obtain a feature vector for each frame of an image, and save the feature vector of the frame of an image in a designated storage space. After the feature extraction of the current frame of an image with the latest image acquisition time among multiple frames of an image is completed, the feature vector of the current frame of an image and the feature vector of at least one frame of a historical image obtained from a designated storage space are vector-joined.
  • multiple frames of an image include a current frame of an image and at least one frame of a historical image; multiple frames of an image are input into a feature extraction network for feature extraction to obtain feature vectors of multiple frames of an image, including: each time the current frame of an image is input into a feature extraction network for feature extraction to obtain a feature vector of the current frame of an image; the feature vectors of multiple frames of an image are joined to obtain a target joined feature vector, including: using a set sliding window to obtain a feature vector of at least one frame of a historical image from a designated storage space; the feature vector of the current frame of an image is joined to a feature vector of at least one frame of a historical image to obtain a target joined feature vector.
  • the set sliding window is used to control the number of historical images obtained from the specified storage space.
  • For example, in a scenario where 4 frames of images are used for 3D reconstruction, the length of the sliding window can be 3; in a scenario where 5 frames of images are used for 3D reconstruction, the length of the sliding window can be 4.
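  • One possible way to implement this (a sketch under the assumption that per-frame feature vectors are 128-dimensional PyTorch tensors) is a fixed-length buffer acting as the specified storage space:

```python
from collections import deque
import torch

class FeatureWindow:
    def __init__(self, window_len=3):
        # the set sliding window controls how many historical feature vectors are kept
        self.history = deque(maxlen=window_len)

    def splice(self, current_feat):
        # current_feat: (1, 128) feature vector of the current frame image.
        # Until the window fills, the spliced vector is shorter than the full length.
        spliced = torch.cat([*self.history, current_feat], dim=1)
        self.history.append(current_feat)   # save for splicing with later frames
        return spliced                      # target spliced feature vector
```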
  • the target spliced feature vector is input into a parameter regression network to predict multiple control parameters for model control according to the three-dimensional model description information.
  • the parameter regression network can be implemented as an MLP (Multilayer Perceptron) network, and can perform at least one MLP operation.
  • the MLP network is a feedforward artificial neural network model that includes an input layer, an output layer and multiple hidden layers, and maps multiple input data sets to a single output data set.
  • the target splicing feature vector is input into the parameter regression network, and multiple control parameters for model control are predicted according to the three-dimensional model description information, including: inputting the target splicing feature vector into the parameter regression network, and performing at least one multilayer perceptron MLP operation on the target splicing feature vector according to the three-dimensional model description information to obtain multiple control parameters for model control.
  • an MLP operation is performed on the 512-dimensional feature vector output by the vector concatenation network to obtain a 1600-dimensional feature vector; the 1600-dimensional feature vector is subjected to another MLP operation to obtain a 13-dimensional feature vector, and each element in the 13-dimensional feature vector is a control parameter, that is, 13 control parameters are obtained.
  • 13 dimensions are only an example of the number of control parameters, which can be flexibly set according to the requirements of the target object and control complexity.
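  • As a sketch of the two MLP operations described above (assuming PyTorch; the layer sizes follow the 512 -> 1600 -> 13 example, which is illustrative rather than fixed):

```python
import torch.nn as nn

param_regressor = nn.Sequential(
    nn.Linear(512, 1600), nn.ReLU(),   # first MLP operation: 512-dim -> 1600-dim
    nn.Linear(1600, 13),               # second MLP operation: 1600-dim -> 13 control parameters
)
# For example, the first 3 elements could be the posture (attitude) control parameters and
# the remaining 10 the shape control parameters; the exact split depends on the
# three-dimensional model description information.
```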
  • the initial three-dimensional model of the target object is masked according to the posture control parameters and shape control parameters to obtain the target three-dimensional model of the target object. Since the initial three-dimensional model is obtained based on the three-dimensional model description information, the accuracy of the initial three-dimensional model needs to be improved.
  • the posture control parameters are used to adjust the posture of the initial three-dimensional model
  • the shape control parameters are used to adjust the shape of the initial three-dimensional model, thereby obtaining a target three-dimensional model with higher accuracy.
  • the technical solution provided by the embodiment of the present application uses the three-dimensional model description information corresponding to the target object to create the initial three-dimensional model of the target object, and uses multiple images including the target object to perform three-dimensional reconstruction.
  • the feature vectors of each of the multiple images are extracted, and the feature vectors of each of the multiple images are spliced, and the posture control parameters and shape control parameters used for model control are predicted based on the spliced feature vectors, and the initial three-dimensional model of the target object is masked according to the posture control parameters and shape control parameters to obtain the target three-dimensional model of the target object. Therefore, this three-dimensional reconstruction method greatly improves the accuracy of the three-dimensional model.
  • the commodity selection scene it is possible to select and purchase commodities suitable for the target object based on the three-dimensionally reconstructed model, providing conditions for solving the existing return and exchange problems.
  • In order to perform feature extraction more accurately, the feature extraction network can combine image features and camera posture data when performing feature extraction.
  • the feature extraction network may include a feature extraction module, a camera parameter fusion module, a feature stitching module, and a feature dimension reduction module.
  • multiple frames of images are input into the feature extraction network for feature extraction to obtain feature vectors of the multiple frames of images, including: for each frame of the multiple frames of images, the frame image is input into the feature extraction module in the feature extraction network for feature extraction to obtain an image feature map of the frame image; the camera posture data when the frame image is collected is input into the camera parameter fusion module in the feature extraction network for feature extraction to obtain the camera pose feature map of the frame image; the image feature map and the camera pose feature map of each frame image are spliced with the feature splicing module in the feature extraction network to obtain the spliced feature map of each frame image; and the feature dimension reduction module in the feature extraction network is used to reduce the dimension of the spliced feature map of each frame image to obtain the feature vector of each frame image.
  • the feature extraction module is used to extract the image feature map of each frame of the image.
  • the model structure of the feature extraction module there is no restriction on the model structure of the feature extraction module, and any feature extraction network that can extract image features can be used as the feature extraction module.
  • the camera parameter fusion module is a module for extracting features from camera posture data.
  • there is no restriction on the model structure of the camera parameter fusion module and any network that can extract features from camera posture data can be used as a camera parameter fusion module.
  • the camera pose data when the frame of image is collected is input into a camera parameter fusion module in a feature extraction network for feature extraction.
  • the implementation method for obtaining the camera pose feature map for the frame of image may be: inputting the camera pose data when the frame of image is collected into a camera parameter fusion module in a feature extraction network, where the camera pose data includes at least two pose angles; performing trigonometric function processing based on at least two pose angles and the relationship between at least two pose angles to obtain a plurality of pose representation parameters; and using a multi-layer perceptron MLP network in a camera parameter fusion module to process the plurality of pose representation parameters to obtain the camera pose feature map for the frame of image.
  • the camera attitude data may include at least two attitude angles of yaw, pitch and roll.
  • performing trigonometric function processing on at least two attitude angles and the relationship between at least two attitude angles to obtain multiple attitude characterization parameters includes: performing numerical calculations on the attitude angles of at least two attitude angles to obtain multiple fused attitude angles, each fused attitude angle representing the relationship between the corresponding two attitude angles; performing trigonometric function processing on each attitude angle of at least two attitude angles and each fused attitude angle of the multiple fused attitude angles to obtain multiple attitude characterization parameters.
  • various numerical calculations such as addition, subtraction or multiplication are performed on two or more of the at least two attitude angles to obtain a plurality of fused attitude angles, each of which represents the relationship between the two corresponding attitude angles.
  • trigonometric function processing is performed on each attitude angle and each fused attitude angle, cosine function, sine function, cotangent function or tangent function processing may be performed, but is not limited thereto.
  • the camera attitude data may include a yaw angle α, a pitch angle β, and a roll angle γ.
  • Let θ1 be any attitude angle among the yaw angle α, the pitch angle β, and the roll angle γ, and let θ2 be any attitude angle other than θ1.
  • the addition of two different attitude angles θ1 and θ2 gives a fused attitude angle θ1 + θ2, and their subtraction gives a fused attitude angle θ1 - θ2; 6 fused attitude angles may thus be obtained, namely α + β, α + γ, β + γ, α - β, α - γ, and β - γ.
  • the three attitude angles and the six fused attitude angles are each processed by trigonometric functions f(e), such as the sine function sin(e) and the cosine function cos(e), and 18 trigonometric function processing results, i.e., 18 attitude characterization parameters, may be obtained.
  • the 18 attitude characterization parameters constitute an 18-dimensional vector.
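  • The following sketch shows one way to compute the 18 attitude characterization parameters from the three attitude angles (3 raw angles plus 6 pairwise sums and differences, each passed through sine and cosine); it is an assumed implementation for illustration only.

```python
import itertools
import numpy as np

def pose_characterization(yaw, pitch, roll):
    angles = [yaw, pitch, roll]
    fused = []
    for a, b in itertools.combinations(angles, 2):   # 3 pairs of different attitude angles
        fused += [a + b, a - b]                      # 6 fused attitude angles
    all_angles = np.array(angles + fused)            # 9 angles in total
    # sine and cosine of each of the 9 angles -> 18 attitude characterization parameters
    return np.concatenate([np.sin(all_angles), np.cos(all_angles)])
```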
  • the multi-layer perceptron MLP network is used to process the various posture representation parameters.
  • a camera pose feature map of the frame image is obtained.
  • a multi-layer perceptron MLP network is used to process a plurality of pose representation parameters to obtain a camera pose feature map of the frame image.
  • the implementation method may be: vectorizing a plurality of pose representation parameters to obtain a camera pose feature vector; and processing the camera pose feature vector using a multi-layer perceptron MLP network to obtain a camera pose feature map.
  • an 18-dimensional camera pose feature vector composed of 18 pose representation parameters is input into a multi-layer perceptron MLP network for processing to obtain a 64-dimensional feature vector, and the 64-dimensional feature vector is converted into a feature map of size 4*4*64.
  • the 4*4*64 feature map is the camera pose feature map.
  • the feature splicing module in the feature extraction network is used to splice the image feature map of the frame image output by the feature extraction module and the camera pose feature map of the frame image output by the camera parameter fusion module to obtain a spliced feature map of each frame of image
  • the feature dimensionality reduction module in the feature extraction network is used to perform dimensionality reduction processing on the spliced feature map of each frame of image to obtain the feature vector of each frame of image.
  • For example, the feature extraction module outputs an image feature map of size 4*4*256.
  • the convolution module with a convolution kernel size of 1*1 is used to reduce the dimension of the 4*4*256 feature map to obtain a 4*4*64 feature map.
  • the 4*4*64 feature map is concatenated with the 4*4*64 feature map output by the camera parameter fusion module to obtain a 4*4*128 feature map.
  • the convolution module with a convolution kernel size of 4*4 is used to reduce the dimension of the 4*4*128 feature map to obtain a 1*1*128 feature map, and the 1*1*128 feature map is converted into a 128-dimensional feature vector.
  • the feature extraction task of the frame image is completed.
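  • A sketch of this fusion and dimension reduction step is shown below, assuming PyTorch with channels-first feature maps; since the application does not spell out how the 64-dimensional camera pose vector becomes a 4*4*64 feature map, spatial tiling is used here as an assumption.

```python
import torch
import torch.nn as nn

pose_mlp   = nn.Linear(18, 64)                  # 18 pose parameters -> 64-dim vector
reduce_1x1 = nn.Conv2d(256, 64, kernel_size=1)  # 4*4*256 image feature map -> 4*4*64
reduce_4x4 = nn.Conv2d(128, 128, kernel_size=4) # 4*4*128 spliced map -> 1*1*128

def fuse_frame_features(image_feat_map, pose_params):
    # image_feat_map: (1, 256, 4, 4); pose_params: (1, 18)
    pose_vec = pose_mlp(pose_params)                           # (1, 64)
    pose_map = pose_vec.view(1, 64, 1, 1).expand(1, 64, 4, 4)  # assumed tiling to 4*4*64
    img_map  = reduce_1x1(image_feat_map)                      # (1, 64, 4, 4)
    spliced  = torch.cat([img_map, pose_map], dim=1)           # (1, 128, 4, 4)
    return reduce_4x4(spliced).flatten(1)                      # (1, 128) per-frame feature vector
```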
  • the feature extraction module in the feature extraction network may include a jump connection layer and a downsampling layer connected in sequence. Therefore, for each frame image in the multiple frames of images, the frame image is input into the feature extraction module in the feature extraction network for feature extraction to obtain an image feature map of the frame image.
  • An optional implementation method is: for each frame image in the multiple frames of images, the frame image is input into the jump connection layer in the feature extraction module, multi-resolution feature map extraction is performed on the frame image, and feature maps with the same resolution are jump-connected to obtain a second intermediate feature map of the frame image; the second intermediate feature map of the frame image is input into the downsampling layer in the feature extraction module for M downsampling processing to obtain the image feature map of the frame image, where M is a positive integer ≥ 1.
  • the skip connection layer can perform multiple downsampling and multiple upsampling operations, and perform skip connection operations during the upsampling process.
  • the feature map of the current input is upsampled to obtain the feature map of the current upsampling output
  • the feature map of the current upsampling output is connected with the feature map of the same resolution that has been obtained, that is, a skip connection, to obtain the feature map of the final output of the current upsampling.
  • the skip connection layer first extracts features from the input image to obtain the initial feature map of the input image, and then performs multiple downsampling operations on the initial feature map.
  • the input feature map of the first downsampling operation is the initial feature map. In this way, after multiple downsampling operations, multiple feature maps of different resolutions can be obtained.
  • the feature map output by the last downsampling operation is used as the first intermediate feature map. Then, multiple upsampling operations are performed on the first intermediate feature map.
  • in each upsampling operation, the feature map output by the previous operation (for the first upsampling operation, the first intermediate feature map output by the last downsampling operation) is upsampled to obtain the intermediate feature map of this upsampling operation; the intermediate feature map of this upsampling operation is connected to the feature map of the same resolution obtained by the downsampling operations or the initial feature extraction, i.e., a jump connection is performed, to obtain the feature map output by this upsampling operation.
  • the feature map output by the last upsampling operation is used as the second intermediate feature map, i.e., the result of the jump connection layer extracting features from the input image.
  • the jump connection layer adopts an encoder and decoder structure, and for each frame image, the frame image is input into the jump connection layer in the feature extraction module, multi-resolution feature map extraction is performed on the frame image, and the feature map with the same resolution is jump-connected to obtain a second intermediate feature map, including: inputting the frame image into the encoder in the jump connection layer, encoding the frame image to obtain the initial feature map of the frame image, and sequentially performing N downsampling processing on the initial feature map to obtain a first intermediate feature map; inputting the first intermediate feature map into the decoder in the jump connection layer, sequentially performing N upsampling processing on the first intermediate feature map, and performing jump connection with the first intermediate feature map with the same resolution obtained by downsampling processing in the encoder in each upsampling processing to obtain the second intermediate feature map of the frame image.
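  • A minimal, U-Net-like skeleton of the jump connection layer is sketched below under the assumption that the encoder and decoder blocks are provided as callables; it shows the N downsampling steps, the N upsampling steps, and the same-resolution jump connections.

```python
import torch
import torch.nn.functional as F

def skip_connection_layer(initial_feat, encoder_blocks, decoder_blocks):
    # initial_feat: initial feature map of the frame image, shape (1, C, H, W)
    skips, x = [], initial_feat
    for enc in encoder_blocks:                 # N downsampling operations
        skips.append(x)                        # remember the same-resolution map for the skip
        x = enc(x)                             # reduces the resolution
    for dec, skip in zip(decoder_blocks, reversed(skips)):   # N upsampling operations
        x = F.interpolate(x, size=skip.shape[-2:], mode="nearest")
        x = dec(torch.cat([x, skip], dim=1))   # jump connection with same-resolution map
    return x                                   # second intermediate feature map
```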
  • the four arrows representing downsampling in the jump connection layer correspond to the encoder
  • the encoder includes a sequentially connected encoding submodule and N downsampling submodules
  • the frame image is input into the encoder in the jump connection layer
  • the frame image is encoded to obtain the initial feature map of the frame image
  • the initial feature map is sequentially downsampled N times to obtain the first intermediate feature map, including: the frame image is input into the encoding submodule for encoding to obtain the initial feature map of the frame image; the initial feature map is downsampled N times by using N downsampling submodules to obtain the first intermediate feature map;
  • wherein, in each downsampling submodule, the input is convolved using the target convolution parameters corresponding to the K1 convolution units connected in sequence to obtain the intermediate feature map to be activated, and the activation function is used to activate the intermediate feature map to be activated to obtain the output of each convolution unit.
  • K1 is a positive integer ≥ 2.
  • each downsampling submodule includes three convolution units connected in sequence.
  • the output result of the previous convolution unit is the input parameter of the next convolution unit
  • the input parameter of the first convolution unit of the first downsampling submodule is the initial feature map output by the encoding submodule
  • the output result of the last convolution unit in the last downsampling submodule is the first intermediate feature map.
  • the target convolution parameters corresponding to each convolution unit are obtained by merging the parameters of multiple branches in the training phase using the reparameterization technology.
  • Introducing multiple branches in the training phase of the 3D reconstruction network can improve the accuracy of the 3D reconstruction network, and merging branches in the inference phase of the 3D reconstruction network can improve the 3D reconstruction efficiency of the 3D reconstruction network.
  • the operation process of the convolution unit is divided into three branches.
  • the parameters of the first branch are denoted as c1 and b1; the parameters of the second branch are denoted as c2 and b2; the parameter of the third branch is denoted as b3; c1 and c2 are convolution parameters, and b1, b2, and b3 are BN (Batch Normalization) parameters; after the input is processed in sequence by the convolution parameters and batch normalization parameters corresponding to the three branches, the processing results of the three branches are added to obtain the intermediate feature map to be activated, and the activation function (such as ReLU or sigmoid) is used to activate the intermediate feature map to be activated to obtain the output of each convolution unit.
  • the target convolution parameters of the convolution unit are obtained by merging the convolution parameters and batch normalization parameters of the three branches in the training phase.
  • the intermediate feature map to be activated obtained by processing the input with the convolution parameters and batch normalization parameters corresponding to the three branches is the same as the intermediate feature map to be activated obtained by processing the input with the reparameterized target convolution parameter c3.
  • In other words, reparameterization changes the way the input is processed, but does not change the calculation result.
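  • The core of this reparameterization idea can be sketched as follows: fold each branch's batch normalization parameters into its convolution, then sum the per-branch results into the single target convolution parameters used at inference time (shown here for one conv + BN branch; an assumed, RepVGG-style illustration rather than the application's exact procedure).

```python
import torch

def fuse_conv_bn(conv_weight, bn_gamma, bn_beta, bn_mean, bn_var, eps=1e-5):
    """Return (weight, bias) of a single convolution equivalent to conv followed by BN."""
    std = torch.sqrt(bn_var + eps)
    fused_weight = conv_weight * (bn_gamma / std).reshape(-1, 1, 1, 1)
    fused_bias = bn_beta - bn_gamma * bn_mean / std
    return fused_weight, fused_bias
# Summing the fused weights/biases of the branches yields the target convolution
# parameters c3; convolving the input once with c3 reproduces the sum of the branch outputs.
```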
  • the feature extraction module in the feature extraction network includes a jump connection layer and a downsampling layer connected in sequence.
  • the downsampling layer includes a plurality of downsampling submodules connected in sequence.
  • Each downsampling submodule can be any module with a downsampling function, and there is no limitation on this. Referring to FIG3 , in the downsampling layer, each downsampling submodule performs downsampling processing on the feature map output by the previous downsampling submodule to obtain the feature map output by the downsampling submodule.
  • the first downsampling submodule performs downsampling processing on the second intermediate feature map output by the jump connection layer.
  • the feature map output by the last downsampling submodule is used as the output result of the downsampling layer.
  • the downsampling layer includes M downsampling submodules connected in sequence
  • the second intermediate feature map of the frame image is input into the downsampling layer in the feature extraction module for M downsampling processing to obtain the image feature map of the frame image, including: using M downsampling submodules to perform M downsampling processing on the second intermediate feature map to obtain the image feature map of the frame image; wherein, in each downsampling submodule, the input is convolved using the target convolution parameters corresponding to the K2 convolution units connected in sequence to obtain the intermediate feature map to be activated, and the activation function is used to activate the intermediate feature map to be activated to obtain the output of each convolution unit, and K2 is a positive integer ≥ 2.
  • the number of convolution units included in each downsampling submodule in the downsampling layer is not limited; for example, it can be 2, 3, 4 or 5.
  • each downsampling submodule in the downsampling layer can include 3 convolution units, and the structure of the downsampling submodule shown in Figure 6 can be adopted, but it is not limited to this.
  • the target three-dimensional model is adapted to the target object in the frame according to the camera posture data when the frame is acquired, and a product that is adapted to the target object is selected for the target object based on the adaptation result.
  • the camera extrinsic parameters are obtained according to the camera posture data when the frame of image is collected.
  • the camera extrinsic parameters refer to the parameters of the camera in the world coordinate system, such as the camera's position, rotation direction, etc., which are mainly divided into rotation matrix and translation matrix.
  • the camera parameter estimation network can be trained in advance using a large number of sample images and the corresponding camera extrinsic parameters when the sample images are taken.
  • the image is input into the camera parameter estimation network for recognition processing to obtain the camera extrinsic parameters corresponding to the time when the image was taken.
  • After obtaining the camera extrinsic parameters, based on the pinhole imaging theory, each vertex in the target three-dimensional model is projected into the frame image according to the camera extrinsic parameters to obtain the projection points corresponding to each vertex in the target three-dimensional model; feature point matching technology is used to determine the real image feature points matching the projection points from the real image feature points of the frame image, and for each projection point, the adaptation result between the vertex on the target object in the real world and the corresponding vertex on the target three-dimensional model is determined according to the image position of the real image feature point corresponding to the projection point and the image position of the projection point in the image.
  • the real image feature point refers to the feature point corresponding to the vertex on the target object in the real world.
  • the degree of adaptation between the vertex on the target object in the real world and the vertex on the target three-dimensional model is quantified.
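  • A sketch of the projection step is shown below, assuming a standard pinhole camera model with a 3x3 intrinsic matrix K and extrinsic rotation R and translation t (the exact parameterization used by the application is not specified):

```python
import numpy as np

def project_vertices(vertices, K, R, t):
    # vertices: (N, 3) target 3D model vertices in world coordinates
    cam = vertices @ R.T + t            # world -> camera coordinates via the extrinsics
    uv = cam @ K.T                      # pinhole projection onto the image plane
    return uv[:, :2] / uv[:, 2:3]       # (N, 2) projection points in pixel coordinates
# Each projection point is then matched to a real image feature point; the 2D offset
# between the two positions quantifies how well the corresponding model vertex fits the
# real-world target object.
```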
  • Based on the target 3D model and the product 3D models corresponding to the plurality of candidate product information, the product information whose product 3D model has the highest degree of fit with the target 3D model may be selected from the plurality of candidate product information as the target product information, and the target product information may be provided to the target object.
  • a product three-dimensional model that matches the target three-dimensional model can be customized for the target object based on model parameters corresponding to the target three-dimensional model and the selected product type, and the product information corresponding to the product three-dimensional model can be provided to the target object as target product information.
  • any frame of the multiple frames is input into a depth estimation network to estimate the size information of the target object, and the target three-dimensional model is labeled according to the estimated size information of the target object.
  • a large number of sample images and the size information of the target objects in the sample images can be used to train the depth estimation network in advance.
  • the image is input into the depth estimation network to estimate the size information of the target object, which includes, but is not limited to, the length and width of the target object.
  • the estimated size information of the target object can be annotated on the target 3D model. For example, in a virtual shoe-trying scene, there may be a need to measure the length and width of the foot, and the length and width of the foot are annotated on the reconstructed 3D model of the foot.
  • the target three-dimensional model is adapted to the target object in the frame according to the camera posture data when the frame is acquired, and the shape parameters of the target object are measured based on the adaptation result.
  • FIG7 is a flow chart of a commodity information processing method provided in an embodiment of the present application. Referring to FIG7 , the method may include the following steps:
  • the multiple control parameters for model control include posture control parameters and shape control parameters.
  • Masking is performed on the initial three-dimensional model of the fitting object according to the posture control parameters and the shape control parameters to obtain a target three-dimensional model of the fitting object.
  • the initial three-dimensional model is obtained according to the three-dimensional model description information.
  • target product information matching the target three-dimensional model is provided to the try-on object according to the target three-dimensional model, including: selecting product information whose product three-dimensional model has the highest degree of compatibility with the target three-dimensional model from multiple candidate product information as the target product information, and providing the target product information to the try-on object; or customizing a product three-dimensional model matching the target three-dimensional model for the try-on object according to model parameters corresponding to the target three-dimensional model and the selected product type, and providing the product information corresponding to the product three-dimensional model as the target product information to the try-on object.
  • selecting the product information whose product three-dimensional model has the highest degree of fit with the target three-dimensional model from multiple candidate product information as the target product information includes: for the product three-dimensional model corresponding to each candidate product information, fusing the target three-dimensional model of the tried-on object with the product three-dimensional model to obtain a fused three-dimensional model, the fused three-dimensional model representing a first relative position relationship between the three-dimensional model of the tried-on object and the three-dimensional model of the product in a tried-on state; according to the first relative position relationship, obtaining multiple distance information between multiple target vertices on the three-dimensional model of the tried-on object and corresponding vertices or areas on the three-dimensional model of the product as the fitness information of the multiple target vertices; judging the fitness of the target three-dimensional model and the product three-dimensional model according to the fitness information of the multiple target vertices; and, after obtaining the fitness of the product three-dimensional model corresponding to each candidate product information with the target three-dimensional model, selecting the product information whose product three-dimensional model has the highest fitness as the target product information.
  • the above-mentioned product three-dimensional model corresponding to each candidate product information is integrated with the target three-dimensional model of the try-on object and the product three-dimensional model to obtain an optional implementation method of the fused three-dimensional model: obtaining the target three-dimensional model of the try-on object, the product three-dimensional model and the target try-on parameters of the try-on object corresponding to the product three-dimensional model; determining, according to the target try-on parameters, a second relative position relationship between at least three reference vertices on the target three-dimensional model of the try-on object and the corresponding reference vertices on the product three-dimensional model; according to the second relative position relationship, placing the target three-dimensional model of the try-on object at least partially inside the product three-dimensional model to obtain a fused three-dimensional model.
  • the target fitting parameters can be set based on experience. Further, optionally, the target fitting parameters of the fitting object for the commodity object can be obtained based on the attribute information of the fitting object, the fitting preference information of the user to whom the fitting object belongs, and/or the reference fitting parameters corresponding to the commodity object.
  • determining, according to the target fitting parameter, a second relative position relationship between a plurality of reference vertices on the target three-dimensional model of the fitting object and corresponding reference vertices on the three-dimensional model includes at least one of the following:
  • Method 1: According to the fitting distance between the shoe and the heel, determine the fitting distance between the first heel vertex on the three-dimensional model of the foot and the second heel vertex on the three-dimensional model of the shoe as the second relative position relationship.
  • the vertex type can be marked.
  • Vertex types include, for example, heel vertices, sole vertices, or toe vertices.
  • a vertex on the heel is selected from multiple vertices included in the three-dimensional model of the foot as the first heel vertex, and according to the position distribution of the first heel vertex on the heel, a heel vertex with the same position distribution as the first heel vertex is selected from multiple heel vertices on the three-dimensional model of the shoe as the corresponding second heel vertex.
  • the first heel vertex and the second heel vertex are controlled to be separated by a trial distance.
  • Method 2: According to the fitting relationship between the sole of the foot and the sole of the shoe, determine that the first sole vertex on the three-dimensional model of the foot coincides with the second sole vertex on the three-dimensional model of the shoe as the second relative position relationship.
  • a plurality of first sole vertices on the sole of the foot are selected from a plurality of vertices included in the three-dimensional model of the foot based on the vertex type.
  • a plurality of second sole vertices having the same position distribution as the first sole vertices are selected from a plurality of vertices on the three-dimensional model of the shoe according to the position distribution of each first sole vertex on the sole of the foot.
  • Method 3: Based on the alignment relationship between the center of the sole and the center of the shoe sole, determine that the first center line vertex on the center line of the sole on the three-dimensional model of the foot and the second center line vertex on the center line of the shoe sole on the three-dimensional model of the shoe are aligned in the direction of the foot length as the second relative position relationship.
  • a vertex on the center line of the sole of the foot is selected from multiple vertices included in the three-dimensional model of the foot as the first center line vertex, and a vertex with the same position distribution as the first center line vertex is selected from multiple vertices on the three-dimensional model of the shoe as the corresponding second center line vertex.
  • the first center line vertex and the second center line vertex are controlled to be aligned in the direction of the foot length.
  • In this embodiment, the position coordinates of the vertices included in the target three-dimensional model of the try-on object and the position coordinates of the vertices included in the commodity three-dimensional model are transformed into the same coordinate system, and the target three-dimensional model of the try-on object and the commodity three-dimensional model are controlled to maintain the second relative position relationship. At this point, the operation of placing at least part of the target three-dimensional model of the try-on object inside the commodity three-dimensional model is completed, and a fused three-dimensional model is obtained.
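As a concrete illustration of the fusion step above, the following is a minimal sketch in Python/NumPy. It assumes both meshes are plain (N, 3) vertex arrays already rotated into the same orientation, with x as the foot-length axis and z as the vertical axis, and that the heel, sole and insole reference vertex indices have been marked in advance; the function name and axis conventions are illustrative, not taken from the patent.

```python
import numpy as np

def fuse_models(foot_vertices, shoe_vertices,
                foot_heel_idx, shoe_heel_idx, tryon_distance,
                foot_sole_idx, shoe_insole_idx):
    """Place the foot mesh inside the shoe mesh so that the reference-vertex
    constraints (the second relative position relationship) hold in one
    shared coordinate system."""
    foot = np.asarray(foot_vertices, dtype=float).copy()
    shoe = np.asarray(shoe_vertices, dtype=float)

    # Rest the sole on the insole: match the average height (z) of the foot
    # sole vertices to the average height of the shoe insole vertices.
    dz = shoe[shoe_insole_idx, 2].mean() - foot[foot_sole_idx, 2].mean()

    # Keep the try-on distance at the heel: shift the foot along the
    # foot-length axis (x) so its heel vertices sit `tryon_distance`
    # in front of the shoe heel vertices.
    dx = shoe[shoe_heel_idx, 0].mean() + tryon_distance - foot[foot_heel_idx, 0].mean()

    # Centre the foot on the insole centre line (y).
    dy = shoe[shoe_insole_idx, 1].mean() - foot[foot_sole_idx, 1].mean()

    foot += np.array([dx, dy, dz])
    # The fused model is both vertex sets expressed in the shoe's coordinate frame.
    return foot, shoe
```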
  • the target three-dimensional model of the fitting object and the commodity three-dimensional model in the fused three-dimensional model maintain a first relative position relationship, and in this fused state, the fitness information calculation operation is performed.
  • the fitness information reflects the degree of wearing adaptation.
  • multiple target vertices participating in the fitness information calculation are selected. For example, each vertex on the target three-dimensional model of the fitting object is used as a target vertex.
  • Further, in order to reduce the amount of data processing while maintaining the accuracy of the fitness calculation, some vertices can be selected from the target three-dimensional model of the fitting object as target vertices.
  • the vertex corresponding to the key part information is selected from the target three-dimensional model of the fitting object as the target vertex.
  • the key parts include, for example, but are not limited to: toes, heels, arches, insteps, inner insteps, outer insteps, soles, etc.
  • After the multiple target vertices are determined, for each target vertex, the distance information between the target vertex and the corresponding vertex on the three-dimensional model of the product can be used as the fitness information of the target vertex.
  • Further, optionally, in order to better measure the fitness, the distance information from the target vertex to the area where the corresponding vertex on the three-dimensional model of the product is located can also be used as the fitness information of the target vertex.
  • multiple distance information between multiple target vertices on the target three-dimensional model of the tried-on object and the corresponding areas on the three-dimensional model of the product is calculated as the fitness information of multiple target vertices, including: for each target vertex on the target three-dimensional model of the tried-on object, according to the first relative position relationship, the first vertex on the three-dimensional model of the product closest to the target vertex is obtained; multiple triangular facets with the first vertex as the connection point are used as the area corresponding to the target vertex on the three-dimensional model of the product; multiple distances from the target vertex to the multiple triangular facets are calculated, and the fitness information of the target vertex is generated according to the multiple distances.
  • the distance from the target vertex to the triangle patch includes, but is not limited to: the distance from the target vertex to the center point of the triangle patch, the vertical distance from the target vertex to the triangle patch, and the maximum, minimum or average of the distances from the target vertex to the three vertices of the triangle patch.
  • the maximum, minimum or average of multiple distances from the target vertex to multiple triangle patches is calculated to obtain the final distance information from the target vertex to the triangle patch, and the final distance information is used as the fitness information of the target vertex.
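The following sketch illustrates the per-vertex fitness computation described above: find the nearest vertex on the product model, collect the triangular facets that use it as a connection point, and aggregate the distances to those facets. It assumes NumPy arrays for vertices and an (F, 3) face index array; the centre-point distance variant is used, and the helper names are hypothetical.

```python
import numpy as np

def triangle_distance(point, triangle):
    # Distance to the triangle's centre point (one of the variants listed above).
    return np.linalg.norm(point - triangle.mean(axis=0))

def vertex_fitness(target_vertex, product_vertices, product_faces, reduce=np.min):
    """Fitness information for one target vertex on the try-on model.

    product_vertices: (M, 3) array; product_faces: (F, 3) integer index array.
    reduce: how the distances to the incident facets are combined (min/max/mean).
    """
    # Nearest vertex on the product model under the first relative position relationship.
    nearest = int(np.argmin(np.linalg.norm(product_vertices - target_vertex, axis=1)))
    # All triangular facets that use the nearest vertex as a connection point.
    incident = product_faces[np.any(product_faces == nearest, axis=1)]
    distances = [triangle_distance(target_vertex, product_vertices[face]) for face in incident]
    return reduce(distances) if distances else float("inf")
```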
  • the fitness range information corresponding to each target vertex that meets the fitness requirements can be flexibly set. If the fitness information of each target vertex falls within its corresponding fitness range information, the target vertex meets the fitness requirements. If the fitness information of each target vertex does not fall within its corresponding fitness range information, the target vertex does not meet the fitness requirements. After determining whether each target vertex meets its own fitness requirements, the fitness of the target three-dimensional model and the three-dimensional model of the product is determined based on whether each target vertex meets its own fitness requirements.
  • Further optionally, manual intervention can also be introduced into determining the degree of fit between the target 3D model and the product 3D model.
  • In order to allow the user to intuitively learn the degree of fit, any one of the target 3D model of the object to be tried on, the product 3D model and the fused 3D model can be displayed, and the fitness information of the multiple target vertices can be visually marked on the displayed 3D model, wherein fitness information with different size relationships to the reference fitness range corresponds to different visual marking states, so that the user can confirm the degree of fit between the target 3D model and the product 3D model.
  • the fitness information of multiple target vertices is visually marked on any of the above three-dimensional models, so that different fitness information is marked with different visual marking states. For example, vertices that meet the fitness requirements are marked in green, and vertices that do not meet the fitness requirements are marked in red.
  • the reference fitness range refers to the numerical range of the fitness that determines whether the fitness requirements are met.
  • the fitness information within the reference fitness range meets the fitness requirements, and the fitness information outside the reference fitness range does not meet the fitness requirements.
  • The more fitness information that does not meet the fitness requirements, the lower the fitness between the target 3D model and the 3D model of the product; conversely, the more fitness information that meets the fitness requirements, the higher the fitness between the target 3D model and the 3D model of the product.
  • Further optionally, any of the above three-dimensional models can be rendered according to the fitness information of the multiple target vertices to obtain a fitness heat map, in which different colors represent fitness information with different size relationships to the benchmark fitness range. It should be noted that there can be multiple benchmark fitness ranges; for example, different benchmark fitness ranges can be set for different parts of the try-on object.
  • Taking the foot as an example, the heel part corresponds to a first benchmark fitness range, such as 1-2 cm; the sole part corresponds to a second benchmark fitness range, such as 0.5-1 cm; the ankle part corresponds to a third benchmark fitness range, such as 0-1 cm; and so on.
  • Fitness information within the benchmark fitness range is marked with a first color value, fitness information greater than the upper limit of the benchmark fitness range is marked with a second color value, and fitness information less than the lower limit of the benchmark fitness range is marked with a third color value. In this way, the user can learn from the first color value which positions fit well, from the second color value which positions are too loose, and from the third color value which positions are too tight.
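A minimal sketch of this colour-marking rule is shown below. The per-part benchmark ranges reuse the example values given above; the concrete RGB colour values are purely illustrative assumptions.

```python
# Example per-part benchmark fitness ranges (in cm), taken from the text above.
BENCHMARK_RANGES = {"heel": (1.0, 2.0), "sole": (0.5, 1.0), "ankle": (0.0, 1.0)}

# Illustrative colour values only; the description merely requires three distinct colours.
FIRST_COLOR = (0, 255, 0)    # within the benchmark range (fits well)
SECOND_COLOR = (255, 0, 0)   # above the upper limit (too loose)
THIRD_COLOR = (0, 0, 255)    # below the lower limit (too tight)

def vertex_color(part, fitness):
    lower, upper = BENCHMARK_RANGES[part]
    if fitness > upper:
        return SECOND_COLOR
    if fitness < lower:
        return THIRD_COLOR
    return FIRST_COLOR
```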
  • the user can confirm the fitness of the target three-dimensional model and the three-dimensional model of the product based on the visual marking state of any three-dimensional model.
  • Taking the fitness heat map as an example, when the user intuitively sees that the number of areas on the fitness heat map marked with the color indicating that the fitness requirements are not met (for example, red) is relatively large, it can be concluded that the fitness between the three-dimensional model of the product and the target three-dimensional model of the object being tried on is low. When the number of such areas is relatively small, it can be concluded that the fitness between the three-dimensional model of the product and the target three-dimensional model of the object being tried on is high. When the number of such areas is neither particularly large nor particularly small, it can be concluded that the fitness is medium.
  • the product information with the highest degree of fit with the target three-dimensional model can be selected as the target product information.
  • Further, in the above customization scenario, according to the model parameters corresponding to the target three-dimensional model and the selected commodity type, the implementation of customizing a commodity three-dimensional model adapted to the target three-dimensional model for the try-on object includes: obtaining the reference three-dimensional model corresponding to the selected commodity type, and fusing the target three-dimensional model of the try-on object with the reference three-dimensional model to obtain a fused three-dimensional model, wherein the fused three-dimensional model represents the first relative position relationship between the target three-dimensional model of the try-on object and the reference three-dimensional model in the try-on state; according to the first relative position relationship, obtaining multiple distance information between multiple target vertices on the target three-dimensional model of the try-on object and corresponding vertices or regions on the reference three-dimensional model as the fitness information of the multiple target vertices; and, when it is determined according to the fitness information of the multiple target vertices that the reference three-dimensional model does not meet the fitness requirements, adjusting the size parameters and/or shape parameters of the reference three-dimensional model and re-acquiring the fitness information of the multiple target vertices, until a final commodity three-dimensional model that meets the fitness requirements is obtained.
  • It is worth noting that, after each adjustment of the size parameters and/or shape parameters, the adjusted reference three-dimensional model is used as a new reference three-dimensional model and the fitness information of the multiple target vertices is obtained again, until it is determined based on the fitness information of the multiple target vertices that the reference three-dimensional model meets the fitness requirements; the reference three-dimensional model that meets the fitness requirements is then used as the final commodity three-dimensional model.
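The adjust-and-recheck loop described above can be sketched as follows. The three callables stand in for the fitness computation, the fitness check and the size/shape parameter adjustment, which are detailed elsewhere in this text; the iteration budget is an added safeguard, not part of the described method.

```python
def customize_product_model(reference_model, get_fitness, meets_requirements,
                            adjust_parameters, max_rounds=50):
    """Repeatedly adjust a reference product model until it fits the try-on object.

    get_fitness(model)            -> per-vertex fitness information
    meets_requirements(fitness)   -> True if every target vertex is within its range
    adjust_parameters(model, f)   -> model with updated size and/or shape parameters
    """
    model = reference_model
    for _ in range(max_rounds):
        fitness = get_fitness(model)
        if meets_requirements(fitness):
            return model                            # final commodity three-dimensional model
        model = adjust_parameters(model, fitness)   # becomes the new reference model
    return model                                    # best effort if the budget is exhausted
```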
  • The size parameters of the reference three-dimensional model include, but are not limited to, the length, width and height of the entire reference three-dimensional model, or the length, width and height of each part of the reference three-dimensional model. Taking a shoe as an example, the size parameters include shoe length, shoe width, toe length or width, instep height, and so on.
  • The shape parameters of the reference 3D model define the shape characteristics of the reference 3D model. Taking a shoe as an example, the shape parameters include the heel height, toe-cap width, toe-cap length or instep height of the shoe, and so on.
  • the size parameters and/or shape parameters of the reference three-dimensional model may be adjusted automatically, or in response to an adjustment operation on the reference three-dimensional model triggered by a user, the size parameters and/or shape parameters of the reference three-dimensional model may be adjusted, without limitation.
  • an adjustment control can be provided to the user, and the user can initiate the adjustment operation for the reference three-dimensional model through the adjustment control.
  • the adjustment control can be displayed in the associated area of any of the above three-dimensional models, and the adjustment control can be but not limited to a sliding bar. Based on this, in response to at least one sliding operation on the sliding bar, the sliding distance and sliding direction of each sliding operation can be obtained, and the adjustment amplitude and adjustment direction can be determined according to the sliding distance and the sliding direction respectively; according to the adjustment direction and the adjustment amplitude, the size parameters and/or shape parameters of the reference three-dimensional model are adjusted.
  • the sliding distance determines the adjustment amplitude of the size parameters and/or shape parameters
  • the sliding direction determines the adjustment direction of the size parameters and/or shape parameters.
  • the adjustment direction can be adjusted in an increasing direction based on the current parameters, or in a decreasing direction, and there is no restriction on this. It is worth noting that any area of the display area where the three-dimensional model is located can be used as an associated area, and a sliding bar can be displayed in the associated area to facilitate the user to perform the adjustment operation.
  • the sliding distance is proportional to the adjustment amplitude, and the larger the sliding distance, the larger the adjustment amplitude of the size parameter and/or the shape parameter; the smaller the sliding distance, the smaller the adjustment amplitude of the size parameter and/or the shape parameter. Accordingly, taking the sliding bar from left to right as an example, sliding to the left represents adjustment back, which means adjusting the size parameter and/or the shape parameter to a smaller value, that is, the adjustment direction is the direction of adjustment to a smaller value; sliding to the right represents adjustment forward, which means adjusting the size parameter and/or the shape parameter to a larger value, that is, the adjustment direction is the direction of adjustment to a larger value.
  • In an optional embodiment, a single slider can be used to adjust the size parameters and the shape parameters of the reference three-dimensional model in linkage. Considering that in practical applications there may be a need to adjust only the size parameters or only the shape parameters, the slider may also include a first slider and a second slider: the first slider is used to adjust the size parameters of the reference three-dimensional model, and the second slider is used to adjust the shape parameters of the reference three-dimensional model, so that the user can adjust the size parameters and the shape parameters through the first slider and the second slider, respectively.
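A minimal sketch of how one sliding operation could be mapped to a parameter update, following the distance-to-amplitude and direction-to-sign rule above. The pixel-to-unit scale factor and the parameter names are hypothetical.

```python
def apply_slider(params, key, slide_distance, slide_direction, scale=0.01):
    """Map one sliding operation on an adjustment control to a parameter update.

    slide_distance:  how far the slider moved; proportional to the adjustment amplitude.
    slide_direction: +1 for a rightward slide (increase), -1 for leftward (decrease).
    scale:           hypothetical conversion from slider units to parameter units.
    """
    updated = dict(params)
    updated[key] = params[key] + slide_direction * slide_distance * scale
    return updated

# A first slider bound to a size parameter and a second slider bound to a shape parameter:
size_params = apply_slider({"shoe_length_cm": 26.0}, "shoe_length_cm", 40, +1)
shape_params = apply_slider({"instep_height_cm": 6.5}, "instep_height_cm", 20, -1)
```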
  • the final three-dimensional model of the product can be sent to the server, and the server can obtain the target product object whose size and appearance match the tried-on object, and can also return the information of the target product object (i.e., target product information) to the terminal device.
  • the information of the target product object includes, but is not limited to: the material, model, style, production progress, logistics distribution progress, production date, manufacturer, etc. of the target product object.
  • the terminal device can output the information of the target product object to the user, and the user can determine whether to customize the target product object based on the information; and in response to the user's operation of determining customization, the terminal device can also send a message to the server.
  • the server can also send the target 3D model to the customization platform, and the customization platform performs production based on the target 3D model to customize the target commodity object whose size and shape match the tried-on object, and deliver the produced target commodity object to the user through logistics distribution.
  • the technical solution provided by the embodiment of the present application uses the three-dimensional model description information corresponding to the try-on object to create the initial three-dimensional model of the try-on object, and then uses multiple images including the try-on object for three-dimensional reconstruction, extracts the feature vectors of each of the multiple images during the three-dimensional reconstruction process, and splices the feature vectors of each of the multiple images, and predicts the posture control parameters and shape control parameters used for model control based on the spliced feature vectors, and masks the initial three-dimensional model of the try-on object according to the posture control parameters and shape control parameters to obtain the target three-dimensional model of the try-on object. Therefore, this three-dimensional reconstruction method greatly improves the accuracy of the three-dimensional model.
  • the three-dimensionally reconstructed model can be used to select and purchase suitable commodities for the try-on object, providing conditions for solving the existing return and exchange problems.
  • FIG8 is a schematic diagram of the structure of a 3D reconstruction device provided in an embodiment of the present application.
  • the device may include the following units:
  • An image acquisition unit 81 is used to acquire multiple frames of images of a target object and three-dimensional model description information corresponding to the target object;
  • a feature extraction unit 82 is used to input the multiple frame images into a feature extraction network for feature extraction to obtain feature vectors of the multiple frame images;
  • a vector splicing unit 83 is used to splice the feature vectors of the multiple frames of images to obtain a target spliced feature vector;
  • a parameter regression unit 84 is used to input the target spliced feature vector into a parameter regression network, and to predict a plurality of control parameters for model control according to the three-dimensional model description information, wherein the plurality of control parameters include a posture control parameter and a shape control parameter;
  • the mask processing unit 85 is used to perform mask processing on the initial three-dimensional model of the target object according to the posture control parameters and the shape control parameters to obtain a target three-dimensional model of the target object.
  • the initial three-dimensional model is obtained according to the three-dimensional model description information.
  • When the feature extraction unit 82 inputs multiple frames of images into a feature extraction network for feature extraction to obtain feature vectors of the multiple frames of images, it is specifically used to: for each frame of the multiple frames, input the frame image into a feature extraction module in the feature extraction network for feature extraction to obtain an image feature map of the frame image; input the camera posture data when the frame image is acquired into a camera parameter fusion module in the feature extraction network for feature extraction to obtain a camera posture feature map of the frame image; use the feature splicing module in the feature extraction network to splice the image feature map and the camera posture feature map of each frame image to obtain a spliced feature map of each frame image; and use the feature dimensionality reduction module in the feature extraction network to perform dimensionality reduction processing on the spliced feature map of each frame image to obtain a feature vector of each frame image.
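The per-frame pipeline of the feature extraction unit (image feature map, camera pose feature map, splicing, dimensionality reduction) could be sketched as follows in PyTorch. The single strided convolution stands in for the skip-connection and downsampling layers, and the channel sizes (256, 64, 128) and the 4x4 spatial size follow the example figures in this description; this is an illustrative skeleton, not the patented network.

```python
import torch
import torch.nn as nn

class FrameFeatureExtractor(nn.Module):
    """Illustrative skeleton: image feature map + camera pose feature map,
    spliced and reduced to a 128-dimensional per-frame feature vector."""

    def __init__(self):
        super().__init__()
        # Stand-in for the skip-connection layer and downsampling layers.
        self.image_branch = nn.Conv2d(3, 256, kernel_size=3, stride=32, padding=1)
        self.reduce_image = nn.Conv2d(256, 64, kernel_size=1)      # 1x1 reduction
        # Stand-in for the camera parameter fusion module (an MLP).
        self.pose_branch = nn.Sequential(nn.Linear(18, 64), nn.ReLU(), nn.Linear(64, 64))
        self.reduce_joint = nn.Conv2d(128, 128, kernel_size=4)     # 4x4 reduction to a vector

    def forward(self, image, pose_params):
        # image: (B, 3, 128, 128) cropped local image; pose_params: (B, 18).
        img_feat = self.reduce_image(self.image_branch(image))     # (B, 64, 4, 4)
        pose_feat = self.pose_branch(pose_params)                  # (B, 64)
        pose_feat = pose_feat.view(-1, 64, 1, 1).expand(-1, -1, 4, 4)
        joint = torch.cat([img_feat, pose_feat], dim=1)            # spliced feature map
        return self.reduce_joint(joint).flatten(1)                 # (B, 128) feature vector
```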
  • When the feature extraction unit 82 inputs each frame image in the multiple frames of images into the feature extraction module in the feature extraction network for feature extraction to obtain the image feature map of the frame image, it is specifically used to: for each frame image in the multiple frames of images, input the frame image into the jump connection layer in the feature extraction module, perform multi-resolution feature map extraction on the frame image and perform jump connection on the feature maps with the same resolution to obtain a second intermediate feature map of the frame image; and input the second intermediate feature map of the frame image into the downsampling layer in the feature extraction module for M downsampling processes to obtain the image feature map of the frame image, where M is a positive integer ≥ 1.
  • the jump connection layer adopts an encoder and decoder structure
  • the feature extraction unit 82 inputs the frame image into the jump connection layer in the feature extraction module, performs multi-resolution feature map extraction on the frame image and performs jump connection on the feature map of the same resolution to obtain a second intermediate feature map, which is specifically used for: inputting the frame image into the encoder in the jump connection layer, encoding the frame image to obtain an initial feature map of the frame image, and sequentially downsampling the initial feature map N times to obtain a first intermediate feature map; inputting the first intermediate feature map into the decoder in the jump connection layer, sequentially upsampling the first intermediate feature map N times, and in each upsampling process, performing a jump connection with the first intermediate feature map of the same resolution obtained by downsampling in the encoder to obtain the second intermediate feature map of the frame image.
  • the encoder includes a coding submodule and N downsampling submodules connected in sequence
  • the feature extraction unit 82 inputs the frame image into the encoder in the jump connection layer, encodes the frame image to obtain an initial feature map of the frame image, and sequentially downsamples the initial feature map N times to obtain a first intermediate feature map, which is specifically used for: inputting the frame image into the coding submodule for encoding to obtain the initial feature map of the frame image; using N downsampling submodules to downsample the initial feature map N times to obtain a first intermediate feature map; wherein, in each downsampling submodule, the input of K1 convolution units connected in sequence is convolved with their respective target convolution parameters to obtain an intermediate feature map to be activated, and the intermediate feature map to be activated is activated using an activation function to obtain the output of each convolution unit, and K1 is a positive integer ⁇ 2.
  • the downsampling layer includes M downsampling sub-modules connected in sequence
  • the feature extraction unit 82 inputs the second intermediate feature map of the frame image into the downsampling layer in the feature extraction module for M downsampling processes to obtain the image feature map of the frame image, specifically for: using M downsampling sub-modules to perform M downsampling processes on the second intermediate feature map to obtain the image feature map of the frame image; wherein, in each downsampling sub-module, the input is convolved using the target convolution parameters corresponding to each of the K2 convolution units connected in sequence to obtain the intermediate feature map to be activated, and the activation function is used to activate the intermediate feature map to be activated to obtain the output of each convolution unit, and K2 is a positive integer ⁇ 2.
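As an illustration of a downsampling sub-module built from several sequentially connected convolution units, each convolving its input with its target convolution parameters and applying an activation function, a minimal PyTorch sketch is given below. The channel sizes, kernel size and where the stride-2 step sits are assumptions for illustration.

```python
import torch.nn as nn

class DownsamplingSubmodule(nn.Module):
    """K sequentially connected convolution units; each convolves its input
    with its target convolution parameters and applies an activation function."""

    def __init__(self, in_channels, out_channels, k=3):
        super().__init__()
        units = []
        for i in range(k):
            units.append(nn.Conv2d(in_channels if i == 0 else out_channels,
                                   out_channels, kernel_size=3,
                                   stride=2 if i == 0 else 1, padding=1))
            units.append(nn.ReLU())    # activation of the intermediate feature map
        self.units = nn.Sequential(*units)

    def forward(self, x):
        return self.units(x)
```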
  • the feature extraction unit 82 inputs the camera posture data when the frame image is collected into the camera parameter fusion module in the feature extraction network for feature extraction, and obtains the camera posture feature map of the frame image, which is specifically used to: input the camera posture data when the frame image is collected into the camera parameter fusion module in the feature extraction network, and the camera posture data includes at least two posture angles;
  • a plurality of posture characterization parameters are obtained by performing trigonometric function processing according to at least two posture angles and the relationship between at least two posture angles; a multi-layer perceptron MLP network in a camera parameter fusion module is used to process the plurality of posture characterization parameters to obtain a camera posture feature map of the frame image.
  • When the feature extraction unit 82 performs trigonometric function processing based on at least two posture angles and the relationship between the at least two posture angles to obtain a plurality of posture characterization parameters, it is specifically used to: perform numerical calculations on pairs of the at least two posture angles to obtain a plurality of fused posture angles, each fused posture angle representing the relationship between the corresponding two posture angles; and perform trigonometric function processing on each posture angle of the at least two posture angles and each fused posture angle of the multiple fused posture angles to obtain a plurality of posture characterization parameters.
  • the feature extraction unit 82 uses a multi-layer perceptron MLP network to process multiple posture representation parameters to obtain a camera posture feature map of the frame image, it is specifically used to: vectorize the multiple posture representation parameters to obtain a camera posture feature vector; and use a multi-layer perceptron MLP network to process the camera posture feature vector to obtain a camera posture feature map.
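The trigonometric expansion of the camera posture data described above can be sketched as follows, taking the three-angle case (yaw, pitch, roll) as an example: pairwise sums and differences provide the fused angles, and sine and cosine are applied to every raw and fused angle, yielding 18 posture characterization parameters.

```python
import itertools
import math

def pose_characterization(yaw, pitch, roll):
    """Expand three posture angles (radians) into posture characterization
    parameters: pairwise sums and differences give the fused angles, and sine
    and cosine are applied to every raw and fused angle (3 + 6 -> 18 values)."""
    angles = [yaw, pitch, roll]
    fused = []
    for a, b in itertools.combinations(angles, 2):
        fused += [a + b, a - b]            # relationship between each pair of angles
    return [f(angle) for angle in angles + fused for f in (math.sin, math.cos)]
```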
  • the parameter regression unit 84 inputs the target splicing feature vector into the parameter regression network, and predicts multiple control parameters for model control according to the three-dimensional model description information, specifically for:
  • the target splicing feature vector is input into a parameter regression network, and at least one multi-layer perceptron MLP operation is performed on the target splicing feature vector according to the three-dimensional model description information to obtain a plurality of control parameters for model control.
  • the multiple frames of images include a current frame of image and at least one frame of historical image
  • When the feature extraction unit 82 inputs the multiple frames of images into the feature extraction network for feature extraction to obtain the feature vectors of the multiple frames of images, it is specifically used to: each time, input the current frame image into the feature extraction network for feature extraction to obtain the feature vector of the current frame image;
  • When the vector splicing unit 83 splices the feature vectors of the multiple frames of images to obtain the target spliced feature vector, it is specifically used to: use a set sliding window to obtain the feature vector of at least one frame of historical image from a designated storage space; and splice the feature vector of the current frame image with the feature vector of the at least one frame of historical image to obtain the target spliced feature vector.
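A minimal sketch of the sliding-window splicing step: the feature vectors of historical frames are cached in a designated storage space and concatenated with the current frame's vector. The window length of 3 matches the four-frame example used in this description; the class and method names are hypothetical.

```python
from collections import deque

class FeatureWindow:
    """Sliding-window cache over per-frame feature vectors; splicing the cached
    historical vectors with the current frame's vector yields the target spliced
    feature vector (e.g. 4 x 128 = 512 dimensions in the example above)."""

    def __init__(self, history_frames=3):
        self.history = deque(maxlen=history_frames)   # the designated storage space

    def splice(self, current_vector):
        # Historical vectors (oldest first) followed by the current frame's vector.
        spliced = [x for vec in list(self.history) + [current_vector] for x in vec]
        self.history.append(current_vector)           # current frame becomes history
        return spliced
```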
  • each time the feature extraction unit 82 inputs the current frame image into the feature extraction network for feature extraction to obtain a feature vector of the current frame image it is specifically used to: detect the image position of the target object in the current frame image, and crop a local image where the target object is located from the current frame image according to the image position; and input the local image into the feature extraction network for feature extraction to obtain a feature vector of the current frame image.
  • When the feature extraction unit 82 detects the image position of the target object in the current frame image, it is specifically used to: sequentially preprocess the current frame image, the preprocessing including at least one of image scaling and normalization; and input the preprocessed image into the target detection network for target detection to obtain the image position of the target object in the preprocessed image.
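The preprocessing, detection and cropping step could be sketched as follows, using the example sizes given in this description (scaling to 90x160, cropping a 128x128 local image). The `detector` callable is a placeholder for the real-time target detection network and is assumed to return a bounding box in the coordinates of the preprocessed image.

```python
import cv2

def crop_target(frame, detector, scaled_size=(90, 160), crop=128):
    """Scale and Z-score normalize the frame, run the target detector on the
    preprocessed image, then cut a fixed-size local image from the original frame."""
    small = cv2.resize(frame, scaled_size)                      # (width, height) = (90, 160)
    normalized = (small - small.mean()) / (small.std() + 1e-8)  # Z-score normalization
    x, y, w, h = detector(normalized)                           # box in preprocessed coordinates
    # Map the box centre back to the original frame.
    sx = frame.shape[1] / scaled_size[0]
    sy = frame.shape[0] / scaled_size[1]
    cx, cy = int((x + w / 2) * sx), int((y + h / 2) * sy)
    half = crop // 2
    return frame[max(cy - half, 0): cy + half, max(cx - half, 0): cx + half]
```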
  • the 3D reconstruction device further includes: an adaptation unit and/or a labeling unit.
  • An adaptation unit for adapting the target three-dimensional model to the target object in each frame of the multiple images according to the camera posture data when the frame is collected, and selecting a commodity adapted to the target object based on the adaptation result;
  • a labeling unit used for inputting any frame of the multiple frames of images into a depth estimation network to estimate the size information of the target object, and labeling the target three-dimensional model according to the estimated size information of the target object;
  • the adaptation unit is used to adapt the target three-dimensional model to the target object in each frame of the multiple frames according to the camera posture data when the frame is collected, and measure the shape parameters of the target object based on the adaptation result.
  • the target object is a foot object, a hand object, a head object, an elbow object or a leg object on a human body, and the three-dimensional model description information corresponding to the target object is determined based on the SMPL model.
  • The device shown in Figure 8 can execute the method of the embodiment shown in Figure 2; its implementation principles and technical effects are similar and will not be repeated here.
  • The specific manner in which each unit of the device in the above embodiment performs operations has been described in detail in the method embodiment and will not be elaborated here.
  • the execution subject of each step of the method provided in the above embodiment can be the same device, or the method can be executed by different devices.
  • For example, the execution subject of steps 201 to 204 can be device A; for another example, the execution subject of steps 201 and 202 can be device A, and the execution subject of steps 203 and 204 can be device B; and so on.
  • Fig. 9 is a schematic diagram of the structure of a computer device provided in an embodiment of the present application. As shown in Fig. 9, the computer device includes: a memory 91 and a processor 92;
  • the memory 91 is used to store computer programs and can be configured to store various other data to support operations on the computing platform. Examples of such data include instructions for any application or method operating on the computing platform, contact data, phone book data, messages, pictures, videos, etc.
  • the memory 91 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk.
  • the processor 92 is coupled to the memory 91 and is used to execute the computer program in the memory 91, so as to: obtain multiple frames of images including a target object, and three-dimensional model description information corresponding to the target object; input the multiple frames of images into a feature extraction network for feature extraction to obtain feature vectors of the multiple frames of images, and splice the feature vectors of the multiple frames of images to obtain a target spliced feature vector; input the target spliced feature vector into a parameter regression network, and predict multiple control parameters for model control according to the three-dimensional model description information, wherein the multiple control parameters include posture control parameters and shape control parameters; and mask the initial three-dimensional model of the target object according to the posture control parameters and the shape control parameters to obtain a target three-dimensional model of the target object, wherein the initial three-dimensional model is obtained according to the three-dimensional model description information.
  • the processor 92 executes the computer program in the memory 91, and can also be used to: obtain multiple frame images including the object to be tried on, and three-dimensional model description information corresponding to the object to be tried on; input the multiple frame images into a feature extraction network for feature extraction to obtain feature vectors of the multiple frame images, and splice the feature vectors of the multiple frame images to obtain a target spliced feature vector; input the target spliced feature vector into a parameter regression network, and predict multiple control parameters for model control according to the three-dimensional model description information, the multiple control parameters including posture control parameters and shape control parameters; mask the initial three-dimensional model of the object to be tried on according to the posture control parameters and the shape control parameters to obtain a target three-dimensional model of the object to be tried on, the initial three-dimensional model being obtained according to the three-dimensional model description information; and provide the object to be tried on with target product information that matches it according to the target three-dimensional model.
  • the computer device also includes: communication component 93, display 94, power supply component 95, audio component 96 and other components.
  • Figure 9 only schematically shows some components, which does not mean that the computer device only includes the components shown in Figure 9.
  • the components in the dotted box in Figure 9 are optional components, not mandatory components, which can be determined according to the product form of the computer device.
  • the computer device of this embodiment can be implemented as a terminal device such as a desktop computer, a laptop computer, a smart phone or an IOT device, or it can be a service-side device such as a conventional server, a cloud server or a server array.
  • the computer device of this embodiment is implemented as a terminal device such as a desktop computer, a laptop computer, a smart phone, etc., it may include the components in the dotted box in Figure 9; if the computer device of this embodiment is implemented as a service-side device such as a conventional server, a cloud server or a server array, it may not include the components in the dotted box in Figure 9.
  • an embodiment of the present application further provides a computer-readable storage medium storing a computer program, which, when executed, can implement each step that can be executed by a computer device in the above method embodiment.
  • An embodiment of the present application also provides a computer program product, including a computer program/instructions; when the computer program/instructions are executed by a processor, the processor is enabled to implement each step that can be executed by a computer device in the above method embodiments.
  • the above-mentioned communication component is configured to facilitate wired or wireless communication between the device where the communication component is located and other devices.
  • the device where the communication component is located can access a wireless network based on a communication standard, such as WiFi, 2G, 3G, 4G/LTE, 5G and other mobile communication networks, or a combination thereof.
  • the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel.
  • the communication component also includes a near field communication (NFC) module to facilitate short-range communication.
  • the NFC module can be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra-wideband (UWB) technology, Bluetooth (BT) technology and other technologies.
  • the above-mentioned display includes a screen, and the screen may include a liquid crystal display (LCD) and a touch panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive input signals from a user.
  • the touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
  • the power supply assembly provides power to various components of the device where the power supply assembly is located.
  • the power supply assembly may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to the device where the power supply assembly is located.
  • the above-mentioned audio component can be configured to output and/or input audio signals.
  • the audio component includes a microphone (MIC), and when the device where the audio component is located is in an operating mode, such as a call mode, a recording mode, and a speech recognition mode, the microphone is configured to receive an external audio signal.
  • the received audio signal can be further stored in a memory or sent via a communication component.
  • the audio component also includes a speaker for outputting an audio signal.
  • the embodiments of the present application may be provided as methods, systems, or computer program products. Therefore, the present application may adopt the form of a complete hardware embodiment, a complete software embodiment, or an embodiment in combination with software and hardware. Moreover, the present application may adopt the form of a computer program product implemented in one or more computer-readable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) that contain computer-usable program code.
  • These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a specific manner, so that the instructions stored in the computer-readable memory produce a manufactured product including an instruction device that implements the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device so that a series of operational steps are executed on the computer or other programmable device to produce a computer-implemented process, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes in the flowchart and/or one or more boxes in the block diagram.
  • a computing device includes one or more processors (CPU), input/output interfaces, network interfaces, and memory.
  • Memory may include non-permanent storage in a computer-readable medium, in the form of random access memory (RAM) and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
  • Computer-readable media include permanent and non-permanent, removable and non-removable media that can be used to store information by any method or technology. Information can be computer-readable instructions, data structures, program modules or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), Dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disk read-only memory (CD-ROM), digital versatile disk (DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices or any other non-transmission media that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media does not include transitory media such as modulated data signals and carrier waves.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Finance (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Accounting & Taxation (AREA)
  • Marketing (AREA)
  • Computing Systems (AREA)
  • Economics (AREA)
  • Geometry (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Computer Graphics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Development Economics (AREA)
  • Databases & Information Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Embodiments of the present application provide a three-dimensional reconstruction and commodity information processing method, apparatus, device and storage medium. In the embodiments of the present application, three-dimensional reconstruction is performed on multiple images of a target object; during the three-dimensional reconstruction, feature vectors are extracted from the multiple images and spliced, posture control parameters and shape control parameters for model control are predicted based on the spliced feature vectors, and the initial three-dimensional model of the target object is masked according to the posture control parameters and the shape control parameters to obtain a target three-dimensional model of the target object. This three-dimensional reconstruction approach greatly improves the accuracy and realism of the three-dimensional model, thereby effectively expanding the scope and effect of applications based on the three-dimensional model. In particular, in commodity purchase scenarios, a commodity adapted to the target object can be selected for it based on the three-dimensionally reconstructed model, providing conditions for solving the existing return and exchange problem.

Description

三维重建与商品信息处理方法、装置、设备及存储介质
本申请要求于2022年10月14日提交中国专利局、申请号为202211257959.4、申请名称为“三维重建与商品信息处理方法、装置、设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及互联网技术领域,尤其涉及一种三维重建与商品信息处理方法、装置、设备及存储介质。
背景技术
随着互联网技术和电子商务的发展,人们可以足不出户进行在线购物。但对于鞋子等穿戴类商品,在线购买时用户无法进行试穿,经常出现鞋子等穿戴类商品到货后因为不合脚、不合身要退换货的情况,这不仅会严重影响用户的购物体验,还会增加在线购物的成本,降低效率。
于是,现有技术出现了一些估计脚部长度,基于估计的脚部长度向用户推荐合适尺码的鞋子的方案,例如利用脚部图像中的关键点来估计脚长,或者借助于AR技术对用户脚长进行测量。基于这些方案,用户在购鞋时仍很难得知鞋子相对于自己的脚型是否挤脚、穿着是否舒适等特性。也就是说,现有方案依旧无法很好地解决穿戴类商品的选购问题,无法很好的解决退换货问题。
发明内容
本申请的多个方面提供一种三维重建与商品信息处理方法、装置、设备及存储介质,用以对目标对象进行高精度的三维重建,以便基于三维重建的模型为目标对象选购与之适配的商品,为解决现有退换货问题提供条件。
本申请实施例提供一种三维重建方法,包括:获取包括目标对象的多帧图像,以及目标对象对应的三维模型描述信息;将多帧图像输入特征提取网络进行特征提取,以得到多帧图像的特征向量,对多帧图像的特征向量进行拼接,得到目标拼接特征向量;将目标拼接特征向量输入参数回归网络,根据三维模型描述信息预测用于模型控制的多个控制参数集,多个控制参数包括姿态控制参数和形状控制参数;按照姿态控制参数和形状控制参数对目标对象的初始三维模型进行蒙层处理,得到目标对象的目标三维模型,初始三维模型是根据三维模型描述信息得到的。
本申请实施例还提供一种三维重建装置,包括:图像获取单元,用于获取目标对象的多帧图像,以及目标对象对应的三维模型描述信息;特征提取单元,用于将多帧图像输入 特征提取网络进行特征提取,以得到多帧图像的特征向量;向量拼接单元,用于对多帧图像的特征向量进行拼接,得到目标拼接特征向量;参数回归单元,用于将目标拼接特征向量输入参数回归网络,根据三维模型描述信息预测用于模型控制的多个控制参数集,多个控制参数包括姿态控制参数和形状控制参数;蒙层处理单元,用于按照姿态控制参数和形状控制参数对目标对象的初始三维模型进行蒙层处理,得到目标对象的目标三维模型,初始三维模型是根据三维模型描述信息得到的。
本申请实施例还提供一种计算机设备,包括:存储器和处理器;存储器,用于存储计算机程序,处理器与存储器耦合,用于执行计算机程序,以用于实现本申请实施例提供的三维重建方法中的步骤。
本申请实施例还提供一种存储有计算机程序的计算机可读存储介质,当计算机程序被处理器执行时,致使处理器执行本申请实施例提供的三维重建方法中的步骤。
本申请实施例还提供一种商品信息处理方法,包括:获取包含包括试穿对象的多帧图像,以及试穿对象对应的三维模型描述信息;将多帧图像输入特征提取网络进行特征提取,以得到多帧图像的特征向量,对多帧图像的特征向量进行拼接,得到目标拼接特征向量;将目标拼接特征向量输入参数回归网络,根据三维模型描述信息预测用于模型控制的多个控制参数,多个控制参数包括姿态控制参数和形状控制参数;按照姿态控制参数和形状控制参数对试穿对象的初始三维模型进行蒙层处理,得到试穿对象的目标三维模型,初始三维模型是根据三维模型描述信息得到的;根据目标三维模型为试穿对象提供与之适配的目标商品信息。
在本申请实施例中,采用一种全新的三维重建网络架构为目标对象进行三维重建,该三维重建网络架构包含用于对包含目标对象的多帧图像进行特征提取的特征提取网络、用于对多帧图像的特征向量进行拼接的向量拼接网络、用于基于三维模型描述信息中的参数数量进行模型参数预测的参数回归网络以及用于根据预测出的控制参数进行蒙层处理的蒙层处理网络,基于该三维重建网络架构不仅可以实现端到端的三维重建,而且可以提高三维重建的精度。在得到目标对象的高精度的三维重建模型之后,可以基于该三维重建模型为目标对象选购与之适配的商品,从而解决因为选购不合适引起的退换货问题。
附图说明
此处所说明的附图用来提供对本申请的进一步理解,构成本申请的一部分,本申请的示意性实施例及其说明用于解释本申请,并不构成对本申请的不当限定。在附图中:
图1为本申请实施例提供的一种三维重建网络的模型结构图;
图2为本申请实施例提供的一种三维重建方法的流程图;
图3为本申请实施例提供的另一种三维重建网络的模型结构图;
图4为本申请实施例提供的一种特征提取网络的模型结构图;
图5为本申请实施例提供的一种特征提取网络中的特征提取模块的模型结构图;
图6为本申请实施例提供的一种下采样子模块的模型结构图;
图7为本申请实施例提供的一种商品信息处理方法的流程图;
图8为本申请实施例提供的一种三维重建装置的结构示意图;
图9为本申请实施例提供的一种计算机设备的结构示意图。
具体实施方式
为使本申请的目的、技术方案和优点更加清楚,下面将结合本申请具体实施例及相应的附图对本申请技术方案进行清楚、完整地描述。显然,所描述的实施例仅是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
在本申请的实施例中,“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,描述关联对象的访问关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况,其中A,B可以是单数或者复数。在本申请的文字描述中,字符“/”一般表示前后关联对象是一种“或”的关系。此外,在本申请实施例中,“第一”、“第二”、“第三”、“第四”、“第五”以及“第六”只是为了区分不同对象的内容而已,并无其它特殊含义。
现有方案面临着穿戴类商品选购不合适,存在频繁退换货的问题。为此,本申请实施例提供一种三维重建与商品信息处理方法、装置、设备及存储介质。在本申请实施例中,利用目标对象对应的三维模型描述信息创建目标对象的初始三维模型,又利用包括目标对象的多张图像采用一种全新的三维重建网络架构对目标对象进行三维重建,在三维重建过程中提取多张图像各自的特征向量,并对多张图像各自的特征向量进行拼接,以及基于拼接的特征向量预测用于模型控制的姿态控制参数和形状控制参数,并按照姿态控制参数和形状控制参数对目标对象的初始三维模型进行蒙层处理,得到目标对象的目标三维模型。由此,这种三维重建方式极大地提高了三维模型的精度,三维模型的精度越高,三维模型的真实感越强,也就越能够真实表达现实世界中的目标对象,进而有效地拓展三维模型的应用范围和提高三维模型的应用效果。特别地,在商品选购场景中,能够基于三维重建的模型为目标对象选购与之适配的商品,为解决现有退换货问题提供条件。
图1为本申请实施例提供的一种三维重建网络的模型结构图。参见图1,整个三维重建网络可以包括:特征提取网络、向量拼接网络、参数回归网络和蒙层处理网络。实际应用中,目标对象可以是任意的需要进行三维重建的对象,目标对象例如为人体上的脚部对象、手部对象、头部对象、肘部对象或腿部对象等身体部位,又例如为自然界中的各种动物、植物等等,又例如为真实的房屋、山体等三维空间场景等等,对此不做限制。在对目标对象进行三维重建时,首先可以利用图像采集装置对目标对象进行视频采集,获取包括目标对象的视频流,参见图1中的①所示,依次将视频流中连续多帧包括目标对象的图像输入至三维重建网络中,参见图1中的②和③所示,特 征提取网络依次对每帧图像进行特征提取,提取每帧图像的特征向量。在得到多帧图像的特征向量后,参见图1中的④和⑤所示,利用向量拼接网络对多帧图像的特征向量按照图像采集时刻从早到晚的顺序依次进行拼接处理,得到目标拼接特征向量;接着,参见图1中的⑥和⑦所示,利用参数回归网络对目标拼接特征向量进行预测处理,得到用于模型控制的多个控制参数,多个控制参数可以包括姿态控制参数和形状控制参数。最后,参见图1中的⑧和⑨所示,利用蒙层处理网络对基于三维模型描述信息得到的目标对象的初始三维模型进行蒙层处理,便可输出目标对象的目标三维模型,至此完成整个三维重建任务。
实际应用中,整个三维重建网络可以部署在终端设备上,可以部署在服务器上,或者,整个三维重建网络中的部分网络部署在终端设备上,部分网络部署在服务器上,对此不做限制。可选的,终端设备例如包括但不限于手机、平板电脑、笔记本电脑、可穿戴设备、车载设备。服务器例如包括但不限于单个服务器或多个服务器组成的分布式服务器集群。
应当理解的是,图1中的三维重建网络的模型结构仅仅是示意性的。例如,特征提取网络也可以增设拼接处理功能,这样,三维重建网络无需包含专门的向量拼接网络。又例如,参数回归网络也可以增设蒙层处理功能,这样,三维重建网络无需包含专门的蒙层处理网络。凡是具有上述特征提取、向量拼接、参数回归以及蒙层处理能力的神经网络架构均适用于本申请实施例。
以下结合附图,详细说明本申请各实施例提供的技术方案。
图2为本申请实施例提供的一种三维重建方法的流程图。参见图2,该方法可以包括以下步骤:
201、获取包括目标对象的多帧图像,以及目标对象对应的三维模型描述信息。
202、将多帧图像输入特征提取网络进行特征提取,以得到多帧图像的特征向量,对多帧图像的特征向量进行拼接,得到目标拼接特征向量。
203、将目标拼接特征向量输入参数回归网络,根据三维模型描述信息预测用于模型控制的多个控制参数,多个控制参数包括姿态控制参数和形状控制参数。
204、按照姿态控制参数和形状控制参数对目标对象的初始三维模型进行蒙层处理,得到目标对象的目标三维模型,初始三维模型是根据三维模型描述信息得到的。
在本实施例中,预先准备目标对象对应的三维模型描述信息。在目标对象为身体部位的情况下,目标对象对应的三维模型描述信息可以是基于SMPL(Skinned Multi-Person Linear Model,蒙皮多人线性模型)模型确定的,SMPL(Skinned Multi-Person Linear Model)是一种裸体的(skinned),基于顶点(vertex-based)的人体三维模型,能够精确地表示人体的不同形状(shape)和姿态(pose)。
三维模型描述信息描述了目标对象的三维模型需要包含的顶点数量、各顶点的位置信息和用于模型控制的参数数量。基于各顶点的位置信息可以构建出目标对象的初始三维模型。以目标对象为脚部为例,可以采用1600个顶点所构建的三维模型作为脚部的初始三 维模型,1600个顶点仅为示例,并不限于此,具体可以根据模型精度灵活选择三维模型所需的顶点数量。其中,用于模型控制的参数数量不做限制,也可以根据模型控制的精度和复杂度进行灵活设定。例如,用于模型控制的多个控制参数可以包括姿态控制参数和形状控制参数,姿态控制参数用于控制三维模型的姿态,形状控制参数用于控制三维模型的形状。姿态控制参数可以包含翻滚角、俯仰角和偏航角等3个姿态角,通过3个姿态角控制三维模型的姿态。形状控制参数因目标对象不同而有所不同,任一形状参数的改变,可以引起目标对象的一个或多个部位的形状发生改变。以目标对象为脚部为例,形状控制参数例如包括10个形状参数,10个形状参数可以控制脚趾头大小、脚的肥瘦、纵向横向拉伸、足弓弯曲等。以目标对象为头部为例,形状控制参数例如包括8个形状参数,8个形状参数可以控制嘴巴大小、鼻梁高低、眼距和额头宽度等等。以目标对象为房屋为例,形状控制参数例如包括30个形状参数,30个形状参数可以控制房屋的层高、房屋大小、房屋的外墙构造等等。
由于初始三维模型的精度较低,初始三维模型的真实感不够,难以真实表达真实世界中的目标对象。为此,出于提高三维模型的精度的考虑,利用三维重建网络重建目标对象的目标三维模型。
在本实施例中,为了增强模型的鲁棒性,为模型引入一定的平滑效果,可以获取包括目标对象的多帧图像,将多帧图像输入至三维重建网络中进行三维重建。多帧图像的数量不做限制,例如为3帧、4帧、5帧等。实际应用中,可以预先对目标对象进行视频采集,得到视频流,并在本地保存视频流,在需要对目标对象进行三维重建时,从本地保存的视频流中获取包括目标对象的多帧图像。当然,也可以实时对目标对象进行视频采集,得到视频流,从实时采集的视频流中获取包括目标对象的多帧图像,对此不做限制。
实际应用中,针对包括目标对象的多帧图像中每帧图像,可以直接将该帧图像输入至三维重建网络中的特征提取网络进行特征提取。具体而言,依次将多帧图像中每帧图像作为当前帧图像,可以直接将当前帧图像输入至特征提取网络进行特征提取,在该过程中,可以保存提取到的每帧图像的特征向量。这样,在使用多帧图像进行三维重建时,可以将当前帧图像输入三维重建网络中的特征提取网络进行特征提取,而当前帧图像之前的其它几帧历史图像的特征向量可以直接从对应的存储空间中获取,但并不限于此。例如,同时将当前帧图像和之前几帧历史图像同时输入特征提取网络进行特征提取,也是可以的。进一步可选的,由于图像采集时当前帧图像除了包括目标对象,还包括目标对象所在的周边环境,为了提高特征提取的准确度,可以对当前帧图像进行裁剪,对裁剪后的图像进行特征提取。于是,在将当前帧图像输入特征提取网络进行特征提取,以得到当前帧图像的特征向量时,可以检测目标对象在当前帧图像中的图像位置,根据图像位置从当前帧图像中裁剪出目标对象所在的局部图像;将局部图像输入特征提取网络进行特征提取,以得到当前帧图像的特征向量。其中,通过目标检测(Object Dectection)算法可以检测出图像中目标对象的类别和位置。
进一步可选的,为了准确定位出目标对象的图像位置,在检测目标对象在当前帧图像中的图像位置时,对当前帧图像依次进行预处理,预处理包括图像缩放处理和归一化处理中的至少一种;将预处理后的图像输入目标检测网络进行目标检测,以得到目标对象在预处理后的图像中的图像位置。
举例来说,对脚部连续拍摄,得到4帧原始图像。将这4帧原始图像缩放至高160像素、宽90像素,并通过Z-Score(标准分数)方法对缩放后的4帧图像进行归一化处理,归一化处理后的4帧图像输入至一个实时的脚部目标检测网络中进行脚部检测,得到脚部的图像位置,根据脚部的图像位置从4个原始图像裁剪出4个脚部图像,4个脚部图像的尺寸为128*128像素。尺寸为128*128像素的脚部图像可以输入至特征提取网络中进行特征提取。
在本实施例中,对特征提取网络的模型结构不做限制,任何具有特征提取功能的网络均可以作为特征提取网络。
实际应用中,可以利用特征提取网络对多帧图像中每帧图像进行特征提取,得到每帧图像的特征向量,在完成全部的图像的特征提取后,对多张图像各自对应的特征向量进行向量拼接,得到目标拼接特征向量。参见图3,经过裁剪得到图像1、图像2、图像3和图像4等分别利用特征提取网络进行特征提取,得到各自对应的128维度的特征向量;利用向量拼接网络对4个128维度的特征向量进行向量拼接,可以得到512维度的特征向量。其中,512维度的特征向量即为目标拼接特征向量。
实际应用中,可以利用特征提取网络对每帧图像进行特征提取,得到每帧图像的特征向量,并在指定存储空间保存该帧图像的特征向量。当完成多帧图像中图像采集时间最晚的当前帧图像的特征提取后,将当前帧图像的特征向量和从指定存储空间获取的至少一帧历史图像的特征向量进行向量拼接。于是,示例性的,多帧图像包括当前帧图像和至少一帧历史图像;将多帧图像输入特征提取网络进行特征提取,以得到多帧图像的特征向量,包括:每次将当前帧图像输入特征提取网络进行特征提取,以得到当前帧图像的特征向量;对多帧图像的特征向量进行拼接,得到目标拼接特征向量,包括:采用设定的滑动窗口从指定存储空间中,获取至少一帧历史图像的特征向量;将当前帧图像的特征向量和至少一帧历史图像的特征向量进行拼接,得到目标拼接特征向量。值得注意的是,设定的滑动窗口用于控制从指定存储空间获取的历史图像的数量,例如在使用4帧图像进行三维重建的场景中,该滑动窗口的长度可以是3;在使用5帧图像进行三维重建的场景中,该滑动窗口的长度可以是4。
在本实施例中,在对多帧图像的特征向量进行拼接,得到目标拼接特征向量之后,将目标拼接特征向量输入参数回归网络,根据三维模型描述信息预测用于模型控制的多个控制参数。
本实施例对参数回归网络的模型结构不做限制,任何经过训练可以进行控制参数预测的模型均可以作为参数回归网络。进一步可选的,参数回归网络可以表现为MLP (多层感知器,Multilayer Perceptron)网络,且能够进行至少一次的MLP运算。其中,MLP网络包括多个输入层、多个输出层以及多个隐藏层,是一种前馈人工神经网络模型,其将输入的多个数据集映射到单一的输出的数据集上。于是,进一步可选的,将目标拼接特征向量输入参数回归网络,根据三维模型描述信息预测用于模型控制的多个控制参数,包括:将目标拼接特征向量输入参数回归网络中,根据三维模型描述信息对目标拼接特征向量进行至少一次多层感知机MLP运算,以得到用于模型控制的多个控制参数。
参见图3,以参数回归网络对目标拼接特征向量进行两次MLP运算为例,对向量拼接网络输出的512维度的特征向量进行一次MLP运算,得到1600维度的特征向量;对1600维度的特征向量再进行一次MLP运算,得到13维度的特征向量,13维度的特征向量中的每个元素即为一个控制参数,也即得到了13个控制参数。在此说明,13维度仅为控制参数数量的一种示例,具体可根据目标对象以及控制复杂度等需求灵活设定。
在参数回归网络输出的姿态控制参数和形状控制参数之后,按照姿态控制参数和形状控制参数对目标对象的初始三维模型进行蒙层处理,得到目标对象的目标三维模型。由于初始三维模型是根据三维模型描述信息得到的,初始三维模型的精度有待改善,在蒙层处理过程中,利用姿态控制参数调整初始三维模型的姿态,利用形状控制参数调整初始三维模型的形状,进而得到精度更高的目标三维模型。值得注意的是,蒙皮处理过程中除了调整三维模型的姿态和形状,还可以将三维模型包括的各个顶点和骨骼联系起来。关于蒙皮处理本申请实施例中不做过多介绍。
本申请实施例提供的技术方案,利用目标对象对应的三维模型描述信息创建目标对象的初始三维模型,又利用包括目标对象的多张图像进行三维重建,在三维重建过程中提取多张图像各自的特征向量,并对多张图像各自的特征向量进行拼接,以及基于拼接的特征向量预测用于模型控制的姿态控制参数和形状控制参数,并按照姿态控制参数和形状控制参数对目标对象的初始三维模型进行蒙层处理,得到目标对象的目标三维模型。由此,这种三维重建方式极大地提高了三维模型的精度,三维模型的精度越高,三维模型的真实感越强,也就越能够真实表达现实世界中的目标对象,进而有效地拓展三维模型的应用范围和提高三维模型的应用效果。特别地,在商品选购场景中,能够基于三维重建的模型为目标对象选购与之适配的商品,为解决现有退换货问题提供条件。
在本申请的一些可选实施例中,为了更为准确地进行特征提取,特征提取网络可以结合图像特征和相机姿态数据进行特征提取。作为一种示例,特征提取网络可以包括特征提取模块、相机参数融合模块、特征拼接模块和特征降维模块。于是,将多帧图像输入特征提取网络进行特征提取,以得到多帧图像的特征向量,包括:针对多帧图像中的每帧图像,将该帧图像输入特征提取网络中的特征提取模块进行特征提取,得到该帧图像的图像特征图;将采集该帧图像时的相机姿态数据输入特征提取网络中 的相机参数融合模块进行特征提取,得到该帧图像的相机位姿特征图;利用特征提取网络中的特征拼接模块对每帧图像的图像特征图和相机位姿特征图进行拼接,得到每帧图像的拼接特征图;以及利用特征提取网络中的特征降维模块对每帧图像的拼接特征图进行降维处理,得到每帧图像的特征向量。
具体而言,特征提取模块用于提取每帧图像的图像特征图。另外,对特征提取模块的模型结构不做限制,任何能够提取图像特征的特征提取网络均可以作为特征提取模块。
相机参数融合模块是对相机姿态数据进行特征提取的模块。另外,对相机参数融合模块的模型结构不做限制,任何能够对相机姿态数据进行特征提取的网络均可以作为相机参数融合模块。
进一步可选的,为了获得准确度更高的每帧图像的相机位姿特征图,将采集该帧图像时的相机姿态数据输入特征提取网络中的相机参数融合模块进行特征提取,得到该帧图像的相机位姿特征图的实现方式可以是:将采集该帧图像时的相机姿态数据输入特征提取网络中的相机参数融合模块,相机姿态数据包括至少两种姿态角;根据至少两种姿态角以及至少两种姿态角之间的相互关系进行三角函数处理,得到多种姿态表征参数;利用相机参数融合模块中的多层感知机MLP网络处理多种姿态表征参数,得到该帧图像的相机姿态特征图。
具体而言,相机姿态数据可以包括偏航角、俯仰角和翻滚角中的至少两种姿态角。作为一种示例,根据至少两种姿态角以及至少两种姿态角之间的相互关系进行三角函数处理,得到多种姿态表征参数,包括:对至少两种姿态角中的两两姿态角进行数值计算,以得到多种融合姿态角,每种融合姿态角表示对应两个姿态角之间的相互关系;对至少两种姿态角中的每种姿态角以及多种融合姿态角中的每种融合姿态角分别进行三角函数处理,得到多种姿态表征参数。
实际应用中,对至少两种姿态角中的两两姿态角进行相加、相减或相乘等各种数值计算,得到扩展出多种融合姿态角,每种融合姿态角表示对应两个姿态角之间的相互关系。在对每种姿态角和每种融合姿态角进行三角函数处理时,可以进行余弦函数、正弦函数、余切函数或正切函数处理,但并不限于此。
参见图4,相机姿态数据可以包括偏航角α、俯仰角β和翻滚角γ。θ可以是偏航角α、俯仰角β和翻滚角γ中任一个姿态角,ψ是除去θ之外的任一个姿态角。两两不同的姿态角θ、ψ相加可以得到一个融合姿态角θ+ψ,两两不同的姿态角相减得到一个融合姿态角θ-ψ,进而可以得到6种融合姿态角,分别为α+β、α+γ、β+γ、α-β、α-γ、β-γ。3种姿态角和6种融合姿态角分别进行正弦函数sin(e)和余弦函数cos(e)等三角函数τ(e)处理,能够得到18个三角函数处理结果,也即18个姿态表征参数,18个姿态表征参数组成18维度的向量。
在得到多种姿态表征参数之后,利用多层感知机MLP网络处理多种姿态表征参数, 得到该帧图像的相机姿态特征图。作为一种示例,利用多层感知机MLP网络处理多种姿态表征参数,得到该帧图像的相机姿态特征图的实现方式可以是:对多种姿态表征参数进行向量化处理,得到相机姿态特征向量;利用多层感知机MLP网络处理相机姿态特征向量,得到相机姿态特征图。参见图4,由18个姿态表征参数组成18维度的相机姿态特征向量输入到多层感知机MLP网络处理,得到64维度的特征向量,以及将64维度的特征向量转化成尺寸为4*4*64的特征图,4*4*64的特征图即为相机姿态特征图。
在本实施例中,针对每帧图像,利用特征提取网络中的特征拼接模块,将特征提取模块输出的该帧图像的图像特征图和相机参数融合模块输出的该帧图像的相机位姿特征图进行拼接,得到每帧图像的拼接特征图,利用特征提取网络中的特征降维模块对每帧图像的拼接特征图进行降维处理,得到每帧图像的拼接向量。
参见图4,特征拼接模块输出的是尺寸为4*4*256的特征图,利用卷积核大小为1*1的卷积模块对4*4*256的特征图进行降维处理,得到4*4*64的特征图。将该4*4*64的特征图和相机参数融合模块输出的4*4*64的特征图进行拼接,得到4*4*128的特征图;利用卷积核大小为4*4的卷积模块对4*4*128的特征图进行降维处理,得到1*1*128的特征图,并将1*1*128的特征图转化为128维度的特征向量,至此,完成该帧图像的特征提取任务。
在本申请的一些可选实施例中,为了提高特征提取的准确度,特征提取网络中的特征提取模块可以包括依次连接的跳跃连接层和下采样层,于是,针对多帧图像中的每帧图像,将该帧图像输入特征提取网络中的特征提取模块进行特征提取,得到该帧图像的图像特征图的一种可选实现方式为:针对多帧图像中的每帧图像,将该帧图像输入特征提取模块中的跳跃连接层,对该帧图像进行多分辨率的特征图提取并对相同分辨率的特征图进行跳跃连接,以得到该帧图像的第二中间特征图;将该帧图像的第二中间特征图输入特征提取模块中的下采样层进行M次下采样处理,得到该帧图像的图像特征图,其中,M是≥1的正整数。
具体而言,跳跃连接层可以执行多次下采样和多次上采样操作,并在上采样过程中执行跳跃连接操作。针对每次上采样操作,对本次输入的特征图进行上采样,得到本次上采样输出的特征图,并将本次上采样输出的特征图和已经得到的相同分辨率的特征图进行连接,也即跳跃连接,得到本次上采样的最终输出的特征图。参见图5,跳跃连接层首先对输入的图像进行特征提取,得到该输入的图像的初始特征图,接着,针对该初始特征图执行多次下采样操作,在每次执行下采样操作时,获取上一次下采样操作输出的特征图,对上一次下采样操作输出的特征图进行下采样,得到本次下采样操作输出的特征图,第一次下采样操作的输入特征图为初始特征图;这样,经过多次下采样操作,可以得到多个不同分辨率的特征图;将最后一次下采样操作输出的特征图作为第一中间特征图;接着,针对第一中间特征图执行多次上采样操作,在每次 上采样操作过程中,获取上一次上采样操作输出的特征图,对上一次下采样操作输出的特征图进行上采样,得到本次上采样操作输出的中间特征图;将本次上采样操作输出的中间特征图和下采样操作或特征提取得到的相同分辨率的特征图进行连接也即跳跃连接,得到本次上采样操作最终输出的特征图。在经过多次上采样操作后,将最后一次上采样操作输出的特征图作为跳跃连接层对其输入的图像进行特征提取得到的第二中间特征图。
作为一种示例,跳跃连接层采用编码器和解码器结构,对于每帧图像,将该帧图像输入特征提取模块中的跳跃连接层,对该帧图像进行多分辨率的特征图提取并对相同分辨率的特征图进行跳跃连接,以得到第二中间特征图,包括:将该帧图像输入跳跃连接层中的编码器,对该帧图像进行编码以得到该帧图像的初始特征图,并依次对初始特征图进行N次下采样处理,得到第一中间特征图;将第一中间特征图输入跳跃连接层中的解码器,依次对第一中间特征图进行N次上采样处理,并在每次上采样处理中与编码器中下采样处理得到的相同分辨率的第一中间特征图进行跳跃连接,以得到该帧图像的第二中间特征图。参见图5,跳跃连接层中四个表示下采样的箭头对应的编码器,跳跃连接层中三个表示上采样的箭头对应的解码器。
在一可能的实现方式中,编码器包括依次连接的编码子模块和N个下采样子模块,则将该帧图像输入跳跃连接层中的编码器,对该帧图像进行编码以得到该帧图像的初始特征图,并依次对初始特征图进行N次下采样处理,得到第一中间特征图,包括:将该帧图像输入编码子模块进行编码,以得到该帧图像的初始特征图;利用N个下采样子模块对初始特征图进行N次下采样处理,得到第一中间特征图;其中,在每个下采样子模块中,利用依次连接的K1个卷积单元各自对应的目标卷积参数对其输入进行卷积处理,得到待激活的中间特征图,利用激活函数对待激活的中间特征图进行激活以得到每个卷积单元的输出,K1是≥2的正整数。在本实施例中,对编码器中每个下采样子模块包含的卷积单元的数量不做限定,例如可以是2、3、4或5个等。
参见图6,以每个下采样子模块包括依次连接的3个卷积单元为例进行图示。上一个卷积单元的输出结果为下一个卷积单元的输入参数,第一个下采样子模块的第一个卷积单元的输入参数为编码子模块输出的初始特征图,最后一个下采样子模块中最后一个卷积单元的输出结果为第一中间特征图。
在三维重建网络的推理阶段,每个卷积单元对应的目标卷积参数是对训练阶段的多个分支的参数进行重参数化技术合并得到的。在三维重建网络的训练阶段引入多个分支可以提高三维重建网络的精度,在三维重建网络的推理阶段合并分支可以提高三维重建网络的三维重建效率。
参见图6,针对下采样子模块中每个卷积单元,在训练阶段,该卷积单元的运算过程分为三个分支,假设第一分支的参数记为c1和b1;第二分支的参数记为c2和b2;第二分支的参数记为b3;c1、c2为卷积参数,b1、b2、b3为BN(Batch Normalization, 批量归一化)参数;输入参数经过三个分支对应的卷积参数和批量归一化参数进行依次处理后,将三个分支的处理结果进行相加,得到待激活的中间特征图,利用激活函数(例如为ReLu或sigmoid)对待激活的中间特征图进行激活以得到每个卷积单元的输出。
参见图6,针对下采样子模块中每个卷积单元,在推理阶段,经过重参数化技术,该卷积单元的目标卷积参数是对训练阶段的三个分支的卷积参数和批量归一化参数进行合并得到的。应理解,相同的输入参数在训练阶段和推理阶段,采用三个分支对应的卷积参数和批量归一化参数进行处理得到的待激活的中间特征图,与采用重参数化的目标卷积参数c3处理得到的待激活的中间特征图相同。也就是说,重参数化尽管改变了对输入参数的运算方式,但是不会改变输入参数的运算结果。
在本申请的一些实施例中,特征提取网络中的特征提取模块包括依次连接的跳跃连接层和下采样层,进一步,该下采样层包括依次连接的多个下采样子模块,每个下采样子模块可以是任意的具有下采样功能的模块,对此不做限制。参见图3,在下采样层中,每个下采样子模块对上一个下采样子模块输出的特征图进行下采样处理,得到该下采样子模块输出的特征图,第一个下采样子模块对跳跃连接层输出的第二中间特征图进行下采样处理,最后一个下采样子模块输出的特征图作为下采样层的输出结果。
进一步可选的,下采样层包括依次连接的M个下采样子模块,则将该帧图像的第二中间特征图输入特征提取模块中的下采样层进行M次下采样处理,得到该帧图像的图像特征图,包括:利用M个下采样子模块对第二中间特征图进行M次下采样处理,得到该帧图像的图像特征图;其中,在每个下采样子模块中,利用依次连接的K2个卷积单元各自对应的目标卷积参数对其输入进行卷积处理,得到待激活的中间特征图,利用激活函数对待激活的中间特征图进行激活以得到每个卷积单元的输出,K2是≥2的正整数。在本实施例中,对下采样层中每个下采样子模块包含的卷积单元的数量不做限定,例如可以是2、3、4或5个等。在一可选实施例中,下采样层中每个下采样子模块可以包含3个卷积单元,且可以采用图6所示的下采样子模块的结构,但并不限于此。
在一些可选的实施例中,在得到目标三维模型之后,针对多帧图像中的每帧图像,根据采集该帧图像时的相机姿态数据,将目标三维模型与该帧图像中的目标对象进行适配,并基于适配结果为目标对象选购与之适配的商品。
具体而言,针对每帧图像,根据采集该帧图像时的相机姿态数据得到相机外参,相机外参是指相机在世界坐标系中的参数,比如相机的位置、旋转方向等,主要包括分为旋转矩阵和平移矩阵。当然,可以预先利用海量的样本图像和拍摄样本图像时对应的相机外参训练相机参数估计网络。在推理阶段,将图像输入至相机参数估计网络中进行识别处理,得到拍摄该图像时对应的相机外参。在得到相机外参后,基于小孔 成像理论,按照相机外参将目标三维模型中的各个顶点投影至该帧图像中,得到目标三维模型中的各个顶点对应的投影点;利用特征点匹配技术,从该帧图像的真实图像特征点中确定与投影点匹配的真实图像特征点,针对每个投影点,根据图像中与投影点对应的真实图像特征点的图像位置和投影点的图像位置,确定真实世界中目标对象上的顶点与目标三维模型上顶点之间的适配结果。真实图像特征点是指真实世界中目标对象上的顶点对应的特征点。例如,基于真实图像特征点的图像位置和投影点的图像位置之差,对真实世界中目标对象上的顶点与目标三维模型上顶点之间的适配度进行量化。真实图像特征点的图像位置和投影点的图像位置之差越大,适配度越小;真实图像特征点的图像位置和投影点的图像位置之差越小,适配度越大。在得到真实世界中目标对象上的各顶点与目标三维模型上对应顶点之间的适配结果后,基于适配结果为目标对象选购与之适配的商品。
作为一种示例,根据目标三维模型为目标对象提供与之适配的目标商品信息时,可以根据目标三维模型以及多个候选商品信息对应的商品三维模型,从多个候选商品信息选择商品三维模型与目标三维模型适配度最高的商品信息作为目标商品信息,并将目标商品信息提供给目标对象;
作为另一种示例,根据目标三维模型为目标对象提供与之适配的目标商品信息时,可以根据目标三维模型对应的模型参数和选定的商品类型,为目标对象定制化与目标三维模型适配的商品三维模型,并将商品三维模型对应的商品信息作为目标商品信息提供给目标对象。
在一些可选的实施例中,将多帧图像中任一帧图像输入深度估计网络进行目标对象的尺寸信息的估计,并根据估计出的目标对象的尺寸信息对目标三维模型进行标注。
实际应用中,可以预先利用海量的样本图像和样本图像中目标对象的尺寸信息训练深度估计网络。在推理阶段,将图像输入至深度估计网络中估计目标对象的尺寸信息,尺寸信息例如包括但不限于:目标对象的长度和宽度。估计出的目标对象的尺寸信息可以标注在目标三维模型。例如,在虚拟试鞋场景中,可能有量脚长和脚宽的需求,在重建的脚部三维模型上标注脚长和脚宽。
在一些可选的实施例中,针对多帧图像中的每帧图像,根据采集该帧图像时的相机姿态数据,将目标三维模型与该帧图像中的目标对象进行适配,并基于适配结果测量目标对象的形状参数。
图7为本申请实施例提供的一种商品信息处理方法的流程图。参见图7,该方法可以包括以下步骤:
701、获取包含包括试穿对象的多帧图像,以及试穿对象对应的三维模型描述信息。
702、将多帧图像输入特征提取网络进行特征提取,以得到多帧图像的特征向量,对多帧图像的特征向量进行拼接,得到目标拼接特征向量。
703、将目标拼接特征向量输入参数回归网络,根据三维模型描述信息预测用于模 型控制的多个控制参数,多个控制参数包括姿态控制参数和形状控制参数。
704、按照姿态控制参数和形状控制参数对试穿对象的初始三维模型进行蒙层处理,得到试穿对象的目标三维模型,初始三维模型是根据三维模型描述信息得到的。
705、根据目标三维模型为试穿对象提供与之适配的目标商品信息。
进一步可选的,根据目标三维模型为试穿对象提供与之适配的目标商品信息,包括:根据目标三维模型以及多个候选商品信息对应的商品三维模型,从多个候选商品信息选择商品三维模型与目标三维模型适配度最高的商品信息作为目标商品信息,并将目标商品信息提供给试穿对象;或者根据目标三维模型对应的模型参数和选定的商品类型,为试穿对象定制化与目标三维模型适配的商品三维模型,并将商品三维模型对应的商品信息作为目标商品信息提供给试穿对象。
进一步可选地,从多个候选商品信息选择商品三维模型与目标三维模型适配度最高的商品信息作为目标商品信息,包括:针对每个候选商品信息对应的商品三维模型,将试穿对象的目标三维模型与该商品三维模型进行融合,得到融合三维模型,融合三维模型表征试穿状态下试穿对象的三维模型与该商品三维模型的第一相对位置关系;根据第一相对位置关系,获取试穿对象的三维模型上多个目标顶点与该商品三维模型上对应顶点或区域之间的多个距离信息,作为多个目标顶点的适配度信息;在根据多个目标顶点的适配度信息判断目标三维模型与该商品三维模型的适配度;在得到各个候选商品信息对应的商品三维模型与目标三维模型的适配度后,可以从中选择与目标三维模型的适配度最高的商品信息作为目标商品信息。
上述针对每个候选商品信息对应的商品三维模型,将试穿对象的目标三维模型与该商品三维模型进行融合,得到融合三维模型的一种可选实施方式为:获取试穿对象的目标三维模型、商品三维模型以及试穿对象针对该商品三维模型对应的商品对象的目标试穿参数;根据目标试穿参数,确定试穿对象的目标三维模型上至少三个基准顶点和商品三维模型上对应基准顶点之间的第二相对位置关系;根据第二相对位置关系,将试试穿对象的目标三维模型至少部分放置于商品三维模型内部,以得到融合三维模型。
实际应用中,可以根据经验设置目标试穿参数。进一步可选的,还可以根据试穿对象的属性信息、试穿对象所属用户的试穿偏好信息和/或商品对象对应的基准试穿参数,获取试穿对象针对商品对象的目标试穿参数。
在试穿对象为脚部,商品对象为鞋的情况下,根据目标试穿参数,确定试穿对象的目标三维模型上多个基准顶点和三维模型上对应基准顶点之间的第二相对位置关系,包括以下至少一种:
方式1:根据鞋与脚跟之间的试穿距离,确定脚部的三维模型上的第一脚跟顶点与鞋的三维模型上的第二脚跟顶点之间相距试穿距离,作为第二相对位置关系。
在三维重建时,针对脚部的三维模型包括的每个顶点,可以标记该顶点类型,顶 点类型例如包括:脚跟顶点、脚底顶点或脚趾头顶点。基于顶点类型从脚部的三维模型包括的多个顶点中选择脚跟上的一个顶点作为第一脚跟顶点,根据第一脚跟顶点在脚跟上的位置分布,从鞋的三维模型上的多个脚跟顶点中选择位置分布与第一脚跟顶点相同的一个脚跟顶点作为对应的第二脚跟顶点。在三维模型融合时,在同一坐标系下,控制第一脚跟顶点与第二脚跟顶点相距试穿距离。
方式2:根据脚底部与鞋底部之间的贴合关系,确定脚部的三维模型上的第一脚底顶点与鞋的三维模型上的第二脚底顶点重合,作为第二相对位置关系。
基于顶点类型从脚部的三维模型包括的多个顶点中选择脚底上的若干个第一脚底顶点。根据各脚底顶点在脚跟上的位置分布,从鞋的三维模型上的多个顶点中选择位置分布与第一脚底顶点相同的若干个第二脚底顶点。在三维模型融合时,在同一坐标系下,控制每组第一脚底顶点和第二脚底顶点的顶点位置相同或相近,以使脚底部与鞋底部贴合。
方式3:根据脚底中心与鞋底中心的对齐关系,确定脚部的三维模型上位于脚底中心线上的第一中心线顶点与鞋的三维模型上位于鞋底中心线上的第二中心线顶点在脚长方向上对齐,作为第二相对位置关系。
基于顶点类型和顶点位置从脚部的三维模型包括的多个顶点中脚底中心线上顶点作为第一中心线顶点,从鞋的三维模型上的多个顶点中选择位置分布与第一中心线顶点相同的一个顶点作为对应的第二中心线顶点。在三维模型融合时,在同一坐标系下,控制第一中心线顶点与第二中心线顶点在脚长方向上对齐。
在本实施例中,将试穿对象的目标三维模型包括的各个顶点的位置坐标和商品三维模型包括的各个顶点的位置坐标统一变换至同一坐标系下,控制试穿对象的目标三维模型和商品三维模型之间保持第二相对位置关系,至此完成将试穿对象的目标三维模型至少部分放置于商品三维模型内部的操作,得到融合三维模型。
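下面给出一段示意性的Python代码,按方式1的简化思路将脚部三维模型平移至与鞋的三维模型保持试穿距离,用以说明在同一坐标系下放置模型、得到融合三维模型的过程;其中顶点数据、基准顶点索引与脚长方向均为假设值,方式2、方式3的贴合与对齐约束未在该草图中体现:

```python
import numpy as np

def fuse_models(foot_verts, shoe_verts, foot_heel_idx, shoe_heel_idx,
                try_on_distance, foot_axis=np.array([1.0, 0.0, 0.0])):
    """融合示意:在同一坐标系下平移脚部三维模型,使第一脚跟顶点与
    第二脚跟顶点沿脚长方向相距试穿距离(方式1的简化实现)。"""
    target = shoe_verts[shoe_heel_idx] + foot_axis * try_on_distance
    offset = target - foot_verts[foot_heel_idx]
    fused_foot = foot_verts + offset          # 将脚部模型至少部分放入鞋模型内部
    return fused_foot, shoe_verts

foot = np.random.rand(778, 3)                 # 假设的脚部目标三维模型顶点
shoe = np.random.rand(1500, 3)                # 假设的商品(鞋)三维模型顶点
fused_foot, _ = fuse_models(foot, shoe, foot_heel_idx=0, shoe_heel_idx=0,
                            try_on_distance=0.01)
print(fused_foot.shape)
```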
具体而言,融合三维模型中的试穿对象的目标三维模型和商品三维模型保持第一相对位置关系,在这种融合状态下,执行适配度信息计算操作。适配度信息反映的是穿戴适配程度,首先,从试穿对象的目标三维模型包括的多个顶点中,选择参与适配度信息计算的多个目标顶点。例如,将试穿对象的目标三维模型上的每个顶点均作为目标顶点。进一步的,为了减少数据处理量,同时兼顾适配度信息计算的准确度,可以从试穿对象的目标三维模型上选择部分顶点作为目标顶点。例如,根据试穿对象的关键部位信息,从试穿对象的目标三维模型上选择与关键部位信息对应的顶点作为目标顶点。关键部位例如包括但不限于:脚趾头、脚后跟、脚弓、脚背、内脚背、外脚背、脚底等等。
在确定参与适配度信息计算的试穿对象的目标三维模型上多个目标顶点后,针对每个目标顶点,可以将目标顶点与商品三维模型上对应顶点之间的距离信息作为目标顶点的适配度信息。进一步可选的,为了更好地度量适配度信息,还可以将目标顶点到商品三维模型上对应顶点所在区域的距离信息作为目标顶点的适配度信息。于是,根据第一相对位置关系,计算试穿对象的目标三维模型上多个目标顶点与商品三维模型上对应区域之间的多个距离信息,作为多个目标顶点的适配度信息,包括:针对试穿对象的目标三维模型上的每个目标顶点,根据第一相对位置关系,获取商品三维模型上与目标顶点最近的第一顶点;将以第一顶点为连接点的多个三角面片作为目标顶点在商品三维模型上对应的区域;计算目标顶点到多个三角面片的多个距离,根据多个距离生成目标顶点的适配度信息。
其中,目标顶点到三角面片的距离例如包括但不限于:目标顶点到三角面片的中心点的距离、目标顶点到三角面片的垂直距离,或者对目标顶点到三角面片的三个顶点的距离求最大值、最小值或均值得到的距离。实际应用中,对目标顶点到多个三角面片的多个距离求最大值、最小值或者均值,得到目标顶点到多个三角面片的最终的距离信息,将最终的距离信息作为目标顶点的适配度信息。
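以下为适配度信息计算的一个示意性Python草图:取商品三维模型上与目标顶点最近的第一顶点,以该顶点为连接点的三角面片作为对应区域,并对目标顶点到各三角面片中心点的距离做最大值/最小值/均值聚合;其中顶点与面片数据均为随意构造的假设输入,距离的具体定义也可按前述其它方式替换:

```python
import numpy as np

def vertex_fit_info(target_vertex, shoe_verts, shoe_faces, reduce="min"):
    """适配度信息计算示意:取商品三维模型上距目标顶点最近的第一顶点,
    以该顶点为连接点的各三角面片作为对应区域,计算目标顶点到这些
    三角面片中心点的距离,并按最大值/最小值/均值聚合(均为示例性选择)。"""
    nearest = np.argmin(np.linalg.norm(shoe_verts - target_vertex, axis=1))
    faces = shoe_faces[np.any(shoe_faces == nearest, axis=1)]   # 连接该顶点的三角面片
    centers = shoe_verts[faces].mean(axis=1)                    # 各三角面片中心点
    dists = np.linalg.norm(centers - target_vertex, axis=1)
    return {"min": dists.min(), "max": dists.max(), "mean": dists.mean()}[reduce]

shoe_verts = np.random.rand(200, 3)                   # 假设的商品三维模型顶点
shoe_faces = (np.arange(600) % 200).reshape(200, 3)   # 假设的三角面片索引(覆盖全部顶点)
print(vertex_fit_info(np.random.rand(3), shoe_verts, shoe_faces))
```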
实际应用中,可以灵活设置各个目标顶点对应的满足适配度要求的适配度范围信息。每个目标顶点的适配度信息若是落在其对应的适配度范围信息内,该目标顶点满足适配度要求。每个目标顶点的适配度信息若未落在其对应的适配度范围信息内,该目标顶点不满足适配度要求。在确定各个目标顶点是否满足各自的适配度要求后,基于各个目标顶点满足各自的适配度要求的情况,确定目标三维模型与该商品三维模型的适配度。
进一步可选的,还可以引入人工干预方式确定目标三维模型与该商品三维模型的适配度。为了使得用户可以直观地获知目标三维模型与该商品三维模型的适配度,可以展示试穿对象的目标三维模型、商品三维模型以及融合三维模型中的任一三维模型,并在任一三维模型上对多个目标顶点的适配度信息进行可视化标记,其中,与基准适配度范围大小关系不同的适配度信息对应不同的可视化标记状态,以供用户确认目标三维模型与该商品三维模型的适配度。
具体而言,在上述任一三维模型上对多个目标顶点的适配度信息进行可视化标记,这样,不同的适配度信息采用不同的可视化标记状态进行标识。例如,满足适配度要求的顶点采用绿色标记,不满足适配度要求的顶点采用红色标记。
基准适配度范围是指限定是否满足适配度要求的适配度所在的数值范围。在基准适配度范围内的适配度信息满足适配度要求,不在基准适配度范围内的适配度信息不满足适配度要求。不满足适配度要求的适配度信息的数量越多,说明目标三维模型与该商品三维模型的适配度越低,反之,满足适配度要求的适配度信息的数量越多,说明目标三维模型与该商品三维模型的适配度越高。
进一步可选的,为了更加形象直观地反映试穿对象的目标三维模型上各个目标顶点的适配度信息的分布情况,在上述任一三维模型上对多个目标顶点的适配度信息进行可视化标记时,可以根据多个目标顶点的适配度信息对任一三维模型进行渲染,以得到适配度热力图,适配度热力图中的不同颜色表示与基准适配度范围大小关系不同的适配度信息。需要说明的是,基准适配度范围可以有多个,例如针对试穿对象的不同部位可以设置不同的基准适配度范围。以脚部为例,脚后跟部位对应第一基准适配度范围,例如1-2cm,脚掌部位对应第二基准适配度范围,例如0.5-1cm,脚踝部位对应第三基准适配度范围,例如0-1cm,等等。其中,对于位于基准适配度范围内的适配度信息采用第一颜色值进行标记,针对大于基准适配度范围上限值的适配度信息采用第二颜色值进行标记,针对小于基准适配度范围下限值的适配度信息采用第三颜色值进行标记。这样,用户可以通过第一颜色值了解哪些位置合适,根据第二颜色值了解哪些位置太宽松,根据第三颜色值了解哪些位置太紧凑。
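下述Python函数示意了按基准适配度范围对适配度信息进行颜色标记的一种可能做法,其中颜色取值与范围数值均为假设示例,可按实际渲染引擎的要求调整:

```python
def fit_color(fit_value, lower, upper,
              ok_color=(0, 255, 0), loose_color=(0, 0, 255), tight_color=(255, 0, 0)):
    """可视化标记示意:位于基准适配度范围内用第一颜色值,
    大于范围上限用第二颜色值(偏宽松),小于范围下限用第三颜色值(偏紧凑)。"""
    if fit_value > upper:
        return loose_color
    if fit_value < lower:
        return tight_color
    return ok_color

# 示例:脚后跟部位的基准适配度范围假设为 1~2cm
heel_colors = [fit_color(v, 1.0, 2.0) for v in (0.6, 1.5, 2.4)]
print(heel_colors)   # 依次为过紧、合适、过松对应的颜色值
```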
在向用户展示对适配度信息进行可视化标记的任一三维模型后,用户根据任一三维模型的可视化标记状态可主观确认目标三维模型与该商品三维模型的适配度。以适配度热力图为例,用户直观查看到适配度热力图上标记不满足适配度要求的颜色(例如为红色)的区域的数量比较多时,可以得出商品三维模型与试穿对象的目标三维模型的适配度低的结论。用户直观查看到适配度热力图上标记不满足适配度要求的颜色(例如为红色)的区域的数量比较少时,可以得出商品三维模型与试穿对象的目标三维模型的适配度高的结论。用户直观查看到适配度热力图上标记不满足适配度要求的颜色(例如为红色)的区域的数量不多不少时,可以得出商品三维模型与试穿对象的目标三维模型的适配度中的结论。
在得到各个候选商品信息对应的商品三维模型与目标三维模型的适配度后,可以从中选择与目标三维模型的适配度最高的商品信息作为目标商品信息。
进一步,在上述定制化场景中,根据目标三维模型对应的模型参数和选定的商品类型,为试穿对象定制化与目标三维模型适配的商品三维模型的实施方式包括:获取选定的商品类型对应的基准三维模型,将试穿对象的目标三维模型与基准三维模型进行融合,得到融合三维模型,融合三维模型表征试穿状态下试穿对象的目标三维模型与基准三维模型的第一相对位置关系;根据第一相对位置关系,获取试穿对象的目标三维模型上多个目标顶点与基准三维模型上对应顶点或区域之间的多个距离信息,作为多个目标顶点的适配度信息;在根据多个目标顶点的适配度信息确定基准三维模型不满足适配度要求的情况下,调整基准三维模型的尺寸参数和/或外形参数,并重新获取多个目标顶点的适配度信息,直至得到满足适配度要求的最终商品三维模型。其中,关于适配度信息的获取过程可参见前述实施例,在此不再赘述。
值得注意的是,在每次调整基准三维模型的尺寸参数和/或外形参数后,将调整后的基准三维模型作为新的基准三维模型,并重复执行获取多个目标顶点的适配度信息,直至根据多个目标顶点的适配度信息确定基准三维模型满足适配度要求,并将满足适配度要求的基准三维模型作为最终商品三维模型。
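上述迭代调整过程可以用如下示意性的Python草图表达,其中适配度计算、适配度判断与参数调整均以假设的回调函数占位,迭代上限仅用于防止流程不收敛,并非本申请实施例的必要特征:

```python
def customize_base_model(base_params, compute_fit_infos, meets_requirement,
                         adjust, max_iters=20):
    """定制化流程示意:反复获取多个目标顶点的适配度信息,若基准三维模型
    不满足适配度要求,则调整其尺寸参数和/或外形参数并重新评估,
    直至满足要求或达到迭代上限。"""
    params = dict(base_params)
    for _ in range(max_iters):
        fit_infos = compute_fit_infos(params)
        if meets_requirement(fit_infos):
            return params                      # 满足适配度要求的最终商品三维模型参数
        params = adjust(params, fit_infos)     # 自动调整或响应用户触发的调整操作
    return params

# 用法示意(各回调均为假设的占位实现,数值仅作说明)
final = customize_base_model(
    {"length_cm": 24.0, "instep_height_cm": 6.0},
    compute_fit_infos=lambda p: [25.8 - p["length_cm"]],
    meets_requirement=lambda infos: all(0.5 <= d <= 1.0 for d in infos),
    adjust=lambda p, infos: {**p, "length_cm": p["length_cm"] + 0.2},
)
print(final)
```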
基准三维模型的尺寸参数例如包括但不限于:整个基准三维模型的长度、宽度和高度,或者,基准三维模型中各个部位的长度、宽度和高度。以鞋为例,尺寸参数包括:鞋长、鞋宽,或者,脚趾头部位的长度或宽度,或者,脚背部位的高度等等。基准三维模型的外形参数定义基准三维模型的外形特点。以鞋为例,外形参数例如包括:鞋的跟部高度、头部宽度、头部长度或者脚背高度等等。
在本实施例中,可以自动调整基准三维模型的尺寸参数和/或外形参数,也可以响应用户触发的针对基准三维模型的调整操作,调整基准三维模型的尺寸参数和/或外形参数,对此不做限制。
进一步可选的,为了方便用户发起调整操作,可以向用户提供调整控件,用户通过调整控件可以发起针对基准三维模型的调整操作。具体而言,在上述任一三维模型的关联区域内可以展示调整控件,该调整控件可以是但不限于滑动条。基于此,可响应滑动条上的至少一次滑动操作,获取每次滑动操作的滑动距离和滑动方向,根据滑动距离和滑动方向分别确定调整幅度和调整方向;根据调整方向和调整幅度,调整基准三维模型的尺寸参数和/或外形参数。在本实施例中,滑动距离决定对尺寸参数和/或外形参数的调整幅度,滑动方向决定尺寸参数和/或外形参数的调整方向。调整方向可以是在当前参数基础上朝着增大方向调整,或者朝着递减方向调整,对此不做限制。值得注意的是,三维模型所在显示区域中的任一区域均可以作为关联区域,并在该关联区域内展示滑动条,以便于用户执行调整操作。
在一可选实施例中,滑动距离与调整幅度成正比,滑动距离越大,表示对尺寸参数和/或外形参数的调整幅度越大;滑动距离越小,表示对尺寸参数和/或外形参数的调整幅度越小。相应地,以从左到右的滑动条为例,向左滑动代表往回调整,意味着将尺寸参数和/或外形参数往小了调整,即调整方向是往小了调整的方向;向右滑动代表向前调整,意味着将尺寸参数和/或外形参数往大了调整,即调整方向是往大了调整的方向。
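以下给出滑动距离、滑动方向与调整幅度、调整方向之间换算关系的一个示意性Python草图,其中换算比例、参数下限等均为假设取值:

```python
def apply_slide(param_value, slide_distance, slide_direction,
                scale=0.05, min_value=0.0):
    """滑动条调整示意:滑动距离决定调整幅度(此处按 scale 比例换算,为假设值),
    滑动方向决定调整方向——向右增大、向左减小。"""
    delta = slide_distance * scale
    if slide_direction == "left":
        delta = -delta
    return max(min_value, param_value + delta)

# 例如:鞋长参数 25.0cm,向右滑动 20 个单位 -> 增大 1.0cm;向左滑动 10 个单位 -> 减小 0.5cm
print(apply_slide(25.0, 20, "right"))   # 26.0
print(apply_slide(25.0, 10, "left"))    # 24.5
```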
实际应用中,可以利用一个滑动条联动调整基准三维模型的尺寸参数和外形参数。考虑到实际应用中,可能仅仅对尺寸参数有调整需求,或者对外形参数有调整需求。为了便于独立调整尺寸参数或外形参数,滑动条可以包括第一滑动条和第二滑动条,第一滑动条用于对基准三维模型的尺寸参数进行调整,第二滑动条用于对基准三维模型的外形参数进行调整。用户可以分别通过第一滑动条和第二滑动条对基准三维模型的尺寸参数和外形参数进行调整。
在本实施例中,在得到最终商品三维模型后,可以将最终商品三维模型发送给服务器,服务器可以获取尺寸和外形均与试穿对象匹配的目标商品对象,还可以向终端设备返回目标商品对象的信息(即目标商品信息)。目标商品对象的信息例如包括但不限于:目标商品对象的材质、款式、风格、生产进度、物流配送进度、生产日期、生产厂商等等。进一步可选地,终端设备可以向用户输出目标商品对象的信息,用户可以根据该信息确定是否针对目标商品对象进行定制化;以及响应于用户确定定制化的操作,终端设备还可以向服务器发送定制化指令。基于此,服务器还可以将目标三维模型发送给定制化平台,定制化平台基于目标三维模型进行生产制造,以定制出尺寸和外形均与试穿对象匹配的目标商品对象,以及将生产出的目标商品对象经过物流配送送达给用户。
关于图7所示实施例中执行各步骤的详细实施过程可参见前述方法实施例中的相关描述,在此不再赘述。
本申请实施例提供的技术方案,利用试穿对象对应的三维模型描述信息创建试穿对象的初始三维模型,又利用包括试穿对象的多张图像进行三维重建,在三维重建过程中提取多张图像各自的特征向量,并对多张图像各自的特征向量进行拼接,以及基于拼接的特征向量预测用于模型控制的姿态控制参数和形状控制参数,并按照姿态控制参数和形状控制参数对试穿对象的初始三维模型进行蒙层处理,得到试穿对象的目标三维模型。由此,这种三维重建方式极大地提高了三维模型的精度,三维模型的精度越高,三维模型的真实感越强,也就越能够真实表达现实世界中的试穿对象,进而有效地拓展三维模型的应用范围和提高三维模型的应用效果。特别地,在商品选购场景中,能够基于三维重建的模型为试穿对象选购与之适配的商品,为解决现有退换货问题提供条件。
图8为本申请实施例提供的一种三维重建装置的结构示意图。参见图8,该装置可以包括以下单元:
图像获取单元81,用于获取目标对象的多帧图像,以及目标对象对应的三维模型描述信息;
特征提取单元82,用于将多帧图像输入特征提取网络进行特征提取,以得到多帧图像的特征向量;
向量拼接单元83,用于对多帧图像的特征向量进行拼接,得到目标拼接特征向量;
参数回归单元84,用于将目标拼接特征向量输入参数回归网络,根据三维模型描述信息预测用于模型控制的多个控制参数,多个控制参数包括姿态控制参数和形状控制参数;
蒙层处理单元85,用于按照姿态控制参数和形状控制参数对目标对象的初始三维模型进行蒙层处理,得到目标对象的目标三维模型,初始三维模型是根据三维模型描述信息得到的。
进一步可选的,特征提取单元82将多帧图像输入特征提取网络进行特征提取,以得到多帧图像的特征向量时,具体用于:针对多帧图像中的每帧图像,将该帧图像输入特征提取网络中的特征提取模块进行特征提取,得到该帧图像的图像特征图;将采集该帧图像时的相机姿态数据输入特征提取网络中的相机参数融合模块进行特征提取,得到该帧图像的相机位姿特征图;利用特征提取网络中的特征拼接模块对每帧图像的图像特征图和相机位姿特征图进行拼接,得到每帧图像的拼接特征图;以及利用特征提取网络中的特征降维模块对每帧图像的拼接特征图进行降维处理,得到每帧图像的特征向量。
进一步可选的,特征提取单元82针对多帧图像中的每帧图像,将该帧图像输入特征提取网络中的特征提取模块进行特征提取,得到该帧图像的图像特征图时,具体用于:针对多帧图像中的每帧图像,将该帧图像输入特征提取模块中的跳跃连接层,对该帧图像进行多分辨率的特征图提取并对相同分辨率的特征图进行跳跃连接,以得到该帧图像的第二中间特征图;将该帧图像的第二中间特征图输入特征提取模块中的下采样层进行M次下采样处理,得到该帧图像的图像特征图,其中,M是≥1的正整数。
进一步可选的,跳跃连接层采用编码器和解码器结构,则特征提取单元82将该帧图像输入特征提取模块中的跳跃连接层,对该帧图像进行多分辨率的特征图提取并对相同分辨率的特征图进行跳跃连接,以得到第二中间特征图时,具体用于:将该帧图像输入跳跃连接层中的编码器,对该帧图像进行编码以得到该帧图像的初始特征图,并依次对初始特征图进行N次下采样处理,得到第一中间特征图;将第一中间特征图输入跳跃连接层中的解码器,依次对第一中间特征图进行N次上采样处理,并在每次上采样处理中与编码器中下采样处理得到的相同分辨率的第一中间特征图进行跳跃连接,以得到该帧图像的第二中间特征图。
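为便于理解跳跃连接层的编码器-解码器结构,下面给出一段示意性的PyTorch代码草图,其中下采样次数N、各级通道数以及以通道拼接方式实现跳跃连接等均为假设的示例选择,并非对本申请实施例的限定:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipConnectionLayer(nn.Module):
    """跳跃连接层示意:编码器对初始特征图依次做 N 次下采样,解码器依次做
    N 次上采样,并在每次上采样时与编码器中相同分辨率的第一中间特征图
    进行跳跃连接(此处以通道拼接实现,N=2、通道数均为假设值)。"""
    def __init__(self, in_ch=3, base_ch=32, n=2):
        super().__init__()
        self.encode = nn.Conv2d(in_ch, base_ch, 3, padding=1)        # 编码子模块
        self.downs = nn.ModuleList(
            [nn.Conv2d(base_ch * 2**i, base_ch * 2**(i + 1), 3, stride=2, padding=1)
             for i in range(n)])
        self.ups = nn.ModuleList(
            [nn.Conv2d(base_ch * 2**(i + 1) + base_ch * 2**i, base_ch * 2**i, 3, padding=1)
             for i in reversed(range(n))])

    def forward(self, x):
        feats = [F.relu(self.encode(x))]               # 初始特征图
        for down in self.downs:
            feats.append(F.relu(down(feats[-1])))      # 各分辨率的第一中间特征图
        y = feats[-1]
        for up, skip in zip(self.ups, reversed(feats[:-1])):
            y = F.interpolate(y, size=skip.shape[-2:], mode="nearest")
            y = F.relu(up(torch.cat([y, skip], dim=1)))  # 相同分辨率特征图跳跃连接
        return y                                       # 该帧图像的第二中间特征图

out = SkipConnectionLayer()(torch.randn(1, 3, 64, 64))
print(out.shape)   # torch.Size([1, 32, 64, 64])
```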
进一步可选的,编码器包括依次连接的编码子模块和N个下采样子模块,则特征提取单元82将该帧图像输入跳跃连接层中的编码器,对该帧图像进行编码以得到该帧图像的初始特征图,并依次对初始特征图进行N次下采样处理,得到第一中间特征图时,具体用于:将该帧图像输入编码子模块进行编码,以得到该帧图像的初始特征图;利用N个下采样子模块对初始特征图进行N次下采样处理,得到第一中间特征图;其中,在每个下采样子模块中,利用依次连接的K1个卷积单元各自对应的目标卷积参数对其输入进行卷积处理,得到待激活的中间特征图,利用激活函数对待激活的中间特征图进行激活以得到每个卷积单元的输出,K1是≥2的正整数。
进一步可选的,下采样层包括依次连接的M个下采样子模块,则特征提取单元82将该帧图像的第二中间特征图输入特征提取模块中的下采样层进行M次下采样处理,得到该帧图像的图像特征图时,具体用于:利用M个下采样子模块对第二中间特征图进行M次下采样处理,得到该帧图像的图像特征图;其中,在每个下采样子模块中,利用依次连接的K2个卷积单元各自对应的目标卷积参数对其输入进行卷积处理,得到待激活的中间特征图,利用激活函数对待激活的中间特征图进行激活以得到每个卷积单元的输出,K2是≥2的正整数。
进一步可选的,特征提取单元82将采集该帧图像时的相机姿态数据输入特征提取网络中的相机参数融合模块进行特征提取,得到该帧图像的相机位姿特征图时,具体用于:将采集该帧图像时的相机姿态数据输入特征提取网络中的相机参数融合模块,相机姿态数据包括至少两种姿态角;
根据至少两种姿态角以及至少两种姿态角之间的相互关系进行三角函数处理,得到多种姿态表征参数;利用相机参数融合模块中的多层感知机MLP网络处理多种姿态表征参数,得到该帧图像的相机姿态特征图。
进一步可选的,特征提取单元82根据至少两种姿态角以及至少两种姿态角之间的相互关系进行三角函数处理,得到多种姿态表征参数时,具体用于:对至少两种姿态角中的两两姿态角进行数值计算,以得到多种融合姿态角,每种融合姿态角表示对应两个姿态角之间的相互关系;对至少两种姿态角中的每种姿态角以及多种融合姿态角中的每种融合姿态角分别进行三角函数处理,得到多种姿态表征参数。
进一步可选的,特征提取单元82利用多层感知机MLP网络处理多种姿态表征参数,得到该帧图像的相机姿态特征图时,具体用于:对多种姿态表征参数进行向量化处理,得到相机姿态特征向量;利用多层感知机MLP网络处理相机姿态特征向量,得到相机姿态特征图。
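下述Python(PyTorch)代码示意了对姿态角及其相互关系做三角函数处理得到姿态表征参数,再经多层感知机MLP得到相机姿态特征的过程;其中姿态角的种类与取值、融合方式(此处以求和表示相互关系)以及MLP的层数与维度均为假设值:

```python
import math
import torch
import torch.nn as nn

def pose_features(pitch, yaw):
    """姿态表征参数示意:对至少两种姿态角(此处以俯仰角、偏航角为例)及其
    两两数值组合(此处以求和表示相互关系)分别做三角函数处理。"""
    angles = [pitch, yaw, pitch + yaw]               # 第三项表示两姿态角的相互关系
    feats = []
    for a in angles:
        feats += [math.sin(a), math.cos(a)]
    return torch.tensor(feats, dtype=torch.float32)  # 向量化后的姿态表征参数

# 利用多层感知机 MLP 网络处理姿态表征参数向量,得到相机姿态特征(维度为假设值)
mlp = nn.Sequential(nn.Linear(6, 32), nn.ReLU(), nn.Linear(32, 64))
camera_pose_feature = mlp(pose_features(0.1, 0.5))
print(camera_pose_feature.shape)   # torch.Size([64])
```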
进一步可选的,参数回归单元84将目标拼接特征向量输入参数回归网络,根据三维模型描述信息预测用于模型控制的多个控制参数时,具体用于:
将目标拼接特征向量输入参数回归网络中,根据三维模型描述信息对目标拼接特征向量进行至少一次多层感知机MLP运算,以得到用于模型控制的多个控制参数。
进一步可选的,多帧图像包括当前帧图像和至少一帧历史图像;
特征提取单元82将多帧图像输入特征提取网络进行特征提取,以得到多帧图像的特征向量时,具体用于:每次将当前帧图像输入特征提取网络进行特征提取,以得到当前帧图像的特征向量;
向量拼接单元83对多帧图像的特征向量进行拼接,得到目标拼接特征向量时,具体用于:采用设定的滑动窗口从指定存储空间中,获取至少一帧历史图像的特征向量;将当前帧图像的特征向量和至少一帧历史图像的特征向量进行拼接,得到目标拼接特征向量。
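以下为滑动窗口拼接特征向量的一个示意性Python草图,其中窗口长度与特征维度均为假设取值:

```python
from collections import deque
import numpy as np

class FeatureWindow:
    """滑动窗口示意:在指定存储空间中缓存最近若干帧历史图像的特征向量,
    每次将历史特征向量与当前帧特征向量拼接,得到目标拼接特征向量
    (窗口长度 3 为假设值)。"""
    def __init__(self, window_size=3):
        self.history = deque(maxlen=window_size)

    def concat(self, current_feature):
        vectors = list(self.history) + [current_feature]
        target = np.concatenate(vectors)              # 目标拼接特征向量
        self.history.append(current_feature)          # 当前帧特征存入指定存储空间
        return target

window = FeatureWindow()
for _ in range(4):
    spliced = window.concat(np.random.rand(128))      # 假设每帧特征向量为128维
print(spliced.shape)                                  # 第4帧时拼接了3帧历史 + 当前帧
```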
进一步可选的,特征提取单元82每次将当前帧图像输入特征提取网络进行特征提取,以得到当前帧图像的特征向量时,具体用于:检测目标对象在当前帧图像中的图像位置,根据图像位置从当前帧图像中裁剪出目标对象所在的局部图像;将局部图像输入特征提取网络进行特征提取,以得到当前帧图像的特征向量。
进一步可选的,特征提取单元82检测目标对象在当前帧图像中的图像位置时,具体用于:对当前帧图像依次进行预处理,预处理包括图像缩放处理和归一化处理中的至少一种;将预处理后的图像输入目标检测网络进行目标检测,以得到目标对象在预处理后的图像中的图像位置。
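下面的Python代码示意了图像预处理(缩放与归一化)以及根据检测到的图像位置裁剪局部图像的过程,其中缩放方式(以简单的下标采样代替插值)、检测框坐标等均为假设的简化示例:

```python
import numpy as np

def preprocess(image, target_size=(256, 256)):
    """预处理示意:图像缩放(此处用最近邻下标采样代替插值)与归一化。"""
    h, w = image.shape[:2]
    ys = np.linspace(0, h - 1, target_size[0]).astype(int)
    xs = np.linspace(0, w - 1, target_size[1]).astype(int)
    resized = image[ys][:, xs]
    return resized.astype(np.float32) / 255.0          # 归一化到 [0, 1]

def crop_target(image, bbox):
    """根据目标检测网络输出的图像位置(边界框)裁剪出目标对象所在的局部图像。"""
    x1, y1, x2, y2 = bbox
    return image[y1:y2, x1:x2]

frame = np.random.randint(0, 256, size=(480, 640, 3), dtype=np.uint8)
pre = preprocess(frame)
local = crop_target(pre, (60, 40, 200, 220))            # 假设的检测框坐标
print(pre.shape, local.shape)
```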
进一步可选的,三维重建装置还包括:适配单元和/或标注单元。
适配单元,用于针对多帧图像中的每帧图像,根据采集该帧图像时的相机姿态数据,将目标三维模型与该帧图像中的目标对象进行适配,并基于适配结果为目标对象选购与之适配的商品;和/或
标注单元,用于将多帧图像中任一帧图像输入深度估计网络进行目标对象的尺寸信息的估计,并根据估计出的目标对象的尺寸信息对目标三维模型进行标注;和/或
适配单元,用于针对多帧图像中的每帧图像,根据采集该帧图像时的相机姿态数据,将目标三维模型与该帧图像中的目标对象进行适配,并基于适配结果测量目标对象的形状参数。
进一步可选的,目标对象为人体上的脚部对象、手部对象、头部对象、肘部对象或腿部对象,目标对象对应的三维模型描述信息是基于SMPL模型确定的。
图8所示的装置可以执行图2所示实施例的方法,其实现原理和技术效果不再赘述。对于上述实施例中图8所示装置的各个单元执行操作的具体方式,已经在有关该方法的实施例中进行了详细描述,此处不再详细阐述。
需要说明的是,上述实施例所提供方法的各步骤的执行主体均可以是同一设备,或者,该方法也由不同设备作为执行主体。比如,步骤201至步骤204的执行主体可以为设备A;又比如,步骤201和202的执行主体可以为设备A,步骤203和204的执行主体可以为设备B;等等。
另外,在上述实施例及附图中的描述的一些流程中,包含了按照特定顺序出现的多个操作,但是应该清楚了解,这些操作可以不按照其在本文中出现的顺序来执行或并行执行,操作的序号如201、202等,仅仅是用于区分开各个不同的操作,序号本身不代表任何的执行顺序。另外,这些流程可以包括更多或更少的操作,并且这些操作可以按顺序执行或并行执行。需要说明的是,本文中的“第一”、“第二”等描述,是用于区分不同的消息、设备、模块等,不代表先后顺序,也不限定“第一”和“第二”是不同的类型。
图9为本申请实施例提供的一种计算机设备的结构示意图。如图9所示,该计算机设备包括:存储器91和处理器92;
存储器91,用于存储计算机程序,并可被配置为存储其它各种数据以支持在计算平台上的操作。这些数据的示例包括用于在计算平台上操作的任何应用程序或方法的指令,联系人数据,电话簿数据,消息,图片,视频等。
存储器91可以由任何类型的易失性或非易失性存储设备或者它们的组合实现,如静态随机存取存储器(SRAM),电可擦除可编程只读存储器(EEPROM),可擦除可编程只读存储器(EPROM),可编程只读存储器(PROM),只读存储器(ROM),磁存储器,快闪存储器,磁盘或光盘。
处理器92,与存储器91耦合,用于执行存储器91中的计算机程序,以用于:获取包括目标对象的多帧图像,以及目标对象对应的三维模型描述信息;将多帧图像输入特征提取网络进行特征提取,以得到多帧图像的特征向量,对多帧图像的特征向量进行拼接,得到目标拼接特征向量;将目标拼接特征向量输入参数回归网络,根据三维模型描述信息预测用于模型控制的多个控制参数,多个控制参数包括姿态控制参数和形状控制参数;按照姿态控制参数和形状控制参数对目标对象的初始三维模型进行蒙层处理,得到目标对象的目标三维模型,初始三维模型是根据三维模型描述信息得到的。
进一步可选地,处理器92执行存储器91中的计算机程序,还可以用于:获取包括试穿对象的多帧图像,以及试穿对象对应的三维模型描述信息;将多帧图像输入特征提取网络进行特征提取,以得到多帧图像的特征向量,对多帧图像的特征向量进行拼接,得到目标拼接特征向量;将目标拼接特征向量输入参数回归网络,根据三维模型描述信息预测用于模型控制的多个控制参数,多个控制参数包括姿态控制参数和形状控制参数;按照姿态控制参数和形状控制参数对试穿对象的初始三维模型进行蒙层处理,得到试穿对象的目标三维模型,初始三维模型是根据三维模型描述信息得到的;根据目标三维模型为试穿对象提供与之适配的目标商品信息。
进一步,如图9所示,该计算机设备还包括:通信组件93、显示器94、电源组件95、音频组件96等其它组件。图9中仅示意性给出部分组件,并不意味着计算机设备只包括图9所示组件。另外,图9中虚线框内的组件为可选组件,而非必选组件,具体可视计算机设备的产品形态而定。本实施例的计算机设备可以实现为台式电脑、笔记本电脑、智能手机或IOT设备等终端设备,也可以是常规服务器、云服务器或服务器阵列等服务端设备。若本实施例的计算机设备实现为台式电脑、笔记本电脑、智能手机等终端设备,可以包含图9中虚线框内的组件;若本实施例的计算机设备实现为常规服务器、云服务器或服务器阵列等服务端设备,则可以不包含图9中虚线框内的组件。
关于处理器执行各动作的详细实施过程可参见前述方法实施例或设备实施例中的相关描述,在此不再赘述。
相应地,本申请实施例还提供一种存储有计算机程序的计算机可读存储介质,计算机程序被执行时能够实现上述方法实施例中可由计算机设备执行的各步骤。
相应地,本申请实施例还提供一种计算机程序产品,包括计算机程序/指令,当计算机程序/指令被处理器执行时,致使处理器能够实现上述方法实施例中可由计算机设备执行的各步骤。
上述通信组件被配置为便于通信组件所在设备和其他设备之间有线或无线方式的通信。通信组件所在设备可以接入基于通信标准的无线网络,如WiFi,2G、3G、4G/LTE、5G等移动通信网络,或它们的组合。在一个示例性实施例中,通信组件经由广播信道接收来自外部广播管理系统的广播信号或广播相关信息。在一个示例性实施例中,通信组件还包括近场通信(NFC)模块,以促进短程通信。例如,NFC模块可基于射频识别(RFID)技术,红外数据协会(IrDA)技术,超宽带(UWB)技术,蓝牙(BT)技术和其他技术来实现。
上述显示器包括屏幕,其屏幕可以包括液晶显示器(LCD)和触摸面板(TP)。如果屏幕包括触摸面板,屏幕可以被实现为触摸屏,以接收来自用户的输入信号。触摸面板包括一个或多个触摸传感器以感测触摸、滑动和触摸面板上的手势。触摸传感器可以不仅感测触摸或滑动动作的边界,而且还检测与触摸或滑动操作相关的持续时间和压力。
上述电源组件,为电源组件所在设备的各种组件提供电力。电源组件可以包括电源管理系统,一个或多个电源,及其他与为电源组件所在设备生成、管理和分配电力相关联的组件。
上述音频组件,可被配置为输出和/或输入音频信号。例如,音频组件包括一个麦克风(MIC),当音频组件所在设备处于操作模式,如呼叫模式、记录模式和语音识别模式时,麦克风被配置为接收外部音频信号。所接收的音频信号可以被进一步存储在存储器或经由通信组件发送。在一些实施例中,音频组件还包括一个扬声器,用于输出音频信号。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可读存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请实施例的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
在一个典型的配置中,计算设备包括一个或多个处理器(CPU)、输入/输出接口、网络接口和内存。
内存可能包括计算机可读介质中的非永久性存储器,随机存取存储器(RAM)和/或非易失性内存等形式,如只读存储器(ROM)或闪存(flash RAM)。内存是计算机可读介质的示例。
计算机可读介质包括永久性和非永久性、可移动和非可移动媒体,可以由任何方法或技术来实现信息存储。信息可以是计算机可读指令、数据结构、程序的模块或其他数据。计算机的存储介质的例子包括,但不限于相变内存(PRAM)、静态随机存取存储器(SRAM)、动态随机存取存储器(DRAM)、其他类型的随机存取存储器(RAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、快闪记忆体或其他内存技术、只读光盘只读存储器(CD-ROM)、数字多功能光盘(DVD)或其他光学存储、磁盒式磁带、磁带磁盘存储或其他磁性存储设备或任何其他非传输介质,可用于存储可以被计算设备访问的信息。按照本文中的界定,计算机可读介质不包括暂存电脑可读媒体(transitory media),如调制的数据信号和载波。
还需要说明的是,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的过程、方法、商品或者设备不仅包括那些要素,而且还包括没有明确列出的其他要素,或者是还包括为这种过程、方法、商品或者设备所固有的要素。在没有更多限制的情况下,由语句“包括一个……”限定的要素,并不排除在包括要素的过程、方法、商品或者设备中还存在另外的相同要素。
以上仅为本申请的实施例而已,并不用于限制本申请。对于本领域技术人员来说,本申请可以有各种更改和变化。凡在本申请的精神和原理之内所作的任何修改、等同替换、改进等,均应包含在本申请的权利要求范围之内。

Claims (18)

  1. 一种三维重建方法,其特征在于,包括:
    获取包括目标对象的多帧图像,以及所述目标对象对应的三维模型描述信息;
    将所述多帧图像输入特征提取网络进行特征提取,以得到所述多帧图像的特征向量,对所述多帧图像的特征向量进行拼接,得到目标拼接特征向量;
    将所述目标拼接特征向量输入参数回归网络,根据所述三维模型描述信息预测用于模型控制的多个控制参数,所述多个控制参数包括姿态控制参数和形状控制参数;
    按照所述姿态控制参数和形状控制参数对所述目标对象的初始三维模型进行蒙层处理,得到所述目标对象的目标三维模型,所述初始三维模型是根据所述三维模型描述信息得到的。
  2. 根据权利要求1所述的方法,其特征在于,将所述多帧图像输入特征提取网络进行特征提取,以得到所述多帧图像的特征向量,包括:
    针对所述多帧图像中的每帧图像,将该帧图像输入所述特征提取网络中的特征提取模块进行特征提取,得到该帧图像的图像特征图;
    将采集该帧图像时的相机姿态数据输入所述特征提取网络中的相机参数融合模块进行特征提取,得到该帧图像的相机位姿特征图;
    利用所述特征提取网络中的特征拼接模块对每帧图像的图像特征图和相机位姿特征图进行拼接,得到每帧图像的拼接特征图;以及
    利用所述特征提取网络中的特征降维模块对每帧图像的拼接特征图进行降维处理,得到每帧图像的特征向量。
  3. 根据权利要求2所述的方法,其特征在于,针对所述多帧图像中的每帧图像,将该帧图像输入所述特征提取网络中的特征提取模块进行特征提取,得到该帧图像的图像特征图,包括:
    针对所述多帧图像中的每帧图像,将该帧图像输入所述特征提取模块中的跳跃连接层,对该帧图像进行多分辨率的特征图提取并对相同分辨率的特征图进行跳跃连接,以得到该帧图像的第二中间特征图;
    将该帧图像的第二中间特征图输入所述特征提取模块中的下采样层进行M次下采样处理,得到该帧图像的图像特征图,其中,M是≥1的正整数。
  4. 根据权利要求3所述的方法,其特征在于,所述跳跃连接层采用编码器和解码器结构,则将该帧图像输入所述特征提取模块中的跳跃连接层,对该帧图像进行多分辨率的特征图提取并对相同分辨率的特征图进行跳跃连接,以得到该帧图像的第二中间特征图,包括:
    将该帧图像输入所述跳跃连接层中的编码器,对该帧图像进行编码以得到该帧图像的初始特征图,并依次对所述初始特征图进行N次下采样处理,得到第一中间特征图;
    将所述第一中间特征图输入所述跳跃连接层中的解码器,依次对所述第一中间特征图进行N次上采样处理,并在每次上采样处理中与所述编码器中下采样处理得到的相同分辨率的第一中间特征图进行跳跃连接,以得到该帧图像的第二中间特征图。
  5. 根据权利要求4所述的方法,其特征在于,所述编码器包括依次连接的编码子模块和N个下采样子模块,则将该帧图像输入所述跳跃连接层中的编码器,对该帧图像进行编码以得到该帧图像的初始特征图,并依次对所述初始特征图进行N次下采样处理,得到第一中间特征图,包括:
    将该帧图像输入所述编码子模块进行编码,以得到该帧图像的初始特征图;
    利用所述N个下采样子模块对所述初始特征图进行N次下采样处理,得到第一中间特征图;
    其中,在每个下采样子模块中,利用依次连接的K1个卷积单元各自对应的目标卷积参数对其输入进行卷积处理,得到待激活的中间特征图,利用激活函数对待激活的中间特征图进行激活以得到每个卷积单元的输出,K1是≥2的正整数。
  6. 根据权利要求3所述的方法,其特征在于,所述下采样层包括依次连接的M个下采样子模块,则将该帧图像的第二中间特征图输入所述特征提取模块中的下采样层进行M次下采样处理,得到该帧图像的图像特征图,包括:
    利用所述M个下采样子模块对所述第二中间特征图进行M次下采样处理,得到该帧图像的图像特征图;
    其中,在每个下采样子模块中,利用依次连接的K2个卷积单元各自对应的目标卷积参数对其输入进行卷积处理,得到待激活的中间特征图,利用激活函数对待激活的中间特征图进行激活以得到每个卷积单元的输出,K2是≥2的正整数。
  7. 根据权利要求2所述的方法,其特征在于,将采集该帧图像时的相机姿态数据输入所述特征提取网络中的相机参数融合模块进行特征提取,得到该帧图像的相机位姿特征图,包括:
    将采集该帧图像时的相机姿态数据输入所述特征提取网络中的相机参数融合模块,所述相机姿态数据包括至少两种姿态角;
    根据所述至少两种姿态角以及所述至少两种姿态角之间的相互关系进行三角函数处理,得到多种姿态表征参数;
    利用所述相机参数融合模块中的多层感知机MLP网络处理所述多种姿态表征参数,得到该帧图像的相机姿态特征图。
  8. 根据权利要求7所述的方法,其特征在于,根据所述至少两种姿态角以及所述至少两种姿态角之间的相互关系进行三角函数处理,得到多种姿态表征参数,包括:
    对所述至少两种姿态角中的两两姿态角进行数值计算,以得到多种融合姿态角,每种融合姿态角表示对应两个姿态角之间的相互关系;
    对所述至少两种姿态角中的每种姿态角以及所述多种融合姿态角分别进行三角函数处理,得到多种姿态表征参数。
  9. 根据权利要求7所述的方法,其特征在于,利用所述相机参数融合模块中的多层感知机MLP网络处理所述多种姿态表征参数,得到该帧图像的相机姿态特征图,包括:
    对所述多种姿态表征参数进行向量化处理,得到相机姿态特征向量;
    利用多层感知机MLP网络处理所述相机姿态特征向量,得到相机姿态特征图。
  10. 根据权利要求1所述的方法,其特征在于,将所述目标拼接特征向量输入参数回归网络,根据所述三维模型描述信息预测用于模型控制的多个控制参数,包括:
    将所述目标拼接特征向量输入参数回归网络中,根据所述三维模型描述信息对所述目标拼接特征向量进行至少一次多层感知机MLP运算,以得到用于模型控制的多个控制参数。
  11. 根据权利要求1-10任一项所述的方法,其特征在于,所述多帧图像包括当前帧图像和至少一帧历史图像;
    将所述多帧图像输入特征提取网络进行特征提取,以得到所述多帧图像的特征向量,包括:每次将当前帧图像输入特征提取网络进行特征提取,以得到所述当前帧图像的特征向量;
    对所述多帧图像的特征向量进行拼接,得到目标拼接特征向量,包括:采用设定的滑动窗口从指定存储空间中,获取至少一帧历史图像的特征向量;将所述当前帧图像的特征向量和至少一帧历史图像的特征向量进行拼接,得到目标拼接特征向量。
  12. 根据权利要求1-10任一项所述的方法,其特征在于,在得到所述目标三维模型之后,所述方法还包括:
    针对所述多帧图像中的每帧图像,根据采集该帧图像时的相机姿态数据,将所述目标三维模型与该帧图像中的所述目标对象进行适配,并基于适配结果为所述目标对象选购与之适配的商品;和/或
    将所述多帧图像中任一帧图像输入深度估计网络进行所述目标对象的尺寸信息的估计,并根据估计出的所述目标对象的尺寸信息对所述目标三维模型进行标注;和/或
    针对所述多帧图像中的每帧图像,根据采集该帧图像时的相机姿态数据,将所述目标三维模型与该帧图像中的所述目标对象进行适配,并基于适配结果测量所述目标对象的形状参数。
  13. 根据权利要求1-10任一项所述的方法,其特征在于,所述目标对象为人体上的脚部对象、手部对象、头部对象、肘部对象或腿部对象,所述目标对象对应的三维模型描述信息是基于SMPL模型确定的。
  14. 一种商品信息处理方法,其特征在于,包括:
    获取包括试穿对象的多帧图像,以及所述试穿对象对应的三维模型描述信息;
    将所述多帧图像输入特征提取网络进行特征提取,以得到所述多帧图像的特征向量,对所述多帧图像的特征向量进行拼接,得到目标拼接特征向量;
    将所述目标拼接特征向量输入参数回归网络,根据所述三维模型描述信息预测用于模型控制的多个控制参数,所述多个控制参数包括姿态控制参数和形状控制参数;
    按照所述姿态控制参数和形状控制参数对所述试穿对象的初始三维模型进行蒙层处理,得到所述试穿对象的目标三维模型,所述初始三维模型是根据所述三维模型描述信息生成的;
    根据所述目标三维模型为所述试穿对象提供与之适配的目标商品信息。
  15. 根据权利要求14所述的方法,其特征在于,根据所述目标三维模型为所述试穿对象提供与之适配的目标商品信息,包括:
    根据所述目标三维模型以及多个候选商品信息对应的商品三维模型,从多个候选商品信息选择商品三维模型与所述目标三维模型适配度最高的商品信息作为所述目标商品信息,并将所述目标商品信息提供给所述试穿对象;
    或者
    根据所述目标三维模型对应的模型参数和选定的商品类型,为试穿对象定制化与所述目标三维模型适配的商品三维模型,并将所述商品三维模型对应的商品信息作为目标商品信息提供给所述试穿对象。
  16. 一种三维重建装置,其特征在于,包括:
    图像获取单元,用于获取目标对象的多帧图像,以及所述目标对象对应的三维模型描述信息;
    特征提取单元,用于将所述多帧图像输入特征提取网络进行特征提取,以得到所述多帧图像的特征向量;
    向量拼接单元,用于对所述多帧图像的特征向量进行拼接,得到目标拼接特征向量;
    参数回归单元,用于将所述目标拼接特征向量输入参数回归网络,根据所述三维模型描述信息预测用于模型控制的多个控制参数,所述多个控制参数包括姿态控制参数和形状控制参数;
    蒙层处理单元,用于按照所述姿态控制参数和形状控制参数对所述目标对象的初始三维模型进行蒙层处理,得到所述目标对象的目标三维模型,所述初始三维模型是根据所述三维模型描述信息得到的。
  17. 一种计算机设备,其特征在于,包括:存储器和处理器;所述存储器,用于存储计算机程序,所述处理器与所述存储器耦合,用于执行所述计算机程序,以用于实现权利要求1-13以及权利要求14-15中任一项所述方法中的步骤。
  18. 一种存储有计算机程序的计算机可读存储介质,其特征在于,当所述计算机程序被处理器执行时,致使所述处理器执行权利要求1-13以及权利要求14-15中任一项所述方法中的步骤。
PCT/CN2023/071989 2022-10-14 2023-01-13 三维重建与商品信息处理方法、装置、设备及存储介质 WO2024077809A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202211257959.4 2022-10-14
CN202211257959.4A CN115359192B (zh) 2022-10-14 2022-10-14 三维重建与商品信息处理方法、装置、设备及存储介质

Publications (1)

Publication Number Publication Date
WO2024077809A1 true WO2024077809A1 (zh) 2024-04-18

Family

ID=84008726

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/071989 WO2024077809A1 (zh) 2022-10-14 2023-01-13 三维重建与商品信息处理方法、装置、设备及存储介质

Country Status (2)

Country Link
CN (1) CN115359192B (zh)
WO (1) WO2024077809A1 (zh)

Families Citing this family (1)

Publication number Priority date Publication date Assignee Title
CN115359192B (zh) * 2022-10-14 2023-03-28 阿里巴巴(中国)有限公司 三维重建与商品信息处理方法、装置、设备及存储介质

Citations (6)

Publication number Priority date Publication date Assignee Title
CN103514450A (zh) * 2012-06-29 2014-01-15 华为技术有限公司 一种图像特征提取方法和图像校正方法以及设备
CN114066987A (zh) * 2022-01-12 2022-02-18 深圳佑驾创新科技有限公司 一种相机位姿估计方法、装置、设备及存储介质
CN114549765A (zh) * 2022-02-28 2022-05-27 北京京东尚科信息技术有限公司 三维重建方法及装置、计算机可存储介质
WO2022151661A1 (zh) * 2021-01-15 2022-07-21 浙江商汤科技开发有限公司 一种三维重建方法、装置、设备及存储介质
CN114841783A (zh) * 2022-05-27 2022-08-02 阿里巴巴(中国)有限公司 商品信息处理方法、装置、终端设备及存储介质
CN115359192A (zh) * 2022-10-14 2022-11-18 阿里巴巴(中国)有限公司 三维重建与商品信息处理方法、装置、设备及存储介质

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
JP2016170050A (ja) * 2015-03-12 2016-09-23 キヤノン株式会社 位置姿勢計測装置、位置姿勢計測方法及びコンピュータプログラム
CN109584295B (zh) * 2017-09-29 2022-08-26 阿里巴巴集团控股有限公司 对图像内目标物体进行自动标注的方法、装置及系统
CN111009007B (zh) * 2019-11-20 2023-07-14 广州光达创新科技有限公司 一种指部多特征全面三维重建方法
CN111783611B (zh) * 2020-06-28 2023-12-29 阿波罗智能技术(北京)有限公司 无人车的定位方法、装置、无人车及存储介质
CN113959444A (zh) * 2021-09-30 2022-01-21 达闼机器人有限公司 用于无人设备的导航方法、装置、介质及无人设备
CN114219890A (zh) * 2021-11-10 2022-03-22 中国科学院深圳先进技术研究院 一种三维重建方法、装置、设备及计算机存储介质

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
CN103514450A (zh) * 2012-06-29 2014-01-15 华为技术有限公司 一种图像特征提取方法和图像校正方法以及设备
WO2022151661A1 (zh) * 2021-01-15 2022-07-21 浙江商汤科技开发有限公司 一种三维重建方法、装置、设备及存储介质
CN114066987A (zh) * 2022-01-12 2022-02-18 深圳佑驾创新科技有限公司 一种相机位姿估计方法、装置、设备及存储介质
CN114549765A (zh) * 2022-02-28 2022-05-27 北京京东尚科信息技术有限公司 三维重建方法及装置、计算机可存储介质
CN114841783A (zh) * 2022-05-27 2022-08-02 阿里巴巴(中国)有限公司 商品信息处理方法、装置、终端设备及存储介质
CN115359192A (zh) * 2022-10-14 2022-11-18 阿里巴巴(中国)有限公司 三维重建与商品信息处理方法、装置、设备及存储介质

Also Published As

Publication number Publication date
CN115359192A (zh) 2022-11-18
CN115359192B (zh) 2023-03-28


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23876024

Country of ref document: EP

Kind code of ref document: A1