WO2023020327A1 - Image processing - Google Patents

Image processing

Info

Publication number
WO2023020327A1
Authority
WO
WIPO (PCT)
Prior art keywords
foot
image
key point
dimensional
model
Prior art date
Application number
PCT/CN2022/111023
Other languages
French (fr)
Chinese (zh)
Inventor
何野
四建楼
王玉峰
杜天元
王明峰
钱晨
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司 filed Critical 上海商汤智能科技有限公司
Publication of WO2023020327A1 publication Critical patent/WO2023020327A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/213Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the present application relates to the technical field of computer vision, in particular to image processing.
  • Human body key point detection refers to the technique of using deep learning to extract features from an input image and locating key points from the extracted feature maps.
  • however, human body key point detection can only detect key points of the body as a whole; it cannot detect the two-dimensional position information of foot key points from which a foot pose could be estimated, so the three-dimensional pose of the foot cannot be obtained.
  • the present application discloses an image processing method.
  • the method may include: acquiring a region image corresponding to a foot in an image to be processed; performing key point detection on the region image using a foot key point detection model to obtain two-dimensional position information of first foot key points of the foot; and determining the three-dimensional pose of the foot in a three-dimensional space based on the mapping relationship between the two-dimensional position information and preset position information, in the three-dimensional space, of second foot key points in a preset three-dimensional foot model that correspond to the first foot key points.
  • the present application also proposes an image processing device, which includes: a first acquisition module, configured to acquire a region image corresponding to a foot in an image to be processed; a key point detection module, configured to perform key point detection on the region image using a foot key point detection model to obtain two-dimensional position information of first foot key points of the foot; and a determination module, configured to determine the three-dimensional pose of the foot in a three-dimensional space based on the mapping relationship between the two-dimensional position information and preset position information, in the three-dimensional space, of second foot key points in a preset three-dimensional foot model that correspond to the first foot key points.
  • the present application also proposes an electronic device, including: a processor; and a memory for storing processor-executable instructions; wherein the processor executes the executable instructions to implement the above image processing method.
  • the present application also provides a computer-readable storage medium, where the storage medium stores a computer program, and the computer program is used to cause a processor to execute the image processing method as shown in any one of the foregoing embodiments.
  • the foot key point detection model can be used to perform key point detection on the region image corresponding to the foot to obtain the two-dimensional position information of the first foot key points; then, based on the mapping relationship between the preset position information, in the three-dimensional space, of the second foot key points in the preset three-dimensional foot model corresponding to the first foot key points and the two-dimensional position information, the three-dimensional pose of the foot in the three-dimensional space can be determined.
  • neural network regression can be used to obtain the two-dimensional position information of the first foot key points for pose estimation, facilitating the subsequent determination of the foot's three-dimensional pose in the three-dimensional space based on the mapping relationship.
  • a three-dimensional virtual model of the shoe material can be superimposed at the position corresponding to the foot in the image to be processed, to obtain the augmented reality effect of virtual shoe fitting.
  • FIG. 1 is a flowchart of an image processing method shown in the present application.
  • FIG. 2 is a schematic flowchart of a method for acquiring a region image shown in the present application.
  • FIG. 3 is a schematic flowchart of a model training method shown in the present application.
  • FIG. 4 is a schematic flowchart of a foot tracking method shown in the present application.
  • FIG. 5 is a schematic diagram of a virtual shoe fitting process shown in the present application.
  • FIG. 6 is a schematic flowchart of a virtual shoe fitting method shown in the present application.
  • FIG. 7 is a schematic structural diagram of an image processing device shown in the present application.
  • FIG. 8 is a schematic diagram of a hardware structure of an electronic device shown in the present application.
  • This application relates to the field of augmented reality.
  • the target object may involve faces, limbs, gestures, and actions related to the human body; markers related to objects; or sand tables, display areas, or display items related to venues or places.
  • Vision-related algorithms may involve visual positioning, SLAM (Simultaneous Localization and Mapping), 3D reconstruction, image registration, background segmentation, object key point extraction and tracking, object pose or depth detection, etc.
  • Specific applications may involve not only interactive scenarios related to real scenes or objects, such as guided tours, navigation, explanation, reconstruction, and virtual effect overlay and display, but also special effects processing related to people, such as makeup beautification, body beautification, special effect display, and interactive scenarios such as virtual model display.
  • the relevant features, states and attributes of the target object can be detected or identified through the convolutional neural network.
  • the aforementioned convolutional neural network is a network model obtained through model training based on a deep learning framework.
  • This application proposes an image processing method.
  • the method can use the foot key point detection model to perform key point detection on the region image corresponding to the foot and obtain the two-dimensional position information of the first foot key points; then, based on the mapping relationship between the preset position information, in the three-dimensional space, of the second foot key points in the preset three-dimensional foot model corresponding to the first foot key points and the two-dimensional position information, the three-dimensional pose of the foot in the three-dimensional space can be determined.
  • neural network regression can be used to obtain the two-dimensional position information of the first foot key points for pose estimation, facilitating the subsequent determination of the foot's three-dimensional pose in the three-dimensional space based on the mapping relationship.
  • the first foot key point is at least one key point in at least one preset foot area of the foot in the image.
  • the predetermined foot area may be any area in the foot.
  • for example, the preset area may be the big toe area of the foot.
  • the second foot key points refer to those key points, among the foot key points included in the preset three-dimensional foot model, that lie in the same foot area as the first foot key points.
  • the method can be applied to electronic equipment.
  • the electronic device may implement the method by carrying a software device corresponding to the image processing method.
  • the type of the electronic device may be a notebook computer, a computer, a server, a mobile phone, a PAD terminal and the like.
  • the present application does not specifically limit the specific type of the electronic device.
  • the electronic device may be a device on the client side or on the server side.
  • the server side may be a single server, a server cluster, or a cloud provided by a distributed server cluster.
  • FIG. 1 is a flowchart of an image processing method shown in the present application. As shown in FIG. 1, the method may include S102-S106.
  • S102 acquire an area image corresponding to the foot in the image to be processed.
  • the image to be processed may include feet.
  • the purpose of this application is to capture the foot in the image to be processed and obtain the three-dimensional pose of the foot.
  • the image to be processed may be an image transmitted by a user through a client program. In this way, images uploaded by users can be processed.
  • the image to be processed may also be an image collected by image collection hardware.
  • the image acquisition hardware may be a camera mounted on a mobile phone terminal. Users can collect video streams in real time through the camera.
  • the image to be processed may be a picture, or an image in the video stream.
  • the foot can be captured in the image to be processed in real time, and the three-dimensional pose estimation of the foot can be performed.
  • the feet may refer to the feet of a human body or other animals or robots.
  • the foot may comprise a left foot or a right foot.
  • the region image may be an image corresponding to the foot region among the images to be processed.
  • if the image to be processed is a picture or the first frame of a video stream, the area enclosed by the foot detection frame in the image to be processed may be determined as the region image.
  • if the image to be processed is a non-first frame in the video stream, the method for acquiring the region image is described in subsequent embodiments and will not be detailed here.
  • FIG. 2 is a schematic flowchart of a method for acquiring an area image shown in the present application.
  • S21-S22 may be executed when S102 is executed.
  • S21 using an object detection model to perform object detection on the image to be processed to obtain an object detection result, where the object detection result includes a detection frame of a foot in the image to be processed.
  • the image to be processed may be input into the trained object detection model to obtain a foot detection frame corresponding to the foot included in the image to be processed.
  • the object detection model may be a model built based on RCNN (Region-based Convolutional Neural Networks), FAST-RCNN (Fast Region-based Convolutional Neural Networks), or FASTER-RCNN (Faster Region-based Convolutional Neural Networks).
  • the object detection model can be trained. Specifically, multiple training samples marked with detection frame information corresponding to the feet can be obtained. The model may then be trained under supervision based on the training samples until the model converges.
  • the object detection model can be used to detect the feet and the foot detection frame in the image.
  • the foot detection frame obtained by object detection and the image to be processed (or the feature map obtained by performing feature extraction on the image to be processed by using the backbone network) can be input into the area feature extraction unit to obtain the area image.
  • the region feature extraction unit may be an ROI Align (Region of Interest Align) unit or an ROI Pooling (Region of Interest Pooling) unit. This unit can extract the region image enclosed by the foot detection frame in the image to be processed.
  • in this way, deep learning can be used to accurately crop the image to be processed and obtain the region image corresponding to the foot.
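As a minimal sketch (not the patent's implementation), extracting the region enclosed by a detection frame can be illustrated with plain NumPy array slicing; the image size and frame coordinates below are hypothetical:

```python
import numpy as np

def crop_region_image(image: np.ndarray, box: tuple) -> np.ndarray:
    """Crop the area enclosed by a detection frame (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    # Clamp the frame to the image bounds before slicing.
    h, w = image.shape[:2]
    x0, y0 = max(0, x0), max(0, y0)
    x1, y1 = min(w, x1), min(h, y1)
    return image[y0:y1, x0:x1]

# Hypothetical 100x100 single-channel image and a foot detection frame.
image = np.zeros((100, 100), dtype=np.uint8)
region = crop_region_image(image, (10, 20, 60, 90))
print(region.shape)  # (70, 50)
```

In a real pipeline the crop would typically be an ROI Align / ROI Pooling unit operating on a feature map rather than raw pixel slicing.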
  • the foot key point detection model includes a neural network model trained based on a plurality of foot region image samples labeled with position information of the first foot key point.
  • the first foot key point refers to at least one key point in at least one preset foot area of the foot in the image.
  • the preset foot area may be any area of the foot.
  • for example, the preset area may be the big toe area of the foot.
  • the number of the first foot key points and the preset foot area can be set according to business requirements.
  • the first foot key points may include a plurality of key points on the edge contour of the foot. The outline of the foot can therefore be well represented by the first foot key points, improving the foot pose estimation.
  • the two-dimensional position information may indicate the two-dimensional position information of the first foot key point in the image to be processed. In some embodiments, the two-dimensional position information may indicate the two-dimensional coordinates of the first foot key point in the image to be processed.
  • the number of the first foot key points may be no less than four. This can improve the fine-grainedness of the key points of the feet, increase the pose information of the feet, and thus improve the pose estimation effect of the feet.
  • the number of first foot key points may be any number in the range of four to fifteen; by placing key points at the necessary foot positions, the number of first foot key points stays within an appropriate range, which also facilitates the subsequent three-dimensional pose detection of the foot.
  • the first foot key points may include key points of at least one of the following areas: protruding areas and/or concave areas on the foot contour.
  • using points in the protruding and/or concave areas of the foot contour as first foot key points improves the accuracy of the contour representation, thereby improving the foot pose estimation.
  • the foot key point detection model can be a regression or classification model constructed based on a neural network or a deep learning network.
  • the model is used to detect the two-dimensional position information of the key point of the first foot.
  • FIG. 3 is a schematic flowchart of a model training method shown in the present application. As shown in FIG. 3, S31-S33 may be executed during model training.
  • S31 acquire multiple training samples.
  • the training samples may be foot images including feet.
  • the two-dimensional position information of the first key point of the foot may be marked in the foot image.
  • the model has the ability to predict the two-dimensional position information of the key points of the first foot.
  • the area image obtained in S102 may be input into the trained foot key point detection model to obtain two-dimensional position information of the first foot key point of the foot.
  • the three-dimensional space refers to the three-dimensional space into which the foot is to be projected. This space is usually set according to business requirements.
  • the preset three-dimensional model of the foot may be a three-dimensional model maintained in advance according to business requirements.
  • the preset three-dimensional model of the foot may include a plurality of key points of the foot.
  • the second foot key points refer to at least one key point, among the foot key points included in the preset three-dimensional foot model, located in at least one foot area that is the same as the at least one foot area of the first foot key points.
  • the foot region may be any region of the foot.
  • for example, the foot area may be the big toe area of the foot.
  • the preset position information may indicate the three-dimensional position information of the second key point of the foot in the three-dimensional space under the standard pose of the preset three-dimensional model of the foot.
  • the preset position information may be the three-dimensional coordinates of the second key point of the foot in the three-dimensional space in the standard pose of the three-dimensional model of the foot.
  • the standard pose may be a preset pose according to business requirements.
  • for example, the pose in which the model is located at the origin of the three-dimensional coordinate system corresponding to the three-dimensional space and is perpendicular to the plane formed by the X and Y axes of that coordinate system may be determined as the standard pose.
  • the three-dimensional coordinates of the second foot key points when the three-dimensional foot model is in the standard pose constitute the preset position information.
  • the three-dimensional pose is used to indicate the posture of the foot in the three-dimensional space in the image to be processed.
  • the three-dimensional pose may include translation and rotation of the foot relative to X, Y, and Z axes in the three-dimensional coordinate system.
  • the three-dimensional mapping algorithm may include PnP (Perspective-n-Point) and other similar algorithms. The present application does not specifically limit the algorithm used to solve the mapping relationship.
  • the PnP algorithm has two inputs: the two-dimensional coordinates of the first foot key points in the image to be processed, and the three-dimensional coordinates of the second foot key points in the three-dimensional space.
  • the algorithm obtains the three-dimensional pose of the foot in the three-dimensional space based on the mapping relationship between the two-dimensional coordinates of each first foot key point and the three-dimensional coordinates of the corresponding second foot key point.
  • the three-dimensional coordinates of the second foot key points in the three-dimensional space may be obtained first; the three-dimensional coordinates and the two-dimensional coordinates are then input into the solving formula corresponding to the PnP algorithm to obtain the three-dimensional pose of the foot in the three-dimensional space.
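To make the 2D-3D mapping concrete, the following NumPy sketch shows the forward pinhole projection that a PnP solver inverts: given assumed 3D coordinates of second foot key points in the standard pose and a candidate pose (R, t), it produces the 2D projections that the solver tries to match to the observed first-foot-key-point coordinates. The points, pose, and unit focal length are illustrative assumptions, not values from the application; a typical solver in practice would be OpenCV's `cv2.solvePnP`:

```python
import numpy as np

def project_points(pts_3d, R, t, f=1.0):
    """Pinhole projection of 3D foot key points under pose (R, t)."""
    cam = pts_3d @ R.T + t               # model coordinates -> camera coordinates
    return f * cam[:, :2] / cam[:, 2:3]  # perspective divide

# Hypothetical second-foot-key-point coordinates in the standard pose.
pts_3d = np.array([[0.0, 0.0, 0.0],
                   [1.0, 0.0, 0.0],
                   [0.0, 1.0, 0.0],
                   [1.0, 1.0, 0.5]])
R = np.eye(3)                  # identity rotation for illustration
t = np.array([0.0, 0.0, 5.0])  # translate the foot 5 units along Z
pts_2d = project_points(pts_3d, R, t)
print(pts_2d[1])  # [0.2 0. ]
```

PnP searches for the (R, t) under which this projection best reproduces the detected 2D key points; that (R, t) is the foot's three-dimensional pose.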
  • the foot key point detection model can be used to perform key point detection on the region image corresponding to the foot to obtain the two-dimensional position information of the first foot key points; then, based on the mapping relationship between the preset position information, in the three-dimensional space, of the second foot key points in the preset three-dimensional foot model corresponding to the first foot key points and the two-dimensional position information, the three-dimensional pose of the foot in the three-dimensional space can be determined.
  • neural network regression can be used to obtain the two-dimensional position information of the first foot key points for pose estimation, facilitating the subsequent determination of the foot's three-dimensional pose in the three-dimensional space based on the mapping relationship.
  • the image processing method shown in this application can also identify the type of the foot, that is, whether it is a left foot or a right foot.
  • the object detection result obtained through S21 includes not only the detection frame of the foot in the image to be processed, but also the type of the foot.
  • when performing S21, an object detection model may be used to perform object detection on the image to be processed to obtain the detection frame of the foot included in the image to be processed and the type of the foot.
  • the object detection model can be used to distinguish the left and right feet on the basis of detecting the foot detection frame in the image to be processed.
  • the object detection model includes a neural network model trained based on a plurality of training samples marked with detection frames and type information corresponding to feet; the type indicates whether the foot is a left foot or a right foot.
  • the object detection model may be a model built based on RCNN (Region-based Convolutional Neural Networks), FAST-RCNN (Fast Region-based Convolutional Neural Networks), or FASTER-RCNN (Faster Region-based Convolutional Neural Networks).
  • the object detection model may be trained. Specifically, a plurality of training samples marked with detection frame information and type information corresponding to the feet (that is, left foot or right foot) can be obtained. The model may then be trained under supervision based on the training samples until the model converges.
  • the object detection model can be used to detect the detection frame and the type of the foot in the image.
  • S103 may also be executed: in response to the foot being of a preset type, the region image is flipped so that the foot types in all region images input to the foot key point detection model are consistent, thereby ensuring that the feet in all region images input to the model are of the same type (i.e., left foot or right foot), which facilitates processing by the foot key point detection model.
  • the preset type can be set according to business requirements. In some embodiments, when training the foot key point detection model, left-foot samples may be used for training; in that case, the preset type can be set to right foot. When executing S103, if the foot is recognized as a right foot, the region image corresponding to the foot is flipped so that the foot in the region image becomes a left-foot type, which facilitates processing by the foot key point detection model.
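A minimal sketch of this normalization step, assuming the preset type is "right" and using a horizontal mirror as the flip; the tiny array stands in for a region image:

```python
import numpy as np

def normalize_foot_type(region_image: np.ndarray, foot_type: str,
                        preset_type: str = "right") -> np.ndarray:
    """Flip the region image horizontally when the detected foot is the
    preset type, so the key point model always sees one foot type."""
    if foot_type == preset_type:
        return region_image[:, ::-1]  # mirror along the horizontal axis
    return region_image

img = np.arange(6).reshape(2, 3)      # hypothetical 2x3 region image
flipped = normalize_foot_type(img, "right")
print(flipped[0].tolist())  # [2, 1, 0]
```

Predicted key point x-coordinates would then need the inverse mirroring applied before being used in the original image frame.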
  • image processing needs to be performed on the images in the captured video stream.
  • foot tracking is required.
  • the object detection model can be used to perform object detection on each frame of the video stream to obtain the foot detection frames, and the same foot can then be identified across frames according to the positions of the detected foot detection frames, thereby performing foot tracking.
  • alternatively, the fact that the position of a foot does not change significantly between adjacent frames can be exploited to track the foot in the video stream, reducing reliance on the object detection model for foot tracking.
  • FIG. 4 is a schematic flowchart of a foot tracking method shown in the present application. As shown in FIG. 4, S41-S43 may be executed when implementing foot tracking.
  • S41 acquire an area image corresponding to the foot in the image to be processed.
  • the image to be processed is an image in a video stream.
  • the image to be processed may be the first frame image or images after the first frame image in a certain video stream.
  • the area image can be acquired through steps S21-S22.
  • the position information of the first foot key points of the foot in the previous frame of the video stream relative to the image to be processed may be acquired, and the foot key point frame may be determined according to the acquired position information. The region image corresponding to the foot is then obtained based on the foot key point frame and the image to be processed.
  • the key point frame of the foot can be a key point frame of any shape.
  • the foot key point box may be a rectangular box.
  • the two-dimensional position information of the first key point of the foot in the cached image of the previous frame of the image to be processed may be acquired.
  • the minimum coordinates x0, y0 and the maximum coordinates x1, y1 along the X and Y axes can be determined from those two-dimensional coordinates.
  • a rectangular frame composed of four vertices (x0, y0), (x0, y1), (x1, y0) and (x1, y1) can be used as the foot key point frame.
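The rectangular foot key point frame described above can be sketched in a few lines; the cached key point coordinates are hypothetical:

```python
def keypoint_frame(points):
    """Build a rectangular foot key point frame from the previous frame's
    first-foot-key-point coordinates: (x0, y0) = minima, (x1, y1) = maxima."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0, x1, y1 = min(xs), min(ys), max(xs), max(ys)
    # The four vertices of the rectangular frame.
    return (x0, y0), (x0, y1), (x1, y0), (x1, y1)

# Hypothetical cached key points from the previous frame.
pts = [(12, 40), (30, 25), (18, 55), (27, 33)]
print(keypoint_frame(pts))  # ((12, 25), (12, 55), (30, 25), (30, 55))
```

In practice the frame is often padded by a margin so that small inter-frame motion keeps the foot inside it.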
  • the key point frame of the foot and the image to be processed can be input into the region feature extraction unit to obtain the second region image .
  • the region feature extraction unit may be an ROI Align (Region of Interest Align) unit or an ROI Pooling (Region of Interest Pooling) unit.
  • the unit can extract the second area image surrounded by the frame of the key points of the foot in the second image.
  • after the region image corresponding to the foot is obtained based on the foot key point frame and the image to be processed, the type of the foot stored when object detection or classification was performed on the region image of the previous frame may also be obtained; in response to the type of the foot in the previous frame being the preset type, the region image is flipped so that the foot types in all region images input to the foot key point detection model are consistent. This ensures that the feet in all region images input to the foot key point detection model are of the same type (i.e., left foot or right foot), which facilitates processing by the model.
  • the classification result indicates whether the region image is a foot image.
  • the image classification model may include a convolutional neural network.
  • image classification information indicates whether the image sample is a foot image. This model can then be trained under supervision based on image samples.
  • the image classification model can be used to perform image classification on the region image to obtain a classification result.
  • since the region image obtained in S41 is taken from the image to be processed at the same position as the foot in the previous frame image, if the classification result indicates that the region image is a foot image, it can be concluded that a foot is present in the image to be processed at that same position. Given that the position of the same foot does not change significantly across adjacent frames, it can be determined that the foot in the region image is the same foot as in the previous frame image.
  • in this way, foot tracking in the video stream is realized, the computation required for tracking through the object detection model is reduced, overhead is lowered, foot tracking efficiency is improved, and the real-time performance of the image processing method is enhanced.
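The per-frame tracking decision described above can be sketched as follows; the Boolean values stand in for the image classification model's result on each frame's region image (True meaning the crop at the previous foot position still classifies as a foot):

```python
def process_stream(frames_are_foot):
    """Simulate the tracking loop: when the region image cut at the previous
    foot position classifies as a foot, tracking continues; otherwise the
    frame falls back to full object detection (steps S21-S22)."""
    actions = []
    for is_foot in frames_are_foot:
        actions.append("track" if is_foot else "redetect")
    return actions

# Hypothetical classification results for five consecutive frames.
print(process_stream([True, True, False, True, True]))
# ['track', 'track', 'redetect', 'track', 'track']
```

Only the "redetect" frames pay the cost of running the object detection model, which is the source of the efficiency gain.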
  • S104 may be continued to perform key point detection on the region image by using the foot key point detection model to obtain two-dimensional position information of the first foot key point of the foot. S104 will not be described in detail here.
  • if the foot in the region image is the same foot as in the previous frame image, the three-dimensional pose of the foot in the three-dimensional space is determined based on the mapping relationship between the preset position information, in the three-dimensional space, of the second foot key points in the preset three-dimensional foot model corresponding to the first foot key points and the two-dimensional position information.
  • the pose estimation steps are not described in detail here.
  • if the classification result indicates that the region image is not a foot image, the image to be processed can be treated as an image for which video tracking failed; the region image is then acquired through steps S21-S22, and the three-dimensional pose of the foot is obtained through S104 and S106.
  • in some embodiments, the image classification model shares a feature extraction network, for example the backbone network, with the foot key point detection model.
  • the image classification model and the foot key point detection model can be trained jointly, improving the feature extraction network's ability to extract features useful for both classification and key point detection, and thereby improving classification and key point detection performance.
  • the first image sample labeled with image classification information and the second image sample labeled with position information of the first key point of the foot may be acquired.
  • the first image sample and the second image sample may be the same image.
  • the classification information indicates whether the first image sample is an image of a foot.
  • the first image sample may be input into the image classification model to obtain a classification prediction result, and first loss information may be obtained from the classification prediction result and the labeled image classification information; the second image sample may be input into the foot key point detection model to obtain a first-foot-key-point position prediction result, and second loss information may be obtained from that prediction result and the labeled first-foot-key-point position information.
  • model parameters of the image classification model and the foot key point detection model may be adjusted based on the first loss information and the second loss information.
  • the foregoing training steps may be performed iteratively for multiple times until the image classification model and the foot key point detection model converge.
  • Because the image classification model shares the feature extraction network with the foot key point detection model, training either model affects the training of the other, so the training of the two models can complement and promote each other, realizing joint training of the image classification model and the foot key point detection model.
  • In this way, the ability of the feature extraction network to extract feature information useful for classification and key point detection can be improved, improving the effect of both tasks; because the two training processes complement and promote each other, the efficiency of model training is also improved.
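The joint-training scheme above (two losses back-propagating into one shared feature extractor) can be sketched as follows. This is a minimal numpy illustration, not the application's implementation: a single linear layer stands in for the shared backbone, and squared error stands in for the usual classification/regression losses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared feature extraction network (a single linear layer stands in for a CNN backbone).
W_shared = rng.normal(size=(8, 16)) * 0.1
# Task-specific branches: foot / not-foot classification and first-foot-key-point regression.
W_cls = rng.normal(size=(16, 1)) * 0.1
W_kpt = rng.normal(size=(16, 8)) * 0.1  # 4 key points x (x, y)

def forward(x):
    feat = np.tanh(x @ W_shared)        # features from the shared extractor
    return feat @ W_cls, feat @ W_kpt   # classification branch, key point branch

def joint_loss(x_cls, y_cls, x_kpt, y_kpt):
    logit, _ = forward(x_cls)
    prob = 1.0 / (1.0 + np.exp(-logit))
    first_loss = np.mean((prob - y_cls) ** 2)    # classification loss (stand-in for cross entropy)
    _, kpts = forward(x_kpt)
    second_loss = np.mean((kpts - y_kpt) ** 2)   # key point position loss
    # Both losses depend on W_shared, so adjusting model parameters against their
    # sum trains the shared feature extractor from both tasks at once.
    return first_loss + second_loss

# First image sample (classification labels) and second image sample (key point labels).
x1 = rng.normal(size=(5, 8)); y1 = rng.integers(0, 2, size=(5, 1)).astype(float)
x2 = rng.normal(size=(5, 8)); y2 = rng.normal(size=(5, 8))
total = joint_loss(x1, y1, x2, y2)
```

In a real training loop this combined loss would be minimized iteratively until both models converge, as the surrounding text describes.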
  • virtual shoe fitting may be performed.
  • a three-dimensional virtual model of the shoe material may be obtained first. Then, based on the three-dimensional pose of the foot in the image to be processed, the three-dimensional virtual model is superimposed on the position corresponding to the foot in the image to be processed, so as to obtain the augmented reality effect of virtual shoe fitting.
  • the three-dimensional virtual model can be used to indicate the outline and/or texture color of the shoe material.
  • the 3D virtual model may include the 3D coordinates of each vertex of the shoe material in 3D space, and the pixel value of each vertex.
  • By superimposing the three-dimensional virtual model at the position corresponding to the foot, the shoe material can be displayed in the image to be processed, thereby achieving the augmented reality effect of virtual shoe fitting.
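As described above, the three-dimensional virtual model carries the 3D coordinates of each vertex of the shoe material plus a pixel value per vertex. A minimal container for such a model might look like this (the class and field names are assumptions, not from the application):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class ShoeModel3D:
    """Three-dimensional virtual model of a shoe material: per-vertex 3D
    coordinates (outline) and per-vertex pixel values (texture / color)."""
    vertices: np.ndarray   # shape (N, 3): 3D coordinates of each vertex in 3D space
    colors: np.ndarray     # shape (N, 3): RGB pixel value of each vertex

    def __post_init__(self):
        # Every vertex must carry exactly one pixel value.
        assert self.vertices.shape[0] == self.colors.shape[0]

model = ShoeModel3D(
    vertices=np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0], [0.0, 0.2, 0.0]]),
    colors=np.array([[200, 30, 30]] * 3),
)
```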
  • FIG. 5 is a schematic diagram of a virtual shoe fitting process shown in the present application.
  • the method may include S51-S54.
  • S51 acquire the initial pose of the 3D virtual model corresponding to the shoe material in the 3D space.
  • the initial pose is used to indicate the pose information of the shoe material in three-dimensional space.
  • the initial pose of the shoe material in the three-dimensional space may be pre-maintained in the database.
  • When needed, this initial pose can be acquired from the database.
  • the pose transformation information of the initial pose can be determined based on the 3D pose of the foot.
  • the pose transformation information may indicate translation and rotation amounts of the initial pose relative to the X-axis, Y-axis, and Z-axis of the three-dimensional coordinate system.
  • the converted three-dimensional pose may indicate pose information after converting the shoe material to the pose of the foot.
  • the pose transformation information may include translation and rotation information.
  • The pose transformation information may be used to perform translation and rotation operations on the initial pose, to obtain the pose information of the shoe material after it has been transformed to the pose of the foot.
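S52-S53 can be sketched as follows: the pose transformation is expressed as rotation amounts about the X, Y, and Z axes plus a translation, and applying it to the initial pose yields the converted pose. A hedged numpy sketch (the Z-Y-X composition order is an assumption):

```python
import numpy as np

def rotation_xyz(rx, ry, rz):
    """Rotation matrix from rotation amounts (radians) about the X, Y, Z axes."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    return Rz @ Ry @ Rx

def transform_pose(initial_R, initial_t, rot_xyz, trans_xyz):
    """Apply the pose transformation information (rotation + translation amounts)
    to the initial pose, producing the converted three-dimensional pose."""
    R = rotation_xyz(*rot_xyz)
    return R @ initial_R, R @ np.asarray(initial_t, float) + np.asarray(trans_xyz, float)
```

For example, rotating an identity initial pose by 90 degrees about Z and translating it moves the shoe material onto the foot's pose in this toy setup.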
  • the three-dimensional coordinates of each vertex in the three-dimensional virtual model may be adjusted based on the converted three-dimensional pose first, to obtain an adjusted three-dimensional virtual model. Then use a projection algorithm to project the three-dimensional coordinates of each vertex in the adjusted three-dimensional virtual model to the two-dimensional plane where the image to be processed is located, to obtain the two-dimensional coordinates of each vertex.
  • The shape of the two-dimensional virtual material corresponding to the shoe material can be determined from the two-dimensional coordinates of the vertices, and its texture and color can be determined from the pixel values of the vertices (obtained from the pixel values of the vertices in the three-dimensional virtual model), so as to obtain the two-dimensional virtual material.
  • the image fusion can be completed by covering each pixel of the two-dimensional virtual material with the pixel at the corresponding position of the foot, or adjusting the transparency of the pixel at the corresponding position of the foot.
  • the fused image is obtained, and the AR effect of virtual shoe fitting can be displayed by outputting the fused image.
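The projection and fusion steps above can be sketched as below, assuming a simple pinhole camera with focal length f and principal point (cx, cy); these parameters and function names are illustrative, not from the application:

```python
import numpy as np

def project_vertices(verts, f, cx, cy):
    """Pinhole projection of (N, 3) camera-space vertices of the adjusted 3D
    virtual model onto the 2D plane of the image to be processed."""
    x, y, z = verts[:, 0], verts[:, 1], verts[:, 2]
    return np.stack([f * x / z + cx, f * y / z + cy], axis=1)

def fuse(image, material_rgb, material_mask, alpha=1.0):
    """Cover the foot pixels with the 2D virtual material; alpha < 1 keeps part
    of the original pixel visible (the transparency adjustment described above)."""
    out = image.astype(float).copy()
    m = material_mask.astype(bool)
    out[m] = alpha * material_rgb[m].astype(float) + (1.0 - alpha) * out[m]
    return np.clip(out, 0, 255).astype(np.uint8)

pts_2d = project_vertices(np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0]]), 100.0, 50.0, 50.0)
img = np.zeros((2, 2, 3), dtype=np.uint8)
mat = np.full((2, 2, 3), 255, dtype=np.uint8)
mask = np.array([[True, False], [False, False]])
fused = fuse(img, mat, mask)
```

Outputting the fused image then displays the AR effect of virtual shoe fitting, as the text describes.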
  • Embodiments will be described below in conjunction with a virtual shoe fitting scene.
  • The shoes in three-dimensional space can be fused (for example, by image rendering) with the feet captured in the video stream to complete the virtual shoe fitting.
  • the virtual shoe trial client can be carried in the mobile terminal.
  • the mobile terminal can be equipped with a camera for real-time collection of video streams.
  • The virtual shoe library can be deployed locally on the mobile terminal (hereinafter referred to as the terminal), or on a server corresponding to the virtual shoe-trying client (hereinafter referred to as the client).
  • the virtual shoe library may include 3D virtual models of various shoe materials developed.
  • the virtual shoe library can be any type of database.
  • the user can select shoes to try on from the virtual shoe library in the virtual shoe trial client, and collect foot video streams through the camera.
  • FIG. 6 is a schematic flowchart of a virtual shoe fitting method shown in the present application. As shown in Fig. 6, the method may include S601-S611.
  • S601 may be executed for the first image in the video stream.
  • the first image includes the first frame image in the video stream, or the image after it is determined that the foot tracking fails according to steps S607-S610.
  • S601: Use the pre-trained object detection model to perform object detection on the first image to obtain the foot detection frame in the first image and the type of the foot, that is, whether the foot is a left foot or a right foot. In the following, it is assumed that this foot is the right foot.
  • The first foot key points include key points of at least some of the following foot areas: the tip of the big toe; the inner joint of the forefoot; the inner arch of the foot; the inner side of the rear sole; the rear of the heel; the outer side of the rear sole; the outer joint of the forefoot; the junction of the forefoot and the leg; the medial ankle joint; the rear hamstring; the lateral ankle joint.
  • the outline of the foot can be described in detail at a fine-grained level, including a large amount of three-dimensional information of the foot, and the accuracy of pose estimation can be improved.
  • S606: Obtain the 3D virtual model of the shoe corresponding to the right foot from the virtual shoe library, and fuse the shoe with the foot according to the obtained 3D virtual model and the 3D pose of the foot, to demonstrate the effect of virtual shoe fitting.
  • For details of S606, reference may be made to S51-S54, which will not be described again here.
  • S607 may be executed for the second image in the video stream.
  • the second image may be any non-first frame image after the first frame image in the video stream.
  • S607. Acquire position information of the first key point of the foot in a previous frame image of the second image, and determine a key point frame of the foot based on the position information.
  • the foot tracking model includes a classification branch and a key point detection branch.
  • The model is obtained through joint training based on left foot image samples labeled with position information of the first foot key point and left foot image samples labeled with image classification information.
  • S610: Use the classification branch of the pre-trained foot tracking model to determine whether the second region image is a foot image, and use the key point branch of the model to determine the position information of the first foot key point of the foot.
  • If it is determined, based on the image classification information, that the second region image is a foot image, it can be determined that the foot in the second image is the foot that appeared in the previous frame image, completing the foot tracking; S605 and S606 are then executed to show the effect of virtual shoe fitting.
  • Otherwise, the foot tracking fails, and S601-S606 can be performed using the second image as the first image to obtain the 3D pose of the foot in the subsequent images.
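The key point frame used in S607 (a foot region derived from the previous frame's key point positions) can be sketched as a bounding box around those points, expanded by a margin to absorb small inter-frame motion; the margin value here is an assumption:

```python
import numpy as np

def keypoint_frame(prev_keypoints, margin=0.2, image_size=None):
    """Foot key point frame: bounding box around the previous frame's foot key
    points, expanded by a relative margin because the foot position changes
    little between adjacent frames."""
    pts = np.asarray(prev_keypoints, float)
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    mx, my = margin * (x1 - x0), margin * (y1 - y0)
    box = [x0 - mx, y0 - my, x1 + mx, y1 + my]
    if image_size is not None:         # optionally clamp to the image bounds
        w, h = image_size
        box = [max(box[0], 0.0), max(box[1], 0.0), min(box[2], w), min(box[3], h)]
    return box
```

Cropping the second image with this frame yields the second region image fed to the foot tracking model, avoiding a full object-detection pass on every frame.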
  • the key point detection model of the foot can be used to obtain the position information of the key point of the first foot through regression;
  • Then, based on the three-dimensional pose of the foot in the image to be processed, the three-dimensional virtual model corresponding to the shoe can be superimposed on the position corresponding to the foot in the first image or the second image, to obtain the augmented reality effect of virtual shoe fitting.
  • The feature that the position of the foot does not change significantly between adjacent frames can be used to track the foot in the video stream, reducing the amount of computation that foot tracking via the object detection model would require and reducing overhead, improving the efficiency of foot tracking and thereby the real-time performance of the virtual shoe fitting method.
  • the present application proposes an image processing device 70 .
  • FIG. 7 is a schematic structural diagram of an image processing device shown in the present application.
  • the device 70 may include:
  • the first obtaining module 71 is used to obtain the area image corresponding to the foot in the image to be processed
  • the key point detection module 72 is used to use the foot key point detection model to perform key point detection on the region image to obtain the two-dimensional position information of the first foot key point of the foot;
  • a determining module 73 configured to be based on the difference between the preset position information in the three-dimensional space of the second key point of the foot corresponding to the first key point in the three-dimensional model of the foot and the two-dimensional position information The mapping relationship determines the three-dimensional pose of the foot in the three-dimensional space.
  • the first obtaining module 71 is specifically configured to:
  • the object detection result includes a detection frame of the foot in the image to be processed
  • an area image corresponding to the foot is obtained.
  • the object detection result further includes the type of the foot; the type is used to indicate that the foot is a left foot or a right foot;
  • the device 70 also includes:
  • the first inversion module is configured to perform inversion processing on the region images in response to the feet being of a preset type, so that the types of feet in all region images input to the foot key point detection model are consistent.
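The flip (inversion) processing described for this module can be sketched as a horizontal mirror of the region image, with predicted key point x coordinates mirrored back afterwards; a minimal numpy sketch (function names are assumptions):

```python
import numpy as np

def flip_region_image(region_image):
    """Mirror the region image left-right so that a foot of the preset type
    (e.g. a left foot) looks like the other type before key point detection,
    keeping the foot type consistent across all model inputs."""
    return region_image[:, ::-1]

def unflip_keypoint_x(x, width):
    """Map an x coordinate predicted on the flipped image back to the original."""
    return (width - 1) - np.asarray(x)
```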
  • the first acquisition module 71 includes:
  • an area image corresponding to the foot is obtained.
  • the device 70 also includes:
  • the second obtaining module is used to obtain the type of the foot in the stored image of the previous frame
  • The second flip module is used to perform flip processing on the region images in response to the type of the foot in the previous frame image being a preset type, so that the types of feet in all region images input to the foot key point detection model are consistent.
  • The image to be processed is an image in a video stream, and the video stream also includes the previous frame image of the image to be processed; the region image of the image to be processed is determined based on the foot key point frame determined by the first foot key point in the previous frame image.
  • the device 70 also includes:
  • a tracking module, configured to use an image classification model to classify the region image to obtain a classification result of the region image, the classification result being used to indicate whether the region image is a foot image; and, in response to the classification result indicating that the region image is a foot image, to determine that the foot in the region image is the same foot as the foot in the previous frame image, so as to perform foot tracking.
  • the determining module 73 is specifically configured to:
  • the image classification model shares a feature extraction network with the foot key point detection model; the device 70 also includes:
  • the joint training module of the image classification model and the foot key point detection model is used to obtain the first image sample marked with image classification information, and the second image sample marked with the position information of the first foot key point;
  • model parameters of the image classification model and the foot key point detection model are adjusted.
  • the first key points of the foot include a plurality of key points on the contour of the edge of the foot; the number of the first key points of the foot is not less than four.
  • the first foot key points include key points of at least one of the following regions:
  • the device 70 also includes:
  • the virtual shoe-trying module is used to obtain the three-dimensional virtual model of the shoe material
  • the three-dimensional virtual model is superimposed on the position corresponding to the foot in the image to be processed to obtain an augmented reality effect of virtual shoe fitting.
  • the virtual shoe fitting module is specifically used for:
  • Image fusion of the two-dimensional virtual material and the corresponding position of the feet is performed to obtain a fused image for AR effect display of virtual shoe fitting.
  • Embodiments of the image processing apparatus shown in this application can be applied to electronic equipment.
  • the present application discloses an electronic device, and the device may include: a processor.
  • Memory used to store processor-executable instructions.
  • the processor is configured to call executable instructions stored in the memory to implement the image processing method shown in any one of the foregoing embodiments.
  • FIG. 8 is a schematic diagram of a hardware structure of an electronic device shown in the present application.
  • The electronic device may include a processor for executing instructions, a network interface for connecting to a network, a memory for storing operation data for the processor, and a non-volatile memory for storing instructions corresponding to the image processing device.
  • the embodiment of the apparatus may be implemented by software, or by hardware or a combination of software and hardware.
  • Taking software implementation as an example, the device, in a logical sense, is formed by the processor of the electronic device where it is located reading the corresponding computer program instructions from the non-volatile memory into memory for execution.
  • In addition to the processor, memory, network interface, and non-volatile memory, the electronic device where the device in the embodiment is located may also include other hardware according to the actual function of the electronic device, which will not be described in detail here.
  • the device corresponding instructions may also be directly stored in the memory, which is not limited herein.
  • the present application proposes a computer-readable storage medium, the storage medium stores a computer program, and the computer program can be used to cause a processor to execute the image processing method shown in any one of the foregoing embodiments.
  • one or more embodiments of the present application may be provided as a method, system or computer program product. Accordingly, one or more embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, one or more embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, optical storage, etc.) having computer-usable program code embodied therein.
  • each embodiment in the present application is described in a progressive manner, the same and similar parts of each embodiment can be referred to each other, and each embodiment focuses on the differences from other embodiments.
  • the description is relatively simple, and for relevant parts, please refer to part of the description of the method embodiment.
  • Embodiments of the subject matter and functional operations described in this application can be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in this application and their structural equivalents, or in a combination of one or more of them.
  • Embodiments of the subject matter described in this application can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, a data processing apparatus.
  • Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical, or electromagnetic signal, which is generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus.
  • a computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
  • the processes and logic flows described in this application can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output.
  • the processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, such as an FPGA (Field Programmable Gate Array) or an ASIC (Application Specific Integrated Circuit).
  • Computers suitable for the execution of a computer program include, for example, general and/or special purpose microprocessors, or any other type of central processing system.
  • a central processing system will receive instructions and data from read only memory and/or random access memory.
  • the basic components of a computer include a central processing system for implementing or executing instructions and one or more memory devices for storing instructions and data.
  • Generally, a computer will also include, or be operatively coupled to, one or more mass storage devices for storing data, such as magnetic disks, magneto-optical disks, or optical disks, to receive data from them, transfer data to them, or both.
  • a computer is not required to have such a device.
  • a computer may be embedded in another device, such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
  • Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM, and flash memory devices), magnetic disks (such as internal hard disks or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks.
  • the processor and memory can be supplemented by, or incorporated in, special purpose logic circuitry.

Abstract

The present application provides an image processing method and apparatus, a device, and a storage medium. The method may comprise: obtaining a region image corresponding to a foot in an image to be processed; performing key point detection on the region image by means of a foot key point detection model to obtain two-dimensional position information of a first foot key point of the foot; and determining a three-dimensional pose of the foot in three-dimensional space on the basis of a mapping relationship between the two-dimensional position information and preset position information, in the three-dimensional space, of a second foot key point, corresponding to the first foot key point, in a preset three-dimensional foot model.

Description

Image Processing

Technical Field

The present application relates to the technical field of computer vision, and in particular to image processing.

Background

Human body key point detection technology refers to the technology of using deep learning to extract features from an input image and using the extracted feature maps to locate key points.

However, human body key point detection can only detect key points of the human body; it cannot detect, for the foot, the two-dimensional position information of foot key points from which pose estimation can be performed, so the three-dimensional pose of the foot cannot be obtained.
发明内容Contents of the invention
有鉴于此,第一方面,本申请公开一种图像处理方法。所述方法可以包括:获取待处理图像中与脚部对应的区域图像;利用脚部关键点检测模型,对所述区域图像进行关键点检测,得到所述脚部的第一脚部关键点的二维位置信息;基于预设脚部三维模型中与所述第一脚部关键点对应的第二脚部关键点在三维空间中的预设位置信息和所述二维位置信息之间的映射关系,确定所述脚部在所述三维空间中的三维位姿。In view of this, in the first aspect, the present application discloses an image processing method. The method may include: acquiring a region image corresponding to the foot in the image to be processed; using a foot key point detection model to perform key point detection on the region image to obtain the first foot key point of the foot Two-dimensional position information; based on the mapping between the preset position information in three-dimensional space and the two-dimensional position information of the second key point of the foot corresponding to the key point of the first foot in the preset three-dimensional model of the foot relationship, and determine the three-dimensional pose of the foot in the three-dimensional space.
第二方面,本申请还提出一种图像处理装置,所述装置包括:第一获取模块,用于获取待处理图像中与脚部对应的区域图像;关键点检测模块,用于利用脚部关键点检测模型,对所述区域图像进行关键点检测,得到所述脚部的第一脚部关键点的二维位置信息;确定模块,用于基于预设脚部三维模型中与所述第一脚部关键点对应的第二脚部关键点在三维空间中的预设位置信息和所述二维位置信息之间的映射关系,确定所述脚部在所述三维空间中的三维位姿。In the second aspect, the present application also proposes an image processing device, which includes: a first acquisition module, used to acquire an area image corresponding to the foot in the image to be processed; a key point detection module, used to use the foot key A point detection model, which detects the key points of the region image to obtain the two-dimensional position information of the first foot key point of the foot; the determination module is used to match the first three-dimensional model based on the foot The mapping relationship between the preset position information of the second key point of the foot corresponding to the key point of the foot in the three-dimensional space and the two-dimensional position information determines the three-dimensional pose of the foot in the three-dimensional space.
第三方面,本申请还提出一种电子设备,包括:处理器;用于存储处理器可执行指令的存储器;其中,所述处理器通过运行所述可执行指令以实现如前述任一实施例示出的图像处理方法。In a third aspect, the present application also proposes an electronic device, including: a processor; a memory for storing processor-executable instructions; wherein, the processor executes the executable instructions to implement the out image processing method.
第四方面,本申请还提出一种计算机可读存储介质,所述存储介质存储有计算机程序,所述计算机程序用于使处理器执行如前述任一实施例示出的图像处理方法。In a fourth aspect, the present application also provides a computer-readable storage medium, where the storage medium stores a computer program, and the computer program is used to cause a processor to execute the image processing method as shown in any one of the foregoing embodiments.
在前述实施例公开的技术方案中,可以利用脚部关键点检测模型,对脚部对应的脚部区域图像进行关键点检测,得到第一脚部关键点的二维位置信息;然后基于预设脚部三维模型中与所述第一脚部关键点对应的第二脚部关键点在三维空间中的预设位置信息和所述二维位置信息之间的映射关系,即可确定所述脚部在所述三维空间中的三维位姿。与人体关键点检测技术相比,可以利用神经网络回归得到可以进行位姿估计的第一脚部关键点的二维位置信息,从而便于后续基于所述映射关系确定所述脚部在所述三维空间中的三维位姿。In the technical solutions disclosed in the aforementioned embodiments, the key point detection model of the foot can be used to detect the key points of the foot area image corresponding to the foot to obtain the two-dimensional position information of the first key point of the foot; and then based on the preset The mapping relationship between the preset position information of the second key point of the foot corresponding to the first key point of the foot in the three-dimensional space and the two-dimensional position information in the three-dimensional model of the foot can determine the The three-dimensional pose of the head in the three-dimensional space. Compared with the key point detection technology of the human body, neural network regression can be used to obtain the two-dimensional position information of the first key point of the foot for pose estimation, so as to facilitate subsequent determination of the position of the foot in the three-dimensional position based on the mapping relationship. 3D pose in space.
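The mapping step above is, in effect, a 2D-3D correspondence problem: given the preset 3D positions of the second foot key points and the detected 2D positions of the first foot key points, find the pose (R, t) whose reprojection best matches the detections. A perspective-n-point (PnP) style solver is a common way to do this (the application does not name a specific solver); the numpy sketch below shows the reprojection relationship such a solver inverts, under an assumed pinhole camera:

```python
import numpy as np

def reproject(model_pts, R, t, f, cx, cy):
    """Project the preset 3D foot-model key points under pose (R, t) into the image."""
    cam = model_pts @ R.T + t
    return np.stack([f * cam[:, 0] / cam[:, 2] + cx,
                     f * cam[:, 1] / cam[:, 2] + cy], axis=1)

def reprojection_error(pts_2d, model_pts, R, t, f, cx, cy):
    """Mean distance between detected 2D key points and reprojected model points.
    A PnP-style solver searches for the (R, t) minimizing this over the
    available 2D-3D correspondences (typically at least four)."""
    return float(np.linalg.norm(pts_2d - reproject(model_pts, R, t, f, cx, cy), axis=1).mean())

# At the true pose, the detected 2D key points coincide with the reprojections.
model_pts = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])
true_t = np.array([0.0, 0.0, 5.0])
pts_2d = reproject(model_pts, np.eye(3), true_t, 100.0, 0.0, 0.0)
err = reprojection_error(pts_2d, model_pts, np.eye(3), true_t, 100.0, 0.0, 0.0)
```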
In addition, in the technical solutions described in some of the foregoing embodiments, based on the three-dimensional pose of the foot in the image to be processed, a three-dimensional virtual model of the shoe material can be superimposed on the position corresponding to the foot in the image to be processed, to obtain an augmented reality effect of virtual shoe fitting.

It should be understood that the above general description and the following detailed description are merely exemplary and explanatory, and do not limit the present application.
Brief Description of the Drawings

In order to more clearly illustrate the technical solutions in one or more embodiments of the present application or in related technologies, the following briefly introduces the drawings needed in the description of the embodiments or related technologies. Obviously, the drawings in the following description are only some of the embodiments described in one or more embodiments of the present application, and those of ordinary skill in the art can obtain other drawings based on these drawings without creative effort.

FIG. 1 is a flowchart of an image processing method shown in the present application;

FIG. 2 is a schematic flowchart of a region image acquisition method shown in the present application;

FIG. 3 is a schematic flowchart of a model training method shown in the present application;

FIG. 4 is a schematic flowchart of a foot tracking method shown in the present application;

FIG. 5 is a schematic diagram of a virtual shoe fitting process shown in the present application;

FIG. 6 is a schematic flowchart of a virtual shoe fitting method shown in the present application;

FIG. 7 is a schematic structural diagram of an image processing device shown in the present application;

FIG. 8 is a schematic diagram of a hardware structure of an electronic device shown in the present application.
Detailed Description
下面将详细地结合附图对示例性实施例进行说明。下面的描述涉及附图时,除非另有表示,不同附图中的相同数字表示相同或相似的要素。以下示例性实施例中所描述的实施方式并不代表与本申请相一致的所有实施方式。相反,它们仅是与如所附权利要求书中所详述的、本申请的一些方面相一致的设备和方法的例子。Exemplary embodiments will be described in detail below with reference to the accompanying drawings. When the following description refers to the accompanying drawings, the same numerals in different drawings refer to the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this application. Rather, they are merely examples of devices and methods consistent with aspects of the present application as recited in the appended claims.
在本申请使用的术语是仅仅出于描述特定实施例的目的,而非旨在限制本申请。在本申请和所附权利要求书中所使用的单数形式的“一种”、“所述”和“该”也旨在包括多数形式,除非上下文清楚地表示其他含义。还应当理解,本文中使用的术语“和/或”是指包含一个或多个相关联的列出项目的任何或所有可能组合。还应当理解,本文中所使用的词语“如果”,取决于语境,可以被解释成为“在……时”或“当……时”或“响应于确定”。The terminology used in this application is for the purpose of describing particular embodiments only, and is not intended to limit the application. As used in this application and the appended claims, the singular forms "a", "the", and "the" are intended to include the plural forms as well, unless the context clearly dictates otherwise. It should also be understood that the term "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items. It should also be understood that the word "if", as used herein, could be interpreted as "at" or "when" or "in response to a determination", depending on the context.
本申请涉及增强现实领域,通过获取现实环境中的目标对象的图像信息,进而借助各类视觉相关算法实现对目标对象的相关特征、状态及属性进行检测或识别处理,从而得到与具体应用匹配的虚拟与现实相结合的AR效果。示例性的,目标对象可涉及与人体相关的脸部、肢体、手势、动作等,或者与物体相关的标识物、标志物,或者与场馆或场所相关的沙盘、展示区域或展示物品等。视觉相关算法可涉及视觉定位、SLAM(Simultaneous Localization and Mapping)、三维重建、图像注册、背景分割、对象的关键点提取及跟踪、对象的位姿或深度检测等。具体应用不仅可以涉及跟真实场景或物品相关的导览、导航、讲解、重建、虚拟效果叠加展示等交互场景,还可以涉及与人相关的特效处理,比如妆容美化、肢体美化、特效展示、虚拟模型展示等交互场景。可通过卷积神经网络,实现对目标对象的相关特征、状态及属性进行检测或识别处理。前述卷积神经网络是基于深度学习框架进行模型训练而得到的网络模型。This application relates to the field of augmented reality. By obtaining the image information of the target object in the real environment, and then using various visual related algorithms to detect or identify the relevant characteristics, states and attributes of the target object, so as to obtain the matching specific application. AR effect combining virtual and reality. Exemplarily, the target object may involve faces, limbs, gestures, actions, etc. related to the human body, or markers and markers related to objects, or sand tables, display areas or display items related to venues or places. Vision-related algorithms may involve visual positioning, SLAM (Simultaneous Localization and Mapping), 3D reconstruction, image registration, background segmentation, object key point extraction and tracking, object pose or depth detection, etc. Specific applications can not only involve interactive scenes such as guided tours, navigation, explanations, reconstructions, virtual effect overlays and display related to real scenes or objects, but also special effects processing related to people, such as makeup beautification, body beautification, special effect display, virtual Interactive scenarios such as model display. The relevant features, states and attributes of the target object can be detected or identified through the convolutional neural network. The aforementioned convolutional neural network is a network model obtained through model training based on a deep learning framework.
This application proposes an image processing method. The method may use a foot key point detection model to perform key point detection on a foot region image corresponding to a foot, obtaining two-dimensional position information of first foot key points; then, based on the mapping relationship between that two-dimensional position information and preset position information, in three-dimensional space, of second foot key points in a preset three-dimensional foot model that correspond to the first foot key points, the three-dimensional pose of the foot in the three-dimensional space can be determined. Compared with general human-body key point detection techniques, neural network regression can be used to obtain two-dimensional position information of first foot key points suitable for pose estimation, which facilitates the subsequent determination of the three-dimensional pose of the foot in the three-dimensional space based on the mapping relationship.
A first foot key point is at least one key point in at least one preset foot region of the foot in the image. The preset foot region may be any region of the foot; for example, it may be the big toe region.
A second foot key point refers to a key point, among the foot key points included in the preset three-dimensional foot model, that lies in the same foot region as a first foot key point.
The method can be applied to an electronic device. The electronic device may execute the method by running a software apparatus corresponding to the image processing method. The electronic device may be a notebook computer, a desktop computer, a server, a mobile phone, a tablet (PAD) terminal, or the like; this application does not specifically limit its type. The electronic device may be a client-side or server-side device, and the server side may be a server or cloud service provided by a server, a server cluster, or a distributed server cluster.
Referring to FIG. 1, FIG. 1 is a flowchart of an image processing method shown in this application. As shown in FIG. 1, the method may include S102-S106.
S102: acquire a region image corresponding to a foot in an image to be processed.
The image to be processed may include a foot. The purpose of this application is to capture the foot in the image to be processed and obtain the three-dimensional pose of that foot.
In some embodiments, the image to be processed may be an image transmitted by a user through a client program, so that images uploaded by users can be processed.
In some embodiments, the image to be processed may also be an image collected by image acquisition hardware, for example a camera mounted on a mobile phone terminal. A user can collect a video stream in real time through the camera. The image to be processed may be a still picture or an image in the video stream. In this application, the foot can be captured in the image to be processed in real time, and three-dimensional pose estimation of the foot can be performed.
The foot may be a foot of a human body, or a foot of another animal, a robot, or the like. It may be a left foot or a right foot.
The region image may be the image corresponding to the foot region within the image to be processed.
In some embodiments, when the image to be processed is a still picture or the first frame of a video stream, the area enclosed by the detection box of the foot in the image to be processed may be determined as the region image. When the image to be processed is a non-first frame of the video stream, the method for acquiring the region image is described in subsequent embodiments and is not detailed here.
Referring to FIG. 2, FIG. 2 is a schematic flowchart of a region image acquisition method shown in this application.
As shown in FIG. 2, S21-S22 may be executed when performing S102.
S21: use an object detection model to perform object detection on the image to be processed, obtaining an object detection result that includes the detection box of the foot in the image to be processed.
In some embodiments, when performing S21, the image to be processed may be input into a trained object detection model to obtain the foot detection box corresponding to the foot included in the image to be processed.
The object detection model may be a model built on RCNN (Region-based Convolutional Neural Networks), Fast R-CNN, or Faster R-CNN.
In some embodiments, the object detection model can be trained as follows. A number of training samples annotated with the detection box information corresponding to feet are obtained; the model is then trained under supervision on these samples until it converges.
After training is completed, the object detection model can be used to detect feet and foot detection boxes in images.
S22: obtain the region image corresponding to the foot according to the detection box and the image to be processed.
In some embodiments, the foot detection box obtained by object detection and the image to be processed (or a feature map obtained by performing feature extraction on the image to be processed with a backbone network) may be input into a region feature extraction unit to obtain the region image.
The region feature extraction unit may be an ROI Align (Region of Interest Align) unit or an ROI Pooling (Region of Interest Pooling) unit. This unit crops out, from the image to be processed, the region image enclosed by the foot detection box.
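As a minimal sketch of this cropping step (not the patent's implementation; a plain array slice stands in for the ROI Align/Pooling unit, and the box coordinates are assumed to be pixel indices):

```python
import numpy as np

def crop_region(image: np.ndarray, box: tuple) -> np.ndarray:
    """Crop the region enclosed by a detection box (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    # Clamp the box to the image bounds before slicing.
    h, w = image.shape[:2]
    x0, x1 = max(0, x0), min(w, x1)
    y0, y1 = max(0, y0), min(h, y1)
    return image[y0:y1, x0:x1]

image = np.zeros((480, 640, 3), dtype=np.uint8)
region = crop_region(image, (100, 200, 300, 400))
print(region.shape)  # (200, 200, 3)
```

Unlike this slice, ROI Align resamples the box to a fixed output size on the feature map, which is why the real pipeline can feed boxes of varying size into a fixed-input network.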
Through the method of S21-S22, deep learning can be used to accurately crop the image to be processed and obtain the region image corresponding to the foot.
S104: use a foot key point detection model to perform key point detection on the region image, obtaining two-dimensional position information of the first foot key points of the foot.
The foot key point detection model includes a neural network model trained on multiple foot region image samples annotated with the position information of the first foot key points.
A first foot key point is at least one key point in at least one preset foot region of the foot in the image. The preset foot region may be any region of the foot; for example, it may be the big toe region.
The number of first foot key points and the preset foot regions can be set according to business requirements. In some embodiments, the first foot key points may include multiple key points on the edge contour of the foot, so that the foot contour can be well characterized by the first foot key points, improving the pose estimation of the foot.
The two-dimensional position information may indicate the two-dimensional position of a first foot key point in the image to be processed; in some embodiments, it may indicate the two-dimensional coordinates of the first foot key point in the image to be processed.
In some embodiments, the number of first foot key points may be no fewer than four. This increases the granularity of the foot key points and enriches the pose information of the foot, improving the pose estimation. In some embodiments, the number of first foot key points may be any number in the range of four to fifteen; by placing key points at the necessary foot positions, the number of first foot key points is kept within a suitable range, which also facilitates the subsequent three-dimensional foot pose detection.
In some embodiments, the first foot key points include key points of at least one of the following regions:
tip of the big toe; medial forefoot joint; medial arch of the foot; medial rear sole; back of the heel; lateral rear sole; lateral forefoot joint; junction of the instep and the leg; medial ankle joint; Achilles tendon area; lateral ankle joint.
In general, points in protruding and/or concave areas of the foot contour can be used as first foot key points, improving the accuracy with which the foot contour is characterized and thereby the pose estimation of the foot.
The foot key point detection model may be a regression or classification model built on a neural network or deep learning network, used to detect the two-dimensional position information of the first foot key points.
Take, as an example, a foot key point detection model that is a regression model built on a deep learning network. In some embodiments, it can be trained with training samples. Referring to FIG. 3, FIG. 3 is a schematic flowchart of a model training method shown in this application. As shown in FIG. 3, S31-S33 may be executed during model training.
S31: obtain multiple training samples. A training sample may be a foot image including a foot, annotated with the two-dimensional position information of the first foot key points.
S32: input the multiple training samples into the foot key point detection model, obtaining the estimated two-dimensional position information corresponding to each training sample.
S33: obtain loss information from the difference between the estimated two-dimensional position information and the pre-annotated two-dimensional position information of the first foot key points of each training sample, and use the loss information to update the parameters of the foot key point detection model by backpropagation.
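A toy sketch of the S31-S33 loop follows, with a single linear layer standing in for the network; the patent does not fix the architecture or the loss, so mean squared error and plain gradient descent are assumed here:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 64))   # S31: 32 samples, 64 features per region image
Y = rng.normal(size=(32, 8))    # annotations: 4 key points x (x, y) = 8 coordinates
W = np.zeros((64, 8))           # parameters of the stand-in linear "model"

for _ in range(200):
    pred = X @ W                          # S32: estimated 2D positions
    loss = ((pred - Y) ** 2).mean()       # S33: MSE between estimate and annotation
    grad = 2 * X.T @ (pred - Y) / Y.size  # gradient of the loss w.r.t. W
    W -= 0.5 * grad                       # parameter update ("backpropagation"
                                          # through the single layer)

print(loss)  # decreases toward zero as the model fits the annotations
```

In a real implementation the forward pass, loss, and update would be handled by a deep learning framework's autograd and optimizer rather than written by hand.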
After the model training is completed, the model is able to predict the two-dimensional position information of the first foot key points. When performing S104, the region image obtained in S102 can be input into the trained foot key point detection model to obtain the two-dimensional position information of the first foot key points of the foot.
S106: based on the mapping relationship between the two-dimensional position information and the preset position information, in three-dimensional space, of the second foot key points in the preset three-dimensional foot model that correspond to the first foot key points, determine the three-dimensional pose of the foot in the three-dimensional space.
The three-dimensional space is the space into which the foot is to be projected; it is usually set according to business requirements.
The preset three-dimensional foot model may be a three-dimensional model maintained in advance according to business requirements, and may include multiple foot key points.
A second foot key point is at least one key point, among the foot key points included in the preset three-dimensional foot model, in a foot region that is the same as one of the at least one foot region of the first foot key points. The foot region may be any region of the foot, for example the big toe region.
The preset position information may indicate the three-dimensional position of a second foot key point in the three-dimensional space when the preset three-dimensional foot model is in a standard pose. In some embodiments, the preset position information may be the three-dimensional coordinates of the second foot key point in the three-dimensional space in that standard pose.
The standard pose may be preset according to business requirements. In some embodiments, the pose at the origin of the three-dimensional coordinate system corresponding to the three-dimensional space and perpendicular to the plane formed by the X and Y axes of that coordinate system may be taken as the standard pose. The three-dimensional coordinates of the second foot key points when the three-dimensional foot model is in this standard pose are the preset position information.
The three-dimensional pose indicates the posture, in the three-dimensional space, of the foot in the image to be processed. In some embodiments, the three-dimensional pose may include the translation and rotation of the foot along the X, Y, and Z axes of the three-dimensional coordinate system.
When performing S106, a pose estimation algorithm can be applied to the mapping relationship between the two-dimensional position information of the first foot key points obtained in S104 (usually two-dimensional coordinates) and the preset position information, estimating the pose of the foot and obtaining its three-dimensional pose. The mapping algorithm may be PnP (Perspective-n-Point) or a similar algorithm; this application does not specifically limit the algorithm used to solve the mapping relationship.
Take the PnP algorithm as an example. It has two inputs: the two-dimensional coordinates of the first foot key points in the image to be processed, and the three-dimensional coordinates of the second foot key points in the three-dimensional space. Based on the mapping between the two-dimensional coordinates of each first foot key point and the three-dimensional coordinates of the corresponding second foot key point, the algorithm yields the three-dimensional pose of the foot in the three-dimensional space.
After the two-dimensional coordinates of the first foot key points are obtained, the three-dimensional coordinates of the second foot key points in the three-dimensional space can be acquired. The three-dimensional coordinates and the two-dimensional coordinates are then substituted into the solving formula of the PnP algorithm to obtain the three-dimensional pose of the foot in the three-dimensional space.
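To make the 2D-3D correspondence concrete, the sketch below projects model key points under a known pose with a pinhole camera; PnP solves the inverse problem, recovering the rotation and translation from such (2D, 3D) pairs. In practice a library routine such as OpenCV's solvePnP would be used; all numbers here are illustrative:

```python
import numpy as np

# Second foot key points: 3D coordinates in the model's standard pose (illustrative).
model_points = np.array([[0.0, 0.0, 0.0],
                         [0.1, 0.0, 0.0],
                         [0.0, 0.2, 0.0],
                         [0.0, 0.0, 0.3]])

# A known pose: identity rotation plus a translation of 2 units along Z (depth).
R = np.eye(3)
t = np.array([0.0, 0.0, 2.0])

# Pinhole intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])

cam = (R @ model_points.T).T + t     # model space -> camera space
proj = (K @ cam.T).T
pixels = proj[:, :2] / proj[:, 2:3]  # perspective divide -> 2D pixel coordinates

print(pixels[0])  # the model origin lands at the principal point: [320. 240.]
```

PnP takes `model_points` and `pixels` (plus `K`) and returns `R` and `t`; at least four non-degenerate correspondences are typically needed for a stable solution, which matches the "no fewer than four" key points discussed above.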
In the solution proposed in the foregoing embodiments, a foot key point detection model can be used to perform key point detection on the foot region image corresponding to the foot, obtaining the two-dimensional position information of the first foot key points; then, based on the mapping relationship between that two-dimensional position information and the preset position information, in three-dimensional space, of the second foot key points in the preset three-dimensional foot model that correspond to the first foot key points, the three-dimensional pose of the foot in the three-dimensional space can be determined. Compared with general human-body key point detection techniques, neural network regression can be used to obtain two-dimensional position information of first foot key points suitable for pose estimation, facilitating the subsequent determination of the three-dimensional pose of the foot in the three-dimensional space based on the mapping relationship.
In some embodiments, the image processing method shown in this application can also identify the type of the foot, i.e., whether it is a left foot or a right foot. In this case, the object detection result obtained through S21 includes, in addition to the detection box of the foot in the image to be processed, the type of the foot.
In some embodiments, when performing S21, an object detection model may be used to perform object detection on the image to be processed, obtaining the detection box of the foot included in the image to be processed together with the type of the foot. The object detection model can thus distinguish left and right feet while detecting the foot detection box in the image to be processed.
The object detection model includes a neural network model trained on multiple training samples annotated with the detection boxes and type information corresponding to feet; the type indicates whether the foot is a left foot or a right foot.
The object detection model may be a model built on RCNN (Region-based Convolutional Neural Networks), Fast R-CNN, or Faster R-CNN.
In some embodiments, the object detection model can be trained as follows. A number of training samples annotated with the detection box information and type information (i.e., left foot or right foot) corresponding to feet are obtained; the model is then trained under supervision on these samples until it converges.
After training is completed, the object detection model can be used to detect the foot detection box and the foot type in images.
In some embodiments, after S102, S103 may also be executed: in response to the foot being of a preset type, flip the region image so that the feet in all region images input into the foot key point detection model are of the same type (i.e., all left feet or all right feet), which facilitates processing by the foot key point detection model.
The preset type can be set according to business requirements. In some embodiments, the foot key point detection model may be trained on left-foot samples, in which case the preset type can be set to right foot. When performing S103, if the foot is recognized as a right foot, the region image corresponding to the foot is flipped so that the foot in the region image becomes left-foot type, which facilitates processing by the foot key point detection model.
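The flip in S103 can be a simple horizontal mirror of the region image. A minimal sketch (the patent does not fix the flip operation; the function name and the string-valued foot types are illustrative):

```python
import numpy as np

def normalize_foot(region: np.ndarray, foot_type: str,
                   preset_type: str = "right") -> np.ndarray:
    """Mirror the region image horizontally when the detected foot is the
    preset type, so every image fed to the key point model shows the same
    foot type (here: everything becomes left-foot-like)."""
    if foot_type == preset_type:
        return region[:, ::-1]  # horizontal flip: reverse the column axis
    return region

region = np.arange(6).reshape(2, 3)  # stand-in 2x3 "image"
flipped = normalize_foot(region, "right")
print(flipped.tolist())  # [[2, 1, 0], [5, 4, 3]]
```

Note that after flipping, the predicted key point x-coordinates would need to be mirrored back (x -> width - 1 - x) before the pose estimation of S106 if the pose is to be expressed in the original image frame.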
In some embodiments, image processing needs to be performed on the images of a captured video stream. To capture the same foot across the video stream, foot tracking is required. In some embodiments, an object detection model can be used to perform object detection on each frame of the video stream, obtaining the foot detection boxes in the stream, and the same foot can then be identified from the positions of the detected boxes for tracking.
It is not hard to see that this foot tracking approach requires object detection on every frame. Since the object detection model has a relatively complex structure, a large amount of computation, and high overhead, its tracking efficiency is low, which may make the image processing method of this application perform poorly in real time.
To solve this problem, in some embodiments, the fact that the position of a foot does not change significantly between adjacent frames can be exploited to track the foot in the video stream, reducing the computation brought by tracking with the object detection model, lowering overhead, and improving tracking efficiency, thereby improving the real-time performance of the image processing method.
Referring to FIG. 4, FIG. 4 is a schematic flowchart of a foot tracking method shown in this application. As shown in FIG. 4, S41-S43 may be executed to implement foot tracking.
S41: acquire the region image corresponding to the foot in the image to be processed.
The image to be processed is an image in a video stream; it may be the first frame of the stream or an image after the first frame.
If the image to be processed is the first frame of the video stream, the region image can be acquired through steps S21-S22.
If the image to be processed is an image after the first frame, then when performing S41, the first foot key point position information of the foot in the frame preceding the image to be processed in the video stream may be acquired, and a foot key point box is determined from the acquired position information. Based on the foot key point box and the image to be processed, the region image corresponding to the foot is obtained.
This yields the region of the image to be processed at the same position as the foot in the previous frame.
The foot key point box may be a key point box of any shape. In some embodiments, it may be a rectangular box. When performing S41, the cached two-dimensional position information of the first foot key points in the frame preceding the image to be processed can be acquired. From this two-dimensional position information, the maximum coordinates x0, y0 and the minimum coordinates x1, y1 along the X and Y axes can be determined. The rectangular box with the four vertices (x0, y0), (x0, y1), (x1, y0), and (x1, y1) can then be used as the foot key point box.
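A minimal sketch of deriving the rectangular key point box from the previous frame's key points (coordinate names follow the text above, where (x0, y0) are the per-axis maxima and (x1, y1) the minima; the sample coordinates are illustrative):

```python
import numpy as np

def keypoint_box(points: np.ndarray) -> tuple:
    """Axis-aligned box over 2D key points: returns (x0, y0, x1, y1),
    with (x0, y0) the per-axis maxima and (x1, y1) the minima."""
    x0, y0 = points.max(axis=0)
    x1, y1 = points.min(axis=0)
    return (int(x0), int(y0), int(x1), int(y1))

# First foot key points from the previous frame (illustrative pixel coordinates).
points = np.array([[120, 310], [180, 290], [160, 350], [135, 335]])
print(keypoint_box(points))  # (180, 350, 120, 290)
```

In practice this box would often be expanded by a small margin before cropping, since the foot may have shifted slightly between the two frames.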
After the foot key point box is obtained, the foot key point box and the image to be processed (or a feature map obtained by performing feature extraction on the image to be processed with a backbone network) may be input into the region feature extraction unit to obtain the region image.
The region feature extraction unit may be an ROI Align (Region of Interest Align) unit or an ROI Pooling (Region of Interest Pooling) unit. This unit crops out, from the image to be processed, the region image enclosed by the foot key point box.
In some embodiments, after the region image corresponding to the foot is obtained based on the foot key point box and the image to be processed, the type of the foot in the previous frame, stored when object detection or classification was performed on the region image of that previous frame, may also be acquired; in response to the type of the foot in the previous frame being the preset type, the region image is flipped so that the feet in all region images input into the foot key point detection model are of the same type (i.e., all left feet or all right feet), facilitating processing by the foot key point detection model.
S42: use an image classification model to classify the region image, obtaining a classification result for the region image.
The classification result indicates whether the region image is a foot image.
The image classification model may include a convolutional neural network. To train it, multiple image samples annotated with image classification information can be obtained, where the classification information indicates whether the sample is a foot image; the model can then be trained under supervision on these samples.
After training is completed, the image classification model can be used to classify the region image and obtain the classification result.
S43: in response to the classification result indicating that the region image is a foot image, determine that the foot in the region image and the foot in the previous frame are the same foot, so as to perform foot tracking.
Since the region image obtained in S41 is the region of the image to be processed at the same position as the foot in the previous frame, a classification result indicating a foot image means that a foot is also present in the image to be processed at the same position as in the previous frame. Given that the position of the same foot does not change significantly over a few adjacent frames, it can be determined that the foot in the region image and the foot in the previous frame are the same foot. This realizes foot tracking in the video stream, reduces the computation brought by tracking with the object detection model, lowers overhead, and improves tracking efficiency, thereby improving the real-time performance of the image processing method.
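The S41-S43 control flow can be sketched as follows; all function names and signatures here are hypothetical, and the detector, classifier, and key point model are stubbed out as callables so the saving from skipping per-frame detection is visible:

```python
def track_frame(frame, prev_box, detect, classify, predict_box):
    """One tracking step: reuse the previous frame's key point box when the
    classifier confirms a foot there (S42-S43); otherwise fall back to full
    object detection (S21-S22)."""
    if prev_box is None or not classify(frame, prev_box):
        box = detect(frame)         # first frame, or tracking failure
    else:
        box = prev_box              # same foot: keep tracking, no detection
    return predict_box(frame, box)  # S104 runs on this region; its key points
                                    # yield the box used for the next frame

# Stub models: each "frame" is a dict saying whether a foot is still present
# at the tracked position.
detect_calls = []
detect = lambda f: (detect_calls.append(f) or "detector_box")
classify = lambda f, b: f["foot_at_prev_pos"]
predict_box = lambda f, b: b

frames = [{"foot_at_prev_pos": False},  # first frame -> detector runs
          {"foot_at_prev_pos": True},   # tracked, detector skipped
          {"foot_at_prev_pos": True},   # tracked, detector skipped
          {"foot_at_prev_pos": False}]  # tracking lost -> detector runs again
box = None
for frame in frames:
    box = track_frame(frame, box, detect, classify, predict_box)
print(len(detect_calls))  # 2: the heavy detector ran twice, not once per frame
```

The lightweight classifier thus gates the expensive detector, which is the source of the efficiency gain described above.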
在脚部跟踪的过程中,还可以继续执行S104,利用脚部关键点检测模型,对所述区域图像进行关键点检测,得到所述脚部的第一脚部关键点的二维位置信息。在此不对S104进行详细说明。In the process of foot tracking, S104 may be continued to perform key point detection on the region image by using the foot key point detection model to obtain two-dimensional position information of the first foot key point of the foot. S104 will not be described in detail here.
完成脚部跟踪后,在执行S106时,可以响应于确定所述区域图像中的脚部与所述前一帧图像中的脚部为同一脚部,基于预设脚部三维模型中与所述第一脚部关键点对应的第二脚部关键点在三维空间中的预设位置信息和所述二维位置信息之间的映射关系,确定所述脚部在所述三维空间中的三维位姿。在此不对位姿估计的步骤信息详述。After the foot tracking is completed, when executing S106, it may be determined that the foot in the region image is the same foot as the foot in the previous frame image, based on the preset three-dimensional model of the foot and the foot in the The mapping relationship between the preset position information of the second foot key point corresponding to the first foot key point in the three-dimensional space and the two-dimensional position information determines the three-dimensional position of the foot in the three-dimensional space posture. The step information of pose estimation is not described in detail here.
如果所述分类结果指示所述区域图像不是脚部图像,则可以确定脚部跟踪失败,即可将所述待处理图像作为视频中跟踪失败的图像,利用S21-S22的步骤,获取区域图像,然后通过S104与S106,得到脚部三维位姿。If the classification result indicates that the area image is not a foot image, it can be determined that the foot tracking has failed, that is, the image to be processed can be used as an image of a video tracking failure, and the area image is acquired by using the steps of S21-S22, Then through S104 and S106, the three-dimensional pose of the foot is obtained.
In some embodiments, the image classification model and the foot key point detection model share a feature extraction network, for example a backbone network.
The image classification model and the foot key point detection model can be trained jointly. Joint training improves the feature extraction network's ability to extract features, so that feature information useful for both classification and key point detection is extracted, improving the classification and key point detection results.
In some embodiments, first image samples annotated with image classification information and second image samples annotated with first foot key point position information can be obtained. A first image sample and a second image sample may be the same image. The classification information indicates whether the first image sample is a foot image.
The first image sample can then be input into the image classification model to obtain a classification prediction result, and first loss information is obtained from the classification prediction result and the annotated image classification information. Likewise, the second image sample is input into the foot key point detection model to obtain a first foot key point position prediction result, and second loss information is obtained from the predicted and annotated first foot key point positions.
The model parameters of the image classification model and the foot key point detection model can then be adjusted based on the first loss information and the second loss information. In some embodiments, the foregoing training steps are iterated until both models converge.
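The two-loss scheme above can be sketched as follows. This is an illustrative NumPy toy model, not the patented implementation: a shared layer stands in for the feature extraction network, a classification head and a key point head consume the same features, and the first and second losses are combined (the weights are assumed) so that one parameter update drives the shared network from both tasks:

```python
import numpy as np

rng = np.random.default_rng(0)

# Shared "feature extraction network" (a single tanh layer here, purely
# for illustration; in the patent this is a neural network backbone).
W_shared = rng.normal(size=(8, 4))
W_cls = rng.normal(size=(4, 1))   # classification branch head
W_kpt = rng.normal(size=(4, 2))   # key point regression branch head

def forward(x):
    feat = np.tanh(x @ W_shared)  # features shared by both branches
    return feat @ W_cls, feat @ W_kpt

def joint_loss(cls_loss, kpt_loss, w_cls=1.0, w_kpt=1.0):
    # First loss (classification) and second loss (key points) combined;
    # a gradient step on this sum updates W_shared through both branches
    # at once. The weights are illustrative assumptions.
    return w_cls * cls_loss + w_kpt * kpt_loss

logit, kpts = forward(rng.normal(size=(3, 8)))
assert logit.shape == (3, 1) and kpts.shape == (3, 2)
```

Because both heads read the same feature tensor, either loss term backpropagates into the shared parameters, which is the mechanism behind the mutual reinforcement described in the next paragraph.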
Because the image classification model and the foot key point detection model share a feature extraction network, training either model affects the training of the other, so the two training processes complement and reinforce each other, realizing joint training of the two models. Through this joint training, on the one hand, the feature extraction network's ability to extract feature information useful for classification and key point detection is improved, improving the classification and key point detection results; on the other hand, the mutual reinforcement between the two training processes improves training efficiency.
In some embodiments, after S106 is completed and the three-dimensional pose of the foot in the image to be processed has been obtained, virtual shoe fitting may be performed.
In some embodiments, a three-dimensional virtual model of the shoe material is obtained first. Then, based on the three-dimensional pose of the foot in the image to be processed, the three-dimensional virtual model is superimposed at the position in the image corresponding to the foot, producing an augmented reality effect of virtual shoe fitting.
The three-dimensional virtual model may be used to indicate the outline and/or texture colors of the shoe material. In some embodiments, the three-dimensional virtual model includes the three-dimensional coordinates, in three-dimensional space, of each vertex of the shoe material, together with the pixel value of each vertex.
By superimposing the three-dimensional virtual model at the position corresponding to the foot, the shoe material can be displayed in the image to be processed, achieving the augmented reality effect of virtual shoe fitting.
Referring to FIG. 5, FIG. 5 is a schematic diagram of a virtual shoe fitting process shown in the present application.
As shown in FIG. 5, the method may include S51-S54.
S51: Obtain the initial pose, in three-dimensional space, of the three-dimensional virtual model corresponding to the shoe material.
The initial pose indicates the attitude information of the shoe material in three-dimensional space. In some embodiments, the initial pose of the shoe material in three-dimensional space may be maintained in advance in a database; performing S51 then amounts to fetching the initial pose from that database.
S52: Based on the three-dimensional pose of the foot, transform the initial pose to match the three-dimensional pose corresponding to the foot, obtaining a transformed three-dimensional pose.
Pose transformation information for the initial pose can be determined from the three-dimensional pose of the foot. The pose transformation information may indicate the amounts of translation and rotation of the initial pose with respect to the X, Y and Z axes of the three-dimensional coordinate system.
The transformed three-dimensional pose may indicate the pose information of the shoe material after it has been transformed to the pose of the foot.
In some embodiments, the pose transformation information includes translation and rotation information. When S52 is performed, the initial pose may be translated and rotated according to the pose transformation information, yielding the pose information of the shoe material after it has been translated and rotated to the pose of the foot.
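As an illustrative sketch of the translate-and-rotate step (the rotation matrix and translation vector below stand in for the pose transformation information; the patent does not fix a particular representation):

```python
import numpy as np

def apply_pose(R, t, vertices):
    # R: 3x3 rotation matrix, t: translation vector, both derived from
    # the pose transformation information; vertices: Nx3 vertex
    # coordinates of the shoe material at its initial pose.
    return vertices @ R.T + t

# Example: a 90-degree rotation about the Z axis, then a translation.
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0,  0.0, 0.0],
               [0.0,  0.0, 1.0]])
t = np.array([0.0, 0.0, 2.0])
moved = apply_pose(Rz, t, np.array([[1.0, 0.0, 0.0]]))
assert np.allclose(moved, [[0.0, 1.0, 2.0]])
```

Applying the same rigid transform to every vertex moves the whole shoe model into the pose of the foot.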
S53: Based on the transformed three-dimensional pose, map the three-dimensional virtual model onto the image to be processed to obtain a two-dimensional virtual material corresponding to the shoe material.
When S53 is performed, the three-dimensional coordinates of the vertices of the three-dimensional virtual model may first be adjusted according to the transformed three-dimensional pose, giving an adjusted three-dimensional virtual model. A projection algorithm then projects the three-dimensional coordinates of the adjusted vertices onto the two-dimensional plane in which the image to be processed lies, giving the two-dimensional coordinates of each vertex. The shape of the two-dimensional virtual material corresponding to the shoe material is determined from these two-dimensional coordinates, and its texture and colors are determined from the pixel values of the vertices (taken from the vertices of the three-dimensional virtual model), thereby obtaining the two-dimensional virtual material.
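The projection step can be illustrated with a pinhole camera model, one common choice of projection algorithm (the intrinsic matrix K below is an assumed example, not a value from the patent):

```python
import numpy as np

def project(points_3d, K):
    # Perspective projection of camera-space vertices onto the image
    # plane: multiply by the intrinsics K, then divide by depth.
    p = points_3d @ K.T
    return p[:, :2] / p[:, 2:3]

K = np.array([[500.0,   0.0, 320.0],   # fx, 0, cx  (assumed values)
              [  0.0, 500.0, 240.0],   # 0, fy, cy
              [  0.0,   0.0,   1.0]])
# A vertex on the optical axis projects to the principal point.
uv = project(np.array([[0.0, 0.0, 1.0]]), K)
assert np.allclose(uv, [[320.0, 240.0]])
```

The projected 2D coordinates give the shape of the two-dimensional virtual material, while each vertex carries its pixel value along from the three-dimensional model.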
S54: Fuse the two-dimensional virtual material with the position corresponding to the foot in the image, obtaining a fused image for displaying the AR effect of virtual shoe fitting.
In some embodiments, image fusion can be completed by covering the pixels at the position corresponding to the foot with the pixels of the two-dimensional virtual material, or by adjusting the transparency of the pixels at that position, among other approaches. Outputting the fused image then displays the AR effect of virtual shoe fitting.
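Both fusion strategies mentioned above (covering the foot pixels outright, or adjusting their transparency) are special cases of per-pixel alpha compositing; a minimal sketch with assumed array shapes:

```python
import numpy as np

def blend(background, overlay, alpha):
    # Per-pixel alpha compositing: alpha=1 fully covers the foot pixels
    # with the rendered shoe material; alpha<1 leaves them translucent.
    a = alpha[..., None]  # broadcast (H, W) alpha over the color channels
    return a * overlay + (1.0 - a) * background

bg = np.zeros((2, 2, 3))          # stand-in for the image to be processed
fg = np.full((2, 2, 3), 255.0)    # stand-in for the 2D virtual material
a = np.array([[1.0, 0.0],
              [0.5, 0.5]])
out = blend(bg, fg, a)
assert np.allclose(out[0, 0], 255.0) and np.allclose(out[0, 1], 0.0)
```

With alpha set to 1 wherever the shoe material is opaque, this reduces to the pixel-covering scheme; intermediate alpha values implement the transparency adjustment.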
An embodiment is described below in the context of a virtual shoe fitting scenario.
In a virtual shoe fitting scenario, shoes in three-dimensional space can be fused (for example, by image rendering) with the feet captured in a video stream, completing the virtual shoe fitting.
A virtual shoe fitting client may run on a mobile terminal. The mobile terminal may be equipped with a camera for capturing a video stream in real time.
A virtual shoe library may be installed locally on the mobile terminal (hereinafter, the terminal) or on a server corresponding to the virtual shoe fitting client (hereinafter, the client). The virtual shoe library may include three-dimensional virtual models of various developed shoe materials, and may be any type of database.
In the virtual shoe fitting client, the user can select shoes to try on from the virtual shoe library and capture a foot video stream through the camera.
Referring to FIG. 6, FIG. 6 is a schematic flowchart of a virtual shoe fitting method shown in the present application. As shown in FIG. 6, the method may include S601-S611.
S601 may be performed on a first image in the video stream, where the first image is the first frame of the video stream or an image following a foot tracking failure determined according to steps S607-S610. S601: Use a pre-trained object detection model to perform object detection on the first image, obtaining the foot detection box in the first image and the type of the foot, i.e., whether the foot is a left foot or a right foot. In the following, the foot is assumed to be the right foot.
S602: According to the foot detection box, obtain a first region image of the foot in the first image.
S603: In response to the foot being a right foot, flip the first region image so that it becomes a left-foot image, which is convenient for the foot key point detection model to detect. Here, the foot key point detection model is trained on left-foot image samples annotated with the position information of the first foot key points.
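The flip in S603 is a simple left-right mirroring of the cropped region image; for example:

```python
import numpy as np

def flip_lr(region):
    # Mirror the region image horizontally so that a right-foot crop
    # looks like a left-foot image to the left-foot-trained key point
    # detection model.
    return region[:, ::-1]

img = np.array([[1, 2, 3],
                [4, 5, 6]])
assert flip_lr(img).tolist() == [[3, 2, 1], [6, 5, 4]]
```

In practice the detected key point x-coordinates would then be mirrored back into the original crop's coordinate frame; the patent leaves this detail implicit.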
The first foot key points include key points of at least some of the following foot regions: the tip of the big toe; the medial forefoot joint; the medial arch; the medial rear sole; the back of the heel; the lateral rear sole; the lateral forefoot joint; the junction of the instep and the leg; the medial ankle joint; the Achilles tendon; and the lateral ankle joint. These key points describe the foot contour in fine-grained detail and carry a large amount of three-dimensional information about the foot, improving the accuracy of pose estimation.
S604: Obtain the position information of the first foot key points of the foot using the foot key point detection model.
S605: From the position information of the first foot key points, obtain the three-dimensional pose of the foot in three-dimensional space using the PnP (Perspective-n-Point) algorithm.
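The PnP step in S605 can be illustrated as follows (a NumPy sketch; in practice a solver such as OpenCV's `cv2.solvePnP` performs the minimization). PnP seeks the rotation R and translation t that minimize the reprojection error between the projected model key points (the second foot key points of the preset three-dimensional foot model) and the detected two-dimensional key points; the camera intrinsics K below are assumed:

```python
import numpy as np

def reprojection_residual(R, t, K, pts3d, pts2d):
    # PnP seeks the (R, t) minimizing this residual between the
    # projected 3D foot-model key points and the detected 2D key points.
    cam = pts3d @ R.T + t
    proj = cam @ K.T
    uv = proj[:, :2] / proj[:, 2:3]
    return np.abs(uv - pts2d).max()

K = np.array([[400.0,   0.0, 160.0],
              [  0.0, 400.0, 120.0],
              [  0.0,   0.0,   1.0]])   # assumed intrinsics
R_true, t_true = np.eye(3), np.array([0.0, 0.0, 3.0])
pts3d = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0],
                  [0.0, 0.1, 0.0], [0.1, 0.1, 0.1]])
# Synthesize the 2D observations from the ground-truth pose.
obs = (pts3d @ R_true.T + t_true) @ K.T
pts2d = obs[:, :2] / obs[:, 2:3]
# The ground-truth pose reproduces the observed key points exactly.
assert reprojection_residual(R_true, t_true, K, pts3d, pts2d) < 1e-9
```

This also shows why at least four well-spread key points are required: with fewer correspondences the pose is not uniquely constrained.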
S606: Obtain from the virtual shoe library the three-dimensional virtual model of the shoe corresponding to the right foot and, according to the obtained three-dimensional virtual model and the three-dimensional pose of the foot, fuse the shoe with the foot to display the virtual shoe fitting effect. For details of S606, refer to S51-S54; they are not repeated here.
S607 may be performed on a second image in the video stream, where the second image may be any non-first frame after the first frame of the video stream. S607: Obtain the first foot key point position information of the frame preceding the second image, and determine a foot key point box based on that position information.
S608: According to the foot key point box, crop a second region image out of the second image.
S609: In response to the foot being a right foot, flip the second region image so that it becomes a left-foot image, which is convenient for the foot tracking model to detect. Here, the foot tracking model includes a classification branch and a key point detection branch, and is obtained by joint training on left-foot image samples annotated with first foot key point position information and left-foot image samples annotated with image classification information.
S610: Determine whether the second region image is a foot image according to the classification branch of the pre-trained foot tracking model, and determine the position information of the first foot key points of the foot according to the key point branch of the pre-trained foot tracking model.
If, based on the image classification information, the second region image is judged to be a foot image, the foot in the second image can be determined to be the foot that appeared in the preceding frame, completing foot tracking; S605 and S606 are then performed to display the virtual shoe fitting effect.
If, based on the image classification information, the second region image is judged not to be a foot image, foot tracking has failed; the second image can then be treated as the first image and S601-S606 performed to obtain the three-dimensional pose of the foot in that image.
In the foregoing solution, on the one hand, the foot key point detection model can be used to regress the position information of the first foot key points; the three-dimensional pose of the foot can then be obtained from the mapping of the first foot key points from two-dimensional to three-dimensional space; afterwards, based on the three-dimensional pose of the foot in the image to be processed, the three-dimensional virtual model corresponding to the shoe is superimposed at the position corresponding to the foot in the first or second image, producing the augmented reality effect of virtual shoe fitting.
On the other hand, exploiting the fact that the position of the foot does not change significantly between adjacent frames, the foot can be tracked in the video stream, reducing the computation incurred by tracking with an object detection model, lowering overhead, and improving tracking efficiency, thereby improving the real-time performance of the virtual shoe fitting method.
Corresponding to the foregoing embodiments, the present application proposes an image processing apparatus 70.
Referring to FIG. 7, FIG. 7 is a schematic structural diagram of an image processing apparatus shown in the present application.
As shown in FIG. 7, the apparatus 70 may include:
a first obtaining module 71, configured to obtain a region image corresponding to a foot in an image to be processed;
a key point detection module 72, configured to perform key point detection on the region image using a foot key point detection model to obtain two-dimensional position information of first foot key points of the foot; and
a determining module 73, configured to determine the three-dimensional pose of the foot in three-dimensional space based on a mapping between the two-dimensional position information and preset position information, in three-dimensional space, of second foot key points in a preset three-dimensional foot model corresponding to the first foot key points.
In some embodiments, the first obtaining module 71 is specifically configured to:
perform object detection on the image to be processed using an object detection model to obtain an object detection result, the object detection result including a detection box of the foot in the image to be processed; and
obtain the region image corresponding to the foot according to the detection box and the image to be processed.
In some embodiments, the object detection result further includes the type of the foot, the type indicating whether the foot is a left foot or a right foot.
The apparatus 70 further includes:
a first flipping module, configured to flip the region image in response to the foot being of a preset type, so that the type of foot is consistent across all region images input to the foot key point detection model.
In some embodiments, the first obtaining module 71 is configured to:
obtain the first foot key point position information of the foot in the frame of the video stream preceding the image to be processed, and determine a foot key point box according to the obtained position information; and
obtain the region image corresponding to the foot based on the foot key point box and the image to be processed.
In some embodiments, the apparatus 70 further includes:
a second obtaining module, configured to obtain the stored type of the foot in the preceding frame; and
a second flipping module, configured to flip the region image in response to the type of the foot in the preceding frame being a preset type, so that the type of foot is consistent across all region images input to the foot key point detection model.
In some embodiments, the image to be processed is an image in a video stream, the video stream further including the frame preceding the image to be processed; the region image of the image to be processed is determined according to a foot key point box determined from the first foot key points in the preceding frame.
The apparatus 70 further includes:
a tracking module, configured to classify the region image using an image classification model to obtain a classification result of the region image, the classification result indicating whether the region image is a foot image; and
in response to the classification result indicating that the region image is a foot image, determine that the foot in the region image and the foot in the preceding frame are the same foot, so as to perform foot tracking.
In some embodiments, the determining module 73 is specifically configured to:
in response to determining that the foot in the region image and the foot in the preceding frame are the same foot, determine the three-dimensional pose of the foot in three-dimensional space based on the mapping between the two-dimensional position information and the preset position information, in three-dimensional space, of the second foot key points in the preset three-dimensional foot model corresponding to the first foot key points.
In some embodiments, the image classification model and the foot key point detection model share a feature extraction network, and the apparatus 70 further includes:
a joint training module for the image classification model and the foot key point detection model, configured to obtain first image samples annotated with image classification information and second image samples annotated with first foot key point position information;
input the first image sample into the image classification model to obtain a classification prediction result, and obtain first loss information according to the classification prediction result and the annotated image classification information;
input the second image sample into the foot key point detection model to obtain a first foot key point position prediction result, and obtain second loss information according to the first foot key point position prediction result and the annotated first foot key point position information; and
adjust the model parameters of the image classification model and the foot key point detection model based on the first loss information and the second loss information.
In some embodiments, the first foot key points include a plurality of key points on the edge contour of the foot, and the number of first foot key points is no fewer than four.
In some embodiments, the first foot key points include key points of at least one of the following regions:
the tip of the big toe; the medial forefoot joint; the medial arch; the medial rear sole; the back of the heel; the lateral rear sole; the lateral forefoot joint; the junction of the instep and the leg; the medial ankle joint; the Achilles tendon; and the lateral ankle joint.
In some embodiments, the apparatus 70 further includes:
a virtual shoe fitting module, configured to obtain a three-dimensional virtual model of the shoe material; and
superimpose the three-dimensional virtual model at the position corresponding to the foot in the image to be processed, based on the three-dimensional pose of the foot in the image to be processed, obtaining an augmented reality effect of virtual shoe fitting.
In some embodiments, the virtual shoe fitting module is specifically configured to:
obtain the initial pose, in three-dimensional space, of the three-dimensional virtual model corresponding to the shoe material;
transform the initial pose, based on the three-dimensional pose, to match the three-dimensional pose corresponding to the foot, obtaining a transformed three-dimensional pose;
map the three-dimensional virtual model onto the image to be processed based on the transformed three-dimensional pose, obtaining a two-dimensional virtual material corresponding to the shoe material; and
fuse the two-dimensional virtual material with the position corresponding to the foot, obtaining a fused image for displaying the AR effect of virtual shoe fitting.
Embodiments of the image processing apparatus shown in this application can be applied to an electronic device. Accordingly, the present application discloses an electronic device, which may include: a processor; and
a memory for storing processor-executable instructions;
wherein the processor is configured to call the executable instructions stored in the memory to implement the image processing method shown in any of the foregoing embodiments.
Referring to FIG. 8, FIG. 8 is a schematic diagram of the hardware structure of an electronic device shown in the present application.
As shown in FIG. 8, the electronic device may include a processor for executing instructions, a network interface for network connections, a memory for storing runtime data for the processor, and a non-volatile memory for storing instructions corresponding to the image processing apparatus.
The apparatus embodiments may be implemented in software, in hardware, or in a combination of software and hardware. Taking software implementation as an example, the apparatus, as a logical entity, is formed by the processor of the electronic device in which it resides reading the corresponding computer program instructions from the non-volatile memory into memory and running them. At the hardware level, in addition to the processor, memory, network interface and non-volatile memory shown in FIG. 8, the electronic device in which the apparatus resides may, depending on its actual functions, include other hardware, which is not detailed here.
It can be understood that, to increase processing speed, the instructions corresponding to the apparatus may also be stored directly in memory, which is not limited herein.
The present application proposes a computer-readable storage medium storing a computer program, the computer program being usable to cause a processor to execute the image processing method shown in any of the foregoing embodiments.
Those skilled in the art should understand that one or more embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, one or more embodiments of the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, one or more embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, optical storage, and the like) containing computer-usable program code.
"And/or" as used in this application means at least one of the two; for example, "A and/or B" covers three cases: A, B, and both A and B.
The embodiments in this application are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. In particular, the data processing device embodiment is described relatively simply because it is substantially similar to the method embodiment; for relevant details, refer to the description of the method embodiment.
Specific embodiments of the present application have been described above. Other implementations are within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve desirable results. In addition, the processes depicted in the figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In certain implementations, multitasking and parallel processing are also possible or may be advantageous.
Embodiments of the subject matter and functional operations described in this application may be implemented in digital electronic circuitry, in tangibly embodied computer software or firmware, in computer hardware including the structures disclosed in this application and their structural equivalents, or in a combination of one or more of these. Embodiments of the subject matter described in this application may be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, such as a machine-generated electrical, optical or electromagnetic signal, generated to encode information for transmission to a suitable receiver apparatus for execution by a data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random- or serial-access memory device, or a combination of one or more of these.
The processes and logic flows described in this application may be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows may also be performed by special-purpose logic circuitry, such as an FPGA (field-programmable gate array) or an ASIC (application-specific integrated circuit), and the apparatus may also be implemented as such circuitry.
适合用于执行计算机程序的计算机包括,例如通用和/或专用微处理器,或任何其他类型的中央处理系统。通常,中央处理系统将从只读存储器和/或随机存取存储器接收指令和数据。计算机的基本组件包括用于实施或执行指令的中央处理系统以及用于存储指令和数据的一个或多个存储器设备。通常,计算机还将包括用于存储数据的一个或多个大容量存储设备,例如磁盘、磁光盘或光盘等,或者计算机将可操作地与此大容量存储设备耦接以从其接收数据或向其传送数据,抑或两种情况兼而有之。然而,计算机不是必须具有这样的设备。此外,计算机可以嵌入在另一设备中,例如移动电话、个人数字助理(PDA)、移动音频或视频播放器、游戏操纵台、全球定位系统(GPS)接收机、或例如通用串行总线(USB)闪存驱动器的便携式存储设备,仅举几例。Computers suitable for the execution of a computer program include, for example, general and/or special purpose microprocessors, or any other type of central processing system. Typically, a central processing system will receive instructions and data from read only memory and/or random access memory. The basic components of a computer include a central processing system for implementing or executing instructions and one or more memory devices for storing instructions and data. Typically, a computer will also include, or be operatively coupled to, one or more mass storage devices for storing data, such as magnetic or magneto-optical disks, or optical disks, to receive data therefrom or to It transmits data, or both. However, a computer is not required to have such a device. In addition, a computer may be embedded in another device such as a mobile phone, a personal digital assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a device such as a Universal Serial Bus (USB) ) portable storage devices like flash drives, to name a few.
适合于存储计算机程序指令和数据的计算机可读介质包括所有形式的非易失性存储器、媒介和存储器设备,例如包括半导体存储器设备(例如EPROM、EEPROM和闪存设备)、磁盘(例如内部硬盘或可移动盘)、磁光盘以及0xCD_00ROM和DVD-ROM盘。处理器和存储器可由专用逻辑电路补充或并入专用逻辑电路中。Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media, and memory devices, including, for example, semiconductor memory devices (such as EPROM, EEPROM, and flash memory devices), magnetic disks (such as internal hard disks or removable disk), magneto-optical disk, and 0xCD_00ROM and DVD-ROM disks. The processor and memory can be supplemented by, or incorporated in, special purpose logic circuitry.
虽然本申请包含许多具体实施细节,但是这些不应被解释为限制任何公开的范围或所要求保护的范围,而是主要用于描述特定公开的具体实施例的特征。本申请内在多个实施例中描述的某些特征也可以在单个实施例中被组合实施。另一方面,在单个实施例中描述的各种特征也可以在多个实施例中分开实施或以任何合适的子组合来实施。此外,虽然特征可以如上所述在某些组合中起作用并且甚至最初如此要求保护,但是来自所要求保护的组合中的一个或多个特征在一些情况下可以从该组合中去除,并且所要求保护的组合可以指向子组合或子组合的变型。While this application contains many specific implementation details, these should not be construed as limitations on the scope of any disclosure or of what may be claimed, but rather as primarily describing features of particular disclosed embodiments. Certain features that are described in this application in multiple embodiments can also be implemented in combination in a single embodiment. On the other hand, various features that are described in a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may function in certain combinations as described above and even be initially so claimed, one or more features from a claimed combination may in some cases be removed from that combination and the claimed A protected combination can point to a subcombination or a variant of a subcombination.
类似地,虽然在附图中以特定顺序描绘了操作,但是这不应被理解为要求这些操作以所示的特定顺序执行或顺次执行、或者要求所有例示的操作被执行,以实现期望的结果。在某些情况下,多任务和并行处理可能是有利的。此外,所述实施例中的各种系统模块和组件的分离不应被理解为在所有实施例中均需要这样的分离,并且应当理解,所描述的程序组件和系统通常可以一起集成在单个软件产品中,或者封装成多个软件产品。Similarly, while operations are depicted in the figures in a particular order, this should not be construed as requiring that those operations be performed in the particular order shown, or sequentially, or that all illustrated operations be performed, to achieve the desired result. In some cases, multitasking and parallel processing may be advantageous. Furthermore, the separation of the various system modules and components in the described embodiments should not be construed as requiring such separation in all embodiments, and it should be understood that the described program components and systems can often be integrated together in a single software product, or packaged into multiple software products.
由此,主题的特定实施例已被描述。其他实施例在所附权利要求书的范围以内。在某些情况下,权利要求书中记载的动作可以以不同的顺序执行并且仍实现期望的结果。此外,附图中描绘的处理并非必需所示的特定顺序或顺次顺序,以实现期望的结果。在某些实现中,多任务和并行处理可能是有利的。Thus, certain embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
以上所述仅为本申请一个或多个实施例的较佳实施例而已,并不用以限制本申请 一个或多个实施例,凡在本申请一个或多个实施例的精神和原则之内,所做的任何修改、等同替换、改进等,均应包含在本申请一个或多个实施例保护的范围之内。The above descriptions are only preferred embodiments of one or more embodiments of the present application, and are not intended to limit one or more embodiments of the present application. Within the spirit and principles of one or more embodiments of the present application, Any modification, equivalent replacement, improvement, etc. should be included in the protection scope of one or more embodiments of the present application.

Claims (17)

  1. An image processing method, comprising:
    acquiring a region image corresponding to a foot in an image to be processed;
    performing key point detection on the region image by using a foot key point detection model, to obtain two-dimensional position information of first foot key points of the foot; and
    determining a three-dimensional pose of the foot in a three-dimensional space based on a mapping relationship between the two-dimensional position information and preset position information, in the three-dimensional space, of second foot key points in a preset three-dimensional foot model that correspond to the first foot key points.
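The claim leaves the 2D–3D solver unspecified; in practice the mapping between detected 2-D key points and a preset 3-D model is typically resolved with a PnP solver (for example OpenCV's `solvePnP`). As a self-contained illustration only, the sketch below recovers a rigid pose with the Kabsch algorithm under the simplifying, non-claimed assumption that full 3-D correspondences are available; the model key points and the pose are hypothetical values.

```python
import numpy as np

def rigid_align(model_pts, observed_pts):
    """Recover rotation R and translation t with observed ≈ R @ model + t,
    via the Kabsch algorithm (SVD of the 3x3 cross-covariance)."""
    cm = model_pts.mean(axis=0)
    co = observed_pts.mean(axis=0)
    H = (model_pts - cm).T @ (observed_pts - co)   # cross-covariance
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))         # guard against reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T
    t = co - R @ cm
    return R, t

# Hypothetical "second foot key points" of a preset 3-D foot model.
model = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0],
                  [0, 0, 1], [1, 1, 0.5]], dtype=float)
# Simulate an observed pose: rotate 30 degrees about z, then translate.
a = np.deg2rad(30.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
t_true = np.array([0.2, -0.1, 0.4])
observed = model @ R_true.T + t_true

R_est, t_est = rigid_align(model, observed)
```

With noise-free correspondences and non-degenerate points, the estimated pose matches the simulated one exactly up to floating-point error.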
  2. The method according to claim 1, wherein acquiring the region image corresponding to the foot in the image to be processed comprises:
    performing object detection on the image to be processed by using an object detection model, to obtain an object detection result, the object detection result comprising a detection frame of the foot in the image to be processed; and
    obtaining the region image corresponding to the foot according to the detection frame and the image to be processed.
  3. The method according to claim 2, wherein the object detection result further comprises a type of the foot, the type indicating whether the foot is a left foot or a right foot;
    the method further comprises:
    in response to the foot being of a preset type, flipping the region image, so that the types of the feet in all region images input to the foot key point detection model are consistent.
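The flip normalization above can be sketched as a horizontal mirror of the crop; mirroring the x-coordinates of any associated key points (e.g. training labels, or detections mapped back to the original crop) is an additional assumption beyond the claim, shown here for completeness. The toy image and points are hypothetical.

```python
import numpy as np

def normalize_foot(region, keypoints_xy, is_preset_type):
    """If the crop shows the preset foot type, mirror it horizontally so
    every crop fed to the key point model depicts the same foot type.
    keypoints_xy: (N, 2) array of (x, y) pixel coordinates in `region`."""
    if not is_preset_type:
        return region, keypoints_xy
    h, w = region.shape[:2]
    flipped = region[:, ::-1]            # horizontal flip of the crop
    kps = keypoints_xy.copy()
    kps[:, 0] = (w - 1) - kps[:, 0]      # mirror x, keep y unchanged
    return flipped, kps

img = np.arange(12).reshape(3, 4)        # toy 3x4 "image"
kps = np.array([[0.0, 0.0], [3.0, 2.0]])
out, kps2 = normalize_foot(img, kps, True)
```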
  4. The method according to any one of claims 1 to 3, wherein the image to be processed is an image in a video stream, and acquiring the region image corresponding to the foot in the image to be processed comprises:
    acquiring position information of the first foot key points of the foot in a previous frame image of the image to be processed in the video stream, and determining a foot key point frame according to the acquired position information; and
    obtaining the region image corresponding to the foot based on the foot key point frame and the image to be processed.
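One minimal reading of the "foot key point frame" above is an axis-aligned box around the previous frame's key points, expanded by a margin so the foot remains inside despite small inter-frame motion. The padding ratio and the sample points below are hypothetical, not values from the application.

```python
import numpy as np

def keypoint_box(kps, pad_ratio=0.2):
    """Axis-aligned box around last frame's key points, expanded by a
    margin (pad_ratio of the box size) to tolerate inter-frame motion.
    kps: (N, 2) array of (x, y) key point positions."""
    x0, y0 = kps.min(axis=0)
    x1, y1 = kps.max(axis=0)
    pw = (x1 - x0) * pad_ratio
    ph = (y1 - y0) * pad_ratio
    return (x0 - pw, y0 - ph, x1 + pw, y1 + ph)

kps = np.array([[10.0, 20.0], [30.0, 60.0], [20.0, 40.0]])
box = keypoint_box(kps)
```

The resulting box would then be intersected with the image bounds and used to crop the region image for the current frame.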
  5. The method according to claim 4, further comprising:
    acquiring a stored type of the foot in the previous frame image; and
    in response to the type of the foot in the previous frame image being a preset type, flipping the region image, so that the types of the feet in all region images input to the foot key point detection model are consistent.
  6. The method according to any one of claims 1 to 5, wherein the image to be processed is an image in a video stream, the video stream further comprises a previous frame image preceding the image to be processed, and the region image of the image to be processed is determined according to a foot key point frame determined based on the first foot key points in the previous frame image;
    before determining the three-dimensional pose of the foot in the three-dimensional space based on the mapping relationship between the two-dimensional position information and the preset position information, in the three-dimensional space, of the second foot key points in the preset three-dimensional foot model that correspond to the first foot key points, the method further comprises:
    classifying the region image by using an image classification model, to obtain a classification result of the region image, the classification result indicating whether the region image is a foot image; and
    in response to the classification result indicating that the region image is a foot image, determining that the foot in the region image and the foot in the previous frame image are the same foot, so as to perform foot tracking.
  7. The method according to claim 6, wherein determining the three-dimensional pose of the foot in the three-dimensional space comprises:
    in response to determining that the foot in the region image and the foot in the previous frame image are the same foot, determining the three-dimensional pose of the foot in the three-dimensional space based on the mapping relationship between the two-dimensional position information and the preset position information, in the three-dimensional space, of the second foot key points in the preset three-dimensional foot model that correspond to the first foot key points.
  8. The method according to claim 6 or 7, wherein the image classification model and the foot key point detection model share a feature extraction network, and a joint training method of the image classification model and the foot key point detection model comprises:
    acquiring first image samples annotated with image classification information, and second image samples annotated with position information of first foot key points;
    inputting the first image samples into the image classification model to obtain a classification prediction result, and obtaining first loss information according to the classification prediction result and the annotated image classification information;
    inputting the second image samples into the foot key point detection model to obtain a first foot key point position prediction result, and obtaining second loss information according to the first foot key point position prediction result and the annotated position information of the first foot key points; and
    adjusting model parameters of the image classification model and the foot key point detection model based on the first loss information and the second loss information.
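The joint training step above amounts to combining the two loss terms into one objective that updates the shared feature extractor and both heads. The claim does not specify the loss functions or weighting; the numpy sketch below assumes, purely for illustration, binary cross-entropy for the classification head, mean squared error for the key point head, and an equal-weight sum (in a real system a DL framework would backpropagate this total).

```python
import numpy as np

def bce(pred, label, eps=1e-7):
    """Binary cross-entropy; a hypothetical choice for the first loss."""
    p = np.clip(pred, eps, 1.0 - eps)
    return -(label * np.log(p) + (1.0 - label) * np.log(1.0 - p)).mean()

def mse(pred, target):
    """Mean squared error; a hypothetical choice for the second loss."""
    return ((pred - target) ** 2).mean()

# Toy batch: one classification sample and one key point sample, assumed
# to have passed through heads sharing one feature extractor (not modeled).
cls_pred, cls_label = np.array([0.9, 0.2]), np.array([1.0, 0.0])
kpt_pred = np.array([[10.0, 12.0], [30.0, 31.0]])
kpt_label = np.array([[11.0, 12.0], [29.0, 30.0]])

loss_cls = bce(cls_pred, cls_label)   # first loss information
loss_kpt = mse(kpt_pred, kpt_label)   # second loss information
total = loss_cls + 1.0 * loss_kpt     # weighted sum drives one joint update
```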
  9. The method according to any one of claims 1 to 8, wherein the first foot key points comprise a plurality of key points on an edge contour of the foot, and the number of the first foot key points is not less than four.
  10. The method according to claim 9, wherein the first foot key points comprise key points of at least one of the following regions:
    the tip of the big toe; the medial forefoot joint; the medial arch of the foot; the medial rear sole; the back of the heel; the lateral rear sole; the lateral forefoot joint; the junction of the instep and the leg; the medial ankle joint; the Achilles tendon; and the lateral ankle joint.
  11. The method according to any one of claims 1 to 10, further comprising, after determining the three-dimensional pose of the foot in the three-dimensional space:
    acquiring a three-dimensional virtual model of a shoe material; and
    superimposing the three-dimensional virtual model at a position corresponding to the foot in the image to be processed based on the three-dimensional pose of the foot in the image to be processed, to obtain an augmented reality effect of virtual shoe fitting.
  12. The method according to claim 11, wherein superimposing the three-dimensional virtual model at the position corresponding to the foot in the image to be processed based on the three-dimensional pose of the foot in the image to be processed, to obtain the augmented reality effect of virtual shoe fitting, comprises:
    acquiring an initial pose, in the three-dimensional space, of the three-dimensional virtual model corresponding to the shoe material;
    converting the initial pose, based on the three-dimensional pose, to match the three-dimensional pose corresponding to the foot, to obtain a converted three-dimensional pose;
    mapping the three-dimensional virtual model onto the image to be processed based on the converted three-dimensional pose, to obtain a two-dimensional virtual material corresponding to the shoe material; and
    performing image fusion between the two-dimensional virtual material and the position corresponding to the foot, to obtain a fused image for displaying the effect of virtual shoe fitting.
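The "mapping onto the image to be processed" step in claim 12 is, in the standard formulation, a pinhole-camera projection of the shoe model's vertices under the converted pose. The sketch below assumes a hypothetical intrinsic matrix `K` and an identity rotation; a real system would use calibrated intrinsics and rasterize the full mesh (e.g. with a renderer) rather than project isolated points.

```python
import numpy as np

def project_points(pts3d, R, t, K):
    """Pinhole projection: apply the converted pose (R, t) to the 3-D
    vertices, multiply by the intrinsics K, then divide by depth."""
    cam = pts3d @ R.T + t            # world -> camera coordinates
    uv_h = cam @ K.T                 # homogeneous pixel coordinates
    return uv_h[:, :2] / uv_h[:, 2:3]

K = np.array([[500.0, 0.0, 160.0],   # hypothetical intrinsics:
              [0.0, 500.0, 120.0],   # focal length 500 px,
              [0.0, 0.0, 1.0]])      # principal point (160, 120)
R = np.eye(3)                        # identity rotation for the check
t = np.array([0.0, 0.0, 2.0])        # 2 units in front of the camera
shoe_vertices = np.array([[0.0, 0.0, 0.0], [0.1, 0.0, 0.0]])
uv = project_points(shoe_vertices, R, t, K)
```

The projected pixels give the footprint of the two-dimensional virtual material, which would then be alpha-blended with the foot region of the image to produce the fused frame.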
  13. An image processing apparatus, comprising:
    a first acquisition module, configured to acquire a region image corresponding to a foot in an image to be processed;
    a key point detection module, configured to perform key point detection on the region image by using a foot key point detection model, to obtain two-dimensional position information of first foot key points of the foot; and
    a determination module, configured to determine a three-dimensional pose of the foot in a three-dimensional space based on a mapping relationship between the two-dimensional position information and preset position information, in the three-dimensional space, of second foot key points in a preset three-dimensional foot model that correspond to the first foot key points.
  14. The apparatus according to claim 13, wherein the image to be processed is an image in a video stream, the video stream further comprises a previous frame image preceding the image to be processed, and the region image of the image to be processed is determined according to a foot key point frame determined based on the first foot key points in the previous frame image;
    the apparatus further comprises:
    a tracking module, configured to classify the region image by using an image classification model, to obtain a classification result of the region image, the classification result indicating whether the region image is a foot image; and
    in response to the classification result indicating that the region image is a foot image, determine that the foot in the region image and the foot in the previous frame image are the same foot, so as to perform foot tracking.
  15. The apparatus according to claim 13 or 14, further comprising:
    a virtual shoe fitting module, configured to acquire a three-dimensional virtual model of a shoe material; and
    based on the three-dimensional pose of the foot in the image to be processed, superimpose the three-dimensional virtual model at a position corresponding to the foot in the image to be processed, to obtain an augmented reality effect of virtual shoe fitting.
  16. An electronic device, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor implements the image processing method according to any one of claims 1 to 12 by running the executable instructions.
  17. A computer-readable storage medium storing a computer program, wherein the computer program causes a processor to execute the image processing method according to any one of claims 1 to 12.
PCT/CN2022/111023 2021-08-19 2022-08-09 Image processing WO2023020327A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110955120.7A CN113627379A (en) 2021-08-19 2021-08-19 Image processing method, device, equipment and storage medium
CN202110955120.7 2021-08-19

Publications (1)

Publication Number Publication Date
WO2023020327A1

Family

ID=78386695


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113627379A (en) * 2021-08-19 2021-11-09 北京市商汤科技开发有限公司 Image processing method, device, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013073324A (en) * 2011-09-27 2013-04-22 Dainippon Printing Co Ltd Image display system
CN110111415A (en) * 2019-04-25 2019-08-09 上海时元互联网科技有限公司 A kind of 3D intelligent virtual of shoes product tries method and system on
CN112257582A (en) * 2020-10-21 2021-01-22 北京字跳网络技术有限公司 Foot posture determination method, device, equipment and computer readable medium
CN112287869A (en) * 2020-11-10 2021-01-29 上海依图网络科技有限公司 Image data detection method and device
CN112926364A (en) * 2019-12-06 2021-06-08 北京四维图新科技股份有限公司 Head posture recognition method and system, automobile data recorder and intelligent cabin
CN113627379A (en) * 2021-08-19 2021-11-09 北京市商汤科技开发有限公司 Image processing method, device, equipment and storage medium

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109558865A (en) * 2019-01-22 2019-04-02 郭道宁 A kind of abnormal state detection method to the special caregiver of need based on human body key point
CN110008835B (en) * 2019-03-05 2021-07-09 成都旷视金智科技有限公司 Sight line prediction method, device, system and readable storage medium
CN111242973A (en) * 2020-01-06 2020-06-05 上海商汤临港智能科技有限公司 Target tracking method and device, electronic equipment and storage medium
CN111507806B (en) * 2020-04-23 2023-08-29 北京百度网讯科技有限公司 Virtual shoe test method, device, equipment and storage medium
CN111931720B (en) * 2020-09-23 2021-01-22 深圳佑驾创新科技有限公司 Method, apparatus, computer device and storage medium for tracking image feature points
CN112241731B (en) * 2020-12-03 2021-03-16 北京沃东天骏信息技术有限公司 Attitude determination method, device, equipment and storage medium
CN112614184A (en) * 2020-12-28 2021-04-06 清华大学 Object 6D attitude estimation method and device based on 2D detection and computer equipment
CN113239925A (en) * 2021-05-24 2021-08-10 北京有竹居网络技术有限公司 Text detection model training method, text detection method, device and equipment

Also Published As

Publication number Publication date
CN113627379A (en) 2021-11-09

Similar Documents

Publication Publication Date Title
US11238606B2 (en) Method and system for performing simultaneous localization and mapping using convolutional image transformation
KR102647351B1 (en) Modeling method and modeling apparatus using 3d point cloud
CN108028871B (en) Label-free multi-user multi-object augmented reality on mobile devices
US9495761B2 (en) Environment mapping with automatic motion model selection
Alexiadis et al. An integrated platform for live 3D human reconstruction and motion capturing
WO2019164498A1 (en) Methods, devices and computer program products for global bundle adjustment of 3d images
WO2013029675A1 (en) Method for estimating a camera motion and for determining a three-dimensional model of a real environment
CN111127524A (en) Method, system and device for tracking trajectory and reconstructing three-dimensional image
da Silveira et al. Dense 3D scene reconstruction from multiple spherical images for 3-DoF+ VR applications
Aly et al. Street view goes indoors: Automatic pose estimation from uncalibrated unordered spherical panoramas
WO2023020327A1 (en) Image processing
Vo et al. Spatiotemporal bundle adjustment for dynamic 3d human reconstruction in the wild
Alam et al. Pose estimation algorithm for mobile augmented reality based on inertial sensor fusion.
US10977810B2 (en) Camera motion estimation
Imre et al. Calibration of nodal and free-moving cameras in dynamic scenes for post-production
Price et al. Augmenting crowd-sourced 3d reconstructions using semantic detections
Yang et al. Vision-inertial hybrid tracking for robust and efficient augmented reality on smartphones
JP2023056466A (en) Global positioning device and method for global positioning
Laskar et al. Robust loop closures for scene reconstruction by combining odometry and visual correspondences
CN114445601A (en) Image processing method, device, equipment and storage medium
Garau et al. Unsupervised continuous camera network pose estimation through human mesh recovery
TWI811108B (en) Mixed reality processing system and mixed reality processing method
Masher Accurately scaled 3-D scene reconstruction using a moving monocular camera and a single-point depth sensor
Wang et al. DynOcc: Learning Single-View Depth from Dynamic Occlusion Cues
Almeida et al. Incremental reconstruction approach for telepresence or ar applications

Legal Events

Date Code Title Description
NENP Non-entry into the national phase

Ref country code: DE