CN114693752A - Data processing method, data processing device, storage medium and computer equipment


Info

Publication number
CN114693752A
CN114693752A (application CN202011624274.XA)
Authority
CN
China
Prior art keywords
target object
virtual
pose
image
physical target
Prior art date
Legal status
Pending
Application number
CN202011624274.XA
Other languages
Chinese (zh)
Inventor
冉清
李嗣旺
高扬
董源
李嘉辉
Current Assignee
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN202011624274.XA
Publication of CN114693752A


Classifications

    • G06T7/344 — Image registration using feature-based methods involving models
    • G06N20/00 — Machine learning
    • G06T17/00 — Three-dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T19/00 — Manipulating 3D models or images for computer graphics
    • G06T5/50 — Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T7/75 — Determining position or orientation of objects or cameras using feature-based methods involving models
    • G06T2207/10028 — Range image; depth image; 3D point clouds
    • G06T2207/20081 — Training; learning
    • G06T2207/20084 — Artificial neural networks [ANN]
    • G06T2207/20221 — Image fusion; image merging
    • G06T2207/30244 — Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Graphics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Geometry (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention discloses a data processing method, a data processing device, a storage medium and computer equipment. The method comprises the following steps: collecting a color image and depth images of a physical target object; fusing the depth images to obtain a three-dimensional reconstruction model of the physical target object; determining, according to the three-dimensional reconstruction model, a first pose of a virtual target object and a second pose of a virtual object under a predetermined coordinate system; and projecting the 3D key points of the virtual target object to an image plane according to the first pose to obtain 2D key points, and drawing an interactive image of the virtual object and the physical target object in the image plane according to the second pose and the color image. The invention solves the technical problems of high cost and low quality of training data during training data acquisition.

Description

Data processing method, data processing device, storage medium and computer equipment
Technical Field
The present invention relates to the field of images, and in particular, to a data processing method, apparatus, storage medium, and computer device.
Background
Augmented Reality (AR) shoe try-on is a technology based on computer vision and graphics that can provide a new retail solution for the footwear industry. AR shoe try-on generally adopts a supervised learning method to detect foot key points in 2D images, but supervised learning requires abundant training data to ensure the accuracy and robustness of the detection algorithm. The training data comprises try-on images and the pixel positions of the 2D foot key points in those images. In the related art, training data is obtained by fully manual frame-by-frame labeling, semi-automatic key-frame labeling, or fully automatic virtual rendering. However, each of these methods has problems: manual labeling suffers from unstable cost and uncontrollable data quality, while fully automatic virtual rendering produces data with poor realism and low training value because the samples tend to be identical.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
The embodiment of the invention provides a data processing method, a data processing device, a storage medium and computer equipment, which at least solve the technical problems of high cost and low quality of training data in the process of acquiring the training data.
According to an aspect of an embodiment of the present invention, there is provided a data processing method including: collecting a color image and a depth image of a physical target object; determining a three-dimensional reconstruction model of the physical target object according to the depth image; determining a first pose of a virtual target object in a predetermined coordinate system and a second pose of a virtual object in the predetermined coordinate system according to the three-dimensional reconstruction model; and projecting the 3D key points of the virtual target object to an image plane according to the first pose to obtain 2D key points, and drawing an interactive image of the virtual object and the physical target object in the image plane according to the second pose and the color image.
Optionally, where multiple depth images are acquired, determining a three-dimensional reconstruction model of the physical target object according to the multiple depth images includes: converting the multiple depth images into multiple pieces of three-dimensional point cloud; establishing a point cloud pose graph according to the multiple pieces of three-dimensional point cloud, wherein each point cloud is a node in the point cloud pose graph; and registering the point cloud pose graph to obtain a three-dimensional reconstruction model of the physical target object.
Optionally, registering the point cloud pose graph to obtain a three-dimensional reconstruction model of the physical target object includes: carrying out serialized registration on the point cloud pose graph by adopting an iterative closest point (ICP) method to obtain a serialized registration result graph of the point cloud pose graph; and carrying out global registration on the serialized registration result graph to obtain a three-dimensional reconstruction model of the physical target object, wherein the pose of each point cloud in the three-dimensional reconstruction model is a pose under a global coordinate system.
Optionally, determining a first pose of the virtual target object in a predetermined coordinate system according to the three-dimensional reconstruction model includes: acquiring a third pose of the three-dimensional reconstruction model in the preset coordinate system; determining a transformation matrix for transforming the virtual target object to the three-dimensional reconstruction model; and determining the first pose of the virtual target object under the preset coordinate system according to the third pose and the transformation matrix.
Optionally, determining a transformation matrix for transforming the virtual target object to the three-dimensional reconstruction model comprises: aligning the virtual target object with the three-dimensional reconstruction model based on a coarse point cloud registration method using the Fast Point Feature Histogram (FPFH) and a fine point cloud registration method using point-to-plane ICP, to obtain alignment parameters; and determining the transformation matrix of the virtual target object to the three-dimensional reconstruction model according to the alignment parameters.
Optionally, determining a second pose of the virtual object under the predetermined coordinate system comprises: acquiring the alignment relation between the virtual object and the virtual target object; and determining the second pose of the virtual object under the preset coordinate system according to the first pose and the alignment relation.
Optionally, projecting the 3D key points of the virtual target object to an image plane according to the first pose to obtain 2D key points, and drawing an interactive image of the virtual object and the physical target object in the image plane according to the second pose and the color image, includes: projecting the 3D key points of the virtual target object to the image plane according to the first pose and a predetermined imaging method to obtain the 2D key points; and calling a predetermined rendering engine to draw the result of the virtual object interacting with the physical target object in the image plane according to the second pose and the color image, so as to obtain the interactive image of the virtual object and the physical target object in the image plane.
Optionally, according to the second pose and the color image, invoking a predetermined rendering engine to draw the result of the virtual object interacting with the physical target object in the image plane, so as to obtain the interactive image of the virtual object and the physical target object in the image plane, includes: when the predetermined rendering engine is called to draw the interaction result of the virtual object and the physical target object in the image plane according to the second pose and the color image, obtaining a plurality of interactive images of the virtual object and the physical target object in the image plane by replacing the background and/or adjusting the light source.
Optionally, when the acquired color images and depth images are a plurality of color images and a plurality of depth images of the physical target object in different states and/or at different angles, obtaining 2D key points of the physical target object in different states and/or at different angles and interactive images with the virtual object respectively.
Optionally, generating training data according to the 2D key points and the interactive image; acquiring different training data corresponding to various physical target objects; and performing machine training by adopting different training data corresponding to various physical target objects to obtain a key point detection model.
According to another aspect of the embodiments of the present invention, there is also provided a data processing method, including: receiving an input image, wherein the input image comprises a physical target object; detecting 2D key points of the physical target object in the input image by using a key point detection model, wherein the key point detection model is obtained through machine training using a plurality of groups of training data, and the data in the plurality of groups of training data comprises: an interactive image of a physical target object and a virtual object, and 2D key points in the interactive image, wherein the 2D key points are obtained by projecting the 3D key points of a virtual target object to an image plane according to a first pose of the virtual target object in a predetermined coordinate system, the interactive image is obtained by drawing the virtual object on the image plane according to a second pose of the virtual object in the predetermined coordinate system and an acquired color image of the physical target object, the first pose and the second pose are determined according to a three-dimensional reconstruction model, and the three-dimensional reconstruction model is obtained by fusing a plurality of acquired depth images of the physical target object.
According to another aspect of the embodiments of the present invention, there is also provided a data processing method, including: acquiring a plurality of sets of training data, wherein data in the plurality of sets of training data comprises: the method comprises the steps that an interactive image of a physical target object and a virtual object and 2D key points in the interactive image are obtained, the 2D key points are obtained by projecting 3D key points of the virtual target object to an image plane according to a first pose of the virtual target object in a preset coordinate system, the interactive image is obtained by drawing the virtual object on the image plane according to a second pose of the virtual object in the preset coordinate system and an acquired color image of the physical target object, the first pose and the second pose are determined according to a three-dimensional reconstruction model, and the three-dimensional reconstruction model is obtained by fusing a plurality of acquired depth images of the physical target object; and performing machine training by adopting the multiple groups of training data to obtain a key point detection model.
According to still another aspect of the embodiments of the present invention, there is also provided a data processing apparatus including: the acquisition module is used for acquiring a color image and a depth image of a physical target object; the first processing module is used for obtaining a three-dimensional reconstruction model of the physical target object through fusion according to the plurality of depth images; the determining module is used for determining a first pose of the virtual target object in a preset coordinate system and determining a second pose of the virtual object in the preset coordinate system according to the three-dimensional reconstruction model; and the second processing module is used for projecting the 3D key points of the virtual target object to an image plane according to the first pose to obtain 2D key points, and drawing an interactive image of the virtual object and the physical target object in the image plane according to the second pose and the color image.
According to still another aspect of the embodiments of the present invention, there is also provided a data processing apparatus including: a receiving module, configured to receive an input image, wherein the input image comprises a physical target object; a detection module, configured to detect 2D key points of the physical target object in the input image by using a key point detection model, where the key point detection model is obtained by machine training using multiple sets of training data, and the data in the multiple sets of training data includes: an interactive image of a physical target object and a virtual object, and 2D key points in the interactive image, wherein the 2D key points are obtained by projecting the 3D key points of a virtual target object to an image plane according to a first pose of the virtual target object in a predetermined coordinate system, the interactive image is obtained by drawing the virtual object on the image plane according to a second pose of the virtual object in the predetermined coordinate system and an acquired color image of the physical target object, the first pose and the second pose are determined according to a three-dimensional reconstruction model, and the three-dimensional reconstruction model is obtained by fusing a plurality of acquired depth images of the physical target object.
According to still another aspect of the embodiments of the present invention, there is also provided a data processing apparatus including: an obtaining module, configured to obtain multiple sets of training data, where data in the multiple sets of training data includes: the method comprises the steps that an interactive image of a physical target object and a virtual object and 2D key points in the interactive image are obtained, the 2D key points are obtained by projecting 3D key points of the virtual target object to an image plane according to a first pose of the virtual target object in a preset coordinate system, the interactive image is obtained by drawing the virtual object on the image plane according to a second pose of the virtual object in the preset coordinate system and an acquired color image of the physical target object, the first pose and the second pose are determined according to a three-dimensional reconstruction model, and the three-dimensional reconstruction model is obtained by fusing a plurality of acquired depth images of the physical target object; and the training module is used for performing machine training by adopting the multiple groups of training data to obtain a key point detection model.
According to still another aspect of the embodiments of the present invention, there is provided a storage medium including a stored program, wherein when the program runs, a device on which the storage medium is located is controlled to execute any one of the data processing methods described above.
According to still another aspect of the embodiments of the present invention, there is also provided a computer apparatus, including: a memory and a processor, the memory storing a computer program; the processor is configured to execute the computer program stored in the memory, and when the computer program runs, the processor is enabled to execute any one of the data processing methods.
According to another aspect of the embodiments of the present invention, there is also provided a data processing method, including: collecting a color image and a depth image of a foot; determining a three-dimensional reconstructed foot model of the foot according to the depth image; determining a first pose of a virtual foot under a predetermined coordinate system and a second pose of a virtual shoe under the predetermined coordinate system according to the three-dimensional reconstructed foot model; and projecting the 3D foot key points of the virtual foot to an image plane according to the first pose to obtain 2D foot key points, and drawing a try-on image of the virtual shoe on the image plane according to the second pose and the color image.
In the embodiment of the invention, a color image and depth images of a physical target object are acquired; a three-dimensional reconstruction model of the physical target object is obtained by fusing the depth images; a first pose of a virtual target object and a second pose of a virtual object under a predetermined coordinate system are determined according to the three-dimensional reconstruction model; the 3D key points of the virtual target object are projected to an image plane according to the first pose to obtain 2D key points; and an interactive image of the virtual object and the physical target object in the image plane is drawn according to the second pose and the color image. This achieves the purpose of generating interactive images labeled with 2D key points from images of the physical target object, reduces the cost of acquiring training data, improves the quality of the training data, and thereby solves the technical problems of high cost and low quality of training data during training data acquisition.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
fig. 1 is a block diagram of a hardware structure of a computer terminal for implementing a data processing method according to an embodiment of the present invention;
FIG. 2 is a flowchart of a first data processing method according to embodiment 1 of the present invention;
FIG. 3 is a flowchart of a second data processing method according to embodiment 1 of the present invention;
FIG. 4 is a flowchart of a third data processing method according to embodiment 1 of the present invention;
FIG. 5 is a flowchart of a fourth data processing method according to embodiment 1 of the present invention;
FIG. 6 is a schematic diagram of a training data acquisition flow according to an alternative embodiment of the present invention;
FIG. 7 is a schematic diagram of obtaining a three-dimensional reconstructed model according to an alternative embodiment of the invention;
FIG. 8 is a schematic view of aligning a virtual shoe, a virtual foot and a reconstructed foot according to an alternative embodiment of the invention;
fig. 9 is a block diagram of a first configuration of a data processing apparatus according to embodiment 2 of the present invention;
fig. 10 is a block diagram of a second data processing apparatus according to embodiment 3 of the present invention;
FIG. 11 is a block diagram showing a third configuration of a data processing apparatus according to embodiment 4 of the present invention;
FIG. 12 is a block diagram showing a fourth configuration of a data processing apparatus according to embodiment 5 of the present invention;
fig. 13 is a block diagram of a computer terminal according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
First, some terms appearing in the description of the embodiments of the present application are explained as follows:
Deep Learning: also known as model training, deep learning refers to a set of algorithms that apply various machine learning techniques on multi-layer neural networks to solve problems involving images, text and the like. Deep learning broadly falls under neural networks, but there are many variations in the concrete implementation. The core of deep learning is feature learning, which aims to obtain hierarchical feature information through a hierarchical network, thereby addressing the earlier, important problem of having to design features manually.
6 DoF: and the pose with 6 degrees of freedom refers to translation of an object in space along x, y and z axes and rotation around the axes.
PnP: Perspectral-n-Point, n-Point Perspective (knowing the 2D-3D Point pair, solving 6DoF pose).
ICP: iterative closest point registration algorithm, for paired point cloud registration.
Pose Graph Optimization: and optimizing the pose graph and aiming at global point cloud registration.
3D key points: a plurality of keypoints on the three-dimensional model of the virtual target object.
2D key points: and 2D projection points of the 3D key points on the imaging plane.
Virtual target object: a 3D model of the target object made by the designer.
Virtual object: and (4) carrying out parameterized alignment on the 3D model of the object manufactured by the art designer and the virtual target object.
Example 1
There is also provided, in accordance with an embodiment of the present invention, an embodiment of a data processing method, it should be noted that the steps illustrated in the flowchart of the accompanying drawings may be performed in a computer system such as a set of computer executable instructions, and that, although a logical order is illustrated in the flowchart, in some cases, the steps illustrated or described may be performed in an order different than that herein.
The method provided in embodiment 1 of the present application can be executed in a mobile terminal, a computer terminal or a similar computing device. Fig. 1 shows a hardware structure block diagram of a computer terminal (or mobile device) for implementing the data processing method. As shown in fig. 1, the computer terminal 10 (or mobile device) may include one or more processors 102 (shown as 102a, 102b, ..., 102n; the processor 102 may include, but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a memory 104 for storing data, and a transmission device for communication functions. In addition, it may also include: a display, an input/output interface (I/O interface), a Universal Serial Bus (USB) port (which may be included as one of the ports of the I/O interface), a network interface, a power source, and/or a camera. It will be understood by those skilled in the art that the structure shown in fig. 1 is only an illustration and does not limit the structure of the electronic device. For example, the computer terminal 10 may also include more or fewer components than shown in FIG. 1, or have a different configuration than shown in FIG. 1.
It should be noted that the one or more processors 102 and/or other data processing circuitry described above may be referred to generally herein as "data processing circuitry". The data processing circuitry may be embodied in whole or in part in software, hardware, firmware, or any combination thereof. Further, the data processing circuitry may be a single stand-alone processing module, or incorporated in whole or in part into any of the other elements in the computer terminal 10 (or mobile device). In the embodiments of the present application, the data processing circuitry acts as a kind of processor control (e.g., selection of a variable resistance termination path connected to the interface).
The memory 104 may be used to store software programs and modules of application software, such as the program instructions/data storage devices corresponding to the data processing method in the embodiment of the present invention. The processor 102 executes various functional applications and data processing by running the software programs and modules stored in the memory 104, that is, implementing the data processing method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to the computer terminal 10 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device is used for receiving or transmitting data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal 10. In one example, the transmission device includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device may be a Radio Frequency (RF) module, which is used for communicating with the internet in a wireless manner.
The display may be, for example, a touch screen type Liquid Crystal Display (LCD) that may enable a user to interact with a user interface of the computer terminal 10 (or mobile device).
Under the operating environment, the application provides a data processing method as shown in fig. 2. Fig. 2 is a flowchart of a first data processing method according to embodiment 1 of the present invention. As shown in fig. 2, the method comprises the steps of:
step S202, collecting a color image and a depth image of a physical target object;
step S204, according to the plurality of depth images, a three-dimensional reconstruction model of the physical target object is obtained through fusion;
step S206, determining a first pose of the virtual target object under a preset coordinate system and a second pose of the virtual object under the preset coordinate system according to the three-dimensional reconstruction model;
step S208, projecting the 3D key points of the virtual target object to an image plane according to the first pose to obtain 2D key points, and drawing an interactive image of the virtual object and the physical target object in the image plane according to the second pose and the color image.
Through the above steps, the purpose of establishing a three-dimensional reconstruction model from images of the physical target object and, from that model, generating interactive images of the physical target object and the virtual object labeled with 2D key points is achieved, thereby reducing the cost of acquiring training data, improving its quality, and solving the technical problems of high cost and low quality of training data during training data acquisition.
As an alternative embodiment, the image of the physical target object may be obtained by camera shooting, from a vector diagram, or the like. With the above method, an interactive image of the virtual object and the physical target object in the image plane, including 2D key points, can be accurately generated and used as training data to train the key point detection model. Because the interactive image of the virtual object on the physical target object is already labeled with the 2D key points, the problems of low efficiency and hard-to-control error rates in manual 2D key point labeling are avoided. Meanwhile, real data of the physical target object is used in generating the interactive image of the virtual object and the physical target object, so the generated interactive image is more realistic and serves as more effective training data.
As an alternative embodiment, when the color image and the depth image of the physical target object are acquired, it is only necessary to acquire the information of the physical target object from multiple angles. Therefore, the number of the collected color images can be one or more, and the depth image can be one or more. For example, when a color image of a physical target object is used, if a panoramic camera is used to shoot the physical target object, information of multiple view angles of the physical target object can be obtained; if the non-panoramic camera is adopted to shoot the physical target object, the physical target object can be shot for multiple times from multiple angles, and information of multiple visual angles of the physical target object can be obtained. When the depth image is adopted, the same processing mode as the above-mentioned color image collection can be adopted. For example, acquiring a color image and acquiring a depth image may be performed simultaneously.
As an optional embodiment, when the collected color images and depth images are multiple color images and depth images of the physical target object in different states and/or at different angles, the 2D key points of the physical target object and its interactive images with the virtual object in those different states and/or at those different angles are obtained respectively. By using image data of various physical target objects and the correspondence established by the three-dimensional reconstruction model, accurate 2D key points are obtained. When drawing the interactive images of the virtual object and the physical target object, interactive images in different states (for example, when the physical target object is a foot, different states include bare feet or socks), of different types, in different postures and from different visual angles can be obtained. This provides accurate and diverse training data for the subsequent training of the key point detection model, and makes up for the fact that interactive images of the virtual object and the physical target object used as training data often lack certain special visual angles. For example, try-on images from the toe visual angle, the heel visual angle and the lateral flat visual angle are very scarce in existing image training sets; this embodiment can effectively compensate for this lack of training data and provide richer training material.
Point cloud data can capture the accurate topological and geometric structure of an object at low storage cost. Meanwhile, in the process of shooting a physical target object, complete geometric information cannot easily be obtained in a single scan; the physical target object needs to be shot multiple times from different angles, so a three-dimensional reconstruction model corresponding to the physical target object can be established by registering the multiple groups of point clouds corresponding to the multiple images. As an alternative embodiment, when there are multiple depth images, obtaining a three-dimensional reconstruction model of the physical target object by fusing them may be implemented as follows: converting the depth images into multiple pieces of three-dimensional point cloud; establishing a point cloud pose graph according to the three-dimensional point clouds, wherein each point cloud is a node in the point cloud pose graph; and then registering the point cloud pose graph to obtain the three-dimensional reconstruction model of the physical target object.
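As an illustrative sketch of this conversion-and-graph-building step (a sketch under stated assumptions, not the patent's implementation), the following assumes the Open3D library; the depth file list and pinhole intrinsics are hypothetical inputs.

```python
import numpy as np
import open3d as o3d

def build_pose_graph(depth_files, intrinsic):
    """depth_files: paths of depth images; intrinsic: o3d.camera.PinholeCameraIntrinsic."""
    # Convert each depth image into a 3D point cloud in its own camera frame.
    clouds = []
    for path in depth_files:
        depth = o3d.io.read_image(path)
        pcd = o3d.geometry.PointCloud.create_from_depth_image(depth, intrinsic)
        pcd.estimate_normals()  # normals are needed later for point-to-plane ICP
        clouds.append(pcd)

    # Each point cloud becomes one node in the pose graph; poses start at
    # identity and are refined during registration.
    pose_graph = o3d.pipelines.registration.PoseGraph()
    for _ in clouds:
        pose_graph.nodes.append(
            o3d.pipelines.registration.PoseGraphNode(np.identity(4)))
    return clouds, pose_graph
```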
As an alternative embodiment, the predetermined coordinate system may be of various types. For example, when depth images are captured by a camera, several camera-related coordinate systems are involved, including the pixel plane coordinate system (u, v), the image physical coordinate system (x, y), the camera coordinate system (Xc, Yc, Zc), and the world coordinate system. A camera coordinate system is used here. The camera coordinate system is a three-dimensional rectangular coordinate system established with the focusing center (optical center) of the camera as the origin and the optical axis as the Z axis. Its x-axis and y-axis are parallel to the X and Y axes of the image, and its z-axis is the optical axis of the camera, perpendicular to the image plane. The intersection point of the optical axis and the image plane is the origin of the image coordinate system, which is a two-dimensional rectangular coordinate system.
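For reference, the relation between the pixel coordinate system (u, v) and the camera coordinate system (Xc, Yc, Zc) can be written as a simple pinhole back-projection; the sketch below uses NumPy, with fx, fy, cx, cy denoting assumed intrinsic parameters.

```python
import numpy as np

def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift a pixel (u, v) with depth value Zc = depth into camera coordinates."""
    z = depth
    x = (u - cx) * z / fx  # pixel offset from the principal point, scaled by depth
    y = (v - cy) * z / fy
    return np.array([x, y, z])
```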
As an alternative embodiment, the point cloud pose graph may be registered as follows to obtain a three-dimensional reconstruction model of the physical target object: carrying out serialized registration on the point cloud pose graph by adopting the iterative closest point (ICP) method to obtain a serialized registration result graph of the point cloud pose graph; and carrying out global registration on the serialized registration result graph to obtain the three-dimensional reconstruction model, wherein the pose of each point cloud in the model is a pose under a global coordinate system. Because the images obtained by shooting the physical target object differ in visual angle and occlusion, the point clouds corresponding to the images are not in the same coordinate system. By performing sequential registration and global registration on the point cloud pose graph, the pose data obtained at different shooting angles can be converted into the same predetermined coordinate system, realizing the construction of the three-dimensional reconstruction model of the physical target object.
As an optional embodiment, when the three-dimensional reconstruction model of the physical target object is obtained through a plurality of depth images, since the plurality of depth images have camera coordinate systems respectively corresponding to the plurality of depth images, in order to establish an accurate three-dimensional reconstruction model, a uniform coordinate system may be selected for the plurality of depth images for global registration of the plurality of depth images. The manner of selecting the unified coordinate system may be various, for example, the camera coordinate system when the first depth image is shot may be selected as the unified coordinate system, or the camera coordinate system corresponding to the shooting of the depth image at another time point may be selected as the unified coordinate system, and the unified coordinate system may be flexibly selected according to the requirement.
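Continuing the earlier Open3D sketch, the following illustrates sequential (pairwise ICP) registration followed by global pose graph optimization under one unified coordinate system; node 0 is chosen as the reference, and the distance threshold is illustrative.

```python
import numpy as np
import open3d as o3d

def register_globally(clouds, pose_graph, threshold=0.02):
    odometry = np.identity(4)
    for i in range(len(clouds) - 1):
        # Sequential registration: point-to-plane ICP between adjacent clouds.
        result = o3d.pipelines.registration.registration_icp(
            clouds[i], clouds[i + 1], threshold, np.identity(4),
            o3d.pipelines.registration.TransformationEstimationPointToPlane())
        # Accumulate odometry and add an adjacent edge between nodes i and i+1;
        # loop-closure edges would be appended similarly with uncertain=True.
        odometry = result.transformation @ odometry
        pose_graph.nodes[i + 1].pose = np.linalg.inv(odometry)
        pose_graph.edges.append(
            o3d.pipelines.registration.PoseGraphEdge(
                i, i + 1, result.transformation, uncertain=False))

    # Global registration: jointly optimize all node poses with the
    # Levenberg-Marquardt method; node 0 defines the unified coordinate system.
    o3d.pipelines.registration.global_optimization(
        pose_graph,
        o3d.pipelines.registration.GlobalOptimizationLevenbergMarquardt(),
        o3d.pipelines.registration.GlobalOptimizationConvergenceCriteria(),
        o3d.pipelines.registration.GlobalOptimizationOption(
            max_correspondence_distance=threshold, reference_node=0))
    return pose_graph
```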
As an alternative embodiment, determining the first pose of the virtual target object under the predetermined coordinate system according to the three-dimensional reconstruction model may be implemented as follows: acquiring a third pose of the three-dimensional reconstruction model in a preset coordinate system; determining a transformation matrix for transforming the virtual target object to the three-dimensional reconstruction model; and determining the first pose of the virtual target object under the preset coordinate system according to the third pose and the transformation matrix. Specifically, under the condition that the virtual target object and the three-dimensional reconstruction model are already aligned, a transformation matrix of the virtual target object and the three-dimensional reconstruction model can be obtained. The third pose of the three-dimensional reconstruction model in the preset coordinate system can be obtained by solving in the process of overall registration of the three-dimensional reconstruction model, so that the pose of the virtual target object in the preset coordinate system can be obtained by transforming the virtual target object through the transformation matrix with the three-dimensional reconstruction model as a reference.
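In terms of 4x4 homogeneous matrices, the composition described above can be sketched as follows; the multiplication order reflects one common convention and is an assumption, not a prescription of the patent.

```python
import numpy as np

def first_pose(T_model: np.ndarray, T_align: np.ndarray) -> np.ndarray:
    """T_model: third pose of the reconstruction in the predetermined frame;
    T_align: transformation of the virtual target object onto the model."""
    # Points of the virtual target object are first mapped onto the
    # reconstruction, then into the predetermined coordinate system.
    return T_model @ T_align
```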
The Fast Point Feature Histogram (FPFH) is a simplified form of the Point Feature Histogram (PFH): a Simplified Point Feature Histogram (SPFH) is calculated for each point in the k-neighborhood of a query point, and all SPFHs are then weighted by a formula into the final Fast Point Feature Histogram. The FPFH may be used for registration between three-dimensional point clouds. The Iterative Closest Point algorithm (ICP) finds matching closest points between the target point cloud and the source point cloud according to certain constraint conditions, and then calculates improved matching parameters so that an error function is minimized. Both the FPFH method and the ICP method can achieve registration between three-dimensional point clouds.
As an optional embodiment, the virtual target object and the three-dimensional reconstruction model may be aligned based on a coarse point cloud registration method using the fast point feature histogram (FPFH) and a fine point cloud registration method using point-to-plane ICP, to obtain alignment parameters; a transformation matrix for transforming the virtual target object to the three-dimensional reconstruction model is then determined according to the alignment parameters. The virtual target object can be transformed through the transformation matrix to obtain its pose in the predetermined coordinate system.
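A sketch of this coarse-to-fine alignment, assuming Open3D: FPFH features drive a RANSAC-based coarse registration, and point-to-plane ICP refines the result. The voxel size and derived thresholds are illustrative, and both point clouds are assumed to have normals estimated.

```python
import open3d as o3d

def align(source, target, voxel=0.01):
    def fpfh(pcd):
        return o3d.pipelines.registration.compute_fpfh_feature(
            pcd, o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 5, max_nn=100))

    # Coarse registration: RANSAC over FPFH feature correspondences.
    coarse = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
        source, target, fpfh(source), fpfh(target), True, voxel * 1.5,
        o3d.pipelines.registration.TransformationEstimationPointToPoint(False),
        3, [], o3d.pipelines.registration.RANSACConvergenceCriteria(100000, 0.999))

    # Fine registration: point-to-plane ICP initialized from the coarse result.
    fine = o3d.pipelines.registration.registration_icp(
        source, target, voxel * 0.4, coarse.transformation,
        o3d.pipelines.registration.TransformationEstimationPointToPlane())
    return fine.transformation  # alignment parameters as a transformation matrix
```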
As an optional embodiment, after the pose of the three-dimensional reconstruction model is determined and the virtual target object is aligned with the three-dimensional reconstruction model, the alignment of the virtual object, the virtual target object and the three-dimensional reconstruction model is realized according to the alignment relationship between the virtual object and the virtual target object. The alignment relationship between the virtual target object and the virtual object can be predefined. After the virtual target object is aligned with the three-dimensional reconstruction model, the data of the virtual object is adjusted according to the predefined alignment relationship, so that the virtual object is aligned with the three-dimensional reconstruction model and its pose in the predetermined coordinate system is obtained.
As an alternative embodiment, the second pose of the virtual object in the predetermined coordinate system may be determined by: acquiring the alignment relation between the virtual object and the virtual target object; and determining the second pose of the virtual object in the predetermined coordinate system according to the first pose and the alignment relation.
The training data of the key point detection model includes the 2D key points and the interactive images of the virtual object. As an optional embodiment, the training data may be obtained as follows: projecting the 3D key points of the virtual target object to an image plane according to the first pose to obtain the 2D key points; and drawing the interactive image of the virtual object and the physical target object in the image plane according to the second pose and the color image. Specifically, the key points on the 2D image may be obtained by projecting the 3D key points of the virtual target object to the image plane according to the first pose and a camera imaging method. The interactive image of the virtual object and the physical target object in the image plane may be obtained by calling a predetermined rendering engine to draw the interaction result of the virtual object and the physical target object in the image plane according to the second pose and the color image.
As an optional implementation, the 3D key points of the virtual target object can be projected to the image plane based on the perspective principle to obtain the 2D key points. The resulting image is thus already labeled with key points; no manual labeling is needed, the labeling is accurate, and manpower is greatly saved.
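A minimal NumPy sketch of this perspective projection: the first pose maps the 3D key points into the camera frame, and an assumed 3x3 intrinsic matrix K maps them onto the image plane.

```python
import numpy as np

def project_keypoints(keypoints_3d, pose, K):
    """keypoints_3d: (N, 3) points; pose: 4x4 first pose; K: 3x3 intrinsics."""
    pts = np.hstack([keypoints_3d, np.ones((len(keypoints_3d), 1))])
    cam = (pose @ pts.T)[:3]   # 3D key points in the camera coordinate system
    uv = K @ cam               # pinhole projection
    return (uv[:2] / uv[2]).T  # (N, 2) pixel coordinates of the 2D key points
```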
For example, a predetermined rendering engine, such as Blender, may be invoked to render the interaction result of the virtual object with the physical target object in the image plane according to the second pose of the virtual object in the predetermined coordinate system. Because the virtual object is aligned with the three-dimensional reconstruction model generated from the physical target object, the interactive image drawn from the virtual object and the physical target object is very realistic, which improves the realism of the generated interaction effect image in the image plane. In addition, when the rendering engine draws the effect image of the virtual object interacting with the physical target object on the image plane, multiple such images can be obtained by replacing the background and/or adjusting the light source. Replacing the background and/or adjusting the light source greatly enriches the obtainable interactive image material: on one hand, it avoids the workload of measuring a large number of physical target objects and generating corresponding virtual object images; on the other hand, diverse image material covering different object types and interaction environments is obtained, enriching the training database so that training of the key point detection model can achieve a better effect.
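As a sketch of the light-source variation described above, assuming the Blender Python API (bpy); the light name and output paths are hypothetical, and background replacement would be handled analogously (for example through compositing).

```python
import bpy

scene = bpy.context.scene
light = bpy.data.lights["KeyLight"]  # hypothetical light already in the scene

# Render the same try-on scene under several light intensities to obtain
# multiple interactive images of the virtual object and the target object.
for i, energy in enumerate([200.0, 500.0, 1000.0]):
    light.energy = energy
    scene.render.filepath = f"//tryon_{i}.png"
    bpy.ops.render.render(write_still=True)
```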
As an alternative embodiment, training data is generated according to the 2D key points and the interactive images; different training data corresponding to various physical target objects are acquired; and machine training is performed with these different training data to obtain the key point detection model. In this way, the key point detection model can be trained with abundant training data including 2D key points and interactive images. The 2D key points in the training data do not depend on the subjective judgment of manual labeling; taking the SOTA detection result of a key point detection model as the standard, the 2D key point labeling precision of the method provided by this embodiment can reach over 90%. In addition, enriching the training database helps the key point detection model achieve a better training effect.
As an alternative embodiment, the physical target object and the virtual object may comprise a combination of any of the following: feet and virtual shoes; a head and a virtual hat; a wrist and a virtual bracelet; human body and virtual apparel; face and virtual face ornaments. Different target objects and virtual objects may interact in different ways in different scenarios, for example, generating an image of a foot fitting a virtual shoe, or an image of a head fitting a virtual hat.
Fig. 3 is a flowchart of a second data processing method according to embodiment 1 of the present invention, and as shown in fig. 3, the method includes the following steps:
step S302, receiving an input image, wherein the input image comprises a physical target object;
step S304, detecting 2D key points of the physical target object in the input image by using a key point detection model, wherein the key point detection model is obtained through machine training using multiple groups of training data, and the data in the multiple groups of training data comprises: an interactive image of a physical target object and a virtual object, and 2D key points in the interactive image, wherein the 2D key points are obtained by projecting the 3D key points of a virtual target object to an image plane according to a first pose of the virtual target object in a predetermined coordinate system, the interactive image is obtained by drawing the virtual object on the image plane according to a second pose of the virtual object in the predetermined coordinate system and an acquired color image of the physical target object, the first pose and the second pose are determined according to a three-dimensional reconstruction model, and the three-dimensional reconstruction model is obtained by fusing acquired depth images of the physical target object.
Through the above steps, the key point detection model used is obtained by machine training on a large amount of training data. The training data carries accurate 2D key point labels, and the interactive images in the training data are realistic and diverse. This realizes the technical effect of accurately detecting the 2D key points of the physical target object in the received input image, and solves the technical problem of inaccurate detection of the 2D key points of a physical target object in an input image.
Fig. 4 is a flowchart of a third data processing method according to embodiment 1 of the present invention, as shown in fig. 4, the method includes the following steps:
step S402, acquiring multiple sets of training data, wherein the data in the multiple sets of training data includes: an interactive image of a physical target object and a virtual object, and 2D key points in the interactive image, wherein the 2D key points are obtained by projecting the 3D key points of the virtual target object to an image plane according to a first pose of the virtual target object in a predetermined coordinate system, the interactive image is obtained by drawing the virtual object on the image plane according to a second pose of the virtual object in the predetermined coordinate system and an acquired color image of the physical target object, the first pose and the second pose are determined according to a three-dimensional reconstruction model, and the three-dimensional reconstruction model is obtained by fusing acquired depth images of the physical target object;
step S404, performing machine training with the multiple sets of training data to obtain a key point detection model.
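A minimal sketch of step S404, assuming PyTorch; the model, loss and the data loader yielding (interactive image, 2D key points) batches are illustrative stand-ins rather than the patent's prescribed detector.

```python
import torch
import torch.nn as nn

def train_keypoint_model(model, loader, epochs=10, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for images, keypoints_2d in loader:  # batches of training data
            pred = model(images)             # predicted 2D key points
            loss = loss_fn(pred, keypoints_2d)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```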
Through the steps, the purpose of obtaining the key point detection model by adopting multiple groups of training data to perform machine training is achieved. Because the training data are accurately labeled with the 2D key points, and the interactive images in the training data are vivid and various in types, the training data can achieve a good training effect, and the technical problems that the training result of the key point detection model is not ideal due to less training materials of the 2D key point detection model and inaccurate manual labeling are solved.
Fig. 5 is a flowchart of a data processing method four according to embodiment 1 of the present invention. As shown in fig. 5, the method includes the steps of:
step S502, collecting color images and depth images of feet;
step S504, determining a three-dimensional reconstruction foot model of the foot according to the depth image;
step S506, determining a first pose of the virtual foot in a preset coordinate system and a second pose of the virtual shoe in the preset coordinate system according to the three-dimensional reconstruction foot model;
step S508, projecting the 3D foot key points of the virtual foot to an image plane according to the first pose to obtain 2D foot key points, and drawing a try-on image of the virtual shoe on the image plane according to the second pose and the color image.
Through the above steps, the purpose of establishing a three-dimensional reconstruction model from foot images and, from that model, generating try-on images of the virtual shoe labeled with 2D key points on the image plane is achieved, thereby reducing the cost of acquiring training data, improving its quality, and solving the technical problems of high cost and low quality of training data during training data acquisition.
As an alternative embodiment, the following description takes the foot as an example.
FIG. 6 is a schematic diagram of a training data acquisition process according to an alternative embodiment of the present invention. As shown in FIG. 6, training data including virtual shoe try-on images and 2D foot keypoints may be obtained by:
S1, performing foot modeling based on the RGB color data and depth data of a real foot acquired by an RGBD camera, to obtain a 3D foot model of the real foot;
S2, aligning the virtual foot model with the 3D foot model;
S3, aligning the virtual shoe model with the virtual foot model, based on the virtual foot model already aligned with the 3D foot model;
S4, determining the 2D foot key points obtained by projecting the 3D foot key points on the virtual foot model onto the image, according to the projection rule of the virtual foot model onto the camera image;
S5, according to the virtual shoe model and the RGB image of the real foot, rendering a virtual shoe try-on image with a rendering engine, wherein the virtual shoe try-on image also contains the 2D foot key point labels.
Fig. 7 is a schematic flow chart of the three-dimensional registration of a real foot according to an alternative embodiment of the invention. As shown in fig. 7, the three-dimensional registration of the real foot can be achieved by the following steps (a combined code sketch is given after the flow):
S1, acquiring multi-view image data of the real foot to obtain local color images and the depth information corresponding to their pixels.
S2, converting the depth information of the multi-view images into three-dimensional point clouds, where each point cloud lies in the camera coordinate system of its corresponding image, and then establishing a point cloud pose graph over the acquisition process, where each point cloud forms a node in the pose graph.
S3, performing serialized registration: if two point clouds overlap sufficiently, a connecting edge is added between them. Connecting edges are divided into adjacent edges and loop edges; an adjacent edge connects two nodes Pi and Pi+1 that are adjacent in time sequence, whereas the two nodes Pi and Pj connected by a loop edge need not be temporally adjacent, the edge being added when their overlapping area meets a certain threshold. When the pose graph is established, the weight values of all connecting edges and the corresponding transformation matrices need to be solved; this is done with the ICP (Iterative Closest Point) registration algorithm and constitutes the serialized registration process;
S4, performing global registration on the basis of the pose graph, with the energy function defined as:
E(\{P_k\}) = \sum_k e_k(x_k, \{P_k\})
wherein {Pk} is the quantity to be solved, namely the pose of each point cloud in the global coordinate system, {xk} is the observed point cloud data, and ek(·) is a cost function defined between the observed quantity and the quantity to be solved. The energy function is solved with the Levenberg-Marquardt (LM) nonlinear optimization method, finally yielding the pose of each point cloud in the global coordinate system. As an optional embodiment, the coordinate system of the point cloud P0 recorded at the start time t = 0 may be taken as the global coordinate system; after the global registration is solved, the 6DoF pose of each point cloud in the P0 coordinate system is obtained, the point clouds at all times are aligned to the P0 coordinate system by transforming according to the pose results, and the reconstructed foot model is obtained after fusion.
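The following Python sketch shows one way this registration pipeline could be realized with the Open3D library (Open3D is our choice of tooling, not the patent's; the camera intrinsics, the uint16 millimeter depth format, and the distance thresholds are likewise assumptions):

```python
import numpy as np
import open3d as o3d

# Assumed RGBD camera intrinsics (width, height, fx, fy, cx, cy).
INTR = o3d.camera.PinholeCameraIntrinsic(640, 480, 600.0, 600.0, 320.0, 240.0)

def depth_to_cloud(depth_u16):
    """Convert one uint16 depth frame (millimeters) into a point cloud."""
    img = o3d.geometry.Image(depth_u16)
    pcd = o3d.geometry.PointCloud.create_from_depth_image(
        img, INTR, depth_scale=1000.0, depth_trunc=1.0)
    pcd.estimate_normals(
        o3d.geometry.KDTreeSearchParamHybrid(radius=0.02, max_nn=30))
    return pcd

def build_and_solve_pose_graph(clouds, max_dist=0.01):
    """Sequential ICP builds adjacent edges; LM global optimization solves poses."""
    graph = o3d.pipelines.registration.PoseGraph()
    odometry = np.eye(4)  # pose of P0 in the global (P0) frame
    graph.nodes.append(o3d.pipelines.registration.PoseGraphNode(odometry))
    for i in range(len(clouds) - 1):
        # Adjacent edge: point-to-plane ICP between P_{i+1} and P_i.
        res = o3d.pipelines.registration.registration_icp(
            clouds[i + 1], clouds[i], max_dist, np.eye(4),
            o3d.pipelines.registration.TransformationEstimationPointToPlane())
        info = o3d.pipelines.registration.get_information_matrix_from_point_clouds(
            clouds[i + 1], clouds[i], max_dist, res.transformation)
        odometry = odometry @ res.transformation
        graph.nodes.append(o3d.pipelines.registration.PoseGraphNode(odometry))
        graph.edges.append(o3d.pipelines.registration.PoseGraphEdge(
            i + 1, i, res.transformation, info, uncertain=False))
        # Loop edges between sufficiently overlapping non-adjacent clouds
        # would be appended the same way, typically with uncertain=True.
    o3d.pipelines.registration.global_optimization(
        graph,
        o3d.pipelines.registration.GlobalOptimizationLevenbergMarquardt(),
        o3d.pipelines.registration.GlobalOptimizationConvergenceCriteria(),
        o3d.pipelines.registration.GlobalOptimizationOption(
            max_correspondence_distance=max_dist, reference_node=0))
    return graph  # graph.nodes[k].pose is the pose of P_k in the P0 frame
```

After optimization, each cloud can be transformed by its solved pose into the P0 frame, and the clouds fused into the reconstructed foot model.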
As an alternative, when performing 3D reconstruction of a real foot, the registration process may also use only time-sequential registration, that is, registration of adjacent frame pairs is performed on the collected point clouds {x0, x1, x2, ..., xi, xi+1, ..., xn}, and the pose of each point cloud in the global coordinate system (that of x0) is then obtained by chaining the transformation relations. However, because errors accumulate during the chaining, this method easily produces a large drift error, i.e., a significant registration error between x0 and xn.
FIG. 8 is a schematic view of aligning a virtual shoe, a virtual foot, and a reconstructed foot according to an alternative embodiment of the invention. As shown in fig. 8, the alignment of the virtual shoe, the virtual foot and the reconstructed foot can be achieved as follows: first, the aligned relation between the virtual shoe and the virtual foot is predefined; then, the virtual foot and the reconstructed foot are aligned, where the alignment can be based on coarse point cloud registration with FPFH (Fast Point Feature Histogram) features and fine point cloud registration with point-to-plane ICP (Iterative Closest Point), yielding a transformation matrix by which the virtual foot is aligned with the reconstructed foot; finally, based on the alignment relation between the virtual shoe and the virtual foot, the pose of the virtual shoe is adjusted to obtain a virtual shoe aligned with the virtual foot. Since the virtual foot is aligned with the reconstructed foot, the resulting virtual shoe is thereby aligned with the reconstructed foot.
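One concrete possibility for this coarse-then-fine alignment, again sketched with Open3D (version 0.12 or later is assumed; the voxel size and distance thresholds are illustrative guesses rather than values from the patent):

```python
import open3d as o3d

def align(source, target, voxel=0.005):
    """FPFH/RANSAC coarse alignment refined by point-to-plane ICP.

    Returns a 4x4 transform mapping `source` (e.g., the virtual foot)
    onto `target` (e.g., the reconstructed foot).
    """
    def preprocess(pcd):
        down = pcd.voxel_down_sample(voxel)
        down.estimate_normals(
            o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 2, max_nn=30))
        fpfh = o3d.pipelines.registration.compute_fpfh_feature(
            down, o3d.geometry.KDTreeSearchParamHybrid(radius=voxel * 5, max_nn=100))
        return down, fpfh

    src_down, src_fpfh = preprocess(source)
    tgt_down, tgt_fpfh = preprocess(target)

    # Coarse registration: RANSAC over FPFH feature correspondences.
    coarse = o3d.pipelines.registration.registration_ransac_based_on_feature_matching(
        src_down, tgt_down, src_fpfh, tgt_fpfh, True, voxel * 3,
        o3d.pipelines.registration.TransformationEstimationPointToPoint(False), 3,
        [o3d.pipelines.registration.CorrespondenceCheckerBasedOnDistance(voxel * 3)],
        o3d.pipelines.registration.RANSACConvergenceCriteria(100000, 0.999))

    # Fine registration: point-to-plane ICP seeded with the coarse result.
    fine = o3d.pipelines.registration.registration_icp(
        src_down, tgt_down, voxel * 1.5, coarse.transformation,
        o3d.pipelines.registration.TransformationEstimationPointToPlane())
    return fine.transformation
```

The returned transform aligns the virtual foot with the reconstructed foot; composing it with the predefined shoe-to-foot relation then places the virtual shoe.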
It should be noted that, for simplicity of description, the above-mentioned method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the invention. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required by the invention.
Through the above description of the embodiments, those skilled in the art can clearly understand that the data processing method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but the former is a better implementation mode in many cases. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
Example 2
According to an embodiment of the present invention, there is further provided a data processing apparatus for implementing the first data processing method, and fig. 9 is a block diagram of a first data processing apparatus according to embodiment 2 of the present invention, and as shown in fig. 9, the apparatus includes: a first acquisition module 92, a first processing module 94, a first determination module 96, and a second processing module 98. The data processing apparatus will be described in detail below.
A first acquisition module 92 for acquiring a color image and a depth image of a physical target object;
a first processing module 94, connected to the first collecting module 92, for obtaining a three-dimensional reconstruction model of the physical target object by fusion according to the depth image;
a first determining module 96, connected to the first processing module 94, for determining a first pose of the virtual target object in the predetermined coordinate system and a second pose of the virtual object in the predetermined coordinate system according to the three-dimensional reconstruction model;
and a second processing module 98, connected to the first determining module 96, for projecting the 3D keypoints of the virtual target object to the image plane according to the first pose to obtain 2D keypoints, and drawing an interactive image of the virtual object and the physical target object in the image plane according to the second pose and the color image.
It should be noted here that the first acquiring module 92, the first processing module 94, the first determining module 96 and the second processing module 98 correspond to steps S202 to S208 in embodiment 1, and the modules are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
Example 3
According to an embodiment of the present invention, there is further provided a data processing apparatus for implementing the second data processing method, where fig. 10 is a block diagram of a second data processing apparatus according to embodiment 3 of the present invention, and as shown in fig. 10, the apparatus includes: a receiving module 1002 and a detecting module 1004. The second data processing apparatus will be described in detail below.
A receiving module 1002, configured to receive an input image, where the input image includes a physical target object;
the detecting module 1004, connected to the receiving module 1002 and configured to detect the 2D keypoints of a physical target object in an input image by using a keypoint detection model, where the keypoint detection model is obtained by machine training using multiple sets of training data, and the data in the multiple sets of training data includes: an interactive image of a physical target object and a virtual object, and 2D key points in the interactive image, where the 2D key points are obtained by projecting 3D key points of a virtual target object to an image plane according to a first pose of the virtual target object in a preset coordinate system, the interactive image is obtained by drawing the virtual object on the image plane according to a second pose of the virtual object in the preset coordinate system and an acquired color image of the physical target object, the first pose and the second pose are determined according to a three-dimensional reconstruction model, and the three-dimensional reconstruction model is obtained by fusing acquired depth images of the physical target object.
It should be noted here that the receiving module 1002 and the detecting module 1004 correspond to steps S302 to S304 in embodiment 1, and the modules are the same as the corresponding steps in the implementation example and application scenario, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
Example 4
According to an embodiment of the present invention, there is further provided a data processing apparatus for implementing the third data processing method, where fig. 11 is a block diagram of a third data processing apparatus according to embodiment 4 of the present invention, and as shown in fig. 11, the apparatus includes: an acquisition module 1102 and a training module 1104. The data processing apparatus three will be described in detail below.
An obtaining module 1102, configured to obtain multiple sets of training data, where the data in the multiple sets of training data includes: an interactive image of a physical target object and a virtual object, and 2D key points in the interactive image, where the 2D key points are obtained by projecting 3D key points of a virtual target object to an image plane according to a first pose of the virtual target object in a preset coordinate system, the interactive image is obtained by drawing the virtual object on the image plane according to a second pose of the virtual object in the preset coordinate system and an acquired color image of the physical target object, the first pose and the second pose are determined according to a three-dimensional reconstruction model, and the three-dimensional reconstruction model is obtained by fusing acquired depth images of the physical target object;
and the training module 1104 is connected to the obtaining module 1102 and is configured to perform machine training by using multiple sets of training data to obtain a key point detection model.
It should be noted here that the acquiring module 1102 and the training module 1104 correspond to steps S402 to S404 in embodiment 1, and the modules are the same as the corresponding steps in the implementation example and application scenarios, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
Example 5
According to an embodiment of the present invention, there is further provided a data processing apparatus for implementing the fourth data processing method, and fig. 12 is a block diagram of a fourth data processing apparatus according to embodiment 5 of the present invention, and as shown in fig. 12, the apparatus includes: a second acquisition module 1202, a third processing module 1204, a second determination module 1206, and a fourth processing module 1208. The data processing device four will be described in detail below.
A second collecting module 1202, configured to collect a color image and a depth image of the foot;
a third processing module 1204, connected to the second collecting module 1202, for determining a three-dimensional reconstructed foot model of the foot according to the depth image;
a second determining module 1206, connected to the third processing module 1204, for determining a first pose of the virtual foot in the predetermined coordinate system and a second pose of the virtual shoe in the predetermined coordinate system according to the three-dimensional reconstructed foot model;
and a fourth processing module 1208, connected to the second determining module 1206, for projecting the 3D foot key points of the virtual foot to the image plane according to the first pose to obtain 2D foot key points, and drawing a fitting image of the virtual shoe on the image plane according to the second pose and the color image.
It should be noted here that the second acquiring module 1202, the third processing module 1204, the second determining module 1206 and the fourth processing module 1208 correspond to steps S502 to S508 in embodiment 1, and a plurality of modules are the same as the corresponding steps in the implementation example and the application scenario, but are not limited to the disclosure in embodiment 1. It should be noted that the above modules may be operated in the computer terminal 10 provided in embodiment 1 as a part of the apparatus.
Example 6
The embodiment of the invention can provide a computer terminal which can be any computer terminal device in a computer terminal group. Optionally, in this embodiment, the computer terminal may also be replaced with a terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one network device of a plurality of network devices of a computer network.
In this embodiment, the computer terminal may execute the program code of the following steps in the data processing method of the application program: collecting a color image and a depth image of a physical target object; obtaining a three-dimensional reconstruction model of the physical target object by fusion according to the depth image; determining a first pose of the virtual target object in a preset coordinate system and a second pose of the virtual object in the preset coordinate system according to the three-dimensional reconstruction model; and projecting the 3D key points of the virtual target object to an image plane according to the first pose to obtain 2D key points, and drawing an interactive image of the virtual object and the physical target object on the image plane according to the second pose and the color image.
Alternatively, fig. 13 is a block diagram of a computer terminal according to an embodiment of the present invention. As shown in fig. 13, the computer terminal may include: one or more processors 1302, only one of which is shown, memory 1304, and the like.
The memory 1304 may be used to store software programs and modules, such as program instructions/modules corresponding to the data processing method and apparatus in the embodiments of the present invention, and the processor 1302 executes various functional applications and data processing by running the software programs and modules stored in the memory, that is, implementing the data processing method described above. The memory 1304 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 1304 can further include memory remotely located from the processor, which can be connected to a computer terminal through a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The processor can call the information and application program stored in the memory through the transmission device to execute the following steps: collecting a color image and a depth image of a physical target object; obtaining a three-dimensional reconstruction model of the physical target object by fusion according to the depth image; determining a first pose of the virtual target object in a preset coordinate system and a second pose of the virtual object in the preset coordinate system according to the three-dimensional reconstruction model; and projecting the 3D key points of the virtual target object to an image plane according to the first pose to obtain 2D key points, and drawing an interactive image of the virtual object and the physical target object on the image plane according to the second pose and the color image.
Optionally, the processor may further execute the program code of the following steps: under the condition that a plurality of depth images are available, a three-dimensional reconstruction model of a physical target object is obtained by fusion according to the plurality of depth images, and the three-dimensional reconstruction model comprises the following steps: converting the plurality of depth images into a plurality of pieces of three-dimensional point cloud; establishing a point cloud pose graph according to a plurality of pieces of three-dimensional point clouds, wherein each piece of point cloud in the plurality of pieces of three-dimensional point clouds is a node in the point cloud pose graph; and registering the point cloud pose graph to obtain a three-dimensional reconstruction model of the physical target object.
Optionally, the processor may further execute the program code of the following steps: registering the point cloud pose graph to obtain a three-dimensional reconstruction model of the physical target object, including: carrying out serialized registration on the point cloud pose graph by adopting the Iterative Closest Point (ICP) method to obtain a serialized registration result graph of the point cloud pose graph; and carrying out global registration on the serialized registration result graph to obtain a three-dimensional reconstruction model of the physical target object, wherein the pose of each point cloud in the three-dimensional reconstruction model is the pose in a global coordinate system.
Optionally, the processor may further execute the program code of the following steps: determining a first pose of the virtual target object under a predetermined coordinate system according to the three-dimensional reconstruction model, comprising: acquiring a third pose of the three-dimensional reconstruction model in a preset coordinate system; determining a transformation matrix for transforming the virtual target object to the three-dimensional reconstruction model; and determining the first pose of the virtual target object under the preset coordinate system according to the third pose and the transformation matrix.
Optionally, the processor may further execute the program code of the following steps: determining a transformation matrix for transforming the virtual target object to the three-dimensional reconstruction model, including: aligning the virtual target object with the three-dimensional reconstruction model based on a coarse point cloud registration method using Fast Point Feature Histogram (FPFH) features and a fine point cloud registration method using point-to-plane ICP, to obtain alignment parameters; and determining the transformation matrix from the virtual target object to the three-dimensional reconstruction model according to the alignment parameters.
Optionally, the processor may further execute the program code of the following steps: determining a second pose of the virtual object in the predetermined coordinate system, including: acquiring the alignment relation between the virtual object and the virtual target object; and determining the second pose of the virtual object in the preset coordinate system according to the first pose and the alignment relation.
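Read as homogeneous 4x4 transforms, these two determinations are simple compositions; the multiplication order below is our assumed convention (the patent does not fix one), and the identity matrices are placeholders for the actual poses:

```python
import numpy as np

T_world_recon = np.eye(4)     # third pose: reconstruction model in the preset frame
T_recon_virtual = np.eye(4)   # transformation matrix: virtual target object -> reconstruction
T_virtual_object = np.eye(4)  # predefined alignment: virtual object relative to virtual target

# First pose: the virtual target object in the preset coordinate system.
first_pose = T_world_recon @ T_recon_virtual
# Second pose: the virtual object in the preset coordinate system.
second_pose = first_pose @ T_virtual_object
```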
Optionally, the processor may further execute the program code of the following steps: projecting the 3D key points of the virtual target object to an image plane according to the first pose to obtain 2D key points, and drawing an interactive image of the virtual object and the physical target object on the image plane according to the second pose and the color image, including: projecting the 3D key points of the virtual target object to the image plane according to the first pose and a camera imaging method to obtain the 2D key points; and calling a preset rendering engine to draw an interaction result of the virtual object and the physical target object in the image plane according to the second pose and the color image, to obtain an interactive image of the virtual object and the physical target object in the image plane.
Optionally, the processor may further execute the program code of the following steps: calling a preset rendering engine to draw an interaction result of the virtual object and the physical target object in the image plane according to the second pose and the color image, to obtain an interactive image of the virtual object and the physical target object in the image plane, including: when the preset rendering engine is called to draw the interaction result of the virtual object and the physical target object in the image plane according to the second pose and the color image, obtaining a plurality of interactive images of the virtual object and the physical target object in the image plane by replacing the background and/or adjusting the light source.
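A minimal sketch of this augmentation idea follows (the alpha-matte render output and the simple brightness-gain lighting model are assumptions; a real rendering engine would relight the virtual object in 3D, and swapping the background behind the physical object would additionally require a foreground matte):

```python
import numpy as np

def composite(render_rgb, render_alpha, photo_rgb, background=None, gain=1.0):
    """Blend a rendered virtual object over the captured color image.

    render_rgb   : HxWx3 uint8 rendering of the virtual object
    render_alpha : HxW   uint8 alpha matte of the rendering
    photo_rgb    : HxWx3 uint8 captured color image of the physical object
    background   : optional HxWx3 uint8 replacement background
    gain         : brightness factor emulating a different light source
    """
    base = photo_rgb if background is None else background
    lit = np.clip(render_rgb.astype(np.float32) * gain, 0, 255)
    a = render_alpha[..., None].astype(np.float32) / 255.0
    out = lit * a + base.astype(np.float32) * (1.0 - a)
    return out.astype(np.uint8)

# One render can yield several training images, for example:
# composite(r, m, photo), composite(r, m, photo, background=bg),
# composite(r, m, photo, gain=0.7), composite(r, m, photo, bg, 1.3)
```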
Optionally, the processor may further execute the program code of the following steps: and under the condition that the collected color images and depth images are a plurality of color images and a plurality of depth images of the physical target object in different states and/or at different angles, respectively obtaining the 2D key points of the physical target object in different states and/or at different angles and the interactive images with the virtual object.
Optionally, the processor may further execute the program code of the following steps: generating training data according to the 2D key points and the interactive images; acquiring different training data corresponding to various physical target objects; and performing machine training by adopting different training data corresponding to various physical target objects to obtain a key point detection model.
Optionally, the processor may further execute the program code of the following steps: the physical target object and the virtual object comprise a combination of any one of: feet and virtual shoes; a head and a virtual hat; a wrist and a virtual bracelet; human body and virtual apparel; faces and virtual facial fittings.
Optionally, the processor may further execute the program code of the following steps: receiving an input image, wherein the input image includes a physical target object; and detecting 2D key points of the physical target object in the input image by adopting a key point detection model, wherein the key point detection model is obtained by machine training with multiple sets of training data, and the data in the multiple sets of training data includes: an interactive image of a physical target object and a virtual object, and 2D key points in the interactive image, wherein the 2D key points are obtained by projecting 3D key points of a virtual target object to an image plane according to a first pose of the virtual target object in a preset coordinate system, the interactive image is obtained by drawing the virtual object on the image plane according to a second pose of the virtual object in the preset coordinate system and an acquired color image of the physical target object, the first pose and the second pose are determined according to a three-dimensional reconstruction model, and the three-dimensional reconstruction model is obtained by fusing acquired depth images of the physical target object.
Optionally, the processor may further execute the program code of the following steps: acquiring a plurality of sets of training data, wherein the data in the plurality of sets of training data includes: an interactive image of a physical target object and a virtual object, and 2D key points in the interactive image, wherein the 2D key points are obtained by projecting 3D key points of a virtual target object to an image plane according to a first pose of the virtual target object in a preset coordinate system, the interactive image is obtained by drawing the virtual object on the image plane according to a second pose of the virtual object in the preset coordinate system and an acquired color image of the physical target object, the first pose and the second pose are determined according to a three-dimensional reconstruction model, and the three-dimensional reconstruction model is obtained by fusing acquired depth images of the physical target object; and performing machine training with the multiple sets of training data to obtain a key point detection model.
Optionally, the processor may further execute the program code of the following steps: collecting a color image and a depth image of a foot; determining a three-dimensional reconstruction foot model of the foot according to the depth image; determining a first pose of the virtual foot in a preset coordinate system and a second pose of the virtual shoe in the preset coordinate system according to the three-dimensional reconstruction foot model; and projecting the 3D foot key points of the virtual foot to an image plane according to the first pose to obtain 2D foot key points, and drawing a fitting image of the virtual shoe on the image plane according to the second pose and the color image.
It can be understood by those skilled in the art that the structure shown in fig. 13 is only illustrative, and the computer terminal may also be a terminal device such as a smart phone (e.g., an Android phone, an iOS phone, etc.), a tablet computer, a palmtop computer, a Mobile Internet Device (MID), a PAD, and the like. Fig. 13 does not limit the structure of the above electronic device. For example, the computer terminal may also include more or fewer components (e.g., network interfaces, display devices, etc.) than shown in fig. 13, or have a different configuration from that shown in fig. 13.
Those skilled in the art will appreciate that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing hardware associated with the terminal device, where the program may be stored in a computer-readable storage medium, and the storage medium may include: flash disks, Read-Only memories (ROMs), Random Access Memories (RAMs), magnetic or optical disks, and the like.
Example 7
The embodiment of the invention also provides a storage medium. Alternatively, in this embodiment, the storage medium may be configured to store the program code executed by the data processing method provided in embodiment 1.
Optionally, in this embodiment, the storage medium may be located in any one of computer terminals in a computer terminal group in a computer network, or in any one of mobile terminals in a mobile terminal group.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: collecting a color image and a depth image of a physical target object; according to the depth image, a three-dimensional reconstruction model of the physical target object is obtained through fusion; determining a first pose of the virtual target object under a preset coordinate system and a second pose of the virtual object under the preset coordinate system according to the three-dimensional reconstruction model; and projecting the 3D key points of the virtual target object to an image plane according to the first pose to obtain 2D key points, and drawing an interactive image of the virtual object and the physical target object in the image plane according to the second pose and the color image.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: under the condition that a plurality of depth images are available, a three-dimensional reconstruction model of a physical target object is obtained by fusion according to the plurality of depth images, and the three-dimensional reconstruction model comprises the following steps: converting the plurality of depth images into a plurality of pieces of three-dimensional point cloud; establishing a point cloud pose graph according to a plurality of pieces of three-dimensional point clouds, wherein each piece of point cloud in the plurality of pieces of three-dimensional point clouds is a node in the point cloud pose graph; and registering the point cloud pose graph to obtain a three-dimensional reconstruction model of the physical target object.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: registering the point cloud pose graph to obtain a three-dimensional reconstruction model of the physical target object, including: carrying out serialized registration on the point cloud pose graph by adopting the Iterative Closest Point (ICP) method to obtain a serialized registration result graph of the point cloud pose graph; and carrying out global registration on the serialized registration result graph to obtain a three-dimensional reconstruction model of the physical target object, wherein the pose of each point cloud in the three-dimensional reconstruction model is the pose in a global coordinate system.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: determining a first pose of the virtual target object under a predetermined coordinate system according to the three-dimensional reconstruction model, comprising: acquiring a third pose of the three-dimensional reconstruction model in a preset coordinate system; determining a transformation matrix for transforming the virtual target object to the three-dimensional reconstruction model; and determining the first pose of the virtual target object under the preset coordinate system according to the third pose and the transformation matrix.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: determining a transformation matrix for transforming the virtual target object to the three-dimensional reconstruction model, including: aligning the virtual target object with the three-dimensional reconstruction model based on a coarse point cloud registration method using Fast Point Feature Histogram (FPFH) features and a fine point cloud registration method using point-to-plane ICP, to obtain alignment parameters; and determining the transformation matrix from the virtual target object to the three-dimensional reconstruction model according to the alignment parameters.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: determining a second pose of the virtual object under the predetermined coordinate system, comprising: acquiring an alignment relation between a virtual object and a virtual target object; and determining a second pose of the virtual object under a preset coordinate system according to the first pose and the alignment relation.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: projecting the 3D key points of the virtual target object to an image plane according to the first pose to obtain 2D key points, and drawing an interactive image of the virtual object and the physical target object in the image plane according to the second pose and the color image, including: projecting the 3D key points of the virtual target object to the image plane according to the first pose and a camera imaging method to obtain the 2D key points; and calling a preset rendering engine to draw an interaction result of the virtual object and the physical target object in the image plane according to the second pose and the color image, to obtain an interactive image of the virtual object and the physical target object in the image plane.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: calling a preset rendering engine to draw an interaction result of the virtual object and the physical target object in the image plane according to the second pose and the color image, to obtain an interactive image of the virtual object and the physical target object in the image plane, including: when the preset rendering engine is called to draw the interaction result of the virtual object and the physical target object in the image plane according to the second pose and the color image, obtaining a plurality of interactive images of the virtual object and the physical target object in the image plane by replacing the background and/or adjusting the light source.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: and under the condition that the collected color images and depth images are the color images and the depth images of the physical target object in different states and/or at different angles, respectively obtaining the 2D key points of the physical target object in different states and/or at different angles and the images after interaction with the virtual object.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: generating training data according to the 2D key points and the interactive images; acquiring different training data corresponding to various physical target objects; and performing machine training by adopting different training data corresponding to various physical target objects to obtain a key point detection model.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: the physical target object and the virtual object comprise a combination of any one of: feet and virtual shoes; a head and a virtual hat; a wrist and a virtual bracelet; human body and virtual apparel; face and virtual face ornaments.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: receiving an input image, wherein the input image includes a physical target object; and detecting 2D key points of the physical target object in the input image by adopting a key point detection model, wherein the key point detection model is obtained by machine training with multiple sets of training data, and the data in the multiple sets of training data includes: an interactive image of a physical target object and a virtual object, and 2D key points in the interactive image, wherein the 2D key points are obtained by projecting 3D key points of a virtual target object to an image plane according to a first pose of the virtual target object in a preset coordinate system, the interactive image is obtained by drawing the virtual object on the image plane according to a second pose of the virtual object in the preset coordinate system and an acquired color image of the physical target object, the first pose and the second pose are determined according to a three-dimensional reconstruction model, and the three-dimensional reconstruction model is obtained by fusing acquired depth images of the physical target object.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: acquiring a plurality of sets of training data, wherein the data in the plurality of sets of training data includes: an interactive image of a physical target object and a virtual object, and 2D key points in the interactive image, wherein the 2D key points are obtained by projecting 3D key points of a virtual target object to an image plane according to a first pose of the virtual target object in a preset coordinate system, the interactive image is obtained by drawing the virtual object on the image plane according to a second pose of the virtual object in the preset coordinate system and an acquired color image of the physical target object, the first pose and the second pose are determined according to a three-dimensional reconstruction model, and the three-dimensional reconstruction model is obtained by fusing acquired depth images of the physical target object; and performing machine training with the multiple sets of training data to obtain a key point detection model.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps: collecting a color image and a depth image of a foot; determining a three-dimensional reconstruction foot model of the foot according to the depth image; determining a first pose of the virtual foot in a preset coordinate system and a second pose of the virtual shoe in the preset coordinate system according to the three-dimensional reconstruction foot model; and projecting the 3D foot key points of the virtual foot to an image plane according to the first pose to obtain 2D foot key points, and drawing a fitting image of the virtual shoe on the image plane according to the second pose and the color image.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
In the above embodiments of the present invention, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
In the embodiments provided in the present application, it should be understood that the disclosed technology can be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one type of division of logical functions, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, units or modules, and may be in an electrical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk, and various media capable of storing program codes.
The foregoing is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, various modifications and decorations can be made without departing from the principle of the present invention, and these modifications and decorations should also be regarded as the protection scope of the present invention.

Claims (19)

1. A data processing method, comprising:
collecting a color image and a depth image of a physical target object;
determining a three-dimensional reconstruction model of the physical target object according to the depth image;
determining a first pose of a virtual target object in a preset coordinate system and a second pose of a virtual object in the preset coordinate system according to the three-dimensional reconstruction model;
and projecting the 3D key points of the virtual target object to an image plane to obtain 2D key points according to the first pose, and drawing an interactive image of the virtual object and the physical target object in the image plane according to the second pose and the color image.
2. The method of claim 1, wherein, in the case that there are a plurality of depth images, determining a three-dimensional reconstruction model of the physical target object according to the plurality of depth images comprises:
converting the plurality of depth images into a plurality of pieces of three-dimensional point cloud;
establishing a point cloud pose graph according to the plurality of pieces of three-dimensional point clouds, wherein each piece of point cloud in the plurality of pieces of three-dimensional point clouds is a node in the point cloud pose graph;
and registering the point cloud pose image to obtain a three-dimensional reconstruction model of the physical target object.
3. The method of claim 2, wherein registering the point cloud pose graph to obtain a three-dimensional reconstructed model of the physical target object comprises:
carrying out serialized registration on the point cloud pose graph by adopting the Iterative Closest Point (ICP) method to obtain a serialized registration result graph of the point cloud pose graph;
and carrying out global registration on the serialized registration result graph to obtain a three-dimensional reconstruction model of the physical target object, wherein the pose of each point cloud in the three-dimensional reconstruction model is the pose under a global coordinate system.
4. The method of claim 1, wherein determining a first pose of the virtual target object in a predetermined coordinate system from the three-dimensional reconstructed model comprises:
acquiring a third pose of the three-dimensional reconstruction model in the preset coordinate system;
determining a transformation matrix for transforming the virtual target object to the three-dimensional reconstruction model;
and determining the first pose of the virtual target object under the preset coordinate system according to the third pose and the transformation matrix.
5. The method of claim 4, wherein determining a transformation matrix for transforming the virtual target object to the three-dimensional reconstructed model comprises:
aligning the virtual target object with the three-dimensional reconstruction model based on a coarse point cloud registration method using Fast Point Feature Histogram (FPFH) features and a fine point cloud registration method using point-to-plane ICP, to obtain alignment parameters;
and determining a transformation matrix of the virtual target object to the three-dimensional reconstruction model according to the alignment parameters.
6. The method of claim 4, wherein determining a second pose of the virtual object in the predetermined coordinate system comprises:
acquiring the alignment relation between the virtual object and the virtual target object;
and determining the second pose of the virtual object under the preset coordinate system according to the first pose and the alignment relation.
7. The method of claim 1, wherein projecting 3D keypoints of the virtual target object to an image plane in accordance with the first pose results in 2D keypoints, and rendering an interactive image of the virtual object with the physical target object in the image plane in accordance with the second pose and the color image, comprises:
projecting the 3D key points of the virtual target object to an image plane to obtain 2D key points according to the first pose and a camera imaging method;
and calling a preset rendering engine to draw an interaction result of the virtual object and the physical target object in the image plane according to the second pose and the color image, so as to obtain an interaction image of the virtual object and the physical target object in the image plane.
8. The method according to claim 7, wherein invoking a predetermined rendering engine to render the interaction result of the virtual object with the physical target object in the image plane according to the second pose and the color image, and obtaining an interaction image of the virtual object with the physical target object in the image plane, comprises:
and when a preset rendering engine is called to draw an interaction result of the virtual object and the physical target object in the image plane according to the second pose and the color image, a plurality of interaction images of the virtual object and the physical target object in the image plane are obtained by replacing a background and/or adjusting a light source.
9. The method according to any one of claims 1 to 8,
and under the condition that the collected color images and the collected depth images are a plurality of color images and a plurality of depth images of the physical target object in different states and/or at different angles, respectively obtaining 2D key points of the physical target object in different states and/or at different angles and interactive images of the physical target object and the virtual object.
10. The method of claim 9, further comprising:
generating training data according to the 2D key points and the interactive images;
acquiring different training data corresponding to various physical target objects;
and performing machine training by adopting different training data corresponding to various physical target objects to obtain a key point detection model.
11. The method of claim 10, wherein the physical target object and the virtual object comprise a combination of any one of:
feet and virtual shoes;
a head and a virtual hat;
a wrist and a virtual bracelet;
human and virtual apparel;
faces and virtual facial fittings.
12. A data processing method, comprising:
receiving an input image, wherein the input image comprises a physical target object;
detecting 2D key points of the physical target object in the input image by adopting a key point detection model, wherein the key point detection model is obtained by machine training with a plurality of groups of training data, and the data in the plurality of groups of training data comprises: an interactive image of a physical target object and a virtual object, and 2D key points in the interactive image, wherein the 2D key points are obtained by projecting 3D key points of the virtual target object to an image plane according to a first pose of the virtual target object in a preset coordinate system, the interactive image is obtained by drawing the virtual object on the image plane according to a second pose of the virtual object in the preset coordinate system and an acquired color image of the physical target object, the first pose and the second pose are determined according to a three-dimensional reconstruction model, and the three-dimensional reconstruction model is obtained by fusing acquired depth images of the physical target object.
13. A data processing method, comprising:
acquiring a plurality of sets of training data, wherein the data in the plurality of sets of training data comprises: an interactive image of a physical target object and a virtual object, and 2D key points in the interactive image, wherein the 2D key points are obtained by projecting 3D key points of a virtual target object to an image plane according to a first pose of the virtual target object in a preset coordinate system, the interactive image is obtained by drawing the virtual object on the image plane according to a second pose of the virtual object in the preset coordinate system and an acquired color image of the physical target object, the first pose and the second pose are determined according to a three-dimensional reconstruction model, and the three-dimensional reconstruction model is obtained by fusing acquired depth images of the physical target object;
and performing machine training by adopting the multiple groups of training data to obtain a key point detection model.
14. A data processing apparatus, comprising:
the acquisition module is used for acquiring a color image and a depth image of a physical target object;
the first processing module is used for obtaining a three-dimensional reconstruction model of the physical target object through fusion according to the plurality of depth images;
the determining module is used for determining a first pose of the virtual target object in a preset coordinate system and determining a second pose of the virtual object in the preset coordinate system according to the three-dimensional reconstruction model;
and the second processing module is used for projecting the 3D key points of the virtual target object to an image plane according to the first pose to obtain 2D key points, and drawing an interactive image of the virtual object and the physical target object in the image plane according to the second pose and the color image.
15. A data processing apparatus, characterized by comprising:
the device comprises a receiving module, a processing module and a display module, wherein the receiving module is used for receiving an input image, and the input image comprises a physical target object;
a detection module, configured to detect 2D key points of the physical target object in the input image by using a key point detection model, wherein the key point detection model is obtained by machine training using multiple sets of training data, and the data in the multiple sets of training data includes: an interactive image of a physical target object and a virtual object, and 2D key points in the interactive image, wherein the 2D key points are obtained by projecting 3D key points of a virtual target object to an image plane according to a first pose of the virtual target object in a preset coordinate system, the interactive image is obtained by drawing the virtual object on the image plane according to a second pose of the virtual object in the preset coordinate system and an acquired color image of the physical target object, the first pose and the second pose are determined according to a three-dimensional reconstruction model, and the three-dimensional reconstruction model is obtained by fusing acquired depth images of the physical target object.
16. A data processing apparatus, comprising:
an obtaining module, configured to obtain multiple sets of training data, wherein the data in the multiple sets of training data includes: an interactive image of a physical target object and a virtual object, and 2D key points in the interactive image, wherein the 2D key points are obtained by projecting 3D key points of a virtual target object to an image plane according to a first pose of the virtual target object in a preset coordinate system, the interactive image is obtained by drawing the virtual object on the image plane according to a second pose of the virtual object in the preset coordinate system and an acquired color image of the physical target object, the first pose and the second pose are determined according to a three-dimensional reconstruction model, and the three-dimensional reconstruction model is obtained by fusing acquired depth images of the physical target object;
and the training module is used for performing machine training by adopting the multiple groups of training data to obtain a key point detection model.
17. A storage medium, characterized in that the storage medium comprises a stored program, wherein when the program runs, a device in which the storage medium is located is controlled to execute the data processing method according to any one of claims 1 to 13.
18. A computer device, comprising: a memory and a processor, wherein the processor is capable of,
the memory stores a computer program;
the processor is configured to execute a computer program stored in the memory, and the computer program causes the processor to execute the data processing method according to any one of claims 1 to 13 when the computer program runs.
19. A data processing method, comprising:
collecting a color image and a depth image of a foot;
determining a three-dimensional reconstruction foot model of the foot according to the depth image;
determining a first pose of a virtual foot in a preset coordinate system and a second pose of a virtual shoe in the preset coordinate system according to the three-dimensional reconstruction foot model;
and projecting the 3D foot key points of the virtual foot to an image plane to obtain 2D foot key points according to the first pose, and drawing a fitting image of the virtual shoe on the image plane according to the second pose and the color image.
CN202011624274.XA 2020-12-30 2020-12-30 Data processing method, data processing device, storage medium and computer equipment Pending CN114693752A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011624274.XA CN114693752A (en) 2020-12-30 2020-12-30 Data processing method, data processing device, storage medium and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011624274.XA CN114693752A (en) 2020-12-30 2020-12-30 Data processing method, data processing device, storage medium and computer equipment

Publications (1)

Publication Number Publication Date
CN114693752A true CN114693752A (en) 2022-07-01

Family

ID=82133907

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011624274.XA Pending CN114693752A (en) 2020-12-30 2020-12-30 Data processing method, data processing device, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN114693752A (en)


Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024082965A1 (en) * 2022-10-20 2024-04-25 华为技术有限公司 Image annotation method and apparatus, electronic device, and storage medium
WO2024169884A1 (en) * 2023-02-17 2024-08-22 北京字跳网络技术有限公司 Image processing method and apparatus, electronic device, and storage medium
CN116453222A (en) * 2023-04-19 2023-07-18 北京百度网讯科技有限公司 Target object posture determining method, training device and storage medium
CN116453222B (en) * 2023-04-19 2024-06-11 北京百度网讯科技有限公司 Target object posture determining method, training device and storage medium
CN117974904A (en) * 2024-02-27 2024-05-03 北京数原数字化城市研究中心 Three-dimensional reconstruction model generation method, three-dimensional reconstruction device and related equipment
CN117829381A (en) * 2024-03-05 2024-04-05 成都农业科技职业学院 Agricultural greenhouse data optimization acquisition system based on Internet of things
CN117829381B (en) * 2024-03-05 2024-05-14 成都农业科技职业学院 Agricultural greenhouse data optimization acquisition system based on Internet of things

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination