CN115661232A - Pose estimation method, model training method and device

Pose estimation method, model training method and device

Info

Publication number: CN115661232A
Application number: CN202211105936.1A
Authority: CN (China)
Prior art keywords: sample, key point, image, target, information
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 张夏杰, 史培元
Current Assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN202211105936.1A
Publication of CN115661232A

Landscapes

  • Image Analysis (AREA)

Abstract

Embodiments of the disclosure provide a pose estimation method, a model training method, and corresponding apparatus. The method first acquires a target image containing a target object and inputs it into a pre-trained heatmap model, which outputs a keypoint heatmap image corresponding to the target object; the heatmap model corresponds to the three-dimensional keypoint information of an object model to be rendered. Two-dimensional keypoint information of the target object in the target image is then determined from the keypoint heatmap image and the target image. Finally, pose information of the target object is generated from the two-dimensional keypoint information of the target object and the three-dimensional keypoint information of the object model to be rendered.

Description

Pose estimation method, model training method and device
Technical Field
Embodiments of the disclosure relate to the fields of computer and internet technology, in particular to artificial intelligence and image processing, and specifically to a pose estimation method, a model training method, and corresponding apparatus.
Background
Wrist accessories are ornaments worn on the wrist, such as watches and bracelets. Currently, consumers experience wrist accessories mainly in two ways: visiting an offline store to try them on in person, or viewing the wearing effect on a model. The first is time-consuming and laborious; the second cannot convey the real effect on one's own wrist. Online virtual try-on technology addresses both problems: using a mobile phone camera or web camera (webcam) as input, it captures the wrist of a person in a real environment and renders the 3D model of the chosen accessory onto the wrist, so that the consumer can see the wearing effect on screen. This delivers a try-on experience, speeds up the try-on process, and improves the user experience.
To achieve an accurate try-on effect, precise positioning of the wrist pose is a prerequisite. Common human-pose-estimation techniques include body skeleton node localization, hand skeleton node localization, and face keypoint localization, but all of these rely on keypoints with distinctive features and handle weakly featured objects such as the wrist poorly. Moreover, the best current methods for 6DoF pose estimation still depend on special sensors such as RGBD cameras or stereo cameras. Some applications instead take a photograph, identify an initial position algorithmically, and let the user fine-tune it by hand. In short, the related art either depends on special sensors (RGBD cameras, stereo cameras, and the like) or, in the manually assisted schemes, shows the fitting effect only in a single picture and requires readjustment whenever the viewing angle changes, which is inefficient.
Disclosure of Invention
Embodiments of the disclosure provide a pose estimation method, a model training method, a pose estimation apparatus, a model training apparatus, an electronic device, and a computer-readable medium.
In a first aspect, an embodiment of the present disclosure provides a pose estimation method, including: acquiring a target image including a target object; inputting the target image into a pre-trained heatmap model and outputting a keypoint heatmap image corresponding to the target object, where the heatmap model corresponds to three-dimensional keypoint information of an object model to be rendered; determining two-dimensional keypoint information of the target object in the target image based on the keypoint heatmap image and the target image; and generating pose information of the target object based on the two-dimensional keypoint information of the target object and the three-dimensional keypoint information of the object model to be rendered.
In some embodiments, determining the two-dimensional keypoint information of the target object in the target image based on the keypoint heatmap image and the target image includes: extracting keypoint coordinate information of the target object from the keypoint heatmap image; and determining the two-dimensional keypoint information of the target object in the target image based on the keypoint coordinate information and the target image.
In some embodiments, generating the pose information of the target object based on the two-dimensional keypoint information of the target object and the three-dimensional keypoint information of the object model to be rendered includes: acquiring size information of the target image; calculating camera intrinsic parameters and distortion coefficients corresponding to the target image based on the size information; and generating the pose information of the target object based on the two-dimensional keypoint information of the target object, the three-dimensional keypoint information of the object model to be rendered, the camera intrinsic parameters, and the distortion coefficients.
In some embodiments, the method further includes: in response to obtaining the pose information of the target object, calculating a transformation matrix of the object to be rendered; and projecting the object to be rendered to the keypoint positions of the target object through the transformation matrix, generating a rendered image of the object to be rendered at the target object.
In a second aspect, an embodiment of the present disclosure provides a model training method, including: acquiring a sample training set, where the sample training set includes sample images corresponding to a sample object and sample keypoint heatmap images corresponding to the sample object; constructing an initial heatmap model including an encoder and a decoder; and training the initial heatmap model using a machine learning method, with the sample image corresponding to the sample object as the input of the encoder and the sample keypoint heatmap image corresponding to the input sample image as the expected output of the decoder, to obtain the heatmap model.
In some embodiments, acquiring the sample training set includes: in response to acquiring a sample image corresponding to the sample object, performing keypoint projection on the sample object based on the three-dimensional keypoint information of the object model to be rendered to obtain a sample keypoint image corresponding to the sample image; acquiring a sample keypoint heatmap image corresponding to the sample keypoint image based on preset heatmap generation conditions and the sample keypoint image; and composing the sample training set from the sample images corresponding to the sample object and the corresponding sample keypoint heatmap images.
In some embodiments, acquiring the sample training set further includes: in response to acquiring the sample keypoint image corresponding to the sample image, performing data enhancement processing on the sample keypoint image to obtain an enhanced sample keypoint image; and acquiring the sample keypoint heatmap image based on the preset heatmap generation conditions and the sample keypoint image then includes: acquiring a sample keypoint heatmap image corresponding to the enhanced sample keypoint image based on the preset heatmap generation conditions and the enhanced sample keypoint image.
In a third aspect, an embodiment of the present disclosure provides a pose estimation apparatus, including: an acquisition module configured to acquire a target image including a target object; an output module configured to input the target image into a pre-trained heatmap model and output a keypoint heatmap image corresponding to the target object, where the heatmap model corresponds to three-dimensional keypoint information of an object model to be rendered; a determination module configured to determine two-dimensional keypoint information of the target object in the target image based on the keypoint heatmap image and the target image; and a generation module configured to generate pose information of the target object based on the two-dimensional keypoint information of the target object and the three-dimensional keypoint information of the object model to be rendered.
In some embodiments, the determination module is further configured to: extract keypoint coordinate information of the target object from the keypoint heatmap image; and determine the two-dimensional keypoint information of the target object in the target image based on the keypoint coordinate information and the target image.
In some embodiments, the generation module is further configured to: acquire size information of the target image; calculate camera intrinsic parameters and distortion coefficients corresponding to the target image based on the size information; and generate the pose information of the target object based on the two-dimensional keypoint information of the target object, the three-dimensional keypoint information of the object model to be rendered, the camera intrinsic parameters, and the distortion coefficients.
In some embodiments, the apparatus further includes a calculation module configured to: in response to obtaining the pose information of the target object, calculate a transformation matrix of the object to be rendered; and the generation module is further configured to: project the object to be rendered to the keypoint positions of the target object through the transformation matrix, generating a rendered image of the object to be rendered at the target object.
In a fourth aspect, an embodiment of the present disclosure provides a model training apparatus, including: an acquisition module configured to acquire a sample training set, where the sample training set includes sample images corresponding to a sample object and sample keypoint heatmap images corresponding to the sample object; a construction module configured to construct an initial heatmap model including an encoder and a decoder; and a training module configured to train the initial heatmap model using a machine learning method, with the sample image corresponding to the sample object as the input of the encoder and the sample keypoint heatmap image corresponding to the input sample image as the expected output of the decoder, to obtain the heatmap model.
In some embodiments, the acquisition module includes: a projection unit configured to, in response to acquiring a sample image corresponding to the sample object, perform keypoint projection on the sample object based on the three-dimensional keypoint information of the object model to be rendered to obtain a sample keypoint image corresponding to the sample image; an acquisition unit configured to acquire a sample keypoint heatmap image corresponding to the sample keypoint image based on preset heatmap generation conditions and the sample keypoint image; and a composition unit configured to compose the sample training set from the sample images corresponding to the sample object and the corresponding sample keypoint heatmap images.
In some embodiments, the acquisition module further includes a data enhancement unit configured to: in response to acquiring the sample keypoint image corresponding to the sample image, perform data enhancement processing on the sample keypoint image to obtain an enhanced sample keypoint image; and the acquisition unit is further configured to: acquire a sample keypoint heatmap image corresponding to the enhanced sample keypoint image based on the preset heatmap generation conditions and the enhanced sample keypoint image.
In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; a storage device having one or more programs stored thereon; the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a method as described in any of the embodiments of the first or second aspects.
In a sixth aspect, embodiments of the present disclosure provide a computer-readable medium on which a computer program is stored, which computer program, when executed by a processor, implements a method as described in any of the embodiments of the first or second aspect.
In the pose estimation method provided by the embodiments of the present disclosure, the execution subject first acquires a target image including a target object, inputs the target image into a pre-trained heatmap model, and outputs a keypoint heatmap image corresponding to the target object, where the heatmap model corresponds to the three-dimensional keypoint information of an object model to be rendered; it then determines two-dimensional keypoint information of the target object in the target image based on the keypoint heatmap image and the target image; and finally generates pose information of the target object based on the two-dimensional keypoint information of the target object and the three-dimensional keypoint information of the object model to be rendered.
Drawings
Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;
FIG. 2 is a flow diagram of one embodiment of a pose estimation method according to the present disclosure;
FIG. 3 is a flow diagram for one embodiment of determining two-dimensional keypoint information of a target object in a target image, according to the present disclosure;
FIG. 4 is a flow diagram for one embodiment of generating pose information for a target object, according to the present disclosure;
FIG. 5A is a flow diagram for one embodiment of generating a rendered image, according to the present disclosure;
FIG. 5B is a schematic diagram of rendering an image according to the present disclosure;
FIG. 6 is a flow diagram for one embodiment of a model training method according to the present disclosure;
FIG. 7 is a flow diagram for one embodiment of obtaining a training set of samples, according to the present disclosure;
FIG. 8 is a schematic structural diagram of an embodiment of a pose estimation apparatus according to the present disclosure;
FIG. 9 is a schematic diagram of an embodiment of a model training apparatus according to the present disclosure;
FIG. 10 is a schematic structural diagram of an electronic device suitable for use in implementing embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant disclosure and are not limiting of the disclosure. It should be noted that, for the convenience of description, only the parts relevant to the related disclosure are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
FIG. 1 illustrates an exemplary system architecture 100 to which pose estimation methods, model training methods, and apparatus of embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 104, 105, 106, a network 107, and servers 101, 102, 103. The network 107 serves as a medium for providing communication links between the terminal devices 104, 105, 106 and the servers 101, 102, 103. The network 107 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may use the terminal devices 104, 105, 106 to interact, via the network 107, with the servers 101, 102, 103, which belong to the same server cluster, to receive or send information. Various applications may be installed on the terminal devices 104, 105, 106, such as item presentation applications, data analysis applications, and search applications.
The terminal devices 104, 105, 106 may be hardware or software. When a terminal device is hardware, it may be any electronic device with a display screen that supports communication with a server, including but not limited to a smartphone, tablet computer, laptop computer, desktop computer, and the like. When a terminal device is software, it may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules, or as a single piece of software or software module; this is not specifically limited here.
The terminal devices 104, 105, 106 may acquire a target image including a target object, input the target image into a pre-trained heatmap model, and output a keypoint heatmap image corresponding to the target object, where the heatmap model corresponds to the three-dimensional keypoint information of an object model to be rendered; they may then determine two-dimensional keypoint information of the target object in the target image based on the keypoint heatmap image and the target image, and finally generate pose information of the target object based on the two-dimensional keypoint information of the target object and the three-dimensional keypoint information of the object model to be rendered.
The servers 101, 102, 103 may be servers providing various services, for example background servers that receive requests sent by terminal devices with which communication connections are established. A background server can receive and parse a request sent by a terminal device and generate a processing result.
The servers 101, 102, 103 may acquire a sample training set, where the sample training set includes sample images corresponding to a sample object and sample keypoint heatmap images corresponding to the sample object; they may then construct an initial heatmap model including an encoder and a decoder, and finally train the initial heatmap model using a machine learning method, with the sample image corresponding to the sample object as the input of the encoder and the sample keypoint heatmap image corresponding to the input sample image as the expected output of the decoder, to obtain the heatmap model.
A server may be hardware or software. When a server is hardware, it may be any electronic device that provides services to terminal devices. When a server is software, it may be implemented as multiple pieces of software or software modules providing services to terminal devices, or as a single piece of software or software module; this is not specifically limited here.
It should be noted that the pose estimation method and the model training method provided by the embodiments of the present disclosure may be executed by the terminal devices 104, 105, 106 and the servers 101, 102, 103, respectively. Accordingly, the pose estimation apparatus and the model training apparatus may be provided in the terminal devices 104, 105, 106 and the servers 101, 102, 103, respectively.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a pose estimation method according to the present disclosure is shown. The pose estimation method comprises the following steps:
step 210, a target image including a target object is acquired.
In this step, the execution subject on which the pose estimation method runs (for example, the terminal devices 104, 105, 106 in FIG. 1) may receive, via the network, a target image including a target object input by a user. The target image may be captured by a video camera or a still camera, may be a monocular image, and may be in RGB format or the like; the target object may be an object whose keypoints are hard to determine, for example an object with weak feature points such as a wrist.
Step 220, inputting the target image into a pre-trained heatmap model, and outputting a keypoint heatmap image corresponding to the target object.
In this step, after acquiring the target image including the target object, the execution subject may further obtain a pre-trained heatmap model. The heatmap model corresponds to the three-dimensional keypoint information of the object model to be rendered, and can be used to predict the keypoints of an input object and generate a keypoint heatmap image, where the predicted keypoints correspond to the three-dimensional keypoints. The object model to be rendered may be a three-dimensional model composed of the object to be rendered and the carrier of the object to be rendered: the object to be rendered may be any object that needs rendering, such as a watch, bracelet, or wrist band, while the carrier may be the object that bears the object to be rendered, such as a wrist. The three-dimensional keypoint information of the object model to be rendered may be the three-dimensional keypoints of the carrier in that model, and these may be a preset number of symmetric keypoints on the carrier, for example 10 symmetric three-dimensional keypoints. As an example, the object model to be rendered may be a three-dimensional model of a watch worn on a wrist, and the three-dimensional keypoint information may be the three-dimensional keypoints of the wrist in that model.
The execution subject may input the target image into the pre-trained heatmap model, which performs keypoint prediction on the target image, determines the keypoint information of the target object in the target image, and generates the corresponding keypoint heatmap image. The keypoint information of the target object corresponds to the three-dimensional keypoints of the carrier in the object model to be rendered, and the number of predicted keypoints may equal the number of three-dimensional keypoints. The keypoint heatmap image may be a heatmap of preset resolution containing the keypoint information of the target object; the preset resolution may be the same as or different from the resolution of the target image, and the number of channels of the keypoint heatmap image equals the number of keypoints of the target object.
Step 230, determining two-dimensional keypoint information of the target object in the target image based on the keypoint heatmap image and the target image.
In this step, after the execution subject obtains the keypoint heatmap image from the heatmap model, since the channel count and resolution of the keypoint heatmap image differ from those of the target image, channel and resolution processing may be performed on the two, converting the keypoint information in the keypoint heatmap image into the coordinate frame of the target image to obtain the two-dimensional keypoint information of the target object in the target image.
As an alternative implementation, referring to FIG. 3, which shows a flow 300 of one embodiment of determining the two-dimensional keypoint information of the target object in the target image, step 230 above may include the following steps:
Step 310, extracting keypoint coordinate information of the target object from the keypoint heatmap image.
In this step, after obtaining the keypoint heatmap image from the heatmap model, the execution subject may analyze the pixels of the keypoint heatmap image to obtain the value of each pixel. From these pixel values, the execution subject may locate the keypoints corresponding to the target object in the keypoint heatmap image and determine their coordinate information.
Step 320, determining the two-dimensional keypoint information of the target object in the target image based on the keypoint coordinate information and the target image.
In this step, having obtained the keypoint coordinate information of the target object, the execution subject may determine the scale ratio between the keypoint heatmap image and the target image, and convert the keypoint coordinates into the target image based on that ratio, obtaining the two-dimensional keypoint information of the target object in the target image.
In this implementation, the two-dimensional keypoint information of the target object is determined by extracting the keypoint information from the keypoint heatmap image and converting it into the target image, which preserves the spatial relationship of the keypoints and improves the accuracy of the two-dimensional keypoint information.
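To make steps 310-320 concrete, the following Python sketch decodes keypoint coordinates by taking the peak of each heatmap channel and rescaling it to target-image resolution. The function name, array shapes, and peak-decoding rule are illustrative assumptions; the patent does not prescribe them.

```python
# Hypothetical decoding sketch; shapes and names are assumptions.
import numpy as np

def decode_keypoints(heatmaps: np.ndarray, image_size: tuple) -> np.ndarray:
    """heatmaps: (K, Hh, Wh), one channel per keypoint.
    image_size: (H, W) of the target image.
    Returns a (K, 2) array of (x, y) coordinates in target-image space."""
    num_kps, hm_h, hm_w = heatmaps.shape
    img_h, img_w = image_size
    keypoints = np.zeros((num_kps, 2), dtype=np.float32)
    for k in range(num_kps):
        # The peak of each channel is taken as that keypoint's location.
        y, x = divmod(int(np.argmax(heatmaps[k])), hm_w)
        # Convert heatmap coordinates back to target-image resolution
        # using the scale ratio between the two images (step 320).
        keypoints[k] = (x * img_w / hm_w, y * img_h / hm_h)
    return keypoints
```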
Step 240, generating the pose information of the target object based on the two-dimensional keypoint information of the target object and the three-dimensional keypoint information of the object model to be rendered.
In this step, after acquiring the two-dimensional keypoint information of the target object, the execution subject may compute the projection relationship between the two-dimensional keypoint information of the target object and the three-dimensional keypoint information of the object model to be rendered by solving the PnP (Perspective-n-Point) problem, thereby obtaining the pose information of the target object.
In the pose estimation method provided by this embodiment of the present disclosure, the execution subject first acquires a target image including a target object, inputs the target image into a pre-trained heatmap model, and outputs a keypoint heatmap image corresponding to the target object, where the heatmap model corresponds to the three-dimensional keypoint information of an object model to be rendered; it then determines two-dimensional keypoint information of the target object in the target image based on the keypoint heatmap image and the target image, and finally generates the pose information of the target object relative to the object to be rendered based on the two-dimensional keypoint information of the target object and the three-dimensional keypoint information of the object model to be rendered.
Referring to FIG. 4, which shows a flow 400 of one embodiment of generating the pose information of the target object, step 240 above, generating pose information of the target object based on the two-dimensional keypoint information of the target object and the three-dimensional keypoint information of the object model to be rendered, may include the following steps:
Step 410, acquiring size information of the target image.
In this step, the execution subject may analyze the target image to determine its size information, which represents the size of the target image and may be expressed as (w, h).
Step 420, calculating the camera intrinsic parameters and distortion coefficients corresponding to the target image based on the size information.
In this step, after acquiring the size information of the target image, the execution subject may calculate the camera intrinsic parameters and distortion coefficients corresponding to the target image from that size information.
The size information of the target image is expressed as: (w, h);
the camera intrinsic matrix K is computed from (w, h) (the original gives it only as an equation image, not reproduced here);
the distortion coefficients can be expressed as: d = [d0 d1 d2 d3 d4] = [0 0 0 0 0].
Step 430, generating the pose information of the target object based on the two-dimensional keypoint information of the target object, the three-dimensional keypoint information of the object model to be rendered, the camera intrinsic parameters, and the distortion coefficients.
In this step, once the execution subject has the two-dimensional keypoint information of the target object, the three-dimensional keypoint information of the object model to be rendered, the camera intrinsic parameters, and the distortion coefficients, these may be passed to OpenCV's solvePnP() function to obtain the corresponding rotation vector r_v and translation (offset) vector t_v. To use the result for rendering in OpenGL, the rotation vector r_v must be converted into a rotation matrix R, which is done with OpenCV's Rodrigues() function, thereby producing the pose information of the target object.
Specifically, denote the two-dimensional keypoint information of the target object by P_i, the three-dimensional keypoint information of the object model to be rendered by P_o, the camera intrinsics by K, and the distortion coefficients by d. The rotation vector r_v and translation vector t_v are then computed as r_v, t_v = cv2.solvePnP(P_o, P_i, K, d), and the rotation matrix R as R = cv2.Rodrigues(r_v).
In this implementation, the pose information of the target object is generated from the two-dimensional keypoint information of the target object, the three-dimensional keypoint information of the object model to be rendered, the camera intrinsic parameters, and the distortion coefficients, so that the obtained pose is consistent with all of these quantities, improving the accuracy of the pose information.
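A compact Python/OpenCV sketch of steps 410-430 follows. Because the patent shows the intrinsic matrix only as an image, the focal length and principal point below follow a common convention (f = w, principal point at the image center) and are assumptions; cv2.solvePnP and cv2.Rodrigues are used with their actual OpenCV signatures.

```python
# Hedged pose-computation sketch; the intrinsic-matrix entries are assumed.
import cv2
import numpy as np

def estimate_pose(p_i: np.ndarray, p_o: np.ndarray, w: int, h: int):
    """p_i: (N, 2) two-dimensional keypoints of the target object.
    p_o: (N, 3) three-dimensional keypoints of the object model to be rendered."""
    K = np.array([[w, 0.0, w / 2],
                  [0.0, w, h / 2],
                  [0.0, 0.0, 1.0]])          # assumed intrinsics from (w, h)
    d = np.zeros(5)                          # d = [0 0 0 0 0], as in the text
    ok, r_v, t_v = cv2.solvePnP(p_o.astype(np.float64),
                                p_i.astype(np.float64), K, d)
    R, _ = cv2.Rodrigues(r_v)                # rotation vector -> rotation matrix
    return R, t_v
```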
Referring to FIG. 5A, FIG. 5A shows a flowchart 500 of one embodiment of generating a rendered image, which may include the steps of:
at step 510, a target image including a target object is acquired.
Step 510 of this embodiment may be performed in a manner similar to step 210 in the embodiment shown in fig. 2, which is not described herein again.
And step 520, inputting the target image into a pre-trained thermodynamic diagram model, and outputting a key point thermodynamic image corresponding to the target object.
Step 520 of this embodiment can be performed in a similar manner to step 220 in the embodiment shown in fig. 2, and is not described herein again.
Step 530, determining two-dimensional key point information of the target object in the target image based on the key point thermal image and the target image.
Step 530 of this embodiment may be performed in a manner similar to step 230 of the embodiment shown in fig. 2, which is not described herein again.
And 540, generating the pose information of the target object based on the two-dimensional key point information of the target object and the three-dimensional key point information of the object model to be rendered.
Step 540 of this embodiment may be performed in a manner similar to step 240 of the embodiment shown in fig. 2, and is not described herein again.
Step 550, in response to acquiring the pose information of the target object, calculating a transformation matrix of the object to be rendered.
In this step, after the execution subject obtains the pose information of the target object, that is, the rotation matrix R and the translation vector t_v of the target object, the transformation matrix of the object to be rendered may be calculated by combining them: T = [R | t_v].
Step 560, projecting the object to be rendered to the keypoint positions of the target object through the transformation matrix, and generating a rendered image of the object to be rendered at the target object.
In this step, after obtaining the transformation matrix of the object to be rendered, the execution subject may render the object model onto the target object through the transformation matrix T. Specifically, T may be used to replace the model-view matrix of OpenGL, projecting the object to be rendered to the keypoint positions of the target object and generating the rendered image of the object to be rendered at the target object.
As an example, referring to FIG. 5B, the target object may be a wrist and the object to be rendered a watch; the two-dimensional keypoint information of the target object in the target image and the resulting rendered image may be as shown in FIG. 5B.
In this embodiment, the pose information of the target object is obtained by solving PnP, and the model of the object to be rendered is rendered onto the target object with tools such as OpenGL. Based on the pose information and the transformation matrix, the object to be rendered can be placed on the target object accurately, without manual fine-tuning, improving both the efficiency and the quality of the rendering.
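As an illustration of steps 550-560, a minimal sketch of assembling T = [R | t_v] into a 4×4 model-view matrix follows; the axis flip between OpenCV's and OpenGL's camera conventions is common practice and an assumption here, as the patent does not spell it out.

```python
# Minimal sketch, assuming an OpenCV-style pose being handed to OpenGL.
import numpy as np

def build_model_view(R: np.ndarray, t_v: np.ndarray) -> np.ndarray:
    T = np.eye(4)
    T[:3, :3] = R                 # rotation part
    T[:3, 3] = t_v.ravel()        # translation part: T = [R | t_v]
    # OpenCV looks down +z with y down; OpenGL looks down -z with y up.
    cv_to_gl = np.diag([1.0, -1.0, -1.0, 1.0])
    return cv_to_gl @ T           # replaces the OpenGL model-view matrix
```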
With continued reference to FIG. 6, a flow 600 of one embodiment of a model training method according to the present disclosure is shown. The model training method comprises the following steps:
step 610, a sample training set is obtained.
In this step, the execution subject on which the model training method runs (for example, the servers 101, 102, 103 in FIG. 1) may read a video of the sample object from a network platform or a local database and extract different frames from it; the sample object may appear at different angles in different frames, and these frames serve as the sample images corresponding to the sample object.
The execution subject may determine the sample keypoints of the sample object in each sample image and annotate them onto the image, obtaining sample images labeled with sample keypoints. It may then convert each labeled sample image into a heatmap, obtaining the sample keypoint heatmap image corresponding to each labeled sample image. In this way the execution subject obtains multiple pairs of sample images and corresponding sample keypoint heatmap images, which are used as training, validation, and test data sets; together these form the sample training set used to train the network.
The execution subject may also obtain a training set containing sample images and corresponding sample keypoint heatmap images in any manner supported by the related art; this is not specifically limited in the present disclosure.
Step 620, constructing an initial heatmap model including an encoder and a decoder.
In this step, after acquiring the training set, the execution subject may construct an initial heatmap model for generating sample keypoint heatmap images from sample images. The initial model may include an encoder and a decoder: the encoder may be built from several CDCxx modules with residual connections, which increases network depth and strengthens its representational power, and serves to reduce the resolution of the sample image while increasing its channel count; the decoder may be built from several deconvolution and convolution modules, and serves to restore the resolution of the feature map while keeping the number of output heatmap channels equal to the number of keypoints.
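The following PyTorch sketch shows one plausible shape for such an encoder-decoder. The patent's CDCxx modules are not specified, so plain residual convolution blocks stand in for them, and all layer widths are illustrative assumptions.

```python
# A minimal sketch of the encoder-decoder heatmap model; layer sizes are
# illustrative, and ResBlock is a stand-in for the unspecified CDC modules.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, c_in, c_out, stride):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(c_in, c_out, 3, stride, 1), nn.BatchNorm2d(c_out), nn.ReLU(),
            nn.Conv2d(c_out, c_out, 3, 1, 1), nn.BatchNorm2d(c_out))
        self.skip = nn.Conv2d(c_in, c_out, 1, stride)
    def forward(self, x):
        return torch.relu(self.body(x) + self.skip(x))

class HeatmapNet(nn.Module):
    def __init__(self, num_keypoints=10):   # e.g. 10 symmetric keypoints
        super().__init__()
        # Encoder: halves resolution and widens channels at each stage.
        self.encoder = nn.Sequential(
            ResBlock(3, 32, 2), ResBlock(32, 64, 2), ResBlock(64, 128, 2))
        # Decoder: deconvolutions restore resolution; the final convolution
        # maps to one heatmap channel per keypoint.
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.Conv2d(32, num_keypoints, 1))
    def forward(self, x):
        return self.decoder(self.encoder(x))
```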
Step 630, using a machine learning method, training the initial heatmap model with the sample image corresponding to the sample object as the input of the encoder and the sample keypoint heatmap image corresponding to the input sample image as the expected output of the decoder, to obtain the heatmap model.
In this step, after obtaining the training set and constructing the initial heatmap model, the execution subject may train the initial model on the training set using a machine learning method, yielding a heatmap model that generates keypoint heatmap images. Specifically, the execution subject may use the sample image as the input of the encoder and the corresponding sample keypoint heatmap image as the expected output of the decoder, and train the initial heatmap model to obtain the heatmap model.
In the model training method provided by this embodiment of the present disclosure, the execution subject first acquires a sample training set containing sample images corresponding to a sample object and the corresponding sample keypoint heatmap images; it then constructs an initial heatmap model including an encoder and a decoder; finally, it trains the initial model with the sample images as encoder input and the corresponding sample keypoint heatmap images as the expected decoder output, obtaining the heatmap model. Training on sample images and sample keypoint heatmap images yields a heatmap model that can produce a keypoint heatmap image from an input image alone, improving both the versatility and the accuracy of model training.
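A minimal training-loop sketch for step 630 is given below. Pixel-wise mean-squared error between predicted and sample keypoint heatmaps is an assumed loss; the patent does not name one.

```python
# Training sketch; optimizer, loss, and hyperparameters are assumptions.
import torch

def train(model, loader, epochs=10, lr=1e-3, device="cuda"):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = torch.nn.MSELoss()
    model.to(device).train()
    for _ in range(epochs):
        for images, target_heatmaps in loader:
            # Sample image as encoder input, sample keypoint heatmap
            # as the decoder's expected output.
            pred = model(images.to(device))
            loss = loss_fn(pred, target_heatmaps.to(device))
            opt.zero_grad()
            loss.backward()
            opt.step()
```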
Referring to FIG. 7, which shows a flow 700 of one embodiment of acquiring a sample training set, step 610 above, acquiring a sample training set, may include the following steps:
Step 710, in response to acquiring a sample image corresponding to the sample object, performing keypoint projection on the sample object based on the three-dimensional keypoint information of the object model to be rendered, to obtain a sample keypoint image corresponding to the sample image.
In this step, after acquiring the sample image corresponding to the sample object, the execution subject may obtain the object model to be rendered that corresponds to the sample object. The object model to be rendered may be a three-dimensional model composed of the object to be rendered and its carrier: the object to be rendered may be whatever needs to be rendered for the sample object, such as a watch, bracelet, or wrist band, while the carrier corresponds to the sample object and bears the object to be rendered, such as a wrist. The three-dimensional keypoint information of the object model to be rendered may be the three-dimensional keypoints of the carrier, which may be a preset number of symmetric keypoints on the carrier, for example 10 symmetric three-dimensional keypoints. As an example, the object model to be rendered may be a three-dimensional model of a watch worn on a wrist, and the three-dimensional keypoint information may be the three-dimensional keypoints of the wrist in that model.
The execution subject may take the three-dimensional keypoint information of the object model to be rendered and project these keypoints onto the sample object, obtaining the sample keypoint image corresponding to the sample image.
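One way to realize this projection is OpenCV's projectPoints, sketched below; the assumption that the model-to-camera pose (rvec, tvec) of each sample is known from the capture or rendering setup is ours, as the patent does not detail how it is obtained.

```python
# Hypothetical projection sketch; (rvec, tvec) availability is assumed.
import cv2
import numpy as np

def project_keypoints(p_o, rvec, tvec, K, dist):
    """p_o: (N, 3) three-dimensional keypoints of the object carrier.
    Returns (N, 2) pixel coordinates on the sample image."""
    pts_2d, _ = cv2.projectPoints(p_o.astype(np.float64), rvec, tvec, K, dist)
    return pts_2d.reshape(-1, 2)
```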
Step 720, acquiring a sample keypoint heatmap image corresponding to the sample keypoint image based on preset heatmap generation conditions and the sample keypoint image.
In this step, after obtaining the sample keypoint image, the execution subject may further obtain the preset heatmap generation conditions, which specify the format of the sample keypoint heatmap, including requirements on channel count, resolution, and the like. When generating the heatmap for the sample keypoint image, the execution subject may produce the sample keypoint heatmap image according to these conditions, so that it satisfies them, i.e. it is a heatmap with the preset number of channels and the preset resolution.
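A common way to satisfy such generation conditions is to render one Gaussian peak per keypoint at the preset resolution, one channel per keypoint; the Gaussian form and sigma below are assumptions, not taken from the patent.

```python
# Sketch of sample keypoint heatmap generation; sigma is an assumption.
import numpy as np

def make_heatmaps(keypoints, hm_size=(64, 64), sigma=2.0):
    """keypoints: (K, 2) (x, y) coordinates already scaled to hm_size.
    Returns (K, H, W) heatmaps satisfying the preset channel count
    (one channel per keypoint) and preset resolution."""
    h, w = hm_size
    ys, xs = np.mgrid[0:h, 0:w]
    maps = np.zeros((len(keypoints), h, w), dtype=np.float32)
    for k, (x, y) in enumerate(keypoints):
        maps[k] = np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / (2 * sigma ** 2))
    return maps
```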
Step 730, composing the sample training set from the sample images corresponding to the sample object and the corresponding sample keypoint heatmap images.
In this step, after obtaining the sample keypoint heatmap images, the execution subject may pair the sample images corresponding to the sample object with their sample keypoint heatmap images to form the sample training set.
In this implementation, symmetric three-dimensional keypoint information is chosen for projection to obtain the corresponding sample keypoints, so that a mispredicted individual keypoint does not affect subsequent processing, giving robustness against unstable prediction results. Moreover, once most of the three-dimensional keypoints have been projected onto the sample image, the local features around them are comparatively distinctive, which matters for target objects with weak features and facilitates subsequent learning by the neural network.
As an optional implementation, step 610 of acquiring the sample training set may further include: in response to acquiring the sample keypoint image corresponding to the sample image, performing data enhancement processing on the sample keypoint image to obtain an enhanced sample keypoint image; and step 720, acquiring the sample keypoint heatmap image based on the preset heatmap generation conditions and the sample keypoint image, then becomes: acquiring a sample keypoint heatmap image corresponding to the enhanced sample keypoint image based on the preset heatmap generation conditions and the enhanced sample keypoint image.
Specifically, after acquiring the sample keypoint image corresponding to the sample image, the execution subject may apply data enhancement to it. Various enhancements are possible, including a 20-fold rotation augmentation; the execution subject may also randomly blur the image, perturb channel brightness, and crop it, obtaining the enhanced sample keypoint image. The execution subject can then obtain the sample keypoint heatmap image corresponding to the enhanced sample keypoint image from the preset heatmap generation conditions and the enhanced image.
In this implementation, data enhancement enriches the data set, ensuring the diversity of the training set and thereby improving the accuracy of the trained model.
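An illustrative sketch of such an augmentation step follows, covering rotation, blur, brightness perturbation, and cropping while keeping the keypoint annotations consistent; all probabilities and parameter ranges are assumptions.

```python
# Augmentation sketch; ranges and probabilities are assumed, not from the patent.
import cv2
import numpy as np

def augment(image, keypoints, rng=np.random):
    """image: HxWx3 uint8 array; keypoints: (K, 2) float array of (x, y)."""
    h, w = image.shape[:2]
    # Random rotation about the image center; keypoints follow the same map.
    M = cv2.getRotationMatrix2D((w / 2, h / 2), rng.uniform(-30, 30), 1.0)
    image = cv2.warpAffine(image, M, (w, h))
    keypoints = keypoints @ M[:, :2].T + M[:, 2]
    if rng.rand() < 0.5:                      # random blur
        image = cv2.GaussianBlur(image, (5, 5), 0)
    if rng.rand() < 0.5:                      # random channel brightness
        scale = rng.uniform(0.7, 1.3, size=3)
        image = np.clip(image * scale, 0, 255).astype(np.uint8)
    # Random crop from the top-left corner; shift keypoints accordingly.
    x0, y0 = rng.randint(0, w // 8), rng.randint(0, h // 8)
    image = image[y0:, x0:]
    keypoints = keypoints - np.array([x0, y0])
    return image, keypoints
```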
With further reference to fig. 8, as an implementation of the methods shown in the above figures, the present disclosure provides an embodiment of a pose estimation apparatus. This embodiment of the device corresponds to the embodiment of the method shown in fig. 2.
As shown in FIG. 8, the pose estimation apparatus 800 of this embodiment may include: an acquisition module 810, an output module 820, a determination module 830, and a generation module 840.
The acquisition module 810 is configured to acquire a target image including a target object;
the output module 820 is configured to input the target image into a pre-trained heatmap model and output a keypoint heatmap image corresponding to the target object, where the heatmap model corresponds to three-dimensional keypoint information of an object model to be rendered;
the determination module 830 is configured to determine two-dimensional keypoint information of the target object in the target image based on the keypoint heatmap image and the target image;
the generation module 840 is configured to generate pose information of the target object based on the two-dimensional keypoint information of the target object and the three-dimensional keypoint information of the object model to be rendered.
In some optional implementations, the determination module 830 is further configured to: extract keypoint coordinate information of the target object from the keypoint heatmap image; and determine the two-dimensional keypoint information of the target object in the target image based on the keypoint coordinate information and the target image.
In some optional implementations, the generation module 840 is further configured to: acquire size information of the target image; calculate camera intrinsic parameters and distortion coefficients corresponding to the target image based on the size information; and generate the pose information of the target object based on the two-dimensional keypoint information of the target object, the three-dimensional keypoint information of the object model to be rendered, the camera intrinsic parameters, and the distortion coefficients.
In some optional implementations, the apparatus further includes a calculation module configured to: in response to obtaining the pose information of the target object, calculate a transformation matrix of the object to be rendered; and the generation module 840 is further configured to: project the object to be rendered to the keypoint positions of the target object through the transformation matrix, generating a rendered image of the object to be rendered at the target object.
With the pose estimation apparatus provided by this embodiment of the present disclosure, the execution subject first acquires a target image including a target object, inputs the target image into a pre-trained heatmap model, and outputs a keypoint heatmap image corresponding to the target object, where the heatmap model corresponds to the three-dimensional keypoint information of an object model to be rendered; it then determines two-dimensional keypoint information of the target object in the target image based on the keypoint heatmap image and the target image, and finally generates pose information of the target object based on the two-dimensional keypoint information of the target object and the three-dimensional keypoint information of the object model to be rendered.
Those skilled in the art will appreciate that the above-described apparatus may also include some other well-known structures, such as processors, memories, etc., which are not shown in fig. 8 in order to not unnecessarily obscure embodiments of the present disclosure.
With further reference to FIG. 9, as an implementation of the methods illustrated in the above figures, the present disclosure provides one embodiment of a model training apparatus. This device embodiment corresponds to the method embodiment shown in fig. 6.
As shown in fig. 9, the model training apparatus 900 of the present embodiment may include: an acquisition module 910, a construction module 920, and a training module 930.
The acquisition module 910 is configured to acquire a sample training set, where the sample training set includes sample images corresponding to a sample object and sample keypoint heatmap images corresponding to the sample object;
the construction module 920 is configured to construct an initial heatmap model including an encoder and a decoder;
the training module 930 is configured to train the initial heatmap model using a machine learning method, with the sample image corresponding to the sample object as the input of the encoder and the sample keypoint heatmap image corresponding to the input sample image as the expected output of the decoder, to obtain the heatmap model.
In some optional implementations, the acquisition module 910 includes: a projection unit configured to, in response to acquiring a sample image corresponding to the sample object, perform keypoint projection on the sample object based on the three-dimensional keypoint information of the object model to be rendered to obtain a sample keypoint image corresponding to the sample image; an acquisition unit configured to acquire a sample keypoint heatmap image corresponding to the sample keypoint image based on preset heatmap generation conditions and the sample keypoint image; and a composition unit configured to compose the sample training set from the sample images corresponding to the sample object and the corresponding sample keypoint heatmap images.
In some optional implementations, the acquisition module 910 further includes a data enhancement unit configured to: in response to acquiring the sample keypoint image corresponding to the sample image, perform data enhancement processing on the sample keypoint image to obtain an enhanced sample keypoint image; and the acquisition unit is further configured to: acquire a sample keypoint heatmap image corresponding to the enhanced sample keypoint image based on the preset heatmap generation conditions and the enhanced sample keypoint image.
In the model training apparatus provided by the above embodiment of the present disclosure, the execution subject first acquires a sample training set containing sample images corresponding to a sample object and the corresponding sample keypoint heatmap images; it then constructs an initial heatmap model including an encoder and a decoder; finally, using a machine learning method, it trains the initial model with the sample images as encoder input and the corresponding sample keypoint heatmap images as the expected decoder output, obtaining the heatmap model. Training on sample images and sample keypoint heatmap images yields a heatmap model that can produce a keypoint heatmap image from an input image alone, improving both the versatility and the accuracy of model training.
Those skilled in the art will appreciate that the above-described apparatus may also include some other well-known structure, such as a processor, memory, etc., which is not shown in fig. 9 in order not to unnecessarily obscure embodiments of the present disclosure.
Referring now to FIG. 10, shown is a schematic diagram of an electronic device 1000 suitable for use in implementing embodiments of the present disclosure. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a smart screen, a notebook computer, a PAD (tablet computer), a PMP (portable multimedia player), a car terminal (e.g., car navigation terminal), etc., and a fixed terminal such as a digital TV, a desktop computer, etc. The terminal device shown in fig. 10 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 10, the electronic device 1000 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 1001 that may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1002 or a program loaded from a storage means 1008 into a Random Access Memory (RAM) 1003. In the RAM 1003, various programs and data necessary for the operation of the electronic apparatus 1000 are also stored. The processing device 1001, ROM 1002, and RAM 1003 are connected to each other by a bus 1004. An input/output (I/O) interface 1005 is also connected to bus 1004.
Generally, the following devices may be connected to the I/O interface 1005: an input device 1006 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, and the like; an output device 1007 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, and the like; a storage device 1008 including, for example, a magnetic tape, a hard disk, and the like; and a communication device 1009. The communication device 1009 may allow the electronic device 1000 to communicate with other devices wirelessly or by wire to exchange data. While fig. 10 illustrates an electronic device 1000 having various means, it is to be understood that not all of the illustrated means are required to be implemented or provided; more or fewer means may alternatively be implemented or provided. Each block shown in fig. 10 may represent one device or multiple devices as desired.
In particular, the processes described above with reference to the flow diagrams may be implemented as computer software programs, according to embodiments of the present disclosure. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 1009, or installed from the storage means 1008, or installed from the ROM 1002. The computer program, when executed by the processing device 1001, performs the above-described functions defined in the methods of the embodiments of the present disclosure. It should be noted that the computer readable medium of the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special-purpose hardware-based systems that perform the specified functions or acts, or by combinations of special-purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor including an acquisition module, an output module, a determination module, and a generation module, or including an acquisition module, a construction module, and a training module; in some cases, the names of these modules do not constitute a limitation on the modules themselves.
As another aspect, the present disclosure also provides a computer-readable medium, which may be included in the electronic device described above, or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a target image including a target object; input the target image into a pre-trained thermodynamic diagram model and output a key point thermal image corresponding to the target object, wherein the thermodynamic diagram model corresponds to three-dimensional key point information of an object model to be rendered; determine two-dimensional key point information of the target object in the target image based on the key point thermal image and the target image; and generate pose information of the target object based on the two-dimensional key point information of the target object and the three-dimensional key point information of the object model to be rendered. Alternatively, the one or more programs cause the electronic device to: acquire a sample training set, wherein the sample training set includes a sample image corresponding to a sample object and a sample key point thermal image corresponding to the sample object; construct a thermodynamic diagram initial model including an encoder and a decoder; and train the thermodynamic diagram initial model by using a machine learning method, taking the sample image corresponding to the sample object as the input of the encoder and the sample key point thermal image corresponding to the input sample image as the expected output of the decoder, to obtain the thermodynamic diagram model.
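Taken together, the pose estimation steps above — extracting 2D key points from the key point thermal image, deriving camera intrinsics from the image size, and solving for the pose against the model's 3D key points — can be sketched with OpenCV as follows. The per-channel argmax decoding, the focal-length heuristic, and the zero distortion coefficients below are assumptions; the disclosure states only that the intrinsics and a distortion coefficient are computed from the size information.

```python
import cv2
import numpy as np

def estimate_pose(heatmaps, image_size, object_points):
    """heatmaps:      (K, h, w) thermodynamic diagram model output.
    image_size:    (width, height) of the target image.
    object_points: (K, 3) 3D key points of the object model to be
                   rendered (K >= 4 required by solvePnP)."""
    img_w, img_h = image_size
    k, h, w = heatmaps.shape

    # Two-dimensional key points: per-channel argmax, rescaled to pixels.
    idx = heatmaps.reshape(k, -1).argmax(axis=1)
    pts_2d = np.stack([(idx % w) * img_w / w,
                       (idx // w) * img_h / h], axis=1).astype(np.float32)

    # Intrinsics approximated from image size alone: focal length taken
    # as the image width, principal point at the center (an assumption).
    camera_matrix = np.array([[img_w, 0, img_w / 2],
                              [0, img_w, img_h / 2],
                              [0, 0, 1]], dtype=np.float32)
    dist_coeffs = np.zeros(5, dtype=np.float32)  # assume no distortion

    ok, rvec, tvec = cv2.solvePnP(object_points.astype(np.float32),
                                  pts_2d, camera_matrix, dist_coeffs)

    # 4x4 transformation matrix for projecting the object model to be
    # rendered onto the key point positions of the target object.
    rot, _ = cv2.Rodrigues(rvec)
    transform = np.eye(4, dtype=np.float32)
    transform[:3, :3], transform[:3, 3] = rot, tvec.ravel()
    return ok, transform
```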
The foregoing description is only of preferred embodiments of the present disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to technical solutions formed by the specific combinations of the above technical features, and also covers other technical solutions formed by any combination of the above technical features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the embodiments of the present disclosure.

Claims (16)

1. A pose estimation method, the method comprising:
acquiring a target image including a target object;
inputting the target image into a pre-trained thermodynamic diagram model, and outputting a key point thermal image corresponding to the target object, wherein the thermodynamic diagram model corresponds to three-dimensional key point information of an object model to be rendered;
determining two-dimensional key point information of the target object in the target image based on the key point thermal image and the target image;
and generating the pose information of the target object based on the two-dimensional key point information of the target object and the three-dimensional key point information of the object model to be rendered.
2. The method of claim 1, wherein said determining two-dimensional keypoint information of the target object in the target image based on the keypoint thermal image and the target image comprises:
extracting key point coordinate information of the target object from the key point thermal image;
and determining two-dimensional key point information of the target object in the target image based on the key point coordinate information and the target image.
3. The method of claim 1, wherein the generating pose information for the target object based on the two-dimensional keypoint information of the target object and the three-dimensional keypoint information of the object model to be rendered comprises:
acquiring size information of the target image;
calculating camera intrinsic parameters and a distortion coefficient corresponding to the target image based on the size information;
and generating the pose information of the target object based on the two-dimensional key point information of the target object, the three-dimensional key point information of the object model to be rendered, the camera intrinsic parameters, and the distortion coefficient.
4. The method of any of claims 1-3, further comprising:
in response to obtaining the pose information of the target object, calculating a transformation matrix of the object model to be rendered;
and projecting the object model to be rendered onto the key point positions of the target object through the transformation matrix, and generating a rendered image of the object model to be rendered at the target object.
5. A method of model training, the method comprising:
acquiring a sample training set, wherein the sample training set comprises a sample image corresponding to a sample object and a sample key point thermal image corresponding to the sample object;
constructing a thermodynamic diagram initial model comprising an encoder and a decoder;
and training the thermodynamic diagram initial model by using a machine learning method and taking the sample image corresponding to the sample object as the input of the encoder and the sample key point thermal image corresponding to the input sample image as the expected output of the decoder, to obtain the thermodynamic diagram model.
6. The method of claim 5, wherein the acquiring a sample training set comprises:
in response to acquiring the sample image corresponding to the sample object, performing key point projection on the sample object based on three-dimensional key point information of an object model to be rendered, to obtain a sample key point image corresponding to the sample image;
acquiring a sample key point thermal image corresponding to the sample key point image based on a preset generation condition of the thermal image and the sample key point image;
and forming the sample training set from the sample image corresponding to the sample object and the corresponding sample key point thermal image.
7. The method of claim 6, wherein the acquiring a sample training set further comprises:
in response to acquiring the sample key point image corresponding to the sample image, performing data enhancement processing on the sample key point image to obtain an enhanced sample key point image; and
the obtaining of the sample key point thermal image corresponding to the sample key point image based on the preset generating condition of the thermal image and the sample key point image includes:
and acquiring a sample key point thermal image corresponding to the enhanced sample key point image based on the preset generation condition of the thermal image and the enhanced sample key point image.
8. A pose estimation apparatus, the apparatus comprising:
an acquisition module configured to acquire a target image including a target object;
the output module is configured to input the target image into a pre-trained thermodynamic diagram model and output a key point thermal image corresponding to the target object, wherein the thermodynamic diagram model corresponds to three-dimensional key point information of an object model to be rendered;
a determination module configured to determine two-dimensional keypoint information of the target object in the target image based on the keypoint thermal image and the target image;
a generating module configured to generate pose information of the target object based on the two-dimensional keypoint information of the target object and the three-dimensional keypoint information of the object model to be rendered.
9. The apparatus of claim 8, wherein the determination module is further configured to:
extracting key point coordinate information of the target object from the key point thermal image;
and determining two-dimensional key point information of the target object in the target image based on the key point coordinate information and the target image.
10. The apparatus of claim 8, wherein the generation module is further configured to:
acquiring size information of the target image;
calculating camera intrinsic parameters and a distortion coefficient corresponding to the target image based on the size information;
and generating the pose information of the target object based on the two-dimensional key point information of the target object, the three-dimensional key point information of the object model to be rendered, the camera intrinsic parameters, and the distortion coefficient.
11. The apparatus of any of claims 8-10, further comprising a computing module;
the computing module configured to: in response to obtaining the pose information of the target object, calculate a transformation matrix of the object model to be rendered;
the generation module further configured to: project the object model to be rendered onto the key point positions of the target object through the transformation matrix, and generate a rendered image of the object model to be rendered at the target object.
12. A model training apparatus, the apparatus comprising:
the acquisition module is configured to acquire a sample training set, wherein the sample training set comprises a sample image corresponding to a sample object and a sample key point thermal image corresponding to the sample object;
a construction module configured to construct an initial model of a thermodynamic diagram comprising an encoder and a decoder;
and the training module is configured to train the thermodynamic diagram initial model by using a machine learning method and taking the sample image corresponding to the sample object as the input of the encoder and the sample key point thermal image corresponding to the input sample image as the expected output of the decoder, so as to obtain the thermodynamic diagram model.
13. The apparatus of claim 12, wherein the acquisition module comprises:
the projection unit is configured to, in response to acquiring a sample image corresponding to the sample object, perform key point projection on the sample object based on three-dimensional key point information of an object model to be rendered, so as to obtain a sample key point image corresponding to the sample image;
the acquisition unit is configured to acquire a sample key point thermal image corresponding to the sample key point image based on a preset generation condition of the thermal image and the sample key point image;
a composition unit configured to compose the sample training set from the sample image corresponding to the sample object and the corresponding sample key point thermal image.
14. The apparatus of claim 13, wherein the acquisition module further comprises: a data enhancement unit;
the data enhancement unit configured to: in response to acquiring the sample key point image corresponding to the sample image, perform data enhancement processing on the sample key point image to obtain an enhanced sample key point image; and
the acquisition unit is further configured to acquire a sample key point thermal image corresponding to the enhanced sample key point image based on the preset generation condition of the thermal image and the enhanced sample key point image.
15. An electronic device, comprising:
one or more processors;
a storage device for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-7.
16. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-7.
CN202211105936.1A 2022-09-07 2022-09-07 Pose estimation method, model training method and device Pending CN115661232A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211105936.1A CN115661232A (en) 2022-09-07 2022-09-07 Pose estimation method, model training method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211105936.1A CN115661232A (en) 2022-09-07 2022-09-07 Pose estimation method, model training method and device

Publications (1)

Publication Number Publication Date
CN115661232A true CN115661232A (en) 2023-01-31

Family

ID=84983101

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211105936.1A Pending CN115661232A (en) 2022-09-07 2022-09-07 Pose estimation method, model training method and device

Country Status (1)

Country Link
CN (1) CN115661232A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116385829A (en) * 2023-04-07 2023-07-04 北京百度网讯科技有限公司 Gesture description information generation method, model training method and device
CN116385829B (en) * 2023-04-07 2024-02-06 北京百度网讯科技有限公司 Gesture description information generation method, model training method and device

Similar Documents

Publication Publication Date Title
CN112989904B (en) Method for generating style image, method, device, equipment and medium for training model
CN106846497B (en) Method and device for presenting three-dimensional map applied to terminal
CN108492364B (en) Method and apparatus for generating image generation model
CN109741388B (en) Method and apparatus for generating a binocular depth estimation model
CN110021052B (en) Method and apparatus for generating fundus image generation model
CN110298319B (en) Image synthesis method and device
CN109754464B (en) Method and apparatus for generating information
CN110288705B (en) Method and device for generating three-dimensional model
CN112929582A (en) Special effect display method, device, equipment and medium
CN110310299B (en) Method and apparatus for training optical flow network, and method and apparatus for processing image
CN109683710B (en) A kind of palm normal vector determines method, apparatus, equipment and storage medium
CN114549722A (en) Rendering method, device and equipment of 3D material and storage medium
CN112734910A (en) Real-time human face three-dimensional image reconstruction method and device based on RGB single image and electronic equipment
CN115661232A (en) Pose estimation method, model training method and device
CN114842120A (en) Image rendering processing method, device, equipment and medium
CN114004905A (en) Method, device and equipment for generating character style image and storage medium
CN111818265B (en) Interaction method and device based on augmented reality model, electronic equipment and medium
CN115775310A (en) Data processing method and device, electronic equipment and storage medium
CN111862081B (en) Image scoring method, training method and device of score prediction network
WO2024055837A1 (en) Image processing method and apparatus, and device and medium
CN111447379B (en) Method and device for generating information
WO2023193613A1 (en) Highlight shading method and apparatus, and medium and electronic device
CN112308950A (en) Video generation method and device
CN110619602A (en) Image generation method and device, electronic equipment and storage medium
US20230326135A1 (en) Concurrent human pose estimates for virtual representation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination