WO2020134010A1

WO2020134010A1 - Training of image key point extraction model and image key point extraction

Info

Publication number: WO2020134010A1
Application number: PCT/CN2019/094740
Authority: WO
Inventors: 喻冬东; 王长虎
Original assignee: 北京字节跳动网络技术有限公司
Priority date: 2018-12-27
Filing date: 2019-07-04
Publication date: 2020-07-02
Also published as: CN109753910B; CN109753910A

Abstract

A method for training an image key point extraction model, where the image key point extraction model comprises multiple cascaded sub-models. The method comprises: inputting a training image into the image key point extraction model and obtaining the key points output by each sub-model, this pass counting as one round of training of the image key point extraction model (S11); for each sub-model, determining a difference between the key points output by the sub-model and the key points in the training image that correspond to the degree identifier of the sub-model (S12), wherein the degree identifier characterizes the difficulty level of key point extraction; and determining the sum of the differences corresponding to all the sub-models as the target difference of the image key point extraction model and, when the number of training rounds of the image key point extraction model has not reached a preset number, updating the image key point extraction model according to the target difference (S13). By processing key points of different difficulty levels separately, the method can improve the accuracy and the applicable range of the image key point extraction model.

Description

Training of image key point extraction model and image key point extraction

Cross-reference of related applications

This application claims priority to Chinese Patent Application No. 201811615301.X, filed with the China Intellectual Property Office on December 27, 2018, the entire disclosure of which is incorporated herein by reference.

Technical field

The present disclosure relates to the field of image processing, and in particular, to training of image key point extraction models and image key point extraction.

Background art

In the prior art, key points of an image are usually extracted with a convolutional neural network, and the image key point extraction model is trained uniformly on labeled images. However, differences in image clarity or shooting environment can make the key points in some images difficult to extract. When such images are used for uniform training, the resulting network has narrower applicability and lower accuracy.

Summary of the invention

According to a first aspect of the present disclosure, there is provided a training method for an image key point extraction model, the image key point extraction model includes a plurality of cascaded sub-models, the method includes:

Inputting a training image into the image key point extraction model to obtain the key points output by each sub-model, as one round of training of the image key point extraction model;

For each sub-model, determining the difference between the key points output by the sub-model and the key points in the training image that correspond to the degree identifier of the sub-model, where the degree identifier characterizes the difficulty of key point extraction;

Determining the sum of the differences corresponding to the sub-models as the target difference of the image key point extraction model, and, when the number of training rounds of the image key point extraction model has not reached a preset number, updating the image key point extraction model according to the target difference.

Optionally, after the image key point extraction model is updated, the method returns to the step of inputting the training image into the image key point extraction model to obtain the key points output by each sub-model, until the number of training rounds of the image key point extraction model reaches the preset number.

Optionally, the input of the first sub-model in the image key point extraction model is a feature map of the human body image part in the training image, and the input of each sub-model other than the first sub-model is the key points output by the previous sub-model together with the feature map of the human body image part in the training image.

Optionally, the feature map of the human image part in the training image is determined in the following manner:

Extract the first image corresponding to the human image part of the training image;

The resolution corresponding to the first image is adjusted to a preset resolution, a second image is obtained, and the feature map of the human body image part in the training image is determined according to the second image.

According to a second aspect of the present disclosure, an image key point extraction method is provided, the method including:

Receiving a target image, the target image containing a human body image portion;

Inputting the target image into an image key point extraction model, and determining the key points output by the last sub-model of the image key point extraction model as the key points of the human body image part in the target image, wherein the image key point extraction model includes multiple cascaded sub-models and is obtained by training according to the method of the first aspect.

According to a third aspect of the present disclosure, there is provided a training device for an image key point extraction model, the image key point extraction model including a plurality of cascaded sub-models, and the device including:

A processing module, used to input a training image into the image key point extraction model to obtain the key points output by each sub-model, as one round of training of the image key point extraction model;

A first determining module, used to determine, for each sub-model, the difference between the key points output by the sub-model and the key points in the training image that correspond to the degree identifier of the sub-model, wherein the degree identifier characterizes the difficulty of key point extraction;

An update module, used to determine the sum of the differences corresponding to the sub-models as the target difference of the image key point extraction model, and, when the number of training rounds of the image key point extraction model has not reached a preset number, to update the image key point extraction model according to the target difference.

Optionally, after the update module updates the image key point extraction model, the processing module inputs the training image into the updated image key point extraction model to obtain the key points output by each sub-model, until the number of training rounds of the image key point extraction model reaches the preset number.

Optionally, the device further includes a feature extraction module for obtaining a feature map of a human image part in the training image, the feature extraction module includes:

An extraction sub-module for extracting the first image corresponding to the human body image part of the training image;

The adjustment sub-module is used to adjust the resolution corresponding to the first image to a preset resolution, obtain a second image, and determine the feature map of the human body image portion in the training image according to the second image.

According to a fourth aspect of the present disclosure, there is provided an image key point extraction device, the device comprising:

The receiving module is used to receive a target image, the target image includes a human body image portion;

A second determining module, configured to input the target image into an image key point extraction model, and determine the key points output by the last sub-model of the image key point extraction model as the key points of the human body image part in the target image, wherein the image key point extraction model includes multiple cascaded sub-models and is obtained by training according to the method of the first aspect.

According to a fifth aspect of the present disclosure, there is provided a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the method of the first aspect described above.

According to a sixth aspect of the present disclosure, there is provided a computer-readable storage medium on which a computer program is stored, which when executed by a processor implements the method of the second aspect described above.

According to a seventh aspect of the present disclosure, an electronic device is provided, including:

Memory, on which computer programs are stored;

A processor is configured to execute the computer program in the memory to implement the method of the first aspect.

According to an eighth aspect of the present disclosure, an electronic device is provided, including:

Memory, on which computer programs are stored;

A processor is configured to execute the computer program in the memory to implement the method of the second aspect.

According to an embodiment of the present disclosure, each sub-model of the image key point extraction model outputs key points, and the difference is calculated separately for each sub-model, so that each sub-model can focus on the key points corresponding to its degree identifier. This facilitates the extraction of key points with different degrees of difficulty. In addition, the target difference of the image key point extraction model is determined from the differences of the sub-models and is used to update the model, which helps ensure the accuracy of the image key point extraction model. Processing key points of different degrees of difficulty separately also widens the application range of the image key point extraction model and enhances the user experience.

Brief description of the drawings

The drawings are provided for further understanding of the present disclosure and constitute a part of the specification. Together with the following detailed description, they serve to explain the present disclosure, but do not limit it. In the drawings:

FIG. 1 is a flowchart of a training method for an image key point extraction model according to an exemplary embodiment of the present disclosure;

FIG. 2 is a flowchart of a method of acquiring a feature map of a human body image part in a training image according to an exemplary embodiment of the present disclosure;

FIG. 3 is a flowchart of an image key point extraction method according to an exemplary embodiment of the present disclosure;

FIG. 4 is a block diagram of a training device for an image key point extraction model according to an exemplary embodiment of the present disclosure;

FIG. 5 is a block diagram of an image key point extraction device according to an exemplary embodiment of the present disclosure;

FIG. 6 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure;

FIG. 7 is a block diagram of an electronic device according to an exemplary embodiment of the present disclosure.

Detailed description

The specific embodiments of the present disclosure will be described in detail below with reference to the drawings. It should be understood that the specific embodiments described herein are only used to illustrate and explain the present disclosure, and are not intended to limit the present disclosure.

FIG. 1 is a flowchart of a training method for an image keypoint extraction model according to an exemplary embodiment of the present disclosure, the image keypoint extraction model including multiple cascaded sub-models.

As shown in FIG. 1, in step S11, the training image is input to the image key point extraction model, and the key points output by each sub-model are obtained, as one round of training of the image key point extraction model.

A large number of images can be obtained from a database or the Internet, and the key points in these images are then labeled to produce the training images.

In step S12, for each sub-model, the difference between the key points output by the sub-model and the key points in the training image that correspond to the degree identifier of the sub-model is determined, where the degree identifier characterizes the difficulty of key point extraction.

According to an embodiment of the present disclosure, when the key point information in the training image is labeled, the difficulty of extracting each key point may also be labeled. As an example, the labeling can follow the attributes of the training image. In a sharp, high-resolution training image, the key points of the human body image part are relatively easy to extract, so the key points in that training image can be labeled with a first degree identifier, which indicates that extracting the key point is relatively simple. In a blurred, low-resolution training image, the key points of the human body image part are relatively hard to extract, so the key points in that training image can be labeled with a second degree identifier, which indicates that extracting the key point is relatively difficult.

As another example, different key points in the same training image may be labeled with degree identifiers directly. For example, the key points that are harder to extract from the training image are labeled with the second degree identifier, and the key points that are easier to extract are labeled with the first degree identifier. The above are exemplary ways of labeling degree identifiers and do not limit the present disclosure.
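The first labeling approach (by image attributes) could be sketched as follows. This is only an illustration: the source gives no numeric criteria, so the thresholds, the `is_blurred` flag, and the `TrainingImage` type are hypothetical names introduced here.

```python
from dataclasses import dataclass

# Hypothetical thresholds; the source does not state numeric criteria.
MIN_SHARP_WIDTH = 400
MIN_SHARP_HEIGHT = 600

FIRST_DEGREE = 1   # key points that are relatively easy to extract
SECOND_DEGREE = 2  # key points that are relatively hard to extract


@dataclass
class TrainingImage:
    width: int
    height: int
    is_blurred: bool   # assumed to come from manual review or a blur detector
    key_points: list   # labeled (x, y) positions


def label_degrees(image):
    """Give every key point of the image the same degree identifier."""
    sharp = (not image.is_blurred
             and image.width >= MIN_SHARP_WIDTH
             and image.height >= MIN_SHARP_HEIGHT)
    degree = FIRST_DEGREE if sharp else SECOND_DEGREE
    return [degree] * len(image.key_points)
```

In the second approach, the list of degree identifiers would instead be supplied per key point by the annotator.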

Therefore, when training the image key point extraction model, the degree identifier corresponding to each sub-model can be specified. For the cascaded sub-models in the image key point extraction model, the extraction difficulty of the key points corresponding to the sub-models increases from easy to difficult in cascade order. According to an embodiment of the present disclosure, the degree identifier corresponding to the first sub-model is the first degree identifier, and the degree identifier corresponding to the next sub-model is the second degree identifier. For the first sub-model, the difference is determined between the key points output by the first sub-model and the key points labeled with the first degree identifier in the training image. For the next sub-model, the difference is determined between the key points output by that sub-model and the key points labeled with the second degree identifier in the training image. Therefore, when the difference corresponding to each sub-model is determined, each sub-model only needs to focus on the key points corresponding to its own degree identifier.
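A minimal sketch of the per-sub-model difference described above. The source does not specify how the difference is measured; the squared Euclidean distance used here, and the function and variable names, are assumptions for illustration.

```python
def sub_model_difference(predicted, labeled, degrees, model_degree):
    """Sum of squared distances over key points whose degree identifier
    matches the degree identifier assigned to the sub-model."""
    total = 0.0
    for (px, py), (lx, ly), degree in zip(predicted, labeled, degrees):
        if degree != model_degree:
            continue  # this sub-model is not scored on this key point
        total += (px - lx) ** 2 + (py - ly) ** 2
    return total


# Three labeled key points: two easy (degree 1), one hard (degree 2).
labeled = [(10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]
degrees = [1, 1, 2]

first_output = [(11.0, 10.0), (20.0, 22.0), (0.0, 0.0)]
second_output = [(10.0, 10.0), (20.0, 20.0), (33.0, 34.0)]

diff_first = sub_model_difference(first_output, labeled, degrees, 1)
diff_second = sub_model_difference(second_output, labeled, degrees, 2)
```

Note that the first sub-model's large error on the third (hard) key point does not contribute to its difference, because that key point carries the second degree identifier.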

In step S13, the sum of the differences corresponding to the sub-models is determined as the target difference of the image key point extraction model. When the number of training rounds of the image key point extraction model has not reached the preset number, the image key point extraction model is updated according to the target difference.

The difference corresponding to each sub-model characterizes how accurately the sub-model extracts the key points labeled with its degree identifier: the smaller the difference, the more accurate the extraction. After the differences corresponding to the sub-models are determined, their sum is taken as the target difference of the image key point extraction model. The target difference thus characterizes the overall difference of the image key point extraction model based on the difference of every sub-model, so that the image key point extraction model can be updated according to the target difference.
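Written as a formula, and assuming (the source does not say) that the difference is a squared Euclidean distance, the target difference could take the following form. The symbols are introduced here for illustration: K is the number of sub-models, S_k is the set of labeled key points whose degree identifier matches sub-model k, p_i is the labeled position of key point i, and \hat{p}_i^{(k)} is the position output by sub-model k.

```latex
\[
D_k = \sum_{i \in S_k} \bigl\lVert \hat{p}_i^{(k)} - p_i \bigr\rVert_2^2 ,
\qquad
D_{\text{target}} = \sum_{k=1}^{K} D_k
\]
```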

According to an embodiment of the present disclosure, the preset number of times may be set according to actual usage scenarios. For example, in a scene with higher accuracy requirements, the preset number of times may be set to be larger; in a scene with general accuracy requirements, the preset number of times may be set to be smaller.

Therefore, according to an embodiment of the present disclosure, each sub-model of the image key point extraction model outputs key points, and the difference is calculated separately for each sub-model, so that each sub-model can focus on the key points corresponding to its degree identifier. This facilitates the extraction of key points with different degrees of difficulty. In addition, the target difference of the image key point extraction model is determined from the differences of the sub-models and is used to update the model, which helps ensure the accuracy of the image key point extraction model. Processing key points of different degrees of difficulty separately also widens the application range of the image key point extraction model and enhances the user experience.

According to an embodiment of the present disclosure, after the image key point extraction model is updated, the method may return to step S11 until the number of training rounds of the image key point extraction model reaches the preset number.

Updating the image key point extraction model means adjusting the weight parameters in the model according to the target difference. This can be implemented with an existing neural network feedback update method and is not repeated here.

According to an embodiment of the present disclosure, when returning to the step of inputting the training image into the image key point extraction model to obtain the key points output by each sub-model, the training image used may be one that was used before or a new training image, which is not limited in the present disclosure. When the number of training rounds reaches the preset number, the training process is complete and an accurate image key point extraction model is obtained, which supports the subsequent extraction of image key points.
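The control flow of steps S11 to S13 with the preset number of rounds can be sketched with a deliberately tiny stand-in model. Everything concrete here is an assumption for illustration: each "sub-model" is a single weight multiplying a scalar feature, key points are one-dimensional, the difference is a squared error, the update is plain gradient descent, and degree identifiers are omitted for brevity.

```python
def train(weights, images, preset_times, learning_rate=0.1):
    """Repeat S11-S13 until the number of training rounds reaches preset_times.

    weights: one toy weight per sub-model.
    images:  list of (feature, labels) pairs, one label per sub-model.
    """
    times = 0
    while True:
        # Training images may be reused across rounds.
        feature, labels = images[times % len(images)]
        outputs = [w * feature for w in weights]                     # S11
        times += 1
        diffs = [(o - t) ** 2 for o, t in zip(outputs, labels)]      # S12
        target = sum(diffs)                                          # S13
        if times >= preset_times:
            return weights, target, times
        # Update according to the target difference (gradient step),
        # then return to S11.
        weights = [w - learning_rate * 2.0 * (o - t) * feature
                   for w, o, t in zip(weights, outputs, labels)]


images = [(1.0, [2.0, 3.0])]
final_weights, final_target, rounds = train([0.0, 0.0], images, preset_times=50)
```

In a real model, the gradient step would be replaced by backpropagation through all cascaded sub-models.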

According to an embodiment of the present disclosure, the input of the first sub-model in the image key point extraction model is the feature map of the human body image part in the training image, and the input of each sub-model other than the first sub-model is the key points output by the previous sub-model together with the feature map of the human body image part in the training image.

In this embodiment, every sub-model other than the first receives both the key points output by the previous sub-model and the feature map of the human body image part in the training image. The current sub-model can therefore build on the key points output by the previous sub-model when extracting key points, which simplifies the extraction process, avoids repeated data processing and calculation, and improves the efficiency of the image key point extraction model.
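The data flow through the cascade can be sketched as below. The sub-models are represented as plain callables; `coarse` and `refine` are hypothetical placeholders standing in for trained networks.

```python
def run_cascade(sub_models, feature_map):
    """Return the key points output by every sub-model, in cascade order."""
    outputs = []
    previous = None
    for index, sub_model in enumerate(sub_models):
        if index == 0:
            # The first sub-model sees only the feature map.
            key_points = sub_model(feature_map)
        else:
            # Later sub-models also see the previous sub-model's key points.
            key_points = sub_model(previous, feature_map)
        outputs.append(key_points)
        previous = key_points
    return outputs


def coarse(feature_map):
    # Placeholder: guess the centre of the feature map.
    return [(len(feature_map[0]) / 2.0, len(feature_map) / 2.0)]


def refine(previous, feature_map):
    # Placeholder: shift the previous estimate by a fixed offset.
    return [(x + 1.0, y - 1.0) for x, y in previous]


feature_map = [[0.0] * 4 for _ in range(6)]
outputs = run_cascade([coarse, refine, refine], feature_map)
```

During training, all entries of `outputs` are scored; at extraction time, only the last entry is used.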

According to an embodiment of the present disclosure, the feature map of the human image part in the training image is determined in the following manner, as shown in FIG. 2:

In step S21, the first image corresponding to the human body image part of the training image is extracted. The first image can be extracted by an existing human body detection algorithm. According to an embodiment of the present disclosure, the human body image in the training image may be extracted through the Faster R-CNN algorithm or the Mask R-CNN algorithm.

In step S22, the resolution corresponding to the first image is adjusted to a preset resolution, a second image is obtained, and the feature map of the human image portion in the training image is determined according to the second image.

The proportion of the image occupied by the human body image part may be the same or different across training images. For example, when the training images are obtained by the same user through continuous shooting, the proportions of the human body image parts are generally similar, whereas for images taken by different users the proportions are generally different. Therefore, to facilitate uniform processing of the human body image part in the training images, in this embodiment, after the first image corresponding to the human body image part in the training image is extracted, the resolution of the first image may be adjusted to a preset resolution to obtain the second image. For example, the preset resolution may be 400*600. When the resolution of the extracted first image is lower than the preset resolution, the first image is enlarged to 400*600; when the resolution of the extracted first image is higher than the preset resolution, the first image is reduced to 400*600. Methods for enlarging or reducing an image are prior art and are not repeated here.
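One simple way to bring a cropped first image to the preset resolution is nearest-neighbour resampling, sketched below. The source leaves the resizing method open, so this choice is an assumption, as is reading "400*600" as width by height; the function name is hypothetical.

```python
PRESET_WIDTH = 400   # assumption: "400*600" is read as width x height
PRESET_HEIGHT = 600


def resize_nearest(image, new_width=PRESET_WIDTH, new_height=PRESET_HEIGHT):
    """Nearest-neighbour resize of a row-major image (a list of pixel rows).

    The same code enlarges a smaller image and reduces a larger one.
    """
    old_height = len(image)
    old_width = len(image[0])
    resized = []
    for y in range(new_height):
        src_y = min(old_height - 1, y * old_height // new_height)
        row = image[src_y]
        resized.append([
            row[min(old_width - 1, x * old_width // new_width)]
            for x in range(new_width)
        ])
    return resized


first_image = [[1, 2], [3, 4]]              # a tiny 2x2 crop
second_image = resize_nearest(first_image)  # enlarged to the preset resolution
```

In practice an image library with bilinear or bicubic interpolation would usually be preferred for image quality.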

Therefore, according to the embodiments of the present disclosure, feature maps with the same resolution can be extracted from different training images, which facilitates uniform processing of the feature maps, effectively simplifies the processing flow, and increases the processing speed. At the same time, it meets the user's needs and is convenient for users.

An embodiment of the present disclosure also provides an image key point extraction method. As shown in FIG. 3, the method includes:

In step S31, a target image is received, the target image containing a human body image part. The human body image in the target image can be detected by the Faster R-CNN algorithm or the Mask R-CNN algorithm.

In step S32, the target image is input to the image key point extraction model, and the key points output by the last sub-model of the image key point extraction model are determined as the key points of the human body image part in the target image. The image key point extraction model includes multiple cascaded sub-models and is trained according to any of the above training methods for the image key point extraction model.

In this embodiment, key points in the target image can be extracted by inputting the target image into the image key point extraction model. The image key point extraction model can accurately extract key points of different degrees of difficulty in the target image. On the one hand, this ensures the comprehensiveness and completeness of key point extraction; on the other hand, it ensures the accuracy of the extracted key points, providing accurate data support for subsequent processing based on the key points and further improving the user experience.

According to an embodiment of the present disclosure, the key points of the human body image part are the bone key points corresponding to the human body image part. After the bone key points of the human body image part in the target image are determined, pose estimation may be performed on the human body image part in the target image according to the bone key points. Improving the prediction accuracy of the bone key points therefore helps ensure the accuracy of the pose estimation of the human body image part in the target image.

An embodiment of the present disclosure also provides a training device for an image key point extraction model. The image key point extraction model includes multiple cascaded sub-models. As shown in FIG. 4, the device 10 may include:

The processing module 100, used to input a training image into the image key point extraction model to obtain the key points output by each sub-model, as one round of training of the image key point extraction model;

The first determination module 200, used to determine, for each sub-model, the difference between the key points output by the sub-model and the key points in the training image that correspond to the degree identifier of the sub-model, where the degree identifier characterizes the difficulty of key point extraction;

The update module 300, used to determine the sum of the differences corresponding to the sub-models as the target difference of the image key point extraction model and, when the number of training rounds of the image key point extraction model has not reached the preset number, to update the image key point extraction model according to the target difference.

Optionally, after the update module 300 updates the image key point extraction model, the processing module 100 may input the training image into the updated image key point extraction model to obtain the key points output by each sub-model, until the number of training rounds of the image key point extraction model reaches the preset number.

According to an embodiment of the present disclosure, the apparatus may further include a feature extraction module for obtaining a feature map of the human body image part in the training image. The feature extraction module may include:

An extraction sub-module for extracting the first image corresponding to the body image part of the training image;

The adjustment submodule is used to adjust the resolution corresponding to the first image to a preset resolution, obtain a second image, and determine the feature map of the human image portion in the training image according to the second image.

The present disclosure also provides an image key point extraction device. As shown in FIG. 5, the device 20 may include:

The receiving module 400 is used to receive a target image, and the target image includes a human body image part;

The second determination module 500 is used to input the target image into the image key point extraction model, and determine the key points output by the last sub-model of the image key point extraction model as the key points of the human body image part in the target image, where the image key point extraction model includes multiple cascaded sub-models and is obtained by training according to any of the above training methods for the image key point extraction model.

Regarding the device in the above embodiment, the specific manner in which each module performs operations has been described in detail in the embodiment related to the method, and will not be elaborated here.

FIG. 6 is a block diagram of an electronic device 700 according to an embodiment of the present disclosure. As shown in FIG. 6, the electronic device 700 may include a processor 701 and a memory 702. The electronic device 700 may further include one or more of a multimedia component 703, an input/output (I/O) interface 704, and a communication component 705.

The processor 701 is used to control the overall operation of the electronic device 700 to complete all or part of the steps in the training method of the image key point extraction model or the image key point extraction method. The memory 702 is used to store various types of data to support operation of the electronic device 700. The data may include, for example, instructions for any application or method operating on the electronic device 700, and application-related data such as contact data, messages sent and received, pictures, audio, and video. The memory 702 may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The multimedia component 703 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used to output and/or input audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may be further stored in the memory 702 or transmitted through the communication component 705. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 704 provides an interface between the processor 701 and other interface modules. The other interface modules may be a keyboard, a mouse, buttons, and so on. The buttons can be virtual buttons or physical buttons. The communication component 705 is used for wired or wireless communication between the electronic device 700 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, or 5G, or a combination of one or more of them, which is not limited here. Accordingly, the communication component 705 may include a Wi-Fi module, a Bluetooth module, an NFC module, and so on.

In an exemplary embodiment, the electronic device 700 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field-programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components, for performing the above training method for the image key point extraction model or the image key point extraction method.

In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided. When the program instructions are executed by a processor, the above training method for the image key point extraction model or the image key point extraction method is implemented. For example, the computer-readable storage medium may be the above-mentioned memory 702 including program instructions, and the program instructions may be executed by the processor 701 of the electronic device 700 to implement the above training method for the image key point extraction model or the image key point extraction method.

7 is a block diagram of an electronic device 1900 according to an embodiment of the present disclosure. For example, the electronic device 1900 may be provided as a server. 7, the electronic device 1900 may include: a processor 1922, the number of which may be one or more; and a memory 1932 for storing a computer program executable by the processor 1922. The computer program stored in the memory 1932 may include one or more modules each corresponding to a set of instructions. In addition, the processor 1922 may be configured to execute the computer program to perform the above-mentioned training method of the image key point extraction model or image key point extraction method.

In addition, the electronic device 1900 may further include a power supply component 1926 and a communication component 1950, which may be configured to perform power management of the electronic device 1900, and the communication component 1950 may be configured to implement communication of the electronic device 1900, for example, wired Or wireless communication. In addition, the electronic device 1900 may also include an input/output (I/O) interface 1958. The electronic device 1900 can operate an operating system based on the memory 1932, such as Windows Server ^™ , Mac OS X ^™ , Unix ^™ , Linux ^™, and so on.

In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided. When the program instructions are executed by a processor, the above training method for the image key point extraction model or the above image key point extraction method is implemented. For example, the computer-readable storage medium may be the above-mentioned memory 1932 including program instructions, and the program instructions may be executed by the processor 1922 of the electronic device 1900 to carry out the above training method for the image key point extraction model or the above image key point extraction method.

The exemplary embodiments of the present disclosure have been described in detail above with reference to the drawings. However, the present disclosure is not limited to the specific details of the above-described embodiments. Within the scope of the technical concept of the present disclosure, various simple modifications can be made to the embodiments of the present disclosure, and these simple modifications all fall within the protection scope of the present disclosure.

In addition, it should be noted that the specific technical features described in the foregoing specific embodiments can be combined in any suitable manner without contradictions. In order to avoid unnecessary repetition, the present disclosure does not describe various possible combinations.

In addition, the various embodiments of the present disclosure can also be combined with one another in any manner. As long as a combination does not depart from the concept of the present disclosure, it should likewise be regarded as within the scope of the present disclosure.
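As a non-limiting illustration, the training flow described above (one pass through the cascaded sub-models, a per-sub-model difference against the key points matching that sub-model's degree identifier, a summed target difference, and an update while the preset number of training times has not been reached) can be sketched as follows. This is only a minimal sketch under stated assumptions: the sub-model call signature, the mean-squared-distance metric used for the "difference", the `update` callback, and all names below are hypothetical and are not specified by the present disclosure.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Point = Tuple[float, float]
# Hypothetical sub-model signature: (feature_map, previous_key_points) -> key points.
SubModel = Callable[[Sequence[float], Optional[List[Point]]], List[Point]]


def run_cascade(sub_models: Sequence[SubModel],
                feature_map: Sequence[float]) -> List[List[Point]]:
    """Return the key points output by every sub-model, in cascade order."""
    outputs: List[List[Point]] = []
    previous: Optional[List[Point]] = None  # first sub-model sees only the feature map
    for sub_model in sub_models:
        # Later sub-models receive the previous output plus the feature map.
        previous = sub_model(feature_map, previous)
        outputs.append(previous)
    return outputs


def difference(predicted: List[Point], labelled: List[Point]) -> float:
    """Mean squared distance between matching key points (assumed metric)."""
    assert len(predicted) == len(labelled)
    total = sum((px - lx) ** 2 + (py - ly) ** 2
                for (px, py), (lx, ly) in zip(predicted, labelled))
    return total / len(labelled)


def target_difference(outputs: List[List[Point]],
                      degree_ids: Sequence[str],
                      labels_by_degree: Dict[str, List[Point]]) -> float:
    """Sum the differences of all sub-models into one target difference."""
    return sum(difference(out, labels_by_degree[deg])
               for out, deg in zip(outputs, degree_ids))


def train(sub_models: Sequence[SubModel],
          degree_ids: Sequence[str],
          feature_map: Sequence[float],
          labels_by_degree: Dict[str, List[Point]],
          update: Callable[[float], None],
          preset_times: int) -> float:
    """Repeat training passes until the count reaches preset_times."""
    times = 0
    while True:
        outputs = run_cascade(sub_models, feature_map)
        loss = target_difference(outputs, degree_ids, labels_by_degree)
        times += 1
        if times >= preset_times:
            return loss
        update(loss)  # hypothetical parameter update driven by the target difference


# Toy usage: two stand-in sub-models that ignore their inputs.
def fixed(points: List[Point]) -> SubModel:
    return lambda feature_map, previous: list(points)


demo_models = [fixed([(0.0, 0.0), (2.0, 2.0)]), fixed([(1.0, 1.0)])]
demo_degrees = ["easy", "hard"]
demo_labels = {"easy": [(0.0, 0.0), (2.0, 0.0)], "hard": [(1.0, 3.0)]}
demo_updates: List[float] = []
demo_loss = train(demo_models, demo_degrees, [0.0], demo_labels,
                  demo_updates.append, preset_times=3)
```

In this sketch the update is skipped on the final pass, so a preset number of three yields three passes and two updates. A practical implementation would replace the stand-in sub-models with trainable networks and the `update` callback with an optimizer step.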

Claims

A training method for an image key point extraction model, wherein the image key point extraction model includes multiple cascaded sub-models, the method comprising:

inputting a training image into the image key point extraction model to obtain the key points output by each sub-model, as one training pass of the image key point extraction model;

for each sub-model, determining the difference between the key points output by the sub-model and the key points in the training image that correspond to the degree identifier of the sub-model, wherein the degree identifier characterizes the degree of difficulty of key point extraction; and

determining the sum of the differences corresponding to the sub-models as the target difference of the image key point extraction model, and, when the number of training times of the image key point extraction model has not reached a preset number, updating the image key point extraction model according to the target difference.
The method according to claim 1, wherein, after the image key point extraction model is updated, the method returns to the step of inputting the training image into the image key point extraction model to obtain the key points output by each sub-model, until the number of training times of the image key point extraction model reaches the preset number.
The method according to claim 1, wherein the input of the first sub-model in the image key point extraction model is a feature map of the human body image part in the training image, and the input of each sub-model in the image key point extraction model other than the first sub-model is the key points output by the previous sub-model together with the feature map of the human body image part in the training image.
The method according to claim 3, wherein the feature map of the human body image part in the training image is determined in the following manner:

extracting a first image corresponding to the human body image part of the training image; and

adjusting the resolution of the first image to a preset resolution to obtain a second image, and determining the feature map of the human body image part in the training image according to the second image.
An image key point extraction method, comprising:

receiving a target image, the target image containing a human body image part; and

inputting the target image into an image key point extraction model, and determining the key points output by the last sub-model of the image key point extraction model as the key points of the human body image part in the target image, wherein the image key point extraction model includes multiple cascaded sub-models, and the image key point extraction model is obtained by training with the method according to any one of claims 1-4.
A training device for an image key point extraction model, wherein the image key point extraction model includes multiple cascaded sub-models, the device comprising:

a processing module, configured to input a training image into the image key point extraction model to obtain the key points output by each sub-model, as one training pass of the image key point extraction model;

a first determining module, configured to determine, for each sub-model, the difference between the key points output by the sub-model and the key points in the training image that correspond to the degree identifier of the sub-model, wherein the degree identifier characterizes the degree of difficulty of key point extraction; and

an update module, configured to determine the sum of the differences corresponding to the sub-models as the target difference of the image key point extraction model, and, when the number of training times of the image key point extraction model has not reached a preset number, to update the image key point extraction model according to the target difference.
An image key point extraction device, comprising:

a receiving module, configured to receive a target image, the target image containing a human body image part; and

a second determining module, configured to input the target image into an image key point extraction model, and to determine the key points output by the last sub-model of the image key point extraction model as the key points of the human body image part in the target image, wherein the image key point extraction model includes multiple cascaded sub-models, and the image key point extraction model is obtained by training with the method according to any one of claims 1-4.
A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of any one of claims 1-4.
A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of claim 5.
An electronic device, comprising:

a memory on which a computer program is stored; and

a processor, configured to execute the computer program in the memory to implement the method of any one of claims 1-4.
An electronic device, comprising:

a memory on which a computer program is stored; and

a processor, configured to execute the computer program in the memory to implement the method of claim 5.