WO2021175071A1

WO2021175071A1 - Image processing method and apparatus, storage medium, and electronic device

Info

Publication number: WO2021175071A1
Application number: PCT/CN2021/075025
Authority: WO
Inventors: 吴佳涛
Original assignee: Oppo广东移动通信有限公司
Priority date: 2020-03-06
Filing date: 2021-02-03
Publication date: 2021-09-10
Also published as: CN111368751A

Abstract

Disclosed in embodiments of the present application are an image processing method and apparatus, a storage medium, and an electronic device: obtaining an image to be detected requiring key point detection; calling a pre-trained key point detection model to perform key point detection on said image to obtain key point position information and key point attribution information; and identifying, according to the key point position information and the key point attribution information, a human body key point set belonging to a same human body.

Description

Image processing method, device, storage medium and electronic equipment

This application claims priority to Chinese patent application No. 202010152690.8, entitled "Image processing methods, devices, storage media and electronic equipment" and filed with the Chinese Patent Office on March 6, 2020, the entire contents of which are incorporated herein by reference.

Technical field

This application relates to the field of image processing technology, and in particular to an image processing method, device, storage medium and electronic equipment.

Background technique

At present, key point detection mainly refers to detecting key points of the human body, such as the eyes, nose, elbows, and shoulders, and connecting them in sequence according to the order of the limbs so as to describe the human body.

Summary of the invention

The embodiments of the present application provide an image processing method, device, storage medium, and electronic equipment, which can improve the efficiency of key point detection.

The image processing method provided by the embodiment of the application includes:

Obtain the image to be detected that requires key point detection;

Calling a pre-trained key point detection model to perform key point detection on the image to be detected, to obtain key point location information and key point attribution information;

According to the key point location information and the key point attribution information, a set of human body key points belonging to the same human body is identified.

The image processing device provided by the embodiment of the application includes:

The image acquisition module is used to acquire the to-be-detected image that requires key point detection;

The image detection module is used to call a pre-trained key point detection model to perform key point detection on the image to be detected to obtain key point location information and key point attribution information;

The human body recognition module is used to identify a set of human body key points belonging to the same human body based on the key point location information and the key point attribution information.

The storage medium provided by the embodiment of the present application has a computer program stored thereon, and when the computer program is loaded by a processor, the image processing method as provided in the present application is executed.

The electronic device provided by the embodiment of the present application includes a processor and a memory, the memory stores a computer program, and the processor loads the computer program to execute the image processing method provided by the present application.

Description of the drawings

In order to more clearly describe the technical solutions in the embodiments of the present application, the following will briefly introduce the drawings that need to be used in the description of the embodiments. Obviously, the drawings in the following description are only some embodiments of the present application. For those skilled in the art, other drawings can be obtained based on these drawings without creative work.

FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the application.

Fig. 2 is an example diagram of a key point detection interface provided by an embodiment of the present application.

Fig. 3 is an example diagram of a selection sub-interface provided by an embodiment of the present application.

Fig. 4 is a schematic structural diagram of a key point detection model provided by an embodiment of the present application.

Fig. 5 is a schematic structural diagram of a feature prediction network in an embodiment of the present application.

Fig. 6 is a schematic structural diagram of an attribution branch in an embodiment of the present application.

FIG. 7 is a schematic diagram of another flow of the image processing method provided in an embodiment of the present application.

Fig. 8 is an example diagram of outputting prompt information in an embodiment of the present application.

Fig. 9 is a diagram showing an example of matching of positioning points and composition points in an embodiment of the present application.

Fig. 10 is a schematic structural diagram of an image processing device provided by an embodiment of the present application.

FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Detailed description

Please refer to the drawings, where the same component symbols represent the same components, and the principle of the present application is implemented in an appropriate computing environment for illustration. The following description is based on the specific embodiments of the present application exemplified, which should not be construed as limiting other specific embodiments of the present application that are not described in detail herein.

Artificial Intelligence (AI) is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results. In other words, artificial intelligence is a comprehensive technology of computer science, which attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence. Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.

Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology. Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.

Among them, Machine Learning (ML) is a multi-field interdisciplinary subject, involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other subjects. It specializes in the study of how computers simulate or realize human learning behaviors in order to acquire new knowledge or skills, and how they reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence. Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and other technologies.

The solutions provided in the embodiments of the present application involve artificial intelligence machine learning technology, which is specifically illustrated by the following embodiments:

The embodiments of the present application provide an image processing method, an image processing device, a storage medium, and electronic equipment. The execution subject of the image processing method may be the image processing device provided in the embodiment of the application, or an electronic device integrating the image processing device, where the image processing device can be implemented in hardware or software. The electronic device may be a device equipped with a processor (including but not limited to a general-purpose processor, a customized processor, etc.) and having processing capabilities, such as a smart phone, a tablet computer, a palmtop computer, a notebook computer, or a desktop computer.

This application provides an image processing method, including:

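Obtain the image to be detected that requires key point detection;

Calling a pre-trained key point detection model to perform key point detection on the image to be detected, to obtain key point location information and key point attribution information;

According to the key point location information and the key point attribution information, a set of human body key points belonging to the same human body is identified.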

Optionally, in an embodiment, the key point detection model includes a feature extraction network and a feature prediction network, and calling the pre-trained key point detection model to perform key point detection on the image to be detected to obtain key point location information and key point attribution information includes:

Calling the feature extraction network to extract the image features of the image to be detected;

Invoke the feature prediction network to perform key point detection on the image feature to obtain key point location information and the key point attribution information.

Optionally, in an embodiment, the feature prediction network includes a location branch and an attribution branch, and calling the feature prediction network to perform key point detection on the image feature to obtain the key point location information and the key point attribution information includes:

Calling the location branch to perform key point location detection on the image feature to obtain the key point location information;

The attribution branch is called to perform key point attribution detection based on the image feature and the key point location information, to obtain the key point attribution information.

Optionally, in an embodiment, the location branch includes a convolution unit with a convolution kernel size of 1*1.

Optionally, in an embodiment, the attribution branch includes a feature optimization sub-module, a fusion sub-module, and an output sub-module, and calling the attribution branch to perform key point attribution detection based on the image features and the key point location information to obtain the key point attribution information includes:

Calling the feature optimization sub-module to perform optimization processing on the image features to obtain optimized image features;

Calling the fusion sub-module to fuse the optimized image feature and the key point position information to obtain a fusion feature;

Calling the output sub-module to perform key point attribution detection on the fusion feature to obtain the key point attribution information.

Optionally, in an embodiment, the feature optimization submodule includes a convolution unit with a convolution kernel size of 1*1, and the output submodule includes a convolution unit with a convolution kernel size of 1*1.

Optionally, in an embodiment, before acquiring the image to be detected that requires key point detection, the method further includes:

Acquiring a sample image and sample key point position information corresponding to the sample image, and constructing the key point detection model;

Calling the key point detection model to perform key point detection on the sample image to obtain predicted key point location information and predicted key point attribution information;

Acquiring key point location loss according to the sample key point location information and the predicted key point location information, and acquiring key point attribution loss according to the predicted key point location information and the predicted key point attribution information;

The key point position loss and the key point attribution loss are fused to obtain a fusion loss, and the parameters of the key point detection model are adjusted according to the fusion loss.

Optionally, in an embodiment, the acquiring key point attribution loss according to the predicted key point location information and the predicted key point attribution information includes:

Perform key point clustering according to the predicted key point location information to obtain multiple human body key point sets belonging to different human bodies;

Acquire the attribution loss of the key point according to the multiple sets of human body key points and the predicted key point attribution information.

Optionally, in an embodiment, the acquiring the image to be detected that requires key point detection includes:

When the electronic device enables the shooting function, obtain a preview image of the shooting scene, and use the preview image as an image to be detected;

After identifying a set of human body key points belonging to the same human body according to the key point location information and the key point attribution information, the method further includes:

Determine the target human body according to the identified human body key point set, and classify the human body according to the human body key point set corresponding to the target human body to obtain the human body type of the target human body;

Determining a positioning point and a composition type corresponding to the target human body according to the human body type and a set of human body key points corresponding to the target human body;

Determining a composition point corresponding to the target human body according to the positioning point and the composition type;

When the positioning point does not match the composition point, outputting prompt information for instructing to adjust the shooting posture of the electronic device.

Please refer to FIG. 1. FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the application. The image processing method provided by an embodiment of the application may be as follows:

In 101, obtain the image to be detected that requires key point detection.

It should be noted that the key point detection mentioned in this application mainly refers to the detection of human body key points, such as the eyes, nose, elbows, and shoulders. These key points are connected in sequence, and the human body is described through them.

The electronic device can receive a key point detection request input by the user and obtain the image to be detected according to that request. It can also automatically identify an image that requires key point detection and obtain that image as the image to be detected.

For example, the electronic device may receive a key point detection request through a key point detection interface that includes a request input interface. As shown in Figure 2, the request input interface may take the form of an input box. The user can enter the identification information of the image that needs key point detection in the input box and then enter confirmation information (for example, by pressing the Enter key on the keyboard) to submit the key point detection request, which carries the identification information of that image. Correspondingly, the electronic device can obtain the image according to the identification information in the received key point detection request and record it as the image to be detected.

For another example, the key point detection interface shown in Figure 2 also includes an "open" control. On the one hand, when the electronic device detects that the open control is triggered, it superimposes a selection sub-interface on the key point detection interface (as shown in Figure 3). The selection sub-interface provides the user with thumbnails of images that can be used for key point detection, such as thumbnails of image A, image B, image C, image D, image E, and image F, so that the user can find and select the thumbnail of the image that needs key point detection. On the other hand, after selecting the thumbnail, the user can trigger the confirmation control provided by the selection sub-interface to input a key point detection request to the electronic device. The key point detection request is associated with the thumbnail selected by the user and instructs the electronic device to use the selected image as the image to be detected.

In 102, the pre-trained key point detection model is called to perform key point detection on the image to be detected to obtain key point location information and key point attribution information.

Exemplarily, a machine learning method is used in this application to pre-train a key point detection model. The key point detection model is configured to simultaneously predict all human body key points in the input image and the human body to which each key point belongs, and it can be deployed locally on the electronic device or on a server. In addition, the configuration of the key point detection model is not specifically limited in this application, and can be selected by a person of ordinary skill in the art according to actual needs.

Correspondingly, after the electronic device obtains the image to be detected, it calls the pre-trained key point detection model locally or from the server and inputs the image to be detected into the model, to obtain the key point location information and key point attribution information output by the key point detection model. The key point location information describes all the human body key points in the image to be detected, and the key point attribution information describes the human body to which each human body key point belongs.

For example, the key point location information describes the existence of the human key point A and the human key point B in the image to be detected, and the key point attribution information describes that the human key point A belongs to the human body A, and the human key point B belongs to the human body B.

In 103, a set of human body key points belonging to the same human body is identified according to key point location information and key point attribution information.

As mentioned above, the key point location information describes all the human body key points in the image to be detected, and the key point attribution information describes the human body to which each human body key point belongs. After obtaining the key point location information and key point attribution information corresponding to the image to be detected, the electronic device can identify the key point set belonging to the same human body according to the two, and can thus perform key point detection for multiple human bodies at the same time.

This application obtains the image to be detected that requires key point detection; calls the pre-trained key point detection model to perform key point detection on the image to be detected to obtain key point location information and key point attribution information; and identifies, according to the key point location information and the key point attribution information, a set of human body key points belonging to the same human body. Compared with related technologies, the present application does not require a human body detection algorithm as a prerequisite step, and can detect all human body key points in the image at the same time, thereby improving the efficiency of key point detection.

In one embodiment, the key point detection model includes a feature extraction network and a feature prediction network. The pre-trained key point detection model is called to perform key point detection on the image to be detected to obtain key point location information and key point attribution information, including:

(1) Call the feature extraction network to extract the image features of the image to be detected;

(2) Call the feature prediction network to detect the key points of the image features, and obtain the key point location information and key point attribution information.

Referring to FIG. 4, in the embodiment of the present application, the key point detection model is composed of two parts: a feature extraction network used for feature extraction and a feature prediction network used for key point detection. The feature extraction network can be any known feature extraction network, such as VGG, MobileNet, or ResNet. If a deeper network model such as VGG or ResNet is used, the computational complexity of the model increases, but higher detection accuracy can be obtained. If a lightweight network model such as MobileNet is used, some detection accuracy is lost, but a faster detection speed can be obtained. The specific selection can be made by a person of ordinary skill in the art according to actual needs, and this application does not specifically limit it.

Correspondingly, when the electronic device calls the key point detection model to perform key point detection on the image to be detected, it can first call the feature extraction network in the key point detection model to perform feature extraction on the image to be detected and obtain its image features. It then calls the feature prediction network in the key point detection model to perform key point detection based on those image features, and obtains the key point location information and key point attribution information corresponding to the image to be detected.

For example, the key point location information is presented in the form of a key point location heat map, which is a three-dimensional matrix of size height*width*keypoints, where height and width are the height and width of the heat map and keypoints is the number of key points of the human body. In other words, each human body key point corresponds to a height*width matrix; the value at each position in the matrix indicates how likely it is that the key point lies at that position, and the larger the value, the more likely it is. The human body key points can be obtained by taking the position of the maximum value in each region of the key point location heat map. For example, max pooling can be applied to the key point location heat map, the heat maps before and after pooling can be compared, and the positions whose values are the same in both are taken as human body key points.
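
By way of illustration only, the max pooling comparison described above might be sketched as follows in Python with NumPy. The 3*3 pooling window, the 0.5 score threshold used to discard flat background regions, and the function names are assumptions made for this sketch; the application does not specify them.

```python
import numpy as np

def max_pool_same(heatmap, k=3):
    """Max-pool every channel with a k*k window and stride 1, keeping the size."""
    heatmap = np.asarray(heatmap, dtype=float)
    h, w, _ = heatmap.shape
    pad = k // 2
    # Pad with -inf so border positions only compete with real neighbours.
    padded = np.pad(heatmap, ((pad, pad), (pad, pad), (0, 0)),
                    mode="constant", constant_values=-np.inf)
    pooled = np.full_like(heatmap, -np.inf)
    for dy in range(k):
        for dx in range(k):
            pooled = np.maximum(pooled, padded[dy:dy + h, dx:dx + w, :])
    return pooled

def extract_keypoints(heatmap, threshold=0.5, k=3):
    """Return (row, col, keypoint_index) for positions unchanged by pooling."""
    heatmap = np.asarray(heatmap, dtype=float)
    pooled = max_pool_same(heatmap, k)
    # A position is a peak if pooling did not change its value (assumed
    # threshold filters out flat, low-score regions).
    is_peak = (heatmap == pooled) & (heatmap > threshold)
    return [tuple(int(v) for v in idx) for idx in np.argwhere(is_peak)]

# Toy height*width*keypoints heat map with two peaks and one weaker neighbour.
demo = np.zeros((5, 5, 2))
demo[1, 1, 0] = 0.9
demo[1, 2, 0] = 0.6   # next to a stronger value, so not a peak
demo[3, 4, 1] = 0.8
peaks = extract_keypoints(demo)
```

In this toy heat map only the two local maxima survive the comparison; the weaker neighbouring value is suppressed because pooling replaces it with the stronger one.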

In addition, the key point attribution information can take the form of an integer human body number. That is, at each detected human body key point position, the feature prediction network predicts an integer as the human body number, and human body key points with the same human body number belong to the same human body.
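
As a minimal sketch of how key points sharing a human body number could be collected into per-body sets, consider the following Python example. The data layout (a list of key point tuples and a parallel list of body numbers) and the function name are assumptions made for illustration.

```python
from collections import defaultdict

def group_by_body(keypoints, body_numbers):
    """Group key points whose predicted human body numbers are equal.

    keypoints:    list of (row, col, keypoint_index) tuples (assumed layout)
    body_numbers: list of integers, one per key point
    """
    if len(keypoints) != len(body_numbers):
        raise ValueError("one body number is required per key point")
    bodies = defaultdict(list)
    for kp, number in zip(keypoints, body_numbers):
        bodies[int(number)].append(kp)
    return dict(bodies)

groups = group_by_body([(1, 1, 0), (3, 4, 1), (2, 2, 1)], [1, 2, 1])
```

Each value in the returned dictionary is one human body key point set, keyed by its human body number.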

In an embodiment, the feature prediction network includes a location branch and an attribution branch, and the feature prediction network is called to perform key point detection on image features to obtain key point location information and key point attribution information, including:

(1) Call the location branch to detect the key point position of the image feature, and obtain the key point position information;

(2) Invoke the attribution branch to perform key point attribution detection based on image features and key point location information to obtain key point attribution information.

Please refer to FIG. 5. In this embodiment of the application, the key point detection task is divided, and a dual-branch network is used to realize key point detection. One branch is configured to detect the human body key points in the image and is recorded as the location branch. The other branch is configured to detect the human body to which each human body key point belongs and is recorded as the attribution branch. Correspondingly, when the electronic device calls the feature prediction network to perform key point detection on image features, it can call the location branch in the feature prediction network to perform key point location detection based on the image features and obtain the key point location information corresponding to the image to be detected.

In addition, it should be noted that, in terms of high-level semantic information, the attribution of a human body key point is a deeper feature than its location: only when the key point location is known accurately can the key point attribution be predicted more accurately. Based on this consideration, the electronic device calls the attribution branch in the feature prediction network to perform key point attribution detection based on both the image features and the key point location information, and obtains the key point attribution information corresponding to the image to be detected.

In an embodiment, the location branch includes a convolution unit with a convolution kernel size of 1*1.

In an embodiment, the attribution branch includes a feature optimization submodule, a fusion submodule, and an output submodule. The attribution branch is called to perform key point attribution detection based on image features and key point location information to obtain key point attribution information, including:

(1) Call the feature optimization sub-module to optimize the image features to obtain optimized image features;

(2) Call the fusion sub-module to fuse the optimized image features and the key point location information to obtain fusion features;

(3) Invoke the output sub-module to perform key point attribution detection on the fusion features, and obtain the key point attribution information.

Please refer to FIG. 6. In the embodiment of the present application, the attribution branch is composed of three parts: a feature optimization sub-module used to further extract image features so as to optimize them, a fusion sub-module used to fuse the optimized image features with the key point location information, and an output sub-module used to perform key point attribution detection on the fusion feature.

Correspondingly, when the electronic device calls the attribution branch to perform key point attribution detection based on image features and key point location information, it can first call the feature optimization sub-module in the attribution branch to optimize the image features, and record the result as optimized image features. It then calls the fusion sub-module to fuse the optimized image features and the key point location information to obtain the fusion feature. Finally, it calls the output sub-module to perform key point attribution detection on the fusion feature, and obtains the key point attribution information corresponding to the image to be detected.

Among them, the feature optimization sub-module includes a 1*1 convolution unit, the output sub-module includes a 1*1 convolution unit, and the fusion sub-module includes a Concat unit.

Exemplarily, taking the case where the image feature is a feature map and the key point location information is a key point location heat map, the electronic device calls the feature optimization sub-module to perform a further convolution operation on the image features and obtain the optimized image features. The electronic device then calls the fusion sub-module to connect the optimized feature map and the key point location heat map along the channel dimension to achieve feature fusion. For example, if the feature map is 19-dimensional and the key point location heat map is 38-dimensional, connecting the channels through the fusion sub-module yields a fusion feature of 19+38=57 dimensions. Finally, the electronic device calls the output sub-module to perform a convolution operation on the fusion feature and obtain the key point attribution information corresponding to the image to be detected.
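
The three sub-modules can be sketched in a few lines of NumPy, treating a 1*1 convolution as the same linear map applied at every pixel. This is a hedged illustration rather than the implementation of the embodiment: the random weights, the 8*8 spatial size, the parameter names, and the choice of 38 output channels are all assumptions; only the 19, 38, and 57 channel counts come from the example above.

```python
import numpy as np

def conv1x1(x, weight, bias):
    """1*1 convolution on an (H, W, C_in) tensor: a per-pixel linear map."""
    return x @ weight + bias  # weight: (C_in, C_out), bias: (C_out,)

def attribution_branch(features, heatmap, params):
    # Feature optimization sub-module: one further 1*1 convolution.
    optimized = conv1x1(features, params["opt_w"], params["opt_b"])
    # Fusion sub-module: Concat along the channel axis.
    fused = np.concatenate([optimized, heatmap], axis=-1)
    # Output sub-module: 1*1 convolution producing the attribution map.
    out = conv1x1(fused, params["out_w"], params["out_b"])
    return fused, out

rng = np.random.default_rng(0)
h, w = 8, 8                                   # assumed spatial size
features = rng.normal(size=(h, w, 19))        # 19-dimensional feature map
heatmap = rng.random(size=(h, w, 38))         # 38-dimensional heat map
params = {                                    # hypothetical, untrained weights
    "opt_w": rng.normal(size=(19, 19)), "opt_b": np.zeros(19),
    "out_w": rng.normal(size=(57, 38)), "out_b": np.zeros(38),
}
fused, body_map = attribution_branch(features, heatmap, params)
print(fused.shape, body_map.shape)
```

Because concatenation only stacks channels, the spatial size is unchanged and the fused tensor has 19+38=57 channels, which is why the output sub-module's weight has 57 input channels.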

In an embodiment, before acquiring the image to be detected that requires key point detection, the method further includes:

(1) Obtain sample image and sample key point position information corresponding to the sample image, and build a key point detection model;

(2) Invoke the key point detection model to perform key point detection on the sample image to obtain predicted key point location information and predicted key point attribution information;

(3) Obtain key point location loss based on sample key point location information and predicted key point location information, and obtain key point attribution loss based on predicted key point location information and predicted key point attribution information;

(4) The fusion loss is obtained by fusing the position loss of the key point and the loss of the key point attribution, and the parameters of the key point detection model are adjusted according to the fusion loss.

The embodiment of the present application also provides a training solution for the key point detection model.

The electronic device first obtains the sample image and the sample key point position information corresponding to the sample image. For example, an image including the human body can be obtained from the ImageNet data set as the sample image, and the corresponding sample key point position information is obtained by labeling the sample image.

In addition, the electronic device also constructs a key point detection model, and the structure of the key point detection model can refer to the relevant description in the above embodiment, which will not be repeated here.

Then, the electronic device calls the key point detection model to perform key point detection on the sample image, and obtains the predicted key point location information and predicted key point attribution information corresponding to the sample image. The predicted key point location information describes all the human body key points in the sample image, and the predicted key point attribution information describes the human body to which each human body key point belongs.

Then, the electronic device obtains the key point location loss according to the sample key point location information and the predicted key point location information, and the key point location loss is used to measure the difference between the predicted key point location information and the sample key point location information. Taking sample key point location information and predicted key point location information in the form of heat maps (the two have the same size) as an example, the key point location loss can be expressed as:

Among them, L_position represents the key point location loss, (i,j) represents a coordinate position, p(i,j) represents the value at position (i,j) in the predicted key point location heat map, g(i,j) represents the value at position (i,j) in the sample key point location heat map, width represents the width of the predicted key point location heat map, and height represents the height of the predicted key point location heat map.
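
One loss that is consistent with the symbols defined above is the mean squared difference between the two heat maps, averaged over width*height positions. The following Python sketch shows that form purely as an assumption; the exact expression used in the application may differ, and the function name is hypothetical.

```python
import numpy as np

def position_loss(pred, target):
    """Assumed form: sum over (i, j) of (p(i,j) - g(i,j))^2 / (width * height)."""
    pred = np.asarray(pred, dtype=float)
    target = np.asarray(target, dtype=float)
    if pred.shape != target.shape:
        raise ValueError("heat maps must have the same size")
    height, width = pred.shape[:2]
    return float(np.sum((pred - target) ** 2) / (width * height))

p = [[0.0, 1.0], [0.0, 0.0]]   # predicted heat map
g = [[0.0, 0.0], [0.0, 0.0]]   # sample (labelled) heat map
loss = position_loss(p, g)
```

The loss is zero when the predicted heat map equals the sample heat map and grows as the two diverge, which matches the stated purpose of measuring their difference.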

On the other hand, the electronic device also obtains the key point attribution loss based on the predicted key point location information and the predicted key point attribution information. It should be noted that the key point attribution loss differs from the key point location loss. Because the number of human bodies varies between sample images, the attribution of the human body key points in a sample image cannot be labeled in advance; that is, there is no ground-truth human body key point attribution to serve as the training target.

In an embodiment, obtaining the attribution loss of the key point according to the predicted key point location information and the predicted key point attribution information includes:

(1) Carry out key point clustering according to the predicted key point location information, and obtain multiple human body key point sets belonging to different human bodies;

(2) Obtain the attribution loss of key points based on multiple sets of human body key points and predicted key point attribution information.

In the embodiment of this application, a clustering approach is adopted. During training and prediction, the model predicts an integer at each key point position as the human body number. Therefore, the key point attribution loss needs to drive training toward the goal of "narrowing the gap between the body numbers of the same human body and widening the gap between the body numbers of different human bodies", which can be expressed as:

A clustering algorithm (selected by those of ordinary skill in the art according to actual needs) is used to cluster the human body key points in the sample image, as described by the predicted key point location information, to obtain multiple human body key point sets belonging to different human bodies, where the human body key points in the same set belong to the same human body.

According to the predicted key point attribution information, average the human body numbers within each human body key point set to obtain the mean human body number of that set:

Where n denotes the human body key point set corresponding to the nth person, k denotes the kth key point, K represents the number of human body key points, and h_nk represents the human body number at the kth key point of the nth person;

Calculate the difference between the human body number at each human body key point position in each human body key point set and the aforementioned mean value, and take the sum of squares:

Where N represents the number of key point collections of the human body;

Calculate the difference between the mean human body numbers of different human body key point sets, so that when the difference between the body numbers of two human bodies is very large, the loss is 0, and when that difference is very small, the loss is large and needs to be reduced during training:

Where σ is a constant taking an empirical value, n, n' ∈ [1, N], and n ≠ n';

L_attribution = L1 + L2;

Where L_attribution represents the key point attribution loss.
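The formulas for the mean, L1, and L2 do not survive in this text. The sketch below shows one loss with the behavior described above. It assumes that L1 is the sum of squared deviations from each set's mean body number, averaged over the N sets, and that L2 is a Gaussian-shaped penalty exp(-(difference of means)^2 / (2σ^2)) averaged over N^2 ordered pairs of different sets. These forms, the normalization, and the function name `attribution_loss` are all assumptions, not the confirmed formulas.

```python
import math

def attribution_loss(groups, sigma=1.0):
    """groups[n] lists the predicted body numbers h_nk of the n-th key point set."""
    if not groups or any(len(g) == 0 for g in groups):
        raise ValueError("need at least one non-empty key point set")
    n_sets = len(groups)
    means = [sum(g) / len(g) for g in groups]
    # L1 (assumed form): pull each body number towards the mean of its own set.
    l1 = sum((h - m) ** 2 for g, m in zip(groups, means) for h in g) / n_sets
    # L2 (assumed form): push the means of different sets apart.
    l2 = 0.0
    for n in range(n_sets):
        for n2 in range(n_sets):
            if n != n2:
                l2 += math.exp(-((means[n] - means[n2]) ** 2) / (2 * sigma ** 2))
    l2 /= n_sets ** 2
    return l1 + l2
```

With this form, two sets whose mean body numbers are far apart contribute almost nothing to L2, while two sets with nearly equal means contribute close to the maximum, which matches the stated training goal.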

In the embodiment of the present application, after obtaining the key point location loss and the key point attribution loss, the electronic device also fuses the key point location loss and the key point attribution loss to obtain the fusion loss, which can be expressed as:

L_total = L_position + L_attribution;

Where L_total represents the fusion loss.

After obtaining the fusion loss, the electronic device adjusts the parameters of the key point detection model according to the fusion loss until the training of the key point detection model is completed.

In an embodiment, acquiring the image to be detected that requires key point detection includes:

(1) When the electronic device enables the shooting function, obtain a preview image of the shooting scene, and use the preview image as the image to be detected;

After identifying the set of human body key points belonging to the same human body according to the key point location information and key point attribution information, it also includes:

(2) Determine the target human body according to the identified human body key point set, and classify the human body according to the human body key point set corresponding to the target human body to obtain the human body type of the target human body;

(3) Determine the positioning point and composition type of the target human body according to the human body type and the set of human body key points corresponding to the target human body;

(4) Determine the composition point corresponding to the human body according to the positioning point and the composition type;

(5) When the positioning point does not match the composition point, a prompt message for instructing to adjust the shooting posture of the electronic device is output.

It should be noted that the shooting scene is the scene that the camera of the electronic device is aimed at after the shooting function is enabled, and it can be any scene, which can include people and objects.

For example, the electronic device can start its system application "camera" according to the user's operation. After the "camera" is started, the electronic device enables the shooting function and collects images in real time through the camera. At this time, the scene that the camera is aimed at is the shooting scene. The electronic device can start the "camera" according to the user's touch operation on the entrance of the "camera", or according to the user's voice command "start the camera", and so on.

In the embodiment of the present application, when the electronic device enables the shooting function, it obtains a preview image of the shooting scene, uses the preview image as the image to be detected that requires key point detection, performs key point detection on it, and obtains the human body key point sets in the preview image, each of which belongs to the same human body. When there are multiple human bodies in the preview image, a human body key point set corresponding to each human body is finally obtained, giving multiple human body key point sets in total; when there is one human body in the preview image, one human body key point set corresponding to that human body is finally obtained.

After that, the electronic device determines the target human body based on the identified human body key point sets. For example, when there is one human body key point set, the human body corresponding to that set is directly determined as the target human body; when there are multiple human body key point sets, the human body corresponding to one of the sets is determined as the target human body according to a preset target decision strategy.

After determining the target human body, the electronic device further classifies the target human body in the shooting scene according to the set of human body key points corresponding to the target human body and the preset human body classification strategy to obtain the human body type of the target human body. It should be noted that the classification of human body types is not specifically limited in this application, and can be configured by a person of ordinary skill in the art according to actual needs.

After that, the electronic device determines the positioning point corresponding to the target human body from the human body type and the human body key point set corresponding to the target human body, according to a preset positioning point decision strategy. In addition, it determines the composition type corresponding to the target human body according to a preset composition type decision strategy. The positioning point is used to represent the position of the target human body. It should be noted that the division of composition types is not specifically limited in this application, and can be configured by a person of ordinary skill in the art according to actual needs.

It should be noted that, corresponding to different composition types in the embodiment of the present application, a plurality of optional candidate composition points are preset. The electronic device may further determine the currently selectable candidate composition points according to the determined composition type, and then determine the composition point corresponding to the target human body from the currently selectable candidate composition points according to the positioning point.

After determining the positioning point and the composition point corresponding to the target human body, the electronic device determines in real time whether the positioning point matches the composition point. If they do not match, it outputs prompt information for instructing the user to adjust the shooting posture of the electronic device, so that the positioning point of the target human body in the shooting scene matches the composition point and a better composition is obtained; if they match, the shooting scene can be photographed directly to obtain a photographed image of the shooting scene.

Wherein, matching the positioning point and the composition point includes that the distance between the positioning point and the composition point is less than or equal to the preset distance. This application does not specifically limit the value of the preset distance, and can be selected by those of ordinary skill in the art according to actual needs.

Please refer to FIG. 7. FIG. 7 is a schematic diagram of another flow of the image processing method provided by the embodiment of the application. The flow of the image processing method provided by the embodiment of the application may also be as follows:

In 201, when the shooting function is enabled, the electronic device obtains a preview image of the shooting scene, and uses the preview image as an image to be detected that requires key point detection.

In the embodiment of the present application, the electronic device acquires a preview image of the shooting scene when the shooting function is enabled, and uses the preview image as the image to be detected that requires key point detection.

In 202, the electronic device invokes the pre-trained key point detection model to perform key point detection on the image to be detected to obtain key point location information and key point attribution information.

In 203, the electronic device identifies a set of human body key points belonging to the same human body based on the key point location information and the key point attribution information.

As mentioned above, the key point location information describes all the human body key points in the image to be detected, and the key point attribution information describes the human body to which each human body key point belongs. After obtaining the key point location information and the key point attribution information corresponding to the image to be detected, the electronic device can identify the set of key points belonging to the same human body based on the two. When there are multiple human bodies in the image to be detected, a human body key point set corresponding to each human body is finally obtained, giving multiple human body key point sets in total; when there is one human body in the image to be detected, one human body key point set corresponding to that human body is finally obtained.
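A minimal sketch of this grouping step follows. It assumes that each detected key point carries a predicted body number and that key points whose body numbers round to the same integer belong to the same human body. The tuple layout, the rounding rule, and the function name `group_key_points` are hypothetical.

```python
from collections import defaultdict

def group_key_points(key_points):
    """key_points: iterable of (name, x, y, body_number) tuples.

    Returns a dict mapping a rounded body number to the list of
    (name, x, y) key points assigned to that human body.
    """
    sets = defaultdict(list)
    for name, x, y, body_number in key_points:
        # Assumption: nearby body numbers are merged by rounding.
        sets[round(body_number)].append((name, x, y))
    return dict(sets)
```

Each value in the returned dictionary is then one human body key point set.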

In 204, the electronic device determines the target human body according to the identified human body key point set, and classifies the human body according to the human body key point set corresponding to the target human body to obtain the human body type of the target human body.

The electronic device determines the target human body based on the identified human body key point sets. For example, when there is one human body key point set, the human body corresponding to that set is directly determined as the target human body; when there are multiple human body key point sets, the human body corresponding to one of the sets is determined as the target human body according to a preset target decision strategy.

It should be noted that there are no specific restrictions on the setting of the target decision strategy in this application, and can be set by a person of ordinary skill in the art according to actual needs.

Exemplarily, when the human body key point set of the target human body includes only head key points, the head length and head width of the target human body are obtained according to the head key points, and the ratio of the larger of the head length and head width to the length of the portrait bounding box is obtained. If the ratio is in the first ratio interval, the target human body is determined to be the first human body type; if the ratio is in the second ratio interval, the target human body is determined to be the second human body type; if the ratio is in the third ratio interval, the target human body is determined to be the third human body type; and if the ratio is in the fourth ratio interval, the target human body is determined to be the fourth human body type; or,

When the set of human body key points of the target human body includes head key points and foot key points, the target human body is determined to be the fourth human body type; or,

When the set of key points of the target human body includes key points other than the key points of the feet, the target human body is determined to be the third body type; or,

When the set of human body key points of the target human body includes key points other than the key points of the hip joint and the key points of the feet, the target human body is determined to be the second human body type.

In the embodiment of this application, an optional target human body classification strategy is provided. First, identify whether the detected human body key points include only head key points. If only head key points are included, other key points may simply not have been detected.

In this case, the electronic device further obtains the head length and head width of the target human body according to the head key points. Then, the electronic device determines the larger of the head length and the head width, calculates the ratio of that larger value to the length of the portrait bounding box (where the length of the portrait bounding box is the length of its side along the vertical axis), and classifies the human body type according to the calculated ratio. For example, if the head length is greater than the head width, the electronic device calculates the ratio of the head length to the portrait bounding box length. Correspondingly, if the head width is greater than the head length, the electronic device calculates the ratio of the head width to the portrait bounding box length.

Wherein, if the ratio is in the first ratio interval, it is determined that the target human body is the first human body type;

If the ratio is in the second ratio interval, it is determined that the target human body is the second human body type;

If the ratio is in the third ratio interval, the target human body is determined to be the third human body type;

If the ratio is in the fourth ratio interval, it is determined that the target human body is the fourth human body type.

In the embodiment of the present application, there are four body types defined, which are the first body type, the second body type, the third body type, and the fourth body type. Among them, each ratio interval can be divided by a person of ordinary skill in the art according to actual needs, and this application does not specifically limit this.

Exemplarily, the first ratio interval is configured as (1/4, +∞), that is, when the ratio of the larger of the head length and head width of the target human body to the length of the portrait bounding box is greater than 1/4, it is determined that the target human body in the shooting scene is the first human body type, and that the user wants to take a close-up of the face of the target human body at this time;

The second ratio interval is configured as (1/6, 1/4], that is, when the ratio of the larger of the head length and head width of the target human body to the length of the portrait bounding box is greater than 1/6 but not greater than 1/4, it is determined that the target human body in the shooting scene is the second human body type, and that the user wants to photograph the bust of the target human body at this time;

The third ratio interval is configured as (1/9, 1/6], that is, when the ratio of the larger of the head length and head width of the target human body to the length of the portrait bounding box is greater than 1/9 but not greater than 1/6, it is determined that the target human body in the shooting scene is the third human body type, and that the user wants to take a seven-part portrait of the target human body at this time;

The fourth ratio interval is configured as (-∞, 1/9], that is, when the ratio of the larger of the head length and head width of the target human body to the length of the portrait bounding box is not greater than 1/9, it is determined that the target human body in the shooting scene is the fourth human body type, and that the user wants to take a full-length image of the target human body at this time.
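The exemplary ratio intervals above can be sketched as a simple threshold cascade. The function name `classify_by_head_ratio` and the integer return codes 1 to 4 for the four human body types are hypothetical; the thresholds 1/4, 1/6, and 1/9 are the exemplary values from the text.

```python
def classify_by_head_ratio(head_length, head_width, bbox_length):
    """Return the human body type (1 to 4) for the head-only case.

    bbox_length is the vertical side length of the portrait bounding box.
    """
    if bbox_length <= 0:
        raise ValueError("bounding box length must be positive")
    ratio = max(head_length, head_width) / bbox_length
    if ratio > 1 / 4:
        return 1  # face close-up
    if ratio > 1 / 6:
        return 2  # bust
    if ratio > 1 / 9:
        return 3  # seven-part portrait
    return 4      # full-length image
```

For instance, a head 30 pixels long in a bounding box 100 pixels long gives a ratio of 0.3, which falls in the first interval.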

In addition, when the detected key points of the human body include key points of other parts in addition to the key points of the head, the human body type is classified according to the key points of other parts.

Wherein, when the detected key points of the human body include the key points of the head and the key points of the feet, it is determined that the target human body in the shooting scene is the fourth human body type, and it is determined that the user wants to take a full-length image of the aforementioned target human body at this time;

When the detected key points of the human body include key points other than the key points of the feet, it is determined that the target human body is the third human body type, and it is determined that the user wants to take a seven-part portrait of the aforementioned target human body at this time;

When the key points of the human body include key points other than the key points of the hip joint and the key points of the feet, the target human body is determined to be the second human body type, and it is determined that the user wants to take the bust of the aforementioned target human body at this time.
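The wording of these three rules is ambiguous in this text. The sketch below adopts one reading and labels it as an assumption: detected foot key points give the fourth type; otherwise detected hip key points (without feet) give the third type; otherwise any non-head key points give the second type. The part names and the function name `classify_by_parts` are hypothetical.

```python
def classify_by_parts(detected_parts):
    """detected_parts: collection of part names, e.g. {"head", "shoulder", "hip"}.

    Returns 2, 3, or 4, or None when only head key points (or no head
    key points) were detected, which is left to the ratio-based rule.
    """
    parts = set(detected_parts)
    if "head" not in parts or parts == {"head"}:
        return None
    if "foot" in parts:
        return 4  # head and feet visible: full-length image
    if "hip" in parts:
        return 3  # assumed reading: reaches the hips but not the feet
    return 2      # assumed reading: neither hips nor feet detected
```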

In 205, the electronic device determines the positioning point and composition type corresponding to the target human body according to the human body type and the set of human body key points corresponding to the target human body.

The positioning point is used to represent the position of the human body. In the embodiment of the present application, after the electronic device obtains the human body type by classification, it further determines the positioning point corresponding to the target human body from the human body type and the human body key point set of the target human body, according to a preset positioning point decision strategy. In addition, it determines the composition type corresponding to the target human body according to a preset composition type decision strategy.

Among them, the division of composition types is not specifically limited in this application, and can be configured by a person of ordinary skill in the art according to actual needs.

For example, the composition types classified in the embodiment of the present application include a facial close-up composition and a full-body composition.

Exemplarily, the electronic device recognizes whether the head orientation of the target human body is forward or lateral according to the head key points in the human body key point set of the target human body;

When the head orientation of the target human body is forward and the human body type is the first human body type, the geometric center point of the portrait bounding box is determined as the positioning point, and the composition type is determined to be the first composition type; or,

When the head orientation of the target human body is lateral and the human body type is the first human body type, multiple symmetric head key points among the head key points are identified, the geometric center point of the multiple symmetric head key points is determined as the positioning point, and the composition type is determined to be the first composition type; or,

When the head orientation of the target human body is lateral and the human body type is the second human body type, the geometric center point of the portrait bounding box is determined as the positioning point, and the composition type is determined to be the second composition type; or,

When the head orientation of the target human body is forward and the human body type is the second human body type, the third human body type, or the fourth human body type, multiple symmetric head key points among the head key points are identified, the geometric center point of the multiple symmetric head key points is determined as the positioning point, and the composition type is determined to be the second composition type; or,

When the head orientation of the target human body is lateral and the human body type is the third human body type or the fourth human body type, the average of the ordinates of the head key points is determined as the ordinate of the positioning point, the abscissa of the geometric center point of the portrait bounding box is determined as the abscissa of the positioning point, and the composition type is determined to be the second composition type.

This application provides an optional positioning point decision strategy and composition type decision strategy.

Wherein, the electronic device first recognizes whether the head orientation of the target human body is forward or lateral according to the head key points in the human body key point set of the target human body.

For example, the electronic device can obtain the abscissas of the eye key points, the abscissa of the nose key point, and the abscissa of the mouth key point, and then compute the average of these abscissas. If the average lies within the leftmost or rightmost 1/4 of the portrait bounding box, the head orientation is judged to be lateral; otherwise it is judged to be forward.
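A minimal sketch of this orientation test is given below. The function name `head_orientation`, the argument layout, and the choice to treat the quarter boundaries as inclusive are assumptions.

```python
def head_orientation(eye_xs, nose_x, mouth_x, bbox_left, bbox_right):
    """Return "lateral" or "forward" from the mean abscissa of facial key points.

    eye_xs is a sequence of eye abscissas; bbox_left and bbox_right are the
    left and right edges of the portrait bounding box.
    """
    xs = list(eye_xs) + [nose_x, mouth_x]
    mean_x = sum(xs) / len(xs)
    quarter = (bbox_right - bbox_left) / 4
    # Lateral when the mean falls in the leftmost or rightmost quarter.
    if mean_x <= bbox_left + quarter or mean_x >= bbox_right - quarter:
        return "lateral"
    return "forward"
```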

Then, the positioning point and composition type are further determined according to the recognized head orientation and human body type.

When the head orientation of the target human body is forward and the human body type is the first human body type, the geometric center point of the portrait bounding box is determined as the positioning point, and the composition type is determined to be the first composition type (i.e., the face close-up composition).

When the head orientation of the target human body is lateral and the human body type is the first human body type, multiple symmetric head key points among the head key points are identified, the geometric center point of the multiple symmetric head key points is determined as the positioning point, and the composition type is determined to be the first composition type. Symmetric head key points are head key points that appear in pairs and are both detected, such as the left eye key point and the right eye key point, or the left ear key point and the right ear key point. It should be noted that the geometric center point of the multiple symmetric head key points is the geometric center point of the polygon obtained by connecting them.

When the head of the target human body is oriented laterally and the human body type is the second human body type, the geometric center point of the portrait bounding box is determined as the positioning point, and the composition type is determined to be the second composition type (ie, the whole body type composition).
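The decision rules above can be sketched as a single function. The names `mean_point` and `decide_positioning` are hypothetical, human body types and composition types are encoded as integers, and the average of the symmetric head key points is used as a simple stand-in for the geometric center of the polygon they form, which is a simplification.

```python
def mean_point(points):
    """Average of a list of (x, y) points (stand-in for the polygon center)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))

def decide_positioning(orientation, body_type, bbox, head_points, symmetric_points):
    """Return (positioning_point, composition_type).

    orientation is "forward" or "lateral"; body_type is 1 to 4;
    bbox is (left, top, right, bottom) of the portrait bounding box.
    """
    left, top, right, bottom = bbox
    bbox_center = ((left + right) / 2, (top + bottom) / 2)
    if body_type == 1:
        point = bbox_center if orientation == "forward" else mean_point(symmetric_points)
        return point, 1
    if orientation == "forward":
        # Forward head with the second, third, or fourth human body type.
        return mean_point(symmetric_points), 2
    if body_type == 2:
        return bbox_center, 2
    # Lateral head with the third or fourth human body type.
    head_y = sum(p[1] for p in head_points) / len(head_points)
    return (bbox_center[0], head_y), 2
```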

In 206, the electronic device determines the composition point corresponding to the target human body according to the positioning point and the composition type.

Exemplarily, when the composition type is the first composition type, the candidate composition point closest to the positioning point is selected from the candidate composition points corresponding to the first composition type, and is determined as the composition point;

When the composition type is the second composition type, the candidate composition point closest to the positioning point is selected from the candidate composition points corresponding to the second composition type, and is determined as the composition point.

Exemplarily, for the first composition type, a plurality of optional candidate composition points are preset in the embodiment of the present application, divided into two groups: candidate composition points suitable for landscape shooting and candidate composition points suitable for portrait shooting. The candidate composition points suitable for landscape shooting include the center of the image, the midpoint of the upper third line, and the intersections of the upper third line with the other third lines; the candidate composition points suitable for portrait shooting include the center of the image and the midpoint of the upper third line.

Similarly, for the second composition type, a plurality of optional candidate composition points are also preset in the embodiment of this application, likewise divided into candidate composition points suitable for landscape shooting and candidate composition points suitable for portrait shooting. The candidate composition points suitable for landscape shooting include the center of the image, the four intersections of the upper/lower third lines with the left/right third lines, and the four midpoints of the upper, lower, left, and right third lines. The candidate composition points suitable for portrait shooting include the center of the image and the midpoint of the upper third line.

Based on the candidate composition points set above, the electronic device first recognizes whether the current shooting mode is portrait mode or landscape mode, and then, among the candidate composition points of the determined composition type that correspond to the current shooting mode, determines the candidate composition point closest to the positioning point as the composition point corresponding to the target human body.
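A minimal sketch of this selection follows. It assumes image coordinates with the origin at the top-left corner and y increasing downward, stores candidate points as fractions of the image width and height, and infers landscape or portrait mode from the image dimensions rather than from a device sensor. The names `CANDIDATES` and `choose_composition_point` are hypothetical.

```python
import math

T1, T2 = 1 / 3, 2 / 3  # positions of the third lines as fractions

# Candidate composition points as (fraction of width, fraction of height).
CANDIDATES = {
    (1, "landscape"): [(0.5, 0.5), (0.5, T1), (T1, T1), (T2, T1)],
    (1, "portrait"): [(0.5, 0.5), (0.5, T1)],
    (2, "landscape"): [(0.5, 0.5),
                       (T1, T1), (T2, T1), (T1, T2), (T2, T2),
                       (0.5, T1), (0.5, T2), (T1, 0.5), (T2, 0.5)],
    (2, "portrait"): [(0.5, 0.5), (0.5, T1)],
}

def choose_composition_point(positioning_point, composition_type, width, height):
    """Return the candidate composition point closest to the positioning point."""
    mode = "landscape" if width >= height else "portrait"  # assumed mode test
    candidates = [(fx * width, fy * height)
                  for fx, fy in CANDIDATES[(composition_type, mode)]]
    return min(candidates, key=lambda c: math.dist(c, positioning_point))
```

For a 900 × 600 landscape preview and a positioning point near (610, 210), the closest candidate of the second composition type is the intersection of the upper and right third lines at about (600, 200).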

In 207, when the positioning point does not match the composition point, the electronic device outputs prompt information for instructing to adjust the shooting posture of the electronic device.

Correspondingly, the electronic device determines in real time whether the positioning point of the target human body in the shooting scene matches the composition point. If they do not match, it outputs prompt information for instructing the user to adjust the shooting posture of the electronic device, so that the positioning point of the target human body in the shooting scene matches the composition point and a better composition is obtained.

Exemplarily, referring to FIG. 8, the determined positioning point is the geometric center point of multiple symmetric head key points of the human head, and the determined composition point is the intersection of the upper third line and the right third line. The electronic device superimposes the upper, lower, left, and right third lines, together with the determined positioning point and composition point, on the preview image collected in real time, and uses an arrow from the positioning point to the composition point as the prompt information to guide the user in adjusting the shooting posture of the electronic device, so that the positioning point and the composition point in the real-time preview image match, as shown in FIG. 9.

In 208, when the positioning point matches the composition point, the electronic device photographs the shooting scene to obtain a photographed image.

When the positioning point matches the composition point, the electronic device determines that a better composition can be obtained at this time, and photographs the shooting scene to obtain a photographed image of the shooting scene.
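Steps 207 and 208 can be sketched as one decision. The sketch assumes that "matching" means the Euclidean distance between the two points is less than or equal to the preset distance, and that the prompt is the arrow vector from the positioning point to the composition point. The function name `next_action` and the returned tuples are hypothetical.

```python
import math

def next_action(positioning_point, composition_point, preset_distance):
    """Return ("capture", None) on a match, else ("prompt", arrow_vector)."""
    dx = composition_point[0] - positioning_point[0]
    dy = composition_point[1] - positioning_point[1]
    if math.hypot(dx, dy) <= preset_distance:
        return ("capture", None)
    # Arrow from the positioning point towards the composition point.
    return ("prompt", (dx, dy))
```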

The application also provides an image processing device. Please refer to FIG. 10, which is a schematic structural diagram of an image processing apparatus provided by an embodiment of the application. The image processing device is applied to electronic equipment. The image processing device includes an image acquisition module 301, an image detection module 302, and a human body recognition module 303, as follows:

The image acquisition module 301 is used to acquire the image to be detected that requires key point detection;

The image detection module 302 is configured to call a pre-trained key point detection model to perform key point detection on the image to be detected, to obtain key point location information and key point attribution information;

The human body recognition module 303 is used to identify a set of human body key points belonging to the same human body according to the key point location information and the key point attribution information.

In one embodiment, the key point detection model includes a feature extraction network and a feature prediction network. When calling the pre-trained key point detection model to perform key point detection on the image to be detected to obtain the key point location information and the key point attribution information, the image detection module 302 is used to:

Call the feature extraction network to extract the image features of the image to be detected;

Call the feature prediction network to detect the key points of the image features, and obtain the key point location information and key point attribution information.

In one embodiment, the feature prediction network includes a location branch and an attribution branch. The feature prediction network is called to perform key point detection on image features, and when key point location information and key point attribution information are obtained, the image detection module 302 is used to:

Call the location branch to detect the key point position of the image feature, and obtain the key point position information;

The attribution branch is called to perform key point attribution detection based on image features and key point location information to obtain key point attribution information.

In one embodiment, the attribution branch includes a feature optimization sub-module, a fusion sub-module, and an output sub-module. When calling the attribution branch to perform key point attribution detection based on the image features and the key point location information and obtain the key point attribution information, the image detection module 302 is used to:

Call the feature optimization sub-module to optimize the image features to obtain optimized image features;

Call the fusion sub-module to fuse the optimized image features and the key point location information to obtain fusion features;

Call the output sub-module to perform key point attribution detection on the fusion features, and obtain the key point attribution information.

In an embodiment, the feature optimization submodule includes a convolution unit with a convolution kernel size of 1*1, and the output submodule includes a convolution unit with a convolution kernel size of 1*1.
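
The data flow through the attribution branch can be sketched as follows. This is only a minimal illustration using nested lists: the ReLU activation, channel-wise concatenation as the fusion operation, and all names (`conv1x1`, `relu`, `attribution_branch`) are assumptions, since this application only states that the feature optimization and output sub-modules contain 1*1 convolution units.

```python
def conv1x1(x, weight, bias):
    """1*1 convolution: the same linear map over channels at every pixel.

    x: list of C_in channel maps, each a list of H rows of W floats.
    weight: C_out rows of C_in floats; bias: C_out floats.
    """
    height, width = len(x[0]), len(x[0][0])
    return [
        [
            [b + sum(w * ch[i][j] for w, ch in zip(row, x)) for j in range(width)]
            for i in range(height)
        ]
        for row, b in zip(weight, bias)
    ]


def relu(x):
    """Element-wise ReLU (an assumed activation, not stated in the source)."""
    return [[[max(v, 0.0) for v in r] for r in ch] for ch in x]


def attribution_branch(features, location_maps, optimize, output):
    # Feature optimization sub-module: 1*1 convolution on the image features.
    optimized = relu(conv1x1(features, *optimize))
    # Fusion sub-module: concatenate along the channel axis (assumed operator).
    fused = optimized + location_maps
    # Output sub-module: 1*1 convolution producing the attribution maps.
    return conv1x1(fused, *output)


features = [[[1.0, -2.0]], [[0.5, 3.0]]]   # 2 channels, 1x2 map (toy values)
location_maps = [[[1.0, 0.0]]]             # 1 key point channel (toy values)
optimize = ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
output = ([[1.0, 1.0, 10.0]], [0.0])
attribution_maps = attribution_branch(features, location_maps, optimize, output)
```

Because a 1*1 convolution mixes channels without looking at neighbouring pixels, it adds little computation, which fits the efficiency goal stated for this application.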

In an embodiment, the image processing device provided in this application further includes a model training module, which is used to:

Obtain a sample image and the sample key point location information corresponding to the sample image, and build a key point detection model;

Call the key point detection model to perform key point detection on the sample image to obtain predicted key point location information and predicted key point attribution information;

Obtain key point location loss based on sample key point location information and predicted key point location information, and obtain key point attribution loss based on predicted key point location information and predicted key point attribution information;

Fuse the key point location loss and the key point attribution loss to obtain a fusion loss, and adjust the parameters of the key point detection model according to the fusion loss.
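
The loss computation in the training steps above might look like the following sketch. This application does not state which location loss or which fusion operation is used, so the mean squared distance and the weighted sum here, together with the names `location_loss`, `fusion_loss`, `loc_weight`, and `attr_weight`, are assumptions for illustration only.

```python
def location_loss(sample_positions, predicted_positions):
    """Mean squared distance between labeled and predicted key points.

    Both arguments are equal-length lists of (x, y) pairs.
    """
    total = 0.0
    for (sx, sy), (px, py) in zip(sample_positions, predicted_positions):
        total += (sx - px) ** 2 + (sy - py) ** 2
    return total / len(sample_positions)


def fusion_loss(loc_loss, attr_loss, loc_weight=1.0, attr_weight=1.0):
    """Fuse the two losses as a weighted sum (the weights are hypothetical)."""
    return loc_weight * loc_loss + attr_weight * attr_loss


loc = location_loss([(0, 0), (2, 2)], [(3, 4), (2, 2)])
total = fusion_loss(loc, 0.5, attr_weight=2.0)
```

The fused value is the single scalar that a training loop would minimize when adjusting the model parameters.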

In an embodiment, when obtaining the key point attribution loss according to the predicted key point location information and the predicted key point attribution information, the model training module is used to:

Perform key point clustering according to the predicted key point location information to obtain multiple human body key point sets belonging to different human bodies;

Obtain the key point attribution loss according to the multiple human body key point sets and the predicted key point attribution information.
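
One possible form of the attribution loss is sketched below, assuming the clustering step has already produced index sets and that the attribution information is one scalar per key point. The pull/push formulation and the name `attribution_loss` are a swapped-in illustrative technique, not a formula given in this application.

```python
import math


def attribution_loss(body_sets, tags):
    """Pull/push loss over scalar attribution values (assumed formulation).

    body_sets: list of index lists, one per clustered human body.
    tags: predicted attribution value for each key point index.
    """
    means = [sum(tags[i] for i in s) / len(s) for s in body_sets]
    # Pull term: key points of one body should share the same value.
    pull = sum(
        sum((tags[i] - m) ** 2 for i in s) / len(s)
        for s, m in zip(body_sets, means)
    ) / len(body_sets)
    # Push term: mean values of different bodies should be far apart.
    pairs = [(a, b) for k, a in enumerate(means) for b in means[k + 1:]]
    push = 0.0
    if pairs:
        push = sum(math.exp(-((a - b) ** 2)) for a, b in pairs) / len(pairs)
    return pull + push


well_separated = attribution_loss([[0, 1], [2, 3]], [1.0, 1.0, 5.0, 5.0])
mixed_up = attribution_loss([[0, 1], [2, 3]], [1.0, 5.0, 1.0, 5.0])
```

The loss is close to zero when each body has a consistent value that differs from the other bodies, and grows when values are mixed across bodies.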

In an embodiment, when acquiring an image to be detected that requires key point detection, the image acquisition module 301 is used to:

When the electronic device enables the shooting function, obtain a preview image of the shooting scene, and use the preview image as the image to be detected;

The image processing device provided in this application also includes a composition prompting module. After the human body recognition module 303 identifies a set of human body key points belonging to the same human body according to the key point location information and the key point attribution information, the composition prompting module is used to:

Determine the target human body according to the identified human body key point set, and classify the human body according to the human body key point set corresponding to the target human body to obtain the human body type of the target human body;

Determine the positioning point and composition type corresponding to the target human body according to the human body type and the human body key point set corresponding to the target human body;

Determine the composition point corresponding to the target human body according to the positioning point and the composition type;

When the positioning point does not match the composition point, a prompt message for instructing to adjust the shooting posture of the electronic device is output.
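
The comparison between the positioning point and the composition point can be sketched as follows. The composition types ("thirds" and "center"), the pixel tolerance, the prompt wording, and the function names are all hypothetical; this application does not enumerate them in this section.

```python
def composition_point(anchor, width, height, composition_type="thirds"):
    """Return the composition point nearest to the positioning point."""
    if composition_type == "center":
        candidates = [(width / 2, height / 2)]
    else:
        # Rule-of-thirds intersections (an assumed composition type).
        candidates = [(width * i / 3, height * j / 3)
                      for i in (1, 2) for j in (1, 2)]
    return min(candidates,
               key=lambda c: (c[0] - anchor[0]) ** 2 + (c[1] - anchor[1]) ** 2)


def adjustment_prompt(anchor, target, tolerance=20.0):
    """Return None when the points match, else a hint for moving the device.

    Image coordinates are assumed to have y growing downward.
    """
    dx, dy = target[0] - anchor[0], target[1] - anchor[1]
    if abs(dx) <= tolerance and abs(dy) <= tolerance:
        return None
    hints = []
    if abs(dx) > tolerance:
        # To move the subject right in the frame, pan the device left.
        hints.append("pan left" if dx > 0 else "pan right")
    if abs(dy) > tolerance:
        # To move the subject down in the frame, tilt the device up.
        hints.append("tilt up" if dy > 0 else "tilt down")
    return "Adjust the device: " + ", ".join(hints)


target = composition_point((300, 250), 900, 600)
prompt = adjustment_prompt((300, 250), target)
```

In a preview loop, this check would run on each preview image, and the prompt would disappear once the positioning point falls within the tolerance of the composition point.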

This application also provides an electronic device. Please refer to FIG. 11. The electronic device includes a processor 401 and a memory 402.

The processor 401 in the embodiment of the present application is a general-purpose processor, such as an ARM architecture processor.

A computer program is stored in the memory 402, which may be a high-speed random access memory or a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices. Correspondingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the computer program in the memory 402 to implement the following functions:

Obtain the image to be detected that requires key point detection;

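Call the pre-trained key point detection model to perform key point detection on the image to be detected, and obtain key point location information and key point attribution information;

Identify a set of human body key points belonging to the same human body according to the key point location information and the key point attribution information.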

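In one embodiment, the key point detection model includes a feature extraction network and a feature prediction network. When calling the pre-trained key point detection model to perform key point detection on the image to be detected and obtain the key point location information and key point attribution information, the processor 401 is used to execute:

Call the feature extraction network to extract the image features of the image to be detected;

Call the feature prediction network to perform key point detection on the image features, and obtain the key point location information and key point attribution information.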

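In an embodiment, the feature prediction network includes a location branch and an attribution branch. When calling the feature prediction network to perform key point detection on the image features and obtain the key point location information and key point attribution information, the processor 401 is configured to execute:

Call the location branch to perform key point location detection on the image features, and obtain the key point location information;

Call the attribution branch to perform key point attribution detection based on the image features and the key point location information, and obtain the key point attribution information.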

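In an embodiment, the attribution branch includes a feature optimization sub-module, a fusion sub-module, and an output sub-module. When calling the attribution branch to perform key point attribution detection based on the image features and the key point location information and obtain the key point attribution information, the processor 401 is used to execute:

Call the feature optimization sub-module to optimize the image features to obtain optimized image features;

Call the fusion sub-module to fuse the optimized image features and the key point location information to obtain fusion features;

Call the output sub-module to perform key point attribution detection on the fusion features, and obtain the key point attribution information.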

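In an embodiment, before acquiring the image to be detected that requires key point detection, the processor 401 is further configured to execute:

Obtain a sample image and the sample key point location information corresponding to the sample image, and build a key point detection model;

Call the key point detection model to perform key point detection on the sample image to obtain predicted key point location information and predicted key point attribution information;

Obtain the key point location loss based on the sample key point location information and the predicted key point location information, and obtain the key point attribution loss based on the predicted key point location information and the predicted key point attribution information;

Fuse the key point location loss and the key point attribution loss to obtain a fusion loss, and adjust the parameters of the key point detection model according to the fusion loss.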

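In an embodiment, when acquiring the key point attribution loss according to the predicted key point location information and the predicted key point attribution information, the processor 401 is configured to execute:

Perform key point clustering according to the predicted key point location information to obtain multiple human body key point sets belonging to different human bodies;

Obtain the key point attribution loss according to the multiple human body key point sets and the predicted key point attribution information.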

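In an embodiment, when acquiring an image to be detected that requires key point detection, the processor 401 is configured to execute:

When the electronic device enables the shooting function, obtain a preview image of the shooting scene, and use the preview image as the image to be detected;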

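After identifying the set of human body key points belonging to the same human body according to the key point location information and the key point attribution information, the processor 401 is further configured to execute:

Determine the target human body according to the identified human body key point set, and classify the human body according to the human body key point set corresponding to the target human body to obtain the human body type of the target human body;

Determine the positioning point and composition type corresponding to the target human body according to the human body type and the human body key point set corresponding to the target human body;

Determine the composition point corresponding to the target human body according to the positioning point and the composition type;

When the positioning point does not match the composition point, output prompt information for instructing to adjust the shooting posture of the electronic device.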

It should be noted that the electronic device provided in this embodiment of the application belongs to the same concept as the image processing method in the above embodiments, and any method provided in the image processing method embodiments can be run on the electronic device. For the specific implementation process, see the embodiments of the image processing method, which will not be repeated here.

It should be noted that, for the image processing method of the embodiments of the present application, those of ordinary skill in the art can understand that all or part of the process of implementing the method can be completed by a computer program controlling the relevant hardware. The computer program may be stored in a computer-readable storage medium, such as the memory of an electronic device, and executed by a processor in the electronic device, and its execution may include the process of the embodiments of the image processing method. The storage medium may be a magnetic disk, an optical disk, a read-only memory, a random access memory, etc.

The above describes in detail the image processing method, device, storage medium, and electronic equipment provided by the embodiments of the present application. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the methods and core ideas of this application. At the same time, those skilled in the art may, according to the ideas of this application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as a limitation on this application.

Claims

An image processing method, which includes:

Obtain the image to be detected that requires key point detection;

Calling a pre-trained key point detection model to perform key point detection on the image to be detected, to obtain key point location information and key point attribution information;

According to the key point location information and the key point attribution information, a set of human body key points belonging to the same human body is identified.
The image processing method according to claim 1, wherein the key point detection model includes a feature extraction network and a feature prediction network, and the calling of the pre-trained key point detection model to perform key point detection on the image to be detected to obtain key point location information and key point attribution information includes:

Calling the feature extraction network to extract the image features of the image to be detected;

Invoke the feature prediction network to perform key point detection on the image feature to obtain key point location information and the key point attribution information.
The image processing method according to claim 2, wherein the feature prediction network includes a location branch and an attribution branch, and the calling of the feature prediction network to perform key point detection on the image feature to obtain the key point location information and the key point attribution information includes:

Calling the location branch to perform key point location detection on the image feature to obtain the key point location information;

The attribution branch is called to perform key point attribution detection based on the image feature and the key point location information, to obtain the key point attribution information.
The image processing method according to claim 3, wherein the location branch includes a convolution unit with a convolution kernel size of 1*1.
The image processing method according to claim 3, wherein the attribution branch includes a feature optimization sub-module, a fusion sub-module, and an output sub-module, and the calling of the attribution branch to perform key point attribution detection according to the image feature and the key point location information to obtain the key point attribution information includes:

Calling the feature optimization sub-module to perform optimization processing on the image features to obtain optimized image features;

Calling the fusion sub-module to fuse the optimized image feature and the key point position information to obtain a fusion feature;

Invoke the output sub-module to perform key point attribution detection on the fusion feature to obtain the key point attribution information.
The image processing method according to claim 5, wherein the feature optimization sub-module includes a convolution unit with a convolution kernel size of 1*1, and the output sub-module includes a convolution unit with a convolution kernel size of 1*1.
The image processing method according to any one of claims 1 to 6, wherein before said acquiring the image to be detected that requires key point detection, the method further comprises:

Acquiring a sample image and sample key point position information corresponding to the sample image, and constructing the key point detection model;

Calling the key point detection model to perform key point detection on the sample image to obtain predicted key point location information and predicted key point attribution information;

Acquiring key point location loss according to the sample key point location information and the predicted key point location information, and acquiring key point attribution loss according to the predicted key point location information and the predicted key point attribution information;

The key point position loss and the key point attribution loss are fused to obtain a fusion loss, and the parameters of the key point detection model are adjusted according to the fusion loss.
8. The image processing method according to claim 7, wherein said acquiring the attribution loss of a key point according to the position information of the sample key point and the attribution information of the predicted key point comprises:

Perform key point clustering according to the predicted key point location information to obtain multiple human body key point sets belonging to different human bodies;

Acquire the attribution loss of the key point according to the multiple sets of human body key points and the predicted key point attribution information.
The image processing method according to any one of claims 1 to 6, wherein the acquiring of the image to be detected that requires key point detection comprises:

When the electronic device enables the shooting function, obtain a preview image of the shooting scene, and use the preview image as an image to be detected;

After identifying a set of human body key points belonging to the same human body according to the key point location information and the key point attribution information, the method further includes:

Determine the target human body according to the identified human body key point set, and classify the human body according to the human body key point set corresponding to the target human body to obtain the human body type of the target human body;

Determining a positioning point and a composition type corresponding to the target human body according to the human body type and a set of human body key points corresponding to the target human body;

Determining a composition point corresponding to the target human body according to the positioning point and the composition type;

When the positioning point does not match the composition point, outputting prompt information for instructing to adjust the shooting posture of the electronic device.
An image processing device, which includes:

The image acquisition module is used to acquire the to-be-detected image that requires key point detection;

The image detection module is used to call a pre-trained key point detection model to perform key point detection on the image to be detected to obtain key point location information and key point attribution information;

The human body recognition module is used to identify a set of human body key points belonging to the same human body based on the key point location information and the key point attribution information.
A storage medium on which a computer program is stored, wherein, when the computer program is loaded by a processor, it executes:

Obtain the image to be detected that requires key point detection;

Calling a pre-trained key point detection model to perform key point detection on the image to be detected, to obtain key point location information and key point attribution information;

According to the key point location information and the key point attribution information, a set of human body key points belonging to the same human body is identified.
An electronic device includes a processor and a memory, the memory stores a computer program, wherein the processor loads the computer program to execute:

Obtain the image to be detected that requires key point detection;

Calling a pre-trained key point detection model to perform key point detection on the image to be detected, to obtain key point location information and key point attribution information;

According to the key point location information and the key point attribution information, a set of human body key points belonging to the same human body is identified.
The electronic device according to claim 12, wherein the key point detection model includes a feature extraction network and a feature prediction network, and when calling the pre-trained key point detection model to perform key point detection on the image to be detected to obtain key point location information and key point attribution information, the processor is used to execute:

Calling the feature extraction network to extract the image features of the image to be detected;

Invoke the feature prediction network to perform key point detection on the image feature to obtain key point location information and the key point attribution information.
The electronic device according to claim 13, wherein the feature prediction network includes a location branch and an attribution branch, and when calling the feature prediction network to perform key point detection on the image feature to obtain the key point location information and the key point attribution information, the processor is used to execute:

Calling the location branch to perform key point location detection on the image feature to obtain the key point location information;

The attribution branch is called to perform key point attribution detection based on the image feature and the key point location information, to obtain the key point attribution information.
The electronic device according to claim 14, wherein the location branch includes a convolution unit with a convolution kernel size of 1*1.
The electronic device according to claim 14, wherein the attribution branch includes a feature optimization sub-module, a fusion sub-module, and an output sub-module, and when calling the attribution branch to perform key point attribution detection based on the image feature and the key point location information to obtain the key point attribution information, the processor is configured to execute:

Calling the feature optimization sub-module to perform optimization processing on the image features to obtain optimized image features;

Calling the fusion sub-module to fuse the optimized image feature and the key point position information to obtain a fusion feature;

Invoke the output sub-module to perform key point attribution detection on the fusion feature to obtain the key point attribution information.
The electronic device according to claim 16, wherein the feature optimization sub-module includes a convolution unit with a convolution kernel size of 1*1, and the output sub-module includes a convolution unit with a convolution kernel size of 1*1.
The electronic device according to any one of claims 12-17, wherein, before acquiring the image to be detected that requires key point detection, the processor is further configured to execute:

Acquiring a sample image and sample key point position information corresponding to the sample image, and constructing the key point detection model;

Calling the key point detection model to perform key point detection on the sample image to obtain predicted key point location information and predicted key point attribution information;

Acquiring key point location loss according to the sample key point location information and the predicted key point location information, and acquiring key point attribution loss according to the predicted key point location information and the predicted key point attribution information;

The key point position loss and the key point attribution loss are fused to obtain a fusion loss, and the parameters of the key point detection model are adjusted according to the fusion loss.
The electronic device according to claim 18, wherein, when acquiring the attribution loss of a key point according to the sample key point location information and the predicted key point attribution information, the processor is configured to execute:

Perform key point clustering according to the predicted key point location information to obtain multiple human body key point sets belonging to different human bodies;

Acquire the attribution loss of the key point according to the multiple sets of human body key points and the predicted key point attribution information.
The electronic device according to any one of claims 12-17, wherein, when acquiring an image to be detected that requires key point detection, the processor is configured to execute:

When the electronic device enables the shooting function, obtain a preview image of the shooting scene, and use the preview image as an image to be detected;

After identifying a set of human body key points belonging to the same human body according to the key point location information and the key point attribution information, the processor is further configured to execute:

Determine the target human body according to the identified human body key point set, and classify the human body according to the human body key point set corresponding to the target human body to obtain the human body type of the target human body;

Determining a positioning point and a composition type corresponding to the target human body according to the human body type and a set of human body key points corresponding to the target human body;

Determining a composition point corresponding to the target human body according to the positioning point and the composition type;

When the positioning point does not match the composition point, outputting prompt information for instructing to adjust the shooting posture of the electronic device.