WO2021169754A1 - Photographic composition prompting method and apparatus, storage medium, and electronic device - Google Patents

Photographic composition prompting method and apparatus, storage medium, and electronic device

Info

Publication number
WO2021169754A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
points
human body
key points
composition
Prior art date
Application number
PCT/CN2021/074905
Other languages
French (fr)
Chinese (zh)
Inventor
罗彤
李亚乾
蒋燚
Original Assignee
Oppo广东移动通信有限公司
上海瑾盛通信科技有限公司
Priority date
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 and 上海瑾盛通信科技有限公司
Publication of WO2021169754A1 publication Critical patent/WO2021169754A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/61Control of cameras or camera modules based on recognised objects
    • H04N23/611Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/63Control of cameras or camera modules by using electronic viewfinders
    • H04N23/631Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters
    • H04N23/632Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters for displaying or modifying preview images prior to image capturing, e.g. variety of image resolutions or capturing parameters
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/64Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image

Definitions

  • This application relates to the field of image processing technology, and in particular to a composition prompting method, device, storage medium, and electronic equipment.
  • the embodiments of the present application provide a composition prompting method, device, storage medium, and electronic equipment, which can improve the quality of images captured by the electronic equipment.
  • the key point detection module is used to obtain a preview image of the shooting scene, and call a pre-trained key point detection model to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene;
  • An anchor point determination module configured to divide the preview image into a plurality of category areas, and obtain an anchor point set corresponding to the shooting scene according to the category area and the key points of the human body;
  • a composition point determination module configured to determine a composition point set corresponding to the positioning point set
  • the composition prompting module is configured to output prompt information for instructing to adjust the shooting posture of the electronic device when the set of positioning points does not match the set of composition points.
  • the storage medium provided by the embodiment of the present application has a computer program stored thereon, and when the computer program is loaded by a processor, the composition prompting method as provided in the present application is executed.
  • the electronic device provided by the embodiment of the present application includes a processor and a memory, the memory stores a computer program, and the processor loads the computer program to execute the composition prompting method provided by the present application.
  • FIG. 1 is a schematic flowchart of a composition prompting method provided by an embodiment of the application.
  • Fig. 2 is a schematic diagram of the key points of the human body detected in an embodiment of the present application.
  • Fig. 3 is a schematic diagram of intercepting a human body image in an embodiment of the present application.
  • Fig. 4 is a schematic structural diagram of a key point detection model provided by an embodiment of the present application.
  • Fig. 5 is a detailed structure diagram of a key point detection model provided by an embodiment of the present application.
  • Fig. 6 is a schematic structural diagram of the first position segment in an embodiment of the present application.
  • Fig. 7 is a schematic structural diagram of the second position segment in an embodiment of the present application.
  • Fig. 8 is an example diagram of outputting prompt information in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of another flow chart of the composition prompting method provided by an embodiment of the present application.
  • Fig. 10 is a schematic structural diagram of a composition prompting device provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • Artificial Intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technology of computer science, which attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology.
  • Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • Machine Learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects. It specializes in the study of how computers simulate or realize human learning behaviors in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance.
  • Machine learning is the core of artificial intelligence, the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence.
  • Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and other technologies.
  • The embodiments of the present application provide a model training method, a composition prompting method, a composition prompting device, a storage medium, and an electronic device. The execution subject of the model training method may be the composition prompting device provided in the embodiments of the present application, or an electronic device integrated with the composition prompting device; the execution subject of the composition prompting method may likewise be the composition prompting device provided in the embodiments of the present application, or an electronic device integrated with the composition prompting device. The composition prompting device may be implemented in hardware or software.
  • the electronic device may be a device equipped with a processor (including but not limited to a general-purpose processor, a customized processor, etc.) and having processing capabilities, such as a smart phone, a tablet computer, a palmtop computer, a notebook computer, or a desktop computer.
  • composition prompting method including
  • the invoking a pre-trained key point detection model to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene includes:
  • intercepting the human body image of the human body from the preview image includes:
  • the image content in the bounding box of the portrait is intercepted to obtain the image of the human body.
  • the key point detection model includes a feature extraction network, a dual-branch network, and an output network
  • the dual-branch network includes a location branch network and a relationship branch network
  • the invoking of the key point detection model to perform key point detection on the human body image to obtain the human body key points includes:
  • the output network is called to connect the key points of the candidate body according to the connection relationship, and normalize the key points of the candidate body after the connection according to the bounding box of the portrait to obtain the key points of the human body.
  • the location branch network includes N location segments
  • the relationship branch network includes N relationship segments
  • the calling of the location branch network to detect candidate human body key points according to the image features, and the calling of the relationship branch network to detect the connection relationships between the candidate human body key points according to the image features, includes:
  • the first position segment includes a plurality of first convolution modules and a plurality of second convolution modules connected in sequence
  • the first convolution module includes a convolution unit with a convolution kernel size of 3*3
  • the second convolution module includes a convolution unit with a convolution kernel size of 1*1.
  • the structures of the 2nd to Nth position segments are the same, and the second position segment includes a plurality of third convolution modules and a plurality of second convolution modules connected in sequence.
  • the third convolution module includes a convolution unit with a convolution kernel size of 7*7.
  • the method further includes:
  • the shooting scene is photographed to obtain a photographed image.
  • the acquiring a set of positioning points corresponding to the shooting scene according to the category area and the key points of the human body includes:
  • the category center point of each category area is determined, and the category center point of each category area and each key point of the human body are used as the positioning points to obtain the set of positioning points.
  • the determining a set of composition points corresponding to the set of positioning points includes:
  • taking each category center point and each human body key point in the preset composition template image as a composition point to obtain the composition point set.
  • it further includes:
  • FIG. 1 is a schematic flowchart of a composition prompting method provided by an embodiment of the present application.
  • the flow of the composition prompting method provided by an embodiment of the present application may be as follows:
  • a preview image of the shooting scene is obtained, and a pre-trained key point detection model is called to perform key point detection on the preview image, and the human body key points of the human body in the shooting scene are obtained.
  • the shooting scene is the scene where the camera of the electronic device is aimed after the shooting application is started, and it can be any scene, which can include people and objects.
  • For example, the electronic device can start its system application "camera" according to the user's operation. After starting the "camera", the electronic device collects real-time images through the camera; at this time, the scene at which the camera is aimed is the shooting scene.
  • the electronic device can start the "camera” according to the user's touch operation on the entrance of the "camera”, and can also start the "camera” according to the user's voice password "start the camera” and so on.
  • the composition prompting method provided in this application can be applied to image shooting of a portrait scene, where the portrait scene is a shooting scene in which a human body exists.
  • the preview image is obtained by the electronic device using the camera to perform image acquisition of the shooting scene, and it is displayed to the user by default so that the user can preview the imaging effect of the image shooting.
  • the electronic device uses the preview image collected in real time to perform key point detection on the human body in the shooting scene, so as to obtain the key points of the human body.
  • the electronic device first obtains a preview image of the shooting scene.
  • a machine learning method is also used to pre-train a key point detection model.
  • the key point detection model can be set locally on the electronic device or on the server.
  • the configuration of the key point detection model is not specifically limited in this application, and can be selected by a person of ordinary skill in the art according to actual needs.
  • In addition to obtaining the preview image of the shooting scene, the electronic device also calls the pre-trained key point detection model from the local device or the server, and inputs the obtained preview image into the pre-trained key point detection model for key point detection to obtain the key points of the human body in the shooting scene.
  • the key points of the human body are used to locate the head, neck, shoulders, elbows, hands, hips, knees and feet of the human body.
  • the key points of the head can be further subdivided into the eyes, nose, mouth, eyebrows, and contour points of various parts of the head, etc.
  • Referring to Figure 2, the human body image shown on the left side of Figure 2 is input into the pre-trained key point detection model for key point detection, and multiple human body key points are obtained, as shown on the right side of Figure 2.
  • the preview image is divided into multiple category areas, and a set of positioning points corresponding to the shooting scene is obtained according to the category areas and the key points of the human body.
  • After acquiring the preview image of the shooting scene, the electronic device not only performs key point detection on the preview image, but also divides the preview image into multiple category areas.
  • a machine learning method is also used in this application to pre-train a semantic segmentation model.
  • the semantic segmentation model can be set locally on the electronic device or on the server.
  • the configuration of the semantic segmentation model is not specifically limited in this application, and can be selected by a person of ordinary skill in the art according to actual needs.
  • the semantic segmentation model of ICNet configuration is adopted in this application.
  • the electronic device can call the pre-trained semantic segmentation model from the local device or the server, and input the obtained preview image into the pre-trained semantic segmentation model for semantic segmentation to obtain the object category information to which each area of the preview image belongs. Then, according to the category information, the electronic device divides the preview image into multiple category areas.
  • the electronic device determines multiple positioning points from the divided category areas and the human body key points according to a preset positioning point decision strategy, and the determined positioning points form the positioning point set.
  • the positioning point is used to represent the position of the human body and other objects in the shooting scene.
  • a set of composition points corresponding to the set of anchor points is determined.
  • the electronic device further determines the composition point set corresponding to the positioning point set from the acquired positioning point set according to a preset composition point decision strategy.
  • the composition points in the composition point set correspond to the positioning points in the positioning point set one-to-one, and when each positioning point matches its corresponding composition point, it is considered that the best composition can be obtained at this time.
  • A positioning point is considered to match its composition point when the distance between the positioning point and the composition point is less than or equal to a preset distance. This application does not specifically limit the value of the preset distance, which can be selected by a person of ordinary skill in the art according to actual needs.
  • A person of ordinary skill in the art can configure, according to actual needs, the constraint conditions under which the positioning point set is considered to match the composition point set; this application does not specifically limit this. For example, it can be configured that the positioning point set matches the composition point set when each positioning point in the positioning point set matches its corresponding composition point in the composition point set; or it can be configured that the positioning point set matches the composition point set when a preset number of positioning points in the positioning point set match their corresponding composition points in the composition point set.
  • The electronic device determines in real time whether the positioning point set corresponding to the shooting scene matches the composition point set. If they do not match, the electronic device outputs prompt information for instructing the user to adjust the shooting posture of the electronic device, so that the positioning point set corresponding to the shooting scene comes to match the composition point set, and the people and objects in the shooting scene thereby obtain a better composition.
  • As can be seen from the above, this application obtains the human body key points in the shooting scene by acquiring a preview image of the shooting scene and calling the pre-trained key point detection model to perform key point detection on the preview image; divides the preview image into multiple category areas and obtains the positioning point set corresponding to the shooting scene according to the category areas and the human body key points; determines the composition point set corresponding to the positioning point set; outputs prompt information for instructing the user to adjust the shooting posture of the electronic device when the positioning point set does not match the composition point set; and photographs the shooting scene to obtain a captured image when the positioning point set matches the composition point set.
  • the present application can guide the user to compose the image to improve the image quality taken by the electronic device.
  • the method further includes:
  • the shooting scene is photographed to obtain a photographed image.
  • That is, when the positioning point set matches the composition point set, the electronic device determines that the people and objects in the shooting scene have a good composition, and therefore photographs the shooting scene to obtain a high-quality captured image of the shooting scene.
  • calling the pre-trained key point detection model to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene including:
  • this application does not perform key point detection on the complete preview image, but performs key point detection on the part of the human body in the preview image.
  • After the electronic device obtains the preview image of the shooting scene, it does not directly call the key point detection model to perform key point detection on the preview image; instead, it first intercepts the human body image from the preview image, and then calls the key point detection model to perform key point detection on the intercepted human body image to obtain the key points of the human body in the shooting scene.
  • intercepting the human body image of the human body from the preview image includes:
  • the embodiment of the present application also pre-trains a portrait detection model using a machine learning method.
  • the portrait detection model is configured to take an image as an input and a portrait bounding box corresponding to the image as an output.
  • the image content within the portrait bounding box is the portrait part of the image.
  • the portrait detection model can be set locally on the electronic device or on the server.
  • the configuration of the portrait detection model is not specifically limited in this application, and it can be selected by a person of ordinary skill in the art according to actual needs.
  • the Yolo model or SSD model is used as the basic model in the application, and the portrait detection model is obtained through machine learning training.
  • When intercepting the human body image from the preview image, the electronic device can call the pre-trained portrait detection model from the local device or the server, and input the preview image into the pre-trained portrait detection model for portrait detection to obtain the portrait bounding box corresponding to the preview image. Then, the human body image can be obtained by cutting out the image content within the portrait bounding box.
  • the preview image is input into the portrait detection model for portrait detection, and the portrait bounding box corresponding to the preview image is obtained.
  • the portrait bounding box includes only the portrait part of the preview image. Then, cut out the image content in the bounding box of the portrait from the preview image to obtain the human body image in the preview image.
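  • As an illustration of the cropping step described above, the following is a minimal sketch (not the implementation of this application) that cuts the image content inside a detected portrait bounding box out of the preview image; the (x, y, w, h) box format and the numpy array layout are assumptions.

```python
import numpy as np

def crop_human_body(preview_image: np.ndarray, portrait_box: tuple) -> np.ndarray:
    """Cut the image content inside the portrait bounding box out of the preview image.

    preview_image: H x W x C array (e.g. an RGB preview frame).
    portrait_box:  (x, y, w, h) in pixels -- an assumed output format
                   for the portrait detection model.
    """
    x, y, w, h = portrait_box
    height, width = preview_image.shape[:2]
    # Clamp the box to the image bounds before slicing.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)
    return preview_image[y0:y1, x0:x1].copy()

# Example: crop a 200 x 120 region starting at (x=50, y=80) from a dummy preview frame.
preview = np.zeros((480, 640, 3), dtype=np.uint8)
body = crop_human_body(preview, (50, 80, 120, 200))
print(body.shape)  # (200, 120, 3)
```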
  • the key point detection model includes a feature extraction network, a dual branch network, and an output network.
  • the dual branch network includes a location branch network and a relationship branch network.
  • the calling of the key point detection model to perform key point detection on the human body image to obtain the human body key points includes:
  • calling the location branch network to detect the candidate human body key points according to the image features;
  • calling the relationship branch network to detect the connection relationships between the candidate human body key points according to the image features;
  • the key point detection model consists of three parts, namely the feature extraction network, the dual branch network and the output network.
  • the feature extraction network can be any known feature extraction network, such as VGG, MobileNet, and Resnet, etc., and its purpose is to perform feature extraction on the input image as the input of the subsequent branch network.
  • the electronic device first calls the feature extraction network to perform feature extraction on the intercepted human body image to obtain the image features of the human body image.
  • the key point detection task is segmented, and the dual branch network is used to realize key point detection.
  • One branch network focuses on detecting the human body key points that may exist in the image and is recorded as the location branch network; the other branch network focuses on detecting the connection relationships between the human body key points that may exist and is recorded as the relationship branch network.
  • After the electronic device extracts the image features of the human body image through the feature extraction network, it further calls the location branch network to detect possible human body key points based on the image features, which are recorded as candidate human body key points.
  • The output of the location branch network is a heatmap, which is a three-dimensional matrix of size height*width*keypoints, where height and width represent the height and the width respectively, and keypoints represents the number of candidate human body key points; that is, each candidate human body key point corresponds to a height*width matrix.
  • the value of each position in the matrix indicates the possibility that the candidate body key point is in this position. The larger the value, the more likely the candidate body key point is in this position. For example, you can take the position of the maximum value in each area in the heatmap to get the key points of the corresponding candidate body.
  • Alternatively, the heatmap can be max-pooled, the heatmaps before and after pooling can be compared, and the positions whose values are the same in both can be selected as the candidate human body key points.
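  • The peak-picking idea just described can be sketched as follows, using a max filter as the "max pooling" step and keeping positions whose values are unchanged by pooling; the heatmap layout (height*width*keypoints), the filter size, and the score threshold are assumptions.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def pick_candidate_keypoints(heatmap: np.ndarray, threshold: float = 0.1):
    """Pick one (row, col) candidate position per keypoint channel of the heatmap.

    heatmap: height x width x keypoints array of keypoint confidences.
    A position is kept as a peak when its value is unchanged by max pooling
    (i.e. it is a local maximum) and exceeds the threshold.
    """
    candidates = []
    for k in range(heatmap.shape[2]):
        channel = heatmap[:, :, k]
        pooled = maximum_filter(channel, size=3)          # "max pooling" of the heatmap
        peaks = (channel == pooled) & (channel > threshold)
        if not peaks.any():
            candidates.append(None)                       # this keypoint was not found
            continue
        ys, xs = np.nonzero(peaks)
        best = int(np.argmax(channel[ys, xs]))            # keep the strongest peak
        candidates.append((int(ys[best]), int(xs[best])))
    return candidates

# Example with a random 46 x 46 heatmap for 17 assumed keypoint channels.
print(pick_candidate_keypoints(np.random.rand(46, 46, 17))[:3])
```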
  • the electronic device also calls the relationship branch network to obtain the connection relationship between the key points of the candidate body based on the aforementioned image feature detection.
  • The output of the relationship branch network is a pafmap, which is a three-dimensional matrix of size height*width*(2*limbs), where limbs represents the number of limbs (a "limb" here is not a limb in the narrow anatomical sense, but the region between two associated key points; for example, the connection between the left eye and the right eye is regarded as a limb, and the connection between the neck and the left shoulder is regarded as a limb).
  • Each limb corresponds to a height*width*2 matrix, which can be regarded as a 2-channel heat map.
  • Each position of this heat map has 2 values, namely x and y, and the vector (x, y) represents the direction of the limb at this position (when x and y are both 0, there is no limb at this position); this represents the connection relationship between the candidate human body key points.
  • After obtaining the candidate human body key points and their connection relationships, the candidate key points can be connected according to the connection relationships to obtain a complete human body. Specifically, the pafmap corresponding to one limb is taken at a time, and the candidate key points at the two ends of the limb are connected.
  • The confidence of a candidate limb between two candidate key points d_j1 and d_j2 is obtained by integrating the pafmap along the line segment between them: E = ∫_{u=0}^{u=1} L_c(P(u)) · (d_j2 − d_j1) / ‖d_j2 − d_j1‖_2 du, where P(u) is the position obtained by interpolating between the two key points, namely: P(u) = (1 − u) · d_j1 + u · d_j2.
  • In practice, u is sampled at uniform intervals on [0, 1], and the integral is approximated by a summation over the samples.
  • L_c(P(u)) is the value at P(u) in the pafmap.
  • In this way, the possible connections between adjacent candidate key points, that is, the potential limbs, can be obtained, thereby completing the connection of the human body.
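  • The line integral above can be approximated by sampling u at uniform intervals, as in the following sketch; the pafmap layout (two channels per limb holding the x and y components) and the number of samples are assumptions.

```python
import numpy as np

def limb_score(pafmap_limb: np.ndarray, p1, p2, num_samples: int = 10) -> float:
    """Approximate the association score E between two candidate key points.

    pafmap_limb: height x width x 2 field holding the (x, y) limb direction at
                 each pixel (the 2-channel heat map of one limb).
    p1, p2:      (x, y) pixel coordinates of the candidate key points at the
                 two ends of the limb.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    direction = p2 - p1
    norm = np.linalg.norm(direction)
    if norm < 1e-6:
        return 0.0
    unit = direction / norm                       # (d_j2 - d_j1) / ||d_j2 - d_j1||
    score = 0.0
    for u in np.linspace(0.0, 1.0, num_samples):
        x, y = (1.0 - u) * p1 + u * p2            # P(u): interpolated position
        paf_vec = pafmap_limb[int(round(y)), int(round(x))]   # L_c(P(u))
        score += float(np.dot(paf_vec, unit))     # projection onto the limb direction
    return score / num_samples

# Example: a field pointing along +x gives a high score for a horizontal limb.
field = np.zeros((10, 10, 2))
field[..., 0] = 1.0
print(limb_score(field, (1, 5), (8, 5)))  # close to 1.0
```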
  • After completing the connection of the candidate human body key points, the electronic device also normalizes the connected candidate key points according to the portrait bounding box to obtain the human body key points.
  • The normalization is carried out according to the following formulas: x' = x / w, y' = y / h, where x and y represent the abscissa and ordinate of a candidate human body key point, x' and y' represent the abscissa and ordinate of the human body key point obtained by normalizing that candidate key point, w represents the width of the portrait bounding box, and h represents the height of the portrait bounding box.
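  • A one-line illustration of this normalization, assuming the candidate key point coordinates are expressed relative to the portrait bounding box:

```python
def normalize_keypoints(keypoints, box_w, box_h):
    """Normalize candidate key point coordinates by the portrait bounding box:
    x' = x / w, y' = y / h, assuming (x, y) are relative to the box origin."""
    return [(x / box_w, y / box_h) for (x, y) in keypoints]

print(normalize_keypoints([(60, 150), (90, 300)], box_w=120, box_h=400))
# [(0.5, 0.375), (0.75, 0.75)]
```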
  • the location branch network includes N location segments
  • the relationship branch network includes N relationship segments
  • the calling of the location branch network to detect the candidate human body key points according to the image features, and the calling of the relationship branch network to detect the connection relationships between the candidate human body key points according to the image features, includes:
  • In the embodiment of the present application, the location branch network includes N location segments, and the relationship branch network includes N relationship segments, where N is a positive integer greater than 2 whose value can be set by a person of ordinary skill in the art according to actual needs.
  • The location branch network includes N location segments, namely location segment 1 to location segment N, and the relationship branch network includes N relationship segments, namely relationship segment 1 to relationship segment N.
  • location segment 1 and relation segment 1 form network segment 1
  • location segment 2 and relation segment 2 form network segment 2
  • location segment N and relation segment N form network segment N
  • the dual-branch network constructed in the embodiment of this application can be regarded as being composed of multiple network segments, such as network segment 1 to network segment N shown in FIG. 5, each of which includes a corresponding location segment and relationship segment.
  • The electronic device inputs the image features extracted by the feature extraction network into location segment 1 of network segment 1 for detection to obtain the first group of candidate human body key points output by location segment 1, and inputs the same image features into relationship segment 1 of network segment 1 for detection to obtain the connection relationships between the first group of candidate key points output by relationship segment 1. Then, the first group of candidate key points (coordinates), the connection relationships between the first group of candidate key points, and the image features are fused to obtain the first fusion feature as the output of network segment 1.
  • The first fusion feature output by network segment 1 is input into location segment 2 of network segment 2 for detection to obtain the second group of candidate human body key points output by location segment 2, and the first fusion feature is also input into relationship segment 2 of network segment 2 for detection to obtain the connection relationships between the second group of candidate key points output by relationship segment 2. Then, the second group of candidate key points (coordinates), the connection relationships between the second group of candidate key points, and the image features are fused to obtain the second fusion feature as the output of network segment 2. This process continues until location segment N in network segment N detects the Nth group of candidate key points according to the (N-1)th fusion feature output by network segment N-1, and relationship segment N in network segment N detects the connection relationships between the Nth group of candidate key points according to the same (N-1)th fusion feature.
  • Finally, the Nth group of candidate key points output by location segment N is taken as the candidate human body key points finally output by the location branch network, and the connection relationships between the Nth group of candidate key points output by relationship segment N are taken as the connection relationships finally output by the relationship branch network.
  • the first position segment includes a plurality of first convolution modules and a plurality of second convolution modules connected in sequence.
  • The structure of the first relationship segment is the same as that of the first location segment, but the two do not share parameters.
  • the first position segment includes a plurality of first convolution modules and a plurality of second convolution modules that are sequentially connected, wherein the first convolution module includes a convolution unit with a convolution kernel size of 3*3, and the second convolution module includes a convolution unit with a convolution kernel size of 1*1.
  • the numbers of the first convolution modules and the second convolution modules constituting the first position segment are not specifically limited, and can be configured by a person of ordinary skill in the art according to actual needs; for example, in the embodiment of the present application, three first convolution modules and two second convolution modules are used, as shown in FIG. 6.
  • the structures of the second to N position segments are the same, and the second position segment includes a plurality of third convolution modules and a plurality of second convolution modules connected in sequence.
  • the second position segment includes a plurality of third convolution modules and a plurality of second convolution modules that are sequentially connected, wherein the third convolution module includes a convolution unit with a convolution kernel size of 7*7.
  • the number of the third convolution module and the second convolution module constituting the second position segment is not specifically limited, and can be configured by a person of ordinary skill in the art according to actual needs.
  • For example, in the embodiment of the present application, five third convolution modules and two second convolution modules are used, as shown in Figure 7.
  • each 7*7 convolution unit can be replaced with three 3*3 convolution units to reduce the amount of processed data.
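  • As a concrete illustration of the segment structures described above, the following PyTorch sketch builds one possible first position segment (three 3*3 convolution modules followed by two 1*1 convolution modules) and one possible second position segment (five 7*7 convolution modules followed by two 1*1 convolution modules); the use of PyTorch, the channel counts, the activation functions, and the number of keypoint channels are assumptions not specified by this application.

```python
import torch
import torch.nn as nn

def conv_module(in_ch, out_ch, kernel):
    """One convolution module: a single conv unit with the given kernel size."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=kernel, padding=kernel // 2),
        nn.ReLU(inplace=True),
    )

# First position segment: three 3x3 modules + two 1x1 modules (cf. Fig. 6).
first_position_segment = nn.Sequential(
    conv_module(128, 128, 3),
    conv_module(128, 128, 3),
    conv_module(128, 128, 3),
    conv_module(128, 128, 1),
    conv_module(128, 17, 1),   # 17 = assumed number of keypoint channels
)

# Second position segment: five 7x7 modules + two 1x1 modules (cf. Fig. 7).
second_position_segment = nn.Sequential(
    conv_module(128 + 17, 128, 7),
    conv_module(128, 128, 7),
    conv_module(128, 128, 7),
    conv_module(128, 128, 7),
    conv_module(128, 128, 7),
    conv_module(128, 128, 1),
    conv_module(128, 17, 1),
)

features = torch.randn(1, 128, 46, 46)                 # image features from the backbone
heatmap1 = first_position_segment(features)            # first group of candidates
heatmap2 = second_position_segment(torch.cat([features, heatmap1], dim=1))
print(heatmap1.shape, heatmap2.shape)
```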
  • acquiring a set of positioning points corresponding to the shooting scene according to the category area and the key points of the human body includes:
  • the category center point of each category area is determined; the category center point of each category area is taken as a positioning point, each human body key point is taken as a positioning point, and these positioning points form the positioning point set.
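  • A minimal sketch of building the positioning point set from a semantic segmentation result and the detected human body key points; representing the segmentation result as a label map with one integer class id per pixel is an assumption.

```python
import numpy as np

def positioning_point_set(label_map: np.ndarray, body_keypoints):
    """Combine category center points and human body key points into one set.

    label_map:      H x W array of integer class ids from semantic segmentation,
                    each id marking one category area of the preview image.
    body_keypoints: list of (x, y) human body key points.
    """
    points = []
    for class_id in np.unique(label_map):
        ys, xs = np.nonzero(label_map == class_id)
        # Category center point: the mean pixel position of the category area.
        points.append((float(xs.mean()), float(ys.mean())))
    points.extend((float(x), float(y)) for x, y in body_keypoints)
    return points

label_map = np.zeros((4, 4), dtype=int)
label_map[:, 2:] = 1                       # two category areas
print(positioning_point_set(label_map, [(1.0, 3.0)]))
```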
  • determining the set of composition points corresponding to the set of anchor points includes:
  • determining, according to the positioning point set, the preset composition template image with the highest similarity to the preview image, and taking each category center point and each human body key point in the preset composition template image as a composition point to obtain the composition point set.
  • this application constructs a portrait composition database in advance, and the portrait composition database includes a plurality of preset composition template images.
  • the portrait composition database is constructed as follows:
  • the number of clustering categories can be determined according to the scenes of the collected images and the distribution of human body postures; for example, more categories can be set when the scenes and postures vary widely, and fewer categories can be set when they are more uniform, which can be specifically configured by a person of ordinary skill in the art according to actual needs.
  • When determining the composition point set corresponding to the positioning point set, the electronic device may use the positioning point set (including the category center points and the human body key points of the preview image) and the aspect ratio of the portrait bounding box of the preview image as the features of the preview image, and determine, from the portrait composition database, the preset composition template image with the highest similarity to the preview image (measured, for example, by the Minkowski distance). Then, each category center point and each human body key point of the determined preset composition template image are used as composition points to obtain the composition point set.
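  • The template selection step can be sketched as a nearest-neighbour search over feature vectors built from the positioning points and the portrait-box aspect ratio; flattening the points into a fixed-length vector and the Minkowski order p are assumptions.

```python
import numpy as np

def to_feature(points, aspect_ratio):
    """Flatten the positioning points plus the portrait-box aspect ratio into one vector."""
    return np.array([c for p in points for c in p] + [aspect_ratio], dtype=float)

def best_template(preview_points, preview_aspect, templates, p: int = 2):
    """Return the index of the preset composition template whose feature vector has
    the smallest Minkowski distance (order p) to the preview's feature vector.

    templates: list of dicts with 'points' and 'aspect' entries, where 'points' has
               the same number of points as the preview (so the vectors align).
    """
    query = to_feature(preview_points, preview_aspect)
    best_idx, best_dist = None, float("inf")
    for i, tpl in enumerate(templates):
        feat = to_feature(tpl["points"], tpl["aspect"])
        dist = float(np.sum(np.abs(query - feat) ** p) ** (1.0 / p))  # Minkowski distance
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx, best_dist

templates = [{"points": [(0.3, 0.5), (0.6, 0.5)], "aspect": 0.75},
             {"points": [(0.4, 0.4), (0.7, 0.6)], "aspect": 0.60}]
print(best_template([(0.35, 0.5), (0.65, 0.5)], 0.72, templates))  # template 0 is closer
```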
  • the electronic device displays the composition point set and the positioning point set in the preview image.
  • For example, the composition point set includes composition point 1 and composition point 2, and the positioning point set includes positioning point 1 corresponding to composition point 1 and positioning point 2 corresponding to composition point 2.
  • A pointing arrow from positioning point 1 to composition point 1 and a pointing arrow from positioning point 2 to composition point 2 are displayed as the prompt information for adjusting the shooting posture of the electronic device, thereby guiding the user to compose the image.
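  • For the on-screen prompt, the following is a minimal OpenCV sketch that overlays the positioning points, the composition points, and a pointing arrow from each positioning point to its corresponding composition point on the preview frame; the colours, radii, and one-to-one pairing by list order are assumptions.

```python
import numpy as np
import cv2

def draw_composition_prompt(preview, positioning_points, composition_points):
    """Draw each positioning point, its composition point, and an arrow between them."""
    canvas = preview.copy()
    for (px, py), (cx, cy) in zip(positioning_points, composition_points):
        cv2.circle(canvas, (int(px), int(py)), 6, (0, 0, 255), -1)     # positioning point
        cv2.circle(canvas, (int(cx), int(cy)), 6, (0, 255, 0), -1)     # composition point
        cv2.arrowedLine(canvas, (int(px), int(py)), (int(cx), int(cy)),
                        (255, 255, 255), 2, tipLength=0.2)             # adjustment hint
    return canvas

frame = np.zeros((480, 640, 3), dtype=np.uint8)
out = draw_composition_prompt(frame, [(100, 300), (400, 350)], [(213, 160), (427, 160)])
```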
  • composition prompting method provided by the present application further includes:
  • the electronic device combines positioning points and composition points corresponding to the same category into a data group, and combines positioning points and composition points corresponding to the same human body position into a data group, thereby obtaining multiple data sets.
  • For each data group, the electronic device calculates the distance (for example, the Euclidean distance) between the positioning point and the composition point, and then calculates the sum of these distances over the multiple data groups.
  • The electronic device determines whether the calculated distance sum is within the preset threshold; if so, it determines that the positioning point set matches the composition point set, otherwise they do not match.
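  • A minimal sketch of this matching decision: pair each positioning point with its corresponding composition point, sum the Euclidean distances, and compare the sum against a preset threshold; the threshold value is an assumption.

```python
import math

def sets_match(positioning_points, composition_points, threshold: float = 50.0) -> bool:
    """Return True when the summed Euclidean distance over all corresponding
    (positioning point, composition point) pairs is within the preset threshold."""
    total = sum(math.dist(p, c) for p, c in zip(positioning_points, composition_points))
    return total <= threshold

# Example: two pairs with distances 5 and 10 -> sum 15 <= 50, so the sets match.
print(sets_match([(0, 0), (100, 100)], [(3, 4), (100, 110)]))  # True
```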
  • FIG. 9 is a schematic flowchart of a composition prompting method provided by an embodiment of the present application.
  • the flow of the composition prompting method provided by an embodiment of the present application may be as follows:
  • the electronic device obtains a preview image of the shooting scene, and intercepts a human body image from the preview image.
  • the shooting scene is the scene where the camera of the electronic device is aimed after the shooting application is started, and it can be any scene, which can include people and objects.
  • the preview image is obtained by the electronic device using the camera to perform image acquisition of the shooting scene, and it is displayed to the user by default so that the user can preview the imaging effect of the image shooting.
  • the electronic device uses the preview image collected in real time to perform key point detection on the human body in the shooting scene, so as to obtain the key points of the human body.
  • the electronic device first obtains a preview image of the shooting scene.
  • the embodiment of the present application also pre-trains a portrait detection model using a machine learning method.
  • the portrait detection model is configured to take an image as an input and a portrait bounding box corresponding to the image as an output.
  • the image content within the portrait bounding box is the portrait part of the image.
  • the electronic device can call the portrait detection model to perform portrait detection on the preview image, and obtain the portrait bounding box corresponding to the preview image.
  • the human body image can be obtained by cutting out the image content in the bounding box of the portrait.
  • the electronic device calls the pre-trained key point detection model to perform key point detection on the human body image to obtain the human body key points of the human body in the shooting scene.
  • a machine learning method is also used to pre-train a key point detection model.
  • the key point detection model can be set locally on the electronic device or on the server.
  • the configuration of the key point detection model is not specifically limited in this application, and can be selected by a person of ordinary skill in the art according to actual needs.
  • In addition to obtaining the preview image of the shooting scene, the electronic device also calls the pre-trained key point detection model from the local device or the server, and inputs the obtained human body image into the pre-trained key point detection model for key point detection to obtain the key points of the human body in the shooting scene.
  • the key points of the human body are used to locate the head, neck, shoulders, elbows, hands, hips, knees and feet of the human body.
  • the key points of the head can be further subdivided into the eyes, nose, mouth, eyebrows, and contour points of various parts of the head, etc.
  • Referring to Figure 2, the human body image shown on the left side of Figure 2 is input into the pre-trained key point detection model for key point detection, and multiple human body key points are obtained, as shown on the right side of Figure 2.
  • the electronic device divides the preview image into multiple category areas, and determines the category center point of each category area.
  • the electronic device uses the center point of each category and each key point of the human body as positioning points to obtain a set of positioning points.
  • After acquiring the preview image of the shooting scene, the electronic device not only performs key point detection on the preview image, but also divides the preview image into multiple category areas.
  • a machine learning method is also used in this application to pre-train a semantic segmentation model.
  • the semantic segmentation model can be set locally on the electronic device or on the server.
  • the configuration of the semantic segmentation model is not specifically limited in this application, and can be selected by a person of ordinary skill in the art according to actual needs.
  • the semantic segmentation model of ICNet configuration is adopted in this application.
  • the electronic device can call the pre-trained semantic segmentation model from the local device or the server, and input the obtained preview image into the pre-trained semantic segmentation model for semantic segmentation to obtain the object category information to which each area of the preview image belongs. Then, according to the category information, the electronic device divides the preview image into multiple category areas, and determines the category center point of each category area.
  • the category center point of each category area is taken as an anchor point, and each key point of the human body is taken as an anchor point, and these anchor points form an anchor point set.
  • the electronic device determines the preset composition template image with the highest similarity to the preview image according to the set of positioning points.
  • the electronic device uses each category center point and each human body key point in the preset composition template image as a composition point to obtain a set of composition points.
  • this application constructs a portrait composition database in advance, and the portrait composition data includes a plurality of preset composition template images.
  • When determining the composition point set corresponding to the positioning point set, the electronic device may use the positioning point set (including the category center points and the human body key points of the preview image) and the aspect ratio of the portrait bounding box of the preview image as the features of the preview image, and determine, from the portrait composition database, the preset composition template image with the highest similarity to the preview image (measured, for example, by the Minkowski distance). Then, each category center point and each human body key point of the determined preset composition template image are used as composition points to obtain the composition point set.
  • composition points in the composition point set correspond to the positioning points in the positioning point set one-to-one, and when each positioning point matches its corresponding composition point, it is considered that the best composition can be obtained at this time.
  • A positioning point is considered to match its composition point when the distance between the positioning point and the composition point is less than or equal to a preset distance. This application does not specifically limit the value of the preset distance, which can be selected by a person of ordinary skill in the art according to actual needs.
  • the electronic device when the positioning point set does not match the composition point set, the electronic device outputs prompt information for instructing to adjust the shooting posture of the electronic device.
  • A person of ordinary skill in the art can configure, according to actual needs, the constraint conditions under which the positioning point set is considered to match the composition point set; this application does not specifically limit this. For example, it can be configured that the positioning point set matches the composition point set when each positioning point in the positioning point set matches its corresponding composition point in the composition point set; or it can be configured that the positioning point set matches the composition point set when a preset number of positioning points in the positioning point set match their corresponding composition points in the composition point set.
  • The electronic device determines in real time whether the positioning point set corresponding to the shooting scene matches the composition point set. If they do not match, the electronic device outputs prompt information for instructing the user to adjust the shooting posture of the electronic device, so that the positioning point set corresponding to the shooting scene comes to match the composition point set, and the people and objects in the shooting scene thereby obtain a better composition.
  • the electronic device displays the composition point set and the positioning point set in the preview image.
  • For example, the composition point set includes composition point 1 and composition point 2, and the positioning point set includes positioning point 1 corresponding to composition point 1 and positioning point 2 corresponding to composition point 2.
  • A pointing arrow from positioning point 1 to composition point 1 and a pointing arrow from positioning point 2 to composition point 2 are displayed as the prompt information for adjusting the shooting posture of the electronic device, thereby guiding the user to compose the image.
  • the electronic device photographs the shooting scene to obtain a photographed image.
  • That is, when the positioning point set matches the composition point set, the electronic device determines that the people and objects in the shooting scene have a good composition, and therefore photographs the shooting scene to obtain a high-quality captured image of the shooting scene.
  • a composition prompting device is also provided.
  • FIG. 10 is a schematic structural diagram of a composition prompting device provided by an embodiment of the application.
  • the composition prompting device is applied to electronic equipment, and the composition prompting device includes a key point detection module 301, a positioning point determination module 302, a composition point determination module 303, a composition prompt module 304, and an image capturing module 305, as follows:
  • the key point detection module 301 is used to obtain a preview image of the shooting scene, and call a pre-trained key point detection model to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene;
  • the positioning point determination module 302 is configured to divide the preview image into multiple category areas, and obtain a set of positioning points corresponding to the shooting scene according to the category areas and the key points of the human body;
  • the composition point determination module 303 is used to determine the composition point set corresponding to the positioning point set;
  • the composition prompting module 304 is configured to output prompt information for instructing to adjust the shooting posture of the electronic device when the set of positioning points and the set of composition points do not match.
  • the composition prompting device provided by the present application further includes an image capturing module, which is used to capture the shooting scene when the positioning point set matches the composition point set to obtain the captured image.
  • when the pre-trained key point detection model is called to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene, the key point detection module 301 is configured to:
  • when the human body image of the human body is intercepted from the preview image, the key point detection module 301 is configured to:
  • the image content in the bounding box of the portrait is intercepted to obtain the human body image.
  • the key point detection model includes a feature extraction network, a dual branch network, and an output network.
  • the dual branch network includes a location branch network and a relationship branch network.
  • when the key point detection model is called to perform key point detection on the human body image to obtain the human body key points, the key point detection module 301 is configured to:
  • call the location branch network to detect the candidate human body key points according to the image features;
  • call the relationship branch network to detect the connection relationships between the candidate human body key points according to the image features;
  • the output network is called to connect the key points of the candidate body according to the connection relationship, and normalize the key points of the candidate body after the connection according to the bounding box of the portrait to obtain the key points of the human body.
  • the location branch network includes N location segments
  • the relationship branch network includes N relationship segments
  • when the location branch network is called to detect the candidate human body key points according to the image features and the relationship branch network is called to detect the connection relationships between the candidate key points according to the image features, the key point detection module 301 is configured to: call the first location segment to detect the first group of candidate human body key points according to the image features, and call the first relationship segment to detect the connection relationships between the first group of candidate key points according to the image features;
  • fuse the first group of candidate key points, the connection relationships between the first group of candidate key points, and the image features to obtain the first fusion feature; call the second location segment to detect the second group of candidate human body key points according to the first fusion feature, and call the second relationship segment to detect the connection relationships between the second group of candidate key points according to the first fusion feature; and so on;
  • the Nth group of candidate key points is taken as the candidate human body key points detected by the location branch network, and the connection relationships between the Nth group of candidate key points are taken as the connection relationships between the candidate key points detected by the relationship branch network.
  • the first position segment includes a plurality of first convolution modules and a plurality of second convolution modules connected in sequence.
  • the first convolution module includes a convolution unit with a convolution kernel size of 3*3
  • the second convolution module includes a convolution unit with a convolution kernel size of 1*1.
  • the structure of the 2-Nth position segment is the same, and the second position segment includes a plurality of third convolution modules and a plurality of second convolution modules connected in sequence.
  • the third convolution module includes a convolution unit with a convolution kernel size of 7*7.
  • the positioning point determination module 302 when acquiring a set of positioning points corresponding to the shooting scene according to the category area and the key points of the human body, is configured to:
  • the composition point determination module 303 when determining the composition point set corresponding to the anchor point set, is configured to:
  • determining, according to the positioning point set, the preset composition template image with the highest similarity to the preview image;
  • the center point of each category and each key point of the human body in the preset composition template image are used as composition points to obtain a set of composition points.
  • composition prompting device provided by the present application further includes a judgment module for:
  • The composition prompting device provided in this embodiment of the application belongs to the same concept as the composition prompting method in the above embodiments. Any method provided in the composition prompting method embodiments can be run on the composition prompting device; for details of the specific implementation process, refer to the above embodiments, which will not be repeated here.
  • an electronic device is also provided.
  • the electronic device includes a processor 401 and a memory 402.
  • the processor 401 in the embodiment of the present application is a general-purpose processor, such as an ARM architecture processor.
  • a computer program is stored in the memory 402, which may be a high-speed random access memory or a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
  • the memory 402 may also include a memory controller to provide the processor 401 with access to the computer program in the memory 402 to implement the following functions:
  • when the pre-trained key point detection model is called to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene, the processor 401 is configured to execute:
  • when the human body image of the human body is intercepted from the preview image, the processor 401 is configured to execute:
  • the image content in the bounding box of the portrait is intercepted to obtain the human body image.
  • the key point detection model includes a feature extraction network, a dual branch network, and an output network.
  • the dual branch network includes a location branch network and a relationship branch network.
  • the key point detection model is called to perform key point detection on the human body image to obtain the human body key points.
  • the processor 401 is used to execute:
  • call the location branch network to detect candidate body key points according to the image features
  • call the relationship branch network to detect the connection relationships between the candidate body key points according to the image features
  • the output network is called to connect the key points of the candidate body according to the connection relationship, and normalize the key points of the candidate body after the connection according to the bounding box of the portrait to obtain the key points of the human body.
  • the location branch network includes N location segments
  • the relationship branch network includes N relationship segments
  • call the location branch network to detect candidate body key points according to the image features
  • call the relationship branch network to detect the connection relationships between the candidate body key points according to the image features.
  • fuse the first group of candidate body key points, the connection relationships among the first group of candidate body key points, and the image features to obtain the first fusion feature; call the second position segment to detect the second group of candidate body key points according to the first fusion feature, and call the second relationship segment to detect the connection relationships among the second group of candidate body key points according to the first fusion feature;
  • the Nth group of candidate body key points is taken as the candidate body key points detected by the location branch network, and the connection relationships among the Nth group of candidate body key points are taken as the connection relationships between candidate body key points detected by the relationship branch network.
  • the first position segment includes a plurality of first convolution modules and a plurality of second convolution modules connected in sequence.
  • the first convolution module includes a convolution unit with a convolution kernel size of 3*3.
  • the second convolution module includes a convolution unit with a convolution kernel size of 1*1.
  • the 2nd to Nth position segments have the same structure, and the second position segment includes a plurality of third convolution modules and a plurality of second convolution modules connected in sequence.
  • the third convolution module includes a convolution unit with a convolution kernel size of 7*7.
  • when acquiring the positioning point set corresponding to the shooting scene according to the category areas and the human body key points, the processor 401 is configured to execute:
  • when determining the composition point set corresponding to the positioning point set, the processor 401 is configured to execute:
  • determine, according to the positioning point set, the preset composition template image with the highest similarity to the preview image;
  • take the center point of each category area and each human body key point in the preset composition template image as composition points to obtain the composition point set.
  • the processor 401 is further configured to execute:
  • the electronic device provided in this embodiment of the application belongs to the same concept as the composition prompting method in the above embodiments. Any method provided in the composition prompting method embodiments can be run on the electronic device; for details of the specific implementation process, refer to the embodiments of the composition prompting method, which will not be repeated here.
  • the composition prompting method of the embodiments of the present application can be implemented by a computer program controlling the relevant hardware.
  • the computer program may be stored in a computer-readable storage medium, for example in the memory of an electronic device, and executed by a processor in the electronic device; its execution may include the flow of the embodiments of the composition prompting method.
  • the storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, etc.
  • the composition prompting method, model training method, apparatus, storage medium, and electronic device provided by the embodiments of the application are described in detail above. Specific examples are used herein to illustrate the principles and implementation of the application; the description of the above embodiments is only intended to help understand the method and core idea of the application. At the same time, for those skilled in the art, there will be changes in the specific implementation and scope of application according to the idea of the application. In summary, the content of this specification should not be construed as a limitation on the application.


Abstract

Disclosed in the present application are a photographic composition prompting method and apparatus, a storage medium, and an electronic device. A preview image of a photography scene is obtained and key point detection is performed on it to obtain human body key points of a human body in the photography scene; the preview image is divided into a plurality of category areas, a positioning point set corresponding to the photography scene is obtained in combination with the human body key points, and a corresponding composition point set is determined; if the positioning point set does not match the composition point set, prompt information indicating that the photographing attitude of the electronic device should be adjusted is output.

Description

Composition prompting method, apparatus, storage medium, and electronic device
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on February 27, 2020, with application number 202010125410.4 and entitled "Composition prompting method, apparatus, storage medium and electronic device", the entire content of which is incorporated into this application by reference.
Technical Field
This application relates to the technical field of image processing, and in particular to a composition prompting method, apparatus, storage medium, and electronic device.
Background
At present, people's lives are inseparable from electronic devices such as smart phones and tablet computers. The rich functions provided by these electronic devices allow people to entertain themselves and work anytime and anywhere. For example, using the shooting function of an electronic device, a user can take pictures anytime and anywhere. However, capturing high-quality images requires not only strong shooting capability from the electronic device, but also a certain level of professional shooting skill from the user.
Summary
The embodiments of the present application provide a composition prompting method, apparatus, storage medium, and electronic device, which can improve the quality of images captured by the electronic device.
The composition prompting method provided by the embodiments of the present application includes:
acquiring a preview image of a shooting scene, and calling a pre-trained key point detection model to perform key point detection on the preview image to obtain human body key points of a human body in the shooting scene;
dividing the preview image into a plurality of category areas, and acquiring a positioning point set corresponding to the shooting scene according to the category areas and the human body key points;
determining a composition point set corresponding to the positioning point set;
when the positioning point set does not match the composition point set, outputting prompt information for instructing to adjust the shooting posture of the electronic device.
The composition prompting apparatus provided by the embodiments of the present application includes:
a key point detection module, configured to acquire a preview image of a shooting scene, and call a pre-trained key point detection model to perform key point detection on the preview image to obtain human body key points of a human body in the shooting scene;
a positioning point determination module, configured to divide the preview image into a plurality of category areas, and acquire a positioning point set corresponding to the shooting scene according to the category areas and the human body key points;
a composition point determination module, configured to determine a composition point set corresponding to the positioning point set;
a composition prompting module, configured to output prompt information for instructing to adjust the shooting posture of the electronic device when the positioning point set does not match the composition point set.
The storage medium provided by the embodiments of the present application stores a computer program which, when loaded by a processor, executes the composition prompting method provided by the present application.
The electronic device provided by the embodiments of the present application includes a processor and a memory. The memory stores a computer program, and the processor loads the computer program to execute the composition prompting method provided by the present application.
Brief Description of the Drawings
In order to describe the technical solutions in the embodiments of the present application more clearly, the drawings needed for the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained from these drawings without creative work.
FIG. 1 is a schematic flowchart of a composition prompting method provided by an embodiment of the present application.
FIG. 2 is a schematic diagram of human body key points detected in an embodiment of the present application.
FIG. 3 is a schematic diagram of intercepting a human body image in an embodiment of the present application.
FIG. 4 is a schematic structural diagram of a key point detection model provided by an embodiment of the present application.
FIG. 5 is a detailed structural diagram of the key point detection model provided by an embodiment of the present application.
FIG. 6 is a schematic structural diagram of the 1st position segment in an embodiment of the present application.
FIG. 7 is a schematic structural diagram of the 2nd position segment in an embodiment of the present application.
FIG. 8 is an example diagram of outputting prompt information in an embodiment of the present application.
FIG. 9 is another schematic flowchart of the composition prompting method provided by an embodiment of the present application.
FIG. 10 is a schematic structural diagram of a composition prompting apparatus provided by an embodiment of the present application.
FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
Please refer to the drawings, in which the same reference signs represent the same components. The principles of the present application are illustrated by implementation in a suitable computing environment. The following description is based on the illustrated specific embodiments of the present application, and should not be construed as limiting other specific embodiments of the present application that are not described in detail herein.
Artificial Intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level technology and software-level technology. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how computers simulate or realize human learning behaviors in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence. Machine learning and deep learning usually include technologies such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning.
The solutions provided in the embodiments of the present application relate to machine learning technology in artificial intelligence, and are specifically described by the following embodiments.
The embodiments of the present application provide a model training method, a composition prompting method, a composition prompting apparatus, a portrait segmentation apparatus, a storage medium, and an electronic device. The execution subject of the model training method may be the composition prompting apparatus provided in the embodiments of the present application, or an electronic device integrated with the composition prompting apparatus, where the composition prompting apparatus may be implemented in hardware or software; the execution subject of the composition prompting method may be the portrait segmentation apparatus provided in the embodiments of the present application, or an electronic device integrated with the portrait segmentation apparatus, where the portrait segmentation apparatus may be implemented in hardware or software. The electronic device may be a device equipped with a processor (including but not limited to a general-purpose processor, a customized processor, etc.) and having processing capability, such as a smart phone, a tablet computer, a palmtop computer, a notebook computer, or a desktop computer.
This application provides a composition prompting method, including:
acquiring a preview image of a shooting scene, and calling a pre-trained key point detection model to perform key point detection on the preview image to obtain human body key points of a human body in the shooting scene;
dividing the preview image into a plurality of category areas, and acquiring a positioning point set corresponding to the shooting scene according to the category areas and the human body key points;
determining a composition point set corresponding to the positioning point set;
when the positioning point set does not match the composition point set, outputting prompt information for instructing to adjust the shooting posture of the electronic device.
Optionally, in an embodiment, calling the pre-trained key point detection model to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene includes:
intercepting a human body image of the human body from the preview image;
calling the key point detection model to perform key point detection on the human body image to obtain the human body key points.
Optionally, in an embodiment, intercepting the human body image of the human body from the preview image includes:
calling a pre-trained portrait detection model to perform portrait detection on the preview image to obtain a portrait bounding box corresponding to the preview image;
intercepting the image content in the portrait bounding box to obtain the human body image.
Optionally, in an embodiment, the key point detection model includes a feature extraction network, a dual-branch network, and an output network, the dual-branch network includes a location branch network and a relationship branch network, and calling the key point detection model to perform key point detection on the human body image to obtain the human body key points includes:
calling the feature extraction network to extract image features of the human body image;
calling the location branch network to detect candidate body key points according to the image features, and calling the relationship branch network to detect connection relationships between the candidate body key points according to the image features;
calling the output network to connect the candidate body key points according to the connection relationships, and normalizing the connected candidate body key points according to the portrait bounding box to obtain the human body key points.
Optionally, in an embodiment, the location branch network includes N position segments, the relationship branch network includes N relationship segments, and calling the location branch network to detect candidate body key points according to the image features as well as calling the relationship branch network to detect connection relationships between the candidate body key points according to the image features includes:
calling the 1st position segment to detect the 1st group of candidate body key points according to the image features, and calling the 1st relationship segment to detect the connection relationships among the 1st group of candidate body key points according to the image features;
fusing the 1st group of candidate body key points, the connection relationships among the 1st group of candidate body key points, and the image features to obtain a 1st fusion feature; calling the 2nd position segment to detect the 2nd group of candidate body key points according to the 1st fusion feature, and calling the 2nd relationship segment to detect the connection relationships among the 2nd group of candidate body key points according to the 1st fusion feature;
fusing the 2nd group of candidate body key points, the connection relationships among the 2nd group of candidate body key points, and the image features to obtain a 2nd fusion feature, and so on, until the Nth group of candidate body key points detected by the Nth position segment according to the (N-1)th fusion feature and the connection relationships among the Nth group of candidate body key points detected by the Nth relationship segment according to the (N-1)th fusion feature are obtained;
taking the Nth group of candidate body key points as the candidate body key points detected by the location branch network, and taking the connection relationships among the Nth group of candidate body key points as the connection relationships between the candidate body key points detected by the relationship branch network.
Optionally, in an embodiment, the 1st position segment includes a plurality of first convolution modules and a plurality of second convolution modules connected in sequence, the first convolution module includes a convolution unit with a convolution kernel size of 3*3, and the second convolution module includes a convolution unit with a convolution kernel size of 1*1.
Optionally, in an embodiment, the 2nd to Nth position segments have the same structure, and the 2nd position segment includes a plurality of third convolution modules and a plurality of the second convolution modules connected in sequence.
Optionally, in an embodiment, the third convolution module includes a convolution unit with a convolution kernel size of 7*7.
Optionally, in an embodiment, after determining the composition point set corresponding to the positioning point set, the method further includes:
when the positioning point set matches the composition point set, shooting the shooting scene to obtain a captured image.
Optionally, in an embodiment, acquiring the positioning point set corresponding to the shooting scene according to the category areas and the human body key points includes:
determining a category center point of each category area, and taking the category center point of each category area and each human body key point as positioning points to obtain the positioning point set.
Optionally, in an embodiment, determining the composition point set corresponding to the positioning point set includes:
determining, according to the positioning point set, a preset composition template image with the highest similarity to the preview image;
taking each category center point and each human body key point in the preset composition template image as composition points to obtain the composition point set.
Optionally, in an embodiment, the method further includes:
combining a positioning point and a composition point corresponding to the same category into a data group, and combining a positioning point and a composition point corresponding to the same human body position into a data group, to obtain a plurality of data groups;
calculating the distance between the positioning point and the composition point in each data group, and calculating the sum of the distances over the plurality of data groups;
when the distance sum is less than a preset threshold, determining that the positioning point set matches the composition point set.
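As a rough illustration of the matching rule set out in the preceding paragraphs, the following sketch (hypothetical helper names, not code from the application) pairs each positioning point with the composition point in the same data group, sums the pairwise distances, and compares the sum with a preset threshold.

```python
import math

def points_match(positioning_points, composition_points, threshold):
    """Decide whether a positioning point set matches a composition point set.

    positioning_points / composition_points: dicts mapping the same keys
    (a category name or a human-body-part name) to (x, y) coordinates, so that
    corresponding entries form the "data groups" described above.
    threshold: assumed preset distance-sum threshold.
    """
    distance_sum = 0.0
    for key, (px, py) in positioning_points.items():
        cx, cy = composition_points[key]            # point in the same data group
        distance_sum += math.hypot(px - cx, py - cy)
    return distance_sum < threshold

# Toy example with normalized image coordinates (values are illustrative only):
anchors = {"sky": (0.48, 0.20), "person_head": (0.55, 0.40)}
targets = {"sky": (0.50, 0.17), "person_head": (0.50, 0.38)}
print(points_match(anchors, targets, threshold=0.15))   # True for this toy data
```

Whether the threshold is applied to the raw sum or to a per-point average is a design choice left open by the application; the sketch follows the distance-sum wording above.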
Please refer to FIG. 1, which is a schematic flowchart of the composition prompting method provided by an embodiment of the present application. The flow of the composition prompting method provided by the embodiment of the present application may be as follows:
In 101, a preview image of the shooting scene is acquired, and a pre-trained key point detection model is called to perform key point detection on the preview image to obtain human body key points of a human body in the shooting scene.
The shooting scene is the scene at which the camera of the electronic device is aimed after a shooting application is started. It can be any scene, and may include people, objects, and so on.
There is no specific restriction in this application on how the shooting application of the electronic device is started or which shooting application is used. For example, the electronic device may start its system application "Camera" according to a user operation; after the "Camera" is started, the electronic device collects images in real time through its camera, and the scene at which the camera is aimed is the shooting scene. For instance, the electronic device may start the "Camera" according to the user's touch operation on the "Camera" entry, or according to the user's voice command "start the camera", and so on. The composition prompting method provided in this application is applicable to image shooting of portrait scenes, where a portrait scene is a shooting scene in which a human body is present.
It should be noted that the preview image is obtained by the electronic device capturing the shooting scene through the camera, and by default it is displayed to the user so that the user can preview the imaging effect of the shot.
In the embodiments of the present application, the electronic device uses the preview image collected in real time to perform key point detection on the human body in the shooting scene, so as to detect the human body key points of the aforementioned human body.
The electronic device first acquires the preview image of the shooting scene. It should be noted that, in the embodiments of the present application, a key point detection model is pre-trained using machine learning. The key point detection model may be set locally on the electronic device or on a server. In addition, the configuration of the key point detection model is not specifically limited in this application and can be selected by a person of ordinary skill in the art according to actual needs. Correspondingly, in addition to acquiring the preview image of the shooting scene, the electronic device also calls the pre-trained key point detection model locally or from the server, and inputs the acquired preview image into the pre-trained key point detection model for key point detection, obtaining the human body key points of the human body in the shooting scene. The human body key points are used to locate parts of the human body such as the head, neck, shoulders, elbows, hands, hips, knees, and feet, and the head key points can be further subdivided into the eyes, nose tip, mouth, eyebrows, and the contour points of the various parts of the head. For example, referring to FIG. 2, the human body image shown on the left of FIG. 2 is input into the pre-trained key point detection model for key point detection, and a plurality of human body key points are obtained, as shown on the right of FIG. 2.
In 102, the preview image is divided into a plurality of category areas, and a positioning point set corresponding to the shooting scene is acquired according to the category areas and the human body key points.
In the embodiments of the present application, after acquiring the preview image of the shooting scene, the electronic device not only performs key point detection on the preview image but also divides the preview image into a plurality of category areas.
Exemplarily, a semantic segmentation model is also pre-trained in this application using machine learning. The semantic segmentation model may be set locally on the electronic device or on a server. In addition, the configuration of the semantic segmentation model is not specifically limited in this application and can be selected by a person of ordinary skill in the art according to actual needs; for example, a semantic segmentation model with an ICNet architecture is adopted in this application.
When dividing the preview image into a plurality of category areas, the electronic device can call the pre-trained semantic segmentation model locally or from the server, and input the acquired preview image into the pre-trained semantic segmentation model for semantic segmentation to obtain the object category information to which each region of the preview image belongs. Then, according to this category information, the electronic device divides the preview image into a plurality of category areas.
Then, according to the divided category areas and the key points, the electronic device determines a plurality of positioning points according to a preset positioning point decision strategy, and the determined positioning points constitute a positioning point set. The positioning points are used to represent the positions of the human body and of other objects in the shooting scene.
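The positioning point decision strategy itself is left open here; the sketch below illustrates one plausible choice consistent with the embodiment described later (category center points plus human body key points). The function and variable names are assumptions for illustration only.

```python
import numpy as np

def build_positioning_points(label_map, body_keypoints):
    """One possible positioning point decision strategy (a sketch, not the
    strategy mandated by the application): use the centroid of every category
    area in a semantic label map plus every detected human body key point.

    label_map: HxW integer array in which each value is a category id
               (e.g. produced by an ICNet-style segmentation model).
    body_keypoints: dict of body-part name -> (x, y) pixel coordinates.
    """
    positioning_points = {}
    for category in np.unique(label_map):
        ys, xs = np.nonzero(label_map == category)
        positioning_points[f"category_{category}"] = (float(xs.mean()), float(ys.mean()))
    for part, (x, y) in body_keypoints.items():
        positioning_points[f"body_{part}"] = (float(x), float(y))
    return positioning_points

# Toy example: a 4x4 label map with two category areas and one key point.
toy_map = np.array([[0, 0, 1, 1]] * 4)
print(build_positioning_points(toy_map, {"head": (2, 1)}))
```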
In 103, a composition point set corresponding to the positioning point set is determined.
The electronic device further determines, according to the acquired positioning point set and a preset composition point decision strategy, a composition point set corresponding to the positioning point set. The composition points in the composition point set correspond one-to-one to the positioning points in the positioning point set; when every positioning point matches its corresponding composition point, it is considered that the best composition can be obtained at that moment. A positioning point matching a composition point means that the distance between the positioning point and the composition point is less than or equal to a preset distance; this application does not specifically limit the value of the preset distance, which can be set by a person of ordinary skill in the art according to actual needs.
In 104, when the positioning point set does not match the composition point set, prompt information for instructing to adjust the shooting posture of the electronic device is output.
Based on the above definition of the matching of a positioning point and a composition point, a person of ordinary skill in the art can configure, according to actual needs, the constraint under which the positioning point set is considered to match the composition point set, and this application imposes no specific restriction. For example, it can be configured that the positioning point set matches the composition point set when every positioning point in the positioning point set matches its corresponding composition point in the composition point set; alternatively, it can be configured that the positioning point set matches the composition point set when a preset number of positioning points in the positioning point set match their corresponding composition points in the composition point set.
Correspondingly, the electronic device determines in real time whether the positioning point set corresponding to the shooting scene matches the composition point set. If they do not match, prompt information for instructing to adjust the shooting posture of the electronic device is output, so that the positioning point set corresponding to the shooting scene comes to match the composition point set, and the people and objects in the shooting scene obtain a better composition.
As can be seen from the above, in this application a preview image of the shooting scene is acquired and a pre-trained key point detection model is called to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene; the preview image is divided into a plurality of category areas, a positioning point set corresponding to the shooting scene is acquired according to the category areas and the human body key points, and the composition point set corresponding to the positioning point set is determined; when the positioning point set does not match the composition point set, prompt information for instructing to adjust the shooting posture of the electronic device is output; when the positioning point set matches the composition point set, the shooting scene is shot to obtain a captured image. Compared with the related art, this application can guide the user in composition, thereby improving the quality of the images captured by the electronic device.
In an embodiment, after determining the composition point set corresponding to the positioning point set, the method further includes:
when the positioning point set matches the composition point set, shooting the shooting scene to obtain a captured image.
When the positioning point set matches the composition point set, the electronic device determines that the people and objects in the shooting scene have a good composition and shoots the shooting scene, thereby obtaining a high-quality captured image of the shooting scene.
In an embodiment, calling the pre-trained key point detection model to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene includes:
(1) intercepting a human body image of the human body from the preview image;
(2) calling the key point detection model to perform key point detection on the human body image to obtain the human body key points.
To improve the efficiency of key point detection on the preview image, this application does not perform key point detection on the complete preview image, but on the part of the preview image in which the human body is present.
After acquiring the preview image of the shooting scene, the electronic device does not directly call the key point detection model to perform key point detection on the preview image; instead, it first intercepts the human body image of the human body from the preview image, and then calls the key point detection model to perform key point detection on the intercepted human body image, thereby obtaining the human body key points of the human body in the shooting scene.
It should be noted that this application does not restrict how the human body image is intercepted from the preview image; a person of ordinary skill in the art can adopt a suitable interception method according to actual needs.
In an embodiment, intercepting the human body image of the human body from the preview image includes:
(1) calling a pre-trained portrait detection model to perform portrait detection on the preview image to obtain a portrait bounding box corresponding to the preview image;
(2) intercepting the image content in the portrait bounding box to obtain the human body image.
It should be noted that the embodiments of the present application also pre-train a portrait detection model using machine learning. The portrait detection model is configured to take an image as input and to output a portrait bounding box corresponding to the image, and the image content within the portrait bounding box is the portrait part of the image. The portrait detection model may be set locally on the electronic device or on a server. In addition, the configuration of the portrait detection model is not specifically limited in this application and can be selected by a person of ordinary skill in the art according to actual needs; for example, a Yolo model or an SSD model is used as the base model, and the portrait detection model is obtained through machine learning training.
Correspondingly, when intercepting the human body image from the preview image, the electronic device can call the pre-trained portrait detection model locally or from the server, input the preview image into the pre-trained portrait detection model for portrait detection, and obtain the portrait bounding box corresponding to the preview image. Then, the human body image is obtained by intercepting the image content in the portrait bounding box.
For example, referring to FIG. 3, the preview image contains other objects in addition to the portrait. The preview image is input into the portrait detection model for portrait detection, and the portrait bounding box corresponding to the preview image is obtained; as shown in FIG. 3, the portrait bounding box contains only the portrait part of the preview image. Then, the human body image in the preview image is obtained by intercepting the image content within the portrait bounding box from the preview image.
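A minimal sketch of this interception step, assuming the portrait detection model returns the bounding box as (x, y, w, h) pixel coordinates (the output format is not fixed by the application):

```python
import numpy as np

def crop_portrait(preview_image, bbox):
    """Crop the portrait bounding box out of the preview image.

    preview_image: HxWxC array, e.g. a decoded camera frame.
    bbox: (x, y, w, h) in pixels; this format is an assumption.
    """
    x, y, w, h = bbox
    height, width = preview_image.shape[:2]
    x0, y0 = max(0, int(x)), max(0, int(y))
    x1, y1 = min(width, int(x + w)), min(height, int(y + h))
    return preview_image[y0:y1, x0:x1]

frame = np.zeros((720, 1280, 3), dtype=np.uint8)      # stand-in preview image
human_image = crop_portrait(frame, (400, 100, 300, 600))
print(human_image.shape)                               # (600, 300, 3)
```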
In an embodiment, the key point detection model includes a feature extraction network, a dual-branch network, and an output network, and the dual-branch network includes a location branch network and a relationship branch network. Calling the key point detection model to perform key point detection on the human body image to obtain the human body key points includes:
(1) calling the feature extraction network to extract image features of the human body image;
(2) calling the location branch network to detect candidate body key points according to the image features, and calling the relationship branch network to detect connection relationships between the candidate body key points according to the image features;
(3) calling the output network to connect the candidate body key points according to the connection relationships, and normalizing the connected candidate body key points according to the portrait bounding box to obtain the human body key points.
Referring to FIG. 4, the key point detection model consists of three parts: a feature extraction network, a dual-branch network, and an output network.
The feature extraction network can be any known feature extraction network, such as VGG, MobileNet, or ResNet; its purpose is to extract features from the input image to serve as the input of the subsequent branch networks. Correspondingly, the electronic device first calls the feature extraction network to perform feature extraction on the intercepted human body image, obtaining the image features of the human body image.
In this application, the key point detection task is split and a dual-branch network is used to realize key point detection. One branch network tends to detect the human body key points that may exist in the image and is denoted the location branch network; the other branch network tends to detect the connection relationships that may exist between human body key points and is denoted the relationship branch network. Correspondingly, after the image features of the human body image are extracted by the feature extraction network, the electronic device further calls the location branch network to detect, based on the aforementioned image features, the human body key points that may exist, which are denoted candidate body key points. Exemplarily, the output of the location branch network is a heatmap, a three-dimensional matrix of size height*width*keypoints, where height and width respectively represent the height and the width and keypoints represents the number of candidate body key points. That is, each candidate body key point corresponds to a height*width matrix, and the value at each position of the matrix represents the possibility that the candidate body key point is located at that position; the larger the value, the more likely the candidate body key point is at that position. For example, the position of the maximum value in each region of the heatmap can be taken to obtain the corresponding candidate body key point: the heatmap can be max-pooled, the heatmap before pooling is compared with the heatmap after pooling, and the positions where the values are equal are taken as candidate body key points.
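The heatmap peak picking just described can be sketched as follows; the pooling kernel size, the score threshold, and the (keypoints, height, width) tensor layout are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def heatmap_peaks(heatmap, score_threshold=0.1):
    """Pick candidate body key points from a heatmap as described above:
    max-pool the map, keep positions whose value is unchanged by pooling
    (local maxima) and above a score threshold (threshold value assumed).

    heatmap: tensor of shape (keypoints, height, width).
    Returns, per key point type, a list of (x, y, score) tuples.
    """
    pooled = F.max_pool2d(heatmap.unsqueeze(0), kernel_size=3, stride=1, padding=1).squeeze(0)
    is_peak = (heatmap == pooled) & (heatmap > score_threshold)
    peaks = []
    for k in range(heatmap.shape[0]):
        ys, xs = torch.nonzero(is_peak[k], as_tuple=True)
        scores = heatmap[k, ys, xs]
        peaks.append([(int(x), int(y), float(s)) for x, y, s in zip(xs, ys, scores)])
    return peaks

# Toy heatmap with a single bright pixel for one key point type.
toy = torch.zeros(1, 8, 8)
toy[0, 3, 5] = 0.9
print(heatmap_peaks(toy))   # [[(5, 3, score ≈ 0.9)]]
```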
In addition, the electronic device also calls the relationship branch network to detect, based on the aforementioned image features, the connection relationships between the candidate body key points. Exemplarily, the output of the relationship branch network is a pafmap, a three-dimensional matrix of size height*width*(2*limbs), where limbs represents the number of limbs (a limb here is not a limb in the narrow sense, but the region between two associated key points; for example, the connection between the left eye and the right eye is regarded as one limb, and the connection between the neck and the left shoulder is regarded as one limb). Each limb corresponds to a height*width*2 matrix, which can be regarded as a two-channel heat map; each position of this heat map has two values, x and y, and the vector (x, y) represents the limb direction at that position (when both x and y are 0, there is no limb at that position), which characterizes the connection relationships between the candidate body key points.
After the candidate body key points and the connection relationships between them are obtained, the candidate body key points can be connected according to the connection relationships, so as to obtain a complete human body. Each time, the pafmap corresponding to one limb is taken, and the candidate body key points at the two ends of this limb are connected. The confidence that two key points d_j1 and d_j2 (j1 and j2 denote candidate body key point types, such as eye, nose tip, or eyebrow) come from the same human body is:
E = ∫₀¹ L_c(P(u)) · (d_j2 − d_j1) / ‖d_j2 − d_j1‖₂ du;
where P(u) is the interpolated position between the two key points, namely:
P(u) = (1 − u)·d_j1 + u·d_j2;
In actual use, u is generally sampled at uniform intervals on [0, 1] to approximate the integral. L_c is the value of the pafmap at P(u).
According to the above process, the possible connections between two adjacent candidate key points, that is, the potential limbs, can be obtained, thereby directly completing the connection of the human body.
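A sketch of the sampled line integral described above, for a single limb type; the number of samples and the pafmap layout are assumptions.

```python
import numpy as np

def connection_score(pafmap_xy, point_a, point_b, num_samples=10):
    """Approximate the confidence that two candidate body key points belong to
    the same person by sampling the limb direction field along the segment
    between them, as described above.

    pafmap_xy: array of shape (2, H, W) holding the (x, y) limb direction
               field for one limb type.
    point_a, point_b: (x, y) pixel coordinates of the two candidate key points.
    num_samples: number of evenly spaced samples of u on [0, 1] (assumed).
    """
    a = np.asarray(point_a, dtype=np.float32)
    b = np.asarray(point_b, dtype=np.float32)
    limb = b - a
    norm = np.linalg.norm(limb)
    if norm < 1e-6:
        return 0.0
    unit = limb / norm
    score = 0.0
    for u in np.linspace(0.0, 1.0, num_samples):
        x, y = (1.0 - u) * a + u * b                      # P(u), the interpolated position
        sample = pafmap_xy[:, int(round(y)), int(round(x))]
        score += float(np.dot(sample, unit))              # projection onto the limb direction
    return score / num_samples

# Toy field pointing purely in +x between two points on the same row.
paf = np.zeros((2, 10, 10), dtype=np.float32)
paf[0] = 1.0                                              # x component everywhere
print(connection_score(paf, (1, 4), (8, 4)))              # 1.0 for this toy field
```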
After completing the connection of the candidate body key points, the electronic device also normalizes the connected candidate body key points according to the portrait bounding box to obtain the human body key points. The normalization is performed according to the following formulas:
x' = x / w;
y' = y / h;
where x and y respectively represent the abscissa and the ordinate of a candidate body key point, x' and y' respectively represent the abscissa and the ordinate of the human body key point obtained after normalizing that candidate body key point, w represents the width of the portrait bounding box, and h represents the height of the portrait bounding box.
In an embodiment, the location branch network includes N position segments and the relationship branch network includes N relationship segments, and calling the location branch network to detect candidate body key points according to the image features as well as calling the relationship branch network to detect connection relationships between the candidate body key points according to the image features includes:
(1) calling the 1st position segment to detect the 1st group of candidate body key points according to the image features, and calling the 1st relationship segment to detect the connection relationships among the 1st group of candidate body key points according to the image features;
(2) fusing the 1st group of candidate body key points, the connection relationships among the 1st group of candidate body key points, and the image features to obtain a 1st fusion feature; calling the 2nd position segment to detect the 2nd group of candidate body key points according to the 1st fusion feature, and calling the 2nd relationship segment to detect the connection relationships among the 2nd group of candidate body key points according to the 1st fusion feature;
(3) fusing the 2nd group of candidate body key points, the connection relationships among the 2nd group of candidate body key points, and the image features to obtain a 2nd fusion feature, and so on, until the Nth group of candidate body key points detected by the Nth position segment according to the (N-1)th fusion feature and the connection relationships among the Nth group of candidate body key points detected by the Nth relationship segment according to the (N-1)th fusion feature are obtained;
(4) taking the Nth group of candidate body key points as the candidate body key points detected by the location branch network, and taking the connection relationships among the Nth group of candidate body key points as the connection relationships between the candidate body key points detected by the relationship branch network.
It should be noted that, in the embodiments of the present application, the location branch network includes N position segments (N is a positive integer greater than 2, and its value can be chosen by a person of ordinary skill in the art according to actual needs), and the relationship branch network includes N relationship segments. For example, referring to FIG. 5, the location branch network includes N position segments, namely position segment 1 to position segment N; correspondingly, the relationship branch network includes N relationship segments, namely relationship segment 1 to relationship segment N. Position segment 1 and relationship segment 1 form network segment 1, position segment 2 and relationship segment 2 form network segment 2, and so on, and position segment N and relationship segment N form network segment N. In other words, the dual-branch network constructed in the embodiments of the present application can be regarded as being composed of a plurality of network segments, such as network segment 1 to network segment N shown in FIG. 5, each of which includes a corresponding position segment and relationship segment.
The following description continues with the network structure shown in FIG. 5 as an example.
In the embodiments of the present application, the electronic device inputs the image features extracted by the feature extraction network into position segment 1 of network segment 1 for detection, obtaining the 1st group of candidate body key points output by position segment 1, and inputs the extracted image features into relationship segment 1 of network segment 1 for detection, obtaining the connection relationships among the 1st group of candidate body key points output by relationship segment 1. Then, the 1st group of candidate body key points (coordinates), the connection relationships among the 1st group of candidate body key points, and the image features are fused to obtain the 1st fusion feature, which serves as the output of network segment 1. Next, the 1st fusion feature output by network segment 1 is input into position segment 2 of network segment 2 for detection, obtaining the 2nd group of candidate body key points output by position segment 2, and the fusion feature output by network segment 1 is input into relationship segment 2 of network segment 2 for detection, obtaining the connection relationships among the 2nd group of candidate body key points output by relationship segment 2. Then, the 2nd group of candidate body key points (coordinates), the connection relationships among the 2nd group of candidate body key points, and the image features are fused to obtain the 2nd fusion feature, which serves as the output of network segment 2. This continues in the same way until the Nth group of candidate body key points detected by position segment N of network segment N according to the (N-1)th fusion feature output by network segment N-1 and the connection relationships among the Nth group of candidate body key points detected by relationship segment N of network segment N according to the (N-1)th fusion feature output by network segment N-1 are obtained. Finally, the Nth group of candidate body key points output by position segment N is taken as the candidate body key points finally output by the location branch network, and the connection relationships among the Nth group of candidate body key points output by relationship segment N are taken as the connection relationships between the candidate body key points finally output by the relationship branch network.
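The stage-wise refinement described above can be condensed into the following sketch, in which single convolutions stand in for the position and relationship segments; the channel sizes and the numbers of key point and limb types are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class DualBranchCascade(nn.Module):
    """Minimal sketch of the N-segment dual-branch refinement described above."""

    def __init__(self, feat_ch=64, keypoints=18, limbs=19, num_stages=3):
        super().__init__()
        self.position_segments = nn.ModuleList()
        self.relation_segments = nn.ModuleList()
        for stage in range(num_stages):
            # Segment 1 consumes the backbone features; later segments consume the
            # previous segment's heatmaps and pafmaps fused with those features.
            in_ch = feat_ch if stage == 0 else feat_ch + keypoints + 2 * limbs
            self.position_segments.append(nn.Conv2d(in_ch, keypoints, 3, padding=1))
            self.relation_segments.append(nn.Conv2d(in_ch, 2 * limbs, 3, padding=1))

    def forward(self, image_features):
        x = image_features
        for pos_seg, rel_seg in zip(self.position_segments, self.relation_segments):
            heatmap = pos_seg(x)                  # candidate body key point positions
            pafmap = rel_seg(x)                   # connection relationships
            x = torch.cat([heatmap, pafmap, image_features], dim=1)   # fusion feature
        return heatmap, pafmap                    # outputs of the Nth segments

features = torch.randn(1, 64, 46, 46)             # stand-in backbone image features
heatmap, pafmap = DualBranchCascade()(features)
print(heatmap.shape, pafmap.shape)                 # torch.Size([1, 18, 46, 46]) torch.Size([1, 38, 46, 46])
```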
在一实施例中,第1个位置分段包括依次连接的多个第一卷积模块和多个第二卷积模块。In an embodiment, the first position segment includes a plurality of first convolution modules and a plurality of second convolution modules connected in sequence.
It should be noted that, in the embodiment of the present application, the first relationship segment and the first position segment have the same structure, but the two do not share parameters; the following takes the first position segment as an example for description.

In the embodiment of the present application, the first position segment includes a plurality of first convolution modules and a plurality of second convolution modules connected in sequence, where each first convolution module includes a convolution unit with a convolution kernel size of 3*3 and each second convolution module includes a convolution unit with a convolution kernel size of 1*1.

It should be noted that the embodiment of the present application does not specifically limit the number of first convolution modules and second convolution modules constituting the first position segment, which can be configured by a person of ordinary skill in the art according to actual needs; for example, in the embodiment of the present application, three first convolution modules and two second convolution modules are used, as shown in FIG. 6.
In an embodiment, the second to Nth position segments have the same structure, and the second position segment includes a plurality of third convolution modules and a plurality of second convolution modules connected in sequence.

It should be noted that, in the embodiment of the present application, apart from the first relationship segment and the first position segment, all relationship segments and all position segments have the same structure, but the two do not share parameters; the following takes the second position segment as an example for description.

In the embodiment of the present application, the second position segment includes a plurality of third convolution modules and a plurality of second convolution modules connected in sequence, where each third convolution module includes a convolution unit with a convolution kernel size of 7*7.

The embodiment of the present application does not specifically limit the number of third convolution modules and second convolution modules constituting the second position segment, which can be configured by a person of ordinary skill in the art according to actual needs; for example, in the embodiment of the present application, five third convolution modules and two second convolution modules are used, as shown in FIG. 7.
应当说明的是,本申请采用7*7卷积单元的目的在于获得更大的感受野,从而获取更多信息。在其他实施例中,在算力有限的情况下,可以将每个7×7的卷积单元替换为3个3×3的卷积单元,以此来减少处理的数据量。It should be noted that the purpose of adopting the 7*7 convolution unit in this application is to obtain a larger receptive field, so as to obtain more information. In other embodiments, in the case of limited computing power, each 7×7 convolution unit can be replaced with three 3×3 convolution units to reduce the amount of processed data.
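As a rough illustration of this trade-off (an assumption for illustration, not a structure mandated by the present application), a stack of three 3×3 convolutions covers the same 7×7 receptive field as a single 7×7 convolution while using roughly 45% fewer weights:

import torch.nn as nn

def large_kernel_block(ch):
    # A single 7x7 convolution: receptive field 7, about 49*ch*ch weights.
    return nn.Conv2d(ch, ch, kernel_size=7, padding=3)

def stacked_small_kernel_block(ch):
    # Three 3x3 convolutions: receptive field also 7 (3 -> 5 -> 7),
    # but only about 27*ch*ch weights in total.
    return nn.Sequential(
        nn.Conv2d(ch, ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(ch, ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(ch, ch, kernel_size=3, padding=1),
    )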
在一实施例中,根据类别区域以及人体关键点获取对应拍摄场景的定位点集合,包括:In an embodiment, acquiring a set of positioning points corresponding to the shooting scene according to the category area and the key points of the human body includes:
确定每一类别区域的类别中心点,将每一类别区域的类别中心点以及每一人体关键点作为定位点,得到定位点集合。Determine the category center point of each category area, and use the category center point of each category area and each key point of the human body as positioning points to obtain a set of positioning points.
In the embodiment of the present application, for each category area obtained by dividing the preview image, the electronic device determines the category center point of that category area, takes the category center point of each category area as a positioning point, and takes each human body key point as a positioning point; these positioning points form the positioning point set.
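A minimal sketch of this step is given below. It assumes the category areas are available as boolean masks and takes the pixel centroid of each mask as the category center point; the function name, dictionary layout and centroid definition are assumptions for illustration, since the present application does not prescribe a particular computation.

import numpy as np

def anchor_point_set(category_masks, body_keypoints):
    """Build the positioning (anchor) point set: the centroid of every
    category area of the segmented preview image plus every detected
    human body key point (x, y) coordinate.

    category_masks: dict mapping a category name to a boolean HxW mask.
    body_keypoints: list of (x, y) tuples from the key point detector."""
    anchors = {}
    for name, mask in category_masks.items():
        ys, xs = np.nonzero(mask)
        if len(xs) == 0:
            continue                                          # category absent
        anchors[name] = (float(xs.mean()), float(ys.mean()))  # region centroid
    for i, (x, y) in enumerate(body_keypoints):
        anchors[f"keypoint_{i}"] = (float(x), float(y))
    return anchors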
在一实施例中,确定对应定位点集合的构图点集合,包括:In an embodiment, determining the set of composition points corresponding to the set of anchor points includes:
(1)根据定位点集合,确定与预览图像相似度最高的预设构图模板图像;(1) According to the set of anchor points, determine the preset composition template image with the highest similarity to the preview image;
(2)将预设构图模板图像中每一类别中心点以及每一人体关键点作为构图点,得到构图点集合。(2) Taking the center point of each category and each key point of the human body in the preset composition template image as the composition point to obtain the composition point set.
It should be noted that this application constructs a portrait composition database in advance, and the portrait composition database includes a plurality of preset composition template images.
示例性的,按照如下方式构建人像构图数据库:Exemplarily, the portrait composition database is constructed as follows:
a. Collect images with excellent composition; the number of collected images should be as large as possible.
b. For each collected image, call the portrait detection model to detect its portrait bounding box, intercept the image content in the portrait bounding box to obtain a human body image, and then call the key point detection model according to the portrait bounding box to detect the human body key points in the human body image (for details, refer to the key point detection process for the preview image described above, which is not repeated here).
c,对于采集得到的每一图像,将其划分为多个类别区域,确定出每一类别区域的类别中心。c. For each image collected, divide it into multiple category areas, and determine the category center of each category area.
d. Take each collected image as a sample, use the previously obtained category centers, human body key points (coordinates) and the aspect ratio of the portrait bounding box as the features of the sample, and perform Q-type clustering so that each category contains multiple samples. The Minkowski distance is used to measure the similarity between samples, and the AGNES hierarchical clustering algorithm is used for clustering, where the number of clustering categories can be determined according to the scenes of the collected images and the distribution of the human body postures in them; for example, more categories can be set when the scenes and postures vary widely and fewer categories when they are relatively uniform, which can be configured by a person of ordinary skill in the art according to actual needs (a rough sketch of this step is given after this list).
e,将位于每一类别中心的图像作为一个预设构图模板图像。e. Use the image at the center of each category as a preset composition template image.
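The following is a rough sketch of step d above, assuming SciPy is available; the feature layout, average linkage and the choice p=2 for the Minkowski distance are illustrative assumptions rather than values fixed by the present application. It also folds in step e by returning, for each cluster, the sample closest to the cluster mean as the template.

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def build_template_indices(sample_features, num_clusters, p=2):
    """Cluster the collected images into composition categories and return,
    for each cluster, the index of the sample closest to the cluster mean,
    which is used as the preset composition template image.

    sample_features: (num_images, feature_dim) array; each row concatenates
    the category centers, human body key point coordinates and the
    portrait bounding-box aspect ratio of one image."""
    # Pairwise Minkowski distances between samples (p=2 gives Euclidean).
    distances = pdist(sample_features, metric="minkowski", p=p)
    # Agglomerative (AGNES-style) hierarchical clustering, average linkage.
    tree = linkage(distances, method="average")
    labels = fcluster(tree, t=num_clusters, criterion="maxclust")

    template_indices = []
    for cluster_id in np.unique(labels):
        members = np.where(labels == cluster_id)[0]
        center = sample_features[members].mean(axis=0)
        dist_to_center = np.linalg.norm(sample_features[members] - center, axis=1)
        template_indices.append(int(members[dist_to_center.argmin()]))
    return template_indices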
In the embodiment of the present application, when determining the composition point set corresponding to the positioning point set, the electronic device may use the positioning point set (including the category center points of the preview image and the human body key points) together with the aspect ratio of the portrait bounding box of the preview image as the features of the preview image, and determine from the portrait composition database the preset composition template image with the highest similarity to the preview image (measured by the Minkowski distance). Then, each category center point and each human body key point of the determined preset composition template image are taken as composition points to obtain the composition point set.
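A correspondingly simple sketch of the template selection is given below, under the same illustrative assumptions (feature vectors of equal length per image; the Minkowski order p is a choice left to the implementer):

import numpy as np

def select_template(preview_feature, template_features, p=2):
    """Return the index of the preset composition template whose feature
    vector (category centers, human body key points, portrait-box aspect
    ratio) has the smallest Minkowski distance to the preview image's
    feature vector."""
    diffs = np.abs(np.asarray(template_features) - np.asarray(preview_feature))
    distances = (diffs ** p).sum(axis=1) ** (1.0 / p)
    return int(distances.argmin())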
For example, referring to FIG. 8, the electronic device may display the composition point set and the positioning point set in the preview image. As shown in FIG. 8, the composition point set includes composition point 1 and composition point 2, and the positioning point set includes positioning point 1 corresponding to composition point 1 and positioning point 2 corresponding to composition point 2. The arrow pointing from positioning point 1 to composition point 1 and the arrow pointing from positioning point 2 to composition point 2 are combined as prompt information for instructing adjustment of the shooting posture of the electronic device, thereby guiding the user on composition.
在一实施例中,本申请提供的构图提示方法还包括:In an embodiment, the composition prompting method provided by the present application further includes:
(1)将对应同一类别的定位点和构图点组合为数据组,以及将对应同一人体位置的定位点和构图点组合为数据组,得到多个数据组;(1) Combine positioning points and composition points corresponding to the same category into a data group, and combine positioning points and composition points corresponding to the same human body position into a data group to obtain multiple data groups;
(2) Calculate the distance between the positioning point and the composition point in each data group, and calculate the sum of the distances over the multiple data groups;

(3) When the distance sum is less than a preset threshold, determine that the positioning point set matches the composition point set.
本申请实施例中,电子设备将对应同一类别的定位点和构图点组合为数据组,以及将对应同一人体位置的定位点和构图点组合为数据组,由此得到多个数据组。In the embodiment of the present application, the electronic device combines positioning points and composition points corresponding to the same category into a data group, and combines positioning points and composition points corresponding to the same human body position into a data group, thereby obtaining multiple data sets.
For each data group, the electronic device calculates the distance between the positioning point and the composition point in the group (the Euclidean distance), and then calculates the sum of the distances over the multiple data groups.

Then, the electronic device determines whether the calculated distance sum is less than the preset threshold; if so, it determines that the positioning point set matches the composition point set, and otherwise that they do not match.
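A minimal sketch of this matching test, assuming the positioning points and composition points are stored in dictionaries keyed by category or body-part name (the dictionary layout, threshold and function name are assumptions for illustration):

import math

def anchors_match(anchor_points, composition_points, threshold):
    """Pair each positioning point with the composition point of the same
    category or body part, sum the Euclidean distances of all pairs, and
    declare a match when the total falls below the preset threshold."""
    total = 0.0
    for name, (ax, ay) in anchor_points.items():
        cx, cy = composition_points[name]       # same category / body part
        total += math.hypot(ax - cx, ay - cy)   # per-pair Euclidean distance
    return total < threshold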
请参照图9,图9为本申请实施例提供的构图提示方法的流程示意图,本申请实施例提供的构图提示方法的流程可以如下:Please refer to FIG. 9. FIG. 9 is a schematic flowchart of a composition prompting method provided by an embodiment of the present application. The flow of the composition prompting method provided by an embodiment of the present application may be as follows:
在201中,电子设备获取拍摄场景的预览图像,并从预览图像中截取人体图像。In 201, the electronic device obtains a preview image of the shooting scene, and intercepts a human body image from the preview image.
The shooting scene is the scene at which the camera of the electronic device is aimed after a shooting application is started; it can be any scene and may include people, objects and the like. The preview image is obtained by the electronic device capturing images of the shooting scene through the camera and is displayed to the user by default, so that the user can preview the imaging effect before shooting.
本申请实施例中,电子设备利用实时采集的预览图像来对拍摄场景中的人体进行关键点检测,以检测到前述人体的人体关键点。In the embodiment of the present application, the electronic device uses the preview image collected in real time to detect the key points of the human body in the shooting scene, so as to detect the key points of the human body.
The electronic device first obtains the preview image of the shooting scene. It should be noted that the embodiment of the present application also pre-trains a portrait detection model using a machine learning method; the portrait detection model is configured to take an image as input and output the portrait bounding box corresponding to the image, and the image content within the portrait bounding box is the portrait part of the image. Correspondingly, after obtaining the preview image, the electronic device can call the portrait detection model to perform portrait detection on the preview image and obtain the portrait bounding box corresponding to the preview image. The human body image is then obtained by intercepting the image content within the portrait bounding box.
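A minimal sketch of the cropping step, assuming the preview image is an H×W×C array and the detector returns the bounding box as (x1, y1, x2, y2) pixel coordinates (both assumptions for illustration):

def crop_portrait(preview_image, bbox):
    """Cut the portrait bounding box out of the preview image to obtain the
    human body image that is fed to the key point detection model."""
    x1, y1, x2, y2 = [int(round(v)) for v in bbox]
    return preview_image[y1:y2, x1:x2]  # array slicing keeps channels intact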
在202中,电子设备调用预训练的关键点检测模型对人体图像进行关键点检测,得到拍摄场景中人体的人体关键点。In 202, the electronic device calls the pre-trained key point detection model to perform key point detection on the human body image to obtain the human body key points of the human body in the shooting scene.
It should be noted that the embodiment of the present application also pre-trains a key point detection model using a machine learning method. The key point detection model may be deployed locally on the electronic device or on a server. In addition, the present application does not specifically limit the architecture of the key point detection model, which can be selected by a person of ordinary skill in the art according to actual needs. Correspondingly, in addition to obtaining the preview image of the shooting scene, the electronic device calls the pre-trained key point detection model locally or from the server, and inputs the obtained preview image into the pre-trained key point detection model for key point detection to obtain the human body key points of the human body in the shooting scene. The human body key points are used to locate parts of the human body such as the head, neck, shoulders, elbows, hands, hips, knees and feet, and the head key points can be further subdivided into the eyes, nose tip, mouth, eyebrows and the contour points of the head parts. For example, referring to FIG. 2, the human body image shown on the left side of FIG. 2 is input into the pre-trained key point detection model for key point detection, and multiple human body key points are obtained, as shown on the right side of FIG. 2.
在203中,电子设备将预览图像划分为多个类别区域,并确定出每一类别区域的类别中心点。In 203, the electronic device divides the preview image into multiple category areas, and determines the category center point of each category area.
在204中,电子设备将每一类别中心点以及每一人体关键点作为定位点,得到定位点集合。In 204, the electronic device uses the center point of each category and each key point of the human body as positioning points to obtain a set of positioning points.
本申请实施例中,电子设备在获取到拍摄场景的预览图像之后,除了对预览图像进行关键点检测之 外,还将预览图像划分为多个类别区域。In the embodiment of the present application, after acquiring the preview image of the shooting scene, the electronic device not only performs key point detection on the preview image, but also divides the preview image into multiple category areas.
示例性的,本申请中还采用机器学习方法预先训练有语义分割模型。其中,该语义分割模型可以设置在电子设备本地,也可以设置在服务器。此外,本申请中对语义分割模型的构型不做具体限制,可由本领域普通技术人员根据实际需要选择。比如,本申请中采用ICNet构型的语义分割模型。Exemplarily, a machine learning method is also used in this application to pre-train a semantic segmentation model. Among them, the semantic segmentation model can be set locally on the electronic device or on the server. In addition, the configuration of the semantic segmentation model is not specifically limited in this application, and can be selected by a person of ordinary skill in the art according to actual needs. For example, the semantic segmentation model of ICNet configuration is adopted in this application.
When dividing the preview image into multiple category areas, the electronic device can call the pre-trained semantic segmentation model locally or from the server, and input the obtained preview image into the pre-trained semantic segmentation model for semantic segmentation, to obtain the object category information to which each area of the preview image belongs. Then, according to the category information, the electronic device divides the preview image into multiple category areas and determines the category center point of each category area.
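A minimal sketch of turning the segmentation output into category areas, assuming the model yields a per-pixel class-label map; the resulting masks can then be fed to a centroid computation like the one sketched earlier (the function name and output format are assumptions for illustration):

import numpy as np

def category_regions(class_map):
    """Convert the per-pixel class labels produced by the semantic
    segmentation model into one boolean mask per category present in the
    preview image."""
    return {int(c): class_map == c for c in np.unique(class_map)}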
将每一类别区域的类别中心点作为一个定位点,以及将每一人体关键点作为一个定位点,由这这些定位点组成定位点集合。The category center point of each category area is taken as an anchor point, and each key point of the human body is taken as an anchor point, and these anchor points form an anchor point set.
在205中,电子设备根据定位点集合,确定与预览图像相似度最高的预设构图模板图像。In 205, the electronic device determines the preset composition template image with the highest similarity to the preview image according to the set of positioning points.
在206中,电子设备将预设构图模板图像中的每一类别中心点以及每一人体关键点作为构图点,得到构图点集合。In 206, the electronic device uses each category center point and each human body key point in the preset composition template image as a composition point to obtain a set of composition points.
It should be noted that this application constructs a portrait composition database in advance, and the portrait composition database includes a plurality of preset composition template images.

In the embodiment of the present application, when determining the composition point set corresponding to the positioning point set, the electronic device may use the positioning point set (including the category center points of the preview image and the human body key points) together with the aspect ratio of the portrait bounding box of the preview image as the features of the preview image, and determine from the portrait composition database the preset composition template image with the highest similarity to the preview image (measured by the Minkowski distance). Then, each category center point and each human body key point of the determined preset composition template image are taken as composition points to obtain the composition point set.

The composition points in the composition point set correspond one-to-one to the positioning points in the positioning point set; when every positioning point matches its corresponding composition point, it is considered that the best composition can be obtained. A positioning point matching a composition point includes the case where the distance between the positioning point and the composition point is less than or equal to a preset distance; the present application does not specifically limit the value of the preset distance, which can be set by a person of ordinary skill in the art according to actual needs.
在207中,当定位点集合与构图点集合不匹配时,电子设备输出用于指示调整电子设备拍摄姿态的提示信息。In 207, when the positioning point set does not match the composition point set, the electronic device outputs prompt information for instructing to adjust the shooting posture of the electronic device.
According to the above definition of matching between a positioning point and a composition point, a person of ordinary skill in the art can configure, according to actual needs, the constraint under which the positioning point set is considered to match the composition point set, and the present application does not specifically limit this. For example, it can be configured such that the positioning point set matches the composition point set when every positioning point in the positioning point set matches its corresponding composition point in the composition point set; as another example, it can be configured such that the positioning point set matches the composition point set when a preset number of positioning points in the positioning point set match their corresponding composition points in the composition point set.

Correspondingly, the electronic device determines in real time whether the positioning point set corresponding to the shooting scene matches the composition point set; if they do not match, it outputs prompt information for instructing adjustment of the shooting posture of the electronic device, so that the positioning point set corresponding to the shooting scene comes to match the composition point set and the people and objects in the shooting scene can obtain a better composition.

For example, referring to FIG. 8, the electronic device may display the composition point set and the positioning point set in the preview image. As shown in FIG. 8, the composition point set includes composition point 1 and composition point 2, and the positioning point set includes positioning point 1 corresponding to composition point 1 and positioning point 2 corresponding to composition point 2. The arrow pointing from positioning point 1 to composition point 1 and the arrow pointing from positioning point 2 to composition point 2 are combined as prompt information for instructing adjustment of the shooting posture of the electronic device, thereby guiding the user on composition.
在208中,当定位点集合与构图点集合匹配时,电子设备对拍摄场景进行拍摄,得到拍摄图像。In 208, when the set of positioning points matches the set of composition points, the electronic device photographs the shooting scene to obtain a photographed image.
When the positioning point set matches the composition point set, the electronic device determines that the people and objects in the shooting scene have a good composition at this moment, and therefore photographs the shooting scene, thereby obtaining a high-quality captured image of the shooting scene.
在一实施例中,还提供了一种构图提示装置。请参照图10,图10为本申请实施例提供的构图提示装置的结构示意图。其中该构图提示装置应用于电子设备,该构图提示装置包括关键点检测模块301、定位点确定模块302、构图点确定模块303、构图提示模块304以及图像拍摄模块305,如下:In an embodiment, a composition prompting device is also provided. Please refer to FIG. 10, which is a schematic structural diagram of a composition prompting device provided by an embodiment of the application. The composition prompting device is applied to electronic equipment, and the composition prompting device includes a key point detection module 301, a positioning point determination module 302, a composition point determination module 303, a composition prompt module 304, and an image capturing module 305, as follows:
关键点检测模块301,用于获取拍摄场景的预览图像,并调用预训练的关键点检测模型对预览图像 进行关键点检测,得到拍摄场景中人体的人体关键点;The key point detection module 301 is used to obtain a preview image of the shooting scene, and call a pre-trained key point detection model to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene;
定位点确定模块302,用于将预览图像划分为多个类别区域,并根据类别区域以及人体关键点获取对应拍摄场景的定位点集合;The positioning point determination module 302 is configured to divide the preview image into multiple category areas, and obtain a set of positioning points corresponding to the shooting scene according to the category areas and the key points of the human body;
构图点确定模块303,用于确定对应定位点集合的构图点集合;The composition point determination module 303 is used to determine the composition point set corresponding to the positioning point set;
构图提示模块304,用于当定位点集合与构图点集合不匹配时,输出用于指示调整电子设备拍摄姿态的提示信息。The composition prompting module 304 is configured to output prompt information for instructing to adjust the shooting posture of the electronic device when the set of positioning points and the set of composition points do not match.
在一实施例中,本申请提供的构图提示装置还包括图像拍摄模块,用于当定位点集合与构图点集合匹配时,对拍摄场景进行拍摄,得到拍摄图像。In an embodiment, the composition prompting device provided by the present application further includes an image capturing module, which is used to capture the shooting scene when the positioning point set matches the composition point set to obtain the captured image.
在一实施例中,在调用预训练的关键点检测模型对预览图像进行关键点检测,得到拍摄场景中人体的人体关键点时,关键点检测模块301用于:In one embodiment, when the pre-trained key point detection model is called to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene, the key point detection module 301 is used to:
从预览图像中截取人体的人体图像;Intercept the human body image of the human body from the preview image;
调用关键点检测模型对人体图像进行关键点检测,得到人体关键点。Call the key point detection model to detect the key points of the human body image to obtain the key points of the human body.
在一实施例中,在从预览图像中截取人体的人体图像时,关键点检测模块301用于:In an embodiment, when the human body image of the human body is intercepted from the preview image, the key point detection module 301 is used to:
调用预训练的人像检测模型对预览图像进行人像检测,得到对应预览图像的人像边界框;Call the pre-trained portrait detection model to perform portrait detection on the preview image, and obtain the portrait bounding box corresponding to the preview image;
截取人像边界框中的图像内容,得到人体图像。The image content in the bounding box of the portrait is intercepted to obtain the human body image.
In an embodiment, the key point detection model includes a feature extraction network, a dual-branch network and an output network, and the dual-branch network includes a location branch network and a relationship branch network; when calling the key point detection model to perform key point detection on the human body image to obtain the human body key points, the key point detection module 301 is configured to:
调用特征提取网络提取得到人体图像的图像特征;Call the feature extraction network to extract the image features of the human body image;
Call the location branch network to detect candidate human body key points according to the image features, and call the relationship branch network to detect the connection relationships between the candidate human body key points according to the image features;
调用输出网络根据连接关系连接候选人体关键点,并根据人像边界框对连接后的候选人体关键点进行归一化处理,得到人体关键点。The output network is called to connect the key points of the candidate body according to the connection relationship, and normalize the key points of the candidate body after the connection according to the bounding box of the portrait to obtain the key points of the human body.
In an embodiment, the location branch network includes N location segments and the relationship branch network includes N relationship segments; when calling the location branch network to detect candidate human body key points according to the image features and calling the relationship branch network to detect the connection relationships between the candidate human body key points according to the image features, the key point detection module 301 is configured to:
调用第1个位置分段根据图像特征检测得到第1组候选人体关键点,以及调用第1个关系分段根据图像特征检测得到第1组候选人体关键点之间的连接关系;Calling the first position segment to obtain the key points of the first group of candidates based on image feature detection, and call the first relationship segment to obtain the connection relationship between the key points of the first group of candidates based on image feature detection;
Fuse the first group of candidate human body key points, the connection relationships between the first group of candidate human body key points and the image features to obtain a first fused feature, call the second position segment to detect a second group of candidate human body key points according to the first fused feature, and call the second relationship segment to detect the connection relationships between the second group of candidate human body key points according to the first fused feature;

Fuse the second group of candidate human body key points, the connection relationships between the second group of candidate human body key points and the image features to obtain a second fused feature, and so on, until the Nth group of candidate human body key points detected by the Nth position segment according to the (N-1)th fused feature is obtained and the connection relationships between the Nth group of candidate human body key points detected by the Nth relationship segment according to the (N-1)th fused feature are obtained;

Take the Nth group of candidate human body key points as the candidate human body key points detected by the location branch network, and take the connection relationships between the Nth group of candidate human body key points as the connection relationships between the candidate human body key points detected by the relationship branch network.
在一实施例中,第1个位置分段包括依次连接的多个第一卷积模块和多个第二卷积模块。In an embodiment, the first position segment includes a plurality of first convolution modules and a plurality of second convolution modules connected in sequence.
在一实施例中,第一卷积模块包括卷积核大小为3*3的卷积单元,第二卷积模块包括卷积核大小为1*1的卷积单元。In an embodiment, the first convolution module includes a convolution unit with a convolution kernel size of 3*3, and the second convolution module includes a convolution unit with a convolution kernel size of 1*1.
在一实施例中,第2-N个位置分段的结构相同,第2个位置分段包括依次连接的多个第三卷积模块和多个第二卷积模块。In an embodiment, the structure of the 2-Nth position segment is the same, and the second position segment includes a plurality of third convolution modules and a plurality of second convolution modules connected in sequence.
在一实施例中,第三卷积模块包括卷积核大小为7*7的卷积单元。In an embodiment, the third convolution module includes a convolution unit with a convolution kernel size of 7*7.
在一实施例中,在根据类别区域以及人体关键点获取对应拍摄场景的定位点集合时,定位点确定模块302用于:In an embodiment, when acquiring a set of positioning points corresponding to the shooting scene according to the category area and the key points of the human body, the positioning point determination module 302 is configured to:
确定每一类别区域的类别中心点,将每一类别区域的类别中心点以及每一人体关键点作为定位点,得到定位点集合。Determine the category center point of each category area, and use the category center point of each category area and each key point of the human body as positioning points to obtain a set of positioning points.
在一实施例中,在确定对应定位点集合的构图点集合时,构图点确定模块303用于:In an embodiment, when determining the composition point set corresponding to the anchor point set, the composition point determination module 303 is configured to:
根据定位点集合,确定与预览图像相似度最高的预设构图模板图像;According to the set of anchor points, determine the preset composition template image with the highest similarity to the preview image;
将预设构图模板图像中每一类别中心点以及每一人体关键点作为构图点,得到构图点集合。The center point of each category and each key point of the human body in the preset composition template image are used as composition points to obtain a set of composition points.
在一实施例中,本申请提供的构图提示装置还包括判断模块,用于:In an embodiment, the composition prompting device provided by the present application further includes a judgment module for:
将对应同一类别的定位点和构图点组合为数据组,以及将对应同一人体位置的定位点和构图点组合为数据组,得到多个数据组;Combine positioning points and composition points corresponding to the same category into a data group, and combine positioning points and composition points corresponding to the same human body position into a data group to obtain multiple data groups;
Calculate the distance between the positioning point and the composition point in each data group, and calculate the sum of the distances over the multiple data groups;

When the distance sum is less than the preset threshold, determine that the positioning point set matches the composition point set.
It should be noted that the composition prompting device provided in the embodiment of the present application belongs to the same concept as the composition prompting method in the above embodiments; any method provided in the composition prompting method embodiments can be run on the composition prompting device, and its specific implementation process is detailed in the above embodiments and is not repeated here.
在一实施例中,还提供一种电子设备,请参照图11,电子设备包括处理器401和存储器402。In an embodiment, an electronic device is also provided. Referring to FIG. 11, the electronic device includes a processor 401 and a memory 402.
本申请实施例中的处理器401是通用处理器,比如ARM架构的处理器。The processor 401 in the embodiment of the present application is a general-purpose processor, such as an ARM architecture processor.
存储器402中存储有计算机程序,其可以为高速随机存取存储器,还可以为非易失性存储器,比如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件等。相应地,存储器402还可以包括存储器控制器,以提供处理器401对存储器402中计算机程序的访问,实现如下功能:A computer program is stored in the memory 402, which may be a high-speed random access memory or a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices. Correspondingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the computer program in the memory 402 to implement the following functions:
获取拍摄场景的预览图像,并调用预训练的关键点检测模型对预览图像进行关键点检测,得到拍摄场景中人体的人体关键点;Obtain a preview image of the shooting scene, and call the pre-trained key point detection model to perform key point detection on the preview image, and obtain the human body key points of the human body in the shooting scene;
将预览图像划分为多个类别区域,并根据类别区域以及人体关键点获取对应拍摄场景的定位点集合;Divide the preview image into multiple category areas, and obtain a set of anchor points corresponding to the shooting scene according to the category areas and the key points of the human body;
确定对应定位点集合的构图点集合;Determine the set of composition points corresponding to the set of anchor points;
当定位点集合与构图点集合不匹配时,输出用于指示调整电子设备拍摄姿态的提示信息。When the set of positioning points and the set of composition points do not match, a prompt message for instructing to adjust the shooting posture of the electronic device is output.
在一实施例中,在调用预训练的关键点检测模型对预览图像进行关键点检测,得到拍摄场景中人体的人体关键点时,处理器401用于执行:In one embodiment, when the pre-trained key point detection model is called to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene, the processor 401 is configured to execute:
从预览图像中截取人体的人体图像;Intercept the human body image of the human body from the preview image;
调用关键点检测模型对人体图像进行关键点检测,得到人体关键点。Call the key point detection model to detect the key points of the human body image to obtain the key points of the human body.
在一实施例中,在从预览图像中截取人体的人体图像时,处理器401用于执行:In an embodiment, when the human body image of the human body is intercepted from the preview image, the processor 401 is configured to execute:
调用预训练的人像检测模型对预览图像进行人像检测,得到对应预览图像的人像边界框;Call the pre-trained portrait detection model to perform portrait detection on the preview image, and obtain the portrait bounding box corresponding to the preview image;
截取人像边界框中的图像内容,得到人体图像。The image content in the bounding box of the portrait is intercepted to obtain the human body image.
In an embodiment, the key point detection model includes a feature extraction network, a dual-branch network and an output network, and the dual-branch network includes a location branch network and a relationship branch network; when calling the key point detection model to perform key point detection on the human body image to obtain the human body key points, the processor 401 is configured to execute:
调用特征提取网络提取得到人体图像的图像特征;Call the feature extraction network to extract the image features of the human body image;
Call the location branch network to detect candidate human body key points according to the image features, and call the relationship branch network to detect the connection relationships between the candidate human body key points according to the image features;
调用输出网络根据连接关系连接候选人体关键点,并根据人像边界框对连接后的候选人体关键点进行归一化处理,得到人体关键点。The output network is called to connect the key points of the candidate body according to the connection relationship, and normalize the key points of the candidate body after the connection according to the bounding box of the portrait to obtain the key points of the human body.
In an embodiment, the location branch network includes N location segments and the relationship branch network includes N relationship segments; when calling the location branch network to detect candidate human body key points according to the image features and calling the relationship branch network to detect the connection relationships between the candidate human body key points according to the image features, the processor 401 is configured to execute:
调用第1个位置分段根据图像特征检测得到第1组候选人体关键点,以及调用第1个关系分段根据 图像特征检测得到第1组候选人体关键点之间的连接关系;Call the first position segment to obtain the key points of the first group of candidates based on image feature detection, and call the first relationship segment to obtain the connection relationship between the key points of the first group of candidates based on image feature detection;
Fuse the first group of candidate human body key points, the connection relationships between the first group of candidate human body key points and the image features to obtain a first fused feature, call the second position segment to detect a second group of candidate human body key points according to the first fused feature, and call the second relationship segment to detect the connection relationships between the second group of candidate human body key points according to the first fused feature;

Fuse the second group of candidate human body key points, the connection relationships between the second group of candidate human body key points and the image features to obtain a second fused feature, and so on, until the Nth group of candidate human body key points detected by the Nth position segment according to the (N-1)th fused feature is obtained and the connection relationships between the Nth group of candidate human body key points detected by the Nth relationship segment according to the (N-1)th fused feature are obtained;

Take the Nth group of candidate human body key points as the candidate human body key points detected by the location branch network, and take the connection relationships between the Nth group of candidate human body key points as the connection relationships between the candidate human body key points detected by the relationship branch network.
在一实施例中,第1个位置分段包括依次连接的多个第一卷积模块和多个第二卷积模块。In an embodiment, the first position segment includes a plurality of first convolution modules and a plurality of second convolution modules connected in sequence.
在一实施例中,第一卷积模块包括卷积核大小为3*3的卷积单元,第二卷积模块包括卷积核大小为1*1的卷积单元。In an embodiment, the first convolution module includes a convolution unit with a convolution kernel size of 3*3, and the second convolution module includes a convolution unit with a convolution kernel size of 1*1.
在一实施例中,第2-N个位置分段的结构相同,第2个位置分段包括依次连接的多个第三卷积模块和多个第二卷积模块。In an embodiment, the structure of the 2-Nth position segment is the same, and the second position segment includes a plurality of third convolution modules and a plurality of second convolution modules connected in sequence.
在一实施例中,第三卷积模块包括卷积核大小为7*7的卷积单元。In an embodiment, the third convolution module includes a convolution unit with a convolution kernel size of 7*7.
在一实施例中,在根据类别区域以及人体关键点获取对应拍摄场景的定位点集合时,处理器401用于执行:In an embodiment, when acquiring a set of anchor points corresponding to the shooting scene according to the category area and the key points of the human body, the processor 401 is configured to execute:
确定每一类别区域的类别中心点,将每一类别区域的类别中心点以及每一人体关键点作为定位点,得到定位点集合。Determine the category center point of each category area, and use the category center point of each category area and each key point of the human body as positioning points to obtain a set of positioning points.
在一实施例中,在确定对应定位点集合的构图点集合时,处理器401用于执行:In an embodiment, when determining the composition point set corresponding to the anchor point set, the processor 401 is configured to execute:
根据定位点集合,确定与预览图像相似度最高的预设构图模板图像;According to the set of anchor points, determine the preset composition template image with the highest similarity to the preview image;
将预设构图模板图像中每一类别中心点以及每一人体关键点作为构图点,得到构图点集合。The center point of each category and each key point of the human body in the preset composition template image are used as composition points to obtain a set of composition points.
在一实施例中,处理器401还用于执行:In an embodiment, the processor 401 is further configured to execute:
将对应同一类别的定位点和构图点组合为数据组,以及将对应同一人体位置的定位点和构图点组合为数据组,得到多个数据组;Combine positioning points and composition points corresponding to the same category into a data group, and combine positioning points and composition points corresponding to the same human body position into a data group to obtain multiple data groups;
Calculate the distance between the positioning point and the composition point in each data group, and calculate the sum of the distances over the multiple data groups;

When the distance sum is less than the preset threshold, determine that the positioning point set matches the composition point set.
It should be noted that the electronic device provided in the embodiment of the present application belongs to the same concept as the composition prompting method in the above embodiments; any method provided in the composition prompting method embodiments can be run on the electronic device, and its specific implementation process is detailed in the composition prompting method embodiments and is not repeated here.
It should be noted that, for the composition prompting method of the embodiments of the present application, a person of ordinary skill in the art can understand that all or part of the process of implementing the composition prompting method of the embodiments of the present application can be completed by controlling relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium, for example in the memory of an electronic device, and executed by a processor in the electronic device, and its execution may include the processes of the embodiments of the composition prompting method. The storage medium can be a magnetic disk, an optical disc, a read-only memory, a random access memory, or the like.
The composition prompting method, model training method, apparatus, storage medium and electronic device provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, a person skilled in the art may make changes to the specific implementations and the application scope according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (20)

  1. A composition prompting method, comprising:
    获取拍摄场景的预览图像,并调用预训练的关键点检测模型对所述预览图像进行关键点检测,得到所述拍摄场景中人体的人体关键点;Acquiring a preview image of the shooting scene, and calling a pre-trained key point detection model to perform key point detection on the preview image, to obtain the human body key points of the human body in the shooting scene;
    将所述预览图像划分为多个类别区域,并根据所述类别区域以及所述人体关键点获取对应所述拍摄场景的定位点集合;Dividing the preview image into multiple category areas, and obtaining a set of positioning points corresponding to the shooting scene according to the category areas and the key points of the human body;
    确定对应所述定位点集合的构图点集合;Determining a set of composition points corresponding to the set of positioning points;
    当所述定位点集合与所述构图点集合不匹配时,输出用于指示调整所述电子设备拍摄姿态的提示信息。When the set of positioning points does not match the set of composition points, outputting prompt information for instructing to adjust the shooting posture of the electronic device.
  2. 根据权利要求1所述的构图提示方法,其中,所述调用预训练的关键点检测模型对所述预览图像进行关键点检测,得到所述拍摄场景中人体的人体关键点,包括:The composition prompting method according to claim 1, wherein the calling a pre-trained key point detection model to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene comprises:
    从所述预览图像中截取所述人体的人体图像;Intercepting the human body image of the human body from the preview image;
    调用所述关键点检测模型对所述人体图像进行关键点检测,得到所述人体关键点。Calling the key point detection model to perform key point detection on the human body image to obtain the human body key point.
  3. 根据权利要求2所述的构图提示方法,其中,从所述预览图像中截取所述人体的人体图像,包括:The composition prompting method according to claim 2, wherein the intercepting the human body image of the human body from the preview image comprises:
    调用预训练的人像检测模型对所述预览图像进行人像检测,得到对应所述预览图像的人像边界框;Calling a pre-trained portrait detection model to perform portrait detection on the preview image to obtain a portrait bounding box corresponding to the preview image;
    截取所述人像边界框中的图像内容,得到所述人体图像。The image content in the bounding box of the portrait is intercepted to obtain the image of the human body.
  4. The composition prompting method according to claim 3, wherein the key point detection model comprises a feature extraction network, a dual-branch network and an output network, the dual-branch network comprises a location branch network and a relationship branch network, and the calling the key point detection model to perform key point detection on the human body image to obtain the human body key points comprises:
    调用所述特征提取网络提取得到所述人体图像的图像特征;Calling the feature extraction network to extract the image features of the human body image;
    调用所述位置分支网络根据所述图像特征检测得到候选人体关键点,以及调用所述关系分支网络根据所述图像特征检测得到候选人体关键点之间的连接关系;Calling the location branch network to detect key points of the candidate body according to the image feature, and calling the relationship branch network to detect the connection relationship between the key points of the candidate body according to the image feature;
    调用所述输出网络根据所述连接关系连接所述候选人体关键点,并根据所述人像边界框对连接后的候选人体关键点进行归一化处理,得到所述人体关键点。The output network is called to connect the key points of the candidate body according to the connection relationship, and normalize the key points of the candidate body after the connection according to the bounding box of the portrait to obtain the key points of the human body.
  5. The composition prompting method according to claim 4, wherein the location branch network comprises N location segments, the relationship branch network comprises N relationship segments, and the calling the location branch network to detect candidate human body key points according to the image features and calling the relationship branch network to detect the connection relationships between the candidate human body key points according to the image features comprises:
    calling the first location segment to detect a first group of candidate human body key points according to the image features, and calling the first relationship segment to detect connection relationships between the first group of candidate human body key points according to the image features;

    fusing the first group of candidate human body key points, the connection relationships between the first group of candidate human body key points and the image features to obtain a first fused feature, calling the second location segment to detect a second group of candidate human body key points according to the first fused feature, and calling the second relationship segment to detect connection relationships between the second group of candidate human body key points according to the first fused feature;

    fusing the second group of candidate human body key points, the connection relationships between the second group of candidate human body key points and the image features to obtain a second fused feature, and so on, until an Nth group of candidate human body key points detected by the Nth location segment according to the (N-1)th fused feature is obtained and connection relationships between the Nth group of candidate human body key points detected by the Nth relationship segment according to the (N-1)th fused feature are obtained;

    taking the Nth group of candidate human body key points as the candidate human body key points detected by the location branch network, and taking the connection relationships between the Nth group of candidate human body key points as the connection relationships between the candidate human body key points detected by the relationship branch network.
  6. The composition prompting method according to claim 5, wherein the first location segment comprises a plurality of first convolution modules and a plurality of second convolution modules connected in sequence, the first convolution module comprises a convolution unit with a convolution kernel size of 3*3, and the second convolution module comprises a convolution unit with a convolution kernel size of 1*1.

  7. The composition prompting method according to claim 6, wherein the second to Nth location segments have the same structure, and the second location segment comprises a plurality of third convolution modules and a plurality of the second convolution modules connected in sequence.
  8. 根据权利要求7所述的构图提示方法,其中,所述第三卷积模块包括卷积核大小为7*7的卷积单元。8. The composition prompting method according to claim 7, wherein the third convolution module includes a convolution unit with a convolution kernel size of 7*7.
  9. 根据权利要求1-8任一项所述的构图提示方法,其中,所述确定对应所述定位点集合的构图点集合之后,还包括:8. The composition prompting method according to any one of claims 1-8, wherein after the determining a composition point set corresponding to the positioning point set, the method further comprises:
    当所述定位点集合与所述构图点集合匹配时,对所述拍摄场景进行拍摄,得到拍摄图像。When the set of positioning points matches the set of composition points, the shooting scene is photographed to obtain a photographed image.
  10. 根据权利要求1-8任一项所述的构图提示方法,其中,所述根据所述类别区域以及所述人体关键点获取对应所述拍摄场景的定位点集合,包括:8. The composition prompting method according to any one of claims 1-8, wherein the acquiring a set of positioning points corresponding to the shooting scene according to the category area and the key points of the human body comprises:
    确定每一类别区域的类别中心点,将每一类别区域的类别中心点以及每一人体关键点作为所述定位点,得到所述定位点集合。The category center point of each category area is determined, and the category center point of each category area and each key point of the human body are used as the positioning points to obtain the set of positioning points.
  11. 根据权利要求10所述的构图提示方法,其中,所述确定对应所述定位点集合的构图点集合,包括:The composition prompting method according to claim 10, wherein the determining a composition point set corresponding to the positioning point set comprises:
    根据所述定位点集合,确定与所述预览图像相似度最高的预设构图模板图像;Determine the preset composition template image with the highest similarity to the preview image according to the set of positioning points;
    将所述预设构图模板图像中每一类别中心点以及每一人体关键点作为所述构图点,得到所述构图点集合。Taking each category center point and each human body key point in the preset composition template image as the composition point to obtain the composition point set.
  12. 根据权利要求11所述的构图提示方法,其中,还包括:The composition prompting method according to claim 11, further comprising:
    将对应同一类别的定位点和构图点组合为数据组,以及将对应同一人体位置的定位点和构图点组合为数据组,得到多个数据组;Combine positioning points and composition points corresponding to the same category into a data group, and combine positioning points and composition points corresponding to the same human body position into a data group to obtain multiple data groups;
    calculating the distance between the positioning point and the composition point in each data group, and calculating the sum of the distances over the multiple data groups;

    when the distance sum is less than a preset threshold, determining that the positioning point set matches the composition point set.
  13. 一种构图提示装置,其中,包括:A composition prompting device, which includes:
    关键点检测模块,用于获取拍摄场景的预览图像,并调用预训练的关键点检测模型对所述预览图像进行关键点检测,得到所述拍摄场景中人体的人体关键点;The key point detection module is used to obtain a preview image of the shooting scene, and call a pre-trained key point detection model to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene;
    定位点确定模块,用于将所述预览图像划分为多个类别区域,并根据所述类别区域以及所述人体关键点获取对应所述拍摄场景的定位点集合;An anchor point determination module, configured to divide the preview image into a plurality of category areas, and obtain an anchor point set corresponding to the shooting scene according to the category area and the key points of the human body;
    构图点确定模块,用于确定对应所述定位点集合的构图点集合;A composition point determination module, configured to determine a composition point set corresponding to the positioning point set;
    构图提示模块,用于当所述定位点集合与所述构图点集合不匹配时,输出用于指示调整所述电子设备拍摄姿态的提示信息。The composition prompting module is configured to output prompt information for instructing to adjust the shooting posture of the electronic device when the set of positioning points does not match the set of composition points.
  14. A storage medium storing a computer program, wherein, when the computer program is loaded by a processor, the processor executes:
    acquiring a preview image of a shooting scene, and calling a pre-trained key point detection model to perform key point detection on the preview image to obtain human body key points of a human body in the shooting scene;
    dividing the preview image into a plurality of category areas, and acquiring a positioning point set corresponding to the shooting scene according to the category areas and the human body key points;
    determining a composition point set corresponding to the positioning point set;
    when the positioning point set does not match the composition point set, outputting prompt information for instructing adjustment of a shooting posture of the electronic device.
  15. An electronic device, comprising a processor and a memory, the memory storing a computer program, wherein the processor loads the computer program to execute:
    acquiring an image to be processed, and identifying a horizontal dividing line of the image to be processed;
    rotating the image to be processed so as to rotate the horizontal dividing line to a preset position, and cropping the rotated image to be processed to obtain a cropped image;
    dividing the cropped image into a plurality of sub-images, and performing image quality scoring on the sub-images and the image to be processed as candidate images;
    selecting the candidate image with the highest quality score as a processing result image of the image to be processed.
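The pipeline recited in claim 15 (level the horizon, crop, split into sub-images, score, keep the best) could look roughly like the sketch below. It is illustrative only: `horizon_angle_deg` stands in for the output of the horizontal-dividing-line identification, `score_fn` for the image-quality scoring model, and both the naive central crop and the 2x2 grid split are simplifying assumptions.

```python
from PIL import Image

def auto_crop_best(image: Image.Image, horizon_angle_deg: float,
                   score_fn, grid=(2, 2)):
    """Sketch of a claim-15-style pipeline: level the horizon, crop,
    split into sub-images, score every candidate, return the best."""
    # Rotate so the detected horizontal dividing line becomes level.
    rotated = image.rotate(-horizon_angle_deg, expand=True)
    # Naive central crop to drop the empty corners introduced by rotation
    # (a real implementation would compute the largest valid rectangle).
    w, h = rotated.size
    cropped = rotated.crop((w // 8, h // 8, w - w // 8, h - h // 8))

    # The image to be processed itself is also a candidate.
    candidates = [image]
    cols, rows = grid
    cw, ch = cropped.size[0] // cols, cropped.size[1] // rows
    for r in range(rows):
        for c in range(cols):
            candidates.append(
                cropped.crop((c * cw, r * ch, (c + 1) * cw, (r + 1) * ch)))
    # Keep the candidate with the highest quality score.
    return max(candidates, key=score_fn)
```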
  16. The electronic device according to claim 15, wherein, when a pre-trained key point detection model is called to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene, the processor is configured to execute:
    intercepting a human body image of the human body from the preview image;
    calling the key point detection model to perform key point detection on the human body image to obtain the human body key points.
  17. The electronic device according to claim 16, wherein, when the human body image of the human body is intercepted from the preview image, the processor is configured to execute:
    calling a pre-trained portrait detection model to perform portrait detection on the preview image to obtain a portrait bounding box corresponding to the preview image;
    intercepting the image content within the portrait bounding box to obtain the human body image.
  18. The electronic device according to claim 17, wherein the key point detection model comprises a feature extraction network, a dual-branch network and an output network, the dual-branch network comprising a position branch network and a relation branch network, and wherein, when the key point detection model is called to perform key point detection on the human body image to obtain the human body key points, the processor is configured to execute:
    calling the feature extraction network to extract image features of the human body image;
    calling the position branch network to detect candidate human body key points according to the image features, and calling the relation branch network to detect connection relationships between the candidate human body key points according to the image features;
    calling the output network to connect the candidate human body key points according to the connection relationships, and normalizing the connected candidate human body key points according to the portrait bounding box to obtain the human body key points.
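As a side note, the normalization at the end of claim 18 might amount to something as simple as the sketch below, assuming the portrait bounding box is given as (x, y, width, height); the exact normalization is not spelled out in the claim, so this is an assumption.

```python
def normalize_keypoints(keypoints, bbox):
    """Express connected candidate human body key points relative to the
    portrait bounding box (x, y, width, height) -- illustrative sketch."""
    bx, by, bw, bh = bbox
    return {name: ((x - bx) / bw, (y - by) / bh)
            for name, (x, y) in keypoints.items()}
```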
  19. The electronic device according to claim 18, wherein the position branch network comprises N position segments and the relation branch network comprises N relation segments, and wherein, when the position branch network is called to detect the candidate human body key points according to the image features and the relation branch network is called to detect the connection relationships between the candidate human body key points according to the image features, the processor is configured to execute:
    calling the 1st position segment to detect a 1st group of candidate human body key points according to the image features, and calling the 1st relation segment to detect connection relationships between the 1st group of candidate human body key points according to the image features;
    fusing the 1st group of candidate human body key points, the connection relationships between the 1st group of candidate human body key points and the image features to obtain a 1st fused feature, calling the 2nd position segment to detect a 2nd group of candidate human body key points according to the 1st fused feature, and calling the 2nd relation segment to detect connection relationships between the 2nd group of candidate human body key points according to the 1st fused feature;
    fusing the 2nd group of candidate human body key points, the connection relationships between the 2nd group of candidate human body key points and the image features to obtain a 2nd fused feature, and so on, until an Nth group of candidate human body key points detected by the Nth position segment according to an (N-1)th fused feature, and connection relationships between the Nth group of candidate human body key points detected by the Nth relation segment according to the (N-1)th fused feature, are obtained;
    taking the Nth group of candidate human body key points as the candidate human body key points detected by the position branch network, and taking the connection relationships between the Nth group of candidate human body key points as the connection relationships between the candidate human body key points detected by the relation branch network.
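A hedged PyTorch-style sketch of the N-stage cascade described in claim 19 follows. The channel sizes, the ReLU activations and the internal layout of each segment are assumptions (claim 20 only constrains the first position segment); only the stage-by-stage fusion of key point maps, relation maps and backbone features follows the claim wording.

```python
import torch
import torch.nn as nn

class DualBranchCascade(nn.Module):
    """N position segments and N relation segments; after each stage the two
    outputs are fused with the backbone features and fed to the next stage."""

    def __init__(self, feat_ch, kp_ch, rel_ch, num_stages):
        super().__init__()

        def segment(in_ch, out_ch):
            # Placeholder segment body; the real layer layout is not fixed here.
            return nn.Sequential(
                nn.Conv2d(in_ch, 128, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(128, out_ch, kernel_size=1),
            )

        fused_ch = feat_ch + kp_ch + rel_ch
        self.pos_segments = nn.ModuleList(
            [segment(feat_ch, kp_ch)] +
            [segment(fused_ch, kp_ch) for _ in range(num_stages - 1)])
        self.rel_segments = nn.ModuleList(
            [segment(feat_ch, rel_ch)] +
            [segment(fused_ch, rel_ch) for _ in range(num_stages - 1)])

    def forward(self, features):
        x = features
        for pos_seg, rel_seg in zip(self.pos_segments, self.rel_segments):
            keypoint_maps = pos_seg(x)   # candidate human body key points
            relation_maps = rel_seg(x)   # connection relationships between them
            # Fuse this stage's outputs with the backbone features.
            x = torch.cat([keypoint_maps, relation_maps, features], dim=1)
        return keypoint_maps, relation_maps
```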
  20. The electronic device according to claim 19, wherein the 1st position segment comprises a plurality of first convolution modules and a plurality of second convolution modules connected in sequence, the first convolution module comprising a convolution unit with a 3*3 convolution kernel, and the second convolution module comprising a convolution unit with a 1*1 convolution kernel.
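The 3*3-then-1*1 layout of claim 20's first position segment might be assembled as sketched below; the number of modules (`n_3x3`, `n_1x1`) and the hidden channel width are hypothetical, since the claim only fixes the kernel sizes and their ordering.

```python
import torch.nn as nn

def position_segment(in_channels, num_keypoint_maps,
                     n_3x3=3, n_1x1=2, hidden=128):
    """Several 3x3 convolution modules followed by several 1x1 convolution
    modules, ending in a 1x1 projection to the key point maps (sketch)."""
    layers, ch = [], in_channels
    for _ in range(n_3x3):
        layers += [nn.Conv2d(ch, hidden, kernel_size=3, padding=1), nn.ReLU()]
        ch = hidden
    for _ in range(n_1x1 - 1):
        layers += [nn.Conv2d(ch, hidden, kernel_size=1), nn.ReLU()]
        ch = hidden
    layers += [nn.Conv2d(ch, num_keypoint_maps, kernel_size=1)]
    return nn.Sequential(*layers)
```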
PCT/CN2021/074905 2020-02-27 2021-02-02 Photographic composition prompting method and apparatus, storage medium, and electronic device WO2021169754A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010125410.4 2020-02-27
CN202010125410.4A CN111277759B (en) 2020-02-27 2020-02-27 Composition prompting method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
WO2021169754A1 true WO2021169754A1 (en) 2021-09-02

Family

ID=71000403

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/074905 WO2021169754A1 (en) 2020-02-27 2021-02-02 Photographic composition prompting method and apparatus, storage medium, and electronic device

Country Status (2)

Country Link
CN (1) CN111277759B (en)
WO (1) WO2021169754A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111277759B (en) * 2020-02-27 2021-08-31 Oppo广东移动通信有限公司 Composition prompting method and device, storage medium and electronic equipment
CN111860276B (en) * 2020-07-14 2023-04-11 咪咕文化科技有限公司 Human body key point detection method, device, network equipment and storage medium
CN112036319B (en) * 2020-08-31 2023-04-18 北京字节跳动网络技术有限公司 Picture processing method, device, equipment and storage medium
CN113194254A (en) * 2021-04-28 2021-07-30 上海商汤智能科技有限公司 Image shooting method and device, electronic equipment and storage medium
CN116471477A (en) * 2022-01-11 2023-07-21 华为技术有限公司 Method for debugging camera and related equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008035246A (en) * 2006-07-28 2008-02-14 Mitsubishi Space Software Kk Composition evaluation device, composition adjustment device, image photographing device, composition evaluation program, composition adjustment program, image photographing program, composition evaluation method, composition adjustment method, and image photographing method
US20100110266A1 (en) * 2008-10-31 2010-05-06 Samsung Electronics Co., Ltd. Image photography apparatus and method for proposing composition based person
CN107509032A (en) * 2017-09-08 2017-12-22 维沃移动通信有限公司 One kind is taken pictures reminding method and mobile terminal
CN109660719A (en) * 2018-12-11 2019-04-19 维沃移动通信有限公司 A kind of information cuing method and mobile terminal
CN109788191A (en) * 2018-12-21 2019-05-21 中国科学院自动化研究所南京人工智能芯片创新研究院 Photographic method, device, computer equipment and storage medium
CN111277759A (en) * 2020-02-27 2020-06-12 Oppo广东移动通信有限公司 Composition prompting method and device, storage medium and electronic equipment

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7317815B2 (en) * 2003-06-26 2008-01-08 Fotonation Vision Limited Digital image processing composition using face detection information
TW201023633A (en) * 2008-12-05 2010-06-16 Altek Corp An image capturing device for automatically position indicating and the automatic position indicating method thereof
CN101908153B (en) * 2010-08-21 2012-11-21 上海交通大学 Method for estimating head postures in low-resolution image treatment
CN104917951A (en) * 2014-03-14 2015-09-16 宏碁股份有限公司 Camera device and auxiliary human image shooting method thereof
CN104601889B (en) * 2015-01-20 2018-03-30 广东欧珀移动通信有限公司 The photographic method and device of a kind of mobile terminal
CN108737733B (en) * 2018-06-08 2020-08-04 Oppo广东移动通信有限公司 Information prompting method and device, electronic equipment and computer readable storage medium
KR102661983B1 (en) * 2018-08-08 2024-05-02 삼성전자주식회사 Method for processing image based on scene recognition of image and electronic device therefor
CN109218615A (en) * 2018-09-27 2019-01-15 百度在线网络技术(北京)有限公司 Image taking householder method, device, terminal and storage medium
CN109614613B (en) * 2018-11-30 2020-07-31 北京市商汤科技开发有限公司 Image description statement positioning method and device, electronic equipment and storage medium


Also Published As

Publication number Publication date
CN111277759B (en) 2021-08-31
CN111277759A (en) 2020-06-12

Similar Documents

Publication Publication Date Title
WO2021169754A1 (en) Photographic composition prompting method and apparatus, storage medium, and electronic device
CN109359575B (en) Face detection method, service processing method, device, terminal and medium
WO2021036059A1 (en) Image conversion model training method, heterogeneous face recognition method, device and apparatus
WO2021227726A1 (en) Methods and apparatuses for training face detection and image detection neural networks, and device
WO2019128508A1 (en) Method and apparatus for processing image, storage medium, and electronic device
Mehmood et al. Efficient image recognition and retrieval on IoT-assisted energy-constrained platforms from big data repositories
WO2020199480A1 (en) Body movement recognition method and device
WO2020228525A1 (en) Place recognition method and apparatus, model training method and apparatus for place recognition, and electronic device
Haider et al. Deepgender: real-time gender classification using deep learning for smartphones
WO2021175071A1 (en) Image processing method and apparatus, storage medium, and electronic device
CN111327828B (en) Photographing method and device, electronic equipment and storage medium
KR20160101973A (en) System and method for identifying faces in unconstrained media
WO2019153504A1 (en) Group creation method and terminal thereof
WO2021203823A1 (en) Image classification method and apparatus, storage medium, and electronic device
EP3905104B1 (en) Living body detection method and device
CN113298158B (en) Data detection method, device, equipment and storage medium
WO2024001123A1 (en) Image recognition method and apparatus based on neural network model, and terminal device
WO2021217919A1 (en) Facial action unit recognition method and apparatus, and electronic device, and storage medium
WO2021197466A1 (en) Eyeball detection method, apparatus and device, and storage medium
Cai et al. Visual focus of attention estimation using eye center localization
CN113591562A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN107168536A (en) Test question searching method, test question searching device and electronic terminal
Zuo et al. Face liveness detection algorithm based on livenesslight network
CN112528978B (en) Face key point detection method and device, electronic equipment and storage medium
WO2022120669A1 (en) Gesture recognition method, computer device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21760767

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21760767

Country of ref document: EP

Kind code of ref document: A1