WO2021169754A1 - Photographic composition prompting method and apparatus, storage medium, and electronic device - Google Patents

Photographic composition prompting method and apparatus, storage medium, and electronic device

Info

Publication number
WO2021169754A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
points
human body
key points
composition
Prior art date
Application number
PCT/CN2021/074905
Other languages
French (fr)
Chinese (zh)
Inventor
罗彤
李亚乾
蒋燚
Original Assignee
Oppo广东移动通信有限公司
上海瑾盛通信科技有限公司
Priority date
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司 and 上海瑾盛通信科技有限公司
Publication of WO2021169754A1 publication Critical patent/WO2021169754A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/61Control of cameras or camera modules based on recognised objects
    • H04N23/611Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/23Recognition of whole body movements, e.g. for sport training
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/63Control of cameras or camera modules by using electronic viewfinders
    • H04N23/631Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters
    • H04N23/632Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters for displaying or modifying preview images prior to image capturing, e.g. variety of image resolutions or capturing parameters
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/64Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image

Definitions

  • This application relates to the field of image processing technology, and in particular to a composition prompting method, device, storage medium, and electronic equipment.
  • the embodiments of the present application provide a composition prompting method, device, storage medium, and electronic equipment, which can improve the quality of images captured by the electronic equipment.
  • the key point detection module is used to obtain a preview image of the shooting scene, and call a pre-trained key point detection model to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene;
  • An anchor point determination module configured to divide the preview image into a plurality of category areas, and obtain an anchor point set corresponding to the shooting scene according to the category area and the key points of the human body;
  • a composition point determination module configured to determine a composition point set corresponding to the positioning point set
  • the composition prompting module is configured to output prompt information for instructing to adjust the shooting posture of the electronic device when the set of positioning points does not match the set of composition points.
  • the storage medium provided by the embodiment of the present application has a computer program stored thereon, and when the computer program is loaded by a processor, the composition prompting method as provided in the present application is executed.
  • the electronic device provided by the embodiment of the present application includes a processor and a memory, the memory stores a computer program, and the processor loads the computer program to execute the composition prompting method provided by the present application.
  • FIG. 1 is a schematic flowchart of a composition prompting method provided by an embodiment of the application.
  • Fig. 2 is a schematic diagram of the key points of the human body detected in an embodiment of the present application.
  • Fig. 3 is a schematic diagram of intercepting a human body image in an embodiment of the present application.
  • Fig. 4 is a schematic structural diagram of a key point detection model provided by an embodiment of the present application.
  • Fig. 5 is a detailed structure diagram of a key point detection model provided by an embodiment of the present application.
  • Fig. 6 is a schematic structural diagram of the first position segment in an embodiment of the present application.
  • Fig. 7 is a schematic structural diagram of the second position segment in an embodiment of the present application.
  • Fig. 8 is an example diagram of outputting prompt information in an embodiment of the present application.
  • FIG. 9 is a schematic diagram of another flow chart of the composition prompting method provided by an embodiment of the present application.
  • Fig. 10 is a schematic structural diagram of a composition prompting device provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • Artificial Intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • artificial intelligence is a comprehensive technology of computer science, which attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a similar way to human intelligence.
  • Artificial intelligence is to study the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.
  • Artificial intelligence technology is a comprehensive discipline, covering a wide range of fields, including both hardware-level technology and software-level technology.
  • Basic artificial intelligence technologies generally include technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
  • Machine Learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects. It specializes in the study of how computers simulate or realize human learning behaviors in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance.
  • Machine learning is the core of artificial intelligence, the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence.
  • Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning and other technologies.
  • The embodiments of the present application provide a model training method, a composition prompting method, a composition prompting device, a storage medium, and an electronic device. The execution subject of the model training method may be the composition prompting device provided in the embodiments of the present application, or an electronic device integrated with the composition prompting device; the execution subject of the composition prompting method may likewise be the composition prompting device provided in the embodiments of the present application, or an electronic device integrated with the composition prompting device. The composition prompting device may be implemented in hardware or software.
  • the electronic device may be a device equipped with a processor (including but not limited to a general-purpose processor, a customized processor, etc.) and having processing capabilities, such as a smart phone, a tablet computer, a palmtop computer, a notebook computer, or a desktop computer.
  • composition prompting method including
  • the invoking a pre-trained key point detection model to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene includes:
  • intercepting the human body image of the human body from the preview image includes:
  • the image content in the bounding box of the portrait is intercepted to obtain the image of the human body.
  • the key point detection model includes a feature extraction network, a dual-branch network, and an output network
  • the dual-branch network includes a location branch network and a relationship branch network
  • the invoking of the key point detection model to perform key point detection on the human body image to obtain the human body key points includes:
  • the output network is called to connect the key points of the candidate body according to the connection relationship, and normalize the key points of the candidate body after the connection according to the bounding box of the portrait to obtain the key points of the human body.
  • the location branch network includes N location segments
  • the relationship branch network includes N relationship segments
  • the calling of the location branch network to detect candidate human body key points according to the image features, and the calling of the relationship branch network to detect the connection relationships between the candidate human body key points according to the image features, includes:
  • the first position segment includes a plurality of first convolution modules and a plurality of second convolution modules connected in sequence
  • the first convolution module includes a convolution unit with a convolution kernel size of 3*3
  • the second convolution module includes a convolution unit with a convolution kernel size of 1*1.
  • the structures of the 2nd to Nth position segments are the same, and the second position segment includes a plurality of third convolution modules and a plurality of second convolution modules connected in sequence.
  • the third convolution module includes a convolution unit with a convolution kernel size of 7*7.
  • the method further includes:
  • the shooting scene is photographed to obtain a photographed image.
  • the acquiring a set of positioning points corresponding to the shooting scene according to the category area and the key points of the human body includes:
  • the category center point of each category area is determined, and the category center point of each category area and each key point of the human body are used as the positioning points to obtain the set of positioning points.
  • the determining a set of composition points corresponding to the set of positioning points includes:
  • taking each category center point and each human body key point in the preset composition template image as a composition point to obtain the composition point set.
  • it further includes:
  • FIG. 1 is a schematic flowchart of a composition prompting method provided by an embodiment of the present application.
  • the flow of the composition prompting method provided by an embodiment of the present application may be as follows:
  • a preview image of the shooting scene is obtained, and a pre-trained key point detection model is called to perform key point detection on the preview image, and the human body key points of the human body in the shooting scene are obtained.
  • the shooting scene is the scene where the camera of the electronic device is aimed after the shooting application is started, and it can be any scene, which can include people and objects.
  • For example, the electronic device can start its system application "camera" according to the user's operation. After starting the "camera", the electronic device collects real-time images through the camera; at this time, the scene at which the camera is aimed is the shooting scene.
  • the electronic device can start the "camera” according to the user's touch operation on the entrance of the "camera”, and can also start the "camera” according to the user's voice password "start the camera” and so on.
  • the composition prompting method provided in this application can be applied to image shooting of a portrait scene, where the portrait scene is a shooting scene in which a human body exists.
  • the preview image is obtained by the electronic device using the camera to perform image acquisition of the shooting scene, and it is displayed to the user by default so that the user can preview the imaging effect of the image shooting.
  • the electronic device uses the preview image collected in real time to perform key point detection on the human body in the shooting scene, so as to obtain the key points of the human body.
  • the electronic device first obtains a preview image of the shooting scene.
  • a machine learning method is also used to pre-train a key point detection model.
  • the key point detection model can be set locally on the electronic device or on the server.
  • the configuration of the key point detection model is not specifically limited in this application, and can be selected by a person of ordinary skill in the art according to actual needs.
  • In addition to obtaining the preview image of the shooting scene, the electronic device also calls the pre-trained key point detection model from the local device or the server, and inputs the obtained preview image into the pre-trained key point detection model for key point detection to obtain the key points of the human body in the shooting scene.
  • the key points of the human body are used to locate the head, neck, shoulders, elbows, hands, hips, knees and feet of the human body.
  • the key points of the head can be further subdivided into the eyes, nose, mouth, eyebrows, and contour points of various parts of the head, etc.
  • Referring to Figure 2, the human body image shown on the left side of Figure 2 is input into the pre-trained key point detection model for key point detection, and multiple human body key points are obtained, as shown on the right side of Figure 2.
  • the preview image is divided into multiple category areas, and a set of positioning points corresponding to the shooting scene is obtained according to the category areas and the key points of the human body.
  • After acquiring the preview image of the shooting scene, the electronic device not only performs key point detection on the preview image, but also divides the preview image into multiple category areas.
  • a machine learning method is also used in this application to pre-train a semantic segmentation model.
  • the semantic segmentation model can be set locally on the electronic device or on the server.
  • the configuration of the semantic segmentation model is not specifically limited in this application, and can be selected by a person of ordinary skill in the art according to actual needs.
  • the semantic segmentation model of ICNet configuration is adopted in this application.
  • the electronic device can call the pre-trained semantic segmentation model from the local device or the server, and input the obtained preview image into the pre-trained semantic segmentation model for semantic segmentation to obtain the object category information to which each area of the preview image belongs. Then, according to the category information, the electronic device divides the preview image into multiple category areas.
  • the electronic device determines multiple positioning points from the divided category areas and the human body key points according to a preset positioning point decision strategy, and the determined positioning points form the positioning point set.
  • the positioning point is used to represent the position of the human body and other objects in the shooting scene.
  • a set of composition points corresponding to the set of anchor points is determined.
  • the electronic device further determines the composition point set corresponding to the positioning point set from the acquired positioning point set according to a preset composition point decision strategy.
  • the composition points in the composition point set correspond to the positioning points in the positioning point set one-to-one, and when each positioning point matches its corresponding composition point, it is considered that the best composition can be obtained at this time.
  • A positioning point is considered to match its composition point when the distance between the positioning point and the composition point is less than or equal to a preset distance. This application does not specifically limit the value of the preset distance, which can be selected by a person of ordinary skill in the art according to actual needs.
  • A person of ordinary skill in the art can configure, according to actual needs, the constraint conditions under which the positioning point set is considered to match the composition point set; this application does not specifically limit this. For example, it can be configured that the positioning point set matches the composition point set when each positioning point in the positioning point set matches its corresponding composition point in the composition point set; or it can be configured that the positioning point set matches the composition point set when a preset number of positioning points in the positioning point set match their corresponding composition points in the composition point set.
  • The electronic device determines in real time whether the positioning point set corresponding to the shooting scene matches the composition point set. If they do not match, the electronic device outputs prompt information for instructing the user to adjust the shooting posture of the electronic device, so that the positioning point set corresponding to the shooting scene comes to match the composition point set, and the people and objects in the shooting scene thereby obtain a better composition.
  • As can be seen from the above, this application obtains the human body key points in the shooting scene by acquiring a preview image of the shooting scene and calling the pre-trained key point detection model to perform key point detection on the preview image; divides the preview image into multiple category areas and obtains the positioning point set corresponding to the shooting scene according to the category areas and the human body key points; determines the composition point set corresponding to the positioning point set; outputs prompt information for instructing the user to adjust the shooting posture of the electronic device when the positioning point set does not match the composition point set; and photographs the shooting scene to obtain a captured image when the positioning point set matches the composition point set.
  • the present application can guide the user to compose the image to improve the image quality taken by the electronic device.
  • the method further includes:
  • the shooting scene is photographed to obtain a photographed image.
  • That is, when the positioning point set matches the composition point set, the electronic device determines that the people and objects in the shooting scene have a good composition, and therefore photographs the shooting scene to obtain a high-quality captured image of the shooting scene.
  • calling the pre-trained key point detection model to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene including:
  • this application does not perform key point detection on the complete preview image, but performs key point detection on the part of the human body in the preview image.
  • After the electronic device obtains the preview image of the shooting scene, it does not directly call the key point detection model to perform key point detection on the preview image; instead, it first intercepts the human body image from the preview image, and then calls the key point detection model to perform key point detection on the intercepted human body image to obtain the key points of the human body in the shooting scene.
  • intercepting the human body image of the human body from the preview image includes:
  • the embodiment of the present application also pre-trains a portrait detection model using a machine learning method.
  • the portrait detection model is configured to take an image as an input and a portrait bounding box corresponding to the image as an output.
  • the image content within the portrait bounding box is the portrait part of the image.
  • the portrait detection model can be set locally on the electronic device or on the server.
  • the configuration of the portrait detection model is not specifically limited in this application, and it can be selected by a person of ordinary skill in the art according to actual needs.
  • the Yolo model or SSD model is used as the basic model in the application, and the portrait detection model is obtained through machine learning training.
  • When intercepting the human body image from the preview image, the electronic device can call the pre-trained portrait detection model from the local device or the server, and input the preview image into the pre-trained portrait detection model for portrait detection to obtain the portrait bounding box corresponding to the preview image. Then, the human body image can be obtained by cutting out the image content within the portrait bounding box.
  • the preview image is input into the portrait detection model for portrait detection, and the portrait bounding box corresponding to the preview image is obtained.
  • the portrait bounding box includes only the portrait part of the preview image. Then, cut out the image content in the bounding box of the portrait from the preview image to obtain the human body image in the preview image.
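  • As an illustration of the cropping step described above, the following is a minimal sketch (not the implementation of this application) that cuts the image content inside a detected portrait bounding box out of the preview image; the (x, y, w, h) box format and the numpy array layout are assumptions.

```python
import numpy as np

def crop_human_body(preview_image: np.ndarray, portrait_box: tuple) -> np.ndarray:
    """Cut the image content inside the portrait bounding box out of the preview image.

    preview_image: H x W x C array (e.g. an RGB preview frame).
    portrait_box:  (x, y, w, h) in pixels -- an assumed output format
                   for the portrait detection model.
    """
    x, y, w, h = portrait_box
    height, width = preview_image.shape[:2]
    # Clamp the box to the image bounds before slicing.
    x0, y0 = max(0, x), max(0, y)
    x1, y1 = min(width, x + w), min(height, y + h)
    return preview_image[y0:y1, x0:x1].copy()

# Example: crop a 200 x 120 region starting at (x=50, y=80) from a dummy preview frame.
preview = np.zeros((480, 640, 3), dtype=np.uint8)
body = crop_human_body(preview, (50, 80, 120, 200))
print(body.shape)  # (200, 120, 3)
```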
  • the key point detection model includes a feature extraction network, a dual branch network, and an output network.
  • the dual branch network includes a location branch network and a relationship branch network.
  • the calling of the key point detection model to perform key point detection on the human body image to obtain the human body key points includes:
  • calling the location branch network to detect the candidate human body key points according to the image features;
  • calling the relationship branch network to detect the connection relationships between the candidate human body key points according to the image features;
  • the key point detection model consists of three parts, namely the feature extraction network, the dual branch network and the output network.
  • the feature extraction network can be any known feature extraction network, such as VGG, MobileNet, and Resnet, etc., and its purpose is to perform feature extraction on the input image as the input of the subsequent branch network.
  • the electronic device first calls the feature extraction network to perform feature extraction on the intercepted human body image to obtain the image features of the human body image.
  • the key point detection task is segmented, and the dual branch network is used to realize key point detection.
  • One branch network focuses on detecting the human body key points that may exist in the image and is recorded as the location branch network; the other branch network focuses on detecting the connection relationships between the human body key points that may exist and is recorded as the relationship branch network.
  • After the electronic device extracts the image features of the human body image through the feature extraction network, it further calls the location branch network to detect possible human body key points based on the image features, which are recorded as candidate human body key points.
  • The output of the location branch network is a heatmap, which is a three-dimensional matrix of size height*width*keypoints, where height and width represent the height and the width respectively, and keypoints represents the number of candidate human body key points; that is, each candidate human body key point corresponds to a height*width matrix.
  • the value of each position in the matrix indicates the possibility that the candidate body key point is in this position. The larger the value, the more likely the candidate body key point is in this position. For example, you can take the position of the maximum value in each area in the heatmap to get the key points of the corresponding candidate body.
  • Alternatively, the heatmap can be max-pooled, the heatmaps before and after pooling can be compared, and the positions whose values are the same in both can be selected as the candidate human body key points.
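  • The peak-picking idea just described can be sketched as follows, using a max filter as the "max pooling" step and keeping positions whose values are unchanged by pooling; the heatmap layout (height*width*keypoints), the filter size, and the score threshold are assumptions.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def pick_candidate_keypoints(heatmap: np.ndarray, threshold: float = 0.1):
    """Pick one (row, col) candidate position per keypoint channel of the heatmap.

    heatmap: height x width x keypoints array of keypoint confidences.
    A position is kept as a peak when its value is unchanged by max pooling
    (i.e. it is a local maximum) and exceeds the threshold.
    """
    candidates = []
    for k in range(heatmap.shape[2]):
        channel = heatmap[:, :, k]
        pooled = maximum_filter(channel, size=3)          # "max pooling" of the heatmap
        peaks = (channel == pooled) & (channel > threshold)
        if not peaks.any():
            candidates.append(None)                       # this keypoint was not found
            continue
        ys, xs = np.nonzero(peaks)
        best = int(np.argmax(channel[ys, xs]))            # keep the strongest peak
        candidates.append((int(ys[best]), int(xs[best])))
    return candidates

# Example with a random 46 x 46 heatmap for 17 assumed keypoint channels.
print(pick_candidate_keypoints(np.random.rand(46, 46, 17))[:3])
```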
  • the electronic device also calls the relationship branch network to obtain the connection relationship between the key points of the candidate body based on the aforementioned image feature detection.
  • The output of the relationship branch network is a pafmap, which is a three-dimensional matrix of size height*width*(2*limbs), where limbs represents the number of limbs (a "limb" here is not a limb in the narrow anatomical sense, but the region between two associated key points; for example, the connection between the left eye and the right eye is regarded as a limb, and the connection between the neck and the left shoulder is regarded as a limb).
  • Each limb corresponds to a height*width*2 matrix, which can be regarded as a 2-channel heat map.
  • Each position of this heat map has 2 values, namely x and y, and the vector (x, y) represents the direction of the limb at this position (when x and y are both 0, there is no limb at this position); this represents the connection relationship between the candidate human body key points.
  • After obtaining the candidate human body key points and their connection relationships, the candidate key points can be connected according to the connection relationships to obtain a complete human body. Specifically, the pafmap corresponding to one limb is taken at a time, and the candidate key points at the two ends of the limb are connected.
  • The confidence of a candidate limb between two candidate key points d_j1 and d_j2 is obtained by integrating the pafmap along the line segment between them: E = ∫_{u=0}^{u=1} L_c(P(u)) · (d_j2 − d_j1) / ‖d_j2 − d_j1‖_2 du, where P(u) is the position obtained by interpolating between the two key points, namely: P(u) = (1 − u) · d_j1 + u · d_j2.
  • In practice, u is sampled at uniform intervals on [0, 1], and the integral is approximated by a summation over the samples.
  • L_c(P(u)) is the value at P(u) in the pafmap.
  • In this way, the possible connections between adjacent candidate key points, that is, the potential limbs, can be obtained, thereby completing the connection of the human body.
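  • The line integral above can be approximated by sampling u at uniform intervals, as in the following sketch; the pafmap layout (two channels per limb holding the x and y components) and the number of samples are assumptions.

```python
import numpy as np

def limb_score(pafmap_limb: np.ndarray, p1, p2, num_samples: int = 10) -> float:
    """Approximate the association score E between two candidate key points.

    pafmap_limb: height x width x 2 field holding the (x, y) limb direction at
                 each pixel (the 2-channel heat map of one limb).
    p1, p2:      (x, y) pixel coordinates of the candidate key points at the
                 two ends of the limb.
    """
    p1 = np.asarray(p1, dtype=float)
    p2 = np.asarray(p2, dtype=float)
    direction = p2 - p1
    norm = np.linalg.norm(direction)
    if norm < 1e-6:
        return 0.0
    unit = direction / norm                       # (d_j2 - d_j1) / ||d_j2 - d_j1||
    score = 0.0
    for u in np.linspace(0.0, 1.0, num_samples):
        x, y = (1.0 - u) * p1 + u * p2            # P(u): interpolated position
        paf_vec = pafmap_limb[int(round(y)), int(round(x))]   # L_c(P(u))
        score += float(np.dot(paf_vec, unit))     # projection onto the limb direction
    return score / num_samples

# Example: a field pointing along +x gives a high score for a horizontal limb.
field = np.zeros((10, 10, 2))
field[..., 0] = 1.0
print(limb_score(field, (1, 5), (8, 5)))  # close to 1.0
```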
  • After completing the connection of the candidate human body key points, the electronic device also normalizes the connected candidate key points according to the portrait bounding box to obtain the human body key points.
  • The normalization is carried out according to the following formulas: x' = x / w, y' = y / h, where x and y represent the abscissa and ordinate of a candidate human body key point, x' and y' represent the abscissa and ordinate of the human body key point obtained by normalizing that candidate key point, w represents the width of the portrait bounding box, and h represents the height of the portrait bounding box.
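  • A one-line illustration of this normalization, assuming the candidate key point coordinates are expressed relative to the portrait bounding box:

```python
def normalize_keypoints(keypoints, box_w, box_h):
    """Normalize candidate key point coordinates by the portrait bounding box:
    x' = x / w, y' = y / h, assuming (x, y) are relative to the box origin."""
    return [(x / box_w, y / box_h) for (x, y) in keypoints]

print(normalize_keypoints([(60, 150), (90, 300)], box_w=120, box_h=400))
# [(0.5, 0.375), (0.75, 0.75)]
```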
  • the location branch network includes N location segments
  • the relationship branch network includes N relationship segments
  • the calling of the location branch network to detect the candidate human body key points according to the image features, and the calling of the relationship branch network to detect the connection relationships between the candidate human body key points according to the image features, includes:
  • In the embodiment of the present application, the location branch network includes N location segments, and the relationship branch network includes N relationship segments, where N is a positive integer greater than 2 whose value can be set by a person of ordinary skill in the art according to actual needs.
  • The location branch network includes N location segments, namely location segment 1 to location segment N, and the relationship branch network includes N relationship segments, namely relationship segment 1 to relationship segment N.
  • location segment 1 and relation segment 1 form network segment 1
  • location segment 2 and relation segment 2 form network segment 2
  • location segment N and relation segment N form network segment N
  • the dual-branch network constructed in the embodiment of this application can be regarded as being composed of multiple network segments, such as network segment 1 to network segment N shown in FIG. 5, each of which includes a corresponding location segment and relationship segment.
  • The electronic device inputs the image features extracted by the feature extraction network into location segment 1 of network segment 1 for detection to obtain the first group of candidate human body key points output by location segment 1, and inputs the same image features into relationship segment 1 of network segment 1 for detection to obtain the connection relationships between the first group of candidate key points output by relationship segment 1. Then, the first group of candidate key points (coordinates), the connection relationships between the first group of candidate key points, and the image features are fused to obtain the first fusion feature as the output of network segment 1.
  • The first fusion feature output by network segment 1 is input into location segment 2 of network segment 2 for detection to obtain the second group of candidate human body key points output by location segment 2, and the first fusion feature is also input into relationship segment 2 of network segment 2 for detection to obtain the connection relationships between the second group of candidate key points output by relationship segment 2. Then, the second group of candidate key points (coordinates), the connection relationships between the second group of candidate key points, and the image features are fused to obtain the second fusion feature as the output of network segment 2. This process continues until location segment N in network segment N detects the Nth group of candidate key points according to the (N-1)th fusion feature output by network segment N-1, and relationship segment N in network segment N detects the connection relationships between the Nth group of candidate key points according to the same (N-1)th fusion feature.
  • Finally, the Nth group of candidate key points output by location segment N is taken as the candidate human body key points finally output by the location branch network, and the connection relationships between the Nth group of candidate key points output by relationship segment N are taken as the connection relationships finally output by the relationship branch network.
  • the first position segment includes a plurality of first convolution modules and a plurality of second convolution modules connected in sequence.
  • The structure of the first relationship segment is the same as that of the first location segment, but the two do not share parameters.
  • the first position segment includes a plurality of first convolution modules and a plurality of second convolution modules that are sequentially connected, wherein the first convolution module includes a convolution unit with a convolution kernel size of 3*3, and the second convolution module includes a convolution unit with a convolution kernel size of 1*1.
  • the numbers of the first convolution modules and the second convolution modules constituting the first position segment are not specifically limited, and can be configured by a person of ordinary skill in the art according to actual needs; for example, in the embodiment of the present application, three first convolution modules and two second convolution modules are used, as shown in FIG. 6.
  • the structures of the second to N position segments are the same, and the second position segment includes a plurality of third convolution modules and a plurality of second convolution modules connected in sequence.
  • the second position segment includes a plurality of third convolution modules and a plurality of second convolution modules that are sequentially connected, wherein the third convolution module includes a convolution unit with a convolution kernel size of 7*7.
  • the number of the third convolution module and the second convolution module constituting the second position segment is not specifically limited, and can be configured by a person of ordinary skill in the art according to actual needs.
  • For example, in the embodiment of the present application, five third convolution modules and two second convolution modules are used, as shown in Figure 7.
  • each 7*7 convolution unit can be replaced with three 3*3 convolution units to reduce the amount of processed data.
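  • As a concrete illustration of the segment structures described above, the following PyTorch sketch builds one possible first position segment (three 3*3 convolution modules followed by two 1*1 convolution modules) and one possible second position segment (five 7*7 convolution modules followed by two 1*1 convolution modules); the use of PyTorch, the channel counts, the activation functions, and the number of keypoint channels are assumptions not specified by this application.

```python
import torch
import torch.nn as nn

def conv_module(in_ch, out_ch, kernel):
    """One convolution module: a single conv unit with the given kernel size."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=kernel, padding=kernel // 2),
        nn.ReLU(inplace=True),
    )

# First position segment: three 3x3 modules + two 1x1 modules (cf. Fig. 6).
first_position_segment = nn.Sequential(
    conv_module(128, 128, 3),
    conv_module(128, 128, 3),
    conv_module(128, 128, 3),
    conv_module(128, 128, 1),
    conv_module(128, 17, 1),   # 17 = assumed number of keypoint channels
)

# Second position segment: five 7x7 modules + two 1x1 modules (cf. Fig. 7).
second_position_segment = nn.Sequential(
    conv_module(128 + 17, 128, 7),
    conv_module(128, 128, 7),
    conv_module(128, 128, 7),
    conv_module(128, 128, 7),
    conv_module(128, 128, 7),
    conv_module(128, 128, 1),
    conv_module(128, 17, 1),
)

features = torch.randn(1, 128, 46, 46)                 # image features from the backbone
heatmap1 = first_position_segment(features)            # first group of candidates
heatmap2 = second_position_segment(torch.cat([features, heatmap1], dim=1))
print(heatmap1.shape, heatmap2.shape)
```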
  • acquiring a set of positioning points corresponding to the shooting scene according to the category area and the key points of the human body includes:
  • the category center point of each category area is determined; the category center point of each category area is taken as a positioning point, each human body key point is taken as a positioning point, and these positioning points form the positioning point set.
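  • A minimal sketch of building the positioning point set from a semantic segmentation result and the detected human body key points; representing the segmentation result as a label map with one integer class id per pixel is an assumption.

```python
import numpy as np

def positioning_point_set(label_map: np.ndarray, body_keypoints):
    """Combine category center points and human body key points into one set.

    label_map:      H x W array of integer class ids from semantic segmentation,
                    each id marking one category area of the preview image.
    body_keypoints: list of (x, y) human body key points.
    """
    points = []
    for class_id in np.unique(label_map):
        ys, xs = np.nonzero(label_map == class_id)
        # Category center point: the mean pixel position of the category area.
        points.append((float(xs.mean()), float(ys.mean())))
    points.extend((float(x), float(y)) for x, y in body_keypoints)
    return points

label_map = np.zeros((4, 4), dtype=int)
label_map[:, 2:] = 1                       # two category areas
print(positioning_point_set(label_map, [(1.0, 3.0)]))
```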
  • determining the set of composition points corresponding to the set of anchor points includes:
  • determining, according to the positioning point set, the preset composition template image with the highest similarity to the preview image, and taking each category center point and each human body key point in the preset composition template image as a composition point to obtain the composition point set.
  • this application constructs a portrait composition database in advance, and the portrait composition database includes a plurality of preset composition template images.
  • the portrait composition database is constructed as follows:
  • the number of clustering categories can be determined according to the scenes of the collected images and the distribution of human body postures; for example, more categories can be set when the scenes and postures vary widely, and fewer categories can be set when they are more uniform, which can be specifically configured by a person of ordinary skill in the art according to actual needs.
  • When determining the composition point set corresponding to the positioning point set, the electronic device may use the positioning point set (including the category center points and the human body key points of the preview image) and the aspect ratio of the portrait bounding box of the preview image as the features of the preview image, and determine, from the portrait composition database, the preset composition template image with the highest similarity to the preview image (measured, for example, by the Minkowski distance). Then, each category center point and each human body key point of the determined preset composition template image are used as composition points to obtain the composition point set.
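  • The template selection step can be sketched as a nearest-neighbour search over feature vectors built from the positioning points and the portrait-box aspect ratio; flattening the points into a fixed-length vector and the Minkowski order p are assumptions.

```python
import numpy as np

def to_feature(points, aspect_ratio):
    """Flatten the positioning points plus the portrait-box aspect ratio into one vector."""
    return np.array([c for p in points for c in p] + [aspect_ratio], dtype=float)

def best_template(preview_points, preview_aspect, templates, p: int = 2):
    """Return the index of the preset composition template whose feature vector has
    the smallest Minkowski distance (order p) to the preview's feature vector.

    templates: list of dicts with 'points' and 'aspect' entries, where 'points' has
               the same number of points as the preview (so the vectors align).
    """
    query = to_feature(preview_points, preview_aspect)
    best_idx, best_dist = None, float("inf")
    for i, tpl in enumerate(templates):
        feat = to_feature(tpl["points"], tpl["aspect"])
        dist = float(np.sum(np.abs(query - feat) ** p) ** (1.0 / p))  # Minkowski distance
        if dist < best_dist:
            best_idx, best_dist = i, dist
    return best_idx, best_dist

templates = [{"points": [(0.3, 0.5), (0.6, 0.5)], "aspect": 0.75},
             {"points": [(0.4, 0.4), (0.7, 0.6)], "aspect": 0.60}]
print(best_template([(0.35, 0.5), (0.65, 0.5)], 0.72, templates))  # template 0 is closer
```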
  • the electronic device displays the composition point set and the positioning point set in the preview image.
  • For example, the composition point set includes composition point 1 and composition point 2, and the positioning point set includes positioning point 1 corresponding to composition point 1 and positioning point 2 corresponding to composition point 2.
  • A pointing arrow from positioning point 1 to composition point 1 and a pointing arrow from positioning point 2 to composition point 2 are displayed as the prompt information for adjusting the shooting posture of the electronic device, thereby guiding the user to compose the image.
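  • For the on-screen prompt, the following is a minimal OpenCV sketch that overlays the positioning points, the composition points, and a pointing arrow from each positioning point to its corresponding composition point on the preview frame; the colours, radii, and one-to-one pairing by list order are assumptions.

```python
import numpy as np
import cv2

def draw_composition_prompt(preview, positioning_points, composition_points):
    """Draw each positioning point, its composition point, and an arrow between them."""
    canvas = preview.copy()
    for (px, py), (cx, cy) in zip(positioning_points, composition_points):
        cv2.circle(canvas, (int(px), int(py)), 6, (0, 0, 255), -1)     # positioning point
        cv2.circle(canvas, (int(cx), int(cy)), 6, (0, 255, 0), -1)     # composition point
        cv2.arrowedLine(canvas, (int(px), int(py)), (int(cx), int(cy)),
                        (255, 255, 255), 2, tipLength=0.2)             # adjustment hint
    return canvas

frame = np.zeros((480, 640, 3), dtype=np.uint8)
out = draw_composition_prompt(frame, [(100, 300), (400, 350)], [(213, 160), (427, 160)])
```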
  • composition prompting method provided by the present application further includes:
  • the electronic device combines positioning points and composition points corresponding to the same category into a data group, and combines positioning points and composition points corresponding to the same human body position into a data group, thereby obtaining multiple data sets.
  • For each data group, the electronic device calculates the distance (for example, the Euclidean distance) between the positioning point and the composition point, and then calculates the sum of these distances over the multiple data groups.
  • The electronic device determines whether the calculated distance sum is within the preset threshold; if so, it determines that the positioning point set matches the composition point set, otherwise they do not match.
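  • A minimal sketch of this matching decision: pair each positioning point with its corresponding composition point, sum the Euclidean distances, and compare the sum against a preset threshold; the threshold value is an assumption.

```python
import math

def sets_match(positioning_points, composition_points, threshold: float = 50.0) -> bool:
    """Return True when the summed Euclidean distance over all corresponding
    (positioning point, composition point) pairs is within the preset threshold."""
    total = sum(math.dist(p, c) for p, c in zip(positioning_points, composition_points))
    return total <= threshold

# Example: two pairs with distances 5 and 10 -> sum 15 <= 50, so the sets match.
print(sets_match([(0, 0), (100, 100)], [(3, 4), (100, 110)]))  # True
```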
  • FIG. 9 is a schematic flowchart of a composition prompting method provided by an embodiment of the present application.
  • the flow of the composition prompting method provided by an embodiment of the present application may be as follows:
  • the electronic device obtains a preview image of the shooting scene, and intercepts a human body image from the preview image.
  • the shooting scene is the scene where the camera of the electronic device is aimed after the shooting application is started, and it can be any scene, which can include people and objects.
  • the preview image is obtained by the electronic device using the camera to perform image acquisition of the shooting scene, and it is displayed to the user by default so that the user can preview the imaging effect of the image shooting.
  • the electronic device uses the preview image collected in real time to perform key point detection on the human body in the shooting scene, so as to obtain the key points of the human body.
  • the electronic device first obtains a preview image of the shooting scene.
  • the embodiment of the present application also pre-trains a portrait detection model using a machine learning method.
  • the portrait detection model is configured to take an image as an input and a portrait bounding box corresponding to the image as an output.
  • the image content within the portrait bounding box is the portrait part of the image.
  • the electronic device can call the portrait detection model to perform portrait detection on the preview image, and obtain the portrait bounding box corresponding to the preview image.
  • the human body image can be obtained by cutting out the image content in the bounding box of the portrait.
  • the electronic device calls the pre-trained key point detection model to perform key point detection on the human body image to obtain the human body key points of the human body in the shooting scene.
  • a machine learning method is also used to pre-train a key point detection model.
  • the key point detection model can be set locally on the electronic device or on the server.
  • the configuration of the key point detection model is not specifically limited in this application, and can be selected by a person of ordinary skill in the art according to actual needs.
  • In addition to obtaining the preview image of the shooting scene, the electronic device also calls the pre-trained key point detection model from the local device or the server, and inputs the obtained human body image into the pre-trained key point detection model for key point detection to obtain the key points of the human body in the shooting scene.
  • the key points of the human body are used to locate the head, neck, shoulders, elbows, hands, hips, knees and feet of the human body.
  • the key points of the head can be further subdivided into the eyes, nose, mouth, eyebrows, and contour points of various parts of the head, etc.
  • Referring to Figure 2, the human body image shown on the left side of Figure 2 is input into the pre-trained key point detection model for key point detection, and multiple human body key points are obtained, as shown on the right side of Figure 2.
  • the electronic device divides the preview image into multiple category areas, and determines the category center point of each category area.
  • the electronic device uses the center point of each category and each key point of the human body as positioning points to obtain a set of positioning points.
  • After acquiring the preview image of the shooting scene, the electronic device not only performs key point detection on the preview image, but also divides the preview image into multiple category areas.
  • a machine learning method is also used in this application to pre-train a semantic segmentation model.
  • the semantic segmentation model can be set locally on the electronic device or on the server.
  • the configuration of the semantic segmentation model is not specifically limited in this application, and can be selected by a person of ordinary skill in the art according to actual needs.
  • the semantic segmentation model of ICNet configuration is adopted in this application.
  • the electronic device can call the pre-trained semantic segmentation model from the local device or the server, and input the obtained preview image into the pre-trained semantic segmentation model for semantic segmentation to obtain the object category information to which each area of the preview image belongs. Then, according to the category information, the electronic device divides the preview image into multiple category areas, and determines the category center point of each category area.
  • the category center point of each category area is taken as an anchor point, and each key point of the human body is taken as an anchor point, and these anchor points form an anchor point set.
  • the electronic device determines the preset composition template image with the highest similarity to the preview image according to the set of positioning points.
  • the electronic device uses each category center point and each human body key point in the preset composition template image as a composition point to obtain a set of composition points.
  • this application constructs a portrait composition database in advance, and the portrait composition data includes a plurality of preset composition template images.
  • When determining the composition point set corresponding to the positioning point set, the electronic device may use the positioning point set (including the category center points and the human body key points of the preview image) and the aspect ratio of the portrait bounding box of the preview image as the features of the preview image, and determine, from the portrait composition database, the preset composition template image with the highest similarity to the preview image (measured, for example, by the Minkowski distance). Then, each category center point and each human body key point of the determined preset composition template image are used as composition points to obtain the composition point set.
  • composition points in the composition point set correspond to the positioning points in the positioning point set one-to-one, and when each positioning point matches its corresponding composition point, it is considered that the best composition can be obtained at this time.
  • A positioning point is considered to match its composition point when the distance between the positioning point and the composition point is less than or equal to a preset distance. This application does not specifically limit the value of the preset distance, which can be selected by a person of ordinary skill in the art according to actual needs.
  • the electronic device when the positioning point set does not match the composition point set, the electronic device outputs prompt information for instructing to adjust the shooting posture of the electronic device.
  • A person of ordinary skill in the art can configure, according to actual needs, the constraint conditions under which the positioning point set is considered to match the composition point set; this application does not specifically limit this. For example, it can be configured that the positioning point set matches the composition point set when each positioning point in the positioning point set matches its corresponding composition point in the composition point set; or it can be configured that the positioning point set matches the composition point set when a preset number of positioning points in the positioning point set match their corresponding composition points in the composition point set.
  • The electronic device determines in real time whether the positioning point set corresponding to the shooting scene matches the composition point set. If they do not match, the electronic device outputs prompt information for instructing the user to adjust the shooting posture of the electronic device, so that the positioning point set corresponding to the shooting scene comes to match the composition point set, and the people and objects in the shooting scene thereby obtain a better composition.
  • the electronic device displays the composition point set and the positioning point set in the preview image.
  • For example, the composition point set includes composition point 1 and composition point 2, and the positioning point set includes positioning point 1 corresponding to composition point 1 and positioning point 2 corresponding to composition point 2.
  • A pointing arrow from positioning point 1 to composition point 1 and a pointing arrow from positioning point 2 to composition point 2 are displayed as the prompt information for adjusting the shooting posture of the electronic device, thereby guiding the user to compose the image.
  • the electronic device photographs the shooting scene to obtain a photographed image.
  • That is, when the positioning point set matches the composition point set, the electronic device determines that the people and objects in the shooting scene have a good composition, and therefore photographs the shooting scene to obtain a high-quality captured image of the shooting scene.
  • a composition prompting device is also provided.
  • FIG. 10 is a schematic structural diagram of a composition prompting device provided by an embodiment of the application.
  • the composition prompting device is applied to electronic equipment, and the composition prompting device includes a key point detection module 301, a positioning point determination module 302, a composition point determination module 303, a composition prompt module 304, and an image capturing module 305, as follows:
  • the key point detection module 301 is used to obtain a preview image of the shooting scene, and call a pre-trained key point detection model to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene;
  • the positioning point determination module 302 is configured to divide the preview image into multiple category areas, and obtain a set of positioning points corresponding to the shooting scene according to the category areas and the key points of the human body;
  • the composition point determination module 303 is used to determine the composition point set corresponding to the positioning point set;
  • the composition prompting module 304 is configured to output prompt information for instructing to adjust the shooting posture of the electronic device when the set of positioning points and the set of composition points do not match.
  • the composition prompting device provided by the present application further includes an image capturing module, which is used to capture the shooting scene when the positioning point set matches the composition point set to obtain the captured image.
  • when the pre-trained key point detection model is called to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene, the key point detection module 301 is configured to:
  • when the human body image of the human body is intercepted from the preview image, the key point detection module 301 is configured to:
  • the image content in the bounding box of the portrait is intercepted to obtain the human body image.
  • the key point detection model includes a feature extraction network, a dual branch network, and an output network.
  • the dual branch network includes a location branch network and a relationship branch network.
  • when the key point detection model is called to perform key point detection on the human body image to obtain the human body key points, the key point detection module 301 is configured to:
  • call the location branch network to detect the candidate human body key points according to the image features;
  • call the relationship branch network to detect the connection relationships between the candidate human body key points according to the image features;
  • the output network is called to connect the key points of the candidate body according to the connection relationship, and normalize the key points of the candidate body after the connection according to the bounding box of the portrait to obtain the key points of the human body.
  • the location branch network includes N location segments
  • the relationship branch network includes N relationship segments
  • when the location branch network is called to detect the candidate human body key points according to the image features and the relationship branch network is called to detect the connection relationships between the candidate key points according to the image features, the key point detection module 301 is configured to: call the first location segment to detect the first group of candidate human body key points according to the image features, and call the first relationship segment to detect the connection relationships between the first group of candidate key points according to the image features;
  • fuse the first group of candidate key points, the connection relationships between the first group of candidate key points, and the image features to obtain the first fusion feature; call the second location segment to detect the second group of candidate human body key points according to the first fusion feature, and call the second relationship segment to detect the connection relationships between the second group of candidate key points according to the first fusion feature; and so on;
  • the Nth group of candidate key points is taken as the candidate human body key points detected by the location branch network, and the connection relationships between the Nth group of candidate key points are taken as the connection relationships between the candidate key points detected by the relationship branch network.
  • the first position segment includes a plurality of first convolution modules and a plurality of second convolution modules connected in sequence.
  • the first convolution module includes a convolution unit with a convolution kernel size of 3*3
  • the second convolution module includes a convolution unit with a convolution kernel size of 1*1.
  • the structure of the 2-Nth position segment is the same, and the second position segment includes a plurality of third convolution modules and a plurality of second convolution modules connected in sequence.
  • the third convolution module includes a convolution unit with a convolution kernel size of 7*7.
  • the positioning point determination module 302 when acquiring a set of positioning points corresponding to the shooting scene according to the category area and the key points of the human body, is configured to:
  • the composition point determination module 303 when determining the composition point set corresponding to the anchor point set, is configured to:
  • determining, according to the positioning point set, the preset composition template image with the highest similarity to the preview image;
  • the center point of each category and each key point of the human body in the preset composition template image are used as composition points to obtain a set of composition points.
  • composition prompting device provided by the present application further includes a judgment module for:
  • The composition prompting device provided in this embodiment of the application belongs to the same concept as the composition prompting method in the above embodiments. Any method provided in the composition prompting method embodiments can be run on the composition prompting device; for details of the specific implementation process, refer to the above embodiments, which will not be repeated here.
  • an electronic device is also provided.
  • the electronic device includes a processor 401 and a memory 402.
  • the processor 401 in the embodiment of the present application is a general-purpose processor, such as an ARM architecture processor.
  • a computer program is stored in the memory 402, which may be a high-speed random access memory or a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
  • the memory 402 may also include a memory controller to provide the processor 401 with access to the computer program in the memory 402 to implement the following functions:
  • when the pre-trained key point detection model is called to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene, the processor 401 is configured to execute:
  • when the human body image of the human body is intercepted from the preview image, the processor 401 is configured to execute:
  • the image content in the bounding box of the portrait is intercepted to obtain the human body image.
  • the key point detection model includes a feature extraction network, a dual branch network, and an output network.
  • the dual branch network includes a location branch network and a relationship branch network.
  • the key point detection model is called to perform key point detection on the human body image to obtain the human body key points.
  • the processor 401 is used to execute:
  • call the location branch network to detect candidate body key points according to the image features
  • call the relationship branch network to detect the connection relationships between the candidate body key points according to the image features
  • the output network is called to connect the key points of the candidate body according to the connection relationship, and normalize the key points of the candidate body after the connection according to the bounding box of the portrait to obtain the key points of the human body.
  • the location branch network includes N location segments
  • the relationship branch network includes N relationship segments
  • call the location branch network to detect candidate body key points according to the image features
  • call the relationship branch network to detect the connection relationships between the candidate body key points according to the image features.
  • fuse the first group of candidate body key points, the connection relationships among the first group of candidate body key points, and the image features to obtain the first fusion feature; call the second position segment to detect the second group of candidate body key points according to the first fusion feature, and call the second relationship segment to detect the connection relationships among the second group of candidate body key points according to the first fusion feature;
  • the Nth group of candidate body key points is taken as the candidate body key points detected by the location branch network, and the connection relationships among the Nth group of candidate body key points are taken as the connection relationships between candidate body key points detected by the relationship branch network.
  • the first position segment includes a plurality of first convolution modules and a plurality of second convolution modules connected in sequence.
  • the first convolution module includes a convolution unit with a convolution kernel size of 3*3.
  • the second convolution module includes a convolution unit with a convolution kernel size of 1*1.
  • the 2nd to Nth position segments have the same structure, and the second position segment includes a plurality of third convolution modules and a plurality of second convolution modules connected in sequence.
  • the third convolution module includes a convolution unit with a convolution kernel size of 7*7.
  • when acquiring the positioning point set corresponding to the shooting scene according to the category areas and the human body key points, the processor 401 is configured to execute:
  • when determining the composition point set corresponding to the positioning point set, the processor 401 is configured to execute:
  • determine, according to the positioning point set, the preset composition template image with the highest similarity to the preview image;
  • take the center point of each category area and each human body key point in the preset composition template image as composition points to obtain the composition point set.
  • the processor 401 is further configured to execute:
  • the electronic device provided in this embodiment of the application belongs to the same concept as the composition prompting method in the above embodiments. Any method provided in the composition prompting method embodiments can be run on the electronic device; for details of the specific implementation process, refer to the embodiments of the composition prompting method, which will not be repeated here.
  • the composition prompting method of the embodiments of the present application can be implemented by a computer program controlling the relevant hardware.
  • the computer program may be stored in a computer-readable storage medium, for example in the memory of an electronic device, and executed by a processor in the electronic device; its execution may include the flow of the embodiments of the composition prompting method.
  • the storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, etc.
  • the composition prompting method, model training method, apparatus, storage medium, and electronic device provided by the embodiments of the application are described in detail above. Specific examples are used herein to illustrate the principles and implementation of the application; the description of the above embodiments is only intended to help understand the method and core idea of the application. At the same time, for those skilled in the art, there will be changes in the specific implementation and scope of application according to the idea of the application. In summary, the content of this specification should not be construed as a limitation on the application.


Abstract

Disclosed in the present application are a photographic composition prompting method and apparatus, a storage medium, and an electronic device. A preview image of a photography scene is obtained and key point detection is performed on it to obtain human body key points of a human body in the photography scene; the preview image is divided into a plurality of category areas, a positioning point set corresponding to the photography scene is obtained in combination with the human body key points, and a corresponding composition point set is determined; if the positioning point set does not match the composition point set, prompt information indicating that the photographing attitude of the electronic device should be adjusted is output.

Description

Composition prompting method, apparatus, storage medium, and electronic device
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on February 27, 2020, with application number 202010125410.4 and entitled "Composition prompting method, apparatus, storage medium and electronic device", the entire content of which is incorporated into this application by reference.
Technical Field
This application relates to the technical field of image processing, and in particular to a composition prompting method, apparatus, storage medium, and electronic device.
Background
At present, people's lives are inseparable from electronic devices such as smart phones and tablet computers. The rich functions provided by these electronic devices allow people to entertain themselves and work anytime and anywhere. For example, using the shooting function of an electronic device, a user can take pictures anytime and anywhere. However, capturing high-quality images requires not only strong shooting capability from the electronic device, but also a certain level of professional shooting skill from the user.
Summary
The embodiments of the present application provide a composition prompting method, apparatus, storage medium, and electronic device, which can improve the quality of images captured by the electronic device.
The composition prompting method provided by the embodiments of the present application includes:
acquiring a preview image of a shooting scene, and calling a pre-trained key point detection model to perform key point detection on the preview image to obtain human body key points of a human body in the shooting scene;
dividing the preview image into a plurality of category areas, and acquiring a positioning point set corresponding to the shooting scene according to the category areas and the human body key points;
determining a composition point set corresponding to the positioning point set;
when the positioning point set does not match the composition point set, outputting prompt information for instructing to adjust the shooting posture of the electronic device.
The composition prompting apparatus provided by the embodiments of the present application includes:
a key point detection module, configured to acquire a preview image of a shooting scene, and call a pre-trained key point detection model to perform key point detection on the preview image to obtain human body key points of a human body in the shooting scene;
a positioning point determination module, configured to divide the preview image into a plurality of category areas, and acquire a positioning point set corresponding to the shooting scene according to the category areas and the human body key points;
a composition point determination module, configured to determine a composition point set corresponding to the positioning point set;
a composition prompting module, configured to output prompt information for instructing to adjust the shooting posture of the electronic device when the positioning point set does not match the composition point set.
The storage medium provided by the embodiments of the present application stores a computer program which, when loaded by a processor, executes the composition prompting method provided by the present application.
The electronic device provided by the embodiments of the present application includes a processor and a memory. The memory stores a computer program, and the processor loads the computer program to execute the composition prompting method provided by the present application.
Brief Description of the Drawings
In order to describe the technical solutions in the embodiments of the present application more clearly, the drawings needed for the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and for those skilled in the art, other drawings can be obtained from these drawings without creative work.
FIG. 1 is a schematic flowchart of a composition prompting method provided by an embodiment of the present application.
FIG. 2 is a schematic diagram of human body key points detected in an embodiment of the present application.
FIG. 3 is a schematic diagram of intercepting a human body image in an embodiment of the present application.
FIG. 4 is a schematic structural diagram of a key point detection model provided by an embodiment of the present application.
FIG. 5 is a detailed structural diagram of the key point detection model provided by an embodiment of the present application.
FIG. 6 is a schematic structural diagram of the 1st position segment in an embodiment of the present application.
FIG. 7 is a schematic structural diagram of the 2nd position segment in an embodiment of the present application.
FIG. 8 is an example diagram of outputting prompt information in an embodiment of the present application.
FIG. 9 is another schematic flowchart of the composition prompting method provided by an embodiment of the present application.
FIG. 10 is a schematic structural diagram of a composition prompting apparatus provided by an embodiment of the present application.
FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
Please refer to the drawings, in which the same reference signs represent the same components. The principles of the present application are illustrated by implementation in a suitable computing environment. The following description is based on the illustrated specific embodiments of the present application, and should not be construed as limiting other specific embodiments of the present application that are not described in detail herein.
Artificial Intelligence (AI) is a theory, method, technology, and application system that uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level technology and software-level technology. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technology, operation/interaction systems, and mechatronics. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, and machine learning/deep learning.
Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how computers simulate or realize human learning behaviors in order to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and its applications cover all fields of artificial intelligence. Machine learning and deep learning usually include technologies such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning.
The solutions provided in the embodiments of the present application relate to machine learning technology in artificial intelligence, and are specifically described by the following embodiments.
The embodiments of the present application provide a model training method, a composition prompting method, a composition prompting apparatus, a portrait segmentation apparatus, a storage medium, and an electronic device. The execution subject of the model training method may be the composition prompting apparatus provided in the embodiments of the present application, or an electronic device integrated with the composition prompting apparatus, where the composition prompting apparatus may be implemented in hardware or software; the execution subject of the composition prompting method may be the portrait segmentation apparatus provided in the embodiments of the present application, or an electronic device integrated with the portrait segmentation apparatus, where the portrait segmentation apparatus may be implemented in hardware or software. The electronic device may be a device equipped with a processor (including but not limited to a general-purpose processor, a customized processor, etc.) and having processing capability, such as a smart phone, a tablet computer, a palmtop computer, a notebook computer, or a desktop computer.
This application provides a composition prompting method, including:
acquiring a preview image of a shooting scene, and calling a pre-trained key point detection model to perform key point detection on the preview image to obtain human body key points of a human body in the shooting scene;
dividing the preview image into a plurality of category areas, and acquiring a positioning point set corresponding to the shooting scene according to the category areas and the human body key points;
determining a composition point set corresponding to the positioning point set;
when the positioning point set does not match the composition point set, outputting prompt information for instructing to adjust the shooting posture of the electronic device.
Optionally, in an embodiment, calling the pre-trained key point detection model to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene includes:
intercepting a human body image of the human body from the preview image;
calling the key point detection model to perform key point detection on the human body image to obtain the human body key points.
Optionally, in an embodiment, intercepting the human body image of the human body from the preview image includes:
calling a pre-trained portrait detection model to perform portrait detection on the preview image to obtain a portrait bounding box corresponding to the preview image;
intercepting the image content in the portrait bounding box to obtain the human body image.
Optionally, in an embodiment, the key point detection model includes a feature extraction network, a dual-branch network, and an output network, the dual-branch network includes a location branch network and a relationship branch network, and calling the key point detection model to perform key point detection on the human body image to obtain the human body key points includes:
calling the feature extraction network to extract image features of the human body image;
calling the location branch network to detect candidate body key points according to the image features, and calling the relationship branch network to detect connection relationships between the candidate body key points according to the image features;
calling the output network to connect the candidate body key points according to the connection relationships, and normalizing the connected candidate body key points according to the portrait bounding box to obtain the human body key points.
Optionally, in an embodiment, the location branch network includes N position segments, the relationship branch network includes N relationship segments, and calling the location branch network to detect candidate body key points according to the image features as well as calling the relationship branch network to detect connection relationships between the candidate body key points according to the image features includes:
calling the 1st position segment to detect the 1st group of candidate body key points according to the image features, and calling the 1st relationship segment to detect the connection relationships among the 1st group of candidate body key points according to the image features;
fusing the 1st group of candidate body key points, the connection relationships among the 1st group of candidate body key points, and the image features to obtain a 1st fusion feature; calling the 2nd position segment to detect the 2nd group of candidate body key points according to the 1st fusion feature, and calling the 2nd relationship segment to detect the connection relationships among the 2nd group of candidate body key points according to the 1st fusion feature;
fusing the 2nd group of candidate body key points, the connection relationships among the 2nd group of candidate body key points, and the image features to obtain a 2nd fusion feature, and so on, until the Nth group of candidate body key points detected by the Nth position segment according to the (N-1)th fusion feature and the connection relationships among the Nth group of candidate body key points detected by the Nth relationship segment according to the (N-1)th fusion feature are obtained;
taking the Nth group of candidate body key points as the candidate body key points detected by the location branch network, and taking the connection relationships among the Nth group of candidate body key points as the connection relationships between the candidate body key points detected by the relationship branch network.
Optionally, in an embodiment, the 1st position segment includes a plurality of first convolution modules and a plurality of second convolution modules connected in sequence, the first convolution module includes a convolution unit with a convolution kernel size of 3*3, and the second convolution module includes a convolution unit with a convolution kernel size of 1*1.
Optionally, in an embodiment, the 2nd to Nth position segments have the same structure, and the 2nd position segment includes a plurality of third convolution modules and a plurality of the second convolution modules connected in sequence.
Optionally, in an embodiment, the third convolution module includes a convolution unit with a convolution kernel size of 7*7.
Optionally, in an embodiment, after determining the composition point set corresponding to the positioning point set, the method further includes:
when the positioning point set matches the composition point set, shooting the shooting scene to obtain a captured image.
Optionally, in an embodiment, acquiring the positioning point set corresponding to the shooting scene according to the category areas and the human body key points includes:
determining a category center point of each category area, and taking the category center point of each category area and each human body key point as positioning points to obtain the positioning point set.
Optionally, in an embodiment, determining the composition point set corresponding to the positioning point set includes:
determining, according to the positioning point set, a preset composition template image with the highest similarity to the preview image;
taking each category center point and each human body key point in the preset composition template image as composition points to obtain the composition point set.
Optionally, in an embodiment, the method further includes:
combining a positioning point and a composition point corresponding to the same category into a data group, and combining a positioning point and a composition point corresponding to the same human body position into a data group, to obtain a plurality of data groups;
calculating the distance between the positioning point and the composition point in each data group, and calculating the sum of the distances over the plurality of data groups;
when the distance sum is less than a preset threshold, determining that the positioning point set matches the composition point set.
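As a rough illustration of the matching rule set out in the preceding paragraphs, the following sketch (hypothetical helper names, not code from the application) pairs each positioning point with the composition point in the same data group, sums the pairwise distances, and compares the sum with a preset threshold.

```python
import math

def points_match(positioning_points, composition_points, threshold):
    """Decide whether a positioning point set matches a composition point set.

    positioning_points / composition_points: dicts mapping the same keys
    (a category name or a human-body-part name) to (x, y) coordinates, so that
    corresponding entries form the "data groups" described above.
    threshold: assumed preset distance-sum threshold.
    """
    distance_sum = 0.0
    for key, (px, py) in positioning_points.items():
        cx, cy = composition_points[key]            # point in the same data group
        distance_sum += math.hypot(px - cx, py - cy)
    return distance_sum < threshold

# Toy example with normalized image coordinates (values are illustrative only):
anchors = {"sky": (0.48, 0.20), "person_head": (0.55, 0.40)}
targets = {"sky": (0.50, 0.17), "person_head": (0.50, 0.38)}
print(points_match(anchors, targets, threshold=0.15))   # True for this toy data
```

Whether the threshold is applied to the raw sum or to a per-point average is a design choice left open by the application; the sketch follows the distance-sum wording above.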
Please refer to FIG. 1, which is a schematic flowchart of the composition prompting method provided by an embodiment of the present application. The flow of the composition prompting method provided by the embodiment of the present application may be as follows:
In 101, a preview image of the shooting scene is acquired, and a pre-trained key point detection model is called to perform key point detection on the preview image to obtain human body key points of a human body in the shooting scene.
The shooting scene is the scene at which the camera of the electronic device is aimed after a shooting application is started. It can be any scene, and may include people, objects, and so on.
There is no specific restriction in this application on how the shooting application of the electronic device is started or which shooting application is used. For example, the electronic device may start its system application "Camera" according to a user operation; after the "Camera" is started, the electronic device collects images in real time through its camera, and the scene at which the camera is aimed is the shooting scene. For instance, the electronic device may start the "Camera" according to the user's touch operation on the "Camera" entry, or according to the user's voice command "start the camera", and so on. The composition prompting method provided in this application is applicable to image shooting of portrait scenes, where a portrait scene is a shooting scene in which a human body is present.
It should be noted that the preview image is obtained by the electronic device capturing the shooting scene through the camera, and by default it is displayed to the user so that the user can preview the imaging effect of the shot.
In the embodiments of the present application, the electronic device uses the preview image collected in real time to perform key point detection on the human body in the shooting scene, so as to detect the human body key points of the aforementioned human body.
The electronic device first acquires the preview image of the shooting scene. It should be noted that, in the embodiments of the present application, a key point detection model is pre-trained using machine learning. The key point detection model may be set locally on the electronic device or on a server. In addition, the configuration of the key point detection model is not specifically limited in this application and can be selected by a person of ordinary skill in the art according to actual needs. Correspondingly, in addition to acquiring the preview image of the shooting scene, the electronic device also calls the pre-trained key point detection model locally or from the server, and inputs the acquired preview image into the pre-trained key point detection model for key point detection, obtaining the human body key points of the human body in the shooting scene. The human body key points are used to locate parts of the human body such as the head, neck, shoulders, elbows, hands, hips, knees, and feet, and the head key points can be further subdivided into the eyes, nose tip, mouth, eyebrows, and the contour points of the various parts of the head. For example, referring to FIG. 2, the human body image shown on the left of FIG. 2 is input into the pre-trained key point detection model for key point detection, and a plurality of human body key points are obtained, as shown on the right of FIG. 2.
In 102, the preview image is divided into a plurality of category areas, and a positioning point set corresponding to the shooting scene is acquired according to the category areas and the human body key points.
In the embodiments of the present application, after acquiring the preview image of the shooting scene, the electronic device not only performs key point detection on the preview image but also divides the preview image into a plurality of category areas.
Exemplarily, a semantic segmentation model is also pre-trained in this application using machine learning. The semantic segmentation model may be set locally on the electronic device or on a server. In addition, the configuration of the semantic segmentation model is not specifically limited in this application and can be selected by a person of ordinary skill in the art according to actual needs; for example, a semantic segmentation model with an ICNet architecture is adopted in this application.
When dividing the preview image into a plurality of category areas, the electronic device can call the pre-trained semantic segmentation model locally or from the server, and input the acquired preview image into the pre-trained semantic segmentation model for semantic segmentation to obtain the object category information to which each region of the preview image belongs. Then, according to this category information, the electronic device divides the preview image into a plurality of category areas.
Then, according to the divided category areas and the key points, the electronic device determines a plurality of positioning points according to a preset positioning point decision strategy, and the determined positioning points constitute a positioning point set. The positioning points are used to represent the positions of the human body and of other objects in the shooting scene.
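The positioning point decision strategy itself is left open here; the sketch below illustrates one plausible choice consistent with the embodiment described later (category center points plus human body key points). The function and variable names are assumptions for illustration only.

```python
import numpy as np

def build_positioning_points(label_map, body_keypoints):
    """One possible positioning point decision strategy (a sketch, not the
    strategy mandated by the application): use the centroid of every category
    area in a semantic label map plus every detected human body key point.

    label_map: HxW integer array in which each value is a category id
               (e.g. produced by an ICNet-style segmentation model).
    body_keypoints: dict of body-part name -> (x, y) pixel coordinates.
    """
    positioning_points = {}
    for category in np.unique(label_map):
        ys, xs = np.nonzero(label_map == category)
        positioning_points[f"category_{category}"] = (float(xs.mean()), float(ys.mean()))
    for part, (x, y) in body_keypoints.items():
        positioning_points[f"body_{part}"] = (float(x), float(y))
    return positioning_points

# Toy example: a 4x4 label map with two category areas and one key point.
toy_map = np.array([[0, 0, 1, 1]] * 4)
print(build_positioning_points(toy_map, {"head": (2, 1)}))
```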
In 103, a composition point set corresponding to the positioning point set is determined.
The electronic device further determines, according to the acquired positioning point set and a preset composition point decision strategy, a composition point set corresponding to the positioning point set. The composition points in the composition point set correspond one-to-one to the positioning points in the positioning point set; when every positioning point matches its corresponding composition point, it is considered that the best composition can be obtained at that moment. A positioning point matching a composition point means that the distance between the positioning point and the composition point is less than or equal to a preset distance; this application does not specifically limit the value of the preset distance, which can be set by a person of ordinary skill in the art according to actual needs.
In 104, when the positioning point set does not match the composition point set, prompt information for instructing to adjust the shooting posture of the electronic device is output.
Based on the above definition of the matching of a positioning point and a composition point, a person of ordinary skill in the art can configure, according to actual needs, the constraint under which the positioning point set is considered to match the composition point set, and this application imposes no specific restriction. For example, it can be configured that the positioning point set matches the composition point set when every positioning point in the positioning point set matches its corresponding composition point in the composition point set; alternatively, it can be configured that the positioning point set matches the composition point set when a preset number of positioning points in the positioning point set match their corresponding composition points in the composition point set.
Correspondingly, the electronic device determines in real time whether the positioning point set corresponding to the shooting scene matches the composition point set. If they do not match, prompt information for instructing to adjust the shooting posture of the electronic device is output, so that the positioning point set corresponding to the shooting scene comes to match the composition point set, and the people and objects in the shooting scene obtain a better composition.
As can be seen from the above, in this application a preview image of the shooting scene is acquired and a pre-trained key point detection model is called to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene; the preview image is divided into a plurality of category areas, a positioning point set corresponding to the shooting scene is acquired according to the category areas and the human body key points, and the composition point set corresponding to the positioning point set is determined; when the positioning point set does not match the composition point set, prompt information for instructing to adjust the shooting posture of the electronic device is output; when the positioning point set matches the composition point set, the shooting scene is shot to obtain a captured image. Compared with the related art, this application can guide the user in composition, thereby improving the quality of the images captured by the electronic device.
In an embodiment, after determining the composition point set corresponding to the positioning point set, the method further includes:
when the positioning point set matches the composition point set, shooting the shooting scene to obtain a captured image.
When the positioning point set matches the composition point set, the electronic device determines that the people and objects in the shooting scene have a good composition and shoots the shooting scene, thereby obtaining a high-quality captured image of the shooting scene.
In an embodiment, calling the pre-trained key point detection model to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene includes:
(1) intercepting a human body image of the human body from the preview image;
(2) calling the key point detection model to perform key point detection on the human body image to obtain the human body key points.
To improve the efficiency of key point detection on the preview image, this application does not perform key point detection on the complete preview image, but on the part of the preview image in which the human body is present.
After acquiring the preview image of the shooting scene, the electronic device does not directly call the key point detection model to perform key point detection on the preview image; instead, it first intercepts the human body image of the human body from the preview image, and then calls the key point detection model to perform key point detection on the intercepted human body image, thereby obtaining the human body key points of the human body in the shooting scene.
It should be noted that this application does not restrict how the human body image is intercepted from the preview image; a person of ordinary skill in the art can adopt a suitable interception method according to actual needs.
In an embodiment, intercepting the human body image of the human body from the preview image includes:
(1) calling a pre-trained portrait detection model to perform portrait detection on the preview image to obtain a portrait bounding box corresponding to the preview image;
(2) intercepting the image content in the portrait bounding box to obtain the human body image.
It should be noted that the embodiments of the present application also pre-train a portrait detection model using machine learning. The portrait detection model is configured to take an image as input and to output a portrait bounding box corresponding to the image, and the image content within the portrait bounding box is the portrait part of the image. The portrait detection model may be set locally on the electronic device or on a server. In addition, the configuration of the portrait detection model is not specifically limited in this application and can be selected by a person of ordinary skill in the art according to actual needs; for example, a Yolo model or an SSD model is used as the base model, and the portrait detection model is obtained through machine learning training.
Correspondingly, when intercepting the human body image from the preview image, the electronic device can call the pre-trained portrait detection model locally or from the server, input the preview image into the pre-trained portrait detection model for portrait detection, and obtain the portrait bounding box corresponding to the preview image. Then, the human body image is obtained by intercepting the image content in the portrait bounding box.
For example, referring to FIG. 3, the preview image contains other objects in addition to the portrait. The preview image is input into the portrait detection model for portrait detection, and the portrait bounding box corresponding to the preview image is obtained; as shown in FIG. 3, the portrait bounding box contains only the portrait part of the preview image. Then, the human body image in the preview image is obtained by intercepting the image content within the portrait bounding box from the preview image.
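A minimal sketch of this interception step, assuming the portrait detection model returns the bounding box as (x, y, w, h) pixel coordinates (the output format is not fixed by the application):

```python
import numpy as np

def crop_portrait(preview_image, bbox):
    """Crop the portrait bounding box out of the preview image.

    preview_image: HxWxC array, e.g. a decoded camera frame.
    bbox: (x, y, w, h) in pixels; this format is an assumption.
    """
    x, y, w, h = bbox
    height, width = preview_image.shape[:2]
    x0, y0 = max(0, int(x)), max(0, int(y))
    x1, y1 = min(width, int(x + w)), min(height, int(y + h))
    return preview_image[y0:y1, x0:x1]

frame = np.zeros((720, 1280, 3), dtype=np.uint8)      # stand-in preview image
human_image = crop_portrait(frame, (400, 100, 300, 600))
print(human_image.shape)                               # (600, 300, 3)
```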
In an embodiment, the key point detection model includes a feature extraction network, a dual-branch network, and an output network, and the dual-branch network includes a location branch network and a relationship branch network. Calling the key point detection model to perform key point detection on the human body image to obtain the human body key points includes:
(1) calling the feature extraction network to extract image features of the human body image;
(2) calling the location branch network to detect candidate body key points according to the image features, and calling the relationship branch network to detect connection relationships between the candidate body key points according to the image features;
(3) calling the output network to connect the candidate body key points according to the connection relationships, and normalizing the connected candidate body key points according to the portrait bounding box to obtain the human body key points.
Referring to FIG. 4, the key point detection model consists of three parts: a feature extraction network, a dual-branch network, and an output network.
The feature extraction network can be any known feature extraction network, such as VGG, MobileNet, or ResNet; its purpose is to extract features from the input image to serve as the input of the subsequent branch networks. Correspondingly, the electronic device first calls the feature extraction network to perform feature extraction on the intercepted human body image, obtaining the image features of the human body image.
In this application, the key point detection task is split and a dual-branch network is used to realize key point detection. One branch network tends to detect the human body key points that may exist in the image and is denoted the location branch network; the other branch network tends to detect the connection relationships that may exist between human body key points and is denoted the relationship branch network. Correspondingly, after the image features of the human body image are extracted by the feature extraction network, the electronic device further calls the location branch network to detect, based on the aforementioned image features, the human body key points that may exist, which are denoted candidate body key points. Exemplarily, the output of the location branch network is a heatmap, a three-dimensional matrix of size height*width*keypoints, where height and width respectively represent the height and the width and keypoints represents the number of candidate body key points. That is, each candidate body key point corresponds to a height*width matrix, and the value at each position of the matrix represents the possibility that the candidate body key point is located at that position; the larger the value, the more likely the candidate body key point is at that position. For example, the position of the maximum value in each region of the heatmap can be taken to obtain the corresponding candidate body key point: the heatmap can be max-pooled, the heatmap before pooling is compared with the heatmap after pooling, and the positions where the values are equal are taken as candidate body key points.
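The heatmap peak picking just described can be sketched as follows; the pooling kernel size, the score threshold, and the (keypoints, height, width) tensor layout are assumptions made for illustration.

```python
import torch
import torch.nn.functional as F

def heatmap_peaks(heatmap, score_threshold=0.1):
    """Pick candidate body key points from a heatmap as described above:
    max-pool the map, keep positions whose value is unchanged by pooling
    (local maxima) and above a score threshold (threshold value assumed).

    heatmap: tensor of shape (keypoints, height, width).
    Returns, per key point type, a list of (x, y, score) tuples.
    """
    pooled = F.max_pool2d(heatmap.unsqueeze(0), kernel_size=3, stride=1, padding=1).squeeze(0)
    is_peak = (heatmap == pooled) & (heatmap > score_threshold)
    peaks = []
    for k in range(heatmap.shape[0]):
        ys, xs = torch.nonzero(is_peak[k], as_tuple=True)
        scores = heatmap[k, ys, xs]
        peaks.append([(int(x), int(y), float(s)) for x, y, s in zip(xs, ys, scores)])
    return peaks

# Toy heatmap with a single bright pixel for one key point type.
toy = torch.zeros(1, 8, 8)
toy[0, 3, 5] = 0.9
print(heatmap_peaks(toy))   # [[(5, 3, score ≈ 0.9)]]
```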
In addition, the electronic device also calls the relationship branch network to detect, based on the aforementioned image features, the connection relationships between the candidate body key points. Exemplarily, the output of the relationship branch network is a pafmap, a three-dimensional matrix of size height*width*(2*limbs), where limbs represents the number of limbs (a limb here is not a limb in the narrow sense, but the region between two associated key points; for example, the connection between the left eye and the right eye is regarded as one limb, and the connection between the neck and the left shoulder is regarded as one limb). Each limb corresponds to a height*width*2 matrix, which can be regarded as a two-channel heat map; each position of this heat map has two values, x and y, and the vector (x, y) represents the limb direction at that position (when both x and y are 0, there is no limb at that position), which characterizes the connection relationships between the candidate body key points.
After the candidate body key points and the connection relationships between them are obtained, the candidate body key points can be connected according to the connection relationships, so as to obtain a complete human body. Each time, the pafmap corresponding to one limb is taken, and the candidate body key points at the two ends of this limb are connected. The confidence that two key points d_j1 and d_j2 (j1 and j2 denote candidate body key point types, such as eye, nose tip, or eyebrow) come from the same human body is:
E = ∫₀¹ L_c(P(u)) · (d_j2 − d_j1) / ‖d_j2 − d_j1‖₂ du;
where P(u) is the interpolated position between the two key points, namely:
P(u) = (1 − u)·d_j1 + u·d_j2;
In actual use, u is generally sampled at uniform intervals on [0, 1] to approximate the integral. L_c is the value of the pafmap at P(u).
According to the above process, the possible connections between two adjacent candidate key points, that is, the potential limbs, can be obtained, thereby directly completing the connection of the human body.
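A sketch of the sampled line integral described above, for a single limb type; the number of samples and the pafmap layout are assumptions.

```python
import numpy as np

def connection_score(pafmap_xy, point_a, point_b, num_samples=10):
    """Approximate the confidence that two candidate body key points belong to
    the same person by sampling the limb direction field along the segment
    between them, as described above.

    pafmap_xy: array of shape (2, H, W) holding the (x, y) limb direction
               field for one limb type.
    point_a, point_b: (x, y) pixel coordinates of the two candidate key points.
    num_samples: number of evenly spaced samples of u on [0, 1] (assumed).
    """
    a = np.asarray(point_a, dtype=np.float32)
    b = np.asarray(point_b, dtype=np.float32)
    limb = b - a
    norm = np.linalg.norm(limb)
    if norm < 1e-6:
        return 0.0
    unit = limb / norm
    score = 0.0
    for u in np.linspace(0.0, 1.0, num_samples):
        x, y = (1.0 - u) * a + u * b                      # P(u), the interpolated position
        sample = pafmap_xy[:, int(round(y)), int(round(x))]
        score += float(np.dot(sample, unit))              # projection onto the limb direction
    return score / num_samples

# Toy field pointing purely in +x between two points on the same row.
paf = np.zeros((2, 10, 10), dtype=np.float32)
paf[0] = 1.0                                              # x component everywhere
print(connection_score(paf, (1, 4), (8, 4)))              # 1.0 for this toy field
```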
After completing the connection of the candidate body key points, the electronic device also normalizes the connected candidate body key points according to the portrait bounding box to obtain the human body key points. The normalization is performed according to the following formulas:
x' = x / w;
y' = y / h;
where x and y respectively represent the abscissa and the ordinate of a candidate body key point, x' and y' respectively represent the abscissa and the ordinate of the human body key point obtained after normalizing that candidate body key point, w represents the width of the portrait bounding box, and h represents the height of the portrait bounding box.
In an embodiment, the location branch network includes N position segments and the relationship branch network includes N relationship segments, and calling the location branch network to detect candidate body key points according to the image features as well as calling the relationship branch network to detect connection relationships between the candidate body key points according to the image features includes:
(1) calling the 1st position segment to detect the 1st group of candidate body key points according to the image features, and calling the 1st relationship segment to detect the connection relationships among the 1st group of candidate body key points according to the image features;
(2) fusing the 1st group of candidate body key points, the connection relationships among the 1st group of candidate body key points, and the image features to obtain a 1st fusion feature; calling the 2nd position segment to detect the 2nd group of candidate body key points according to the 1st fusion feature, and calling the 2nd relationship segment to detect the connection relationships among the 2nd group of candidate body key points according to the 1st fusion feature;
(3) fusing the 2nd group of candidate body key points, the connection relationships among the 2nd group of candidate body key points, and the image features to obtain a 2nd fusion feature, and so on, until the Nth group of candidate body key points detected by the Nth position segment according to the (N-1)th fusion feature and the connection relationships among the Nth group of candidate body key points detected by the Nth relationship segment according to the (N-1)th fusion feature are obtained;
(4) taking the Nth group of candidate body key points as the candidate body key points detected by the location branch network, and taking the connection relationships among the Nth group of candidate body key points as the connection relationships between the candidate body key points detected by the relationship branch network.
It should be noted that, in the embodiments of the present application, the location branch network includes N position segments (N is a positive integer greater than 2, and its value can be chosen by a person of ordinary skill in the art according to actual needs), and the relationship branch network includes N relationship segments. For example, referring to FIG. 5, the location branch network includes N position segments, namely position segment 1 to position segment N; correspondingly, the relationship branch network includes N relationship segments, namely relationship segment 1 to relationship segment N. Position segment 1 and relationship segment 1 form network segment 1, position segment 2 and relationship segment 2 form network segment 2, and so on, and position segment N and relationship segment N form network segment N. In other words, the dual-branch network constructed in the embodiments of the present application can be regarded as being composed of a plurality of network segments, such as network segment 1 to network segment N shown in FIG. 5, each of which includes a corresponding position segment and relationship segment.
The following description continues with the network structure shown in FIG. 5 as an example.
In the embodiments of the present application, the electronic device inputs the image features extracted by the feature extraction network into position segment 1 of network segment 1 for detection, obtaining the 1st group of candidate body key points output by position segment 1, and inputs the extracted image features into relationship segment 1 of network segment 1 for detection, obtaining the connection relationships among the 1st group of candidate body key points output by relationship segment 1. Then, the 1st group of candidate body key points (coordinates), the connection relationships among the 1st group of candidate body key points, and the image features are fused to obtain the 1st fusion feature, which serves as the output of network segment 1. Next, the 1st fusion feature output by network segment 1 is input into position segment 2 of network segment 2 for detection, obtaining the 2nd group of candidate body key points output by position segment 2, and the fusion feature output by network segment 1 is input into relationship segment 2 of network segment 2 for detection, obtaining the connection relationships among the 2nd group of candidate body key points output by relationship segment 2. Then, the 2nd group of candidate body key points (coordinates), the connection relationships among the 2nd group of candidate body key points, and the image features are fused to obtain the 2nd fusion feature, which serves as the output of network segment 2. This continues in the same way until the Nth group of candidate body key points detected by position segment N of network segment N according to the (N-1)th fusion feature output by network segment N-1 and the connection relationships among the Nth group of candidate body key points detected by relationship segment N of network segment N according to the (N-1)th fusion feature output by network segment N-1 are obtained. Finally, the Nth group of candidate body key points output by position segment N is taken as the candidate body key points finally output by the location branch network, and the connection relationships among the Nth group of candidate body key points output by relationship segment N are taken as the connection relationships between the candidate body key points finally output by the relationship branch network.
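The stage-wise refinement described above can be condensed into the following sketch, in which single convolutions stand in for the position and relationship segments; the channel sizes and the numbers of key point and limb types are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class DualBranchCascade(nn.Module):
    """Minimal sketch of the N-segment dual-branch refinement described above."""

    def __init__(self, feat_ch=64, keypoints=18, limbs=19, num_stages=3):
        super().__init__()
        self.position_segments = nn.ModuleList()
        self.relation_segments = nn.ModuleList()
        for stage in range(num_stages):
            # Segment 1 consumes the backbone features; later segments consume the
            # previous segment's heatmaps and pafmaps fused with those features.
            in_ch = feat_ch if stage == 0 else feat_ch + keypoints + 2 * limbs
            self.position_segments.append(nn.Conv2d(in_ch, keypoints, 3, padding=1))
            self.relation_segments.append(nn.Conv2d(in_ch, 2 * limbs, 3, padding=1))

    def forward(self, image_features):
        x = image_features
        for pos_seg, rel_seg in zip(self.position_segments, self.relation_segments):
            heatmap = pos_seg(x)                  # candidate body key point positions
            pafmap = rel_seg(x)                   # connection relationships
            x = torch.cat([heatmap, pafmap, image_features], dim=1)   # fusion feature
        return heatmap, pafmap                    # outputs of the Nth segments

features = torch.randn(1, 64, 46, 46)             # stand-in backbone image features
heatmap, pafmap = DualBranchCascade()(features)
print(heatmap.shape, pafmap.shape)                 # torch.Size([1, 18, 46, 46]) torch.Size([1, 38, 46, 46])
```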
在一实施例中,第1个位置分段包括依次连接的多个第一卷积模块和多个第二卷积模块。In an embodiment, the first position segment includes a plurality of first convolution modules and a plurality of second convolution modules connected in sequence.
It should be noted that, in the embodiment of the present application, the first relationship segment and the first position segment have the same structure, but the two do not share parameters; the following takes the first position segment as an example for description.

In the embodiment of the present application, the first position segment includes a plurality of first convolution modules and a plurality of second convolution modules connected in sequence, where each first convolution module includes a convolution unit with a convolution kernel size of 3*3 and each second convolution module includes a convolution unit with a convolution kernel size of 1*1.

It should be noted that the embodiment of the present application does not specifically limit the number of first convolution modules and second convolution modules constituting the first position segment, which can be configured by a person of ordinary skill in the art according to actual needs; for example, in the embodiment of the present application, three first convolution modules and two second convolution modules are used, as shown in FIG. 6.
In an embodiment, the second to Nth position segments have the same structure, and the second position segment includes a plurality of third convolution modules and a plurality of second convolution modules connected in sequence.

It should be noted that, in the embodiment of the present application, apart from the first relationship segment and the first position segment, all relationship segments and all position segments have the same structure, but the two do not share parameters; the following takes the second position segment as an example for description.

In the embodiment of the present application, the second position segment includes a plurality of third convolution modules and a plurality of second convolution modules connected in sequence, where each third convolution module includes a convolution unit with a convolution kernel size of 7*7.

The embodiment of the present application does not specifically limit the number of third convolution modules and second convolution modules constituting the second position segment, which can be configured by a person of ordinary skill in the art according to actual needs; for example, in the embodiment of the present application, five third convolution modules and two second convolution modules are used, as shown in FIG. 7.
应当说明的是,本申请采用7*7卷积单元的目的在于获得更大的感受野,从而获取更多信息。在其他实施例中,在算力有限的情况下,可以将每个7×7的卷积单元替换为3个3×3的卷积单元,以此来减少处理的数据量。It should be noted that the purpose of adopting the 7*7 convolution unit in this application is to obtain a larger receptive field, so as to obtain more information. In other embodiments, in the case of limited computing power, each 7×7 convolution unit can be replaced with three 3×3 convolution units to reduce the amount of processed data.
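As a rough illustration of this trade-off (an assumption for illustration, not a structure mandated by the present application), a stack of three 3×3 convolutions covers the same 7×7 receptive field as a single 7×7 convolution while using roughly 45% fewer weights:

import torch.nn as nn

def large_kernel_block(ch):
    # A single 7x7 convolution: receptive field 7, about 49*ch*ch weights.
    return nn.Conv2d(ch, ch, kernel_size=7, padding=3)

def stacked_small_kernel_block(ch):
    # Three 3x3 convolutions: receptive field also 7 (3 -> 5 -> 7),
    # but only about 27*ch*ch weights in total.
    return nn.Sequential(
        nn.Conv2d(ch, ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(ch, ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        nn.Conv2d(ch, ch, kernel_size=3, padding=1),
    )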
在一实施例中,根据类别区域以及人体关键点获取对应拍摄场景的定位点集合,包括:In an embodiment, acquiring a set of positioning points corresponding to the shooting scene according to the category area and the key points of the human body includes:
确定每一类别区域的类别中心点,将每一类别区域的类别中心点以及每一人体关键点作为定位点,得到定位点集合。Determine the category center point of each category area, and use the category center point of each category area and each key point of the human body as positioning points to obtain a set of positioning points.
In the embodiment of the present application, for each category area obtained by dividing the preview image, the electronic device determines the category center point of that category area, takes the category center point of each category area as a positioning point, and takes each human body key point as a positioning point; these positioning points form the positioning point set.
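A minimal sketch of this step is given below. It assumes the category areas are available as boolean masks and takes the pixel centroid of each mask as the category center point; the function name, dictionary layout and centroid definition are assumptions for illustration, since the present application does not prescribe a particular computation.

import numpy as np

def anchor_point_set(category_masks, body_keypoints):
    """Build the positioning (anchor) point set: the centroid of every
    category area of the segmented preview image plus every detected
    human body key point (x, y) coordinate.

    category_masks: dict mapping a category name to a boolean HxW mask.
    body_keypoints: list of (x, y) tuples from the key point detector."""
    anchors = {}
    for name, mask in category_masks.items():
        ys, xs = np.nonzero(mask)
        if len(xs) == 0:
            continue                                          # category absent
        anchors[name] = (float(xs.mean()), float(ys.mean()))  # region centroid
    for i, (x, y) in enumerate(body_keypoints):
        anchors[f"keypoint_{i}"] = (float(x), float(y))
    return anchors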
在一实施例中,确定对应定位点集合的构图点集合,包括:In an embodiment, determining the set of composition points corresponding to the set of anchor points includes:
(1)根据定位点集合,确定与预览图像相似度最高的预设构图模板图像;(1) According to the set of anchor points, determine the preset composition template image with the highest similarity to the preview image;
(2)将预设构图模板图像中每一类别中心点以及每一人体关键点作为构图点,得到构图点集合。(2) Taking the center point of each category and each key point of the human body in the preset composition template image as the composition point to obtain the composition point set.
It should be noted that this application constructs a portrait composition database in advance, and the portrait composition database includes a plurality of preset composition template images.
示例性的,按照如下方式构建人像构图数据库:Exemplarily, the portrait composition database is constructed as follows:
a. Collect images with excellent composition; the number of collected images should be as large as possible.
b. For each collected image, call the portrait detection model to detect its portrait bounding box, intercept the image content in the portrait bounding box to obtain a human body image, and then call the key point detection model according to the portrait bounding box to detect the human body key points in the human body image (for details, refer to the key point detection process for the preview image described above, which is not repeated here).
c,对于采集得到的每一图像,将其划分为多个类别区域,确定出每一类别区域的类别中心。c. For each image collected, divide it into multiple category areas, and determine the category center of each category area.
d. Take each collected image as a sample, use the previously obtained category centers, human body key points (coordinates) and the aspect ratio of the portrait bounding box as the features of the sample, and perform Q-type clustering so that each category contains multiple samples. The Minkowski distance is used to measure the similarity between samples, and the AGNES hierarchical clustering algorithm is used for clustering, where the number of clustering categories can be determined according to the scenes of the collected images and the distribution of the human body postures in them; for example, more categories can be set when the scenes and postures vary widely and fewer categories when they are relatively uniform, which can be configured by a person of ordinary skill in the art according to actual needs (a rough sketch of this step is given after this list).
e,将位于每一类别中心的图像作为一个预设构图模板图像。e. Use the image at the center of each category as a preset composition template image.
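The following is a rough sketch of step d above, assuming SciPy is available; the feature layout, average linkage and the choice p=2 for the Minkowski distance are illustrative assumptions rather than values fixed by the present application. It also folds in step e by returning, for each cluster, the sample closest to the cluster mean as the template.

import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def build_template_indices(sample_features, num_clusters, p=2):
    """Cluster the collected images into composition categories and return,
    for each cluster, the index of the sample closest to the cluster mean,
    which is used as the preset composition template image.

    sample_features: (num_images, feature_dim) array; each row concatenates
    the category centers, human body key point coordinates and the
    portrait bounding-box aspect ratio of one image."""
    # Pairwise Minkowski distances between samples (p=2 gives Euclidean).
    distances = pdist(sample_features, metric="minkowski", p=p)
    # Agglomerative (AGNES-style) hierarchical clustering, average linkage.
    tree = linkage(distances, method="average")
    labels = fcluster(tree, t=num_clusters, criterion="maxclust")

    template_indices = []
    for cluster_id in np.unique(labels):
        members = np.where(labels == cluster_id)[0]
        center = sample_features[members].mean(axis=0)
        dist_to_center = np.linalg.norm(sample_features[members] - center, axis=1)
        template_indices.append(int(members[dist_to_center.argmin()]))
    return template_indices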
In the embodiment of the present application, when determining the composition point set corresponding to the positioning point set, the electronic device may use the positioning point set (including the category center points of the preview image and the human body key points) together with the aspect ratio of the portrait bounding box of the preview image as the features of the preview image, and determine from the portrait composition database the preset composition template image with the highest similarity to the preview image (measured by the Minkowski distance). Then, each category center point and each human body key point of the determined preset composition template image are taken as composition points to obtain the composition point set.
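A correspondingly simple sketch of the template selection is given below, under the same illustrative assumptions (feature vectors of equal length per image; the Minkowski order p is a choice left to the implementer):

import numpy as np

def select_template(preview_feature, template_features, p=2):
    """Return the index of the preset composition template whose feature
    vector (category centers, human body key points, portrait-box aspect
    ratio) has the smallest Minkowski distance to the preview image's
    feature vector."""
    diffs = np.abs(np.asarray(template_features) - np.asarray(preview_feature))
    distances = (diffs ** p).sum(axis=1) ** (1.0 / p)
    return int(distances.argmin())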
For example, referring to FIG. 8, the electronic device may display the composition point set and the positioning point set in the preview image. As shown in FIG. 8, the composition point set includes composition point 1 and composition point 2, and the positioning point set includes positioning point 1 corresponding to composition point 1 and positioning point 2 corresponding to composition point 2. The arrow pointing from positioning point 1 to composition point 1 and the arrow pointing from positioning point 2 to composition point 2 are combined as prompt information for instructing adjustment of the shooting posture of the electronic device, thereby guiding the user on composition.
在一实施例中,本申请提供的构图提示方法还包括:In an embodiment, the composition prompting method provided by the present application further includes:
(1)将对应同一类别的定位点和构图点组合为数据组,以及将对应同一人体位置的定位点和构图点组合为数据组,得到多个数据组;(1) Combine positioning points and composition points corresponding to the same category into a data group, and combine positioning points and composition points corresponding to the same human body position into a data group to obtain multiple data groups;
(2) Calculate the distance between the positioning point and the composition point in each data group, and calculate the sum of the distances over the multiple data groups;

(3) When the distance sum is less than a preset threshold, determine that the positioning point set matches the composition point set.
本申请实施例中,电子设备将对应同一类别的定位点和构图点组合为数据组,以及将对应同一人体位置的定位点和构图点组合为数据组,由此得到多个数据组。In the embodiment of the present application, the electronic device combines positioning points and composition points corresponding to the same category into a data group, and combines positioning points and composition points corresponding to the same human body position into a data group, thereby obtaining multiple data sets.
For each data group, the electronic device calculates the distance between the positioning point and the composition point in the group (the Euclidean distance), and then calculates the sum of the distances over the multiple data groups.

Then, the electronic device determines whether the calculated distance sum is less than the preset threshold; if so, it determines that the positioning point set matches the composition point set, and otherwise that they do not match.
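A minimal sketch of this matching test, assuming the positioning points and composition points are stored in dictionaries keyed by category or body-part name (the dictionary layout, threshold and function name are assumptions for illustration):

import math

def anchors_match(anchor_points, composition_points, threshold):
    """Pair each positioning point with the composition point of the same
    category or body part, sum the Euclidean distances of all pairs, and
    declare a match when the total falls below the preset threshold."""
    total = 0.0
    for name, (ax, ay) in anchor_points.items():
        cx, cy = composition_points[name]       # same category / body part
        total += math.hypot(ax - cx, ay - cy)   # per-pair Euclidean distance
    return total < threshold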
请参照图9,图9为本申请实施例提供的构图提示方法的流程示意图,本申请实施例提供的构图提示方法的流程可以如下:Please refer to FIG. 9. FIG. 9 is a schematic flowchart of a composition prompting method provided by an embodiment of the present application. The flow of the composition prompting method provided by an embodiment of the present application may be as follows:
在201中,电子设备获取拍摄场景的预览图像,并从预览图像中截取人体图像。In 201, the electronic device obtains a preview image of the shooting scene, and intercepts a human body image from the preview image.
The shooting scene is the scene at which the camera of the electronic device is aimed after a shooting application is started; it can be any scene and may include people, objects and the like. The preview image is obtained by the electronic device capturing images of the shooting scene through the camera and is displayed to the user by default, so that the user can preview the imaging effect before shooting.
本申请实施例中,电子设备利用实时采集的预览图像来对拍摄场景中的人体进行关键点检测,以检测到前述人体的人体关键点。In the embodiment of the present application, the electronic device uses the preview image collected in real time to detect the key points of the human body in the shooting scene, so as to detect the key points of the human body.
The electronic device first obtains the preview image of the shooting scene. It should be noted that the embodiment of the present application also pre-trains a portrait detection model using a machine learning method; the portrait detection model is configured to take an image as input and output the portrait bounding box corresponding to the image, and the image content within the portrait bounding box is the portrait part of the image. Correspondingly, after obtaining the preview image, the electronic device can call the portrait detection model to perform portrait detection on the preview image and obtain the portrait bounding box corresponding to the preview image. The human body image is then obtained by intercepting the image content within the portrait bounding box.
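A minimal sketch of the cropping step, assuming the preview image is an H×W×C array and the detector returns the bounding box as (x1, y1, x2, y2) pixel coordinates (both assumptions for illustration):

def crop_portrait(preview_image, bbox):
    """Cut the portrait bounding box out of the preview image to obtain the
    human body image that is fed to the key point detection model."""
    x1, y1, x2, y2 = [int(round(v)) for v in bbox]
    return preview_image[y1:y2, x1:x2]  # array slicing keeps channels intact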
在202中,电子设备调用预训练的关键点检测模型对人体图像进行关键点检测,得到拍摄场景中人体的人体关键点。In 202, the electronic device calls the pre-trained key point detection model to perform key point detection on the human body image to obtain the human body key points of the human body in the shooting scene.
It should be noted that the embodiment of the present application also pre-trains a key point detection model using a machine learning method. The key point detection model may be deployed locally on the electronic device or on a server. In addition, the present application does not specifically limit the architecture of the key point detection model, which can be selected by a person of ordinary skill in the art according to actual needs. Correspondingly, in addition to obtaining the preview image of the shooting scene, the electronic device calls the pre-trained key point detection model locally or from the server, and inputs the obtained preview image into the pre-trained key point detection model for key point detection to obtain the human body key points of the human body in the shooting scene. The human body key points are used to locate parts of the human body such as the head, neck, shoulders, elbows, hands, hips, knees and feet, and the head key points can be further subdivided into the eyes, nose tip, mouth, eyebrows and the contour points of the head parts. For example, referring to FIG. 2, the human body image shown on the left side of FIG. 2 is input into the pre-trained key point detection model for key point detection, and multiple human body key points are obtained, as shown on the right side of FIG. 2.
在203中,电子设备将预览图像划分为多个类别区域,并确定出每一类别区域的类别中心点。In 203, the electronic device divides the preview image into multiple category areas, and determines the category center point of each category area.
在204中,电子设备将每一类别中心点以及每一人体关键点作为定位点,得到定位点集合。In 204, the electronic device uses the center point of each category and each key point of the human body as positioning points to obtain a set of positioning points.
本申请实施例中,电子设备在获取到拍摄场景的预览图像之后,除了对预览图像进行关键点检测之 外,还将预览图像划分为多个类别区域。In the embodiment of the present application, after acquiring the preview image of the shooting scene, the electronic device not only performs key point detection on the preview image, but also divides the preview image into multiple category areas.
示例性的,本申请中还采用机器学习方法预先训练有语义分割模型。其中,该语义分割模型可以设置在电子设备本地,也可以设置在服务器。此外,本申请中对语义分割模型的构型不做具体限制,可由本领域普通技术人员根据实际需要选择。比如,本申请中采用ICNet构型的语义分割模型。Exemplarily, a machine learning method is also used in this application to pre-train a semantic segmentation model. Among them, the semantic segmentation model can be set locally on the electronic device or on the server. In addition, the configuration of the semantic segmentation model is not specifically limited in this application, and can be selected by a person of ordinary skill in the art according to actual needs. For example, the semantic segmentation model of ICNet configuration is adopted in this application.
When dividing the preview image into multiple category areas, the electronic device can call the pre-trained semantic segmentation model locally or from the server, and input the obtained preview image into the pre-trained semantic segmentation model for semantic segmentation, to obtain the object category information to which each area of the preview image belongs. Then, according to the category information, the electronic device divides the preview image into multiple category areas and determines the category center point of each category area.
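A minimal sketch of turning the segmentation output into category areas, assuming the model yields a per-pixel class-label map; the resulting masks can then be fed to a centroid computation like the one sketched earlier (the function name and output format are assumptions for illustration):

import numpy as np

def category_regions(class_map):
    """Convert the per-pixel class labels produced by the semantic
    segmentation model into one boolean mask per category present in the
    preview image."""
    return {int(c): class_map == c for c in np.unique(class_map)}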
将每一类别区域的类别中心点作为一个定位点,以及将每一人体关键点作为一个定位点,由这这些定位点组成定位点集合。The category center point of each category area is taken as an anchor point, and each key point of the human body is taken as an anchor point, and these anchor points form an anchor point set.
在205中,电子设备根据定位点集合,确定与预览图像相似度最高的预设构图模板图像。In 205, the electronic device determines the preset composition template image with the highest similarity to the preview image according to the set of positioning points.
在206中,电子设备将预设构图模板图像中的每一类别中心点以及每一人体关键点作为构图点,得到构图点集合。In 206, the electronic device uses each category center point and each human body key point in the preset composition template image as a composition point to obtain a set of composition points.
It should be noted that this application constructs a portrait composition database in advance, and the portrait composition database includes a plurality of preset composition template images.

In the embodiment of the present application, when determining the composition point set corresponding to the positioning point set, the electronic device may use the positioning point set (including the category center points of the preview image and the human body key points) together with the aspect ratio of the portrait bounding box of the preview image as the features of the preview image, and determine from the portrait composition database the preset composition template image with the highest similarity to the preview image (measured by the Minkowski distance). Then, each category center point and each human body key point of the determined preset composition template image are taken as composition points to obtain the composition point set.

The composition points in the composition point set correspond one-to-one to the positioning points in the positioning point set; when every positioning point matches its corresponding composition point, it is considered that the best composition can be obtained. A positioning point matching a composition point includes the case where the distance between the positioning point and the composition point is less than or equal to a preset distance; the present application does not specifically limit the value of the preset distance, which can be set by a person of ordinary skill in the art according to actual needs.
在207中,当定位点集合与构图点集合不匹配时,电子设备输出用于指示调整电子设备拍摄姿态的提示信息。In 207, when the positioning point set does not match the composition point set, the electronic device outputs prompt information for instructing to adjust the shooting posture of the electronic device.
According to the above definition of matching between a positioning point and a composition point, a person of ordinary skill in the art can configure, according to actual needs, the constraint under which the positioning point set is considered to match the composition point set, and the present application does not specifically limit this. For example, it can be configured such that the positioning point set matches the composition point set when every positioning point in the positioning point set matches its corresponding composition point in the composition point set; as another example, it can be configured such that the positioning point set matches the composition point set when a preset number of positioning points in the positioning point set match their corresponding composition points in the composition point set.

Correspondingly, the electronic device determines in real time whether the positioning point set corresponding to the shooting scene matches the composition point set; if they do not match, it outputs prompt information for instructing adjustment of the shooting posture of the electronic device, so that the positioning point set corresponding to the shooting scene comes to match the composition point set and the people and objects in the shooting scene can obtain a better composition.

For example, referring to FIG. 8, the electronic device may display the composition point set and the positioning point set in the preview image. As shown in FIG. 8, the composition point set includes composition point 1 and composition point 2, and the positioning point set includes positioning point 1 corresponding to composition point 1 and positioning point 2 corresponding to composition point 2. The arrow pointing from positioning point 1 to composition point 1 and the arrow pointing from positioning point 2 to composition point 2 are combined as prompt information for instructing adjustment of the shooting posture of the electronic device, thereby guiding the user on composition.
在208中,当定位点集合与构图点集合匹配时,电子设备对拍摄场景进行拍摄,得到拍摄图像。In 208, when the set of positioning points matches the set of composition points, the electronic device photographs the shooting scene to obtain a photographed image.
When the positioning point set matches the composition point set, the electronic device determines that the people and objects in the shooting scene have a good composition at this moment, and therefore photographs the shooting scene, thereby obtaining a high-quality captured image of the shooting scene.
在一实施例中,还提供了一种构图提示装置。请参照图10,图10为本申请实施例提供的构图提示装置的结构示意图。其中该构图提示装置应用于电子设备,该构图提示装置包括关键点检测模块301、定位点确定模块302、构图点确定模块303、构图提示模块304以及图像拍摄模块305,如下:In an embodiment, a composition prompting device is also provided. Please refer to FIG. 10, which is a schematic structural diagram of a composition prompting device provided by an embodiment of the application. The composition prompting device is applied to electronic equipment, and the composition prompting device includes a key point detection module 301, a positioning point determination module 302, a composition point determination module 303, a composition prompt module 304, and an image capturing module 305, as follows:
关键点检测模块301,用于获取拍摄场景的预览图像,并调用预训练的关键点检测模型对预览图像 进行关键点检测,得到拍摄场景中人体的人体关键点;The key point detection module 301 is used to obtain a preview image of the shooting scene, and call a pre-trained key point detection model to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene;
定位点确定模块302,用于将预览图像划分为多个类别区域,并根据类别区域以及人体关键点获取对应拍摄场景的定位点集合;The positioning point determination module 302 is configured to divide the preview image into multiple category areas, and obtain a set of positioning points corresponding to the shooting scene according to the category areas and the key points of the human body;
构图点确定模块303,用于确定对应定位点集合的构图点集合;The composition point determination module 303 is used to determine the composition point set corresponding to the positioning point set;
构图提示模块304,用于当定位点集合与构图点集合不匹配时,输出用于指示调整电子设备拍摄姿态的提示信息。The composition prompting module 304 is configured to output prompt information for instructing to adjust the shooting posture of the electronic device when the set of positioning points and the set of composition points do not match.
在一实施例中,本申请提供的构图提示装置还包括图像拍摄模块,用于当定位点集合与构图点集合匹配时,对拍摄场景进行拍摄,得到拍摄图像。In an embodiment, the composition prompting device provided by the present application further includes an image capturing module, which is used to capture the shooting scene when the positioning point set matches the composition point set to obtain the captured image.
在一实施例中,在调用预训练的关键点检测模型对预览图像进行关键点检测,得到拍摄场景中人体的人体关键点时,关键点检测模块301用于:In one embodiment, when the pre-trained key point detection model is called to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene, the key point detection module 301 is used to:
从预览图像中截取人体的人体图像;Intercept the human body image of the human body from the preview image;
调用关键点检测模型对人体图像进行关键点检测,得到人体关键点。Call the key point detection model to detect the key points of the human body image to obtain the key points of the human body.
在一实施例中,在从预览图像中截取人体的人体图像时,关键点检测模块301用于:In an embodiment, when the human body image of the human body is intercepted from the preview image, the key point detection module 301 is used to:
调用预训练的人像检测模型对预览图像进行人像检测,得到对应预览图像的人像边界框;Call the pre-trained portrait detection model to perform portrait detection on the preview image, and obtain the portrait bounding box corresponding to the preview image;
截取人像边界框中的图像内容,得到人体图像。The image content in the bounding box of the portrait is intercepted to obtain the human body image.
In an embodiment, the key point detection model includes a feature extraction network, a dual-branch network and an output network, and the dual-branch network includes a location branch network and a relationship branch network; when calling the key point detection model to perform key point detection on the human body image to obtain the human body key points, the key point detection module 301 is configured to:
调用特征提取网络提取得到人体图像的图像特征;Call the feature extraction network to extract the image features of the human body image;
Call the location branch network to detect candidate human body key points according to the image features, and call the relationship branch network to detect the connection relationships between the candidate human body key points according to the image features;
调用输出网络根据连接关系连接候选人体关键点,并根据人像边界框对连接后的候选人体关键点进行归一化处理,得到人体关键点。The output network is called to connect the key points of the candidate body according to the connection relationship, and normalize the key points of the candidate body after the connection according to the bounding box of the portrait to obtain the key points of the human body.
In an embodiment, the location branch network includes N location segments and the relationship branch network includes N relationship segments; when calling the location branch network to detect candidate human body key points according to the image features and calling the relationship branch network to detect the connection relationships between the candidate human body key points according to the image features, the key point detection module 301 is configured to:
调用第1个位置分段根据图像特征检测得到第1组候选人体关键点,以及调用第1个关系分段根据图像特征检测得到第1组候选人体关键点之间的连接关系;Calling the first position segment to obtain the key points of the first group of candidates based on image feature detection, and call the first relationship segment to obtain the connection relationship between the key points of the first group of candidates based on image feature detection;
Fuse the first group of candidate human body key points, the connection relationships between the first group of candidate human body key points and the image features to obtain a first fused feature, call the second position segment to detect a second group of candidate human body key points according to the first fused feature, and call the second relationship segment to detect the connection relationships between the second group of candidate human body key points according to the first fused feature;

Fuse the second group of candidate human body key points, the connection relationships between the second group of candidate human body key points and the image features to obtain a second fused feature, and so on, until the Nth group of candidate human body key points detected by the Nth position segment according to the (N-1)th fused feature is obtained and the connection relationships between the Nth group of candidate human body key points detected by the Nth relationship segment according to the (N-1)th fused feature are obtained;

Take the Nth group of candidate human body key points as the candidate human body key points detected by the location branch network, and take the connection relationships between the Nth group of candidate human body key points as the connection relationships between the candidate human body key points detected by the relationship branch network.
在一实施例中,第1个位置分段包括依次连接的多个第一卷积模块和多个第二卷积模块。In an embodiment, the first position segment includes a plurality of first convolution modules and a plurality of second convolution modules connected in sequence.
在一实施例中,第一卷积模块包括卷积核大小为3*3的卷积单元,第二卷积模块包括卷积核大小为1*1的卷积单元。In an embodiment, the first convolution module includes a convolution unit with a convolution kernel size of 3*3, and the second convolution module includes a convolution unit with a convolution kernel size of 1*1.
在一实施例中,第2-N个位置分段的结构相同,第2个位置分段包括依次连接的多个第三卷积模块和多个第二卷积模块。In an embodiment, the structure of the 2-Nth position segment is the same, and the second position segment includes a plurality of third convolution modules and a plurality of second convolution modules connected in sequence.
在一实施例中,第三卷积模块包括卷积核大小为7*7的卷积单元。In an embodiment, the third convolution module includes a convolution unit with a convolution kernel size of 7*7.
在一实施例中,在根据类别区域以及人体关键点获取对应拍摄场景的定位点集合时,定位点确定模块302用于:In an embodiment, when acquiring a set of positioning points corresponding to the shooting scene according to the category area and the key points of the human body, the positioning point determination module 302 is configured to:
确定每一类别区域的类别中心点,将每一类别区域的类别中心点以及每一人体关键点作为定位点,得到定位点集合。Determine the category center point of each category area, and use the category center point of each category area and each key point of the human body as positioning points to obtain a set of positioning points.
在一实施例中,在确定对应定位点集合的构图点集合时,构图点确定模块303用于:In an embodiment, when determining the composition point set corresponding to the anchor point set, the composition point determination module 303 is configured to:
根据定位点集合,确定与预览图像相似度最高的预设构图模板图像;According to the set of anchor points, determine the preset composition template image with the highest similarity to the preview image;
将预设构图模板图像中每一类别中心点以及每一人体关键点作为构图点,得到构图点集合。The center point of each category and each key point of the human body in the preset composition template image are used as composition points to obtain a set of composition points.
在一实施例中,本申请提供的构图提示装置还包括判断模块,用于:In an embodiment, the composition prompting device provided by the present application further includes a judgment module for:
将对应同一类别的定位点和构图点组合为数据组,以及将对应同一人体位置的定位点和构图点组合为数据组,得到多个数据组;Combine positioning points and composition points corresponding to the same category into a data group, and combine positioning points and composition points corresponding to the same human body position into a data group to obtain multiple data groups;
Calculate the distance between the positioning point and the composition point in each data group, and calculate the sum of the distances over the multiple data groups;

When the distance sum is less than the preset threshold, determine that the positioning point set matches the composition point set.
It should be noted that the composition prompting device provided in the embodiment of the present application belongs to the same concept as the composition prompting method in the above embodiments; any method provided in the composition prompting method embodiments can be run on the composition prompting device, and its specific implementation process is detailed in the above embodiments and is not repeated here.
在一实施例中,还提供一种电子设备,请参照图11,电子设备包括处理器401和存储器402。In an embodiment, an electronic device is also provided. Referring to FIG. 11, the electronic device includes a processor 401 and a memory 402.
本申请实施例中的处理器401是通用处理器,比如ARM架构的处理器。The processor 401 in the embodiment of the present application is a general-purpose processor, such as an ARM architecture processor.
存储器402中存储有计算机程序,其可以为高速随机存取存储器,还可以为非易失性存储器,比如至少一个磁盘存储器件、闪存器件、或其他易失性固态存储器件等。相应地,存储器402还可以包括存储器控制器,以提供处理器401对存储器402中计算机程序的访问,实现如下功能:A computer program is stored in the memory 402, which may be a high-speed random access memory or a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices. Correspondingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the computer program in the memory 402 to implement the following functions:
获取拍摄场景的预览图像,并调用预训练的关键点检测模型对预览图像进行关键点检测,得到拍摄场景中人体的人体关键点;Obtain a preview image of the shooting scene, and call the pre-trained key point detection model to perform key point detection on the preview image, and obtain the human body key points of the human body in the shooting scene;
将预览图像划分为多个类别区域,并根据类别区域以及人体关键点获取对应拍摄场景的定位点集合;Divide the preview image into multiple category areas, and obtain a set of anchor points corresponding to the shooting scene according to the category areas and the key points of the human body;
确定对应定位点集合的构图点集合;Determine the set of composition points corresponding to the set of anchor points;
当定位点集合与构图点集合不匹配时,输出用于指示调整电子设备拍摄姿态的提示信息。When the set of positioning points and the set of composition points do not match, a prompt message for instructing to adjust the shooting posture of the electronic device is output.
在一实施例中,在调用预训练的关键点检测模型对预览图像进行关键点检测,得到拍摄场景中人体的人体关键点时,处理器401用于执行:In one embodiment, when the pre-trained key point detection model is called to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene, the processor 401 is configured to execute:
从预览图像中截取人体的人体图像;Intercept the human body image of the human body from the preview image;
调用关键点检测模型对人体图像进行关键点检测,得到人体关键点。Call the key point detection model to detect the key points of the human body image to obtain the key points of the human body.
在一实施例中,在从预览图像中截取人体的人体图像时,处理器401用于执行:In an embodiment, when the human body image of the human body is intercepted from the preview image, the processor 401 is configured to execute:
调用预训练的人像检测模型对预览图像进行人像检测,得到对应预览图像的人像边界框;Call the pre-trained portrait detection model to perform portrait detection on the preview image, and obtain the portrait bounding box corresponding to the preview image;
截取人像边界框中的图像内容,得到人体图像。The image content in the bounding box of the portrait is intercepted to obtain the human body image.
In an embodiment, the key point detection model includes a feature extraction network, a dual-branch network and an output network, and the dual-branch network includes a location branch network and a relationship branch network; when calling the key point detection model to perform key point detection on the human body image to obtain the human body key points, the processor 401 is configured to execute:
调用特征提取网络提取得到人体图像的图像特征;Call the feature extraction network to extract the image features of the human body image;
Call the location branch network to detect candidate human body key points according to the image features, and call the relationship branch network to detect the connection relationships between the candidate human body key points according to the image features;
调用输出网络根据连接关系连接候选人体关键点,并根据人像边界框对连接后的候选人体关键点进行归一化处理,得到人体关键点。The output network is called to connect the key points of the candidate body according to the connection relationship, and normalize the key points of the candidate body after the connection according to the bounding box of the portrait to obtain the key points of the human body.
In an embodiment, the location branch network includes N location segments and the relationship branch network includes N relationship segments; when calling the location branch network to detect candidate human body key points according to the image features and calling the relationship branch network to detect the connection relationships between the candidate human body key points according to the image features, the processor 401 is configured to execute:
调用第1个位置分段根据图像特征检测得到第1组候选人体关键点,以及调用第1个关系分段根据 图像特征检测得到第1组候选人体关键点之间的连接关系;Call the first position segment to obtain the key points of the first group of candidates based on image feature detection, and call the first relationship segment to obtain the connection relationship between the key points of the first group of candidates based on image feature detection;
Fuse the first group of candidate human body key points, the connection relationships between the first group of candidate human body key points and the image features to obtain a first fused feature, call the second position segment to detect a second group of candidate human body key points according to the first fused feature, and call the second relationship segment to detect the connection relationships between the second group of candidate human body key points according to the first fused feature;

Fuse the second group of candidate human body key points, the connection relationships between the second group of candidate human body key points and the image features to obtain a second fused feature, and so on, until the Nth group of candidate human body key points detected by the Nth position segment according to the (N-1)th fused feature is obtained and the connection relationships between the Nth group of candidate human body key points detected by the Nth relationship segment according to the (N-1)th fused feature are obtained;

Take the Nth group of candidate human body key points as the candidate human body key points detected by the location branch network, and take the connection relationships between the Nth group of candidate human body key points as the connection relationships between the candidate human body key points detected by the relationship branch network.
在一实施例中,第1个位置分段包括依次连接的多个第一卷积模块和多个第二卷积模块。In an embodiment, the first position segment includes a plurality of first convolution modules and a plurality of second convolution modules connected in sequence.
在一实施例中,第一卷积模块包括卷积核大小为3*3的卷积单元,第二卷积模块包括卷积核大小为1*1的卷积单元。In an embodiment, the first convolution module includes a convolution unit with a convolution kernel size of 3*3, and the second convolution module includes a convolution unit with a convolution kernel size of 1*1.
在一实施例中,第2-N个位置分段的结构相同,第2个位置分段包括依次连接的多个第三卷积模块和多个第二卷积模块。In an embodiment, the structure of the 2-Nth position segment is the same, and the second position segment includes a plurality of third convolution modules and a plurality of second convolution modules connected in sequence.
在一实施例中,第三卷积模块包括卷积核大小为7*7的卷积单元。In an embodiment, the third convolution module includes a convolution unit with a convolution kernel size of 7*7.
在一实施例中,在根据类别区域以及人体关键点获取对应拍摄场景的定位点集合时,处理器401用于执行:In an embodiment, when acquiring a set of anchor points corresponding to the shooting scene according to the category area and the key points of the human body, the processor 401 is configured to execute:
确定每一类别区域的类别中心点,将每一类别区域的类别中心点以及每一人体关键点作为定位点,得到定位点集合。Determine the category center point of each category area, and use the category center point of each category area and each key point of the human body as positioning points to obtain a set of positioning points.
在一实施例中,在确定对应定位点集合的构图点集合时,处理器401用于执行:In an embodiment, when determining the composition point set corresponding to the anchor point set, the processor 401 is configured to execute:
根据定位点集合,确定与预览图像相似度最高的预设构图模板图像;According to the set of anchor points, determine the preset composition template image with the highest similarity to the preview image;
将预设构图模板图像中每一类别中心点以及每一人体关键点作为构图点,得到构图点集合。The center point of each category and each key point of the human body in the preset composition template image are used as composition points to obtain a set of composition points.
在一实施例中,处理器401还用于执行:In an embodiment, the processor 401 is further configured to execute:
将对应同一类别的定位点和构图点组合为数据组,以及将对应同一人体位置的定位点和构图点组合为数据组,得到多个数据组;Combine positioning points and composition points corresponding to the same category into a data group, and combine positioning points and composition points corresponding to the same human body position into a data group to obtain multiple data groups;
Calculate the distance between the positioning point and the composition point in each data group, and calculate the sum of the distances over the multiple data groups;

When the distance sum is less than the preset threshold, determine that the positioning point set matches the composition point set.
It should be noted that the electronic device provided in the embodiment of the present application belongs to the same concept as the composition prompting method in the above embodiments; any method provided in the composition prompting method embodiments can be run on the electronic device, and its specific implementation process is detailed in the composition prompting method embodiments and is not repeated here.
It should be noted that, for the composition prompting method of the embodiments of the present application, a person of ordinary skill in the art can understand that all or part of the process of implementing the composition prompting method of the embodiments of the present application can be completed by controlling relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium, for example in the memory of an electronic device, and executed by a processor in the electronic device, and its execution may include the processes of the embodiments of the composition prompting method. The storage medium can be a magnetic disk, an optical disc, a read-only memory, a random access memory, or the like.
The composition prompting method, model training method, apparatus, storage medium and electronic device provided by the embodiments of the present application have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present application, and the description of the above embodiments is only intended to help understand the method of the present application and its core idea. Meanwhile, a person skilled in the art may make changes to the specific implementations and the application scope according to the idea of the present application. In summary, the content of this specification should not be construed as limiting the present application.

Claims (20)

  1. A composition prompting method, comprising:
    获取拍摄场景的预览图像,并调用预训练的关键点检测模型对所述预览图像进行关键点检测,得到所述拍摄场景中人体的人体关键点;Acquiring a preview image of the shooting scene, and calling a pre-trained key point detection model to perform key point detection on the preview image, to obtain the human body key points of the human body in the shooting scene;
    将所述预览图像划分为多个类别区域,并根据所述类别区域以及所述人体关键点获取对应所述拍摄场景的定位点集合;Dividing the preview image into multiple category areas, and obtaining a set of positioning points corresponding to the shooting scene according to the category areas and the key points of the human body;
    确定对应所述定位点集合的构图点集合;Determining a set of composition points corresponding to the set of positioning points;
    当所述定位点集合与所述构图点集合不匹配时,输出用于指示调整所述电子设备拍摄姿态的提示信息。When the set of positioning points does not match the set of composition points, outputting prompt information for instructing to adjust the shooting posture of the electronic device.
  2. 根据权利要求1所述的构图提示方法,其中,所述调用预训练的关键点检测模型对所述预览图像进行关键点检测,得到所述拍摄场景中人体的人体关键点,包括:The composition prompting method according to claim 1, wherein the calling a pre-trained key point detection model to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene comprises:
    从所述预览图像中截取所述人体的人体图像;Intercepting the human body image of the human body from the preview image;
    调用所述关键点检测模型对所述人体图像进行关键点检测,得到所述人体关键点。Calling the key point detection model to perform key point detection on the human body image to obtain the human body key point.
  3. 根据权利要求2所述的构图提示方法,其中,从所述预览图像中截取所述人体的人体图像,包括:The composition prompting method according to claim 2, wherein the intercepting the human body image of the human body from the preview image comprises:
    调用预训练的人像检测模型对所述预览图像进行人像检测,得到对应所述预览图像的人像边界框;Calling a pre-trained portrait detection model to perform portrait detection on the preview image to obtain a portrait bounding box corresponding to the preview image;
    截取所述人像边界框中的图像内容,得到所述人体图像。The image content in the bounding box of the portrait is intercepted to obtain the image of the human body.
  4. The composition prompting method according to claim 3, wherein the key point detection model comprises a feature extraction network, a dual-branch network and an output network, the dual-branch network comprises a location branch network and a relationship branch network, and the calling the key point detection model to perform key point detection on the human body image to obtain the human body key points comprises:
    调用所述特征提取网络提取得到所述人体图像的图像特征;Calling the feature extraction network to extract the image features of the human body image;
    调用所述位置分支网络根据所述图像特征检测得到候选人体关键点,以及调用所述关系分支网络根据所述图像特征检测得到候选人体关键点之间的连接关系;Calling the location branch network to detect key points of the candidate body according to the image feature, and calling the relationship branch network to detect the connection relationship between the key points of the candidate body according to the image feature;
    调用所述输出网络根据所述连接关系连接所述候选人体关键点,并根据所述人像边界框对连接后的候选人体关键点进行归一化处理,得到所述人体关键点。The output network is called to connect the key points of the candidate body according to the connection relationship, and normalize the key points of the candidate body after the connection according to the bounding box of the portrait to obtain the key points of the human body.
  5. The composition prompting method according to claim 4, wherein the location branch network comprises N location segments, the relationship branch network comprises N relationship segments, and the calling the location branch network to detect candidate human body key points according to the image features and calling the relationship branch network to detect the connection relationships between the candidate human body key points according to the image features comprises:
    calling the first location segment to detect a first group of candidate human body key points according to the image features, and calling the first relationship segment to detect connection relationships between the first group of candidate human body key points according to the image features;

    fusing the first group of candidate human body key points, the connection relationships between the first group of candidate human body key points and the image features to obtain a first fused feature, calling the second location segment to detect a second group of candidate human body key points according to the first fused feature, and calling the second relationship segment to detect connection relationships between the second group of candidate human body key points according to the first fused feature;

    fusing the second group of candidate human body key points, the connection relationships between the second group of candidate human body key points and the image features to obtain a second fused feature, and so on, until an Nth group of candidate human body key points detected by the Nth location segment according to the (N-1)th fused feature is obtained and connection relationships between the Nth group of candidate human body key points detected by the Nth relationship segment according to the (N-1)th fused feature are obtained;

    taking the Nth group of candidate human body key points as the candidate human body key points detected by the location branch network, and taking the connection relationships between the Nth group of candidate human body key points as the connection relationships between the candidate human body key points detected by the relationship branch network.
  6. The composition prompting method according to claim 5, wherein the first location segment comprises a plurality of first convolution modules and a plurality of second convolution modules connected in sequence, the first convolution module comprises a convolution unit with a convolution kernel size of 3*3, and the second convolution module comprises a convolution unit with a convolution kernel size of 1*1.

  7. The composition prompting method according to claim 6, wherein the second to Nth location segments have the same structure, and the second location segment comprises a plurality of third convolution modules and a plurality of the second convolution modules connected in sequence.
  8. 根据权利要求7所述的构图提示方法,其中,所述第三卷积模块包括卷积核大小为7*7的卷积单元。8. The composition prompting method according to claim 7, wherein the third convolution module includes a convolution unit with a convolution kernel size of 7*7.
  9. 根据权利要求1-8任一项所述的构图提示方法,其中,所述确定对应所述定位点集合的构图点集合之后,还包括:8. The composition prompting method according to any one of claims 1-8, wherein after the determining a composition point set corresponding to the positioning point set, the method further comprises:
    当所述定位点集合与所述构图点集合匹配时,对所述拍摄场景进行拍摄,得到拍摄图像。When the set of positioning points matches the set of composition points, the shooting scene is photographed to obtain a photographed image.
  10. 根据权利要求1-8任一项所述的构图提示方法,其中,所述根据所述类别区域以及所述人体关键点获取对应所述拍摄场景的定位点集合,包括:8. The composition prompting method according to any one of claims 1-8, wherein the acquiring a set of positioning points corresponding to the shooting scene according to the category area and the key points of the human body comprises:
    确定每一类别区域的类别中心点,将每一类别区域的类别中心点以及每一人体关键点作为所述定位点,得到所述定位点集合。The category center point of each category area is determined, and the category center point of each category area and each key point of the human body are used as the positioning points to obtain the set of positioning points.
  11. 根据权利要求10所述的构图提示方法,其中,所述确定对应所述定位点集合的构图点集合,包括:The composition prompting method according to claim 10, wherein the determining a composition point set corresponding to the positioning point set comprises:
    根据所述定位点集合,确定与所述预览图像相似度最高的预设构图模板图像;Determine the preset composition template image with the highest similarity to the preview image according to the set of positioning points;
    将所述预设构图模板图像中每一类别中心点以及每一人体关键点作为所述构图点,得到所述构图点集合。Taking each category center point and each human body key point in the preset composition template image as the composition point to obtain the composition point set.
  12. 根据权利要求11所述的构图提示方法,其中,还包括:The composition prompting method according to claim 11, further comprising:
    将对应同一类别的定位点和构图点组合为数据组,以及将对应同一人体位置的定位点和构图点组合为数据组,得到多个数据组;Combine positioning points and composition points corresponding to the same category into a data group, and combine positioning points and composition points corresponding to the same human body position into a data group to obtain multiple data groups;
    calculating the distance between the positioning point and the composition point in each data group, and calculating the sum of the distances over the multiple data groups;

    when the distance sum is less than a preset threshold, determining that the positioning point set matches the composition point set.
  13. 一种构图提示装置,其中,包括:A composition prompting device, which includes:
    关键点检测模块,用于获取拍摄场景的预览图像,并调用预训练的关键点检测模型对所述预览图像进行关键点检测,得到所述拍摄场景中人体的人体关键点;The key point detection module is used to obtain a preview image of the shooting scene, and call a pre-trained key point detection model to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene;
    定位点确定模块,用于将所述预览图像划分为多个类别区域,并根据所述类别区域以及所述人体关键点获取对应所述拍摄场景的定位点集合;An anchor point determination module, configured to divide the preview image into a plurality of category areas, and obtain an anchor point set corresponding to the shooting scene according to the category area and the key points of the human body;
    构图点确定模块,用于确定对应所述定位点集合的构图点集合;A composition point determination module, configured to determine a composition point set corresponding to the positioning point set;
    构图提示模块,用于当所述定位点集合与所述构图点集合不匹配时,输出用于指示调整所述电子设备拍摄姿态的提示信息。The composition prompting module is configured to output prompt information for instructing to adjust the shooting posture of the electronic device when the set of positioning points does not match the set of composition points.
  14. A storage medium storing a computer program, wherein, when the computer program is loaded by a processor, the processor executes:
    acquiring a preview image of a shooting scene, and calling a pre-trained key point detection model to perform key point detection on the preview image to obtain human body key points of a human body in the shooting scene;
    dividing the preview image into a plurality of category areas, and acquiring a positioning point set corresponding to the shooting scene according to the category areas and the human body key points;
    determining a composition point set corresponding to the positioning point set;
    when the positioning point set does not match the composition point set, outputting prompt information for instructing adjustment of a shooting posture of the electronic device.
  15. An electronic device, comprising a processor and a memory, the memory storing a computer program, wherein the processor loads the computer program to execute:
    acquiring an image to be processed, and identifying a horizontal dividing line of the image to be processed;
    rotating the image to be processed so as to rotate the horizontal dividing line to a preset position, and cropping the rotated image to be processed to obtain a cropped image;
    dividing the cropped image into a plurality of sub-images, and performing image quality scoring on the sub-images and the image to be processed as candidate images;
    selecting the candidate image with the highest quality score as a processing result image of the image to be processed.
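The pipeline recited in claim 15 (level the horizon, crop, split into sub-images, score, keep the best) could look roughly like the sketch below. It is illustrative only: `horizon_angle_deg` stands in for the output of the horizontal-dividing-line identification, `score_fn` for the image-quality scoring model, and both the naive central crop and the 2x2 grid split are simplifying assumptions.

```python
from PIL import Image

def auto_crop_best(image: Image.Image, horizon_angle_deg: float,
                   score_fn, grid=(2, 2)):
    """Sketch of a claim-15-style pipeline: level the horizon, crop,
    split into sub-images, score every candidate, return the best."""
    # Rotate so the detected horizontal dividing line becomes level.
    rotated = image.rotate(-horizon_angle_deg, expand=True)
    # Naive central crop to drop the empty corners introduced by rotation
    # (a real implementation would compute the largest valid rectangle).
    w, h = rotated.size
    cropped = rotated.crop((w // 8, h // 8, w - w // 8, h - h // 8))

    # The image to be processed itself is also a candidate.
    candidates = [image]
    cols, rows = grid
    cw, ch = cropped.size[0] // cols, cropped.size[1] // rows
    for r in range(rows):
        for c in range(cols):
            candidates.append(
                cropped.crop((c * cw, r * ch, (c + 1) * cw, (r + 1) * ch)))
    # Keep the candidate with the highest quality score.
    return max(candidates, key=score_fn)
```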
  16. The electronic device according to claim 15, wherein, when a pre-trained key point detection model is called to perform key point detection on the preview image to obtain the human body key points of the human body in the shooting scene, the processor is configured to execute:
    intercepting a human body image of the human body from the preview image;
    calling the key point detection model to perform key point detection on the human body image to obtain the human body key points.
  17. The electronic device according to claim 16, wherein, when the human body image of the human body is intercepted from the preview image, the processor is configured to execute:
    calling a pre-trained portrait detection model to perform portrait detection on the preview image to obtain a portrait bounding box corresponding to the preview image;
    intercepting the image content within the portrait bounding box to obtain the human body image.
  18. The electronic device according to claim 17, wherein the key point detection model comprises a feature extraction network, a dual-branch network and an output network, the dual-branch network comprising a position branch network and a relation branch network, and wherein, when the key point detection model is called to perform key point detection on the human body image to obtain the human body key points, the processor is configured to execute:
    calling the feature extraction network to extract image features of the human body image;
    calling the position branch network to detect candidate human body key points according to the image features, and calling the relation branch network to detect connection relationships between the candidate human body key points according to the image features;
    calling the output network to connect the candidate human body key points according to the connection relationships, and normalizing the connected candidate human body key points according to the portrait bounding box to obtain the human body key points.
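As a side note, the normalization at the end of claim 18 might amount to something as simple as the sketch below, assuming the portrait bounding box is given as (x, y, width, height); the exact normalization is not spelled out in the claim, so this is an assumption.

```python
def normalize_keypoints(keypoints, bbox):
    """Express connected candidate human body key points relative to the
    portrait bounding box (x, y, width, height) -- illustrative sketch."""
    bx, by, bw, bh = bbox
    return {name: ((x - bx) / bw, (y - by) / bh)
            for name, (x, y) in keypoints.items()}
```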
  19. The electronic device according to claim 18, wherein the position branch network comprises N position segments and the relation branch network comprises N relation segments, and wherein, when the position branch network is called to detect the candidate human body key points according to the image features and the relation branch network is called to detect the connection relationships between the candidate human body key points according to the image features, the processor is configured to execute:
    calling the 1st position segment to detect a 1st group of candidate human body key points according to the image features, and calling the 1st relation segment to detect connection relationships between the 1st group of candidate human body key points according to the image features;
    fusing the 1st group of candidate human body key points, the connection relationships between the 1st group of candidate human body key points and the image features to obtain a 1st fused feature, calling the 2nd position segment to detect a 2nd group of candidate human body key points according to the 1st fused feature, and calling the 2nd relation segment to detect connection relationships between the 2nd group of candidate human body key points according to the 1st fused feature;
    fusing the 2nd group of candidate human body key points, the connection relationships between the 2nd group of candidate human body key points and the image features to obtain a 2nd fused feature, and so on, until an Nth group of candidate human body key points detected by the Nth position segment according to an (N-1)th fused feature, and connection relationships between the Nth group of candidate human body key points detected by the Nth relation segment according to the (N-1)th fused feature, are obtained;
    taking the Nth group of candidate human body key points as the candidate human body key points detected by the position branch network, and taking the connection relationships between the Nth group of candidate human body key points as the connection relationships between the candidate human body key points detected by the relation branch network.
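A hedged PyTorch-style sketch of the N-stage cascade described in claim 19 follows. The channel sizes, the ReLU activations and the internal layout of each segment are assumptions (claim 20 only constrains the first position segment); only the stage-by-stage fusion of key point maps, relation maps and backbone features follows the claim wording.

```python
import torch
import torch.nn as nn

class DualBranchCascade(nn.Module):
    """N position segments and N relation segments; after each stage the two
    outputs are fused with the backbone features and fed to the next stage."""

    def __init__(self, feat_ch, kp_ch, rel_ch, num_stages):
        super().__init__()

        def segment(in_ch, out_ch):
            # Placeholder segment body; the real layer layout is not fixed here.
            return nn.Sequential(
                nn.Conv2d(in_ch, 128, kernel_size=3, padding=1), nn.ReLU(),
                nn.Conv2d(128, out_ch, kernel_size=1),
            )

        fused_ch = feat_ch + kp_ch + rel_ch
        self.pos_segments = nn.ModuleList(
            [segment(feat_ch, kp_ch)] +
            [segment(fused_ch, kp_ch) for _ in range(num_stages - 1)])
        self.rel_segments = nn.ModuleList(
            [segment(feat_ch, rel_ch)] +
            [segment(fused_ch, rel_ch) for _ in range(num_stages - 1)])

    def forward(self, features):
        x = features
        for pos_seg, rel_seg in zip(self.pos_segments, self.rel_segments):
            keypoint_maps = pos_seg(x)   # candidate human body key points
            relation_maps = rel_seg(x)   # connection relationships between them
            # Fuse this stage's outputs with the backbone features.
            x = torch.cat([keypoint_maps, relation_maps, features], dim=1)
        return keypoint_maps, relation_maps
```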
  20. The electronic device according to claim 19, wherein the 1st position segment comprises a plurality of first convolution modules and a plurality of second convolution modules connected in sequence, the first convolution module comprising a convolution unit with a 3*3 convolution kernel, and the second convolution module comprising a convolution unit with a 1*1 convolution kernel.
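The 3*3-then-1*1 layout of claim 20's first position segment might be assembled as sketched below; the number of modules (`n_3x3`, `n_1x1`) and the hidden channel width are hypothetical, since the claim only fixes the kernel sizes and their ordering.

```python
import torch.nn as nn

def position_segment(in_channels, num_keypoint_maps,
                     n_3x3=3, n_1x1=2, hidden=128):
    """Several 3x3 convolution modules followed by several 1x1 convolution
    modules, ending in a 1x1 projection to the key point maps (sketch)."""
    layers, ch = [], in_channels
    for _ in range(n_3x3):
        layers += [nn.Conv2d(ch, hidden, kernel_size=3, padding=1), nn.ReLU()]
        ch = hidden
    for _ in range(n_1x1 - 1):
        layers += [nn.Conv2d(ch, hidden, kernel_size=1), nn.ReLU()]
        ch = hidden
    layers += [nn.Conv2d(ch, num_keypoint_maps, kernel_size=1)]
    return nn.Sequential(*layers)
```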
PCT/CN2021/074905 2020-02-27 2021-02-02 Photographic composition prompting method and apparatus, storage medium, and electronic device WO2021169754A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010125410.4 2020-02-27
CN202010125410.4A CN111277759B (en) 2020-02-27 2020-02-27 Composition prompting method and device, storage medium and electronic equipment

Publications (1)

Publication Number Publication Date
WO2021169754A1 true WO2021169754A1 (en) 2021-09-02

Family

ID=71000403

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/074905 WO2021169754A1 (en) 2020-02-27 2021-02-02 Photographic composition prompting method and apparatus, storage medium, and electronic device

Country Status (2)

Country Link
CN (1) CN111277759B (en)
WO (1) WO2021169754A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111277759B (en) * 2020-02-27 2021-08-31 Oppo广东移动通信有限公司 Composition prompting method and device, storage medium and electronic equipment
CN111860276B (en) * 2020-07-14 2023-04-11 咪咕文化科技有限公司 Human body key point detection method, device, network equipment and storage medium
CN112036319B (en) * 2020-08-31 2023-04-18 北京字节跳动网络技术有限公司 Picture processing method, device, equipment and storage medium
CN113194254A (en) * 2021-04-28 2021-07-30 上海商汤智能科技有限公司 Image shooting method and device, electronic equipment and storage medium
CN116471477A (en) * 2022-01-11 2023-07-21 华为技术有限公司 Method for debugging camera and related equipment

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008035246A (en) * 2006-07-28 2008-02-14 Mitsubishi Space Software Kk Composition evaluation device, composition adjustment device, image photographing device, composition evaluation program, composition adjustment program, image photographing program, composition evaluation method, composition adjustment method, and image photographing method
US20100110266A1 (en) * 2008-10-31 2010-05-06 Samsung Electronics Co., Ltd. Image photography apparatus and method for proposing composition based person
CN107509032A (en) * 2017-09-08 2017-12-22 维沃移动通信有限公司 One kind is taken pictures reminding method and mobile terminal
CN109660719A (en) * 2018-12-11 2019-04-19 维沃移动通信有限公司 A kind of information cuing method and mobile terminal
CN109788191A (en) * 2018-12-21 2019-05-21 中国科学院自动化研究所南京人工智能芯片创新研究院 Photographic method, device, computer equipment and storage medium
CN111277759A (en) * 2020-02-27 2020-06-12 Oppo广东移动通信有限公司 Composition prompting method and device, storage medium and electronic equipment

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7317815B2 (en) * 2003-06-26 2008-01-08 Fotonation Vision Limited Digital image processing composition using face detection information
TW201023633A (en) * 2008-12-05 2010-06-16 Altek Corp An image capturing device for automatically position indicating and the automatic position indicating method thereof
CN101908153B (en) * 2010-08-21 2012-11-21 上海交通大学 Method for estimating head postures in low-resolution image treatment
CN104917951A (en) * 2014-03-14 2015-09-16 宏碁股份有限公司 Camera device and auxiliary human image shooting method thereof
CN104601889B (en) * 2015-01-20 2018-03-30 广东欧珀移动通信有限公司 The photographic method and device of a kind of mobile terminal
CN108737733B (en) * 2018-06-08 2020-08-04 Oppo广东移动通信有限公司 Information prompting method and device, electronic equipment and computer readable storage medium
KR102661983B1 (en) * 2018-08-08 2024-05-02 삼성전자주식회사 Method for processing image based on scene recognition of image and electronic device therefor
CN109218615A (en) * 2018-09-27 2019-01-15 百度在线网络技术(北京)有限公司 Image taking householder method, device, terminal and storage medium
CN109614613B (en) * 2018-11-30 2020-07-31 北京市商汤科技开发有限公司 Image description statement positioning method and device, electronic equipment and storage medium


Also Published As

Publication number Publication date
CN111277759B (en) 2021-08-31
CN111277759A (en) 2020-06-12

Similar Documents

Publication Publication Date Title
WO2021169754A1 (en) Photographic composition prompting method and apparatus, storage medium, and electronic device
CN109359575B (en) Face detection method, service processing method, device, terminal and medium
WO2021036059A1 (en) Image conversion model training method, heterogeneous face recognition method, device and apparatus
WO2021227726A1 (en) Methods and apparatuses for training face detection and image detection neural networks, and device
WO2019128508A1 (en) Method and apparatus for processing image, storage medium, and electronic device
Mehmood et al. Efficient image recognition and retrieval on IoT-assisted energy-constrained platforms from big data repositories
WO2020199480A1 (en) Body movement recognition method and device
WO2020228525A1 (en) Place recognition method and apparatus, model training method and apparatus for place recognition, and electronic device
Haider et al. Deepgender: real-time gender classification using deep learning for smartphones
WO2021175071A1 (en) Image processing method and apparatus, storage medium, and electronic device
CN111327828B (en) Photographing method and device, electronic equipment and storage medium
KR20160101973A (en) System and method for identifying faces in unconstrained media
WO2019153504A1 (en) Group creation method and terminal thereof
WO2021203823A1 (en) Image classification method and apparatus, storage medium, and electronic device
EP3905104B1 (en) Living body detection method and device
CN113298158B (en) Data detection method, device, equipment and storage medium
WO2024001123A1 (en) Image recognition method and apparatus based on neural network model, and terminal device
WO2021217919A1 (en) Facial action unit recognition method and apparatus, and electronic device, and storage medium
WO2021197466A1 (en) Eyeball detection method, apparatus and device, and storage medium
Cai et al. Visual focus of attention estimation using eye center localization
CN113591562A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN107168536A (en) Test question searching method, test question searching device and electronic terminal
Zuo et al. Face liveness detection algorithm based on livenesslight network
CN112528978B (en) Face key point detection method and device, electronic equipment and storage medium
WO2022120669A1 (en) Gesture recognition method, computer device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21760767

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21760767

Country of ref document: EP

Kind code of ref document: A1