WO2021175071A1 - Image processing method and apparatus, storage medium, and electronic device - Google Patents


Info

Publication number
WO2021175071A1
Authority
WO
WIPO (PCT)
Prior art keywords
key point
human body
image
attribution
information
Application number
PCT/CN2021/075025
Other languages
French (fr)
Chinese (zh)
Inventor
吴佳涛
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp., Ltd. (Oppo广东移动通信有限公司)
Application filed by Guangdong Oppo Mobile Telecommunications Corp., Ltd.
Publication of WO2021175071A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 Detection; Localisation; Normalisation
    • G06V40/165 Detection; Localisation; Normalisation using facial parts and geometric relationships

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands

Definitions

  • This application relates to the field of image processing technology, and in particular to an image processing method, device, storage medium and electronic equipment.
  • Key point detection here mainly refers to human body key point detection, that is, detecting key points of the human body such as the eyes, nose, elbows, and shoulders, and connecting them in the order of the limbs to describe the human body.
  • The embodiments of the present application provide an image processing method, device, storage medium, and electronic equipment, which can improve the efficiency of key point detection.
  • According to the key point location information and the key point attribution information, a set of human body key points belonging to the same human body is identified.
  • In the image processing device, the image acquisition module is used to acquire the image to be detected that requires key point detection.
  • The image detection module is used to call a pre-trained key point detection model to perform key point detection on the image to be detected to obtain key point location information and key point attribution information.
  • The human body recognition module is used to identify a set of human body key points belonging to the same human body based on the key point location information and the key point attribution information.
  • The storage medium provided by the embodiments of the present application stores a computer program; when the computer program is loaded by a processor, the image processing method provided in the present application is executed.
  • The electronic device provided by the embodiments of the present application includes a processor and a memory; the memory stores a computer program, and the processor loads the computer program to execute the image processing method provided by the present application.
  • FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the application.
  • Fig. 2 is an example diagram of a key point detection interface provided by an embodiment of the present application.
  • Fig. 3 is an example diagram of a selection sub-interface provided by an embodiment of the present application.
  • Fig. 4 is a schematic structural diagram of a key point detection model provided by an embodiment of the present application.
  • Fig. 5 is a schematic structural diagram of a feature prediction network in an embodiment of the present application.
  • Fig. 6 is a schematic structural diagram of an attribution branch in an embodiment of the present application.
  • FIG. 7 is another schematic flowchart of the image processing method provided in an embodiment of the present application.
  • Fig. 8 is an example diagram of outputting prompt information in an embodiment of the present application.
  • Fig. 9 is a diagram showing an example of matching of positioning points and composition points in an embodiment of the present application.
  • Fig. 10 is a schematic structural diagram of an image processing device provided by an embodiment of the present application.
  • FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • Artificial Intelligence (AI) uses digital computers or machines controlled by digital computers to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use that knowledge to obtain the best results.
  • Artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence.
  • Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that the machines have the functions of perception, reasoning, and decision-making.
  • Artificial intelligence technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies.
  • Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, operation/interaction systems, and mechatronics.
  • Artificial intelligence software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
  • Machine Learning is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other subjects. It specializes in studying how computers simulate or realize human learning behaviors to acquire new knowledge or skills and reorganize existing knowledge structures to continuously improve their own performance.
  • Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent; its applications cover all fields of artificial intelligence.
  • Machine learning and deep learning usually include artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and other technologies.
  • The embodiments of the present application provide an image processing method, an image processing device, a storage medium, and electronic equipment. The execution subject of the image processing method may be the image processing device provided in the embodiments of the application, or an electronic device that integrates the image processing device.
  • The image processing device can be implemented in hardware or software.
  • The electronic device may be any device that is equipped with a processor (including but not limited to a general-purpose processor or a customized processor) and has processing capabilities, such as a smartphone, tablet computer, palmtop computer, notebook computer, or desktop computer.
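  • This application provides an image processing method, including: acquiring an image to be detected that requires key point detection; calling a pre-trained key point detection model to perform key point detection on the image to be detected to obtain key point location information and key point attribution information; and identifying, according to the key point location information and the key point attribution information, a set of human body key points belonging to the same human body.
  • The key point detection model includes a feature extraction network and a feature prediction network, and calling the pre-trained key point detection model to perform key point detection on the image to be detected to obtain the key point location information and the key point attribution information includes: calling the feature extraction network to perform feature extraction on the image to be detected to obtain image features; and calling the feature prediction network to perform key point detection on the image features to obtain the key point location information and the key point attribution information.
  • The feature prediction network includes a location branch and an attribution branch, and calling the feature prediction network to perform key point detection on the image features to obtain the key point location information and the key point attribution information includes: calling the location branch to perform key point location detection based on the image features to obtain the key point location information; and calling the attribution branch to perform key point attribution detection based on the image features and the key point location information to obtain the key point attribution information.
  • The location branch includes a convolution unit with a 1*1 convolution kernel.
  • The attribution branch includes a feature optimization sub-module, a fusion sub-module, and an output sub-module, and calling the attribution branch to perform key point attribution detection based on the image features and the key point location information to obtain the key point attribution information includes: calling the feature optimization sub-module to optimize the image features to obtain optimized image features; calling the fusion sub-module to fuse the optimized image features with the key point location information to obtain a fusion feature; and calling the output sub-module to perform key point attribution detection on the fusion feature to obtain the key point attribution information.
  • The feature optimization sub-module includes a convolution unit with a 1*1 convolution kernel.
  • The output sub-module includes a convolution unit with a 1*1 convolution kernel.
  • Before acquiring the image to be detected that requires key point detection, the method further includes: acquiring a sample image and the sample key point location information corresponding to the sample image; calling the key point detection model to perform key point detection on the sample image to obtain predicted key point location information and predicted key point attribution information; acquiring a key point location loss according to the sample key point location information and the predicted key point location information; acquiring a key point attribution loss; and fusing the key point location loss and the key point attribution loss to obtain a fusion loss, and adjusting the parameters of the key point detection model according to the fusion loss.
  • Acquiring the key point attribution loss based on the sample key point location information and the predicted key point attribution information includes: clustering the human body key points to obtain multiple human body key point sets belonging to different human bodies, and obtaining the key point attribution loss from the body numbers predicted at the key points of each set.
  • Acquiring the image to be detected that requires key point detection includes: when the electronic device enables the shooting function, obtaining a preview image of the shooting scene and using the preview image as the image to be detected.
  • The method further includes: determining a target human body according to the identified human body key point sets; classifying the target human body according to its human body key point set to obtain its human body type; determining a positioning point and a composition point corresponding to the target human body; and, when the positioning point does not match the composition point, outputting prompt information instructing the user to adjust the shooting posture of the electronic device.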
  • FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the application.
  • The image processing method provided by an embodiment of the application may be as follows:
  • The key point detection in this application mainly refers to human body key point detection, that is, detecting key points of the human body such as the eyes, nose, elbows, and shoulders, connecting them in sequence, and describing the human body through these key points.
  • The electronic device can receive a key point detection request input by the user and obtain the image to be detected according to that request; it can also automatically identify an image that needs key point detection and use it as the image to be detected.
  • The electronic device may receive the key point detection request through a key point detection interface that includes a request input interface. As shown in Fig. 2, the request input interface may take the form of an input box: the user enters the identification information of the image that needs key point detection into the input box and then enters confirmation information (for example, by pressing the Enter key on the keyboard) to input a key point detection request carrying the identification information of that image.
  • The electronic device can obtain the image that needs key point detection according to the identification information in the received key point detection request, and record it as the image to be detected.
  • The key point detection interface shown in Fig. 2 also includes an "Open" control.
  • When the electronic device detects that the "Open" control is triggered, it superimposes a selection sub-interface (Fig. 3) on the key point detection interface.
  • The selection sub-interface provides the user with thumbnails of images available for key point detection, such as image A, image B, image C, image D, image E, and image F.
  • After selecting the thumbnail of the image that needs key point detection, the user can trigger the confirmation control provided by the selection sub-interface to input a key point detection request to the electronic device. The key point detection request is associated with the thumbnail selected by the user and instructs the electronic device to use the selected image as the image to be detected.
  • The pre-trained key point detection model is called to perform key point detection on the image to be detected to obtain key point location information and key point attribution information.
  • In this application, a machine learning method is used to pre-train the key point detection model.
  • The key point detection model is configured to simultaneously predict all human body key points in the input image and the human body to which each belongs; it can be deployed locally on the electronic device or on a server.
  • The configuration of the key point detection model is not specifically limited in this application and can be selected by a person of ordinary skill in the art according to actual needs.
  • After obtaining the image to be detected, the electronic device calls the pre-trained key point detection model locally or from the server and inputs the image into the model to obtain the key point location information and key point attribution information output by the model. The key point location information describes all human body key points in the image to be detected, and the key point attribution information describes the human body to which each key point belongs.
  • For example, if the key point location information describes that human body key point A and human body key point B exist in the image to be detected, the key point attribution information may describe that key point A belongs to human body A and key point B belongs to human body B.
  • A set of human body key points belonging to the same human body is identified according to the key point location information and the key point attribution information.
  • The key point location information describes all human body key points in the image to be detected, and the key point attribution information describes the human body to which each key point belongs; key points with the same attribution therefore form the key point set of one human body.
  • In summary, this application acquires the image to be detected that requires key point detection; calls the pre-trained key point detection model to perform key point detection on it to obtain key point location information and key point attribution information; and identifies, according to the key point location information and the key point attribution information, the set of human body key points belonging to the same human body.
  • Therefore, the present application does not require a human body detection algorithm as a preceding step and can detect all human body key points in the image at the same time, thereby improving the efficiency of key point detection.
  • The key point detection model includes a feature extraction network and a feature prediction network.
  • Calling the pre-trained key point detection model to perform key point detection on the image to be detected to obtain key point location information and key point attribution information includes: calling the feature extraction network to extract image features from the image to be detected, and calling the feature prediction network to perform key point detection on the image features.
  • The key point detection model is composed of two parts: a feature extraction network used for feature extraction and a feature prediction network used for key point detection.
  • The feature extraction network can be any known feature extraction network, such as VGG, MobileNet, or ResNet. A deeper network model such as VGG or ResNet increases the computational complexity of the model but yields higher detection accuracy; a lightweight network model such as MobileNet sacrifices some detection accuracy but yields faster detection.
  • The specific choice can be made by a person of ordinary skill in the art according to actual needs and is not specifically limited in this application.
  • When the electronic device calls the key point detection model to perform key point detection on the image to be detected, it can first call the feature extraction network in the model to perform feature extraction on the image to obtain its image features, and then call the feature prediction network to perform key point detection on those image features.
  • The key point location information is represented as a key point location heat map, a three-dimensional matrix of height*width*keypoints, where height and width are the height and width of the heat map and keypoints is the number of human body key points. That is, each human body key point corresponds to a height*width matrix.
  • The value at each position of this matrix indicates the likelihood that the key point is located at that position: the larger the value, the more likely the key point is there.
  • The key point location heat map can be max-pooled, and the heat maps before and after pooling can then be compared to locate the key points: positions whose values are unchanged by pooling are local maxima.
  • The key point attribution information can be represented as an integer human body number: at each detected human body key point position, the feature prediction network predicts an integer as the human body number, and human body key points with the same body number belong to the same human body.
  • The feature prediction network includes a location branch and an attribution branch, which are called to perform key point detection on the image features to obtain the key point location information and the key point attribution information.
  • The key point detection task is divided into two subtasks, and a dual-branch network is used to realize key point detection.
  • One branch is configured to detect the human body key points in the image and is referred to as the location branch.
  • The other branch is configured to detect the human body to which each key point belongs and is referred to as the attribution branch.
  • When the electronic device calls the feature prediction network to perform key point detection on the image features, it can call the location branch to perform key point location detection based on the image features to obtain the key point location information corresponding to the image to be detected.
  • The attribution of a human body key point is deeper feature information than its location: the attribution of a key point can be predicted accurately only when its location is known accurately. Based on this consideration, the electronic device calls the attribution branch in the feature prediction network to perform key point attribution detection based on the image features and the key point location information, obtaining the key point attribution information corresponding to the image to be detected.
  • The location branch includes a convolution unit with a 1*1 convolution kernel.
  • The attribution branch includes a feature optimization sub-module, a fusion sub-module, and an output sub-module.
  • The attribution branch performs key point attribution detection based on the image features and the key point location information to obtain the key point attribution information as follows.
  • The attribution branch is composed of three parts: a feature optimization sub-module that further extracts image features to optimize them, a fusion sub-module that fuses the optimized image features with the key point location information, and an output sub-module that performs key point attribution detection on the fusion feature.
  • When the electronic device calls the attribution branch to perform key point attribution detection based on the image features and the key point location information, it can call the feature optimization sub-module to optimize the image features, recording the result as optimized image features; then call the fusion sub-module to fuse the optimized image features with the key point location information to obtain a fusion feature; and finally call the output sub-module to perform key point attribution detection on the fusion feature to obtain the key point attribution information corresponding to the image to be detected.
  • The feature optimization sub-module includes a 1*1 convolution unit.
  • The output sub-module includes a 1*1 convolution unit.
  • The fusion sub-module includes a Concat unit.
  • The electronic device calls the feature optimization sub-module to perform further convolution operations on the image features to obtain the optimized image features; it then calls the fusion sub-module to concatenate the optimized feature map with the key point location heat map, thereby achieving feature fusion.
  • The feature map is 19-dimensional.
  • Before acquiring the image to be detected that requires key point detection, the method further includes training the key point detection model.
  • In the training, a fusion loss is obtained by fusing the key point location loss and the key point attribution loss, and the parameters of the key point detection model are adjusted according to the fusion loss.
  • The embodiments of the present application thus also provide a training scheme for the key point detection model.
  • The electronic device first obtains a sample image and the sample key point location information corresponding to the sample image.
  • For example, an image containing a human body can be obtained from the ImageNet data set as the sample image, and the corresponding sample key point location information is obtained by annotating the sample image.
  • The electronic device also constructs the key point detection model; for its structure, refer to the relevant description in the above embodiments, which is not repeated here.
  • The electronic device calls the key point detection model to perform key point detection on the sample image, obtaining the predicted key point location information and predicted key point attribution information corresponding to the sample image.
  • The predicted key point location information describes all human body key points in the sample image, and the predicted key point attribution information describes the human body to which each key point belongs.
  • The electronic device obtains the key point location loss according to the sample key point location information and the predicted key point location information; the key point location loss measures the difference between the predicted key point location information and the sample key point location information.
  • The key point location loss can be expressed as:
  • L_position represents the key point location loss;
  • (i, j) represents a coordinate position;
  • p(i, j) represents the value at position (i, j) in the predicted key point location heat map;
  • g(i, j) represents the value at position (i, j) in the sample key point location heat map;
  • width represents the width of the predicted key point location heat map;
  • height represents the height of the predicted key point location heat map.
  • The electronic device also obtains the key point attribution loss based on the predicted key point location information and the predicted key point attribution information. Note that the key point attribution loss differs from the key point location loss: because the number of human bodies differs between sample images, the attribution of human body key points cannot be annotated in advance, so there is no ground-truth key point attribution to serve as the training target.
  • Obtaining the key point attribution loss according to the predicted key point location information and the predicted key point attribution information includes:
  • A clustering algorithm (selected by a person of ordinary skill in the art according to actual needs) is applied to the human body key points in the sample image described by the predicted key point location information, yielding multiple human body key point sets belonging to different human bodies, where the key points in the same set belong to the same human body. The key point attribution loss is then computed, where:
  • n denotes the index of the human body key point set corresponding to the n-th person;
  • k denotes the k-th key point;
  • K represents the number of human body key points;
  • h_nk represents the body number predicted at the k-th key point of the n-th person;
  • N represents the number of human body key point sets;
  • is a constant that takes an empirical value; n, n′ ∈ [1, N], and n ≠ n′;
  • L_attribution represents the key point attribution loss.
  • After obtaining the key point location loss and the key point attribution loss, the electronic device fuses them to obtain the fusion loss, which can be expressed as:
  • L_total represents the fusion loss.
  • After obtaining the fusion loss, the electronic device adjusts the parameters of the key point detection model according to the fusion loss until the training of the key point detection model is complete.
  • Acquiring the image to be detected that requires key point detection includes: when the electronic device enables the shooting function, obtaining a preview image of the shooting scene and using it as the image to be detected.
  • The shooting scene is the scene at which the camera of the electronic device is aimed after the shooting function is enabled; it can be any scene and may include people and objects.
  • The electronic device can start its system application "Camera" according to a user operation. After "Camera" is started, the electronic device enables the shooting function and collects images in real time through the camera; the scene at which the camera is aimed at this time is the shooting scene.
  • The electronic device can start "Camera" according to the user's touch operation on the "Camera" entry, or according to the user's voice command "start the camera", and so on.
  • When the electronic device enables the shooting function, it obtains a preview image of the shooting scene, uses the preview image as the image to be detected, performs key point detection on it, and identifies the human body key point sets in the preview image.
  • When there are multiple human bodies in the preview image, a human body key point set is finally obtained for each of them, giving multiple human body key point sets; when there is only one human body in the preview image, a single set of human body key points corresponding to that human body is finally obtained.
  • The electronic device determines the target human body based on the identified human body key point sets. For example, when there is only one human body key point set, the human body corresponding to that set is directly determined as the target human body; when there are multiple sets, the human body corresponding to one of them is determined as the target human body according to a preset target decision strategy.
  • After determining the target human body, the electronic device further classifies the target human body in the shooting scene according to its human body key point set and a preset human body classification strategy to obtain the human body type of the target human body.
  • The classification of human body types is not specifically limited in this application and can be configured by a person of ordinary skill in the art according to actual needs.
  • According to a preset positioning point decision strategy, the electronic device determines the positioning point corresponding to the target human body from its human body type and human body key point set. In addition, it determines the composition type corresponding to the target human body according to a preset composition type decision strategy.
  • The composition type is used to represent the position of the target human body. Note that the division of composition types is not specifically limited in this application and can be configured by a person of ordinary skill in the art according to actual needs.
  • A plurality of candidate composition points are preset.
  • The electronic device may further determine the currently selectable candidate composition points according to the determined composition type, and then determine, according to the positioning point, the composition point corresponding to the target human body from among them.
  • After determining the positioning point and the composition point corresponding to the target human body, the electronic device determines in real time whether they match. If they do not match, it outputs prompt information (see Fig. 8) instructing the user to adjust the shooting posture of the electronic device so that the positioning point of the target human body in the shooting scene matches the composition point, yielding a better composition; if they match, the shooting scene can be photographed directly to obtain a captured image of the shooting scene.
  • The positioning point matches the composition point when the distance between them is less than or equal to a preset distance.
  • The value of the preset distance is not specifically limited in this application and can be selected by a person of ordinary skill in the art according to actual needs.
  • FIG. 7 is another schematic flowchart of the image processing method provided by the embodiments of the application.
  • The flow of the image processing method provided by the embodiments of the application may also be as follows:
  • When the shooting function is enabled, the electronic device obtains a preview image of the shooting scene and uses the preview image as the image to be detected that requires key point detection.
  • the electronic device acquires a preview image of the shooting scene when the shooting function is enabled, and uses the preview image as the image to be detected that requires key point detection.
  • the electronic device invokes the pre-trained key point detection model to perform key point detection on the image to be detected to obtain key point location information and key point attribution information.
  • in this application, the key point detection model is pre-trained using a machine learning method.
  • the key point detection model is configured to simultaneously predict all human body key points in the input image and the human body to which each key point belongs. It can be deployed locally on the electronic device or on a server.
  • the configuration of the key point detection model is not specifically limited in this application, and can be selected by a person of ordinary skill in the art according to actual needs.
  • after obtaining the image to be detected that requires key point detection, the electronic device calls the pre-trained key point detection model locally or from the server and inputs the image into the model to obtain the key point location information and the key point attribution information output by the model.
  • the key point location information describes all the human body key points in the image to be detected, and the key point attribution information describes the human body to which each human body key point belongs.
  • for example, if the key point location information indicates that human body key point A and human body key point B exist in the image to be detected, the key point attribution information may indicate that key point A belongs to human body A and key point B belongs to human body B.
  • the electronic device identifies a set of human body key points belonging to the same human body based on the key point location information and the key point attribution information.
  • the key point location information describes all the key points of the human body in the image to be detected
  • the key point attribution information describes the human body to which each key point of the human body belongs.
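The grouping step can be sketched as below. The sketch assumes that the model output has already been decoded into one record per key point, with the attribution expressed as an integer body id. The record layout (`KeyPoint`, `person_id`) and the key point names are hypothetical; the text does not specify the decoded format.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class KeyPoint(NamedTuple):
    name: str       # key point type, e.g. "left_eye" (illustrative naming)
    x: float        # from the key point location information
    y: float
    person_id: int  # from the key point attribution information (assumed integer id)


def group_by_body(keypoints: List[KeyPoint]) -> Dict[int, List[KeyPoint]]:
    """Collect key points that share the same attribution into one set per body."""
    groups: Dict[int, List[KeyPoint]] = defaultdict(list)
    for kp in keypoints:
        groups[kp.person_id].append(kp)
    return dict(groups)


detections = [
    KeyPoint("nose", 120, 80, person_id=0),
    KeyPoint("nose", 410, 95, person_id=1),
    KeyPoint("left_eye", 112, 72, person_id=0),
]
sets = group_by_body(detections)
print({pid: [kp.name for kp in kps] for pid, kps in sets.items()})
# {0: ['nose', 'left_eye'], 1: ['nose']}
```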
  • the electronic device determines the target human body according to the identified human body key point set, and classifies the human body according to the human body key point set corresponding to the target human body to obtain the human body type of the target human body.
  • the electronic device determines the target human body based on the identified human body key point sets. For example, when there is only one human body key point set, the human body corresponding to that set is directly determined as the target human body; when there are multiple human body key point sets, the human body corresponding to one of the sets is determined as the target human body according to a preset target decision strategy.
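Target selection can be sketched as follows. The single-set case follows the text directly. For several sets the text only refers to a "preset target decision strategy" without defining it. The sketch therefore substitutes a hypothetical rule: choose the body whose key points span the largest bounding box. Replace that rule with the actual strategy.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def box_area(points: List[Point]) -> float:
    """Area of the axis-aligned box enclosing one body's key points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def select_target(key_point_sets: Dict[int, List[Point]]) -> int:
    """Return the id of the target human body.

    One set: that body is the target. Several sets: apply a target decision
    strategy; "largest key-point bounding box" is a hypothetical stand-in.
    """
    if not key_point_sets:
        raise ValueError("no human body detected")
    if len(key_point_sets) == 1:
        return next(iter(key_point_sets))
    return max(key_point_sets, key=lambda body_id: box_area(key_point_sets[body_id]))


sets = {0: [(0, 0), (10, 10)], 1: [(0, 0), (50, 40)]}
print(select_target(sets))  # 1: body 1 has the larger key-point box
```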
  • after determining the target human body, the electronic device further classifies the target human body in the shooting scene according to the human body key point set corresponding to the target human body and a preset human body classification strategy, to obtain the human body type of the target human body.
  • the classification of human body types is not specifically limited in this application, and can be configured by a person of ordinary skill in the art according to actual needs.
  • the head length and head width of the target human body are obtained according to the head key points, and the larger of the head length and head width is determined.
  • the ratio of this larger value to the length of the portrait bounding box is calculated. If the ratio is in the first ratio interval, the target human body is determined to be the first human body type; if the ratio is in the second ratio interval, the second human body type; if the ratio is in the third ratio interval, the third human body type; and if the ratio is in the fourth ratio interval, the fourth human body type; or,
  • the target human body is determined to be the fourth human body type; or,
  • the target human body is determined to be the third human body type; or,
  • the target human body is determined to be the second human body type.
  • an optional target human body classification strategy is provided. First, the electronic device identifies whether the detected human body key points include only head key points. If only head key points are included, other key points may exist that have not been detected.
  • the electronic device then obtains the head length and head width of the target human body from the head key points and determines the larger of the two. It calculates the ratio of this larger value to the length of the portrait bounding box, where the length of the portrait bounding box is the length of its side along the vertical axis. It then divides the human body type according to the calculated ratio. For example, if the head length is greater than the head width, the electronic device calculates the ratio of the head length to the length of the portrait bounding box; correspondingly, if the head width is greater than the head length, it calculates the ratio of the head width to the length of the portrait bounding box.
  • if the ratio is in the first ratio interval, the target human body is determined to be the first human body type;
  • if the ratio is in the second ratio interval, the target human body is determined to be the second human body type;
  • if the ratio is in the third ratio interval, the target human body is determined to be the third human body type;
  • if the ratio is in the fourth ratio interval, the target human body is determined to be the fourth human body type.
  • each ratio interval can be divided by a person of ordinary skill in the art according to actual needs, and this application does not specifically limit this.
  • the first ratio interval is configured as (1/4, +∞). That is, when the ratio of the larger of the head length and head width of the target human body to the length of the portrait bounding box is greater than 1/4, the target human body in the shooting scene is determined to be the first human body type, and it is determined that the user wants to take a face close-up of the target human body;
  • the second ratio interval is configured as (1/6, 1/4]. That is, when the ratio is greater than 1/6 but not greater than 1/4, the target human body in the shooting scene is determined to be the second human body type, and it is determined that the user wants to photograph a bust of the target human body;
  • the third ratio interval is configured as (1/9, 1/6]. That is, when the ratio is greater than 1/9 but not greater than 1/6, the target human body in the shooting scene is determined to be the third human body type, and it is determined that the user wants to take a three-quarter-length portrait of the target human body;
  • the fourth ratio interval is configured as (-∞, 1/9]. That is, when the ratio is not greater than 1/9, the target human body in the shooting scene is determined to be the fourth human body type, and it is determined that the user wants to take a full-length portrait of the target human body.
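The head-ratio classification with the example intervals can be sketched as follows. The thresholds come from the example configuration in the text; the text itself leaves the intervals open. The function name is illustrative.

```python
def classify_by_head_ratio(head_length: float, head_width: float,
                           box_length: float) -> int:
    """Map max(head_length, head_width) / box_length to human body type 1..4.

    box_length is the vertical side of the portrait bounding box. Thresholds
    are the example intervals: (1/4, +inf), (1/6, 1/4], (1/9, 1/6], (-inf, 1/9].
    """
    ratio = max(head_length, head_width) / box_length
    if ratio > 1 / 4:
        return 1  # face close-up
    if ratio > 1 / 6:
        return 2  # bust
    if ratio > 1 / 9:
        return 3  # three-quarter-length portrait
    return 4      # full-length portrait


print(classify_by_head_ratio(30, 22, 100))  # ratio 0.30 -> 1
print(classify_by_head_ratio(25, 20, 100))  # ratio 0.25 (not > 1/4) -> 2
print(classify_by_head_ratio(10, 8, 100))   # ratio 0.10 -> 4
```

Testing the thresholds from largest to smallest with strict `>` gives each interval its closed upper bound. For example, a ratio of exactly 1/4 falls into the second interval.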
  • otherwise, the human body type is classified according to the key points of other body parts.
  • the target human body in the shooting scene is determined to be the fourth human body type, and it is determined that the user wants to take a full-length portrait of the target human body;
  • the target human body is determined to be the third human body type, and it is determined that the user wants to take a three-quarter-length portrait of the target human body;
  • the target human body is determined to be the second human body type, and it is determined that the user wants to take a bust of the target human body.
  • the electronic device determines the positioning point and composition type corresponding to the target human body according to the human body type and the set of human body key points corresponding to the target human body.
  • the positioning point is used to represent the position of the human body.
  • after obtaining the human body type through classification, the electronic device further determines the positioning point corresponding to the target human body according to the human body type and the human body key point set of the target human body, using a preset positioning point decision strategy.
  • the composition type corresponding to the target human body is determined according to a preset composition type decision strategy.
  • the classification of composition types is not specifically limited in this application and can be configured by a person of ordinary skill in the art according to actual needs.
  • the composition types in the embodiments of the present application include a face close-up composition and a full-body composition.
  • the electronic device recognizes whether the head orientation of the target human body is forward or lateral according to the head key points in the human body key point set of the target human body;
  • the geometric center point of the portrait bounding box is determined as the positioning point, and the composition type is determined to be the first composition type;
  • if the head orientation of the target human body is lateral and the human body type is the first human body type, multiple symmetric head key points among the head key points are identified, the geometric center point of these symmetric head key points is determined as the positioning point, and the composition type is determined to be the first composition type; or,
  • the geometric center point of the portrait bounding box is determined as the positioning point, and the composition type is determined to be the second composition type;
  • if the human body type is the second human body type, the third human body type, or the fourth human body type, multiple symmetric head key points among the head key points are identified, the geometric center point of these symmetric head key points is determined as the positioning point, and the composition type is determined to be the second composition type; or,
  • if the human body type is the third human body type or the fourth human body type, the average of the ordinates of the head key points is determined as the ordinate of the positioning point, the abscissa of the geometric center point of the portrait bounding box is determined as the abscissa of the positioning point, and the composition type is determined to be the second composition type.
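The last variant (third or fourth human body type) fully specifies the positioning point, so it can be sketched directly. The sketch assumes image coordinates with y growing downward and a bounding box given as (left, top, right, bottom). The function name is hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (left, top, right, bottom), y grows downward


def positioning_point_for_long_shot(head_points: List[Point], portrait_box: Box) -> Point:
    """Third/fourth body type variant: x from the box centre, y from the head.

    The ordinate is the mean ordinate of the head key points; the abscissa is
    that of the portrait bounding box's geometric centre.
    """
    if not head_points:
        raise ValueError("need at least one head key point")
    left, _, right, _ = portrait_box
    y = sum(p[1] for p in head_points) / len(head_points)
    x = (left + right) / 2
    return (x, y)


print(positioning_point_for_long_shot([(40, 20), (60, 40)], (0, 0, 100, 400)))  # (50.0, 30.0)
```

Tying the ordinate to the head keeps the face, rather than the middle of a tall full-body box, as the reference for vertical placement.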
  • This application provides an optional positioning point decision strategy and composition type decision strategy.
  • the electronic device first recognizes whether the head orientation of the target human body is forward or lateral according to the head key points in the human body key point set of the target human body.
  • the electronic device can obtain the abscissas of the eye key points, the nose key point, and the mouth key point and compute their average. If the average lies within the left 1/4 or the right 1/4 of the portrait bounding box, the head is judged to be lateral; otherwise, it is judged to be forward.
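The orientation test can be sketched as follows. Treating the quarter boundaries as inclusive is an assumption, because the text does not say which side the boundary falls on. The function name is illustrative.

```python
from typing import Iterable


def head_orientation(face_xs: Iterable[float], box_left: float, box_right: float) -> str:
    """Classify head orientation as "lateral" or "forward".

    face_xs: abscissas of the eye, nose and mouth key points. If their mean falls
    in the left or right quarter of the portrait bounding box, the head is
    lateral. Treating the quarter boundary as inclusive is an assumption.
    """
    xs = list(face_xs)
    if not xs:
        raise ValueError("need at least one facial key point")
    mean_x = sum(xs) / len(xs)
    quarter = (box_right - box_left) / 4
    if mean_x <= box_left + quarter or mean_x >= box_right - quarter:
        return "lateral"
    return "forward"


print(head_orientation([45, 55, 50, 50], 0, 100))  # mean 50    -> forward
print(head_orientation([10, 20, 5, 12], 0, 100))   # mean 11.75 -> lateral
```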
  • the positioning point and composition type are further determined according to the recognized head orientation and human body type.
  • the composition type is determined to be the first composition type (i.e., the face close-up composition).
  • multiple symmetric head key points among the head key points are identified, their geometric center point is determined as the positioning point, and the composition type is determined to be the first composition type.
  • symmetric head key points are head key points that occur in pairs with both members detected, such as the left-eye and right-eye key points, or the left-ear and right-ear key points.
  • the geometric center point of the multiple symmetric head key points is the geometric center point of the polygon obtained by connecting these symmetric head key points.
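One way to compute the geometric center point of that polygon is the shoelace area centroid, sketched below. Three steps in the sketch are assumptions rather than details from the text:

- The vertices are ordered by angle around their mean, so the polygon does not self-intersect.
- A single pair of points (for example, only the two eyes) has no area, so the midpoint is returned instead.
- Collinear points are handled the same way.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def polygon_center(points: List[Point]) -> Point:
    """Geometric centre (area centroid) of the polygon joining the given points.

    Vertices are ordered by angle around their mean so the polygon does not
    self-intersect. With fewer than three points, or collinear points, the
    polygon has no area and the vertex mean is returned instead.
    """
    if not points:
        raise ValueError("need at least one symmetric head key point")
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    if n < 3:
        return (mx, my)
    ordered = sorted(points, key=lambda p: math.atan2(p[1] - my, p[0] - mx))
    area2 = cx = cy = 0.0  # area2 accumulates twice the signed area
    for i in range(n):
        x0, y0 = ordered[i]
        x1, y1 = ordered[(i + 1) % n]
        cross = x0 * y1 - x1 * y0
        area2 += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if abs(area2) < 1e-9:
        return (mx, my)
    return (cx / (3 * area2), cy / (3 * area2))


print(polygon_center([(10, 20), (30, 20)]))  # two eyes -> midpoint (20.0, 20.0)
```

For four or more points, the area centroid can differ from the plain vertex average. For example, the trapezoid (0, 0), (4, 0), (4, 4), (0, 1) has its centroid at (2.4, 1.4), not (2.0, 1.25).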
  • the geometric center point of the portrait bounding box is determined as the positioning point, and the composition type is determined to be the second composition type (i.e., the full-body composition).
  • if the human body type is the second human body type, the third human body type, or the fourth human body type, multiple symmetric head key points among the head key points are identified, the geometric center point of these symmetric head key points is determined as the positioning point, and the composition type is determined to be the second composition type; or,
  • if the human body type is the third human body type or the fourth human body type, the average of the ordinates of the head key points is determined as the ordinate of the positioning point, the abscissa of the geometric center point of the portrait bounding box is determined as the abscissa of the positioning point, and the composition type is determined to be the second composition type.
  • the electronic device determines the composition point corresponding to the target human body according to the positioning point and the composition type.
  • a plurality of optional candidate composition points are preset.
  • the electronic device may further determine the currently selectable candidate composition points according to the determined composition type, and then determine the composition point corresponding to the target human body from the currently selectable candidate composition points according to the positioning point.
  • if the composition type is the first composition type, the candidate composition point closest to the positioning point is selected from the candidate composition points corresponding to the first composition type and determined as the composition point;
  • if the composition type is the second composition type, the candidate composition point closest to the positioning point is selected from the candidate composition points corresponding to the second composition type and determined as the composition point.
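The nearest-candidate selection can be sketched as follows. The sketch assumes the candidate list has already been restricted to the determined composition type (and shooting mode). It also assumes Euclidean distance, which the text does not state. The function name is illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def select_composition_point(positioning_point: Point, candidates: List[Point]) -> Point:
    """Pick the candidate composition point nearest to the positioning point.

    candidates is assumed to be already filtered to the determined composition
    type (and shooting mode); Euclidean distance is an assumption.
    """
    if not candidates:
        raise ValueError("no candidate composition points")
    return min(candidates, key=lambda c: math.dist(positioning_point, c))


print(select_composition_point((620, 330), [(600, 450), (600, 300), (400, 300)]))  # (600, 300)
```

`math.dist` requires Python 3.8 or later.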
  • in the embodiments of the present application, a plurality of optional candidate composition points are preset and divided into two parts: candidate composition points suitable for landscape shooting and candidate composition points suitable for portrait shooting.
  • the candidate composition points suitable for landscape shooting include the image center, the midpoint of the upper third line, and the intersections of the upper third line with the left and right third lines; the candidate composition points suitable for portrait shooting include the image center and the midpoint of the upper third line.
  • a plurality of optional candidate composition points are also preset in the embodiments of this application, likewise divided into candidate composition points suitable for landscape shooting and candidate composition points suitable for portrait shooting.
  • the candidate composition points suitable for landscape shooting include the image center, the four intersection points of the upper/lower third lines with the left/right third lines, and the four midpoints of the upper, lower, left, and right third lines.
  • the candidate composition points suitable for portrait shooting include the center of the image and the midpoint of the upper third line.
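The two candidate listings can be generated from the image size as sketched below. The sketch makes three assumptions:

- The origin is at the top-left and y grows downward.
- The mode strings are "portrait" and "landscape".
- An `extended` flag switches between the shorter and the longer landscape listing.

The text does not say which listing belongs to which composition type, so the sketch does not map them to types.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def candidate_points(width: float, height: float, mode: str, extended: bool) -> List[Point]:
    """Rule-of-thirds candidate composition points, origin at top-left, y downward.

    extended=False follows the first listing (landscape: centre, midpoint of the
    upper third line, its intersections with the left/right third lines);
    extended=True follows the second (centre, four intersections, four
    midpoints). Portrait mode uses the centre and the upper-third midpoint.
    """
    cx, cy = width / 2, height / 2
    left, right = width / 3, 2 * width / 3
    upper, lower = height / 3, 2 * height / 3
    if mode == "portrait":
        return [(cx, cy), (cx, upper)]
    if mode != "landscape":
        raise ValueError(f"unknown shooting mode: {mode}")
    if not extended:
        return [(cx, cy), (cx, upper), (left, upper), (right, upper)]
    intersections = [(x, y) for y in (upper, lower) for x in (left, right)]
    midpoints = [(cx, upper), (cx, lower), (left, cy), (right, cy)]
    return [(cx, cy)] + intersections + midpoints


print(len(candidate_points(1200, 900, "landscape", extended=True)))  # 9
print(candidate_points(900, 1200, "portrait", extended=False))       # [(450.0, 600.0), (450.0, 400.0)]
```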
  • the electronic device first recognizes whether the current shooting mode is portrait mode or landscape mode. Then, among the candidate composition points of the determined composition type that correspond to the current shooting mode, it selects the candidate composition point closest to the positioning point as the composition point corresponding to the target human body.
  • when the positioning point does not match the composition point, the electronic device outputs prompt information instructing the user to adjust the shooting posture of the electronic device.
  • the positioning point is considered to match the composition point when the distance between them is less than or equal to the preset distance.
  • This application does not limit the value of the preset distance; a person of ordinary skill in the art may select it according to actual needs.
  • the electronic device determines in real time whether the positioning point of the target human body in the shooting scene matches the composition point. If not, it outputs prompt information instructing the user to adjust the shooting posture of the electronic device so that the positioning point matches the composition point, thereby obtaining a better composition.
  • the determined positioning point is the geometric center point of multiple symmetric head key points of the human head,
  • and the determined composition point is the intersection of the upper third line and the right third line.
  • the electronic device superimposes the upper/lower/left/right third lines, together with the determined positioning point and composition point, on the preview image collected in real time. It uses an arrow pointing from the positioning point to the composition point as the prompt information, guiding the user to adjust the shooting posture of the electronic device until the positioning point and the composition point match in the real-time preview image, as shown in FIG. 9.
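The overlay geometry can be computed as sketched below, before any drawing happens. The sketch assumes pixel coordinates with the origin at the top-left. The actual rendering call depends on the device's UI toolkit and is not shown. The function name and the returned dictionary layout are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


def prompt_overlay(width: float, height: float, positioning_point: Point,
                   composition_point: Point) -> Dict[str, List[Segment]]:
    """Geometry to draw on the live preview: four third lines plus the prompt arrow.

    The arrow runs from the positioning point to the composition point; the
    actual drawing call is left to whatever UI toolkit the device uses.
    """
    left, right = width / 3, 2 * width / 3
    upper, lower = height / 3, 2 * height / 3
    third_lines = [
        ((0.0, upper), (width, upper)),   # upper third line
        ((0.0, lower), (width, lower)),   # lower third line
        ((left, 0.0), (left, height)),    # left third line
        ((right, 0.0), (right, height)),  # right third line
    ]
    return {"third_lines": third_lines, "arrow": [(positioning_point, composition_point)]}


# Composition point (800, 300) is the upper/right third intersection of a 1200x900 frame.
overlay = prompt_overlay(1200, 900, (500, 450), (800, 300))
print(overlay["arrow"])  # [((500, 450), (800, 300))]
```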
  • the electronic device photographs the shooting scene to obtain a photographed image.
  • when the positioning point matches the composition point, the electronic device determines that a better composition can be obtained and photographs the shooting scene to obtain a captured image of the shooting scene.
  • the application also provides an image processing device.
  • FIG. 10 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the application.
  • the image processing device is applied to electronic equipment.
  • the image processing device includes an image acquisition module 301, an image detection module 302, and a human body recognition module 303, as follows:
  • the image acquisition module 301 is used to acquire the image to be detected that requires key point detection;
  • the image detection module 302 is configured to call a pre-trained key point detection model to perform key point detection on the image to be detected, to obtain key point location information and key point attribution information;
  • the human body recognition module 303 is used to identify a set of human body key points belonging to the same human body according to the key point location information and the key point attribution information.
  • the feature prediction network includes a location branch and an attribution branch.
  • when calling the feature prediction network to perform key point detection on the image features to obtain the key point location information and the key point attribution information, the image detection module 302 is configured to:
  • the attribution branch is called to perform key point attribution detection based on image features and key point location information to obtain key point attribution information.
  • the location branch includes a convolution unit with a convolution kernel size of 1*1.
  • the attribution branch includes a feature optimization sub-module, a fusion sub-module, and an output sub-module.
  • the image detection module 302 is used for:
  • the feature optimization submodule includes a convolution unit with a convolution kernel size of 1*1
  • the output submodule includes a convolution unit with a convolution kernel size of 1*1.
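The two branches can be sketched in dependency-free Python. The sketch represents a feature map as nested lists shaped [channels][height][width]. A 1*1 convolution is then the same linear map over channels applied at every pixel. Three points are assumptions rather than details from the text:

- The fusion sub-module uses channel concatenation.
- There are no activation or normalization layers.
- The weights are toy values.

A real model would use a deep learning framework and trained weights.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # [channels][height][width]
Weights = List[List[float]]           # [out_channels][in_channels]


def conv1x1(fmap: FeatureMap, weights: Weights, bias: List[float]) -> FeatureMap:
    """1*1 convolution: the same linear map over channels at every pixel."""
    c_in, h, w = len(fmap), len(fmap[0]), len(fmap[0][0])
    if any(len(row) != c_in for row in weights) or len(bias) != len(weights):
        raise ValueError("weight/bias shape mismatch")
    return [[[bias[o] + sum(weights[o][c] * fmap[c][y][x] for c in range(c_in))
              for x in range(w)]
             for y in range(h)]
            for o in range(len(weights))]


def location_branch(features: FeatureMap, w: Weights, b: List[float]) -> FeatureMap:
    """Location branch: one 1*1 convolution producing key point location maps."""
    return conv1x1(features, w, b)


def attribution_branch(features: FeatureMap, location: FeatureMap,
                       opt_w: Weights, opt_b: List[float],
                       out_w: Weights, out_b: List[float]) -> FeatureMap:
    """Attribution branch from image features plus key point location maps."""
    refined = conv1x1(features, opt_w, opt_b)  # feature optimization sub-module (1*1)
    fused = refined + location                 # fusion sub-module: channel concat (assumed)
    return conv1x1(fused, out_w, out_b)        # output sub-module (1*1)


features = [[[1.0, 2.0]], [[3.0, 4.0]]]  # 2 channels, 1x2 pixels
loc = location_branch(features, [[1.0, 1.0]], [0.0])
attr = attribution_branch(features, loc, [[2.0, 0.0]], [0.0], [[1.0, -1.0]], [0.5])
print(loc, attr)  # [[[4.0, 6.0]]] [[[-1.5, -1.5]]]
```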
  • the image processing device provided in this application further includes a model training module, which is used to:
  • the key point location loss and the key point attribution loss are fused to obtain a fusion loss, and the parameters of the key point detection model are adjusted according to the fusion loss.
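The loss fusion can be sketched as a weighted sum. The text does not say how the two losses are fused, so the following are all hypothetical stand-ins:

- the weighted sum itself;
- the `attribution_weight` parameter;
- mean squared error as the location loss.

```python
from typing import Sequence


def mse(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean squared error, used here as a stand-in key point location loss."""
    if len(pred) != len(target) or not pred:
        raise ValueError("pred and target must be non-empty and equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def fused_loss(location_loss: float, attribution_loss: float,
               attribution_weight: float = 1.0) -> float:
    """Weighted sum of the two losses; the weighting scheme is hypothetical."""
    return location_loss + attribution_weight * attribution_loss


loc_loss = mse([1.0, 2.0], [1.0, 4.0])  # (0 + 4) / 2 = 2.0
print(fused_loss(loc_loss, 1.0, attribution_weight=0.5))  # 2.5
```

The model parameters would then be updated by back-propagating this single fused value.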
  • when obtaining the key point attribution loss according to the predicted key point location information and the predicted key point attribution information, the model training module is configured to:
  • when acquiring the image to be detected that requires key point detection, the image acquisition module 301 is configured to:
  • when the electronic device enables the shooting function, obtain a preview image of the shooting scene and use the preview image as the image to be detected;
  • the image processing device provided in this application also includes a composition prompting module, which is used after the human body recognition module 303 identifies a set of human body key points belonging to the same human body according to the key point position information and key point attribution information:
  • a prompt message for instructing to adjust the shooting posture of the electronic device is output.
  • the electronic device includes a processor 401 and a memory 402.
  • the processor 401 in the embodiment of the present application is a general-purpose processor, such as an ARM architecture processor.
  • a computer program is stored in the memory 402, which may be a high-speed random access memory or a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices.
  • the memory 402 may also include a memory controller to provide the processor 401 with access to the computer program in the memory 402 to implement the following functions:
  • a set of human body key points belonging to the same human body is identified.
  • the key point detection model includes a feature extraction network and a feature prediction network.
  • the processor 401 is used to execute:
  • the feature prediction network includes a location branch and an attribution branch.
  • when calling the feature prediction network to perform key point detection on the image features to obtain the key point location information and the key point attribution information, the processor 401 is configured to execute:
  • the attribution branch is called to perform key point attribution detection based on image features and key point location information to obtain key point attribution information.
  • the location branch includes a convolution unit with a convolution kernel size of 1*1.
  • the attribution branch includes a feature optimization sub-module, a fusion sub-module, and an output sub-module.
  • the processor 401 is configured to execute:
  • the feature optimization submodule includes a convolution unit with a convolution kernel size of 1*1
  • the output submodule includes a convolution unit with a convolution kernel size of 1*1.
  • before acquiring the image to be detected that requires key point detection, the processor 401 is further configured to execute:
  • the key point location loss and the key point attribution loss are fused to obtain a fusion loss, and the parameters of the key point detection model are adjusted according to the fusion loss.
  • when acquiring the key point attribution loss according to the predicted key point location information and the predicted key point attribution information, the processor 401 is configured to execute:
  • when acquiring the image to be detected that requires key point detection, the processor 401 is configured to execute:
  • when the electronic device enables the shooting function, obtain a preview image of the shooting scene and use the preview image as the image to be detected;
  • the processor 401 is further configured to execute:
  • a prompt message for instructing to adjust the shooting posture of the electronic device is output.
  • the electronic device provided in this embodiment of the application belongs to the same concept as the image processing method in the above embodiments, and any method provided in the image processing method embodiments can be run on the electronic device. For the specific implementation process, see the embodiments of the image processing method; details are not repeated here.
  • the computer program may be stored in a computer-readable storage medium, for example in the memory of an electronic device, and executed by a processor in the electronic device; its execution may include the flow of the embodiments of the image processing method.
  • the storage medium can be a magnetic disk, an optical disk, a read-only memory, a random access memory, etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Geometry (AREA)
  • Biophysics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed in embodiments of the present application are an image processing method and apparatus, a storage medium, and an electronic device. The method includes: obtaining an image to be detected that requires key point detection; calling a pre-trained key point detection model to perform key point detection on the image to obtain key point position information and key point attribution information; and identifying, according to the key point position information and the key point attribution information, a human body key point set belonging to the same human body.

Description

Image processing method and apparatus, storage medium, and electronic device
This application claims priority to Chinese Patent Application No. 202010152690.8, filed with the Chinese Patent Office on March 6, 2020 and entitled "Image processing method and apparatus, storage medium, and electronic device", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of image processing technology, and in particular to an image processing method and apparatus, a storage medium, and an electronic device.
Background
At present, key point detection mainly means detecting human body key points: detecting key points of the human body such as the eyes, nose, elbows, and shoulders, connecting them in sequence according to the order of the limbs, and describing the human body through these key points.
Summary
The embodiments of the present application provide an image processing method and apparatus, a storage medium, and an electronic device, which can improve the efficiency of key point detection.
The image processing method provided by the embodiments of the present application includes:
obtaining an image to be detected that requires key point detection;
calling a pre-trained key point detection model to perform key point detection on the image to be detected, to obtain key point location information and key point attribution information;
identifying, according to the key point location information and the key point attribution information, a set of human body key points belonging to the same human body.
The image processing apparatus provided by the embodiments of the present application includes:
an image acquisition module, configured to acquire an image to be detected that requires key point detection;
an image detection module, configured to call a pre-trained key point detection model to perform key point detection on the image to be detected, to obtain key point location information and key point attribution information;
a human body recognition module, configured to identify, according to the key point location information and the key point attribution information, a set of human body key points belonging to the same human body.
The storage medium provided by the embodiments of the present application stores a computer program which, when loaded by a processor, executes the image processing method provided by the present application.
The electronic device provided by the embodiments of the present application includes a processor and a memory. The memory stores a computer program, and the processor is configured to execute the image processing method provided by the present application by loading the computer program.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings used in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application. Those skilled in the art can derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic flowchart of an image processing method provided by an embodiment of the present application.
FIG. 2 is an example diagram of a key point detection interface provided by an embodiment of the present application.
FIG. 3 is an example diagram of a selection sub-interface provided by an embodiment of the present application.
FIG. 4 is a schematic structural diagram of a key point detection model provided by an embodiment of the present application.
FIG. 5 is a schematic structural diagram of the feature prediction network in an embodiment of the present application.
FIG. 6 is a schematic structural diagram of the attribution branch in an embodiment of the present application.
FIG. 7 is another schematic flowchart of the image processing method provided by an embodiment of the present application.
FIG. 8 is an example diagram of outputting prompt information in an embodiment of the present application.
FIG. 9 is an example diagram of matching a positioning point with a composition point in an embodiment of the present application.
FIG. 10 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the present application.
FIG. 11 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
Detailed Description
Referring to the drawings, the same reference numerals denote the same components. The principles of the present application are illustrated as implemented in a suitable computing environment. The following description is based on the illustrated specific embodiments of the present application. It should not be construed as limiting other specific embodiments of the present application that are not detailed herein.
Artificial Intelligence (AI) comprises the theory, methods, technology, and application systems that use digital computers, or machines controlled by digital computers, to do the following: simulate, extend, and expand human intelligence; perceive the environment; acquire knowledge; and use that knowledge to obtain the best results. In other words, AI is a comprehensive technology of computer science. It attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that responds in a manner similar to human intelligence. AI studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making.
AI technology is a comprehensive discipline covering a wide range of fields, including both hardware-level and software-level technologies. Basic AI technologies generally include sensors, dedicated AI chips, cloud computing, distributed storage, big data processing, operating/interaction systems, and mechatronics. AI software technologies mainly include computer vision, speech processing, natural language processing, and machine learning/deep learning.
Of these, Machine Learning (ML) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how computers simulate or implement human learning behavior to acquire new knowledge or skills. It also studies how computers reorganize existing knowledge structures to continuously improve their own performance. Machine learning is the core of AI and the fundamental way to make computers intelligent, and its applications span all fields of AI. Machine learning and deep learning usually include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, and inductive learning.
The solutions provided in the embodiments of the present application involve the machine learning technology of AI and are illustrated by the following embodiments:
The embodiments of the present application provide an image processing method, an image processing apparatus, a storage medium, and an electronic device. The image processing method may be executed by the image processing apparatus provided in the embodiments of the present application, or by an electronic device integrating that apparatus. The image processing apparatus may be implemented in hardware or software. The electronic device may be any device that is equipped with a processor (including but not limited to a general-purpose processor or a customized processor) and has processing capability. Examples include a smartphone, a tablet computer, a palmtop computer, a notebook computer, or a desktop computer.
The present application provides an image processing method, including:
acquiring an image to be detected on which key point detection is to be performed;
invoking a pre-trained key point detection model to perform key point detection on the image to be detected, to obtain key point location information and key point attribution information;
identifying, according to the key point location information and the key point attribution information, a set of human body key points belonging to the same human body.
Optionally, in an embodiment, the key point detection model includes a feature extraction network and a feature prediction network. In this case, the invoking a pre-trained key point detection model to perform key point detection on the image to be detected, to obtain key point location information and key point attribution information, includes:
invoking the feature extraction network to extract image features of the image to be detected;
invoking the feature prediction network to perform key point detection on the image features, to obtain the key point location information and the key point attribution information.
Optionally, in an embodiment, the feature prediction network includes a location branch and an attribution branch. In this case, the invoking the feature prediction network to perform key point detection on the image features, to obtain the key point location information and the key point attribution information, includes:
invoking the location branch to perform key point location detection on the image features, to obtain the key point location information;
invoking the attribution branch to perform key point attribution detection according to the image features and the key point location information, to obtain the key point attribution information.
Optionally, in an embodiment, the location branch includes a convolution unit with a 1*1 convolution kernel.
Optionally, in an embodiment, the attribution branch includes a feature optimization sub-module, a fusion sub-module, and an output sub-module. In this case, the invoking the attribution branch to perform key point attribution detection according to the image features and the key point location information, to obtain the key point attribution information, includes:
invoking the feature optimization sub-module to optimize the image features, to obtain optimized image features;
invoking the fusion sub-module to fuse the optimized image features with the key point location information, to obtain fused features;
invoking the output sub-module to perform key point attribution detection on the fused features, to obtain the key point attribution information.
Optionally, in an embodiment, the feature optimization sub-module includes a convolution unit with a 1*1 convolution kernel, and the output sub-module includes a convolution unit with a 1*1 convolution kernel.
Optionally, in an embodiment, before the acquiring an image to be detected on which key point detection is to be performed, the method further includes:
acquiring a sample image and sample key point location information corresponding to the sample image, and constructing the key point detection model;
invoking the key point detection model to perform key point detection on the sample image, to obtain predicted key point location information and predicted key point attribution information;
acquiring a key point location loss according to the sample key point location information and the predicted key point location information, and acquiring a key point attribution loss according to the predicted key point location information and the predicted key point attribution information;
fusing the key point location loss and the key point attribution loss to obtain a fused loss, and adjusting the parameters of the key point detection model according to the fused loss.
Optionally, in an embodiment, the acquiring a key point attribution loss according to the predicted key point location information and the predicted key point attribution information includes:
clustering key points according to the predicted key point location information, to obtain multiple sets of human body key points belonging to different human bodies;
acquiring the key point attribution loss according to the multiple sets of human body key points and the predicted key point attribution information.
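The claims leave the exact form of the key point attribution loss open. One illustration is a grouping loss in the spirit of associative embedding, which is a swapped-in technique and not confirmed by this application. Its pull term draws the predicted attribution values inside each clustered body toward their mean, and its push term penalizes bodies whose mean values are close. The function name, the input layout, and the `sigma` parameter are all hypothetical.

```python
import math
from typing import List, Sequence


def attribution_loss(clusters: Sequence[Sequence[float]], sigma: float = 1.0) -> float:
    """Hypothetical grouping loss on predicted attribution values.

    clusters[n] holds the predicted attribution value (e.g. the body number)
    read out at every key point that the clustering step assigned to body n.
    """
    groups: List[List[float]] = [list(c) for c in clusters if len(c) > 0]
    if not groups:
        return 0.0
    means = [sum(g) / len(g) for g in groups]

    # Pull term: spread of the predictions inside each body, averaged over bodies.
    pull = sum(sum((v - m) ** 2 for v in g) / len(g) for g, m in zip(groups, means))
    pull /= len(groups)

    # Push term: Gaussian penalty on pairs of bodies whose mean predictions are close.
    push, pairs = 0.0, 0
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            push += math.exp(-((means[i] - means[j]) ** 2) / (2 * sigma ** 2))
            pairs += 1
    if pairs:
        push /= pairs
    return pull + push
```

Two bodies with consistent but well-separated values give a loss near zero. A single body with scattered values is penalized by the pull term alone.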
Optionally, in an embodiment, the acquiring an image to be detected on which key point detection is to be performed includes:
when a shooting function of the electronic device is enabled, acquiring a preview image of the shooting scene, and using the preview image as the image to be detected;
after the identifying, according to the key point location information and the key point attribution information, a set of human body key points belonging to the same human body, the method further includes:
determining a target human body according to the identified sets of human body key points, and classifying the target human body according to its corresponding set of human body key points, to obtain the human body type of the target human body;
determining a positioning point and a composition type corresponding to the target human body according to the human body type and the set of human body key points corresponding to the target human body;
determining a composition point corresponding to the target human body according to the positioning point and the composition type;
when the positioning point does not match the composition point, outputting prompt information instructing adjustment of the shooting posture of the electronic device.
Referring to FIG. 1, FIG. 1 is a schematic flowchart of the image processing method provided by an embodiment of the present application. The flow of the method may be as follows:
In 101, an image to be detected on which key point detection is to be performed is acquired.
It should be noted that key point detection in this application mainly refers to detecting human body key points. That is, it detects certain key points of the human body, such as the eyes, nose, elbows, and shoulders, and connects them in sequence according to the limb structure. The human body is then described by these key points.
The electronic device may receive a key point detection request input by the user and acquire the image to be detected according to that request. Alternatively, it may automatically identify an image that requires key point detection and acquire it as the image to be detected.
For example, the electronic device may receive a key point detection request through a key point detection interface that includes a request input interface. As shown in FIG. 2, the request input interface may take the form of an input box. The user types the identification information of the image that requires key point detection into the input box. The user then enters confirmation information (for example, by pressing the Enter key on the keyboard) to submit a key point detection request carrying that identification information. Correspondingly, the electronic device acquires the image that requires key point detection according to the identification information in the received request, and records it as the image to be detected.
For another example, the key point detection interface shown in FIG. 2 further includes an "Open" control. On the one hand, when the electronic device detects that the Open control is triggered, it displays a selection sub-interface (as shown in FIG. 3) superimposed on the key point detection interface. The selection sub-interface provides the user with thumbnails of images available for key point detection, such as images A, B, C, D, E, and F. The user can then find and select the thumbnail of the image that requires key point detection. On the other hand, after selecting that thumbnail, the user can trigger the confirmation control provided by the selection sub-interface to input a key point detection request to the electronic device. The request is associated with the selected thumbnail and instructs the electronic device to use the selected image as the image to be detected.
In 102, the pre-trained key point detection model is invoked to perform key point detection on the image to be detected, to obtain key point location information and key point attribution information.
Exemplarily, in this application a key point detection model is trained in advance by a machine learning method. The model is configured to simultaneously predict all human body key points in the input image and the human body to which each key point belongs. It may be deployed locally on the electronic device or on a server. In addition, this application does not specifically limit the configuration of the key point detection model, which a person of ordinary skill in the art can select according to actual needs.
Correspondingly, after acquiring the image to be detected, the electronic device invokes the pre-trained key point detection model, either locally or from the server. It inputs the acquired image into the model and obtains the key point location information and key point attribution information that the model outputs. The key point location information describes all human body key points present in the image to be detected. The key point attribution information describes the human body to which each of those key points belongs.
For example, the key point location information may describe that human body key point A and human body key point B exist in the image to be detected. The key point attribution information may then describe that key point A belongs to human body X and key point B belongs to human body Y.
In 103, a set of human body key points belonging to the same human body is identified according to the key point location information and the key point attribution information.
As described above, the key point location information describes all human body key points present in the image to be detected, and the key point attribution information describes the human body to which each key point belongs. After obtaining both for the image to be detected, the electronic device can identify the sets of key points belonging to the same human body. Key point detection for multiple human bodies is thus achieved in a single pass.
In this application, the following steps are performed:
acquiring an image to be detected on which key point detection is to be performed;
invoking a pre-trained key point detection model to perform key point detection on the image, to obtain key point location information and key point attribution information;
identifying, according to the key point location information and the key point attribution information, a set of human body key points belonging to the same human body.
Compared with related technologies, this application does not need a human body detection algorithm as a preceding stage. It can detect the key points of all human bodies in the image at the same time, thereby improving key point detection efficiency.
In an embodiment, the key point detection model includes a feature extraction network and a feature prediction network. In this case, invoking the pre-trained key point detection model to perform key point detection on the image to be detected, to obtain key point location information and key point attribution information, includes:
(1) invoking the feature extraction network to extract image features of the image to be detected;
(2) invoking the feature prediction network to perform key point detection on the image features, to obtain the key point location information and the key point attribution information.
Referring to FIG. 4, in the embodiment of the present application, the key point detection model consists of two parts: a feature extraction network for feature extraction and a feature prediction network for key point detection. The feature extraction network may be any known feature extraction network, such as VGG, MobileNet, or ResNet. A deeper network such as VGG or ResNet increases the computational cost of the model but yields higher detection accuracy. A lightweight network such as MobileNet sacrifices some detection accuracy but detects faster. A person of ordinary skill in the art can make this choice according to actual needs, and this application does not specifically limit it.
Correspondingly, when invoking the key point detection model to perform key point detection on the image to be detected, the electronic device may proceed in two stages. It first invokes the feature extraction network in the model to extract the image features of the image to be detected. It then invokes the feature prediction network in the model to perform key point detection based on those image features. This yields the key point location information and key point attribution information corresponding to the image to be detected.
For example, the key point location information may be presented as a key point location heat map, which is a three-dimensional height*width*keypoints matrix. Here height and width denote the height and width of the map, and keypoints denotes the number of human body key point types. That is, each human body key point type corresponds to one height*width matrix. The value at each position of that matrix indicates the likelihood that the key point is located at that position, and the larger the value, the more likely the key point is there. The human body key points can be obtained by taking the position of the maximum value in each region of the heat map. Specifically, max pooling can be applied to the key point location heat map and the heat maps before and after pooling are compared. Positions whose values are equal before and after pooling are taken as human body key points.
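The peak-taking step described above (max pooling, then keeping positions whose value is unchanged) can be sketched for a single heat-map channel as follows. This is a minimal illustration under two assumptions: a 3*3 pooling window and a score threshold that suppresses flat background. The text specifies neither, and `max_pool_3x3` and `extract_peaks` are hypothetical names.

```python
from typing import List, Tuple


def max_pool_3x3(heat: List[List[float]]) -> List[List[float]]:
    """3x3 max pooling with stride 1; border cells use only in-bounds neighbours."""
    h, w = len(heat), len(heat[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = max(
                heat[yy][xx]
                for yy in range(max(0, y - 1), min(h, y + 2))
                for xx in range(max(0, x - 1), min(w, x + 2))
            )
    return out


def extract_peaks(heat: List[List[float]], threshold: float = 0.1) -> List[Tuple[int, int, float]]:
    """Return (row, col, score) for cells equal to their pooled value, i.e. local maxima."""
    pooled = max_pool_3x3(heat)
    peaks = []
    for y, row in enumerate(heat):
        for x, v in enumerate(row):
            # Unchanged by pooling -> local maximum; the threshold drops flat zero regions.
            if v == pooled[y][x] and v >= threshold:
                peaks.append((y, x, v))
    return peaks
```

For a full height*width*keypoints heat map, the same routine would run once per key point channel. Note that exact ties on a plateau yield more than one peak.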
In addition, the key point attribution information may be presented as integer human body numbers. At each detected human body key point position, the feature prediction network predicts an integer as the human body number. Human body key points with the same human body number belong to the same human body.
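Given detected key points and their integer human body numbers, forming the per-person key point sets reduces to grouping by that number. The sketch below assumes each detection is a tuple (key point type, row, column, body number); this record layout and the function name `group_by_body` are hypothetical. If a network emitted real-valued numbers instead, they would first need rounding, which is likewise an assumption.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical detection record: (keypoint_type, row, col, body_number)
Detection = Tuple[str, int, int, int]


def group_by_body(detections: List[Detection]) -> Dict[int, Dict[str, Tuple[int, int]]]:
    """Collect detections that share an integer body number into one key point set."""
    bodies: Dict[int, Dict[str, Tuple[int, int]]] = defaultdict(dict)
    for kp_type, row, col, body_number in detections:
        # Same body number -> same human body. A repeated type for one body
        # overwrites the earlier entry in this simple sketch.
        bodies[body_number][kp_type] = (row, col)
    return dict(bodies)
```

Because grouping is a single dictionary pass, no separate person detector is needed, which matches the efficiency argument made for this method.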
In an embodiment, the feature prediction network includes a location branch and an attribution branch. In this case, invoking the feature prediction network to perform key point detection on the image features, to obtain the key point location information and the key point attribution information, includes:
(1) invoking the location branch to perform key point location detection on the image features, to obtain the key point location information;
(2) invoking the attribution branch to perform key point attribution detection according to the image features and the key point location information, to obtain the key point attribution information.
Referring to FIG. 5, in the embodiment of the present application, the key point detection task is split and a dual-branch network is used to implement key point detection. One branch is configured to detect the human body key points present in the image and is referred to as the location branch. The other branch is configured to detect the human body to which each key point belongs and is referred to as the attribution branch. Correspondingly, when invoking the feature prediction network to perform key point detection on the image features, the electronic device may invoke the location branch to perform key point location detection based on the image features. This yields the key point location information corresponding to the image to be detected.
In addition, it should be noted that, in terms of high-level semantic information, the attribution of a human body key point is deeper feature information than its location. Only when the key point location is known accurately can its attribution be predicted more accurately. For this reason, the electronic device invokes the attribution branch in the feature prediction network to perform key point attribution detection based on both the image features and the key point location information. This yields the key point attribution information corresponding to the image to be detected.
In an embodiment, the location branch includes a convolution unit with a 1*1 convolution kernel.
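A 1*1 convolution acts as an independent linear map across channels at every pixel. This lets the location branch turn C feature channels into K heat-map channels without mixing spatial positions. The sketch below is a minimal illustration, assuming a channels x height x width nested-list layout; the weights are placeholders, not trained values from the application.

```python
from typing import List

# Layout assumed here: channels x height x width (nested lists).
Tensor = List[List[List[float]]]


def conv1x1(features: Tensor, weights: List[List[float]], bias: List[float]) -> Tensor:
    """1*1 convolution: weights[o][c] maps input channel c to output channel o per pixel."""
    c_in = len(features)
    height, width = len(features[0]), len(features[0][0])
    assert all(len(row) == c_in for row in weights), "weight shape mismatch"
    out: Tensor = []
    for o, w_row in enumerate(weights):
        # Each output pixel depends only on the same pixel of every input channel.
        plane = [
            [bias[o] + sum(w_row[c] * features[c][y][x] for c in range(c_in)) for x in range(width)]
            for y in range(height)
        ]
        out.append(plane)
    return out
```

With K rows in `weights` (one per key point type), the output has K channels and the same height and width as the input. This is the shape expected of the key point location heat map.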
In an embodiment, the attribution branch includes a feature optimization sub-module, a fusion sub-module, and an output sub-module. In this case, invoking the attribution branch to perform key point attribution detection according to the image features and the key point location information, to obtain the key point attribution information, includes:
(1) invoking the feature optimization sub-module to optimize the image features, to obtain optimized image features;
(2) invoking the fusion sub-module to fuse the optimized image features with the key point location information, to obtain fused features;
(3) invoking the output sub-module to perform key point attribution detection on the fused features, to obtain the key point attribution information.
Referring to FIG. 6, in the embodiment of the present application, the attribution branch consists of three parts:
a feature optimization sub-module, which further extracts features from the image features to optimize them;
a fusion sub-module, which fuses the optimized image features with the key point location information;
an output sub-module, which performs key point attribution detection on the fused features.
Correspondingly, when invoking the attribution branch to perform key point attribution detection based on the image features and the key point location information, the electronic device may proceed in three steps. First, it invokes the feature optimization sub-module to optimize the image features and records the result as the optimized image features. Then, it invokes the fusion sub-module to fuse the optimized image features with the key point location information to obtain fused features. Finally, it invokes the output sub-module to perform key point attribution detection on the fused features. This yields the key point attribution information corresponding to the image to be detected.
The feature optimization sub-module includes a 1*1 convolution unit, the output sub-module includes a 1*1 convolution unit, and the fusion sub-module includes a Concat (channel concatenation) unit.
Exemplarily, suppose the image features take the form of a feature map and the key point location information takes the form of a key point location heat map. The electronic device invokes the feature optimization sub-module to apply a further convolution to the feature map, which yields the optimized image features. Then the electronic device invokes the fusion sub-module to concatenate the optimized feature map and the key point location heat map along the channel dimension, achieving feature fusion. For example, if the optimized feature map has 19 channels and the key point location heat map has 38 channels, the concatenation yields a fused feature map with 19+38=57 channels. Finally, the electronic device invokes the output sub-module to apply a convolution to the fused features. This yields the key point attribution information corresponding to the image to be detected.
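The three-step attribution branch can be sketched end to end, with channel concatenation made explicit so that the 19+38=57 arithmetic is visible. This is a sketch under stated assumptions. The optimization convolution is assumed to keep 19 channels. The output sub-module is assumed to emit a single channel of body numbers. No activation functions are shown, although real networks normally insert them between layers. All function names are hypothetical.

```python
from typing import List

Tensor = List[List[List[float]]]  # channels x height x width


def conv1x1(x: Tensor, weights: List[List[float]]) -> Tensor:
    """1*1 convolution without bias: a per-pixel linear map across channels."""
    height, width = len(x[0]), len(x[0][0])
    return [
        [[sum(wr[c] * x[c][i][j] for c in range(len(x))) for j in range(width)] for i in range(height)]
        for wr in weights
    ]


def concat_channels(a: Tensor, b: Tensor) -> Tensor:
    """Concat unit: stack two maps of equal height and width along the channel axis."""
    assert len(a[0]) == len(b[0]) and len(a[0][0]) == len(b[0][0]), "spatial size mismatch"
    return a + b


def attribution_branch(features: Tensor, heatmaps: Tensor,
                       w_opt: List[List[float]], w_out: List[List[float]]) -> Tensor:
    optimized = conv1x1(features, w_opt)          # feature optimization sub-module
    fused = concat_channels(optimized, heatmaps)  # fusion sub-module
    return conv1x1(fused, w_out)                  # output sub-module
```

Because the heat map enters only through concatenation, the output convolution sees both appearance features and location evidence at each pixel. This matches the text's rationale that attribution is predicted after location.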
In an embodiment, before acquiring the image to be detected on which key point detection is to be performed, the method further includes:
(1) acquiring a sample image and sample key point location information corresponding to the sample image, and constructing the key point detection model;
(2) invoking the key point detection model to perform key point detection on the sample image, to obtain predicted key point location information and predicted key point attribution information;
(3) acquiring a key point location loss according to the sample key point location information and the predicted key point location information, and acquiring a key point attribution loss according to the predicted key point location information and the predicted key point attribution information;
(4) fusing the key point location loss and the key point attribution loss to obtain a fused loss, and adjusting the parameters of the key point detection model according to the fused loss.
The embodiments of the present application thus also provide a training scheme for the key point detection model.
The electronic device first acquires the sample images and their corresponding sample key point location information. For example, images containing human bodies can be taken from the ImageNet dataset as sample images and then annotated to obtain the corresponding sample key point location information.
In addition, the electronic device constructs the key point detection model. Its structure is as described in the above embodiments and is not repeated here.
Next, the electronic device calls the key point detection model to perform key point detection on each sample image, obtaining the corresponding predicted key point location information and predicted key point attribution information. The predicted key point location information describes all human body key points present in the sample image, and the predicted key point attribution information describes the human body to which each key point belongs.
The electronic device then computes the key point location loss from the sample key point location information and the predicted key point location information. This loss measures the difference between the predicted and the sample key point location information. Take the case where both are represented as heat maps of the same size. The key point location loss can then be expressed as:
[Equation image PCTCN2021075025-appb-000001 in the original publication]
where L_position denotes the key point location loss and (i, j) denotes a coordinate position. p(i, j) is the value at position (i, j) in the predicted key point location heat map, and g(i, j) is the value at position (i, j) in the sample key point location heat map. width and height are the width and height of the predicted key point location heat map.
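The filing gives the location-loss formula only as an image. The following is a minimal sketch under an explicit assumption: that the formula is a mean squared error averaged over the width × height grid. Other normalizations are equally consistent with the variable list above. The sketch shows how p(i, j), g(i, j), width and height would combine under that assumption.

```python
from typing import List

Heatmap = List[List[float]]  # heatmap[j][i]: row j (along height), column i (along width)

def position_loss(pred: Heatmap, gt: Heatmap) -> float:
    """ASSUMED form: (1 / (width * height)) * sum over (i, j) of (p(i, j) - g(i, j))^2."""
    height, width = len(pred), len(pred[0])
    if len(gt) != height or len(gt[0]) != width:
        raise ValueError("predicted and sample heat maps must have the same size")
    total = 0.0
    for j in range(height):
        for i in range(width):
            diff = pred[j][i] - gt[j][i]  # p(i, j) - g(i, j)
            total += diff * diff
    return total / (width * height)

pred = [[0.0, 0.5], [1.0, 0.0]]
gt = [[0.0, 0.0], [1.0, 0.5]]
print(position_loss(pred, gt))  # (0.25 + 0.25) / 4 = 0.125
```

For multi-channel heat maps (one per key point type), the same computation would typically be repeated per channel and summed or averaged. How the filing handles this is not stated.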
The electronic device also computes the key point attribution loss from the predicted key point location information and the predicted key point attribution information. This loss differs from the location loss in one important way. The number of human bodies varies from one sample image to another, so the attribution of key points to bodies cannot be annotated in advance. There is therefore no ground-truth attribution to serve as a training target.
In an embodiment, computing the key point attribution loss from the predicted key point location information and the predicted key point attribution information includes:
(1) Clustering key points according to the predicted key point location information, to obtain multiple human body key point sets that each belong to a different human body;
(2) Computing the key point attribution loss from the multiple human body key point sets and the predicted key point attribution information.
This embodiment takes a clustering approach. During both training and prediction, the model predicts an integer at each key point position to serve as a human body number. The key point attribution loss must therefore drive training toward two goals: shrinking the differences between body numbers of the same human body, and enlarging the differences between body numbers of different human bodies. The loss is computed as follows:
A clustering algorithm (chosen by a person of ordinary skill in the art according to actual needs) is applied to the human body key points in the sample image, as described by the predicted key point location information. This yields multiple human body key point sets belonging to different human bodies, where all key points in the same set belong to the same human body.
Using the predicted key point attribution information, the body numbers corresponding to each human body key point set are averaged:
$$\bar{h}_n = \frac{1}{K}\sum_{k=1}^{K} h_{nk}$$
where n indexes the human body key point set corresponding to the n-th human body, k indexes the k-th key point, K is the number of human body key points, and h_nk is the body number predicted at the k-th key point of the n-th human body;
For every human body key point set, the difference between the body number at each key point position and the set's mean is computed, and the squared differences are summed:
[Equation image PCTCN2021075025-appb-000003 in the original publication]
where N is the number of human body key point sets;
The differences between the mean body numbers of different human body key point sets are then computed. This term is designed to behave as follows. When the body numbers of two human bodies differ greatly, the term is 0. When they differ only slightly, the term is large, and training must reduce it:
[Equation image PCTCN2021075025-appb-000004 in the original publication]
where σ is a constant set to an empirical value, n, n' ∈ [1, N], and n ≠ n';
L_attribution = L1 + L2;
where L_attribution denotes the key point attribution loss.
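The attribution loss can be sketched in a few lines, with clearly stated assumptions because the L1 and L2 formulas appear in the filing only as images:

- L1 is taken as the plain sum of squared deviations from each set's mean, with no normalization, per the text's "sum of squares".
- L2 uses a Gaussian penalty exp(-(mean_n - mean_n')^2 / (2σ^2)) summed over n ≠ n'. This form is borrowed from associative-embedding losses and is not confirmed by the filing. It is near 0, not exactly 0, for widely separated means.
- Each set may have its own key point count, for sets with missing key points.

```python
import math
from typing import Sequence

def attribution_loss(groups: Sequence[Sequence[float]], sigma: float = 1.0) -> float:
    """groups[n] holds the predicted body numbers h_nk at the key points of set n
    (the output of the clustering step). Returns L1 + L2 under the ASSUMED forms."""
    means = [sum(g) / len(g) for g in groups]  # mean body number of each set
    # L1 (pull): squared deviation of every body number from its own set's mean
    l1 = sum((h - m) ** 2 for g, m in zip(groups, means) for h in g)
    # L2 (push, ASSUMED Gaussian): large when two sets' means are close, ~0 when far apart
    l2 = 0.0
    for n, mean_n in enumerate(means):
        for n2, mean_n2 in enumerate(means):
            if n != n2:
                l2 += math.exp(-((mean_n - mean_n2) ** 2) / (2 * sigma ** 2))
    return l1 + l2

print(attribution_loss([[1.0, 1.0], [5.0, 5.0]]))  # only L2 contributes: 2 * exp(-8)
```

With a single set, only L1 contributes. With two sets sharing the same mean, L2 contributes its maximum of 1 per ordered pair.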
In this embodiment, after the key point location loss and the key point attribution loss are obtained, the electronic device fuses them into a fused loss, which can be expressed as:
L_total = L_position + L_attribution;
where L_total denotes the fused loss.
After obtaining the fused loss, the electronic device adjusts the parameters of the key point detection model according to it, until training of the key point detection model is complete.
In an embodiment, acquiring the image to be detected that requires key point detection includes:
(1) When the electronic device enables the shooting function, acquiring a preview image of the shooting scene and using the preview image as the image to be detected;
After the sets of human body key points belonging to the same human body are identified according to the key point location information and the key point attribution information, the method further includes:
(2) Determining a target human body from the identified human body key point sets, and classifying the target human body according to its key point set to obtain its human body type;
(3) Determining a positioning point and a composition type for the target human body according to the human body type and the target human body's key point set;
(4) Determining the composition point for the human body according to the positioning point and the composition type;
(5) When the positioning point does not match the composition point, outputting prompt information that instructs the user to adjust the shooting posture of the electronic device.
Note that the shooting scene is whatever scene the camera of the electronic device is aimed at after the shooting function is enabled. It can be any scene and may include people, objects, and so on.
For example, the electronic device can launch its system application "Camera" in response to a user operation. Once "Camera" is launched, the electronic device enables the shooting function and captures images through the camera in real time. The scene the camera is aimed at is then the shooting scene. The electronic device can launch "Camera" in response to a touch on the "Camera" entry, or in response to a voice command such as "start the camera", among other ways.
In this embodiment, when the shooting function is enabled, the electronic device acquires a preview image of the shooting scene and uses it as the image to be detected that requires key point detection. It performs key point detection on the preview image to obtain the sets of human body key points, each belonging to one human body. When the preview image contains multiple human bodies, one key point set is obtained per human body, giving multiple sets. When it contains a single human body, a single key point set is obtained for that body.
The electronic device then determines the target human body from the identified human body key point sets. When there is only one set, the human body corresponding to it is taken directly as the target human body. When there are multiple sets, the human body corresponding to one of them is selected as the target human body according to a preset target decision strategy.
After determining the target human body, the electronic device classifies the target human body in the shooting scene according to its key point set and a preset human body classification strategy, obtaining the human body type of the target human body. This application places no specific restriction on how human body types are divided. A person of ordinary skill in the art can configure them according to actual needs.
Next, the electronic device determines the positioning point of the target human body from the human body type and the target human body's key point set, according to a preset positioning point decision strategy. It also determines the composition type of the target human body according to a preset composition type decision strategy. The positioning point represents the position of the target human body. This application places no specific restriction on how composition types are divided. A person of ordinary skill in the art can configure them according to actual needs.
Note that in the embodiments of this application, multiple candidate composition points are preset for each composition type. The electronic device first determines the currently available candidate composition points from the determined composition type. It then selects, according to the positioning point, the composition point for the target human body from those candidates.
Once the positioning point and composition point of the target human body are determined, the electronic device checks in real time whether they match. If they do not match, it outputs prompt information instructing the user to adjust the shooting posture of the electronic device. The aim is for the target human body's positioning point in the shooting scene to come to match the composition point, producing a better composition. If they do match, the shooting scene can be captured directly to obtain the captured image.
Here, the positioning point matches the composition point when the distance between them is less than or equal to a preset distance. This application does not specifically limit the value of the preset distance. A person of ordinary skill in the art can set it according to actual needs.
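The selection-and-match logic described above can be sketched as follows. Several things here are hypothetical and are not specified by the filing:

- the candidate composition points per composition type (rule-of-thirds style);
- the normalized coordinate frame;
- choosing the candidate nearest the positioning point;
- the 0.05 preset distance;
- returning an offset vector instead of a prompt message.

```python
import math
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # normalized (x, y) in [0, 1], origin at top-left

# HYPOTHETICAL candidate composition points per composition type.
CANDIDATES: Dict[int, List[Point]] = {
    1: [(0.5, 0.5)],                                                     # first composition type
    2: [(1 / 3, 1 / 3), (2 / 3, 1 / 3), (1 / 3, 2 / 3), (2 / 3, 2 / 3)],  # second type
}

def pick_composition_point(anchor: Point, composition_type: int) -> Point:
    """Choose the candidate composition point closest to the positioning point."""
    return min(CANDIDATES[composition_type], key=lambda c: math.dist(anchor, c))

def composition_offset(anchor: Point, target: Point, max_dist: float = 0.05) -> Optional[Point]:
    """Return None when the points match (distance <= max_dist); otherwise the
    (dx, dy) by which the subject should move within the frame to reach the target."""
    if math.dist(anchor, target) <= max_dist:
        return None
    return (target[0] - anchor[0], target[1] - anchor[1])

anchor = (0.30, 0.35)
target = pick_composition_point(anchor, 2)
print(target, composition_offset(anchor, target))  # nearest third-point; None means matched
```

A non-None offset is where the device would emit its prompt. Turning the offset into a pan or tilt instruction depends on the camera conventions, which the text leaves open. `math.dist` requires Python 3.8 or later.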
Refer to FIG. 7, which is another schematic flowchart of the image processing method provided by an embodiment of this application. The flow of the image processing method may also be as follows:
In 201, when the shooting function is enabled, the electronic device acquires a preview image of the shooting scene and uses it as the image to be detected that requires key point detection.
Note that the shooting scene is whatever scene the camera of the electronic device is aimed at after the shooting function is enabled. It can be any scene and may include people, objects, and so on.
For example, the electronic device can launch its system application "Camera" in response to a user operation. Once "Camera" is launched, the electronic device enables the shooting function and captures images through the camera in real time. The scene the camera is aimed at is then the shooting scene. The electronic device can launch "Camera" in response to a touch on the "Camera" entry, or in response to a voice command such as "start the camera", among other ways.
In this embodiment, when the shooting function is enabled, the electronic device acquires a preview image of the shooting scene and uses it as the image to be detected that requires key point detection.
In 202, the electronic device calls a pre-trained key point detection model to perform key point detection on the image to be detected, obtaining key point location information and key point attribution information.
For example, in this application a key point detection model is trained in advance using a machine learning method. The model is configured to predict, simultaneously, all human body key points in the input image together with the human body each key point belongs to. It can be deployed locally on the electronic device or on a server. This application places no specific restriction on the model's architecture. A person of ordinary skill in the art can choose it according to actual needs.
Correspondingly, after acquiring the image to be detected, the electronic device calls the pre-trained key point detection model, either locally or from the server, and inputs the image into the model. The model outputs the key point location information and key point attribution information. The key point location information describes all human body key points present in the image to be detected. The key point attribution information describes the human body to which each key point belongs.
For example, the key point location information may indicate that the image to be detected contains human body key points A and B. The key point attribution information may then indicate that key point A belongs to human body 1 and key point B belongs to human body 2.
In 203, the electronic device identifies the sets of human body key points belonging to the same human body according to the key point location information and the key point attribution information.
As described above, the key point location information describes all human body key points present in the image to be detected, and the key point attribution information describes the human body to which each key point belongs. Once both are obtained for the image to be detected, the electronic device can identify the sets of key points belonging to the same human body. When the image contains multiple human bodies, one key point set is obtained per human body, giving multiple sets. When it contains a single human body, a single key point set is obtained for that body.
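Once each detected key point carries an attribution, forming the per-body key point sets reduces to a group-by. This minimal sketch assumes that the model's body numbers are already integer body IDs, as the text says an integer is predicted. If the predictions were real-valued, a clustering or rounding step would be needed first. The `KeyPoint` type and the part names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple

class KeyPoint(NamedTuple):
    name: str     # hypothetical part label, e.g. "nose" or "left_ankle"
    x: float
    y: float
    body_id: int  # attribution: the body this key point was assigned to

def group_by_body(points: List[KeyPoint]) -> Dict[int, List[KeyPoint]]:
    """Collect key points that share the same body id into one key point set."""
    groups: Dict[int, List[KeyPoint]] = defaultdict(list)
    for p in points:
        groups[p.body_id].append(p)
    return dict(groups)

detections = [
    KeyPoint("nose", 120.0, 80.0, body_id=1),
    KeyPoint("nose", 310.0, 95.0, body_id=2),
    KeyPoint("left_shoulder", 100.0, 140.0, body_id=1),
]
sets = group_by_body(detections)
print({body: [p.name for p in pts] for body, pts in sets.items()})
# {1: ['nose', 'left_shoulder'], 2: ['nose']}
```

One set per distinct body ID matches the text: multiple bodies give multiple sets, and a single body gives one set.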
In 204, the electronic device determines the target human body from the identified human body key point sets, and classifies the target human body according to its key point set to obtain its human body type.
Specifically, the electronic device determines the target human body from the identified human body key point sets. When there is only one set, the human body corresponding to it is taken directly as the target human body. When there are multiple sets, the human body corresponding to one of them is selected as the target human body according to a preset target decision strategy.
This application places no specific restriction on the target decision strategy. A person of ordinary skill in the art can set it according to actual needs.
After determining the target human body, the electronic device classifies the target human body in the shooting scene according to its key point set and a preset human body classification strategy, obtaining the human body type of the target human body. This application places no specific restriction on how human body types are divided. A person of ordinary skill in the art can configure them according to actual needs.
For example, suppose the key point set of the target human body includes only head key points. The head length and head width of the target human body are obtained from the head key points, and the ratio of the larger of the two to the length of the portrait bounding box is computed. The ratio then selects the human body type. If it falls in a first ratio interval, the target human body is of a first human body type. If it falls in a second ratio interval, it is of a second human body type. If it falls in a third ratio interval, it is of a third human body type. If it falls in a fourth ratio interval, it is of a fourth human body type. Or,
when the key point set of the target human body includes both head key points and foot key points, the target human body is determined to be of the fourth human body type; or,
when the key point set of the target human body includes all key points except the foot key points, the target human body is determined to be of the third human body type; or,
when the key point set of the target human body includes all key points except the hip key points and the foot key points, the target human body is determined to be of the second human body type.
This embodiment provides an optional classification strategy for the target human body. It first checks whether the detected human body key points include only head key points. If so, other key points may exist but have not been detected.
In that case, the electronic device obtains the head length and head width of the target human body from the head key points and takes the larger of the two. It computes the ratio of this larger value to the length of the portrait bounding box, where the box length is the length of its side along the vertical axis. The human body type is then determined from this ratio. For example, if the head length is greater than the head width, the electronic device computes the ratio of the head length to the bounding box length. Conversely, if the head width is greater than the head length, it computes the ratio of the head width to the bounding box length.
If the ratio falls in the first ratio interval, the target human body is determined to be of the first human body type;
if the ratio falls in the second ratio interval, the target human body is determined to be of the second human body type;
if the ratio falls in the third ratio interval, the target human body is determined to be of the third human body type;
if the ratio falls in the fourth ratio interval, the target human body is determined to be of the fourth human body type.
This embodiment defines four human body types: the first, second, third, and fourth human body types. The ratio intervals can be set by a person of ordinary skill in the art according to actual needs, and this application places no specific restriction on them.
For example, the first ratio interval is configured as (1/4, +∞). When the ratio of the larger of the target human body's head length and head width to the portrait bounding box length is greater than 1/4, the target human body in the shooting scene is determined to be of the first human body type. The user is then judged to want a facial close-up of the target human body;
the second ratio interval is configured as (1/6, 1/4]. When the ratio is greater than 1/6 but not greater than 1/4, the target human body in the shooting scene is determined to be of the second human body type. The user is then judged to want a bust of the target human body;
the third ratio interval is configured as (1/9, 1/6]. When the ratio is greater than 1/9 but not greater than 1/6, the target human body in the shooting scene is determined to be of the third human body type. The user is then judged to want a three-quarter-length portrait of the target human body;
the fourth ratio interval is configured as (-∞, 1/9]. When the ratio is not greater than 1/9, the target human body in the shooting scene is determined to be of the fourth human body type. The user is then judged to want a full-length portrait of the target human body.
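The head-only rule with the example intervals above can be written as a small function. This is a sketch: the interval bounds are the text's examples, and the function and parameter names are hypothetical. `box_length` is the bounding box's side along the vertical axis, as described.

```python
def classify_by_head_ratio(head_length: float, head_width: float, box_length: float) -> int:
    """Return human body type 1-4 from max(head length, head width) / box length,
    using the example intervals (1/4, +inf), (1/6, 1/4], (1/9, 1/6], (-inf, 1/9]."""
    if box_length <= 0:
        raise ValueError("bounding box length must be positive")
    ratio = max(head_length, head_width) / box_length
    if ratio > 1 / 4:
        return 1  # facial close-up
    if ratio > 1 / 6:
        return 2  # bust
    if ratio > 1 / 9:
        return 3  # three-quarter-length portrait
    return 4      # full-length portrait

print(classify_by_head_ratio(30, 25, 100))  # ratio 0.30 -> type 1
```

Because the checks run from the largest lower bound downward, each interval's open left end and closed right end fall out of the `>` comparisons without separate upper-bound tests.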
In addition, when the detected human body key points include key points of other body parts besides the head, the human body type is determined from those other key points.
When the detected key points include both head key points and foot key points, the target human body in the shooting scene is determined to be of the fourth human body type. The user is then judged to want a full-length portrait of the target human body;
when the detected key points include all key points except the foot key points, the target human body is determined to be of the third human body type. The user is then judged to want a three-quarter-length portrait of the target human body;
when the detected key points include all key points except the hip key points and the foot key points, the target human body is determined to be of the second human body type. The user is then judged to want a bust of the target human body.
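The part-presence rules can be sketched as follows. The sketch assumes this reading of the text: "all key points except the feet" means hips present and feet absent, and "all except hips and feet" means neither is present. The coarse part labels are hypothetical. The head-only case is left to the ratio rule.

```python
from typing import Optional, Set

def classify_by_parts(parts: Set[str]) -> Optional[int]:
    """parts: body-part groups present in the target's key point set, using the
    hypothetical labels "head", "hip", "foot" plus any others (e.g. "shoulder")."""
    if not parts or parts == {"head"}:
        return None  # nothing detected, or head only: use the head-ratio rule instead
    if "head" in parts and "foot" in parts:
        return 4     # head and feet visible: full-length portrait
    if "foot" not in parts and "hip" in parts:
        return 3     # everything except the feet: three-quarter-length portrait
    if "foot" not in parts and "hip" not in parts:
        return 2     # no hips and no feet: bust
    return None      # e.g. feet without a head: not covered by the described rules

print(classify_by_parts({"head", "shoulder", "hip", "knee"}))  # 3
```

Checking head plus feet first mirrors the order of the rules in the text. A set that contains feet therefore never falls through to the bust or three-quarter cases.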
In 205, the electronic device determines the positioning point and composition type of the target human body according to the human body type and the target human body's key point set.
The positioning point represents the position of the human body. In this embodiment, after classifying the human body type, the electronic device determines the positioning point of the target human body from the human body type and the target human body's key point set, according to a preset positioning point decision strategy. It also determines the composition type of the target human body according to a preset composition type decision strategy.
This application places no specific restriction on how composition types are divided. A person of ordinary skill in the art can configure them according to actual needs.
For example, the composition types defined in this embodiment include a facial close-up composition and a full-body composition.
Exemplarily, the electronic device identifies whether the head of the target human body faces forward or sideways according to the head key points in the target human body's key point set;
When the head faces forward and the human body type is the first human body type, the geometric center of the portrait bounding box is determined as the anchor point, and the composition type is determined to be the first composition type; or,
When the head faces sideways and the human body type is the first human body type, the symmetric head key points among the head key points are identified, their geometric center is determined as the anchor point, and the composition type is determined to be the first composition type; or,
When the head faces sideways and the human body type is the second human body type, the geometric center of the portrait bounding box is determined as the anchor point, and the composition type is determined to be the second composition type; or,
When the head faces forward and the human body type is the second, third, or fourth human body type, the symmetric head key points among the head key points are identified, their geometric center is determined as the anchor point, and the composition type is determined to be the second composition type; or,
When the head faces sideways and the human body type is the third or fourth human body type, the mean y coordinate of the head key points is taken as the anchor point's y coordinate, the x coordinate of the portrait bounding box's geometric center is taken as the anchor point's x coordinate, and the composition type is determined to be the second composition type.
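The decision table above can be sketched as a single function. This is an illustrative sketch, not the applicant's implementation: the box format `(x0, y0, x1, y1)`, the dictionary of named head key points, and the fallback to the box center when no symmetric pair is detected are assumptions, and the plain vertex mean stands in for the geometric center of the symmetric head key points.

```python
from statistics import mean


def bbox_center(box):
    """Geometric center of a portrait bounding box given as (x0, y0, x1, y1)."""
    x0, y0, x1, y1 = box
    return ((x0 + x1) / 2, (y0 + y1) / 2)


def symmetric_center(head_points):
    """Vertex mean of the head key points whose left/right partner was also detected."""
    pts = []
    for name, p in head_points.items():
        if not name.startswith("left_"):
            continue
        partner = "right_" + name[len("left_"):]
        if partner in head_points:
            pts += [p, head_points[partner]]
    if not pts:
        return None
    return (mean(x for x, _ in pts), mean(y for _, y in pts))


def decide_anchor(head_frontal, body_type, head_points, box):
    """Return (anchor_point, composition_type) following the listed rules."""
    center = bbox_center(box)
    # Assumption: with no symmetric pair detected, fall back to the box center.
    sym = symmetric_center(head_points) or center
    if body_type == 1:
        return (center if head_frontal else sym), 1
    if head_frontal:          # types 2, 3, 4 facing forward
        return sym, 2
    if body_type == 2:        # type 2 facing sideways
        return center, 2
    # types 3 and 4 facing sideways: box-center x, mean head-key-point y
    return (center[0], mean(y for _, y in head_points.values())), 2
```

All eight combinations of orientation (forward/sideways) and body type (1 to 4) fall into exactly one branch, mirroring the five rules listed above.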
This application provides one optional anchor-point decision strategy and composition-type decision strategy, described below.
The electronic device first identifies whether the head of the target human body faces forward or sideways according to the head key points in the target human body's key point set.
For example, the electronic device may take the x coordinates of the eye key points, the nose-tip key point, and the mouth key point and compute their mean; if the mean falls within the leftmost or rightmost quarter of the portrait bounding box, the head is judged to face sideways; otherwise, it is judged to face forward.
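A minimal sketch of the quarter-width orientation test, assuming the bounding box is given as `(x0, y0, x1, y1)` in image coordinates and that a mean lying exactly on a quarter boundary counts as sideways (the text does not say how boundaries are treated):

```python
def head_is_frontal(face_xs, box):
    """face_xs: x coordinates of the eye, nose-tip and mouth key points."""
    x0, _, x1, _ = box
    avg = sum(face_xs) / len(face_xs)
    quarter = (x1 - x0) / 4
    # Sideways if the mean lies in the leftmost or rightmost quarter of the box.
    in_side_quarter = avg <= x0 + quarter or avg >= x1 - quarter
    return not in_side_quarter
```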
The anchor point and the composition type are then determined from the recognized head orientation and the human body type.
When the head faces forward and the human body type is the first human body type, the geometric center of the portrait bounding box is determined as the anchor point, and the composition type is determined to be the first composition type (that is, a facial close-up composition).
When the head faces sideways and the human body type is the first human body type, the symmetric head key points among the head key points are identified, their geometric center is determined as the anchor point, and the composition type is determined to be the first composition type. Symmetric head key points are head key points that occur in pairs with both members detected, such as the left-eye and right-eye key points, or the left-ear and right-ear key points. The geometric center of multiple symmetric head key points is the geometric center of the polygon obtained by connecting them.
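The text defines the anchor as the geometric center of the polygon formed by the symmetric head key points, but it fixes neither the vertex order nor the notion of center. The sketch below assumes the area centroid (shoelace formula), orders the vertices by angle around their mean so the polygon does not self-intersect, and falls back to the vertex mean for fewer than three points or a degenerate polygon.

```python
import math


def polygon_center(points):
    """Area centroid of the polygon through `points`; vertex mean if degenerate."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    if n < 3:
        return (mx, my)
    # Order vertices by angle around their mean so the polygon does not self-intersect.
    pts = sorted(points, key=lambda p: math.atan2(p[1] - my, p[0] - mx))
    twice_area = cx = cy = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:] + pts[:1]):
        cross = x0 * y1 - x1 * y0
        twice_area += cross
        cx += (x0 + x1) * cross
        cy += (y0 + y1) * cross
    if abs(twice_area) < 1e-12:   # collinear points: no area to weight by
        return (mx, my)
    # Centroid = sum / (6 * area) = sum / (3 * twice_area)
    return (cx / (3 * twice_area), cy / (3 * twice_area))
```

For a single detected pair (for example, only the two eyes), the result is simply their midpoint.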
When the head faces sideways and the human body type is the second human body type, the geometric center of the portrait bounding box is determined as the anchor point, and the composition type is determined to be the second composition type (that is, a full-body composition).
When the head faces forward and the human body type is the second, third, or fourth human body type, the symmetric head key points among the head key points are identified, their geometric center is determined as the anchor point, and the composition type is determined to be the second composition type.
When the head faces sideways and the human body type is the third or fourth human body type, the mean y coordinate of the head key points is taken as the anchor point's y coordinate, the x coordinate of the portrait bounding box's geometric center is taken as the anchor point's x coordinate, and the composition type is determined to be the second composition type.
In 206, the electronic device determines the composition point for the target human body according to the anchor point and the composition type.
It should be noted that, in this embodiment, a set of candidate composition points is preset for each composition type. The electronic device first determines the currently available candidate composition points from the determined composition type, and then selects the composition point for the target human body from those candidates according to the anchor point.
Exemplarily, when the composition type is the first composition type, the candidate composition point closest to the anchor point is selected from the candidate composition points of the first composition type as the composition point;
When the composition type is the second composition type, the candidate composition point closest to the anchor point is selected from the candidate composition points of the second composition type as the composition point.
Exemplarily, for the first composition type, the candidate composition points preset in this embodiment are divided into two groups: those for landscape shooting and those for portrait shooting. The landscape candidates are the image center, the midpoint of the upper third line, and the intersections of the upper third line with the left and right third lines; the portrait candidates are the image center and the midpoint of the upper third line.
Similarly, for the second composition type, the preset candidate composition points are also divided into landscape and portrait groups. The landscape candidates are the image center, the four intersections of the upper/lower third lines with the left/right third lines, and the midpoints of the four third lines (upper, lower, left, and right); the portrait candidates are the image center and the midpoint of the upper third line.
Based on these candidate composition points, the electronic device first identifies whether the current shooting mode is portrait or landscape, and then, among the candidate points of the determined composition type for the current shooting mode, selects the one closest to the anchor point as the composition point for the target human body.
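The candidate composition points described above can be generated from the frame size and then searched for the one nearest the anchor point. In this sketch, the third lines sit at 1/3 and 2/3 of the width and height with the y axis pointing down, and the shooting mode is inferred from the frame shape (wider than tall means landscape); a real device might read its orientation sensor instead, so treat that rule as an assumption.

```python
import math


def candidate_points(composition_type, width, height):
    """Candidate composition points for composition type 1 or 2 (sketch)."""
    w, h = width, height
    center = (w / 2, h / 2)
    upper_mid = (w / 2, h / 3)          # midpoint of the upper third line
    if width <= height:                 # assumed portrait mode
        return [center, upper_mid]
    if composition_type == 1:           # landscape, facial close-up
        return [center, upper_mid, (w / 3, h / 3), (2 * w / 3, h / 3)]
    # Landscape, full-body: center, four intersections, four third-line midpoints.
    xs, ys = (w / 3, 2 * w / 3), (h / 3, 2 * h / 3)
    intersections = [(x, y) for y in ys for x in xs]
    midpoints = [(w / 2, h / 3), (w / 2, 2 * h / 3), (w / 3, h / 2), (2 * w / 3, h / 2)]
    return [center] + intersections + midpoints


def composition_point(anchor, composition_type, width, height):
    """Pick the candidate composition point closest to the anchor point."""
    candidates = candidate_points(composition_type, width, height)
    return min(candidates, key=lambda c: math.dist(anchor, c))


print(composition_point((750, 310), 2, 1200, 900))  # (800.0, 300.0)
```

Here the anchor sits just below and left of the upper-right third-line intersection of a 1200×900 landscape frame, so that intersection is chosen.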
In 207, when the anchor point does not match the composition point, the electronic device outputs prompt information instructing the user to adjust the shooting posture of the electronic device.
The anchor point matches the composition point when the distance between them is less than or equal to a preset distance. This application does not specifically limit the value of the preset distance; a person of ordinary skill in the art may set it according to actual needs.
Accordingly, the electronic device determines in real time whether the anchor point of the target human body in the shooting scene matches the composition point. If not, it outputs prompt information instructing the user to adjust the shooting posture of the electronic device until the two match, thereby obtaining a better composition.
Exemplarily, referring to FIG. 8, the anchor point is the geometric center of multiple symmetric head key points of the human head, and the composition point is the intersection of the upper third line and the right third line. The electronic device overlays the upper/lower/left/right third lines, the anchor point, and the composition point on the preview image captured in real time, and uses an arrow from the anchor point to the composition point as prompt information to guide the user in adjusting the shooting posture of the electronic device until the anchor point and the composition point match in the real-time preview image, as shown in FIG. 9.
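The matching test and the arrow prompt reduce to a distance check and a vector. A minimal sketch, in which `max_dist` plays the role of the unspecified preset distance and `arrow_label` is a hypothetical helper that describes the arrow in image coordinates (y axis pointing down):

```python
import math


def check_match(anchor, comp_point, max_dist):
    """Return (matched, arrow); arrow points from the anchor to the composition point."""
    dx = comp_point[0] - anchor[0]
    dy = comp_point[1] - anchor[1]
    matched = math.hypot(dx, dy) <= max_dist
    return matched, (dx, dy)


def arrow_label(dx, dy):
    """Hypothetical text hint for the arrow direction (image y axis points down)."""
    parts = []
    if dx:
        parts.append("right" if dx > 0 else "left")
    if dy:
        parts.append("down" if dy > 0 else "up")
    return "-".join(parts) or "none"


matched, (dx, dy) = check_match((500, 350), (800, 300), max_dist=20)
if not matched:
    print("move anchor", arrow_label(dx, dy))  # move anchor right-up
```

The arrow describes where the anchor should move within the frame; how that maps to a physical camera movement is left to the prompt design.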
In 208, when the anchor point matches the composition point, the electronic device photographs the shooting scene to obtain a captured image.
When the anchor point matches the composition point, the electronic device determines that a better composition can be obtained at this moment, and photographs the shooting scene to obtain a captured image of it.
This application also provides an image processing apparatus. Refer to FIG. 10, which is a schematic structural diagram of the image processing apparatus provided by an embodiment of this application. The image processing apparatus is applied to an electronic device and includes an image acquisition module 301, an image detection module 302, and a human body recognition module 303, as follows:
The image acquisition module 301 is configured to acquire an image to be detected on which key point detection is required;
The image detection module 302 is configured to call a pre-trained key point detection model to perform key point detection on the image to be detected, obtaining key point location information and key point attribution information;
The human body recognition module 303 is configured to identify, according to the key point location information and the key point attribution information, sets of human body key points belonging to the same human body.
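This section does not specify what form the key point attribution information takes. One common choice, assumed here, is a scalar "tag" per key point in the style of associative embedding, where key points of the same person receive similar tags. Under that assumption, the grouping performed by module 303 could look like the following greedy sketch; `max_tag_gap` is a hypothetical threshold.

```python
from dataclasses import dataclass, field


@dataclass
class Keypoint:
    kind: str     # key point type, e.g. "nose" (hypothetical names)
    x: float
    y: float
    tag: float    # assumed scalar attribution value


@dataclass
class Body:
    points: dict = field(default_factory=dict)  # kind -> Keypoint

    def mean_tag(self):
        return sum(p.tag for p in self.points.values()) / len(self.points)


def group_keypoints(keypoints, max_tag_gap=0.5):
    """Greedily assign each key point to the body with the closest mean tag."""
    bodies = []
    for kp in keypoints:
        best, best_gap = None, max_tag_gap
        for body in bodies:
            if kp.kind in body.points:   # at most one key point of each type per body
                continue
            gap = abs(body.mean_tag() - kp.tag)
            if gap <= best_gap:
                best, best_gap = body, gap
        if best is None:                 # no compatible body: start a new one
            best = Body()
            bodies.append(best)
        best.points[kp.kind] = kp
    return bodies
```

Because the assignment is greedy, feeding key points in descending detection confidence (an assumption, not stated in the text) tends to make the early, reliable points define each body.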
In an embodiment, the key point detection model includes a feature extraction network and a feature prediction network. When calling the pre-trained key point detection model to perform key point detection on the image to be detected to obtain the key point location information and the key point attribution information, the image detection module 302 is configured to:
call the feature extraction network to extract image features of the image to be detected;
call the feature prediction network to perform key point detection on the image features to obtain the key point location information and the key point attribution information.
In an embodiment, the feature prediction network includes a location branch and an attribution branch. When calling the feature prediction network to perform key point detection on the image features to obtain the key point location information and the key point attribution information, the image detection module 302 is configured to:
call the location branch to perform key point location detection on the image features to obtain the key point location information;
call the attribution branch to perform key point attribution detection based on the image features and the key point location information to obtain the key point attribution information.
In an embodiment, the location branch includes a convolution unit with a 1×1 convolution kernel.
In an embodiment, the attribution branch includes a feature optimization sub-module, a fusion sub-module, and an output sub-module. When calling the attribution branch to perform key point attribution detection based on the image features and the key point location information to obtain the key point attribution information, the image detection module 302 is configured to:
call the feature optimization sub-module to optimize the image features to obtain optimized image features;
call the fusion sub-module to fuse the optimized image features with the key point location information to obtain fused features;
call the output sub-module to perform key point attribution detection on the fused features to obtain the key point attribution information.
In an embodiment, the feature optimization sub-module includes a convolution unit with a 1×1 convolution kernel, and the output sub-module includes a convolution unit with a 1×1 convolution kernel.
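Because a 1×1 convolution is a per-pixel linear map across channels, the location and attribution branches can be sketched in plain Python on small nested-list feature maps (channels × height × width). The ReLU after the optimization sub-module and the use of channel concatenation as the fusion operation are assumptions; the text only says that these sub-modules use 1×1 convolution units and that the fusion sub-module fuses the optimized features with the location information.

```python
def conv1x1(feat, weights, bias):
    """1x1 convolution: feat is C_in x H x W, weights is C_out x C_in, bias has C_out."""
    c_in, h, w = len(feat), len(feat[0]), len(feat[0][0])
    return [
        [[bias[o] + sum(weights[o][c] * feat[c][i][j] for c in range(c_in))
          for j in range(w)]
         for i in range(h)]
        for o in range(len(weights))
    ]


def relu(feat):
    return [[[max(0.0, v) for v in row] for row in ch] for ch in feat]


def location_branch(feat, w_loc, b_loc):
    """Location branch: one heatmap channel per key point type."""
    return conv1x1(feat, w_loc, b_loc)


def attribution_branch(feat, heatmaps, w_opt, b_opt, w_out, b_out):
    optimized = relu(conv1x1(feat, w_opt, b_opt))  # feature optimization sub-module
    fused = optimized + heatmaps                     # fusion: channel concatenation (assumed)
    return conv1x1(fused, w_out, b_out)              # output sub-module: attribution maps
```

In a real model these would be `Conv2d(kernel_size=1)` layers in a deep learning framework; the list version only shows the arithmetic and the data flow between the two branches.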
In an embodiment, the image processing apparatus provided in this application further includes a model training module which, before the image to be detected is acquired, is configured to:
acquire sample images and the sample key point location information corresponding to each sample image, and construct the key point detection model;
call the key point detection model to perform key point detection on the sample images to obtain predicted key point location information and predicted key point attribution information;
obtain a key point location loss from the sample key point location information and the predicted key point location information, and obtain a key point attribution loss from the predicted key point location information and the predicted key point attribution information;
fuse the key point location loss and the key point attribution loss into a fused loss, and adjust the parameters of the key point detection model according to the fused loss.
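A sketch of the fused training loss, under assumptions the text leaves open: the sample key point coordinates are rendered as Gaussian target heatmaps, the location loss is the mean squared error against the predicted heatmaps, and the fusion is a weighted sum with a hypothetical `weight` parameter. The attribution loss is passed in as a number here.

```python
import math


def gaussian_heatmap(height, width, cx, cy, sigma=1.0):
    """Target heatmap with a Gaussian peak at the labelled key point (cx, cy)."""
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]


def mse(pred, target):
    diffs = [(p - t) ** 2 for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]
    return sum(diffs) / len(diffs)


def fused_loss(pred_maps, labels, attribution_loss, sigma=1.0, weight=1.0):
    """pred_maps: one H x W map per key point; labels: matching (x, y) coordinates."""
    location_loss = sum(
        mse(p, gaussian_heatmap(len(p), len(p[0]), x, y, sigma))
        for p, (x, y) in zip(pred_maps, labels)
    ) / len(pred_maps)
    return location_loss + weight * attribution_loss
```

The weight controls how strongly the grouping objective competes with localization accuracy during parameter updates; its value is a tuning choice, not something this section prescribes.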
In an embodiment, when obtaining the key point attribution loss from the predicted key point location information and the predicted key point attribution information, the model training module is configured to:
cluster key points according to the predicted key point location information to obtain multiple human body key point sets belonging to different human bodies;
obtain the key point attribution loss from the multiple human body key point sets and the predicted key point attribution information.
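One way to realize the two steps above, assuming scalar attribution tags: cluster the predicted key point positions into body sets (here, single-linkage clustering with a hypothetical distance threshold `max_gap`), then apply the pull/push loss used in associative embedding, which pulls the tags within a set toward their mean and pushes the means of different sets apart. This is a swapped-in, commonly used technique, not necessarily the loss used in this application.

```python
import math


def cluster_by_position(points, max_gap):
    """Single-linkage clustering: points within max_gap of each other share a body."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= max_gap:
                parent[find(i)] = find(j)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return list(groups.values())


def attribution_loss(tags, groups, sigma=1.0):
    """Pull tags toward their group mean; push different group means apart."""
    refs = [sum(tags[i] for i in g) / len(g) for g in groups]
    pull = sum(sum((tags[i] - r) ** 2 for i in g) / len(g)
               for g, r in zip(groups, refs)) / len(groups)
    k = len(refs)
    if k < 2:
        return pull
    push = sum(math.exp(-((refs[a] - refs[b]) ** 2) / (2 * sigma ** 2))
               for a in range(k) for b in range(k) if a != b) / (k * (k - 1))
    return pull + push
```

With well-separated tags per body the loss approaches zero; identical tags across bodies are penalized by the push term.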
In an embodiment, when acquiring the image to be detected on which key point detection is required, the image acquisition module 301 is configured to:
when the shooting function of the electronic device is enabled, acquire a preview image of the shooting scene and use the preview image as the image to be detected.
The image processing apparatus provided in this application further includes a composition prompt module which, after the human body recognition module 303 identifies the sets of human body key points belonging to the same human body according to the key point location information and the key point attribution information, is configured to:
determine a target human body from the identified human body key point sets, and classify the target human body according to its key point set to obtain its human body type;
determine the anchor point and the composition type of the target human body according to the human body type and the target human body's key point set;
determine the composition point of the target human body according to the anchor point and the composition type;
when the anchor point does not match the composition point, output prompt information instructing the user to adjust the shooting posture of the electronic device.
This application further provides an electronic device. Referring to FIG. 11, the electronic device includes a processor 401 and a memory 402.
The processor 401 in this embodiment is a general-purpose processor, such as a processor of the ARM architecture.
The memory 402 stores a computer program and may be a high-speed random access memory or a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. Correspondingly, the memory 402 may further include a memory controller that gives the processor 401 access to the computer program in the memory 402 to implement the following functions:
acquiring an image to be detected on which key point detection is required;
calling a pre-trained key point detection model to perform key point detection on the image to be detected to obtain key point location information and key point attribution information;
identifying, according to the key point location information and the key point attribution information, sets of human body key points belonging to the same human body.
In an embodiment, the key point detection model includes a feature extraction network and a feature prediction network. When calling the pre-trained key point detection model to perform key point detection on the image to be detected to obtain the key point location information and the key point attribution information, the processor 401 is configured to execute:
calling the feature extraction network to extract image features of the image to be detected;
calling the feature prediction network to perform key point detection on the image features to obtain the key point location information and the key point attribution information.
In an embodiment, the feature prediction network includes a location branch and an attribution branch. When calling the feature prediction network to perform key point detection on the image features to obtain the key point location information and the key point attribution information, the processor 401 is configured to execute:
calling the location branch to perform key point location detection on the image features to obtain the key point location information;
calling the attribution branch to perform key point attribution detection based on the image features and the key point location information to obtain the key point attribution information.
In an embodiment, the location branch includes a convolution unit with a 1×1 convolution kernel.
In an embodiment, the attribution branch includes a feature optimization sub-module, a fusion sub-module, and an output sub-module. When calling the attribution branch to perform key point attribution detection based on the image features and the key point location information to obtain the key point attribution information, the processor 401 is configured to execute:
calling the feature optimization sub-module to optimize the image features to obtain optimized image features;
calling the fusion sub-module to fuse the optimized image features with the key point location information to obtain fused features;
calling the output sub-module to perform key point attribution detection on the fused features to obtain the key point attribution information.
In an embodiment, the feature optimization sub-module includes a convolution unit with a 1×1 convolution kernel, and the output sub-module includes a convolution unit with a 1×1 convolution kernel.
In an embodiment, before acquiring the image to be detected on which key point detection is required, the processor 401 is further configured to execute:
acquiring sample images and the sample key point location information corresponding to each sample image, and constructing the key point detection model;
calling the key point detection model to perform key point detection on the sample images to obtain predicted key point location information and predicted key point attribution information;
obtaining a key point location loss from the sample key point location information and the predicted key point location information, and obtaining a key point attribution loss from the predicted key point location information and the predicted key point attribution information;
fusing the key point location loss and the key point attribution loss into a fused loss, and adjusting the parameters of the key point detection model according to the fused loss.
In an embodiment, when obtaining the key point attribution loss from the predicted key point location information and the predicted key point attribution information, the processor 401 is configured to execute:
clustering key points according to the predicted key point location information to obtain multiple human body key point sets belonging to different human bodies;
obtaining the key point attribution loss from the multiple human body key point sets and the predicted key point attribution information.
In an embodiment, when acquiring the image to be detected on which key point detection is required, the processor 401 is configured to execute:
when the shooting function of the electronic device is enabled, acquiring a preview image of the shooting scene and using the preview image as the image to be detected.
After identifying the sets of human body key points belonging to the same human body according to the key point location information and the key point attribution information, the processor 401 is further configured to execute:
determining a target human body from the identified human body key point sets, and classifying the target human body according to its key point set to obtain its human body type;
determining the anchor point and the composition type of the target human body according to the human body type and the target human body's key point set;
determining the composition point of the target human body according to the anchor point and the composition type;
when the anchor point does not match the composition point, outputting prompt information instructing the user to adjust the shooting posture of the electronic device.
It should be noted that the electronic device provided in this embodiment and the image processing method in the foregoing embodiments belong to the same concept. Any method provided in the image processing method embodiments can run on the electronic device; for the specific implementation process, see the image processing method embodiments, which are not repeated here.
It should be noted that, for the image processing method of the embodiments of this application, a person of ordinary skill in the art will understand that all or part of the flow of the method can be completed by a computer program controlling the relevant hardware. The computer program may be stored in a computer-readable storage medium, such as the memory of an electronic device, and executed by a processor in the electronic device; its execution may include the flow of the image processing method embodiments. The storage medium may be a magnetic disk, an optical disc, a read-only memory, a random access memory, or the like.
The image processing method, apparatus, storage medium, and electronic device provided by the embodiments of this application are described in detail above. Specific examples are used herein to explain the principles and implementations of this application; the description of the above embodiments is only intended to help understand the method and core idea of this application. Meanwhile, a person skilled in the art may make changes to the specific implementations and scope of application according to the idea of this application. In summary, the content of this specification shall not be construed as limiting this application.

Claims (20)

  1. 一种图像处理方法,其中,包括:An image processing method, which includes:
    获取需要进行关键点检测的待检测图像;Obtain the image to be detected that requires key point detection;
    调用预训练的关键点检测模型对所述待检测图像进行关键点检测,得到关键点位置信息和关键点归属信息;Calling a pre-trained key point detection model to perform key point detection on the image to be detected, to obtain key point location information and key point attribution information;
    根据所述关键点位置信息以及所述关键点归属信息识别出归属于同一人体的人体关键点集合。According to the key point location information and the key point attribution information, a set of human body key points belonging to the same human body is identified.
  2. The image processing method according to claim 1, wherein the key point detection model comprises a feature extraction network and a feature prediction network, and the invoking a pre-trained key point detection model to perform key point detection on the image to be detected, to obtain key point position information and key point attribution information, comprises:
    invoking the feature extraction network to extract image features of the image to be detected; and
    invoking the feature prediction network to perform key point detection on the image features, to obtain the key point position information and the key point attribution information.
  3. The image processing method according to claim 2, wherein the feature prediction network comprises a position branch and an attribution branch, and the invoking the feature prediction network to perform key point detection on the image features, to obtain the key point position information and the key point attribution information, comprises:
    invoking the position branch to perform key point position detection on the image features, to obtain the key point position information; and
    invoking the attribution branch to perform key point attribution detection according to the image features and the key point position information, to obtain the key point attribution information.
  4. The image processing method according to claim 3, wherein the position branch comprises a convolution unit with a convolution kernel size of 1×1.
  5. The image processing method according to claim 3, wherein the attribution branch comprises a feature optimization sub-module, a fusion sub-module, and an output sub-module, and the invoking the attribution branch to perform key point attribution detection according to the image features and the key point position information, to obtain the key point attribution information, comprises:
    invoking the feature optimization sub-module to optimize the image features, to obtain optimized image features;
    invoking the fusion sub-module to fuse the optimized image features and the key point position information, to obtain a fused feature; and
    invoking the output sub-module to perform key point position detection on the fused feature, to obtain the key point position information.
  6. The image processing method according to claim 5, wherein the feature optimization sub-module comprises a convolution unit with a convolution kernel size of 1×1, and the output sub-module comprises a convolution unit with a convolution kernel size of 1×1.
  7. The image processing method according to any one of claims 1 to 6, before the acquiring an image to be detected on which key point detection is to be performed, further comprising:
    acquiring a sample image and sample key point position information corresponding to the sample image, and constructing the key point detection model;
    invoking the key point detection model to perform key point detection on the sample image, to obtain predicted key point position information and predicted key point attribution information;
    acquiring a key point position loss according to the sample key point position information and the predicted key point position information, and acquiring a key point attribution loss according to the predicted key point position information and the predicted key point attribution information; and
    fusing the key point position loss and the key point attribution loss to obtain a fused loss, and adjusting parameters of the key point detection model according to the fused loss.
  8. The image processing method according to claim 7, wherein the acquiring a key point attribution loss according to the sample key point position information and the predicted key point attribution information comprises:
    performing key point clustering according to the predicted key point position information, to obtain a plurality of human body key point sets belonging to different human bodies; and
    acquiring the key point attribution loss according to the plurality of human body key point sets and the predicted key point attribution information.
  9. The image processing method according to any one of claims 1 to 6, wherein the acquiring an image to be detected on which key point detection is to be performed comprises:
    when a shooting function of an electronic device is enabled, acquiring a preview image of a shooting scene, and using the preview image as the image to be detected;
    and after the identifying, according to the key point position information and the key point attribution information, a set of human body key points belonging to a same human body, the method further comprises:
    determining a target human body according to the identified human body key point sets, and performing human body classification according to the human body key point set corresponding to the target human body, to obtain a human body type of the target human body;
    determining, according to the human body type and the human body key point set corresponding to the target human body, a positioning point and a composition type corresponding to the target human body;
    determining, according to the positioning point and the composition type, a composition point corresponding to the target human body; and
    when the positioning point does not match the composition point, outputting prompt information instructing adjustment of a shooting posture of the electronic device.
  10. An image processing apparatus, comprising:
    an image acquisition module, configured to acquire an image to be detected on which key point detection is to be performed;
    an image detection module, configured to invoke a pre-trained key point detection model to perform key point detection on the image to be detected, to obtain key point position information and key point attribution information; and
    a human body recognition module, configured to identify, according to the key point position information and the key point attribution information, a set of human body key points belonging to a same human body.
  11. A storage medium having a computer program stored thereon, wherein when the computer program is loaded by a processor, the following steps are performed:
    acquiring an image to be detected on which key point detection is to be performed;
    invoking a pre-trained key point detection model to perform key point detection on the image to be detected, to obtain key point position information and key point attribution information; and
    identifying, according to the key point position information and the key point attribution information, a set of human body key points belonging to a same human body.
  12. An electronic device, comprising a processor and a memory, the memory storing a computer program, wherein the processor is configured to load the computer program to perform:
    acquiring an image to be detected on which key point detection is to be performed;
    invoking a pre-trained key point detection model to perform key point detection on the image to be detected, to obtain key point position information and key point attribution information; and
    identifying, according to the key point position information and the key point attribution information, a set of human body key points belonging to a same human body.
  13. The electronic device according to claim 12, wherein the key point detection model comprises a feature extraction network and a feature prediction network, and when invoking the pre-trained key point detection model to perform key point detection on the image to be detected, to obtain the key point position information and the key point attribution information, the processor is configured to perform:
    invoking the feature extraction network to extract image features of the image to be detected; and
    invoking the feature prediction network to perform key point detection on the image features, to obtain the key point position information and the key point attribution information.
  14. The electronic device according to claim 13, wherein the feature prediction network comprises a position branch and an attribution branch, and when invoking the feature prediction network to perform key point detection on the image features, to obtain the key point position information and the key point attribution information, the processor is configured to perform:
    invoking the position branch to perform key point position detection on the image features, to obtain the key point position information; and
    invoking the attribution branch to perform key point attribution detection according to the image features and the key point position information, to obtain the key point attribution information.
  15. The electronic device according to claim 14, wherein the position branch comprises a convolution unit with a convolution kernel size of 1×1.
  16. The electronic device according to claim 14, wherein the attribution branch comprises a feature optimization sub-module, a fusion sub-module, and an output sub-module, and when invoking the attribution branch to perform key point attribution detection according to the image features and the key point position information, to obtain the key point attribution information, the processor is configured to perform:
    invoking the feature optimization sub-module to optimize the image features, to obtain optimized image features;
    invoking the fusion sub-module to fuse the optimized image features and the key point position information, to obtain a fused feature; and
    invoking the output sub-module to perform key point position detection on the fused feature, to obtain the key point position information.
  17. The electronic device according to claim 16, wherein the feature optimization sub-module comprises a convolution unit with a convolution kernel size of 1×1, and the output sub-module comprises a convolution unit with a convolution kernel size of 1×1.
  18. The electronic device according to any one of claims 12 to 17, wherein before acquiring the image to be detected on which key point detection is to be performed, the processor is further configured to perform:
    acquiring a sample image and sample key point position information corresponding to the sample image, and constructing the key point detection model;
    invoking the key point detection model to perform key point detection on the sample image, to obtain predicted key point position information and predicted key point attribution information;
    acquiring a key point position loss according to the sample key point position information and the predicted key point position information, and acquiring a key point attribution loss according to the predicted key point position information and the predicted key point attribution information; and
    fusing the key point position loss and the key point attribution loss to obtain a fused loss, and adjusting parameters of the key point detection model according to the fused loss.
  19. The electronic device according to claim 18, wherein when acquiring the key point attribution loss according to the sample key point position information and the predicted key point attribution information, the processor is configured to perform:
    performing key point clustering according to the predicted key point position information, to obtain a plurality of human body key point sets belonging to different human bodies; and
    acquiring the key point attribution loss according to the plurality of human body key point sets and the predicted key point attribution information.
  20. The electronic device according to any one of claims 12 to 17, wherein when acquiring the image to be detected on which key point detection is to be performed, the processor is configured to perform:
    when a shooting function of the electronic device is enabled, acquiring a preview image of a shooting scene, and using the preview image as the image to be detected;
    and after identifying, according to the key point position information and the key point attribution information, a set of human body key points belonging to a same human body, the processor is further configured to perform:
    determining a target human body according to the identified human body key point sets, and performing human body classification according to the human body key point set corresponding to the target human body, to obtain a human body type of the target human body;
    determining, according to the human body type and the human body key point set corresponding to the target human body, a positioning point and a composition type corresponding to the target human body;
    determining, according to the positioning point and the composition type, a composition point corresponding to the target human body; and
    when the positioning point does not match the composition point, outputting prompt information instructing adjustment of a shooting posture of the electronic device.
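Claims 3 to 6 (and 14 to 17) describe a prediction head with two parts. The position branch is built from a 1×1 convolution. The attribution branch is made of a feature optimization sub-module, a fusion sub-module, and an output sub-module, the first and last of which are 1×1 convolutions. The pure-Python sketch below illustrates that data flow on a tiny feature map, under stated assumptions; it is not the applicant's implementation. The claims do not say how the fusion sub-module combines its inputs, so channel-wise concatenation is assumed here. The attribution output is treated as a per-pixel map. The names `conv1x1`, `fuse` and `prediction_head` are hypothetical.

```python
from typing import List, Tuple

# A feature map is stored as [channel][row][column]
FeatureMap = List[List[List[float]]]

def conv1x1(x: FeatureMap, weights: List[List[float]], bias: List[float]) -> FeatureMap:
    """1x1 convolution: each output channel is a weighted sum of the input
    channels at the same pixel, plus a per-channel bias."""
    in_ch, h, w = len(x), len(x[0]), len(x[0][0])
    out: FeatureMap = []
    for o, row in enumerate(weights):
        if len(row) != in_ch:
            raise ValueError("weight row length must equal the input channel count")
        out.append([[bias[o] + sum(row[c] * x[c][i][j] for c in range(in_ch))
                     for j in range(w)]
                    for i in range(h)])
    return out

def fuse(a: FeatureMap, b: FeatureMap) -> FeatureMap:
    # Assumed fusion: channel-wise concatenation (the claims leave this open)
    return a + b

def prediction_head(features: FeatureMap, pos_w, pos_b, opt_w, opt_b,
                    out_w, out_b) -> Tuple[FeatureMap, FeatureMap]:
    # Position branch: a single 1x1 convolution, one heatmap per key point type
    heatmaps = conv1x1(features, pos_w, pos_b)
    # Attribution branch: feature optimization (1x1 conv), fusion with the
    # position output, then the output sub-module (1x1 conv)
    optimized = conv1x1(features, opt_w, opt_b)
    fused = fuse(optimized, heatmaps)
    attribution = conv1x1(fused, out_w, out_b)
    return heatmaps, attribution
```

In the example below, the input has two channels. `pos_w=[[1, 0]]` copies channel 0 into a single heatmap. Each row of `out_w` needs one weight per fused channel, which is the number of optimized channels plus the number of heatmaps.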
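Claim 1 ends by grouping detected key points into per-body sets using the position and attribution information. The claims do not say how the attribution information is encoded. One common technique is associative embedding: every detected key point carries a scalar tag, and key points with similar tags are assigned to the same person. The sketch below assumes that encoding and uses a simple greedy matcher. `Detection`, `Person`, `group_keypoints` and the default `threshold` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

@dataclass
class Detection:
    joint: str   # key point type, e.g. "head" or "neck"
    x: float
    y: float
    tag: float   # assumed scalar attribution value for this key point

@dataclass
class Person:
    joints: Dict[str, Tuple[float, float]] = field(default_factory=dict)
    tags: List[float] = field(default_factory=list)

    def mean_tag(self) -> float:
        return sum(self.tags) / len(self.tags)

def group_keypoints(dets: List[Detection], threshold: float = 0.5) -> List[Person]:
    """Greedily assign each detection to the person whose mean tag is closest,
    starting a new person when no match lies within `threshold`."""
    people: List[Person] = []
    for d in dets:
        best: Optional[Person] = None
        best_dist = threshold
        for p in people:
            if d.joint in p.joints:
                continue  # a body has at most one key point of each type
            dist = abs(p.mean_tag() - d.tag)
            if dist < best_dist:
                best, best_dist = p, dist
        if best is None:
            best = Person()
            people.append(best)
        best.joints[d.joint] = (d.x, d.y)
        best.tags.append(d.tag)
    return people
```

This greedy pass depends on input order. A real system might sort detections by confidence first, or use optimal matching, but the claims do not specify either.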
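Claims 7 and 8 (and 18 and 19) train the model on a fused loss that combines a key point position loss with a key point attribution loss computed over clustered human body key point sets. The claims give neither formula. The sketch below makes three assumptions:

- the position loss is a mean-squared-error loss on heatmap values;
- the attribution loss is an associative-embedding-style pull/push loss, where tags within one body are pulled toward their mean and the means of different bodies are pushed apart;
- the two losses are fused as a weighted sum.

The clustering step is assumed to have already produced the groups of attribution values. All function names and the `weight` and `sigma` parameters are hypothetical.

```python
import math
from typing import List, Sequence

def position_loss(target: Sequence[float], pred: Sequence[float]) -> float:
    # Mean squared error over flattened heatmap values
    return sum((t - p) ** 2 for t, p in zip(target, pred)) / len(target)

def attribution_loss(groups: List[List[float]], sigma: float = 1.0) -> float:
    """groups: attribution values of the key points in each clustered body."""
    if not groups:
        return 0.0
    means = [sum(g) / len(g) for g in groups]
    # Pull term: spread of tags around their own body's mean
    pull = sum(sum((v - m) ** 2 for v in g) / len(g)
               for g, m in zip(groups, means)) / len(groups)
    # Push term: penalize pairs of bodies whose mean tags are close
    push, pairs = 0.0, 0
    for i in range(len(means)):
        for j in range(i + 1, len(means)):
            push += math.exp(-((means[i] - means[j]) ** 2) / (2 * sigma ** 2))
            pairs += 1
    if pairs:
        push /= pairs
    return pull + push

def fused_loss(target: Sequence[float], pred: Sequence[float],
               groups: List[List[float]], weight: float = 1.0) -> float:
    # Weighted sum of the two losses; `weight` balances attribution vs position
    return position_loss(target, pred) + weight * attribution_loss(groups)
```

Two bodies with identical tags give a push term of 1. Well-separated tags drive the push term toward 0, which is the behavior the attribution branch would need for grouping to work.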
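Claim 9 (and claim 20) turns the grouped key points into a shooting prompt. A positioning point and a composition type are derived for the target body, a composition point is computed from them, and a prompt is output when the two points do not match. The claims leave all of these choices open. The sketch below covers only the matching and prompting steps, under these assumptions:

- the head key point serves as the positioning point;
- the composition type is the rule of thirds;
- the composition point is the nearest third-line intersection;
- `tol` is a hypothetical matching tolerance expressed as a fraction of the frame size.

Body-type classification is omitted. `thirds_intersections` and `composition_prompt` are hypothetical names.

```python
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]  # (x, y) in pixels, y pointing down

def thirds_intersections(width: float, height: float) -> List[Point]:
    # The four points where the rule-of-thirds grid lines cross
    return [(width * i / 3, height * j / 3) for i in (1, 2) for j in (1, 2)]

def composition_prompt(body: Dict[str, Point], width: float, height: float,
                       tol: float = 0.05) -> Optional[str]:
    """Return a camera adjustment hint, or None if the points already match."""
    # Assumed positioning point: the head key point of the target body
    px, py = body["head"]
    # Assumed composition point: the nearest rule-of-thirds intersection
    cx, cy = min(thirds_intersections(width, height),
                 key=lambda c: (c[0] - px) ** 2 + (c[1] - py) ** 2)
    dx, dy = cx - px, cy - py
    hints = []
    if abs(dx) > tol * width:
        # Panning left shifts the subject right in the frame, and vice versa
        hints.append("pan left" if dx > 0 else "pan right")
    if abs(dy) > tol * height:
        # Tilting up shifts the subject down in the frame, and vice versa
        hints.append("tilt up" if dy > 0 else "tilt down")
    return ", ".join(hints) if hints else None
```

For a 300×300 preview with the head at (160, 100), the nearest intersection is (200, 100), so the sketch suggests panning left.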
PCT/CN2021/075025 2020-03-06 2021-02-03 Image processing method and apparatus, storage medium, and electronic device WO2021175071A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010152690.8A CN111368751A (en) 2020-03-06 2020-03-06 Image processing method, image processing device, storage medium and electronic equipment
CN202010152690.8 2020-03-06

Publications (1)

Publication Number Publication Date
WO2021175071A1 true WO2021175071A1 (en) 2021-09-10

Family

ID=71206558

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/075025 WO2021175071A1 (en) 2020-03-06 2021-02-03 Image processing method and apparatus, storage medium, and electronic device

Country Status (2)

Country Link
CN (1) CN111368751A (en)
WO (1) WO2021175071A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115862074A (en) * 2023-02-28 2023-03-28 科大讯飞股份有限公司 Human body direction determining method, human body direction determining device, screen control method, human body direction determining device and related equipment

Families Citing this family (6)

Publication number Priority date Publication date Assignee Title
CN111368751A (en) * 2020-03-06 2020-07-03 Oppo广东移动通信有限公司 Image processing method, image processing device, storage medium and electronic equipment
CN111985549B (en) * 2020-08-12 2023-03-31 中国科学院光电技术研究所 Deep learning method for automatic positioning and identification of components for given rigid body target
CN113168533A (en) * 2020-08-26 2021-07-23 深圳市大疆创新科技有限公司 Gesture recognition method and device
CN111953907B (en) * 2020-08-28 2021-11-23 维沃移动通信有限公司 Composition method and device
CN112966574A (en) * 2021-02-22 2021-06-15 厦门艾地运动科技有限公司 Human body three-dimensional key point prediction method and device and electronic equipment
CN114518801B (en) * 2022-02-18 2023-10-27 美的集团(上海)有限公司 Device control method, control device, and storage medium

Citations (4)

Publication number Priority date Publication date Assignee Title
US20140003709A1 (en) * 2012-06-28 2014-01-02 Honda Motor Co., Ltd. Road marking detection and recognition
CN105608450A (en) * 2016-03-01 2016-05-25 天津中科智能识别产业技术研究院有限公司 Heterogeneous face identification method based on deep convolutional neural network
CN108229509A (en) * 2016-12-16 2018-06-29 北京市商汤科技开发有限公司 Method and device for identifying object category, and electronic equipment
CN111368751A (en) * 2020-03-06 2020-07-03 Oppo广东移动通信有限公司 Image processing method, image processing device, storage medium and electronic equipment

Family Cites Families (7)

Publication number Priority date Publication date Assignee Title
CN109960986A (en) * 2017-12-25 2019-07-02 北京市商汤科技开发有限公司 Human face posture analysis method, device, equipment, storage medium and program
CN110163059B (en) * 2018-10-30 2022-08-23 腾讯科技(深圳)有限公司 Multi-person posture recognition method and device and electronic equipment
CN109660719A (en) * 2018-12-11 2019-04-19 维沃移动通信有限公司 A kind of information cuing method and mobile terminal
CN109788191A (en) * 2018-12-21 2019-05-21 中国科学院自动化研究所南京人工智能芯片创新研究院 Photographic method, device, computer equipment and storage medium
CN110084161B (en) * 2019-04-17 2023-04-18 中山大学 Method and system for rapidly detecting key points of human skeleton
CN110580445B (en) * 2019-07-12 2023-02-07 西北工业大学 Face key point detection method based on GIoU and weighted NMS improvement
CN110598554B (en) * 2019-08-09 2023-01-03 中国地质大学(武汉) Multi-person posture estimation method based on adversarial learning

Also Published As

Publication number Publication date
CN111368751A (en) 2020-07-03

Similar Documents

Publication Publication Date Title
WO2021175071A1 (en) Image processing method and apparatus, storage medium, and electronic device
WO2021227726A1 (en) Methods and apparatuses for training face detection and image detection neural networks, and device
Qu et al. RGBD salient object detection via deep fusion
WO2019128508A1 (en) Method and apparatus for processing image, storage medium, and electronic device
WO2019128507A1 (en) Image processing method and apparatus, storage medium and electronic device
CN112131978B (en) Video classification method and device, electronic equipment and storage medium
WO2020182121A1 (en) Expression recognition method and related device
US10679041B2 (en) Hybrid deep learning method for recognizing facial expressions
WO2021169754A1 (en) Photographic composition prompting method and apparatus, storage medium, and electronic device
CN110555481A (en) Portrait style identification method and device and computer readable storage medium
CN110674741A (en) Machine vision gesture recognition method based on dual-channel feature fusion
CN111368672A (en) Construction method and device for genetic disease facial recognition model
US20230237771A1 (en) Self-supervised learning method and apparatus for image features, device, and storage medium
JP6787831B2 (en) Target detection device, detection model generation device, program and method that can be learned by search results
WO2023284182A1 (en) Training method for recognizing moving target, method and device for recognizing moving target
CN111242019A (en) Video content detection method and device, electronic equipment and storage medium
US20220270404A1 (en) Hybrid deep learning method for recognizing facial expressions
CN105631404A (en) Method and device for clustering pictures
Zhang et al. Facial component-landmark detection with weakly-supervised lr-cnn
CN107479715A (en) Method and apparatus for realizing virtual reality interaction using gesture control
CN113591562A (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN115410240A (en) Intelligent face pockmark and color spot analysis method and device and storage medium
CN112655021A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114529944B (en) Human image scene identification method combining human body key point heat map features
Lakshmy et al. Image based group happiness intensity analysis

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application
    Ref document number: 21763733; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: PCT application non-entry in European phase
    Ref document number: 21763733; Country of ref document: EP; Kind code of ref document: A1