CN113114924A

CN113114924A - Image shooting method and device, computer readable storage medium and electronic equipment

Info

Publication number: CN113114924A
Application number: CN202010033715.2A
Authority: CN
Inventors: 武锐
Original assignee: Beijing Horizon Robotics Technology Research and Development Co Ltd
Current assignee: Beijing Horizon Robotics Technology Research and Development Co Ltd
Priority date: 2020-01-13
Filing date: 2020-01-13
Publication date: 2021-07-13

Abstract

Disclosed are an image shooting method, an image shooting device, a computer readable storage medium and an electronic device, which relate to the technical field of image processing. The method comprises the following steps: determining human body key points of a shot object in an image preview interface; determining the posture similarity between the posture of the shot object and a reference posture in a posture reference template according to the human body key points; and when the posture similarity meets a first preset condition, triggering a photographing instruction to generate an image containing the photographed object. The embodiment of the disclosure can obtain the image of the shot object in the specific posture, thereby achieving the expected shooting effect and being beneficial to improving the user experience.

Description

Image shooting method and device, computer readable storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of image processing technologies, and in particular, to an image capturing method and apparatus, a computer-readable storage medium, and an electronic device.

Background

Photographing is an important way for people to record life, and terminal devices such as mobile phones, tablet computers and cameras also become an indispensable part of people's life, and people often use the terminal devices to photograph images. When a user shoots with a terminal device, the shooting requirements are diversified. For example, the jumping posture is a common photographing mode in people's leisure and tourism. The jumping posture needs the photographer to grasp the photographing time, the photographer presses the shutter at the jumping moment of the photographed person, the photographing is carried out continuously at a high speed, and finally manual screening is carried out on a plurality of photos. However, most users do not have professional shooting technology, and it is difficult to achieve the desired shooting effect through manually shot images.

Disclosure of Invention

The present disclosure is proposed to solve the above technical problems. The embodiment of the disclosure provides an image shooting method and device, a computer readable storage medium and an electronic device.

According to an aspect of the present disclosure, there is provided an image photographing method including: determining human body key points of a shot object in an image preview interface; determining the posture similarity between the posture of the shot object and a reference posture in a posture reference template according to the human body key points; and when the posture similarity meets a first preset condition, triggering a photographing instruction to generate an image containing the photographed object.

According to another aspect of the embodiments of the present disclosure, there is provided an image photographing apparatus including: the key point confirming module is used for confirming human key points of the shot object in the image preview interface; the gesture confirming module is used for confirming gesture similarity between the gesture of the shot object and a reference gesture in a gesture reference template according to the human body key point; and the shooting module is used for triggering a shooting instruction to generate an image containing the shot object when the posture similarity meets a first preset condition.

According to another aspect of the embodiments of the present disclosure, there is provided a computer-readable storage medium storing a computer program for executing the image capturing method mentioned in any of the above embodiments.

According to another aspect of the embodiments of the present disclosure, there is provided an electronic apparatus including: a processor; a memory for storing the processor-executable instructions; the processor is used for executing the image shooting method mentioned in any one of the above embodiments.

According to the image shooting method provided by the embodiment of the disclosure, the posture of the shot object can be determined according to the key points of the human body of the shot object, the posture similarity between the posture of the shot object and the reference posture in the posture reference template can be determined according to the posture of the shot object, when the posture similarity meets the preset condition, the specific posture of the shot object meets the expected shooting effect (for example, jumping posture), at the moment, the shooting operation can be executed, so that the image of the shot object in the specific posture can be obtained, the expected shooting effect is achieved, and the user experience is favorably improved.

Drawings

The above and other objects, features and advantages of the present disclosure will become more apparent by describing in more detail embodiments of the present disclosure with reference to the attached drawings. The accompanying drawings are included to provide a further understanding of the embodiments of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the principles of the disclosure and not to limit the disclosure. In the drawings, like reference numbers generally represent like parts or steps.

Fig. 1 is a schematic diagram of an implementation scenario provided by an embodiment of the present disclosure.

Fig. 2 is a flowchart illustrating an image capturing method according to an exemplary embodiment of the present disclosure.

Fig. 3 is a schematic structural diagram of a human body key point according to an exemplary embodiment of the present disclosure.

Fig. 4 is a flowchart illustrating an image capturing method according to another exemplary embodiment of the present disclosure.

Fig. 5 is a flowchart illustrating an image capturing method according to still another exemplary embodiment of the present disclosure.

Fig. 6 is a flowchart illustrating an image capturing method according to still another exemplary embodiment of the present disclosure.

Fig. 7 is a flowchart illustrating an image capturing method according to still another exemplary embodiment of the present disclosure.

Fig. 8 is a block diagram of an image capturing apparatus according to an exemplary embodiment of the present disclosure.

Fig. 9 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure.

Detailed Description

Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a subset of the embodiments of the present disclosure and not all embodiments of the present disclosure, with the understanding that the present disclosure is not limited to the example embodiments described herein.

Summary of the application

Conventionally, in photographing in a specific posture, for example, photographing in a jumping posture, it is generally necessary for a photographer to grasp a photographing timing, and at the moment when a subject finishes the specific posture, the photographer presses a shutter to perform high-speed continuous photographing, and finally, manual screening is performed among a plurality of photographs. However, most users do not have professional shooting technology, and it is difficult to achieve the desired shooting effect through manually shot images.

In view of the above problem, an embodiment of the present disclosure provides an image capturing method, which extracts a human key point of a captured object through a preset neural network model, determines a pose similarity between a pose of the captured object and a reference pose in a pose reference template according to the extracted human key point of the captured object, performs a capturing operation when the pose similarity satisfies a preset condition, and further generates an image including the captured object. Compared with the existing shooting method with the specific posture, the shooting method provided by the embodiment of the disclosure does not need a photographer to grasp the shooting time, and does not need to screen shot images, so that the shooting method provided by the embodiment of the disclosure can obtain the images of the shot object in the specific posture, an expected shooting effect is achieved, and the user experience is facilitated to be improved.

Having described the general principles of the present disclosure, various non-limiting embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings.

Exemplary System

Fig. 1 is a schematic diagram of an implementation scenario provided by an embodiment of the present disclosure. The implementation scenario includes: a server 140 and a plurality of terminal devices 110, 120, 130. The terminal devices 110, 120, and 130 are provided with cameras, and can acquire the to-be-processed image 150.

The terminal devices 110, 120, and 130 may be mobile terminal devices such as a mobile phone, a game console, a tablet Computer, a camera, a video camera, and a vehicle-mounted Computer, or the terminal devices 110, 120, and 130 may be Personal Computers (PCs), such as a laptop portable Computer and a desktop Computer. Those skilled in the art will appreciate that the types of terminal devices 110, 120, 130 may be the same or different, and that the number may be greater or fewer. For example, the number of the terminals may be one, or several tens or hundreds of the terminals, or more. The number of terminals and the type of the device are not limited in the embodiments of the present disclosure.

The terminal devices 110, 120, 130 and the server 140 are connected via a communication network. Optionally, the communication network is a wired network or a wireless network. Optionally, the server 140 is a server, or is composed of a plurality of servers, or is a virtualization platform, or is a cloud computing service center.

The terminal devices 110, 120, and 130 may be deployed with a neural network model for performing image processing on the image to be processed 150. In an embodiment, the terminal device 110, 120, 130 extracts key points from the image 150 to be processed through a neural network model to obtain human key points of the object, then the terminal device 110, 120, 130 determines a pose similarity between the pose of the object and a reference pose in a pose reference template according to the human key points, and finally, when the pose similarity satisfies a preset condition, a photographing instruction of the terminal device 110, 120, 130 may be triggered, so as to generate an image including the object. The gesture reference template is stored in a gesture database, and is obtained by extracting human body key point information in an image containing a human body and determining the reference gesture of the human body in the image containing the human body according to the human body key point information.

In some alternative embodiments, the image 150 to be processed may also be processed by a neural network model in the server 140. In an embodiment, the terminal devices 110, 120, and 130 may send the acquired to-be-processed image 150 or the image processing result to the server 140, the server 140 performs key point extraction on the to-be-processed image 150 through a neural network model on the server 140 to obtain a human key point of the photographed object, then the server 140 sends the human key point of the photographed object to the terminal devices 110, 120, and 130, the terminal devices 110, 120, and 130 determine a pose similarity between a pose of the photographed object and a reference pose in the pose reference template according to the human key point, and finally, when the pose similarity satisfies a preset condition, a photographing instruction of the terminal devices 110, 120, and 130 may be triggered, so as to generate an image including the photographed object.

In alternative embodiments, the image 150 to be processed may be further processed using a neural network model in the server 140. In an embodiment, while the terminal device 110, 120, 130 acquires the image 150 to be processed, the terminal device 110, 120, 130 acquires a neural network model from the server 140 to perform keypoint extraction on the image 150 to be processed, so as to obtain a human body keypoint of the object to be shot, then the terminal device 110, 120, 130 determines a pose similarity between a pose of the object to be shot and a reference pose in a pose reference template according to the human body keypoint, and finally, when the pose similarity satisfies a preset condition, a shooting instruction of the terminal device 110, 120, 130 may be triggered, so as to generate an image containing the object to be shot.

Through the implementation scenes, the image of the shot object in the specific posture can be obtained, the expected shooting effect is achieved, and the user experience is improved.

Exemplary method

Fig. 2 is a flowchart illustrating an image capturing method according to an exemplary embodiment of the present disclosure. The method may be applied to an implementation scenario shown in fig. 1, and is executed by the terminal device shown in fig. 1, but the embodiment of the present disclosure is not limited thereto. The terminal device may be a mobile terminal device having a shooting function, such as a mobile phone, a game console, a tablet Computer, a camera, a video camera, and a vehicle-mounted Computer, or the terminal device may also be a Personal Computer (PC), such as a laptop portable Computer and a desktop Computer, which are non-mobile terminal devices having a shooting function, and the embodiment of the disclosure is not limited thereto.

As shown in fig. 2, an image capturing method provided by the embodiment of the present disclosure includes the following steps:

s201: and determining the human key points of the shot object in the image preview interface.

In an embodiment, after the terminal device starts a shooting function, a camera of the terminal device starts to enter a framing state, and a current frame image including a shot object is displayed through an image preview interface of the terminal device. And the terminal equipment identifies the current frame image and determines the human body key point of the shot object in the current frame image. The current frame image is used for representing the frame image which is displayed in the image preview interface and currently contains the shot object.

It should be understood that the human key points in the present disclosure can be obtained based on a method of recognizing human key points in related art such as a human gesture recognition algorithm. In one embodiment, the human key points of the object can be the key points of the object with obvious visual features. The key points of the human body of the object may be human body joint points, for example, characteristic points of human body joint parts such as a head, a neck, a shoulder, an elbow, a crotch, a knee, and a foot of the human body. For example, as shown in fig. 3, the human key points of the subject may include at least two key points of the following twenty-five key points:

head keypoint 1, neck keypoint 2, spine keypoint 3, mid-spine keypoint 4, fundus keypoint 5, left shoulder keypoint 21, right shoulder keypoint 22, left elbow keypoint 23, right elbow keypoint 24, left wrist keypoint 25, right wrist keypoint 26, left hand keypoint 27, right hand keypoint 28, left thumb keypoint 29, right thumb keypoint 30, left tip keypoint 31, right tip keypoint 32, left crotch keypoint 33, right crotch keypoint 34, left knee keypoint 35, right knee keypoint 36, left ankle keypoint 37, right ankle keypoint 38, left foot keypoint 39, and right foot keypoint 40.

In other possible implementations, the human key points of the object to be photographed may also include key points of other parts in the human body, such as a left eye key point, a right eye key point, a left ear key point, a right ear key point, a nose key point, a toe key point, and the like.

S202: and determining the posture similarity between the posture of the shot object and the reference posture in the posture reference template according to the human key points.

In one embodiment, a pose reference template that the user needs to employ may be determined from a pose database. The pose reference template may only include the reference pose of the human body composed of the human body key points (as shown in fig. 3), or may include the human body itself and the reference pose of the human body composed of the human body key points, which is not limited in the embodiment of the present disclosure.

It should be noted that the gesture database may be a standard gesture database from a network side, or may be a user-defined gesture database. The gesture database comprises a plurality of gesture reference templates, and the gesture reference template is obtained by extracting human body key point information in an image containing a human body and determining the reference gesture of the human body in the image containing the human body according to the human body key point information.

In one embodiment, the terminal equipment receives a gesture reference template containing a reference gesture selected from a pre-established gesture database by a user; the posture of the shot object can be determined according to the position information of the key points of the human body of the shot object; and then, according to an image similarity matching algorithm, calculating the posture similarity between the posture of the shot object in the current frame image and the reference posture in the posture reference template.

S203: and when the posture similarity meets a first preset condition, triggering a photographing instruction to generate an image containing the photographed object.

It should be noted that, the first preset condition is not limited in the embodiment of the present disclosure, for example, when the posture similarity is within a preset value range, it indicates that the posture similarity satisfies the first preset condition, and at this time, a photographing instruction may be triggered, so as to generate an image including a photographed object; or when the gesture similarity is greater than or equal to the first similarity threshold, a photographing instruction can be triggered, and an image containing the photographed object is generated. The first similarity threshold and the preset value range are not particularly limited in the embodiments of the present disclosure, and those skilled in the art can set the first similarity threshold and the preset value range according to actual application situations.

It should be understood that the number of images containing a subject generated by triggering a photographing instruction is not further limited in the embodiments of the present disclosure, and only one image containing a subject may be generated, or a plurality of images containing a subject may be generated as long as the generated images containing a subject satisfy a specific posture.

Therefore, according to the image shooting method provided by the embodiment of the disclosure, the posture of the shot object can be determined according to the human key point of the shot object, and then the posture similarity between the posture of the shot object and the reference posture in the posture reference template is determined according to the posture of the shot object, when the posture similarity meets the preset condition, the specific posture of the shot object meets the expected shooting effect (for example, jumping posture), and at the moment, the shooting operation can be executed, so that the image of the shot object in the specific posture can be obtained, the expected shooting effect is achieved, and the user experience is favorably improved.

In an embodiment of the present disclosure, as shown in fig. 4, a human body key point of a photographed object in an image preview interface may be determined according to the following two steps:

s401: and acquiring an image to be processed in the image preview interface.

In an embodiment, after the terminal device starts a shooting function, a camera of the terminal device starts to enter a view state, where an image presented in an image preview interface is a to-be-processed image, but it should be noted that the to-be-processed image includes a to-be-shot object, and the to-be-shot object is in a motion state, images of the to-be-processed image including the to-be-shot object at different motion moments can be displayed in real time through the image preview interface of the terminal device, and the terminal device can acquire an image of the to-be-shot object at each motion moment in real time.

In an embodiment, the step S401 of acquiring the to-be-processed image in the image preview interface includes the following two steps:

1) judging whether the shot object exists in an image preview interface or not by adopting a preset algorithm, wherein the preset algorithm comprises at least one of an optical flow algorithm, an interframe difference algorithm and a background difference algorithm; 2) and when the shot object exists in the current frame image of the image preview interface, acquiring the current frame image as an image to be processed.

It should be understood that the user triggers the current capture in the preset pose capture mode by selecting a pose reference template containing a reference pose from a pre-established pose database, or by a preset control button in the application to which the image preview interface belongs. In the preset gesture shooting mode, when the terminal device acquires a current frame image in an image preview interface, and performs target detection on the current frame image, it is determined whether the current frame image contains a human body, that is, a shot object. And when the current frame image contains the shot object, acquiring the current frame image as an image to be processed.

It should be further understood that, when the current frame image does not include the subject, whether the image preview interface includes the subject within the preset time period is continuously detected. If the shot object is not detected within the preset time period, a reminding message can be output at the moment, and the reminding message is used for reminding a user to align the camera of the terminal equipment to the shot object. If the shot object is detected within the preset time period, the image frame currently including the shot object can be acquired as the image to be processed. Illustratively, the subject is a moving human body.

In an embodiment, the target detection may be performed on the current frame image through a preset algorithm, and whether the shot object exists in the image preview interface is determined according to whether a moving human body exists in the current frame image of the image preview interface. When a moving human body exists in a current frame image of the image preview interface, determining that a shot object exists in the image preview interface; otherwise, the shot object does not exist in the image preview interface. The predetermined algorithm is, for example, an optical flow algorithm, an inter-frame difference algorithm, a background difference algorithm, a Gaussian Mixture model (Mixture of Gaussian mode), or a sample consistency modeling algorithm (SACON). The background difference algorithm is to subtract a background image from a current frame image to obtain a difference image, wherein the difference image is a shot object. The interframe difference algorithm is a method for judging whether a shot object exists in two or three adjacent frames of images by using difference between pixels. The optical flow algorithm is to change a current frame image in an image preview interface into a vector of a speed, if no moving human body exists in the current frame image, the vector of the optical flow is continuously changed in the whole current frame image area, when the moving human body exists in the current frame image, relative motion exists between the moving human body and a background in the current frame image, the vector of the speed formed by the moving human body is different from the vector of the speed of the background, and thus the moving human body, namely a shot object, can be calculated.

For example, target detection may also be performed on the current frame image through an algorithm based on deep learning to determine whether the current frame image contains a human body. The deep learning-based algorithm may be a Single-point multi-box Detector (SSD) algorithm, a yolo (young Only Look one) algorithm, a fast regional Convolutional Neural network (fast RCNN), a Feature Pyramid Network (FPN), and the like.

S402: and extracting key points of the image to be processed through a preset neural network model to obtain the human body key points of the shot object.

It should be understood that the preset neural network model is pre-deployed in the terminal device, or the preset neural network model is obtained from the network terminal when the terminal device is in the preset posture shooting mode. Optionally, the preset Neural Network model may be a Convolutional Neural Network (CNN), a Deep Neural Network (DNN), a Recurrent Neural Network (RNN), or the like, and the type of the preset Neural Network model is not limited in the embodiment of the present disclosure. Optionally, the Network structure of the preset neural Network model may be designed autonomously according to a computer vision task design, or the Network structure of the preset neural Network model may adopt at least a part of an existing Network structure, for example, a Mask-region convolutional neural Network (Mask-RCNN), a dense pose-region convolutional neural Network (DensePose-RCNN), a Stacked Hourglass Network (Stacked Hourglass Network), a twin Network (Simaese Network), and the like.

In one embodiment, the preset neural network model adopts a human body posture recognition algorithm such as an open posture (openposition) algorithm, a real-time Multi-Person posture Estimation (real Multi-Person position Estimation) algorithm, a dense posture (DensePose) algorithm and the like or other self-designed human body posture recognition algorithms to extract key points of the image to be processed, so that the human body key points of the shot object can be obtained.

As shown in fig. 5, based on the embodiment shown in fig. 2, the step S202 may include the following steps:

s5011: and calculating the image plane coordinates of the human body key points on the image preview interface.

It should be understood that, since the terminal device can acquire the image of the object at each motion moment in real time, the human key points of the object can be obtained according to the preset neural network model, and the position information of the human key points of the object at each motion moment can also be obtained. A frame image including an initial frame image of a subject or a frame image of a subject whose movement position, speed, acceleration, etc. is less than a specific value in an image preview interface is referred to as a reference frame image, and the reference frame image is used as reference information for determining image plane coordinates of a human body key point.

In an embodiment, with reference to the position information of the human key point of the object to be shot in the reference frame image of the image preview interface, the image plane coordinates of the human key point in the image to be processed on the image preview interface can be obtained by comparing the position information of the human key point in the image to be processed with the position information of the human key point in the reference frame image. Optionally, the position information of the human key points in the reference frame image may be used as an origin to establish a coordinate system of the human key points in the image to be processed on the image preview interface, for example, the position information of the human foot key points, the human waist key points, the human head key points, and the like in the reference frame image is used as an origin to correspondingly determine the image plane coordinates of the human key points in the image to be processed, such as the human foot key points, the human waist key points, the human head key points, and the like. Alternatively, the image plane coordinates of the human key points of the object on the image preview interface may be determined according to the position information of the object in the world coordinate system, which is not limited in the embodiment of the present disclosure.

S5012: and when the image plane coordinates of the human body key points on the image preview interface meet a second preset condition, calculating the posture similarity between the posture of the shot object and the reference posture in the posture reference template.

It should be understood that the second preset condition may be set according to a specific application, for example, when the image plane coordinates of the human body key points on the image preview interface are within a preset range, which indicates that the image plane coordinates of the human body key points on the image preview interface satisfy the second preset condition, at this time, the pose similarity between the pose of the photographed object and the reference pose in the pose reference template may be calculated, that is, the second preset condition refers to a condition that the image plane coordinates set according to the specific application are within the preset range. For example, taking a jump shot of a shot object as an example, the distance from the human body key point of the shot object to the ground may be determined according to the image plane coordinates of the human body key point of the shot object, and then the preset range is set.

In one embodiment, to obtain the distance between the human body key point of the photographed object and the ground, firstly, determining the image plane coordinates of the waist key point when the photographed object is still, then, based on the imaging principle and the image recognition and analysis technology, calculating the image plane coordinates of the waist key point after the photographed object jumps in the image to be processed according to the jumping speed of the photographed object and the time interval between the time when the photographed object is still and the time when the photographed object is obtained, and finally, according to the image plane coordinates of the waist key point when the photographed object is still and the image plane coordinates of the waist key point after the photographed object jumps, the distance between the waist key point of the photographed object and the ground can be estimated and obtained, and the preset range is further set; in another embodiment, in order to obtain the distance from the human key point of the object to be photographed to the ground, image plane coordinates of the waist key point of the reference frame image are first determined to determine relative position information between the waist key point of the reference frame image and the ground, after the image to be processed is obtained, image plane coordinates of the waist key point of the image to be processed are determined to determine relative position information between the waist key point of the image to be processed and the waist key point of the reference frame image, and finally, according to the relative position information between the waist key point of the reference frame image and the ground and the relative position information between the waist key point of the image to be processed and the waist key point of the reference frame image, the distance from the waist key point of the object to be photographed to the ground can be estimated, and the preset range is further set.

For example, the human body key points and the position information of the human body key points can be obtained according to a human body posture recognition algorithm adopted by the preset neural network model, and the human body key points are connected together according to the connection relationship between the human body key points, so as to obtain the posture of the shot object in the image to be processed. For example, connecting the wrist key point, the elbow key point, and the shoulder key point yields the arm posture of the subject, and connecting the foot key point, the knee key point, and the crotch key point yields the leg posture, and the like.

In an embodiment, the pose similarity between the pose of the captured object and the reference pose in the pose reference template can be calculated according to an image similarity matching algorithm such as cosine distance (cosine similarity), Manhattan distance (Manhattan distance), euclidean distance and the like.

In another embodiment, the preset neural network model may be used to directly compare the human body key points in the image to be processed with the reference postures in the posture reference template, so as to obtain the posture similarity between the posture of the photographed object in the image to be processed and the reference posture in the posture reference template.

Therefore, according to the image shooting method provided by the embodiment of the disclosure, the image plane coordinates of the human key points on the image preview interface are accurately calculated, and when the image plane coordinates of the human key points on the image preview interface meet the second preset condition, the posture similarity between the posture of the shot object and the reference posture in the posture reference template is calculated, so that compared with manual shooting, the image of the shot object in a specific posture can be shot more accurately, an expected shooting effect is achieved, and user experience is facilitated to be improved.

As shown in fig. 6, based on the embodiment shown in fig. 2, the step S202 may include the following steps:

s6011: and determining the motion track of the shot object on the image preview interface according to the human key points.

It should be understood that, since the terminal device may obtain the image of the captured object at each motion moment in real time, and the images at different motion moments correspond to different frame images, that is, the terminal device may obtain the multi-frame images including the captured object in the image preview interface in real time, and may obtain the human key points of the captured object according to the preset neural network model, and may also obtain the position information of the human key points of the captured object at each motion moment, the motion trajectory of the captured object on the image preview interface may be obtained through the position information of the human key points in the multi-frame images including the captured object in the image preview interface. For example, the motion trajectory of the object is determined by comparing the position relationship of the specific human key points of the object in the multi-frame images, but the embodiment of the present disclosure is not limited thereto, and the motion trajectory of the object may also be determined by other methods.

In one embodiment, the motion behavior of the object can be determined based on the motion trajectory of the object, for example, taking the foot key point and the crotch key point as an example, when the motion trajectories of the foot key point and the crotch key point are determined to be moving upward away from the ground from the multi-frame image, the motion behavior of the object can be determined to be jumping, or when the motion trajectories of the foot key point and the crotch key point are moving horizontally, the motion behavior of the object can be determined to be traveling horizontally.

S6012: and when the motion trail meets a third preset condition, calculating the attitude similarity between the attitude of the shot object and the reference attitude in the attitude reference template.

It should be understood that the third preset condition may be set according to a specific application, for example, when the motion trajectory of the object is a preset motion trajectory, which indicates that the motion trajectory of the object satisfies the third preset condition, the gesture similarity between the gesture of the object and the reference gesture in the gesture reference template may be calculated, that is, the third preset condition refers to a condition that the motion trajectory of the object set according to the specific application is the preset motion trajectory. For example, the preset motion trajectory may be a jump, a movement of an arm or a leg in a specific direction, and the like, and the preset motion trajectory is not limited in the embodiment of the disclosure.

For example, the preset neural network model and the human posture recognition algorithm may be used to obtain the human key points and the position information of the human key points, and the human key points are connected according to the relationship between the human key points to obtain the posture of the object to be photographed in the image to be processed. For example, connecting the wrist key point, the elbow key point, and the shoulder key point yields the arm posture of the subject, and connecting the foot key point, the knee key point, and the crotch key point yields the leg posture, and the like.

In the embodiment of the present disclosure, the method for calculating the posture similarity in the embodiment corresponding to fig. 5 may be referred to, and the posture similarity between the posture of the photographed object and the reference posture in the posture reference template is determined, which is not described herein again.

Therefore, according to the image shooting method provided by the embodiment of the disclosure, the motion track of the shot object on the image preview interface is accurately determined according to the human key points, and when the motion track meets a third preset condition, the posture similarity between the posture of the shot object and the reference posture in the posture reference template is calculated, so that compared with manual shooting, an image of the shot object in a specific posture can be shot more accurately, an expected shooting effect is achieved, and user experience is improved.

As shown in fig. 7, based on the embodiment shown in fig. 2, the step S202 may include the following steps:

s7011: and determining the posture of the shot object according to the relative position relation among a plurality of key points in the key points of the human body.

It should be understood that, since the terminal device can acquire the image of the object at each motion moment in real time, the human key points of the object can be obtained according to the preset neural network model, and the position information of the human key points of the object at each motion moment can also be obtained. After the position information of the human key points is determined, a plurality of key points can be selected from the human key points, and the relative position relation among the key points is calculated according to the position information of the key points. For example, a leg key portion may be formed by selecting a foot key point, a knee key point, and a crotch key point from twenty-five human body key points in fig. 3, and determining a relative positional relationship between a plurality of key points of the leg key portion; or determining the relative positional relationship between the trunk and the legs of the subject from the crotch key points and the leg key points.

For example, when an angle between a line between a left foot key point and a left knee key point of the subject and a line between a left crotch key point and a left knee key point is within a preset angle range, and/or an angle between a line between a right foot key point and a right knee key point of the subject and a line between a right crotch key point and a right knee key point is within a preset angle range, the posture of the subject may be determined to be jumping. It should be understood that the preset included angle range is, for example, 30 degrees to 120 degrees, and a specific value thereof may be set according to a specific application, which is not limited in the embodiment of the present disclosure.

For example, the posture of the subject may be determined according to the magnitude of the relative distance between the crotch key point and the foot key point on one side or both sides of the body of the subject, for example, when the relative distance between the crotch key point and the foot key point on one side or both sides of the body of the subject is less than a certain distance threshold, the posture of the subject is determined to be jumping. It should be understood that the specific value of the specific distance threshold may be set according to a specific application, and the embodiment of the disclosure does not limit this.

For example, the preset neural network model and the human posture recognition algorithm may also be used to obtain the human key points and the position information of the human key points, and the human key points are connected according to the relationship between the human key points to obtain the posture of the object to be photographed in the image to be processed. For example, connecting the wrist key point, the elbow key point, and the shoulder key point yields the arm posture of the subject, and connecting the foot key point, the knee key point, and the crotch key point yields the leg posture, and the like.

S7012: when the posture of the shot object meets a fourth preset condition, determining the posture similarity between the posture of the shot object and the reference posture in the reference posture template.

It should be understood that the fourth preset condition may be set according to a specific application, for example, when the posture of the object is determined to be the preset posture, which means that the posture of the object satisfies the fourth preset condition, the posture similarity between the posture of the object and the reference posture in the posture reference template may be calculated, that is, the fourth preset condition refers to a condition that the posture of the object set according to the specific application is the preset posture. For example, the preset gesture may be a jumping gesture, a horizontal movement gesture, a head tilting gesture, a hand swinging gesture, and the like, and the specific form of the preset gesture is not limited in the embodiment of the disclosure.

Similarly, in the embodiment of the present disclosure, the method for calculating the pose similarity in the embodiment corresponding to fig. 5 may be referred to, and the pose similarity between the pose of the captured object and the reference pose in the pose reference template is determined, which is not described herein again.

Therefore, according to the image shooting method provided by the embodiment of the disclosure, the posture of the shot object is accurately determined according to the relative position relationship among the plurality of key points in the human body key points, and when the posture of the shot object meets the fourth preset condition, the posture similarity between the posture of the shot object and the reference posture in the reference posture template is determined, so that compared with manual shooting, an image of the shot object in a specific posture can be shot more accurately, an expected shooting effect is achieved, and user experience is improved.

Exemplary devices

The disclosed apparatus embodiments may be used to perform the disclosed method embodiments. For details not disclosed in the embodiments of the apparatus of the present disclosure, refer to the embodiments of the method of the present disclosure.

Fig. 8 is a block diagram of an image capturing apparatus according to an exemplary embodiment of the present disclosure. The apparatus 800 has the functions of implementing the above method embodiments in fig. 2 and 4 to 7, and the functions may be implemented by hardware, or by hardware executing corresponding software. The apparatus 800 may include: a key point confirmation module 810, a posture confirmation module 820, and a photographing module 830.

And the key point confirming module 810 is used for confirming human key points of the shot object in the image preview interface.

In an embodiment, after the terminal device starts a shooting function, a camera of the terminal device starts to enter a framing state, and a current frame image including a shot object is displayed through an image preview interface of the terminal device. The key point determination module 810 performs image recognition on the current frame image, and determines a human key point of a photographed object in the current frame image. The current frame image is used for representing the frame image which is displayed in the image preview interface and currently contains the shot object.

And the posture confirmation module 820 is used for determining the posture similarity between the posture of the shot object and the reference posture in the posture reference template according to the human body key point.

In one embodiment, the pose validation module 820 receives the pose reference template containing the reference pose selected by the user from the pre-established pose database; the posture of the shot object can be determined according to the position information of the key points of the human body of the shot object; then, according to an image similarity matching algorithm, the pose similarity between the pose of the subject in the image containing the subject and the reference pose in the pose reference template is calculated.

And a shooting module 830, configured to trigger a shooting instruction to generate an image including the shot object when the gesture similarity satisfies a first preset condition.

It should be noted that the first preset condition is not limited in the embodiment of the present disclosure, for example, when the gesture similarity is within a preset value range, the shooting module 830 may trigger a shooting instruction, so as to generate an image including a shot object; alternatively, when the gesture similarity is greater than or equal to the first similarity threshold, the photographing module 830 may trigger a photographing instruction, and further generate an image including the photographed object. The first similarity threshold and the preset value range are not particularly limited in the embodiments of the present disclosure, and those skilled in the art can set the first similarity threshold and the preset value range according to actual application situations.

In an optional embodiment, the keypoint confirming module 810 further includes an image obtaining unit 811 for obtaining the image to be processed in the image preview interface; and a key point extracting unit 812, configured to perform key point extraction on the image to be processed through a preset neural network model, so as to obtain a human body key point of the photographed object.

In an optional embodiment, the image obtaining unit 811 is configured to determine whether the photographed object exists in the image preview interface by using a preset algorithm, where the preset algorithm includes at least one of an optical flow algorithm, an inter-frame difference algorithm, and a background difference algorithm; and when the shot object exists in the current frame image of the image preview interface, acquiring the current frame image as the image to be processed.

In an optional embodiment, the pose confirming module 820 may further include an image plane coordinate calculating unit 821, configured to calculate image plane coordinates of the human key points on the image preview interface; and an attitude similarity calculation unit 824, configured to calculate an attitude similarity between the attitude of the captured object and a reference attitude in the attitude reference template when the image plane coordinates of the human body key points on the image preview interface satisfy a second preset condition.

In an optional embodiment, the gesture confirmation module 820 may further include a motion trajectory determination unit 822, configured to determine a motion trajectory of the captured object on the image preview interface according to the human key point; and an attitude similarity calculation unit 824 for calculating an attitude similarity between the attitude of the subject and a reference attitude in the attitude reference template when the motion trajectory satisfies a third preset condition.

In an optional embodiment, the gesture confirmation module 820 may further include a position relationship determination unit 823 for determining a position relationship between a plurality of key points in the human body key points; and an attitude similarity calculation unit 824 for determining an attitude similarity between the attitude of the subject and a reference attitude in the reference attitude template when the attitude of the subject satisfies a fourth preset condition.

In one embodiment, the human key points include human joint points of the subject. The image processing device provided by the embodiment of the disclosure can determine the posture of the photographed object according to the key points of the human body of the photographed object, and then determine the posture similarity between the posture of the photographed object and the reference posture in the posture reference template, when the posture similarity meets the preset condition, it indicates that the specific posture of the photographed object meets the expected shooting effect (for example, jumping posture), and at this time, the shooting operation can be executed, so that the image of the photographed object in the specific posture can be obtained, and the expected shooting effect can be achieved, which is helpful for improving the user experience.

Exemplary electronic device

Next, an electronic apparatus according to an embodiment of the present disclosure is described with reference to fig. 9. Fig. 9 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure. As shown in fig. 9, the electronic device 900 includes one or more processors 910 and memory 920.

The processor 910 may be a Central Processing Unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device 900 to perform desired functions.

Memory 920 may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM), cache memory (cache), and/or the like. The non-volatile memory may include, for example, Read Only Memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium and executed by processor 910 to implement the image capture methods of the various embodiments of the present disclosure described above and/or other desired functions. Various contents such as a pose reference template, a to-be-processed image, and a reference frame image may also be stored in the computer readable storage medium.

In one example, the electronic device 900 may further include: an input device 930 and an output device 940, which are interconnected by a bus system and/or other form of connection mechanism (not shown). The input device 930 includes, but is not limited to, a keyboard, a mouse, a camera, and the like.

Of course, for simplicity, only some of the components of the electronic device 900 relevant to the present disclosure are shown in fig. 9, omitting components such as buses, input/output interfaces, and the like. In addition, electronic device 900 may include any other suitable components depending on the particular application.

Exemplary computer program product and computer-readable storage Medium

In addition to the above-described methods and apparatus, embodiments of the present disclosure may also be a computer program product comprising computer program instructions that, when executed by a processor, cause the processor to perform the steps in the image capture method according to various embodiments of the present disclosure described in the "exemplary methods" section above of this specification.

The computer program product may write program code for carrying out operations for embodiments of the present disclosure in any combination of one or more programming languages, including an object oriented programming language such as Java, C + + or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.

Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium having stored thereon computer program instructions that, when executed by a processor, cause the processor to perform the steps in the image capturing method according to various embodiments of the present disclosure described in the "exemplary methods" section above in this specification.

The computer-readable storage medium may take any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may include, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The foregoing describes the general principles of the present disclosure in conjunction with specific embodiments, however, it is noted that the advantages, effects, etc. mentioned in the present disclosure are merely examples and are not limiting, and they should not be considered essential to the various embodiments of the present disclosure. Furthermore, the foregoing disclosure of specific details is for the purpose of illustration and description and is not intended to be limiting, since the disclosure is not intended to be limited to the specific details so described.

The block diagrams of devices, apparatuses, systems referred to in this disclosure are only given as illustrative examples and are not intended to require or imply that the connections, arrangements, configurations, etc. must be made in the manner shown in the block diagrams. These devices, apparatuses, devices, systems may be connected, arranged, configured in any manner, as will be appreciated by those skilled in the art. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including, but not limited to," and are used interchangeably therewith. The words "or" and "as used herein mean, and are used interchangeably with, the word" and/or, "unless the context clearly dictates otherwise. The word "such as" is used herein to mean, and is used interchangeably with, the phrase "such as but not limited to".

It is also noted that in the devices, apparatuses, and methods of the present disclosure, each component or step can be decomposed and/or recombined. These decompositions and/or recombinations are to be considered equivalents of the present disclosure.

The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit embodiments of the disclosure to the form disclosed herein. While a number of example aspects and embodiments have been discussed above, those of skill in the art will recognize certain variations, modifications, alterations, additions and sub-combinations thereof.

Claims

1. An image capturing method comprising:

determining human body key points of a shot object in an image preview interface;

determining the posture similarity between the posture of the shot object and a reference posture in a posture reference template according to the human body key points;

and when the posture similarity meets a first preset condition, triggering a photographing instruction to generate an image containing the photographed object.

2. The method of claim 1, wherein the determining of pose similarity between the pose of the subject and a reference pose in a pose reference template according to the human keypoints comprises:

calculating the image surface coordinates of the human body key points on the image preview interface;

and when the image plane coordinates of the human body key points on the image preview interface meet a second preset condition, calculating the posture similarity between the posture of the shot object and the reference posture in the posture reference template.

3. The method of claim 1, wherein the determining of pose similarity between the pose of the subject and a reference pose in a pose reference template according to the human keypoints comprises:

determining a motion track of the shot object on the image preview interface according to the human body key points;

and when the motion trail meets a third preset condition, calculating the attitude similarity between the attitude of the shot object and the reference attitude in the attitude reference template.

4. The method of claim 1, wherein the determining of pose similarity between the pose of the subject and a reference pose in a pose reference template according to the human keypoints comprises:

determining the posture of the shot object according to the relative position relation among a plurality of key points in the human body key points;

when the posture of the shot object meets a fourth preset condition, determining the posture similarity between the posture of the shot object and the reference posture in the reference posture template.

5. The method according to any one of claims 1-4, wherein the determining human key points of the photographed object in the image preview interface comprises:

acquiring an image to be processed in the image preview interface;

and extracting key points of the image to be processed through a preset neural network model to obtain human key points of the shot object.

6. The method of claim 5, wherein the acquiring the to-be-processed image in the image preview interface comprises:

judging whether the shot object exists in the image preview interface by adopting a preset algorithm, wherein the preset algorithm comprises at least one of an optical flow algorithm, an interframe difference algorithm and a background difference algorithm;

and when the shot object exists in the current frame image of the image preview interface, acquiring the current frame image as the image to be processed.

7. The method according to any one of claims 1-4, wherein the human keypoints comprise human joint points of the photographic subject.

8. An image capturing apparatus comprising:

the key point confirming module is used for confirming human key points of the shot object in the image preview interface;

the gesture confirming module is used for confirming gesture similarity between the gesture of the shot object and a reference gesture in a gesture reference template according to the human body key point;

and the shooting module is used for triggering a shooting instruction to generate an image containing the shot object when the posture similarity meets a first preset condition.

9. A computer-readable storage medium storing a computer program for executing the image capturing method according to any one of claims 1 to 7.

10. An electronic device, the electronic device comprising:

a processor;

a memory for storing the processor-executable instructions;

the processor for performing the image capturing method of any one of claims 1 to 7.