WO2020248396A1 - 图像拍摄方法、装置、设备以及存储介质 - Google Patents

图像拍摄方法、装置、设备以及存储介质 Download PDF

Info

Publication number
WO2020248396A1
WO2020248396A1 (PCT/CN2019/103656)
Authority
WO
WIPO (PCT)
Prior art keywords
bounding box
image
pixel
reference position
offset
Prior art date
Application number
PCT/CN2019/103656
Other languages
English (en)
French (fr)
Inventor
张明
董健
Original Assignee
睿魔智能科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 睿魔智能科技(深圳)有限公司 filed Critical 睿魔智能科技(深圳)有限公司
Priority to US17/606,075 priority Critical patent/US11736800B2/en
Publication of WO2020248396A1 publication Critical patent/WO2020248396A1/zh

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/61Control of cameras or camera modules based on recognised objects
    • H04N23/611Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/695Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/61Control of cameras or camera modules based on recognised objects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/64Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/69Control of means for changing angle of the field of view, e.g. optical zoom objectives or electronic zooming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20021Dividing image into blocks, subimages or windows
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person

Definitions

  • This application relates to the field of computer software applications, and for example to an image capturing method, apparatus, device, and storage medium.
  • To allow people with different levels of photography skill to take high-quality photos, cameras are equipped with a smart shooting mode.
  • In the related art, the smart shooting mode only detects the environmental parameters of the current shot and adjusts those parameters automatically to help non-professionals take professional-looking photos.
  • Such automatically adjusted parameters are usually limited to aperture, shutter speed, and the like, so the degree of intelligence is low. On this basis, techniques for automatically tracking a target during shooting were developed.
  • In many application scenarios, a bounding box is used to locate the target, and the movement of the lens is then controlled by a "center control" method to realize automatic follow shooting.
  • For portrait shooting, however, this method has many limitations. Portrait shooting is more complicated: under different postures, the result achieved by the traditional bounding-box "center control" method differs greatly from what people actually expect. The traditional "center control" method is only suitable for the special case in which the target occupies a very small part of the frame.
  • the present application provides an image shooting method, device, equipment, and storage medium, which can automatically control the rotation of the camera based on the pixel-level visual characteristics of the image to improve the shooting effect.
  • This application provides an image shooting method, which includes:
  • acquiring a bounding box of a lens tracking target in an image to be captured;
  • predicting a first reference position of the image to be captured by using a pre-trained reference model; and
  • determining a lens movement offset according to the position of each pixel in the bounding box and the first reference position.
  • the present application provides an image capturing device, which includes:
  • the bounding box acquisition module is set to acquire the bounding box of the lens tracking target in the image to be shot;
  • a reference position prediction module configured to use a pre-trained reference model to predict the first reference position of the image to be taken
  • the lens shift determining module is configured to determine the lens shift shift amount according to the position of each pixel in the bounding box and the first reference position.
  • The present application provides an image capturing device, which includes a memory and a processor, the memory storing a computer program that can be run on the processor, and the processor implementing the aforementioned image capturing method when executing the computer program.
  • the present application provides a computer-readable storage medium, the storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed, implement the aforementioned image capturing method.
  • FIG. 1 is a flowchart of an image shooting method provided by Embodiment 1 of the present application.
  • FIG. 2 is a sub-flow chart of an image shooting method provided in Embodiment 1 of the present application;
  • FIG. 3 is a flowchart of another image shooting method provided by Embodiment 2 of the present application.
  • FIG. 4 is a training flowchart of a reference model provided in Embodiment 2 of the present application.
  • FIG. 5 is a sub-flow chart of training a reference model provided in the second embodiment of the present application.
  • FIG. 6 is a schematic diagram of the structure of an image capturing device according to Embodiment 3 of the present application.
  • FIG. 7 is a schematic diagram of the structure of a training sub-module of an image shooting device according to Embodiment 3 of the present application.
  • FIG. 8 is a schematic structural diagram of a position acquiring unit of an image shooting device according to Embodiment 3 of this application.
  • FIG. 9 is a schematic structural diagram of a lens shift determining module of an image shooting device according to Embodiment 3 of the application.
  • FIG. 10 is a schematic structural diagram of an image capturing device provided in Embodiment 4 of the present application.
  • first”, “second”, etc. may be used herein to describe various directions, actions, steps or elements, etc., but these directions, actions, steps or elements are not limited by these terms. These terms are only used to distinguish one direction, action, step or element from another direction, action, step or element.
  • first speed difference may be referred to as the second speed difference
  • second speed difference may be referred to as the first speed difference.
  • the first speed difference and the second speed difference are both speed differences, but the first speed difference and the second speed difference are not the same speed difference.
  • the terms “first”, “second”, etc. cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features.
  • first and second may explicitly or implicitly include one or more of these features.
  • a plurality of means at least two, such as two, three, etc., unless specifically defined otherwise.
  • When one part is said to be "fixed" to another part, it can be directly on the other part, or an intervening part may be present.
  • When a part is considered to be "connected" to another part, it may be directly connected to the other part, or an intervening part may be present at the same time.
  • the terms “vertical”, “horizontal”, “left”, “right” and similar expressions used herein are for illustrative purposes only and do not mean that they are the only implementation.
  • Some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the steps as sequential processing, many of the steps herein can be implemented in parallel, concurrently, or simultaneously. In addition, the order of the steps can be rearranged. Processing may be terminated when its operations are completed, but there may also be additional steps not included in the drawings. Processing can correspond to methods, functions, procedures, subroutines, subprograms, and so on.
  • this embodiment provides an image shooting method, which includes the following steps.
  • The lens tracking target mentioned here refers to the main shooting target that needs to be kept in the frame at all times, such as a person, a pet, or other photographic subject.
  • In this embodiment, a bounding box is used to determine the position of the lens tracking target; the bounding box refers to the region of the image to be captured in which the lens tracking target appears.
  • In one embodiment, the bounding box is a rectangular outer frame elongated in the longitudinal or transverse direction. The size and position of the bounding box in this embodiment depend on the size of the lens tracking target in the image collected by the lens. In one embodiment, the bounding box may be determined by a visual tracking method from the related art.
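For illustration only, the sketch below shows one way step S110 could be realized with an off-the-shelf visual tracker; the choice of OpenCV's CSRT tracker (from opencv-contrib-python) and the function names are assumptions, since this application does not prescribe a particular tracking method.

```python
# Sketch only: one possible "visual tracking method from the related art" for step S110.
# Assumes opencv-contrib-python is installed; the CSRT tracker is an arbitrary choice.
import cv2

def init_tracker(first_frame, init_box):
    """init_box = (x, y, w, h) in pixels, e.g. from a person detector or a user selection."""
    tracker = cv2.TrackerCSRT_create()
    tracker.init(first_frame, init_box)
    return tracker

def get_bounding_box(tracker, frame):
    """Return the bounding box of the lens tracking target in the current frame, or None if lost."""
    ok, box = tracker.update(frame)
    return box if ok else None
```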
  • S120 Predict the first reference position of the image to be shot by using the pre-trained reference model.
  • In the related art, the "center control" method is usually used to place the target at the exact center of the image, but this method does not take into account the influence of the tracked target's posture on the composition. For example, when shooting a standing portrait, the "center control" method places the center of the standing figure at the center of the image, whereas placing the upper body of the human body closer to the image center gives a better composition. Therefore, this embodiment uses a pre-trained reference model to predict the first reference position of the image to be captured.
  • The reference model is trained based on a deep convolutional neural network (CNN).
  • The first reference position is the predicted optimal composition position of the lens tracking target in the image.
  • The optimal composition position is obtained by statistically analyzing a large number of photographer-taken images containing the lens tracking target, i.e., where photographers place such targets within the images they shoot.
  • the best composition position is determined by the reference model according to the information of the lens tracking target in the image.
  • the information of the lens tracking target includes one or more of the size and position of the bounding box of the lens tracking target and the posture of the lens tracking target.
  • S130 Determine a lens movement offset according to each pixel position in the bounding box and the first reference position.
  • Once the first reference position is determined, the predicted composition position of the bounding box is determined, and the movement offset required by the lens can be calculated by combining it with the initial position of the bounding box.
  • The traditional bounding-box "center control" method uses only the center point of the bounding box: it calculates the movement offset required to move that center point to the center of the frame.
  • This works reasonably well when the bounding box is small enough, but in actual shooting the size of the bounding box is uncertain, and for the sake of composition the proportion of the image occupied by the lens tracking target, i.e., by the bounding box, cannot be too small. Therefore, to obtain a more accurate lens offset result, this embodiment starts from the first reference position predicted by the reference model and, based on the pixel-level visual features of the image, uses the position of every pixel in the bounding box to calculate the lens movement offset.
  • step S130 includes step S1310-step S1320.
  • S1310 Calculate the position offset of each pixel in the bounding box according to the first reference position.
  • (x,y) are the normalized coordinates of the pixel, x represents the horizontal coordinate, and y represents the vertical coordinate.
  • XT is the horizontal coordinate image of the reference position
  • YT is the vertical coordinate image of the reference position, which is predicted by the reference model.
  • DX is the horizontal offset image
  • DY is the vertical offset image, which is calculated by subsequent methods.
  • In one embodiment, according to the first reference position, the position offset of each pixel in the bounding box is calculated using the formulas DX(x,y) = XT(x,y) - x and DY(x,y) = YT(x,y) - y.
  • In the above formulas, DX(x,y) is the horizontal offset of each pixel in the bounding box; XT(x,y) is the horizontal position of each pixel in the bounding box when the bounding box is located at the first reference position, that is, the horizontal coordinate of each pixel in the bounding box in the image predicted by the reference model; DY(x,y) is the vertical offset of each pixel in the bounding box; YT(x,y) is the vertical position of each pixel in the bounding box when the bounding box is located at the first reference position, that is, the vertical coordinate of each pixel in the bounding box in the image predicted by the reference model; x is the horizontal position of each pixel in the bounding box, i.e., the horizontal coordinate of its initial position; and y is the vertical position of each pixel in the bounding box, i.e., the vertical coordinate of its initial position.
  • In this embodiment, the coordinate difference between the position each pixel in the bounding box occupies when the bounding box is located at the first reference position and the initial position of that pixel can thus be calculated, representing the position offset of each pixel in the bounding box between the image predicted by the reference model and the image captured before the lens is moved.
  • S1320 Calculate the lens movement offset according to the position offset of each pixel in the bounding box.
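As an illustration of steps S1310-S1320, the minimal numpy sketch below computes the per-pixel offsets DX and DY and then the lens movement offset. It assumes, consistent with the device embodiment described later, that the lens movement offset is the sum of the per-pixel offsets over the bounding box divided by the number of pixels it contains, and that XT and YT are the coordinate images predicted by the reference model.

```python
import numpy as np

def lens_offset(xt, yt, bbox_mask):
    """
    xt, yt:    reference-position coordinate images predicted by the reference model
               (normalized horizontal / vertical coordinate of each pixel), shape (H, W).
    bbox_mask: boolean array of shape (H, W); True for pixels inside the bounding box Θ.
    Returns the lens movement offset d = (dx, dy).
    """
    h, w = xt.shape
    # Normalized coordinates (x, y) of every pixel at its initial position.
    rows, cols = np.mgrid[0:h, 0:w]
    x = cols / w
    y = rows / h

    # S1310: per-pixel position offsets DX(x, y) = XT(x, y) - x and DY(x, y) = YT(x, y) - y.
    dx_img = xt - x
    dy_img = yt - y

    # S1320: lens movement offset, taken here as the sum of the offsets over the bounding
    # box divided by the number of pixels in Θ (i.e. the mean offset).
    n = bbox_mask.sum()
    d_x = dx_img[bbox_mask].sum() / n
    d_y = dy_img[bbox_mask].sum() / n
    return d_x, d_y
```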
  • an image shooting method is provided.
  • In this embodiment, a reference model trained with a deep convolutional neural network is used to predict, for the image to be captured, a first reference position with a better composition effect; based on the pixel-level visual features of the image and the first reference position, the position offset of each pixel is calculated to obtain the lens movement offset.
  • The image capturing method provided by this application determines the position of the lens tracking target in the image to be captured through the bounding box, predicts the first reference position of the image using a reference model trained on a convolutional neural network that can imitate a photographer's composition choices, and uses a pixel-level calculation to obtain the lens movement offset required to bring the tracking target to the first reference position.
  • By automatically controlling the rotation of the camera based on the pixel-level visual features of the image, the method can automatically adapt to changes in the target's posture and in the camera's shooting angle, improving the shooting effect and helping to improve the user experience.
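Putting the steps of this embodiment together, a hypothetical capture loop might look like the sketch below. It reuses get_bounding_box and lens_offset from the earlier sketches; reference_model.predict, box_to_mask, camera.read, and camera.move_lens are placeholder names for the reference model inference, bounding box rasterization, frame capture, and pan/tilt control, none of which are specified by this application.

```python
def follow_shot_loop(camera, tracker, reference_model):
    """Hypothetical control loop: track the target, predict the reference position,
    compute the lens movement offset, and command the lens accordingly."""
    while True:
        frame = camera.read()                       # image to be captured
        box = get_bounding_box(tracker, frame)      # S110: bounding box of the tracking target
        if box is None:
            continue
        xt, yt = reference_model.predict(frame)     # S120: first reference position images
        mask = box_to_mask(box, frame.shape[:2])    # pixels inside the bounding box Θ
        d_x, d_y = lens_offset(xt, yt, mask)        # S130: lens movement offset
        camera.move_lens(d_x, d_y)                  # rotate the camera by the computed offset
```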
  • FIG. 3 is a schematic flow chart of another image shooting method provided in Embodiment 2 of the present application. This embodiment is implemented on the basis of Embodiment 1. As shown in FIG. 3, the following steps are further included before step S110.
  • Step S100 Obtain a pre-trained reference model based on deep convolutional neural network training.
  • step S100 obtaining a pre-trained reference model based on deep convolutional neural network training (ie, the training process of the reference model) includes step S310-step S360.
  • S310: Obtain training images and corresponding label data from a preset image data set. In this embodiment, multiple training images are preset in the image data set, and the type of training image can be selected according to the shooting target.
  • portrait shooting is taken as an example, and all the training images collected in the image data set include portraits.
  • These training images can cover many types of main scenes such as indoors, beaches and mountains, and various postures such as running, sitting, lying down and dancing.
  • Each training image in the image data set has corresponding label data.
  • the label data in this embodiment includes the bounding box information and key point information of the tracking target in the training image.
  • the bounding box information includes the position of the bounding box and the size of the bounding box.
  • In this embodiment, 17 joint points of the human body are selected as key points by way of example, and the coordinate information of each joint point is annotated as the key point information.
  • Each joint point is annotated as (xi, yi, si), where i is a natural number from 1 to 17 indicating the i-th key point, xi is the horizontal coordinate of the i-th key point, and yi is the vertical coordinate of the i-th key point.
  • si = 0 indicates that the key point is absent (the corresponding xi and yi are both 0), and si = 1 indicates that the key point is present.
  • For i = 1 to 17, the key points are respectively: 1 - top of head, 2 - left eye, 3 - right eye, 4 - nose, 5 - throat, 6 - left shoulder, 7 - left elbow, 8 - left wrist, 9 - right shoulder, 10 - right elbow, 11 - right wrist, 12 - left hip, 13 - left knee, 14 - left ankle, 15 - right hip, 16 - right knee, 17 - right ankle.
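For clarity, the label data of one training image could be represented as in the illustrative snippet below; the field names are hypothetical, and only the content (one bounding box plus 17 key points of the form (xi, yi, si)) follows the description above.

```python
# Illustrative label data for one training image (field names are hypothetical).
label = {
    "bounding_box": {"x": 0.31, "y": 0.12, "width": 0.22, "height": 0.74},  # position and size
    "keypoints": [
        # (xi, yi, si): si = 1 means the key point is present, si = 0 means absent (xi = yi = 0)
        (0.42, 0.15, 1),   # 1: top of head
        (0.40, 0.18, 1),   # 2: left eye
        # ... entries 3 to 16 omitted for brevity ...
        (0.47, 0.83, 1),   # 17: right ankle
    ],
}
```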
  • S320 Obtain the reference position of the center point of the bounding box according to the bounding box information and key point information of the tracking target.
  • The traditional "center control" method moves the center point of the target's bounding box to the center of the image to complete the composition.
  • The calculation involved is simple and does not take into account the influence of the target's posture on the composition, so the shooting result differs considerably from what is actually expected.
  • In the shooting method provided in this embodiment, the differing composition requirements of the tracking target in different poses are fully considered when training the reference model: the different poses of the tracking target can be distinguished from the key point information annotated in step S310, the reference position of the bounding box center point is calculated from the bounding box information and key point information of the tracking target, and the photographer's composition control ability can thus be closely imitated, yielding a better composition.
  • step S320 includes step S3210-step S3230:
  • S3210: Generate a grid table based on the training image and divide the training image into W*H grids, where W and H are natural numbers greater than 1. Each grid provides one candidate position when the composition position of the bounding box is subsequently calculated, and the values of W and H can be adjusted according to the required precision.
  • S3220: Obtain the second loss value when the center of the bounding box is placed at the center of each of the different grids. In this calculation, the horizontal coordinate range and vertical coordinate range of the image are both [0, 1], and a set of reference points and a set of reference lines are defined; their settings can be adjusted according to different composition requirements. In this embodiment, the region delimited by the horizontal coordinate range and the vertical coordinate range defined by these reference points and reference lines is set as the optimal composition area for the tracking target.
  • Based on the key point information of the tracking target, a key point set P = {p_i}, i = 1, 2, ..., 17, and a corresponding weight set W_p = {w_pi} are defined. Key line segments are further defined from the key points to supplement the posture information of the tracking target (for example: nose to the midpoint of the left and right hips, shoulder to elbow, elbow to wrist, hip to knee, and knee to ankle on each side), giving a key line segment set L = {l_j}, j = 1, 2, ..., 9, and a corresponding weight set W_l = {w_lj}. When the posture of the tracking target changes, the positions of its key points change, and the lengths and positions of the key line segments change accordingly.
  • In the formula for the distance between a key point and a reference point, p_i and p_j represent two different points, x_pi and y_pi represent the horizontal and vertical coordinates of the point p_i, and x_pj and y_pj represent the horizontal and vertical coordinates of the point p_j. In the formula for the distance between a key line and a reference line, (x_c, y_c) is the midpoint of the line segment l, x = a denotes a vertical line, and y = a denotes a horizontal line.
  • The bounding box center is placed at the center (x, y) of each grid in turn, and the second loss value D_xy = D_p + D_l is calculated, where D_p is a loss term computed from the distances between the key points and the reference points (weighted by W_p) and D_l is a loss term computed from the distances between the key line segments and the reference lines (weighted by W_l). Here P_xy = P → (x, y) denotes normalization of the key points and L_xy = L → (x, y) denotes normalization of the key line segments; in one embodiment, P_xy = (x/W, y/H), and L_xy is the line segment between the two normalized points.
  • S3230: Select the center position of the grid with the smallest second loss value as the reference position of the bounding box center point. The second loss value reflects the degree to which the tracking target conforms to the customized optimal composition area when the bounding box is placed at different positions; the smaller the second loss value, the closer the placement is to the customized optimal composition area.
  • The customized grid, reference points, and reference lines can be adjusted according to different requirements on image accuracy, and the key points of the tracking target as well as the relationship between the key line segments and the key points can also be customized. For example, when higher accuracy is required, W and H can be increased, i.e., the number of grids into which the image is divided is increased. A sketch of this grid search is given below.
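The sketch below illustrates the placement-and-select-minimum structure of steps S3210-S3230. The second-loss function itself is left abstract (passed in as second_loss), since the reference points, reference lines, key line segments, and weights are all configurable as described above.

```python
import numpy as np

def reference_center(keypoints, second_loss, W=16, H=16):
    """
    S3210-S3230 (sketch): divide the normalized image into W*H grids, place the bounding
    box center at each grid center, evaluate the second loss D_xy there, and return the
    grid center with the smallest loss as the reference position of the bounding box center.

    keypoints:   annotated key point information P of the tracking target.
    second_loss: callable (keypoints, x, y) -> D_xy = D_p + D_l for that placement.
    """
    best, best_loss = None, np.inf
    for i in range(W):
        for j in range(H):
            # Center of grid (i, j) in normalized image coordinates.
            x, y = (i + 0.5) / W, (j + 0.5) / H
            loss = second_loss(keypoints, x, y)
            if loss < best_loss:
                best, best_loss = (x, y), loss
    return best  # (x_t, y_t)
```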
  • S330: Obtain the reference position image corresponding to the training image based on the reference position of the bounding box center point. When there are multiple target figures in the training image, the reference position image is obtained from the reference position of each tracking target's bounding box center, the initial position of each tracking target's bounding box center, and the number of tracking targets. The set of reference positions of all bounding box centers is denoted Θ = {O(P_i)} = {(x_ti, y_ti)}, and the set of initial bounding box center positions is denoted Δ = {(x_ci, y_ci)}.
  • In formula (1), which gives the reference position of each pixel of the training image, (x, y) are the normalized coordinates of the pixel; Σ_{Θ,Δ} 1 is the number of tracking targets in the training image; X_TG(x, y) and Y_TG(x, y) are the horizontal and vertical coordinates of the reference position of each pixel; x_ti and x_ci are, respectively, the horizontal coordinates of the reference position and of the initial position of the bounding box center of each tracking target; and y_ti and y_ci are the corresponding vertical coordinates. Once the reference position coordinates of every pixel are determined, the reference position image of the training image is obtained.
  • Compared with the image obtained by the traditional "center control" method, the reference position image more fully takes into account the composition requirements of different target poses, and its composition is more precise and reasonable.
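Because formula (1) itself is not reproduced here, the sketch below rests on an explicit assumption: each pixel of the reference position image is obtained by shifting the pixel's own coordinates by the displacement of the tracking targets' bounding box centers (reference position minus initial position), averaged over the number of tracking targets. This is one reading consistent with the variable definitions above, not necessarily the exact formula of the application.

```python
import numpy as np

def reference_position_image(shape, ref_centers, init_centers):
    """
    shape:        (H, W) of the training image.
    ref_centers:  [(x_ti, y_ti), ...] reference positions of each target's bounding box center.
    init_centers: [(x_ci, y_ci), ...] initial positions of each target's bounding box center.
    Returns (X_TG, Y_TG) under the assumption stated in the lead-in above.
    """
    h, w = shape
    rows, cols = np.mgrid[0:h, 0:w]
    x, y = cols / w, rows / h
    n = len(ref_centers)  # number of tracking targets
    shift_x = sum(xt - xc for (xt, _), (xc, _) in zip(ref_centers, init_centers)) / n
    shift_y = sum(yt - yc for (_, yt), (_, yc) in zip(ref_centers, init_centers)) / n
    return x + shift_x, y + shift_y
```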
  • S340: Predict the second reference position of the training image with the deep convolutional neural network to obtain a prediction result image. The initial model of the deep convolutional neural network is used to predict the training image, giving the second reference position of the tracking target in the image; a prediction result image is thereby obtained, in which the horizontal and vertical coordinates of each pixel are X_T(x, y) and Y_T(x, y), respectively.
  • S350 Calculate a first loss value according to the reference position image and the prediction result image, and adjust the parameters of the deep convolutional neural network according to the first loss value.
  • The first loss value adopts the Euclidean distance loss. According to the aforementioned reference position image and prediction result image, it is calculated by formula (2): L = ∑_{x,y} (X_TG(x,y) - X_T(x,y))² + ∑_{x,y} (Y_TG(x,y) - Y_T(x,y))².
  • In formula (2), X_TG(x,y) and Y_TG(x,y) are obtained from formula (1), and X_T(x,y) and Y_T(x,y) are obtained from the prediction result image.
  • The reference position image is the image that is expected to achieve the desired composition effect. The first loss value represents the deviation between the prediction result image and the reference position image; based on the first loss value, back-propagation is performed on the deep convolutional neural network to adjust its parameters so that the prediction result image becomes closer to the reference position image.
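A minimal training-step sketch for S340-S350 follows, using formula (2) as the loss. The use of PyTorch and a model that returns the two coordinate images directly are assumptions; the application does not fix a framework or a network architecture.

```python
import torch

def train_step(model, optimizer, image, x_tg, y_tg):
    """
    One S340-S350 iteration (sketch). model(image) is assumed to return the predicted
    coordinate images (X_T, Y_T); x_tg and y_tg are the reference position images from S330.
    """
    x_t, y_t = model(image)                        # S340: prediction result image
    # S350: Euclidean distance loss, formula (2):
    # L = sum_xy (X_TG - X_T)^2 + sum_xy (Y_TG - Y_T)^2
    loss = ((x_tg - x_t) ** 2).sum() + ((y_tg - y_t) ** 2).sum()
    optimizer.zero_grad()
    loss.backward()                                # back-propagate to adjust the CNN parameters
    optimizer.step()
    return loss.item()                             # the first loss value
```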
  • S360: Steps S310-S350 are performed sequentially on the multiple training images in the image data set until the first loss value in step S350 no longer decreases; the training of the deep convolutional neural network is then ended, and the pre-trained reference model is obtained.
  • Adjusting the parameters of the deep convolutional neural network according to the first loss value yields different first loss values. As long as the first loss value keeps decreasing, the prediction result image is getting closer and closer to the reference position image and the network continues to be adjusted; when the first loss value finally no longer decreases, the prediction result image can be considered closest to the reference position image, and the desired deep convolutional neural network model is obtained for use as the trained reference model.
  • Since the first loss values obtained from different training images may differ somewhat, it cannot be guaranteed that the first loss value computed for every training image reaches its minimum at the same time; "no longer decreases" here means that the first loss value has stabilized and meets the expected requirement. For example, if the expected requirement is defined as the first loss value being lower than k, then when at least m consecutive first loss values obtained over multiple rounds of training with multiple training images remain below k, the first loss value can be deemed to no longer decrease.
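The stopping criterion just described (at least m consecutive first loss values below the expected threshold k) can be expressed as in the small sketch below; m and k are the user-defined quantities mentioned in the text.

```python
def should_stop(loss_history, k, m):
    """Return True once the last m first-loss values are all below the expected threshold k,
    i.e. the first loss value is considered to no longer decrease."""
    return len(loss_history) >= m and all(v < k for v in loss_history[-m:])
```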
  • This embodiment provides the training process of the pre-trained reference model used in Embodiment 1 and, based on the key point information of the tracking target, provides a more reasonable composition method, so that the resulting reference image achieves a better composition effect.
  • By back-propagating through the deep convolutional neural network the first loss value calculated from the reference image and the network's prediction, a trained reference model is obtained that can adapt to different postures of the target and predict a more reasonably composed prediction image.
  • As shown in FIG. 6, this embodiment provides an image capturing apparatus 500, which includes: a bounding box acquisition module 510, configured to acquire the bounding box of a lens tracking target in the image to be captured; a reference position prediction module 520, configured to predict the first reference position of the image to be captured by using the pre-trained reference model; and a lens offset determination module 530, configured to determine the lens movement offset according to each pixel position in the bounding box and the first reference position.
  • In this embodiment, the bounding box acquisition module 510 is configured to acquire, according to the number of lens tracking targets in the image to be captured, a corresponding bounding box for each lens tracking target.
  • the reference position prediction module 520 further includes a model training sub-module 521, which is configured to obtain a trained reference model based on deep convolutional neural network training.
  • As shown in FIG. 7, the model training sub-module 521 includes: a data set unit 5210, configured to obtain training images and corresponding label data from a preset image data set, the label data including bounding box information and key point information of the tracking target in the training images; a position acquisition unit 5211, configured to obtain the reference position of the bounding box center point according to the bounding box information and key point information of the tracking target; an image acquisition unit 5212, configured to obtain the reference position image corresponding to the training image based on the reference position of the bounding box center point (in one embodiment, according to the reference position of each tracking target's bounding box center, the initial position of each tracking target's bounding box center, and the number of tracking targets); a prediction result image acquisition unit 5213, configured to predict the second reference position of the training image with the deep convolutional neural network to obtain a prediction result image; a loss value processing unit 5214, configured to calculate the first loss value from the reference position image and the prediction result image and to adjust the parameters of the deep convolutional neural network according to the first loss value; and a model acquisition unit 5215, configured to end the training of the deep convolutional neural network when the first loss value no longer decreases, obtaining the trained reference model.
  • In one embodiment, the first loss value is obtained using the formula L = ∑_{x,y} (X_TG(x,y) - X_T(x,y))² + ∑_{x,y} (Y_TG(x,y) - Y_T(x,y))².
  • In the above formula, X_TG(x,y) is the horizontal position of each pixel in the bounding box calculated according to the reference position of the bounding box center point; X_T(x,y) is the horizontal position of each pixel in the bounding box predicted by the deep convolutional neural network; Y_TG(x,y) is the vertical position of each pixel in the bounding box calculated according to the reference position of the bounding box center point; and Y_T(x,y) is the vertical position of each pixel in the bounding box predicted by the deep convolutional neural network.
  • In one embodiment, as shown in FIG. 8, the position acquisition unit 5211 includes: a grid division subunit 52120, configured to generate a grid table based on the training image and divide the training image into W*H grids, where W and H are natural numbers greater than 1; a second loss value processing subunit 52121, configured to obtain the second loss value when the center of the bounding box is placed at different grid centers; and a reference position obtaining subunit 52122, configured to select the center position of the grid with the smallest second loss value as the reference position of the bounding box center point.
  • the lens offset determination module 530 includes: a pixel position offset obtaining sub-module 5300, configured to calculate the position offset of each pixel in the bounding box according to the first reference position ;
  • the lens movement offset acquisition sub-module 5301 is set to calculate the lens movement offset according to the position offset of each pixel in the bounding box.
  • In one embodiment, the pixel position offset obtaining sub-module 5300 is configured to calculate, according to the first reference position, the position offset of each pixel in the bounding box using the formulas DX(x,y) = XT(x,y) - x and DY(x,y) = YT(x,y) - y;
  • where DX(x,y) is the horizontal offset of each pixel in the bounding box, XT(x,y) is the horizontal position of each pixel in the bounding box when the bounding box is located at the first reference position, x is the horizontal position of each pixel in the bounding box, DY(x,y) is the vertical offset of each pixel in the bounding box, YT(x,y) is the vertical position of each pixel in the bounding box when the bounding box is located at the first reference position, and y is the vertical position of each pixel in the bounding box.
  • In one embodiment, the lens movement offset obtaining sub-module 5301 is configured to calculate the lens movement offset d from the position offset of each pixel in the bounding box, using the formulas dx = ∑_{(x,y)∈Θ} DX(x,y) / ∑_{(x,y)∈Θ} 1 and dy = ∑_{(x,y)∈Θ} DY(x,y) / ∑_{(x,y)∈Θ} 1;
  • where d = (dx, dy), dx is the horizontal movement offset of the lens, dy is the vertical movement offset of the lens, (x, y) ∈ Θ indicates that the pixel (x, y) lies within the bounding box Θ, and ∑_{(x,y)∈Θ} 1 is the number of pixels contained in the bounding box Θ.
  • This embodiment provides an image shooting device that can automatically adapt to changes in the posture of a target and adapt to changes in the shooting angle of the camera to perform shooting, thereby improving the shooting effect and improving the user experience.
  • An image capturing device provided in an embodiment of the present application can execute an image capturing method provided in the foregoing embodiment of the present application, and has functional modules and beneficial effects corresponding to the execution method.
  • FIG. 10 is a schematic structural diagram of an image capturing device 600 provided in Embodiment 4 of the application.
  • the image capturing device includes a memory 610 and a processor 620.
  • In this embodiment, the number of processors 620 in the image capturing device may be one or more; one processor 620 is taken as an example in FIG. 10. The memory 610 and the processor 620 in the image capturing device may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 10.
  • As a computer-readable storage medium, the memory 610 can be configured to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the image capturing method in the embodiments of the present application (for example, the bounding box acquisition module 510, the reference position prediction module 520, and the lens offset determination module 530 in the image capturing apparatus).
  • the processor 620 executes various functional applications and data processing of the image capturing device by running the software programs, instructions, and modules stored in the memory 610, thus realizing the foregoing image capturing method.
  • the processor 620 is configured to run a computer executable program stored in the memory 610 to achieve the following: step S110, obtain the bounding box of the lens tracking target in the image to be shot; step S120, use the pre-trained The reference model predicts the first reference position of the image to be shot; step S130, determining the lens shift offset according to each pixel position in the bounding box and the first reference position.
  • An image capturing device provided by an embodiment of the present application is not limited to the method operations described above, and may also perform related operations in the image capturing method provided by any embodiment of the present application.
  • the memory 610 may mainly include a program storage area and a data storage area.
  • the storage program area may store an operating system and an application program required by at least one function; the storage data area may store data created according to the use of the terminal, and the like.
  • the memory 610 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
  • the memory 610 may include a memory remotely provided with respect to the processor 620, and these remote memories may be connected to the image capturing device through a network. Examples of the aforementioned networks include but are not limited to the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • This embodiment provides an image shooting device, which can automatically adapt to changes in the posture of a target and adapt to changes in the shooting angle of the camera for shooting, improving the shooting effect, and helping to improve the user experience.
  • The fifth embodiment of the present application also provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to execute an image capturing method. The image capturing method includes: acquiring a bounding box of a lens tracking target in an image to be captured; predicting a first reference position of the image to be captured using a pre-trained reference model; and determining a lens movement offset according to each pixel position in the bounding box and the first reference position.
  • An embodiment of the application provides a storage medium containing computer-executable instructions.
  • The computer-executable instructions are not limited to the method operations described above, and can also perform related operations in the image capturing method provided by any embodiment of the present application.
  • this application can be implemented by software and general hardware, or can be implemented by hardware.
  • The technical solution of this application can be embodied in the form of a software product; the computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disk, and includes multiple instructions for causing a computer device (which may be a personal computer, an image capturing device, a network device, etc.) to execute the method described in any embodiment of the present application.
  • In the embodiments of the image capturing apparatus described above, the units and modules included are only divided according to functional logic, but the division is not limited to the above as long as the corresponding functions can be realized; in addition, the specific names of the functional units are only for ease of distinguishing them from one another and are not intended to limit the protection scope of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Studio Devices (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed herein are an image capturing method, apparatus and device, and a storage medium. The method includes: acquiring a bounding box of a lens tracking target in an image to be captured; predicting a first reference position of the image to be captured by using a pre-trained reference model; and determining a lens movement offset according to the position of each pixel in the bounding box and the first reference position.

Description

图像拍摄方法、装置、设备以及存储介质
本申请要求在2019年06月12日提交中国专利局、申请号为201910506435.6的中国专利申请的优先权,该申请的全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机软件应用领域,例如涉及一种图像拍摄方法、装置、设备以及存储介质。
背景技术
随着人们生活水平的提高以及电子设备的发展,拍照更为大众化,但每个人的拍照水平不一样,为了让不同拍摄水平的人也能拍摄出高质量的照片,相机中会设置有智能拍摄模式,在相关技术中,智能拍照模式只是通过检测当前拍摄的环境参数,针对该环境参数进行自动调节,以协助非专业人士拍摄出专业的照片,这种自动调节的参数通常只限于光圈、快门速度等,智能化程度较低。基于此发展出了自动跟踪目标进行拍摄的技术。
自动跟踪目标进行拍摄被应用在众多场景下,通过一个边界框来定位目标的位置,然后基于“中心控制”法来控制镜头的移动,来实现自动跟拍功能。然而,在人像拍摄中,这一方法具有很多局限性。人像拍摄比较复杂,不同姿态下,传统的边界框“中心控制”法实现的效果与人类的实际期望效果存在很大的差异。传统边界框“中心控制法”仅仅适用于目标在画面中非常少的特殊情形下。
发明内容
本申请提供了一种图像拍摄方法、装置、设备以及存储介质,能够基于图像的像素级视觉特征自动控制摄像头的转动,提高拍摄效果。
本申请提供了一种图像拍摄方法,该图像拍摄方法包括:
获取待拍摄图像内镜头跟踪目标的边界框;
利用预先训练好的参考模型预测所述待拍摄图像的第一参考位置;
根据所述边界框内每个像素位置和所述第一参考位置确定镜头移动偏移量。
本申请提供了一种图像拍摄装置,该图像拍摄装置包括:
边界框获取模块,设置为获取待拍摄图像内镜头跟踪目标的边界框;
参考位置预测模块,设置为预先训练好的参考模型利用预先训练好的参考模型预测所述待拍摄图像的第一参考位置;
镜头偏移确定模块,设置为根据所述边界框内每个像素位置和所述第一参考位置确定镜头移动偏移量。
本申请提供了一种图像拍摄设备,该图像拍摄设备包括存储器和处理器,所述存储器上存储有可在处理器运行的计算机程序,所述处理器执行所述计算机程序时实现前述的图像拍摄方法。
本申请提供了一种计算机可读存储介质,所述存储介质存储有计算机程序,所述计算机程序包括程序指令,所述程序指令当被执行时实现前述的图像拍摄方法。
附图说明
图1是本申请实施例一提供的一种图像拍摄方法的流程图;
图2是本申请实施例一提供的一种图像拍摄方法的子流程图;
图3是本申请实施例二提供的另一种图像拍摄方法的流程图;
图4是本申请实施例二提供的一种参考模型的训练流程图;
图5是本申请实施例二提供的一种参考模型的训练子流程图;
图6是本申请实施例三提供的一种图像拍摄装置的结构示意图;
图7是本申请实施例三提供的一种图像拍摄装置的训练子模块结构示意图;
图8为本申请实施例三提供的一种图像拍摄装置的位置获取单元结构示意图;
图9为本申请实施例三提供的一种图像拍摄装置的镜头偏移确定模块结构示意图;
图10是本申请实施例四提供的一种图像拍摄设备的结构示意图。
具体实施方式
下面结合本申请实施例中的附图,对本申请实施中的技术方案进行描述。本文所描述的具体实施例仅仅是本申请一部分实施例,而不是全部的实施例,仅用于解释本申请,而非对本申请的限定。
除非另有定义,本文所使用的所有的技术和科学术语与属于本申请的技术 领域的技术人员理解的含义相同。本文中在本申请的说明书中使用的术语只是为了描述实施方式的目的,不是旨在于限制本申请。本文所使用的术语“和/或”包括一个或多个相关的所列项目的任意的和所有的组合。
此外,术语“第一”、“第二”等可在本文中用于描述多种方向、动作、步骤或元件等,但这些方向、动作、步骤或元件不受这些术语限制。这些术语仅用于将一个方向、动作、步骤或元件与另一个方向、动作、步骤或元件区分。举例来说,在不脱离本申请的范围的情况下,可以将第一速度差值称为第二速度差值,且类似地,可将第二速度差值称为第一速度差值。第一速度差值和第二速度差值两者都是速度差值,但第一速度差值和第二速度差值不是同一速度差值。术语“第一”、“第二”等不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征。在本申请的描述中,“多个”的含义是至少两个,例如两个,三个等,除非另有明确的限定。在一个部分被称为“固定于”另一个部分的情况下,它可以直接在另一个部分上也可以存在居中的部分。在一个部分被认为是“连接”到另一个部分的情况下,它可以是直接连接到另一个部分或者可能同时存在居中部分。本文所使用的术语“垂直的”、“水平的”、“左”、“右”以及类似的表述,只是为了说明的目的,并不表示是唯一的实施方式。
一些示例性实施例被描述成作为流程图描绘的处理或方法。虽然流程图将多个步骤描述成顺序的处理,但是本文中的许多步骤可以被并行地、并发地或者同时实施。此外,多个步骤的顺序可以被重新安排。当其操作完成时处理可以被终止,但是还可以具有未包括在附图中的附加步骤。处理可以对应于方法、函数、规程、子例程、子程序等等。
实施例一
参见图1,本实施例提供了一种图像拍摄方法,该方法包括以下步骤。
S110、获取待拍摄图像内镜头跟踪目标的边界框。
在拍摄图像的情况下,为了实现更佳的构图效果通常将待拍摄目标或镜头跟踪目标尽可能的置于图像的中心,因此在调整镜头移动前,需要先确定镜头跟踪目标在图像中的位置,此处所指的镜头跟踪目标指的是需要始终保持在镜头内的主要拍摄目标,如人、宠物以及其他摄影素材。本实施例中采用边界框确定镜头跟踪目标的位置,边界框指对应于待拍摄图像中的镜头跟踪目标所出现的画面的区域范围。一实施例中,边界框具有在纵向或横向上长的矩形外框形状。本实施例边界框的大小和位置取决于镜头跟踪目标在镜头所采集的图像中的大小,一实施例中,边界框可以基于相关技术中的视觉追踪方法确定。
S120、利用预先训练好的参考模型预测待拍摄图像的第一参考位置。
相关技术中通常使用“中心控制”法将目标定位到图像的正中心,但是这种方式并未考虑到所跟踪目标的姿态不同在构图时的影响,例如拍摄站立的人像时,“中心控制”法会将站立人像的正中心置于图像中心,而将人体的上半身更靠近图像中心能获得更佳的构图效果,因此本实施例采用预先训练好的参考模型来预测待拍摄图像的第一参考位置。
参考模型基于深度卷积神经网络(Convolutional Neural Networks,CNN)训练得到。第一参考位置为预测镜头跟踪目标在图像中的最佳构图位置,最佳构图位置是根据大量摄影师所拍摄的包含镜头跟踪目标的图像,统计分析得到的镜头跟踪目标在摄影师拍摄的图像中的位置。最佳构图位置由参考模型根据图像中镜头跟踪目标的信息所确定,镜头跟踪目标的信息包括镜头跟踪目标的边界框的大小、位置以及镜头跟踪目标的姿态中的一种或多种。
S130、根据边界框内每个像素位置和第一参考位置确定镜头移动偏移量。
在确定了第一参考位置后即确定了边界框的构图预测位置,结合边界框的初始位置即可计算出镜头需求的移动偏移量。传统的边界框“中心控制”法仅仅使用边界框的中心点进行计算,通过“中心控制”法计算将边界框中心点移动到画面的中心位置镜头需求的移动偏移量,这种计算方式在边界框足够小的情况下效果比较好,但是实际拍摄中边界框的大小是不确定的,且为了构图效果,镜头跟踪目标在图像中所占的比例不能过小,即边界框在图像中所占的比例不能过小,因此为了得到更精确的镜头偏移量计算结果,本实施例在参考模型预测的第一参考位置基础上,基于图像的像素级视觉特征使用边界框内的每个像素位置计算镜头的移动偏移量。
在一些实施例中,如图2所示,步骤S130包括步骤S1310-步骤S1320。
S1310、根据第一参考位置计算得到边界框内每个像素的位置偏移量。
定义:(x,y)为像素归一化坐标,x表示水平方向坐标,y表示垂直方向坐标。
XT为参考位置水平坐标图像,YT为参考位置垂直坐标图像,由参考模型预测得到。
DX为水平偏移图像,DY为垂直偏移图像,通过后续方法计算得到。
一实施例中,根据第一参考位置利用公式
DX(x,y)=XT(x,y)-x,DY(x,y)=YT(x,y)-y
计算得到边界框内每个像素的位置偏移量。
上述公式中,DX(x,y)为边界框内每个像素的水平偏移量,XT(x,y)为在边界框位于第一参考位置的情况下,边界框内每个像素的水平位置,即参考模型预 测后的图像中边界框内每个像素的水平坐标,DY(x,y)为边界框内每个像素的垂直偏移量,YT(x,y)为位于第一参考位置的边界框内每个像素的垂直位置即参考模型预测后的图像中边界框内每个像素的垂直坐标,x为边界框内每个像素的水平位置也可以理解为边界框内每个像素的初始位置的水平坐标,y为边界框内每个像素的垂直位置也可以理解为边界框内每个像素的初始位置的垂直坐标。
本实施例中,根据S1310中的计算公式可以分别计算在边界框位于第一参考位置的情况下,边界框内每个像素的位置与所述每个像素的初始位置的坐标差值,以表示参考模型预测的图像与镜头偏移前所拍摄的图像相比,边界框内每个像素的位置偏移量。
S1320、根据边界框内每个像素的位置偏移量计算得到镜头移动偏移量。
一实施例中,根据边界框内每个像素的位置偏移量,利用公式
d_x=∑_{(x,y)∈Θ}DX(x,y)/∑_{(x,y)∈Θ}1,d_y=∑_{(x,y)∈Θ}DY(x,y)/∑_{(x,y)∈Θ}1
计算实现参考模型所预测的图像所需的镜头移动偏移量d;上述公式中,d x为镜头的水平移动偏移量,d y为镜头的垂直移动偏移量,(x,y)∈Θ表示像素(x,y)属于边界框Θ内,∑ (x,y)∈Θ1表示的是边界框Θ内包含的像素数之和,镜头移动偏移量d=(d x,d y)。
本实施例中提供了一种图像拍摄方法,采用由深度卷积神经网络训练好的参考模型对待拍摄图像进行预测得到构图效果更佳的第一参考位置,基于图像的像素级视觉特征和第一参考位置计算每个像素的位置偏移量从而得到镜头移动偏移量,本实施例的技术方案能够自动适应拍摄目标的不同姿态、不同位置,预测目标的参考位置控制摄像头移动实现更佳构图效果,无需人为控制摄像头的转动即可提高拍摄效果,提升用户的拍摄体验。
本申请提供的图像拍摄方法通过边界框确定镜头跟踪目标在待拍摄图像中的位置,利用基于卷积神经网络训练好的能够模拟摄像师构图思路的参考模型,预测待拍摄图像的第一参考位置,根据第一参考位置和待拍摄图像中用于确定跟踪目标位置的边界框,采用像素级的计算方式计算出实现跟踪目标位于第一参考位置所需的镜头移动偏移量,实现了基于图像的像素级视觉特征,自动控制摄像头的转动,能自动适应目标姿态的变化及适应相机的拍摄角度变化来进行拍摄,提高拍摄效果,有利于提高用户使用体验。
实施例二
图3是本申请实施例二提供的另一种图像拍摄方法的流程示意图,本实施例在实施例一的基础上实现,如图3所示,在步骤S110之前还包括以下步骤。
步骤S100、基于深度卷积神经网络训练得到预先训练好的参考模型。
在一些实施例中,如图4所示,步骤S100、基于深度卷积神经网络训练得到预先训练好的参考模型(即参考模型的训练过程)包括步骤S310-步骤S360。
S310、从预先设定的图像数据集中获取训练图像和对应的标记数据,标记数据包括训练图像中跟踪目标的边界框信息和关键点信息。
本实施例中,图像数据集中预先设置有多张训练图像,训练图像类型可以根据拍摄目标不同自行选择,本实施例中以人像拍摄为例,图像数据集中搜集的均为包括人像的训练图像,这些训练图像可以覆盖多类主要场景如:室内、海边和山上以及多种姿态如:跑步、打坐、平躺和舞蹈。
图像数据集中每张训练图像都具有对应的标记数据,本实施列的标记数据包括训练图像中跟踪目标的边界框信息和关键点信息。边界框信息包括边界框的位置和边界框的大小。本实施例中,示例性的选择人体的17个关节点作为关键点,分别标记关节点对应的坐标信息作为关键点信息。每个关节点标记为(xi,yi,si),i为1到17的自然数,表示第i个关键点,xi为第i个关键点的水平坐标,yi为第i个关键点的垂直坐标,si等于0时表示该关键点不存在(对应的xi和yi均为0),si等于1时表示该关键点存在,i为1到17时分别对应以下关键点信息:1-头顶、2-左眼、3-右眼、4-鼻子、5-咽喉、6-左肩、7-左肘、8-左腕、9-右肩、10-右肘、11-右腕、12-左臀、13-左膝、14-左踝、15-右臀、16-右膝、17-右踝。
S320、根据跟踪目标的边界框信息和关键点信息获取边界框中心点的参考位置。
传统的“中心控制”法控制目标边界框中心点移动到图像的中心完成构图,这种方式计算过程简单并未考虑到目标的姿态不同对构图的影响因而拍摄效果与实际期望相差较大,因此,本实施例提供的拍摄方法中,在训练参考模型时充分考虑跟踪目标不同姿态时的构图需求差异,根据步骤S310中所标记的跟踪目标关键点信息不同可以区别出跟踪目标的不同姿态,根据跟踪目标的边界框信息和关键点信息计算边界框中心点的参考位置,并且能够充分模拟摄影师的构图控制能力,其构图效果更好。
在一些实施例中,如图5所示,步骤S320包括步骤S3210-步骤S3230:
S3210、基于训练图像生成一幅网格表,将训练图像划分为W*H个网格,W、H为大于1的自然数,每个网格在后续计算边界框的构图位置时提供一个位置选择,W、H的数值可根据精度需求调整。
S3220、获取在将边界框中心放置于不同的网格中心的情况下的第二损失 值。
第二损失值的计算过程如下:
图像的水平坐标范围和垂直坐标范围均为[0,1]。
(1)定义一组参考点,示例如下:
Figure PCTCN2019103656-appb-000003
(2)定义一组参考线,示例如下:
Figure PCTCN2019103656-appb-000004
参考点和参考线的设置基于构图需求不同可自行调整,本实施例中通过上述参考点、参考线,将水平坐标范围
Figure PCTCN2019103656-appb-000005
和垂直坐标范围
Figure PCTCN2019103656-appb-000006
所限定的区域定为追踪目标最佳构图区域。
(3)基于跟踪目标的关键点信息定义跟踪目标的关键点集合和对应的权值参数集合:
P={p i},i=1,2,…,17;
W p={w pi},i=1,2,…,17。
(4)根据跟踪目标的关键点信息定义关键线段,关键线段用于补充跟踪目标的姿态信息,基于关键点所体现的姿态在一定情况下存在一些误差,结合基于关键点的关键线段可以更清晰的体现跟踪目标的姿态,示例性的为:
L1:鼻子->{左臀和右臀中点};
L2:左肩->左肘;
L3:左肘->左腕;
L4:右肩->右肘;
L5:右肘->右腕;
L6:左臀->左膝;
L7:左膝->左踝;
L8:右臀->右膝;
L9:右膝->右踝。
(5)基于上述9条关键线段分别定义跟踪目标的关键线段集合和对应的权值参数集合:
L={l j},j=1,2,…,9;
W l={w lj},j=1,2,…,9。
当跟踪目标的姿态不同时,目标的关键点位置发生变化,上述关键线段的长度、位置均会对应发生变化。
(6)关键点与参考点之间的距离计算公式:
Figure PCTCN2019103656-appb-000007
本实施例中,关键点与参考点之间的距离计算公式中p i、p j分别代表两个不同的点,x pi、y pi分别表示点p i的水平坐标和垂直坐标,x pj、y pj分别表示点p j的水平坐标和垂直坐标。
(7)关键线与参考线之间的距离计算公式:
Figure PCTCN2019103656-appb-000008
关键线与参考线之间的距离计算公式中,(x c,y c)是线段l的中点,x=a表示一条垂直线,y=a表示一条水平线。
(8)将边界框中心分别放置到不同网格的中心(x,y)处,计算此时第二损失值损失值D xy
Figure PCTCN2019103656-appb-000009
Figure PCTCN2019103656-appb-000010
D xy=D p+D l
上述公式中,P xy=P→(x,y)为关键点归一化,L xy=L→(x,y)为关键线段归一化。
在一实施例中,P xy=(x/W,y/H),L xy为归一化后的两点的线段。
第二损失值可以体现将边界框放置到不同位置时跟踪目标与自定义的最佳构图区域的符合程度,第二损失值越小越接近自定义的最佳构图区域。
S3230、选取第二损失值最小的网格的中心位置作为边界框中心点的参考位置。
Figure PCTCN2019103656-appb-000011
时选取(x t,y t)作为边界框中心点的参考位置,在自定义的网格、参考点和参考线不变的情况下,(x t,y t)与对应的关键点信息(此处包括关键线段)关系是确定的,即映射关系为(x t,y t)=O(P),P为镜头追踪拍摄目标的关键点信息。
在替代实施例中,根据对图像精度的需求不同可以调整自定义的网格、参考点和参考线。一实施例中,还可以自定义跟踪目标的关键点以及关键线段和关键点的关系。例如精度要求较高时,可以将W、H提高,即增加了图像分割网格的格数。
S330、基于边界框中心点的参考位置获取训练图像对应的参考位置图像。
在训练图像中存在多个目标人像的情况下,需要根据每个跟踪目标的边界框中心点的参考位置、每个跟踪目标的边界框中心点的初始位置和跟踪目标数量获取训练图像对应的参考位置图像,获取方式如下:
(1)所有跟踪目标的边界框中心点的参考位置集合定义为:
Θ={O(P i)}={(x ti,y ti)}。
(2)每个跟踪目标的边界框中心的初始位置坐标定义为:
Δ={(x ci,y ci)}。
(3)训练图像中每个像素的参考位置计算公式:
Figure PCTCN2019103656-appb-000012
式(1)中,(x,y)为像素归一化坐标,∑ Θ,Δ1为训练图像中的跟踪目标数量,X TG(x,y)为每个像素参考位置的水平坐标,Y TG(x,y)为每个像素参考位置的垂直坐标,x ti、x ci分别为每个跟踪目标的边界框中心点的参考位置水平坐标和初始位置水平坐标,y ti、y ci分别为每个跟踪目标的边界框中心点的参考位置垂直坐标和初始位置垂直坐标,当每个像素的参考位置坐标确定后即可得到训练图像的参考位置图像。
参考位置图像与传统“中心控制”法得到的图像相比更充分地考虑到了目标姿态不同时的构图需求,构图效果更精细合理。
S340、利用深度卷积神经网络预测训练图像的第二参考位置以得到预测结果图像。
利用深度卷积神经网络初始模型对训练图像进行预测,得到跟踪目标在图像中的第二参考位置。进而可以得到预测结果图像,预测结果图像中每个像素的水平坐标和垂直坐标分别为X T(x,y)、Y T(x,y)。
S350、根据参考位置图像和预测结果图像计算第一损失值,并根据第一损失值对深度卷积神经网络的参数进行调节。
第一损失值采用欧几里得距离损失,根据前述得到参考位置图像和预测结 果图像通过公式(2)计算得到:
L = ∑_{x,y}(X_TG(x,y)-X_T(x,y))² + ∑_{x,y}(Y_TG(x,y)-Y_T(x,y))²     (2)
(2)式中X TG(x,y)、Y TG(x,y)由(1)式求得,X T(x,y)、Y T(x,y)由预测结果图像求得。参考位置图像是期望实现构图效果的图像,第一损失值表示预测结果图像与参考位置图像偏差,基于第一损失值对深度卷积神经网络进行反向传播调节深度卷积神经网络参数,使得预测结果图像更接近参考位置图像。
S360、对图像数据集中的多张训练图像依次执行步骤S310-S350,直到步骤S350中的第一损失值不再下降,结束对深度卷积神经网络的训练,得到预先训练好的参考模型。
根据第一损失值调整深度卷积神经网络的参数,会得到不同的第一损失值,当第一损失值不断下降时表明预测结果图像越来越接近参考位置图像,不断地调节深度卷积神经网络,最终第一损失值不再降低时可以视为此时预测结果图像最接近参考位置图像,此时可以获得所期望的深度卷积神经网络模型作为训练好的参考模型使用。
由于不同训练图像得到的第一损失值之间可能存在一定差异,因此无法保证每个训练图像计算得到的第一损失值能同时达到最低,此处所指的第一损失值不再下降是一种表示第一损失值趋于稳定且达到预期要求的表述方式,示例性的:自定义第一损失值预期要求为低于k,则在采用多个训练图像进行的多次训练后得到的至少m个连续的第一损失值始终低于k时即可视为第一损失值不再下降。
本实施例提供了实施例一中所使用的预先训练好的参考模型的训练流程,基于跟踪目标的关键点信息提供了更为合理的构图方式,其实现的参考图像构图效果更好,基于参考图像和深度卷积神经网络计算得到的第一损失值对深度卷积神经网络进行反向传播得到的训练好的参考模型能够适应目标的不同姿态预测出构图更合理的预测图像。
实施例三
如图6所示,本实施例提供了一种图像拍摄装置500,包括:边界框获取模块510,设置为获取待拍摄图像内镜头跟踪目标的边界框;参考位置预测模块520,设置为利用预先训练好的参考模型预测所述待拍摄图像的第一参考位置;镜头偏移确定模块530,设置为根据所述边界框内每个像素位置和所述第一参考位置确定镜头移动偏移量。
本实施例中,边框获取模块510是设置为根据待拍摄图像内镜头跟踪目标的数量不同获取多个与镜头跟踪目标对应的边界框。
本实施例中,如图7所示,参考位置预测模块520还包括模型训练子模块521,模型训练子模块521设置为基于深度卷积神经网络训练获得训练好的参考模型。
如图7所示,模型训练子模块521包括:数据集单元5210,设置为从预先设定的图像数据集中获取训练图像和对应的标记数据,标记数据包括训练图像中跟踪目标的边界框信息和关键点信息;位置获取单元5211,设置为根据跟踪目标的边界框信息和关键点信息获取边界框中心点的参考位置;图像获取单元5212,设置为基于边界框中心点的参考位置获取训练图像对应的参考位置图像,一实施例中,图像获取单元5212是设置为根据每个跟踪目标的边界框中心点的参考位置、每个跟踪目标的边界框中心点的初始位置和跟踪目标数量获取训练图像对应的参考位置图像;预测结果图像获取单元5213,设置为利用深度卷积神经网络预测训练图像的第二参考位置以得到预测结果图像;损失值处理单元5214,设置为根据参考位置图像和所述预测结果图像计算第一损失值,并根据第一损失值对深度卷积神经网络的参数进行调节;模型获取单元5215,设置为在第一损失值不再下降的情况下,结束对深度卷积神经网络的训练,得到训练好的参考模型。
一实施例中,第一损失值利用公式L=∑ x,y(X TG(x,y)-X T(x,y)) 2+∑ x,y(Y TG(x,y)-Y T(x,y)) 2得到。
上述公式中,X TG(,y)为根据边界框中心点的参考位置所计算的边界框内每个像素的水平位置,X T(,y)为由深度卷积神经网络预测的边界框内每个像素的水平位置,Y TG(x,y)为根据边界框中心点的参考位置所计算的边界框内每个像素的垂直位置,Y T(x,y)为由深度卷积神经网络预测的边界框内每个像素的垂直位置。
一实施例中,如图8所示,位置获取单元5212包括:网格划分子单元52120,设置为基于训练图像生成一幅网格表,将训练图像划分为W*H个网格,W、H为大于1的自然数;第二损失值处理子单元52121,设置为获取在将边界框中心放置于不同的网格中心的情况下的第二损失值;参考位置获取子单元52122,设置为选取第二损失值最小的网格的中心位置作为边界框中心点的参考位置。
一实施例中,如图9所示,镜头偏移确定模块530包括:像素的位置偏移量获取子模块5300,设置为根据第一参考位置计算得到边界框内每个像素的位置偏移量;镜头移动偏移量获取子模块5301,设置为根据边界框内每个像素的位置偏移量计算得到镜头移动偏移量。
一实施例中,像素的位置偏移量获取子模块5300是设置为根据第一参考位置,利用公式
DX(x,y)=XT(x,y)-x,DY(x,y)=YT(x,y)-y
计算得到边界框内每个像素的位置偏移量;
其中,DX(x,y)为边界框内每个像素的水平偏移量,XT(x,y)为在边界框位于第一参考位置的情况下,边界框内每个像素的水平位置,x为边界框内每个像素的水平位置,DY(x,y)为边界框内每个像素的垂直偏移量,YT(x,y)为在边界框位于第一参考位置的情况下,边界框内每个像素的垂直位置,y为边界框内每个像素的垂直位置。
一实施例中,镜头移动偏移量获取子模块5301是设置为根据边界框内每个像素的位置偏移量,利用公式
d_x=∑_{(x,y)∈Θ}DX(x,y)/∑_{(x,y)∈Θ}1,d_y=∑_{(x,y)∈Θ}DY(x,y)/∑_{(x,y)∈Θ}1
计算得到镜头移动偏移量d;
其中,d=(d x,d y),d x为镜头的水平移动偏移量,d y为镜头的垂直移动偏移量,(x,y)∈Θ表示像素(x,y)属于边界框Θ内,∑ (x,y)∈Θ1表示的是边界框Θ内包含的像素数之和。
本实施例提供了一种图像拍摄装置,能自动适应目标姿态的变化及适应相机的拍摄角度变化来进行拍摄,提高拍摄效果,有利于提高用户使用体验。
本申请实施例所提供的一种图像拍摄装置可执行本申请前述实施例所提供的一种图像拍摄方法,具备执行方法相应的功能模块和有益效果。
实施例四
图10为本申请实施例四提供的一种图像拍摄设备600的结构示意图,如图10所示,该种图像拍摄设备包括存储器610、处理器620,图像拍摄设备中处理器620的数量可以是一个或多个,图10中以一个处理器620为例;图像拍摄设备中的存储器610、处理器620可以通过总线或其他方式连接,图10中以通过总线连接为例。
存储器610作为一种计算机可读存储介质,可设置为存储软件程序、计算机可执行程序以及模块,如本申请实施例中的图像拍摄方法对应的程序指令/模块(例如,图像拍摄装置中的边界框获取模块510、参考位置预测模块520、镜头偏移确定模块530)。处理器620通过运行存储在存储器610中的软件程序、指令以及模块,从而执行图像拍摄设备的多种功能应用以及数据处理,即实现上述的图像拍摄方法。
本实施例中,所述处理器620设置为运行存储在存储器610中的计算机可执行程序,以实现如下:步骤S110、获取待拍摄图像内镜头跟踪目标的边界框;步骤S120、利用预先训练好的参考模型预测所述待拍摄图像的第一参考位置;步骤S130、根据所述边界框内每个像素位置和所述第一参考位置确定镜头移动 偏移量。
本申请实施例所提供的一种图像拍摄设备,该图像拍摄设备不限于如上所述的方法操作,还可以执行本申请实施例任意实施例所提供的图像拍摄方法中的相关操作。
存储器610可主要包括存储程序区和存储数据区。一实施例中,存储程序区可存储操作系统、至少一个功能所需的应用程序;存储数据区可存储根据终端的使用所创建的数据等。此外,存储器610可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他非易失性固态存储器件。在一些实例中,存储器610可包括相对于处理器620远程设置的存储器,这些远程存储器可以通过网络连接至图像拍摄设备。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。
本实施例提供了一种图像拍摄设备,能自动适应目标姿态的变化及适应相机的拍摄角度变化来进行拍摄,提高拍摄效果,有利于提高用户使用体验。
实施例五
本申请实施例五还提供一种包含计算机可执行指令的存储介质,所述计算机可执行指令在由计算机处理器执行时用于执行一种图像拍摄方法,该图像拍摄方法包括:获取待拍摄图像内镜头跟踪目标的边界框;利用预先训练好的参考模型预测所述待拍摄图像的第一参考位置;根据所述边界框内每个像素位置和所述第一参考位置确定镜头移动偏移量。
本申请实施例所提供的一种包含计算机可执行指令的存储介质,该计算机可执行指令不限于如上所述的方法操作,还可以执行本申请任意实施例所提供的图像拍摄方法中的相关操作。
通过以上关于实施方式的描述,所属领域的技术人员可以了解到,本申请可借助软件及通用硬件来实现,也可以通过硬件实现。基于这样的理解,本申请的技术方案可以以软件产品的形式体现出来,该计算机软件产品可以存储在计算机可读存储介质中,如计算机的软盘、只读存储器(Read-Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、闪存(FLASH)、硬盘或光盘等,包括多个指令用以使得一台计算机设备(可以是个人计算机,图像拍摄设备,或者网络设备等)执行本申请任意实施例所述的方法。
上述图像拍摄装置的实施例中,所包括的多个单元和模块只是按照功能逻辑进行划分的,但并不局限于上述的划分,只要能够实现相应的功能即可;另外,每个功能单元的名称也只是为了便于相互区分,并不用于限制本申请的保护范围。

Claims (10)

  1. 一种图像拍摄方法,包括:
    获取待拍摄图像内镜头跟踪目标的边界框;
    利用预先训练好的参考模型预测所述待拍摄图像的第一参考位置;
    根据所述边界框内每个像素位置和所述第一参考位置确定镜头移动偏移量。
  2. 根据权利要求1所述的方法,其中,所述根据所述边界框内每个像素位置和所述第一参考位置确定镜头移动偏移量包括:
    根据所述第一参考位置计算得到所述边界框内每个像素的位置偏移量;
    根据所述边界框内每个像素的位置偏移量计算得到镜头移动偏移量。
  3. 根据权利要求1或2所述的方法,其中,所述预先训练好的参考模型的训练过程包括:
    从预先设定的图像数据集中获取训练图像和对应的标记数据,所述标记数据包括所述训练图像中跟踪目标的边界框信息和关键点信息;
    根据所述跟踪目标的边界框信息和关键点信息获取边界框中心点的参考位置;
    基于所述边界框中心点的参考位置获取所述训练图像对应的参考位置图像;
    利用深度卷积神经网络预测所述训练图像的第二参考位置以得到预测结果图像;
    根据所述参考位置图像和所述预测结果图像计算第一损失值,并根据所述第一损失值对所述深度卷积神经网络的参数进行调节;
    对所述图像数据集中的多张训练图像依次执行上述步骤,直到第一损失值不再下降,结束对所述深度卷积神经网络的训练,得到所述预先训练好的参考模型。
  4. 根据权利要求3所述的方法,其中,所述根据所述跟踪目标的边界框信息和关键点信息获取边界框中心点的参考位置包括:
    基于所述训练图像生成一幅网格表,将所述训练图像划分为W*H个网格,W、H为大于1的自然数;
    获取在将边界框中心放置于不同的网格中心的情况下的第二损失值;
    选取所述第二损失值最小的网格的中心位置作为所述边界框中心点的参考 位置。
  5. 根据权利要求3或4所述的方法,其中,所述基于所述边界框中心点的参考位置获取所述训练图像对应的参考位置图像包括:根据每个跟踪目标的边界框中心点的参考位置、所述每个跟踪目标的边界框中心点的初始位置和跟踪目标数量获取所述训练图像对应的参考位置图像。
  6. 根据权利要求2-5任一项所述的方法,其中,所述根据所述第一参考位置计算得到所述边界框内每个像素的位置偏移量包括:根据所述第一参考位置,利用公式
    Figure PCTCN2019103656-appb-100001
    计算得到所述边界框内每个像素的位置偏移量;
    其中,DX(x,y)为所述边界框内每个像素的水平偏移量,XT(x,y)为在所述所述边界框位于所述第一参考位置的情况下,所述边界框内每个像素的水平位置,DY(x,y)为所述边界框内每个像素的垂直偏移量,YT(x,y)为在所述边界框位于所述第一参考位置的情况下,所述边界框内每个像素的垂直位置,x为所述边界框内每个像素的水平位置,y为所述边界框内每个像素的垂直位置;
    所述根据所述边界框内每个像素的位置偏移量计算得到镜头移动偏移量包括:根据所述边界框内每个像素的位置偏移量,利用公式
    Figure PCTCN2019103656-appb-100002
    计算得到镜头移动偏移量d;
    其中,d=(d x,d y),d x为镜头的水平移动偏移量,d y为镜头的垂直移动偏移量,(x,y)∈Θ表示像素(x,y)属于边界框Θ内,∑ (x,y)∈Θ1表示的是所述边界框Θ内包含的像素数之和。
  7. 根据权利要求3-6任一项所述的方法,其中,所述第一损失值利用公式L=∑ x,y(X TG(x,y)-X T(x,y)) 2+∑ x,y(Y TG(x,y)-Y T(x,y)) 2计算得到;
    其中,X TG(x,y)为根据所述边界框中心点的参考位置所计算的边界框内每个像素的水平位置,X T(x,y)为由所述深度卷积神经网络预测的边界框内每个像素的水平位置,Y TG(x,y)为根据所述边界框中心点的参考位置所计算的边界框内每个像素的垂直位置,Y T(x,y)为由所述深度卷积神经网络预测的边界框内每个像素的垂直位置。
  8. 一种图像拍摄装置,包括:
    边界框获取模块,设置为获取待拍摄图像内镜头跟踪目标的边界框;
    参考位置预测模块,设置为预先训练好的参考模型利用预先训练好的参考 模型预测所述待拍摄图像的第一参考位置;
    镜头偏移确定模块,设置为根据所述边界框内每个像素位置和所述第一参考位置确定镜头移动偏移量。
  9. 一种图像拍摄设备,包括存储器和处理器,所述存储器上存储有可在处理器运行的计算机程序,所述处理器执行所述计算机程序时实现如权利要求1-7的图像拍摄方法。
  10. 一种计算机可读存储介质,存储有计算机程序,所述计算机程序包括程序指令,所述程序指令当被执行时实现如权利要求1-7任意一项所述的图像拍摄方法。
PCT/CN2019/103656 2019-06-12 2019-08-30 图像拍摄方法、装置、设备以及存储介质 WO2020248396A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/606,075 US11736800B2 (en) 2019-06-12 2019-08-30 Method, apparatus, and device for image capture, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910506435.6A CN110072064B (zh) 2019-06-12 2019-06-12 一种图像拍摄方法、装置、设备以及存储介质
CN201910506435.6 2019-06-12

Publications (1)

Publication Number Publication Date
WO2020248396A1 true WO2020248396A1 (zh) 2020-12-17

Family

ID=67372768

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103656 WO2020248396A1 (zh) 2019-06-12 2019-08-30 图像拍摄方法、装置、设备以及存储介质

Country Status (3)

Country Link
US (1) US11736800B2 (zh)
CN (1) CN110072064B (zh)
WO (1) WO2020248396A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110072064B (zh) 2019-06-12 2020-07-03 睿魔智能科技(深圳)有限公司 一种图像拍摄方法、装置、设备以及存储介质
CN111147749A (zh) * 2019-12-31 2020-05-12 宇龙计算机通信科技(深圳)有限公司 拍摄方法、拍摄装置、终端及存储介质
CN112017210A (zh) * 2020-07-14 2020-12-01 创泽智能机器人集团股份有限公司 目标物体跟踪方法及装置
TWI767714B (zh) 2021-05-19 2022-06-11 華碩電腦股份有限公司 電子裝置以及其影像擷取器的控制方法
KR20230073887A (ko) * 2021-11-19 2023-05-26 한국전자통신연구원 3차원 손 자세 추정 방법 및 증강 시스템

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170244887A1 (en) * 2016-02-19 2017-08-24 Canon Kabushiki Kaisha Image capturing apparatus, control method of the same, and storage medium
CN107749952A (zh) * 2017-11-09 2018-03-02 睿魔智能科技(东莞)有限公司 一种基于深度学习的智能无人摄影方法和系统
CN109117794A (zh) * 2018-08-16 2019-01-01 广东工业大学 一种运动目标行为跟踪方法、装置、设备及可读存储介质
CN109803090A (zh) * 2019-01-25 2019-05-24 睿魔智能科技(深圳)有限公司 无人拍摄自动变焦方法及系统、无人摄像机及存储介质
CN110072064A (zh) * 2019-06-12 2019-07-30 睿魔智能科技(深圳)有限公司 一种图像拍摄方法、装置、设备以及存储介质

Family Cites Families (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE69434657T2 (de) * 1993-06-04 2007-02-15 Sarnoff Corp. System und Verfahren zur elektronischen Bildstabilisierung
JP3515926B2 (ja) * 1999-06-23 2004-04-05 本田技研工業株式会社 車両の周辺監視装置
JP3950707B2 (ja) * 2002-02-22 2007-08-01 キヤノン株式会社 光学機器
CN102710896B (zh) * 2012-05-07 2015-10-14 浙江宇视科技有限公司 针对动态目标进行拉框放大的方法和装置
JP6335434B2 (ja) * 2013-04-19 2018-05-30 キヤノン株式会社 撮像装置、その制御方法およびプログラム
CN103905733B (zh) * 2014-04-02 2018-01-23 哈尔滨工业大学深圳研究生院 一种单目摄像头对人脸实时跟踪的方法及系统
US10048749B2 (en) * 2015-01-09 2018-08-14 Microsoft Technology Licensing, Llc Gaze detection offset for gaze tracking models
JP2016140030A (ja) * 2015-01-29 2016-08-04 株式会社リコー 画像処理装置、撮像装置、及び画像処理プログラム
JP6800628B2 (ja) * 2016-06-22 2020-12-16 キヤノン株式会社 追跡装置、追跡方法、及びプログラム
US20180189609A1 (en) * 2017-01-04 2018-07-05 Qualcomm Incorporated Training data for machine-based object recognition
US10699421B1 (en) * 2017-03-29 2020-06-30 Amazon Technologies, Inc. Tracking objects in three-dimensional space using calibrated visual cameras and depth cameras
US10628961B2 (en) * 2017-10-13 2020-04-21 Qualcomm Incorporated Object tracking for neural network systems
CN108234872A (zh) * 2018-01-03 2018-06-29 上海传英信息技术有限公司 移动终端及其拍照方法
CN108200344A (zh) * 2018-01-23 2018-06-22 江苏冠达通电子科技有限公司 摄像机的调整变焦方法
CN108960090B (zh) * 2018-06-20 2023-05-30 腾讯科技(深圳)有限公司 视频图像处理方法及装置、计算机可读介质和电子设备
CN109064514B (zh) * 2018-07-03 2022-04-26 北京航空航天大学 一种基于投影点坐标回归的六自由度位姿估计方法
CN109087337B (zh) * 2018-11-07 2020-07-14 山东大学 基于分层卷积特征的长时间目标跟踪方法及系统
US11277556B2 (en) * 2019-04-01 2022-03-15 Jvckenwood Corporation Control device for automatic tracking camera

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170244887A1 (en) * 2016-02-19 2017-08-24 Canon Kabushiki Kaisha Image capturing apparatus, control method of the same, and storage medium
CN107749952A (zh) * 2017-11-09 2018-03-02 睿魔智能科技(东莞)有限公司 一种基于深度学习的智能无人摄影方法和系统
CN109117794A (zh) * 2018-08-16 2019-01-01 广东工业大学 一种运动目标行为跟踪方法、装置、设备及可读存储介质
CN109803090A (zh) * 2019-01-25 2019-05-24 睿魔智能科技(深圳)有限公司 无人拍摄自动变焦方法及系统、无人摄像机及存储介质
CN110072064A (zh) * 2019-06-12 2019-07-30 睿魔智能科技(深圳)有限公司 一种图像拍摄方法、装置、设备以及存储介质

Also Published As

Publication number Publication date
CN110072064A (zh) 2019-07-30
US11736800B2 (en) 2023-08-22
CN110072064B (zh) 2020-07-03
US20220201219A1 (en) 2022-06-23

Similar Documents

Publication Publication Date Title
WO2020248396A1 (zh) 图像拍摄方法、装置、设备以及存储介质
WO2020248395A1 (zh) 跟拍方法、装置、设备及存储介质
JP7048764B2 (ja) パノラマビデオのターゲット追跡方法及びパノラマカメラ
CN110139115B (zh) 基于关键点的虚拟形象姿态控制方法、装置及电子设备
CN105678809A (zh) 手持式自动跟拍装置及其目标跟踪方法
CN107749952B (zh) 一种基于深度学习的智能无人摄影方法和系统
CN105718887A (zh) 基于移动终端摄像头实现动态捕捉人脸摄像的方法及系统
CN114095662B (zh) 拍摄指引方法及电子设备
CN106292162A (zh) 立体照相装置和相关控制方法
WO2021147650A1 (zh) 拍照方法、装置、存储介质及电子设备
CN108090463B (zh) 对象控制方法、装置、存储介质和计算机设备
CN108702456A (zh) 一种对焦方法、设备及可读存储介质
WO2019227333A1 (zh) 集体照拍摄方法和装置
JP2020053774A (ja) 撮像装置および画像記録方法
WO2022227752A1 (zh) 拍照方法及装置
WO2022143311A1 (zh) 一种智能取景推荐的拍照方法及装置
CN114363522A (zh) 拍照方法及相关装置
US10887525B2 (en) Delivery of notifications for feedback over visual quality of images
CN114140530A (zh) 一种图像处理方法及投影设备
WO2021147648A1 (zh) 提示方法、装置、存储介质及电子设备
WO2021184326A1 (zh) 电子装置的控制方法、装置、设备及系统
WO2021056442A1 (zh) 摄像装置的构图方法、系统及存储介质
KR102619701B1 (ko) 동적 객체에 대한 3차원 자세 추정 데이터 생성 방법 및 그를 위한 컴퓨팅 장치
CN115294508B (zh) 一种基于静态空间三维重构的跟焦方法、系统及摄像系统
CN112839164A (zh) 一种拍照方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19932368

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19932368

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 13-05-2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19932368

Country of ref document: EP

Kind code of ref document: A1