WO2020248396A1 - Image capturing method, apparatus, device, and storage medium
Image capturing method, apparatus, device, and storage medium
- Publication number
- WO2020248396A1 (PCT/CN2019/103656)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- bounding box
- image
- pixel
- reference position
- offset
- Prior art date
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/61—Control of cameras or camera modules based on recognised objects
- H04N23/611—Control of cameras or camera modules based on recognised objects where the recognised objects include parts of the human body
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/246—Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/695—Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/61—Control of cameras or camera modules based on recognised objects
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/64—Computer-aided capture of images, e.g. transfer from script file into camera, check of taken image quality, advice or proposal for image composition or decision on when to take image
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N23/00—Cameras or camera modules comprising electronic image sensors; Control thereof
- H04N23/60—Control of cameras or camera modules
- H04N23/69—Control of means for changing angle of the field of view, e.g. optical zoom objectives or electronic zooming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20021—Dividing image into blocks, subimages or windows
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
Definitions
- This application relates to the field of computer software applications, for example, to an image capturing method, apparatus, device, and storage medium.
- Cameras are increasingly equipped with a smart shooting mode.
- The smart shooting mode only detects the environmental parameters of the current shot and adjusts them automatically to help non-professionals take professional-looking photos.
- Such automatically adjusted parameters are usually limited to aperture, shutter speed, and the like, so the degree of intelligence is low. On this basis, technology for automatically tracking a target during shooting was developed.
- In the related art, a bounding box is used to locate the target, and the movement of the lens is then controlled by a "center control" method to realize the automatic follow-shooting function.
- For portrait shooting, this method has many limitations. Portrait shooting is more complicated, and under different postures the effect achieved by the traditional bounding-box "center control" method differs greatly from what is actually expected. The traditional bounding-box "center control" method is only suitable for the special case in which the target occupies a very small portion of the picture.
- the present application provides an image shooting method, device, equipment, and storage medium, which can automatically control the rotation of the camera based on the pixel-level visual characteristics of the image to improve the shooting effect.
- This application provides an image capturing method, which includes: acquiring a bounding box of a lens tracking target in an image to be captured; predicting a first reference position of the image to be captured by using a pre-trained reference model; and determining a lens movement offset according to the position of each pixel in the bounding box and the first reference position.
- The present application provides an image capturing apparatus, which includes:
- a bounding box acquisition module, configured to acquire a bounding box of a lens tracking target in an image to be captured;
- a reference position prediction module, configured to predict a first reference position of the image to be captured by using a pre-trained reference model; and
- a lens offset determination module, configured to determine a lens movement offset according to the position of each pixel in the bounding box and the first reference position.
- The present application provides an image capturing device, which includes a memory and a processor; the memory stores a computer program that can be run on the processor, and the processor implements the aforementioned image capturing method when executing the computer program.
- the present application provides a computer-readable storage medium, the storage medium stores a computer program, the computer program includes program instructions, and the program instructions, when executed, implement the aforementioned image capturing method.
- FIG. 1 is a flowchart of an image capturing method provided in Embodiment 1 of the present application.
- FIG. 2 is a sub-flowchart of the image capturing method provided in Embodiment 1 of the present application.
- FIG. 3 is a flowchart of another image capturing method provided in Embodiment 2 of the present application.
- FIG. 4 is a training flowchart of a reference model provided in Embodiment 2 of the present application.
- FIG. 5 is a sub-flowchart of training a reference model provided in Embodiment 2 of the present application.
- FIG. 6 is a schematic structural diagram of an image capturing apparatus according to Embodiment 3 of the present application.
- FIG. 7 is a schematic structural diagram of a training sub-module of the image capturing apparatus according to Embodiment 3 of the present application.
- FIG. 8 is a schematic structural diagram of a position acquisition unit of the image capturing apparatus according to Embodiment 3 of the present application.
- FIG. 9 is a schematic structural diagram of a lens offset determination module of the image capturing apparatus according to Embodiment 3 of the present application.
- FIG. 10 is a schematic structural diagram of an image capturing device provided in Embodiment 4 of the present application.
- The terms "first", "second", etc. may be used herein to describe various directions, actions, steps, or elements, but these directions, actions, steps, or elements are not limited by these terms. These terms are only used to distinguish one direction, action, step, or element from another direction, action, step, or element.
- For example, the first speed difference may be referred to as the second speed difference, and the second speed difference may be referred to as the first speed difference.
- The first speed difference and the second speed difference are both speed differences, but they are not the same speed difference.
- the terms “first”, “second”, etc. cannot be understood as indicating or implying relative importance or implicitly indicating the number of indicated technical features.
- Features defined with "first" and "second" may explicitly or implicitly include one or more of these features.
- a plurality of means at least two, such as two, three, etc., unless specifically defined otherwise.
- When one part is said to be "fixed" to another part, it can be directly on the other part, or an intervening part may be present.
- When a part is considered to be "connected" to another part, it may be directly connected to the other part, or an intervening part may be present at the same time.
- the terms “vertical”, “horizontal”, “left”, “right” and similar expressions used herein are for illustrative purposes only and do not mean that they are the only implementation.
- Some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe multiple steps as sequential processing, many of the steps can be implemented in parallel, concurrently, or simultaneously. In addition, the order of the steps can be rearranged. The processing may be terminated when its operations are completed, but it may also have additional steps not included in the drawings. The processing can correspond to methods, functions, procedures, subroutines, subprograms, and so on.
- this embodiment provides an image shooting method, which includes the following steps.
- The lens tracking target mentioned here refers to the main shooting target that needs to be kept within the lens at all times, such as a person, a pet, or another photographic subject.
- a bounding box is used to determine the position of the lens tracking target, and the bounding box refers to the area range of the picture where the lens tracking target appears in the image to be shot.
- The bounding box generally has a rectangular outer frame shape that is elongated in either the longitudinal or the transverse direction. The size and position of the bounding box in this embodiment depend on the size of the lens tracking target in the image captured by the lens. In one embodiment, the bounding box may be determined based on a visual tracking method in the related art.
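- The following is an illustrative, non-limiting sketch (in Python) of one way the bounding box could be represented in the normalized coordinates used throughout this description and rasterized to the set of pixels it covers; the helper name and data layout are assumptions of this sketch, not part of the method.

```python
import numpy as np

def bbox_pixel_mask(bbox, width, height):
    """Return a boolean mask of the pixels covered by a bounding box.

    bbox is (x_min, y_min, x_max, y_max) in normalized [0, 1] coordinates,
    as used throughout this description; width/height are the image size.
    """
    x_min, y_min, x_max, y_max = bbox
    ys, xs = np.mgrid[0:height, 0:width]
    # Normalize pixel centers to [0, 1] so they are comparable with the bbox.
    xn = (xs + 0.5) / width
    yn = (ys + 0.5) / height
    return (xn >= x_min) & (xn <= x_max) & (yn >= y_min) & (yn <= y_max)

# Example: a standing-portrait bounding box occupying the right third of the frame.
mask = bbox_pixel_mask((0.6, 0.2, 0.9, 0.95), width=640, height=480)
```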
- S120 Predict the first reference position of the image to be shot by using the pre-trained reference model.
- In the related art, the "center control" method is usually used to place the target at the exact center of the image, but this method does not take into account the influence of the tracked target's posture on the composition of the picture. For example, when shooting a standing portrait, the "center control" method puts the center of the standing figure at the center of the image, whereas placing the upper body of the person closer to the image center gives a better composition. Therefore, this embodiment uses a pre-trained reference model to predict the first reference position of the image to be captured.
- The reference model is obtained by training a deep convolutional neural network (CNN).
- The first reference position is the predicted best composition position of the lens tracking target in the image.
- The best composition position is obtained by statistically analyzing a large number of images containing the lens tracking target taken by photographers, i.e., where the lens tracking target is located within the images taken by the photographers.
- The best composition position is determined by the reference model according to the information of the lens tracking target in the image.
- the information of the lens tracking target includes one or more of the size and position of the bounding box of the lens tracking target and the posture of the lens tracking target.
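- This description specifies only that the reference model is a deep convolutional neural network whose output gives, for each pixel, the predicted reference-position coordinates; the toy PyTorch module below is purely an illustrative stand-in under that assumption, not the architecture of this application.

```python
import torch
import torch.nn as nn

class ReferenceModel(nn.Module):
    """Toy stand-in for the reference model: maps an RGB image to two
    full-resolution channels interpreted as XT and YT (the reference-position
    images). The real model's architecture is not specified in this text."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 2, kernel_size=1), nn.Sigmoid(),  # outputs in [0, 1]
        )

    def forward(self, image):            # image: (N, 3, H, W)
        out = self.net(image)            # (N, 2, H, W)
        xt, yt = out[:, 0], out[:, 1]    # reference-position images XT, YT
        return xt, yt
```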
- S130 Determine a lens movement offset according to each pixel position in the bounding box and the first reference position.
- Once the predicted composition position of the bounding box is determined, the movement offset required by the lens can be calculated by combining it with the initial position of the bounding box.
- The traditional bounding-box "center control" method only uses the center point of the bounding box in its calculation: it computes the movement offset required to move the center point of the bounding box to the center of the frame.
- This calculation works well when the bounding box is small enough, but in actual shooting the size of the bounding box is uncertain, and for a good composition the proportion of the lens tracking target in the image (i.e., the proportion of the bounding box) cannot be too small. Therefore, to obtain a more accurate lens offset, this embodiment calculates the lens movement offset from the position of every pixel in the bounding box, based on the pixel-level visual features of the image and the first reference position predicted by the reference model.
- step S130 includes step S1310-step S1320.
- S1310 Calculate the position offset of each pixel in the bounding box according to the first reference position.
- (x,y) are the normalized coordinates of the pixel, x represents the horizontal coordinate, and y represents the vertical coordinate.
- XT is the horizontal coordinate image of the reference position
- YT is the vertical coordinate image of the reference position, which is predicted by the reference model.
- DX is the horizontal offset image and DY is the vertical offset image; both are calculated in the subsequent steps.
- According to the first reference position, the position offset of each pixel in the bounding box is calculated by the formulas DX(x, y) = XT(x, y) − x and DY(x, y) = YT(x, y) − y.
- Here DX(x, y) is the horizontal offset of each pixel in the bounding box, and XT(x, y) is the horizontal position of each pixel in the bounding box when the bounding box is located at the first reference position, that is, the horizontal coordinate of each pixel in the bounding box in the image predicted by the reference model.
- DY(x, y) is the vertical offset of each pixel in the bounding box, and YT(x, y) is the vertical position of each pixel in the bounding box when the bounding box is located at the first reference position, that is, the vertical coordinate of each pixel in the bounding box in the image predicted by the reference model.
- x is the horizontal position of each pixel in the bounding box, which can also be understood as the horizontal coordinate of the initial position of each pixel in the bounding box, and y is the vertical position of each pixel in the bounding box, which can also be understood as the vertical coordinate of the initial position of each pixel in the bounding box.
- In this way, for the case where the bounding box is located at the first reference position, the coordinate difference between the position of each pixel in the bounding box and its initial position can be calculated separately, which indicates the position offset of each pixel in the bounding box between the image predicted by the reference model and the image captured before the lens moves.
- S1320 Calculate the lens movement offset according to the position offset of each pixel in the bounding box.
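- Steps S1310-S1320 can be illustrated with the following minimal sketch. The per-pixel offsets follow the definitions above (DX = XT − x, DY = YT − y); the reduction of the per-pixel offsets to a single lens offset d = (d_x, d_y) is assumed here to be an average over the bounding box, which is consistent with, but not explicitly stated by, the pixel-count term ∑_{(x,y)∈Θ} 1 described in Embodiment 3.

```python
import numpy as np

def lens_offset(xt, yt, bbox_mask):
    """Sketch of steps S1310-S1320.

    xt, yt: reference-position images predicted by the reference model,
            giving the target normalized coordinates of every pixel.
    bbox_mask: boolean mask of the pixels belonging to the bounding box.
    Returns (d_x, d_y), the lens movement offset.
    """
    h, w = xt.shape
    ys, xs = np.mgrid[0:h, 0:w]
    x = (xs + 0.5) / w          # initial normalized horizontal position of each pixel
    y = (ys + 0.5) / h          # initial normalized vertical position of each pixel

    dx = xt - x                 # DX(x, y): horizontal offset of each pixel
    dy = yt - y                 # DY(x, y): vertical offset of each pixel

    n = bbox_mask.sum()         # number of pixels contained in the bounding box
    # Assumed reduction: average the per-pixel offsets over the bounding box.
    d_x = dx[bbox_mask].sum() / n
    d_y = dy[bbox_mask].sum() / n
    return d_x, d_y
```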
- This embodiment thus provides an image capturing method in which a reference model trained on a deep convolutional neural network predicts, for the image to be captured, a first reference position with a better composition effect, and the position offset of each pixel is calculated from the pixel-level visual features of the image and the first reference position to obtain the lens movement offset.
- The image capturing method provided by this application determines the position of the lens tracking target in the image to be captured through the bounding box, predicts the first reference position of the image to be captured by using a reference model trained on a convolutional neural network that can simulate a photographer's composition ideas, and uses a pixel-level calculation to obtain the lens movement offset required to bring the tracking target to the first reference position.
- By automatically controlling the rotation of the camera based on the pixel-level visual features of the image, the method can automatically adapt to changes in the target's posture and in the camera's shooting angle, improving the shooting effect and helping to improve the user experience.
- FIG. 3 is a schematic flow chart of another image shooting method provided in Embodiment 2 of the present application. This embodiment is implemented on the basis of Embodiment 1. As shown in FIG. 3, the following steps are further included before step S110.
- Step S100 Obtain a pre-trained reference model based on deep convolutional neural network training.
- Step S100, obtaining a pre-trained reference model based on deep convolutional neural network training (i.e., the training process of the reference model), includes steps S310-S360.
- multiple training images are preset in the image data set, and the training image type can be selected according to different shooting targets.
- portrait shooting is taken as an example, and all the training images collected in the image data set include portraits.
- These training images can cover many types of main scenes such as indoors, beaches and mountains, and various postures such as running, sitting, lying down and dancing.
- Each training image in the image data set has corresponding label data.
- the label data in this embodiment includes the bounding box information and key point information of the tracking target in the training image.
- the bounding box information includes the position of the bounding box and the size of the bounding box.
- 17 joint points of the human body are exemplarily selected as key points, and coordinate information corresponding to the joint points are respectively marked as key point information.
- Each joint point is marked as (xi, yi, si), where i is a natural number from 1 to 17 indicating the i-th key point, xi is the horizontal coordinate of the i-th key point, and yi is the vertical coordinate of the i-th key point.
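- As a purely illustrative sketch, one training sample's label data might be organized as follows; the field names are hypothetical, and the per-point flag s is carried over from the (xi, yi, si) annotation without further interpretation.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class KeyPoint:
    x: float   # horizontal coordinate of the i-th joint point
    y: float   # vertical coordinate of the i-th joint point
    s: int     # per-point flag carried with the (xi, yi, si) annotation

@dataclass
class TargetLabel:
    bbox: Tuple[float, float, float, float]  # bounding box position and size
    keypoints: List[KeyPoint]                # 17 human joint points

@dataclass
class TrainingSample:
    image_path: str
    targets: List[TargetLabel]               # one entry per tracking target
```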
- S320 Obtain the reference position of the center point of the bounding box according to the bounding box information and key point information of the tracking target.
- the traditional "center control” method controls the center point of the target bounding box to move to the center of the image to complete the composition.
- the calculation process of this method is simple and does not take into account the influence of the target posture on the composition. Therefore, the shooting effect is quite different from the actual expectation.
- the shooting method provided in this embodiment when training the reference model, the difference in composition requirements of the tracking target in different poses is fully considered, and the different poses of the tracking target can be distinguished according to the different key point information of the tracking target marked in step S310. Tracking the bounding box information and key point information of the target calculates the reference position of the center point of the bounding box, and can fully simulate the photographer's composition control ability, and its composition effect is better.
- step S320 includes step S3210-step S3230:
- the horizontal coordinate range and vertical coordinate range of the image are both [0, 1].
- The settings of the reference points and reference lines can be adjusted according to different composition requirements.
- The area delimited by the above reference points and reference lines within the horizontal coordinate range and the vertical coordinate range is set as the best composition area for the tracking target.
- p_i and p_j represent two different points.
- x_pi and y_pi represent the horizontal and vertical coordinates of the point p_i, respectively.
- x_pj and y_pj represent the horizontal and vertical coordinates of the point p_j, respectively.
- P_xy = (x/W, y/H) is the normalized point corresponding to grid position (x, y), and L_xy is the normalized line segment between two such points.
- the second loss value may reflect the degree of conformity between the tracking target and the customized optimal composition area when the bounding box is placed in different positions. The smaller the second loss value, the closer to the customized optimal composition area.
- The custom grid, reference points, and reference lines can be adjusted according to different requirements on image accuracy.
- The key points of the tracking target, the key line segments, and the relationships between them can also be customized. For example, when higher accuracy is required, W and H can be increased, that is, the number of cells in the image segmentation grid is increased.
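- The grid search of steps S3210-S3230 can be sketched as follows. The second-loss function itself, which encodes the customized reference points, reference lines, and key-point/line-segment relationships, is left as a pluggable argument because this description leaves it configurable; the toy loss in the example is only for demonstration.

```python
import numpy as np

def best_center(second_loss, W=16, H=16):
    """Sketch of steps S3210-S3230: pick the grid center with minimal second loss.

    second_loss(cx, cy) -> float evaluates how well the tracking target matches
    the customized best-composition area when the bounding-box center is placed
    at normalized coordinates (cx, cy). W, H: number of grid cells per axis.
    """
    best, best_pos = np.inf, None
    for i in range(W):
        for j in range(H):
            cx = (i + 0.5) / W        # center of grid cell (i, j), horizontal
            cy = (j + 0.5) / H        # center of grid cell (i, j), vertical
            loss = second_loss(cx, cy)
            if loss < best:
                best, best_pos = loss, (cx, cy)
    return best_pos  # reference position of the bounding-box center point

# Example with a toy second loss that prefers the upper third of the frame.
ref_cx, ref_cy = best_center(lambda cx, cy: (cx - 0.5) ** 2 + (cy - 1 / 3) ** 2)
```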
- In formula (1), (x, y) are the normalized coordinates of a pixel, and the count term (the sum of 1 over the tracking targets) is the number of tracking targets in the training image.
- X_TG(x, y) is the horizontal coordinate of the reference position of each pixel, and Y_TG(x, y) is the vertical coordinate of the reference position of each pixel.
- x_ti and x_ci are the horizontal coordinates of the reference position and of the initial position of the center point of the bounding box of each tracking target, respectively, and y_ti and y_ci are the corresponding vertical coordinates. After the reference position coordinates of every pixel are determined, the reference position image of the training image is obtained.
- Compared with the image obtained by the traditional "center control" method, the reference position image fully considers the composition requirements of different target poses, and its composition is more precise and reasonable.
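- Because formula (1) itself is not reproduced in this text, the sketch below assumes the natural reading of the symbol definitions above, namely that every pixel is shifted by the bounding-box center-point displacement averaged over the tracking targets; this reconstruction is an assumption for illustration only.

```python
import numpy as np

def reference_position_image(width, height, centers_init, centers_ref):
    """Assumed reconstruction of formula (1): build the X_TG, Y_TG images.

    centers_init: list of (x_ci, y_ci), initial bounding-box center per target.
    centers_ref:  list of (x_ti, y_ti), reference bounding-box center per target.
    """
    n = len(centers_init)                      # number of tracking targets
    shift_x = sum(xt - xc for (xc, _), (xt, _) in zip(centers_init, centers_ref)) / n
    shift_y = sum(yt - yc for (_, yc), (_, yt) in zip(centers_init, centers_ref)) / n

    ys, xs = np.mgrid[0:height, 0:width]
    x = (xs + 0.5) / width                     # initial normalized coordinates
    y = (ys + 0.5) / height
    x_tg = x + shift_x                         # X_TG(x, y)
    y_tg = y + shift_y                         # Y_TG(x, y)
    return x_tg, y_tg
```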
- The initial deep convolutional neural network model is used to predict the training image to obtain the second reference position of the tracking target in the image. From this, the prediction result image can be obtained, and the horizontal and vertical coordinates of each pixel in the prediction result image are X_T(x, y) and Y_T(x, y), respectively.
- S350 Calculate a first loss value according to the reference position image and the prediction result image, and adjust the parameters of the deep convolutional neural network according to the first loss value.
- The first loss value adopts Euclidean distance loss. According to the aforementioned reference position image and prediction result image, it is calculated by formula (2): L = ∑_{x,y}(X_TG(x, y) − X_T(x, y))² + ∑_{x,y}(Y_TG(x, y) − Y_T(x, y))².
- X_TG(x, y) and Y_TG(x, y) are obtained from formula (1), and X_T(x, y) and Y_T(x, y) are obtained from the prediction result image.
- The reference position image is the image that is expected to achieve the desired composition effect.
- The first loss value represents the deviation between the prediction result image and the reference position image. Based on the first loss value, back-propagation is performed on the deep convolutional neural network to adjust its parameters so that the prediction result image becomes closer to the reference position image.
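- The Euclidean distance loss of formula (2) (also given in claim 7) can be transcribed directly, for example:

```python
import numpy as np

def first_loss(x_tg, y_tg, x_t, y_t):
    """L = sum((X_TG - X_T)^2) + sum((Y_TG - Y_T)^2) over all pixels (formula (2))."""
    return float(np.sum((x_tg - x_t) ** 2) + np.sum((y_tg - y_t) ** 2))
```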
- Steps S310-S350 are performed sequentially on the multiple training images in the image data set until the first loss value in step S350 no longer decreases, at which point the training of the deep convolutional neural network is ended and the pre-trained reference model is obtained.
- Each time the parameters of the deep convolutional neural network are adjusted according to the first loss value, a different first loss value is obtained.
- As long as the first loss value keeps decreasing, the prediction result image is getting closer and closer to the reference position image, and the deep convolutional neural network continues to be adjusted.
- When the first loss value finally no longer decreases, the prediction result image can be considered closest to the reference position image, and the desired deep convolutional neural network model is obtained as the trained reference model.
- For example, a custom threshold k can be set for the first loss value; when, after repeated training over multiple training images, at least m consecutive first loss values remain below k, the first loss value can be deemed to no longer decrease.
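- The stopping rule described above (at least m consecutive first loss values below a custom threshold k) could be tracked as in the following sketch; the training-step callback and its arguments are placeholders, not part of the described method.

```python
def train_until_converged(train_step, samples, k=0.01, m=10, max_epochs=100):
    """Stop when the first loss value has stayed below k for m consecutive steps.

    train_step(sample) -> float runs steps S310-S350 on one training image and
    returns the first loss value; k and m are the custom threshold and count.
    """
    consecutive = 0
    for _ in range(max_epochs):
        for sample in samples:
            loss = train_step(sample)
            consecutive = consecutive + 1 if loss < k else 0
            if consecutive >= m:
                return  # first loss value is considered to no longer decrease
```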
- This embodiment provides the training process of the pre-trained reference model used in Embodiment 1 and provides a more reasonable composition method based on the key point information of the tracking target, so that the composition effect of the resulting reference image is better.
- The first loss value, calculated from the reference position image and the output of the deep convolutional neural network, is back-propagated through the network to obtain a trained reference model that can adapt to different target postures and predict a more reasonable composition prediction image.
- This embodiment provides an image capturing apparatus 500, which includes: a bounding box acquisition module 510, configured to acquire the bounding box of a lens tracking target in the image to be captured; a reference position prediction module 520, configured to predict the first reference position of the image to be captured by using the pre-trained reference model; and a lens offset determination module 530, configured to determine the lens movement offset according to the position of each pixel in the bounding box and the first reference position.
- The bounding box acquisition module 510 is configured to acquire multiple bounding boxes corresponding to the lens tracking targets according to the number of lens tracking targets in the image to be captured.
- the reference position prediction module 520 further includes a model training sub-module 521, which is configured to obtain a trained reference model based on deep convolutional neural network training.
- The model training sub-module 521 includes: a data set unit 5210, configured to obtain training images and corresponding label data from a preset image data set, the label data including the bounding box information and key point information of the tracking target in the training image; a position acquisition unit 5211, configured to obtain the reference position of the center point of the bounding box according to the bounding box information and key point information of the tracking target; and an image acquisition unit 5212, configured to obtain the reference position image corresponding to the training image based on the reference position of the center point of the bounding box.
- The image acquisition unit 5212 is configured to obtain the reference position image corresponding to the training image according to the reference position of the center point of the bounding box of each tracking target, the initial position of the center point of the bounding box of each tracking target, and the number of tracking targets.
- The prediction result image acquisition unit 5213 is configured to predict the second reference position of the training image by using the deep convolutional neural network to obtain the prediction result image.
- The loss value processing unit 5214 is configured to calculate the first loss value according to the reference position image and the prediction result image, and to adjust the parameters of the deep convolutional neural network according to the first loss value.
- Here X_TG(x, y) is the horizontal position of each pixel in the bounding box calculated according to the reference position of the center point of the bounding box, X_T(x, y) is the horizontal position of each pixel in the bounding box predicted by the deep convolutional neural network, Y_TG(x, y) is the vertical position of each pixel in the bounding box calculated according to the reference position of the center point of the bounding box, and Y_T(x, y) is the vertical position of each pixel in the bounding box predicted by the deep convolutional neural network.
- The position acquisition unit 5211 includes: a grid division subunit 52120, configured to generate a grid table based on the training image and divide the training image into W*H grids, where W and H are natural numbers greater than 1; a second loss value processing subunit 52121, configured to obtain the second loss value when the center of the bounding box is placed at different grid centers; and a reference position obtaining subunit 52122, configured to select the center position of the grid with the smallest second loss value as the reference position of the center point of the bounding box.
- The lens offset determination module 530 includes: a pixel position offset obtaining sub-module 5300, configured to calculate the position offset of each pixel in the bounding box according to the first reference position; and
- a lens movement offset obtaining sub-module 5301, configured to calculate the lens movement offset according to the position offset of each pixel in the bounding box.
- The pixel position offset obtaining sub-module 5300 is configured to calculate the position offset of each pixel in the bounding box according to the first reference position by using the formulas DX(x, y) = XT(x, y) − x and DY(x, y) = YT(x, y) − y;
- where DX(x, y) is the horizontal offset of each pixel in the bounding box, XT(x, y) is the horizontal position of each pixel in the bounding box when the bounding box is located at the first reference position, DY(x, y) is the vertical offset of each pixel in the bounding box, YT(x, y) is the vertical position of each pixel in the bounding box when the bounding box is located at the first reference position, x is the horizontal position of each pixel in the bounding box, and y is the vertical position of each pixel in the bounding box.
- The lens movement offset obtaining sub-module 5301 is configured to calculate the lens movement offset d from the position offset of each pixel in the bounding box;
- where d = (d_x, d_y), d_x is the horizontal movement offset of the lens, d_y is the vertical movement offset of the lens, (x, y) ∈ Θ indicates that the pixel (x, y) belongs to the bounding box Θ, and ∑_{(x,y)∈Θ} 1 represents the total number of pixels contained in the bounding box Θ.
- This embodiment provides an image shooting device that can automatically adapt to changes in the posture of a target and adapt to changes in the shooting angle of the camera to perform shooting, thereby improving the shooting effect and improving the user experience.
- An image capturing device provided in an embodiment of the present application can execute an image capturing method provided in the foregoing embodiment of the present application, and has functional modules and beneficial effects corresponding to the execution method.
- FIG. 10 is a schematic structural diagram of an image capturing device 600 provided in Embodiment 4 of the application.
- the image capturing device includes a memory 610 and a processor 620.
- The number of processors 620 in the image capturing device may be one or more; one processor 620 is taken as an example in FIG. 10. The memory 610 and the processor 620 in the image capturing device may be connected through a bus or in other ways; in FIG. 10, connection through a bus is taken as an example.
- The memory 610 can be configured to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the image capturing method in the embodiments of the present application (for example, the bounding box acquisition module 510, the reference position prediction module 520, and the lens offset determination module 530 in the image capturing apparatus).
- the processor 620 executes various functional applications and data processing of the image capturing device by running the software programs, instructions, and modules stored in the memory 610, thus realizing the foregoing image capturing method.
- the processor 620 is configured to run a computer executable program stored in the memory 610 to achieve the following: step S110, obtain the bounding box of the lens tracking target in the image to be shot; step S120, use the pre-trained The reference model predicts the first reference position of the image to be shot; step S130, determining the lens shift offset according to each pixel position in the bounding box and the first reference position.
- An image capturing device provided by an embodiment of the present application is not limited to the method operations described above, and may also perform related operations in the image capturing method provided in any embodiment of the present application.
- the memory 610 may mainly include a program storage area and a data storage area.
- the storage program area may store an operating system and an application program required by at least one function; the storage data area may store data created according to the use of the terminal, and the like.
- the memory 610 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
- the memory 610 may include a memory remotely provided with respect to the processor 620, and these remote memories may be connected to the image capturing device through a network. Examples of the aforementioned networks include but are not limited to the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
- This embodiment provides an image shooting device, which can automatically adapt to changes in the posture of a target and adapt to changes in the shooting angle of the camera for shooting, improving the shooting effect, and helping to improve the user experience.
- The fifth embodiment of the present application also provides a storage medium containing computer-executable instructions which, when executed by a computer processor, are used to execute an image capturing method, the image capturing method including: acquiring a bounding box of a lens tracking target in an image to be captured; predicting a first reference position of the image to be captured by using a pre-trained reference model; and determining a lens movement offset according to the position of each pixel in the bounding box and the first reference position.
- An embodiment of the application provides a storage medium containing computer-executable instructions.
- The computer-executable instructions are not limited to the method operations described above, and can also perform related operations in the image capturing method provided by any embodiment of the present application.
- this application can be implemented by software and general hardware, or can be implemented by hardware.
- The technical solution of this application can be embodied in the form of a software product, and the computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, a read-only memory (ROM), a random access memory (RAM), a flash memory (FLASH), a hard disk, or an optical disk, and includes multiple instructions to enable a computer device (which may be a personal computer, an image capturing device, or a network device, etc.) to perform the method described in any embodiment of this application.
- The multiple units and modules included are only divided according to functional logic, but are not limited to the above division, as long as the corresponding functions can be realized.
- In addition, the names of the functional units are only for the convenience of distinguishing them from each other and are not used to limit the protection scope of this application.
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Studio Devices (AREA)
- Image Analysis (AREA)
Claims (10)
- An image capturing method, comprising: acquiring a bounding box of a lens tracking target in an image to be captured; predicting a first reference position of the image to be captured by using a pre-trained reference model; and determining a lens movement offset according to the position of each pixel in the bounding box and the first reference position.
- The method according to claim 1, wherein determining the lens movement offset according to the position of each pixel in the bounding box and the first reference position comprises: calculating a position offset of each pixel in the bounding box according to the first reference position; and calculating the lens movement offset according to the position offset of each pixel in the bounding box.
- The method according to claim 1 or 2, wherein the training process of the pre-trained reference model comprises: obtaining training images and corresponding label data from a preset image data set, the label data comprising bounding box information and key point information of a tracking target in the training images; obtaining a reference position of the center point of the bounding box according to the bounding box information and key point information of the tracking target; obtaining a reference position image corresponding to the training image based on the reference position of the center point of the bounding box; predicting a second reference position of the training image by using a deep convolutional neural network to obtain a prediction result image; calculating a first loss value according to the reference position image and the prediction result image, and adjusting parameters of the deep convolutional neural network according to the first loss value; and performing the above steps sequentially on multiple training images in the image data set until the first loss value no longer decreases, ending the training of the deep convolutional neural network to obtain the pre-trained reference model.
- The method according to claim 3, wherein obtaining the reference position of the center point of the bounding box according to the bounding box information and key point information of the tracking target comprises: generating a grid table based on the training image, and dividing the training image into W*H grids, where W and H are natural numbers greater than 1; obtaining second loss values when the center of the bounding box is placed at different grid centers; and selecting the center position of the grid with the smallest second loss value as the reference position of the center point of the bounding box.
- The method according to claim 3 or 4, wherein obtaining the reference position image corresponding to the training image based on the reference position of the center point of the bounding box comprises: obtaining the reference position image corresponding to the training image according to the reference position of the center point of the bounding box of each tracking target, the initial position of the center point of the bounding box of each tracking target, and the number of tracking targets.
- wherein DX(x, y) is the horizontal offset of each pixel in the bounding box, XT(x, y) is the horizontal position of each pixel in the bounding box when the bounding box is located at the first reference position, DY(x, y) is the vertical offset of each pixel in the bounding box, YT(x, y) is the vertical position of each pixel in the bounding box when the bounding box is located at the first reference position, x is the horizontal position of each pixel in the bounding box, and y is the vertical position of each pixel in the bounding box; and wherein d = (d_x, d_y), d_x is the horizontal movement offset of the lens, d_y is the vertical movement offset of the lens, (x, y) ∈ Θ indicates that the pixel (x, y) belongs to the bounding box Θ, and ∑_{(x,y)∈Θ} 1 represents the total number of pixels contained in the bounding box Θ.
- The method according to any one of claims 3-6, wherein the first loss value is calculated by the formula L = ∑_{x,y}(X_TG(x, y) − X_T(x, y))² + ∑_{x,y}(Y_TG(x, y) − Y_T(x, y))²; where X_TG(x, y) is the horizontal position of each pixel in the bounding box calculated according to the reference position of the center point of the bounding box, X_T(x, y) is the horizontal position of each pixel in the bounding box predicted by the deep convolutional neural network, Y_TG(x, y) is the vertical position of each pixel in the bounding box calculated according to the reference position of the center point of the bounding box, and Y_T(x, y) is the vertical position of each pixel in the bounding box predicted by the deep convolutional neural network.
- An image capturing apparatus, comprising: a bounding box acquisition module, configured to acquire a bounding box of a lens tracking target in an image to be captured; a reference position prediction module, configured to predict a first reference position of the image to be captured by using a pre-trained reference model; and a lens offset determination module, configured to determine a lens movement offset according to the position of each pixel in the bounding box and the first reference position.
- An image capturing device, comprising a memory and a processor, wherein the memory stores a computer program executable on the processor, and the processor implements the image capturing method according to any one of claims 1-7 when executing the computer program.
- A computer-readable storage medium storing a computer program, wherein the computer program comprises program instructions which, when executed, implement the image capturing method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US17/606,075 US11736800B2 (en) | 2019-06-12 | 2019-08-30 | Method, apparatus, and device for image capture, and storage medium |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910506435.6A CN110072064B (zh) | 2019-06-12 | 2019-06-12 | 一种图像拍摄方法、装置、设备以及存储介质 |
CN201910506435.6 | 2019-06-12 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020248396A1 true WO2020248396A1 (zh) | 2020-12-17 |
Family
ID=67372768
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/103656 WO2020248396A1 (zh) | 2019-06-12 | 2019-08-30 | 图像拍摄方法、装置、设备以及存储介质 |
Country Status (3)
Country | Link |
---|---|
US (1) | US11736800B2 (zh) |
CN (1) | CN110072064B (zh) |
WO (1) | WO2020248396A1 (zh) |
Families Citing this family (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110072064B (zh) | 2019-06-12 | 2020-07-03 | 睿魔智能科技(深圳)有限公司 | 一种图像拍摄方法、装置、设备以及存储介质 |
CN111147749A (zh) * | 2019-12-31 | 2020-05-12 | 宇龙计算机通信科技(深圳)有限公司 | 拍摄方法、拍摄装置、终端及存储介质 |
CN112017210A (zh) * | 2020-07-14 | 2020-12-01 | 创泽智能机器人集团股份有限公司 | 目标物体跟踪方法及装置 |
TWI767714B (zh) | 2021-05-19 | 2022-06-11 | 華碩電腦股份有限公司 | 電子裝置以及其影像擷取器的控制方法 |
KR20230073887A (ko) * | 2021-11-19 | 2023-05-26 | 한국전자통신연구원 | 3차원 손 자세 추정 방법 및 증강 시스템 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170244887A1 (en) * | 2016-02-19 | 2017-08-24 | Canon Kabushiki Kaisha | Image capturing apparatus, control method of the same, and storage medium |
CN107749952A (zh) * | 2017-11-09 | 2018-03-02 | 睿魔智能科技(东莞)有限公司 | 一种基于深度学习的智能无人摄影方法和系统 |
CN109117794A (zh) * | 2018-08-16 | 2019-01-01 | 广东工业大学 | 一种运动目标行为跟踪方法、装置、设备及可读存储介质 |
CN109803090A (zh) * | 2019-01-25 | 2019-05-24 | 睿魔智能科技(深圳)有限公司 | 无人拍摄自动变焦方法及系统、无人摄像机及存储介质 |
CN110072064A (zh) * | 2019-06-12 | 2019-07-30 | 睿魔智能科技(深圳)有限公司 | 一种图像拍摄方法、装置、设备以及存储介质 |
Family Cites Families (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE69434657T2 (de) * | 1993-06-04 | 2007-02-15 | Sarnoff Corp. | System und Verfahren zur elektronischen Bildstabilisierung |
JP3515926B2 (ja) * | 1999-06-23 | 2004-04-05 | 本田技研工業株式会社 | 車両の周辺監視装置 |
JP3950707B2 (ja) * | 2002-02-22 | 2007-08-01 | キヤノン株式会社 | 光学機器 |
CN102710896B (zh) * | 2012-05-07 | 2015-10-14 | 浙江宇视科技有限公司 | 针对动态目标进行拉框放大的方法和装置 |
JP6335434B2 (ja) * | 2013-04-19 | 2018-05-30 | キヤノン株式会社 | 撮像装置、その制御方法およびプログラム |
CN103905733B (zh) * | 2014-04-02 | 2018-01-23 | 哈尔滨工业大学深圳研究生院 | 一种单目摄像头对人脸实时跟踪的方法及系统 |
US10048749B2 (en) * | 2015-01-09 | 2018-08-14 | Microsoft Technology Licensing, Llc | Gaze detection offset for gaze tracking models |
JP2016140030A (ja) * | 2015-01-29 | 2016-08-04 | 株式会社リコー | 画像処理装置、撮像装置、及び画像処理プログラム |
JP6800628B2 (ja) * | 2016-06-22 | 2020-12-16 | キヤノン株式会社 | 追跡装置、追跡方法、及びプログラム |
US20180189609A1 (en) * | 2017-01-04 | 2018-07-05 | Qualcomm Incorporated | Training data for machine-based object recognition |
US10699421B1 (en) * | 2017-03-29 | 2020-06-30 | Amazon Technologies, Inc. | Tracking objects in three-dimensional space using calibrated visual cameras and depth cameras |
US10628961B2 (en) * | 2017-10-13 | 2020-04-21 | Qualcomm Incorporated | Object tracking for neural network systems |
CN108234872A (zh) * | 2018-01-03 | 2018-06-29 | 上海传英信息技术有限公司 | 移动终端及其拍照方法 |
CN108200344A (zh) * | 2018-01-23 | 2018-06-22 | 江苏冠达通电子科技有限公司 | 摄像机的调整变焦方法 |
CN108960090B (zh) * | 2018-06-20 | 2023-05-30 | 腾讯科技(深圳)有限公司 | 视频图像处理方法及装置、计算机可读介质和电子设备 |
CN109064514B (zh) * | 2018-07-03 | 2022-04-26 | 北京航空航天大学 | 一种基于投影点坐标回归的六自由度位姿估计方法 |
CN109087337B (zh) * | 2018-11-07 | 2020-07-14 | 山东大学 | 基于分层卷积特征的长时间目标跟踪方法及系统 |
US11277556B2 (en) * | 2019-04-01 | 2022-03-15 | Jvckenwood Corporation | Control device for automatic tracking camera |
-
2019
- 2019-06-12 CN CN201910506435.6A patent/CN110072064B/zh active Active
- 2019-08-30 US US17/606,075 patent/US11736800B2/en active Active
- 2019-08-30 WO PCT/CN2019/103656 patent/WO2020248396A1/zh active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170244887A1 (en) * | 2016-02-19 | 2017-08-24 | Canon Kabushiki Kaisha | Image capturing apparatus, control method of the same, and storage medium |
CN107749952A (zh) * | 2017-11-09 | 2018-03-02 | 睿魔智能科技(东莞)有限公司 | 一种基于深度学习的智能无人摄影方法和系统 |
CN109117794A (zh) * | 2018-08-16 | 2019-01-01 | 广东工业大学 | 一种运动目标行为跟踪方法、装置、设备及可读存储介质 |
CN109803090A (zh) * | 2019-01-25 | 2019-05-24 | 睿魔智能科技(深圳)有限公司 | 无人拍摄自动变焦方法及系统、无人摄像机及存储介质 |
CN110072064A (zh) * | 2019-06-12 | 2019-07-30 | 睿魔智能科技(深圳)有限公司 | 一种图像拍摄方法、装置、设备以及存储介质 |
Also Published As
Publication number | Publication date |
---|---|
CN110072064A (zh) | 2019-07-30 |
US11736800B2 (en) | 2023-08-22 |
CN110072064B (zh) | 2020-07-03 |
US20220201219A1 (en) | 2022-06-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2020248396A1 (zh) | 图像拍摄方法、装置、设备以及存储介质 | |
WO2020248395A1 (zh) | 跟拍方法、装置、设备及存储介质 | |
JP7048764B2 (ja) | パノラマビデオのターゲット追跡方法及びパノラマカメラ | |
CN110139115B (zh) | 基于关键点的虚拟形象姿态控制方法、装置及电子设备 | |
CN105678809A (zh) | 手持式自动跟拍装置及其目标跟踪方法 | |
CN107749952B (zh) | 一种基于深度学习的智能无人摄影方法和系统 | |
CN105718887A (zh) | 基于移动终端摄像头实现动态捕捉人脸摄像的方法及系统 | |
CN114095662B (zh) | 拍摄指引方法及电子设备 | |
CN106292162A (zh) | 立体照相装置和相关控制方法 | |
WO2021147650A1 (zh) | 拍照方法、装置、存储介质及电子设备 | |
CN108090463B (zh) | 对象控制方法、装置、存储介质和计算机设备 | |
CN108702456A (zh) | 一种对焦方法、设备及可读存储介质 | |
WO2019227333A1 (zh) | 集体照拍摄方法和装置 | |
JP2020053774A (ja) | 撮像装置および画像記録方法 | |
WO2022227752A1 (zh) | 拍照方法及装置 | |
WO2022143311A1 (zh) | 一种智能取景推荐的拍照方法及装置 | |
CN114363522A (zh) | 拍照方法及相关装置 | |
US10887525B2 (en) | Delivery of notifications for feedback over visual quality of images | |
CN114140530A (zh) | 一种图像处理方法及投影设备 | |
WO2021147648A1 (zh) | 提示方法、装置、存储介质及电子设备 | |
WO2021184326A1 (zh) | 电子装置的控制方法、装置、设备及系统 | |
WO2021056442A1 (zh) | 摄像装置的构图方法、系统及存储介质 | |
KR102619701B1 (ko) | 동적 객체에 대한 3차원 자세 추정 데이터 생성 방법 및 그를 위한 컴퓨팅 장치 | |
CN115294508B (zh) | 一种基于静态空间三维重构的跟焦方法、系统及摄像系统 | |
CN112839164A (zh) | 一种拍照方法及装置 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19932368; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 19932368; Country of ref document: EP; Kind code of ref document: A1 |
| 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 13-05-2022) |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 19932368; Country of ref document: EP; Kind code of ref document: A1 |