WO2020248395A1 - Follow-shooting method, apparatus, device, and storage medium - Google Patents

Follow-shooting method, apparatus, device, and storage medium - Download PDF

Info

Publication number
WO2020248395A1
WO2020248395A1 PCT/CN2019/103654 CN2019103654W WO2020248395A1 WO 2020248395 A1 WO2020248395 A1 WO 2020248395A1 CN 2019103654 W CN2019103654 W CN 2019103654W WO 2020248395 A1 WO2020248395 A1 WO 2020248395A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
scale
target
training
information
Prior art date
Application number
PCT/CN2019/103654
Other languages
English (en)
French (fr)
Inventor
张明
董健
Original Assignee
睿魔智能科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 睿魔智能科技(深圳)有限公司
Publication of WO2020248395A1

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/292Multi-camera tracking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • G06T7/75Determining position or orientation of objects or cameras using feature-based methods involving models
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence

Definitions

  • This application relates to the field of photographing technology, and for example to a follow-shooting method, apparatus, device, and storage medium.
  • In many fields, to achieve better shooting results, a camera must automatically follow a target object that needs to be tracked. In a follow shot, the target object usually stays at a relatively stable position in the frame and the shot scale remains unchanged. This requires the photographer to move at essentially the same speed as the target object, so that the target object's position in the frame remains stable, the target object is not moved out of the frame, and the shot scale does not change.
  • Through the movement of the camera, this shooting method can record the posture and actions of the target object without interfering with the subject, presenting the subject in a relatively natural state.
  • This application provides a follow-shooting method, apparatus, device, and storage medium to achieve follow shooting of multiple target objects or an entire group of objects.
  • An embodiment of the present application provides a follow-shooting method, which includes: acquiring a captured image of a camera in real time, the captured image including at least one target image; using a pre-trained model to predict scale information corresponding to each target image in the captured image and offset information corresponding to each target image; and determining control offset information of the camera according to the scale information and the offset information.
  • An embodiment of the present application provides a follow-shooting apparatus, which includes:
  • an acquisition module configured to acquire a captured image of a camera in real time, the captured image including at least one target image;
  • a calculation module configured to use a pre-trained model to predict scale information corresponding to each target image in the captured image and offset information corresponding to each target image;
  • a control module configured to determine control offset information of the camera according to the scale information and the offset information.
  • An embodiment of the present application provides a computer device, and the device includes:
  • one or more processors;
  • a memory configured to store one or more programs,
  • wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the follow-shooting method described above.
  • An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored.
  • The computer program includes program instructions that, when executed by a processor, implement the follow-shooting method described above.
  • FIG. 1 is a schematic flowchart of a follow-shooting method according to Embodiment 1 of the present application;
  • FIG. 2 is a schematic flowchart of another follow-shooting method according to Embodiment 2 of the present application;
  • FIG. 3 is a schematic flowchart of another follow-shooting method according to Embodiment 2 of the present application;
  • FIG. 4 is a schematic flowchart of another follow-shooting method according to Embodiment 3 of the present application;
  • FIG. 5 is a schematic flowchart of another follow-shooting method according to Embodiment 4 of the present application;
  • FIG. 6 is a schematic structural diagram of a follow-shooting apparatus according to Embodiment 5 of the present application;
  • FIG. 7 is a schematic structural diagram of a follow-shooting device according to Embodiment 6 of the present application.
  • Some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes steps as sequential processing, many of the steps herein may be implemented in parallel, concurrently, or simultaneously. In addition, the order of the steps may be rearranged. The processing may be terminated when its operations are completed, but may also have additional steps not included in the drawings. The processing may correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on.
  • The terms "first", "second", etc. may be used herein to describe various directions, actions, steps, or elements, but these directions, actions, steps, or elements are not limited by these terms. These terms are only used to distinguish one direction, action, step, or element from another. For example, without departing from the scope of this application, the first speed difference may be referred to as the second speed difference, and similarly, the second speed difference may be referred to as the first speed difference. The first speed difference and the second speed difference are both speed differences, but they are not the same speed difference. The terms "first", "second", etc. should not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this application, "a plurality of" means at least two, such as two or three, unless specifically defined otherwise.
  • FIG. 1 is a schematic flowchart of a follow-shooting method according to Embodiment 1 of this application. This embodiment is applicable to the case where a camera device is used to follow-shoot multiple people. The method includes the following steps:
  • Step 1100 Acquire a captured image of the camera in real time, where the captured image includes at least one target image.
  • In an embodiment, during shooting, each frame of image captured by the camera device is acquired.
  • The camera device of this embodiment may include a video camera, a still camera, and so on.
  • The target image is a preselected person-object image in the captured image, or all person-object images in the captured image.
  • In an embodiment, the target image is a preselected image of a target person in each frame captured by the camera device, and may be one person image or multiple person images. In an alternative embodiment, the target image may also be an animal image, a vehicle image, or another type of shooting-subject image instead of a person image.
  • In this embodiment, a person-image data set is constructed, consisting of person images and label data corresponding to each person image. The label data of this embodiment includes the portrait bounding box of each person, the pixel area of the person's torso in the person image, and the distance of the person from the lens. The portrait bounding box is used to determine the position of the person image in each frame; it refers to the region of each frame corresponding to the person image and generally has the shape of a rectangle elongated vertically or horizontally. In this embodiment, the size and position of the bounding box depend on the size of the tracking target in the image captured by the lens, and the bounding box can be determined by a visual tracking method in the related art. The pixel area of a person image is the area formed by all pixels belonging to that person image. The distance of the person from the lens is the distance between the camera device and the photographed person, which can be obtained directly by a camera device of the related art.
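  • To make the label data concrete, a minimal sketch of one record of such a person-image data set is shown below. The field names (bbox, torso_pixel_area, distance_to_lens) are illustrative choices, not names prescribed by the patent.

```python
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class PersonLabel:
    """Label data for one person in a frame (illustrative field names)."""
    bbox: Tuple[float, float, float, float]  # portrait bounding box (x_min, y_min, x_max, y_max)
    torso_pixel_area: float                  # pixel area of the person's torso in the image
    distance_to_lens: float                  # distance between the camera and the person

@dataclass
class FrameLabels:
    image_path: str
    persons: List[PersonLabel]

# Example record for a frame containing two people
frame = FrameLabels(
    image_path="frame_000123.jpg",
    persons=[
        PersonLabel(bbox=(0.12, 0.20, 0.34, 0.80), torso_pixel_area=5400.0, distance_to_lens=3.2),
        PersonLabel(bbox=(0.55, 0.18, 0.78, 0.85), torso_pixel_area=4800.0, distance_to_lens=3.5),
    ],
)
```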
  • Step 1200 Use a pre-trained model to predict the scale information corresponding to each target image in the captured image and the offset information corresponding to each target image.
  • In this embodiment, the model may be a deep convolutional neural network model. The scale information is the size information of the person bounding box of the target image, and the offset information is the positional information of the target image's movement.
  • For the scale response map, let the set of portrait bounding boxes be Ψ, let each portrait bounding box be denoted B_i, and let scale be the scale response map. The scale information of each portrait bounding box is obtained as follows: first, the position of the extreme point of scale inside the bounding box is computed as (x_pi, y_pi) = max{scale(x, y) | (x, y) ∈ B_i}, where scale(x, y) is the value of the scale map at coordinate (x, y) and (x_pi, y_pi) is the extreme point position; then the mean over a rectangular region of side length 2N+1 around the extreme point is taken as the scale information s_i, where N is generally 0, 1, 2, or 3, or is determined as needed, and m and n are indices used to traverse the rectangular region.
  • From the reference position image XT&YT obtained from the deep convolutional neural network model, where XT is the horizontal-coordinate reference position image and YT is the vertical-coordinate reference position image, the offset image DX&DY of each pixel can be computed directly as DX(x, y) = XT(x, y) - x and DY(x, y) = YT(x, y) - y, where DX is the horizontal offset image and DY is the vertical offset image. The offset information is obtained from the offset image DX&DY and includes the average offset control amount (d_xi, d_yi) of each target image, where dx(i, j) and dy(i, j) are the values at coordinate (i, j) of the offset image DX&DY and s_th is a set threshold.
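  • The sketch below illustrates how the per-target scale information s_i and average offset control amount (d_xi, d_yi) described above could be extracted from the model outputs. The exact averaging rule behind (d_xi, d_yi) appears in the patent only as an embedded formula image, so averaging the offsets over pixels of the bounding box whose scale response exceeds the threshold s_th is an assumption consistent with the surrounding definitions, not the literal formula.

```python
import numpy as np

def target_scale_and_offset(scale_map, xt, yt, bbox, N=1, s_th=0.1):
    """Per-target scale s_i and average offset (d_xi, d_yi) from the model outputs.

    scale_map : (H, W) scale response map
    xt, yt    : (H, W) reference-position images XT and YT
    bbox      : (x0, y0, x1, y1) portrait bounding box B_i in integer pixel coordinates
    N         : half side of the averaging window (side length 2N+1)
    s_th      : threshold on the scale response (assumed pixel-selection rule)
    """
    H, W = scale_map.shape
    x0, y0, x1, y1 = bbox

    # Offset images DX, DY computed from the reference-position images.
    xs = np.arange(W)[None, :]
    ys = np.arange(H)[:, None]
    dx_map = xt - xs
    dy_map = yt - ys

    # Extreme point of the scale response inside the bounding box.
    region = scale_map[y0:y1, x0:x1]
    iy, ix = np.unravel_index(np.argmax(region), region.shape)
    y_p, x_p = y0 + iy, x0 + ix

    # Scale information s_i: mean over a (2N+1)x(2N+1) window around the extreme point.
    win = scale_map[max(y_p - N, 0):y_p + N + 1, max(x_p - N, 0):x_p + N + 1]
    s_i = float(win.mean())

    # Average offset control amount (d_xi, d_yi): averaged here over pixels of B_i
    # whose scale response exceeds s_th (an assumption; the exact formula is an image).
    mask = region > s_th
    if not mask.any():
        mask = np.ones_like(region, dtype=bool)
    d_xi = float(dx_map[y0:y1, x0:x1][mask].mean())
    d_yi = float(dy_map[y0:y1, x0:x1][mask].mean())
    return s_i, (d_xi, d_yi)
```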
  • Step 1300 Confirm the control offset information of the camera according to the scale information and the offset information.
  • In an embodiment, a calculation is performed according to the scale information and the offset information corresponding to each target image to obtain the control offset information of the camera, ensuring that the camera can capture the movement of multiple people in the image.
  • the calculation process is to perform a weighted calculation on the product of the scale information and the offset information of all target images.
  • the calculation process is to perform a weighted calculation on the product of the offset information of all target images and the power-processed scale information to obtain the control offset information of the camera.
  • To give closer objects a higher control weight, the scale values can be raised to a power, for example:
  • β is the power exponent. The larger its value, the more dominant the weight of larger-scale objects; the smaller its value, the more all targets tend toward equal weight. The value can be chosen with different parameters for different scenarios.
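  • A minimal sketch of the fusion step is shown below, assuming the control offset is a weighted average of the per-target offsets with normalized weights s_i^β. This is one plausible reading of "a weighted calculation on the product of the power-processed scale information and the offset information"; the literal formula is embedded as an image in the original.

```python
import numpy as np

def fuse_control_offset(scales, offsets, beta=1.0):
    """Combine per-target offsets into one camera control offset.

    scales  : list of s_i values, one per target image
    offsets : list of (d_xi, d_yi) tuples, one per target image
    beta    : power exponent; larger beta gives larger-scale (closer) targets a more
              dominant weight, beta close to 0 weights all targets nearly equally.
    """
    s = np.asarray(scales, dtype=float) ** beta
    w = s / s.sum()                               # normalised weights (assumption)
    d = np.asarray(offsets, dtype=float)          # shape (num_targets, 2)
    dx, dy = (w[:, None] * d).sum(axis=0)
    return float(dx), float(dy)

# e.g. two targets; the closer (larger-scale) one dominates when beta > 1
print(fuse_control_offset([0.8, 0.3], [(5.0, -2.0), (-10.0, 4.0)], beta=2.0))
```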
  • In the follow-shooting method provided by this embodiment, a computer program first acquires a captured image of the camera in real time, the captured image including at least one target image, to obtain the information parameters of one or more target images; then a pre-trained model is used to predict the scale information and the offset information corresponding to each target image in the captured image; finally, the control offset information of the camera is determined according to the scale information and offset information of the target images, thereby achieving follow shooting of multiple target images. This solves the problem that the related art provides no follow-shooting method for multiple targets or an entire group of objects, and achieves the effect that a user can follow-shoot multiple targets or an entire group of objects in a specific scene.
  • FIG. 2 is a schematic flowchart of another follow-shooting method according to Embodiment 2 of the present application.
  • This embodiment builds on and improves the solution of Embodiment 1 and provides a scheme for the training process of the offset model. As shown in FIG. 2, the method includes the following steps:
  • Step 2100 Acquire a captured image of the camera in real time, and the captured image includes at least one target image.
  • Step 2200 Use a pre-trained scale model to predict scale information corresponding to each target image in the captured image.
  • Step 2300 Use the pre-trained offset model to predict offset information corresponding to each target image in the captured image.
  • Step 2400 Confirm the control offset information of the camera according to the scale information and the offset information.
  • In an embodiment, as shown in FIG. 3, the training of the offset model in step 2300 may include the following steps:
  • Step 2210 Obtain training images and corresponding label data from a preset image data set, where the label data includes bounding box information and key point information of the tracking target in the training image.
  • multiple training images are preset in the image data set, and the training image type can be selected according to different shooting targets.
  • portrait shooting is taken as an example, and all the training images collected in the image data set include portraits.
  • These training images can cover many types of main scenes such as indoors, beaches and mountains, and various postures such as running, sitting, lying down and dancing.
  • Each training image in the image data set has corresponding label data.
  • the label data in this embodiment includes the bounding box information and key point information of the tracking target in the training image.
  • the bounding box information includes the position of the bounding box and the size of the bounding box.
  • In this embodiment, 17 joint points of the human body are selected as key points by way of example, and the coordinate information of each joint point is marked as its key point information. Each joint point is marked as (xi, yi, si), where i is a natural number from 1 to 17 indicating the i-th key point, xi is the horizontal coordinate of the i-th key point, and yi is the vertical coordinate of the i-th key point. When si equals 0, the key point does not exist and need not be marked; when si equals 1, the key point exists. For i from 1 to 17, the key points are: 1 - top of head, 2 - left eye, 3 - right eye, 4 - nose, 5 - throat, 6 - left shoulder, 7 - left elbow, 8 - left wrist, 9 - right shoulder, 10 - right elbow, 11 - right wrist, 12 - left hip, 13 - left knee, 14 - left ankle, 15 - right hip, 16 - right knee, 17 - right ankle.
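  • For concreteness, the 17-point annotation format (xi, yi, si) could be represented as below; this is only an illustrative encoding, not a storage format prescribed by the patent.

```python
KEYPOINT_NAMES = {
    1: "top of head", 2: "left eye", 3: "right eye", 4: "nose", 5: "throat",
    6: "left shoulder", 7: "left elbow", 8: "left wrist", 9: "right shoulder",
    10: "right elbow", 11: "right wrist", 12: "left hip", 13: "left knee",
    14: "left ankle", 15: "right hip", 16: "right knee", 17: "right ankle",
}

# One tracking target: keypoints[i-1] = (xi, yi, si); si == 0 marks an absent point.
keypoints = [(0.0, 0.0, 0)] * 17
keypoints[3] = (0.52, 0.18, 1)   # 4 - nose visible at normalised (0.52, 0.18)
keypoints[11] = (0.48, 0.55, 1)  # 12 - left hip
keypoints[14] = (0.56, 0.56, 1)  # 15 - right hip
```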
  • Step 2220 Obtain the reference position of the center point of the bounding box according to the bounding box information and key point information of the tracking target.
  • the traditional "center control” method controls the center point of the target bounding box to move to the center of the image to complete the composition.
  • the calculation process of this method is simple and does not take into account the influence of the target's posture on the composition, so the shooting effect is quite different from the actual expectation. Therefore, in the shooting method provided in this embodiment, when training the offset model, the difference in composition requirements of the tracking target in different poses is fully considered, and the difference of the tracking target can be distinguished according to the different key point information of the tracking target marked in step 2210 Posture, calculate the reference position of the center point of the bounding box based on the bounding box information and key point information of the tracking target, and can fully simulate the photographer's composition control ability, and its composition effect is better.
  • Step 2230 Obtain a reference position image corresponding to the training image based on the reference position of the center point of the bounding box.
  • When multiple target portraits exist in the training image, the reference position image corresponding to the training image is obtained from the reference position of each tracking target's bounding box center point, the initial position of the bounding box center point, and the number of tracking targets, as follows:
  • the reference position set of the center point of the bounding box of all tracking targets is defined as:
  • the initial position coordinates of the bounding box center of each tracking target are defined as:
  • In formula (1), (x, y) are the normalized coordinates of a pixel, Σ_{Θ,Δ} 1 is the number of tracking targets in the training image, X_TG(x, y) is the horizontal coordinate of the reference position of each pixel, Y_TG(x, y) is the vertical coordinate of the reference position of each pixel, x_ti and x_ci are respectively the horizontal coordinates of the reference position and the initial position of each tracking target's bounding box center point, and y_ti and y_ci are respectively the vertical coordinates of the reference position and the initial position of each tracking target's bounding box center point. Once the reference position coordinates of every pixel are determined, the reference position image of the training image is obtained.
  • Compared with the image obtained by the traditional "center control" method, the reference position image more fully accounts for the composition requirements of different target postures, and its composition is more precise and reasonable.
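  • A sketch of building the reference position images X_TG and Y_TG is given below. Formula (1) is embedded as an image in the original, so the per-pixel rule used here, shifting every pixel by the mean displacement of the tracking targets' bounding-box centers from their initial to their reference positions, is an assumption consistent with the surrounding definitions.

```python
import numpy as np

def reference_position_images(h, w, ref_centers, init_centers):
    """Build the reference-position images X_TG, Y_TG for a training image.

    ref_centers  : [(x_ti, y_ti), ...] reference positions of each target's bbox centre
    init_centers : [(x_ci, y_ci), ...] initial positions of each target's bbox centre
    All coordinates are normalised to [0, 1].
    """
    ref = np.asarray(ref_centers, dtype=float)
    init = np.asarray(init_centers, dtype=float)
    num_targets = len(ref)                          # corresponds to sum_{Theta,Delta} 1
    mean_shift = (ref - init).sum(axis=0) / num_targets

    ys, xs = np.mgrid[0:h, 0:w]
    x_norm = xs / (w - 1)
    y_norm = ys / (h - 1)
    x_tg = x_norm + mean_shift[0]   # assumed reading of formula (1)
    y_tg = y_norm + mean_shift[1]
    return x_tg, y_tg
```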
  • Step 2240 Use the deep convolutional neural network to predict the reference position of the training image to obtain the prediction result image.
  • In an embodiment, an initial deep convolutional neural network model is used to predict the training image to obtain the reference position of the tracking target in the image, from which the prediction result image is obtained. The horizontal and vertical coordinates of each pixel in the prediction result image are X_T(x, y) and Y_T(x, y), respectively.
  • Step 2250 Calculate a first loss value according to the reference position image and the prediction result image, and adjust the parameters of the deep convolutional neural network according to the first loss value.
  • the first loss value adopts Euclidean distance loss, and is calculated by formula (2) according to the aforementioned reference position image and prediction result image:
  • X TG (x, y) and Y TG (x, y) are obtained from the formula (1), and X T (x, y) and Y T (x, y) are obtained from the prediction result image.
  • The reference position image is the image that is expected to achieve the desired composition. The first loss value represents the deviation between the prediction result image and the reference position image; based on the first loss value, back-propagation is applied to the deep convolutional neural network to adjust its parameters so that the prediction result image comes closer to the reference position image.
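  • Formula (2) written out directly (the training step itself would be carried out with a deep-learning framework's autograd and back-propagation):

```python
import numpy as np

def first_loss(x_tg, y_tg, x_t, y_t):
    """Formula (2): Euclidean-distance loss between the reference-position image
    (X_TG, Y_TG) and the prediction-result image (X_T, Y_T)."""
    return float(((x_tg - x_t) ** 2).sum() + ((y_tg - y_t) ** 2).sum())
```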
  • Step 2260 Perform steps 2210-2250 sequentially on multiple training images in the image data set until the first loss value in step 2250 no longer decreases, then end the training of the deep convolutional neural network to obtain the pre-trained offset model.
  • In an embodiment, adjusting the parameters of the deep convolutional neural network according to the first loss value yields different first loss values. While the first loss value keeps decreasing, the prediction result image is getting closer and closer to the reference position image. The deep convolutional neural network is adjusted continually until the first loss value no longer decreases; at that point the prediction result image can be regarded as closest to the reference position image, and the desired deep convolutional neural network model is obtained and used as the trained deep convolutional neural network model.
  • Since the first loss values obtained from different training images may always differ somewhat, the first-loss criterion differs between training images. "The first loss value no longer decreases" here means that the first loss value has stabilized and meets the expected requirement. For example, if the custom expected requirement is that the first loss value be lower than k, then when at least m consecutive first loss values obtained after multiple training passes over multiple training images remain lower than k, the first loss value can be regarded as no longer decreasing.
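  • The "no longer decreases" criterion can be captured by a small helper, with k as the expected loss threshold and m as the number of consecutive values required:

```python
def loss_has_stabilised(loss_history, k, m):
    """True when the last m consecutive loss values all stayed below k,
    i.e. the loss is regarded as 'no longer decreasing'."""
    return len(loss_history) >= m and all(v < k for v in loss_history[-m:])

# e.g. stop once 5 consecutive training passes stay below 0.01
# while not loss_has_stabilised(losses, k=0.01, m=5): ... keep training ...
```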
  • This embodiment provides the method of Embodiment 1 for using a pre-trained model to predict the scale information and the offset information corresponding to each target image in the captured image. First, the computer program obtains training images and corresponding label data from a preset image data set, the label data including the bounding box information and key point information of the tracking target in the training image; next, the reference position of the bounding box center point is obtained from the bounding box information and key point information of the tracking target; then the reference position image corresponding to the training image is obtained from the reference position of the bounding box center point; a deep convolutional neural network is then used to predict the reference position of the training image to obtain a prediction result image; the first loss value is then calculated from the reference position image and the prediction result image, and the parameters of the deep convolutional neural network are adjusted according to the first loss value; finally, the above steps are performed sequentially on multiple training images in the image data set until the first loss value in step 2250 no longer decreases, at which point the training of the deep convolutional neural network ends and the trained offset model is obtained.
  • The offset-model training method provided by this embodiment solves the problem of how to train a deep convolutional neural network for offset information, and achieves better prediction of the offset information in the follow-shooting method.
  • FIG. 4 is a schematic flowchart of another follow-shooting method according to Embodiment 3 of the present application.
  • This embodiment builds on and improves the solution of Embodiment 2 and provides a scheme for obtaining the reference position of the bounding box center point from the bounding box information and key point information of the tracking target.
  • As shown in FIG. 4, the method includes the following steps:
  • Step 2222 Generate a grid table based on the training image by dividing the training image into W*H grids, where W and H are natural numbers greater than 1. Each grid provides a candidate position in the subsequent calculation of the bounding box's composition position, and the values of W and H can be adjusted according to accuracy requirements.
  • Step 2224 Obtain a second loss value when the center of the bounding box is placed at a different grid center.
  • The second loss value is calculated as follows. The horizontal and vertical coordinate ranges of the image are both [0, 1]. A set of reference points and a set of reference lines are defined; their settings can be adjusted for different composition requirements. In this embodiment, the region bounded by the horizontal and vertical coordinate ranges determined by these reference points and reference lines is set as the optimal composition area for the tracking target. Key line segments are defined from the key point information of the tracking target and are used to supplement its posture information: the posture reflected by key points alone may contain some error in certain cases, and combining key line segments based on the key points reflects the posture of the tracking target more clearly. In the distance formulas, p_i and p_j represent two different points, x_pi and y_pi are the horizontal and vertical coordinates of point p_i, and x_pj and y_pj are the horizontal and vertical coordinates of point p_j. P_xy = (x/W, y/H), and L_xy is the line segment between the two normalized points. The second loss value reflects how well the tracking target matches the customized optimal composition area when the bounding box is placed at a given position; the smaller the second loss value, the closer the placement is to the customized optimal composition area.
  • Step 2226 Select the center position of the grid with the smallest second loss value as the reference position of the center point of the bounding box.
  • custom grids, reference points, and reference lines can be adjusted according to different requirements for image accuracy.
  • the key points of the tracking target and the relationship between the key line segments and the key points can also be customized. For example, when the accuracy is higher, W and H can be increased, that is, the number of grids of the image segmentation grid is increased.
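  • The sketch below outlines the grid search of steps 2222-2226. The reference points and reference lines, the exact point-to-point and line-to-line distance formulas, and the weighting are given in the patent only as embedded formula images, so plain Euclidean distances and a nearest-reference aggregation are used here as assumptions; place_at is a hypothetical helper that shifts the target's key points and key line segments so that the bounding-box center sits at a candidate grid center.

```python
import numpy as np

def second_loss(keypoints, key_segments, ref_points, ref_vlines, ref_hlines, w_p, w_l):
    """D_xy = D_p + D_l for one candidate placement (a sketch; the patent's exact
    distance and weighting formulas are embedded as images).

    keypoints    : array (17, 3) of (x, y, s) shifted to the candidate placement
    key_segments : list of ((x1, y1), (x2, y2)) key line segments, likewise shifted
    ref_points   : array (R, 2) of reference points
    ref_vlines   : list of a-values for vertical reference lines x = a
    ref_hlines   : list of a-values for horizontal reference lines y = a
    w_p, w_l     : per-keypoint and per-segment weights
    """
    d_p = 0.0
    for (x, y, s), wp in zip(keypoints, w_p):
        if s == 0:
            continue  # absent key point, not marked
        d_p += wp * np.min(np.hypot(ref_points[:, 0] - x, ref_points[:, 1] - y))
    d_l = 0.0
    for ((x1, y1), (x2, y2)), wl in zip(key_segments, w_l):
        xc, yc = (x1 + x2) / 2.0, (y1 + y2) / 2.0   # midpoint (x_c, y_c) of segment l
        dist_v = min(abs(xc - a) for a in ref_vlines)
        dist_h = min(abs(yc - a) for a in ref_hlines)
        d_l += wl * min(dist_v, dist_h)
    return d_p + d_l

def best_grid_center(place_at, W, H, **kwargs):
    """Steps 2222-2226: try every grid centre and keep the one with the smallest D_xy."""
    best, best_xy = np.inf, None
    for gx in range(W):
        for gy in range(H):
            x, y = (gx + 0.5) / W, (gy + 0.5) / H
            kps, segs = place_at(x, y)
            d = second_loss(kps, segs, **kwargs)
            if d < best:
                best, best_xy = d, (x, y)
    return best_xy, best
```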
  • This embodiment provides the process in Embodiment 3 of obtaining the reference position of the bounding box center point from the bounding box information and key point information of the tracking target. The computer program first divides the training image into W*H grids by generating a grid table based on the training image; it then obtains the second loss value when the bounding box center is placed at each different grid center, and finally selects the center position of the grid with the smallest second loss value as the reference position of the bounding box center point. This solves the problem of better obtaining the reference position of the bounding box center point, and achieves better acquisition of offset information in offset-model training.
  • FIG. 5 is a schematic flowchart of another follow-shooting method according to Embodiment 4 of the present application. This embodiment builds on and improves the solution of Embodiment 2 and provides a scheme for the training process of the scale model. As shown in FIG. 5, the method includes the following steps:
  • Step 2310 Obtain a Gaussian response map of the training sample image.
  • In an embodiment, the relative scale S of each person's portrait bounding box in the person image is first calculated by a formula in which w is the pixel width of the person image, h is the pixel height of the person image, and As is the absolute scale of the person, with d being the distance of the person from the lens and a the pixel area of the person's torso in the person image. Then, according to the relative scale S of each person's portrait bounding box, a Gaussian response map with the same size as the person image is generated; the extreme point of this Gaussian response map is located at the center of the portrait bounding box, and the value of the extreme point equals the relative scale S. Finally, the Gaussian response maps of all persons are superimposed to form the Gaussian response map of the person image. The above three steps are performed on all training sample images to obtain the Gaussian response map corresponding to each training sample image.
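  • A sketch of step 2310 is shown below. The relative-scale formula appears only as an embedded image, so S is taken here as a given per-person value; each person contributes a 2D Gaussian whose peak sits at the bounding-box center with peak value S, and the per-person maps are summed. The superposition rule and the Gaussian width sigma are assumptions.

```python
import numpy as np

def gaussian_response_map(h, w, persons, sigma=8.0):
    """Gaussian response map for one training image.

    persons : list of dicts with 'center' = (cx, cy) in pixels (bounding-box centre)
              and 'S' = relative scale of that person's portrait bounding box.
    """
    ys, xs = np.mgrid[0:h, 0:w]
    out = np.zeros((h, w), dtype=float)
    for p in persons:
        cx, cy = p["center"]
        # Peak located at the bounding-box centre, peak value equal to S.
        out += p["S"] * np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))
    return out
```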
  • Step 2320 Use the deep convolutional neural network to process the training sample image to obtain a scale response map of the training sample image.
  • a deep convolutional neural network is used to process the person image in the training sample image to obtain a scale response map of the same size as the person image in the training sample image.
  • Step 2330 Perform Euclidean distance loss calculation on the Gaussian response map and the scale response map, and adjust the parameters of the deep convolutional neural network according to the calculation result.
  • the Gaussian response map generated in step 2310 and the scale response map obtained in step 2320 are calculated for Euclidean distance loss, and the calculation result is adjusted using a backpropagation algorithm to adjust the parameters of the deep convolutional neural network.
  • Step 2340 Steps 2310-2330 are sequentially performed on multiple training sample images until the calculated Euclidean distance loss no longer decreases, the training of the deep convolutional neural network is ended, and the pre-trained scale model is obtained.
  • In an embodiment, adjusting the parameters of the deep convolutional neural network according to the Euclidean distance loss yields different Euclidean distance losses. While the Euclidean distance loss keeps decreasing, the scale response map predicted by the network is getting closer and closer to the Gaussian response map. The deep convolutional neural network is adjusted continually, and when the Euclidean distance loss finally no longer decreases, the prediction can be regarded as closest to the Gaussian response map; the desired deep convolutional neural network model is then obtained and used as the trained deep convolutional neural network model.
  • Since the Euclidean distance losses obtained from different training sample images may always differ somewhat, the loss criterion differs between training sample images. "The Euclidean distance loss no longer decreases" here is a way of expressing that the loss has stabilized and meets the expected requirement.
  • This embodiment provides the method of training the scale model in the follow-shooting method. First, the Gaussian response map of each training sample image is obtained; next, a deep convolutional neural network is used to process the training sample image to obtain its scale response map; the Euclidean distance loss between the Gaussian response map and the scale response map is then calculated, and the parameters of the deep convolutional neural network are adjusted according to the result. Finally, these steps are performed sequentially on multiple training sample images until the calculated Euclidean distance loss no longer decreases, at which point the training of the deep convolutional neural network ends and the trained scale model is obtained. This solves the problem of how to obtain a trained deep convolutional neural network corresponding to the scale model and achieves better training of the scale model.
  • FIG. 6 is a schematic structural diagram of a follow-shooting apparatus according to Embodiment 5 of the present application.
  • The follow-shooting apparatus provided by this embodiment of the present application may include: an acquisition module 3100 configured to acquire a captured image of a camera in real time, the captured image including at least one target image; a calculation module 3200 configured to use a pre-trained model to predict the scale information and the offset information corresponding to each target image in the captured image; and a control module 3300 configured to determine the control offset information of the camera according to the scale information and the offset information.
  • In an embodiment, the control module 3300 may be replaced with a weighting control module configured to perform a weighted calculation on the product of the scale information and the offset information corresponding to the target images to obtain the control offset information of the camera.
  • the weighting control module can be replaced with a power processing control module, which is configured to perform weighting calculation on the product of the offset information corresponding to all target images and the scale information after the power processing to obtain the control offset information of the camera.
  • In an embodiment, the acquisition module 3100 may be replaced with a person acquisition module configured to acquire the captured image of the camera in real time, the captured image including at least one target image, where the target image is a preselected person-object image in the captured image or all person-object images in the captured image.
  • In an embodiment, the calculation module 3200 may further include a scale calculation module and an offset calculation module. The scale calculation module is configured to use a pre-trained scale model to predict the scale information corresponding to each target image in the captured image, and the offset calculation module is configured to use a pre-trained offset model to predict the offset information corresponding to each target image in the captured image.
  • the offset calculation module includes: an offset acquisition unit configured to acquire training images and corresponding label data from a preset image data set, the label data including bounding box information and key points of the tracking target in the training image information.
  • the center point obtaining unit is configured to obtain the reference position of the center point of the bounding box according to the bounding box information and key point information of the tracking target.
  • the reference position obtaining unit is configured to obtain a reference position image corresponding to the training image based on the reference position of the center point of the bounding box.
  • the convolutional neural network calculation unit is configured to use the deep convolutional neural network to predict the reference position of the training image to obtain the prediction result image.
  • the loss value calculation unit is configured to calculate the first loss value according to the reference position image and the prediction result image, and adjust the parameters of the deep convolutional neural network according to the first loss value.
  • The convolutional neural network training unit is configured to perform steps 2210-2250 sequentially on multiple training images in the image data set until the first loss value in step 2250 no longer decreases, end the training of the deep convolutional neural network, and obtain the pre-trained offset model.
  • the center point acquisition unit includes: a grid table generating subunit, configured to divide the training image into W*H grids, where W and H are natural numbers greater than 1, to generate a grid table.
  • the loss value acquisition subunit is set to acquire the second loss value when the center of the bounding box is placed at a different grid center.
  • the reference position acquisition subunit is set to select the center position of the grid with the smallest second loss value as the reference position of the center point of the bounding box.
  • the scale calculation module includes: a Gaussian response graph unit configured to obtain a Gaussian response graph of the training sample image.
  • the scale response map unit is set to use the deep convolutional neural network to process the training sample image to obtain the scale response map of the training sample image.
  • the Euclidean distance loss unit is set to perform Euclidean distance loss calculation on the Gaussian response graph and the scale response graph, and adjust the parameters of the deep convolutional neural network according to the calculation result.
  • The scale-model obtaining unit is configured to perform steps 2310-2330 sequentially on multiple training sample images until the calculated Euclidean distance loss no longer decreases, end the training of the deep convolutional neural network, and obtain the pre-trained scale model.
  • By providing a follow-shooting apparatus, the technical solution of this embodiment solves the lack, in the related art, of a follow-shooting method for multiple targets or an entire group of objects, and achieves the effect that a user can follow-shoot multiple targets or an entire group of objects in a specific scene.
  • FIG. 7 is a schematic structural diagram of a computer device provided by Embodiment 6 of the application.
  • the computer device includes a memory 4100 and a processor 4200.
  • the number of processors 4200 in the computer device may be one or more.
  • a processor 4200 is taken as an example; the memory 4100 and the processor 4200 in the device may be connected through a bus or other methods, and the connection through a bus is taken as an example in FIG. 7.
  • As a computer-readable storage medium, the memory 4100 can be configured to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the follow-shooting method in the embodiments of this application (for example, the acquisition module 3100, calculation module 3200, and control module 3300 in the follow-shooting apparatus). The processor 4200 executes at least one functional application and data processing of the device/terminal by running the software programs, instructions, and modules stored in the memory 4100, that is, implements the follow-shooting method described above.
  • the processor 4200 is configured to run a computer program stored in the memory 4100 to implement the following steps: obtain real-time captured images of the camera, the captured images include at least one target image; use a pre-trained model to predict each of the captured images Scale information corresponding to each target image and offset information corresponding to each target image; confirm the control offset information of the camera according to the scale information and offset information.
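  • Tying the inference-side steps together, a sketch of one control step is shown below; scale_model and offset_model are hypothetical callables standing in for the pre-trained scale and offset models, and target_scale_and_offset and fuse_control_offset are the helpers sketched earlier in this section.

```python
def follow_shot_step(frame, scale_model, offset_model, bboxes, beta=1.0):
    """One control step of the follow-shooting method (interface names illustrative).

    frame        : current captured image
    scale_model  : callable returning the scale response map for the frame
    offset_model : callable returning the reference-position images (XT, YT)
    bboxes       : portrait bounding boxes of the target images in the frame
    """
    scale_map = scale_model(frame)          # predict per-pixel scale response
    xt, yt = offset_model(frame)            # predict reference-position images

    scales, offsets = [], []
    for bbox in bboxes:                     # per-target s_i and (d_xi, d_yi)
        s_i, d_i = target_scale_and_offset(scale_map, xt, yt, bbox)
        scales.append(s_i)
        offsets.append(d_i)

    # weighted fusion into the camera control offset
    return fuse_control_offset(scales, offsets, beta=beta)
```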
  • the computer program of the computer device provided in the embodiment of the present application is not limited to the above method operations, and can also perform related operations in the follow-up method provided in any embodiment of the present application.
  • the memory 4100 may mainly include a program storage area and a data storage area.
  • the storage program area may store an operating system and an application program required by at least one function; the storage data area may store data created according to the use of the terminal, and the like.
  • the memory 4100 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other non-volatile solid-state storage devices.
  • the memory 4100 may include a memory remotely provided with respect to the processor 4200, and these remote memories may be connected to the device/terminal/device through a network. Examples of the aforementioned networks include but are not limited to the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
  • Embodiment 7 of the present application further provides a storage medium containing computer-executable instructions, on which a computer program is stored. The computer program includes program instructions that, when executed by a processor, implement a follow-shooting method including: acquiring a captured image of the camera in real time, the captured image including at least one target image; using a pre-trained model to predict the scale information and the offset information corresponding to each target image in the captured image; and determining the control offset information of the camera according to the scale information and the offset information.
  • An embodiment of the present application provides a storage medium containing computer-executable instructions.
  • the computer-executable instructions are not limited to the above method operations, and can also perform related operations in the follow-up method provided by any embodiment of the present application.
  • this application can be implemented by software and general hardware, or can be implemented by hardware.
  • The technical solution of this application can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disk, and includes multiple instructions to make a computer device (which may be a personal computer, a device, a network device, etc.) execute the method described in any embodiment of this application.
  • In the embodiments of the follow-shooting apparatus above, the units and modules included are only divided according to functional logic, but the division is not limited to the above as long as the corresponding functions can be realized; in addition, the names of the functional units are only for the convenience of distinguishing them from one another and are not used to limit the protection scope of this application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A follow-shooting method, apparatus, device, and storage medium. The method includes: acquiring a captured image of a camera in real time, the captured image including at least one target image (1100); using a pre-trained model to predict scale information corresponding to each target image in the captured image and offset information corresponding to each target image (1200); and determining control offset information of the camera according to the scale information and the offset information (1300).

Description

Follow-shooting method, apparatus, device, and storage medium
This application claims priority to Chinese patent application No. 201910505922.0, filed with the Chinese Patent Office on June 12, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of photography, and for example to a follow-shooting method, apparatus, device, and storage medium.
Background
In many fields, to achieve better shooting results, a camera needs to automatically follow a target object that is to be tracked. In a follow shot, the position of the target object in the frame is usually relatively stable and the shot scale remains unchanged. This requires the photographer to move at essentially the same speed as the target object, so that the position of the target object in the frame remains relatively stable, the target object is not moved out of the frame, and the shot scale does not change. Through the movement of the camera, this shooting method can record the target object's posture and actions without disturbing the subject, presenting the photographed person in a relatively natural state.
However, in many scenarios a target object or an entire group of objects needs to be follow-shot, while the related art generally can only follow-shoot a single target object, for example following the motion trajectory of a single person. Therefore, a suitable method is needed to effectively follow-shoot multiple target objects so that the motion trajectories of multiple targets can be shown within the shot.
Summary
This application provides a follow-shooting method, apparatus, device, and storage medium to achieve follow shooting of multiple target objects or an entire group of objects.
An embodiment of the present application provides a follow-shooting method, which includes:
acquiring a captured image of a camera in real time, the captured image including at least one target image;
using a pre-trained model to predict scale information corresponding to each target image in the captured image and offset information corresponding to each target image;
determining control offset information of the camera according to the scale information and the offset information.
An embodiment of the present application provides a follow-shooting apparatus, which includes:
an acquisition module configured to acquire a captured image of a camera in real time, the captured image including at least one target image;
a calculation module configured to use a pre-trained model to predict scale information corresponding to each target image in the captured image and offset information corresponding to each target image;
a control module configured to determine control offset information of the camera according to the scale information and the offset information.
An embodiment of the present application provides a computer device, which includes:
one or more processors;
a memory configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the follow-shooting method according to any one of the above.
An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored, the computer program including program instructions which, when executed by a processor, implement the follow-shooting method according to any one of the above.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a follow-shooting method according to Embodiment 1 of this application;
FIG. 2 is a schematic flowchart of another follow-shooting method according to Embodiment 2 of this application;
FIG. 3 is a schematic flowchart of another follow-shooting method according to Embodiment 2 of this application;
FIG. 4 is a schematic flowchart of another follow-shooting method according to Embodiment 3 of this application;
FIG. 5 is a schematic flowchart of another follow-shooting method according to Embodiment 4 of this application;
FIG. 6 is a schematic structural diagram of a follow-shooting apparatus according to Embodiment 5 of this application;
FIG. 7 is a schematic structural diagram of a follow-shooting device according to Embodiment 6 of this application.
Detailed Description
This application is described below with reference to the drawings and embodiments. The specific embodiments described herein are only intended to explain this application, not to limit it. For ease of description, the drawings show only the parts related to this application rather than the entire structure.
Some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes steps as sequential processing, many of the steps herein may be implemented in parallel, concurrently, or simultaneously. In addition, the order of the steps may be rearranged. The processing may be terminated when its operations are completed, but may also have additional steps not included in the drawings. The processing may correspond to a method, a function, a procedure, a subroutine, a subprogram, and so on.
In addition, the terms "first", "second", etc. may be used herein to describe various directions, actions, steps, or elements, but these directions, actions, steps, or elements are not limited by these terms. These terms are only used to distinguish one direction, action, step, or element from another. For example, without departing from the scope of this application, a first speed difference may be referred to as a second speed difference, and similarly, a second speed difference may be referred to as a first speed difference. The first speed difference and the second speed difference are both speed differences, but they are not the same speed difference. The terms "first", "second", etc. should not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features indicated. Thus, a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of this application, "a plurality of" means at least two, such as two or three, unless specifically defined otherwise.
Embodiment 1
FIG. 1 is a schematic flowchart of a follow-shooting method according to Embodiment 1 of this application. This embodiment is applicable to the case where a camera device is used to follow-shoot multiple people. The method includes the following steps:
Step 1100: Acquire a captured image of the camera in real time, the captured image including at least one target image.
In an embodiment, during shooting, each frame of image captured by the camera device is acquired. The camera device of this embodiment may include a video camera, a still camera, and so on.
In an embodiment, the target image is a preselected person-object image in the captured image or all person-object images in the captured image.
In an embodiment, the target image is a preselected image of a target person in each frame captured by the camera device, and may be one person image or multiple person images. In an alternative embodiment, the target image may also be an animal image, a vehicle image, or another type of shooting-subject image instead of a person image. In this embodiment, a person-image data set is constructed, consisting of person images and label data corresponding to each person image. The label data of this embodiment includes the portrait bounding box of each person, the pixel area of the person's torso in the person image, and the distance of the person from the lens. In this embodiment, the portrait bounding box is used to determine the position of the person image in each frame; the bounding box refers to the region of each frame corresponding to the person image and generally has the shape of a rectangle elongated vertically or horizontally. In this embodiment, the size and position of the bounding box depend on the size of the tracking target in the image captured by the lens, and the bounding box can be determined by a visual tracking method in the related art. The pixel area of a person image is the area formed by all pixels belonging to that person image. The distance of the person from the lens is the distance between the camera device and the photographed person, which can be obtained directly by a camera device of the related art.
Step 1200: Use a pre-trained model to predict scale information corresponding to each target image in the captured image and offset information corresponding to each target image.
In an embodiment, the model of this embodiment may be a deep convolutional neural network model; the scale information is the size information of the person bounding box of the target image, and the offset information is the positional information of the target image's movement. The label data corresponding to each person image in the person-image data set is input into the trained deep convolutional neural network model to obtain a scale response map and a reference position image corresponding to each target image. For the scale response map, let the set of portrait bounding boxes be Ψ, let each portrait bounding box be denoted B_i, and let scale be the scale response map; the scale information of each portrait bounding box is obtained using the following formulas.
Compute the position of the extreme point of scale within the portrait bounding box:
(x_pi, y_pi) = max{scale(x, y) | (x, y) ∈ B_i}.
scale(x, y) is the value of the scale map at coordinate (x, y), and (x_pi, y_pi) is the position of the extreme point.
Compute the mean over a rectangular region (of side length 2N+1) around the extreme point as the scale information s_i:
Figure PCTCN2019103654-appb-000001
N is generally taken as 0, 1, 2, or 3, or determined as needed; m and n are indices used to traverse the rectangular region.
From the reference position image XT&YT obtained by the deep convolutional neural network model, where XT is the horizontal-coordinate reference position image and YT is the vertical-coordinate reference position image, the per-pixel offset image DX&DY can be computed directly, where DX is the horizontal offset image and DY is the vertical offset image. The offset information is obtained from the offset image DX&DY and includes the average offset control amount (d_xi, d_yi).
The offset of each pixel in the offset image DX&DY is computed as:
DX(x, y) = XT(x, y) - x;
DY(x, y) = YT(x, y) - y.
The average offset control amount (d_xi, d_yi) of each target image is computed on the DX&DY images as follows:
Figure PCTCN2019103654-appb-000002
Figure PCTCN2019103654-appb-000003
Figure PCTCN2019103654-appb-000004
dx(i, j) and dy(i, j) are the values at coordinate (i, j) of the offset image DX&DY. s_th is a set threshold.
Step 1300: Determine the control offset information of the camera according to the scale information and the offset information.
In an embodiment, a calculation is performed according to the scale information and the offset information corresponding to each target image to obtain the control offset information of the camera, ensuring that the camera can capture the movement of multiple people in the image.
In an embodiment, the calculation is a weighted calculation of the product of the scale information and the offset information of all target images.
In an embodiment, given all target images B_i ∈ Ψ, the scale information s_i and offset information (d_xi, d_yi) of each target image are known, and an average calculation can be used to obtain the final offset control amount:
Figure PCTCN2019103654-appb-000005
Figure PCTCN2019103654-appb-000006
In an alternative embodiment, the calculation is a weighted calculation of the product of the offset information of all target images and the power-processed scale information to obtain the control offset information of the camera.
To give closer objects a higher control weight, the scale values can be raised to a power, for example:
Figure PCTCN2019103654-appb-000007
Figure PCTCN2019103654-appb-000008
β is the power exponent. The larger its value, the more dominant the weight of larger-scale objects; the smaller its value, the more all targets tend toward equal weight. The value can be chosen with different parameters for different scenes.
In the follow-shooting method provided by this embodiment, a computer program first acquires a captured image of the camera in real time, the captured image including at least one target image, to obtain the information parameters of one or more target images; then a pre-trained model is used to predict the scale information and offset information corresponding to each target image in the captured image; finally, the control offset information of the camera is determined according to the scale information and offset information of the target images, thereby achieving follow shooting of multiple target images. This solves the problem that the related art provides no follow-shooting method for multiple targets or an entire group of objects, and achieves the effect that a user can follow-shoot multiple targets or an entire group of objects in a specific scene.
Embodiment 2
Referring to FIG. 2, FIG. 2 is a schematic flowchart of another follow-shooting method according to Embodiment 2 of this application. This embodiment builds on and improves the solution of Embodiment 1 and provides a scheme for the training process of the offset model. As shown in FIG. 2, the method includes the following steps:
Step 2100: Acquire a captured image of the camera in real time, the captured image including at least one target image.
Step 2200: Use a pre-trained scale model to predict scale information corresponding to each target image in the captured image.
Step 2300: Use a pre-trained offset model to predict offset information corresponding to each target image in the captured image.
Step 2400: Determine the control offset information of the camera according to the scale information and the offset information.
In an embodiment, as shown in FIG. 3, the training of the offset model in step 2300 may include the following steps:
Step 2210: Obtain training images and corresponding label data from a preset image data set, the label data including bounding box information and key point information of the tracking target in the training image.
In this embodiment, multiple training images are preset in the image data set, and the type of training image can be selected according to the shooting target. This embodiment takes portrait shooting as an example, and the training images collected in the image data set all include portraits. These training images can cover many main types of scene, such as indoors, the seaside, and mountains, and a variety of postures, such as running, sitting in meditation, lying flat, and dancing.
Each training image in the image data set has corresponding label data. The label data of this embodiment includes the bounding box information and key point information of the tracking target in the training image. The bounding box information includes the position and size of the bounding box. In this embodiment, 17 joint points of the human body are selected as key points by way of example, and the coordinate information of each joint point is marked as its key point information. Each joint point is marked as (xi, yi, si), where i is a natural number from 1 to 17 indicating the i-th key point, xi is the horizontal coordinate of the i-th key point, and yi is the vertical coordinate of the i-th key point. When si equals 0, the key point does not exist and need not be marked; when si equals 1, the key point exists. For i from 1 to 17, the key points are: 1 - top of head, 2 - left eye, 3 - right eye, 4 - nose, 5 - throat, 6 - left shoulder, 7 - left elbow, 8 - left wrist, 9 - right shoulder, 10 - right elbow, 11 - right wrist, 12 - left hip, 13 - left knee, 14 - left ankle, 15 - right hip, 16 - right knee, 17 - right ankle.
Step 2220: Obtain the reference position of the bounding box center point according to the bounding box information and key point information of the tracking target.
The traditional "center control" method completes the composition by moving the center point of the target bounding box to the center of the image. Its calculation is simple but does not account for the influence of the target's posture on the composition, so the shooting result can differ considerably from what is actually expected. Therefore, in the shooting method provided by this embodiment, when training the offset model, the different composition requirements of the tracking target in different postures are fully considered: the different postures of the tracking target can be distinguished according to the key point information marked in step 2210, and the reference position of the bounding box center point is calculated from the bounding box information and key point information of the tracking target. This adequately simulates a photographer's composition control and yields a better composition.
Step 2230: Obtain the reference position image corresponding to the training image based on the reference position of the bounding box center point.
When multiple target portraits exist in the training image, the reference position image corresponding to the training image needs to be obtained from the reference position of each tracking target's bounding box center point, the initial position of the bounding box center point, and the number of tracking targets, as follows:
The set of reference positions of the bounding box center points of all tracking targets is defined as:
Θ = {O(P_i)} = {(x_ti, y_ti)}.
The initial position coordinates of each tracking target's bounding box center are defined as:
Δ = {(x_ci, y_ci)}.
The reference position of each pixel in the training image is calculated by the following formula:
Figure PCTCN2019103654-appb-000009
In formula (1), (x, y) are the normalized coordinates of a pixel, Σ_{Θ,Δ} 1 is the number of tracking targets in the training image, X_TG(x, y) is the horizontal coordinate of the reference position of each pixel, Y_TG(x, y) is the vertical coordinate of the reference position of each pixel, x_ti and x_ci are respectively the horizontal coordinates of the reference position and the initial position of each tracking target's bounding box center point, and y_ti and y_ci are respectively the vertical coordinates of the reference position and the initial position of each tracking target's bounding box center point. Once the reference position coordinates of every pixel are determined, the reference position image of the training image is obtained.
Compared with the image obtained by the traditional "center control" method, the reference position image more fully accounts for the composition requirements of different target postures, and its composition is more precise and reasonable.
Step 2240: Use a deep convolutional neural network to predict the reference position of the training image to obtain a prediction result image.
In an embodiment, an initial deep convolutional neural network model is used to predict the training image to obtain the reference position of the tracking target in the image, from which the prediction result image is obtained. The horizontal and vertical coordinates of each pixel in the prediction result image are X_T(x, y) and Y_T(x, y), respectively.
Step 2250: Calculate a first loss value from the reference position image and the prediction result image, and adjust the parameters of the deep convolutional neural network according to the first loss value.
In an embodiment, the first loss value uses the Euclidean distance loss and is calculated from the aforementioned reference position image and prediction result image by formula (2):
L = ∑_{x,y}(X_TG(x,y) - X_T(x,y))² + ∑_{x,y}(Y_TG(x,y) - Y_T(x,y))²    (2)
In formula (2), X_TG(x, y) and Y_TG(x, y) are obtained from formula (1), and X_T(x, y) and Y_T(x, y) are obtained from the prediction result image. The reference position image is the image that is expected to achieve the desired composition. The first loss value represents the deviation between the prediction result image and the reference position image; based on the first loss value, back-propagation is applied to the deep convolutional neural network to adjust its parameters so that the prediction result image comes closer to the reference position image.
Step 2260: Perform steps 2210-2250 sequentially on multiple training images in the image data set until the first loss value in step 2250 no longer decreases, then end the training of the deep convolutional neural network to obtain the pre-trained offset model.
In an embodiment, adjusting the parameters of the deep convolutional neural network according to the first loss value yields different first loss values. While the first loss value keeps decreasing, the prediction result image is getting closer and closer to the reference position image. The deep convolutional neural network is adjusted continually until the first loss value no longer decreases; at that point the prediction result image can be regarded as closest to the reference position image, and the desired deep convolutional neural network model is obtained and used as the trained deep convolutional neural network model.
Since the first loss values obtained from different training images may always differ somewhat, the first-loss criterion differs between training images. "The first loss value no longer decreases" here is a way of expressing that the first loss value has stabilized and meets the expected requirement. For example, if the custom expected requirement is that the first loss value be lower than k, then when at least m consecutive first loss values obtained after multiple training passes over multiple training images remain lower than k, the first loss value can be regarded as no longer decreasing.
This embodiment provides the method of Embodiment 1 for using a pre-trained model to predict the scale information and offset information corresponding to each target image in the captured image. First, the computer program obtains training images and corresponding label data from a preset image data set, the label data including the bounding box information and key point information of the tracking target in the training image; next, the reference position of the bounding box center point is obtained from the bounding box information and key point information of the tracking target; then the reference position image corresponding to the training image is obtained from the reference position of the bounding box center point; a deep convolutional neural network is then used to predict the reference position of the training image to obtain a prediction result image; the first loss value is then calculated from the reference position image and the prediction result image, and the parameters of the deep convolutional neural network are adjusted according to the first loss value; finally, the above steps are performed sequentially on multiple training images in the image data set until the first loss value in step 2250 no longer decreases, at which point the training of the deep convolutional neural network ends and the trained offset model is obtained. The offset-model training method provided by this embodiment solves the problem of how to train a deep convolutional neural network for offset information, and achieves better prediction of the offset information in the follow-shooting method.
Embodiment 3
Referring to FIG. 4, FIG. 4 is a schematic flowchart of another follow-shooting method according to Embodiment 3 of this application. This embodiment builds on and improves the solution of Embodiment 2 and provides a scheme for obtaining the reference position of the bounding box center point from the bounding box information and key point information of the tracking target. As shown in FIG. 4, the method includes the following steps:
Step 2222: Generate a grid table based on the training image by dividing the training image into W*H grids, where W and H are natural numbers greater than 1. Each grid provides a candidate position in the subsequent calculation of the bounding box's composition position, and the values of W and H can be adjusted according to accuracy requirements.
Step 2224: Obtain the second loss value when the bounding box center is placed at each different grid center.
The second loss value is calculated as follows:
The horizontal and vertical coordinate ranges of the image are both [0, 1].
Define a set of reference points, for example:
Figure PCTCN2019103654-appb-000010
Define a set of reference lines, for example:
Figure PCTCN2019103654-appb-000011
The settings of the reference points and reference lines can be adjusted for different composition requirements. In this embodiment, using the above reference points and reference lines, the region bounded by the horizontal coordinate range
Figure PCTCN2019103654-appb-000012
and the vertical coordinate range
Figure PCTCN2019103654-appb-000013
is set as the optimal composition area for the tracking target.
A set of key points of the tracking target and a corresponding set of weight parameters are defined based on the tracking target's key point information:
P = {p_i}, i = 1, 2, …, 17;
W_p = {w_pi}, i = 1, 2, …, 17.
Key line segments are defined from the key point information of the tracking target. The key line segments supplement the posture information of the tracking target: the posture reflected by key points alone may contain some error in certain cases, and combining key line segments based on the key points reflects the posture of the tracking target more clearly. Exemplary key line segments are:
L1: nose -> midpoint of left hip and right hip;
L2: left shoulder -> left elbow;
L3: left elbow -> left wrist;
L4: right shoulder -> right elbow;
L5: right elbow -> right wrist;
L6: left hip -> left knee;
L7: left knee -> left ankle;
L8: right hip -> right knee;
L9: right knee -> right ankle.
Based on the above nine key line segments, a set of key line segments of the tracking target and a corresponding set of weight parameters are defined:
L = {l_j}, j = 1, 2, …, 9;
W_l = {w_lj}, j = 1, 2, …, 9.
When the posture of the tracking target changes, the positions of its key points change, and the lengths and positions of the above key line segments change accordingly.
The distance between a key point and a reference point is calculated by the following formula:
Figure PCTCN2019103654-appb-000014
In this formula, p_i and p_j represent two different points, x_pi and y_pi are the horizontal and vertical coordinates of point p_i, and x_pj and y_pj are the horizontal and vertical coordinates of point p_j.
The distance between a key line and a reference line is calculated by the following formula:
Figure PCTCN2019103654-appb-000015
In this formula, (x_c, y_c) is the midpoint of line segment l, x = a denotes a vertical line, and y = a denotes a horizontal line.
Place the bounding box center at the center (x, y) of each different grid and calculate the second loss value D_xy at that placement:
Figure PCTCN2019103654-appb-000016
Figure PCTCN2019103654-appb-000017
D_xy = D_p + D_l.
In the above formulas, P_xy = P -> (x, y) denotes the normalization of the key points, and L_xy = L -> (x, y) denotes the normalization of the key line segments.
In an embodiment, P_xy = (x/W, y/H), and L_xy is the line segment between the two normalized points.
The second loss value reflects how well the tracking target matches the customized optimal composition area when the bounding box is placed at a given position; the smaller the second loss value, the closer the placement is to the customized optimal composition area.
Step 2226: Select the center position of the grid with the smallest second loss value as the reference position of the bounding box center point.
Figure PCTCN2019103654-appb-000018
When the above condition holds, (x_t, y_t) is selected as the reference position of the bounding box center point. With the custom grid, reference points, and reference lines unchanged, the relationship between (x_t, y_t) and the corresponding key point information (here including the key line segments) is fixed; that is, the mapping is (x_t, y_t) = O(P), where P is the key point information of the target tracked by the lens.
In an alternative embodiment, the custom grid, reference points, and reference lines can be adjusted according to different requirements on image accuracy. In an embodiment, the key points of the tracking target and the relationship between key line segments and key points can also be customized. For example, when higher accuracy is required, W and H can be increased, that is, the number of grids into which the image is divided is increased.
This embodiment provides the process in Embodiment 3 of obtaining the reference position of the bounding box center point from the bounding box information and key point information of the tracking target. The computer program first divides the training image into W*H grids by generating a grid table based on the training image; it then obtains the second loss value when the bounding box center is placed at each different grid center, and finally selects the center position of the grid with the smallest second loss value as the reference position of the bounding box center point. This solves the problem of better obtaining the reference position of the bounding box center point, and achieves better acquisition of offset information in offset-model training.
Embodiment 4
Referring to FIG. 5, FIG. 5 is a schematic flowchart of another follow-shooting method according to Embodiment 4 of this application. This embodiment builds on and improves the solution of Embodiment 2 and provides a scheme for the training process of the scale model. As shown in FIG. 5, the method includes the following steps:
Step 2310: Obtain the Gaussian response map of a training sample image.
In an embodiment, first, using the formula
Figure PCTCN2019103654-appb-000019
the relative scale S of each person's portrait bounding box in the person image is calculated, where w is the pixel width of the person image, h is the pixel height of the person image, and As is the quantity of the person's absolute scale,
Figure PCTCN2019103654-appb-000020
where d is the distance of the person from the lens and a is the pixel area of the person's torso in the person image. Then, according to the relative scale S of each person's portrait bounding box, a Gaussian response map with the same size as the person image is generated; the extreme point of this Gaussian response map is located at the center of the portrait bounding box, and the value of the extreme point equals the relative scale S. Finally, the Gaussian response maps of all persons are superimposed to form the Gaussian response map of the person image. The above three steps are performed on all training sample images to obtain the Gaussian response map corresponding to each training sample image.
Step 2320: Use a deep convolutional neural network to process the training sample image to obtain the scale response map of the training sample image.
In an embodiment, a deep convolutional neural network is used to process the person image in the training sample image to obtain a scale response map of the same size as the person image in the training sample image.
Step 2330: Perform Euclidean distance loss calculation on the Gaussian response map and the scale response map, and adjust the parameters of the deep convolutional neural network according to the calculation result.
In an embodiment, the Euclidean distance loss is calculated between the Gaussian response map generated in step 2310 and the scale response map obtained in step 2320, and the result is used with a back-propagation algorithm to adjust the parameters of the deep convolutional neural network.
Step 2340: Perform steps 2310-2330 sequentially on multiple training sample images until the calculated Euclidean distance loss no longer decreases, then end the training of the deep convolutional neural network to obtain the pre-trained scale model.
In an embodiment, adjusting the parameters of the deep convolutional neural network according to the Euclidean distance loss yields different Euclidean distance losses. While the Euclidean distance loss keeps decreasing, the scale response map predicted by the network is getting closer and closer to the Gaussian response map. The deep convolutional neural network is adjusted continually, and when the Euclidean distance loss finally no longer decreases, the prediction can be regarded as closest to the Gaussian response map; the desired deep convolutional neural network model is then obtained and used as the trained deep convolutional neural network model.
Since the Euclidean distance losses obtained from different training sample images may always differ somewhat, the loss criterion differs between training sample images. "The Euclidean distance loss no longer decreases" here is a way of expressing that the loss has stabilized and meets the expected requirement. For example, if the custom expected requirement is that the Euclidean distance loss be lower than k, then when at least m consecutive Euclidean distance losses obtained after multiple training passes over multiple training images remain lower than k, the loss can be regarded as no longer decreasing.
This embodiment provides the method of training the scale model in the follow-shooting method. First, the Gaussian response map of each training sample image is obtained; next, a deep convolutional neural network is used to process the training sample image to obtain its scale response map; the Euclidean distance loss between the Gaussian response map and the scale response map is then calculated, and the parameters of the deep convolutional neural network are adjusted according to the result. Finally, the above steps are performed sequentially on multiple training sample images until the calculated Euclidean distance loss no longer decreases, at which point the training of the deep convolutional neural network ends and the trained scale model is obtained. This solves the problem of how to obtain a trained deep convolutional neural network corresponding to the scale model and achieves better training of the scale model.
Embodiment 5
The follow-shooting apparatus provided by Embodiment 5 of this application can execute the follow-shooting method provided by any embodiment of this application and has the corresponding functional modules and beneficial effects for executing the method. FIG. 6 is a schematic structural diagram of a follow-shooting apparatus according to Embodiment 5 of this application. Referring to FIG. 6, the follow-shooting apparatus provided by this embodiment of the application may include: an acquisition module 3100 configured to acquire a captured image of a camera in real time, the captured image including at least one target image; a calculation module 3200 configured to use a pre-trained model to predict scale information corresponding to each target image in the captured image and offset information corresponding to each target image; and a control module 3300 configured to determine the control offset information of the camera according to the scale information and the offset information.
In an embodiment, the control module 3300 may be replaced with a weighting control module configured to perform a weighted calculation on the product of the scale information and the offset information corresponding to the target images to obtain the control offset information of the camera.
In an embodiment, the weighting control module may be replaced with a power-processing control module configured to perform a weighted calculation on the product of the offset information corresponding to all target images and the power-processed scale information to obtain the control offset information of the camera.
In an embodiment, the acquisition module 3100 may be replaced with a person acquisition module configured to acquire the captured image of the camera in real time, the captured image including at least one target image, where the target image is a preselected person-object image in the captured image or all person-object images in the captured image.
In an embodiment, the calculation module 3200 may further include a scale calculation module and an offset calculation module. The scale calculation module is configured to use a pre-trained scale model to predict the scale information corresponding to each target image in the captured image, and the offset calculation module is configured to use a pre-trained offset model to predict the offset information corresponding to each target image in the captured image.
In an embodiment, the offset calculation module includes: an offset acquisition unit configured to obtain training images and corresponding label data from a preset image data set, the label data including bounding box information and key point information of the tracking target in the training image; a center point acquisition unit configured to obtain the reference position of the bounding box center point according to the bounding box information and key point information of the tracking target; a reference position acquisition unit configured to obtain the reference position image corresponding to the training image based on the reference position of the bounding box center point; a convolutional neural network calculation unit configured to use a deep convolutional neural network to predict the reference position of the training image to obtain a prediction result image; a loss value calculation unit configured to calculate the first loss value from the reference position image and the prediction result image and adjust the parameters of the deep convolutional neural network according to the first loss value; and a convolutional neural network training unit configured to perform steps 2210-2250 sequentially on multiple training images in the image data set for training until the first loss value in step 2250 no longer decreases, end the training of the deep convolutional neural network, and obtain the pre-trained offset model.
In an embodiment, the center point acquisition unit includes: a grid table generation subunit configured to divide the training image into W*H grids, where W and H are natural numbers greater than 1, to generate a grid table; a loss value acquisition subunit configured to obtain the second loss value when the bounding box center is placed at each different grid center; and a reference position acquisition subunit configured to select the center position of the grid with the smallest second loss value as the reference position of the bounding box center point.
In an embodiment, the scale calculation module includes: a Gaussian response map unit configured to obtain the Gaussian response map of a training sample image; a scale response map unit configured to use a deep convolutional neural network to process the training sample image to obtain the scale response map of the training sample image; a Euclidean distance loss unit configured to perform Euclidean distance loss calculation on the Gaussian response map and the scale response map and adjust the parameters of the deep convolutional neural network according to the calculation result; and a scale-model obtaining unit configured to perform steps 2310-2330 sequentially on multiple training sample images until the calculated Euclidean distance loss no longer decreases, end the training of the deep convolutional neural network, and obtain the pre-trained scale model.
By providing a follow-shooting apparatus, the technical solution of this embodiment solves the lack, in the related art, of a follow-shooting method for multiple targets or an entire group of objects, and achieves the effect that a user can follow-shoot multiple targets or an entire group of objects in a specific scene.
Embodiment 6
FIG. 7 is a schematic structural diagram of a computer device according to Embodiment 6 of this application. As shown in FIG. 7, the computer device includes a memory 4100 and a processor 4200. The number of processors 4200 in the computer device may be one or more; FIG. 7 takes one processor 4200 as an example. The memory 4100 and the processor 4200 in the device may be connected through a bus or in other ways; connection through a bus is taken as an example in FIG. 7.
As a computer-readable storage medium, the memory 4100 can be configured to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the follow-shooting method in the embodiments of this application (for example, the acquisition module 3100, calculation module 3200, and control module 3300 in the follow-shooting apparatus). The processor 4200 executes at least one functional application and data processing of the device/terminal by running the software programs, instructions, and modules stored in the memory 4100, that is, implements the follow-shooting method described above.
In an embodiment, the processor 4200 is configured to run the computer program stored in the memory 4100 to implement the following steps: acquiring a captured image of the camera in real time, the captured image including at least one target image; using a pre-trained model to predict scale information corresponding to each target image in the captured image and offset information corresponding to each target image; and determining the control offset information of the camera according to the scale information and the offset information.
In an embodiment, the computer program of the computer device provided by this embodiment of the application is not limited to the above method operations and can also perform related operations in the follow-shooting method provided by any embodiment of this application.
The memory 4100 may mainly include a program storage area and a data storage area. In an embodiment, the program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created according to the use of the terminal, and the like. In addition, the memory 4100 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device. In some examples, the memory 4100 may include memories remotely provided with respect to the processor 4200, and these remote memories may be connected to the device/terminal/device through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
Embodiment 7
Embodiment 7 of this application further provides a storage medium containing computer-executable instructions, on which a computer program is stored. The computer program includes program instructions which, when executed by a processor, implement a follow-shooting method including: acquiring a captured image of the camera in real time, the captured image including at least one target image; using a pre-trained model to predict scale information corresponding to each target image in the captured image and offset information corresponding to each target image; and determining the control offset information of the camera according to the scale information and the offset information.
In the storage medium containing computer-executable instructions provided by this embodiment of the application, the computer-executable instructions are not limited to the above method operations and can also perform related operations in the follow-shooting method provided by any embodiment of this application.
From the above description of the implementations, those skilled in the art can understand that this application can be implemented by means of software plus general-purpose hardware, or by hardware. Based on this understanding, the technical solution of this application can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as a floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disk of a computer, and includes multiple instructions to make a computer device (which may be a personal computer, a device, a network device, etc.) execute the method described in any embodiment of this application.
In the above embodiments of the follow-shooting apparatus, the units and modules included are only divided according to functional logic, but the division is not limited to the above as long as the corresponding functions can be realized; in addition, the names of the functional units are only for the convenience of distinguishing them from one another and are not used to limit the protection scope of this application.

Claims (11)

  1. A follow-shooting method, comprising:
    acquiring a captured image of a camera in real time, the captured image comprising at least one target image;
    using a pre-trained model to predict scale information corresponding to each target image in the captured image and offset information corresponding to the each target image;
    determining control offset information of the camera according to the scale information and the offset information.
  2. The method according to claim 1, wherein determining the control offset information of the camera according to the scale information and the offset information comprises: performing a weighted calculation on the product of the scale information and the offset information corresponding to all target images to obtain the control offset information of the camera.
  3. The method according to claim 2, wherein performing the weighted calculation on the scale information and the offset information corresponding to each target image to obtain the control offset information of the camera comprises: performing a weighted calculation on the product of the offset information corresponding to all target images and the power-processed scale information to obtain the control offset information of the camera.
  4. The method according to any one of claims 1-3, wherein the target image is a preselected person-object image in the captured image or all person-object images in the captured image.
  5. The method according to any one of claims 1-4, wherein the pre-trained model comprises a pre-trained scale model and a pre-trained offset model;
    using the pre-trained model to predict the scale information corresponding to each target image in the captured image and the offset information corresponding to the each target image comprises:
    using the pre-trained scale model to predict the scale information corresponding to each target image in the captured image;
    using the pre-trained offset model to predict the offset information corresponding to each target image in the captured image.
  6. The method according to claim 5, wherein the training process of the offset model comprises:
    obtaining training images and corresponding label data from a preset image data set, the label data comprising bounding box information and key point information of a tracking target in the training image;
    obtaining a reference position of a bounding box center point according to the bounding box information and key point information of the tracking target;
    obtaining a reference position image corresponding to the training image based on the reference position of the bounding box center point;
    using a deep convolutional neural network to predict the reference position of the training image to obtain a prediction result image;
    calculating a first loss value according to the reference position image and the prediction result image, and adjusting parameters of the deep convolutional neural network according to the first loss value;
    performing the above steps sequentially on multiple training images in the image data set until the first loss value no longer decreases, and ending the training of the deep convolutional neural network to obtain the pre-trained offset model.
  7. The method according to claim 6, wherein obtaining the reference position of the bounding box center point according to the bounding box information and key point information of the tracking target comprises:
    dividing the training image into W*H grids to generate a grid table, W and H being natural numbers greater than 1;
    obtaining a second loss value in the case of placing the bounding box center at different grid centers;
    selecting the center position of the grid with the smallest second loss value as the reference position of the bounding box center point.
  8. The method according to claim 5, wherein the training process of the scale model comprises:
    obtaining a Gaussian response map of a training sample image;
    processing the training sample image using a deep convolutional neural network to obtain a scale response map of the training sample image;
    performing Euclidean distance loss calculation on the Gaussian response map and the scale response map, and adjusting parameters of the deep convolutional neural network according to the calculation result;
    performing the above steps sequentially on multiple training sample images until the calculated Euclidean distance loss no longer decreases, and ending the training of the deep convolutional neural network to obtain the pre-trained scale model.
  9. A follow-shooting apparatus, comprising:
    an acquisition module configured to acquire a captured image of a camera in real time, the captured image comprising at least one target image;
    a calculation module configured to use a pre-trained model to predict scale information corresponding to each target image in the captured image and offset information corresponding to the each target image;
    a control module configured to determine control offset information of the camera according to the scale information and the offset information.
  10. A device, comprising:
    at least one processor;
    a memory configured to store at least one program,
    wherein, when the at least one program is executed by the at least one processor, the at least one processor implements the follow-shooting method according to any one of claims 1-8.
  11. A computer-readable storage medium on which a computer program is stored, the computer program comprising program instructions which, when executed by a processor, implement the follow-shooting method according to any one of claims 1-8.
PCT/CN2019/103654 2019-06-12 2019-08-30 Follow-shooting method, apparatus, device, and storage medium WO2020248395A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910505922.0 2019-06-12
CN201910505922.0A CN110232706B (zh) 2019-06-12 2019-06-12 Multi-person follow-shooting method, apparatus, device, and storage medium

Publications (1)

Publication Number Publication Date
WO2020248395A1 true WO2020248395A1 (zh) 2020-12-17

Family

ID=67859704

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/103654 WO2020248395A1 (zh) 2019-06-12 2020-12-17 Follow-shooting method, apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN110232706B (zh)
WO (1) WO2020248395A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633355A (zh) * 2020-12-18 2021-04-09 北京迈格威科技有限公司 Image data processing method and apparatus, and object detection model training method and apparatus
CN115665553A (zh) * 2022-09-29 2023-01-31 深圳市旗扬特种装备技术工程有限公司 Automatic tracking method and apparatus for an unmanned aerial vehicle, electronic device, and storage medium

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111104925B (zh) * 2019-12-30 2022-03-11 上海商汤临港智能科技有限公司 Image processing method and apparatus, storage medium, and electronic device
CN111462194B (zh) * 2020-03-30 2023-08-11 苏州科达科技股份有限公司 Training method and apparatus for an object tracking model, and storage medium
CN112084876B (zh) * 2020-08-13 2024-05-03 宜通世纪科技股份有限公司 Target object tracking method, system, apparatus, and medium
CN112788426A (zh) * 2020-12-30 2021-05-11 北京安博盛赢教育科技有限责任公司 Display method and apparatus for a function display area, medium, and electronic device
CN114554086B (zh) * 2022-02-10 2024-06-25 支付宝(杭州)信息技术有限公司 Auxiliary shooting method and apparatus, and electronic device
WO2024055957A1 (zh) * 2022-09-16 2024-03-21 维沃移动通信有限公司 Method and apparatus for adjusting shooting parameters, electronic device, and readable storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050190972A1 (en) * 2004-02-11 2005-09-01 Thomas Graham A. System and method for position determination
CN102867311A (zh) * 2011-07-07 2013-01-09 株式会社理光 Target tracking method and target tracking device
CN107749952A (zh) * 2017-11-09 2018-03-02 睿魔智能科技(东莞)有限公司 Intelligent unmanned photography method and system based on deep learning
CN109803090A (zh) * 2019-01-25 2019-05-24 睿魔智能科技(深圳)有限公司 Automatic zoom method and system for unmanned shooting, unmanned camera, and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101888479B (zh) * 2009-05-14 2012-05-02 汉王科技股份有限公司 Method and apparatus for detecting and tracking a target image
JP6273685B2 (ja) * 2013-03-27 2018-02-07 パナソニックIpマネジメント株式会社 Tracking processing device, tracking processing system provided with the same, and tracking processing method
WO2015083199A1 (en) * 2013-12-04 2015-06-11 J Tech Solutions, Inc. Computer device and method executed by the computer device
CN104346811B (zh) * 2014-09-30 2017-08-22 深圳市华尊科技股份有限公司 Real-time target tracking method based on video images and apparatus therefor
CN108986169A (zh) * 2018-07-06 2018-12-11 北京字节跳动网络技术有限公司 Method and apparatus for processing images
CN109522896A (zh) * 2018-11-19 2019-03-26 武汉科技大学 Instrument searching method based on template matching and a two-degree-of-freedom pan-tilt camera

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050190972A1 (en) * 2004-02-11 2005-09-01 Thomas Graham A. System and method for position determination
CN102867311A (zh) * 2011-07-07 2013-01-09 株式会社理光 Target tracking method and target tracking device
CN107749952A (zh) * 2017-11-09 2018-03-02 睿魔智能科技(东莞)有限公司 Intelligent unmanned photography method and system based on deep learning
CN109803090A (zh) * 2019-01-25 2019-05-24 睿魔智能科技(深圳)有限公司 Automatic zoom method and system for unmanned shooting, unmanned camera, and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633355A (zh) * 2020-12-18 2021-04-09 北京迈格威科技有限公司 Image data processing method and apparatus, and object detection model training method and apparatus
CN115665553A (zh) * 2022-09-29 2023-01-31 深圳市旗扬特种装备技术工程有限公司 Automatic tracking method and apparatus for an unmanned aerial vehicle, electronic device, and storage medium
CN115665553B (zh) * 2022-09-29 2023-06-13 深圳市旗扬特种装备技术工程有限公司 Automatic tracking method and apparatus for an unmanned aerial vehicle, electronic device, and storage medium

Also Published As

Publication number Publication date
CN110232706B (zh) 2022-07-29
CN110232706A (zh) 2019-09-13

Similar Documents

Publication Publication Date Title
WO2020248395A1 (zh) Follow-shooting method, apparatus, device, and storage medium
WO2020248396A1 (zh) Image shooting method, apparatus, device, and storage medium
CN110139115B (zh) Key-point-based virtual avatar posture control method and apparatus, and electronic device
WO2019228196A1 (zh) Target tracking method for panoramic video, and panoramic camera
CN109241910B (zh) Face key point localization method based on deep multi-feature fusion cascaded regression
US11703949B2 (en) Directional assistance for centering a face in a camera field of view
CN107749952B (zh) Intelligent unmanned photography method and system based on deep learning
CN105678809A (zh) Handheld automatic follow-shooting device and target tracking method thereof
CN105718887A (zh) Method and system for dynamically capturing face images with a mobile terminal camera
CN110998659A (zh) Image processing system, image processing method, and program
CN106973221B (zh) Unmanned aerial vehicle shooting method and system based on aesthetic evaluation
EP2430614A1 (de) Method for the real-time-capable, computer-assisted analysis of an image sequence containing a variable pose
CN108090463B (zh) Object control method and apparatus, storage medium, and computer device
WO2021052208A1 (zh) Auxiliary photographing device for movement disorder analysis, control method, and apparatus
CN107351080B (zh) Hybrid intelligence research system based on a camera unit array and control method
CN109685709A (zh) Illumination control method and apparatus for an intelligent robot
US11087514B2 (en) Image object pose synchronization
CN108702456A (zh) Focusing method, device, and readable storage medium
CN106203428B (zh) Image saliency detection method based on blur estimation fusion
CN108416800A (zh) Target tracking method and apparatus, terminal, and computer-readable storage medium
CN116580151A (zh) Human body three-dimensional model construction method, electronic device, and storage medium
WO2021147650A1 (zh) Photographing method and apparatus, storage medium, and electronic device
CN107705307B (zh) Shooting composition method and system based on deep learning
CN115457666A (zh) Method and system for recognizing the motion center of gravity of a living object, and computer-readable storage medium
CN114140530A (zh) Image processing method and projection device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19932919

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19932919

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.05.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19932919

Country of ref document: EP

Kind code of ref document: A1