WO2020248395A1 - Follow-Shot Method, Device, Equipment and Storage Medium (跟拍方法、装置、设备及存储介质) - Google Patents
Follow-Shot Method, Device, Equipment and Storage Medium
- Publication number
- WO2020248395A1 (PCT/CN2019/103654)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- image
- scale
- target
- training
- information
- Prior art date
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/292—Multi-camera tracking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06T7/75—Determining position or orientation of objects or cameras using feature-based methods involving models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
Definitions
- This application relates to the field of photographing technology, and for example, to a follow-shot method, device, equipment, and storage medium.
- In many fields, a camera must automatically follow a target object in order to achieve better shooting results. In a follow shot, the target object stays relatively stable in the frame and the framing remains unchanged. This requires the photographer to move at essentially the same speed as the target object, so that the target object's position in the picture stays stable, the target object is not moved out of the picture, and the framing does not change.
- This shooting method records the posture and actions of the target object through the movement of the camera without interfering with the subject, expressing the subject in a relatively natural state.
- This application provides a follow-shot method, device, equipment, and storage medium to achieve follow shots of multiple target objects or an entire group of objects.
- An embodiment of the present application provides a follow-shot method. The follow-shot method includes: acquiring a captured image of a camera in real time, the captured image including at least one target image; using a pre-trained model to predict scale information corresponding to each target image in the captured image and offset information corresponding to each target image; and confirming control offset information of the camera according to the scale information and the offset information.
- An embodiment of the present application provides a follow-shot device, which includes:
- an acquisition module configured to acquire a captured image of a camera in real time, the captured image including at least one target image;
- a calculation module configured to use a pre-trained model to predict the scale information corresponding to each target image in the captured image and the offset information corresponding to each target image;
- a control module configured to confirm the control offset information of the camera according to the scale information and the offset information.
- An embodiment of the present application provides a computer device, which includes:
- one or more processors;
- a memory configured to store one or more programs;
- wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the follow-shot method described above.
- An embodiment of the present application provides a computer-readable storage medium on which a computer program is stored.
- The computer program includes program instructions which, when executed by a processor, implement the follow-shot method described above.
- FIG. 1 is a schematic flowchart of a follow-shot method provided in Embodiment 1 of the present application;
- FIG. 2 is a schematic flowchart of another follow-shot method provided in Embodiment 2 of the present application;
- FIG. 3 is a schematic flowchart of another follow-shot method provided in Embodiment 2 of the present application;
- FIG. 4 is a schematic flowchart of another follow-shot method provided in Embodiment 3 of the present application;
- FIG. 5 is a schematic flowchart of another follow-shot method provided in Embodiment 4 of the present application;
- FIG. 6 is a schematic structural diagram of a follow-shot device provided in Embodiment 5 of the present application;
- FIG. 7 is a schematic structural diagram of a computer device provided in Embodiment 6 of the present application.
- Some exemplary embodiments are described as processes or methods depicted in flowcharts. Although a flowchart describes steps as sequential processing, many of the steps can be performed in parallel, concurrently, or simultaneously, and the order of the steps can be rearranged. Processing may be terminated when its operations are completed, but may also include additional steps not shown in the drawings. Processing can correspond to methods, functions, procedures, subroutines, subprograms, and so on.
- Terms such as "first" and "second" may be used herein to describe various directions, actions, steps, or elements, but these directions, actions, steps, or elements are not limited by these terms. These terms are only used to distinguish one direction, action, step, or element from another.
- For example, without departing from the scope of this application, the first speed difference may be referred to as the second speed difference, and similarly, the second speed difference may be referred to as the first speed difference; both are speed differences, but they are not the same speed difference.
- The terms "first" and "second" should not be understood as indicating or implying relative importance or implicitly indicating the number of the indicated technical features; a feature defined with "first" or "second" may explicitly or implicitly include one or more of that feature. "A plurality of" means at least two, such as two or three, unless specifically defined otherwise.
- FIG. 1 is a schematic flowchart of a follow-shot method provided in Embodiment 1 of the present application. This embodiment is applicable to situations where a camera device is used to follow-shoot multiple people.
- Step 1100: Acquire a captured image of the camera in real time, where the captured image includes at least one target image.
- Each frame captured by the camera device is acquired. The camera device in this embodiment may include a video camera, a camera, and so on.
- The target image is a preselected person image in the captured image, or all person images in the captured image — that is, one person image or multiple person images in each frame captured by the camera device. In an alternative embodiment, the target image may also be an animal image, a vehicle image, or another subject image instead of a person image.
- A person image data set is constructed, consisting of person images and label data corresponding to each person image. The label data in this embodiment includes the portrait bounding box of each person, the pixel area of the person's torso in the person image, and the distance of the person from the lens.
- The portrait bounding box is used to determine the position of the person image in each frame; it is the region of each frame corresponding to the person image, generally a vertically or horizontally elongated rectangular frame. The size and position of the bounding box depend on the size of the tracking target in the image captured by the lens, and the bounding box can be determined with a visual tracking method from the related art.
- The pixel area of a person image is the region composed of all pixels corresponding to that person image. The distance between the person and the lens is the distance between the camera device and the captured person, which can be obtained directly by camera devices in the related art.
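- As a concrete illustration, the label data described above could be represented in Python as follows; the class and field names are hypothetical and chosen here only for readability, not taken from the application.

```python
from dataclasses import dataclass

@dataclass
class PersonLabel:
    """Label data for one person in a person image (names hypothetical)."""
    bbox: tuple[int, int, int, int]  # portrait bounding box (x, y, w, h), in pixels
    torso_pixel_area: float          # pixel area of the person's torso in the image
    distance_to_lens: float          # distance between the camera device and the person

# A data set entry pairs one person image with one label per person in it:
# sample = {"image": "frame_000123.jpg", "labels": [PersonLabel(...), ...]}
```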
- Step 1200: Use a pre-trained model to predict the scale information corresponding to each target image in the captured image and the offset information corresponding to each target image.
- The model in this embodiment may be a deep convolutional neural network model. The scale information is the size information of the person frame of the target image, and the offset information is the position information of the target image's movement.
- The set of portrait bounding boxes is denoted as {B_i}, where B_i is the bounding box of each portrait, and scale is the scale response map. The scale information of each portrait bounding box is obtained from the scale response map (the formula is not reproduced in this text), where scale(x, y) is the value of the scale map at coordinate (x, y) and (x_pi, y_pi) is the position of an extreme point. N is generally taken as 0, 1, 2, or 3, or determined as needed, and m and n are indices used to traverse the rectangular area around the extreme point.
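- The symbols above suggest that the scale value s_i of each bounding box B_i is read from the scale response map in a small (2N+1)×(2N+1) window around the extreme point (x_pi, y_pi). A minimal sketch under that assumption (averaging over the window is a guess about the unreproduced formula; N = 0 would read the peak alone):

```python
import numpy as np

def read_bbox_scale(scale_map: np.ndarray, x_p: int, y_p: int, N: int = 1) -> float:
    """Read the scale value s_i of one portrait bounding box by averaging the
    scale response map over the (2N+1)x(2N+1) window around the extreme point
    (x_p, y_p). The averaging rule is an assumption about the omitted formula."""
    h, w = scale_map.shape
    vals = [scale_map[y_p + n, x_p + m]
            for m in range(-N, N + 1)      # m, n traverse the rectangular area
            for n in range(-N, N + 1)
            if 0 <= x_p + m < w and 0 <= y_p + n < h]
    return float(np.mean(vals)) if vals else 0.0
```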
- XT is the reference-position horizontal-coordinate image and YT is the reference-position vertical-coordinate image. The offset images DX and DY of each pixel can be calculated directly from the reference position images XT and YT: DX is the horizontal offset image and DY is the vertical offset image.
- The offset information is then obtained from the offset images DX and DY; it includes the average offset control amount (d_xi, d_yi) for each target, where dx(i, j) and dy(i, j) are the values at coordinate (i, j) in the offset images DX and DY, and s_th is a set threshold.
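- Combining the definitions above, the per-target average offset control amount (d_xi, d_yi) can plausibly be read out by averaging DX and DY over the pixels of target i whose scale response exceeds s_th; the restriction to the target's bounding box is a hypothetical detail:

```python
import numpy as np

def average_offset(DX, DY, scale_map, bbox, s_th):
    """Average the offset images over the pixels of one target whose scale
    response exceeds the threshold s_th; returns (d_xi, d_yi).
    The exact pixel-selection rule is an assumption."""
    x, y, w, h = bbox
    region = (slice(y, y + h), slice(x, x + w))
    mask = scale_map[region] > s_th
    if not mask.any():
        return 0.0, 0.0
    return float(DX[region][mask].mean()), float(DY[region][mask].mean())
```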
- Step 1300: Confirm the control offset information of the camera according to the scale information and the offset information.
- A calculation is performed on the scale information and offset information corresponding to each target image to obtain the control offset information of the camera, ensuring that the camera can follow the movement of multiple persons in the image.
- In one calculation, a weighted calculation is performed on the product of the scale information and the offset information of all target images. In another, a weighted calculation is performed on the product of the offset information of all target images and the power-processed scale information; that is, the scale value can be exponentiated with a power exponent γ. The larger γ is, the more the larger-scale objects dominate the weighting; the smaller γ is, the more all targets tend toward equal weight. Different values of γ can be chosen for different scenarios.
- In this embodiment, a computer program first acquires a captured image of a camera in real time, the captured image including at least one target image, to obtain the information parameters of one or more target images; it then uses a pre-trained model to predict the scale information corresponding to each target image in the captured image and the offset information corresponding to each target image; finally, it confirms the control offset information of the camera according to the scale information and offset information of the target images, thereby achieving a follow shot of multiple target images. This solves the problem that the related art provides no follow-shot method for multiple targets or an entire group of objects, and achieves the effect that users can follow-shoot multiple targets or an entire group of objects in a given scene.
- FIG. 2 is a schematic flowchart of another follow-shot method provided in Embodiment 2 of the present application.
- This embodiment builds on the solution of Embodiment 1 and provides a solution for the training process of the offset model. As shown in FIG. 2, the method includes the following steps:
- Step 2100: Acquire a captured image of the camera in real time, where the captured image includes at least one target image.
- Step 2200: Use a pre-trained scale model to predict the scale information corresponding to each target image in the captured image.
- Step 2300: Use a pre-trained offset model to predict the offset information corresponding to each target image in the captured image.
- Step 2400: Confirm the control offset information of the camera according to the scale information and the offset information.
- The training of the offset model used in step 2300 may include the following steps:
- Step 2210: Obtain training images and corresponding label data from a preset image data set, where the label data includes the bounding box information and key point information of the tracking target in the training image.
- Multiple training images are preset in the image data set, and the training image type can be selected according to different shooting targets. Taking portrait shooting as an example, all training images collected in the image data set contain portraits. These training images can cover many types of main scenes, such as indoors, beaches, and mountains, and various postures, such as running, sitting, lying down, and dancing.
- Each training image in the image data set has corresponding label data; in this embodiment the label data includes the bounding box information and key point information of the tracking target in the training image, where the bounding box information includes the position and size of the bounding box.
- In this embodiment, 17 joint points of the human body are selected as key points, and the coordinate information corresponding to each joint point is marked as key point information. Each joint point is marked as (x_i, y_i, s_i), where i is a natural number from 1 to 17 indicating the i-th key point, x_i is the horizontal coordinate of the i-th key point, and y_i is its vertical coordinate. When s_i equals 0, the key point does not exist and no coordinates need to be marked; when s_i equals 1, the key point exists.
- For i from 1 to 17, the key points are: 1 - top of head, 2 - left eye, 3 - right eye, 4 - nose, 5 - throat, 6 - left shoulder, 7 - left elbow, 8 - left wrist, 9 - right shoulder, 10 - right elbow, 11 - right wrist, 12 - left hip, 13 - left knee, 14 - left ankle, 15 - right hip, 16 - right knee, 17 - right ankle.
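- To make the annotation format concrete, a hypothetical Python rendering of one person's key point labels might look like this (coordinate values are illustrative only):

```python
# Index -> joint name for the 17 key points (1-based in the text above).
JOINT_NAMES = [
    "top of head", "left eye", "right eye", "nose", "throat",
    "left shoulder", "left elbow", "left wrist",
    "right shoulder", "right elbow", "right wrist",
    "left hip", "left knee", "left ankle",
    "right hip", "right knee", "right ankle",
]

# One person's annotation: 17 tuples (x_i, y_i, s_i).
# s_i == 1: the key point exists at (x_i, y_i); s_i == 0: absent, no mark needed.
keypoints = [(352, 96, 1), (340, 118, 1), (366, 118, 1), (353, 130, 1),
             (353, 160, 1), (310, 175, 1), (300, 230, 1), (295, 280, 0),
             (396, 175, 1), (406, 230, 1), (411, 280, 1), (325, 320, 1),
             (322, 400, 1), (320, 475, 1), (381, 320, 1), (384, 400, 1),
             (386, 475, 1)]
```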
- Step 2220: Obtain the reference position of the center point of the bounding box according to the bounding box information and key point information of the tracking target.
- The traditional "center control" method moves the center point of the target bounding box to the center of the image to complete the composition. That calculation is simple but does not take the target's posture into account, so the shooting result can differ considerably from what is actually expected. In the shooting method provided in this embodiment, the offset model is therefore trained with the differing composition requirements of the tracking target in different poses fully considered: the pose of the tracking target can be distinguished from the key point information marked in step 2210, and the reference position of the bounding box center point is calculated from the bounding box information and key point information. This more fully simulates a photographer's composition control, and the composition effect is better.
- Step 2230: Obtain a reference position image corresponding to the training image based on the reference position of the center point of the bounding box.
- The acquisition method is as follows. The set of reference positions of the bounding box center points of all tracking targets is defined as {(x_ti, y_ti)}, and the initial position coordinates of each tracking target's bounding box center are defined as (x_ci, y_ci), where i indexes the tracking targets in the training image. With (x, y) the normalized coordinates of a pixel, formula (1) (not reproduced in this text) defines X_TG(x, y), the horizontal coordinate of the reference position of each pixel, and Y_TG(x, y), the vertical coordinate of the reference position of each pixel; x_ti and x_ci are the horizontal coordinates of the reference position and initial position of each tracking target's bounding box center, and y_ti and y_ci are the corresponding vertical coordinates. Once the reference position coordinates of every pixel are determined, the reference position image of the training image is obtained.
- Compared with the image obtained by the traditional "center control" method, the reference position image fully considers the composition requirements of different target poses, and its composition is more precise and reasonable.
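- Formula (1) itself is not reproduced here. One simple construction consistent with the symbol definitions above is to write each tracking target's reference center (x_ti, y_ti) into the pixels belonging to that target, approximated below by its bounding box; treat this purely as an illustrative assumption:

```python
import numpy as np

def reference_position_images(shape, bboxes, ref_centers):
    """Build X_TG / Y_TG: for each pixel of tracking target i (approximated
    by its bounding box), store the normalized reference center (x_ti, y_ti).
    The per-pixel assignment rule is an assumption about formula (1)."""
    H, W = shape
    X_TG = np.zeros((H, W), dtype=np.float32)
    Y_TG = np.zeros((H, W), dtype=np.float32)
    for (x, y, w, h), (x_t, y_t) in zip(bboxes, ref_centers):
        X_TG[y:y + h, x:x + w] = x_t   # horizontal reference coordinate
        Y_TG[y:y + h, x:x + w] = y_t   # vertical reference coordinate
    return X_TG, Y_TG
```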
- Step 2240: Use the deep convolutional neural network to predict the reference position of the training image to obtain a prediction result image.
- The initial deep convolutional neural network model predicts the reference position of the tracking target in the training image, from which the prediction result image is obtained. The horizontal and vertical coordinates of each pixel in the prediction result image are X_T(x, y) and Y_T(x, y), respectively.
- Step 2250: Calculate a first loss value from the reference position image and the prediction result image, and adjust the parameters of the deep convolutional neural network according to the first loss value.
- The first loss value adopts the Euclidean distance loss and is calculated by formula (2) (not reproduced in this text) from the reference position image and the prediction result image, where X_TG(x, y) and Y_TG(x, y) come from formula (1) and X_T(x, y) and Y_T(x, y) come from the prediction result image.
- The reference position image is the image expected to achieve the desired composition, so the first loss value represents the deviation between the prediction result image and the reference position image. Based on the first loss value, backpropagation is applied to the deep convolutional neural network to adjust its parameters so that the prediction result image moves closer to the reference position image.
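- A plausible reading of a "Euclidean distance loss" between the two coordinate-image pairs is the mean squared difference below; the normalization (mean rather than sum) is an arbitrary choice:

```python
import numpy as np

def first_loss(X_TG, Y_TG, X_T, Y_T):
    """Euclidean distance loss between the reference position image
    (X_TG, Y_TG) and the prediction result image (X_T, Y_T)."""
    return float(np.mean((X_TG - X_T) ** 2 + (Y_TG - Y_T) ** 2))
```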
- Step 2260: Perform steps 2210-2250 sequentially on multiple training images in the image data set until the first loss value in step 2250 no longer decreases, then end the training of the deep convolutional neural network to obtain the pre-trained offset model.
- As the parameters of the deep convolutional neural network are adjusted according to the first loss value, different first loss values are obtained. While the first loss value keeps decreasing, the prediction result image is getting closer and closer to the reference position image; the network is adjusted until the first loss value no longer decreases, at which point the prediction result image can be considered closest to the reference position image, and the desired model is obtained as the trained deep convolutional neural network model.
- Because the first-loss standard differs between training images, "no longer decreases" here means that the first loss value becomes stable and meets the expected requirements. For example, if the expected first loss value is lower than k, then when at least m consecutive first loss values obtained across multiple training iterations on multiple training images are all lower than k, the first loss value can be regarded as no longer decreasing.
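- The stopping rule just described — at least m consecutive first loss values below a threshold k — can be expressed as a small helper; k and m are user-chosen parameters:

```python
def loss_has_converged(loss_history, k, m):
    """True when the last m recorded loss values are all below k, i.e. the
    loss is regarded as 'no longer decreasing' in the sense above."""
    return len(loss_history) >= m and all(v < k for v in loss_history[-m:])

# Inside a training loop:
#   losses.append(first_loss(X_TG, Y_TG, X_T, Y_T))
#   if loss_has_converged(losses, k=0.01, m=5): break
```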
- This embodiment details how the pre-trained model of Embodiment 1 predicts the scale information and offset information corresponding to each target image in the captured image. A computer program first obtains training images and corresponding label data from the preset image data set, the label data including the bounding box information and key point information of the tracking target in the training image; it then obtains the reference position of the bounding box center point from that bounding box information and key point information; next, it obtains the reference position image corresponding to the training image based on the reference position of the bounding box center point; it then uses the deep convolutional neural network to predict the reference position of the training image, yielding the prediction result image; it calculates the first loss value from the reference position image and the prediction result image and adjusts the parameters of the deep convolutional neural network according to the first loss value; finally, it performs the above steps sequentially on multiple training images in the image data set until the first loss value in step 2250 no longer decreases, ending the training of the deep convolutional neural network and yielding the trained offset model.
- The offset model training method provided in this embodiment solves the problem of how to train the deep convolutional neural network for offset information, and achieves better prediction of the offset information in the shooting method.
- FIG. 4 is a schematic flowchart of another follow-shot method provided in Embodiment 3 of the present application.
- This embodiment builds on the solution of Embodiment 2 and provides a solution for obtaining the reference position of the bounding box center point from the bounding box information and key point information of the tracking target. The method includes the following steps:
- Step 2222: Generate a grid table based on the training image. The training image is divided into W*H grids, where W and H are natural numbers greater than 1; each grid provides a position option in the subsequent calculation of the composition position of the bounding box, and the values of W and H can be adjusted according to the accuracy requirements.
- Step 2224: Obtain a second loss value when the center of the bounding box is placed at different grid centers.
- The horizontal and vertical coordinate ranges of the image are both [0, 1], and the reference points and reference lines can be adjusted for different composition requirements; the area that these reference points and reference lines bound within the horizontal and vertical coordinate ranges is set as the optimal composition area for the tracking target.
- Key line segments are defined from the key point information of the tracking target and are used to supplement its posture information: the posture reflected by key points alone may carry some error in certain circumstances, and combining key line segments with key points reflects the posture of the tracking target more clearly. For example:
- p_i and p_j denote two different key points; x_pi and y_pi are the horizontal and vertical coordinates of p_i, and x_pj and y_pj are the horizontal and vertical coordinates of p_j. P_xy denotes the normalized grid point (x/W, y/H), and L_xy denotes the normalized line segment between two points.
- The second loss value reflects how well the tracking target conforms to the customized optimal composition area when the bounding box is placed at different positions: the smaller the second loss value, the closer the placement is to the customized optimal composition area.
- Step 2226: Select the center position of the grid with the smallest second loss value as the reference position of the center point of the bounding box.
- The custom grids, reference points, and reference lines can be adjusted according to different accuracy requirements, and the key points of the tracking target and the relationship between key line segments and key points can also be customized. For example, when higher accuracy is required, W and H can be increased, i.e., the image is divided into more grids. A brute-force search over this grid is sketched below.
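- The sketch below makes steps 2222-2226 concrete: place the bounding box center at every grid center, evaluate the second loss, and keep the minimum. The second-loss function is left abstract because its exact form (reference points, reference lines, key line segments) is scenario-specific:

```python
def best_reference_position(W, H, second_loss):
    """Brute-force search over the W*H grid table: try every grid center
    (normalized to [0, 1] x [0, 1]) as the bounding box center and return
    the position with the smallest second loss."""
    best, best_pos = float("inf"), None
    for gx in range(W):
        for gy in range(H):
            cx, cy = (gx + 0.5) / W, (gy + 0.5) / H   # normalized grid center
            loss = second_loss(cx, cy)
            if loss < best:
                best, best_pos = loss, (cx, cy)
    return best_pos
```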
- This embodiment details the process in Embodiment 2 of obtaining the reference position of the bounding box center point from the bounding box information and key point information of the tracking target. A computer program first generates a grid table by dividing the training image into W*H grids, then obtains the second loss value when the bounding box center is placed at different grid centers, and finally selects the center position of the grid with the smallest second loss value as the reference position of the bounding box center point. This solves the problem of obtaining a better reference position for the bounding box center point, and achieves better offset information in the offset model training.
- FIG. 5 is a schematic flowchart of another follow-shot method provided in Embodiment 4 of the present application. This embodiment builds on the solution of Embodiment 2 and provides a solution for the training process of the scale model. As shown in FIG. 5, the method includes the following steps:
- Step 2310: Obtain a Gaussian response map of the training sample image.
- First, the relative scale S of each person's bounding box in the person image is calculated by a formula (not reproduced in this text), where w is the pixel width of the person image, h is the pixel height of the person image, A_s is the absolute scale of the person, d is the distance between the person and the camera, and a is the pixel area of the person's torso in the person image. Then, according to the relative scale S of each portrait bounding box, a Gaussian response map with the same size as the person image is generated; the extreme point of the Gaussian response map is located at the center of the portrait bounding box, and the value at the extreme point equals the relative scale S. Finally, the Gaussian response maps of all persons are superimposed to form the Gaussian response map of the person image. These three steps are performed for all training sample images, giving the Gaussian response map corresponding to each training sample image.
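- The generation and superposition of the Gaussian response maps can be sketched as follows. The Gaussian width is not specified in the text, so σ is made proportional to the relative scale S here, and per-person maps are superimposed with an element-wise maximum — both are assumptions (summation would be another reading of "superimposed"):

```python
import numpy as np

def gaussian_response_map(shape, centers, scales, sigma_factor=20.0):
    """Superimpose one Gaussian per person: extreme point at the bounding box
    center (cx, cy), extreme value equal to the relative scale S.
    sigma_factor and the element-wise-max superposition are assumptions."""
    H, W = shape
    ys, xs = np.mgrid[0:H, 0:W]
    out = np.zeros((H, W), dtype=np.float32)
    for (cx, cy), S in zip(centers, scales):
        sigma = max(S * sigma_factor, 1.0)
        g = S * np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))
        out = np.maximum(out, g)   # value at (cx, cy) equals S
    return out
```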
- Step 2320: Use the deep convolutional neural network to process the training sample image to obtain a scale response map of the training sample image.
- The deep convolutional neural network processes the person image in the training sample image to obtain a scale response map of the same size as that person image.
- Step 2330: Perform a Euclidean distance loss calculation on the Gaussian response map and the scale response map, and adjust the parameters of the deep convolutional neural network according to the calculation result.
- The Euclidean distance loss is calculated between the Gaussian response map generated in step 2310 and the scale response map obtained in step 2320, and the result is used to adjust the parameters of the deep convolutional neural network with the backpropagation algorithm.
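- Steps 2320-2330 together form a standard supervised regression iteration. A minimal PyTorch-style sketch, where `model` is assumed to be a fully convolutional network producing a map the same size as its input (the architecture is not specified in the text):

```python
import torch.nn.functional as F

def train_step(model, optimizer, image, gaussian_map):
    """One scale-model training iteration: predict the scale response map,
    compute the Euclidean distance (squared-error) loss against the Gaussian
    response map, and backpropagate to adjust the network parameters."""
    optimizer.zero_grad()
    scale_map = model(image)                    # step 2320: scale response map
    loss = F.mse_loss(scale_map, gaussian_map)  # step 2330: Euclidean distance loss
    loss.backward()                             # backpropagation algorithm
    optimizer.step()
    return loss.item()
```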
- Step 2340: Perform steps 2310-2330 sequentially on multiple training sample images until the calculated Euclidean distance loss no longer decreases, then end the training of the deep convolutional neural network to obtain the pre-trained scale model.
- As the parameters of the deep convolutional neural network are adjusted according to the Euclidean distance loss, different loss values are obtained. While the Euclidean distance loss keeps decreasing, the predicted scale response map is getting closer and closer to the Gaussian response map; the network is adjusted continuously, and when the Euclidean distance loss no longer decreases, the predicted scale response map can be considered closest to the Gaussian response map, and the desired model is obtained as the trained deep convolutional neural network model.
- Because the loss standard differs between training sample images, "no longer decreases" here means that the Euclidean distance loss becomes stable and meets the expected requirements.
- This embodiment provides the training method for the scale model in the follow-shot method. First, the Gaussian response map of the training sample image is obtained; then the deep convolutional neural network processes the training sample image to obtain its scale response map; the Euclidean distance loss is calculated between the Gaussian response map and the scale response map, and the parameters of the deep convolutional neural network are adjusted according to the result; finally, the above steps are performed sequentially on multiple training sample images until the calculated Euclidean distance loss no longer decreases, ending the training and yielding a trained scale model. This solves the problem of how to obtain the trained deep convolutional neural network corresponding to the scale model, and achieves better training of the scale model.
- FIG. 6 is a schematic structural diagram of a follow-shot device provided in Embodiment 5 of the present application.
- The follow-shot device provided in this embodiment may include: an acquisition module 3100 configured to acquire a captured image of a camera in real time, the captured image including at least one target image; a calculation module 3200 configured to use a pre-trained model to predict the scale information corresponding to each target image in the captured image and the offset information corresponding to each target image; and a control module 3300 configured to confirm the control offset information of the camera according to the scale information and the offset information.
- The control module 3300 can be replaced with a weighting control module configured to perform a weighted calculation on the product of the scale information and the offset information corresponding to all target images to obtain the control offset information of the camera. The weighting control module can in turn be replaced with a power-processing control module configured to perform a weighted calculation on the product of the offset information corresponding to all target images and the power-processed scale information to obtain the control offset information of the camera.
- The acquisition module 3100 can be replaced with a person acquisition module configured to acquire the captured image of the camera in real time, where the captured image includes at least one target image and the target image is a preselected person image in the captured image or all person images in the captured image.
- The calculation module 3200 may further include a scale calculation module and an offset calculation module. The scale calculation module is configured to use a pre-trained scale model to predict the scale information corresponding to each target image in the captured image; the offset calculation module is configured to use a pre-trained offset model to predict the offset information corresponding to each target image in the captured image.
- The offset calculation module includes: an offset acquisition unit configured to acquire training images and corresponding label data from the preset image data set, the label data including the bounding box information and key point information of the tracking target in the training images;
- a center point obtaining unit configured to obtain the reference position of the bounding box center point according to the bounding box information and key point information of the tracking target;
- a reference position obtaining unit configured to obtain the reference position image corresponding to the training image based on the reference position of the bounding box center point;
- a convolutional neural network calculation unit configured to use the deep convolutional neural network to predict the reference position of the training image to obtain the prediction result image;
- a loss value calculation unit configured to calculate the first loss value from the reference position image and the prediction result image, and to adjust the parameters of the deep convolutional neural network according to the first loss value;
- a convolutional neural network training unit configured to perform steps 2210-2250 sequentially on multiple training images in the image data set until the first loss value in step 2250 no longer decreases, ending the training of the deep convolutional neural network to obtain the pre-trained offset model.
- The center point obtaining unit includes: a grid table generating subunit configured to divide the training image into W*H grids, where W and H are natural numbers greater than 1, to generate a grid table;
- a loss value acquisition subunit configured to obtain the second loss value when the center of the bounding box is placed at different grid centers;
- a reference position acquisition subunit configured to select the center position of the grid with the smallest second loss value as the reference position of the bounding box center point.
- The scale calculation module includes: a Gaussian response map unit configured to obtain the Gaussian response map of the training sample image;
- a scale response map unit configured to use the deep convolutional neural network to process the training sample image to obtain the scale response map of the training sample image;
- a Euclidean distance loss unit configured to perform the Euclidean distance loss calculation on the Gaussian response map and the scale response map, and to adjust the parameters of the deep convolutional neural network according to the calculation result;
- a scale model obtaining unit configured to perform steps 2310-2330 sequentially on multiple training sample images until the calculated Euclidean distance loss no longer decreases, ending the training of the deep convolutional neural network to obtain the pre-trained scale model.
- By providing a follow-shot device, the technical solution of this embodiment solves the lack in the related art of a follow-shot method for multiple targets or an entire group of objects, and achieves the effect that the user can follow-shoot multiple targets or an entire group of objects in a given scene.
- FIG. 7 is a schematic structural diagram of a computer device provided in Embodiment 6 of the present application.
- The computer device includes a memory 4100 and a processor 4200. The number of processors 4200 in the computer device may be one or more; one processor 4200 is taken as an example. The memory 4100 and the processor 4200 in the device may be connected by a bus or by other means; connection by a bus is taken as an example in FIG. 7.
- The memory 4100 can be configured to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the follow-shot method in the embodiments of this application (for example, the acquisition module 3100, calculation module 3200, and control module 3300 in the follow-shot device).
- The processor 4200 executes the functional applications and data processing of the device/terminal by running the software programs, instructions, and modules stored in the memory 4100, that is, it implements the follow-shot method described above.
- The processor 4200 is configured to run a computer program stored in the memory 4100 to implement the following steps: acquire a captured image of the camera in real time, the captured image including at least one target image; use a pre-trained model to predict the scale information corresponding to each target image in the captured image and the offset information corresponding to each target image; and confirm the control offset information of the camera according to the scale information and the offset information.
- The computer program of the computer device provided in this embodiment is not limited to the above method operations and can also perform related operations in the follow-shot method provided in any embodiment of this application.
- The memory 4100 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and the application program required by at least one function, and the data storage area may store data created according to the use of the terminal, and the like. The memory 4100 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device.
- The memory 4100 may further include memory remotely located relative to the processor 4200, and such remote memory may be connected to the device/terminal through a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks, and combinations thereof.
- Embodiment 7 of the present application also provides a storage medium containing computer-executable instructions, on which a computer program is stored. The computer program includes program instructions which, when executed by a processor, implement a follow-shot method including: acquiring a captured image of the camera in real time, the captured image including at least one target image; using a pre-trained model to predict the scale information corresponding to each target image in the captured image and the offset information corresponding to each target image; and confirming the control offset information of the camera according to the scale information and the offset information.
- In the storage medium containing computer-executable instructions provided by this embodiment, the computer-executable instructions are not limited to the above method operations and can also perform related operations in the follow-shot method provided by any embodiment of this application.
- This application can be implemented by software and general-purpose hardware, or by hardware alone. The technical solution of this application can be embodied in the form of a software product: the computer software product can be stored in a computer-readable storage medium, such as a floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk, or optical disk, and includes instructions that make a computer device (which may be a personal computer, a device, or a network device, etc.) execute the method described in any embodiment of this application.
- The units and modules included in the device embodiment are divided only according to functional logic but are not limited to that division, as long as the corresponding functions can be realized; in addition, the names of the functional units are only for ease of distinguishing them from each other and are not used to limit the protection scope of this application.
Claims (11)
- 1. A follow-shot method, comprising: acquiring a captured image of a camera in real time, the captured image including at least one target image; using a pre-trained model to predict scale information corresponding to each target image in the captured image and offset information corresponding to each target image; and confirming control offset information of the camera according to the scale information and the offset information.
- 2. The method according to claim 1, wherein confirming the control offset information of the camera according to the scale information and the offset information comprises: performing a weighted calculation on the product of the scale information and the offset information corresponding to all target images to obtain the control offset information of the camera.
- 3. The method according to claim 2, wherein performing the weighted calculation on the scale information and offset information corresponding to each target image to obtain the control offset information of the camera comprises: performing a weighted calculation on the product of the offset information corresponding to all target images and the power-processed scale information to obtain the control offset information of the camera.
- 4. The method according to any one of claims 1-3, wherein the target image is a preselected person image in the captured image or all person images in the captured image.
- 5. The method according to any one of claims 1-4, wherein the pre-trained model comprises a pre-trained scale model and a pre-trained offset model, and using the pre-trained model to predict the scale information corresponding to each target image in the captured image and the offset information corresponding to each target image comprises: using the pre-trained scale model to predict the scale information corresponding to each target image in the captured image; and using the pre-trained offset model to predict the offset information corresponding to each target image in the captured image.
- 6. The method according to claim 5, wherein the training process of the offset model comprises: obtaining training images and corresponding label data from a preset image data set, the label data including bounding box information and key point information of a tracking target in the training images; obtaining a reference position of a bounding box center point according to the bounding box information and key point information of the tracking target; obtaining a reference position image corresponding to the training image based on the reference position of the bounding box center point; predicting the reference position of the training image with a deep convolutional neural network to obtain a prediction result image; calculating a first loss value according to the reference position image and the prediction result image, and adjusting parameters of the deep convolutional neural network according to the first loss value; and performing the above steps sequentially on multiple training images in the image data set until the first loss value no longer decreases, then ending the training of the deep convolutional neural network to obtain the pre-trained offset model.
- 7. The method according to claim 6, wherein obtaining the reference position of the bounding box center point according to the bounding box information and key point information of the tracking target comprises: dividing the training image into W*H grids to generate a grid table, W and H being natural numbers greater than 1; obtaining a second loss value when the bounding box center is placed at different grid centers; and selecting the center position of the grid with the smallest second loss value as the reference position of the bounding box center point.
- 8. The method according to claim 5, wherein the training process of the scale model comprises: obtaining a Gaussian response map of a training sample image; processing the training sample image with a deep convolutional neural network to obtain a scale response map of the training sample image; performing a Euclidean distance loss calculation on the Gaussian response map and the scale response map, and adjusting parameters of the deep convolutional neural network according to the calculation result; and performing the above steps sequentially on multiple training sample images until the calculated Euclidean distance loss no longer decreases, then ending the training of the deep convolutional neural network to obtain the pre-trained scale model.
- 9. A follow-shot device, comprising: an acquisition module configured to acquire a captured image of a camera in real time, the captured image including at least one target image; a calculation module configured to use a pre-trained model to predict scale information corresponding to each target image in the captured image and offset information corresponding to each target image; and a control module configured to confirm control offset information of the camera according to the scale information and the offset information.
- 10. A device, comprising: at least one processor; and a memory configured to store at least one program, wherein when the at least one program is executed by the at least one processor, the at least one processor implements the follow-shot method according to any one of claims 1-8.
- 11. A computer-readable storage medium on which a computer program is stored, the computer program including program instructions which, when executed by a processor, implement the follow-shot method according to any one of claims 1-8.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910505922.0 | 2019-06-12 | | |
CN201910505922.0A (granted as CN110232706B) | 2019-06-12 | 2019-06-12 | Multi-person follow-shot method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2020248395A1 | 2020-12-17 |
Family
ID=67859704
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2019/103654 (WO2020248395A1) | Follow-shot method, device, equipment and storage medium | 2019-06-12 | 2019-08-30 |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110232706B |
WO | WO2020248395A1 |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111104925B | 2019-12-30 | 2022-03-11 | 上海商汤临港智能科技有限公司 | Image processing method, device, storage medium and electronic device |
CN111462194B | 2020-03-30 | 2023-08-11 | 苏州科达科技股份有限公司 | Training method and device for an object tracking model, and storage medium |
CN112084876B | 2020-08-13 | 2024-05-03 | 宜通世纪科技股份有限公司 | Target object tracking method, system, device and medium |
CN112788426A | 2020-12-30 | 2021-05-11 | 北京安博盛赢教育科技有限责任公司 | Display method, device, medium and electronic device for a function display area |
CN114554086B | 2022-02-10 | 2024-06-25 | 支付宝(杭州)信息技术有限公司 | Auxiliary shooting method, device and electronic device |
WO2024055957A1 | 2022-09-16 | 2024-03-21 | 维沃移动通信有限公司 | Method and device for adjusting shooting parameters, electronic device and readable storage medium |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101888479B | 2009-05-14 | 2012-05-02 | 汉王科技股份有限公司 | Method and device for detecting and tracking target images |
JP6273685B2 | 2013-03-27 | 2018-02-07 | パナソニックIpマネジメント株式会社 | Tracking processing device, tracking processing system including the same, and tracking processing method |
WO2015083199A1 | 2013-12-04 | 2015-06-11 | J Tech Solutions, Inc. | Computer device and method executed by the computer device |
CN104346811B | 2014-09-30 | 2017-08-22 | 深圳市华尊科技股份有限公司 | Real-time target tracking method and device based on video images |
CN108986169A | 2018-07-06 | 2018-12-11 | 北京字节跳动网络技术有限公司 | Method and device for processing images |
CN109522896A | 2018-11-19 | 2019-03-26 | 武汉科技大学 | Instrument searching method based on template matching and a dual-degree-of-freedom pan-tilt camera |
Application timeline:
- 2019-06-12: CN application CN201910505922.0A filed; granted as CN110232706B (active).
- 2019-08-30: PCT application PCT/CN2019/103654 filed (published as WO2020248395A1).
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20050190972A1 | 2004-02-11 | 2005-09-01 | Thomas Graham A. | System and method for position determination |
CN102867311A | 2011-07-07 | 2013-01-09 | 株式会社理光 | Target tracking method and target tracking device |
CN107749952A | 2017-11-09 | 2018-03-02 | 睿魔智能科技(东莞)有限公司 | Intelligent unmanned photography method and system based on deep learning |
CN109803090A | 2019-01-25 | 2019-05-24 | 睿魔智能科技(深圳)有限公司 | Automatic zoom method and system for unmanned shooting, unmanned camera and storage medium |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112633355A | 2020-12-18 | 2021-04-09 | 北京迈格威科技有限公司 | Image data processing method and device, and target detection model training method and device |
CN115665553A | 2022-09-29 | 2023-01-31 | 深圳市旗扬特种装备技术工程有限公司 | Automatic tracking method and device for an unmanned aerial vehicle, electronic device and storage medium |
CN115665553B | 2022-09-29 | 2023-06-13 | 深圳市旗扬特种装备技术工程有限公司 | Automatic tracking method and device for an unmanned aerial vehicle, electronic device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110232706B | 2022-07-29 |
CN110232706A | 2019-09-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19932919; Country of ref document: EP; Kind code of ref document: A1 |
 | NENP | Non-entry into the national phase | Ref country code: DE |
 | 122 | Ep: pct application non-entry in european phase | Ref document number: 19932919; Country of ref document: EP; Kind code of ref document: A1 |
 | 32PN | Ep: public notification in the ep bulletin as address of the addressee cannot be established | Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 17.05.2022) |