CN117615255B - Shooting tracking method, device, equipment and storage medium based on cradle head

Shooting tracking method, device, equipment and storage medium based on cradle head

Info

Publication number
CN117615255B
Authority
CN
China
Prior art keywords
frame image
target
map
foreground
cradle head
Prior art date
Legal status
Active
Application number
CN202410077917.5A
Other languages
Chinese (zh)
Other versions
CN117615255A (en)
Inventor
杨斯康
林菁
Current Assignee
Hohem Technology Co ltd
Original Assignee
Hohem Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Hohem Technology Co ltd filed Critical Hohem Technology Co ltd
Priority to CN202410077917.5A
Publication of CN117615255A
Application granted
Publication of CN117615255B
Legal status: Active


Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/695Control of camera direction for changing a field of view, e.g. pan, tilt or based on tracking of objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/70Determining position or orientation of objects or cameras
    • G06T7/73Determining position or orientation of objects or cameras using feature-based methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/50Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • G06V10/507Summing image-intensity values; Histogram projection analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/56Extraction of image or video features relating to colour
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/61Control of cameras or camera modules based on recognised objects
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/63Control of cameras or camera modules by using electronic viewfinders
    • H04N23/631Graphical user interfaces [GUI] specially adapted for controlling image capture or setting capture parameters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10016Video; Image sequence
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Signal Processing (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a shooting tracking method, device, equipment and storage medium based on a cradle head. The method comprises the following steps: in response to a clicking operation on the display picture of the shooting equipment, acquiring from the video stream data the frame image captured at the moment of the clicking operation as an initial frame image; taking the initial frame image as the target frame image and the next frame image of the initial frame image as the current frame image, and generating a heat distribution map and a foreground frame selection map between the target frame image and the current frame image; performing image fusion on the heat distribution map and the foreground frame selection map to generate a fusion heat map, determining the tracking position of the target object in the current frame image according to the fusion heat map, and controlling the cradle head to change its pose according to the continuously acquired tracking positions. By combining the heat distribution map and the foreground frame selection map through image fusion, the method can accurately determine the tracking position of the target object, so that the pose of the cradle head can be controlled precisely and the shooting equipment can be kept better aligned with the target object.

Description

Shooting tracking method, device, equipment and storage medium based on cradle head
Technical Field
The present invention relates to the field of target tracking, and in particular to a shooting tracking method, apparatus, device, and storage medium based on a cradle head (pan-tilt).
Background
Cradle-head-based target tracking is a technology that tracks a moving target using an onboard camera and a cradle head control system. Its main idea is to adjust the direction and angle of the onboard camera through the cradle head control system so that the shooting equipment stays aimed at the target object and tracks the target's position and motion state in real time. The technique generally involves three main processes: target detection, target tracking and cradle head control. Target detection locates the initial position of the target in the video sequence; a target tracking algorithm then tracks the target in real time; and finally the cradle head control system steers the direction and angle of the onboard camera to keep the target stably in view. However, this technique still has some drawbacks. Existing methods typically rely on convolution features, which are widely used in image processing because they capture high-level semantic information in the image, such as the shape, contour, and structure of objects. These features are very useful for identifying and classifying objects; however, convolution features are not always sensitive to detailed information such as color and texture. Color and texture are important visual features in an image and play a key role in distinguishing similar objects or backgrounds. Although convolutional neural networks can learn color and texture features to some extent, they typically focus on extracting abstract semantic information and ignore such details, resulting in poor discrimination between the tracking target and similar interferents or background objects whose convolution features resemble those of the target.
Disclosure of Invention
The main purpose of the invention is to solve the technical problem that existing cradle-head-based object tracking has poor capability of distinguishing the tracking target from similar interferents or background objects whose convolution features resemble those of the target.
The first aspect of the invention provides a shooting tracking method based on a cradle head, wherein shooting equipment is carried on the cradle head; the shooting tracking method based on the cradle head comprises the following steps:
Responding to a clicking operation aiming at a display picture on the shooting equipment, acquiring video stream data transmitted by the shooting equipment in real time, and acquiring a frame image at the same moment as the clicking operation from the video stream data as an initial frame image;
Respectively taking the initial frame image as a target frame image, taking the next frame image of the initial frame image as a current frame image, and generating a heat distribution diagram and a foreground frame selection diagram between the target frame image and the current frame image;
Performing image fusion on the heat distribution map and the foreground frame selection map to generate a fusion heat map, and determining the tracking position of the target object in the current frame image according to the fusion heat map;
According to the tracking position of the target object in the current frame image, carrying out first pose adjustment on the cradle head so that the shooting equipment aims at the target object;
Updating the current frame image to be a target frame image respectively, updating the next frame image of the current frame image to be the current frame image, and returning to the step of generating a heat distribution diagram and a foreground frame selection diagram between the target frame image and the current frame image until a preset stop condition is met, so that the shooting equipment continuously aims at the target object before the preset stop condition is met.
Optionally, in a first implementation manner of the first aspect of the present invention, the generating the heat distribution map and the foreground frame selection map between the target frame image and the current frame image by using the initial frame image as the target frame image and using a next frame image of the initial frame image as the current frame image includes:
respectively taking the initial frame image as a target frame image and taking the next frame image of the initial frame image as a current frame image;
Object detection is carried out on the target frame image according to the position of the clicking operation, and the position information and the size information of the target object in the target frame image are determined;
Generating a target area corresponding to the target frame image and a search area corresponding to the current frame image based on the position information and the size information;
Performing convolution feature extraction on the target area and the search area to generate a heat distribution diagram between the target frame image and the current frame image;
And extracting color features of the target area and the search area, and generating a foreground frame selection image between the target frame image and the current frame image.
Optionally, in a second implementation manner of the first aspect of the present invention, the performing convolution feature extraction on the target area and the search area, and generating a heat distribution map between the target frame image and the current frame image includes:
inputting the target area and the search area into a preset twin neural network, wherein the twin neural network comprises a first branch and a second branch;
Performing convolution feature extraction on the target area and the search area through the first branch and the second branch respectively to obtain a first feature map and a second feature map;
respectively carrying out channel dimension transformation on the first feature map and the second feature map to obtain a first feature map vector and a second feature map vector under each channel dimension;
And performing element-by-element multiplication operation on the first feature vector and the second feature vector in each channel dimension to obtain multiplication results in each channel dimension, and performing summation operation on each multiplication result in the channel dimension to obtain a heat distribution diagram between the target frame image and the current frame image.
Optionally, in a third implementation manner of the first aspect of the present invention, the performing color feature extraction on the target area and the search area, and generating a foreground frame selection map between the target frame image and the current frame image includes:
Dividing each pixel in the target area and the search area into corresponding color intervals according to corresponding color values to obtain corresponding first color histograms and second color histograms;
calculating a probability that each pixel in the search area is a foreground pixel based on the first color histogram and the second color histogram;
and identifying the pixels with the probability larger than a preset probability threshold as foreground pixels, and generating a foreground frame selection map according to all the foreground pixels.
Optionally, in a fourth implementation manner of the first aspect of the present invention, the performing image fusion on the heat distribution map and the foreground frame selection map to generate a fusion heat map, and determining a tracking position of the target object in the current frame image according to the fusion heat map includes:
Performing size cutting on the heat distribution map or the foreground frame selection map according to the sizes of the heat distribution map and the foreground frame selection map so that the sizes of the heat distribution map and the foreground frame selection map are consistent;
Determining a gain section corresponding to each pixel according to the color value of each pixel in the cut foreground frame selection diagram;
Performing color gain on the color values of the corresponding pixels in the fusion heat map according to the gain parameters corresponding to the gain intervals, and generating the fusion heat map according to the color values after the color gain;
And determining the tracking position of the target object in the current frame image according to the fusion heat map.
Optionally, in a fifth implementation manner of the first aspect of the present invention, after performing, according to the tracking position of the target object in the current frame image, a first pose adjustment on the pan-tilt, so that the photographing device aligns with the target object, the method further includes:
When the cradle head carries out second pose adjustment, calculating the total pose adjustment quantity of the cradle head, and obtaining a first adjustment quantity of the first pose adjustment;
calculating a second adjustment amount of the cradle head for performing second pose adjustment according to the total pose adjustment amount and the first adjustment amount;
And taking the second adjustment amount as an influence parameter, and, when a search area is generated for the next frame image after the second pose adjustment, adjusting the generated search area according to the influence parameter.
Optionally, in a sixth implementation manner of the first aspect of the present invention, the pan-tilt includes a gyroscope and an accelerometer; when the cradle head performs the second pose adjustment, calculating the total pose adjustment amount of the cradle head and obtaining the first adjustment amount of the first pose adjustment includes:
When the cradle head carries out second pose adjustment, acquiring the rotation speeds of the cradle head in each axial direction in a machine body coordinate axis through the gyroscope, and acquiring the current gravity direction information of the cradle head through an accelerometer;
Calculating the rotation angles of the cradle head in each axial direction in the machine body coordinate axis according to the rotation speed and the gravity direction information;
And taking the rotation angles of the axial directions as the total pose adjustment amount of the cradle head, and obtaining the first adjustment amount of the first pose adjustment.
The second aspect of the present invention provides a camera tracking device based on a pan-tilt, on which a camera is mounted, the camera tracking device based on a pan-tilt comprising:
The response module is used for responding to the clicking operation aiming at the display picture on the shooting equipment, acquiring video stream data transmitted by the shooting equipment in real time, and acquiring a frame image at the same moment as the clicking operation from the video stream data as an initial frame image;
The image generation module is used for respectively taking the initial frame image as a target frame image, taking the next frame image of the initial frame image as a current frame image, and generating a heat distribution diagram and a foreground frame selection diagram between the target frame image and the current frame image;
the image fusion module is used for carrying out image fusion on the heat distribution diagram and the foreground frame selection diagram to generate a fusion heat diagram, and determining the tracking position of the target object in the current frame image according to the fusion heat diagram;
The pose adjustment module is used for performing first pose adjustment on the cradle head according to the tracking position of the target object in the current frame image so that the shooting equipment is aligned to the target object;
and the circulation module is used for respectively updating the current frame image into a target frame image, updating the next frame image of the current frame image into the current frame image, and returning to the step of generating a heat distribution diagram and a foreground frame selection diagram between the target frame image and the current frame image until a preset stop condition is met, so that the shooting equipment continuously aims at the target object before the preset stop condition is met.
The third aspect of the present invention provides a pan-tilt-based photographing tracking device, comprising: a memory and at least one processor, the memory having instructions stored therein, the memory and the at least one processor being interconnected by a line; the at least one processor invokes the instructions in the memory to cause the pan-tilt-based capture tracking device to perform the steps of the pan-tilt-based capture tracking method described above.
A fourth aspect of the present invention provides a computer-readable storage medium having instructions stored therein, which when run on a computer, cause the computer to perform the steps of the pan-tilt-based shooting tracking method described above.
According to the shooting tracking method, the shooting tracking device, the shooting tracking equipment and the storage medium based on the cloud deck, the frame image at the same moment with the point selection operation is obtained from the video stream data as the initial frame image by responding to the point selection operation aiming at the display picture on the shooting equipment; respectively taking an initial frame image as a target frame image, taking the next frame image of the initial frame image as a current frame image, and generating a heat distribution diagram and a foreground frame selection diagram between the target frame image and the current frame image; and carrying out image fusion on the heat distribution map and the foreground frame selection map to generate a fusion heat map, determining the tracking position of the target object in the current frame image according to the fusion heat map, and controlling the cradle head to change the pose according to the continuously acquired tracking position. The method adopts the image fusion technology and combines the heat distribution diagram and the foreground frame selection diagram, so that the tracking position of the target object can be accurately determined, the posture of the cradle head can be further accurately controlled, and the shooting equipment can be better aligned to the target object.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
In order to make the above objects, features and advantages of the present invention more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
Fig. 1 is a schematic diagram of an embodiment of a pan-tilt-based shooting tracking method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an embodiment of a pan-tilt-based camera tracking device according to an embodiment of the present invention;
fig. 3 is a schematic diagram of another embodiment of a pan-tilt-based photographing tracking device according to an embodiment of the invention;
fig. 4 is a schematic diagram of an embodiment of a pan-tilt-based photographing tracking apparatus according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The terms "comprising" and "having" and any variations thereof, as used in the embodiments of the present invention, are intended to cover non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed but may optionally include other steps or elements not listed or inherent to such process, method, article, or apparatus.
For the sake of understanding the present embodiment, first, a detailed description is given of a pan-tilt-based shooting tracking method disclosed in the embodiments of the present invention. The shooting equipment is mounted on the cradle head, as shown in fig. 1, the shooting tracking method based on the cradle head comprises the following steps:
101. Responding to the clicking operation aiming at the display picture on the shooting equipment, acquiring video stream data transmitted by the shooting equipment in real time, and acquiring a frame image at the same moment as the clicking operation from the video stream data as an initial frame image;
In this embodiment, the photographing apparatus is generally an apparatus for capturing still or moving images, such as a digital camera, a cellular phone camera, or a video camera. These devices are often equipped with a display screen for displaying the photographed picture in real time. On the display screen of the photographing apparatus, the real-time image captured through the lens can be seen. These images may be static scenes or moving objects or characters. The user can perform a clicking operation by clicking or selecting a specific area on the display screen of the photographing apparatus. For example, a user may select a particular region by tapping on an object of interest or drawing a box around it on a touch screen. The photographing apparatus generates video stream data through its built-in sensor and processor. The video stream data is made up of a series of successive image frames, each of which contains the image captured by the camera device at a certain instant. The video stream data may be transmitted by different transmission means. For wireless devices, such as cell phone cameras, video stream data may be transmitted over Wi-Fi or a mobile data network. For wired devices, such as digital cameras or video cameras, the video stream data can be transmitted to the connected device through a USB interface or an HDMI interface; the manner of transmitting the video stream data depends on the specific device and application scenario.
102. Respectively taking an initial frame image as a target frame image, taking the next frame image of the initial frame image as a current frame image, and generating a heat distribution diagram and a foreground frame selection diagram between the target frame image and the current frame image;
In one embodiment of the present invention, the generating the heat distribution map and the foreground frame map between the target frame image and the current frame image by using the initial frame image as the target frame image and using the next frame image of the initial frame image as the current frame image includes: respectively taking the initial frame image as a target frame image and taking the next frame image of the initial frame image as a current frame image; object detection is carried out on the target frame image according to the position of the clicking operation, and the position information and the size information of the target object in the target frame image are determined; generating a target area corresponding to the target frame image and a search area corresponding to the current frame image based on the position information and the size information; performing convolution feature extraction on the target area and the search area to generate a heat distribution diagram between the target frame image and the current frame image; and extracting color features of the target area and the search area, and generating a foreground frame selection image between the target frame image and the current frame image.
In particular, in video processing, object tracking is a common task that requires tracking the motion of a particular object across successive video frames. To achieve this goal, computer vision algorithms are required for detecting and tracking the object. In this process, it is first necessary to set a target frame image containing the target object to be tracked, and the initial frame image is set as the target frame image. The next frame of the initial frame image is also required, as the current frame image, for object detection and tracking. Next, object detection is performed on the target object in the target frame image using a computer vision algorithm to determine its position information and size information. Common object detection algorithms include deep learning-based detectors such as R-CNN, Fast R-CNN, and YOLO.
Specifically, based on the position information and the size information in a given target frame image, a target area and a search area in the current frame image may be generated. These regions define specific regions of interest in subsequent processing. The method of generating the target area generally involves centering on the position of the target object and calculating a bounding box of the target area from given size information. For example, the center coordinates of the target object and the width and height information may be used to calculate the upper left and lower right corner coordinates of the bounding box. In this way, a rectangular frame surrounding the target object, i.e. the target area, is obtained. As for the search area, a larger area is generally set in the current frame image centering on the target area. The size of the search area can be adjusted according to actual requirements and algorithm performance. By centering the target area as the search area, contextual information about the target object may be captured and more comprehensive features provided for subsequent processing and analysis.
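As a concrete illustration of the region generation described above, the following Python sketch derives a target box and a larger, concentric search box from the clicked object's center and size; the margin factor `search_scale` and the clipping behavior are assumptions for illustration, not parameters specified by the patent.

```python
def make_regions(cx, cy, w, h, frame_shape, search_scale=2.0):
    """Return (target_box, search_box) as (x1, y1, x2, y2) tuples in pixel coordinates."""
    H, W = frame_shape[:2]

    def clip_box(x1, y1, x2, y2):
        # Keep the box inside the frame.
        return (max(0, int(x1)), max(0, int(y1)), min(W, int(x2)), min(H, int(y2)))

    # Target area: bounding box centered on the detected object.
    target_box = clip_box(cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)

    # Search area: a larger window around the same center in the current frame.
    sw, sh = w * search_scale, h * search_scale
    search_box = clip_box(cx - sw / 2, cy - sh / 2, cx + sw / 2, cy + sh / 2)
    return target_box, search_box
```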
Further, the performing convolution feature extraction on the target area and the search area, and generating a heat distribution map between the target frame image and the current frame image includes: inputting the target area and the search area into a preset twin neural network, wherein the twin neural network comprises a first branch and a second branch; performing convolution feature extraction on the target area and the search area through the first branch and the second branch respectively to obtain a first feature map and a second feature map; respectively carrying out channel dimension transformation on the first feature map and the second feature map to obtain a first feature map vector and a second feature map vector under each channel dimension; and performing element-by-element multiplication operation on the first feature vector and the second feature vector in each channel dimension to obtain multiplication results in each channel dimension, and performing summation operation on each multiplication result in the channel dimension to obtain a heat distribution diagram between the target frame image and the current frame image.
In particular, convolutional feature extraction is a commonly used operation in deep learning, which can extract representative features from input data. In the twin neural network, the first branch and the second branch have the same structure for processing the target region and the search region. In particular, convolutional feature extraction is typically composed of multiple convolutional layers, activation functions, and pooling layers. In each branch, the input target region and search region are first feature extracted using a convolution layer. The convolution layer carries out convolution operation on the input data and the convolution kernel by sliding the convolution kernel on the local receptive field, so as to obtain a convolution characteristic diagram. Next, an activation function is applied to the convolution feature map to introduce nonlinearities. Common activation functions include ReLU (RECTIFIED LINEAR Unit), leaky ReLU, and the like. The activation function can increase the expressive power of the network and improve the nonlinear expressive power of the feature. During convolutional feature extraction, it is also possible to downsample the feature map using a pooling layer to reduce the size of the feature map and preserve important features. Common pooling operations include maximum pooling and average pooling, which can reduce the dimension of the feature map by selecting a maximum or calculating an average value within the local receptive field. By performing the same convolution feature extraction operation on the first branch and the second branch, a first feature map and a second feature map of the target region and the search region can be obtained.
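The following PyTorch sketch shows one possible shared-weight branch of such a twin network, with the convolution, ReLU activation and max-pooling layers mentioned above; the layer sizes and input resolutions are assumptions, not the patent's actual architecture.

```python
import torch
import torch.nn as nn

class Branch(nn.Module):
    """One branch of the twin network; the same instance (same weights) serves both inputs."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, padding=1),   # convolution over local receptive fields
            nn.ReLU(inplace=True),                         # non-linear activation
            nn.MaxPool2d(2),                               # downsample while keeping salient responses
            nn.Conv2d(64, 128, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.features(x)

branch = Branch()
target_feat = branch(torch.randn(1, 3, 127, 127))   # first feature map (target area)
search_feat = branch(torch.randn(1, 3, 127, 127))   # second feature map (search area)
```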
Specifically, in deep learning, a convolutional neural network (CNN) is generally used to extract features of an image. Each convolution layer in the CNN outputs a three-dimensional tensor as a feature map, with the third dimension representing the different feature channels. If the feature map is to be converted into vectors, it needs to be transformed along the channel dimension. Specifically, the feature map can be regarded as a three-dimensional tensor of shape (H, W, C), where H and W represent the height and width of the feature map, respectively, and C represents the number of feature channels. The tensor of shape (H, W, C) can then be unfolded along its spatial dimensions into a two-dimensional matrix of shape (H×W, C), i.e. the feature vectors corresponding to each pixel position are stacked to obtain the feature map vectors. For two feature maps that have been converted into this vector form, an element-wise multiplication operation can be performed. Element-wise multiplication multiplies the corresponding elements of two vectors to obtain a new vector whose length equals that of the original vectors. For example, for two column vectors a and b of shape (N, 1), the element-wise multiplication result can be expressed as c = a × b, where c is also a column vector of shape (N, 1). After the element-wise multiplication, a multiplication result is obtained in each channel dimension. These results are summed over the channel dimension to obtain the heat distribution map between the target frame image and the current frame image. Specifically, the multiplication results over all channels can be regarded as a two-dimensional matrix of shape (H×W, C); summing this matrix over its second dimension yields a column vector of shape (H×W, 1), which is then restored to the same spatial shape as the feature map, namely (H, W). This can be accomplished with a reshaping operation (reshape): the heat distribution map is obtained by reshaping the column vector of shape (H×W, 1) into a matrix of shape (H, W).
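A NumPy sketch of the operation just described, assuming the two feature maps have already been brought to the same spatial size (H, W, C): flatten each to (H×W, C) vectors per channel, multiply element-wise, sum over the channel dimension, and reshape back to (H, W).

```python
import numpy as np

def heat_distribution_map(feat_a: np.ndarray, feat_b: np.ndarray) -> np.ndarray:
    """feat_a, feat_b: feature maps of identical shape (H, W, C)."""
    H, W, C = feat_a.shape
    vec_a = feat_a.reshape(H * W, C)          # first feature-map vectors, one row per pixel
    vec_b = feat_b.reshape(H * W, C)          # second feature-map vectors
    prod = vec_a * vec_b                      # element-wise multiplication, shape (H*W, C)
    heat = prod.sum(axis=1)                   # summation over the channel dimension -> (H*W,)
    return heat.reshape(H, W)                 # remodel (reshape) back to the spatial shape
```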
Further, the performing color feature extraction on the target area and the search area, and generating a foreground frame selection map between the target frame image and the current frame image includes: dividing each pixel in the target area and the search area into corresponding color intervals according to corresponding color values to obtain corresponding first color histograms and second color histograms; calculating a probability that each pixel in the search area is a foreground pixel based on the first color histogram and the second color histogram; and identifying the pixels with the probability larger than a preset probability threshold as foreground pixels, and generating a foreground frame selection map according to all the foreground pixels.
Specifically, the division of the color intervals may employ various methods, such as dividing the values of the three channels in the RGB color space into a certain number of intervals, or using other color spaces (such as HSV, Lab, etc.). After division, the number of pixels in each color interval is counted to obtain the first color histogram and the second color histogram. The probability that each pixel belongs to a foreground pixel may be calculated using a probabilistic method, such as a Bayesian classifier or a conditional probability density function, in combination with the information of the color histograms. This may involve normalization of the color histograms and calculation of the probability density functions; finally, by setting a preset probability threshold, pixels with probabilities greater than the threshold are screened out and identified as foreground pixels. A foreground frame selection map is generated according to the position information of all pixels identified as foreground; the foreground region can be simply marked on the original image with a rectangular frame, or take the form of a finer foreground mask image, etc. When calculating the foreground pixel probability, suppose an RGB color space is used and divided into 5 color intervals, resulting in a first color histogram hist1 = [h1, h2, h3, h4, h5] and a second color histogram hist2 = [h1', h2', h3', h4', h5']. Next, a Bayesian classifier is used to calculate the probability P(foreground | x, y) that each pixel belongs to a foreground pixel, where x and y represent the abscissa and ordinate of the pixel, respectively.
First, the color histograms need to be normalized, that is:

$\hat{h}_i = h_i / \sum_j h_j$ and $\hat{h}'_i = h'_i / \sum_j h'_j$,

where $\hat{h}_i$ and $\hat{h}'_i$ represent the probability of the i-th color interval in the first color histogram and in the second color histogram, respectively. Based on $\hat{h}_i$ and $\hat{h}'_i$, the conditional probability density functions P(x, y | foreground) and P(x, y | background) can be calculated, representing the probability of the pixel (x, y) occurring given the foreground and given the background, respectively. This may be achieved by modeling on a training data set, which is not described in detail here.
Finally, the probability that each pixel belongs to a foreground pixel may be calculated according to Bayes' theorem:

$P(\text{foreground} \mid x, y) = \dfrac{P(x, y \mid \text{foreground})\, P(\text{foreground})}{P(x, y \mid \text{foreground})\, P(\text{foreground}) + P(x, y \mid \text{background})\, P(\text{background})}$

where P(foreground) and P(background) represent the prior probabilities of the foreground and the background, respectively, and can be estimated from the ratio of foreground to background pixel counts in the training data set. According to the above formula, the probability P(foreground | x, y) of belonging to a foreground pixel can be calculated for each pixel (x, y) in the search area, and pixels whose probability exceeds the preset probability threshold are then identified as foreground pixels.
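The following sketch puts the histogram normalization and the Bayes rule above into code. The number of bins, the joint RGB binning and the prior estimates are illustrative assumptions, not values fixed by the patent.

```python
import numpy as np

def rgb_histogram(pixels: np.ndarray, bins: int = 5) -> np.ndarray:
    """pixels: (N, 3) uint8 RGB values -> normalized joint histogram of shape (bins, bins, bins)."""
    idx = (pixels.astype(np.int64) * bins) // 256          # map 0..255 to a bin index
    hist = np.zeros((bins, bins, bins), dtype=np.float64)
    np.add.at(hist, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    return hist / max(hist.sum(), 1.0)                     # normalization step

def foreground_probability(px, fg_hist, bg_hist, p_fg=0.5):
    """Bayes rule: P(foreground | color) for one RGB pixel px, given normalized histograms."""
    bins = fg_hist.shape[0]
    i, j, k = (int(c) * bins // 256 for c in px)
    like_fg, like_bg = fg_hist[i, j, k], bg_hist[i, j, k]  # P(x,y | foreground), P(x,y | background)
    denom = like_fg * p_fg + like_bg * (1.0 - p_fg)
    return like_fg * p_fg / denom if denom > 0 else 0.0
```

Pixels whose probability exceeds the preset threshold would then be marked as foreground and combined into the foreground frame selection map.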
103. Performing image fusion on the heat distribution map and the foreground frame selection map to generate a fusion heat map, and determining the tracking position of the target object in the current frame image according to the fusion heat map;
In one embodiment of the present invention, the image fusing the heat distribution map and the foreground frame selection map to generate a fused heat map, and determining the tracking position of the target object in the current frame image according to the fused heat map includes: performing size cutting on the heat distribution map or the foreground frame selection map according to the sizes of the heat distribution map and the foreground frame selection map so that the sizes of the heat distribution map and the foreground frame selection map are consistent; determining a gain section corresponding to each pixel according to the color value of each pixel in the cut foreground frame selection diagram; performing color gain on the color values of the corresponding pixels in the fusion heat map according to the gain parameters corresponding to the gain intervals, and generating the fusion heat map according to the color values after the color gain; and determining the tracking position of the target object in the current frame image according to the fusion heat map.
Specifically, in the color feature extraction process, statistics on background colors are reduced so that, as far as possible, only the color information of the tracking target itself is counted. The input at the target end is the area delimited by the two-dimensional bounding box of the tracking target, so the generated foreground frame selection map and the fusion heat map differ in size; the two maps therefore need to be size-clipped before subsequent calculation. Before the color gain is applied, the gain interval corresponding to each pixel needs to be determined. This may be done by calculating the difference between the color value of each pixel in the foreground frame selection map and the background color value. The differences are sorted and divided into a number of intervals, each interval corresponding to one gain parameter, so that the gain interval to which each pixel belongs is determined. For the gain interval of each pixel, a color gain operation is performed on the color value of the corresponding pixel in the fusion heat map according to the corresponding gain parameter. The gain operation may use a simple linear function or another more complex function to better adapt to different application scenarios; finally, the adjusted color value is set as the new color value of the current pixel in the fusion heat map.
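A minimal sketch of this gain step, assuming piecewise gain factors per interval; the interval edges and gain values below are placeholders, not the patent's parameters.

```python
import numpy as np

def apply_color_gain(heat: np.ndarray, foreground_map: np.ndarray,
                     edges=(0.25, 0.5, 0.75), gains=(0.8, 1.0, 1.2, 1.5)) -> np.ndarray:
    """heat and foreground_map: same-sized float arrays scaled to [0, 1]."""
    interval = np.digitize(foreground_map, edges)     # gain interval index for every pixel
    gain = np.asarray(gains)[interval]                # gain parameter of each interval
    return np.clip(heat * gain, 0.0, 1.0)             # color-gained (fused) heat map
```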
Specifically, first, the fusion heat map is preprocessed. Smoothing, thresholding, or other morphological operations may be performed to reduce noise and highlight the target object, and the region with the highest heat value is found from the preprocessed fusion heat map. Algorithms based on thresholding or connected-region analysis may be used to locate these regions. For each region of higher heat, its centroid or center of gravity may be calculated as the approximate location of the target object. A target detection algorithm, such as a deep learning-based target detection model (e.g., Faster R-CNN, YOLO, etc.), may further be applied around the target object to detect and locate it more accurately. From the location information of the target object, a bounding box or another form of marker of the target object may be generated to identify the location of the target object in the current frame image.
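A sketch of locating the tracking position as the centroid of the hottest region after light smoothing and thresholding; the kernel size and threshold ratio are assumed values.

```python
import numpy as np

def locate_target(fused: np.ndarray, thresh_ratio: float = 0.8):
    """fused: 2-D fused heat map; returns the (x, y) tracking position."""
    k = 3
    pad = np.pad(fused.astype(np.float64), k // 2, mode="edge")
    smooth = np.zeros(fused.shape, dtype=np.float64)
    for dy in range(k):                                   # simple box-filter smoothing
        for dx in range(k):
            smooth += pad[dy:dy + fused.shape[0], dx:dx + fused.shape[1]]
    smooth /= k * k

    mask = smooth >= thresh_ratio * smooth.max()          # keep the highest-heat region
    ys, xs = np.nonzero(mask)
    return float(xs.mean()), float(ys.mean())             # centroid as the target position
```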
104. According to the tracking position of the target object in the current frame image, carrying out first pose adjustment on the cradle head so that the shooting equipment aims at the target object;
In an embodiment of the present invention, after the adjusting the first pose of the pan-tilt according to the tracking position of the target object in the current frame image, the capturing device aligns to the target object, the method further includes: when the cradle head carries out second pose adjustment, calculating the total pose adjustment quantity of the cradle head, and obtaining a first adjustment quantity of the first pose adjustment; calculating a second adjustment amount of the cradle head for performing second pose adjustment according to the total pose adjustment amount and the first adjustment amount; and taking the second adjustment amount as an influence parameter, and adjusting the generated search area according to the influence parameter when the search area is generated by the next frame of image adjusted by the second pose.
Specifically, firstly, comparing the current pose of the cradle head with the initial pose, calculating the initial pose deviation of the cradle head, and when the first pose is adjusted, acquiring a first adjustment amount to represent a pose adjustment value obtained according to a target detection or tracking algorithm. This adjustment value may be the angle the pan/tilt head needs to rotate, the distance the pan/tilt head translates, or other relevant parameters. Subtracting the first adjustment amount from the total pose adjustment amount to obtain a second adjustment amount. Thus, the influence of the first adjustment on the total pose adjustment amount can be eliminated, so that the actual adjustment amount of the second pose adjustment can be obtained. And using the second adjustment amount as an influence parameter for adjusting the search area when the search area is generated by the next frame image of the second pose adjustment. According to the magnitude and direction of the second adjustment amount, the position, the magnitude or other parameters of the search area may be adjusted accordingly. Wherein, according to the magnitude and direction of the second adjustment amount, the center position of the search area can be adjusted. If the second adjustment amount indicates that the cradle head needs to move rightwards, the center position of the search area can be moved rightwards by a certain distance; if the second adjustment amount indicates that the pan-tilt needs to be moved upwards, the center position of the search area may be moved upwards by a certain distance. According to specific requirements, the moving distance can be determined according to the second adjustment amount. The size of the search area may be adjusted according to the size of the second adjustment amount. If the second adjustment amount indicates that the cradle head needs to track the target more accurately, the size of the search area can be reduced to improve the accuracy; the size of the search area may be increased if the second adjustment amount indicates that the pan-tilt needs to expand the search range to include more possible targets. In addition to location and size, other search area parameters may be adjusted according to the second adjustment amount, depending on the particular needs. For example, the direction or angle of the search area may be adjusted according to the direction of the second adjustment amount so as to be more matched with the target; or the sensitivity of the search area can be adjusted according to the magnitude of the second adjustment amount, so that the response of the search area to the pose change is more appropriate.
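A sketch of the bookkeeping described above: subtract the first (tracking-driven) adjustment from the total pose change to obtain the second adjustment, then shift the next search area by it. The pixels-per-degree scale and the sign convention are assumptions that would depend on the actual camera calibration.

```python
def second_adjustment(total_yaw, total_pitch, first_yaw, first_pitch):
    """Second adjustment = total pose adjustment minus the first pose adjustment."""
    return total_yaw - first_yaw, total_pitch - first_pitch

def shift_search_box(box, second_yaw, second_pitch, px_per_deg=12.0):
    """box: (x1, y1, x2, y2) in pixels; shift it to compensate for the extra camera rotation."""
    dx = -second_yaw * px_per_deg      # assumed sign: positive yaw moves the scene left in the image
    dy = second_pitch * px_per_deg     # assumed sign: positive pitch moves the scene down
    x1, y1, x2, y2 = box
    return (x1 + dx, y1 + dy, x2 + dx, y2 + dy)
```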
Further, the cradle head comprises a gyroscope and an accelerometer; when the cradle head performs the second pose adjustment, calculating the total pose adjustment amount of the cradle head and obtaining the first adjustment amount of the first pose adjustment includes: when the cradle head carries out second pose adjustment, acquiring the rotation speeds of the cradle head in each axial direction in a machine body coordinate axis through the gyroscope, and acquiring the current gravity direction information of the cradle head through an accelerometer; calculating the rotation angles of the cradle head in each axial direction in the machine body coordinate axis according to the rotation speed and the gravity direction information; and taking the rotation angles of the axial directions as the total pose adjustment amount of the cradle head, and obtaining the first adjustment amount of the first pose adjustment.
Specifically, when the cradle head is powered on, the above sensors, including the gyroscope and the accelerometer, need to be initialized to ensure that they work normally, and the gyroscope and the accelerometer can be calibrated to eliminate any bias or error. The rotation speeds of the cradle head in each axial direction of the body coordinate axes are acquired through the gyroscope. Gyroscopes typically provide angular velocity measurements in three axial directions (e.g., the X, Y and Z axes); these values represent the rotational speed of the cradle head about each axis. The current gravity direction information of the cradle head is acquired through the accelerometer, which measures the acceleration of the cradle head in each axial direction; owing to the force of gravity, the accelerometer can provide a gravity direction vector pointing towards the earth's center. The rotation angles of the cradle head in each axial direction of the body coordinate axes are then calculated from the rotation speeds and the gravity direction information. This can be realized with an attitude estimation algorithm; common algorithms include Kalman filtering, complementary filtering, or quaternion-based methods, from which the rotation angles of the cradle head in all axial directions of the body coordinate axes can be obtained.
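As a minimal example of one of the fusion methods mentioned above, the following complementary-filter sketch blends integrated gyroscope rates with the absolute angles implied by the accelerometer's gravity vector; the blend coefficient alpha is an assumed tuning value, and a production gimbal might instead use Kalman filtering or quaternions.

```python
import math

def complementary_filter(pitch, roll, gyro_rates, accel, dt, alpha=0.98):
    """gyro_rates: (gx, gy, gz) in rad/s; accel: (ax, ay, az) gravity direction; dt in seconds."""
    gx, gy, _ = gyro_rates
    ax, ay, az = accel

    # Short-term: integrate the angular rates measured by the gyroscope.
    pitch_gyro = pitch + gy * dt
    roll_gyro = roll + gx * dt

    # Long-term: absolute angles implied by the measured gravity direction.
    pitch_acc = math.atan2(-ax, math.sqrt(ay * ay + az * az))
    roll_acc = math.atan2(ay, az)

    # Blend the two estimates: trust the gyro short-term, the accelerometer long-term.
    pitch = alpha * pitch_gyro + (1.0 - alpha) * pitch_acc
    roll = alpha * roll_gyro + (1.0 - alpha) * roll_acc
    return pitch, roll
```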
105. Updating the current frame image into a target frame image respectively, updating the next frame image of the current frame image into the current frame image, and returning to the step of generating a heat distribution diagram and a foreground frame selection diagram between the target frame image and the current frame image until a preset stop condition is met, so that the shooting equipment continuously aims at the target object before the preset stop condition is met.
In one embodiment of the present invention, the cycle is implemented by repeatedly updating the current frame image to be the target frame image and updating the next frame image of the current frame image to be the current frame image until a preset stop condition is met, for example, the photographing apparatus is powered off, the target object is too far away from the photographing apparatus for the target to be recognized, or the user chooses to stop tracking. By cycling continuously in this way, the photographing apparatus remains aligned with the target object until the preset stop condition is met.
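A high-level loop sketch tying the steps together; the callables read_frame, generate_maps, fuse, locate and adjust_pose are hypothetical placeholders standing in for the components described above, not interfaces defined by the patent.

```python
def run_tracking(read_frame, generate_maps, fuse, locate, adjust_pose, stop_requested):
    target_frame = read_frame()                 # initial frame chosen by the clicking operation
    while not stop_requested():                 # preset stop condition (power-off, user stop, ...)
        current_frame = read_frame()            # next frame from the video stream
        if current_frame is None:               # stream ended
            break
        heat_map, foreground_map = generate_maps(target_frame, current_frame)
        fused = fuse(heat_map, foreground_map)
        position = locate(fused)                # tracking position in the current frame
        adjust_pose(position)                   # first pose adjustment of the cradle head
        target_frame = current_frame            # update the target frame and loop again
```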
In the present embodiment, by responding to a click operation with respect to a display screen on a photographing apparatus, a frame image at the same time as the click operation is acquired as an initial frame image from video stream data; respectively taking an initial frame image as a target frame image, taking the next frame image of the initial frame image as a current frame image, and generating a heat distribution diagram and a foreground frame selection diagram between the target frame image and the current frame image; and carrying out image fusion on the heat distribution map and the foreground frame selection map to generate a fusion heat map, determining the tracking position of the target object in the current frame image according to the fusion heat map, and controlling the cradle head to change the pose according to the continuously acquired tracking position. The method adopts the image fusion technology and combines the heat distribution diagram and the foreground frame selection diagram, so that the tracking position of the target object can be accurately determined, the posture of the cradle head can be further accurately controlled, and the shooting equipment can be better aligned to the target object.
The above embodiments describe the cradle-head-based shooting tracking method; the following describes the cradle-head-based shooting tracking device in the embodiment of the invention, where the shooting equipment is mounted on the cradle head. Referring to fig. 2, one embodiment of the cradle-head-based shooting tracking device in the embodiment of the invention includes:
A response module 201, configured to respond to a click operation for a display screen on the capturing device, acquire video stream data transmitted by the capturing device in real time, and acquire, from the video stream data, a frame image at the same time as the click operation as an initial frame image;
a map generating module 202, configured to take the initial frame image as a target frame image, take a next frame image of the initial frame image as a current frame image, and generate a heat distribution map and a foreground frame selection map between the target frame image and the current frame image;
The image fusion module 203 is configured to perform image fusion on the heat distribution map and the foreground frame selection map, generate a fusion heat map, and determine a tracking position of the target object in the current frame image according to the fusion heat map;
the pose adjustment module 204 is configured to perform a first pose adjustment on the pan-tilt according to a tracking position of the target object in the current frame image, so that the photographing device is aligned to the target object;
And the circulation module 205 is configured to update the current frame image to a target frame image, update a next frame image of the current frame image to a current frame image, and return to the step of generating the thermal profile and the foreground frame selection image between the target frame image and the current frame image until a preset stop condition is met, so that the photographing apparatus continuously aligns with the target object before the preset stop condition is met.
In the embodiment of the invention, the shooting tracking device based on the cradle head runs the shooting tracking method based on the cradle head, and the shooting tracking device based on the cradle head acquires a frame image at the same moment as the clicking operation from video stream data as an initial frame image by responding to the clicking operation aiming at a display picture on shooting equipment; respectively taking an initial frame image as a target frame image, taking the next frame image of the initial frame image as a current frame image, and generating a heat distribution diagram and a foreground frame selection diagram between the target frame image and the current frame image; and carrying out image fusion on the heat distribution map and the foreground frame selection map to generate a fusion heat map, determining the tracking position of the target object in the current frame image according to the fusion heat map, and controlling the cradle head to change the pose according to the continuously acquired tracking position. The method adopts the image fusion technology and combines the heat distribution diagram and the foreground frame selection diagram, so that the tracking position of the target object can be accurately determined, the posture of the cradle head can be further accurately controlled, and the shooting equipment can be better aligned to the target object.
Referring to fig. 3, a second embodiment of a pan-tilt-based photographing tracking apparatus according to an embodiment of the present invention includes:
A response module 201, configured to respond to a click operation for a display screen on the capturing device, acquire video stream data transmitted by the capturing device in real time, and acquire, from the video stream data, a frame image at the same time as the click operation as an initial frame image;
a map generating module 202, configured to take the initial frame image as a target frame image, take a next frame image of the initial frame image as a current frame image, and generate a heat distribution map and a foreground frame selection map between the target frame image and the current frame image;
The image fusion module 203 is configured to perform image fusion on the heat distribution map and the foreground frame selection map, generate a fusion heat map, and determine a tracking position of the target object in the current frame image according to the fusion heat map;
the pose adjustment module 204 is configured to perform a first pose adjustment on the pan-tilt according to a tracking position of the target object in the current frame image, so that the photographing device is aligned to the target object;
And the circulation module 205 is configured to update the current frame image to be the target frame image, update the next frame image of the current frame image to be the current frame image, and return to the step of generating the heat distribution map and the foreground frame selection map between the target frame image and the current frame image until a preset stop condition is met, so that the photographing apparatus remains aimed at the target object until the preset stop condition is met.
In one embodiment of the present invention, the map generating module 202 includes:
An image determination unit 2021 for taking the initial frame image as a target frame image and taking a next frame image of the initial frame image as a current frame image, respectively;
An object detection unit 2022, configured to perform object detection on the target frame image according to the position of the clicking operation, and determine position information and size information of a target object in the target frame image;
A region generating unit 2023 for generating a target region corresponding to the target frame image and a search region corresponding to the current frame image based on the position information and the size information;
A convolution extracting unit 2024 configured to perform convolution feature extraction on the target region and the search region, generating a heat distribution map between the target frame image and the current frame image;
And a color extraction unit 2025, configured to perform color feature extraction on the target area and the search area, and generate a foreground frame selection map between the target frame image and the current frame image.
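As an illustration of the region generating unit 2023 described above, the following Python sketch cuts a target region from the target frame image and a search region from the current frame image around the detected object. The 2x and 4x scale factors and the clamping to the image border are assumptions of the example only; the patent merely requires both regions to be derived from the position information and size information.

def make_regions(frame_target, frame_current, center, size,
                 target_scale=2.0, search_scale=4.0):
    # frame_target / frame_current: H x W (x C) image arrays.
    # center: (x, y) position of the target object in the target frame image.
    # size:   (w, h) of the target object.
    def crop(img, cx, cy, w, h):
        H, W = img.shape[:2]
        x0, x1 = max(0, int(cx - w / 2)), min(W, int(cx + w / 2))
        y0, y1 = max(0, int(cy - h / 2)), min(H, int(cy + h / 2))
        return img[y0:y1, x0:x1]

    w, h = size
    target_region = crop(frame_target, center[0], center[1],
                         w * target_scale, h * target_scale)
    search_region = crop(frame_current, center[0], center[1],
                         w * search_scale, h * search_scale)
    return target_region, search_region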
In one embodiment of the present invention, the convolution extracting unit 2024 includes:
inputting the target area and the search area into a preset twin neural network, wherein the twin neural network comprises a first branch and a second branch;
Performing convolution feature extraction on the target area and the search area through the first branch and the second branch respectively to obtain a first feature map and a second feature map;
respectively carrying out channel dimension transformation on the first feature map and the second feature map to obtain a first feature map vector and a second feature map vector under each channel dimension;
And performing element-by-element multiplication operation on the first feature map vector and the second feature map vector in each channel dimension to obtain multiplication results in each channel dimension, and performing summation operation on the multiplication results over the channel dimension to obtain a heat distribution map between the target frame image and the current frame image.
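The channel-wise multiply-and-sum described above can be illustrated with the following NumPy sketch. It assumes the two branch feature maps already share the same spatial size; the function name and array shapes are illustrative and are not taken from the patent.

import numpy as np

def heat_distribution_map(feat_target, feat_search):
    # feat_target, feat_search: (C, H, W) feature maps produced by the first
    # and second branch of the twin neural network.
    C, H, W = feat_search.shape
    # Channel dimension transformation: one feature map vector per channel.
    v_target = feat_target.reshape(C, -1)         # (C, H*W)
    v_search = feat_search.reshape(C, -1)         # (C, H*W)
    # Element-by-element multiplication per channel, then summation over the
    # channel dimension, giving one response value per spatial position.
    response = (v_target * v_search).sum(axis=0)  # (H*W,)
    return response.reshape(H, W)

With (C, H, W) inputs the result is a single-channel H x W map whose larger values indicate positions of the search area that match the target area more strongly.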
In one embodiment of the present invention, the color extraction unit 2025 includes:
Dividing each pixel in the target area and the search area into the corresponding color interval according to its color value, to obtain a first color histogram for the target area and a second color histogram for the search area;
calculating a probability that each pixel in the search area is a foreground pixel based on the first color histogram and the second color histogram;
and identifying the pixels with the probability larger than a preset probability threshold as foreground pixels, and generating a foreground frame selection map according to all the foreground pixels.
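A minimal single-channel (grayscale) sketch of this color model is given below. The 16 color intervals, the ratio-based probability estimate and the 0.5 threshold are assumptions of the example; the patent only requires a per-interval foreground probability derived from the two histograms and a preset probability threshold.

import numpy as np

def foreground_selection_map(target_area, search_area, bins=16, threshold=0.5):
    # Assign every pixel (value range 0-255) to a color interval.
    t_bins = (target_area.astype(np.float32) / 256 * bins).astype(int).clip(0, bins - 1)
    s_bins = (search_area.astype(np.float32) / 256 * bins).astype(int).clip(0, bins - 1)
    # First color histogram (target area) and second color histogram (search area).
    hist_t = np.bincount(t_bins.ravel(), minlength=bins).astype(np.float32)
    hist_s = np.bincount(s_bins.ravel(), minlength=bins).astype(np.float32)
    # Probability that a pixel falling into a given interval is foreground.
    prob_fg = hist_t / (hist_t + hist_s + 1e-6)
    prob = prob_fg[s_bins]                        # probability for every search pixel
    # Pixels above the preset probability threshold form the foreground map.
    return (prob > threshold).astype(np.uint8)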
In one embodiment of the present invention, the image fusion module 203 is specifically configured to:
Cropping the heat distribution map or the foreground frame selection map according to the sizes of the two maps so that the heat distribution map and the foreground frame selection map have the same size;
Determining a gain interval corresponding to each pixel according to the color value of each pixel in the cropped foreground frame selection map;
Applying a color gain to the color values of the corresponding pixels in the heat distribution map according to the gain parameters corresponding to the gain intervals, and generating the fusion heat map from the color values after the color gain;
And determining the tracking position of the target object in the current frame image according to the fusion heat map.
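The fusion step can be illustrated with the following sketch. It crops both maps to their common size, uses only two gain intervals (background and foreground, matching the binary foreground map sketched earlier) with a single multiplicative gain parameter each, and reads the tracking position from the peak of the fused map; the concrete gain values and the peak-picking rule are assumptions of the example, not details given in the patent.

import numpy as np

def fuse_and_locate(heat_map, fg_map, gains=(1.0, 1.5)):
    # Crop both maps to the same size.
    h = min(heat_map.shape[0], fg_map.shape[0])
    w = min(heat_map.shape[1], fg_map.shape[1])
    heat, fg = heat_map[:h, :w], fg_map[:h, :w]
    # Gain interval per pixel: background keeps gains[0], foreground gets gains[1].
    gain = np.where(fg > 0, gains[1], gains[0])
    fused = heat * gain                           # fusion heat map after color gain
    # Tracking position = position of the strongest response in the fused map.
    y, x = np.unravel_index(np.argmax(fused), fused.shape)
    return fused, (x, y)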
In an embodiment of the present invention, the pan-tilt-based photographing tracking device further includes an area adjustment module 206, where the area adjustment module 206 is specifically configured to:
When the cradle head performs a second pose adjustment, calculating the total pose adjustment amount of the cradle head, and obtaining a first adjustment amount of the first pose adjustment;
calculating a second adjustment amount of the cradle head for performing second pose adjustment according to the total pose adjustment amount and the first adjustment amount;
And taking the second adjustment amount as an influence parameter, and, when the search area is generated for the next frame image after the second pose adjustment, adjusting the generated search area according to the influence parameter.
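One possible way to use the second adjustment amount as an influence parameter is sketched below: the residual rotation (total adjustment minus first adjustment) is converted into a pixel offset and used to shift the centre of the next search area. The linear pixels-per-degree conversion, the yaw/pitch-only model and the sign convention are assumptions of the example, not details given in the patent.

def compensate_search_center(center_xy, total_adjust_deg, first_adjust_deg,
                             pixels_per_degree=(20.0, 20.0)):
    # total_adjust_deg / first_adjust_deg: (yaw, pitch) rotations in degrees.
    # The second pose adjustment is the part of the total rotation that was
    # not caused by the tracking-driven first adjustment.
    second_yaw = total_adjust_deg[0] - first_adjust_deg[0]
    second_pitch = total_adjust_deg[1] - first_adjust_deg[1]
    # Convert the residual rotation into an image-space offset and move the
    # search-area centre for the next frame accordingly.
    dx = second_yaw * pixels_per_degree[0]
    dy = second_pitch * pixels_per_degree[1]
    return (center_xy[0] - dx, center_xy[1] - dy)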
In one embodiment of the invention, the cradle head comprises a gyroscope and an accelerometer; the area adjustment module 206 is specifically further configured to:
When the cradle head performs the second pose adjustment, acquiring, through the gyroscope, the rotation speeds of the cradle head about each axis of the machine body coordinate system, and acquiring, through the accelerometer, the current gravity direction information of the cradle head;
Calculating the rotation angle of the cradle head about each axis of the machine body coordinate system according to the rotation speeds and the gravity direction information;
And taking the rotation angles about the respective axes as the total pose adjustment amount of the cradle head, and obtaining the first adjustment amount of the first pose adjustment.
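One conventional way to obtain these rotation angles from the gyroscope rates and the accelerometer's gravity direction is a complementary filter, sketched below. The blending factor alpha, the axis convention and the use of a complementary filter at all are assumptions of the example; the patent only states that the angles are calculated from the rotation speeds and the gravity direction information.

import math

def body_rotation_angles(gyro_rates, accel, prev_angles, dt, alpha=0.98):
    # gyro_rates:  (wx, wy, wz) angular velocities about the body axes, rad/s.
    # accel:       (ax, ay, az) accelerometer reading, dominated by gravity.
    # prev_angles: (roll, pitch, yaw) from the previous update, rad.
    # Integrate the gyroscope rates over the time step.
    roll_g = prev_angles[0] + gyro_rates[0] * dt
    pitch_g = prev_angles[1] + gyro_rates[1] * dt
    yaw = prev_angles[2] + gyro_rates[2] * dt      # no gravity reference for yaw
    # Roll and pitch implied by the measured gravity direction.
    roll_a = math.atan2(accel[1], accel[2])
    pitch_a = math.atan2(-accel[0], math.hypot(accel[1], accel[2]))
    # Blend the integrated and gravity-based estimates.
    roll = alpha * roll_g + (1 - alpha) * roll_a
    pitch = alpha * pitch_g + (1 - alpha) * pitch_a
    return roll, pitch, yaw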
On the basis of the above embodiment, the present embodiment describes in detail the specific functions of each module and the unit composition of some of the modules. Through these modules and units, in response to a click operation on the display screen of the photographing apparatus, a frame image captured at the same moment as the click operation is acquired from the video stream data as an initial frame image; the initial frame image is taken as the target frame image and its next frame image as the current frame image, and a heat distribution map and a foreground frame selection map between the target frame image and the current frame image are generated; the heat distribution map and the foreground frame selection map are fused into a fusion heat map, the tracking position of the target object in the current frame image is determined from the fusion heat map, and the cradle head is controlled to change its pose according to the continuously acquired tracking positions. By combining the heat distribution map and the foreground frame selection map through image fusion, the tracking position of the target object can be determined accurately, the pose of the cradle head can be controlled precisely, and the photographing apparatus can be kept aimed at the target object.
Fig. 2 and fig. 3 above describe the pan-tilt-based photographing tracking apparatus in the embodiment of the present invention in detail from the point of view of the modularized functional entity, and the pan-tilt-based photographing tracking device in the embodiment of the present invention is described in detail from the point of view of hardware processing below.
fig. 4 is a schematic structural diagram of a pan-tilt-based photographing tracking device according to an embodiment of the invention. The pan-tilt-based photographing tracking device 400 may vary considerably in configuration or performance, and may include one or more processors (central processing units, CPU) 410, a memory 420, and one or more storage media 430 (e.g., one or more mass storage devices) storing application programs 433 or data 432. The memory 420 and the storage medium 430 may be transitory or persistent storage. The program stored in the storage medium 430 may include one or more modules (not shown), each of which may include a series of instruction operations on the pan-tilt-based photographing tracking device 400. Still further, the processor 410 may be configured to communicate with the storage medium 430 and execute the series of instruction operations in the storage medium 430 on the pan-tilt-based photographing tracking device 400 to implement the steps of the pan-tilt-based photographing tracking method described above.
The pan-tilt-based photographing tracking device 400 may also include one or more power supplies 440, one or more wired or wireless network interfaces 450, one or more input/output interfaces 460, and/or one or more operating systems 431, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. It will be appreciated by those skilled in the art that the device structure shown in fig. 4 does not limit the pan-tilt-based photographing tracking device provided by the present invention, which may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
The present invention also provides a computer-readable storage medium, which may be a non-volatile or a volatile computer-readable storage medium, storing instructions that, when run on a computer, cause the computer to perform the steps of the pan-tilt-based shooting tracking method.
It will be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process of the system or apparatus and unit described above may refer to the corresponding process in the foregoing method embodiment, which is not repeated herein.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, or in whole or in part, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above embodiments are only for illustrating the technical solution of the present invention, and not for limiting the same; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. A shooting tracking method based on a cradle head is characterized in that shooting equipment is mounted on the cradle head; the shooting tracking method based on the cradle head comprises the following steps:
Responding to a clicking operation aiming at a display picture on the shooting equipment, acquiring video stream data transmitted by the shooting equipment in real time, and acquiring a frame image at the same moment as the clicking operation from the video stream data as an initial frame image;
Respectively taking the initial frame image as a target frame image and taking the next frame image of the initial frame image as a current frame image; object detection is carried out on the target frame image according to the position of the clicking operation, and the position information and the size information of the target object in the target frame image are determined; generating a target area corresponding to the target frame image and a search area corresponding to the current frame image based on the position information and the size information; inputting the target area and the search area into a preset twin neural network, wherein the twin neural network comprises a first branch and a second branch; performing convolution feature extraction on the target area and the search area through the first branch and the second branch respectively to obtain a first feature map and a second feature map; respectively carrying out channel dimension transformation on the first feature map and the second feature map to obtain a first feature map vector and a second feature map vector under each channel dimension; performing element-by-element multiplication operation on the first feature map vector and the second feature map vector in each channel dimension to obtain multiplication results in each channel dimension, and performing summation operation on each multiplication result in the channel dimension to obtain a heat distribution diagram between a target frame image and the current frame image; dividing each pixel in the target area and the search area into corresponding color intervals according to corresponding color values to obtain corresponding first color histograms and second color histograms; calculating a probability that each pixel in the search area is a foreground pixel based on the first color histogram and the second color histogram; identifying the pixels with the probability larger than a preset probability threshold as foreground pixels, and generating a foreground frame selection chart according to all the foreground pixels;
Performing size cutting on the heat distribution map or the foreground frame selection map according to the sizes of the heat distribution map and the foreground frame selection map so that the sizes of the heat distribution map and the foreground frame selection map are consistent; determining a gain section corresponding to each pixel according to the color value of each pixel in the cut foreground frame selection diagram; performing color gain on the color values of the corresponding pixels according to the gain parameters corresponding to the gain intervals, and generating a fusion heat map according to the color values after the color gain; determining the tracking position of the target object in the current frame image according to the fusion heat map;
According to the tracking position of the target object in the current frame image, carrying out first pose adjustment on the cradle head so that the shooting equipment aims at the target object;
Updating the current frame image to be the target frame image, updating the next frame image of the current frame image to be the current frame image, and returning to the step of generating a heat distribution map and a foreground frame selection map between the target frame image and the current frame image until a preset stop condition is met, so that the shooting equipment remains aimed at the target object until the preset stop condition is met.
2. The pan-tilt-based photographing tracking method according to claim 1, wherein after the first pose adjustment is performed on the pan-tilt according to the tracking position of the target object in the current frame image so that the photographing apparatus is aligned with the target object, the method further comprises:
When the cradle head performs a second pose adjustment, calculating the total pose adjustment amount of the cradle head, and obtaining a first adjustment amount of the first pose adjustment;
calculating a second adjustment amount of the cradle head for performing second pose adjustment according to the total pose adjustment amount and the first adjustment amount;
And taking the second adjustment amount as an influence parameter, and, when the search area is generated for the next frame image after the second pose adjustment, adjusting the generated search area according to the influence parameter.
3. The pan-tilt-based shot tracking method of claim 2, wherein the pan-tilt comprises a gyroscope and an accelerometer; when the cradle head performs the second pose adjustment, calculating the total pose adjustment amount of the cradle head and obtaining the first adjustment amount of the first pose adjustment includes:
When the cradle head performs the second pose adjustment, acquiring, through the gyroscope, the rotation speeds of the cradle head about each axis of the machine body coordinate system, and acquiring, through the accelerometer, the current gravity direction information of the cradle head;
Calculating the rotation angle of the cradle head about each axis of the machine body coordinate system according to the rotation speeds and the gravity direction information;
And taking the rotation angles about the respective axes as the total pose adjustment amount of the cradle head, and obtaining the first adjustment amount of the first pose adjustment.
4. A shooting tracking device based on a cradle head, characterized in that shooting equipment is mounted on the cradle head; the shooting tracking device based on the cradle head comprises:
The response module is used for responding to the clicking operation aiming at the display picture on the shooting equipment, acquiring video stream data transmitted by the shooting equipment in real time, and acquiring a frame image at the same moment as the clicking operation from the video stream data as an initial frame image;
The image generation module is used for respectively taking the initial frame image as a target frame image and taking the next frame image of the initial frame image as a current frame image; object detection is carried out on the target frame image according to the position of the clicking operation, and the position information and the size information of the target object in the target frame image are determined; generating a target area corresponding to the target frame image and a search area corresponding to the current frame image based on the position information and the size information; inputting the target area and the search area into a preset twin neural network, wherein the twin neural network comprises a first branch and a second branch; performing convolution feature extraction on the target area and the search area through the first branch and the second branch respectively to obtain a first feature map and a second feature map; respectively carrying out channel dimension transformation on the first feature map and the second feature map to obtain a first feature map vector and a second feature map vector under each channel dimension; performing element-by-element multiplication operation on the first feature map vector and the second feature map vector in each channel dimension to obtain multiplication results in each channel dimension, and performing summation operation on each multiplication result in the channel dimension to obtain a heat distribution diagram between a target frame image and the current frame image; dividing each pixel in the target area and the search area into corresponding color intervals according to corresponding color values to obtain corresponding first color histograms and second color histograms; calculating a probability that each pixel in the search area is a foreground pixel based on the first color histogram and the second color histogram; identifying the pixels with the probability larger than a preset probability threshold as foreground pixels, and generating a foreground frame selection chart according to all the foreground pixels;
The map fusion module is used for carrying out size cutting on the heat distribution map or the foreground frame selection map according to the sizes of the heat distribution map and the foreground frame selection map so that the sizes of the heat distribution map and the foreground frame selection map are consistent; determining a gain section corresponding to each pixel according to the color value of each pixel in the cut foreground frame selection diagram; performing color gain on the color values of the corresponding pixels according to the gain parameters corresponding to the gain intervals, and generating a fusion heat map according to the color values after the color gain; determining the tracking position of the target object in the current frame image according to the fusion heat map;
The pose adjustment module is used for performing first pose adjustment on the cradle head according to the tracking position of the target object in the current frame image so that the shooting equipment is aligned to the target object;
and the circulation module is used for updating the current frame image into the target frame image, updating the next frame image of the current frame image into the current frame image, and returning to the step of generating a heat distribution map and a foreground frame selection map between the target frame image and the current frame image until a preset stop condition is met, so that the shooting equipment remains aimed at the target object until the preset stop condition is met.
5. A pan-tilt-based photographing tracking apparatus, comprising: a memory and at least one processor, the memory having instructions stored therein;
The at least one processor invokes the instructions in the memory to cause the pan-tilt-based photographing tracking apparatus to perform the steps of the pan-tilt-based shooting tracking method of any one of claims 1-3.
6. A computer readable storage medium having instructions stored thereon, wherein the instructions when executed by a processor implement the steps of the pan-tilt-based shot tracking method of any of claims 1-3.
CN202410077917.5A 2024-01-19 2024-01-19 Shooting tracking method, device, equipment and storage medium based on cradle head Active CN117615255B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410077917.5A CN117615255B (en) 2024-01-19 2024-01-19 Shooting tracking method, device, equipment and storage medium based on cradle head

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410077917.5A CN117615255B (en) 2024-01-19 2024-01-19 Shooting tracking method, device, equipment and storage medium based on cradle head

Publications (2)

Publication Number Publication Date
CN117615255A CN117615255A (en) 2024-02-27
CN117615255B true CN117615255B (en) 2024-04-19

Family

ID=89950224

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410077917.5A Active CN117615255B (en) 2024-01-19 2024-01-19 Shooting tracking method, device, equipment and storage medium based on cradle head

Country Status (1)

Country Link
CN (1) CN117615255B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117896618B (en) * 2024-03-15 2024-05-14 深圳市浩瀚卓越科技有限公司 Anti-shake method, device, equipment and storage medium for cradle head shooting

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111354017A (en) * 2020-03-04 2020-06-30 江南大学 Target tracking method based on twin neural network and parallel attention module
WO2020259264A1 (en) * 2019-06-28 2020-12-30 Oppo广东移动通信有限公司 Subject tracking method, electronic apparatus, and computer-readable storage medium
CN116091979A (en) * 2023-03-01 2023-05-09 长沙理工大学 Target tracking method based on feature fusion and channel attention
CN116740126A (en) * 2023-08-09 2023-09-12 深圳市深视智能科技有限公司 Target tracking method, high-speed camera, and storage medium
CN117412161A (en) * 2023-09-27 2024-01-16 众源科技(广东)股份有限公司 Trolley tracking method and device, storage medium and terminal equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100792283B1 (en) * 2001-08-07 2008-01-07 삼성전자주식회사 Device and method for auto tracking moving object
JP6649864B2 (en) * 2015-10-23 2020-02-19 株式会社モルフォ Image processing apparatus, electronic device, image processing method, and program

Also Published As

Publication number Publication date
CN117615255A (en) 2024-02-27

Similar Documents

Publication Publication Date Title
US20200159256A1 (en) Method for detecting target object, detection apparatus and robot
US10198823B1 (en) Segmentation of object image data from background image data
US11205274B2 (en) High-performance visual object tracking for embedded vision systems
CN111328396B (en) Pose estimation and model retrieval for objects in images
CN109934065B (en) Method and device for gesture recognition
US10055013B2 (en) Dynamic object tracking for user interfaces
US8467596B2 (en) Method and apparatus for object pose estimation
US8369574B2 (en) Person tracking method, person tracking apparatus, and person tracking program storage medium
CN111783820A (en) Image annotation method and device
CN111797657A (en) Vehicle peripheral obstacle detection method, device, storage medium, and electronic apparatus
CN113286194A (en) Video processing method and device, electronic equipment and readable storage medium
CN117615255B (en) Shooting tracking method, device, equipment and storage medium based on cradle head
CN111062263B (en) Method, apparatus, computer apparatus and storage medium for hand gesture estimation
CN111382613B (en) Image processing method, device, equipment and medium
EP2864933A1 (en) Method, apparatus and computer program product for human-face features extraction
CN106650965B (en) Remote video processing method and device
JP7272024B2 (en) Object tracking device, monitoring system and object tracking method
CN112184757A (en) Method and device for determining motion trail, storage medium and electronic device
CN110070578B (en) Loop detection method
Perera et al. Human detection and motion analysis from a quadrotor UAV
JP2018120283A (en) Information processing device, information processing method and program
CN107274477B (en) Background modeling method based on three-dimensional space surface layer
CN116883897A (en) Low-resolution target identification method
Verma et al. Robust Stabilised Visual Tracker for Vehicle Tracking.
CN106406507B (en) Image processing method and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant