CN113838098A - Intelligent tracking shooting system for remote high-speed moving target - Google Patents


Info

Publication number: CN113838098A
Authority: CN (China)
Prior art keywords: target, tracking, video, camera, real
Legal status: Granted
Application number: CN202111156436.6A
Other languages: Chinese (zh)
Other versions: CN113838098B (en)
Inventors: 董立泉, 赵祺森, 杨焘, 赵跃进, 褚旭红, 刘明, 孔令琴, 刘宗达, 惠梅
Current Assignee: Yangtze River Delta Research Institute Of Beijing University Of Technology Jiaxing; Beijing Institute of Technology BIT
Original Assignee: Yangtze River Delta Research Institute Of Beijing University Of Technology Jiaxing; Beijing Institute of Technology BIT
Application filed by Yangtze River Delta Research Institute Of Beijing University Of Technology Jiaxing and Beijing Institute of Technology BIT
Publication of CN113838098A; application granted; publication of CN113838098B
Legal status: Active

Classifications

    • G06T 7/246 - Image analysis; analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G06F 18/2415 - Pattern recognition; classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N 3/045 - Neural networks; combinations of networks
    • G06N 3/08 - Neural networks; learning methods
    • G06T 2207/10016 - Image acquisition modality: video; image sequence
    • G06T 2207/30244 - Subject of image: camera pose


Abstract

The invention discloses an intelligent tracking shooting system for a remote high-speed moving target, belonging to the fields of computer vision, computer control, and television rebroadcasting. The invention comprises an optical system, a stabilized platform system, and a display control system. It takes a panoramic camera as the visual sensor and integrates an adaptive target detection and tracking method with a vehicle-mounted two-axis stabilized platform; within the target tracking method, a dual mechanism, in which the global camera guides the close-up camera's tracking shooting and the close-up camera performs its own detection, is combined in closed loop, improving the tracking precision of the whole system and keeping the target at the center of the video picture. When multiple videos are output, the system evaluates the multi-channel video streams in real time and switches them automatically through a real-time video stream intelligent switching method, outputs stable video pictures in real time, realizes stable follow-shooting and real-time editing of high-speed moving targets in harsh environments and complex scenes, and can provide good technical support for subsequent extended applications such as remote broadcasting, cloud reconstruction, and personalized rebroadcasting.

Description

Intelligent tracking shooting system for remote high-speed moving target
Technical Field
The invention relates to an intelligent tracking shooting system for a remote high-speed moving target, and belongs to the fields of computer vision, computer control, and television rebroadcasting.
Background
With the approach of the Winter Olympic Games, alpine skiing has attracted a large number of skiing enthusiasts with its unique excitement and challenge. At the same time, the sport features high speeds and many turning maneuvers, the venues have complex terrain, and the weather is changeable and harsh, with ambient temperatures usually below -20 °C, all of which greatly troubles the work of event photographers and broadcasters. Under the traditional manual shooting and rebroadcasting mode, a broadcast service company must erect dozens of professional camera platforms on steep snow courses, and some camera operators must enter the ski course for mobile shooting; in addition, the media center must assign a large number of staff to receive, edit, and push the video signals returned from each camera position in real time. The whole mode therefore requires a huge organization, complex operation, and difficult construction, and consumes enormous manpower and material resources. In recent years, with the rapid development of computer vision technology, shooting systems combined with target tracking technology have been widely applied in the field of security monitoring. Sports event environments and scenes, however, are relatively complex, and the shooting system is still required to output video with a stable frame rate, high-definition pictures, and a proper viewing angle in real time without affecting the normal running of the event, which poses a great challenge to existing computer vision technology.
Disclosure of Invention
To solve the above problems, the present invention aims to provide an intelligent tracking shooting system for a long-distance high-speed moving target. The system takes a panoramic camera as the visual sensor and integrates an adaptive target detection and tracking technique with a vehicle-mounted two-dimensional stabilized platform; within the target tracking technique, a dual mechanism, in which the global camera guides the close-up camera's tracking shooting and the close-up camera performs its own detection, is combined in closed loop, improving the tracking precision of the whole system and ensuring the target stays at the center of the video picture. When multiple videos are output, the system evaluates the multi-channel video streams in real time and switches them automatically through a real-time video stream intelligent switching method, outputs stable video pictures in real time, realizes stable follow-shooting and real-time editing of high-speed moving targets in harsh environments and complex scenes, and can provide good technical support for subsequent extended applications such as remote broadcasting, cloud reconstruction, and personalized rebroadcasting.
The purpose of the invention is realized by the following technical scheme:
the invention discloses an intelligent tracking shooting relay system for a remote high-speed moving target, which comprises an optical system, a stable platform system and a display control system.
The optical system comprises a panoramic tracking camera and a professional close-up video camera; the optical system is fixed on the stable platform system; optical systems are used to capture and track objects to be photographed in a wide range of panoramas, and to take close-up views of distant objects.
The stable platform system comprises a two-dimensional stable rotary table and a rotary table controller module which is matched with and controls the stable platform; the two-dimensional stabilizing turntable is used for loading the optical system; the rotary table controller module is used for driving the two-dimensional stable platform to rotate so that the optical system can complete the function of tracking shooting.
The display control system comprises: the system comprises an image acquisition card, a software control system and a display system.
The image acquisition card is used for acquiring real-time image data of the panoramic tracking camera and the professional close-up video camera.
The software control system is used for processing the real-time image data acquired by the image acquisition card and realizing the functions of target detection, target tracking and real-time video stream intelligent switching.
The software control system comprises a central control unit for carrying an upper computer program, a video tracker module for realizing target detection and tracking and a power supply module.
The software control system sends instructions through the central control unit to realize system power switch, target manual and automatic detection conversion, close-up camera lens focal length and aperture adjustment, stable platform two-dimensional steering, image processing, target tracking, path planning shooting, image display and picture recording; the image processing comprises display parameter adjustment and parameter character embedding of the pictures of the panoramic tracking camera and the professional close-up video camera; the path planning shooting means that N pixel points are preset in a global tracking picture to form a track, so that the two-dimensional stable platform carries out tracking shooting according to the track; the target tracking means that a central control unit is used for delimiting a detection area in a picture of a panoramic tracking camera, a button is used for switching a manual target selection mode and an automatic target detection mode, target detection and tracking can be performed by adopting a target detection and tracking method, a stable platform can rotate along with a target, the target is kept to be always presented in a close-up picture of a professional close-up camera, then secondary detection and tracking are performed on the target in the close-up picture by adopting a self-adaptive detection and tracking algorithm, the position of the tracked target in the close-up picture and the focal length and aperture of the close-up camera are finely adjusted, and the target can be located at the optimal position of the picture.
The display system is used for displaying the panoramic tracking camera and close-up pictures of distant targets, and is also used for displaying information of the two-dimensional stable turntable and the optical system.
The optical system is connected with a video tracker module in the display control system, and the stable platform system is connected with the display control system.
The invention discloses an intelligent tracking shooting relay system for a remote high-speed moving target, which comprises an optical system, a stable platform system and a display control system; the system also comprises an optical system and a stable platform system which are called as a secondary acquisition system; the image acquired by the secondary acquisition system is processed by an image acquisition card and a software control system, and a target is detected and tracked by adopting a target detection and tracking method; performing secondary detection tracking on the target in the close-up picture by adopting a self-adaptive detection tracking algorithm, and finely adjusting the position of the tracked target in the close-up picture and the focal length and aperture of the close-up camera to enable the target to be at the optimal position of the picture; the multi-channel video stream is evaluated and automatically switched by a real-time video stream intelligent switching method, and a video picture with better impression is output in real time by a display system.
The real-time video stream intelligent switching method evaluates and automatically switches the multi-channel video streams through the following steps:

Step one: construct a database consisting of the LIVE database, newly acquired high-definition videos, and the video average subjective value corresponding to each video; classify the videos in the database by average subjective value, denoting the videos whose average subjective value falls in the j-th score interval as type $VT_j$, $j = 0, 1, 2, \ldots, N$; randomly extract samples from the database to establish an experiment sample set.
Step two: construct a deep convolutional neural network model comprising a pre-classification network and a video average-subjective-value regression prediction network; the pre-classification network predicts the video type $VT_j$, i.e. the score interval in which the video's average subjective value lies, and the regression prediction network predicts the video average subjective value y.
Step three: train the deep convolutional neural network model; after initialization, input the experiment sample set of step one into the model constructed in step two for training. Iteratively optimize the model with stochastic gradient descent, computing the loss function and gradients of the deep convolutional network after each iteration and optimizing the network's weights and bias terms, so as to find the optimal deep convolutional neural network model for the current training.
Step four: switch the video stream signals in real time. Determine whether each of the multiple videos contains the tracked target; when a video contains the tracked target and the target's visible area is larger than the set threshold d, input the video into the regression prediction network trained in step three, predict the visual quality of the multi-channel video signal data in real time, select the video stream ID with the highest visual quality, and output that video stream signal to the screen of the display system.
The second step of the real-time video stream intelligent switching method is implemented as follows.

First construct the pre-classification network, which predicts the type $VT_j$ corresponding to the input video. The pre-classification network comprises a plurality of 3D convolution layers, 3D max-pooling layers, fully connected layers, and an output layer with N classifications; the classification result, i.e. the video type $VT_j$, is obtained after a softmax computation. The loss function is the cross-entropy

$$L = -\left[\, y \log \hat{y} + (1 - y)\log\left(1 - \hat{y}\right) \,\right] \tag{1}$$

where $y$ denotes the average subjective value and $\hat{y}$ the predicted score.
Then construct the video average-subjective-value regression prediction network, which predicts the video average subjective value y; its structure replaces the N-way classification output layer of the pre-classification network with a regression prediction node. The weight and bias parameters of the convolution and pooling layers in the pre-classification network are loaded through transfer learning, while the fully-connected-layer parameters of the pre-classification network are discarded. The loss function is the mean square error (MSE):

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2 \tag{2}$$

where N denotes the total number of video segments. The pre-classification network and the video average-subjective-value regression prediction network together form the deep convolutional neural network model.
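Where the text above describes replacing the N-way output layer with a single regression node and transferring only the convolution and pooling weights, a minimal sketch can make the step concrete. The sketch below is an assumption-laden PyTorch illustration, not the patent's implementation; layer sizes and names are placeholders (the patent's exact layers are given in the detailed description).

```python
# Minimal PyTorch sketch of the head swap: train an N-way classifier, then
# build a regression net that reuses only the conv/pool ("features") weights.
# All sizes here are illustrative assumptions.
import torch.nn as nn

def make_net(out_dim: int) -> nn.Module:
    features = nn.Sequential(                     # 3D conv + max-pool stack
        nn.Conv3d(3, 64, 3, padding=1), nn.ReLU(),
        nn.MaxPool3d((1, 2, 2)),
    )
    head = nn.Sequential(nn.Flatten(), nn.LazyLinear(128), nn.ReLU(),
                         nn.Linear(128, out_dim))
    return nn.Sequential(features, head)

cls_net = make_net(out_dim=10)    # pre-classification network, N-way output
# ... train cls_net with the cross-entropy loss of formula (1) ...
reg_net = make_net(out_dim=1)     # single regression node predicting y
# Transfer learning: copy conv/pool parameters, discard fully connected ones.
reg_net[0].load_state_dict(cls_net[0].state_dict())
# ... train reg_net with the MSE loss of formula (2) ...
```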
The invention also discloses a target detection and tracking method based on image sharpness and tracking-stability conditions, implemented on the intelligent tracking shooting rebroadcast system for a remote high-speed moving target, comprising the following steps:
Step one: detect the video content with a target detection method to obtain candidate targets to be tracked and their target regions; screen the candidates by comparing each candidate's confidence with a preset threshold $T_{detect}$, and form the n screened targets to be tracked into a target set $D = \{D_1, D_2, \ldots, D_i, \ldots, D_n\}$, where $D_i$ denotes the i-th target to be tracked; store the center position and bounding-box information of each target at the same time.

Step two: determine the tracked target from the candidate target set $D$ obtained in step one, extract features from the information inside the tracked target's bounding box in the first frame, take the resulting feature matrix as the target feature model $M_{\mathrm{cur}}$, and compute the tracked target's region in the first frame image to obtain the sharpness $C_{\mathrm{cur}}$ of the tracked target's region.

Starting from the second frame image, compute the feature response matrix between the previous-frame target and the current frame; the position of the response peak is the center position of the current-frame target. Further extract the feature matrix $x_{\mathrm{cur}}$ inside the current-frame target region, then update the feature model according to the change between the previous-frame target feature matrix $x_{\mathrm{cur}-1}$ and $x_{\mathrm{cur}}$, so as to better adapt to changes of the tracked target:

$$M_{\mathrm{cur}} = (1 - l)\,M_{\mathrm{cur}-1} + l\,x_{\mathrm{cur}} \tag{3}$$

where $l$ is the learning rate.

Step three: substitute the learning rate $l$, updated in real time, into formula (3) to realize adaptive updating of the target feature model. Compute the image sharpness $C_{\mathrm{cur}}$ of the current-frame target region in real time, and adjust the learning rate $l$ of the model update by computing the difference between $C_{\mathrm{cur}}$ and $C_{\mathrm{cur}-1}$, improving target tracking precision (formula (4)): $l$ is derived from the base learning rate $L_{\mathrm{base}}$ and the sharpness change, where $C_{\mathrm{cur}-1}$ denotes the sharpness value of the previous-frame target region and $T_c$ is a sharpness threshold; if the sharpness falls below the threshold $T_c$, the learning rate is immediately set to 0 and updating of the target feature model stops, avoiding contamination of the model;
Step four: compute the average peak-to-correlation energy (APCE) value from the target feature response matrix obtained in step two:

$$\mathrm{APCE} = \frac{\left|F_{\max} - F_{\min}\right|^{2}}{\operatorname{mean}\!\left(\sum_{x,y}\left(F_{x,y} - F_{\min}\right)^{2}\right)} \tag{5}$$

where $F_{\max}$ denotes the response peak, $F_{\min}$ the lowest response value, and $F_{x,y}$ the response value at position $(x, y)$ in the response map;
When the response peak is smaller than the preset threshold $T_{max\_respos}$ and the APCE value is smaller than the preset $T_{APCE}$, the tracked target is being disturbed, and the target feature model $M_0$ in the previous frame image is saved; if the target is disturbed for several consecutive frames, updating of the target feature model stops and the target re-detection mode starts, executing step five; otherwise target tracking continues with steps two and three.

Step five: match the target to be tracked. Obtain new targets to be tracked $D_1, D_2, D_3, \ldots, D_i$ through the target detection algorithm; compute the candidate target feature models $M_1, M_2, M_3, \ldots, M_i$; compute the feature response matrix between each candidate target feature model $M_i$ and the target feature model $M_0$ saved in step four; take the target feature model corresponding to the maximum peak response as the initial tracking model, and continue executing the tracking algorithm of step two;
Step six: repeat steps two to five to realize target detection and tracking based on image sharpness and tracking-stability conditions.
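Taken together, steps one to six amount to a detect-track loop with a sharpness-gated model update and an APCE-gated re-detection branch. The following Python sketch shows one possible shape of that loop; the detector, feature extractor, response computation, APCE, sharpness measure, and learning-rate rule are injected as callables, and all threshold values are illustrative assumptions rather than values from the patent.

```python
import numpy as np

def track(frames, detect, features, respond, apce, sharpness, adapt_rate,
          T_detect=0.5, T_resp=0.3, T_apce=10.0, T_c=0.5):
    # Step one: keep detections whose confidence exceeds T_detect.
    cands = [d for d in detect(frames[0]) if d["score"] > T_detect]
    box = cands[0]["box"]                       # tracked target (e.g. operator-chosen)
    model = features(frames[0], box)            # target feature model, first frame
    c_prev = sharpness(frames[0], box)          # target-region sharpness C
    for frame in frames[1:]:
        resp, box = respond(model, frame, box)  # step two: response peak = new center
        if resp.max() < T_resp and apce(resp) < T_apce:
            # Steps four/five: low confidence -> freeze the model (M0),
            # re-detect, and re-match candidates by their response peak.
            cands = [d for d in detect(frame) if d["score"] > T_detect]
            peaks = [respond(model, frame, d["box"])[0].max() for d in cands]
            box = cands[int(np.argmax(peaks))]["box"]
            continue
        # Step three: sharpness-adaptive learning rate, 0 below threshold T_c.
        c_cur = sharpness(frame, box)
        l = 0.0 if c_cur < T_c else adapt_rate(c_cur - c_prev)
        model = (1 - l) * model + l * features(frame, box)   # formula (3)
        c_prev = c_cur
    return box
```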
The invention realizes real-time tracking and shooting of a long-distance high-speed moving target through the intelligent tracking shooting relay system: the panoramic tracking camera picture is used to detect the target in a large field of view and perform primary tracking, the professional close-up camera picture is then used to detect the target again and correct the rotation angle of the two-dimensional stabilized platform, further improving the precision of the tracking shooting system in long-distance scenes; combined with the real-time video quality evaluation method and intelligent switching and editing of video signals, multiple tracking shooting systems can shoot the same target cooperatively.
Beneficial effects:
1. The intelligent tracking shooting system for a long-distance high-speed moving target disclosed by the invention takes a panoramic camera as the visual sensor, integrates an adaptive target detection and tracking technique with a vehicle-mounted two-axis stabilized platform, and, within the target tracking technique, combines in closed loop the dual mechanism of the global camera guiding the close-up camera's tracking shooting and the close-up camera's own detection, thereby improving the tracking precision of the whole system and ensuring the target stays at the center of the video picture.
2. The disclosed system adjusts the update learning rate of the tracking model with an image sharpness evaluation function, which effectively alleviates the susceptibility of the target tracking algorithm to interference when the target's moving speed changes too quickly; it judges the confidence of the model with the average peak-to-correlation energy and the feature response peak, and when the confidence is low it stops updating the model and starts a target re-detection mechanism, effectively alleviating failure of the target feature model when the target is occluded or disturbed, so that stable tracking can be achieved in complex environments.
3. The disclosed system establishes a combined detection-tracking working mechanism by coupling the adaptive target tracking model with an APCE-based target re-detection mechanism, effectively improving the tracking precision and stability of the tracking algorithm; the model is updated adaptively based on the sharpness of the target region, which effectively improves system tracking precision when the target's moving speed changes greatly or the image is blurred.
4. The disclosed system realizes real-time tracking and shooting of a long-distance high-speed moving target through the intelligent tracking shooting relay system: the panoramic tracking camera picture detects the target in a large field of view and performs primary tracking, the professional close-up camera picture then detects the target again and corrects the rotation angle of the two-dimensional stabilized turntable, further improving precision in long-distance scenes; combined with real-time video quality evaluation and intelligent switching and editing of video signals, multiple tracking shooting systems can shoot the same target cooperatively.
5. When multiple videos are output, the disclosed system evaluates the multi-channel video streams in real time and switches them automatically through the real-time video stream intelligent switching method, outputs stable video pictures in real time, realizes stable follow-shooting and real-time editing of high-speed moving targets in harsh environments and complex scenes, and can provide good technical support for subsequent extended applications such as remote broadcasting, cloud reconstruction, and personalized rebroadcasting.
Drawings
FIG. 1 is a block diagram of the overall structure of an intelligent tracking shooting system for a remote high-speed moving target according to the present invention;
FIG. 2 is a schematic view of the high-precision two-dimensional stabilized turntable;
FIG. 3 is a schematic diagram of coordinate transformation of a turning angle relationship between a professional-grade close-up video camera and a panoramic tracking camera;
FIG. 4 is a schematic diagram of an image-based visual servoing system;
FIG. 5 is a diagram of a display control system power module, a display control box module and a tracker module;
FIG. 6 is a system communication signal flow diagram;
fig. 7 is a flow chart of real-time video stream intelligent switching.
Detailed Description
The invention is described in detail below with reference to the figures and specific embodiments. It should be noted that the described examples are intended only to facilitate understanding of the invention and in no way limit it.
The overall hardware design block diagram of the invention is shown in Fig. 1; the system consists of a two-dimensional stabilized turntable, a panoramic tracking camera, a professional close-up video camera, a turntable controller module, and a display control system.
the panoramic tracking camera is fixed outside the stable platform, the professional close-up camera is fixed on the stable platform, and a three-dimensional graph of the corner relation coordinate conversion principle of the professional close-up camera and the panoramic tracking camera is shown in figure 3 and can be determined by the following mathematical principles:
Figure BDA0003288762740000061
the coordinate of any point P in the space in the rectangular coordinate system is (x)w,yw,zw) The coordinates in the spherical coordinate system are set as (rho, alpha, beta), A is a rotation matrix, and rho is the distance from P to the coordinate origin, wherein the rotation matrix is a coordinate change coefficient matrix of the corresponding relation of the coordinate positions in the panoramic tracking camera and the close-up camera every time the close-up camera rotates. α represents the pitch angle of the rotating camera and β represents the azimuth angle. To better understand the position relationship of points in space in two cameras, the position relationship can be respectively OwAnd OqAnd establishing a three-dimensional coordinate system of the two cameras as an origin. Since the three-dimensional rectangular coordinate of the fixed position camera is needed in the subsequent calculation, O is usedwEstablishing a three-dimensional rectangular coordinate system for an origin; the rotation angle of the rotating camera is OqAnd establishing a spherical coordinate system for the origin. For any point P in space, the coordinate in the rectangular coordinate system is (x)w,yw,zw) The coordinates in the spherical coordinate system are (ρ, α, β). If (x, y) and (alpha) are to be determinedβ), the positional correspondence of the point P in the two-camera coordinate system is first determined. In general, P to OqIs far greater than OwTo OqThus in practical application, OwTo OqIs negligible, i.e. these two points can be considered as one point. Therefore, the translation matrix can be ignored in the coordinate conversion, and only the rotation matrix relation of the two cameras needs to be calculated. In order to determine the transformation relationship between (x, y) and (α, β), it is also necessary to know the transformation relationship between the three-dimensional coordinates of the point P in space and the two-dimensional image plane coordinates, which are determined according to the following formula using the similarity relationship of triangles according to the imaging principle shown in fig. 3:
$$x = f\,\frac{x_w}{z_w}, \qquad y = f\,\frac{y_w}{z_w}$$

where f is the focal length of the panoramic tracking camera. Since calculations in a video frame typically use pixel units, the close-up-camera-to-panoramic-camera rotation-angle coordinate pixel transform is determined according to the following formula:
$$u = f_x\,\frac{x_w}{z_w} + c_x, \qquad v = f_y\,\frac{y_w}{z_w} + c_y$$

where $(c_x, c_y)$ is the offset of the point P in OXY. The conversion of any point's coordinates in the panoramic tracking camera into the angle coordinates of the rotating platform is determined according to the following formula:
$$\beta = \arctan\!\left(\frac{u - c_x}{f_x}\right), \qquad \alpha = \arctan\!\left(\frac{v - c_y}{f_y}\right)$$

where $f_x$ and $f_y$ are the unit focal lengths along the axes of the image coordinate system OXY; $f_x$, $f_y$, $c_x$, $c_y$ are camera intrinsic parameters and can be obtained, with distortion correction, through Zhang's calibration before the camera is used.
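As an illustration of the last formula, a hedged sketch of the pixel-to-angle conversion is given below; the intrinsic values are made-up examples, assumed to come from Zhang's calibration.

```python
import math

def pixel_to_angles(u: float, v: float, fx: float, fy: float,
                    cx: float, cy: float):
    beta = math.atan((u - cx) / fx)    # azimuth command for the rotating platform
    alpha = math.atan((v - cy) / fy)   # pitch command for the rotating platform
    return alpha, beta

# Example: target detected at pixel (2200, 900) in a 3840x2160 panoramic frame,
# with illustrative intrinsics fx = fy = 2800 and principal point (1920, 1080).
alpha, beta = pixel_to_angles(2200, 900, 2800.0, 2800.0, 1920.0, 1080.0)
```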
The design drawing and physical drawing of the two-dimensional stabilized turntable are shown in Fig. 2. The turntable is a high-precision direct-drive servo turntable: a direct-drive motor drives the load directly to realize angle tracking and positioning, its built-in encoder detects the rotation angle of the servo motor in real time, and a closed-loop control system is constructed to achieve accurate closed-loop control of position, speed, and acceleration. Direct drive removes the former dependence on mechanical conversion devices for converting the rotary motion of a motor, and overcomes the long transmission chain, large volume, low efficiency, and poor precision of traditional mechanical conversion mechanisms.
The display control system comprises: the system comprises an image acquisition card, a software control system and a display system; the image acquisition card is used for acquiring real-time image data of the panoramic tracking camera and the professional close-up video camera. The software control system is used for processing the real-time image data acquired by the image acquisition card and realizing the functions of target detection, target tracking and real-time video stream intelligent switching; the software control system comprises a central control unit for carrying an upper computer program, a video tracker module for realizing target detection and tracking and a power supply module; and the video tracker is responsible for tracking, processing and calculating the video stream transmitted by the wide-angle tracking camera.
The display system is provided with two display screens, which can simultaneously display the wide-angle picture of the panoramic tracking camera and the close-up picture of the professional close-up camera in split screen, making the system convenient to use and observe.
The console of the central control unit is provided with two aviation-plug interfaces: one is an external interface for power supply and communication, the other an independent communication interface. On the left side of the console is a mouse-and-keyboard operation panel used to control the industrial personal computer; on the right side is the console's main operation panel, which carries a joystick and a key area. A design program is burned into the control board using the control protocol matched to the platform, so the key functions on the panel can be designed. The joystick contains two potentiometers, i.e. slide rheostats: one corresponds to up-down control and the other to left-right control, governing the azimuth and pitch of the rotary platform. Command protocols are sent to the turntable controller module to rotate the two-dimensional stabilized turntable, and the controller module can also return real-time data of the turntable, such as azimuth and pitch angles, to the console; all control protocols are sent according to the control protocol manual of the GDC-series photoelectric platform.
The central control unit provides automatic and manual tracking-target setting, manual and automatic focus and iris adjustment, linkage tracking, path-planning tracking, picture recording, and similar functions; path-planning tracking means that the stabilized platform shoots along the positions of pixel points preset on the tracking-image interface of the display-control box. Likewise, the zoom and focus keys on the console control changes of the recording camera's lens parameters, and the focal-length and focus parameters are returned to the console through the protocol. Pressing the manual/tracking toggle button on the console switches the working states of the video tracker and the console: in manual mode the rotation of the stabilized platform is commanded by the operator's joystick, while in tracking mode the rotation command comes from the miss-distance signal that the video tracker sends to the console, which then sends a control signal to the controller to rotate the platform. Finally, the console sends all parameter data, including platform angle parameters, professional close-up camera lens parameters, and tracker and console states, to the industrial personal computer for display through a communication protocol (the industrial personal computer receives console data on port COM2); in addition, because operating the industrial personal computer with the mouse is relatively slow, some functions can be invoked quickly for software operation by sending commands through the peripheral keys (received on port COM1).
The video tracker processes each frame of the video transmitted by the panoramic tracking camera and detects and tracks the target with a tracking method based on a correlation-filtering adaptive tracking algorithm. First the target is selected through the display-control box, and the pixel coordinates of the target center are transmitted to the video tracker. The tracker runs the adaptive tracking method to obtain the target-center pixel coordinates in the next frame and computes the miss distance, i.e. the horizontal and vertical pixel differences Δx and Δy between the tracked target center and the video frame center. The display-control box converts the miss distance into a rotation-angle instruction and sends it to the turntable controller module, which uses the miss distance as the error value of a PID automatic-control feedback loop: a control action that reduces the error is generated immediately and output as discrete electrical signals controlling the rotation angle and speed of the servo motor. Through continuous feedback correction the turntable tracks the target stably, and the recording camera thus achieves stable close-up shooting of the target.
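A simplified sketch of this miss-distance feedback follows; the PID gains, the 30 fps control period, and the rate-command interface are assumptions for illustration, not values from the patent.

```python
class PID:
    """Textbook PID controller; one instance per turntable axis."""
    def __init__(self, kp: float, ki: float, kd: float):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, err: float, dt: float) -> float:
        self.integral += err * dt
        deriv = (err - self.prev_err) / dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

pan_pid = PID(0.02, 0.001, 0.005)    # azimuth axis (illustrative gains)
tilt_pid = PID(0.02, 0.001, 0.005)   # pitch axis (illustrative gains)

def control_step(target_xy, frame_wh, dt=1 / 30):
    # Miss distance: pixel offsets between target center and frame center.
    dx = target_xy[0] - frame_wh[0] / 2
    dy = target_xy[1] - frame_wh[1] / 2
    # Rate commands (e.g. deg/s) sent to the turntable controller module.
    return pan_pid.step(dx, dt), tilt_pid.step(dy, dt)
```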
The overall flow is as follows: the panoramic tracking camera synchronously transmits image data to the video tracker, which performs target detection on each frame and transmits the detection result data to the software control system. Once the detection-tracking mode is started, the central control unit immediately sends a tracking instruction to the tracker, which runs the adaptive tracking-detection algorithm, tracks the target, and returns in real time the angle through which the two-dimensional stabilized turntable must rotate; the central control unit then sends control instructions to the turntable controller module so the stabilized rotary platform rotates to follow the target. Meanwhile the professional close-up camera shoots a close-up picture of the moving target; the tracker performs secondary target detection on the recorded picture, finely adjusts the rotation angle of the stabilized platform to correct the position of the moving target in the picture, and simultaneously adjusts optical zoom parameters such as the focal length and aperture of the recording camera to achieve the best rebroadcast shooting effect.
The target detection and tracking method based on image sharpness and tracking-stability conditions is implemented through the following specific steps:

Step one: detect the video content with a target detection algorithm to obtain candidate targets to be tracked and their target regions; screen the candidates by comparing each candidate's confidence with the preset threshold $T_{detect}$, and form the n screened targets to be tracked into a target set $D = \{D_1, D_2, \ldots, D_i, \ldots, D_n\}$, where $D_i$ denotes the i-th target to be tracked; store the center position and bounding-box information of each target at the same time.

Specifically, in this embodiment the YOLOv4 target detection algorithm is used to detect the video images frame by frame, obtaining the candidate targets to be tracked and their target regions (rectangles) on each frame. The n targets to be tracked whose confidence scores are larger than the set threshold $T_{detect}$ form the target set $D = \{D_1, D_2, \ldots, D_i, \ldots, D_n\}$, where $D_i$ denotes the i-th target to be tracked. The center position and size information of each target region to be tracked are stored at the same time: the center-position set is $P = \{P_1(x_1, y_1), P_2(x_2, y_2), P_3(x_3, y_3), \ldots, P_i(x_i, y_i)\}$ and the bounding-box size set is $S = \{S_1(w_1, h_1), S_2(w_2, h_2), S_3(w_3, h_3), \ldots, S_i(w_i, h_i)\}$, where $x_i$ is the pixel abscissa of the center point of the target region, $y_i$ its pixel ordinate, and $w_i$ and $h_i$ the width and height of the target region.
In addition, in the present embodiment, the target may also be detected according to the region of interest calibrated in advance.
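A small sketch of the step-one screening follows, assuming detections arrive as (score, cx, cy, w, h) tuples from a detector such as YOLOv4; the threshold value is a placeholder.

```python
def screen_candidates(detections, T_detect=0.5):
    D, P, S = [], [], []                  # target set, center set, size set
    for score, cx, cy, w, h in detections:
        if score > T_detect:              # confidence screening
            D.append((score, cx, cy, w, h))
            P.append((cx, cy))            # center position P_i(x_i, y_i)
            S.append((w, h))              # bounding-box size S_i(w_i, h_i)
    return D, P, S
```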
Step two: determine a tracked target from the candidate target set $D$ obtained in step one, extract features from the content of the tracked target's bounding box in the first frame, and take the resulting feature matrix $x$ as the target feature model $M_{\mathrm{cur}}$; compute the target region of the tracked target in the first frame image with the Tenengrad function to obtain the sharpness $C_{\mathrm{cur}}$ of the target region. Then apply the correlation filtering method to compute the target feature response matrix between the first frame and the current frame; the position of the response peak is the center position of the current-frame target. Further extract the feature matrix $x_{\mathrm{cur}}$ inside the current-frame target region, then update the model according to the change between the previous-frame target feature matrix $x_{\mathrm{cur}-1}$ and $x_{\mathrm{cur}}$, so as to better adapt to changes of the tracked target:

$$M_{\mathrm{cur}} = (1 - l)\,M_{\mathrm{cur}-1} + l\,x_{\mathrm{cur}} \tag{6}$$

where $l$ is the learning rate.
In this embodiment, the HOG feature, the color histogram feature, and the grayscale feature of the target to be tracked are extracted; the three feature vectors are column-vectorized and concatenated longitudinally to form

$$x = \left[\,\mathrm{HOG};\; P;\; Q\,\right]$$

where HOG denotes the HOG feature of the candidate target, P its color histogram feature, and Q its grayscale feature. The correlation-filter template then computes

$$f(z) = x^{T} z \tag{7}$$

$$\hat{\alpha} = \frac{\hat{y}}{\hat{k}^{xx} + \lambda} \tag{8}$$

$$\hat{f}(z) = \hat{k}^{xz} \odot \hat{\alpha} \tag{9}$$

where z is the feature matrix of the next frame image, $f(z)$ is the feature response matrix, $k^{xz}$ is the kernel correlation function, $\hat{k}^{xz}$ is the representation of the kernel correlation function in the frequency domain, $\hat{\alpha}$ is the representation of the nonlinear coefficients in the frequency domain ($\hat{y}$ being the frequency-domain regression label and $\lambda$ a regularization coefficient), and $\hat{f}(z)$ denotes the feature response matrix function computed in the frequency domain.
After the feature response matrix function of the adjacent-frame targets is computed each time, the existing model is updated:

$$\hat{x}_{t} = (1 - l)\,\hat{x}_{t-1} + l\,x_{t} \tag{10}$$

$$\hat{\alpha}_{t} = (1 - l)\,\hat{\alpha}_{t-1} + l\,\alpha_{t} \tag{11}$$

where $\hat{x}$ is the observation model and $l$ is the learning rate.
Step three: adaptively update the target feature model. Compute the image sharpness $C_{\mathrm{cur}}$ of the current-frame target region in real time, and adjust the learning rate of the model update by computing the difference between the previous sharpness and the current-frame sharpness.

Gradient values of the image I in the horizontal and vertical directions are extracted with the Sobel operator, from which the Tenengrad sharpness value is computed:

$$G_x = \begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}, \qquad G_y = \begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix} \tag{12}$$

$$S(x, y) = \sqrt{\left(G_x * I(x, y)\right)^{2} + \left(G_y * I(x, y)\right)^{2}} \tag{13}$$

$$C = \frac{1}{n}\sum_{x}\sum_{y} S(x, y)^{2} \tag{14}$$

where $G_x$ and $G_y$ are the Sobel convolution kernels in the horizontal and vertical directions, $S(x, y)$ is the gradient expression at point $(x, y)$, and n is the total number of pixels in the evaluation region.

The learning rate of the model is then computed from the Tenengrad value (formula (15)): the learning rate $l$ of the current-frame feature-model update is derived from the base learning rate $L_{\mathrm{base}}$ and the difference between the current-frame target-region sharpness $C_{\mathrm{cur}}$ and the previous-frame target-region sharpness $C_{\mathrm{cur}-1}$; $T_c$ is a sharpness threshold, and if the sharpness falls below the threshold $T_c$ the learning rate is immediately set to 0 and updating of the target feature model stops, avoiding contamination of the model.

The sharpness threshold is $T_c = 0.5$ and the base learning rate is $L_{\mathrm{base}} = 0.02$.
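For illustration, a sketch of the Tenengrad computation and the sharpness-gated learning rate follows; because formula (15)'s exact adjustment is given only as an image in the source, the non-zero branch below simply returns the base learning rate.

```python
import cv2
import numpy as np

def tenengrad(region: np.ndarray) -> float:
    gray = cv2.cvtColor(region, cv2.COLOR_BGR2GRAY)
    gx = cv2.Sobel(gray, cv2.CV_64F, 1, 0, ksize=3)   # horizontal Sobel G_x
    gy = cv2.Sobel(gray, cv2.CV_64F, 0, 1, ksize=3)   # vertical Sobel G_y
    s2 = gx ** 2 + gy ** 2                            # S(x, y)^2
    return float(s2.mean())                           # (1/n) * sum of S(x, y)^2

def learning_rate(c_cur: float, c_prev: float,
                  L_base: float = 0.02, T_c: float = 0.5) -> float:
    if c_cur < T_c:
        return 0.0          # stop updating: the model would be contaminated
    return L_base           # placeholder for the difference-based adjustment
```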
Step four: establish a target re-detection mechanism, computing the average peak-to-correlation energy (APCE) value from the target feature response matrix obtained in step two:

$$\mathrm{APCE} = \frac{\left|F_{\max} - F_{\min}\right|^{2}}{\operatorname{mean}\!\left(\sum_{x,y}\left(F_{x,y} - F_{\min}\right)^{2}\right)} \tag{16}$$

where $F_{\max}$ denotes the response peak, $F_{\min}$ the lowest response value, and $F_{x,y}$ the response value at position $(x, y)$ in the response map.

If the maximum response value and the APCE value are both smaller than the preset thresholds $T_{max\_respos}$ and $T_{APCE}$, the target feature model $M_0$ in the previous frame image is saved; once the target is occluded for several consecutive frames, updating of the target feature model stops and the target re-detection mode starts at the same time.
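A NumPy sketch of formula (16) and the re-detection trigger is given below; the threshold values are illustrative assumptions.

```python
import numpy as np

def apce(resp: np.ndarray) -> float:
    f_max, f_min = resp.max(), resp.min()
    return float(abs(f_max - f_min) ** 2 / np.mean((resp - f_min) ** 2))

def tracking_reliable(resp: np.ndarray,
                      T_max_respos: float = 0.3, T_apce: float = 10.0) -> bool:
    # Re-detection starts when both the peak and APCE fall below thresholds.
    return resp.max() >= T_max_respos or apce(resp) >= T_apce
```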
Step five: match the target to be tracked. Obtain a new target set to be tracked $D = \{D_1, D_2, D_3, \ldots, D_i\}$ through the YOLOv4 target detection algorithm. Compute the HOG features, color histogram features, and grayscale features of all candidate targets in the set D, column-vectorize the three feature vectors of each candidate, and concatenate them longitudinally to form

$$x_i = \left[\,\mathrm{HOG}_i;\; P_i;\; Q_i\,\right]$$

where $\mathrm{HOG}_i$ denotes the HOG feature of the i-th candidate target, $P_i$ its color histogram feature, and $Q_i$ its grayscale feature, giving the feature-model set $M = \{M_1, M_2, M_3, \ldots, M_i\}$. Apply the correlation filtering method to compute the feature response matrix between each candidate target feature model $M_i$ and the target feature model $M_0$ saved in step four. Take the target feature model $M_i$ corresponding to the maximum value of the peak response as the initial tracking model, and continue executing the tracking algorithm of step two to track the target.
Step six: repeat steps two to five to realize target detection and tracking based on image sharpness and tracking-stability conditions.
The real-time video stream intelligent switching method is specifically implemented as follows:

Step 1: establish an experiment sample set. The videos in the sample set come from a video quality evaluation database whose videos have average subjective values between 0 and 100 points; the videos are classified by average subjective value, i.e. videos whose average subjective value falls in the j-th score interval are denoted type $VT_j$, $j = 0, 1, 2, \ldots, N$.
A video quality evaluation library is constructed from 6 source videos in the LIVE database, 6 groups of collected lossless television sports-program videos, and 6 groups of original videos shot by the high-speed motion tracking shooting system. Each video has a resolution of 3840 × 2160 and 800 frames in total, with the FPS set to 30. The content of the videos in the database includes scenery, individual sports, etc.
Each group of videos shot by the high-speed motion tracking shooting system is divided into 10 short videos of 80 frames each. A Xiaomi L65M5-5A display was selected, and 50 subjects were recruited to participate in the experiment. Each subject first watches a lossless live sports video and then a group of high-definition videos shot in real scenes; the playing order of the videos is randomized, and a 3 s all-black video is inserted between videos as a transition. Each lossless live sports video together with a group of live-action test videos is called one group; there are 6 groups in total, played in random order. After watching each video the subject gives scores, until all 6 groups have been watched and scored; the scores cover both the perceived quality and the acceptability of the video, each ranging from 0 to 100. The perceived-quality and acceptability scores are weighted and averaged to obtain the video average subjective value, which is then used as the video's label; the video set and the corresponding labels together form the video quality evaluation library. The database is then sliced to serve as input to the video average-subjective-value regression prediction network and the classification network, as follows:
Each video is resized to 960 × 540 resolution and divided into video segments of size 3 × 8 × 960 × 540, producing 18 types of videos and 1800 video segments in total; 300 segments are uniformly selected as the test sample set and 1500 as the training sample set. The average subjective value of each video segment is that of the high-definition video it came from.
Step 2: construct the deep convolutional neural network model. First construct the pre-classification network, which classifies the type $VT_j$ corresponding to the video; it comprises 6 3D convolution layers, 4 3D max-pooling layers, 2 fully connected layers, and a 10-way classification output layer, with the classification result obtained after a softmax computation. The loss function is the cross-entropy

$$L = -\left[\, y \log \hat{y} + (1 - y)\log\left(1 - \hat{y}\right) \,\right] \tag{17}$$

where $y$ denotes the average subjective value and $\hat{y}$ the predicted score.
The input to the network is a video clip of size 3 × 8 × 960 × 540. The 6 3D convolution layers are Conv1, Conv2, Conv3a, Conv3b, Conv4a, and Conv4b; the 4 3D max-pooling layers are Pool1, Pool2, Pool3, and Pool4; and the two fully connected layers are FC5 and FC6.

Conv1 denotes the first convolution layer, with kernel size 3 × 3 × 3; D. Tran et al. propose that 3 × 3 × 3 convolution kernels are the most effective [6], so the kernel size of all following convolution layers is also 3 × 3 × 3. 64 convolution kernels are used, with edge filling (padding) during convolution, so convolving the input layer yields a 64 × 8 × 960 × 540 feature map.

Pool1 denotes the first pooling layer, with kernel size 1 × 3 × 3; the 1 in the time domain avoids merging temporal signals too early while preserving the 8-frame length of the video input. The feature map output after this pooling layer is 64 × 8 × 320 × 180.

Conv2 denotes the second convolution layer, with kernel size 3 × 3 × 3, using 128 convolution kernels and edge padding during convolution; the output after convolution is a 128 × 8 × 320 × 180 feature map.

Pool2 denotes the second pooling layer, with kernel size 2 × 3 × 3, where merging of signals in the time domain begins; the feature map output after this pooling layer is 128 × 4 × 107 × 60.

Conv3a denotes the third convolution layer, with kernel size 3 × 3 × 3, using 256 convolution kernels and edge padding during convolution; the output after convolution is a 256 × 4 × 107 × 60 feature map.

Conv3b denotes the fourth convolution layer, with kernel size 3 × 3 × 3, using 256 convolution kernels and edge padding during convolution; the output after convolution is a 256 × 4 × 107 × 60 feature map.

Pool3 denotes the third pooling layer, with kernel size 2 × 2 × 2; the feature map output after this pooling layer is 256 × 2 × 54 × 30.

Conv4a denotes the fifth convolution layer, with kernel size 3 × 3 × 3, using 512 convolution kernels and edge padding during convolution; the output after convolution is a 512 × 2 × 54 × 30 feature map.

Conv4b denotes the sixth convolution layer, with kernel size 3 × 3 × 3, using 512 convolution kernels and edge padding during convolution; the output after convolution is a 512 × 2 × 54 × 30 feature map.

Pool4 denotes the fourth pooling layer, with kernel size 2 × 3 × 3; the feature map output after this pooling layer is 512 × 1 × 18 × 10.

FC5 denotes the first fully connected layer, with an input of 512 × 1 × 18 × 10 = 92160, an output of 46080, and a dropout of 0.5.

FC6 denotes the second fully connected layer, with an input of 46080, an output of 23040, and a dropout of 0.5. The output layer outputs a one-dimensional array representing the ten classifications.
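The layer-by-layer description above corresponds to a C3D-style network. A PyTorch sketch is given below under stated assumptions: ReLU activations and pooling strides equal to the kernel sizes (with ceiling rounding) are assumed, since the text does not specify them; with those assumptions the feature-map sizes reproduce the ones listed above.

```python
import torch
import torch.nn as nn

class PreClassificationNet(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        def conv(cin, cout):  # 3x3x3 conv with edge padding, as in the text
            return nn.Sequential(nn.Conv3d(cin, cout, 3, padding=1),
                                 nn.ReLU(inplace=True))
        self.features = nn.Sequential(
            conv(3, 64),                                # Conv1 -> 64x8x960x540
            nn.MaxPool3d((1, 3, 3), ceil_mode=True),    # Pool1 -> 64x8x320x180
            conv(64, 128),                              # Conv2
            nn.MaxPool3d((2, 3, 3), ceil_mode=True),    # Pool2 -> 128x4x107x60
            conv(128, 256), conv(256, 256),             # Conv3a, Conv3b
            nn.MaxPool3d((2, 2, 2), ceil_mode=True),    # Pool3 -> 256x2x54x30
            conv(256, 512), conv(512, 512),             # Conv4a, Conv4b
            nn.MaxPool3d((2, 3, 3), ceil_mode=True),    # Pool4 -> 512x1x18x10
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(512 * 1 * 18 * 10, 46080),        # FC5
            nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(46080, 23040),                    # FC6
            nn.ReLU(inplace=True), nn.Dropout(0.5),
            nn.Linear(23040, num_classes),              # 10-way output layer
        )

    def forward(self, clip: torch.Tensor) -> torch.Tensor:
        # clip: (B, 3, 8, 960, 540); softmax is applied inside the loss.
        return self.classifier(self.features(clip))
```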
A video average-subjective-value regression prediction network is then constructed to predict the video average subjective value y; its structure replaces the N-way classification output layer of the pre-classification network with a regression prediction node. The weight and bias parameters of the pre-classification network are loaded through transfer learning, including the parameters of all convolution and pooling layers, while the fully-connected-layer parameters of the pre-classification network are discarded. The loss function is the mean square error (MSE):

$$\mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2 \tag{18}$$

where N denotes the total number of video segments. The pre-classification network and the video average-subjective-value regression prediction network together form the deep convolutional neural network model.
Step 3: train the deep convolutional neural network model; after initialization, input the training sample set into the model constructed in step 2 for training. Iteratively optimize the model with stochastic gradient descent, computing the loss function and gradients of the deep convolutional network after each iteration and optimizing the weights and bias terms of the network, so as to find the optimal deep convolutional neural network model for the current training.
When training the pre-classification network, 100 segments are read randomly from the 1500 training-set video segments as input, and the classification predictions of those 100 video segments are combined with the loss function to participate in training; the number of training iterations is 10000. Using transfer learning, the weights and bias terms of the pre-classification network serve as the pre-training model of the video average-subjective-value regression prediction network, which is then trained to give the predicted value of the video average subjective value.
In the regression prediction network, for each input batch of 100 clips, 4 source videos are randomly selected from the 18 classes of high-definition videos, 25 clips are randomly drawn from the segments of each, and these 100 clips are fed into the video average subjective value regression prediction network to obtain their prediction scores. Rather than putting all 100 prediction scores into the loss function, the predictions of the 25 clips from each source video are averaged, giving 4 per-video averages, and only these 4 average scores enter the loss function for training optimization. This reduces the influence of poorly trained outliers on the loss function, strengthens its convergence, and improves the training efficiency of the neural network.
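A sketch of that class-averaged loss, assuming `segments[k]` holds the clip tensors cut from source video k (18 source videos) and `mos[k]` is that video's average subjective value; the function name is ours:

```python
import random
import torch

def class_averaged_mse(model, segments, mos):
    chosen = random.sample(range(len(segments)), 4)   # 4 of the 18 videos
    per_class_losses = []
    for k in chosen:
        clips = random.sample(segments[k], 25)        # 25 clips per video
        batch = torch.stack(clips)                    # 25 x C x T x H x W
        mean_pred = model(batch).mean()               # average the 25 scores
        per_class_losses.append((mean_pred - mos[k]) ** 2)
    # Only the 4 per-video averages enter the loss, damping outlier clips.
    return torch.stack(per_class_losses).mean()
```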
Step 4: switch the video stream signals in real time. Judge whether each channel of the multi-channel video contains the tracked target; when a channel contains the tracked target and the target's visible area exceeds the set threshold d, input the video into the regression prediction network trained in step 2 for real-time visual quality prediction of the multi-channel video data, select the video stream ID with the highest visual quality, and output that video stream signal to the display control box screen.
The intelligent video switching method takes the following 3 factors into consideration:
(1) Visibility V_i. V_i indicates whether the picture of video stream C_i contains the target O: V_i = 0 means O is not captured; V_i = 1 means O is detected in the surveillance video image.
(2) Visible area S_i. S_i is the pixel area occupied by the target O in the image of video stream C_i.
(3) Video mean subjective value VQA_i. VQA_i is the picture quality score of video stream C_i.
The intelligent video switching method first compares the visible areas of the targets that are visible; when a target's visible area exceeds the set threshold d, the comparison of video quality evaluation scores is introduced. At each moment, the dual-stream switching proceeds as follows:
(1) For the tracked target O, first calculate the target visibility V_i from O's center coordinates in the two video channels, thereby screening the cameras that can detect the target, and store their video stream IDs in the set {C}.
(2) If the set {C} is empty, perform no scheduling task and return a null value. If {C} contains only one element C_i, i.e. only camera C_i detects the target, return C_i. If {C} contains 2 elements, go to step (3).
(3) Calculate the pixel area S_i, i.e. the visible area, of the tracked target O in every video stream in {C}, compare it with the set threshold, and store the video stream IDs whose target picture exceeds the threshold in the set {S}.
(4) Input the video pictures corresponding to the streams in {S} into the video quality evaluation network trained in step 2 for calculation, and return the video stream ID with the highest video quality.
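The four steps above can be condensed into a small routine. A hedged sketch, assuming `detect(frame)` returns the target's bounding box (x, y, w, h) or None and `vqa(frame)` calls the trained quality regression network; the fallback when no stream exceeds d is our assumption, since the text does not specify that case:

```python
def select_stream(frames, detect, vqa, d):
    # Step (1): visibility V_i -- keep streams whose picture contains O.
    candidates = {cid: detect(f) for cid, f in frames.items()}
    candidates = {cid: box for cid, box in candidates.items() if box is not None}
    if not candidates:                 # step (2): {C} empty, no scheduling
        return None
    if len(candidates) == 1:           # step (2): only one camera sees O
        return next(iter(candidates))
    # Step (3): visible area S_i = w * h, compared with threshold d.
    big = {cid: box for cid, box in candidates.items()
           if box[2] * box[3] > d}
    pool = big or candidates           # assumed fallback if {S} is empty
    # Step (4): return the stream ID with the highest quality score.
    return max(pool, key=lambda cid: vqa(frames[cid]))
```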
Generally, the larger the pixel area (visible area) the target occupies in the image and the higher the video quality, the clearer the target appears in the shot, and that stream can be judged the one with the best viewing experience of the target. The multi-camera cooperative target tracking method presented in this section therefore selects as the final output the camera with the highest video quality and the larger target pixel area in its monitored image, and schedules that camera to undertake the target tracking task, achieving the goal of automatic, seamless dual-camera cooperative tracking and rebroadcast for the intelligent video surveillance system.
With this visual-servo high-speed moving-target tracking camera system, the tracking camera is responsible for following the target while the broadcast camera fixed on the stabilized platform system captures action close-ups of the athlete. Meanwhile, a secondary target detection algorithm, a deep learning algorithm, further fine-tunes the position of the tracked target within the broadcast picture: by learning from competition broadcast footage and adjusting the azimuth of the stabilized platform together with the optical zoom parameters of the broadcast camera, such as focal length and aperture, the tracked target is kept at the optimal position in the broadcast picture with the best presentation.
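As an illustration of that fine adjustment, a minimal control sketch follows; `detect`, the `gimbal`/`lens` interfaces, the proportional gains, and the fill-ratio targets are all assumptions, not the patent's implementation:

```python
def refine_closeup(frame, detect, gimbal, lens,
                   k_pan=0.02, k_tilt=0.02, target_fill=0.25):
    box = detect(frame)                  # secondary detection on the close-up
    if box is None:
        return
    x, y, w, h = box
    fh, fw = frame.shape[:2]             # frame height/width in pixels
    dx = (x + w / 2) - fw / 2            # target offset from frame center
    dy = (y + h / 2) - fh / 2
    gimbal.nudge(pan=k_pan * dx, tilt=k_tilt * dy)   # small azimuth/pitch step
    fill = (w * h) / (fw * fh)           # fraction of the frame the target fills
    if fill < 0.8 * target_fill:
        lens.zoom_in()                   # target too small: lengthen focal length
    elif fill > 1.2 * target_fill:
        lens.zoom_out()                  # target too large: widen the shot
```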
Field tests at the National Alpine Skiing Centre, shooting alpine skiers, show that the invention can output stable and clear broadcast-grade pictures in real time.
In summary, the above description is only a preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (6)

1. An intelligent tracking shooting system for a remote high-speed moving target, characterized in that: the system comprises an optical system, a stable platform system and a display control system;
the optical system comprises a panoramic tracking camera and a professional close-up video camera; the optical system is fixed on the stable platform system; the optical system is used for capturing and tracking a target to be shot in a large-range panorama and shooting a close-up picture of a distant target;
the stable platform system comprises a two-dimensional stable rotary table and a rotary table controller module which is matched with and controls the stable platform; the two-dimensional stabilizing turntable is used for loading the optical system; the turntable controller module is used for driving the two-dimensional stable platform to rotate so that the optical system can complete the function of tracking shooting;
the display control system comprises: the system comprises an image acquisition card, a software control system and a display system;
the image acquisition card is used for acquiring real-time image data of the panoramic tracking camera and the professional close-up video camera;
the software control system is used for processing the real-time image data acquired by the image acquisition card and realizing the functions of target detection, target tracking and real-time video stream intelligent switching;
the software control system comprises a central control unit for carrying an upper computer program, a video tracker module for realizing target detection and tracking and a power supply module;
the software control system sends instructions through the central control unit to realize the system power switch, switching between manual and automatic target detection, adjustment of the close-up camera's focal length and aperture, two-dimensional steering of the stable platform, image processing, target tracking, path-planned shooting, image display and picture recording; the image processing comprises display parameter adjustment and parameter character embedding for the pictures of the panoramic tracking camera and the professional close-up video camera; path-planned shooting means that N pixel points are preset in the global tracking picture to form a track, so that the two-dimensional stable platform performs tracking shooting along that track; target tracking means that the central control unit delimits a detection area in the picture of the panoramic tracking camera, a button switches between the manual target selection mode and the automatic target detection mode, and the target detection and tracking method is used for target detection and tracking, so that the stable platform rotates along with the target and the target is always presented in the close-up picture of the professional close-up camera; then the target in the close-up picture is detected and tracked a second time by the adaptive detection and tracking algorithm, and the position of the tracked target in the close-up picture together with the focal length and aperture of the close-up camera are finely adjusted, so that the target is at the optimal position of the picture;
the display system is used for displaying the close-up picture of the panoramic tracking camera and a distant target and simultaneously displaying the information of the two-dimensional stable turntable and the optical system;
the optical system is connected with a video tracker module in the display control system, and the stable platform system is connected with the display control system.
2. The intelligent tracking shooting system for the remote high-speed moving target as claimed in claim 1, characterized in that: the system further comprises a second optical system and stable platform system, together referred to as the secondary acquisition system; the images acquired by the secondary acquisition system are processed by the image acquisition card and the software control system, the target is detected and tracked by the target detection and tracking method, the target in the close-up picture is then detected and tracked a second time by the adaptive detection and tracking algorithm, and the position of the tracked target in the close-up picture together with the focal length and aperture of the close-up camera are finely adjusted, so that the target is at the optimal position of the picture; the multi-channel video streams are evaluated and automatically switched by the real-time video stream intelligent switching method, and the video picture with the better viewing experience is output in real time by the display system.
3. The intelligent tracking shooting system for the remote high-speed moving target as claimed in claim 2, characterized in that: the real-time video stream intelligent switching method evaluates and automatically switches multiple video streams, and comprises the following steps:
step one: construct a database consisting of the LIVE database, newly acquired high-definition videos, and the video average subjective value corresponding to each video; classify the videos in the database according to the average subjective value, noting the videos whose average subjective value falls in the j-th score interval as type VT_j, j = 0, 1, 2, …, N; randomly extract samples from the database to establish an experimental sample set;
step two of the real-time video stream intelligent switching method: construct the deep convolutional neural network model, which comprises a pre-classification network and a video average subjective value regression prediction network; the pre-classification network predicts the video type VT_j, i.e. the score interval in which the video average subjective value lies; the video average subjective value regression prediction network predicts the video average subjective value y;
step three: train the deep convolutional neural network model; after initialization, input the experimental sample set of step one into the model constructed in step two for training; iteratively optimize the model by stochastic gradient descent, computing the loss function and gradients of the deep convolutional network after each iteration and optimizing the weights and bias terms so as to seek the best deep convolutional neural network model for the current training;
step four: switch the video stream signals in real time; judge whether the multi-channel video contains the tracked target; when a video contains the tracked target and the target's visible area exceeds the set threshold d, input the video into the regression prediction network trained in step three for real-time visual quality prediction of the multi-channel video data, select the video stream ID with the highest visual quality, and output the video stream signal to the screen of the display system.
4. The intelligent tracking shooting system for the remote high-speed moving target as claimed in claim 3, characterized in that: step two of the real-time video stream intelligent switching method is implemented as follows:
to construct the deep convolutional neural network model, a pre-classification network is built first to predict the type VT_j corresponding to the input video; the pre-classification network comprises a plurality of 3D convolutional layers, 3D max-pooling layers, fully connected layers and an N-way classification output layer, and the classification result, i.e. the video type VT_j, is obtained after computation of the softmax function; the loss function uses the cross-entropy function, and the cross entropy L is calculated as follows:
$$L = -\sum_{i=1}^{N} y_i \log \hat{y}_i$$

where $y$ denotes the average subjective value label and $\hat{y}$ denotes the prediction score;
then the video average subjective value regression prediction network is constructed; this regression structure predicts the video average subjective value y, and its network structure replaces the N-way classification output layer of the pre-classification network with a regression prediction node; the weight and bias parameters of the convolutional and pooling layers in the pre-classification network are loaded through transfer learning, and the parameters of the fully connected layers in the pre-classification network are discarded; the loss function uses the mean square error (MSE) loss:
$$L = \frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2$$
wherein N denotes the total number of video segments; the pre-classification network and the video average subjective value regression prediction network together form the deep convolutional neural network model.
5. A method for implementing the target detection and tracking method based on the system of claim 1, characterized by comprising the following steps:
step one: detect the video content with a target detection method to obtain candidate targets to be tracked and their target areas; compare the confidence of each candidate target with a preset threshold T_detect to screen the candidates, the n screened targets to be tracked forming a target set D = {D_1, D_2, …, D_i, …, D_n}, wherein D_i denotes the i-th target to be tracked; store the center position and bounding box information of each target at the same time;
step two: for the tracking target determined from the target set D obtained in step one, extract features from the information inside the bounding box of the tracking target in the first frame to obtain a feature matrix, which is used as the target feature model M_0; calculate the area of the tracking target in the first-frame image to obtain the sharpness C_cur of the tracking target area;
starting from the second-frame image, calculate the feature response matrix between the previous-frame target and the current frame; the position of the response peak is the center position of the target in the current frame, from which the feature matrix M_cur inside the current-frame target area is obtained; then judge the change between the previous-frame target feature matrix M_0 and M_cur, and update the feature model so as to better adapt to changes of the tracking target:

$$M_0 \leftarrow (1 - l)\,M_0 + l\,M_{cur} \tag{3}$$

wherein l is the learning rate;
step three: substitute the learning rate l updated in real time into formula (3) to realize adaptive updating of the target feature model M_0; calculate the image sharpness C_cur of the current-frame target area in real time, and adjust the learning rate l used for model updating according to the difference between C_cur and C_{cur-1}, improving target tracking accuracy:

[learning-rate formula: l is computed from the base learning rate L_base and the change between C_cur and C_{cur-1}]

wherein C_{cur-1} denotes the sharpness value of the target area in the previous frame, L_base is the base learning rate, and T_c is a sharpness threshold; if the sharpness falls below the threshold T_c, the learning rate is immediately set to 0 and updating of the target feature model stops, so that the model is not contaminated;
step four: calculate the average peak correlation energy value APCE using the target feature response matrix obtained in step two:

$$APCE = \frac{\left| F_{max} - F_{min} \right|^{2}}{\operatorname{mean}\left( \sum_{x,y} \left( F_{x,y} - F_{min} \right)^{2} \right)}$$

wherein F_max denotes the response peak, F_min denotes the lowest response value, and F_{x,y} denotes the response value at position (x, y) in the response map;
when the response peak is smaller than the preset threshold T_max_response and the APCE value is smaller than the preset T_APCE, the tracking target is deemed disturbed and the target feature model M_0 from the previous-frame image is saved; if the target is disturbed for several consecutive frames, stop updating the target feature model, start the target re-detection mode, and execute step five; otherwise, continue the target tracking of steps two and three;
step five: match the target to be tracked; obtain new candidate targets D_1, D_2, D_3, …, D_i through the target detection algorithm; calculate the candidate target feature models M_1, M_2, M_3, …, M_i respectively; calculate the feature response matrix between each candidate target feature model M_i and the target feature model M_0 saved in step four; take the target feature model with the maximum peak response as the initial tracking model, and continue to execute the tracking algorithm of step two;
step six: repeat steps two to five to realize target detection and tracking based on image sharpness and tracking stability.
6. The method of claim 5, characterized in that: the intelligent tracking shooting and rebroadcast system is used to realize real-time tracking and shooting of a remote high-speed moving target; the panoramic tracking camera picture is used to detect the target in a large field of view and perform primary tracking, while the professional close-up camera picture is used to detect the target and correct the rotation angle of the two-dimensional stable platform, further improving the accuracy of the tracking shooting system in long-distance scenes; by combining the real-time video quality evaluation method with intelligent switching and clipping of the video signals, the multi-channel tracking shooting system can achieve cooperative shooting of the same target.
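For illustration only (not part of the claim language), the reliability test and adaptive update of claim 5 might be sketched as follows in Python/NumPy; the APCE expression follows the formula in step four, while the exact form of the learning-rate schedule is an assumption hedged in the comments:

```python
import numpy as np

def apce(response):
    # Average peak-to-correlation energy, per the formula in step four.
    f_max, f_min = response.max(), response.min()
    return (f_max - f_min) ** 2 / np.mean((response - f_min) ** 2)

def tracker_step(m_prev, m_cur, response, c_cur, c_prev,
                 l_base, t_c, t_max_response, t_apce):
    # Reliability test of step four: both the peak and the APCE below their
    # thresholds means the target is disturbed; freeze the model and flag
    # the re-detection mode of step five.
    if response.max() < t_max_response and apce(response) < t_apce:
        return m_prev, True
    # Adaptive learning rate of step three (assumed simplified form): freeze
    # the model when sharpness drops below T_c; the exact dependence on
    # c_cur - c_prev is given by the claim's learning-rate formula.
    l = 0.0 if c_cur < t_c else l_base
    return (1 - l) * m_prev + l * m_cur, False
```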
CN202111156436.6A 2021-09-10 2021-09-30 Intelligent tracking shooting system for long-distance high-speed moving target Active CN113838098B (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2021110608507 2021-09-10
CN202111060850 2021-09-10

Publications (2)

Publication Number Publication Date
CN113838098A true CN113838098A (en) 2021-12-24
CN113838098B CN113838098B (en) 2024-02-09

Family

ID=78967700

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111156436.6A Active CN113838098B (en) 2021-09-10 2021-09-30 Intelligent tracking shooting system for long-distance high-speed moving target

Country Status (1)

Country Link
CN (1) CN113838098B (en)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111696128A (en) * 2020-05-27 2020-09-22 南京博雅集智智能技术有限公司 High-speed multi-target detection tracking and target image optimization method and storage medium
CN111970434A (en) * 2020-07-22 2020-11-20 吉林省智擎工业软件研究院有限公司 Multi-camera multi-target athlete tracking shooting video generation system and method
CN112927264A (en) * 2021-02-25 2021-06-08 华南理工大学 Unmanned aerial vehicle tracking shooting system and RGBD tracking method thereof

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115171200A (en) * 2022-09-08 2022-10-11 深圳市维海德技术股份有限公司 Target tracking close-up method and device based on zooming, electronic equipment and medium
CN115171200B (en) * 2022-09-08 2023-01-31 深圳市维海德技术股份有限公司 Target tracking close-up method and device based on zooming, electronic equipment and medium
CN117768597A (en) * 2022-09-16 2024-03-26 广州开得联智能科技有限公司 Guide broadcasting method, device, equipment and storage medium
CN116184812A (en) * 2023-04-24 2023-05-30 荣耀终端有限公司 Signal compensation method, electronic equipment and medium
CN116258811A (en) * 2023-05-08 2023-06-13 北京德风新征程科技股份有限公司 Information transmission method, apparatus, electronic device, and computer-readable medium
CN116977902A (en) * 2023-08-14 2023-10-31 长春工业大学 Target tracking method and system for on-board photoelectric stabilized platform of coastal defense
CN116977902B (en) * 2023-08-14 2024-01-23 长春工业大学 Target tracking method and system for on-board photoelectric stabilized platform of coastal defense

Also Published As

Publication number Publication date
CN113838098B (en) 2024-02-09

Similar Documents

Publication Publication Date Title
CN113838098B (en) Intelligent tracking shooting system for long-distance high-speed moving target
CN108419014B (en) Method for capturing human face by linkage of panoramic camera and multiple capturing cameras
US9744457B2 (en) System and method for optical player tracking in sports venues
CN107659774B (en) Video imaging system and video processing method based on multi-scale camera array
CN103716594B (en) Panorama splicing linkage method and device based on moving target detecting
US11310418B2 (en) Computer-implemented method for automated detection of a moving area of interest in a video stream of field sports with a common object of interest
US20180139374A1 (en) Smart and connected object view presentation system and apparatus
US20070248283A1 (en) Method and apparatus for a wide area virtual scene preview system
CN105187723B (en) A kind of image pickup processing method of unmanned vehicle
CN110764537B (en) Automatic tripod head locking system and method based on motion estimation and visual tracking
US9418299B2 (en) Surveillance process and apparatus
CN112207821B (en) Target searching method of visual robot and robot
CN113436130B (en) Intelligent sensing system and device for unstructured light field
CN110910489B (en) Monocular vision-based intelligent court sports information acquisition system and method
CN110378928B (en) Dynamic and static matching target detection and tracking method
CN114697528A (en) Image processor, electronic device and focusing control method
JP2005223487A (en) Digital camera work apparatus, digital camera work method, and digital camera work program
JPH09322053A (en) Image pickup method for object in automatic image pickup camera system
CN113688680B (en) Intelligent recognition and tracking system
CN113382304B (en) Video stitching method based on artificial intelligence technology
KR101997799B1 (en) System for providing image associated with region of interest
JP3393969B2 (en) Method and apparatus for recognizing subject in automatic camera system
CN115527176B (en) Target object identification method, device and equipment
CN117812225B (en) Monitoring method, system and storage medium of self-adaptive security camera
CN113507565B (en) Full-automatic servo tracking shooting method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant