CN110658916A

CN110658916A - Target tracking method and system

Info

Publication number: CN110658916A
Application number: CN201910882940.0A
Authority: CN
Inventors: 张龙杰; 谢晓方; 孙涛; 贺英政; 王诚成; 刘厚君; 孙晨峰; 李威; 栗泽宏
Original assignee: Naval Aeronautical University
Current assignee: Naval Aeronautical University
Priority date: 2019-09-18
Filing date: 2019-09-18
Publication date: 2020-01-07

Abstract

The invention is suitable for the technical field of aviation, and provides a target tracking method and a target tracking system. The method comprises the following steps: acquiring head posture information of personnel, and generating an image acquisition instruction according to the head posture information, wherein the image acquisition instruction is used for controlling a servo holder to be connected with binocular image acquisition equipment to acquire binocular target images in a preset area; acquiring a binocular target image and performing target identification to obtain target information; and calculating the central pixel deviation of the target in real time according to the target information, and generating a servo control instruction according to the central pixel deviation, wherein the servo control instruction is used for controlling a servo holder to be connected with the binocular image acquisition equipment to track the target in real time. The invention tracks the head movement of a person in a follow-up manner, quickly detects a preset target, accurately measures the target distance, increases the intellectualization and automation of a helmet display aiming system, and reduces the operation burden of the person.

Description

Target tracking method and system

Technical Field

The invention belongs to the technical field of aviation, and particularly relates to a target tracking method and a target tracking system.

Background

In the field of aviation, a Helmet Display System and a Helmet aiming System jointly form a Helmet Integrated Display aiming System (HIDSS). From the development history, the comprehensive helmet display aiming system in the world goes through three development stages, namely a helmet sight, a helmet display system and a comprehensive helmet display aiming system. The early helmet sight mainly adopts collimation optics principle, will aim simple sighting symbol such as halo or crosshair and project on personnel's right eye front monocular or wear the goggles on, thereby adopts mechanical link means to make helmet and seat rudder top link to each other and reachs the sighting line position, the helmet sight main shortcoming of this stage is that weight is too big, and personnel can't wear for a long time, does not accord with the ergonomic requirement.

With the increasing perfection of helmet sights and helmet display systems, a helmet comprehensive display aiming system is finally developed to help personnel to complete tasks such as target tracking, air fighting, ground attack, cooperative combat and the like. Nowadays, the helmet integrated display aiming system becomes important equipment indispensable for armed helicopters and fighter personnel. However, the existing helmet comprehensive display aiming system still has the problems of low target positioning speed, low precision, insufficient dexterity, insufficient intellectualization and automation degree and the like.

Disclosure of Invention

In view of this, the embodiment of the invention provides a target tracking method and system, so as to solve the problems of slow target positioning speed, low precision, insufficient dexterity, and insufficient intelligence and automation degree of the existing helmet comprehensive display aiming system.

A first aspect of an embodiment of the present invention provides a target tracking method, including:

acquiring head posture information of personnel, and generating an image acquisition instruction according to the head posture information, wherein the image acquisition instruction is used for controlling a servo holder to be connected with binocular image acquisition equipment to acquire binocular target images in a preset area;

acquiring the binocular target image and performing target identification to obtain target information;

and calculating the central pixel deviation of the target in real time according to the target information, and generating a servo control instruction according to the central pixel deviation, wherein the servo control instruction is used for controlling a servo holder to be connected with the binocular image acquisition equipment to track the target in real time.

Optionally, the obtaining the binocular target image and performing target identification to obtain target information includes:

acquiring the binocular target image and identifying a target according to a preset YOLO network model to obtain target type and confidence information;

carrying out image fusion on the binocular target image based on an SGBM (binocular stereo matching) matching algorithm to obtain a target disparity map;

and determining the position information and the distance information of the target according to the disparity map.

Optionally, the obtaining the binocular target image and recognizing the target according to a preset YOLO network model includes:

acquiring a plurality of target images, and performing target labeling on each target image to obtain a target image training set;

inputting the target image training set into an original YOLO network model for target detection, and updating a weight file of the original YOLO network model according to a detection result;

establishing a preset YOLO network model when the weight file meets a preset condition;

and inputting the binocular target image into a preset YOLO network model for target recognition, and deleting an overlapping detection frame of a recognition result through a Non-Maximum Suppression (NMS) algorithm.

Optionally, the image fusion of the binocular target image based on the SGBM matching algorithm to obtain the target disparity map includes:

acquiring an internal parameter matrix and a distortion matrix obtained by binocular calibration of the binocular image acquisition equipment;

correcting the internal parameter matrix by using a rotation vector of a Rodrigues function to obtain a reprojection matrix, and calibrating the distortion matrix to obtain a remapping matrix;

performing image matching on the binocular gray level image of the binocular target image and the remapping matrix to obtain a first matching result;

and matching the binocular gray-scale image, the reprojection matrix and the first matching result based on an SGBM matching algorithm to obtain a target disparity map.

Optionally, the matching the binocular gray-scale image, the reprojection matrix, and the first matching result based on the SGBM matching algorithm to obtain the target disparity map includes:

respectively carrying out horizontal Sobel processing, image mapping processing and sampling processing on the binocular gray level image to obtain a preprocessed binocular image;

performing gradient cost calculation on the binocular gray level image, performing SAD (Sum of Absolute Difference) cost calculation on the preprocessed binocular image, and performing energy function path accumulation according to two calculation results to obtain preset matching points of the left target image and the right target image;

and performing parallax detection according to the preset matching points, the re-projection matrix and the first matching result to obtain a target parallax image.

Optionally, before calculating the central pixel deviation of the target in real time according to the target information, the method further includes:

acquiring voice information of personnel, and performing voice recognition on the voice information based on a preset TensorFlow system;

and entering an autonomous tracking mode according to the recognition result, wherein the autonomous tracking mode comprises the steps of calculating the central pixel deviation of the target in real time according to the target information, and generating a servo control instruction according to the central pixel deviation.

Optionally, the acquiring of the voice information of the person performs voice recognition on the voice information based on a preset tensrflow system, including:

establishing a voice sample set, performing Frequency domain conversion on the voice sample set and obtaining an MFCC (Mel Frequency Cepstrum Coefficient) feature vector set for calculating Cepstrum features;

establishing an initial TensorFlow system sequentially comprising a full connection layer network, a Bi-RNN network and at least two full connection layers;

inputting the MFCC feature vector set into the initial TensorFlow system to perform voice model training to obtain a preset TensorFlow system;

and acquiring voice information of personnel, and carrying out voice recognition on the voice information based on the preset TensorFlow system.

A second aspect of an embodiment of the present invention provides a target tracking system, including: the system comprises an attitude sensing module, binocular image acquisition equipment, a servo holder and an information processing module for realizing the steps of any one of the target tracking methods provided by the first aspect of the embodiment; the attitude sensing module, the binocular image acquisition equipment and the servo holder are all connected with the information processing module;

the posture sensing module is used for acquiring the head posture information of the person and sending the head posture information to the information processing module;

the servo holder is used for receiving an image acquisition instruction or a servo control instruction generated by the information processing module, controlling the binocular image acquisition equipment to acquire binocular target images in a preset area according to the image acquisition instruction, or controlling the binocular image acquisition equipment to track a target in real time according to the servo control instruction.

Optionally, the target tracking system further includes: the voice acquisition module is used for acquiring voice information of personnel; the voice acquisition module is connected with the information processing module;

the information processing module is further configured to: acquiring the voice information, and carrying out voice recognition on the voice information based on a preset TensorFlow system;

Optionally, the target tracking system further includes: an AR (Augmented Reality) display module; the AR display module is connected with the information processing module;

the AR display module is used for acquiring the target information identified by the information processing module and displaying the target information at a preset position visible to people.

Compared with the prior art, the embodiment of the invention has the following beneficial effects: the embodiment can be applied to a helmet display aiming system, binocular image acquisition equipment is controlled to acquire binocular target images in a preset area through the head posture information of a person, the follow-up tracking of the head movement of the person is realized, and the target images are acquired in time; and then, target identification is carried out on the binocular target image to obtain target information, the central pixel deviation of the target is calculated in real time according to the target information, the preset target is quickly detected, the target distance is accurately measured, the servo holder can be controlled according to the central pixel deviation to be connected with the binocular image acquisition equipment to track the target in real time, the intellectualization and the automation of a helmet display aiming system are increased, and the operation burden of personnel is reduced.

Drawings

In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed for the embodiments or the prior art descriptions will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.

Fig. 1 is a schematic flow chart of an implementation of a target tracking method provided in an embodiment of the present invention;

FIG. 2 is a flowchart illustrating a specific implementation of step S102 in FIG. 1;

fig. 3 is a schematic diagram of a specific implementation flow of step S201 in fig. 2;

FIG. 4 is a flowchart illustrating a specific implementation of step S202 in FIG. 2;

FIG. 5 is a flowchart illustrating a specific implementation of step S404 in FIG. 4;

FIG. 6 is a schematic diagram of an implementation flow of another target tracking method provided in the embodiment of the present invention;

fig. 7 is a schematic diagram of a specific implementation flow of step S601 in fig. 6;

fig. 8 is a schematic structural diagram of a target tracking system according to an embodiment of the present invention.

Detailed Description

In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, circuits, and methods are omitted so as not to obscure the description of the present invention with unnecessary detail.

In order to explain the technical means of the present invention, the following description will be given by way of specific examples.

Referring to fig. 1, an implementation flow diagram of an embodiment of a target tracking method is provided, which is detailed as follows:

step S101, acquiring head posture information of personnel, and generating an image acquisition instruction according to the head posture information, wherein the image acquisition instruction is used for controlling a servo holder to be connected with binocular image acquisition equipment to acquire binocular target images in a preset area.

For promoting the automation level of the helmet comprehensive display aiming system and reducing the workload of personnel, the embodiment utilizes the head posture information of the personnel to realize the follow-up control of the binocular image acquisition equipment. Specifically, the head attitude angle information of the person can be acquired, the image acquisition instruction is generated according to the head attitude angle information, synchronous movement of the binocular image acquisition equipment and the head of the person is achieved, the binocular image acquisition equipment is controlled and guided to scan the preset area through head movement, and searching and target detection of a specific airspace are achieved.

Optionally, a servo control signal can be generated according to the head attitude angle information and the corner information of the servo system cradle head, and the servo cradle head realizes the follow-up function of the head of the person.

And S102, acquiring the binocular target image and performing target identification to obtain target information.

Optionally, the target information may include target distance information. The target distance is very important information when piloting the aircraft and carrying out training and combat mission, but utilize the distance information of optoelectronic system measurement at present, mostly be the distance of flight altitude to ground projection point, there are some disparities with actual target distance, so in order to provide more distance information for personnel, binocular image acquisition equipment is adopted to this embodiment, utilize binocular location technique, realize the aircraft to the real-time measurement of full airspace distance information, can utilize binocular image to fuse simultaneously and obtain the target information that the visual field is wider, personnel's field of vision has effectively been promoted.

And S103, calculating the central pixel deviation of the target in real time according to the target information, and generating a servo control instruction according to the central pixel deviation, wherein the servo control instruction is used for controlling a servo holder to be connected with the binocular image acquisition equipment to track the target in real time.

The embodiment can be applied to helmets of warplanes and can also be used in environments such as VR technology for game experience. Specifically, the present embodiment may include two target tracking modes, which are a follow-up search mode and an autonomous tracking mode, respectively; in a follow-up search mode, acquiring head posture information of a person in real time, generating an image acquisition instruction according to the head posture information, acquiring a binocular target image of binocular image acquisition equipment and identifying a target, and if the target is detected, displaying the target information in a visible area of the person to prompt the person to find and lock the target, for example, projecting the target information onto a person lens by utilizing AR display equipment to enable the person to simultaneously perceive the target information in a plurality of airspace directions.

After the personnel confirm the target, the autonomous tracking mode can be switched to through voice information. Under the autonomous tracking mode, the binocular image acquisition equipment is disconnected from following with head movement of personnel, central pixel deviation of a target is calculated in real time according to target information, a servo control instruction is generated according to the central pixel deviation, and the binocular image acquisition equipment is controlled by a servo holder to aim at the target, so that the following tracking of the target is realized.

Optionally, the embodiment may adopt a method based on a kernel correlation filtering theory to track the identified target. Specifically, starting a binocular image acquisition device and a servo holder, then creating a window matched with the resolution of the binocular image acquisition device, initializing a nuclear correlation filtering tracking object, and taking a target detected in the last frame of a target detection stage or the target position in the last frame of image in the target tracking process as a candidate area, namely calibrating an initial tracking frame; training and detecting each frame of image circularly by using a kernel correlation filtering tracking algorithm, updating a target tracking frame, and displaying in a video image in an overlapping manner; obtaining the coordinate of the upper left point of the target area in the window and the size of the tracking frame according to the tracking frame result updated by the kernel correlation filtering, thereby obtaining the center coordinate of the tracking frame and determining the deviation amount of the tracking frame and the window; and dividing the deviation value by 10 to be used as the change value of the PWM (Pulse Width Modulation) Pulse Width to control the steering engine.

In one embodiment, referring to fig. 2, a specific implementation process of obtaining the binocular target image and performing target identification to obtain target information in step S102 includes:

step S201, obtaining the binocular target image and identifying the target according to a preset YOLO network model to obtain the target type and confidence information.

And S202, carrying out image fusion on the binocular target image based on an SGBM matching algorithm to obtain a target disparity map.

And step S203, determining the position information and the distance information of the target according to the disparity map.

Compared with the traditional R-CNN series network algorithm, the preset YOLO network model of the embodiment adopts the training and detection of the single-pipeline strategy YOLO network model, namely the categories and the positions of different targets are directly predicted in the single CNN network without classification and regression, so that the algorithm is simple and the target recognition speed is high. In addition, the preset YOLO network model is formed by performing convolution on the whole picture, so that the view field is larger on the identification target, the background is not easy to be misjudged, meanwhile, the generalization capability of the preset YOLO network model is strong, and the model robustness is high during migration. Optionally, the target information may include a target type, confidence information, position information of the target, and distance information, but is not limited to the above information types.

Referring to fig. 3, a specific implementation process of acquiring the binocular target image and identifying the target according to the preset YOLO network model in step S201 includes:

step S301, a plurality of target images are obtained, and target labeling is carried out on each target image to obtain a target image training set.

Step S302, inputting the target image training set into an original YOLO network model for target detection, and updating a weight file of the original YOLO network model according to a detection result.

Step S303, establishing a preset YOLO network model when the weight file meets a preset condition.

Step S304, inputting the binocular target image into a preset YOLO network model for target recognition, and deleting the overlapped detection frames of the recognition result through an NMS algorithm.

Referring to fig. 4, the specific implementation process of performing image fusion on the binocular target image based on the SGBM matching algorithm to obtain the target disparity map in step S202 includes:

step S401, obtaining an internal parameter matrix and a distortion matrix obtained by binocular calibration of the binocular image acquisition equipment.

And S402, correcting the internal parameter matrix by using the rotation vector of the Rodrigues function to obtain a re-projection matrix, and calibrating the distortion matrix to obtain a re-projection matrix.

And S403, carrying out image matching on the binocular gray level image of the binocular target image and the remapping matrix to obtain a first matching result.

And S404, matching the binocular gray-scale image, the reprojection matrix and the first matching result based on an SGBM matching algorithm to obtain a target disparity map.

In one embodiment, referring to fig. 5, a specific implementation process of matching the binocular grayscale image, the reprojection matrix, and the first matching result based on the SGBM matching algorithm in step S404 to obtain the target disparity map includes:

step S501, horizontal Sobel processing, image mapping processing and sampling processing are respectively carried out on the binocular gray level image, and a preprocessed binocular image is obtained.

Illustratively, the binocular gray level image is subjected to horizontal Sobel operator processing correction, then the corrected binocular gray level image is filtered by using an image mapping method, and finally the filtered binocular gray level image is subjected to sampling processing to obtain a preprocessed binocular image. Optionally, the filter coefficient of the image mapping may be 60, so as to control the binocular grayscale image mapping relationship.

Step S502, gradient cost calculation is carried out on the binocular gray level image, SAD cost calculation is carried out on the preprocessed binocular image, and energy function path accumulation is carried out according to two calculation results to obtain preset matching points of the left target image and the right target image.

The purpose of the cost calculation is to perform similarity detection on the images in the window. For example, the window size in the SAD cost calculation method may be 16, the minimum disparity may be 32, the maximum disparity may be 176, the maximum disparity determines the disparity search range, and the search boundary is minimum disparity + maximum disparity.

And S503, performing parallax detection according to the preset matching points, the reprojection matrix and the first matching result to obtain a target parallax image.

Optionally, the negative parallax detection method in this embodiment may sequentially include three steps of uniqueness detection, left-right consistency detection, and connected region detection. For example, the parallax uniqueness percentage in the uniqueness detection in this embodiment may be 30, the threshold (i.e., the maximum allowable error value) for marking the occlusion point in the left-right consistency detection process may be 1, the allowable floating range of the parallax of the adjacent pixels (beyond this range, two points are not connected) in the connected region detection process may be 2, and when the parallax value of the adjacent pixels is less than 2 and the threshold for marking the occlusion point is satisfied, the two points are considered to be connected. In addition, in the process of detecting the connected region, the number of the pixels in the connected region can be set to be 200, namely when the number of the pixels in the connected region is smaller than 200, a certain point is considered as a noise point, and when the number of the pixels in the connected region is larger than 200, the parallax value of the point is considered to be effective. The SGBM algorithm is utilized to calculate the disparity map of the binocular target image, and the position of the pixel point in the space is calculated on the basis of the disparity map, so that binocular positioning and ranging are realized.

In one embodiment, referring to fig. 6, before calculating the central pixel deviation of the target in real time according to the target information in step S103, the method further includes:

step S601, acquiring voice information of personnel, and performing voice recognition on the voice information based on a preset TensorFlow system.

And step S602, entering an autonomous tracking mode according to the identification result, wherein the autonomous tracking mode comprises the steps of calculating the central pixel deviation of the target in real time according to the target information, and generating a servo control instruction according to the central pixel deviation.

At present, there are two main approaches for implementing speech recognition: the method has the advantages of high speed and high precision, and has the disadvantages of increasing hardware cost and occupying additional hardware space, and the scheme is not suitable in consideration of dexterity and lightweight design thought of a helmet comprehensive display aiming system; secondly, online voice recognition, such as the voice recognition based on the Baidu cloud, is realized by means of the cloud platform, and the scheme has the advantages that a voice recognition algorithm is not required to be designed, only voice acquisition is required, the network support is required, and the smooth network is difficult to ensure in consideration of the application environment of the helmet comprehensive display aiming system, so that the scheme cannot be adopted. The TensorFlow system is used as a Google open source software library, the organization structure is flexible and visual, the model portability is strong, hardware resources can be fully utilized during design, so the embodiment performs voice recognition based on the TensorFlow system, the speed is high, the recognition precision is high, and meanwhile, the intelligent level of the helmet comprehensive display aiming system is improved.

In an embodiment, referring to fig. 7, the specific implementation process of acquiring the voice information of the person in step S601 and performing voice recognition on the voice information based on a preset tensrflow system includes:

step S701, a voice sample set is established, frequency domain conversion is carried out on the voice sample set, and an MFCC feature vector set used for calculating cepstrum features is obtained.

In specific application, the establishment of the voice sample set is to pay attention to diversification of sample data, pronunciation data of people of different ages and sexes under different environments is needed, the voice sample data is large enough to ensure the accuracy of a voice recognition model, in addition, the similarity of the sample data is small as much as possible, and particularly when the sample data is not large enough, the small similarity can ensure a high recognition rate.

Step S702, establishing an initial TensorFlow system which sequentially comprises a full connection layer network, a Bi-RNN network and at least two full connection layers.

For example, the initial tensrflow system of this embodiment may use a full-connection layer network with 3 1024 nodes, then pass through a Bi-RNN network, and finally connect two full-connection layers, where both full-connection layers have a dropout (random deactivation) layer, the activation function uses a Relu function with a truncation, and the truncation value may be 20.

And step S703, inputting the MFCC feature vector set into the initial TensorFlow system to perform voice model training to obtain a preset TensorFlow system.

Step S704, acquiring voice information of personnel, and performing voice recognition on the voice information based on the preset TensorFlow system.

Optionally, the corresponding voice instruction is recognized according to the voice information sent by the person, and the target recognition or tracking step is performed according to the voice instruction. Illustratively, the voice instructions may include a start tracking instruction, an autonomous tracking mode instruction, a follow-up mode instruction, and a system shutdown instruction. After the target is identified, the tracking starting instruction is switched to a target soft tracking process by using a target tracking algorithm; the autonomous tracking mode instruction can be based on a stably tracked target, and then the binocular image acquisition equipment is controlled to follow the target by the servo holder; the follow-up mode instruction can be based on the lost target in the target tracking process, or the servo cradle head controls the binocular image acquisition equipment to follow the head to move to follow the target when the servo cradle head works in the autonomous tracking mode; the system shutdown command may be performed at any time during system operation.

For example, the initial stage of the present embodiment may default to the follow-up mode. In the follow-up mode, the servo cradle head controls the binocular image acquisition equipment to realize follow-up control according to the head movement of a person, voice information of the person can be received after the person detects a target, a target tracking algorithm is called to realize soft tracking of the target after the person sends a tracking starting instruction, and after the target is stably tracked, when the person sends an autonomous tracking mode instruction, the servo cradle head releases follow-up of the binocular image acquisition equipment with the head of the person, so that the follow-up of the target is realized, and meanwhile, the target tracking algorithm is continuously operated. In the process of the autonomous tracking mode, when a person sends a follow-up mode instruction, the servo cradle head releases the follow-up of the binocular image acquisition equipment to the target, so that the head of the person is followed, and meanwhile, a target recognition algorithm is called to continuously detect and recognize the target. At any time of operation, when a person sends a system stop instruction, the servo cradle head stops working, and the target detection algorithm and the tracking algorithm stop working.

According to the target tracking method, the binocular image acquisition equipment is controlled to acquire the binocular target image in the preset area through the head posture information of the person, so that the follow-up tracking of the head movement of the person is realized, and the target image is acquired in time; and then, target recognition is carried out on the binocular target image to obtain target information, the central pixel deviation of the target is calculated in real time according to the target information, the preset target is quickly detected, the target distance is accurately measured, the servo holder can be controlled according to the central pixel deviation to be connected with the binocular image acquisition equipment to track the target in real time, the intellectualization and automation of a helmet display aiming system are increased by combining the voice recognition function and the AR display function, and the operation burden of personnel is reduced.

It should be understood by those skilled in the art that the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present invention.

Fig. 8 is a block diagram of a target tracking system according to an embodiment of the present invention, which corresponds to the target tracking method according to the above-described embodiment. For convenience of explanation, only the portions related to the present embodiment are shown. The system comprises: an attitude sensing module 110, a binocular image acquisition device 120, a servo pan-tilt 130, and an information processing module 140 for implementing the steps of the target tracking method according to any one of the above embodiments; the attitude sensing module 110, the binocular image capturing device 120 and the servo holder 130 are all connected with the information processing module 140.

The posture sensing module 110 collects the head posture information of the person and sends the head posture information to the information processing module 140; the information processing module 140 acquires head posture information of a person, and generates an image acquisition instruction according to the head posture information; the servo cloud deck 130 receives the image acquisition instruction generated by the information processing module 140, and controls the binocular image acquisition equipment 120 to acquire the binocular target image in the preset area according to the image acquisition instruction; the binocular image collecting device 120 collects binocular target images in a preset area and sends the images to the information processing module 140; the information processing module 140 acquires a binocular target image and performs target identification to obtain target information, calculates the central pixel deviation of a target in real time according to the target information, and generates a servo control instruction according to the central pixel deviation; finally, the servo holder 130 receives the servo control command generated by the information processing module 140, and connects the binocular image acquisition device 120 to track the target in real time according to the servo control command.

Optionally, before the posture sensing module 110 collects the head posture information of the person, calibration of the module is performed, which mainly includes accelerometer calibration and magnetometer calibration. Accelerometer calibration is used to remove the zero offset error of the accelerometer, which exists in different degrees due to the different conditions of each sensor, and calibration is necessary to obtain accurate acceleration data. The magnetic field calibration is used for removing zero offset of the magnetometer, and the magnetometer usually has a large zero error, which brings a large measurement error and affects the accuracy of the course angle Z-axis angle measurement, so the accelerometer calibration and the magnetometer calibration are performed in the embodiment.

Optionally, the information processing module 140 further obtains rotation angle information of the servo pan/tilt head 130, forms an error control signal according to the head posture information and the rotation angle information, and controls the servo pan/tilt head according to the error control signal, so as to implement accurate following of the binocular image capturing device 120 on the head movement of the person.

Optionally, the target tracking system may further include: a voice collecting module 150 for collecting voice information of the person; the voice acquisition module 150 is connected with the information processing module 140; the information processing module 140 is further configured to: and acquiring the voice information, performing voice recognition on the voice information based on a preset TensorFlow system, entering an autonomous tracking mode according to a recognition result, wherein the autonomous tracking mode comprises the steps of calculating the central pixel deviation of the target in real time according to the target information, and generating a servo control instruction according to the central pixel deviation.

Optionally, the target tracking system may further include: an AR display module 160; the AR display module 160 is connected to the information processing module 140, and is configured to obtain target information identified by the information processing module 140, and display the target information at a preset position visible to a person, for example, project the target information onto a person's goggles, so that the person can simultaneously perceive target information in multiple airspace directions. AR display module 160 compares in the image on traditional head-up display and the helmet sight lens of the hood, adopt the virtual reality technique with outside battlefield situation projection on personnel's lens of the hood, personnel's perception ability and the flexibility of carrying out the task to outside situation have been strengthened greatly, and very big simplification the structure of light path conduction in the optical system, make system size and weight very reduce, nimble arrange in on the helmet, traditional optical system has been improved because of the quality overweight, the focus skew leads to the fact the cervical vertebra oppression to the personnel, the fatigue that the personnel worn the helmet for a long time is alleviated, the danger of potential damage cervical vertebra has been reduced.

In specific applications, since the wireless transmission signal is easily subjected to electromagnetic interference, the posture sensing module 110 and the information processing module 140 of the embodiment are connected in series by a wire, so as to improve the accuracy of signal reception. Binocular image acquisition equipment 120 can be connected with information processing module 140 through USB, servo cloud platform 130 can be connected with information processing module 140 through the RS232 bus, binocular image acquisition equipment 120 links firmly with servo cloud platform 130 hardware, voice acquisition module 150 can be connected with information processing module 140 through the audio data line, AR display module 160 can be connected with information processing module 140 through the HDMI data line.

Optionally, the target tracking system may include a start tracking mode, an autonomous tracking mode, a follow-up mode, and a system shutdown. The starting tracking mode can be switched to a target soft tracking process by using a target tracking algorithm after a target is identified; the autonomous tracking mode can be based on the stable tracking of the target, and then the binocular image acquisition equipment is controlled to follow the target by the servo holder; the follow-up mode can be based on the target loss in the target tracking process, or the servo cradle head controls the binocular image acquisition equipment to follow the head to move to follow the target when the servo cradle head works in the autonomous tracking mode; system shutdown may occur at any time during system operation.

For example, after the target tracking system is powered on and started, the binocular image collecting device 120, the AR display module 160, and the servo holder 130 are powered on, and meanwhile, initialization is performed through a serial port, a bluetooth connection, and the like. Then the attitude sensing module 110 is calibrated, and after the initialization work is completed, the system enters a default follow-up mode. In the follow-up mode, the servo holder controls the binocular image acquisition equipment to realize follow-up control according to head movement of people, and after a target is detected, target information such as a target distance and confidence coefficient is displayed in an image in a superposed mode. And if the personnel do not send out a voice command for switching the working mode, continuously detecting and framing the target, and maintaining the follow-up state of the head of the personnel. If the person finds that the system has detected a target and wishes the servo system to switch to autonomous tracking mode, a voice command is issued causing the system to switch to autonomous tracking mode.

When the personnel send the autonomous tracking mode instruction, the servo cradle head releases the following of the binocular image acquisition equipment with the head of the personnel, so that the target is followed, and meanwhile, the target tracking algorithm is continuously operated. In the process of the autonomous tracking mode, when a person sends a follow-up mode instruction, the servo cradle head releases the follow-up of the binocular image acquisition equipment to the target, so that the head of the person is followed, and meanwhile, a target recognition algorithm is called to continuously detect and recognize the target. When the personnel send a shutdown instruction, the system automatically disconnects the serial port, the Bluetooth and the like, releases the resources of the binocular image acquisition device 120, closes the servo holder 130 and the attitude sensing module 110, and finally exits the system.

According to the target tracking system, the binocular image acquisition equipment is controlled to acquire the binocular target image in the preset area through the head posture information of the person, so that the follow-up tracking of the head movement of the person is realized, and the target image is acquired in time; and then, target recognition is carried out on the binocular target image to obtain target information, the central pixel deviation of the target is calculated in real time according to the target information, the preset target is quickly detected, the target distance is accurately measured, the servo holder can be controlled according to the central pixel deviation to be connected with the binocular image acquisition equipment to track the target in real time, the intellectualization and automation of a helmet display aiming system are increased by combining the voice recognition function and the AR display function, and the operation burden of personnel is reduced.

In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or illustrated in a certain embodiment.

Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

In the embodiments provided by the present invention, it should be understood that the disclosed system and method can be implemented in other ways. For example, the above-described system embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, systems or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.

The integrated modules/units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the flow of the method according to the embodiments of the present invention may also be implemented by a computer program, which may be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of the method embodiments may be implemented. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer-readable medium may include: any entity or system capable of carrying said computer program code, recording medium, usb disk, removable hard disk, magnetic disk, optical disk, computer Memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier wave signals, telecommunications signals, software distribution medium, etc. It should be noted that the computer readable medium includes content that can be appropriately increased or decreased according to the requirements of legislation and patent practice in jurisdictions, for example, in some jurisdictions, computer readable media does not include electrical carrier signals and telecommunication signals according to legislation and patent practice.

The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims

1. A target tracking method, comprising:

2. The target tracking method of claim 1, wherein the obtaining the binocular target image and performing target recognition to obtain target information comprises:

carrying out image fusion on the binocular target image based on an SGBM matching algorithm to obtain a target disparity map;

3. The target tracking method of claim 2, wherein the obtaining the binocular target image and identifying the target according to a preset YOLO network model comprises:

and inputting the binocular target image into a preset YOLO network model for target recognition, and deleting the overlapped detection frames of the recognition result through an NMS algorithm.

4. The target tracking method of claim 2, wherein the image fusion of the binocular target image based on the SGBM matching algorithm to obtain the target disparity map comprises:

5. The target tracking method of claim 4, wherein the matching the binocular gray-scale image, the reprojection matrix, and the first matching result based on the SGBM matching algorithm to obtain the target disparity map comprises:

performing gradient cost calculation on the binocular gray level image, performing SAD cost calculation on the preprocessed binocular image, and performing energy function path accumulation according to two calculation results to obtain preset matching points of the left target image and the right target image;

6. The target tracking method of any one of claims 1 to 5, further comprising, prior to calculating a center pixel bias of a target in real time from the target information:

7. The object tracking method according to claim 6, wherein the acquiring of the voice information of the person and the voice recognition of the voice information based on a preset TensorFlow system comprises:

establishing a voice sample set, and performing frequency domain conversion on the voice sample set to obtain an MFCC feature vector set for calculating cepstrum features;

8. An object tracking system, comprising: an attitude sensing module, a binocular image acquisition device, a servo pan-tilt and an information processing module for implementing the steps of the target tracking method according to any one of claims 1 to 7; the attitude sensing module, the binocular image acquisition equipment and the servo holder are all connected with the information processing module;

9. The target tracking system of claim 8, further comprising: the voice acquisition module is used for acquiring voice information of personnel; the voice acquisition module is connected with the information processing module;

10. The target tracking system of claim 8 or 9, further comprising: an AR display module; the AR display module is connected with the information processing module;