CN112507840A - Man-machine hybrid enhanced small target detection and tracking method and system - Google Patents

Man-machine hybrid enhanced small target detection and tracking method and system

Info

Publication number
CN112507840A
Authority
CN
China
Prior art keywords
small target
tracking
target detection
model
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011388969.2A
Other languages
Chinese (zh)
Other versions
CN112507840B (en)
Inventor
果实
倪勇
卢凯良
王玉翠
陈彦璋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
716th Research Institute of CSIC
Original Assignee
716th Research Institute of CSIC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 716th Research Institute of CSIC filed Critical 716th Research Institute of CSIC
Priority to CN202011388969.2A priority Critical patent/CN112507840B/en
Publication of CN112507840A publication Critical patent/CN112507840A/en
Application granted granted Critical
Publication of CN112507840B publication Critical patent/CN112507840B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/18 Eye characteristics, e.g. of the iris
    • G06V40/193 Preprocessing; Feature extraction

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Ophthalmology & Optometry (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a man-machine hybrid enhanced small target detection and tracking method and system. The method comprises the following steps: locating the position of the target area and acquiring a small target area image; extracting specific image information from the small target area image to obtain a small target detail image; constructing a small target detection and tracking model based on a wearable device, designing and training a lightweight neural network target detection model, and optimizing its small-target detection and tracking capability with the small target detail images, in combination with context-information fusion and a Kalman filtering algorithm; cyclically acquiring new image information through the above steps to enrich the training set of the small target detection and tracking model, and performing transfer training on the model with the updated training set to obtain a small target detection and tracking enhancement model. The system comprises an acquisition and identification module, a sensor module, a detection and tracking module and a communication module. The invention improves the accuracy of detecting and tracking small targets in images.

Description

Man-machine hybrid enhanced small target detection and tracking method and system
Technical Field
The invention relates to small target detection and tracking in real-time images, and in particular to a man-machine hybrid enhanced small target detection and tracking method and system.
Background
Currently, real-time, high-precision target detection is a key and difficult problem that the field of computer vision must face and solve. In recent years, research on neural networks and deep learning in computer vision has advanced rapidly; for the target detection task in particular, end-to-end real-time target detection network models have been realized whose speed and precision far exceed those of traditional target detection methods. However, on small platforms with limited computing power, such as wearable devices deployed in practical military applications, a conventional deep network model that has undergone lightweight processing can no longer effectively detect small-scale targets.
In practical wearable-device application scenes, once a current deep learning network model has undergone lightweight processing to fit the device's computing power, its small-target detection and recognition performance cannot meet the requirements of many real scenes.
Existing means of enhancing small target detection focus mainly on improving the deep learning network model itself and can be broadly summarized into two categories. The first targets the training process: various data enhancement means, such as flipping and graying, are applied directly to the training input data set; enhancing the input images improves the model's generalization in target detection and recognition, and thereby its detection precision on small targets. The second targets the network structure: technical means such as strengthening the model's backbone and adding target context-information feature fusion are adopted to enhance the model's feature extraction and detection capability. Data enhancement algorithms themselves fall into two classes: those based on basic image processing techniques and those based on deep learning, both of which improve model accuracy by enhancing small-target information. With an FPN (Feature Pyramid Network) context-information feature fusion algorithm, feature images of different scales are fused to some extent, so that the fused features combine the high information content of low-level images with the strong semantics of high-level images and serve small target detection better. However, all of these methods consider only the deep learning network model itself; they never jointly exploit the user, the deep learning target detection model, and other device-based enhancement means to fully realize small-target detection and tracking capability.
Current approaches that use deep learning target detection models for small targets have the following problems:
(1) Because the embedded device on wearable equipment has limited computing capacity for neural network models, enhancing the model backbone to strengthen small-target detection is directly ruled out; with typical embedded computing power, improvements to small-target detection can only focus on data enhancement. Moreover, data enhancement usually considers the data set as a whole, which greatly limits how much it can improve the model's small-target detection capability;
(2) Existing ways of improving small-target detection generally only address the structure of the deep learning network model (for example, adding an FPN context feature fusion stage), without considering the capabilities of other auxiliary devices (multi-focal-length sensors, etc.); they are therefore confined to the model's own capability, and the improvement in small-target detection is limited.
Disclosure of Invention
The invention aims to provide a man-machine hybrid enhanced small target detection and tracking method and system that improve the overall model's capability of detecting small targets by combining technical means from several directions.
The technical scheme for realizing the purpose of the invention is as follows:
a man-machine hybrid enhanced small target detection and tracking method comprises the following steps:
(1) positioning the position of a target area, and acquiring a small target area image;
(2) extracting specific image information of the small target area image to obtain a small target detail image;
(3) constructing a small target detection and tracking model: constructing a lightweight neural network target detection model, fusing context information, performing model training by combining a focal loss function and a transfer learning method, and combining a target tracking model to obtain the small target detection and tracking model, wherein the small target detail image obtains new image information through the small target detection and tracking model;
(4) cycling through steps (1) to (3), taking the obtained new image information as a training set for the small target detection and tracking model, and performing transfer training on the small target detection and tracking model with the updated training set until convergence, to obtain the small target detection and tracking enhancement model;
(5) executing step (1) and step (2) to obtain a small target detail image, and inputting the small target detail image into the small target detection and tracking enhancement model to detect and track the small target.
Further, the step (1) is specifically as follows:
identifying the pupil center and the corneal reflection center on an eye image using an image processing algorithm; extracting the pupil and the cornea from the captured image with a processor using the bright/dark pupil principle; taking the corneal reflection center as the base point for the relative position between the eye tracking camera and the eyeball, and representing the gaze point position by the pupil center position coordinates; then determining the gaze direction from the relative position of the glint and the pupil; and, according to the gaze direction, obtaining the center of the observer's fixation point using the pupil-corneal reflection technique, and expanding a 32 x 32-pixel area around the fixation-point center to obtain the small target area image.
Further, determining the gaze direction from the relative position specifically means: the relative position determines the gaze direction through a gaze mapping function model, as follows:

(P_x, P_y) = f(V_x, V_y)   (1)

wherein (P_x, P_y) are the coordinates of the fixation point and (V_x, V_y) is the pupil-glint (reflected spot) vector.
Further, constructing the lightweight neural network target detection model in step (3) specifically comprises:
the model comprises a lightweight depth-separable backbone network, MobileNet, coupled with the SSD single-stage target detection algorithm; the MobileNet neural network extracts feature maps of different scales from the input small target region-of-interest image, the feature maps are densely sampled to generate a number of prior boxes, the prior boxes are offset to obtain prediction boxes, the objects in the prediction boxes are classified, non-maximum suppression is applied, and finally the detection values are obtained by convolution.
Further, the lightweight depth-separable backbone network MobileNet is constructed using a depthwise separable convolution method; the depthwise separable convolution method comprises a depthwise convolution and a pointwise convolution: the depthwise convolution first convolves each input channel separately, and the pointwise convolution then combines the convolved outputs.
Further, fusing context information in step (3) specifically means: adding an FPN context-information feature fusion algorithm to the lightweight neural network target detection model, wherein the algorithm fuses feature images of different scales along a top-down path, fusing the semantics of upper feature-pyramid levels into lower feature-pyramid levels.
Further, the focal loss function in step (3) is formulated as follows:

FL(p_t) = -α_t(1 - p_t)^γ log(p_t)   (2)

wherein the focusing parameter γ > 0,

p_t = p if y = 1, and p_t = 1 - p otherwise;

α_t = α if y = 1, and α_t = 1 - α otherwise;

y ∈ {1, 0} represents the true class, p ∈ [0, 1] is the probability the model predicts for the positive class (y = 1), and α ∈ [0, 1] is a weighting factor for class y.
Further, the target tracking model is constructed based on a Kalman filtering algorithm: first, a state equation is established based on the Kalman filtering cooperative algorithm; then the small target detail image is taken as the input of the state equation, and the parameters of the state equation are adjusted according to its output.
A man-machine hybrid enhanced small target detection and tracking system comprises an acquisition and identification module, a sensor module, a detection and tracking module and a communication module; wherein:
the acquisition and identification module is used for acquiring eye images, identifying and locating the position of the target area in the images, and acquiring the small target region-of-interest image;
the sensor module is used for extracting specific image information of the small target region-of-interest image to obtain a small target detail image;
the detection and tracking module comprises the small target detection and tracking enhancement model, and detects and tracks the small target based on the specific image information of the target region-of-interest image;
the communication module is used for information communication among the modules.
Further, the sensor module is a plurality of image sensors with different focal lengths or a zoom image sensor, and the small target detail image is larger than 300 x 300 pixels.
Compared with the prior art, the invention has the following beneficial effects: (1) the small target detection and tracking method adopts a lightweight neural network target detection model, fuses context information, performs model training by combining a focal loss function and a transfer learning method, and then combines a target tracking model to obtain the small target detection and tracking model, which improves the capability of detecting small targets; (2) the output of the small target detection and tracking model is used as a training set, and transfer training is performed on the model to obtain the small target detection and tracking enhancement model, which improves the accuracy of small target detection and tracking in images; (3) the invention can be applied in other practical scenes and popularized to fields such as fire-fighting wearable equipment and public-security training.
Drawings
FIG. 1 is a flow chart of a man-machine hybrid enhanced small target detection and tracking method of the present invention.
FIG. 2 is a flow chart of constructing a lightweight neural network target detection model in the present invention.
Fig. 3 is a schematic view of a scene application of the eye movement attention tracking technology in combination with a small target detection and tracking model in the present invention.
Detailed Description
The invention provides a man-machine hybrid enhanced small target detection and tracking method; embodiments of the invention are described in detail below with reference to the accompanying drawings.
As shown in fig. 1, a man-machine hybrid enhanced small target detection and tracking method includes the following steps:
(1) locating the position of the local region the observer is interested in, based on the eye-movement attention tracking device on the wearable device, to obtain a small target region-of-interest image;
the concrete implementation steps of the step (1) are as follows:
an "eye movement" attention tracking device on a wearable device uses image processing algorithms to identify two key locations on each image, the pupil center and the corneal reflection center, sent by an eye tracking camera on the wearable device. The corneal reflection point is a point at which light from a fixed light source (infrared illuminator) is reflected back on the cornea. The reflected image of the infrared light irradiated on the eyes is shot by a camera, the pupil and the cornea in the shot image are extracted by processing through computing hardware (a processor) on wearable equipment by utilizing the principle of bright pupil and dark pupil, the corneal reflection point is used as a base point of the relative position of the camera and the eyeball, and the pupil center position coordinate represents the position of a gaze point. The relative position of the light spot and the pupil can obviously change along with the rotation of the eyeball, and then the sight line direction is determined by using the relative position. The relative position sight line determining method determines the sight line direction through a sight line mapping function model, two-dimensional eye movement characteristics extracted from an eye image are used as independent variables of a mapping function to be input, and a dependent variable of the function is the solved sight line direction or a fixation point. In order to obtain the line-of-sight mapping function, each user needs to be calibrated online. The sight line mapping function model is expressed by the following formula:
(P_x, P_y) = f(V_x, V_y)   (1)

wherein P_x, P_y are the coordinates of the gaze landing point (fixation-point coordinates) and V_x, V_y are the coordinates of the pupil-glint (reflected spot) vector.
The pupil-corneal reflection technique thus yields the center of the individual's fixation point; the obtained attention center is expanded into a 32 x 32-pixel area, giving the position of the region of interest where the small target is located, i.e., the small target region-of-interest image.
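As a minimal illustrative sketch of this step (not taken from the patent), the snippet below maps a pupil-glint vector to a gaze point with formula (1) and crops the 32 x 32 region of interest around it. The second-order polynomial form of the mapping and all function and variable names are assumptions; the patent only fixes the general form (P_x, P_y) = f(V_x, V_y) with per-user online calibration.

```python
import numpy as np

def gaze_point(pupil_glint_vec, coeffs_x, coeffs_y):
    """Map a pupil-glint vector (Vx, Vy) to a gaze point (Px, Py).

    A second-order polynomial mapping is assumed here; the six
    coefficients per axis would come from per-user online calibration.
    """
    vx, vy = pupil_glint_vec
    basis = np.array([1.0, vx, vy, vx * vy, vx**2, vy**2])
    return float(basis @ coeffs_x), float(basis @ coeffs_y)

def roi_32x32(image, gaze_xy):
    """Expand the gaze point into a 32 x 32-pixel region of interest.

    Assumes the frame is at least 32 px per side; the window is shifted
    so that it stays fully inside the frame.
    """
    h, w = image.shape[:2]
    cx, cy = int(round(gaze_xy[0])), int(round(gaze_xy[1]))
    x0 = int(np.clip(cx - 16, 0, w - 32))
    y0 = int(np.clip(cy - 16, 0, h - 32))
    return image[y0:y0 + 32, x0:x0 + 32]
```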
(2) extracting specific image information of the small target region-of-interest image using the multi-focal-length sensor on the wearable device to obtain a small target detail image;
the concrete implementation steps of the step (2) are as follows:
the image sensors with different focal lengths on the wearable device are used or the zooming image sensors are directly adopted, so that the local area image where the center of the individual fixation point is located is extracted by more than 300 x 300 pixels, the local area image can be better utilized by a lightweight neural network model subsequently, and the situation that the pixels of the local area image are too low and the deep learning model cannot effectively extract features is avoided. The zoom sensor is combined with the lens with different focal sections or directly adopted, so that the local characteristics of more details can be obtained, and the detection of the model on the small target is facilitated.
(3) constructing a small target detection and tracking model based on the wearable device: designing a lightweight neural network target detection model, fusing context information, performing model training by combining a focal loss function and a transfer learning method, and combining a target tracking technique based on a Kalman filtering algorithm to obtain the small target detection and tracking model. The small target detail image yields new image information through the small target detection and tracking model, and the new image information is used to optimize the model's capability of detecting and tracking small targets;
as shown in fig. 2, the specific implementation steps of step (3) are as follows:
1) After the detail image of the area where the small target is located has been obtained, the overall lightweight neural network target detection model based on depthwise separable convolution is designed; it comprises the lightweight depth-separable backbone network MobileNet coupled with the SSD single-stage target detection algorithm. MobileNet uses the depthwise separable convolution method to build a lightweight deep neural network. A depthwise separable convolution decomposes a standard convolution into two smaller operations: a depthwise convolution and a pointwise convolution. The depthwise convolution first convolves each input channel separately; the pointwise convolution then combines the resulting per-channel outputs. The overall structure of MobileNet coupled with the SSD single-stage detector can be described as follows: the MobileNet neural network extracts feature maps of different scales from the input image; the feature maps are densely sampled to generate a number of anchors (prior boxes); the prior boxes are offset to obtain bounding boxes (prediction boxes); the objects in the prediction boxes are classified; non-maximum suppression (NMS) is performed; and finally the detection values are obtained directly by convolution.
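The depthwise separable building block described above is the standard MobileNet design; a minimal PyTorch sketch follows. The BatchNorm/ReLU6 placement follows the public MobileNet paper, since the patent does not spell these details out.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """MobileNet building block: a per-channel (depthwise) 3x3 convolution
    followed by a 1x1 pointwise convolution that recombines channels."""

    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        # groups=in_ch makes the 3x3 convolution act on each channel separately
        self.depthwise = nn.Conv2d(in_ch, in_ch, 3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        # the 1x1 convolution combines the per-channel outputs across channels
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))
```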
2) The depthwise-separable lightweight neural network target detection model predicts from multi-scale feature maps, and an FPN (Feature Pyramid Network) context-information feature fusion algorithm is added to the resulting multi-scale feature map information, fusing feature images across several different scales. The FPN feature fusion algorithm fuses feature images along a top-down path, merging the semantics obtained at upper feature-pyramid levels into lower feature-pyramid levels, so that the fused feature images combine the high information content of low-level images with the strong semantics of high-level images.
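A minimal sketch of this top-down fusion, again in PyTorch; the output channel count and nearest-neighbour upsampling are illustrative choices rather than values fixed by the patent.

```python
import torch.nn as nn
import torch.nn.functional as F

class TopDownFPN(nn.Module):
    """FPN-style fusion: 1x1 lateral convolutions align channel counts,
    then higher-level (strong-semantics) maps are upsampled and added
    into lower-level (high-resolution) maps along a top-down path."""

    def __init__(self, in_channels, out_ch=128):
        super().__init__()
        self.laterals = nn.ModuleList([nn.Conv2d(c, out_ch, 1) for c in in_channels])
        self.smooth = nn.ModuleList([nn.Conv2d(out_ch, out_ch, 3, padding=1)
                                     for _ in in_channels])

    def forward(self, feats):  # feats ordered low level -> high level
        laterals = [conv(f) for conv, f in zip(self.laterals, feats)]
        for i in range(len(laterals) - 1, 0, -1):  # top-down pass
            laterals[i - 1] = laterals[i - 1] + F.interpolate(
                laterals[i], size=laterals[i - 1].shape[-2:], mode="nearest")
        return [conv(p) for conv, p in zip(self.smooth, laterals)]
```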
3) In the process of generating candidate boxes, the network produces far more candidate boxes without targets than with targets. To balance the positive and negative samples, the original classification loss of the network is replaced with a focal loss function, which builds on the cross-entropy loss; the binary cross-entropy (CE) loss is therefore introduced first:

CE(p, y) = -log(p) if y = 1, and -log(1 - p) otherwise   (2)

In the above formula, y ∈ {1, 0} represents the true class and p ∈ [0, 1] is the probability the model predicts for the positive class (y = 1). For notational convenience, p_t is defined as follows:

p_t = p if y = 1, and p_t = 1 - p otherwise   (3)

The usual way to balance the unequal proportion of positive and negative samples is to add a weighting factor α ∈ [0, 1]: α for class 1 and (1 - α) for class 0. For ease of representation, α_t is defined in the same way as p_t, i.e.

α_t = α if y = 1, and α_t = 1 - α otherwise

A modulating factor (1 - p_t)^γ, with focusing parameter γ > 0, is then added to the original cross-entropy loss. The focal loss function is thus formulated as follows:

FL(p_t) = -α_t(1 - p_t)^γ log(p_t)   (4)

The focal loss function builds on the cross-entropy loss and addresses both the positive/negative sample imbalance and the easy/hard classification imbalance in the target detection network classifier. After the overall lightweight target detection model is complete, the lightweight target detection network is trained: it is pre-trained on the COCO data set and then transfer-trained on the target data sets likely to be encountered on an actual battlefield.
4) To solve the problem that the same small target cannot be continuously detected after being found once by the neural network detection network, a target tracking technique based on a Kalman filtering cooperative algorithm is adopted, so that tracking is maintained after the eye-movement-assisted detection of the small target. A state equation is established, and the small target detail image from target detection is used as the state input to optimize the equation parameters. By feeding in the preceding frames of data, the target's position in the next frame can be effectively predicted; therefore, when the target is occluded or disappears during tracking, adding the Kalman filtering cooperative algorithm effectively solves the problem.
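A minimal sketch of such a tracker: a constant-velocity Kalman filter over the target's centre coordinates. The patent does not spell out the state equation, so the four-state [x, y, vx, vy] model and the noise magnitudes here are assumptions.

```python
import numpy as np

class CenterKalman:
    """Constant-velocity Kalman filter over a target centre (x, y),
    used to bridge frames in which the detector loses the small target."""

    def __init__(self, dt=1.0, q=1e-2, r=1.0):
        self.F = np.array([[1, 0, dt, 0], [0, 1, 0, dt],
                           [0, 0, 1, 0], [0, 0, 0, 1]], float)  # state transition
        self.H = np.array([[1, 0, 0, 0], [0, 1, 0, 0]], float)  # observe position only
        self.Q = q * np.eye(4)  # process noise
        self.R = r * np.eye(2)  # measurement noise
        self.x = np.zeros(4)
        self.P = np.eye(4)

    def predict(self):
        """Propagate the state; the returned centre can stand in for the
        detection when the target is occluded or temporarily lost."""
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.x[:2]

    def update(self, z):
        """Fuse a detected centre z = (x, y) into the state estimate."""
        y = np.asarray(z, float) - self.H @ self.x
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P
```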
(4) Cycling through the above steps, the newly acquired image information is used as a training set for the small target detection and tracking model, and transfer training is performed on the model with the updated training set until convergence, yielding the small target detection and tracking enhancement model.
The concrete implementation steps of the step (4) are as follows:
with reference to fig. 3, the eye movement attention tracking technology on the wearable device in step (1) is positioned at the interested local area position of the observer, the specific image information of the interested area is extracted through the multi-focal segment sensor in step (2), the image is input into the target detection and tracking model in step (3) to perform small target detection and tracking, new environment image information is continuously collected by circulating the steps to enrich the training set of the lightweight neural network model (target detection and tracking model), the lightweight neural network model is retrained by using the updated training set, the detection and tracking capability of the model on the small target is continuously optimized, and finally the small target detection and tracking enhanced model based on the man-machine hybrid technology is obtained.
(5) The acquired small target detail image is input into the small target detection and tracking enhancement model to realize the detection and tracking of the small target.
A man-machine hybrid enhanced small target detection and tracking system comprises an acquisition and identification module, a sensor module, a detection and tracking module and a communication module; wherein:
the acquisition and identification module is used for acquiring eye images, identifying and locating the position of the target area in the images, and acquiring the small target region-of-interest image;
the sensor module is used for extracting specific image information of the small target region-of-interest image to obtain a small target detail image; the sensor module is a plurality of image sensors with different focal lengths or a zoom image sensor, and the small target detail image is larger than 300 x 300 pixels;
the detection and tracking module comprises the small target detection and tracking enhancement model, and detects and tracks the small target based on the specific image information of the target region-of-interest image;
the communication module is used for information communication among the modules.
For specific limitations of the man-machine hybrid enhanced small target detection and tracking system, reference may be made to the limitations of the small target detection and tracking method above, which are not repeated here.
While the foregoing discloses the invention with reference to particular embodiments, it will be appreciated by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined by the appended claims. Therefore, the protection scope of the invention should not be limited to the disclosed embodiments, but should cover the various alternatives and modifications that do not depart from the invention and are covered by the claims of the present application.

Claims (10)

1. A man-machine hybrid enhanced small target detection and tracking method is characterized by comprising the following steps:
(1) positioning the position of a target area, and acquiring a small target area image;
(2) extracting specific image information of the small target area image to obtain a small target detail image;
(3) constructing a small target detection and tracking model: constructing a lightweight neural network target detection model, fusing context information, performing model training by combining a focal loss function and a transfer learning method, and combining a target tracking model to obtain the small target detection and tracking model, wherein the small target detail image obtains new image information through the small target detection and tracking model;
(4) cycling through steps (1) to (3), taking the obtained new image information as a training set for the small target detection and tracking model, and performing transfer training on the small target detection and tracking model with the updated training set until convergence, to obtain the small target detection and tracking enhancement model;
(5) executing step (1) and step (2) to obtain a small target detail image, and inputting the small target detail image into the small target detection and tracking enhancement model to detect and track the small target.
2. The man-machine hybrid enhanced small target detection and tracking method according to claim 1, wherein the step (1) is specifically:
identifying the pupil center and the corneal reflection center on an eye image using an image processing algorithm; extracting the pupil and the cornea from the captured image with a processor using the bright/dark pupil principle; taking the corneal reflection center as the base point for the relative position between the eye tracking camera and the eyeball, and representing the gaze point position by the pupil center position coordinates; then determining the gaze direction from the relative position of the glint and the pupil; and, according to the gaze direction, obtaining the center of the observer's fixation point using the pupil-corneal reflection technique, and expanding a 32 x 32-pixel area around the fixation-point center to obtain the small target area image.
3. The man-machine hybrid enhanced small target detection and tracking method according to claim 2, wherein determining the gaze direction from the relative position specifically means: the relative position determines the gaze direction through a gaze mapping function model, as follows:

(P_x, P_y) = f(V_x, V_y)   (1)

wherein (P_x, P_y) are the coordinates of the fixation point and (V_x, V_y) is the pupil-glint (reflected spot) vector.
4. The man-machine hybrid enhanced small target detection and tracking method according to claim 1, wherein constructing the lightweight neural network target detection model in step (3) specifically comprises:
the model comprises a lightweight depth-separable backbone network, MobileNet, coupled with the SSD single-stage target detection algorithm; the MobileNet neural network extracts feature maps of different scales from the input small target region-of-interest image, the feature maps are densely sampled to generate a number of prior boxes, the prior boxes are offset to obtain prediction boxes, the objects in the prediction boxes are classified, non-maximum suppression is applied, and finally the detection values are obtained by convolution.
5. The man-machine hybrid enhanced small target detection and tracking method according to claim 4, wherein the lightweight depth-separable backbone network MobileNet is constructed using a depthwise separable convolution method; the depthwise separable convolution method comprises a depthwise convolution and a pointwise convolution: the depthwise convolution first convolves each input channel separately, and the pointwise convolution then combines the convolved outputs.
6. The man-machine hybrid enhanced small target detection and tracking method according to claim 4, wherein fusing context information in step (3) specifically means: adding an FPN context-information feature fusion algorithm to the lightweight neural network target detection model, wherein the algorithm fuses feature images of different scales along a top-down path, fusing the semantics of upper feature-pyramid levels into lower feature-pyramid levels.
7. The man-machine hybrid enhanced small target detection and tracking method according to claim 1, wherein the focal loss function in step (3) is formulated as follows:

FL(p_t) = -α_t(1 - p_t)^γ log(p_t)   (2)

wherein the focusing parameter γ > 0,

p_t = p if y = 1, and p_t = 1 - p otherwise; α_t = α if y = 1, and α_t = 1 - α otherwise;

y ∈ {1, 0} represents the true class, p ∈ [0, 1] is the probability the model predicts for the positive class (y = 1), and α ∈ [0, 1] is a weighting factor for class y.
8. The man-machine hybrid enhanced small target detection and tracking method according to claim 1, wherein the target tracking model is constructed based on a Kalman filtering algorithm: first, a state equation is established based on the Kalman filtering cooperative algorithm; then the small target detail image is taken as the input of the state equation, and the parameters of the state equation are adjusted according to its output.
9. A man-machine hybrid enhanced small target detection and tracking system is characterized by comprising an acquisition and identification module, a sensor module, a detection and tracking module and a communication module; wherein:
the acquisition and identification module is used for acquiring eye images, identifying and locating the position of the target area in the images, and acquiring the small target region-of-interest image;
the sensor module is used for extracting specific image information of the small target region-of-interest image to obtain a small target detail image;
the detection and tracking module comprises the small target detection and tracking enhancement model, and detects and tracks the small target based on the specific image information of the target region-of-interest image;
the communication module is used for information communication among the modules.
10. The man-machine hybrid enhanced small target detection and tracking system according to claim 9, wherein the sensor module is a plurality of image sensors with different focal lengths or a zoom image sensor, and the small target detail image is larger than 300 x 300 pixels.
CN202011388969.2A 2020-12-02 2020-12-02 Man-machine hybrid enhanced small target detection and tracking method and system Active CN112507840B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011388969.2A CN112507840B (en) 2020-12-02 2020-12-02 Man-machine hybrid enhanced small target detection and tracking method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011388969.2A CN112507840B (en) 2020-12-02 2020-12-02 Man-machine hybrid enhanced small target detection and tracking method and system

Publications (2)

Publication Number Publication Date
CN112507840A (en) 2021-03-16
CN112507840B CN112507840B (en) 2024-06-18

Family

ID=74969109

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011388969.2A Active CN112507840B (en) 2020-12-02 2020-12-02 Man-machine hybrid enhanced small target detection and tracking method and system

Country Status (1)

Country Link
CN (1) CN112507840B (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101748563B1 (en) * 2016-09-26 2017-06-20 유비씨엔(주) Eye tracking method based both eyes
WO2019237942A1 (en) * 2018-06-15 2019-12-19 北京七鑫易维信息技术有限公司 Line-of-sight tracking method and apparatus based on structured light, device, and storage medium
CN109766769A (en) * 2018-12-18 2019-05-17 四川大学 A kind of road target detection recognition method based on monocular vision and deep learning

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109002848A (en) * 2018-07-05 2018-12-14 西华大学 A kind of detection method of small target based on Feature Mapping neural network
CN109002848B (en) * 2018-07-05 2021-11-05 西华大学 Weak and small target detection method based on feature mapping neural network
CN113327253A (en) * 2021-05-24 2021-08-31 北京市遥感信息研究所 Weak and small target detection method based on satellite-borne infrared remote sensing image
CN113327253B (en) * 2021-05-24 2024-05-24 北京市遥感信息研究所 Weak and small target detection method based on satellite-borne infrared remote sensing image
CN113380033A (en) * 2021-06-09 2021-09-10 山东交通学院 Urban traffic safety early warning method and system based on man-machine hybrid enhanced intelligence
CN113380033B (en) * 2021-06-09 2022-08-16 山东交通学院 Urban traffic safety early warning method and system based on man-machine hybrid enhanced intelligence
CN113762166A (en) * 2021-09-09 2021-12-07 中国矿业大学 Small target detection improvement method and system based on wearable equipment
US11881020B1 (en) 2022-11-24 2024-01-23 Nanjing University Of Posts And Telecommunications Method for small object detection in drone scene based on deep learning

Also Published As

Publication number Publication date
CN112507840B (en) 2024-06-18

Similar Documents

Publication Publication Date Title
CN112507840B (en) Man-machine hybrid enhanced small target detection and tracking method and system
CN111209810B (en) Boundary frame segmentation supervision deep neural network architecture for accurately detecting pedestrians in real time through visible light and infrared images
CN111797716A (en) Single target tracking method based on Siamese network
Cao et al. Rapid detection of blind roads and crosswalks by using a lightweight semantic segmentation network
Di et al. Rainy night scene understanding with near scene semantic adaptation
CN108537844B (en) Visual SLAM loop detection method fusing geometric information
CN113762009B (en) Crowd counting method based on multi-scale feature fusion and double-attention mechanism
CN112446322B (en) Eyeball characteristic detection method, device, equipment and computer readable storage medium
CN113642393A (en) Attention mechanism-based multi-feature fusion sight line estimation method
Kang et al. Real-time eye tracking for bare and sunglasses-wearing faces for augmented reality 3D head-up displays
Hua et al. Underwater object detection algorithm based on feature enhancement and progressive dynamic aggregation strategy
CN114170537A (en) Multi-mode three-dimensional visual attention prediction method and application thereof
Sun et al. IRDCLNet: Instance segmentation of ship images based on interference reduction and dynamic contour learning in foggy scenes
Li et al. Real-time tracking algorithm for aerial vehicles using improved convolutional neural network and transfer learning
Miao et al. Abnormal behavior learning based on edge computing toward a crowd monitoring system
CN113762166A (en) Small target detection improvement method and system based on wearable equipment
Duan [Retracted] Deep Learning‐Based Multitarget Motion Shadow Rejection and Accurate Tracking for Sports Video
CN113298177A (en) Night image coloring method, device, medium, and apparatus
CN113327271A (en) Decision-level target tracking method and system based on double-optical twin network and storage medium
CN117576149A (en) Single-target tracking method based on attention mechanism
CN113763417A (en) Target tracking method based on twin network and residual error structure
CN116797789A (en) Scene semantic segmentation method based on attention architecture
CN116012459A (en) Mouse positioning method based on three-dimensional sight estimation and screen plane estimation
CN112069997B (en) Unmanned aerial vehicle autonomous landing target extraction method and device based on DenseHR-Net
Joshi et al. Real-time object detection and identification for visually challenged people using mobile platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 222001 No.18 Shenghu Road, Lianyungang City, Jiangsu Province

Applicant after: The 716th Research Institute of China Shipbuilding Corp.

Address before: 222001 No.18 Shenghu Road, Lianyungang City, Jiangsu Province

Applicant before: 716TH RESEARCH INSTITUTE OF CHINA SHIPBUILDING INDUSTRY Corp.

GR01 Patent grant